from bs4 import BeautifulSoup
import requests

# Fetch the search page; response.text is the raw HTML sent by the server
response = requests.get("https://www.pexels.com/search/flower/")
soup = BeautifulSoup(response.text, "html.parser")

# Keep only <img> tags whose src starts with the Pexels photo CDN prefix
links = []
for img in soup.select('img[src^="https://images.pexels.com/photos"]'):
    links.append(img['src'])

for link in links:
    print(link)
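The selector `img[src^="https://images.pexels.com/photos"]` is a CSS attribute-prefix selector: it keeps only `<img>` elements whose `src` begins with that string. Below is a dependency-free sketch of the same filter that uses the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not what the question uses, and `ImgSrcCollector`, `extract_image_links` and the sample HTML are hypothetical. Running the filter on a plain string, rather than on a live response, makes it easy to check whether a given HTML document contains any matching tags at all. If the page adds its images with JavaScript, the HTML returned by `requests` may contain none.

```python
from html.parser import HTMLParser

# Same prefix as the CSS selector img[src^="https://images.pexels.com/photos"]
PREFIX = "https://images.pexels.com/photos"


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> whose src starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; values may be None
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src.startswith(self.prefix):
            self.links.append(src)


def extract_image_links(html, prefix=PREFIX):
    """Return matching <img> src values in document order (hypothetical helper)."""
    parser = ImgSrcCollector(prefix)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample: only the two Pexels photo URLs should survive
sample = """
<img src="https://images.pexels.com/photos/133472/pexels-photo-133472.jpeg?w=500">
<img src="/static/logo.png">
<img alt="no src">
<img src="https://images.pexels.com/photos/4618416/pexels-photo-4618416.jpeg?w=500"/>
"""

for link in extract_image_links(sample):
    print(link)
```

Self-closing tags such as `<img ... />` are still seen here, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.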
I'd suggest using Selenium WebDriver to load the page in a real browser, grab the full page source, and then parse that.
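from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

url = "https://www.pexels.com/search/flower/"

options = webdriver.FirefoxOptions()
# --ignore-certificate-errors / --ignore-ssl-errors are Chromium switches;
# for Firefox, use the standard WebDriver acceptInsecureCerts capability
options.set_capability("acceptInsecureCerts", True)
# Run Firefox without a window (options.headless is deprecated in Selenium 4)
options.add_argument("-headless")

# In Selenium 4 the driver path is passed through Service, not executable_path=
driver = webdriver.Firefox(service=Service(executable_path="./geckodriver"), options=options)
try:
    driver.get(url)
    # page_source is the DOM as the browser currently has it, after scripts ran
    content = driver.page_source
finally:
    driver.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(content, "html.parser")
links = [img['src'] for img in soup.select('img[src^="https://images.pexels.com/photos"]')]
for link in links:
    print(link)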
You need the latest version of geckodriver (the WebDriver binary for Firefox) to run this code.
This is the output I get:
https://images.pexels.com/photos/36753/flower-purple-lical-blosso.jpg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/3860667/pexels-photo-3860667.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/133472/pexels-photo-133472.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4618416/pexels-photo-4618416.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4234543/pexels-photo-4234543.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
...
https://images.pexels.com/photos/4492525/pexels-photo-4492525.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4210784/pexels-photo-4210784.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
https://images.pexels.com/photos/4210781/pexels-photo-4210781.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500
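Every URL in this output has the shape `https://images.pexels.com/photos/<id>/<file name>?auto=compress&cs=tinysrgb&dpr=1&w=500`. Below is a minimal sketch, using only `urllib.parse`, that splits such a URL into its photo ID, file name and query parameters, and keeps one URL per photo ID. The helper names are hypothetical. The path shape is an assumption taken from the URLs listed here. Reading `w` as a width is also a guess based on its name, not something the page documents. The third sample URL is an invented repeat, included only to exercise the de-duplication.

```python
from urllib.parse import urlsplit, parse_qs


def parse_pexels_url(url):
    """Split a Pexels image URL into photo ID, file name and query parameters.

    Assumes the path shape seen in the scraped URLs: /photos/<id>/<file name>.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "photos":
        raise ValueError(f"unexpected path: {parts.path}")
    # parse_qs maps each key to a list of values; keep the first one
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"id": segments[1], "file": segments[2], "params": params}


def unique_by_photo_id(urls):
    """Keep the first URL seen for each photo ID, preserving order."""
    seen = {}
    for url in urls:
        seen.setdefault(parse_pexels_url(url)["id"], url)
    return list(seen.values())


links = [
    "https://images.pexels.com/photos/36753/flower-purple-lical-blosso.jpg?auto=compress&cs=tinysrgb&dpr=1&w=500",
    "https://images.pexels.com/photos/3860667/pexels-photo-3860667.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500",
    # hypothetical repeat of the first photo with a different dpr value
    "https://images.pexels.com/photos/36753/flower-purple-lical-blosso.jpg?auto=compress&cs=tinysrgb&dpr=2&w=500",
]

for url in unique_by_photo_id(links):
    info = parse_pexels_url(url)
    print(info["id"], info["file"], "w =", info["params"].get("w"))
```

Keying on the numeric ID instead of the full URL means two links to the same photo with different query strings count as one.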