I am trying to get a list of website addresses from the following page: https://www.wer-zu-wem.de/dienstleister/filmstudios.html
My code:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find_all('a', {'class': 'col-md-4 col-lg-5 col-xl-4 text-center text-lg-right'})
print(links)
import requests
from bs4 import BeautifulSoup

webLinksList = []

result = requests.get(
    "https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')

website_Links = soup.find_all(
    'div', class_='col-md-4 col-lg-5 col-xl-4 text-center text-lg-right')
if website_Links != "":
    print("List is empty")

for website_Link in website_Links:
    try:
        realLink = website_Link.find(
            "a", attrs={"class": "btn btn-primary external-link"})
        webLinksList.append(featured_challenge.attrs['href'])
    except:
        continue

for link in webLinksList:
    print(link)
"List is empty" is printed at the start, and nothing I have tried adds any data to the list.
Two bugs in your code: the check `website_Links != ""` is always true (a non-empty ResultSet never equals `""`), so "List is empty" is printed unconditionally; and `featured_challenge` is undefined, so every iteration raises a NameError that the bare `except` silently swallows, leaving `webLinksList` empty. It should be `webLinksList.append(realLink.attrs['href'])`. Try the following to get all the links to the external websites:
import requests
from bs4 import BeautifulSoup

link = "https://www.wer-zu-wem.de/dienstleister/filmstudios.html"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}
result = requests.get(link, headers=headers)
soup = BeautifulSoup(result.text, 'lxml')
for links in soup.find_all('a', {'class': 'external-link'}):
    print(links.get("href"))
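The key point is that `find_all` with a class filter matches anchors whose `class` attribute *contains* the given token, so `'external-link'` matches `class="btn btn-primary external-link"` too. You can verify this offline against a small HTML snippet (the markup below is a hypothetical sample mimicking the structure described in the question, not fetched from the live site):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup resembling the page structure in the question.
sample_html = """
<div class="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right">
    <a class="btn btn-primary external-link" href="https://example-studio.de">Website</a>
</div>
<div class="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right">
    <a class="btn btn-primary external-link" href="https://another-studio.de">Website</a>
</div>
"""

soup = BeautifulSoup(sample_html, 'lxml')

# class matching is token-based: 'external-link' matches even though
# the anchors also carry the 'btn' and 'btn-primary' classes.
links = [a.get("href") for a in soup.find_all("a", class_="external-link")]
print(links)
```

Collecting the hrefs into a list like this also makes it easy to deduplicate or save them afterwards, instead of only printing.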