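I'm stuck while scraping the site http://www.queensbronxba.com/directory/ with BeautifulSoup. I'm almost done: the only thing left to extract is the company name, which sits in a paragraph tag. The problem is that each div contains several paragraph tags, and I only need the first one, because that is the one holding the company name. So I need the first paragraph of every following div, not just of the first div. This is the code I use to scrape: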
page = requests.get("http://www.queensbronxba.com/directory/")
soup = BeautifulSoup(page.content, 'html.parser')
company = soup.find(class_="boardMemberWrap")
contact = company.find_all(class_="boardMember")
info = contact[0]
print(info.prettify())
name_tags = company.select("h4")
names = [nt.get_text() for nt in name_tags]
names
company_tags = company.select("p") #here I need help to get only first paragraphs of following div containers
companies = [ct.get_text() for ct in company_tags]
companies
phone_tags = company.select('a[href^="tel"]')
phones = [pt.get_text() for pt in phone_tags]
phones
email_tags = company.select('a[href^="mailto"]')
emails = [et.get_text() for et in email_tags]
emails
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.queensbronxba.com/directory/")
soup = BeautifulSoup(page.content, 'html.parser')
company = soup.find(class_="boardMemberWrap")
contact = company.find_all(class_="boardMemberInfo")
info = contact[0]
print(info.prettify())
name_tags = company.select("h4")
names = [nt.get_text() for nt in name_tags]
print(names)
# For each member block, print only its first <p> (the company name)
for name in company.find_all(class_="boardMember"):
    for n in name.find_all('p')[:1]:
        print(n.text)
phone_tags = company.select('a[href^="tel"]')
phones = [pt.get_text() for pt in phone_tags]
print(phones)
email_tags = company.select('a[href^="mailto"]')
emails = [et.get_text() for et in email_tags]
print(emails)
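The same idea can be tried offline with a small sketch. The HTML below is a hypothetical stand-in for the directory page (the names, companies, and the `first_paragraphs` helper are made up for illustration; only the `boardMemberWrap` and `boardMember` class names come from the code above). It uses `find("p")`, which returns the first matching tag or `None`, instead of slicing `find_all('p')[:1]`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question:
# each "boardMember" div holds an <h4> name followed by several <p> tags.
SAMPLE_HTML = """
<div class="boardMemberWrap">
  <div class="boardMember">
    <h4>Jane Doe</h4>
    <p>Acme Realty</p>
    <p>123 Main St</p>
  </div>
  <div class="boardMember">
    <h4>John Roe</h4>
    <p>Roe &amp; Sons</p>
    <p>456 Side Ave</p>
  </div>
  <div class="boardMember">
    <h4>No Company</h4>
  </div>
</div>
"""


def first_paragraphs(html):
    """Return the text of the first <p> in each boardMember div (None if absent)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for member in soup.find_all(class_="boardMember"):
        first_p = member.find("p")  # first <p> inside this div, or None
        results.append(first_p.get_text(strip=True) if first_p else None)
    return results


companies = first_paragraphs(SAMPLE_HTML)
print(companies)
```

Keeping a `None` entry for a div without any paragraph means the companies list stays the same length as the list of member divs, which helps if the results are later paired up with names, phones, or emails.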