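Python 3
I'm having trouble iterating over the rows of a table.
How do I iterate the tr[1] component of the teamName, teamState, and teamLink XPaths over the number of rows in the table body?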
import lxml.html
from lxml.etree import XPath

url = "http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm"
rows_xpath = XPath('//*[@id="rankings"]/tbody')
teamName_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/th/a/text()')
teamState_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/td[2]/text()')
teamLink_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/th/a/@href')

html = lxml.html.parse(url)
for row in rows_xpath(html):
    teamName = teamName_xpath(row)
    teamState = teamState_xpath(row)
    teamLink = teamLink_xpath(row)
    print(teamName, teamLink)
I also tried doing it this way:
from lxml import html
import requests

siteItem = ['http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm']

def linkScrape(target):
    page = requests.get(target)
    tree = html.fromstring(page.content)
    # Get team link
    for link in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/th/a/@href'):
        print(link)
    # Get team name
    for name in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/th/a/text()'):
        print(name)
    # Get team state
    for state in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/td[2]/text()'):
        print(state)

for target in siteItem:
    linkScrape(target)
Thanks :D
If I understand your question, you want to iterate over the rows in the rankings table. So start with a loop over those rows:
import lxml.html
doc = lxml.html.parse('http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm')
for row in doc.xpath('//table[@id="rankings"]/tbody/tr'):
This iterates over every row of that table. Now, for each row, extract the data you need using XPaths relative to the row:
    team_link = row.xpath('th/a/@href')[0]
    team_name = row.xpath('th/a/text()')[0]
    team_state = row.xpath('td[contains(@class, "state")]/text()')[0]
    print(team_state, team_name, team_link)
On my system, this produces output like the following:
CA Manteca /high-schools/manteca-buffaloes-(manteca,ca)/basketball-winter-15-16/rankings.htm
MD Mount St. Joseph (Baltimore) /high-schools/mount-st-joseph-gaels-(baltimore,md)/basketball-winter-15-16/rankings.htm
TX Brandeis (San Antonio) /high-schools/brandeis-broncos-(san-antonio,tx)/basketball-winter-15-16/rankings.htm
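The same idea also works with compiled XPath objects, as in the question. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the real page (the team names, links, and the "rank" column are assumptions, not data from maxpreps.com) so that it runs without network access, and scrape_rows is a hypothetical helper name. The point is that one expression selects the rows, while the per-field expressions are relative to a single row rather than starting with //.

```python
import lxml.html
from lxml.etree import XPath

# Hypothetical stand-in for the rankings page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
<table id="rankings">
  <tbody>
    <tr>
      <th><a href="/high-schools/team-a/rankings.htm">Team A</a></th>
      <td class="rank">1</td>
      <td class="state">CA</td>
    </tr>
    <tr>
      <th><a href="/high-schools/team-b/rankings.htm">Team B</a></th>
      <td class="rank">2</td>
      <td class="state">MD</td>
    </tr>
  </tbody>
</table>
</body></html>
"""

# One expression selects the rows; the others are relative to a single row.
rows_xpath = XPath('//table[@id="rankings"]/tbody/tr')
team_name_xpath = XPath('th/a/text()')
team_state_xpath = XPath('td[contains(@class, "state")]/text()')
team_link_xpath = XPath('th/a/@href')


def scrape_rows(doc):
    """Return a (state, name, link) tuple for every row of the rankings table."""
    results = []
    for row in rows_xpath(doc):
        name = team_name_xpath(row)
        state = team_state_xpath(row)
        link = team_link_xpath(row)
        # Skip rows missing any of the three fields instead of raising IndexError.
        if name and state and link:
            results.append((state[0].strip(), name[0].strip(), link[0]))
    return results


doc = lxml.html.document_fromstring(SAMPLE_HTML)
for state, name, link in scrape_rows(doc):
    print(state, name, link)
```

An expression that starts with // is evaluated from the document root even when it is called on a row, which is why the tr[1] expressions in the question return the first row every time. Also note that lxml does not add a tbody element on its own, so if the raw HTML of a page has none, drop /tbody from the row expression.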