Given this HTML:
<td align="left">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">
Russell, Addison
</a>
SS OAK - Won at $0
<br>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">
Vargas, Jason
</a>
SP LAA
<span title="Angels interested in bringing back Jason Vargas">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update">
<img border="0" height="10" src="http://sports.cbsimg.net/images/news-note-recent.gif" width="10"/>
</a>
</span>
- Dropped
</br>
</td>
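I only want to display the blocks that do not have subtab="Update", but I can't figure out how to reference subtab inside a Python loop with BeautifulSoup. Here is what I tried: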
soup = BeautifulSoup(html)
pl = soup.findAll('a',{'class': 'playerLink'})
for a in pl:
    if a.subtab == "Update":
        print "UPDATE"
    else:
        print "Player Name: " + a.text
I also tried referencing the subtype in the findAll call:
pl = soup.findAll('a',{'class': 'playerLink'}, {'subtype':0})
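Neither approach works. My problem is that the class is "playerLink" in every case, so the subtype is the only way I can tell the links apart. I'm new to BS, so I'm not very good at working with tags and attributes. The second example might work if I only wanted subtype = Update, but I want every tag where the subtype does not exist.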
a.attrs returns the attributes of the <a> tag as a dictionary. You can check whether an <a> tag has no subtab attribute with 'subtab' not in a.attrs:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink')
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip()
         for a in soup.find_all(player_links) if 'subtab' not in a.attrs]
print(names)
# -> ['Russell, Addison', 'Vargas, Jason']
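For comparison with the loop in the question, here is a minimal sketch of the same attribute check written as a plain loop. In BeautifulSoup, a.subtab looks for a child tag named subtab rather than an attribute, which is why the original comparison never matched; a.get('subtab') reads the attribute instead. The html string is a trimmed copy of the question's markup, and the "html.parser" choice and the helper name describe_links are assumptions made for this illustration only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the stray </br> is dropped).
html = """
<td align="left">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">
Russell, Addison
</a>
SS OAK - Won at $0
<br>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">
Vargas, Jason
</a>
SP LAA
<span title="Angels interested in bringing back Jason Vargas">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update">
<img border="0" height="10" src="http://sports.cbsimg.net/images/news-note-recent.gif" width="10"/>
</a>
</span>
- Dropped
</td>
"""

soup = BeautifulSoup(html, "html.parser")


def describe_links(soup):
    """Hypothetical helper: label each playerLink as a name or an update link."""
    lines = []
    for a in soup.find_all("a", class_="playerLink"):
        # a.get("subtab") returns the attribute value, or None when it is absent.
        if a.get("subtab") == "Update":
            lines.append("UPDATE")
        else:
            lines.append("Player Name: " + a.get_text().strip())
    return lines


for line in describe_links(soup):
    print(line)
# -> Player Name: Russell, Addison
# -> Player Name: Vargas, Jason
# -> UPDATE
```

Using a.get('subtab') rather than a['subtab'] avoids a KeyError on the links that have no such attribute.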
I can't find where the documentation mentions it, but specifying subtab=False also seems to exclude any tag that has a subtab attribute:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink', subtab=False)
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip()
         for a in soup.find_all(player_links)]
print(names)
If the matched tags (player_links) are not nested, the subsequent .find_all(player_links) call can be omitted:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink', subtab=False)
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip() for a in soup]
print(names)
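If relying on subtab=False feels too implicit, the same filter can probably be spelled out as a function passed to find_all, which Beautiful Soup calls once per tag. This is only a sketch and not part of the answer above: the function name is_plain_player_link, the shortened html string, and the "html.parser" choice are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (assumption: hrefs abbreviated).
html = """
<a class="playerLink" href="/players/playerpage/2000032">
Russell, Addison
</a>
SS OAK - Won at $0
<br>
<a class="playerLink" href="/players/playerpage/556425">
Vargas, Jason
</a>
SP LAA
<span title="Angels interested in bringing back Jason Vargas">
<a class="playerLink" href="/players/playerpage/556425" subtab="Update">
<img border="0" height="10" src="news-note-recent.gif" width="10"/>
</a>
</span>
- Dropped
"""


def is_plain_player_link(tag):
    """Hypothetical filter: an <a class="playerLink"> without a subtab attribute."""
    return (
        tag.name == "a"
        and "playerLink" in tag.get("class", [])  # class is parsed as a list
        and not tag.has_attr("subtab")
    )


soup = BeautifulSoup(html, "html.parser")
names = [a.get_text().strip() for a in soup.find_all(is_plain_player_link)]
print(names)
# -> ['Russell, Addison', 'Vargas, Jason']
```

The function form is more verbose than the keyword argument, but it makes the "attribute must be absent" condition explicit.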