Getting link text from HTML with BeautifulSoup

Posted by ar.dll in Dev

From the sample section of HTML below, I have used BeautifulSoup (simple and easy to use) to pull a lot of football scores off a page:

<tr class='report' id='match-row-EFBO695086'>
  <td class='statistics show' title='Show latest      match stats'> <button>Show</button> </td>
  <td class='match-competition'> Premier League  </td>
  <td class='match-details teams'>
    <p>
      <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
      <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
      <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
    </p>
  </td>
  <td class="match-date"> Sat 28 Dec </td>
  <td class='time'>  Full time  </td>
  <td class='status'>    <a class='report' href='/sport/football/25474625'>Report</a>

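from bs4 import BeautifulSoup
import urllib.request
import csv

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page, 'html.parser')

# Print the text inside every <abbr> (score) tag on the page
for score in soup.find_all('abbr'):
    print(score.string)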

*** Remote Interpreter Reinitialized  ***
>>> 
None
1-2 
1-0 
0-2 
2-1 
2-2 
4-1 
0-2 
1-1

How do I extract the team names from this part of the HTML:

<span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a>    </span>

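The idea is to first get the elements that hold the information about each game: these are the tr tags with class="report". For each row, get the team names by class (team-home and team-away) and the score by tag name (abbr):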

from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page, 'html.parser')

# One <tr class="report"> per match inside the results table
for match in soup.select('table.table-stats tr.report'):
    team1 = match.find('span', class_='team-home')
    team2 = match.find('span', class_='team-away')
    score = match.abbr
    # Skip rows that are missing a team or a score
    if not all((team1, team2, score)):
        continue

    print(team1.text, score.text, team2.text)

Prints:

Man City   1-2   CSKA 
Man City   1-0   Man Utd 
Man City   0-2   Newcastle 
West Ham   2-1   Man City 
...
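
The names and scores above come out padded with spaces because .text returns every string inside the tag, whitespace included. Separately, the None at the top of the question's output is consistent with how .string behaves: it returns None unless the tag has exactly one child. The sketch below shows both points offline, using one row trimmed from the sample HTML in the question; the second snippet of markup (an abbr containing a b tag) is hypothetical and exists only to trigger the None case.

```python
from bs4 import BeautifulSoup

# One row trimmed from the sample HTML in the question.
html = """
<table><tr class='report'>
  <td class='match-details teams'> <p>
    <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
    <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
    <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
  </p> </td>
</tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
row = soup.find('tr', class_='report')

# get_text(strip=True) trims each string and drops the empty ones.
home = row.find('span', class_='team-home').get_text(strip=True)
away = row.find('span', class_='team-away').get_text(strip=True)
score = row.abbr.get_text(strip=True)
print(home, score, away)  # Man City 1-0 Crystal Palace

# Hypothetical markup: a tag with two children (a string and a <b> tag).
multi = BeautifulSoup("<abbr>1-0 <b>FT</b></abbr>", 'html.parser').abbr
print(multi.string)                      # None
print(multi.get_text(' ', strip=True))   # 1-0 FT
```

If the padded output is acceptable, the loop in the answer can stay as it is; get_text(strip=True) is only one way to tidy the strings.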

FYI, table.table-stats tr.report is a CSS selector that matches all tr tags with class="report" inside a table with class="table-stats".
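
To see how that selector scopes the search, here is a minimal offline sketch. It assumes a small document built around the sample row from the question; the header row, the second table with class "other", and the helper name extract_matches are hypothetical additions for illustration. It also uses select_one to reach the a tag inside each span, so both the link text and its href can be read.

```python
from bs4 import BeautifulSoup

# Sample row from the question, wrapped in a table with class "table-stats".
# The header row and the second table are hypothetical, added to show scoping.
html = """
<table class='table-stats'>
  <tr class='header'> <th>Match</th> </tr>
  <tr class='report'> <td class='match-details teams'> <p>
    <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
    <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
    <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
  </p> </td> </tr>
</table>
<table class='other'>
  <tr class='report'> <td>ignored</td> </tr>
</table>
"""


def extract_matches(markup):
    """Return one dict per tr.report row found inside table.table-stats."""
    soup = BeautifulSoup(markup, 'html.parser')
    matches = []
    for row in soup.select('table.table-stats tr.report'):
        home = row.select_one('span.team-home a')
        away = row.select_one('span.team-away a')
        score = row.select_one('abbr')
        # Skip rows that lack a team link or a score.
        if home is None or away is None or score is None:
            continue
        matches.append({
            'home': home.get_text(strip=True),
            'home_url': home['href'],
            'score': score.get_text(strip=True),
            'away': away.get_text(strip=True),
            'away_url': away['href'],
        })
    return matches


matches = extract_matches(html)
for m in matches:
    print(m['home'], m['score'], m['away'], m['away_url'])
```

Only the row inside table.table-stats is returned; the header row fails the tr.report part of the selector, and the row in the other table fails the table.table-stats part.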

Edited on 2021-02-14
