Python, Beautiful Soup, <br> tag

Posted by debugcn in Dev

索菲利亚

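I have searched Stack Overflow but can't seem to find an answer to my question. How do I get the text, and specific parts of the text, that come after a <br> tag?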

Here is my code:

product_review_container = container.findAll("span",{"class":"search_review_summary"})
for product_review in product_review_container:
    prr = product_review.get('data-tooltip-html')
    print(prr)

This is the output:

Very Positive<br>86% of the 1,013 user reviews for this game are positive.

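From this string I only want the 86% and, separately, the 1,013. So just the numbers. But the value is a string, not an int, so I don't know how to do that.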

The original markup looks like this:

   [<span class="search_review_summary positive" data-tooltip-html="Very Positive&lt;br&gt;86% of the 1,013 user reviews for this game are positive.">
</span>]

This is the link I am getting the information from: https://store.steampowered.com/search/?specials=1&page=1

Thanks!

克里斯托夫·瓦尔加

You need to use a regular expression here!

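import re

string = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'
# Raw string so that \d is passed to the regex engine unchanged.
# The pattern has two capture groups, so findall returns one tuple per match.
a = re.findall(r'(\d+%)|(\d+,\d+)', string)
print(a)

# Output: [('86%', ''), ('', '1,013')]
# So a[0][0] is '86%' and a[1][1] is '1,013'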

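Here \d matches any single digit character, and + means one or more of them. Note that the second group requires a comma, so a review count below 1,000 would not be matched by it.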
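Since the question is about the text after the <br>, here is a small sketch of one way to isolate it first and then collect the numbers as a flat list. It assumes the attribute value is already a decoded string like the one printed in the question. The variable names are only illustrative.

```python
import re

tooltip = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'

# Split once on the literal "<br>" and keep the part that follows it.
label, _, detail = tooltip.partition('<br>')

# No capture groups here, so findall returns plain strings instead of tuples.
# \d[\d,]* matches a digit followed by more digits or thousands separators;
# %? keeps a trailing percent sign when there is one.
numbers = re.findall(r'\d[\d,]*%?', detail)

print(label)    # Very Positive
print(numbers)  # ['86%', '1,013']
```

Because the comma is optional in this pattern, a count such as 950 would also be picked up.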

If you need a more specific regular expression, you can try it out at https://regex101.com
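As an example of a more specific pattern, the sketch below ties the match to the wording of the tooltip and converts both values to int, which is what the question was ultimately after. The helper name parse_review_summary is hypothetical, and the pattern assumes the tooltip always uses the phrase "% of the ... user reviews" as in the sample output. Other wordings would need a different pattern.

```python
import re

# Assumption: the tooltip text follows the wording shown in the question.
REVIEW_RE = re.compile(r'(?P<percent>\d+)% of the (?P<count>[\d,]+) user reviews')


def parse_review_summary(tooltip):
    """Return (percent, count) as ints, or None if the text does not match."""
    match = REVIEW_RE.search(tooltip)
    if match is None:
        return None
    percent = int(match.group('percent'))
    # Drop thousands separators before converting to int.
    count = int(match.group('count').replace(',', ''))
    return percent, count


sample = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'
print(parse_review_summary(sample))  # (86, 1013)
```

Returning None for text that does not match lets the calling loop skip entries that have no review tooltip instead of raising an exception.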
