I am having a problem finding a value in a soup based on its text. Here is the code:
from bs4 import BeautifulSoup as bs
import requests
import re
html='http://finance.yahoo.com/q/ks?s=aapl+Key+Statistics'
r = requests.get(html)
soup = bs(r.text)
findit=soup.find("td", text=re.compile('Market Cap'))
This returns None, yet there absolutely is a 'td' tag whose text contains 'Market Cap'. When I use
soup.find_all("td")
I get a result set which includes:
<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td>
Explanation:
The problem is that this particular td tag has other child elements besides the text, so its .string
value, which is what gets checked when you pass the text argument, is None
(the bs4 documentation for .string describes this behaviour).
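To make the explanation concrete, here is a minimal sketch that parses just the cell from the question instead of fetching the live page. The wrapping table markup and the variable names are assumptions added for illustration; the `string=` keyword is the current name of the `text=` argument.

```python
import re
from bs4 import BeautifulSoup

# The cell from the question, wrapped in a minimal table so the snippet is self-contained.
html = (
    '<table><tr><td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# The cell has three children: a text node, the <font> tag, and the trailing ":".
children = list(td.children)

# .string is None whenever a tag has more than one child, as here ...
td_string = td.string

# ... so a search by tag name plus text finds nothing.
match = soup.find("td", string=re.compile("Market Cap"))

# get_text() concatenates all descendant text instead.
full_text = td.get_text()

print(len(children), td_string, match, repr(full_text))
```

Because `get_text()` joins the text of every descendant, it still contains 'Market Cap' even though `.string` is None.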
Solutions/Workarounds:
Don't specify the tag name at all; find the text node and go up to its parent:
soup.find(text=re.compile('Market Cap')).parent.get_text()
Or, you can use find_parent() if td is not the direct parent of the text node:
soup.find(text=re.compile('Market Cap')).find_parent("td").get_text()
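The difference between .parent and find_parent() only shows up when something sits between the text and the cell. The sketch below uses a hypothetical variant of the markup in which the label is wrapped in a `<b>` tag (that is not how the page in the question is structured); `string=` is the current name of the `text=` argument.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical variant of the cell: the label text is wrapped in <b>.
html = '<table><tr><td><b>Market Cap (intraday)</b><sup>5</sup>:</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

text_node = soup.find(string=re.compile("Market Cap"))

direct_parent = text_node.parent      # the <b> tag, not the cell
cell = text_node.find_parent("td")    # the nearest enclosing <td>

print(direct_parent.name)
print(cell.get_text())
```

Here .parent stops at the `<b>` tag, while find_parent("td") keeps walking up until it reaches the cell.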
You can also use a "search function" to look through the td tags and check whether one of their
direct text child nodes contains the Market Cap text:
soup.find(lambda tag: tag and
          tag.name == "td" and
          tag.find(text=re.compile('Market Cap'), recursive=False))
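The same search function can be written as a named function, which may be easier to read and reuse. This is a sketch under assumptions: the function name `is_market_cap_cell`, the `id` attributes, and the nested-table markup are all made up for illustration, to show what `recursive=False` changes. `string=` is the current name of the `text=` argument.

```python
import re
from bs4 import BeautifulSoup

pattern = re.compile("Market Cap")

def is_market_cap_cell(tag):
    """Return True for a <td> that has a direct text child matching the pattern."""
    return tag.name == "td" and tag.find(string=pattern, recursive=False) is not None

# Hypothetical markup: an outer cell that merely wraps a nested table,
# and the inner cell that actually holds the label text.
html = (
    '<table><tr><td id="outer"><table><tr>'
    '<td id="label">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td>'
    '</tr></table></td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

cell = soup.find(is_market_cap_cell)
print(cell["id"])
```

With `recursive=False` only the direct children of each td are inspected, so the outer cell is skipped even though the text appears somewhere beneath it.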
Or, if you are looking for the number 5 that follows the text:
soup.find(text=re.compile('Market Cap')).next_sibling.get_text()
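As a rough illustration of what next_sibling returns for this cell, the sketch below parses the cell from the question on its own (the wrapping table markup and variable names are assumptions). It also guards against the case where nothing follows the text node; `string=` is the current name of the `text=` argument.

```python
import re
from bs4 import BeautifulSoup

html = (
    '<table><tr><td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")
text_node = soup.find(string=re.compile("Market Cap"))

# The node right after the matched text is the <font> tag holding the footnote marker.
sibling = text_node.next_sibling

# next_sibling is None when nothing follows, so check before calling get_text().
footnote = sibling.get_text() if sibling is not None else None

print(sibling.name, footnote)
```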