Python Beautiful Soup - find value based on text in HTML

debugcn Published at Dev

clg4

I am having a problem finding a value in a soup based on its text. Here is the code:

from bs4 import BeautifulSoup as bs
import requests
import re

html = 'http://finance.yahoo.com/q/ks?s=aapl+Key+Statistics'
r = requests.get(html)
soup = bs(r.text, "html.parser")  # name a parser explicitly to avoid bs4's "no parser specified" warning
findit = soup.find("td", text=re.compile('Market Cap'))

This returns nothing (None), yet there definitely is a 'td' tag containing the text 'Market Cap'. When I use

soup.find_all("td")

I get a result set which includes:

<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td>
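
To reproduce this without depending on the live Yahoo page (which may no longer serve this markup), the td above can be inlined. A minimal sketch, assuming bs4 is installed and using the built-in html.parser; the wrapping <table> is added only to make the fragment well-formed, and string= is used as the newer name for text=:

```python
import re

from bs4 import BeautifulSoup

# The <td> from the question, inlined so the example runs without a network request.
html = (
    "<table><tr>"
    '<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# Same search as in the question: tag name plus a regex on the text.
result = soup.find("td", string=re.compile("Market Cap"))
print(result)  # None

# The cell is there, and get_text() joins all of its descendant strings.
td = soup.find("td")
print(td.get_text())  # Market Cap (intraday)5:
```

The td exists and its text contains "Market Cap", yet the name-plus-string search still comes back empty.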

alecxe

Explanation:

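The problem is that this particular td has other child elements (the <font> tag wrapping the <sup>), so its .string value, which is what the text argument is matched against when you combine it with a tag name, is None. This is documented in the bs4 documentation for .string: when a tag contains more than one child, .string is None. (Newer bs4 versions call this argument string; text is the older alias.)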
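
A quick way to see the .string rule in isolation. This sketch assumes bs4 with html.parser; the second cell ("Price") is hypothetical and exists only to show a tag whose sole child is a string:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<td>Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td>'
    "<td>Price</td>",  # hypothetical cell with a single string child
    "html.parser",
)
mixed, plain = soup.find_all("td")

print(len(mixed.contents))  # 3: a string, a <font> tag, and another string
print(mixed.string)  # None, because there is more than one child
print(plain.string)  # Price

# Name-plus-string matching works when .string is defined.
print(soup.find("td", string="Price"))  # <td>Price</td>
```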

Solutions/Workarounds:

Don't specify the tag name at all; instead, find the text node and go up to its parent:

soup.find(text=re.compile('Market Cap')).parent.get_text()
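
Run against the inlined td from the question, the text-node-then-parent approach looks like this (a sketch assuming bs4 and html.parser, with string= in place of text=):

```python
import re

from bs4 import BeautifulSoup

html = (
    '<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
)
soup = BeautifulSoup(html, "html.parser")

# Without a tag name, find() matches text nodes (NavigableString) directly.
label = soup.find(string=re.compile("Market Cap"))
print(label)  # Market Cap (intraday)

# The text node's parent is the <td> we were after.
print(label.parent.name)  # td
print(label.parent.get_text())  # Market Cap (intraday)5:
```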

Or, you can use find_parent() if the td is not the direct parent of the text node:

soup.find(text=re.compile('Market Cap')).find_parent("td").get_text()
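
For the find_parent() case, here is hypothetical markup (not from the Yahoo page) where the label sits inside a <b>, so .parent stops at the <b> rather than at the td. Again a sketch assuming bs4 and html.parser:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the label is wrapped in <b>, so <td> is its grandparent.
html = "<table><tr><td><b>Market Cap</b> (intraday):</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

label = soup.find(string=re.compile("Market Cap"))
print(label.parent.name)  # b

# find_parent() walks up the tree until it reaches a <td>.
cell = label.find_parent("td")
print(cell.get_text())  # Market Cap (intraday):
```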

You can also pass a "search function" that matches td tags and checks whether any of their direct text children contain the Market Cap text:

soup.find(lambda tag: tag and
                      tag.name == "td" and
                      tag.find(text=re.compile('Market Cap'), recursive=False))
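
The same search function written as a named function, which can be easier to read than a multi-line lambda. This sketch assumes bs4 and html.parser; the second cell ("Enterprise Value:") and the name is_market_cap_cell are hypothetical, added to show that only the Market Cap cell matches:

```python
import re

from bs4 import BeautifulSoup

html = (
    "<table><tr>"
    '<td class="yfnc_tablehead1">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
    "<td>Enterprise Value:</td>"  # hypothetical neighboring cell
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("Market Cap")


def is_market_cap_cell(tag):
    # find() calls this for every Tag and keeps the ones where it returns True.
    # recursive=False limits the string search to the td's direct children.
    return tag.name == "td" and tag.find(string=pattern, recursive=False) is not None


cell = soup.find(is_market_cap_cell)
print(cell.get_text())  # Market Cap (intraday)5:
print(len(soup.find_all(is_market_cap_cell)))  # 1
```

recursive=False matters when cells nest: without it, an outer td whose descendants mention Market Cap would match as well, and find() would return that outer td first.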

Or, if you are looking for the footnote number 5 that follows the text:

soup.find(text=re.compile('Market Cap')).next_sibling.get_text()
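
Applied to the inlined td, next_sibling lands on the <font> tag. find_next("sup") is shown as an alternative (not part of the original answer) that searches forward in document order, so it does not depend on <font> being the immediate sibling. A sketch assuming bs4 and html.parser:

```python
import re

from bs4 import BeautifulSoup

html = (
    '<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)'
    '<font size="-1"><sup>5</sup></font>:</td>'
)
soup = BeautifulSoup(html, "html.parser")

label = soup.find(string=re.compile("Market Cap"))

# The node right after the text is the <font> tag holding the footnote marker.
font = label.next_sibling
print(font.name)  # font
print(font.get_text())  # 5

# Alternative: jump to the next <sup> anywhere after the label.
print(label.find_next("sup").get_text())  # 5
```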


edited at 2021-07-16
