Parsing for text under specific tags in HTML, Python

Jackson Blankenship

How can I use BeautifulSoup to find all the text on a page that matches the following structure?

<tr>
    <td class="d_g_l_e" style="border-right:none;">
        <img src="/d2l/img/LP/pixel.gif" width="20" height="20" alt="">
    </td>
    <th scope="row" class="d_gt d_ich" style="border-left:none;">
        <div class="dco">
            <div class="dco_c">
                <div class="dco">
                    <div class="dco_c">
                        <strong> **EXTRACT THIS (NAME)** </strong>
                    </div>
                </div>
            </div>
        </div>
    </th>
    <td class="d_gn d_gr d_gt">
        <div class="dco">
            <div class="dco_c">
                <div class="dco">
                    <div class="dco_c" style="text-align:right;">
                        <div style="text-align:center;display:inline;">
                            <label id="z_c"> **EXTRACT THIS (GRADE)** </label>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </td>
    <td class="d_gn d_gr d_gt">&nbsp;</td>
</tr>

I want the program to scan the whole HTML page and collect every value that appears in this form. If a "tr" tag (the main tag I'm looking for) has both a NAME and a GRADE underneath it, add the name to one list (List1) and the grade to a separate list (List2). If either of the two is missing under the "tr" tag, skip that row and don't record anything. So by the time the script has finished scanning the page, the lists would look something like:

List1 = ['Grade 1', 'Grade 2', 'Grade 3', 'Grade 4']
List2 = ['10/20', '20/40', '50/50', '33/44']
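A minimal sketch of the skip rule described above, using BeautifulSoup's `find`/`find_all` navigation rather than CSS selectors. The sample markup, its values, and the function name `collect_name_grade_lists` are hypothetical; it assumes the name is the `<strong>` inside the row's `<th>` and the grade is the first `<label>` found in one of the row's `<td>`s, as in the snippet.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: rows 1 and 4 are complete, row 2 has no grade, row 3 has no name.
SAMPLE_HTML = """
<table>
  <tr><th><strong> Grade 1 </strong></th><td><label id="z_a"> 10/20 </label></td></tr>
  <tr><th><strong> Grade 2 </strong></th><td>&nbsp;</td></tr>
  <tr><th></th><td><label id="z_c"> 50/50 </label></td></tr>
  <tr><th><strong> Grade 4 </strong></th><td><label id="z_d"> 33/44 </label></td></tr>
</table>
"""


def collect_name_grade_lists(html):
    """Return (list1, list2): names and grades, only from rows that have both."""
    soup = BeautifulSoup(html, "html.parser")
    list1, list2 = [], []
    for row in soup.find_all("tr"):
        # NAME: the <strong> somewhere inside the row's <th>, if any.
        th = row.find("th")
        name = th.find("strong") if th is not None else None
        # GRADE: the first <label> found inside any of the row's <td> cells.
        grade = None
        for td in row.find_all("td"):
            grade = td.find("label")
            if grade is not None:
                break
        # Skip the row entirely unless both pieces are present.
        if name is None or grade is None:
            continue
        list1.append(name.get_text(strip=True))
        list2.append(grade.get_text(strip=True))
    return list1, list2


list1, list2 = collect_name_grade_lists(SAMPLE_HTML)
print(list1)  # ['Grade 1', 'Grade 4']
print(list2)  # ['10/20', '33/44']
```

Appending to both lists in the same step keeps them aligned, so `list1[i]` always belongs with `list2[i]`.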

Also, the id of the label holding the grade text changes from grade to grade, e.g. z_a, z_b, z_c.
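Because the `id` changes, it is safer not to hard-code it. If the grade labels need to be told apart from other labels, one option is to match the id prefix instead. This sketch assumes every grade label's id starts with `z_` (inferred from the three examples given) and uses made-up markup, including one unrelated label to show it is excluded.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: grade label ids vary (z_a, z_b, ...); "other" is unrelated.
html = """
<tr><td><label id="z_a"> 10/20 </label></td></tr>
<tr><td><label id="z_b"> 20/40 </label></td></tr>
<tr><td><label id="other"> ignore me </label></td></tr>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled regex passed as an attribute filter matches any id starting with "z_".
grade_labels = soup.find_all("label", id=re.compile(r"^z_"))
grades = [label.get_text(strip=True) for label in grade_labels]
print(grades)  # ['10/20', '20/40']

# The CSS attribute-prefix selector gives the same result via select().
same = [label.get_text(strip=True) for label in soup.select('label[id^="z_"]')]
print(same == grades)  # True
```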

alecxe

For each tr on the page, find the strong tag inside the th and the label tag inside the td, and keep the row only if both are found:

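from bs4 import BeautifulSoup

# data holds the HTML of the page
soup = BeautifulSoup(data, 'html.parser')
names, grades = [], []  # List1 and List2 from the question

for row in soup.find_all('tr'):
    name = row.select('th strong')   # the NAME is in <th> ... <strong>
    grade = row.select('td label')   # the GRADE is in <td> ... <label>
    if name and grade:               # skip rows missing either one
        names.append(name[0].get_text(strip=True))
        grades.append(grade[0].get_text(strip=True))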
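To check that the descendant selectors reach through the nested `<div>`s, here is the same approach run against a trimmed copy of the question's row. `Quiz 1` and `10/20` are made-up values, and `select_one` (which returns the first match or `None`) stands in for `select(...)[0]`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's row; "Quiz 1" and "10/20" are made-up values.
data = """
<table>
<tr>
  <td class="d_g_l_e"><img src="/d2l/img/LP/pixel.gif" alt=""></td>
  <th scope="row" class="d_gt d_ich">
    <div class="dco"><div class="dco_c"><div class="dco"><div class="dco_c">
      <strong> Quiz 1 </strong>
    </div></div></div></div>
  </th>
  <td class="d_gn d_gr d_gt">
    <div class="dco"><div class="dco_c"><div class="dco"><div class="dco_c">
      <div style="display:inline;"><label id="z_c"> 10/20 </label></div>
    </div></div></div></div>
  </td>
  <td class="d_gn d_gr d_gt">&nbsp;</td>
</tr>
</table>
"""

soup = BeautifulSoup(data, "html.parser")
names, grades = [], []
for row in soup.find_all("tr"):
    # Descendant selectors match at any depth, so the nested <div>s don't matter.
    name = row.select_one("th strong")
    grade = row.select_one("td label")
    if name is not None and grade is not None:
        names.append(name.get_text(strip=True))
        grades.append(grade.get_text(strip=True))

print(list(zip(names, grades)))  # [('Quiz 1', '10/20')]
```

The image cell and the `&nbsp;` cell contain no `<label>`, so they never produce a false grade.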


