How do I find all the text on a page that matches these criteria using Beautiful Soup?
<tr>
<td class="d_g_l_e" style="border-right:none;">
<img src="/d2l/img/LP/pixel.gif" width="20" height="20" alt="">
</td>
<th scope="row" class="d_gt d_ich" style="border-left:none;">
<div class="dco">
<div class="dco_c">
<div class="dco">
<div class="dco_c">
<strong> **EXTRACT THIS (NAME)** </strong>
</div>
</div>
</div>
</div>
</th>
<td class="d_gn d_gr d_gt">
<div class="dco">
<div class="dco_c">
<div class="dco">
<div class="dco_c" style="text-align:right;">
<div style="text-align:center;display:inline;">
<label id="z_c"> **EXTRACT THIS (GRADE)** </label>
</div>
</div>
</div>
</div>
</div>
</td>
<td class="d_gn d_gr d_gt"> </td>
</tr>
I want the program to scan the whole HTML page and collect all of the values that appear in this form. If a "tr" tag (the main tag I'm looking for) has both a NAME and a GRADE underneath it, add the name to one list (List1) and the grade to a separate list (List2). If either of the two is missing underneath the "tr" tag, skip that row and don't record anything. By the time the script has finished scanning the page, the lists would look something like:
List1 = [Grade 1, Grade 2, Grade 3, Grade 4]
List2 = [10/20, 20/40, 50/50, 33/44]
Also, the "z" label ID for the grade text changes from grade to grade, e.g. z_a, z_b, z_c.
For each tr on the page, find the strong tag inside the th and the label tag inside the td:
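from bs4 import BeautifulSoup

# data holds the page's HTML source
soup = BeautifulSoup(data, 'html.parser')
for row in soup.find_all('tr'):
    name = row.select('th strong')   # NAME sits in the row header
    grade = row.select('td label')   # GRADE sits in a label; its id varies
    # skip rows that are missing either value
    if name and grade:
        print(name[0].text, grade[0].text)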
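To get the two lists the question asks for instead of printed output, the same loop can append to them. The following is a minimal sketch, not a definitive implementation: the helper name collect_grades and the sample markup are made up for illustration, and the sample is a trimmed-down stand-in for the real page. It uses select_one, which returns the first match or None, so rows missing either value are skipped. Matching on the tag structure rather than on the label id avoids depending on the changing z_a, z_b, z_c ids.

```python
from bs4 import BeautifulSoup


def collect_grades(html):
    """Return (names, grades) for every row that contains both values."""
    soup = BeautifulSoup(html, "html.parser")
    names, grades = [], []
    for row in soup.find_all("tr"):
        name_tag = row.select_one("th strong")   # NAME in the row header
        grade_tag = row.select_one("td label")   # GRADE in a label, any id
        if name_tag is None or grade_tag is None:
            continue  # one of the two is missing: record nothing
        names.append(name_tag.get_text(strip=True))
        grades.append(grade_tag.get_text(strip=True))
    return names, grades


# Hypothetical sample: the second row has no grade and should be skipped.
sample = """
<table>
<tr><th scope="row"><div><strong> Grade 1 </strong></div></th>
    <td><div><label id="z_a"> 10/20 </label></div></td></tr>
<tr><th scope="row"><strong> Grade 2 </strong></th><td> </td></tr>
<tr><th scope="row"><strong> Grade 3 </strong></th>
    <td><label id="z_c"> 50/50 </label></td></tr>
</table>
"""

List1, List2 = collect_grades(sample)
print(List1)
print(List2)
```

Because a name and its grade are appended together, the two lists stay aligned by index, so zip(List1, List2) pairs each name with its grade.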