I have a table in the HTML shown below. For example, I need to extract the "End Snap" value under the "Snap Time" column, which is 03-Sep-20 02:00:01:
```html
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
```
The required values are requested in the format columnname_rowname:

- Snap Id_Begin Snap
- Snap Id_End Snap
- Snap Time_Begin Snap
- Snap Time_End Snap
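This goes into a variable called namesplit (namesplit[0] is the column name, namesplit[1] the row name). I want to find the column number and the row number first, and then print the required value: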
```python
# soup is the parsed page; namesplit holds [column name, row name]
dbii = soup.find_all("table", attrs={"summary": "for snapshot information"})
ti = 0
for tables in dbii:
    vcols = tables.find_all('th')
    ii = 0
    for value in vcols:
        if value.text.lower() == namesplit[0].lower():  # this matches the column name string
            print("match")
            col_no = ii
            table_no = ti
        else:
            ii += 1
    ti += 1
print(table_no, col_no, namesplit[1])  # correctly gives table 0, column as 1 or 2

# Find the row number.
drow = dbii[table_no].find_all(scope='row')
j = 0
for value in drow:
    if namesplit[1].lower() in value.text.lower():
        row_no = j
    j += 1
print(row_no)  # correctly picks the td row as 0 (for Begin) or 1 (for End)

# We have the table no, column number and row no; get the corresponding value.
fvalue = dbii[table_no].find_all('tr')[row_no]  # this doesn't work, as it's a Tag
print(type(fvalue))  # Tag ??
print(fvalue)
```
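The last step prints a whole row because `find_all('tr')[row_no]` returns the `<tr>` Tag rather than a cell. It is also off by one: `find_all('tr')` includes the header row, while `row_no` is counted over the `scope='row'` cells only. A minimal sketch of finishing the index-based approach, assuming the table has exactly one header row; `lookup_by_index` is a hypothetical helper name and the HTML is a trimmed copy of the question's table:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (first four columns, Begin/End Snap rows)
html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td></tr>
</table>
'''


def lookup_by_index(table, namesplit):
    """Hypothetical helper: return the cell text for [column name, row name]."""
    # Column index: position of the matching <th>. The empty corner <th> is 0,
    # which lines up with the row-label <td> at position 0 in every data row.
    headers = [th.get_text(strip=True).lower() for th in table.find_all('th')]
    col_no = headers.index(namesplit[0].lower())

    # Row index among the <td scope="row"> labels, as in the question.
    labels = [td.get_text(strip=True).lower() for td in table.find_all('td', scope='row')]
    row_no = next(i for i, label in enumerate(labels) if namesplit[1].lower() in label)

    # find_all('tr') also returns the header row, so data row N is tr[N + 1].
    # That <tr> is a Tag for the whole row; index its <td> cells to get one value.
    row = table.find_all('tr')[row_no + 1]
    return row.find_all('td')[col_no].get_text(strip=True)


soup = BeautifulSoup(html_text, 'html.parser')
table = soup.find('table', attrs={'summary': 'for snapshot information'})
print(lookup_by_index(table, 'Snap Time_End Snap'.split('_')))
```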
To print the value in the End Snap row and the third column, you can do the following:
```python
from bs4 import BeautifulSoup

html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
'''

soup = BeautifulSoup(html_text, 'html.parser')
# :-soup-contains() is soupsieve's current name for the deprecated :contains()
print(soup.select_one('tr:has(td:-soup-contains("End Snap:")) td:nth-child(3)').text)
```
Prints:

```
03-Sep-20 02:00:01
```
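The selector above hard-codes both the row label and the column position (`nth-child(3)`, which is 1-based). A sketch that derives both from a `columnname_rowname` key like the ones in the question; `lookup_by_selector` is a hypothetical name. It assumes header texts match the key exactly, that row names contain no double quotes, and a soupsieve release recent enough to support `:-soup-contains()`:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (first four columns, Begin/End Snap rows)
html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td></tr>
</table>
'''


def lookup_by_selector(soup, key):
    """Hypothetical helper: look up a 'Column_Row' key such as 'Snap Time_End Snap'."""
    col_name, row_name = key.split('_', 1)
    table = soup.select_one('table[summary="for snapshot information"]')
    # :nth-child() is 1-based, so add 1 to the 0-based header position.
    headers = [th.get_text(strip=True) for th in table.select('th')]
    col = headers.index(col_name) + 1
    # Pick the row whose label cell contains the row name, then the column's cell.
    cell = table.select_one(f'tr:has(td:-soup-contains("{row_name}")) td:nth-child({col})')
    return cell.get_text(strip=True)


soup = BeautifulSoup(html_text, 'html.parser')
print(lookup_by_selector(soup, 'Snap Time_End Snap'))
```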
To get all the values, you can do the following:
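```python
# Collect the cell texts of every row that has <td> cells (skips the header row)
all_data = []
for row in soup.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in row.select('td')]
    all_data.append(tds)

# Each data row has six cells, one per format field
for row in all_data:
    print('{:<20} {:<20} {:<20} {:<20} {:<20} {:<20}'.format(*row))
```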
Prints:

```
Begin Snap:          121525               03-Sep-20 01:30:07   167                  10.4                 6
End Snap:            121526               03-Sep-20 02:00:01   174                  11.2                 6
Elapsed:                                  29.90 (mins)
DB Time:                                  67.15 (mins)
```
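Since the requested values all have the form `columnname_rowname`, the same pass over the rows can also build a dictionary keyed that way, so each lookup becomes a plain dict access. A sketch, assuming the first `<th>` is the empty row-label column and that row labels end in a colon; `lookup` is a hypothetical name and the HTML is a trimmed copy of the question's table:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (first four columns, three data rows)
html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td></tr>
</table>
'''

soup = BeautifulSoup(html_text, 'html.parser')
table = soup.select_one('table[summary="for snapshot information"]')

# Column names from the header row; headers[0] is the empty row-label column.
headers = [th.get_text(strip=True) for th in table.select('th')]

# Hypothetical dict mapping every "Column_Row" combination to its cell text.
lookup = {}
for row in table.select('tr:has(td)'):
    cells = [td.get_text(strip=True) for td in row.select('td')]
    row_name = cells[0].rstrip(':')  # "End Snap:" -> "End Snap"
    for col_name, value in zip(headers[1:], cells[1:]):
        lookup[f'{col_name}_{row_name}'] = value

for key in ['Snap Id_Begin Snap', 'Snap Id_End Snap', 'Snap Time_Begin Snap', 'Snap Time_End Snap']:
    print(key, '->', lookup[key])
```

Empty cells (such as Snap Id for the Elapsed row) are stored as empty strings, so a missing key means the column or row name itself was not found.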