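I am trying to extract some data from a Selenium test report HTML file, but I only get blank output printed to the PyCharm console. I would like to get all the data from the p tags, which sit under the div tag.
The HTML snippet is: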
<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
<p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
<p class='attribute'><strong>Status:</strong> Pass 75</p>
<p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
</div>
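To begin with, I am trying to get the Start Time to see whether I can print the value to the console, but nothing is printed. I would also like to get the description: Selenium - ClearCore 501 Regression edit project automated test
My code is: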
from bs4 import BeautifulSoup

def extract_data_from_report_htmltestrunner():
    filename = (r"C:\share\ClearCore501_Automated_GUI_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    div_heading = soup.find('div', {'class': 'heading'})
    p = div_heading.find('p', text='Start Time:')
    print "test"
    print p
I have also added:
if __name__ == "__main__":
    extract_data_from_report_htmltestrunner()
The output I get now is:
test
None
What am I doing wrong?
Thanks, Riaz
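The text is inside the strong tag, not the p tag, so searching the p tags for that text matches nothing. Find the strong tag instead and call .parent to get the p tag: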
In [10]: html = """<div class='heading'>
....: <h1>Test Report</h1>
....: <p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
....: <p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
....: <p class='attribute'><strong>Status:</strong> Pass 75</p>
....:
....: <p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
....: </div>"""
In [11]: from bs4 import BeautifulSoup
In [12]: soup = BeautifulSoup(html, "html.parser")
In [13]: div_heading = soup.find('div', {'class': 'heading'})
In [14]: p = div_heading.find('strong', text='Start Time:').parent
In [15]: print p
<p class="attribute"><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
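As a minimal sketch of why the original search returns None: when a tag name and a text filter are combined, Beautiful Soup compares the filter against the tag's .string, and .string is None for a tag with more than one child. The snippet below assumes Beautiful Soup 4.4.0 or later, where the `text` argument is also available under the name `string`; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = "<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
strong = soup.find("strong")

# <p> has two children (the <strong> tag and a trailing text node),
# so its .string is None; <strong> has a single text child.
p_string = p.string
strong_string = strong.string

# The text filter is therefore never satisfied for <p>, only for <strong>.
no_match = soup.find("p", string="Start Time:")
match = soup.find("strong", string="Start Time:")

print(p_string)
print(strong_string)
print(no_match)
print(match.parent.name)
```

On older Beautiful Soup releases the same calls can be written with `text=` instead of `string=`, as in the session above.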
To get the description, use the class name:
In [16]: div_heading.find("p", class_="description")
Out[16]: <p class="description">Selenium - ClearCore 501 Regression edit project automated test</p>
In [17]: div_heading.find("p", class_="description").text
Out[17]: u'Selenium - ClearCore 501 Regression edit project automated test'
If you only want the date, call p.find(text=True, recursive=False) so that no text is taken from any child tags:
In [18]: p = div_heading.find('strong', text='Start Time:').parent
In [19]: p.find(text=True, recursive=False)
Out[19]: u' 2016-08-12 11:57:33'
In [20]: p.text
Out[20]: u'Start Time: 2016-08-12 11:57:33'
You can see the difference between the two approaches above. Calling .text on the strong tag alone would give you u'Start Time:':
In [21]: div_heading.find('strong', text='Start Time:').text
Out[21]: u'Start Time:'
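Putting the pieces together, one possible way to collect every attribute plus the description in a single pass is sketched below. The function name `parse_heading` and the dictionary keys are hypothetical choices for illustration, and the code again assumes Beautiful Soup 4.4.0 or later for the `string` argument.

```python
from bs4 import BeautifulSoup

def parse_heading(html):
    """Return the report heading fields as a dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("div", class_="heading")
    result = {}
    for p in heading.find_all("p", class_="attribute"):
        # The label lives in <strong>; drop the trailing colon.
        label = p.find("strong").get_text(strip=True).rstrip(":")
        # The value is the text node directly under <p>, not inside <strong>.
        value = p.find(string=True, recursive=False).strip()
        result[label] = value
    description = heading.find("p", class_="description")
    if description is not None:
        result["Description"] = description.get_text(strip=True)
    return result

sample = """<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
<p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
<p class='attribute'><strong>Status:</strong> Pass 75</p>
<p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
</div>"""

report = parse_heading(sample)
for key, value in report.items():
    print(key + ": " + value)
```

For the real report, the file contents could be read and passed to the function in place of `sample`.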