I've loaded an entire HTML page into BeautifulSoup. Is it possible to isolate the set of dictionaries in it?
Here is the code I used to import the HTML file (I can't use urllib):
from bs4 import BeautifulSoup
with open('/content/drive/My Drive/Colab Notebooks/Projects/20200710_StreetEasy_WebScraping/1.html') as f:
    contents = f.read()
soup = BeautifulSoup(contents, 'lxml')
print(soup)
Searching for a tags returns output:
a = soup.find_all('a')
a
[<a class="html-attribute-value html-resource-link" href="https://cdn-assets-s3.streeteasy.com/assets/manifest-c93475b02bd2409b4a52e21af023e5d5f489f19500d234a3660fe4d35069bbac.json" rel="noreferrer noopener" target="_blank">//cdn-assets-s3.streeteasy.com/assets/manifest-c93475b02bd2409b4a52e21af023e5d5f489f19500d234a3660fe4d35069bbac.json</a>,
<a class="html-attribute-value html-resource-link" href="https://browser.sentry-cdn.com/5.19.0/bundle.min.js" rel="noreferrer noopener" target="_blank">https://browser.sentry-cdn.com/5.19.0/bundle.min.js</a>,
<a class="html-attribute-value html-resource-link" href="https://cdn-assets-s3.streeteasy.com/assets/jquery-fe1be651ec56a9cc875a437f09db5b175cc6acf4b911bed0ef265955a099db55.js" rel="noreferrer noopener" target="_blank">//cdn-assets-s3.streeteasy.com/assets/jquery-fe1be651ec56a9cc875a437f09db5b175cc6acf4b911bed0ef265955a099db55.js</a>,
...
Searching for script tags returns no output:
import re
scripts = soup.find_all("script")
scripts
[]
What might I be doing wrong when importing the document?
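One possible explanation, offered as a guess rather than a confirmed diagnosis: the classes on the a tags above (html-attribute-value, html-resource-link) look like the ones a browser's "view source" page uses. If 1.html was saved from such a view, the original markup would be stored as escaped text, so the parser would see no real script elements. The sketch below uses a made-up sample string and a placeholder URL (both hypothetical, not taken from the actual file) to show a quick check: parse once, then re-parse the visible text. It uses the built-in 'html.parser' only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample imitating a saved "view source" page: the original
# markup is escaped (&lt; ... &gt;) and displayed as text inside a table cell.
sample = (
    '<table><tr><td class="line-content">'
    '&lt;script src="<a class="html-attribute-value html-resource-link" '
    'href="https://example.com/app.js">https://example.com/app.js</a>"&gt;'
    '&lt;/script&gt;'
    '</td></tr></table>'
)

# First pass: the parser only sees the wrapper markup, so no <script> is found.
outer = BeautifulSoup(sample, 'html.parser')
outer_scripts = outer.find_all('script')

# Second pass: get_text() yields the unescaped original markup as a string,
# which can then be parsed as ordinary HTML.
inner = BeautifulSoup(outer.get_text(), 'html.parser')
scripts = inner.find_all('script')

print(len(outer_scripts), len(scripts))
```

If the same two-pass check on the real file turns up script tags in the second pass, saving the page itself (rather than its source view) would likely avoid the problem.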