Web scraping with Python BeautifulSoup and Requests

Posted by debugcn in Dev

Smith

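I'm trying to get the generated phrase as a string from the following website: https://randomwordgenerator.com/phrase.php. I've looked through the HTML and I believe I've identified where the phrase sits in the HTML structure.

Here is the nearby HTML.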

<div id="loading_result" class="small-img-results">
    <ol id="result">
        <li> == $0
            <div>
                <span class="support-phrase">Generated Phrase </span>
                <span class="subtle">...</span>
            </div>
        </li>
    </ol>
</div>

In this case, I need the text "Generated Phrase".

Here is what I'm currently doing:

import requests
from bs4 import BeautifulSoup

pageLink = "https://randomwordgenerator.com/phrase.php"
pageResponse = requests.get(pageLink, timeout=5)
pageContent = BeautifulSoup(pageResponse.content, "html.parser")

# Look for every <span class="support-phrase"> in the parsed page
span = pageContent.find_all("span", {"class": "support-phrase"})

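The problem is that after this runs, span is an empty list. I'm new to Beautiful Soup, so this may be a very simple issue, but I haven't found any particularly clear solution.

Thanks in advance!

Edit: I'm now wondering whether the problem is that the specific span I'm looking for is nested inside a series of divs within the body.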

QHarr

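You will need selenium to get the exact value displayed on the page. The reason is that, although the full set of phrases (134) is returned as an array from an XHR (https://randomwordgenerator.com/json/phrases.json), the actual index or indices chosen from that array (e.g. function randomiseUniqueNumbers), the order of the items in the array (e.g. Array.prototype.shuffle = function()), and the rules for handling what I think are possible collisions (e.g. function getResults) are all defined in the js file (https://randomwordgenerator.com/assets/js-compress/f0351bd03da6dab13a24355fa7deeabd.js?v=1577899960:formatted). At least the first two of these use random number generation within the bounds of the array size. There is no seed, so although I think you could write your own version, there is no guarantee you would get the same result as on the page; in fact, you are more likely to get a different phrase.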
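
If any random phrase will do and it does not have to match the one shown in the browser, a rough sketch of the "write your own version" idea could look like the following. It uses only the standard library (requests.get(url).json() would serve equally well). The helper names fetch_json and pick_phrase, the User-Agent header, and the 5 second timeout are my own assumptions, and the layout of phrases.json is not described above, so inspect the parsed object before relying on any key names.

```python
import json
import random
from urllib.request import Request, urlopen

# URL quoted in the answer above; the layout of the JSON it returns is not
# shown there, so inspect the parsed object before relying on any key names.
PHRASES_URL = "https://randomwordgenerator.com/json/phrases.json"


def fetch_json(url, timeout=5):
    """Download a URL and return the parsed JSON (structure unknown here)."""
    # Assumption: a browser-like User-Agent; the site may not require one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_phrase(phrases, rng=random):
    """Pick one entry at random; this will not match what the page shows."""
    if not phrases:
        raise ValueError("no phrases to choose from")
    return rng.choice(phrases)
```

To use it, call fetch_json(PHRASES_URL), look at what comes back to find the list of phrases, and pass that list to pick_phrase. Passing random.Random(some_seed) as rng makes your own picks repeatable, but it still says nothing about what the page itself would display.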

Outline with selenium

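from selenium import webdriver
from selenium.webdriver.common.by import By

d = webdriver.Chrome()
d.get('https://randomwordgenerator.com/phrase.php')
# find_elements_by_css_selector was removed in Selenium 4; use find_elements(By.CSS_SELECTOR, ...)
print([i.text for i in d.find_elements(By.CSS_SELECTOR, '.support-phrase')])

For a single phrase, just use

d.find_element(By.CSS_SELECTOR, '.support-phrase').text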
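
Since the phrase is filled in by JavaScript after the page loads, it may help to wait for the element explicitly and then hand the rendered HTML back to BeautifulSoup, so the find_all call from the question can be reused. This is only a sketch: the helper names extract_phrases and rendered_html and the 10 second timeout are my own choices, and it assumes Chrome with a matching driver is available.

```python
from bs4 import BeautifulSoup


def extract_phrases(html):
    """Return the text of every <span class="support-phrase"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [s.get_text(strip=True)
            for s in soup.find_all("span", class_="support-phrase")]


def rendered_html(url, timeout=10):
    """Load url in Chrome, wait for a phrase to appear, return the DOM as HTML."""
    # Imported here so extract_phrases works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching span exists, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".support-phrase"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Calling extract_phrases(rendered_html('https://randomwordgenerator.com/phrase.php')) should then give a list of phrases. Running extract_phrases on pageResponse.content from the question is a quick way to confirm that the span is simply absent from the HTML that requests receives, rather than hidden by the nesting of divs.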

Edited on 2021-04-2