How to scrape with BeautifulSoup after waiting a second so the page's elements finish loading

Fabio Salinas

I'm trying to scrape data from this website, where some products have three kinds of prices (muted price, red price and black price). I observed that, when a product has three prices, the red price changes before the page finishes loading.

When I scrape the website I get just two prices. I think that if the code waited until the page had fully loaded, I would get all of them.

Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Muted Price: drop the first two characters and the '.' separators
MutedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-listPriceValue ph2 dib strike custom-list-price fw5 exito-vtex-component-precio-tachado'})[0].text
MutedPrice = pd.to_numeric(MutedPrice[2-len(MutedPrice):].replace('.', ''))

# Red Price
RedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-sellingPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-rojo'})[0].text
RedPrice = pd.to_numeric(RedPrice[2-len(RedPrice):].replace('.', ''))

# Black Price
BlackPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-alliedPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-negro'})[0].text
BlackPrice = pd.to_numeric(BlackPrice[2-len(BlackPrice):].replace('.', ''))

print('Muted Price:', MutedPrice)
print('Red Price:', RedPrice)
print('Black Price:', BlackPrice)

Actual Results: Muted Price: 3199900 Red Price: 1649868 Black Price: 0

Expected Results: Muted Price: 3199900 Red Price: 1550032 Black Price: 1649868
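
As a side note on the cleanup step in the code above: slicing off the first two characters depends on the exact prefix of the price text. A slightly more defensive sketch, assuming the displayed prices look like "$ 3.199.900" (whole numbers with '.' as the thousands separator, which is inferred from the results above rather than confirmed); the helper name parse_price is hypothetical:

```python
import re


def parse_price(text):
    """Convert a displayed price such as "$ 3.199.900" to the integer 3199900.

    Assumes whole-number prices; a decimal part would be merged into the digits.
    """
    digits = re.sub(r"\D", "", text)  # drop currency symbol, spaces, separators
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


print(parse_price("$ 3.199.900"))  # 3199900
```

This does not change which prices are found in the page; it only makes the text-to-number conversion less sensitive to the prefix.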

Rithin Chalumuri

It might be that those values are rendered dynamically, i.e. they are populated by JavaScript running in the page.

requests.get() simply returns the markup received from the server, without any client-side changes applied, so this is not really about waiting.
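
To illustrate the point, here is a small sketch using only the standard library's html.parser (chosen so it runs without extra packages; BeautifulSoup would see the same thing). The markup, the class name price-black and the helper names are made up for the example. The span is empty in the raw markup because only a browser would run the script that fills it:

```python
from html.parser import HTMLParser

# Hypothetical markup as a server might send it: the span is empty and a
# script is expected to fill it in once the page runs in a browser.
RAW_HTML = """
<html><body>
  <span class="price-black"></span>
  <script>
    document.querySelector(".price-black").textContent = "$ 1.649.868";
  </script>
</body></html>
"""


class SpanText(HTMLParser):
    """Collect the text found inside <span> elements that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and self.wanted_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def static_span_text(html, wanted_class):
    """Return the span text as a non-browser client sees it (no scripts run)."""
    parser = SpanText(wanted_class)
    parser.feed(html)
    return parser.text.strip()


print(repr(static_span_text(RAW_HTML, "price-black")))  # ''
```

However long the client waits, the parsed text stays empty, because nothing ever executes the script.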

You could use Selenium with the Chrome WebDriver to load the page URL and get the page source (or use the Firefox driver instead).

Go to chrome://settings/help to check your current Chrome version and download the driver for that version from here. Make sure to keep the driver file either in your PATH or in the same folder as your Python script.

Try replacing the three lines of your code that fetch and parse the page (the url, requests.get and BeautifulSoup lines) with this:

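from bs4 import BeautifulSoup
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.service import Service

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'

# use Chrome to get the page with JavaScript-generated content
# (Selenium 4 style; older releases took Chrome(executable_path="./chromedriver"))
browser = Chrome(service=Service("./chromedriver"))
try:
    browser.get(url)
    page_source = browser.page_source
finally:
    browser.quit()  # end the browser session and the driver process

soup = BeautifulSoup(page_source, "lxml")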

Outputs:

Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
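
If the rendered values still arrive late, Selenium's explicit waits can hold off reading the page source until a given element exists. A minimal sketch, assuming Selenium 4 and a chromedriver that Selenium can locate; the helper names class_to_css and fetch_rendered_html and the 10-second timeout are illustrative choices, not part of the original answer:

```python
def class_to_css(class_attr):
    """Turn a space-separated class attribute into a CSS selector, e.g. "a b" -> ".a.b"."""
    return "".join("." + name for name in class_attr.split())


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in Chrome and return the page source once css_selector is present."""
    # Imported here so class_to_css stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: chromedriver can be located
    try:
        driver.get(url)
        # Poll until the element exists in the DOM; raises TimeoutException
        # if it has not appeared after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


# Possible usage (opens a real browser, so it is left commented out):
# selector = class_to_css("exito-vtex-component-precio-negro")
# soup = BeautifulSoup(fetch_rendered_html(url, selector), "lxml")
```

Presence in the DOM does not guarantee that the text has already been filled in, and a product without a black price would run into the timeout, so treat this as a starting point rather than a drop-in fix.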

References:

Get page generated with Javascript in Python

selenium - chromedriver executable needs to be in PATH

Edited on 2021-04-1
