How to scrape with BeautifulSoup, waiting a second before creating the soup so that all elements on the page finish loading

Fabio Salinas

I'm trying to scrape data from this website. Some products have 3 kinds of prices (muted price, red price and black price), and I observed that, when a product has 3 prices, the red price changes before the page finishes loading.

When I scrape the website I only get two prices. I think that if the code waited until the page fully loaded, I would get all the prices.

Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Muted price
MutedPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-listPriceValue ph2 dib strike custom-list-price fw5 exito-vtex-component-precio-tachado'})[0].text
# drop the two-character currency prefix (e.g. "$ ") and the '.' thousands separators
MutedPrice = pd.to_numeric(MutedPrice[2:].replace('.', ''))

# Red price
RedPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-sellingPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-rojo'})[0].text
RedPrice = pd.to_numeric(RedPrice[2:].replace('.', ''))

# Black price
BlackPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-alliedPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-negro'})[0].text
BlackPrice = pd.to_numeric(BlackPrice[2:].replace('.', ''))

print('Muted Price:',MutedPrice)
print('Red Price:',RedPrice)
print('Black Price:',BlackPrice)

Actual Results:
Muted Price: 3199900
Red Price: 1649868
Black Price: 0

Expected Results:
Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
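Separately from the rendering issue, two parts of the code above are fragile: `find_all` with the full class string only matches when the `class` attribute contains exactly those classes in that order, and `[0]` raises `IndexError` when a product has no such price. A sketch that matches one distinctive class per price and keeps only the digits might look like this; `PRICE_CLASSES`, `parse_price` and `extract_prices` are illustrative names, and it assumes prices are whole amounts with `.` as the thousands separator.

```python
import re

from bs4 import BeautifulSoup

# One distinctive class per price, taken from the class strings in the question
# (assumed to stay stable on the site)
PRICE_CLASSES = {
    "muted": "exito-vtex-component-precio-tachado",
    "red": "exito-vtex-component-precio-rojo",
    "black": "exito-vtex-component-precio-negro",
}


def parse_price(text):
    """Turn a price such as '$ 3.199.900' into 3199900 (assumes no decimal part)."""
    digits = re.sub(r"\D", "", text)  # keep only the digits
    return int(digits) if digits else None


def extract_prices(html):
    """Return {name: price or None}, matching a single class per span."""
    soup = BeautifulSoup(html, "html.parser")
    prices = {}
    for name, css_class in PRICE_CLASSES.items():
        span = soup.select_one(f"span.{css_class}")
        # None instead of IndexError when the product lacks this price
        prices[name] = parse_price(span.get_text()) if span else None
    return prices
```

Called as `extract_prices(req.text)` it would report the black price as missing or zero for the static markup, which is the symptom described above.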

Rithin Chalumuri

It might be that those values are rendered dynamically, i.e. populated by JavaScript running in the page.

requests.get() simply returns the markup sent by the server; it does not execute any JavaScript, so no client-side changes are ever applied. That's why it isn't really a matter of waiting: the updated prices are never part of the response.
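One quick way to test this theory is to look for the expected value in the raw `req.text`: if the red price the browser shows (1.550.032) is absent from the server's markup, that points to it being added client-side. A minimal sketch, assuming the page formats prices with `.` as the thousands separator; `price_in_markup` is a hypothetical helper, and the check is only a heuristic since the value could be embedded in some other format.

```python
def price_in_markup(html: str, price: int) -> bool:
    """Check whether a price appears in raw HTML, as '1.550.032' or as '1550032'."""
    # format with ',' grouping, then swap to the '.' separator used on the page
    dotted = f"{price:,}".replace(",", ".")
    return dotted in html or str(price) in html
```

For example, `price_in_markup(req.text, 1550032)` returning `False` would be consistent with the JavaScript explanation.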

You could use Selenium with the Chrome WebDriver to load the page URL and then read the rendered page source (the Firefox driver works too).

Go to chrome://settings/help to check your current Chrome version, then download the ChromeDriver for that version from here. Make sure the driver file is either on your PATH or in the same folder as your Python script.

Try replacing the first three lines of your existing code (the url, requests.get and BeautifulSoup lines) with this:

from bs4 import BeautifulSoup
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.service import Service

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'

# use Chrome to get the page including its JavaScript-generated content;
# leaving the with-block quits the browser
with Chrome(service=Service("./chromedriver")) as browser:
    browser.get(url)
    page_source = browser.page_source

soup = BeautifulSoup(page_source, "lxml")

Outputs:

Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868

References:

Get page generated with Javascript in Python

selenium - chromedriver executable needs to be in PATH


Related articles

How can I save a <template> for a second use?

Save content of textarea to file and load from PHP server page

Beautifulsoup Python Youtube Scrape not working properly

Emacs -- How to replace nth element of a list with a let-bound variable

Beautifulsoup soup.body returns None

BeautifulSoup Soup findall extracts only class data

How to hide a column in devexpress gridview on page load?

Python Scrape with requests and beautifulsoup

Passing a variable in the soup.find() method - Beautifulsoup Python

What type of output does soup.find() give in BeautifulSoup?

python BeautifulSoup soup.findAll(): how to make the search results match

How to get soup.find_all to work in BeautifulSoup?

BeautifulSoup response - Beautiful Soup is not an HTTP client

How to load external Javascript file in MVC 5 _Layout Page

Python Web Scrape using Beautiful Soup - return all product details from a page

Check element type in BeautifulSoup 3

How to remove a jquery UI element without removing elements?

How to extract elements from html page using HtmlUnit

.load return page fragment

How can I set element for entire width of the page? (Android)

Show a confirmation page before complete function call

Beautifulsoup: getting a newline when I try to access the soup.head.next_sibling value via Beautifulsoup4

Python scrape of a website table with a class name w/ BeautifulSoup4 showing attribute error

On the same page, when I click Save_button, should the Page_Load event or the btnSave_Click button execute first?

Can't find a table with soup.findAll('table') using BeautifulSoup in python

Python BeautifulSoup: text in the html (web page) not shown with soup.find_all(..)

BeautifulSoup nth-of-type returns an empty list. Soup.select()[n -1] returns the element. Why?

beautifulSoup soup.select() returns empty for CSS selector

