I'm trying to scrape data from this website, where some products show three kinds of prices (muted price, red price and black price). I noticed that when a product has three prices, the red price changes while the page is still loading.
When I scrape the site I get only two prices. I think that if the code waited until the page was fully loaded, I would get all three.
Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")
# Muted Price
MutedPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-listPriceValue ph2 dib strike custom-list-price fw5 exito-vtex-component-precio-tachado'})[0].text
MutedPrice=pd.to_numeric(MutedPrice[2-len(MutedPrice):].replace('.',''))
# Red Price
RedPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-sellingPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-rojo'})[0].text
RedPrice=pd.to_numeric(RedPrice[2-len(RedPrice):].replace('.',''))
# black Price
BlackPrice = soup.find_all("span",{'class':'exito-vtex-components-2-x-alliedPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-negro'})[0].text
BlackPrice=pd.to_numeric(BlackPrice[2-len(BlackPrice):].replace('.',''))
print('Muted Price:',MutedPrice)
print('Red Price:',RedPrice)
print('Black Price:',BlackPrice)
Actual Results:
Muted Price: 3199900
Red Price: 1649868
Black Price: 0
Expected Results:
Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
Those values are probably rendered dynamically, i.e. they are populated by JavaScript after the page arrives in the browser. requests.get() simply returns the markup received from the server, without any client-side changes applied, so waiting longer would not help: the page's scripts never run.
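One way to confirm this before reaching for a browser is to check whether the class you are looking for appears in the raw markup at all. Below is a minimal sketch using only the standard library. The names ClassCollector and has_class are mine, and the two markup strings are made-up stand-ins, not real exito.com output.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in a piece of markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_class(markup, class_name):
    """Return True if any tag in `markup` carries `class_name`."""
    collector = ClassCollector()
    collector.feed(markup)
    return class_name in collector.classes


# Hypothetical stand-ins for the two versions of the page (assumption, not real markup).
server_markup = '<div id="app"></div><script src="bundle.js"></script>'
rendered_markup = (
    '<div id="app"><span class="f3 exito-vtex-component-precio-negro">'
    "$ 1.649.868</span></div>"
)

print(has_class(server_markup, "exito-vtex-component-precio-negro"))    # False
print(has_class(rendered_markup, "exito-vtex-component-precio-negro"))  # True
```

With the real page you would pass requests.get(url).text as the markup. If has_class returns False for a class you can see in the browser's inspector, the element is most likely added by JavaScript.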
You could use the Selenium Chrome WebDriver to load the page URL and read the page source from the browser (the Firefox driver works too).
Go to chrome://settings/help to check your current Chrome version, then download the ChromeDriver build that matches that version. Make sure to keep the driver file either in your PATH or in the same folder as your Python script.
Try replacing the three lines of your existing code that set url, call requests.get() and build soup with this:
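from contextlib import closing
from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml
from selenium.webdriver import Chrome  # pip install selenium
url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
# use Chrome to get the page with JavaScript-generated content
# (executable_path is the Selenium 3 argument; newer Selenium 4 releases take a Service object instead)
with closing(Chrome(executable_path="./chromedriver")) as browser:
    browser.get(url)
    page_source = browser.page_source
# parse the rendered markup exactly as before
soup = BeautifulSoup(page_source, "lxml")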
Outputs:
Muted Price: 3199900
Red Price: 1550032
Black Price: 1649868
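If you are on a recent Selenium 4 release, the driver path is passed through a Service object, and you can also wait explicitly for the black price element rather than trusting that page_source is complete right after get(). Here is a rough sketch under those assumptions. The helper names (to_int_price, fetch_rendered_html, print_black_price) are mine, the 15 second timeout is arbitrary, the selector uses only the exito-vtex-component-precio-negro class from the question, and I am assuming the price text looks like "$ 1.550.032".

```python
import re


def to_int_price(text):
    """Turn a price label such as '$ 1.550.032' into the integer 1550032."""
    digits = re.sub(r"\D", "", text)  # drop the currency sign, spaces and dots
    return int(digits) if digits else 0  # 0 for an empty label (my choice)


def fetch_rendered_html(url, css_selector, driver_path="./chromedriver", timeout=15):
    """Load `url` in Chrome and return the page source once `css_selector` exists."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = Chrome(service=Service(driver_path))
    try:
        browser.get(url)
        # Raises TimeoutException if the element never appears,
        # e.g. for a product that has no black price.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()  # always shut the browser down, even on timeout


def print_black_price(url):
    """Fetch the rendered page and print the black price (needs bs4 and lxml)."""
    from bs4 import BeautifulSoup

    selector = "span.exito-vtex-component-precio-negro"
    soup = BeautifulSoup(fetch_rendered_html(url, selector), "lxml")
    print("Black Price:", to_int_price(soup.select_one(selector).text))
```

Calling print_black_price(url) with the URL from the question should then print the black price once the element has been rendered. For products with only two prices the wait will time out, so you would need to catch TimeoutException in that case.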