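I've been trying to get a product's availability status from the IKEA website. On the site, the Dutch text reads (translated): "Not available for delivery", "Only available in the store", "Out of stock" and "You have 365 days to change your mind".
But my code gives me: "Not available for delivery", "Only available to order and pick up", "Check stock" and "You have 365 days to change your mind".
What am I doing wrong that makes the text different?
Here is my code: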
import requests
from bs4 import BeautifulSoup
# Get the url of the IKEA page and set up the bs4 stuff
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'
thepage = requests.get(url)
soup = BeautifulSoup(thepage.text, 'lxml')
# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {'class' : 'range-revamp-product-availability'})
# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)
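The page markup is added with JavaScript after the initial server response. BeautifulSoup only sees that initial response. It does not execute JavaScript, so it never sees the complete page. That is why your script gets the default text ("Check stock", "Only available to order and pick up"), while a browser shows the store-specific status once its scripts have run. If you want the JavaScript to run, you need a headless browser. Otherwise, you will have to reverse-engineer the JavaScript to see what it does.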
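To see this in isolation, here is a minimal sketch using a made-up HTML snippet. The `stock` id and both strings are illustrative, not IKEA's real markup. BeautifulSoup treats the `<script>` contents as plain text and never runs them, so it reports the placeholder that a browser would have replaced.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the placeholder text is in the markup,
# and the script would overwrite it only if a browser executed it.
html = """
<div id="stock">Check stock</div>
<script>
  document.getElementById("stock").textContent = "Out of stock";
</script>
"""

soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
stock_text = soup.find("div", id="stock").get_text(strip=True)
print(stock_text)  # prints "Check stock": the script was never executed
```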
One way to run the JavaScript is Selenium, which drives a real browser (optionally headless). I modified your code a bit and got it working.
Install Selenium:
pip3 install selenium
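Download Firefox + geckodriver or Chrome + chromedriver. Since Selenium 4.6, the bundled Selenium Manager can usually locate or download a matching driver automatically, so you may not need to pass a driver path at all. The modified code: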
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Get the url of the IKEA page and set up the bs4 stuff
url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'
# Uncomment the following lines (and comment out the Chrome ones) if using Firefox + geckodriver
#from selenium.webdriver.firefox.service import Service as FirefoxService
#driver = webdriver.Firefox(service=FirefoxService('/Users/ralwar/Downloads/geckodriver')) # Downloaded from https://github.com/mozilla/geckodriver/releases
# Using Chrome + chromedriver
op = webdriver.ChromeOptions()
op.add_argument('--headless')
# Selenium 4 takes the driver path via a Service object (executable_path was removed in 4.10)
driver = webdriver.Chrome(service=Service('/Users/ralwar/Downloads/chromedriver'), options=op) # Downloaded from https://chromedriver.chromium.org/downloads
try:
    driver.get(url)
    time.sleep(5) # Give the page's JavaScript time to finish loading; adjust as needed
    source = driver.page_source
finally:
    driver.quit() # Always close the browser, even if loading fails
soup = BeautifulSoup(source, 'lxml')
# Locate the part where the availability stuff is
availabilitypanel = soup.find('div', {"class" : "range-revamp-product-availability"})
# Get the text of the things inside of that panel
availabilitysectiontext = [part.getText() for part in availabilitypanel]
print(availabilitysectiontext)
The code above prints:
['Niet beschikbaar voor levering', 'Alleen beschikbaar in de winkel', 'Niet op voorraad in Amersfoort', 'Je hebt 365 dagen om van gedachten te veranderen. ']
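If you want to reuse the parsing step, you can wrap it in a small function. The sketch below uses a hypothetical name, `availability_texts`. It takes the rendered HTML (for example `driver.page_source`) and keeps only the panel's direct child elements. It strips whitespace from the ends of each text and returns an empty list instead of raising a `TypeError` when the panel is missing, for example if the class name on the page changes. It uses the built-in `html.parser`, so it runs without lxml.

```python
from bs4 import BeautifulSoup


def availability_texts(page_source, panel_class="range-revamp-product-availability"):
    """Return the text of each direct child element of the availability panel.

    Returns [] if no <div> with `panel_class` is found.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    panel = soup.find("div", {"class": panel_class})
    if panel is None:
        return []
    # recursive=False: only direct child tags, so whitespace strings between
    # them are skipped. get_text().strip() keeps the spacing inside each child.
    return [child.get_text().strip() for child in panel.find_all(recursive=False)]
```

For example, `print(availability_texts(source))` after `source = driver.page_source`. Unlike the list above, the last entry will not have a trailing space.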