I want to use Selenium in Python to scrape this page: https://www.lelo.com/es/juguetes-sexuales-para-parejas
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
from selenium.webdriver.common.action_chains import ActionChains
import time
from tqdm import tqdm
from selenium.common.exceptions import NoSuchElementException
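# Assumption: the original post does not show how the driver was created; Chrome is used here.
driver = webdriver.Chrome()
driver.get('https://www.lelo.com/es/juguetes-sexuales-para-parejas/')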
With the following code I collected only the links that are visible on this page:
masa_perso_flist = driver.find_elements(By.XPATH, '//div[@class="views-field views-field-rendered-entity"]')
# Keep only the product tiles that are actually displayed
filtered_links = [link for link in masa_perso_flist if link.is_displayed()]
listOflinks = []
for masa in filtered_links:
    ppp1 = masa.find_element(By.TAG_NAME, 'div')
    ppp2 = masa.find_element(By.TAG_NAME, 'a')
    listOflinks.append(ppp2.get_property('href'))
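For each product, I opened its link from listOflinks and tried to extract the name, description, price, number of reviews and average rating. I found that the elements holding this information are not the same on every product page. For the name and description, for example, there are two possible XPaths, and I managed to extract both. The price, however, is giving me trouble. For the price I tried this code: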
alldetails = []
for i in tqdm(listOflinks):
    driver.get(i)
    try:
        Precio = driver.find_element(By.XPATH, './/td[@class= "price-amount"] |.//table[@class= "price-amount"]').text
        # I also tried: Precio = driver.find_element(By.XPATH, './/td[@class= "price-amount"] |.//tr[@class="price-label"]').text
    except NoSuchElementException:
        Precio = "No prices"
    tempJb = {'Precios': Precio}
    alldetails.append(tempJb)
print(alldetails)
This is my output:
[{'Price': '169.00 USD'}, {'Price': ''}, {'Price': ''}, {'Price': ''}, {'Price': ''}, {'Price': ''}]
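If my code is wrong, why don't I get an error message? Why do I get {'Price': ''} instead of {'Price': 'No price'}? This may be a silly question, but I would really appreciate help in learning how to write suitable code for this situation. I have tried many XPath combinations to capture the price, but I still can't get it. Thank you very much.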
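Try get_attribute('textContent') instead of .text; the full script further down has that change.
get_attribute('textContent') vs. .text
.text returns only the text that is rendered as visible, whereas get_attribute('textContent') returns the node's text even if the element is hidden or otherwise not displayed. Your XPath did match an element on every product page, which is why NoSuchElementException was never raised; on most pages the matched element simply had no visible text, so .text returned ''.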
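The difference between the two calls can be sketched as a small fallback helper. This is only an illustration, not part of the original answer: `read_text` and `FakeElement` are hypothetical names, and `FakeElement` is a stand-in that mimics just the two WebElement members used here (`.text` and `.get_attribute`), so the snippet runs without a browser.

```python
def read_text(element, default="No prices"):
    """Return the element's visible text, falling back to textContent.

    `element` is anything with a WebElement-like interface:
    a `.text` attribute and a `.get_attribute(name)` method.
    """
    visible = (element.text or "").strip()
    if visible:
        return visible
    # .text is empty when the element is not displayed, so read the DOM
    # property instead; it does not depend on visibility.
    hidden = (element.get_attribute("textContent") or "").strip()
    return hidden or default


class FakeElement:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


shown = FakeElement("$229.00", "$229.00")     # visible price
hidden = FakeElement("", "\n  $539.00 ")      # found, but not displayed
empty = FakeElement("", "")                   # found, but no text at all
print(read_text(shown), read_text(hidden), read_text(empty))
# prints: $229.00 $539.00 No prices
```

With a real driver, the same helper could be passed the result of the price lookup in place of calling `.text` directly.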
driver.get('https://www.lelo.com/es/juguetes-sexuales-para-parejas/')
masa_perso_flist = driver.find_elements(By.XPATH, '//div[@class="views-field views-field-rendered-entity"]')
filtered_links = [link for link in masa_perso_flist if link.is_displayed()]
listOflinks = []
for masa in filtered_links:
    ppp1 = masa.find_element(By.TAG_NAME, 'div')
    ppp2 = masa.find_element(By.TAG_NAME, 'a')
    listOflinks.append(ppp2.get_property('href'))
alldetails = []
for i in tqdm(listOflinks):
    driver.get(i)
    try:
        # textContent is returned even when the price element is not displayed
        Precio = driver.find_element(By.XPATH, './/td[@class= "price-amount"] |.//table[@class= "price-amount"]').get_attribute('textContent')
        # I also tried: Precio = driver.find_element(By.XPATH, './/td[@class= "price-amount"] |.//tr[@class="price-label"]').text
    except NoSuchElementException:
        Precio = "No prices"
    tempJb = {'Precios': Precio}
    alldetails.append(tempJb)
    print(alldetails)
I don't have tqdm, but the output looks correct.
Output:
[{'Precios': '$229.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}, {'Precios': '$249.00'}]
[{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': '$219.00'}, {'Precios': '$209.00'}, {'Precios': '$249.00'}, {'Precios': '$259.00'}]
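Since the prices come back as strings such as '$229.00' (or '169.00 USD' in the question's output), a possible follow-up step is to convert them to numbers before analysis. The sketch below is an assumption layered on top of the answer: `parse_price` and `PRICE_PATTERN` are hypothetical names, and commas are treated as thousands separators, which would be wrong for prices formatted as '229,00'.

```python
import re
from decimal import Decimal

# First run of digits with an optional decimal part, e.g. "229.00"
PRICE_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Return the first number in `text` as a Decimal, or None if absent."""
    # Assumption: a comma is a thousands separator, so it is dropped.
    match = PRICE_PATTERN.search(text.replace(",", ""))
    return Decimal(match.group()) if match else None


# Sample rows shaped like the scraper's output, plus the fallback value
alldetails = [{'Precios': '$229.00'}, {'Precios': '$539.00'}, {'Precios': 'No prices'}]
amounts = [parse_price(row['Precios']) for row in alldetails]
print(amounts)
# prints: [Decimal('229.00'), Decimal('539.00'), None]
```

Decimal is used instead of float so that monetary amounts keep their exact two-digit form.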