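Below is the script I am trying to use to scrape Orbitz.com. The problem is that the XPath for the SAME field (take the FROM Airport field as an example) keeps changing. One time it is fromDate_XPath = ".//*[@id='8f57e1cb92a99815ca1085ac0f6d31db']", and the next time it is .//*[@id='0a3807a6e50ffd4cc05eaca5b6aada17'].
Is Orbitz doing this deliberately to prevent scraping? I would have thought that, since I am using their site to collect links to tickets that would be bought from them, they would let me scrape it, no?
Is there any way around this?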
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()
driver.get("http://www.orbitz.com/")
# X-PATHS FOR DIFFERENT FIELDS
fltOption_Xpath = ".//*[@id='products']/div/fieldset/div[2]/label[1]/div"
fromAir_XPath = ".//*[@id='2de60aafe0629114603daf0bc1ab52a6']"
toAir_XPath = ".//*[@id='9c64cbe5f29f6f28b64ddb9811e102b5']"
fromDate_XPath = ".//*[@id='8f57e1cb92a99815ca1085ac0f6d31db']"
toDate_XPath = ".//*[@id='aa8496535efd1aec3badf9423813fbbd']"
selFlightsOption_Element = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(fltOption_Xpath))
selFlightsOption_Element.click()
fromAir_Element = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(fromAir_XPath))
toAir_Element = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(toAir_XPath))
fromDate_Element = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(fromDate_XPath))
toDate_Element = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(toDate_XPath))
fromAir_Element.click()
fromAir_Element.clear()
fromAir_Element.send_keys("IAH")
toAir_Element.click()
toAir_Element.clear()
toAir_Element.send_keys("MUM")
Since the id attributes of the inputs are dynamically generated, do not base your locators on them.
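As a rough way to spot this kind of id before building a locator on it, the sketch below flags ids that look like the 32-character hexadecimal values quoted in the question. The helper name looks_generated and the regular expression are assumptions for illustration only; they describe the ids seen here, not a documented Orbitz convention.

```python
import re

# Assumption: generated ids look like the ones in the question,
# i.e. exactly 32 lowercase hexadecimal characters.
_HASH_LIKE_ID = re.compile(r"[0-9a-f]{32}")


def looks_generated(element_id):
    """Return True if the id resembles a per-page-load hash (hypothetical helper)."""
    return _HASH_LIKE_ID.fullmatch(element_id) is not None
```

An id such as 'products', which appears in the question's first XPath, does not match this pattern, so it is less obviously risky to build on.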
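You can switch to the name attributes instead. From what I can see, they do not change and are quite readable. For example, for the "start date" input in "Hotel only" mode: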
fromDate_Element = driver.find_element_by_name("hotel.chkin")
Or, if you need XPath for some reason:
fromDate_Element = driver.find_element_by_xpath("//input[@name='hotel.chkin']")
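The same idea can be combined with the explicit waits used in the question. This is a minimal sketch, assuming a Selenium release that provides By-based locators and expected_conditions; the helper names name_xpath and wait_for_named_input are hypothetical, and "hotel.chkin" is the only field name confirmed above.

```python
def name_xpath(name, tag="input"):
    """Build an XPath matching <tag> elements by their name attribute."""
    return "//{}[@name='{}']".format(tag, name)


def wait_for_named_input(driver, name, timeout=10):
    """Wait until an element with the given name attribute is present, then return it."""
    # Imported lazily so name_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.NAME, name))
    )
```

With a driver already on the page in "Hotel only" mode, wait_for_named_input(driver, "hotel.chkin") would stand in for the fromDate_Element lookup. The names of the flight fields would need to be read from the page source first, since they are not given here.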