I am trying to get all the href links from https://search.yhd.com/c0-0-1003817/ (the ones that lead to the specific products). My code runs, but it only gets 30 links, and I don't know why. Could you help me, please?
I've been working with Selenium (Python 3.7). Previously I also tried to get the links with Beautiful Soup, but that didn't work either.
from selenium import webdriver
import time
import requests
import pandas as pd

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)
    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")
I should get 60 links, but I am only managing to get 30 with my code.
On initial load, the page contains only 30 images/links. The remaining items are loaded only when you scroll down, which brings the total to 60. You need to scroll to the bottom before collecting the links:
from selenium import webdriver
import time

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # Scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")
print(len(imported))  # Output: 60
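If two fixed scrolls turn out not to be enough on a slower connection, one option is to keep scrolling until the page height stops growing. The following is only a sketch of that idea, not something tested against this site: the helper name scroll_to_bottom and the pause and max_rounds parameters are made up for illustration, and it assumes that newly loaded items increase document.body.scrollHeight.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll down until document.body.scrollHeight stops growing.

    `driver` is any object exposing Selenium's execute_script method.
    `pause` and `max_rounds` are illustrative defaults, not tuned values.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded items time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the height did not change, so assume nothing new was loaded
        last_height = new_height
    return scrolls
```

Inside getListingLinks, a call such as scroll_to_bottom(driver) would take the place of the two execute_script/time.sleep pairs. The max_rounds cap is there so the loop still ends on a page that keeps loading content indefinitely.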