Selenium doesn't get all the href from a web page

Posted by debugcn in Dev

Flor Pupi

I am trying to get all the href links from https://search.yhd.com/c0-0-1003817/ (the ones that lead to the specific products), but although my code runs, it only gets 30 links. I don't know why this is happening. Could you help me, please?

I've been working with Selenium (Python 3.7), but I had previously also tried to get the links with Beautiful Soup. That didn't work either.
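Beautiful Soup (or any parser fed the raw HTML) only sees the document the server returns in the first response. It does not run JavaScript or scroll, so if, as the answer explains, the remaining items are added only after scrolling, a static parser can never see them. A minimal stdlib-only sketch of that static approach, mirroring the question's `//a[@class="img"]` selector; the names `ImgLinkParser` and `extract_img_links` and the sample markup with `example.com` URLs are illustrative assumptions, not taken from the site:

```python
from html.parser import HTMLParser


class ImgLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class attribute is exactly "img"
    (the same elements the XPath //a[@class="img"] selects)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Exact match, like @class="img" in XPath (not a per-token match)
        if attributes.get("class") == "img" and attributes.get("href"):
            self.links.append(attributes["href"])


def extract_img_links(html):
    """Return the hrefs of <a class="img"> elements in a static HTML string."""
    parser = ImgLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for the server's initial response
sample = """
<div>
  <a class="img" href="https://example.com/product/1"><img src="a.jpg"></a>
  <a class="title" href="https://example.com/product/1">Product 1</a>
  <a class="img" href="https://example.com/product/2"><img src="b.jpg"></a>
</div>
"""
print(extract_img_links(sample))
```

Whatever parser you use, the result is bounded by what is present at initial load; getting the lazily loaded items requires a browser that actually scrolls, as in the Selenium code.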

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def getListingLinks(link):
    # Open the browser and load the listing page
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)  # give the initial page time to render

    # Save the href of every product image link
    listing_links = []
    anchors = driver.find_elements(By.XPATH, '//a[@class="img"]')
    for anchor in anchors:
        listing_links.append(str(anchor.get_attribute('href')))
    driver.quit()  # end the browser session
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

I should get 60 links, but I am only managing to get 30 with my code.

SanV

On initial load, the page contains only 30 images/links; only when you scroll down does it load all 60 items. So you need to scroll to the bottom before collecting the links:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def getListingLinks(link):
    # Open the browser and load the listing page
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # Scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    # Save the links
    listing_links = []
    anchors = driver.find_elements(By.XPATH, '//a[@class="img"]')
    for anchor in anchors:
        listing_links.append(str(anchor.get_attribute('href')))
    driver.quit()  # end the browser session
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

print(len(imported))  ## Output:  60
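Scrolling a fixed number of times works when you know how many batches the page loads. A more general pattern is to keep scrolling until the page height stops growing. The sketch below assumes that newly loaded items increase `document.body.scrollHeight` on this page (not verified here); `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names, and `driver` is anything with an `execute_script` method, such as a Selenium WebDriver:

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll repeatedly until document.body.scrollHeight stops growing.

    Assumes `driver` exposes execute_script(script), e.g. a Selenium
    WebDriver. Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded items time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended; assume the list is complete
        last_height = new_height
    return rounds
```

Call `scroll_to_bottom(driver)` after `driver.get(link)` and before `driver.find_elements(...)`. The `max_rounds` cap keeps truly infinite-scroll pages from looping forever, and `pause` is a guess that may need tuning for slow connections.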

Edited 2021-06-10
