Selenium doesn't get all the href from a web page

Flor Pupi

I am trying to get all the href links from https://search.yhd.com/c0-0-1003817/ (the ones that lead to the specific products), but although my code runs, it only gets 30 links. I don't know why this is happening. Could you help me, please?

I've been working with Selenium (Python 3.7), but previously I also tried to get the links with Beautiful Soup. That didn't work either.

from selenium import webdriver 
import time
import requests
import pandas as pd

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)

    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

I should get 60 links, but I am only managing to get 30 with my code.

SanV

On initial load, the page contains only 30 images/links. The remaining items are loaded only when you scroll down, which brings the total to 60. You need to scroll to the bottom before collecting the links:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # Scroll down; repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    # Save the links
    listing_links = []
    # find_elements(By.XPATH, ...) works in Selenium 3 and 4;
    # find_elements_by_xpath was removed in Selenium 4.3
    anchors = driver.find_elements(By.XPATH, '//a[@class="img"]')
    for anchor in anchors:
        listing_links.append(str(anchor.get_attribute('href')))
    # quit() ends the whole browser session, not just the current window
    driver.quit()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

print(len(imported))  ## Output:  60
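
The two fixed scroll/sleep pairs work when the page needs exactly one extra load. As a rough sketch of a more general variant, the loop below keeps scrolling until document.body.scrollHeight stops growing. The helper name scroll_until_stable and its pause and max_rounds parameters are made up for illustration, and whether a stable page height is a reliable "everything is loaded" signal on this particular site is an assumption.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object that offers Selenium's execute_script() method.
    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded items time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # the last scroll loaded nothing new
        last_height = new_height
    return max_rounds  # gave up; the page may still be growing
```

Inside getListingLinks, a call such as scroll_until_stable(driver, pause=3) could take the place of the repeated execute_script/time.sleep lines, right before the links are collected. The max_rounds cap is there so that an endlessly growing page cannot hang the script.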
