Why is Scrapy returning duplicate results?

Ncastro340

I am experimenting with Scrapy and running into an issue: my script returns duplicate results. I am trying to scrape URLs from a parent page and follow each individual URL to obtain an associated date. Instead, every item that comes back from a nested URL again contains the full list of URLs from the parent page.

Here is the script:


    import scrapy
    from aeon.items import AeonItem
    from scrapy.http.request import Request

    class AeonSpider(scrapy.Spider):
        name = "aeon"
        allowed_domains = ["aeon.co"]
        start_urls = [
            "http://aeon.co/magazine/technology"
        ]

        def parse(self, response):
            items = []
            for sel in response.xpath('//*[@id="latestPosts"]'):
                item = AeonItem()
                item['primary_url'] = sel.xpath('div/div/div/a/@href').extract()    

                for each in item['primary_url']:
                    yield Request(each, callback=self.parse_next_page,meta={'item':item})

        def parse_next_page(self, response):
            for sel in response.xpath('//*[@id="top"]'):
                item = response.meta['item']
                item['comments'] =  sel.xpath('div[5]/div[3]/div[2]/div/p/em/span[@class="instapaper_datepublished"]/text()').extract()
                return item

Here is the json output:


    {"comments": ["13 February 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["31 January 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["12 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["31 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["30 May 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}

To reiterate, I am having trouble outputting one list of URLs from the parent page and one list of corresponding dates from each individual nested URL. I am new to Scrapy and to Python, so hopefully someone can point me in the right direction.

Elias Dorneles

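Your code is iterating over the wrong thing.

That response.xpath('//*[@id="latestPosts"]') call returns a list with only one selector, and that selector contains all the article links. The loop body therefore runs once, and extract() collects every URL into a single item.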

Try changing the loop to:

    for sel in response.xpath('//*[@id="latestPosts"]/div/div/div'):
        item = AeonItem()
        item['primary_url'] = sel.xpath('./a/@href').extract()

        ...

You probably want to apply the same change to the other callback too -- I'll leave the rest of the fun for you. =)
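To see why the choice of loop target matters, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The markup, the article URLs, and the function names are all hypothetical and much simpler than the real page; the point is only to contrast looping over the one container element with looping over each innermost div.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the parent page; the real markup differs.
PAGE = """
<div id="top">
  <div id="latestPosts">
    <div><div>
      <div><a href="http://aeon.co/magazine/technology/article-a/">A</a></div>
      <div><a href="http://aeon.co/magazine/technology/article-b/">B</a></div>
      <div><a href="http://aeon.co/magazine/technology/article-c/">C</a></div>
    </div></div>
  </div>
</div>
""".strip()

root = ET.fromstring(PAGE)


def items_per_container(root):
    """Loop over the single #latestPosts element, as in the question."""
    items = []
    for sel in root.findall(".//*[@id='latestPosts']"):
        # Only one iteration happens, so every href lands in the same item.
        urls = [a.get("href") for a in sel.findall("div/div/div/a")]
        items.append({"primary_url": urls})
    return items


def items_per_link(root):
    """Loop over each innermost div, as the answer suggests."""
    items = []
    for sel in root.findall(".//*[@id='latestPosts']/div/div/div"):
        link = sel.find("a")
        if link is not None:
            # A fresh item per link keeps one URL per output record.
            items.append({"primary_url": link.get("href")})
    return items


print(len(items_per_container(root)))  # 1 item holding all three URLs
print(len(items_per_link(root)))       # 3 items, one URL each
```

In the spider itself, the second shape means each request would carry its own item in meta, so the date scraped in parse_next_page ends up next to a single URL rather than next to the whole list.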
