How to make a Scrapy spider run multiple times from a Tornado request

sebsasto

I have a Scrapy spider that I need to run when a Tornado GET request is made. The first time I call the Tornado endpoint, the spider runs fine, but on every subsequent request the spider does not run and the following error is raised:

Traceback (most recent call last):
    File "/Users/Sebastian/anaconda/lib/python2.7/site-packages/tornado/web.py", line 1413, in _execute
        result = method(*self.path_args, **self.path_kwargs)
    File "server.py", line 38, in get
        process.start()
    File "/Users/Sebastian/anaconda/lib/python2.7/site-packages/scrapy/crawler.py", line 251, in start
        reactor.run(installSignalHandlers=False)  # blocking call
    File "/Users/Sebastian/anaconda/lib/python2.7/site-packages/twisted/internet/base.py", line 1193, in run
        self.startRunning(installSignalHandlers=installSignalHandlers)
    File "/Users/Sebastian/anaconda/lib/python2.7/site-packages/twisted/internet/base.py", line 1173, in startRunning
        ReactorBase.startRunning(self)
    File "/Users/Sebastian/anaconda/lib/python2.7/site-packages/twisted/internet/base.py", line 684, in startRunning
        raise error.ReactorNotRestartable()
ReactorNotRestartable

The Tornado handler is:

import json

import tornado.web
from scrapy.crawler import CrawlerProcess


class PageHandler(tornado.web.RequestHandler):

    def get(self):
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
            'ITEM_PIPELINES': {'__main__.ResultsPipeline': 1},
        })

        process.crawl(YourSpider)
        process.start()  # blocking call: runs the Twisted reactor

        self.write(json.dumps(results))

So the idea is that every time this handler's get method is called, the spider runs and performs the crawl.

sebsasto

Well, after a lot of googling, I finally found the answer to this problem... There is a library, scrapydo (https://github.com/darkrho/scrapydo), which is based on crochet and manages the reactor for you, allowing you to re-run the same spider as often as needed.

So to solve the problem you need to install the library, call the setup method once, and then use the run_spider method... The code looks like this:

import json

import scrapydo
import tornado.web

scrapydo.setup()  # call once, before any spider runs


class PageHandler(tornado.web.RequestHandler):

    def get(self):
        # run_spider expects the spider class, not an instance,
        # and blocks until the crawl finishes
        scrapydo.run_spider(YourSpider, settings={
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
            'ITEM_PIPELINES': {'__main__.ResultsPipeline': 1},
        })

        self.write(json.dumps(results))

Hope this helps anyone who has the same problem!


