How do I get a plain URL from Redis rather than cPickle-serialized data?

rowele

I am using scrapy-redis to build a simple distributed crawler. The slave machines need to read URLs from the master's queue, but what a slave gets back is data serialized with cPickle rather than a plain URL. How can I get the correct URL from the Redis URL queue?

Example:

from scrapy_redis.spiders import RedisSpider
from example.items import ExampleLoader


class MySpider(RedisSpider):
    """Spider that reads urls from redis queue (myspider:start_urls)."""
    name = 'redisspider'
    redis_key = 'wzws:requests'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        el = ExampleLoader(response=response)
        el.add_xpath('name', '//title[1]/text()')
        el.add_value('url', response.url)
        return el.load_item()

MySpider inherits from RedisSpider. When I run scrapy runspider myspider_redis.py, it complains that the URL is not legal.

The scrapy-redis project is on GitHub: scrapy-redis

R. Max

scrapy-redis uses a few internal queues: one for start URLs (by default <spider>:start_urls), one for shared requests (by default <spider>:requests), and one for the dupefilter.

The start URLs queue and the requests queue can't be the same key, because the start URLs queue expects plain string values while the requests queue expects pickled data.
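To make the difference between the two kinds of entry concrete, here is a minimal sketch that uses only the standard library. The dictionary is a hypothetical stand-in for a serialized request, not the exact structure scrapy-redis stores, and `looks_like_url` is a helper invented for this illustration.

```python
import pickle
from urllib.parse import urlparse


def looks_like_url(raw):
    """Return True if a raw queue value decodes to an http(s) URL."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Binary pickle data is generally not valid UTF-8 text.
        return False
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# What a start-URLs queue entry is expected to look like: a plain string.
plain_entry = b"http://example.com/page"

# Hypothetical stand-in for a requests-queue entry: a pickled dict.
pickled_entry = pickle.dumps(
    {"url": "http://example.com/page", "method": "GET"}, protocol=2
)

print(looks_like_url(plain_entry))
print(looks_like_url(pickled_entry))
```

Reading a pickled entry as if it were a start URL hands the spider a blob of bytes instead of an address, which is consistent with the error described in the question.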

So you should not use <spider>:requests as the redis_key in the spider.
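As a rough sketch of the alternative, plain URLs can be pushed onto the start URLs key instead. The helper names and the `InMemoryClient` class below are hypothetical; the class only mimics the `lpush(name, *values)` call of a redis-py client so the example runs without a Redis server.

```python
def start_urls_key(spider_name):
    """Build the default-style start URLs key, <spider>:start_urls."""
    return "%s:start_urls" % spider_name


def push_start_url(client, spider_name, url):
    """Push a plain URL string onto the spider's start URLs list."""
    key = start_urls_key(spider_name)
    client.lpush(key, url)
    return key


class InMemoryClient(object):
    """Hypothetical stand-in for a Redis client; stores lists in a dict."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        bucket = self.lists.setdefault(name, [])
        for value in values:
            bucket.insert(0, value)  # LPUSH adds at the head of the list
        return len(bucket)


client = InMemoryClient()
used_key = push_start_url(client, "redisspider", "http://example.com/")
print(used_key)
```

Against a real server, a redis-py client such as `redis.Redis()` would take the place of the stand-in, and the spider in the question would set `redis_key = 'redisspider:start_urls'` rather than `'wzws:requests'`.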

Let me know if this helps, otherwise please share the messages in the redis_key queue.
