My Scrapy spider can't get a valid response

L.Jack

When I use Scrapy to fetch stock information from 'http://quote.eastmoney.com/stocklist.html', I can't get a valid response. In fact, I get nothing at all when I run it. Here is the content of stocks.py:

import scrapy
from scrapy.selector import Selector
import re

class StocksSpider(scrapy.Spider):
    name = "stocks"

    start_urls = ['http://quote.eastmoney.com/stocklist.html']

    def parse(self, response):
        for i in Selector(response).xpath('//div[@id="quotesearch"]/ul/li/a/@href').extract():
            try:
                stock=re.split(r'[./]',i)[5]
                url='https://gupiao.baidu.com/stock/'+stock+'.html'
                yield scrapy.Rquest(url,callback=self.parse_stock)
            except:
                continue

    def parse_stock(self,response):
        infoDict={}
        name=Selector(response).xpath('//a[@class="bets-name"]/text()').extract()[0]
        keylist=Selector(response).xpath('//dl/dt/text()').extract()

        for i in range(len(keylist)):
            try:
                val=Selector(response).xpath('//dl/dd/text()').extract()[0]
            except:
                val='--'
            infoDict[keylist[i]]=val
        infoDict.update({'股票名称':name[0].split()[0]+'('+Selector(response).xpath('//a[@class="bets-name"]/span/text()')[0].extract()[0]+')'})

        yield infoDict

Here is what I get when I run it:

2017-06-05 20:28:32 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: BaiduStocks)
2017-06-05 20:28:32 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'BaiduStocks', 'FEED_EXPORT_ENCODING': 'utf-8', 'NEWSPIDER_MODULE': 'BaiduStocks.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['BaiduStocks.spiders']}
2017-06-05 20:28:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-06-05 20:28:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-06-05 20:28:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-06-05 20:28:33 [scrapy.middleware] INFO: Enabled item pipelines:
['BaiduStocks.pipelines.BaidustocksInfoPipeline']
2017-06-05 20:28:33 [scrapy.core.engine] INFO: Spider opened
2017-06-05 20:28:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-06-05 20:28:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-06-05 20:28:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quote.eastmoney.com/robots.txt> (referer: None)
2017-06-05 20:28:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quote.eastmoney.com/stocklist.html> (referer: None)
2017-06-05 20:28:33 [scrapy.core.engine] INFO: Closing spider (finished)
2017-06-05 20:28:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 458,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 570201,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 6, 5, 12, 28, 33, 930937),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 6, 5, 12, 28, 33, 28477)}
2017-06-05 20:28:33 [scrapy.core.engine] INFO: Spider closed (finished)

I have been working on this for a few days, but I can't tell what's wrong, so I really need your help. Thank you all!

Granitosaurus

Here's a short code review:

import scrapy
from scrapy.selector import Selector
import re


class StocksSpider(scrapy.Spider):
    name = "stocks"

    start_urls = ['http://quote.eastmoney.com/stocklist.html']

    def parse(self, response):
        # response has a shortcut for selector
        for i in response.xpath('//div[@id="quotesearch"]/ul/li/a/@href').extract():
            # never silently catch and drop errors
            stock = re.split(r'[./]', i)[5]
            url = 'https://gupiao.baidu.com/stock/' + stock + '.html'
            yield scrapy.Request(url, callback=self.parse_stock)

    def parse_stock(self, response):
        # objects should be lowercase in python
        item = dict()
        # there's an extract_first shortcut for extract()[0],
        # and it takes a default value for the missing case
        name = response.xpath('//a[@class="bets-name"]/text()').extract_first('')
        keylist = response.xpath('//dl/dt/text()').extract()
        valuelist = response.xpath('//dl/dd/text()').extract()

        # zip pairs every <dt> key with its matching <dd> value;
        # your original loop re-read the first <dd> on every iteration
        for key, value in zip(keylist, valuelist):
            item[key] = value or '--'
        code = response.xpath('//a[@class="bets-name"]/span/text()').extract_first('')
        item['股票名称'] = '{}({})'.format(name.split()[0], code)
        yield item
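Pairing the extracted `<dt>` keys with their matching `<dd>` values can be sanity-checked stand-alone with plain lists (hypothetical sample data, no Scrapy needed):

```python
# Hypothetical text extracted from //dl/dt and //dl/dd
keylist = ['今开', '成交量', '最高']
valuelist = ['11.45', '', '11.62']

item = {}
for key, value in zip(keylist, valuelist):
    # fall back to '--' when the matching <dd> was empty
    item[key] = value or '--'

print(item)  # {'今开': '11.45', '成交量': '--', '最高': '11.62'}
```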

Aside from minor issues, the biggest one was that the try/except clause in your parse() method just failed silently: you had a typo (Rquest instead of Request), the bare except swallowed the resulting error on every iteration, and the spider simply went on with nothing to crawl. You should never have silent blanket exceptions for this very reason :)
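To see concretely how the typo disappeared, here is a minimal stand-alone sketch (make_request is a hypothetical stand-in for scrapy.Request, so no Scrapy install is needed): the misspelled name raises an error, the bare except swallows it, and the generator yields nothing.

```python
import re

def make_request(url, callback=None):
    """Hypothetical stand-in for scrapy.Request."""
    return ('GET', url)

def parse_buggy(hrefs):
    # Mirrors the original parse(): the misspelled name raises NameError,
    # which the bare except silently swallows on every iteration.
    for href in hrefs:
        try:
            stock = re.split(r'[./]', href)[5]
            yield make_rquest('https://gupiao.baidu.com/stock/' + stock + '.html')
        except:
            continue

def parse_fixed(hrefs):
    for href in hrefs:
        stock = re.split(r'[./]', href)[5]
        yield make_request('https://gupiao.baidu.com/stock/' + stock + '.html')

hrefs = ['http://quote.eastmoney.com/sh600000.html']
print(list(parse_buggy(hrefs)))  # [] -- the typo was hidden, nothing yielded
print(list(parse_fixed(hrefs)))  # [('GET', 'https://gupiao.baidu.com/stock/sh600000.html')]
```

Removing the blanket except (or at least logging inside it) would have surfaced the NameError immediately.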
