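I am trying to use Scrapy's CSVFeedSpider on a CSV link. Here is an example row:
number,"may contain commas","may contain commas","may contain commas",text,text,text,text,text,text,"may contain commas"
If a value contains a comma, it is wrapped in quotes. Since the spider accepts only a single delimiter, how can I parse rows like this?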
http://doc.scrapy.org/en/latest/topics/spiders.html#csvfeedspider
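If the columns are wrapped in double quotes, splitting on the comma delimiter works fine. If they are wrapped in single quotes, the spider complains about a length mismatch.
Here is the spider code: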
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from stackoverflow23429315.items import DemoItem
from scrapy.contrib.spiders import CSVFeedSpider
from scrapy import log


class DmozSpider(CSVFeedSpider):
    name = 'csvFeedTest'
    start_urls = ['file:////home/vagrant/labs/stackoverflow23429315/test.csv']
    delimiter = ','
    headers = ['id', 'name', 'address1', 'address2', 'email']

    def parse_row(self, response, row):
        log.msg('Hi, this is a row!: %r' % row)
        item = DemoItem()
        item['id'] = row['id']
        item['name'] = row['name']
        item['address1'] = row['address1']
        item['address2'] = row['address2']
        item['email'] = row['email']
        return item
The item class:
from scrapy.item import Item, Field


class DemoItem(Item):
    id = Field()
    name = Field()
    address1 = Field()
    address2 = Field()
    email = Field()

The test CSV file:
1,"John, Doe","1234 Main Street, APT A","2nd Floor",[email protected]
2,"John2, Doe","1234 Main Street, APT A","2nd Floor",[email protected]
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',[email protected]
4,'John4, Doe','1234 Main Street, APT A','2nd Floor',[email protected]
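
The length mismatch on the single-quoted rows can be reproduced outside Scrapy with Python's standard `csv` module. The sketch below is only an illustration, not the spider's own parsing code: the helper `parse_line` is a hypothetical name, and the email value is a placeholder because the real one is not shown in the test file. It assumes the underlying parser treats only the configured quote character as an enclosure, which is how the standard `csv` reader behaves.

```python
import csv
import io

# Sample rows modeled on the test file above.
# The email value is a placeholder (assumption), not the original data.
DOUBLE_QUOTED = '1,"John, Doe","1234 Main Street, APT A","2nd Floor",john@example.com\n'
SINGLE_QUOTED = "3,'John3, Doe','1234 Main Street, APT A','2nd Floor',john@example.com\n"


def parse_line(line, quotechar='"'):
    """Parse a single CSV line and return its fields as a list (hypothetical helper)."""
    reader = csv.reader(io.StringIO(line), delimiter=',', quotechar=quotechar)
    return next(reader)


# Default quote character: the double-quoted row yields the 5 expected fields.
double_fields = parse_line(DOUBLE_QUOTED)

# Default quote character on a single-quoted row: every comma splits,
# so the row yields 7 fields and no longer matches the 5 headers.
single_default = parse_line(SINGLE_QUOTED)

# Telling the parser that "'" is the quote character restores 5 fields.
single_fixed = parse_line(SINGLE_QUOTED, quotechar="'")

print(len(double_fields))   # 5
print(len(single_default))  # 7
print(len(single_fixed))    # 5
```

The Scrapy documentation for CSVFeedSpider describes a `quotechar` attribute alongside `delimiter`; whether it is available, and whether it resolves the mismatch here, depends on the installed Scrapy version. A file that mixes both quoting styles, as this test file does, would likely still need per-row handling.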