Python and JSON UTF-8 encoding

Posted by debugcn in Dev

停下来

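I'm currently running into some encoding problems. Because I'm French, I often use characters such as é or è.

I'm trying to figure out why they don't show up in the JSON file that I create automatically with scrapy...

Here is my Python code: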

# -*- coding: utf-8 -*-

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'pokespider'
    start_urls = [
        "https://www.pokepedia.fr/Liste_des_Pok%C3%A9mon_par_apport_en_EV"]

    def parse(self, response):
        for poke in response.css('table.tableaustandard.sortable tr')[1:]:
            num = poke.css('td ::text').extract_first()
            nom = poke.css('td:nth-child(3) a ::text').extract_first()

            yield {'numero': int(num), 'nom': nom}

Then, after I type the scrapy command, the code generates a JSON file. Here are its first lines:

[
{"numero": 1, "nom": "Bulbizarre"},
{"numero": 2, "nom": "Herbizarre"},
{"numero": 3, "nom": "Florizarre"},
{"numero": 4, "nom": "Salam\u00e8che"},
...
]

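(Yes, those are the French Pokémon names.)

So I'd like to get rid of this \u00e8 sequence, which should be an è... Is there a way to do that?

Thanks in advance, and I hope my English isn't too bad :)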

Samsul Islam

Use the FEED_EXPORT_ENCODING option, set here through custom_settings.

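import scrapy
from scrapy.crawler import CrawlerProcess  # needed to run the spider from this script


class BlogSpider(scrapy.Spider):
    name = 'pokespider'
    # Write the feed as UTF-8 so that è is not escaped as \u00e8
    custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}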
    start_urls = [
        "https://www.pokepedia.fr/Liste_des_Pok%C3%A9mon_par_apport_en_EV"]

    def parse(self, response):
        for poke in response.css('table.tableaustandard.sortable tr')[1:]:
            num = poke.css('td ::text').extract_first()
            nom = poke.css('td:nth-child(3) a ::text').extract_first()

            yield {'numero': int(num), 'nom': nom}

process = CrawlerProcess(settings={
    "FEEDS": {
        "items_json": {"format": "json"},
    },
})

process.crawl(BlogSpider)
process.start()
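
For what it's worth, the \u00e8 in the original output is not corrupted data: it is a valid JSON escape for è, and any JSON parser will decode it back to the same string. The sketch below uses only the standard-library json module (not Scrapy itself) to show the two forms side by side. The sample item is just the fourth row from the question.

```python
import json

# Sample item taken from the question's output.
item = {"numero": 4, "nom": "Salamèche"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(item)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(item, ensure_ascii=False)

print(escaped)   # {"numero": 4, "nom": "Salam\u00e8che"}
print(readable)  # {"numero": 4, "nom": "Salamèche"}

# Both strings decode to exactly the same Python object.
assert json.loads(escaped) == json.loads(readable) == item
```

So setting FEED_EXPORT_ENCODING to utf-8 mostly matters for readability when a person opens the file. A program that loads the JSON should get the same names either way.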
