I have a txt file that contains the letters 'øæå', and I want this script to recognize them and write them correctly to a CSV file.
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
        for s in splitTab:
            newS = s[1:-1]
        date = splitTab[0].replace('.', '/')
        insertList = [date,]
        out.writerow(date)
This gives:
File "Q:\DropBox\Development\Scripts\tes2.py", line 17, in <module>
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
Remove line = file.readline(): the for line in file loop is already iterating over the lines, so the extra readline() call skips every other line.
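To see why the extra readline() loses data, here is a small Python 3 sketch that uses io.StringIO in place of the real file (an assumption for illustration; both helper names are hypothetical). Each pass of the for loop consumes one line, and the readline() inside it consumes the next one, so only every second line is kept:

```python
import io


def read_with_extra_readline(text):
    """Hypothetical helper reproducing the bug: two lines consumed per iteration."""
    f = io.StringIO(text)
    seen = []
    for line in f:
        line = f.readline()  # throws away the line the for loop just yielded
        seen.append(line)
    return seen


def read_correctly(text):
    """Hypothetical helper: let the for loop hand out every line on its own."""
    f = io.StringIO(text)
    return [line for line in f]


sample = "a\nb\nc\nd\n"
print(read_with_extra_readline(sample))  # ['b\n', 'd\n']
print(read_correctly(sample))            # ['a\n', 'b\n', 'c\n', 'd\n']
```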
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
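This won't do what you want: it encodes the text to ISO-8859-1 and then tries to decode those ISO-8859-1 bytes as if they were UTF-8. To convert from ISO-8859-1 to UTF-8 you would normally need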
lineS = line.decode('ISO-8859-1', 'ignore').encode('utf-8')
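A minimal Python 3 sketch of the difference, using the letters from the question (in Python 3 only bytes objects have .decode(), so the byte string is built explicitly). 'ø' is stored as the single byte 0xF8 in ISO-8859-1, which is exactly the byte the traceback complains about, because 0xF8 can never start a UTF-8 sequence:

```python
# 'øæå' as stored on disk in an ISO-8859-1 file: one byte per letter.
raw = "øæå".encode("iso-8859-1")
print(raw)  # b'\xf8\xe6\xe5'

# Wrong: treating ISO-8859-1 bytes as if they were UTF-8.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

# Right: decode with the encoding the bytes were written in, then encode to UTF-8.
utf8 = raw.decode("iso-8859-1").encode("utf-8")
print(utf8)  # b'\xc3\xb8\xc3\xa6\xc3\xa5'
```

Note that in UTF-8 each of these letters takes two bytes, which is why the two encodings cannot be mixed up silently.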
However, the codecs.open() call has already decoded the data from ISO-8859-1 into unicode, so all you need is
lineS = line.encode('utf-8')
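For comparison, a sketch of the whole loop in Python 3, where open() takes an encoding argument and the csv module writes str directly, so no manual encode/decode is needed at all. It swaps the manual split(';') and s[1:-1] for csv.reader with delimiter=';', assuming every field is wrapped in double quotes; the function name convert and the output path are hypothetical. It also passes a list to writerow(), since passing a bare string such as date writes one character per column.

```python
import csv


def convert(src_path, dst_path):
    """Hypothetical helper: ISO-8859-1 ';'-separated text in, UTF-8 CSV out."""
    # The encoding= arguments do all the conversion: lines read from src are
    # already str (Unicode), and writing to dst encodes them as UTF-8.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        out = csv.writer(dst)
        # csv.reader splits on ';' and strips the surrounding double quotes;
        # iterating over it already yields every line, so no readline().
        for fields in csv.reader(src, delimiter=";"):
            if not fields:
                continue  # csv.reader yields [] for a blank line
            date = fields[0].replace(".", "/")
            # writerow() expects a sequence of fields, not a single string.
            out.writerow([date] + fields[1:])
```

Usage, with a hypothetical output file name: convert('transaksjonliste.txt', 'transaksjoner.csv').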