I have a binary file and I want to extract all the ASCII characters from it while ignoring the non-ASCII ones. Currently I have:
with open(filename, 'rb') as fobj:
    text = fobj.read().decode('utf-16-le')
    file = open("text.txt", "w")
    file.write("{}".format(text))
    file.close()
However, writing to the file fails with the error UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128). How do I get Python to ignore the non-ASCII characters?
Use the built-in ASCII codec and tell it to ignore any errors, for example:
with open(filename, 'rb') as fobj:
    text = fobj.read().decode('utf-16-le')
    file = open("text.txt", "w")
    # 'ignore' drops every character the ASCII codec cannot encode
    file.write("{}".format(text.encode('ascii', 'ignore')))
    file.close()
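The snippet above is Python 2. On Python 3, where str is already Unicode, a roughly equivalent sketch might look like the following. The names extract_ascii and convert_file are made up for illustration, the input is assumed to be UTF-16-LE as in the question, and passing errors="ignore" to the decode step is an extra assumption so that a stray trailing byte does not abort the conversion.

```python
def extract_ascii(data, encoding="utf-16-le"):
    """Decode raw bytes and drop every character outside the ASCII range."""
    # Assumption: undecodable byte sequences are skipped rather than raised.
    text = data.decode(encoding, errors="ignore")
    # Encode with 'ignore' to discard non-ASCII characters, then return a str.
    return text.encode("ascii", errors="ignore").decode("ascii")


def convert_file(src_path, dst_path, encoding="utf-16-le"):
    """Read src_path as binary and write only its ASCII characters to dst_path."""
    with open(src_path, "rb") as src:
        cleaned = extract_ascii(src.read(), encoding)
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(cleaned)
```

Calling convert_file(filename, "text.txt") would then mirror the original snippet, with both files closed automatically by the with blocks.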
You can test this and play around with it in the Python interpreter:
>>> s = u'hello \u00a0 there'
>>> s
u'hello \xa0 there'
Simply trying to convert it to a string raises an exception.
>>> str(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)
...as does trying to encode the unicode string as ASCII:
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 6: ordinal not in range(128)
...but telling the codec to ignore the characters it can't handle works:
>>> s.encode('ascii', 'ignore')
'hello  there'
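If silently dropping characters is too lossy, the same encode call accepts other standard error handlers. The short comparison below is only a sketch in Python 3 syntax (so the results are bytes objects), and the variable names are illustrative:

```python
s = "hello \u00a0 there"

# 'ignore' removes the character, leaving the two surrounding spaces.
ignored = s.encode("ascii", "ignore")

# 'replace' substitutes a question mark for each unencodable character.
replaced = s.encode("ascii", "replace")

# 'backslashreplace' keeps a readable escape sequence instead.
escaped = s.encode("ascii", "backslashreplace")

print(ignored, replaced, escaped)
```

Which handler fits depends on whether the output needs to show where non-ASCII data used to be.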