I have this piece of code in a function that replaces badly encoded foreign characters in a string:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s)
# b"String from an old database with weird mixed encodings"
I need a "real" string here, not bytes. But when I try to decode it, I get an exception:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s.decode("utf-8"))
# AttributeError: 'str' object has no attribute 'decode'
Thanks in advance!
Edit:
pypyodbc in Python 3 uses Unicode everywhere by default. That confused me. When connecting, you can tell it to use ANSI.
con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)
Then I can decode what is returned using cp850, which is the database's original code page.
str(odbc_str, "cp850", "replace")
There is no longer any need to replace each special character by hand. Many thanks, pepr.
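As a small illustration of the conversion in the edit, the sketch below decodes a hand-written bytes value with cp850. The name raw and its content are made up for the example; real data would come from the ODBC driver.

```python
# Hypothetical sample: bytes as they might arrive from a cp850 database.
raw = b"caf\x82 cr\x8ame"

# Passing an encoding makes str() decode the bytes instead of
# returning their repr.
text = str(raw, "cp850", "replace")   # 'café crème'

# "replace" substitutes U+FFFD for bytes the codec cannot decode,
# shown here with UTF-8, where a lone 0xFF byte is invalid.
placeholder = str(b"\xff", "utf-8", "replace")   # '\ufffd'
```

In cp850 the bytes 0x82 and 0x8A are é and è, which matches the two manual replacements in the question.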
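The printed b"String from an old database with weird mixed encodings" is not the representation of the string content. It is the value of the string content, because you did not pass the encoding argument to str() (see the doc https://docs.python.org/3.4/library/stdtypes.html#str):
If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).
This is what happened in your case. The b" is actually two characters that are part of the string content. You can also try: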
s1 = 'String from an old database with weird mixed encodings'
print(type(s1), repr(s1))
by = bytes(s1, 'cp1252')
print(type(by), repr(by))
s2 = str(by)
print(type(s2), repr(s2))
It prints:
<class 'str'> 'String from an old database with weird mixed encodings'
<class 'bytes'> b'String from an old database with weird mixed encodings'
<class 'str'> "b'String from an old database with weird mixed encodings'"
This is the reason why s[2:][:-1] works for you.
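To make the difference concrete, here is a minimal sketch that contrasts str() with and without an encoding. The sample text "café" is only an illustration and is not taken from the question.

```python
by = bytes("café", "cp1252")      # b'caf\xe9'

as_repr = str(by)                 # no encoding: the repr, "b'caf\\xe9'"
as_text = str(by, "cp1252")       # with encoding: decoded text, 'café'
same_text = by.decode("cp1252")   # equivalent, more common spelling

# Slicing off the leading b' and the trailing ' leaves the escaped form:
# a backslash followed by x, e, 9, not the decoded character.
sliced = as_repr[2:-1]            # 'caf\\xe9'
```

This is also why replacing '\\x82' matches in the question's code: the string really contains a backslash followed by x82, not a single character.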
If you think more about it, then (in my opinion) there are two options. Either you get bytes or bytearray from the database (if possible) and fix the bytes (see bytes.translate https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#bytes.translate), or you successfully get the string (being lucky that there was no exception when constructing that string) and you replace the wrong characters with the correct ones (see also str.translate() https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.translate).
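Both options can be sketched roughly as follows. The two-character mapping is hypothetical and assumes cp850 data that was read as cp1252; a real table would have to cover every affected character.

```python
# Byte level: map cp850 0x82 / 0x8A to cp1252 0xE9 / 0xE8 (é / è),
# then decode the corrected bytes as cp1252.
byte_table = bytes.maketrans(b"\x82\x8a", b"\xe9\xe8")
fixed_from_bytes = b"caf\x82 cr\x8ame".translate(byte_table).decode("cp1252")

# String level: the same two bytes misread as cp1252 show up as
# U+201A and U+0160, which are mapped back to é and è.
char_table = str.maketrans({"\u201a": "é", "\u0160": "è"})
fixed_from_str = "caf\u201a cr\u0160me".translate(char_table)
```

Both variables end up holding 'café crème'. Note that bytes.translate() only maps single bytes to single bytes, so it fits single-byte code pages like these two.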
Possibly, ODBC used the wrong encoding internally. (That is, the content of the database may be correct, but ODBC misinterpreted it, and you are not able to tell ODBC what the correct encoding is.) In that case you want to encode the string back to bytes using that wrong encoding, and then decode the bytes using the right encoding.
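A minimal sketch of that round trip, assuming the wrong encoding was cp1252 and the right one is cp850 (both are guesses based on the question; redecode is a hypothetical helper name):

```python
def redecode(text, wrong="cp1252", right="cp850"):
    """Undo a decode that was done with the wrong codec (hypothetical helper)."""
    return text.encode(wrong).decode(right)

# Simulate the misinterpretation: cp850 bytes decoded as cp1252.
garbled = b"caf\x82 cr\x8ame".decode("cp1252")   # 'caf‚ crŠme'
fixed = redecode(garbled)                         # 'café crème'
```

This only works when the wrong decode lost no information. Python's cp1252 codec leaves a few byte values undefined, so data containing those bytes may already have been altered or rejected before this step.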