I was playing around with Python's Unicode and encoding methods. I used the special character "‽" and a Chinese character to see how the different UTF encodings handle them, and I got this output:
>>> a = u"‽"
>>> encoded_a = a.encode('utf-32')
>>> a
u'\u203d'
>>> encoded_a
'\xff\xfe\x00\x00= \x00\x00'
>>> b = u"安"
>>> encoded_b = b.encode('utf-32')
>>> b
u'\u5b89'
>>> encoded_b
'\xff\xfe\x00\x00\x89[\x00\x00'
My question is: what do the equals sign and the square bracket mean in the encoded result?
When you print the repr of a byte string, any byte value in the range \x20 through \x7e is shown as the equivalent printable ASCII character instead of as a \x escape. In this case, = is the same as \x3d and [ is the same as \x5b. You also missed the space, which is \x20.
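A quick way to see the boundaries of that rule is to build a few bytes on either side of the printable range and look at their repr. This is a sketch assuming Python 3, where encoding returns a `bytes` object whose repr follows the same rule; `hex_escape` is a hypothetical helper name, not a standard library function, for when you want every byte shown as an escape.

```python
# Python 3: bytes just outside and inside the printable range 0x20-0x7e
sample = bytes([0x1f, 0x20, 0x3d, 0x5b, 0x7e, 0x7f])
print(repr(sample))  # b'\x1f =[~\x7f'


def hex_escape(data: bytes) -> str:
    """Hypothetical helper: show every byte as \\xNN, printable or not."""
    # Iterating over a bytes object yields ints in Python 3
    return "".join("\\x{:02x}".format(byte) for byte in data)


print(hex_escape(sample))  # \x1f\x20\x3d\x5b\x7e\x7f
```

Only 0x1f and 0x7f keep their escapes in the repr; everything in between is displayed as a character, even though the underlying byte values are unchanged.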
Here's the complete table:
\x20 ' ' \x21 '!' \x22 '"' \x23 '#'
\x24 '$' \x25 '%' \x26 '&' \x27 '''
\x28 '(' \x29 ')' \x2a '*' \x2b '+'
\x2c ',' \x2d '-' \x2e '.' \x2f '/'
\x30 '0' \x31 '1' \x32 '2' \x33 '3'
\x34 '4' \x35 '5' \x36 '6' \x37 '7'
\x38 '8' \x39 '9' \x3a ':' \x3b ';'
\x3c '<' \x3d '=' \x3e '>' \x3f '?'
\x40 '@' \x41 'A' \x42 'B' \x43 'C'
\x44 'D' \x45 'E' \x46 'F' \x47 'G'
\x48 'H' \x49 'I' \x4a 'J' \x4b 'K'
\x4c 'L' \x4d 'M' \x4e 'N' \x4f 'O'
\x50 'P' \x51 'Q' \x52 'R' \x53 'S'
\x54 'T' \x55 'U' \x56 'V' \x57 'W'
\x58 'X' \x59 'Y' \x5a 'Z' \x5b '['
\x5c '\' \x5d ']' \x5e '^' \x5f '_'
\x60 '`' \x61 'a' \x62 'b' \x63 'c'
\x64 'd' \x65 'e' \x66 'f' \x67 'g'
\x68 'h' \x69 'i' \x6a 'j' \x6b 'k'
\x6c 'l' \x6d 'm' \x6e 'n' \x6f 'o'
\x70 'p' \x71 'q' \x72 'r' \x73 's'
\x74 't' \x75 'u' \x76 'v' \x77 'w'
\x78 'x' \x79 'y' \x7a 'z' \x7b '{'
\x7c '|' \x7d '}' \x7e '~'
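If you would rather regenerate the table than copy it, a short Python 3 loop is enough. `printable_table` is a hypothetical helper name; it only mimics the layout above (four entries per row, each character in plain single quotes, not its repr).

```python
def printable_table(columns=4):
    """Build rows of \\xNN 'c' pairs for byte values 0x20 through 0x7e."""
    cells = ["\\x{:02x} '{}'".format(code, chr(code)) for code in range(0x20, 0x7f)]
    # Group the 95 cells into rows of `columns` entries each
    return [" ".join(cells[i:i + columns]) for i in range(0, len(cells), columns)]


for row in printable_table():
    print(row)
```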
Your two strings are actually '\xff\xfe\x00\x00\x3d\x20\x00\x00' and '\xff\xfe\x00\x00\x89\x5b\x00\x00'.
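Reading those bytes back also shows where the = and [ come from. The first four bytes, \xff\xfe\x00\x00, are the little-endian UTF-32 byte order mark, which Python's 'utf-32' codec prepends in the machine's native byte order (so a big-endian machine would produce different bytes). The remaining four bytes are the code point as a little-endian 32-bit integer: U+203D is stored as \x3d\x20\x00\x00 and U+5B89 as \x89\x5b\x00\x00. The sketch below assumes Python 3 and simply reuses the byte values from the question.

```python
import codecs
import struct

encoded_a = b"\xff\xfe\x00\x00= \x00\x00"      # u"\u203d" from the question
encoded_b = b"\xff\xfe\x00\x00\x89[\x00\x00"   # u"\u5b89" from the question

for data in (encoded_a, encoded_b):
    bom, payload = data[:4], data[4:]
    # The first four bytes are the UTF-32 little-endian byte order mark
    print(bom == codecs.BOM_UTF32_LE)
    # The rest is one code point stored as an unsigned 32-bit little-endian int
    (code_point,) = struct.unpack("<I", payload)
    # The 'utf-32' decoder reads the BOM, strips it, and decodes the rest
    print(hex(code_point), data.decode("utf-32") == chr(code_point))
```

This prints `True` followed by `0x203d True` for the first string, then `True` followed by `0x5b89 True` for the second.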