Python 2.7: what do the special characters mean in the UTF-32 encoding output of a unicode string?

David Zheng

I was playing around with Python's unicode and encoding methods. I used the special character "‽" and a Chinese character to see how the different UTF encodings deal with these characters, and I got this output:

>>> a = u"‽"
>>> encoded_a = a.encode('utf-32')
>>> a
u'\u203d'
>>> encoded_a
'\xff\xfe\x00\x00= \x00\x00'
>>> b = u"安"
>>> encoded_b = b.encode('utf-32')
>>> b
u'\u5b89'
>>> encoded_b
'\xff\xfe\x00\x00\x89[\x00\x00'

My question is: what do the equal sign and the square bracket mean in the encoded result?

Mark Ransom

When you print the repr of a byte string, any byte value in the range \x20 through \x7e is shown as the equivalent printable ASCII character rather than as a hex escape. In this case, = is the same as \x3d and [ is the same as \x5b. You missed the space, which is \x20. The backslash, \x5c, is shown doubled as \\ so that it cannot be mistaken for the start of an escape.

Here's the complete table:

\x20 ' '    \x21 '!'    \x22 '"'    \x23 '#'
\x24 '$'    \x25 '%'    \x26 '&'    \x27 '''
\x28 '('    \x29 ')'    \x2a '*'    \x2b '+'
\x2c ','    \x2d '-'    \x2e '.'    \x2f '/'
\x30 '0'    \x31 '1'    \x32 '2'    \x33 '3'
\x34 '4'    \x35 '5'    \x36 '6'    \x37 '7'
\x38 '8'    \x39 '9'    \x3a ':'    \x3b ';'
\x3c '<'    \x3d '='    \x3e '>'    \x3f '?'
\x40 '@'    \x41 'A'    \x42 'B'    \x43 'C'
\x44 'D'    \x45 'E'    \x46 'F'    \x47 'G'
\x48 'H'    \x49 'I'    \x4a 'J'    \x4b 'K'
\x4c 'L'    \x4d 'M'    \x4e 'N'    \x4f 'O'
\x50 'P'    \x51 'Q'    \x52 'R'    \x53 'S'
\x54 'T'    \x55 'U'    \x56 'V'    \x57 'W'
\x58 'X'    \x59 'Y'    \x5a 'Z'    \x5b '['
\x5c '\'    \x5d ']'    \x5e '^'    \x5f '_'
\x60 '`'    \x61 'a'    \x62 'b'    \x63 'c'
\x64 'd'    \x65 'e'    \x66 'f'    \x67 'g'
\x68 'h'    \x69 'i'    \x6a 'j'    \x6b 'k'
\x6c 'l'    \x6d 'm'    \x6e 'n'    \x6f 'o'
\x70 'p'    \x71 'q'    \x72 'r'    \x73 's'
\x74 't'    \x75 'u'    \x76 'v'    \x77 'w'
\x78 'x'    \x79 'y'    \x7a 'z'    \x7b '{'
\x7c '|'    \x7d '}'    \x7e '~'
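The table above can be regenerated rather than typed out. The following is a minimal sketch; the helper name printable_table is hypothetical and is not part of the original answer. It pairs each byte value from \x20 through \x7e with the character that repr would show in its place.

```python
def printable_table(first=0x20, last=0x7e):
    """Return (escape, character) pairs for the printable ASCII range."""
    return [('\\x%02x' % code, chr(code)) for code in range(first, last + 1)]


rows = printable_table()
for escape, char in rows:
    # %r quotes the character the same way the interpreter would.
    print('%s %r' % (escape, char))
```

The sketch should run unchanged on Python 2.7 and Python 3, since it only uses chr on values below 128.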

Your two strings are actually '\xff\xfe\x00\x00\x3d\x20\x00\x00' and '\xff\xfe\x00\x00\x89\x5b\x00\x00'.
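To check this yourself, you can force every byte to be displayed as a hex escape. The sketch below is only an illustration: the helper names hex_escape and code_points are hypothetical. It also goes one step beyond the answer by reading the bytes back as code points, on the assumption (true for the output in the question) that the data starts with the little-endian UTF-32 byte order mark \xff\xfe\x00\x00 and that each character then occupies four bytes, least significant byte first.

```python
import binascii
import codecs
import struct


def hex_escape(data):
    """Render every byte of a byte string as a \\xNN escape."""
    hexed = binascii.hexlify(data).decode('ascii')
    return ''.join('\\x' + hexed[i:i + 2] for i in range(0, len(hexed), 2))


def code_points(data):
    """Read code points from UTF-32 bytes that start with a little-endian BOM."""
    if data[:4] != codecs.BOM_UTF32_LE:
        raise ValueError('expected a little-endian UTF-32 byte order mark')
    payload = data[4:]
    # Each code point is one unsigned 32-bit little-endian integer.
    values = struct.unpack('<%dI' % (len(payload) // 4), payload)
    return ['U+%04X' % value for value in values]


# The two byte strings from the question, exactly as repr displayed them.
for encoded in (b'\xff\xfe\x00\x00= \x00\x00', b'\xff\xfe\x00\x00\x89[\x00\x00'):
    print('%s  %s' % (hex_escape(encoded), ', '.join(code_points(encoded))))
```

Read this way, the bytes \x3d\x20\x00\x00 are the little-endian form of U+203D, and \x89\x5b\x00\x00 is U+5B89, which matches the u'\u203d' and u'\u5b89' shown in the question.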


Edited at 2021-02-28
