Python 2.7: what do the special characters mean in the UTF-32 encoding output of a unicode string?

David Zheng

I was playing around with Python's unicode and encoding methods. I used the special character "‽" and a Chinese character to see how the different UTF encodings deal with these characters, and I got this output:

>>> a = u"‽"
>>> encoded_a = a.encode('utf-32')
>>> a
u'\u203d'
>>> encoded_a
'\xff\xfe\x00\x00= \x00\x00'
>>> b = u"安"
>>> encoded_b = b.encode('utf-32')
>>> b
u'\u5b89'
>>> encoded_b
'\xff\xfe\x00\x00\x89[\x00\x00'

My question is: what do the equal sign and the square bracket mean in the encoded result?

Mark Ransom

When you print the repr of a byte string, any byte value in the range \x20 through \x7e is shown as the equivalent printable ASCII character rather than as a hex escape (the backslash itself is shown doubled, as \\, and the quote character is escaped when necessary). In this case, = is the same as \x3d and [ is the same as \x5b. You missed the space, which is \x20.

Here's the complete table:

\x20 ' '    \x21 '!'    \x22 '"'    \x23 '#'
\x24 '$'    \x25 '%'    \x26 '&'    \x27 '''
\x28 '('    \x29 ')'    \x2a '*'    \x2b '+'
\x2c ','    \x2d '-'    \x2e '.'    \x2f '/'
\x30 '0'    \x31 '1'    \x32 '2'    \x33 '3'
\x34 '4'    \x35 '5'    \x36 '6'    \x37 '7'
\x38 '8'    \x39 '9'    \x3a ':'    \x3b ';'
\x3c '<'    \x3d '='    \x3e '>'    \x3f '?'
\x40 '@'    \x41 'A'    \x42 'B'    \x43 'C'
\x44 'D'    \x45 'E'    \x46 'F'    \x47 'G'
\x48 'H'    \x49 'I'    \x4a 'J'    \x4b 'K'
\x4c 'L'    \x4d 'M'    \x4e 'N'    \x4f 'O'
\x50 'P'    \x51 'Q'    \x52 'R'    \x53 'S'
\x54 'T'    \x55 'U'    \x56 'V'    \x57 'W'
\x58 'X'    \x59 'Y'    \x5a 'Z'    \x5b '['
\x5c '\'    \x5d ']'    \x5e '^'    \x5f '_'
\x60 '`'    \x61 'a'    \x62 'b'    \x63 'c'
\x64 'd'    \x65 'e'    \x66 'f'    \x67 'g'
\x68 'h'    \x69 'i'    \x6a 'j'    \x6b 'k'
\x6c 'l'    \x6d 'm'    \x6e 'n'    \x6f 'o'
\x70 'p'    \x71 'q'    \x72 'r'    \x73 's'
\x74 't'    \x75 'u'    \x76 'v'    \x77 'w'
\x78 'x'    \x79 'y'    \x7a 'z'    \x7b '{'
\x7c '|'    \x7d '}'    \x7e '~'
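
To see the bytes without this substitution, you can format every byte as a hex escape yourself. The following is only a minimal sketch: hex_escape is a hypothetical helper name, not a built-in, and the byte strings are copied from the question rather than re-encoded, so the result does not depend on the machine's byte order.

```python
def hex_escape(data):
    """Write every byte of a byte string as a \\xNN escape (hypothetical helper)."""
    # bytearray yields integers on both Python 2 and Python 3.
    return "".join("\\x%02x" % byte for byte in bytearray(data))

# Byte strings copied from the question's output.
encoded_a = b'\xff\xfe\x00\x00= \x00\x00'
encoded_b = b'\xff\xfe\x00\x00\x89[\x00\x00'

print(hex_escape(encoded_a))  # \xff\xfe\x00\x00\x3d\x20\x00\x00
print(hex_escape(encoded_b))  # \xff\xfe\x00\x00\x89\x5b\x00\x00
```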

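Your two strings are actually '\xff\xfe\x00\x00\x3d\x20\x00\x00' and '\xff\xfe\x00\x00\x89\x5b\x00\x00'. In each one, the first four bytes, \xff\xfe\x00\x00, are the UTF-32 byte order mark (little-endian here), and the remaining four bytes are the code point stored least significant byte first: 3d 20 00 00 is U+203D and 89 5b 00 00 is U+5B89.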
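
If you want to check that layout yourself, the bytes can be split into the byte order mark and the 32-bit code points. This is a rough sketch that assumes the data starts with the little-endian BOM, as in the question; split_utf32 is a hypothetical helper name, while codecs.BOM_UTF32_LE and struct.unpack come from the standard library.

```python
import codecs
import struct

def split_utf32(data):
    """Split BOM-prefixed little-endian UTF-32 bytes into (bom, code points)."""
    bom, payload = data[:4], data[4:]
    # Assumption: the data starts with the little-endian BOM, as in the question.
    if bom != codecs.BOM_UTF32_LE:
        raise ValueError("expected a little-endian UTF-32 BOM")
    count = len(payload) // 4
    # '<' = little-endian, 'I' = unsigned 32-bit integer, one per code point.
    return bom, list(struct.unpack("<%dI" % count, payload))

bom, code_points = split_utf32(b'\xff\xfe\x00\x00= \x00\x00')
print(["U+%04X" % cp for cp in code_points])  # ['U+203D']
```

The same call on the second string from the question yields the single code point 0x5B89, which matches u'\u5b89'.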
