What is the difference between a unicode and binary string?

showkey

I am using Python 3.3.

What is the difference between a unicode string and a binary string?

b'\\u4f60'
u'\x4f\x60'
b'\x4f\x60'
u'4f60'

The concept of Unicode and binary strings is confusing. How can I change b'\\u4f60' into b'\x4f\x60'?

roippi

First, there is no difference between unicode literals and string literals in Python 3. They are one and the same; you can drop the u up front and just write strings. So you should see right away that the literal u'4f60' is exactly the same as writing '4f60'.
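A quick check in a Python 3.3+ interpreter, using the two u literals from the question, shows that the prefix changes nothing. This sketch assumes Python 3.3 or later, where the u prefix is accepted again purely for compatibility:

```python
# In Python 3.3+, the u prefix is accepted but has no effect:
# both spellings produce an ordinary str.
prefixed = u'4f60'
plain = '4f60'
print(type(prefixed).__name__, prefixed == plain)   # str True

# u'\x4f\x60' is a two-character str: 'O' (U+004F) and '`' (U+0060).
escaped = u'\x4f\x60'
print(escaped, len(escaped), type(escaped).__name__)   # O` 2 str
```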

A bytes literal, aka b'some literal', is a series of bytes. Bytes from 32 to 126 (printable ASCII) are displayed as their corresponding character; the rest are displayed in \x escaped form (a few, such as tab and newline, get the shorter \t and \n). Don't be confused by this: b'\x61' is the same as b'a'. It's just a matter of printing.
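The display rule is easy to see in an interpreter. This sketch builds a bytes object from a handful of byte values chosen only for illustration (tab, 'A', DEL, and one non-ASCII byte) and prints its repr:

```python
# b'\x61' and b'a' are the same single byte (0x61); only the spelling differs.
print(b'\x61' == b'a')                 # True

# repr() shows printable ASCII (32..126) as characters and escapes the rest;
# tab gets the short form \t instead of \x09.
sample = bytes([9, 65, 127, 200])
print(repr(sample))                    # b'\tA\x7f\xc8'

# The asker's b'\x4f\x60' is displayed as b'O`' for the same reason.
print(b'\x4f\x60')                     # b'O`'
```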

A string literal produces a str, which is a sequence of Unicode code points. There is far too much to cover to explain how Unicode works here, but basically a code point identifies a character (roughly, a letter, digit, or symbol; the glyph is how that character is drawn). It does not specify how the machine needs to store it. In fact, there are a great many different ways.
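To make the split between a code point and its storage concrete, this sketch takes the character from the question and encodes it three ways. The three encodings are just common examples, not an exhaustive list:

```python
ch = '\u4f60'  # the Han character 你, code point U+4F60

# The code point is just a number; it says nothing about bytes.
print(hex(ord(ch)))                    # 0x4f60

# The same code point, stored three different ways.
for enc in ('utf-8', 'utf-16-be', 'utf-32-be'):
    data = ch.encode(enc)
    print(enc, len(data), data)
# utf-8 3 b'\xe4\xbd\xa0'
# utf-16-be 2 b'O`'
# utf-32-be 4 b'\x00\x00O`'
```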

Thus there is a very large difference between bytes literals and str literals. The former describe the machine representation; the latter describe the characters that we are reading right now. The mapping between the two domains is encoding (str to bytes) and decoding (bytes to str).
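The two directions of that mapping are str.encode and bytes.decode. A minimal round trip, assuming UTF-8 as the chosen encoding:

```python
text = '\u4f60'                # str: a sequence of code points
data = text.encode('utf-8')    # str -> bytes (encoding)
back = data.decode('utf-8')    # bytes -> str (decoding)

print(type(text).__name__, type(data).__name__)   # str bytes
print(back == text)                               # True

# Decoding with a different encoding gives different text (or an error).
print(data.decode('latin-1') == text)             # False
```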

I'm skipping over a lot of vital information here. That should get us somewhere though. I highly recommend reading more since this is not an easy topic.


How can I change b'\\u4f60' into b'\x4f\x60'?

Let's walk through it:

In [101]: b'\u4f60'
Out[101]: b'\\u4f60'   # note: \u is not an escape inside a bytes literal, so the backslash is kept

In [102]: b'\x4f\x60'
Out[102]: b'O`'

In [103]: '\u4f60'
Out[103]: '你'

So notice that \u4f60 is that Han ideograph (U+4F60). \x4f\x60, if we interpret it as ASCII (or UTF-8, actually), is the letter O (\x4f) followed by a backtick (\x60).
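Looking at the two bytes one at a time shows where the O and the backtick come from. ASCII and UTF-8 agree here because both bytes are below 0x80:

```python
raw = b'\x4f\x60'

# Indexing a bytes object gives ints in Python 3.
print(raw[0], raw[1])                  # 79 96
print(chr(raw[0]), chr(raw[1]))        # O `

# For bytes below 0x80, ASCII and UTF-8 decode identically.
print(raw.decode('ascii'), raw.decode('utf-8'))   # O` O`
```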

I can ask Python to turn that unicode-escaped bytes sequence into a valid string with the corresponding Unicode character:

In [112]: b'\\u4f60'.decode('unicode-escape')
Out[112]: '你'
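The unicode-escape codec works in both directions, so encoding the character with it again recovers the original escaped bytes. A small sketch:

```python
escaped = b'\\u4f60'          # six bytes: backslash, 'u', '4', 'f', '6', '0'
print(len(escaped))            # 6

char = escaped.decode('unicode-escape')
print(char == '\u4f60', len(char))          # True 1

# Going back the other way re-creates the escape sequence.
print(char.encode('unicode-escape'))        # b'\\u4f60'
```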

So now all we need to do is to re-encode to bytes, right? Well...

Coming around to what I think you actually want to ask:

How can I change '\u4f60' into its proper bytes representation?

There is no 'proper' bytes representation of that Unicode code point. There is only its representation in whichever encoding you choose. It so happens that one encoding maps it directly to b'\x4f\x60': UTF-16-BE (big-endian UTF-16).
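The same two bytes only mean something relative to an encoding. Decoding b'\x4f\x60' as big-endian and as little-endian UTF-16 gives two different characters (U+4F60 and U+604F), which is one way to see that no single byte sequence is the 'proper' one:

```python
raw = b'\x4f\x60'

be = raw.decode('utf-16-be')   # reads the 16-bit unit as 0x4F60
le = raw.decode('utf-16-le')   # reads the same bytes as 0x604F

print(hex(ord(be)), hex(ord(le)))   # 0x4f60 0x604f
print(be == '\u4f60')               # True
```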

In [47]: b'\\u4f60'.decode('unicode-escape').encode('utf-16-be')
Out[47]: b'O`'

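This works because of how UTF-16 encodes code points. It is a variable-length encoding: every code point up to U+FFFF is stored as a single 16-bit unit equal to the code point itself, and the big-endian variant writes the high byte first, giving exactly \x4f\x60. Code points above U+FFFF are stored as two 16-bit units called "surrogate pairs", which I won't get into.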
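The claim that UTF-16 uses the code point directly can be checked by comparing the UTF-16-BE output with the code point written as a two-byte big-endian integer. The sketch also shows that a character above U+FFFF (U+1F600, picked only as an example) takes four bytes, and that the plain utf-16 codec prepends a byte-order mark, so it is not a drop-in replacement for utf-16-be here:

```python
ch = '\u4f60'

# For code points up to U+FFFF, UTF-16-BE is the code point as a 2-byte big-endian int.
direct = ord(ch).to_bytes(2, 'big')
print(direct == ch.encode('utf-16-be'))       # True

# A code point above U+FFFF needs a surrogate pair: two 16-bit units, 4 bytes.
emoji = '\U0001F600'
print(len(emoji.encode('utf-16-be')))          # 4

# Plain 'utf-16' adds a 2-byte BOM and uses the platform's native byte order.
with_bom = ch.encode('utf-16')
print(len(with_bom))                           # 4
print(with_bom.decode('utf-16') == ch)         # True
```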

