Strange behavior of std::string with Unicode

justanothercoder

I have the following piece of code:

#include <iostream>
#include <string>

std::string eps("ε");

int main()
{
    std::cout << eps << '\n';
    return 0;
}

Somehow it compiles with g++ and clang on Ubuntu, and even prints the right character, ε. I also have an almost identical piece of code that happily reads ε from cin into a std::string. By the way, eps.size() is 2.

My question is: how does this work? How can we put a Unicode character into a std::string? My guess is that the operating system handles all the Unicode work, but I'm not sure.

EDIT

As for output, I now understand that the terminal is responsible for showing me the right character (ε in this case).

But what about input? cin reads characters up to ' ' or any other whitespace character (byte by byte, as I understand it). So if I take Ƞ, whose second byte is 32 (' '), it should read only the first byte and then stop. But it reads Ƞ. How?

NPE

The most likely reason is that everything is encoded in UTF-8, as it is on my system:

$ xxd test.cpp
...
0000020: 2065 7073 2822 ceb5 2229 3b0a 0a69 6e74   eps("..");..int
                        ^^^^ ε in UTF-8                 ^^ TWO bytes!
...
$ g++ -o test.out test.cpp
$ ./test.out 
ε
$ ./test.out | xxd
0000000: ceb5 0a
         ^^^^              

