I have the following piece of code:
#include <iostream>
#include <string>
std::string eps("ε");
int main()
{
    std::cout << eps << '\n';
    return 0;
}
Somehow it compiles with g++ and clang on Ubuntu, and it even prints the right character, ε. I also have an almost identical piece of code that happily reads ε with cin into a std::string. By the way, eps.size() is 2.
My question is: how does this work? How can we put a Unicode character into a std::string? My guess is that the operating system handles all the Unicode work, but I'm not sure.
EDIT
As for output, I now understand that the terminal is responsible for showing me the right character (ε in this case).
But what about input? cin reads characters up to ' ' or any other whitespace character (byte by byte, as I understand it). So if I take Ƞ, whose second byte is 32 (' '), it should read only the first byte and then stop. But it reads the whole Ƞ. How?
The most likely reason is that everything is encoded in UTF-8, as it is on my system:
$ xxd test.cpp
...
0000020: 2065 7073 2822 ceb5 2229 3b0a 0a69 6e74   eps("..");..int
                        ^^^^ ε in UTF-8                 ^^ TWO bytes!
...
$ g++ -o test.out test.cpp
$ ./test.out
ε
$ ./test.out | xxd
0000000: ceb5 0a
         ^^^^