Is it OK to write Unicode characters inside string literals in C++?

FrozenHeart

Is it ok to write the following code?

const char* str = "§some-text";

Will str contain the correct UTF-8 representation of the § character if the source file is saved in UTF-8 encoding?

Or is the only way to do this to use u8-prefixed string literals?

Simple

Whether you can use Unicode characters directly in your source code (not just in string literals) is implementation-defined. The only portable approach is to stick to characters in the "basic source character set" and spell the § as a universal-character-name inside a u8 literal: u8"\u00a7some-text".
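A minimal sketch of that portable spelling, assuming a C++11 or later compiler. The helper names literal_bytes and dump are hypothetical, written only for this illustration: they copy a literal's code units into a std::string and print them in hex. The helper is a template over the character type because a u8 literal has type const char[N] before C++20 but const char8_t[N] since C++20, so it cannot simply take a const char*.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies the code units of a string literal (minus the
// terminating NUL) into a std::string. Templated on the element type because
// u8"..." is const char[N] before C++20 and const char8_t[N] since C++20.
template <typename CharT, std::size_t N>
std::string literal_bytes(const CharT (&lit)[N]) {
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out.push_back(static_cast<char>(lit[i]));
    }
    return out;
}

// Only basic source characters appear in this line: \u00a7 is the
// universal-character-name for U+00A7 SECTION SIGN, and the u8 prefix
// requires the literal to be encoded as UTF-8.
const std::string portable = literal_bytes(u8"\u00a7some-text");

// Hypothetical helper: prints each byte of s as two hex digits.
void dump(const std::string& s) {
    for (unsigned char c : s) {
        std::printf("%02X ", static_cast<unsigned>(c));
    }
    std::printf("\n");
}
```

Calling dump(portable) should print C2 A7 (the UTF-8 encoding of §) followed by the ASCII bytes of some-text: 73 6F 6D 65 2D 74 65 78 74. Because the source line itself contains only basic source characters, this result does not depend on how the compiler decodes the file.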

[lex.phases]/1:

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)
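The equivalence this paragraph requires can be checked by spelling the same literal both ways and comparing the results. This sketch assumes the file is saved as UTF-8 and that the compiler is told to read it as UTF-8 (for example with MSVC's /utf-8 option). If the compiler decodes the file with some other encoding, the § is mapped differently in phase 1 and the two literals can differ, which is exactly the implementation-defined step the answer warns about. The helper name same_code_units is made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: compares two string literals code unit by code unit,
// including the terminating NUL.
template <typename CharT, std::size_t N, std::size_t M>
bool same_code_units(const CharT (&a)[N], const CharT (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// The first literal contains the extended character itself; the second uses
// the universal-character-name. Phase 1 requires them to be handled
// equivalently, provided the implementation maps the physical § to U+00A7.
const bool equivalent = same_code_units(u8"§some-text", u8"\u00a7some-text");
```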

The "basic source character set" is:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

0 1 2 3 4 5 6 7 8 9

_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
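The set quoted above is small enough to encode directly. As a rough sketch, the hypothetical functions below flag the bytes of a piece of source text that fall outside it, i.e. the places where a file relies on the implementation-defined mapping. It assumes the text is handled as raw bytes, so every byte of a UTF-8 multi-byte sequence is reported, and a carriage return from CRLF line endings is reported too (phase 1 maps end-of-line indicators separately). It follows the wording quoted here; later revisions of the standard reworded this section.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// True if c is one of the 96 members of the basic source character set
// quoted above: space, horizontal tab, vertical tab, form feed, new-line,
// and the 91 graphical characters.
bool is_basic_source_char(char c) {
    static const char graphical[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    if (c == ' ' || c == '\t' || c == '\v' || c == '\f' || c == '\n') {
        return true;
    }
    // std::strchr would also match the terminating NUL, so exclude it.
    return c != '\0' && std::strchr(graphical, c) != nullptr;
}

// Returns the byte offsets in text that fall outside the basic source
// character set.
std::vector<std::size_t> non_basic_offsets(const std::string& text) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!is_basic_source_char(text[i])) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

For the question's line, both bytes of the UTF-8 encoded § would be reported, while the u8"\u00a7some-text" spelling produces no offsets at all.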
