I am using std::wstring_convert to convert a wstring into a multibyte string as follows:
// convert from wide char to multibyte char
try
{
    return std::wstring_convert<std::codecvt_utf8<wchar_t>>().to_bytes(wideMessage);
}
// thrown by std::wstring_convert::to_bytes() for bad conversions
catch (const std::range_error& exception)
{
    // do something...
}
To unit test the block I have commented as "do something...", I want to pass a wstring that makes the conversion throw a std::range_error exception.
However, I have not been able to construct a wstring that fails the conversion. The wstring uses UTF-16, and I have been reading about high and low surrogates. For example, the UTF-16 code unit D800 (a high surrogate) followed by "b" should be invalid, but
std::wstring(L"\xd800b");
fails to compile, possibly for the same reason. If I build the wstring as below instead, the conversion does not throw:
std::wstring wideMessage(L" b");
wideMessage[0] = L'\xd800';
// doesn't throw
std::wstring_convert<std::codecvt_utf8<wchar_t>>().to_bytes(wideMessage);
Is there a suitable wstring I can use to throw an exception during the conversion?
I have tried 5.1, 5.2 and 5.3 from this link. I am using Visual Studio 2015.
Microsoft's implementation of std::codecvt_utf8 appears to successfully convert any UTF-16 code unit into UTF-8, including lone surrogates. This is a bug, as surrogates are not encodable in UTF-8. Both libc++ (LLVM) and libstdc++ (GCC) correctly throw a std::range_error and refuse to convert unpaired surrogates.
Looking at Microsoft's code, it appears that the only way to make the conversion throw is to pass a character greater than the Maxcode template parameter of the facet. For example:
std::wstring_convert<std::codecvt_utf8<wchar_t, 0x1>>
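As a rough sketch of how that could be used in a test, the helper below lowers Maxcode and reports whether to_bytes() throws. The name conversionThrows and the limit 0x7F are assumptions made for illustration; they do not come from the question. Note that the production code would have to be parameterized on the facet (or on Maxcode) for a test to reach its catch block this way, and that std::wstring_convert and std::codecvt_utf8 are deprecated since C++17, so newer compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the original code): returns true when
// converting `wide` to UTF-8 throws std::range_error.
// Maxcode is the largest code point the facet is allowed to convert.
template <unsigned long Maxcode>
bool conversionThrows(const std::wstring& wide)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t, Maxcode>>().to_bytes(wide);
        return false;
    }
    // thrown by to_bytes() when the facet reports a conversion error
    catch (const std::range_error&)
    {
        return true;
    }
}
```

With this helper, conversionThrows<0x7F>(L"\u00e9") should return true, because U+00E9 is above the 0x7F limit, while conversionThrows<0x7F>(L"abc") should return false.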