I have a situation where a server may arbitrarily break up transmitted UTF-8 string data, including in the middle of a UTF-8 sequence. In the websocket proxy that is receiving this data before it goes to the client, I want to detect that case and have the proxy wait for the next packet from the server and concatenate it with the prior one before sending to the client.
Assuming I am seeing the data from the server as a simple array of bytes, what is the simplest logic I can use to reliably detect the case where those bytes end in the middle of a UTF-8 sequence?
This is the logic I wound up using (in JavaScript):
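    // Returns true if buf ends partway through a multi-byte UTF-8 sequence.
    function incompleteUTF8(buf) {
        // Only the last 6 bytes matter: that is the longest sequence handled here.
        for (var ix = Math.max(buf.length - 6, 0); ix < buf.length; ix++) {
            var ch = buf[ix];
            if (ch < 0x80)
                continue;               // ASCII, always complete
            // Lead byte: advance ix to where the last byte of the sequence should be.
            if ((ch & 0xe0) === 0xc0)
                ix++;                   // 2-byte sequence
            else if ((ch & 0xf0) === 0xe0)
                ix += 2;                // 3-byte sequence
            else if ((ch & 0xf8) === 0xf0)
                ix += 3;                // 4-byte sequence
            // 5- and 6-byte forms come from the original UTF-8 definition;
            // RFC 3629 limits sequences to 4 bytes.
            else if ((ch & 0xfc) === 0xf8)
                ix += 4;
            else if ((ch & 0xfe) === 0xfc)
                ix += 5;
            else
                continue;               // continuation byte (or invalid byte): skip it
            if (ix >= buf.length)
                return true;            // the sequence runs past the end of the buffer
        }
        return false;
    }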
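For the proxy itself, it may be more convenient to know how many trailing bytes to hold back rather than just whether the buffer is incomplete. The following is only a sketch of that idea, not part of the code above: `incompleteTailLength` and `makeReassembler` are hypothetical names, `send` stands in for whatever forwards bytes to the client, and it assumes RFC 3629 UTF-8 (at most 4 bytes per sequence) and otherwise well-formed input.

```javascript
// Hypothetical helper: number of trailing bytes of buf that belong to a
// UTF-8 sequence which has started but not finished (0 if there is none).
// Assumes sequences of at most 4 bytes (RFC 3629).
function incompleteTailLength(buf) {
    // An unfinished lead byte can sit at most 3 positions from the end.
    var start = Math.max(buf.length - 3, 0);
    for (var ix = buf.length - 1; ix >= start; ix--) {
        var ch = buf[ix];
        if ((ch & 0xc0) === 0x80)
            continue;                   // continuation byte: keep looking for its lead
        var need;
        if (ch < 0x80) need = 1;
        else if ((ch & 0xe0) === 0xc0) need = 2;
        else if ((ch & 0xf0) === 0xe0) need = 3;
        else if ((ch & 0xf8) === 0xf0) need = 4;
        else return 0;                  // not a valid lead byte: nothing to wait for
        var have = buf.length - ix;     // bytes of this sequence already received
        return have < need ? have : 0;
    }
    return 0;
}

// Hypothetical wrapper: forwards only complete sequences and keeps the
// unfinished tail until the next chunk arrives.
function makeReassembler(send) {
    var pending = new Uint8Array(0);
    return function onChunk(chunk) {
        var buf = new Uint8Array(pending.length + chunk.length);
        buf.set(pending, 0);
        buf.set(chunk, pending.length);
        var tail = incompleteTailLength(buf);
        pending = buf.slice(buf.length - tail);
        if (buf.length > tail)
            send(buf.subarray(0, buf.length - tail));
    };
}
```

Holding back only the unfinished tail means the complete part of each packet can go to the client immediately instead of waiting for the next packet from the server.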