How do I detect that a string ends in the middle of a UTF-8 sequence?

chaos Published at Dev

I have a situation where a server may arbitrarily break up transmitted UTF-8 string data, including in the middle of a UTF-8 sequence. In the websocket proxy that is receiving this data before it goes to the client, I want to detect that case and have the proxy wait for the next packet from the server and concatenate it with the prior one before sending to the client.

Assuming I am seeing the data from the server as a simple array of bytes, what is the simplest logic I can use to reliably detect the case where those bytes end in the middle of a UTF-8 sequence?

chaos

This is the logic I wound up using (in JavaScript):

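// Returns true if buf (an array of byte values) ends partway through a
// multi-byte UTF-8 sequence. Assumes the bytes are otherwise valid UTF-8.
function incompleteUTF8(buf) {
    // A sequence is at most 6 bytes in the original UTF-8 design, so an
    // unfinished one must start within the last 6 bytes of the buffer.
    for(var ix = Math.max(buf.length - 6, 0); ix < buf.length; ix++) {
        var ch = buf[ix];
        if(ch < 0x80)
            continue;       // single-byte (ASCII) character
        // A lead byte announces the sequence length; jump to its last byte.
        if((ch & 0xe0) === 0xc0)
            ix++;           // 110xxxxx: 2-byte sequence
        else if((ch & 0xf0) === 0xe0)
            ix += 2;        // 1110xxxx: 3-byte sequence
        else if((ch & 0xf8) === 0xf0)
            ix += 3;        // 11110xxx: 4-byte sequence
        else if((ch & 0xfc) === 0xf8)
            ix += 4;        // 111110xx: legacy 5-byte form (not allowed by RFC 3629)
        else if((ch & 0xfe) === 0xfc)
            ix += 5;        // 1111110x: legacy 6-byte form (not allowed by RFC 3629)
        else
            continue;       // continuation byte of a sequence that began before the window, or an invalid byte
        if(ix >= buf.length)
            return true;    // the sequence's last byte lies past the end of the buffer
    }
    return false;
}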
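
For the "wait for the next packet and concatenate" step, it can help to know how many trailing bytes to hold back rather than just whether the buffer is incomplete. The following is a minimal sketch of that idea, not part of the original answer: it scans backwards from the end, and it assumes the input is otherwise valid UTF-8 limited to the 4-byte maximum of RFC 3629. The names incompleteTailLength and createChunkJoiner are hypothetical, chosen only for illustration.

```javascript
// Returns how many bytes at the end of buf belong to an unfinished UTF-8
// sequence, or 0 if buf ends on a character boundary.
// Assumption: buf is an array-like of byte values holding otherwise valid
// UTF-8 with sequences of at most 4 bytes.
function incompleteTailLength(buf) {
    // An unfinished sequence has at most 3 bytes present, so its lead byte
    // must be one of the last 3 bytes.
    var stop = Math.max(buf.length - 3, 0);
    for (var ix = buf.length - 1; ix >= stop; ix--) {
        var ch = buf[ix];
        if ((ch & 0xc0) === 0x80)
            continue;                           // continuation byte: keep looking for the lead
        var need;
        if (ch < 0x80) need = 1;                // ASCII
        else if ((ch & 0xe0) === 0xc0) need = 2;
        else if ((ch & 0xf0) === 0xe0) need = 3;
        else if ((ch & 0xf8) === 0xf0) need = 4;
        else return 0;                          // invalid lead byte: not treated as incomplete
        var have = buf.length - ix;             // bytes of this sequence actually present
        return have < need ? have : 0;
    }
    return 0;                                   // only continuation bytes seen: nothing to hold back
}

// Hypothetical helper for the proxy: returns the bytes that are safe to
// forward now and keeps any unfinished tail for the next chunk.
function createChunkJoiner() {
    var pending = [];
    return function push(chunk) {
        var data = pending.concat(Array.from(chunk));
        var tail = incompleteTailLength(data);
        pending = data.slice(data.length - tail);
        return data.slice(0, data.length - tail);
    };
}

// Usage: the euro sign (E2 82 AC) split across two chunks.
var push = createChunkJoiner();
var first = push([0x61, 0xe2, 0x82]);   // only the "a" is forwarded
var second = push([0xac, 0x62]);        // the completed euro sign, then "b"
```

Holding back only the unfinished tail lets the proxy forward everything else immediately instead of delaying the whole packet.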

Edited at 2021-02-14
