How do I detect that a string ends in the middle of a UTF-8 sequence?

chaos

I have a situation where a server may arbitrarily break up transmitted UTF-8 string data, including in the middle of a UTF-8 sequence. In the websocket proxy that is receiving this data before it goes to the client, I want to detect that case and have the proxy wait for the next packet from the server and concatenate it with the prior one before sending to the client.

Assuming I am seeing the data from the server as a simple array of bytes, what is the simplest logic I can use to reliably detect the case where those bytes end in the middle of a UTF-8 sequence?

chaos

This is the logic I wound up using (in JavaScript):

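function incompleteUTF8(buf) {
    // Only the last 6 bytes matter: 6 is the longest sequence in the original
    // UTF-8 definition. RFC 3629 caps sequences at 4 bytes, so the 5- and
    // 6-byte branches below only fire on legacy or invalid input.
    for(var ix = Math.max(buf.length - 6, 0); ix < buf.length; ix++) {
        var ch = buf[ix];
        if(ch < 0x80)
            continue; // ASCII byte
        // On a lead byte, advance ix to where its last continuation byte should be.
        if((ch & 0xe0) === 0xc0)
            ix++; // 2-byte sequence
        else if((ch & 0xf0) === 0xe0)
            ix += 2; // 3-byte sequence
        else if((ch & 0xf8) === 0xf0)
            ix += 3; // 4-byte sequence
        else if((ch & 0xfc) === 0xf8)
            ix += 4; // 5-byte sequence (legacy)
        else if((ch & 0xfe) === 0xfc)
            ix += 5; // 6-byte sequence (legacy)
        else
            continue; // continuation byte or invalid byte: skip it
        if(ix >= buf.length)
            return true; // the sequence runs past the end of the buffer
    }
    return false;
}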
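
For the proxy side, here is a rough sketch of a variation on the same idea, not the code I actually deployed. Instead of a yes/no answer it reports how many trailing bytes belong to an unfinished sequence, so the proxy can forward the complete part right away and hold back only the tail. It assumes RFC 3629 UTF-8 (at most 4 bytes per sequence) and otherwise valid input. The names incompleteTailLength and createUtf8Joiner are made up for this example.

```javascript
// Returns how many bytes at the end of `buf` belong to a UTF-8 sequence
// that has started but not finished (0 if the buffer ends cleanly).
// Assumption: RFC 3629 UTF-8, so a sequence is at most 4 bytes long.
function incompleteTailLength(buf) {
    // Look back at most 3 bytes: a lead byte further back is already complete.
    var start = Math.max(buf.length - 3, 0);
    for (var ix = buf.length - 1; ix >= start; ix--) {
        var ch = buf[ix];
        if (ch < 0x80)
            return 0;                  // ASCII byte: nothing pending
        if ((ch & 0xc0) === 0x80)
            continue;                  // continuation byte: keep looking for its lead byte
        var needed;
        if ((ch & 0xe0) === 0xc0) needed = 2;
        else if ((ch & 0xf0) === 0xe0) needed = 3;
        else if ((ch & 0xf8) === 0xf0) needed = 4;
        else return 0;                 // invalid lead byte: not handled in this sketch
        var have = buf.length - ix;    // bytes of this sequence present so far
        return have < needed ? have : 0;
    }
    return 0;
}

// Hypothetical helper: returns a function that takes each incoming chunk and
// gives back only the bytes that end on a character boundary, keeping any
// incomplete tail to prepend to the next chunk.
function createUtf8Joiner() {
    var pending = new Uint8Array(0);
    return function push(chunk) {
        var joined = new Uint8Array(pending.length + chunk.length);
        joined.set(pending, 0);
        joined.set(chunk, pending.length);
        var tail = incompleteTailLength(joined);
        pending = joined.slice(joined.length - tail);
        return joined.slice(0, joined.length - tail);
    };
}

// Usage: the euro sign (0xe2 0x82 0xac) split across two packets.
var push = createUtf8Joiner();
var first = push(Uint8Array.from([0x41, 0xe2, 0x82]));  // "A" plus two bytes of the euro sign
var second = push(Uint8Array.from([0xac, 0x42]));       // last byte of the euro sign plus "B"
// first holds only 0x41; second holds 0xe2 0x82 0xac 0x42
```

If the proxy decodes the bytes to strings anyway, the standard TextDecoder with the stream: true option does this buffering internally, so the manual check is only needed when forwarding raw bytes.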
