JSON Unicode escape sequence - lowercase or not?

Daniel Frey

I was reading RFC 4627 and can't figure out whether the following is valid JSON. Consider this minimal JSON text:

["\u005c"]

The problem is the lowercase c.

According to the text of the RFC it is allowed:

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A though F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

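(Emphasis mine: the relevant sentence is the one saying the hexadecimal letters A through F can be upper or lowercase.)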
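The prose rule can be illustrated with a small Python sketch. `decode_bmp_escape` is a hypothetical helper that handles exactly one six-character BMP escape (no surrogate pairs). It checks the four digits explicitly, accepting A-F in either case as the prose allows, and then converts them with `int(..., 16)`.

```python
def decode_bmp_escape(seq: str) -> str:
    """Decode one six-character JSON escape such as '\\u005C'.

    Hypothetical helper: handles a single BMP escape only (no surrogate pairs).
    """
    if len(seq) != 6 or seq[:2] != "\\u":
        raise ValueError(f"not a \\uXXXX escape: {seq!r}")
    digits = seq[2:]
    # Per the RFC prose, hex letters A-F may be upper or lowercase.
    # Check explicitly, because int() alone would also accept '0x' or '_'.
    if any(c not in "0123456789abcdefABCDEF" for c in digits):
        raise ValueError(f"bad hex digits: {digits!r}")
    return chr(int(digits, 16))


# Both spellings decode to a single reverse solidus.
print(decode_bmp_escape("\\u005C") == "\\")  # True
print(decode_bmp_escape("\\u005c") == "\\")  # True
```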

The problem is that the RFC also contains the grammar for this:

char = unescaped /
       escape (
           %x22 /          ; "    quotation mark  U+0022
           %x5C /          ; \    reverse solidus U+005C
           %x2F /          ; /    solidus         U+002F
           %x62 /          ; b    backspace       U+0008
           %x66 /          ; f    form feed       U+000C
           %x6E /          ; n    line feed       U+000A
           %x72 /          ; r    carriage return U+000D
           %x74 /          ; t    tab             U+0009
           %x75 4HEXDIG )  ; uXXXX                U+XXXX

where HEXDIG is defined in the referenced RFC 4234 as

HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

which includes only uppercase letters.
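To make the apparent conflict concrete, here is a sketch of the `escape` branch of the `char` rule read literally, with HEXDIG taken as uppercase-only. The regular expression is an assumption that covers only the escape sequences, not a full JSON grammar.

```python
import re

# Literal reading of RFC 4234's HEXDIG: digits plus uppercase A-F only.
HEXDIG_UPPER_ONLY = "[0-9A-F]"

# escape (%x5C) followed by one of the allowed escape characters,
# or by 'u' (%x75) and exactly four HEXDIG.
ESCAPE_STRICT = re.compile(r'\\(?:["\\/bfnrt]|u' + HEXDIG_UPPER_ONLY + "{4})")


def is_valid_escape_strict(seq: str) -> bool:
    return ESCAPE_STRICT.fullmatch(seq) is not None


print(is_valid_escape_strict("\\u005C"))  # True
print(is_valid_escape_strict("\\u005c"))  # False under this reading
```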

FWIW, from what I have found, most JSON parsers accept both uppercase and lowercase letters.
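Python's standard `json` module is one such parser. A quick check (this shows CPython's behavior, not what the RFC mandates) decodes both spellings to the same value:

```python
import json

upper = json.loads('["\\u005C"]')
lower = json.loads('["\\u005c"]')

# Both decode to a one-element list holding a single reverse solidus.
print(upper == lower == ["\\"])  # True
```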

Question(s): What is actually correct? Is there a contradiction, and should the grammar in the RFC be fixed?

Jon Skeet

I think it's explained by this part of RFC 4234:

ABNF strings are case-insensitive and the character set for these strings is us-ascii.

Hence:

    rulename = "abc"

and:

    rulename = "aBc"

will match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC", and "ABC".
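The matching rule for quoted strings can be sketched as an ASCII-only, case-insensitive comparison. `matches_quoted_string` and `ascii_lower` are hypothetical helpers for illustration, not part of any ABNF library.

```python
from itertools import product


def ascii_lower(s: str) -> str:
    # Lowercase only A-Z, since ABNF strings are US-ASCII.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def matches_quoted_string(literal: str, text: str) -> bool:
    """True if text matches the ABNF quoted string "literal"."""
    return text.isascii() and ascii_lower(text) == ascii_lower(literal)


# Every case variant of "abc" matches both rulename = "abc" and "aBc".
variants = ["".join(p) for p in product("aA", "bB", "cC")]
print(all(matches_quoted_string("abc", v) for v in variants))  # True
print(all(matches_quoted_string("aBc", v) for v in variants))  # True
```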

On the other hand, the follow-on part is not terribly clear:

To specify a rule that IS case SENSITIVE, specify the characters individually.

For example:

    rulename    =  %d97 %d98 %d99

or

    rulename    =  %d97.98.99
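By contrast, a rule written with numeric values compares exact code points. In this sketch (`matches_decimal_values` is a hypothetical helper), `%d97 %d98 %d99` and `%d97.98.99` both reduce to the sequence 97, 98, 99, so only lowercase "abc" matches.

```python
from typing import List


def matches_decimal_values(values: List[int], text: str) -> bool:
    """True if text is exactly the characters with these decimal code points."""
    return [ord(c) for c in text] == values


ABC = [97, 98, 99]  # %d97 %d98 %d99, equivalently %d97.98.99

print(matches_decimal_values(ABC, "abc"))  # True
print(matches_decimal_values(ABC, "aBc"))  # False: 'B' is %d66, not %d98
```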

In the case of the HEXDIG rule, they're individual characters to start with, but they're specified literally as "A" etc. rather than as %x41 (decimal %d65), so I suspect that means they're case-insensitive. It's not the clearest spec I've read :(
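Under that reading, HEXDIG behaves like `[0-9A-Fa-f]`, and the grammar agrees with the prose. The sketch below assumes that interpretation and, like the RFC 4627 rule, covers only the `escape` branch of `char`. Note that the `u` itself is written as the numeric value %x75, which is case-sensitive, so `\U005c` is still rejected.

```python
import re

# HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F", with the quoted
# letters matched case-insensitively, as RFC 4234 says for ABNF strings.
HEXDIG = "[0-9A-Fa-f]"

# %x75 ('u') and the single-letter escapes are numeric values: case-sensitive.
ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u' + HEXDIG + "{4})")


def is_valid_escape(seq: str) -> bool:
    return ESCAPE.fullmatch(seq) is not None


print(is_valid_escape("\\u005C"))  # True
print(is_valid_escape("\\u005c"))  # True
print(is_valid_escape("\\U005c"))  # False: only lowercase 'u' is allowed
```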
