ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

Chiune Sugihara Published at Dev

Chiune Sugihara

I have started using ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:

grammar output;

test: FILEPATH NEWLINE TITLE ;

FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;

This grammar will not match something like:

c:\test.txt
x

Oddly, if I change TITLE to TITLE: 'x' ; it still fails, this time with an error message saying "mismatched input 'x' expecting 'x'", which is highly confusing. Even more oddly, if I replace the usage of TITLE in test with FILEPATH, the whole thing works (although FILEPATH matches more than I want to match, so in general it isn't a valid solution for me).

I am highly confused as to why ANTLR is giving such extremely strange errors and then suddenly working for no apparent reason when shuffling things around.

CoronA

This seems to be a common misunderstanding of ANTLR:

Language Processing in ANTLR:

Language processing is done in two strictly separated phases:

Lexing, i.e. partitioning the text into tokens
Parsing, i.e. building a parse tree from the tokens

Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, and the parser cannot influence lexing.

Lexing

Lexing in ANTLR works as follows:

all rules whose names start with an uppercase letter are lexer rules
the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
a best match is a match of maximum length, i.e. no lexer rule can match a longer stretch of input starting at the current position
tokens are generated from matches:
- if exactly one rule produces the maximum-length match, the corresponding token is pushed to the token stream
- if multiple rules produce the maximum-length match, the token of the rule defined first in the grammar is pushed to the token stream
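The selection rules above can be sketched in a few lines of Python. This is only a simplified simulation, not ANTLR's actual implementation: rules are modelled as regular expressions, the names tokenize and RULES and the three sample rules are made up for illustration, and unmatched input raises an error instead of being reported and skipped.

```python
import re

def tokenize(rules, text):
    """Longest-match lexing; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an equally long match of a later rule never wins.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rules, listed in "grammar order".
RULES = [
    ("IF", r"if"),
    ("ID", r"[a-z]+"),
    ("WS", r" +"),
]

print(tokenize(RULES, "if iffy"))
# [('IF', 'if'), ('WS', ' '), ('ID', 'iffy')]
```

For "if" both IF and ID match two characters, so the rule listed first wins; for "iffy" the ID rule wins because its match is longer.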

Example: What is wrong with your grammar

Your grammar has two rules that are critical:

FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;

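Every string that is matched by TITLE is also matched by FILEPATH, and FILEPATH is defined before TITLE. So each token that you expect to be a TITLE is emitted as a FILEPATH, and the parser, which expects a TITLE after NEWLINE, reports a mismatch. This also explains the confusing message you get with TITLE: 'x' ;: the first 'x' is the text of the FILEPATH token that was actually produced, while the second 'x' is the display name of the expected TITLE token.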
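To make the overlap concrete, here is a small sketch that transcribes the three lexer rules from the question into Python regular expressions and asks which rule would win for a given lexeme. The helper token_type is hypothetical, and the sketch assumes the lexeme boundaries are already known, so it only illustrates the "first defined rule wins" tie-break.

```python
import re

# The lexer rules from the question, transcribed as regexes in grammar order.
LEXER_RULES = [
    ("FILEPATH", r"[A-Za-z0-9:\\/ \-_.]+"),
    ("NEWLINE", r"\r?\n"),
    ("TITLE", r"[A-Za-z ]+"),
]

def token_type(lexeme):
    """Return the first rule that matches the whole lexeme, or None."""
    for name, pattern in LEXER_RULES:
        if re.fullmatch(pattern, lexeme):
            return name
    return None

for lexeme in ["c:\\test.txt", "\n", "x", "Some Title"]:
    print(repr(lexeme), "->", token_type(lexeme))
# 'c:\\test.txt' -> FILEPATH
# '\n' -> NEWLINE
# 'x' -> FILEPATH
# 'Some Title' -> FILEPATH
```

No input is ever reported as TITLE here, which matches the behaviour described in the question.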

There are three hints for that:

keep your lexer rules disjoint (no rule should match a superset of the strings matched by another rule).
if your tokens intentionally match the same strings, put the rules in the right order (in your case this will be sufficient).
if you need a parser-driven lexer, you have to switch to another parser generator: PEG parsers or GLR parsers will do that (but of course this can produce other problems).
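The effect of the second hint can be checked with a rough simulation. The sketch below is an approximation of longest-match lexing written for this example (the function tokenize and the regex transcriptions of the rules are assumptions, not ANTLR code); it lexes the sample input once with the original rule order and once with TITLE moved to the front.

```python
import re

def tokenize(rules, text):
    """Longest match wins; among equally long matches the first rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        candidates = []
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                candidates.append((name, m.group()))
        if not candidates:
            raise ValueError(f"no rule matches at position {pos}")
        # max() returns the first maximal element, i.e. the earliest rule on a tie.
        name, lexeme = max(candidates, key=lambda c: len(c[1]))
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

FILEPATH = r"[A-Za-z0-9:\\/ \-_.]+"
NEWLINE = r"\r?\n"
TITLE = r"[A-Za-z ]+"

original = [("FILEPATH", FILEPATH), ("NEWLINE", NEWLINE), ("TITLE", TITLE)]
reordered = [("TITLE", TITLE), ("FILEPATH", FILEPATH), ("NEWLINE", NEWLINE)]

text = "c:\\test.txt\nx"
print([name for name, _ in tokenize(original, text)])
# ['FILEPATH', 'NEWLINE', 'FILEPATH']
print([name for name, _ in tokenize(reordered, text)])
# ['FILEPATH', 'NEWLINE', 'TITLE']
```

Reordering works for this input because the path contains characters that TITLE cannot match, so FILEPATH still produces the longer match there. A path made up only of letters and spaces (for example "readme") would come out as TITLE after reordering, which is why keeping the rules disjoint is the more robust option.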

edited at 2021-02-18
