ANTLR4 : mismatched input

Ramg

I would like to match input of the following form:

commit a1b2c3
Author: Michael <[email protected]>

commit d3g4
Author: David <[email protected]> 

Here is the grammar I have written:

grammar commit;

file : commitinfo+;

commitinfo : commitdesc authordesc;
commitdesc : 'commit' COMMITHASH NEWLINE;
authordesc : 'Author:' AUTHORNAME '<' EMAIL '>' NEWLINE;

COMMITHASH : [a-z0-9]+;
AUTHORNAME : [a-zA-Z]+;
EMAIL      : [a-zA-Z0-9.@]+;
NEWLINE    : '\r'?'\n';
WHITESPACE : [ \t]->skip;

The grammar parses the input above without any problem. But when the input changes to:

commit c1d2
Author: michael <[email protected]>

it reports an error like:

line 2:8 mismatched input 'michael' expecting AUTHORNAME.

When I print the tokens, it seems the string 'michael' gets matched by the token COMMITHASH instead of AUTHORNAME.

How can I fix this case?

Rishabh Garg

The ANTLR4 lexer picks the rule that matches the longest stretch of input. When several rules match the same length, the rule written first in the grammar wins.

'michael' is matched with the same length by COMMITHASH : [a-z0-9]+ ; and by AUTHORNAME. COMMITHASH appears before AUTHORNAME, so the lexer emits a COMMITHASH token, and the parser then fails because it expects AUTHORNAME.

I can think of the following options to resolve the issue:

  • You can use the 'mode' feature of ANTLR. In ANTLR 4, exactly one lexer mode is active at a time, and only the non-fragment lexer rules of that mode compete; the one with the longest match determines which token is created. Your grammar has only the default mode, so all lexer rules are always active. That is why 'michael' becomes a COMMITHASH: the match length is the same for COMMITHASH and AUTHORNAME, but COMMITHASH appears first in the grammar. Putting the two rules in different modes stops them from competing for the same text.

  • You can swap the order in which the lexer rules appear in the grammar. This assumes a commit hash always contains at least one digit: AUTHORNAME then stops at the first digit, so COMMITHASH is the longer match for a hash such as 'a1b2c3' and still wins there. Put AUTHORNAME before COMMITHASH like this:

    grammar commit;
    ...
    
    AUTHORNAME : [a-zA-Z]+;
    COMMITHASH : [a-z0-9]+;
    ...
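
The mode-based option could look roughly like the sketch below. Treat it as an untested illustration under some assumptions: lexer modes are only allowed in a lexer grammar, so the combined grammar is split into two files, and the names CommitLexer, CommitParser, HASH_MODE, AUTHOR_MODE, COMMIT, AUTHOR, LT, GT, and the per-mode whitespace rules are made up for this example. The COMMITHASH, AUTHORNAME, EMAIL, and NEWLINE patterns are copied from the question.

```antlr
// ---- CommitLexer.g4 (hypothetical file name) ----
lexer grammar CommitLexer;

// Default mode: only the keywords, newlines, and whitespace live here.
COMMIT  : 'commit'  -> pushMode(HASH_MODE);    // a hash is expected next
AUTHOR  : 'Author:' -> pushMode(AUTHOR_MODE);  // a name and e-mail are expected next
NEWLINE : '\r'? '\n';
WS      : [ \t]+ -> skip;

// Active only right after 'commit'; AUTHORNAME does not exist in this mode.
mode HASH_MODE;
COMMITHASH : [a-z0-9]+ -> popMode;             // back to the default mode
HASH_WS    : [ \t]+ -> skip;

// Active only after 'Author:'; COMMITHASH does not exist in this mode.
mode AUTHOR_MODE;
AUTHORNAME : [a-zA-Z]+;                        // must stay before EMAIL: both match 'michael'
LT         : '<';
EMAIL      : [a-zA-Z0-9.@]+;
GT         : '>' -> popMode;                   // back to the default mode
AUTHOR_WS  : [ \t]+ -> skip;

// ---- CommitParser.g4 (hypothetical file name) ----
parser grammar CommitParser;
options { tokenVocab = CommitLexer; }

file       : commitinfo+;
commitinfo : commitdesc authordesc;
commitdesc : COMMIT COMMITHASH NEWLINE;
authordesc : AUTHOR AUTHORNAME LT EMAIL GT NEWLINE;
```

With this split, COMMITHASH and AUTHORNAME are never active at the same time, so their relative order no longer matters. The order of AUTHORNAME and EMAIL inside AUTHOR_MODE still does, because EMAIL as written also matches a plain name.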
    

Note: I strongly feel that your lexer rules are too loose. Are you sure your COMMITHASH rule should be [a-z0-9]+ ? It means a token like 'abhdks' can also be matched by COMMITHASH. But that's a different issue altogether.
