ANTLR mismatched input error

Kyranstar

I'm writing a parser for my own language. I'm trying to parse the phrase

Number a is 10;

which is basically equivalent to int a = 10;.

It should match the variable_def rule. When I run it, I get these errors:

line 1:0 extraneous input 'Number' expecting {<EOF>, 'while', ';', 'if', 'function', TYPE, 'global', 'room', ID}
line 1:9 mismatched input 'is' expecting '('

This is my grammar:

grammar Script;

@header {
package script;
}

// PARSER

program
:
    block EOF
;

block
:
    (
        statement
        | functionDecl
    )*
;

statement
:
    (variable_def
    | functionCall
    | ifStatement
    | forStatement
    | whileStatement) ';'
;

whileStatement
:
    'while' '(' expression ')' '{' (statement)* '}'
;

forStatement
:
;

ifStatement
:
    'if' '(' expression ')' '{' statement* '}'
    (
        (
            'else' '{' statement* '}'
        )
        |
        (
            'else' ifStatement
        )
    )?
;

functionDecl
:
    'function' ID
    (
        '('
        (
            TYPE ID
        )?
        (
            ',' TYPE ID
        )* ')'
    )?
    (
        'returns' RETURN_TYPE
    )? '{' statement* '}'
;

functionCall
:
    ID '(' exprList? ')'
;

exprList
:
    expression
    (
        ',' expression
    )*
;

variable_def
:

    TYPE assignment
    | GLOBAL variable_def
    | ROOM variable_def
;

expression
:
    '-' expression # unaryMinusExpression
    | '!' expression # notExpression
    | expression '^' expression # powerExpression
    | expression '*' expression # multiplyExpression
    | expression '/' expression # divideExpression
    | expression '%' expression # modulusExpression
    | expression '+' expression # addExpression
    | expression '-' expression # subtractExpression
    | expression '>=' expression # gtEqExpression
    | expression '<=' expression # ltEqExpression
    | expression '>' expression # gtExpression
    | expression '<' expression # ltExpression
    | expression '==' expression # eqExpression
    | expression '!=' expression # notEqExpression
    | expression '&&' expression # andExpression
    | expression '||' expression # orExpression
    | expression IN expression # inExpression
    | NUMBER # numberExpression
    | BOOLEAN # boolExpression
    | functionCall # functionCallExpression
    | '(' expression ')' # expressionExpression
;

assignment
:
    ID ASSIGN expression
;

// LEXER

RETURN_TYPE
:
    TYPE
    | 'Nothing'
;

TYPE
:
    'Number'
    | 'String'
    | 'Anything'
    | 'Boolean'
    | 'Growable'? 'List' 'of' TYPE
;

GLOBAL
:
    'global'
;

ROOM
:
    'room'
;

ASSIGN
:
    'is'
    (
        'a'
        | 'an'
        | 'the'
    )?
;

EQUAL
:
    'is'?
    (
        'equal'
        (
            's'
            | 'to'
        )?
        | 'equivalent' 'to'?
        | 'the'? 'same' 'as'?
    )
;

IN
:
    'in'
;

BOOLEAN
:
    'true'
    | 'false'
;

NUMBER
:
    '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5

    | '-'? '.' INT EXP? // -.35, .35e5

    | '-'? INT EXP // 1e10 -3e4

    | '-'? INT // -3, 45

;

fragment
EXP
:
    [Ee] [+\-]? INT
;

fragment
INT
:
    '0'
    | [1-9] [0-9]*
;

STRING
:
    '"'
    (
        ' ' .. '~'
    )* '"'
;

ID
:
    (
        'a' .. 'z'
        | 'A' .. 'Z'
        | '_'
    )
    (
        'a' .. 'z'
        | 'A' .. 'Z'
        | '0' .. '9'
        | '_'
    )*
;

fragment
JAVADOC_COMMENT
:
    '/*' .*? '*/'
;

fragment
LINE_COMMENT
:
    (
        '//'
        | '#'
    ) ~( '\r' | '\n' )*
;

COMMENT
:
    (
        LINE_COMMENT
        | JAVADOC_COMMENT
    ) -> skip
;

WS
:
    [ \t\n\r]+ -> skip
;

How can I fix this error?

Bart Kiers

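The main reason is that, in your current grammar, the TYPE token will never be created: RETURN_TYPE matches everything TYPE matches and is defined before TYPE, so it takes precedence over it. When two lexer rules match the same input with the same length, ANTLR picks the rule that is defined first. So 'Number' becomes a RETURN_TYPE token, which no statement can start with (hence the "extraneous input" error), and the parser then treats a as the start of a functionCall and expects a '('.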
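To see that tie-breaking rule in action, here is a toy model in Java. It is not ANTLR's implementation: the class name RulePrecedenceDemo is hypothetical, each lexer rule is approximated by a java.util.regex pattern, and TYPE's recursive 'List of' alternative is left out. At each position every rule is tried, the longest match wins, and on a tie the rule listed first wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of how ANTLR picks a lexer rule: at each position every rule is
// tried, the longest match wins, and on a tie the rule defined first wins.
// Illustration only; this is not how ANTLR is implemented internally.
class RulePrecedenceDemo {

    // Rules in definition order; LinkedHashMap keeps insertion order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        // Simplified TYPE: the recursive 'Growable'? 'List' 'of' TYPE part is omitted.
        String type = "Number|String|Anything|Boolean";
        RULES.put("RETURN_TYPE", Pattern.compile("(?:" + type + ")|Nothing"));
        RULES.put("TYPE", Pattern.compile(type));
        RULES.put("ID", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so among equally long matches the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("WS")) { // WS is dropped, like '-> skip'
                tokens.add(bestRule + "('" + input.substring(pos, bestEnd) + "')");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Number a")); // [RETURN_TYPE('Number'), ID('a')]
        System.out.println(tokenize("Numbers"));  // [ID('Numbers')]
    }
}
```

With this ordering, "Number" can never come out as TYPE, because RETURN_TYPE matches it with the same length and is listed first. "Numbers" comes out as ID, because a longer match beats rule order.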

Also, you're doing too much in the lexer. As soon as you start gluing words together in the lexer, it's a sign you should be making those rules parser rules instead.
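To make the "gluing words" problem concrete, here is a rough Java illustration based on your ASSIGN rule. It uses java.util.regex as a stand-in for the lexer, and the class AssignRuleDemo and its methods are hypothetical: a lexer rule only matches contiguous characters, whereas a parser rule works on tokens after whitespace has already been skipped.

```java
import java.util.regex.Pattern;

// Rough illustration: a lexer rule only matches contiguous characters, while a
// parser rule sees tokens after whitespace has already been skipped.
class AssignRuleDemo {

    // Same shape as the question's lexer rule: ASSIGN : 'is' ( 'a' | 'an' | 'the' )? ;
    static final Pattern ASSIGN = Pattern.compile("is(?:a|an|the)?");

    // Lexer view: is the whole text one ASSIGN token?
    static boolean isSingleAssignToken(String text) {
        return ASSIGN.matcher(text).matches();
    }

    // Parser view: drop whitespace first (as WS -> skip would), then check
    // the token sequence 'is' ( 'a' | 'an' | 'the' )?.
    static boolean isAssignPhrase(String text) {
        String[] words = text.trim().split("\\s+");
        if (words.length == 1) {
            return words[0].equals("is");
        }
        return words.length == 2
                && words[0].equals("is")
                && (words[1].equals("a") || words[1].equals("an") || words[1].equals("the"));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"is", "isa", "is a", "is   the"}) {
            System.out.println("\"" + s + "\" lexer: " + isSingleAssignToken(s)
                    + ", parser: " + isAssignPhrase(s));
        }
    }
}
```

The lexer view accepts "isa" but rejects "is a"; the parser view does the opposite, which is what you actually want for a phrase like "is a".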

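And although white space is skipped by the lexer, that only happens between tokens, so it only helps parser rules: inside a single lexer rule, the characters must be contiguous. Take your ASSIGN rule, for example: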

ASSIGN
 : 'is' ( 'a' | 'an' | 'the' )?
 ;

This rule will not match the string "is a" (with a space between "is" and "a"); it will only match "is", "isa", "isan" and "isthe". The solution is to turn it into a parser rule:

assign
 : 'is' ( 'a' | 'an' | 'the' )?
 ;

which is equivalent to defining the keyword tokens explicitly and using them:

assign
 : IS ( A | AN | THE )?
 ;

IS : 'is';
A : 'a';
AN : 'an';
THE : 'the';

...

ID : [a-zA-Z_] [a-zA-Z_0-9]*;

Because these keyword rules are defined before ID and match exactly the same characters, the input 'is', 'a', 'an' and 'the' will never be tokenized as ID. So the following source will fail as a proper assignment:

Number a is 42;

because the 'a' is tokenized as an A token, not an ID.

To work around this, you could add the following parser rule:

id
 : ( ID | A | AN | IS | THE | ... )
 ;

and use that rule in other parser rules instead of ID.
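The effect of that workaround can be sketched with a small hand-written recursive-descent check in Java. This is not ANTLR-generated code: the class IdRuleDemo and its token enum are hypothetical, type is reduced to TYPE, expression to a single NUMBER, and the input is assumed to be already tokenized as described above, with 'a' arriving as an A token.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch mirroring:
//   variable_def : type assignment ;
//   assignment   : id assign expression ;
//   assign       : IS ( A | AN | THE )? ;
// with 'type' reduced to TYPE and 'expression' reduced to NUMBER.
class IdRuleDemo {

    enum T { TYPE, ID, A, AN, IS, THE, NUMBER, SEMI }

    // Plain ID only, versus the widened 'id' rule that also accepts keywords.
    static final Set<T> ID_ONLY = Set.of(T.ID);
    static final Set<T> ID_OR_KEYWORD = Set.of(T.ID, T.A, T.AN, T.IS, T.THE);
    static final Set<T> ARTICLES = Set.of(T.A, T.AN, T.THE);

    // Matches: TYPE id IS ( A | AN | THE )? NUMBER ';'
    // 'idTokens' is the set of token types accepted at the id position.
    static boolean variableDef(List<T> toks, Set<T> idTokens) {
        int i = 0;
        if (!at(toks, i++, Set.of(T.TYPE))) return false;   // type
        if (!at(toks, i++, idTokens)) return false;         // id
        if (!at(toks, i++, Set.of(T.IS))) return false;     // assign: IS
        if (at(toks, i, ARTICLES)) i++;                     // ( A | AN | THE )?
        if (!at(toks, i++, Set.of(T.NUMBER))) return false; // expression (simplified)
        if (!at(toks, i++, Set.of(T.SEMI))) return false;   // ';'
        return i == toks.size();                            // nothing left over
    }

    // True if there is a token at position i and its type is in 'allowed'.
    static boolean at(List<T> toks, int i, Set<T> allowed) {
        return i < toks.size() && allowed.contains(toks.get(i));
    }

    public static void main(String[] args) {
        // "Number a is 42;" with keyword rules defined before ID:
        List<T> toks = List.of(T.TYPE, T.A, T.IS, T.NUMBER, T.SEMI);
        System.out.println(variableDef(toks, ID_ONLY));       // false
        System.out.println(variableDef(toks, ID_OR_KEYWORD)); // true
    }
}
```

Only the set of token types accepted at the name position changes between the two calls, which is exactly what replacing ID with id does in the grammar.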

A quick demo would look like this:

grammar Script;

// PARSER

program
 : block EOF
 ;

block
 : ( statement | functionDecl )*
 ;

statement
 : ( variable_def
   | functionCall
   | ifStatement
   | forStatement
   | whileStatement
   )
   ';'
 ;

whileStatement
 : 'while' '(' expression ')' '{' statement* '}'
 ;

forStatement
 :
 ;

ifStatement
 : 'if' '(' expression ')' '{' statement* '}'
   ( ( 'else' '{' statement* '}' ) | ( 'else' ifStatement ) )?
 ;

functionDecl
 : 'function' id ( '(' ( type id )? ( ',' type id )* ')' )?
   ( 'returns' return_type )? '{' statement* '}'
 ;

functionCall
 : id '(' exprList? ')'
 ;

exprList
 : expression ( ',' expression )*
 ;

variable_def
 : type assignment
 | GLOBAL variable_def
 | ROOM variable_def
 ;

expression
 : '-' expression             # unaryMinusExpression
 | '!' expression             # notExpression
 | expression '^' expression  # powerExpression
 | expression '*' expression  # multiplyExpression
 | expression '/' expression  # divideExpression
 | expression '%' expression  # modulusExpression
 | expression '+' expression  # addExpression
 | expression '-' expression  # subtractExpression
 | expression '>=' expression # gtEqExpression
 | expression '<=' expression # ltEqExpression
 | expression '>' expression  # gtExpression
 | expression '<' expression  # ltExpression
 | expression '==' expression # eqExpression
 | expression '!=' expression # notEqExpression
 | expression '&&' expression # andExpression
 | expression '||' expression # orExpression
 | expression IN expression   # inExpression
 | NUMBER                     # numberExpression
 | BOOLEAN                    # boolExpression
 | functionCall               # functionCallExpression
 | '(' expression ')'         # expressionExpression
 ;

assignment
 : id assign expression
 ;

return_type
 : type
 | 'Nothing'
 ;

type
 : TYPE
 | 'Growable'? 'List' OF type
 ;

assign
 : 'is' ( A | AN | THE )?
 ;

equal
 : 'is'? ( EQUAL ( S
                 | TO
                 )?
         | EQUIVALENT TO?
         | THE? SAME AS?
         )
 ;

id
 : ( ID | OF | A | AN | EQUAL | S | EQUIVALENT | TO | THE | SAME | AS )
 ;

// LEXER

// Some keyword you might want to match as an identifier too:
OF : 'of';
A : 'a';
AN : 'an';
EQUAL : 'equal';
S : 's';
EQUIVALENT : 'equivalent';
TO : 'to';
THE : 'the';
SAME : 'same';
AS : 'as';

COMMENT
 : ( LINE_COMMENT | JAVADOC_COMMENT ) -> skip
 ;

WS
 : [ \t\n\r]+ -> skip
 ;

TYPE
 : 'Number'
 | 'String'
 | 'Anything'
 | 'Boolean'
 ;

GLOBAL
 : 'global'
 ;

ROOM
 : 'room'
 ;

IN
 : 'in'
 ;

BOOLEAN
 : 'true'
 | 'false'
 ;

NUMBER
 : '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
 | '-'? '.' INT EXP? // -.35, .35e5
 | '-'? INT EXP // 1e10 -3e4
 | '-'? INT // -3, 45
 ;

STRING
 : '"' .*? '"'
 ;

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

fragment EXP
 : [Ee] [+\-]? INT
 ;

fragment INT
 : '0'
 | [1-9] [0-9]*
 ;

fragment JAVADOC_COMMENT
 : '/*' .*? '*/'
 ;

fragment LINE_COMMENT
 : ( '//' | '#' ) ~( '\r' | '\n' )*
 ;
