I'm trying to build a lexer for a subset of the AMPL language. I need to know what type of symbolic name the lexer is dealing with at any given moment. Every symbolic name is a var, a param, or a set. Fortunately, all of them have to be declared before they are used, so I thought I could use the lookahead (trailing context) operator in flex, simply changing the lexer code from
SYMBOLIC_NAME [a-zA-Z_][a-zA-Z0-9_]*
%%
param { return PARAM; }
var { return VAR; }
set { return SET; }
{SYMBOLIC_NAME} { yylval.string = (char*) strdup(yytext);
return SYMBOLIC_NAME;
}
%%
to something like this:
SYMBOLIC_NAME [a-zA-Z_][a-zA-Z0-9_]*
%{
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> paramNames;
std::vector<std::string> setNames;
std::vector<std::string> varNames;
%}
%%
param/(.|\n)+{SYMBOLIC_NAME} { paramNames.push_back(&yytext[5]);
return PARAM; }
var/(.|\n)+{SYMBOLIC_NAME} { varNames.push_back(&yytext[3]);
return VAR; }
set/(.|\n)+{SYMBOLIC_NAME} { setNames.push_back(&yytext[3]);
return SET; }
{SYMBOLIC_NAME} { if ( std::find(setNames.begin(), setNames.end(), yytext) != setNames.end() ) {
yylval.string = (char*) strdup(yytext);
return SET_NAME;
}
if ( std::find(paramNames.begin(), paramNames.end(), yytext) != paramNames.end() ){
yylval.string = (char*) strdup(yytext);
return PARAM_NAME;
}
if ( std::find(varNames.begin(), varNames.end(), yytext) != varNames.end() ){
yylval.string = (char*) strdup(yytext);
return VAR_NAME;
}
}
%%
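I know this is not going to work, because yytext does not contain the trailing-context part of the first three patterns (the part after the /), so &yytext[5] and &yytext[3] point past the end of the keyword. So the question is: how can I peek at what is matched by (.|\n)+{SYMBOLIC_NAME}?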
PS
I know the code is not optimal, but that is not the issue here :D
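For comparison, one way to get the effect the question is after without trailing context is to keep a little state between tokens: remember which declaration keyword was seen last and file the next identifier under it. In flex this would typically be done with a start condition or a static variable in the rules section. The sketch below is a hand-written C++ stand-in for the scanner (not flex output), so the Token codes and the KeywordStateLexer type are hypothetical; it reuses the paramNames/varNames/setNames vectors from the question.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token codes; a real scanner would get these from bison's header.
enum Token { PARAM, VAR, SET, SYMBOLIC_NAME, OTHER, END };

// Same tables as in the question.
std::vector<std::string> paramNames, setNames, varNames;

// Hand-written stand-in for the scanner: instead of peeking past the keyword,
// it remembers the keyword and files the *next* identifier under it.
struct KeywordStateLexer {
    std::string input;
    std::size_t pos = 0;
    std::string text;                            // plays the role of yytext
    std::vector<std::string>* pending = nullptr; // table waiting for a name

    static bool isIdentStart(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
    }
    static bool isIdentChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }

    Token next() {
        while (pos < input.size() && std::isspace(static_cast<unsigned char>(input[pos])))
            ++pos;
        if (pos >= input.size()) return END;
        if (!isIdentStart(input[pos])) {         // punctuation and the like
            text.assign(1, input[pos++]);
            return OTHER;
        }
        std::size_t start = pos;
        while (pos < input.size() && isIdentChar(input[pos])) ++pos;
        text = input.substr(start, pos - start);
        if (text == "param") { pending = &paramNames; return PARAM; }
        if (text == "var")   { pending = &varNames;   return VAR; }
        if (text == "set")   { pending = &setNames;   return SET; }
        if (pending) {                           // first name after a keyword
            pending->push_back(text);
            pending = nullptr;
        }
        return SYMBOLIC_NAME;
    }
};
```

The catch is that the lexer now has to understand a piece of declaration syntax, which is exactly the part the answer moves into the parser.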
I think you're trying to check a symbol table for what kind of name you are seeing.
If that is the case, you should do this by communicating through the symbol table. That is:
Create a simple "symbol" rule. Your original rule is fine:
{SYMBOLIC_NAME} { yylval.string = (char*) strdup(yytext);
return SYMBOLIC_NAME;
}
Handle the declaration syntax at the parser level:
var_decl : VAR SYMBOLIC_NAME
    { varNames.push_back($2); free($2); }   /* add the name to the symbol table */
    ;
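Because the lexer hands over a strdup'd copy in yylval.string, the declaration action owns that memory. A minimal sketch of what the action body might look like, pulled out into a hypothetical helper onVarDecl, assuming the %union has a char* member called string (as yylval.string in the question suggests):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

std::vector<std::string> varNames;

// Hypothetical helper holding the body of the var_decl action.
// `name` is the yylval.string the lexer produced with strdup().
void onVarDecl(char* name) {
    varNames.push_back(name); // std::string makes its own copy of the characters
    std::free(name);          // strdup() memory comes from malloc(), so free() it
}
```

In the grammar action this would be called as onVarDecl($2).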
Now go back and extend your SYMBOLIC_NAME rule to check for defined symbols:
{SYMBOLIC_NAME} {
    yylval.string = (char*) strdup(yytext);
    if ( std::find(setNames.begin(), setNames.end(), yytext) != setNames.end() ) {
        return SET_NAME;
    }
    else if ( std::find(varNames.begin(), varNames.end(), yytext) != varNames.end() ) {
        return VAR_NAME;
    }
    else if ( std::find(paramNames.begin(), paramNames.end(), yytext) != paramNames.end() ) {
        return PARAM_NAME;
    }
    else {
        return SYMBOLIC_NAME;
    }
}
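The decision made by that action can be pulled out into an ordinary function and exercised without running flex. The sketch assumes hypothetical token values (bison would normally generate them, using numbers above 255 so they cannot clash with character literals) and the three vectors from the question:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token codes standing in for bison's generated ones.
enum { SYMBOLIC_NAME = 258, SET_NAME, VAR_NAME, PARAM_NAME };

std::vector<std::string> setNames, varNames, paramNames;

// True if `name` occurs in `names`.
static bool contains(const std::vector<std::string>& names, const std::string& name) {
    return std::find(names.begin(), names.end(), name) != names.end();
}

// Same decision the {SYMBOLIC_NAME} action makes, as a plain function.
int classify(const std::string& text) {
    if (contains(setNames, text))   return SET_NAME;
    if (contains(varNames, text))   return VAR_NAME;
    if (contains(paramNames, text)) return PARAM_NAME;
    return SYMBOLIC_NAME; // not declared (yet), e.g. the name inside a declaration
}
```

The same name flips from SYMBOLIC_NAME to VAR_NAME as soon as the parser has pushed it into varNames.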
Now you have one Flex rule returning four possible tokens, depending on whether and how the name has been declared. Flex doesn't have to remember which declaration is active; the parser takes care of that by filling in the tables.
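Since the parser only writes the table and the lexer only reads it, the three vectors could also be folded into a single map from name to kind. That turns three linear scans into one lookup and makes redeclarations easy to detect. A sketch, with SymbolTable and SymbolKind as hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// What kind of entity a declared name refers to.
enum class SymbolKind { Set, Param, Var };

// One table shared by the parser (writes) and the lexer (reads).
class SymbolTable {
public:
    // For the parser's declaration actions; returns false on a redeclaration.
    bool declare(const std::string& name, SymbolKind kind) {
        return table_.emplace(name, kind).second;
    }
    // For the lexer's {SYMBOLIC_NAME} rule; nullptr means "not declared".
    const SymbolKind* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, SymbolKind> table_;
};

SymbolTable symtab; // global so both flex and bison actions can reach it
```

The lexer would then map Set/Param/Var to SET_NAME/PARAM_NAME/VAR_NAME and return SYMBOLIC_NAME when lookup yields nullptr.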
On the parser side, you write different rules:
var_decl : VAR SYMBOLIC_NAME ;
set_decl : SET SYMBOLIC_NAME ;
expr     : atom '+' atom ;
atom     : VAR_NAME | SET_NAME | PARAM_NAME ;
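To see the feedback loop end to end, here is a toy, hand-written C++ version (not generated by flex or bison) of a lexer and a recursive-descent parser for the rules above, plus a hypothetical stmt rule that ends each statement with ';'. The declaration action fills the table before the lexer is asked for the next token, so a later use of the name comes back as VAR_NAME or SET_NAME.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical token kinds. PARAM_NAME mirrors `atom`; this toy has no param_decl.
enum Tok { VAR, SET, SYMBOLIC_NAME, VAR_NAME, SET_NAME, PARAM_NAME, PLUS, SEMI, END, BAD };

// Symbol table: written by the parser's declaration actions, read by the lexer.
std::unordered_map<std::string, Tok> declared;

struct Lexer {
    std::string src;
    std::size_t pos = 0;
    std::string text; // like yytext

    Tok next() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
        if (pos == src.size()) return END;
        char c = src[pos];
        if (c == '+') { ++pos; return PLUS; }
        if (c == ';') { ++pos; return SEMI; }
        if (!std::isalpha(static_cast<unsigned char>(c)) && c != '_') { ++pos; return BAD; }
        std::size_t start = pos;
        while (pos < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
            ++pos;
        text = src.substr(start, pos - start);
        if (text == "var") return VAR;
        if (text == "set") return SET;
        auto it = declared.find(text);           // the {SYMBOLIC_NAME} rule
        return it == declared.end() ? SYMBOLIC_NAME : it->second;
    }
};

// Recursive-descent version of:
//   stmt     : var_decl ';' | set_decl ';' | expr ';'
//   var_decl : VAR SYMBOLIC_NAME      set_decl : SET SYMBOLIC_NAME
//   expr     : atom '+' atom          atom     : VAR_NAME | SET_NAME | PARAM_NAME
// Returns the number of expressions accepted, or -1 on a syntax error.
int parse(const std::string& src) {
    declared.clear();
    Lexer lx;
    lx.src = src;
    auto isAtom = [](Tok t) { return t == VAR_NAME || t == SET_NAME || t == PARAM_NAME; };
    int exprs = 0;
    for (Tok tok = lx.next(); tok != END; tok = lx.next()) {
        if (tok == VAR || tok == SET) {
            if (lx.next() != SYMBOLIC_NAME) return -1;              // name must be new
            declared[lx.text] = (tok == VAR) ? VAR_NAME : SET_NAME; // declaration action
        } else if (isAtom(tok)) {
            if (lx.next() != PLUS || !isAtom(lx.next())) return -1;
            ++exprs;
        } else {
            return -1;                                              // e.g. undeclared name
        }
        if (lx.next() != SEMI) return -1;
    }
    return exprs;
}
```

With a real Bison parser the timing is less under your control: the parser may read one lookahead token before it runs an action, so a token scanned immediately after a declaration can be classified before the name is in the table. A separator such as ';' between declaration and use usually keeps that from mattering.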