CSc 1302, Honors Principles of Computer Science II (Fall 2021)
ANTLR Parser Generator System
- Evaluation while parsing version: Expr.g4, Expr.java
- Evaluation after parsing version: Expr2.g4, Expr2.java, ExprNode.java
Simple Arithmetic Expressions
Consider simple arithmetic expressions that involve integers; Some examples of such expressions are:2 + 3 * 9 (2 + 3) * 16 4 * (-3 + -9) / 2From our elementary school, we know that multiplication and division have higher precedence over addition and subtraction. So, the first expression will be interpreted as 2 + (3 * 9). We can write a "context-free grammar" that define the syntax of the expressions and a set of "regular expressions" to describe the tokens as follows:
prog : expr SEMI expr : term PLUS expr | term MINUS expr | term term : factor MUL term | factor DIV term | factor factor : INT | LPAREN expr RPAREN SEMI : ';' PLUS : '+' MINUS : '-' MUL : '*' DIV : '/' LPAREN : '(' RPAREN : ')' INT : [-]?[0-9]+The above grammar and token/lexical specification can be coded as an ANTLR4 specification that can be converted to Java programs to tokenize and parse simple arithmetic expressions. The ANTLR4 specification is placed in file named Expr.g4:
[raj@tinman antlr]$ more Expr.g4 grammar Expr; prog returns [int value] : e=expr ';' { $value = $e.value; } ; expr returns [int value] : e1=term PLUS e2=expr { $value = $e1.value + $e2.value; } | e1=term MINUS e2=expr { $value = $e1.value - $e2.value; } | e1=term { $value = $e1.value; } ; term returns [int value] : | e1=factor MUL e2=term { $value = $e1.value * $e2.value; } | e1=factor DIV e2=term { $value = $e1.value / $e2.value; } | e1=factor { $value = $e1.value; } ; factor returns [int value] : n=INT { $value = Integer.parseInt($n.text); } | LPAREN e=expr RPAREN { $value = $e.value; } ; SEMI : ';'; PLUS : '+'; MINUS : '-'; MUL : '*'; DIV : '/'; LPAREN : '('; RPAREN : ')'; INT : [-]?[0-9]+; WS : [ \r\n\t]+ -> skip;The commands to compile the ANTLR specification into Lexer and Parser Java code and run tne main program that calls the parser are:
[raj@tinman antlr]$ antlr4 Expr.g4 [raj@tinman antlr]$ javac *.java [raj@tinman antlr]$ java Expr EXPR> -2+3*-2; -8 EXPR> exit; [raj@tinman antlr]$The lines of code that sets up the Lexer and Parser objects and calls the parser are available in Expr.java and are shown below:
import org.antlr.v4.runtime.CharStreams; import org.antlr.v4.runtime.CharStream; import org.antlr.v4.runtime.CommonTokenStream; CharStream in = CharStreams.fromString(input); // convert input String into CharStream ExprLexer lexer = new ExprLexer(in); // construct Lexer object CommonTokenStream tokens = new CommonTokenStream(lexer); // obtain the token stream from Lexer ExprParser parser = new ExprParser(tokens); // construct Parser object Integer answer = parser.prog().value; // call the Parser and obtain resultIf you want to just see the parse tree that is generated rather than run the main program, you can use the following command:
Mac-mini:antlr raj$ grun Expr prog -gui 2 + 3 * 9; Mac-mini:antlr raj$Note that after entering the ";" you will enter the RETURN character and then enter ctrl-D to signify the end of input. Then a pop up window will show with the parse tree
To call the Lexer and output the tokens of the input, you can do the following:
Mac-mini:antlr raj$ grun Expr prog -tokens 2 + 3 * 9; [@0,0:0='2',,1:0] [@1,2:2='+',<'+'>,1:2] [@2,4:4='3', ,1:4] [@3,6:6='*',<'*'>,1:6] [@4,8:8='9', ,1:8] [@5,9:9=';',<';'>,1:9] [@6,11:10=' ', ,2:0]