CSc 1302, Honors Principles of Computer Science II (Fall 2021)

ANTLR Parser Generator System

Simple Arithmetic Expressions

Consider simple arithmetic expressions that involve integers; Some examples of such expressions are:
2 + 3 * 9
(2 + 3) * 16
4 * (-3 + -9) / 2
From our elementary school, we know that multiplication and division have higher precedence over addition and subtraction. So, the first expression will be interpreted as 2 + (3 * 9). We can write a "context-free grammar" that define the syntax of the expressions and a set of "regular expressions" to describe the tokens as follows:
prog   : expr SEMI
expr   : term PLUS expr | term MINUS expr | term
term   : factor MUL term | factor DIV term | factor
factor : INT | LPAREN expr RPAREN 

SEMI   : ';'
PLUS   : '+'
MINUS  : '-'
MUL    : '*'
DIV    : '/'
LPAREN : '('
RPAREN : ')'
INT    : [-]?[0-9]+
The above grammar and token/lexical specification can be coded as an ANTLR4 specification that can be converted to Java programs to tokenize and parse simple arithmetic expressions. The ANTLR4 specification is placed in file named Expr.g4:
[raj@tinman antlr]$ more Expr.g4
grammar Expr;

prog returns [int value] :  
  e=expr ';'
  {
    $value = $e.value;
  }
;

expr returns [int value] :
  e1=term PLUS e2=expr
  {
    $value = $e1.value + $e2.value;
  }
  |
  e1=term MINUS e2=expr
  {
    $value = $e1.value - $e2.value;
  }
  |
  e1=term
  {
    $value = $e1.value;
  }
;

term returns [int value] :
  |
  e1=factor MUL e2=term
  {
    $value = $e1.value * $e2.value;
  }
  |
  e1=factor DIV e2=term
  {
    $value = $e1.value / $e2.value;
  }
  |
  e1=factor
  {
    $value = $e1.value;
  }
;

factor returns [int value] :
  n=INT
  {
    $value = Integer.parseInt($n.text);
  }
  |	LPAREN e=expr RPAREN 
  {
    $value = $e.value;
  }
;

SEMI   : ';';
PLUS   : '+';
MINUS  : '-';
MUL    : '*';
DIV    : '/';
LPAREN : '(';
RPAREN : ')';
INT    : [-]?[0-9]+;
WS     : [ \r\n\t]+ -> skip;
The commands to compile the ANTLR specification into Lexer and Parser Java code and run tne main program that calls the parser are:
[raj@tinman antlr]$ antlr4 Expr.g4
[raj@tinman antlr]$ javac *.java
[raj@tinman antlr]$ java Expr
EXPR> -2+3*-2;
-8
EXPR> exit;
[raj@tinman antlr]$
The lines of code that sets up the Lexer and Parser objects and calls the parser are available in Expr.java and are shown below:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CommonTokenStream;

CharStream in = CharStreams.fromString(input); // convert input String into CharStream
ExprLexer lexer = new ExprLexer(in); // construct Lexer object
CommonTokenStream tokens = new CommonTokenStream(lexer); // obtain the token stream from Lexer
ExprParser parser = new ExprParser(tokens); // construct Parser object
Integer answer = parser.prog().value; // call the Parser and obtain result
If you want to just see the parse tree that is generated rather than run the main program, you can use the following command:
Mac-mini:antlr raj$ grun Expr prog -gui
2 + 3 * 9;
Mac-mini:antlr raj$
Note that after entering the ";" you will enter the RETURN character and then enter ctrl-D to signify the end of input. Then a pop up window will show with the parse tree

To call the Lexer and output the tokens of the input, you can do the following:

Mac-mini:antlr raj$ grun Expr prog -tokens
2 + 3 * 9;
[@0,0:0='2',,1:0]
[@1,2:2='+',<'+'>,1:2]
[@2,4:4='3',,1:4]
[@3,6:6='*',<'*'>,1:6]
[@4,8:8='9',,1:8]
[@5,9:9=';',<';'>,1:9]
[@6,11:10='',,2:0]