Parser Tools in Prolog

This document describes some tools for processing context-free grammars into LL(1) form. The following tools are available

Context-Free Grammar

The specification of the contex-free grammar for a language consists of four items, the specification of the terminal symbols of the language, the specification of the nonterminals, the productions or derivation rules, and the start symbol of the grammar. As an example of a context-free grammar, here is a specification of the context-free grammar for arithmetic expression.

terminal('+').
terminal('*').
terminal('(').
terminal(')').
terminal(id).
 
nonterminal(e).   % expression
nonterminal(t).   % term
nonterminal(f).   % factor
  
start(e).
 
p(e,[t]).
p(e,[e,'+',t]).
p(t,[f]).
p(t,[t,'*',f]).
p(f,[id]).
p(f,['(',e,')']).

Removal of Left-Recursion

Note that the second production for expression in the grammar of the previous section is left recursive. The use of such a production in the design of a recursive descent parser would result in an infinite loop. Sometimes left recursion may be eliminated.

terminal('+').
terminal('*').
terminal(id).
terminal('(').
terminal(')').

nonterminal(e).
nonterminal(e0).
nonterminal(t).
nonterminal(t0).
nonterminal(f).

start(e).

p(e,[t,e0]).
p(e0,['+',t,e0]).
p(e0,[epsilon]).
p(t,[f,t0]).
p(t0,['*',f,t0]).
p(t0,[epsilon]) .
p(f,['(',e,')']).
p(f,[id]).

Here is a Prolog program which removes left recursion if possible.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                       Removal of Left Recursion
%                         Waite and Goos: p. 126
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- dynamic p/2 .
:- dynamic nonterminal/1 .

remove_left_recursion :- setof(N,nonterminal(N),Ns),
                         rm_left_rec([],Ns).

rm_left_rec(Xjs,[]).
rm_left_rec(Xjs,[Xi|R]) :- step2(Xjs,Xi),
                           step3(Xi,Xip,Xjs),
                           append(R,Xip,Rs),
                           append(Xjs,[Xi],NXjs),
                           rm_left_rec(NXjs,Rs).

step2([],Xi).
step2([Xj|Xs],Xi) :- e_bagof(W,p(Xi,[Xj|W]),Ws), Ws \= [], Ws \= [[]],

                     e_bagof(XXj,p(Xj,XXj),XXs),
                     retractall(p(Xi,[Xj|Wx])),
                     replace_each0(Xi,Xj,Ws,XXs),
                     step2(Xs,Xi).
step2([Xj|Xs],Xi) :- step2(Xs,Xi). 

replace_each0(Xi,Xj,[],_).
replace_each0(Xi,Xj,[W|Ws],XXs) :- replace_each00(Xi,Xj,W,XXs),
                                   replace_each0(Xi,Xj,Ws,XXs).

replace_each00(Xi,Xj,[],XXs).
replace_each00(Xi,Xj,W,[]).
replace_each00(Xi,Xj,W,[XXj|XXs]) :- append(XXj,W,XXjW),
                                     assert(p(Xi,XXjW)),
                                     replace_each00(Xi,Xj,W,XXs).

step3(Xi,[Xip],Xjs) :- e_bagof(W,p(Xi,[Xi|W]),Ws), Ws \= [], Ws \= [[]],

                       e_bagof([K|X],(p(Xi,[K|X]),K\=Xi),Xs),
                       retractall(p(Xi,Rhs)),
                       newN(Xi,Bi),
                       assert(nonterminal(Bi)),
                       assert(p(Bi,[epsilon])), 
                       replace_each1(Xi,Bi,Ws,Xjs),
                       replace_each2(Xi,Bi,Xs).
step3(Xi,[],Done).

newN(Xi,Xip) :- atomtolist(Xi,L), int(N), 
                atomtolist(N,Ln), append(L,Ln,Lp), atomtolist(Xip,Lp),

                \+ nonterminal(Xip).

int(0).
int(N) :- int(M), N is M+1.

replace_each1(Xi,Bi,[],Xjs).  
replace_each1(Xi,Bi,[W|Ws],Xjs) :- append(W,[Bi],WBi),
                                   assert(p(Bi,WBi)),
                                   replace_each1(Xi,Bi,Ws,Xjs).
replace_each2(Xi,Bi,[]).
replace_each2(Xi,Bi,[XX|XXs]) :- append(XX,[Bi],XXBi),
                                 assert(p(Xi,XXBi)),
                                 replace_each2(Xi,Bi,XXs).

Left-Factoring

Another common problem is when two productions for the same nonterminal share a common prefix on the right-hand side of the productions. The common prefix makes it impossible to choose (with a fixed amount of look-ahead) the proper production in top-down parsing. This elimination of the common prefix is called left-factoring. Here is a grammar for the if-then-else construct.

terminal(a).
terminal(b).
terminal(e).
terminal(i).
terminal(t).

nonterminal(ss).
nonterminal(cc).

start(ss).

p(ss,[i,cc,t,ss,e,ss]).
p(ss,[i,cc,t,ss]).
p(ss,[a]).
p(cc,[b]).

Left factoring the grammar produces the following result.

terminal(a).
terminal(b).
terminal(e).
terminal(i).
terminal(t).

nonterminal(ss).
nonterminal(cc).
nonterminal(ss0).

start(ss).

p(ss,[i,cc,t,ss,ss0]).
p(ss,[a]).
p(ss0,[e,ss]).
p(ss0,[epsilon]).
p(cc,[b]).

The following code left-factors the given grammar.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                          Left Factoring 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

do_left_factoring :- setof(N,nonterminal(N),Ns), do_lf(Ns).

do_lf([]).
do_lf([A|Ns]) :- p(A,R1), p(A,R2), R1 \= R2,
                 maxcommon(R1,R2,C,R1p,R2p), C \= [],
                 newN(A,Ap), 
                 assert(nonterminal(Ap)),
                 restofcommon(A,C,Ap),

                 append(C,[Ap],Rhs),
                 assert(p(A,Rhs)),
                 do_lf([A|Ns]).
do_lf([A|Ns]) :- do_lf(Ns).

restofcommon(A,C,Ap) :- p(A,Rhs),
                        append(C,R,Rhs),
                        retract(p(A,Rhs)),
                        \+ p(Ap,R),
                        ((R =[], \+ p(Ap,[epsilon]), assert(p(Ap,[epsilon])));
                         (R\=[], \+ p(Ap,R),         assert(p(Ap,R)))),

                        restofcommon(A,C,Ap).
restofcommon(A,C,Ap).

maxcommon([X|R1],[X|R2],[X|C],R1p,R2p) :- maxcommon(R1,R2,C,R1p,R2p).
maxcommon(R1,R2,[],R1,R2).

First and Follow Sets

Even if left-recursion can be eliminated and the productions have been left-factored, we may not a have a grammar which is suitable for top-down parsing with one symbol of look-ahead. For example, in the following grammar it is not possible to choose between the productions for the nonterminal aa because x is also derivable from the nonterminal bb.

terminal(x).
terminal(z).

nonterminal(aa).
nonterminal(bb).
nonterminal(cc).

start(aa).

p(aa,[bb,z]).
p(aa,[x,z]).
p(bb,[cc]).
p(bb,[cc,bb]).
p(cc,[x]).

The set of initial terminals derivable from the right-hand side rhs of a production is called first(rhs). For this grammar, the first sets are:

first([bb,z],[x])
first([x,z],[x])
first([cc],[x])
first([cc,bb],[x])
first([x],[x])

The set of first sets are computed by the following program.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                               First Sets  
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  First(A) = Fs where
%                       x       in Fs iff A =>* xB
%                       epsilon in Fs iff A =>* epsilon
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


:- dynamic first/2 .

firstsets :- setof(Rhs,N^p(N,Rhs),Rhss), mk_firsts(Rhss).

mk_firsts([]).
mk_firsts([Rhs|Rhss]) :- first_set(Rhs,Fs), assert(first(Rhs,Fs)),

                         mk_firsts(Rhss).

first_set(L,Fs) :- e_setof(X,a_first(L,[],X),Fs).

a_first([],V,epsilon).
a_first([epsilon],V,epsilon) :- !.
a_first([epsilon|L],V,X) :- a_first(L,V,X).
a_first([X|L],V,X) :- terminal(X).
a_first([N|L],V,X) :- nonterminal(N), 
                      \+ in(N,V),
                      p(N,LN),
                      append(LN,L,NL),
                      a_first(NL,[N|V],X).

When the production p(x,[epsilon]) is included in the set of productions, selection must also be based on the set of terminals which can follow a given non-terminal. We can summarize the previous discussion as follows. A grammar is LL(1) ( suitable for top-down parsing with one symbol of look ahead ) iff it satisfies the following two rules.

The sets of initial symbols of all sentences that can be generated from the right-hand sides of a given non-terminal must be disjoint.
The set of initial symbols of each non-terminal which generates the empty string must be disjoint from the set of symbols which can follow it.

The code which constructs the set of initial terminals which can follow a non-terminal follows.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                              Follow Sets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Follow(A) = Fs where
%                       x       in Fs iff S =>* aAxb
%                       epsilon in Fs iff S =>* aA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- dynamic follow/2 .

followsets :- setof(N,(nonterminal(N),start(S),N \= S),Ns),
              mk_mt_follows(Ns,MT),
              mk_follows([(S,[eof])|MT],FS),

              assertFS(FS).

mk_mt_follows([],[]).
mk_mt_follows([N|Ns],[(N,[])|MT]) :- mk_mt_follows(Ns,MT).


mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B|Beta],L), nonterminal(B),
                       Beta \= [], 
                       first_set(Beta,FirstBeta),
                       delete(epsilon,FirstBeta,FB),
                       append(Fp,[(B,FSB)|RFs],CFS),
                       union(FB,FSB,NFSB), \+ samesets(FSB,NFSB),
                       append(Fp,[(B,NFSB)|RFs],NCFS),
                       mk_follows(NCFS,AFS).

mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B],L), nonterminal(B),

                       append(Fp,[(B,FSB)|RFs],CFS),
                       in((A,FSA),CFS),
                       union(FSA,FSB,NFSB), \+ samesets(FSB,NFSB),
                       append(Fp,[(B,NFSB)|RFs],NCFS),
                       mk_follows(NCFS,AFS).

mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B|Beta],L), nonterminal(B),
                       Beta \= [], 
                       first_set(Beta,FirstBeta), in(epsilon,FirstBeta),
                       append(Fp,[(B,FSB)|RFs],CFS),

                       in((A,FSA),CFS),
                       union(FSA,FSB,NFSB), \+ samesets(FSB,NFSB),
                       append(Fp,[(B,NFSB)|RFs],NCFS),
                       mk_follows(NCFS,AFS).
mk_follows(FS,FS).

assertFS([]).
assertFS([(N,Fs)|FS]) :- assert(follow(N,Fs)), assertFS(FS).

Parsing Table and an LL(1) Parser

The set of first and follow sets may be used to construct a table which can be used to guide the parser of an language which has an LL(1) grammar. The table is indexed by a non-terminal and the current input symbol. The following code constructs such a table.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                  Construction of Parsing Tables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

do :- firstsets, followsets, make_LL1_table.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                  Construction of Predictive Parsing Table
%                     Aho & Ullman: Algorithm 5.4 p.190
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


:- dynamic table/3 .

make_LL1_table :- setof((N,Rhs),p(N,Rhs),Productions),

                  make_entries(Productions).

make_entries([(N,Rhs)|Productions]) :- make_an_entry((N,Rhs)),
                                       make_entries(Productions).

make_an_entry((N,Rhs)) :- first(Rhs,First),
                          each_entry((N,Rhs),First).

each_entry((N,Rhs),[]).
each_entry((N,Rhs),[eof]).

% 2.
each_entry((N,Rhs),First) :- in(T,First), terminal(T), 
                             delete(T,First,Rfirst),
                             table(N,T,(N,Rhs)),!,
                             each_entry((N,Rhs),Rfirst).

each_entry((N,Rhs),First) :- in(T,First), terminal(T), 
                             delete(T,First,Rfirst),
                             assert(table(N,T,(N,Rhs))),

                             each_entry((N,Rhs),Rfirst).
% 3.
each_entry((N,Rhs),First) :- in(epsilon,First), 
                             follow(A,Follow),
                             each_terminal((N,Rhs),Follow),
                             eof_part((N,Rhs),Follow),
                             table(N,T,(N,Rhs)),
                             delete(epsilon,First,Rfirst),
                             each_entry((N,Rhs),Rfirst).


eof_part((N,Rhs),Follow) :- in(eof,Follow),        table(N,eof,(N,Rhs)),!.
eof_part((N,Rhs),Follow) :- in(eof,Follow), assert(table(N,eof,(N,Rhs))).
eof_part((N,Rhs),Follow).

each_terminal((N,Rhs),[]).
each_terminal((N,Rhs),[T|Follow]) :- table(N,T,(N,Rhs)),!,

                                     each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[T|Follow]) :- assert(table(N,T,(N,Rhs))),
                                     each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[epsilon|Follow]) :- each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[    eof|Follow]) :- each_terminal((N,Rhs),Follow).

A parser uses the table generated by the previous program as follows. Initially the start symbol of the grammar is placed on a stack. Then the next two rules are followed until the input is empty or a situation arises for which there is no entry in the table.

If the current input symbol and the stack top symbol are the same, then pop the stack and consume the input symbol.
If the stack top symbol is a non-terminal then consult the table, pop the stack and push the table entry onto the stack.

Here is the code for an LL(1) parser.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                             LL(1) Parser Program
%                           Aho & Ullman: p.184,186
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

ll_1_parser :- start(S), input(Input), ll_1_parser([S,eof],Input).

ll_1_parser([],[]).
ll_1_parser([T|Stack],[T|Input]) :- ll_1_parser(Stack,Input).
ll_1_parser([N|Stack],[X|Input]) :- table(N,X,(N,Rhs)),

                                    append(Rhs,Stack,NStack),
                                    ll_1_parser(NStack,[X|Input]).

ll_1_parser(Stack,Input) :- error_recovery_routine(Stack,Input).

Here are some miscellaneous predicates required by the previous code.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                        Miscellaneous Predicates
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

append([],L,L).
append([X|A],B,[X|L]) :- append(A,B,L).

in(X,[X|L]).
in(X,[Y|L]) :- in(X,L).

union([],B,B).
union(A,[],A).
union([X|A],B,C) :- in(X,B), union(A,B,C).
union([X|A],B,[X|C]) :- \+ in(X,B), union(A,B,C).

subset([],B).
subset([X|A],B) :- in(X,B), subset(A,B).

samesets(A,B) :- subset(A,B), subset(B,A).

delete(X,[],[]).
delete(X,[X|R],R).
delete(X,[Y|R],[Y|L]) :- X \= Y, delete(X,R,L).


e_bagof(A,B,C) :- bagof(A,B,C),!.
e_bagof(A,B,[]).

e_setof(A,B,C) :- setof(A,B,C),!.
e_setof(A,B,[]).

sort(List,Sorted) :- sort(List,[],Sorted).
sort([],Sorted,Sorted).
sort([X|List],PSort,Sorted) :- insert(X,PSort,Psort), sort(List,Psort,Sorted).

insert(X,[],[X]).
insert(X,[Y|List],[X,Y|List]) :- X @=< Y.
insert(X,[Y|List],[Y|ListP]) :- X @> Y, insert(X,List,ListP).

?1996 by A. Aaby