Parser Tools in Prolog
This document describes some tools for processing context-free grammars into
LL(1) form. The following tools are available
Context-Free Grammar
The specification of the
contex-free grammar for a language consists of four items, the specification of
the terminal symbols of the language, the specification of the nonterminals, the
productions or derivation rules, and the start symbol of the grammar. As an
example of a context-free grammar, here is a specification of the context-free
grammar for arithmetic expression.
terminal('+').
terminal('*').
terminal('(').
terminal(')').
terminal(id).
nonterminal(e). % expression
nonterminal(t). % term
nonterminal(f). % factor
start(e).
p(e,[t]).
p(e,[e,'+',t]).
p(t,[f]).
p(t,[t,'*',f]).
p(f,[id]).
p(f,['(',e,')']).
Removal of Left-Recursion
Note that the second
production for expression in the grammar of the previous section is left
recursive. The use of such a production in the design of a recursive descent
parser would result in an infinite loop. Sometimes left recursion may be
eliminated.
terminal('+').
terminal('*').
terminal(id).
terminal('(').
terminal(')').
nonterminal(e).
nonterminal(e0).
nonterminal(t).
nonterminal(t0).
nonterminal(f).
start(e).
p(e,[t,e0]).
p(e0,['+',t,e0]).
p(e0,[epsilon]).
p(t,[f,t0]).
p(t0,['*',f,t0]).
p(t0,[epsilon]) .
p(f,['(',e,')']).
p(f,[id]).
Here is a Prolog program which removes left
recursion if possible.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Removal of Left Recursion
% Waite and Goos: p. 126
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- dynamic p/2 .
:- dynamic nonterminal/1 .
remove_left_recursion :- setof(N,nonterminal(N),Ns),
rm_left_rec([],Ns).
rm_left_rec(Xjs,[]).
rm_left_rec(Xjs,[Xi|R]) :- step2(Xjs,Xi),
step3(Xi,Xip,Xjs),
append(R,Xip,Rs),
append(Xjs,[Xi],NXjs),
rm_left_rec(NXjs,Rs).
step2([],Xi).
step2([Xj|Xs],Xi) :- e_bagof(W,p(Xi,[Xj|W]),Ws), Ws \= [], Ws \= [[]],
e_bagof(XXj,p(Xj,XXj),XXs),
retractall(p(Xi,[Xj|Wx])),
replace_each0(Xi,Xj,Ws,XXs),
step2(Xs,Xi).
step2([Xj|Xs],Xi) :- step2(Xs,Xi).
replace_each0(Xi,Xj,[],_).
replace_each0(Xi,Xj,[W|Ws],XXs) :- replace_each00(Xi,Xj,W,XXs),
replace_each0(Xi,Xj,Ws,XXs).
replace_each00(Xi,Xj,[],XXs).
replace_each00(Xi,Xj,W,[]).
replace_each00(Xi,Xj,W,[XXj|XXs]) :- append(XXj,W,XXjW),
assert(p(Xi,XXjW)),
replace_each00(Xi,Xj,W,XXs).
step3(Xi,[Xip],Xjs) :- e_bagof(W,p(Xi,[Xi|W]),Ws), Ws \= [], Ws \= [[]],
e_bagof([K|X],(p(Xi,[K|X]),K\=Xi),Xs),
retractall(p(Xi,Rhs)),
newN(Xi,Bi),
assert(nonterminal(Bi)),
assert(p(Bi,[epsilon])),
replace_each1(Xi,Bi,Ws,Xjs),
replace_each2(Xi,Bi,Xs).
step3(Xi,[],Done).
newN(Xi,Xip) :- atomtolist(Xi,L), int(N),
atomtolist(N,Ln), append(L,Ln,Lp), atomtolist(Xip,Lp),
\+ nonterminal(Xip).
int(0).
int(N) :- int(M), N is M+1.
replace_each1(Xi,Bi,[],Xjs).
replace_each1(Xi,Bi,[W|Ws],Xjs) :- append(W,[Bi],WBi),
assert(p(Bi,WBi)),
replace_each1(Xi,Bi,Ws,Xjs).
replace_each2(Xi,Bi,[]).
replace_each2(Xi,Bi,[XX|XXs]) :- append(XX,[Bi],XXBi),
assert(p(Xi,XXBi)),
replace_each2(Xi,Bi,XXs).
Left-Factoring
Another common problem is when two
productions for the same nonterminal share a common prefix on the right-hand
side of the productions. The common prefix makes it impossible to choose (with a
fixed amount of look-ahead) the proper production in top-down parsing. This
elimination of the common prefix is called left-factoring. Here is a grammar for
the if-then-else construct.
terminal(a).
terminal(b).
terminal(e).
terminal(i).
terminal(t).
nonterminal(ss).
nonterminal(cc).
start(ss).
p(ss,[i,cc,t,ss,e,ss]).
p(ss,[i,cc,t,ss]).
p(ss,[a]).
p(cc,[b]).
Left factoring the grammar produces the following
result.
terminal(a).
terminal(b).
terminal(e).
terminal(i).
terminal(t).
nonterminal(ss).
nonterminal(cc).
nonterminal(ss0).
start(ss).
p(ss,[i,cc,t,ss,ss0]).
p(ss,[a]).
p(ss0,[e,ss]).
p(ss0,[epsilon]).
p(cc,[b]).
The following code left-factors the given grammar.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Left Factoring
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
do_left_factoring :- setof(N,nonterminal(N),Ns), do_lf(Ns).
do_lf([]).
do_lf([A|Ns]) :- p(A,R1), p(A,R2), R1 \= R2,
maxcommon(R1,R2,C,R1p,R2p), C \= [],
newN(A,Ap),
assert(nonterminal(Ap)),
restofcommon(A,C,Ap),
append(C,[Ap],Rhs),
assert(p(A,Rhs)),
do_lf([A|Ns]).
do_lf([A|Ns]) :- do_lf(Ns).
restofcommon(A,C,Ap) :- p(A,Rhs),
append(C,R,Rhs),
retract(p(A,Rhs)),
\+ p(Ap,R),
((R =[], \+ p(Ap,[epsilon]), assert(p(Ap,[epsilon])));
(R\=[], \+ p(Ap,R), assert(p(Ap,R)))),
restofcommon(A,C,Ap).
restofcommon(A,C,Ap).
maxcommon([X|R1],[X|R2],[X|C],R1p,R2p) :- maxcommon(R1,R2,C,R1p,R2p).
maxcommon(R1,R2,[],R1,R2).
First and Follow Sets
Even if left-recursion can
be eliminated and the productions have been left-factored, we may not a have a
grammar which is suitable for top-down parsing with one symbol of look-ahead.
For example, in the following grammar it is not possible to choose between the
productions for the nonterminal aa because x is also derivable
from the nonterminal bb.
terminal(x).
terminal(z).
nonterminal(aa).
nonterminal(bb).
nonterminal(cc).
start(aa).
p(aa,[bb,z]).
p(aa,[x,z]).
p(bb,[cc]).
p(bb,[cc,bb]).
p(cc,[x]).
The set of initial terminals derivable from the
right-hand side rhs of a production is called first(rhs). For this
grammar, the first sets are:
first([bb,z],[x])
first([x,z],[x])
first([cc],[x])
first([cc,bb],[x])
first([x],[x])
The set of first sets are computed by the
following program.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% First Sets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% First(A) = Fs where
% x in Fs iff A =>* xB
% epsilon in Fs iff A =>* epsilon
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- dynamic first/2 .
firstsets :- setof(Rhs,N^p(N,Rhs),Rhss), mk_firsts(Rhss).
mk_firsts([]).
mk_firsts([Rhs|Rhss]) :- first_set(Rhs,Fs), assert(first(Rhs,Fs)),
mk_firsts(Rhss).
first_set(L,Fs) :- e_setof(X,a_first(L,[],X),Fs).
a_first([],V,epsilon).
a_first([epsilon],V,epsilon) :- !.
a_first([epsilon|L],V,X) :- a_first(L,V,X).
a_first([X|L],V,X) :- terminal(X).
a_first([N|L],V,X) :- nonterminal(N),
\+ in(N,V),
p(N,LN),
append(LN,L,NL),
a_first(NL,[N|V],X).
When the production p(x,[epsilon]) is included in
the set of productions, selection must also be based on the set of terminals
which can follow a given non-terminal. We can summarize the previous discussion
as follows. A grammar is LL(1) ( suitable for top-down parsing with one symbol
of look ahead ) iff it satisfies the following two rules.
- The sets of initial symbols of all sentences that can be generated from
the right-hand sides of a given non-terminal must be disjoint.
- The set of initial symbols of each non-terminal which generates the empty
string must be disjoint from the set of symbols which can follow it.
The code which constructs the set of initial terminals which can
follow a non-terminal follows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Follow Sets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Follow(A) = Fs where
% x in Fs iff S =>* aAxb
% epsilon in Fs iff S =>* aA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- dynamic follow/2 .
followsets :- setof(N,(nonterminal(N),start(S),N \= S),Ns),
mk_mt_follows(Ns,MT),
mk_follows([(S,[eof])|MT],FS),
assertFS(FS).
mk_mt_follows([],[]).
mk_mt_follows([N|Ns],[(N,[])|MT]) :- mk_mt_follows(Ns,MT).
mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B|Beta],L), nonterminal(B),
Beta \= [],
first_set(Beta,FirstBeta),
delete(epsilon,FirstBeta,FB),
append(Fp,[(B,FSB)|RFs],CFS),
union(FB,FSB,NFSB), \+ samesets(FSB,NFSB),
append(Fp,[(B,NFSB)|RFs],NCFS),
mk_follows(NCFS,AFS).
mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B],L), nonterminal(B),
append(Fp,[(B,FSB)|RFs],CFS),
in((A,FSA),CFS),
union(FSA,FSB,NFSB), \+ samesets(FSB,NFSB),
append(Fp,[(B,NFSB)|RFs],NCFS),
mk_follows(NCFS,AFS).
mk_follows(CFS,AFS) :- p(A,L), append(Alpha,[B|Beta],L), nonterminal(B),
Beta \= [],
first_set(Beta,FirstBeta), in(epsilon,FirstBeta),
append(Fp,[(B,FSB)|RFs],CFS),
in((A,FSA),CFS),
union(FSA,FSB,NFSB), \+ samesets(FSB,NFSB),
append(Fp,[(B,NFSB)|RFs],NCFS),
mk_follows(NCFS,AFS).
mk_follows(FS,FS).
assertFS([]).
assertFS([(N,Fs)|FS]) :- assert(follow(N,Fs)), assertFS(FS).
Parsing Table and an LL(1) Parser
The set of first and
follow sets may be used to construct a table which can be used to guide the
parser of an language which has an LL(1) grammar. The table is indexed by a
non-terminal and the current input symbol. The following code constructs such a
table.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Construction of Parsing Tables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
do :- firstsets, followsets, make_LL1_table.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Construction of Predictive Parsing Table
% Aho & Ullman: Algorithm 5.4 p.190
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:- dynamic table/3 .
make_LL1_table :- setof((N,Rhs),p(N,Rhs),Productions),
make_entries(Productions).
make_entries([(N,Rhs)|Productions]) :- make_an_entry((N,Rhs)),
make_entries(Productions).
make_an_entry((N,Rhs)) :- first(Rhs,First),
each_entry((N,Rhs),First).
each_entry((N,Rhs),[]).
each_entry((N,Rhs),[eof]).
% 2.
each_entry((N,Rhs),First) :- in(T,First), terminal(T),
delete(T,First,Rfirst),
table(N,T,(N,Rhs)),!,
each_entry((N,Rhs),Rfirst).
each_entry((N,Rhs),First) :- in(T,First), terminal(T),
delete(T,First,Rfirst),
assert(table(N,T,(N,Rhs))),
each_entry((N,Rhs),Rfirst).
% 3.
each_entry((N,Rhs),First) :- in(epsilon,First),
follow(A,Follow),
each_terminal((N,Rhs),Follow),
eof_part((N,Rhs),Follow),
table(N,T,(N,Rhs)),
delete(epsilon,First,Rfirst),
each_entry((N,Rhs),Rfirst).
eof_part((N,Rhs),Follow) :- in(eof,Follow), table(N,eof,(N,Rhs)),!.
eof_part((N,Rhs),Follow) :- in(eof,Follow), assert(table(N,eof,(N,Rhs))).
eof_part((N,Rhs),Follow).
each_terminal((N,Rhs),[]).
each_terminal((N,Rhs),[T|Follow]) :- table(N,T,(N,Rhs)),!,
each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[T|Follow]) :- assert(table(N,T,(N,Rhs))),
each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[epsilon|Follow]) :- each_terminal((N,Rhs),Follow).
each_terminal((N,Rhs),[ eof|Follow]) :- each_terminal((N,Rhs),Follow).
A
parser uses the table generated by the previous program as follows. Initially
the start symbol of the grammar is placed on a stack. Then the next two rules
are followed until the input is empty or a situation arises for which there is
no entry in the table.
- If the current input symbol and the stack top symbol are the same, then
pop the stack and consume the input symbol.
- If the stack top symbol is a non-terminal then consult the table, pop the
stack and push the table entry onto the stack.
Here is the code for an
LL(1) parser.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% LL(1) Parser Program
% Aho & Ullman: p.184,186
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ll_1_parser :- start(S), input(Input), ll_1_parser([S,eof],Input).
ll_1_parser([],[]).
ll_1_parser([T|Stack],[T|Input]) :- ll_1_parser(Stack,Input).
ll_1_parser([N|Stack],[X|Input]) :- table(N,X,(N,Rhs)),
append(Rhs,Stack,NStack),
ll_1_parser(NStack,[X|Input]).
ll_1_parser(Stack,Input) :- error_recovery_routine(Stack,Input).
Here are some miscellaneous predicates required by the
previous code.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Miscellaneous Predicates
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
append([],L,L).
append([X|A],B,[X|L]) :- append(A,B,L).
in(X,[X|L]).
in(X,[Y|L]) :- in(X,L).
union([],B,B).
union(A,[],A).
union([X|A],B,C) :- in(X,B), union(A,B,C).
union([X|A],B,[X|C]) :- \+ in(X,B), union(A,B,C).
subset([],B).
subset([X|A],B) :- in(X,B), subset(A,B).
samesets(A,B) :- subset(A,B), subset(B,A).
delete(X,[],[]).
delete(X,[X|R],R).
delete(X,[Y|R],[Y|L]) :- X \= Y, delete(X,R,L).
e_bagof(A,B,C) :- bagof(A,B,C),!.
e_bagof(A,B,[]).
e_setof(A,B,C) :- setof(A,B,C),!.
e_setof(A,B,[]).
sort(List,Sorted) :- sort(List,[],Sorted).
sort([],Sorted,Sorted).
sort([X|List],PSort,Sorted) :- insert(X,PSort,Psort), sort(List,Psort,Sorted).
insert(X,[],[X]).
insert(X,[Y|List],[X,Y|List]) :- X @=< Y.
insert(X,[Y|List],[Y|ListP]) :- X @> Y, insert(X,List,ListP).
?1996 by A. Aaby