Csc 4330/6330, Programming Language Concepts (Summer 2020)

Homework 1 (Due: 16 June (Tuesday); Late submission by 23 June (Tuesday))

sudo handin4330 1 DLOGAtom.py DLOGAtomLexer.py DLOGParser.py

Datalog Atom to Relational Algebra

Write a Python program using the PLY package to compute a Relational Algebra expression corresponding to an atomic formula in Datalog. Here are some sample atomic formulas from Datalog:
employee("jones",20,x,y,"john",x,y,x,x,z)
p(x,x,x,x,x)
p(x,20,x,y,"john",x,y)
student(x,y,z)
As can be seen, an atomic formula in Datalog begins with a NAME token followed by a left parenthesis token (LPAREN), followed by a list or arguments and ends with a right parenthesis token (RPAREN). The list of arguments is COMMA separated and each of argument can be either a NUMBER constant (assume positive integers) or a STRING constant, or a NAME corresponding to a variable.

A sample run of the final program on a Terminal is shown below (user input is shown in RED):

macbook-pro:dlog-atom raj$ python3 DLOGAtom.py

DLGAtom: employee("jones",20,x,y,"john",x,y,x,x,z)
rename[x,y,z](project[x_2,x_3,x_9](select[x_0="jones" and x_1=20 and x_4="john" and x_2=x_5 and x_5=x_7 and x_7=x_8 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9](employee))))

DLGAtom: p(x,x,x,x,x)
rename[x](project[x_0](select[x_0=x_1 and x_1=x_2 and x_2=x_3 and x_3=x_4](rename[x_0,x_1,x_2,x_3,x_4](p))))

DLGAtom: p(x,20,x,y,"john",x,y)
rename[x,y](project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))))

DLGAtom: student(x,y,z)
rename[x,y,z](student)

DLGAtom: exit
You will develop the program in stages as discussed in the video lectures:
  1. STEP 1: Identify all tokens and define the lexical specifications in DLOGAtomLexer.py. This has already been done for you above!
  2. STEP 2: Write a grammar specification for the Datalog atom. Then, write the Parser specification in DLOGAtomParser.py.
  3. STEP 3: Write the main program (DLOGAtom.py) that prompts the user, reads a Datalog atom from the keyboard, calls the parser which returns a data structure containing the details of the atom. You may print this data structure to verify that the lexical analysis and parsing steps work.
  4. STEP 4: Finally, write the code in DLOGAtom.py to process the data structure and generate the Relational Algebra string. Print this string to terminal.
The conversion from Datalog atom to Relational Algebra string works as follows (describe in an inside-out manner, with p(x,20,x,y,"john",x,y) as an example):
  1. First produce a list of variables, x_0, ..., x_n-1, where n is the number of arguments in the Datalog atom and generate the following string:
    rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p)
    
  2. Next, generate the select conditions; There are two types of select conditions: one that correspond to the numeric or string constants in the Datalog atom and the other that correspond to repeated variable arguments in the Datalog atom. Combine all of the conditions with "and" and generate the following string:
    select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))
    
    As you can see, there are two conditions that correspond to the constant arguments of the atom, two conditions to ensure that the 3 instances of variable x are equated and one condition to ensure that the 2 instances of the variable y are equated. Notice that this string builds on the string generated in the previous step.
  3. Next, generate the project list; this list consists of the unique variables in the Datalog atom (in this case the unique variables are x and y); Generate the following string, again building on the previous string:
    project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p)))
    
  4. In the final step, produce the rename list of unique variables from the original names of variables and generate the final string:
    rename[x,y](project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))))
    
Notice that in the final example, student(x,y,z), there are no constant arguments and no repeated variables. In this case the select conditions will be empty. In this case produce only the final rename part as shown in the sample run. This is a special case and can be handled at the end.

Submit the following files: DLOGAtom.py, DLOGAtomLexer.py, DLOGAtomParser.py, and README.

Since this assignment consists of several phases, and in case you cannot complete the assignment fully, please put down all the phases that are working in the README file. You may also activated debug statements (prints) that show the parts working. Of course, if the program is working then comment out all the debug prints.