Assignment 4 (Context-Free Languages and Grammars)

  1. Construct context-free grammars for the following languages:

    1. The set of words with even number of a's or odd number of b's

    2. L = {anb2namb2m | m ≥ 1 and n ≥ 1 }

    3. The set of words that have more a's than b's.

  2. Consider the following CFG:
    	S -> S + S
    	S -> S * S
    	S -> (S)
    	S -> a
    

    1. Give ALL possible derivations for the string a+a*a

    2. What parse trees can you generate for these derivations?

  3. Consider the following grammar for "Datalog Atoms":
    atom : NAME LPAREN args RPAREN  
      
    args : args COMMA arg  
    args : arg  
      
    arg  : NUMBER  
    arg  : STRING  
    arg  : NAME
    
    Write a "left-most" derivation for p(x,20,x,y,"john") and draw the parse tree.

  4. A JSON expression is one of four types:
    1. STRING
    2. NUMBER
    3. JSON object, which starts with a left brace (LBRACE), followed by a list of key-value pairs separated by COMMAs and ends with a right brace (RBRACE). The key-value pair has a COLON between the key and the value. The keys are STRINGS enclosed within double-quotes and values can be any JSON expression.
    4. JSON array, which starts with a left bracket (LBRACKET), followed by a list of JSON expresssions separated by COMMAs, and ends with a right bracket (RBRACKET).
    Some examples of JSON expressions are:
    25
    
    "jones"
    
    [20, 30, 40]
    
    [20,"jones",22]
    
    [{"a" : 20},{"b": 30}]
    
    { "name" : "John",
      "age" : 23,
      "address" : { "street" : "123 main street",
                    "city"   : "Atlanta",
                    "state"  : "Georgia",
                    "zip"    : "30303"
                  },
      "phone" : ["111-1234","111-1235"]
    }
    
    Write a context-free grammar for JSON expressions. Then, input the grammar rules into PLY programs (Lexer, Parser, and Main) and test this. Include the SEMI token at the end of the JSON expression, read from terminal, and call the parse() method. You need not construct a data structure - so you simply do not include the assignment statement that assigns a value into p[0]! However, for the start rule, include the assignment statement that assigns True to p[0]. This way you can check for return value and print a "VALID" or "INVALID" message. Use the main program JSON.py. A sample run is shown below:
    macbook-pro:json raj$ python3 JSON.py
    JSON: 45;
    VALID
    JSON: "jones";
    VALID
    JSON: [1,2,3];
    VALID
    JSON: [1,"jones",3];
    VALID
    JSON: {"a":2, "b":5};
    VALID
    JSON: { "street" : "123 main street", "city"   : "Atlanta", "state"  : "Georgia", "zip"    : "30303" };
    VALID
    JSON: [{"a":10},{"b":20}];
    VALID
    JSON: {"a": [1,2,3], "b": [3,4,5]};
    VALID
    JSON: [1,2,,];
    Syntax error in input!
    INVALID
    JSON: {1,2,3};
    Syntax error in input!
    INVALID
    JSON: exit;
    
    Submit JSONLexer.py and JSONParser.py.

  5. Write a Python program using PLY to compute a Relational Algebra expression corresponding to an atomic formula in Datalog. Here are some sample atomic formulas from Datalog:
    employee("jones",20,x,y,"john",x,y,x,x,z)
    p(x,x,x,x,x)
    p(x,20,x,y,"john",x,y)
    student(x,y,z)
    
    As can be seen, an atomic formula in Datalog begins with a NAME token followed by a left parenthesis (LPAREN), followed by a list or arguments and ends with a right parenthesis (RPAREN). The list of arguments is COMMA separated and each of argument can be either a NUMBER constant (assume positive integers) or a STRING constant, or a NAME corresponding to a variable.

    A sample run of the final program on a Terminal is shown below (user input is shown in RED):

    macbook-pro:dlog raj$ python3 DLOG.py
    
    DLG: employee("jones",20,x,y,"john",x,y,x,x,z)
    rename[x,y,z](project[x_2,x_3,x_9](select[x_0="jones" and x_1=20 and x_4="john" and x_2=x_5 and x_5=x_7 and x_7=x_8 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9](employee))))
    
    DLG: p(x,x,x,x,x)
    rename[x](project[x_0](select[x_0=x_1 and x_1=x_2 and x_2=x_3 and x_3=x_4](rename[x_0,x_1,x_2,x_3,x_4](p))))
    
    DLG: p(x,20,x,y,"john",x,y)
    rename[x,y](project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))))
    
    DLG: student(x,y,z)
    rename[x,y,z](student)
    
    DLG: exit
    
    Some notes:

    1. STEP 1: Identify all tokens and define the lexical specifications in DLOGLexer.py. This has already been done for you above!

    2. STEP 2: Write a grammar specification for the Datalog atom. Then, write the Parser specification in DLOGParser.py.

    3. STEP 3: Write the main program (DLOG.py) that prompts the user, reads a Datalog atom from the keyboard, calls the parser which returns a data structure containing the details of the atom. While debugging, you may print this data structure to verify that the lexical analysis and parsing steps work.

    4. STEP 4: Finally, write the code in DLOG.py to process the data structure and generate the Relational Algebra string. Print this string to terminal.

    The conversion from Datalog atom to Relational Algebra string works as follows (described in an inside-out manner, with p(x,20,x,y,"john",x,y) as an example):

    1. First produce a list of variables, x_0, ..., x_n-1, where n is the number of arguments in the Datalog atom and generate the following string:
      rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p)
      
    2. Next, generate the select conditions; There are two types of select conditions: one that correspond to the numeric or string constants in the Datalog atom and the other that correspond to repeated variable arguments in the Datalog atom. Combine all of the conditions with "and" and generate the following string:
      select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))
      
      As you can see, there are two conditions that correspond to the constant arguments of the atom, two conditions to ensure that the 3 instances of variable x are equated and one condition to ensure that the 2 instances of the variable y are equated. Notice that this string builds on the string generated in the previous step.
    3. Next, generate the project list; this list consists of the unique variables in the Datalog atom (in this case the unique variables are x and y); Generate the following string, again building on the previous string:
      project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p)))
      
    4. In the final step, produce the rename list of unique variables from the original names of variables and generate the final string:
      rename[x,y](project[x_0,x_3](select[x_1=20 and x_4="john" and x_0=x_2 and x_2=x_5 and x_3=x_6](rename[x_0,x_1,x_2,x_3,x_4,x_5,x_6](p))))
      
    Notice that in the final example, student(x,y,z), there are no constant arguments and no repeated variables resulting in an empty select condition. In this case produce only the final rename part as shown in the sample run. This is a special case and can be handled at the end.

    Submit the following files: DLOG.py, DLOGLexer.py, and DLOGParser.py.