Csc 1302, Honors Principles of Computer Science II (Fall 2023)

Week 2 (10 September 2023)

Initialize Database from Files; Check Equality of Tuples; Remove Duplicates in Relation

During this week you will write the following methods:
  1. Initialize the Database object by reading data stored in several files in a directory given on the command line (this method belongs to Database.py)
    # Create the database object by reading data from several files in directory dir
    def initializeDatabase(self, dir):
    	pass
    
    A sample directory is available: drinks (also in .zip format: drinks.zip). The file catalog.dat contains schema information for all the relations in the database, and the individual .dat files contain the relation instances (tuples).

    To read data from a file, you can use the following code:

    f = open("f.dat", "r")
    s = f.readline().strip("\n")  # read one line and drop the trailing newline
    ...
    ...
    f.close()
    
  2. Test equality of tuples (this method belongs to Tuple.py)
    # Return True if this tuple is equal to compareTuple; False otherwise
    # make sure the schemas are the same; return False if the schemas are not the same
    def equals(self, compareTuple):
    	pass
    
  3. Remove duplicate tuples (this method belongs to Relation.py)
    ## Remove duplicate tuples from this relation
    def removeDuplicates(self):
    	pass
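
A minimal sketch of methods 2 and 3, assuming a tuple stores its attribute names, domains, and values in three parallel lists and a relation keeps its tuples in a list named `table`. The classes `SimpleTuple` and `SimpleRelation` and all of their field names are hypothetical stand-ins; the actual Tuple.py and Relation.py may organize their data differently.

```python
# Hypothetical stand-ins for Tuple and Relation; field names are assumptions.
class SimpleTuple:
    def __init__(self, attributes, domains, values):
        self.attributes = attributes  # e.g. ["SID", "SNAME"]
        self.domains = domains        # e.g. ["INTEGER", "VARCHAR"]
        self.values = values          # e.g. [1111, "Robert Adams"]

    def equals(self, compareTuple):
        # Tuples with different schemas are never equal.
        if self.attributes != compareTuple.attributes:
            return False
        if self.domains != compareTuple.domains:
            return False
        # Same schema: compare the components position by position.
        return self.values == compareTuple.values


class SimpleRelation:
    def __init__(self, name):
        self.name = name
        self.table = []  # list of SimpleTuple objects

    def addTuple(self, t):
        self.table.append(t)

    def removeDuplicates(self):
        # Keep the first occurrence of each tuple, preserving order.
        unique = []
        for t in self.table:
            if not any(t.equals(u) for u in unique):
                unique.append(t)
        self.table = unique


attrs, doms = ["SID", "SNAME"], ["INTEGER", "VARCHAR"]
student = SimpleRelation("STUDENT")
for sid, sname in [(1111, "Robert Adams"), (1112, "Charles Bailey"),
                   (1113, "Donald James"), (1112, "Charles Bailey"),
                   (1112, "Charles Bailey"), (1114, "Michael James"),
                   (1113, "Donald James")]:
    student.addTuple(SimpleTuple(attrs, doms, [sid, sname]))

student.removeDuplicates()
print(len(student.table))  # 4
```

The pairwise comparison is quadratic in the number of tuples, which is fine for relations of this size and relies only on `equals`.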
    

Download the driver programs and implement all the methods in the Python classes. Then run the driver programs.


You should see the following output when you run Driver2.py:
Mac-mini:week2 raj$ python3 Driver2.py drinks
BAR(BNAME:VARCHAR)
Number of tuples:4

Jillians:
Dugans:
ESPN Zone:
Charlies:

DRINKER(DNAME:VARCHAR)
Number of tuples:5

John:
Peter:
Donald:
Jeremy:
Clark:

SELLS(BAR:VARCHAR,BEER:VARCHAR,PRICE:INTEGER)
Number of tuples:9

Jillians:Bud:6:
Jillians:Michelob:6:
Jillians:Heineken:8:
Dugans:Bud:9:
Dugans:Michelob:10:
Dugans:Fosters:12:
ESPN Zone:Fosters:9:
Charlies:Heineken:10:
Charlies:Foster:10:

Mac-mini:week2 raj$

and the following output when you run the DuplicatesDriver.py program:
Mac-mini:week2 raj$ python3 DuplicatesDriver.py 
Before Removing Duplicates: 
STUDENT(SID:INTEGER,SNAME:VARCHAR)
Number of tuples:7

1111:Robert Adams:
1112:Charles Bailey:
1113:Donald James:
1112:Charles Bailey:
1112:Charles Bailey:
1114:Michael James:
1113:Donald James:

After Removing Duplicates: 
STUDENT(SID:INTEGER,SNAME:VARCHAR)
Number of tuples:4

1111:Robert Adams:
1112:Charles Bailey:
1113:Donald James:
1114:Michael James:
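
The listings above follow a regular layout: a schema line, a tuple count, a blank line, and then one line per tuple with a colon after every component. The sketch below reproduces that layout so you can compare your own output against it. The function `format_relation` and its parameters are hypothetical and are not part of the provided classes.

```python
def format_relation(name, attributes, domains, rows):
    # Build the schema line, e.g. STUDENT(SID:INTEGER,SNAME:VARCHAR)
    schema = ",".join(a + ":" + d for a, d in zip(attributes, domains))
    lines = [name + "(" + schema + ")",
             "Number of tuples:" + str(len(rows)),
             ""]
    # Each tuple is printed with a colon after every component.
    for row in rows:
        lines.append("".join(str(v) + ":" for v in row))
    return "\n".join(lines)


text = format_relation("STUDENT",
                       ["SID", "SNAME"],
                       ["INTEGER", "VARCHAR"],
                       [[1111, "Robert Adams"], [1114, "Michael James"]])
print(text)
```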


Pseudo code for initializeDatabase

## Create the database object by reading data from several files in directory dir
def initializeDatabase(self, dir):
  ## Pseudo Code follows
  Open file "catalog.dat"
  Read number of relations in the database
  for each relation:
    ## Read relation name and schema information
    Read relation name
    Read number of attributes in the relation
    Create empty lists for attributes and domains
    for each attribute:
      Read attribute name
      Read attribute domain
      Add attribute name to attributes list
      Add domain to domains list
    Create a new Relation object
    ## Now Read data for tuples, create tuples and add to relation
    Construct file name for relation data file
    Open the relation data file
    Read number of tuples in relation
    for each tuple:
      Create new tuple object
      for each component of tuple:
        Read component value and add to tuple
      Add tuple to relation
    Close relation data file
    Add relation to database
  Close catalog.dat file
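
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference solution: it assumes every item in catalog.dat and in the data files sits on its own line, that the data for relation R is stored in R.dat inside the directory, and that INTEGER values should be converted with `int`. It also stores the result in plain dictionaries instead of the course's Database, Relation, and Tuple classes, and the names `read_line` and `load_database` are hypothetical.

```python
import os
import tempfile


def read_line(f):
    # Read one line and drop the trailing newline.
    return f.readline().strip("\n")


def load_database(dir):
    # Assumed layout (not confirmed by the handout): one item per line,
    # and the tuples of relation R are stored in <dir>/R.dat.
    database = {}
    with open(os.path.join(dir, "catalog.dat"), "r") as catalog:
        num_relations = int(read_line(catalog))
        for _ in range(num_relations):
            # Read relation name and schema information.
            rname = read_line(catalog)
            num_attributes = int(read_line(catalog))
            attributes, domains = [], []
            for _ in range(num_attributes):
                attributes.append(read_line(catalog))
                domains.append(read_line(catalog))
            # Read the tuples from the relation's data file.
            tuples = []
            with open(os.path.join(dir, rname + ".dat"), "r") as data:
                num_tuples = int(read_line(data))
                for _ in range(num_tuples):
                    row = []
                    for domain in domains:
                        value = read_line(data)
                        row.append(int(value) if domain == "INTEGER" else value)
                    tuples.append(row)
            database[rname] = {"attributes": attributes,
                               "domains": domains,
                               "tuples": tuples}
    return database


# Build a tiny database in a temporary directory and load it.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "catalog.dat"), "w") as f:
    f.write("1\nSELLS\n3\nBAR\nVARCHAR\nBEER\nVARCHAR\nPRICE\nINTEGER\n")
with open(os.path.join(demo_dir, "SELLS.dat"), "w") as f:
    f.write("2\nJillians\nBud\n6\nDugans\nBud\n9\n")

db = load_database(demo_dir)
print(db["SELLS"]["tuples"])  # [['Jillians', 'Bud', 6], ['Dugans', 'Bud', 9]]
```

Using `with` closes each file automatically, which covers the two "Close" steps in the pseudocode. If the actual files use a different layout, only `read_line` and the order of the reads need to change.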