CSc 8711, Spring 2023, Project 4

JSON Schema, JSONiq, and MongoDB

Due: Wednesday, 22 March 2023 - Strict Deadline for first 3 problems - Solution will be discussed on 24th March. The fourth problem may be submitted by 31st March for no late penalty.

This is an "Individual" project. No collaboration allowed.

I. JSON Schema

Consider the JSON document company.json. This document describes data about company, its employees, projects, and departments. Here are some constraints in the data:

  • ssn is a string but contains 9 digits
  • the "date" fields (dob, startDate, etc.) should be constrained to include valid dates, of the format DD-MON-YYYY (as shown in company.json)
  • relationship field should be enumerated type, with values "Son", "Daughter", or "Spouse"
  • sex field should be enumerated type, with values "M" or "F"
  • the "hours" field should have a value of no more than 40
Write a JSON Schema specification (companySchema.json) for the company JSON instance documents.

II. JSONiq

Consider the company.json document from previous problem. Write JSONiq query expressions for the following queries (place the queries in files ca.jq, cb.jq, ..., cf.jq):

  1. (: Retrieve the name and address of employees who work for the "Research" department. :)
  2. (: Find the names of employees who work on all the projects controlled by department number 5. :)
  3. (: List the names of all employees with two or more dependents. :)
  4. (: List the names of all employees who have a dependent with the same first name as themselves. :)
  5. (: Find the names of employees who are directly supervised by ‘Franklin Wong’. :)
  6. (: Retrieve the names and addresses of all employees who work on at least one project located in Houston but whose department has no location in Houston. :)

III. MongoDB Queries

Consider a MongoDB instance containing the following collections related to a Mail Order database:

  1. customers.js.
  2. employees.js.
  3. parts.js.
  4. orders.js.
Write Python (PyMongo) programs to answer the following queries:
  1. ## For each customer, find a list of Order Numbers they have placed.
  2. ## Get the names of customers who have ordered parts only from employees living in Wichita.
  3. ## Get employee numbers and total sales for each employee.
  4. ## Get the names of customers who had to wait the longest for their orders to be shipped.
  5. ## Get the names of parts that have been ordered the most in terms of quantity.
  6. ## Get the names of employees who have never made a sale to a customer in the same zipcode as themselves.

IV. Baseball Standings App using MongoDB

Consider the following data describing baseball teams and results of games:
macbook-pro:baseball raj$ more teams.dat 
Braves:Atlanta:ATL
Cardinals:Saint Louis:STL
Cubs:Chicago:CHC
Diamondbacks:Arizona:ARI
Indians:Cleveland:CLE

macbook-pro:baseball raj$ more games.dat
2004-03-20:ARI:CHC:10:11
2004-03-23:ATL:STL:0:1
2004-03-27:STL:CHC:7:9
2004-03-27:CLE:ATL:1:0
2004-03-30:ATL:CHC:10:5
2004-04-01:CLE:ARI:8:8
2004-04-15:ARI:ATL:3:11
2004-04-17:CLE:STL:7:11
2004-04-20:STL:ARI:10:12
2004-04-22:CHC:CLE:7:4
2004-04-24:CHC:ARI:7:12
2004-04-29:STL:ATL:2:10
2004-05-01:ATL:CLE:14:14
2004-05-01:CHC:STL:10:0
2004-05-04:CHC:ATL:10:8
2004-05-04:ARI:CLE:8:7
2004-05-08:ATL:ARI:6:8
2004-05-13:STL:CLE:3:6
2004-05-15:ARI:STL:7:13
2004-05-15:CLE:CHC:6:8
2004-05-18:ARI:CHC:13:5
2004-05-22:ATL:STL:3:6
Rows in teams.dat contain team name, team location, and team code, whereas rows in games.dat contain game date, visiting team code, home team code, visiting team score, and home team score.
  1. Write a Python program (loadData.py) to load this data into a MongoDB database. You should accept the data file names as command line parameter (sys.argv[1] is "teams.dat" and sys.argv[2] is "games.dat"). You will create two collections (named teams and games) within a database named baseballDB. You must ensure the following two constraints are satisfied:
    1. The team code serves as a primary key for the teams collection. So, you should make sure that no duplicate team codes are inserted into the teams collection.
    2. Since the team code is used in the the games data, you must ensure that while inserting into the games collection that both the teams involved are present in the teams collection.
    You may assume that the rest of the data is accurate, i.e. there are no data type mis-matches, no missing fields, etc. You may assume that teams.dat and games.dat are available in the current directory where loadData.py is located. Please delete all data before you load!

  2. Write a REST API program (baseball.py) that implements the following two endpoints:
    @app.route('/baseball/standings/', methods=['GET'])
    
    which returns the standings information as a JSON object shown below:
    { standings: [
        { losses: 3, percent: 0.667, tcode: "CHC", ties: 0, tname: "Cubs", wins: 6 },
        { losses: 3, percent: 0.611, tcode: "ARI", ties: 1, tname: "Diamondbacks", wins: 5 },
        { losses: 5, percent: 0.444, tcode: "STL", ties: 0, tname: "Cardinals", wins: 4 },
        { losses: 5, percent: 0.389, tcode: "ATL", ties: 1, tname: "Braves", wins: 3 },
        { losses: 4, percent: 0.375, tcode: "CLE", ties: 2, tname: "Indians", wins: 2 }
      ]
    }
    
    and
    @app.route('/baseball/results/<string:tcode>/', methods=['GET'])
    
    which returns a list of game results for a particular team in chronological order as a JSON object shown below:
    {
      results: [
        { gdate: "2004-03-20", opponent: "ARI", result: "WIN", them: 10, us: 11 },
        { gdate: "2004-03-27", opponent: "STL", result: "WIN", them: 7, us: 9 },
        { gdate: "2004-03-30", opponent: "ATL", result: "LOSS", them: 10, us: 5 },
        { gdate: "2004-04-22", opponent: "at CLE", result: "WIN", them: 4, us: 7 },
        { gdate: "2004-04-24", opponent: "at ARI", result: "LOSS", them: 12, us: 7 },
        { gdate: "2004-05-01", opponent: "at STL", result: "WIN", them: 0, us: 10 },
        { gdate: "2004-05-04", opponent: "at ATL", result: "WIN", them: 8, us: 10 },
        { gdate: "2004-05-15", opponent: "CLE", result: "WIN", them: 6, us: 8 },
        { gdate: "2004-05-18", opponent: "ARI", result: "LOSS", them: 13, us: 5 }
      ],
      tloc: "Chicago",
      tname: "Cubs"
    }
    
    You may use the front end I have developed to do a final check.

What to Submit?

  1. companySchema.json
  2. ca.jq, cb.jq, cc.jq, cd.jq, ca.je, and cf.jq
  3. ma.py, mb.py, mc.py, md.py, me.py, and mf.py
  4. loadData.py, baseball.py