CSc 8711 Databases and the Web

CSc 8711, Spring 2019, Project 2

XQuery, XML Schema, and XSLT

Due: Sunday, 24 February 2019

This is an "Individual" project. No collaboration allowed.

XQuery

(I) Drinks Data

Consider the XML document drinks.xml which records data about bars, beers, and drinkers in a local neighborhood. Write XQuery expressions to answer the following queries (place the queries in files da.xq, db.xq, ..., de.xq):

Find bars that serve a beer that Donald likes.
Find drinkers who frequent at least one bar that serves a beer they like.
Find drinkers who frequent at least all those bars that Donald frequents.
Find drinkers who frequent no bar that serves a beer they like.
Find drinkers who frequent only bars that serve at least one beer they like.
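
The queries above differ mainly in their quantifiers ("at least one", "all", "no", "only"). As a rough illustration of the intended semantics, and not an XQuery solution, the Python sketch below evaluates four of them over a tiny made-up data set. The dictionaries `frequents`, `serves`, and `likes`, the drinker `REFERENCE` (standing in for Donald), and all sample values are assumptions; they do not reflect the contents or structure of drinks.xml.

```python
# Hypothetical toy data; drinks.xml itself is not reproduced here.
frequents = {"Ann": {"B1", "B2"}, "Bob": {"B2"}, "Cat": set()}
serves = {"B1": {"Ale"}, "B2": {"Lager"}}
likes = {"Ann": {"Ale", "Lager"}, "Bob": {"Ale"}, "Cat": {"Ale"}}
REFERENCE = "Bob"  # plays the role of Donald in this sketch


def bar_is_good(drinker, bar):
    """True if the bar serves at least one beer the drinker likes."""
    return bool(serves.get(bar, set()) & likes.get(drinker, set()))


# "at least one bar that serves a beer they like": existential
at_least_one = sorted(
    d for d in frequents if any(bar_is_good(d, b) for b in frequents[d])
)

# "at least all those bars that the reference drinker frequents": superset
covers_reference = sorted(
    d for d in frequents if frequents[REFERENCE] <= frequents[d]
)

# "no bar that serves a beer they like": negated existential
no_bar = sorted(
    d for d in frequents if not any(bar_is_good(d, b) for b in frequents[d])
)

# "only bars that serve at least one beer they like": universal
only_good = sorted(
    d for d in frequents if all(bar_is_good(d, b) for b in frequents[d])
)
```

In this sketch a drinker who frequents no bars at all (Cat) satisfies both the "no bar" and the "only bars" conditions, because a universal condition over an empty set holds vacuously. The superset test also returns the reference drinker. Whether such cases belong in your answers is a choice the queries should make explicit.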

(II) Countries Data

Consider the XML document countries.xml which records data about countries, their populations, languages spoken, and cities. Write XQuery expressions to answer the following queries (place the queries in files ca.xq, cb.xq, ..., cj.xq):

Return the area of Mongolia.
Return the names of all countries that have at least three cities with population greater than 3 million.

Create a list of French-speaking and German-speaking countries. The result should take the form:

```
<result>
  <French>
    <country>country-name</country>
    <country>country-name</country>
    ...
  </French>
  <German>
    <country>country-name</country>
    <country>country-name</country>
    ...
  </German>
</result>
```

Return the countries with the highest and lowest population densities. Note that because the "/" operator has its own meaning in XPath and XQuery, the division operator is infix "div". To compute population density use "(@population div @area)". You can assume density values are unique. The result should take the form:
```
<result>
  <highest density="value">country-name</highest>
  <lowest density="value">country-name</lowest>
</result>
```
Return the names of all countries where over 50% of the population speaks German.
Return the names of all countries whose name textually contains a language spoken in that country. For instance, Uzbek is spoken in Uzbekistan, so return Uzbekistan.
Return the names of all countries in which people speak a language whose name textually contains the name of the country. For instance, Japanese is spoken in Japan, so return Japan.
Return the names of all countries for which the data does not include any languages or cities, but the country has more than 10 million people.
Return all countries that have at least one city with population greater than 7 million. For each one, return the country name along with the cities greater than 7 million, in the format:
```
<country name="country-name">
  <big>city-name</big>
  <big>city-name</big>
  ...
</country>
```
For each language spoken in one or more countries, create a "language" element with a "name" attribute and one "country" subelement for each country in which the language is spoken. The "country" subelements should have two attributes: the country "name", and "speakers" containing the number of speakers of that language (based on language percentage and the country's population). Order the result by language name, and enclose the entire list in a single "languages" element. For example, your result might look like:
```
<languages>
  ...
  <language name="Arabic">
    <country name="Iran" speakers="660942"/>
    <country name="Saudi Arabia" speakers="19409058"/>
    <country name="Yemen" speakers="13483178"/>
  </language>
  ...
</languages>
```
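
The two arithmetic steps in the queries above, population density and number of speakers, can be sketched in Python as follows. This is only an illustration of the calculations, not an XQuery answer. The sample document, the `name` attribute on `country`, and the `percentage` attribute on `language` are assumptions made for the sketch, and truncating the speaker count to an integer is also an assumption, since the rounding rule is not stated.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; element and attribute names are assumptions.
SAMPLE = """
<countries>
  <country name="X" population="2000" area="10">
    <language percentage="25">Alpha</language>
    <language percentage="50">Beta</language>
  </country>
  <country name="Y" population="1000" area="200">
    <language percentage="10">Alpha</language>
  </country>
</countries>
"""

root = ET.fromstring(SAMPLE)
countries = root.findall("country")


def density(country):
    """Mirror of (@population div @area) from the assignment text."""
    return float(country.get("population")) / float(country.get("area"))


highest = max(countries, key=density)
lowest = min(countries, key=density)

# speakers = population * percentage / 100, truncated (assumed rounding rule)
by_language = defaultdict(list)
for country in countries:
    population = int(country.get("population"))
    for lang in country.findall("language"):
        speakers = int(population * float(lang.get("percentage")) / 100)
        by_language[lang.text].append((country.get("name"), speakers))

# Build the <languages> element, ordered by language name.
languages = ET.Element("languages")
for name in sorted(by_language):
    lang_el = ET.SubElement(languages, "language", name=name)
    for country_name, speakers in by_language[name]:
        ET.SubElement(lang_el, "country", name=country_name, speakers=str(speakers))

print(ET.tostring(languages, encoding="unicode"))
```

Grouping by language first and sorting the keys afterwards keeps the ordering requirement separate from the arithmetic; an XQuery solution would typically express the same idea with distinct language names and an `order by` clause.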

XML Schema

(I) Gradebook Data

Consider the XML document gradebook.xml. This document describes grade book data as kept by instructors of courses in a university. Here are some constraints in the data:

cno values start with 3 lower-case letters followed by 3 digits.
sid values are 4-digit numbers.
minit value is a single upper-case letter.
cid values begin with the lower-case letters "f", "sp", or "su", followed by a 2-digit number, followed by a hyphen (-), followed by a 1- or 2-digit number.
term values begin with the lower-case letters "f", "sp", or "su", followed by a 2-digit number.
lineno is a 4-digit number.
a, b, c, and d values are between 0 and 100.
maxpoints can be a number between 1 and 1000.
weight is a number between 1 and 100.
score value is a number between 1 and 1000.
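
The lexical constraints above translate naturally into regular expressions, and the numeric ones into inclusive ranges. The Python sketch below is one way to try out candidate patterns before writing them as `xs:pattern`, `xs:minInclusive`, and `xs:maxInclusive` facets. The helper names `conforms` and `in_range` are hypothetical, and treating every bound as inclusive is an assumption. XML Schema patterns always match the whole value, which is why `re.fullmatch` is used here.

```python
import re

# Candidate patterns for the string-valued fields.
PATTERNS = {
    "cno": r"[a-z]{3}[0-9]{3}",
    "sid": r"[0-9]{4}",
    "minit": r"[A-Z]",
    "cid": r"(f|sp|su)[0-9]{2}-[0-9]{1,2}",
    "term": r"(f|sp|su)[0-9]{2}",
    "lineno": r"[0-9]{4}",
}

# Inclusive (low, high) bounds for the numeric fields (inclusiveness assumed).
RANGES = {
    "a": (0, 100),
    "b": (0, 100),
    "c": (0, 100),
    "d": (0, 100),
    "maxpoints": (1, 1000),
    "weight": (1, 100),
    "score": (1, 1000),
}


def conforms(field, value):
    """True if the whole string matches the field's candidate pattern."""
    return re.fullmatch(PATTERNS[field], value) is not None


def in_range(field, value):
    """True if the number lies within the field's inclusive bounds."""
    low, high = RANGES[field]
    return low <= value <= high
```

For example, `conforms("cid", "sp19-2")` is true, while `conforms("cid", "sp19-123")` and `in_range("weight", 0)` are false.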

Write an XML Schema for the gradebook XML documents. Submit the schema under the file gradebook.xsd.

(II) Binary Tree Data

Consider a sample binary-tree XML document, storing a set of decimal values, given below:

```
<node value="5.2" child="none" >
  <node value="8.0" child="left" >
    <node value="6.0" child="left" />
    <node value="18.5" child="right" >
      <node value="2.0" child="left" >
        <node value="-60.0" child="right" />
      </node>
    </node>
  </node>
  <node value="-9.8" child="right" >
    <node value="14.2" child="left" >
      <node value="80.5" child="right" />
    </node>
    <node value="24.0" child="right" />
  </node>
</node>
```
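
Before writing the schema it may help to state the structural rules the sample suggests. The Python sketch below checks one possible reading: every `value` is a decimal, the root has `child="none"`, every other node has `child="left"` or `child="right"`, and no node has two children on the same side. These rules are inferred from the sample, not given in the assignment, and the function `check_node` is a hypothetical helper, not part of any required solution.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of a decimal: optional sign, digits, optional fraction.
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_node(node, is_root=True):
    """Return a list of problems found in this subtree (empty if none)."""
    problems = []
    value = node.get("value", "")
    if DECIMAL.fullmatch(value) is None:
        problems.append(f"bad value: {value!r}")

    # Assumed rule: the root is marked "none", all other nodes "left" or "right".
    allowed = {"none"} if is_root else {"left", "right"}
    if node.get("child") not in allowed:
        problems.append(f"bad child attribute: {node.get('child')!r}")

    # Assumed rule: at most one left child and at most one right child.
    children = node.findall("node")
    sides = [c.get("child") for c in children]
    if len(set(sides)) != len(sides):
        problems.append("more than one child on the same side")

    for c in children:
        problems.extend(check_node(c, is_root=False))
    return problems


GOOD = ('<node value="5.2" child="none">'
        '<node value="8.0" child="left"/>'
        '<node value="-9.8" child="right"/>'
        '</node>')
BAD = ('<node value="5.2" child="none">'
       '<node value="x" child="left"/>'
       '<node value="1" child="left"/>'
       '</node>')

good_problems = check_node(ET.fromstring(GOOD))
bad_problems = check_node(ET.fromstring(BAD))
```

Not every rule in this sketch is equally easy to express in XML Schema, so it is worth deciding which ones the schema will enforce and which it will leave to the application.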

Write an XML Schema for such XML documents. Submit the schema under the file btree.xsd.

XSLT

(I) PhD Students Data

Consider the XML data related to PhD graduates of the CS department at GSU as shown in phd.xml. Write an XQuery expression to convert this data into a new format/view as shown in phd2.xml, in which the data is reorganized to group together all students graduating in the same term. Then write an XSL Transform that operates on phd2.xml to produce a Web page similar to phd-graduates.html.

(II) Geography Data

Consider the geography XML document as shown in geo.xml. Write XSLT programs to display the list of states and their capital cities in an HTML page in tabular format. Each state name should be hyperlinked to a detail Web page for that state displaying all information for the state. The PHP program provided on the main class Web page may be used to invoke the XSLT programs.