Csc 8711, Databases and the Web (Spring 2019)

Xquery - Transcript Example in basex

Consider the following XML document:
<transcripts>
  <transcript>
    <student sid="111111111" name="John Doe"/>
    <course code="CS308" semester="F1997" grade="B"/>
    <course code="MAT123" semester="F1997" grade="B"/>
    <course code="EE101" semester="F1997" grade="A"/>
    <course code="CS305" semester="F1995" grade="A"/>
  </transcript>
  <transcript>
    <student sid="987654321"  name="Bart Simpson" />
    <course code="CS305" semester="F1995" grade="C"/>
    <course code="CS308" semester="F1994" grade="B"/>
  </transcript>
  <transcript>
    <student sid="123454321"  name="Joe Blow" />
    <course code="CS315" semester="S1997" grade="A" />
    <course code="CS305" semester="S1996" grade="A" />
    <course code="MAT123" semester="S1996" grade="C" />
  </transcript>
  <transcript>
    <student sid="023456789" name="Homer Simpson" />
    <course code="EE101" semester="F1995" grade="B" />
    <course code="CS305" semester="S1996" grade="A" />
  </transcript>
</transcripts>
and the task of producing class rosters for each class. Here is a first attempt at writing a Xquery expression to solve the problem:
MacBook-Pro:sp19 raj$ more q1.xq 
<rosters>
{
  for $c in doc("enrollDB/transcript.xml")//course
  order by $c/@code
  return
  <roster code="{$c/@code}" semester="{$c/@semester}">
  {
    for $t in doc("enrollDB/transcript.xml")//course
    order by $t/../student/@sid
    where $t/@code = $c/@code and
          $t/@semester = $c/@semester
    return $t/../student
  }
  </roster>
}
</rosters>
Running this on basex gives the following:
> run q1.xq
<rosters>
  <roster code="CS305" semester="F1995">
    <student sid="111111111" name="John Doe"/>
    <student sid="987654321" name="Bart Simpson"/>
  </roster>
  <roster code="CS305" semester="F1995">
    <student sid="111111111" name="John Doe"/>
    <student sid="987654321" name="Bart Simpson"/>
  </roster>
  <roster code="CS305" semester="S1996">
    <student sid="023456789" name="Homer Simpson"/>
    <student sid="123454321" name="Joe Blow"/>
  </roster>
  <roster code="CS305" semester="S1996">
    <student sid="023456789" name="Homer Simpson"/>
    <student sid="123454321" name="Joe Blow"/>
  </roster>
  <roster code="CS308" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="CS308" semester="F1994">
    <student sid="987654321" name="Bart Simpson"/>
  </roster>
  <roster code="CS315" semester="S1997">
    <student sid="123454321" name="Joe Blow"/>
  </roster>
  <roster code="EE101" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="EE101" semester="F1995">
    <student sid="023456789" name="Homer Simpson"/>
  </roster>
  <roster code="MAT123" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="MAT123" semester="S1996">
    <student sid="123454321" name="Joe Blow"/>
  </roster>
</rosters>
Query "q1.xq" executed in 8.31 ms.
Notice the duplicate enties for "CS305-F1995" and "CS305-S1996". This is because the outer for-clause in the query iterates through all course sub-elements in the document and produces a roster for each. To avoid producing these duplicates, we should produce a "unique" set of courses and then iterate through this set. The following query does this:
import module namespace functx = 'http://www.functx.com';

<rosters>
{
  let $courses := 
    <courses> 
    {
      for $c in doc("enrollDB/transcript.xml")//course 
      return <course code="{$c/@code}" semester="{$c/@semester}" />
    }
    </courses>
  for $c in functx:distinct-deep($courses/course)
  order by $c/@code
  return
    <roster code="{$c/@code}" semester="{$c/@semester}">
    {
      for $t in doc("enrollDB/transcript.xml")//course
      order by $t/../student/@sid
      where $t/@code = $c/@code and
            $t/@semester = $c/@semester
      return $t/../student
    }
    </roster>
}
</rosters>
Notice the use of an external funxtion functx:distinct-deep. To install the external functions from functx package, run the following command inside basex (this needs to be done only once):
> REPO INSTALL http://files.basex.org/modules/expath/functx-1.0.xar
Package 'http://files.basex.org/modules/expath/functx-1.0.xar' installed in 783.77 ms.
Running this modified query in basex gives us the correct results:
> run q2.xq
<rosters>
  <roster code="CS305" semester="F1995">
    <student sid="111111111" name="John Doe"/>
    <student sid="987654321" name="Bart Simpson"/>
  </roster>
  <roster code="CS305" semester="S1996">
    <student sid="023456789" name="Homer Simpson"/>
    <student sid="123454321" name="Joe Blow"/>
  </roster>
  <roster code="CS308" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="CS308" semester="F1994">
    <student sid="987654321" name="Bart Simpson"/>
  </roster>
  <roster code="CS315" semester="S1997">
    <student sid="123454321" name="Joe Blow"/>
  </roster>
  <roster code="EE101" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="EE101" semester="F1995">
    <student sid="023456789" name="Homer Simpson"/>
  </roster>
  <roster code="MAT123" semester="F1997">
    <student sid="111111111" name="John Doe"/>
  </roster>
  <roster code="MAT123" semester="S1996">
    <student sid="123454321" name="Joe Blow"/>
  </roster>
</rosters>
Query "q2.xq" executed in 88.09 ms.
Another way to solve the problem is to have a separate XML file that records all the classes being taught (no duplicates here!):
<classes>
  <class code="CS308" semester="F1997">
    <title>SE</title>
    <instructor>Adrian Jones</instructor>
  </class>
  <class code="MAT123" semester="F1997">
    <title>Algebra</title>
    <instructor>Ann White</instructor>
  </class>
  <class code="EE101" semester="F1997">
    <title>Circuits</title>
    <instructor>David Jones</instructor>
  </class>
  <class code="CS305" semester="F1995">
    <title>Databases</title>
    <instructor>Mary Doe</instructor>
  </class>
  <class code="EE101" semester="F1995">
    <title>Circuits</title>
    <instructor>David Jones</instructor>
  </class>
  <class code="CS308" semester="F1994">
    <title>SE</title>
    <instructor>Adrian Jones</instructor>
  </class>
  <class code="CS315" semester="S1997">
    <title>TP</title>
    <instructor>John Smyth</instructor>
  </class>
  <class code="CS305" semester="S1996">
    <title>Databases</title>
    <instructor>John Doe</instructor>
  </class>
  <class code="MAT123" semester="S1996">
    <title>Algebra</title>
    <instructor>Ann White</instructor>
  </class>
  <class code="CS8711" semester="S2018">
    <title>Databases and the Web</title>
    <instructor>Raj Sunderraman</instructor>
  </class>
</classes>
The following query produces the correct results while querying the two XML files:
<rosters>
{
  for $c in doc("enrollDB/class.xml")/classes/class
  order by $c/@code
  return
  <roster code="{$c/@code}" semester="{$c/@semester}">
  {
    for $t in doc("enrollDB/transcript.xml")/transcripts/transcript/course
    order by $t/../student/@sid
    where $t/@code = $c/@code and
          $t/@semester = $c/@semester
    return $t/../student
  }
  </roster>
}
</rosters>