Csc 8711, Databases and the Web (Spring 2019)
Xquery - Transcript Example in basex
Consider the following XML document:
<transcripts>
<transcript>
<student sid="111111111" name="John Doe"/>
<course code="CS308" semester="F1997" grade="B"/>
<course code="MAT123" semester="F1997" grade="B"/>
<course code="EE101" semester="F1997" grade="A"/>
<course code="CS305" semester="F1995" grade="A"/>
</transcript>
<transcript>
<student sid="987654321" name="Bart Simpson" />
<course code="CS305" semester="F1995" grade="C"/>
<course code="CS308" semester="F1994" grade="B"/>
</transcript>
<transcript>
<student sid="123454321" name="Joe Blow" />
<course code="CS315" semester="S1997" grade="A" />
<course code="CS305" semester="S1996" grade="A" />
<course code="MAT123" semester="S1996" grade="C" />
</transcript>
<transcript>
<student sid="023456789" name="Homer Simpson" />
<course code="EE101" semester="F1995" grade="B" />
<course code="CS305" semester="S1996" grade="A" />
</transcript>
</transcripts>
and the task of producing class rosters for each class. Here is
a first attempt at writing a Xquery expression to solve the problem:
MacBook-Pro:sp19 raj$ more q1.xq
<rosters>
{
for $c in doc("enrollDB/transcript.xml")//course
order by $c/@code
return
<roster code="{$c/@code}" semester="{$c/@semester}">
{
for $t in doc("enrollDB/transcript.xml")//course
order by $t/../student/@sid
where $t/@code = $c/@code and
$t/@semester = $c/@semester
return $t/../student
}
</roster>
}
</rosters>
Running this on basex gives the following:
> run q1.xq
<rosters>
<roster code="CS305" semester="F1995">
<student sid="111111111" name="John Doe"/>
<student sid="987654321" name="Bart Simpson"/>
</roster>
<roster code="CS305" semester="F1995">
<student sid="111111111" name="John Doe"/>
<student sid="987654321" name="Bart Simpson"/>
</roster>
<roster code="CS305" semester="S1996">
<student sid="023456789" name="Homer Simpson"/>
<student sid="123454321" name="Joe Blow"/>
</roster>
<roster code="CS305" semester="S1996">
<student sid="023456789" name="Homer Simpson"/>
<student sid="123454321" name="Joe Blow"/>
</roster>
<roster code="CS308" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="CS308" semester="F1994">
<student sid="987654321" name="Bart Simpson"/>
</roster>
<roster code="CS315" semester="S1997">
<student sid="123454321" name="Joe Blow"/>
</roster>
<roster code="EE101" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="EE101" semester="F1995">
<student sid="023456789" name="Homer Simpson"/>
</roster>
<roster code="MAT123" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="MAT123" semester="S1996">
<student sid="123454321" name="Joe Blow"/>
</roster>
</rosters>
Query "q1.xq" executed in 8.31 ms.
Notice the duplicate enties for "CS305-F1995" and "CS305-S1996". This is because
the outer for-clause in the query iterates through all course sub-elements
in the document and produces a roster for each. To avoid producing these
duplicates, we should produce a "unique" set of courses and then iterate
through this set. The following query does this:
import module namespace functx = 'http://www.functx.com';
<rosters>
{
let $courses :=
<courses>
{
for $c in doc("enrollDB/transcript.xml")//course
return <course code="{$c/@code}" semester="{$c/@semester}" />
}
</courses>
for $c in functx:distinct-deep($courses/course)
order by $c/@code
return
<roster code="{$c/@code}" semester="{$c/@semester}">
{
for $t in doc("enrollDB/transcript.xml")//course
order by $t/../student/@sid
where $t/@code = $c/@code and
$t/@semester = $c/@semester
return $t/../student
}
</roster>
}
</rosters>
Notice the use of an external funxtion
functx:distinct-deep.
To install the external functions from
functx package, run the following
command inside basex (this needs to be done only once):
> REPO INSTALL http://files.basex.org/modules/expath/functx-1.0.xar
Package 'http://files.basex.org/modules/expath/functx-1.0.xar' installed in 783.77 ms.
Running this modified query in basex gives us the correct results:
> run q2.xq
<rosters>
<roster code="CS305" semester="F1995">
<student sid="111111111" name="John Doe"/>
<student sid="987654321" name="Bart Simpson"/>
</roster>
<roster code="CS305" semester="S1996">
<student sid="023456789" name="Homer Simpson"/>
<student sid="123454321" name="Joe Blow"/>
</roster>
<roster code="CS308" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="CS308" semester="F1994">
<student sid="987654321" name="Bart Simpson"/>
</roster>
<roster code="CS315" semester="S1997">
<student sid="123454321" name="Joe Blow"/>
</roster>
<roster code="EE101" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="EE101" semester="F1995">
<student sid="023456789" name="Homer Simpson"/>
</roster>
<roster code="MAT123" semester="F1997">
<student sid="111111111" name="John Doe"/>
</roster>
<roster code="MAT123" semester="S1996">
<student sid="123454321" name="Joe Blow"/>
</roster>
</rosters>
Query "q2.xq" executed in 88.09 ms.
Another way to solve the problem is to have a separate XML file that records
all the classes being taught (no duplicates here!):
<classes>
<class code="CS308" semester="F1997">
<title>SE</title>
<instructor>Adrian Jones</instructor>
</class>
<class code="MAT123" semester="F1997">
<title>Algebra</title>
<instructor>Ann White</instructor>
</class>
<class code="EE101" semester="F1997">
<title>Circuits</title>
<instructor>David Jones</instructor>
</class>
<class code="CS305" semester="F1995">
<title>Databases</title>
<instructor>Mary Doe</instructor>
</class>
<class code="EE101" semester="F1995">
<title>Circuits</title>
<instructor>David Jones</instructor>
</class>
<class code="CS308" semester="F1994">
<title>SE</title>
<instructor>Adrian Jones</instructor>
</class>
<class code="CS315" semester="S1997">
<title>TP</title>
<instructor>John Smyth</instructor>
</class>
<class code="CS305" semester="S1996">
<title>Databases</title>
<instructor>John Doe</instructor>
</class>
<class code="MAT123" semester="S1996">
<title>Algebra</title>
<instructor>Ann White</instructor>
</class>
<class code="CS8711" semester="S2018">
<title>Databases and the Web</title>
<instructor>Raj Sunderraman</instructor>
</class>
</classes>
The following query produces the correct results while querying the two XML
files:
<rosters>
{
for $c in doc("enrollDB/class.xml")/classes/class
order by $c/@code
return
<roster code="{$c/@code}" semester="{$c/@semester}">
{
for $t in doc("enrollDB/transcript.xml")/transcripts/transcript/course
order by $t/../student/@sid
where $t/@code = $c/@code and
$t/@semester = $c/@semester
return $t/../student
}
</roster>
}
</rosters>