XQuery – XML Query Language

Consider transcript.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<transcripts>
  <transcript>
    <student sid="111111111" name="John Doe"/>
    <course code="CS308" semester="F1997" grade="B"/>
    <course code="MAT123" semester="F1997" grade="B"/>
    <course code="EE101" semester="F1997" grade="A"/>
    <course code="CS305" semester="F1995" grade="A"/>
  </transcript>
  <transcript>
    <student sid="987654321"  name="Bart Simpson" />
    <course code="CS305" semester="F1995" grade="C"/>
    <course code="CS308" semester="F1994" grade="B"/>
  </transcript>
  <transcript>
    <student sid="123454321"  name="Joe Blow" />
    <course code="CS315" semester="S1997" grade="A" />
    <course code="CS305" semester="S1996" grade="A" />
    <course code="MAT123" semester="S1996" grade="C" />
  </transcript>
  <transcript>
    <student sid="023456789" name="Homer Simpson" />
    <course code="EE101" semester="F1995" grade="B" />
    <course code="CS305" semester="S1996" grade="A" />
  </transcript>
</transcripts>

In basex run the following commands to create a database called enrollDB:

> create db enrollDB
> add transcript.xml
> exit

XQuery Basics

General structure (FLWOR expressions):

FOR  variable declarations
LET  variable := expression, 
     variable := expression, …
WHERE condition
ORDER BY fields 
RETURN document

Example 1: Find students who took MAT123

(: students who took  MAT123 :)
for $t in db:open("enrollDB")//transcript
where   $t/course/@code = "MAT123"
return  $t/student

Result:

<student sid="111111111" name="John Doe" />
<student sid="123454321" name="Joe Blow" />

Previous XQuery expression doesn’t produce a well-formed XML document; the following does:

<studentlist>
{
  for $t in db:open("enrollDB")//transcript
  where $t/course/@code = "MAT123"
  return $t/student
}
</studentlist>

for binds $t to transcript elements one by one, filters using where, then places student-children as e-children of studentlist using return

FOR vs LET

FOR = iteration

for $t in db:open("enrollDB")//transcript
return <result> { $t } </result>

will return the following result:

<result> <transcript> ... </transcript> </result>
<result> <transcript> ... </transcript> </result>
<result> <transcript> ... </transcript> </result>
...

LET = assignment of set value to variable

let $t := db:open("enrollDB")//transcript
return <result> { $t } </result>

will return the following result:

<result> 
  <transcript> ... </transcript>
  <transcript> ... </transcript>
  <transcript> ... </transcript> 
  ...
</result>

Document Restructuring with XQuery

Can we write a XQUERY query to restructure the XML document so that the rosters of each class is available immediately? YES!

ATTEMPT 1

<rosters>
{
  for $c in db:open("enrollDB")//course
  order by $c/@code
  return
  <roster code="{$c/@code}" semester="{$c/@semester}">
  {
    for $t in db:open("enrollDB")//course
    order by $t/../student/@sid
    where $t/@code = $c/@code and
          $t/@semester = $c/@semester
    return $t/../student
  }
  </roster>
}
</rosters>

Output elements have the form:

<roster code="CS305" semester="F1995">
  <student sid="111111111" name="John Doe"/>
  <student sid="987654321" name="Bart Simpson"/>
</roster>

But the above element will be output twice, once for each of the following two bindings of $c:

  <course code="CS305” semester="F1995” grade="C”/>
  <course code="CS305” semester="F1995” grade="A”/>

Note: grades are different; so distinct-values() won’t eliminate transcript records that refer to same class!

ATTEMPT 2

We will extract all courses in a variable; then set up iteration on distinct elements in this set.

import module namespace functx = 'http://www.functx.com';

<rosters>
{
  let $courses := 
    <courses> 
    {
      for $c in db:open("enrollDB")//course
      return <course code="{$c/@code}" semester="{$c/@semester}" />
    }
    </courses>
  for $c in functx:distinct-deep($courses/course)
  order by $c/@code
  return
    <roster code="{$c/@code}" semester="{$c/@semester}">
    {
      for $t in db:open("enrollDB")//course
      order by $t/../student/@sid
      where $t/@code = $c/@code and
            $t/@semester = $c/@semester
      return $t/../student
    }
    </roster>
}
</rosters>

ATTEMPT 3

Consider the additional XML data (class.xml):

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<classes>
  <class code="CS308" semester="F1997">
    <title>SE</title>
    <instructor>Adrian Jones</instructor>
  </class>
  <class code="MAT123" semester="F1997">
    <title>Algebra</title>
    <instructor>Ann White</instructor>
  </class>
  <class code="EE101" semester="F1997">
    <title>Circuits</title>
    <instructor>David Jones</instructor>
  </class>
  <class code="CS305" semester="F1995">
    <title>Databases</title>
    <instructor>Mary Doe</instructor>
  </class>
  <class code="EE101" semester="F1995">
    <title>Circuits</title>
    <instructor>David Jones</instructor>
  </class>
  <class code="CS308" semester="F1994">
    <title>SE</title>
    <instructor>Adrian Jones</instructor>
  </class>
  <class code="CS315" semester="S1997">
    <title>TP</title>
    <instructor>John Smyth</instructor>
  </class>
  <class code="CS305" semester="S1996">
    <title>Databases</title>
    <instructor>John Doe</instructor>
  </class>
  <class code="MAT123" semester="S1996">
    <title>Algebra</title>
    <instructor>Ann White</instructor>
  </class>
  <class code="CS8711" semester="S2018">
    <title>Databases and the Web</title>
    <instructor>Raj Sunderraman</instructor>
  </class>
</classes>

In basex run the following commands to add a XML resource into enrollDB:

> open db enrollDB
> add class.xml
> exit

The following query would produce the rosters for each class (by oterating over each class in class.xml):

<rosters>
{
  for $c in db:open("enrollDB")/classes/class
  order by $c/@code
  return
    <roster code="{$c/@code}" semester="{$c/@semester}">
    {
      for $t in db:open("enrollDB")/transcripts/transcript/course
      order by $t/../student/@sid
      where $t/@code = $c/@code and $t/@semester = $c/@semester
      return $t/../student
    }
    </roster>
}
</rosters>

Notice that the answer contains

<roster code="CS8711" semester="S2018"/>

which has no students! To avoid printing such rosters, we may have to modify the query (by adding a where clause to outer-for) as follows:

<rosters>
{
  for $c in db:open("enrollDB")/classes/class
  where db:open("enrollDB")/
        transcripts/transcript/
        course[@code = $c/@code and @semester = $c/@semester]
  order by $c/@code
  return
    <roster code="{$c/@code}" semester="{$c/@semester}">
    {
      for $t in db:open("enrollDB")/transcripts/transcript/course
      order by $t/../student/@sid
      where $t/@code = $c/@code and $t/@semester = $c/@semester
      return $t/../student
    }
    </roster>
}
</rosters>

XQuery Semantics

Step 1: Produce a list of bindings for variables in for clause

Example (bindings):

Step 2: Filter the bindings via the where clause

Example:

Consider where $A/course/@code = $B/class/@code and the bindings:

$A/w, where w = <course code="CS308" …/>

$B/x, where x = <class code="CS308" … />

Then, w/course/@code = x/class/@code, so the where condition is satisfied and the binding is retained

Step 3: Construct result

Grouping and Aggregation

Aggregation Example

Produce a list of students along with the number of courses each student took:

for $t in db:open("enrollDB")//transcript,
    $s in $t/student
let $c := $t/course,
    $result :=  
      <StudentSummary  
        sid = "{$s/@sid}" 
        name = "{$s/@name}"
        numCourses = "{fn:count($c)}"
      />
order by $result/@numCourses descending, $result/@name
return $result

The grouping effect is achieved because $c is bound to a new set of nodes for each binding of $t

Quantification in XQuery

XQuery supports explicit quantification:

Example: Find students who have taken MAT123.

for $t in  db:open("enrollDB")//transcript
where 
  some $ct in $t/course satisfies $ct/@code = "MAT123"
return $t/student

Example: Retrieve class codes for classes which are taken by EVERY student (need not be taken in the same semester).

distinct-values(
  for $c in db:open("enrollDB")//class
  let $g :=  
        (: transcripts for $c :)
        for $t in db:open("enrollDB")//transcript
        where $t/course/@code = $c/@code
        return $t
  let $h :=
        (: all transcripts :)
        for $t in db:open("enrollDB")//transcript
        return $t
  where 
    every $t1 in $h 
    satisfies 
      some $t2 in $g 
      satisfies $t1/student/@sid = $t2/student/@sid
  return $c/@code
)

XQuery: Summary

FOR-LET-WHERE-ORDER BY-RETURN = FLWOR

drawing