XML Query Languages

XPath – the core query language. Very limited (essentially a glorified selection operator), but very useful: it is used in XML Schema, XSLT, XQuery, and many other XML standards

XQuery – W3C standard. Very powerful, fairly intuitive, SQL-style

XSLT – a functional-style document transformation language. Very powerful

Why Query XML?

XPath

XPath Document Tree

[Figure: XPath document tree (drawing not reproduced)]

Sample Document Corresponding to the Tree

<?xml  version="1.0" ?>
<!-- Some  comment -->
<students>
  <student sid="111111111" >
    <name>
      <first>John</first>
      <last>Doe</last>
    </name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321" >
    <name>
      <first>Bart</first>
      <last>Simpson</last>
    </name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3" />
  </student>
  <student sid="444444444" >
    <name>
      <last>Simpson</last>
    </name>
    <status>U4</status>
  </student>
</students>
<!-- Some other comment -->

Terminology

XPath Basics

Current (or context) node – the node relative to which an expression is evaluated; it exists during the evaluation of XPath expressions (and of expressions in other XML query languages)

Attributes, Text, etc.

Overall Idea and Semantics

An XPath expression is:

/locationStep1/locationStep2/…

or

locationStep1/locationStep2/…

Location step:

axis::nodeSelector[predicate]

XPath Semantics

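The meaning of the expression locationStep1/locationStep2/… is the set of all document nodes obtained as follows:

Find all nodes reachable by locationStep1, starting from the document root if the expression begins with /, and from the context node otherwise

For each node N in that result, find all nodes reachable from N by locationStep2, and take the union of all these nodes

Continue in the same way for each remaining location step; the value of the expression is the set of nodes found after the last step

A location step axis::nodeSelector[predicate] means: find all nodes along the given axis that match the node selector and satisfy the predicate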
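The step-by-step semantics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles the child axis with a name test (no other axes, no predicates), the helper names `apply_step` and `evaluate` are hypothetical, and a stand-in element tagged `#document` plays the role of the document root because ElementTree has no document node.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""

# Assumption: a wrapper element stands in for the XPath document root "/".
document = ET.Element("#document")
document.append(ET.fromstring(DOC))


def apply_step(context_nodes, name):
    """Apply child::name to every context node and union the results."""
    result = []
    for node in context_nodes:
        for child in node:  # child axis
            if (name == "*" or child.tag == name) and child not in result:
                result.append(child)
    return result


def evaluate(path):
    """Evaluate an absolute path of child steps, one location step at a time."""
    nodes = [document]
    for name in path.strip("/").split("/"):
        nodes = apply_step(nodes, name)
    return nodes


courses = evaluate("/students/student/course")
print([c.get("code") for c in courses])
```

Each pass through the loop in `evaluate` corresponds to one location step: the output of one step becomes the set of context nodes for the next.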

Some Examples

2nd course child of 1st student child of students:

/students/student[1]/course[2]

The last course element within each student element:

/students/student/course[last()]
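These two positional queries can be tried with Python's standard xml.etree.ElementTree module, which supports a subset of XPath including [n] and [last()]. In this sketch the paths are written relative to the root element students, since ElementTree's findall starts from an element rather than from the document root.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)  # root is the <students> element

# /students/student[1]/course[2]
second_course = root.findall("student[1]/course[2]")

# /students/student/course[last()]
last_courses = root.findall("student/course[last()]")

print([c.get("code") for c in second_course])
print([(c.get("code"), c.get("semester")) for c in last_courses])
```

The third student has no course children, so it contributes nothing to the second result.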

Wildcards

Wildcards are useful when the exact structure of the document is not known
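As a small illustration, the * wildcard matches any element name. The sketch below uses Python's xml.etree.ElementTree; the query */name/* is an example chosen for this sketch (it corresponds to /students/*/name/* in XPath) and selects every part of a name without knowing the element names in advance.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)

# /students/*/name/* : any child of <students>, then <name>, then any child.
name_parts = root.findall("*/name/*")
tags = [e.tag for e in name_parts]
print(tags)
```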

XPath Queries (selection predicates)

https://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-31/html/Overview.html

Some more examples

(1) Students who have taken CS308:

//student[course/@code="CS308"]

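True if "CS308" ∈ course/@code, i.e. at least one course child of the student has code "CS308" (a comparison against a set of nodes is existential)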
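The existential reading of the predicate can be spelled out in Python. ElementTree's XPath subset does not accept a path such as course/@code inside a predicate, so this sketch writes the membership test by hand with any(); the variable name took_cs308 is illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)

# //student[course/@code="CS308"]: keep a student if SOME course child
# carries the code "CS308".
took_cs308 = [
    s.get("sid")
    for s in root.iter("student")
    if any(c.get("code") == "CS308" for c in s.findall("course"))
]
print(took_cs308)
```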

(2) A more complex example:

//student[status="U2" and 
          starts-with(.//last, "D") and 
          contains(string-join(.//@code),"MAT") and 
          not (.//last = .//first) ]
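For comparison, here is a hand-written Python rendering of the same predicate, one clause per line. It is a sketch, not a general XPath evaluator: the function name `matches` is hypothetical, starts-with is applied to the first last element found, and the equality test between .//last and .//first is treated existentially (true if any pair of values is equal).

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)


def matches(student):
    lasts = [e.text for e in student.iter("last")]      # .//last
    firsts = [e.text for e in student.iter("first")]    # .//first
    codes = "".join(e.get("code", "") for e in student.iter())  # string-join(.//@code)
    return (
        any(e.text == "U2" for e in student.findall("status"))   # status="U2"
        and bool(lasts) and lasts[0].startswith("D")             # starts-with(.//last, "D")
        and "MAT" in codes                                       # contains(..., "MAT")
        and not any(l == f for l in lasts for f in firsts)       # not(.//last = .//first)
    )


selected = [s.get("sid") for s in root.iter("student") if matches(s)]
print(selected)
```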

(3) Testing whether a subnode exists:

Students who have a grade (for some course):

//student[course/@grade]

Students who have a first name, or have taken a course in some semester, or have status U4:

//student[name/first or course/@semester or status/text() = "U4"]
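Both existence tests can be mimicked in Python by checking whether a subelement or attribute is present. This is a sketch using xml.etree.ElementTree; the second query follows the prose description, with status U4, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)

# //student[course/@grade]: some course child has a grade attribute.
has_grade = [
    s.get("sid")
    for s in root.iter("student")
    if any("grade" in c.attrib for c in s.findall("course"))
]

# //student[name/first or course/@semester or status/text() = "U4"]
any_condition = [
    s.get("sid")
    for s in root.iter("student")
    if s.find("name/first") is not None
    or any("semester" in c.attrib for c in s.findall("course"))
    or any(e.text == "U4" for e in s.findall("status"))
]
print(has_grade, any_condition)
```

The third student has neither a first name nor a course, and is selected by the second query only because of the status test.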

(4) Aggregation: sum(), count()

//student[course/@grade and sum(.//@grade) div count(.//@grade) > 3.2]
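The aggregate predicate selects students whose average grade exceeds 3.2. A Python sketch of the same computation follows; the helper name `average_above` is hypothetical, and the check that the grade list is non-empty plays the role of the course/@grade guard.

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)


def average_above(student, threshold):
    # .//@grade: every grade attribute at or below this student.
    grades = [float(e.get("grade")) for e in student.iter() if "grade" in e.attrib]
    # sum(...) div count(...) > threshold, skipping students with no grades.
    return bool(grades) and sum(grades) / len(grades) > threshold


good_average = [s.get("sid") for s in root.iter("student") if average_above(s, 3.2)]
print(good_average)
```

On the sample document the averages are 3.5 for the first student and 3.0 for the second, so only the first is selected.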

(5) Union operator |

//course[@semester="F1994"]  |  //course[@semester="F1997"]

The union operator lets us define heterogeneous collections of nodes
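ElementTree's XPath subset has no | operator, so the sketch below runs the two queries separately and merges the results. It assumes, as XPath does for node sets, that the union contains each node once and is presented in document order; the merge is done by walking the tree with iter().

```python
import xml.etree.ElementTree as ET

DOC = """<students>
  <student sid="111111111">
    <name><first>John</first><last>Doe</last></name>
    <status>U2</status>
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <name><first>Bart</first><last>Simpson</last></name>
    <status>U4</status>
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444">
    <name><last>Simpson</last></name>
    <status>U4</status>
  </student>
</students>"""
root = ET.fromstring(DOC)

f1994 = root.findall(".//course[@semester='F1994']")
f1997 = root.findall(".//course[@semester='F1997']")

# Union: each node once, listed in document order.
wanted = {id(e) for e in f1994 + f1997}
union = [e for e in root.iter() if id(e) in wanted]
print([(c.get("code"), c.get("semester")) for c in union])
```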