XML Query Languages

XPath – the core query language. Limited in scope (essentially a selection operator), but very useful: it is used in XML Schema, XSLT, XQuery, and many other XML standards

XQuery – W3C standard. Very powerful, fairly intuitive, SQL-style

XSLT – a functional-style document transformation language. Very powerful

Why Query XML?


XPath Document Tree


Sample Document Corresponding to the Tree

<?xml version="1.0" ?>
<!-- Some comment -->
<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>
<!-- Some other comment -->


XPath Basics

Current (or context) node – exists during the evaluation of XPath expressions (and in other XML query languages)

Attributes, Text, etc.

Overall Idea and Semantics

An XPath expression is:

locationStep1/locationStep2/…

Location step:

axis::node[predicate]


XPath Semantics

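The meaning of the expression locationStep1/locationStep2/… is the set of all document nodes obtained as follows:

Find all nodes selected by locationStep1, starting from the context node
For each such node N, find all nodes selected by locationStep2, using N as the context node
Take the union of these results and continue in the same way for each remaining step

locationStep = axis::node[predicate] means: find all nodes selected by axis::node, then keep only those that satisfy predicate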
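The step-by-step reading of a path expression can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: evaluate is a hypothetical helper name, ElementTree implements just a small XPath subset, and the embedded document assumes the sample is wrapped in a students root element with every student element closed.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""


def evaluate(context_nodes, steps):
    """Hypothetical helper: apply each location step to every context node
    and collect the union of the results, in order and without duplicates."""
    for step in steps:
        next_nodes = []
        for node in context_nodes:
            for hit in node.findall(step):
                if not any(hit is seen for seen in next_nodes):
                    next_nodes.append(hit)
        # The result of one step is the set of context nodes for the next.
        context_nodes = next_nodes
    return context_nodes


root = ET.fromstring(DOC)  # root is the students element
courses = evaluate([root], ["student", "course"])
codes = [c.get("code") for c in courses]
print(codes)
```

Each pass of the outer loop corresponds to one location step. Here the steps are plain child-element names, which ElementTree treats like child::student and child::course.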

Some Examples

2nd course child of 1st student child of students:

/students/student[1]/course[2]

All last course elements within each student element:

/students/student/course[last()]



Wildcards are useful when the exact structure of the document is not known
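Positional predicates and the * wildcard can be tried out with Python's xml.etree.ElementTree, which supports [n], [last()] and * in its limited XPath subset. The sketch assumes the sample document has a students root element and closed student elements. Paths are written relative to that root, because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""

root = ET.fromstring(DOC)  # the students element

# 2nd course child of the 1st student child of students
second = root.find("student[1]/course[2]")

# Last course element within each student element
last_courses = root.findall("student/course[last()]")

# Same query when the name of the intermediate element is not known
wild = root.findall("*/course[last()]")

print(second.get("code"))
print([(c.get("code"), c.get("semester")) for c in last_courses])
```

The student with no course children contributes nothing to the last two results.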

XPath Queries (selection predicates)


Some more examples

(1) Students who have taken CS308:


//student[course/@code="CS308"]

True if: "CS308" ∈ //student/course/@code
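The membership reading of this predicate (a student qualifies when "CS308" occurs among the code attributes of its course children) can be sketched in Python. has_taken is a hypothetical helper, and the document below is an assumed well-formed version of the sample.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""


def has_taken(student, code):
    """Hypothetical helper: True if code is among the student's course/@code values."""
    return code in {c.get("code") for c in student.findall("course")}


root = ET.fromstring(DOC)
cs308_students = [s.get("sid") for s in root.iter("student") if has_taken(s, "CS308")]
mat123_students = [s.get("sid") for s in root.iter("student") if has_taken(s, "MAT123")]
print(cs308_students)
print(mat123_students)
```

The comparison is existential: one matching course is enough, however many courses the student has.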

(2) A more complex example:

//student[status="U2" and 
          starts-with(.//last, "D") and 
          contains(string-join(.//@code),"MAT") and 
          not (.//last = .//first) ]

(3) Testing whether a subnode exists:

students who have a grade (for some course)

//student[course/@grade]


students who have either a first name or have taken a course in some semester or have status U4

//student[name/first or course/@semester or status/text() = "U4"]
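Existence tests map naturally onto "did the lookup find anything?". The Python sketch below mirrors the first two disjuncts only, on an assumed well-formed version of the sample document. That document has no name or status elements, so the name/first test simply finds nothing.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""

root = ET.fromstring(DOC)
students = list(root.iter("student"))

# Compare with None explicitly: an Element without children is falsy in Python.
with_grade = [s.get("sid") for s in students
              if s.find("course[@grade]") is not None]
with_first = [s.get("sid") for s in students
              if s.find("name/first") is not None]
either = [s.get("sid") for s in students
          if s.find("name/first") is not None
          or s.find("course[@semester]") is not None]

print(with_grade)
print(with_first)
print(either)
```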

(4) Aggregation: sum( ), count( )

//student[course/@grade and sum(.//@grade) div count(.//@grade) > 3.2]
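ElementTree has no sum( ) or count( ), so the aggregate predicate is rebuilt here in plain Python as a rough equivalent. average_grade is a hypothetical helper, and the document is an assumed well-formed version of the sample.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""


def average_grade(student):
    """Hypothetical helper mirroring sum(.//@grade) div count(.//@grade).
    Returns None when the student has no grade attribute at all."""
    grades = [float(e.get("grade")) for e in student.iter() if "grade" in e.attrib]
    return sum(grades) / len(grades) if grades else None


root = ET.fromstring(DOC)
averages = {s.get("sid"): average_grade(s) for s in root.iter("student")}
selected = [sid for sid, avg in averages.items() if avg is not None and avg > 3.2]
print(averages)
print(selected)
```

The None check plays the role of the course/@grade guard in the XPath predicate: students without any grade are excluded before the average is compared.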

(5) Union operator |

//course[@semester="F1994"]  |  //course[@semester="F1997"]

The union operator lets us define heterogeneous collections of nodes
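ElementTree does not implement the | operator, so the sketch below emulates it in Python. union is a hypothetical helper that merges node lists, drops duplicates, and returns the nodes in document order. The document is an assumed well-formed version of the sample.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample document (root element: students).
DOC = """<students>
  <student sid="111111111">
    <course code="CS308" semester="F1997" grade="4"/>
    <course code="MAT123" semester="F1997" grade="3"/>
  </student>
  <student sid="987654321">
    <course code="CS308" semester="F1994" grade="3"/>
  </student>
  <student sid="444444444"/>
</students>"""


def union(root, *node_lists):
    """Hypothetical helper: union of node lists, without duplicates,
    returned in document order."""
    wanted = {id(node) for nodes in node_lists for node in nodes}
    return [node for node in root.iter() if id(node) in wanted]


root = ET.fromstring(DOC)
f1994 = root.findall(".//course[@semester='F1994']")
f1997 = root.findall(".//course[@semester='F1997']")

both = union(root, f1994, f1997)
print([(c.get("code"), c.get("semester")) for c in both])

# A heterogeneous collection: one student element plus course elements.
mixed = union(root, root.findall("student[1]"), f1994)
print([node.tag for node in mixed])
```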