XPath – core query language. Very limited, a glorified selection operator. Very useful, though: used in XML Schema, XSLT, XQuery, many other XML standards
XQuery – W3C standard. Very powerful, fairly intuitive, SQL-style
XSLT – a functional style document transformation language. Very powerful
Why Query XML?
Sample Document Corresponding to the Tree
<?xml version="1.0" ?>
<!-- Some comment -->
<students>
<student sid="111111111" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<status>U2</status>
<course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/>
</student>
<student sid="987654321" >
<name>
<first>Bart</first>
<last>Simpson</last>
</name>
<status>U4</status>
<course code="CS308" semester="F1994" grade="3" />
</student>
<student sid="444444444" >
<name>
<last>Simpson</last>
</name>
<status>U4</status>
</student>
</students>
<!-- Some other comment -->
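The path expressions in this section can be tried out on the sample document with Python's standard xml.etree.ElementTree module. This is a minimal sketch under stated assumptions: ElementTree supports only a limited XPath subset. It has no document-root node, no attribute or text() steps, and no absolute paths from an element, so those parts are emulated by hand. The variable names are illustrative and not part of any standard.

```python
import xml.etree.ElementTree as ET

# The sample document, compacted; comments are left out because
# ElementTree's default parser discards them.
SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

# fromstring() returns the <students> element. ElementTree has no separate
# document-root node, so the paths below are written relative to <students>.
students = ET.fromstring(SAMPLE)

# /students/student
student_elems = students.findall("student")

# /student: the root's only element child is <students>, so this is empty
top_level_students = [students] if students.tag == "student" else []

# /students/student/@sid (attribute values are read off the selected elements)
sids = [s.get("sid") for s in student_elems]

# /students/student/name/last/text()
last_names = [e.text for e in students.findall("student/name/last")]

# .//last/../first: from each <last>, go up to <name>, then down to <first>
first_names = [e.text for e in students.findall(".//last/../first")]

print(sids, top_level_students, last_names, first_names)
```

The third student has no first element, so `..` followed by `first` yields only two names.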
Expressions that begin with / are absolute path expressions:
/
returns the root node of the XPath tree
/students/student
returns all student elements that are children of students elements, which in turn must be children of the root
/student
returns the empty set (the root has no such children)
Current (or context) node – exists during the evaluation of XPath expressions (and in other XML query languages)
. denotes the current node; .. denotes the parent node
foo/bar
returns all bar nodes that are children of foo nodes, which in turn are children of the current node
./foo/bar
same as above
../abc/cde
all cde e-children (element children) of abc e-children of the parent of the current node
Expressions that do not begin with / are relative (to the current node)
Attributes, Text, etc.
/students/student/@sid
returns all sid a-children (attribute children) of student, which are e-children of students, which are children of the root
/students/student/name/last/text()
returns all t-children (text children) of last e-children of …
/comment()
returns comment nodes under the root
An XPath expression is:
/locationStep1/locationStep2/…
or
locationStep1/locationStep2/…
Location step:
Axis::nodeSelector[predicate]
Navigation axis:
child, parent, ancestor, descendant, ancestor-or-self, descendant-or-self, following-sibling, preceding-sibling, etc.
Node selector: node name or wildcard; e.g.,
./child::student (we used ./student, which is an abbreviation)
./child::* – any e-child (abbreviation: ./*)
Predicate: a selection condition; e.g.,
/students/student[course/@code = "MAT123"]
The meaning of the expression locationStep1/locationStep2/…
is the set of all document nodes obtained as follows:
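find all nodes reachable by locationStep1 from the current node
for each node N in the result, find all nodes reachable from N by locationStep2; take the union of all these nodes
for each node in that result, find all nodes reachable by locationStep3, etc.
In other words, locationStep1/locationStep2/… means:
find all nodes specified by locationStep1
for each such node N, find all nodes specified by locationStep2 using N as the current node
for each node returned by locationStep2, do the same
Within one step, locationStep = axis::node[predicate]: first find the nodes selected by axis::node, then keep only those that satisfy predicate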
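The step-by-step meaning above can be sketched as a tiny evaluator. This is an illustration under assumptions, not how any real XPath engine is implemented. Steps are written as Python tuples (axis, node selector, predicate), and predicates are plain Python functions rather than parsed XPath. Only the child, parent and descendant-or-self axes are covered. A synthetic "#document" element stands in for the XPath root node. The names eval_step and eval_path are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

# Hypothetical stand-in for the XPath root node: its only child is <students>.
DOC = ET.Element("#document")
DOC.append(ET.fromstring(SAMPLE))

# ElementTree elements do not point to their parents, and XPath returns
# nodes in document order, so precompute both once.
PARENT = {child: parent for parent in DOC.iter() for child in parent}
ORDER = {node: i for i, node in enumerate(DOC.iter())}

Predicate = Optional[Callable[[ET.Element], bool]]
Step = Tuple[str, str, Predicate]  # (axis, node selector, predicate)

def axis_nodes(axis: str, node: ET.Element) -> List[ET.Element]:
    """Element nodes reachable from `node` along a few XPath axes."""
    if axis == "child":
        return list(node)
    if axis == "parent":
        return [PARENT[node]] if node in PARENT else []
    if axis == "descendant-or-self":
        return list(node.iter())  # the node itself, then its descendants
    raise ValueError(f"axis not supported in this sketch: {axis}")

def eval_step(node: ET.Element, step: Step) -> List[ET.Element]:
    axis, selector, predicate = step
    # axis::nodeSelector ('*' matches any element name)
    selected = [n for n in axis_nodes(axis, node) if selector in ("*", n.tag)]
    # [predicate] keeps only a subset of axis::nodeSelector
    return [n for n in selected if predicate is None or predicate(n)]

def eval_path(context: ET.Element, steps: List[Step]) -> List[ET.Element]:
    current = [context]
    for step in steps:
        # Evaluate the step with each node N as the current node; the union
        # (a set, so no duplicates) is the input to the next step.
        union = {m for n in current for m in eval_step(n, step)}
        current = sorted(union, key=ORDER.__getitem__)
    return current

def took(code: str) -> Callable[[ET.Element], bool]:
    # [course/@code = code]: true if ANY course child has that code
    return lambda s: any(c.get("code") == code for c in s.findall("course"))

all_students = eval_path(DOC, [("child", "students", None), ("child", "student", None)])
no_students = eval_path(DOC, [("child", "student", None)])  # /student
mat123 = eval_path(DOC, [("child", "students", None), ("child", "student", took("MAT123"))])
# /students/student/course/.. : parents of all courses, each student only once
with_courses = eval_path(DOC, [("child", "students", None), ("child", "student", None),
                               ("child", "course", None), ("parent", "*", None)])
# //last, i.e. descendant-or-self::*/child::last from the root
all_last = eval_path(DOC, [("descendant-or-self", "*", None), ("child", "last", None)])
```

The first student has two course children, so the parent step reaches it twice. Taking the union as a set is what makes it appear only once in with_courses.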
2nd course child of the 1st student child of students:
/students/student[1]/course[2]
All last course elements within each student element:
/students/student/course[last()]
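Python's xml.etree.ElementTree happens to support both positional forms, [n] and [last()], so the two queries above can be checked almost verbatim. The leading /students step is dropped because the parsed element already is students. One caveat for this sketch: ElementTree counts positions among siblings with the same tag. That coincides with XPath here only because each step selects by tag name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

# /students/student[1]/course[2]  (positions are 1-based, as in XPath)
second_course = students.findall("student[1]/course[2]")

# /students/student/course[last()]: one result per student that has courses
last_courses = students.findall("student/course[last()]")

print([(c.get("code"), c.get("semester")) for c in second_course])
print([(c.get("code"), c.get("semester")) for c in last_courses])
```

The third student has no course children, so course[last()] contributes nothing for it.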
Wildcards are useful when the exact structure of the document is not known
Descendant-or-self axis, //: allows descending any number of levels (including 0)
//course – all course nodes under the root
/students//@sid – all sid attribute nodes under the students element
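./last and last are the same.
.//last and //last are different: .//last selects the last elements anywhere below the current node, while //last selects all last elements in the document.
The * wildcard:
* (any element), e.g. /students/student/*/text()
@* (any attribute), e.g. /students//@*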
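The descendant axis and the wildcards can be approximated with Python's xml.etree.ElementTree, as a sketch under its limitations. ElementTree writes descendant search from the context element as ".//". It cannot select attribute or text nodes, so @sid, @* and text() are emulated by reading element attributes and .text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

# //course
all_courses = students.findall(".//course")

# /students//@sid: iter() walks <students> itself and all of its descendants
sids = [e.get("sid") for e in students.iter() if "sid" in e.attrib]

# .//last vs //last: with the second student as the current node only one
# <last> is found; searching from the top element finds all of them
bart = students.findall("student")[1]
last_under_bart = bart.findall(".//last")
last_everywhere = students.findall(".//last")

# /students/student/*/text(): '*' matches any element child; .text is None
# for children such as <name> and <course> that have no text of their own
child_texts = [c.text for c in students.findall("student/*") if c.text is not None]

# /students//@*: every attribute value at any depth
all_attribute_values = [v for e in students.iter() for v in e.attrib.values()]
```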
Axis::nodeSelector[predicate]
Axis::nodeSelector[predicate] ⊆ Axis::nodeSelector, but contains only the nodes that satisfy predicate
XPath functions usable in predicates: https://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-31/html/Overview.html
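The subset property can be checked on the sample document with Python's xml.etree.ElementTree. Its [@attr='value'] predicate is part of the small XPath subset it supports. The particular predicate below is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

# child::course from each student, without a predicate
unfiltered = students.findall("student/course")
# child::course[@semester = "F1997"]
filtered = students.findall("student/course[@semester='F1997']")

# The predicate can only drop nodes, never add new ones
assert all(any(f is u for u in unfiltered) for f in filtered)
print(len(unfiltered), len(filtered))
```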
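(1) Students who have taken CS308:
//student[course/@code="CS308"]
True for a student if "CS308" ∈ course/@code, evaluated with that student as the current node (a comparison against a set of nodes is true if any node in the set matches)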
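The existential reading of = in example (1) can be mirrored in Python as a sketch. ElementTree cannot put course/@code inside a predicate, so the filter is written with any(). An equivalent ElementTree-only path selects the matching courses and steps up with "..".

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

# //student[course/@code="CS308"] as a filter: the comparison succeeds
# if ANY course/@code of the student equals "CS308"
took_cs308 = [s for s in students.iter("student")
              if any(c.get("code") == "CS308" for c in s.findall("course"))]

# Same students via ElementTree's own subset: select the matching courses,
# then step up to their parents ('..' returns each parent once)
took_cs308_alt = students.findall(".//course[@code='CS308']/..")
```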
(2) A more complex example:
//student[status="U2" and
starts-with(.//last, "D") and
contains(string-join(.//@code),"MAT") and
not (.//last = .//first) ]
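A hand translation of example (2) into Python shows what each conjunct checks. This is a sketch, not a general XPath implementation. The helper names are hypothetical. It assumes at most one last element per student, as in the sample, because starts-with expects a single string.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

def string_value(e: ET.Element) -> str:
    # XPath string value of an element: all descendant text concatenated
    return "".join(e.itertext())

def matches_example_2(s: ET.Element) -> bool:
    lasts = [string_value(e) for e in s.iter("last")]    # .//last
    firsts = [string_value(e) for e in s.iter("first")]  # .//first
    # string-join(.//@code): codes of s and its descendants, in document order
    codes = "".join(e.get("code") for e in s.iter() if "code" in e.attrib)
    return (any(string_value(st) == "U2" for st in s.findall("status"))
            # starts-with(.//last, "D"); an empty .//last behaves like ""
            and (lasts[0] if lasts else "").startswith("D")
            and "MAT" in codes
            # .//last = .//first holds if ANY last equals ANY first
            and not (set(lasts) & set(firsts)))

result = [s.get("sid") for s in students.iter("student") if matches_example_2(s)]
```

Only John Doe passes: he has status U2, a last name starting with "D", a MAT course, and different first and last names.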
(3) Testing whether a subnode exists:
students who have a grade (for some course)
//student[course/@grade]
students who have either a first name or have taken a course in some semester or have status U2
//student[name/first or course/@semester or status/text() = "U2"]
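In both queries of example (3), a path used as a condition is true exactly when it selects at least one node. That maps naturally onto ElementTree's find(), which returns None when nothing matches. The translation below is a sketch that relies on ElementTree's [@attr] predicate.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

# //student[course/@grade]: true iff course/@grade selects at least one node
with_grade = [s for s in students.iter("student")
              if s.find("course[@grade]") is not None]

# //student[name/first or course/@semester or status/text() = "U2"]
either = [s for s in students.iter("student")
          if s.find("name/first") is not None
          or s.find("course[@semester]") is not None
          or any(st.text == "U2" for st in s.findall("status"))]
```

The third student has no first name, no courses and status U4, so it is excluded from both results.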
(4) Aggregation: sum( ), count( )
//student[course/@grade and sum(.//@grade) div count(.//@grade) > 3.2]
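The aggregation in example (4) computes an average grade as sum div count. The Python sketch below mirrors it, converting the grade attributes to numbers as sum() would. The helper name grades is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)

def grades(s: ET.Element) -> list:
    # .//@grade, converted to numbers
    return [float(e.get("grade")) for e in s.iter() if "grade" in e.attrib]

# //student[course/@grade and sum(.//@grade) div count(.//@grade) > 3.2]
# Python's `and` short-circuits, so students without grades never reach
# the division.
good_students = [s.get("sid") for s in students.iter("student")
                 if s.find("course[@grade]") is not None
                 and sum(grades(s)) / len(grades(s)) > 3.2]
```

John Doe averages (4 + 3) / 2 = 3.5, while Bart Simpson's single grade is 3, so only the first student qualifies.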
(5) Union operator |
//course[@semester="F1994"] | //course[@semester="F1997"]
union lets us define heterogeneous collections of nodes
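The union operator can be emulated in Python as a sketch. Nodes are deduplicated by identity and returned in document order, as XPath does for |. The helper union is a hypothetical name. The second query, //status | //last, is an illustrative heterogeneous collection not taken from the slides.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<students>
<student sid="111111111"><name><first>John</first><last>Doe</last></name>
<status>U2</status><course code="CS308" semester="F1997" grade="4"/>
<course code="MAT123" semester="F1997" grade="3"/></student>
<student sid="987654321"><name><first>Bart</first><last>Simpson</last></name>
<status>U4</status><course code="CS308" semester="F1994" grade="3"/></student>
<student sid="444444444"><name><last>Simpson</last></name><status>U4</status></student>
</students>"""

students = ET.fromstring(SAMPLE)
ORDER = {id(e): i for i, e in enumerate(students.iter())}  # document order

def union(*node_lists):
    """Node-set union: each node once (by identity), in document order."""
    nodes = {id(e): e for node_list in node_lists for e in node_list}
    return sorted(nodes.values(), key=lambda e: ORDER[id(e)])

# //course[@semester="F1994"] | //course[@semester="F1997"]
courses = union(students.findall(".//course[@semester='F1994']"),
                students.findall(".//course[@semester='F1997']"))

# A heterogeneous collection: //status | //last
mixed = union(students.findall(".//status"), students.findall(".//last"))
```

Although the F1994 course is listed first in the query, it comes last in the result because the union is sorted into document order.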