What is an XML Schema?

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

XML Schemas are the Successors of DTDs

An XML Schema Example

Example XML Shipping Order

<?xml version="1.0"?>
<shipOrder>
  <shipTo>
    <name>Tove Svendson</name>
    <street>Ragnhildvei 2</street>
    <address>4000 Stavanger</address>
    <country>Norway</country>
  </shipTo>
  <items>
    <item>
      <title>Empire Burlesque</title>
      <quantity>1</quantity>
      <price>10.90</price>
    </item>
    <item>
      <title>Hide your heart</title>
      <quantity>1</quantity>
      <price>9.90</price>
    </item>
  </items>
</shipOrder>

Corresponding XML Schema

This is the XML Schema that defines the above order:
 
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:element     name="shipOrder" type="order"/>

<xsd:complexType name="order">
  <xsd:sequence>
    <xsd:element name="shipTo"    type="shipAddress"/>
    <xsd:element name="items"     type="cdItems"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="shipAddress">
  <xsd:sequence>
    <xsd:element name="name"      type="xsd:string"/>
    <xsd:element name="street"    type="xsd:string"/>
    <xsd:element name="address"   type="xsd:string"/>
    <xsd:element name="country"   type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="cdItems">
  <xsd:sequence>
    <xsd:element name="item"      minOccurs="0" maxOccurs="unbounded" type="cdItem"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="cdItem">
  <xsd:sequence>
    <xsd:element name="title"     type="xsd:string"/>
    <xsd:element name="quantity"  type="xsd:positiveInteger"/>
    <xsd:element name="price"     type="xsd:decimal"/>
  </xsd:sequence>
</xsd:complexType>

</xsd:schema>

XML Schema: Basic Concepts


An "instance document" is an XML document that conforms to a particular schema.
 

The Purchase Order, po.xml

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

The Purchase Order Schema

The purchase order schema is contained in the file po.xsd:
 
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:annotation>
  <xsd:documentation xml:lang="en">
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
 </xsd:annotation>

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:element name="comment" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN"
     fixed="US"/>
 </xsd:complexType>

 <xsd:complexType name="Items">
  <xsd:sequence>
   <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
     <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
       <xsd:simpleType>
        <xsd:restriction base="xsd:positiveInteger">
         <xsd:maxExclusive value="100"/>
        </xsd:restriction>
       </xsd:simpleType>
      </xsd:element>
      <xsd:element name="USPrice"  type="xsd:decimal"/>
      <xsd:element ref="comment"   minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>
     <xsd:attribute name="partNum" type="SKU" use="required"/>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>

 <!-- Stock Keeping Unit, a code for identifying products -->
 <xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
 </xsd:simpleType>

</xsd:schema>
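One common way to tell a validating processor which schema to use is a schema location hint on the root element of the instance document. The sketch below is a shortened purchase order that assumes po.xsd sits in the same directory as the instance; the hint is optional, and a processor may ignore it or be given the schema by other means.

<?xml version="1.0"?>
<!-- Assumption: po.xsd is reachable at this relative location -->
<purchaseOrder orderDate="1999-10-20"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="po.xsd">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <!-- comment is optional, and items may contain zero item elements -->
    <items/>
</purchaseOrder>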

Complex Type Definitions, Element & Attribute Declarations


New complex types are defined using the complexType element and such definitions typically contain a set of element declarations, element references, and attribute declarations.
 

Defining the USAddress Type
 <xsd:complexType name="USAddress" >
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
 </xsd:complexType>


The USAddress definition contains only declarations involving the simple types: string, decimal and NMTOKEN.

In contrast, the PurchaseOrderType definition contains element declarations involving complex types.
 

Defining PurchaseOrderType
 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>
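Note that the comment element is not declared inside PurchaseOrderType. Instead, the following declaration references the existing global comment element declared elsewhere in po.xsd, and minOccurs="0" makes it optional:
<xsd:element ref="comment" minOccurs="0"/>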

Occurrence Constraints

Attribute Occurrences


Default values of both attributes and elements are declared using the default attribute.
 


The fixed attribute is used in both attribute and element declarations to ensure that the attributes and elements are set to particular values.

For example, po.xsd contains a declaration for the country attribute, which is declared with a fixed value US. This declaration means that the appearance of a country attribute in an instance document is optional (the default value of use is optional), although if the attribute does appear, its value must be US, and if the attribute does not appear, the schema processor will provide a country attribute with the value US.

Note that the concepts of a fixed value and a default value are mutually exclusive, and so it is an error for a declaration to contain both fixed and default attributes.
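
As a hedged illustration of these rules, the fragment below declares a hypothetical complex type that combines element occurrence constraints with the use, default, and fixed attributes. ExampleNote and its priority, language, and id members are invented for this sketch and are not part of po.xsd.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!-- Hypothetical type, for illustration only -->
 <xsd:complexType name="ExampleNote">
  <xsd:sequence>
   <!-- optional element: may appear zero times or once -->
   <xsd:element name="comment"  type="xsd:string" minOccurs="0"/>
   <!-- element default: an empty <priority/> is treated as "normal" -->
   <xsd:element name="priority" type="xsd:string" default="normal"/>
  </xsd:sequence>
  <!-- fixed: the attribute is optional, but if present its value must be US -->
  <xsd:attribute name="country"  type="xsd:NMTOKEN"  fixed="US"/>
  <!-- default: the processor supplies "en" when the attribute is absent -->
  <xsd:attribute name="language" type="xsd:language" default="en"/>
  <!-- required: the attribute must appear, so it cannot carry a default -->
  <xsd:attribute name="id"       type="xsd:string"   use="required"/>
 </xsd:complexType>
</xsd:schema>

Adding a default attribute to the country declaration alongside fixed="US" would make the schema itself invalid.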

Global Elements & Attributes

Global elements, and global attributes, are created by declarations that appear as the children of the schema element.

Once declared, a global element or a global attribute can be referenced in one or more declarations using the ref attribute.

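Example: the comment element is declared globally in po.xsd and referenced with ref in both PurchaseOrderType and the item element, so it can appear at both levels in po.xml.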

The declaration of a global element also enables the element to appear at the top-level of an instance document. Hence purchaseOrder, which is declared as a global element in po.xsd, can appear as the top-level element in po.xml.
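
Because comment is also declared globally in po.xsd, a document whose top-level element is comment should, as far as the schema is concerned, be acceptable in the same way that po.xml is. A minimal sketch of such an instance:

<?xml version="1.0"?>
<comment>Hurry, my lawn is going wild!</comment>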
 

Simple Types

The purchase order schema declares several elements and attributes that have simple types.

Some of these simple types, such as string and decimal, are built in to XML Schema.

Others are derived from the built-in types. For example, the partNum attribute has a type called SKU (Stock Keeping Unit) that is derived from string. Both built-in simple types and their derivations can be used in all element and attribute declarations.
 
 
Table 2. Simple Types Built In to XML Schema
Simple Type | Examples (delimited by commas) | Notes
string | Confirm this is electric |
normalizedString | Confirm this is electric | see (3)
token | Confirm this is electric | see (4)
byte | -1, 126 | see (2)
unsignedByte | 0, 126 | see (2)
base64Binary | GpM7 |
hexBinary | 0FB7 |
integer | -126789, -1, 0, 1, 126789 | see (2)
positiveInteger | 1, 126789 | see (2)
negativeInteger | -126789, -1 | see (2)
nonNegativeInteger | 0, 1, 126789 | see (2)
nonPositiveInteger | -126789, -1, 0 | see (2)
int | -1, 126789675 | see (2)
unsignedInt | 0, 1267896754 | see (2)
long | -1, 12678967543233 | see (2)
unsignedLong | 0, 12678967543233 | see (2)
short | -1, 12678 | see (2)
unsignedShort | 0, 12678 | see (2)
decimal | -1.23, 0, 123.4, 1000.00 | see (2)
float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | equivalent to single-precision 32-bit floating point, NaN is "not a number", see (2)
double | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | equivalent to double-precision 64-bit floating point, see (2)
boolean | true, false, 1, 0 |
time | 13:20:00.000, 13:20:00.000-05:00 | see (2)
dateTime | 1999-05-31T13:20:00.000-05:00 | May 31st 1999 at 1:20 pm Eastern Standard Time, which is 5 hours behind Coordinated Universal Time, see (2)
duration | P1Y2M3DT10H30M12.3S | 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds
date | 1999-05-31 | see (2)
gMonth | --05 | May, see (2) (5)
gYear | 1999 | 1999, see (2) (5)
gYearMonth | 1999-02 | the month of February 1999, regardless of the number of days, see (2) (5)
gDay | ---31 | the 31st day, see (2) (5)
gMonthDay | --05-31 | every May 31st, see (2) (5)
Name | shipTo | XML 1.0 Name type
QName | po:USAddress | XML Namespace QName
NCName | USAddress | XML Namespace NCName, i.e. a QName without the prefix and colon
anyURI | http://www.example.com/, http://www.example.com/doc.html#ID5 |
language | en-GB, en-US, fr | valid values for xml:lang as defined in XML 1.0
ID | | XML 1.0 ID attribute type, see (1)
IDREF | | XML 1.0 IDREF attribute type, see (1)
IDREFS | | XML 1.0 IDREFS attribute type, see (1)
ENTITY | | XML 1.0 ENTITY attribute type, see (1)
ENTITIES | | XML 1.0 ENTITIES attribute type, see (1)
NOTATION | | XML 1.0 NOTATION attribute type, see (1)
NMTOKEN | US, Brésil | XML 1.0 NMTOKEN attribute type, see (1)
NMTOKENS | US UK, Brésil Canada Mexique | XML 1.0 NMTOKENS attribute type, i.e. a whitespace-separated list of NMTOKEN values, see (1)
Notes:
(1) To retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes.
(2) A value of this type can be represented by more than one lexical format, e.g. 100 and 1.0E2 are both valid float formats representing "one hundred". However, rules have been established for this type that define a canonical lexical format; see XML Schema Part 2.
(3) Newline, tab and carriage-return characters in a normalizedString type are converted to space characters before schema processing.
(4) As for normalizedString; in addition, adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed.
(5) The "g" prefix signals time periods in the Gregorian calendar.

Deriving New Simple Types

Suppose we wish to create a new type of integer called myInteger whose range of values is between 10000 and 99999 (inclusive).

<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
The example shows one particular combination of a base type and two facets used to define myInteger.
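
To see the two facets at work, the sketch below declares a hypothetical element named customerId (a name invented here, not taken from the purchase order schema) with type myInteger. The comment lists values that a validator would be expected to accept or reject.

<xsd:element name="customerId" type="myInteger"/>

<!-- Instance fragments:
     <customerId>10000</customerId>    accepted (lower bound is inclusive)
     <customerId>99999</customerId>    accepted (upper bound is inclusive)
     <customerId>9999</customerId>     rejected (below minInclusive)
     <customerId>100000</customerId>   rejected (above maxInclusive)
     <customerId>12345.5</customerId>  rejected (not an integer)
-->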

Another Example:

A new simple type called SKU is derived (by restriction) from the simple type string.

<xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
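A pattern facet has to match the whole value, so no explicit anchors are needed. As a rough check, the sketch below declares a hypothetical element named sku with type SKU; the accepted values are the part numbers from po.xml, and the rejected values are made up for illustration.

<xsd:element name="sku" type="SKU"/>

<!-- Instance fragments:
     <sku>872-AA</sku>    accepted (three digits, a hyphen, two upper-case letters)
     <sku>926-AA</sku>    accepted
     <sku>872-aa</sku>    rejected (lower-case letters)
     <sku>87-AA</sku>     rejected (only two digits)
     <sku>8722-AA</sku>   rejected (four digits; a partial match is not enough)
     <sku>872AA</sku>     rejected (hyphen missing)
-->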
Third Example (enumeration facet):
<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on ... -->
  </xsd:restriction>
</xsd:simpleType>
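One possible use of USState, sketched here as an assumption rather than as part of po.xsd, is to replace the string type of the state element inside USAddress so that only the enumerated codes are accepted:

<xsd:element name="state" type="USState"/>

<!-- With the full enumeration in place, <state>CA</state> would be accepted
     and <state>California</state> would be rejected. -->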

List Types


The value of an atomic type is indivisible from XML Schema's perspective.

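In contrast, a list type is made up of a sequence of atomic values. For example, NMTOKENS is a list type, and a value such as "US UK FR" is a list of three NMTOKEN items.
XML Schema has three built-in list types: NMTOKENS, IDREFS, and ENTITIES.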

Create new list types by derivation from existing atomic types.

For example, to create a list of myInteger values:

<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>
And an element in an instance document whose content conforms to listOfMyIntType is:
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
Several facets can be applied to list types: length, minLength, maxLength, and enumeration.

For example, to define a list of exactly six US states (SixUSStates), we first define a new list type called USStateList from USState, and then we derive SixUSStates by restricting USStateList to only six items:

 
<xsd:simpleType name="USStateList">
 <xsd:list itemType="USState"/>
</xsd:simpleType>

<xsd:simpleType name="SixUSStates">
 <xsd:restriction base="USStateList">
  <xsd:length value="6"/>
 </xsd:restriction>
</xsd:simpleType>
Elements whose type is SixUSStates must have six items, and each of the six items must be one of the (atomic) values of the enumerated type USState, for example:
<sixStates>PA NY CA NY LA AK</sixStates>
Note that it is possible to derive a list type from the atomic type string. However, a string may contain white space, and white space delimits the items in a list type, so you should be careful using list types whose base type is string. For example, suppose we have defined a list type with a length facet equal to 3, and base type string, then the following 3 item list is legal:
Asie Europe Afrique
But the following 3 "item" list is illegal:
Asie Europe Amérique Latine
Even though "Amérique Latine" may exist as a single string outside of the list, when it is included in the list, the whitespace between Amérique and Latine effectively creates a fourth item, and so the latter example will not conform to the 3-item list type.
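
The list type described above is not shown in the text. A minimal sketch might look like the following, where the type names StringList and ThreeStrings and the element name regions are all hypothetical:

<xsd:simpleType name="StringList">
 <xsd:list itemType="xsd:string"/>
</xsd:simpleType>

<xsd:simpleType name="ThreeStrings">
 <xsd:restriction base="StringList">
  <xsd:length value="3"/>
 </xsd:restriction>
</xsd:simpleType>

<xsd:element name="regions" type="ThreeStrings"/>

<!-- Instance fragments:
     <regions>Asie Europe Afrique</regions>           three items: accepted
     <regions>Asie Europe Amérique Latine</regions>   four items: rejected
-->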

Union Types

A union type enables an element or attribute value to be one or more instances of one type drawn from the union of multiple atomic and list types.
<xsd:simpleType name="zipUnion">
  <xsd:union memberTypes="USState listOfMyIntType"/>
</xsd:simpleType>


Now, assuming we have declared an element called zips of type zipUnion, valid instances of the element are:

<zips>CA</zips>

<zips>95630 95977 95945</zips>

<zips>AK</zips>
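
The assumed declaration of zips, together with a few values that would be expected to fail, can be sketched as follows. The failing values are invented for illustration.

<xsd:element name="zips" type="zipUnion"/>

<!-- Rejected instance fragments:
     <zips>CA 95630</zips>   mixes a state code with an integer, so it matches neither member type
     <zips>CA AK</zips>      USState is a single value, not a list of states
     <zips>123</zips>        123 is below the myInteger minimum of 10000
-->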

Anonymous Type Definitions

Schemas can be constructed by defining sets of named types such as PurchaseOrderType and then declaring elements such as purchaseOrder that reference the types using the type= construction.

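This style of schema construction is straightforward, but it can be unwieldy, especially if you define many types that are referenced only once and contain very few constraints. In such cases, a type can be defined more succinctly as an anonymous type, written inline in the declaration that uses it, which saves the overhead of naming it and referencing it explicitly.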
 

Two Anonymous Type Definitions
<xsd:complexType name="Items">
 <xsd:sequence>
  <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="productName" type="xsd:string"/>
     <xsd:element name="quantity">
      <xsd:simpleType>
       <xsd:restriction base="xsd:positiveInteger">
        <xsd:maxExclusive value="100"/>
       </xsd:restriction>
      </xsd:simpleType>
     </xsd:element>
     <xsd:element name="USPrice"  type="xsd:decimal"/>
     <xsd:element ref="comment"   minOccurs="0"/>
     <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="partNum" type="SKU" use="required"/>
   </xsd:complexType>
  </xsd:element>
 </xsd:sequence>
</xsd:complexType>
The item element has an anonymous complex type consisting of the elements productName, quantity, USPrice, comment, and shipDate, and an attribute called partNum. The quantity element has an anonymous simple type derived from positiveInteger whose value ranges between 1 and 99.
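
For comparison, the quantity constraint could also be written as a named type. In the sketch below the name QuantityType is hypothetical; the anonymous form above avoids having to invent it.

<xsd:simpleType name="QuantityType">
 <xsd:restriction base="xsd:positiveInteger">
  <xsd:maxExclusive value="100"/>
 </xsd:restriction>
</xsd:simpleType>

<!-- This declaration would take the place of the inline quantity declaration -->
<xsd:element name="quantity" type="QuantityType"/>

The named form is worth the extra definition mainly when more than one declaration needs the same constraint.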

Element Content

Complex Types from Simple Types

Consider how to declare an element that has an attribute and contains a simple value, such as:
<internationalPrice currency="EUR">423.46</internationalPrice>
The purchase order schema declares a USPrice element that is a starting point:
<xsd:element name="USPrice" type="xsd:decimal"/>
Derive a new complex type from the simple type decimal:
 
 <xsd:element name="internationalPrice">
  <xsd:complexType>
   <xsd:simpleContent>
    <xsd:extension base="xsd:decimal">
     <xsd:attribute name="currency" type="xsd:string"/>
    </xsd:extension>
   </xsd:simpleContent>
  </xsd:complexType>
 </xsd:element>

Mixed Content

<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby
Monitor</productName> shipped from our warehouse on
<shipDate>1999-05-21</shipDate>. ....
</letterBody>
Snippet of Schema for Customer Letter
<xsd:element name="letterBody">
 <xsd:complexType mixed="true">
  <xsd:sequence>
   <xsd:element name="salutation">
    <xsd:complexType mixed="true">
     <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
     </xsd:sequence>
    </xsd:complexType>
   </xsd:element>
   <xsd:element name="quantity"    type="xsd:positiveInteger"/>
   <xsd:element name="productName" type="xsd:string"/>
   <xsd:element name="shipDate"    type="xsd:date" minOccurs="0"/>
   <!-- etc. -->
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>

Empty Content

Now suppose that we want the internationalPrice element to convey both the unit of currency and the price as attribute values rather than as separate attribute and content values. For example:
<internationalPrice currency="EUR" value="423.46"/>
An Empty Complex Type
<xsd:element name="internationalPrice">
 <xsd:complexType>
  <xsd:complexContent>
   <xsd:restriction base="xsd:anyType">
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value"    type="xsd:decimal"/>
   </xsd:restriction>
  </xsd:complexContent>
 </xsd:complexType>
</xsd:element>
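A complex type that contains neither a simpleContent nor a complexContent element is read as shorthand for complex content that restricts anyType, so the same empty element can be declared more compactly. A sketch of the shorter form:

<xsd:element name="internationalPrice">
 <xsd:complexType>
  <!-- no content model is given, so the element must be empty -->
  <xsd:attribute name="currency" type="xsd:string"/>
  <xsd:attribute name="value"    type="xsd:decimal"/>
 </xsd:complexType>
</xsd:element>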
 
Annotations
 
Annotations in Element Declaration & Complex Type Definition
<xsd:element name="internationalPrice">
 <xsd:annotation>
  <xsd:documentation xml:lang="en">
      element declared with anonymous type
  </xsd:documentation>
 </xsd:annotation>
 ...
...
</xsd:element>

Groups of Elements/Attributes

We can introduce two groups into the PurchaseOrderType definition from the purchase order schema so that purchase orders may contain either separate shipping and billing addresses, or a single address for those cases in which the shippee and billee are co-located:
 
Nested Choice and Sequence Groups
<xsd:complexType name="PurchaseOrderType">
 <xsd:sequence>
  <xsd:choice>
   <xsd:group   ref="shipAndBill"/>
   <xsd:element name="singleUSAddress" type="USAddress"/>
  </xsd:choice>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:element name="items"  type="Items"/>
 </xsd:sequence>
 <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
  </xsd:sequence>
</xsd:group>
The choice group element allows only one of its children to appear in an instance.
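
Under the revised PurchaseOrderType, either of the following outlines would be acceptable, but a purchase order could not contain both forms at once. The address contents are elided as comments to keep the sketch short, so these are shapes rather than complete valid instances.

<!-- Option 1: the shipAndBill group -->
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US"><!-- name, street, city, state, zip --></shipTo>
    <billTo country="US"><!-- name, street, city, state, zip --></billTo>
    <items/>
</purchaseOrder>

<!-- Option 2: a single address for both shipping and billing -->
<purchaseOrder orderDate="1999-10-20">
    <singleUSAddress country="US"><!-- name, street, city, state, zip --></singleUSAddress>
    <items/>
</purchaseOrder>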