IT Services, University of Oxford

1. XPath

XPath is the basis of most other XML querying and transformation languages. It is just a way of locating nodes in an XML document.

As with XSLT and XQuery, we aren't going to teach XPath in depth today, just give you a quick whistle-stop tour so that you have been exposed to it.

1.1. Accessing your TEI document

So you've created some TEI XML documents. What now?

  • XPath
  • XSLT Transformation to another format (HTML, PDF, RTF, CSV, etc.)
  • XML Query (XQuery)
  • Custom Applications

1.2. What is XPath?

  • It is a syntax for accessing parts of an XML document
  • It uses a path structure to define XML elements
  • It has a library of standard functions
  • It is a W3C Standard
  • It is one of the main components of XQuery and XSLT

1.3. Example text

<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>
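
The example text can be loaded and queried from a program as well as from an editor. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree module, which implements only a subset of XPath 1.0 (abbreviated steps and simple predicates, with no axis names and no function library). The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

# fromstring() returns the root element, here <body>
body = ET.fromstring(POEM)

# div/head : the <head> child of each <div> child of <body>
title = body.find("div/head").text.strip()

# .//l : every <l> element anywhere below <body>
lines = [l.text.strip() for l in body.findall(".//l")]

print(title)
print(len(lines), "lines; first:", lines[0])
```

Paths given to find() and findall() are always relative to the element they are called on.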

1.30. XPath: More About Paths

  • A location path results in a node-set
  • Paths can be absolute (/div/lg[1]/l)
  • Paths can be relative (l/../../head)
  • Formal Syntax: (axisname::nodetest[predicate])
  • For example: child::div[contains(head, 'ROSE')]
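
The path forms above can be tried against the example text. This is a sketch only, assuming Python's standard-library xml.etree.ElementTree, which accepts relative paths and the .. step but not absolute paths on an element, axis names, or contains(); the last query therefore does the contains() test in plain Python, which is an approximation of the XPath predicate rather than an XPath evaluation.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# div/lg[1]/l, evaluated relative to <body>: the lines of the first stanza
first_stanza = body.findall("div/lg[1]/l")

# l/../../head, reached here from <body>: go down to the lines,
# back up two levels to the <div>, then down to its <head>
head_via_parent = body.findall("div/lg/l/../../head")

# child::div[contains(head, 'ROSE')], with the predicate done in Python
rose_divs = [d for d in body.findall("div") if "ROSE" in d.findtext("head", "")]

print([l.get("n") for l in first_stanza])
print(len(head_via_parent), len(rose_divs))
```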

1.31. XPath: Axes

ancestor::
Contains all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self::
Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute::
Contains all attributes of the current node
child::
Contains all children of the current node
descendant::
Contains all descendants (children, grandchildren, etc.) of the current node
descendant-or-self::
Contains the current node plus all its descendants (children, grandchildren, etc.)

1.32. XPath: Axes (2)

following::
Contains everything in the document after the closing tag of the current node
following-sibling::
Contains all siblings after the current node
parent::
Contains the parent of the current node
preceding::
Contains everything in the document that is before the starting tag of the current node
preceding-sibling::
Contains all siblings before the current node
self::
Contains the current node

1.33. Axis examples

  • ancestor::lg = all <lg> ancestors
  • ancestor-or-self::div = all <div> ancestors or current
  • attribute::n = n attribute of current node
  • child::l = <l> elements directly under current node
  • descendant::l = <l> elements anywhere under current node
  • descendant-or-self::div = all <div> descendants or current
  • following-sibling::l[1] = next <l> element at this level
  • preceding-sibling::l[1] = previous <l> element at this level
  • self::head = current <head> element
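
To make the meaning of the axes concrete, here is a rough emulation of a few of them in Python. It is a sketch, not an XPath engine: the standard-library xml.etree.ElementTree module does not support axis names, so the helper functions ancestors(), following_siblings() and preceding_siblings() are hypothetical names written for this illustration, and they only consider element nodes.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# ElementTree elements do not know their parent, so build a child -> parent map
parent_of = {child: parent for parent in body.iter() for child in parent}

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def following_siblings(node):
    """following-sibling:: axis, in document order."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling:: axis, nearest sibling first."""
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)][::-1]

line3 = body.find(".//l[@n='3']")                              # the current node

lg_ancestors = [a for a in ancestors(line3) if a.tag == "lg"]  # ancestor::lg
next_l = following_siblings(line3)[0]                          # following-sibling::l[1]
prev_l = preceding_siblings(line3)[0]                          # preceding-sibling::l[1]

print([a.tag for a in ancestors(line3)], next_l.get("n"), prev_l.get("n"))
```

Note that position 1 on a reverse axis such as preceding-sibling:: means the nearest node, which is why preceding_siblings() returns its list nearest first.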

1.34. XPath: Predicates

  • child::lg[attribute::type='stanza']
  • child::l[@n='4']
  • child::div[position()=3]
  • child::div[4]
  • child::l[last()]
  • child::lg[last()-1]
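
Attribute and position predicates like these are within the XPath subset that Python's standard-library xml.etree.ElementTree understands, provided the abbreviated syntax is used. The sketch below applies them to the example text; the positions are adjusted to fit a document with one <div> and two <lg> elements.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)
div = body.find("div")          # use the <div> as the current node

stanzas = div.findall("lg[@type='stanza']")   # child::lg[attribute::type='stanza']
line4 = div.findall("lg/l[@n='4']")           # child::l[@n='4'], one level further down
second = div.find("lg[2]")                    # child::lg[position()=2]
last_line = stanzas[0].find("l[last()]")      # child::l[last()]
penultimate = div.find("lg[last()-1]")        # child::lg[last()-1]

print(len(stanzas), line4[0].text, last_line.get("n"))
```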

1.35. XPath: Abbreviated Syntax

  • nothing is the same as child::, so lg is short for child::lg
  • @ is the same as attribute::, so @type is short for attribute::type
  • . is the same as self::node(), so ./head is short for self::node()/child::head
  • .. is the same as parent::node(), so ../lg is short for parent::node()/child::lg
  • // is the same as /descendant-or-self::node()/, so div//l is short for child::div/descendant-or-self::node()/child::l
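
The abbreviated forms are the only ones accepted by Python's standard-library xml.etree.ElementTree, so it is a convenient place to try them. A minimal sketch against the example text, assuming that module; it cannot select attribute nodes as a path step, so @type appears only inside a predicate here.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)
div = body.find("div")

# head and ./head select the same nodes, because . is the current node
same_head = div.findall("head") == div.findall("./head")

# div//l : <l> elements at any depth below the <div> children of <body>
all_lines = body.findall("div//l")

# [@type] : keep only <lg> elements that have a type attribute
typed = body.findall(".//lg[@type]")

# .. : the parent of the line numbered 5, i.e. the second stanza
stanza_of_5 = body.find(".//l[@n='5']/..")

print(same_head, len(all_lines), len(typed), stanza_of_5.tag)
```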

1.36. XPath Functions: Node-Set Functions

  • count() Returns the number of nodes in a node-set: count(person)
  • id() Selects elements by their unique ID: id('S3')
  • last() Returns the position number of the last node: person[last()]
  • name() Returns the name of a node: //*[name()='person']
  • namespace-uri() Returns the namespace URI of a specified node: namespace-uri(persName)
  • position() Returns the position in the node list of the node that is currently being processed: //person[position()=6]

1.37. XPath Functions: String Functions

  • concat() Concatenates its arguments: concat('http://', $domain, '/', $file, '.html')
  • contains() Returns true if the second string is contained within the first string: //persName[contains(surname, 'van')]
  • normalize-space() Removes leading and trailing whitespace and replaces all internal whitespace with one space: normalize-space(surname)
  • starts-with() Returns true if the first string starts with the second: starts-with(surname, 'van')
  • string() Converts the argument to a string: string(@sex)

1.38. XPath Functions: String Functions (2)

  • substring() Returns the part of a string that starts at a given character position (counting from 1) and has a given length: substring(surname, 5, 4)
  • substring-after() Returns the part of the string that is after the string given: substring-after(surname, 'De')
  • substring-before() Returns the part of the string that is before the string given: substring-before(@date, '-')
  • translate() Performs a character by character replacement. Each character of the first argument that occurs in the second argument is replaced by the character at the same position in the third argument: translate('1234', '24', '68') gives '1638'
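
The behaviour of the string functions can be pinned down with small Python equivalents. These are sketches written for this tour, with hypothetical function names and sample values; they cover only the simple cases (positive whole-number positions, non-empty search strings) and are not a complete implementation of the XPath 1.0 rules.

```python
def substring(s, start, length):
    """substring(s, start, length): positions count from 1, not 0."""
    return s[start - 1:start - 1 + length]

def substring_before(s, sep):
    """substring-before(): empty string when sep does not occur."""
    head, found, _ = s.partition(sep)
    return head if found else ""

def substring_after(s, sep):
    """substring-after(): empty string when sep does not occur."""
    _, found, tail = s.partition(sep)
    return tail if found else ""

def normalize_space(s):
    """normalize-space(): trim, then collapse runs of whitespace to one space.
    Python's split() treats a few more characters as whitespace than XPath does."""
    return " ".join(s.split())

def translate(s, source, target):
    """translate(): map each character of source to the one at the same
    position in target; characters with no counterpart are deleted."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:          # the first occurrence in source wins
            table[ord(ch)] = target[i] if i < len(target) else None
    return s.translate(table)

surname = "van der Berg"                  # sample value, not from any real data

print(substring("Shakespeare", 5, 4))     # characters 5 to 8
print(substring_before("2008-07-01", "-"))
print(substring_after("De Vries", "De"))
print(normalize_space("  van   der \n Berg "))
print(translate("1234", "24", "68"))
print("van" in surname, surname.startswith("van"))   # contains(), starts-with()
```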

1.39. XPath: Where can I use XPath?

Learning all these functions, though a bit tiring to begin with, can be very useful as they are used throughout XML technologies, but especially in XSLT and XQuery.

You can use XPath to search within XPath-aware editors like oXygen.

2. XSLT

XSLT (Extensible Stylesheet Language Transformations) is the main method used today for transforming input XML into output text, HTML, or XML.

2.1. XSLT

The XSLT language:
  • is expressed in XML, and uses namespaces to distinguish output from instructions
  • is a Turing-complete functional programming language
  • reads and writes XML trees
  • was designed to generate XSL FO, but is now widely used to generate HTML or other forms of XML

2.2. What do you mean, "transformation"?

Take this
<recipe>
<title>Pasta for beginners</title>
<ingredients><item>Pasta</item>
<item>Grated cheese</item>
</ingredients>
<cook>Cook the pasta and mix with the cheese</cook>
</recipe>
and make this
<html>
<h1>Pasta for beginners</h1>
<p>Ingredients: Pasta Grated cheese</p>
<p>Cook the pasta and mix with the cheese</p>
</html>

2.3. How do you express that in XSL?

<xsl:stylesheet
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version="1.0">
<xsl:template match="recipe">
<html>
<h1><xsl:value-of select="title"/></h1>
<p>Ingredients:
<xsl:apply-templates
select = "ingredients/item"/>
</p>
<p><xsl:value-of select="cook"/></p>
</html>
</xsl:template>
</xsl:stylesheet>
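
To see what the template rule is doing, here is the same recipe transformation written by hand in Python. This is not XSLT and does not run the stylesheet: the Python standard library has no XSLT processor, so the sketch uses xml.etree.ElementTree to pull out the same nodes with the same paths (title, ingredients/item, cook) and build the output tree directly. It joins the items with a space so that the result matches the output shown in 2.2.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
<title>Pasta for beginners</title>
<ingredients><item>Pasta</item>
<item>Grated cheese</item>
</ingredients>
<cook>Cook the pasta and mix with the cheese</cook>
</recipe>"""

recipe = ET.fromstring(RECIPE)            # plays the role of match="recipe"

html = ET.Element("html")

# <h1><xsl:value-of select="title"/></h1>
ET.SubElement(html, "h1").text = recipe.findtext("title")

# <p>Ingredients: <xsl:apply-templates select="ingredients/item"/></p>
items = [item.text for item in recipe.findall("ingredients/item")]
ET.SubElement(html, "p").text = "Ingredients: " + " ".join(items)

# <p><xsl:value-of select="cook"/></p>
ET.SubElement(html, "p").text = recipe.findtext("cook")

result = ET.tostring(html, encoding="unicode")
print(result)
```

The difference in style is the point of XSLT: the stylesheet declares what each kind of node should become, while this script spells out the steps in order.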

2.4. XSL Summary

The core techniques:
  • template rules for nodes in the incoming XML
  • taking material from other nodes
  • processing nodes several times in different modes
  • variables and functions
  • choosing, sorting, numbering
  • different types of output

3. XQuery

While XSLT is good for transforming XML to other formats (XML, HTML, PDF, text, etc.), sometimes you may wish to query a large database of XML documents and extract only matching records. In cases such as this, XQuery might be more appropriate.

3.1. What is XQuery?

  • It is a domain-specific method for accessing and manipulating XML
  • It is meant for querying XML
  • It is built upon XPath
  • It is like SQL but for XML
  • A W3C recommendation

3.2. XQuery: Expressions

path expressions
return a nodeset
element constructors
return a new element
FLWOR expressions
analogous to SQL Select statement
list expressions
operations on lists or sets of values
conditional expressions
traditional if then else construction
quantified expressions
boolean operations over lists or sets of values
datatype expressions
test datatypes of values

3.3. XQuery: Path Expression

The simplest kind of XQuery that you've already seen:
document("test.xml")//p
//p/foreign[@xml:lang='lat']
//foreign[@xml:lang='lat']/text()
//tei:TEI//tei:person[age >= 25]

3.4. XQuery: Element constructor

You may construct elements and embed XQueries or create well-formed results in the output you return. These may contain literal text or variables:
<latin>o tempora o mores</latin>
<latin>{$s}</latin>
<li>Name: {$surname}, {$forename}</li>
<li>Birth Country: {data($person/tei:birth/tei:placeName/tei:country)}</li>

3.5. XQuery: FLWOR expressions

For - Let - Where - Order - Return

declare namespace tei="http://www.tei-c.org/ns/1.0";
for $t in document("book.xml")//tei:text
let $latinPhrases := $t//tei:foreign[@xml:lang='lat']
where count($latinPhrases) > 1
order by count($latinPhrases)
return
 <list>
  <item>ID: {data($t/@xml:id)}</item>
  <item>Phrases: {$latinPhrases}</item>
 </list>
  • for defines a cursor over an xpath
  • let defines a name for the contents of an xpath
  • where selects from the nodes
  • order sorts the results
  • return specifies the XML fragments to construct
  • Curly braces are used for grouping, and defining scope
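
For readers more familiar with general-purpose languages, the FLWOR clauses map onto an ordinary loop. The following is a rough Python analogue of the query above, not XQuery itself. It assumes the standard-library xml.etree.ElementTree module and a small made-up document (the <teiCorpus> wrapper, the xml:id values and the phrases are invented for illustration), and it returns Python tuples instead of constructing <list> elements.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML = "{http://www.w3.org/XML/1998/namespace}"   # namespace of xml:id and xml:lang

# Hypothetical stand-in for book.xml
BOOK = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
 <text xml:id="t1"><p><foreign xml:lang="lat">o tempora</foreign>
  <foreign xml:lang="lat">o mores</foreign>
  <foreign xml:lang="lat">carpe diem</foreign></p></text>
 <text xml:id="t2"><p><foreign xml:lang="lat">veni</foreign>
  <foreign xml:lang="fra">bonjour</foreign></p></text>
 <text xml:id="t3"><p><foreign xml:lang="lat">alea</foreign>
  <foreign xml:lang="lat">iacta est</foreign></p></text>
</teiCorpus>"""

book = ET.fromstring(BOOK)
rows = []

for t in book.iterfind(".//tei:text", NS):                 # for $t in ...//tei:text
    latin = [f for f in t.iterfind(".//tei:foreign", NS)   # let $latinPhrases := ...
             if f.get(XML + "lang") == "lat"]
    if len(latin) > 1:                                     # where count(...) > 1
        rows.append((t.get(XML + "id"), [f.text for f in latin]))   # return ...

rows.sort(key=lambda row: len(row[1]))                     # order by count(...)

for text_id, phrases in rows:
    print("ID:", text_id, "Phrases:", phrases)
```

In XQuery the ordering is applied before the return clause builds its result; sorting the collected rows afterwards gives the same order here because the sort key is kept in each row.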

3.6. XQuery: List Expressions

XQuery expressions manipulate lists of values, for which many operators are supported:

  • constant lists: (7, 9, <thirteen/>)
  • integer ranges: i to j
  • XPath expressions
  • concatenation
  • set operators: | (or union), intersect, except
  • functions: remove, index-of, count, avg, max, min, sum, distinct-values ...

3.7. XQuery: List Expressions (cont.)

When lists are viewed as sets:

  • XML nodes are compared on their node identity
  • Any duplicates which exist are removed
  • Unless re-ordered the database order is preserved
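
The three rules for sets of nodes can be imitated in a few lines of Python. This is a sketch of the idea only, assuming xml.etree.ElementTree and the example poem; union(), intersect() and except_() are hypothetical helper names, not part of any library.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# Position of every element in document order
order = {node: i for i, node in enumerate(body.iter())}

def as_node_set(nodes):
    """Remove duplicates (by node identity) and restore document order."""
    return sorted(set(nodes), key=order.__getitem__)

def union(a, b):
    return as_node_set(list(a) + list(b))

def intersect(a, b):
    keep = set(b)
    return as_node_set(n for n in a if n in keep)

def except_(a, b):
    drop = set(b)
    return as_node_set(n for n in a if n not in drop)

first = body.findall("div/lg[1]/l")                              # lines 1 to 4
odd = [l for l in body.iter("l") if int(l.get("n")) % 2 == 1]    # lines 1, 3, 5, 7

def numbers(nodes):
    return [n.get("n") for n in nodes]

print(numbers(union(odd, first)))
print(numbers(intersect(first, odd)))
print(numbers(except_(first, odd)))
```

Elements compare by identity here, so two different <l> elements with the same text would both be kept, just as in XQuery.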

3.8. XQuery: Conditional Expressions

<div>
 {
  if (document("xqt")//tei:title/text() = "Introduction to XQuery")
  then <p>This is true.</p>
  else <p>This is false.</p>
 }
</div>

More often used in user-defined functions

3.9. XQuery: Quantified Expressions

  • some-in-satisfies
    declare namespace tei="http://www.tei-c.org/ns/1.0";
    for $b in document("book.xml")//tei:text
    where some $p in $b//tei:p
     satisfies (contains($p,"sailing") and contains($p,"windsurfing"))
    return $b/ancestor::tei:TEI/tei:teiHeader//tei:title[1]
  • every-in-satisfies
    declare namespace tei="http://www.tei-c.org/ns/1.0";
    for $b in document("book.xml")//tei:text
    where every $p in $b//tei:p
     satisfies contains($p,"sailing")
    return $b/ancestor::tei:TEI/tei:teiHeader//tei:title[1]
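
The two quantifiers behave like Python's any() and all(). The sketch below is an analogue, not XQuery: it assumes xml.etree.ElementTree and a tiny invented document, leaves out the TEI namespace for brevity, and returns the n attribute of each matching <text> instead of a title from the header.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for book.xml, without namespaces
DOC = """<corpus>
 <text n="a"><p>We went sailing and windsurfing.</p><p>Then we rested.</p></text>
 <text n="b"><p>sailing at dawn</p><p>sailing at dusk</p></text>
 <text n="c"><p>windsurfing only</p></text>
</corpus>"""

corpus = ET.fromstring(DOC)

def string_value(el):
    """All the text inside an element, like the XPath string value."""
    return "".join(el.itertext())

# where some $p in $b//p satisfies (contains(...) and contains(...))
some_match = [
    b.get("n") for b in corpus.iter("text")
    if any("sailing" in string_value(p) and "windsurfing" in string_value(p)
           for p in b.iter("p"))
]

# where every $p in $b//p satisfies contains($p, "sailing")
every_match = [
    b.get("n") for b in corpus.iter("text")
    if all("sailing" in string_value(p) for p in b.iter("p"))
]

print(some_match, every_match)
```

As with all() in Python, every is true when there are no paragraphs at all, so a <text> with no <p> elements would satisfy the second query.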

3.10. XQuery Example: Multiple Variables

One of the real benefits that XQuery gives you over XPath queries is that you can define multiple variables:

(: All Person Elements with their Stone's Description :)
declare namespace tei="http://www.tei-c.org/ns/1.0";
for $stones in collection('/db/pc')//tei:TEI
let $stoneDesc := $stones//tei:stoneDescription
let $people := $stones//tei:person
return <div>{$people} {$stoneDesc}</div>

3.11. XQuery Example: Element Constructors

You can construct the results into whatever elements you want:

(: Women's Birth and Death Countries in placeName elements :)
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $stones := collection('/db/pc')//tei:TEI
for $person in $stones//tei:person[@sex = '2']
let $birthCountry := $person/tei:birth//tei:country/text()
let $deathCountry := $person/tei:death//tei:country/text()
return
 <div>
  <p>This woman was born in <placeName>{$birthCountry}</placeName> and died in <placeName>{$deathCountry}</placeName>.</p>
 </div>

3.12. Result: Element Constructors

A series of <div> elements like:

<div>
 <p>This woman was born in <placeName>France</placeName> and died in <placeName>Italy</placeName>.</p>
</div>

3.13. XQuery Example: Traversing the Tree

You are not limited to one section of the document:

(: Getting the Stone's Title :)
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $stones := collection('/db/pc')//tei:TEI
for $person in $stones//tei:person[@sex ='2']
let $birthCountry := $person/tei:birth//tei:country/text()
let $deathCountry := $person/tei:death//tei:country/text()
let $title := $person/ancestor::tei:TEI//tei:teiHeader//tei:title[1]/text()
return
 <div>
  <head>{$title}</head>
  <p>This woman was born in <placeName>{$birthCountry}</placeName> and died in <placeName>{$deathCountry}</placeName>.</p>
 </div>

3.14. Result: Traversing the Tree

A series of <div> elements like:

<div>
 <head>Stone 2</head>
 <p>This woman was born in <placeName>France</placeName> and died in <placeName>Italy</placeName>.</p>
</div>

3.15. XQuery in Practice

  • You can handle requests passed from HTML forms inside your XQueries
  • That XQuery is being used is invisible to the user
  • eXist may be embedded into Apache's Cocoon web publishing framework
  • You can send queries to eXist in all sorts of ways including from within Java programs and via HTTP/REST, XML-RPC, SOAP, WebDAV.
  • You can use XUpdate to add/change/delete nodes in the XML database

4. Accessing XML: Conclusions

  • XML is a set of hierarchical nodes that can be identified with XPath
  • XSLT is an easy way to transform these into different forms
  • XQuery can enable you to perform sophisticated queries on a large database of texts
  • We haven't put things into TEI XML for the fun of it, but to be able to get things out of it or do things to it. These things may be the text in multiple formats, lists of names, statistical information, linguistic analysis, or additional layers of annotation


Date: 2008-07
Copyright University of Oxford