IT Services, University of Oxford

1. What more do you need to know?

You've created some TEI XML documents. What processing technologies are now available to you?

  • XPath
  • XSLT
  • XML Query (XQuery)

2. What is XPath?

  • A standardized syntax for identifying and accessing parts of an XML document
  • A library of standard functions
  • A W3C Standard
  • A major component of XQuery, XSLT, and almost every XML processing system

3. Example text

<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>

30. XPath in summary

  • A location path results in a node-set
  • Paths can be absolute (/div/lg[1]/l)
  • Paths can be relative (l/../../head)
  • Formal Syntax: (axisname::nodetest[predicate])
  • For example: child::div[contains(head, 'ROSE')]
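
A minimal sketch of these location paths, run against the example poem from slide 3. It assumes Python's standard xml.etree.ElementTree module, which supports only a subset of XPath: paths are evaluated relative to an element (here <body>), and functions such as contains() are not available, so that predicate is applied in Python instead. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# Relative location path with <body> as the context node:
# the <l> children of the first <lg> inside each <div>.
first_stanza = body.findall("div/lg[1]/l")
first_stanza_numbers = [line.get("n") for line in first_stanza]

# child::div[contains(head, 'ROSE')] cannot be handed to ElementTree,
# which has no contains(); the predicate is applied in Python instead.
rose_divs = [div for div in body.findall("div")
             if "ROSE" in (div.findtext("head") or "")]
rose_div_types = [div.get("type") for div in rose_divs]

print(first_stanza_numbers)
print(rose_div_types)
```

Each call returns a list of element nodes, which plays the role of the node-set described above.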

31. XPath: Abbreviated Syntax

  • nothing is the same as child::, so lg is short for child::lg
  • @ is the same as attribute::, so @type is short for attribute::type
  • . is the same as self::, so ./head is short for self::node()/child::head
  • .. is the same as parent::, so ../lg is short for parent::node()/child::lg
  • // is the same as descendant-or-self::, so div//l is short for child::div/descendant-or-self::node()/child::l
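
The abbreviations can be tried out in a short sketch. It assumes Python's standard xml.etree.ElementTree module, which accepts only the abbreviated forms (it does not understand axis names such as child::), and it uses the example poem from slide 3. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# div//l : every <l> at any depth below a <div> child of the context node
all_lines = body.findall("div//l")

# lg[@type='stanza'] : @ stands for the attribute axis
stanzas = body.findall("div/lg[@type='stanza']")

# ./div/head and div/head select the same node, since . is the context node
heads_with_dot = body.findall("./div/head")
heads_without_dot = body.findall("div/head")

# .//l[@n='5']/.. : .. steps up to the parent <lg> of line 5
parents_of_line_5 = body.findall(".//l[@n='5']/..")

print(len(all_lines), len(stanzas))
print(heads_with_dot == heads_without_dot)
print(parents_of_line_5 == [stanzas[1]])
```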

32. XPath also has built-in functions

These include:
  • node-set functions: for example, to find the number of nodes in a node-set, or to select nodes by position, name, namespace, etc.
  • string functions: for example, to concatenate string values, do string matching, find substrings, perform one-to-one character translation, etc.

33. eXtensible StyLesheeTs

The XSLT language is
  • expressed in XML; it uses namespaces to distinguish output from instructions
  • a Turing-complete functional programming language
  • a language that reads and writes XML trees
  • designed to generate XSL FO, but now widely used to generate HTML or other forms of XML

34. What do you mean, "transformation"?

Take this
<recipe>
<title>Pasta for beginners</title>
<ingredients><item>Pasta</item>
<item>Grated cheese</item>
</ingredients>
<cook>Cook the pasta and mix with the cheese</cook>
</recipe>
and make this
<html>
<h1>Pasta for beginners</h1>
<p>Ingredients: Pasta Grated cheese</p>
<p>Cook the pasta and mix with the cheese</p>
</html>

35. How do you express that in XSL?

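<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version="1.0">
  <!-- Template rule: applied to each recipe element in the input -->
  <xsl:template match="recipe">
    <html>
      <h1><xsl:value-of select="title"/></h1>
      <p>Ingredients:
        <xsl:apply-templates
            select="ingredients/item"/>
      </p>
      <p><xsl:value-of select="cook"/></p>
    </html>
  </xsl:template>
  <!-- Write out each item followed by a space, so that the items do not run together -->
  <xsl:template match="item">
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>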
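
For comparison, the same tree-to-tree mapping can be sketched by hand. This is not an XSLT processor: it assumes Python's standard xml.etree.ElementTree module, and the function transform_recipe is a hypothetical stand-in for the template rule that matches recipe. It may help to show what the stylesheet asks the processor to do: read one tree, pull values out of it with paths, and build another tree.

```python
import xml.etree.ElementTree as ET

RECIPE = """<recipe>
<title>Pasta for beginners</title>
<ingredients><item>Pasta</item>
<item>Grated cheese</item>
</ingredients>
<cook>Cook the pasta and mix with the cheese</cook>
</recipe>"""


def transform_recipe(recipe):
    """Build the output tree for one <recipe> element (hypothetical helper)."""
    html = ET.Element("html")
    # <h1><xsl:value-of select="title"/></h1>
    ET.SubElement(html, "h1").text = recipe.findtext("title")
    # <xsl:apply-templates select="ingredients/item"/>
    items = [item.text for item in recipe.findall("ingredients/item")]
    ET.SubElement(html, "p").text = "Ingredients: " + " ".join(items)
    # <p><xsl:value-of select="cook"/></p>
    ET.SubElement(html, "p").text = recipe.findtext("cook")
    return html


result = ET.tostring(transform_recipe(ET.fromstring(RECIPE)), encoding="unicode")
print(result)
```

In XSLT the processor decides which template rule applies to which node; in this sketch that dispatch is written out by hand, which is the work a stylesheet saves.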

36. XSL Summary

The core techniques:
  • template rules for nodes in the incoming XML
  • taking material from other nodes
  • processing nodes several times in different modes
  • variables and functions
  • choosing, sorting, numbering
  • different types of output

37. What is XQuery?

  • a domain-specific method for accessing and manipulating XML
  • designed for querying XML
  • built upon XPath
  • analogous to SQL (but for XML rather than for relational data)
  • a W3C recommendation

38. XQuery: the core techniques

  • path expressions: return a node-set
  • element constructors: return a new element
  • FLWOR expressions: analogous to an SQL SELECT statement
  • list expressions: operations on lists or sets of values
  • conditional expressions: traditional if-then-else construction
  • quantified expressions: boolean operations over lists or sets of values
  • datatype expressions: test the datatypes of values

XQuery is a complete programming language
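
Several of these techniques can be approximated in a few lines. The sketch below is an analogy, not XQuery itself: it assumes Python's standard xml.etree.ElementTree module, uses the example poem from slide 3, and shows the corresponding XQuery expression in a comment above each Python equivalent. The element name hit and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

POEM = """<body n="anthology">
 <div type="poem">
  <head>The SICK ROSE </head>
  <lg type="stanza">
   <l n="1">O Rose thou art sick.</l>
   <l n="2">The invisible worm,</l>
   <l n="3">That flies in the night </l>
   <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
   <l n="5">Has found out thy bed </l>
   <l n="6">Of crimson joy:</l>
   <l n="7">And his dark secret love </l>
   <l n="8">Does thy life destroy.</l>
  </lg>
 </div>
</body>"""

body = ET.fromstring(POEM)

# FLWOR expression with an element constructor in the return clause:
#   for $l in //l
#   let $text := normalize-space($l)
#   where contains($text, 'the')
#   order by string-length($text) descending
#   return <hit n="{$l/@n}">{$text}</hit>
matches = [(line, " ".join(line.text.split())) for line in body.iter("l")]  # for, let
matches = [(line, text) for line, text in matches if "the" in text]         # where
matches.sort(key=lambda pair: len(pair[1]), reverse=True)                   # order by
hits = []
for line, text in matches:                                                  # return
    hit = ET.Element("hit", {"n": line.get("n")})
    hit.text = text
    hits.append(hit)

# Quantified expressions:
#   some $l in //l satisfies contains($l, 'worm')
some_worm = any("worm" in line.text for line in body.iter("l"))
#   every $l in //l satisfies exists($l/@n)
every_numbered = all(line.get("n") is not None for line in body.iter("l"))

# Conditional expression:
#   if (count(//lg) > 1) then 'stanzaic' else 'single stanza'
form = "stanzaic" if len(body.findall(".//lg")) > 1 else "single stanza"

for hit in hits:
    print(ET.tostring(hit, encoding="unicode"))
print(some_worm, every_numbered, form)
```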

39. Using XQuery

  • The language itself is usually hidden from the end user (e.g. behind an HTML form)
  • Implementations such as eXist may be embedded in a web publishing framework such as Apache...
  • ... but eXist can also be accessed, e.g., from a Java program and via HTTP/REST, XML-RPC, SOAP, WebDAV, etc.
  • eXist provides XUpdate: a means of adding, changing, or deleting nodes in an XML database

40. Accessing XML: Conclusions

  • XML is a set of hierarchical nodes that can be identified with XPath
  • XSLT is an easy way to transform these into different forms
  • XQuery can enable you to perform sophisticated queries on a large database of texts
  • ...We haven't put things into TEI XML for the fun of it, but to be able to get things out of it or do things to it. These things may be the text in multiple formats, lists of names, statistical information, linguistic analysis, or additional layers of annotation


James Cummings, Sebastian Rahtz and other TEI@Oxford authors. Date: February 2009
Copyright University of Oxford