IT Services, University of Oxford
The collection of files (http://tei.oucs.ox.ac.uk/Talks/2013-03-11-copenhagen/data.zip) contains three data directories:
  • ecco: ECCO texts (full set from http://www.ota.ox.ac.uk/catalogue/)
  • godwin: the diaries of William Godwin for the years 1788-1791 (full set from http://godwindiary.bodleian.ox.ac.uk/godwindiary.zip)
  • owen: a poem by Wilfred Owen
There is also a simpler file called test.xml. These should give you enough varied material to work on.

1. Rendering TEI to HTML

  • Your task is to make a web page rendering of one of the ECCO texts. The purpose of this is:
    • Remind you how to read a TEI XML file
    • Remind you how to run a transform in oXygen
    • Remind you how to view web pages
    All you have to do is load the XML file (you choose) and press the default buttons in oXygen
  • Your next task is to make your own web page rendering of one of the ECCO texts. The purpose of this is:
    • Remind you how to construct an XSL transform, associate a file with it, and run the transformation
    • Remind you of the structure of an HTML file
    Here is your starting point for an XSL:
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml"
      xpath-default-namespace="http://www.tei-c.org/ns/1.0"
      version="2.0">

     <!-- the root TEI element becomes the HTML page -->
     <xsl:template match="TEI">
      <html>
       <head>
        <title>My document</title>
       </head>
       <body>
        <xsl:apply-templates select="text"/>
       </body>
      </html>
     </xsl:template>

     <!-- each div: the text of its head becomes a heading,
          then everything except the head is processed -->
     <xsl:template match="div">
      <h1>
       <xsl:value-of select="head"/>
      </h1>
      <xsl:apply-templates select="*[not(self::head)]"/>
     </xsl:template>
    </xsl:stylesheet>
    Get that into oXygen, and construct your desired transform. To run it, we first have to associate the XML file with the stylesheet.
    • Go to the ‘Document’ -> ‘XML Document’ -> ‘Associate XSLT/CSS Stylesheet’ menu.
    • Click on the ‘XSLT’ tab, and click the folder icon to browse for a file.
    • Choose your file as the XSLT file to use.
    • You should notice that oXygen adds a new line to the top of your file that looks something like:
      <?xml-stylesheet type="text/xsl" href="test.xsl"?>
    • This allows the XML document to know what stylesheet it can use to transform the document.
    • Select from the ‘Document’ -> ‘Transformation’ menu, ‘Configure Transformation Scenario’.
    • On the window that appears select ‘XML Stylesheet Processing Instruction’, and then click ‘Transform Now’.
    • If everything has worked perfectly (sometimes settings change across versions of oXygen), then your web browser should open a web page containing the text.
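As a hedged illustration of how the starting stylesheet might be extended, the sketch below adds templates for paragraphs and highlighted text. It assumes the ECCO file you chose uses <p> and <hi>; the choice of HTML <em> for every <hi> is only an illustration, and you may well want to look at the rend attribute instead.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

 <!-- TEI paragraphs become HTML paragraphs -->
 <xsl:template match="p">
  <p>
   <xsl:apply-templates/>
  </p>
 </xsl:template>

 <!-- assumption: all highlighting is shown as emphasis,
      whatever the value of @rend -->
 <xsl:template match="hi">
  <em>
   <xsl:apply-templates/>
  </em>
 </xsl:template>

 <!-- the header is metadata, so it produces no output if it is reached -->
 <xsl:template match="teiHeader"/>
</xsl:stylesheet>
```

These templates can be pasted into the starting stylesheet; any element without a template of its own falls through to the built-in rules, which simply output its text.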

2. XSLT transformations for genetic editions

Your next task is to write an XSLT transformation to make a plausible rendition of the encoding of Wilfred Owen's poem Strange Meeting as a web page.

The purpose of this exercise is
  • to remind you of the complexity of a full-scale TEI file
  • to get you to think about HTML and CSS, and what sort of methods you will use to tackle the problem
  • to see if you can generate some XSL based on the input XML

Study the input XML carefully (owen/Strange_Meeting_manuscript.xml).

Consider the difference between the <teiHeader/fileDesc/sourceDesc/msDesc>, <text>, the <facsimile>, and <sourceDoc>. Then think about what sort of HTML you want to make. This could be
  • a simple rendition of just one part
  • four separate pages for header, facsimile, genetic edition, and edited edition
  • four sections in the same document
  • side by side sections for some parts
You'll need to decide what these look like in HTML and start creating the right structures in your XSL.

A useful starting point could be to write a script to look at the values for rend, and generate skeleton templates for each different combination.

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xout="http://www.w3.org/1999/XSL/TransformAlias"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

 <xsl:output method="xml" indent="yes"/>
 <!-- elements written with the xout prefix come out as XSLT elements -->
 <xsl:namespace-alias stylesheet-prefix="xout" result-prefix="xsl"/>
 <xsl:template match="/">
  <xout:stylesheet version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
   <!-- one skeleton template per distinct element name + rend value -->
   <xsl:for-each
     select="distinct-values(//*[@rend]/concat(name(),'+',@rend))">
    <xout:template
      match="{substring-before(.,'+')}[@rend='{substring-after(.,'+')}']">
     <xout:apply-templates/>
    </xout:template>
   </xsl:for-each>
  </xout:stylesheet>
 </xsl:template>
</xsl:stylesheet>

You may or may not choose to align the <surface> in <sourceDoc> and <facsimile>.
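Once you have skeleton templates, you need to decide what each one emits. The following is a minimal sketch of one possibility, assuming the manuscript encoding marks deletions with <del> and additions with <add>; the class names are hypothetical hooks for your own CSS, not anything defined by the TEI.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

 <!-- keep deleted text visible, but mark it up so CSS can strike it out;
      the rend value (if any) is passed through as a second class name -->
 <xsl:template match="del">
  <span class="del {@rend}">
   <xsl:apply-templates/>
  </span>
 </xsl:template>

 <!-- additions: the place value (if any) becomes a second class name -->
 <xsl:template match="add">
  <span class="add {@place}">
   <xsl:apply-templates/>
  </span>
 </xsl:template>
</xsl:stylesheet>
```

Passing attribute values through as class names keeps the XSLT small and moves the visual decisions into the CSS, where they are easier to change.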

3. XSLT for changing our XML

These exercises are designed to teach you how to transform your TEI XML file.
  • Write an identity transformation, like this:
    <xsl:template
      match="@*|text()|comment()|processing-instruction()">

     <xsl:copy-of select="."/>
    </xsl:template>
    <xsl:template match="*">
     <xsl:copy>
      <xsl:apply-templates
        select="*|@*|processing-instruction()|comment()|text()"/>

     </xsl:copy>
    </xsl:template>
    Check that it does what you expect (i.e. reproduces the input).
  • Using one of the ECCO files, add a template which removes line breaks and any empty figures
  • Can you replace ‘coagulum’ in ecco/K004879.000.xml with <foreign>coagulum</foreign>?
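One possible approach to the last two exercises is sketched below; treat it as a starting point rather than the expected answer. It assumes that ‘empty’ means a <figure> with no child elements and no text, and it wraps every occurrence of the word, including any in the header or already inside a <foreign>, which you may want to restrict.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

 <!-- identity transformation, as given above -->
 <xsl:template match="@*|text()|comment()|processing-instruction()">
  <xsl:copy-of select="."/>
 </xsl:template>
 <xsl:template match="*">
  <xsl:copy>
   <xsl:apply-templates
     select="*|@*|processing-instruction()|comment()|text()"/>
  </xsl:copy>
 </xsl:template>

 <!-- an empty template removes the matched element from the output -->
 <xsl:template match="lb"/>
 <xsl:template match="figure[not(*) and not(normalize-space())]"/>

 <!-- split text nodes around the word and wrap each match in foreign -->
 <xsl:template match="text()[contains(.,'coagulum')]">
  <xsl:analyze-string select="." regex="coagulum">
   <xsl:matching-substring>
    <foreign>
     <xsl:value-of select="."/>
    </foreign>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

The more specific templates win over the identity templates because patterns with a name test or a predicate have a higher default priority than bare text() or *.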

4. XSLT for validating our XML

This set of exercises is designed to help us understand how to write XSLT which reports on and validates our TEI XML, and generates files for loading into other applications. You'll need your XPath!

  • Write an XSLT stylesheet which produces a text file
    <xsl:output method="text"/>
    containing a series of comma-separated lines which contain counts of elements. eg
    "p",67
    "name",102
    "div",4
    you can then load this into a spreadsheet and make a histogram. You can choose which elements to count, or process all of them (not so easy).
  • Write an XSLT stylesheet which checks that each pointer in a resp attribute has a corresponding ID in the file.
  • Write an XSLT stylesheet which analyzes ECCO files and generates a closed <valList> for <div>/@type. Remember the useful template pattern for processing a lot of files at once:
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xpath-default-namespace="http://www.tei-c.org/ns/1.0"
      version="2.0">

     <xsl:output method="html"/>
     <xsl:template name="main">
      <xsl:variable
        name="docs"
        select="collection('./ecco?select=*.xml;recurse=yes;on-error=warning')"/>

      <xsl:for-each select="$docs//TEI">
       <xsl:message>processing <xsl:value-of select="//titleStmt/title"/>
       </xsl:message>
      </xsl:for-each>
     </xsl:template>
    </xsl:stylesheet>
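For the first exercise in this list, counting every element is less hard than it looks if you use XSLT 2.0 grouping. The sketch below is one way to do it for a single input file; grouping on local-name() is an assumption that ignores namespaces, which is probably fine for a file that is entirely TEI.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- one group per distinct element name, sorted alphabetically -->
  <xsl:for-each-group select="//*" group-by="local-name()">
   <xsl:sort select="current-grouping-key()"/>
   <!-- write one line: the quoted name, a comma, and the count -->
   <xsl:text>"</xsl:text>
   <xsl:value-of select="current-grouping-key()"/>
   <xsl:text>",</xsl:text>
   <xsl:value-of select="count(current-group())"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>
```

To count across the whole ECCO directory, the same grouping could be applied to the $docs variable from the collection() pattern above instead of to a single document.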

5. Getting better quality TEI XML

Write an ODD from scratch, or use Roma to create a skeleton, and then edit the result. Use Roma to generate a schema, and then validate any of the ECCO files against the result. The ODD should have the following features:
  1. There should be a <valList> for the type attribute on <div> which limits it to a few fixed values; provide a <desc> for each <valItem>
  2. There should be a Schematron constraint which checks that the <publicationStmt> is not empty
  3. There should be a Schematron constraint which checks that all <div> elements have a <head>, unless they have a type attribute with the value 'title_page'.
  4. The examples for some elements should be replaced with ones from the ECCO texts
  5. Mathematics using MathML should be allowed as a child of <formula> (you'll need to study the Exemplars for this)
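As a rough guide to features 1 and 3, the fragment below shows the general shape such a customization might take inside your <schemaSpec>. The value ‘title_page’ comes from the exercise, but ‘chapter’ and the descriptions are made up for illustration; the value of the scheme attribute and the use of the tei: prefix in the Schematron rule are assumptions which depend on the version of the TEI and of Roma you are using.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:sch="http://purl.oclc.org/dsdl/schematron"
  ident="div" module="textstructure" mode="change">
 <!-- feature 3: every div needs a head, except title pages -->
 <constraintSpec ident="divNeedsHead" scheme="schematron">
  <constraint>
   <sch:rule context="tei:div[not(@type='title_page')]">
    <sch:assert test="tei:head">A div must have a head,
     unless its type is title_page.</sch:assert>
   </sch:rule>
  </constraint>
 </constraintSpec>
 <!-- feature 1: a closed list of values for the type attribute -->
 <attList>
  <attDef ident="type" mode="change">
   <valList type="closed" mode="replace">
    <valItem ident="title_page">
     <desc>the title page of the work</desc>
    </valItem>
    <!-- hypothetical value: use the ones you find in the ECCO files -->
    <valItem ident="chapter">
     <desc>a chapter of the work</desc>
    </valItem>
   </valList>
  </attDef>
 </attList>
</elementSpec>
```

The list of values is best generated from the data, for example with the <div>/@type analysis from the previous section.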

6. Using the TEI stylesheet family

6.1. Introduction

This is a set of XSLT 2.0 specifications to transform TEI XML documents to XHTML, to LaTeX, to XSL Formatting Objects, to/from OOXML (docx), to/from OpenOffice (odt) and to ePub format. The files can be downloaded from the Releases area of http://tei.sf.net. They concentrate on the simpler TEI modules, but adding support for other modules is fairly easy. In the main, the setup has been used on ‘new’ documents, ie reports and web pages that have been authored from scratch, rather than traditional TEI-encoded existing material.

The XSL FO style sheets were developed for use with PassiveTeX (http://projects.oucs.ox.ac.uk/passivetex/), a system using XSL formatting objects to render XML to PDF via LaTeX. They have not been extensively tested with the other XSL FO implementations.

6.2. File organisation

The main stylesheets are divided into four directories:
  • common2: templates which are independent of output type
  • fo2: templates for making XSL FO output
  • xhtml2: templates for making HTML output
  • latex2: templates for making LaTeX output
Within each directory there is a separate file for the templates which implement each of the TEI modules (eg textstructure.xsl, linking.xsl, or drama.xsl); these are included by a master file tei.xsl. This also includes a parameterization layer in the file tei-param.xsl, and the parameterization file from the common2 directory. The tei.xsl file also declares any necessary constants and XSL keys.
There are further directories for special-purpose conversions:
  • epub: conversion to ePub
  • odt: conversion to and from OpenOffice Writer format
  • docx: conversion to and from Word OOXML format
  • odds2: processing of TEI ODD files
  • rdf: conversion to RDF
  • txt: conversion to plain text

The final important directory is profiles, which has a set of predefined project starting points, each of which may have a file to.xsl for one or more of the supported output formats (csv, dtd, html, odt, docbook, epub, latex, p4, docx, fo, lite, and relaxng). There may also be a from.xsl to go from the selected format to TEI XML.

For example, to convert TEI to HTML in the default manner, the user may run profiles/default/html/to.xsl on the selected input file. Other starting points are listed below.

For the brave, there are Linux/OSX command-line shell scripts docxtotei, odttotei, teitodocx, teitodtd, teitoepub, teitoepub3, teitohtml, teitoodt, teitordf, teitorelaxng, teitornc, teitotxt, and teitoxsd for converting to/from Word, to/from OpenOffice, and to DTD, ePub, HTML, RDF, Relax NG, plain text, W3C schema etc. These are implemented using Ant tasks, which are also available within the oXygen XML editor as part of the TEI framework.

6.3. Trying the PDF rendering

  • Load an ECCO text into oXygen and choose the TEI P5 to PDF transformation scenario (press the ‘Configure Transformation Scenario’ icon). If all goes well, your browser will load a PDF rendering in due course.
  • Now duplicate the transformation scenario and edit it; click on the Parameters button, and you'll see a table of things you can change - double click on a value to put in a replacement
  • Now set the parameter Institution to ‘Oxford Summer School’ and rerun the transformation. See the difference?
  • More dramatically, change columnCount to have the value 2, and see what happens then.
  • Set parIndent to ‘0em’ and parSkip to ‘2pt’
  • Finally, change pageWidth to ‘175mm’, change columnCount back to 1, run the transform, and check that the page width is lessened.

6.4. Going further with parameters of HTML

Switch to an HTML rendering instead; try some of these changes to an HTML rendering of ECCO, by setting parameters, and check the results:
  • Set autoToc to ‘false’
  • Set numberHeadings to ‘false’
  • Set numberParagraphs to ‘true’
You can see the catalogue of parameters at http://www.tei-c.org/release/doc/tei-xsl-common/customize.html.
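If you find yourself setting the same parameters repeatedly, one option is to record them in a small wrapper stylesheet of your own. The sketch below assumes that the wrapper is saved in the top-level directory of the stylesheet distribution, so that the relative path to the default HTML profile resolves; adjust the href to match your installation.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

 <!-- assumption: this path is relative to where the wrapper is saved -->
 <xsl:import href="profiles/default/html/to.xsl"/>

 <!-- local values take precedence over those in the imported stylesheet -->
 <xsl:param name="autoToc">false</xsl:param>
 <xsl:param name="numberHeadings">false</xsl:param>
 <xsl:param name="numberParagraphs">true</xsl:param>
</xsl:stylesheet>
```

You would then run the wrapper on your input file instead of the profile itself; this works because declarations in an importing stylesheet have higher import precedence than those it imports.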

6.5. Using OxGarage

Now it is time to work with OxGarage, to check that you can create word-processor and ebook files. Visit http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient and
  • Upload one of the ECCO files. Try conversions to Word or OpenOffice format, and check that they load into the relevant application properly.
  • Make an ePub file, if you have an eBook reader to hand (Firefox users can download a good addon from http://www.epubread.com/en/)
  • Open Word or OpenOffice and write a simple document. Upload this to OxGarage and ask for TEI P5 XML to be sent back. Load it into oXygen and see if it is valid or useable. Do not expect miracles. OxGarage cannot read your mind…
  • Edit the generated TEI file a little, then upload it back into OxGarage and ask for a Word or OpenOffice file. How does that compare with the one you started with?


Sebastian Rahtz. Date: 2013-03
Copyright University of Oxford