IT Services, University of Oxford

TEI à la carte : customization

In this tutorial, we'll see how TEI markup might be used in an imaginary (but quite plausible) research project. We'll discuss what markup scheme the project will need, create a TEI schema to support it, and (if there is time) do some experimental markup using it.

As you probably don't know, the late Marcel Virgolos has long been recognised as an expert on the history and evolution of the 20th century postcard. Following his untimely demise, we have received permission from his executors to create a digital research archive of some parts of his unique personal collection of 100,000 mostly used postcards of all kinds in order to further the progress of serious cartophilological studies.

1. Sample documents

In this exercise we will mark up some postcards from the collection, respecting each card's organization into recto and verso, the components of its address, and so on.

Of course the first part of our research project was spent arguing about how to mark up these documents, and if we had more time we'd rerun some of that discussion now. For the purposes of this exercise, we'll assume that we've reached agreement on the TEI elements to be used, and that we have also marked up a sample set of cards using them.

You will find images of the recto and verso of the cards in your Work/Cards folder along with some sample transcriptions.
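To give an idea of what such a transcription might look like, here is a minimal sketch using only elements from the agreed list. Everything specific in it is invented for illustration: the wording of the card, the image file name, and the values of the type attribute are assumptions, and the sample files in your Work/Cards folder may well be organized differently.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Hypothetical postcard, for illustration only</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished exercise file</p>
      </publicationStmt>
      <sourceDesc>
        <bibl>A used postcard (invented example)</bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- recto: the picture side; the file name is a placeholder -->
      <div type="recto">
        <figure>
          <graphic url="card-recto.jpg"/>
          <figDesc>View of a seaside promenade</figDesc>
        </figure>
      </div>
      <!-- verso: the message, then the stamp and address -->
      <div type="verso">
        <div type="message">
          <dateline><date when="1932-08-14">14 Aug. 1932</date></dateline>
          <salute>Dear Aunt,</salute>
          <p>Weather lovely, wish you were here.</p>
          <signed>Tom</signed>
          <postscript>
            <p>Back on Friday.</p>
          </postscript>
        </div>
        <div type="destination">
          <ab><stamp>One penny postage stamp</stamp></ab>
          <ab>
            <address>
              <addrLine>Mrs A. N. Other</addrLine>
              <addrLine>1 Example Street</addrLine>
              <addrLine>Anytown</addrLine>
            </address>
          </ab>
        </div>
      </div>
    </body>
  </text>
</TEI>
```

Wrapping the stamp and the address in ab elements is one possible choice among several; it is used here because neither element can appear directly inside a div.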

2. Making a schema with Roma

Roma is a web application developed for the TEI which can be used to create your own TEI customization. We'll use it to make a schema using just the elements we've decided are essential.

2.1. Setting Roma parameters

  • Open a web browser and visit the site http://www.tei-c.org/Roma/.
  • The initial screen allows you to choose from a number of starting points :
    1. start from a minimal number of TEI modules, to which you can add new elements or modules and from which you can remove existing elements;
    2. start from the TEI All schema you used at the start of this exercise and remove things you don't want;
    3. start from one of a small number of pre-defined templates: the TEI Absolutely Bare customization which you used in the first exercise is on this list;
    4. start from a pre-existing TEI customization file such as the ever-popular TEI Lite;
    5. start from a customization you (or someone else) have already made.

We suggest you start from the minimal option.

On the next screen, set the parameters as follows :
  • Title : Change this to "TEI for Postcards".
  • Filename : Change this to (for example) teiCards (this is an XML identifier and so may not contain spaces)
  • Leave the Namespace and Prefix fields unchanged
  • Language : You can continue to work in English, but if you prefer German, Italian, Spanish, French, Portuguese, Russian, Swedish, Chinese or Japanese, feel free to select the appropriate radio button.
  • Author name : Type in your name
  • Change the description to something like "A minimal TEI tagset for the encoding of digitized postcards"
  • Click the Save button at the foot of the page.

If you chose a language other than English, you'll see that the Roma interface language has changed accordingly. You should also visit the Language (or equivalent) page to set the language used by your generated schema.

2.2. Select your modules

A module is a group of TEI elements. Every TEI element is declared in a particular module. Some modules are very general and have many components; others are more specialised. For example, if you are encoding a dictionary, or a speech transcript, you will need to choose the module for Dictionaries, or that for Transcribed Speech respectively, but the elements provided by these modules are of less interest in other types of document.

The Roma interface requires you to specify the modules you want to use. You cannot use it to specify an element name directly, though the ODD language does permit this.
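As a sketch of what specifying an element name directly can look like in ODD, the fragment below selects some whole modules with moduleRef and then names a single element with elementRef. The particular choice of modules and the identifier teiCards are assumptions made for illustration; this is not the ODD that Roma will generate for you.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="teiCards" start="TEI">
  <!-- whole modules, as Roma would select them -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- a single element named directly, without the rest of its module -->
  <elementRef key="stamp"/>
</schemaSpec>
```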

Refer to this table for the exercise:
module          elements required
tei             [none]
textstructure   TEI body dateline div postscript salute signed text
core            add addrLine address bibl date del foreign graphic head hi item lb list name p publisher q reg resp respStmt street teiCorpus title unclear
header          teiHeader fileDesc titleStmt publicationStmt sourceDesc
figures         figure figDesc
msdescription   stamp
namesdates      persName placeName
linking         ab
transcr         [none]
  • Click the Modules button on the toolbar to see the modules from which your schema is derived
  • On the right, you can see a list of the modules which have been selected so far; on the left you can see a list of the modules available.
  • Referring to the list above, add each of the required modules by clicking on the word Add preceding its name in the list on the left.
  • The module is added to the list on the right

2.3. Including and excluding elements

Selecting a module by default includes all the elements defined by that module, which may not be what we want.
  • Click on the word core in the right hand list (nb. not the word 'remove' but the name of the module). A table listing all the elements provided by this module is displayed.
  • Each row of the table contains:
    • the canonical name of the element
    • an indication of its Inclusion or Exclusion in the current schema
    • the name of this element in the current schema (normally this is the canonical name, but Roma allows you to rename elements, for example if you are working in a language other than English)
    • a question mark link to the full reference information for this element
    • a brief description of the element
    • a link which allows you to change the element's attributes
  • This interface allows you to explore in detail all the elements provided. Click on the question mark link for any of the elements which interest you to read more about them.
  • The interface also controls your selection of elements. Use the Include or Exclude button in the heading of the table to include or exclude by default all the elements defined by a module. Then click on the button beside any element whose status you wish to change in order to include (or exclude) it. Usually it is more convenient to start by excluding all the elements and then adding back the ones you actually want.

Referring to the table above, remove the unwanted elements.

When you've finished, don't forget to click the red Save button at the foot of the page!
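The same selection can be written down in ODD, where the include attribute of moduleRef lists the elements to keep from each module. The fragment below is a hand-written sketch of the table of modules and elements used in this exercise, not Roma's actual output: Roma may express the same choices differently (for example, by deleting the unwanted elements one by one). The identifier teiCards and the start attribute are assumptions, and the transcr module, from which no elements are required, is left out of the sketch.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="teiCards" start="TEI teiCorpus">
  <!-- the tei module defines classes and macros but no elements -->
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"
    include="TEI body dateline div postscript salute signed text"/>
  <moduleRef key="core"
    include="add addrLine address bibl date del foreign graphic head hi item
             lb list name p publisher q reg resp respStmt street teiCorpus
             title unclear"/>
  <moduleRef key="header"
    include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <moduleRef key="figures" include="figure figDesc"/>
  <moduleRef key="msdescription" include="stamp"/>
  <moduleRef key="namesdates" include="persName placeName"/>
  <moduleRef key="linking" include="ab"/>
</schemaSpec>
```

Listing what to include, rather than what to exclude, mirrors the advice above to start by excluding everything and then add back the elements you actually want.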

So far we have defined a schema which is a pure subset of the TEI recommendations. Suppose, however, that we want to add a new element, which the TEI has not yet standardised. For example, we might like to record text printed on a postcard before it was written using an explicit <printed> element. Roma allows you to add new elements within certain limits.

  • Select the "Add elements" tab
  • Supply a name (‘printed’) and a description (‘Marks any string of text printed on a card before it was used’).
  • Note that your new element must be assigned to a different (non-TEI) namespace
  • Choose the appropriate class memberships for your new element: we suggest you should make it a member of the model class model.inter and the attribute class att.global
  • You must also define the content model for your new element. We'll discuss this in more detail later; for the moment, we suggest you just choose macro.specialPara from the dropdown list of commonly used content models
  • Press the red Save button at the foot of the page to complete the process.
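In ODD terms, the choices made in these steps amount to an element specification along the following lines. This is a sketch rather than Roma's literal output: the namespace URI is a placeholder you would replace with one of your own, and the content model is written with macroRef, whereas ODD files generated by older versions of Roma may instead contain a RELAX NG reference to the same macro.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
  ident="printed" mode="add"
  ns="http://example.org/ns/postcards">
  <!-- the ns value above is a hypothetical, non-TEI namespace -->
  <desc>Marks any string of text printed on a card before it was used</desc>
  <classes>
    <!-- model class: where the element may appear -->
    <memberOf key="model.inter"/>
    <!-- attribute class: which attributes it carries -->
    <memberOf key="att.global"/>
  </classes>
  <content>
    <macroRef key="macro.specialPara"/>
  </content>
</elementSpec>
```

Because the element lives in its own namespace, any document using it must declare that namespace as well as the TEI one.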

Note that in this exercise we only control the presence or absence of elements in a schema. Selecting from the possible attributes for those elements is also possible using Roma, but we do not explore it here.

2.4. Creating a schema and an ODD

  • Click the Schema tab. You can choose amongst several schema languages because the TEI system is defined (as far as possible) independently of any particular language.
  • We recommend that you generate your schema in RELAX NG, in either compact or XML syntax.
  • Click on the red Generate button and save the schema file which Roma sends you in your Work folder.
  • Click on the Save tab in Roma to save a copy of the ODD file that Roma has generated in the same place.

2.5. Check the outputs from Roma

  • Start up oXygen.
  • Click on the New icon, top left (or select New from the File menu, or type CTRL-N) to open the New dialogue
  • Choose New Document, then XML Document.
  • Click the Customize button below. oXygen displays the Customize Editor dialog box.
  • From the dropdown menu at the far right of the Schema URL window choose Browse for local file
  • Navigate to your Work folder to select the schema file you have just created, and click the Open button
  • Information about your schema is displayed. Click the Create button to create an empty document which will use your schema.
  • You may like to check what elements are available within a <p> element
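A new document associated with your schema typically begins with a processing instruction pointing at the schema file, as in the sketch below. The file name teiCards.rnc assumes that you generated the compact syntax and kept the filename suggested earlier; the exact skeleton that oXygen creates for you may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- associates this document with the schema file; the name is an assumption -->
<?xml-model href="teiCards.rnc" type="application/relax-ng-compact-syntax"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title of this transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Publication information</p>
      </publicationStmt>
      <sourceDesc>
        <p>Information about the source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
</TEI>
```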

As well as the schema file you just used, Roma has made an initial ODD file for you.

  • Open the file teiCards.xml with oXygen
  • You'll learn more about how to modify and improve this file later. For the moment, let's look at another way of creating an ODD file, based on actual tagging practice rather than theory.

3. Making a schema with oddbyexample

  • Using oXygen, open any TEI XML file in the target collection of texts
  • Choose Transformation -> Configure Transformation Scenario(s) from the Document menu
  • Click New and choose "XML Transformation with XSLT"
  • Give your scenario a name ("oddGenerator" for example)
  • Leave XML URL as it is. Change XSL URL to point to the stylesheet oddbyexample.xsl in your TEI Framework directory. Enter ${frameworks}/tei/xml/tei/stylesheet/tools/oddbyexample.xsl to find it
  • Choose Saxon-PE as processor
  • Click the little yellow wheel next to this window to select Advanced Options: you need to set Template("-it") to main
  • Click the Parameters button : you need to set the corpus parameter to contain the full name of the folder which you want to analyse. Assuming you opened one of its files in step 1 above, just set the parameter to ${cfd} and click OK
  • Now select the Output tab ...
    • In the Save as window supply an output filename such as generated.odd
    • Tick the Open in editor box
    • Select the XML radio button underneath Show in results view as and click OK
  • Launch the transformation by clicking the Apply Associated button
  • If everything works, you should see the resulting ODD file generated.odd

4. Processing an ODD

An ODD file can be uploaded to Roma for processing, or processed directly using oXygen
  • Select Transformation -> Configure Transformation Scenario(s) from the Document menu, or type CTRL-SHIFT-C, or click the spanner + red triangle icon on the tool bar.
  • A list of available transformations for the ODD file you are currently editing is displayed. Scroll down to find the nine available for TEI ODD and check the boxes for those which interest us, for example : TEI ODD to XHTML and TEI ODD to RELAXNG Compact
  • Click the button Apply Associated and wait a minute or two.

An HTML file will be generated, and should be displayed by your default browser: as you can see, it is like any other TEI specification. A schema file in RELAX NG compact form will be created and stored in a subdirectory called out.

5. Experimenting with your schemas

You should now have two schemas, one produced by Roma called teiCards, and one produced by oddbyexample called generated. To find out how they differ you could inspect the ODD sources, or the RELAX NG itself. It may be easier, however, to see how the two schemas behave in practice. Try the following :
  • Open one of the existing card transcriptions and validate it first with one schema and then the other
  • Using the usual oXygen manipulations, check to see what elements and attributes are available at different points in the document.
  • If you have time, make a new document for each schema, and add content to it using one of the untagged text files in the Samples directory.

Lou Burnard Consulting. Date: September 2014
Copyright University of Oxford