IT Services, University of Oxford

1. The TEI Class System

  • The TEI distinguishes over 500 elements.
  • Having these organised into classes aids comprehension, modularity, and modification.
  • Attribute class: the members share common attributes
  • Model class: they can appear in the same locations (and often are structurally or semantically related)
  • Classes may contain other classes
  • Elements inherit the properties from any classes of which they are members

1.1. Attribute Classes

  • Attribute classes are given (usually adjectival) names beginning with att.; e.g. members of the att.naming class inherit a key attribute from the class rather than each defining it individually
  • If another element needs a key attribute, the easiest way to provide it is to add that element to the att.naming class
  • Classes can be grouped together into superclasses
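
As a minimal sketch of adding an element to an attribute class, an ODD customization might contain a fragment like the one below, placed inside a <schemaSpec>. The choice of <term> is purely illustrative and assumes that it is not already a member of att.naming in the version of the TEI you are using.

```xml
<!-- Illustrative only: make <term> a member of att.naming,
     so that it inherits the key attribute from the class -->
<elementSpec ident="term" module="core" mode="change">
 <classes mode="change">
  <memberOf key="att.naming"/>
 </classes>
</elementSpec>
```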

1.2. att.global

All elements are members of att.global, which provides, among others:
  • @xml:id: a unique identifier
  • @xml:lang: the language of the element content
  • @n: a number or name for an element
  • @rend: how the element in question was rendered or presented in the source text
att.global also contains att.global.linking, so if the linking module is loaded it provides the attributes:
  • @corresp: points to elements that correspond to the current element in some way
  • @copyOf: points to an element of which the current element is a copy
  • @next: points to the next element of a virtual aggregate of which the current element is part
  • @prev: points to the previous element of a virtual aggregate of which the current element is part
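
A short document fragment may help to show these attributes in use. The identifiers and values below are invented for illustration, and the next and prev attributes are only available if the linking module is part of the schema.

```xml
<!-- Illustrative fragment: all identifiers and values are invented -->
<lg xml:id="stanza1" xml:lang="en" n="1" rend="indent">
 <!-- next and prev join two elements into one virtual aggregate -->
 <l xml:id="l1" next="#l2">the first part of an interrupted line,</l>
 <l xml:id="l2" prev="#l1">and its continuation</l>
</lg>
```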

1.3. Model Classes

  • Model classes contain groups of elements allowed in the same place. For example, if you are adding an element that should be allowed wherever <bibl> is allowed, add it to the model.biblLike class
  • Model classes are usually named with a Like or Part suffix:
    • model.divLike: structural class grouping elements for divisions
    • model.divPart: structural class grouping elements used inside divisions
    • model.nameLike: semantic class grouping name elements
    • model.persNamePart: semantic sub-class grouping elements that are part of a personal name
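
As a sketch of the model.biblLike case, the fragment below declares a new element and adds it to that class, so that it becomes available wherever <bibl> is. The element name, its description and the namespace URI are hypothetical; only the class names come from the TEI.

```xml
<!-- Hypothetical element in a project namespace, allowed wherever <bibl> is -->
<elementSpec ident="catalogueEntry" ns="http://example.org/ns/project" mode="add"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>(hypothetical) a reference to an entry in a project catalogue</desc>
 <classes>
  <memberOf key="att.global"/>
  <memberOf key="model.biblLike"/>
 </classes>
 <content>
  <rng:text/>
 </content>
</elementSpec>
```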

1.4. Macros

Macros are short-hand names for common patterns:
  • macro.paraContent: content of paragraphs and similar elements
  • macro.limitedContent: content of prose elements that are not used for transcription of extant materials
  • macro.phraseSeq: a sequence of character data and phrase-level elements
  • macro.phraseSeq.limited: a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents
  • macro.specialPara: the content model of elements which either contain a series of component-level elements or else contain a series of phrase-level and inter-level elements
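
A macro is typically used by referring to it from an element's content model. The fragment below is a minimal sketch of such a reference; it assumes the macro name macro.phraseSeq and would sit inside an <elementSpec>.

```xml
<content xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <!-- character data mixed with phrase-level elements -->
 <rng:ref name="macro.phraseSeq"/>
</content>
```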

1.5. Datatype Macros

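A special set of macros which provide common datatypes, mostly used for attributes:
  • data.code: a coded value
  • data.word: a single word or token
  • data.name: an XML Name
  • data.enumerated: a single XML name taken from a documented list
  • data.duration.w3c: a W3C duration
  • data.temporal.w3c: a W3C date
  • data.truthValue: a truth value (true/false)
  • data.language: a language
  • data.sex: human or animal sex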
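
The following sketch shows how an attribute definition might refer to one of these datatype macros. The attribute name and its description are hypothetical; the fragment assumes the datatype macro is called data.truthValue and would sit inside an <attList>.

```xml
<attDef ident="verified" usage="opt" mode="add"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>(hypothetical) indicates whether the reading has been checked
  against the source</desc>
 <datatype>
  <!-- the value must be a truth value (true/false) -->
  <rng:ref name="data.truthValue"/>
 </datatype>
</attDef>
```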

1.6. Basic Model Class Structure

The TEI class system makes a threefold division of elements:
  • divisions: high level major divisions of texts
  • chunks: elements such as paragraphs appearing within texts or divisions, but not within other chunks
  • phrase-level elements: elements such as highlighted phrases which can occur only within chunks
The TEI identifies the following groupings derived from these three:
  • inter-level elements: elements such as lists which can appear either in or between chunks
  • components: elements which can appear directly within texts or text divisions

1.7. Classes for divisions

The TEI architecture defines five classes, all of which are populated by this module:
  • model.divTop groups elements appearing at the beginning of a text division.
  • model.divTopPart groups elements which can occur only at the beginning of a text division.
  • model.divBottom groups elements appearing at the end of a text division.
  • model.divBottomPart groups elements which can occur only at the end of a text division.
  • model.divWrapper groups elements which can appear at either top or bottom of a textual division.

1.8. model.divWrapper members

<argument> A formal list or prose description of the topics addressed by a subdivision of a text.
<byline> contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
<dateline> contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
<docAuthor> (document author) contains the name of the author of the document, as given on the title page (often but not always contained in a byline).
<docDate> (document date) contains the date of a document, as given (usually) on a title page.
<epigraph> contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page.

1.9. model.divTopPart members

<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.
<salute> (salutation) contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc.
<opener> groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter.

model.divTop = model.divTopPart + model.divWrapper

1.10. model.divBottomPart members

<closer> groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.
<signed> (signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text.
<trailer> contains a closing title or footer appearing at the end of a division of a text.
<postscript> contains a postscript, e.g. to a letter.

model.divBottom = model.divBottomPart + model.divWrapper
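
To make the division classes concrete, here is an invented letter encoded as a single division. It is only a sketch: the text is made up, and the comments indicate which class each element comes from according to the membership lists above.

```xml
<div type="letter">
 <!-- model.divTopPart -->
 <head>Letter to a colleague</head>
 <opener>
  <dateline>Oxford, 11 July 2008</dateline>
  <salute>Dear colleague,</salute>
 </opener>
 <!-- ordinary content of the division -->
 <p>Thank you for your note about the encoding project.</p>
 <!-- model.divBottomPart -->
 <closer>
  <salute>Yours sincerely,</salute>
  <signed>A. N. Editor</signed>
 </closer>
 <postscript>
  <p>The schema will follow next week.</p>
 </postscript>
</div>
```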

2. Defining a TEI Schema

  • A schema helps you know a document is valid in addition to being well-formed
  • A TEI schema is a combination of TEI modules, optionally including customizations of the elements/attributes/classes that they contain
  • This schema is defined in an application-independent manner with a TEI ODD (One Document Does it all) file which allows for:
    • creation of schemas such as DTD, RELAX NG or W3C Schema
    • internationalized documentation which reflects your customization of the TEI
    • documentation of how your schema differs from tei_all that is suitable for long-term preservation

2.1. Important ODD concepts

The TEI's literate programming with ODD (One Document Does it all) provides:
  • Schema specification
  • User oriented documentation
  • Modularity: all specifications pertaining to a coherent sub-domain of the TEI
  • Classes: identifying shared behaviours or semantics
  • Extensibility: a consequence of the above mechanisms

2.2. The TEI ODD in practice

The TEI Guidelines, its schema, and its schema fragments, are all produced from a single XML resource containing:
  1. Descriptive prose (lots of it)
  2. Examples of usage (plenty)
  3. Formal declarations for components of the TEI Abstract Model:
    • elements and attributes
    • modules
    • classes and macros

2.3. Possibilities of customizing the TEI

The TEI has over 20 modules. A working project will:
  • Choose the modules they need
  • Probably narrow the set of elements within each module
  • Probably add local datatype constraints
  • Possibly add new elements/attributes in other namespaces
  • Possibly localize the names of elements
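
As one possible illustration of localizing an element name, an ODD can supply an alternative identifier for an existing element. The German name below is an arbitrary choice for the example, and the fragment would sit inside a <schemaSpec>.

```xml
<!-- Illustrative only: documents would use <absatz> where the TEI has <p> -->
<elementSpec ident="p" module="core" mode="change">
 <altIdent>absatz</altIdent>
</elementSpec>
```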

2.4. Real life TEI customization

We aim to support a range of interactions with the TEI:
  • Easy TEI: simple access to the TEI through Roma
  • Subsetting the TEI: making the full TEI even easier to use
  • Enlarging the application profile: using modules
  • Modifying the TEI objects: first insights into extensibility
  • Behind the scenes (ODD): starting to use the actual specification language

3. Roma

The TEI recognises that you may not want to write TEI code in order to customize the TEI. It therefore provides Roma, a command-line script with a corresponding web front-end, to help you do this.

The people behind Roma are:
  • Arno Mittelbach: initial programming
  • Sebastian Rahtz: maintenance and frequent improvements
  • Ioan Bernevig: a 'Sanity Checker' addition

3.1. How to use the TEI

Imagine that you have seen your colleague next door doing some encoding with the TEI and want to do the same thing:
  • Go to Roma at http://tei.oucs.ox.ac.uk/Roma/
  • Toy with the user profile [Customize]
  • Generate a schema [Schema]
  • Make a trial with the editor, creating a simple document
  • Go back to Roma and make basic documentation

3.2. Roma: New

3.3. Roma: Customize

3.4. Roma: Schema

3.5. Roma: Documentation

3.6. Subsetting the TEI

Suppose you now want to use some more of the TEI, but not all of it:

  • Go to Roma…
  • Look at [Modules]
  • Explore default modules by pointing to main elements (by order of interest). You can throw away most things, but
    • In textstructure, you should really keep <TEI>, <text>, <body> and <div>
    • In core, most people need <p>, <q>, <list>, <pb/> and <head>
    • From header, keep everything unless you really understand the details
  • Start checking out elements
  • Make editorial choices (numbered vs. unnumbered divs)
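
Roma records choices like these in an ODD file. The following is a rough sketch of what a subset that opts for unnumbered divisions might look like, assuming the numbered division elements <div1> to <div7> are the ones to be removed; the schema identifier is invented.

```xml
<schemaSpec ident="myUnnumbered" start="TEI">
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <!-- Editorial choice: keep <div>, drop the numbered divisions -->
 <elementSpec ident="div1" mode="delete" module="textstructure"/>
 <elementSpec ident="div2" mode="delete" module="textstructure"/>
 <elementSpec ident="div3" mode="delete" module="textstructure"/>
 <elementSpec ident="div4" mode="delete" module="textstructure"/>
 <elementSpec ident="div5" mode="delete" module="textstructure"/>
 <elementSpec ident="div6" mode="delete" module="textstructure"/>
 <elementSpec ident="div7" mode="delete" module="textstructure"/>
</schemaSpec>
```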

3.7. Roma: Modules

3.8. Roma: Change Module

3.9. Roma: Change Attributes

3.10. Roma: Change Attribute Values

3.11. Roma: Change Language

3.12. Roma: Sanity Checker

4. Understanding ODD

A TEI ODD file can contain as much discursive prose as you want, but as a minimum it needs a <schemaSpec> element to define the schema it documents:

<schemaSpec ident="TEI-minimal" start="TEI">
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
</schemaSpec>
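
Since an ODD is itself a TEI document, the <schemaSpec> normally sits inside an ordinary TEI file alongside whatever prose you choose to write. The skeleton below is a sketch of such a file; the title and the prose are invented for illustration.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>A minimal TEI customization</title>
   </titleStmt>
   <publicationStmt>
    <p>Unpublished example</p>
   </publicationStmt>
   <sourceDesc>
    <p>Written from scratch for illustration</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <p>Discursive prose about the customization can go here.</p>
   <schemaSpec ident="TEI-minimal" start="TEI">
    <moduleRef key="tei"/>
    <moduleRef key="header"/>
    <moduleRef key="core"/>
    <moduleRef key="textstructure"/>
   </schemaSpec>
  </body>
 </text>
</TEI>
```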

4.1. Even more customisation

<schemaSpec ident="Chaucer-MoL" start="TEI">
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="namesdates"/>
 <moduleRef key="transcr"/>
<!-- We don't need these drama elements: -->
 <elementSpec ident="sp" mode="delete" module="core"/>
 <elementSpec ident="speaker" mode="delete" module="core"/>
 <elementSpec ident="stage" mode="delete" module="core"/>
</schemaSpec>

4.2. What is happening here?

TEI customizations are themselves expressed in TEI XML, using elements from the tagdocs module.

For example:
<schemaSpec ident="myTEIlite">
 <desc>This is TEI Lite with simplified heads</desc>
 <moduleRef key="tei"/>
 <moduleRef key="core"/>
 <moduleRef key="textstructure"/>
 <moduleRef key="header"/>
 <moduleRef key="linking"/>
 <elementSpec ident="head" mode="change">
  <!-- the changes to <head> go here -->
 </elementSpec>
</schemaSpec>

produces something like TEI Lite, with a slight change

4.3. ODD processors

  • The TEI maintains a library of XSLT scripts that can generate
    • The TEI Guidelines in canonical TEI XML format
    • The Guidelines in HTML or PDF
    • RELAXNG, DTD, or W3C schema fragments
  • The same library is used by the customization layer to generate
    • project-specific documentation
    • project-specific schemas
    • translations into other (human) languages
  • We use eXist as a database for extracting material from the P5 sources

4.4. The TEI abstract model

  • The TEI abstract model sees a markup scheme (a schema) as consisting of a number of discrete modules, which can be combined more or less as required.
  • A schema is made by combining references to modules and optional element over-rides or additions
  • Each element declares the module it belongs to: elements cannot appear in more than one module.
  • Each module extends the range of elements and attributes available by adding new members to existing classes of elements, or by defining new classes.

4.5. Expression of TEI content models

Within the class system, TEI elements have to be defined using some language notation; choices include:
  1. using XML DTD language (as in older versions of the TEI)
  2. using W3C Schema language
  3. using the RELAXNG schema language
  4. inventing an entirely new abstract language for later transformation to specific schema language
We chose a combination of 3 and 4 — using our abstract language, but switching to RELAXNG for content modelling.
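
In practice this means that a TEI-specific <content> element wraps ordinary RELAX NG patterns. The fragment below is a minimal sketch of a content model written this way, for a hypothetical element that contains an optional heading followed by one or more paragraphs.

```xml
<content xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <!-- an optional <head> ... -->
 <rng:optional>
  <rng:ref name="head"/>
 </rng:optional>
 <!-- ... followed by at least one <p> -->
 <rng:oneOrMore>
  <rng:ref name="p"/>
 </rng:oneOrMore>
</content>
```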

4.6. Why that combination?

  • Expressing constraints in XML language is too attractive to forego
  • There is a clamour for better datatyping than DTDs have
  • The schema languages are so good, it is silly to reinvent them
  • But we like our class system and literate programming

4.7. DTD vs RELAXNG vs W3C Schema

  • DTDs are not XML, and need specialist software
  • W3C schema is not consistently implemented, its documentation is vast and confusing, and it looks over-complex
  • RELAXNG on the other hand…
    • uncluttered design
    • good documentation
    • multiple open source 100%-complete implementations
    • ISO standard
    • useful features for multipurpose structural validation
No contest…

4.8. An Example ODD

<elementSpec module="spoken" ident="pause">
  <memberOf key="model.divPart.spoken"/>
  <memberOf key="att.timed"/>
  <memberOf key="att.typed"/>
  <attDef ident="who" usage="opt">
   <gloss>A unique identifier</gloss>
   <desc>supplies the identifier of the person or group pausing.
       Its value is the identifier of a <gi>person</gi> or <gi>persGrp</gi>
       element in the TEI header.</desc>
    <rng:ref name="data.pointer"/>
  </attDef>
</elementSpec>

4.9. From which we generate: RNC

element pause { pause.content, pause.attributes }
pause.content = empty
pause.attributes =
model.divPart.spoken |= pause
att.timed |= pause
att.typed |= pause
att.ascribed |= pause

4.10. Or DTD

<!ELEMENT %n.pause; %om.RR; EMPTY>
<!ATTLIST %n.pause;
 %att.global.attributes;
 %att.timed.attributes;
 %att.typed.attributes;
 %att.ascribed.attributes;>
<!ENTITY % model.divPart.spoken "%x.model.divPart.spoken; %n.event; | %n.kinesic; | %n.pause; | %n.shift; | %n.u; | %n.vocal; | %n.writing;">

4.11. Or documentation

4.12. Overriding an attribute value-list in a TEI ODD

<elementSpec ident="list" module="core">
  <memberOf key="att.typed"/>
  <attDef ident="type" mode="replace">
   <valList type="closed">
    <valItem ident="ordered">
     <gloss>Items are ordered</gloss>
    </valItem>
    <valItem ident="bulleted">
     <gloss>Items are bulleted</gloss>
    </valItem>
    <valItem ident="gloss">
     <gloss>Part of a gloss list</gloss>
    </valItem>
   </valList>
  </attDef>
</elementSpec>
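
The effect of a closed value list is easiest to see in a document. In the invented fragment below, the type value is one of the three values listed in the customization; with a schema generated from it, any other value would be expected to fail validation.

```xml
<!-- type must be one of: ordered, bulleted, gloss -->
<list type="ordered">
 <item>Choose the modules</item>
 <item>Generate a schema</item>
 <item>Validate a trial document</item>
</list>
```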

4.13. Modifying TEI objects

Understanding classes is critical.
  • They group together elements with the same role in the TEI architecture
  • They group together elements with the same syntactic behaviour
  • Classes can provide attributes for groups of like-minded elements
  • The elements in the class will appear in the same content models
The class defines a group of elements belonging to the same family of concepts; elements declare themselves as belonging to a class.

4.14. Uniformity of description

  • modules, elements, attributes, value-lists are treated uniformly
  • each has an identifier, a gloss, a description, and one or more equivalents
  • each can be added, changed, replaced, deleted within a given context
  • for example, membership in the att.typed class gives you a generic type attribute, which can be over-ridden for specific class members
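
Because classes are specified in the same way as elements, the same mode values apply to them. The fragment below sketches a change to an attribute class that removes one of its attributes for every member; the choice of the n attribute is purely illustrative, and the fragment would sit inside a <schemaSpec>.

```xml
<!-- Illustrative only: remove the n attribute from all elements -->
<classSpec ident="att.global" type="atts" module="tei" mode="change">
 <attList>
  <attDef ident="n" mode="delete"/>
 </attList>
</classSpec>
```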

5. Phrase Level Documentation Elements

  • <code> (literal code from some formal language)
  • <ident> (an identifier for an object of some kind in a formal language)
  • <att> (the name of an attribute appearing within running text)
  • <val> (a single attribute value)
  • <gi> (the name, or generic identifier, of an element)
  • <tag> (text of a complete start- or end-tag, possibly including attribute specifications, but excluding the opening and closing markup delimiter characters)
  • <specList> (marks where a list of descriptions is to be inserted into the prose documentation)
  • <specDesc/> (a description of the specified element or class should be included at this point)
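
These elements are used in the discursive prose of an ODD. The sketch below shows how a short passage of documentation might combine them; the wording of the prose is invented.

```xml
<div>
 <p>The <gi>list</gi> element carries a <att>type</att> attribute. In this
  customization its value may be <val>ordered</val>, <val>bulleted</val>
  or <val>gloss</val>, written for example as
  <tag>list type="ordered"</tag>.</p>
 <!-- the processor inserts the description of <list> and its type attribute -->
 <specList>
  <specDesc key="list" atts="type"/>
 </specList>
</div>
```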

6. Specification Elements

  • <elementSpec> (documents the structure, content, and purpose of a single element type)
  • <classSpec> (reference information for an element class)
  • <macroSpec> (documents the function and implementation of a pattern)

7. Common Elements (1)

  • Description:
    • <remarks> (any commentary or discussion about the usage of an element, attribute, or class)
    • <listRef> (a list of significant references to places where this element is discussed)
  • Examples
    • <exemplum> (a single example demonstrating the use of an element)
    • <eg> (any kind of illustrative example)
    • <egXML> (a single well-formed XML example demonstrating the use of some XML element or attribute)
  • Classification
    • <classes> (the classes of which the element or class is a member)
    • <memberOf> (class membership of the parent element or class)

8. Common Elements (2)

  • Element Specifications
    • <content> (the text of a content model for the schema)
    • <attList> (documentation for all the attributes associated with this element, as a series of <attDef> elements)
  • Attributes
    • <attDef> (definition of a single attribute)
    • <datatype> (schema datatype for the attribute value)
    • <defaultVal> (default declared attribute value)
    • <valDesc> (description of any attribute value)
    • <valList> (a list of attribute value items)
    • <valItem> (a single attribute value item)
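
Putting the common elements together, a complete element specification might look roughly like the sketch below. The element, its attribute, the namespace URI and all descriptions are hypothetical; only the specification elements themselves and the class and datatype names are taken from the TEI.

```xml
<elementSpec ident="aside" ns="http://example.org/ns/project" mode="add"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <gloss>aside</gloss>
 <desc>(hypothetical) marks a short editorial aside within a paragraph</desc>
 <classes>
  <memberOf key="att.global"/>
  <memberOf key="model.phrase"/>
 </classes>
 <content>
  <rng:text/>
 </content>
 <attList>
  <attDef ident="status" usage="opt">
   <desc>(hypothetical) editorial status of the aside</desc>
   <datatype>
    <rng:ref name="data.enumerated"/>
   </datatype>
   <defaultVal>draft</defaultVal>
   <valList type="closed">
    <valItem ident="draft">
     <gloss>not yet reviewed</gloss>
    </valItem>
    <valItem ident="final">
     <gloss>reviewed</gloss>
    </valItem>
   </valList>
  </attDef>
 </attList>
 <exemplum>
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
   <p>Some text <aside status="final">with an aside</aside> continues.</p>
  </egXML>
 </exemplum>
 <remarks>
  <p>Invented for illustration; not part of the TEI.</p>
 </remarks>
</elementSpec>
```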

9. Defining a TEI Schema

  • A schema helps you know a document is valid in addition to being well-formed
  • A TEI schema is a combination of TEI modules, optionally including customizations of the elements/attributes/classes that they contain
  • This schema is defined in an application-independent manner with a TEI ODD (One Document Does it all) file which allows for:
    • creation of schemas such as DTD, RELAX NG or W3C Schema
    • internationalized documentation which reflects your customization of the TEI
    • documentation of how your schema differs from tei_all that is suitable for long-term preservation

10. A word of caution

  • The TEI is not a monolithic environment
  • Very few things are really mandatory …
  • …but the TEI is more than just a market place
  • Basic document structure must be preserved
The TEI is a powerful environment for working with elements and producing documentation, but do not abuse it.

Sebastian Rahtz. Date: 2008-07-11
Copyright University of Oxford