IT Services, University of Oxford

1. Moving on

  • How the TEI is constructed
  • Making a TEI schema
  • Specifying your profile of the TEI
  • How to use the TEI Guidelines documentation

2. Some terminology

  • The TEI encoding scheme consists of a number of modules
  • Each module contains a number of element specifications
  • Each element specification contains:
    • a canonical name (<gi>) for the element, and optionally other names in other languages
    • a canonical description (also possibly translated) of its function
    • a declaration of the classes to which it belongs
    • a definition for each of its attributes
    • a definition of its content model
    • usage examples and notes
  • A TEI schema specification (<schemaSpec>) is made by selecting modules or elements and (optionally) modifying their contents
  • A TEI document containing a schema specification is called an ODD (One Document Does it all)
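
As a rough illustration of the last two points, an ODD is an ordinary TEI document whose body holds a <schemaSpec> alongside prose. The sketch below is a minimal skeleton only: the title, the prose, and the identifier example_odd are placeholders invented for illustration, not taken from any real project.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>Example customization (placeholder title)</title>
   </titleStmt>
   <publicationStmt>
    <p>Unpublished sketch</p>
   </publicationStmt>
   <sourceDesc>
    <p>Written from scratch</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <p>Discursive prose describing the customization goes here.</p>
   <!-- "example_odd" is a hypothetical identifier -->
   <schemaSpec ident="example_odd" start="TEI">
    <moduleRef key="tei"/>
    <moduleRef key="core"/>
    <moduleRef key="header"/>
    <moduleRef key="textstructure"/>
   </schemaSpec>
  </body>
 </text>
</TEI>
```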

3. What is a module?

  • A convenient way of grouping together a number of element declarations
  • These are usually on a related topic or specific application
  • Most chapters of P5 focus on elements drawn from a single module, which that chapter then defines
  • A TEI Schema is created by selecting modules and adding or removing elements from them as needed

4. Which modules exist?

Module name Chapter
analysis Simple Analytic Mechanisms
certainty Certainty and Responsibility
core Elements Available in All TEI Documents
corpus Language Corpora
dictionaries Dictionaries
drama Performance Texts
figures Tables, Formulae, and Graphics
gaiji Representation of Non-standard Characters and Glyphs
header The TEI Header
iso-fs Feature Structures
linking Linking, Segmentation, and Alignment
msdescription Manuscript Description
namesdates Names, Dates, People, and Places
nets Graphs, Networks, and Trees
spoken Transcriptions of Speech
tagdocs Documentation Elements
tei The TEI Infrastructure
textcrit Critical Apparatus
textstructure Default Text Structure
transcr Representation of Primary Sources
verse Verse

5. How do you choose?

  • Just choose everything (not really a good idea)
  • The TEI provides a small set of predefined combinations (TEI Lite, TEI Bare...)
  • Or you could roll your own (but then you need to know what you're choosing)
Roma
a command line script, with a web front end, designed to make this process much easier

http://www.tei-c.org/Roma/

6. Roma: New

7. Roma: Customize

8. Roma: Schema

9. Roma: Documentation

10. What did we just do?

We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:
<schemaSpec ident="tei_bare" start="TEI">
 <moduleRef key="core"/>
 <moduleRef key="tei"/>
 <moduleRef key="header"/>
 <moduleRef key="textstructure"/>
 <elementSpec ident="abbr" mode="delete" module="core"/>
 <elementSpec ident="add" mode="delete" module="core"/>
<!-- ... -->
 <elementSpec ident="trailer" mode="delete" module="textstructure"/>
 <elementSpec ident="title" mode="change" module="core">
  <attList>
   <attDef ident="level" mode="delete"/>
  </attList>
 </elementSpec>
<!-- ... -->
</schemaSpec>

We selected four modules, deleted many of their elements, and also deleted an attribute (level on <title>).

11. Roma provides an interface to the detail

  • The [Modules] tab shows the modules available
  • Selecting a module from it shows the elements within that module, and gives you the choice to
    • include all of them (and then remove some)
    • exclude all of them (and then put back the ones you want)
  • You can also change an element's attribute list, and the values those attributes permit

12. Roma: Modules

13. Roma: Change Module

14. What does the Punch Project need?

A simple selection of elements, but also
  • we want to allow only certain values for type on <div>
  • we want a new element to wrap the combination of a <cit> and a comment on it: we will call it a <citCom> (you might like to think of a better name)

Other constraints are possible — we might want to insist that a <div type="cartoon"> contains a graphic, for example.

15. The ODD advantage

We can express these constraints in our ODD, and then generate a formal schema to enforce them using whichever schema language we like

  • TEI schemas can be generated in
    • ISO RELAX NG language
    • W3C Schema Language
    • XML DTD language
  • ODD itself defines an element's content models using a subset of RELAX NG syntax
  • Datatypes are defined in terms of W3C datatypes
  • Some facilities (e.g. alternation, namespaces) cannot be expressed in DTDs — RELAX NG schema is recommended
  • Additional constraints can be expressed in Schematron

16. Roma: selecting attributes

17. Roma: constraining attribute values

18. What did we just do?

Our ODD now includes something like this:
<elementSpec ident="div" module="textstructure" mode="change">
 <attList>
  <attDef ident="type" mode="change" usage="req">
   <valList type="closed" mode="replace">
    <valItem ident="cartoon"/>
    <valItem ident="snippet"/>
    <valItem ident="verse"/>
<!-- ... -->
   </valList>
  </attDef>
 </attList>
</elementSpec>
Note that we can also add documentation to the ODD:
<valItem ident="cartoon">
 <gloss>contains a humorous picture, usually with
   dialogue underneath</gloss>
</valItem>

19. Defining a new element

When defining a new element, we need to consider
  • its name and description
  • what attributes it can carry
  • what it can contain
  • where it can appear in a document

The TEI class system helps us answer all these questions (except the first).

20. The TEI Class System

  • The TEI distinguishes over 500 elements
  • Having these organised into classes aids comprehension, modularity, and modification.
  • Attribute class: the members share common attributes
  • Model class: they can appear in the same locations (and are often semantically related)
  • Classes may contain other classes
  • An element can be a member of any number of classes, irrespective of the module it belongs to.

21. Attribute Classes

  • Attribute classes are given (usually adjectival) names beginning with att.; e.g. att.naming, att.typed
  • All members of att.naming inherit the attributes key and ref from it; all members of att.typed inherit type and subtype
  • If we want an element to carry the type attribute, therefore, we add the element to the att.typed class, rather than defining the attribute explicitly.
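
A minimal sketch of that last step might look as follows. It uses <p> purely as an illustration and assumes that <p> is not already a member of att.typed in the TEI release being customized; it also assumes that the release accepts mode="change" on <classes>, so that the element's existing memberships are kept.

```xml
<!-- Assumption: <p> is not already in att.typed in your TEI release -->
<elementSpec ident="p" module="core" mode="change">
 <classes mode="change">
  <!-- Joining the class brings in type and subtype; no attDef is needed -->
  <memberOf key="att.typed"/>
 </classes>
</elementSpec>
```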

22. A very important attribute class: att.global

All elements are members of att.global; this class provides, among others:
xml:id
a unique identifier
xml:lang
the language of the element content
n
a number or name for an element
rend
how the element in question was rendered or presented in the source text.

All new elements are members of this class by default.
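
For illustration only, a paragraph in a TEI document could carry all four of these attributes at once. The attribute values below are made up for the example.

```xml
<!-- All attribute values here are illustrative placeholders -->
<p xmlns="http://www.tei-c.org/ns/1.0"
   xml:id="para1" xml:lang="en" n="1" rend="italic">Text of the paragraph.</p>
```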

23. Model Classes

  • Model classes contain groups of elements which are allowed in the same place. For example, if you are adding an element that should be allowed wherever <bibl> is allowed, add it to the model.biblLike class
  • Model classes are usually named with a Like or Part suffix:
    • members of model.pLike are all things that ‘behave like’ paragraphs, and are permitted in the same places as paragraphs
    • members of model.pPart are all things which can appear within paragraphs. This class is subdivided into
      • model.pPart.edit elements for simple editorial intervention such as <corr>, <del> etc.
      • model.pPart.data ‘data-like’ elements such as <name>, <num>, <date> etc.
      • model.pPart.msdesc extra elements for manuscript description such as <seal> or <origPlace>
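
The <bibl> case mentioned above can be sketched as an element specification. The element name bookRef, its description, and its content model are hypothetical, chosen only to show where the class membership is declared.

```xml
<!-- "bookRef" is a hypothetical element invented for this sketch -->
<elementSpec ident="bookRef"
  ns="http://www.example.org/ns/nonTEI"
  mode="add"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>contains a project-specific bibliographic reference.</desc>
 <classes>
  <!-- Membership makes bookRef legal wherever <bibl> is allowed -->
  <memberOf key="model.biblLike"/>
 </classes>
 <content>
  <rng:ref name="macro.phraseSeq"/>
 </content>
</elementSpec>
```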

24. Basic Model Class Structure

Simplifying wildly, one may say that the TEI recognises three kinds of element:
divisions
high level major divisions of texts
chunks
elements such as paragraphs appearing within texts or divisions, but not other chunks
phrase-level elements
elements such as highlighted phrases which can occur only within chunks
There are ‘base model classes’ corresponding to each of these three, and also to the following two groupings:
inter-level elements
elements such as lists which can appear either in or between chunks
components
elements which can appear directly within texts or text divisions

And yes, there is a class model.global for elements that can appear anywhere — at any hierarchic level.

25. Defining our new element <citCom>

What other elements is it like?
It's like a paragraph or quotation. It's not a phrase level element, because it must contain more than just unstructured text.
What other elements can contain it?
It can only appear within a division, like a paragraph.
What can it contain?
It must contain a citation (i.e. a quote optionally associated with a bibliographic reference) or something like that, followed by at least one paragraph of commentary.
Conclusions:
  • we make it a member of model.divPart
  • we will have to define a special content model for it

26. Roma: Defining a new element

27. Defining a content model

  • A typical TEI element defines its content by referencing classes of element which it can contain, rather than using specific elements.
  • Content models are defined using the RELAX NG vocabulary
  • Here are some very common predefined content models:
    macro.paraContent
    content of paragraphs and similar elements
    macro.limitedContent
    content of prose elements that are not used for transcription of extant materials
    macro.phraseSeq
    a sequence of character data and phrase-level elements
    macro.phraseSeq.limited
    a sequence of character data and those phrase-level elements that are not typically used for transcribing extant documents
    macro.specialPara
    the content model of elements which either contain a series of component-level elements or else contain a series of phrase-level and inter-level elements
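
As a hedged sketch of how one of these macros is referenced, the specification below gives a hypothetical element (remark, invented for this example) the same content as a paragraph. It follows the RELAX NG reference style used elsewhere in these slides.

```xml
<!-- "remark" is a hypothetical element invented for this sketch -->
<elementSpec ident="remark"
  ns="http://www.example.org/ns/nonTEI"
  mode="add"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>contains a short editorial remark.</desc>
 <classes>
  <memberOf key="model.pLike"/>
 </classes>
 <content>
  <!-- Reuse the predefined paragraph content model by name -->
  <rng:ref name="macro.paraContent"/>
 </content>
</elementSpec>
```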

28. Roma: Defining a new element 2

29. What did we just do?

We added a new element specification to our ODD, like this:
<elementSpec
  ident="citCom"
  ns="http://www.example.org/ns/nonTEI"
  mode="add">

 <desc> contains a citation followed by some commentary on it.</desc>
 <classes>
  <memberOf key="model.divPart"/>
  <memberOf key="att.typed"/>
 </classes>
 <content>
  <rng:ref name="cit"/>
  <rng:oneOrMore>
   <rng:ref name="model.pLike"/>
  </rng:oneOrMore>
 </content>
</elementSpec>

Note that this new element is not in the TEI namespace. It belongs to the IPP project only!
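
In a document instance, the new element would therefore need its own namespace declaration. The fragment below is a sketch: the prefix ipp and the placeholder text are assumptions made for illustration, and the type value snippet is borrowed from the value list shown earlier.

```xml
<div type="snippet" xmlns="http://www.tei-c.org/ns/1.0">
 <!-- The "ipp" prefix is a hypothetical choice; only the namespace URI matters -->
 <ipp:citCom xmlns:ipp="http://www.example.org/ns/nonTEI">
  <!-- Unprefixed children remain in the TEI namespace -->
  <cit>
   <quote>Quoted text goes here.</quote>
   <bibl>Bibliographic reference goes here.</bibl>
  </cit>
  <p>Commentary on the citation goes here.</p>
 </ipp:citCom>
</div>
```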

30. Other kinds of constraints

  • You can also constrain the content of an element or the value of an attribute to be of a particular datatype (for example, to insist that the element <date> contains only a date)
  • This can be done by using one of a set of predefined macros to define the content. Examples include
    data.word
    a single word or token
    data.name
    an XML Name
    data.enumerated
    a single XML name taken from a documented list
    data.temporal.w3c
    a W3C date
    data.truthValue
    a truth value (true/false)
    data.language
    a human language
    data.sex
    human or animal sex
  • Or you can define a more complex constraint, e.g. using Schematron
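
A datatype macro of this kind might be applied to an attribute roughly as follows. The attribute name reviewed is hypothetical, and the fragment is meant to sit inside an <elementSpec>; it uses the RELAX NG reference style found elsewhere in these slides.

```xml
<attList xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <!-- "reviewed" is a hypothetical attribute name used only for illustration -->
 <attDef ident="reviewed" mode="add" usage="opt">
  <desc>gives the date on which the commentary was last reviewed.</desc>
  <datatype>
   <!-- Restrict the value to a W3C date -->
   <rng:ref name="data.temporal.w3c"/>
  </datatype>
 </attDef>
</attList>
```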

31. Schematron constraints

  • An element specification can also contain a <constraintSpec> element which contains rules about its content expressed as ISO Schematron constraints
<elementSpec ident="div" module="textstructure" mode="change" xmlns:s="http://purl.oclc.org/dsdl/schematron">
 <constraintSpec ident="cartoon" scheme="isoschematron">
  <constraint>
   <assert xmlns="http://purl.oclc.org/dsdl/schematron"
    test="not(@type='cartoon') or .//tei:graphic">
a cartoon must include a graphic
   </assert>
  </constraint>
 </constraintSpec>
</elementSpec>
However...
  • You can only add such rules by editing your ODD file: Roma doesn't know about them.
  • Not all schema languages can implement these constraints.


TEI@Oxford. Date: 2010-07
Copyright University of Oxford