IT Services, University of Oxford


1. The TEI header

The TEI header is the bibliographic record for the electronic file. It stores information about the file itself, how it was made, categorisations and analytical information about the text, and revision control information. It is one of the most important parts of any TEI file.

1.1. TEI Header Structure

The TEI header has four main components:
  • <fileDesc> (file description) contains a full bibliographic description of an electronic file.
  • <encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
  • <profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting (just about everything not covered in the other header elements).
  • <revisionDesc> (revision description) summarises the revision history for a file.

Only <fileDesc> is required; the others are optional.
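
As a rough sketch of how the four components sit together, the fragment below shows one possible header skeleton. All titles, dates and prose in it are placeholders invented for illustration; they do not come from a real file.

```xml
<teiHeader>
 <!-- fileDesc: the only required component -->
 <fileDesc>
  <titleStmt>
   <title>Placeholder title of the electronic file</title>
  </titleStmt>
  <publicationStmt>
   <p>Placeholder: unpublished working file</p>
  </publicationStmt>
  <sourceDesc>
   <p>Placeholder: born digital, no earlier source</p>
  </sourceDesc>
 </fileDesc>
 <!-- the three optional components -->
 <encodingDesc>
  <p>Placeholder: how the text was encoded.</p>
 </encodingDesc>
 <profileDesc>
  <langUsage>
   <language ident="en">English</language>
  </langUsage>
 </profileDesc>
 <revisionDesc>
  <change when="2008-07-01">Placeholder: file created</change>
 </revisionDesc>
</teiHeader>
```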

1.2. Example Header: Minimal required header

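<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>A title?</title>
  </titleStmt>
  <publicationStmt>
   <p>Who published?</p>
  </publicationStmt>
  <sourceDesc>
   <p>Where from?</p>
  </sourceDesc>
 </fileDesc>
</teiHeader>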

1.3. Types of content in the TEI header

  • free prose
    • prose description: series of paragraphs
    • phrase: character data, interspersed with phrase-level elements, but not paragraphs
  • grouping elements: specialised elements recording some structured information
  • declarations: elements whose names end with the suffix Decl (e.g. <tagsDecl>, <refsDecl>) enclose information about specific encoding practices applied in the electronic text.
  • descriptions: Elements whose names end with the suffix Desc (e.g. <settingDesc>, <projectDesc>) contain a prose description, possibly, but not necessarily, organised under some specific headings by suggested sub-elements.

1.4. File Description

  • has some mandatory parts:
    • <titleStmt>: provides a title for the resource and any associated statements of responsibility
    • <sourceDesc>: documents the sources from which the encoded text derives (if any)
    • <publicationStmt>: documents how the encoded text is published or distributed
  • and some optional ones:
    • <editionStmt>: yes, electronic texts have editions too
    • <seriesStmt>: and they also fit into "series".
    • <extent>: how many floppy disks, gigabytes, files?
    • <notesStmt>: notes of various types

NB A "file" may actually correspond with several operating system files.

1.5. The File Description

  • <titleStmt>: contains a mandatory <title>, which identifies the electronic file (not its source!)
  • optionally followed by additional titles, and by ‘statements of responsibility’, as appropriate, using <author>, <editor>, <sponsor>, <funder>, <principal> or the generic <respStmt>
  • <publicationStmt>: may contain
    • plain text (e.g. to say the text is unpublished)
    • one or more <publisher>, <distributor>, <authority>, each followed by <pubPlace>, <address>, <availability>, <idno>
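
To make the options above concrete, here is a minimal sketch of a <fileDesc> that uses a statement of responsibility and a structured <publicationStmt>. The title, names, identifier and availability wording are all hypothetical, chosen only to show where each element goes.

```xml
<fileDesc>
 <titleStmt>
  <!-- the title identifies the electronic file, not its source -->
  <title>Enigma: a hypothetical electronic edition</title>
  <respStmt>
   <resp>encoded by</resp>
   <name>A. N. Encoder</name>
  </respStmt>
 </titleStmt>
 <publicationStmt>
  <publisher>Example Text Archive</publisher>
  <pubPlace>Oxford</pubPlace>
  <idno type="local">EX-0001</idno>
  <availability status="restricted">
   <p>Placeholder: available for non-commercial use only.</p>
  </availability>
 </publicationStmt>
 <sourceDesc>
  <p>Placeholder: source described elsewhere.</p>
 </sourceDesc>
</fileDesc>
```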

1.6. The Source Description

Many electronic texts were not 'born digital': their sources need to be specified in traditional bibliographic style, using one of:
  • <bibl>, <biblStruct>
  • (for texts which were born digital): <biblFull> may contain a nested <fileDesc>
  • <listBibl> a list of the foregoing
  • prose description
  • more specialised elements are available for spoken texts (<recordingStmt> etc.) and for manuscripts (<msDescription>)

1.7. For Example

  <bibl><title level="a">Enigma</title>, <title level="j">Punch: or the London Charivari</title>, <date when="1914-07-01">July 1, 1914</date>, 147, p. 6</bibl>

1.8. Association between header and text

By default, everything asserted by a header is true of the text to which it is prefixed. This can be overridden:
  • when a text header overrides or amplifies a corpus header setting
  • when declarable elements (members of the att.declarable class) are selected by means of the decls attribute (available on all members of the att.declaring class)
  • using special purpose selection/definition elements e.g. <catRef> and <taxonomy> (see below)
Most components of the encoding description are declarable.
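
The selection mechanism can be sketched as follows, assuming a header that offers two alternative editorial declarations. The identifiers ED-source and ED-modern and the prose inside them are invented for illustration.

```xml
<!-- in the header: two alternative declarations, one marked as the default -->
<encodingDesc>
 <editorialDecl xml:id="ED-source" default="true">
  <p>Placeholder: spelling is left as in the source.</p>
 </editorialDecl>
 <editorialDecl xml:id="ED-modern">
  <p>Placeholder: spelling has been modernised.</p>
 </editorialDecl>
</encodingDesc>

<!-- in the text: this division selects the non-default declaration -->
<div decls="#ED-modern">
 <p>Placeholder paragraph with modernised spelling.</p>
</div>
```

Divisions without a decls attribute would fall back on the declaration marked default="true".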

1.9. Encoding Description

<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as
  • <projectDesc>: goals of the project
  • <samplingDecl>: sampling principles
  • <editorialDecl>: editorial principles, e.g. <correction>, <normalization>, <quotation>, <hyphenation>, <segmentation>, <interpretation>
  • <classDecl>: classification system/s used
  • <tagsDecl>: specifics about usage of particular elements
The <encodingDesc> can replace the user manual, or facilitate semi-automatic document management, given agreed codes of practice.
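
A small <encodingDesc> along these lines might look like the sketch below. The project goals and editorial practices described in it are hypothetical; only the element names come from the list above.

```xml
<encodingDesc>
 <projectDesc>
  <p>Placeholder: texts were encoded for a teaching collection.</p>
 </projectDesc>
 <samplingDecl>
  <p>Placeholder: only the first issue of each volume was transcribed.</p>
 </samplingDecl>
 <editorialDecl>
  <correction>
   <p>Placeholder: obvious printing errors were corrected silently.</p>
  </correction>
  <normalization>
   <p>Placeholder: original spelling was retained.</p>
  </normalization>
  <hyphenation>
   <p>Placeholder: end-of-line hyphens were removed.</p>
  </hyphenation>
 </editorialDecl>
</encodingDesc>
```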

1.10. New <rendition> element

  • <rendition>: structured information about appearance in the source document
 <rendition xml:id="r-center" scheme="css">text-align: center;</rendition>
 <rendition xml:id="r-small" scheme="css">font-size: small;</rendition>
 <rendition xml:id="r-large" scheme="css">font-size: large;</rendition>
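
One way such declarations can be put to work is by pointing at them from the text with the global rendition attribute. The sketch below assumes a hypothetical r-italic declaration and an invented sentence; it is an illustration rather than an excerpt from a real file.

```xml
<!-- in the header -->
<tagsDecl>
 <rendition xml:id="r-italic" scheme="css">font-style: italic;</rendition>
</tagsDecl>

<!-- in the text: the highlighted word points back to the declaration -->
<p>The word <hi rendition="#r-italic">Charivari</hi> was italicised in the source.</p>
```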

1.11. New <appInfo> element

  • <appInfo>: structured information about an application which has edited this TEI file
 <appInfo>
  <application version="1.7" ident="ImageMarkupTool" notAfter="2008-06-01">
   <label>Image Markup Tool</label>
   <ptr target="#P1"/>
   <ptr target="#P2"/>
  </application>
 </appInfo>

1.12. Profile Description

An extensible rag-bag of descriptions, categorised only as ‘non-bibliographic’. Default members of the model.profileDescPart class include:
  • <creation>: information about the origination of the intellectual content of the text, e.g. time and place
  • <langUsage>: information about languages, registers, writing systems etc used in the text
  • <textDesc> and <textClass>: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers, respectively
  • <particDesc> and <settingDesc>: information about the ‘participants’, either real or depicted, in the text, and about the settings in which they interact
  • <handList>: information about the hands identified in a manuscript
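
As a minimal sketch, a <profileDesc> combining three of these members could look like this. The date, place and setting are made up for the example.

```xml
<profileDesc>
 <creation>
  <date when="1914-07-01">1 July 1914</date>, London
 </creation>
 <langUsage>
  <language ident="en">English</language>
 </langUsage>
 <settingDesc>
  <p>Placeholder: the action takes place in an editorial office.</p>
 </settingDesc>
</profileDesc>
```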

1.13. Classification Methods

<textClass> provides a classification (by domain, medium, topic...) for the whole of a text expressed in one or more of the following ways:
  • direct reference to a locally defined category (using <catRef>)
  • reference to an externally defined category (using <classCode>)
  • documented by <keywords>

1.14. Example

<textClass>
 <catRef target="#X123"/>
 <classCode scheme="DD12">001.9</classCode>
 <keywords scheme="DD">
  <term>End of the World</term>
  <term>Day of Judgment</term>
 </keywords>
</textClass>

<classDecl>
 <taxonomy>
  <category xml:id="X1">
   <catDesc>Homiletic writing</catDesc>
   <category xml:id="X123">
    <catDesc>Day of Judgment</catDesc>
   </category>
  </category>
 </taxonomy>
</classDecl>

1.15. Detailed characterization of a text

<textDesc> provides a description of a text in terms of its ‘situational parameters’:

<textDesc n="novel">
 <channel mode="w">print; part issues</channel>
 <constitution type="single"/>
 <derivation type="original"/>
 <domain type="art"/>
 <factuality type="fiction"/>
 <interaction type="none"/>
 <preparedness type="prepared"/>
 <purpose type="entertain" degree="high"/>
 <purpose type="inform" degree="medium"/>
<!-- These subelements constitute the class model.textDescPart: redefine that to roll your own. -->
</textDesc>

1.16. Language and character set usage

The <langUsage> element is provided to document usage of languages in the text. Languages are identified by tags built from ISO codes (as defined in BCP 47):
<langUsage>
 <language ident="en">English</language>
 <language ident="bg-Cyrl">Bulgarian in Cyrillic characters</language>
 <language ident="bg-Latn">Romanized Bulgarian</language>
</langUsage>

1.17. Revision Description

A list of <change> elements, each with when and who attributes, recording significant stages in the evolution of a document. The most recent change comes first.

1.18. Example

 <change when="2006-08-09" who="#LB">handedits following [...]</change>
 <change when="2000-10-11" who="#OUCS">Final manual corrections
   for BNC-W</change>
 <change when="2000-10-18" who="#OUCS">Further manual corrections
   for BNC-W</change>
 <change when="2000-01-08" who="#OUCS">Manually changed
   catdescriptions etc. for BNC-W</change>
 <change when="1994-11-30" who="#OUCS">First release for [...]</change>
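
The who attribute is a pointer, so each value should resolve to an element carrying the matching xml:id. A sketch of one possible arrangement follows, assuming the person is named in the <titleStmt>; the identifier AE, the name and the change notes are hypothetical.

```xml
<!-- in the file description: the person is given an identifier -->
<titleStmt>
 <title>Placeholder title</title>
 <respStmt>
  <resp>corrections by</resp>
  <name xml:id="AE">A. N. Editor</name>
 </respStmt>
</titleStmt>

<!-- in the revision description: who points at that identifier -->
<revisionDesc>
 <change when="2008-07-02" who="#AE">Placeholder: corrected typos</change>
 <change when="2008-07-01" who="#AE">Placeholder: created file</change>
</revisionDesc>
```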

2. Bibliographies

The TEI offers several ways to encode bibliographic citations, from the highly flexible to the highly structured.

2.1. Bibliographic Citations

  • <bibl> (loosely structured bibliographic citation)
  • <biblStruct> (structured bibliographic citation)
  • <listBibl> (a list of bibliographic citations such as a bibliography)
  • The 'header' module also includes <biblFull> (fully-structured bibliographic citation based on the TEI fileDesc element)

2.2. Simple <bibl> Example

<p>In Punch there is a brief note which could be misconstrued as a slur upon Canadians.<note>
  <bibl>
   <title level="a">Men for the Antarctic</title>,
   <title level="j">Punch: or the London Charivari</title>,
   <biblScope>p. 6</biblScope>
  </bibl>
 </note> It should not be understood as such.</p>

2.3. Simple <biblStruct> Example

Enigma, Punch: or the London Charivari, July 1, 1914, 147, pp. 1-20

<biblStruct>
 <analytic>
  <title level="a">Enigma</title>
 </analytic>
 <monogr>
  <title level="j">Punch: or the London Charivari</title>
  <imprint>
   <date when="1914-07-01">July 1, 1914</date>
   <biblScope type="vol">147</biblScope>
   <biblScope type="pp">1-20</biblScope>
  </imprint>
 </monogr>
</biblStruct>

2.4. <biblFull> example

<biblFull>
 <titleStmt>
  <title>The Feminist Companion to Literature in English: women writers from the middle ages to the present</title>
  <author>Blain, Virginia</author>
  <author>Clements, Patricia</author>
  <author>Grundy, Isobel</author>
 </titleStmt>
 <editionStmt>
  <edition>UK edition</edition>
 </editionStmt>
 <extent>1231 pp</extent>
 <publicationStmt>
  <publisher>Yale University Press</publisher>
  <pubPlace>New Haven and London</pubPlace>
 </publicationStmt>
 <sourceDesc>
  <p>No source: this is an original work</p>
 </sourceDesc>
</biblFull>

2.5. Conclusion

Without proper metadata:
  • no one can find your text
  • no one knows how or why you made it
  • no one knows what it is or what they can do with it

Date: 2008-07
Copyright University of Oxford