
1. The TEI Header

The TEI header was designed with two goals in mind:
  • needs of bibliographers and librarians trying to document ‘electronic books’
  • needs of text analysts trying to document ‘coding practices’ within digital resources
The result is that discussion of the header tends to be pulled in two directions...

1.1. The Librarian’s Header

  • Conforms to standard bibliographic model, using similar terminology
  • Organized as a single source of information for bibliographic description of a digital resource, with established mappings to other such records (e.g. MARC)
  • Emerging code of best practice in its use, endorsed by major digital collections
  • Pressure for greater and more exact constraints to improve precision of description: preference for structured data over loose prose

1.2. Everyman’s Header

  • Gives a polite nod to common bibliographic practice, but has a far wider scope
  • Supports a (potentially) huge range of very miscellaneous information, organized in fairly ad hoc ways
  • Many different codes of practice in different user communities
  • Unpredictable combinations of narrowly encoded documentation systems and loose prose descriptions

1.3. TEI Header Structure

The TEI header has four main components:
  • <fileDesc> (file description) contains a full bibliographic description of an electronic file.
  • <encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
  • <profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting (just about everything not covered in the other header elements).
  • <revisionDesc> (revision description) summarizes the revision history for a file.

Only <fileDesc> is required; the others are optional.
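
As a rough sketch, the four components sit inside <teiHeader> in the order listed above. The comments are placeholders only; a real <fileDesc> needs the content shown in the minimal header.

```xml
<teiHeader>
 <fileDesc>
<!-- required: bibliographic description of the electronic file -->
 </fileDesc>
 <encodingDesc>
<!-- optional: relationship between the text and its source(s) -->
 </encodingDesc>
 <profileDesc>
<!-- optional: languages, participants, setting, classification -->
 </profileDesc>
 <revisionDesc>
<!-- optional: revision history of the file -->
 </revisionDesc>
</teiHeader>
```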

1.4. Example Header: Minimal required header

<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>A title?</title>
  </titleStmt>
  <publicationStmt>
   <p>Who published?</p>
  </publicationStmt>
  <sourceDesc>
   <p>Where from?</p>
  </sourceDesc>
 </fileDesc>
</teiHeader>

1.5. The TEI supports two ‘levels’ or types of header

  • corpus level metadata sets default properties for everything in a corpus
  • text level metadata sets specific properties for one component text of a corpus

1.6. Corpus Header Example

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader type="corpus">
<!-- corpus-level metadata here -->
 </teiHeader>
 <TEI>
  <teiHeader type="text">
<!-- metadata specific to this text here -->
  </teiHeader>
  <text>
<!-- ... -->
  </text>
 </TEI>
 <TEI>
  <teiHeader type="text">
<!-- metadata specific to this text here -->
  </teiHeader>
  <text>
<!-- ... -->
  </text>
 </TEI>
</teiCorpus>

1.7. Types of content in the TEI header

  • free prose
    • prose description: series of paragraphs
    • phrase: character data, interspersed with phrase-level elements, but not paragraphs
  • grouping elements: specialised elements recording some structured information
  • declarations: Elements whose names end with the suffix Decl (e.g. subjectDecl, refsDecl) enclose information about specific encoding practices applied in the electronic text.
  • descriptions: Elements whose names end with the suffix Desc (e.g. <settingDesc>, <projectDesc>) contain a prose description, possibly, but not necessarily, organised under some specific headings by suggested sub-elements.

1.8. File Description

  • has some mandatory parts:
    • <titleStmt>: provides a title for the resource and any associated statements of responsibility
    • <sourceDesc>: documents the sources from which the encoded text derives (if any)
    • <publicationStmt>: documents how the encoded text is published or distributed
  • and some optional ones:
    • <editionStmt>: yes, electronic texts have editions too
    • <seriesStmt>: and they also fit into "series".
    • <extent>: how many floppy disks, gigabits, files?
    • <notesStmt>: notes of various types
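
A possible <fileDesc> using the optional parts alongside the mandatory ones might look like the sketch below. All values are invented for illustration, and the child elements are given in the order the TEI schema is understood to expect.

```xml
<fileDesc>
 <titleStmt>
  <title>A hypothetical title: an electronic edition</title>
 </titleStmt>
 <editionStmt>
  <edition n="2">Second, corrected edition</edition>
 </editionStmt>
 <extent>3 files, about 2 MB</extent>
 <publicationStmt>
  <p>Unpublished</p>
 </publicationStmt>
 <seriesStmt>
  <title>A hypothetical series of electronic texts</title>
 </seriesStmt>
 <notesStmt>
  <note>Prepared as a teaching example.</note>
 </notesStmt>
 <sourceDesc>
  <p>Born digital.</p>
 </sourceDesc>
</fileDesc>
```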

1.9. The File Description

  • <titleStmt>: contains a mandatory <title> which identifies the electronic file (not its source!)
  • optionally followed by additional titles, and by ‘statements of responsibility’, as appropriate, using <author>, <editor>, <sponsor>, <funder>, <principal> or the generic <respStmt>
  • <publicationStmt>: may contain
    • plain text (e.g. to say the text is unpublished)
    • one or more <publisher>, <distributor>, <authority>, each followed by <pubPlace>, <address>, <availability>, <idno>

1.10. A minimal header for Punch

<fileDesc>
 <titleStmt>
  <title>Punch, or the London Charivari: an electronic
     edition</title>
  <editor>Owen Seaman (1861-1936)</editor>
  <respStmt>
   <resp>TEI version</resp>
   <name>TEI@Oxford team</name>
  </respStmt>
 </titleStmt>
 <publicationStmt>
  <p>Unpublished</p>
 </publicationStmt>
 <sourceDesc>
  <p>Recoded from the Project Gutenberg versions</p>
 </sourceDesc>
</fileDesc>

1.11. Title- and Responsibility- statements...

There may be many of them:
<title>Artamene</title>
<title type="alt">Le Grand Cyrus</title>
<title type="sub">Digital Edition</title>
Amongst the guilty parties:
<author>Scudery, Madeleine de</author>
<principal>Geffin, Alexandre</principal>
<funder>Fonds National Suisse de la Recherche
Scientifique</funder>
<respStmt>
 <resp>Encoding check</resp>
 <name>Jean Untel</name>
</respStmt>

1.12. <publicationStmt> example

<publicationStmt>
 <publisher>TEI Consortium</publisher>
 <distributor>Oxford Text Archive</distributor>
 <idno type="ota">1256</idno>
 <availability>
  <p>Available under the terms of a Creative Commons
     Attribution and Share Alike licence.</p>
 </availability>
</publicationStmt>

1.13. <notesStmt> example

<notesStmt> can contain notes on almost any aspect:
<notesStmt>
 <note>Material prepared for the TEI@Oxford Summer
   School.</note>
</notesStmt>

1.14. The Source Description

All electronic works need to indicate their source, even if it is just to say that they are 'born digital'. There are a variety of ways to do this:
  • prose description
  • <bibl> : contains free text or any mixture of bibliographic elements such as <author>, <publisher> etc.
  • <biblStruct> contains effectively the same elements but constrained in various ways according to bibliographic standards
  • <biblFull> special-cases texts which were born TEI by replicating an embedded <fileDesc>
  • A <listBibl> may be used for lists of such descriptions
  • Specialised elements for spoken texts (<recordingStmt> etc.) and for manuscripts (<msDesc>), discussed later
  • Authority lists for e.g. people (<listPerson>) or places (<listPlace>) can be included.
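
For a source that was itself a TEI document, a <biblFull> might be sketched as follows. The title and wording are hypothetical; the point is only that the content mirrors the structure of a <fileDesc>.

```xml
<sourceDesc>
 <biblFull>
  <titleStmt>
   <title>A hypothetical earlier TEI edition</title>
  </titleStmt>
  <publicationStmt>
   <distributor>Oxford Text Archive</distributor>
  </publicationStmt>
 </biblFull>
</sourceDesc>
```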

1.15. <sourceDesc> examples

<sourceDesc>
 <p>Born digital.</p>
</sourceDesc>
<sourceDesc>
 <bibl>
  <title level="a">Enigma</title>, <title level="j">Punch:
     or the London Charivari</title>, <date when="1914-07-01">July 1, 1914</date>, 147, p. 6</bibl>
</sourceDesc>

1.16. <bibl> vs. <biblStruct> Example

<bibl>
 <title level="a">Enigma</title>, in <title level="j">Punch:
   or the London Charivari</title> (July 1, 1914), vol 147,
pp. 1-20
</bibl>
<biblStruct>
 <analytic>
  <title level="a">Enigma</title>
 </analytic>
 <monogr>
  <title level="j">Punch: or the London Charivari</title>
  <imprint>
   <pubPlace>London</pubPlace>
   <date when="1914-07-01">July 1, 1914</date>
   <biblScope type="vol">147</biblScope>
   <biblScope type="pp">1-20</biblScope>
  </imprint>
 </monogr>
</biblStruct>

1.17. Encoding Description

<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as
  • <projectDesc>: goals of the project
  • <samplingDecl>: sampling principles
  • <editorialDecl>: editorial principles, e.g. <correction>, <normalization>, <quotation>, <hyphenation>, <segmentation>, <interpretation>
  • <classDecl>: classification system/s used
  • <tagsDecl>: specifics about usage of particular elements
The <encodingDesc> can replace the user manual, or facilitate semi-automatic document management, given agreed codes of practice.
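
As an illustration of the more specific elements, an <editorialDecl> and a <tagsDecl> could be sketched like this. The prose and the count in occurs are invented; gi names the element being documented.

```xml
<encodingDesc>
 <editorialDecl>
  <correction>
   <p>Obvious printing errors have been corrected without
      comment.</p>
  </correction>
  <normalization>
   <p>Original spelling and punctuation have been retained.</p>
  </normalization>
 </editorialDecl>
 <tagsDecl>
  <namespace name="http://www.tei-c.org/ns/1.0">
   <tagUsage gi="hi" occurs="28">Used only for words printed in
      italics.</tagUsage>
  </namespace>
 </tagsDecl>
</encodingDesc>
```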

1.18. <encodingDesc> Example (1)

<encodingDesc>
 <projectDesc>
  <p>The Imaginary Punch Project aims to ....
  </p>
 </projectDesc>
 <samplingDecl>
  <p>All pages containing editorial text have
     been transcribed in full. Pages containing only
     advertisements or illustrations have been
     omitted.</p>
 </samplingDecl>
 <editorialDecl>
  <hyphenation>
   <p>Original spelling has been retained, except
       that words hyphenated across line breaks have been
       silently re-assembled. The hyphen has been retained
       only where there exist cases of the same word being
       hyphenated in mid-line position. </p>
  </hyphenation>
<!-- ... -->
 </editorialDecl>
<!-- ... -->
</encodingDesc>

1.19. <encodingDesc> Example (2)

<encodingDesc>
<!-- ... -->
 <classDecl>
  <taxonomy xml:id="size">
   <category xml:id="large">
    <catDesc>story occupies more
         than half a page</catDesc>
   </category>
   <category xml:id="medium">
    <catDesc>story occupies between
         quarter and a half page</catDesc>
   </category>
   <category xml:id="small">
    <catDesc>story occupies less
         than a quarter page</catDesc>
   </category>
  </taxonomy>
  <taxonomy xml:id="topic">
   <category xml:id="politics-domestic">
    <catDesc>Refers to
         domestic political events</catDesc>
   </category>
   <category xml:id="politics-foreign">
    <catDesc>Refers to
         foreign political events</catDesc>
   </category>
   <category xml:id="social-women">
    <catDesc>refers to role
         of women in society</catDesc>
   </category>
   <category xml:id="social-servants">
    <catDesc>refers to
         role of servants in society</catDesc>
   </category>
  </taxonomy>
 </classDecl>
<!-- ... -->
</encodingDesc>

1.20. Profile Description

A collection of descriptions, categorised only as ‘non-bibliographic’. Default members of the model.profileDescPart class include:
  • <creation>: information about the origination of the intellectual content of the text, e.g. time and place
  • <langUsage>: information about languages, registers, writing systems etc used in the text
  • <textDesc> and <textClass>: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers, respectively
  • <particDesc> and <settingDesc>: information about the ‘participants’, either real or depicted, in the text, and about the setting in which they interact
  • <handNotes>: information about the hands identified in a manuscript
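
A small <profileDesc> using prose for the participant and setting descriptions might look like this. It is a sketch only: the date is borrowed from the Punch example and the prose is invented.

```xml
<profileDesc>
 <creation>
  <date when="1914-07-01">1 July 1914</date>
 </creation>
 <particDesc>
  <p>An anonymous narrator addressing the general reader.</p>
 </particDesc>
 <settingDesc>
  <p>A weekly magazine published in London.</p>
 </settingDesc>
</profileDesc>
```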

1.21. Language and character set usage

The <langUsage> element is provided to document usage of languages in the text. Languages are identified by language tags built from ISO codes (BCP 47), optionally with a script subtag:
<langUsage>
 <language ident="en">English</language>
 <language ident="fr">French</language>
 <language ident="bg">Bulgarian in Cyrillic characters</language>
 <language ident="bg-Latn">Romanized Bulgarian</language>
</langUsage>

1.22. Classification Methods

<textClass> provides a classification (by domain, medium, topic...) for the whole of a text expressed in one or more of the following ways:
  • using <catRef>: direct reference to a locally defined (e.g. in the corpus header) category
  • using <classCode>: reference to some commonly agreed and externally defined category (e.g. UDC)
  • using <keywords>: assigns arbitrary descriptive terms taken from a bibliographic controlled vocabulary or a tag cloud

1.23. BNC Example

<profileDesc>
 <creation>
  <date when="1962"/>
 </creation>
 <textClass>
  <catRef
    target="#WRI #ALLTIM1 #ALLAVA2 #ALLTYP3 #WRIDOM5 #WRILEV2 #WRIMED1 #WRIPP5 #WRISAM3 #WRISTA2 #WRITAS0"/>

  <classCode scheme="DLEE">W nonAc: humanities
     arts</classCode>
  <keywords scheme="COPAC">
   <term>History, Modern - 19th century</term>
   <term>Capitalism - History - 19th century</term>
   <term>World, 1848-1875</term>
  </keywords>
 </textClass>
</profileDesc>

This categorization applies to the whole text. For more fine-grained classification, use the decls attribute on e.g. a <div> element.
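
One way this could work, reusing the categories from the Punch taxonomy example, is sketched below. The xml:id values on <textClass> are hypothetical, and the fragment assumes that one classification is marked as the default while a <div> selects another.

```xml
<!-- in the header -->
<profileDesc>
 <textClass xml:id="tc-default" default="true">
  <catRef target="#medium #politics-domestic"/>
 </textClass>
 <textClass xml:id="tc-servants">
  <catRef target="#small #social-servants"/>
 </textClass>
</profileDesc>
<!-- ... elsewhere, in the text -->
<div decls="#tc-servants">
<!-- a story classified differently from the default -->
</div>
```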

1.24. Revision Description

  • A list of <change> elements, each recording a date and the person responsible (for instance with the when and who attributes), indicating significant stages in the evolution of a document.
  • Most recent first.
  • Can be maintained manually, but better done by means of a CMS (change management system)
<revisionDesc>
 <change>
  <date>$LastChangedDate: 2010-06-28 09:14:36 +0100 (Mon, 28
     Jun 2010) $.</date>
  <name>$LastChangedBy: lou $</name>
  <note>$LastChangedRevision: 10346 $</note>
 </change>
</revisionDesc>
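
A manually maintained equivalent, using the when and who attributes and listing the most recent change first, might look like this. The dates, descriptions and the #LB pointer are invented for illustration.

```xml
<revisionDesc>
 <change when="2010-06-28" who="#LB">Added profile
    description</change>
 <change when="2010-06-01" who="#LB">First draft of
    header</change>
</revisionDesc>
```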

2. Manuscript Description

Why are manuscripts special?

  • Manuscripts are unique objects, often of great cultural or political value.
  • Books, by contrast, exist in multiple copies, and can be described adequately by well-established and formalised bibliographic conventions.
  • For manuscripts, there are several traditions, often descriptive or belle lettriste, and little consensus.

Similar concerns apply to other text-bearing objects.

2.1. Components of a manuscript description

Within the <msDesc> element comes a required <msIdentifier> element, which groups information identifying the manuscript, followed by an optional <head>, which can give brief, unstructured information on the manuscript's contents etc. These are then followed either by one or more paragraphs (<p>), or by one or more of the following specialised elements:
  • <msContents>: an itemised list of the intellectual content of the manuscript, with transcriptions of rubrics, incipits, explicits etc, as well as primary bibliographic references
  • <physDesc>: groups information concerning all physical aspects of the manuscript, its material, size, format, script, decoration, binding, marginalia etc.
  • <history>: provides information on the history of the manuscript, its origin, provenance and acquisition by its holding institution

2.2. Components of a manuscript description (cont.)

  • <additional>: groups other information about the manuscript, in particular, administrative information relating to its availability, custodial history, surrogates etc.
  • <msPart>: contains in essence a nested <msDesc>, in cases of composite manuscripts now regarded as constituting a single unit but made up of two or more parts which were originally physically distinct.
Within each of these elements a number of sub-elements are available; <msContents>, for example, will normally consist of one or more <msItem> elements, each in turn containing specific elements for <rubric>, <incipit>, <explicit> and <colophon>, as well as the standard TEI elements <author>, <title> and <bibl> for bibliographic references. As with <msDesc> itself, however, the contents of these first-level and second-level elements need not be this structured, since there is also the option of using paragraphs.
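
Putting these components together, a skeleton description might be sketched as follows. The repository, shelfmark and title are hypothetical, and the comments stand in for transcriptions that only the real manuscript could supply.

```xml
<msDesc>
 <msIdentifier>
  <settlement>Exampletown</settlement>
  <repository>Example Library</repository>
  <idno>MS 1</idno>
 </msIdentifier>
 <head>A hypothetical miscellany</head>
 <msContents>
  <msItem>
   <locus from="1r" to="24v">ff. 1r-24v</locus>
   <title>An example text</title>
   <rubric><!-- transcription of the rubric --></rubric>
   <incipit><!-- opening words --></incipit>
   <explicit><!-- closing words --></explicit>
  </msItem>
 </msContents>
 <physDesc>
  <p>Prose, or the specialised elements described later.</p>
 </physDesc>
 <history>
  <p>Prose, or origin, provenance and acquisition.</p>
 </history>
 <additional>
  <surrogates>
   <p>Details of any microfilm or digital images.</p>
  </surrogates>
 </additional>
</msDesc>
```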

2.3. Identification (1)

The <msIdentifier>

Traditional three part specification:
  • place (<country>, <region>, <settlement>)
  • repository (<institution>, <repository>)
  • identifier (<collection>, <idno>)
<msIdentifier>
 <country>Canada</country>
 <settlement>Ottawa</settlement>
 <repository>Library and Archives Canada</repository>
 <collection>E.W.B. Morrison</collection>
 <idno>MG 30 E 81 v. 16</idno>
</msIdentifier>

2.4. Identification (2)

Alternative or additional names can also be included:
<msIdentifier>
 <country>Danmark</country>
 <settlement>København</settlement>
 <repository>Det Arnamagnæanske Institut</repository>
 <idno>AM 45 fol.</idno>
 <msName xml:lang="la">Codex Frisianus</msName>
 <msName xml:lang="is">Fríssbók</msName>
</msIdentifier>

2.5. Intellectual Content

  • May simply use paragraphs of text…
  • … or a tree of <msItem> elements
  • … optionally preceded by a prose summary
We can describe the content in general terms:
<msContents>
 <p>An extraordinary charivari of heroic deeds and improving
   tales, including an early version of <title>Guy of
     Warwick</title> and several hymns.</p>
</msContents>
or we can provide detail about each distinct item:
<msContents>
 <summary>An extraordinary charivari of heroic deeds,
   improving tales, and hymns.</summary>
 <msItem>
<!-- details of Guy of Warwick here -->
 </msItem>
 <msItem>
<!-- other items here -->
 </msItem>
</msContents>

2.6. Physical Description

An artificial (but helpful) grouping of many distinct items.

You can simply supply paragraphs of prose, covering such topics as
  • <objectDesc>: the physical carrier
  • <handDesc>: what is carried on it
  • <musicNotation>, <decoDesc>, <additions>
  • <bindingDesc> and <sealDesc>
  • <accMat>: accompanying material
Or, group your discussion within the specific elements mentioned above.

Similarly, within the specific elements, you can supply paragraphs of prose, or further specific elements.

2.7. The carrier 1

The <objectDesc> can contain just paragraphs, or <supportDesc> and <layoutDesc>
<objectDesc form="codex">
 <supportDesc material="mixed">
  <p>Early modern <material>parchment</material> and
  <material>paper</material>.</p>
 </supportDesc>
 <layoutDesc>
  <layout columns="1" ruledLines="25 32"/>
 </layoutDesc>
</objectDesc>

2.8. The carrier 2

A more complex substructure with specific elements for <support>, <extent>, <foliation>, <collation>, <condition>.
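
A <supportDesc> using these elements might be sketched like this; the descriptions are invented for illustration.

```xml
<objectDesc form="codex">
 <supportDesc>
  <support>
   <p>Parchment throughout.</p>
  </support>
  <extent>212 leaves</extent>
  <foliation>
   <p>Foliated in pencil in a modern hand.</p>
  </foliation>
  <collation>
   <p>Mostly quires of eight leaves.</p>
  </collation>
  <condition>
   <p>Some water damage to the final leaves.</p>
  </condition>
 </supportDesc>
</objectDesc>
```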

Multiple layouts may also be specified:
<layoutDesc>
 <layout ruledLines="25 32" columns="1">
  <p>
   <locus from="1r" to="202v"/>
   <locus from="210r" to="212v"/> Between 25 and 32 ruled
     lines.</p>
 </layout>
 <layout ruledLines="34 50" columns="1">
  <p>
   <locus from="203r" to="209v"/>Between 34 and 50 ruled
     lines.</p>
 </layout>
</layoutDesc>

2.9. <handDesc> and <decoDesc>

  • <handNote> (note on hand) describes a particular style or hand distinguished within a manuscript.
  • <decoNote> contains a note describing either a decorative component of a manuscript or a fairly homogenous class of such components.
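
These notes sit inside <handDesc> and <decoDesc> respectively. The sketch below uses invented descriptions and hypothetical identifiers.

```xml
<handDesc hands="2">
 <handNote xml:id="hand1" scope="major">The main text, in a
    regular book hand.</handNote>
 <handNote xml:id="hand2" scope="minor">Later marginal
    additions, in a cursive hand.</handNote>
</handDesc>
<decoDesc>
 <decoNote type="initial">Red and blue initials at the start
    of each section.</decoNote>
</decoDesc>
```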

2.10. <additions>

The <additions> element can be used to list or describe any additions to the manuscript, such as marginalia, scribblings, doodles, etc., which are considered to be of interest or importance.

<additions>
 <p>The text of this manuscript is not interpolated with
   sentences from Royal decrees promulgated in 1294, 1305
   and 1314. In the margins, however, another somewhat later
   scribe has added the relevant paragraphs of these
   decrees, see pp. 8, 24, 44, 47 etc.</p>
 <p>As a humorous gesture the scribe in one opening of the
   manuscript, pp. 36 and 37, has prolonged the lower stems
   of one letter f and five letters þ and has them drizzle
   down the margin.</p>
</additions>

2.11. <accMat>

<accMat> (accompanying material) contains details of any significant additional material which may be closely associated with the manuscript being described, such as non-contemporaneous documents or fragments bound in with the manuscript at some earlier historical period.

<accMat> A copy of a tax form from 1947 is included in the
envelope with the letter. It is not catalogued separately.
</accMat>

2.12. <history>

  • <origin>: where it all began
  • <provenance>: everything in between
  • <acquisition>: how you acquired it

<origin> is a datable element and thus has the attributes notBefore, notAfter, when etc.

2.13. <history> Example

<history>
 <origin>
  <p>Written in <origPlace>England</origPlace> in the
  <origDate notAfter="1300" notBefore="1200">13th cent.
   </origDate>
  </p>
 </origin>
 <provenance>
  <p>On fol. 54v very faint is <q>Iste liber est fratris
       guillelmi de buria de <gap reason="illegible"/>
       Roberti ordinis fratrum Pred<ex>icatorum</ex>
   </q>,
     14th cent. (?): <q>hanauilla</q> is written at the foot
     of the page (15th cent.).</p>
 </provenance>
 <acquisition>
  <p>Bought from the Rev. <name type="person">W. D.
       Macray</name> on <date when="1863-03-17">March 17,
       1863</date>, for 1 pound 10s.</p>
 </acquisition>
</history>

2.14. And finally

An <msDesc> can contain <msPart>, essentially a nested <msDesc>, where originally distinct manuscripts or parts of manuscripts have been brought together to form a composite manuscript.

<msDesc>
 <msIdentifier>
  <settlement>Amiens</settlement>
  <repository>Bibliothèque Municipale</repository>
  <idno>MS 3</idno>
  <msName>Maurdramnus Bible</msName>
 </msIdentifier>
<!-- other elements here -->
 <msPart>
  <altIdentifier>
   <idno>MS 6</idno>
  </altIdentifier>
<!-- other information specific to this part here -->
 </msPart>
<!-- other msParts here -->
</msDesc>

3. Annotating names of people and places

3.1. Names

TEI provides several ways of marking up names and nominal expressions:
  • <rs> ("referring string") -- any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘my husband’...
  • <name> -- any lexical item recognized as a proper name e.g. ‘Budleigh Salterton’ , ‘Bouallebec’, ‘John Doe’ ...
  • <persName>, <placeName>, <orgName>: ‘syntactic sugar’ for <name type="person"> etc.
  • A rich set of proposals for the components of such nominal expressions, e.g. <surname>, <forename>, <geogName>, <geogFeat> etc.
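
A single sentence can combine several of these options. The sketch below reuses the examples from the list above; the sentence itself is invented.

```xml
<p>I met <rs type="person">the girl you mentioned</rs> and
 <persName>
  <forename>John</forename>
  <surname>Doe</surname>
 </persName> in <placeName>Budleigh Salterton</placeName>.</p>
```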

3.2. Entities

Before P5.1.0, the TEI was reluctant to stray into database territory, while recognising the need to distinguish clearly the encoding of references from the encoding of referenced entities. It now provides:

  • <person> corresponding with <persName>
  • <place> corresponding with <placeName>
  • <org> corresponding with <orgName>
  • and in addition <relation>, <event> ...
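
The distinction between a reference and the entity it refers to could be sketched as follows. The identifier TEIC and the description are hypothetical.

```xml
<p>... distributed by the <orgName ref="#TEIC">TEI
   Consortium</orgName> ...</p>
<!-- ... elsewhere -->
<org xml:id="TEIC">
 <orgName>TEI Consortium</orgName>
 <desc>Everything we want to say about this organization.</desc>
</org>
```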

3.3. Why?

  • to facilitate a more detailed and explicit encoding of source documents (historical materials for example) which are primarily of interest because they concern objects in the real world
  • to support the encoding of "data-centric" documents, such as authority files, biographical or geographical dictionaries and gazetteers etc.
  • to represent and model in a uniform way data which is only implicit in readings of many different documents

3.4. Reference theory

Reference is a fundamental semiotic concept
  • We can talk about the real world using natural languages because we know that some types of word are closely associated with real, specific, objects
  • Proper names and technical terms are canonical examples of this kind of word
  • ‘Martin Luther King’ refers to a single real world entity; ‘Lyon’ and ‘River Thames’ to others: a specific place, a specific river respectively
  • When we translate between natural languages, usually the proper names don't change, or are conventionally equivalent

3.5. How do we represent this association?

Every element which is a member of the att.naming class inherits two attributes from the att.canonical class:
  • key: provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
  • ref: provides an explicit means of locating a full definition for the entity being named by means of one or more URIs.
as well as the attributes
  • role: may be used to specify further information about the entity referenced by this name, for example the occupation of a person, or the status of a place.
  • nymRef: provides a means of locating the canonical form (nym) of the names associated with the object named by the element bearing it.

Arguably, key is redundant, since ref is defined as anyURI

3.6. Examples

<p>... <name ref="#jsbach" type="person">Johann Sebastian Bach
 </name> the German composer was born in 1685... </p>
<p>... <name ref="grove:jsbach" type="person">Johann Sebastian
   Bach </name>the German composer was born in 1685...
</p>
<p>... <name role="composer">Engelbert Humperdinck</name> was
born in 1854... </p>
<p>... <name role="singer">Engelbert Humperdinck</name> was born
in 1936... </p>

3.7. References take many forms

Even within a single language, in a single document, there may be many ways of referencing the same person:
...<persName>Clara
Schumann</persName>.... <persName>Clara</persName>
....
<persName>Frau Schumann</persName>
The key or ref can be used simply to combine all references to a specified person:
.... <persName ref="#CS">Clara Schumann</persName>.... <persName ref="#CS">Clara</persName> ....
<persName ref="#CS">Frau
Schumann</persName>
<!-- ... elsewhere -->
<person xml:id="CS">
 <persName xml:lang="de">
<!-- everything we want to say about this lady -->
 </persName>
</person>

3.8. References are also ambiguous

<s>Jean likes
<name ref="#NN123">Nancy</name>
</s>
Using a more precise element (<persName> or <placeName>) is one way of resolving the ambiguity; another is to follow the pointer:
<person xml:id="NN123">
 <persName>
  <forename>Nancy</forename>
  <surname>Ide</surname>
 </persName>
<!-- ... -->
</person>
or...
<place xml:id="NN123">
 <placeName notBefore="1400">Nancy</placeName>
 <placeName notAfter="0056">Nantium</placeName>
<!-- ... -->
</place>

3.9. Components of <persName> elements

<persName ref="#jsbach" xml:lang="de">
 <forename type="first">Johann</forename>
 <forename type="middle">Sebastian</forename>
 <surname>Bach</surname>
</persName>
<persName ref="#jsbach" xml:lang="fr">
 <forename type="composé">Jean-Sébastien</forename>
 <surname>Bach</surname>
</persName>
Not to mention... <roleName> (e.g. ‘Emperor’), <genName> (e.g. ‘the Elder’), <addName> (e.g. ‘Hammer of the Scots’), <nameLink>, a link between components (e.g. ‘van der’) ...

3.10. Components of place names

  • <placeName> (names can be made up of other names)
  • <geogName> a name associated with some geographical feature such as a mountain or river
  • <geogFeat> a term for some particular kind of geographical feature e.g. ‘Mount’, ‘Lake’
<placeName>
 <geogFeat>Mont</geogFeat>
 <geogName>Blanc</geogName>
</placeName>

3.11. Place names generally fall into a kind of hierarchy
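
One way to picture such a hierarchy is a place name whose components run from the most specific to the most general. The sketch below uses elements that appear elsewhere in these slides (<district>, <settlement>, <country>); the nesting shown is illustrative rather than prescribed.

```xml
<placeName>
 <placeName type="street">cours de Verdun</placeName>,
 <district type="arrondissement">Perrache</district>,
 <settlement type="city">Lyon</settlement>,
 <country>France</country>
</placeName>
```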

3.12. What can we say about named entities?

Potentially, quite a lot...
<person xml:id="VM1893">
 <persName xml:lang="ru">Владимир Владимирович
   Маяковский</persName>
 <persName xml:lang="fr">Wladimir Maïakowski</persName>
 <birth when="1893-07-19">7 July (OS) 1893, <placeName ref="#BGDT" xml:lang="en">Baghdati,
     Georgia</placeName>
 </birth>
 <death when="1930-04-14"/>
 <occupation>Poet and playwright, among the foremost
   representatives of early-20th century Russian
   Futurism.</occupation>
</person>

What elements should the TEI provide for such purposes?

3.13. Traits, States, and Events

As elsewhere in the TEI, we resolve this question by adding a layer of abstraction. We distinguish three classes of information:
  • <trait>: a more or less intrinsic property of the entity, which (usually) does not change over time (e.g. eye colour for a person, location for a place)
  • <state>: a property which applies to the entity for some specific extent of time (e.g. occupation for a person, population for a place)
  • <event>: an independent event in the real world which may lead to a change in state or trait (e.g. birth for a person, a war for a place)

The elements for these prototypes, together with more specific elements, make up the associated element classes. The more specific elements aim to meet the most common needs.

Additionally, all these elements are members of the ‘datable’ class (see below)

3.14. Traits

Some typical traits of a person
  • <faith>: faith, belief system, religion etc. of a person
  • <langKnowledge>: linguistic knowledge of a person
  • <nationality>: nationality (socio-politico status)
  • <sex>: sex
  • <socecStatus>: socio-economic status
Some typical traits of a place:
  • <climate>: describes the climate
  • <location>: describes where a place is (see later)
  • <population>: describes its population
  • <terrain>: describes its terrain

Some of these (e.g. sex) have normalised attributes, but mostly they contain free text descriptions.
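
A person described with some of these traits might be sketched as below. The person and all values are hypothetical; the generic <trait> shows how a property with no dedicated element (eye colour) could be recorded.

```xml
<person xml:id="JU1">
 <persName>Jean Untel</persName>
 <sex>male</sex>
 <nationality>Swiss</nationality>
 <faith>None recorded</faith>
 <langKnowledge tags="fr de">
  <langKnow tag="fr">Native speaker of French</langKnow>
  <langKnow tag="de">Reading knowledge of German</langKnow>
 </langKnowledge>
 <socecStatus>Salaried professional</socecStatus>
 <trait type="physical">
  <label>Eye colour</label>
  <desc>Blue</desc>
 </trait>
</person>
```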

3.15. States

Some typical states for a person
  • <occupation> an informal description of a person's trade, profession or occupation
  • <residence> a person's present or past places of residence
  • <affiliation> an informal description of a person's present or past affiliation with some organization
  • <education> a description of the educational experience of a person
  • <floruit> contains information about a person's period of activity
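
Because states hold for a limited time, they are usually given dates. In this sketch the person, places and dates are all invented.

```xml
<person xml:id="JU2">
 <persName>Jean Untel</persName>
 <occupation from="1995" to="2005">Encoder</occupation>
 <residence notAfter="1995">Lausanne</residence>
 <residence from="1995">Oxford</residence>
 <affiliation from="1995">A hypothetical text archive</affiliation>
 <education notBefore="1988" notAfter="1994">Studied
    literature.</education>
 <floruit from="1995" to="2010"/>
</person>
```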

3.16. A place is defined by its location

The <location> element can contain
  • a more or less well-structured description using the hierarchy of place name components mentioned earlier (a politico-geographical location)
  • a set of geographical co-ordinates
<place type="building">
 <placeName>Brasserie Georges</placeName>
 <location>
  <country key="FR"/>
  <settlement type="city">Lyon</settlement>
  <district type="arrondissement">Perrache</district>
  <placeName type="street">cours de Verdun</placeName>
 </location>
 <location>
  <geo>45.748 4.828</geo>
 </location>
</place>

3.17. A place can be fictional

<place type="imaginary">
 <placeName>Atlantis</placeName>
 <location>
  <offset>fifty leagues beyond</offset>
  <placeName>Pillars of
  <persName>Hercules</persName>
  </placeName>
 </location>
</place>

3.18. Places can self-nest

<place xml:id="LT">
 <country>Lithuania</country>
 <country xml:lang="lt">Lietuva</country>
 <place xml:id="LT-VN">
  <settlement>Vilnius</settlement>
 </place>
 <place xml:id="LT-KA">
  <settlement>Kaunas</settlement>
 </place>
</place>

3.19. Events

For persons, only two specific event elements are defined: <birth> and <death>. Anything else must be defined using the generic <event> element and its type attribute.

<person xml:id="rwagner">
 <persName>
  <forename>Richard</forename>
  <surname>Wagner</surname>
 </persName>
 <birth when="1813-05-22"/>
 <event type="marriage" when="1836-11-24">
  <desc>On 24 November 1836, Wagner and <persName ref="#MINPLAN">Minna Planer</persName> were married.
  </desc>
 </event>
 <event type="move" notBefore="1836-11-24">
  <desc>They moved to the town of
  <placeName>Riga</placeName>, at that time part of
  <bloc>the Russian empire</bloc>.</desc>
 </event>
</person>

3.20. W3C Date Formats

Any datable element can be associated with a more or less exact date or date range using any combination of the following attributes:
  • when: supplies the value of a date or time in a standard form
  • notBefore: specifies the earliest possible date for the event in standard form
  • notAfter: specifies the latest possible date for the event in standard form
  • from: indicates the starting point of the period in standard form
  • to: indicates the ending point of the period in standard form

The ‘standard form’ by default is that defined by W3C; a parallel set of attributes is provided for ISO format dates.

All dates are normalised to the Gregorian calendar.
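
The separate fragments below sketch how the attributes might be combined. The first two values are taken from earlier examples in these slides; the last two are invented.

```xml
<!-- an exact date -->
<birth when="1813-05-22"/>
<!-- an uncertain date, bounded on both sides -->
<origDate notBefore="1200" notAfter="1300">13th cent.</origDate>
<!-- a period with a known start and end (hypothetical) -->
<residence from="1836" to="1839">A hypothetical town</residence>
<!-- a year and month are enough when the day is unknown -->
<date notBefore="1914-07" notAfter="1914-08">summer 1914</date>
```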

3.21. Personal Relationships

The <relation> (relationship) element describes any kind of relationship or linkage amongst other entities

We distinguish ‘mutual’ relationships (e.g. sibling) from non-mutual or directed relationships (e.g. parent-of).

The following attributes are available:
  • name: supplies a name for the kind of relationship of which this is an instance
  • active: identifies the ‘active’ participants in a non-mutual relationship, or all the participants in a mutual one
  • mutual: supplies a list of participants amongst all of whom the relationship holds equally
  • passive: identifies the ‘passive’ participants in a non-mutual relationship

3.22. Example

<person xml:id="jsbach">
 <persName>Johann Sebastian Bach</persName>
</person>
<person xml:id="cdbach">
 <persName>Catharina Dorothea Bach</persName>
</person>
<person xml:id="ghbach">
 <persName>Gottfried Heinrich Bach</persName>
</person>
<!--….-->
<relationGrp type="children" subtype="first-marriage">
 <relation name="parent" active="#jsbach" passive="#cdbach"/>
<!--….-->
</relationGrp>
<relationGrp type="children" subtype="second-marriage">
 <relation name="parent" active="#jsbach" passive="#ghbach"/>
<!--….-->
</relationGrp>
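
A mutual relationship could be sketched with the mutual attribute instead. Since the two children above come from different marriages, the relationship is named half-sibling here; the type and name values are illustrative choices, not fixed vocabulary.

```xml
<relationGrp type="siblings">
 <relation name="half-sibling" mutual="#cdbach #ghbach"/>
</relationGrp>
```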

3.23. Nyms

The elements <listNym> and <nym> are used to document the canonical form of a name or name-component.
  • <nym>
    • can contain model.entryParts (e.g. <form>, <orth>, <etym>) and may also include a number of other <nym>s
    • in addition to global attributes and att.typed, it includes the attribute parts to point to constituent <nym>s
  • <listNym> a list of canonical names
  • nymRef has been added to the attribute class att.naming to refer to the canonical name

3.24. Example

<nym xml:id="J45">
 <form xml:lang="la">Iohannes</form>
 <nym xml:id="J450">
  <form xml:lang="en">John</form>
  <nym xml:id="J4501">
   <form>Johnny</form>
  </nym>
  <nym xml:id="J4502">
   <form>Jon</form>
  </nym>
 </nym>
 <nym xml:id="J455">
  <form xml:lang="ru">Ivan</form>
 </nym>
 <nym xml:id="J453">
  <form xml:lang="fr">Jean</form>
 </nym>
</nym>
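
A name in the text could then point at its canonical form with nymRef. The sentence and surname are invented; #J4501 is the identifier given to ‘Johnny’ above.

```xml
<p>... as
 <persName>
  <forename nymRef="#J4501">Johnny</forename>
  <surname>Doe</surname>
 </persName> used to say ...</p>
```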

4. Optional Exercise: Adding metadata

The second exercise instructs you on how to add a complete manuscript description to the transcription we completed in the first exercise. If you didn't finish the first exercise you can continue it now. If you have no interest in manuscript description, perhaps just give the exercise a quick read through. If we've run out of time because I've talked too much, then you can do this exercise in your own time if you wish.



James Cummings. Date: 2010-10
Copyright University of Oxford