
1. Digital data, digital text...

Digital texts are only metaphorically books

... but this metaphor is so pervasive it affects our capacity to profit from them.

2. What's that noise in the digital library?

  • A digital edition should represent the intentions and meaning of a text, not simply its appearance
  • Otherwise, there can be no analysis beyond the documentary level, no "conversation between books"

3. Names

TEI provides several ways of marking up names and nominal expressions:
  • <rs> ("referring string") -- any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘my husband’...
  • <name> -- any lexical item recognized as a proper name, e.g. ‘Budleigh Salterton’, ‘Bouallebec’, ‘John Doe’ ...
  • <persName>, <placeName>, <orgName>: ‘syntactic sugar’ for <name type="person"> etc.
  • A rich set of proposals for the components of such nominal expressions, e.g. <surname>, <forename>, <geogName>, <geogFeat> etc.
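
As a minimal sketch of how the first three options differ in practice (the sentence is invented for illustration, and #JD is a hypothetical pointer to a <person> defined elsewhere):

```xml
<!-- <rs> marks a phrase that refers to someone without naming them;
     <name> and <persName> mark actual proper names -->
<p><rs type="person" ref="#JD">My husband</rs> moved to
 <name type="place">Budleigh Salterton</name> last year;
 everyone there calls him <persName ref="#JD">John Doe</persName>.</p>

<!-- the more specific element says the same as <name type="place"> -->
<p>... <placeName>Budleigh Salterton</placeName> ...</p>
```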

4. Entities

Before P5.1.0, the TEI was reluctant to stray into database territory, while recognising the need to distinguish clearly the encoding of references from the encoding of referenced entities. It now provides:

  • <person> corresponding with <persName>
  • <place> corresponding with <placeName>
  • <org> corresponding with <orgName>
  • and in addition <relation>, <event> ...

5. Why?

  • to facilitate a more detailed and explicit encoding of source documents (historical materials, for example) which are primarily of interest because they concern objects in the real world
  • to support the encoding of "data-centric" documents, such as authority files, biographical or geographical dictionaries, gazetteers etc.
  • to represent and model in a uniform way data which is only implicit in readings of many different documents

6. Reference theory

Reference is a fundamental semiotic concept
  • We can talk about the real world using natural languages because we know that some types of word are closely associated with real, specific, objects
  • Proper names and technical terms are canonical examples of this kind of word
  • ‘Martin Luther King’ refers to a single real world entity; ‘Lyon’ and ‘River Thames’ to others: a specific place, a specific river respectively
  • When we translate between natural languages, usually the proper names don't change, or are conventionally equivalent

7. How do we represent this association?

Every element which is a member of the att.naming class inherits two attributes from the att.canonical class:
key
provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
ref
provides an explicit means of locating a full definition for the entity being named by means of one or more URIs.
as well as the attributes
role
may be used to specify further information about the entity referenced by this name, for example the occupation of a person, or the status of a place.
nymRef
provides a means of locating the canonical form (nym) of the names associated with the object named by the element bearing it.

Arguably, key is redundant, since ref is defined as anyURI
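
A minimal sketch contrasting the two attributes; the code P0042 and the example.org address are hypothetical and stand in for whatever a project actually uses:

```xml
<!-- key: a coded value whose meaning is defined outside the document,
     e.g. a record number in a local database (P0042 is hypothetical) -->
<persName key="P0042">Clara Schumann</persName>

<!-- ref: a URI; here a pointer to a <person> element with xml:id="CS"
     in the same document -->
<persName ref="#CS">Clara Schumann</persName>

<!-- ref may also point outside the document (hypothetical address) -->
<persName ref="http://example.org/persons.xml#CS">Clara Schumann</persName>
```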

8. Examples

<p>... <name ref="#jsbach" type="person">Johann Sebastian
   Bach</name> the German composer was born in 1685... </p>
<p>... <name ref="grove:jsbach" type="person">Johann Sebastian
   Bach</name> the German composer was born in 1685... </p>
<p>... <name role="composer">Engelbert Humperdinck</name>
was born in 1854... </p>
<p>... <name role="singer">Engelbert Humperdinck</name>
was born in 1936... </p>

9. References take many forms

Even within a single language, in a single document, there may be many ways of referencing the same person:
...<persName>Clara Schumann</persName>.... <persName>Clara</persName> ....
<persName>Frau Schumann</persName>
The key or ref attribute can be used simply to link together all references to a specified person:
...<persName ref="#CS">Clara Schumann</persName>....
<persName ref="#CS">Clara</persName> ....
<persName ref="#CS">Frau Schumann</persName>
<!-- ... elsewhere -->
<person xml:id="CS">
 <persName xml:lang="de">
<!-- everything we want to say about this lady -->
 </persName>
</person>

10. References are also ambiguous

<s>Jean likes <name ref="#NN123">Nancy</name>
</s>
Using a more precise element (<persName> or <placeName>) is one way of resolving the ambiguity; another is to follow the pointer:
<person xml:id="NN123">
 <persName>
  <forename>Nancy</forename>
  <surname>Ide</surname>
 </persName>
<!-- ... -->
</person>
or...
<place xml:id="NN123">
 <placeName notBefore="1400">Nancy</placeName>
 <placeName notAfter="0056">Nantium</placeName>
<!-- ... -->
</place>

11. Components of <persName> elements

<persName ref="#jsbach" xml:lang="de">
 <forename type="first">Johann</forename>
 <forename type="middle">Sebastian</forename>
 <surname>Bach</surname>
</persName>
<persName ref="#jsbach" xml:lang="fr">
 <forename type="composé">Jean-Sébastien</forename>
 <surname>Bach</surname>
</persName>
Not to mention... <roleName> (e.g. ‘Emperor’), <genName> (e.g. ‘the Elder’), <addName> (e.g. ‘Hammer of the Scots’), <nameLink> for a link between components (e.g. ‘van der’) ...
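
These additional components might be combined as follows. This is only a sketch: the division into components and the type value are one possible analysis, not the only one.

```xml
<!-- generational component -->
<persName>
 <forename>Pliny</forename>
 <genName>the Elder</genName>
</persName>

<!-- generational number plus an additional name (epithet) -->
<persName>
 <forename>Edward</forename>
 <genName>I</genName>,
 <addName type="epithet">Hammer of the Scots</addName>
</persName>

<!-- a linking particle between forename and surname -->
<persName>
 <forename>Ludwig</forename>
 <nameLink>van</nameLink>
 <surname>Beethoven</surname>
</persName>
```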

12. Components of place names

  • <placeName> (names can be made up of other names)
  • <geogName> a name associated with some geographical feature such as a mountain or river
  • <geogFeat> a term for some particular kind of geographical feature e.g. ‘Mount’, ‘Lake’
<placeName>
 <geogFeat>Mont</geogFeat>
 <geogName>Blanc</geogName>
</placeName>

13. Place names generally fall into a kind of hierarchy
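
As a minimal sketch, the levels of such a hierarchy can be tagged from the narrowest to the broadest unit using <district>, <settlement>, <region>, <country> and <bloc>. The example reuses the Lyon address that appears in the <location> example in these slides; the type values are illustrative only.

```xml
<placeName>
 <district type="arrondissement">Perrache</district>,
 <settlement type="city">Lyon</settlement>,
 <country key="FR">France</country>,
 <bloc type="union">European Union</bloc>
</placeName>
```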

14. What can we say about named entities?

Potentially, quite a lot...
<person xml:id="VM1893">
 <persName xml:lang="ru">Владимир Владимирович Маяковский</persName>
 <persName xml:lang="fr">Wladimir Maïakowski</persName>
 <birth when="1893-07-19">7 July (OS) 1893, <placeName ref="#BGDT" xml:lang="en">Baghdati, Georgia</placeName>
 </birth>
 <death when="1930-04-14"/>
 <occupation>Poet and playwright, among the foremost representatives of early-20th century Russian Futurism.</occupation>
</person>

What elements should the TEI provide for such purposes?

15. Traits, States, and Events

As elsewhere in the TEI, we resolve this question by adding a layer of abstraction. We distinguish three classes of information:
  • <trait>: a more or less intrinsic property of the entity, which (usually) does not change over time (e.g. eye colour for a person, location for a place)
  • <state>: a property which applies to the entity for some specific extent of time (e.g. occupation for a person, population for a place)
  • <event>: an independent event in the real world which may lead to a change in state or trait (e.g. birth for a person, a war for a place)

We define elements for these three prototypes; together with more specific elements, they make up the associated element classes. The more specific elements aim to meet the most common needs.

Additionally, all these elements are members of the ‘datable’ class (see below).

16. Traits

Some typical traits of a person:
  • <faith>: faith, belief system, religion etc. of a person
  • <langKnowledge>: linguistic knowledge of a person
  • <nationality>: nationality (socio-political status)
  • <sex>: sex
  • <socecStatus>: socio-economic status
Some typical traits of a place:
  • <climate>: describes the climate
  • <location>: describes where a place is (see later)
  • <population>: describes its population
  • <terrain>: describes its terrain

Some of these (e.g. sex) have normalised attributes, but mostly they contain free text descriptions.
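
A minimal sketch of how the personal traits might look for an invented person. The name, the identifier P1 and all values are hypothetical, and the code used on the value attribute depends on the code list a project adopts (ISO 5218 is one common choice).

```xml
<person xml:id="P1">
 <persName>Jane Example</persName>
 <!-- value carries the normalised code; the content is free text -->
 <sex value="F">female</sex>
 <nationality key="GB">British</nationality>
 <faith>Quaker</faith>
 <!-- tags lists language codes; each <langKnow> describes one language -->
 <langKnowledge tags="en fr">
  <langKnow tag="en" level="H">English (mother tongue)</langKnow>
  <langKnow tag="fr" level="M">French (reading knowledge)</langKnow>
 </langKnowledge>
 <socecStatus>landed gentry</socecStatus>
</person>
```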

17. States

Some typical states for a person:
  • <occupation> an informal description of a person's trade, profession or occupation
  • <residence> a person's present or past places of residence
  • <affiliation> an informal description of a person's present or past affiliation with some organization
  • <education> a description of the educational experience of a person
  • <floruit> contains information about a person's period of activity
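
Because states hold only for a period, they are usually combined with the dating attributes. The following sketch uses an invented person; the name, the identifier P2 and all dates are hypothetical.

```xml
<person xml:id="P2">
 <persName>John Example</persName>
 <!-- period of known activity -->
 <floruit notBefore="1850" notAfter="1870"/>
 <education from="1838" to="1842">Apprenticed to a printer</education>
 <occupation from="1842" to="1860">Printer</occupation>
 <residence from="1842" to="1855">
  <settlement>Oxford</settlement>
 </residence>
 <affiliation notAfter="1860">Member of a local typographical society</affiliation>
</person>
```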

18. A place is defined by its location

The <location> element can contain
  • a more or less well-structured description using the hierarchy of place name components mentioned earlier (a politico-geographical location)
  • a set of geographical co-ordinates
<place type="building">
 <placeName>Brasserie Georges</placeName>
 <location>
  <country key="FR"/>
  <settlement type="city">Lyon</settlement>
  <district type="arrondissement">Perrache</district>
  <placeName type="street">cours de Verdun</placeName>
 </location>
 <location>
  <geo>45.748 4.828</geo>
 </location>
</place>

19. A place can be fictional

<place type="imaginary">
 <placeName>Atlantis</placeName>
 <location>
  <offset>fifty leagues beyond</offset>
  <placeName>Pillars of <persName>Hercules</persName>
  </placeName>
 </location>
</place>

20. Places can self-nest

<place xml:id="LT">
 <country>Lithuania</country>
 <country xml:lang="lt">Lietuva</country>
 <place xml:id="LT-VN">
  <settlement>Vilnius</settlement>
 </place>
 <place xml:id="LT-KA">
  <settlement>Kaunas</settlement>
 </place>
</place>

21. Events

For persons, only two specific event elements are defined: <birth> and <death>. Anything else must be defined using the generic <event> element and its type attribute.

<person xml:id="rwagner">
 <persName>
  <forename>Richard</forename>
  <surname>Wagner</surname>
 </persName>
 <birth when="1813-05-22"/>
 <event type="marriage" when="1836-11-24">
  <desc>On 24 November 1836, Wagner and
  <persName ref="#MINPLAN">Minna Planer</persName> were married. </desc>
 </event>
 <event type="move" notBefore="1836-11-24">
  <desc>They moved to the town of
  <placeName>Riga</placeName>, at that time part of
  <bloc>the Russian empire</bloc>.</desc>
 </event>
</person>

22. W3C Date Formats

Any datable element can be associated with a more or less exact date or date range using any combination of the following attributes:
when
supplies the value of a date or time in a standard form
notBefore
specifies the earliest possible date for the event in standard form
notAfter
specifies the latest possible date for the event in standard form
from
indicates the starting point of the period in standard form
to
indicates the ending point of the period in standard form

By default the ‘standard form’ is that defined by W3C (the XML Schema date and time formats); a parallel set of attributes (when-iso, notBefore-iso etc.) is provided for ISO 8601 format dates.

All dates are normalised to the Gregorian calendar.
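
A minimal sketch of the attributes in combination. Apart from the birth date, which is taken from the Wagner example, all values are invented for illustration.

```xml
<!-- an exact date -->
<birth when="1813-05-22"/>

<!-- a date known only to fall within a range -->
<death notBefore="1880" notAfter="1885"/>

<!-- a period with a known start and end; year-month precision is allowed -->
<occupation from="1842-04" to="1849-05">Printer</occupation>

<!-- ISO 8601 basic format is not a valid W3C date,
     so the parallel -iso attribute is used instead -->
<event type="concert" when-iso="18421020">
 <desc>A hypothetical concert.</desc>
</event>
```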

23. Personal Relationships

The <relation> (relationship) element describes any kind of relationship or linkage amongst other entities

We distinguish ‘mutual’ relationships (e.g. sibling) from non-mutual or directed relationships (e.g. parent-of).

The following attributes are available:
name
supplies a name for the kind of relationship of which this is an instance
active
identifies the ‘active’ participants in a non-mutual relationship, or all the participants in a mutual one
mutual
supplies a list of participants amongst all of whom the relationship holds equally
passive
identifies the ‘passive’ participants in a non-mutual relationship
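
A minimal sketch of the two kinds of relationship, using identifiers that appear elsewhere in these slides; the relationship names spouse and parent are illustrative, since name is free text.

```xml
<!-- mutual: the relationship holds equally between all participants -->
<relation name="spouse" mutual="#rwagner #MINPLAN"/>

<!-- directed: active points to the parent, passive to the children;
     several participants are given as a space-separated list -->
<relation name="parent" active="#jsbach" passive="#cdbach #ghbach"/>
```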

24. Example

<person xml:id="jsbach">
 <persName>Johann Sebastian Bach</persName>
</person>
<person xml:id="cdbach">
 <persName>Catharina Dorothea Bach</persName>
</person>
<person xml:id="ghbach">
 <persName>Gottfried Heinrich Bach</persName>
</person>
<!--….-->
<relationGrp type="children" subtype="first-marriage">
 <relation name="parent" active="#jsbach" passive="#cdbach"/>
<!--….-->
</relationGrp>
<relationGrp type="children" subtype="second-marriage">
 <relation name="parent" active="#jsbach" passive="#ghbach"/>
<!--….-->
</relationGrp>

25. Nyms

The elements <listNym> and <nym> are used to document the canonical form of a name or name-component.
  • <nym>
    • can contain model.entryParts (e.g. <form>, <orth>, <etym>) and may also include a number of other <nym>s
    • in addition to global attributes and att.typed, it includes the attribute parts to point to constituent <nym>s
  • <listNym> a list of canonical names
  • nymRef has been added to the attribute class att.naming to refer to the canonical name

26. Example

<nym xml:id="J45">
 <form xml:lang="la">Iohannes</form>
 <nym xml:id="J450">
  <form xml:lang="en">John</form>
  <nym xml:id="J4501">
   <form>Johnny</form>
  </nym>
  <nym xml:id="J4502">
   <form>Jon</form>
  </nym>
 </nym>
 <nym xml:id="J455">
  <form xml:lang="ru">Ivan</form>
 </nym>
 <nym xml:id="J453">
  <form xml:lang="fr">Jean</form>
 </nym>
</nym>
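
A name in a transcribed text can then point to its canonical form with nymRef. In this sketch the persons and surnames are invented; the pointers refer to the <nym> elements defined above.

```xml
<!-- "Johnny" is linked to its nym, and through nesting to John and Iohannes -->
<persName>
 <forename nymRef="#J4501">Johnny</forename>
 <surname>Doe</surname>
</persName>

<persName xml:lang="fr">
 <forename nymRef="#J453">Jean</forename>
 <surname>Dupont</surname>
</persName>
```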


TEI@Oxford. Date: 2010-07
Copyright University of Oxford