IT Services, University of Oxford

1. Transcription of primary sources

In most cases when creating an electronic edition, the primary source from which it originates will exhibit certain textual phenomena that need to be recorded. For example:
  • Text which is abbreviated and/or expanded
  • Editorial correction and conjecture
  • Scribal additions, deletions, substitutions, or restorations
  • Text which an editor supplies or intentionally omits
  • Text which is damaged or illegible

1.1. Transcribable features

Which features of a primary source might one want to include in a transcription?

  • variant letter forms
  • page layout
  • orthography
  • capitalisation
  • word division
  • punctuation
  • abbreviations
  • additions and deletions
  • errors and omissions
  • regularizations

1.2. Elements defined for transcription work

Defined in 'core' module:
abbr add choice corr del expan gap sic
Defined in 'transcr' module:
addSpan am damage damageSpan delSpan ex facsimile fw handNotes handShift restore space subst supplied surface zone

1.3. <choice> reminder

  • <choice> (groups alternative editorial encodings)
  • Abbreviation:
    • <abbr> (abbreviated form)
    • <expan> (expanded form)
  • Errors:
    • <sic> (apparent error)
    • <corr> (corrected error)
  • Regularization:
    • <orig> (original form)
    • <reg> (regularized form)

1.4. Abbreviation

Abbreviations are highly characteristic of manuscript materials of all kinds. Western MSS traditionally distinguish:
  • Suspensions: the first letter or letters of the word are written, generally followed by a point, or other marker: for example e.g. for exempli gratia
  • Contractions: both first and last letters are written, generally with some other mark of abbreviation such as a superscript stroke, or, less commonly, a point or points: e.g. Mr. for Mister
  • Brevigraphs: special signs or tittels, such as the Tironian nota used for ‘et’, the letter p with a barred tail commonly used for per, the letter c with a circumflex used for cum (ĉ) etc.
  • Superscripts: superscript letters (vowels or consonants) are often used to indicate various kinds of contraction: e.g. w followed by superscript ch for which.

1.5. Abbreviation and Expansion

A manuscript abbreviation may be viewed in two ways:
  • One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’
  • One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’

The TEI allows both of these views to be recorded simultaneously.

1.6. Two Levels of Encoding Abbreviations

TEI proposes two levels of encoding:
  • the whole of an abbreviated word and the whole of its expansion: <abbr> and <expan>
  • abbreviatory signs or characters and the ‘invisible’ characters they imply: <am> and <ex>

1.7. <ex> and <am>

Using these elements, from the 'transcr' module, a transcriber may indicate the status of the individual letters or signs within both the abbreviation and the expansion.
  • <ex> (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
  • <am> (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
Previously, encoders re-purposed existing elements such as <hi> and <supplied> to mark individual letters or signs in abbreviations and expansions. The P5 elements <am> and <ex> are the TEI's response to this need.
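
As a rough illustration of how the two levels can work together, the following Python sketch flattens a doubly encoded abbreviation into either a diplomatic or an expanded reading. The fragment, the wrapping <w> element and the function name `reading` are assumptions made for this example; real TEI documents also declare the TEI namespace, which is omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the brevigraph for 'hann', encoded at both levels
# (<abbr>/<expan> for the whole word, <am>/<ex> for the individual signs).
FRAGMENT = (
    "<w><choice>"
    "<abbr>h<am>\u0305</am></abbr>"
    "<expan>h<ex>ann</ex></expan>"
    "</choice></w>"
)

def reading(elem, prefer):
    """Flatten elem to text, keeping only the `prefer` child of each <choice>."""
    if elem.tag == "choice":
        # Ignore the alternatives we were not asked for.
        return "".join(reading(c, prefer) for c in elem if c.tag == prefer)
    out = elem.text or ""
    for child in elem:
        out += reading(child, prefer) + (child.tail or "")
    return out

word = ET.fromstring(FRAGMENT)
diplomatic = reading(word, "abbr")   # h followed by the combining overline
expanded = reading(word, "expan")    # the word spelled out in full
print(expanded)
```

The same function could serve both a diplomatic and a reading view of an edition, since the choice between them is made at processing time rather than in the encoding.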

1.8. Brevigraph Example

The Old Icelandic word ‘hann’ (‘he’) is usually written as a brevigraph in medieval manuscripts, combining the letter h with a horizontal stroke representing nasalisation (Unicode character 0305, functionally similar to the modern tilde). It looks like this: [figure: the letter h with a horizontal stroke above it]

1.9. Encoding abbreviations

Depending on editorial policy, we might represent this combination in any one of the following ways:
<abbr>h&#x305;</abbr> or

1.10. <abbr> and <expan> Examples

eu<g ref="#er">er</g>y <g ref="#per">per</g>sone that loketh after heven hath a place in this ladder
<abbr>ev<g ref="#er">er</g>y</abbr>
<abbr><g ref="#per">per</g>sone</abbr> ...
<expan>persone</expan> ...
 <abbr>ev<g ref="#er">er</g>y</abbr>

1.11. Brevigraph Alternative

We could also indicate multiple alternatives (at either level) by using the <choice> element




1.12. Classifying abbreviations

The type attribute on <abbr> classifies an abbreviation (for example as a suspension or as a tittel), which allows us to provide alternative renderings for the same markup in different contexts.
 <abbr type="susp">k<am>&#x307;</am></abbr>
 <abbr type="tittel">ml<am>&#x305;</am>i</abbr>
k(onungr) mællti

As elsewhere, the resp and cert attributes can also be used to indicate who is responsible for an expansion, and the degree of certainty attached to it.
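
The rendering ‘k(onungr)’ shown above can be produced mechanically once the supplied letters are marked with <ex>. The following is a minimal sketch, assuming a hypothetical un-namespaced fragment and an invented helper name `with_parens`; parentheses are only one possible display convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the first word carries an <ex>-level expansion.
FRAGMENT = '<p><expan cert="high">k<ex>onungr</ex></expan> mællti</p>'

def with_parens(elem):
    """Flatten elem to text, wrapping the content of each <ex> in parentheses."""
    out = elem.text or ""
    for child in elem:
        inner = with_parens(child)
        out += f"({inner})" if child.tag == "ex" else inner
        out += child.tail or ""
    return out

print(with_parens(ET.fromstring(FRAGMENT)))  # prints: k(onungr) mællti
```

A stylesheet could just as well italicise the <ex> content or suppress the marking altogether; the encoding itself stays the same.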

1.13. More Abbreviation and Expansion Examples (1)



<expan>senatus populusque romanorum</expan>

1.14. More Abbreviation and Expansion Examples (2)


 <expan>senatus populusque romanorum</expan>


  <expan>senatus populusque romanorum</expan>
  <expan>Sebastian Patrick Quintus Rahtz</expan>

1.15. Earlier Examples Now With <am> and <ex>

  <g ref="#er"/>
  <g ref="#per"/>
</abbr> ...
</expan> ...

1.16. And one more...

  <g ref="#er"/>
</choice>y <choice>
  <g ref="#per"/>
</choice>sone ...

1.17. Corrections and emendations

The <sic> element can be used to indicate that the reading of the manuscript is erroneous or nonsensical, while <corr> (correction) can be used to provide what in the editor's opinion is the correct reading:
Alternatively, they may be combined within a <choice> element, thus allowing the possibility of providing multiple corrections:
 <corr cert="high">giorir</corr>
 <corr cert="low">gioret</corr>
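
When several corrections are offered, software has to decide which one to display. The sketch below picks the <corr> with the highest cert value and falls back to the <sic> reading when no correction exists. The <sic> content is a placeholder invented for the example (it is not the manuscript reading), and the ranking table and function name are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the <sic> text is a placeholder, not a real reading.
FRAGMENT = (
    "<choice>"
    "<sic>giorit</sic>"
    '<corr cert="high">giorir</corr>'
    '<corr cert="low">gioret</corr>'
    "</choice>"
)

# Assumed ordering of the cert values from most to least certain.
CERT_RANK = {"high": 3, "medium": 2, "low": 1, "unknown": 0}

def best_correction(choice):
    """Return the most certain correction, or the <sic> text if there is none."""
    corrs = choice.findall("corr")
    if not corrs:
        return choice.findtext("sic")
    best = max(corrs, key=lambda c: CERT_RANK.get(c.get("cert", "unknown"), 0))
    return best.text

print(best_correction(ET.fromstring(FRAGMENT)))  # prints: giorir
```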

1.18. Correction/Conjecture Examples

Nos autem iam ostendimus quod nutrimentum
et <choice>

1.19. Normalization

Source texts rarely use modern normalized orthography. For retrieval and other processing purposes, it can be useful to record normalized forms alongside the original spellings. The <reg> (regularized) element is available to mark a normalized form; the <orig> (original) element marks the non-standard spelling found in the source. These elements can optionally be grouped as alternatives using the <choice> element:

1.20. Normalization example

 <l>There was an Old Woman,</l>
  </choice> under a hill,</l>
 <l>And if she <orig>'int</orig> gone,</l>
 <l>She lives there still.</l>

1.21. Additions, deletions, and substitutions

Alterations made to the text, whether by the scribe or in some later hand, can be encoded using <add> (addition) or <del> (deletion).

Where the addition and deletion are regarded as a single substitution, they can be grouped together using the <subst> (substitution) element:
  • <add> (addition) or <del> (deletion) are used for evident alterations in the source
  • a combined addition and deletion may be marked using <subst> (substitution)

1.22. Additions and Deletions

  • <add> (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
  • <addSpan/> (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also add).
  • <del> (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
  • <delSpan/> (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector.

1.23. <add> and <del> Examples

by the ancient Dutch navigators <del rend="strikethrough" hand="#WI">of these waters</del> the Tappaan Zee, and where they
always <add hand="#WI" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI">Washington Irving holograph</handNote>

1.24. <addSpan> and <delSpan>

These two elements delimit a span of text by pointing mechanisms rather than by enclosing it. This is useful if an addition or deletion overlaps another span of text.

spanTo indicates the end of a span initiated by the element bearing this attribute.

<addSpan spanTo="#id4"/>
<!-- added text -->
<anchor xml:id="id4"/>
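
Because <addSpan/> is an empty milestone, the added text has to be recovered by walking the document from the milestone to the element that spanTo points at. The following sketch shows one way to do that; the sentence in the fragment is invented, and `walk` and `span_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical fragment: the addition overlaps the boundary of a <hi> element.
FRAGMENT = (
    '<p>He walked <addSpan spanTo="#id4"/>very <hi>slowly</hi> indeed'
    '<anchor xml:id="id4"/> home.</p>'
)

def walk(elem):
    """Yield ('start', element) and ('text', string) events in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)

def span_text(root, span_tag="addSpan"):
    """Collect the text between a span milestone and the target of its spanTo."""
    collecting, end_id, out = False, None, []
    for kind, value in walk(root):
        if kind == "start":
            if value.tag == span_tag:
                collecting, end_id = True, value.get("spanTo", "").lstrip("#")
            elif collecting and value.get(XML_ID) == end_id:
                collecting = False
        elif collecting:
            out.append(value)
    return "".join(out)

print(span_text(ET.fromstring(FRAGMENT)))  # prints: very slowly indeed
```

Passing "delSpan" as the second argument would collect a deleted span in the same way.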

1.25. Substitutions

<subst> (substitution) groups one or more deletions with one or more additions when the combination is to be regarded as a single intervention in the text. Examples:
  • one word/letter written over another
  • one word/letter deleted, replaced by another written above it by the same hand at one time
  • one word/letter deleted, replaced by a different hand some other time
  • a long chain of substitutions on the one stretch of text, with uncertainty as to the order of substitution and as to which of many possible readings should be preferred

1.26. <subst> Examples

 <delSpan rend="verticalStrike" spanTo="#delend02"/> Tis moonlight <subst>
 </subst> Oman's sky
<l>Her isles of pearl look lovelily<anchor xml:id="delend02"/>

By default the deletion is assumed to precede the addition, but this may be overridden by means of the seq attribute, which indicates the sequence of the alterations.

One must have lived longer with <subst>
 <del seq="1">this</del>
 <del seq="2">
  <add seq="1">such a</add>
 </del>
 <add seq="2">a</add>
</subst> system, to appreciate its advantages.

1.27. Another <subst> example

<l>And towards our distant rest began to trudge,</l>
<l><subst>
  <del>Helping the worst amongst us</del>
  <add>Dragging the worst amongt us</add>
 </subst>, who'd no boots</l>
<l>But limped on, blood-shod. All went lame; <subst>
  <del status="shortEnd">half-</del>
 </subst> blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped behind.</l>
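
One benefit of marking substitutions explicitly is that both states of the text can be derived from a single encoding. The sketch below is a simplification under stated assumptions: the line is a shortened adaptation of the example above, `state` is a hypothetical helper, and the seq attribute and nested alterations are ignored.

```python
import xml.etree.ElementTree as ET

# Adapted, shortened version of the line above (an assumption for illustration).
FRAGMENT = (
    "<l><subst><del>Helping</del><add>Dragging</add></subst>"
    " the worst amongst us, who'd no boots</l>"
)

def state(elem, drop):
    """Flatten elem to text, skipping every element whose tag is `drop`."""
    out = elem.text or ""
    for child in elem:
        if child.tag != drop:
            out += state(child, drop)
        out += child.tail or ""
    return out

line = ET.fromstring(FRAGMENT)
before = state(line, "add")  # the text as first written
after = state(line, "del")   # the text after the revision
print(before)
print(after)
```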

1.28. Cancellation of Deletions and Other Markings

<restore> indicates restoration of text to an earlier state by cancellation of an editorial or authorial marking or instruction.

by the ancient Dutch navigators <restore hand="#WI2">
 <del rend="strikethrough" hand="#WI2">of these waters</del>
</restore> the Tappaan Zee, and where they always <add hand="#WI2" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI2">Washington Irving

1.29. <restore>

Suppose that in ‘For I hate this my body’ the word ‘my’ was first deleted and then restored by writing ‘stet’ in the margin. This may be encoded:
For I hate this
<restore hand="#dhl" type="marginalStetNote">

1.30. Text Omitted from or Supplied in the Transcription

  • <gap> indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible.
  • <supplied> signifies text supplied by the transcriber or editor for any reason, typically because the original cannot be read because of physical damage or loss to the original.

1.31. Supplied text

Sometimes, a transcript may need to include words not visibly present in the source:
  • because the carrier has been damaged or is barely legible
  • because of (assumed) scribal error
The <supplied> element is provided for use in either situation; the reason attribute is used to distinguish them.
…Dragging the worst
among<supplied reason="omitted">s</supplied>t us…

1.32. Metadata for supplied text

Attributes resp and cert can be used here as elsewhere. A source attribute is also available to indicate that another witness supports the reconstruction:
<p>ath þeir <supplied reason="omitted" source="AM02-152">mundu</supplied> sundr ganga</p>
When missing text cannot be confidently reconstructed, the <gap> element should be used. Its reason attribute explains the reason for the omission and its extent and unit attributes indicate its presumed size.
<gap reason="damage" extent="7" unit="cm"/>
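
To show how these two elements might surface in a reading text, the following sketch prints supplied text in square brackets and replaces each gap with a bracketed ellipsis stating its size. The bracket convention, the combined fragment and the function name `editorial_text` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment combining the two examples above in one paragraph.
FRAGMENT = (
    '<p>ath þeir <supplied reason="omitted">mundu</supplied> sundr ganga '
    '<gap reason="damage" extent="7" unit="cm"/></p>'
)

def editorial_text(elem):
    """Flatten elem to text, bracketing supplied text and marking gaps."""
    out = elem.text or ""
    for child in elem:
        if child.tag == "supplied":
            out += "[" + editorial_text(child) + "]"
        elif child.tag == "gap":
            # Report the presumed size only if the encoder recorded one.
            size = " ".join(v for v in (child.get("extent"), child.get("unit")) if v)
            out += f"[... {size}]" if size else "[...]"
        else:
            out += editorial_text(child)
        out += child.tail or ""
    return out

print(editorial_text(ET.fromstring(FRAGMENT)))
```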

1.33. <gap> and <supplied> examples

expansion <gap reason="illegible" agent="water"/> river denominated
expansion <supplied reason="illegible" source="#SH1862">of the</supplied> river denominated

1.34. <gap> Example

 <head>Lectio x.</head>
 <p> Hic itaque paterfamilias ad excolendam
    reason="not transcribed"

   congregare non desistit.

1.35. More <supplied>

Where the transcriber considers that one or more words have been erroneously omitted in the original source and corrects this omission, the <supplied> element should be used in preference to <corr>.

by the ancient Dutch navigators
<supplied>of</supplied> the Tappan Zee

1.36. <supplied> Example

<p>Oblatus est
<supplied reason="omitted" resp="#DC"> quia ipse voluit</supplied>.

1.37. Damage and Illegibility

Use <unclear> if the text has been rendered partly illegible by deletion or damage so that the text can be read but without perfect confidence.

Use the reason attribute to state the cause (damage, deletion, etc.) of the uncertainty in transcription and the cert attribute to indicate the confidence in the transcription.

shore of the <unclear reason="damage" cert="medium">the Hudson, at</unclear> that broad

1.38. <damage>, <space>, and <unclear> Example

Revelabunt caeli iniquitatem Judae et <damage agent="rubbing"/> consurget et <space/> manifestum erit peccatum ipsius in die furoris do<unclear agent="rubbing" resp="#JC">mini</unclear> cum eis qui dixerunt domino deo recede a nobis scientiam viarum tuarum nolumus

1.39. Damage and Illegibility

Use <damage> if the text is damaged but can still be read with perfect confidence.

<!-- ... -->
 <pb n="5r"/>
 <damageSpan agent="rubbing" extent="whole leaf" spanTo="#damageEnd"/>
<p> .... </p>
<p> .... <pb n="5v" xml:id="damageEnd"/>

1.40. Disjoint Damage

IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves wh<damage group="1">ich inde</damage>nt the eastern <lb n="3"/>shore of the <damage group="1">Hudson, at </damage>that broad <lb n="4"/>expansion <damage group="1">of the r</damage>iver denominated <lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators

1.41. <fw>

<fw> (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page.
<fw place="top-centre" type="head">Poëms.</fw>
<fw place="top-right" type="pageno">29</fw>
<fw place="bot-centre" type="sig">E3</fw>
<fw place="bot-right" type="catch">TEMPLE</fw>

1.42. <handNote> and <handShift>

The <handNote> element is used to provide information about each hand distinguished within the encoded document.

  • When the 'transcr' module is used, the element <handNotes> is available, within the <profileDesc> element of the Header, to hold one or more <handNote> elements. (brief)
  • When the 'msdescription' module is included, the <handDesc> element also becomes available as part of a structured manuscript description. (more robust)

It is possible to use the two elements together if, for example, the <handDesc> element contains a single summary describing all the hands discursively, while the <handNotes> element gives specific details of each.

1.43. <handShift>

<handShift> marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.

<l>When wolde the cat dwelle in his ynne</l>
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and gaye</l>
 <handNote xml:id="h1" script="copperplate">Carefully written with regular descenders</handNote>
 <handNote xml:id="h2" medium="pencil">Unschooled scrawl</handNote>

1.44. <handShift> Example

<handShift new="#h1" resp="#das"/>... and that good Order Decency and
regular worship may be once more introduced and Established in this Parish according to
the Rules and Ceremonies of the Church of England and as under a good Consciencious and
sober Curate there would and ought to be <handShift new="#h2" resp="#das"/> and
for that purpose the parishioners pray

1.45. hand, resp, cert

<add place="supra" hand="#WJ" cert="medium"> But</add>
 <corr resp="#FB" cert="high">one</corr>
</choice> must have lived ...
<!-- elsewhere -->
<respStmt xml:id="FB">
 <resp>editorial changes</resp>
 <name>Fredson Bowers</name>
</respStmt>
<respStmt xml:id="WJ">
 <resp>authorial changes</resp>
 <name>William James</name>
</respStmt>

2. Critical Apparatus

Scholarly editions of texts, especially texts of great antiquity or importance, often record some or all of the known variations among different witnesses to the text. Witnesses to a text may include authorial or other manuscripts, printed editions of the work, early translations, or quotations of a work in other texts.

The TEI provides methods not only for encoding an existing critical apparatus, but also for marking up a text so that such an apparatus can be generated (without the limitation of necessarily having to choose a base text).

2.1. Format of an apparatus

The format of an apparatus usually has several parts:
  • The location of the variant in the text (act, scene, line number)
  • The lemma, which is the portion of the text to which the note applies
  • A right bracket (]) or some other separator
  • The source from which the edition took its reading
  • A list of variants, in each case followed by the source in which the variant is found, and usually separated with a semicolon.

2.2. Apparatus Criticus

The standard Apparatus Criticus provides a concise method of recording the variants for any size of text. To take an example, a line in Hamlet might be printed as:
LAERTES. Alas, then is she drowned.
with a critical apparatus provided (usually at the foot of the page) which contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.

2.3. Critical Apparatus: <app>, <rdg>, and <lem>

  • <app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading.
  • <rdg> (reading) contains a single reading within a textual variation.
  • <lem> (lemma) contains the lemma, or base text, of a textual variation.

2.4. Example of <app>, <rdg> and <lem>

<app>
  <lem wit="#El">Experience though noon Auctoritee</lem>
  <rdg wit="#Hg">Experience thogh noon Auctorite</rdg>
  <rdg wit="#La">Experiment thouh noon Auctoritee</rdg>
  <rdg wit="#Ra2">Eryment though none auctorite</rdg>
</app>
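
An entry encoded this way carries everything needed to print a traditional apparatus line of the kind described earlier (location, lemma, right bracket, source, then the variants with their sigla). The following is a minimal sketch; the location label "WBP 1", the helper names and the exact punctuation of the output are assumptions.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    "<app>"
    '<lem wit="#El">Experience though noon Auctoritee</lem>'
    '<rdg wit="#Hg">Experience thogh noon Auctorite</rdg>'
    '<rdg wit="#La">Experiment thouh noon Auctoritee</rdg>'
    '<rdg wit="#Ra2">Eryment though none auctorite</rdg>'
    "</app>"
)

def sigla(elem):
    """Turn a wit value such as '#El #Hg' into 'El Hg'."""
    return " ".join(ref.lstrip("#") for ref in elem.get("wit", "").split())

def apparatus_line(app, loc):
    """Format one <app> as: location lemma] source; variant siglum; ..."""
    lem = app.find("lem")
    variants = [f"{rdg.text.strip()} {sigla(rdg)}" for rdg in app.findall("rdg")]
    return f"{loc} {lem.text.strip()}] {sigla(lem)}; " + "; ".join(variants) + "."

print(apparatus_line(ET.fromstring(FRAGMENT), "WBP 1"))
```

The sketch assumes that the entry has a <lem>; an apparatus without a lemma would need a different layout.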

2.5. Or apparatus at smaller granularity

 <app>
  <lem wit="#El #Hg">Experience</lem>
  <rdg type="substantive" wit="#La">Experiment</rdg>
  <rdg type="substantive" wit="#Ra2">Eryment</rdg>
 </app>
 <app>
  <lem wit="#El #Ra2">though</lem>
  <rdg type="orthographic" wit="#Hg">thogh</rdg>
  <rdg type="orthographic" wit="#La">thouh</rdg>
 </app>
 <app>
  <lem wit="#El #La #Hg">noon</lem>
  <rdg type="orthographic" wit="#Ra2">none</rdg>
 </app>
 <app>
  <lem wit="#El #La">Auctoritee</lem>
  <rdg type="orthographic" wit="#Hg">Auctorite</rdg>
  <rdg type="orthographic" wit="#Ra2">auctorite</rdg>
 </app>

2.6. <rdgGrp>, <witDetail>, and <wit>

  • <rdgGrp> (reading group) within a textual variation, groups two or more readings perceived to have a genetic relationship or other affinity.
  • <witDetail> (witness detail) gives further information about a particular witness, or witnesses, to a particular reading.
  • <wit> (witness) contains a list of one or more sigla of witnesses attesting a given reading, in a textual variation.

2.7. <rdgGrp> Example

<app type="substantive">
 <rdgGrp type="subvariants">
  <lem wit="#El #Hg">Experience</lem>
  <rdg wit="#Ha4">Experiens</rdg>
 </rdgGrp>
 <rdgGrp type="subvariants">
  <lem wit="#Cp #Ld1">Experiment</lem>
  <rdg wit="#La">Ex&p-underbar;iment</rdg>
 </rdgGrp>
 <rdgGrp type="subvariants">
  <rdg wit="#Ra2">Eryment</rdg>
 </rdgGrp>
</app>

2.8. <witDetail> Example

<app type="substantive">
 <rdgGrp type="subvariants">
  <lem wit="#El #Hg" xml:id="W026">Experience</lem>
  <rdg wit="#Ha4">Experiens</rdg>
<witDetail resp="#PR" target="#W026" wit="#El"> Ornamental capital. </witDetail>

2.9. <listWit> and <witness>

  • <listWit> (witness list) lists definitions for all the witnesses referred to by a critical apparatus, optionally grouped hierarchically.
  • <witness> contains either a description of a single witness referred to within the critical apparatus, or a list of witnesses which is to be referred to by a single sigil.

If more information is available, one should use a <msDesc> instead of a <witness>.

2.10. <listWit> example

<listWit>
 <witness xml:id="El">Ellesmere, Huntington Library 26.C.9</witness>
 <witness xml:id="Hg">Hengwrt, National Library of Wales,
   Aberystwyth, Peniarth 392D</witness>
 <witness xml:id="ms">Sole manuscript</witness>
 <witness xml:id="Ra2">Bodleian Library Rawlinson Poetic 149
   (see further <ptr target="#MSRP149"/>)</witness>
</listWit>

2.11. Nested <listWit>

Witnesses that are similar can be grouped together so that they can be referred to by a single siglum:
<listWit>
 <witness xml:id="Ellesmere">Ellesmere, Huntington Library 26.C.9</witness>
<!-- ... -->
 <listWit xml:id="Con">
  <head>Constant Group C</head>
  <witness xml:id="Cp">Corpus Christi Oxford MS 198 </witness>
  <witness xml:id="La">British Library Lansdowne 851 </witness>
  <witness xml:id="Sl2">British Library Sloane MS 1686 </witness>
 </listWit>
</listWit>
<!-- elsewhere -->
<rdg wit="#Con">Experiment</rdg>
refers to all these manuscripts.
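
Resolving a group siglum is straightforward for software as well. The sketch below looks up the element a wit reference points at and returns the individual witnesses it covers. The function name `expand_siglum` is hypothetical, the omitted witnesses are left out, and the TEI namespace is not declared, which a real document would do.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

FRAGMENT = (
    "<listWit>"
    '<witness xml:id="Ellesmere">Ellesmere, Huntington Library 26.C.9</witness>'
    '<listWit xml:id="Con"><head>Constant Group C</head>'
    '<witness xml:id="Cp">Corpus Christi Oxford MS 198</witness>'
    '<witness xml:id="La">British Library Lansdowne 851</witness>'
    '<witness xml:id="Sl2">British Library Sloane MS 1686</witness>'
    "</listWit></listWit>"
)

def expand_siglum(root, ref):
    """Return the xml:ids of every <witness> that a wit reference covers."""
    target = ref.lstrip("#")
    for elem in root.iter():
        if elem.get(XML_ID) == target:
            if elem.tag == "witness":
                return [target]
            # A nested <listWit>: collect all the witnesses inside it.
            return [w.get(XML_ID) for w in elem.iter("witness")]
    return []

witnesses = ET.fromstring(FRAGMENT)
print(expand_siglum(witnesses, "#Con"))  # prints: ['Cp', 'La', 'Sl2']
```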

2.12. Fragmentary Witnesses

  • <witStart/> (fragmented witness start) indicates the beginning, or resumption, of the text of a fragmentary witness.
  • <witEnd/> (fragmented witness end) indicates the end, or suspension, of the text of a fragmentary witness.
  • <lacunaStart/> indicates the beginning of a lacuna in the text of a mostly complete textual witness.
  • <lacunaEnd/> indicates the end of a lacuna in a mostly complete textual witness.

2.13. Fragmentary Witnesses Example

 <lem wit="#El #Hg">Auctoritee</lem>
 <rdg wit="#La #Ra2">auctorite</rdg>
 <rdg wit="#X">


 <lem wit="#El #Hg">Auctoritee</lem>
 <rdg wit="#La #Ra2">auctorite</rdg>
 <rdg wit="#X">

2.14. Location Referenced Example

<div n="WBP" type="prologue">
 <head>The Prologe of the Wyves Tale of Bathe</head>
 <l n="1">Experience though noon Auctoritee</l>
 <l>Were in this world ...</l>
<!-- Elsewhere in Document: -->
<app loc="WBP 1">
 <rdg wit="#La">Experiment</rdg>
 <rdg wit="#Ra2">Eryment</rdg>


<l n="1">Experience though noon Auctoritee
  <rdg wit="#La"> Experiment</rdg>
  <rdg wit="#Ra2"> Eryment</rdg>
<l>Were in this world ...</l>

2.15. Double End-Point Attachment Example

<div n="WBP" type="prologue">
 <head>The Prologe ... </head>
 <l n="1" xml:id="WBP.1">Experience<anchor xml:id="WBP-A2"/>
   though noon Auctoritee</l>
 <l>Were in this world ...</l>
<!-- Elsewhere in the same document -->
<app from="#WBP.1" to="#WBP-A2">
 <rdg wit="#La">Experiment</rdg>
 <rdg wit="#Ra2">Eryment</rdg>

2.16. Parallel Segmentation Example

<l n="1">
  <rdg wit="#Chi3">Auctoritee, though none experience</rdg>
    <rdg wit="#El #Hg">Experience</rdg>
    <rdg wit="#La">Experiment</rdg>
    <rdg wit="#Ra2">Eryment</rdg>
    <rdg wit="#El #Ra2">though</rdg>
    <rdg wit="#Hg">thogh</rdg>
    <rdg wit="#La">thouh</rdg>
    <rdg wit="#El #Hg">noon Auctorite</rdg>
    <rdg wit="#La #Ra2">none auctorite</rdg>

2.17. A Simple <app> With No <lem>

<ab> Populus domini et oves pascuae eius <app>
  <rdg wit="#CAO-B #CAO-V #CAO-R #CAO-D #CAO-F #CAO-S #Ely #Wor #Wcb">
venite adoremus eum</rdg>
  <rdg wit="#CAO-H #Pet"> venite adoremus deum</rdg>
  <rdg wit="#CAO-E #Alb2"> venite adoremus dominum</rdg>
  <rdg wit="#CAO-C #CAO-G #CAO-L #Hyd #Evm"> venite

2.18. Attaching Notes Example

Virginite is grete
 <rdg resp="#ES">perfecti<abbr>oi</abbr></rdg>
 <rdg resp="#FJF" xml:id="f105"> perfectio<expan>u</expan>n</rdg>
 <rdg resp="#PGR" xml:id="r105"> perfectiou<expan>n</expan></rdg>
<!-- ... <note> appearing elsewhere in the document ... -->
<note target="#r105 #f105">Furnivall's expansion implies that the bar is an abbreviation for 'u'. There are no certain instances of this mark as an abbreviation for 'u' in these MSS and it is widely used as an abbreviation for 'n'. Ruggiers' expansion is to be accepted.</note>

2.19. Hamlet example

Think back to the example given from Hamlet:
LAERTES. Alas, then is she drowned.
Where the traditional critical apparatus contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.
How would you choose to mark it up in TEI?

2.20. How I'd do it (given time)

<l n="156">
  <rdg wit="#Hib">Alas, then</rdg>
  <rdg wit="#F">Alas then,</rdg>
  <rdg wit="#Q3">Alas then</rdg>
  <rdg wit="#Q2">Alas, then,</rdg>
  <rdg wit="#Q1">So,</rdg>
  <rdg wit="#Hib #F #Q3">is she</rdg>
  <rdg wit="#Q2 #Q1">she is</rdg>
  <rdg wit="#Hib">drowned.</rdg>
  <rdg wit="#F">drown'd?</rdg>
  <rdg wit="#Q3 #Q2">drownd.</rdg>
  <rdg wit="#Q1">drownde:</rdg>
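
With parallel segmentation, the full text of any single witness can be regenerated by choosing that witness's reading in each segment. The following sketch does this for the Hamlet line. It assumes that each group of readings above sits in its own <app> and that the segments are separated by single spaces; `witness_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The readings above, with <app> wrappers and spacing assumed for this sketch.
FRAGMENT = (
    '<l n="156">'
    '<app><rdg wit="#Hib">Alas, then</rdg><rdg wit="#F">Alas then,</rdg>'
    '<rdg wit="#Q3">Alas then</rdg><rdg wit="#Q2">Alas, then,</rdg>'
    '<rdg wit="#Q1">So,</rdg></app> '
    '<app><rdg wit="#Hib #F #Q3">is she</rdg><rdg wit="#Q2 #Q1">she is</rdg></app> '
    '<app><rdg wit="#Hib">drowned.</rdg><rdg wit="#F">drown\'d?</rdg>'
    '<rdg wit="#Q3 #Q2">drownd.</rdg><rdg wit="#Q1">drownde:</rdg></app>'
    '</l>'
)

def witness_text(line, siglum):
    """Rebuild the text of one witness from a parallel-segmented line."""
    out = line.text or ""
    for app in line:  # assumes every child of the line is an <app>
        for rdg in app.findall("rdg"):
            if "#" + siglum in rdg.get("wit", "").split():
                out += rdg.text or ""
                break
        out += app.tail or ""
    return out

hamlet_line = ET.fromstring(FRAGMENT)
print(witness_text(hamlet_line, "Q2"))  # prints: Alas, then, she is drownd.
```

Running the same function for each siglum in turn reproduces the readings listed in the traditional apparatus entry.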

James Cummings. Date: April 2009
Copyright University of Oxford