IT Services, University of Oxford

1. Transcription: a special kind of reading

What is the goal of a transcription?
  • to make a primary source accessible ...
  • ... and comprehensible
  • which may involve adding, or drawing on, a good deal of additional information
  • all transcription is selective
  • all transcription is imaginative

2. Transcription

In transcription for the digital edition, some textual phenomena which commonly attract editorial attention:
  • original layout information
  • abbreviations or other arcana
  • ‘evident’ errors which invite correction or conjecture
  • scribal additions, deletions, substitutions, restorations
  • non-standard orthography (etc.) which invites normalisation
  • irrelevant or non-transcribable material
  • passages which are damaged or illegible

3. TEI Transcription

  • <teiHeader>: provides metadata for the whole thing, at various levels, notably including a <msDesc>
  • <text>: contains a structured reading of a document's intellectual content ... its ‘text’
  • <facsimile>: organizes a set of page images representing a document
  • <sourceDoc>: a non-interpretative transcription of a physical document, e.g. for a dossier génétique

Does a transcription encode a ‘text’ or a ‘document’?

4. What's this?

  1. ‘agreable’ is struck-through, ‘pleasing’ is written above it, in the interlinear space.
  2. ‘agreable’ is deleted and replaced by ‘pleasing’
  3. Originally, the text read ‘agreable’, but at some subsequent stage this word was deleted; the word ‘pleasing’ was added in the same context.

5. A typical minimal encoding

6. Character annotation

  • distinguish allographical forms of a letter
  • represent non-standard characters

The <g> (character or glyph) element represents a glyph or a non-standard character:

Paradi<g ref="#long-s-glyph">s</g>e Lo<g ref="#long-s-glyph">s</g>t
<g ref="#per-glyph">per</g>

The referenced glyph should be defined in a <charDecl> element in the header:

<charDecl>
 <glyph xml:id="long-s-glyph"><!-- ... --></glyph>
</charDecl>
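A processor might use these references to produce two views of the same transcription. This is a minimal Python sketch, not a TEI tool: it assumes the glyph declarations have already been read into a dictionary (the `GLYPHS` table and the character assigned to `long-s-glyph` are assumptions) and that the markup carries no TEI namespace, as in the slide examples. The diplomatic view substitutes the declared glyph; the normalised view keeps the content of <g>.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup built from <charDecl>: glyph id -> diplomatic character.
GLYPHS = {"long-s-glyph": "\u017f"}  # U+017F LATIN SMALL LETTER LONG S


def render(elem, diplomatic):
    """Flatten elem to text, resolving <g ref="#id"> for the chosen view."""
    parts = [elem.text or ""]
    for child in elem:
        glyph_id = child.get("ref", "").lstrip("#")
        if child.tag == "g" and diplomatic and glyph_id in GLYPHS:
            parts.append(GLYPHS[glyph_id])  # show the declared glyph
        else:
            parts.append(render(child, diplomatic))  # keep the content
        parts.append(child.tail or "")
    return "".join(parts)


title = ET.fromstring(
    '<title>Paradi<g ref="#long-s-glyph">s</g>e '
    'Lo<g ref="#long-s-glyph">s</g>t</title>'
)
print(render(title, diplomatic=True))   # Paradiſe Loſt
print(render(title, diplomatic=False))  # Paradise Lost
```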

7. Abbreviations &c.

In Western MSS, we commonly distinguish:
  • suspension: the first letter or letters of the word are written, generally followed by a point, for example e.g. for exempli gratia
  • contraction: both first and last letters are written, generally with some mark of abbreviation such as superscript strokes or points, e.g. Mr. for Mister
  • brevigraphs: special signs such as the Tironian nota used for ‘et’, the letter p with a barred tail used for per, the letter c with a circumflex used for cum, etc.
  • superscript letters (vowels or consonants) used to indicate various kinds of contraction, e.g. w followed by superscript ch for which

Most of the symbols needed are available in Unicode, though not necessarily in all fonts.

8. Abbreviation and Expansion

An abbreviation may be viewed in two different ways:
  • as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’
  • as another way of representing the letter or letters it is believed to be standing for: thus, ‘per’, ‘re’, ‘n’

9. Two Levels of Encoding Abbreviations

The TEI provides elements for two levels of encoding:
  • the whole of an abbreviated word and the whole of its expansion may be marked using <abbr> and <expan> respectively
  • abbreviatory signs or characters and the ‘invisible’ characters they imply may be marked using <am> and <ex> respectively

10. Compare ev(er)y (per)sone

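  ev<g ref="#b-er"/>y <g ref="#b-per"/>sone

  <abbr>ev<am><g ref="#b-er"/></am>y</abbr>
  <abbr><am><g ref="#b-per"/></am>sone</abbr>

  <expan>ev<ex>er</ex>y</expan>
  <expan><ex>per</ex>sone</expan>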

11. <ex> and <am>

Using these elements, from the 'transcr' module, a transcriber may indicate the status of the individual letters or signs within both the abbreviation and the expansion.
  • <ex> (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
  • <am> (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
Before P5, encoders often repurposed existing elements such as <hi> and <supplied> to mark individual letters or signs in abbreviations and expansions. The P5 elements <am> and <ex> were introduced to meet this need directly.

12. A simple example (1)


Editorial strategy may be simply to note that we have expanded the abbreviations:
 <lb/>Cours chacune piece <expan>pour</expan>
 <expan>cinquante</expan> soubz

13. A simple example (2)

As you noticed, ‘pour’ was actually written ‘po’ followed by an ‘r’ subscript; ‘cinquante’ as ‘cinquāte’ with a macron on the ‘a’ to indicate nasalisation.

We could therefore encode as follows:
 <abbr>po&#xFFFD;</abbr> .... <abbr>cinqu&#x0101;te</abbr>
... or we could choose one of the following styles:
<p> po<am>&#xFFFD;</am> ... or po<ex>u</ex>r </p>

14. Simple example (3)

And of course TEI permits both cake and the eating of it:
<p> po<choice><am>&#xFFFD;</am><ex>ur</ex></choice> </p>

15. Choice

The children of a <choice> element all represent alternative ways of encoding the same sequence, and in most cases they are mutually exclusive.


Where the purpose of an encoding is to record textual variants, rather than to identify multiple possible encoding decisions, the <app> element and its companions (<lem>, <rdg>) should be preferred.


16. A <choice> reminder

  • <choice> (groups alternative editorial encodings)
  • Abbreviation:
    • <abbr> (abbreviated form)
    • <expan> (expanded form)
  • Errors:
    • <sic> (apparent error)
    • <corr> (corrected error)
  • Regularization:
    • <orig> (original form)
    • <reg> (regularized form)
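Because each pair sits inside one <choice>, a single encoding can feed several outputs. The following Python sketch flattens a line to text for one of two views; it assumes namespace-free markup, and the view names and the `PREFERENCES` table saying which alternative each view keeps are hypothetical. The sample pairs the ‘pour’ and ‘cinquante’ forms from the earlier example.

```python
import xml.etree.ElementTree as ET

# Hypothetical views: which <choice> alternatives each one keeps.
PREFERENCES = {
    "diplomatic": {"abbr", "orig", "sic"},
    "edited": {"expan", "reg", "corr"},
}


def render(elem, view):
    """Flatten elem to text, resolving every <choice> for the given view."""
    wanted = PREFERENCES[view]
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Keep the first alternative this view asks for, if any.
            for alt in child:
                if alt.tag in wanted:
                    parts.append(render(alt, view))
                    break
        else:
            parts.append(render(child, view))
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(
    "<l><lb/>Cours chacune piece "
    "<choice><abbr>po\ufffd</abbr><expan>pour</expan></choice> "
    "<choice><abbr>cinqu\u0101te</abbr><expan>cinquante</expan></choice>"
    " soubz</l>"
)
print(render(line, "edited"))  # Cours chacune piece pour cinquante soubz
print(render(line, "diplomatic"))
```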

17. Classifying abbreviations

The type attribute on <abbr> is a useful way of categorising abbreviations, whether for statistical purposes, or to allow for different types to be rendered differently:
<choice><abbr type="brevigraphe">po<am>&#xFFFD;</am></abbr>
 <expan>po<ex>u</ex>r</expan></choice> en <choice>
 <abbr type="suspension">fin<am>.</am></abbr>
 <expan>fin<ex>ir</ex></expan></choice>
This encoding might be displayed as:
po(u)r en finir

As elsewhere, the resp and cert attributes can also be used to indicate who is responsible for an expansion, and the degree of certainty attached to it.
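The type-dependent display described above could be produced along these lines. This Python sketch assumes namespace-free markup, a house style in which only brevigraphe expansions show the supplied letters in parentheses (`MARKED_TYPES` is hypothetical), and that ‘fin.’ is expanded as fin<ex>ir</ex>.

```python
import xml.etree.ElementTree as ET

# Assumed house style: only these abbreviation types show supplied letters.
MARKED_TYPES = {"brevigraphe"}


def expansion_text(elem, marked):
    """Text of an <expan>: <am> is dropped, <ex> is bracketed if marked."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "ex":
            inner = expansion_text(child, marked)
            parts.append(f"({inner})" if marked else inner)
        elif child.tag != "am":
            parts.append(expansion_text(child, marked))
        parts.append(child.tail or "")
    return "".join(parts)


def render(elem):
    """Flatten elem, replacing each abbr/expan <choice> by its expansion."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice" and child.find("expan") is not None:
            abbr = child.find("abbr")
            marked = abbr is not None and abbr.get("type") in MARKED_TYPES
            parts.append(expansion_text(child.find("expan"), marked))
        else:
            parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)


phrase = ET.fromstring(
    '<p><choice><abbr type="brevigraphe">po<am>\ufffd</am></abbr>'
    "<expan>po<ex>u</ex>r</expan></choice> en "
    '<choice><abbr type="suspension">fin<am>.</am></abbr>'
    "<expan>fin<ex>ir</ex></expan></choice></p>"
)
print(render(phrase))  # po(u)r en finir
```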

18. Corrections and emendations

The <sic> element can be used to indicate that the reading of the manuscript is erroneous or nonsensical, while <corr> (correction) can be used to provide what in the editor's opinion is the correct reading.
The two may, of course, be combined within a <choice> element, which may also offer more than one <corr>, each with its own cert value:
 <corr cert="high">relicta</corr>
 <corr cert="low">relatio</corr>
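Where several <corr> elements compete, cert lets a processor choose among them. A minimal Python sketch, assuming namespace-free markup and a hypothetical numeric ranking of the high/medium/low values:

```python
import xml.etree.ElementTree as ET

# Assumed ranking of cert values; anything else ranks lowest.
CERT_RANK = {"high": 3, "medium": 2, "low": 1}


def best_correction(choice):
    """Return the text of the most certain <corr>, or None if there is none."""
    corrs = choice.findall("corr")
    if not corrs:
        return None
    best = max(corrs, key=lambda c: CERT_RANK.get(c.get("cert"), 0))
    return "".join(best.itertext())


choice = ET.fromstring(
    '<choice><corr cert="high">relicta</corr>'
    '<corr cert="low">relatio</corr></choice>'
)
print(best_correction(choice))  # relicta
```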

19. Normalization

Source texts rarely use modern orthography. For retrieval and other processing reasons, however, the modernized form may be very useful. The <reg> (regularized) element is used to mark a normalized form, and the <orig> (original) element to indicate a non-standard spelling. These elements can optionally be grouped as alternatives using the <choice> element:

20. Normalisation example

<lb/>dix <choice>
  <!-- ... -->
</choice> grains

In this case, a further semantic regularisation is possible:
<measure quantity="18.75" unit="gr">dix <choice>
  <!-- ... -->
 </choice> grains</measure>

21. Additions, deletions, substitutions, and modifications

Alterations made to the text, whether by the scribe or in some later hand, can be encoded using <add> (addition) or <del> (deletion).

Where the addition and deletion are regarded as a single substitution, they can be grouped together using the <subst> (substitution) element. In summary:
  • <add> (addition) or <del> (deletion) are used for evident alterations in the source
  • a combined addition and deletion may be marked using <subst> (substitution)
  • <mod> (modification) represents any kind of general modification without interpretation

22. A substitution?

<subst><del status="shortEnd">half-</del><add>all</add></subst> blind

23. More context for Wilfred Owen

<l>And towards our distant rest began to trudge,</l>
<l><subst>
  <del>Helping the worst amongst us</del>
  <add>Dragging the worst
     amongt us</add>
 </subst>, who'd no boots</l>
<l>But limped on, blood-shod. All went lame; <subst>
  <del status="shortEnd">half-</del>
  <add>all</add>
 </subst> blind ...</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped ...</l>
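With simple alterations like these, the earlier and later states of the draft can be derived mechanically. This Python sketch assumes namespace-free markup and treats every <del> as belonging to the earlier state and every <add> to the later one, a simplification that no longer holds once the sequence of interventions matters.

```python
import xml.etree.ElementTree as ET


def _collect(elem, skip):
    """Concatenate text, leaving out elements whose tag is `skip`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(_collect(child, skip))
        parts.append(child.tail or "")  # tail text lies outside the alteration
    return "".join(parts)


def version(elem, revised):
    """Earlier (revised=False) or later (revised=True) state of a passage.

    Simplification: every <del> is earlier, every <add> is later, and
    <subst> only groups them.
    """
    text = _collect(elem, skip="del" if revised else "add")
    return " ".join(text.split())  # normalise whitespace


line = ET.fromstring(
    "<l><subst><del>Helping the worst amongst us</del>"
    "<add>Dragging the worst amongt us</add></subst>, who'd no boots</l>"
)
fif = ET.fromstring(
    "<l>Of tired, outstripped <del>fif</del> five-nines that dropped</l>"
)
print(version(line, revised=False))  # Helping the worst amongst us, who'd no boots
print(version(line, revised=True))   # Dragging the worst amongt us, who'd no boots
print(version(fif, revised=True))    # Of tired, outstripped five-nines that dropped
```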

24. Semi-legible text

Use <unclear> if the text is partly illegible, i.e. it can be read but without perfect confidence. The reason attribute here states the cause of the uncertainty in transcription.

I <subst>
 <add place="above">might</add>
 <del><unclear reason="overinking" cert="medium" resp="#LDB">should</unclear></del>
 </subst>

25. Supplied and damaged text

Use the <supplied> element if the transcriber has provided a reading not actually visible in the text, whether because of damage or scribal error; reason here indicates why the text has been supplied.

…Dragging the worst
among<supplied reason="authorialError">s</supplied>t us…
Use the <damage> element to record the existence of physical damage to the document, whether or not the damaged text is readable:
<l>The Moving Finger wri<damage agent="water" group="1">es; and</damage> having ...</l>
<l>Moves <damage agent="water" group="1">
  <supplied>on: nor
     all your</supplied>
 </damage> Piety nor Wit</l>

26. Lacunae

When missing text cannot be confidently supplied, or is intentionally omitted, the <gap> element should be used, with a reason attribute explaining why. Its size can be indicated with the quantity and unit attributes (or, in free text, with extent):
<gap reason="wormhole" quantity="7" unit="mm"/>
I am dr Sr yr <gap reason="illegible" quantity="3" unit="word"/>Sydney Smith

27. Some difficulties

These methods are perfectly adequate where variation is comparatively simple. They rapidly encounter problems when:
  • overlap happens (as it always does)
  • the sequence of interventions is important or indeterminate
  • the layout and the meaning of the writing are not easily separable

28. Text Omitted from or Supplied in the Transcription

  • <gap> indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible.
  • <supplied> signifies text supplied by the transcriber or editor for any reason, typically because the original cannot be read owing to physical damage to, or loss of, the original.

29. <gap> and <supplied> examples

expansion <gap reason="illegible" agent="water"/> river denominated
expansion <supplied reason="illegible" source="#SH1862">of the</supplied> river
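One common print convention brackets supplied text and marks a gap with an ellipsis. Here is a Python sketch of that rendering, assuming namespace-free markup; the bracket style is an assumed house convention rather than something the TEI prescribes, and the wrapping <s> element is added only to make each example parseable.

```python
import xml.etree.ElementTree as ET


def reading_text(elem):
    """Flatten elem, showing <supplied> as [text] and <gap> as [...]."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "supplied":
            parts.append("[" + reading_text(child) + "]")
        elif child.tag == "gap":
            parts.append("[...]")  # nothing to print but the omission
        else:
            parts.append(reading_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


gap = ET.fromstring(
    '<s>expansion <gap reason="illegible" agent="water"/> river denominated</s>'
)
supplied = ET.fromstring(
    '<s>expansion <supplied reason="illegible" source="#SH1862">of the'
    "</supplied> river</s>"
)
print(reading_text(gap))       # expansion [...] river denominated
print(reading_text(supplied))  # expansion [of the] river
```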

30. <gap> Example

 <head>Lectio x.</head>
 <p> Hic itaque paterfamilias ad excolendam <gap
    reason="not transcribed"/>
   congregare non desistit. </p>

31. More <supplied>

Where the transcriber considers that one or more words have been erroneously omitted in the original source and corrects this omission, the <supplied> element should be used in preference to <corr>.

by the ancient Dutch
navigators <supplied>of</supplied> the Tappan Zee

32. <supplied> Example

<p>Oblatus est <supplied reason="omitted" resp="#DC"> quia ipse
   voluit</supplied>. </p>

33. <damage>, <space>, and <unclear> Example

Revelabunt caeli
iniquitatem Judae et <damage agent="rubbing"/> consurget et <space/>
manifestum erit peccatum ipsius in die furoris do<unclear agent="rubbing" resp="#JC">mini</unclear> cum eis qui dixerunt
domino deo recede a nobis scientiam viarum tuarum nolumus

34. Damage and Illegibility

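Use <damage> if the text can be read with perfect confidence despite the damage; where the reading itself is uncertain, <unclear> is more appropriate. Damage extending over a longer stretch, such as a whole leaf, can be marked with <damageSpan>, whose spanTo attribute points to the end of the damaged stretch: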

 <pb n="5r"/>
 <damageSpan agent="rubbing" extent="whole leaf" spanTo="#damageEnd"/>
<p> .... </p>
<p> .... <pb n="5v" xml:id="damageEnd"/> .... </p>

35. Disjoint Damage

IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves
wh<damage group="1">ich inde</damage>nt the eastern <lb n="3"/>shore
of the <damage group="1">Hudson, at </damage>that broad <lb n="4"/>expansion <damage group="1">of the r</damage>iver denominated <lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators
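Because the fragments share group="1", software can reassemble them as a single physical event. A minimal Python sketch, assuming namespace-free markup and a wrapping <p> added for parsing:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def damage_groups(root):
    """Map each damage group value to its text fragments, in document order.

    <damage> elements without a group attribute are filed under None.
    """
    groups = defaultdict(list)
    for dmg in root.iter("damage"):
        groups[dmg.get("group")].append("".join(dmg.itertext()))
    return dict(groups)


text = ET.fromstring(
    '<p>IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>'
    'cious coves wh<damage group="1">ich inde</damage>nt the eastern '
    '<lb n="3"/>shore of the <damage group="1">Hudson, at </damage>that broad '
    '<lb n="4"/>expansion <damage group="1">of the r</damage>iver denominated '
    '<lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators</p>'
)
groups = damage_groups(text)
print(groups["1"])   # ['o', 'ich inde', 'Hudson, at ', 'of the r']
print(groups[None])  # ['nt']
```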

36. Original layout information

The TEI privileges the logical view, but does permit the physical view to ‘show through’ as empty milestone elements:
  • <gb> the start of a new gathering or quire
  • <pb> the start of a new page
  • <cb> the start of a new column
  • <lb> the start of a new written line

These are primarily useful to establish a reference system.
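For instance, the milestones can be turned into page/line citations. This Python sketch assumes namespace-free markup and treats <pb> and <lb> as empty milestones. The page number is hypothetical, and the text is adapted from the earlier Sleepy Hollow example with its damage markup omitted.

```python
import xml.etree.ElementTree as ET


def lines_by_reference(root):
    """Map (page n, line n) -> text of that written line.

    Assumes <pb n="..."/> and <lb n="..."/> are empty milestones; text seen
    before the first <lb/> of a page is filed under line None.
    """
    refs = {}
    page = line = None

    def add(text):
        if text:
            refs[(page, line)] = refs.get((page, line), "") + text

    def walk(node):
        nonlocal page, line
        if node.tag == "pb":
            page, line = node.get("n"), None
        elif node.tag == "lb":
            line = node.get("n")
        add(node.text)
        for child in node:
            walk(child)
            add(child.tail)  # the tail follows the child's milestone

    walk(root)
    # Normalise whitespace and drop entries that contain only spaces.
    return {k: " ".join(v.split()) for k, v in refs.items() if v.strip()}


leaf = ET.fromstring(
    '<div><pb n="5r"/><p><lb n="1"/>IN the bosom of one of those spa'
    '<lb n="2"/>cious coves which indent the eastern <lb n="3"/>shore</p></div>'
)
for ref, text in lines_by_reference(leaf).items():
    print(ref, text)
```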

The <fw> element can be used to mark ‘paratextual’ features such as running heads, foliation, etc.

The <handShift> element can be used to mark changes of hand or writing in a document.

37. Editorial phrase-level elements

A summary list of some of the more important phrase-level transcription elements might include:
  • Core module: <abbr>, <add>, <choice>, <corr>, <del>, <expan>, <gap>, <orig>, <reg>, <sic>, <unclear>
  • 'transcr' module: <am>, <damage>, <ex>, <metamark>, <mod>, <redo>, <restore>, <retrace>, <space>, <subst>, <supplied>, <surplus>, <transpose>, <undo>

38. How far will the TEI take us?

In particular, is the TEI scheme adequate for the needs of those transcribing ‘modern’ manuscripts?

  • surviving medieval or early modern manuscripts generally have a public function, and a more or less conventionalised (if complex) format
  • modern manuscripts or authorial drafts, however, often contain entirely private or idiosyncratic signs, with no clear communicative function

39. Text/Image

At all periods we find ‘playful’ texts whose meaning is conveyed by their documentary appearance as much as by their linguistic properties, or by the interplay between the two.

The TEI initially ruled such texts out of scope.

40. Difficult documents

41. More examples ...

42. Diary example

43. Critical Apparatus

A complex print format containing information whose structure it might be useful to encode... cf. dictionaries.

43.1. Critical Apparatus

Scholarly editions of texts, especially texts of great antiquity or importance, often record some or all of the known variations among different witnesses to the text. Witnesses to a text may include authorial or other manuscripts, printed editions of the work, early translations, or quotations of a work in other texts.

The TEI provides methods not only for encoding an existing critical apparatus, but also for marking up a text so that such an apparatus can be generated (without necessarily having to choose a base text).

Textual editing inevitably reflects a theoretical stance about what a text is, or should be. But there are many conflicting theories/traditions about the editing of texts:
  • Greg, Bowers, McKerrow, Tanselle et al.
  • Greetham, McGann, Shillingsburg ...
  • historisch-kritische Ausgabe (aka ‘The Germans’)
  • l'édition génétique (aka ‘The French’)

As facilitator of multiple theories, the TEI tries to avoid a theoretical stance, but rarely succeeds ...

43.2. Format of an apparatus

The format of an apparatus usually has several parts:
  • The location of the variant in the text (act, scene, line number)
  • The lemma, which is the portion of the text to which the note applies
  • A right bracket (]) or some other separator
  • The source from which the edition took its reading
  • A list of variants, in each case followed by the source in which the variant is found, and usually separated with a semicolon.

43.3. Apparatus Criticus

The standard Apparatus Criticus provides a concise method of recording the variants for any size of text. To take an example, a line in Hamlet might be printed as:
LAERTES. Alas, then she is drowned.
with a critical apparatus provided (usually at the foot of the page) which contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.

43.4. Critical Apparatus: <app>, <rdg>, and <lem>

  • <app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading.
  • <rdg> (reading) contains a single reading within a textual variation.
  • <lem> (lemma) contains the lemma, or base text, of a textual variation.

43.5. Parallel Segmentation Example

<app>
  <lem wit="#El">Experience though noon Auctoritee</lem>
  <rdg wit="#Hg">Experience thogh noon Auctorite</rdg>
  <rdg wit="#La">Experiment thouh noon Auctoritee</rdg>
  <rdg wit="#Ra2">Eryment though none auctorite</rdg>
</app>

43.6. Or apparatus at smaller granularity

<app>
  <lem wit="#El #Hg">Experience</lem>
  <rdg type="substantive" wit="#La">Experiment</rdg>
  <rdg type="substantive" wit="#Ra2">Eryment</rdg>
</app>
<app>
  <lem wit="#El #Ra2">though</lem>
  <rdg type="orthographic" wit="#Hg">thogh</rdg>
  <rdg type="orthographic" wit="#La">thouh</rdg>
</app>
<app>
  <lem wit="#El #La #Hg">noon</lem>
  <rdg type="orthographic" wit="#Ra2">none</rdg>
</app>
<app>
  <lem wit="#El #La">Auctoritee</lem>
  <rdg type="orthographic" wit="#Hg">Auctorite</rdg>
  <rdg type="orthographic" wit="#Ra2">auctorite</rdg>
</app>
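Fine-grained entries like these can be turned back into a conventional printed apparatus (lemma, bracket, sources, then variants separated by semicolons). A Python sketch, assuming namespace-free markup; the optional `types` filter is a hypothetical illustration of how the type attribute could restrict output to, say, substantive variants.

```python
import xml.etree.ElementTree as ET


def sigla(wit):
    """'#El #Hg' -> 'El Hg'."""
    return " ".join(w.lstrip("#") for w in wit.split())


def apparatus_entry(app, types=None):
    """Format one <app> as 'lemma] sources; variant sources; ...'.

    If `types` is given, only readings whose type is in it are listed.
    """
    pieces = []
    lem = app.find("lem")
    if lem is not None:
        lemma = "".join(lem.itertext()).strip()
        pieces.append(lemma + "] " + sigla(lem.get("wit", "")))
    for rdg in app.findall("rdg"):
        if types is None or rdg.get("type") in types:
            variant = "".join(rdg.itertext()).strip()
            pieces.append(variant + " " + sigla(rdg.get("wit", "")))
    return "; ".join(pieces)


line = ET.fromstring(
    '<l n="1">'
    '<app><lem wit="#El #Hg">Experience</lem>'
    '<rdg type="substantive" wit="#La">Experiment</rdg>'
    '<rdg type="substantive" wit="#Ra2">Eryment</rdg></app> '
    '<app><lem wit="#El #Ra2">though</lem>'
    '<rdg type="orthographic" wit="#Hg">thogh</rdg>'
    '<rdg type="orthographic" wit="#La">thouh</rdg></app>'
    "</l>"
)
for app in line.iter("app"):
    print(apparatus_entry(app))
# Experience] El Hg; Experiment La; Eryment Ra2
# though] El Ra2; thogh Hg; thouh La
```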

43.7. <listWit> and <witness>

  • <listWit> (witness list) lists definitions for all the witnesses referred to by a critical apparatus, optionally grouped hierarchically.
  • <witness> contains either a description of a single witness referred to within the critical apparatus, or a list of witnesses which is to be referred to by a single siglum.

Where more information about a witness is available, a full <msDesc> should be used instead of a simple <witness> description.

43.8. <listWit> example

<listWit>
 <witness xml:id="El">Ellesmere, Huntington Library 26.C.9</witness>
 <witness xml:id="Hg">Hengwrt, National Library of Wales,
   Aberystwyth, Peniarth 392D</witness>
 <witness xml:id="ms">Sole manuscript</witness>
 <witness xml:id="Ra2">Bodleian Library Rawlinson Poetic 149
   (see further <ptr target="#MSRP149"/>)</witness>
</listWit>

43.9. Nested <listWit>

Witnesses that are similar can be grouped together so that they can be referred to by a single siglum:
<listWit>
 <witness xml:id="Ellesmere">Ellesmere, Huntington Library 26.C.9</witness>
<!-- ... -->
 <listWit xml:id="Con">
  <head>Constant Group C</head>
  <witness xml:id="Cp">Corpus Christi Oxford MS 198 </witness>
  <witness xml:id="La">British Library Lansdowne 851 </witness>
  <witness xml:id="Sl2">British Library Sloane MS 1686 </witness>
 </listWit>
</listWit>
<!-- elsewhere -->
<rdg wit="#Con">Experiment</rdg>
refers to all these manuscripts.
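A processor therefore has to expand group sigla before it can, for example, count readings per manuscript. A minimal Python sketch, assuming namespace-free markup apart from xml:id, which ElementTree reports under the XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def expand_sigla(list_wit, wit):
    """Expand a wit attribute value into individual witness ids.

    A pointer to a nested <listWit> becomes the ids of every <witness>
    it contains; unknown pointers are passed through unchanged.
    """
    by_id = {el.get(XML_ID): el for el in list_wit.iter() if el.get(XML_ID)}
    result = []
    for ref in wit.split():
        target = by_id.get(ref.lstrip("#"))
        if target is None:
            result.append(ref.lstrip("#"))
        elif target.tag == "listWit":
            result.extend(w.get(XML_ID) for w in target.iter("witness"))
        else:
            result.append(target.get(XML_ID))
    return result


witnesses = ET.fromstring(
    "<listWit>"
    '<witness xml:id="Ellesmere">Ellesmere, Huntington Library 26.C.9</witness>'
    '<listWit xml:id="Con"><head>Constant Group C</head>'
    '<witness xml:id="Cp">Corpus Christi Oxford MS 198</witness>'
    '<witness xml:id="La">British Library Lansdowne 851</witness>'
    '<witness xml:id="Sl2">British Library Sloane MS 1686</witness>'
    "</listWit></listWit>"
)
print(expand_sigla(witnesses, "#Con"))             # ['Cp', 'La', 'Sl2']
print(expand_sigla(witnesses, "#Ellesmere #Con"))  # ['Ellesmere', 'Cp', 'La', 'Sl2']
```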

43.10. Alternative 1: Location Referenced Example

<div n="WBP" type="prologue">
 <head>The Prologe of the Wyves Tale of Bathe</head>
 <l n="1">Experience though noon Auctoritee</l>
 <l>Were in this world ...</l>
</div>
<!-- Elsewhere in Document: -->
<app loc="WBP 1">
 <rdg wit="#La">Experiment</rdg>
 <rdg wit="#Ra2">Eryment</rdg>
</app>


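Alternatively, the apparatus entry can be embedded at the point of variation, so that its location is implied:
<l n="1">Experience though noon Auctoritee
 <app>
  <rdg wit="#La"> Experiment</rdg>
  <rdg wit="#Ra2"> Eryment</rdg>
 </app>
</l>
<l>Were in this world ...</l>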

43.11. Alternative 2: Double End-Point Attachment Example

<div n="WBP" type="prologue">
 <head>The Prologe ... </head>
 <l n="1" xml:id="WBP.1">Experience<anchor xml:id="WBP-A2"/>
   though noon Auctoritee</l>
 <l>Were in this world ...</l>
</div>
<!-- Elsewhere in the same document -->
<app from="#WBP.1" to="#WBP-A2">
 <rdg wit="#La">Experiment</rdg>
 <rdg wit="#Ra2">Eryment</rdg>
</app>

43.12. Parallel Segmentation Example

<l n="1"><app>
  <rdg wit="#Chi3">Auctoritee, though none experience</rdg>
  <rdg wit="#El #Hg #La #Ra2">
   <app>
    <rdg wit="#El #Hg">Experience</rdg>
    <rdg wit="#La">Experiment</rdg>
    <rdg wit="#Ra2">Eryment</rdg>
   </app>
   <app>
    <rdg wit="#El #Ra2">though</rdg>
    <rdg wit="#Hg">thogh</rdg>
    <rdg wit="#La">thouh</rdg>
   </app>
   <app>
    <rdg wit="#El #Hg">noon Auctorite</rdg>
    <rdg wit="#La #Ra2">none auctorite</rdg>
   </app>
  </rdg>
</app></l>
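One benefit of parallel segmentation is that the full text of any witness can be read back out of the apparatus, even when entries are nested. A Python sketch, assuming namespace-free markup and that each witness is cited in exactly one reading per <app>:

```python
import xml.etree.ElementTree as ET


def _collect(elem, target):
    """Concatenate text, following only readings that cite `target`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            # Follow the reading (or lemma) whose wit lists this witness.
            for reading in child:
                if (reading.tag in ("lem", "rdg")
                        and target in reading.get("wit", "").split()):
                    parts.append(_collect(reading, target))
                    break
        else:
            parts.append(_collect(child, target))
        parts.append(child.tail or "")
    return "".join(parts)


def witness_text(elem, siglum):
    """Text of one witness; nested <app> elements are resolved recursively."""
    return " ".join(_collect(elem, "#" + siglum).split())


line = ET.fromstring(
    '<l n="1"><app>'
    '<rdg wit="#Chi3">Auctoritee, though none experience</rdg>'
    '<rdg wit="#El #Hg #La #Ra2">'
    '<app><rdg wit="#El #Hg">Experience</rdg><rdg wit="#La">Experiment</rdg>'
    '<rdg wit="#Ra2">Eryment</rdg></app> '
    '<app><rdg wit="#El #Ra2">though</rdg><rdg wit="#Hg">thogh</rdg>'
    '<rdg wit="#La">thouh</rdg></app> '
    '<app><rdg wit="#El #Hg">noon Auctorite</rdg>'
    '<rdg wit="#La #Ra2">none auctorite</rdg></app>'
    "</rdg></app></l>"
)
for siglum in ("El", "La", "Ra2", "Chi3"):
    print(siglum, witness_text(line, siglum))
```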

43.13. A Simple <app> With No <lem>

<ab> Populus domini et oves pascuae eius <app>
  <rdg
    wit="#CAO-B #CAO-V #CAO-R #CAO-D #CAO-F #CAO-S #Ely #Wor #Wcb">
venite adoremus eum</rdg>
  <rdg wit="#CAO-H #Pet"> venite adoremus deum</rdg>
  <rdg wit="#CAO-E #Alb2"> venite adoremus dominum</rdg>
  <rdg wit="#CAO-C #CAO-G #CAO-L #Hyd #Evm"> venite adoremus</rdg>
 </app>
</ab>

43.14. Attaching Notes Example

Virginite is grete <app>
 <rdg resp="#ES">perfecti<abbr>oi</abbr></rdg>
 <rdg resp="#FJF" xml:id="f105"> perfectio<expan>u</expan>n</rdg>
 <rdg resp="#PGR" xml:id="r105"> perfectiou<expan>n</expan></rdg>
</app>
<!-- ... <note> appearing elsewhere in the document ... -->
<note target="#r105 #f105">Furnivall's expansion implies that the bar is an abbreviation for 'u'. There are no certain instances of this mark as an abbreviation for 'u' in these MSS and it is widely used as an abbreviation for 'n'. Ruggiers' expansion is to be accepted.</note>

43.15. Hamlet example

Think back to the example given from Hamlet:
LAERTES. Alas, then she is drowned.
Where the traditional critical apparatus contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F;
Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.
How would you choose to mark it up in TEI?

43.16. How I'd do it (given time)

<l n="156">
 <app>
  <rdg wit="#Hib">Alas, then</rdg>
  <rdg wit="#F">Alas then,</rdg>
  <rdg wit="#Q3">Alas then</rdg>
  <rdg wit="#Q2">Alas, then,</rdg>
  <rdg wit="#Q1">So,</rdg>
 </app>
 <app>
  <rdg wit="#Hib #F #Q3">is she</rdg>
  <rdg wit="#Q2 #Q1">she is</rdg>
 </app>
 <app>
  <rdg wit="#Hib">drowned.</rdg>
  <rdg wit="#F">drown'd?</rdg>
  <rdg wit="#Q3 #Q2">drownd.</rdg>
  <rdg wit="#Q1">drownde:</rdg>
 </app>
</l>
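Whatever markup is chosen, it is worth checking that every <app> accounts for every witness exactly once. A Python sketch of such a check, assuming namespace-free markup and a flat apparatus (nested <app> elements would need their own witness lists); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_coverage(elem, sigla):
    """List (app number, missing, duplicated) for each faulty <app>.

    An empty list means every <app> places each witness in exactly one
    reading. Assumes a flat apparatus with no nested <app> elements.
    """
    problems = []
    for i, app in enumerate(elem.iter("app"), start=1):
        counts = Counter(
            w.lstrip("#")
            for reading in app
            if reading.tag in ("lem", "rdg")
            for w in reading.get("wit", "").split()
        )
        missing = sorted(set(sigla) - set(counts))
        duplicated = sorted(w for w, n in counts.items() if n > 1)
        if missing or duplicated:
            problems.append((i, missing, duplicated))
    return problems


line = ET.fromstring(
    '<l n="156">'
    '<app><rdg wit="#Hib">Alas, then</rdg><rdg wit="#F">Alas then,</rdg>'
    '<rdg wit="#Q3">Alas then</rdg><rdg wit="#Q2">Alas, then,</rdg>'
    '<rdg wit="#Q1">So,</rdg></app> '
    '<app><rdg wit="#Hib #F #Q3">is she</rdg>'
    '<rdg wit="#Q2 #Q1">she is</rdg></app> '
    '<app><rdg wit="#Hib">drowned.</rdg><rdg wit="#F">drown\'d?</rdg>'
    '<rdg wit="#Q3 #Q2">drownd.</rdg><rdg wit="#Q1">drownde:</rdg></app>'
    "</l>"
)
print(check_coverage(line, ["Hib", "F", "Q3", "Q2", "Q1"]))  # []
```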

43.17. Next

Any Questions? Next we're going to do an exercise!

Magdalena Turska @magdaturska. Date: February 2015
Copyright University of Oxford