
1. Transcription of primary sources

When creating an electronic edition, the primary source from which it originates will usually exhibit textual phenomena that the transcription must account for. For example:
  • Text which is abbreviated and/or expanded
  • Editorial correction and conjecture
  • Scribal additions, deletions, substitutions, or restorations
  • Text which an editor supplies or intentionally omits
  • Text which is damaged or illegible

1.1. Transcribable features

Which features of a primary source might one want to include in a transcription?

  • variant letter forms
  • page layout
  • orthography
  • capitalisation
  • word division
  • punctuation
  • abbreviations
  • additions and deletions
  • errors and omissions
  • regularizations

1.2. Elements defined for transcription work

Defined in 'core' module:
abbr add choice corr del expan gap sic
Defined in 'transcr' module:
addSpan am damage damageSpan delSpan ex facsimile fw handNotes handShift restore space subst supplied surface zone

1.3. <choice> reminder

  • <choice> (groups alternative editorial encodings)
  • Abbreviation:
    • <abbr> (abbreviated form)
    • <expan> (expanded form)
  • Errors:
    • <sic> (apparent error)
    • <corr> (corrected error)
  • Regularization:
    • <orig> (original form)
    • <reg> (regularized form)

1.4. Abbreviation

Abbreviations are highly characteristic of manuscript materials of all kinds. Western MSS traditionally distinguish:
Suspensions
the first letter or letters of the word are written, generally followed by a point or other marker: for example, e.g. for exempli gratia
Contractions
both first and last letters are written, generally with some other mark of abbreviation such as a superscript stroke, or, less commonly, a point or points: e.g. Mr. for Mister
Brevigraphs
Special signs or tittels, such as the Tironian nota used for ‘et’, the letter p with a barred tail commonly used for per, the letter c with a circumflex used for cum (ĉ) etc.
Superscripts
Superscript letters (vowels or consonants) are often used to indicate various kinds of contraction: e.g. w followed by superscript ch for which.
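
As an illustration of the last category, the 'which' abbreviation could be sketched as follows. This is only one possible encoding: the use of <hi> and the rend value "superscript" are assumed project conventions, not something prescribed by the text above.

```xml
<!-- Sketch only: rend="superscript" is an assumed, project-specific value -->
<choice>
 <!-- the abbreviation as written: w followed by superscript ch -->
 <abbr>w<hi rend="superscript">ch</hi></abbr>
 <!-- the word the abbreviation stands for -->
 <expan>which</expan>
</choice>
```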

1.5. Abbreviation and Expansion

A manuscript abbreviation may be viewed in two ways:
  • One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’
  • One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’

Both of these views may be recorded simultaneously in TEI.

1.6. Two Levels of Encoding Abbreviations

TEI proposes two levels of encoding:
  • the whole of an abbreviated word and the whole of its expansion: <abbr> and <expan>
  • abbreviatory signs or characters and the ‘invisible’ characters they imply: <am> and <ex>

1.7. <ex> and <am>

Using these elements, from the 'transcr' module, a transcriber may indicate the status of the individual letters or signs within both the abbreviation and the expansion.
  • <ex> (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
  • <am> (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
Previously, encoders re-purposed existing elements such as <hi> and <supplied> to mark individual letters or signs in abbreviations and expansions. The P5 elements <am> and <ex> are the TEI's recommended way of standardizing this markup.

1.8. Brevigraph Example

The Old Icelandic word ‘hann’ (‘he’) is usually written as a brevigraph in medieval manuscripts, combining the letter h with a horizontal stroke representing nasalisation (Unicode character U+0305, COMBINING OVERLINE, functionally similar to the modern tilde). In plain text it can be approximated as h̅.

1.9. Encoding abbreviations

Depending on editorial policy, we might represent this combination in any one of the following ways:
<abbr>h&#x305;</abbr> or
<abbr></abbr>
<expan>hann</expan>
h<am>&#x305;</am>
h<ex>ann</ex>
<abbr>h<am>&#x305;</am></abbr>
<expan>h<ex>ann</ex></expan>

1.10. <abbr> and <expan> Examples

eu<g ref="#er">er</g>y <g ref="#per">per</g>sone that loketh after heven hath a place in this ladder
<abbr>ev<g ref="#er">er</g>y</abbr>
<abbr>
 <g ref="#per">per</g>sone
</abbr> ...
<expan>every</expan>
<expan>persone</expan> ...
<choice>
 <abbr>ev<g ref="#er">er</g>y</abbr>
 <expan>every</expan>
</choice>

1.11. Brevigraph Alternative

We could also indicate multiple alternatives (at either level) by using the <choice> element

h<choice>
 <am>&#x305;</am>
 <ex>ann</ex>
</choice>

or

<choice>
 <abbr>h&#x305;</abbr>
 <expan>hann</expan>
</choice>

1.12. Classifying abbreviations

The type attribute on <abbr> classifies an abbreviation, so that the same markup can be rendered differently for different classes or in different contexts.
<choice>
 <abbr type="susp">k<am>&#x307;</am></abbr>
 <expan>k<ex>onungr</ex></expan>
</choice>
<choice>
 <abbr type="tittel">ml<am>&#x305;</am>i</abbr>
 <expan>m<ex>æl</ex>l<ex>t</ex>i</expan>
</choice>
One possible rendering marks the expansion of the suspension with parentheses and expands the tittel silently:
k(onungr) mællti

As elsewhere, the resp and cert attributes can also be used to indicate who is responsible for an expansion, and the degree of certainty attached to it.
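
A minimal sketch of how this might look for the suspension above. The identifier ed1 and the content of the <respStmt> are hypothetical placeholders, and the cert value is chosen purely for illustration.

```xml
<choice>
 <abbr type="susp">k<am>&#x307;</am></abbr>
 <!-- resp points to whoever proposed the expansion; cert records confidence in it -->
 <expan resp="#ed1" cert="medium">k<ex>onungr</ex></expan>
</choice>
<!-- elsewhere, for example in the TEI header (hypothetical entry) -->
<respStmt xml:id="ed1">
 <resp>expansion of abbreviations</resp>
 <name>(name of the editor)</name>
</respStmt>
```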

1.13. More Abbreviation and Expansion Examples (1)

<abbr>SPQR</abbr>

or

<expan>senatus populusque romanus</expan>

1.14. More Abbreviation and Expansion Examples (2)

or

<choice>
 <abbr>SPQR</abbr>
 <expan>senatus populusque romanus</expan>
</choice>

or

<choice>
 <abbr>SPQR</abbr>
 <choice>
  <expan>senatus populusque romanus</expan>
  <expan>Sebastian Patrick Quintus Rahtz</expan>
 </choice>
</choice>

1.15. Earlier Examples Now With <am> and <ex>

<abbr>ev<am>
  <g ref="#er"/>
 </am>y</abbr>
<abbr>
 <am>
  <g ref="#per"/>
 </am>sone
</abbr> ...
or
<expan>ev<ex>er</ex>y</expan>
<expan>
 <ex>per</ex>sone
</expan> ...

1.16. And one more...

ev<choice>
 <am>
  <g ref="#er"/>
 </am>
 <ex>er</ex>
</choice>y <choice>
 <am>
  <g ref="#per"/>
 </am>
 <ex>per</ex>
</choice>sone ...

1.17. Corrections and emendations

The <sic> element can be used to indicate that the reading of the manuscript is erroneous or nonsensical, while <corr> (correction) can be used to provide what in the editor's opinion is the correct reading:
<sic>giorit</sic>
<corr>giorir</corr>
Alternatively, they may be combined within a <choice> element, thus allowing the possibility of providing multiple corrections:
<choice>
 <sic>giorit</sic>
 <corr cert="high">giorir</corr>
 <corr cert="low">gioret</corr>
</choice>

1.18. Correction/Conjecture Examples

Nos autem iam ostendimus quod nutrimentum
et <choice>
 <sic>angues</sic>
 <corr>augens</corr>
</choice>.

1.19. Normalization

Source texts rarely use modern normalized orthography. For retrieval and other processing purposes, it may be useful to record a normalized form in the transcription. The <reg> (regularized) element is used to mark a normalized form; the <orig> (original) element marks a non-standard spelling. These elements can optionally be grouped as alternatives using the <choice> element:

1.20. Normalization example

<lg>
 <l>There was an Old Woman,</l>
 <l>
  <choice>
   <orig>Liv'd</orig>
   <reg>Lived</reg>
  </choice> under a hill,</l>
 <l>And if she <orig>'int</orig> gone,</l>
 <l>She lives there still.</l>
</lg>

1.21. Additions, deletions, and substitutions

Alterations made to the text, whether by the scribe or in some later hand, can be encoded using <add> (addition) or <del> (deletion).

Where the addition and deletion are regarded as a single substitution, they can be grouped together using the <subst> (substitution) element. In summary:
  • <add> (addition) or <del> (deletion) are used for evident alterations in the source
  • a combined addition and deletion may be marked using <subst> (substitution)

1.22. Additions and Deletions

  • <add> (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
  • <addSpan/> (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also <add>).
  • <del> (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
  • <delSpan/> (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector.

1.23. <add> and <del> Examples

by the ancient Dutch navigators <del rend="strikethrough" hand="#WI">of these waters</del> the Tappaan Zee, and where they
always <add hand="#WI" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI">Washington Irving holograph</handNote>

1.24. <addSpan> and <delSpan>

These two elements delimit a span of text by pointing mechanisms rather than by enclosing it. This is useful if an addition or deletion overlaps another span of text.

spanTo indicates the end of a span initiated by the element bearing this attribute.

<addSpan spanTo="#id4"/>
<!-- added text -->
<anchor xml:id="id4"/>

1.25. Substitutions

<subst> (substitution) groups one or more deletions with one or more additions when the combination is to be regarded as a single intervention in the text. Examples:
  • one word/letter written over another
  • one word/letter deleted, replaced by another written above it by the same hand at one time
  • one word/letter deleted, replaced by a different hand some other time
  • a long chain of substitutions on the one stretch of text, with uncertainty as to the order of substitution and as to which of many possible readings should be preferred

1.26. <subst> Examples

<l>
 <delSpan rend="verticalStrike" spanTo="#delend02"/> Tis moonlight <subst>
  <del>upon</del>
  <add>over</add>
 </subst> Oman's sky
</l>
<l>Her isles of pearl look lovelily<anchor xml:id="delend02"/>
</l>

By default, the deletion is assumed to precede the addition, but this may be overridden by means of the seq attribute, which indicates the sequence of the interventions.

One must have lived longer with <subst>
 <del seq="1">this</del>
 <del seq="2">
  <add seq="1">such a</add>
 </del>
 <add seq="2">a</add>
</subst> system, to appreciate its advantages.

1.27. Another <subst> example

<l>And towards our distant rest began to trudge,</l>
<l>
 <subst>
  <del>Helping the worst amongst us</del>
  <add>Dragging the worst amongt us</add>
 </subst>, who'd no boots
</l>
<l>But limped on, blood-shod. All went lame; <subst>
  <del status="shortEnd">half-</del>
  <add>all</add>
 </subst> blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped behind.</l>

1.28. Cancellation of Deletions and Other Markings

<restore> indicates restoration of text to an earlier state by cancellation of an editorial or authorial marking or instruction.

by the ancient Dutch navigators <restore hand="#WI2">
 <del rend="strikethrough" hand="#WI2">of these waters</del>
</restore> the Tappaan Zee, and where they always <add hand="#WI2" place="supralinear">prudently</add> shortened sail ...
<handNote xml:id="WI2">Washington Irving
holograph</handNote>

1.29. <restore>

Suppose that in ‘For I hate this my body’ the word ‘my’ was first deleted and then restored by writing ‘stet’ in the margin. This may be encoded:
For I hate this
<restore hand="#dhl" type="marginalStetNote">
 <del>my</del>
</restore>
body

1.30. Text Omitted from or Supplied in the Transcription

  • <gap> indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible.
  • <supplied> signifies text supplied by the transcriber or editor for any reason, typically because the original cannot be read owing to physical damage or loss.

1.31. Supplied text

Sometimes, a transcript may need to include words not visibly present in the source:
  • because the carrier has been damaged or is barely legible
  • because of (assumed) scribal error
The <supplied> element is provided for use in either situation; the reason attribute is used to distinguish them.
…Dragging the worst
among<supplied reason="omitted">s</supplied>t us…

1.32. Metadata for supplied text

Attributes resp and cert can be used here as elsewhere. A source attribute is also available to indicate that another witness supports the reconstruction:
<p>ath þeir <supplied reason="omitted" source="AM02-152">mundu</supplied> sundr ganga</p>
When missing text cannot be confidently reconstructed, the <gap> element should be used. Its reason attribute explains the reason for the omission and its extent and unit attributes indicate its presumed size.
<gap reason="damage" extent="7" unit="cm"/>

1.33. <gap> and <supplied> examples

expansion <gap reason="illegible" agent="water"/> river denominated
expansion <supplied reason="illegible" source="#SH1862">of the</supplied> river denominated

1.34. <gap> Example

<div>
 <head>Lectio x.</head>
 <p> Hic itaque paterfamilias ad excolendam
 <gap
    extent="20"
    unit="words"
    reason="not transcribed"
    resp="#DC"/>

   congregare non desistit.
 </p>
</div>

1.35. More <supplied>

Where the transcriber considers that one or more words have been erroneously omitted in the original source and corrects this omission, the <supplied> element should be used in preference to <corr>.

by the ancient Dutch navigators
<supplied>of</supplied> the Tappan Zee

1.36. <supplied> Example

<p>Oblatus est
<supplied reason="omitted" resp="#DC"> quia ipse voluit</supplied>.
</p>

1.37. Damage and Illegibility

Use <unclear> if the text has been rendered partly illegible by deletion or damage, so that it can be read but not with perfect confidence.

Use the reason attribute to state the cause (damage, deletion, etc.) of the uncertainty in transcription and the cert attribute to indicate the confidence in the transcription.

shore of the <unclear reason="damage" cert="medium">Hudson, at</unclear> that broad

1.38. <damage>, <space>, and <unclear> Example

Revelabunt caeli iniquitatem Judae et <damage agent="rubbing"/> consurget et <space/> manifestum erit peccatum ipsius in die furoris do<unclear agent="rubbing" resp="#JC">mini</unclear> cum eis qui dixerunt domino deo recede a nobis scientiam viarum tuarum nolumus

1.39. Damage and Illegibility

Use <damage> if the text is damaged but can still be read with perfect confidence.

<p>
<!-- ... -->
 <pb n="5r"/>
 <damageSpan agent="rubbing" extent="whole leaf" spanTo="#damageEnd"/>
</p>
<p> .... </p>
<p> .... <pb n="5v" xml:id="damageEnd"/>
</p>

1.40. Disjoint Damage

IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves wh<damage group="1">ich inde</damage>nt the eastern <lb n="3"/>shore of the <damage group="1">Hudson, at </damage>that broad <lb n="4"/>expansion <damage group="1">of the r</damage>iver denominated <lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators

1.41. <fw>

<fw> (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page.
<fw place="top-centre" type="head">Poëms.</fw>
<fw place="top-right" type="pageno">29</fw>
<fw place="bot-centre" type="sig">E3</fw>
<fw place="bot-right" type="catch">TEMPLE</fw>

1.42. <handNote> and <handShift>

The <handNote> element is used to provide information about each hand distinguished within the encoded document.

  • When the 'transcr' module is used, the element <handNotes> is available, within the <profileDesc> element of the Header, to hold one or more <handNote> elements. (brief)
  • When the 'msdescription' module is included, the <handDesc> element also becomes available as part of a structured manuscript description. (more robust)

It is possible to use the two elements together if, for example, the <handDesc> element contains a single summary describing all the hands discursively, while the <handNotes> element gives specific details of each.
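
A sketch of that combination, assuming both modules are loaded. The identifiers hA and hB, the hands count, and the wording of the summary paragraph are hypothetical and only for illustration.

```xml
<!-- inside the manuscript description ('msdescription' module) -->
<physDesc>
 <handDesc hands="2">
  <!-- one discursive summary covering all hands -->
  <p>Written in two hands: a careful copperplate and a pencil scrawl.</p>
 </handDesc>
</physDesc>
<!-- inside the profileDesc of the header ('transcr' module) -->
<profileDesc>
 <handNotes>
  <!-- specific details of each hand, each with its own identifier -->
  <handNote xml:id="hA" script="copperplate">Carefully written with regular descenders</handNote>
  <handNote xml:id="hB" medium="pencil">Unschooled scrawl</handNote>
 </handNotes>
</profileDesc>
```

Keeping the identifiers on the <handNote> elements means that hand attributes and <handShift> elements in the transcription can point at them.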

1.43. <handShift>

<handShift> marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.

<l>When wolde the cat dwelle in his ynne</l>
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and gaye</l>

1.44. <handShift> Example

<handNotes>
 <handNote xml:id="h1" script="copperplate">Carefully written with regular descenders</handNote>
 <handNote xml:id="h2" medium="pencil">Unschooled scrawl</handNote>
</handNotes>
<handShift new="#h1" resp="#das"/>... and that good Order Decency and
regular worship may be once more introduced and Established in this Parish according to
the Rules and Ceremonies of the Church of England and as under a good Consciencious and
sober Curate there would and ought to be <handShift new="#h2" resp="#das"/> and
for that purpose the parishioners pray

1.45. hand, resp, cert

<add place="supra" hand="#WJ" cert="medium"> But</add>
<choice>
 <sic>One</sic>
 <corr resp="#FB" cert="high">one</corr>
</choice> must have lived ...
<!-- elsewhere -->
<respStmt xml:id="FB">
 <resp>editorial changes</resp>
 <name>Fredson Bowers</name>
</respStmt>
<respStmt xml:id="WJ">
 <resp>authorial changes</resp>
 <name>William James</name>
</respStmt>


Dot Porter. Date: July 2009
Copyright University of Oxford