1. Transcription: a special kind of reading
What is the goal of a transcription?
- to make a primary source accessible ...
- ... and comprehensible
- which may imply adding or using much additional information
Hence,
- all transcription is selective
- all transcription is imaginative
2. Transcription
In transcription for the digital edition, some textual phenomena which commonly
attract editorial attention:
- original layout information
- abbreviations or other arcana
- ‘evident’ errors which invite correction or conjecture
- scribal additions, deletions, substitutions, restorations
- non-standard orthography (etc.) which invites normalisation
- irrelevant or non-transcribable material
- passages which are damaged or illegible
3. TEI Transcription
- <teiHeader>: provides metadata for the whole thing, at various
levels, notably including a <msDesc>
- <text>: contains a structured reading of a document's
intellectual content ... its ‘text’
- <facsimile>: organizes a set of page images representing a
document
- <sourceDoc>: a non-interpretative transcription of a physical
document, e.g. for a dossier
génétique
Does a transcription encode a ‘text’ or a ‘document’?
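As a rough sketch of how these four parts can sit together in one file (the comments are placeholders, so this skeleton is illustrative rather than valid as it stands):
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<!-- metadata for the whole thing, including a <msDesc> -->
</teiHeader>
<facsimile>
<!-- page images representing the document -->
</facsimile>
<sourceDoc>
<!-- non-interpretative transcription of the physical document -->
</sourceDoc>
<text>
<!-- structured reading of the intellectual content -->
</text>
</TEI>
Not every edition needs all four; which ones are present already says something about whether the encoder is thinking of a ‘text’ or a ‘document’.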
4. What's this ?
- ‘agreable’ is struck-through, ‘pleasing’ is written above
it, in the interlinear space.
- ‘agreable’ is deleted and replaced by ‘pleasing’
- Originally, the text read ‘agreable’, but at some subsequent
stage this word was deleted; the word ‘pleasing’ was added in the
same context.
5. A typical minimal encoding
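A minimal encoding of the ‘agreable’/‘pleasing’ alteration might look like the following sketch. The rend and place values are illustrative assumptions taken from the first description on the previous slide; grouping the two in <subst> reflects the second description.
<!-- assumed values: rend and place follow the description of the page -->
<subst>
<del rend="strikethrough">agreable</del>
<add place="above">pleasing</add>
</subst>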
6. Character annotation
- distinguish allographical forms of a letter
- represent non-standard characters
<g> element (character or glyph) represents a glyph, or a non-standard
character
Paradi<g ref="#long-s-glyph">s</g>e Lo<g ref="#long-s-glyph">s</g>t
<g ref="#per-glyph">per</g>
and the referenced glyph should be defined in the <charDecl>
element in the header
<glyph xml:id="long-s-glyph">
</glyph>
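A fuller declaration might look like this sketch. The <desc> wording and the mapping are assumptions for illustration; the xml:id is the one referenced above.
<charDecl>
<glyph xml:id="long-s-glyph">
<!-- description and mapping are illustrative assumptions -->
<desc>long s, an allograph of s</desc>
<mapping type="standard">s</mapping>
</glyph>
</charDecl>
A processor can then fall back on the standard mapping wherever the glyph itself cannot be displayed.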
7. Abbreviations &c.
In Western MSS, we commonly distinguish :
- Suspensions
- the first letter or letters of the word are written, generally
followed by a point : for example e.g. for exempli
gratia
- Contractions
- both first and last letters are written, generally with some mark
of abbreviation such as superscript strokes, or points : e.g.
Mr. for Mister
- Brevigraphs
- Special signs such as the Tironian nota used for
‘et’, the letter p with a barred tail used for
per, the letter c with a circumflex used for
cum etc.
- Superscripts
- Superscript letters (vowels or consonants) used to indicate
various kinds of contraction: e.g. w followed by
superscript ch for which.
Most of the symbols needed are available in Unicode, though not
necessarily in all fonts.
8. Abbreviation and Expansion
An abbreviation may be viewed in two different ways:
- as a particular sequence of letters or marks upon the page: thus,
a ‘p with a bar through the descender’, a ‘superscript hook’, a
‘macron’
- as another way of representing the letter or letters it is
believed to be standing for: thus, ‘per’, ‘re’, ‘n’
9. Two Levels of Encoding Abbreviations
TEI proposes elements for two levels of encoding:
- the whole of an abbreviated word and the whole of its expansion
may be marked using <abbr> and <expan>
respectively
- abbreviatory signs or characters and the
‘invisible’ characters they imply may be
marked using <am> and <ex> respectively
10. Compare ev(er)y (per)sone
ev<choice>
<am>
<g ref="#b-er"/>
</am>
<ex>er</ex>
</choice>y
<choice>
<am>
<g ref="#b-per"/>
</am>
<ex>per</ex>
</choice>sone
...
<choice>
<abbr>ev<am>
<g ref="#b-er"/>
</am>y</abbr>
<expan>ev<ex>er</ex>y</expan>
</choice>
<choice>
<abbr>
<am>
<g ref="#b-per"/>
</am>sone</abbr>
<expan>
<ex>per</ex>sone</expan>
</choice>
11. <ex> and <am>
Using these elements, from the 'transcr' module, a transcriber may indicate
the status of the individual letters or signs within both the abbreviation
and the expansion.
- <ex> (editorial expansion) contains a sequence of letters
added by an editor or transcriber when expanding an
abbreviation.
- <am> (abbreviation marker) contains a sequence of letters or
signs present in an abbreviation which are omitted or replaced in
the expanded form of the abbreviation.
Previously, people re-purposed existing elements such as
<hi> and <supplied> to mark individual letters or signs in
abbreviations and expansions. The P5 elements
<am> and <ex> are the TEI's attempt to meet this need.
12. A simple example (1)
Editorial strategy may be simply to note that we have expanded the
abbreviations:
<p>
<lb/>Cours chacune piece <expan>pour</expan>
<lb/>
<expan>cinquante</expan> soubz
<expan>tournois</expan>
<pc>.</pc>
</p>
13. A simple example (2)
As you noticed, ‘pour’ was actually written ‘po’ followed by an
‘r’ subscript; ‘cinquante’ as ‘cinquāte’ with
a macron on the ‘a’ to indicate nasalisation.
We could therefore encode as follows:
<p>
<abbr>po&#xFFFD;</abbr> .... <abbr>cinqu&#x0101;te</abbr>
</p>
... or we could choose one of the following styles:
<p> po<am>&#xFFFD;</am> ... or po<ex>u</ex>r </p>
<abbr>po<am>&#xFFFD;</am>
</abbr>
<expan>po<ex>u</ex>r</expan>
14. Simple example (3)
And of course TEI permits both cake and the eating of it:
<p> po<choice>
<am>&#xFFFD;</am>
<ex>ur</ex>
</choice>
</p>
<choice>
<abbr>po<am>&#xFFFD;</am>
</abbr>
<expan>po<ex>u</ex>r</expan>
</choice>
15. Choice
Children of a choice element all represent alternative ways of
encoding the same sequence and in most cases they are mutually
exclusive.
<choice>
<abbr>Dat</abbr>
<expan>Dat<ex>ae</ex>
</expan>
</choice>
Where the purpose of an encoding is to record textual variants, rather than
to identify multiple possible encoding decisions, the <app> element and
its associated elements should be preferred.
<app>
<rdg>
<expan>Dat<ex>ae</ex>
</expan>
</rdg>
<rdg>
<expan>Dat<ex>um</ex>
</expan>
</rdg>
</app>
16. A <choice> reminder
- <choice> (groups alternative editorial encodings)
- Abbreviation:
- <abbr> (abbreviated form)
- <expan> (expanded form)
- Errors:
- <sic> (apparent error)
- <corr> (corrected error)
- Regularization:
- <orig> (original form)
- <reg> (regularized form)
17. Classifying abbreviations
The type attribute on <abbr> is a useful way of
categorising abbreviations, whether for statistical purposes, or to allow
for different types to be rendered differently:
<choice>
<abbr type="brevigraphe">po<am>&#xFFFD;</am>
</abbr>
<expan>po<ex>u</ex>r</expan>
</choice> en <choice>
<abbr type="suspension">fin<am>.</am>
</abbr>
<expan>fin<ex>ir</ex>
</expan>
</choice>
This encoding might be displayed as :
po(u)r en finir
As elsewhere, the resp and cert attributes can also be
used to indicate who is responsible for an expansion, and the degree of
certainty attached to it.
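For instance, a sketch of a suspension with both attributes on the expansion; #LB is a hypothetical pointer to a person identified in the header, and the cert value is an assumption:
<choice>
<abbr type="suspension">fin<am>.</am>
</abbr>
<!-- #LB: hypothetical identifier of whoever is responsible for the expansion -->
<expan resp="#LB" cert="medium">fin<ex>ir</ex>
</expan>
</choice>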
18. Corrections and emendations
The <sic> element can be used to indicate that the reading of the
manuscript is erroneous or nonsensical, while <corr> (correction) can
be used to provide what in the editor's opinion is the correct reading:
<sic>relea</sic>
<corr>relicta</corr>
The two may, of course, be combined within a <choice> element:
<choice>
<sic>relea</sic>
<corr cert="high">relicta</corr>
<corr cert="low">relatio</corr>
</choice>
19. Normalization
Source texts rarely use modern orthography. For retrieval and other
processing reasons, however, the modernized form may be very useful. The
<reg> (regularized) element is available to mark a
normalized form; the <orig> (original) element to indicate a
non-standard spelling. These elements can optionally be grouped as
alternatives using the <choice> element:
20. Normalisation example
<lb/>dix <choice>
<orig>huict</orig>
<reg>huit</reg>
</choice> grains
<choice>
<orig>troys
quartz</orig>
<reg>trois-quart</reg>
</choice>
In this case, a further semantic regularisation is possible :
<lb/>
<measure quantity="18.75" unit="gr">dix
<choice>
<orig>huict</orig>
<reg>huit</reg>
</choice> grains
<choice>
<orig>troys
quartz</orig>
<reg>trois-quart</reg>
</choice>
</measure>
21. Additions, deletions, substitutions, and modifications
Alterations made to the text, whether by the scribe or in some later hand,
can be encoded using <add> (addition) or <del> (deletion).
Where the addition and deletion are regarded as a single
substitution, they can be grouped together using the
<subst> (substitution) element :
- <add> (addition) or <del> (deletion) are used for
evident alterations in the source
- a combined addition and deletion may be marked using
<subst> (substitution)
- <mod> (modification) represents any kind of general
modification without interpretation
22. A substitution?
<subst>
<del>half-</del>
<add>all</add>
</subst> blind
23. More context for Wilfred Owen
<l>And towards our distant rest began to trudge,</l>
<l>
<subst>
<del>Helping the worst amongst us</del>
<add>Dragging the worst
amongt us</add>
</subst>, who'd no boots
</l>
<l>But limped on, blood-shod. All went lame; <subst>
<del status="shortEnd">half-</del>
<add>all</add>
</subst>
blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped
behind.</l>
24. Semi-legible text
Use <unclear> if the text is partly illegible i.e. it can be read but
without perfect confidence. The reason attribute here states the
cause of the uncertainty in transcription.
I <subst>
<add place="above">might</add>
<del>
<unclear reason="overinking" cert="medium" resp="#LDB">should</unclear>
</del>
</subst> have
25. Supplied and damaged text
Use the <supplied> element if the transcriber has provided a reading
not actually visible in the text, whether because of damage or scribal error
: reason here indicates why the text has been supplied.
…Dragging the worst
among<supplied reason="authorialError">s</supplied>t us…
Use the <damage> element to record the existence of physical damage to
the document, whether or not the damaged text is readable :
<l>The Moving Finger wri<damage agent="water" group="1">tes; and</damage> having
writ,</l>
<l>Moves <damage agent="water" group="1">
<supplied>on: nor
all your</supplied>
</damage> Piety nor Wit</l>
26. Lacunae
When missing text cannot be confidently supplied or is intentionally omitted,
the <gap> element should be used, with a reason attribute to explain
why. Its extent or quantity attribute, together with unit, can
indicate its size.
<gap reason="wormhole" extent="7" unit="mm"/>
I am dr Sr yr <gap reason="illegible" quantity="3" unit="word"/> Sydney Smith
27. Some difficulties
These methods are perfectly adequate where variation is comparatively simple.
They rapidly encounter problems when:
- overlap happens (as it always does)
- the sequence of interventions is important or indeterminate
- the layout and the meaning of the writing are not easily
separable
28. Text Omitted from or Supplied in the Transcription
- <gap> indicates a point where material has been omitted in a
transcription, whether for editorial reasons described in the TEI
header, as part of sampling practice, or because the material is
illegible or inaudible.
- <supplied> signifies text supplied by the transcriber or
editor for any reason, typically because the original cannot be read
because of physical damage or loss to the original.
29. <gap> and <supplied> examples
expansion <gap reason="illegible" agent="water"/> river denominated
expansion <supplied reason="illegible" source="#SH1862">of the</supplied> river
denominated
30. <gap> Example
<div>
<head>Lectio x.</head>
<p> Hic itaque paterfamilias ad excolendam <gap
extent="20"
unit="words"
reason="not transcribed"
resp="#DC"/>
congregare non desistit. </p>
</div>
31. More <supplied>
Where the transcriber considers that one or more words have been erroneously
omitted in the original source and corrects this omission, the
<supplied> element should be used in preference to
<corr>.
by the ancient Dutch
navigators <supplied>of</supplied> the Tappan Zee
32. <supplied> Example
<p>Oblatus est <supplied reason="omitted" resp="#DC"> quia ipse
voluit</supplied>. </p>
33. <damage>, <space>, and <unclear> Example
Revelabunt caeli
iniquitatem Judae et <damage agent="rubbing"/> consurget et <space/>
manifestum erit peccatum ipsius in die furoris do<unclear agent="rubbing" resp="#JC">mini</unclear> cum eis qui dixerunt
domino deo recede a nobis scientiam viarum tuarum nolumus
34. Damage and Illegibility
Use <damage>, or <damageSpan> for a longer stretch, if the damaged text can still be read with perfect confidence:
<p>
<pb n="5r"/>
<damageSpan agent="rubbing" extent="whole leaf" spanTo="#damageEnd"/>
</p>
<p> .... </p>
<p> .... <pb n="5v" xml:id="damageEnd"/>
</p>
35. Disjoint Damage
IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves
wh<damage group="1">ich inde</damage>nt the eastern <lb n="3"/>shore
of the <damage group="1">Hudson, at </damage>that broad <lb n="4"/>expansion <damage group="1">of the r</damage>iver denominated <lb n="5"/>by the ancie<damage>nt</damage> Dutch navigators
36. Original layout information
The TEI privileges the logical view, but does permit the physical view to
‘show through’ as empty milestone elements :
- <gb> the start of a new gathering or quire
- <pb> the start of a new page
- <cb> the start of a new column
- <lb> the start of a new written line
These are primarily useful to establish a reference system.
The <fw> element can be used to mark ‘paratextual’
features such as running heads, foliation etc.
The <handShift> element can be used to mark changes of hand or writing
in a document.
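A sketch of how these elements might appear together at the top of a page; the page number, running head, line text, and the #h2 hand identifier are all invented for illustration:
<!-- all values below are hypothetical -->
<pb n="12v"/>
<fw type="header" place="top">Liber secundus</fw>
<cb n="a"/>
<lb n="1"/>first written line of the column
<lb n="2"/>second written line <handShift new="#h2"/>continued in another hand
Because the milestones are empty, they can cut across the logical structure (paragraphs, verse lines) without causing overlap.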
37. Editorial phrase-level elements
A summary list of some of the more important phrase-level transcription
elements might include:
- Core module: <abbr>, <add>, <choice>,
<corr>, <del>, <expan>, <gap>,
<orig>, <reg>, <sic>,
<unclear>
- 'transcr' module: <am>, <damage>, <ex>,
<metamark>, <mod>, <redo>,
<restore>, <retrace>, <space>, <subst>,
<supplied>, <surplus>, <transpose>,
<undo>
38. How far will the TEI take us ?
In particular, is the TEI scheme adequate for the needs of those transcribing
‘modern’ manuscripts ?
- surviving medieval or early modern manuscripts generally have a public
function, and a more or less conventionalised (if complex) format
- modern manuscripts or authorial drafts however often contain entirely
private or idiosyncratic signs, with no clear communicative
function
39. Text/Image
At all periods we find ‘playful’ texts whose meaning is conveyed by
their documentary appearance as much as by their linguistic properties, or
by the interplay between the two.
The TEI initially ruled such texts out of scope.
43. Critical Apparatus
A complex print format containing information whose
structure it might be useful to encode... cf. dictionaries.
43.1. Critical Apparatus
Scholarly editions of texts, especially texts of great antiquity
or importance, often record some or all of the known variations
among different witnesses to the text. Witnesses to a text may
include authorial or other manuscripts, printed editions of the
work, early translations, or quotations of a work in other texts.
The TEI provides methods for encoding not only an existing
critical apparatus, but also ways to mark up a text so that such
an apparatus can be generated (without the limitations of
necessarily choosing a base text).
Textual editing inevitably reflects a theoretical stance about what
a text is, or should be. But there are many conflicting theories/traditions about
the editing of texts:
- Greg, Bowers, McKerrow, Tanselle et al.
- Greetham, McGann, Shillingsburg ...
- historisch-kritische Ausgabe (aka ‘The Germans’)
- l'édition génétique (aka ‘The French’)
As facilitator of multiple theories, the TEI tries to avoid a
theoretical stance, but rarely succeeds ...
43.2. Format of an apparatus
The format of an apparatus usually has several parts:
- The location of the variant in the text (act, scene,
line number)
- The lemma, which is the portion of the text to which
the note applies
- A right bracket (]) or some other separator
- The source from which the edition took its
reading
- A list of variants, in each case followed by the source
in which the variant is found, and usually separated with a
semicolon.
43.3. Apparatus Criticus
The standard Apparatus Criticus provides a concise method of recording the
variants for any size of text. To take an example, a line in
Hamlet might be printed as:
LAERTES. Alas, then is she drowned.
with a critical apparatus provided (usually at the foot of the
page) which contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.
43.4. Critical Apparatus: <app>, <rdg>, and <lem>
- <app> (apparatus entry) contains one entry in a critical
apparatus, with an optional lemma and at least one
reading.
- <rdg> (reading) contains a single reading within a textual
variation.
- <lem> (lemma) contains the lemma, or base text, of a textual
variation.
43.5. Parallel Segmentation Example
<l>
<app>
<lem wit="#El">Experience though noon Auctoritee</lem>
<rdg wit="#Hg">Experience thogh noon Auctorite</rdg>
<rdg wit="#La">Experiment thouh noon Auctoritee</rdg>
<rdg wit="#Ra2">Eryment though none auctorite</rdg>
</app>
</l>
43.6. Or apparatus at smaller granularity
<l>
<app>
<lem wit="#El #Hg">Experience</lem>
<rdg type="substantive" wit="#La">Experiment</rdg>
<rdg type="substantive" wit="#Ra2">Eryment</rdg>
</app>
<app>
<lem wit="#El #Ra2">though</lem>
<rdg type="orthographic" wit="#Hg">thogh</rdg>
<rdg type="orthographic" wit="#La">thouh</rdg>
</app>
<app>
<lem wit="#El #La #Hg">noon</lem>
<rdg type="orthographic" wit="#Ra2">none</rdg>
</app>
<app>
<lem wit="#El #La">Auctoritee</lem>
<rdg type="orthographic" wit="#Hg">Auctorite</rdg>
<rdg type="orthographic" wit="#Ra2">auctorite</rdg>
</app>
</l>
43.7. <listWit> and <witness>
- <listWit> (witness list) lists definitions for all the witnesses
referred to by a critical apparatus, optionally grouped
hierarchically.
- <witness> contains either a description of a single witness
referred to within the critical apparatus, or a list of
witnesses which is to be referred to by a single
sigil.
A <msDesc> may also be used within a
<witness> if more information is available.
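As a sketch, the Hengwrt witness from the example that follows could carry a structured description nested in its <witness>; the split of the shelfmark into settlement, repository, and idno is inferred from that example:
<witness xml:id="Hg">
<msDesc>
<msIdentifier>
<!-- identifier parts are inferred from the prose description of the witness -->
<settlement>Aberystwyth</settlement>
<repository>National Library of Wales</repository>
<idno>Peniarth 392D</idno>
</msIdentifier>
</msDesc>
</witness>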
43.8. <listWit> example
<listWit>
<witness xml:id="El">Ellesmere, Huntington Library 26.C.9</witness>
<witness xml:id="Hg">Hengwrt, National Library of Wales,
Aberystwyth, Peniarth 392D</witness>
<witness xml:id="ms">Sole manuscript</witness>
<witness xml:id="Ra2">Bodleian Library Rawlinson Poetic 149
(see further <ptr target="#MSRP149"/>)</witness>
</listWit>
43.9. Nested <listWit>
Witnesses that are similar can be grouped together so that they can be referred to by a single siglum:
<listWit>
<witness xml:id="Ellesmere">Ellesmere, Huntington Library 26.C.9</witness>
<listWit xml:id="Con">
<head>Constant Group C</head>
<witness xml:id="Cp">Corpus Christi Oxford MS 198 </witness>
<witness xml:id="La">British Library Lansdowne 851 </witness>
<witness xml:id="Sl2">British Library Sloane MS 1686 </witness>
</listWit>
</listWit>
<rdg wit="#Con">Experiment</rdg>
refers to all these manuscripts.
43.10. Alternative 1: Location Referenced Example
<div n="WBP" type="prologue">
<head>The Prologe of the Wyves Tale of Bathe</head>
<l n="1">Experience though noon Auctoritee</l>
<l>Were in this world ...</l>
</div>
<app loc="WBP 1">
<rdg wit="#La">Experiment</rdg>
<rdg wit="#Ra2">Eryment</rdg>
</app>
or
<l n="1">Experience though noon Auctoritee
<app>
<rdg wit="#La"> Experiment</rdg>
<rdg wit="#Ra2"> Eryment</rdg>
</app>
</l>
<l>Were in this world ...</l>
43.11. Alternative 2: Double End-Point Attachment Example
<div n="WBP" type="prologue">
<head>The Prologe ... </head>
<l n="1" xml:id="WBP.1">Experience<anchor xml:id="WBP-A2"/>
though noon Auctoritee</l>
<l>Were in this world ...</l>
</div>
<app from="#WBP.1" to="#WBP-A2">
<rdg wit="#La">Experiment</rdg>
<rdg wit="#Ra2">Eryment</rdg>
</app>
43.12. Parallel Segmentation Example
<l n="1">
<app>
<rdg wit="#Chi3">Auctoritee, though none experience</rdg>
<rdg>
<app>
<rdg wit="#El #Hg">Experience</rdg>
<rdg wit="#La">Experiment</rdg>
<rdg wit="#Ra2">Eryment</rdg>
</app>
<app>
<rdg wit="#El #Ra2">though</rdg>
<rdg wit="#Hg">thogh</rdg>
<rdg wit="#La">thouh</rdg>
</app>
<app>
<rdg wit="#El #Hg">noon Auctorite</rdg>
<rdg wit="#La #Ra2">none auctorite</rdg>
</app>
</rdg>
</app>
</l>
43.13. A Simple <app> With No <lem>
<ab> Populus domini et oves pascuae eius <app>
<rdg
wit="#CAO-B #CAO-V #CAO-R #CAO-D #CAO-F #CAO-S #Ely #Wor #Wcb"> venite adoremus eum</rdg>
<rdg wit="#CAO-H #Pet"> venite adoremus deum</rdg>
<rdg wit="#CAO-E #Alb2"> venite adoremus dominum</rdg>
<rdg wit="#CAO-C #CAO-G #CAO-L #Hyd #Evm"> venite adoremus</rdg>
</app>
</ab>
43.14. Attaching Notes Example
Virginite is grete <app>
<rdg resp="#ES">perfecti<abbr>oi</abbr>
</rdg>
<rdg resp="#FJF" xml:id="f105"> perfectio<expan>u</expan>n</rdg>
<rdg resp="#PGR" xml:id="r105"> perfectiou<expan>n</expan>
</rdg>
</app>
<note target="#r105 #f105">Furnivall's expansion implies that the bar is an abbreviation for 'u'. There are no certain instances of this mark as an abbreviation for 'u' in these MSS and it is widely used as an abbreviation for 'n'. Ruggiers' expansion is to be accepted.</note>
43.15. Hamlet example
Think back to the example given from Hamlet:
LAERTES. Alas, then is she drowned.
Where the traditional critical apparatus contained:
4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F;
Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.
How would you choose to mark it up in TEI?
43.16. How I'd do it (given time)
<l n="156">
<app>
<rdg wit="#Hib">Alas, then</rdg>
<rdg wit="#F">Alas then,</rdg>
<rdg wit="#Q3">Alas then</rdg>
<rdg wit="#Q2">Alas, then,</rdg>
<rdg wit="#Q1">So,</rdg>
</app>
<app>
<rdg wit="#Hib #F #Q3">is she</rdg>
<rdg wit="#Q2 #Q1">she is</rdg>
</app>
<app>
<rdg wit="#Hib">drowned.</rdg>
<rdg wit="#F">drown'd?</rdg>
<rdg wit="#Q3 #Q2">drownd.</rdg>
<rdg wit="#Q1">drownde:</rdg>
</app>
</l>
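The wit pointers above assume witness declarations along these lines. The xml:id values match the markup, but the descriptions are illustrative glosses of the sigla rather than part of the original apparatus:
<listWit>
<!-- descriptions are assumed glosses of the sigla -->
<witness xml:id="Hib">Hibbard's edition</witness>
<witness xml:id="F">Folio</witness>
<witness xml:id="Q3">Third Quarto</witness>
<witness xml:id="Q2">Second Quarto</witness>
<witness xml:id="Q1">First Quarto</witness>
</listWit>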
43.17. Next
Any Questions? Next we're going to do an exercise!