IT Services, University of Oxford

1. TEI technology in Oxford projects

As a TEI Consortium host and provider of editorial services to the TEI, the University of Oxford naturally has many projects and services which use the TEI Guidelines.

The OUCS Research Technologies Service (RTS) aims to provide a centre of expertise supporting the development of research e-infrastructure. The RTS seeks to lead or collaborate on projects that pilot digital technologies on behalf of the University. Many of these projects use TEI XML where appropriate.

2. OUCS Website

  • Oxford University Computing Services Website: http://www.oucs.ox.ac.uk/
  • Subversion-based version control
  • Preview/Publish system
  • Many files split at top-level <div> elements for display
  • Currently processed with AxKit + XSLT
  • 'Simple Text', 'Printable', and 'Normal' views
  • Approx. 2000 of the files in the web tree are TEI XML files
  • Approx. 17000 individual HTML renderings
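The splitting of files at top-level <div> elements can be illustrated with a minimal Python sketch. It is not the AxKit + XSLT pipeline the site actually uses; the sample document, the use of the TEI P5 namespace, and the function name split_top_level_divs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the file is TEI P5 and therefore in the TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample page, not taken from the OUCS web tree.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div xml:id="intro"><head>Introduction</head><p>Welcome.</p></div>
    <div xml:id="contact"><head>Contact</head><p>Email us.</p></div>
  </body></text>
</TEI>"""


def split_top_level_divs(tei_xml):
    """Return one (heading, serialized div) pair per top-level <div> in <body>."""
    root = ET.fromstring(tei_xml)
    body = root.find(f"{{{TEI_NS}}}text/{{{TEI_NS}}}body")
    pages = []
    # findall with a bare tag name only matches direct children,
    # so nested <div> elements stay inside their parent page.
    for div in body.findall(f"{{{TEI_NS}}}div"):
        head = div.find(f"{{{TEI_NS}}}head")
        title = head.text if head is not None else ""
        pages.append((title, ET.tostring(div, encoding="unicode")))
    return pages


pages = split_top_level_divs(SAMPLE)
for title, fragment in pages:
    print(title, len(fragment))
```

Each returned fragment could then be rendered as its own HTML page, which is roughly how about 2000 source files can yield many more individual renderings.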

3. OUCS Website: http://www.oucs.ox.ac.uk/

4. OUCS Website: Another page

5. Oxford Text Archive

  • Founded in 1976 to preserve electronic texts
  • Provides long-term preservation for resources
  • Collects resources from any time period in any language
  • Makes suitable resources available for download
  • Acts (as part of RTS) as consultants on funded projects
  • Stores all metadata about each resource as a TEI header, regardless of the format of the actual resource

6. Oxford Text Archive Screenshot

7. OTA Catalogue

8. OTA Header Display

9. British National Corpus

  • a snapshot of British English, taken at the end of the 20th century
  • 100 million words in approx. 4000 different text samples, both spoken (10%) and written (90%)
  • synchronic (1990-4), sampled, general purpose corpus
  • available under licence; latest edition is BNC-XML (13 March 2007)
  • Part-of-speech and lemma tagging
  • Uses a variant of TEI XML originally called CDIF


<wtext type="NONAC">
<div level="1" n="1" type="leaflet">
<head type="MAIN"><s n="1"><w c5="NN1" hw="factsheet" pos="SUBST">FACTSHEET</w>
<w c5="DTQ" hw="what" pos="PRON">WHAT</w> <w c5="VBZ" hw="be" pos="VERB">IS</w>
<w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUN">?</c> </s>  </head>
<p><s n="2"><hi rend="bo">  <w c5="NN1" hw="aids" pos="SUBST">AIDS</w>
<c c5="PUL">(</c><w c5="VVN-AJ0" hw="acquire" pos="VERB">Acquired</w>
<w c5="AJ0" hw="immune" pos="ADJ">Immune</w> <w c5="NN1" hw="deficiency" pos="SUBST">Deficiency</w>
<w c5="NN1" hw="syndrome" pos="SUBST">Syndrome</w><c c5="PUR">)</c></hi> 
<w c5="VBZ" hw="be" pos="VERB">is</w> <w c5="AT0" hw="a" pos="ART">a</w>
<w c5="NN1" hw="condition" pos="SUBST">condition</w> <w c5="VVN" hw="cause" pos="VERB">caused</w>
<w c5="PRP" hw="by" pos="PREP">by</w> <w c5="AT0" hw="a" pos="ART">a</w> ... </s> … </p>
… </div></wtext>
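
The part-of-speech and lemma tagging in the sample above can be read with any XML parser. The following is a minimal sketch in Python that uses only the first sentence of the sample; the function name read_tokens and the choice of tuple layout are assumptions for illustration, not part of any BNC tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The first <s> element of the sample above, copied so the fragment is well-formed.
FRAGMENT = """<s n="1"><w c5="NN1" hw="factsheet" pos="SUBST">FACTSHEET</w>
<w c5="DTQ" hw="what" pos="PRON">WHAT</w> <w c5="VBZ" hw="be" pos="VERB">IS</w>
<w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUN">?</c> </s>"""


def read_tokens(sentence_xml):
    """Return (word form, headword, C5 tag, simplified POS) for each <w> element."""
    sentence = ET.fromstring(sentence_xml)
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("c5"), w.get("pos"))
        for w in sentence.iter("w")  # punctuation is in <c>, so it is skipped here
    ]


tokens = read_tokens(FRAGMENT)
pos_counts = Counter(pos for _, _, _, pos in tokens)
lemmas = [hw for _, hw, _, _ in tokens]

print(tokens)
print(pos_counts)
```

Because the word form, lemma (hw) and tags sit on the same element, frequency lists by lemma or by word class need no further alignment step.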

11. OpenOffice <-> TEI Stylesheets

  • OpenOffice is a free Office suite of tools (e.g. Word-processor, Spreadsheet, Presentations, Database, Drawing)
  • Reads and writes many formats (inc. Microsoft), including arbitrary XML if you create XSLT to map it to its own ODF schema
  • Sebastian Rahtz has created stylesheets to enable:
    • Opening TEI XML files in OpenOffice
    • Saving OpenOffice files as TEI XML
  • Limitations of presentational markup
  • Freely available from TEI SourceForge Website

12. OpenOffice, SaveAs TEI XML


  • Project with ISO, OUCS, TEI, Brigham Young University, and the Max Planck Digital Library
  • International Organization for Standardization (ISO): production of standards documents
  • TEI ODD for documenting a schema to store these standards as TEI XML
  • A suite of XSLT to allow lossless conversion to and from various word-processing systems (e.g. MS Word)
  • This ability to round-trip from presentational markup may also be available to other users
  • http://tei.oucs.ox.ac.uk/TEIISO/

14. ISO Standard: PDF rendered from TEI

15. Holinshed's Chronicles

  • Tudor Chronicles of England, Scotland, and Ireland
  • Popular source for Shakespeare, Spenser, Daniel, Drayton
  • Two editions: 1577 and 1587 (revised and significantly expanded)
  • Producing a print, old-spelling, annotated critical edition
  • An electronic full-text copy of 1587 exists in EEBO-TCP
  • EEBO-TCP has been commissioned to create an electronic full-text version of 1577 edition
  • We will produce a tool that lets the editors fuzzy-match paragraphs between the two editions, using stand-off linking, to assist them in making their comparisons
  • We wrote an EEBO-TCP markup to TEI P5 XML transformation in XSLT as a first step
  • http://www.cems.ox.ac.uk/holinshed/
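
Fuzzy-matching paragraphs and recording the result as stand-off links could look something like the sketch below. It is only an illustration, not the project's tool: the paragraph texts are invented (they are not quotations from Holinshed), the identifiers are hypothetical, and the use of Python's difflib with a 0.6 threshold is an assumption.

```python
from difflib import SequenceMatcher

# Invented stand-ins for paragraphs of the two editions, keyed by hypothetical ids.
paras_1577 = {
    "a1": "The king then came to London with a great army.",
    "a2": "In this yeare was a great frost.",
}
paras_1587 = {
    "b1": "In this yeere there was a verie great frost and snow.",
    "b2": "The king then came unto London with a great armie of men.",
    "b3": "Of the description of Britaine.",
}


def normalise(text):
    """Lower-case and collapse whitespace so layout differences do not count."""
    return " ".join(text.lower().split())


def best_matches(source, target, threshold=0.6):
    """For each source paragraph, link to the most similar target paragraph.

    The links are kept apart from the texts themselves (stand-off), so neither
    edition's transcription needs to be modified.
    """
    links = []
    for sid, stext in source.items():
        scored = [
            (SequenceMatcher(None, normalise(stext), normalise(ttext)).ratio(), tid)
            for tid, ttext in target.items()
        ]
        score, tid = max(scored)
        if score >= threshold:
            links.append({"from": sid, "to": tid, "score": round(score, 2)})
    return links


links = best_matches(paras_1577, paras_1587)
for link in links:
    print(link)
```

Paragraphs with no counterpart above the threshold simply get no link, which is one way to surface material added in the expanded 1587 edition.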

16. Holinshed Website

17. Dickens Journals Online

  • Project at University of Buckingham
  • Bicentenary in Feb. 2012
  • Complete online edition of Dickens's weekly magazines: Household Words and All the Year Round.
  • Texts have been digitised by Internet Archive
  • Exist in proprietary DjVu format, whose XML export is very limited
  • We wrote an XSLT transformation from DjVu-XML to TEI P5 XML
  • http://www.buckingham.ac.uk/djo/
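
The shape of such a conversion can be sketched in Python rather than XSLT. The sketch assumes the usual DjVu XML text layer of PARAGRAPH, LINE and WORD elements; the sample content is invented, the function name is hypothetical, and the TEI namespace is omitted for brevity. It is not the project's stylesheet.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed DjVu XML layout; coords values are placeholders.
DJVU_SAMPLE = """<DjVuXML><BODY><OBJECT><HIDDENTEXT><PAGECOLUMN><REGION>
<PARAGRAPH>
<LINE><WORD coords="1,2,3,4">HOUSEHOLD</WORD> <WORD coords="5,6,7,8">WORDS.</WORD></LINE>
<LINE><WORD coords="1,2,3,4">A</WORD> <WORD coords="5,6,7,8">Weekly</WORD> <WORD coords="9,9,9,9">Journal.</WORD></LINE>
</PARAGRAPH>
</REGION></PAGECOLUMN></HIDDENTEXT></OBJECT></BODY></DjVuXML>"""


def djvu_to_tei_paragraphs(djvu_xml):
    """Turn each PARAGRAPH into a <p>, marking line breaks with <lb/>."""
    root = ET.fromstring(djvu_xml)
    paragraphs = []
    for para in root.iter("PARAGRAPH"):
        p = ET.Element("p")
        for i, line in enumerate(para.iter("LINE")):
            text = " ".join(w.text for w in line.iter("WORD") if w.text)
            if i == 0:
                p.text = text
            else:
                # Text after a line break hangs off the <lb/> element's tail.
                lb = ET.SubElement(p, "lb")
                lb.tail = text
        paragraphs.append(p)
    return paragraphs


tei_paragraphs = [
    ET.tostring(p, encoding="unicode") for p in djvu_to_tei_paragraphs(DJVU_SAMPLE)
]
print(tei_paragraphs)
```

Word coordinates are discarded here; a fuller conversion might keep them to link text back to the page images.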

18. DJO Project Website

19. Lexicon of Greek Personal Names

  • Ongoing project to record all Greek personal names from the 7th century BC to the 7th century AD
  • For each occurrence of a name, records: place, date, bibliographical data and any name variation
  • Early adopter of IT from the early 80s, using various SQL databases
  • Now being represented in TEI XML, using <person>, <nym>, <place>, etc.
  • XML is delivered using eXist and interface written in XQuery
  • http://www.lgpn.ox.ac.uk/

20. LGPN Website


  • Create seamless access to distributed information about manuscripts and rare old printed books in Europe
  • Connect existing digital libraries, bring aboard those who do not have them
  • Provide access to these in Manuscriptorium in their own languages and their own virtual interface
  • To standardise metadata on a variant of TEI P5 XML where possible
  • All we implement should work together seamlessly

22. Workpackage Leaders

  • National Library of the Czech Republic, Prague
  • AiP Beroun, s r.o., Beroun, Czech Republic
  • Oxford University Computing Services, Oxford, United Kingdom
  • Centro per la comunicazione e l’integrazione dei media, Florence, Italy
  • SYSTRAN S.A., Paris, France
  • Institute of mathematics and informatics, Vilnius, Lithuania
  • Biblioteca Nacional de España, Madrid, Spain

23. Workpackages

  • WP1: Project Management
  • WP2: Preparation for system implementation and content enhancement
  • WP3: Standardization of shared metadata
  • WP4: User personalization
  • WP5: Personalization for contributors
  • WP6: Multilingual and sophisticated access
  • WP7: Evaluation, testing, and validation
  • WP8: Dissemination and exploitation

24. WP3

  • Conversion between MASTER (and other formats) and TEI P5 / ENRICH ODD
  • Implementation of the OAI harvester into existing Manuscriptorium platform
  • Enhancement of internal use of METS containerization format to use benefits of TEI P5
  • Improvement and generalization of Unicode handling throughout the interface

25. WP3 Deliverables

  • D.3.1: Revised TEI-conformant specification
  • D.3.2: Documentation and training materials for use with specification
  • D.3.3: Report on development and validation of migration tools
  • D.3.4: Report on METS/TEI interoperability, best practice with respect to handling of Unicode and non-Unicode data in Manuscriptorium and P5 conversion techniques

26. MASTER to TEI P5

  • Content model changes:
    • <msHeading> to <head>; unnecessary <p>s
  • Elements renamed:
    • <decoration> to <decoDesc>, <msWriting> to <handDesc>, etc.
  • Elements added:
    • Splitting of <physDesc> children into <objectDesc>, <supportDesc>, <layoutDesc>, etc.
  • Elements removed:
    • <overview>, <paratext>, <remarks>, <watermark> all dealt with differently now
  • Attribute changes:
    • Many new attributes, a few removed
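
The simplest of these changes, the renaming of elements, can be sketched in Python as follows. It covers only the three renamings named above; the sample MASTER fragment is invented, the function name is hypothetical, and the sketch leaves out the restructuring of <physDesc>, the removed elements and the attribute changes that a real conversion would need.

```python
import xml.etree.ElementTree as ET

# Only the renamings mentioned above; a real mapping would be longer.
RENAMES = {
    "msHeading": "head",
    "decoration": "decoDesc",
    "msWriting": "handDesc",
}

# Invented MASTER-style fragment for illustration only.
MASTER_SAMPLE = """<msDescription>
  <msHeading>Book of Hours</msHeading>
  <physDesc>
    <msWriting><p>One scribe.</p></msWriting>
    <decoration><p>Twelve miniatures.</p></decoration>
  </physDesc>
</msDescription>"""


def rename_elements(root, renames):
    """Rename elements in place and return how many were changed."""
    changed = 0
    for element in root.iter():
        if element.tag in renames:
            element.tag = renames[element.tag]
            changed += 1
    return changed


record = ET.fromstring(MASTER_SAMPLE)
changed = rename_elements(record, RENAMES)
tags = [element.tag for element in record.iter()]
print(changed, tags)
```

Keeping the mapping in a plain table makes it easy to compare against the list of resolved inconsistencies between the formats.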

27. Work done so far...

  • A comparison between MASTER, Manuscriptorium and TEI P5, with resolutions of inconsistencies for ENRICH
  • A draft ENRICH ODD recording the customisation from the TEI, but also providing additional constraints, and detailed internationalised project documentation
  • A draft XSLT stylesheet for transforming MASTER to TEI P5 and/or ENRICH – works for:
    • An initial test corpus of 1064 manuscript descriptions from various sources, about to be increased substantially by samples from ENRICH partners
  • http://tei.oucs.ox.ac.uk/ENRICH/

28. http://enrich.manuscriptorium.com

29. William Godwin Diaries

  • William Godwin:
    • 1756-1836, philosopher, writer, political activist,
    • husband of Mary Wollstonecraft, father of Mary Godwin (aka Wollstonecraft Shelley).
  • Inter-Departmental:
    • Politics, Statistics, Computing Services, Bodleian Library, etc.
  • Project Staff:
    • Dr Mark Philp, David O'Shaughnessy, two students, + others
  • Objectives:
    • research to identify people mentioned in the 48 years of diary;
    • provide a searchable cross-referenced electronic edition alongside digital images of the diary
  • Started in October 2007, it coincided with initial release of TEI P5 and immediately benefited from new features.

30. Diary

31. Diary + XML

32. The odd Godwin ODD

  • Created a customised TEI ODD for the Godwin project; this included:
    • removal of many TEI modules and elements
    • addition of new syntactic sugar elements
    • providing closed attribute value lists
  • New elements:
    • should be added in new namespace
    • choice between that and canonicalisation (by XSL)
  • ODD provides:
    • schemas (DTD, RelaxNG, W3C Schema)
    • project specific documentation
    • option for internationalisation
    • useful assistance in chosen editor

33. Godwin ODD

34. <person>, <persName>, and people

  • Pointing from the c. 64000 instances of <persName> elements to <person> elements stored in separate files
  • Around 10000 distinct values of <persName>
  • Instances of multiple @ref values: this was corrected in the most recent TEI P5 release, but how to display them?
    <persName ref="#BR01 #BR02 #BR03">The Browns</persName>
  • Person records follow a strictly limited template
  • Submit person file and it instantly appears on SVN-driven project website in indices
  • Almost identical system will eventually be used for identifying texts and places
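
Resolving a multi-valued @ref such as the one in "The Browns" against separately stored <person> records could be sketched like this in Python. The person records and their names are placeholders, the namespace is omitted, and joining the resolved names is only one possible answer to the display question, not the project's decision.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree gives to the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Placeholder person records; the real files follow the project's own template.
PERSON_FILE = """<listPerson>
  <person xml:id="BR01"><persName>Brown (placeholder one)</persName></person>
  <person xml:id="BR02"><persName>Brown (placeholder two)</persName></person>
  <person xml:id="BR03"><persName>Brown (placeholder three)</persName></person>
</listPerson>"""

ENTRY = '<p>Call on <persName ref="#BR01 #BR02 #BR03">The Browns</persName>.</p>'


def index_people(list_person_xml):
    """Map each person's xml:id to the text of their <persName>."""
    root = ET.fromstring(list_person_xml)
    return {p.get(XML_ID): p.findtext("persName") for p in root.iter("person")}


def resolve_refs(pers_name, people):
    """Split a whitespace-separated @ref and look up every pointer."""
    ids = [ref.lstrip("#") for ref in pers_name.get("ref", "").split()]
    return [(person_id, people.get(person_id)) for person_id in ids]


people = index_people(PERSON_FILE)
mention = ET.fromstring(ENTRY).find("persName")
resolved = resolve_refs(mention, people)
display = f"{mention.text} ({'; '.join(name for _, name in resolved)})"
print(display)
```

A pointer with no matching record resolves to None, which is a convenient hook for proofreading lists of unidentified people.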

35. Person File

36. <person> and <persName>

37. Transforming Godwin

  • Unlike many editions, researchers interested more in social relationships, frequencies of contact, and statistics
  • Lists of element/attribute combinations produced to assist proofreading and editorial standards
    • lists by frequency
    • lists by distinct-value
    • lists by year
  • XQuery in eXist will be used for front-facing site
  • For project site: XSLT2-based grouping with xsl:for-each-group to create statistical lists
  • Transformations to CSV for people in Statistics Department
    • interested in networked relations of contacts in meetings
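
The grouping and CSV steps above can be sketched in Python as a stand-in for the XSLT 2 grouping. The diary markup shown (a <div type="entry"> with a <date when="...">), the person identifiers and the function names are all assumptions for illustration, not the project's encoding.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Invented entries; P01 and P02 are hypothetical person identifiers.
DIARY = """<body>
  <div type="entry"><date when="1796-01-05"/><persName ref="#P01">A</persName> calls</div>
  <div type="entry"><date when="1796-02-11"/>Dine with <persName ref="#P01">A</persName>
    and <persName ref="#P02">B</persName></div>
  <div type="entry"><date when="1797-03-02"/><persName ref="#P01 #P02">The As</persName> at tea</div>
</body>"""


def contact_counts(diary_xml):
    """Count mentions per (year, person id), the analogue of grouping by key."""
    counts = Counter()
    for entry in ET.fromstring(diary_xml).iter("div"):
        year = entry.find("date").get("when")[:4]
        for name in entry.iter("persName"):
            # A multi-valued @ref counts once for each person it points to.
            for ref in name.get("ref", "").split():
                counts[(year, ref.lstrip("#"))] += 1
    return counts


def to_csv(counts):
    """Serialize the counts as CSV text, sorted by year and then person."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["year", "person", "mentions"])
    for (year, person), mentions in sorted(counts.items()):
        writer.writerow([year, person, mentions])
    return buffer.getvalue()


counts = contact_counts(DIARY)
csv_text = to_csv(counts)
print(csv_text)
```

Counting pairs of people who appear in the same entry, rather than single mentions, would be the natural next step towards the networked relations of interest to the statisticians.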

38. xsl:for-each-group

39. Rendered Generated List

40. Project Work

  • Project working practices...practices working
    • scheduled coding sessions
    • project blame sheets
    • group-written internal editorial policies
  • 2 Days Training:
    • Customised TEI XML,
    • oXygen,
    • Tortoise SVN
  • Subversion-based website: they commit, anyone reads
  • Project evolution:
    • new additions/customisations,
    • finer distinctions based on generated lists,
  • Feedback Loop: Lather, rinse, repeat
  • Fine line between freedom to change and locking down practices -- project has been impressed with TEI flexibility

41. Future of the Godwin Project

  • Project to complete in 2010, future still open
  • Website to be hosted by Bodleian
  • Website will allow access to underlying XML
  • Circular navigation & dynamic record aggregation
  • Places and textual works to be encoded similarly to people
  • Leverage social networking technology? FOAF Browsers?
  • Javascript timelines of certain events
  • Google Maps API could be used for zoomify-like dynamic zooming of images Bodleian is taking of the diary

42. TEI-Related Projects at Oxford -- Conclusions

  • We tend to get involved with TEI-related projects:
    • which are small to medium in scale
    • where people need a single defined workpackage
    • where we provide advice, support, or schema design
    • where we feel it will benefit other users (so we release the output with an open licence)
    • where people have approached us before they write their funding bid instead of after they get the money

Date: 2008-07
Copyright University of Oxford