github.com/TEIC/TEI-Simple

TEI Simple

Processing Model

abstraction layer for XML processing

by Magdalena Turska | @magdaturska

University of Oxford | DiXiT ITN


TEI Simple

a new highly constrained and prescriptive subset of TEI

extension to TEI’s ODD metalanguage for specifying processing rules for TEI encoded texts

reference implementations of processing rules defined for this TEI subset

formal mapping to the CIDOC CRM

“TEI Performance Indicators” indicating to a user what they can expect to use the text for

simple ain't easy

TEI

  • concentrates on data modelling aspects
  • assumes ideological agnosticism
  • avoids standardization constraining individual projects
  • avoids any recommendation for processing & publishing
  • TEI stylesheets often too complicated to customize
  • only concerned with TEI documents

TEI Simple

  • concentrates on eliminating the ambiguity for the encoder
  • aims at interoperability
  • provides out-of-the-box mechanism for default processing
  • allows for [relatively] easy customization and extension
  • Simple processing model can be applied to other vocabularies

XML processing

TEI Simple ODD

=

a schema

+

processing models

for all elements

Rahtz Rationale

workflow with three distinct roles

Editor

Programmer

Designer

Editor

manages the text integrity,
makes the high-level output decisions:

structural descriptions

‘should the original or corrected version be displayed by default’, or ‘is this a block level or inline component’

indications of appearance
‘titles are in italics’

Programmer

takes the editor’s specification, and the data, and creates the input for the designer to make the output

Designer

creates the output envelope (e.g. a book layout using InDesign, or a web site using Drupal), making decisions about the appearance in conjunction with the editor

‘use Garamond font throughout’

‘every page must show the departmental logo’

Rahtz Rationale

Turska Tenet

ODD stores as much information as possible

power to the editor

@behaviour magic

  • processing complexity hidden behind behaviour functions
  • functions based on commonly used terms
  • just a handful of functions that cover more than 80% of tasks
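A minimal sketch of this idea, in Python rather than the language of any actual implementation: each behaviour is a small named function, and the processing engine only needs to look the name up. The function bodies, the HTML they emit, and the `apply_behaviour` helper are assumptions made for illustration, not the TEI Simple function library itself.

```python
from html import escape


# Hypothetical miniature behaviour library: each behaviour turns
# already-escaped content into an HTML string.
def inline(content: str) -> str:
    return f"<span>{content}</span>"


def block(content: str) -> str:
    return f"<div>{content}</div>"


def paragraph(content: str) -> str:
    return f"<p>{content}</p>"


def omit(content: str) -> str:
    # Do nothing and do not process children.
    return ""


BEHAVIOURS = {
    "inline": inline,
    "block": block,
    "paragraph": paragraph,
    "omit": omit,
}


def apply_behaviour(name: str, text: str) -> str:
    """Look up a behaviour by name and apply it to the escaped text."""
    return BEHAVIOURS[name](escape(text))
```

With this in place, `apply_behaviour("paragraph", "a < b")` yields a `<p>` element with the text escaped, and the caller never sees how the paragraph is built.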

model examples


	
[XML example lost in extraction; the only surviving content is the rendition rule text-decoration: underline;]

Single model is not enough

for each element there are potentially numerous <model> instructions that specify intended processing and rendering for different outputs and in various contexts
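As an illustration of why one model per element is not enough, here is a rough Python sketch for a `choice` element holding one of the pairs corr/sic, expan/abbr, or reg/orig. The assumption that a "web" output shows both readings while every other output shows only the preferred one, the `render_choice` name, and the HTML produced are all choices made for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# Preferred/original pairs: corr/sic, expan/abbr, reg/orig.
PAIRS = [("corr", "sic"), ("expan", "abbr"), ("reg", "orig")]


def first_text(parent: ET.Element, tag: str) -> str:
    """Text of the first child named `tag`, roughly what `tag[1]` selects."""
    child = parent.find(tag)
    return "".join(child.itertext()) if child is not None else ""


def render_choice(choice: ET.Element, output: str) -> str:
    """Render one choice: both readings for "web", preferred one otherwise."""
    for preferred, original in PAIRS:
        if choice.find(preferred) is None or choice.find(original) is None:
            continue
        default = escape(first_text(choice, preferred))
        if output == "web":
            # Assumed rendering of an alternate-style behaviour: the
            # original reading is kept as a tooltip on the preferred one.
            title = escape(first_text(choice, original), quote=True)
            return f'<span class="alternate" title="{title}">{default}</span>'
        return default
    # No known pair present: fall back to the plain text of the element.
    return escape("".join(choice.itertext()))


sample = ET.fromstring("<choice><sic>teh</sic><corr>the</corr></choice>")
web_html = render_choice(sample, "web")
print_text = render_choice(sample, "print")
```

The same source element thus needs at least two processing rules, one per output, before context-dependent variations are even considered.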


	
[XML example lost in extraction; surviving values, in order: corr[1]; expan[1]; reg[1]; corr[1] with sic[1]; expan[1] with abbr[1]; reg[1] with orig[1]]

Brand New ODD

  • model
  • param
  • outputRendition
  • modelGrp
  • modelSeq
  • @output specifies output in which model applies
  • @predicate specifies context in which model applies
  • @behaviour specifies function from teiSimple function library to apply
  • @useSourceRendition indicates whether to preserve the @rendition value from the source
  • <param> elements specify parameters for behaviour
  • if no content parameter specified, all functions use current element as default content
  • <outputRendition> elements specify CSS instructions that describe the intended output appearance
  • there can be as many <model> statements as required
  • each <model> may have multiple <outputRendition> children
  • set of multiple <model> statements is regarded as an alternation and only the first <model> with @predicate matching current context is applied
  • @behaviour specifies which function from the TEI Simple function library should be applied; its parameters are specified as <param> children
  • <desc> allows for initial textual description of required processing
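The alternation rule above (only the first model whose @predicate matches is applied) can be sketched in Python as follows. This is an illustration under stated assumptions: predicates are plain Python callables standing in for XPath expressions, a model without @output is taken to apply to every output, and the `Model`, `select_model`, and `has_children` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence
import xml.etree.ElementTree as ET

Predicate = Callable[[ET.Element], bool]


@dataclass
class Model:
    behaviour: str
    predicate: Optional[Predicate] = None  # stand-in for an XPath @predicate
    output: Optional[str] = None           # None: applies to every output


def select_model(models: Sequence[Model], element: ET.Element,
                 output: str) -> Optional[Model]:
    """Return the first model whose @output and @predicate both match."""
    for model in models:
        if model.output is not None and model.output != output:
            continue
        if model.predicate is None or model.predicate(element):
            return model
    return None


def has_children(*tags: str) -> Predicate:
    """Build a predicate that is true when every named child is present."""
    return lambda el: all(el.find(tag) is not None for tag in tags)


CHOICE_MODELS = [
    Model("alternate", has_children("sic", "corr"), output="web"),
    Model("inline", has_children("sic", "corr")),
    Model("inline"),  # catch-all, reached only if nothing above matched
]

choice = ET.fromstring("<choice><sic>teh</sic><corr>the</corr></choice>")
chosen_for_web = select_model(CHOICE_MODELS, choice, "web")
chosen_for_print = select_model(CHOICE_MODELS, choice, "print")
```

Because selection stops at the first match, the order of the models matters: more specific models have to come before the catch-all.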

Use scenarios

if editorial decisions recommended by TEI Simple fit project’s needs perfectly

just use teisimple.odd

if not

override the defaults in teisimple.odd with custom processing and rendition instructions
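The second scenario can be pictured as layering project-specific rules over the defaults. The Python sketch below is only a mental model: the dictionaries are hypothetical stand-ins for the contents of teisimple.odd and of a project customization, and replacing an element's models wholesale is an assumption of this sketch, not a description of how ODD customizations are actually resolved.

```python
from typing import Dict, List

ModelList = List[dict]

# Hypothetical defaults, standing in for what a default ODD might declare.
DEFAULT_RULES: Dict[str, ModelList] = {
    "p": [{"behaviour": "paragraph"}],
    "hi": [{"behaviour": "inline", "css": "font-style: italic;"}],
}


def customize(default: Dict[str, ModelList],
              custom: Dict[str, ModelList]) -> Dict[str, ModelList]:
    """Elements named in `custom` replace the default entry; others are kept."""
    merged = {name: list(models) for name, models in default.items()}
    for name, models in custom.items():
        merged[name] = list(models)
    return merged


# A project that wants highlighted text underlined instead of italic.
project_rules = customize(
    DEFAULT_RULES,
    {"hi": [{"behaviour": "inline", "css": "text-decoration: underline;"}]},
)
```

Only the elements the project cares about need to be restated; everything else keeps its default treatment.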

What's an editor to do (apart from editing)

  • identify elements that require individual treatment
  • if treatment differs depending on context, identify all possible situations via XPath expressions (e.g. headings in div[@type="act"] treated differently from all other head elements)
  • decide which behaviour is required under given circumstances and specify its parameters (e.g. show only the lem child by default in app entries)
  • if treatment differs depending on output type, create additional models with @output as necessary
  • specify rendition as CSS where required
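The steps above can be walked through for the `head` example. In this hedged Python sketch the sample document, the `in_act` predicate (standing in for an XPath test on the parent div), the "print" output name, the CSS values, and the mapping of behaviours to HTML tags are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

doc = ET.fromstring(
    "<body><div type='act'><head>Act I</head>"
    "<div type='scene'><head>Scene 1</head></div></div></body>"
)
# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def in_act(head: ET.Element) -> bool:
    """Context test: is this head a direct child of a div of type 'act'?"""
    return parent_of[head].get("type") == "act"


# One entry per decision the editor makes: context, output, behaviour, CSS.
HEAD_MODELS = [
    {"predicate": in_act, "output": None,
     "behaviour": "heading", "css": "font-size: 150%;"},
    {"predicate": None, "output": "print",
     "behaviour": "block", "css": "font-style: italic;"},
    {"predicate": None, "output": None,
     "behaviour": "heading", "css": ""},
]


def render_head(head: ET.Element, output: str) -> str:
    """Apply the first model matching both the output and the context."""
    text = escape(head.text or "")
    for model in HEAD_MODELS:
        if model["output"] not in (None, output):
            continue
        if model["predicate"] is None or model["predicate"](head):
            tag = "h2" if model["behaviour"] == "heading" else "div"
            style = f' style="{model["css"]}"' if model["css"] else ""
            return f"<{tag}{style}>{text}</{tag}>"
    return text


act_head, scene_head = doc.findall(".//head")
```

Each dictionary corresponds to one decision from the list: which context, which output, which behaviour, and which rendition.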

Trade-offs

  • familiarity with source files and XML encoding used
  • ability to identify different use scenarios
  • relative XPath fluency to specify model parameters*
  • relative CSS fluency*
  • better control for editor
  • reduced size and complexity of the code
  • increased long-term code maintainability

* possibly assisted by project’s IT support

@joewiz U.S. State Department Office of the Historian

I have nearly completed adapting the rules for rendering our TEI as HTML, a complex, intricately structured (and thus brittle), hand-written XQuery modules of nearly 1,500 lines in length (70,000 characters). The resulting TEI ODD file is a much less dense 500 lines of TEI XML (12,500 characters). Thus far, we have only had to write 2-3 TEI Simple extension "behaviours" (to handle tables of contents and application concerns such as "previous/next" navigation), totaling about 300 lines of code.
(...) besides the reduction in complexity and custom code, the resulting ODD file will also be able to perform transformations into other target formats, such as PDF and EPUB - which I currently handle, again, with custom XQuery modules. Thus, ODD will significantly reduce the complexity and thus increase the long-term maintainability of the code.

Wanna try?

see eXist TEI Simple demo app by Wolfgang Meier

if you want to experiment with changing the ODD, download eXist-db and install the TEI Simple app locally from http://exist-db.org/exist/apps/homepage/index.html

Thank you!

Next TEI Simple event is TEI Simple HackAThon in Lyon, 26th of October 2015!

alternate (default,alternate)

create a specialized display of alternating elements for displaying the preferred version, both at once or toggling between the two.

anchor (id)

create anchor with ID

block (content)

create a block out of the content parameter.

body (content)

create the body of a document.

break (type,label)

make a line, column, or page break according to type

cell (content)

create a table cell

cit (content,source)

show the content, with an indication of the source

document (content)

start a new output document

figure (title)

make a figure with the title as caption

glyph (content)

show a character by looking up reference

graphic (url)

if url is present, use it to display the graphic; otherwise display a placeholder image.

heading (content)

creates a heading.

index (type)

generate list according to type

inline (content,label)

create an inline element out of content; if there is something in rendition, use that formatting, otherwise just show the text of the selected content.
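A small Python sketch of that rule, assuming the rendition arrives as a CSS string and the output is HTML; the signature is hypothetical and omits the label parameter.

```python
from html import escape
from typing import Optional


def inline(content: str, rendition: Optional[str] = None) -> str:
    """Wrap content in a styled span only when a rendition is supplied."""
    text = escape(content)
    if rendition:
        style = escape(rendition, quote=True)
        return f'<span style="{style}">{text}</span>'
    # No rendition: just show the text of the selected content.
    return text
```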

link (content,target)

create hyperlink

list (content)

create a list by following content

listItem (content)

create list item

metadata (content)

create metadata section

note (content,place,marker)

create a note, according to value of place; could be margin, footnote, endnote, inline
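One possible reading of this behaviour for HTML output, sketched in Python: footnotes and endnotes leave a marker in the running text and defer their body to a list that is rendered later, while margin and inline notes are emitted in place. The `deferred` parameter, the class names, and the tags used are assumptions of this sketch.

```python
from html import escape
from typing import List


def note(content: str, place: str, marker: str, deferred: List[str]) -> str:
    """Render a note according to `place`, deferring foot- and endnotes."""
    text = escape(content)
    if place in ("footnote", "endnote"):
        # Body is collected for later output; only the marker stays inline.
        deferred.append(f"<li>{text}</li>")
        return f"<sup>{escape(marker)}</sup>"
    if place == "margin":
        return f'<aside class="margin-note">{text}</aside>'
    # Any other value is treated as an inline note.
    return f'<span class="note">{text}</span>'
```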

omit

do nothing, do not process children

paragraph (content)

create a paragraph out of content.

row (content)

make table row

section (content)

create a new section of the output document

table (content)

make table

text (content)

literal text

title (content)

make document title