IT Services, University of Oxford

1. Exercise 2: all about ODD

In the preceding exercise, you learned how to use the Roma web application to create a customization of the TEI. As you saw, Roma can be used to produce three different kinds of output: a formal schema in RELAX NG or some other schema language; human-readable documentation of the schema in HTML or PDF or some other format; and what we call an ODD file, which specifies both the schema and its documentation, using a TEI-defined XML vocabulary.

The ODD files created by the Roma web application use only a small subset of the features available in that language. In this and the next exercise you will explore the ODD language in depth, and see how it can be used to provide highly customized and richly documented XML systems, both TEI-based and independent of the TEI.
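For orientation, here is a heavily abbreviated sketch of the overall shape of an ODD file like the one Roma produces. The schema name my_ipp, the particular modules listed and the deleted element <abbr> are illustrative assumptions; the file you saved will differ in detail.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>My customization</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Written from scratch</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- the ident and start values are hypothetical -->
      <schemaSpec ident="my_ipp" start="TEI">
        <!-- select whole modules of the TEI -->
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
        <!-- remove one unwanted element from a selected module -->
        <elementSpec ident="abbr" module="core" mode="delete"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
```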

You will need the following to complete this exercise:
  • the sample ODD file which you created at the end of the last exercise (if you've lost it, use the file my_ipp.xml in the work directory).
  • a running version of oXygen or any other XML editor
  • a schema to validate your ODD file: use the file tei_odds.rnc in the work directory.

2. Editing an ODD

  • Open in your editor the ODD file which you saved at the end of the previous exercise. If you are using oXygen, you will probably see a dialog warning you that the file contains very long lines, because the web application does not add line breaks to the file when it is saved. To make the file easier to read, we recommend that you click the Format button, either in this dialog or on the toolbar.
  • Make sure the ODD file is associated with the right schema. You can do this by selecting Schema -> Associate Schema from the Document menu, which opens a dialog.
    Select RELAX NG compact syntax from the dropdown menu on the left. Click the arrow to the right of the folder icon, select Browse for local file, and navigate to the file tei_odds.rnc in the work directory. Press OK.
  • Your ODD file contains the usual TEI header, followed by a <schemaSpec> element containing a handful of <moduleRef> elements and a great many <elementSpec> elements, most of which simply say that the element concerned is to be deleted. Scroll down to the end of this long list, and you will see the two specifications corresponding to the changes we proposed in the first exercise.
  • We will start by enhancing the value list. At present this simply specifies the values we wish to allow for the type attribute on <div>: it is a <valList> containing one empty <valItem> element for each value.
    It would be helpful to add a brief phrase describing the circumstances in which each of these values should be used.
  • Remove the slash from the first empty <valItem> element: oXygen adds a closing tag, and you can now add some content before it. Type a < and oXygen shows you a list of the elements which are legal at this point; if you hover over one of them, it also pops up a brief description of what that element does.
    <desc> (description) is probably the most useful for us here, so choose that. Enter a brief description explaining how this value should be used, e.g. for cartoon, "used for sections containing only a picture and associated caption".
  • Repeat this for each <valItem>. Make sure your document remains valid.
  • You can insert additional <desc> elements in other languages if you wish to make life easier for non-English-speaking users of your schema. Use the xml:lang attribute on <desc> to specify the language: you can use any legal language code, but note that Roma currently knows only about French (fr), Spanish (es), Italian (it), Chinese as used in Taiwan (zh-TW), Korean (ko) and Japanese (ja). Please volunteer to add support for other languages in P5 (but not during this exercise)!
  • It might also be helpful to document a default value for this attribute: the <defaultVal> element is available for this purpose. Add one!
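Once the steps above are done, the relevant part of the <div> specification might look roughly like the following sketch. Only the values cartoon and verse and the English description for cartoon come from this exercise; the value prose, the default, the French translation, and the type and mode attributes on <valList> are assumptions standing in for whatever Roma generated in your file. Note that <defaultVal> goes inside the <attDef>, ahead of the <valList>.

```xml
<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- hypothetical default: it should be one of the values listed below -->
      <defaultVal>prose</defaultVal>
      <valList type="closed" mode="replace">
        <valItem ident="cartoon">
          <desc>used for sections containing only a picture and associated caption</desc>
          <!-- the same description for French-speaking users -->
          <desc xml:lang="fr">utilisé pour les sections ne contenant qu'une image et sa légende</desc>
        </valItem>
        <!-- hypothetical value: substitute those from your own ODD -->
        <valItem ident="prose">
          <desc>used for sections consisting mainly of running prose</desc>
        </valItem>
        <valItem ident="verse">
          <desc>used for sections consisting mainly of verse</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```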

3. Checking the result

Before proceeding, you may like to check the consequences of the changes you've just made.
  • Save your modified ODD file.
  • Open Roma. On the first screen, select the "Open existing customization" option, supply the name of the file containing your modified ODD and click on Start.
  • Click on the Schemas tab and generate a new version of your schema. Save it to the Desktop.
  • Return to oXygen and click the New Document button (or select "New" from the File menu).
  • In the Create an XML Document dialog, make sure to specify your newly created schema file.
  • Insert a <div> element in the body of your new document. See what options oXygen now offers you as values for the type attribute.
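If you would rather start from a small test document than build one up element by element, something like the following minimal TEI file should be enough to try out the new value list. It assumes that your customization still includes the header elements and <p> used here; the titles and text are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Test document for my customization</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished test file</p>
      </publicationStmt>
      <sourceDesc>
        <p>Created for this exercise</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- with the cursor inside type="", oXygen should offer the values from your valList -->
      <div type="cartoon">
        <p>Caption text goes here.</p>
      </div>
    </body>
  </text>
</TEI>
```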

4. Expanding the schema (1)

Now return to the source of your ODD file in oXygen. We won't go into the full details, but you should now know enough about using oXygen to be able to do the following:
  • Add a completely new attribute called revision to the <div> element. You will need to insert an <attDef> element inside the existing <attList>, supplying it with appropriate values for its ident, mode and usage attributes.
  • We suggest using this attribute to hold a revision date for the division. You will need to supply an appropriate <desc> element and also an appropriate <datatype>.
  • As with <content>, the content of the <datatype> element is expressed using the RELAX NG XML vocabulary. You will therefore need to add a declaration for the RELAX NG namespace to your new <datatype> element, in the same way as Roma has done for the <content> element it created for the new <citCom> element in your ODD.
  • With that namespace declaration in place, you can limit the possible values of the new attribute to valid temporal expressions (as defined by the TEI) by supplying <rng:ref name="data.temporal.w3c"/> as the content of the new <datatype>.
  • If that seemed too easy, try adding a second new attribute called pageRange, which can be used to supply exactly two numbers, no more and no less. Hint: use the attributes minOccurs and maxOccurs on <datatype>, and the TEI datatype data.count.
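Putting the steps above together, the enlarged <attList> for <div> might look something like this sketch. The wording of the descriptions and the choice of usage="opt" are our assumptions; adjust them to suit your project. The RELAX NG namespace URI is the standard one.

```xml
<attList>
  <!-- the existing attDef for type stays here, unchanged -->
  <attDef ident="revision" mode="add" usage="opt">
    <desc>date on which this division was last revised</desc>
    <datatype xmlns:rng="http://relaxng.org/ns/structure/1.0">
      <!-- TEI datatype for W3C-style dates and times -->
      <rng:ref name="data.temporal.w3c"/>
    </datatype>
  </attDef>
  <attDef ident="pageRange" mode="add" usage="opt">
    <desc>first and last page numbers of this division</desc>
    <!-- exactly two whitespace-separated counts, no more and no less -->
    <datatype minOccurs="2" maxOccurs="2"
              xmlns:rng="http://relaxng.org/ns/structure/1.0">
      <rng:ref name="data.count"/>
    </datatype>
  </attDef>
</attList>
```

With a schema generated from this, a start tag such as <div type="verse" revision="2010-11-06" pageRange="12 15"> should validate, while pageRange="12" should not.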

5. Expanding the schema (2)

Now let's take a look at the specification for our new <citCom> element. It could do with some work... Feel free to use oXygen to explore how you might enhance it. Just type a < at different points inside the <elementSpec> and see what elements are available — and if the help oXygen gives you isn't enough to determine how to use that element, consult the online version of the Guidelines.

To get you started, here are a few suggestions:
  • Make <citCom> a member of the class att.global since it will otherwise not get useful attributes such as xml:id or xml:lang.
  • Provide at least one usage example showing how the element should be used. The recommended way of providing an XML example inside a TEI XML document is to wrap it in an <egXML> element, which belongs to a different namespace (http://www.tei-c.org/ns/Examples). The <egXML> is then wrapped inside an <exemplum>, along with an optional paragraph or two in which you can supply some commentary on the particular example.
    Don't forget that <citCom> is in a different XML namespace too...
  • Decide on a typology for your <citCom>s. How can you enforce it by means of a type attribute? One way would be to supply an <attList> containing an <attDef>, as we have done for the <div> element. Arguably a better approach, since we may not yet be able to predefine all the possible values in our typology, is to make <citCom> a member of the TEI attribute class att.typed, so try that instead. This will give your element two attributes, type and subtype: how would you remove the latter?
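The suggestions above might combine into something like the following sketch of the <citCom> specification. The namespace URI http://example.org/ns/ipp, the ipp prefix, the type value gloss and the example text are all hypothetical; keep the ident, ns and mode values Roma generated for you, along with your existing <desc> and <content>. The element order (classes, content, attList, exemplum) follows the TEI content model for <elementSpec>.

```xml
<elementSpec ident="citCom" mode="add" ns="http://example.org/ns/ipp">
  <!-- existing <desc> stays here -->
  <classes>
    <!-- gives citCom xml:id, xml:lang and the other global attributes -->
    <memberOf key="att.global"/>
    <!-- gives citCom the type and subtype attributes -->
    <memberOf key="att.typed"/>
  </classes>
  <!-- existing <content> stays here -->
  <attList>
    <!-- suppress the subtype attribute inherited from att.typed -->
    <attDef ident="subtype" mode="delete"/>
  </attList>
  <exemplum>
    <p>A commentary attached to a quotation (hypothetical example).</p>
    <!-- egXML content is in the TEI Examples namespace; citCom keeps its own -->
    <egXML xmlns="http://www.tei-c.org/ns/Examples"
           xmlns:ipp="http://example.org/ns/ipp">
      <ipp:citCom type="gloss">This quotation is taken from the first edition.</ipp:citCom>
    </egXML>
  </exemplum>
</elementSpec>
```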

6. Using Schematron

You've seen how you can constrain the content of an element using <content>, and the value of an attribute using <datatype>. What about other constraints, in particular co-dependent ones? For example, we want to ensure that any <div> which has verse as the value of its type attribute has at least one <lg> element in its content.

Constraints of this kind can be expressed using the ISO Schematron language, and then specified in your ODD by supplying them within a <constraintSpec> element inside the <elementSpec>. Add one to your ODD, immediately ahead of the <attList> for <div>. Note that (just like content models and datatype values) the content of this element is expressed using a vocabulary taken from a different namespace. It might look like this:
<constraintSpec ident="verseRule" scheme="isoschematron"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns:sch="http://purl.oclc.org/dsdl/schematron">
 <constraint>
  <!-- passes for any div not of type verse, or for one containing an lg -->
  <assert xmlns="http://purl.oclc.org/dsdl/schematron"
   test="not(@type='verse') or .//tei:lg">
a div of type verse must include an lg element
  </assert>
 </constraint>
</constraintSpec>
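The same rule can also be written with an explicit Schematron <rule> whose context selects only the divisions of type verse, which makes it clear which elements the assertion is checked against. This is an alternative formulation, not the one used in the completed exercise, and it assumes that your ODD processor accepts a <sch:rule> inside <constraint> (as the TEI stylesheets generally do) rather than only a bare <sch:assert>.

```xml
<constraintSpec ident="verseRule" scheme="isoschematron"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraint>
    <!-- the rule fires only on divs whose type is verse -->
    <sch:rule context="tei:div[@type='verse']">
      <!-- each such div must contain an lg at some depth -->
      <sch:assert test=".//tei:lg">a div of type verse must include an lg element</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```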

7. What else?

So far we've looked only at the <elementSpec> element and its children, since these are the parts of an ODD which most people need to modify or extend. However, an ODD can contain several other specification elements. In this section we will talk about TEI classes and TEI datatypes, both of which have already been referenced in passing.

As you already know, the TEI defines two different kinds of class: attribute classes, which supply attributes, and model classes, which determine where an element can be used. The element <classSpec> is used to declare a new class, and also to modify an existing one.

In this part of the exercise we will modify an existing class (att.global) by deleting the attributes rendition and xml:space from it.

  • Modification of an attribute class is done in much the same way as modification of the attributes of an element. We add to the ODD a <classSpec> element specifying the identifier of the class concerned (att.global), its type (atts, since this is an attribute class), the module in which it is to be found (tei) and the mode in which it is being declared (change). Within the <classSpec> element, we supply an <attList> containing an <attDef> for each attribute we wish to remove, specifying mode='delete' on each.
  • If that was too easy, you might like to consider how you would define a new attribute class, and add existing elements to it. For example, we might want to make our revision attribute available on elements other than <div>. Hint: you will need to add a <classSpec> for the new class which will contain an <attList> looking very similar to that within an <elementSpec>; then we can add <memberOf> elements to the <classes> part of all the elements which are to use its attributes.
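The two tasks above might be expressed as follows. The first <classSpec> follows the description given for att.global; the class name att.revised and the descriptions in the second part are hypothetical. We also assume that <classes mode="change"> keeps <div>'s existing class memberships and adds the new one; if you move revision into the class, remember to remove the <attDef> for it from the <div> specification.

```xml
<!-- 1. remove two attributes from the existing class att.global -->
<classSpec ident="att.global" type="atts" module="tei" mode="change">
  <attList>
    <attDef ident="rendition" mode="delete"/>
    <attDef ident="xml:space" mode="delete"/>
  </attList>
</classSpec>

<!-- 2. a new (hypothetical) attribute class supplying revision -->
<classSpec ident="att.revised" type="atts" mode="add">
  <desc>provides a revision date for the elements which are its members</desc>
  <attList>
    <attDef ident="revision" mode="add" usage="opt">
      <desc>date on which this element was last revised</desc>
      <datatype xmlns:rng="http://relaxng.org/ns/structure/1.0">
        <rng:ref name="data.temporal.w3c"/>
      </datatype>
    </attDef>
  </attList>
</classSpec>

<!-- each element that should carry revision joins the new class -->
<elementSpec ident="div" module="textstructure" mode="change">
  <classes mode="change">
    <memberOf key="att.revised" mode="add"/>
  </classes>
</elementSpec>
```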

TEI datatypes are declared by means of a more general facility known as a macro specification, represented in an ODD by a <macroSpec> element. Macros are used extensively in the TEI scheme for two purposes: to define frequently used datatypes, and to define commonly encountered content models. For example, the TEI datatype data.temporal.w3c we used earlier is defined by a <macroSpec> which looks more or less like this:
<macroSpec module="tei" type="dt" ident="data.temporal.w3c"
  xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>defines the range of attribute values expressing a temporal
  expression such as a date, a time, or a combination of them, that
  conform to the W3C <title>XML Schema Part 2: Datatypes</title>
  specification.</desc>
 <!-- ... -->
 <content>
  <rng:choice>
   <rng:data type="date"/>
   <rng:data type="gYear"/>
   <rng:data type="gMonth"/>
   <rng:data type="gDay"/>
   <rng:data type="gYearMonth"/>
   <rng:data type="gMonthDay"/>
   <rng:data type="time"/>
   <rng:data type="dateTime"/>
  </rng:choice>
 </content>
</macroSpec>

The <content> element within a <macroSpec> supplies a declaration expressed (as elsewhere) in the RELAX NG schema language. The ident attribute on the <macroSpec> provides a shortcut name for this declaration, which can then be referenced by other declarations. In this example, the names referenced (date, gYear, etc.) are all datatypes defined by W3C XML Schema, but the content of a macro could be any valid RELAX NG pattern.

Your task is to change this declaration so that the variant forms beginning with a g (gYear etc.) are no longer permitted: in other words, so that an attribute declared as being of the datatype data.temporal.w3c must contain either a full date YYYY-MM-DD, a time HH:MM:SS, or both, but none of the variants such as YYYY or MM-DD.
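One possible answer is sketched below: a <macroSpec> in change mode whose <content> keeps only the three full forms. We assume that your ODD processor lets the <content> of a mode="change" specification replace the original content wholesale, as it does for <elementSpec>; a mode="replace" specification would also work, but must then repeat the <desc> as well.

```xml
<macroSpec ident="data.temporal.w3c" module="tei" type="dt" mode="change"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <content>
    <rng:choice>
      <!-- a full date, e.g. 2010-11-06 -->
      <rng:data type="date"/>
      <!-- a time, e.g. 14:30:00 -->
      <rng:data type="time"/>
      <!-- both together, e.g. 2010-11-06T14:30:00 -->
      <rng:data type="dateTime"/>
    </rng:choice>
  </content>
</macroSpec>
```

If you regenerate the schema from this, revision="2010-11-06" should still validate, while revision="2010" should now be rejected.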

If that seems too easy, we invite you to declare and use a new macro called macro.stuff. This will be used in an imaginary corpus linguistics project, where we want to be able to define several different segment-like elements, all of which have the same model: any mixture of <w>, <pc>, or text, in any order. Of course we could express this model directly in the <content> of each segment-like element, but there are advantages to making explicit that they all have the same potential content, by giving that content a name.

You may find the following simple declaration suitable to start with.
<macroSpec mode="add" ident="macro.stuff"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
 <desc>mix of segment-like elements and text</desc>
 <content>
  <rng:zeroOrMore>
   <rng:choice>
    <rng:text/>
    <rng:ref name="w"/>
    <rng:ref name="pc"/>
   </rng:choice>
  </rng:zeroOrMore>
 </content>
</macroSpec>

When you have added the declaration for your macro to your ODD, remember to include some elements which reference it in their content model, or you will be unable to test it! You could do this either by changing the content model of an existing TEI element such as <phr> or <seg>, or (preferably perhaps) by adding an entirely new element of your own, in your own namespace, such as <phrase>. In the latter case, don't forget to add your new element to an appropriate existing TEI class such as model.segLike, or it will never be accessible. Don't forget, also, that the elements <w> and <pc> are in the module called analysis, so you will need to include that module in your schema specification.
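The "new element" route just described might be sketched as follows. The element name <phrase> comes from the text above; its namespace URI http://example.org/ns/corpus and the description are hypothetical. The <moduleRef> goes alongside the others in your <schemaSpec>.

```xml
<!-- inside schemaSpec: makes w and pc available -->
<moduleRef key="analysis"/>

<!-- a new element in its own (hypothetical) namespace -->
<elementSpec ident="phrase" mode="add" ns="http://example.org/ns/corpus"
  xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <desc>a segment-like unit containing words, punctuation and text</desc>
  <classes>
    <memberOf key="att.global"/>
    <!-- lets phrase appear wherever seg-like elements are allowed -->
    <memberOf key="model.segLike"/>
  </classes>
  <content>
    <!-- reuse the shared content model by name -->
    <rng:ref name="macro.stuff"/>
  </content>
</elementSpec>
```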

8. And finally...

We've covered quite a lot in this exercise. Don't worry if you didn't manage to complete all of it: as long as you have carried out some of the suggested modifications, seen how they affect your schema, and hence the way that oXygen behaves when you edit a document using the schema, you've done very well!

A completed version of this exercise is available in the work directory as expanded_ipp.xml.



Sebastian Rahtz. Date: 2010-11-06
Copyright University of Oxford