1. Task

…all based on a chimerical and simplistic idea that you can take an arbitrary TEI document which models everything you know in a single XML vocabulary, and then extract useful RDF assertions in CIDOC CRM for interchange.

  • record the relationship of TEI elements to known CIDOC CRM concepts in a formal way, maintained in a single document with the mapping guidelines
  • provide actual mapping code to get from TEI XML to RDF XML
  • consider workflow and recommendations on tailoring conversion
  • target is a module for OxGarage — upload your TEI XML file, get back an RDF file

Contrast research on extracting assertions from published literature using NLP (eg Stellar/Star work on archaeological grey literature).

2. Background to my work

The CLAROS project, based at Oxford, aims to combine discrete databases of information about the ancient world in an RDF triplestore of assertions expressed in CIDOC CRM.

Currently includes art objects, archaeological sites, antiquarian photographs, and onomastics.

The Lexicon of Greek Personal Names contributes via a representation in TEI XML.

3. Basic tool for recording mapping: <equiv>

<elementSpec ident="event" mode="change">
 <equiv
   filter="crm.xsl"
   mimeType="text/xsl"
   name="E5"
   uri="http://erlangen-crm.org/101001/E5_Event"/>

</elementSpec>
name
names the underlying concept of which the parent is a representation
uri
references the underlying concept of which the parent is a representation by means of some external identifier
filter
references an external script which contains a method to transform instances of this element to canonical TEI
mimeType
MIME media type of filter script

4. Workflow

Read the ODD, extract the filter information from each <equiv>, and use it to generate a wrapper XSLT script:
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <xsl:import href="crm.xsl"/>
 <xsl:template match="*[ancestor::teiHeader]"/>
 <xsl:template match="*">
  <xsl:apply-templates
    select="*|@*|processing-instruction()|comment()|text()"/>

 </xsl:template>
 <xsl:template
   match="text()|comment()|@*|processing-instruction()"/>

 <xsl:template match="person">
  <xsl:call-template name="E21"/>
 </xsl:template>
 <xsl:template match="place">
  <xsl:call-template name="E53"/>
 </xsl:template>
 <xsl:template match="persName">
  <xsl:call-template name="E82"/>
 </xsl:template>
 <xsl:template match="placeName">
  <xsl:call-template name="E48"/>
 </xsl:template>
 <xsl:template match="event">
  <xsl:call-template name="E5"/>
 </xsl:template>
 <xsl:template match="residence">
  <xsl:call-template name="P74"/>
 </xsl:template>
 <xsl:template match="birth">
  <xsl:call-template name="E67"/>
 </xsl:template>
 <xsl:template match="death">
  <xsl:call-template name="E69"/>
 </xsl:template>
 <xsl:template match="geo">
  <xsl:call-template name="E47"/>
 </xsl:template>
 <xsl:template match="name">
  <xsl:call-template name="teiname"/>
 </xsl:template>
</xsl:stylesheet>
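
A wrapper like the one above could itself be produced by a small XSLT run over the ODD. The following is only a sketch under assumptions, not the actual generator: it assumes that the @name of each <equiv> is the name of a template in the filter script, that the @ident of the parent <elementSpec> is the element to match, and that only the first <equiv> with a filter matters. The `out` prefix and its namespace URI are placeholders, and the default "walk the tree" templates shown above are left out.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:out="urn:x-example:output-xslt"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <!-- out:* elements are written to the result as xsl:* elements -->
 <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
 <xsl:output method="xml" indent="yes"/>

 <xsl:template match="/">
  <out:stylesheet version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
   <!-- one import per distinct filter script named in the ODD -->
   <xsl:for-each select="distinct-values(//elementSpec/equiv/@filter)">
    <out:import href="{.}"/>
   </xsl:for-each>
   <!-- one matching template per elementSpec that has a usable equiv -->
   <xsl:for-each select="//elementSpec[equiv[@filter and @name]]">
    <out:template match="{@ident}">
     <out:call-template name="{equiv[@filter and @name][1]/@name}"/>
    </out:template>
   </xsl:for-each>
  </out:stylesheet>
 </xsl:template>
</xsl:stylesheet>
```

Generating the wrapper rather than writing it by hand keeps the ODD as the single place where the mapping is recorded.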

5. What is in crm.xsl?

<xsl:template name="E47">
 <P87_is_identified_by>
  <E47_Place_Spatial_Coordinates>
   <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <xsl:value-of select="."/>
   </value>
  </E47_Place_Spatial_Coordinates>
 </P87_is_identified_by>
</xsl:template>
<xsl:template name="E69">
 <P100i_died_in>
  <E69_Death>
   <P4_has_time-span>
    <E52_Time-Span>
     <P82_at_some_time_within>
      <E61_Time_Primitive>
       <xsl:call-template name="calc-date-value"/>
      </E61_Time_Primitive>
     </P82_at_some_time_within>
    </E52_Time-Span>
   </P4_has_time-span>
  </E69_Death>
 </P100i_died_in>
</xsl:template>
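
The E69 template calls calc-date-value, which is not shown. As a sketch of what such a helper might do (an assumption, not the content of the real crm.xsl), it could prefer the normalised TEI dating attributes and fall back to the element text. Writing a from/to pair as a single "from/to" string is likewise only an illustrative choice.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <!-- Hypothetical helper: expects a dated TEI element
      (birth, death, date) as the context node -->
 <xsl:template name="calc-date-value">
  <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <xsl:choose>
    <!-- a single normalised date -->
    <xsl:when test="@when">
     <xsl:value-of select="@when"/>
    </xsl:when>
    <!-- a range: join whichever ends are present with a slash -->
    <xsl:when test="@from or @to">
     <xsl:value-of select="string-join((@from, @to), '/')"/>
    </xsl:when>
    <!-- no normalised value: fall back to the text as written -->
    <xsl:otherwise>
     <xsl:value-of select="normalize-space(.)"/>
    </xsl:otherwise>
   </xsl:choose>
  </value>
 </xsl:template>
</xsl:stylesheet>
```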

6. Input

<person xml:id="ArnMag01" sex="1" role="scholar">
 <persName xml:lang="is">Árni Magnússon</persName>
 <persName xml:lang="la">Arnas Magnæus</persName>
 <persName xml:lang="da">Arne Magnusson</persName>
 <birth when="1663-11-13">13 November 1663</birth>
 <death when="1730-01-07">7 January 1730</death>
 <residence>
  <date from="1663" to="1680">1663-1680</date>
  <placeName>
   <settlement type="farm">Hvammur</settlement>
   <region type="county">Dalasýsla</region>
   <region type="compass">Western</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1680" to="1683">1680-1683</date>
  <placeName>
   <settlement type="institution">Skálholt</settlement>
   <region type="county">Árnessýsla</region>
   <region type="compass">Southern</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1683" to="1685">1683-1685</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1685" to="1686">1685-1686</date>
  <placeName>
   <settlement type="farm">Hvammur</settlement>
   <region type="county">Dalasýsla</region>
   <region type="compass">Western</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1686" to="1689">1686-1689</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1689" to="1690">1689-1690</date>
  <placeName>
   <country key="NO">Norway</country>
  </placeName>
 </residence>
 <residence>
  <date from="1690" to="1694">1690-1694</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1694" to="1696">1694-1696</date>
  <placeName>
   <country key="D">Germany</country>
  </placeName>
 </residence>
 <residence>
  <date from="1696" to="1702">1696-1702</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1702" to="1705">1702-1705</date>
  <placeName>
   <settlement type="institution">Skálholt</settlement>
   <region type="county">Árnessýsla</region>
   <region type="compass">Southern</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1705" to="1706">1705-1706</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1706" to="1708">1706-1708</date>
  <placeName>
   <settlement type="institution">Skálholt</settlement>
   <region type="county">Árnessýsla</region>
   <region type="compass">Southern</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1708" to="1709">1708-1709</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <residence>
  <date from="1709" to="1712">1709-1712</date>
  <placeName>
   <settlement type="institution">Skálholt</settlement>
   <region type="county">Árnessýsla</region>
   <region type="compass">Southern</region>
   <country key="IS">Iceland</country>
  </placeName>
 </residence>
 <residence>
  <date from="1712" to="1730">1712-1730</date>
  <placeName>
   <settlement type="city">Copenhagen</settlement>
   <country key="DK">Denmark</country>
  </placeName>
 </residence>
 <occupation>Professor, <foreign xml:lang="da">Arkivsekretær</foreign> (Secretary of the Royal Archives)</occupation>
 <bibl>
  <ref target="IsAev">Íslenzkar æviskrár</ref> I, pp. 62-63</bibl>
 <bibl>
  <ref target="DBL">Dansk Biografisk Leksikon</ref> XV, pp. 230-34</bibl>
</person>

7. Result

<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>

 <E21_Person xmlns="http://purl.org/NET/crm-owl#"
 
   about="http://www.example.com/idArnMag01">

  <P131_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
   xml:lang="is">

   <E82_Actor_Appellation xmlns="http://purl.org/NET/crm-owl#"
   >

    <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    >
Árni Magnússon</value></E82_Actor_Appellation></P131_is_identified_by>
  <P131_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
   xml:lang="la">

   <E82_Actor_Appellation xmlns="http://purl.org/NET/crm-owl#"
   >

    <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    >
Arnas Magnæus</value></E82_Actor_Appellation></P131_is_identified_by>
  <P131_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
   xml:lang="da">

   <E82_Actor_Appellation xmlns="http://purl.org/NET/crm-owl#"
   >

    <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    >
Arne Magnusson</value></E82_Actor_Appellation></P131_is_identified_by>
  <P98i_was_born xmlns="http://purl.org/NET/crm-owl#"
  >

   <E67_Birth xmlns="http://purl.org/NET/crm-owl#"
   >

    <P4_has_time-span xmlns="http://purl.org/NET/crm-owl#"
    >

     <E52_Time-Span xmlns="http://purl.org/NET/crm-owl#"
     >

      <P82_at_some_time_within xmlns="http://purl.org/NET/crm-owl#"
      >

       <E61_Time_Primitive xmlns="http://purl.org/NET/crm-owl#"
       >

        <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        >
1663-11-13</value></E61_Time_Primitive></P82_at_some_time_within></E52_Time-Span></P4_has_time-span>
    <P7_took_place_at xmlns="http://purl.org/NET/crm-owl#"
     resource="http://www.example.com/id/1/"/>
</E67_Birth></P98i_was_born>
  <P100i_died_in xmlns="http://purl.org/NET/crm-owl#"
  >

   <E69_Death xmlns="http://purl.org/NET/crm-owl#"
   >

    <P4_has_time-span xmlns="http://purl.org/NET/crm-owl#"
    >

     <E52_Time-Span xmlns="http://purl.org/NET/crm-owl#"
     >

      <P82_at_some_time_within xmlns="http://purl.org/NET/crm-owl#"
      >

       <E61_Time_Primitive xmlns="http://purl.org/NET/crm-owl#"
       >

        <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        >
1730-01-07</value></E61_Time_Primitive></P82_at_some_time_within></E52_Time-Span></P4_has_time-span></E69_Death></P100i_died_in>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/1">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Hvammur Dalasýsla Western Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/2">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Skálholt Árnessýsla Southern Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/3">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/4">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Hvammur Dalasýsla Western Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/5">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/6">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Norway</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/7">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/8">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Germany</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/9">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/10">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Skálholt Árnessýsla Southern Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/11">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/12">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Skálholt Árnessýsla Southern Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/13">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/14">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Skálholt Árnessýsla Southern Iceland</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence>
  <P74_has_current_or_former_residence xmlns="http://purl.org/NET/crm-owl#"
  >

   <E53_Place xmlns="http://purl.org/NET/crm-owl#"
    about="http://www.example.com/id/15">

    <P87_is_identified_by xmlns="http://purl.org/NET/crm-owl#"
    >

     <E48_Place_Name xmlns="http://purl.org/NET/crm-owl#"
     >

      <value xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      >
Copenhagen Denmark</value></E48_Place_Name></P87_is_identified_by></E53_Place></P74_has_current_or_former_residence></E21_Person></RDF>

8. Problem of ambiguity of context

Contrast
<place>
 <placeName>Zadar</placeName>
</place>
with
<p>He was born in <placeName>Zadar</placeName>
</p>
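
One way to handle the contrast is to let the match pattern carry the context. The sketch below assumes that crm.xsl provides the E48 template used in the wrapper, and that an unlinked <placeName> in running prose should produce no assertion at all; both are assumptions made for illustration, and the explicit priorities are arbitrary values chosen to make the ordering unambiguous.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <!-- assumed to define the named template E48 -->
 <xsl:import href="crm.xsl"/>

 <!-- a placeName directly inside place names that place -->
 <xsl:template match="place/placeName" priority="2">
  <xsl:call-template name="E48"/>
 </xsl:template>

 <!-- a placeName in prose with no pointer is only a mention:
      emit nothing rather than guess what it refers to -->
 <xsl:template match="p//placeName[not(@ref)]" priority="1"/>
</xsl:stylesheet>
```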

9. Problem of ambiguity: <name>

<name type="place">Zadar</name>
<name type="person">Oyvind</name>
<rs type="person">Piotr</rs>
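
The wrapper calls a template named teiname for <name>. A hypothetical version of it, dispatching on @type and reporting anything it cannot classify, might look like this. It assumes crm.xsl supplies E82 and E48, and that "person" and "place" are the only @type values of interest; treating <rs> the same way is also an assumption.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">

 <!-- assumed to define the named templates E82 and E48 -->
 <xsl:import href="crm.xsl"/>

 <!-- Hypothetical dispatcher: choose the CRM class from @type -->
 <xsl:template name="teiname">
  <xsl:choose>
   <xsl:when test="@type = 'person'">
    <xsl:call-template name="E82"/>
   </xsl:when>
   <xsl:when test="@type = 'place'">
    <xsl:call-template name="E48"/>
   </xsl:when>
   <xsl:otherwise>
    <!-- no usable @type: report it and emit no assertion -->
    <xsl:message>
     <xsl:text>Cannot map </xsl:text>
     <xsl:value-of select="name()"/>
     <xsl:text>: </xsl:text>
     <xsl:value-of select="normalize-space(.)"/>
    </xsl:message>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

 <!-- rs carries the same kind of @type, so reuse the dispatcher -->
 <xsl:template match="name | rs">
  <xsl:call-template name="teiname"/>
 </xsl:template>
</xsl:stylesheet>
```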

10. The usual problems

  • how to record location in TEI text of source claim
  • date of claim
  • how to actually express dates
  • representing uncertainty and precision
  • chronological periods
  • how to actually express spatial coordinates

11. Taxonomies, categories, @type etc

Do we
  • understand how to use E55_Type
  • convert existing taxonomies and categories to SKOS notation, and refer to it
  • enhance existing taxonomies with links to SKOS
  • use informal @type from TEI documents to generate SKOS on the fly?
  • something else?

12. What more could we do with the ODD?

Use it to create a schema to check whether a document can be mapped

  • remove elements which cannot be matched to CRM
  • add constraints to check situations which stop meaningful mapping
  • eg check whether a <placeName> has a ref
  • eg check whether a <placeName> has an xml:id
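
A constraint of this kind can be written in the ODD itself as a Schematron rule. The fragment below is a sketch: the ident value and the message wording are made up for illustration, and the appropriate value of @scheme depends on the version of TEI P5 in use.

```xml
<elementSpec ident="placeName" mode="change">
 <constraintSpec ident="placeName-has-ref" scheme="isoschematron">
  <constraint>
   <!-- fails for any placeName that does not point at a known place -->
   <sch:assert xmlns:sch="http://purl.oclc.org/dsdl/schematron"
     test="@ref">A placeName needs a ref attribute before it can be
     mapped to a place.</sch:assert>
  </constraint>
 </constraintSpec>
</elementSpec>
```

A schema generated from such an ODD would then flag documents that cannot be mapped before any conversion is attempted.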

13. What next?

Can this simplistic approach work?

Todo:


Sebastian Rahtz. Date: TEI members meeting, Zadar, 2010-11-13
Copyright University of Oxford