TEI - why we need to keep it simple
The experience of the Diplomatarium Danicum project
Mogens Devantier & Thomas Hansen, Society for Danish Language and Literature

Diplomatarium Danicum
Goal: Publish all documents pertaining to Denmark, AD 789-1450
Currently: 3-year Carlsberg Foundation project aiming at development of
• Textbank - archive with standardized texts
• Web-application - consumer of standardized texts
Future: Textbank leverages
1. publication of documents 1413-1450 - approx. 8500 texts
2. transformed material 1401-1412 - approx. 3000 texts [http://diplomatarium.dk/]
3. digitized annotated material 789-1400 - approx. 15 000 texts

Why TEI?
Two reasons
1. The most popular way of communicating data that are
o portable
o fine-grained and structured
2. The XML modus operandi
o Specialization
o Standardization
o Routinization

TEI - it gets complicated
Standardization "lite" - TEI has guidelines, not specifications, so
• Specialization needed at all levels
• Routinization difficult
When routinization is obstructed, portability is compromised
• Format inconsistency
• Tag abuse
• Missing information - non-existent or undetermined?
Serious problems
• querying - low precision, low recall
• rendering - maintaining stylesheets is difficult

Simplify: Controlling input with a TEI user interface
Standardize! - develop your own standard and map it to TEI
Make it operable with
• schema
• stylesheet
Make it intelligible with
• documentation
Make documentation transparent and accessible with
• URIs

Simple uniform resources are strategic
Immediate advantages in terms of
• usability
• management - segmentation of work, enriching markup
• easier implementation
• support
Short-term advantages in terms of
• preservation - attainable and should be promoted
Long-term advantages in terms of
• interoperability - essential to the final vision, but not always attainable right now

Short-term advantages of simple
Indications of an emerging market for text resources:
• Centralization - more resources in fewer repositories
o EU-CLARIN
o National research infrastructures
• Maximization - more texts, more consumers, more tools
• Specialization - producers, preservers, consumers
Markets depend on standards in order to compare the goods; therefore, most infrastructure projects implement TEI

Long-term advantages of simple
Given the fact that
• no single archive will ever hold all resources, and
• no single XML markup schema will ever be imposed on all resources
users will, at some point, depend on interoperation between different archives and resources
Interoperation requires standardization - a set of shared semantics implemented by a service that may function as a single point of access to distributed resources

Conclusion - We need to keep it simple because...
if the standard is observed
• users will have immediate access to more resources
• the resources will be better preserved, and
• services will have easier access to more resources
After all...
"a complex system that works is always based on a simple system that works"
- freely adapted from John Gall, Systemantics, 1978

Contact: th@dsl.dk
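
Appendix - a sketch of "develop your own standard and map it to TEI"
The advice on the Simplify slide could be read roughly as follows, here in Python. This is a minimal sketch under stated assumptions, not the project's implementation: the Charter record, the Missing vocabulary, the sample identifier and the decision to place the date in sourceDesc/bibl are all hypothetical. The element names come from TEI, but the output is not validated against a TEI schema. The point is only that a small, closed house record forces every text to answer the "non-existent or undetermined?" question before any TEI is written.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from typing import Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


class Missing(Enum):
    """Hypothetical house vocabulary: why a value is absent."""
    NONE = "none"        # the document never carried this information
    UNKNOWN = "unknown"  # the information has not been determined


@dataclass
class Charter:
    """Hypothetical house record with a small, closed set of fields."""
    doc_id: str
    title: str
    text: str
    date: Optional[str] = None              # ISO date such as "1401-05-12"
    date_missing: Optional[Missing] = None  # required when date is absent

    def validate(self) -> None:
        # Exactly one of the two must be set, so absence is never ambiguous.
        if (self.date is None) == (self.date_missing is None):
            raise ValueError("give either a date or the reason it is missing")


def q(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def to_tei(c: Charter) -> ET.Element:
    """Map one house record to a small TEI tree (assumed mapping)."""
    c.validate()
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = c.title
    pub = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub, q("p")).text = "Illustrative sketch"
    bibl = ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("bibl"))
    date = ET.SubElement(bibl, q("date"))
    if c.date is not None:
        date.set("when", c.date)
    else:
        # House convention (assumption): record the reason on @type.
        date.set("type", c.date_missing.value)
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    div = ET.SubElement(body, q("div"), {"type": "charter", XML_ID: c.doc_id})
    ET.SubElement(div, q("p")).text = c.text
    return tei


if __name__ == "__main__":
    sample = Charter("dd-sample-001", "Sample charter",
                     "In nomine domini amen.", date="1401-05-12")
    print(ET.tostring(to_tei(sample), encoding="unicode"))
```

Because the mapping lives in one function, the schema, stylesheet and documentation can all be written against the house record, and the TEI output stays uniform across editors.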