ppt - Thomas Hansen's pages

TEI - why we need to keep it simple
The experience of the Diplomatarium Danicum project
Mogens Devantier & Thomas Hansen,
Society for Danish Language and Literature
Diplomatarium Danicum
Goal: Publish all documents pertaining to Denmark, AD 789-1450
Currently: 3-year Carlsberg Foundation project aiming at development of
• Textbank - archive with standardized texts
• Web-application - consumer of standardized texts
Future: Textbank leverages
1. publication of documents 1413-1450 - approx. 8,500 texts
2. transformed material 1401-1412 - approx. 3,000 texts [http://diplomatarium.dk/]
3. digitized annotated material 789-1400 - approx. 15,000 texts
Why TEI?
Two reasons
1. The most popular way of communicating data that are
o portable
o fine-grained and structured
2. The XML modus operandi
o Specialization
o Standardization
o Routinization
TEI - it gets complicated
Standardization "lite" - TEI has guidelines, not specifications, so
• Specialization needed at all levels
• Routinization difficult
When routinization is obstructed, portability is compromised
• Format inconsistency
• Tag-abuse
• Missing information - non-existent or undetermined?
Serious problems
• querying - low precision, low recall
• rendering - maintaining stylesheets is difficult
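The querying problem can be illustrated with a minimal sketch. It assumes three hypothetical charters that all carry a date but encode it in three different ways; the document ids and contents are invented for illustration, and the TEI namespace is omitted for brevity. A query written against one encoding misses the other two, so recall drops.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy corpus: every charter is dated, but each encodes the date differently.
DOCS = {
    "charter-a": '<text><date when="1401-05-01">1 May 1401</date></text>',  # normalized attribute
    "charter-b": '<text><date>1401, 1 May</date></text>',                   # no normalized value
    "charter-c": '<text><seg type="date">Kalendis Maii 1401</seg></text>',  # generic element used instead
}

def dated_by_attribute(docs):
    """Return ids of documents in which a <date> element carries a @when attribute."""
    hits = []
    for doc_id, xml in docs.items():
        root = ET.fromstring(xml)
        if root.find(".//date[@when]") is not None:
            hits.append(doc_id)
    return hits

hits = dated_by_attribute(DOCS)
# All three documents are relevant here, so recall = found / total.
recall = len(hits) / len(DOCS)
print(hits, f"recall={recall:.2f}")
```

Only the first charter is found, although all three are relevant. A stylesheet faces the same situation: it needs one rule per encoding variant.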
Simplify: Controlling input with a TEI user interface
Standardize! - develop your own standard and map it to TEI
Make it operable with
• schema
• stylesheet
Make it intelligible with
• documentation
Make documentation transparent and accessible with
• URIs
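A minimal sketch of "develop your own standard and map it to TEI" might look like the following. The in-house element names (charter, summary, body, issued) and their TEI targets are assumptions made for illustration, not the project's actual vocabulary. The mapping table plays the role of a very small schema (unknown elements are rejected) and of a stylesheet (known elements are renamed).

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house element names mapped to (TEI element name, fixed attributes).
MAPPING = {
    "charter": ("div", {"type": "charter"}),
    "summary": ("note", {"type": "summary"}),
    "body": ("p", {}),
    "issued": ("date", {}),
}

def to_tei(element):
    """Recursively rebuild an in-house element tree using TEI element names."""
    if element.tag not in MAPPING:
        # Minimal schema check: anything outside the in-house standard is rejected.
        raise ValueError(f"element <{element.tag}> is not part of the in-house standard")
    tei_name, fixed_attrs = MAPPING[element.tag]
    out = ET.Element(tei_name, {**element.attrib, **fixed_attrs})
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(to_tei(child))
    return out

source = ET.fromstring(
    "<charter><summary>Sale of land.</summary>"
    '<body>Issued <issued when="1401-05-01">1 May 1401</issued>.</body></charter>'
)
tei = to_tei(source)
print(ET.tostring(tei, encoding="unicode"))
```

Because editors only ever see the small in-house vocabulary, the mapping to TEI can be changed in one place without touching the texts.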
Simple uniform resources are strategic
Immediate advantages in terms of
• usability
• management - segmentation of work, enriching markup
• easier implementation
• support
Short-term advantages in terms of
• preservation - attainable and should be promoted
Long-term advantages in terms of
• interoperability - essential to the final vision, but not always attainable right now
Short-term advantages of simple
Indications of an emerging market for text resources:
• Centralization - more resources in fewer repositories
o EU-CLARIN
o National research infrastructures
• Maximization - more texts, more consumers, more tools
• Specialization - producers, preservers, consumers
Markets depend on standards in order to compare goods; therefore, most infrastructure projects implement TEI
Long-term advantages of simple
Given the fact that
• no single archive will ever hold all resources, and
• no single XML markup schema will ever be imposed on all resources
- users will, at some point, depend on interoperation between different archives and resources
Interoperation requires standardization - a set of shared semantics implemented by a service that may function as a single point of access to distributed resources
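The idea of a single point of access can be sketched roughly as follows. The archive names, field names and records are hypothetical; the sketch only assumes that each archive declares how the shared field names map to its own local ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Archive:
    """A hypothetical archive with its own local field names."""
    name: str
    field_map: Dict[str, str]        # shared field name -> local field name
    records: List[Dict[str, str]]

    def search(self, shared_field: str, value: str) -> List[Dict[str, str]]:
        local_field = self.field_map.get(shared_field)
        if local_field is None:
            return []                # this archive does not expose that field
        return [r for r in self.records if r.get(local_field) == value]

def search_all(archives: List[Archive], shared_field: str, value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Single point of access: one query in shared terms, sent to every archive."""
    results = []
    for archive in archives:
        for record in archive.search(shared_field, value):
            results.append((archive.name, record))
    return results

archives = [
    Archive("archive-one", {"year": "issued"}, [{"id": "a1", "issued": "1401"}]),
    Archive("archive-two", {"year": "datum"}, [{"id": "b7", "datum": "1401"}, {"id": "b8", "datum": "1402"}]),
]
found = search_all(archives, "year", "1401")
print(found)
```

The simpler the shared semantics, the shorter each archive's field map, and the more archives can realistically take part.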
Conclusion - We need to keep it simple
because...
if the standard is observed
• users will have immediate access to more resources
• the resources will be better preserved, and
• services will have easier access to more resources
After all... "a complex system that works is always based on a simple system that works"
- freely adapted from John Gall, Systemantics, 1978
Contact: th@dsl.dk