Applying Semantic Web technologies to medieval manuscript research

European Science Foundation Exploratory Workshop, Birmingham (UK), 30 March – 1 April 2009
Position Paper
Toby Burrows, ARC Network for Early European Research
Medieval manuscript research is a complex, fragmented, multilingual field of
knowledge, which is difficult to navigate, analyse and exploit. Though printed sources
are still of great importance and value, there is a large and rapidly growing body of
material on the Web. Much of this Web material consists of information about
manuscripts, though a considerable amount of digitization and transcription has also
been carried out.
This European Science Foundation Exploratory Workshop focuses on the possibilities
for applying new Semantic Web technologies to enhance medieval manuscript
research. These technologies are intended to represent a complex body of knowledge
in standardized ways, and to enable sophisticated discovery and reasoning tools to be
applied to data and documents across the Web. These technologies have the potential
to enhance medieval manuscript research greatly, by enabling much more effective
access to, and use of, relevant materials and knowledge. Imagine a Web service
through which you could readily find all manuscripts of relevance to the research
question you are investigating, and be pointed to previous work about them and to
digital representations of them…
Medieval manuscripts: research questions
Medieval manuscripts are used in addressing a wide range of research questions. Most
obviously, these include research into the characteristics of manuscripts
themselves as physical objects. These characteristics include: the place of origin, the
date or period of origin, the materials used, the decoration and illumination, the
handwriting, the scribe, the binding, the arrangement of the physical volume, and the
language. Research into the subsequent history of a manuscript looks at its owners
and at changes to its appearance over time, as well as at its modern location, and its
place in modern collections.
Relationships between manuscripts are a common topic, including research
which reunites dispersed leaves of what was originally a single manuscript.
Identifying connections between specific medieval manuscripts and other materials
which survive from the medieval period (especially art works, buildings, and other
material objects) is another closely related area of research.
Defining these physical characteristics also forms the starting-point of many
research projects, e.g. defining specific styles of handwriting, establishing different
categories of decoration, and identifying different ways of arranging or binding
physical volumes in the medieval period.
Fundamental to all these kinds of research projects is the availability of
standardized descriptions of manuscripts as physical objects. In the digital arena, the
most promising approach to standardizing descriptions is that of the Text Encoding
Initiative (TEI) (1).
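To indicate what such a standardized description makes possible, the following minimal sketch reads a few fields from a manuscript description. It assumes element names from the TEI P5 manuscript description module (msDesc, msIdentifier, msContents, history); the record itself, including the library, shelfmark and dates, is invented for illustration, and the summarize function is a hypothetical helper rather than part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical manuscript description using TEI P5 element names.
# All values (place, library, shelfmark, dates) are invented.
TEI_NS = "http://www.tei-c.org/ns/1.0"
SAMPLE = f"""
<msDesc xmlns="{TEI_NS}">
  <msIdentifier>
    <settlement>Exampletown</settlement>
    <repository>Example Library</repository>
    <idno>MS 1</idno>
  </msIdentifier>
  <msContents>
    <msItem>
      <author>Example Author</author>
      <title>Example Work</title>
      <incipit>In principio</incipit>
    </msItem>
  </msContents>
  <history>
    <origin>
      <origPlace>Exampleshire</origPlace>
      <origDate notBefore="1150" notAfter="1200">s. xii 2/2</origDate>
    </origin>
  </history>
</msDesc>
"""

def summarize(xml_text):
    """Pull a few standardized fields out of a TEI msDesc element."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(xml_text.strip())

    def text(path):
        node = root.find(path, ns)
        return node.text.strip() if node is not None and node.text else None

    date = root.find(".//tei:origDate", ns)
    return {
        "shelfmark": text("tei:msIdentifier/tei:idno"),
        "repository": text("tei:msIdentifier/tei:repository"),
        "incipit": text(".//tei:incipit"),
        "not_before": date.get("notBefore") if date is not None else None,
        "not_after": date.get("notAfter") if date is not None else None,
    }

summary = summarize(SAMPLE)
print(summary)
```

Because the same element names are used whatever the holding institution, the same few lines could in principle be run against descriptions from different catalogues.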
The other major area of research involves the use of manuscripts as evidence
for all aspects of life in the medieval period. This requires knowledge and
understanding of the content of a manuscript – the text, the images, the music, etc.
This kind of research is heavily dependent on the descriptors used to identify the
content, including authors’ names, titles of works, incipits (opening words), subject
and concept terms, and so on.
Both these major areas of research also draw on a knowledge of the secondary
literature relating to specific manuscripts: catalogues and descriptions (medieval and
modern), secondary works, bibliographies, etc. These are likely to reflect changes
over time – as concepts shift, and descriptions and attributions are revised. All aspects
of the body of knowledge in this field are also multilingual; descriptions and
descriptors may be in a variety of (mainly European) languages.
Web services for medieval manuscript research
There are many existing Web services relevant to medieval manuscript research. At
present, they have to be consulted separately and individually – though search engines
like Google cover some of them. These services employ a range of different
descriptive standards and vocabularies, and use a variety of different technologies to
make their information available on the Web.
• Numerous collecting institutions provide information about the manuscripts
  they hold, either as part of more general databases or as specific manuscript
  databases (2-3). There are a range of national databases (4-5) as well as a
  small number of international databases (6-8).
• Some of these services provide digital images of manuscripts as well as
  descriptive information about them. Europeana, for instance, focuses
  specifically on digitized materials, but its scope is much broader than
  manuscripts (9).
• There are many Web sites which list, transcribe, or provide digital images of
  manuscripts of a specific text or relating to a specific medieval author (10-11).
• Ancillary Web services include sites devoted to manuscript terminology and
  vocabularies (e.g., 12), incipits (13), subjects (14), authors (15-16), and people
  more generally (17-20).
• Other services provide indexes to journal articles, scholarly books and other
  secondary literature about specific manuscripts (e.g. 21-22).
Semantic Web technologies
Semantic Web technologies are methods for adding semantic structures to Web data
and documents, with the broad aim of making them more interoperable and
automatically discoverable. The main building-blocks for Semantic Web services are
as follows.
• Object identifiers: machine-processable alpha-numeric addresses (URIs) used
  to identify an object uniquely. It is possible to assign identifiers to abstract
  “objects” like concepts, subject terms, personal names and place names, as
  well as to physical objects like manuscripts.
• Ontologies and ontological languages: ways of structuring the relationships
  between elements of a body of knowledge, expressed in a formal language like
  OWL or SKOS. The result is a machine-readable conceptual map of a domain
  of knowledge.
• RDF databases: collections of statements about objects, their properties, and
  their relationships (“triples”), expressed in the RDF syntax. These statements
  can be used to show how and where an object fits in the ontological structure
  of the body of knowledge.
• Agent systems and Web services: software environments which can be built
  to explore, analyse and exploit the knowledge embedded in ontologies and
  RDF databases.
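The way these building-blocks fit together can be indicated with a minimal sketch in plain Python. It is not an RDF implementation: triples are modelled as tuples, and the match function is only loosely analogous to a single triple pattern in a query language such as SPARQL. All example.org URIs, the hasScript property and the two manuscripts are assumptions made for illustration; only the SKOS and RDF namespace URIs are real.

```python
# Plain-Python sketch of the building-blocks above; no RDF library is used.
# Every example.org URI and every data value is hypothetical.
BASE = "http://example.org/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Object identifiers: one URI per manuscript, concept, class or property.
ms1 = BASE + "manuscript/ms-1"
ms2 = BASE + "manuscript/ms-2"
caroline = BASE + "script/caroline-minuscule"
minuscule = BASE + "script/minuscule"
Manuscript = BASE + "ontology/Manuscript"
has_script = BASE + "ontology/hasScript"

# Triples: (subject, predicate, object) statements. Literals are plain
# strings here; a real RDF store distinguishes them from URIs.
triples = {
    (ms1, RDF_TYPE, Manuscript),
    (ms2, RDF_TYPE, Manuscript),
    (ms1, has_script, caroline),
    (ms2, has_script, minuscule),
    # SKOS-style vocabulary: a narrower concept points to a broader one.
    (caroline, SKOS + "broader", minuscule),
    (caroline, SKOS + "prefLabel", "Caroline minuscule"),
    (minuscule, SKOS + "prefLabel", "Minuscule"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def narrower_or_same(concept):
    """Collect a concept plus everything below it via skos:broader."""
    found, todo = set(), [concept]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(t[0] for t in match(p=SKOS + "broader", o=current))
    return found

def manuscripts_with_script(concept):
    """Find manuscripts whose script is the concept or a narrower one."""
    wanted = narrower_or_same(concept)
    return sorted(t[0] for t in match(p=has_script) if t[2] in wanted)

print(manuscripts_with_script(minuscule))
```

A search for the broader concept returns both manuscripts, while a search for the narrower one returns only the first. That small piece of reasoning over the vocabulary is the kind of task an agent system or Web service would carry out at scale.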
Future directions
What sorts of projects would be required?
Transforming existing knowledge into Semantic Web forms:
• Assigning and maintaining URIs for objects;
• Transforming existing vocabularies into SKOS-type ontologies;
• Mapping between vocabularies (SKOS, OWL).
Making these building-blocks available on the Web:
• Delivering ontologies for use over the Web;
• Building RDF databases (“triple stores”) out of existing manuscript
  descriptions;
• Embedding ontologies into these databases.
Building Web services to exploit these data stores:
• Building query and browse services (e.g. using SPARQL);
• Enabling decentralized updating of these services.
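The first two groups of tasks can be illustrated with a small sketch which turns one existing catalogue record into triples. It assumes a flat record with repository, shelfmark and script fields; the example.org namespace, the property names, the URI-minting rule and the mapping of local script terms are all hypothetical, and a real project would need agreed policies for each of them.

```python
from urllib.parse import quote

# Hypothetical namespace and property names; a real project would agree
# on these and on a policy for keeping the URIs stable.
BASE = "http://example.org/"

def mint_uri(kind, label):
    """Derive a URI from a label by lower-casing and percent-encoding it."""
    slug = quote("-".join(label.lower().split()))
    return f"{BASE}{kind}/{slug}"

# Invented mapping from one catalogue's local script terms to shared concepts.
SCRIPT_MAP = {
    "carol. min.": mint_uri("script", "Caroline minuscule"),
    "caroline": mint_uri("script", "Caroline minuscule"),
}

def record_to_triples(record):
    """Turn one flat catalogue record into (subject, predicate, object) triples."""
    subject = mint_uri("manuscript", record["repository"] + " " + record["shelfmark"])
    triples = [
        (subject, BASE + "ontology/shelfmark", record["shelfmark"]),
        (subject, BASE + "ontology/heldBy", mint_uri("repository", record["repository"])),
    ]
    # Only emit a script statement when the local term is in the mapping.
    script = SCRIPT_MAP.get(record.get("script", "").strip().lower())
    if script is not None:
        triples.append((subject, BASE + "ontology/hasScript", script))
    return triples

record = {"repository": "Example Library", "shelfmark": "MS 1", "script": "Carol. min."}
result = record_to_triples(record)
for triple in result:
    print(triple)
```

Terms which are missing from the mapping are simply left out rather than guessed at, which leaves room for researchers to test, correct and extend the mapping over time.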
Who should be involved in such projects? How should such projects be
organized and managed?
• Research groups and individual researchers (to ensure relevance; test, correct
  and update; supply data and expertise);
• Libraries and other collecting institutions (to supply data and expertise);
• Technology experts (to design and build services);
• Commercial firms (to supply data and expertise).
What sources of funding should be pursued?
• European Science Foundation;
• Other European Union funding sources;
• National schemes;
• Foundations and other non-government sources.
Terminology
OWL: Web Ontology Language
RDF: Resource Description Framework
SKOS: Simple Knowledge Organization System
SPARQL: a protocol and query language for RDF
Triple: a subject–predicate–object expression in RDF
URI: Uniform Resource Identifier
References
1. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html
2. British Library Manuscripts Catalogue: http://www.bl.uk/catalogues/manuscripts/INDEX.asp
3. Codices Electronici Sangallenses: http://www.cesg.unifr.ch/en/
4. Medieval Manuscripts in Dutch Collections: http://www.mmdc.nl/static/site/index.html
5. IRHT MEDIUM: http://www.irht.cnrs.fr/ressources/medium_frame.htm
6. Digital Scriptorium (US): http://www.scriptorium.columbia.edu/
7. CERL Portal: http://cerl.epc.ub.uu.se/sportal/
8. ENRICH/Manuscriptorium: http://enrich.manuscriptorium.com/
9. Europeana: http://europeana.eu/portal/
10. Dante, Divina Commedia: http://www.danteonline.it/italiano/codici_indice.htm
11. Chrétien de Troyes, Le Chevalier de la Charrette: http://lancelot.baylor.edu/
12. Denis Muzerelle, Vocabulaire codicologique: http://vocabulaire.irht.cnrs.fr/vocab.htm
13. In Principio (Brepols)
14. International Medieval Bibliography (Brepols): subject thesaurus
15. International Medieval Bibliography (Brepols): author lists
16. Medieval Manuscripts in Dutch Collections: list of authors: http://www.mmdc.nl/static/media/2/17/authors.pdf
17. Personennamen des Mittelalters (online as part of Personennamendatei): http://www.d-nb.de/eng/standardisierung/normdateien/pnd.htm
18. Europa Sacra (Brepols)
19. Fasti Ecclesiae Anglicanae 1066-1300: http://www.british-history.ac.uk/subject.aspx?subject=2&gid=39
20. CERL Thesaurus: http://cerl.sub.uni-goettingen.de/ct/
21. Scriptorium indexes: http://www.scriptorium.be/en/frameset2.htm
22. International Medieval Bibliography (Brepols): manuscripts index