DARIAH-DE Use Case: “Federation Architecture”

The DARIAH-DE Use Case describes a “Federation Architecture” for research data and collection descriptions from distributed sources at cultural heritage institutions, such as libraries, archives, research institutions, and data centers. It aims at

• indexing and listing research data,
• providing sustainable and persistent access for further use,
• providing technical tools to compare descriptions and contents of digital collections,
• providing comprehensive search functionality for heterogeneously structured data collections and archives.

The following figure illustrates tools and services for searching within distributed sources/data:

Figure 1: Illustration of search tools and services.

Within the last three years, we have been developing these search tools and services:

Collection Registry
The Collection Registry is a service for registering collections of research data and their interfaces, and for making this registration information available via the Generic Search of DARIAH-DE.

Schema Registry
The Schema Registry stores different metadata schemas for use by the Crosswalk Registry and Generic Search. Both the schemas and the underlying algorithms can be generated with the Crosswalk Registry.

Crosswalk Registry
The Crosswalk Registry is a graphical tool that enables researchers in the Arts, the Humanities, and the Social Sciences to map different metadata standards stored in the Schema Registry onto one another. This mapping allows automated translation from one data schema to another, which in turn allows scholars to search the data of different collections with a single tool.

Figure 2: Crosswalk Registry.

Generic Search
The Generic Search provides a front end for the data registered in the Collection Registry. It can search the registries as well as third-party sources. This tool allows searching in heterogeneous data sets, e.g. data from ZVDD [1], HathiTrust [2], or other libraries, repositories, and data centers.
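
The interplay of schema mapping and search described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the schema names, the field names, the `CROSSWALK` dictionary, and the functions `translate` and `federated_search` are hypothetical and do not reflect any actual DARIAH-DE interface. The sketch only shows how a field-level mapping lets a single query run over records that are described in two different schemas.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical crosswalk: maps the field names of two illustrative source
# schemas onto one common target schema. These names are assumptions and
# merely stand in for mappings that would be built in a crosswalk tool.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "schema_a": {"titel": "title", "verfasser": "creator"},
    "schema_b": {"name": "title", "author": "creator"},
}


def translate(record: Record, schema: str) -> Record:
    """Rename the fields of a record using the crosswalk for its schema.

    Fields without a mapping are dropped.
    """
    mapping = CROSSWALK[schema]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def federated_search(
    collections: Dict[str, List[Record]],
    schemas: Dict[str, str],
    field_name: str,
    term: str,
) -> List[Record]:
    """Search all collections for a term in one field of the common schema."""
    hits: List[Record] = []
    for name, records in collections.items():
        for record in records:
            common = translate(record, schemas[name])
            # Case-insensitive substring match on the translated field.
            if term.lower() in common.get(field_name, "").lower():
                hits.append({"collection": name, **common})
    return hits


# Two illustrative collections, each described in a different schema.
collections = {
    "library": [{"titel": "Faust", "verfasser": "Goethe"}],
    "archive": [
        {"name": "Faust: a tragedy", "author": "Goethe"},
        {"name": "Letters", "author": "Schiller"},
    ],
}
schemas = {"library": "schema_a", "archive": "schema_b"}

results = federated_search(collections, schemas, "title", "faust")
for hit in results:
    print(hit)
```

In this sketch the query is phrased once against the common field `title`, and the crosswalk takes care of the differing source field names. A real system would additionally have to handle value normalization and nested structures, which are left out here.
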
DARIAH-DE Repository
The DARIAH-DE Repository enables users to archive data of any kind in a sustainable and persistent manner. Both the data describing the collection and the research data itself can be indexed and found by Generic Search.

Figure 3: Form to enter data for publishing.

[1] http://www.zvdd.de/startseite/
[2] http://www.hathitrust.org

Summary
These search tools and services form a distributed architecture: all services may be provided by multiple institutions in various instances. Each service allows access to heterogeneous data sources of various provenance. The “DARIAH-DE Federation Architecture” for research data enables new methods of analyzing existing distributed data collections in a unified way.

Use Case tables

ID: Use Case 1: DARIAH-DE-1
Title: Platform to provide a (federated) search through heterogeneous digital collections.
Description: Platform for federated search in research data and collection descriptions of heterogeneous digital material in libraries, archives, data centers, and cultural heritage institutions.
Trigger: -
Preconditions: A researcher wants to search in various data/sources/collections pertaining to her/his current research question and compare findings from different resources.
Steps for Main Success Scenario:
1. The researcher identifies the data/source/collection she/he wants to search and analyze.
2. The collection is registered with Generic Search and the interface for accessing it is mapped.
3. The metadata schema describing the source is selected or newly created.
4. Different related schemas are mapped and crosswalks are generated.
5. Generic Search indexes the existing source material and provides the interface for the researcher.
Alternate scenarios: -
Postconditions: -
Frequency of Use: Whenever the researcher wants to find research data relating to her/his current research question.
Status: Draft
Author: DARIAH-DE

ID: Use Case 2: DARIAH-DE-2
Title: Platform for persistent archiving and publishing of research data.
Description: Platform for federated search in research data and collection descriptions of heterogeneous digital material in libraries, archives, data centers, and cultural heritage institutions.
Trigger: -
Preconditions: A researcher has some data or source material along with structured metadata describing the material in some standardized way. It should be published for public reuse.
Steps for Main Success Scenario:
1. The researcher uploads her/his research data to the DARIAH-DE Repository for persistent and sustainable archiving.
2. She/He adds the collection and the metadata schema to the respective registries of DARIAH-DE-1.
Alternate scenarios:
1. The data is never published but is still preserved long-term.
Postconditions: Any researcher can find the published and linked research data and the connected and related sources through Generic Search and reuse them for their own research questions.
Frequency of Use: Whenever the researcher has some new research data or wants to add new data and optionally re-publish (versioning).
Status: Draft
Author: DARIAH-DE
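
The main and alternate scenarios of Use Case 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes `Deposit` and `Repository`, their methods, and the use of a random UUID as an identifier are hypothetical stand-ins and not the DARIAH-DE Repository interface. Here `publish` stands in for step 2 (registering the collection so that it becomes searchable), and a deposit that is never published corresponds to the alternate scenario.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Deposit:
    """One archived data set with its descriptive metadata (hypothetical)."""
    identifier: str
    metadata: Dict[str, str]
    published: bool = False


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for a repository with a search index."""
    deposits: Dict[str, Deposit] = field(default_factory=dict)

    def archive(self, metadata: Dict[str, str]) -> str:
        """Step 1: store the data and return an identifier.

        A random UUID is used as a placeholder; it is not a real
        persistent identifier.
        """
        identifier = str(uuid.uuid4())
        self.deposits[identifier] = Deposit(identifier, dict(metadata))
        return identifier

    def publish(self, identifier: str) -> None:
        """Step 2 (simplified): make an archived deposit searchable."""
        self.deposits[identifier].published = True

    def search(self, term: str) -> List[Deposit]:
        """Return published deposits whose metadata contains the term."""
        needle = term.lower()
        return [
            deposit
            for deposit in self.deposits.values()
            if deposit.published
            and any(needle in value.lower() for value in deposit.metadata.values())
        ]


repo = Repository()
pid_a = repo.archive({"title": "Survey data"})
pid_b = repo.archive({"title": "Survey notes"})  # archived, never published
repo.publish(pid_a)

found = repo.search("survey")
print([deposit.metadata["title"] for deposit in found])
```

Only the published deposit is returned by the search, while the unpublished one remains stored in `repo.deposits`, mirroring the distinction between publishing and preservation made in the use case.
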