NORDR_UseCase_RepoPlat

advertisement
RDA Repository Platforms for Research Data
Interest Group
Use Case: Nanoscopy Open Research Data Repository (NORDR)
Author(s): A. Prabhune, T. Jejkal, V. Hartmann, R. Stotzka
1. Scientific Motivation and Outcomes
Novel imaging methods enable new insights but very often the images are hard to interpret and
potential users and scientists need to be trained on the basis of reference data and their already
known interpretation. These need to be publicly shared in an open reference data repository.
Furthermore, new insights of the reference data will be gained with growing experience.
Therefore, appropriate tools must enable the open discussion of data and associated analysis
results. This necessitates data sharing on the one hand and annotation capabilities on the other.
The nanoscopy research data repository supports the complete data life cycle by providing
various services such for long term archival, large scale data processing, automated metadata
modelling and storage, annotation services and data publication.
2. Functional Description








High-resolution microscopes generate raw datasets in the range of hundreds of
Terabytes which are ingested into the NORDR. Depending on the high-resolution
microscope different raw data files are generated. Based on the data type of the file, the
metadata is automatically extracted, modelled and stored in the metadata storage. Each
dataset is assigned a PID
For systematic management of metadata the captured metadata is categorised under
Administrative Metadata (AM), Descriptive Metadata (DM) and Technical Metadata (TM).
Once a raw dataset is ingested, community-defined data processing workflows are
executed. The results of all workflow steps, also intermediate results, are ingested in the
NORDR for reuse and are linked to the dataset they originate from. Furthermore, an
according provenance graph is created and stored.
Experts from the Nanoscopy Research Community evaluate and annotate the results.
Depending on the evaluation results and new insights from the community, processing
algorithms and workflows are improved and the datasets have to be reprocessed.
For allowing data discovery, the NORDR provides various services which are built on top
of the metadata storage.
The NORDR supports METS metadata standard for allowing metadata interoperability.
Metadata mining services allow to analyse and compare different workflow results.
Page 1 of 3
3. Achieved Results



A first version of the NORDR is installed,
Data ingest and access workflow has been implemented and tested and was made
available to the community.
A generic metadata framework for extraction, modelling and storing heterogeneous
metadata has been defined and is partly implemented.
Page 2 of 3
4. Requirements
Requirement
Description
Motivation from Use
Case
Importance (1 - very
important to 5 - not at
all important)
Data Annotation
The novel nanoscopy
result images have to
be annotated to
capturing valuable
insights
Remotely located
researchers can
share their insights
1
Vocabulary Service
Scientific terms have
to be consistent for
allowing future reuse
Integrating
vocabulary for
allowing systematic
annotations
2
Data and Metadata
quality control
Bit preservation,
checksum of data
and metadata
completeness,
accuracy, correctness
and etc
Metadata is the
backbone for data
discovery and hence
quality assessment is
necessary
1
High performance
computing (HPC)
integrating with Data
repository
For efficient
processing of large
datasets
Frequent (re2
)processing must not
be triggered by users
but should be
seamlessly integrated
into the repository
system.
DOIs assignment
For publishing the
results there has to
be clear mapping
between the PID and
the DOI
DOIs will allow data
sharing and with PID
mapping enable
reproducibility of
results
2
Data Policies
Data policies are
used to define what
happens when to
which dataset,
Especially for
processing and
quality control
regularly enforced
policies are helpful.
2
Page 3 of 3
Download