RDA Repository Platforms for Research Data Interest Group

Use Case: Nanoscopy Open Research Data Repository (NORDR)

Author(s): A. Prabhune, T. Jejkal, V. Hartmann, R. Stotzka

1. Scientific Motivation and Outcomes

Novel imaging methods enable new insights, but the images are often hard to interpret, so potential users and scientists need to be trained on reference data whose interpretation is already known. These reference data need to be publicly shared in an open reference data repository. Furthermore, new insights into the reference data will be gained with growing experience. Appropriate tools must therefore enable open discussion of the data and the associated analysis results. This necessitates data sharing on the one hand and annotation capabilities on the other. The nanoscopy research data repository supports the complete data life cycle by providing services such as long-term archival, large-scale data processing, automated metadata modelling and storage, annotation services and data publication.

2. Functional Description

High-resolution microscopes generate raw datasets in the range of hundreds of terabytes, which are ingested into the NORDR. Different raw data files are generated depending on the microscope. Based on the data type of the file, the metadata is automatically extracted, modelled and stored in the metadata storage. Each dataset is assigned a PID. For systematic management, the captured metadata is categorised as Administrative Metadata (AM), Descriptive Metadata (DM) or Technical Metadata (TM). Once a raw dataset is ingested, community-defined data processing workflows are executed. The results of all workflow steps, including intermediate results, are ingested into the NORDR for reuse and are linked to the dataset they originate from. Furthermore, a corresponding provenance graph is created and stored. Experts from the Nanoscopy Research Community evaluate and annotate the results.
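The ingest behaviour described so far (a PID per dataset, metadata split into AM, DM and TM, and workflow results linked back to their source) can be illustrated with a minimal in-memory sketch. This is not the NORDR implementation: the class names, field names and the `new_pid` helper are hypothetical, and the UUID-based identifier merely stands in for whatever PID service the repository actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import uuid


@dataclass
class DatasetRecord:
    """Hypothetical repository entry; all field names are illustrative only."""
    pid: str
    administrative: Dict[str, str] = field(default_factory=dict)  # AM
    descriptive: Dict[str, str] = field(default_factory=dict)     # DM
    technical: Dict[str, str] = field(default_factory=dict)       # TM
    derived_from: Optional[str] = None  # PID of the source dataset, if any


class ProvenanceGraph:
    """Minimal in-memory provenance store keyed by PID (assumed design)."""

    def __init__(self) -> None:
        self.records: Dict[str, DatasetRecord] = {}

    def ingest(self, record: DatasetRecord) -> None:
        # A derived result may only be ingested if its source is already known.
        source = record.derived_from
        if source is not None and source not in self.records:
            raise KeyError(f"unknown source dataset: {source}")
        self.records[record.pid] = record

    def lineage(self, pid: str) -> List[str]:
        """Return the chain of PIDs from the given dataset back to the raw dataset."""
        chain: List[str] = []
        current: Optional[str] = pid
        while current is not None:
            chain.append(current)
            current = self.records[current].derived_from
        return chain


def new_pid() -> str:
    # Placeholder only: a real repository would obtain the PID from a PID service.
    return f"pid:{uuid.uuid4()}"


graph = ProvenanceGraph()
raw = DatasetRecord(pid=new_pid(), technical={"format": "raw"})
graph.ingest(raw)

# Two workflow steps, each linked to the dataset it originates from.
step1 = DatasetRecord(pid=new_pid(), derived_from=raw.pid)
graph.ingest(step1)
step2 = DatasetRecord(pid=new_pid(), derived_from=step1.pid)
graph.ingest(step2)
```

Calling `graph.lineage(step2.pid)` walks the links from the final result through the intermediate result to the raw dataset, which is the kind of question a stored provenance graph is meant to answer.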
Depending on the evaluation results and new insights from the community, processing algorithms and workflows are improved and the datasets have to be reprocessed. To enable data discovery, the NORDR provides various services built on top of the metadata storage. The NORDR supports the METS metadata standard to enable metadata interoperability. Metadata mining services make it possible to analyse and compare different workflow results.

3. Achieved Results

A first version of the NORDR is installed. The data ingest and access workflow has been implemented, tested and made available to the community. A generic metadata framework for extracting, modelling and storing heterogeneous metadata has been defined and is partly implemented.

4. Requirements

| Requirement | Description | Motivation from Use Case | Importance (1 = very important, 5 = not at all important) |
|---|---|---|---|
| Data Annotation | The novel nanoscopy result images have to be annotated to capture valuable insights. | Remotely located researchers can share their insights. | 1 |
| Vocabulary Service | Scientific terms have to be consistent to allow future reuse. | Integrating a vocabulary allows systematic annotations. | 2 |
| Data and Metadata Quality Control | Bit preservation, checksums of data and metadata, completeness, accuracy, correctness, etc. | Metadata is the backbone of data discovery, hence quality assessment is necessary. | 1 |
| High Performance Computing (HPC) Integration with the Data Repository | For efficient processing of large datasets. | Frequent (re-)processing must not be triggered by users but should be seamlessly integrated into the repository system. | 2 |
| DOI Assignment | For publishing the results there has to be a clear mapping between the PID and the DOI. | DOIs allow data sharing and, with the PID mapping, enable reproducibility of results. | 2 |
| Data Policies | Data policies define what happens when to which dataset. | Regularly enforced policies are helpful, especially for processing and quality control. | 2 |
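The Data and Metadata quality control requirement can be made concrete with a small sketch of a fixity check and a metadata completeness check. It is an illustration under stated assumptions, not the NORDR's mechanism: the source does not name a checksum algorithm, so SHA-256 is assumed here, and the function names and the list of required metadata fields are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest by streaming the file in chunks (algorithm assumed)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_digest: str) -> bool:
    """Bit-preservation check: the file still matches the digest stored at ingest."""
    return sha256_of(path) == expected_digest


def missing_fields(metadata: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Completeness check: list required fields that are absent or empty."""
    return [name for name in required if not metadata.get(name)]


# Demonstration on a temporary file standing in for an ingested data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example image bytes")
    demo_path = tmp.name

stored_digest = sha256_of(demo_path)          # recorded at ingest time
intact_before = is_intact(demo_path, stored_digest)

with open(demo_path, "ab") as handle:         # simulate silent corruption
    handle.write(b"!")
intact_after = is_intact(demo_path, stored_digest)
os.remove(demo_path)

# Hypothetical required fields; the real schema is defined by the repository.
gaps = missing_fields({"title": "Sample", "creator": ""},
                      ["title", "creator", "instrument"])
```

Streaming the file in chunks keeps memory use flat, which matters for the very large raw datasets described in the functional description. A regularly enforced data policy could run such checks on a schedule and flag datasets whose digest no longer matches.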