Data Fabric IG Use Case Analysis

Data Fabric Analysis
How do we arrive at the essential components & services?
- Analyze Data Practices

Data Practices I (120 interviews etc.)

Data Practices II – EUDAT federation
- Community Centers
- Common Data Centers
- projects to push limits and raise awareness

Data Practices II – split of functions
- physical layer operations are trivial: we know how to do them
- "logical layer" (LL) operations are complex due to relations, etc.
- all LL information needs to be aggregated, and we need a secure access layer around it

Data Fabric Analysis
How do we arrive at the essential components & services?
- Analyze Use Cases

10 (+5) Use Cases so far (2 in development, others mature)
[Map of use case nodes. Legend: environmental science; natural science; life science; humanities, soc. sciences; IT, various]
All indicated nodes are centers of national, regional and even worldwide federations.

No. Name                                  Institute                State
1   Language Archive                      Max Planck Institute NL  In operation
2   Geodata Sharing Platform              Academy of China         In operation
3   Datanet Federation Consortium         RENCI US                 In operation
4   ADCIRC Storm Forecasting              RENCI US                 In operation
5   EPOS Plate Observation                INGV/CINECA Italy        In operation
6   ENVRI Environment Observation         U Helsinki, Finland      In design
7   Nanoscopy Repository Cell structures  KIT, Germany             In design
8   Human Brain Neuroinformatics          EPFL Switzerland         In testing
9   ENES Climate Modeling                 DKRZ Germany             In operation
10  LIGO Gravitation Physics              NCSA US                  In operation
11  ECRIN Medical Trial Interoperation    U Düsseldorf Germany     In testing
12  VPH Physiology Simulation             U London UK              In operation
13  Species Archive                       Nature Museum Germany    In operation
14  International NeuroI Facility         INCF Sweden              In operation
15  Molecular Genetics                    MPI Germany              In operation

Issues of Relevance
[Diagram. Labels: management, analytics, conversion; provenance – reproducibility; workflows, policies, deployment; new collection; new metadata; temp store; AAI/FIM; highly distributed in federations; sensors, simulations, crowd, etc.; PID, Metadata, Rights, Syntax, Types, Semantics, Relations; virtual collection builder; FS, Cloud, DB; Repository System]

How do WGs/IGs fit?
[Diagram. Labels: REPRO, PROV, PP, BDA, BROK, CERT, FIM, REP, DMP, DOM, CITDD, CERT]

Components I
- domain of registered digital objects (DO) incl. basic organization principles (data, code, knowledge)
  -> worldwide PID system (Handles/DOI)
- domain of registered actors
  -> worldwide ID system (ORCID)
- domain of trusted repositories for DOs
  -> worldwide Rep Registry
- proper DFT/DSA/WDS-compliant repository systems
- accepted policy commons (proper organization support, self-documenting, tested/certified, etc.)
  -> policy component registry
- policy/services
  -> service registry
- authentication system
  -> various in place (ORCID is just a number)
- authorization system
  -> authorization registry

Components II
- MD components/schemas
  -> metadata schema registry
- data types/schemas/formats
  -> data type registry
- semantic categories
  -> category registry
- vocabularies
  -> vocabulary registry
- what about complex ontologies (thesauri, ontologies, etc.)?
- what about mapping relations?

What to do today
- 4 use cases (max 10 min each) with the following goals
  - understand whether we get what we want to get (common components/services)
  - discuss whether we need to adapt the template
  - Zhu, Dieter, Sean, Giuseppe, Ed
- discuss how to move on with use cases & analysis
  - discuss my first look at C/S (?)
  - update of existing use cases and appearance on wiki (deadline)
  - deadline for first round (when, whom to motivate, ?)
  - virtual meeting for a discussion on analysis (when?)
  - at P6 (September): a first document with analysis

Did we forget something?

Data Practices I – Survey
- ~120 interviews/interactions
- 2 workshops with leading scientists (EU, US)
- too much is done manually or via ad hoc scripts
- too much is in legacy formats (no PID & MD)
- there are lighthouse projects etc. but ...
- DM and DP are not efficient and too expensive (a biologist spends 75% of his time as data manager)
- federating data incl. logical information is much too expensive
- hardly any use of automated workflows, and a lack of reproducibility