df-core-pewi - Research Data Alliance

Data Fabric IG
Use Case Analysis
Data Fabric Analysis
How do we arrive at essential components & services?
Analyze Data Practices
Data Practices I (120 interviews etc.)
Data Practices II – EUDAT federation
Community Centers
Common Data Centers
projects to push limits
and raise awareness
Data Practices II – split of functions
• physical layer operations are trivial – we know how to do them
• "logical layer" (LL) operations are complex due to relations, etc.
• all LL information needs to be aggregated, and we need a secure access layer around it
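The split described above can be illustrated with a minimal sketch. All names here (`PhysicalStore`, `LogicalLayer`, `LogicalRecord`, the example PID, location and users) are hypothetical and not taken from EUDAT or any real system. The point is only that storing bits is simple, while PIDs, metadata, relations and rights are aggregated in one place behind an access check.

```python
from dataclasses import dataclass, field


class PhysicalStore:
    """Physical layer: stores and returns bit streams, nothing else."""

    def __init__(self):
        self._blobs = {}

    def put(self, location, data):
        self._blobs[location] = data

    def get(self, location):
        return self._blobs[location]


@dataclass
class LogicalRecord:
    """Logical-layer (LL) information about one object (hypothetical fields)."""
    pid: str
    location: str
    metadata: dict
    relations: dict = field(default_factory=dict)   # relation name -> list of PIDs
    allowed_users: set = field(default_factory=set)


class LogicalLayer:
    """Aggregates all LL information behind a single access check."""

    def __init__(self, store):
        self._store = store
        self._records = {}

    def register(self, record):
        self._records[record.pid] = record

    def describe(self, pid, user):
        record = self._records[pid]
        if user not in record.allowed_users:
            raise PermissionError(f"{user} may not access {pid}")
        return record

    def fetch(self, pid, user):
        # Access is checked on the logical layer before the bits are touched.
        record = self.describe(pid, user)
        return self._store.get(record.location)


store = PhysicalStore()
store.put("disk0/obj1", b"raw recording")

layer = LogicalLayer(store)
layer.register(LogicalRecord(
    pid="example/0001",
    location="disk0/obj1",
    metadata={"title": "Example recording"},
    relations={"isPartOf": ["example/collection-A"]},
    allowed_users={"alice"},
))
```

In this sketch, `layer.fetch("example/0001", "alice")` returns the stored bytes, while the same call for a user who is not listed raises `PermissionError`.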
Data Fabric Analysis
How do we arrive at essential components & services?
Analyze Use Cases
10 (+5) Use Cases so far (2 in development, others mature)
environmental science
natural science
life science
humanities, soc. sciences
IT, various
all indicated nodes are centers of national, regional and even worldwide federations
No.  Name                                  Institute                State
1    Language Archive                      Max Planck Institute NL  in operation
2    Geodata Sharing Platform              Academy of China         in operation
3    Datanet Federation Consortium         RENCI US                 in operation
4    ADCIRC Storm Forecasting              RENCI US                 in operation
5    EPOS Plate Observation                INGV/CINECA Italy        in operation
6    ENVRI Environment Observation         U Helsinki, Finland      in design
7    Nanoscopy Repository Cell structures  KIT, Germany             in design
8    Human Brain Neuroinformatics          EPFL Switzerland         in testing
9    ENES Climate Modeling                 DKRZ Germany             in operation
10   LIGO Gravitation Physics              NCSA US                  in operation
11   ECRIN Medical Trial Interoperation    U Düsseldorf Germany     in testing
12   VPH Physiology Simulation             U London UK              in operation
13   Species Archive                       Nature Museum Germany    in operation
14   International NeuroI Facility         INCF Sweden              in operation
15   Molecular Genetics                    MPI Germany              in operation
Issues of Relevance
[Diagram; recoverable labels:]
management, analytics, conversion
provenance – reproducibility
workflows, policies, deployment
new collection, new metadata
temp store
AAI/FIM
highly distributed in federations
sensors, simulations, crowd, etc.
PID, Metadata, Rights, Syntax, Types, Semantics, Relations
virtual collection builder
FS, Cloud, DB
Repository System
How do WGs/IGs fit?
[Diagram; group labels placed on the figure: REPRO, PROV, PP, BDA, BROK, CERT, FIM, REP, DMP, DOM, CITDD]
Components I
• domain of registered digital objects (DO) incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
• domain of registered actors -> worldwide ID system (ORCID)
• domain of trusted repositories for DOs -> worldwide repository registry
• proper DFT/DSA/WDS-compliant repository systems
• accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
• policy/services -> service registry
• authentication system -> various in place (ORCID is just a number)
• authorization system -> authorization registry
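To make the arrows above concrete, here is a minimal sketch of how a registered digital object could tie several of these registries together. It uses in-memory dictionaries as stand-ins for the worldwide PID system, the actor ID system and the repository registry. The identifiers, field names and the `resolve` function are hypothetical and do not reflect the actual Handle, DOI or ORCID APIs.

```python
# In-memory stand-ins for the registries named above (all entries are made up).
ACTOR_REGISTRY = {
    "0000-0000-0000-0001": {"name": "Example Researcher"},
}
REPOSITORY_REGISTRY = {
    "repo-1": {"name": "Example Trusted Repository", "certified": True},
}
PID_REGISTRY = {
    "12345/abc": {
        "kind": "data",  # data, code or knowledge
        "repository": "repo-1",
        "creator": "0000-0000-0000-0001",
        "location": "https://repo.example.org/objects/abc",
    },
}


def resolve(pid):
    """Resolve a PID to its record plus the registered actor and repository."""
    record = PID_REGISTRY.get(pid)
    if record is None:
        raise KeyError(f"unregistered PID: {pid}")
    repository = REPOSITORY_REGISTRY[record["repository"]]
    if not repository["certified"]:
        raise ValueError(f"repository {record['repository']} is not certified")
    return {
        "pid": pid,
        "kind": record["kind"],
        "location": record["location"],
        "creator": ACTOR_REGISTRY[record["creator"]]["name"],
        "repository": repository["name"],
    }


resolved = resolve("12345/abc")
print(resolved["creator"], "->", resolved["location"])
```

The design choice illustrated here is that the object record stores only identifiers. Names and certification status are looked up in the respective registry at resolution time.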
Components II
• MD components/schemas -> metadata schema registry
• data types/schemas/formats -> data type registry
• semantic categories -> category registry
• vocabularies -> vocabulary registry
• what about complex ontologies (thesauri, ontologies, etc.)?
• what about mapping relations?
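As an illustration of how such registries could be used together, the following sketch checks a metadata record against a registered schema, a data type registry and a vocabulary. The registry contents, identifiers and the `validate` function are assumptions made for this example only and do not describe an existing RDA service.

```python
# Hypothetical registry contents; identifiers and fields are illustrative only.
METADATA_SCHEMA_REGISTRY = {
    "schema/basic-v1": {"required": ["title", "creator", "data_type"]},
}
DATA_TYPE_REGISTRY = {
    "type/wav-audio": {"format": "audio/wav"},
    "type/csv-table": {"format": "text/csv"},
}
VOCABULARY_REGISTRY = {
    "vocab/domains": {"life science", "natural science", "humanities"},
}


def validate(metadata, schema_id):
    """Return a list of problems found when checking metadata against the registries."""
    problems = []
    schema = METADATA_SCHEMA_REGISTRY[schema_id]
    for name in schema["required"]:
        if name not in metadata:
            problems.append(f"missing field: {name}")
    data_type = metadata.get("data_type")
    if data_type is not None and data_type not in DATA_TYPE_REGISTRY:
        problems.append(f"unregistered data type: {data_type}")
    domain = metadata.get("domain")
    if domain is not None and domain not in VOCABULARY_REGISTRY["vocab/domains"]:
        problems.append(f"term not in vocabulary: {domain}")
    return problems


good = {"title": "Recording 1", "creator": "Example Researcher",
        "data_type": "type/wav-audio", "domain": "humanities"}
bad = {"title": "Untitled", "data_type": "type/unknown"}

print(validate(good, "schema/basic-v1"))
print(validate(bad, "schema/basic-v1"))
```

The sketch deliberately leaves out the two open questions from the list: complex ontologies and mapping relations between vocabularies would need richer structures than a flat set of terms.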
What to do today
• 4 use cases (max 10 min each) with the following goals
  - understand whether we get what we want to get (common components/services)
  - discuss whether we need to adapt the template
  - Zhu
  - Dieter
  - Sean
  - Giuseppe
  - Ed
• discuss how to move on with use cases & analysis
  - discuss my first look at C/S (?)
  - update of existing use cases and appearance on wiki (deadline)
  - deadline for first round (when, whom to motivate?)
  - virtual meeting for a discussion of the analysis (when?)
• at P6 (September): a first document with analysis
Did we forget something?
Data Practices I – Survey
• ~120 interviews/interactions
• 2 workshops with leading scientists (EU, US)
  - too much is done manually or via ad hoc scripts
  - too much data is in legacy formats (no PID & MD)
  - there are lighthouse projects etc. but ...
  - DM and DP are not efficient and too expensive (a biologist spends 75% of his time as a data manager)
• federating data incl. logical information is much too expensive
• hardly any use of automated workflows, and a lack of reproducibility