EUDAT, Johannes Reetz

EUDAT
AAI for a Collaborative Data Infrastructure:
Challenges and Approaches
Johannes Reetz, EUDAT
VAMP workshop
Helsinki, 30 Sep 2013
The CDI concept
(Figure: Collaborative Data Infrastructure; Data Generators, Users, Data Curation, Trust)
• User-focused functionality, data capture & transfer, VREs
• Community Support Services: data discovery & navigation, workflow creation, annotation, interpretability
• Common Data Services: persistent storage, identification, authenticity, workflow execution, mining
Initially six research communities on board
• EPOS: European Plate Observing System
• CLARIN: Common Language Resources and Technology Infrastructure
• ENES: Service for Climate Modelling in Europe
• LifeWatch: Biodiversity Data and Observatories
• VPH: The Virtual Physiological Human
• INCF: International Neuroinformatics Coordinating Facility
• All share common challenges:
– Reference models and architectures
– Persistent data identifiers
– Metadata management
– Distributed data sources
– Data interoperability
Communities and Data Centers
• Identifying basic requirements
• Identifying commonalities, common data services
What community users see …
• Community portal, single credential type
• Community layer: community-specific authentication, authorization & single sign-on
• Community data
What community users see …
• EUDAT portal for non-affiliated users, many credential types
• Various community portals, different credential types
• Common metadata exploration
• Common data stage-in and stage-out services
• Data services for long-tail data, also from citizen scientists
• Common replication services with access to distributed storage
• Unified authentication, authorization & single sign-on
• Community data and other very useful data
From: Analysis of the FIM doc (v0.7, L. Florio et al. 2013)
1. User friendliness (high)
2. Browser & non-browser federated access (high)
3. Multiple technologies with translators, including dynamic issue of credentials (medium)
4. Bridging communities (medium)
5. Implementations based on open standards and sustainable with compatible licenses (high)
6. Different Levels of Assurance with provenance (high)
7. Authorisation under community and/or facility control (high)
8. Attributes must be able to cross national borders (high)
9. Well-defined, semantically harmonised attributes (medium)
10. Flexible and scalable IdP attribute release policy (medium)
EUDAT supports these requirements, but emphasizes #3, #4 and #9
EUDAT Sites (figure legend): community centres, repositories, general data centres, (replica) storages
Safe Replication Service
• Robust, safe and highly available data replication service for small- and medium-sized repositories
– To guard against data loss in long-term archiving and preservation
– To optimize access for users from different regions
– To bring data strategically closer to systems for powerful compute-intensive analysis
– PIDs are used to keep track of location and can provide attributes
(Figure: EUDAT CDI Domain of registered data; PIDs, policy rules)
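The slide's idea that PIDs track where copies live while policy rules drive replication can be illustrated with a small sketch. Everything here is an assumption for illustration: the `PIDRecord` layout, the `"site:path"` location strings, the checksum attribute, the `min_replicas` rule and the in-memory stores stand in for real storage systems and a real PID registry; this is not the EUDAT implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class PIDRecord:
    """Hypothetical PID record: tracks copy locations and carries attributes."""
    pid: str
    locations: List[str] = field(default_factory=list)   # first entry = original
    attributes: Dict[str, str] = field(default_factory=dict)


def apply_replication_policy(record: PIDRecord,
                             sites: Dict[str, Dict[str, bytes]],
                             min_replicas: int) -> bool:
    """Copy the object to further sites until the (hypothetical) policy rule
    "at least `min_replicas` replicas besides the original" holds.

    `sites` maps a site name to an in-memory store (path -> bytes).
    Returns True if the rule is satisfied afterwards.
    """
    origin_site, path = record.locations[0].split(":", 1)
    data = sites[origin_site][path]
    expected = record.attributes["checksum"]
    if sha256(data) != expected:
        raise ValueError(f"original of {record.pid} fails its checksum")

    for site, store in sites.items():
        if len(record.locations) - 1 >= min_replicas:
            break                                   # policy already satisfied
        location = f"{site}:{path}"
        if location in record.locations:
            continue                                # site already holds a copy
        store[path] = data
        if sha256(store[path]) != expected:         # verify before registering
            del store[path]
            continue
        record.locations.append(location)           # PID now tracks the replica
    return len(record.locations) - 1 >= min_replicas
```

For example, with an original at `repo:/c1` and the sites ordered `repo`, `siteA`, `siteB`, `siteC`, calling `apply_replication_policy(record, sites, 2)` copies the object to `siteA` and `siteB` only and appends both locations to the record, so resolving the PID yields every copy.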
Use Case: CLARIN – Safe Replication
(Figure: EPIC PID registry and Safe Replication “islands”: INCF; EPOS/Orpheus; diXa; ENES/CMIP5, IPCC-AR5; CLARIN/Replix; CLARIN/CUNI; VPH/VIP; NeuGrid; EPOS/PP WG7. Legend: community centres, repositories, general data centres, replica storages)
Data Staging Service
• Support researchers in transferring large data collections from EUDAT storage to HPC facilities
• Reliable, efficient, and easy-to-use tools to manage data transfers
• Provide the means to ingest computational results into the repository via the EUDAT infrastructure
(Figure: PRACE HPC systems and the EUDAT CDI Domain of registered data)
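The requirement for reliable transfer tools can be made concrete with a minimal sketch of one staging step: retry a transfer on transient failures and accept the result only when its checksum matches. The `transfer` callable, the use of `OSError` as the transient-failure signal and the backoff parameters are assumptions; real stage-in and stage-out would rely on a transfer tool such as GridFTP rather than an in-process callable.

```python
import hashlib
import time
from typing import Callable


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def stage(transfer: Callable[[bytes], bytes], source: bytes,
          expected_checksum: str, attempts: int = 3,
          base_delay: float = 0.1) -> bytes:
    """Run `transfer(source)` until it returns a copy with the right checksum.

    `transfer` (hypothetical) moves the bytes to the target, e.g. an HPC
    scratch area for stage-out, and returns what arrived there. Transient
    failures (OSError) and corrupted copies are retried with exponential
    backoff; after `attempts` tries a RuntimeError is raised.
    """
    for attempt in range(attempts):
        try:
            received = transfer(source)
            if sha256(received) == expected_checksum:
                return received
        except OSError:
            pass                                    # transient failure: retry
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)   # 1x, 2x, 4x ... base_delay
    raise RuntimeError(f"staging failed after {attempts} attempts")
```

Verifying the checksum on the receiving side, rather than trusting a successful return, is what lets the same loop catch both dropped connections and silently truncated copies.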
EUDAT Services (1)
Safe Replication Service
• Replicating Data Objects (DOs) from a Repository to Replica Storages
• Repository & Replica Storage belong to separate administrative zones
• Registration of the original DO and its replicas
PID / object identifier Service
• Create DO handles
• Manage/maintain DO handles
• Resolve DO handles
Data Staging Service
• Replication of data objects out of the domain of registered data (Stage-Out)
• Replication of data objects into the domain of registered data (Stage-In)
• Replication of unregistered data objects between scratch storages
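The PID operations listed above (create, manage/maintain, resolve), together with the registration of an original DO and its replicas, can be sketched as a toy in-memory service. The prefix `12345`, the record fields `URL`, `PARENT` and `REPLICAS`, and the example URLs are hypothetical; a production service would use a registered Handle prefix, for instance through the EPIC PID registry, and that registry's own API.

```python
import uuid
from typing import Dict, List, Optional


class HandleService:
    """Toy PID service that creates, manages and resolves DO handles."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix                      # hypothetical prefix
        self._records: Dict[str, dict] = {}

    def create(self, url: str, parent: Optional[str] = None) -> str:
        """Create a handle; pass `parent` to register a replica of it."""
        if parent is not None and parent not in self._records:
            raise KeyError(parent)
        handle = f"{self.prefix}/{uuid.uuid4()}"
        self._records[handle] = {"URL": url, "PARENT": parent, "REPLICAS": []}
        if parent is not None:
            self._records[parent]["REPLICAS"].append(handle)
        return handle

    def update_location(self, handle: str, url: str) -> None:
        """Maintain a handle when the object moves; the PID stays the same."""
        self._records[handle]["URL"] = url

    def resolve(self, handle: str) -> str:
        """Return the current location; raises KeyError for unknown handles."""
        return self._records[handle]["URL"]

    def replicas(self, handle: str) -> List[str]:
        return list(self._records[handle]["REPLICAS"])

    def parent(self, handle: str) -> Optional[str]:
        return self._records[handle]["PARENT"]
```

In this sketch a repository data manager would call `create(url)` for the primary handle, a replica storage manager would call `create(url, parent=primary)` for each secondary handle, and users would call `resolve`.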
Service-specific actors/actions (1)
Safe Replication Service
• Repository Data Manager replicates
• Replica Storage Manager registers DOs
• 1) (Community) users access data via the repository
• 2) Users access data via the replica storage
(Figure: EUDAT CDI Domain of registered data; PIDs, policy rules)
PID (Handle) Service
• Repository Data Manager: creates/manages primary object handles
• Replica Storage Manager: creates/manages secondary object handles
• Users and others resolve the handles (PIDs) to the location of the physical storage
Data Staging
• Users access and fetch data from either the repository or the replica storage
• Users ingest new data into the repository
Simple Store for “long-tail” data and citizen scientists
• Allow registered users to upload “long-tail” data into the EUDAT store
• Enable sharing objects and collections with other researchers
• Utilise other EUDAT services to provide reliability and data retention
• PIDs are assigned to uploaded DOs
(Figure: simple upload, simple metadata and PID registration into the EUDAT CDI Domain of registered data)
Joint Metadata Service
• Find and define collections of scientific data, generated either by various communities or via EUDAT services (e.g. facetted search)
• Access those data collections through the references given in the metadata to the relevant data stores
(Figure: EUDAT CDI Domain of registered data)
Definition of the data sets as objects for entitlement
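The Joint Metadata Service workflow (harvest, explore by facets, turn a result set into a data set for entitlement) can be sketched over plain dictionaries. The record fields (`pid`, `community`, `language`), deduplication by PID, and representing an entitlement data set as a frozen set of PIDs are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, str]


def harvest(*sources: Iterable[Record]) -> List[Record]:
    """Merge harvested metadata; a record seen twice (e.g. from a repository
    and again via a replica site) is kept once, keyed by its PID."""
    merged: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["pid"], record)
    return list(merged.values())


def facet_counts(records: List[Record], facet: str) -> Counter:
    """Number of records per value of one facet, as shown next to the facet."""
    return Counter(r[facet] for r in records if facet in r)


def facetted_search(records: List[Record], **selected: str) -> List[Record]:
    """Records matching every selected facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in selected.items())]


def entitlement_set(records: List[Record], **selected: str) -> frozenset:
    """Freeze a result set into a data set (its PIDs) that rights can refer to."""
    return frozenset(r["pid"] for r in facetted_search(records, **selected))
```

Freezing the result set as a set of PIDs, rather than storing the query, keeps the entitled data set stable even if newly harvested records later match the same facets.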
EUDAT Services (2)
Simple Store Service
• Repository for registered data with metadata, for sharing
• Digital objects are registered (handles are assigned)
• Fragmented user group: many communities & “citizen scientists” are contributing and retrieving data
EUDATbox Service
• Temporary shareable storage space for data, not necessarily registered
• User deposits data, not necessarily with metadata
• Not a homogeneous user group: many communities, “citizen scientists”
(Joint) Metadata Service
• Metadata from various repositories are harvested and collected
• Metadata exploration, facetted search: result sets define the data set for entitlement
Service-specific actors/actions (2)
Simple Store (Repository)
• Users deposit data and metadata
• Users search for and access data
• Repository Storage Manager (needs to create the handle service)
EUDATbox
• Users deposit data
• Users share data by inviting other users
• Users access data
(Joint) Metadata Service
• Manager harvests metadata from (many) repositories
• Also via the replica site
(Figure: EUDAT CDI Domain of registered data)
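The EUDATbox actions listed above (deposit, share by inviting other users, access) can be sketched as a toy access list in which an invitation grants read access once the invited user accepts it. The `Box` class, the single-use invitation tokens and the owner-only invite rule are hypothetical choices; the slide does not say how invitations are implemented.

```python
import secrets
from typing import Dict, Set, Tuple


class Box:
    """Toy shared storage space: deposit, invite, accept, read."""

    def __init__(self) -> None:
        self._files: Dict[str, Tuple[str, bytes]] = {}   # name -> (owner, data)
        self._readers: Dict[str, Set[str]] = {}          # name -> allowed users
        self._invites: Dict[str, Tuple[str, str]] = {}   # token -> (name, invitee)

    def deposit(self, owner: str, name: str, data: bytes) -> None:
        # No PID registration and no metadata required, unlike the Simple Store.
        self._files[name] = (owner, data)
        self._readers[name] = {owner}

    def invite(self, owner: str, name: str, invitee: str) -> str:
        if self._files[name][0] != owner:
            raise PermissionError("only the owner can invite")
        token = secrets.token_urlsafe(16)
        self._invites[token] = (name, invitee)
        return token

    def accept(self, user: str, token: str) -> None:
        name, invitee = self._invites[token]
        if user != invitee:
            raise PermissionError("invitation was issued to another user")
        del self._invites[token]          # tokens are single-use
        self._readers[name].add(user)

    def read(self, user: str, name: str) -> bytes:
        if user not in self._readers[name]:
            raise PermissionError(f"{user} may not read {name}")
        return self._files[name][1]
```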
(Figure: AAI architecture with IdP, OpenID, Attribute Providers AtP 1, AtP 2 and AtP 3, and AuthZ)
• Zoned credential conversion service
• Unique user IDs, project-wise mapped to attribute-based access control information
* Attributes are either community-managed, or attributes provided by the user’s home IdP are reused
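The mapping of unique user IDs, project by project, to access control attributes can be sketched as an identity table: each external credential (issuer, subject) maps to one internal user ID, several credentials can be linked to the same ID, and attributes are kept per (user, project) pair. The class, the method names and the example issuer URLs are hypothetical; the slide does not specify the data model.

```python
import uuid
from typing import Dict, FrozenSet, Set, Tuple


class IdentityMapper:
    """Toy mapping from external credentials to internal IDs and attributes."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], str] = {}         # (issuer, subject) -> uid
        self._attrs: Dict[Tuple[str, str], Set[str]] = {}  # (uid, project) -> attrs

    def user_id(self, issuer: str, subject: str) -> str:
        """Return the internal ID for a credential, creating one on first use."""
        key = (issuer, subject)
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())
        return self._ids[key]

    def link(self, issuer: str, subject: str, uid: str) -> None:
        """Attach a further credential (e.g. an OpenID) to an existing user."""
        self._ids[(issuer, subject)] = uid

    def grant(self, uid: str, project: str, attribute: str) -> None:
        self._attrs.setdefault((uid, project), set()).add(attribute)

    def attributes(self, issuer: str, subject: str, project: str) -> FrozenSet[str]:
        """Attributes the user holds in one project, whatever credential is used."""
        uid = self.user_id(issuer, subject)
        return frozenset(self._attrs.get((uid, project), set()))
```

Keying attributes by (user, project) means the same person can hold different rights in different communities while authenticating with any linked credential.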
EUDAT AAI-TF approach
ConSec: Contrail Security code
The figure shows the high-level view: SAML is used for authentication (possibly translated from OpenID, not shown), OAuth 2.0 is used for delegation (internally, within the federation), and XACML is used for access control policies.
Control (in the workflow sense) roughly goes from left to right and from top to bottom. Internally, an
X.509 certificate with authorisation attributes is generated; this certificate is also managed internally
and thus not usually exposed to (or accessible by) the user.
Its purpose is threefold: (a) to allow access to non-HTTP services such as GridFTP and iRODS, which lie outside the OAuth delegation workflow; (b) to allow fine-grained authorisation; and (c) to allow command-line access to services for expert users. In OAuth, the authorisation server remains the central hub where access is delegated. However, since EUDAT needs finer-grained access control, the generated X.509 certificate also carries authorisation attributes (see below), which are checked against predefined access policies.
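The check of certificate attributes against predefined access policies can be sketched as attribute-based access control with a deny-overrides combination (one of the rule-combining algorithms XACML defines) and a default of deny. This is XACML-inspired only, not XACML: the `Rule` structure, the path-prefix resource matching and the attribute strings are assumptions, and the attributes are assumed to have already been extracted from the X.509 certificate.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    effect: str               # "Permit" or "Deny"
    resource_prefix: str      # hypothetical path-prefix match, e.g. "/clarin/"
    action: str               # e.g. "read" or "write"
    required: FrozenSet[str]  # attributes the subject must hold


def decide(rules: List[Rule], attributes: FrozenSet[str],
           resource: str, action: str) -> str:
    """Deny-overrides evaluation; anything not explicitly permitted is denied."""
    applicable = [r for r in rules
                  if resource.startswith(r.resource_prefix)
                  and r.action == action
                  and r.required <= attributes]     # subset test
    if any(r.effect == "Deny" for r in applicable):
        return "Deny"
    if any(r.effect == "Permit" for r in applicable):
        return "Permit"
    return "Deny"                                   # default deny
```

Deny-overrides lets a narrow prohibition (for instance on a restricted sub-collection) win over a broad community-wide permission without reordering the rules.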
The system deployed and used by EUDAT was built by the Contrail project, so we are reusing the Contrail Security (ConSec) code and tools developed within this pilot project. This decision was based on an evaluation of options, in which ConSec promised most of the features required by the EUDAT communities. EUDAT is currently running a ConSec authentication infrastructure for integration at FZJ; it is currently not running an authorisation infrastructure.