API Data Grids

Repositories, Federations, APIs, Policies
- wrap up Peter Wittenburg
these slides are a personal summary of the major points;
they do not necessarily represent the opinions of all participants
Wrap up 1
• Larry's points:
– self-standing objects including all essential information to be worked
on independently - only way things will scale in a complex landscape
– question: what is the scope of that information - not dynamic stuff
• pragmatic choices - backend will deal with multiple layers of information
– question: need separate layers for different information types
– question: would inclusion of PID not be sufficient
• Alex' points:
– solutions for many topics we are thinking about
(Zentity = metadata level as Triple Store, RIC = Workspace, Azure =
cloud store etc., DuraCloud = intermediate layer, Trident =
Workflow Workbench)
– question: vendor lock-in
• MS will provide SDKs for most of the modules as a possible solution
– question: independent modules, open source, open APIs etc
– question: easy to combine with community traditions
• only possible with the help of concrete projects
• trust development required
Wrap up 2
• Malte's points:
– follow atomistic model - separate different types of information
– information loss when exchanging (meta) data
– make use of web-technologies and make things web accessible
• often chosen solutions are not fully web-accessible due to hidden parameters etc
– offer a variety of formats - nice addressing scheme
– exchanging data is difficult (XACML, D-SPACE-HAL, etc)
• XACML standard indeed utterly complex - not well suited for manipulations
– question: why is exchange complex - MPI does it for data and now MD
– question: information loss - wrong approach
• John's points:
– community does not yet use metadata - all relevant info in names
– GPFS domain is easy going - one shared file system
– going out of GPFS domain is not trivial
• problems when leaving domains is a general issue of course
– question: how can iRODS help
– question: what about metadata in future
Wrap up 3
• Ken's points:
– time & space joins all data from water, weather and climate
– federating huge amount of different repositories
– CLASS can't communicate with community solution
– iRODS use for processing pipelines, virtualization, time stamping, etc
– policy rules for control (sharing, publishing, preservation, etc)
– seems to exist a core set of policy rules similar for all reps
• this came out as a result of intensive interactions
– question: when results for iRODS
• keep going on with this "agile" approach of constant interactions
– question: does iRODS help/simplify etc
Wrap up 4
• Willem's points:
– concrete limited iRODS test case for layered setup
– iRODS allows to do the things but not ideal if only replication in mind ...
– question: different zones vs. one zone
• multiple zones in complex configurations unavoidable
– question: iRODS metadata - for what
– question: how to do access rights transfer
• very sensitive point - need to be done with great care
Wrap up 5
• Jean-Yves' points:
– iRODS helps in virtualizing from data storage/management solutions
– iRODS can be used in complex scenarios and fulfils its role
– policy rules easily become complex and micro-services do the job
– accept iRODS assumptions (metadata, one iCAT, full data control, ...)
– question: amount of effort for microservices
– question: useful for federating many zones
• iCAT is single point of trust
• choice for single zone since users "don't care so much about data"
– question: does it scale
• iRODS seems to scale
• iCAT scalable since standard DB techniques can be applied
• scaling of iRODS metadata is very important
landscape
[Diagram: layers labelled applications (app), service centers with repository systems (RS), data services API, data centers (DC), replication software, LTP storage; annotations "two separate organizations" and "one single organization"]
Messages
• metadata is crucial (info about object, history of object, context ...)
• clear identification of objects by "external PID" is crucial
• granularity issue is important and not solved
• repository federations are integrating diverse systems grown over
years
• how to keep things simple in the integration layer
– separate things which are functionally different
– clearly specify services to offer
– many repository systems make deep assumptions about the data model
(get some functionality for free, but pay a complexity price in
heterogeneous landscapes)
• iRODS is something to look at due to proper ideas
– iRODS can be used for production
– iRODS is part of the whole story and not a magical tool
– policy rules as workflows - although easily getting complex
– easy going if no own metadata, full control and single zone but this is not
reality in rep solutions
– the way towards a declarative language for policy rules is right
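To illustrate the last point, here is a minimal sketch of what "declarative" policy rules could look like: each rule is plain data (when it applies, what it requires) and a small generic function evaluates it. This is not the iRODS rule language. All names (DataObject, PolicyRule, violations, the example PID and collection names) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    """Hypothetical description of a stored object."""
    pid: str          # external persistent identifier
    collection: str   # logical collection the object belongs to
    replicas: int     # number of copies currently held


@dataclass
class PolicyRule:
    """A rule stated as data: a selector plus a requirement."""
    name: str
    applies_to: dict  # attribute name -> required value (empty = all objects)
    min_replicas: int


def violations(obj, rules):
    """Return the names of rules that apply to obj but are not satisfied."""
    failed = []
    for rule in rules:
        applies = all(getattr(obj, key) == value
                      for key, value in rule.applies_to.items())
        if applies and obj.replicas < rule.min_replicas:
            failed.append(rule.name)
    return failed


# Assumed example policy set: archive data needs three copies,
# everything needs at least one.
RULES = [
    PolicyRule("preserve-archive", {"collection": "archive"}, 3),
    PolicyRule("keep-one-copy", {}, 1),
]

obj = DataObject(pid="hdl:0000/example-1", collection="archive", replicas=2)
print(violations(obj, RULES))
```

Because the rules are data rather than procedural workflows, the same rule set could in principle be shared between repositories and checked by different engines; the sketch leaves open how a violation would be repaired.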
Questions
• how to make replicated data at data centers accessible to users
with minimal effort
• obviously by having a referencing mechanism at one place for
application layer
• how to easily "transfer" access rights
• which data to be included in object to make it self-contained
• what are the consequences for applications etc
• is it realistic on short term
• is collaboration with a company a possible way
• do we agree that iRODS is the solution to start with
• do we need a generic API specification - even if we use e.g. iRODS
• what else?
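The first question above (one referencing mechanism for the application layer) can be sketched as a small registry that maps an external PID to the locations of its replicas. This is only an illustration under assumptions: the class PidRegistry, its methods, the data center names and the example.org URLs are all hypothetical and do not describe any existing resolver service.

```python
class PidRegistry:
    """Hypothetical single place where applications look up replicas."""

    def __init__(self):
        # pid -> list of (data_center, url) tuples, in registration order
        self._locations = {}

    def register(self, pid, data_center, url):
        """Record that a copy of the object lives at the given data center."""
        self._locations.setdefault(pid, []).append((data_center, url))

    def resolve(self, pid, preferred_center=None):
        """Return one URL for the PID, preferring a given data center."""
        locations = self._locations.get(pid)
        if not locations:
            raise KeyError(f"unknown PID: {pid}")
        for data_center, url in locations:
            if data_center == preferred_center:
                return url
        # Fall back to the first registered copy.
        return locations[0][1]


registry = PidRegistry()
registry.register("hdl:0000/example-1", "dc-a", "https://dc-a.example.org/objects/1")
registry.register("hdl:0000/example-1", "dc-b", "https://dc-b.example.org/objects/1")

print(registry.resolve("hdl:0000/example-1", preferred_center="dc-b"))
```

An application only needs the PID; which data center serves the copy stays a concern of the registry. Transfer of access rights, raised in the next question, is deliberately not covered by this sketch.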