Repositories, Federations, APIs, Policies - wrap up
Peter Wittenburg
these slides are just a personal summary of major points
they do not represent per se the opinions of all participants

Wrap up 1
• Larry's points:
  – self-standing objects including all essential information to be worked on independently - only way things will scale in a complex landscape
  – question: what is the scope of that information - not dynamic stuff
    • pragmatic choices - backend will deal with multiple layers of information
  – question: need separate layers for different information types
  – question: would inclusion of PID not be sufficient
• Alex' points:
  – solutions for many topics we are thinking about (Zentity = metadata level as Triple Store, RIC = Workspace, Azure = cloud store etc., DuraCloud = intermediate layer, TRIDENT = Workflow Workbench)
  – question: vendor lock-in
    • MS will provide SDKs for most of the modules as a possible solution
  – question: independent modules, open source, open APIs etc
  – question: easy to combine with community traditions
    • only possible with the help of concrete projects
    • trust development required

Wrap up 2
• Malte's points:
  – follow atomistic model - separate different types of information
  – information loss when exchanging (meta)data
  – make use of web technologies and make things web accessible
    • often chosen solutions are not fully web-accessible due to hidden parameters etc
  – offer a variety of formats - nice addressing scheme
  – exchanging data is difficult (XACML, D-SPACE-HAL, etc)
    • XACML standard indeed utterly complex - not well suited for manipulations
  – question: why is exchange complex - MPI does it for data and now MD
  – question: information loss - wrong approach
• John's points:
  – community does not yet use metadata - all relevant info in names
  – GPFS domain is easy going - one shared file system
  – going out of GPFS domain is not trivial
    • problems when leaving domains are a general issue of course
  – question: how can iRODS help
  – question:
    what about metadata in future

Wrap up 3
• Ken's points:
  – time&space joins all data from water, weather and climate
  – federating huge amount of different repositories
  – CLASS can't communicate with community solution
  – iRODS use for processing pipelines, virtualization, time stamping, etc
  – policy rules for control (sharing, publishing, preservation, etc)
  – seems to exist a core set of policy rules similar for all reps
    • this came out as a result of intensive interactions
  – question: when results for iRODS
    • keep going on with this "agile" approach of constant interactions
  – question: does iRODS help/simplify etc

Wrap up 4
• Willem's points:
  – concrete limited iRODS test case for layered setup
  – iRODS allows to do the things but not ideal if only replication in mind ...
  – question: different zones vs. one zone
    • multiple zones in complex configurations unavoidable
  – question: iRODS metadata - for what
  – question: how to do access rights transfer
    • very sensitive point - need to be done with great care

Wrap up 5
• Jean-Yves' points:
  – iRODS helps in virtualizing from data storage/management solutions
  – iRODS can be used in complex scenarios and fulfils its role
  – policy rules easily become complex and micro-services do the job
  – accept iRODS assumptions (metadata, one iCAT, full data control, ...)
  – question: amount of effort for microservices
  – question: useful for federating many zones
    • iCAT is single point of trust
    • choice for single zone since users "don't care so much about data"
  – question: does it scale
    • iRODS seems to scale
    • iCAT scalable since standard DB techniques can be applied
    • scaling of iRODS metadata is very important

landscape
[Diagram. Labels: applications (app); two separate organizations / one single organization; service centers with repository systems (RS); data services API; data centers (DC) with replication software and LTP storage]

Messages
• metadata is crucial (info about object, history of object, context ...)
• clear identification of objects by "external PID" is crucial
• granularity issue is important and not solved
• repository federations are integrating diverse systems grown over years
• how to keep things simple in the integration layer
  – separate things which are functionally different
  – clearly specify services to offer
  – many repository systems make deep assumption about data model (get some functionality for free, but complexity price in heterogeneous landscapes)
• iRODS is something to look at due to proper ideas
  – iRODS can be used for production
  – iRODS is part of the whole story and not a magical tool
  – policy rules as workflows - although easily getting complex
  – easy going if no own metadata, full control and single zone but this is not reality in rep solutions
  – the way towards a declarative language for policy rules is right

Questions
• how to make replicated data at data centers accessible to users with minimal effort
  – obviously by having a referencing mechanism at one place for application layer
• how to easily "transfer" access rights
• which data to be included in object to make it self-contained
• what are the consequences for applications etc
• is it realistic on short term
• is collaboration with a company a possible way
• do we agree that iRODS is the solution to start with
• do we need a generic API specification - also if we use iRODS e.g.
• what else?
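
Illustration (not from the workshop): the idea of a "referencing mechanism at one place" that lets the application layer reach replicated data through an external PID could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names (Replica, PidRecord, PidRegistry), the PID string, the data center labels and the example.org URLs are hypothetical, and it does not use iRODS or any real PID service.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data_center: str        # hypothetical data center label, e.g. "DC-A"
    url: str                # hypothetical access URL for this copy
    available: bool = True  # whether this copy can currently be served


@dataclass
class PidRecord:
    pid: str                                      # external PID of the object
    metadata: dict = field(default_factory=dict)  # descriptive info kept with the PID
    replicas: list = field(default_factory=list)  # known copies at data centers


class PidRegistry:
    """Single place where applications look up objects by PID (illustrative only)."""

    def __init__(self):
        self._records = {}

    def register(self, pid, metadata=None):
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        record = PidRecord(pid=pid, metadata=dict(metadata or {}))
        self._records[pid] = record
        return record

    def add_replica(self, pid, data_center, url):
        self._records[pid].replicas.append(Replica(data_center, url))

    def resolve(self, pid, preferred=None):
        """Return an available replica, favouring the preferred data center."""
        record = self._records[pid]
        candidates = [r for r in record.replicas if r.available]
        if not candidates:
            raise LookupError(f"no available replica for {pid}")
        for replica in candidates:
            if replica.data_center == preferred:
                return replica
        return candidates[0]


# Usage: one object, two replicas, application asks by PID only.
registry = PidRegistry()
registry.register("pid:example/0001", {"title": "sample recording"})
registry.add_replica("pid:example/0001", "DC-A", "https://dc-a.example.org/objects/0001")
registry.add_replica("pid:example/0001", "DC-B", "https://dc-b.example.org/objects/0001")

chosen = registry.resolve("pid:example/0001", preferred="DC-B")
print(chosen.url)
```

The application only ever handles the PID; which data center serves the copy is decided at the one lookup point. Access rights transfer and granularity, both flagged above as open issues, are deliberately left out of this sketch.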