DF intro slides - Research Data Alliance

RDA Data Fabric (DF) Interest Group
Peter Wittenburg & Gary Berg-Cross
DF IG Goals
Introduction to Group
Achieving reproducible data science is a high
priority. Only highly automated, self-documenting
procedures that respect proper data organization
principles will overcome current barriers. The Data
Fabric group needs to work out directions, in the
belief that integrated data fabrics are a critical
component of infrastructures paving the way to
reproducible science.
The goal is to develop alternative
views, components and aspects of the
DF concept and related infrastructure.
Conceptualizations will be discussed to
reach an agreed RDA view on how
the evolving DF landscape can be
productively described.
The idea for this IG emerged from the
discussions amongst the chairs of various
RDA WGs.
Characteristics of a Data Fabric:
We are just beginning to scout out
the data fabric landscape. In one view, it is a
minimal set of infrastructure and service
requirements through which services can plug into
(belong to) the defined fabric. A data fabric
shows how components that were
developed separately can be made to work
together; this means that the data fabric will
differ for different sets of components. We
stress that it is meant as a
descriptive, conceptual way to deal with the
interrelations between many components, rather
than a prescriptive one (as an architecture
would be).
As part of this, essential DF
components and their interrelation
need to be identified and defined.
This diagram provides a high-level view
of possible actions within a Data Fabric,
running from raw data to increasingly
documented data that has been
enriched and analyzed, creating
referable and citable data. As shown,
publications are part of this Data
Fabric, since they are often used for
data mining and other analysis.
Some of the existing RDA groups,
including metadata WGs and IGs, are
working on DF components and need
to be positioned in such a landscape.
New working groups need to be
defined to work on identified
components and interfaces.
Data Fabric IG
DF
Organizing Chairs: Gary Berg-Cross & Peter Wittenburg
Data Foundation and Terminology
DFT
Chairs: Gary Berg-Cross, Raphael Ritz, Peter Wittenburg
Some early statements on Data Fabric & Components
Infrastructure Component View
(after Reagan Moore)
A data fabric is the set of software and hardware
infrastructure components that are used to manage
data, information, and knowledge.
When an enterprise implements a data management
solution, one of several types of DF infrastructure is
typically chosen to enable:
• Data management: the enterprise builds a data
repository, manages an information catalog, and enforces
management policies.
• Data analysis: the enterprise processes a data collection,
applies analysis tools, and automates a processing
pipeline.
• Data preservation: the enterprise builds reference
collections and knowledge bases that comprise its
intellectual capital, while managing technology
evolution.
• Data publication: discovery of and access to data
collections.
• Data sharing: controlled sharing of a data collection,
shared analysis workflows, and information catalogs.
Data Object View
(after Peter Wittenburg)
The data fabric covers a domain of
registered digital objects (DOs) that
are stored in well-managed
repositories.
DOs are associated with metadata
describing their creation context and
history (provenance).
The Data Fabric also covers a domain of
registered software components
(workflows, services) that are in
fact a special class of DOs.
Actions on DOs may be guided by
abstract policies that are explicit
and thus auditable.
There can be multiple data fabric
implementations, which should be
highly interoperable.
Data Fabric Service View
(after Beth Plale)
A DF should:
• Be self-documenting: a service
contributes to the lifecycle
of the data objects it handles and must
keep track of the scientifically
relevant actions it performs on
those data objects.
The resulting log files are periodically
sent to a provenance consolidator.
• Track data objects through its service
processing using one of the well-known
object identifier schemes.
• Identify itself as one type of service,
drawn from an RDA-agreed
list of service types.
• Implement an interface to a publish-subscribe
system which serves as
the Data Fabric Control mechanism.
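The Data Fabric Service View requirements (self-documenting actions, tracking by identifier, a declared service type, and a publish-subscribe interface) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical assumption made for illustration: the `EventBus` and `ChecksumService` classes, the `"checksum"` service type, the `"provenance"` topic, and the made-up identifier `hdl:example/0001` are not part of any RDA specification or existing library. The sketch also publishes each log record immediately, whereas the slides describe logs being sent periodically.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


class EventBus:
    """Toy in-memory publish-subscribe channel (hypothetical stand-in
    for the Data Fabric Control mechanism)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers.get(topic, []):
            handler(event)


@dataclass
class ChecksumService:
    """Hypothetical service that records every action it performs."""

    service_type: str            # assumed to come from an agreed list of types
    bus: EventBus
    log: List[dict] = field(default_factory=list)

    def process(self, pid: str, content: bytes) -> str:
        # The scientifically relevant action: compute a fixity checksum.
        digest = hashlib.sha256(content).hexdigest()
        record = {
            "pid": pid,                      # tracks the object by identifier
            "service_type": self.service_type,
            "action": "sha256-checksum",
            "result": digest,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.log.append(record)              # self-documentation
        self.bus.publish("provenance", record)
        return digest


# Usage: a plain list plays the role of the provenance consolidator.
bus = EventBus()
consolidated: List[dict] = []
bus.subscribe("provenance", consolidated.append)

service = ChecksumService(service_type="checksum", bus=bus)
digest = service.process("hdl:example/0001", b"raw data")
```

The consolidator only depends on the topic name and the record layout, so services of other types could publish to the same channel without knowing who consumes their logs.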
People and Plans
Who
The DF group is a joint
initiative of the first five working
groups, which understood that
everything they started needs to
be seen as part of a bigger,
dynamic plan. They
also realized that the work
they started needs to
be maintained to meet new
requirements. Others
interested in the goal of
improving our data creation
machine are welcome to
participate in the DF group.
Initiating Members
Rebecca Koskela, Keith Jefferey,
Jane Greenberg, Reagan Moore,
Rainer Stotzka, Tim Delauro,
Tobias Weigel, Raphael Ritz, Gary
Berg-Cross, Peter Wittenburg,
Daan Broeder, Larry Lannom,
Beth Plale, Juan Bicarregui,
Herman Stehouwer
The suggested Data Fabric IG is planned as a
forum to discuss these alternative views,
components and aspects of the DF concept.
To be discussed:
• What the agreed RDA view on a Data Fabric is.
• How the outputs from the RDA working
groups fit into the DF concept and how they
relate to each other and to various related
WGs and IGs within the RDA.
• Which further activities are required to push
the data fabric concept ahead.
• Continuation and initialization of working
group activities related to the DF.
• Improving the uptake of the WG outputs by
communicating them as a coherent whole
within the DF concept.
Status & Planned Outcomes
The DF group will first work on an
initial position paper to characterize
the landscape of components and
interfaces that have the potential to
realize reproducible data science.
Such a landscape will change over the
years as components are
replaced by others, so the
position paper will need to be
amended frequently.
The DF group will be a continuous
source for identifying new
requirements and barriers that need
to be addressed by new RDA groups,
in particular working groups.
Keeping the results up to date will be an
interesting and worthwhile challenge.
Prepared for the 4th RDA Plenary
in Amsterdam, Sept. 2014