InstitutionalLifeCycleRDM_UseCase_RepoPlat20150819

RDA Repository Platforms for Research Data
Interest Group
Use Case: Institutional Life-cycle Research Data Management
Author(s): Eric Maris
This is a use case description of the “Repository Platforms for Research Data” IG. While points
1, 2 and 3 aim at a general description/overview of the use case, point 4 is meant to list the
requirements.
Please save the file using the naming scheme: UseCaseName_UseCase_RepoPlat.docx
1. Scientific Motivation and Outcomes
For my institute (the Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The
Netherlands), we need a facility for
1. Preserving our research data.
2. Documenting the scientific process (as a means to increase the reproducibility of our
scientific results).
3. Sharing the data of published studies with the scientific community.
We will realize these goals by a set of protocols that are to be used in combination with a digital
repository.
The best possible outcome would be a set of protocols that is fully adopted by all members of
the institute and a digital repository with an easy-to-use interface that provides all the
functionality that is specified in the protocols.
2. Functional Description
To realize the three goals described under 1., we have defined three collection types:
1. Data acquisition collections (DACs)
2. Scientific integrity collections (SICs)
3. Data sharing collections (DSCs)
Each of these collection types has its own set of metadata, and their construction is described in
the protocols.
3. Achieved Results
We have finished the protocols for the users, a test version of the system (with limited
functionality) is running, and we are building a web-client. We have not yet tested with naïve
users.
4. Requirements
Note
I found it difficult to describe the IT requirements of our system in the form of a table. After doing
so, I felt that many of the underlying ideas were lost. I therefore opted for a free text description.
Collections, Roles and System Architecture
The IT requirements are imposed by our protocols. Prior to listing these requirements, we
introduce a few simple concepts that determine the structure of the repository. First, the
repository distinguishes between three types of collections of files:
1. Data Acquisition Collections (DACs)
2. Scientific Integrity Collections (SICs)
3. Data Sharing Collections (DSCs)
Collections are defined by their metadata, the access rights with respect to the metadata fields,
and the post-processing of the metadata (see below).
Second, the repository allows for role-based access to the data (i.e., the files in the collections)
and metadata. There are roles at the level of the organization (o), the organizational units (ou),
and the collections (coll). At the organization level, there are two roles:
1. anonymous_user
2. o_user
At the level of some organizational unit, there are also two roles:
1. research_administrator
2. ou_reviewer
At the level of some collection in some organizational unit, there are three roles:
1. collection manager
2. collection reviewer
3. collection viewer
Third, the architecture of the system will be of the client-server type. We will use this
architecture to allow the collections to be accessed through different clients. One of these
clients will be a web client, which will be used to read and edit the collection metadata as
well as the user profiles.
Authentication
Because a web client is not suitable for file transfer, we distinguish between authentication for
the web-client and authentication for a file transfer client.
Authentication for the web-client
For the web-client, the user authenticates against an Identity Provider (IdP). Depending on the
IdP against which the user authenticates, he has or can obtain different rights. For access to DACs
and SICs, it is required that the user authenticates using a trusted federated authentication
service (SURFconext, eduGAIN). For access to DSCs, it is sufficient if the user authenticates
against one of the popular IdPs (Google, Facebook, Twitter, …).
Authentication for a file transfer client
Authentication for a file transfer client requires that users first authenticate for the web client. Via
the web client, the user can obtain a one-time password with which he can authenticate for the
file transfer client. This authentication scheme will be implemented for WebDAV clients.
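The one-time-password handover described above could be sketched as follows. This is only an illustration under stated assumptions, not the repository's implementation: the class `OneTimePasswordStore`, its method names, and the five-minute lifetime are hypothetical. The use case itself only says that the password is obtained via the web client and is valid once.

```python
import hmac
import secrets
import time


class OneTimePasswordStore:
    """Hypothetical in-memory store for one-time file-transfer passwords."""

    def __init__(self, lifetime_seconds=300):
        # The lifetime is an assumption; the use case does not specify one.
        self.lifetime_seconds = lifetime_seconds
        self._pending = {}  # user_id -> (password, expiry on the monotonic clock)

    def issue(self, user_id):
        """Called by the web client once the user has authenticated there."""
        password = secrets.token_urlsafe(16)
        expiry = time.monotonic() + self.lifetime_seconds
        self._pending[user_id] = (password, expiry)
        return password

    def redeem(self, user_id, password):
        """Called on the file-transfer side; each password works at most once."""
        # pop() consumes the pending password on every attempt, so a wrong
        # guess also invalidates it and the user has to request a new one.
        entry = self._pending.pop(user_id, None)
        if entry is None:
            return False
        expected, expiry = entry
        if time.monotonic() > expiry:
            return False
        # Constant-time comparison on bytes.
        return hmac.compare_digest(expected.encode(), password.encode())
```

Consuming the password on any attempt is a deliberately strict design choice in this sketch; a real deployment might allow a small number of retries instead.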
Fallback to userID-password
If the preferred authentication scheme (described in the previous paragraphs) does not allow all
functional requirements to be realized, it must be possible to fall back on traditional authentication
using a userID-password pair.
Collection Definition
A collection is defined by (1) its metadata, (2) a particular role-dependent metadata access, and
(3) post-processing of metadata.
The different metadata types
Free text
Alphanumeric with a maximum number of characters.
Numerical
A number with a given unit.
Controlled vocabulary
An element of a controlled vocabulary, possibly hierarchical (see, e.g., MeSH).
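As an illustration, the three metadata types could be modelled as small validating classes. This is a sketch under assumptions: the class and method names are hypothetical, and the vocabulary is represented as a simple term-to-parent mapping, which is only one of several ways to store a hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class FreeText:
    """Alphanumeric text with a maximum number of characters."""
    max_length: int

    def validate(self, value):
        return isinstance(value, str) and len(value) <= self.max_length


@dataclass
class Numerical:
    """A number with a given unit."""
    unit: str

    def validate(self, value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)


@dataclass
class ControlledVocabulary:
    """A (possibly hierarchical) vocabulary: term -> parent term, or None."""
    parents: dict = field(default_factory=dict)

    def validate(self, value):
        return isinstance(value, str) and value in self.parents

    def ancestors(self, term):
        """Return the chain of broader terms, nearest first."""
        chain = []
        parent = self.parents[term]
        while parent is not None:
            chain.append(parent)
            parent = self.parents[parent]
        return chain
```

For example, with the illustrative (not MeSH-derived) hierarchy `{"Nervous System": None, "Brain": "Nervous System"}`, the value `"Brain"` validates and its only ancestor is `"Nervous System"`.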
Role-dependent metadata access
The write access to a given metadata field is role-dependent. For example, some metadata
fields can only be edited by research administrators. Also, some metadata fields can only be
written by the system, and therefore no role allows for editing these fields (e.g., the
system-internal collection ID).
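A role-dependent write check of this kind can be expressed as a lookup table from field to permitted roles. The sketch below is illustrative only: the field names (`disk_quota`, `title`, `collection_id`) and the function `can_write` are hypothetical, while the role names follow this use case.

```python
# Hypothetical field names mapped to the roles allowed to edit them.
WRITE_ROLES = {
    "disk_quota": {"research_administrator"},
    "title": {"collection_manager", "contributor"},
    "collection_id": set(),  # written only by the system, so no role may edit it
}


def can_write(field_name, user_roles):
    """Return True if at least one of the user's roles may edit the field."""
    # Unknown fields are denied by default.
    allowed = WRITE_ROLES.get(field_name, set())
    return bool(allowed & set(user_roles))
```

An empty role set is enough to model system-only fields: no combination of user roles can intersect it.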
Post-processing metadata
Some metadata fields require post-processing. For example, this holds for the field that specifies
the disk quota, and the field that specifies that a frozen copy has to be made.
Collection Initiation
Collection initiation is performed by a research administrator. He does this by completing
metadata fields for which, with one exception, only he is authorized:
1. Authorizing collection managers and a reviewer. (Note: A collection manager can also
be authorized by another collection manager.)
2. Assigning a disk quota
3. Completing administrative metadata
The first two metadata fields require post-processing.
A collection is initiated with default values for some metadata fields.
Collection Building
Collection building is performed by the collection managers and contributors. It involves both
metadata and data. The metadata are accessed via the web-client, and the data via the file
transfer client.
Editing metadata
This involves the following:
1. Authorizing collection managers, contributors and viewers. Only the collection manager
is authorized for this.
2. Completing research-related metadata. Both collection managers and contributors are
authorized for this.
File up- and download
Both collection managers and contributors are authorized for this.
Collection Closure and Versioning
Collection closure
Collection closure involves the following steps:
1. A collection manager requests collection closure by setting the value of some
metadata field. After that field is set, the collection becomes read-only (while keeping the
old authorizations as information), and the collection is highlighted in the web-client view
of the reviewer.
2. There are two possibilities:
a. The collection reviewer approves collection closure by setting the value of some
metadata field. Following approval, a frozen copy with a PID is generated.
b. The collection reviewer does not approve the collection, and the original write
authorizations for this collection are reinstated.
Versioning
It is possible to make multiple frozen copies of the same collection. Via their PIDs, it is possible
to reconstruct the sequence in which they were generated.
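The closure and versioning steps above can be read as a small state machine. The sketch below is illustrative and rests on assumptions that the use case does not state: the class `Collection` and its methods are hypothetical, a simple counter stands in for a real PID service, and the working collection is assumed to become writable again after approval so that later versions can be frozen.

```python
import itertools


class Collection:
    """Hypothetical model of the closure workflow; not the actual system."""

    _pid_counter = itertools.count(1)  # stand-in for a real PID service

    def __init__(self, name):
        self.name = name
        self.state = "open"      # "open" or "closure_requested"
        self.files = {}          # filename -> content
        self.frozen_copies = []  # (pid, snapshot) pairs, oldest first

    def write(self, filename, content):
        if self.state != "open":
            raise PermissionError("collection is read-only")
        self.files[filename] = content

    def request_closure(self):
        # Step 1: a collection manager sets the closure-request field,
        # which makes the collection read-only.
        if self.state != "open":
            raise ValueError("closure already requested")
        self.state = "closure_requested"

    def review(self, approved):
        # Step 2: the reviewer approves (2a) or rejects (2b) the request.
        if self.state != "closure_requested":
            raise ValueError("no closure request pending")
        pid = None
        if approved:
            pid = f"pid:{next(self._pid_counter)}"
            self.frozen_copies.append((pid, dict(self.files)))
        # Assumption: write access is restored in both cases, so that
        # further frozen copies of the same collection can follow.
        self.state = "open"
        return pid
```

Because `frozen_copies` is appended to in order of approval, the sequence of versions can be read off directly; in a real system that ordering would have to be recoverable from the PID records themselves.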
Authentication-Method-Dependent Collection Access
Access to collections depends on the authentication method.
No authentication
Only the metadata of the DSCs can be read.
Authentication against a non-trusted IdP
Only the metadata of the DSCs can be read. The authenticated user can be authorized as a
viewer of a DSC.
Authentication against a trusted IdP
Authorizations for data and metadata are determined by (1) the collection-level authorizations,
and (2) the user profile.
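The three cases above can be summarized in one decision function. This is a sketch under assumptions: the names `Auth` and `can_read` are hypothetical, the flag `collection_authorized` stands for the outcome of the collection-level authorizations and user profile (whose computation is not shown), and an authorized DSC viewer is assumed to be allowed to read the DSC's data.

```python
from enum import Enum


class Auth(Enum):
    NONE = 0
    NON_TRUSTED_IDP = 1
    TRUSTED_IDP = 2


def can_read(auth, collection_type, target, collection_authorized=False):
    """Decide read access to the 'metadata' or 'data' of a collection.

    collection_type is 'DAC', 'SIC' or 'DSC'; target is 'metadata' or 'data'.
    """
    # DSC metadata is readable by everyone, even without authentication.
    if collection_type == "DSC" and target == "metadata":
        return True
    if auth is Auth.NONE:
        return False
    if auth is Auth.NON_TRUSTED_IDP:
        # Such a user can at most be an authorized viewer of a DSC.
        return collection_type == "DSC" and collection_authorized
    # Trusted IdP: collection-level authorizations and the user profile decide.
    return collection_authorized
```

For instance, `can_read(Auth.NON_TRUSTED_IDP, "SIC", "data", collection_authorized=True)` is still `False`, because DACs and SICs require a trusted IdP regardless of any authorization.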
Browsing, Sorting, and Searching Collections
In the web-client, the user can browse, sort and search for collections. For sorting and browsing,
he can make use of the collections’ metadata fields.
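Searching and sorting on metadata fields could look like the following minimal sketch, which assumes (hypothetically) that each collection is available to the client as a flat dictionary of metadata fields; the function names are illustrative.

```python
def search(collections, **criteria):
    """Return the collections whose metadata match all given field values."""
    return [
        c for c in collections
        if all(c.get(name) == value for name, value in criteria.items())
    ]


def sort_by(collections, field_name):
    """Sort by one metadata field; collections lacking the field come last."""
    return sorted(
        collections,
        key=lambda c: (c.get(field_name) is None, c.get(field_name)),
    )
```

For example, `search(records, type="DSC")` keeps only the data sharing collections, and `sort_by(records, "year")` orders them by a hypothetical `year` field.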
User Profile Editing
Role-dependent user profile editing
The editing of the fields of the user profile is role-dependent: some fields can be edited by the
user himself, others by a research administrator, and still others by a system administrator.
Center-level authorizations
o_user
A research administrator can edit a field in a user profile, giving that user access to the
metadata of all of the collections of an organizational unit. Such a user is called an o_user. Only
users that have registered via a trusted IdP can become an o_user.
ou_reviewers
A research administrator can edit a field in a user profile, giving that user read access to all of
the collections of an organizational unit. Such a user is called an ou_reviewer.
Research administrators
A system administrator can edit a field in a user profile, giving that user all the rights that belong
to the research administrator role.
Linking internal user accounts
A research administrator can change the field in the user profile that contains the
system-internal ID associated with the user account. This allows for continuity in the
authorizations in case a user changes IdP.
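The idea behind account linking is that authorizations hang on the system-internal ID rather than on the IdP login. A minimal sketch, with the hypothetical class `AccountDirectory` and made-up IdP names and identifiers:

```python
class AccountDirectory:
    """Hypothetical mapping from IdP identities to system-internal user IDs."""

    def __init__(self):
        self._internal_id = {}  # (idp, external_id) -> system-internal ID
        self.roles = {}         # system-internal ID -> set of role names

    def register(self, idp, external_id, internal_id):
        """Record a login and make sure its internal account exists."""
        self._internal_id[(idp, external_id)] = internal_id
        self.roles.setdefault(internal_id, set())

    def link(self, idp, external_id, existing_internal_id):
        """Research administrator action: point a login at an existing account."""
        if existing_internal_id not in self.roles:
            raise KeyError(existing_internal_id)
        self._internal_id[(idp, external_id)] = existing_internal_id

    def roles_for(self, idp, external_id):
        """Look up roles through the internal ID, not the IdP identity."""
        return self.roles[self._internal_id[(idp, external_id)]]
```

After `link`, a user who logs in through a new IdP resolves to the old internal ID and therefore keeps all previously granted authorizations.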
Requirement | Description | Motivation from Use Case | Importance (1 - very important to 5 - not at all important)
Definition of collection types in terms of their metadata | | |
Definition of a namespace in which the collections are organized according to organizational unit | | |
Definition of roles in terms of their rights with respect to specific collections | | |
Users interact via a web-interface (for editing metadata and authorizing users) and specialized clients for file up- and download | | |
Possibility to use multiple clients to interact with the same middleware layer | | |
Federated authentication | | |
Data repository can be used by different organizational units that have controlled access to each other’s collections | | |
Data repository can be organized such that the metadata of collections that can be shared (DSCs) are visible to the world and can be searched by web crawlers | | |
Scalable to the petabyte level | | |
Hardware independent (in the sense that the logical namespace does not change when all the files are migrated) | | |