Vecnet_UseCase_RepoPlat_20150915

RDA Repository Platforms for Research Data
Interest Group
Use Case: VecNet – Vector-borne Disease Network
Author(s): Natalie Meyers, Reid Boehm
1. Scientific Motivation and Outcomes
The Vector Control Development Network (VCDN), funded by the Bill and Melinda Gates
Foundation to aid malaria eradication, is a collaborative effort that leverages
malaria modelling to inform strategic planning and malaria elimination assessments at
multiple levels. The interactive digital data library and modelling framework were created
to provide a shared set of resources that fosters communication between scientists and
modellers in their research activities. This in turn facilitates the analysis of the
transmission of vector-borne pathogens, in particular malaria, and the control of
the diseases they cause.
The main goal of this cyberinfrastructure is to translate the questions posed into salient
analysis of the modelling output. Data that is missing in the analysis process can be filled in
using similar settings from the digital library with proper citation and notification to the user.
Each query to the network needs to be supported by the body of data that is managed within
the cyberinfrastructure.
Examples of VecNet’s audience of users include individuals and groups who want to use
software models to explore combinations of vector and drug-based malaria interventions to
determine the optimal mix for use in specific geographic areas, but may not have access to
computational or analyst resources of their own. New product developers, such as chemical
companies and drug developers, may want to refine their target product profiles, and policy
makers or funders who use the modelled data will be better able to make decisions about
where to spend their resources. Additionally, vector-borne disease model developers and users
are able to democratize access to their models as well as to input and output data.
VecNet has participated in past RDA analysis as an example within the Data Citation
Working Group. Principal members of VecNet see the importance of sharing not only their
experience with data citation, which may aid other data sharing projects and
platforms, but also the repository considerations outlined by the Repository Platforms
for Research Data Interest Group in this template.
2. Functional Description
Digital Library requirements include the full lifecycle for the information. This includes the
ability to:
• Ingest new data:
  o Published works
  o Batch Data sets
  o Single data sets
  o Submit known product values to DigLib
• Curate the Library:
  o Maintaining Metadata
  o Maintaining Citation Metadata
  o Promote information
• Manage reference data
• Provide remote access
3. Achieved Results
The VecNet repository is fully functioning and addresses the requirements identified in the
functional description. The digital library is built on software and metadata standards that
support its continued operation. The principal Hydra technology stack elements include Fedora
Commons repository software, Solr, Ruby on Rails, and Blacklight. VecNet uses Dublin Core for
the generic metadata elements, elements of FGDC CSDGM for geospatial metadata, and
Darwin Core for extending the taxonomic naming capability. Authorities used to aid data entry,
normalization, and search and browse features include the National Library of Medicine’s
Medical Subject Headings (MeSH); GeoNames, a federated gazetteer that leverages linked data;
and the National Center for Biotechnology Information (NCBI) taxonomy, which supports species
name features.
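To illustrate how the three metadata vocabularies named above might sit together in one record, here is a minimal Python sketch. The `build_record` helper, the choice of required fields, and the sample values are assumptions made for illustration and are not taken from the VecNet codebase. Only the vocabulary names come from the text; the element names used (`westbc`, `eastbc`, `northbc`, `southbc` from FGDC CSDGM, and `scientificName` from Darwin Core) are standard terms in those vocabularies.

```python
# Hypothetical sketch: one library record carrying generic, geospatial,
# and taxonomic metadata under separate namespaces.

REQUIRED_DC = ("title", "creator", "date")  # assumed minimum, not VecNet's actual rule


def build_record(dc, bounding_box=None, scientific_name=None):
    """Assemble a record dict; raise ValueError on missing or invalid fields."""
    missing = [field for field in REQUIRED_DC if not dc.get(field)]
    if missing:
        raise ValueError("missing Dublin Core fields: " + ", ".join(missing))

    record = {"dc": dict(dc)}

    if bounding_box is not None:
        west, east, north, south = bounding_box
        longitudes_ok = -180 <= west <= 180 and -180 <= east <= 180
        latitudes_ok = -90 <= south <= north <= 90
        if not (longitudes_ok and latitudes_ok):
            raise ValueError("bounding box out of range")
        # FGDC CSDGM bounding coordinate elements
        record["fgdc"] = {
            "westbc": west,
            "eastbc": east,
            "northbc": north,
            "southbc": south,
        }

    if scientific_name is not None:
        # Darwin Core term for the full taxon name
        record["dwc"] = {"scientificName": scientific_name}

    return record


# Illustrative values only
example = build_record(
    {"title": "Larval survey", "creator": "A. Modeller", "date": "2015-09-15"},
    bounding_box=(30.0, 40.0, 5.0, -5.0),
    scientific_name="Anopheles gambiae",
)
```

Keeping each vocabulary under its own key means a record without geospatial or taxonomic content simply omits that key, rather than carrying empty fields.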
Future goals or plans for VecNet beyond the current implementation and use cases include:
1. Expand the GeoBlacklight discovery layer implementation to include rendering and data attribute
display for points, lines, gridded data, and polygons in geospatial data files at the record level. We
currently display only file-level geospatial metadata via Blacklight, not data-level metadata from within
the file.
2. Collaborate with the Center for Open Science to support registration of VecNet assets in the Open
Science Framework (OSF). We will do a data crosswalk between systems, pilot some VecNet records and
datasets in OSF as use cases, and compare the optional and required feature set in VecNet with the
feature set in OSF.
3. Expand auto-generation of metadata records for simulation data. We now support metadata for EMOD
simulations at the run level; we plan to expand this feature to fully support OpenMalaria and to support
automatic metadata generation at the simulation and sweep level for all simulations.
4. Integrate with ORCID and work on more systematic approaches to name disambiguation.
5. Implement an authority service for attaching International Chemical Identifiers (InChI) to chemical data.
6. Improve system performance and responsiveness.
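The data crosswalk mentioned in item 2 amounts to a field-by-field mapping between two metadata schemas. The sketch below is a minimal illustration under stated assumptions: the source field names are Dublin Core elements, the target field names are placeholders and not the actual Open Science Framework schema, and `crosswalk` is a hypothetical helper.

```python
# Hypothetical crosswalk: source field -> target field.
# Target names are placeholders, NOT the real OSF schema.
FIELD_MAP = {
    "title": "title",
    "creator": "contributors",
    "description": "description",
    "subject": "tags",
    "rights": "license",
}


def crosswalk(source_record, field_map=FIELD_MAP):
    """Return (mapped, unmapped): translated fields and source fields with no target."""
    mapped = {}
    unmapped = {}
    for field, value in source_record.items():
        target = field_map.get(field)
        if target is None:
            # Keep for manual review rather than dropping silently
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = crosswalk(
    {"title": "Simulation run", "creator": "A. Modeller", "coverage": "East Africa"}
)
```

Returning the unmapped fields separately makes the gaps between the two systems visible, which is the kind of information a comparison of optional and required features would need.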
4. Requirements
| Requirement | Description | Motivation from Use Case | Importance (1 = very important [1] to 5 = not at all important [2]) |
|---|---|---|---|
| Remote Access Management | Allows an authorized individual the capacity to curate materials from distributed locations | Democratize access to models, input, and output | 1 |
| Support Staged Content | Staged content includes submission states that are raw, processed, curated, and published | Facilitate decision making/translate questions posed into salient analysis | 4 |
| Support Full Text Search | Full text search will allow for greater nuance in retrieving related results | Facilitate decision making/translate questions posed into salient analysis | 1 |
| Provide both single and batch ingest paths | Allows for a range of data types and scales to be submitted with maximum efficiency | Foster communication between scientists and modellers | 2 |
| Extracted information is stored with related metadata | Combining information with metadata during ingest aids in more comprehensive records over time | Facilitate decision making/translate questions posed into salient analysis | 2 |
| Maintain metadata for stored information | This includes descriptions such as author, owner, license, source publication, librarian, and date and time stamps | Proper citations and user notifications | 1 |
| Provide connections to current reference data | This includes linkages, pointers, local cache, etc. to data such as population and weather | Using software models with geographic areas | 2 |
| Allow authorized data contributors to annotate submissions that require special security controls | Increased communication of access in relation to security and sensitivity of data products | Democratize access to models, input, and output | 1 |
| Allow product developers to update product information | Gives products that are being developed a working space within the repository | Developer ability to refine target product profiles | 2 |
| Maintain citations linked to experiments/simulations conducted in Transmission Simulator | Citations provide recognition and updates from experiments utilizing data | Foster communication between scientists and modellers/proper citations and user notifications | 3 |
| Allow data providers to choose the level of access to data | Control over access to data is in the hands of those who provide it, not the system | Democratize access to models, input, and output | 1 |
| Capture authorized relevant metadata | This provides a record of information flow into and out of the repository | Proper citations and user notifications | 1 |
| Track changes to resource metadata and information relationships | Charts the connections that are created and that shift as use occurs and new files are added | Foster communication between scientists and modellers/proper citations and user notifications | 1 |
| Support the ability to ingest external data from another source | Allows sharing across repositories when related data has already been archived elsewhere | Democratize access to models, input, and output | 3 |
| Provide authorized users access to previously run simulations | Allows users to see past uses that can inform their own work | Translate posed questions into salient analysis | 1 |
| Provide historical reference data | For example: population, meteorology, and agriculture | Missing data is filled in with similar settings/using software models with geographic areas | 1 |
| Allow content to be marked for deletion by authorized users | Within context, certain files may not be necessary to keep; authorized users are given control to deem which files should go | Democratize access to models, input, and output | 2 |
| Allow local download of a selected set of information | The ability to download content to a local device when the information conforms to proper handling and is marked as accessible | Democratize access to models, input, and output | 1 |
| Require all data to be attributed with handling requirements | Handling requirements include licenses and security parameters | Proper citations and user notifications | 1 |
| Maintain a permanent history of versions for all stored materials | Having a permanent history of versions shows the process of change over time and allows users to return to the original or another iteration | Proper citations and user notifications | 1 |
| Capture "degree of confidence" on each library item | Provides users with an idea of the quality and trustworthiness of material in the selection process | Translate questions posed into salient analysis | 4 |

[1] 1 = Feature exists/complete in existing VecNet system
[2] 5 = Feature not yet implemented, low priority
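The "Support Staged Content" requirement names four submission states: raw, processed, curated, and published. One way to model them is as an ordered workflow in which an item advances one stage at a time. The Python sketch below assumes that strictly linear ordering; the requirements do not say whether VecNet allows stages to be skipped or reverted, and the `Stage` enum and `advance` helper are hypothetical names.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Submission states listed under 'Support Staged Content'."""
    RAW = 1
    PROCESSED = 2
    CURATED = 3
    PUBLISHED = 4


def advance(stage):
    """Return the next stage; raise ValueError if the item is already published."""
    if stage is Stage.PUBLISHED:
        raise ValueError("item is already published")
    return Stage(stage + 1)


# Walk one item through the assumed workflow
history = [Stage.RAW]
while history[-1] is not Stage.PUBLISHED:
    history.append(advance(history[-1]))
```

Using an ordered enumeration also allows simple comparisons, for example treating only items at `Stage.CURATED` or later as eligible for public search results, if such a rule were wanted.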