RDA Repository Platforms for Research Data Interest Group
Use Case: VecNet – Vector-borne Disease Network
Author(s): Natalie Meyers, Reid Boehm

1. Scientific Motivation and Outcomes

The Vector Control Development Network (VCDN), funded by the Bill and Melinda Gates Foundation to aid malaria eradication, is a collaborative effort that leverages malaria modelling to inform strategic planning and malaria elimination assessments at multiple levels. The interactive digital data library and modelling framework were created to provide a shared set of resources that fosters communication between scientists and modellers in their research activities. This in turn facilitates the analysis of the transmission of vector-borne pathogens, in particular malaria, and the control of the diseases they cause. The main goal of this cyberinfrastructure is to translate the questions posed into salient analysis of the modelling output. Data that is missing in the analysis process can be filled in using similar settings from the digital library, with proper citation and notification to the user. Each query to the network needs to be supported by the body of data managed within the cyberinfrastructure.

VecNet's users include individuals and groups who want to use software models to explore combinations of vector- and drug-based malaria interventions and determine the optimal mix for specific geographic areas, but who may not have computational or analyst resources of their own. New product developers, such as chemical companies and drug developers, may want to refine their target product profiles, and policy makers or funders who use the modelled data will be better able to decide where to spend their resources. Additionally, vector disease model developers and users are able to democratize access to their models as well as to input and output data.

VecNet has participated in past RDA analysis as an example within the Data Citation Working Group. Principal members of VecNet see the importance of sharing not only their data citation experience, which can aid other data sharing projects and platforms, but also the repository considerations outlined by the data repository interest group in this template.

2. Functional Description

Digital Library requirements include the full lifecycle for the information. This includes the ability to:

• Ingest new data:
  o Published works
  o Batch data sets
  o Single data sets
  o Submit known product values to DigLib
• Curate the Library:
  o Maintaining metadata
  o Maintaining citation metadata
  o Promote information
• Manage reference data
• Provide remote access

3. Achieved Results

The VecNet repository is fully functioning and addresses the requirements identified in the functional description. The digital library is built on software and metadata standards that support its continued operation. The principal Hydra technology stack elements include Fedora Commons repository software, Solr, Ruby on Rails, and Blacklight. VecNet uses Dublin Core for generic metadata elements, elements of FGDC CSDGM for geospatial metadata, and Darwin Core to extend the taxonomic naming capability. Authorities used to aid data entry, normalization, and search and browse features include the National Library of Medicine's Medical Subject Headings (MeSH); GeoNames, a federated gazetteer that leverages linked data; and the National Center for Biotechnology Information (NCBI) taxonomy, which supports species name features.

Future goals or plans for VecNet beyond the current implementation and use cases include:

1. Expand the GeoBlacklight discovery layer implementation to include rendering and data attribute display for points, lines, gridded data, and polygons in geospatial data files at record level.
   We currently display only file-level geospatial metadata via Blacklight, not data-level attributes from within the file.
2. Collaborate with the Center for Open Science to support registration of VecNet assets in the Open Science Framework (OSF). We will do a data crosswalk between systems, pilot some VecNet records and datasets in OSF as use cases, and compare the optional and required feature set in VecNet with the feature set in OSF.
3. Expand auto-generation of metadata records for simulation data. We now support metadata for EMOD simulations at run level; we will expand this feature to fully support OpenMalaria and to support automatic metadata generation at simulation and sweep level for all simulations.
4. Integrate with ORCID and work on more systematic ways to do name disambiguation.
5. Implement an authority service for attaching International Chemical Identifier (InChI) to chemical data.
6. Improve system performance and responsiveness.

4. Requirements

| Requirement | Description | Motivation from Use Case | Importance (1 = very important [1] to 5 = not at all important [2]) |
|---|---|---|---|
| Remote Access Management | Allows an authorized individual to curate materials from distributed locations | Democratize access to models, input, and output | 1 |
| Support Staged Content | Staged content includes submission states that are raw, processed, curated, and published | Facilitate decision making/translating questions posed into salient analysis | 4 |
| Support Full Text Search | Full text search allows for greater nuance in retrieving related results | Facilitate decision making/translating questions posed into salient analysis | 1 |
| Provide both single and batch ingest paths | Allows a range of data types and scales to be submitted with maximum efficiency | Foster communication between scientists and modellers | 2 |
| Extracted information is stored with related metadata | Combining information with metadata during ingest leads to more comprehensive records over time | Facilitate decision making/translating questions posed into salient analysis | 2 |
| Maintain metadata for stored information | This includes descriptions such as author, owner, license, source publication, librarian, and date and time stamps | Proper citations and user notifications | 1 |
| Provide connections to current reference data | This includes linkages, pointers, local cache, etc. to data such as population and weather | Using software models with geographic areas | 2 |
| Allow authorized data contributors to annotate submissions that require special security controls | Increased communication of access in relation to security and sensitivity of data products | Democratize access to models, input, and output | 1 |
| Allow product developers to update product information | Gives products that are being developed a working space within the repository | Developer ability to refine target product profiles | 2 |
| Maintain citations linked to experiments/simulations conducted in Transmission Simulator | Citations provide recognition and updates from experiments utilizing data | Foster communication between scientists and modellers/proper citations and user notifications | 3 |
| Allow data providers to choose the level of access to data | Control over access to data is in the hands of those who provide it, not the system | Democratize access to models, input, and output | 1 |
| Capture authorized relevant metadata | Provides a record of information flow into and out of the repository | Proper citations and user notifications | 1 |
| Track changes to resource metadata and information relationships | Charts the connections that are created and that shift as use occurs and new files are added | Foster communication between scientists and modellers/proper citations and user notifications | 1 |
| Support the ability to ingest external data from another source | Allows sharing across repositories when related data has already been archived elsewhere | Democratize access to models, input, and output | 3 |
| Provide authorized users access to previously run simulations | Allows users to see past uses that can inform their own work | Translate posed questions into salient analysis | 1 |
| Provide historical reference data | For example: population, meteorology, and agriculture | Missing data is filled in with similar settings/using software models with geographic areas | 1 |
| Allow content to be marked for deletion by authorized users | Within context, certain files may not be necessary to keep; authorized users are given control to decide which files should go | Democratize access to models, input, and output | 2 |
| Allow local download of a selected set of information | The ability to download content to a local device when the information conforms to proper handling and is marked as accessible | Democratize access to models, input, and output | 1 |
| Require all data to be attributed with handling requirements | Handling requirements include licenses and security parameters | Proper citations and user notifications | 1 |
| Maintain a permanent history of versions | A permanent history of versions for all stored materials shows the process of change over time and allows users to return to the original or another iteration | Proper citations and user notifications | 1 |
| Capture "degree of confidence" on each library item | Gives users an idea of the quality and trustworthiness of material during the selection process | Translate questions posed into salient analysis | 4 |

[1] 1 = Feature exists/complete in existing VecNet system
[2] 5 = Feature not yet implemented, low priority
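
Several of the requirements above (staged content, mandatory handling requirements, and a permanent history of versions) can be pictured together in a small data model. The following is only a minimal sketch under stated assumptions, not the VecNet implementation, which is built on the Hydra stack: the names `Stage`, `Version`, `LibraryItem`, `promote`, and `update_metadata`, the field names, and the example values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    """Submission states named in the 'Support Staged Content' requirement."""
    RAW = 1
    PROCESSED = 2
    CURATED = 3
    PUBLISHED = 4


@dataclass(frozen=True)
class Version:
    """One snapshot of an item's stage and metadata (hypothetical structure)."""
    number: int
    stage: Stage
    metadata: dict
    recorded_at: datetime


@dataclass
class LibraryItem:
    """Hypothetical library item; an assumption, not the VecNet data model."""
    owner: str
    license: str        # handling requirement: license
    access_level: str   # handling requirement: chosen by the data provider
    metadata: dict
    stage: Stage = Stage.RAW
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Every item must carry its handling requirements.
        if not self.license or not self.access_level:
            raise ValueError("license and access_level are required")
        self._snapshot()

    def _snapshot(self):
        # Append-only: earlier versions are never modified or removed.
        self.history.append(Version(
            number=len(self.history) + 1,
            stage=self.stage,
            metadata=dict(self.metadata),  # copy, so later edits do not leak in
            recorded_at=datetime.now(timezone.utc),
        ))

    def update_metadata(self, **changes):
        """Apply metadata changes and record a new version."""
        self.metadata.update(changes)
        self._snapshot()

    def promote(self):
        """Move the item to the next stage and record a new version."""
        if self.stage is Stage.PUBLISHED:
            raise ValueError("item is already published")
        self.stage = Stage(self.stage.value + 1)
        self._snapshot()

    def version(self, number):
        """Return an earlier version by its 1-based number."""
        return self.history[number - 1]


if __name__ == "__main__":
    # Illustrative values only.
    item = LibraryItem(owner="data provider", license="CC-BY-4.0",
                       access_level="restricted",
                       metadata={"title": "Example survey"})
    item.update_metadata(title="Example survey (cleaned)")
    item.promote()
    print(item.stage.name, len(item.history), item.version(1).metadata["title"])
```

Keeping the history append-only is one simple way to let users return to the original or any other iteration; a production repository would more likely rely on the versioning support of its storage layer.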