Read More - Penn State Data Commons

http://www.datacommons.psu.edu
Overview of Today’s session
– DataCommons@PSU background
– Overview of capabilities
– Case studies & data partners
– Findings
Why Develop a DataCommons?
• Data management plans and curation are now a
requirement of funding agencies like NSF and
NIH.
• The issue of data has been featured in journals
such as Science and has been discussed and supported
by scientific research societies such as
the Royal Society in the UK (Science as an Open
Enterprise) and by organizations such as the
European Commission.
Why Develop a DataCommons?
• The issues related to data acquisition, collection, curation, and access have not
only become of central importance to funding agencies; they have also been
recognized as vital to research, collaboration, and teaching.
• The Science February 2011 special issue highlights the importance of these issues:
“Scientific innovation has been called on to spur economic recovery; science and technology are
essential to improving public health and welfare and to inform sustainability; and the
scientific community has been criticized for not being sufficiently accountable and
transparent. Data collection, curation, and access are central to all of these issues.”
Furthermore:
“Most scientific disciplines are finding the data deluge to be extremely challenging,
and tremendous opportunities can be realized if we can better organize and access
the data.”
Science: The State of Research Data
Background on the DataCommons@PSU
• 2005
– Concept first presented by PSIEE as an environmental data library and repository for geospatial information
created by PSU faculty, researchers, and collaborators.
– Received a $5K grant from PSU to explore the concept and acquire data.
• 2010
– PSIEE presented this idea to the PSIEE director, the director of the Institute for CyberScience, and the
director of the High Performance Computing Center in spring 2010.
– They recognized that this was a common need and that they had similar goals and efforts already underway.
– Growing interest and support over the next six months led to a PSU data community meeting sponsored by the
Institute for CyberScience and PSIEE in September 2010.
• 2011
– Over the next few months the DataCommons site and search/retrieval mechanism were developed and
tested, and the first new research data was acquired.
– The DataCommons@psu site was officially launched in April 2011 and has grown to include a wide array of data,
including geospatial data, tabular data and databases, documents, models, and protocols.
Why is this important to Penn State?
• Access to information is vital to much of the
research, teaching, and outreach conducted
by the Penn State Community.
• Data also demonstrates research
productivity, which is usually
represented only in dollars.
• The DataCommons@PSU provides a picture
of PSU research.
Purpose
• The purpose of the datacommons@psu is to serve as a portal to data,
applications, and resources that support efforts across the Penn State
community.
• The datacommons@psu facilitates interdisciplinary collaboration by connecting
people and resources through:
– Data Discovery & Access
– Data Archiving and Preservation
– Support of Data Sharing
– Development of Data and Application Documentation (Metadata)
– Support for development of agency required data management plans
– Metadata development seminars (new)
The datacommons@psu does not replace existing programs or projects but
highlights them by making their information, websites, and data accessible via
the datacommons@psu search engine.
Purpose…
• Highlight data, applications, models, and projects created by
members of the university community.
• Support collaboration and data sharing across those efforts and
communities.
• Support the development of large scale research proposals and
provide the data infrastructure to build research gateways.
• Reduce costs by providing widespread access to data needed by
multiple projects and programs, and reduce redundant data
acquisition efforts and storage of data (Core Data).
• Enhance the ability to develop research proposals, publish
results, and aid in supporting the educational/outreach
component of major funders.
• Provide a unifying tool that promotes cooperation and the
development of cross college/cross campus initiatives by linking
individuals and groups with similar interests and information
needs together.
What are other universities doing?
Capabilities
• Data storage
• Metadata development
• Data search, retrieval, and access
• Visualization of compatible data
• Core data
• Documentation and access to apps created by PSU
• Documentation of models and protocols
• Creation of Digital Object Identifiers (DOIs)
• Links to existing data repositories with PSU data
• References and links to publications based on the
data
Search Engine & Data Discovery Portal
Enhanced Data Discovery Options
Search by PSU College/Dept/Center/Institute
Enhanced Data Discovery Options
Search by PSU Researcher
Enhanced Data Discovery Options
Search by Research Theme
Search Results
• Researcher: Gabrielle Alpirez de Davie, Education
Data: Validity of O*NET Work Importance Profile web version for Spanish-speaking
populations
• Researcher: John Reichendorfer, OPP, Tom Flynn, OPP Landscape
Data: Aerial Photography, Tree Database, PSU vector data
• Researcher: Dennis Decoteau, Horticulture
Data: Ambient air monitoring for Pennsylvania
• Researcher: Marc Abrams, Department of Ecosystem Management
Data: Impacts of contrasting land-use history on composition, soils, and development of
mixed-oak, coastal plain forests on Shelter Island, New York
• Researcher: Kim Steiner, Department of Ecosystem Management
Data: Oak Forest Regeneration
• Researcher: Dr. Robert P. Brooks, Geography Department, Riparia
Data: Pocono Birds--Presence & Proportion on Lakes
• Researcher: Dr. Eric Post, Department of Biology
Data: Trophic Mismatch--Caribou Phenology
• Researcher: Andrew Patterson, Huck Life Sciences Institute
Data: Metabolomics, Ant Tissue
Data Summary Page: Downloadable Data
Data Summary Page: App
Apps & Tools
• Links to Data in Application
• Links to Data in Thematic Database
Data Summary Page: Multiple Options
Multiple Data Viewing/Download Options
GIS Enabled Data
Case Studies: Metabolomics Data
• Currently hosting data for the Center for Molecular Toxicology
and Carcinogenesis.
• Data and protocols can be accessed from Data Commons and
from the Metabolomics Explorer.
Case Studies: Arboretum at Penn State
Collaboration across departments
• DataCommons is working with
the PSU Arboretum and OPP
to acquire and provide access
to data via the DataCommons
as well as an interactive
application.
• Goals are to provide both
access to data and usable apps
for the public, for teaching at
PSU, and for research.
• The PSU campus as a living
lab!
Findings
• Need for long-term preservation.
• Need for a plan to transition or upgrade to new
versions of software.
• Need for metadata education.
• Curation of sensitive data.
• Large datasets (video, astronomical
observations, remotely sensed data) need to
be housed and preserved.
Findings continued…
• Data storage for projects that already provide
public access to data but need a centralized
permanent home.
• Need for data to be accessed by multiple
interfaces.
• Identity can be important to some providers.
• Cross-campus workgroups have common
data and platform/software needs but no place to
store the data.
Questions?