http://www.datacommons.psu.edu

Overview of Today’s session
– DataCommons@PSU background
– Overview of capabilities
– Case studies & data partners
– Findings

Why Develop a DataCommons?
• Data management plans and curation are now a requirement of funding agencies like NSF and NIH.
• The issue of data has been featured in journals such as Science and discussed and supported in international scientific research societies such as the Royal Society in the UK (Science as an Open Enterprise) and in organizations such as the European Commission.

Why Develop a DataCommons?
• The issues related to data acquisition, collection, curation, and access have not only become of central importance to funding agencies, they have been recognized as vital to research, collaboration, and teaching.
• Science February 2011 special issue highlights the importance of these issues: “Scientific innovation has been called on to spur economic recovery; science and technology are essential to improving public health and welfare and to inform sustainability; and the scientific community has been criticized for not being sufficiently accountable and transparent. Data collection, curation, and access are central to all of these issues.” Furthermore: “Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”

Science: The State of Research Data

Background on the DataCommons@PSU
• 2005
– Concept first presented by PSIEE as an environmental data library and repository for geospatial information created by PSU faculty, researchers, and collaborators.
– Received a $5K grant from PSU to explore the concept and acquire data.
• 2010
– PSIEE presented this idea to the PSIEE director, the director of the Institute for CyberScience, and the director of the High Performance Computing Center in spring 2010.
– They recognized that this was a common need and that efforts with similar goals and interests were already underway.
– Growing interest and support over the next six months led to a PSU data community meeting sponsored by the Institute for CyberScience and PSIEE in September 2010.
• 2011
– Over the next few months the DataCommons site and search/retrieval mechanism were developed and tested, and the first new research data was acquired.
– The DataCommons@PSU site was officially launched in April 2011 and has grown to include a wide array of data, including geospatial data, tabular data and databases, documents, models, and protocols.

Why is this important to Penn State?
• Access to information is vital to much of the research, teaching, and outreach conducted by the Penn State community.
• Data also demonstrates research productivity, which is usually represented only in dollars.
• The DataCommons@PSU provides a picture of PSU research.

Purpose
• The purpose of the DataCommons@PSU is to serve as a portal to data, applications, and resources that support efforts across the Penn State community.
• The DataCommons@PSU facilitates interdisciplinary collaboration by connecting people and resources through:
– Data Discovery & Access
– Data Archiving and Preservation
– Support of Data Sharing
– Development of Data and Application Documentation (Metadata)
– Support for development of agency-required data management plans
– Metadata development seminars (new)
The DataCommons@PSU does not replace existing programs or projects but highlights them by making their information and websites/data accessible via the DataCommons@PSU search engine.

Purpose…
• Highlight data, applications, models, and projects created by members of the university community.
• Support collaboration and data sharing across those efforts and communities.
• Support the development of large-scale research proposals and provide the data infrastructure to build research gateways.
• Reduce costs by providing widespread access to data needed by multiple projects and programs, and by reducing redundant data acquisition and storage (Core Data).
• Enhance the ability to develop research proposals, publish results, and support the educational/outreach component of major funders.
• Provide a unifying tool that promotes cooperation and the development of cross-college/cross-campus initiatives by linking individuals and groups with similar interests and information needs.

What are other universities doing?

Capabilities
• Data storage
• Metadata development
• Data search, retrieval, and access
• Visualization of compatible data
• Core data
• Documentation and access to apps created by PSU
• Documentation of models and protocols
• Creation of Digital Object Identifiers (DOIs)
• Links to existing data repositories with PSU data
• References and links to publications based on the data

Search Engine & Data Discovery Portal

Enhanced Data Discovery Options
Search by PSU College/Dept/Center/Institute

Enhanced Data Discovery Options
Search by PSU Researcher

Enhanced Data Discovery Options
Search by Research Theme

Search Results
• Researcher: Gabrielle Alpirez de Davie, Education
  Data: Validity of O*NET Work Importance Profile web version for Spanish speaking populations
• Researchers: John Reichendorfer, OPP; Tom Flynn, OPP Landscape
  Data: Aerial Photography, Tree Database, PSU vector data
• Researcher: Dennis Decoteau, Horticulture
  Data: Ambient air monitoring for Pennsylvania
• Researcher: Marc Abrams, Department of Ecosystem Management
  Data: Impacts of contrasting land-use history on composition, soils, and development of mixed-oak, coastal plain forests on Shelter Island, New York
• Researcher: Kim Steiner, Department of Ecosystem Management
  Data: Oak Forest Regeneration
• Researcher: Dr. Robert P. Brooks, Geography Department, Riparia
  Data: Pocono Birds--Presence & Proportion on Lakes
• Researcher: Dr. Eric Post, Department of Biology
  Data: Trophic Mismatch: Caribou Phenology
• Researcher: Andrew Patterson, Huck Life Sciences Institute
  Data: Metabolomics, Ant Tissue Data

Data Summary Page: Downloadable Data

Data Summary Page: App

Apps & Tools
• Links to Data in Application
• Links to Data in Thematic Database

Data Summary Page: Multiple Options

Multiple Data Viewing/Download Options

GIS Enabled Data

Case Studies: Metabolomics Data
• Currently hosting data for the Center for Molecular Toxicology and Carcinogenesis.
• Data and protocols can be accessed from the DataCommons and from the Metabolomics Explorer.

Case Studies: Arboretum at Penn State
Collaboration across departments
• DataCommons is working with the PSU Arboretum and OPP to acquire data and provide access to it via the DataCommons as well as an interactive application.
• Goals are to provide both access to data and usable apps for the public, for teaching at PSU, and for research.
• The PSU campus as a living lab!

Findings
• Need for long-term preservation.
• Need for a plan to transition or upgrade to new versions of software.
• Need for metadata education.
• Curation of sensitive data.
• Large datasets (video, astronomical observations, remotely sensed data) need to be housed and preserved.

Findings continued…
• Data storage for projects that already provide public access to data but need a centralized permanent home.
• Need for data to be accessible through multiple interfaces.
• Identity can be important to some providers.
• Cross-campus workgroups have common data and platform/software needs but no place to store the data.

Questions?