Lesson 2: Data Sharing
(Photo by Michelle Chang. All Rights Reserved.)

Lesson topics:
• Data Sharing Within the Data Lifecycle
• Value of Data Sharing
• Concerns About Data Sharing
• Methods for Making Data Sharable
(CC image by Kevin Byron on Flickr)

After completing this lesson, the participant will be able to:
o discuss data sharing considerations within the data life cycle
o recognize the benefits of sharing scientific data
o address concerns about sharing data
o outline a process for making data sharable
o identify mechanisms for sharing data
(CC image by PictureYouth on Flickr)

• Data sharing should be addressed throughout the data life cycle [1]:
Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

• Several stages require critical attention to ensure effective data sharing:
o Describe: document the data content, character and process
o Deposit: store the data in a location from which it can be accessed
o Preserve: select storage formats and media with long term use in mind
o Discover: publish information about the data so that others can find it

Data sharing requires effort, resources, and faith in others. Why do it?
(CC image by Jessica Lucia on Flickr)
For the benefit of:
o the public
o the research sponsor
o the research community
o the researcher

A better informed public yields better decision making with regard to:
o environmental and economic planning
o federal, state, and local policies
o social choices such as use of tax dollars and education options
o personal lifestyle and health such as nutrition and recreation
(CC image by falonyates on Flickr)

• Organizations that sponsor research must maximize the value of research dollars
• Data sharing enhances the value of research investments by enabling:
o verification of performance metrics and outcomes
o new research and increased return on investment
o advancement of the science
o reduced data duplication expenditures

Access to related research enables community members to:
o build upon the work of others and further, rather than repeat, the science [4]
o perform meta-analyses that cannot be performed with individual datasets or laboratories [5]
o share resources and perspectives so that comprehension is expanded and enhanced [5]
o increase transparency, reproducibility and comparability of results [5]
o expand methodology assessment, recommendations and improvement [6]
o educate new researchers as to the most current and significant findings [6]
(CC image by Lawrence Berkeley National Laboratory on Flickr)

Scientists who share data gain the benefit of:
o research sponsor recognition as an authoritative source and wise investment
o improved data quality due to expanded use, field checks, and feedback
o greater opportunity for data exchange
o improved connections to scientific network, peers, and potential collaborators
(CC image by SLU Madrid Campus on Flickr)

(CC image by CyberHades on Flickr)
Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.

Concerns about sharing data, each of which can be addressed through metadata:

Concern: inappropriate use due to misunderstanding of research purpose or parameters
Solution: provide rich Abstract, Purpose, Use Constraints and Supplemental Information where needed

Concern: security and confidentiality of sensitive data
Solution:
• the metadata does NOT contain the data
• Use Constraints specify who may access the data and how

Concern: lack of acknowledgement / credit
Solution: specify a required data citation within the Use Constraints

Concern: loss of data insight and competitive advantage when vying for research dollars
Solution: create a second, public version with a generalized Data Processing Description

Step One: Create robust metadata that is discoverable
o specify geography and time periods
o use discipline-specific theme, place and temporal keywords, thesauri, and ontologies
o describe attributes
o include links to associated data catalogues, data downloads, project websites, etc.

Step Two: Include archival and reference information
o properly formatted data citations for the data and all sources
o Universally Unique Identifiers (UUID) that uniquely identify your data and help to link the data with the metadata
See the DataONE unique identifier guidance at: http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html

Data Citation Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository.
doi:10.5061/dryad.20

Step Three: Have data contributors review your metadata to ensure validity and organizational ‘correctness’:
o are the processes described accurately?
o are all contributions adequately identified?
o has management reviewed the product and documentation?
o is the funding organization properly recognized?

Step Four: Publish your metadata and/or data via:
Data Portals / Clearinghouses
Federal
o geodata.gov
o data.gov
o USGS Core Science Metadata Clearinghouse (mercury.ornl.gov/clearinghouse)
Communities of Practice
o Knowledge Network for Biocomplexity (KNB) Data Portal
o Long Term Ecological Research (LTER) Network Data Portal
Other Online Resources
o Project and/or Program websites
o Links within online lessons and outreach products
o Web-accessible folders (WAF)
o Community or Public Cloud

• Document and publish data using standards
• Promote data use via presentations and meetings
• Solicit feedback from data users and address identified issues
• Monitor publications and websites for data use and address misapplications

(CC image by Andrew Mccluskey on Flickr)
• In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain.
• The goal of this project was to do research on a massive scale and to make all of the resulting data accessible to anyone in the world with a computer.
• Dr. John Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania, stated, “It’s not science the way most of us have practiced it in our careers.
But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.”
• Source: http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html

• Data sharing adds value to the data
• It is the responsibility of the researcher to share their data
• Metadata supports data accountability, liability, and usability
• Sponsors expect, and some require, data to be shared
• Data sharing is essential to the advancement of science

References
1. Inter-university Consortium for Political and Social Research (ICPSR), ICPSR Guide to Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (ICPSR, 2009; http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf). [4th Edition]
2. Australian Bureau of Statistics - National Statistical Service (ABS-NSS), A good practice guide to sharing your data with others (ABS-NSS, 2009; http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002FD676?opendocument). [Vers. 1]
3. H.A. Piwowar, A new task for NSF reviewers: Recognizing the value of data reuse. ResearchRemix vers. May 28, 2011 (http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/). [blog posting of draft]
4. H.A. Piwowar, M.J. Becich, H. Bilofsky, R.S. Crowley, Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med. 5(9), e183 (2008), doi:10.1371/journal.pmed.0050183. [on behalf of the caBIG Data Sharing and Intellectual Capital Workspace]
5. J.L. Teeters, K.D. Harris, K.J. Millman, B.A. Olshausen, F.T. Sommer, Data Sharing for Computational Neuroscience. Neuroinform (2008), DOI 10.1007/s12021-008-9009-y.
[http://redwood.berkeley.edu/fsommer/papers/teetersetal08.pdf]
6. National Institutes of Health (NIH), “NIH Data Sharing Policy and Implementation Guidelines” (NIH, Washington D.C., 2003; http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm).
7. R. Geambasu, S.D. Gribble, H.M. Levy, “CloudViews: Communal Data Sharing in Public Clouds”, in Proceedings of the First USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, USA, June 2009.
8. J. Niu, “Reward and Punishment Mechanisms for Research Data Sharing”. IASSIST Quarterly, Winter (2006).
9. C.L. Borgman, “Research Data: Who will share what, with whom, when, and why?” in Proceedings of the China-North American Library Conference, Beijing, September 2010 (http://works.bepress.com/borgman/238/).

The full slide deck may be downloaded from: http://www.dataone.org/education-modules
Suggested citation: DataONE Education Module: Data Sharing. DataONE. Retrieved Nov 12, 2012, from http://www.dataone.org/sites/all/documents/L02_DataSharing.pptx
Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.
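
Steps One and Two of the lesson (discoverable metadata, plus a data citation and a unique identifier) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the function names and record field names are hypothetical and do not follow any particular metadata standard, the descriptive values are placeholders, and only the citation parts are taken from the lesson's data citation example.

```python
import json
import uuid


def format_citation(author, year, title, publisher, identifier):
    """Join citation parts as 'Author. Year. Title. Publisher. Identifier'."""
    return f"{author}. {year}. {title}. {publisher}. {identifier}"


def build_metadata(title, abstract, purpose, use_constraints,
                   keywords, place, time_period, links, citation):
    """Assemble a metadata record as a plain dict.

    The field names are hypothetical; a real record would follow the
    metadata standard required by the target repository or clearinghouse.
    """
    return {
        # Step Two: a random (version 4) UUID ties the record to its data.
        "identifier": str(uuid.uuid4()),
        "title": title,
        "abstract": abstract,
        "purpose": purpose,
        "use_constraints": use_constraints,
        # Step One: keywords, geography, time period, and links aid discovery.
        "keywords": list(keywords),
        "place": place,
        "time_period": time_period,
        "links": list(links),
        "citation": citation,
    }


# Citation parts taken from the lesson's data citation example.
citation = format_citation(
    author="Sidlauskas, B",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
)

# Every descriptive value below is a placeholder, not real dataset content.
record = build_metadata(
    title="<dataset title>",
    abstract="<what the data contain and how they were produced>",
    purpose="<why the data were collected>",
    use_constraints="Cite this dataset as given in the 'citation' field.",
    keywords=["<theme keyword>", "<place keyword>", "<temporal keyword>"],
    place="<geographic extent>",
    time_period="<start date> to <end date>",
    links=["<project website URL>"],
    citation=citation,
)

print(json.dumps(record, indent=2))
```

Placing the required citation next to the use constraints mirrors the lesson's suggestion for addressing lack of acknowledgement, and keeping the identifier inside the record is one simple way to link the metadata to the data it describes.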