An Introduction to Research Data Management
29th October 2014
Isabel Chadwick, Research Data Management Librarian
rdm-project@open.ac.uk

Overview of the workshop
• What is Research Data Management?
• Sharing data
• Working with data
• Planning for data
• Useful resources
• Questions?

What is Research Data Management?

“Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information.”
Digital Curation Centre (2011) Making the Case for Research Data Management
http://www.dcc.ac.uk/sites/default/files/documents/publications/Making%20the%20case.pdf

Discussion
What research data do you create/use?
What data management challenges do you face?

UK Data Archive Data Lifecycle model
• Creating data: Design research; Plan data management; Plan consent for sharing; Locate existing data; Collect data (experiment, observe, measure, simulate); Capture and create metadata
• Processing data: Enter data, digitise, transcribe, translate; Check, validate, clean data; Anonymise data; Describe data; Manage and store data
• Analysing data: Interpret data; Derive data; Produce research outputs; Author publications; Prepare data for preservation
• Preserving data: Migrate data to best format; Migrate data to suitable medium; Back-up and store data; Create metadata and documentation; Archive data
• Giving access to data: Distribute data; Share data; Control access; Establish copyright; Promote data
• Re-using data: Follow-up research; New research; Undertake research reviews; Scrutinise findings; Teach and learn
Data often have a longer lifespan than the research project that creates them. You may continue to work on data after funding has ceased; follow-up projects may analyse or add to the data; data may be re-used by other researchers.
http://www.data-archive.ac.uk/create-manage/life-cycle

Why spend time and effort on this?
• So you can work efficiently and effectively
  – Save time and reduce frustration
  – Highlight patterns or connections that might otherwise be missed
• Because your data is precious
• To enable data re-use and sharing
• To meet funders’ and institutional requirements
Photo by HikingArtist.com http://www.lurvely.com/photo/3000043099/Passing_time/

What does the OU expect?
“Research data must be managed to the highest standards throughout their life-cycle in order to support excellence in research practice. In keeping with OU principles of open-ness, it is expected that research data will be open and accessible to other researchers, as soon as appropriate and verifiable, subject to the application of appropriate safeguards relating to the sensitivity of the data and legal requirements.”
OU Principles of Research Data Management, April 2013
http://intranet.open.ac.uk/research-school/strategy-infogovernance/docs/CoPamendedJuly2013mergedwithappendix-forintranet.pdf

What do funders expect?
“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.”
RCUK Common Principles on Research Data Policy, 2011
http://www.rcuk.ac.uk/research/datapolicy/
Overview of funders’ data policies (DCC): http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
ESRC Research Data Policy
• Submit a data management and sharing plan with the grant application
• Include costs for RDM in the bid
• Incorporate data management into the research project
• Submit an annual report on on-going implementation of the data management plan to ESRC
• Offer any data to the UK Data Service for archiving (ReShare)
• Ensure data are available for sharing within 3 months of the end of the project
http://www.esrc.ac.uk/_images/Research_Data_Policy_2010_tcm8-4595.pdf

DFID Research Open and Enhanced Access Policy
• Submit an Access and Data Management Plan
• Budget for RDM at commissioning stage, to be included in DFID’s award
• Deposit raw or derived datasets in a suitable open access repository within 12 months of the final collection
• Retain raw datasets for a minimum of 5 years after project completion and provide them free on request
• Deposit metadata for all outputs in R4D
https://www.gov.uk/government/publications/dfid-research-open-and-enhanced-access-policy

Hewlett Foundation
“The Hewlett Foundation now requires that grantees receiving project-based grants openly license the final materials created with those grants under the most recent Creative Commons Attribution license. We also will require that the materials be made easily accessible to the public, such as by posting them to a grantee’s website.”
Hewlett Foundation (2014) Commitment to Open Licensing
http://www.hewlett.org/about-us/values-policies/commitment-open-licensing

UNESCO Global Open Access Portal
http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/

Sharing data

Benefits of sharing data

What do you need to share?
• Raw data
• Derived data
• Data underpinning publications
• Code
• Methods
What are research data in your context?
What would others need to understand your research?

Barriers to sharing data: discussion
Discuss barriers to sharing your research data.
• Ethical
• Legal
• Professional
How could these barriers be overcome?

How can I share my data?
Funders’ repository services
• UK Data Service ReShare
• R4D
Online data sharing services
• Figshare
• Zenodo
• CKAN DataHub
Directories
• re3data
• DataBib

Working with data

“Start as you mean to go on”
The end point of all projects should involve making the data publicly available. Many datasets will be deposited in national archives, which have regulations for files and metadata. Thinking about these requirements at the beginning of the project will limit the transformations needed at the end.

Filing systems
Filing is more than saving files; it’s making sure you can find them later in your project.
• Naming
• Directory structure
• File types
• Versioning
All these help to keep your data safe and accessible.
Image by Theen Moy: https://www.flickr.com/photos/theenmoy/8078124630 (CC BY-NC-SA 2.0)

Naming conventions
Decide on a file naming convention at the start of your project.
Useful file names:
• are consistent
• are meaningful to you and your colleagues
• allow you to find the file easily
Agree on the following elements of a file name:
• Vocabulary
• Punctuation
• Dates (YYYY-MM-DD)
• Order
• Numbers
• Version information
Ideally you should be able to tell what’s in a file before opening it.
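
The naming elements above can be applied by a short script rather than by hand. The Python sketch below is one possible approach, not a prescribed convention: the function name build_file_name, the element order (project, description, date, version), the separators and the example values are all assumptions made for illustration.

```python
import re
from datetime import date


def build_file_name(project, description, created, version, extension):
    """Build a name such as project_description_YYYY-MM-DD_v01.ext.

    The element order and separators are illustrative assumptions; agree
    your own with your colleagues at the start of the project.
    """
    def clean(text):
        # Lower-case the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO 8601 dates (YYYY-MM-DD) sort chronologically in a file listing
    stamp = created.isoformat()
    # Zero-padded version numbers (v01, v02, ...) also sort in order
    return f"{clean(project)}_{clean(description)}_{stamp}_v{version:02d}.{extension}"


# Hypothetical example values
print(build_file_name("Survey", "Interview transcript", date(2014, 10, 29), 3, "txt"))
# prints: survey_interview-transcript_2014-10-29_v03.txt
```

Generating names from one function keeps vocabulary, punctuation and order consistent across a team, which is harder to guarantee when names are typed manually.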
File formats
• Unencrypted
• Uncompressed
• Non-proprietary and not patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)

Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS

Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table

File formats
“Design outputs requiring minimal data download to see and use…”
DFID Open and Enhanced Access Policy
International Internet bandwidth (Mbps): http://www.nationmaster.com/country-info/stats/Media/Internet/International-Internet-bandwidth/Mbps
For more information:
Web design guidelines for low bandwidth: http://www.aptivate.org/webguidelines/Home.html
Publishers for Development bandwidth challenge: http://www.pubs-for-dev.info/bandwidthchallenge/

Metadata
• Metadata is additional information that is required to make sense of your files – it’s data about data.
• This is not a new idea; consider your music or film collection.
• Think: title, authors, release date, producers, directors, etc.
• Maybe the artwork, the studio, or format
Image by Wilfried Joh: https://www.flickr.com/photos/wilfriedjoh/11494134233 (CC BY-NC-ND 2.0)

Metadata (2)
Consider:
• What contextual details are needed?
  – e.g. a description of the capture methods and data analysis.
• How will you capture additional information?
  – e.g. in papers, in a database, in a ‘readme’ text file, in file properties/headers.
• Which standards will you use and why?
  – Data centre recommendations for metadata, controlled vocabularies, and required documentation.

Metadata (3)
What contextual details are needed?
• Who is in this picture?
• When was it taken?
• Where are they?
• Who took this photo?
• How was this picture taken?
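
The five questions above can be answered once and saved in a small ‘readme’ file kept beside the image. The Python sketch below shows one way this might be done; it does not follow any data centre standard. The field names, the .readme.json suffix, the function name write_readme and the example values are all invented for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical field names, one for each question asked about the picture
REQUIRED_FIELDS = ["subjects", "date_taken", "location", "photographer", "method"]


def write_readme(data_file, metadata, folder):
    """Write <data_file>.readme.json into folder and return its path."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        # Refuse to save incomplete documentation
        raise ValueError("Missing metadata: " + ", ".join(missing))
    readme = Path(folder) / (data_file + ".readme.json")
    readme.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return readme


# Invented example values, written to a temporary folder
example = {
    "subjects": "Workshop participants",
    "date_taken": "2014-10-29",
    "location": "Milton Keynes, UK",
    "photographer": "A. Researcher",
    "method": "Digital camera, JPEG converted to TIFF",
}
with tempfile.TemporaryDirectory() as folder:
    path = write_readme("photo_001.tif", example, folder)
    print(path.name)
```

Plain JSON is used here because it is an open, text-based format; if you must deposit in a particular data centre, its recommended metadata standard should take precedence over an ad hoc layout like this one.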
Metadata (4)
How will you capture additional information?
• If your data were separated from a related publication, would they make sense?
• If you have a results table or database, ensure that metadata is provided for each column and/or row
• Record instructions for use for any software developed
• Your images need to have the required properties: can these be attached automatically, or can you add more information manually?

Metadata (5)
Which standards will you use and why?
Many data centres recommend particular metadata for the formats that they support. These may include controlled vocabularies or required documentation.
• Are you required to deposit in a particular data centre?
For more information: Digital Curation Centre Guide to Disciplinary Metadata Standards (http://www.dcc.ac.uk/resources/metadata-standards)

Storage and Security
Where to store your data:
• Networked drive
• Personal computers or laptops
• External portable storage (USB memory sticks, hard drives, CDs)
• Cloud storage (e.g. Dropbox)

Networked drive
• The best place to store your data while you are working on it
• IT managed antivirus software on desktop computers
• IT managed vulnerability management program
• IT managed email filtering solution
• IT managed network protection technologies (e.g. firewall)
• Secure wireless access points
• Encryption solutions to protect sensitive University data

Personal computers or laptops
• Convenient for storing your data temporarily
• Should not be used for storing master copies of your data
• Local drives may fail, or PCs and laptops may be lost or stolen, leading to an inevitable loss of your data
• Ensure you have a secure password

External portable storage
• Longevity is not guaranteed, especially if they are not stored correctly
• Errors with writing to CDs and DVDs are common
• May not be big enough for all the research data
• Ensure that sensitive data is encrypted

Cloud storage
• The provider may go out of business
• The data may be stored outside of the UK (or the EU)
• Secure destruction of data is difficult to ensure
• What is the provider’s policy in the case of a security breach?
• Ensure you are aware of the provider’s policies

Security: sensitive data
Managing sensitive data
• If possible, collect the necessary data without using personally identifying information
• De-identify your data upon collection or as soon as possible thereafter
• Avoid transmitting unencrypted personal data electronically
• Consider whether you need to keep original collection instruments (recordings, surveys etc.) once they have been transcribed and quality assured

Storage and Security: Discussion

Planning for data

Data management plans (DMPs) are often submitted with grant applications, but are useful whenever you are creating data to:
• Make informed decisions to anticipate and avoid problems
• Avoid duplication, data loss and security breaches
• Develop procedures early on for consistency
• Ensure data are accurate, complete, reliable and secure
• Save time and effort – make your life easier!
Photo by CalsidyRose: https://www.flickr.com/photos/calsidyrose/3552473207 (CC BY-NC-SA 2.0)

Activity
Think about your own research. What actions would you need to perform on your data at each stage of the UKDA’s Lifecycle model? How would you do this? Would you need any additional funding/staff?

Which funders require a DMP?
Note: Data Management Plans are a requirement of Horizon 2020 projects included in the Research Data pilot
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies

What do research funders want?
• A brief plan submitted in grant applications
• 1-3 sides of A4 as an attachment or a section in the Je-S form
• Typically a prose statement covering suggested themes
• An outline of data management and sharing plans, justifying decisions and any limitations

ESRC Data Management Plans
The Data Management Plan is an integral part of the grant application.
• Analysis of existing data sources
• Information on the data that will be produced:
  – data volume, data type, data quality, formats, standards documentation and metadata
• Planned quality assurance and back-up procedures
• Plans for management and archiving of collected data
• Expected difficulties in data sharing, along with causes and possible measures to overcome these difficulties
• Consent, confidentiality, anonymisation and other ethical matters
• Copyright and intellectual property ownership of the data
• Responsibilities within research teams at all participating institutions
For more information:
ESRC Research Data Policy: http://www.esrc.ac.uk/_images/Research_Data_Policy_2010_tcm8-4595.pdf
RDM intranet pages on data management planning (include OU examples): http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management/creating-your-data/datamanagement-plans-0

DFID Access and Data Management Plans
• Outline the researchers’ strategy for maximising opportunities to make research outputs openly accessible
• Where appropriate, the plan will be assessed as part of the award process
• Plans are usually also required when competitive tendering is not used
• May be further developed during the inception phase, and revisited and revised during the course of the project or at annual review as required
For more information:
DFID Research Open and Enhanced Access Policy: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/181176/DFIDResearch-Open-andEnhanced-Access-Policy.pdf
RDM intranet pages on data management planning (include OU examples): http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management/creating-your-data/datamanagement-plans-0

DMPOnline
A web-based tool to help you write DMPs according to different requirements. It includes DCC, funder and OU guidance.
https://dmponline.dcc.ac.uk

Tips
• Keep it simple, short and specific
• Seek advice: consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Justify any resources or restrictions needed

Put your plan into action…
– Ensure consistency
– Improve efficiency
– Maintain ethical practice
– Avoid security breaches and data loss
– Make the most of your data

Example of good practice
• Katiba project (Arts faculty)
• Team of 5, based in UK and Kenya
• Data include interview transcripts and recordings, photographs and media clippings

Example of good practice (2)
RDM handbook written by PI and RA expands on DMP:
• Responsibilities
• File naming, metadata, quality control, questionnaire design, storage and back-up procedures
• Specific challenges of working in Kenya
• Links to useful resources

Activity: documenting current procedure
In groups, discuss the areas of concern on the matrix.
• What are your current procedures?
• Can these be improved? How?
• Are there any barriers to improving current practice?
Useful links
• The OU Research Data Management intranet site: http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management
• Digital Curation Centre: http://www.dcc.ac.uk/
• DMPOnline: https://dmponline.dcc.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• MANTRA: http://datalib.edina.ac.uk/mantra/
• The Orb: http://open.ac.uk/blogs/the_orb

Questions?
Isabel Chadwick
Research Data Management Librarian
RDM-project@open.ac.uk

Photo credits
• Janneke Staaks, Research Data Management: https://www.flickr.com/photos/jannekestaaks/14390184414
• Ian “Harry” Harris, Order! Order!: https://www.flickr.com/photos/harryharris/300782460
• Katy Fentress, Wanted Honest Leaders: https://www.flickr.com/photos/lakatyusha/7588956704/
• Climate Change, Agriculture and Food Security, Workshop in Lushoto, Tanzania: https://www.flickr.com/photos/cgiarclimate/8550330905/
• Climate Change, Agriculture and Food Security, East Africa Strategic Futures Workshop: https://www.flickr.com/photos/cgiarclimate/7985252532
• Brian Wolfe, “Good teacher” “Good student”: https://www.flickr.com/photos/mightyboybrian/6389271595
• Global Partnership for Education, A teacher shows a cell phone to her students, Chennai India: https://www.flickr.com/photos/gpforeducation/8644408460
• DFID, Education for all: https://www.flickr.com/photos/dfid/3860978139
• Climate Change, Agriculture and Food Security, Assessing how Indian farmers manage climate and weather risks in India: https://www.flickr.com/photos/cgiarclimate/8000068204
• Afromusing, IMG_1502.JPG: https://www.flickr.com/photos/afropicmusing/2142907771/