Managing and Sharing Data: Training Resources * Workshop

MANAGING & SHARING RESEARCH DATA
LOUISE CORTI
UK DATA SERVICE
UNIVERSITY OF ESSEX
University of Sussex, Why share your research data
14 February 2013
OVERVIEW FOR TODAY
• Why share data
• Data management planning
• Looking after your data
• Formatting and organising your data
• Storage, security, transfer and file sharing
• Copyright of data
UK DATA ARCHIVE
UK DATA SERVICE
• UK Data Archive has over forty years' experience in selecting,
ingesting, curating and providing access to social science data
• Extensive experience of supporting researchers and data creators in
the social sciences and related disciplines
• Managing data sharing for the ESRC Data Policy since 1995,
including the Rural Economy and Land Use Programme (2004-2012)
• Our best practice approaches to making data shareable are
based on:
  • challenges faced by researchers in sharing data
  • handling research data, both quantitative and qualitative
  • highly skilled staff comprising researchers, technical and
  information specialists
www.data-archive.ac.uk/create-manage
OUR MANAGING AND SHARING DATA RESOURCES
Managing and sharing guidance
• sections
• references
• training programme
www.data-archive.ac.uk/create-manage
www.data-archive.ac.uk/media/2894/managingsharing.pdf
Training resources:
• presentations
• exercises and discussions / answers
www.data-archive.ac.uk/create-manage/training-resources
BENEFITS OF MANAGING AND SHARING YOUR DATA
• Data created from research are valuable resources that can
be used and re-used for future scientific and educational
purposes.
• Sharing data:
• facilitates new scientific inquiry
• avoids duplicate data collection
• provides rich real-life resources for education and training
• credits the data creator with intellectual effort
• can help to support evidence of impact
• is something to be proud of!
DATA LIFECYCLE & DATA MANAGEMENT PLANNING
A data management and sharing plan helps researchers consider,
when research is being designed and planned, how data will be
managed during the research process and shared afterwards with
the wider research community.
Areas of coverage:
• Data management planning (why and how) and the research
lifecycle
• Data management checklist
• Roles and responsibilities
• Costing data management
WHY DATA MANAGEMENT PLANNING?
• Research funders require planning for data management and data
sharing, e.g. UK Research Councils
  • which data
  • how to manage them
  • how to share, preserve and curate them
  • rights to access, use, …
  • roles & responsibilities
• Research benefits
  • think about what to do with research data: how to collect them, how to look after them
  • keep track of research data (e.g. staff leaving)
  • identify support, resources, services needed
  • plan storage, short & long-term
  • plan security, ethical aspects
  • be prepared for data requests (FoI, funder)
COMPLETING A DMP
• Funder templates for DMPs
  • ESRC DMP requirements in data policy and DMP guidance
  • MRC DMP guidance and template
  • AHRC technical plan requirements
• DCC's DMPonline tool
• Good to share best practice in your own departments, e.g.
what does an ideal DMP look like?
UK Data Archive data management checklist
KEY DATA MANAGEMENT INTERVENTION POINTS
[Figure: diagram of intervention points, labelled:]
• Data formats, data migration
• Sign off consent form
• Agree data & metadata templates
• Licensing, terms and conditions for sharing, formal documentation
• Shared data sharing protocols
Green and Gutmann, 2007
ROLES & RESPONSIBILITIES
Assign roles and responsibilities for data management; do not presume them
Who?
• PI
• Research staff / students - collecting, creating, processing,
analysing data
• External contractors - data collection, collation, processing; e.g.
transcribers
• Support staff - managing, administering research
• Local Information Services - data storage, security, back-up
services
• External/institutional data centres / archives - data sharing
COSTING
• Cost data management and sharing into research
• Identify resources needed to make research data shareable
beyond the primary research team, over and above planned standard
research procedures and practices
• Resources = people, equipment, infrastructure, tools to manage,
document, organise, store and provide access to data
• Early planning can reduce costs
• See our data management costing tool
FORMATTING YOUR DATA
• Using standard, interchangeable or open lossless data
formats helps ensure long-term usability of data
• High quality data are well organised, structured, named and
versioned
• Authenticity of master files must be identified
• File formats and conversions
• Organising files and folders
• File naming
• Version control and authenticity
• Transcription
CAN YOU UNDERSTAND/USE THESE DATA?
SrvMthdDraft.doc
SrvMthdFinal.doc
SrvMthdLastOne.doc
SrvMthdRealVersion.doc
FILE FORMATS
• Choice of software format for digital data:
  • planned data analyses
  • software availability/cost
  • hardware used, e.g. audio capture
  • discipline-specific standards and customs
• Digital data endangered by obsolescence of software/hardware
• Best formats for long-term preservation - standard formats,
interchangeable formats, open formats,
e.g. tab-delimited, comma-delimited (CSV), ASCII, RTF, PDF/A,
OpenDocument format, SPSS portable, XML
(see UK Data Archive optimal file formats for various data types)
• Beware of errors or losses of data when converting formats!
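The warning about conversion losses can be checked in practice. The following is a minimal sketch in Python, assuming the data are already held as comma-delimited text: it converts them to tab-delimited text and reads both versions back to confirm that every cell value survived. The function names are illustrative only and are not part of any UK Data Archive tool.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Convert comma-delimited text to tab-delimited text."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    out = io.StringIO(newline="")
    # Cells that themselves contain a tab, quote or line break are quoted by the writer.
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


def same_cells(csv_text, tab_text):
    """Return True when both versions hold identical rows and cell values."""
    original = list(csv.reader(io.StringIO(csv_text, newline="")))
    converted = list(csv.reader(io.StringIO(tab_text, newline=""), delimiter="\t"))
    return original == converted
```

A check of this kind compares cell values only. Annotation carried by formatting, such as colour highlighting in a spreadsheet, has no equivalent in delimited text and would not be detected, so it is worth recording such annotation in a separate column or in the documentation before converting.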
EXAMPLE: FORMAT CONVERSION
[Figure: the same data before and after a format change. Before: MS Excel (.xlsx) format, using colour highlighting for annotation. After: tab-delimited text format, with loss of the colour annotation.]
ORGANISING DATA
• Plan in advance how best to organise data
• Use a logical structure and ensure collaborators understand it
Examples
• hierarchical structure of files, grouped in folders, e.g. audio,
transcripts and annotated transcripts
• survey data – spreadsheet, SPSS, relational database
• interview transcripts - individual well-named files
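As an illustration of the hierarchical structure described above, here is a minimal Python sketch that creates one folder per type of material. The folder names simply follow the example on the slide, and the function name is hypothetical; a real project would choose names that suit its own data.

```python
from pathlib import Path

# Folder names follow the slide's example; they are an assumption, not a standard.
DEFAULT_FOLDERS = ("audio", "transcripts", "annotated_transcripts")


def create_project_folders(root, folders=DEFAULT_FOLDERS):
    """Create one sub-folder per data type under root and return their paths."""
    root = Path(root)
    created = []
    for name in folders:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Agreeing the structure once and creating it with a script means every collaborator starts from the same layout.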
FILE NAMING
• file name - principal identifier of file
• use logical naming, i.e. easy to identify, locate, retrieve, access
• naming provides organisation, context & consistency
• name elements: version number, date, content description, creator name
Best practice
• name independent of location
• brief & relevant
• no special characters, dots or spaces
• for separation use underscores _
• versioning via filename: ordinal and decimal version numbers
• use names to classify broad types of files
• avoid very long file names
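The naming rules above can be applied consistently with a small helper. This is a sketch under stated assumptions rather than a prescribed convention: the order of the name elements, the date written as YYYYMMDD, and the version written as v02-01 (a hyphen instead of a dot, so that the only dot is the one before the extension) are all choices made for this example, and the function name is hypothetical.

```python
import re
from datetime import date


def build_file_name(content, creator, created, major, minor, extension):
    """Join content description, creator, date and version with underscores."""

    def clean(part):
        # Drop spaces, dots and special characters; keep letters and digits only.
        return re.sub(r"[^A-Za-z0-9]", "", part)

    # Assumed version style: ordinal (major) and decimal (minor) parts, zero-padded.
    version = f"v{major:02d}-{minor:02d}"
    parts = [clean(content), clean(creator), created.strftime("%Y%m%d"), version]
    return "_".join(parts) + "." + extension.lstrip(".")
```

For example, `build_file_name("Survey Methods", "A. Researcher", date(2013, 2, 14), 2, 1, "rtf")` gives `SurveyMethods_AResearcher_20130214_v02-01.rtf`, which is easier to interpret than a name such as SrvMthdLastOne.doc.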
VERSION CONTROL
Keep track of different copies or versions of data files
Which method:
• single site vs. across locations
• different versions to be stored vs. files to be synchronised
Best practice:
• unique identifiers for files (a naming convention helps keep track)
• record file status/versions
• record relationships between files,
e.g. data file and documentation; similar data files
• keep track of file locations,
e.g. laptop vs. PC
Various tools are available for versioning and synchronising files
EXAMPLE: VERSION CONTROL TABLE
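The table from the original slide is not reproduced here. As a stand-in, the sketch below shows one way a version control table could be kept as a tab-delimited file. The column names, the use of a SHA-256 checksum to detect changed files, and the function names are all assumptions made for this example, not the layout of the original table.

```python
import csv
import hashlib
import io

# Hypothetical columns covering status, versions, relationships and locations.
FIELDS = ["file_name", "version", "status", "location", "related_files", "sha256"]


def version_record(file_name, content, version, status, location, related_files):
    """Describe one version of a file; content is the file's bytes."""
    return {
        "file_name": file_name,
        "version": version,
        "status": status,
        "location": location,
        "related_files": ";".join(related_files),
        # The checksum changes whenever the file content changes.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def write_version_table(records):
    """Return the records as tab-delimited text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the table as plain delimited text means it can be opened in any spreadsheet or text editor and stored alongside the data files it describes.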
AUDIO TRANSCRIPTION
• adopt a uniform layout throughout the research project
• compatibility with import features of Computer Assisted
Qualitative Data Analysis Software (CAQDAS)
• role of transcription varies by discipline. What to transcribe?
  • verbal and non-verbal?
  • turn-taking?
  • 'interruptions'
• who does it – researcher, service? Need rules
• implications of technologies – video, multiple camera, screen
capture, webcams
STORING YOUR DATA
• Looking after research data for the longer-term and protecting
them from unwanted loss requires having good strategies in place
for:
  • backing-up
  • transmission
  • secure storage
  • disposal
BACKING-UP DATA
• Why do back-ups? Risk of loss and change - would your
data survive a disaster?
• Protect against: software failure, hardware failure,
malicious attack, natural disasters
• Back-ups are additional copies that can be used to restore
originals
• It’s not backed-up unless backed-up with a strategy
DIGITAL BACK-UP STRATEGY
Consider
• what’s backed-up? - all, some, just the bits you change?
• where? - original copy, external local and remote copies
• what media? - CD, DVD, external hard drive, tape, etc.
• how often? – assess frequency and automate the process
• for how long is it kept? Data retention policies that might apply?
know your personal/institutional back-up strategy
• verify and recover - never assume, regularly test a restore
Backing-up need not be expensive
• 1 TB external drives are around £50, with back-up software
Consider non-digital storage too!
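The advice to verify rather than assume can be automated. The sketch below, in Python, copies a folder to a back-up location and compares checksums of each original and its copy. It assumes the back-up folder lies outside the source folder, and the function names are hypothetical. It checks that the copies match at the time of copying; it does not replace a periodic test restore from the back-up media.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_verify(source_dir, backup_dir):
    """Copy all files to backup_dir; return relative paths whose copies differ."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatches = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        target = backup_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(str(source.relative_to(source_dir)))
    return mismatches
```

An empty list means every copy matched its original. Running such a script on a schedule addresses the "how often" question, and pointing it at both a local and a remote location addresses the "where".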
ENCRYPTION AND SECURITY
Always encrypt personal or sensitive data
• when moving data files e.g. interview transcripts
• when storing files e.g. shared areas, mobile devices
Free software that is easy to use
• encrypt hard drives, partitions, files and folders
• encrypt portable storage devices such as USB flash drives
• SafeHouse, TrueCrypt, AxCrypt
Protect data from unauthorised access, use, change, disclosure
and destruction
• control access to all computers and devices
• control physical access to buildings, rooms, cabinets
• restrict access to sensitive materials e.g. consent forms
Proper disposal of equipment and media
• even reformatting the hard drive is not sufficient
FILE SHARING & COLLABORATIVE ENVIRONMENTS
Sharing data between researchers and teams
• too often email attachments
• YouSendIt, Dropbox: consider whether appropriate, as services can be
hosted outside the EU (Data Protection Act applies to personal data); e.g. encrypt files first
• Virtual Research Environments
• MS SharePoint
• Sakai
• file transfer protocol (ftp)
• physical media
• Essex ZendTo
OPTIONS FOR SHARING CONFIDENTIAL DATA
Researchers to consider
• obtaining informed consent, also for data sharing and
preservation / curation
• protecting identities e.g. anonymisation, not collecting personal
data
• restricting / regulating access where needed (all or part of data)
e.g. by group, use, time period.
UK Data Archive has gradation of access controls
• securely storing personal or sensitive data
Consider jointly and in dialogue with participants
Plan early in research
COPYRIGHT AND DATA SHARING
• Copyright permissions must be sought and granted prior to data
sharing / archiving
• Clearing copyright – reach agreement with copyright holder
• Repositories only publish data – they hold no copyright
• Copyright holders give permission to data archives to preserve
data and make them accessible to users
• For secondary use, copyright clearance is needed before data can be
reproduced
• Exception - fair dealing - for non-commercial research, private
study, teaching, quotations, criticism or review; then author and
source must be cited
AN ARCHIVED RESEARCH COLLECTION
• Foot and Mouth in Cumbria dataset – qualitative study
• http://beta-discovery.dataservice.ac.uk/Home/DataCatalogueTab/5407
• High-quality data that are hard to replicate or collect
• Added value from reports, methods and links to
publications
CONTACT
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
WIVENHOE PARK
COLCHESTER
ESSEX CO4 3SQ
T: +44 (0)1206 872001
E: datasharing@data-archive.ac.uk
W: www.data-archive.ac.uk