25 Practical data management Wallom

Data Management Tools
David Wallom
YOUR DATA DOES NOT BELONG TO YOU!
IT BELONGS TO YOUR EMPLOYING INSTITUTION!
The RDM Lifecycle
Conceive
Design
Experiment
Analyse
Collaborate
Publish
Expose
Slide borrowed with permission from Anthony Beitz, Monash University. Presented at OR 2012, Edinburgh
The RDM Lifecycle
Conceive
Design
Conceive and Design
• Collaborative
– Multiple Investigators, multiple institutions
• Process driven
– RCUK, the EC and any other funder all have processes that must be followed
• Time limited
– Calls come with deadlines and everyone leaves it
until the last moment…
Tools
• Shared document development tools
– E.g. Google Docs, Office 365
• Shared document/project management tools
– SharePoint: locally managed and normally connected to an Exchange system
– Online: Teambox, Projecturf, Apollo, Basecamp, Huddle
• Data Management Planning
– DMPOnline
The RDM Lifecycle
Experiment
Analyse
Collaborate
Backup
• What data do you need to back up?
– Criteria?
• How many versions should you retain?
– Current, raw, processed?
• How often do you intend to back up?
– After every change, daily, weekly, monthly?
• How many copies should you retain, and where?
– 2, 3, onsite, offsite?
• How do you intend to store backup data?
– Media, online, cloud?
• What software will you use to manage backups?
– Operating system based, 3rd party, criteria to choose?
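One way to turn the answers to these questions into practice is a small script. The sketch below is illustrative only: it assumes a single file is copied to several destination folders (for example one onsite and one offsite), that each copy is checked against a SHA-256 checksum, and that only the newest few versions are retained. The function names `sha256_of` and `back_up`, the version label format, and the default of three retained versions are all assumptions made for this example, not part of any particular backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path], label: str, keep: int = 3) -> list[Path]:
    """Copy `source` into every destination as '<stem>.<label><suffix>',
    verify each copy, and keep only the `keep` newest versions per destination.

    `label` is assumed to be a sortable version tag such as '2024-01-31'.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    expected = sha256_of(source)
    written = []
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / f"{source.stem}.{label}{source.suffix}"
        shutil.copy2(source, target)
        # A copy that does not match the original is not a backup.
        if sha256_of(target) != expected:
            raise IOError(f"Backup copy is corrupt: {target}")
        written.append(target)
        # Labels sort lexicographically, so the oldest versions come first.
        versions = sorted(dest_dir.glob(f"{source.stem}.*{source.suffix}"))
        for old in versions[:-keep]:
            old.unlink()
    return written
```

Calling `back_up(Path("results.csv"), [Path("onsite"), Path("offsite")], "2024-01-31")` with hypothetical paths like these would leave one verified copy in each folder. How often it runs (after every change, daily, weekly) is left to whatever scheduler you already use.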
Tools
• Desktop Drive (USB)
• RO Media
• Online
– ‘Institutional Dropbox’
– SkyDrive
– iCloud
– Dropbox
– Humyo
– Memopal
– Mozy
– ZumoDrive
Data Security
• The available skills and expertise required to ensure an adequate level of data security.
• Risk assessment to determine the value of data:
– the level of confidentiality required
– applicable statutory requirements
– impact of unauthorised access to, or loss of, the data
– steps required to provide appropriate data protection.
• The prevention of unauthorised and malicious access to buildings and rooms where computers and other devices holding data may be housed.
• How access to data is managed, authorised and logged.
• How data is protected from loss or damage, for example by regular backups, implementing version control and installing anti-malware software.
• The means to access data from both within Oxford and from outside the Oxford network; and the transmission of data from one computer to another (e.g. via email, ftp, Web server).
• The storage and encryption of data taken offsite (whether, for example, on an external drive, laptop, mobile device).
• The process to verify the deletion of confidential data (for example, when equipment is redeployed or in line with a project's exit strategy).
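Protection from loss or damage can be checked as well as planned for. As a minimal sketch, assuming the data lives in one folder tree and that files are small enough to read into memory, the code below records a SHA-256 checksum for every file and later reports which files have gone missing, changed, or appeared. The names `build_manifest` and `compare_manifests` are hypothetical helpers for this example; this is one simple approach, not a substitute for backups or version control.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch, not for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, added, or changed between two manifests."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(name for name in set(old) & set(new) if old[name] != new[name]),
    }
```

A manifest saved when data is collected can be compared with one built months later; anything listed under "missing" or "changed" is a prompt to restore from backup.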
Active Research Data Storage
– DataStage
– DataVerse
– NeuroHub/Hub
http://www.dcc.ac.uk/resources/external/category/active-data-storage
Workflow and Lab Books
• LabTrove
• cRUNCH
• NeuroHub/Hub
http://www.dcc.ac.uk/resources/external/category/workflow-and-lab-notebookmanagement
The RDM Lifecycle
Publish
Expose
Sharing Your Data
• Enables that data to be validated and tested,
improving the scientific record.
• Meets funding body requirements obliging award
holders to share their data to avoid duplication of
effort and to cut costs.
• Is in the public interest, where research data has
been publicly funded.
• Can facilitate its rediscovery and its preservation
as technology becomes obsolete.
• Means that data can be reused for scientific and
educational purposes.
When should I consider not sharing my data?
• There may be occasions when you should consider not sharing your data:
– If your research data is potentially commercially valuable or exploitable by your employer.
– If there are ethical issues, legal issues, time constraints or other issues which could limit data sharing opportunities.
– If there are conditions of confidentiality (e.g. through industrial sponsors) attached to the funding of your research.
• Often, sensitive and confidential data can be shared ethically if informed consent for data sharing has been given, or by anonymising research data.
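As a rough illustration of what preparing data for sharing can involve, the sketch below removes direct identifiers from each record and replaces the participant ID with a keyed hash (HMAC-SHA256), so records from the same participant can still be linked. The function name `pseudonymise`, the field names and the 16-character token length are assumptions for this example. Strictly, this is pseudonymisation rather than full anonymisation: the remaining fields may still allow re-identification, and the secret key must be kept out of the shared dataset.

```python
import hashlib
import hmac


def pseudonymise(records, id_field, drop_fields, secret_key):
    """Return copies of `records` with `drop_fields` removed and the
    value of `id_field` replaced by a keyed hash of the original ID.

    `secret_key` is bytes and must not be shared alongside the data.
    """
    cleaned = []
    for record in records:
        # Copy every field except the direct identifiers.
        out = {key: value for key, value in record.items() if key not in drop_fields}
        token = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Shortened token for readability; the same ID always maps to the same token.
        out[id_field] = token[:16]
        cleaned.append(out)
    return cleaned
```

For example, `pseudonymise(rows, "participant_id", {"name", "email"}, key)` with hypothetical field names would drop the name and email columns and keep the measurements, leaving the original `rows` unchanged.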
Community Data Repositories