Data goodness

Mostly in black and white
By Dom
You must love your data!
• Lost data:
– Current imaging data in BRIC cost ~£5.1M, just for scanning costs! (2011)
– No research
– No publications
– No jobs
– No PhDs!
– Sad Dom
• Look after your data!
– It looks after you
• Happy Dom 
Data Storage
• Home directories:
– ISIS home, U Home
» Not for large amounts of imaging data
• Projects directory
– ISIS, V: Big stuff goes here
• If you require large amounts of space
– E.g. > 50 GB
– LET ME KNOW IN ADVANCE!
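To see whether a folder is heading towards the 50 GB mark before asking for space, a rough size check can help. This is a minimal sketch: the function names are hypothetical, and it assumes 1 GB = 1024**3 bytes, which the slide does not specify.

```python
import os

LARGE_THRESHOLD_GB = 50  # threshold mentioned on the slide


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it in this rough estimate.
                pass
    return total


def needs_advance_notice(size_bytes, threshold_gb=LARGE_THRESHOLD_GB):
    """True if the size exceeds the threshold (assumption: 1 GB = 1024**3 bytes)."""
    return size_bytes > threshold_gb * 1024**3
```

For example, `needs_advance_notice(directory_size_bytes("my_study"))` tells you whether the hypothetical folder `my_study` is already over the limit.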
Server goodness
• Why is the server a good place to store data?
– Mirroring and parity: if some errors occur, data can be easily recovered
– Backups:
» Tape backups, daily, with 1 month retention
» If you have funding, processed data can be mirrored off site
» Raw data is always mirrored off site (ECDF) by default
• Desktop PCs
– Not reliable: no mirroring, no parity; if some errors occur, data is lost (often all of it)
– Network backups often fail
» Machines turned off, network busy
» Moving to a new system when I get time!
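Why does parity make recovery easy? The slides do not say how the server implements it, so the following is only an illustration assuming simple RAID-style XOR parity. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks; used to compute a parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single lost block by XOR-ing the survivors with the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

With one parity block, any single lost data block can be rebuilt from the others. A desktop PC with no mirror and no parity has nothing to rebuild from.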
Data love
• Curation: Do this as you work!
• Plan your data use
– Use meaningful folder names
– Make 'README.txt' files with dates, names of students/employees involved, references to software, scripts and versions, and the purpose of the experiment/processing.
– Be tidy with your data - tidy up occasionally
– Friday afternoon - quick tidy up
– Big tidy up at end of experiment/ project/ phase/ year
• BE CAREFUL, don’t rush
• Data, spreadsheets, databases
– Anonymisation
– *** Repatriation keys***
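The 'README.txt' advice above is easier to follow if writing the file takes one call. A minimal sketch, assuming a plain-text layout of my own choosing; the function name, field labels and parameters are hypothetical, not a BRIC standard.

```python
from datetime import date
from pathlib import Path


def write_readme(folder, people, purpose, software, today=None):
    """Write a README.txt with date, people, purpose and software versions.

    `software` maps a tool or script name to its version string.
    """
    today = today or date.today()
    lines = [
        f"Date: {today.isoformat()}",
        f"People: {', '.join(people)}",
        f"Purpose: {purpose}",
        "Software, scripts and versions:",
    ]
    lines += [f"  - {name} {version}" for name, version in software.items()]
    readme = Path(folder) / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Call it when you create a new data folder, and again whenever the processing changes, so the notes stay current as you work.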
Code and Scripts
• Coding:
• Testing
– Make sure that the software you are using does exactly what you think it does!
» Check every step for every image!
– Do not use hard coded paths
• Use versioning software (ECDF)
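One way to avoid hard coded paths is to pass the data location in from outside the script. A minimal sketch, assuming a command-line flag `--data-root` with an environment variable `DATA_ROOT` as fallback; both names, and the helper functions, are hypothetical.

```python
import argparse
import os
from pathlib import Path


def resolve_data_root(argv=None, env=None):
    """Return the data root from --data-root, else from DATA_ROOT, else exit."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Process imaging data.")
    parser.add_argument(
        "--data-root",
        default=env.get("DATA_ROOT"),
        help="top-level data folder (defaults to $DATA_ROOT)",
    )
    args = parser.parse_args(argv)
    if args.data_root is None:
        # No silent fallback to a path baked into the script.
        parser.error("give --data-root or set DATA_ROOT")
    return Path(args.data_root)


def image_path(data_root, subject_id, filename):
    """Build a path relative to the data root instead of hard coding it."""
    return Path(data_root) / subject_id / filename
```

The same script then runs unchanged on the server, on a desktop, or for another project; only the argument changes.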
Safe data is happy data!