… because good research needs good data

DAF methodology & Glasgow Uni scoping study
Sarah Jones
DCC, University of Glasgow
s.jones@hatii.arts.gla.ac.uk

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Tools of the Trade Workshop, Manchester, 19th May 2010

Background to DAF project
“JISC should develop a Data Audit Framework to enable all universities and colleges to carry out an audit of departmental data collections, awareness, policies and practice for data curation and preservation”
Liz Lyon, Dealing with Data: Roles, Rights, Responsibilities and Relationships (2007)

www.data-audit.eu/

The methodology
http://www.data-audit.eu/DAF_Methodology.pdf

Stage 1: planning
Objective
Determine what you want to find out and prepare work in advance
Process
- Define scope / expected outcomes
- Research organisational context
- Set up survey, interviews, meetings…

Stage 2: identifying data
Objective
Create inventory to understand scale of data
Process
Engage researchers to:
- Identify key data assets
- Classify data to restrict scope

Stage 3: assessing data management
Objective
Identify weaknesses in data management and potential risks
Process
- In-depth assessment of most crucial assets, given purpose of audit
- Discussion on
lifecycle of data to assess data management

Stage 4: recommendations
Objective
Recommend changes to improve data management
Process
- Collate audit results
- Analyse data
- Suggest changes to mitigate weaknesses

DAF pilot implementations
• Early test cases: GeoSciences; Archaeology; Mechanical Engineering; Humanities
• University of Edinburgh: Physiology; Divinity; History; Brain Imaging; Astronomy
• University College London: Archaeology; Scandinavian Studies; Physics & Astronomy; Life & Medical Sciences
• Imperial College London: Chemical Engineering; Physics; Business School
• King’s College London: Geography; Psychiatry; Environmental Research; Biomedical and Health Sciences
• DataShare examples: Cardiac group; Dept of International Development; Social Sciences

Workshop on next steps for DAF
• Many of the pilots found the actual process of gathering information on data management was more valuable than the asset register. The DAF approach was felt to be useful for defining requirements to improve data management. (JISC-funded RDMI projects)
• A suggestion was made to enhance DAF with practical examples / guidance from the pilot studies. (Implementation Guide)
• Align the DAF process with other data management planning tools.
(IDMP project between AIDA, DAF, DRAMBORA, LIFE)

GU scoping studies
• Digital Preservation Advisory Board established at GU in 2008
• Keen to identify scale of digital preservation needs across the uni
• Scoping studies ran in 2009 in:
  - Archaeology
  - Chemistry
  - Corporate Communications
  - Court Office
  - English Language
  - Electronics and Electrical Engineering
  - Evolutionary Ecology and Biology
  - MRC Social and Public Health Sciences Unit

Methodology
• Semi-structured interviews
  - interview framework sent in advance
  - some background research done before interview e.g. reading staff profile
  - recorded (with permission) then transcribed and sent for comments
• Spoke with HoDs, researchers, teaching, admin and support staff
• Reviewed preliminary findings and increased scope
  - added more PhDs and ECRs as most researchers we’d spoken to were senior
  - added corporate communications for ‘web’ perspective
  - spoke to additional key people at the Uni e.g. William Nixon, repository manager; James Currall, security expert

Interview framework
1. what digital material is being created
2. how this is being created and maintained
3. any issues that have been encountered
4. plans for the long-term e.g. preservation, reuse
5. requirements for support and services.
http://www.gla.ac.uk/media/media_126658_en.pdf

What did we find?
Pockets of good practice…
• to connect data with documentation, we name files using a code number which is the person’s initials, the lab book number in Roman numerals and then the experiment number
• It makes a huge difference if somebody can come and talk through problems and solutions with you. A personal contact like the RDOs is helpful.
• We produced documentation workflows on how to take material from the DAT machines, how to transfer these into computer files, guidelines on transcription and anonymisation, and making derivatives. It’s all very well documented which means there is consistency across the team, which is vitally important.
… but a lot of confusion and need for support

Procedures for creation & management
• the network has always been the bane of everyone’s lives to find stuff on - you end up opening umpteen files to see if it’s the one you’re after
• The volume of data produced makes maintenance a bit like drinking from a fire hose.
• the licence is very expensive and if this weren’t renewed it wouldn’t be possible to continue to access the data
• They had major problems last year moving from ArcGIS 9.1 to 9.3 – everything stopped working as they’d changed the geodatabase format. It was not straightforward to fix…
• the paper records system hasn’t transferred easily to the digital
• Digital images are a classic case in point as many still have the numerical ordering and cryptic letter sequence auto-generated by the camera.

Storage and backup
• Research groups tend to run their own little fiefdom. The correlation seems to be the more computers they have, the less IT expertise there is.
• Insufficient backup space is a recurring problem, but it’s not really a lack of space, it’s more an issue of not being able to control what people store on their hard drives.
• People bring in sticks with 4GB of data on that simply no longer work and nothing can be done to retrieve it.
• If they throw some money at the problem they can install another networked drive and the problem goes away for a while
• large and reliable storage is expensive. You need this for home directories but things that are to be archived or backed up could be punted out of the way to cheaper storage.

Selection / long-term preservation
• It’s one thing to keep something going, but are people still able to use it in the same way?
• If the website comes to an end, the data could still be preserved, but you lose the richness of being able to search that, or see it on a map, or have them synchronised.
• How do you decide what can be deleted? I’m not confident to make that decision.
• it’s like giving your baby away
• Probably only one tenth of what’s currently held should be retained.
• Archiving is to allow someone else to reuse it
• If I know the code will be public I’ll pay more attention to properly annotating it with comments so other people can understand it.

What next…
• DPAB continues to address this at senior management level
• JISC-funded Incremental project (part of MRD programme)
• Ensuring researchers can find guidance and support when needed
• Making data training and guidance more understandable to researchers
• Offering tailored support and partnering
http://www.lib.cam.ac.uk/preservation/incremental/index.html
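
Illustration: file-naming convention
One interviewee under "Pockets of good practice" described naming files with a code made of the person's initials, the lab book number in Roman numerals and the experiment number. The sketch below shows one way such a code might be generated. The function names, the hyphen separator, the upper-casing of initials and the three-digit zero-padding are all assumptions made for illustration; the interviewee did not specify them.

```python
def to_roman(number: int) -> str:
    """Convert an integer between 1 and 3999 to a Roman numeral."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        # Take as many copies of this symbol as fit, then carry the remainder.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def file_code(initials: str, lab_book: int, experiment: int) -> str:
    """Build a code from initials, lab book number and experiment number.

    Hypothetical layout: the separator and zero-padding are assumptions,
    not part of the convention quoted in the slides.
    """
    return f"{initials.upper()}-{to_roman(lab_book)}-{experiment:03d}"


print(file_code("sj", 4, 12))  # SJ-IV-012
```

A code like this, used as a filename prefix, lets a file be traced back to the lab book page that documents it, which is the link between data and documentation the interviewee was describing.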