… because good research needs good data
DAF methodology & Glasgow Uni scoping study
Sarah Jones
DCC, University of Glasgow
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Tools of the Trade Workshop, Manchester, 19th May 2010
Background to DAF project
“JISC should develop a Data Audit Framework
to enable all universities and colleges to carry
out an audit of departmental data collections,
awareness, policies and practice for data
curation and preservation”
Liz Lyon, Dealing with Data: Roles, Rights,
Responsibilities and Relationships, (2007)
www.data-audit.eu/
The methodology
http://www.data-audit.eu/DAF_Methodology.pdf
Stage 1: planning
Objective
Determine what you want to find out and prepare work in advance
Process
- Define scope / expected outcomes
- Research organisational context
- Set up survey, interviews, meetings…
Stage 2: identifying data
Objective
Create inventory to understand scale of data
Process
Engage researchers to:
- Identify key data assets
- Classify data to restrict scope
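As a minimal sketch of what a Stage 2 inventory could look like in practice, the Python below records assets and filters them by classification so that later stages only look at the most crucial ones. The field names, the three classification labels and the example entries are illustrative assumptions, not taken from the DAF methodology document.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Hypothetical labels for illustration; use whatever scheme the audit defines.
    VITAL = 1
    IMPORTANT = 2
    MINOR = 3


@dataclass
class DataAsset:
    # Hypothetical inventory fields, chosen for this sketch only.
    name: str
    owner: str
    description: str
    classification: Classification


def assets_in_scope(inventory, threshold=Classification.IMPORTANT):
    """Return assets classified at or above the threshold (lower value = more crucial)."""
    return [a for a in inventory if a.classification.value <= threshold.value]


# Made-up example entries.
inventory = [
    DataAsset("Field survey photos", "Researcher A", "Site images", Classification.VITAL),
    DataAsset("Draft figures", "Researcher B", "Working plots", Classification.MINOR),
]
in_scope = assets_in_scope(inventory)
print([a.name for a in in_scope])
```

Filtering on a classification threshold is one simple way to restrict scope; an audit could equally restrict by department, project or funder.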
Stage 3: assessing data management
Objective
Identify weaknesses in data management and potential risks
Process
- In-depth assessment of most crucial assets, given purpose of audit
- Discussion on lifecycle of data to assess data management
Stage 4: recommendations
Objective
Recommend changes to improve data management
Process
- Collate audit results
- Analyse data
- Suggest changes to mitigate weaknesses
DAF pilot implementations
• Early test cases: GeoSciences; Archaeology; Mechanical Engineering; Humanities
• University of Edinburgh: Physiology; Divinity; History; Brain Imaging; Astronomy
• University College London: Archaeology; Scandinavian Studies; Physics & Astronomy; Life & Medical Sciences
• Imperial College London: Chemical Engineering; Physics; Business School
• King’s College London: Geography; Psychiatry; Environmental Research; Biomedical and Health Sciences
• DataShare examples: Cardiac group; Dept of International Development; Social Sciences
Workshop on next steps for DAF
• Many of the pilots found the actual process of gathering information
on data management was more valuable than the asset register.
The DAF approach was felt to be useful for defining requirements
to improve data management. (JISC funded RDMI projects)
• A suggestion was made to enhance DAF with practical examples /
guidance from the pilot studies. (Implementation Guide)
• Align the DAF process with other data management planning tools.
(IDMP project between AIDA, DAF, DRAMBORA, LIFE)
GU scoping studies
• Digital Preservation Advisory Board (DPAB) established at GU in 2008
• Keen to identify scale of digital preservation needs across the uni
• Scoping studies ran in 2009 in:
  • Archaeology
  • Chemistry
  • Corporate Communications
  • Court Office
  • English Language
  • Electronics and Electrical Engineering
  • Evolutionary Ecology and Biology
  • MRC Social and Public Health Sciences Unit
Methodology
• Semi-structured interviews
  • interview framework sent in advance
  • some background research done before interview, e.g. reading staff profile
  • recorded (with permission), then transcribed and sent for comments
• Spoke with HoDs, researchers, teaching, admin and support staff
• Reviewed preliminary findings and increased scope
  • added more PhDs and ECRs as most researchers we’d spoken to were senior
  • added corporate communications for ‘web’ perspective
  • spoke to additional key people at the Uni, e.g. William Nixon, repository manager; James Currall, security expert
Interview framework
1. what digital material is being created
2. how this is being created and maintained
3. any issues that have been encountered
4. plans for the long-term e.g. preservation, reuse
5. requirements for support and services.
http://www.gla.ac.uk/media/media_126658_en.pdf
What did we find?
Pockets of good practice…
• to connect data with documentation, we name files using a code number which is the person’s initials, the lab book number in roman numerals and then the experiment number
• It makes a huge difference if somebody can come and talk through problems and solutions with you. A personal contact like the RDOs is helpful.
• We produced documentation workflows on how to take material from the DAT machines, how to transfer these into computer files, guidelines on transcription and anonymisation, and making derivatives. It’s all very well documented, which means there is consistency across the team, which is vitally important.
… but a lot of confusion and need for support
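The file-naming convention quoted above (initials, lab book number in roman numerals, experiment number) can be sketched in a few lines of Python. This is an illustration only: the hyphen separator, the upper-casing of initials and the function names are assumptions, since the interviewee did not say how the parts are joined.

```python
def to_roman(number):
    """Convert an integer between 1 and 3999 to a Roman numeral."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = []
    for value, symbol in numerals:
        # Take as many copies of this symbol as fit, then carry on with the remainder.
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)


def file_code(initials, lab_book, experiment):
    """Build a code from initials, lab book number (Roman) and experiment number."""
    # The hyphen separator is an assumption made for this sketch.
    return f"{initials.upper()}-{to_roman(lab_book)}-{experiment}"


# Made-up example: researcher "ab", lab book 4, experiment 27.
print(file_code("ab", 4, 27))
```

Generating the code from its parts, rather than typing it by hand, keeps file names and lab book entries consistent across a team.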
Procedures for creation & management
• the network has always been the bane of everyone’s lives to find stuff on - you end up opening umpteen files to see if it’s the one you’re after
• The volume of data produced makes maintenance a bit like drinking from a fire hose.
• the licence is very expensive and if this weren’t renewed it wouldn’t be possible to continue to access the data
• They had major problems last year moving from ArcGIS 9.1 to 9.3 – everything stopped working as they’d changed the geodatabase format. It was not straightforward to fix…
• the paper records system hasn’t transferred easily to the digital
• Digital images are a classic case in point as many still have the numerical ordering and cryptic letter sequence auto-generated by the camera.
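One way to gauge how many images still carry camera-generated names is to scan a file listing for typical patterns. The sketch below is hedged accordingly: the prefixes (DSC, DSCN, DSCF, IMG, P) and extensions are assumptions based on common camera defaults, not patterns reported in the scoping studies, and the example file names are made up.

```python
import re

# Assumed pattern: a short camera prefix, an optional separator,
# at least four digits, and an image extension.
CAMERA_NAME = re.compile(
    r"^(DSCN|DSCF|DSC|IMG|P)[_-]?\d{4,}\.(jpe?g|tiff?|png)$",
    re.IGNORECASE,
)


def camera_named(filenames):
    """Return the file names that look auto-generated by a camera."""
    return [name for name in filenames if CAMERA_NAME.match(name)]


# Made-up example listing.
listing = ["IMG_0042.JPG", "DSC01234.jpg", "trench3_north_section.jpg"]
flagged = camera_named(listing)
print(flagged)
```

A list like this only flags candidates for renaming; deciding on meaningful names still needs the researcher who took the images.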
Storage and backup
• Research groups tend to run their own little fiefdom. The correlation seems to be the more computers they have, the less IT expertise there is.
• Insufficient backup space is a recurring problem, but it’s not really a lack of space, it’s more an issue of not being able to control what people store on their hard drives.
• People bring in sticks with 4GB of data on that simply no longer work and nothing can be done to retrieve it.
• If they throw some money at the problem they can install another networked drive and the problem goes away for a while
• large and reliable storage is expensive. You need this for home directories but things that are to be archived or backed up could be punted out of the way to cheaper storage.
Selection / long-term preservation
• It’s one thing to keep something going, but are people still able to use it in the same way?
• If the website comes to an end, the data could still be preserved, but you lose the richness of being able to search that, or see it on a map, or have them synchronised.
• How do you decide what can be deleted? I’m not confident to make that decision.
• it’s like giving your baby away
• Probably only one tenth of what’s currently held should be retained.
• Archiving is to allow someone else to reuse it
• If I know the code will be public I’ll pay more attention to properly annotating it with comments so other people can understand it.
What next…
• DPAB continues to address this at senior management level
• JISC-funded Incremental project (part of MRD programme)
  • Ensuring researchers can find guidance and support when needed
  • Making data training and guidance more understandable to researchers
  • Offering tailored support and partnering
http://www.lib.cam.ac.uk/preservation/incremental/index.html