Griffith University’s Journey in Data Citation

ANDS Webinar 5 June 2014
Natasha Simons
Senior Data Management Specialist
Australian National Data Service
Located at: Griffith University, Brisbane, Australia
Tw: @n_simons
Established in 1971 and opened in
Now has five South-East
Queensland campuses
Around 43,000 students and 4,300
26 schools and departments in
four academic groups:
• Arts, Education and Law
• Business
• Health
• Science, Environment,
Engineering and Technology
Image credit: Danny Munnerley,
32 research centres and institutes
Priority areas
Image credit: Anne Ruthmann,
Water science
Drug discovery and infectious diseases
Asian politics, security and development
Climate change adaptation
Criminology and crime prevention
Music, the arts and the Asia Pacific
Sustainable tourism
Chronic disease prevention
Physical sciences
Environmental sciences
Strong commitment from
University leaders to improving
data management
Staff resources – operational and
project related
Successful in seeking funding from
ANDS and NeCTAR to build
national and local infrastructure
Strong emphasis on seeking
internal funds and working with
researchers on grants for funds to
develop, enhance and support
institutional tools
Policy frameworks and service
models for data management
support under discussion
There are really only two things you need
before you start on a data citation journey:
1. Some research data collections at
your institution that have open,
embargoed or mediated access.
2. A publicly available metadata
record that describes each of these
collections and provides access to
At Griffith, we have:
1. Research Data Repository
2. Research Hub (metadata store/researcher
profile system)
On your journey, you may also need:
1. Management support:
Malcolm Wolski
eResearch Services & Scholarly
Application Development
Division of Information Services
Griffith University
2. Technical support:
Arve Solland
Senior Developer,
eResearch Services & Scholarly
Application Development
Division of Information Services
Griffith University
August – ‘PIDs for data’ options paper, recommended DOIs
August – ANDS launched Cite My Data service pilot
September to December – signed up; developed m-2-m scripts, minted DOIs
c. May – put ‘Cite this collection’ feature in Griffith Research Hub
October – commenced data citation project
May – concluded data citation project
September – produced DOI guidelines; developed roadmap
Griffith needed a persistent identifier that would:
• Fill gaps in persistent identifiers for scholarly works
• Replace long and incomprehensible URLs for metadata
• Signal long-term management of our research data collections
• Contribute to the semantic vision for data in the Research Hub
• Later: foundation for data citation.
We chose DOIs to meet our needs because they:
• Are a global persistent identifier, already used for many scholarly publications
• Can be assigned to research data, theses, grey lit and even software code
• Improve visibility of, and access to, research data
• Gave us responsibility for managing persistent access to our data collections
• Won’t break when IR software is re-indexed (as handles sometimes do)
Later, because they:
• Facilitate data citation
• Greatly assist tracking impact of data sets through collection of metrics and
altmetrics based on DOI
The ANDS Cite My Data service provided:
• Partnership with international DOI registration agency: DataCite
• Minting DOIs for metadata records about open, mediated or embargoed
research data, theses, grey literature (even software code)
• Machine-to-machine (m-2-m) workflow
• Easily achieved kernel metadata
• Trial in safe test environment
• High-level documentation for the m-2-m workflow, provided by ANDS
• High-level information on data citation on the ANDS website
• Free!
And so we became the first guinea pigs of the Cite My Data Service….
1. Sign agreement to use the service
2. ANDS gives you an institutional ID
3. Prepare your m-2-m script (includes required metadata for each DOI:
title, creator, publisher, publication year, identifier)
4. Execute script against Cite My Data service
5. Cite My Data service returns DOIs
6. Store DOIs in own system
7. Create citation element
8. Make citation element available in RIF-CS feed for ANDS harvester
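Steps 3 to 7 above can be sketched in Python. This is a minimal illustration under stated assumptions, not the scripts Griffith used: the function names, the sample record and the DOI are all made up, the real service call is replaced by a stand-in function, and the citation pattern (Creator (Year): Title. Publisher. Identifier) is one commonly recommended layout rather than a rule from the source.

```python
# Hypothetical sketch: none of these names come from the ANDS documentation.
REQUIRED_FIELDS = ("title", "creator", "publisher", "publication_year")


def missing_fields(record):
    """Step 3: list required metadata fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def format_citation(record):
    """Step 7: build a citation string (assumed pattern, see lead-in)."""
    return (
        f"{record['creator']} ({record['publication_year']}): "
        f"{record['title']}. {record['publisher']}. "
        f"https://doi.org/{record['doi']}"
    )


def mint_and_cite(record, mint):
    """Run steps 3 to 7 for one record.

    `mint` is any callable that takes the record and returns a DOI string;
    it stands in for the real service call (steps 4 and 5).
    """
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    stored = dict(record)                 # copy so the input is not modified
    stored["doi"] = mint(record)          # step 6: keep the DOI locally
    stored["citation"] = format_citation(stored)
    return stored


def fake_mint(record):
    """Placeholder for the service; returns a made-up DOI."""
    return "10.9999/example-0001"


sample = {
    "title": "Sample collection",
    "creator": "Researcher, A.",
    "publisher": "Griffith University",
    "publication_year": 2014,
}
result = mint_and_cite(sample, fake_mint)
print(result["citation"])
```

Passing the minting function in as a parameter keeps the metadata checks testable without touching a live service; a real script would swap `fake_mint` for the actual request.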
DOI scripts:
What are the criteria for assigning a DOI to a research data collection?
At what level of granularity should a DOI be applied?
Should the DOI link to the landing page or the actual data? Which landing page?
What if the data is changed e.g. updated? Should a new DOI be issued?
Should researchers be able to mint the DOI or should we mint it for them?
How are DOIs assigned if the research data is the result of a collaboration between
various institutions?
What happens to the DOIs we have minted if ANDS closes shop?
Can you cite data without a DOI?
Implementing DOIs for Research Data D-Lib article
We found answers to our questions and wrote them up in guidelines:
Digital Object Identifiers (DOIs): Introduction and
Management Guide
Available for download from the ANDS website:
Documented our experiences in the Gold Standard Project @ Griffith blog:
Established a blog
Spoke with librarians about citation practices in different disciplines
Included data citation as part of standard consultations with a group in
Health & an individual in environmental economics
Notifications workflows
Investigated Dryad automated notifications workflows
• Modified their depositor notification
• Manually emailed collection owners of new collections
• Notifications added to technical requirements for data deposit
Reviewed existing information and workflows
• Griffith policies and procedures
• Academic style guides
• Training materials and guides
Included data citation in new Best practice guidelines for researchers:
managing research data and primary materials
Citation practices
Style guides
Publishing protocols
Target audiences
Types of research output
Usage of metrics
Age and career stage
Attitudes to open access
Motivations
Technical know-how
Image credit: Taki Steve,
Find ‘hooks’ in the researchers’
• e.g. point of data deposit
• e.g. final report on funded research
• e.g. through data planning
Long term goal should be to
get in early - improving the
training and supporting
artefacts (style guides,
bibliographic management
software) that introduce new
students and researchers to
the principles of citation
Image credit: Todd Lappin,
A depositor shouldn’t have to
know what a DOI is or where it
comes from, or be asked to
make a decision about
whether they want one or not
Minting DOIs should be done
automatically for collections
that meet the rules defined by
the ‘publisher’ of the deposited
data (in this case, Griffith
University) and the DOI
registration agency
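A rule-based check of this kind could look like the sketch below. The rules shown are assumptions for illustration only (an access type of open, embargoed or mediated, a public metadata record, and no existing DOI); they are not Griffith's actual rules, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed eligibility rule: only these access types qualify.
ALLOWED_ACCESS = {"open", "embargoed", "mediated"}


@dataclass
class Collection:
    title: str
    access: str                      # e.g. "open", "mediated", "closed"
    has_public_metadata: bool
    doi: Optional[str] = None


def should_mint(collection: Collection) -> bool:
    """Apply the (hypothetical) publisher rules to one collection."""
    return (
        collection.doi is None
        and collection.has_public_metadata
        and collection.access in ALLOWED_ACCESS
    )


def on_deposit(collection: Collection,
               mint: Callable[[Collection], str]) -> Collection:
    """Called at deposit time; the depositor is never asked about DOIs."""
    if should_mint(collection):
        collection.doi = mint(collection)
    return collection


def fake_mint(collection: Collection) -> str:
    """Stand-in for the real minting call; returns a made-up DOI."""
    return "10.9999/example-0002"


deposited = on_deposit(Collection("Sample collection", "open", True), fake_mint)
print(deposited.doi)
```

Keeping the rules in one function means they can be reviewed and changed without touching the deposit workflow itself.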
Image credit: Taki Steve,
Be honest about the evidence base – they’re researchers so
they will ask!
Be honest about the lack of rewards within the current
system and have empathy – researchers know better than us
what they do and don’t get rewarded for
[Diagram; labels partly lost in extraction]
... action is needed for change in these areas: style guides; tools e.g. ...; ... of data; ... and training
... these now and in the near ...
We mostly know what we are doing with ...
D-Lib article: Growing Institutional Support for Data Citation: Results of a Partnership
between Griffith University and the Australian National Data Service
What Griffith University are doing to establish a culture of data citation:
But we didn’t conquer the world…
On the ‘to do’ list:
• Embedding DOIs into automated data collection workflows
• Minting DOIs for grey literature: theses, reports, discussion papers etc.
• Improving links between research publications and underlying data
• Reviewing DOI guidelines, rules and workflows at future points in time
• Embedding types of metadata, such as COINS, into the landing pages to assist
import into citation tools
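For the COinS item, a landing page could carry an empty span whose title attribute holds the citation metadata as URL-encoded key/value pairs. The sketch below is an assumption-laden illustration: the function name and sample values are hypothetical, the choice of Dublin Core keys is one option among several, and how well a given citation tool imports the result would need testing.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creator, year, publisher, doi):
    """Return a COinS <span> for a data collection landing page.

    Uses the Dublin Core key/value format; the field selection here is
    an illustrative assumption, not a documented Griffith practice.
    """
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),
        ("rft.title", title),
        ("rft.creator", creator),
        ("rft.date", str(year)),
        ("rft.publisher", publisher),
        ("rft_id", f"info:doi/{doi}"),
    ]
    query = urlencode(fields)            # percent-encodes each value
    # escape() turns "&" into "&amp;" so the attribute is valid HTML
    return f'<span class="Z3988" title="{escape(query)}"></span>'


print(coins_span("Sample collection", "Researcher, A.", 2014,
                 "Griffith University", "10.9999/example-0001"))
```

The span is invisible to readers; only tools that look for the `Z3988` class pick it up.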
Easy lessons learnt:
• Do what you can with what you have available
• Technical minting and maintaining of DOIs is relatively easy
• The Cite My Data service is straightforward
• Getting the citation element is also relatively easy
• There are a lot of materials available now on DOIs (infrastructure) and on
data citation (researchers) so don’t reinvent the wheel
• You could decide to set up an administrator interface for minting and
maintaining the DOIs (e.g. the way TERN have done this). This would run
over the top of the m-2-m scripts.
Hard lessons learnt:
• Establishing workflows for DOIs and data citation is not easy if you don’t
know when researchers are going to publish their data and if data publication
is not routine
• Data citation is not (yet) common practice but there is a large international
community supporting data citation as a principle and to encourage practice
• There is a growing body of evidence on a positive link between open data
and citation counts