RDM workshop FELS/IDO

An Introduction to
Research Data Management
29th October 2014
Isabel Chadwick,
Research Data Management Librarian
rdm-project@open.ac.uk
Overview of the workshop
• What is Research Data Management?
• Sharing data
• Working with data
• Planning for data
• Useful resources
• Questions?
What is Research Data Management?
“Research data management concerns the
organisation of data, from its entry to the research
cycle through to the dissemination and archiving of
valuable results. It aims to ensure reliable
verification of results, and permits new and
innovative research built on existing information.”
Digital Curation Centre (2011)
Making the Case for Research Data Management
http://www.dcc.ac.uk/sites/default/files/documents/publications/Making%20the%20case.pdf
Discussion
What research data do you create/use?
What data management challenges do you face?
UK Data Archive Data Lifecycle model
Creating data
• Design research
• Plan data management
• Plan consent for sharing
• Locate existing data
• Collect data (experiment, observe, measure, simulate)
• Capture and create metadata
Processing data
• Enter data, digitise, transcribe, translate
• Check, validate, clean data
• Anonymise data
• Describe data
• Manage and store data
Analysing data
• Interpret data
• Derive data
• Produce research outputs
• Author publications
• Prepare data for preservation
Preserving data
• Migrate data to best format
• Migrate data to suitable medium
• Back-up and store data
• Create metadata and documentation
• Archive data
Giving access to data
• Distribute data
• Share data
• Control access
• Establish copyright
• Promote data
Re-using data
• Follow-up research
• New research
• Undertake research reviews
• Scrutinise findings
• Teach and learn
Data often have a longer lifespan than the research project that creates them. You may continue to work on data after funding has ceased; follow-up projects may analyse or add to the data; data may be re-used by other researchers.
http://www.data-archive.ac.uk/create-manage/life-cycle
Why spend time and effort on this?
• So you can work efficiently and
effectively
–Save time and reduce frustration
–Highlight patterns or connections
that might otherwise be missed
• Because your data is precious
• To enable data re-use and sharing
• To meet funders’ and institutional
requirements
Photo by HikingArtist.com
http://www.lurvely.com/photo/3000043099/Passing_time/
What does the OU expect?
“Research data must be managed to the highest
standards throughout their life-cycle in order to
support excellence in research practice.
In keeping with OU principles of open-ness, it is
expected that research data will be open and
accessible to other researchers, as soon as
appropriate and verifiable, subject to the
application of appropriate safeguards relating to
the sensitivity of the data and legal requirements.”
OU Principles of Research Data Management, April 2013
http://intranet.open.ac.uk/research-school/strategy-infogovernance/docs/CoPamendedJuly2013mergedwithappendix-forintranet.pdf
What do funders expect?
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
RCUK Common Principles on Research Data Policy, 2011
http://www.rcuk.ac.uk/research/datapolicy/
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
ESRC Research Data Policy
• Submit a data management and sharing plan with the
grant application
• Include costs for RDM in the bid
• Incorporate data management into the research project
• Submit an annual report on on-going implementation of
the data management plan to ESRC
• Offer any data to the UK Data Service for archiving
(ReShare)
• Ensure data are available for sharing within
3 months of end of project
http://www.esrc.ac.uk/_images/Research_Data_Policy_2010_tcm8-4595.pdf
DFID Research Open and Enhanced Access Policy
• Submit an Access and Data Management Plan
• Budget for RDM at commissioning stage to be included
in DFID’s award
• Deposit raw or derived datasets in a suitable open
access repository within 12 months of the final
collection
• Retain and provide free on request raw datasets for a
minimum of 5 years after project completion
• Deposit metadata for all outputs in R4D
https://www.gov.uk/government/publications/dfid-research-open-and-enhanced-access-policy
Hewlett Foundation
“The Hewlett Foundation now requires that
grantees receiving project-based grants openly
license the final materials created with those grants
under the most recent Creative Commons
Attribution license.
We also will require that the materials be made
easily accessible to the public, such as by posting
them to a grantee’s website.”
Hewlett Foundation (2014)
Commitment to Open Licensing
http://www.hewlett.org/about-us/values-policies/commitment-open-licensing
UNESCO Global Open Access Portal
http://www.unesco.org/new/en/communication-and-information/portals-and-platforms/goap/
Sharing data
Benefits of sharing data
Benefits of sharing data (2)
Benefits of sharing data (3)
What do you need to share?
• Raw data
• Derived data
• Data underpinning
publications
• Code
• Methods
What are research data in your context?
What would others need to understand your research?
Barriers to sharing data: discussion
Discuss barriers to sharing
your research data.
• Ethical
• Legal
• Professional
How could these barriers
be overcome?
How can I share my data?
Funders’ repository services
• UK Data Service ReShare
• R4D
Online data sharing services
• Figshare
• Zenodo
• CKAN DataHub
Directories
• re3data
• DataBib
Working with data
“Start as you mean to go on”
The end point of all projects should
involve making the data publicly
available. Many data will be
deposited in national archives which
have regulations for files and
metadata.
Thinking about the requirements at
the beginning of the project will limit
the transformations needed at the
end of the project.
Filing systems
Filing is more than saving files: it’s making sure you can find them later in
your project.
• Naming
• Directory Structure
• File Types
• Versioning
All these help to keep your data safe and accessible.
Image by Theen Moy: https://www.flickr.com/photos/theenmoy/8078124630 (CC BY-NC-SA 2.0)
Naming conventions
Decide on a file naming convention at the start of your project.
Useful file names are:
• consistent.
• meaningful to you and your colleagues.
• easy to find.
Agree on the following elements of a file name:
• Vocabulary
• Punctuation
• Dates (YYYY-MM-DD)
• Order
• Numbers
• Version information
Ideally you should be able to tell what’s in a file before opening it.
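As an illustration of the elements listed above, here is a minimal Python sketch that assembles a file name from agreed parts. The element order, the underscore separator, the two-digit version number and the function name build_filename are assumptions chosen for this example, not a convention prescribed by the workshop.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Assemble a name as project_description_YYYY-MM-DD_vNN.ext.

    The order, separator and zero-padded version are illustrative choices.
    """
    def clean(text: str) -> str:
        # Lower-case and replace runs of non-alphanumerics with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{clean(project)}_{clean(description)}_"
            f"{created.isoformat()}_v{version:02d}.{extension.lstrip('.')}")


# Hypothetical project and file description.
print(build_filename("Field study", "Interview 03", date(2014, 10, 29), 2, ".txt"))
# prints: field-study_interview-03_2014-10-29_v02.txt
```

Putting the date in YYYY-MM-DD form and zero-padding the version means that an alphabetical sort of the folder is also a chronological one.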
File formats
• Unencrypted
• Uncompressed
• Non-proprietary, not patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
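A quick way to apply the recommendations above is to flag files by extension before depositing them. The sketch below is only an illustration: the extension sets are an assumed mapping from the format names on the slide to common file extensions, and check_format is a hypothetical helper name. Containers such as MP4 are left out because the extension alone does not reveal the codec.

```python
from pathlib import PurePath

# Assumed mapping from the slide's format names to common file extensions.
RECOMMENDED = {".csv", ".tsv", ".txt", ".html", ".rtf", ".tif", ".tiff",
               ".jp2", ".png", ".xml", ".rdf", ".ogg", ".flac"}
AVOID = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def check_format(filename: str) -> str:
    """Return 'recommended', 'avoid' or 'unknown' for a file name."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid"
    return "unknown"


# Hypothetical file names.
for name in ["survey_2014-10-29_v01.csv", "notes.docx", "scan.TIF", "model.bin"]:
    print(f"{name}: {check_format(name)}")
```

Files reported as "unknown" are not necessarily unsuitable; they simply need checking by hand against the data centre's own guidance.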
File formats
“Design outputs requiring minimal data download to see and use…”
http://www.nationmaster.com/country-info/stats/Media/Internet/International-Internet-bandwidth/Mbps
DfID Open and Enhanced Access Policy
For more information:
Web design guidelines for low bandwidth: http://www.aptivate.org/webguidelines/Home.html
Publishers for Development bandwidth challenge: http://www.pubs-for-dev.info/bandwidthchallenge/
Metadata
• Metadata is additional information that is required
to make sense of your files – it’s data about data.
• This is not a new idea: consider your music or film
collection.
• Think: title, authors, release date, producers,
directors, etc.
• Maybe the artwork, the studio, or format
Image by Wilfried Joh: https://www.flickr.com/photos/wilfriedjoh/11494134233 (CC BY-NC-ND 2.0)
Metadata (2)
Consider:
• What contextual details are needed?
– e.g. a description of the capture methods and data
analysis.
• How will you capture additional information?
– e.g. in papers, in a database, in a ‘readme’ text file, in
file properties/headers.
• Which standards will you use and why?
– Data centre recommendations for metadata, controlled
vocabularies, and required documentation.
Metadata (3)
What contextual details are needed?
• Who is in this picture?
• When was it taken?
• Where are they?
• Who took this photo?
• How was this picture taken?
Metadata (4)
How will you capture additional information?
• If your data were separated from a related publication,
would it make sense?
• If you have a results table or database, ensure that
metadata is provided for each column and/or row
• Record instructions for use for any software developed
• Your images need to have the required properties: these
can be attached automatically, or you can add more
information manually
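One way to provide metadata for each column of a results table is a small data dictionary kept alongside the file. The following Python sketch builds one from a table's header row; the column names, the descriptions and the function name data_dictionary are invented for illustration, and in practice the table would be read from a file rather than a string.

```python
import csv
import io

# Hypothetical results table; in practice this would be read from a file.
results_csv = "participant_id,interview_date,duration_min\nP001,2014-10-29,42\n"

# Hypothetical descriptions supplied by the researcher.
descriptions = {
    "participant_id": "Pseudonymous participant code",
    "interview_date": "Date of interview (YYYY-MM-DD)",
}


def data_dictionary(csv_text: str, known: dict) -> str:
    """Build a two-column CSV (column, description) from a table's header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column", "description"])
    for column in header:
        # Flag undocumented columns rather than silently leaving them blank.
        writer.writerow([column, known.get(column, "TODO: describe this column")])
    return out.getvalue()


print(data_dictionary(results_csv, descriptions))
```

Any column without a description is marked with a TODO line, which makes gaps in the documentation visible before the data are shared.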
Metadata (5)
Which standards will you use and why?
Many data centres recommend particular metadata for the
formats that they support.
This may be controlled vocabularies or required
documentation.
• Are you required to deposit in a particular data centre?
For more information: Digital Curation Centre Guide to Disciplinary
Metadata Standards (http://www.dcc.ac.uk/resources/metadata-standards)
Storage and Security
Where to store your data:
• Networked drive
• Personal computers or
laptops
• External portable storage
(USB memory sticks, hard
drives, CDs)
• Cloud storage (e.g.
Dropbox)
Networked drive
• The best place to store your data while you are working on it.
• IT managed Antivirus Software on desktop computers
• IT managed vulnerability management program
• IT managed email filtering solution
• IT managed network protection technologies (e.g. Firewall)
• Secure Wireless Access Points
• Encryption solutions to protect sensitive University data
Personal computers or laptops
• convenient for storing your data temporarily
• should not be used for storing master copies of your data
• local drives may fail or PCs and laptops may be lost or stolen, leading to an inevitable loss of your data
• Ensure you have a secure password
External portable storage
• longevity is not guaranteed, especially if they are not stored correctly
• errors with writing to CDs and DVDs are common
• may not be big enough for all the research data
• Ensure that sensitive data is encrypted
Cloud storage
• the provider may go out of business
• the data may be stored outside of the UK (or the EU)
• secure destruction of data is difficult to ensure
• what is the provider’s policy in the case of a security breach?
• Ensure that you are aware of the provider’s policies
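Because local drives and portable media can fail or be written with errors, it can help to check that a back-up copy still matches the master copy. The sketch below compares SHA-256 checksums using only the Python standard library. The file names are placeholders and the helper names sha256_of and copies_match are hypothetical; a temporary directory is used so that the example runs anywhere.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(master: Path, backup: Path) -> bool:
    """True if the back-up is byte-for-byte identical to the master copy."""
    return sha256_of(master) == sha256_of(backup)


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master.csv"
    backup = Path(tmp) / "backup.csv"
    master.write_text("id,value\n1,3.5\n", encoding="utf-8")
    shutil.copyfile(master, backup)
    print(copies_match(master, backup))  # identical copy
    backup.write_text("id,value\n1,3.6\n", encoding="utf-8")
    print(copies_match(master, backup))  # altered copy
```

Recording the checksum of each master file when it is created gives a reference value that later copies can be checked against.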
Security: sensitive data
Security: sensitive data (2)
Managing sensitive data
• If possible, collect the necessary data without using
personally identifying information
• De-identify your data upon collection or as soon as
possible thereafter
• Avoid transmitting unencrypted personal data
electronically
• Consider whether you need to keep original collection
instruments (recordings, surveys etc.) once they have
been transcribed and quality assured
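De-identifying data on collection can be as simple as replacing direct identifiers with stable pseudonyms. The following is a minimal sketch under stated assumptions: it uses a keyed hash (HMAC-SHA256) from the Python standard library, and the key, the field names, the example record and the helper names pseudonym and deidentify are all invented for illustration. Replacing names does not by itself remove indirect identifiers, so this is a starting point rather than full anonymisation.

```python
import hashlib
import hmac

# Assumption: a secret key stored separately from the data (placeholder value).
SECRET_KEY = b"replace-with-a-secret-kept-off-the-shared-drive"


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    normalised = identifier.strip().lower().encode("utf-8")
    return "P-" + hmac.new(key, normalised, hashlib.sha256).hexdigest()[:10]


def deidentify(record: dict, id_field: str, drop_fields: tuple) -> dict:
    """Replace the identifying field and drop other direct identifiers."""
    cleaned = {k: v for k, v in record.items()
               if k != id_field and k not in drop_fields}
    cleaned["participant"] = pseudonym(record[id_field])
    return cleaned


# Invented example record.
record = {"name": "Jane Example", "email": "jane@example.org", "answer": "Yes"}
print(deidentify(record, id_field="name", drop_fields=("email",)))
```

Because the same identifier always maps to the same pseudonym, responses from one participant can still be linked across files without storing the name itself.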
Storage and Security: Discussion
Planning for data
DMPs are often submitted with grant applications,
but are useful whenever you are creating data to:
• Make informed decisions to anticipate
and avoid problems
• Avoid duplication, data loss and
security breaches
• Develop procedures early on for
consistency
• Ensure data are accurate, complete,
reliable and secure
• Save time and effort – make your life
easier!
Photo by CalsidyRose: https://www.flickr.com/photos/calsidyrose/3552473207 (CC BY-NC-SA 2.0)
Activity
Think about your own
research.
What actions would you
need to perform on your
data at each stage of the
UKDA’s Lifecycle model?
How would you do this?
Would you need any
additional funding/staff?
Which funders require a DMP?
Note: Data Management Plans are a requirement of
Horizon 2020 projects included in the Research Data pilot
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
What do research funders want?
• A brief plan submitted in grant applications
• 1-3 sides of A4 as attachment or a section in Je-S
form
• Typically a prose statement covering suggested
themes
• An outline of data management and sharing plans,
justifying decisions and any limitations
ESRC Data Management Plans
The Data Management Plan is an integral part of the grant application.
• Analysis of existing data sources
• Information on the data that will be produced:
• data volume, data type, data quality, formats, standards
documentation and metadata
• planned quality assurance and back-up procedures
• plans for management and archiving of collected data
• expected difficulties in data sharing, along with causes and
possible measures to overcome these difficulties
• consent, confidentiality, anonymisation and other ethical matters
• copyright and intellectual property ownership of the data
• responsibilities within research teams at all participating institutions.
For more information:
ESRC Research Data Policy: http://www.esrc.ac.uk/_images/Research_Data_Policy_2010_tcm8-4595.pdf
RDM intranet pages on data management planning (include OU examples):
http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management/creating-your-data/datamanagement-plans-0
DfID Access and Data Management Plans
• Outlines the researchers’ strategy for maximising
opportunities to make research outputs openly accessible
• Where appropriate, the plan will be assessed as part of the
award process.
• Plans usually also required when competitive tendering is
not used
• May be further developed during the inception phase and
revisited and revised during the course of the project or at
annual review as required.
For more information:
DfID Research Open and Enhanced Access Policy:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/181176/DFIDResearch-Open-andEnhanced-Access-Policy.pdf
RDM intranet pages on data management planning (include OU examples):
http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management/creating-your-data/datamanagement-plans-0
DMPOnline
A web-based tool to help you
write DMPs according to
different requirements, with DCC,
funder and OU guidance.
https://dmponline.dcc.ac.uk
Tips
• Keep it simple, short and specific
• Seek advice - consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Justify any resources or restrictions needed
Put your plan into action…
– Ensure consistency
– Improve efficiency
– Maintain ethical practice
– Avoid security breaches and data loss
– Make the most of your data
Example of good practice
• Katiba project (Arts faculty)
• Team of 5, based in UK and Kenya
• Data includes interview transcripts and recordings, photographs and media clippings
Example of good practice (2)
RDM handbook written by
PI and RA expands on DMP:
• Responsibilities
• File naming, metadata,
quality control, questionnaire
design, storage and back-up
procedures
• Specific challenges of
working in Kenya
• Links to useful resources
Activity: documenting current procedure
In groups discuss the
areas of concern on
the matrix.
• What are your current
procedures?
• Can these be
improved? How?
• Are there any barriers
to improving current
practice?
Useful links
• The OU Research Data Management intranet site:
http://intranet6.open.ac.uk/library/main/supporting-ou-research/research-data-management
• Digital Curation Centre: http://www.dcc.ac.uk/
• DMPOnline: https://dmponline.dcc.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• MANTRA: http://datalib.edina.ac.uk/mantra/
• The Orb: http://open.ac.uk/blogs/the_orb
Questions?
Isabel Chadwick
Research Data Management Librarian
RDM-project@open.ac.uk
Photo credits
Janneke Staaks, Research Data Management
https://www.flickr.com/photos/jannekestaaks/14390184414
Ian “Harry” Harris, Order! Order!
https://www.flickr.com/photos/harryharris/300782460
Katy Fentress, Wanted Honest Leaders
https://www.flickr.com/photos/lakatyusha/7588956704/
Climate Change, Agriculture and Food Security, Workshop in Lushoto, Tanzania
https://www.flickr.com/photos/cgiarclimate/8550330905/
Climate Change, Agriculture and Food Security, East Africa Strategic Futures Workshop
https://www.flickr.com/photos/cgiarclimate/7985252532
Brian Wolfe, “Good teacher” “Good student”
https://www.flickr.com/photos/mightyboybrian/6389271595
Global Partnership for Education, A teacher shows a cell phone to her students, Chennai India
https://www.flickr.com/photos/gpforeducation/8644408460
DFID, Education for all
https://www.flickr.com/photos/dfid/3860978139
Climate Change, Agriculture and Food Security, Assessing how Indian farmers manage climate and weather risks in India
https://www.flickr.com/photos/cgiarclimate/8000068204
Afromusing, IMG_1502.JPG
https://www.flickr.com/photos/afropicmusing/2142907771/