USING DATA MANAGEMENT PLANS
as a RESEARCH TOOL
for IMPROVING DATA SERVICES
in ACADEMIC LIBRARIES
Amanda Whitmire
Patricia Hswe & Brian Westra
DLF Forum 2015
Vancouver, BC Canada
26-28 October 2015
Jake Carlson in absentia
Susan Wells Parham
D A R T Team
DART Project | @DMPResearch
Amanda Whitmire | @AWhitTwit
Jake Carlson | @jrcarlso
Patricia M. Hswe | @pmhswe
Susan Wells Parham
Lizzy Rolando
Brian Westra | @bdwestra
http://dmpresearch.library.oregonstate.edu
27 Oct 2015
D A R T Team
Acknowledgements
Amanda Whitmire | Oregon State University Libraries
Jake Carlson | University of Michigan Library
Patricia M. Hswe | Pennsylvania State University Libraries
Susan Wells Parham | Georgia Institute of Technology Library
Brian Westra | University of Oregon Libraries
This project was made possible in part by the
Institute of Museum and Library Services
grant number LG-07-13-0328.
Levels of data services
high level
mid-level
the basics
infrastructure
metadata
support
DMP review
data curation
facilitate deposit in DRs
consults
website
dedicated “research services”
workshops
workshops
From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). "Tiers of Research
Data Support Services." Journal of eScience Librarianship 1(1): Article 5.
http://dx.doi.org/10.7191/jeslib.2012.1002
Informed data services development
Survey
DCPs
DMPs
DART Premise
DMP
researcher
Research Data Management: knowledge, capabilities, practices, needs
Research Data Services
We need a tool
Solution: an analytic rubric
Performance Levels: Winning | Okay | No
Performance Criteria: Thing 1 | Thing 2 | Thing 3
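An analytic rubric of this shape can be treated as a grid: one row per performance criterion, one rating per row drawn from a fixed set of performance levels. The sketch below is illustrative only. It reuses the placeholder names from the slide, and the `score_plan` helper is a hypothetical name, not part of the DART rubric.

```python
from collections import Counter

# Placeholder levels and criteria taken from the slide.
LEVELS = ("Winning", "Okay", "No")
CRITERIA = ("Thing 1", "Thing 2", "Thing 3")

def score_plan(ratings):
    """Validate one plan's ratings and count criteria per performance level.

    `ratings` maps each criterion name to exactly one performance level.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    bad = {c: r for c, r in ratings.items() if r not in LEVELS}
    if bad:
        raise ValueError(f"unknown performance levels: {bad}")
    counts = Counter(ratings[c] for c in CRITERIA)
    # Report every level, including those with no criteria assigned.
    return {level: counts.get(level, 0) for level in LEVELS}

example = {"Thing 1": "Winning", "Thing 2": "Okay", "Thing 3": "Okay"}
print(score_plan(example))
```

Keeping the levels in a fixed tuple means every plan is summarized over the same categories, which makes results comparable across reviewers.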
NSF Directorate or Division
BIO | Biological Sciences
  DBI | Biological Infrastructure
  DEB | Environmental Biology
  EF | Emerging Frontiers Office
  IOS | Integrative Organismal Systems
  MCB | Molecular & Cellular Biosciences
CISE | Computer & Information Science & Engineering
  ACI | Advanced Cyberinfrastructure
  CCF | Computing & Communication Foundations
  CNS | Computer & Network Systems
  IIS | Information & Intelligent Systems
EHR | Education & Human Resources
  DGE | Division of Graduate Education
  DRL | Research on Learning in Formal & Informal Settings
  DUE | Undergraduate Education
  HRD | Human Resources Development
SBE | Social, Behavioral & Economic Sciences
  BCS | Behavioral & Cognitive Sciences
  SES | Social & Economic Sciences
  SMA | SBE Office of Multidisciplinary Activities
ENG | Engineering
  CBET | Chemical, Bioengineering, Environmental, & Transport Systems
  CMMI | Civil, Mechanical & Manufacturing Innovation
  ECCS | Electrical, Communications & Cyber Systems
  EEC | Engineering Education & Centers
  EFRI | Emerging Frontiers in Research & Innovation
  IIP | Industrial Innovation & Partnerships
GEO | Geosciences
  AGS | Atmospheric & Geospace Sciences
  EAR | Earth Sciences
  OCE | Ocean Sciences
  PLR | Polar Programs
MPS | Mathematical & Physical Sciences
  AST | Astronomical Sciences
  CHE | Chemistry
  DMR | Materials Research
  DMS | Mathematical Sciences
  PHY | Physics
Source | Guidance text
NSF guidelines | The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
BIO | Describe the data that will be collected, and the data and metadata formats and standards used.
CSE | The DMP should cover the following, as appropriate for the project: ...other types of information that would be maintained and shared regarding data, e.g. the means by which it was generated, detailed analytical and procedural information required to reproduce experimental results, and other metadata
ENG | Data formats and dissemination. The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata
GEO AGS | Data Format: Describe the format in which the data or products are stored (e.g. hardcopy
Project team testing & revisions
Feedback & iteration
Rubric
Advisory Board
Inter-rater reliability
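The slide does not say which agreement statistic the team used. As one common option for two reviewers assigning categorical rubric levels, Cohen's kappa compares observed agreement with the agreement expected by chance. The sketch below is a minimal illustration under that assumption. The function name and the sample ratings are hypothetical and are not project data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of four plans on one criterion.
a = ["complete", "complete", "incomplete", "incomplete"]
b = ["complete", "incomplete", "complete", "incomplete"]
print(cohens_kappa(a, a))  # identical ratings
print(cohens_kappa(a, b))  # agreement no better than chance
```

A kappa near 1 indicates strong agreement and a kappa near 0 indicates agreement no better than chance, which is why it is often preferred over raw percent agreement when some levels are much more common than others.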
Performance Level: Complete / detailed | Addressed issue, but incomplete | Did not address issue

General Assessment Criteria

Performance Criteria: Describes what types of data will be captured, created or collected
Complete / detailed: Clearly defines data type(s). E.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as: observational, experimental, simulation, model output or assimilation
Addressed issue, but incomplete: Some details about data types are included, but DMP is missing details or wouldn’t be well understood by someone outside of the project
Did not address issue: No details included, fails to adequately describe data types.
Directorates: All

Directorate- or division-specific assessment criteria

Performance Criteria: Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.)
Complete / detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced, makes assumptions about reviewer knowledge of methods or practices.
Did not address issue: Does not clearly address how data will be captured or created.
Directorates: GEO AGS, GEO EAR SGP, MPS AST

Performance Criteria: Identifies how much data (volume) will be produced
Complete / detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
Addressed issue, but incomplete: Amount of expected data (GB, TB, etc.) is vaguely specified.
Did not address issue: Amount of expected data (GB, TB, etc.) is NOT specified.
Directorates: GEO EAR SGP, GEO AGS
DMP text: “The results of this research will be presented at major biological science conferences, including the Ecological Society of America meeting and the annual Soil Ecology Society meeting, and published in peer-reviewed journals. All data and sample access will be handled according to NSF data-sharing policies. Samples of soil strata will be stored appropriately for future use or for lending to other institutions.”

Performance Criterion: Provides details on when the data will be made publicly available
Complete / detailed: Clearly specifies when the data will be made available to people outside of the project.
Addressed issue, but incomplete: Vaguely specifies that the data will be made available outside of the project but does not include a date or specific time frame.
Did not address issue: Does not specify when the data will be made available outside of the project.
DMP text: “The results of this research will be presented at major biological science conferences, including the Ecological Society of America meeting and the annual Soil Ecology Society meeting, and published in peer-reviewed journals. All data and sample access will be handled according to NSF data-sharing policies. Samples of soil strata will be stored appropriately for future use or for lending to other institutions.”

Performance Criterion: Describes how the data will be made publicly available
Complete / detailed: Includes specific details on the means by which the data will be made available.
Addressed issue, but incomplete: Includes vague/limited details on the means by which the data will be made available, or sharing details can be inferred because the plan indicates that data will be deposited with a repository or archive.
Did not address issue: Includes no details on the means by which the data will be made available.
General results
[Figure: Distribution of DMPs across directorates]
[Figure: DMP section (1 to 5); labels shown: n=436, n=105, n=111]
[Figure: Identifies a metadata standard]
[Figure: Sharing method]
Data types, metadata, data formats, reuse/redistribution/derivatives
BRIEF ANALYSIS OF BIO DMPS
DMPs from the BIO Directorate
45 BIO DMPs (< 10% of all reviewed DMPs)
– University of Oregon: 17
– Penn State: 10
– University of Michigan: 7
– Oregon State: 6
– Georgia Tech: 5
However, one DMP stated it would not be collecting any data. As a result, analysis was done with 44 DMPs.
Distribution across BIO divisions
BIO: Description of data types
BIO: Metadata standards? Which ones?
Rubric requirement: Does the plan mention a specific metadata standard? If yes, please describe.
BIO: Policies for Sharing and Public Access
BIO: Policies for reuse and redistribution
Looking across the data
SHARING AND ARCHIVING
BIO: How will they share the data?
BIO: How will they archive the data?
BIO: Thoughts/Looking ahead
• Connects & disconnects
• Cross-directorate & cross-institutional comparisons
• Implications for library services / librarianship
• Implications for NSF and the requirement
• Implications for curation infrastructure
• Compare with reviews of post-2013 BIO DMPs
A brief look into
SBE DMPs
Social, Behavioral and Economic Sciences (SBE) Directorate (n=50)
[Figure: number of SBE DMPs by division (SBE Office of Multidisciplinary Activities; Social & Economic Sciences; Behavioral & Cognitive Sciences) and by institution (GT, OSU, PSU, UMICH, UO)]
Data description – SBE Guidance
“Expected data. The DMP should describe the types of data, samples, physical collections, software, curriculum materials, or other materials to be produced in the course of the project.”
(SBE guidance: http://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf)
Describes types of data to be produced
[Figure: bar chart, axis 0 to 50; categories: Did not address, Addressed but incomplete, Complete/detailed]
Example: Data to be produced
• raw electroencephalogram (EEG)
• autonomic nervous system (ANS) physiology recordings
• behavioral data
• data from questionnaires and other assessments
Data will be stored in native formats, in secure spreadsheets (e.g., Excel), statistical data files (e.g., SPSS), and matrices (MATLAB).
Restrictions on sharing
“Factors that might impinge on their ability to manage data, e.g. legal and ethical restrictions on access to non-aggregated data.”
Describes protections for sensitive data
Complete (48%)
Incomplete (18%)
N/A (32%)
Did not address (2%)
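The slide gives percentages only. If they are taken over the 50 SBE DMPs (an assumption; the slide does not state the denominator), the implied number of plans in each category can be recovered with a short calculation. The names `SHARES`, `ASSUMED_TOTAL`, and `implied_counts` below are hypothetical.

```python
# Percentages as shown on the slide.
SHARES = {"Complete": 48, "Incomplete": 18, "N/A": 32, "Did not address": 2}

# Assumption: the denominator is the n=50 SBE DMPs reported earlier.
ASSUMED_TOTAL = 50

def implied_counts(shares, total):
    """Convert whole-number percentages into plan counts for a given total."""
    if sum(shares.values()) != 100:
        raise ValueError("percentages must sum to 100")
    counts = {}
    for label, pct in shares.items():
        count, remainder = divmod(pct * total, 100)
        if remainder:
            # A fractional plan suggests rounding or a different denominator.
            raise ValueError(f"{label}: {pct}% of {total} is not a whole number")
        counts[label] = count
    return counts

print(implied_counts(SHARES, ASSUMED_TOTAL))
```

Under that assumption every category maps to a whole number of plans, which is at least consistent with a denominator of 50.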
Describes protections for sensitive data
“No research material that includes personally identifiable information will be re-used or re-distributed publicly without specific written consent from participants.”
SBE: Data Sharing
[Figure: chart of data sharing in SBE DMPs; category labels not recoverable. Values shown: 42%, 26%, 20%, 22%, 18%, 12%, 10%, 8%, 2%, 2%]
Named data centers and repositories
• Archbishopric Archive of Lima (AAL), Peruvian National Archive (AGN).
• Chandel Endangered Languages Committee archive
• CUAHSI-Consortium of Universities for the Advancement of Hydrologic Science, Inc
• ICPSR (4)
• Intensively Monitored Watershed server
• International Tree-Ring Data Bank
• Laboratory of Linguistic and Anthropological Documentation and Research in Argentina.
• National Science Digital Library web site
• NBER's data website
• Paleomagnetic Database Portal (MAGICPMAG)
• PANGAEA, NOAA Paleoclimatology archive, CUAHSI
• Online Repository of African American Language Corpora (ORAAL)
Named data centers and repositories
What can this tell us?
Similar to other elements of the DMP,
it may provide insight into:
• Intent
• Knowledge
• Previous or current practice
A take-home lesson
funder guidance + requirements + personal practices + domain practices + intentions = DMP
Summing up
DMP text: “The data that we generate will be digital (video and audio recordings, plus transcriptions). We have made arrangements to have the 40 hours of audio and video recordings, corresponding transcriptions and educational materials archived at the Endangered Languages Archive. Archived materials will be accessible to the public in accordance to access restrictions specified by each speaker.”

Performance Criterion: Indicates whether or not the data will be archived
Complete / detailed: Clearly indicates whether or not data will be archived, including digital data and physical samples where applicable.
Addressed issue, but incomplete: Generally describes intent to preserve some aspects of data, but lacks clarity on portions of the dataset. E.g., indicates that digital or physical data will be archived but isn't explicit about both.
Did not address issue: Makes no mention of intent to archive or preserve digital or physical data.