Amanda Whitmire Maura Valentino OSU Libraries OPP Workshop Series

5 December 2012
Why is a Librarian asking?
We are curious.
We manage information.
Data are a kind of
information.
TAKING CARE OF
YOUR DATA
What’s your plan?
GOAL:
Achievable habits for implementing
data management best practices into
your workflow
Research data is:
“…the recorded factual material
commonly accepted in the
scientific community as
necessary to
validate research findings.”
U.S. Office of Management and Budget, Circular A-110
Data curation is:
“…management activities required
to maintain research data long-term
such that it is available for
reuse and preservation.”
Wikipedia
CURATION ≠ ARCHIVAL
“It is obvious that
making data widely
available is an
essential element of
scientific research.”
Science editorial, “Making
Data Maximally Available,”
11 Feb 2011
The case for data
management
stewardship
curation
etc.
$
Common missteps
“Why can’t I open this WordPerfect document?”
“I think those data are on a ZipDisk
somewhere…”
“Oh, that dataset is on our group server…”
“I never actually gave my advisor the final
dataset…”
“My laptop got stolen, so I lost the data…”
“It was so long ago, I can’t remember …”
Research data lifecycle
[Figure: research cycle diagram with stages: new research question posed; research planning & design; data collection & description; data processing & analysis; dissemination & publication of findings; data archiving; accessible data located; data transformed / repurposed]
How can we help?
[Figure: the research cycle diagram from the previous slide, repeated]
Where to start?
Make a plan. Consider:
How much data?
Resources needed
Roles & responsibilities
Metadata
Data formats
Data storage
Ethics & consent
Copyright (open data)
Sharing
Data storage & curation
Anticipate:
Volume/File type(s)
Raw data vs. processed/analyzed
data
File Naming Conventions
Privacy Concerns
Storage practice
Backup plans (LOCKSS,
checksums)
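A checksum is a short fingerprint of a file's contents; if the fingerprint of a backup copy matches the original's, the copy is intact. The following is a minimal sketch in Python, assuming SHA-256 as the checksum (the slides do not name an algorithm) and using hypothetical function names:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original: Path, backup: Path) -> bool:
    """True when the backup has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the digest alongside each file when it is first stored makes it possible to re-check every copy later and spot silent corruption.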
File naming conventions
1. Be consistent
• Have conventions for naming: (1) Directory structure (2) Folder names (3) File names
• Always include the same information (e.g. date and time)
• Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)
2. Be descriptive
• Try to keep file and folder names under 32 characters
example: Project_instrument_location_YYYYMMDDhhmmss_extra.ext
SG157_20100426_001.raw (raw data)
SG157_20100426_001.mat (working data)
ESPOMZ_SG157_20100426_001.txt (shareable)
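One way to stay consistent is to let a small script build the names instead of typing them. This is a sketch only: the function names and the location value `siteA` are hypothetical, and the pattern is the example template from the slide.

```python
from datetime import datetime


def build_file_name(project, instrument, location, when, extra, ext):
    """Join the parts in a fixed order with a sortable YYYYMMDDhhmmss stamp."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{project}_{instrument}_{location}_{stamp}_{extra}.{ext}"


def fits_length_guideline(name, limit=32):
    """Check the 'under 32 characters' guideline from the slide."""
    return len(name) < limit


# Hypothetical values; only ESPOMZ and SG157 appear in the slide examples.
name = build_file_name("ESPOMZ", "SG157", "siteA",
                       datetime(2010, 4, 26, 13, 5, 0), "001", "txt")
print(name, fits_length_guideline(name))
```

The full template can easily exceed the 32-character guideline, so some parts may need to be abbreviated or dropped, as in the shorter slide examples. Year-first dates also sort chronologically as plain text: `20091231` sorts before `20100426`, whereas the MMDDYYYY forms `12312009` and `04262010` sort in the wrong order.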
Legal and ethical considerations
Intellectual property
• Office for Commercialization & Corporate Development (OCCD)
• Copyright
Licensing
Charging for data?
Data attribution & citation
Human subjects?
• Informed consent & anonymization prior to publishing
Resources @ OSU:
• Office of Research Integrity, Institutional Review
Board (IRB)
• Responsible Conduct of Research (RCR) Program
Archiving and preservation
Policies
Preservation options
Types of repositories
Costs and benefits
A word about backups…
University of Southampton
School of Electronics & Computer Science
Southampton, UK, 2005
Metadata
“The metadata accompanying your
data should be written for a user 20
years into the future -- what does that
person need to know to use your data
properly? Prepare the metadata for a
user who is unfamiliar with your
project, methods, or observations.”
Oak Ridge National Laboratory Distributed Active
Archive Center for Biogeochemical Dynamics
(ORNL DAAC)
What is Metadata?
Metadata is “data about data”
WHO created the data?
WHAT is the content of the data?
WHEN were the data created?
WHERE is it geographically?
HOW were the data developed?
WHY were the data developed?
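The six questions can double as a checklist. The sketch below assumes a metadata record is kept as a simple dictionary keyed by the question words; the field names, the function name, and the example values are all hypothetical.

```python
QUESTIONS = ("who", "what", "when", "where", "how", "why")


def unanswered(record):
    """Return the questions that the record leaves blank or missing."""
    return [q for q in QUESTIONS if not str(record.get(q, "")).strip()]


# Hypothetical, partly filled record for illustration only.
record = {
    "who": "A. Researcher",
    "what": "Air temperature readings",
    "when": "2010-02-12",
}
print(unanswered(record))  # ['where', 'how', 'why']
```

Running a check like this before sharing a dataset shows which parts of the description still need to be written.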
Metadata schemes
“Metadata schemes are like toothbrushes – everybody
agrees that you should use one, but nobody wants to use
someone else’s.”
You already use metadata…
-23
87
48
Metadata in use
State | City | Location | Date | Time | Temperature (F)
Alaska | Anchorage | City Hall | 2/12/2010 | 1400 | -23
Florida | Miami | Weather Center | 2/12/2010 | 1400 | 87
New York | New York | Empire State Building | 2/12/2010 | 1400 | 48
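The same point can be shown in code: a bare value such as -23 only becomes usable once it travels with its column names. The sketch below assumes the table is saved as CSV text with the header row shown on the slide; the function names are hypothetical.

```python
import csv
import io

# The slide's table as CSV text; the header row carries the metadata.
TABLE = """State,City,Location,Date,Time,Temperature (F)
Alaska,Anchorage,City Hall,2/12/2010,1400,-23
Florida,Miami,Weather Center,2/12/2010,1400,87
New York,New York,Empire State Building,2/12/2010,1400,48
"""


def load_readings(text):
    """Parse CSV text into one dictionary per reading, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(row):
    """Turn one reading into a sentence that no longer needs the table."""
    return (f"{row['Temperature (F)']} F at {row['Location']}, "
            f"{row['City']}, {row['State']} on {row['Date']} at {row['Time']}")


for reading in load_readings(TABLE):
    print(describe(reading))
```

Keeping the unit in the column name, as the slide does with "Temperature (F)", means the unit survives whenever the file is copied or shared.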
Metadata in real life
You use it all the time…
Major metadata standards
Darwin Core | biological diversity, taxonomy
Dublin Core | general
DDI (Data Documentation Initiative) | social and behavioral sciences data
DIF (Directory Interchange Format) | environmental sciences
EML (Ecological Metadata Language) | ecology
FGDC/CSDGM (Federal Geographic Data Committee/Content Standard for Digital Geospatial Metadata) | geographic data
NBII (National Biological Information Infrastructure) |
http://sbc.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-sbc.10
Metadata activity!
Take it away,
Maura…
Let’s Describe this Dataset
Bright orange Garibaldi fish
Hypsypops rubicundus
California, USA
Ornate Butterfly fish
Chaetodon ornatissimus
Indo-Pacific
Scenario 1
Research for preschoolers to see
if they learn colors and patterns
better from real life examples
Scenario 2
Research on what fish are local to
a particular area. The photos are
the data
Scenario 3
Research into specific details of
specific types of fish
File/Folder Organization
You have monitors attached to 18 athletes (6
tennis players, 6 golfers, 6 rowers) for 7
days. Each day you get 2 readouts for each
athlete, 1 for heart rate and 1 for body
temperature. You transfer the data to Excel.
Name and organize the files for this
experiment.
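One possible answer, sketched in Python. Everything here is an assumption made for illustration: the folder layout (one folder per sport, then one per athlete), the three-letter sport codes, the HR/BT abbreviations, the `athlete_study` root folder, the `.xlsx` extension, and the start date.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

SPORTS = {"tennis": "TEN", "golf": "GLF", "rowing": "ROW"}
MEASURES = {"heart rate": "HR", "body temperature": "BT"}


def plan_files(start, days=7, athletes_per_sport=6):
    """List one path per athlete, day, and measurement."""
    paths = []
    for sport, code in SPORTS.items():
        for number in range(1, athletes_per_sport + 1):
            athlete = f"{code}{number:02d}"  # e.g. TEN01
            for offset in range(days):
                stamp = (start + timedelta(days=offset)).strftime("%Y%m%d")
                for abbrev in MEASURES.values():
                    name = f"{athlete}_{abbrev}_{stamp}.xlsx"
                    paths.append(
                        PurePosixPath("athlete_study", sport, athlete, name))
    return paths


files = plan_files(date(2012, 12, 5))
print(len(files))  # 18 athletes x 7 days x 2 readouts = 252
print(files[0])    # athlete_study/tennis/TEN01/TEN01_HR_20121205.xlsx
```

Each file name repeats the athlete code, so a file stays identifiable even when it is moved out of its folder.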
Think about your own data
– What types of data need to be described?
– What are the relationships between them?
– What descriptive metadata can you find?
– What metadata is being captured automatically?
– What other descriptive metadata do you need to help users find
your data?
– What metadata do you need to help other scientists reproduce
your data or use it for comparison?
– What events have the data undergone, or will they undergo?
– For how long do you want to retain the data?
– How intensive are your preservation needs?
– How diverse is your user base? Does this influence your
preservation needs?
Data Management Plans
The types of data
Data & metadata standards | format and
content
Policies for access and sharing
Policies and provisions for re-use
Plans for archiving data
{Budget}
Use available resources
https://dmp.cdlib.org/
http://www.dataone.org/data-management-planning
Contact information
Amanda Whitmire | Data Management Specialist
amanda.whitmire@oregonstate.edu
Maura Valentino | Metadata Librarian
maura.valentino@oregonstate.edu
fin