Documentation and Metadata - University of Virginia Library

Data Management:
Documentation & Metadata
Types of Documentation
[Figure: Data Life Cycle. Stages shown: Proposal Planning/Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Deposit, Data Archive, Data Discovery, Re-Use, Re-Purpose]
Data Documentation (Metadata)
• Informal or formal methods to describe your data
• Important if you want to reuse your own data in the future
• Also necessary when sharing your data
You’re already documenting your data
• Notebook
– Paper
– Digital
– Lab
• Folders with notes, text files
• Sources, experiments or surveys, procedures, etc.
Documentation in Research
Project Documentation
• Context of data collection
• Data collection methods
• Structure, organization of data files
• Data sources used
• Data validation, quality assurance
• Transformations of data from the raw data through analysis
Dataset Documentation
• Information on confidentiality, access and use conditions
• Variable names and descriptions
• Explanation of codes and schemas used
• Algorithms used to transform data
• File format and software (including version) used
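One item above, the file format and software (including version) used, can be captured automatically at the time of analysis. The following is a minimal sketch in Python, assuming the analysis itself runs in Python; the function name `describe_environment` and the record layout are hypothetical, not part of any standard.

```python
import json
import platform


def describe_environment(data_file_format, extra_software=None):
    """Build a small record of the file format and software versions in use.

    `extra_software` is an optional mapping of tool name to version string,
    supplied by hand for software that Python cannot inspect.
    """
    return {
        "file_format": data_file_format,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "software": dict(extra_software or {}),
    }


# Hypothetical values for illustration only.
record = describe_environment("CSV", {"ExampleStatsTool": "1.0"})

# Store the record as JSON text alongside the dataset documentation.
record_json = json.dumps(record, indent=2)
print(record_json)
```

Saving this record with each analysis run keeps the software information tied to the files it produced.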
Types of Documentation
Documentation for understanding & re-use
• Readme File
• Data Dictionary
• Codebook
ReadMe
• Provides the core documentation about an investigation and its data files
• Typically a simple text file
• Can describe the individual file(s) and/or the data package as a whole
ReadMe Example - Dataset
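As a minimal sketch of what a dataset-level readme might contain, the Python below assembles one as a simple text file. The helper `build_readme`, the field layout, and every title, name, and file listed are hypothetical placeholders, not a required template.

```python
from pathlib import Path
import tempfile


def build_readme(title, investigator, description, files):
    """Return plain-text readme content for a data package.

    `files` maps each file name to a one-line description.
    """
    lines = [
        f"Title: {title}",
        f"Investigator: {investigator}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for name, summary in sorted(files.items()):
        lines.append(f"  {name}: {summary}")
    return "\n".join(lines) + "\n"


# All values below are hypothetical.
readme_text = build_readme(
    title="Example stream temperature survey",
    investigator="A. Researcher",
    description="Hourly water temperature readings from three sites.",
    files={
        "readings.csv": "One row per site per hour",
        "sites.csv": "Site identifiers and coordinates",
    },
)

# Write the readme next to the data files (a temporary folder here).
out_dir = Path(tempfile.mkdtemp())
(out_dir / "README.txt").write_text(readme_text, encoding="utf-8")
print(readme_text)
```

The same function can describe a single file or a whole package, depending on what is passed in `files`.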
Data Dictionary
• Provides definitions of the data fields in a data file
• Gives more detail on the variables and observations in a file
• Used to understand the data and the databases that contain it
• Identifies data elements and their attributes, including names, definitions, units of measure and other information
• Often organized as a table
Data Dictionary Example
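A data dictionary organized as a table can be started from the data file itself. The sketch below, in Python, reads the column names from a CSV and pairs them with definitions and units that the researcher supplies by hand. The sample data, the `FIELD_NOTES` mapping, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical data file contents.
SAMPLE_CSV = """site_id,temp_c,recorded_at
S1,12.5,2020-01-01T00:00
S2,11.9,2020-01-01T00:00
"""

# Definitions and units cannot be inferred from the file;
# the researcher writes them down (hypothetical values).
FIELD_NOTES = {
    "site_id": ("Identifier of the sampling site", ""),
    "temp_c": ("Water temperature", "degrees Celsius"),
    "recorded_at": ("Time of the reading", "ISO 8601 timestamp"),
}


def build_data_dictionary(csv_text, notes):
    """Return one entry per column: name, definition, units, example value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        definition, units = notes.get(name, ("", ""))
        example = rows[0][name] if rows else ""
        entries.append(
            {"name": name, "definition": definition,
             "units": units, "example": example}
        )
    return entries


def format_table(entries):
    """Render the entries as a tab-separated table with a header row."""
    header = ["name", "definition", "units", "example"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for entry in entries:
        writer.writerow([entry[h] for h in header])
    return out.getvalue()


entries = build_data_dictionary(SAMPLE_CSV, FIELD_NOTES)
print(format_table(entries))
```

Columns missing from `FIELD_NOTES` appear with blank definitions, which makes undocumented variables easy to spot.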
What is a Codebook?
• Typical in social science research
• Includes elements similar to a readme and a data dictionary
– Project level information (e.g. survey design and methodology)
– Response codes for each variable
– Codes used to indicate nonresponse and missing data
http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
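Response codes and missing-data codes can be kept in a machine-readable form so that raw values are decoded consistently. The following is a minimal sketch in Python; the variable name, question text, code values, and the `decode` helper are hypothetical and do not follow any particular codebook standard.

```python
# Hypothetical codebook entry for one survey variable.
CODEBOOK = {
    "q1_satisfaction": {
        "question": "How satisfied are you with the service?",
        "codes": {
            1: "Very dissatisfied",
            2: "Dissatisfied",
            3: "Neutral",
            4: "Satisfied",
            5: "Very satisfied",
        },
        # Codes reserved for nonresponse and missing data.
        "missing": {-8: "Don't know", -9: "No response"},
    }
}


def decode(variable, raw_value, codebook=CODEBOOK):
    """Return (label, is_missing) for a raw coded value."""
    entry = codebook[variable]
    if raw_value in entry["missing"]:
        return entry["missing"][raw_value], True
    if raw_value in entry["codes"]:
        return entry["codes"][raw_value], False
    raise ValueError(f"{raw_value!r} is not a documented code for {variable}")


print(decode("q1_satisfaction", 4))
print(decode("q1_satisfaction", -9))
```

Raising an error on undocumented values is a design choice in this sketch: it surfaces codes that the codebook has not yet explained.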
What is a Codebook?
• Codebooks may also contain:
– A copy of the survey questionnaire (if applicable)
– Exact questions and skip patterns used in a survey
– Frequencies of response
• Codebooks can be quite long!
http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
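Frequencies of response can be tabulated directly from the coded data. The sketch below uses Python's `collections.Counter`; the labels, the responses, and the function name `response_frequencies` are hypothetical.

```python
from collections import Counter


def response_frequencies(responses, labels):
    """Return (code, label, count, percent) rows for every documented code."""
    counts = Counter(responses)
    total = len(responses)
    table = []
    for code in sorted(labels):
        n = counts.get(code, 0)
        percent = 100.0 * n / total if total else 0.0
        table.append((code, labels[code], n, round(percent, 1)))
    return table


# Hypothetical codes and responses for a yes/no question.
LABELS = {1: "Yes", 2: "No", -9: "No response"}
RESPONSES = [1, 1, 2, -9, 1, 2, 1, 1]

for row in response_frequencies(RESPONSES, LABELS):
    print(row)
```

Codes that never occur still appear with a count of zero, so the table lists every documented response option.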
Other Examples of Data Documentation
• Lab notebooks
• Software syntax
• Programming code
• Instrument settings and/or calibration
• Provenance of sources of data
• Embedded metadata (e.g. EXIF, FITS)
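Provenance of a data source can be recorded with a checksum, so that a later reader can confirm the file is the one originally obtained. The following is a minimal sketch in Python using `hashlib`; the record fields, the function name `provenance_record`, and the sample file and source description are hypothetical.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def provenance_record(path, source):
    """Describe where a file came from and fingerprint its contents."""
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "source": source,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical source file created here only for the demonstration.
sample_bytes = b"site_id,temp_c\nS1,12.5\n"
sample_path = Path(tempfile.mkdtemp()) / "readings.csv"
sample_path.write_bytes(sample_bytes)

prov = provenance_record(sample_path, source="Downloaded from an example repository")
print(prov)
```

Recomputing the SHA-256 value later and comparing it with the stored one shows whether the file has changed since it was documented.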