Data Management: Documentation & Metadata
Types of Documentation

[Figure: Data Life Cycle diagram. Labels: Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Data Archive, Deposit, Data Discovery, Re-Use, Re-Purpose]

Data Documentation (Metadata)
• Informal or formal methods to describe your data
• Important if you want to reuse your own data in the future
• Also necessary when sharing your data

You're already documenting your data
• Notebook
  – Paper
  – Digital
  – Lab
• Folders with notes, text files
• Sources, experiments or surveys, procedures, etc.

Documentation in Research

Project Documentation
• Context of data collection
• Data collection methods
• Structure and organization of data files
• Data sources used
• Data validation and quality assurance
• Transformations of data from the raw data through analysis
• Information on confidentiality, access and use conditions

Dataset Documentation
• Variable names and descriptions
• Explanation of codes and schemas used
• Algorithms used to transform data
• File format and software (including version) used

Types of Documentation
Documentation for understanding & re-use
• ReadMe file
• Data dictionary
• Codebook

ReadMe
• Provides the core documentation for an investigation and its data files
• Typically a simple text file
• Can describe the individual file(s) and/or the data package as a whole

ReadMe Example - Dataset
[Figure: example ReadMe file for a dataset]

Data Dictionary
• Provides definitions of the data fields in a data file
• Gives more detail on the variables and observations in a file
• Used to understand the data and the databases that contain it
• Identifies data elements and their attributes, including names, definitions, units of measure and other information
• Often organized as a table

Data Dictionary Example
[Figure: example data dictionary]

What is a Codebook?
• Typical in social sciences research
• Includes elements similar to a ReadMe and a data dictionary
  – Project-level information (e.g. survey design and methodology)
  – Response codes for each variable
  – Codes used to indicate nonresponse and missing data
• Additionally, codebooks may contain:
  – A copy of the survey questionnaire (if applicable)
  – Exact questions and skip patterns used in a survey
  – Frequencies of response
• Quite long!
http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook

Other Examples of Data Documentation
• Lab notebooks
• Software syntax
• Programming code
• Instrument settings and/or calibration
• Provenance of sources of data
• Embedded metadata (e.g. EXIF, FITS)
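
Sketch: starting a data dictionary from a data file
The data dictionary described earlier is often kept as a table with one row per variable. The following is a minimal sketch, not a standard tool, of how such a table could be started in Python. The sample data, the column names (site_id, temp_c, sample_date), the -999 missing-value code and the helper names build_data_dictionary and write_data_dictionary are all hypothetical and chosen only for illustration. Names, an example value and a count of distinct values can be read from the file, but definitions and units still have to be written by hand.

```python
import csv
import io

# Hypothetical sample data; column names and values are invented for illustration.
SAMPLE_CSV = """site_id,temp_c,sample_date
A1,12.5,2020-03-01
A2,-999,2020-03-02
B1,14.1,2020-03-02
"""

# Hand-written notes: definitions and units cannot be inferred from the file.
FIELD_NOTES = {
    "site_id": {"definition": "Sampling site identifier", "units": ""},
    "temp_c": {
        "definition": "Water temperature; -999 marks a missing reading",
        "units": "degrees Celsius",
    },
    "sample_date": {"definition": "Date the sample was taken (YYYY-MM-DD)", "units": ""},
}

COLUMNS = ["name", "definition", "units", "example", "distinct_values"]


def build_data_dictionary(csv_text, notes):
    """Return one dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        note = notes.get(name, {})
        entries.append({
            "name": name,
            # Flag columns nobody has described yet instead of guessing.
            "definition": note.get("definition", "UNDOCUMENTED"),
            "units": note.get("units", ""),
            "example": values[0] if values else "",
            "distinct_values": len(set(values)),
        })
    return entries


def write_data_dictionary(entries):
    """Render the entries as CSV text, i.e. the dictionary organized as a table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return out.getvalue()


entries = build_data_dictionary(SAMPLE_CSV, FIELD_NOTES)
print(write_data_dictionary(entries))
```

Any column without a hand-written note is marked UNDOCUMENTED, which makes gaps in the documentation visible before the data are shared or deposited.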