Life Cycle Models & Principles
Jake Carlson
Associate Professor of Library Science
Data Services Specialist
Purdue University Libraries

What will be Covered
• An introduction to terms and concepts relating to data lifecycles.
• An understanding of the purpose of lifecycle models.
• Coverage of some life cycle models and principles, and how they may relate to each other.
• An introduction to ICPSR’s lifecycle model, as a loose framework for this workshop.

Data Science
Loukides, M. (2011) What is Data Science? http://radar.oreilly.com/2010/06/what-is-data-science.html
• “Data science enables the creation of data products.”
• “We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”

Data Curation
• “…the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities.” – UIUC GSLIS, http://cirss.lis.illinois.edu/CollMeta/dcep.html
• “…the value-added activities and features that stewards of content engage in to make the content useful.” – Nancy McGovern, ICPSR

What is a Lifecycle?
The continuous sequence of changes undergone by an organism from one primary form, as a gamete, to the development of the same form again. http://www.dictionary.com
Graphic: http://insected.arizona.edu/manduca/Mand_cycle.html

Data Lifecycles
Primer on Data Management: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

Why Use Life Cycle Models?
• Help define and explain complex processes (graphically).
• Help identify important components, roles, responsibilities, milestones, etc.
• Demonstrate connections and relationships between parts and the whole.
• Provide a framework to develop services and support.
Limitations of Lifecycle Models
• “All models are wrong, but some are useful.” George E.P. Box, Statistician, 1976
  – Models generally reflect the interests, perspectives (and biases) of the agencies that created them.
  – Models mask complexity.
  – Models tend to overlook heterogeneity / diversity.
  – Models are often presented as orderly and linear.
  – Models depict the ideal.

Aspects of Lifecycle Models
• Subject Based
  – Scholarly Communication
  – Research
  – Data
  – Curation
• Source Based
  – Individual
  – Organizational
  – Community

Scholarly Communication Lifecycles
Gettysburg College Library
Graphic: http://www.gettysburg.edu/library/research/guides/scientific_information/index.dot

Research Lifecycles
Loughborough University Library (UK)
Graphic: http://www.lboro.ac.uk/services/library/research/

Scholarly Communication Lifecycles
Microsoft Research
Graphic: http://research.microsoft.com/en-us/news/features/zentity-052009.aspx

Research Lifecycle: Project
The Research360 Project will develop technical and human infrastructure for research data management at the University of Bath… with a particular focus on issues and challenges that arise from private sector partnerships and research collaborations.
http://blogs.bath.ac.uk/research360/about/

Research Lifecycles: Specialized
Cross-Cultural Surveys, Institute for Social Research
Graphic: http://ccsg.isr.umich.edu/intro.cfm

Research Lifecycle: Funding
Wayne State University, Division of Research
Graphic: http://spa.wayne.edu/grant/

Connecting Research & Data Lifecycles
“How JISC is Helping Researchers” http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx

Data Lifecycles
Chuck Humphrey (2006) “e-Science and the lifecycles of Research” http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc

A Data Curation Profile contains:
• Information about an individual data set, including its data lifecycle.
• Current management practice.
• Unmet needs.
http://datacurationprofiles.org

Individual Data Lifecycles are Unique

Individual Data Lifecycles can be Complex

Data Lifecycle Model: UVA
• Data Mining
• Data Curation & Preservation
• Data Search
• DMP Consulting
• Grant Writing & Planning
• Publication
• Rights & Restrictions
• DM Planning
• Data Processing
• HPC/Visualization
• Tool Development
• Metadata & Documentation
• Data Storage
Image: University of Virginia Libraries Scientific Data Consulting Group: http://dmconsult.library.virginia.edu/

Data Lifecycle Model for ICPSR
1. Proposal and Planning
2. Project Start Up
3. Data Collection
4. Data Analysis
5. Preparing Data for Sharing
6. Deposit
ICPSR’s Guide to Social Science Data Preparation and Archiving: http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/

Common Elements in Data Lifecycle
• Collect / Generate
• Process
• Analyze
• Finalize / Summarize for Publication

Curation Lifecycle
Neil Beagrie (2004) “The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)” D-Lib Magazine.
http://www.dlib.org/dlib/july04/beagrie/07beagrie.html

Curation Lifecycle: DCC
http://www.dcc.ac.uk/resources/curation-lifecycle-model

OAIS Reference Model: Preservation
ICPSR Pipeline Process
http://staging.icpsr.umich.edu/icpsralpha/content/datamanagement/lifecycle/oais.html

Deposit
Inputs – Materials to Deposit:
• Data
• Documentation
• Data Form (Description)
Outputs – SIP:
• Deposited Files
• Metadata from the Deposit
• Signed Deposit Form

Ingest
Actions:
• Processing Plan
• Assign a Study Number
• Formatting for Access and Preservation
Outputs – AIP:
• Data
• Documentation
• Set Up Files
• Processing History

Archival Storage
Actions:
• Migrations
• Checking integrity – checksums
• Making, storing and syncing redundant copies at various locations
Outputs – Curated AIP

Data Management
Actions:
• Populating, maintaining, and making the descriptive information accessible
Outputs:
• Compliant Metadata

Access
Actions:
• Data set is indexed, searchable and made available.
Outcome – DIP:
• Data and document files
• Bibliography file
• Study description file
• Terms of use file
• File Manifest

Common Elements in Curation Lifecycle
• Deposit / Ingest
• Storage
• Document / Describe
• Discover / Access / Use
• Manage
• Preserve

Lifecycle Models & Data Services
• Need for developing your organizational model – based on community models and informed by individual lifecycles.
• Need for alignment between data lifecycles and curation lifecycles – informed by research and scholarly communication lifecycles.

Alignment Between Lifecycles
[Figure: data lifecycle stages (Proposal Development & DMP; Project Start-up; Data Collection & File Creation; Data Analysis; Preparing Data for Sharing) shown alongside curation stages (Ingest; Archival Storage; Data Management; Access), with the Research and Scholarly Communication lifecycles.]

Example of Lifecycle Alignment
Image: Green, Ann G., and Myron P. Gutmann. (2007).
“Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives, 23: 35-53.
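The Archival Storage actions in the ICPSR pipeline include checking integrity with checksums and making, storing, and syncing redundant copies. As a rough illustration of what such a fixity check involves, here is a minimal Python sketch. It assumes SHA-256 as the checksum algorithm and a simple manifest that maps file names to digests; both are illustrative choices, not ICPSR's documented practice, and the function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical. A manifest is recorded when the package is stored, and each copy is later re-hashed and compared against it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Record a checksum for every file in a package when it is stored.

    The manifest format (file name -> hex digest) is a hypothetical choice for this sketch.
    """
    return {p.name: sha256_of(p) for p in sorted(package_dir.iterdir()) if p.is_file()}


def check_fixity(manifest: dict[str, str], copy_dir: Path) -> list[str]:
    """Return the names of files in one stored copy that are missing or no longer match."""
    failures = []
    for name, expected in manifest.items():
        candidate = copy_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(name)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as replica:
        primary_dir, replica_dir = Path(primary), Path(replica)
        # Hypothetical package contents: a data file and its documentation.
        (primary_dir / "data.csv").write_text("id,value\n1,42\n")
        (primary_dir / "codebook.txt").write_text("value: example variable\n")
        manifest = build_manifest(primary_dir)

        # Make a redundant copy, then simulate silent corruption in that copy.
        for f in primary_dir.iterdir():
            shutil.copy2(f, replica_dir / f.name)
        (replica_dir / "data.csv").write_text("id,value\n1,43\n")

        print("primary failures:", check_fixity(manifest, primary_dir))
        print("replica failures:", check_fixity(manifest, replica_dir))
```

Running the script should report no failures for the primary copy and flag `data.csv` in the replica, whose contents were altered after the manifest was recorded. Checking every redundant copy against the same manifest is what lets a damaged copy be identified and replaced from a good one.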