
Life Cycle Models & Principles
Jake Carlson
Associate Professor of Library Science
Data Services Specialist
Purdue University Libraries
What will be Covered
• An introduction to terms and concepts
relating to data lifecycles.
• An understanding of the purpose of lifecycle
models.
• Coverage of some life cycle models and
principles, and how they may relate to each other.
• An introduction to ICPSR’s lifecycle model, as a
loose framework for this workshop.
Data Science
– Loukides, M. (2011)
What is Data Science?
http://radar.oreilly.com/2010/06/what-is-data-science.html
• “Data science enables the
creation of data products.”
• “We're increasingly finding data
in the wild, and data scientists
are involved with gathering
data, massaging it into a
tractable form, making it tell its
story, and presenting that story
to others.”
Data Curation
•
“…the active and on-going management of data
through its lifecycle of interest and usefulness to
scholarly
to scholarly
andand
educational
educational
activities.”
activities.”
- UIUC
- UIUC
GSLISGSLIS
http://cirss.lis.illinois.edu/CollMeta/dcep.html
http://cirss.lis.illinois.edu/CollMeta/dcep.html
•
“… the value-added activities and features that
stewards of content engage in to make the
content useful.” -- Nancy
Nancy McGovern,
McGovern, ICPSR
ICPSR
What is a Lifecycle?
The continuous
sequence of changes
undergone by an
organism from one
primary form, as a
gamete, to the
development of the same
form again.
http://www.dictionary.com
Graphic: http://insected.arizona.edu/manduca/Mand_cycle.html
Data Lifecycles
Primer on Data
Management
http://www.dataone.org/sites/all/documents/
DataONE_BP_Primer_020212.pdf
Why Use Life Cycle Models?
• Help define and explain complex processes
(graphically).
• Help to identify important components, roles,
responsibilities, milestones, etc.
• Demonstrate connections and relationships
between parts and the whole.
• Provide a framework to develop services and
support.
Limitations of Lifecycle Models
• “All models are wrong, but some are useful”
George E.P. Box, Statistician, 1976
– Models generally reflect the interests,
perspectives (and biases) of the agencies that
created them.
– Models mask complexity.
– Models tend to overlook heterogeneity / diversity.
– Models are often presented as orderly and linear.
– Models depict the ideal.
Aspects of Lifecycle Models
• Subject Based
– Scholarly Communication
– Research
– Data
– Curation
• Source Based
– Individual
– Organizational
– Community
Scholarly Communication Lifecycles
Gettysburg College
Library
Graphic:
http://www.gettysburg.edu/library/research/guides/scientific_information/index.dot
Research Lifecycles
Loughborough University Library (UK)
Graphic: http://www.lboro.ac.uk/services/library/research/
Scholarly Communication Lifecycles
Microsoft Research
Graphic: http://research.microsoft.com/en-us/news/features/zentity-052009.aspx
Research Lifecycle: Project
The Research360 Project will
develop technical and human
infrastructure for research
data management at the
University of Bath…
Focus in particular on issues
and challenges that arise
from private sector
partnerships and research
collaborations.
http://blogs.bath.ac.uk/research360/about/
Research Lifecycles: Specialized
Cross-Cultural Surveys
Institute for Social Research
Graphic: http://ccsg.isr.umich.edu/intro.cfm
Research Lifecycle: Funding
Wayne State University, Division of Research
Graphic: http://spa.wayne.edu/grant/
Connecting Research & Data Lifecycles
“How JISC is Helping Researchers”
http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx
Data Lifecycles
Chuck Humphrey (2006) “e-Science and the lifecycles of Research”
http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
A Data Curation Profile contains:
• Information about an individual data set,
including its data lifecycle.
• Current management practice.
• Unmet needs.
http://datacurationprofiles.org
Individual Data Lifecycles are Unique
Individual Data Lifecycles can be Complex
Data Lifecycle Model: UVA
Data Mining
Data Curation & Preservation
Data Search
DMP Consulting
Grant Writing & Planning
Publication
Rights & Restrictions
DM Planning
Data Processing
HPC/Visualization
Tool Development
Metadata & Documentation
Data Storage
Image: University of Virginia Libraries Scientific Data Consulting Group:
http://dmconsult.library.virginia.edu/
Data Lifecycle Model for ICPSR
1. Proposal and Planning
2. Project Start Up
3. Data Collection
4. Data Analysis
5. Preparing Data for Sharing
6. Deposit
ICPSR’s Guide to Social Science Data Preparation and Archiving:
http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
Common Elements in Data Lifecycle
• Collect / Generate
• Process
• Analyze
• Finalize / Summarize for Publication
Curation Lifecycle
Neil Beagrie (2004) “The Continuing Access and Digital
Preservation Strategy for the UK Joint Information Systems
Committee (JISC)” D-Lib Magazine.
http://www.dlib.org/dlib/july04/beagrie/07beagrie.html
Curation Lifecycle: DCC
http://www.dcc.ac.uk/resources/curation-lifecycle-model
OAIS Reference Model: Preservation
ICPSR Pipeline Process
http://staging.icpsr.umich.edu/icpsralpha/content/datamanagement/lifecycle/oais.html
Deposit
Inputs – Materials to Deposit:
• Data
• Documentation
• Data Form (Description)
Outputs – SIP:
• Deposited Files
• Metadata from the
Deposit
• Signed Deposit Form
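As a rough illustration of the deposit step, the sketch below checks that a submission contains the three inputs listed above before it is treated as a SIP. The component names and the `check_submission` helper are hypothetical and chosen only for this example; they are not ICPSR's actual tooling.

```python
# Hypothetical sketch: confirm a deposit carries the three inputs named on the slide.
# The keys below are illustrative labels, not an ICPSR schema.
REQUIRED_COMPONENTS = ("data", "documentation", "deposit_form")


def check_submission(submission):
    """Return the required components that are missing or empty.

    `submission` maps a component name to a list of file names.
    """
    return [name for name in REQUIRED_COMPONENTS if not submission.get(name)]


example = {
    "data": ["survey.csv"],
    "documentation": ["codebook.pdf"],
    "deposit_form": [],  # form not yet supplied
}
missing = check_submission(example)
print(missing)  # ['deposit_form']
```

An empty result would mean the submission has at least one file for every required component.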
Ingest
Actions:
• Processing Plan
• Assign a Study Number
• Formatting for Access
and Preservation
Outputs – AIP:
• Data
• Documentation
• Set Up Files
• Processing History
Archival Storage
Actions:
• Migrations
• Checking integrity - checksums
• Making, storing and synching
redundant copies at various
locations
Outputs – Curated AIP
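One way to picture the integrity check on redundant copies is to compare a checksum of each copy against the master file. The sketch below uses SHA-256 from Python's standard library; the choice of algorithm, the file names, and the `find_mismatched_copies` helper are assumptions made for illustration, not a description of ICPSR's process.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_copies(original, copies):
    """Return the paths of copies whose checksum differs from the original's."""
    expected = sha256_of(original)
    return [str(copy) for copy in copies if sha256_of(copy) != expected]


# Small demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as workdir:
    master = Path(workdir) / "master.dat"
    good_copy = Path(workdir) / "copy_a.dat"
    bad_copy = Path(workdir) / "copy_b.dat"
    master.write_bytes(b"study data")
    good_copy.write_bytes(b"study data")
    bad_copy.write_bytes(b"study dat4")  # simulated corruption
    mismatched = find_mismatched_copies(master, [good_copy, bad_copy])

print([Path(p).name for p in mismatched])  # ['copy_b.dat']
```

In practice the expected checksum would usually be recorded at ingest and stored with the AIP, so that a damaged master file can also be detected.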
Data Management
Actions:
• Populating,
• Maintaining,
• Making the descriptive
information accessible
Outputs:
• Compliant Metadata
Access
Actions:
• Data set is indexed,
searchable and made
available.
Outputs – DIP:
• Data and document files
• Bibliography file
• Study description file
• Terms of use file
• File Manifest
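The flow from SIP to AIP to DIP described above can be sketched as three small record types with a function for each hand-off. This is a simplified illustration only: the field names, the study number format, and the rule that ingest requires a signed deposit form are assumptions for the example, not part of the OAIS standard or ICPSR's systems.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what the depositor hands over."""
    deposited_files: list
    deposit_metadata: dict
    signed_deposit_form: bool


@dataclass
class AIP:
    """Archival Information Package: what the archive stores."""
    study_number: str
    files: list
    processing_history: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what users receive."""
    study_number: str
    files: list
    manifest: list


def ingest(sip, study_number):
    """Turn a SIP into an AIP, recording the study number assignment."""
    # Assumption for this sketch: an unsigned deposit form blocks ingest.
    if not sip.signed_deposit_form:
        raise ValueError("deposit form must be signed before ingest")
    history = [f"assigned study number {study_number}"]
    return AIP(study_number, list(sip.deposited_files), history)


def disseminate(aip, terms_of_use="terms_of_use.txt"):
    """Build a DIP from an AIP, adding a terms-of-use file and a manifest."""
    files = aip.files + [terms_of_use]
    return DIP(aip.study_number, files, manifest=sorted(files))


sip = SIP(["survey.csv", "codebook.pdf"], {"title": "Example study"}, True)
aip = ingest(sip, "S0001")
dip = disseminate(aip)
print(dip.manifest)  # ['codebook.pdf', 'survey.csv', 'terms_of_use.txt']
```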
Common Elements in Curation Lifecycle
• Deposit / Ingest
• Storage
• Document / Describe
• Discover / Access / Use
• Manage
• Preserve
Lifecycle Models & Data Services
• Need for developing your organizational
model – based on community models and
informed by individual lifecycles.
• Need for alignment between data lifecycles
and curation lifecycles – informed by research
and scholarly communication lifecycles.
Alignment Between Lifecycles
[Figure: research lifecycle stages (Proposal Development & DMP, Project Start-up, Data Collection & File Creation, Data Analysis, Preparing Data for Sharing) shown alongside curation functions (Ingest, Data Mgmt, Archival Storage, Access) and Scholarly Communication]
Example of Lifecycle Alignment
Image: Green, Ann G., and
Myron P. Gutmann. (2007).
“Building Partnerships Among
Social Science Researchers,
Institution-based Repositories,
and Domain Specific Data
Archives.” OCLC Systems and
Services: International Digital
Library Perspectives, 23: 35-53.