Data Practices across Disciplines:
Informing Collections & Curation
Carole L. Palmer
Melissa H. Cragin, Tiffany Chao, & Nic Weber
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
iConference
9 February 2011
Seattle, WA
Data Conservancy studies of scientists
Domains: Astronomy, Life Sciences, Earth Sciences, Social Sciences
NCAR
Task-based design and usability testing
→ Use cases, data requirements, system recommendations
UCLA
Ethnography, oral histories
→ Use cases, data requirements
ILLINOIS
Small science
→ Curation requirements relating data characteristics & community data practices
→ Reuse potentials
Small science is big, and poorly curated
12,025 NSF grants awarded in 2007 = $2,865,388,605
                    Largest 20%               Remaining 80%
Number of grants    2,405                     9,621
Total dollars       $1,747,957,451            $1,117,431,154
Range               $300,000 - $38,131,952    $579 - $300,000
Top 254 grants received 20% of the total awarded
(Heidorn, 2009)
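The arithmetic behind the grant figures above can be checked with a short sketch. The numbers are copied from the slide as given; the variable and function names are illustrative assumptions, not part of Heidorn's analysis.

```python
# Figures as reported on the slide (Heidorn, 2009). Names are hypothetical.
total_dollars = 2_865_388_605

largest_20 = {"grants": 2405, "dollars": 1_747_957_451}
remaining_80 = {"grants": 9621, "dollars": 1_117_431_154}


def dollar_share(group, total):
    """Fraction of all awarded dollars that went to this group of grants."""
    return group["dollars"] / total


def mean_award(group):
    """Average award size within the group."""
    return group["dollars"] / group["grants"]


# The two dollar totals add up exactly to the stated overall total.
assert largest_20["dollars"] + remaining_80["dollars"] == total_dollars

# Note: the two grant counts sum to 12,026, one more than the 12,025
# stated on the slide. The counts are used here exactly as reported.
grant_count = largest_20["grants"] + remaining_80["grants"]

for label, group in (("Largest 20%", largest_20), ("Remaining 80%", remaining_80)):
    print(
        f"{label}: {dollar_share(group, total_dollars):.1%} of dollars, "
        f"mean award ${mean_award(group):,.0f}"
    )
```

On these figures the smaller grants make up four fifths of the awards and roughly 39% of the dollars, which is the sense in which small science is "big".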
Research questions & target domains
• What data, in what forms, are needed to advance research?
• What factors predict value for reuse of data sets?
• How do the dependencies among research communities evolve around
data resources?
Earth & life science intersections, with challenging curation problems:
systems geobiology - soil ecology - oceanography . . .
• interdisciplinary research; need for data from outside fields,
integration of data across fields and scales.
• production and use of compound / complex data sets.
• ingest / curation of community databases, policy and reuse issues.
Progressive data collection
Talking shop about data
- efficient exchange with the right scientists about the right things
Scientists leading research
- IP, access, discovery, research context
• Pre-interview worksheets
• Semi-structured interviews
• Follow-up sessions with selected participants
Scientists managing data - stages, versions, standards, tools
(postdocs, others from labs and research groups)
• Data deposit & sharing worksheet
• Data samples, related documentation
Units of analysis
Data “sets”
aligned with research group production and dissemination
workflows and services
policies on attribution, embargoing, etc.
Data communities
Aligned with current and future interactions around data
representation, functionality, and use
policies for selection, appraisal, retention, description
Data communities
What are the meaningful social units for organization and
use of data over the long term?
• Sub-discipline focused on particular kinds of data that
produce specific measurements or analysis - (systems geobiology)
• Specialized domain focused on a research problem,
often interdisciplinary in nature - (urban vulnerability)
• Developers of shared community-level data collection
(i.e., “Resource Collection”, NSB 2005) - (soil science)
Core research challenge:
Predict and design for communities of users,
which will differ from producers, and change over time
Data curation and sharing dynamics
Geobiology
Data unit: site-specific time series
• reduced spreadsheets: rock, water, microbial
• microscopy images
• annotated digital photographs
User communities: Geology, Chemistry, Microbiology, Genomics, U.S. Park Service
Sharing conventions:
• by request
• no repository
• mostly post-publication, some unpublished
Volcanology
Data unit: rock profile
• physical rock
• thin section
• chemical analysis
• photographs
• field notes
User communities: Geology (igneous petrology), Geophysics, Geochemistry, Geology (biogeochemistry)
Sharing conventions:
• by request
• no repository
Soil ecology
Data unit: database
• multiple abiotic soil measurements
• associated metadata
User communities: Earthworm ecology, Sensor network researchers
Sharing conventions:
• public resource collection
Data Curation Framework
Data Conservancy collection criteria
• Broad scope, targeted research areas / needs
– earth sciences, life sciences, social sciences, and astronomy
• At-risk and highly unique or valuable data for target research areas
– consistent with the traditional role of special collections
• Data with high potential for future reuse
– Yet, producers often fail to recognize the potential for reuse by others.
(Cragin, Palmer, Carlson, & Witt. 2010.
Philosophical Transactions of the Royal Society A)
Hjørland’s epistemological potential of documents
• Representation (subject analysis) should go beyond description of aboutness
• Expose ability to “transfer knowledge”
– requires “understanding of which future problems can give rise to the use
of the document in question” (p. 93)
• Documents can have an infinite number of properties capable of informing a
user, therefore description must be informed by:
– Analysis of contributions to various user groups—beyond the originally
intended audience
– Prioritization of the contributions with the most “long-term utility”
– Categorizations that will function in the information system
Data as raw materials of research
• Do not transfer knowledge directly
• Processing and tools for intelligibility and interpretation
• Effort and resources to determine integrity and fit for new purpose
Curation roles in DC:
– Integrity - assessed in part by applying OAIS criteria for preservation
description information.
– Fit-for-purpose - alignment with the methods and tools of a given
research community.
Analytic potential of data
(Diagram labels: data; domains of interest; user communities; integrity; fit-for-purpose; contributions; categorization; description)
Data curation expertise
As was true with bibliographic resources,
understanding future uses of data involves comprehension of
particulars of data functionality and application
And,
• historical and cultural dynamics of research areas
• broad cross-disciplinary epistemological trends
to address the needs of current and as-yet-unknown user groups.
Questions & comments, please
clpalmer@illinois.edu
http://cirss.lis.uiuc.edu/
Center for Informatics Research in Science and Scholarship