Metadata Management and Cataloging Breakout

Jim Myers, Line Pouchard
Ann Chervenak, Richard Mount, Larry Rahn,
Greg Riccardi, Sonja Tidemann, Steve Wiley
Wrong Title?
“I could do more science if I could:
Automate workflow,
Search my data faster.
…”
“My paper is too short, I need more metadata…”
Drivers for Metadata
• Extracting more value from data
– Data useful beyond grad student lifetime
– Single-user efficiency
• Dealing with Moore’s Law, CS Advances
– Managing more complex experiments: unique names are no longer sufficient
– Componentization of codes
– (Decomposition of concerns (aspects))
Drivers for Metadata
• Changing Science – moving beyond an oral
tradition
– Need to share context-dependent data across
community(ies) (data dissemination/discovery)
– Support mapping between data models (across
domains, over time)
– Managing non-hierarchical data relationships /
multiple hierarchies at once
– Describing hypotheses and statements of trust / reification (statements about other statements)
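Reification, the last driver above, is easiest to see with a small example. The sketch below assumes metadata is held as RDF-style subject-predicate-object triples and uses the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `Triple` type, the `reify` helper, and every `ex:` identifier are hypothetical illustrations, not part of any system discussed in this breakout.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


_ids = count(1)  # hypothetical scheme for minting statement identifiers


def reify(statement: Triple) -> tuple[str, list[Triple]]:
    """Give a statement its own node so other triples can talk about it."""
    node = f"ex:stmt{next(_ids)}"
    return node, [
        Triple(node, "rdf:type", "rdf:Statement"),
        Triple(node, "rdf:subject", statement.subject),
        Triple(node, "rdf:predicate", statement.predicate),
        Triple(node, "rdf:object", statement.obj),
    ]


# An ordinary metadata statement about a (hypothetical) experiment run.
claim = Triple("ex:run42", "ex:temperatureK", "1200")
node, reified = reify(claim)

# Statements about the statement: who asserted it and how far it is trusted.
graph = reified + [
    Triple(node, "ex:assertedBy", "ex:alice"),
    Triple(node, "ex:trustLevel", "peer-reviewed"),
]

for t in graph:
    print(t.subject, t.predicate, t.obj)
```

The trust and attribution triples point at `node`, not at `ex:run42`, so they qualify the claim itself rather than the dataset.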
Catch-22
• Everybody says metadata is important, but few
actually record it
– Frog in the pot
– Tragedy of the Commons
– Paradigm shift
• What’s changing?
– New Science drivers require it
– New technologies will simplify capture and
management
Uses
– Provenance (original conditions, subsequent workflow)
• Reproducing experiments and analysis
• Virtual Data
• Workflow-by-example
– Data Discovery
• Metadata-based search (features, subsets, …)
– Data Quality
• evaluation/review
• endorsement
• Curation/records information
– Annotation
• Data context
• Relation to other data
– Discovery/Mining/Inference/Monitoring
– Not discussed much: metadata applies not only to data but also to services, programs, machines, and instruments
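Two of the uses above, metadata-based search and provenance, can be sketched with a toy in-memory catalog. This is a minimal illustration assuming each record carries a logical name, a dictionary of free-form attributes, and a list of `derived_from` links. The record names, attributes, and the `search`/`lineage` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One catalog entry: a logical dataset name plus free-form metadata."""
    name: str
    attrs: dict
    derived_from: list = field(default_factory=list)  # provenance links


catalog = {
    r.name: r
    for r in [
        Record("raw/shot-001", {"instrument": "laser-A", "temp_K": 1200}),
        Record("raw/shot-002", {"instrument": "laser-A", "temp_K": 900}),
        Record("calib/shot-001", {"step": "calibrate"}, ["raw/shot-001"]),
        Record("fit/shot-001", {"step": "fit", "temp_K": 1200}, ["calib/shot-001"]),
    ]
}


def matches(attrs, key, want):
    """A criterion is either an exact value or a predicate function."""
    if key not in attrs:
        return False
    have = attrs[key]
    return want(have) if callable(want) else have == want


def search(**criteria):
    """Return names of records whose metadata satisfies every criterion."""
    return sorted(
        r.name
        for r in catalog.values()
        if all(matches(r.attrs, k, w) for k, w in criteria.items())
    )


def lineage(name):
    """Walk derived_from links back toward the original inputs (depth-first)."""
    chain, stack = [], list(catalog[name].derived_from)
    while stack:
        parent = stack.pop()
        if parent not in chain:
            chain.append(parent)
            stack.extend(catalog[parent].derived_from)
    return chain


print(search(instrument="laser-A", temp_K=lambda t: t > 1000))  # ['raw/shot-001']
print(lineage("fit/shot-001"))  # ['calib/shot-001', 'raw/shot-001']
```

Search and lineage work only because each record carries its attributes and its derivation links. A data-only store with unique file names could answer neither query.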
R&D Challenges
– What to standardize, what to record?
• Infrastructure is general; some schemas should be standardized (workflow, experiment management), but most are domain-specific
– Metadata Services
• scalable, distributed, schema-independent
• semantic federation/ontology mapping, derived indexes/information retrieval services, global IDs, rich authorization models, data granularity, inference services, curation (tuning based on access, etc.)
• Usability: metadata input/capture (automation, cultural change, rewards, Google precedent)
• Maintainability: automated quality management
From workshop 1
Provenance: Conceptual Services
• Logical Naming
• Lifecycle
• Discovery (data, schema)
• Basic Management (ingest, storage, query, update, notification, ...)
• Reasoning (mapping, inference, …)
• Records (signing, nonrepudiation)
• Migration (schema, formats, signatures, ...)
• Archival/versioning (copies of external data, services, …)
• Policy enforcement (fit for purpose, adheres to common data model, …)
• Federation & aggregation
• Collection and/or Compounding
• Curation (e.g. conflict detection & resolution)
• Workflow “Proxy”
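The first service in this list, logical naming, can be sketched as a location-independent name that resolves to any number of physical copies. Migration or archival can then move data without breaking references. The `LogicalNameService` class, the `lfn:` prefix, and the example locations are hypothetical and do not describe any specific replica catalog.

```python
from collections import defaultdict


class LogicalNameService:
    """Hypothetical logical-naming service: one stable name, many physical copies."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical, physical):
        """Record that a physical copy exists for a logical name."""
        self._replicas[logical].add(physical)

    def unregister(self, logical, physical):
        """Forget one copy; drop the logical name when no copies remain."""
        self._replicas[logical].discard(physical)
        if not self._replicas[logical]:
            del self._replicas[logical]

    def resolve(self, logical):
        """Return all known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(logical, ()))


lns = LogicalNameService()
name = "lfn:climate/run7/output.nc"
lns.register(name, "file:///archive/run7/output.nc")
lns.register(name, "https://example.org/mirror/run7/output.nc")

# Retiring the archive copy changes nothing for users of the logical name.
lns.unregister(name, "file:///archive/run7/output.nc")
print(lns.resolve(name))  # ['https://example.org/mirror/run7/output.nc']
```

References held by workflows and other metadata records use only the logical name, so the physical copies can change underneath them.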
Program Scope
– Research – metadata services (see above)
– Pilot – use of rich metadata to support grand
challenge projects
– Develop/deploy: General metadata capture
tools – capture from workflows, problem
solving environments
– Maintain – metadata management as
cyberinfrastructure (requires research on
scaling, maintainability,…)
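The "capture from workflows" tooling in the program scope can be sketched in a few lines. Here a decorator records a provenance entry each time a workflow step runs, so scientists get metadata without entering it by hand. The `captured` decorator, the in-memory `PROVENANCE_LOG`, and the entry fields are assumptions for illustration only. A real capture tool would persist entries to a metadata service.

```python
import functools
import hashlib
import json
import time

PROVENANCE_LOG = []  # hypothetical in-memory store; a real service would persist this


def _digest(value):
    """Short, stable fingerprint of a value (non-JSON types fall back to repr)."""
    blob = json.dumps(value, sort_keys=True, default=repr).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def captured(step):
    """Decorator that records inputs, outputs, parameters, and timing of a step."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = step(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": _digest([args, kwargs]),
            "output": _digest(result),
            "params": kwargs,
            "started": start,
            "seconds": time.time() - start,
        })
        return result
    return wrapper


@captured
def calibrate(samples, gain=1.0):
    return [s * gain for s in samples]


@captured
def mean(values):
    return sum(values) / len(values)


avg = mean(calibrate([1.0, 2.0, 3.0], gain=2.0))
print(avg)  # 4.0
print([entry["step"] for entry in PROVENANCE_LOG])  # ['calibrate', 'mean']
```

Capture happens as a side effect of running the step. That addresses the Catch-22 above, where metadata is valued but rarely recorded.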