The Way Things Go

  - e-Science is a complex activity
  - Scientific knowledge is comprehensible only in the context of those activities
  - Adopt the Rube Goldberg view
  [Image: Rube Goldberg]

National Center for Supercomputing Applications

Grand challenge: systems-scale science
  - “... modeling complex systems will be a major research challenge for the 21st century” - National Science Foundation
  - Observation and modeling of multiple systems at multiple scales
  - Linking data and tools from different disciplines to get a valid global result!

Building current practices up isn't working
  - Heterogeneous tools, data formats
  - Little global coordination of research
  - Little funding for sustained stewardship of tools and data
  [Image: M.C. Escher, “Tower of Babel” (1928)]

Proposed solutions aren't working
  - e-Journals: not machine-interpretable
  - Collaboration tools: scientists just use email like everyone else
  - Portals and digital libraries, typically:
      - centralized
      - domain-specific
  - The Grid: can orchestrate complex processing jobs, but that's not science

Only networks work at scale
  [Diagram labels: Desktop, Single researcher, Workgroup, Community, Network, Global]
  - Ad hoc data mgt, single-user apps
  - Community tools, resources, control
  - Global: no global practice, tools, control

How do we get there?
  [Diagram labels: model, refine, predict, observe, critical interface, data]
  - e-Science means managing
      - Process, and
      - Data
  - Current approaches favor one or the other
  - Information is getting lost

Trends
  [Chart; axis labels: process, data]
  - Workflow * provenance * the grid * portals
  - Interactive * desktop apps * e-notebooks * digital libraries
  - Batch * formats * mainframes
  - Data
  - Metadata * rules * ontologies
  - Semantics

Key technologies
  - Semantic web: data/metadata
      - Provides means of merging descriptive information even if it only partially agrees (e.g., comes from two different communities)
  - Workflow: process
      - Describes complex procedures independently of how they are executed
  - Provenance: process + data/metadata
      - Links workflow, data, and any ancillary descriptive information (e.g., attribution)

Semantics: data to knowledge
  (Abstract)
  - Knowledge: ontologies, rules, models, etc. (a.k.a. semantics)
      - Learning, inference
  - Information: collections, tags, attributes, etc. (a.k.a. metadata)
      - Aggregation, annotation
  - Data: streams, arrays, swaths, etc. (a.k.a. files)
  (Concrete)
  (cf. Reagan Moore)

Semantic web: RDF triple
  [Diagram: subject, predicate, object]
  - Declarative: asserts a fact
  - Subject and object URIs identify arbitrary entities (things, people, concepts, events)
  - Predicate identifies the relationship between them

Triples form an open network
  [Diagram label: hasBreed]
  - Subject nodes aren't “owned” by any single agent or container
  - Any actor can add arcs to the implicit, total, world graph
  - Any two graphs can be joined

Non satis non scire (to know is not enough)
  [Image: Semantic web “layer cake” (source: World Wide Web Consortium)]
  - Where do we manage process?
      - User interface?
      - Applications?
  - “Semantic Grid” (D. DeRoure, C. Goble)

Workflow: process description
  [Images: (Taverna), (Kepler)]
  - Describe complex operations as networks of simpler operations
  - Abstract operation execution from description
  - Can be shared (but may not be portable)

Anatomy of a workflow
  [Diagram labels: “Module”, Control flow]
  - Execution model (usu. implicit)
  - Declarative: says what to do
  - Modules identify arbitrary procedures
  - Arcs identify flow of control and/or data (data flow is usually implicit)

Workflow systems
  [Image: D2K (source: NCSA)]
  - Modules representing units of computation
  - Language for specifying WF
      - modules
      - control flow
  - Engine for executing WF

Work vs. workflow systems
  [Image (source: CNRS/UCSD)]
  - Scientists are not WF modules
  - Science work also involves
      - social organization incl. funding
      - field and “wet lab” manual work
      - discourse: review, validation

Provenance: what happened
  - Answers critical questions
      - What led to this result?
      - When and how were observations made, conclusions reached?
  - Is a causal network of events

Complementary incomplete notions of provenance
  - Artifact-centric (e.g., digital libraries)
      - “lineage” = events in lifecycle of artifact, e.g., custody
      - IRs focus on curation events (not antecedent processes)
  - Process-centric (e.g., workflow)
      - computational events (e.g., service invocations)
      - control flow
      - artifacts are either not mentioned or opaque (tool-specific)

Provenance Challenges 1 & 2
  (source: gridprovenance.org)
  - IPAW 2006, HPDC 2007
  - 20 teams, 1 workflow, 9 queries
  - major players
  - Interoperability?
      - lots of manual work required
      - call for standards

Artifact + process provenance = “open provenance”
  (source: Luc Moreau et al)
  - Can describe any process, not just WF execution (e.g., science!)
  - Allows alternate accounts by different observers
  - Rules for inferring transitive causal relationships

Open Provenance Model
  (source: Luc Moreau et al)
  - 3 node types: artifact, process, agent
  - 5 arc types: used, generated, triggered, derived, controlled (and inference rules)
  - Generic: extensibility via annotation
  - Choice of granularity and focus (e.g., artifact or process-centric)

NCSA Provenance Infrastructure
  [Layer diagram labels]
  - Visualization, interaction: desktop, portal, etc.
  - Tracking, modeling, presentation
  - OPM toolkit; Open Provenance Model
  - Tupelo Semantic Content Repository
  - Abstraction, inference, storage
  - Contexts; stores

Tupelo: semantic content
  (tupeloproject.org)
  - Abstracts content from storage impls (e.g., Sesame, Mulgara)
  - Provides location-independent addressing of content and metadata
  - Supports transparent mirroring, caching, failover, etc.

CyberIntegrator: workflow by example
  - Records what users do as provenance
      - source, intermediate, and final artifacts
      - steps and parameters
  - Can re-enact interaction as a workflow

MAEviz: analysis/viz app, workflow “behind the scenes”
  - GIS app. platform
  - Earthquake hazard analysis plug-in
  - Data catalog
      - built environment
      - fragility/hazard models
  - Driven by workflow -> provenance

CyberCollaboratory: collaboration + provenance
  - User interaction with tools generates events
  - Events are captured using the OPM and published to Tupelo
  - Non-portal apps can browse / use provenance

Summary
  - “The way things go” is critical to e-Science at scale
  - Provenance is an open causal network
  - New infrastructure supports provenance

Resources / acknowledgements
  - Grid Provenance Challenge
      - http://twiki.gridprovenance.org/
  - NCSA technologies
      - Tupelo: http://tupeloproject.org/
      - CyberIntegrator: http://isda.ncsa.uiuc.edu/
      - MAEviz: http://maeviz.cee.uiuc.edu/
      - CyberCollaboratory: http://ecid.ncsa.uiuc.edu/cybercollab/
  - Acknowledgements: Jim Myers, Luc Moreau, Juliana Freire, Patrick Paulson, Simon Miles, Bob McGrath, and more ...
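
Illustration: provenance as an open causal network

The slides on RDF triples and the Open Provenance Model describe three ideas that fit together: a fact is a (subject, predicate, object) triple, any two graphs can be joined, and causal relationships can be inferred transitively. The following is a minimal sketch of those ideas in plain Python, not the OPM toolkit or Tupelo API. It makes several assumptions: every "ex:" identifier and the "dc:title" annotation are hypothetical, the arc names are the long OPM forms of the five arc types the slides abbreviate (used, wasGeneratedBy, wasTriggeredBy, wasDerivedFrom, wasControlledBy), and the inference is reduced to simple reachability along arcs that point from effect to cause.

```python
# Minimal sketch (assumption: plain sets of tuples stand in for an RDF store).

# The five OPM arc types; each arc points from an effect to one of its causes.
CAUSAL_ARCS = {
    "used",
    "wasGeneratedBy",
    "wasTriggeredBy",
    "wasDerivedFrom",
    "wasControlledBy",
}

# Account 1: what a (hypothetical) workflow engine recorded.
workflow_log = {
    ("ex:calibrate", "used", "ex:raw_readings"),
    ("ex:clean_data", "wasGeneratedBy", "ex:calibrate"),
    ("ex:model_run", "used", "ex:clean_data"),
    ("ex:forecast", "wasGeneratedBy", "ex:model_run"),
}

# Account 2: what a (hypothetical) field notebook recorded.
lab_notes = {
    ("ex:raw_readings", "wasGeneratedBy", "ex:field_survey"),
    ("ex:field_survey", "wasControlledBy", "ex:alice"),
    ("ex:forecast", "dc:title", "Flood forecast"),  # annotation, not causal
}


def join(*graphs):
    """Join any number of triple graphs; shared node names link them up."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def causes(graph, node):
    """Return every node reachable from `node` along causal arcs."""
    direct = {}
    for subject, predicate, obj in graph:
        if predicate in CAUSAL_ARCS:
            direct.setdefault(subject, set()).add(obj)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in direct.get(current, ()):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


if __name__ == "__main__":
    # "What led to this result?" asked of one account, then of both joined.
    print(sorted(causes(workflow_log, "ex:forecast")))
    print(sorted(causes(join(workflow_log, lab_notes), "ex:forecast")))
```

Asked of the workflow log alone, the query stops at the raw readings, because the engine never saw the field survey. Asked of the joined graph, it also reaches the survey and the person who controlled it, which is the point of combining artifact-centric and process-centric accounts.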
