Tea soon
Meta Tea now

© m.c. schraefel, ecs u of southampton, dec 1, 2003

Designing Interaction to support Annotations and Provenance in the Chemical Aether
or
What We Learned from Making Tea with Chemists

m.c. schraefel, Gareth Hughes, Graham Smith, Hugo Mills, Jeremy Frey, Dave De Roure
U of Southampton, at the 51st Parallel, slightly south of here

thank you Carole Goble for saying “that’s provenance!” after seeing a version of this talk
thank you Mike Wilde for saying “maybe so” before seeing a version of this talk and asking me to participate in this panel
thank you Luc Moreau for saying “people” and “workflow” in the same talk
and Dave Berry for the invitation to the workshop
this is fun!

Our sponsors:
IAM
the Smarttea Project of CombeChem (EPSRC)
Advanced Knowledge Technologies IRC (EPSRC)

problem: eScience Context
• to support @source publication/curation of data usually recorded (or not) by hand on paper
• to look at how such capture might be digitized and automated where possible
• to explore implications for provenance re new models of publication/curation
in other words, how to replace a lab book with an eLabBook

interaction for interrogation
• at the producer/curator/consumer level rather than the system level
• not the layer above; the layer before

What is required
1. Working with scientific data: generated and legacy data, rapid evolution of data schema.
2. Need to include data planning as part of the experiment plan and evolve the schema in concert.
3. Rich & variable data sources.
4. Annotation.
5. High context sensitivity for services and data reuse.
6. Aim to allow non-specialist users to understand and manipulate the information.
7. Planning & workflow: prescriptive (sometimes overly), supportive and retrospective (for discussion and repeatability).
8. Information is only generated if the metadata is attached, thus a whole range of metadata capture facilities is required.
9. Real-time or concurrent generation of data & metadata.
10. A system must cope with no one run being the same as any other (which is different from industrial production).

the Chemical AEther
the imagined metadata layer of the experimental environment - all services, data and associated metadata circulate here
how is the data that will go into this system generated?
That’s a toughie:
• Chemists are busy
• Chemists already have ways they like to work
• We’re asking them to change how they work
We need to understand not just how and what they do but WHY

A chemistry lab is a hostile environment without much room to maneuver
what can be automatically captured with sensors?
what must rely on manual annotation?

Fume cupboard
bad chemist: no gloves
with gloves - how to ID a process?

very precise scales - but not connected

multiple chemists concurrently working in the lab

critical data entry

no dedicated location
vulnerability of data captured
access to data by others is limited
privilege (IP) • rights
uniqueness
history

Tea 1, Tea 2, Tea 1a, Tea 2a
what is recorded (bare minimum)

how to get more of this into more of this

adapt data collection methods away from the analog book into the chemical aether
to do that, we need to understand not only the environment, but the experimental process
Methodology Shift
others have failed. a lot

Making Tea: Getting not just the what and how, but the why

Reaction, Workup, Purify, Analysis
RDF graph of Tea experiments
test against “the real thing”

Reaction, Workup, Purify, Analysis

each process has a data output
implications for workflow-dependent models like DAML-S
process-to-process: where does the data go?

Translating the graph into services

what must be recorded (1)

what must be recorded (2)

Services First
Building Bricks for COSHH transposition from legal form into experiment planner: integration of services at site
- converted an existing simple inventory database used by the chemists into a web service.
- integrated barcode scanners, to provide integrated inventory management support.
- provide useful standard pieces of information about a chemical
- provides all of the data of a normal periodic table.
- grammes-to-moles chemical calculator.
- No such systems were available from national services or major chemical suppliers.
- building blocks for the planner
- reduce repetitive task load for chemists; improve data capture; support annotation on all data fields at site

Translating experience into affective tools

critical data entry

In-Context Capture Support Second
Provenance as side effect of in-context practice

Is it used? is it desired?
Yes! leverages current processes and adds obvious immediate value
“I can go anywhere and it’s, like, this is me and my data. It’s all there! Bang!”
We have challenges getting this right in publication at source for academics. The reward is not as immediate.
Work integration is critical

Future issues
Provenance Work
Publication at Source
- Automated authoring and automated adaptation for user-determined rendering of the information (versionable, adaptable hypermedia)
- Interrogation of Provenance
- Trust - manual and automatic processes re: provenance and IP/Digital Rights Management (DRM)
- Exploration (related work) for NEW knowledge building
Distribution
- The need for distributed triplestores for scalability and interoperability.
- The design of messages that will transport data from interface systems to triplestores.
- The generation of unique, shared URIs for the chemical aether.
- Triplestore technology in the context of rapidly evolving data.

Aside: Mike mentioned viewing assets in a lab

tea service take away
1. moving towards curation/publication @source is only possible because of the semantic web/grid support for annotation/provenance
2. working with people changes models
3. Interaction for provenance means integration with existing work practices
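
A rough sketch of the idea that each process (Reaction, Workup, Purify, Analysis) has a data output, written as plain subject/predicate/object triples. This is only an illustration under stated assumptions: the `http://example.org/aether/` namespace, the `aether:hasOutput` and `aether:usesInput` predicates, and the UUID-based URI scheme are all hypothetical and are not the project's actual vocabulary or implementation. It shows one possible answer to "where does the data go?": every output gets its own URI, so the path behind any data item can be walked backwards.

```python
import uuid

# Hypothetical namespace and predicate names, invented for this sketch.
AETHER = "http://example.org/aether/"
STAGES = ["Reaction", "Workup", "Purify", "Analysis"]


def mint_uri(kind):
    """Mint a unique URI for a process or data item (illustrative scheme only)."""
    return f"{AETHER}{kind}/{uuid.uuid4()}"


def record_run(stages=STAGES):
    """Return (triples, process_uris) describing one run of an experiment.

    Every process gets its own data output, and every process after the
    first takes the previous process's output as its input.
    """
    triples = []
    process_uris = []
    previous_output = None
    for name in stages:
        process = mint_uri("process")
        output = mint_uri("data")
        triples.append((process, "rdf:type", AETHER + name))
        triples.append((process, "aether:hasOutput", output))
        if previous_output is not None:
            triples.append((process, "aether:usesInput", previous_output))
        process_uris.append(process)
        previous_output = output
    return triples, process_uris


def provenance_of(data_uri, triples):
    """Walk back from a data item to the ordered stage names that led to it."""
    produced_by = {o: s for s, p, o in triples if p == "aether:hasOutput"}
    inputs = {s: o for s, p, o in triples if p == "aether:usesInput"}
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    chain = []
    while data_uri in produced_by:
        process = produced_by[data_uri]
        chain.append(types[process][len(AETHER):])
        data_uri = inputs.get(process)  # None for the first stage, which ends the walk
    return list(reversed(chain))


triples, processes = record_run()
final_output = next(
    o for s, p, o in triples
    if s == processes[-1] and p == "aether:hasOutput"
)
print(provenance_of(final_output, triples))
# prints ['Reaction', 'Workup', 'Purify', 'Analysis']
```

Because no one run is the same as any other, a fresh set of URIs is minted per run rather than reusing identifiers per stage; whether that matches the shared-URI scheme the project has in mind is an open question here.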