myExperiment – A Web 2.0 Virtual Research Environment
David De Roure
Carole Goble

Overview
e-Science is about scientists doing science
– A Tale of Two Projects
myExperiment
Design Patterns for a VRE

NeSC VRE Workshop 26/2/2007 | myExperiment | Slide 2

CombeChem pilot project
[Diagram labels: Video; Simulation; Diffractometer; Properties; Analysis; Structures Database; X-Ray e-Lab; Properties e-Lab; Grid Middleware]
www.combechem.org

[Diagram labels: Virtual Learning Environment; Undergraduate Students; Digital Library; E-Scientists; Reprints; Peer-Reviewed Journal & Conference Papers; Reducing time-to-experiment; Technical Reports; Preprints & Metadata; E-Experimentation; Publisher Holdings; Institutional Archive; Local Web; Certified Experimental Results & Analyses; Graduate Students; Data, Metadata & Ontologies]
http://www.ukoln.ac.uk/projects/ebank-uk/
Entire e-Science Cycle
Encompassing experimentation, analysis, publication, research, learning

Provenance
The key observation! The details of the origins of data are just as important to understanding as their actual values.
“Publication at Source” describes the need to capture data and its context from the outset and maintain a complete end-to-end connection between the laboratory bench and the intellectual chemical knowledge that is published as a result of the investigation.

My Chemistry Experiment
Box of Chemists

[Diagram labels: Data creation & capture in “Smart lab”; Presentation services: portals; Data discovery, linking, citation; Data analysis, transformation, mining, modelling; Search, harvest; Aggregator services; Harvest; Deposit; e-Research workflows; e-Crystals Federation model; Data curation & preservation: databases & databanks; Institutional data repositories; Laboratory repository; Deposit; Validation; Publication; Validation (Chemistry Central); Linking, citation; Publishers: peer-review journals, conference proceedings]
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0

Bioinformatics is not Chemistry
There are many pieces, from many boxes, but no box, and no lid with a complete picture of what the puzzle is supposed to be.
Planning? No.
Metadata an afterthought

myGrid
Open Source middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results.
Machinery for linking together datasets and tools
Individual scientists, in under-resourced labs, who use other people’s datasets and applications.
Ad hoc & exploratory workflows (data flows)
To support sharing and collaboration between scientists to disseminate best practice and improve the quality of science
33,000 downloads; 200+ user sites; 400+ workflows; 3500 third party external services accessible.
Moved from prototype to production quality.
Open Middleware Infrastructure Institute UK
http://www.mygrid.org.uk

Taverna Workflow Workbench

Widespread Adoption
Users in US, Asia, UK, Europe, Australia
Systems biology
Proteomics
Gene/protein annotation
Microarray data analysis
Medical image analysis
Heart simulation orchestration
High throughput screening of chemical compounds
Phenotypical studies
Public Health studies
Clinical trial analysis
Plants, Mouse, Human
Astronomy
Cultural Heritage

Recycling, Reuse, Repurposing
Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.
Manual analysis on the microarray and QTL data failed to identify this gene as a candidate.
Repetitive, unbiased analysis.
Trypanosomiasis cattle workflow reused without change to identify the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
Previously a manual two year study of candidate genes had failed to do this.
Paul Fisher et al., A Systematic Strategy for Large-Scale Unbiased Analysis of Genotype-Phenotype Correlations, Bioinformatics, in review

Service and workflow annotation
Ontology 710 classes
Full time curator
Tagging by the masses
3500 services, 350 curated
Provenance Ontology 35 classes
Enriched with domain ontologies and service ontologies. Possibly.
Export with data. Desirably.

New Scientific Digital Artefacts
Design
Workflow design history
Experiment purpose
Scientist LogBook
Workflow run log
Data lineage
Results interpretation log

New digital artefacts
Kepler
Triana

myExperiment.org
Portal Party 28th & 29th Sept 2006
Hand picked Taverna users + Taverna development team
Facilitated by NCeSS.
AJAX based development
CombeChem xfer
1. A social networking environment for sharing any workflow
2. A Taverna workflow run environment
3. A multi-workflow launch environment

openwetware.org

What are we trying to do?
Enabling scientists to be (more) creative.
Enabling scientists to be scientists. And not programmers.
Enabling mediocre scientists to become better and thus have better science.
Enabling smart scientists to be smarter and propagate their smartness.
Accelerate dissemination, pooling, insight.
Encouraging sanctioned plagiarism.

Principles
Focus on making it easy to publish information
– Discovering and sharing experimental artefacts
– Publishing results to standard community repositories
– Publishing scholarly output
Familiar social networking / web paradigms
– Keeping it free and fluid and creative. Me-Science.
Crossing system boundaries
– Trans-workflow
Crossing discipline boundaries
– Multi-disciplinary, Inter-disciplinary, Trans-disciplinary
– Clustering expertise
– Intellectual fusion outside discipline. We-Science.
– Life Science, Social Science, Astronomy, Chemistry

Scoping exercise
Workflow warehouse / federation of repositories
Open Archives Initiative. Federated myExperiments. Sharepoint.
Social space + organised rich site
Social discourse + organised service / workflow space using curated semantics.
Granularity and identifiers
Rolling-up provenance. Id resolution
Open vs protected content
Quality, Reliability, Validation, Safety, Intellectual Property, Ownership, Secrecy, A duty of guardianship. Curation? Policing?
Local data mixed with shared resources
Desktop integration
Google gadgets for workflows. Interacting with workflows through Office products.
Workflow execution
Workflows Hosted in Portals (WHIP) project
Evolving the myExperiment software
Community development
Enabling Scientists added value through applications and collaborative tagging

Hack Fest

Q1. Workflow Warehouse or Federation of Repositories?
Everything on the myExperiment.org web site vs Distributed stores
Multiple myExperiments

Q2. Social Space or Shoe Shop?
Shopping for Workflows and Services and Data should be as easy as shopping for shoes.
Organic growth is good and bad.
Social tagging might help discover workflows but we need good metadata for automated use.

Q3. How open is the content?
OpenWetware is open
Our users don’t want this
Provenance helps

Q4. Integration
Bring user to Web Site vs Bringing myExperimentness to existing interfaces

Web 2.0 Design Patterns
1. The Long Tail
2. Data is the Next Intel Inside
3. Users Add Value
4. Network Effects by Default
5. Some Rights Reserved
6. The Perpetual Beta
7. Cooperate, Don't Control
8. Software Above the Level of a Single Device
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

1. The Long Tail
Our target users are not just the specialist e-Scientists using computing resources to tackle major scientific breakthroughs, but also the large number of scientists conducting the routine processes of science on a daily basis. Through sharing we have the potential to enable smart scientists to be smarter and propagate their smartness, in turn enabling other scientists to become better and conduct better science.

2. Data is the Next “Intel Inside”
myExperiment understands that scientists are focused on data, not software or one particular workflow engine. Workflows are components of customised applications, many of which are data-oriented rather than process-oriented. Users manipulate, through their own applications, the product (data, model) yielded by the workflow. Furthermore, workflows themselves are the data of myExperiment and provide its unique value.

3. Users Add Value
myExperiment makes it easy to find workflows and is designed to make it useful and straightforward to share workflows and add workflows to the pool. To succeed we draw on the insights into the incentive models of scientists gained through experience with Taverna.

4. Network Effects by Default
myExperiment aggregates user data as a side-effect of using the VRE. The ability to execute workflows from myExperiment, and the integration of tools such as Taverna with myExperiment, further enable us to achieve increased value through usage.

5. Some Rights Reserved
myExperiment users require protection as well as sharing, but the environment is designed for maximum ease of sharing to achieve collective benefits – workflows are "hackable" and "remixable". Initiatives such as Science Commons provide a useful context for this.

6. The Perpetual Beta
myExperiment is an online service (a collection of online services) and is continually evolving in response to its users. To support this, the project commenced with developers being embedded in the user community. Through day-to-day contact between designers and researchers, design is both inspired and validated.

7. Cooperate, Don't Control
myExperiment is a network of cooperating data services with simple interfaces which make it easy to work with content. It both provides services and reuses the services of others. It aims to support lightweight programming models so that it can easily be part of loosely coupled systems.

8. Software Above the Level of a Single Device
The current model of Taverna running on the scientist’s desktop PC or laptop is evolving into myExperiment being available through a variety of interfaces and supporting workflow execution.

Closing
e-Science is difficult – workflows and Web 2.0 make it easier.
Our design workshops and the review against Web 2.0 design patterns have revealed the relationship between myExperiment and Web 2.0.
The collective benefits of participation arise not only from the users but also from the developers – ease of use and ease of development.
It might be useful to review other VREs against the design patterns.

Take homes
myExperiment is a Web 2.0 Environment for Scientists to share experiments
Join us!
David De Roure – dder@ecs.soton.ac.uk
Carole Goble – carole.goble@manchester.ac.uk

Credits
myGrid and CombeChem
Matt Lee
David Withers
Don Cruickshank
Rob Procter
Alex Voss
June Finch
Ed Zaluska
All the users inc. embedders