myExperiment – A Web 2.0 Virtual Research Environment
David De Roure
Carole Goble
Overview
• e-Science is about scientists doing science
– A Tale of Two Projects
• myExperiment
• Design Patterns for a VRE
NeSC VRE Workshop
26/2/2007 | myExperiment | Slide 2
CombeChem pilot project
[Diagram labels: Video; Simulation; Diffractometer; Properties; Analysis; Structures Database; X-Ray e-Lab; Properties e-Lab; Grid Middleware]
www.combechem.org
[Diagram labels: Virtual Learning Environment; Undergraduate Students; Graduate Students; E-Scientists; E-Experimentation; Digital Library; Reprints; Peer-Reviewed Journal & Conference Papers; Technical Reports; Preprints & Metadata; Publisher Holdings; Institutional Archive; Local Web; Certified Experimental Results & Analyses; Data, Metadata & Ontologies; Reducing time-to-experiment]
http://www.ukoln.ac.uk/projects/ebank-uk/
Entire e-Science Cycle
Encompassing experimentation, analysis, publication, research, learning
Provenance
• The key observation: the details of the origins of data are just as important to understanding as their actual values.
• “Publication at Source” describes the need to capture data and its context from the outset and to maintain a complete end-to-end connection between the laboratory bench and the intellectual chemical knowledge that is published as a result of the investigation.
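The observation above can be illustrated with a minimal sketch. It assumes a hypothetical `Observation` record; the field names and sample values are invented for illustration and are not taken from CombeChem or myGrid. The point is only that a value is stored together with the context in which it was produced, so the link back to the bench survives publication.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """A measured value that never travels without its origin (hypothetical schema)."""
    value: float
    unit: str
    instrument: str   # which instrument produced the reading
    operator: str     # who ran the experiment
    sample_id: str    # which physical sample was measured
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def citation(self) -> str:
        # A published figure can point back to the bench through this string.
        return (f"{self.value} {self.unit} "
                f"(sample {self.sample_id}, {self.instrument}, {self.operator})")


# Illustrative values only.
reading = Observation(1.54, "angstrom", "diffractometer-1", "chemist-a", "sample-42")
print(reading.citation())
```

Making the record immutable (`frozen=True`) is one possible design choice: the context captured at the outset cannot be silently edited later.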
My Chemistry Experiment
Box of Chemists
[Diagram: e-Crystals federation model. Labels: Data creation & capture in “Smart lab”; Laboratory repository; Institutional data repositories; Data curation & preservation: databases & databanks; Aggregator services; Presentation services: portals; Data discovery, linking, citation; Data analysis, transformation, mining, modelling; e-Research workflows; Publishers: peer-review journals, conference proceedings; Publication; (Chemistry Central); arrows labelled Deposit, Harvest, Search, harvest, Validation, Linking, citation]
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
Bioinformatics is not Chemistry
There are many pieces, from many boxes, but no box, and no lid with a complete picture of what the puzzle is supposed to be.
• Planning? No.
• Metadata an afterthought
myGrid
• Open Source middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results.
• Machinery for linking together datasets and tools
• Individual scientists, in under-resourced labs, who use other people’s datasets and applications.
• Ad hoc & exploratory workflows (data flows)
• To support sharing and collaboration between scientists to disseminate best practice and improve the quality of science
• 33,000 downloads; 200+ user sites; 400+ workflows
• 3500 third-party external services accessible.
• Moved from prototype to production quality.
• Open Middleware Infrastructure Institute UK
• http://www.mygrid.org.uk
Taverna Workflow Workbench
Widespread Adoption
• Users in US, Asia, UK, Europe, Australia
• Systems biology
• Proteomics
• Gene/protein annotation
• Microarray data analysis
• Medical image analysis
• Heart simulation orchestration
• High throughput screening of chemical compounds
• Phenotypical studies
• Public Health studies
• Clinical trial analysis
• Plants, Mouse, Human
• Astronomy
• Cultural Heritage
Recycling, Reuse, Repurposing
• Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.
• Manual analysis of the microarray and QTL data failed to identify this gene as a candidate.
• Repetitive, unbiased analysis.
• Trypanosomiasis cattle workflow reused without change to identify the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
• Previously a manual two-year study of candidate genes had failed to do this.
Paul Fisher et al., “A Systematic Strategy for Large-Scale Unbiased Analysis of Genotype-Phenotype Correlations”, Bioinformatics (in review)
• Service and workflow annotation
• Ontology: 710 classes
• Full-time curator
• Tagging by the masses
• 3500 services, 350 curated
• Provenance
• Ontology: 35 classes
• Enriched with domain ontologies and service ontologies. Possibly.
• Export with data. Desirably.
New Scientific Digital Artefacts
Design
• Workflow design history
• Experiment purpose
• Scientist
LogBook
• Workflow run log
• Data lineage
• Results interpretation log
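A workflow run log and data lineage are closely related: if each run-log entry records which inputs a step consumed and which output it produced, lineage can be recovered by walking the log backwards. The sketch below assumes a hypothetical `RunLogEntry` shape and invented step names; it is not Taverna's actual log format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunLogEntry:
    """One step of a workflow run (hypothetical shape, for illustration only)."""
    step: str
    inputs: tuple[str, ...]
    output: str


def lineage(item: str, log: list[RunLogEntry]) -> list[str]:
    """Return the steps that contributed to `item`, starting from the step that produced it."""
    produced_by = {entry.output: entry for entry in log}
    steps: list[str] = []
    seen: set[str] = set()
    pending = [item]
    while pending:
        current = pending.pop()
        entry = produced_by.get(current)
        if entry is None or entry.step in seen:
            continue  # raw input data, or a step already recorded
        seen.add(entry.step)
        steps.append(entry.step)
        pending.extend(entry.inputs)
    return steps


# Invented example run: three steps feeding one report.
log = [
    RunLogEntry("fetch_sequences", ("gene_ids",), "sequences"),
    RunLogEntry("align", ("sequences",), "alignment"),
    RunLogEntry("annotate", ("alignment", "pathway_db"), "report"),
]
print(lineage("report", log))
```

Items with no producing step (here `gene_ids` and `pathway_db`) are treated as raw inputs, which is where the trace stops.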
New digital artefacts
Kepler
Triana
myExperiment.org Portal Party
• 28th & 29th Sept 2006
• Hand-picked Taverna users + Taverna development team
• Facilitated by NCeSS.
• AJAX-based development
• CombeChem xfer
1. A social networking environment for sharing any workflow
2. A Taverna workflow run environment
3. A multi-workflow launch environment
openwetware.org
What are we trying to do?
• Enabling scientists to be (more) creative.
• Enabling scientists to be scientists. And not programmers.
• Enabling mediocre scientists to become better and thus have better science.
• Enabling smart scientists to be smarter and propagate their smartness.
• Accelerate dissemination, pooling, insight.
• Encouraging sanctioned plagiarism.
Principles
• Focus on making it easy to publish information
– Discovering and sharing experimental artefacts
– Publishing results to standard community repositories
– Publishing scholarly output
• Familiar social networking / web paradigms
– Keeping it free and fluid and creative. Me-Science.
• Crossing system boundaries
– Trans-workflow
• Crossing discipline boundaries
– Multi-disciplinary, Inter-disciplinary, Trans-disciplinary
– Clustering expertise
– Intellectual fusion outside discipline. We-Science.
– Life Science, Social Science, Astronomy, Chemistry
Scoping exercise
• Workflow warehouse / federation of repositories: Open Archives Initiative. Federated myExperiments. Sharepoint.
• Social space + organised rich site: social discourse + organised service / workflow space using curated semantics.
• Granularity and identifiers: rolling-up provenance. Id resolution.
• Open vs protected content: Quality, Reliability, Validation, Safety, Intellectual Property, Ownership, Secrecy, a duty of guardianship. Curation? Policing? Local data mixed with shared resources.
• Desktop integration: Google gadgets for workflows. Interacting with workflows through Office products.
• Workflow execution (WHIP): Workflows Hosted in Portals project.
• Evolving the myExperiment software: community development.
• Enabling Scientists: added value through applications and collaborative tagging.
Hack Fest
Q1. Workflow Warehouse or Federation of Repositories?
• Everything on the myExperiment.org web site
vs
• Distributed stores
• Multiple myExperiments
Q2. Social Space or Shoe Shop?
• Shopping for Workflows and Services and Data should be as easy as shopping for shoes.
• Organic growth is good and bad.
• Social tagging might help discover workflows but we need good metadata for automated use.
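The contrast between social tagging and curated metadata can be sketched as follows. Everything here is an assumption made for illustration: the `WorkflowEntry` record, its field names, and the catalogue contents are hypothetical and do not describe the myExperiment data model. Free-form tags are good enough for a person browsing, while a program that wants to chain workflows needs typed, curated fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowEntry:
    """A catalogue record (hypothetical): free-form tags plus curated type fields."""
    title: str
    tags: set[str] = field(default_factory=set)  # added by any user, uncontrolled
    input_type: str = ""                         # curated, controlled vocabulary
    output_type: str = ""                        # curated, controlled vocabulary


def browse_by_tag(entries: list[WorkflowEntry], tag: str) -> list[str]:
    """Human discovery: case-insensitive match on whatever users typed."""
    wanted = tag.lower()
    return [e.title for e in entries if wanted in {t.lower() for t in e.tags}]


def chainable(entries: list[WorkflowEntry], produced_type: str) -> list[str]:
    """Automated use: only the curated input type says what a workflow can consume."""
    return [e.title for e in entries if e.input_type == produced_type]


# Invented catalogue contents.
catalogue = [
    WorkflowEntry("QTL pathway finder", {"Mouse", "pathways"}, "gene_list", "pathway_list"),
    WorkflowEntry("Microarray normaliser", {"mouse", "microarray"}, "expression_matrix", "gene_list"),
]
print(browse_by_tag(catalogue, "mouse"))
print(chainable(catalogue, "gene_list"))
```

Both entries are found by the tag "mouse" despite inconsistent capitalisation, but only the curated `input_type` tells a program which workflow can follow one that outputs a gene list.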
Q3. How open is the content?
• OpenWetware is open
• Our users don’t want this
• Provenance helps
Q4. Integration
• Bringing the user to the web site
vs
• Bringing myExperimentness to existing interfaces
Web 2.0 Design Patterns
1. The Long Tail
2. Data is the Next Intel Inside
3. Users Add Value
4. Network Effects by Default
5. Some Rights Reserved
6. The Perpetual Beta
7. Cooperate, Don't Control
8. Software Above the Level of a Single Device
• http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
1. The Long Tail
• Our target users are not just the specialist e-Scientists using computing resources to tackle major scientific breakthroughs, but also the large number of scientists conducting the routine processes of science on a daily basis.
• Through sharing we have the potential to enable smart scientists to be smarter and propagate their smartness, in turn enabling other scientists to become better and conduct better science.
2. Data is the Next “Intel Inside”
• myExperiment understands that scientists are focused on data, not software or one particular workflow engine.
• Workflows are components of customised applications, many of which are data-oriented rather than process-oriented.
• Users manipulate, through their own applications, the product (data, model) yielded by the workflow.
• Furthermore, workflows themselves are the data of myExperiment and provide its unique value.
3. Users Add Value
• myExperiment makes it easy to find workflows and is designed to make it useful and straightforward to share workflows and add workflows to the pool.
• To succeed we draw on the insights into the incentive models of scientists gained through experience with Taverna.
4. Network Effects by Default
• myExperiment aggregates user data as a side-effect of using the VRE.
• The ability to execute workflows from myExperiment, and the integration of tools such as Taverna with myExperiment, further enable us to achieve increased value through usage.
5. Some Rights Reserved
• myExperiment users require protection as well as sharing, but the environment is designed for maximum ease of sharing to achieve collective benefits – workflows are "hackable" and "remixable".
• Initiatives such as Science Commons provide a useful context for this.
6. The Perpetual Beta
• myExperiment is an online service (a collection of online services) and is continually evolving in response to its users.
• To support this, the project commenced with developers being embedded in the user community.
• Through day-to-day contact between designers and researchers, design is both inspired and validated.
7. Cooperate, Don't Control
• myExperiment is a network of cooperating data services with simple interfaces which make it easy to work with content.
• It both provides services and reuses the services of others.
• It aims to support lightweight programming models so that it can easily be part of loosely coupled systems.
8. Software Above the Level of a Single Device
• The current model of Taverna running on the scientist’s desktop PC or laptop is evolving into myExperiment being available through a variety of interfaces and supporting workflow execution.
Closing
• e-Science is difficult – workflows and Web 2.0 make it easier.
• Our design workshops and the review against Web 2.0 design patterns have revealed the relationship between myExperiment and Web 2.0.
• The collective benefits of participation arise not only from the users but also from the developers – ease of use and ease of development.
• It might be useful to review other VREs against the design patterns.
Take homes
• myExperiment is a Web 2.0 Environment for Scientists to share experiments
• Join us!
• David De Roure
– dder@ecs.soton.ac.uk
• Carole Goble
– carole.goble@manchester.ac.uk
Credits
• myGrid and CombeChem
• Matt Lee
• David Withers
• Don Cruickshank
• Rob Procter
• Alex Voss
• June Finch
• Ed Zaluska
• All the users, including embedders