presented by mc schraefel
Crafted by Andrew Gibson, Robert Stevens, Sacha Brostoff, Ray Cooke, mc
This is a project about Human-Computer Interaction in eScience
This is a story about indirect interaction to support eScience
It is about supporting the ad hoc, associated and asynchronous in carrying out eScience, with particular attention to bioinformatics
The associated context - related projects
The problem context
Approach for solution
where we are so far
where we’ll be by project end
where we’d like to go from there
The myTea project
1yr EPSRC funded project
Extend expertise from Smart Tea and myGrid
Develop the principles and framework for a smart application for bioinformaticians
Southampton
m.c. schraefel – HCI
Sacha Brostoff – Usability
Ray Cooke – CS
Manchester
Robert Stevens – Biology / CS
Andrew Gibson – Bioinformatics
Co-investigators: David De Roure, Jeremy Frey (Southampton), Carole Goble (Manchester), Christopher Greenhalgh (Nottingham)
www.smarttea.org
Smart Tea – enabling eScience for chemists
Making tea
An analogy of chemistry practice
Improved communication between chemists and computer scientists
Chemists in the lab still use lab books
Overcome reluctance to change established practice
Develop an integrated application that:
Replaces paper
Provides new opportunities for chemists’ interaction with the lab
www.mygrid.org.uk
myGrid – Middleware for bioinformaticians
A set of connected bioinformatics tasks
Repetitive and time consuming
Uses multiple resources on the web
Data passed between services needs manipulation
Difficult to record provenance data
myGrid provides workflows
Encapsulates such a set of experiments
Automatically records provenance data
Saves much time and effort
Enabled for collaboration
Linking web services via workflows
Big usability boon
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Hannah Tipney – ISMB 2004
myGrid
Metadata capture
Automate tedious processes
Virtual collaborative environment
Cannot capture thoughts/hypotheses
Cannot capture short bespoke experiments
Smart Tea
Allows Scientist to record notes, digitally, in Lab
Does not integrate with bioinformaticians’ daily activities (since there’s not usually a “lab” per se)
Explaining Bioinformatics to Computer Scientists
Not a particularly good question
Asked by many non-bioinformaticians
Is however the standard intro for bioinformatics talks
Better questions to ask
What do you need to do bioinformatics?
Data
Tools
People
How do bioinformaticians work?
As with chemistry, this needs an analogy…
www.mytea.ecs.soton.ac.uk
The Premises quickly dismissed
Bioinformaticians are scientists
Scientists follow the traditional scientific approach
Hypothesis, Materials and Methods, Results, Conclusions
Scientists use a lab book to record this process
The scope of a bioinformatics project can be large
Wet Lab experiments are carefully planned
Literature searches are a prerequisite
Hypotheses and methods consume time
These and results are carefully recorded in lab books
The tags assigned to a “lab experiment” are more like the parameters of a bioinformatics project
Bioinformatics experiments are comparatively cheap
A hypothesis conceived on the bus can be tested by lunch time
A “dry run” is often all that is needed to assess potential
Ad hoc hypothesis testing
Making notes is inefficient
Good notes could take as long as the experiment
Types of data difficult to record
URLs used, Large chunks of raw data, Parameters
Repeatability
The cost of re-running an experiment is small
Results stored as files
Screen captures, Copy and paste – Word documents
Plethora of other types of file
“Dude, where’s my file”
Logical structure for linking files
Absolute minimum user interaction
“What was it called?”
Additional naming convention
If these critical criteria are not met:
“I can’t find the data” OR
“I can’t remember the context” SO
“I will just run the experiment again”
Established cultures
“I like to use this particular program and method to manipulate my data”
“I have written my own program to manipulate some data”
If this critical issue is not addressed:
“I can’t use my favourite/own application to do this so why bother?”
This is not a new finding - but it’s certainly current here.
Metadata capture
Provenance data
Tracking user modifications to data
“I have examined this data”
“I have rejected this section of the data because…”
“When did I retrieve this data”
Checking for data modification
“Has this data been modified at the source since I ran my experiment”
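One way the "modified at the source" check could work is to store a fingerprint of the data at retrieval time and compare it later. This is a minimal sketch only, not the myTea or myGrid implementation; `ProvenanceRecord`, `record_retrieval` and `modified_since` are hypothetical names introduced for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """What was retrieved, from where, and when (field names are illustrative)."""
    source: str        # e.g. a URL or database accession
    retrieved_at: str  # ISO 8601 timestamp in UTC
    sha256: str        # fingerprint of the data as retrieved


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw data."""
    return hashlib.sha256(data).hexdigest()


def record_retrieval(source: str, data: bytes) -> ProvenanceRecord:
    """Answer "when did I retrieve this data?" by recording it at retrieval time."""
    now = datetime.now(timezone.utc).isoformat()
    return ProvenanceRecord(source, now, fingerprint(data))


def modified_since(record: ProvenanceRecord, current_data: bytes) -> bool:
    """True if the data now at the source differs from what was retrieved."""
    return fingerprint(current_data) != record.sha256
```

The record is created without any user interaction; only the comparison needs a fresh copy of the source data.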
Status
Failed short-term experiments often not recorded
Status indicators – e.g. “Complete/Incomplete”
Requires:
A slight change in behaviour
User interaction
Report Generation
“I am running a subset of experiments”
“I want to see a summary of how my processes of (x, y, z) are doing”
OR
“I want to show my supervisor a summary of what I have been working on lately”
Status comes into play
Is it worth the time cost?
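A summary of "how my processes of (x, y, z) are doing" could be as small as a count of status indicators per process. The sketch below assumes each experiment carries a process label and a status; the `Experiment` class and the status values are illustrative assumptions, not part of the project.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    process: str  # hypothetical grouping, e.g. "x", "y" or "z"
    status: str   # e.g. "Complete", "Incomplete" or "Failed"


def summarise(experiments):
    """Group experiments by process and count each status."""
    summary = {}
    for exp in experiments:
        summary.setdefault(exp.process, Counter())[exp.status] += 1
    return summary


def render(summary):
    """Render the summary as one plain-text line per process."""
    lines = []
    for process in sorted(summary):
        counts = summary[process]
        parts = ", ".join(f"{status}: {counts[status]}" for status in sorted(counts))
        lines.append(f"{process} - {parts}")
    return "\n".join(lines)
```

Recording failed short-term experiments with a status of their own is what makes them show up in such a report at all.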
In Smart Tea, we put the digital book into the lab
Moving from paper interaction to digital interaction
In myTea, as myGrid shows, everything is digital.
Challenge:
Bringing the aspects of the book - rapid review, sharable context - into the digital scene
Dirty glassware stuff
Shared bench
Log book of another chemist
Multiple chemists concurrently working in the lab
Our proposed approach:
Ubiquitous computing: transparent interaction
Trace context histories of I/O
Auto-generate metadata and support manual
Reports (like lab books) on demand
Similar but different to other file managers and digital note book software.
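Tracing context histories of I/O could be sketched as a log of read and write events that is later queried for the files in use around the time a result was written. This is an assumption-laden illustration: the event fields, the `ContextHistory` class and the five-minute window are all hypothetical, and real low-level capture would be platform-specific.

```python
from dataclasses import dataclass, field


@dataclass
class IOEvent:
    timestamp: float  # seconds since an arbitrary epoch
    action: str       # "read" or "write"
    path: str
    application: str  # whichever program performed the I/O


@dataclass
class ContextHistory:
    events: list = field(default_factory=list)

    def log(self, timestamp, action, path, application):
        """Append one observed I/O event; no user interaction required."""
        self.events.append(IOEvent(timestamp, action, path, application))

    def context_of(self, path, window=300.0):
        """Files read in the `window` seconds before `path` was last written."""
        writes = [e for e in self.events
                  if e.action == "write" and e.path == path]
        if not writes:
            return []
        last = max(writes, key=lambda e: e.timestamp)
        return [e.path for e in self.events
                if e.action == "read"
                and last.timestamp - window <= e.timestamp <= last.timestamp]
```

Because the log records whichever application did the I/O, it does not require users to give up their favourite or home-grown tools.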
“Can I trust myTea to capture everything I want it to?”
With careful design, users can be persuaded
myGrid test user – Hannah Tipney
1st run of myGrid workflow not only captured everything expected
Also produced results missed by manual process
Now has complete confidence in letting myGrid do its work
Macro-Collaboration
Conference Contacts
Other Institution
Institution
Other Group
Group
Supervisor
Project
Micro-Collaboration
Controlling the level of interaction
Handling of privileged data
“Who can see what I am doing”
“Who needs to see what I’m doing?”
Virtual Collaborative Environment
Mainly asynchronous
Quick sharing of subsets of work for focus
(as if ripping out chunks of lab books)
Getting the views right:
How to preview the data associated with reports?
Support post-hoc annotation
Watch “file deterioration” - inference
Support naming conventions
[Diagram: nodes labelled S, C1, C2 and N]
Significant coding; platform-specific
Low level I/O
BUT
Can tie in semantic layer for metatagging
Can tie into associated resources
Scenarios with specific types of bioinformatician co-strand:
I/O track for artifact discussion
what is useful from this list?
what is not?
how might you use this?
Visualization/representation
assuming results, how could they be interrogated?
Related Activities
Main workshop proposed for ECSCW
Special Issue of IJHCS for usability in eSci collaborative with UTF
Project End - projection
Prototype for field deployment and assessment
Initial formative studies of use, to refine the prototype and improve understanding of transparent interaction for individual work and asynchronous collaboration.
Where we want to go
General model of transparent interaction
Understanding of multilayering reports/pubs
Investigate trust/sharing/separation, networked vs local: no small cultural hurdle
How to enhance asynchronous collaboration?
How to integrate with services?
HCI Usability Summer School, UTF
June 6-9, here, NeSC
FREE
Students on eSci projects - cost covered
Register at NeSC site: http://www.nesc.ac.uk/esi/events/542/
Thanks to Dave for the invitation.
EPSRC, myGrid and Comb-e-Chem
Explaining Bioinformatics to Computer Scientists:
Scientists are from Venus and Computer Scientists are from Mars
Mars vs Venus
“Not my problem”
Over-complication
Size matters
“Mother knows best”
Fin
“Suits me”
Venus vs Mars
“The parent principle”
“It works cos I say so”
Short-termism
Isolationism
Carole Goble, Presented at SIAM 2005
May be about supporting context of activity:
The Tsunami lab book: everything in one zone
Smart Tea lab book: leveraging what’s there to reduce management during experiment
Chemistry Crystallography: bringing together views of multiple machines into one file format for comparative analysis
Bioinformatics: making it possible to recover sensibly what’s already there.