myTea: Bringing eScience into Bioinformatics Practice presented by mc schraefel

Crafted by Andrew Gibson, Robert Stevens, Sacha Brostoff, Ray Cooke, mc schraefel

Introduction

This is a project about Human Computer Interaction in eScience

This is a story about indirect interaction to support eScience

It is about supporting the ad hoc, associated and asynchronous aspects of carrying out eScience, with particular attention to bioinformatics

Overview

The associated context - related projects

The problem context

Approach for solution

Where we are so far

Where we’ll be by project end

Where we’d like to go from there

myTea

The myTea project

One-year EPSRC-funded project

Extend expertise from Smart Tea and myGrid

Develop the principles and framework for a smart application for bioinformaticians

Southampton

mc schraefel – HCI

Sacha Brostoff – Usability

Ray Cooke – CS

Manchester

Robert Stevens – Biology / CS

Andrew Gibson – Bioinformatics

Co-investigators: David De Roure, Jeremy Frey (Southampton), Carole Goble (Manchester), Christopher Greenhalgh (Nottingham)

Smart Tea

www.smarttea.org

Smart Tea – enabling eScience for chemists

Making tea

An analogy of chemistry practice

Improved communication between chemists and computer scientists

Chemists in the lab still use lab books

Overcome reluctance to change established practice

Develop an integrated application that:

Replaces paper

Provides new opportunities for chemists’ interaction with the lab

The Language of Tea

myGrid

www.mygrid.org.uk

myGrid – Middleware for bioinformaticians

A set of connected bioinformatics tasks

Repetitive and time consuming

Uses multiple resources on the web

Data passed between services needs manipulation

Difficult to record provenance data

myGrid provides workflows

Encapsulates such a set of experiments

Automatically records provenance data

Saves much time and effort

Enabled for collaboration

myGrid

Linking web services via workflows

Big usability boon

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt

12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt

12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct

12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt

12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt

12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt

12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg

12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga

12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc

12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa

12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Hannah Tipney – ISMB 2004

myTea – Filling a gap

myGrid

Metadata capture

Automate tedious processes

Virtual collaborative environment

Cannot capture thoughts/hypotheses

Cannot capture short bespoke experiments

Smart Tea

Allows scientists to record notes digitally in the lab

Does not integrate with bioinformaticians’ daily activities (since there’s not usually a “lab” per se)

Explaining Bioinformatics to Computer Scientists

“What is bioinformatics?”

Not a particularly good question

Asked by many non-bioinformaticians

Is however the standard intro for bioinformatics talks

Better questions to ask

What do you need to do bioinformatics?

Data

Tools

People

How do bioinformaticians work?

As with chemistry, this needs an analogy…

<Jigsaw>

myTea

www.mytea.ecs.soton.ac.uk

The Premises quickly dismissed

Bioinformaticians are scientists

Scientists follow the traditional scientific approach

Hypothesis, Materials and Methods, Results, Conclusions

Scientists use a lab book to record this process

Bioinformatics Practice:

Ad Hoc Science

The scope of a bioinformatics project can be large

Wet Lab experiments are carefully planned

Literature searches are a prerequisite

Hypotheses and methods consume time

These and results are carefully recorded in lab books

The tags assigned to a “lab experiment” are more like the parameters of a bioinformatics project

Bioinformatics experiments are comparatively cheap

A hypothesis conceived on the bus can be tested by lunch time

A “dry run” is often all that is needed to assess potential

“What, no lab book?”

Ad hoc hypothesis testing

Making notes is inefficient

Good notes could take as long as experiment

Types of data are difficult to record

URLs used, large chunks of raw data, parameters

Repeatability

The cost of re-running an experiment is small

Results stored as files

Screen captures, copy and paste into Word documents

A plethora of other file types

“Now We’re Talking”

Pertinent solutions for bioinformatics

“Dude, where’s my file?”

Logical structure for linking files

Absolute minimum user interaction

“What was it called?”

Additional naming convention

If these critical criteria are not met:

“I can’t find the data” OR

“I can’t remember the context” SO

“I will just run the experiment again”
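The slides do not specify myTea’s logical structure for linking files or its naming convention. As a minimal sketch, assuming a hypothetical scheme of the form project_date_experiment_step.ext, the Python below shows how a consistent name answers “What was it called?” and lets related files be grouped again by parsing the name back into its parts. The scheme and the names slug, conventional_name and parse_name are illustrative only.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical naming scheme (an illustration, not taken from myTea):
#   <project>_<YYYY-MM-DD>_<experiment>_<step>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9-]+)_(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<experiment>[a-z0-9-]+)_(?P<step>\d{2,})\.(?P<ext>\w+)$"
)


def slug(text: str) -> str:
    """Lower-case text and replace each run of other characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def conventional_name(project: str, experiment: str, step: int,
                      ext: str, on: Optional[date] = None) -> str:
    """Build a file name that follows the hypothetical scheme above."""
    on = on or date.today()
    return f"{slug(project)}_{on.isoformat()}_{slug(experiment)}_{step:02d}.{ext.lstrip('.')}"


def parse_name(filename: str) -> Optional[dict]:
    """Recover the parts of a conforming name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    name = conventional_name("Gene family study", "BLAST vs mouse", 3, "txt",
                             on=date(2005, 6, 6))
    print(name)  # gene-family-study_2005-06-06_blast-vs-mouse_03.txt
    print(parse_name(name))
```

Because every part of the name is recoverable, files from the same project or experiment can be linked by matching fields rather than by remembering what each file was called.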

“Now We’re Talking”

Pertinent solutions for bioinformatics

Established cultures

“I like to use this particular program and method to manipulate my data”

“I have written my own program to manipulate some data”

If this critical issue is not addressed:

“I can’t use my favourite/own application to do this, so why bother?”

This is not a new finding, but it certainly applies here.

“Now We’re Talking”

Extended solutions for bioinformatics

Metadata capture

Provenance data

Tracking user modifications to data

“I have examined this data”

“I have rejected this section of the data because…”

“When did I retrieve this data?”

Checking for data modification

“Has this data been modified at the source since I ran my experiment?”
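The slides do not say how myTea would detect changes at the source. One common approach, sketched here under that assumption, is to store a content digest (SHA-256 from Python’s hashlib) together with the retrieval time, then compare digests when the data is fetched again. RetrievalRecord, record_retrieval and modified_since are hypothetical names, and fetching the current copy from the remote source is left to the caller.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RetrievalRecord:
    """Hypothetical provenance entry: what was fetched, when, and its digest."""
    source: str            # e.g. a URL or database accession (illustrative)
    retrieved_at: datetime
    sha256: str


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the retrieved bytes."""
    return hashlib.sha256(data).hexdigest()


def record_retrieval(source: str, data: bytes) -> RetrievalRecord:
    """Capture enough metadata to answer 'when did I retrieve this data?'."""
    return RetrievalRecord(source, datetime.now(timezone.utc), fingerprint(data))


def modified_since(record: RetrievalRecord, current_data: bytes) -> bool:
    """True if the source's current contents differ from what was retrieved."""
    return fingerprint(current_data) != record.sha256


if __name__ == "__main__":
    original = b">seq1\nACATTTCTACCAACAGTGGA\n"
    rec = record_retrieval("example-db:seq1", original)
    print(rec.retrieved_at.isoformat())
    print(modified_since(rec, original))               # False
    print(modified_since(rec, original + b"TGAGG\n"))  # True
```

A digest only says that the data changed, not how; showing what changed would need the earlier copy to be kept as well.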

“Now We’re Talking”

Extended solutions for bioinformatics

Status

Failed short-term experiments often not recorded

Status indicators, e.g. “Complete/Incomplete”

Requires:

A slight change in behaviour

User interaction

“Now We’re Talking”

Extended solutions for bioinformatics

Report Generation

“I am running a subset of experiments”

“I want to see a summary of how my processes of (x, y, z) are doing”

OR

“I want to show my supervisor a summary of what I have been working on lately”

Status comes into play

Is it worth the time cost?
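A minimal sketch of how Complete/Incomplete status indicators could feed an on-demand summary, assuming each short-term experiment (including failed ones) is logged with a process label and a status. The Experiment record, the Status values and summary_report are hypothetical and illustrate the idea rather than myTea’s design. Passing a set of processes gives the “how are my processes (x, y, z) doing?” view; omitting it summarises everything, e.g. for a supervisor.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Set


class Status(Enum):
    COMPLETE = "Complete"
    INCOMPLETE = "Incomplete"  # also covers failed runs that would otherwise go unrecorded


@dataclass
class Experiment:
    name: str
    process: str   # hypothetical grouping label, e.g. "x", "y" or "z"
    status: Status
    note: str = ""


def summary_report(experiments: Iterable[Experiment],
                   processes: Optional[Set[str]] = None) -> str:
    """Count experiments per status for each process, one line per process."""
    counts = defaultdict(lambda: {status: 0 for status in Status})
    for exp in experiments:
        if processes is None or exp.process in processes:
            counts[exp.process][exp.status] += 1
    lines = []
    for process in sorted(counts):
        tally = counts[process]
        lines.append(f"{process}: {tally[Status.COMPLETE]} complete, "
                     f"{tally[Status.INCOMPLETE]} incomplete")
    return "\n".join(lines)


if __name__ == "__main__":
    log = [
        Experiment("BLAST run 1", "x", Status.COMPLETE),
        Experiment("BLAST run 2", "x", Status.INCOMPLETE, "timed out"),
        Experiment("alignment", "y", Status.COMPLETE),
        Experiment("motif scan", "z", Status.INCOMPLETE),
    ]
    print(summary_report(log, {"x", "y"}))
    # x: 1 complete, 1 incomplete
    # y: 1 complete, 0 incomplete
```

The report is only as useful as the statuses behind it, which is where the slight change in behaviour and the time-cost question come in.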

Putting the Book into NoteBook

In Smart Tea, we put the digital book into the lab

Moving from paper interaction to digital interaction

In myTea, as myGrid shows, everything is digital.

Challenge:

Bringing aspects of the book (rapid review, shareable context) into the digital setting

Dirty glassware stuff

Shared bench

Log book of another chemist

Multiple chemists concurrently working in the lab

Context Histories and Transparent Interaction

Our proposed approach:

Ubiquitous computing: transparent interaction

Trace context histories of I/O

Auto-generate metadata and support manual metadata entry

Reports (like lab books) on demand

Similar to, but different from, other file managers and digital notebook software.
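The slides propose tracing context histories of I/O transparently, which in practice means low-level, platform-specific hooks. As a much simpler stand-in, the sketch below wraps Python’s built-in open in a context manager so that each file access is logged with a timestamp and auto-generated metadata (extension and size), manual annotations can be attached afterwards, and a lab-book-style report is produced on demand. ContextHistory, IOEvent, annotate and report are hypothetical names; this is application-level logging, not the transparent OS-level capture the project describes.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class IOEvent:
    path: str
    mode: str
    timestamp: float
    auto_metadata: dict
    annotations: list = field(default_factory=list)  # manual notes added later


class ContextHistory:
    """Hypothetical per-session log of file I/O, kept without extra user effort."""

    def __init__(self):
        self.events = []

    @contextmanager
    def open(self, path, mode="r", **kwargs):
        handle = open(path, mode, **kwargs)  # the built-in open
        try:
            yield handle
        finally:
            handle.close()
            # Auto-generated metadata, gathered once the file is closed.
            meta = {"extension": os.path.splitext(path)[1],
                    "size_bytes": os.path.getsize(path)}
            self.events.append(IOEvent(os.path.abspath(path), mode, time.time(), meta))

    def annotate(self, path, note):
        """Attach a manual note to the most recent event for `path`."""
        target = os.path.abspath(path)
        for event in reversed(self.events):
            if event.path == target:
                event.annotations.append(note)
                return event
        raise KeyError(path)

    def report(self):
        """Lab-book-style listing of the session, generated on demand."""
        lines = []
        for e in self.events:
            when = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(e.timestamp))
            notes = "; ".join(e.annotations) or "-"
            lines.append(f"{when}  {e.mode:<2} {e.path} {e.auto_metadata} notes: {notes}")
        return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "hits.txt")
        history = ContextHistory()
        with history.open(path, "w") as out:
            out.write("abc")
        history.annotate(path, "rejected low-scoring hits")
        print(history.report())
```

The user only has to open files through the history, and annotation stays optional, in keeping with the aim of absolute minimum user interaction.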

Challenges:

Trust in myTea

“Can I trust myTea to capture everything I want it to?”

With careful design, users can be persuaded

myGrid test user – Hannah Tipney

The first run of a myGrid workflow not only captured everything expected

It also produced results missed by the manual process

Now has complete confidence in letting myGrid do its work

Challenges:

Collaboration

Macro-Collaboration

Conference Contacts

Other Institution

Institution

Other Group

Group

Supervisor

Project

Micro-Collaboration

Challenges:

Virtual Collaborative Environments

Controlling the level of interaction

Handling of privileged data

“Who can see what I am doing”

“Who needs to see what I’m doing?”

Virtual Collaborative Environment

Mainly asynchronous

Quick sharing of subsets of work for focus

(as if ripping out chunks of lab books)

Challenges: Visualization

Getting the views right:

How do we preview the data associated with reports?

Support post-hoc annotation

Watch for “file deterioration” (inference)

Support naming conventions

Challenges: Prototyping

Significant coding; platform-specific

Low-level I/O

BUT

Can tie in semantic layer for metatagging

Can tie into associated resources

Current Approach

Scenarios with specific types of bioinformatician; co-strand:

I/O track for artifact discussion:

What is useful from this list?

What is not?

How might you use this?

Visualization/representation: assuming results, how could they be interrogated?

Current Status

Related Activities

Main workshop proposed for ECSCW

Special issue of IJHCS on usability in eScience, in collaboration with UTF

Project end: projection

Prototype for field deployment and assessment

Initial formative studies of use, to refine the prototype and improve understanding of transparent interaction for individual work and asynchronous collaboration

Future Work

Where we want to go

General model of transparent interaction

Understanding of multilayering reports/pubs

Investigate trust, sharing and separation: networked vs local (no small cultural hurdle)

How can we enhance asynchronous collaboration?

How can we integrate with services?

ADVERT

HCI Usability Summer School, UTF

June 6-9, here, NeSC

FREE

Students on eSci projects - cost covered

Register at the NeSC site: http://www.nesc.ac.uk/esi/events/542/

Acknowledgements

Dave, for the invitation.

EPSRC: myGrid and CombeChem

Explaining Bioinformatics to Computer Scientists:

Scientists are from Venus and Computer Scientists are from Mars

Mars vs Venus

“Not my problem”

Over-complication

Size matters

“Mother knows best”

Fin

“Suits me”

Venus vs Mars

“The parent principle”

“It works cos I say so”

Short-termism

Isolationism

Carole Goble, Presented at SIAM 2005

eScience Usability

May be about supporting context of activity:

The Tsunami lab book: everything in one zone

Smart Tea lab book: leveraging what’s there to reduce management during experiment

Chemistry Crystallography: bringing together views of multiple machines into one file format for comparative analysis

Bioinformatics: making it possible to recover sensibly what’s already there.
