Biological Storytelling A Software Tool for Biological Information

Biological Storytelling:
A Software Tool for Biological Information
Organization Based upon Narrative Structure
Allan Kuchinsky, Kathy Graham, David Moh, Annette Adler,
Ketan Babaria, Michael L. Creech
Agilent Technologies
3500 Deer Creek Road
Palo Alto, California, USA
allan_kuchinsky@agilent.com
Understanding the Molecular Basis of Disease
Source: Weeraratna et al., “Wnt5a Signaling Directly Affects Cell Motility and Invasion of Metastatic Melanoma”, Cancer Cell, April 2002
High-Throughput Experimental Methods
DNA microarray technology enables biologists to simultaneously study an entire set of cellular processes at a molecular level (per experiment … )
Source: Stephanie Fulmer-Smentek, Robert Kincaid: Agilent Technologies
Biological Storytelling
• Reason across multiple types of experimental data
• Formulate hypotheses and high-level descriptions
Supporting Synthesis Tasks of Biomedical Researchers
• Most bioinformatics tools support analysis, but not the synthesis activities of bioscience researchers
• Synthesis involves:
1. keeping track of the diverse pieces of information collected during database searches and other data analysis activities
2. organizing and using the diverse information, formulating hypotheses and higher-level explanations
– E.g. elucidating the structure and function of biological pathways
3. validating hypotheses and higher-level explanations against detailed experimental data
4. sharing the information with colleagues and working collaboratively to refine hypotheses
Aspects of Synthesis Work: Findings from User Research
• Like detective work, “mind-mapping”
• Information is “free-form”, semi-structured
• Multiple hypotheses/alternatives
• Group work, multiple perspectives, views
• Grouping together chunks of related information
• Verbal and visual reasoning
• Notepads, whiteboards, lab notebooks
The Role of Narrative
• Many biologists talked in terms of “piecing together a story”…
– of what a gene does, how it fits into a pathway with other
genes, proteins…
• Cognitive/social psychology research finds that
people use story structure as a way of organizing
and remembering information
– Thorndyke, Schank, Middleton and Edwards, Erickson, Gershon and Ward
• Useful story development software tools exist for
other domains
– E.g. screenwriting, video production
Main concepts of our software
• Free-form data model
• Narrative Structure
• “Top down” hypothesis formulation, “bottom-up”
data exploration
• Annotation
• Multi-disciplinary Collaboration
ITEMS
•Basic “atomic” unit of information
•Represent “biological entities” – genes, proteins, …
•Sortable on multiple keys
•Group selectable
•Populate manually or semi-automatically
•Links to detailed experimental data
•Links to public data and literature
•Data values can be color-encoded
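The item properties listed above can be sketched as a small record type. This is a minimal Python sketch under stated assumptions. It is not the tool's actual implementation. The `Item` class, its fields, `sort_items`, and the thresholds and colors in `color_for` are all hypothetical. They only illustrate sorting on multiple keys and color-encoding data values.

```python
from dataclasses import dataclass, field
from operator import attrgetter


@dataclass
class Item:
    """Hypothetical 'atomic' unit of information: one biological entity."""
    name: str
    kind: str                                   # e.g. "gene" or "protein"
    values: dict = field(default_factory=dict)  # experiment column -> data value
    links: list = field(default_factory=list)   # URLs to experimental data or literature


def sort_items(items, *keys):
    """Sort items on several attributes at once, e.g. ("kind", "name")."""
    return sorted(items, key=attrgetter(*keys))


def color_for(value, low=-1.0, high=1.0):
    """Color-encode a data value; the thresholds and colors are assumptions."""
    if value is None:
        return "gray"    # no measurement for this item
    if value >= high:
        return "red"     # e.g. strongly up-regulated
    if value <= low:
        return "green"   # e.g. strongly down-regulated
    return "black"


# Toy values for illustration only, not experimental results.
items = [
    Item("geneB", "gene", {"exp1": 1.8}),
    Item("geneA", "gene", {"exp1": -0.2}),
    Item("proteinC", "protein", {"exp1": -1.4}),
]
for item in sort_items(items, "kind", "name"):
    print(item.name, color_for(item.values.get("exp1")))
```

The sort uses `attrgetter` with several attribute names, so it orders first by kind and then by name within each kind.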
COLLECTIONS
•Free-form sets of items
•Malleable
–split, merge, add, move
•Represent cognitive chunks
•Can be nested
•Populate manually or semi-automatically
•Links to detailed experimental data
•Links to public data and literature
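The "malleable" operations on collections (split, merge, add, move, nesting) can be sketched as follows. This is a minimal Python sketch. The `Collection` class and its method names are hypothetical and are not the tool's API. Members are plain item names or nested collections.

```python
class Collection:
    """Hypothetical free-form, nestable set of items (one 'cognitive chunk')."""

    def __init__(self, name, members=None):
        self.name = name
        self.members = list(members or [])  # item names or nested Collection objects

    def add(self, member):
        if member not in self.members:
            self.members.append(member)

    def move(self, member, target):
        """Move one member from this collection into another one."""
        self.members.remove(member)
        target.add(member)

    def split(self, predicate, new_name):
        """Move the members matching predicate into a new collection and return it."""
        taken = [m for m in self.members if predicate(m)]
        self.members = [m for m in self.members if not predicate(m)]
        return Collection(new_name, taken)

    def merge(self, other):
        """Absorb another collection's members, skipping duplicates."""
        for m in other.members:
            self.add(m)
        other.members.clear()

    def flatten(self):
        """List every item name, descending into nested collections."""
        names = []
        for m in self.members:
            names.extend(m.flatten() if isinstance(m, Collection) else [m])
        return names


signaling = Collection("Signaling", ["geneA", "geneB", "geneC"])
motility = signaling.split(lambda name: name == "geneC", "Motility")
signaling.add(motility)            # nest one chunk inside another
signaling.move("geneB", motility)
print(signaling.flatten())
```

Nesting is represented by putting a `Collection` inside another collection's `members`. `flatten` recovers all the items in a chunk.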
STORIES
•Represent state of biological hypotheses and understandings
•Represent paths explored and alternative hypotheses
•Support for deliberation via Support/Oppose story nodes
•Links to detailed experimental data
•Links to public data and literature
•Narrative structure as an organizing principle
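Support/Oppose story nodes form a small deliberation tree around each claim. The Python sketch below is illustrative only. `StoryNode`, `balance`, and `outline` are hypothetical names. The hypothesis text is a placeholder and does not describe a real finding.

```python
from dataclasses import dataclass, field


@dataclass
class StoryNode:
    """Hypothetical story element: a claim with nodes that support or oppose it."""
    text: str
    supports: list = field(default_factory=list)  # StoryNode objects arguing for the claim
    opposes: list = field(default_factory=list)   # StoryNode objects arguing against it
    links: list = field(default_factory=list)     # evidence: data or literature URLs

    def balance(self):
        """Crude tally: direct supporting nodes minus direct opposing nodes."""
        return len(self.supports) - len(self.opposes)

    def outline(self, depth=0, tag=""):
        """Render the deliberation tree as indented lines, '+' for support, '-' for oppose."""
        lines = ["  " * depth + tag + self.text]
        for node in self.supports:
            lines += node.outline(depth + 1, "+ ")
        for node in self.opposes:
            lines += node.outline(depth + 1, "- ")
        return lines


hypothesis = StoryNode("Hypothesis: gene X increases cell motility")
hypothesis.supports.append(StoryNode("X is up-regulated in the more invasive samples"))
hypothesis.opposes.append(StoryNode("Alternative: the effect is indirect, via gene Y"))
print("\n".join(hypothesis.outline()))
```

Alternative hypotheses are kept as opposing nodes rather than deleted. This preserves the paths explored, as the slide describes.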
Story Grammar
Source: Thorndyke, P.W. (1977), "Cognitive Structures in Comprehension and Memory
of Narrative Discourses", Cognitive Psychology, 9, pp. 77-110
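Thorndyke's story grammar is commonly summarized by its top-level rule, Story → Setting + Theme + Plot + Resolution. In this rule the plot is a sequence of episodes and each episode ends in an outcome. The Python sketch below checks a story outline against only that simplified rule. It does not implement the full grammar from the paper or the tool's own representation. The dictionary layout and `grammar_problems` are assumptions.

```python
# Simplified top-level rule: Story -> Setting + Theme + Plot + Resolution,
# where Plot is a sequence of episodes and each episode ends in an outcome.
REQUIRED_PARTS = ("setting", "theme", "plot", "resolution")


def grammar_problems(story):
    """List the ways a story outline (a dict) departs from the simplified grammar."""
    problems = [f"missing {part}" for part in REQUIRED_PARTS if not story.get(part)]
    for i, episode in enumerate(story.get("plot") or []):
        if not episode.get("outcome"):
            problems.append(f"episode {i} has no outcome")
    return problems


# A toy outline of an unfinished biological story.
story = {
    "setting": "Tumor samples profiled on microarrays",
    "theme": "Goal: explain why some samples are more invasive",
    "plot": [
        {"subgoal": "Find genes that separate the groups", "outcome": "Candidate gene X"},
        {"subgoal": "Test whether X affects motility"},  # outcome not yet known
    ],
    "resolution": None,
}
print(grammar_problems(story))
```

The check reports gaps instead of rejecting the story. This fits the usage feedback that biologists want flexible structure rather than a rigid one.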
PUTTING THE STORY TOGETHER GRAPHICALLY
•Graphical network diagram is a common visual metaphor in molecular biology
•Pathways, protein/protein interaction networks
•Represent the “nouns” and “verbs” of biological stories
•Nouns = biological entities (genes, proteins): the “players”
•Verbs = relationships between biological entities (promotes, inhibits)
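The noun/verb view maps directly onto a graph with labeled edges. The Python sketch below is hypothetical. The `StoryGraph` class, its methods, and the entity names A to D are illustrative. It stores each relationship as an edge and finds everything downstream of an entity with a breadth-first search.

```python
from collections import defaultdict, deque


class StoryGraph:
    """Hypothetical story network: entities ("nouns") joined by labeled relationships ("verbs")."""

    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(verb, target), ...]

    def relate(self, source, verb, target):
        self.edges[source].append((verb, target))

    def downstream(self, start):
        """Every entity reachable from start by following edges, breadth-first."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for _verb, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen

    def sentences(self):
        """Read each edge back as a short sentence, e.g. 'A promotes B'."""
        return [f"{s} {v} {t}" for s, pairs in self.edges.items() for v, t in pairs]


g = StoryGraph()
g.relate("A", "promotes", "B")
g.relate("B", "inhibits", "C")
g.relate("D", "promotes", "A")
print(sorted(g.downstream("A")))  # ['B', 'C']
print(g.sentences())
```

`sentences` shows one way the graphical story could be read back as text. This touches on the graphical/textual duality listed under future work.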
SEMANTIC OVERLAYS
•For validating high-level explanations, hypotheses against data
•Juxtapose data values onto elements of graphical and textual stories
•“step through” experimental data columns and “light up” elements of
graphical and textual stories
•Analogy to qualitative simulation
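"Stepping through" data columns and "lighting up" story elements can be sketched as a loop over experiment columns. At each column, it selects the elements whose value crosses a threshold. This is a minimal Python sketch. `overlay_steps` is hypothetical. The values are assumed to be log ratios. The threshold of 1.0 is an arbitrary assumption, and the data are toy numbers.

```python
def overlay_steps(values, columns, threshold=1.0):
    """Step through experiment columns in order; for each, return the elements that "light up".

    values maps element name -> {column name: data value}; an element lights up when
    |value| >= threshold, marked "up" for positive values and "down" for negative ones.
    """
    for column in columns:
        lit = {}
        for name, row in values.items():
            v = row.get(column)
            if v is not None and abs(v) >= threshold:
                lit[name] = "up" if v > 0 else "down"
        yield column, lit


# Toy log ratios for three story elements over two time points (not real data).
values = {
    "A": {"t1": 0.2, "t2": 1.5},
    "B": {"t1": -1.2, "t2": -0.3},
    "C": {"t1": 1.1},
}
for column, lit in overlay_steps(values, ["t1", "t2"]):
    print(column, lit)
```

Columns are passed in explicitly rather than sorted. This keeps the experimental order, such as time points, under the user's control.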
Support for Multidisciplinary
Collaboration
• Annotation is tagged with user name and
timestamp
• Support and Oppose story nodes document
alternative lines of thought
• Web repository enables review by non-users of
system
“… we build up a consensus hallucination about what is going on in the living cell …” (NHGRI)
Annotation
• Every system element can have arbitrary textual notes
• Citations can be dragged and dropped from Web-based literature sources
• Citations can have attached notes/comments
• Annotations interlinked with system elements, other
annotations
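An annotation tagged with a user name and timestamp, linked to system elements and to other annotations, might look like the Python sketch below. `Annotation` and `thread` are hypothetical names. The element ids and the example.org URL are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    """Hypothetical free-text note that can attach to any system element."""
    author: str                                    # user name recorded with the note
    text: str
    targets: list                                  # ids of the annotated items, collections, story nodes
    citations: list = field(default_factory=list)  # e.g. URLs dropped in from Web literature
    replies: list = field(default_factory=list)    # other Annotation objects linked to this one
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def thread(self):
        """Yield this note and every linked note, depth-first."""
        yield self
        for reply in self.replies:
            yield from reply.thread()


note = Annotation("alice", "Does geneA belong in the motility story?", ["item:geneA"])
note.replies.append(
    Annotation("bob", "Possibly; see the attached paper.", ["item:geneA"],
               citations=["https://example.org/paper"])
)
for a in note.thread():
    print(a.timestamp.isoformat(timespec="seconds"), a.author, a.text)
```

The timestamp is recorded automatically in UTC when each note is created, and the author is passed in explicitly.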
Web Repository
Usage Feedback
• How much structure is just right?
– Flexibility in story grammar (tags vs. rigid
structure)
– Diversity in grouping (columns as well as rows)
• Diagrammatic vocabulary
– Biochemical reactions vs. signal transduction
Related Work
• Digital storytelling
– Storyspace (Eastgate Systems)
• Personal information management
– Lotus Agenda
• Issue-Based Information Systems
– Rittel, McCall, Conklin & Begeman
• Biological Pathway Databases
– STKE, BIND, KEGG, EcoCyc, TransPath, SPAD, …
• Semantic overlays
– EcoCyc, GenMAPP (gene expression on pathways)
– Kenna Technologies (qualitative simulation)
• Diagrammatic UIs to cell and pathway information
– CellSpace (Cellomics)
• Biological information management
– eLabBook (LabBook, Inc.)
Future work
• Multiple data types
• Support a “systems” perspective on biology
• Utilization of data mining, computational tools
• Duality between graphical and textual storytelling
• Richer annotation
• Richer diagrammatic semantics
• Scaling
Acknowledgements
• Agilent Technologies
– Dean Thompson, Deborah Hall, Laurakay Bruhn, Steve Laderman, Steve Andrews, Shawn Hwang, Carl Steves, Alex Veilleux, Robert Kincaid, and Aditya Vailaya
• National Human Genome Research Institute
– Paul Meltzer, Mike Bittner, Yidong Chen, and Jeff Trent
• Formative discussions
– Bob Allen (UMD)
– Abbe Don
… more info …
Allan Kuchinsky
Agilent Technologies
3500 Deer Creek Road
Palo Alto, California, USA
allan_kuchinsky@agilent.com
+1 (650) 485-7423
http://www.agilent.com
Just Enough Molecular Biology
[Diagram labels: Chromosome, Gene, Regulation]
DNA: what the cell could theoretically do
– DNA sequence (genes) determines which proteins are made
– GCA CAG GGC / CGT GTC CCG
transcription
RNA: what the cell is trying to do
– mRNA quantity determines the amount of each protein
– GCA CAG GGC
translation
Protein: what the cell is doing
– Proteins perform cellular functions
– Arg - Val - Pro
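The flow from DNA through transcription and translation to protein can be made concrete with a short Python sketch that uses the standard genetic code. The function names are hypothetical. Suppose the slide's first DNA strand, GCA CAG GGC, is treated as the template strand read 3'→5'. Its RNA complement is then CGU GUC CCG, which translates to Arg-Val-Pro, the peptide on the slide. The sketch ignores splicing, regulation, and reading-frame detection.

```python
# Standard genetic code: codons indexed by bases in the order U, C, A, G
# (first position * 16 + second position * 4 + third position).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val", "*": "Stop",
}
# DNA template base -> complementary RNA base (transcription).
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template):
    """Build mRNA as the base-by-base complement of a DNA template strand (spaces ignored)."""
    return "".join(PAIR[b] for b in template.replace(" ", ""))


def translate(mrna):
    """Translate each complete codon; stop codons appear as 'Stop' rather than ending translation."""
    names = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        b1, b2, b3 = (BASES.index(b) for b in mrna[i:i + 3])
        names.append(THREE_LETTER[AMINO[16 * b1 + 4 * b2 + b3]])
    return "-".join(names)


mrna = transcribe("GCA CAG GGC")
print(mrna, translate(mrna))  # CGUGUCCCG Arg-Val-Pro
```

The 64-letter string is a compact encoding of the standard codon table. Each codon's position in the string is computed from its three bases.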