Biological Storytelling: A Software Tool for Biological Information Organization Based upon Narrative Structure

Allan Kuchinsky, Kathy Graham, David Moh, Annette Adler, Ketan Babaria, Michael L. Creech
Agilent Technologies
3500 Deer Creek Road
Palo Alto, California, USA
allan_kuchinsky@agilent.com

Understanding the Molecular Basis of Disease
Source: Weeraratna et al., “Wnt5a Signaling Directly Affects Cell Motility and Invasion of Metastatic Melanoma”, Cancer Cell, April 2002

High-Throughput Experimental Methods
DNA microarray technology enables biologists to simultaneously study an entire set of cellular processes at a molecular level (per experiment … )
Source: Stephanie Fulmer-Smentek, Robert Kincaid: Agilent Technologies

Biological Storytelling
• Reason across multiple types of experimental data
• Formulate hypotheses and high-level descriptions

Supporting Synthesis Tasks of Biomedical Researchers
• Most bioinformatics tools support the analysis activities of bioscience researchers, but not their synthesis activities
• Synthesis involves:
  1. keeping track of the diverse pieces of information collected during database searches and other data analysis activities, and organizing and using that information
  2. formulating hypotheses and higher-level explanations
     – E.g. elucidating the structure and function of biological pathways
  3. validating hypotheses and higher-level explanations against detailed experimental data
  4. sharing the information with colleagues and working collaboratively to refine hypotheses
Aspects of Synthesis Work: Findings from User Research
• Like detective work, “mind-mapping”
• Information is “free-form”, semi-structured
• Multiple hypotheses/alternatives
• Group work, multiple perspectives, views
• Grouping together chunks of related information
• Verbal and visual reasoning
• Notepads, whiteboards, lab notebooks

The Role of Narrative
• Many biologists talked in terms of “piecing together a story”…
  – of what a gene does, how it fits into a pathway with other genes, proteins…
• Cognitive/social psychology research finds that people use story structure as a way of organizing and remembering information
  – Thorndyke, Schank, Middleton and Edwards, Erickson, Gershon and Ward
• Useful story development software tools exist for other domains
  – E.g. screenwriting, video production

Main concepts of our software
• Free-form data model
• Narrative structure
• “Top-down” hypothesis formulation, “bottom-up” data exploration
• Annotation
• Multi-disciplinary collaboration

ITEMS
• Basic “atomic” unit of information
• Represent “biological entities” (genes, proteins, …)
• Sortable on multiple keys
• Group selectable
• Populate manually or semi-automatically
• Links to detailed experimental data
• Links to public data and literature
• Data values can be color-encoded

COLLECTIONS
• Free-form sets of items
• Malleable: split, merge, add, move
• Represent cognitive chunks
• Can be nested
• Populate manually or semi-automatically
• Links to detailed experimental data
• Links to public data and literature

STORIES
• Represent state of biological hypotheses and understandings
• Represent paths explored and alternative hypotheses
• Support for deliberation via Support/Oppose story nodes
• Links to detailed experimental data
• Links to public data and literature
• Narrative structure as an organizing principle

Story Grammar
Source: Thorndyke, P.W. (1977), “Cognitive Structures in Comprehension and Memory of Narrative Discourses”, Cognitive Psychology, 9, pp. 77-110

PUTTING THE STORY TOGETHER GRAPHICALLY
• Graphical network diagram is a common visual metaphor in molecular biology
  – Pathways, protein/protein interaction networks
• Represent the “nouns” and “verbs” of biological stories
  – Nouns = biological entities, the “players” (genes, proteins)
  – Verbs = relationships between biological entities (promotes, inhibits)

SEMANTIC OVERLAYS
• For validating high-level explanations and hypotheses against data
• Juxtapose data values onto elements of graphical and textual stories
• “Step through” experimental data columns and “light up” elements of graphical and textual stories
• Analogy to qualitative simulation

Support for Multidisciplinary Collaboration
• Annotation is tagged with user name and timestamp
• Support and Oppose story nodes document alternative lines of thought
• Web repository enables review by non-users of the system
“… we build up a consensus hallucination about what is going on in the living cell…” (NHGRI)

Annotation
• Every system element can have arbitrary textual notes
• Citations can be dragged/dropped from the Web-based literature
• Citations can have attached notes/comments
• Annotations are interlinked with system elements and other annotations

Web Repository

Usage Feedback
• How much structure is just right?
  – Flexibility in story grammar (tags vs. rigid structure)
  – Diversity in grouping (columns as well as rows)
• Diagrammatic vocabulary
  – Biochemical reactions vs. signal transduction

Related Work
• Digital storytelling
  – Storyspace (Eastgate Systems)
• Personal information management
  – Lotus Agenda
• Issue-Based Information Systems
  – Rittel, McCall, Conklin & Begeman
• Biological pathway databases
  – STKE, BIND, KEGG, EcoCyc, TransPath, SPAD, …
• Semantic overlays
  – EcoCyc, GenMAPP (gene expression on pathways)
  – Kenna Technologies (qualitative simulation)
• Diagrammatic UIs to cell and pathway information
  – CellSpace (Cellomics)
• Biological information management
  – eLabBook (LabBook, Inc.)
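
Sketch: semantic overlay on a story graph
The semantic overlay idea from the earlier slides (stepping through experimental data columns and “lighting up” story elements) might look roughly like the following. This is only a minimal sketch and not the tool's actual implementation: the names Item, Relation, color_for and overlay, the threshold, the color choices, and the data values are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical 'atomic' unit: a biological entity with per-column data values."""
    name: str
    values: Dict[str, float] = field(default_factory=dict)  # data column -> value


@dataclass
class Relation:
    """A 'verb' linking two 'nouns', e.g. promotes or inhibits."""
    source: str
    verb: str
    target: str


def color_for(value: float, threshold: float = 1.0) -> str:
    """Map a data value to a display color (threshold and colors are arbitrary assumptions)."""
    if value >= threshold:
        return "red"
    if value <= -threshold:
        return "green"
    return "gray"


def overlay(items: List[Item], column: str) -> Dict[str, str]:
    """'Light up' each item with the color for one data column.

    Items with no value in that column are treated as 0.0 and come out gray.
    """
    return {item.name: color_for(item.values.get(column, 0.0)) for item in items}


# Illustrative placeholder data, not real measurements.
items = [
    Item("geneA", {"exp1": 2.1, "exp2": -0.2}),
    Item("geneB", {"exp1": -1.4, "exp2": 1.3}),
]
story = [Relation("geneA", "inhibits", "geneB")]

# Step through the experimental data columns, producing one overlay per column.
for column in ("exp1", "exp2"):
    colors = overlay(items, column)
    for rel in story:
        print(f"{column}: {rel.source}[{colors[rel.source]}] "
              f"{rel.verb} {rel.target}[{colors[rel.target]}]")
```

Keeping the story (relations) separate from the data (item values) lets the same story be checked against any data column, which is the point of the overlay: a hypothesis such as “geneA inhibits geneB” looks plausible in a column where the two colors disagree.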
Future work
• Multiple data types
• Support a “systems” perspective on biology
• Utilization of data mining, computational tools
• Duality between graphical and textual storytelling
• Richer annotation
• Richer diagrammatic semantics
• Scaling

Acknowledgements
• Agilent Technologies
  – Dean Thompson, Deborah Hall, Laurakay Bruhn, Steve Laderman, Steve Andrews, Shawn Hwang, Carl Steves, Alex Veilleux, Robert Kincaid, and Aditya Vailaya
• National Human Genome Research Institute
  – Paul Meltzer, Mike Bittner, Yidong Chen, and Jeff Trent
• Formative discussions
  – Bob Allen (UMD)
  – Abbe Don

… more info …
Allan Kuchinsky
Agilent Technologies
3500 Deer Creek Road
Palo Alto, California, USA
allan_kuchinsky@agilent.com
+1 (650) 485-7423
http://www.agilent.com

Just Enough Molecular Biology
• DNA (chromosome, gene): what the cell could theoretically do
  – GCA CAG GGC / CGT GTC CCG
  – DNA sequences (genes) determine which proteins are made
• Transcription
• RNA: what the cell is trying to do
  – GCA CAG GGC
  – mRNA quantity determines amount of each protein
• Translation
• Protein: what the cell is doing
  – Arg - Val - Pro
  – Proteins perform cellular functions
• Regulation
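
Sketch: from DNA to protein
The DNA → RNA → protein flow on the slide above can be traced in a few lines. This is a minimal sketch under one stated assumption: the diagram is schematic, so the code takes the strand shown as CGT GTC CCG as the coding sequence, because that is the reading which yields Arg - Val - Pro under the standard genetic code. The function names are hypothetical and the codon table is deliberately partial.

```python
# Partial codon table: only the three codons this example needs
# (standard genetic code: CGU -> Arg, GUC -> Val, CCG -> Pro).
CODON_TABLE = {"CGU": "Arg", "GUC": "Val", "CCG": "Pro"}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA carrying the same sequence as the coding strand (T becomes U)."""
    return coding_dna.upper().replace(" ", "").replace("T", "U")


def translate(mrna: str) -> list:
    """Split the mRNA into complete three-base codons and look up each amino acid."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


# Assumption: this strand from the slide is read as the coding sequence.
mrna = transcribe("CGT GTC CCG")
protein = translate(mrna)
print(mrna, "->", " - ".join(protein))  # CGUGUCCCG -> Arg - Val - Pro
```

A full translator would need all 64 codons plus start and stop handling; the partial table is enough to reproduce the slide's three-residue example.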