Multiple-Scale Visualization and Modeling of Biological Networks/Pathways
Zhenjun Hu
Bioinformatics Program, Boston University, Boston, MA 02215
http://visant.bu.edu

Outline
• Multiscale visualization & modeling using metagraph
  – Distinguishing features of biological networks
  – Handling large-scale networks
  – Advanced graphs & multiscale visualization & modeling
    • Existing compound graph
    • Metagraph: an extension of the compound graph, or an alternative to the hypergraph, that can be used for pictorial representation
  – Metagraph for pathway visualization
  – Hierarchical visualization, integration & modeling
• Potential applications of metagraph for social networks

Why networks: circuit diagrams for biological networks?
The enthusiasm for biological networks probably comes from the success of circuit diagrams in electronics. An early stored-program computer (left), built around 1950, used vacuum tubes in its logic circuits, whereas modern computers use transistors and silicon wafers (right); both, however, are based on the same principles.
Hartwell LH, Hopfield JJ, Leibler S et al. From molecular to modular cell biology, Nature 1999;402:C47-52

Why graphs: circuit diagrams for biological networks?
Tools for mining and visualizing cell systems have moved beyond static pictures of networks and links. Most of them are based on the types of graphs listed below:
Simple graph: contains no self-loops and no multiple edges between a pair of nodes.
Multigraph: allows multiple edges between a pair of nodes.
Compound graph: integrates both adjacency relations (correlations between pairs of nodes) and inclusion relations among nodes (that is, simple nodes within a larger 'compound' node, such as the ellipse around the simple nodes A and B).
Compound nodes cannot intersect one another.
As more knowledge is integrated, the representation progresses from a simple graph to a multigraph/hybrid graph and then to a compound graph.

What characterizes a biological network
However, there are fundamental differences between biological networks and logic circuits:
Scale: there are thousands of biomolecules, such as genes, RNAs, and proteins, and each may have different states.
Abstraction: each node represents thousands of copies of the same biomolecule.
Dynamics: biological networks change dynamically; components may appear or disappear under certain conditions.
(Modularity): biological networks may be modular in nature and may be organized in a hierarchical structure.

Handling large-scale networks
Two key aspects need to be addressed when handling large-scale networks:
• System performance
  – Memory handling
  – The right data structure
  – Avoiding elaborate drawing
  – Compact size
  – Batch mode
• Network readability
  – Better zooming/layout?
  – Not much we can do?

Handling large-scale networks: batch mode
Batch mode reads instructions from a command file and processes the requests without any visual interface or user interaction, which lets VisANT run in the background (http://visant.bu.edu/vmanual/cmd.htm).
• Command to run (assuming the command file is named "batch_cmd.txt" and located in the res directory):
  java -Xmx512M -Djava.awt.headless=true -jar VisAnt.jar -b res/batch_cmd.txt
• Sample input/output: [figure]

Handling large-scale networks: a test case
A functional linkage network with 15,447 nodes and 1,722,708 edges, laid out using elegant->spring-embedded relaxing, is shown at right. The network data were downloaded from http://www.functionalnet.org/mousenet/ and loaded directly into VisANT on a dual-core computer with 2 GB of memory running Windows XP. Note that we set the maximum memory size in run.cmd to 1424M, the most available on the test machine. This network may not need that much, so you can reduce it if necessary.
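The "right data structure" and "compact size" items can be made concrete with a little arithmetic. The slides do not describe VisANT's internal representation. The sketch below therefore shows a generic compressed sparse row (CSR) adjacency layout, not VisANT's code. It assumes nodes are numbered 0..n-1, edges are undirected, and each index takes 4 bytes. The function names are hypothetical. Under these assumptions, the topology of the 15,447-node, 1,722,708-edge example needs (15,447 + 1 + 2 × 1,722,708) × 4 = 13,843,456 bytes, about 13.8 MB.

```python
from array import array

def build_csr(num_nodes, edges):
    """Compressed sparse row (CSR) adjacency for an undirected graph.

    Assumption: nodes are integers 0..num_nodes-1; each edge (u, v) is
    stored in both directions. Returns (offsets, neighbors), where the
    neighbors of node i are neighbors[offsets[i]:offsets[i + 1]].
    """
    edges = list(edges)
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Prefix sums of the degrees give the start of each node's slice.
    offsets = array("i", [0] * (num_nodes + 1))
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + degree[i]
    neighbors = array("i", [0] * offsets[num_nodes])
    cursor = list(offsets[:num_nodes])  # next free slot for each node
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors

def neighbors_of(offsets, neighbors, node):
    """Neighbor slice of one node; no per-edge objects are created."""
    return neighbors[offsets[node]:offsets[node + 1]]

def csr_bytes(num_nodes, num_edges, itemsize=4):
    """Topology size: (n + 1) offsets plus 2m neighbor entries of itemsize bytes."""
    return (num_nodes + 1 + 2 * num_edges) * itemsize

offsets, nbrs = build_csr(4, [(0, 1), (0, 2), (2, 3)])
print(list(neighbors_of(offsets, nbrs, 2)))  # [0, 3]
print(csr_bytes(15447, 1722708))             # 13843456
```

The design trades update speed for compactness. Adding an edge means rebuilding the arrays, which suits load-once, batch-style processing of a fixed network.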
VisANT can also read zip files directly, so the downloaded data were kept zipped. The test case takes more than 5 hours to finish.

Handling large-scale networks: [figure; the only recoverable label is 81,287]

Handling large-scale networks: readability
• So far we have discussed ways to improve system performance using software-engineering methods, but there seems to be no good way to improve network readability.
• Next, we discuss how advanced graphs can improve both network readability and system performance by integrating more biological information.
[Figure: an interaction network with 5,489 nodes and 29,983 edges (Y2H: blue; Phylo: green)]

Advanced graphs & multiscale visualization & modeling
How a geographical map zooms: countries → states (e.g., TX, MA) → cities → blocks.

Semantic zooming vs. geometric zooming
• Geometric (standard) zooming: the view depends on the physical properties of what is being viewed; objects change only their size.
• Semantic zooming: different representations for different spatial scales. The objects being viewed can additionally change shape or detail (not merely the size of existing details), or even their very presence in the display, appearing or disappearing according to the context of the map at hand.
• A biological network is much more complicated than a geographical map.

Behind the scenes: compound graph = inclusive tree + adjacency graph
[Figure: a compound graph over nodes A, B, C, D, E, F, G, H, K and M, decomposed into its inclusive tree and its adjacency graph]
Sugiyama, K. & Misue, K. Visualization of structure information: Automatic drawing of compound digraphs. IEEE Trans. Systems, Man, and Cybernetics 21, 876-892 (1991).

Compound graph (continued)
[Figure: the same compound graph]
Two restrictions:
1. No intersection between groups.
2. The inclusion relations must form a rooted inclusive tree.

• Except for the leaf nodes, each node in the inclusive tree can be thought of as a group containing the nodes of the next level of detail. In a biological network, such a group can be a functional module, a protein complex, etc.
• And a biological network seems to have a modular structure: [figure]

And life's complexity seems to be hierarchical: [figure]
Oltvai, Z.N. & Barabasi, A.L. Systems biology. Life's complexity pyramid. Science 298, 763–764 (2002).

And the metabolic network seems to have a hierarchical organization: [figure]
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. & Barabasi, A.L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).

It seems that we can use a compound graph to turn the "hair ball" of an interaction network into a much more readable network of functional modules: [figure]
Tucker, C.L., J.F. Gera, and P. Uetz, Towards an understanding of complex protein networks. Trends Cell Biol, 2001. 11(3): p. 102-6

• However, biological modules usually overlap, because biomolecules often play multiple roles, and a compound graph does not allow groups to overlap.
• Why, then, do the complicated circuit diagrams of electronics not have this overlap problem? Because a biological network is an abstract network: each node represents many copies of the same biomolecule, so one node can take part in several modules.

Metagraph definition
$G_m = \{V, E\}$, $V = \{V_s, V_m\}$, $E = \{E_s, E_m\}$
Hu Z, Mellor J, Wu J et al. Towards zoomable multidimensional maps of the cell, Nat Biotechnol 2007;25:547-554

Metanode definition
$v_m \in V_m$, $v_m = \bigcup_{i \ge 0} v_i$, $v_i \in V$
[Figure: a metanode containing A, B and C, shown expanded and collapsed]

Metaedge definition (transient)
$e_m \in E_m$, $e_{v_m,v} = g(v_m, v)$; that is, the metaedge between metanode $v_m$ and node $v$ is derived from the underlying network connectivity.

Metagraph illustration
[Figure: panels I–IV]
Illustration of the dynamics of a metagraph. (I) An eight-gene network grouped into three metanodes (G1, G2, G3), each containing a set of genes that subserve some common function. That a node such as C participates in more than one function at a given level is represented by displaying it in more than one metanode. All three metanodes are expanded, so their internal network structure is visible. (II) Metanode G2 is collapsed, and three metaedges, H_G2 (= H_B), E_G2 (= E_B) and C_G2, are created from the original network connectivity. Metaedge C_G2 is special because it represents shared components; it is rendered as a dashed line. (III) Both G1 and G2 are collapsed, and three metaedges are created: G1_G2 = E_G2 + H_G2, G1_G3 = A_G and G3_G2 = C_G2. This panel also shows that metanodes can be nested, with G1 and G3 embedded in a new metanode G4. (IV) Metanode G4 is collapsed, giving a new metaedge G4_G2 = G1_G2 + G3_G2. The transitions between I, II, III and IV are reversible. This is perhaps best explained in terms of GO levels: for example, G1, G2 and G3 might be at GO level 10 (pathway level), whereas G4 is at GO level 9.
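The collapse behaviour in panels I–III can be sketched in a few lines. This is a hypothetical illustration, not VisANT's implementation. It interprets $g(v_m, v)$ as counting the original edges that a collapsed metanode's hidden members have to $v$, and it ignores nesting (G4). The function name `collapsed_view` and the toy network are made up. The toy network only mimics the caption: E and H in G1 link to B in G2, and C is shared by G2 and G3.

```python
from collections import Counter
from itertools import combinations

def collapsed_view(edges, members, collapsed):
    """Visible edges of a (non-nested) metagraph after collapsing metanodes.

    Hypothetical rules: a node stays visible if it belongs to no metanode
    or to at least one expanded metanode; otherwise it is drawn as the
    collapsed metanode(s) containing it.
    edges: iterable of (u, v) pairs between simple nodes.
    members: dict metanode name -> set of member nodes (may overlap).
    collapsed: set of collapsed metanode names.
    Returns (weights, shared): weights maps a sorted endpoint pair to the
    number of original edges it aggregates; shared holds the dashed
    shared-component metaedges.
    """
    collapsed = set(collapsed)
    expanded = set(members) - collapsed

    def holders(v):
        return {m for m, nodes in members.items() if v in nodes}

    def visible(v):
        h = holders(v)
        return not h or bool(h & expanded)

    def reps(v):
        # Where node v is drawn: itself, or the collapsed metanodes hiding it.
        return {v} if visible(v) else holders(v) & collapsed

    weights = Counter()
    for u, v in edges:
        for a in reps(u):
            for b in reps(v):
                if a != b:  # edges inside one collapsed metanode disappear
                    weights[tuple(sorted((a, b)))] += 1

    shared = set()
    for v in set().union(*members.values()):
        owners = holders(v) & collapsed
        if visible(v):
            # Visible node that is also a member of a collapsed metanode.
            shared.update(tuple(sorted((v, m))) for m in owners)
        else:
            # Hidden node shared by several collapsed metanodes.
            shared.update(combinations(sorted(owners), 2))
    return weights, shared

members = {"G1": {"A", "E", "H"}, "G2": {"B", "C"}, "G3": {"C", "D"}}
edges = [("E", "B"), ("H", "B"), ("C", "D"), ("A", "E")]
weights, shared = collapsed_view(edges, members, {"G1", "G2"})
print(weights[("G1", "G2")])  # 2, i.e. G1_G2 = E_G2 + H_G2
print(shared)                 # {('C', 'G2')}
```

Keeping a count per metaedge, instead of a flag, is what lets an aggregated metaedge such as G1_G2 record that it stands for E_G2 + H_G2. Expanding a metanode again only requires recomputing the view, which matches the reversibility noted in the caption.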
An example of using metagraph to improve readability and performance
[Figures (three slides) of the same network: 5,321 nodes and 33,992 edges in total]

Metagraph for pathway visualization
• Metagraph application in pathway visualization
[Figure: KEGG pathway diagram (part of the G1 phase of the cell cycle) and its complex hierarchy]

• Metagraph application in pathway visualization (continued)
[Figure: panels I and II]
Improved readability and performance with multiscale information integrated into pathway visualization using metagraph. Blue boxes represent KEGG pathways; blue boxes with a dark border are contracted metanodes representing a group of proteins; orange boxes with a light border represent protein complexes; filled circles represent proteins and open circles represent compounds. (I) Five signaling pathways of Homo sapiens visualized using metagraph; dashed lines indicate shared nodes. (II) The same pathways visualized as an interaction network, with node size reduced to improve readability.
Hu Z, Snitkin ES, DeLisi C. VisANT: an integrative framework for networks in systems biology, Brief Bioinform 2008;9:317-325

• Condition dependency [figure]

Hierarchical visualization, integration & modeling
• Metagraph application: visualization of the network hierarchy
[Figure: four nested levels, with a module of level 3 and a protein of level 4 marked]
Level 1: 1 module; Level 2: 8 modules; Level 3: 161 modules; Level 4: 810 proteins. Because of space limits, only some of the proteins are shown in the figure.
Hu Z, Mellor J, Wu J et al. Towards zoomable multidimensional maps of the cell, Nat Biotechnol 2007;25:547-554

• Metagraph application: integrating an interaction network with hierarchical GO modules
[Figure, panels A–D: the GO module "sequence-specific DNA binding" (0 (+34) genes) with the modules centromeric DNA binding, rDNA binding, AT DNA binding, telomeric DNA binding and DNA replication origin binding (module sizes shown: 6, 6, 3, 9 and 10 genes)]

• Metagraph application: network of protein complexes [figure]
Gavin, A.C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).

• Metagraph application: network of protein complexes integrated with Y2H interactions [figure]

• Bottom-up modeling: cancer network [figure]
Goh KI, Cusick ME, Valle D et al. The human disease network, Proc Natl Acad Sci U S A 2007;104:8685-8690.

• Top-down modeling: disease network → cancer gene network [figure]

Quick summary
• Metagraph improves network readability and system performance by integrating context information.
• Metagraph helps to represent the complexity of biological networks, such as condition dependency and combinatorial control.
• Metagraph extends the system's capability to integrate multiscale knowledge. This makes it much more practical to model and simulate the complexity of biological systems: from cell to functional module, network motif, protein…

Metagraph: potential application in social networks
• Science of Science and Innovation Policy (SciSIP)
• What can be expected from SciSIP?
1. Predicting potential research innovations
2. Predicting potential new cross-disciplinary research fields
3. Predicting potential collaborations between research scientists
4. And more…
• Let's model each paper (blue) as a metanode with its authors (red) as components; we then get a network of publications: [Figure: a collaboration network between different research fields]
• Let's turn the publication network into a co-author network: [figure]
More importantly, an author can also be modeled as a metanode with education, hobbies, etc. as subcomponents, which will enable us to draw correlations from heterogeneous data.

Acknowledgments
VisANT Community
Development Team: Zhenjun Hu, Boston Univ.; Evan Snitkin, Boston Univ.; Yan Wang, Boston Univ.; Bolan Linghu, Boston Univ.; Jui-Hung Hung, Boston Univ.
Collaborators: IBM Watson Research Laboratory; KEGG Database; Stuart Lab; Center of Cancer System Biology
Joint Developers: Takuji Yamada, Kyoto Univ.; Shuichi Kawashima, University of Tokyo; David M. Ng, UCSC; Chunnuan Chen, UCSC; Changyu Fan, CCSB, Harvard Medical School
Veterans: Joe Mellor, Harvard Medical School; Jie Wu, Boston Univ.
Advisory Board: Aravind Iyer, Computational Biology Branch, NCBI, NLM, NIH; Bart Weimer, Director, Center for Integrated BioSystems, Utah State University; Chris Sander, Sloan Kettering Memorial Cancer Center; Daniel Segrè, Bioinformatics Program, Boston University; Frederick Roth, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School; Joseph Lehár, Combinatorix, Inc; Josh Stuart, Biomolecular Engineering, UCSC
Charles DeLisi
Part of the funding support came from NIH & Pfizer.

Have fun with your own networks!
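The SciSIP idea from the social-network slides can be sketched in a few lines. Each paper is a metanode whose components are its authors. Two papers are linked by the authors they share, and the co-author network then links two authors once per joint paper. This is a minimal sketch under those assumptions, not a VisANT feature. The function names and the toy data (papers P1–P3, authors x, y, z) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def publication_network(papers):
    """Link two papers (metanodes) with weight = number of shared authors.

    papers: dict paper id -> set of author names (the metanode's members).
    """
    links = Counter()
    for p, q in combinations(sorted(papers), 2):
        shared = papers[p] & papers[q]
        if shared:
            links[(p, q)] = len(shared)
    return links

def coauthor_network(papers):
    """Project papers onto authors: weight = number of joint papers."""
    links = Counter()
    for authors in papers.values():
        for a, b in combinations(sorted(authors), 2):
            links[(a, b)] += 1
    return links

papers = {"P1": {"x", "y"}, "P2": {"y", "z"}, "P3": {"x", "y", "z"}}
print(publication_network(papers)[("P1", "P3")])  # 2 shared authors
print(coauthor_network(papers)[("x", "y")])       # 2 joint papers
```

Both functions are projections of the same paper–author membership data, one onto papers and one onto authors. Modeling an author as a metanode with further subcomponents, as the slides suggest, would add one more membership layer of the same kind.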