Biological networks
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University
bing.zhang@vanderbilt.edu
BCHM352, Spring 2011

Protein-protein interaction (PPI)
- Definition: physical association of two or more protein molecules
- Examples:
  - Receptor-ligand interactions
  - Kinase-substrate interactions
  - Transcription factor-co-activator interactions
  - Multiprotein complex, e.g. multimeric enzymes
- Figure: RNA polymerase II, 12 subunits (Cramer et al. Science 292:1863, 2001)

Significance of protein interaction
- Most proteins mediate their function through interacting with other proteins
  - To form molecular machines
  - To participate in various regulatory processes
- Distortions of protein interactions can cause diseases

Yeast two-hybrid
- Method:
  - Bait strain: a protein of interest, bait (B), fused to a DNA-binding domain (DBD)
  - Prey strains: ORFs fused to a transcriptional activation domain (AD)
  - Mate the bait strain to prey strains and plate diploid cells on selective media (e.g. without histidine)
  - If bait and prey interact in the diploid cell, they reconstitute a transcription factor, which activates a reporter gene whose expression allows the diploid cell to grow on selective media
  - Pick colonies, isolate DNA, and sequence to identify the ORF interacting with the bait
- Pros:
  - High-throughput
  - Can detect transient interactions
- Cons:
  - False positives
  - Non-physiological (done in the yeast nucleus)
  - Can’t detect multiprotein complexes
- Uetz P. Curr Opin Chem Biol. 6:57, 2002

Tandem affinity purification
- Method:
  - TAP tag: Protein A, calmodulin binding domain, TEV protease cleavage site
  - The bait protein gene is fused with the DNA sequences encoding the TAP tag
  - Tagged bait is expressed in cells and forms native complexes
  - Complexes are purified by the TAP method
  - Components of each complex are identified through gel separation followed by MS/MS
- Pros:
  - High-throughput
  - Physiological setting
  - Can detect large stable protein complexes
- Cons:
  - High false positives
  - Can’t detect transient interactions
  - Can’t detect interactions not present under the given condition
  - Tagging may disturb complex formation
  - Binary interaction relationship is not clear
- Chepelev et al. Biotechnol & Biotechnol 22:1, 2008

Large scale protein interaction identification
- Experimental:
  - Yeast two-hybrid
  - Tandem affinity purification
- Computational:
  - Gene fusion
  - Ortholog interaction
  - Phylogenetic profiling
  - Microarray gene co-expression
- Valencia et al. Curr. Opin. Struct. Biol, 12:368, 2002

Protein interaction data in the public domain
- Database of Interacting Proteins (DIP): http://dip.doe-mbi.ucla.edu/
- The Molecular INTeraction database (MINT): http://mint.bio.uniroma2.it/mint/
- The Biomolecular Interaction Network Database (BIND): http://www.binddb.org/
- The General Repository for Interaction Datasets (BioGRID): http://www.thebiogrid.org/
- Human Protein Reference Database (HPRD): http://www.hprd.org
- Online Predicted Human Interaction Database (OPHID): http://ophid.utoronto.ca
- The Munich Information Center for Protein Sequences (MIPS): http://mips.gsf.de

HPRD
(figure)

Protein interaction networks
- Saccharomyces cerevisiae: Jeong et al. Nature, 411:41, 2001
- Drosophila melanogaster: Giot et al. Science, 302:1727, 2003
- Caenorhabditis elegans: Li et al. Science, 303:540, 2004
- Homo sapiens: Rual et al. Nature, 437:1173, 2005

Gene regulatory networks
- Experimental:
  - Chromatin immunoprecipitation (ChIP)
  - ChIP-chip
  - ChIP-seq
- Computational:
  - Promoter sequence analysis
  - Reverse engineering from microarray gene expression data
- Public databases:
  - Transfac (http://www.gene-regulation.com)
  - MSigDB (http://www.broadinstitute.org/gsea/msigdb)
  - hPDI (http://bioinfo.wilmer.jhu.edu/PDI/)
- Shen-Orr et al. Nat Genet, 31:64, 2002

KEGG metabolic network
(figure)

Network visualization tools
- Cytoscape: http://www.cytoscape.org
- Gehlenborg et al. Nature Methods, 7:S56, 2010

Graph representation of networks
- Graph: a set of objects called nodes or vertices, connected by links called edges. In mathematics and computer science, the graph is the basic object of study in graph theory.
- Figure: RNA polymerase II drawn as a graph, with node and edge labeled (Cramer et al. Science 292:1863, 2001)

Undirected graph vs directed graph
- Protein interaction network (Krogan et al. Nature 440:637, 2006)
  - Nodes: proteins
  - Edges: physical interactions
  - Undirected
- Metabolic network (Ravasz et al. Science 297:1551, 2002)
  - Nodes: metabolites
  - Edges: enzymes
  - Directed: substrate->product
- Transcriptional regulatory network (Lee et al. Science 298:799, 2002)
  - Nodes: transcription factors and genes
  - Edges: transcriptional regulation
  - Directed: TF->target gene
  - Figure labels: Fhl1, RPL2B

Degree, path, shortest path
- Degree: the number of edges adjacent to a node. A simple measure of node centrality.
- Path: a sequence of nodes such that from each node there is an edge to the next node in the sequence.
- Shortest path: a path between two nodes such that the sum of the distances of its constituent edges is minimized.
- Examples from the figures:
  - YDL176W (undirected network): degree 3
  - Fhl1 (directed network): out degree 4, in degree 0

Obama vs Lady Gaga: who is more influential?

  Account   Twitter following (out degree)   Twitter followers (in degree)
  Obama     701,301                          7,035,548
  Gaga      144,263                          8,873,525
  Eminem    0                                3,509,469

Network properties (I): hubs
- Random network (130 nodes, 215 edges)
  - Homogeneous: most nodes have approximately the same number of links
  - The five red nodes with the highest number of links reach 27% of the nodes
- Scale-free network (130 nodes, 215 edges)
  - Heterogeneous: the majority of the nodes have one or two links, but a few nodes have a large number of links
  - The five red nodes with the highest degrees (hubs) reach 60% of the nodes
- Albert et al., Nature, 406:378, 2000

Scale-free biological networks
- Metabolic network, C. elegans: Jeong et al, Nature, 407:651, 2000
- Protein interaction network, H. sapiens: Stelzl et al. Cell, 122:957, 2005
- Gene co-expression network, S. cerevisiae: van Noort et al, EMBO Reports, 5:280, 2004

Network properties (II): small world network
- Stanley Milgram’s small world experiment (figure: map labeled Wichita, Omaha, and Boston)
  - Social network
  - Measured the average path length between two people
  - Instructions to participants: "If you do not know the target person on a personal basis, do not try to contact him directly. Instead, mail this folder to a personal acquaintance who is more likely than you to know the target person."
  - Six degrees of separation
- Small world network: a graph in which most nodes can be reached from every other node by a small number of steps.
- Biological interpretation: efficiency in the transfer of biological information

Network properties (III): motifs
- Network motifs: patterns that occur in the real network significantly more often than in randomized networks.
- Three-node patterns (Milo et al., Science, 298:824, 2002):
  - Feed-forward loop
  - Feedback loop

Network properties (IV): modularity
- Modularity: the organization of a network into modules, where a module is a group of physically or functionally linked molecules (nodes) that work together to achieve a relatively distinct function.
- Examples:
  - Transcriptional module: a set of coregulated genes sharing a common function
  - Protein complex: an assembly of proteins that builds up some cellular machinery; it commonly spans a dense sub-network of proteins in a protein interaction network
  - Signaling pathway: a chain of interacting proteins propagating a signal in the cell
- Figures:
  - Protein interaction modules (Palla et al, Nature, 435:841, 2005)
  - Gene co-expression modules (Shi et al, BMC Syst Biol, 4:74, 2010)

Network distance vs functional similarity
- Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes.
- Sharan et al. Mol Syst Biol, 3:88, 2007

Network-based disease gene prioritization
- For a specific disease, candidate genes can be ranked based on their proximity to known disease genes.
- Kohler et al. Am J Hum Genet. 82:949, 2008

Summary
- Biological networks
  - Protein-protein interaction network; gene regulatory network; metabolic network
- Graph representation of networks
  - Graph, node, edge, undirected graph, directed graph, degree, path, shortest path
- Network properties
  - Hubs and scale-free degree distribution
  - Small-world
  - Motifs
  - Modularity
- Network-based applications
  - Disease gene prioritization
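
Worked example (illustrative sketch)
The graph terms listed in the summary (degree, in and out degree, shortest path) and the idea of ranking candidate genes by proximity to known disease genes can be tried out on a toy network. The Python sketch below is only an illustration under stated assumptions: the node names P1 to P5 and target2 to target4 are hypothetical placeholders, the edges are invented so that YDL176W has degree 3 and Fhl1 has out degree 4 as on the slides, and the ranking uses plain shortest-path distance as a simple stand-in for proximity, which may differ from the measure used in the cited paper.

```python
from collections import deque

# Toy undirected protein interaction network. The edges are hypothetical and
# chosen only so that YDL176W has degree 3, as in the slide example.
ppi_edges = [("YDL176W", "P1"), ("YDL176W", "P2"), ("YDL176W", "P3"),
             ("P3", "P4"), ("P4", "P5")]

# Toy directed regulatory network (TF -> target gene). Fhl1 and RPL2B are
# named on the slides; target2..target4 are hypothetical placeholders.
reg_edges = [("Fhl1", "RPL2B"), ("Fhl1", "target2"),
             ("Fhl1", "target3"), ("Fhl1", "target4")]


def build_undirected(edges):
    """Return an adjacency map {node: set of neighbours} for undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of edges adjacent to a node in an undirected graph."""
    return len(adj.get(node, ()))


def in_out_degree(edges, node):
    """Return (in degree, out degree) of a node in a directed edge list."""
    in_deg = sum(1 for _, v in edges if v == node)
    out_deg = sum(1 for u, _ in edges if u == node)
    return in_deg, out_deg


def shortest_path(adj, source, target):
    """Breadth-first search; every edge counts as distance 1.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    if source not in adj or target not in adj:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(adj[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


def rank_candidates(adj, known_disease_genes, candidates):
    """Rank candidates by shortest-path distance to the nearest known gene."""
    def distance(candidate):
        best = float("inf")
        for known in known_disease_genes:
            path = shortest_path(adj, candidate, known)
            if path is not None:
                best = min(best, len(path) - 1)
        return best
    return sorted(candidates, key=lambda c: (distance(c), c))


ppi = build_undirected(ppi_edges)
print(degree(ppi, "YDL176W"))
print(in_out_degree(reg_edges, "Fhl1"))
print(shortest_path(ppi, "P1", "P5"))
print(rank_candidates(ppi, ["P5"], ["P1", "P3", "P4"]))
```

Breadth-first search is enough here because the toy network is unweighted. If edges carried distances or confidence scores, a weighted method such as Dijkstra's algorithm would be the natural replacement.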