Biological Networks
8 April 2015
Slides courtesy of Eric Franzosa, Kimberly Glass

Co-authorship of scientific articles
http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Networks in Molecular Biology
• Protein-protein interactions
• Protein-DNA interactions
• Genetic interactions
• Metabolic reactions
• Co-expression interactions
• Text mining interactions
• Association networks
• Etc.
Barabasi & Oltvai, Nature Reviews, 2004

1. Network terminology and vocabulary
• Paths
• Motifs
• Node metrics
• Network metrics
2. Translation to biological networks
• Undirected interactions
• Functional networks
• Predicting PPI and inferring knowledge from them
• Directed networks
3. Activities
• Parsing network data
• KEGG, STRING, and Cytoscape
• Kevin Bacon

NETWORKS

Introduction to networks
A network is a collection of “things” connected by relationships (in math language, a network is called a graph). It consists of a set of vertices V and a set of edges E: G = (V, E).

Vocabulary
The “things” being connected are called nodes (or, in math language, vertices).
V = {v1, v2, v3, v4, v5, v6}

Vocabulary
Relationships/connections between nodes are called edges (the same term is used in math language).
E = {(v1, v2), (v1, v3), (v1, v4), (v1, v5), (v1, v6)}

Vocabulary
An edge is said to be incident to the two nodes it joins. Those two nodes are said to be connected by the edge.

Vocabulary
An edge can be undirected (“A and B do/are something”), drawn A - B, or directed (“A does/is something to B”), drawn A → B.

Network examples
Network         Node is...    Edge is...         Directed?
(not named)     Person        Friendship         No
Politics        Politician    Shared project     No
The Internet    Website       Hyperlink          Yes
Family tree     Person        Descent/Marriage   Yes/No

Vocabulary
The number of edges incident to a node is the node’s degree.
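As a small illustration of the degree definition, the sketch below counts incident edges for the edge set E given on the vocabulary slide. The function name `degrees` is made up for this example and is not part of the slides.

```python
from collections import Counter

# Edge set E from the vocabulary slide: every edge touches v1.
edges = [("v1", "v2"), ("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v1", "v6")]

def degrees(edge_list):
    """Return {node: degree} for an undirected edge list.

    Each edge is incident to both of its end nodes, so it adds
    one to the count of each.
    """
    counts = Counter()
    for a, b in edge_list:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

node_degrees = degrees(edges)
for node in sorted(node_degrees):
    print(node, node_degrees[node])
```

For this edge set, v1 is incident to all five edges and every other node to exactly one.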
Nodes of high degree are called hubs.
[Figure: star network; the five outer nodes have degree 1 and the central node has degree 5. This hub is a 5th-degree node.]

Vocabulary
Degree in directed networks can be split into in-degree and out-degree, the numbers of incoming and outgoing edges respectively.
[Figure: directed network with nodes labelled In-Degree/Out-Degree: 1/0, 0/1, 1/0, 2/3, 0/1, 1/0.]

Graphs
• Graph G = (V, E) is a set of vertices V and edges E
  - V = {v1, v2, v3, v4, v5}
  - E = {(v1, v2), (v1, v3), (v2, v4), (v2, v5), (v3, v5)}
• A subgraph G′ of G is induced by some V′ ⊆ V and E′ ⊆ E
  - For example, V′ = {v1, v2, v3} and E′ = {(v1, v2), (v1, v3)}

Networks and Graphs: Terminology
• Formally, a network is a graph:
  - G = (V, E), an ordered tuple of two sets
  - V = {v1, …, vn}, a set of unique nodes, and
  - E = {(vi, vj), …}, a set of (un)ordered node tuples
• Graph types illustrated: bipartite, weighted, directed, undirected, multigraph, loops (self-connections), cyclic, acyclic (DAG)

Sparse vs Dense
• G(V, E), where |V| = n and |E| = m are the numbers of vertices and edges
• Graph is sparse if m ~ n
• Graph is dense if m ~ n^2
• A complete simple undirected graph contains every possible edge: m = n(n-1)/2

Connected Components
• G(V, E)
• |V| = 69
• |E| = 71
• 6 connected components

PATHS

Vocabulary
[Figure: a path from a start node to an end node; path length = 4.]

Vocabulary: Weighted Edges (distance)
[Figure: a path from a start node to an end node in a network with weighted edges; weighted path length = 9.]
Larger distance = weaker connection

Paths
A path is a sequence {x1, x2, …, xn} such that (x1, x2), (x2, x3), …, (xn-1, xn) are edges of the graph. A closed path (xn = x1) on a graph is called a graph cycle or circuit.
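The path and cycle definitions can be checked mechanically. The following is a minimal sketch on the five-node example graph from the Graphs slide, treated as undirected. The helper names (`has_edge`, `is_path`, `is_cycle`, `path_length`) are hypothetical and chosen only for this illustration.

```python
# Example graph from the "Graphs" slide, stored as a set of undirected edges.
EDGES = {("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v2", "v5"), ("v3", "v5")}

def has_edge(a, b, edges=EDGES):
    """An undirected edge may be stored in either orientation."""
    return (a, b) in edges or (b, a) in edges

def is_path(nodes, edges=EDGES):
    """True if every consecutive pair (x_i, x_i+1) is an edge of the graph."""
    if len(nodes) < 2:
        return False
    return all(has_edge(a, b, edges) for a, b in zip(nodes, nodes[1:]))

def is_cycle(nodes, edges=EDGES):
    """A closed path: a path whose last node equals its first node."""
    return is_path(nodes, edges) and nodes[0] == nodes[-1]

def path_length(nodes):
    """Unweighted path length = number of edges traversed."""
    return len(nodes) - 1

print(is_path(["v4", "v2", "v1", "v3"]))         # consecutive pairs are all edges
print(is_path(["v4", "v5"]))                     # (v4, v5) is not in E
print(is_cycle(["v1", "v2", "v5", "v3", "v1"]))  # closed path back to v1
```

Storing edges as a set keeps the membership test simple; an adjacency list would be the more usual choice for larger graphs.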
SHORTEST PATH

Shortest-Path between nodes
[Figure]

Longest Shortest-Path
[Figure]

Breadth First Search (BFS)
Goal: Search for a node j starting from a start node i
BFS Algorithm
- Begin at start node i
- Explore all neighbors
- For each neighbor, explore its neighbors
- Keep going until you find search node j

BFS Train Problem
How do I get from Frankfurt (F) to Munich (M) with the fewest connections?
[Figure: train network with the route F → K → M.]
The simple version does not account for weights.

Dijkstra’s Algorithm
- Finds the shortest path in a network
- Network can be weighted (with non-negative weights) or unweighted (distances = 1)
- Network can be directed or undirected
- Widely used in computer network routing protocols and transportation route calculations
Basic idea: consider the closest nodes (to the start node) first, rather than every neighbor. In the unweighted case it reduces to BFS.

Dijkstra’s Algorithm: Train Problem (F → M)
[Figure: weighted train network with nodes F, Ma, Kr, K, W, E, N, A, S, M and edge weights 85, 173, 217, 80, 186, 103, 502, 250, 183, 84, 167.]

Step 1 (Initialization): Sort neighbors by distance to start node. Mark all nodes (except start) as unvisited.
Step 2 (Visit closest neighbors): Visit the closest node to the start. Mark it as visited and keep track of the path. Calculate/update neighbor distances to the start node.
Step 3 (Repeat): Repeat Step 2 until you reach the destination. Never revisit a node.

Distance from F after each step (* = visited, ??? = not yet reached):

Node   Step 1   Step 2 (visit Ma)   Step 3 (visit Kr)
F      0*       0*                  0*
Ma     85       85*                 85*
Kr     ???      165                 165*
K      173      173                 173
W      217      217                 217
N      ???      ???                 ???
E      ???      ???                 ???
A      ???      ???                 415
M      ???      ???                 ???
S      ???      ???                 ???
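The three steps above can be sketched in Python with a priority queue. This is an illustration under stated assumptions, not the slides' own code: the edge list below contains only the connections whose endpoints can be derived from the distance tables (for example Ma-Kr = 165 - 85 = 80), and the network is assumed to be undirected. The two remaining figure weights (84 and 167) are left out because the flattened figure does not show which nodes they join, so the sketch reaches M only through K.

```python
import heapq
from collections import defaultdict

# Edges recoverable from the distance tables (assumed undirected).
EDGES = [
    ("F", "Ma", 85), ("F", "K", 173), ("F", "W", 217),
    ("Ma", "Kr", 80), ("Kr", "A", 250), ("K", "M", 502),
    ("W", "N", 103), ("W", "E", 186), ("N", "S", 183),
]

def build_graph(edges):
    """Adjacency map {node: {neighbor: weight}} for an undirected graph."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph

def dijkstra(graph, start):
    """Shortest distance from `start` to every reachable node.

    Assumes non-negative edge weights.
    """
    dist = {start: 0}            # Step 1: only the start has a known distance
    visited = set()
    queue = [(0, start)]         # (tentative distance, node)
    while queue:
        d, node = heapq.heappop(queue)   # Step 2: closest unvisited node
        if node in visited:
            continue                     # never revisit a node
        visited.add(node)
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate          # update neighbor distance
                heapq.heappush(queue, (candidate, neighbor))
    return dist                  # Step 3: loop ends when the queue is empty

distances = dijkstra(build_graph(EDGES), "F")
for node in ["Ma", "Kr", "K", "W", "N", "E", "A", "S"]:
    print(node, distances[node])
```

The sketch runs until every reachable node has been visited rather than stopping at a destination; stopping early is safe once the destination itself has been popped from the queue.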
Dijkstra’s Algorithm: Train Problem (F → M), Step 3 continued
Distance from F after each further visit (* = visited, ??? = not yet reached):

Node   Visit K   Visit W   Visit N   Visit E   Visit A
F      0*        0*        0*        0*        0*
Ma     85*       85*       85*       85*       85*
Kr     165*      165*      165*      165*      165*
K      173*      173*      173*      173*      173*
W      217       217*      217*      217*      217*
N      ???       320       320*      320*      320*
E      ???       403       403       403*      403*
A      415       415       415       415       415*
M      675       675       675       675       599
S      ???       ???       503       503       503

DONE!!!!! The final table matches the last column above (M = 599, S = 503).
Note: a node's distance is final only once the node itself has been visited; in the last column S (503) is still closer than M (599) and would be visited first.

NODE METRICS

CENTRALITY

What is centrality?
Centrality is a measure of the relative importance of a node within a network.
Three typical measures of centrality:
1. Degree Centrality
2. Closeness Centrality
3. Betweenness Centrality

Degree Centrality
Degree centrality is the simplest measure and is equal to the degree of the node.
What are nodes with high degree centrality called? Hubs

What are hubs?
Hubs are nodes with a “high” degree
- Date hubs: interact with many partners at different times
- Party hubs: interact with many partners all the time
Controversial: Is there a difference?

Removing hubs is bad for network integrity
Removing date hubs from the yeast PPI network results in small subgraphs.

Hubs are essential
- Single knockouts of essential genes cause the organism to die
- Hubs are more likely to be essential than other genes in the yeast protein-protein interaction (PPI) network

Knock-out lethality and connectivity
[Figure, two panels: degree distribution P(k) against degree k on log-log axes with fit y = 1.2x^-1.91; % essential genes against degree k.]

How do you determine degree cutoff?
A hub is a node with a “high” degree
- Hub has degree > k, with k = 5 or 8 or 12 or 20
- Hub has degree > the degree of x% of all nodes, with x = 50 or 80 or 95
The degree cutoff is (typically) determined ad hoc.

Degree centrality is normalized
CD(i) = Degree(i) / (N - 1)
Degree of the node divided by the number of other nodes it could connect to (ignoring self-loops).
Normalized metric for comparing the same node in different networks.

Closeness Centrality
Closeness centrality measures how close a node is to everything else.
It is based on the average shortest path length to all other nodes: the shorter the average, the higher the closeness.

Betweenness Centrality
Betweenness centrality measures the number of times a node is present in shortest paths between ALL pairs of nodes.

CLUSTERING COEFFICIENT

Clustering Coefficient
The clustering coefficient of node i (Ci) measures how close its neighbors are to being a clique (completely connected subgraph).
Clique: all nodes interact
[Figure: example for node A with neighbors B and C: # edges = 2, # max edges = 6, CA = 2/6 = 1/3.]

Clustering coefficient
The density of the network surrounding node I, characterized as the number of triangles through I. Related to network modularity.
C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k (k - 1)}
k: number of neighbors of I
n_I: number of edges between node I’s neighbors
Example: the center node has 8 (grey) neighbors and there are 4 edges between the neighbors, so C = 2*4 / (8*(8-1)) = 8/56 = 1/7.

WHOLE NETWORK METRICS

NETWORK PROPERTIES

Node to Network Properties
A simple set of properties comes from averaging a given property over all nodes: Degree_avg, C_avg
You can also average all distances (shortest paths).
Characteristic Path Length (CPL): average distance between all pairs of nodes
But averages are highly dependent on the number of nodes. It is better to look at a distribution.
Network properties can be compared against random (and randomized) networks to assess significance.

Diameter
Diameter: maximum distance between all pairs of nodes (the longest shortest path)
Network properties allow you to compare different networks.

RANDOM NETWORKS

Random Networks: ER Model
The Erdős–Rényi (ER) model is a method for generating a random network.
Algorithm:
- Loop through each pair of the N nodes
- Add an edge between them with probability p
[Figure: portraits of Alfréd Rényi and Paul Erdős; example network with p = 0.01.]

Real vs. Random
Real networks have different properties than random networks.
Real networks are small-world and scale-free.

Small-world
Small-world: most nodes can be reached from every other node in a small number of steps, i.e., small-world networks have small diameter.
Example: President Teddy Roosevelt has a Bacon number of 3.
6° of separation

Randomizing Networks: Swapping
Some properties, such as shortest path length, are heavily dependent on the size of the network AND the degrees of the nodes.
To avoid changing basic degree-related properties, one can randomize an existing real network by iteratively swapping the ends of two edges:
(X1, Y1), (X2, Y2) → (X1, Y2), (X2, Y1)

Scale-free
Degree distribution: frequency of all possible node degrees in a network
Scale-free: the degree distribution follows a power law, i.e., most nodes have small degree, but some have a very large degree
P(k) \sim k^{-\gamma}

NETWORK MOTIFS

Motifs
Recurring pattern in a network with a biological significance
Pioneering work by Uri Alon

Biological function of motifs
- Network motifs are considered the basic building blocks of a network
- Network motifs act as information-processing circuits
- The coherent feed-forward loop (FFL) acts as a noise filter
[Figure: TF x, TF y and gene z. X increases, then Y increases; once X and Y have both increased, Z increases. There is a time delay between X increasing and Y increasing.]

3-node model and simulation
[Figure]

Biological Networks

Complexity comes from the set of parts...
...and their connections (e.g., metabolism)

How is biological data represented in networks?
• Gene expression
• Physical PPIs
• Genetic interactions
• Colocalization
• Sequence
• Protein domains
• Regulatory binding sites
• …
[Figure: data sources are combined into a network whose edges range from low correlation to high correlation.]

Building and Interpreting Biological Networks
• How we build a biological network depends on what data we have AND what we want the edges in the network to represent.
• The meaning of the edges in a biological network depends on the method used to generate those edges. This influences how we interpret the interactions in a network.
node: an object in the network (e.g. genes)
edge: indicates a relationship between two nodes

Interpreting the “edges” in Biological Networks

Relational Networks
• Generally undirected (non-causal relationships)
• Nodes all of same “type”
• Generally no “signs” on edges
Example: Protein A is a dimerization partner with protein B.

Correlation Network
• Undirected (non-causal relationships)
• Nodes all of same “type”
• Edges can have “signs”
Example: When the expression of Gene A changes, so does the expression of Gene B.*

Regulatory Network
• Directed network (causal relationships)
• Can have “types” of nodes
• Edges can have “signs”
Example: TF A regulates Gene B.

*Correlation is not causation.

Network examples (Molecular biology -omes)
Network                    Node is...      Edge is...                   Directed?
Physical Interactome       Protein         Direct/indirect contact      No
Genetic Interactome        Gene            Epistatic relationship       No
Informatic Interactome     Various         Computed similarity          No
Regulatory Interactome 1   TF/gene         Transcriptional activation   Yes
Regulatory Interactome 2   Kinase/target   Phosphorylation              Yes
Metabolome 1               Reactant        Reaction                     Yes
Metabolome 2               Reaction        Reactant                     Yes

PHYSICAL INTERACTION

Physical interactions between proteins (protein-protein interactions) are intuitive to think about. Protein A makes direct physical contact with Protein B in the cell; alternatively, A and B both interact with a third (mediator) protein, C.

Examples
ATP synthase is a large, stable complex of physically interacting proteins. These are permanent* interactions.
*also called “obligate” or “constitutive”

Examples
(1) Cyclin binds to CDK and (2) the Cyclin-CDK complex binds to a target protein. These are transient interactions.

Detection
Some physical interactions are inferred from biochemical activities (e.g., a kinase and its target) or from structures (e.g., two chains in contact in the PDB).
There are many experimental techniques for validating or screening for protein-protein interactions. The most popular are affinity capture and two-hybrid.

Affinity capture
1. The cell’s contents are exposed to a surface engineered to bind a particular protein (the bait, here A). This is often done using an antibody specific to A or a tag fused to A.
2. The bait protein binds to the surface, bringing its various interaction partners (called prey) along with it.
3. The unbound cellular contents are then washed away.
4. Prey proteins pulled down by the bait are identified using prey-specific antibodies or by mass spectrometry.

Affinity capture
Method strengths: Done well, co-immunoprecipitation is considered a gold standard for detecting protein-protein interactions.
Method weaknesses: Can’t differentiate between direct and indirect (mediated) contact; prey must bind bait tightly to be pulled down.

Two-hybrid
The two-hybrid method exploits the independent operation of the DNA-binding (BD) and transcription-activating (AD) domains of eukaryotic transcription factors to detect interactions.
[Figures: a transcription factor at the UAS upstream of a gene; BD at the UAS; transcription ON.]
Two fusion proteins are made: BD-P1 (bait) and AD-P2 (prey).
Two-hybrid
Interaction of P1 and P2 is sufficient to initiate transcription.
[Figures: BD at the UAS upstream of the gene; transcription ON.]

Two-hybrid
Method strengths: Scales well to very high-throughput screens; can detect transient interactions; reasonably specific to binary (A+B) type interactions.
Method weaknesses: High false positive and negative rates; fusion may affect bait/prey proteins’ ability to fold or bind; bait/prey may not be able to enter the nucleus (required for activation).

GENETIC INTERACTIONS

Genetic interactions are more abstract. They go by many names, often recognized by the terms phenotypic, synthetic, or dosage. All are related to the concept of epistasis.

Epistasis
Let’s say there are two ways of regenerating ATP from ADP and Pi: one mediated by gene 1 (solid) and another by gene 2 (dashed).

Epistasis
If only one of the two pathways is lost, the redundant pathway remains, the cell can still produce ATP, and therefore it lives. Phenotype = alive.

Epistasis
If both pathways are lost, the cell cannot produce ATP and therefore dies. Loss of both genes results in a new phenotype. Phenotype = dead.

This notion, that a new phenotype can result from a combination of changes at the genetic level, is epistasis. We report a genetic interaction between genes 1 and 2 called synthetic lethality.
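The two-pathway argument amounts to a small truth table. The sketch below encodes it directly; `cell_is_alive` and `is_synthetic_lethal` are hypothetical names used only for this illustration, and the model is deliberately reduced to the one assumption made in the slides (either pathway alone is enough to produce ATP).

```python
from itertools import product

def cell_is_alive(gene1_functional, gene2_functional):
    """The cell lives if at least one ATP-producing pathway still works."""
    return gene1_functional or gene2_functional

def is_synthetic_lethal(phenotype):
    """True if each single knockout is viable but the double knockout is not."""
    single_1 = phenotype(False, True)    # gene 1 knocked out
    single_2 = phenotype(True, False)    # gene 2 knocked out
    double = phenotype(False, False)     # both knocked out
    return single_1 and single_2 and not double

# Enumerate all four genotypes of the two-gene model.
for g1, g2 in product([True, False], repeat=2):
    state = "alive" if cell_is_alive(g1, g2) else "dead"
    print(f"gene 1 functional={g1}, gene 2 functional={g2}: {state}")

print(is_synthetic_lethal(cell_is_alive))
```

A gene that is essential on its own (for example a phenotype that depends only on gene 1) fails the check, because its single knockout is already lethal.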
(Related terms: sick, phenotypic enhancement, rescue.)

Genetic interactions can be useful for identifying parallel pathways and other subtle (non-physical) interactions. Complexes may also be revealed if they are robust against the removal of one, but not two, components.
[Figure: a complex with components A, B, C and D, shown intact and with components removed.]

DATABASES

Common interaction databases
BioGRID (http://www.thebiogrid.org/)
Biological General Repository for Interaction Datasets. Comprehensive, especially for yeast; includes high-throughput and small-scale analyses; 250,000 interactions.
MINT (http://mint.bio.uniroma2.it/mint/)
Molecular Interaction database. Experimental interaction data manually curated from literature. 80,000 interactions.
MIPS (http://mips.helmholtz-muenchen.de/)
Munich Information Center for Protein Sequences. Very well curated; often used as a “gold standard” of protein-protein interaction.
HPRD (http://www.hprd.org/)
Human Protein Reference Database. Emphasis on human protein bioinformatics, including 40,000 interactions.
Others…

Single interaction report
Gene/Protein 1 (code and alias): YOR128C, ADE2
Gene/Protein 2 (code and alias): YCR066W, RAD18
Experimental method: Two-hybrid
Reference (including PubMed ID): Uetz P (2000), 10688190

Statistics from BioGRID (2009): Organisms
Species                                      Genes in Genome   Reported Interactions   % Confirmed   % Physical   % Genetic
Saccharomyces cerevisiae (Baker’s Yeast)     6,000             95,978                  25%           49%          54%
Homo sapiens (Human)                         25,000            26,864                  29%           100%         1%
Drosophila melanogaster (Fruitfly)           14,000            24,953                  11%           89%          11%
Schizosaccharomyces pombe (Fission yeast)    5,000             11,562                  11%           16%          88%
Caenorhabditis elegans (Nematode worm)       20,000            6,622                   2%            69%          31%
Arabidopsis thaliana (Mouse-ear cress)       25,000            2,611                   27%           97%          4%
Mus musculus (Mouse)                         24,000            894                     21%           99%          3%

Statistics from BioGRID (2009): Methods
Method Type   Method Name                Interactions Reported   Papers Using
Physical      Two-hybrid                 48,192                  4,519
Physical      Affinity Capture-MS        31,258                  655
Genetic       Phenotypic Enhancement     30,807                  2,675
Physical      Affinity Capture-Western   16,524                  8,763
Genetic       Phenotypic Suppression     12,399                  1,936
Genetic       Synthetic Growth Defect    12,085                  980
Physical      Reconstituted Complex      11,782                  7,138
Genetic       Synthetic Lethality        11,666                  1,555
Physical      Biochemical Activity       6,657                   1,370
Genetic       Dosage Rescue              3,660                   1,736
Genetic       Synthetic Rescue           2,767                   1,277
Genetic       PCA                        2,685                   31
Physical      Co-purification            2,168                   615
Physical      Affinity Capture-RNA       1,209                   24
Physical      Co-fractionation           1,065                   444

Statistics from BioGRID (2009): Papers
Interactions Reported (≤)   Number of Papers
1                           9,639
10                          10,696
100                         1,049
1,000                       64
10,000                      25
100,000                     2
The vast majority of interaction-reporting papers (94.7%) report 10 or fewer interactions (99.6% for 100 or fewer). About 20% of known interactions have only been observed in studies reporting 10 or fewer interactions.

What are they?
FUNCTIONAL NETWORKS

Functional association network or functional linkage network (FLN)
- Nodes are genes or proteins
- Edges are functional associations

What can we use to functionally link genes/proteins? GO!

STRING
http://string.embl.de/
- Physical interactions
- Genomic context (e.g. gene fusion events)
- Coexpression (microarray)
- Literature co-occurrence

STRING
Functional association → predicted physical interaction? Maybe.
Works because they include additional information: species co-occurrence (630 organisms!!)

PPI PREDICTION

Homology based prediction
- Interacting proteins are more likely to co-evolve
- Interactions are transferred to corresponding orthologs
[Figure: mouse proteins A and B physically interact; A and B have human orthologs α and β. Do α and β physically interact?]
“Interologs”: Interacting AND Homologous
HOLD YOUR HORSES!

Phylogenetic profiling
Ortholog interactions must be present across many species

Species   A-B interaction
Human     ? (predicted: Yes)
Mouse     Yes
Chicken   Yes
Yeast     Yes
Worm      No
Fly       Yes
Fugu      Yes
E. coli   No
5 out of 7, p-value = 0.0001

Phylogenetic tree similarity
- Entirely based on co-evolution
- If A and B have similar trees, they are predicted to interact

Structural patterns
- Identify interaction interfaces from structures
- Search for the same interface in other pairs of PDB structures

Integrate all information
The best prediction algorithms integrate different lines of evidence using machine learning, as STRING does.
Basic idea:
Step 1: Identify recurring evidence patterns in known interactions (training)
Step 2: Identify new interactions by searching for the same evidence patterns in unknown protein pairs (testing)

PPI ANALYSES

How to use interactomes?
Remember: the network is undirected
Clustering
- Find complexes
- Protein neighborhoods (functional)
Other
- Inferring knowledge such as functional annotations

Clustering
[Figure]

Function Assignment
Guilt-by-association: function is transferred from neighbors
[Figure: a protein with interacting partners A-F; interacting partner annotations: BLUE, GREEN. Variants shown: “best” = max; all.]
McGary et al, Genome Biology, 2007

Correlation Networks
Correlation is the simplest metric for co-expression
[Figure: genes × conditions expression matrix turned into a genes × genes correlation matrix.]

Mutual Information is a Measure of Non-linear Correlation
[Figure: scatter plots labelled with their Pearson correlation value. Source: http://en.wikipedia.org/wiki/Correlation_and_dependence]

Mutual Information (MI)
Definition:
I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
Properties:
I(X;Y) = H(X) - H(X|Y)
• Measures how much knowing one of these variables reduces uncertainty about the other
• Non-negative and symmetric
• Invariant under invertible (including nonlinear) transformations of X and Y
Network reconstruction algorithms that use MI:
• ARACNE
• CLR

DIRECTED NETWORKS

Regulatory Networks
Signaling: Protein A → Protein B (phosphorylation; activation or inhibition)
Transcriptional regulation: TF A → Gene B (expression or repression); TF = transcription factor

Regulatory Networks
Signal at cell surface → cascade to nucleus → activate transcription factors (TF) → genes → gene expression

TF NETWORKS

Transcriptional Regulatory Networks
Identify genes where transcription factors bind
DNA binding sites
- Experimental techniques
- Computational prediction

Identifying DNA Binding Sites: Experiments
ChIP-chip / ChIP-seq: chromatin immunoprecipitation (ChIP) followed by microarray analysis (chip) or sequencing (seq)

Identifying DNA Binding Sites: Computational
Motif scanning: scan promoters using position weight matrices (PWM)

Yeast Transcriptional Regulatory Network
Rick Young dataset

Yeast Transcriptional Regulatory Network
TF-TF interactions only
Every edge can be an activation or an inhibition.

SIGNALING NETWORKS

Overview
Edges: activation or inhibition (multiple edge types!)

KEGG Pathways Database
Edges: activation, inhibition, phosphorylation, etc.

KEGG Pathways Database
Literature curated, manually drawn pathways
Groups of pathways:
- Metabolism
- Genetic Information Processing
- Environmental Information Processing
- Cellular Processes
- Human Diseases
Pathways are both species specific & cross-species (KO)

Other Pathway Databases
KEGG (http://www.kegg.jp/kegg/pathway.html)
Great for metabolic pathways. Simple interface. Multiple species including prokaryotes.
REACTOME (http://www.reactome.org/)
Supposedly the most comprehensive resource for signal transduction pathways. Human only.
BIOCARTA (http://www.biocarta.com/genes/index.asp)
Pretty maps with lots of colors. Mammalian.
Experiments
Decades of low-throughput, painstaking experiments
- Stimulation
- Mutants
- Structure
- Context
No single experiment type is enough to deduce a signaling network

DIRECTED NETWORKS

Directions = Pathways
- Chain regulatory interactions
- Concept of pathway emerges from directions
- New analyses not possible with undirected networks
[Figure: two chains, TF A → TF B → TF C → Gene D and Signal → Recept. A → Kinase B → Protein C → TF D.]

Connect the dots
Signal at cell surface → cascade to nucleus → activate transcription factors (TF) → genes → gene expression

Clustering
[Figure]

Network Analysis and Visualization
http://www.cytoscape.org/
http://igraph.sourceforge.net/
http://www.graphviz.org/

SUMMARY / APPLICATION

Functional mapping: mining biological networks
Predicted relationships between genes, with edges ranging from low confidence to high confidence.
- The strength of the relationships among the genes of one process (e.g., cell cycle genes) indicates how cohesive the process is.
- The strength of the relationships between two gene sets (e.g., cell cycle genes and DNA replication genes) indicates how associated the two processes are.

Predicting gene function
Predicted relationships between genes, with edges ranging from low confidence to high confidence.
The edges between a gene and the genes of a process (e.g., cell cycle genes) provide a measure of how likely the gene is to specifically participate in the process of interest.
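One simple way to turn these edges into a number is to average the confidence of a candidate gene's edges to the known members of a process. The sketch below does this on made-up data: the gene names (`cc1`, `cand1`, ...), the confidence values and the averaging rule are all assumptions for illustration, not the scoring used by any particular tool.

```python
def process_score(gene, process_genes, confidence):
    """Average edge confidence between `gene` and the genes of a process.

    `confidence` maps an unordered gene pair (a frozenset) to a value
    between 0 (low confidence) and 1 (high confidence); missing pairs
    count as 0.
    """
    partners = [g for g in process_genes if g != gene]
    if not partners:
        return 0.0
    total = sum(confidence.get(frozenset((gene, p)), 0.0) for p in partners)
    return total / len(partners)

# Hypothetical functional network: cc1 and cc2 are known cell cycle genes.
cell_cycle_genes = ["cc1", "cc2"]
confidence = {
    frozenset(("cc1", "cc2")): 0.95,
    frozenset(("cand1", "cc1")): 0.9,
    frozenset(("cand1", "cc2")): 0.8,
    frozenset(("cand2", "cc1")): 0.1,
}

for candidate in ["cand1", "cand2"]:
    score = process_score(candidate, cell_cycle_genes, confidence)
    print(candidate, round(score, 2))
```

Applied to the members of a process themselves, the same average gives a rough measure of how cohesive the process is.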