Inferring subnetworks from perturbed expression profiles
Dana Pe'er, Aviv Regev, Gal Elidan and Nir Friedman
Bioinformatics, Vol. 17, Suppl. 1, 2001

Motivation
• Expression profiles give genome-wide information about the state of metabolism, gene regulation, signal transduction, etc.
• One would like to infer functional relationships between the genes from these data.
• Perturbations such as mutations give insight into the effects of particular genes and help us infer causal relationships.

Tool – Bayesian Networks
[Figure: a node Z with children X and Y, annotated with the conditional distributions Pr(X|Z) and Pr(Y|Z)]
• Random variables: gene expression levels
• Probabilistic dependencies: regulatory interactions

Goal of Paper
• Extend the Bayesian network framework in the cellular context to deal with mutations
• Develop better methods to discretize the data
• Define and learn new features in the model, such as mediators, activators, and inhibitors
• Construct subnetworks of strong statistical significance

Learning Networks
• The network is learned by maximizing a score function with respect to the collected data:

$$S(G : D) = \sum_i S_{\mathrm{local}}(X_i, \mathrm{Pa}_i^G : D)$$

$$S_{\mathrm{local}}(X_i, U : D) = \log P(\mathrm{Pa}_i = U) + \log \int \prod_m P(x_i[m] \mid u[m], \theta)\, dP(\theta)$$

where D is the data, G is the graph, Pa_i^G are the parents of X_i in G, X_i is the expression level of gene i, and m indexes the samples.

Equivalent Graphs
• Two graphs may imply the same dependencies; such graphs are called equivalent. For example, X → Y → Z and X ← Y ← Z encode the same conditional independencies.
• So instead of directed graphs we use partially directed graphs, e.g., X — Y — Z.

Learning with Mutations
• If gene X is mutated, we replace its expression level by a constant. For example, if X is knocked out, its expression is replaced by 0.
• The new score function is:

$$S_{\mathrm{local}}(X_i, U : D) = \log P(\mathrm{Pa}_i = U) + \log \int \prod_{m \,:\, X_i \notin \mathrm{Int}(m)} P(x_i[m] \mid u[m], \theta)\, dP(\theta)$$

where Int(m) is the set of "intervened" (mutated) variables in experiment m.
• Notice that two structurally equivalent graphs are no longer guaranteed to get the same score. Two graphs that do receive the same score under this scoring function are called "intervention equivalent."

Other Perturbations
• Temperature sensitivity, kinetic mutations, and environmental stress can also be modeled in the Bayesian network framework.
• A node is added for each condition, which can take the values "on" or "off."
[Figure: a "Temperature" node added as a parent of the affected genes X, Y, Z]

What Do Bayesian Networks Buy Us?
1) Markov neighbors (direct relationships)
[Figure: the ways X and Y can be Markov neighbors, e.g. X a parent of Y, Y a parent of X, or X and Y sharing a child Z]
2) Activators/inhibitors
[Figure: activator/inhibitor example over A, X, B, Y]
Let U = Parents(Y) \ {X}. If for all states u of U, P(Y = 1 | X, u) is increasing as X increases, we say X is an activator; if P(Y = 1 | X, u) is decreasing as X increases, we say X is an inhibitor.
3) d-Separation: mediators
[Figure: Z and U lying on the paths between X and Y]
Both Z and U d-separate X and Y. In this framework they are called mediators of X and Y.

Feature Confidence
• A confidence can be associated with each feature, measuring how sure we are about the truth of the detected feature. This confidence is given by

$$P(f(G) \mid D) \approx \sum_{\text{high-scoring } G} f(G)\, P(G \mid D)$$

where f(G) is the indicator function of the feature of interest.
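To make this estimate concrete, here is a minimal Python sketch (not the authors' code): it approximates P(f | D) by weighting an edge-indicator feature with the normalized scores of a collection of high-scoring graphs. The function names (`edge_confidence`, `all_edge_confidences`), the set-of-edges graph representation, and the toy genes and scores are all illustrative assumptions.

```python
import math
from itertools import permutations

def edge_confidence(graphs, log_scores, edge):
    """Posterior-weighted frequency of `edge` among high-scoring graphs,
    approximating P(f | D) = sum_G f(G) P(G | D)."""
    # Turn log Bayesian scores S(G : D) into normalized weights P(G | D),
    # shifting by the maximum to avoid overflow in exp().
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return sum(w for g, w in zip(graphs, weights) if edge in g) / z

def all_edge_confidences(graphs, log_scores, genes):
    """Confidence of every candidate directed edge (ordered gene pair)."""
    return {(x, y): edge_confidence(graphs, log_scores, (x, y))
            for x, y in permutations(genes, 2)}

# Toy example: three high-scoring graphs over three genes,
# each represented as a set of directed (parent, child) edges.
genes = ["ADE2", "ADE1", "ESC4"]
graphs = [{("ADE2", "ADE1")},
          {("ADE2", "ADE1"), ("ESC4", "ADE1")},
          {("ESC4", "ADE1")}]
log_scores = [-100.0, -101.5, -103.0]
print(all_edge_confidences(graphs, log_scores, genes))
```

The same weighting applies to any feature f(G), e.g. a Markov-neighbor or separator indicator, by swapping the membership test inside the sum.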
Building Significant Subnetworks
1) Naïve approach: for some threshold T, find all edges whose confidence is above T. For each maximally connected subgraph of size greater than 3, grow the graph by adding edges whose confidence exceeds some weaker threshold S.
[Figure: a seed subgraph over the nodes Z, A, Y, X, B]
2) Score-based approach: build a subnetwork and associate with it a score measuring the network's significance. If we build a network with k nodes out of a possible n nodes and include l edges, the score assigned to the network is

$$\binom{n}{k} \binom{K}{l} \prod_i F(c_i), \qquad K = \binom{k}{2},$$

where K is the number of possible edges on k nodes, c_i is the confidence of edge i, and F(x) is the probability that an edge has confidence greater than or equal to x. F(x) is estimated empirically by counting the fraction of edges with confidence at least x. Using this criterion, networks are built from seeds, as in the naïve approach, and grown one node at a time (a small code sketch of this score appears at the end of these slides).

Data
• The Rosetta Inpharmatics Compendium
• Organism: S. cerevisiae
• 300 full-genome expression profiles (experiments)
• 276 deletion mutants
• 11 tetracycline-regulatable alleles
• 13 chemically treated cultures
• In this paper, 565 genes were analyzed

Pairwise Relations
• The method can recognize functional relationships missed by similarity measures. Scores are reported as (confidence, Pearson correlation).
• Purine biosynthesis pair: ADE2 – ADE1 (0.797, 0.518)
• Novel prediction: ESC4 (chromatin silencing) – KU70 (DNA break repair) (0.914, 0.162). A literature search reveals strong support for this interaction.

Separator Relations
• Transcription regulators: nuclear fusion
• Post-translational activation (by phosphorylation): the cell wall integrity pathway
• Post-translational negative regulation: the G-protein mating signalling pathway
[Figure: example separator relations involving KAR4, SST2, FUS1, AGA1, TEC1, SLT2, Rlm1p, Swi4/6 and STE6]

Subnetwork Analysis
[Figure: mating-response subnetwork including KAR4, SST2, TEC1, SLT2, KSS1, YLR343W, YLR334C, STE6, FUS1, PRM1, AGA1, AGA2, TOM6, FIG1, FUS3 and YEL059W]
• They claim they often get modular components
• More structure than clustering alone
• Visual inspection can give clues to unknown gene functions
• The absence of STE12 and the marginal position of FUS3 are disturbing
• http://www.cs.huji.ac.il/labs/compbio/ismb01/

Conclusions
• This technique is better than clustering alone: confidence measures can detect interactions that would otherwise go undetected, and we get more specific information about the structure of interaction networks, making it easier to guess at unknown gene functions.
• Statistical significance of features allows biological exploration of the interaction network.
• It cannot recover all interactions.
• No incorporation of prior biological knowledge.
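Backup: as a supplementary illustration of the score-based subnetwork construction (see the Building Significant Subnetworks slide), here is a minimal Python sketch under the reading of the formula given there; it is not the paper's implementation, and `empirical_F`, `log_subnetwork_score`, and the toy numbers are hypothetical. The score is computed in log space so that the product of many small F(c_i) values does not underflow.

```python
import math

def empirical_F(x, all_confidences):
    """Estimate F(x): the fraction of candidate edges with confidence >= x."""
    return sum(c >= x for c in all_confidences) / len(all_confidences)

def log_subnetwork_score(n, k, subnet_edge_confidences, all_confidences):
    """log of C(n, k) * C(K, l) * prod_i F(c_i), with K = C(k, 2);
    smaller values suggest a subnetwork less likely to arise by chance."""
    K = math.comb(k, 2)                    # possible edges on k nodes
    l = len(subnet_edge_confidences)       # edges actually included
    log_score = math.log(math.comb(n, k)) + math.log(math.comb(K, l))
    for c in subnet_edge_confidences:
        # F(c) > 0 as long as the subnetwork's edges are among the candidates.
        log_score += math.log(empirical_F(c, all_confidences))
    return log_score

# Toy usage: 565 candidate genes, a 4-node seed containing 3 high-confidence edges.
all_conf = [0.10, 0.25, 0.30, 0.55, 0.80, 0.90, 0.95]  # confidences of all candidate edges
print(log_subnetwork_score(565, 4, [0.90, 0.95, 0.80], all_conf))
```

Under this reading, growing a seed one node at a time would amount to greedily adding, at each step, the candidate node whose edges most improve (here, decrease) this score.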