Network inference in Systems Genetics
Dirk Husmeier
Biomathematics & Statistics Scotland (BioSS)

Identify candidate causal genes within the eQTL confidence interval around a marker by (partial) gene expression correlation analysis
[Figure labels: genome with potential candidate genes; marker; bootstrap confidence interval; target gene; significant correlation with target gene]

Distinguish between direct and indirect interactions
[Figure labels: direct interaction; common regulator; indirect interaction; co-regulation]
A and B have a high correlation, but a low partial correlation

Method of Bing and Hoeschele
[Figure sequence around the target gene, one caption per step]
• Significant correlation with target gene
• Keep only the strongest correlation, if significant
• Compute 1st-order partial correlations
• Keep only the strongest partial correlation, if significant
• Compute 2nd-order partial correlations
• Discard 2nd-order partial correlation if not significant
• Resulting network

Network reconstruction, part 1
• For each gene included in the gene list of an eQTL confidence interval → compute the correlation coefficient with the gene expression profile of the gene affected by the eQTL.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/n, n: number of genes in the eQTL confidence interval).
• If significant: identify the gene with the most significant correlation coefficient → Gene 1.

Network reconstruction, part 2
• Compute first-order partial correlation coefficients between the other genes and the gene affected by the eQTL, conditional on Gene 1.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/(n-1), n: number of genes in the eQTL confidence interval).
• If significant: identify the gene with the most significant partial correlation coefficient → Gene 2.

Network reconstruction, part 3
• Compute second-order partial correlation coefficients between the other genes and the gene affected by the eQTL, conditional on Genes 1 & 2.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/(n-2), n: number of genes in the eQTL confidence interval).
• If significant: identify the gene with the most significant partial correlation coefficient → Gene 3.
• And so on …

Shortcomings
• Iterative, heuristic piecemeal approach
• No conditioning on the whole system, but on a set of pre-selected genes

Marriage between graph theory and probability theory
Friedman et al. (2000), J. Comp. Biol. 7, 601-620

Bayesian analysis: integration of prior knowledge
Hyperparameter β trades off data versus prior knowledge
[Figure labels: microarray data; KEGG pathway; shown for β small and for β large]

[Figure labels: Input; MCMC; Learn; protein signalling network from the literature; predicted network]
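
The iterative procedure described under "Network reconstruction", parts 1 to 3, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of Bing and Hoeschele: the function names `partial_corr_pvalue` and `select_candidates` are hypothetical, partial correlations are obtained by regressing out the already selected genes and correlating the residuals, and the expression data are simulated purely for illustration.

```python
import numpy as np
from scipy import stats


def partial_corr_pvalue(x, y, Z):
    """Partial correlation of x and y given the columns of Z, plus a two-sided t-test p-value.

    Z has shape (n_samples, k); k = 0 gives the ordinary correlation.
    """
    n_samples = len(x)
    k = Z.shape[1]
    # Regress x and y on the conditioning genes (with intercept) and correlate the residuals.
    design = np.hstack([np.ones((n_samples, 1)), Z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.corrcoef(res_x, res_y)[0, 1])
    dof = n_samples - 2 - k  # degrees of freedom of the t-test
    t_stat = r * np.sqrt(dof / max(1.0 - r * r, 1e-12))
    return r, float(2.0 * stats.t.sf(abs(t_stat), dof))


def select_candidates(target, candidates, alpha=0.05):
    """Greedy selection: Gene 1, Gene 2, ... as column indices of `candidates`.

    target:     expression profile of the gene affected by the eQTL, shape (n_samples,)
    candidates: expression of genes in the eQTL confidence interval, shape (n_samples, n)
    """
    n_samples = candidates.shape[0]
    remaining = list(range(candidates.shape[1]))
    selected = []
    while remaining and n_samples - 2 - len(selected) >= 1:
        Z = candidates[:, np.array(selected, dtype=int)]
        pvals = [partial_corr_pvalue(candidates[:, j], target, Z)[1] for j in remaining]
        best = int(np.argmin(pvals))
        # Bonferroni threshold: 0.05/n, then 0.05/(n-1), 0.05/(n-2), ...
        if pvals[best] >= alpha / len(remaining):
            break
        selected.append(remaining.pop(best))
    return selected


# Simulated data (an assumption for illustration only): gene 0 drives the target,
# gene 1 is co-regulated by gene 0, gene 2 is unrelated.
rng = np.random.default_rng(0)
g0 = rng.normal(size=200)
target = g0 + 0.3 * rng.normal(size=200)
g1 = g0 + rng.normal(size=200)
g2 = rng.normal(size=200)
print(select_candidates(target, np.column_stack([g0, g1, g2])))
```

In the simulated example, gene 1 is correlated with the target only through the shared regulator gene 0, so conditioning on gene 0 should shrink its partial correlation. This is the direct-versus-indirect distinction the slides draw. Like the procedure itself, the sketch conditions only on the genes already selected, not on the whole system.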