
Network inference in
Systems Genetics
Dirk Husmeier
Biomathematics & Statistics
Scotland (BioSS)
Identify candidate causal genes
within the eQTL confidence interval
around a marker by (partial) gene
expression correlation analysis
[Figure: genome with potential candidate genes around the marker, the bootstrap confidence interval, and the candidate genes showing a significant correlation with the target gene]
Distinguish between direct and indirect interactions
[Figure: direct interaction; indirect interaction; common regulator; co-regulation]
A and B have a high correlation,
but a low partial correlation
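The statement above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gene names A, B and C, the noise levels and the sample size are hypothetical, and the partial correlation is computed as the correlation of the residuals left after regressing A and B on the common regulator C.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


# Hypothetical co-regulation scenario: C drives both A and B,
# and there is no direct interaction between A and B.
rng = random.Random(1)
c = [rng.gauss(0, 1) for _ in range(2000)]
a = [v + rng.gauss(0, 0.5) for v in c]
b = [v + rng.gauss(0, 0.5) for v in c]

r_ab = pearson(a, b)                                      # plain correlation
r_ab_given_c = pearson(residuals(a, c), residuals(b, c))  # partial correlation

print(f"corr(A, B)     = {r_ab:.2f}")
print(f"corr(A, B | C) = {r_ab_given_c:.2f}")
```

In this toy setting the plain correlation is high because both genes inherit the variation of C, while the partial correlation is close to zero once C has been accounted for.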
Method of Bing and Hoeschele
• Significant correlation with target gene
• Keep only the strongest correlation, if significant
• Compute 1st-order partial correlations
• Keep only the strongest partial correlation, if significant
• Compute 2nd-order partial correlations
• Discard 2nd-order partial correlation if not significant
• Resulting network
Network reconstruction, part 1
• For each gene included in the gene list of an
eQTL confidence interval → compute correlation
coefficient with the gene expression profile of the
gene affected by the eQTL.
• Test for significant departure from zero via a t-test
with Bonferroni correction (threshold p-value:
0.05/n, n: number of genes in the eQTL
confidence interval).
• If significant: Identify the gene with the most
significant correlation coefficient → Gene 1.
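For reference, the t-test in this step is usually based on the standard statistic below. The symbols r (sample correlation coefficient) and N (number of expression profiles, i.e. samples) are notation assumed for this note and do not appear on the slides; n is the number of genes in the eQTL confidence interval, as above.

```latex
t = r \sqrt{\frac{N - 2}{1 - r^{2}}},
\qquad
t \sim t_{N-2} \ \text{under } H_0 : \rho = 0,
\qquad
\text{significant if } p < \frac{0.05}{n}
```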
Network reconstruction, part 2
• Compute first-order partial correlation coefficients
between the other genes and the gene affected by
the eQTL, conditional on Gene 1.
• Test for significant departure from zero via a t-test
with Bonferroni correction (threshold p-value:
0.05/(n-1), n: number of genes in the eQTL
confidence interval).
• If significant: Identify the gene with the most
significant partial correlation coefficient → Gene 2.
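A first-order partial correlation can be written in closed form from the three pairwise correlations. In this sketch, x denotes a candidate gene, y the gene affected by the eQTL, and 1 stands for Gene 1; these labels and the sample size N are notation assumed for this note. Conditioning on one gene costs one further degree of freedom in the t-test.

```latex
r_{xy \cdot 1} = \frac{r_{xy} - r_{x1}\, r_{y1}}
                      {\sqrt{\left(1 - r_{x1}^{2}\right)\left(1 - r_{y1}^{2}\right)}},
\qquad
t = r_{xy \cdot 1} \sqrt{\frac{N - 3}{1 - r_{xy \cdot 1}^{2}}} \sim t_{N-3}
\ \text{under } H_0
```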
Network reconstruction, part 3
• Compute second-order partial correlation
coefficients between the other genes and the gene
affected by the eQTL, conditional on Genes 1 & 2.
• Test for significant departure from zero via a t-test
with Bonferroni correction (threshold p-value:
0.05/(n-2), n: number of genes in the eQTL
confidence interval).
• If significant: Identify the gene with the most
significant partial correlation coefficient → Gene 3.
• And so on …
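The three parts above can be sketched as one loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy data and the gene labels g1 to g4 are hypothetical, higher-order partial correlations are obtained by the standard recursion on the order, and the t-distribution p-value is computed by numerical integration so that the sketch needs only the standard library.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, x, y, given):
    """Partial correlation of x and y given the genes in `given` (recursion on the order)."""
    if not given:
        return pearson(data[x], data[y])
    z, rest = given[-1], given[:-1]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, via Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    b = abs(t)
    h = b / steps
    area = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)


def select_regulators(data, target, candidates, alpha=0.05):
    """Greedy selection: Gene 1, Gene 2, ... until no remaining gene is significant."""
    n_samples = len(data[target])
    n = len(candidates)
    selected, remaining = [], list(candidates)
    while remaining:
        # (Partial) correlation with the target, conditional on the genes kept so far.
        scores = {g: partial_corr(data, g, target, selected) for g in remaining}
        best = max(remaining, key=lambda g: abs(scores[g]))
        r = scores[best]
        df = n_samples - 2 - len(selected)
        if df <= 0:
            break
        p = 0.0 if abs(r) >= 1 else t_two_sided_p(r * math.sqrt(df / (1 - r * r)), df)
        # Bonferroni thresholds 0.05/n, 0.05/(n-1), 0.05/(n-2), ...
        if p >= alpha / (n - len(selected)):
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: g1 and g2 act on the target directly, g3 is only
# correlated with the target through g1, and g4 is unrelated noise.
rng = random.Random(0)
N = 200
g1 = [rng.gauss(0, 1) for _ in range(N)]
g2 = [rng.gauss(0, 1) for _ in range(N)]
g3 = [v + rng.gauss(0, 1) for v in g1]
g4 = [rng.gauss(0, 1) for _ in range(N)]
target = [2 * u + v + rng.gauss(0, 0.3) for u, v in zip(g1, g2)]
data = {"g1": g1, "g2": g2, "g3": g3, "g4": g4, "target": target}

selected = select_regulators(data, "target", ["g1", "g2", "g3", "g4"])
print(selected)
```

Because all candidates at a given step share the same degrees of freedom, the gene with the largest absolute (partial) correlation is also the most significant one, so only that gene needs to be tested. The recursion recomputes lower-order terms and is only practical for a handful of conditioning genes.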
Shortcomings
• Iterative, heuristic piecemeal approach
• No conditioning on the whole system, but on a
set of pre-selected genes
Marriage between graph theory and probability theory
Friedman et al. (2000), J. Comp. Biol. 7, 601-620
Bayesian analysis: integration of prior knowledge
Hyperparameter β trades off data versus prior knowledge
[Figure: balance between microarray data and KEGG pathway, shown for small β and for large β]
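One way such a trade-off could be formalised is sketched below. The slides do not give a formula, so everything here is an assumption: the prior is taken to be proportional to exp(-β·E), where the energy E counts disagreements between a candidate graph and the prior-knowledge graph, and the adjacency matrices and log-likelihood values are hypothetical toy numbers.

```python
def energy(graph, prior):
    """Number of adjacency-matrix entries where the graph disagrees with the prior."""
    return sum(abs(g - b)
               for g_row, b_row in zip(graph, prior)
               for g, b in zip(g_row, b_row))


def log_score(log_likelihood, graph, prior, beta):
    """Unnormalised log posterior: data fit minus beta times disagreement with the prior."""
    return log_likelihood - beta * energy(graph, prior)


# Hypothetical prior-knowledge graph on three genes: 0 -> 1 and 1 -> 2.
prior = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]

graph_a = [row[:] for row in prior]   # agrees with the prior
graph_b = [[0, 1, 1],                 # 0 -> 1 and 0 -> 2: two entries differ
           [0, 0, 0],
           [0, 0, 0]]

# Hypothetical log-likelihoods: graph_b fits the data slightly better.
loglik_a, loglik_b = -12.0, -10.0

for beta in (0.1, 5.0):
    score_a = log_score(loglik_a, graph_a, prior, beta)
    score_b = log_score(loglik_b, graph_b, prior, beta)
    winner = "graph_a" if score_a > score_b else "graph_b"
    print(f"beta = {beta}: preferred graph is {winner}")
```

Under this assumed form, a small β lets the data decide between the graphs, while a large β favours the graph that agrees with the prior knowledge. For a fixed β the normalisation constant of the prior is the same for every graph, so it can be ignored when graphs are compared.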
[Figure: input, MCMC and learned output; the protein signalling network from the literature alongside the predicted network]