Pathway‐Based Approach for Genetic Analysis of Gene Expression Alex C. Lam GeneSys inaugural meeting

1 Oct ‐ 3 Oct 2008, Edinburgh, UK

eQTL mapping
[Figure from Rockman et al., Nat. Rev. Genet. 2006]

Signals from a gene set

Type II error
• eQTL studies have great potential in dissecting complex traits, but…
  – Genome scan + large number of traits
  – Massive multiple testing
  – Stringent threshold required
• Some real signals fail to reach the statistical significance threshold
• Can we use existing biological knowledge as a prior to guide detection?

Outline of this presentation
• Idea of gene set testing
• The BXH/HXB rat inbred line dataset
• KEGG pathway
• Wilcoxon Test
• Fisher’s Exact Test
• Gene Ontology (GO)
• Results and summary

Gene set testing
• Consider differential expression (DE)
• Genes are classified as DE or non‐DE
• Suppose that genes can be categorized according to their gene function
• Null hypothesis: the extent of DE is identical across all functional gene sets
• If more DE genes come from a gene set than expected, the relationship could be interesting

Hypertensive rat eQTL study
• Hubner et al. (Nat. Genet. 2005)
• 30 recombinant inbred lines of Spontaneously Hypertensive Rat / Brown Norway at F60
• Rat Affymetrix GeneChip
• 2 tissues: kidney and fat
• 1,011 autosomal microsatellite markers
• Linkage analysis for each transcript

KEGG pathways
• Kyoto Encyclopedia of Genes and Genomes

Gene filtering
• 15,923 probesets
  – Remove controls and non‐expressed probesets
• ~13,000 probesets
  – Remove probesets with no EntrezGene entry. Also remove duplicates
• ~9,000 probesets
  – Remove probesets with no KEGG entry. Also remove KEGG sets with < 5 genes
• Fat: 2185 genes, 152 KEGG pathways

Gene set testing along the genome
• Conventional eQTL analysis:
  – For each gene expression phenotype, consider the linkage evidence (e.g. Likelihood Ratio Test statistics) along the genome at regular intervals (e.g. every 1 cM)
• Gene set testing:
  – At regular intervals, consider the linkage evidence of multiple gene expression traits. The LRT statistics are the input of the test.

Method (1)
• Two‐sample Wilcoxon test
  – Non‐parametric version of the “t‐test”
  – Rank the LRT test statistics
  – Test if the genes in the gene set rank higher than those not in the gene set
[Figure: genes in set A and genes not in set A, ordered by rank]

Permutation (1)
[Figure: observed W for the genes in set A; re‐sampling gives W1, ..., W1000]

Ribosome pathway

Permutation (2)
[Figure: observed W for the genes in set A; re‐sampling gives W1, ..., W1000]

Results ‐ Wilcoxon Test
• 10 signals at < 5% genome‐wise significance
• 2 examples below: most members of the gene set ranked highly at the locus
• Ranks might not be meaningful
[Figure: two example loci, with the point‐wise 5% threshold marked]

Method (2)
• Set a linkage threshold for individual genes
• Test for over‐representation of gene set
[Figure: "Expected" and "Enrichment" panels showing gene set A and the genes with linkage detected, within the gene universe]

2 by 2 table representation
• One‐tailed Fisher’s Exact test
• P‐value 0.001 used as linkage threshold

                   Genes linked to eQTL   Genes not linked to eQTL
In gene set        A                      B
Not in gene set    C                      D

A + B + C + D = all genes in gene universe

Results ‐ Fisher’s Exact Test

KEGG ID  Chr  cM       Minimum genomewise P-value  Gene set size  No. of genes with significant statistic *  KEGG pathway name                        local
05020    1    143      0.013                       12             3 (23)                                     Parkinson's disease
04360    3    210-211  0.015                       70             5 (8)                                      Axon guidance
00630    5    71-72    0.030                       7              3 (8)                                      Glyoxylate and dicarboxylate metabolism
03022    12   18       0.043                       15             2 (3)                                      Basal transcription factors
00260    19   35-36    0.041                       24             3 (10)                                     Glycine, serine and threonine metabolism
04514    20   1-6      0.007                       81             9 (14)                                     Cell adhesion molecules (CAMs)
04612    20   1-5      0.003                       46             10 (14)                                    Antigen processing and presentation
04940    20   2-5      0.003                       34             9 (14)                                     Type I diabetes mellitus

Results ‐ Fisher’s Exact Test
• Chromosome 20 signals came from MHC genes; likely to be artefacts
• Other signals tend to come from a small number of genes; robustness?
• Overall, none of the signals are very convincing
• Too many eQTL discarded; KEGG coverage is quite low

Gene Ontology (GO)
• An alternative to KEGG for creating gene sets
• Describes the functions of gene products
• More genes are annotated in GO than in KEGG
  – After removing GO terms with > 100 genes or < 10 genes, 5893 genes are retained for analysis (from 1676 GO terms)

Results ‐ Fisher’s Exact Test
• 13 signals
• LRT statistic threshold P < 0.001
• Examples:
  – Chr 17 potassium channel activity (19 / 61)
  – Chr 17 oxidoreductase activity (13 / 28)
  – Chr 5 immunological synapse (6 / 14)
• Should these genes be looked at more closely?

Potentials
• Highlight putative effects that are moderate in size
• Start more in‐depth in‐silico analyses
  – Correlation
  – Interaction
• Generate new hypotheses for future study

Caveats
• Analysis is only as good as the annotation
  – Incomplete?
  – Incorrect?
• The number of genes included has a strong influence
  – Repeatable in a larger study?
• Arbitrary threshold for Fisher’s Exact Test
• Multiple testing of gene sets

Summary
• Gene set testing can identify interesting “multi‐trait eQTL”
• Permutation should be carried out at the subject level
• Wilcoxon Test picks up a lot of noise
• The size of the gene universe is important
• Currently, GO has a higher coverage than KEGG

Acknowledgements
– Edinburgh
  • DJ de Koning
  • Chris Haley
– MRC Clinical Sciences Centre / Imperial
  • Tim Aitman
  • Enrico Petretto
– Aarhus
  • Peter Sørensen
• Funding
  – BBSRC
  – Genesis Faraday
  – Genus / PIC
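
Sketch for Method (1) and the permutation slides
The Wilcoxon approach and the summary point that permutation should be done at the subject level can be illustrated roughly as follows. This is a minimal sketch, not the analysis code behind the talk: the function names are hypothetical, and `toy_linkage_stat` is an assumed stand-in for the LRT statistic (a squared difference in genotype-class means at a single marker). Genotype labels are shuffled across subjects, so each gene's expression values stay together and the correlation between genes is preserved.

```python
import random
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum(stats: Sequence[float], in_set: Sequence[bool]) -> float:
    """Wilcoxon rank-sum W: sum of the ranks of the genes in the gene set."""
    ranks = average_ranks(stats)
    return sum(r for r, member in zip(ranks, in_set) if member)


def toy_linkage_stat(expr: Sequence[float], genotype: Sequence[int]) -> float:
    """Assumed stand-in for an LRT statistic (not the one used in the talk):
    squared difference in mean expression between genotype classes 0 and 1."""
    g0 = [e for e, g in zip(expr, genotype) if g == 0]
    g1 = [e for e, g in zip(expr, genotype) if g == 1]
    if not g0 or not g1:
        return 0.0
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) ** 2


def subject_permutation_p(
    expression: Sequence[Sequence[float]],  # one row per gene, one column per subject
    genotype: Sequence[int],                # marker genotype (0 or 1) per subject
    in_set: Sequence[bool],                 # gene-set membership per gene
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical one-sided p-value for W, permuting genotypes across subjects."""
    rng = random.Random(seed)

    def w_for(geno: Sequence[int]) -> float:
        stats = [toy_linkage_stat(row, geno) for row in expression]
        return rank_sum(stats, in_set)

    observed = w_for(genotype)
    shuffled = list(genotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # subject-level permutation
        if w_for(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Shuffling gene-set labels instead would treat genes as independent, which co-expressed genes are not; that is one plausible reading of why the summary recommends permuting subjects.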
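
Sketch for Method (2) and the 2 by 2 table
The one‐tailed Fisher’s Exact test on the A/B/C/D table can be written directly from the hypergeometric distribution. The sketch below is illustrative only: the function name is hypothetical, the table orientation (rows are gene-set membership, columns are linkage status) is an assumption, and the counts in the example are made up rather than taken from the results tables. A library routine such as SciPy's `scipy.stats.fisher_exact` with `alternative="greater"` should give the same one-sided value.

```python
from math import comb


def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact p-value for enrichment.

    a: genes in the set and linked to the eQTL
    b: genes in the set and not linked
    c: genes not in the set and linked
    d: genes not in the set and not linked
    Returns P(X >= a) when the linked genes are drawn at random
    from the gene universe (hypergeometric null).
    """
    universe = a + b + c + d
    set_size = a + b
    linked = a + c
    total = comb(universe, linked)
    upper = min(set_size, linked)
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, linked - k)
        for k in range(a, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: a 12-gene set with 3 linked genes, in a universe
    # of 2185 genes of which 23 pass the linkage threshold at this locus.
    print(fisher_one_tailed(a=3, b=9, c=20, d=2153))
```

Because the universe size enters every term, removing unannotated genes changes the p-value even when A stays the same, which is consistent with the summary's point that the size of the gene universe is important.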
