An integrated statistical framework for mapping quantitative traits in mice
Richard Mott, Jonathan Flint
Wellcome Trust Centre for Human Genetics, Oxford
Richard.Mott@well.ox.ac.uk

Outline
• Introduction
• QTL Mapping
• Multiple Phenotype Heterogeneous Stock Experiment
• Testing for Functional Variants
• Expression Data
• Future

Genetic Traits
• Quantitative (height, weight)
• Dichotomous (affected/unaffected)
• Factorial (blood group)
• Mendelian – controlled by a single gene (cystic fibrosis)
• Complex – controlled by multiple genes*environment (diabetes, asthma)

Quantitative Trait Loci
• QTL: Quantitative Trait Locus
• QTG: Quantitative Trait Gene
• QTN: Quantitative Trait Nucleotide
[Figure: chromosome with genes]

Map in Humans or Animal Models?
Humans
• Disease studied directly
• Population and environment stratification
• Very many SNPs (1,000,000?) required
• Hard to detect trait loci – very large sample sizes (5,000-10,000) required to detect loci of small effect
• Potentially very high mapping resolution – single gene
• Very expensive
Animal models
• Animal model required
• Population and environment controlled
• Fewer SNPs required (~100-10,000)
• Easy to detect QTL with ~500 animals
• Poorer mapping resolution – 1 Mb (10 genes)
• Relatively inexpensive

Mosaic Crosses
[Figure labels: Inbred founders; G3 mixing; GN chopping up; F20 inbreeding; F2, diallel; Heterogeneous Stock, Advanced Intercross, Random Outbreds; Recombinant Inbred Lines]

Sizes of Behavioural QTL in rodents (% of total phenotypic variance)
[Figure: histogram of Number against Effect size (% var)]

Effect size of cloned genes
[Figure: histogram of Number against Effect size (% var)]

Mapping Resolution
• F2 crosses
  – Powerful at detecting QTL
  – Poor at localisation –
20 cM
  – Too few recombinants
• Increase number of recombinants:
  – more animals
  – more generations in cross

Heterogeneous Stocks
• cross 8 inbred strains for >10 generations
[Figure annotation: 0.25 cM]

Multiple Phenotype QTL Experiment

Multiple Phenotypes measured on a Heterogeneous Stock
• 2000 HS mice (Northport, Bob Hitzeman), 84 families, 40th generation
• 150 traits measured on each animal
  – Standardised phenotyping protocol
  – Covariates recorded: Experimenter, Time/Date, Litter
  – Microchipping

Phenotypes
• Anxiety (Conditioned and Unconditioned Tests)
• Asthma (Plethysmography)
• Diabetes (Glucose Tolerance Test)
• Haematology
• Immunology
• Biochemistry
• Wound Healing (Ear Punch)
• Gene Expression
• ….others….

High throughput phenotyping facility
• Neophobia
• Fear Potentiated Startle
• Ovalbumin sensitization
• Plethysmograph
• Intraperitoneal Glucose Tolerance Test
• Ears

Genotyping
• 15360 SNPs genotyped by Illumina
  – 2000 HS mice
  – 300 HS parents
  – 8 inbred HS founders
  – 500 other inbreds
• www.well.ox.ac.uk/mouse/snp.selector
• 13459 SNPs successful
• 99.8% accuracy (parent-offspring)

Distribution of Marker Spacing
  Mean interval (kb)   204
  SD                   231
  Max interval         11328 (chromosome X)
  Min interval         0 (9 markers)
[Figure: histogram of Number of Markers against Distance (Mb)]

LD Decay with distance
[Figure: R squared against Distance (Mb), one curve per chromosome (Chr 1 to Chr 19, Chr X)]
• 99.2% of marker pairs on different autosomes have $R^2 < 0.05$.
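The R squared plotted on the LD decay slide is a pairwise measure of association between markers. As a minimal sketch, assuming genotypes are coded as allele dosages (0, 1, 2) and using the squared Pearson correlation of dosages as the LD estimate (the slides do not say which estimator was used), it can be computed as follows. The function name `genotype_r2` and the toy genotype vectors are illustrative only.

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # A monomorphic marker has zero variance, so r^2 is undefined.
    if g1.std() == 0 or g2.std() == 0:
        return float("nan")
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r * r)

# Toy dosages for 8 animals at three hypothetical markers.
a = [0, 1, 2, 2, 1, 0, 1, 2]
b = [0, 1, 2, 2, 1, 0, 1, 2]  # identical to a: complete LD
c = [2, 1, 0, 0, 1, 2, 1, 0]  # alleles relabelled: still complete LD
e = [0, 1, 2, 1, 1, 0, 2, 2]  # partially associated with a

for name, g in [("b", b), ("c", c), ("e", e)]:
    print(name, genotype_r2(a, g))
```

Because the statistic is squared, swapping the allele labels at one marker (vector `c`) leaves it unchanged, which is why it is convenient for unphased SNP data.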
Genetic Drift in HS
• 40 generations of breeding
• Allele frequency in founders will drift
• 8% of genome fixed

  Allele Frequency in Founders   Allele Frequency in HS
  12.5                           14.99
  25                             23.23
  37.5                           29.77
  50                             31.45

Analysis
• Automated analysis pipeline
  – R HAPPY package
  – Single Marker Association
• Each phenotype analysed independently
  – Transformed to Normality, outliers removed
  – Tailored set of covariates
  – Linear models for most phenotypes
  – Survival models for latency phenotypes

Twisted Pair Analysis of Heterogeneous Stock
[Figure: chromosome with markers and their alleles (1/2)]
• Want to predict ancestral strain from genotype
• We know the alleles in the founder strains
• Single marker association lacks power, can’t distinguish all strains
• Multipoint analysis – combine data from neighbouring markers
• Hidden Markov model HAPPY
• Hidden states = ancestral strains
• Observed states = genotypes
• Unknown phase of genotypes
  – Analyse both chromosomes simultaneously
  – Twisted pair of HMMs
  – Mott et al 2000 PNAS

Testing for a QTL
• $p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
• $p_{iL}(s,t)$ calculated by HMM using
  – genotype data
  – founder strains’ alleles
• Phenotype is modelled
  $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
  $\mathrm{Var}(y_i) = \sigma^2$
• Test for no QTL at locus $L$
  – $H_0$: $T(s,t)$ are all the same
  – ANOVA partial F test

Genome Scan
• Additive and dominance models
• Record all peaks that exceed 5% genome-wide significance
  – Threshold based on 200 permutations
  – 9000 preliminary candidate QTL found

Jointly Significant QTL
• Forward selection over candidate QTL
• Test each QTL conditional on other QTL
• Rescan genome conditional on selected QTLs to identify new QTL
• 5% genome-wide significance threshold
• Bootstrap residuals to find QTL confidence regions

Results
• ~7 jointly significant QTL per
phenotype
• 95% Confidence Interval ~ 2 Mb
• ~50% of QTL have a significant non-additive component
• Only 3 phenotypes were explained by a single major QTL
  – Most phenotypes are complex

Distribution of QTL Effects
• Mean effect size 2.7%
[Figure: histogram of Number of QTL against Effect size of QTL (% Var)]

Distribution of #Genes under QTL
[Figure: histogram of #QTL against #genes; +20 QTL with > 10 genes]

%Variance Explained
[Figure: % Additive Genetic Variance against %Var Joint QTL]
[% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]

Coat colour genes
  Coat colour   Gene    Chr.   Position (Mb)   HS Mapping Position
  albino        Tyr     7      149             148.8 - 150.6
  agouti        Asip    2      310.14          309.6 - 310.2
  brown         Tyrp    4      158.4           158.2 - 159
  dilute        Myo5a   9      150.8           150.8 - 151.2

A known QTL: HDL
[Figure: Wang et al, 2003 compared with HS mapping]

New QTLs: two examples
• Ear Punch Hole Area Regrowth – wound healing
• Cue Conditioning Freeze.During.Tone – measure of fear

Cue Conditioning
• Freeze.During.Tone: huge effect, small number of genes
[Figure: chr15; cntn1: Contactin precursor (Neural cell surface protein)]

Gene x Environment, Gene x Sex
• Repeat analysis looking for QTLs that interact with
  – Gender
  – Litter number
  – Season, Month, etc
  – Experimenter
• Compare models
  $E(y) = \mu + \text{locus} + \text{env}$
  $E(y) = \mu + \text{locus} * \text{env}$

Gene x Environment
• 431 jointly significant GxE QTLs
  – 27 gene x experimenter
  – 81 gene x litter number
  – 67 gene x age
  – 105 gene x study day
  – 151 gene x season
• 13% of variation is GxE
• 25 GxE QTLs overlapped with original joint QTL
  – defined as lying within 4 Mb of the peak position
• 42 GxSex QTLs

Testing for Functional Variants
• Is a SNP functional for a trait?
• Is a functional assay measured in founders related to a trait?
  – Gene expression
  – DNA-Protein binding

Testing for non-Functional Variants
• Is a SNP’s pattern of variation inconsistent with the QTL’s pattern of action?
• Is a functional assay’s distribution inconsistent with the QTL’s pattern of action?

Merge Analysis (Yalcin et al 2005 Genetics)
• Require sequence of HS founders
  – Determine all variants and their strain distribution patterns (SDP)
• Don’t genotype every variant in the HS
  – Instead predict genotypes in HS at all variants based on a sparse skeleton of genotypes
• A variant $v$ will partition the HS founder strains into 2 or more groups, depending on its strain distribution pattern (SDP)
• If $v$ is functional for the trait then the strain effects at the QTL must be identical for strains with the same allele
  – so if merging founders according to $v$’s SDP destroys significance then we reject $v$

Merge Analysis Model Comparison
• $p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
• Replace strains $s,t$ by merged pseudo-strains $g,h$
  – Add together probabilities for strains with the same allele
  – Phenotypic effect of merged strains $g,h$ is $F(g,h)$
• $v_{iL}(g,h)$ = Prob(animal $i$ is descended from merged strains $g,h$ at locus $L$)
• Compare fits of nested models
  $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$   (unmerged)
  $E(y_i) = \sum_{g,h} v_{iL}(g,h)\,F(g,h) + \mu_i$   (merged)
  $E(y_i) = \mu_i$   (null)
• Require no significant difference between merged and unmerged models,
  – and for both to be significant compared to null model

Merge Analysis: Open Field Activity, Chr 1
[Figure]

Merge Analysis: rgs18
[Figure]

Functional Merge Analysis
• Measure functional assay on HS founders
  – $F_L(s)$ is value at locus $L$ on founder $s$
  – e.g.
gene expression
• Expected value in HS is
  $E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F_L(s) + F_L(t)]$   (assuming additivity)
• If assay is related to phenotype $y$ then
  $E(y_i) = \theta\,E(f_i) + \mu_i$
• Compare nested models (thanks to Chris Holmes)
  $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$   (unmerged)
  $E(y_i) = \theta \sum_{s,t} p_{iL}(s,t)\,[F_L(s) + F_L(t)] + \mu_i$   (merged)
  $E(y_i) = \mu_i$   (null)
• Require no significant difference between merged and unmerged models,
  – and for both to be significant compared to null model

Gene Expression Data (with Binnaz Yalcin, Jennifer Taylor)
• Illumina 40k chip
• Livers, Lungs
  – 190 HS
  – HS founders
[Figure: one panel per phenotype (Weight.GrowthRanSlope, Weight.GrowthSlope, Freeze, Explore, Anx, Context, Biochem.LDL, Biochem.HDL, Biochem.Tot.Cholesterol, Biochem.Tot.Protein, Biochem.Sodium, Biochem.Phosphorous, Biochem.Urea, Biochem.Triglycerides); axes labelled locus.log(Pr>F), exp.log(Pr>F) and model difference logp]

Future Work: Extensions to basic model
• Generalised linear models
• Multivariate data
• Mixture Models, EM (Chris Holmes)
• Family Effects, Variance Components, REML (Peter Visscher, Allan McRae)
• Epistasis
• Pleiotropy

Mixture Models (with Chris Holmes, Sascha Antonyuk)
• $p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
• Expectation model
  $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
  $\mathrm{Var}(y_i) = \sigma^2$
• Mixture model
  $y_i \sim \sum_{s,t} p_{iL}(s,t)\, f\big([y_i - \mu_i - T(s,t)]/\sigma\big)$
  e.g.
$f(\cdot)$ is the standard Normal density

What do we want?
• Biological:
  – Joint QTL containing the functional genes and that lead to their identification
  – But genetic mapping finds the variants, not the genes
• Statistical:
  – Multi-locus QTL selection algorithms that predict the phenotype of new animals accurately
  – Model-Averaging: no best choice?
  – Ghost QTL
• Are statistical QTL algorithms consistent?
  – Do they find the biological QTL given a large enough sample size?
  – Simulations of multiple QTL models indicate mapping accuracy declines as complexity increases [Valdar et al 2006 Genetics in press]

Conclusions
• Complexity of analysis
• No definitive analysis
• Gene x Environment
• Mouse Systems Biology

Work of many hands
Carmen Arboleda-Hitas, Amarjit Bhomra, Stephanie Burnett, Peter Burns, Richard Copley, Stuart Davidson, Simon Fiddy, Jonathan Flint, Polinka Hernandez, Sue Miller, Richard Mott, Chela Nunez, Gemma Peachey, Sagiv Shifman, Leah Solberg, Amy Taylor, Martin Taylor, William Valdar, Binnaz Yalcin
Dave Bannerman, Shoumo Bhattacharya, Bill Cookson, Rob Deacon, Dominique Gauguier, Doug Higgs, Tertius Hough, Paul Klenerman, Nick Rawlins, Jennifer Taylor, Chris Holmes

Project funded by The Wellcome Trust, UK
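Supplementary sketch: the nested-model comparisons described under "Testing for a QTL" and "Merge Analysis Model Comparison" both reduce to partial F tests between linear models. The code below is an illustration under stated assumptions, not the HAPPY implementation: the strain probabilities are simulated rather than produced by the HMM, the model is simplified to be additive (expected founder dosages in place of the full $T(s,t)$ pair effects), there are no covariates, and the strain distribution pattern `sdp`, the effect size and all function names are hypothetical.

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit of y on X; returns residual sum of squares and rank of X."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

def partial_f(X_small, X_big, y):
    """Partial F statistic for nested linear models (X_small nested in X_big)."""
    rss_small, rank_small = fit_rss(X_small, y)
    rss_big, rank_big = fit_rss(X_big, y)
    df1 = rank_big - rank_small
    df2 = len(y) - rank_big
    f_stat = ((rss_small - rss_big) / df1) / (rss_big / df2)
    return f_stat, df1, df2

rng = np.random.default_rng(0)
n, n_strains = 500, 8

# Assumption: simulated stand-in for the HMM output. Row i holds the expected
# dosage of each founder strain for animal i at the locus (each row sums to 2).
P = 2 * rng.dirichlet(np.full(n_strains, 0.3), size=n)

# Hypothetical variant whose SDP splits the 8 founders into two allele groups.
sdp = np.array([0, 0, 0, 1, 1, 0, 1, 0])

# Simulate a phenotype in which strains sharing an allele share an effect.
y = P @ (1.0 * sdp) + rng.normal(size=n)

X_null = np.ones((n, 1))
# Unmerged: one dosage column per strain (last one dropped, since rows sum to 2).
X_unmerged = np.column_stack([np.ones(n), P[:, :-1]])
# Merged: add together the dosages of strains carrying the same allele.
X_merged = np.column_stack([np.ones(n), P[:, sdp == 1].sum(axis=1)])

f_qtl = partial_f(X_null, X_unmerged, y)      # is there a QTL at all?
f_merge = partial_f(X_merged, X_unmerged, y)  # does merging lose information?
print("QTL test   F=%.2f on (%d, %d) df" % f_qtl)
print("merge test F=%.2f on (%d, %d) df" % f_merge)
```

A variant is retained when the QTL test is significant and the merge test is not. Converting the statistics to p-values needs the F distribution (for example `scipy.stats.f.sf`), which is left out here to keep the sketch dependent on NumPy only.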