Lecture 27: Association Genetics
April 21, 2014

Announcements
- Final exam: April 29 at 3 pm, 3306 LSB (computer lab)
- Review session on Friday; bring questions
- Final lab on Wednesday
- Course evaluations
- Extra credit opportunity: earn up to 10 points for a lab report, due at the final exam

Last Time
- Sequence data and quantification of variation
- Sequence-based tests of neutrality: Ewens-Watterson test, Tajima's D, Hudson-Kreitman-Aguade test
- Synonymous versus nonsynonymous substitutions: McDonald-Kreitman test

Today
- Quantitative traits: genetic basis, heritability
- Linking phenotype to genotype: introduction to QTL analysis, limitations of QTL
- Association genetics

Mendelian vs. quantitative traits
[Figure: a Mendelian trait, with genotypes 11, 12, and 22 (alleles A1 and A2) shown for individuals 1-10, contrasted with a quantitative trait (height) that varies continuously. Courtesy of Glenn Howe]

Quantitative traits are polygenic
[Figure: students at Connecticut Agricultural College, 1914, arranged by height]
- As the number of loci controlling a trait increases, the distribution of trait values in a population becomes bell-shaped.

Influence of environment on human height
[Figure: height distributions for the 1914 class (mean = 67 ± 2.7 in.) and a 1996 class (mean = 70 ± 3 in.); height vs. GDP by country, 1925-1949 (Baten 2006)]
Sources: Schilling et al. 2002. Amer. Stat. 56: 223-229; Hartl and Clark 2007; Hartl, D. 1987. A Primer of Population Genetics.
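To see why adding loci makes the distribution bell-shaped, the sketch below computes the phenotype distribution under a deliberately simple additive model. Assumptions (illustrative, not from the lecture): every locus has an "uppercase" allele at frequency 0.5 that adds one unit to the phenotype, loci are unlinked, genotypes are in Hardy-Weinberg proportions, and there is no environmental effect. Under these assumptions the phenotype is the count of uppercase alleles and follows a binomial distribution with 2n trials for n loci. The function name `additive_phenotype_distribution` is hypothetical.

```python
from math import comb


def additive_phenotype_distribution(n_loci, p=0.5):
    """Return P(phenotype = k) for k = 0 .. 2 * n_loci.

    Assumes n_loci unlinked diploid loci, each with an additive 'uppercase'
    allele at frequency p worth 1 unit, Hardy-Weinberg genotype frequencies,
    and no environmental variance, so the phenotype (number of uppercase
    alleles) is Binomial(2 * n_loci, p).
    """
    n_alleles = 2 * n_loci
    return [comb(n_alleles, k) * p**k * (1 - p) ** (n_alleles - k)
            for k in range(n_alleles + 1)]


# Crude text histograms: more loci -> more phenotypic classes -> bell shape
for n_loci in (1, 3, 10):
    print(f"{n_loci} loci")
    for k, prob in enumerate(additive_phenotype_distribution(n_loci)):
        print(f"  {k:2d} {'#' * round(prob * 100)}")
```

One locus gives three classes in a 1:2:1 ratio, three loci give seven classes, and ten loci give 21 classes whose histogram is already close to a normal curve. Adding environmental noise would smooth the remaining steps between classes.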
Polygenic model
[Figure: 3 loci, each with 2 additive alleles; every uppercase allele contributes 1 unit to the phenotype (e.g., shade of color)]

The phenotype is the outward manifestation of the genotype
Phenotype = Genotype + Environment, so $\sigma^2_P = \sigma^2_G + \sigma^2_E$
(Courtesy of Glenn Howe)

Types of genetic variance ($\sigma^2_G$)
- Additive ($\sigma^2_A$): effects of individual alleles; the main cause of resemblance between relatives
- Dominance ($\sigma^2_D$): effects of allele interactions within a locus (non-additive)
- Interaction, or epistasis ($\sigma^2_I$): effects of interactions among loci (non-additive)
$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$

Heritability: phenotype vs. genotype
- Var(phenotype) = Var(genotype) + Var(environment)
- Heritability = Var(genotype) / Var(phenotype)

Two types of heritability
- Broad-sense heritability, $H^2 = \sigma^2_G / \sigma^2_P$, includes all genetic effects: additive, dominance, and epistatic
  - For example, the degree to which clones or monozygotic twins have the same phenotype
- Narrow-sense heritability, $h^2 = \sigma^2_A / \sigma^2_P$, includes only additive effects
  - For example, the degree to which offspring resemble their parents

Heritability (continued)
- A characteristic of a trait measured in a particular population in a particular environment
- Best estimated in experiments (controlled environments)
- Estimated from the resemblance between relatives
- The higher the heritability, the better the prediction of genotype from phenotype (and vice versa)
[Figure: phenotype (P) vs. genotype (G) for h² = 0.1, 0.5, and 0.9; see http://psych.colorado.edu/~carey/hgss/hgssapplets/heritability/heritability1/heritability1.html]

Identifying Genes Underlying Quantitative Traits
- Many individual loci are responsible for quantitative traits, even those with high heritability
- Identifying these loci is a major goal of breeding programs
- Allows a mechanistic understanding of adaptive variation
- Methods usually rely on correlations between molecular marker polymorphisms and phenotypes

Quantitative Trait Locus Mapping
[Figure: Parent 1 and Parent 2 (A B C and a b c chromosomes) are crossed, F1s are intercrossed, and progeny height is compared across genotypes at locus B (BB, Bb, bb). Modified from D. Neale]

Quantitative Trait Locus Analysis
- Step 1: Make a controlled cross to create a large family (or a collection of families)
  - Parents should differ in the phenotypes of interest
  - The trait should segregate in the progeny
- Step 2: Create a genetic map
  - A large number of markers genotyped in all progeny
- Step 3: Measure phenotypes
  - Need phenotypes with high heritability

Step 1: Construct Pedigree
- Cross two individuals with contrasting characteristics
- Create a population with segregating traits
- Ideally: inbred parents are crossed to produce F1s, which are intercrossed to produce F2s
- Recombinant inbred lines (RILs) are created by repeated selfing or sibling mating of F2 progeny
  - Allow precise phenotyping and isolation of allelic effects
(Grisel 2000, Alcohol Research & Health 24: 169)

Step 2: Construct Genetic Map
- The number of recombinations between markers is a function of map distance
- Gives an overview of the structure of the entire genome
- Anonymous markers are cheap and efficient: AFLP, genotyping by sequencing
- Codominant markers are much more informative: SSR, SNP
- Genotyping by sequencing gives the best of both worlds: cheap, abundant, codominant markers!

Step 3: Determine Phenotypes of Offspring
- The phenotype must be segregating in the pedigree
- Must differentiate genotype and environment effects. How?
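One standard answer to the "How?" above is to phenotype several genetically identical replicates (clones, or individuals of one inbred line) of each genotype. Variation among replicates of the same genotype then estimates the environmental variance, and the excess variation among genotypes estimates the genetic variance. A minimal sketch, assuming a balanced one-way random-effects ANOVA with r replicates per genotype. Because replicates share all genetic effects, the ratio estimates broad-sense heritability, H² = σ²G / (σ²G + σ²E). The function name and example values are hypothetical.

```python
from statistics import mean


def broad_sense_heritability(groups):
    """Estimate H^2 = var_G / (var_G + var_E) from a balanced one-way design.

    groups: list of lists; each inner list holds the phenotypes of r
    genetically identical replicates (clones or inbred-line members) of one
    genotype. Assumes every genotype has the same number of replicates.
    """
    g = len(groups)
    r = len(groups[0])
    if g < 2 or r < 2 or any(len(grp) != r for grp in groups):
        raise ValueError("need >= 2 genotypes and a balanced design with r >= 2")
    grand = mean(v for grp in groups for v in grp)
    # Mean square among genotypes (g - 1 degrees of freedom)
    ms_between = r * sum((mean(grp) - grand) ** 2 for grp in groups) / (g - 1)
    # Mean square among replicates within genotypes (g * (r - 1) d.f.)
    ms_within = sum((v - mean(grp)) ** 2
                    for grp in groups for v in grp) / (g * (r - 1))
    var_e = ms_within                                # environmental variance
    var_g = max((ms_between - ms_within) / r, 0.0)   # genetic variance, floored at 0
    total = var_g + var_e
    if total == 0:
        raise ValueError("no phenotypic variance")
    return var_g / total


# Three genotypes, two replicates each (made-up numbers)
print(broad_sense_heritability([[10, 12], [14, 16], [20, 22]]))  # about 0.92
```

Negative variance estimates (when replicates vary more than genotype means predict) are floored at zero here, a common but not universal convention.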
QTL mapping works best with phenotypes of high heritability.
[Figure: phenotype vs. genotype for h² = 0.1, 0.5, and 0.9]

Step 4: Detect Associations between Markers and Phenotypes
- Single-marker associations are simplest
  - Simple ANOVA, correcting for multiple comparisons
  - Log-likelihood ratio: LOD (log10 of the odds)
    $\mathrm{LOD} = \log_{10} \dfrac{\Pr(\text{Data} \mid \text{QTL})}{\Pr(\text{Data} \mid \text{no QTL})}$
- If the QTL lies between two markers, the situation is more complex
  - Recombination between the QTL and the markers (marker genotype doesn't fully predict phenotype)
  - 'Ghost' QTL due to adjacent QTL
  - Use interval mapping or composite interval mapping
  - Simultaneously consider pairs of loci across the genome

Step 5: Identify underlying molecular mechanisms
[Figure: a QTL on a chromosome near a genetic marker, narrowed to a QTG (quantitative trait gene) and ultimately a QTN (quantitative trait nucleotide). Adapted from Richard Mott, Wellcome Trust Centre for Human Genetics]

QTL Limitations
- The genomic regions underlying QTL are huge, usually containing hundreds of genes
  - How to distinguish among candidates?
- Biased toward detection of large-effect loci
  - Need very large pedigrees to do this properly
- Limited genetic base: QTL may apply only to the two individuals in the cross!
- Genotype × environment interactions are rampant: some QTL appear only in certain environments

Linkage Disequilibrium and Quantitative Trait Mapping
- Linkage and quantitative trait locus (QTL) analysis
  - Needs a pedigree and a moderate number of molecular markers
  - Very large chromosomal regions are represented by each marker
- Association studies with natural populations
  - No pedigree required
  - Needs large numbers of genetic markers
  - Small chromosomal segments can be localized
  - Many more markers are required than in traditional QTL analysis
(Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99)

Association Mapping
[Figure: recombination through evolutionary history breaks up ancestral chromosomes, so in present-day chromosomes from a natural population only a short segment around the causal site (*) stays associated with a marker allele; height plotted against marker genotype (TT, TC, CC). Slide courtesy of Dave Neale]

Next-Generation Sequencing and Whole Genome Scans
- The $1000 genome is on the horizon
  - Current cost with the Illumina HiSeq 2000 is about $2000 for 10X depth
- Thousands of human genomes have now been sequenced at low depth
  - Can detect most polymorphisms with frequency > 0.01
- True whole-genome association studies are now possible at a very large scale
- Direct-to-consumer genomics: 23andMe and other genotyping services
http://www.1000genomes.org/

Commercial Services for Human Genome-Wide SNP Characterization (Nature, Vol. 437, 27 October 2005)
- Assay 1.2 million "tag SNPs" scattered across the genome using Illumina BeadArray technology
- Ancestry analyses and disease/behavioral susceptibility

Identifying genetic mechanisms of simple vs. complex diseases
- Simple (Mendelian) diseases
  - Caused by a single major gene
  - High heritability; often can be recognized in pedigrees
  - Examples: Huntington's disease, achondroplasia, cystic fibrosis, sickle-cell anemia
  - Tools: linkage analysis, positional cloning
  - Over 2900 disease-causing genes have been identified thus far (Human Gene Mutation Database: www.hgmd.cf.ac.uk)
- Complex (non-Mendelian) diseases
  - Caused by the interaction between environmental factors and multiple genes with minor effects
  - Interactions between genes; low heritability
  - Examples: heart disease, type II diabetes, cancer, asthma
  - Tools: association mapping, SNPs!
  - Over 35,000 SNP associations have been identified thus far: http://www.snpedia.com
(Slide adapted from Kermit Ritland)

Complicating factor: trait heterogeneity
- The same phenotype can have multiple genetic mechanisms underlying it
(Slide adapted from Kermit Ritland)

Case-Control Example: Diabetes
- Knowler et al. (1988) collected data on 4920 individuals from the Pima and Papago Native American populations in the southwestern United States
- High rate of type II diabetes in these populations
- Found significant associations with the immunoglobulin G marker (Gm)
- Does this indicate an underlying mechanism of disease?
(Knowler et al. (1988) Am. J. Hum. Genet. 43: 520)

Case-control test for association (case = diabetic, control = not diabetic)

                      Type 2 diabetes
Gm haplotype       present    absent    Total
present             a = 8     b = 29       37
absent             c = 92     d = 71      163
Total                 100        100      200

Question: Is the Gm haplotype associated with risk of type 2 diabetes?
(1) Test for an association with a 2 × 2 chi-square test (1 degree of freedom, N = 200):
$\chi^2_1 = \dfrac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \dfrac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$
(2) The chi-square is significant (well above the 5% critical value of 3.84). Therefore the presence of the Gm haplotype seems to confer a reduced occurrence of diabetes: 8 of 37 carriers (22%) are diabetic vs. 92 of 163 non-carriers (56%).
(Slide adapted from Kermit Ritland)

Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of type 2 diabetes?
The real story: stratify by American Indian heritage (index 0 = little or no Indian heritage; 8 = complete Indian heritage)

Index of Indian heritage    Gm haplotype    Percent with diabetes
0                           present         17.8
0                           absent          19.9
4                           present         28.3
4                           absent          28.8
8                           present         35.9
8                           absent          39.3

Conclusion: within each heritage stratum, diabetes frequency differs little between carriers and non-carriers. The Gm haplotype is NOT a risk factor for type 2 diabetes, but is a marker of (the degree of) American Indian heritage.
(Slide adapted from Kermit Ritland)

Population structure and spurious association
- Assume populations are historically isolated
- One has a higher disease frequency by chance
- Unlinked loci are also differentiated between the populations
- Unlinked loci show a disease association when the populations are lumped together
[Figure: a population with high disease frequency and a population with low disease frequency separated by a gene-flow barrier, showing alleles at a neutral locus and alleles causing susceptibility to disease]

Association Study Limitations
- Population structure: differences between cases and controls
- Genetic heterogeneity underlying the trait
- Random error/false positives
- Inadequate genome coverage
- Poorly estimated linkage disequilibrium

Association Analysis with a Mixed Model
The model relates the phenotype (response variable) of individual i to:
- the effect of the target SNP
- the effects of background SNPs
- a family effect (kinship coefficient)
- a population effect (e.g., admixture coefficient from Structure, or values of principal components)
Implemented in the TASSEL program (Wednesday in lab)
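The core computation of such a mixed model can be sketched as generalized least squares (GLS). This is a minimal sketch, not TASSEL's implementation, and it rests on several assumptions:
- Phenotypes y and a target SNP coded 0/1/2.
- Population covariates Q (for example admixture proportions or principal component scores) enter as fixed effects.
- A kinship matrix K defines a random polygenic effect u ~ N(0, σ²g K), which in this sketch stands in for background genetic effects.
- The variance components σ²g and σ²e are treated as already known. Real programs estimate them from the data; this sketch does not.

The function name and the simulated data are hypothetical.

```python
import numpy as np


def mixed_model_snp_effect(y, snp, Q, K, sigma_g2, sigma_e2):
    """GLS estimate of a target SNP effect under the sketch model

        y = X b + u + e,  u ~ N(0, sigma_g2 * K),  e ~ N(0, sigma_e2 * I),

    with X = [intercept | population covariates Q | SNP genotype 0/1/2].
    sigma_g2 and sigma_e2 are ASSUMED known (not estimated here).
    Returns (estimated SNP effect, its standard error).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), Q, snp])
    # Phenotypic covariance: kinship-driven polygenic part plus residual noise
    V = sigma_g2 * np.asarray(K, dtype=float) + sigma_e2 * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)           # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)           # V^-1 y
    cov_b = np.linalg.inv(X.T @ Vinv_X)      # (X' V^-1 X)^-1
    b = cov_b @ (X.T @ Vinv_y)               # GLS fixed-effect estimates
    return b[-1], float(np.sqrt(cov_b[-1, -1]))


# Usage with simulated (made-up) data
rng = np.random.default_rng(1)
n = 200
snp = rng.integers(0, 3, size=n)             # copies of the counted allele
Q = rng.uniform(0, 1, size=n)                # e.g., an admixture proportion
A = rng.normal(size=(n, n))
K = A @ A.T / n                              # stand-in positive-definite kinship
u = np.sqrt(0.5 / n) * (A @ rng.normal(size=n))   # Cov(u) = 0.5 * K
y = 1.0 + 2.0 * Q + 0.3 * snp + u + rng.normal(0, 1, size=n)

effect, se = mixed_model_snp_effect(y, snp, Q, K, sigma_g2=0.5, sigma_e2=1.0)
print(f"SNP effect = {effect:.3f} (SE {se:.3f})")
```

Setting `sigma_g2=0` makes V proportional to the identity, so the estimate reduces to ordinary least squares on the same covariates. Including Q and K is what guards against the kind of spurious association seen in the Gm example.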