Lecture 16: Introduction to Linkage Disequilibrium
October 19, 2015

Exam 2
• Wednesday, October 28 at 6:30 in lab
• Genetic Drift, Population Structure, Population Assignment, Individual Identity, Paternity Analysis, and Linkage Disequilibrium
• Sample exam posted on website
• Review on Monday, October 26

Last Time
• Population structure and gene flow
• Introduction to paternity analysis

Today
Multiple loci and independent segregation
Estimating linkage disequilibrium
Causes of linkage disequilibrium

Extending to Multiple Loci
So far, only considering dynamics of alleles at single loci
Loci occur on chromosomes, linked to other loci!
“The fitness of a single locus ripped from its interactive context is about as relevant to real problems of evolutionary genetics as the study of the psychology of individuals isolated from their social context is to an understanding of man’s sociopolitical evolution” Richard Lewontin (quoted in Hedrick 2005)
Size of region that must be considered depends on Linkage Disequilibrium

Gametic (Linkage) Disequilibrium (LD)
Nonrandom association of alleles at different loci into gametes
Haplotype: Genotype of a group of loci in LD
LD is a major factor in evolution
LD itself provides insights into population history
Estimation of LD is critical for ALL population genetic data

Nomenclature and concepts
Two loci, two alleles
Frequency of allele i at locus 1 is p_i
Frequency of allele i at locus 2 is q_i
[Diagram: chromosome A1 B1 with allele frequencies p1 and q1; chromosome A2 B2 with allele frequencies p2 and q2]
\[ \sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1 \]

Nomenclature and concepts
Genotype is written as A1B1/A2B2
A1 and B1 are in coupling phase
A1 and B2 are in repulsion phase
A1B1 and A2B2 are haplotypes

Gametic Disequilibrium
Easiest to think about physically linked loci, but not necessarily the case
[Diagram: genotype A1B1/A2B2 undergoes meiosis, giving gametes A1B1, A1B2, A2B1, A2B2]

What Are Expected Frequencies of Gametes in a Population Under Independent Assortment?
A1B1: p1q1
A1B2: p1q2
A2B1: p2q1
A2B2: p2q2

What are expected frequencies of gametes with complete linkage?
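The independent-assortment expectation above (each gamete frequency is the product of its two allele frequencies) can be sketched in a few lines of Python. This is only an illustration: the function name `expected_gamete_freqs` and the example allele frequencies are assumptions, not part of the lecture.

```python
def expected_gamete_freqs(p1, q1):
    """Expected gamete frequencies for two biallelic loci under
    independent assortment (no linkage disequilibrium).

    p1 is the frequency of allele A1 and q1 the frequency of allele B1.
    The function name and interface are hypothetical.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= q1 <= 1.0):
        raise ValueError("allele frequencies must lie between 0 and 1")
    p2 = 1.0 - p1  # frequency of A2
    q2 = 1.0 - q1  # frequency of B2
    return {
        "A1B1": p1 * q1,
        "A1B2": p1 * q2,
        "A2B1": p2 * q1,
        "A2B2": p2 * q2,
    }


# Made-up allele frequencies, for illustration only.
freqs = expected_gamete_freqs(0.6, 0.3)
for gamete, freq in freqs.items():
    print(gamete, round(freq, 4))
```

The four expected frequencies always sum to 1, which is a quick sanity check on any set of inputs.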
[Diagram: genotypes A1B1/A2B2 and A1B2/A2B1 (allele frequencies p1, p2 at locus A and q1, q2 at locus B) undergo meiosis, giving gametes A1B1, A1B2, A2B1, A2B2 with frequencies x11, x12, x21, x22]
The frequency of the gametes in the current population.
Expected to stay stable in the absence of other departures from H-W

Linkage disequilibrium measure, D
Independent Assortment:
\[ x_{11} = p_1 q_1, \quad x_{12} = p_1 q_2, \quad x_{21} = p_2 q_1, \quad x_{22} = p_2 q_2 \]
With Linkage Disequilibrium:
\[ x_{11} = p_1 q_1 + D, \quad x_{12} = p_1 q_2 - D, \quad x_{21} = p_2 q_1 - D, \quad x_{22} = p_2 q_2 + D \]
Substituting p1 and q1 from above table (p1 = x11 + x12, q1 = x11 + x21):
\[ D = x_{11} x_{22} - x_{12} x_{21} \]

Problem: D is sensitive to allele frequencies
Gamete frequencies must be between 0 and 1
Maximum |D| set by allele frequencies
Solution: D' = D/Dmax ranges from -1 to 1
Example, if D is positive:
p1 = 0.5, q2 = 0.5, Dmax = 0.25
but
p1 = 0.1, q2 = 0.9, Dmax = 0.09

Dmax Calculation:
If D is positive, Dmax is lesser of p1q2 or p2q1
If D is negative, Dmax is lesser of p1q1 or p2q2

LD can also be estimated as correlation between alleles
\[ r^2 = \frac{D^2}{p_1 p_2 q_1 q_2} \]
r can also be standardized to a -1 to 1 scale
It is equivalent to D' in this case
\[ r' = \frac{D / \sqrt{p_1 p_2 q_1 q_2}}{D_{max} / \sqrt{p_1 p_2 q_1 q_2}} = \frac{D}{D_{max}} = D' \]

Recombination
Shuffling of parental alleles during meiosis
[Diagram: genotype A1B1/A2B2 gives gametes A1B1, A1B2, A2B2, A2B1]
Occurs for unlinked loci and linked loci
Rate of recombination for linked markers is partially a function of physical distance

Recombination Rate
[Diagram: genotype A1B1/A2B2 undergoes meiosis, giving gametes A1B1 (coupling), A1B2 (repulsion), A2B1 (repulsion), A2B2 (coupling); the repulsion-phase gametes are the products of recombination]
\[ c = \frac{n_r}{n_r + n_c} \]
Where nr is number of repulsion phase gametes, and nc is number of coupling phase gametes
What is the expected recombination rate for unlinked loci?
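The LD measures defined above (D, D' = D/Dmax, and r²) can be computed directly from the four gamete frequencies. The sketch below is a minimal illustration under the definitions given in these slides; the function name `ld_stats` and the example gamete frequencies are assumptions made for the example.

```python
def ld_stats(x11, x12, x21, x22):
    """Return (D, D', r^2) from the gamete frequencies of
    A1B1 (x11), A1B2 (x12), A2B1 (x21), and A2B2 (x22).

    Hypothetical helper written for illustration.
    """
    total = x11 + x12 + x21 + x22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")

    # Allele frequencies are marginal sums of the gamete frequencies.
    p1 = x11 + x12
    p2 = 1.0 - p1
    q1 = x11 + x21
    q2 = 1.0 - q1

    # Coupling product minus repulsion product.
    d = x11 * x22 - x12 * x21

    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)

    # Guard against monomorphic loci, where D' and r^2 are undefined;
    # returning 0.0 there is a convention chosen for this sketch.
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * p2 * q1 * q2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Made-up gamete frequencies with an excess of coupling-phase gametes.
d, d_prime, r2 = ld_stats(0.4, 0.1, 0.1, 0.4)
print(round(d, 4), round(d_prime, 4), round(r2, 4))
```

With these example frequencies both loci have allele frequencies of 0.5, so Dmax is 0.25, matching the first Dmax example on the slide.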
Expected Gamete Frequencies: Double Homozygote
[Diagram: genotype A1B1/A1B1 undergoes meiosis, giving four gametes that are all A1B1, labelled nonrecombinant, recombinant, recombinant, nonrecombinant]

Expected Gamete Frequencies: Double Heterozygote
[Diagram: genotype A1B1/A2B2 undergoes meiosis, giving gametes A1B1 (nonrecombinant), A1B2 (recombinant), A2B1 (recombinant), A2B2 (nonrecombinant)]

LD is partially a function of recombination rate
Expected proportions of haplotypes produced in a population after 1 generation of mating
[Table: Offspring Genotypes; columns: Parent Haplotype Frequencies, Offspring Haplotype Frequencies (accounting for parental recombination)]
Where c is the recombination rate and D0 is the initial amount of LD

Recombination degrades LD over time
\[ D_1 = x'_{11} x'_{22} - x'_{12} x'_{21} = (x_{11} - cD_0)(x_{22} - cD_0) - (x_{12} + cD_0)(x_{21} + cD_0) \]
\[ D_1 = (1 - c) D_0 \]
\[ D_t = (1 - c)^t D_0 \]
\[ D_t \approx e^{-ct} D_0 \quad \text{(for small } c\text{)} \]
Where t is time (in generations) and e is base of natural log (2.718)

Effects of recombination rate on LD
Decline in LD over time with different theoretical recombination rates (c)
Even with independent segregation (c = 0.5), multiple generations required to break up allelic associations

LD varies substantially across human genome
Source: Nature, Vol 437, 27 October 2005
Average r² for pairs of SNPs separated by 30 kb in 1 Mb windows
LD affected by location relative to telomeres and centromeres, chromosome length, GC content, sequence polymorphism, and repeat composition
Highest and lowest levels of LD found in gene-rich regions

Human HapMap Project and Whole Genome Scans
Source: Nature, Vol 437, 27 October 2005
LD structure of human Chromosome 19 (www.hapmap.org)
1 common SNP genotyped every 700 bp for 270 individuals (3.4 million SNPs)
9.2 million SNPs in total

LD in the Poplar Genome
LD declines rapidly with distance
LD higher in genes than in genome as a whole
Loci separated by kilobases still in LD!
[Figure: r² versus distance (kb, 0 to 20); series: Genomewide (core of range), Genes (core of range)]
Slavov et al. 2012 New Phyt 196:713-725

Recombination Across Poplar Chromosomes
Substantial variation in recombination rate
Related to repeat composition, methylation, and distance from centromere

Recombination rate varies among individuals
Rate is often higher in females than males
Rate varies among individuals within males and females
Variation in recombination rate in the MHC region (3.3 Mb) in human sperm donors

Genetic Drift and LD
Begin with highly diverse haplotype pool
Drift leads to chance increase of certain haplotypes
Generates nonrandom association between alleles at different loci (LD)

Genetic Drift and LD
Why doesn’t recombination reduce LD in this situation?

LD is partially a function of recombination rate
Expected proportions of gametes produced by various genotypes over two generations
Effective LD increases with homozygosity
Double heterozygote is only case where recombination matters

Effect of Drift on LD
Drift and recombination will have opposing effects on LD
\[ E(r^2) = \frac{1}{1 + 4N_e c} \]
4Nec is the “population recombination rate”
Expression approaches 0 for large populations or high recombination rates
Where r² is the squared correlation coefficient for alleles at two loci, Ne is effective population size, and c is recombination rate

Combined effects of Drift and Recombination
LD declines as a function of population recombination rate
Effects of chance fluctuation of gamete frequencies
[Figure: LD plotted against Nec]

How should inbreeding affect linkage disequilibrium?
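The two quantitative results from this lecture, the decay of D under recombination and the expected r² under drift and recombination, can be explored numerically. The following is a rough sketch, assuming the formulas as stated in the slides; the function names and the example parameter values are illustrative choices, not taken from the lecture.

```python
import math


def ld_after(d0, c, t):
    """Exact recursion: D_t = (1 - c)^t * D_0."""
    return (1.0 - c) ** t * d0


def ld_after_approx(d0, c, t):
    """Exponential form: D_t = exp(-c * t) * D_0.
    Close to the exact recursion only when c is small."""
    return math.exp(-c * t) * d0


def expected_r2(ne, c):
    """Expected r^2 under drift and recombination: 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)


# Illustrative starting value of D (an assumption for this example).
d0 = 0.25

# Decay of LD over generations for several recombination rates.
for c in (0.5, 0.1, 0.01):
    decay = [round(ld_after(d0, c, t), 5) for t in (0, 1, 5, 10)]
    print("c =", c, "D at t = 0, 1, 5, 10:", decay)

# The exact and exponential forms agree closely for small c.
print(ld_after(d0, 0.01, 10), ld_after_approx(d0, 0.01, 10))

# Expected r^2 falls as the population recombination rate 4*Ne*c grows.
for ne in (100, 1000, 10000):
    print("Ne =", ne, "E(r^2) =", round(expected_r2(ne, 0.01), 4))
```

Even at c = 0.5, D is halved each generation rather than eliminated at once, which is the point made on the "Effects of recombination rate on LD" slide.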