Consortium for Comparative Genomics
University of Colorado School of Medicine

Population Genetics
The study of naturally occurring genetic differences among organisms

Biochemistry and Molecular Genetics; Human Medical Genetics; Computational Bioscience
David.Pollock@UCDenver.edu
www.EvolutionaryGenomics.com

Topics
• Population Genetics
• Molecular Evolution
• Phylogenetics
• Statistical Inference in Genetics

The Study of Polymorphism & Divergence

Stabilizing Selection: Human Birth Weight
[Figure: stabilizing selection on human birth weight]

Phenotype and Genotype (Variation)
[Diagram; recoverable labels:]
• Mutagenic lesion, mutation (recombination, indels, segmental duplication), variation, evolution (fixation)
• Molecular characteristics: expression, modification, folding, interaction, function
• Phenotypic effects: dominant, recessive, pleiotropic effects
• Background genetics (compensatory changes)
• Environment dependence

Genetic Essentials I: Genotype and Phenotype
• Most complex traits (hair color, height, weight, behavior, life span, reproductive fitness) are influenced by many genes

Genetic Essentials I: Alleles and Polymorphism
• Mutagenic lesion, mutation, fixation
• SNP, non-synonymous, synonymous, indel
• Microsatellite, minisatellite, tandem repeat
• Transposable elements, rearrangement, inversion
• Reversion, multiple mutations, recombination

Evolution Creates Diversity
• Populations (species) split, multiply, and diverge
• And sometimes go extinct
(Schaal B A, Olsen K M. PNAS 2000;97:7024-7029)

Evolution Creates Diversity
• Genes duplicate, multiply, and diverge
• And sometimes are deleted

Uses of Polymorphism I
• Estimate level of genetic variation and understand patterns in different types
• To understand evolutionary mechanisms that produce variation
• DNA fingerprinting: ID individuals

Uses of Polymorphism II
• As indicators of population history
• To understand evolutionary origin, global expansion, and diversification of humans

European Origins
[Figure: map of human lineage dispersal, labeled Africa (origin 120-150 KYA), Europe, Asia, Australia, North America (NA), and South America (SA), with lineage letters by region]
• Dispersal from Africa 55-75 thousand years ago (KYA)
• Major European lineages (I, J, K, T, U, V, W, H, X) split 35-50 KYA
• Some lineages were more successful
[Figure: lineage frequency chart; legible values 34.02%, 12.00%, 8.26%, 2.24%]

Some Perspective
[Figure: Ancestral Great Ape (AGA) to modern Europeans; legible labels 41K, 195K, 370K]

Uses of Polymorphism II
• As indicators of population history
• To understand evolutionary origin, global expansion, and diversification of humans
• To understand the origins of domestic species
• To determine origins of species (phylogeny) and of morphology, behavior and other adaptations

Looking into the Past (or Future)
• Uniformitarianism: past, present, and future share the same laws of geology, nature, and physics

Sexual reproduction
• Diploid parents => haploid gametes (sperm, eggs) => diploid zygote (offspring)

Genetic Essentials II
• Genotype: the combination of genes at a locus (one from Ma, one from Pa)
• What happens to genotypic variation?
• Mixing, recombination, selection, drift
• Homozygotes, heterozygotes, recessive, dominant, codominant
• Genotype => phenotype
• Linkage disequilibrium

Measuring Variation
• Count the individuals sampled, then express the counts as proportions => genotype frequencies

Genotype    Count   Frequency
+/+         76      76/100 = 0.76
+/D32       22      22/100 = 0.22
D32/D32     2       2/100 = 0.02

• D32 is a variant of the human chemokine receptor gene CCR5 that confers strong resistance to infection by HIV-1
• CCR5 is a major macrophage coreceptor for HIV-1

Measuring Variation
• Allele frequency of the D32 allele: <freq of D32 allele> = (2*2 + 22)/(2*100) = 0.13
• p = allele frequency, <p> is its estimator
• Estimated variance is <Var<p>>; its square root is the standard error
• About 95% of estimates fall within 2 standard errors => 95% confidence interval
• Sampled n individuals, 2n alleles; q = 1 - p
• D32 allele frequency is about 10% in Northern Europeans (Black Death?)

Example
• CCR5 D32 in the Pyrenees: n = 111, q = 0.018, se = 0.0009, p = 1.7 × 10^-19
• Isolated small population in mountains ~18,000 years ago

Species and Geographical Structure
• Biological species have a nonrandom spatial distribution of organisms
• Clumping, aggregation, herd (school, flock) formation, colonies
• Environmental patchiness
• What’s a species anyway?
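The D32 arithmetic on the Measuring Variation slides can be checked with a short Python sketch. The slides do not give a formula for the variance, so the binomial form p(1 - p)/(2n) used here is an assumption (2n alleles treated as independent draws), and the function name `allele_frequency` is hypothetical.

```python
import math

def allele_frequency(n_wild_hom, n_het, n_variant_hom):
    """Estimate the variant allele frequency from genotype counts.

    Assumption (not stated on the slides): the 2n sampled alleles are
    treated as independent binomial draws, so Var(p_hat) = p(1-p)/(2n).
    """
    n = n_wild_hom + n_het + n_variant_hom          # individuals sampled
    p_hat = (2 * n_variant_hom + n_het) / (2 * n)   # variant copies / 2n alleles
    se = math.sqrt(p_hat * (1 - p_hat) / (2 * n))   # standard error of p_hat
    ci = (p_hat - 2 * se, p_hat + 2 * se)           # approx. 95% interval (2 std errors)
    return p_hat, se, ci

# Counts from the slide: +/+ = 76, +/D32 = 22, D32/D32 = 2
p_hat, se, ci = allele_frequency(76, 22, 2)
print(f"p_hat = {p_hat:.2f}")                       # 0.13, as on the slide
print(f"se    = {se:.4f}")
print(f"95% CI approx. ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval uses the slide's rule of thumb of two standard errors rather than the exact normal quantile of 1.96.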
• “Population” is easier to define
• Local interbreeding units in restricted geographic area, N individuals
• Systematic changes in allele frequencies

Monday, July 25, Biology 3040

Hardy-Weinberg ~1908
• Mathematical model: non-overlapping generations, sexual, diploid
• First approximation: oversimplified, but often good enough
• Random mating, large N
• Random mating is with respect to the genotype under consideration
• For now, ignore mutation, selection, dominance, migration

Hardy-Weinberg
• Genotype frequencies achieve equilibrium after ONE generation
• Diploid parents => haploid gametes A and a, with frequencies p and q, p + q = 1
• AA: D' = p^2, Aa: H' = 2pq, aa: R' = q^2
• Random mating of individuals is usually equivalent to random union of gametes

Graphical Genotype Frequencies
[Figure: Punnett squares with sides p and q; cells AA, Aa, Aa, aa]
• If one allele is rare, almost all individuals that have it are heterozygotes

Hardy-Weinberg
• Genotype frequencies can only be calculated from allele frequencies if H-W EQ is assumed
• Don’t calculate p as the square root of the AA frequency (p^2), as this assumes HW is true
• Can test for deviation: observed genotype frequencies versus expected (p^2, 2pq, q^2)
• Usually, no deviations. A common exception is due to mixing of populations (Wahlund effect): heterozygotes are under-represented

Experiment: Hair Color
• Experimental Outcome

Genotype         DD        Db        bb        N
Count            20        13        3         36
HW calculation   p^2 * N   2pq * N   q^2 * N   N
Expected         19.5      14        2.5       36

p = 26.5/36 = 53/72 = 0.74
q = 1 - p = 9.5/36 = 19/72 = 0.26

Mixing Populations

            MM     MN     NN     Size
Aborigine   22     216    492    730
Navajo      305    52     4      361
Total       327    268    496    1091

Test for HW, pooled population
p = (327 + 268/2) / 1091 = 461/1091 = 0.423
q = 0.577

            MM     MN     NN     Size
Expected    195    533    363    1091

• Observed heterozygotes (268) fall far below the expected 533, as the Wahlund effect predicts

What is Sex?
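The two Hardy-Weinberg tests above (hair color and the pooled MN sample) can be reproduced with a minimal Python sketch. The slides compare observed with expected counts but do not name a test statistic, so the chi-square goodness-of-fit test with 1 degree of freedom used here is an assumption, and the function name `hardy_weinberg_test` is hypothetical.

```python
import math

def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Assumption: a chi-square goodness-of-fit test with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)        # allele frequency by allele counting
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x/2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value

# Hair color experiment: DD = 20, Db = 13, bb = 3
hair = hardy_weinberg_test(20, 13, 3)
# Pooled Aborigine + Navajo MN blood groups: MM = 327, MN = 268, NN = 496
pooled = hardy_weinberg_test(327, 268, 496)

for label, (p, expected, chi2, p_value) in (("hair", hair), ("pooled MN", pooled)):
    exp_text = ", ".join(f"{e:.1f}" for e in expected)
    print(f"{label}: p = {p:.3f}, expected = ({exp_text}), chi2 = {chi2:.2f}")
```

A chi-square value above 3.84 rejects Hardy-Weinberg proportions at the 5% level for 1 degree of freedom. The hair color data sit well below that threshold, while the pooled MN sample is far above it because of the heterozygote deficit. With unrounded p, the pooled expectations come out near 195, 532, and 364; the slide's 533 and 363 appear to come from rounding p and q to three decimals first.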
• Mixing
• Gender
• Ploidy
• Recombination

Multiple Loci: Linkage
• 2 loci, A and B, in a good population, each in HW equilibrium
• The two loci are not necessarily randomly associated
• The gametes may have nonrandom associations
• Gametes are A1B1, A1B2, A2B1, and A2B2
• Allele frequencies of A and B, separately, are p1 and p2 for A, q1 and q2 for B
• Gamete frequencies for the A and B combinations are P11, P12, P21, P22, respectively
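The text stops at the gamete frequencies. One standard way to quantify nonrandom association between the two loci, not stated in the source, is the linkage disequilibrium coefficient D = P11 - p1*q1, which is zero when alleles at A and B associate at random. The sketch below uses the slide's symbols; the function name and the example gamete frequencies are made up for illustration.

```python
def linkage_disequilibrium(P11, P12, P21, P22):
    """Return (p1, q1, D) from the four gamete frequencies.

    P11, P12, P21, P22 are the frequencies of gametes A1B1, A1B2, A2B1, A2B2.
    D = P11 - p1*q1 is the standard coefficient (an addition, not from the slides).
    """
    total = P11 + P12 + P21 + P22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p1 = P11 + P12          # frequency of allele A1
    q1 = P11 + P21          # frequency of allele B1
    D = P11 - p1 * q1       # observed A1B1 minus the random-association expectation
    return p1, q1, D

# Hypothetical gamete frequencies with an excess of A1B1 and A2B2
p1, q1, D = linkage_disequilibrium(0.4, 0.1, 0.2, 0.3)
print(f"p1 = {p1:.2f}, q1 = {q1:.2f}, D = {D:.2f}")

# Same allele frequencies, but gametes formed by random association: D is zero
_, _, D_random = linkage_disequilibrium(0.3, 0.2, 0.3, 0.2)
print(f"D under random association = {D_random:.2f}")
```

D can equivalently be computed as P11*P22 - P12*P21, which gives the same value for any set of gamete frequencies that sums to 1.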