Lab 7. Estimating Population Structure

Goals
1. Estimate and interpret statistics (AMOVA and Bayesian) that characterize population structure.
2. Demonstrate the roles of gene flow and genetic drift in shaping population structure.

Gene flow and genetic drift
Gene flow maintains similar allele frequencies among subpopulations.
Genetic drift causes random differences in allele frequencies among small subpopulations.

Wright's island model: assumes that gene flow occurs with equal probability from the continent (a large source population) to each island (smaller subpopulations).
[Figure: island model diagram. Labels: qm (continent), q0 (each island), m (gene flow from the continent to each island).]

Gene flow and genetic drift
Assuming equilibrium between gene flow (which increases variation) and genetic drift (which reduces variation in finite populations), and also assuming Wright's island model, differentiation among subpopulations (FST) can be calculated as:

$$F_{ST} = \frac{(1-m)^2}{2N - (2N-1)(1-m)^2}$$

$$F_{ST} \approx \frac{1}{4Nm + 1}$$

where N is the size of each subpopulation and m is the migration rate. The second expression is an approximation for small m. From the first expression:
If m = 0, FST = 1; i.e., strong genetic differentiation exists among subpopulations.
If m = 1, FST = 0; i.e., no genetic differentiation exists among subpopulations.

F-coefficients with different levels of structure
(HO = observed heterozygosity; HS = expected heterozygosity within subpopulations; HT = expected heterozygosity in the total population; TP = total population; SP = subpopulation.)

FIT
  Formula: $F_{IT} = \frac{H_T - H_O}{H_T}$
  Meaning: Measure of deviation from HWE in the total population.
  0: No deviation from HWE in TP.
  Positive: Deviation due to a deficiency of heterozygotes in TP.
  Negative: Deviation due to an excess of heterozygotes in TP.

FST
  Formula: $F_{ST} = \frac{H_T - H_S}{H_T}$
  Meaning: Measure of genetic differentiation among subpopulations. It is never negative (it ranges from 0 to 1).
  0: No genetic differentiation among subpopulations.
  1: Strong genetic differentiation among subpopulations.

FIS
  Formula: $F_{IS} = \frac{H_S - H_O}{H_S}$
  Meaning: Measure of deviation from HWE within subpopulations.
  0: No deviation from HWE within SP.
  Positive: Deviation due to a deficiency of heterozygotes within SP.
  Negative: Deviation due to an excess of heterozygotes within SP.

F-coefficients with different levels of structure

FSR
  Formula: $F_{SR} = \frac{H_R - H_S}{H_R}$
  Meaning: Measure of genetic differentiation among subpopulations within a region.
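The island-model expressions for FST and the definitions of FIT, FST, and FIS given earlier can be checked numerically. The following is a minimal Python sketch, not part of the lab software: the function names are hypothetical, and the heterozygosity values in the demo are made up purely for illustration. It also checks the standard relation (1 - FIT) = (1 - FIS)(1 - FST), which follows directly from the three definitions.

```python
def fst_island_exact(n, m):
    """Equilibrium FST under Wright's island model (exact expression).

    n: size of each subpopulation (N); m: migration rate.
    """
    a = (1.0 - m) ** 2
    return a / (2.0 * n - (2.0 * n - 1.0) * a)


def fst_island_approx(n, m):
    """Small-m approximation: FST ~ 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * n * m + 1.0)


def f_coefficients(h_o, h_s, h_t):
    """Return (FIS, FST, FIT) from heterozygosities HO, HS, and HT."""
    f_is = (h_s - h_o) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_o) / h_t
    return f_is, f_st, f_it


if __name__ == "__main__":
    # Compare the exact and approximate island-model FST for N = 50.
    for m in (0.0, 0.001, 0.01, 0.1, 1.0):
        print(m, fst_island_exact(50, m), fst_island_approx(50, m))

    # Hypothetical heterozygosities (illustration only, not lab data).
    f_is, f_st, f_it = f_coefficients(h_o=0.3, h_s=0.4, h_t=0.5)
    print(f_is, f_st, f_it)
    # The three coefficients are linked: (1 - FIT) = (1 - FIS)(1 - FST).
    print(1 - f_it, (1 - f_is) * (1 - f_st))
```

Note that the two island-model expressions agree closely only when m is small; at m = 1 the exact expression gives 0 while the approximation gives 1/(4N + 1).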
FRT
  Formula: $F_{RT} = \frac{H_T - H_R}{H_T}$
  Meaning: Measure of genetic differentiation among regions for the total population.

Interpretation of FSR:
  0: No genetic differentiation among subpopulations within a region.
  1: Strong genetic differentiation among subpopulations within a region.
Interpretation of FRT:
  0: No genetic differentiation among regions in TP.
  1: Strong genetic differentiation among regions in TP.

Estimation of F coefficients using AMOVA

Parameter    AMOVA (Arlequin)
FST          ΦST or FST
FSR          ΦSC or FSC
FRT          ΦCT or FCT

Population structure from worldwide human population
Population = subpopulation. Group = region.
Regions: Eurasia, East Asia, Oceania, America, Africa.

AMOVA result interpretation:

Source of variation                          Percentage of variation
Among groups (regions)                       10
Among (sub)populations within a region        4
Within (sub)populations                      86

Fixation indices:
FST: 0.14
FSC: 0.04
FCT: 0.10

14% of total genetic variation is due to differentiation among subpopulations (FST = (10 + 4)/100).
86% of total genetic variation is found within subpopulations.
4% of the genetic variation within regions is due to differentiation among subpopulations (FSC = 4/(4 + 86) ≈ 0.04).
10% of total genetic variation is due to differentiation among regions (FCT = 10/100).

Human structure data
Header values: # of individuals = 87; # of pops. = 4; third value = 10 (presumably the number of loci).
# individuals in pops.:
13, 24, 25, 25 (Colombian, Karitiana, Maya, Pima)
# of regions: 2
# individuals in regions: 37, 50 (SA, NA)

Example rows (ID, population, then allele sizes in two columns per locus):
ID   Population   Allele sizes
46   Colombian    120 120   128 124   142 124   133 129
47   Colombian    120 120   128 124   146 124   129 129
48   Colombian    126 126   128 124   146 144   129 129

Infinite Alleles Model (Crow and Kimura Model)
• Each mutation creates a completely new allele.
• Reversion is so rare as to be essentially nonexistent.
• Any single mutation is as likely as any other single mutation.

Stepwise Mutation Model
• Do all loci conform to the Infinite Alleles Model?
• Are mutations from one state to another equally probable?
• Consider microsatellite loci: are small insertions/deletions more likely than large ones?

Problem 1. File human_struc.xls (which is already in GenAlEx format) contains data for 10 microsatellite loci used to genotype 41 human populations from a worldwide sample.
a) Five regions are already defined in the file (AFRICA, AMERICA, EAST ASIA, EURASIA, and OCEANIA). Convert the file into Arlequin format and perform AMOVA based on this grouping of populations within regions, using distance measures based on the Infinite Alleles Model (IAM) and the Stepwise Mutation Model (SMM). How do you interpret these results? Report the values of the Φ-statistics and their statistical significance for each AMOVA you run.
b) Do you think that any of these regions can justifiably be divided into subregions? Pick a region, form a hypothesis for what would be a reasonable grouping of populations into subregions (see the information in Appendix 1 and the map in Appendix 2), then run AMOVA only for the region you selected, using distance measures based on both the IAM and the SMM. Was your hypothesis supported by the data?
c) How do Φ-statistics calculated from distance measures based on the SMM compare to those based on the IAM?
d) GRADUATE STUDENTS ONLY: Which of the 5 initially defined regions has the highest diversity in terms of effective number of alleles? What is your biological explanation for this?
Make sure that you cite your sources, and avoid dubious internet sites.

How to choose K?

Picking the Best K

K    Log-likelihood
2    -1235
3    -1238

Assuming equal prior probability for each K:

$$P(K=2 \mid \text{Data}) = \frac{e^{-1235}}{e^{-1235} + e^{-1238}} = \frac{e^{-1235}}{e^{-1235}\,(1 + e^{-3})} = \frac{1}{1 + e^{-3}} = 0.9526$$

$$P(K=3 \mid \text{Data}) = \frac{e^{-1238}}{e^{-1235} + e^{-1238}} = \frac{e^{-1238}}{e^{-1238}\,(e^{3} + 1)} = \frac{1}{e^{3} + 1} = 0.0474$$

Problem 2. Use Structure to further test the hypotheses you developed in Problem 1.
a) Calculate the posterior probabilities to test whether:
   (i) All subpopulations form a single, genetically homogeneous group.
   (ii) There are two genetically distinct groups within your selected region.
   (iii) There are three genetically distinct groups within your selected region.
b) Use the ΔK method to determine the most likely number of groups. How does this compare to the method based on posterior probabilities?
c) How do the groupings of subpopulations compare to your expectations from Problem 1?
d) Is there evidence of admixture among the groups? If so, include a table or figure showing the proportion of each subpopulation assigned to each group.
e) GRADUATE STUDENTS ONLY: Provide a brief, literature-based explanation for the groupings you observe.
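
The posterior-probability calculation from the "Picking the Best K" example can be sketched in Python for any number of candidate K values. This is a minimal illustration rather than part of Structure itself: the function name is hypothetical, a uniform prior over K is assumed, and the log-likelihoods are the two example values from the table (-1235 and -1238). The largest log-likelihood is subtracted before exponentiating because a term such as e^-1235 is too small to represent in double precision; this is the same factoring step used in the worked equations.

```python
import math


def posterior_probabilities(log_likelihoods):
    """Map {K: ln P(Data | K)} to {K: P(K | Data)}.

    Assumes a uniform prior over the candidate K values.
    """
    # Factor out the largest term so the exponentials do not underflow.
    max_ll = max(log_likelihoods.values())
    weights = {k: math.exp(ll - max_ll) for k, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    # Example values from the Picking the Best K table.
    post = posterior_probabilities({2: -1235.0, 3: -1238.0})
    for k in sorted(post):
        print(f"P(K = {k} | Data) = {post[k]:.4f}")
```

For Problem 2a, the same function can be given the log-likelihoods for K = 1, 2, and 3 from your own Structure runs.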