Electronic Supplementary Material 3. Addressing alternative

Electronic Supplementary Material 3. Addressing alternative explanations for a correlation between the occurrence of rare alleles and expected heterozygosity.

ESM 3a Allele-calling errors.

The process of calling microsatellite alleles in next-generation sequencing data is inevitably error-prone, particularly when the data are low coverage. The likelihood-based program lobSTR appears effective at making the best calls given the data, but it cannot guard against issues such as slippage during library preparation. Having said this, errors appear to be rare, perhaps very rare, as evidenced by the very high probability, 0.699, that a called allele is the reference sequence allele. For the low-variability loci I analyse, this level of identity with the reference is very much as expected.

To assess the possible impact of mis-called alleles on the patterns I describe, it is useful to categorise the three possible classes of event:

i) A mis-called allele creates a new allele that is neither a PNM nor one of the other alleles present in all four population groups.
ii) An allele is mis-called in a way that creates a PNM.
iii) An allele is mis-called to create a non-PNM allele that is called elsewhere in the dataset.

Case (i) can be dismissed because such instances are excluded from the analysis. Case (ii) will be ignored if the allele created is already present as a PNM. If the mis-called allele creates a PNM de novo, then populations with lower-quality data may carry more PNMs. However, this can only drive the correlation if the same population also has higher heterozygosity, a scenario that is addressed below. Finally, case (iii) has the potential to affect heterozygosity. However, the impact will be very small because most population group – locus combinations are based on >200 allele calls. More importantly, there is no particular reason why such mis-calls should predictably increase (or decrease) heterozygosity.
Thus, although at first glance it might seem plausible that mis-called alleles could increase both heterozygosity and the frequency of PNMs, in practice the stringent requirement for all populations to carry the same number and identity of alleles means that no net change in heterozygosity is expected.

ESM 3b Controlling more generally for population-specific issues.

Mis-called alleles are just one possible reason why a given population or population group might carry more PNMs than others. Demographic history and sample structure are two others. For example, expanding populations are likely to carry more rare alleles, while population groups comprising disparate populations will tend to carry alleles at more even frequencies, as evidenced by Africa having on average 18% higher heterozygosity than the other three groups. Wherever populations differ in heterozygosity, the possibility exists that the same population also carries more PNMs, for reasons that may or may not be directly related. Such a pattern can potentially drive a correlation between PNM occurrence and heterozygosity even when no causal relationship exists.

To remove the risk of trends driven by population group characteristics, I transformed the data such that heterozygosity values within each population group all have the same mean and variance. This was achieved by subtracting the group mean from each value and then dividing by the group standard deviation, giving standardised values with mean 0 and unit standard deviation in every group. Once this is done, trends driven by population-specific properties such as demographic history, the frequency of allele mis-calling and others should be removed. All that should remain are trends in which PNMs are genuinely associated with higher heterozygosity relative both to the same locus in other populations and to other loci with the same number of alleles in the same population.

ESM 3c Is there an influence of natural selection and selective sweeps?
Positive selection has the potential both to reduce heterozygosity and to remove PNMs. If the selection is very recent and affects humans in just one part of the world, it might explain why population-locus combinations with PNMs also have relatively high heterozygosity. Other evidence argues against this, with a recent review concluding “Classical sweeps have been shown to be rare in humans and, if they do exist, they occur around loci with large effect alleles”. Nonetheless, I also tested a clear prediction arising from models based on selection. Selection will tend to generate regions with high linkage disequilibrium, within which microsatellites will tend to show the same pattern: the same population groups carrying higher heterozygosity and PNMs. Consequently, models based on selection predict that PNMs will be clustered. In practice, I find no evidence of clustering: the probability that adjacent PNMs are found in the same population is indistinguishable from that expected under a random ordering, and this probability does not vary with distance between adjacent PNMs (see text).
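
As an illustration of the clustering check in ESM 3c, the following is a minimal sketch and not the analysis actually used. It takes PNMs in chromosomal order, each labelled with the population group in which it occurs, computes the fraction of adjacent pairs that share a group, and compares this with the same statistic after shuffling the labels. The function names, the group labels "A" to "D", the number of permutations and the one-sided permutation p-value are all assumptions made for illustration. The dependence on distance between adjacent PNMs is not modelled here.

```python
import random


def adjacent_same_fraction(labels):
    """Fraction of adjacent pairs in an ordered sequence that share a label."""
    if len(labels) < 2:
        raise ValueError("need at least two PNMs to form an adjacent pair")
    pairs = list(zip(labels, labels[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def clustering_permutation_test(labels, n_perm=2000, seed=0):
    """Return (observed fraction, one-sided permutation p-value).

    The null (an assumption of this sketch) is that group labels are
    randomly ordered along the genome, so shuffling them gives the
    expected distribution of the statistic in the absence of clustering.
    """
    rng = random.Random(seed)
    observed = adjacent_same_fraction(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_same_fraction(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical group labels for PNMs listed in chromosomal order.
    pnm_groups = ["A", "C", "B", "A", "D", "B", "C", "A", "D", "B"]
    obs, p = clustering_permutation_test(pnm_groups)
    print(f"observed fraction = {obs:.3f}, permutation p = {p:.3f}")
```

Under this sketch, a large p-value would correspond to the situation reported above, where adjacent PNMs are no more likely to share a population group than expected by chance.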

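The within-group standardisation described in ESM 3b can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used for the analysis: the record layout (group, locus, heterozygosity), the function name and the example values are hypothetical, and the population standard deviation is used so that each group ends up with exactly unit standard deviation (the text does not say which estimator was applied).

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within_groups(records):
    """Convert heterozygosity values to z-scores within each population group.

    records: list of (group, locus, heterozygosity) tuples (assumed layout).
    Returns a list of (group, locus, z) tuples in the same order.
    """
    values_by_group = defaultdict(list)
    for group, _locus, het in records:
        values_by_group[group].append(het)

    # Per-group mean and (population) standard deviation.
    stats = {g: (mean(v), pstdev(v)) for g, v in values_by_group.items()}

    standardised = []
    for group, locus, het in records:
        mu, sd = stats[group]
        if sd == 0:
            raise ValueError(f"group {group!r} has zero variance")
        standardised.append((group, locus, (het - mu) / sd))
    return standardised


if __name__ == "__main__":
    # Hypothetical heterozygosity values for two groups and two loci.
    example = [
        ("A", "locus1", 0.1),
        ("A", "locus2", 0.3),
        ("B", "locus1", 0.4),
        ("B", "locus2", 0.8),
    ]
    for row in standardise_within_groups(example):
        print(row)
```

After this transformation every group has mean 0 and unit standard deviation, so a group-wide difference in heterozygosity, such as the one noted for Africa, can no longer contribute to a correlation with PNM occurrence.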