Electronic Supplementary Material 4: Analysis of three common dinucleotide repeats: AC, AT and AG.
Electronic Supplementary Material 4a: How the heterozygosity difference between population groups with and without a putative new allele varies with allele number at three different dinucleotide repeat motifs. Heterozygosity values are standardised within each combination of population group and allele number by subtracting the mean and dividing by the standard deviation. Allele numbers two to five contribute 98%, 99% and 99.5% of all observations in AC, AT and AG respectively. Several combinations of motif and higher allele number (six or more) give negative differences, but these are based on very few data points and no combination is significantly negative. Error bars are one standard error of the mean.
[Figure 4a: three panels, AC repeats, AT repeats and AG repeats. x-axis: Number of alleles (2 to 6). y-axis tick range: 0 to 0.4 (AC), 0 to 0.5 (AT), 0 to 0.4 (AG).]
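The standardisation described in the captions (subtract the mean and divide by the standard deviation within each combination of population group and allele number) and the error bars (one standard error of the mean) can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used for the figures: the record layout, the function names `standardise` and `sem`, and the choice of the sample (n - 1) standard deviation are all assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def standardise(records):
    """Z-score heterozygosity within each (population, allele number) group.

    `records` is a sequence of (population, n_alleles, heterozygosity)
    tuples; this layout is hypothetical. Returns one value per record, in
    the same order. Groups with fewer than two observations or zero
    variance cannot be standardised and yield None.
    """
    records = list(records)
    groups = defaultdict(list)
    for population, n_alleles, het in records:
        groups[(population, n_alleles)].append(het)

    # Mean and sample standard deviation per group (assumption: n - 1).
    stats = {}
    for key, values in groups.items():
        if len(values) > 1:
            stats[key] = (mean(values), stdev(values))

    standardised = []
    for population, n_alleles, het in records:
        group_stats = stats.get((population, n_alleles))
        if group_stats is None or group_stats[1] == 0:
            standardised.append(None)
        else:
            group_mean, group_sd = group_stats
            standardised.append((het - group_mean) / group_sd)
    return standardised


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))
```

Standardising within each combination places every population group and allele number on a common scale, so differences between groups with and without a putative new allele can be compared across panels.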
Electronic Supplementary Material 4b: How the heterozygosity difference between population groups with and without a putative new allele varies between the four population groups at three different dinucleotide repeat motifs. Heterozygosity values are standardised within each combination of population group and allele number by subtracting the mean and dividing by the standard deviation. Error bars are one standard error of the mean. Sample sizes of loci are 1348, 947, 2697 and 1529 for AC (in Europe, EUR; East Asia, EAS; Africa, AFR; and Central Southern Asia, CSA, respectively), 278, 284, 1237 and 629 for AT (same order) and 695, 468, 1438 and 850 for AG (same order).
[Figure 4b: three panels, AC repeats, AT repeats and AG repeats. x-axis: Population (EUR, EAS, AFR, CSA). y-axis tick range: 0 to 0.1 (AC), 0 to 0.2 (AT), 0 to 0.1 (AG).]
Electronic Supplementary Material 4c: Impact of filtering by Q score on the heterozygosity difference between population groups with and without a putative new allele in relation to the number of alleles at AC repeat microsatellite loci. Heterozygosity values are standardised within each combination of population group and allele number by subtracting the mean and dividing by the standard deviation. Error bars are one standard error of the mean. Genotypes with a lobSTR quality score, Q, less than 0.9 were excluded. Differences for 3, 4 and 6 alleles are significant at P = 8.6 × 10⁻⁵, 1.9 × 10⁻⁶ and 0.007 respectively. Filtering reduces the total sample size of loci by approximately one third, from 6521 to 4195. Note that the Q score reflects the probability of observing a particular genotype relative to other genotypes at the same locus. As such, the score reflects not only sequence quality but also allelic diversity: filtering by Q selects strongly against heterozygotes, and so particularly reduces the contribution of African individuals.
[Figure 4c: single panel, AC repeats after Q-score filtering. x-axis: Number of alleles (2 to 6). y-axis tick range: 0 to 1.5.]
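The Q-score filter and the resulting reduction in sample size can be illustrated with a short Python sketch. It assumes each genotype call is a dictionary carrying a hypothetical `"q"` key that holds the lobSTR quality score; the names `filter_by_q` and `fraction_removed` are illustrative and are not part of lobSTR. Only the threshold (0.9) and the locus counts (6521 and 4195) come from the caption.

```python
Q_THRESHOLD = 0.9  # genotypes with Q below this value are excluded


def filter_by_q(genotypes, threshold=Q_THRESHOLD):
    """Keep genotype calls whose quality score is at least `threshold`.

    Each call is assumed to be a dict with a "q" key (hypothetical layout).
    """
    return [g for g in genotypes if g["q"] >= threshold]


def fraction_removed(before, after):
    """Proportion of the original sample lost to filtering."""
    return (before - after) / before


# Toy genotype calls with made-up scores, for illustration only.
calls = [{"locus": "A", "q": 0.95}, {"locus": "B", "q": 0.62}]
kept = filter_by_q(calls)  # only locus "A" passes

# Locus counts reported in the caption: 6521 before, 4195 after filtering.
loss = fraction_removed(6521, 4195)  # about 0.357, roughly one third
```

Because Q depends on allelic diversity as well as sequence quality, a fixed threshold of this kind does not remove genotypes evenly across population groups.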