Electronic Supplementary Material 4: Analysis of three common dinucleotide repeats: AC, AT and AG.

Electronic Supplementary Material 4a: How the heterozygosity difference between population groups with and without a putative new allele varies with allele number at three different dinucleotide repeat motifs. Heterozygosity values are standardised within each population group – allele number combination by subtracting the mean and dividing by the standard deviation. Allele numbers two to five contribute 98%, 99% and 99.5% of all observations in AC, AT and AG respectively. Several combinations of motif and higher allele number (six or more) give negative differences, but these are based on very few data points and no combination is significantly negative. Error bars are one standard error of the mean.

[Figure 4a. Panels: AC repeats, AT repeats, AG repeats. x-axis: Number of alleles (2 to 6).]
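The standardisation described in the caption (subtract the mean, divide by the standard deviation, within each population group – allele number combination) and the standard error used for the error bars can be sketched as below. This is a minimal illustration rather than the authors' code: the function names, the record layout and the toy heterozygosity values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def standardise_within_groups(records):
    """Z-score heterozygosity within each (population, allele number) group.

    records: list of (population, n_alleles, heterozygosity) tuples.
    Returns the standardised values in the same order as the input.
    Each group needs at least two values and non-zero spread.
    """
    groups = defaultdict(list)
    for pop, n_alleles, het in records:
        groups[(pop, n_alleles)].append(het)

    # Assumption: sample standard deviation (n - 1 denominator).
    stats = {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}

    return [
        (het - stats[(pop, n_alleles)][0]) / stats[(pop, n_alleles)][1]
        for pop, n_alleles, het in records
    ]


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Toy data (hypothetical values, for illustration only).
records = [
    ("EUR", 3, 0.40), ("EUR", 3, 0.50), ("EUR", 3, 0.60),
    ("AFR", 3, 0.55), ("AFR", 3, 0.65), ("AFR", 3, 0.75),
]
z = standardise_within_groups(records)
se = standard_error([1.0, 2.0, 3.0])
```

After this step every group has mean zero and unit standard deviation, so differences between loci with and without a putative new allele can be compared across population groups and allele numbers on a common scale.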

Electronic Supplementary Material 4b: How the heterozygosity difference between population groups with and without a putative new allele varies among the four population groups at three different dinucleotide repeat motifs. Heterozygosity values are standardised within each population group – allele number combination by subtracting the mean and dividing by the standard deviation. Error bars are one standard error of the mean. Sample sizes of loci are 1348, 947, 2697 and 1529 for AC (Europe, EUR; East Asia, EAS; Africa, AFR; and Central Southern Asia, CSA, respectively), 278, 284, 1237 and 629 for AT (same order) and 695, 468, 1438 and 850 for AG (same order).

[Figure 4b. Panels: AC repeats, AT repeats, AG repeats. x-axis: Population (EUR, EAS, AFR, CSA).]

Electronic Supplementary Material 4c: Impact of filtering by Q score on the heterozygosity difference between population groups with and without a putative new allele in relation to the number of alleles at AC repeat microsatellite loci. Heterozygosity values are standardised within each population group – allele number combination by subtracting the mean and dividing by the standard deviation. Error bars are one standard error of the mean. Genotypes with a lobSTR quality score, Q, less than 0.9 were excluded. Differences for 3, 4 and 6 alleles are significant at P = 8.6 × 10⁻⁵, 1.9 × 10⁻⁶ and 0.007 respectively. Filtering reduces the total sample size of loci by approximately one third, from 6521 to 4195. Note that the Q score reflects the probability of observing a particular genotype relative to other genotypes at the same locus. As such, the score reflects not only sequence quality but also allelic diversity: filtering by Q selects strongly against heterozygotes, and so particularly reduces the contribution of African individuals.

[Figure 4c. AC repeats after filtering by Q score. x-axis: Number of alleles (2 to 6).]
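The Q filter in the caption amounts to dropping every genotype call with Q below 0.9, and the stated reduction from 6521 to 4195 loci can be checked with simple arithmetic. The sketch below assumes a hypothetical record layout (a dict with a "q" field) and uses made-up genotype calls. It does not reproduce lobSTR's output format or the authors' pipeline.

```python
def filter_by_q(genotypes, q_min=0.9):
    """Keep genotype calls whose quality score Q is at least q_min."""
    return [g for g in genotypes if g["q"] >= q_min]


def fraction_removed(before, after):
    """Fraction of items lost when a count drops from `before` to `after`."""
    return (before - after) / before


# Hypothetical genotype calls, for illustration only.
calls = [
    {"locus": "locus_1", "genotype": (12, 12), "q": 0.99},
    {"locus": "locus_1", "genotype": (12, 14), "q": 0.62},
    {"locus": "locus_2", "genotype": (9, 9), "q": 0.90},
]
kept = filter_by_q(calls)

# Locus counts quoted in the caption: 6521 before filtering, 4195 after.
lost = fraction_removed(6521, 4195)
```

With the caption's counts, `lost` is about 0.357, consistent with the stated reduction of approximately one third.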
