Supplementary Information

FIGURE S1: First four principal components of PCA including populations from Cameroon, Caribbean Islands, France, Raleigh, southeast US, and Winters reveal that most of the explained variation is within the Cameroon population

[Plot: "Percentage of variation explained by each PC"; x-axis: PC #; y-axis: Percent variation explained]
FIGURE S2: Percentage of variation explained over all principal components

FIGURE S3: First and second principal components (eigenvectors) of PCA with the Cameroon population removed, but including populations from Caribbean Islands (CAR), France (FRA), Raleigh (RAL), southeast US (SEU), and Winters (WIN)

FIGURE S4: First four principal components of PCA with the Cameroon population removed, but including populations from Caribbean Islands, France, Raleigh, southeast US, and Winters

[Plot: x-axis: K (num of ancestral populations); y-axis: CV error]
FIGURE S5: Cross validation results for ADMIXTURE analysis to determine the optimal number of ancestral populations

FIGURE S6: ADMIXTURE X-chromosome analysis for number of groups K=2.
Red indicates African ancestry,

FIGURE S7: Expected proportion of African ancestry for each population by chromosomal region

CHR 2   FRA      WIN      RAL      SEUS     CAR      CAM
FRA     -        0.03479  0.04292  0.06273  0.07378  0.17891
WIN     0.03903  -        0.02220  0.03786  0.04657  0.14145
RAL     0.03795  0.02431  -        0.02602  0.02985  0.11839
SEUS    0.05114  0.05487  0.03238  -        0.03815  0.13094
CAR     0.07435  0.09362  0.05798  0.04345  -        0.09201
CAM     0.14316  0.15647  0.11718  0.11065  0.11311  -

TABLE S1: Average FST values between populations for chromosome 2 divided by regions 2L (below diagonal) and 2R (above diagonal)

CHR 3   FRA      WIN      RAL      SEUS     CAR      CAM
FRA     -        0.06091  0.04787  0.05638  0.08436  0.15828
WIN     0.03119  -        0.02519  0.05181  0.10992  0.19523
RAL     0.03769  0.02126  -        0.04005  0.08867  0.17283
SEUS    0.05548  0.03855  0.02791  -        0.05206  0.15105
CAR     0.07135  0.05789  0.03810  0.04468  -        0.11451
CAM     0.17481  0.16001  0.13335  0.14126  0.13220  -

TABLE S2: Average FST values between populations for chromosome 3 divided by regions 3L (below diagonal) and 3R (above diagonal)

        FRA          WIN          RAL          SEUS         CAR          CAM
FRA     -            0.00808221   0.00808332   0.00675504   0.00722308   0.0172637
WIN     0.001230015  -            0.0071234    0.0068906    0.00703498   0.0166368
RAL     0.001146293  0.000797005  -            0.00664005   0.0067045    0.0164513
SEUS    0.001754388  0.001671908  0.001537613  -            0.0048818    0.0158264
CAR     0.001699935  0.00156751   0.001396098  0.001741715  -            0.0157321
CAM     0.003080155  0.003207963  0.00308286   0.003077658  0.00277879   -

TABLE S3: Average DXY values between populations for autosomes (above diagonal) and X chromosome (below diagonal)
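Tables S1 to S3 each pack two symmetric pairwise matrices into one grid, one below the diagonal and one above it. The following is a minimal sketch of how such a combined table could be split back into its two halves, using the Table S1 values. The names `POPS`, `TABLE_S1`, and `split_triangles` are hypothetical, and the position of each value in the grid is an assumption based on the table layout, not something taken from the authors' analysis code.

```python
# Population order assumed for the rows and columns of Table S1.
POPS = ["FRA", "WIN", "RAL", "SEUS", "CAR", "CAM"]

# Table S1 as a combined grid: 2L below the diagonal, 2R above it.
# Cell placement is an assumption recovered from the table layout.
TABLE_S1 = [
    [None,    0.03479, 0.04292, 0.06273, 0.07378, 0.17891],
    [0.03903, None,    0.02220, 0.03786, 0.04657, 0.14145],
    [0.03795, 0.02431, None,    0.02602, 0.02985, 0.11839],
    [0.05114, 0.05487, 0.03238, None,    0.03815, 0.13094],
    [0.07435, 0.09362, 0.05798, 0.04345, None,    0.09201],
    [0.14316, 0.15647, 0.11718, 0.11065, 0.11311, None],
]


def split_triangles(combined, labels):
    """Return (lower, upper) dicts keyed by unordered population pairs."""
    lower, upper = {}, {}
    for i, row in enumerate(combined):
        for j, value in enumerate(row):
            if i == j:
                continue  # the diagonal holds no pairwise value
            pair = frozenset((labels[i], labels[j]))
            if i > j:
                lower[pair] = value  # below the diagonal
            else:
                upper[pair] = value  # above the diagonal
    return lower, upper


fst_2l, fst_2r = split_triangles(TABLE_S1, POPS)
print(fst_2l[frozenset(("FRA", "CAM"))])  # 2L value for FRA vs CAM
print(fst_2r[frozenset(("FRA", "CAM"))])  # 2R value for FRA vs CAM
```

Keying each half by an unordered pair means a lookup does not depend on which population is named first.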