Supplemental Methods

Human subjects. In addition to the DNA samples described in the text, we used male DNA samples from 65 European Americans from Utah, 36 African Americans, 5 Chinese and 14 Japanese. We also used female DNA samples from 28 African Americans, 4 Chinese and 16 Japanese. Of the African American samples, 24 were obtained as part of the International Collaborative Study of Hypertension in Blacks; all other samples were obtained from the Coriell Cell Repositories.

SNP genotyping results. Twenty-five SNPs around G6PD and 21 SNPs around TNFSF5 were successfully genotyped and used in the analysis. Of the 17,687 genotypes used in the study, 10,361 (59%) were obtained more than once, yielding 131 discrepant genotypes (1.3%). When these discrepant genotypes were checked manually and retested, only 30 genotypes (0.3%) remained inconsistent. There are no missing genotypes for any of the samples in our data set. Moreover, because the study was carried out on the X chromosome in males, haplotype phase is known for all samples.

Sequencing. We sequenced 3,170 bp (1,779 non-coding and 1,291 coding) of G6PD and 1,724 bp (1,327 non-coding and 397 coding) of TNFSF5 in 45 males (15 Yoruba, 15 Shona, and 15 African Americans) and 3 primates (2 chimpanzees and 1 orangutan) as described previously1.

Control regions supplement. We genotyped the 17 control regions in the Yoruba, Beni, and Shona males, as well as in 29 additional Yoruban trios, which were of value for resolving haplotype phase. Because haplotype phase is required for the LRH test, we used a computer program to infer phase (NJP, in preparation). This program uses the same underlying population genetics framework as Stephens et al.2, but has the additional advantage that it can handle missing data. For our analysis, we included only those control regions that we could match to our data in terms of both homozygosity at the core and homozygosity at long distance (25% stringency of matching).
At the core, we progressively eliminated the SNPs with the most missing data until we had a subset that matched the homozygosity at G6PD or TNFSF5 to within 25%. At long distances, we matched the control regions to our data by progressively adding SNPs at greater distances from the core until homozygosity had broken down to the same extent as observed at G6PD or TNFSF5. Computer simulations (DER, in preparation) show that it is not necessary to match SNP homozygosity and frequency exactly in order to calculate P-values, which makes the comparison of the control regions to our data appropriate.

Date estimation. We calculated dates for the most recent common ancestor of the G6PD-corehap8 and TNFSF5-corehap4 haplotypes using a previously published method3, which uses the breakdown of a multi-marker haplotype from an inferred ancestral state as a molecular clock. Recombination is the main source of haplotype breakdown; we used female recombination rates of 2.04-3.58 cM/Mb for G6PD and 1.03-1.54 cM/Mb for TNFSF5, obtained by comparing physical distances from the HG12 human genome sequence (http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg12) with genetic distances from three different maps4 [http://snp.cshl.org/linkage_maps]. We assumed a generation time of 25 years which, when corrected for the fact that only two-thirds of X chromosomes (those in females) recombine in any generation, corresponds to an average of 37.5 years between meioses (the effective generation time). Confidence intervals for the dates were calculated assuming that the positively selected haplotype expanded rapidly in frequency after its common ancestor (confidence intervals would be larger if the increase in frequency were slower, but the date estimate itself would not change). Here, we quote dates only for G6PD proximal and TNFSF5 distal, because date estimation is much more robust when more markers covering longer distances are used.
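The effective generation time and the conversion from generations to years can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the original analysis. The function and constant names are hypothetical, and the sketch assumes that a date expressed in generations is converted to years by multiplying by the effective generation time.

```python
from fractions import Fraction

# Generation time assumed in the text, in years.
GENERATION_YEARS = 25

# Only X chromosomes carried by females recombine. With equal numbers of
# males (one X) and females (two X), females carry 2 of every 3 X chromosomes.
FEMALE_X_FRACTION = Fraction(2, 3)


def effective_generation_years(generation_years=GENERATION_YEARS,
                               recombining_fraction=FEMALE_X_FRACTION):
    """Average number of years between meioses in which an X chromosome can recombine."""
    return generation_years / recombining_fraction


def generations_to_years(generations):
    """Convert a date in generations to years (assumed: simple multiplication)."""
    return float(generations * effective_generation_years())


if __name__ == "__main__":
    print(float(effective_generation_years()))  # 37.5
    print(generations_to_years(80))             # 3000.0
```

Under these assumptions, a date of 80 generations corresponds to 80 x 37.5 = 3,000 years.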
G6PD

Map          Estimated recombination   Scenario   Date estimate (and 95% confidence   Date estimate
             rate (cM/Mb)                         interval) in generations            in years
Marshfield   2.04                      1          80 (41-133)                         3,000
                                       2          92 (47-158)                         3,450
deCODE       3.58                      1          45 (24-76)                          1,688
                                       2          52 (27-89)                          1,950
TSC          2.64                      1          62 (32-103)                         2,325
                                       2          71 (36-122)                         2,663

TNFSF5

Estimated recombination   Scenario   Date estimate (and 95% confidence   Date estimate
rate (cM/Mb)                         interval) in generations            in years
1.03                      1          206 (118-324)                       7,725
                          2          278 (156-453)                       10,425
1.54                      1          137 (78-217)                        5,138
                          2          186 (104-303)                       6,975
1.41                      1          150 (86-236)                        5,625
                          2          203 (114-331)                       7,613

We provide two date estimates for each recombination rate: (1) assuming that the haplotype distribution in the past was the same as today's except that the dated haplotype was absent, and (2) assuming that the distribution in the past was identical to today's. The reality lies between the two.

References:
1. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 22, 231-8 (1999).
2. Fearnhead, P. & Donnelly, P. Estimating recombination rates from population genetic data. Genetics 159, 1299-318 (2001).
3. Reich, D. E. & Goldstein, D. B. in Microsatellites: evolution and applications 128-138 (Oxford University Press, Oxford; New York, 1999).
4. Kong, A. et al. A high-resolution recombination map of the human genome. Nat Genet 31, 241-7 (2002).