Supplemental Methods
Human subjects. In addition to the DNA samples described in the text, we used male
DNA samples from 65 European Americans from Utah, 36 African Americans, 5
Chinese and 14 Japanese. We also used female DNA samples from 28 African
Americans, 4 Chinese and 16 Japanese. Of the African Americans, 24 were obtained as
part of the International Collaborative Study of Hypertension in Blacks, while all other
samples were obtained from the Coriell Cell Repositories.
SNP genotyping results. Twenty-five SNPs around G6PD and 21 SNPs around
TNFSF5 were successfully genotyped and used in the analysis. Of the 17,687 genotypes used
in the study, 10,361 (59%) were obtained more than once, yielding 131 discrepant
genotypes (1.3% of the replicated genotypes). When these discrepant genotypes were checked manually and
retested, only 30 (0.3%) remained inconsistent. No genotypes are missing for any
sample in the data set. Moreover, because the study was carried out on the X chromosome
in males, who carry a single copy of it, all samples have known haplotype phase.
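The reproducibility figures above can be checked with a few lines of arithmetic. This sketch assumes, as the numbers imply, that the 1.3% and 0.3% figures are relative to the 10,361 replicated genotypes rather than to all 17,687; the variable names are illustrative only.

```python
# Reproducibility figures from the SNP genotyping paragraph.
total_genotypes = 17_687      # genotypes used in the study
replicated = 10_361           # genotypes obtained more than once
discrepant_initial = 131      # discrepant before manual checking and retesting
discrepant_final = 30         # still inconsistent after retesting


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Share of all genotypes that were replicated.
replicated_pct = pct(replicated, total_genotypes)
# Discrepancy rates, using replicated genotypes as the denominator (assumption).
initial_error_pct = pct(discrepant_initial, replicated)
final_error_pct = pct(discrepant_final, replicated)

print(f"replicated: {replicated_pct}% of all genotypes")
print(f"discrepant before retesting: {initial_error_pct}% of replicated genotypes")
print(f"discrepant after retesting: {final_error_pct}% of replicated genotypes")
```

Dividing 131 by all 17,687 genotypes would give about 0.7%, so the replicated subset is the denominator consistent with the quoted 1.3%.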
Sequencing. We sequenced 3,170 bp (1,779 non-coding and 1,291 coding) of G6PD
and 1,724 bp (1,327 non-coding and 397 coding) of TNFSF5 in 45 males (15 Yoruba,
15 Shona and 15 African Americans) and 3 non-human primates (2 chimpanzees and 1 orangutan),
as described previously¹.
Control regions supplement. We genotyped the 17 control regions in the Yoruba,
Beni and Shona males, as well as in 29 additional Yoruba trios, which helped
resolve haplotype phase. Because the LRH test requires haplotype phase, we inferred
phase with a computer program (NJP, in preparation). This program uses the
same underlying population genetics framework as Stephens et al.², but has the
additional advantage that it can handle missing data. We included in our analysis only
those control regions that we could match to our data both in homozygosity at the core
and in homozygosity at long distance, each to within 25%. At the core, we progressively
eliminated the SNPs with the most missing data until the remaining subset matched the
homozygosity at G6PD or TNFSF5 to within 25%. At long distances, we progressively
added SNPs farther from the core until homozygosity had broken down to the same extent
as observed at G6PD or TNFSF5. Computer simulations (DER, in preparation) show that
an exact match of SNP homozygosity and frequency is not necessary for calculating
P-values, so this approximate matching of the control regions to our data is appropriate.
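The two matching steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: homozygosity is taken to be the sum of squared haplotype frequencies, chromosomes with a missing allele at any SNP under consideration are simply skipped, the 25% criterion is read as a relative tolerance, and the function names (`homozygosity`, `match_core`, `match_distance`) and the toy haplotypes are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

# One phased chromosome: an allele per SNP, with None marking missing data.
Haplotype = Sequence[Optional[str]]


def homozygosity(haps: List[Haplotype], snps: List[int]) -> float:
    """Sum of squared haplotype frequencies over `snps`, using only
    chromosomes with no missing alleles at those SNPs (an assumption)."""
    complete = [tuple(h[i] for i in snps) for h in haps
                if all(h[i] is not None for i in snps)]
    if not complete:
        return 0.0
    n = len(complete)
    return sum((count / n) ** 2 for count in Counter(complete).values())


def match_core(haps: List[Haplotype], target: float,
               tolerance: float = 0.25) -> List[int]:
    """Drop the SNP with the most missing data, one at a time, until core
    homozygosity is within `tolerance` (relative) of `target`.
    Returns the retained SNP indices, or [] if no subset matches."""
    snps = list(range(len(haps[0])))
    while snps:
        if abs(homozygosity(haps, snps) - target) <= tolerance * target:
            return snps
        missing = {i: sum(h[i] is None for h in haps) for i in snps}
        snps.remove(max(snps, key=lambda i: missing[i]))
    return []


def match_distance(haps: List[Haplotype], core: List[int],
                   flank_by_distance: List[int], target: float) -> List[int]:
    """Add flanking SNPs, nearest first, until homozygosity over the growing
    haplotype has decayed to `target` (the value observed at the gene of
    interest). Returns [] if it never decays that far."""
    snps = list(core)
    for i in flank_by_distance:
        if homozygosity(haps, snps) <= target:
            return snps
        snps.append(i)
    return snps if homozygosity(haps, snps) <= target else []


# Toy phased data for a hypothetical control region (not study data).
haps = [("A", "C", "G"), ("A", "C", "T"), ("G", "C", "G"), ("A", None, "G")]
core = match_core(haps, target=0.6)
print("core SNPs:", core)
print("extended SNPs:", match_distance(haps, core, [0, 1], target=0.4))
```

Dropping SNPs generally merges haplotypes and raises core homozygosity, while adding flanking SNPs splits them and lowers it, which is why the core step removes markers and the long-distance step adds them.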
Date estimation. We calculated dates for the most recent common ancestor of the
G6PD-corehap8 and TNFSF5-corehap4 haplotypes using a previously published method³,
which uses the breakdown of a multi-marker haplotype from an inferred ancestral state
as a molecular clock. Recombination is the main cause of haplotype breakdown; we
used female recombination rates of 2.04-3.58 cM/Mb for G6PD and 1.03-1.54 cM/Mb
for TNFSF5, obtained by comparing physical distances from the HG12 human
genome sequence (http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg12) with genetic
distances from three different maps⁴ (Marshfield, deCODE and TSC;
http://snp.cshl.org/linkage_maps). We assumed a generation time of 25 years. Because
only two-thirds of X chromosomes (those in females) can recombine in any generation,
this corresponds to an average of 25/(2/3) = 37.5 years between meioses (the effective
generation time). Confidence intervals for the dates were calculated assuming that the
positively selected haplotype expanded rapidly in frequency after its common ancestor;
the confidence intervals would be wider if the frequency increased more slowly, but the
date estimate itself would not change. We quote dates only for the G6PD proximal and
TNFSF5 distal regions, because date estimation is much more robust when more markers
covering longer distances are used.
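The conversion from generations to years can be written out exactly. This sketch uses exact fractions so that 25 / (2/3) evaluates to 37.5 without floating-point error, and it assumes that halves are rounded up, which is what reproduces the years column of the table that follows; the constant and function names are illustrative.

```python
from fractions import Fraction
import math

GENERATION_TIME_YEARS = 25               # assumed human generation time
FRACTION_X_RECOMBINING = Fraction(2, 3)  # share of X chromosomes carried by females

# Average years between meioses in which an X chromosome can recombine.
EFFECTIVE_GENERATION_YEARS = GENERATION_TIME_YEARS / FRACTION_X_RECOMBINING


def generations_to_years(generations: int) -> int:
    """Convert a date in effective generations to years, rounding halves up."""
    return math.floor(generations * EFFECTIVE_GENERATION_YEARS + Fraction(1, 2))


print(float(EFFECTIVE_GENERATION_YEARS))  # 37.5
for g in (80, 92, 45, 52, 62, 71):        # G6PD generation estimates from the table
    print(g, generations_to_years(g))
```

Python's built-in `round` uses round-half-to-even and would turn 2,662.5 into 2,662, whereas the table lists 2,663; hence the explicit half-up rounding.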
G6PD
Genetic map   Recombination   Scenario   Date estimate in generations   Date estimate
              rate (cM/Mb)               (95% confidence interval)      in years
Marshfield    2.04            1          80 (41-133)                    3,000
                              2          92 (47-158)                    3,450
deCODE        3.58            1          45 (24-76)                     1,688
                              2          52 (27-89)                     1,950
TSC           2.64            1          62 (32-103)                    2,325
                              2          71 (36-122)                    2,663

TNFSF5
Genetic map   Recombination   Scenario   Date estimate in generations   Date estimate
              rate (cM/Mb)               (95% confidence interval)      in years
Marshfield    1.03            1          206 (118-324)                  7,725
                              2          278 (156-453)                  10,425
deCODE        1.54            1          137 (78-217)                   5,138
                              2          186 (104-303)                  6,975
TSC           1.41            1          150 (86-236)                   5,625
                              2          203 (114-331)                  7,613
For each genetic map we provide two date estimates, corresponding to two scenarios:
(1) the haplotype distribution in the past was the same as today's except that the dated
haplotype was absent, and (2) the distribution in the past was identical to today's. The
true date is likely to lie between the two estimates.
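The role of the two scenarios can be illustrated with a deliberately simplified single-marker clock. This is not the published method of ref. 3; it is a sketch assuming that, in each meiosis, a crossover between the core and a marker (probability c, the recombination fraction) changes the marker allele only when the partner chromosome carries a different allele. A fraction p of chromosomes then retains the ancestral allele after g meioses, with p = (1 - c(1 - h))^g, where h is the ancestral-allele frequency among potential partner chromosomes. Scenario 1 is modelled by computing h with the dated haplotype removed, scenario 2 by computing h from today's chromosomes. All function names and input values are hypothetical.

```python
import math


def breakdown_rate(c: float, h: float) -> float:
    """Per-meiosis probability that the marker allele changes: a crossover
    (probability c) with a partner lacking the ancestral allele (1 - h)."""
    return c * (1.0 - h)


def estimate_generations(p_retained: float, c: float, h: float) -> float:
    """Solve p_retained = (1 - c * (1 - h)) ** g for g."""
    if not 0.0 < p_retained <= 1.0:
        raise ValueError("p_retained must be in (0, 1]")
    rate = breakdown_rate(c, h)
    if not 0.0 < rate < 1.0:
        raise ValueError("breakdown rate must be in (0, 1)")
    return math.log(p_retained) / math.log(1.0 - rate)


# Hypothetical inputs, not values from this study.
p_retained = 0.6   # core-haplotype chromosomes still ancestral at the marker
c = 0.005          # recombination fraction between core and marker
h_scenario1 = 0.2  # ancestral-allele frequency, dated haplotype excluded
h_scenario2 = 0.4  # ancestral-allele frequency, dated haplotype included

g1 = estimate_generations(p_retained, c, h_scenario1)
g2 = estimate_generations(p_retained, c, h_scenario2)
print(f"scenario 1: {g1:.0f} generations; scenario 2: {g2:.0f} generations")
```

When the dated haplotype mostly carries the ancestral allele, including it raises h, so fewer crossovers are visible and the same observed breakdown implies an older date. This matches the ordering of estimates 1 and 2 in the table.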
References:
1. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 22, 231-238 (1999).
2. Fearnhead, P. & Donnelly, P. Estimating recombination rates from population genetic data. Genetics 159, 1299-1318 (2001).
3. Reich, D. E. & Goldstein, D. B. in Microsatellites: Evolution and Applications 128-138 (Oxford University Press, Oxford; New York, 1999).
4. Kong, A. et al. A high-resolution recombination map of the human genome. Nat Genet 31, 241-247 (2002).