Evidence for Malaria Selection of a CR1 Haplotype in Sardinia
Supplemental Information
Table S1. Source of Genotypes
Genotype Source                      Platform (a)
WTCCC (1958 British Birth Cohort)    1.2M
HYPERGENES                           1.2M
U Michigan                           550K
North Shore                          550K
i-ControlDB (CHOP)                   550K
i-ControlDB (HGDP)                   550K
Totals
Population columns: Irish, English, Ashkenazi, Sardinian, Arab, Norwegian, North Italian, South Italian
Sample counts in extraction order (row and column positions were lost when the table was flattened): 39, 367, 2, 137, 58, 60, 367, 120, 1, 123, 98, 1, 24, 148, 337, 142, 23, 35, 56, 22, 395, 220, 58, 4, 104, 63, 104
a. Illumina platform used for genotyping.
Table S2. European Haplotypes for CR1 Region Showing Sardinian Selection Signal (a)
                                                       SNP Allele (b)
SNP (c)      BP (d)     Alleles (e)  Imputation P (f)  Haplotype A  Haplotype B  Haplotype C  Ancestral Allele (g)  Amino Acid Variants (h)
rs4844599    205745852  G/T          0.96              G            T            T            G
rs4844600    205745930  A/G          0.95              G            G            A            G
rs12567973   205748124  C/G          0.96              G            C            C            C
rs12567990   205748308  C/T          0.98              T            C            C            C
rs11117949   205748879  C/T          0.97              C            T            T            T
rs12039306   205750032  A/G          0.96              A            G            G            A
rs11117956   205750982  G/T          0.97              G            T            T            G
rs11117959   205751142  A/G          0.98              G            A            A            A
rs6540435    205752387  C/T          0.96              C            T            T            T
rs4562624    205752588  A/C          0.93              C            C            A            C
rs7512361    205752813  C/T          0.93              C            T            T            C
rs12130494   205753997  A/T          0.96              T            A            A            T
rs4844601    205755850  G/T          0.93              T            G            G            T
rs10863358   205757494  C/G          0.93              C            G            G            C
rs6656401    205758672  A/G          0.92              G            G            A            G
1-205760525  205760525  G/T          0.93              T            T            T            T
rs11117991   205760980  C/T          0.96              C            T            T            T
rs6661489    205764667  C/T          0.94              C            C            T            C
rs35245495   205782363  C/T          NA                (C)          (T)          (T)          T                     Ile643Thr
rs7521382    205804492  A/G          0.90              A            G            G            G
1-205804493  205804493  A/C          0.94              A            C            C            C
rs3738467    205809879  A/C          NA                (T)          (G)          (G)          G                     Gln981His
rs17046851   205814607  A/G          0.96              G            A            A            A
rs650877     205815416  A/G          0.96              G            A            A            G
rs12757487   205815811  A/G          0.96              A            G            G            A
rs599948     205818862  C/T          0.99              C            T            T            C
rs601356     205819173  G/T          0.97              G            T            T            G
rs614709     205819898  C/T          0.99              C            T            T            C
rs2274566    205819968  C/T          0.98              C            T            C            C
rs2274567    205820244  A/G          0.99              G            A            A            A                     His1167Arg
rs9659222    205821322  A/G          0.99              G            A            A            A
rs6687175    205823858  A/G          0.98              G            A            A            A
rs12034598   205824138  A/G          0.99              G            A            A            A
rs646817     205824559  A/G          0.99              G            A            A            G
rs1752688    205824855  A/G          0.99              G            A            A            G
rs1746659    205825167  A/T          0.99              A            T            T            A
rs11118131   205827819  C/T          1.00              T            C            C            C
rs3738468    205828981  A/G          1.00              A            G            G            G
rs10779330   205829297  A/G          1.00              A            G            G            G
rs11118133   205829867  A/T          NA                (T)          (A)          (A)          A
rs608282     205832472  A/T          1.00              A            T            T            T
rs11118135   205837051  A/G          0.99              G            A            A            A
rs4844608    205837199  A/G          0.99              G            A            A            A
rs4844382    205837312  A/G          0.99              A            G            G            G
rs594955     205837315  C/T          0.99              T            C            C            T
rs12141045   205838777  C/T          1.00              C            T            T            T
rs11118136   205839031  A/G          0.99              G            A            A            G
rs7539922    205839962  A/G          0.99              A            G            G            G
rs677066     205840614  C/T          1.00              C            T            T            T
rs11118147   205841505  A/G          0.99              G            A            A            A
rs11118157   205844942  A/C          1.00              A            C            C            A
rs1408079    205846671  A/T          0.99              T            A            A            A
rs11118166   205848618  A/G          0.99              G            A            A            G
rs11118167   205848777  C/T          0.99              C            T            T            T
rs6691117    205849554  A/G          1.00              G            A            A            G                     Ile1574Val
rs12032275   205850130  C/T          0.97              T            C            C            C
rs3818361    205851591  A/G          1.00              G            G            A            G
rs7519119    205852775  A/G          0.99              G            A            G            G
rs12403552   205852793  A/G          0.99              A            G            G            A
rs7542544    205852846  A/C          0.99              C            A            C            C
rs6701713    205852912  A/G          1.00              G            G            A            G
rs2093761    205853165  A/G          1.00              G            G            A            G
rs2093760    205853451  A/G          0.98              G            G            A            G
rs10429953   205855201  A/G          0.99              G            A            A            A
rs11576522   205855892  A/G          0.99              A            G            A            A
rs10429943   205856094  C/T          0.99              T            C            C            C
rs3811381    205856711  C/G          NA                G            (C)          (C)          C                     Pro1786Arg
rs12036785   205859532  A/C          0.99              C            A            C            C
a. The haplotypes were derived from both genotyped and imputed SNPs. Genotyped SNPs are indicated in the probability column by a probability of 1.0; imputed SNPs are indicated by posterior probabilities of <1.0. The imputation was performed using Impute V2.0 (ref. 1), and the posterior probabilities were used for determining the haplotypes. The haplotypes in this table were estimated using Beagle software (ref. 2) from the posterior probability output for each possible genotype from Impute V2.0. In Sardinians the frequencies of these exact haplotypes were: A, 0.581; B, 0.230; and C, 0.125. These frequencies are similar to those derived using Haploview (Figure 2 in manuscript).
b. The SNP allele for each haplotype is shown. The alleles for SNPs used in previous studies but not genotyped or imputed in the current study are included in the table within parentheses. These SNPs were not included in the quality-filtered 1000 Genomes preliminary results (June 2010 release). The most likely alleles for these SNPs are based on haplotypes inferred in previous studies (ref. 3).
c. The SNP rs number is shown or, for SNPs without rs numbers, the chromosome and bp position (HG18).
d. The bp position is shown for NCBI build 36.3 (HG18).
e. Forward alleles.
f. Posterior probabilities for imputation assessment from Impute V2.0 analyses or genotypes. Not available (NA) is indicated for genotypes based on earlier studies (see footnote b). Where P = 1.0, all individuals had complete genotypes for the SNP.
g. The ancestral allele based on the chimpanzee genotype is listed here. The ancestral status was obtained from UCSC Browser (http://genome.ucsc.edu/) and NCBI dbSNP (http://www.ncbi.nlm.nih.gov/snp) compilations.
h. The amino acid variants are shown for SNPs with nonsynonymous changes. In each case the variant amino acid corresponds to the SNP allele in haplotype A and is listed after the amino acid position. The position of the amino acid is provided without the 41 amino acid leader sequence. Many of the older references provide positions that include the leader sequence. The positions without the leader sequence and the corresponding positions with the leader sequence are as follows: Ile643Thr, Ile684Thr; Gln981His, Gln1022His; His1167Arg, His1208Arg; Ile1574Val, Ile1615Val; and Pro1786Arg, Pro1827Arg.
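The renumbering described in footnote h is a fixed offset of 41 residues, which can be sketched in a few lines of Python. The function name `with_leader` and the three-letter variant pattern are assumptions made for illustration; only the offset and the five variant names come from the footnote.

```python
import re

LEADER_LENGTH = 41  # leader sequence length stated in footnote h


def with_leader(variant, leader=LEADER_LENGTH):
    """Shift a variant such as 'Ile643Thr' to the numbering that includes the leader.

    Hypothetical helper: it expects three-letter residue codes around an
    integer position and raises ValueError for anything else.
    """
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", variant)
    if match is None:
        raise ValueError(f"unrecognised variant name: {variant!r}")
    reference, position, substitute = match.groups()
    return f"{reference}{int(position) + leader}{substitute}"


# Variants as numbered in Table S2 (without the leader sequence).
mature_numbering = ["Ile643Thr", "Gln981His", "His1167Arg", "Ile1574Val", "Pro1786Arg"]
leader_numbering = [with_leader(name) for name in mature_numbering]

for old, new in zip(mature_numbering, leader_numbering):
    print(old, "->", new)
```

Applying the same offset in reverse (a negative `leader`) converts positions from older references back to the numbering used in the table.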
Supplemental Note
For SNP rs2274567, assuming a hard sweep model in which the selected variant starts as a new mutation or at very low frequency following a founder effect, and assuming 25 years/generation, we used the haplotype sharing in the Sardinian population to estimate:
1) Selection intensity s = 0.0054 (95% confidence interval: 0.0052 - 0.0056)
2) Allele age t = 45101 years (95% confidence interval: 43949 - 46253)
Method for estimating allele age and selection intensity:
We estimate the allele age and selection intensity of a selective sweep using the following procedures:
1. Allele age is estimated from the Extended Haplotype Homozygosity following the methods of Voight et al. (ref. 4). This model assumes that the decay of haplotype homozygosity follows a Poisson process:
$\Pr(\text{homozygosity}) = e^{-2rg}$
where Pr(homozygosity) is the probability that two haplotypes are homozygous at a distance r from the selected mutant, and g is the number of generations. As in Voight et al. (ref. 4) we choose Pr = 0.25, and the time in years is t = 25g.
2. After obtaining the point estimate of the allele age, we estimate selection intensity by assuming a deterministic logistic sweep model, in which the change in the frequency of the selected allele follows the logistic differential equation, and thus
$f = \dfrac{f_0 e^{sg}}{1 - f_0 + f_0 e^{sg}}$
where f is the allele frequency of the selected allele in the current generation, f0 is the initial allele frequency before selection, s is the selection intensity, and g = t/25 is the allele age t expressed in generations. Because we assume a hard sweep model in which selection starts with a new mutant, the initial allele frequency is assumed to be 0.0001.
3. Confidence intervals of allele age and selection intensity are estimated by bootstrap.
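The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic trajectory f = f0 e^(sg) / (1 - f0 + f0 e^(sg)) is the standard solution of the logistic equation and is assumed here, the bootstrap is a plain percentile bootstrap, and the input distances and current allele frequency are hypothetical values chosen only to make the example run.

```python
import math
import random

YEARS_PER_GENERATION = 25    # stated in the note
INITIAL_FREQUENCY = 0.0001   # stated in the note


def allele_age_generations(r_morgans, homozygosity=0.25):
    """Step 1: solve Pr = exp(-2 * r * g) for g (generations)."""
    return -math.log(homozygosity) / (2.0 * r_morgans)


def selection_intensity(f_now, generations, f0=INITIAL_FREQUENCY):
    """Step 2: invert the assumed logistic trajectory for s (per generation)."""
    logit_now = math.log(f_now / (1.0 - f_now))
    logit_start = math.log(f0 / (1.0 - f0))
    return (logit_now - logit_start) / generations


def bootstrap_ci(samples, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Step 3: percentile bootstrap interval for statistic(samples)."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    low_index = max(int((alpha / 2.0) * n_boot), 0)
    high_index = min(max(int((1.0 - alpha / 2.0) * n_boot) - 1, 0), n_boot - 1)
    return estimates[low_index], estimates[high_index]


def age_in_years(distances):
    """Allele age in years from the mean distance at which homozygosity is 0.25."""
    mean_r = sum(distances) / len(distances)
    return YEARS_PER_GENERATION * allele_age_generations(mean_r)


# Hypothetical inputs: genetic distances (in Morgans) at which haplotype
# homozygosity decays to 0.25, and a hypothetical current allele frequency.
distances = [0.00036, 0.00039, 0.00041, 0.00037, 0.00040, 0.00038]
current_frequency = 0.6

age_years = age_in_years(distances)
s_hat = selection_intensity(current_frequency, age_years / YEARS_PER_GENERATION)
age_low, age_high = bootstrap_ci(distances, age_in_years)

print("allele age (years):", round(age_years))
print("selection intensity:", round(s_hat, 5))
print("95% bootstrap interval for age (years):", round(age_low), round(age_high))
```

The selection intensity is expressed per generation, so the allele age is divided by 25 before it enters the logistic step. What unit is resampled in the bootstrap is not specified in the note; resampling per-haplotype distances is one plausible choice.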
Supplemental Information References
1. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 2009; 5(6): e1000529.
2. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 2009; 84(2): 210-23.
3. Xiang L, Rundles JR, Hamilton DR, Wilson JG. Quantitative alleles of CR1: coding sequence analysis and comparison of haplotypes in two ethnic groups. J Immunol 1999; 163(9): 4939-45.
4. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol 2006; 4(3): e72.