Supplemental Digital Content 2: SNP Selection Process

Each gene on the candidate gene list was first mapped to NCBI reference genome build 35
using the Entrez Gene text database or interface. Occasionally, a gene was mapped using
the Ensembl database. All putative functional SNPs (coding, 5’-UTR, 3’-UTR, splice-site,
and regulatory-region SNPs) from internal, NCBI dbSNP, and Ensembl variation databases with a known
minor allele frequency (MAF) greater than 0 in the population (European ancestry only) were
selected. Illumina-generated design scores were used to classify SNPs into 4 groups (I: design
score ≥ 0.9; II: design score ≥ 0.8; III: design score ≥ 0.7; IV: design score ≥ 0.6). SNPs from
HapMap release 22 were retrieved based on genomic coordinates for each gene including the 10
kb upstream region and 5 kb downstream region and also binned into 4 groups (I’: design score ≥
0.9; II’: design score ≥ 0.8; III’: design score ≥ 0.7; IV’: design score ≥ 0.6). Tagging SNPs
were selected using the aggressive tagging option of Haploview in an iterative fashion. First,
using the SNPs in group I as the constraint, tagging SNPs were selected to capture all SNPs in
group I’ with MAF greater than 5% at an r² of 0.8 or above. Second, all SNPs in groups
I and II, together with the tagging SNPs selected in the previous cycle, were used as the constraint, and
additional tagging SNPs were selected to capture all SNPs in group II’ using similar criteria.
The cycles were repeated two more times to capture all SNPs with design scores of at least 0.6.
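
The binning and the four tagging cycles can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. It assumes the bins are disjoint (each SNP falls in the group of the highest threshold it meets), it uses a simple greedy pairwise tagger as a stand-in for Haploview's aggressive tagging, and it assumes the tag candidates in each cycle are the HapMap SNPs of that cycle's group. The names `design_group`, `greedy_tag`, `iterative_tagging`, and the `r2` callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# (threshold, group) pairs, checked from highest to lowest.
THRESHOLDS = [(0.9, 1), (0.8, 2), (0.7, 3), (0.6, 4)]


def design_group(score: float) -> Optional[int]:
    """Return 1..4 for the highest threshold met, or None below 0.6.

    Assumption: bins are disjoint, e.g. group 2 means 0.8 <= score < 0.9.
    """
    for threshold, group in THRESHOLDS:
        if score >= threshold:
            return group
    return None


def greedy_tag(forced: List[str], targets: Set[str],
               r2: Callable[[str, str], float],
               min_r2: float = 0.8) -> List[str]:
    """Greedy pairwise tagger (a stand-in, NOT Haploview's algorithm).

    Forced SNPs are always kept. Targets are then added as tags one at a
    time, each time picking the one that captures the most uncovered targets.
    """
    def captured_by(tag: str) -> Set[str]:
        return {t for t in targets if t == tag or r2(tag, t) >= min_r2}

    tags = list(forced)
    uncovered = set(targets)
    for tag in tags:
        uncovered -= captured_by(tag)
    while uncovered:
        candidates = sorted(targets - set(tags))  # sorted for determinism
        best = max(candidates,
                   key=lambda c: len(captured_by(c) & uncovered),
                   default=None)
        if best is None or not (captured_by(best) & uncovered):
            break
        tags.append(best)
        uncovered -= captured_by(best)
    return tags


def iterative_tagging(functional: Dict[str, float],
                      hapmap: Dict[str, float],
                      maf: Dict[str, float],
                      r2: Callable[[str, str], float]) -> List[str]:
    """Run four cycles, one per design-score group.

    functional and hapmap map SNP id -> design score; maf maps HapMap
    SNP id -> minor allele frequency.
    """
    selected: List[str] = []
    for cycle in (1, 2, 3, 4):
        # Constraint: everything selected so far plus this cycle's
        # functional SNPs.
        forced = selected + [s for s in sorted(functional)
                             if design_group(functional[s]) == cycle
                             and s not in selected]
        # Targets: HapMap SNPs of this cycle's group with MAF > 5%.
        targets = {s for s, score in hapmap.items()
                   if design_group(score) == cycle and maf[s] > 0.05}
        selected = greedy_tag(forced, targets, r2)
    return selected
```

In this sketch the constraint grows from cycle to cycle, so SNPs with higher design scores are always preferred as tags over SNPs that are harder to assay.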
In addition, the literature was curated to include SNPs that had been studied by other research
groups in related phenotypes.
These SNPs were mapped to the reference genome and/or their flanking sequences were
defined. All SNPs included in the Illumina Infinium II bead array had to have
a design score and satisfy the following rules: 1) a SNP cannot map to the genome more
than once; 2) a SNP must be bi-allelic (a triallelic SNP must be assayed as two pseudo bi-allelic
SNPs); and 3) a SNP cannot be an indel. A total of 29,080 SNPs were selected, and a bead array
was designed and manufactured by Illumina.
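
The inclusion rules above can be expressed as a small filter. The sketch below is illustrative only: the `Snp` record, the `assays_for` function, and the `_1`/`_2` suffixes are hypothetical, and the way a triallelic SNP is split (the first listed allele paired with each of the other two) is an assumption, since the text does not say how the two pseudo bi-allelic SNPs are formed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snp:
    snp_id: str
    design_score: Optional[float]  # None if no design score is available
    genome_hits: int               # number of places the SNP maps to
    alleles: Tuple[str, ...]       # first allele treated as reference (assumption)
    is_indel: bool = False


def assays_for(snp: Snp) -> List[Tuple[str, Tuple[str, str]]]:
    """Return the (assay id, allele pair) entries a SNP contributes.

    An empty list means the SNP is excluded from the array.
    """
    if snp.design_score is None:
        return []                  # must have a design score
    if snp.genome_hits > 1:
        return []                  # rule 1: no multiple genome mappings
    if snp.is_indel:
        return []                  # rule 3: no indels
    if len(snp.alleles) == 2:
        return [(snp.snp_id, (snp.alleles[0], snp.alleles[1]))]
    if len(snp.alleles) == 3:
        # Rule 2: a triallelic SNP becomes two pseudo bi-allelic assays.
        ref, alt1, alt2 = snp.alleles
        return [(f"{snp.snp_id}_1", (ref, alt1)),
                (f"{snp.snp_id}_2", (ref, alt2))]
    return []                      # anything else is not assayable here
```

For example, under these assumptions `assays_for(Snp("rs2", 0.7, 1, ("A", "C", "T")))` yields two entries, one for A/C and one for A/T.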