Supplementary information
Genotype Quality Control and Cleaning
We first performed quality control for the subjects genotyped on the OmniExpress. Any
genotype calls with a “GenCall” score below 0.15 were removed from further analysis
(1.7%), as recommended by Illumina technical support. To identify markers inappropriately
placed on the Y chromosome, we reviewed the genotyping rate in females; markers with
less than 98% missing calls in females were considered inappropriately mapped and were
removed from further analysis (n=489 markers). Similarly, putative X chromosome markers
were excluded if more than 2% of males were heterozygous (n=102 markers). Since both blood
and lymphoblastoid cell line-derived DNA were utilized for this study, we reviewed the data to
identify chromosomal regions that might be duplicated or deleted. The entire chromosome was
removed from further analyses in individuals where such regions were identified (n=162
chromosomes).
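The two sex-chromosome marker filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary layout, and the two-character genotype strings (with `None` for a missing call) are assumptions made for the example, and the heterozygosity rate is computed over all males rather than only those with a call.

```python
def flag_mismapped_y(female_calls, min_missing=0.98):
    """Return putative Y markers that are called too often in females.

    female_calls: dict mapping marker name -> list of calls in females,
    where None denotes a missing call (hypothetical layout).
    """
    flagged = []
    for marker, calls in female_calls.items():
        missing_rate = sum(c is None for c in calls) / len(calls)
        # A true Y marker should be missing in (nearly) all females.
        if missing_rate < min_missing:
            flagged.append(marker)
    return flagged


def flag_heterozygous_x(male_calls, max_het=0.02):
    """Return putative X markers that are heterozygous in too many males.

    male_calls: dict mapping marker name -> list of calls in males,
    each call a two-character string such as "AG", or None if missing.
    """
    flagged = []
    for marker, calls in male_calls.items():
        n_het = sum(c is not None and c[0] != c[1] for c in calls)
        # Assumption: the rate is taken over all males, called or not.
        if n_het / len(calls) > max_het:
            flagged.append(marker)
    return flagged
```

With thresholds of 0.98 and 0.02 the functions mirror the 98% and 2% cut-offs stated in the text; both are exposed as parameters so other cut-offs can be tried.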
We had previously genotyped a set of unrelated individuals on the Human 1M BeadChip
(CC_GWAS). Two hundred seventy-five individuals in the CC_GWAS are in the pedigrees of
this family GWAS. To reduce duplicated genotyping while still maintaining good quality
control and removing potential “site effects”, we included 51 subjects (primarily probands) for
repeated genotyping. We compared the 51 subjects genotyped on both the Illumina
OmniExpress and Illumina 1M arrays to determine the compatibility of allele calls between
different genotyping platforms. Of the 544,276 SNPs genotyped on both platforms, we found
571 SNPs with more than one discrepancy (defined as having non-missing, but different,
genotypes on both chips) and these SNPs were excluded from all further analyses in all samples.
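The cross-platform comparison above amounts to counting, for each shared SNP, the subjects whose two non-missing genotypes differ. The sketch below illustrates that rule under stated assumptions: the names and nested-dictionary layout are hypothetical, genotypes are two-character strings with `None` for missing, and alleles are assumed to be reported on the same strand on both arrays (strand alignment is not handled).

```python
def count_discrepancies(calls_a, calls_b):
    """Count subjects with non-missing but different genotypes for one SNP.

    calls_a, calls_b: dict mapping subject ID -> genotype string
    (e.g. "AG") or None for a missing call.
    """
    n = 0
    for subject in calls_a.keys() & calls_b.keys():
        ga, gb = calls_a[subject], calls_b[subject]
        if ga is None or gb is None:
            continue  # missing on either platform is not a discrepancy
        # Compare as unordered allele pairs, so "AG" equals "GA".
        if sorted(ga) != sorted(gb):
            n += 1
    return n


def discordant_snps(platform_a, platform_b, max_discrepancies=1):
    """Return SNPs with more than max_discrepancies discordant subjects.

    platform_a, platform_b: dict mapping SNP name -> per-subject calls.
    """
    shared = platform_a.keys() & platform_b.keys()
    return sorted(
        snp for snp in shared
        if count_discrepancies(platform_a[snp], platform_b[snp]) > max_discrepancies
    )
```

The default `max_discrepancies=1` corresponds to excluding SNPs with more than one discrepancy, as in the text.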
Pedigree and ethnicity confirmation
We tested the reported pedigree structure using a pruned set of approximately 100,000
SNPs that were not in high linkage disequilibrium (pruning threshold r²=0.8). Pairwise IBD estimates were computed
using PLINK (http://pngu.mgh.harvard.edu/purcell/plink/) [1] to detect pairs of individuals related differently
than reported. After family structures were modified accordingly, we performed analyses to
identify SNP genotypes that were inconsistent with Mendelian inheritance using PEDCHECK [2].
We identified 2,899 SNPs with 2 or more inconsistencies; these SNPs were removed from
further analysis. Next, we utilized EIGENSTRAT [3] with the pruned set of SNPs along with
HapMap European reference samples to identify ethnic stratification within the sample and
evaluated all SNPs for deviations from Hardy-Weinberg equilibrium using 442 genotyped
founders. SNPs that failed the HWE test at p<10⁻⁶ were removed.
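As an illustration of the Hardy-Weinberg filter, the sketch below computes a p-value from founder genotype counts at a biallelic SNP. The text does not state which HWE test was applied; a one-degree-of-freedom chi-square goodness-of-fit test is assumed here purely for illustration (exact tests are a common alternative), and the function name is hypothetical.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts among unrelated founders.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped founders")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A SNP would then be dropped when `hwe_chisq_p(...) < 1e-6`, matching the threshold given above.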
Imputation
We used the program BEAGLE version 3.3.1 [4] to impute SNPs not genotyped on the
Illumina OmniExpress array. Since our sample was European American, we used as a reference
set the genotypic data from the EUR samples in the August 2010 release of the 1000 Genomes Project,
provided with the Beagle release. BEAGLE uses the squared correlation (R²) between the best-guess
genotype and the allele dosage to estimate the R² between the best-guess genotype
and the true genotype [4]. Only SNPs with a squared correlation between the best-guess genotype and allele
dosage greater than 0.3 (R²>0.3) were used. For individual-level genotype data, we retained
genotypes having a probability equal to or greater than 80% (from the gprob metric in Beagle);
otherwise, that genotype was set to missing. To account for uncertainty, we used the mean of the
distribution of imputed genotypes, which corresponds to an expected allelic or genotypic count
(dosage) for each individual. The imputed SNPs were cleaned by the same methods as
genotyped SNPs.
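The two per-genotype quantities described above can be written out directly. Given posterior probabilities (p0, p1, p2) for carrying 0, 1, or 2 copies of the counted allele, the expected dosage is p1 + 2·p2, and a best-guess genotype is kept only if its probability is at least 0.80. The sketch below is illustrative: the function names are hypothetical, which allele is counted is an assumption, and parsing of Beagle's gprob output is not shown.

```python
def expected_dosage(gprobs):
    """Mean of the imputed genotype distribution (expected allele count).

    gprobs: (p0, p1, p2), posterior probabilities of 0, 1 and 2 copies
    of the counted allele; assumed to sum to 1.
    """
    p0, p1, p2 = gprobs
    return p1 + 2.0 * p2


def best_guess(gprobs, min_prob=0.80):
    """Return the most probable genotype (0, 1 or 2), or None if its
    probability is below min_prob (the genotype is set to missing)."""
    best = max(range(3), key=lambda g: gprobs[g])
    return best if gprobs[best] >= min_prob else None
```

For example, probabilities of (0.1, 0.3, 0.6) give a dosage of 1.5 but no retained best-guess genotype, which shows how the dosage preserves information that the hard call discards.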
C15orf53 gene expression analysis
Gene expression was assayed using standard Affymetrix procedures on samples from 9
different brain regions (prefrontal cortex, cerebral cortex, thalamus, visual cortex, hippocampus,
amygdala, caudate, putamen, and cerebellum) of 4 individuals (one each: male alcoholic, female
alcoholic, male control, female control), obtained from the National Institute on Alcohol Abuse
and Alcoholism (NIAAA)-supported brain bank at the Tissue Resource Center located in the
Neuropathology Unit of the Department of Pathology, University of Sydney, Australia.
The expression assay is described in Edenberg et al. [5].
CXCR7 and SLC9A9 gene expression analyses
Gene expression was examined in human prefrontal cortices derived from 65 unrelated
subjects of European descent. Thirty-four of the 65 subjects were alcoholics. We used TaqMan
assays with standard quantitative PCR procedures to analyze total RNA expression.
The expression assay is described in Wang et al. [6,7].
Supplementary figures
Figure S1. Distribution of family size in the GWAS sample.
Figure S2. Distribution of DSM-IV alcohol dependence in the genotyped GWAS sample.
Figure S3. Quantile-Quantile plot of genome-wide association results for symptom count
using genotyped SNPs.
Supplementary tables
Table S1. Demographics of the COGA family GWAS sample
Table S2. The top 72 genotyped SNPs associated with symptom count (inflation-corrected
p<0.0001) in the COGA GWAS sample, with replication data in the SAGE and OZALC
samples.
In the COGA sample, p values in bold indicate suggestive evidence of association
(p≤1.0×10⁻⁵). In the SAGE and OZALC samples, p values in bold indicate nominal association.
Table S3. The association of SNPs in the C15orf53 gene with symptom count in the COGA and
SAGE samples. The SNP in red is a non-synonymous cSNP; the SNP in blue is located in the 3'UTR of the
gene. Bold p values in the SAGE sample indicate nominal association (p<0.05).
References
1. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based
linkage analyses. American journal of human genetics 81, 559-75 (2007).
2. O'Connell, J.R. & Weeks, D.E. PedCheck: a program for identification of genotype
incompatibilities in linkage analysis. American journal of human genetics 63, 259-66
(1998).
3. Price, A.L. et al. Principal components analysis corrects for stratification in
genome-wide association studies. Nature genetics 38, 904-9 (2006).
4. Browning, B.L. & Browning, S.R. A unified approach to genotype imputation and
haplotype-phase inference for large data sets of trios and unrelated individuals. American
journal of human genetics 84, 210-23 (2009).
5. Edenberg, H.J. et al. Genome-wide association study of alcohol dependence implicates a
region on chromosome 11. Alcoholism, clinical and experimental research 34, 840-52
(2010).
6. Wang, J.C. et al. Risk for nicotine dependence and lung cancer is conferred by mRNA
expression levels and amino acid change in CHRNA5. Human molecular genetics 18,
3125-35 (2009).
7. Wang, J.C. et al. Genetic variation in the CHRNA5 gene affects mRNA levels and is
associated with risk for alcohol dependence. Molecular psychiatry 14, 501-10 (2009).