Supplementary information

Genotype Quality Control and Cleaning

We first performed quality control for the subjects genotyped on the OmniExpress. Any genotyping calls with a “GenCall” score less than 0.15 were removed from further analysis (1.7%), as recommended by Illumina technical support. To identify markers inappropriately placed on the Y chromosome, the genotyping rate in females was reviewed, and markers with less than 98% missing genotypes in females were considered inappropriately mapped and were removed from further analysis (n=489 markers). Similarly, putative X chromosome markers were excluded if more than 2% of males were heterozygous (n=102 markers). Since both blood and lymphoblastoid cell line-derived DNA were utilized for this study, we reviewed the data to identify chromosomal regions that might be duplicated or deleted. The entire chromosome was removed from further analyses in individuals where such regions were identified (n=162 chromosomes).

We had previously genotyped a set of unrelated individuals on the Human 1M BeadChip (CC_GWAS). Two hundred seventy-five individuals in the CC_GWAS are in the pedigrees of this family GWAS. To reduce duplicated genotyping while still maintaining good quality control and removing potential “site effects”, we included 51 subjects (primarily probands) for repeated genotyping. We compared the 51 subjects genotyped on both the Illumina OmniExpress and Illumina 1M arrays to determine the compatibility of allele calls between the genotyping platforms. Of the 544,276 SNPs genotyped on both platforms, we found 571 SNPs with more than one discrepancy (defined as having non-missing, but different, genotypes on the two chips), and these SNPs were excluded from all further analyses in all samples.

Pedigree and ethnicity confirmation

We tested the reported pedigree structure using a pruned set of approximately 100,000 SNPs that were not in linkage disequilibrium (r² = 0.8).
Pairwise IBD estimates were computed with PLINK (http://pngu.mgh.harvard.edu/purcell/plink/) [1] to detect pairs of individuals related differently than reported. After family structures were modified accordingly, we used PEDCHECK [2] to identify SNP genotypes that were inconsistent with Mendelian inheritance. We identified 2,899 SNPs with 2 or more inconsistencies; these SNPs were removed from further analysis. Next, we used EIGENSTRAT [3] with the pruned set of SNPs, along with HapMap European reference samples, to identify ethnic stratification within the sample, and evaluated all SNPs for deviations from Hardy-Weinberg equilibrium (HWE) using 442 genotyped founders. SNPs that failed the HWE test at p < 10⁻⁶ were removed.

Imputation

We used the program BEAGLE version 3.3.1 [4] to impute SNPs not genotyped on the Illumina OmniExpress array. Since our sample was European American, we used as a reference set the genotypic data from the EUR samples in the August 2010 release of the 1000 Genomes Project, provided with the BEAGLE release. BEAGLE uses the squared correlation (R²) between the best-guess genotype and the allele dosage to estimate the R² between the best-guess genotype and the true genotype [4]. Only SNPs with an R² between the best-guess genotype and allele dosage greater than 0.3 were used. For individual-level genotype data, we retained genotypes having a probability equal to or greater than 80% (from the gprob metric in BEAGLE); otherwise, that genotype was set to missing. To account for uncertainty, we used the mean of the distribution of imputed genotypes, which corresponds to an expected allelic or genotypic count (dosage) for each individual. The imputed SNPs were cleaned by the same methods as genotyped SNPs.
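
The two individual-level rules described above (the 80% genotype-probability cutoff and the expected-count dosage) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study and does not call BEAGLE. The function names are hypothetical, and the input layout (three posterior probabilities per genotype, ordered as 0, 1 and 2 copies of the counted allele) is an assumption made for illustration.

```python
from typing import Optional, Sequence

# Cutoff stated in the text: keep a genotype call only if its
# posterior probability is at least 80%.
GPROB_THRESHOLD = 0.80


def best_guess_genotype(gprobs: Sequence[float],
                        threshold: float = GPROB_THRESHOLD) -> Optional[int]:
    """Return the most probable genotype (0, 1 or 2 copies of the counted
    allele), or None (missing) if its probability is below the threshold.

    Assumption: gprobs holds P(0 copies), P(1 copy), P(2 copies) in that order.
    """
    if len(gprobs) != 3:
        raise ValueError("expected three genotype probabilities")
    best = max(range(3), key=lambda g: gprobs[g])
    return best if gprobs[best] >= threshold else None


def expected_dosage(gprobs: Sequence[float]) -> float:
    """Mean of the genotype distribution: 0*P(0) + 1*P(1) + 2*P(2)."""
    if len(gprobs) != 3:
        raise ValueError("expected three genotype probabilities")
    return gprobs[1] + 2.0 * gprobs[2]


if __name__ == "__main__":
    confident = (0.05, 0.90, 0.05)   # heterozygote called with 90% probability
    uncertain = (0.40, 0.50, 0.10)   # no genotype reaches 80%
    print(best_guess_genotype(confident), expected_dosage(confident))
    print(best_guess_genotype(uncertain), expected_dosage(uncertain))
```

In this sketch the uncertain genotype is set to missing as a hard call, yet it still contributes a fractional dosage, which is how the dosage carries imputation uncertainty into the analysis.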

C15orf53 gene expression analysis

Gene expression was assayed using standard Affymetrix procedures on samples from 9 different brain regions (prefrontal cortex, cerebral cortex, thalamus, visual cortex, hippocampus, amygdala, caudate, putamen, and cerebellum) of 4 individuals (one each: male alcoholic, female alcoholic, male control, female control), obtained from the National Institute on Alcohol Abuse and Alcoholism (NIAAA)-supported brain bank at the Tissue Resource Center located in the Neuropathology Unit of the Department of Pathology, University of Sydney, Australia. The expression assay was described in Edenberg et al. [5].

CXCR7 and SLC9A9 gene expression analyses

Gene expression was examined in human prefrontal cortices derived from 65 unrelated subjects of European descent. Thirty-four of the 65 subjects were alcoholics. We used TaqMan assays with standard quantitative PCR procedures provided to analyze total RNA expression. The expression assay was described in Wang et al. [6,7].

Supplementary figures

Figure S1. Distribution of family size in the GWAS sample.
Figure S2. Distribution of DSM-IV alcohol dependence in the genotyped GWAS sample.
Figure S3. Quantile-Quantile plot of genome-wide association results for symptom count using genotyped SNPs.

Supplementary tables

Table S1. Demographics of the COGA family GWAS sample.
Table S2. The top 72 genotyped SNPs that were associated with symptom count (inflation-corrected p < 0.0001) in the COGA GWAS sample, and replication data in the SAGE and OZALC samples. In the COGA sample, p values in bold indicate suggestive evidence of association (p ≤ 1.0 × 10⁻⁵). In the SAGE and OZALC samples, p values in bold indicate nominal association.
Table S3. The association of SNPs in the C15orf53 gene with symptom count in the COGA and SAGE samples. The SNP in red is a non-synonymous cSNP; the SNP in blue is located in the 3'UTR of the gene. Bold p values in the SAGE sample indicate nominal association (p < 0.05).

References

1. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 81, 559-75 (2007).
2. O'Connell, J.R. & Weeks, D.E. PedCheck: a program for identification of genotype incompatibilities in linkage analysis. American journal of human genetics 63, 259-66 (1998).
3. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics 38, 904-9 (2006).
4. Browning, B.L. & Browning, S.R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. American journal of human genetics 84, 210-23 (2009).
5. Edenberg, H.J. et al. Genome-wide association study of alcohol dependence implicates a region on chromosome 11. Alcoholism, clinical and experimental research 34, 840-52 (2010).
6. Wang, J.C. et al. Risk for nicotine dependence and lung cancer is conferred by mRNA expression levels and amino acid change in CHRNA5. Human molecular genetics 18, 3125-35 (2009).
7. Wang, J.C. et al. Genetic variation in the CHRNA5 gene affects mRNA levels and is associated with risk for alcohol dependence. Molecular psychiatry 14, 501-10 (2009).