Supplementary material and methods, tables and figures

SLC6A4 allelic expression imbalance

MATERIALS AND METHODS

Genotyping

For the 5HTTLPR, two oligonucleotide primers (5′-GGCGTTGCCGCTCTGAATGC-3′ and 5′-GAGGGACTGAGCTGGACAACCCAC-3′) were used to generate 5HTTLPR allele-specific fragments (484 base pairs [bp] and 528 bp) (Lerman et al. 1998) by polymerase chain reaction (PCR). The PCR products were separated by electrophoresis on a 4% agarose gel, stained with ethidium bromide and visualized under UV light. The PCR was performed in a 10 μL reaction containing 10 ng DNA, 1 μL NH4 buffer, 0.25 μL dNTPs (40 mM), 2.35 μL 2× Polymate (BioLine, London, United Kingdom), 0.3 μL MgCl2 (50 mM), 1 μL primers (10 μM) and 0.1 μL BioTaq DNA Polymerase (BioLine). The PCR consisted of a 5 min denaturing step at 95°C (1 cycle), then 35 cycles of 95°C for 30 sec, 62°C for 45 sec and 72°C for 1 min, and finally 72°C for 4 min (1 cycle). The A/G SNP within the 5HTTLPR long allele (Hu et al. 2006) was genotyped by incubating 10 μL of PCR product with 4 units of the restriction enzyme MspI, which recognizes the restriction site (CCGG) created by the G allele. The fragments for the A allele (62, 126 and 340 bp) and the G allele (62, 126, 166 and 174 bp) were separated on a 3% agarose gel, stained with ethidium bromide and visualized under UV light. Assays for the two SNPs, rs2020933 and rs8073965, were designed and genotyped using the Sequenom platform as described below for the cDNA allelotyping, with a success rate of approximately 90%. Primer details are available on request.

Sequencing

The Ensembl web database was used to obtain the SLC6A4 gene sequence. Primer3 was used to design primers that covered the promoter and all 15 exons of the gene. DNA was amplified in a 50 μL PCR reaction with 10 pmol of oligonucleotides, 100 ng of DNA, 0.2 units of Taq Gold, 8 mM dNTP, 8 mM 1× PCR buffer, and 25 mM MgCl2.
PCR products were purified in a 96-well Millipore purification plate and resuspended in 30 μL of H2O. Two sequencing reactions were prepared for each DNA sample, one with the forward primer and one with the reverse primer. The PCR reagents were removed from solution by ethanol precipitation in the presence of sodium acetate. All sequencing reactions were run on an ABI 3700 sequencer, assembled using PHRED/PHRAP and visualised using the Consed program.

Measure of allelic expression imbalance

Thirty-three lymphoblastoid cell lines of Centre d'Etude du Polymorphisme Humain (CEPH) individuals were obtained from Coriell Cell Repositories (Camden, NJ) and were grown in RPMI 1640 medium supplemented with 15% fetal calf serum, 1% L-glutamine and 1% penicillin-streptomycin in an incubator set to 37°C with 5% carbon dioxide. RNA was isolated using the RNeasy midi kit (Qiagen). RNA samples were treated with the TURBO DNA-free™ kit (Ambion, Austin, Texas) according to the manufacturer's recommendations to eliminate possible contamination from genomic DNA (gDNA). cDNA was prepared using 50 ng of random primers and SuperScript™ III reverse transcriptase (Invitrogen) with 1 μg of RNA according to the manufacturer's protocol. To remove the RNA, the cDNA was treated with RNase H (Invitrogen). The cDNA was tested for DNA contamination using primers (GTCGTTTGAAGCCAGGAGAT and GGCTGAATGTTGTCGGATTT) within the coding region of an unrelated gene (STXBP4), which give a different amplicon length in the presence of gDNA contamination (159 bp for cDNA and 339 bp for gDNA). Individuals heterozygous for the transcribed SNP (rs1042173) were used to measure the relative expression of the two alleles within each mRNA sample. Genomic DNA of heterozygotes was used to correct for unequal readings of the alleles in the assay. For each cell line we generated cDNA in two separate reactions, which were allelotyped separately.
Each cDNA was allelotyped twelve times together with four samples of genomic DNA of the same cell line. Assays for the PCR and associated extension reaction were designed with SpectroDESIGNER software (Sequenom, San Diego, CA). The PCR primers are ACGTTGGATGGCAGCACATGGATTAGAAGG and ACGTTGGATGAGAACAGGGATGCTATCTCG, and the extension primer is AGTAGATTCCAGCAATAAAATT. PCRs were performed in 10 μL reactions with final concentrations of 2.5 mM MgCl2, 200 μM dNTPs, 0.2 U of HotStar Taq (Qiagen), and a primer concentration of 0.2 mM. The PCR profile was 45 cycles of 20 s at 95°C, 30 s at 56°C, and 1 min at 72°C. Non-incorporated dNTPs were removed with shrimp alkaline phosphatase for 20 min at 37°C. The mass-extension reaction was performed using the MassEXTEND enzyme thermosequenase (Amersham Pharmacia), homogenous MassEXTEND (hME) termination mixes, and hME extension primers; 55 cycles were performed of 5 s at 94°C, 5 s at 52°C, and 5 s at 72°C. Unincorporated ddNTPs and dNTPs were removed with SpectroCLEAN resin, and products were transferred to a 384 SpectroCHIP (Sequenom, San Diego, CA) using a SpectroPOINT robot (Sequenom, San Diego, CA). The chip was read using the Bruker Autoflex Mass Spectrometer system (Bruker-Sequenom, San Diego, CA). The allelotyping was analyzed using MassARRAY Typer version 3.1 software (Sequenom). Peak areas were used to calculate the allele frequencies.

Allelic expression imbalance analysis

The expression ratios between the G and T alleles of SNP rs1042173 were corrected for unequal detection using the average ratio in genomic DNA from heterozygotes (mean ratio = 1.18). The expression ratios were analysed on the log (base 10) scale because the distributions of the log ratios tend to be closer to a normal distribution. The presence of a significant allelic expression imbalance (AEI) in each genotype group was determined by a Student's t test of whether the mean of the log-transformed values differed from 0.
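
The correction and test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used in the study: the function names and the sample ratios are hypothetical, and only the genomic correction factor of 1.18 is taken from the text. The sketch returns the t statistic and its degrees of freedom; the P-value would then be read from a t distribution with any statistics package.

```python
import math
import statistics

# Mean allelic ratio measured in heterozygote genomic DNA (value from the text).
GDNA_MEAN_RATIO = 1.18


def corrected_log_ratio(cdna_ratio, gdna_ratio=GDNA_MEAN_RATIO):
    """Return log10 of a cDNA allelic ratio after correcting for unequal detection."""
    return math.log10(cdna_ratio / gdna_ratio)


def one_sample_t(log_ratios, mu=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean(log_ratios) == mu."""
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample SD, n - 1 denominator
    return (mean - mu) / (sd / math.sqrt(n)), n - 1


# Hypothetical raw cDNA allelic ratios for one genotype group (not study data).
raw_ratios = [1.45, 1.60, 1.38, 1.52]
log_ratios = [corrected_log_ratio(r) for r in raw_ratios]
t_stat, df = one_sample_t(log_ratios)
print(f"t = {t_stat:.2f}, df = {df}")
```

A ratio equal to the genomic mean maps to a log ratio of 0, so a group with no imbalance is expected to give a t statistic close to 0.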
The association mapping analysis was based on the expectation that, in the case of one cis-acting variant, the heterozygotes should show AEI (i.e. an allelic ratio significantly different from 1) but the homozygotes should not (allelic ratio close to 1). Under the null hypothesis (that the SNPs are not functional and not in LD with any other functional SNP), the distribution of allelic ratios should be equal in the heterozygote and the homozygote groups (Table 1). For each of the tested SNPs, the difference between the log-transformed allelic ratios of the homozygotes and the heterozygotes was tested by a Wilcoxon rank-sum test using the 'wilcox.test' function in the R statistical analysis package version 2.1.1 (R Development Core Team 2004). The alleles of the 5HTTLPR were phased relative to other SNPs using the program PHASE2 (Stephens et al. 2001). There are four possible haplotype pairs (diplotypes): homozygotes at the tested SNP for one or the other allele, and heterozygotes in two possible ways (Table 1). For SNPs creating two possible heterozygous diplotypes, the association test was performed twice, after inverting the allelic ratios of one subgroup or the other. For example, an A/B SNP and the transcribed SNP (rs1042173; T/G) can form two types of heterozygous diplotypes: T-A/G-B and G-A/T-B. In this case, the maximum P-value of the two tests was recorded. Similarly, the proportion of variance in AEI explained by the heterozygote vs. homozygote genotypes of the 5HTTLPR, rs16965628 and rs2020933 variants was estimated twice using analysis of variance (ANOVA), recording the maximum proportion of the two estimates. The 5% threshold for significance was empirically evaluated by randomly permuting the allelic ratios 100,000 times, applying the above test and recording the minimum P-value across all SNPs.

Table 1. Four possible diplotypes with the expected AEI outcome

Diplotype                                    Expected allelic expression ratio between T/G
(transcribed SNP a - tested SNP b)           (functional SNP: A > B c)
T - A / G - A                                balanced: ratio = 1
T - B / G - B                                balanced: ratio = 1
T - A / G - B                                imbalanced: ratio > 1
T - B / G - A                                imbalanced: ratio < 1

a The transcribed SNP (rs1042173) is used as a tag to measure the relative abundance of allelic transcripts.
b The tested SNP is analysed for association with AEI.
c Functional SNP with a higher expression of the A allele relative to the B allele.

Association analysis with neuroticism

A detailed description of the subjects used for the association analysis has been published elsewhere (refs. 22, 23). This study was approved by the Oxford Local Ethical Review Committee, and informed consent was obtained from all participants. We have previously collected neuroticism (N) scores from 88,142 individuals from an ethnically homogeneous population from the South West and South East of England (ref. 23). We identified 768 unrelated individuals from the extremes (10%) of the N-score distribution who were previously genotyped for the 5HTTLPR (ref. 22). We have estimated (ref. 27) that the sample has sufficient power (80%) to detect a genetic effect contributing only 0.53% of phenotypic variance at a 1% alpha level, assuming no dominance effect and a QTL increaser allele frequency of 5%. Allele frequencies were compared between the high and low neuroticism score extreme groups using Fisher's exact test as implemented in R ('fisher.test'). The haplotypes were analyzed based on a score statistic using the "haplo.stats" package for R (ref. 24).

Table 2. Linkage disequilibrium in the 33 CEPH samples between the variants associated with allelic expression imbalance

Variant 1 (MAF a)      Variant 2      r2        D'
rs16965628 (7.6%)      5HTTLPR        0.019     0.54
rs2020933 (6.1%)       rs16965628     0.79      1.00
5HTTLPR (44%)          rs2020933      0.0094    0.43

a Minor allele frequency in the 33 CEPH samples.

Table 3. Genotype and allele distribution

           rs8073965   counts   frequency     rs2020933   counts   frequency
High N     GG          382      0.953         TT          391      0.914
           GT          19       0.047         AT          36       0.084
           TT          0        0.000         AA          1        0.002
           G           783      0.976         T           818      0.956
           T           19       0.024         A           38       0.044
Low N      GG          257      0.948         TT          261      0.894
           GT          14       0.052         AT          31       0.106
           TT          0        0.000         AA          0        0.000
           G           528      0.974         T           553      0.947
           T           14       0.026         A           31       0.053

High N = individuals with high neuroticism score; Low N = individuals with low neuroticism score.

Table 4. Haplotype association with neuroticism tested using a haplotype score test

                          Haplotype frequencies
rs2020933   5HTTLPR       Total     Low N     High N     Haplotype score a    Simulated P-value b
A           L             0.044     0.047     0.042      -0.67                0.50
A           S             0.0041    0.0066    0.0030     -0.71                0.54
T           L             0.55      0.56      0.54       -0.45                0.65
T           S             0.41      0.39      0.42       0.81                 0.42

a Haplotype score statistics were calculated using the Haplo.Stats package in R.
b The simulated P-value for the maximum score statistic is 0.81.

[Figure 1: top panel, scatter plot with axes "Genomic DNA log10 ratios" and "cDNA log10 ratios" (both from -0.4 to 0.4); bottom panel, histogram of "Frequency" against "Ratio" in bins 1-1.2, 1.2-1.5, 1.5-1.8, 1.8-2.1 and 2.1-3.]

Figure 1. The distribution of allele-specific expression ratios. In the top plot, each dot is the log10 of the average allelic expression ratio of one of the CEPH samples for cDNA, plotted against genomic DNA. The bottom histogram is the distribution of allelic expression ratios. For consistency, ratios below 1 were inverted.

[Figure 2: top panel, "Average expression ratio" by "5-HTTLPR genotypes": LS (n=17), LAS (n=14), LGS (n=3), LALG (n=2), LALA (n=8), LL & SS (n=16); bottom panel, "Average expression ratio" by "5-HTTLPR and rs2020933 genotypes": AT (n=4), LS (n=14), Homozygous (n=15).]

Figure 2. Mean allelic expression imbalance of different genotypes. The error bars are the 95% confidence interval for the mean. Numbers of heterozygotes for each group are shown in parentheses. The top panel shows the average allelic expression imbalance for different genotypes at the 5HTTLPR, including the A/G SNP within the long allele. The bottom panel shows the average allelic expression imbalance for heterozygotes for rs2020933 (AT), heterozygotes for the 5HTTLPR that are not heterozygous for rs2020933, and homozygotes for both rs2020933 and the 5HTTLPR.
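
As a worked illustration of the allele-frequency comparison described under "Association analysis with neuroticism", the sketch below applies a two-sided Fisher's exact test to the rs2020933 allele counts reported in Table 3. The study used R's 'fisher.test'; the function here is a hypothetical standard-library re-implementation written only for illustration, and it assumes the usual two-sided definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# rs2020933 allele counts from Table 3: (T, A) in High N and (T, A) in Low N.
p_value = fisher_exact_two_sided(818, 38, 553, 31)
print(f"rs2020933 allele counts, two-sided P = {p_value:.3f}")
```

Exact integer binomial coefficients are used throughout, so the only rounding occurs in the final division for each candidate table.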