Supplementary material to: Methylome-wide comparison of human genomic DNA extracted from whole blood and from EBV transformed lymphocyte cell lines

Karolina Åberg (PhD)a*; Amit N. Khachane (PhD)a; Gábor Rudolf (PhD)a; Srilaxmi Nerella (MS)a; Douglas A. Fugman (PhD)b; Jay A. Tischfield (PhD)b; Edwin J.C.G. van den Oord (PhD)a

a Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, PO Box 980533, Richmond, VA 23298, USA; b Department of Genetics, Rutgers University, 145 Bevier Road, Piscataway, NJ 08854, USA.

*Correspondence to: Karolina Åberg, Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, 1112 East Clay Street, P.O. Box 980533, Richmond, VA 23298. Tel: +1 804-628 3023, fax: +1 804-628 3991, e-mail: kaaberg@vcu.edu

Methods

Lymphocyte cell line (LCL) establishment and DNA extraction

Lymphocyte cell lines (LCLs) were established by separating lymphocytes from whole blood by centrifugation on a Nycoprep density gradient (Axis-Shield, Oslo, Norway) and transforming them with Epstein-Barr virus (EBV) isolated from the B95-8 cell line (in-house preparation) according to the standard operating procedures in place at the Rutgers University Cell and DNA Repository (RUCDR). Briefly, the lymphocyte layer from the Nycoprep gradient was washed, re-suspended in culture medium containing RPMI-1640, 25% fetal calf serum (FCS), and 0.1% phytohemagglutinin (PHA), and incubated with the EBV at 37°C and 5% CO2 in a humidified incubator. Cultures were maintained by twice-weekly examination and medium supplementation with 15% FCS/RPMI-1640 as needed. After 5-6 weeks, when the transformed LCL cultures exhibited the desired density of healthy aggregates, they were transferred to larger flasks for expansion, DNA extraction, and cryopreservation. For all samples included in this investigation, DNA was extracted from the LCLs at the same passage at which they were cryopreserved.
DNA samples were also extracted directly from aliquots of whole blood (WB) from the same samples that were used to create the LCLs. DNA was extracted from LCL cultures or WB using AutoPURE LS automated DNA extractors (Qiagen) with standard PureGene extraction reagents [1,2]. This is an inorganic, salt-precipitation (i.e., phenol-free) method that eliminates the hazards of phenol exposure and alleviates environmental concerns. All buffers and reagents are standardized and meet Qiagen’s strict quality control procedures. The quality of the DNA from both WB and LCLs was verified by restriction enzyme digestion and agarose gel electrophoresis, by PCR, and by UV spectroscopy according to RUCDR standard operating procedures. RUCDR maintains secure, state-of-the-art facilities where each operation is computerized to minimize sample mislabeling and/or cross-contamination.

Probe correlation

Probes that have low variation between the two blood samples from the same individual but high variation between samples from different individuals will have a high probe correlation, indicating a variable methylation site. A low probe correlation, which arises when the variation between the two samples from the same individual is high, is likely to indicate a methodological issue, such as a failing probe or an empty probe (a probe located in a genomic region without any methylation sites). To identify the variable methylation sites, we used a previously developed procedure [3]. In short, the array signal $y_{ijk}$ for biosample i on probe j and replicate number k can be written as:

$$y_{ijk} = m_j + a_{ij} + e_{ijk} \qquad \text{(Equation 1)}$$

where $m_j$ is the average signal at probe j, $a_{ij}$ the biosample-specific deviation at probe j, and $e_{ijk}$ the measurement error for biosample i on probe j for replicate k. In this study, we obtained two replicates (k = 1, 2) and calculated, for a given probe j, the Pearson (product moment) correlation between the two replicates using the data from all biosamples. This correlation is labeled the “probe correlation”.
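The probe correlation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `probe_correlations`, the simulated sample size, and the variance values are all assumptions chosen for the example, and the data are generated directly from the model in Equation 1.

```python
import numpy as np

def probe_correlations(rep1, rep2):
    """Per-probe Pearson correlation between two replicate measurements.

    rep1, rep2: arrays of shape (n_biosamples, n_probes); row i holds the
    same biosample in both arrays, column j the same probe.
    """
    d1 = rep1 - rep1.mean(axis=0)   # center each probe across biosamples
    d2 = rep2 - rep2.mean(axis=0)
    cov = (d1 * d2).sum(axis=0)
    norm = np.sqrt((d1 ** 2).sum(axis=0) * (d2 ** 2).sum(axis=0))
    return cov / norm

# Hypothetical simulation of Equation 1; every value below is illustrative.
rng = np.random.default_rng(0)
n_biosamples = 2000
m = np.array([10.0, 10.0, 10.0])                    # m_j: average signal per probe
sd_a = np.array([3.0, 1.0, 0.0])                    # spread of a_ij for each probe
a = rng.normal(0.0, sd_a, size=(n_biosamples, 3))   # a_ij, shared by both replicates
rep1 = m + a + rng.normal(0.0, 1.0, size=a.shape)   # y_ij1 = m_j + a_ij + e_ij1
rep2 = m + a + rng.normal(0.0, 1.0, size=a.shape)   # y_ij2 = m_j + a_ij + e_ij2

r = probe_correlations(rep1, rep2)
print(np.round(r, 2))
```

In this simulated setting the first probe, whose biosample-specific variation is large relative to the error, gets the highest correlation, while the third probe, which has no biosample-specific variation at all, behaves like an empty probe and gets a correlation close to zero.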
It can be shown that the correlation for probe j across the two replicates equals:

$$\mathrm{COR}(y_{i1}, y_{i2})_j = \frac{\mathrm{VAR}(A)_j}{\mathrm{VAR}(A)_j + \mathrm{VAR}(E)_j} \qquad \text{(Equation 2)}$$

where $\mathrm{VAR}(A)_j$ and $\mathrm{VAR}(E)_j$ are the variances of the methylation signals and error, respectively. This follows from Equation 1 when the errors are uncorrelated with each other and with the biosample-specific deviations: the covariance between the two replicates is then $\mathrm{VAR}(A)_j$, and each replicate has variance $\mathrm{VAR}(A)_j + \mathrm{VAR}(E)_j$. This probe correlation is an index of the signal-to-error ratio, as it equals the biological variation in methylation signals across biosamples divided by the total variance.

Sample correlation

The sample correlation for a given biosample i equals the correlation between the two replicates calculated across the data from all probes. Under assumptions similar to those underlying Equation 2, the sample correlation for biosample i measured on two occasions equals:

$$\mathrm{COR}(y_{j1}, y_{j2})_i = \frac{\mathrm{VAR}(M)_i}{\mathrm{VAR}(M)_i + \mathrm{VAR}(E)_i} \qquad \text{(Equation 3)}$$

where $\mathrm{VAR}(M)_i$ is the variance in methylation signals across all probes for biosample i and $\mathrm{VAR}(E)_i$ is the variance in the measurement error across all probes for biosample i. If measurement error is large relative to the differences among probes in their methylation status, we would expect the sample correlations, in addition to the probe correlations, to be low.

Inter-correlation between adjacent probes

To investigate the methylation pattern, we combined highly inter-correlated adjacent probes into blocks. Differences in block structures indicate differences in the methylation pattern between WB DNA and LCL DNA. To create these blocks, we used a two-step algorithm. Starting with the first two probes at the p-telomere of each chromosome, we first calculated the inter-correlation between adjacent probes and kept adding probes to that “block” until the average inter-correlation dropped below a threshold of 0.5. The idea is that the methylation signal will span a larger chromosomal region but that altered methylation patterns may cause the inter-correlation to drop below our threshold, thereby producing multiple blocks.
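The first step of the block construction can be sketched as follows. This is a hedged illustration in Python and not the authors' R script: the function name `build_blocks` and the toy data are hypothetical, and the text does not specify exactly how the "average inter-correlation" is computed, so the sketch assumes it is the mean of all pairwise Pearson correlations among the probes in the candidate block.

```python
import numpy as np

def build_blocks(signal, threshold=0.5):
    """Greedy first-pass block construction for one chromosome.

    signal: array of shape (n_samples, n_probes), with probes ordered by
    position starting at the p-telomere. Returns a list of blocks, each a
    list of probe indices.
    ASSUMPTION: "average inter-correlation" is taken to be the mean of all
    pairwise correlations within the candidate block.
    """
    corr = np.corrcoef(signal, rowvar=False)    # probe-by-probe correlations
    blocks, current = [], [0]
    for j in range(1, signal.shape[1]):
        candidate = current + [j]
        sub = corr[np.ix_(candidate, candidate)]
        upper = np.triu_indices(len(candidate), k=1)   # each pair once
        # A NaN mean (e.g. a constant probe) fails this comparison and
        # therefore closes the current block.
        if sub[upper].mean() >= threshold:
            current = candidate                 # probe j joins the block
        else:
            blocks.append(current)              # close block, start a new one
            current = [j]
    blocks.append(current)
    return blocks

# Hypothetical toy data: probes 0-2 share one signal, probes 3-4 another.
rng = np.random.default_rng(1)
n = 2000
latent_a = rng.normal(size=(n, 1))
latent_b = rng.normal(size=(n, 1))
shared = np.hstack([np.repeat(latent_a, 3, axis=1),
                    np.repeat(latent_b, 2, axis=1)])
signal = shared + rng.normal(0.0, 0.5, size=(n, 5))

blocks = build_blocks(signal)
print(blocks)
```

Averaging only the correlations between directly neighboring probes would be an equally plausible reading of the description; it would change the `upper` index selection but not the overall greedy structure.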
As poor probes (i.e., probes with a large measurement error) will also “break up” methylation patterns, we used a second step. In this second step, we calculated the average inter-correlation between probes in adjacent blocks. If the adjacent blocks were no further apart than 500 bp and their average inter-correlation was higher than our threshold of 0.5, we combined them again into a single block. The R script for block construction and the blocks created in this investigation are available through the authors' web site: http://www.people.vcu.edu/~kaaberg/

Results

The distribution of Cohen’s D is shown in Figure S1. Similar to Figure 1, this figure shows that the majority of probes show small differences between the technical duplicates of WB, while much larger differences are observed for the comparisons of WB vs. LCL DNA.

Figure S1. The distribution of Cohen’s D for duplicates of WB DNA (top) and for each of the WB samples vs. LCLs (middle and bottom, respectively). Probes with complete data from all samples that showed inter-individual variation are included.

Acknowledgment

Control subjects were obtained from the National Institute of Mental Health Schizophrenia Genetics Initiative (NIMH-GI); data and biomaterials were collected by the "Molecular Genetics of Schizophrenia II" (MGS-2) collaboration. The investigators and coinvestigators are: ENH/Northwestern University, Evanston, IL, MH059571, Pablo V. Gejman, M.D. (Collaboration Coordinator; PI), Alan R. Sanders, M.D.; Emory University School of Medicine, Atlanta, GA, MH59587, Farooq Amin, M.D. (PI); Louisiana State University Health Sciences Center, New Orleans, Louisiana, MH067257, Nancy Buccola, APRN, B.C., M.S.N. (PI); University of California-Irvine, Irvine, CA, MH60870, William Byerley, M.D. (PI); Washington University, St. Louis, MO, U01, MH060879, C. Robert Cloninger, M.D. (PI); University of Iowa, Iowa, IA, MH59566, Raymond Crowe, M.D.
(PI), Donald Black, M.D.; University of Colorado, Denver, CO, MH059565, Robert Freedman, M.D. (PI); University of Pennsylvania, Philadelphia, PA, MH061675, Douglas Levinson, M.D. (PI); University of Queensland, Queensland, Australia, MH059588, Bryan Mowry, M.D. (PI); Mt. Sinai School of Medicine, New York, NY, MH59586, Jeremy Silverman, Ph.D. (PI). The samples were collected by Vishwajit Nimgaonkar's group at the University of Pittsburgh, as part of a multi-institutional collaborative research project with Jordan Smoller, M.D., D.Sc., and Pamela Sklar, M.D., Ph.D., Massachusetts General Hospital (grant MH 63420). Data and biomaterials used in Study 23 were collected by the University of Pittsburgh and funded by an NIMH grant (Genetic Susceptibility in Schizophrenia, MH 56242) to Vishwajit Nimgaonkar, M.D., Ph.D. Additional Principal Investigators on this grant include Smita Deshpande, M.D., Dr. Ram Manohar Lohia Hospital, New Delhi, India; and Michael Owen, M.D., Ph.D., University of Wales College of Medicine, Cardiff, UK. Most importantly, we thank the families who have participated in and contributed to these studies.

References

1. Sahota A, Brooks AI, Tischfield JA: Protocol 1: Preparing DNA from Cell Pellets; in: Weiner MP, Gabriel S, Stephens JC (eds): Genetic variation: a laboratory manual. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press, 2007, pp 107-109.
2. Sahota A, Brooks AI, Tischfield JA: Protocol 6: Preparing DNA from Blood: Large-Scale Extraction; in: Weiner MP, Gabriel S, Stephens JC (eds): Genetic variation: a laboratory manual. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press, 2007, pp 124-128.
3. Meng H, Joyce AR, Adkins DE et al: A statistical method for excluding nonvariable CpG sites in high-throughput DNA methylation profiling. BMC Bioinformatics; 11: 227.