Dynamic DNA methylation across diverse human cell lines and tissues

Katherine E. Varley, Jason Gertz, Kevin M. Bowling, Stephanie L. Parker, Timothy E. Reddy, Florencia Pauli, Marie K. Cross, Brian A. Williams, John A. Stamatoyannopoulos, Gregory E. Crawford, Devin M. Absher, Barbara J. Wold, Richard M. Myers*

* Corresponding author: rmyers@hudsonalpha.org

Supplemental Figures and Legends:

Supplemental Figure 1. Reduced Representation Bisulfite Sequencing experimental method. Genomic DNA is digested with MspI. The restriction fragment overhangs are filled-in and tailed with 3’A overhangs. Methylated adapters containing sequences compatible with Illumina GAIIx sequencing are ligated to the restriction fragments. Genomic restriction fragments between 40-120 bp are extracted from agarose gels, treated with sodium bisulfite, and PCR amplified.

Supplemental Figure 2. Computational methods for aligning sequencing reads and determining percent methylated (PM) values for each cytosine.

Supplemental Figure 3. Method Performance. A) Optimum coverage for reproducible measurements of DNA methylation was determined by comparing two growth replicates of the K562 cell line. For each minimum read depth (coverage), we calculated the difference in percent methylated values between replicates, and determined the average across all assayed CpGs. We found that requiring a minimum of 10x coverage reduces the average difference between replicates to 3.96 percent methylated. B) Percent methylated values measured in the cell line K562 by RRBS and Illumina Methyl450K array are highly correlated (r=0.954). C) The distribution of percent methylated values in fully methylated and completely unmethylated (whole genome amplified) control samples indicates that the assay can produce accurate measurements of DNA methylation for the two extremes of possible genomic methylation.
The highly similar genomic distributions of DNA methylation across 82 human cell lines and tissues, and their replicates, demonstrate that RRBS is reproducible and that, while individual CpGs may have variable methylation across samples, the fraction of CpGs in the genome that are methylated remains fairly constant (mean R² = 0.96).

Supplemental Figure 4. Detailed tree containing sample identifiers from unsupervised hierarchical clustering of the top 5% of CpGs with the most varying methylation across all samples. The sample identifier key is in Supplemental Table 1.

Supplemental Figure 5. The clustergram from unsupervised hierarchical clustering of the top 5% of CpGs with the most varying methylation across 82 samples is depicted in Figure 1. Here the same CpGs are divided based on their relationship to gene annotations. We find that, regardless of whether the CpGs are located in regulatory regions near a gene’s transcription start site (TSS) or in the body of a gene, they recapitulate patterns of methylation associated with four major clades: cancer cell lines, tissues, primary cell lines, and blood leukocytes. A) The subset of the top 5% of CpGs with the most varying methylation that are found within 2,000 bp of the TSS. B) The subset of the top 5% of CpGs with the most varying methylation that are found in the body of genes more than 2,000 bp from the TSS.

Supplemental Figure 6. Replicate growths of three cell lines obtained from different labs across the United States exhibit highly correlated percent methylated values within a cell line, regardless of whether the replicates were grown in the same lab or in different labs. Correlation coefficients are reported for each pairwise comparison.

Supplemental Figure 7. CpGs with allele-specific methylation in primary cell lines and tissues exhibit aberrant methylation in cancer cell lines.
A) The CpGs that exhibit allele-specific methylation in primary cell lines and tissues have a skewed distribution of percent methylated values in cancer cell lines. B) A regulatory region of the imprinted gene GNAS exhibits allelic and partial methylation in primary cell lines and normal tissues (yellow bars are 50% methylated CpGs), and is hyper-methylated across diverse cancer cell lines (red bars are 100% methylated CpGs). C) The intron of GPR123 exhibits allelic methylation and partial methylation across primary cell lines and normal tissues (yellow bars are 50% methylated CpGs), and is hypo-methylated across diverse cancer cell lines (green bars are 0% methylated CpGs).

Supplemental Figure 8. Detailed tree containing sample identifiers from unsupervised hierarchical clustering of top 5% of CpGs with the most varying methylation across noncancer samples. Sample identifier key is Supplemental Table 1.

Supplemental Figure 9. CpGs that reside in CGIs in gene bodies far from the transcription start site (TSS) can be either positively or negatively correlated with expression. Those that are bound by p300 have methylation values that are negatively correlated with gene expression, exhibiting a methylation pattern similar to gene promoters. This indicates that p300-bound CGI intragenic enhancers are unmethylated in expressed genes, unlike most other gene body CpGs.
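
Illustrative sketch for Supplemental Figure 3A. The coverage analysis described in that legend (for each minimum read depth, compute the difference in percent methylated values between two replicates and average it over all assayed CpGs) might be expressed roughly as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the input layout (a dictionary from CpG identifier to methylated and total read counts), the function names, and the use of the absolute difference are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input layout: CpG identifier -> (methylated reads, total reads).
Counts = Dict[str, Tuple[int, int]]


def percent_methylated(methylated: int, total: int) -> float:
    """Percent methylated (PM) for one CpG, computed from read counts."""
    if total <= 0:
        raise ValueError("total read count must be positive")
    return 100.0 * methylated / total


def mean_replicate_difference(
    rep1: Counts, rep2: Counts, min_coverage: int
) -> Optional[float]:
    """Average PM difference between two replicates at a coverage threshold.

    Only CpGs assayed in both replicates with at least `min_coverage` reads
    in each are used. The absolute difference is an assumption; the legend
    only says "difference". Returns None if no CpG passes the filter.
    """
    diffs = []
    for cpg, (meth1, total1) in rep1.items():
        if cpg not in rep2:
            continue  # CpG not assayed in the second replicate
        meth2, total2 = rep2[cpg]
        if total1 < min_coverage or total2 < min_coverage:
            continue  # below the minimum read depth in at least one replicate
        pm1 = percent_methylated(meth1, total1)
        pm2 = percent_methylated(meth2, total2)
        diffs.append(abs(pm1 - pm2))
    if not diffs:
        return None
    return sum(diffs) / len(diffs)


def coverage_sweep(rep1: Counts, rep2: Counts, thresholds) -> Dict[int, Optional[float]]:
    """Mean replicate difference for each minimum coverage threshold."""
    return {t: mean_replicate_difference(rep1, rep2, t) for t in thresholds}
```

Calling `coverage_sweep(rep1, rep2, range(1, 31))` on two replicate count tables would give one average difference per threshold, which is the kind of curve from which a cutoff such as the 10x minimum reported above could be read off.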