Supplementary Information

Supplementary methods

CNV analysis

Sample preparation

For CNV analysis we used a CNV early access array, which was designed on the basis of a database of known CNVs. Each sample DNA and the reference DNA (3.0 μg each) were labeled with Cy5 and Cy3, respectively, using the Agilent DNA labeling kit (Agilent Technologies, Inc., Santa Clara, CA, USA). Hybridization and washing followed the manufacturer's recommendations. The arrays were then scanned with an Agilent MicroArray Scanner G2505A, and the resulting TIFF image data were processed with Agilent Feature Extraction software (version 9.5.3.1) using the CGH-v4_95_Feb07 protocol.

CNV early access array data analysis

Extracted data were analyzed with Agilent DNA Analytics 4.0 software (version 4.0.85). The Aberration Detection Method 2 (ADM-2) algorithm (ref. S1) was used to identify contiguous genomic regions corresponding to chromosomal aberrations. The following parameters were used in this analysis: threshold of ADM-2: 6.0; centralization: on (threshold: 6.0, bin size: 10); fuzzy zero: off; aberration filters: on (minProbes = 2 AND minAvgAbsLogRatio = 0.5 AND maxAberrations = 10000 AND percentPenetrance = 0); feature level filters: on (gIsSaturated = true OR rIsSaturated = true OR gIsFeatNonUnifOL = true OR rIsFeatNonUnifOL = true). A change was called only when at least two contiguous probes exceeded the threshold. To identify obvious homozygous deletions, we searched for aberrant regions with a signal log ratio below −5.0. Genomic positions are based on the UCSC March 2006 human reference sequence (hg18; NCBI Build 36.1 reference sequence assembly). To find copy number differences between the twins, we first detected each twin's copy number changes relative to the reference. We then calculated the fold change between the twins for each probe in these regions and selected probes with a fold change greater than 1.2.
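The post-segmentation filters and the between-twin comparison described above can be sketched in Python. This is a minimal illustration, not Agilent's DNA Analytics implementation. It rests on several assumptions. Each aberrant segment is given as a list of per-probe log2 ratios; the log base used by the software is an assumption. The aberration-filter thresholds are treated as inclusive. A homozygous deletion is flagged when the segment's mean log ratio is below −5.0. Both twins were hybridized against the same reference, so the between-twin fold change is 2 raised to the absolute difference of their log2 ratios. All names (`Segment`, `passes_aberration_filter`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Thresholds quoted in the methods text.
MIN_PROBES = 2                 # aberration filter: minProbes = 2
MIN_AVG_ABS_LOG_RATIO = 0.5    # aberration filter: minAvgAbsLogRatio = 0.5
HOMOZYGOUS_DELETION_LR = -5.0  # signal log ratio cutoff for obvious homozygous deletions
TWIN_FC_THRESHOLD = 1.2        # between-twin fold change cutoff


@dataclass
class Segment:
    """Hypothetical aberrant segment: chromosome, hg18 coordinates, per-probe log ratios."""
    chrom: str
    start: int
    end: int
    log_ratios: List[float]


def passes_aberration_filter(seg: Segment) -> bool:
    """Keep segments with >= 2 probes and mean |log ratio| >= 0.5 (inclusive bounds are assumed)."""
    n = len(seg.log_ratios)
    if n < MIN_PROBES:
        return False
    mean_abs = sum(abs(x) for x in seg.log_ratios) / n
    return mean_abs >= MIN_AVG_ABS_LOG_RATIO


def is_homozygous_deletion(seg: Segment) -> bool:
    """Flag a segment whose mean log ratio is below -5.0 (using the mean is an assumption)."""
    if not seg.log_ratios:
        return False
    return sum(seg.log_ratios) / len(seg.log_ratios) < HOMOZYGOUS_DELETION_LR


def twin_fold_change(lr_twin_a: float, lr_twin_b: float) -> float:
    """Fold change between twins for one probe, assuming log2 ratios vs. the same reference.

    The shared reference cancels: log2(a/ref) - log2(b/ref) = log2(a/b).
    Taking the absolute value counts gains and losses alike.
    """
    return 2.0 ** abs(lr_twin_a - lr_twin_b)


def probes_differing_between_twins(lr_a: Sequence[float], lr_b: Sequence[float]) -> List[int]:
    """Indices of probes whose between-twin fold change exceeds 1.2."""
    return [i for i, (a, b) in enumerate(zip(lr_a, lr_b))
            if twin_fold_change(a, b) > TWIN_FC_THRESHOLD]
```

For example, a log2 difference of 0.3 between the twins corresponds to a fold change of about 1.23 and passes the 1.2 cutoff. A difference of 0.25 (about 1.19) does not pass.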
Real-time PCR analysis

To confirm the copy number differences between the twins detected by the arrays, we performed real-time PCR using SYBR Green dye (Applied Biosystems, Foster City, CA, USA) on an ABI PRISM 7900HT (Applied Biosystems). We selected two kinds of candidate regions for real-time PCR: regions containing several consecutive probes whose fold changes were in the same direction, and regions containing a probe with a large absolute fold change. In total, we selected 40 regions, and one immunoglobulin-related region was included as a positive control for the experiment. Apolipoprotein B (APOB) was used as a single control gene, and copy numbers in the candidate regions were calculated as ratios relative to APOB. For quality control, a gene on the X chromosome (X inactive-specific transcript, XIST) was also examined using an unrelated female sample. The primer sets used are listed in Supplementary Table 2.

Reference

S1. Lipson, D., Aumann, Y., Ben-Dor, A., Linial, N. & Yakhini, Z. Efficient calculation of interval scores for DNA copy number data analysis. J Comput Biol 2006; 13: 215–228.

Supplementary Figure Legends

Supplementary Figure 1. Copy number profiles from the CNV array in pair 1 (a) and pair 2 (b). Copy number profiles on the sex chromosomes reflect the use of a female reference sample. The copy number profiles within each twin pair were nearly identical.

Supplementary Figure 2. Chromosomal distribution of copy number differences in pair 1 (a) and pair 2 (b). Each bar shows the number of probes on each chromosome with a fold change greater than 1.2 (FC > 1.2). In both twin pairs, marked differences were restricted to chromosomes 2, 14 and 22, which contain immunoglobulin (Ig)-related regions.

Supplementary Figure 3. Results of real-time PCR analysis. (a) The CNV array detected a copy number difference between the pair 1 twins in the Ig-related region on chromosome 22.
Real-time PCR did not confirm any of the other copy number differences between the twins that were detected by the CNV arrays. (b) The detected CNV was confirmed by real-time PCR analysis (FC = 2.12). (c) Copy number differences of XIST on the X chromosome between the male twins and a control female. These results confirm the quality of the experiment (FC = 2.02 for the healthy co-twin and 1.92 for the bipolar twin).

Supplementary Figure 4. Flowchart of the tiling array data analysis and the detection of candidate regions for case-control analysis. Bold numbers represent the number of MRs selected at each step. The filtering process is described in detail in the supplementary materials and methods. In brief:

1) The data of each twin pair were directly compared. Regions containing six or more CpG sites that showed significant differences (p < 10^−4) between the BD twin and the healthy co-twin were selected. The selected MRs were named BD (bipolar twin)-dominant MRs and C (healthy co-twin)-dominant MRs.

2) The data of each twin were compared with those of a reference sample (i.e., unmethylated DNA). Among the BD-dominant MRs, we selected regions that showed a significantly methylated signal relative to the reference sample in the BD twin but not in the healthy co-twin. These were named BD-specific MRs. C-specific MRs were defined in the converse manner.

3) MRs overlapping CpG islands were selected.

4) Based on the results of bisulfite sequencing of representative regions, a more stringent threshold (p < 10^−6) was applied to the direct comparison within each twin pair.

5) Regions were excluded if their DNA methylation status changed after Epstein–Barr virus transformation in our previous study. That study used four sets of lymphoblastoid cell lines (LCLs) and peripheral blood cells.

6) From the candidate BD-specific MRs, regions detected as MRs in at least one of the four LCLs were excluded.
From the C-specific MRs, only regions overlapping MRs common to all four LCLs were selected.

Supplementary Figure 5. Results of bisulfite sequencing of the 13 candidate regions. Regions in which the differential DNA methylation was not confirmed (No) had p values greater than 10^−6. Regions that were partially confirmed (Partial) had p values of around 10^−6. Regions in which the methylation difference was completely confirmed (Complete) had p values less than 10^−6. Note that the partially confirmed and non-confirmed results were attributable to false-negative signals from the tiling arrays (see Supplementary Fig. 6). We did not observe false-positive results at the examined sites.

Supplementary Figure 6. Representative results of bisulfite sequencing. Differences in DNA methylation were confirmed in 11 of the 13 regions (5 completely and 6 partially) and were not confirmed in 2 regions (Supplementary Fig. 5). The panels show examples of a completely confirmed region (a), a partially confirmed region (b) and a region that was not confirmed (c). The tiling array results for a pair of monozygotic twins are shown at the top. The vertical axis represents signal intensity, and the horizontal axis represents the base position (NCBI36/hg18). The CpG island is shown as a gray box. A black box marks the region showing a statistically significant methylation difference between the bipolar twin and the healthy co-twin. A red bar marks the region examined by bisulfite sequencing. The bisulfite sequencing results are shown at the bottom. Black circles represent methylated CpGs, and white circles represent unmethylated CpGs. Each row shows the data of one clone.

Supplementary Figure 7. Perspective view of the methylation differences between the twins on chromosome 17. The vertical axis represents signal intensity, and the horizontal axis represents the base position on chromosome 17 (NCBI36/hg18).
The SLC6A4 promoter region is shown enlarged in the box. A red bar marks the region showing a statistically significant methylation difference between the bipolar twin and the healthy co-twin.

Supplementary Figure 8. DNA methylation difference between the twins in FANK1. a) Results of comprehensive DNA methylation analysis of LCLs from a pair of monozygotic twins discordant for BD, using tiling arrays. The vertical axis represents signal intensity, and the horizontal axis represents the base position on chromosome 10 (NCBI36/hg18). The exon–intron structure of FANK1 is shown below the tiling array data. The CpG island is shown as a gray box. A black box marks the region showing a statistically significant methylation difference between the bipolar twin and the healthy co-twin. A red bar marks the region examined by bisulfite sequencing. b) Results of bisulfite sequencing. The genomic region in which the tiling arrays detected a statistically significant methylation difference between the twins (bases 127573680 to 127574304) is shown at the top. The five CpG sites are shown as underlined red letters. Black and white circles represent methylated and unmethylated CpGs, respectively. Each row shows the data of one clone, and the five circles in each row represent the five CpG sites shown above. This region is equally methylated in both twins.

Supplementary Figure 9. DNA methylation difference between the twins in KIAA1530. a) Results of comprehensive DNA methylation analysis of LCLs from a pair of monozygotic twins discordant for BD, using tiling arrays. The vertical axis represents signal intensity, and the horizontal axis represents the base position on chromosome 4 (NCBI36/hg18). The exon–intron structure of KIAA1530 is shown below the tiling array data. The CpG island is shown as a gray box.
A black box marks the region showing a statistically significant methylation difference between the bipolar twin and the healthy co-twin. A red bar marks the region examined by bisulfite sequencing. b) Results of bisulfite sequencing. The genomic region in which the tiling arrays detected a statistically significant methylation difference between the twins (bases 1353373 to 1354215) is shown at the top. The 22 CpG sites are shown as underlined red letters. Black and white circles represent methylated and unmethylated CpGs, respectively. Each row shows the data of one clone, and the circles in each row represent the CpG sites shown above. This region is hypermethylated in both twins.
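The Real-time PCR analysis section calculates copy numbers relative to APOB, which can be sketched with the comparative Ct (2^−ΔΔCt) method. The supplementary text does not state the exact formula, so this is an assumption. The sketch also assumes roughly 100% amplification efficiency for every primer set, so that each cycle doubles the product. Under this model, a female sample carrying two X chromosomes should give about twice the XIST signal of a male. That is the size of the XIST control ratio reported in Supplementary Figure 3c. The function names and Ct values below are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(ct_target: Sequence[float], ct_apob: Sequence[float]) -> float:
    """Mean Ct of the target region minus mean Ct of APOB (replicate wells averaged)."""
    return mean(ct_target) - mean(ct_apob)


def relative_copy_number(ct_target: Sequence[float], ct_apob: Sequence[float]) -> float:
    """Target copy number relative to APOB, assuming one doubling per cycle.

    A lower Ct means more template, hence the negative exponent.
    """
    return 2.0 ** (-delta_ct(ct_target, ct_apob))


def fold_change(ct_target_1: Sequence[float], ct_apob_1: Sequence[float],
                ct_target_2: Sequence[float], ct_apob_2: Sequence[float]) -> float:
    """Copy number of sample 1 relative to sample 2 (2^-ddCt), e.g. one twin vs. the other."""
    ddct = delta_ct(ct_target_1, ct_apob_1) - delta_ct(ct_target_2, ct_apob_2)
    return 2.0 ** (-ddct)
```

For instance, suppose a female shows XIST one cycle after APOB and a male shows it two cycles after APOB. The female-to-male fold change is then 2^1 = 2.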
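The six filtering steps in the Supplementary Figure 4 flowchart can be expressed as a chain of predicates over candidate regions. This is a hypothetical sketch, not the authors' pipeline. It assumes each candidate MR has already been summarized into the fields shown; the original data structures are not described. It applies only the stringent step 4 threshold, because any region with p < 10^−6 also passes the step 1 cutoff of p < 10^−4. Because every step is a conjunctive filter, the order of the steps does not change the final set. The order only affects the intermediate counts shown in bold in the flowchart.

```python
from dataclasses import dataclass
from typing import List

N_LCLS = 4          # reference LCLs used in steps 5 and 6
MIN_CPG = 6         # step 1: regions must contain six or more CpG sites
P_STRINGENT = 1e-6  # step 4 threshold; it supersedes the step 1 threshold of 1e-4


@dataclass
class CandidateMR:
    """Hypothetical per-region summary; all field names are illustrative."""
    n_cpg: int            # CpG sites in the region
    p_twin: float         # p value of the direct BD twin vs. healthy co-twin comparison
    bd_dominant: bool     # assumed: True if the BD twin has the higher signal
    bd_vs_ref: bool       # BD twin significantly methylated vs. unmethylated reference
    c_vs_ref: bool        # co-twin significantly methylated vs. unmethylated reference
    in_cpg_island: bool   # overlaps a CpG island
    ebv_altered: bool     # methylation changed by EBV transformation (previous study)
    n_lcl_mr: int         # number of the N_LCLS reference LCLs in which the region is an MR


def _common(m: CandidateMR) -> bool:
    """Steps 1, 3, 4 and 5, shared by BD-specific and C-specific MRs."""
    return (m.n_cpg >= MIN_CPG and m.p_twin < P_STRINGENT
            and m.in_cpg_island and not m.ebv_altered)


def bd_specific(mrs: List[CandidateMR]) -> List[CandidateMR]:
    """Step 2 for the BD twin, then step 6: drop regions that are MRs in any reference LCL."""
    return [m for m in mrs if _common(m) and m.bd_dominant
            and m.bd_vs_ref and not m.c_vs_ref and m.n_lcl_mr == 0]


def c_specific(mrs: List[CandidateMR]) -> List[CandidateMR]:
    """Step 2 for the co-twin, then step 6: keep only regions that are MRs in all reference LCLs."""
    return [m for m in mrs if _common(m) and not m.bd_dominant
            and m.c_vs_ref and not m.bd_vs_ref and m.n_lcl_mr == N_LCLS]
```

The asymmetry in step 6 is explicit here. A BD-specific candidate must be unmethylated in every reference LCL, whereas a C-specific candidate must be methylated in all of them.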