Manuscript ID: HEP-12-1863

Supplementary materials

Genomic landscape of copy number aberrations enables the identification of oncogenic drivers in hepatocellular carcinoma

Kai Wang1,*, Ho Yeong Lim2, Stephanie Shi3, Jeeyun Lee2, Shibing Deng1, Tao Xie1, Zhou Zhu1, Yuli Wang1, David Pocalyko1,†, Wei Jennifer Yang1, Paul A. Rejto1, Mao Mao1, Cheol-Keun Park4,* and Jiangchun Xu1,*,‡

1 Oncology Research Unit, Pfizer Inc., San Diego, CA 92121, USA
2 Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 135-710, Korea
3 External Research Solutions, Pfizer Inc., San Diego, CA 92121, USA
4 Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 135-710, Korea
† Current address: Janssen Research & Development, 3210 Merryfield Row, San Diego, CA 92121, USA
‡ Current address: Quanticel Pharmaceuticals, 9393 Towne Centre Dr., Suite 110, San Diego, CA 92121, USA

* Correspondence should be addressed to:
Kai Wang (Kai.Wang4@pfizer.com)
Cheol-Keun Park (ckpark@skku.edu)
Jiangchun Xu (jxucam@gmail.com)

Contents

Supplementary Methods
  Clinico-pathological features of primary HCC samples
  DNA and RNA extraction
  Gene expression analysis
  Copy number data processing pipeline
  RNAi Knockdown
  Assays used in quantitative real-time RT-PCR
  Immunohistochemical staining on tissue microarrays
Supplementary Tables
  Table S1. Major demographic and clinicopathological parameters of the HCC cohort.
  Table S2. Cell lines used in this study and their sources.
  Table S3. All CNA peaks predicted by GISTIC2 analysis.
  Table S4. Association of average copy number of the focal amplification and deletion peaks identified by GISTIC2 to clinical and outcome variables.
  Table S5. All pathways enriched among cis-acting genes in CNA peaks.
  Table S6. Candidate driver genes selected based on focal CNA, expression changes and model availability.
  Table S7. Somatic copy number, gene expression and IHC staining results for BCL9.
  Table S8. Somatic copy number, gene expression and IHC staining results for MTDH.
Supplementary Figures
  Figure S1. Association between somatic CNA, mRNA expression and clinical outcome.
  Figure S2. Pair-wise DNA/DNA correlations reveal significant associations between unlinked loci.
  Figure S3. Distributions of GISTIC2 peak statistics.
  Figure S4. Frequent somatic copy number alterations in critical signaling pathways in HCC.
  Figure S5. Immunostaining in HCCs for BCL9 and MTDH.
  Figure S6. Inferred copy numbers and expression levels of BCL9 and MTDH in a panel of 30 HCC cell line models.
  Figure S7. Association to clinical outcomes and AJCC tumor stages for the putative CNA drivers BCL9 and MTDH.
References

Supplementary Methods

Clinico-pathological features of primary HCC samples

We defined curative resection as complete resection of all tumor nodules with clear microscopic resection margins and no residual tumors as indicated by a computed tomography (CT) scan one month after surgery. None of the patients received preoperative chemotherapy. Clinical parameters, including age, gender, date of surgery, and tumor size, were obtained from pathology reports. We also examined histopathologic features of the HCCs, including histological differentiation, vascular invasion, intrahepatic metastasis, AJCC stage (1) and non-tumor liver pathology. HCCs were graded histologically according to the criteria of Edmondson and Steiner (2). Vascular invasion was considered present when a neoplastic cell group was surrounded by one or more endothelial cells or by the tunica media of the vessel. Intrahepatic metastasis was defined according to the criteria of the Liver Cancer Study Group of Japan (3).
Serum α-fetoprotein measurements and CT scans were performed at least once every 3 months after surgery until December 31, 2010. When tumor recurrence was suspected, precise diagnostic imaging was performed by magnetic resonance imaging. Disease-free survival (DFS) was defined as the time from surgery to the date of tumor recurrence. While HCC is the cause of death for most patients with the disease, some patients died of liver failure or other causes in the absence of progressive HCC. For disease-specific survival (DSS), we defined HCC-related mortality (disease-specific death) as follows: 1) tumor occupying more than 80% of the liver, 2) portal venous tumor thrombus (PVTT) proximal to the second bifurcation, 3) obstructive jaundice due to tumor, 4) distant metastases, or 5) variceal hemorrhage with PVTT proximal to the first bifurcation (4).

DNA and RNA extraction

Genomic DNA and total RNA were extracted from the sliced tissue specimens using the QIAamp DNA Mini Kit and the RNeasy Plus Mini Kit (Qiagen, Hilden, Germany), respectively. RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). For gene expression analysis, 276 tumor and 247 adjacent non-tumor liver samples with an RNA integrity number greater than 5.0 were included.

Gene expression analysis

Background-corrected bead-level data were exported from Illumina GenomeStudio for further analysis. Bead-level data were summarized to probe-level data using the median and then log2-transformed. The log-transformed probe intensity data were normalized using quantile normalization. A probe was called present (p≤0.05) or absent (p>0.05) based on its comparison with the negative controls. For genes with multiple probes, probe-level data were summarized into gene-level data using the median after removing non-performing probes.
Non-performing probes were defined only for genes with more than one probe and only for the purpose of summarizing probe-level data. A probe was considered non-performing when its average intensity was below the background threshold (i.e., the 95% quantile of the negative control probes) in both normal and tumor samples while another probe for the same gene had an intensity at least two-fold above the background threshold in either tumor or normal samples. In this case, we excluded the non-performing probes when summarizing the probe-level data into gene-level data.

Copy number data processing pipeline

Normalized single-channel intensity data were further corrected for potential dye-bias effects with the tQN method (5) and converted into raw copy number estimates in the form of Log R ratios (LRRs), which were then adjusted for genomic wave effects using the PennCNV package (6). For primary HCCs, we further normalized the LRR values in tumors by subtracting those of the matched non-tumor liver tissues, in an effort to eliminate germline copy number variations in each individual patient. Copy number segmentation was performed on the LRR data using the GLAD algorithm (7) with default parameters as provided in the R aroma package (8). To summarize the copy number of each gene in a sample, we took the mean LRR of the segment to which the gene belongs and converted it back to copy number space as $2^{\mathrm{LRR}+1}$. When a gene overlapped multiple segments, the LRR of the segment with the largest absolute value was taken.

RNAi Knockdown

siRNA transfection was performed using Lipofectamine 2000 (Life Technologies, Carlsbad, CA) according to the manufacturer's protocol with control (Block-iT Alexa Fluor Red, Life Technologies) or target siRNAs (BCL9: M-007268-01, MTDH: M-018531-00, Thermo Scientific Dharmacon, Lafayette, CO).
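
Returning to the copy number data processing pipeline above, the gene-level summary step can be sketched as follows. This is a minimal illustration rather than the code used in the study: the Segment structure, the function names and the example coordinates are hypothetical, and all segments are assumed to lie on the same chromosome as the gene.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One segmented region and its mean Log R ratio (hypothetical structure)."""
    start: int
    end: int
    mean_lrr: float


def lrr_to_copy_number(lrr: float) -> float:
    """Convert a Log R ratio back to copy number space: CN = 2^(LRR + 1)."""
    return 2.0 ** (lrr + 1.0)


def gene_copy_number(segments: List[Segment], gene_start: int, gene_end: int) -> Optional[float]:
    """Summarize the copy number of one gene from the segments it overlaps.

    If the gene overlaps several segments, the segment whose mean LRR has the
    largest absolute value is used, as described in the text.
    Returns None when no segment overlaps the gene.
    """
    overlapping = [s for s in segments if s.start <= gene_end and s.end >= gene_start]
    if not overlapping:
        return None
    chosen = max(overlapping, key=lambda s: abs(s.mean_lrr))
    return lrr_to_copy_number(chosen.mean_lrr)


# Illustrative input only: a copy-neutral segment followed by a gained segment.
segments = [Segment(1, 1000, 0.0), Segment(1001, 2000, 1.0)]
print(gene_copy_number(segments, 900, 1100))
```

With these illustrative inputs, the gene spanning positions 900 to 1100 overlaps both segments, so the segment with mean LRR 1.0 is chosen and the summarized copy number is 4. An LRR of 0 maps to the diploid copy number of 2.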
Assays used in quantitative real-time RT-PCR

Quantitative real-time RT-PCR was performed using the following TaqMan assays: BCL9 TaqMan Gene Expression Assay (Applied Biosystems, Hs00979216_m1), MTDH TaqMan Gene Expression Assay (Applied Biosystems, Hs00757841_m1) and Human GAPDH Endogenous Control (VIC/TAMRA Probe, Primer Limited, Applied Biosystems, cat#4310884E).

Immunohistochemical staining on tissue microarrays

All histologic sections were examined, and representative tumor areas free of necrosis or hemorrhage were pre-marked in formalin-fixed paraffin-embedded blocks. Two 2.0-mm-diameter tissue cores were taken from the donor blocks and transferred to the recipient paraffin block at defined array positions. Uninvolved normal liver tissues from 12 patients with metastatic colonic carcinoma of the liver were used as controls. Immunostaining was performed using rabbit polyclonal antibody to BCL9 (ab37305, 1:100; Abcam Inc., Cambridge, MA) and mouse monoclonal antibody to MTDH (NBP1-51585, 1:400; Novus Bio., Littleton, CO). Consecutive 4-µm tissue sections embedded in the slides were deparaffinized with xylene, hydrated in serial dilutions of alcohol, and immersed in peroxidase-blocking solution (Dako, Glostrup, Denmark) to quench endogenous peroxidase activity. The sections were then microwaved in 0.01 mol/L citrate buffer (pH 6.0) for 30 minutes. Incubation with the primary antibody was performed overnight at 4°C. 3,3'-Diaminobenzidine tetrahydrochloride was used as the chromogen, and Mayer's hematoxylin counterstain was applied. Negative controls (isotype-matched irrelevant antibody or preimmune mouse serum as primary antibody) were run simultaneously. Results of staining were evaluated without knowledge of the clinicopathologic features. Duplicate tissue cores for each tumor showed high levels of homogeneity for staining intensity and percentage of positive cells.
Where duplicate tissue cores differed, the higher score was taken as the final score.

Immunoreactivity for BCL9 was observed in the nucleus and/or cytoplasm of tumor cells, but only in the cytoplasm of hepatocytes in the 12 control normal livers. For assessment of the positivity of immunostaining for BCL9, only nuclear staining was regarded as positive. The staining intensity was first scored (0, negative; 1, weak; 2, moderate; 3, strong), followed by the percentage of positive cells (1, 1–5%; 2, 6–25%; 3, 26–50%; 4, 51–75%; 5, >75%). The final score of each tumor was obtained by multiplying the score for staining intensity by the score for percentage of positive cells. For categorical analyses, the immunoreactivity of tumor cells was graded as low (total score = 1), moderate (total score = 2), or high (total score ≥ 3).

Immunoreactivity for MTDH was observed only in the cytoplasm, both in tumor cells and in hepatocytes of the 12 control normal livers. In all control normal livers, MTDH was observed in fewer than 20% of hepatocytes. We defined MTDH as positive when ≥20% of tumor cells showed cytoplasmic immunoreactivity. The staining intensity was first scored (0, negative; 1, weak; 2, moderate; 3, strong), followed by the percentage of positive cells (1, 21–40%; 2, 41–60%; 3, 61–80%; 4, >80%). The final score of each tumor was obtained by multiplying the score for staining intensity by the score for percentage of positive cells. For categorical analyses, the immunoreactivity of tumor cells was graded as low (total score = 1–4), moderate (total score = 5–8), or high (total score = 9–12).

Supplementary Tables

Table S1. Major demographic and clinicopathological parameters of the HCC cohort.
See file “Wang_HCC_CNA_landscape_Table_S1.docx”.

Table S2. Cell lines used in this study and their sources.
See file “Wang_HCC_CNA_landscape_Table_S2.docx”.

Table S3. All CNA peaks predicted by GISTIC2 analysis.
See file “Wang_HCC_CNA_landscape_Table_S3.docx”. A full version of Table S3 can be found in file “Wang_HCC_CNA_landscape_Table_S3_full.docx”.

Table S4. Association of average copy number of the focal amplification and deletion peaks identified by GISTIC2 to clinical and outcome variables.
See file “Wang_HCC_CNA_landscape_Table_S4.docx”.

Table S5. All pathways enriched among cis-acting genes in CNA peaks.
See file “Wang_HCC_CNA_landscape_Table_S5.docx”. A full version of Table S5 can be found in file “Wang_HCC_CNA_landscape_Table_S5_full.docx”.

Table S6. Candidate driver genes selected based on focal CNA, expression changes and model availability.
See file “Wang_HCC_CNA_landscape_Table_S6.docx”.

Table S7. Somatic copy number, gene expression and IHC staining results for BCL9.
See file “Wang_HCC_CNA_landscape_Table_S7.docx”.

Table S8. Somatic copy number, gene expression and IHC staining results for MTDH.
See file “Wang_HCC_CNA_landscape_Table_S8.docx”.

Supplementary Figures

[Figure S1 image. Panel A: histogram of counts against cis-correlation, with series “CNA-mRNA correlation in cis” and “Permutation”. Panel B: No. genes (< p-value) against -log10(p-value), with series DSS, DFS, DSS perm and DFS perm.]

Figure S1. Association between somatic CNA, mRNA expression and clinical outcome. (A) Distribution of genome-wide cis-correlation between somatic CNAs and mRNA expression levels across the HCCs (red) and the distribution obtained from a permuted dataset in which sample labels were randomly scrambled (blue). (B) Cumulative distribution of Cox regression p-values for associating somatic CNAs with clinical outcomes, including both disease-specific survival (DSS, blue) and disease-free survival (DFS, red), in comparison to the same distributions calculated from a permuted dataset in which sample labels were randomly scrambled (“DSS perm” in green and “DFS perm” in black).
The X-axis of the plot shows the -log10 of the Cox regression p-value cutoffs, and the Y-axis shows the number of genes with a p-value smaller than the corresponding cutoff on the X-axis.

Figure S2. Pair-wise DNA/DNA correlations reveal significant associations between unlinked loci. Pair-wise Pearson correlations computed from the copy numbers of ~20k genes are ordered by the genes' chromosomal positions through the genome on the X and Y axes, with red indicating a positive correlation and blue indicating a negative correlation. The red diagonal represents the correlation of genes with themselves.

[Figure S3 image, panels A to E. Axes shown: peak size (KB), peak frequency, peak amplitude and correlation in cis; groups shown: amplifications, deletions and non-peak. Annotations in the image: “Two-sample t-test: p-value = 2.6e-007”, “p = 4e-032” and “p = 4.4e-005”.]

Figure S3. Distributions of GISTIC2 peak statistics. (A) Peak size distribution and comparison between amplification and deletion peaks. (B) Relationship between peak frequency and peak size for amplification peaks. (C) Relationship between peak frequency and peak size for deletion peaks. (D) Relationship between peak frequency and peak amplitude. Peak frequencies were calculated based on copy number cutoffs of 3 and 1.3 for amplification and deletion peaks, respectively. Peak amplitudes were taken as the average copy number of a peak among patients called positive for the peak. (E) Distribution of cis-correlations for genes not in any GISTIC2 peak, in deletion peaks or in amplification peaks. P-values shown are based on two-sample t-tests.

Figure S4.
Frequent somatic copy number alterations in critical signaling pathways in HCC.

Figure S5. Immunostaining in HCCs for BCL9 and MTDH. Immunostaining (HRP, original magnification x200) showing high levels of immunoreactivity for BCL9 in the nucleus (A) and MTDH in the cytoplasm (B).

[Figure S6 image, panels A and B. Axes shown: copy number and gene expression (log2).]

Figure S6. Inferred copy numbers and expression levels of BCL9 and MTDH in a panel of 30 HCC cell line models. (A) BCL9; (B) MTDH. Cell lines colored in green were used as amplified models for each candidate driver in the functional validation, and those in pink were used as controls (i.e., copy number neutral with respect to the target).

[Figure S7 image, panels A to F. Survival panels plot the probability of disease-specific survival or disease-free survival against time (month), comparing patients with inferred copy number <2.3 with the amplified group defined at the cutoff of 3. Group sizes for disease-specific survival: BCL9, n=106 versus n=24; MTDH, n=109 versus n=36. Group sizes for disease-free survival: BCL9, n=110 versus n=24; MTDH, n=114 versus n=37. The disease-specific survival panel titles report p=0.716 and p=0.12, and the disease-free survival panel titles report p=0.215 and p=0.033. The stage panels plot inferred copy number against AJCC stage (I to IV), with titles BCL9 (p=0.0008) and MTDH (p=0.0090).]

Figure S7. Association to clinical outcomes and AJCC tumor stages for the putative CNA drivers BCL9 and MTDH. Panels (A-C) show data for BCL9; panels (D-F) show data for MTDH. Patients were separated into two groups based on the amplification status of BCL9 and MTDH. Differences in disease-specific and disease-free survival were assessed by Kaplan-Meier curves and the associated log-rank test.
For association with AJCC tumor stage, a linear trend test was performed (p-values shown in parentheses).

References

1. Edge SB, Byrd DR, Compton CC, Fritz AG, Greene FL, Trotti A. AJCC Cancer Staging Manual. 7th ed. Chicago, IL: Springer, 2010.
2. Edmondson HA, Steiner PE. Primary carcinoma of the liver: a study of 100 cases among 48,900 necropsies. Cancer 1954;7:462-503.
3. The Liver Cancer Study Group of Japan. The general rules for the clinical and pathological study of primary liver cancer. 2nd ed. Tokyo, Japan: Kanehara & Co., 2003; 38.
4. Hoshida Y, Villanueva A, Kobayashi M, Peix J, Chiang DY, Camargo A, Gupta S, et al. Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med 2008;359:1995-2004.
5. Staaf J, Vallon-Christersson J, Lindgren D, Juliusson G, Rosenquist R, Hoglund M, Borg A, et al. Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios. BMC Bioinformatics 2008;9:409.
6. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 2007;17:1665-1674.
7. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004;20:3413-3422.
8. Bengtsson H, Simpson K, Bullard J, Hansen K. aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory. Berkeley: Department of Statistics, University of California, Berkeley; February 2008. Report No.: Tech Report #745.
9. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 2012;40:D109-114.