Tracking the Origin of Metastatic Prostate Cancer
Online Supplementary Information

Table of Contents
Supplementary text
  Patient
  Tissue preparation and histological examination
  Laser capture microdissection
  DNA extraction
  Construction of sequencing libraries
  Processing of sequence data
  Identification of copy-number alterations, break-point regions, and similarity
Supplementary figures
Supplementary table
References

Supplementary Text

Patient
In this study we examined one case of metastatic prostate cancer in which the patient underwent radical prostatectomy including pelvic lymphadenectomy at the Karolinska University Hospital, Stockholm, Sweden, in 2011. The patient presented with elevated serum prostate-specific antigen (PSA) of 16 ng/ml, and preoperative biopsies revealed a total cancer extent of 6 mm with Gleason score 4+5=9. The preoperative clinical stage was T2.
At the first postoperative visit to the urology clinic, serum PSA was 0.5 ng/ml and the patient was considered to have generalized disease. Treatment with bicalutamide was initiated and serum PSA declined to immeasurable levels (<0.05 ng/ml). Since then the patient has been treated intermittently with bicalutamide and his serum PSA has varied accordingly. On his last visit in February 2014, his serum PSA was <0.05 ng/ml.

Tissue preparation and histological examination
The prostate, seminal vesicles, and lymph nodes were cut and totally embedded as previously described [1]. Microscopic examination of the prostate revealed a bilateral conventional adenocarcinoma with Gleason score 4+3=7 and tertiary Gleason pattern 5. Metastatic adenocarcinoma was observed bilaterally in the lymph nodes. The tumor in the prostate extended from the apex to the base and the maximum extent measured 47 mm × 30 mm. Extraprostatic extension, especially in the area around the base, and bilateral seminal vesicle invasion were observed. The main tumor originated from the peripheral zone. A small separate tumor focus was also found in the transition zone. The postoperative stage was pT3bN1.
The slides were reviewed and cancer tissue was outlined with Indian ink. For genetic analysis, 45 morphologically distinct foci were identified, including conventional adenocarcinoma (both from the peripheral zone and the transition zone), intraductal carcinoma (IDC), high-grade prostatic intraepithelial neoplasia, and atrophy. Areas with extraprostatic extension and seminal vesicle invasion were also included (Supplementary Table 1). The paraffin blocks were sectioned in 10-µm serial sections and mounted on UV-treated membrane slides. The slides were stained with hematoxylin and eosin and stored at 4°C before laser capture microdissection (LCM). Immunohistochemical staining for alpha-methylacyl-CoA racemase (AMACR) and p63 was performed according to standard protocols.

Laser capture microdissection
Cells dissected by LCM (Zeiss) were collected in adhesive capsules. The original slides were examined under a microscope during dissection to ensure that the correct areas were collected. In some cases the focus was examined, marked, and cut using the LCM microscope but, instead of being catapulted into the adhesive capsule, it was transferred to the capsule using a sharp knife. In this way larger areas of cells were collected, yielding larger amounts of DNA, in a shorter period of time. All tumor foci comprised 80–100% tumor cells. Six foci were excluded because of morphological variation after resectioning. Two samples of normal tissue were also collected, one containing epithelial cells and the other containing stromal cells. Photographs were taken before and after dissection. The dissected samples were kept in a box protected from light at room temperature (18–22°C) until they were prepared for DNA extraction. This study was approved by the Regional Ethics Committee, Stockholm (registration number 2010/710-31/2).

DNA extraction
Genomic DNA was extracted from laser-captured cells using a QIAamp DNA Micro Kit (Qiagen) according to the manufacturer's instructions. The samples were eluted in RNase-free water and stored at 18°C before further genome analysis. Sample quantities were measured using a Qubit 2.0 fluorometer. The amount of DNA extracted from the samples varied from 1.3 to 51.6 ng, depending on the size and quality of the dissected area. Six samples were lost because of manufacturing flaws (leakage from the adhesive capsules during extraction).

Construction of sequencing libraries
Sequencing libraries for the fresh-frozen primary tissue (SWE-54A), right/left lymph-node metastases (SWE-54B/SWE-54C), and blood were constructed using an in-house protocol [2].
A ThruPLEX kit (Rubicon Genomics) was chosen to prepare laser-captured tissues according to a standard protocol because of its superior performance with minute amounts of material. Library preparation failed for four samples (5_PZ_T1, 6_PZ_T1, 32_PZ_T4, 33_PZ_T4).

Processing of sequence data
Samples SWE-54A–C were processed previously [3]. Adapters were removed and overlapping reads from the ThruPLEX libraries were merged using SeqPrep software (v. 1.1) [4] without boosting base qualities. BWA (v. 0.6.2) [5] was used to map the reads to the human reference genome GRCh37. SAMtools (v. 0.1.18) [6] was used for various purposes such as BAM file sorting and sequence quality filtering. Quality metrics were generated via the Picard (v. 1.84) toolkit [7]. Realignment and recalibration were subsequently performed using the Genome Analysis Toolkit (v. 2.4.7) [8].

Identification of copy-number alterations, break-point regions, and similarity
To infer the somatic phylogenetic relationships between the analyzed tissues, we adopted an approach similar to that of Navin et al. [9], with modifications. This involved four basic steps.
1. Identification of a set of genomic regions with a minimum number of reads mapping within each region.
2. Calculation of log2 ratios relative to a germline DNA source (non-tumor tissue or blood) for each focus and GC normalization to remove sequencing library preparation biases.
3. Smoothing and segmentation for identification of break-point regions (BPRs) marking the start of an amplification or deletion event.
4. Use of BPRs to infer the somatic phylogenetic relationship between all profiled tissues.

Identification of a set of genomic regions with a minimum number of reads mapping within each region
The first region was defined by starting from the beginning of chromosome 1 and expanding until 75 reads were accounted for. This process was continued iteratively to the end of chromosome 1 and repeated for each chromosome separately.
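The region-definition procedure above can be sketched as follows. This is a minimal illustration in Python, not the pipeline used in this study. The function names, the choice to keep reads sharing a start position in the same region, and the dropping of a trailing incomplete region at the chromosome end are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# (start, end, read_count) with an exclusive end coordinate
Region = Tuple[int, int, int]


def variable_width_regions(read_starts: Iterable[int],
                           min_reads: int = 75) -> List[Region]:
    """Grow regions along one chromosome until each holds min_reads reads."""
    positions = sorted(read_starts)
    regions: List[Region] = []
    region_start = 0
    count = 0
    for i, pos in enumerate(positions):
        count += 1
        # Assumption: reads with an identical start stay in the same region,
        # so a region may hold slightly more than min_reads reads.
        last_at_pos = i + 1 == len(positions) or positions[i + 1] != pos
        if count >= min_reads and last_at_pos:
            regions.append((region_start, pos + 1, count))
            region_start = pos + 1
            count = 0
    # Assumption: leftover reads that do not fill a region are dropped.
    return regions


def regions_by_chromosome(reads: Dict[str, Iterable[int]],
                          min_reads: int = 75) -> Dict[str, List[Region]]:
    """Apply the procedure to each chromosome separately."""
    return {chrom: variable_width_regions(starts, min_reads)
            for chrom, starts in reads.items()}


# Hypothetical toy input: read start positions per chromosome.
toy_reads = {"chr1": range(0, 100, 10), "chr2": [5, 5, 5, 9]}
print(regions_by_chromosome(toy_reads, min_reads=4))
```

Because region boundaries follow read density in the reference sample, regions are narrow where coverage is high and wide where it is low, which keeps the expected count per region roughly constant.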
Reads mapping to large regions with poor mappability [10] and to regions reported to perform poorly [9] were removed, and these regions were subsequently removed from the final list. DNA from stromal tissue was used to prevent tumor-associated copy-number alterations (CNAs) from inflating/deflating the number of reads. The stromal DNA was subsampled to match the sample with the lowest number of uniquely mapping reads (median of all samples 5 879 978, range 4 779 694–10 542 513). To validate the robustness of the regions, only those containing 35 or more reads in the normal epithelial DNA were kept for CNA estimation [9]. This yielded ~45 000 regions with a median size of ~55 kb, a sensitivity limit below the median size of focal CNA events in cancer (deletions 700 kb, amplifications 900 kb) [11].

Calculating log2 ratios relative to a germline DNA source for each region and performing GC normalization to remove library preparation biases
The fresh-frozen primary tumor (SWE-54A) and lymph-node metastases (SWE-54B and SWE-54C) were compared to the blood sample, processed simultaneously using the same library preparation protocol [2]. The tissue foci were compared to the normal epithelial tissue, all prepared using the Rubicon ThruPLEX kit. Only uniquely mapping single-end reads were counted, as the presence of paired-end reads indicates low read quality that caused SeqPrep to fail when merging the short fragments obtained from formalin-fixed, paraffin-embedded tissues. GC normalization was carried out using the lowess function in R [12]. Visual inspection confirmed removal of GC biases for all libraries except focus 11_PZ_T1, for which obvious artifacts remained. This sample was excluded from further processing.

Smoothing and segmentation for BPR identification
The R package DNAcopy [13] was used to smooth and segment the log2 ratios, with the “prune” setting used to undo local trends in the data. The R package ggplot2 [14] was used to visualize the CNA data.
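To make the notion of a break point concrete, the following minimal Python sketch reads candidate break points off an already segmented profile for a single chromosome. It is not the DNAcopy-based R workflow used here. The function name `break_points`, the tolerance parameter, and the definition of a break point as the first region whose segment mean differs from that of the preceding region are assumptions made for illustration.

```python
from typing import List, Sequence


def break_points(segment_means: Sequence[float], tol: float = 1e-9) -> List[int]:
    """Return indices of regions at which the segmented log2 ratio changes.

    segment_means holds one value per genomic region along one chromosome;
    each region carries the mean of the segment it was assigned to.
    """
    return [
        i
        for i in range(1, len(segment_means))
        if abs(segment_means[i] - segment_means[i - 1]) > tol
    ]


# Hypothetical segmented profile: neutral, gain, neutral, loss.
profile = [0.0, 0.0, 0.58, 0.58, 0.0, -1.0, -1.0]
print(break_points(profile))  # [2, 4, 5]
```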
Obvious false positives, fluctuating around zero with ~10% change in copy number, were removed from downstream processing.

Use of BPRs to infer somatic phylogenetic relationships
To avoid false positives emanating from repetitive regions, any BPRs starting within 10 kb after the end of a filtered repetitive region (described above) were removed. BPRs starting at the first region of each chromosome were also removed, as they did not contain any discriminative information. Adjacent BPRs within 100 kb were labeled as the same event, which yielded a total of 385 BPRs. Nine foci contained fewer than five BPRs and were excluded, leaving 25 tissue foci for further analysis, and 28 samples in total with SWE-54A/B/C. Of the BPRs detected in the right and left lymph-node metastases, 80% were found in both (62 events). This set of BPRs was used to infer similarity to the two metastases. In addition, because prostate cancer progression commonly occurs in evolutionary leaps [15], we used events present in two or more areas to obtain a robust set of BPRs for construction of a phylogenetic tree. The Euclidean distances for the BPR matrix were fed into the neighbor-joining algorithm available in the R package APE [16] to construct the phylogenetic tree.

Supplementary Figures
Supplementary Fig. 1 – Whole-mount sections of radical prostatectomy specimen from patient SWE-54. Cancer tissue is outlined with black Indian ink. Areas harvested by laser-capture microdissection are marked with dashed lines and numbered. Histopathological features are listed in Supplementary Table 1. E = extraprostatic extension. HV, VV = seminal vesicles. II, III = lymph nodes.
[Figure not reproduced. Color scale: Proportion LN-Met break-point regions (0.0–0.8). Legend: Failed during preparation/QC; Contained <5 BPRs.]
Supplementary Fig. 2 – Schematic map of the whole-mount sections including somatic similarity to the metastases for each area investigated. The annotation for each region is available in Supplementary Table 1. Areas that failed sample preparation, library preparation, or sequencing quality control are colored light grey. Areas containing fewer than five break-point regions were excluded from the phylogenetic analysis and are colored dark grey.
Supplementary Fig. 3 – Copy-number alterations used for break point analysis for 8_T1_IDC (intraductal carcinoma), SWE-54C (lymph-node metastasis), 21_PZ_T1 (perineural invasion), SWE-54A (previously profiled fresh-frozen primary tissue), and area 20_PZ_T1 (highly related to SWE-54A).

Supplementary Table
Supplementary Table 1 – Tissue areas investigated using laser-capture microdissection
Areas     Histopathological feature
1–29      Main tumor
7–9       Intraductal carcinoma
30–34     Separate foci in peripheral and transitional zone
10, 18    Extraprostatic extension
27–29     Seminal vesicle invasion
35–39     High grade PIN
40–42     Atrophy
43–44     Benign prostatic hyperplasia
45        Basal cell hyperplasia (with spots of squamous cell metaplasia)
46        Normal tissue, epithelial cells
47        Normal tissue, stromal cells

References
[1] Jonmarker S, Glaessgen A, Culp WD, et al. Expression of PDX-1 in prostate cancer, prostatic intraepithelial neoplasia and benign prostatic tissue. APMIS 2008;116:491–8.
[2] Neiman M, Sundling S, Grönberg H, et al. Library preparation and multiplex capture for massive parallel sequencing applications made efficient and easy. PLoS One 2012;7:e48616.
[3] Lindberg J, Mills IG, Klevebring D, et al. The mitochondrial and autosomal mutation landscapes of prostate cancer. Eur Urol 2013;63:702–8.
[4] St John J, editor. SeqPrep. http://github.com/jstjohn/SeqPrep
[5] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–60.
[6] Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–9.
[7] Picard toolkit. Picard Web site. http://picard.sourceforge.net/
[8] DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491–8.
[9] Navin N, Kendall J, Troge J, et al. Tumour evolution inferred by single-cell sequencing. Nature 2011;472:90–4.
[10] Karolchik D. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004;32:D493–6.
[11] Zack TI, Schumacher SE, Carter SL, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet 2013;45:1134–40.
[12] R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
[13] Seshan VE, Olshen A. DNAcopy: DNA copy number data analysis. http://www.bioconductor.org/packages/release/bioc/html/DNAcopy.html
[14] Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer; 2009.
[15] Baca SC, Prandi D, Lawrence MS, et al. Punctuated evolution of prostate cancer genomes. Cell 2013;153:666–77.
[16] Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 2004;20:289–90.