Clinical and genomic analysis of a randomised phase II study evaluating anastrozole and fulvestrant in postmenopausal patients treated for large operable or locally advanced hormone-receptor-positive breast cancer

Quenel-Tueux et al

Supplementary Methods

DNA was purified from frozen samples by phenol/chloroform extraction (Phase Lock Gel, 5 PRIME GmbH, Hamburg, Germany) followed by a DNeasy column (Qiagen GmbH, Hilden, Germany). The order of extraction was randomized, except that pre- and post-treatment biopsies from the same tumor were processed at the same time. Libraries were prepared and sequenced on a GAIIx sequencer (Illumina, San Diego, CA) in multiple batches, as described previously (Wood et al, 2010). Reads passing quality control in Casava (Illumina) were aligned to human genome version hg19 using bwa version 0.5.9r16 in single-end mode with default parameters. The BAM files are available in the NCBI Sequence Read Archive under accession number SRP035504. Alignment quality was assessed with Qualimap v0.7.1 (Garcia-Alcalde et al, 2012). The BAM files were reformatted with the Perl script bam2windows.pl, available at www.precancer.leeds.ac.uk/software-and-datasets/cnanorm/, and processed with the CNAnorm package (Gusnanto et al, 2012) in the R statistical software (R Core Team, 2013). CNAnorm converts the number of reads in non-overlapping genomic windows into ratios relative to a pool of normal female DNA, normalizes the ratios for GC content and segments them with DNAcopy (Gusnanto et al, 2012; Olshen et al, 2004). Base quality, mapping quality, read depth and tumor cell content are summarized in Supplementary Table 6. Tumor cell content was estimated independently by a pathologist and by CNAnorm.
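CNAnorm's internal algorithm is not reproduced here. As a rough illustration of the ratio step described above, the Python sketch below turns per-window read counts for a tumour and a normal reference into library-size-scaled ratios, then applies a simple GC correction by dividing each window by the median ratio of windows in the same GC-content bin. The function name `gc_corrected_ratios`, the quantile binning and the `n_gc_bins` parameter are hypothetical, and this binned-median correction is a stand-in, not necessarily the correction CNAnorm applies. Segmentation with DNAcopy (circular binary segmentation) is not shown.

```python
import numpy as np


def gc_corrected_ratios(tumour_counts, normal_counts, gc, n_gc_bins=20):
    """Hypothetical sketch: tumour/normal ratio per window with a binned-median GC correction.

    tumour_counts, normal_counts: reads per non-overlapping window.
    gc: GC fraction of each window.
    """
    t = np.asarray(tumour_counts, dtype=float)
    n = np.asarray(normal_counts, dtype=float)
    gc = np.asarray(gc, dtype=float)

    # Windows with no reads in the normal pool give undefined ratios (left as NaN).
    keep = n > 0
    ratio = np.full(t.shape, np.nan)
    # Scale each library by its total read count so that sequencing depth cancels out.
    ratio[keep] = (t[keep] / t[keep].sum()) / (n[keep] / n[keep].sum())

    # Assign windows to GC bins of roughly equal size (quantile edges).
    edges = np.quantile(gc[keep], np.linspace(0.0, 1.0, n_gc_bins + 1))
    bins = np.clip(np.searchsorted(edges, gc, side="right") - 1, 0, n_gc_bins - 1)

    # Assumed correction: divide each window by the median ratio of its GC bin.
    corrected = ratio.copy()
    for b in range(n_gc_bins):
        idx = keep & (bins == b)
        if idx.any():
            med = np.median(ratio[idx])
            if med > 0:
                corrected[idx] = ratio[idx] / med
    return corrected
```

Dividing by the bin median forces every GC bin to a median ratio of 1, so the corrected values are relative rather than absolute; correcting for genome size and tumour cell content is a separate step in CNAnorm.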
To create the heatmap in Fig 3, the normalized segmented ratios from CNAnorm (Gusnanto et al, 2012) for genomic windows for which complete data were available were median centered and then hierarchically clustered with centroid linkage, using correlation as the distance metric, in Cluster 3.0 and Treeview 1.1.6r4 (de Hoon et al, 2004) (the cdt and atr files are provided as supplementary data). To identify significant differences before and after treatment, reads were counted in 200 kb windows and processed in CNAnorm with the same parameters for all samples. To correct for differences in normal tissue contamination between the two samples from each tumour, a linear model was used to project the segmented pretreatment biopsy values onto the measurement scale of the post-treatment surgical sample. The procedure is illustrated for tumour H11 in Sup Fig. 4; in this case there was no difference between the profiles before and after treatment. The software identifies the peaks corresponding to the major ploidy values (blue dots in Sup Fig. 4B). When the amount of normal tissue differs before and after treatment, the spacing of the peaks differs between the two plots, and the linear model rescales the data so that the spacing is the same in both. In the normalised data we do not know the amount of normal tissue in either sample, but after rescaling the separation of the modal ploidy peaks is the same, so the two profiles can legitimately be compared. We applied this approach first with all chromosomes included in the linear model and then with only chromosomes 1 and 16. The rationale for using chr1 and chr16 as reference chromosomes is that changes in these chromosomes are likely to be present in all subclones, because the der(1;16) translocation is normally the first oncogenic event in the transformation of ER+ tumour cells.
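Why a straight line is enough can be seen from a simplified model, which is an assumption made here for illustration and not taken from the source: a window with relative copy number c in a sample with tumour cell content p has expected ratio r = 1 - p + p·c. Two samples of the same tumour with tumour cell contents p_b (biopsy) and p_s (surgery) then satisfy r_s = a + b·r_b with slope b = p_s/p_b. The sketch below fits that line by ordinary least squares over windows from a chosen set of chromosomes (all of them, or only chr1 and chr16) and projects every biopsy window onto the surgical scale. The function `project_biopsy`, its arguments and the use of least squares on window-level segmented values are hypothetical; the source does not state exactly how the linear model was fitted.

```python
import numpy as np


def project_biopsy(biopsy_seg, surgical_seg, chrom, ref_chroms=None):
    """Hypothetical sketch: map biopsy segmented ratios onto the surgical scale.

    biopsy_seg, surgical_seg: segmented ratios for the same windows in both samples.
    chrom: chromosome label of each window.
    ref_chroms: chromosomes used to fit the model (None means all chromosomes).
    Returns (projected biopsy values, slope, intercept).
    """
    b = np.asarray(biopsy_seg, dtype=float)
    s = np.asarray(surgical_seg, dtype=float)
    chrom = np.asarray(chrom)

    # Fit only on windows with data in both samples and on the reference chromosomes.
    use = np.isfinite(b) & np.isfinite(s)
    if ref_chroms is not None:
        use &= np.isin(chrom, list(ref_chroms))

    # np.polyfit with deg=1 returns [slope, intercept].
    slope, intercept = np.polyfit(b[use], s[use], deg=1)

    # Project every biopsy window, including those not used in the fit.
    return intercept + slope * b, slope, intercept
```

Restricting the fit to chr1 and chr16 means that windows on other chromosomes do not influence the line, so a real change elsewhere (for example a subclonal gain) shows up as a residual instead of being partly absorbed into the slope.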
The values expected from applying the model to the biopsy data were subtracted from the observed values in the surgical samples to generate the raw differences in copy number. The difference score is the standard deviation of these differences. To prevent outliers from distorting the results, segments shorter than 20 Mb were removed before the difference scores were calculated. The median absolute deviation (MAD) of the normalized ratios minus the segmented copy numbers was used as a measure of noise in individual samples. The MADs for the pre-treatment biopsies were multiplied by the coefficient from the linear model to place them on the scale of the surgical sample, and each difference score was then divided by the larger of the two MADs (biopsy and surgical sample) to adjust for sample quality. To estimate the significance of the corrected scores, the tumors for which replicates were available were used to generate an empirical distribution of corrected scores for self-comparisons (n = 9). For each tumor, the difference between the corrected score for the before-versus-after-treatment comparison and the mean of the replicate self-comparison scores was divided by the standard deviation of the replicate scores to give a Z value. The Z value thus expresses the difference before and after treatment as a multiple of the standard deviation of the differences between replicates, under the null hypothesis that two biopsies from the same tumour should have the same profile. Results are shown for a linear model including all chromosomes and for one using chr1 and chr16 as "invariant" reference chromosomes (the most logical choice for luminal breast cancer); the same tumors were identified as different with both models. To facilitate visual interpretation of differences before and after treatment, the inferred modal ploidy in the copy number plots shown was adjusted to the same value in both samples (this scaling does not change the underlying data).
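The scoring steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions not stated in the source: the standard deviation uses `ddof=1`; the MAD is unscaled (R's `mad()` multiplies by 1.4826 by default, but a constant factor rescales every corrected score, including the self-comparison scores, so it leaves the Z values unchanged); and the short-segment filter is supplied as a hypothetical per-window array `seg_len_mb` giving the length of the segment each window falls in. The function names `mad`, `corrected_score` and `z_value` are illustrative.

```python
import numpy as np


def mad(x):
    """Unscaled median absolute deviation (R's mad() would multiply this by 1.4826)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def corrected_score(surg_seg, proj_biopsy, seg_len_mb, slope,
                    surg_ratio, biopsy_ratio, biopsy_seg, min_seg_mb=20.0):
    """Hypothetical sketch of the noise-corrected difference score for one tumour."""
    surg_seg = np.asarray(surg_seg, dtype=float)
    proj_biopsy = np.asarray(proj_biopsy, dtype=float)

    # Drop windows lying in short segments (< min_seg_mb) before scoring.
    keep = np.asarray(seg_len_mb, dtype=float) >= min_seg_mb
    diff = surg_seg[keep] - proj_biopsy[keep]
    # Difference score: SD of observed minus expected (ddof=1 is an assumption).
    score = np.std(diff, ddof=1)

    # Noise in each sample: MAD of normalized ratios minus segmented values.
    noise_surg = mad(np.asarray(surg_ratio, dtype=float) - surg_seg)
    # Biopsy noise is rescaled by the model slope onto the surgical scale.
    noise_biopsy = slope * mad(np.asarray(biopsy_ratio, dtype=float)
                               - np.asarray(biopsy_seg, dtype=float))
    return score / max(noise_surg, noise_biopsy)


def z_value(score, self_scores):
    """Z value of a before/after score relative to replicate self-comparison scores."""
    self_scores = np.asarray(self_scores, dtype=float)
    return (score - self_scores.mean()) / self_scores.std(ddof=1)
```

In use, `z_value` would be called with the corrected score for a tumour's before/after comparison and the corrected scores from the replicate self-comparisons.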
Supplementary clinical and surgical results

In the anastrozole arm, two patients refused surgery. One had stable disease and received chemotherapy and trastuzumab for a HER2-positive tumour. The other also had stable disease, so anastrozole treatment was continued for an additional three years; at the end of this period she developed progressive disease, again refused surgery, and was switched to fulvestrant. Of the six patients in the fulvestrant arm who did not undergo surgery, two had disease progression at 4 and 5 months, treated by chemotherapy and mastectomy, respectively; three had stable disease and continued to receive endocrine therapy (one anastrozole and two fulvestrant) for more than 6 months; and the remaining patient did not undergo surgical treatment because of rib invasion after 6 months of hormonal therapy (she was not scored as having progressed because she was T4a at inclusion).

References

de Hoon MJ, Imoto S, Nolan J, Miyano S (2004) Open source clustering software. Bioinformatics 20(9): 1453-4
Garcia-Alcalde F, Okonechnikov K, Carbonell J, Cruz LM, Gotz S, Tarazona S, Dopazo J, Meyer TF, Conesa A (2012) Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics 28(20): 2678-9
Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, Berri S (2012) Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 28(1): 40-7
Olshen AB, Venkatraman ES, Lucito R, Wigler M (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5(4): 557-72
R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Wood HM, Belvedere O, Conway C, Daly C, Chalkley R, Bickerdike M, McKinley C, Egan P, Ross L, Hayward B, Morgan J, Davidson L, MacLennan K, Ong TK, Papagiannopoulos K, Cook I, Adams DJ, Taylor GR, Rabbitts P (2010) Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens. Nucleic Acids Res 38(14): e151