Supplement to “High-Resolution Analysis of Copy Number Alterations and Associated Expression Changes in Ovarian Tumors”

Peter M. Haverty, Lawrence S. Hon, Joshua S. Kaminker, John Chant, Zemin Zhang

SUPPLEMENTARY METHODS

Breakpoint Analysis by Subtype

The number of chromosomal breakpoints in each sample was calculated as the number of GLAD segments minus one. Gains and losses were calculated as the number of GLAD segments with an inferred log ratio of > 0.3 and < -0.3, respectively.

Comparison of Tumor Gain Regions with Normal Copy Number Polymorphisms

Data on normal human copy number variations (Redon et al., 2006) were downloaded from http://projects.tcag.ca/variation/data/500K_EA_sample_level_CNV.gff for the Affymetrix 500K SNP array. For the tumor data, GLAD segments in individual samples with an inferred copy number of > 2.5 or < 1.62 were separately selected for comparison.

LOH Analysis

Genotypes were generated using the BRLMM method (apt-probeset-genotype version 1.8.5 from the Affymetrix Power Tools package). The HMM-based method implemented in Partek 6.3b was used to detect LOH in each sample with the default parameters.

Pathway Analysis of Gains and Losses

Genes in the GISTIC-identified peaks of gain and loss, with significant expression changes, were overlaid onto a global molecular network developed from information contained in the Ingenuity knowledge base (Ingenuity Systems, www.ingenuity.com). This database contained information for 231 out of 260 genes. Significant expression changes were taken to be two-fold over-expression or under-expression, in samples with a given CNA relative to normal samples, for regions of gain or loss, respectively. Networks of these genes were then algorithmically generated based on their connectivity.

Gene Locations

SNP Chip probe sets were mapped to the genome, NCBI assembly version 36, using annotation provided by the Affymetrix web site (http://www.affymetrix.com/products/arrays/specific/500k.affx).
Genes and Affymetrix expression probe sets were localized on the genome by aligning RefSeq sequences and probe set targets to the genome, NCBI Version 36, using GMAP (Wu and Watanabe, 2005). Cytoband and miRNA locations, for the NCBI Version 36 of the genome, were downloaded from the UCSC genome browser (http://www.genome.ucsc.edu).

REFERENCES

Benjamini Y, Hochberg Y. 1995. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. J. R. Statist. Soc. B 57:289-300.

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME. 2006. Global variation in copy number in the human genome. Nature 444:444-54.

Wu TD, Watanabe CK. 2005. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21:1859-75.

SUPPLEMENTARY FIGURE LEGENDS

Figure S1: Copy Number Alteration Trends by Subtype

a) The number of transitions from one copy number to another (breakpoints) per sample, stratified by ovarian cancer subtype. b) The sum of the inferred log ratio (ILR) in each segment > 0.3 (gain) in each sample, stratified by subtype. c) The sum of the ILR in each segment < -0.3 (loss) in each sample, stratified by subtype.

Figure S2: Comparison of CNA and CNV lengths

Distribution of the lengths of genome segments with abnormal copy number in a normal population (blue) (Redon et al., 2006) and in 52 ovary tumors (red) for a) gain and b) loss.

Figure S3: GISTIC and Heatmaps for CNAs on Chromosomes 3, 8, and 20

Heatmap and GISTIC for specific chromosomes.
Details of the amplicon structure and statistical significance are presented for (a, b) chromosome 3; (c, d) chromosome 8; (e, f) chromosome 20. Heatmaps and GISTIC amplification significance were prepared and plotted as in Fig. 2.

Figure S4: Chromosome 8 Amplifications and Associated Expression Change

a) GISTIC Q-values for chromosome 8, plotted as in Fig. 4. b) Close-up of the region of panel “a” indicated by vertical, red lines. The locations of all genes associated with a RefSeq transcript or Affymetrix probeset are indicated in red. c) Expression ratios for genes in the genome region depicted in panel “b” and represented on the HGU133 Plus 2.0 array. For each gene, the associated probeset with the highest 90th percentile value was selected. Red bars indicate the log2 ratio of the mean value in samples with copy gain of this gene and the mean value in samples without gain of this gene. Orange bars represent the log2 ratio of the mean value in samples with copy gain of this gene and the mean expression level in the normal samples. In order to show the PVT1 probeset 1558290_a_at, which is not present in the Affymetrix U133A or B chips, we used expression data from Affymetrix U133 Plus 2.0 chips for 41 of the 52 tumors and 51 normal samples.

Figure S5: Chromosome 8 Deletions and Associated Expression Changes

a) GISTIC Q-values for loss on chromosome 3, plotted as in Fig. 2. Vertical, dashed lines indicate areas of focus in panels “b” (red) and “c” (blue). b) Close-up of the region of panel “a” indicated by vertical, red lines. The locations of all genes associated with a RefSeq transcript or Affymetrix probeset are indicated in red. c) Close-up of the region of panel “a” indicated by vertical, blue lines. The locations of all genes associated with a RefSeq transcript or an Affymetrix probeset are indicated in red. d) Expression ratios for genes in the genome region depicted in panel “b” and represented on the HGU133A and B arrays.
For each gene, the associated probeset with the highest 90th percentile value was selected. Red bars indicate the log2 ratio of the mean value in samples with copy gain of this gene and the mean value in samples without gain of this gene. Orange bars represent the log2 ratio of the mean value in samples with copy gain of this gene and the mean expression level in the normal samples. e) Expression ratios for genes in the genome region depicted in panel “c”, plotted as in panel “d”.

Figure S6: Deletions and LOH

a) The fraction of samples with copy loss (ILR < -0.3) at each position. b) The fraction of samples determined to have LOH at each position. c) A heatmap and hierarchical clustering dendrogram depicting regions of LOH (red). Hierarchical clustering was performed with the Euclidean distance metric and complete linkage clustering.

Figure S7: Pathway Analysis of Gains and Losses

Pathway analysis inferred connectivity among genes in GISTIC-identified regions of gain and loss with significant expression changes. Network nodes have been colored to indicate the level and magnitude of their expression change in samples with a given CNA relative to normal ovary. Green represents decreased expression and red indicates increased expression. White network nodes represent genes in this set and white nodes represent genes related to these genes through the Ingenuity database.
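
Illustrative sketch of the per-sample summaries

The per-sample quantities described under “Breakpoint Analysis by Subtype” and plotted in Figure S1 (breakpoints, numbers of gained and lost segments, and summed ILR of gains and losses) can be sketched as below. This is a minimal illustration and not the authors' code. The Segment class, the summarize_sample function, and the field names are hypothetical, and the input is assumed to be one sample's GLAD segments, each carrying an inferred log ratio (ILR). Only the ±0.3 thresholds and the “segments minus one” rule come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Thresholds taken from the Supplementary Methods.
GAIN_ILR = 0.3
LOSS_ILR = -0.3


@dataclass
class Segment:
    """One GLAD segment of a single sample (hypothetical container)."""
    chrom: str
    start: int
    end: int
    ilr: float  # inferred log ratio of the segment


def summarize_sample(segments: List[Segment]) -> Dict[str, float]:
    """Summarize one sample's segments as described in the text."""
    gains = [s for s in segments if s.ilr > GAIN_ILR]
    losses = [s for s in segments if s.ilr < LOSS_ILR]
    return {
        # Breakpoints: number of segments minus one, as stated in the
        # methods. The max() guard for an empty sample is an assumption.
        "breakpoints": max(len(segments) - 1, 0),
        "n_gain": len(gains),
        "n_loss": len(losses),
        # Summed ILR of gained and lost segments (Figure S1, panels b, c).
        "sum_gain_ilr": sum(s.ilr for s in gains),
        "sum_loss_ilr": sum(s.ilr for s in losses),
    }


if __name__ == "__main__":
    # Made-up segments for illustration only; not data from the study.
    example = [
        Segment("8", 1, 1000, 0.0),
        Segment("8", 1001, 2000, 0.5),
        Segment("8", 2001, 3000, -0.4),
        Segment("8", 3001, 4000, 0.1),
    ]
    print(summarize_sample(example))
```

The sketch follows the stated rule literally and counts every transition between consecutive segments of a sample. Whether transitions across chromosome boundaries were excluded in the original analysis is not stated in the text.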