Additional File 1

METHODS

Papillary thyroid carcinoma (PTC) multi-'omics data

Clinical and genomic profiles for 310 PTC were obtained from The Cancer Genome Atlas Project (TCGA). Data for genome-wide gene dosage alterations (Affymetrix SNP6.0 platform), methylation (Illumina Infinium HM450 BeadChips) and mutations (exon sequencing) were processed as described [1]. Multidimensional analysis was performed on level 3 data, using TCGA criteria for calling hypermethylation (difference in beta value Tumor vs Normal > 0.1), copy number loss and mutation, and by querying the cBio Cancer Genomics Portal [2]. For methylation analysis, an averaged beta value for normal adjacent tissue was used, because matched normal tissue was not available for all samples. Analyses were focused on the following CpG sites: CUL3: cg12698349 (chr2: 225,449,008) / cg09509863 (chr2: 225,450,859); KEAP1: cg10505024 (chr19: 10,602,877) / cg20226327 (chr19: 10,602,960) / cg22779878 (chr19: 10,600,446) / cg25801292 (chr19: 10,614,272); RBX1: cg07288693 (chr22: 41,348,222) / cg21454656 (chr22: 41,347,267). Similarly, normalized RNA sequencing data (Illumina HiSeq 2000) were obtained for 310 tumors and 40 adjacent non-malignant tissues and analyzed as previously described [3, 4].

Statistical analysis

Differences between tumor and non-malignant groups were evaluated in GraphPad software v6 using a Mann-Whitney test. Gene-set enrichment analysis (GSEA) was performed using whole-transcriptome normalized mRNA levels (n = 20,074 genes) from tumor (n = 310) and adjacent non-malignant tissue (n = 40) profiles. Here, GSEA was applied to test whether a transcriptional target gene set from the Molecular Signatures Database v4.0 (MSigDB, Broad Institute) is significantly enriched in tumors relative to non-malignant control tissues. Two GSEA were performed using default parameters.
One was run against a transcription factor target motif set composed of NFE2L2 transcriptional target genes (n = 255 genes), defined as those with a predicted NFE2L2 binding motif (NTGCTGAGTCAKN) in the promoter region [-2 kb, 2 kb] around the corresponding transcription start site (V$NRF2_Q4, MSigDB database v4.0, Broad Institute), and the other against all available transcriptional target gene sets in the MSigDB database.

References for Additional File 1:

1. The Cancer Genome Atlas Pilot Project (TCGA). The results published here are based upon data generated by The Cancer Genome Atlas Project established by the NCI and NHGRI. TCGA 2013, http://cancergenome.nih.gov/.

2. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, et al: The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery 2012, 2:401-404.

3. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, et al: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research 2010, 38:e178.

4. Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN: RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 2010, 26:493-500.
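The binding motif quoted in the Methods (NTGCTGAGTCAKN) uses IUPAC ambiguity codes, where N stands for any base and K for G or T. As a minimal sketch of how such a consensus string can be scanned against a promoter sequence, the Python example below expands the motif into a regular expression. The function names and the promoter sequence are hypothetical, only the forward strand is scanned, and this consensus scan is an illustration only; it is not presented as the procedure used to build the V$NRF2_Q4 gene set.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

NRF2_MOTIF = "NTGCTGAGTCAKN"  # consensus quoted in the Methods


def motif_to_regex(motif):
    """Expand an IUPAC consensus string into a regex source string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif=NRF2_MOTIF):
    """Return 0-based start positions of forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    lookahead = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Hypothetical promoter fragment (illustration only, not real genomic sequence).
promoter = "GGCATGCTGAGTCATCCA"
hits = find_motif(promoter)
print(hits)
```

A fuller scan would also check the reverse complement and restrict the search to the [-2 kb, 2 kb] window around each transcription start site.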
[Figure: histograms of beta values. Panels: A) KEAP1-cg25801292; B) CUL3-cg12698349; C) RBX1-cg07288693. x-axis: Beta-values (bin center); y-axis: Relative frequency (%); series: Normal, Tumor.]

Histogram of beta values for normal (green, N=45) and tumoral (red, N=310) tissues across the set of analyzed samples. The percentage of samples (y-axis) with beta values falling in a given bin interval (x-axis) is plotted. Examples of the CpG clusters considered for analysis are shown for each gene: A) KEAP1, B) CUL3, C) RBX1.
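The hypermethylation criterion stated in the Methods (tumor beta value exceeding the averaged normal beta value by more than 0.1) can be illustrated with a minimal Python sketch. The function name, sample identifiers, and beta values below are hypothetical and are not taken from the TCGA data; the sketch shows only the thresholding logic for a single CpG probe and is not the authors' actual pipeline.

```python
from statistics import mean

DELTA_BETA_THRESHOLD = 0.1  # tumor vs. normal difference stated in the Methods


def call_hypermethylation(tumor_betas, normal_betas, threshold=DELTA_BETA_THRESHOLD):
    """Flag tumors whose beta value exceeds the averaged normal beta by > threshold.

    tumor_betas:  dict mapping sample ID -> beta value at one CpG probe
    normal_betas: list of beta values at the same probe in normal tissues
    """
    # An averaged normal reference is used because matched normals
    # are not available for every tumor.
    normal_mean = mean(normal_betas)
    return {
        sample: (beta - normal_mean) > threshold
        for sample, beta in tumor_betas.items()
    }


# Hypothetical beta values for a single probe (illustration only).
normal = [0.05, 0.07, 0.06]
tumors = {"T1": 0.08, "T2": 0.30, "T3": 0.17}
calls = call_hypermethylation(tumors, normal)
print(calls)
```

With these made-up values the averaged normal beta is about 0.06, so only the tumors with a difference above 0.1 are flagged as hypermethylated.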