Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes

The MIT Faculty has made this article openly available.

Citation: Taube, J. H., J. I. Herschkowitz, K. Komurov, A. Y. Zhou, S. Gupta, J. Yang, K. Hartwell, et al. “Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes.” Proceedings of the National Academy of Sciences 107, no. 35 (August 31, 2010): 15449-15454.
As Published: http://dx.doi.org/10.1073/pnas.1004900107
Publisher: National Academy of Sciences (U.S.)
Version: Final published version
Accessed: Thu May 26 22:53:34 EDT 2016
Citable Link: http://hdl.handle.net/1721.1/84510
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Corrections

CELL BIOLOGY
Correction for “Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes,” by Joseph H. Taube, Jason I. Herschkowitz, Kakajan Komurov, Alicia Y. Zhou, Supriya Gupta, Jing Yang, Kimberly Hartwell, Tamer T. Onder, Piyush B. Gupta, Kurt W. Evans, Brett G. Hollier, Prahlad T. Ram, Eric S. Lander, Jeffrey M. Rosen, Robert A. Weinberg, and Sendurai A. Mani, which appeared in issue 35, August 31, 2010 of Proc Natl Acad Sci USA (107:15449–15454; first published August 16, 2010; 10.1073/pnas.1004900107).
The authors note that on page 15453, right column, fifth full paragraph, sentence 2, “Microarray data for HMLE Gsc, Snail, Twist, TGF-β1, and vector control has been deposited in GEO under accession number GSE9691” should instead appear as “Microarray data for HMLE Gsc, Snail, Twist, TGF-β1, and vector control has been deposited in GEO under accession number GSE24202.”

MEDICAL SCIENCES
Correction for “Detection of MLV-related virus gene sequences in blood of patients with chronic fatigue syndrome and healthy blood donors,” by Shyh-Ching Lo, Natalia Pripuzova, Bingjie Li, Anthony L. Komaroff, Guo-Chiuan Hung, Richard Wang, and Harvey J. Alter, which appeared in issue 36, September 7, 2010, of Proc Natl Acad Sci USA (107:15874–15879; first published August 23, 2010; 10.1073/pnas.1006901107). The authors note that the GenBank Accession Numbers for the gag gene are HM630557–HM630562, and GenBank Accession Numbers for the env gene are HQ157342–HQ157343.

www.pnas.org/cgi/doi/10.1073/pnas.1015107107
www.pnas.org/cgi/doi/10.1073/pnas.1015095107

19132 | PNAS | November 2, 2010 | vol. 107 | no. 44 www.pnas.org

Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes

Joseph H. Taube(a,1), Jason I. Herschkowitz(b,1), Kakajan Komurov(c,1), Alicia Y. Zhou(d,e), Supriya Gupta(f), Jing Yang(g), Kimberly Hartwell(h), Tamer T. Onder(d,e), Piyush B. Gupta(e,f), Kurt W. Evans(a), Brett G. Hollier(a), Prahlad T. Ram(c), Eric S. Lander(d,e,f,i), Jeffrey M. Rosen(b), Robert A. Weinberg(d,e,j,2), and Sendurai A. Mani(a,2)

Departments of (a) Molecular Pathology and (c) Systems Biology, University of Texas M.D.
Anderson Cancer Center, Houston, TX 77054; (b) Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030; (d) Whitehead Institute for Biomedical Research, Cambridge, MA 02142; (e) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142; (g) Department of Pharmacology, University of California, La Jolla, CA 92093-0636; (h) Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115; (f) Broad Institute, Cambridge, MA 02142; (i) Department of Systems Biology, Harvard Medical School, Boston, MA 02115; and (j) Massachusetts Institute of Technology, Ludwig Center for Molecular Oncology, Cambridge, MA 02139

Contributed by Robert A. Weinberg, June 24, 2010 (sent for review January 26, 2010)

cancer stem cells | Twist | Snail | FOXC1

The epithelial-to-mesenchymal transition (EMT) is a process in which adherent epithelial cells shed their epithelial characteristics and acquire, in their stead, mesenchymal properties, including fibroblastoid morphology, characteristic gene-expression changes, increased potential for motility, and in the case of cancer cells, increased invasion, metastasis, and resistance to chemotherapy (1, 2). Recent studies have linked EMTs with both metastatic progression of cancer (3–5) and acquisition of stem-cell characteristics (6, 7), leading to the hypothesis that cancer cells that undergo an EMT are capable of metastasizing through their acquired invasiveness and, following dissemination, through their acquired self-renewal potential, which enables them to spawn the large cell populations that constitute macroscopic metastases.

EMTs can be induced in vitro by exposing certain normal and neoplastic epithelial cells to various growth factors, including TGF-β1, hepatocyte growth factor, and PDGF (1, 8). Downstream of each of these growth factors and their cognate receptors lies an array of transcription factors (TFs), each of which is capable, on its own, of inducing an EMT.
These TFs include the homeobox protein Goosecoid (Gsc) (9), the zinc-finger proteins Snai1 (Snail) and Snai2 (Slug) (10–12), the basic helix-loop-helix protein Twist1 (Twist) (3), the forkhead box proteins FOXC1 (8, 13) and FOXC2 (14), and the zinc-finger, E-box-binding proteins Zeb1 and Sip1 (Zeb2) (8, 15). In addition to TFs, members of the miR-200 family of microRNAs are down-regulated during an EMT (16). This down-regulation results, in turn, in the up-regulated expression of several critical target genes, notably Zeb1 and Zeb2. Expression of any one of these TFs or down-regulation of the miR-200 family in an appropriate epithelial cell suffices to induce an EMT (17, 18). Moreover, many of these TFs are expressed concomitantly in the mesenchymal cells that have passed through an EMT. The overlapping and unique contributions of each inducer to the EMT program have not been adequately explored.

Recent microarray analyses have allowed stratification of clinical breast cancers into a large number of distinct subtypes, such as luminal, basal-like, and HER2+ (19–21). Yet other subtypes of tumors have recently been delineated by us and others (22, 23). These distinctions have proven to be useful in predicting responses to therapy, time to metastasis, and survival.

In the present study, we assayed gene expression signatures (GESs) in human mammary epithelial cells (HMLE) induced to undergo an EMT by expressing Gsc, Snail, Twist, or TGF-β1 or by knocking down expression of E-cadherin (24), and found that EMTs induced by these methods induce an overlapping set of changes in gene expression, which we term the “EMT core signature.” We compared the EMT core signature with signatures that define breast cancer subtypes and found a close association with the claudin-low and metaplastic breast cancer subtypes.

Results

Interaction of EMT-Related TFs as a Regulatory Network.
To elucidate the network of interactions between EMT regulatory factors, we first assessed the expression of known EMT inducers and the genes known to be regulated during EMT in various breast cancer cell lines. To do so, we assayed four breast cancer cell lines for expression of the following TFs known to promote EMTs: Gsc (9), FOXC1 (8, 13), FOXC2 (14), Zeb1, Zeb2 (25, 26), Slug (10, 27), Snail (11, 28), and Twist (3), as well as other genes associated with either the mesenchymal state [N-cadherin (29) and fibronectin (30)] or the epithelial state [E-cadherin (31)]. Relative to the luminal epithelial cell line, MCF-7, the MDA-MB 231, MDA-MB 435, and SUM1315 basal B cell lines consistently expressed higher levels of Snail, Slug, and Zeb2, but not Gsc or FOXC1 (Fig. 1A).

Author contributions: J.H.T., J.I.H., K.K., R.A.W., and S.A.M. designed research; J.H.T., J.I.H., K.K., A.Y.Z., S.G., J.Y., P.B.G., K.W.E., B.G.H., and S.A.M. performed research; J.Y., K.H., T.T.O., P.B.G., P.T.R., E.S.L., J.M.R., R.A.W., and S.A.M. contributed new reagents/analytic tools; J.H.T., J.I.H., K.K., P.B.G., R.A.W., and S.A.M. analyzed data; and J.H.T., J.I.H., K.K., R.A.W., and S.A.M. wrote the paper.
The authors declare no conflict of interest.
1 J.H.T., J.I.H., and K.K. contributed equally to this work.
2 To whom correspondence may be addressed. E-mail: weinberg@wi.mit.edu or smani@mdanderson.org.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1004900107/-/DCSupplemental.

PNAS | August 31, 2010 | vol. 107 | no. 35 | 15449–15454 CELL BIOLOGY

The epithelial-to-mesenchymal transition (EMT) produces cancer cells that are invasive, migratory, and exhibit stem cell characteristics, hallmarks of cells that have the potential to generate metastases. Inducers of the EMT include several transcription factors (TFs), such as Goosecoid, Snail, and Twist, as well as the secreted TGF-β1. Each of these factors is capable, on its own, of inducing an EMT in the human mammary epithelial (HMLE) cell line. However, the interactions between these regulators are poorly understood. Overexpression of each of the above EMT inducers up-regulates a subset of other EMT-inducing TFs, with Twist, Zeb1, Zeb2, TGF-β1, and FOXC2 being commonly induced. Up-regulation of Slug and FOXC2 by either Snail or Twist does not depend on TGF-β1 signaling. Gene expression signatures (GESs) derived by overexpressing EMT-inducing TFs reveal that the Twist GES and Snail GES are the most similar, whereas the Goosecoid GES is the least similar to the others. An EMT core signature was derived from the changes in gene expression shared by up-regulation of Gsc, Snail, Twist, and TGF-β1 and by down-regulation of E-cadherin, loss of which can also trigger an EMT in certain cell types. The EMT core signature associates closely with the claudin-low and metaplastic breast cancer subtypes and correlates negatively with pathological complete response. Additionally, the expression level of FOXC1, another EMT inducer, correlates strongly with poor survival of breast cancer patients.

To further understand the interactions among EMT-inducing TFs, we tested how overexpression of a single EMT-inducing TF affects the expression of other TFs in this network, thereby establishing an “EMT interactome.” To do so, we used immortalized HMLE cells that express high levels of E-cadherin and low levels of vimentin, similar to the MCF-7 breast cancer cell line and contrasting with the more mesenchymal MDA-MB 231 and SUM1315 breast cancer cell lines (Fig. S1). HMLE cells were induced to undergo EMTs through overexpression of Gsc, Snail, Twist, or TGF-β1. We then used RT-PCR to confirm the expression levels of these genes, as well as various genes known to be associated with EMT programs (Fig. 1B and Fig. S2).
Among the EMT-inducing TFs, expression of FOXC2, Slug, Zeb1, and Zeb2 was consistently elevated in response to overexpression of Gsc, Snail, Twist, or TGF-β1, suggesting that these four genes operate downstream of Gsc, Snail, Twist, and TGF-β1 in the EMT interactome (Fig. 1 B and C). In contrast, Gsc was not up-regulated by overexpression of the other EMT inducers, suggesting an independent mode of transcriptional regulation (Fig. 1B). Recent findings have demonstrated that EMT programs can also be induced by inhibiting members of the miR-200 family of microRNAs (17, 18, 32). We wished to determine the location of these miRNAs in the EMT interactome. The miR-200 family and Zeb transcription factors form a mutually inhibitory, double negative-feedback loop (16–18, 33). Consistent with a role for Zeb1 and Zeb2 in miR-200 repression, all miR-200 family members were repressed by the forced expression of either Gsc, Snail, Twist, or TGF-β1 (Fig. 1D). These results reinforce the findings that in addition to interactions of the EMT-inducing TFs, re- pression of the miR-200 family of microRNAs accompanies and participates in execution of the EMT program. A widely accepted model of the EMT during tumorigenesis proposes that TGF-β1, produced by the tumor microenvironment, promotes tumor progression by inducing expression of the EMTinducing TFs (34). Indeed, treatment of HMLE cells with TGF-β1 for a period of 12 d induced an EMT (14) and the expression of a number of EMT-inducing TFs (Fig. 1B). We undertook to determine whether another type of feedback loop operated here, specifically whether induction of an EMT achieved by overexpressing EMT-inducing TFs yields, in turn, the expression of TGF-β1. For this experiment, we measured the expression of TGF-β1 mRNA by quantitative RT-PCR (qRT-PCR) in HMLE cells expressing either an empty vector control, Gsc, Snail, Twist, or TGF-β1. 
In all cases, expression of TGF-β1 mRNA increased by more than 5-fold compared with control cells (Fig. 2A). Because TGF-β1 sufficed to induce an EMT, we investigated whether the TGF-β1 expressed in response to an EMT operates in an autocrine manner to induce and maintain expression of downstream EMT effectors, specifically FOXC2 and Slug. Because Snail and Twist are known to induce a complete EMT, including expression of FOXC2, Slug, and TGF-β1, we blocked TGF-β1 signaling in these cells using SM16, a compound developed to inhibit the TGF-β1 pathway by interfering with type I receptor signaling (35). As anticipated, treating HMLE cells expressing either an empty control vector, Snail, or Twist with SM16 significantly reduced the levels of phosphorylated SMAD2/3 by 50 to 90%, compared with DMSO (Fig. 2B). Under these conditions, neither

Fig. 1. Expression of EMT marker genes in breast cancer cell lines and nontransformed EMT-induced human mammary epithelial cells. (A) Expression of EMT marker genes was measured by semiquantitative RT-PCR performed on RNA extracted from MCF-7, MDA-MB 231, MDA-MB 435, and SUM1315 cells. (B–D) HMLE cells were transduced with a retrovirus overexpressing the indicated gene and expression of the indicated gene was measured by semiquantitative RT-PCR (B) or quantitative PCR (C and D). GAPDH (C) or U6 snoRNA (D) was amplified for normalization. Error bars represent the SD from three independent experiments.

Fig. 2. TGF-β1 is up-regulated by EMT inducers, but is not required for up-regulation of FOXC2 or Slug. (A) HMLE cells were transduced with a retrovirus overexpressing the indicated genes and expression of TGF-β1 mRNA was measured by quantitative RT-PCR. GAPDH was amplified for normalization. (B) HMLE cells transduced with a retrovirus overexpressing the indicated gene were treated with DMSO or SM16, a TGF-β signaling inhibitor, and gene expression was assayed by Western blot for the indicated proteins.
Relative levels of pSmad2/3 were calculated by densitometry and listed beneath the bands. α-Actin was used as a loading control. 15450 | www.pnas.org/cgi/doi/10.1073/pnas.1004900107 Taube et al. Hierarchical Clustering of GESs from EMTs Induced by Gsc, Snail, Twist, TGF-β1, and by Down-Regulating E-Cadherin. Because various inducers of EMT seemed to be capable of transactivating a common set of downstream effectors, we extended these comparisons by determining the larger effects of these TFs on overall gene expression within cells. We began by deriving individual GESs of the cells induced to undergo EMT by forced expression of Gsc, Snail, Twist, or TGF-β1, or by knocking down E-cadherin, and cataloguing genes exhibiting at least a 2-fold up- or down-regulation under any of the conditions of EMT induction. Cells in which Ecadherin was experimentally down-regulated were also included in these analyses, as we previously demonstrated that this alteration could also serve to induce an EMT in these cells (24). We then performed hierarchical clustering of these GESs to measure their degree of similarity (Fig. 3A). Of the five methods used to induce EMTs in this analysis, Snail and Twist generated the most similar GESs, consistent with the fact that Twist is a direct target gene of the Snail transcription factor (36), and Gsc showed the most distinct GES (Fig. 3A), consistent with the observation that Gsc was not induced by expression of either Snail, Twist, or TGF-β1 (Fig. 1B). Not surprisingly, expression of TGF-β1 or knockdown of E-cadherin produced GESs that diverged slightly from both Snail and Twist, likely because of the fact that, even though either method induces an EMT, neither involves ectopic expression of a TF and, therefore, may affect its downstream targets less directly. Fig. 3. Clustering of the individual EMT-inducer gene expression profiles based on their similarity to each other and to a large cohort of breast cancer patient gene-expression samples. 
(A) Heatmap of gene expression profiles in each sample. Values represent the log2 ratio over control. The diagram above each heatmap shows the similarities of EMT-inducer profiles to each other. (B) Heat map of correlations of each EMT-inducer profile with the gene expression profiles of patient tumors in a large cohort of breast cancer patients (49). Pearson correlation coefficients were calculated for all of the EMT-inducer–patient pairs and plotted as a heat map, in which red indicates a highly significant positive correlation, green indicates a highly significant negative correlation and black indicates a weak or absent correlation. Taube et al. EMT Core Signature. Localized paracrine signals arising in the tumor microenvironment appear to be important in inducing an EMT in nearby cancer cells; accordingly, only a minority of cells within a tumor may display characteristics of having entered into or passed through an EMT. These observations complicate attempts to identify groups of cells within a tumor that have undergone an EMT. For this reason, we sought to identify a GES common to several known EMT-inducing signals. Establishment of such a core signature should prove useful in the future for identifying the small subset of cells that have undergone an EMT within a tumor, even if such an EMT is induced by a currently unknown inducer of this program. To identify such a signature, we reanalyzed the microarraybased gene-expression changes from HMLE cells expressing Gsc, Snail, Twist, or TGF-β1 or an siRNA targeting E-cadherin. We identified an EMT core signature consisting of 159 genes that were down-regulated and 87 genes that were up-regulated at least 2-fold by all of these EMT-inducing signals (Table S1). Several of these gene-expression changes were validated by qRT-PCR (Fig. S3). 
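The selection rule described above, keeping only genes that change at least 2-fold in the same direction under every EMT-inducing condition, can be illustrated with a short sketch. This is not the authors' pipeline: the function name `core_signature`, the input layout (gene mapped to per-condition log2 ratios over control), and all numeric values are assumptions made for illustration. Only the direction of change for CDH1 (down) and CDH2 (up) follows the text.

```python
import math


def core_signature(log2_ratios, fold=2.0):
    """Split genes into (up, down) lists changed >= `fold` in EVERY condition.

    log2_ratios: dict mapping gene -> dict mapping condition -> log2(treated / control).
    Genes that miss the threshold in any condition, or change in mixed
    directions, are excluded from both lists.
    """
    threshold = math.log2(fold)  # a 2-fold change corresponds to |log2 ratio| >= 1
    up, down = [], []
    for gene, by_condition in log2_ratios.items():
        values = list(by_condition.values())
        if not values:
            continue  # no measurements for this gene
        if all(v >= threshold for v in values):
            up.append(gene)
        elif all(v <= -threshold for v in values):
            down.append(gene)
    return sorted(up), sorted(down)


# Toy, made-up log2 ratios for five hypothetical condition labels.
toy = {
    "CDH1": {"Gsc": -3.1, "Snail": -4.0, "Twist": -3.6, "TGFB1": -2.2, "shCDH1": -2.9},
    "CDH2": {"Gsc": 1.4, "Snail": 2.8, "Twist": 2.5, "TGFB1": 1.9, "shCDH1": 1.2},
    # Changed strongly under some inducers only, so not part of the core set.
    "GENE_X": {"Gsc": 0.2, "Snail": 2.1, "Twist": 1.8, "TGFB1": 0.4, "shCDH1": -0.1},
}

up_genes, down_genes = core_signature(toy)
print("up:", up_genes)
print("down:", down_genes)
```

Requiring the threshold in all conditions, rather than in any one of them, is what separates this core set from the broader per-inducer gene lists used for the clustering in Fig. 3A.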
As expected, the epithelial adhesion molecule E-cadherin (CDH1) was down-regulated in all samples and the mesenchymal markers N-cadherin (CDH2), vimentin, and the invasion-associated protease, matrix metalloproteinase (MMP2), were commonly upregulated (Table S1). In addition, Zeb1, one of the TFs capable of orchestrating an EMT, was commonly up-regulated. Overexpression of Snail has been shown to down-regulate the expression of the cell-cycle protein cyclin D2 (CCND2) (37), and indeed we found that CCND2 was down-regulated in the EMT core signature. Cells undergoing an EMT are known to be resistant to apoptosis (38–41). Consistent with this finding, the proapoptosis gene BIK was down-regulated in all samples. In addition, genes with a Zeb1 binding site present in their promoters were enriched in the set of genes down-regulated by EMT, including the gene discoidin domain receptor 1 (DDR1), which encodes an RTK involved in E-cadherin localization and distinguishes basal A from basal B cell lines (42–45), and the gene follistatin (FST), which is a TGF-β antagonist (Tables S2 and S3) (46–48). Contributions of EMT-Inducing TFs to Breast Cancer. We next attempted to use existing GESs of various types of breast cancer to uncover possible connections between the various TFs and breast cancer pathogenesis. We sought to understand the relatedness of individual EMT-inducers and their respective GESs to the geneexpression profiles with individual breast carcinomas in a large (244 patient) cohort of these tumors (49). The gene-expression profiles derived from many tumors displayed a high correlation to GESs derived from Gsc, Snail, Twist, and TGF-β1, but correlated less strongly with the signature derived from knocking down E-cadherin (Fig. 3B). As predicted from previous observations, tumors with a high correlation to the Snail GES also displayed a high correlation to the Twist GES. 
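The comparison of EMT-inducer signatures with patient tumors rests on Pearson correlation coefficients computed for every inducer-tumor pair. A minimal sketch of that calculation follows. The helper names (`pearson`, `correlate_profiles`), the choice to correlate over the genes shared by both profiles, and the toy expression values are assumptions for illustration, not details taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def correlate_profiles(inducers, tumors):
    """Return {(inducer, tumor): r}, using only genes present in both profiles."""
    result = {}
    for inducer_name, inducer_profile in inducers.items():
        for tumor_name, tumor_profile in tumors.items():
            genes = sorted(set(inducer_profile) & set(tumor_profile))
            result[(inducer_name, tumor_name)] = pearson(
                [inducer_profile[g] for g in genes],
                [tumor_profile[g] for g in genes],
            )
    return result


# Toy, made-up log2 expression values (hypothetical tumors).
inducers = {
    "Snail": {"CDH1": -3.0, "CDH2": 2.0, "VIM": 3.0},
    "Twist": {"CDH1": -2.5, "CDH2": 1.5, "VIM": 2.5},
}
tumors = {
    "tumor_1": {"CDH1": -1.0, "CDH2": 0.5, "VIM": 1.5},   # mesenchymal-like pattern
    "tumor_2": {"CDH1": 1.2, "CDH2": -0.4, "VIM": -1.0},  # epithelial-like pattern
}

correlations = correlate_profiles(inducers, tumors)
for pair, r in sorted(correlations.items()):
    print(pair, round(r, 3))
```

Plotting the resulting matrix with inducers on one axis and tumors on the other gives a heat map of the kind described for Fig. 3B.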
In addition, the geneexpression profiles derived from each tumor in this dataset tend to correlate with the GESs derived from more than one EMT inducer. Moreover, the expression signature of individual tumors did not make it possible to resolve between alternative mechanisms of EMT induction (i.e., Twist-, Snail-, TGF-β1-, or Gsc-induced EMT) in these breast cancers. Nonetheless, the GESs of individual EMT-inducers could be used to assay for the occurrence of an EMT in a breast tumor, even if the EMT was induced by a stillunknown inducer and activation of this transdifferentiation program occurred in only a minority of the cells within each tumor. Importantly, the possibility that stromal elements within the tumor samples rather than the carcinoma cells themselves resulted in the detection of a mesenchymal GES could not be discounted. Correlation of EMT Core Signature with the Basal B, Claudin-Low, and Metaplastic Signatures. We also wished to determine how the EMT core signature relates to the GESs of various subtypes of breast cancer. We compared the mean expression values of genes up- or PNAS | August 31, 2010 | vol. 107 | no. 35 | 15451 CELL BIOLOGY the Snail- nor Twist-expressing cells altered expression of FOXC2 or Slug protein (Fig. 2B). This finding indicated that induction of an EMT by Snail or Twist does not depend on ongoing TGF-β1 autocrine signaling and suggests, instead, that Snail and Twist can induce FOXC2 and Slug through alternative mechanisms, quite possibly involving only intracellular signaling. down-regulated by an EMT against the GES of various breast cancer cell line subtypes (50) and found that the EMT core signature most strongly correlated with the signature derived from basal B cell lines (Fig. S4). This subtype is characterized by high vimentin expression and a stem cell-like expression profile (45, 50), similar to cells that have undergone an EMT; in contrast, the signatures derived from the basal A and luminal subtypes (45, 50) (Fig. 
S4) were not closely related to the EMT core signature. We also measured the association of the EMT core signature to GESs derived from breast cancer subtypes defined by Hennessy et al. (23) and against the GES of a cohort of metaplastic tumors that had been previously analyzed using the same microarray platform (22). The up- and down-regulated genes from the EMT core signature were also significantly up- and down-regulated in both the metaplastic and claudin-low breast cancers, but not in other breast cancer subtypes (Fig. 4 A and B). This finding is consistent with recent reports that have linked low E-cadherin expression, indicative of passage through an EMT, to clinically encountered breast tumors of either the claudin-low or metaplastic subtype (22, 23). Because passage through an EMT has also been linked with acquisition of stem cell characteristics (6), this suggests that the neoplastic cells in these tumors contain relatively high proportions of cancer stem cells. We also assayed the expression of the mRNAs encoding EMTinducing TFs in breast cancers that had been classified into basallike, HER2+, luminal, claudin-low, and metaplastic subtypes (22, 23). We found that the Snail, Slug, Zeb1, Twist1, and FOXC1 TFs were up-regulated in metaplastic tumors, and Zeb2 was most commonly up-regulated in claudin-low tumors (Fig. 5A). Both metaplastic and claudin-low tumor subtypes showed significant down-regulation of E-cadherin mRNA expression (CDH1) as well as claudin 4 (Fig. 5A). Although nearly all claudin-low tumors express high levels of Zeb2, only half of these tumors concomitantly expressed high levels of other EMT-inducing genes (Snail, Twist, FOXC1) (Fig. 5A). Strikingly, nearly all basal-like breast tumors, but not luminal or HER2+ tumors, display consistently high expression of the EMT-inducer FOXC1, which is known to be associated with increased cell motility and invasion and decreased expression of E-cadherin (13) (Fig. 5A and Fig. S5). 
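The subtype comparison summarized in Fig. 4, in which the EMT-up or EMT-down genes are averaged within each tumor and the per-tumor means are then compared across subtypes by one-way ANOVA, can be sketched as below. This is an illustrative outline under stated assumptions: the function names and toy numbers are hypothetical, and only the F statistic is computed. Converting F to a P value requires the F distribution, which the sketch omits.

```python
def signature_score(expression, genes):
    """Mean expression of the signature genes that are present in one tumor."""
    values = [expression[g] for g in genes if g in expression]
    if not values:
        raise ValueError("none of the signature genes were measured")
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: dict mapping group name -> list of scores.
    Assumes at least two groups and nonzero within-group variance.
    """
    all_values = [v for scores in groups.values() for v in scores]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((s - group_mean) ** 2 for s in scores)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy, made-up data: two hypothetical EMT-up genes measured in six tumors.
emt_up_genes = ["CDH2", "VIM"]
tumors_by_subtype = {
    "claudin-low": [{"CDH2": 1.0, "VIM": 1.4}, {"CDH2": 0.8, "VIM": 1.0}, {"CDH2": 1.2, "VIM": 1.2}],
    "luminal": [{"CDH2": -0.6, "VIM": -0.2}, {"CDH2": -0.4, "VIM": -0.8}, {"CDH2": -0.2, "VIM": -0.6}],
}

scores_by_subtype = {
    subtype: [signature_score(tumor, emt_up_genes) for tumor in tumor_list]
    for subtype, tumor_list in tumors_by_subtype.items()
}
f_stat = one_way_anova_f(scores_by_subtype)
print(scores_by_subtype, f_stat)
```

Averaging within each tumor first reduces every sample to a single score per gene set, so the up and down gene sets can be tested separately, as in panels A and B of Fig. 4.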
Correlation of Expression of an EMT-Inducer with Patient Survival. We anticipated that tumors expressing the EMT-associated genes would exhibit a poorer survival than tumors not expressing the EMT-associated genes. For this reason, we performed clinical prediction analyses on the breast tumors in the Netherlands Cancer Institute (NKI) and University of North Carolina (UNC) databases (49, 51), using the EMT core signature. We found,

[Fig. 4 plot area: box plots of mean expression by subtype. Panel A, EMT Up Genes; panel B, EMT Down Genes. Subtype labels: metaplastic, claudin-low, basal-like, luminal, HER2+, and normal.]

Fig. 4. The core EMT signature correlates with metaplastic and claudin-low breast cancers. (A and B) Gene-expression data were plotted as box plots for the mean expression of the EMT-up genes (A) and the EMT-down genes (B) by subtype using the dataset from Hennessy et al. (23) with the addition of 12 metaplastic tumors. Subtypes were called as in Herschkowitz et al. (22). The list was derived using Significance Analysis of Microarrays and cut off at the top ∼1,155 probes, 544 up and 611 down. Next, the genes were extracted in the dataset and averaged in each tumor (up and down separately). The one-way ANOVA significance for each plot was P < 0.0001.

Fig. 5. EMT-inducing genes are up-regulated in metaplastic and claudin-low tumors and FOXC1 expression marks basal-like tumors and is a predictor of poor clinical outcome. (A) Data were extracted for EMT-related genes and samples were ordered by intrinsic subtype as in Herschkowitz et al. (22). The twelve metaplastic tumors from Hennessy et al. (23) were also included. (B) Patients from the NKI and UNC datasets were divided into high- and low-FOXC1 expressers and their survival was compared. The P value was generated using the χ2 test of equality.
Discussion

The EMT core signature that we have identified was generated by comparing the gene-expression changes that occurred by overexpressing either Gsc, Snail, Twist, or TGF-β1, or by reducing levels of E-cadherin. As we found, this signature is enriched for genes containing the Zeb1 transcription factor-binding site near their transcription start sites (Table S2), and is most similar to the GESs from claudin-low and metaplastic breast cancers, as well as cancers without pCR. Of note, a recent study demonstrated that the GES obtained from normal mammary epithelial stem cells is most similar to the claudin-low signature (55). This finding is consistent with the present findings that the expression changes associated with the EMT-inducing TFs correlate most closely with the claudin-low and metaplastic breast cancers and with our earlier demonstration that passage through an EMT results in the acquisition of stem-cell characteristics (6).

Analysis of mRNA expression levels in cells overexpressing EMT-inducers reveals that Twist, Snail, and TGF-β1 each up-regulate expression of FOXC2, Zeb1, and Zeb2 (Fig. 1 B and C). Furthermore, we demonstrate that Snail and Twist generate the most closely related GESs, and expression of TGF-β1, Gsc, and knockdown of E-cadherin generate quite distinct, less closely related signatures. Further exploration of the differences between these various GESs will likely yield insights into the mechanisms required to activate the EMT program and those involved in maintaining the resulting mesenchymal/stem cell state.

Surprisingly, we were unable to observe any correlation between expression of genes in the EMT core signature and a poorer survival outcome among breast cancer patients, despite the observation that the EMT core signature correlates with metaplastic tumors, which are themselves linked with poorer patient survival (56). Interestingly, a recent report by Creighton et al.
found that the GES derived from breast tumor-initiating cells was also enriched in claudin-low tumors, but likewise did not serve as a useful prognostic marker for clinical progression (53). However, they found that cell populations that survive conventional chemotherapeutic treatment are enriched for cells with EMT-associated mesenchymal and stem cell-associated tumor-initiating features (53). Additionally, an association between EMT and chemotherapy resistance, rather than survival, is suggested by Farmer et al., who show that a stroma-related GES predicts shorter relapse-free survival among patients who received chemotherapy, but not among untreated breast cancer patients (52). These findings are consistent with our data indicating that gene-expression profiles of tumors from patients that responded to chemotherapy correlate negatively with the EMT core signature.

The use of total mRNA isolated from entire tumors for annotation of the expression datasets may preclude detection of cells that have undergone EMT, as only a small proportion of the neoplastic cells in each tumor may exhibit an EMT phenotype. Poor-prognosis tumors may therefore harbor an insufficient number of cells with a mesenchymal phenotype, so that there is no detectable effect on the gene-expression profile of the tumor as a whole. Accordingly, it may be necessary to collect tumor cells at the invasive edges of tumor cell islands and analyze these for the expression of the EMT-associated genes to gauge the true malignant potential of the tumor as a whole.

Among the group of EMT-inducing genes studied here, FOXC1 expression most closely correlates with a poorer survival of the breast cancer patients in the NKI and UNC datasets (Fig. 5B). FOXC1 was highly expressed in metaplastic and basal-like breast cancer subtypes (Fig. 5A), for which highly effective treatments are not currently available.
The closely related FOXC2 TF is known to play a key role in inducing an EMT and promoting metastasis, and to be associated with aggressive basal-like breast cancers, as assessed by immunohistochemistry (14). However, expression of FOXC2 could not be correlated with survival in the cited microarray-based stratification of breast cancer patients, because the arrays used did not contain suitable FOXC2 probes. Given the consistently high levels of FOXC1 in basal-like and metaplastic breast cancer subtypes, the high levels of Zeb2 in claudin-low breast cancers, and the contribution of FOXC2 to basal-like breast cancer (14), these genes or the pathways that regulate these genes would seem to represent potential targets for the development of novel anticancer therapeutics.

Methods

RT-PCR. RNA was prepared from cultured cells by TRIzol extraction (Invitrogen). Complementary DNA was synthesized using Moloney Murine Leukemia virus reverse transcriptase (Invitrogen). Relative quantification values were calculated using the ddCt method (57) and values were plotted with SD using GraphPad Prism v5.0 (GraphPad Software, Inc.).

Cell Culture. Immortalized HMLE cells were grown as previously described (58). MCF-7, SUM1315, and MDA-MB 231 cells were maintained according to ATCC instructions. The SUM159 cell line used for the study was developed from pleural effusions of breast cancer patients (59). SUM159 cells were cultured in Ham's F-12 (Gibco) supplemented with 5% FBS (Tissue Culture Biologicals), 5 μg/mL of insulin, and 1 μg/mL hydrocortisone at 37 °C in 5% CO2.

Microarray Data Analysis and Deposition. Microarray data for HMLE shCDH1 and vector control were extracted from the Gene Expression Omnibus (GEO) database under the accession GSE9691. Microarray data for HMLE Gsc, Snail, Twist, TGF-β1, and vector control have been deposited in GEO under accession number GSE24202.
To compare gene expression among different EMT inducers, a heat map was generated by clustering genes with at least a 2-fold change in at least one condition. To compare individual EMT signatures to breast cancer patient gene-expression data, a heat map showing correlations between gene-expression profiles of the EMT inducers and the gene-expression profiles of tumors from breast cancer patients in the UNC cohort was created. Pearson correlation coefficients were calculated between the gene-expression profiles of all EMT inducer-patient pairs.

Significance Analysis of Microarrays. The list of up- and down-regulated EMT genes was derived using Significance Analysis of Microarrays and cut off at the top 1,155 probes, 544 up and 611 down. The genes in the dataset were then extracted and averaged in each sample (up and down separately). ANOVA was performed and boxplot graphs were plotted with gene expression using GraphPad Prism v5.0 (GraphPad Software, Inc.). The one-way ANOVA P value for each plot was < 0.0001.

however, that expression by a tumor of the EMT core signature failed to predict patient survival. We therefore pursued an alternative hypothesis: that expression of individual genes capable of inducing an EMT, rather than multigene signatures, could predict patient survival. Although the expression of Gsc, Snail, Twist, or TGF-β1 genes did not predict clinical outcome, high expression of the FOXC1 TF was indeed a powerful predictor of poor clinical outcome. Thus, those patients whose tumors exhibited high levels of FOXC1 gene expression showed a significantly poorer survival outcome (Fig. 5B). Although FOXC1 was not induced by Gsc, Snail, Twist, or TGF-β1, expression of FOXC1 in immortalized, but nontumorigenic, human mammary epithelial MCF12A cells has been shown to induce an EMT (8, 13).
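The gene filter and the inducer-patient correlation described under Microarray Data Analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: fold changes are assumed to be on a log2 scale (so 2-fold corresponds to an absolute value of at least 1), the helper names are hypothetical, and the numeric values are toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def changed_at_least_2fold(log2_fc):
    """Genes with |log2 fold change| >= 1 in at least one condition."""
    return sorted(gene for gene, by_cond in log2_fc.items()
                  if any(abs(v) >= 1.0 for v in by_cond.values()))

def inducer_patient_correlations(log2_fc, patients, genes):
    """Pearson r for every (EMT inducer, patient) pair over the given genes."""
    conditions = sorted({c for by_cond in log2_fc.values() for c in by_cond})
    return {(cond, pid): pearson([log2_fc[g][cond] for g in genes],
                                 [expr[g] for g in genes])
            for cond in conditions for pid, expr in patients.items()}

# Toy data (hypothetical values): log2 fold change versus vector control.
log2_fc = {
    "CDH1":  {"Snail": -3.0, "Twist": -2.5},
    "VIM":   {"Snail":  2.0, "Twist":  1.5},
    "GAPDH": {"Snail":  0.1, "Twist": -0.2},   # unchanged, filtered out
    "CLDN4": {"Snail": -1.2, "Twist": -0.4},   # 2-fold in one condition only
}
patients = {
    "tumor_A": {"CDH1": -1.5, "VIM":  1.0, "GAPDH": 0.0, "CLDN4": -0.8},
    "tumor_B": {"CDH1":  1.2, "VIM": -0.9, "GAPDH": 0.1, "CLDN4":  0.7},
}

genes = changed_at_least_2fold(log2_fc)
corr = inducer_patient_correlations(log2_fc, patients, genes)
for pair, r in sorted(corr.items()):
    print(pair, round(r, 3))
```

The resulting dictionary is the matrix that a correlation heat map would display, with one row per inducer and one column per patient.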
Recent reports have linked expression of specific sets of genes to resistance to chemotherapy in breast cancer patients (52, 53). Using an M.D. Anderson Cancer Center study of response to neoadjuvant chemotherapy with paclitaxel, 5-fluorouracil, doxorubicin, and cyclophosphamide (54), we asked if tumor gene-expression profiles from patients who had a pathological complete response (pCR) correlated with the EMT core signature. We found that gene-expression profiles from patient tumors with pCR showed a negative correlation to the EMT core signature, whereas GESs from tumors from patients without pCR showed a slightly positive correlation to the EMT core signature (Fig. S6).

Survival Analysis. Patients from the NKI and UNC datasets were divided into high- and low-FOXC1 expressers and their survival was compared. The P value was generated using the χ² test of equality using the survival package in R (www.r-project.org).

ACKNOWLEDGMENTS. We thank the members of the Mani laboratory for helpful discussions and Biogen for the SM16 (TGF-β inhibitor). Additional help with data analysis was provided by Dr. Keith Beggarly and Nianxiang Zhang. This work was supported in part by the Broad Institute, Komen postdoctoral Fellowship PDF0707744 (to J.I.H.), KG091219 (to B.G.H.), National Institutes of Health Grant R01CA125109 (to P.T.R.), a pilot grant from the Dan L. Duncan Cancer Center (to J.M.R. and S.A.M.), a Research Trust award from M. D. Anderson Cancer Center, the V Foundation's V Scholar award, and Cancer Center Support Grant CA016672 (to S.A.M.), and National Institutes of Health Grant R01 CA78461, the Breast Cancer Research Foundation, and Ludwig Center for Molecular Oncology at the Koch Institute (to R.A.W.).

1. Kalluri R, Weinberg RA (2009) The basics of epithelial-mesenchymal transition. J Clin Invest 119:1420–1428.
2. Gupta PB, et al. (2009) Identification of selective inhibitors of cancer stem cells by high-throughput screening. Cell 138:645–659.
3. Yang J, et al.
(2004) Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis. Cell 117:927–939.
4. Frixen UH, et al. (1991) E-cadherin-mediated cell-cell adhesion prevents invasiveness of human carcinoma cells. J Cell Biol 113(1):173–185.
5. Sabbah M, et al. (2008) Molecular signature and therapeutic perspective of the epithelial-to-mesenchymal transitions in epithelial cancers. Drug Resist Updat 11(4-5):123–151.
6. Mani SA, et al. (2008) The epithelial-mesenchymal transition generates cells with properties of stem cells. Cell 133:704–715.
7. Morel AP, et al. (2008) Generation of breast cancer stem cells through epithelial-mesenchymal transition. PLoS ONE 3:e2888.
8. Polyak K, Weinberg RA (2009) Transitions between epithelial and mesenchymal states: Acquisition of malignant and stem cell traits. Nat Rev Cancer 9:265–273.
9. Hartwell KA, et al. (2006) The Spemann organizer gene, Goosecoid, promotes tumor metastasis. Proc Natl Acad Sci USA 103:18969–18974.
10. Nieto MA, Sargent MG, Wilkinson DG, Cooke J (1994) Control of cell behavior during vertebrate development by Slug, a zinc finger gene. Science 264:835–839.
11. Batlle E, et al. (2000) The transcription factor snail is a repressor of E-cadherin gene expression in epithelial tumour cells. Nat Cell Biol 2(2):84–89.
12. Cano A, et al. (2000) The transcription factor snail controls epithelial-mesenchymal transitions by repressing E-cadherin expression. Nat Cell Biol 2(2):76–83.
13. Bloushtain-Qimron N, et al. (2008) Cell type-specific DNA methylation patterns in the human breast. Proc Natl Acad Sci USA 105:14076–14081.
14. Mani SA, et al. (2007) Mesenchyme Forkhead 1 (FOXC2) plays a key role in metastasis and is associated with aggressive basal-like breast cancers. Proc Natl Acad Sci USA 104:10069–10074.
15. Aigner K, et al. (2007) The transcription factor ZEB1 (deltaEF1) promotes tumour cell dedifferentiation by repressing master regulators of epithelial polarity. Oncogene 26:6979–6988.
16.
Gregory PA, Bracken CP, Bert AG, Goodall GJ (2008) MicroRNAs as regulators of epithelial-mesenchymal transition. Cell Cycle 7:3112–3118.
17. Gregory PA, et al. (2008) The miR-200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat Cell Biol 10:593–601.
18. Park SM, Gaur AB, Lengyel E, Peter ME (2008) The miR-200 family determines the epithelial phenotype of cancer cells by targeting the E-cadherin repressors ZEB1 and ZEB2. Genes Dev 22:894–907.
19. Hu Z, et al. (2006) The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 7:96.
20. Sørlie T, et al. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98:10869–10874.
21. Sorlie T, et al. (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100:8418–8423.
22. Herschkowitz JI, et al. (2007) Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol 8(5):R76.
23. Hennessy BT, et al. (2009) Characterization of a naturally occurring breast cancer subset enriched in epithelial-to-mesenchymal transition and stem cell characteristics. Cancer Res 69:4116–4124.
24. Onder TT, et al. (2008) Loss of E-cadherin promotes metastasis via multiple downstream transcriptional pathways. Cancer Res 68:3645–3654.
25. Comijn J, et al. (2001) The two-handed E box binding zinc finger protein SIP1 downregulates E-cadherin and induces invasion. Mol Cell 7:1267–1278.
26. Vandewalle C, et al. (2005) SIP1/ZEB2 induces EMT by repressing genes of different epithelial cell-cell junctions. Nucleic Acids Res 33:6566–6578.
27. Savagner P, Yamada KM, Thiery JP (1997) The zinc-finger protein slug causes desmosome dissociation, an initial and necessary step for growth factor-induced epithelial-mesenchymal transition. J Cell Biol 137:1403–1419.
28.
Carver EA, Jiang R, Lan Y, Oram KF, Gridley T (2001) The mouse snail gene encodes a key regulator of the epithelial-mesenchymal transition. Mol Cell Biol 21:8184–8188.
29. Derycke LD, Bracke ME (2004) N-cadherin in the spotlight of cell-cell adhesion, differentiation, embryogenesis, invasion and signalling. Int J Dev Biol 48:463–476.
30. Burdsal CA, Damsky CH, Pedersen RA (1993) The role of E-cadherin and integrins in mesoderm differentiation and migration at the mammalian primitive streak. Development 118:829–844.
31. Hay ED (1995) An overview of epithelio-mesenchymal transformation. Acta Anat (Basel) 154(1):8–20.
32. Yu M, et al. (2009) A developmentally regulated inducer of EMT, LBX1, contributes to breast cancer progression. Genes Dev 23:1737–1742.
33. Bracken CP, et al. (2008) A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res 68:7846–7854.
34. Bierie B, Moses HL (2006) Tumour microenvironment: TGFbeta: The molecular Jekyll and Hyde of cancer. Nat Rev Cancer 6:506–520.
35. Suzuki E, et al. (2007) A novel small-molecule inhibitor of transforming growth factor beta type I receptor kinase (SM16) inhibits murine mesothelioma tumor growth in vivo and prevents tumor recurrence after surgical resection. Cancer Res 67:2351–2359.
36. Ip YT, Park RE, Kosman D, Yazdanbakhsh K, Levine M (1992) Dorsal-twist interactions establish snail expression in the presumptive mesoderm of the Drosophila embryo. Genes Dev 6:1518–1530.
37. Vega S, et al. (2004) Snail blocks the cell cycle and confers resistance to cell death. Genes Dev 18:1131–1143.
38. Vitali R, et al. (2008) Slug (SNAI2) down-regulation by RNA interference facilitates apoptosis and inhibits invasive growth in neuroblastoma preclinical models. Clin Cancer Res 14:4622–4630.
39. Inoue A, et al.
(2002) Slug, a highly conserved zinc finger transcriptional repressor, protects hematopoietic progenitor cells from radiation-induced apoptosis in vivo. Cancer Cell 2:279–288.
40. Roy HK, et al. (2004) Down-regulation of SNAIL suppresses MIN mouse tumorigenesis: Modulation of apoptosis, proliferation, and fractal dimension. Mol Cancer Ther 3:1159–1165.
41. Sayan AE, et al. (2009) SIP1 protein protects cells from DNA damage-induced apoptosis and has independent prognostic value in bladder cancer. Proc Natl Acad Sci USA 106:14884–14889.
42. Eswaramoorthy R, et al. (2010) DDR1 regulates the stabilization of cell surface E-cadherin and E-cadherin-mediated cell aggregation. J Cell Physiol 224:387–397.
43. Wang CZ, Yeh YC, Tang MJ (2009) DDR1/E-cadherin complex regulates the activation of DDR1 and cell spreading. Am J Physiol Cell Physiol 297:C419–C429.
44. Maeyama M, et al. (2008) Switching in discoid domain receptor expressions in SLUG-induced epithelial-mesenchymal transition. Cancer 113:2823–2831.
45. Blick T, et al. (2010) Epithelial mesenchymal transition traits in human breast cancer cell lines parallel the CD44(hi/)CD24 (lo/-) stem cell phenotype in human breast cancer. J Mammary Gland Biol Neoplasia 15:235–252.
46. Subramanian A, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–15550.
47. Mootha VK, et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34:267–273.
48. Nogai H, et al. (2008) Follistatin antagonizes transforming growth factor-beta3-induced epithelial-mesenchymal transition in vitro: Implications for murine palatal development supported by microarray analysis. Differentiation 76:404–416.
49. Hoadley KA, et al. (2007) EGFR associated expression profiles vary with breast tumor subtype. BMC Genomics 8:258.
50. Neve RM, et al.
(2006) A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10:515–527.
51. van de Vijver M (2005) Gene-expression profiling and the future of adjuvant therapy. Oncologist 10 (Suppl 2):30–34.
52. Farmer P, et al. (2009) A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer. Nat Med 15(1):68–74.
53. Creighton CJ, et al. (2009) Residual breast cancers after conventional therapy display mesenchymal as well as tumor-initiating features. Proc Natl Acad Sci USA 106:13820–13825.
54. Hess KR, et al. (2006) Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 24:4236–4244.
55. Lim E, et al.; kConFab (2009) Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nat Med 15:907–913.
56. Luini A, et al. (2007) Metaplastic carcinoma of the breast, an unusual disease with worse prognosis: The experience of the European Institute of Oncology and review of the literature. Breast Cancer Res Treat 101:349–353.
57. Heid CA, Stevens J, Livak KJ, Williams PM (1996) Real time quantitative PCR. Genome Res 6:986–994.
58. Elenbaas B, et al. (2001) Human breast cancer cells generated by oncogenic transformation of primary mammary epithelial cells. Genes Dev 15(1):50–65.
59. Ethier SP (1996) Human breast cancer cell lines as models of growth regulation and disease progression. J Mammary Gland Biol Neoplasia 1(1):111–121.

Supporting Information
Taube et al. 10.1073/pnas.1004900107

SI Methods

Microarray Data Collection. We used 1 μg of total RNA to prepare complementary DNA (cDNA) using the GeneChip HT One-Cycle cDNA Synthesis Kit (Affymetrix 900687) and the GeneChip HT IVT Labeling Kit (Affymetrix 900688).
Total RNA was first reverse transcribed using a T7-Oligo (dT) promoter primer. Following RNase H-mediated second strand cDNA synthesis, the double-stranded cDNA was purified and served as a template for an in vitro transcription reaction. The in vitro transcription reaction was carried out in the presence of T7 RNA polymerase and a biotinylated nucleotide analog/ribonucleotide mix for cRNA amplification and biotin labeling. The biotinylated cRNA targets were then cleaned up, fragmented, and hybridized to Affymetrix HT-HG U133 A peg arrays (Affymetrix 900751). The hybridization and subsequent washing and staining were performed on the Affymetrix GeneChip Array Station automation platform. Arrays with signal intensity < 100 failed because of high noise and high background. Samples with 40 to 60% of genes called present and with GAPDH and β-actin ratios less than 3 passed and were incorporated in the analysis.

1. Hess KR, et al. (2006) Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 24:4236–4244.
2. Subramanian A, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–15550.

Pathological Complete-Response Analysis. Genes with at least 2-fold up- or down-regulation in all five epithelial-to-mesenchymal transition (EMT) conditions were collected and their fold-changes were averaged to generate an EMT profile. Patient microarray data (1) were row-normalized to their median, so that each row had a median of 0. Pearson correlation coefficients were calculated between the EMT profile and each patient gene-expression profile. Average correlation values of patients annotated as “pCR” were compared with those annotated as “non-pCR” with a Welch t test.

Gene Set Enrichment Analysis.
Gene set enrichment analysis was performed using Genepattern (http://broad.mit.edu/cancer/software/genepattern/) (2, 3). The rank order of genes from human mammary epithelial (HMLE) cell lines in two biological states, mesenchymal versus epithelial controls, was compared with gene sets within the molecular signatures database and significant enrichments reported. For statistical strength of these enrichments, gene set enrichment analysis uses family-wise error rate to correct for multiple testing and false-discovery rate to reduce false positive reporting.

3. Mootha VK, et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34:267–273.

[Fig. S1 image: bar chart of relative mRNA (log scale, 0.00001 to 100000) for E-cadherin and vimentin in HMLE, MCF-7, MDA-MB 231, and SUM1315 cells.]

Fig. S1. Expression of E-cadherin (CDH1) and Vimentin (VIM) in HMLE, MCF-7, MDA-MB 231, and SUM1315 cells. Quantitative RT-PCR was performed for indicated genes on RNA from the indicated cell lines and graphed relative to expression in empty vector control cells. Error bars represent the standard deviation from at least three separate measurements.

[Fig. S2 image: bar chart of relative mRNA (log scale, 0.1 to 10000) by gene (Goosecoid, Twist, Snail, TGFB1) in HMLE, HMLE-Gsc, HMLE-Snail, HMLE-Twist, and HMLE-TGFbeta1 cells.]

Fig. S2. Overexpression of retrovirally introduced transgenes. Quantitative RT-PCR was performed for indicated genes on RNA from the indicated cell lines and graphed relative to expression in empty vector control cells. Error bars represent the standard deviation from at least three separate measurements.

Taube et al.
www.pnas.org/cgi/content/short/1004900107 1 of 4

[Fig. S3 image: bar chart titled "Relative Expression of EMT associated genes"; y axis, relative mRNA (log scale, 1.0 × 10^-6 to 1.0 × 10^4); series, HMLE, HMLE-Gsc, HMLE-Snail, HMLE-Twist, and HMLE-TGFb.]

Fig. S3. Validation of gene-expression changes of select members of the EMT core signature. Quantitative RT-PCR was performed for indicated genes on RNA from the indicated cell lines and graphed relative to expression in empty vector control cells. Error bars represent the standard deviation from at least three separate measurements.

Fig. S4. The EMT core signature is enriched in basal B breast cancer cell lines. (A and B) Gene-expression data were plotted as box plots for the mean expression of the EMT up genes and the EMT down genes by subtype using the dataset from Neve et al. (1). The list was derived using Significance Analysis of Microarrays and cut off at the top 1,155 genes, 544 up and 611 down. Next, the genes were extracted in the dataset and averaged in each tumor (up and down separately). The one-way ANOVA for the EMT up genes was P = 0.0001 and for the EMT down genes was P = 0.0075.

1. Neve RM, et al. (2006) A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10:515–527.
[Fig. S5 image: ten box-plot panels (a–j), one each for Zeb1, Zeb2, Twist1, Twist2, Foxc1, Snai1, Snai2, Gsc, Cdh1, and Cldn4; y axis, mRNA expression (log2 R/G); x axis, subtype (basal-like, HER2+, luminal, claudin-low, metaplastic, unclassified, normal).]

Fig. S5. Gene-expression data were plotted as box plots for EMT-related genes and samples were classified by intrinsic subtype as in Herschkowitz et al. (1). The number of samples in each class was: basal-like = 58, HER2+ = 44, luminal = 93, claudin-low = 13, unclassified = 15, and normal = 9. The 12 metaplastic tumors from Hennessy et al. (2) were also included.

1. Herschkowitz JI, et al. (2007) Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol 8:R76.
2. Hennessy BT, et al.
(2009) Characterization of a naturally occurring breast cancer subset enriched in epithelial-to-mesenchymal transition and stem cell characteristics. Cancer Res 69:4116–4124.

[Fig. S6 image: box plot of correlation with EMT signature (y axis) for pCR and non-pCR patients; the difference is marked with an asterisk.]

Fig. S6. The EMT core signature negatively correlates with gene-expression profiles of patients with pathological complete response (pCR). Patients from Hess et al. (1) were stratified based on their pCR status and their gene-expression profiles were correlated to the EMT core signature. A significant difference between the two populations was determined by a Welch t test. *P = 0.005.

Table S1. Genes in the EMT core signature
Table S1 (DOCX)
Genes up- or down-regulated at least 2-fold by overexpression of Twist, Snail, Gsc, TGF-β1, and by down-regulation of E-cadherin are listed with fold-changes relative to control cells.

Table S2. Gene set enrichment analysis of the EMT core signature
Table S2 (DOCX)
Using gene set enrichment analysis, the rank order of genes from HMLE cell lines in two biological states, mesenchymal versus epithelial controls, was compared with gene sets within the molecular signatures database and significant enrichments reported. FDR, false discovery rate; FWER, family-wise error rate.

Table S3. Complete list of EMT-down-regulated genes containing putative, conserved Zeb1 binding elements in their promoters
Table S3 (DOCX)
Gene set enrichment analysis was performed comparing HMLE cells induced to undergo EMT with HMLE control cells. The gene set V$AREB6_01 (Zeb1 binding sites) was found to be significantly enriched in genes down-regulated when cells underwent EMT. ES, enrichment score.
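To make the enrichment score (ES) reported in Tables S2 and S3 more concrete, the following is a simplified sketch of the weighted running-sum statistic described by Subramanian et al. (2). It is not the GenePattern implementation and omits permutation-based significance, FDR, and FWER estimation; the function name, the weighting exponent default, and the toy gene labels and scores are assumptions made for illustration.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Maximum deviation from zero of the weighted running sum.

    ranked_genes: genes ordered from most up-regulated to most down-regulated.
    scores: dict mapping gene -> ranking metric (e.g., a log fold change).
    gene_set: set of genes whose enrichment is being tested.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    hit_total = sum(abs(scores[g]) ** weight
                    for g, is_hit in zip(ranked_genes, hits) if is_hit)

    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                             # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (hypothetical genes and scores), mesenchymal versus epithelial.
scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
ranked = sorted(scores, key=scores.get, reverse=True)

print(enrichment_score(ranked, scores, {"E", "F"}))  # negative: set sits at the bottom
print(enrichment_score(ranked, scores, {"A", "B"}))  # positive: set sits at the top
```

Under this sign convention, a gene set concentrated among genes down-regulated upon EMT, as reported for V$AREB6_01, would produce a negative score.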