Supplementary materials and methods

Cell culture and transfections
The pEF1/Cldn2 vector was constructed by ligating the full-length murine Cldn2 cDNA (clone ID: 4223446; Open Biosystems, Huntsville, AL, USA) into a pEF1/V5-His expression vector (Invitrogen, Burlington, ON, Canada) using 5’ EcoRI and 3’ NotI restriction enzyme sites. Pooled stable 4T1 or explant populations were generated by transfection using Lipofectamine 2000 reagent (Invitrogen). Stable cell lines were maintained under 1 mg/ml G418 antibiotic selection. Two independent shRNAs targeting Cldn2 or CLDN2 were designed using the same protocol: Cldn2 shRNA KD1: 5’-CTC ATA CAG CCT GAC TGG GTA T-3’; Cldn2 shRNA KD2: 5’-TGC GAT ATC TAC AGT ACC CTT T-3’; CLDN2 shRNA KD1: 5’-ACC TCC CAA AGT CAA GAG TGA G-3’; CLDN2 shRNA KD2: 5’-ATC ACC CAG TGT GAC ATC TAT A-3’. Pooled stable populations were maintained under 1 µg/ml puromycin antibiotic selection.

RNA amplification, labeling and hybridization to Agilent microarrays
RNA was extracted from 4T1 parental and individual in vivo selected liver metastatic sub-populations using RNeasy Mini Kits and QIAshredder columns (Qiagen, Mississauga, ON, Canada). Purified total RNA (40 ng) was subjected to two consecutive rounds of T7-based amplification using Amino Allyl MessageAmp II kits (Applied Biosystems, Streetsville, ON, Canada) and the resulting aRNA was conjugated to Cy3 and Cy5 dyes (GE Healthcare Bio-sciences, Baie d’Urfé, QC, Canada). RNA concentration and dye incorporation were measured using a UV-VIS spectrophotometer (Nanodrop ND-1000, Agilent Technologies, Wilmington, DE, USA). RNA quality was assessed using a bioanalyser (Agilent Technologies). In parallel, the same labeling procedure was used to amplify and label a universal mouse reference RNA (Stratagene, La Jolla, CA, USA).
Hybridization to 44K whole mouse genome microarray gene expression chips (Agilent Technologies) was conducted following the manufacturer’s protocol (Agilent Technologies) and dye swaps (Cy3 and Cy5) were performed for RNA amplified from each population. Microarray chips were then washed and immediately scanned using a DNA Microarray Scanner (Model G2565BA, Agilent Technologies).

Gene expression analysis
Microarray data were feature extracted using Feature Extraction Software (v. 9.5.3.1) available from Agilent, using the default variables. Outlier features on the arrays were flagged by the same software package. Data preprocessing and normalization were automated using the BIAS system (Finak et al., 2005). Raw feature intensities were background corrected using the RMA background correction algorithm (Irizarry et al., 2003a; Irizarry et al., 2003b) and the resulting expression estimates were converted to log2 ratios. Within-array normalization was done using spatial and intensity-dependent loess (Smyth and Speed, 2003). Median absolute deviation scale normalization was used to normalize between arrays (Yang et al., 2001). Hierarchical clustering was done using Ward's minimum variance method with a correlation distance metric. Differential expression analysis was performed using Linear Models for Microarray Data (limma; Smyth, 2005). If a gene was represented by several probes, only the probe with the largest interquartile range was used. Probes that could not be mapped to any gene were ignored. A gene was considered differentially expressed if it displayed a fold change greater than or equal to 4 and a Holm-adjusted P-value less than or equal to 0.05 between the two categories (Holm, 1979).

RT-qPCR
Total RNA was extracted from the indicated cell populations using RNeasy Mini Kits and QIAshredder columns (Qiagen). According to the manufacturer’s protocol, 1 μg of total RNA was converted to cDNA using a High Capacity cDNA Reverse Transcription Kit (Applied Biosystems).
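The differential-expression criterion stated in the gene expression analysis above (fold change of at least 4 and Holm-adjusted P-value of at most 0.05) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used limma. The function names `holm_adjust` and `differentially_expressed` are hypothetical, and the sketch assumes that the fold-change threshold applies in both directions (up- or down-regulation) to log2 ratios.

```python
import math


def holm_adjust(p_values):
    """Return Holm (1979) step-down adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P-value is multiplied by (m - k + 1), capped at 1.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


def differentially_expressed(log2_ratios, p_values, min_fold=4.0, alpha=0.05):
    """Flag genes with |fold change| >= min_fold and Holm-adjusted P <= alpha.

    Assumption: the threshold is symmetric, so a 4-fold decrease also counts.
    """
    min_log2 = math.log2(min_fold)
    adjusted = holm_adjust(p_values)
    return [
        abs(ratio) >= min_log2 and p <= alpha
        for ratio, p in zip(log2_ratios, adjusted)
    ]
```

For example, `differentially_expressed([2.5, -2.0, 1.0], [0.001, 0.01, 0.001])` flags the first two genes, since a log2 ratio of 2 corresponds to a 4-fold change.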
Following the reverse transcription (RT) reaction, all samples were diluted 1:50 in ddH2O and subjected to real-time PCR analysis with SYBR Green PCR Master Mix (Applied Biosystems). Ten picograms of gene-specific primers (Supplementary Table 4) were used in a total reaction volume of 15 μl. For all targets, the cycling conditions were: 95°C for 10 minutes, followed by 40 cycles each consisting of 95°C for 15 seconds, 60°C for 30 seconds and 72°C for 45 seconds. Incorporation of SYBR Green dye into the PCR products was monitored using a 7500 Real-Time PCR System (Applied Biosystems). Serial dilutions were performed to generate a standard curve for each gene target in order to define the efficiency of the RT-qPCR reaction. The integrity and specificity of the amplified PCR products were confirmed by dissociation curve analysis (SDS 2.0 software, Applied Biosystems). The Pfaffl analysis method was used to measure the relative quantity of gene expression (Pfaffl, 2001). The reference gene, Gapdh, was selected based on its stable expression in all cell populations analyzed. Relative mRNA levels were expressed in terms of fold induction over the parental cell population (4T1p). All measurements were done in triplicate and three independent experiments were performed for each gene target.

Motility and invasion assays
Motility and invasion assays were performed as previously described using 1 × 10^5 cells (Rodrigues et al., 2005). In Supplementary Figure 5, the migration and invasion data are expressed as the number of cells per field, due to the large variability in cell size and shape exhibited by the liver-weak versus liver-aggressive explant populations. To quantify the migration and invasion assays (Supplementary Figure 7), the average pixel count from five independent images was first quantified using Imagescope software (Aperio, Vista, CA, USA). We next calculated the average pixel count of one cell, using five randomly selected individual cells within a field.
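As an illustration of the relative quantification step in the RT-qPCR analysis above, the Pfaffl (2001) ratio can be sketched in Python as follows. The amplification efficiency is derived from the slope of a standard curve (Ct plotted against log10 of input quantity), and the target-gene change is normalized to the reference gene (Gapdh in this study). The function names and the example Ct values are hypothetical and are not taken from the study.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency E = 10^(-1/slope) from a standard-curve slope.

    A slope of about -3.32 corresponds to E = 2 (perfect doubling per cycle).
    """
    if slope >= 0:
        raise ValueError("standard-curve slope must be negative")
    return 10.0 ** (-1.0 / slope)


def pfaffl_ratio(e_target, e_ref,
                 ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample):
    """Relative expression of a target gene in a sample versus a control.

    ratio = E_target^(Ct_control - Ct_sample, target)
            / E_ref^(Ct_control - Ct_sample, reference)
    """
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical example: the target appears 3 cycles earlier in the sample than
# in the control (e.g. the parental population), the reference is unchanged.
fold = pfaffl_ratio(2.0, 2.0, 25.0, 22.0, 18.0, 18.0)  # 2^3 / 2^0 = 8.0
```

With equal efficiencies of 2 for both genes, this ratio reduces to the familiar 2^(-ΔΔCt) calculation.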
The data are expressed as an approximate number of cells per field (total pixel count/pixel count for one cell). In each experiment, two independent inserts were quantified for each explant and the data represent the average of 4 independent experiments.

“Scratch” wound closure assay
Cells were cultured in 6-well plates until they formed confluent monolayers, after which a scratch wound was created using a standard 200-μl pipette tip. Wounded monolayers were washed with PBS and digital images were captured for the 0 hour timepoint with an inverted microscope equipped with a digital camera. Digital images were again captured 6 hours post-wounding and the pictures were analyzed using image analysis software (Imagescope, Aperio). The wound healing effect was calculated as a percentage of wound closure compared with the area of the initial wound. Briefly, the distance between the wound margins was measured at 0 hours and again at 6 hours post-wounding. The following formula was used to evaluate the wound closure: $\%\ \text{of wound closure} = 100 - \left[\left(\text{distance}_{t=6\mathrm{h}} / \text{distance}_{t=0\mathrm{h}}\right) \times 100\right]$. The data represent the average of at least 5 independent experiments (2 wells per experiment).

Human samples
Breast cancer liver metastases were obtained from 24 patients who underwent liver resection surgery either at the Royal Victoria Hospital (Montréal, Canada) or the Hôpital Paul Brousse (Villejuif, France) to remove isolated breast cancer liver metastases. The McGill University Health Center or Assistance Publique-Hôpitaux de Paris ethics review boards approved the protocols. Consent was obtained from all participating patients or next of kin to gain access to the stored tissue blocks. Parameters such as hormone receptor and ErbB-2 receptor status of each liver resection specimen (Supplementary Table 3) were scored by a pathologist using the following antibodies: Estrogen Receptor, Progesterone Receptor and ErbB-2 (Ventana Medical Systems, Oro Valley, AZ, USA).
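The two image-based quantifications described above for the motility, invasion and scratch assays can be sketched in Python as follows. The sketch assumes that the wound closure formula is intended as 100 minus the percentage of the initial gap that remains at 6 hours. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def approximate_cells_per_field(field_pixel_counts, single_cell_pixel_counts):
    """Approximate cell number: mean field pixel count / mean single-cell pixel count."""
    return mean(field_pixel_counts) / mean(single_cell_pixel_counts)


def percent_wound_closure(distance_0h, distance_6h):
    """Percentage of the initial wound width that has closed after 6 hours.

    Assumed form: 100 - (distance at 6 h / distance at 0 h) * 100.
    """
    if distance_0h <= 0:
        raise ValueError("initial wound distance must be positive")
    return 100.0 - (distance_6h / distance_0h) * 100.0


# Hypothetical example: five images per insert and five individual cells.
cells = approximate_cells_per_field([1000, 1200, 800, 1100, 900],
                                    [50, 50, 50, 50, 50])  # 1000 / 50 = 20.0
closure = percent_wound_closure(200.0, 50.0)               # 100 - 25 = 75.0
```

A wound that has not narrowed gives 0% closure, and a fully closed wound (distance of 0 at 6 hours) gives 100%.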
Paraffin-embedded tissues were sectioned and subjected to claudin-2 immunohistochemistry as described above. Breast tumors were obtained from the Breast Cancer Functional Genomics Group at McGill University. The McGill University Health Center board approved the protocol and consent was obtained from all participating patients or next of kin to gain access to the stored tissue blocks. Finally, matched breast tumor and liver metastasis samples were obtained from patients with metastatic breast cancer who were enrolled into the study at the Segal Cancer Centre, following a protocol approved by the Jewish General Hospital research ethics committee. After informed consent had been obtained, the patients underwent ultrasound-guided liver biopsy for diagnostic reasons. All biopsy and primary breast tumor samples were assessed for conventional histopathology as well as for estrogen receptor, progesterone receptor and ErbB-2 receptor expression by a pathologist (Supplementary Table 3). Antigen retrieval for the assessment of estrogen and progesterone receptor status was carried out by heating, followed by immunohistochemical staining using antibodies from Ventana Medical Systems. The cut-off for hormone receptor positivity was greater than 10%. Antigen retrieval for the assessment of ErbB-2 expression was carried out by heating representative paraffin sections with the CB11 monoclonal antibody (Novocastra, Newcastle, UK) followed by protease digestion using TAB250 (Invitrogen). Claudin-2 immunohistochemistry was performed as described above.

Statistical analysis
The significance value associated with the differences in liver metastasis formation between claudin-2 overexpressing cells and controls (Supplementary Figure 3) was obtained using a t-test by considering data from the three weakly aggressive cell lines carrying the empty vector control as one group and data from the three claudin-2 overexpressing cell lines as the other group.
The significance values associated with motility and invasion differences between the weak and liver-aggressive populations (Supplementary Figure 5) were calculated using a Wilcoxon test, performed on each experimental replicate. The resulting P-values were combined in a Z-transform test (Whitlock, 2005).

Supplementary Figure Legends

Supplementary Figure 1. In vivo selection of breast cancer cells that aggressively grow in the liver. (a) Schematic demonstrating the selection procedure. Breast cancer cells are delivered to the liver following splenic injection and cells derived from the liver metastases are explanted back into culture and subjected to additional rounds of injection. (b) Surface lesions quantified over the entire liver surface from mice injected with parental 4T1 (4T1p) cells, first (LivM1) and second (LivM2) round liver metastases explants (n = 16 for 4T1p, n = 9 for 2270 LivM1, n = 20 for 2521 LivM2, n = 18 for 2526 LivM2, n = 8 for 2263 LivM1, n = 20 for 2568 LivM2, n = 9 for 2265 LivM1 and n = 14 for 2557 LivM2). The average number of surface lesions is shown (*, P < 0.05; comparisons are relative to the parental 4T1 population (4T1p)). (c) Quantification of tumor burden (lesion area/tissue area) within the cardiac liver lobe (n = 16 for 4T1p, n = 9 for 2270 LivM1, n = 20 for 2521 LivM2, n = 18 for 2526 LivM2, n = 8 for 2263 LivM1, n = 20 for 2568 LivM2, n = 9 for 2265 LivM1 and n = 14 for 2557 LivM2). The average percentage of total liver area that contains metastatic lesions is shown (*, P < 0.05; comparisons are relative to the parental 4T1 population (4T1p)).

Supplementary Figure 2. Liver-aggressive breast cancer populations have lost expression of tight-junction components. Quantitative real-time PCR analysis was performed for Cldn2, Cldn3, Cldn4, Cldn7 and Ocln in weak and liver-aggressive breast cancer cells normalized to total Gapdh levels.
The data are depicted as fold expression relative to 4T1p cells and are representative of 3 independent experiments performed in triplicate.

Supplementary Figure 3. Claudin-2 overexpression in the liver-weak cell populations promotes establishment and growth in the liver. (a) Immunoblot analysis of claudin-2 expression in cells transfected with a claudin-2 expression vector (C2) or the corresponding empty vector (EV) as a control. Total lysate from the 2869 cells, an in vivo selected liver-aggressive 4T1 sub-population, was used to indicate the relative claudin-2 expression levels in the transfectants. As a loading control, total cell lysates were blotted for α-tubulin. (b) Quantification of the number of surface lesions over the entire liver. (c) The tumor burden (tumor area/tissue area) within the cardiac liver lobe is shown. Cohort sizes: n = 9 for 4T1p-VC, n = 8 for 4T1p-C2, n = 9 for 2648 LivM3-VC, n = 9 for 2648 LivM3-C2, n = 10 for 2801 LivM3-VC and n = 8 for 2801 LivM3-C2 (*, P < 0.05). (d) Representative H&E images of the cardiac liver lobe are shown for mice injected with empty vector (upper) or claudin-2 overexpressing cells (lower). Dotted lines circumscribe the breast cancer lesions. Scale bars represent 2 mm.

Supplementary Figure 4. Reduced expression of claudin-2 has no effect on tumor outgrowth. Tumor growth in the mammary fat pad was assessed by caliper measurement for the liver-aggressive cell populations with claudin-2 knockdown (KD1, KD2) and their corresponding empty vector (EV) controls. Cohort sizes: n = 10 tumors for 2776-EV, 2776-KD1, 2792-KD1, 2792-KD2 and n = 9 for 2792-EV and 2776-KD2.

Supplementary Figure 5. Liver-aggressive 4T1 explants display enhanced motility and invasion compared to their weak counterparts. Representative images of 4T1 explants following migration through modified Boyden chambers with filters containing 8 μm pores (a) or invasion through a Matrigel barrier (c). Quantification of motility (b) and invasion (d) assays.
Results are shown from three independent experiments performed in triplicate.

Supplementary Figure 6. 4T1-derived subpopulations that aggressively grow in the liver are highly motile in a “wound” assay compared to their weak counterparts. Confluent monolayer cultures were scratched with a pipette tip and images were taken immediately after the “wound” was created (0h) and again after 6 hours (6h). Representative images illustrate that the weak populations close the wound as an adherent front of cells (a) whereas liver-aggressive populations (b) close the wound as individual cells. (c) Quantification of percentage wound closure at 6 hours post “wounding” for three independent weak (grey) and three independent aggressive (black) 4T1-derived breast cancer populations. A statistically significant increase in cell motility was observed in all aggressive explants.

Supplementary Figure 7. Effects of elevated or diminished claudin-2 expression on breast cancer motility and invasion. (a) Quantification of percentage wound closure at 6 hours post “wounding” in the liver-weak cell populations overexpressing claudin-2 (C2) and their corresponding empty vector (EV). (b) Quantification of breast cancer cell invasion in a modified Boyden chamber assay using the liver-weak cell populations overexpressing claudin-2 (C2) and their corresponding empty vector (EV). (c) Quantification of percentage wound closure at 6 hours post “wounding” in the liver-aggressive cell populations with a claudin-2 knockdown (KD1, KD2) and their corresponding empty vector (EV) and parental cell (P) controls. (d) Quantification of breast cancer cell invasion in a modified Boyden chamber assay using liver-aggressive cell populations with a claudin-2 knockdown (KD1, KD2) and their corresponding empty vector (EV) and parental cell (P) controls.

Supplementary Figure 8. Claudin-2 expression enhances attachment to type IV collagen and fibronectin.
Quantification of cancer cell adhesion to collagen type IV (a) or fibronectin-coated plates (b) using liver-weak explant populations overexpressing claudin-2 compared to empty vector controls (EV). Statistically significant increases in adhesion were observed in all claudin-2 overexpressing cells (*, P < 0.05).

Supplementary Figure 9. Claudin-2 knockdown (KD) in MDA-MB-231 cells has no effect on claudin-4 expression. Expression of claudin-4 and claudin-2 was assessed by immunoblot analysis. As a loading control, total cell lysates were blotted for α-tubulin. P, parental MDA-MB-231; EV, empty vector control.

Supplementary Figure 10. Reduced claudin-2 expression in MDA-MB-231 cells correlates with diminished levels of integrin α2β1 and α5β1 cell surface expression. The signal from MDA-MB-231-EV cells was divided equally (black line in Figure 5B), representing high or low surface expression of the integrin complexes. Bars show the proportion of MDA-MB-231 KD1 and MDA-MB-231 KD2 cells that display high or low surface expression for each integrin complex relative to the MDA-MB-231 EV control. The decrease in high surface expression of α2β1 and α5β1 in both individual knockdown (KD) cell lines is significant compared to the empty vector (EV) control (P < 0.05).

Supplementary References

Finak G, Godin N, Hallett M, Pepin F, Rajabi Z, Srivastava V et al (2005). BIAS: Bioinformatics Integrated Application Software. Bioinformatics 21: 1745-6.

Holm S (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65-70.

Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP (2003a). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31: e15.

Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U et al (2003b). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4: 249-64.

Pfaffl MW (2001).
A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 29: e45.

Rodrigues SP, Fathers KE, Chan G, Zuo D, Halwani F, Meterissian S et al (2005). CrkI and CrkII function as key signaling integrators for migration and invasion of cancer cells. Mol Cancer Res 3: 183-94.

Smyth GK (2005). Limma: linear models for microarray data. In: Gentleman R (ed). Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Springer: New York. pp 397-420.

Smyth GK, Speed T (2003). Normalization of cDNA microarray data. Methods 31: 265-73.

Whitlock MC (2005). Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. J Evol Biol 18: 1368-73.

Yang YH, Buckley MJ, Speed TP (2001). Analysis of cDNA microarray images. Brief Bioinform 2: 341-9.