Li, Stamatoyannopoulos & Emery (1)

SUPPLEMENTARY METHODS

Integration site analysis

Most vector integration sites from HT1080 cell clones were identified by inverse PCR as previously described,¹ with minor modifications. In short, genomic DNA was digested individually with PstI, HindIII, BglII, SphI, BamHI, or XbaI, or with a combination of BglII and BamHI, all of which cut within the vector provirus. The restriction enzymes were inactivated, and approximately 100 ng of digested DNA was ligated with 400 units of T4 DNA ligase in a 20 µl reaction volume. Provirus-genomic junction fragments were then amplified by nested PCR using vector-specific primers (first primer pair, 5'-CTAGAAACTGCTGAGGGCGG and 5'-CTGATCCTTGGGAGGGT; nested primer pair, 5'-TCCTAACCTTGATCTGA and 5'-CAGATTGATTGACTGCC). The resulting PCR products were separated by gel electrophoresis, excised, and column purified with the QIAquick Gel Extraction Kit (Qiagen).
Junction fragments were then sequenced either directly, using the PCR fragments as template and the nested PCR oligonucleotides as primers, or after subcloning into TOPO TA cloning vectors (Invitrogen Corp., Carlsbad, CA).

Some vector integration sites from HT1080 cell clones were identified by a directed DNA library screening approach as previously described,² with modifications. First, genomic DNA was digested with XbaI (which cuts once in each vector LTR), and half of the digested product was subjected to Southern analysis, essentially as described above, using vector LTR probes located either proximal (5' LTR) or distal (3' LTR) to the XbaI site, in order to identify the band sizes of individual provirus junction fragments. The remaining digested DNA was then separated by gel electrophoresis under the same conditions used for the Southern analysis, and band-specific regions were excised and column purified. Extracted DNA was subcloned into pUC19 cloning vectors, and the resulting libraries were screened by conventional colony lifts and hybridization using the same proximal LTR probe. Plasmid inserts from positive colonies were then sequenced.

Some vector integration sites in HT1080 cell clones, and all vector integration sites in 32D cell clones, were identified by linear amplification-mediated PCR (LAM-PCR), essentially as previously described.³ In short, single-stranded copies of the viral 5' LTR junction fragments were first generated by 100 cycles of linear amplification using biotinylated primers specific for the proximal region of the vector LTRs (vector MGPN2, 5'-biotin-TTCTCTAGAAACTGCTGAGG; vector INS4(+), 5'-biotin-ATTCTAAATCTCTCTTTCAGCC).
The products were then isolated using streptavidin-coated magnetic M-280 Dynabeads (Dynal Biotech, Oslo, Norway), converted to double-stranded DNA using random hexamers and Klenow polymerase, digested with Tsp509I, HaeIII, or RsaI (which cut within the genomic sequences), and capped with anchor primers compatible with the restricted ends. The vector-genomic junction fragments were eluted from the Dynabead matrix and amplified by two additional rounds of nested PCR using primers specific to the vector LTR and anchor primer sequences. The resulting LAM-PCR products were separated by gel electrophoresis, excised, and column purified. Junction fragments were then sequenced either directly, using the PCR fragments as template and the nested PCR oligonucleotides as primers, or after subcloning into TOPO TA cloning vectors.

Sequences were BLAST searched against either the human genome (March 2006 assembly) or the mouse genome (February 2006 assembly) using the UCSC Genome Browser (http://genome.ucsc.edu/), as previously described.⁴ Insertion sites were considered authentic if they contained adjoining retroviral sequences and gave a unique best match with better than 90% identity. Analysis of integration sites relative to flanking transcription units was also performed using the UCSC Genome Browser and included all known genes (UCSC Known Genes based on UniProt, RefSeq, and GenBank mRNA).

Simulated random integration datasets were generated essentially as described.⁴ In short, random sites in the human or mouse genomes were chosen using a random number generator. Sequences of approximately the same length as the experimental sequences (50 bp) were then identified adjacent to these sites and BLAST searched using the same criteria applied to the experimental datasets.
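The authenticity criteria and the random-site simulation described above can be expressed compactly. The Python sketch below is illustrative only, not the authors' pipeline: it assumes a hypothetical hit record (chromosome, position, percent identity, alignment score) rather than any particular BLAST output format, and it assumes sites are drawn uniformly over the genome, which is one reasonable reading of "chosen using a random number generator". The names `Hit`, `is_authentic`, and `simulate_random_sites` are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Hit:
    """One genome alignment for a junction sequence (hypothetical record layout)."""
    chrom: str
    position: int
    identity: float  # percent identity, 0-100
    score: float     # alignment score used to rank hits


def is_authentic(has_vector_junction: bool, hits: list[Hit],
                 min_identity: float = 90.0) -> bool:
    """Apply the two stated criteria: adjoining vector sequence, and a unique
    best genomic match with better than min_identity percent identity."""
    if not has_vector_junction or not hits:
        return False
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    best = ranked[0]
    # A tie for the top score means the site cannot be placed uniquely.
    unique = len(ranked) == 1 or ranked[1].score < best.score
    return unique and best.identity > min_identity


def simulate_random_sites(chrom_sizes: dict[str, int], n: int, read_len: int = 50,
                          seed: int | None = None) -> list[tuple[str, int, str]]:
    """Draw n sites uniformly over the genome (assumption); each site is returned
    as (chromosome, 0-based start, strand) of a read_len-bp adjacent window."""
    rng = random.Random(seed)
    # Count the start positions at which a full window fits on each chromosome.
    valid = {c: size - read_len + 1 for c, size in chrom_sizes.items() if size >= read_len}
    total = sum(valid.values())
    sites = []
    for _ in range(n):
        offset = rng.randrange(total)  # uniform over all valid start positions
        for chrom, count in valid.items():
            if offset < count:
                sites.append((chrom, offset, rng.choice("+-")))
                break
            offset -= count
    return sites
```

In practice the simulated windows would then be extracted from the genome assembly and aligned, keeping only those that give a unique best match above 90% identity, so that the random and experimental datasets pass through the same filter.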
Expression microarray analysis

The transduced HT1080 cell clones were screened for dysregulated cellular genes using CodeLink UniSet Human 20K I Bioarrays and gene expression system (Amersham / GE Healthcare Bio-Sciences Corp., Piscataway, NJ), following the manufacturer's directions. These arrays include approximately 20,000 human genes. Total RNA from HT1080 cell clones and from two independent aliquots of untransduced HT1080 cells was prepared by column purification (RNeasy Mini Kit, Qiagen) and used as template to prepare biotin-labeled cRNA target by linear amplification. Labeled target was then fragmented and hybridized to individual bioarrays (one array per clone or control). The hybridized arrays were washed, stained with Cy5-streptavidin, and scanned using a GenePix 4000A analyzer (Axon Instruments / Molecular Devices, Sunnyvale, CA). Expression levels were first analyzed using the manufacturer's software (CodeLink EXP v4.1) to assess overall signal quality and to establish minimum thresholds for signal reliability. Pair-wise comparisons between each individual array and all of the remaining arrays (two untransduced controls and 86 transduced clones) were then performed using GeneSifter software (VizX Lab LLC, Seattle, WA). For this purpose, signals were normalized to array means, and individual spots were excluded if they were background-contaminated, irregularly shaped, near background, or saturated; no other transformations or corrections were made. A gene was considered dysregulated in a given cell clone if (i) its signal in that clone was either 5-fold higher or 5-fold lower than the mean signal intensity of the remaining cell clones and untransduced controls, (ii) its signal was considered reliable by the manufacturer's criteria, and (iii) it was not found to be dysregulated in more than one clone.
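The dysregulation rule above can be sketched as follows. This is a re-implementation for illustration, not GeneSifter's algorithm. It assumes that spots failing the reliability or quality filters have already been removed from the input, treats "5-fold higher or 5-fold lower" as a normalized ratio of at least 5 or at most 1/5, and uses hypothetical function names.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def normalize_to_array_mean(arrays: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Divide every retained spot by its own array's mean signal."""
    normalized = {}
    for name, signals in arrays.items():
        array_mean = mean(signals.values())
        normalized[name] = {gene: value / array_mean for gene, value in signals.items()}
    return normalized


def dysregulated_genes(arrays: dict[str, dict[str, float]], clones: list[str],
                       fold: float = 5.0) -> dict[str, str]:
    """Return {gene: clone} for genes that pass the fold cut-off in exactly one
    clone, relative to the mean of all other arrays (clones plus controls)."""
    norm = normalize_to_array_mean(arrays)
    flagged: dict[str, set[str]] = defaultdict(set)  # gene -> clones passing the cut-off
    for clone in clones:
        for gene, value in norm[clone].items():
            # Reference level: every other array that still has a reliable spot.
            others = [norm[a][gene] for a in norm if a != clone and gene in norm[a]]
            if not others:
                continue
            reference = mean(others)
            if reference <= 0:
                continue
            ratio = value / reference
            if ratio >= fold or ratio <= 1.0 / fold:
                flagged[gene].add(clone)
    # Criterion (iii): discard genes flagged in more than one clone.
    return {gene: next(iter(cs)) for gene, cs in flagged.items() if len(cs) == 1}
```

Controls enter only as part of the reference mean; they are never tested themselves, matching the rule that dysregulation is scored "within any one cell clone".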
Statistical analysis

Most comparisons between discrete datasets were performed using the Kolmogorov-Smirnov (KS) test.⁵ This is a non-parametric, distribution-free method that does not require the datasets to be normally distributed. Where comparisons were made between the means of small matched datasets with apparently normal distributions, we used the paired, two-tailed Student's t-test. Where comparisons were made between two discrete proportions (frequencies), we used the Z-test for two proportions. Kaplan-Meier survival curves were analyzed using the log-rank test and the chi-squared distribution.

To estimate the frequency of vector-mediated tumor formation (Fig. 5a), we first estimated the number of independent transformation events from the fraction of tumor-free animals at 130 days, using the Poisson distribution. Because the fraction of animals with no transformation event is e^(-m), where m is the mean number of events per animal, the total number of events in a group of 10 animals is 10 × (-ln of the tumor-free fraction): for vector MGPN2, 1 of 10 animals surviving indicated 10 × ln(10) ≈ 23 independent transformation events; for vector INS4(+), 4 of 10 animals surviving indicated 10 × ln(2.5) ≈ 9 independent transformation events. We then divided the estimated number of transforming events by the estimated number of cells transduced during the original transduction culture (a total of 37,700 for vector MGPN2 and 89,680 for vector INS4(+); Supplementary Table 3, experiments D and E). These ratios were then compared by the one-sided Z-test for two proportions.

To estimate the number of simulated random integration events expected within ±40 Mb of dysregulated genes (Table 1), we first mapped 100 simulated random integration sites relative to the dysregulated genes. This analysis revealed 29 cases (out of 32 dysregulated genes) in which a unique simulated random integration site was located within a 40 Mb window of a unique dysregulated gene, for an overall risk of 1 in 100 for each of these 29 genes (and 0 for the remaining 3 genes).
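The Poisson estimate, the comparison of tumor frequencies, and the per-gene ±40 Mb window risk above reduce to short arithmetic, sketched here in Python. This is an illustration under stated assumptions, not the authors' code: the text does not say whether the Z-test used a pooled or unpooled variance (the sketch assumes pooled), "±40 Mb" is taken as the distance from a site to the nearer gene boundary, and all function names are hypothetical.

```python
from __future__ import annotations

import math


def poisson_events(n_animals: int, n_tumor_free: int) -> float:
    """Total independent events from the Poisson zero class:
    P(no event) = exp(-m), so m = -ln(P(no event)) per animal."""
    if not 0 < n_tumor_free <= n_animals:
        raise ValueError("need at least one tumor-free animal to bound m")
    m_per_animal = -math.log(n_tumor_free / n_animals)
    return n_animals * m_per_animal


def z_test_two_proportions(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion Z-test (assumed); returns (z, one-sided p for p1 > p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def window_risk(gene: tuple[str, int, int], sites: list[tuple[str, int]],
                window: int = 40_000_000) -> float:
    """Fraction of simulated sites within +/- window bp of a gene (chrom, start, end)."""
    chrom, start, end = gene
    near = 0
    for site_chrom, pos in sites:
        if site_chrom != chrom:
            continue
        distance = 0 if start <= pos <= end else min(abs(pos - start), abs(pos - end))
        if distance <= window:
            near += 1
    return near / len(sites)


# Inputs from the text: 1/10 and 4/10 mice tumor-free at 130 days.
mgpn2_events = round(poisson_events(10, 1))  # 10 * ln(10) = 23.03 -> 23
ins4_events = round(poisson_events(10, 4))   # 10 * ln(2.5) = 9.16 -> 9
z, p = z_test_two_proportions(mgpn2_events, 37_700, ins4_events, 89_680)
print(f"MGPN2 {mgpn2_events}/37,700 vs INS4(+) {ins4_events}/89,680: z = {z:.2f}, p = {p:.1e}")
```

With the reported inputs the sketch reproduces the 23 and 9 event estimates; with 100 simulated sites, a gene that has exactly one site in its window receives the risk of 0.01 quoted above.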
We then calculated the relative risk for each cell clone by multiplying the risk for the dysregulated genes present in that clone (either 0 or 0.01) by the number of authentic vector proviruses present within that clone. Finally, we calculated the cumulative risk for all such occurrences by summing over all clones in each vector panel. Since none of the simulated random integration sites were located within the body of a dysregulated gene, we set the risk of integration within the body of a dysregulated gene to 0.

References
1. Nolta JA, Dao MA, Wells S, Smogorzewska EM, Kohn DB. Transduction of pluripotent human hematopoietic stem cells demonstrated by clonal analysis after engraftment in immune-deficient mice. Proc Natl Acad Sci USA. 1996;93:2414-2419.
2. Li CL, Coullin P, Bernheim A, Joliot V, Auffray C, Zoroob R, Perbal B. Integration of Myeloblastosis Associated Virus proviral sequences occurs in the vicinity of genes encoding signaling proteins and regulators of cell proliferation. Cell Commun Signal. 2006;4:1-15.
3. Harkey MA, Kaul R, Jacobs MA, et al. Multiarm high-throughput integration site detection: limitations of LAM-PCR technology and optimization for clonal analysis. Stem Cells Dev. 2007;16:381-392.
4. Aker M, Tubb J, Miller DG, Stamatoyannopoulos G, Emery DW. Integration bias of gammaretrovirus vectors following transduction and growth of primary mouse hematopoietic progenitor cells with and without selection. Mol Ther. 2006;14:226-235.
5. Horn SD. Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics. 1977;33:237-247.