RESEARCH ARTICLE © 2003 Nature Publishing Group http://www.nature.com/naturebiotechnology

Scanning the human genome with combinatorial transcription factor libraries

Pilar Blancafort, Laurent Magnenat, and Carlos F. Barbas III∗

Published online 18 February 2003; doi:10.1038/nbt794

Despite the critical importance of transcription factors in mediating gene regulation, there exists no general, genome-wide tool that uses transcription factors to induce or silence a target gene or select for a particular phenotype. In the strategy described here, we prepared large combinatorial libraries of artificial transcription factors comprising three or six zinc-finger domains, and selected transcription factor–DNA interactions able to upregulate several genes in human cells. Selected transcription factors either induced the expression of an endothelial-specific differentiation marker, VE-cadherin, in non-endothelial cell lines or, when combined with a repression domain, knocked down expression. Potential binding sites for a number of these transcription factors were mapped along the promoter of CDH5, the gene encoding VE-cadherin. Transcription factor libraries represent a useful approach for studying and modulating gene function in cells and potentially in whole organisms.

Regulatory sequences and their attendant transcription factors provide the spatial-temporal cues that direct when, where, and to what extent a given gene is expressed. Most regulatory sequences contain binding sites for repertoires of transcription factors that mediate activation or repression of target genes. Considerable efforts have been devoted to engineering artificial sequence-specific transcription factors able to regulate specific genes, particularly therapeutic targets1.
Compared with other approaches to studying gene function, such as RNA interference, ribozymes, or antisense RNA, which provide only knock-down phenotypes2,3, transcription factor–based tools can generate both loss-of-function phenotypes (when the transcription factor is linked to a repressor domain) and gain-of-function phenotypes (when it is linked to an activator domain). Nevertheless, no general genome-wide transcription factor tools have been described4. Current transcription factor–based strategies involve the individualized design and testing of transcription factors targeted to particular genes. Modular zinc-finger DNA-recognition domains allow the assembly of transcription factors with predictable in vitro specificity. Such ‘de novo’ design has been used successfully for the regulation of a small number of genes (including ERBB2, ERBB3, VEGF, and EPO5–8). However, rational design has not always yielded functional regulators in vivo, mainly because knowledge of both regulatory areas and of endogenous factors affecting transcription factor–DNA interactions (such as chromatin structure, accessibility of the regulatory area, DNA modifications, and the presence of other cellular or tissue-specific factors) is often very limited6,7. In the combinatorial strategy described here, large libraries of artificial transcription factors were created and used to select in vivo protein-DNA interactions that confer a desired phenotype or molecular function to human cells through the activation of one or more genomic loci.

Results and discussion

Construction of zinc-finger libraries. We created libraries of zinc-finger transcription factors (TFZFs) for the recognition of DNA target sites of 9 and 18 base pairs (bp). Zinc-finger domains have exquisite sequence specificity and modularity1.
Previous studies have identified α-helical sequences in the zinc-finger domain that confer specific recognition of 3 bp of DNA sequence, and have shown that these domains can be recombined to prepare polydactyl zinc-finger proteins of desired specificity5–13. Because the zinc-finger domains used were already characterized, potential DNA binding sites could be predicted for each TFZF after the functional screen or selection was done. We created the 3ZF library by combinatorial assembly of three different zinc-finger repertoires (ZF1, ZF2, and ZF3). Each repertoire consisted of an equimolar mixture of a subset of defined zinc-finger DNA sequences, each encoding a characteristic α-helical element previously optimized to provide specific recognition of 3 bp of DNA (Fig. 1A). Combining the available recognition helices for ZF1, ZF2, and ZF3 (all the helices recognizing DNA triplets of type GNN and a subset of the ANN and TNN triplets8,10,12) yielded a 9,177-member, 9 bp–targeting 3ZF library. The 3ZF library was then used as a template to assemble the 18 bp–targeting 6ZF library (8.4 × 10⁷ members). TFZFs were linked to a potent transcriptional activation domain (VP64)10. We expected that the 3ZF library would recognize a subset of genomic DNA sequences of type 5′-(NNN)₃-3′, whereas the 6ZF library would recognize a subset of genomic sequences of type 5′-(NNN)₆-3′. Given the zinc-finger domains used, both libraries were more likely to recognize (RNN)ₓ-type sequences (R = G or A). In theory, the human genome contains 750 million (RNN)₃ sites (considering both strands) and 93.75 million (RNN)₆ sites14. Although any (RNN)₃-binding TFZF might be expected to bind many sites in the genome, in the living cell many of these binding sites would be inaccessible or in regions with no impact on regulation. By contrast, (RNN)₆-binding TFZFs can, in principle, bind unique sites in the genome.

Screening for upregulation of target genes in human cell lines.
We delivered millions of TFZFs into the human squamous carcinoma cell line A431 using a retroviral vector, pMX-IRES-GFP15 (Fig. 1B). Infection efficiency and expression of individual library members were tracked with the green fluorescent protein (GFP) marker. Cells overexpressing a target gene product on the cell surface were selected by flow cytometry. The DNA encoding the zinc-finger domain was recovered by PCR and re-cloned into the retroviral vector for subsequent rounds of selection. Finally, individual TFZF clones were isolated and sequenced, and their functional properties were analyzed in vivo and in vitro.

Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037. ∗Corresponding author (carlos@scripps.edu).
www.nature.com/naturebiotechnology • MARCH 2003 • VOLUME 21 • nature biotechnology 269

A431 cells infected with the 3ZF and 6ZF libraries were screened with monoclonal antibodies against ten different markers: vascular endothelial cadherin (also known as VE-cadherin; cadherin-5 type 2, CDH5; CD144); 3-FAL selectin ligand (fucosyltransferase-4, FUT4; CD15); Apo1-FAS antigen (tumor necrosis factor receptor superfamily member 6, TNFRSF6; CD95); integrin-α6 (ITGA6; CD49f) and integrin-β4 (ITGB4; CD104); the adhesion molecules CD54 (intercellular adhesion molecule, ICAM-1) and leukocyte function-associated antigen (LFA-3; CD58); and the receptors erythroblastic leukemia viral oncogene homolog-2 (ERBB2), ERBB-3, and epidermal growth factor (EGF). Independent selections were carried out for each marker (Fig. 1B). These markers were chosen because they localize on the cell surface, facilitating cell sorting, and because they are involved in important aspects of tumor biology such as cell proliferation, adhesion, or migration. After three rounds of selection with the 3ZF library and four rounds with the 6ZF library, pools of infected A431 cells were analyzed by flow cytometry. For both libraries, five cell surface markers showed changes in expression levels (Fig. 1 and Supplementary Fig. 1 online): ERBB-2 and VE-cadherin were the most highly regulated by the 3ZF library (Fig. 1C,D), and VE-cadherin and ICAM-1 were the most highly regulated by the 6ZF library (Fig. 1E,F). The remaining three markers showed only small changes in gene expression with both 3ZF and 6ZF selections.

Both the 3ZF (Fig. 1D) and 6ZF (Fig. 1E) libraries induced expression of the strictly endothelial-specific marker VE-cadherin. This marker is not significantly expressed in A431 cells, as determined by FACS and RT-PCR. VE-cadherin is a transmembrane glycoprotein that self-associates in the adherens junctions of endothelial cells, controlling the permeability of the endothelium16. In addition, VE-cadherin is necessary for vascular morphogenesis17 and is involved in several aspects of angiogenesis18, tumor growth, and metastasis19,20. We focused further studies on the characterization of TFZFs activating its associated gene, CDH5.

Figure 1. The TFZF library design. (A) The TFZF library construction based on the modular organization of protein-DNA contacts. (B) Screening for functional TFZF activators in A431 cells. (C–F) Flow cytometric analysis of A431 cells infected with some of the selected pMX-TFZF pools from the 3ZF selections (C, D) or 6ZF selections (E, F). Shown are upregulation of ERBB-2 (C), VE-cadherin (D, E), and ICAM-1 (F). Blue, A431 cells infected with the selected pMX-TFZF pools and stained with the corresponding antibody; orange, A431 cells infected with the 3ZF or 6ZF unselected libraries; green, mock-infected cells; stippled line, control staining without primary antibody.

Table 1. 6ZF (top) and 3ZF (bottom) clones activating VE-cadherin

TFZF    ZF helices (a)                                          Target site (b)                    Fold act. (c)  Kd (nM) (d)
        F6       F5       F4       F3       F2       F1       Half-site 1 – Half-site 2
144-3   QSSSLVR  TSGHLVR  RSDHLTT  QSSNLVR  QLAHLRA  QSSHLVR  5′-GTA GGT TGG – GAA AGA GGA-3′    8×             n.d.
144-4   QAGHLAS  RSDDLVR  TSGELVR  QAGHLAS  RSDKLVR  DPGALVR  5′-TGA GCG GCT – TGA GGG GTC-3′    10×            23
144-5   TSGELVR  QLAHLRA  QSGDLRR  TSGSLVR  DPGNLVR  QSSNLAS  5′-GCT AGA GCA – GTT GAC TAA-3′    20×            n.d.
144-13  QSGDLRR  DPGNLVR  TSGHLVR  REDNLHT  DPGNLVR  DCRDLAR  5′-GCA GAC GGT – TAG GAC GCC-3′    80×            74
144-23  DPGALVR  QLAHLRA  QSSHLVR  QSSSLVR  DPGHLVR  QSSSLVR  5′-GTC AGA GGA – GTA GGC GTA-3′    79×            n.d.
144-29  QAGHLAS  QAGHLAS  TSGHLVR  QSSSLVR  DPGHLVR  QRANLRA  5′-TGA TGA GGT – GTA GGC AAA-3′    27×            n.d.
VE-1    –        –        –        REDNLHT  RSDKLVR  QSSNLVR  5′-TAG GGG GAA-3′                  80×            95
VE-5    –        –        –        RSDKLVR  TSGNLVR  QRANLRA  5′-GGG GAT AAA-3′                  4×             n.d.
VE-8    –        –        –        RSDKLVR  QSSNLVR  QRANLRA  5′-GGG GAA AAA-3′                  30×            1,009
VE-13   –        –        –        TSGSLVR  QSSNLVR  RSDNLVR  5′-GTT GAA GAG-3′                  5×             n.d.
VE-18   –        –        –        TSGHLVR  QAGHLAS  RSDDLVR  5′-GGT TGA GCG-3′                  7×             n.d.

The DNA-interacting helices are presented with the predicted 18-bp or 9-bp target site, together with the fold activation of the endogenous VE-cadherin gene. 144 clones are 6ZF proteins; VE clones are 3ZF proteins; n.d., not determined.
(a) ZF helices are positioned in anti-parallel orientation (COOH-F6 to F1-NH2) relative to the DNA target sequence. Amino acid positions –1 to +6 of each DNA recognition helix are shown.
(b) Predicted target DNA sequences are presented in the 5′→3′ orientation.
(c) Fold change of expression from FACS data, determined relative to the primary unselected library (3ZF or 6ZF library).
(d) Dissociation constant (Kd) determined by gel shift assay. Data represent the average of two to four experiments.

Figure 2. Specificity of isolated TFZF clones in vivo and in vitro. (A) A431 cells were infected with different pMX-TFZF constructs (containing the VP64 activator domain), stained with ten different antibodies, and analyzed by flow cytometry. Blue, A431 cells infected with pMX-TFZF VE-1 (a single clone selected for VE-cadherin activation); orange, A431 cells infected with the 3ZF unselected library; green, mock-infected cells; stippled line, control staining without primary antibody. Genes encode: CD58, leukocyte function-associated antigen; CDH5, VE-cadherin (CD144); EGF, epidermal growth factor; FUT4, 3-FAL selectin ligand (CD15); ICAM1, intercellular adhesion molecule (CD54); ERBB2, ERBB3, erythroblastic leukemia viral oncogene homolog-2 and -3; ITGA6, integrin-α6 (CD49f); ITGB4, integrin-β4 (CD104); TNFRSF6, Apo1-FAS antigen (CD95). (B, C) DNA-binding ELISA of the selected 6ZF (B) and 3ZF (C) protein domains expressed as fusions with MBP. All TFZFs were selected for VE-cadherin upregulation except 54.3, which was selected for ICAM-1 activation. The DNA substrates contained the 18-bp or 9-bp predicted binding site for each 6ZF or 3ZF protein, respectively (Table 1).

In vitro and in vivo analysis of TFZFs regulating CDH5. The sequences of the TFZFs regulating CDH5 and their predicted binding sites are presented in Table 1. From a total of 48 3ZF clones and 36 6ZF clones tested, a number of sequences were identical at the nucleotide level, indicating selective pressure for particular clones from the libraries. Some TFZFs induced strong CDH5 expression, for example the 6ZF clone 144-13 and the 3ZF clones VE-1 and VE-8. To test the specificity of these TFZFs for CDH5, we delivered TFZFs into A431 cells and probed them with antibodies specific for ten different cell surface markers. The TFZF clones VE-1, VE-5, VE-8, VE-13, 144-4, 144-5, and 144-13 preferentially activated CDH5 compared with the other genes tested (Fig. 2 and Supplementary Fig. 2 online). The 3ZF clone VE-1 was the most specific TFZF regulator in vivo, as determined by FACS (Fig. 2A).
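The text infers selective pressure from the recovery of identical sequences among 48 3ZF and 36 6ZF clones. As a rough plausibility check (our own illustration, not an analysis reported in the paper), the birthday-problem sketch below estimates how likely duplicates would be if clones were drawn uniformly from the unselected libraries. The function name is hypothetical, and the uniform, bias-free sampling is an assumption.

```python
"""How often would identical clones appear by chance in an unselected library?"""
import math


def p_any_duplicate(n_clones: int, library_size: int) -> float:
    """Birthday-problem probability that at least two of n_clones are identical.

    Assumes every library member is equally likely and clones are drawn
    independently, i.e. no selection and no PCR or cloning bias.
    """
    # log P(all distinct) = sum over i of log(1 - i/M); log1p keeps precision
    # when i/M is tiny, as it is for the 6ZF library.
    log_all_distinct = sum(math.log1p(-i / library_size) for i in range(n_clones))
    return -math.expm1(log_all_distinct)  # 1 - P(all distinct)


if __name__ == "__main__":
    three_zf = 23 * 21 * 19   # 9,177 theoretical 3ZF members
    six_zf = three_zf ** 2    # ~8.4e7 theoretical 6ZF members
    print(f"48 of 3ZF: P(duplicate) = {p_any_duplicate(48, three_zf):.3f}")
    print(f"36 of 6ZF: P(duplicate) = {p_any_duplicate(36, six_zf):.1e}")
```

Under these assumptions, some duplication among 48 3ZF clones would occur by chance roughly 12% of the time, whereas for 36 6ZF clones the chance is on the order of 10⁻⁵, so repeated 6ZF sequences in particular are hard to explain without selection. PCR amplification and re-cloning between rounds also skew representation, so this is only a baseline.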
3ZF proteins may be capable of binding multiple sites in the human genome and activating, to varying degrees, more than one gene. Depending on the application, this could be a limitation or an advantage. To verify that the selected TFZFs bound their predicted DNA substrates in vitro, we expressed the zinc-finger binding domains as C-terminal fusions with bacterial maltose-binding protein (MBP). The DNA-binding specificity of each fusion protein was tested by ELISA using a panel of DNA substrates (Fig. 2B,C). The predicted DNA binding site of each TFZF was decoded from the α-helical sequence of the corresponding zinc finger (Table 1). As expected, the majority of the TFZFs specifically bound their predicted target site in vitro. Notably, some of the α-helices selected in TFZFs VE-1, VE-5, and VE-8 were identical or very similar (Table 1), which explains their similar binding-site preferences (Fig. 2C). TFZFs VE-1 and VE-8 shared two identical α-helices that interact with the subsequence 5′-GGGGAA-3′, so that both VE-1 and VE-8 recognized the predicted VE-1 target site. The binding-site preferences of these proteins, and in particular the strong recognition of the same target site by both VE-1 and VE-8, raise the possibility that these TFZFs were selected to bind partially overlapping genomic sites.

To verify that the selected TFZFs regulated CDH5 at the level of transcription, we analyzed CDH5 mRNA levels in A431 cells infected with clones 144-4, 144-13, and VE-1 by RT-PCR, using human umbilical vein endothelial cells (HUVEC), which express CDH5, as a positive control. A specific CDH5 product was detected in A431 cells infected with the TFZF constructs, indicating that these clones upregulate CDH5 at the level of transcription (Fig. 3A,B). Next, we investigated whether the TFZFs were able to directly activate the proximal human CDH5 promoter.
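The decoding step described above, and the observation that VE-1 and VE-8 share the subsequence 5′-GGGGAA-3′, can be sketched as a table lookup followed by a longest-common-substring search. In this minimal Python sketch the helix-to-triplet dictionary is reconstructed only from the helix/site pairs implied by Table 1 rather than from a general recognition code. Context effects between neighboring fingers are ignored, and the names `HELIX_TO_TRIPLET`, `predicted_site`, and `longest_common_substring` are hypothetical.

```python
"""Decode predicted target sites from zinc-finger helices (Table 1 lookup)."""

# Helix (positions -1..+6) -> 3-bp subsite, as implied by the predicted
# target sites in Table 1. Only helices that appear in Table 1 are listed.
HELIX_TO_TRIPLET = {
    "QSSNLVR": "GAA", "DPGNLVR": "GAC", "RSDNLVR": "GAG", "TSGNLVR": "GAT",
    "QSGDLRR": "GCA", "DCRDLAR": "GCC", "RSDDLVR": "GCG", "TSGELVR": "GCT",
    "QSSHLVR": "GGA", "DPGHLVR": "GGC", "RSDKLVR": "GGG", "TSGHLVR": "GGT",
    "QSSSLVR": "GTA", "DPGALVR": "GTC", "TSGSLVR": "GTT",
    "QRANLRA": "AAA", "QLAHLRA": "AGA",
    "QSSNLAS": "TAA", "REDNLHT": "TAG", "QAGHLAS": "TGA", "RSDHLTT": "TGG",
}


def predicted_site(helices_n_to_1: list[str]) -> str:
    """Return the 5'->3' site for helices listed from F_n down to F1.

    Fingers bind anti-parallel to the DNA: F_n contacts the 5'-most
    triplet and F1 the 3'-most, so this order reads the site 5'->3'.
    """
    return "".join(HELIX_TO_TRIPLET[h] for h in helices_n_to_1)


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    ve1 = predicted_site(["REDNLHT", "RSDKLVR", "QSSNLVR"])  # F3, F2, F1
    ve8 = predicted_site(["RSDKLVR", "QSSNLVR", "QRANLRA"])
    print(ve1, ve8)                               # TAGGGGGAA GGGGAAAAA
    print(longest_common_substring(ve1, ve8))     # GGGGAA
```

The same lookup reproduces the 18-bp sites of the 6ZF clones when six helices are supplied (F6 to F1).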
In mice, a promoter fragment (–2486 to +24) is sufficient to drive endothelial-specific expression of a reporter gene in transgenic animals21. We cloned the homologous region of the human CDH5 promoter upstream of a luciferase reporter and carried out transactivation studies using TFZFs in transient transfection assays of A431 cells (Fig. 4). Only TFZFs VE-1, VE-5, and VE-8 strongly activated the CDH5 promoter (up to 200-fold; Fig. 4A and Supplementary Fig. 3 online). We mapped the VE-1, VE-5, and VE-8 response elements in the CDH5 promoter using serial deletions of the promoter. Important transactivation determinants of VE-1 were located between positions –2369 and –1861, whereas VE-5 and VE-8 responded significantly to elements located between nucleotides –1861 and –1342 (Fig. 4A). In addition, both VE-1 and VE-8 (but not VE-5) activated the proximal (–403 to +80) fragment of the CDH5 promoter 10–15-fold. Promoter regions associated with luciferase activation correlated with TFZF binding in vitro (Fig. 4B).

We localized a putative VE-1 and VE-8 binding site between positions –88 and +80 of the proximal CDH5 promoter (the pr–88 duplex, 5′-CAGGGGGAA-3′) that matched 8 of 9 bp of the predicted VE-1 binding site (Fig. 4E). Indeed, both VE-1 and VE-8 interacted specifically with this duplex in vitro (Fig. 4C). A single mutation in this duplex (G4→T4, replacing the G at position 4 with T) completely disrupted its interactions with VE-1 and VE-8. A promoter fragment (–88 to +80) containing this sequence retained VE-1- and VE-8-mediated transactivation, whereas the fragment bearing the point mutation was unresponsive to the transcription factors (Fig. 4D). The sequence 5′-GGAA-3′ (ETS-binding site-2, EBS2) is conserved between the mouse and human promoters, and in the mouse it interacts with the ETS-1 protein, a transcription factor of the ETS family expressed in endothelial cells during blood vessel formation22–24. In the mouse proximal promoter, Ets-1 binds to two neighboring GGAA sites (EBS2 and EBS4) and activates CDH5 in endothelial cells23. Our experiments showed that VE-1 and VE-8 are able to activate the proximal (–88 to +80) human CDH5 promoter fragment 10–15-fold through interaction with a single EBS, but activation of the reporter was enhanced up to 200-fold through interaction with distal sequences located between positions –2369 and –1342. In mice, the proximal CDH5 promoter (–139 to +24) is responsible for ubiquitous transcription, but upstream sequences are necessary to silence the activity of the basic promoter in non-endothelial cells21. It is possible that the TFZFs interfere with the silencing of the CDH5 promoter between positions –2369 and –1342, resulting in enhanced transactivation. Examination of the CDH5 promoter showed several potential VE-1, VE-5, and VE-8 binding sites in that region.

Next, we focused on TFZF VE-1 to study zinc finger–binding determinants along this distal promoter area. To determine the binding-site preferences of VE-1, we carried out in vitro DNA selection experiments (cyclic amplification of selected targets, CAST) using a randomized 10-bp DNA library and purified VE-1 protein. After four rounds of DNA selection, all the analyzed selected targets contained an invariable 7-bp consensus core, 5′-AGGGGGA-3′ (Fig. 4G). Positions 1 and 9 flanking this core tolerated nucleotide variations. Indeed, nucleotide 1 is the partner of Thr+6, located in the α-helical region of VE-1 ZF3. As in the case of Zif268 (ref. 25), Thr+6 is not expected to make specific hydrogen bonds and therefore could not unambiguously discriminate its target nucleotide. Nucleotide 9 is the target of Gln–1 in the α-helix of ZF1. Although Gln–1 in this particular zinc finger prefers A at the 3′ position of the triplet, it can also tolerate T, C, or G, as reported for the same GAA-binding zinc finger of a Zif268 variant12. Figure 4F shows an alignment of the potential VE-1 binding sites found in the distal CDH5 promoter between positions –2369 and –1342. In vitro binding data showed that three DNA sequences in this region (duplexes –2303, –1990, and –1591) interacted with VE-1 with affinities similar to those of the predicted VE-1 substrate and the –88 duplex. In agreement with the CAST data, these duplexes have an identical core but different nucleotides at positions 1 and 9. As expected, mutations in the conserved core all decreased the affinity of VE-1 for its target DNA duplex. Overall, these data suggest that a possible mechanism of activation by TFZF VE-1 involves direct regulation of the promoter by interaction with multiple binding sites in both the proximal and distal regions.

Figure 3. Semiquantitative RT-PCR analysis of A431 cells infected with several pMX-TFZF constructs selected for CDH5 activation. (A) RT-PCR analysis of CDH5 expression in these infected cells (clones 144-4, 144-13, and VE-1). HUVEC cells, which express CDH5, were used as a positive control. A431, mock-infected cells; –, control experiment in the absence of cDNA. (B) Relative CDH5 mRNA levels were normalized to TFZF expression using VP64-specific primers. Equal loading was controlled using GAPDH-specific primers.

Figure 4. Interactions of TFZFs VE-1, VE-5, and VE-8 with the CDH5 promoter. (A) Luciferase transactivation assay of VE-1, VE-5, and VE-8 with several 5′ deletions of the CDH5 promoter in A431 cells. (B) DNA-binding ELISA of several promoter fragments with the TFZFs VE-1, VE-5, and VE-8 purified as fusions with MBP. Promoter fragments (boxes) were amplified by PCR using 5′-biotinylated primers. The binding of each fragment was normalized and expressed as a percentage of the highest value. Binding data are represented in a color gradient (higher binding corresponds to darker boxes). (C) DNA-binding ELISA of VE-1, VE-5, and VE-8 proteins with the DNA duplex pr–88 and with the mutant pr–88(G4→T4). (D) Luciferase transactivation assay of VE-1, VE-5, and VE-8 with the proximal –88 CDH5 promoter fragment and the same fragment containing a point mutation (G4→T4). (E) Summary of putative interactions between VE-1 and CDH5 promoter fragments. Open boxes, potential binding sites for VE-1 as determined in vitro; underlining, putative EBS. The sequence of the –88 bp proximal human CDH5 promoter and the point mutation G→T introduced for transactivation studies are indicated. (F) Interaction of VE-1 with several potential binding sites located in the CDH5 promoter. The Kd (±s.d.) of VE-1 with its predicted DNA substrate (VE-1 subs) was determined by gel shift assay. Kd values for VE-1 with promoter DNA duplexes containing potential VE-1 binding sites (comprising the 9-bp putative interacting sequence and three flanking base pairs) were determined by ELISA and normalized to VE-1 subs. The positions of the potential binding sites relative to the transcription start site are indicated. Nucleotides that differ from the theoretical VE-1 binding site (VE-1 subs) are indicated in red. (G) DNA sequences selected in vitro from a randomized DNA library (N10) for interaction with VE-1 by CAST assay. The number of sequences containing an identical VE-1 binding site is indicated. Open box, invariable nucleotides (consensus).

Figure 5. Regulation of CDH5 by TFZFs in several human cancer cell lines.
Blue, cells infected with a pMX construct containing the DNA-binding domain of VE-1 (A–E) or VE-8 (F) and the VP64 activator domain, stained with anti-CD144 and analyzed by FACS. Red, cells infected with the pMX vector containing the same DNA-binding domain linked to the KRAB repression domain (SKD) and stained with anti-CD144 (anti-VE-cadherin). Green, level of VE-cadherin expressed on mock-infected cells; stippled lines, cells stained in the absence of primary antibody.

Many TFZFs activated CDH5 in cancer cell lines in which the gene product was not significantly expressed as determined by FACS, such as A431 (squamous carcinoma), HeLa, MDA-MB-435s (breast cancer), and HT29 (colon cancer) cells (Fig. 5). Notably, some regulators activated (when linked to the VP64 activator domain) or repressed (when linked to the KRAB repression domain5) CDH5 expression in cell lines in which the gene is well expressed, such as melanoma C8161 (Fig. 5B) and SKBR-3 cells (Fig. 5C). In melanoma C8161 cells, expression of CDH5 has been associated with the formation of vascular-like networks in three-dimensional collagen gels26. The selected TFZFs could therefore be useful tools for studying the role of CDH5 in angiogenesis, tumor progression, and metastasis in these different cancer cell lines. Of the TFZFs tested, those that bound the promoter regulated CDH5 in every cell line examined, as expected for direct regulation of the promoter. TFZFs that did not transactivate the promoter in the reporter assay (such as 144-4, 144-5, and 144-13) showed activation profiles that varied with the cell line examined. Some of these TFZFs could bind regulatory regions located in the large 5′ introns of CDH5, or even regulatory regions of genes acting upstream of CDH5, perhaps encoding tissue-specific factors involved in controlling CDH5 expression.
Candidates for these indirect targets include some members of the ETS family, such as ETS1, ERG, and FLI1 (refs. 23, 24). However, database searches showed that at most 14 of 18 bp within these regions matched the predicted TFZF targets. A search for 6ZF-binding sites in the human genome identified target sites matching between 13 and 18 bp (see Supplementary Tables 1 and 2 online), and within the CDH5 locus, 13–14-bp matches were identified. Although further investigation is required to understand their in vivo significance, these results suggest that 6ZF proteins could engage genomic sites using only a subset of the 18 bp of their predicted target.

In summary, we present a method to identify functional DNA-protein interactions involved in the activation of target genes in human cells by screening large combinatorial libraries of TFZFs. We characterized clones selected from 3ZF and 6ZF libraries that were able to induce an endothelial-specific marker, VE-cadherin, in the non-endothelial cancer cell line A431. A population of selected TFZFs was able to directly transactivate the CDH5 promoter by binding both a proximal and a distal promoter region. In addition, we showed that these TFZFs could regulate their target gene in a variety of human cancer cell lines. The advantages of libraries of small TFZFs, such as 3ZF libraries, include high representation of individual members and the possibility of binding multiple sites in one or more regulatory regions, a mode of regulation analogous to the action of natural transcription factors. Highly complex libraries of the 6ZF type have low representation of each individual TFZF clone but potentially higher specificity. These TFZFs could recognize low-frequency, potentially unique sites that are sufficient to activate or repress the target gene.
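The genome search for 6ZF sites described above amounts to counting, for every 18-bp window on either strand, how many bases agree with the predicted target. The Python sketch below shows a brute-force version of that scan, together with a binomial estimate of how many such near-matches would be expected by chance. The search software actually used is not described here. The function names, the model of about 6 × 10⁹ positions (both strands of ~3 × 10⁹ bp), and the independent, equiprobable-base assumption are ours. A genome-scale search would use an indexed aligner rather than a loop over every window.

```python
"""Near-match search for zinc-finger target sites (a brute-force sketch)."""
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq: str, site: str, min_match: int) -> list[tuple[int, str, int]]:
    """Return (start, strand, matched_bp) for every window of len(site)
    that matches the site ('+') or its reverse complement ('-') at
    >= min_match positions. Starts are 0-based on the input sequence."""
    seq, site = seq.upper(), site.upper()
    probes = (("+", site), ("-", reverse_complement(site)))
    k = len(site)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, probe in probes:
            matched = sum(a == b for a, b in zip(window, probe))
            if matched >= min_match:
                hits.append((start, strand, matched))
    return hits


def expected_chance_windows(site_len: int, min_match: int,
                            positions: float = 6e9) -> float:
    """Expected random windows matching >= min_match of site_len bases.

    Assumes independent, equiprobable bases (P(match) = 1/4 per position)
    and ignores correlations between overlapping windows.
    """
    p = sum(comb(site_len, m) * 0.25 ** m * 0.75 ** (site_len - m)
            for m in range(min_match, site_len + 1))
    return positions * p


if __name__ == "__main__":
    # pr-88 duplex versus the predicted VE-1 site: 8 of 9 bp match.
    print(scan("CAGGGGGAA", "TAGGGGGAA", min_match=8))   # [(0, '+', 8)]
    # Chance expectations for an 18-bp 6ZF target.
    print(f">=13 of 18 bp: {expected_chance_windows(18, 13):,.0f} windows")
    print(f"all 18 bp: {expected_chance_windows(18, 18):.3f} windows")
```

Under these assumptions, roughly 2 × 10⁵ windows would be expected to match a given 18-bp target at 13 or more positions, while an exact 18-bp match is expected far less than once. This may help explain why 13–14-bp matches turn up even within the CDH5 locus.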
Used in combination with current technologies such as DNA microarrays and chromatin immunoprecipitation, TFZF libraries could be useful for identifying genes and defining pathways. Recent studies in transgenic tobacco and Arabidopsis thaliana plants indicate that zinc-finger technology can be applied to whole organisms27,28. Thus, this methodology represents a genetic tool for the selection or screening of gain-of-function and loss-of-function phenotypes at the level of the cell or organism, based on direct gene regulation or on more complex changes in transcriptional programs.

Experimental protocol

Construction of TFZF libraries. The 3ZF library was created by overlapping PCR using 23 different ZF1s, 21 ZF2s, and 19 ZF3s mixed into the PCR reaction (see Supplementary Experimental Protocol online). All DNAs used as templates for PCR were SP1 variants containing specific zinc-finger α-helices selected and characterized in our laboratory8,10,12. These templates were cloned and sequenced in pMalc2 (New England Biolabs, Beverly, MA). The final (F1 + F2 + F3) PCR product was digested with SfiI and cloned into the pComb3X vector29. The resulting pComb3X-3ZF library vector was used to construct the 6ZF library as follows. First, 10 µg of pComb3X-3ZF library vector was digested with AgeI and NheI and ligated with 3 µg of XmaI- and NheI-digested inserts to generate the pComb3X-6ZF library vector. Both 3ZF and 6ZF library inserts were digested with SfiI and subcloned into the retroviral vector pMX-IRES-GFP, containing the VP64 activation domain5. The final sizes of the 3ZF and 6ZF libraries in the retroviral vector were 3.52 × 10⁵ and 5.3 × 10⁷, respectively.

Screening for functional TFZF activators in A431 cells and flow cytometry. The pMX-IRES-GFP-3ZF library and pMX-IRES-GFP-6ZF library DNAs were transfected into 293 packaging cells5 using Lipofectamine Plus (Invitrogen, Carlsbad, CA) according to the manufacturer’s directions.
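The library sizes reported under Construction of TFZF libraries can be compared with the theoretical number of combinations (23 × 21 × 19 = 9,177 for 3ZF and 9,177² ≈ 8.4 × 10⁷ for 6ZF). The Python sketch below estimates what fraction of possible members would be expected in each retroviral library. It assumes that every member is equally represented and that clones sample the library as a Poisson process, so coverage is 1 − exp(−N/M). These are simplifying assumptions not stated in the paper, assembly and cloning biases are ignored, and the constant and function names are illustrative.

```python
"""Expected coverage of a combinatorial library by a finite number of clones."""
import math

# Theoretical library sizes from the helix repertoires (23 x 21 x 19).
THREE_ZF_MEMBERS = 23 * 21 * 19          # 9,177
SIX_ZF_MEMBERS = THREE_ZF_MEMBERS ** 2   # 84,217,329, i.e. ~8.4e7

# Library sizes obtained in the retroviral vector (Experimental protocol).
CLONES = {"3ZF": (3.52e5, THREE_ZF_MEMBERS), "6ZF": (5.3e7, SIX_ZF_MEMBERS)}


def expected_coverage(n_clones: float, library_size: int) -> float:
    """Fraction of distinct members expected to be present at least once.

    Assumes uniform representation and Poisson sampling, so the
    fraction present is 1 - exp(-n_clones / library_size).
    """
    return -math.expm1(-n_clones / library_size)


if __name__ == "__main__":
    for name, (n_clones, size) in CLONES.items():
        print(f"{name}: {n_clones / size:.2f} clones per member, "
              f"{expected_coverage(n_clones, size):.1%} of members expected")
```

Under these assumptions, the 3ZF library is covered about 38-fold, whereas a little under half of the possible 6ZF combinations would be expected in the retroviral library. This is consistent with the remark in the Discussion that 6ZF libraries have low representation of each individual clone.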
The resulting retroviral particles were used to infect 5 × 10⁵ (3ZF library) or 10⁸ (6ZF library) A431 cells. At 48 h after infection, these cells were stained with ten different primary antibodies (5 µg/ml) specific for different cell surface markers: anti-CD15 (clone 2F3; BD PharMingen, San Diego, CA), anti-ERBB-2 (clone SP77; ref. 5), anti-ERBB-3 (clone SPG1, NeoMarkers, Fremont, CA), anti-CD104 (clone 450–9D), anti-CD144 (clone 55–7H1, PharMingen), anti-CD54 (clone HA58, PharMingen), anti-CD58 (clone 1C3, PharMingen), anti-CD95 (clone DX2, PharMingen), anti-EGF (Santa Cruz Biotechnology, Santa Cruz, CA), and anti-CD49f (clone GoH3, PharMingen), followed by secondary antibodies conjugated to phycoerythrin (PE, 1:100 dilution, Jackson ImmunoResearch, West Grove, PA). Next, 5 × 10⁵ to 10⁶ GFP⁺PE⁺ infected cells (3ZF library) or 10⁷ GFP⁺PE⁺ infected cells (6ZF library) were sorted using a FACSVantage (BD PharMingen), and the DNA encoding the pool of TFZFs was recovered by PCR using the primers pMXf2 (forward) 5′-TCAAAGTAGACGGCATCG-3′ and VP64AscB (backward) 5′-TCGTCCAGCGCGCGTCGGCGCG-3′ and cloned again into the pMX vector. PCR was typically carried out using 50 ng–1 µg of genomic DNA and a program of 1 cycle of 5 min at 94 °C; 35 cycles of 30 s at 94 °C, 2 min at 52 °C, and 2 min (3ZF library) or 3 min (6ZF library) at 72 °C; and a final cycle of 10 min at 72 °C. Independent selections were done for each cell-surface marker, for three (3ZF library) or four (6ZF library) rounds. DNA from individual clones was prepared and used to produce virus to infect A431 cells, which were analyzed by flow cytometry using ten different antibodies as described above. For downregulation analysis, zinc fingers were subcloned into the pMX-IRES-GFP-SKD vector (containing the KRAB repression domain, SKD; ref. 5), and infections were carried out as described above.
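The pool-recovery PCR program above differs between the two libraries only in its extension time, and writing it out as data makes that easy to check when transcribing it to a thermocycler. In this minimal Python sketch, the `Step` type and the function names are hypothetical, and ramp times between temperatures are ignored.

```python
"""Encode the pool-recovery PCR program and total its programmed hold time."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees C
    seconds: int    # hold time at that temperature


def recovery_program(extension_s: int) -> list[tuple[int, list[Step]]]:
    """(repeats, steps) segments of the pMXf2/VP64AscB recovery PCR.

    extension_s is 120 s for the 3ZF library and 180 s for the 6ZF library.
    """
    return [
        (1, [Step(94, 300)]),                                       # initial denaturation
        (35, [Step(94, 30), Step(52, 120), Step(72, extension_s)]),
        (1, [Step(72, 600)]),                                       # final extension
    ]


def hold_minutes(program: list[tuple[int, list[Step]]]) -> float:
    """Sum of programmed hold times in minutes (ramping is ignored)."""
    total_s = sum(repeats * sum(step.seconds for step in steps)
                  for repeats, steps in program)
    return total_s / 60


if __name__ == "__main__":
    for library, ext in (("3ZF", 120), ("6ZF", 180)):
        print(f"{library}: {hold_minutes(recovery_program(ext)):.1f} min of holds")
```

Under these assumptions, the 3ZF program totals 172.5 min of holds and the 6ZF program 207.5 min. The 35-min difference comes entirely from the extra minute of extension in each of the 35 cycles.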
The cell lines A431, HeLa, and SKBR-3 were cultured as described5; cell line MDA-MB-435s was obtained from the American Type Culture Collection (Manassas, VA), and cell lines C8161 and HT29 were a generous gift from R.A. Reisfeld of the Scripps Research Institute.

RNA extraction and RT-PCR. RNA from infected A431 cells and HUVEC cells (Clonetics, San Diego, CA) was extracted with the Tri reagent method (MRC, Cincinnati, OH). cDNA was synthesized using an RT-PCR kit (Invitrogen, Carlsbad, CA). PCR was carried out using the CDH5-specific primers25 VE-CAD-f (forward) 5′-CCGGCGCCAAAAGAGAGA-3′ and VE-CAD-b (backward) 5′-CTCCTTTTCCTTCAGCTGAAGTGGT-3′. Expression of GAPDH (encoding glyceraldehyde-3-phosphate dehydrogenase) was measured as a loading control using the primers GAPDH-f (forward) 5′-CCATGTTCGTCATGGGTGTGA-3′ and GAPDH-b (backward) 5′-CATGGACTGTGGTCATGAGT-3′. CDH5 mRNA levels were normalized relative to TFZF expression using the primers NLSseq-F (forward) 5′-CCGAAAAAGAAACGCAAAGTTGGG-3′ and pMXB (backward) 5′-CAGAATTTCGACCACTGTGC-3′, which amplify VP64. PCR conditions were 1 cycle of 3 min at 94 °C; 20–30 cycles of 1 min at 94 °C, 2.5 min at 52 °C, and 2 min at 72 °C; and 1 cycle of 5 min at 72 °C. PCR products were visualized on 1% (CDH5) or 1.5% (GAPDH) agarose gels and quantified using ImageQuant 1.2. The 1-kbp CDH5-specific PCR product was sequenced and shown to correspond to the expected CDH5 sequence.

Luciferase assays. The human CDH5 promoter fragment (–2486 to +24) was amplified from A431 cells by PCR using the primers cdh5pro-f3 (forward) 5′-GAGGAGGAGGAGGAGGGTACCGGGGCCCAAGAAATCTGCATATTC-3′ and cdh5pro-b2 (backward) 5′-GAGGAGGAGGAGGAGAGATCTTGTTTCTGTTCCGTTGGACTGC-3′. The products were sequenced and cloned into pGL3basic (Promega, Madison, WI). Next, 100 ng of reporter construct, 75 ng of TFZF cloned in pcDNA3 (Invitrogen), and 100 ng of CMV-LacZ reporter were transiently cotransfected into A431 cells. Luciferase activities were measured using a luciferase reporter assay system (Promega). Transfection efficiencies were normalized with the β-galactosidase reporter system (Galacto-Light Plus kit; Tropix, Bedford, MA). Data represent the average of 6–12 experiments. Point mutations in the promoter were introduced by PCR using a high-fidelity enzyme (Roche, Indianapolis, IN) and verified by DNA sequencing.

In vitro analysis of TFZF binding, mobility-shift experiments, and CAST. These assays were done as described previously11,30 (see Supplementary Experimental Protocol online).

Note: Supplementary information is available on the Nature Biotechnology website.

1. Beerli, R.R. & Barbas, C.F. III. Engineering polydactyl zinc-finger transcription factors. Nat. Biotechnol. 12, 632–641 (2002).
2. Brummelkamp, T.R., Bernards, R. & Agami, R. A system for stable expression of short interfering RNAs in mammalian cells. Science 296, 550–553 (2002).
3. Hiroaki, K., Onuki, R., Suyama, E. & Taira, K. Identification of genes that function in the TNF-α-mediated apoptotic pathway using randomized hybrid ribozyme libraries. Nat. Biotechnol. 20, 376–380 (2002).
4. Walden, R. et al. Activation tagging: a means of isolating genes implicated as playing a role in plant growth and development. Plant Mol. Biol. 26, 1521–1528 (1994).
5. Beerli, R.R., Dreier, B. & Barbas, C.F. III. Positive and negative regulation of endogenous genes by designed transcription factors. Proc. Natl. Acad. Sci. USA 97, 1495–1500 (2000).
6. Zhang, L. et al. Synthetic zinc finger transcription factor action at an endogenous chromosomal site. Activation of the human erythropoietin gene. J. Biol. Chem. 275, 33850–33860 (2000).
7. Liu, P.Q. et al. Regulation of an endogenous locus using a panel of designed zinc finger proteins targeted to accessible chromatin regions. Activation of vascular endothelial growth factor A. J. Biol. Chem. 276, 11323–11334 (2001).
8. Dreier, B., Beerli, R.R., Segal, D.J., Flippin, J.D. & Barbas, C.F. III. Development of zinc finger domains for recognition of the 5′-ANN-3′ family of DNA sequences and their use in the construction of artificial transcription factors. J. Biol. Chem. 276, 29466–29478 (2001).
9. Jamieson, A.C., Kim, S.H. & Wells, J.A. In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry 33, 5689–5695 (1994).
10. Segal, D.J., Dreier, B., Beerli, R.R. & Barbas, C.F. III. Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5′-GNN-3′ DNA target sequences. Proc. Natl. Acad. Sci. USA 96, 2758–2763 (1999).
11. Beerli, R.R., Segal, D.J., Dreier, B. & Barbas, C.F. III. Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proc. Natl. Acad. Sci. USA 95, 14628–14633 (1998).
12. Dreier, B., Segal, D.J. & Barbas, C.F. III. Insights into the molecular recognition of the 5′-GNN-3′ family of DNA sequences by zinc finger domains. J. Mol. Biol. 303, 489–502 (2000).
13. Liu, Q., Xia, Z. & Case, C.C. Validated zinc finger protein designs for all 16 GNN DNA triplet targets. J. Biol. Chem. 277, 3850–3856 (2002).
14. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
15. Liu, X., Sun, Y., Constantinescu, S.N., Karam, E., Weinberg, R.A. & Lodish, H.F. Transforming growth factor β-induced phosphorylation of Smad3 is required for growth inhibition and transcriptional induction in epithelial cells. Proc. Natl. Acad. Sci. USA 94, 10669–10674 (1997).

Acknowledgments
The authors thank D. Valente and N. Niederberger for technical support, and D.J. Segal and X. Li for the critical reading of the manuscript. This work was supported by the US National Institutes of Health CA86258 and DK61803. L.
Magnenat was the recipient of postdoctoral fellowships from the Swiss National Science Foundation. Competing interests statement The authors declare that they have competing financial interests: see the Nature Biotechnology website (http://www.nature.com/naturebiotechnology) for details. Received 16 September 2002; accepted 3 January 2003 16. Dejana, E., Bazzoni, G. & Lampugnani, M.G. Vascular endothelial (VE)–cadherin: only an intercellular glue? Exp. Cell Res. 252, 13–19 (1999). 17. Vittet, D., Buchou, T., Schweitzer, A., Dejana, E. & Hubert, P. Targeted null-mutation in the vascular endothelial-cadherin gene impairs the organization of vascular-like structures in embryoid bodies. Proc. Natl. Acad. USA 94, 6273–6278 (1997). 18. Carmeliet, P. et al. Targeted deficiency or cytosolic truncation of the VE-cadherin gene in mice impairs VEGF-mediated endothelial survival and angiogenesis. Cell 98, 147–157 (1999). 19. Liao, F. et al. Monoclonal antibody to vascular endothelial–cadherin is a potent inhibitor of angiogenesis, tumor growth, and metastasis. Cancer Res. 60, 6805–6810 (2000). 20. Liao, F. et al. Selective targeting of angiogenic tumor vasculature by vascular endothelial–cadherin antibody inhibits tumor growth without affecting vascular permeability. Cancer Res. 62, 2567–2575 (2002). 21. Gory, S., Vernet, M., Laurent, M., Dejana, E., Dalmon, J. & Huber, P. The vascular endothelial–cadherin promoter directs endothelial-specific expression in transgenic mice. Blood 93, 184–192 (1999). 22. Gory, S. et al. Requirement of a GT box (Sp1 site) and two Ets binding sites for vascular endothelial cadherin gene transcription. J. Biol. Chem. 273, 6750–6755 (1998). 23. Lelievre, E., Mattot, V., Huber, P., Vandenbunder, B. & Soncin, F. ETS1 lowers capillary endothelial cell density at confluence and induces the expression of VE-cadherin. Oncogene 19, 2438–2446 (2000). 24. Lelievre, E., Lionneton, F., Mattot, V., Spruyt, N. & Soncin, F. 
Ets-1 regulates fli-1 expression in endothelial cells. Identification of ETS binding sites in the fli-1 gene promoter. J. Biol. Chem. 277, 25143–25151 (2002). 25. Elrod-Erickson, M., Rould, M.A., Nekludova, L. & Pabo, C.O. Zif268 protein-DNA complex refined at 1.6 Α: a model system for understanding zinc finger-DNA interactions. Structure 4, 1171–1180 (1996). 26. Hendrix, M.J.C. et al. Expression and functional significance of VE-cadherin in aggressive human melanoma cells: role in vasculogenic mimicry. Proc. Natl. Acad. USA 94, 8018-8023 (2001). 27. Ordiz, M.I., Barbas, C.F III & Beachy, R.N. Regulation of transgene expression in plants with polydactyl zinc finger transcription factors. Proc. Natl. Acad. USA 99,13290–13295 (2002). 28. Guan, X. et al. Heritable endogenous gene regulation in plants with designed polydactyl zinc finger transcription factors. Proc. Natl. Acad. Sci. USA 99, 13296–13301 (2002). 29. Barbas, C.F III, Burton, D.R., Scott, J.K., Silverman, G.J. Phage-display vectors. in Phage Display: A Laboratory Manual 2.1–2.19 (CSH, New York, 2001). 30. Segal, D.J. et al. Evaluation of a modular strategy for the construction of novel polydactyl zinc finger DNA-binding proteins. Biochemistry (in press). MARCH 2003 • www.nature.com/naturebiotechnology
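In the luciferase assays described in the Methods, each luciferase reading was normalized for transfection efficiency using β-galactosidase activity, and results were averaged over 6–12 experiments. The sketch below illustrates that arithmetic under stated assumptions: fold activation is taken relative to the mean of a control transfection, spread is reported as the sample standard deviation, and every reading is a hypothetical value in arbitrary units (not data from the study). The function names `normalized_activity` and `fold_activation` are illustrative.

```python
from statistics import mean, stdev


def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Luciferase reading divided by the beta-galactosidase reading from the same transfection."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase signal must be positive")
    return luciferase / beta_gal


def fold_activation(samples, controls):
    """Mean and sample SD of fold activation.

    Each normalized sample is divided by the mean normalized control
    (assumption: the control is the reference for 'fold').
    """
    control_mean = mean(normalized_activity(l, b) for l, b in controls)
    folds = [normalized_activity(l, b) / control_mean for l, b in samples]
    return mean(folds), stdev(folds)


# Hypothetical (luciferase, beta-gal) readings in arbitrary units.
control = [(1000.0, 500.0), (1200.0, 600.0), (900.0, 450.0)]  # each ratio = 2.0
tfzf = [(8000.0, 500.0), (12000.0, 600.0), (9000.0, 450.0)]   # ratios 16, 20, 20

if __name__ == "__main__":
    mean_fold, sd_fold = fold_activation(tfzf, control)
    print(f"fold activation: {mean_fold:.2f} ± {sd_fold:.2f}")  # prints: fold activation: 9.33 ± 1.15
```

The same ratio logic could, in principle, be applied to the ImageQuant band intensities used to normalize CDH5 RT-PCR signals to the VP64 (TFZF) product, although the text does not specify the exact calculation.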