THE MYC-INTERACTING ZINC FINGER PROTEIN-1: DNA AND PROTEIN INTERACTIONS IN HUMAN EMBRYONIC STEM CELLS

A Project

Presented to the faculty of the Department of Biological Sciences
California State University, Sacramento

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF ARTS
in
Biological Sciences (Stem Cell)

by
Dana Anne Burow

SPRING 2012

A Project by Dana Burow

Approved by:
Thomas Landerholm, Committee Chair
Christine Kirvan, Second Reader
Jan Nolta, Third Reader

Student: Dana Burow

I certify that this student has met the requirements for format contained in the University format manual, and that this thesis is suitable for shelving in the Library and credit is to be awarded for the thesis.

Ronald Coleman, Graduate Coordinator
Department of Biological Sciences

ABSTRACT

Stem cells can divide indefinitely and maintain their capability to differentiate into many cell types, the key features of self-renewal and pluripotency, which explains their importance to regenerative medicine. Embryonic stem cells (ESCs), the most highly pluripotent of all stem cells, have the added potential of tumorigenesis. This is thought to be driven in part through a shared gene expression program regulated by the transcription factor Myc. Myc, first characterized as a potent oncogene, has been shown to maintain pluripotency and self-renewal in mouse ESCs. Myc regulation of pluripotency and self-renewal is also evident from its role in the generation of induced pluripotent stem (iPS) cells.
Myc is thought to regulate target gene expression both locally, through classical mechanisms, and globally, through euchromatin remodeling. In this way, Myc can affect gene expression on a large enough scale to reprogram differentiated cells into iPS cells. Miz-1, a transcription factor named for its interaction with Myc, is thought to form a co-repressor complex with Myc, silencing Miz-1 target genes including those associated with differentiation and proliferation. Miz-1 contains a BTB/POZ domain and 13 C2H2 zinc fingers and is thought to bind initiator (INR) sequences in the core promoters of target genes, thereby modulating their expression. Still, relatively little is known about the function of Miz-1 as a transcriptional regulator, and recent epigenetic analysis in hESCs suggests that Miz-1 binds alternative sequences not associated with the INR of target gene promoters.

Using a Miz-1 maltose binding protein (MBP) fusion protein tag system, this study implemented an in vitro, high-throughput DNA binding assay and Multiple EM for Motif Elicitation (MEME) analysis to identify putative Miz-1 DNA binding motifs de novo. The consensus motifs, ATCGAT and GATTACCGA, were then confirmed by electrophoretic mobility shift analysis (EMSA), and further bioinformatics analysis revealed motif occurrences in functionally relevant gene ontology clusters including transcription regulation, growth, chromatin, and developmental genes. MBP pull-down mass spectrometry analysis also identified candidate Miz-1 protein cofactors from hESC nuclear extracts that are associated with reported Miz-1 functions. The Miz-1 DNA and protein interactions highlighted in this study support its role as a master transcriptional regulator, cofactor and antagonist of Myc in hESCs. The findings also underline the importance of further characterization of pluripotency and self-renewal in hESCs so that potential therapies may be safe and effective.
ACKNOWLEDGEMENTS

I would like to acknowledge the support and dedication of the faculty mentors in the Department of Biological Sciences at Sacramento State University. I would also like to recognize the Knoepfler lab at UC Davis for their guidance and the opportunity to be a part of the lab for the past year. Thanks to the Segal lab at the UC Davis Genome Center for their advice and collaboration on Bind-n-Seq. Finally, thanks go out to my wonderful family and friends, whose love and support have helped me accomplish my goals.

TABLE OF CONTENTS

Acknowledgements
List of Tables
List of Figures
INTRODUCTION
METHODS
    Cloning, Recombinant Protein Expression and Purification
    Bind-n-Seq: in vitro DNA Binding Assay and de novo Motif Finding
    Electrophoretic Mobility Shift Assay
    Bioinformatics Analysis of Motifs Identified by Bind-n-Seq
    MBP-Miz-1 Pull-down Mass Spectrometry Analysis
RESULTS
    Miz-1 Expression and Purification by MBP
    De novo Motif Finding by Bind-n-Seq
    Bioinformatics Analysis of Motifs Identified by Bind-n-Seq
    EMSA Supports Miz-1 Binding ATCGAT and GATTACCGA
    MBP-Miz-1 Pull-down Mass Spectrometry Analysis
DISCUSSION
Literature Cited

LIST OF TABLES

1. Full-length Miz-1 Motif Consensus Sequences Identified by Bind-n-Seq
2. Zinc finger Miz-1 Motif Consensus Sequences Identified by Bind-n-Seq
3. Putative Miz-1 DNA Binding Motifs
4. Gene Ontology Clusters Identified by DAVID Analysis
5. MBP-Miz-1 Mass Spectrometry Analysis by Scaffold 3

LIST OF FIGURES

1. SDS-PAGE Detection of Protein from Purification by MBP
2. DAVID Analysis of Motif-Containing Miz-1 Bound Genes
3. EMSA Indicates Miz-1 Binding Motif-containing Oligonucleotide Probes
4. MBP-Miz-1 Pull-down of c-Myc
5. Current Model Proposed for Miz-1 Target Gene Regulation

INTRODUCTION

Stem cells are the focus of regenerative medicine and may hold the key to unlocking cures for diseases ranging from neurological disorders to cancer and HIV infection. In the early 1980s, embryonic stem cells (ESCs) were first isolated from mouse embryos and shown to grow indefinitely while maintaining pluripotency, the ability to differentiate into many different cell types (Evans, 1981). Since then there have been significant gains in our understanding of these key features of stem cells: self-renewal and pluripotency. A complex network of transcription factors mediates stem cell differentiation. In 2006, scientists were able to reverse this process and generate induced pluripotent stem cells, termed iPS cells, from differentiated fibroblasts in mice (Takahashi & Yamanaka, 2006) and later in humans (Takahashi et al., 2007). The induction of pluripotent stem cells demonstrated by Takahashi and Yamanaka requires the forced expression of a set of transcription factors, including the proto-oncogene c-Myc (Takahashi & Yamanaka, 2006).

Stem and cancer cells have many of the same properties, postulated to be driven in part through a shared gene expression program regulated by the transcription factor c-Myc. Myc was first characterized as a potent oncogene, though the mechanisms of oncogenic activation were initially vague (Eisenman, 2001). Members of the Myc family of transcription factors are well documented as being deregulated in many cancers. This deregulation of Myc through translocation and duplication events results in increased Myc expression and allows for oncogenic activation (Meyer, 2008), whereas the normal, high levels of endogenous expression in ESCs play a key role in mediating pluripotency and self-renewal (Varlakhanova et al., 2010). Pluripotency and self-renewal, the foundation of stem cells, were implicated in their cancerous counterpart, teratoma, as early as 1960 (Pierce, 1960). Teratomas contain derivatives of all three germ layers and continue to be challenging to overcome with current cancer therapies. Despite the promise of stem cell therapies, there still remains the problem of ESC-related tumorigenesis.

Myc is a key player in the generation of iPS cells, and though alternate routes to iPS cell formation have been presented in recent years, the endogenous role of c-Myc is implicated in the shift to pluripotency and self-renewal in each of these cases. A 2008 study by Nakagawa generated iPS cells without the forced expression of c-Myc (Nakagawa M., 2008). Although such protocols generally have significantly reduced reprogramming efficiency, it is suggested that endogenous c-Myc expression in these iPS cells is able to mediate the shift to pluripotency and self-renewal through its activation by the other exogenous factors used in reprogramming, including Oct4, Sox2, and Klf4 (Stadtfeld, 2010). This notion is supported by a study demonstrating that neural stem cells can be converted to iPS cells through exogenous Oct4 expression alone, since the other defined factors are endogenously expressed in neural stem cells (Kim, 2009). Stadtfeld also linked the original set of iPS cell forming factors to another proven set (Oct4, Sox2, Nanog, and Lin28) through the deregulation of c-Myc by Lin28 (Stadtfeld, 2010). The importance of Myc in the mediation of pluripotency and self-renewal in stem cells cannot be overlooked.

The Myc protein family is a group of transcription factors containing the basic helix loop helix leucine zipper (bHLHZ) domain.
Myc, like other members of the bHLHZ superfamily of transcription factors, can heterodimerize with another bHLHZ protein, Max, and bind the enhancer box (E-box) sequence, CACGTG, activating target genes (Chaudhary, 1999). Myc can also have repressive action on gene expression through association with and inhibition of the activating function of the Miz-1 transcription factor (Peukert, 1997). In this way, Myc modulates gene expression to in turn regulate diverse cellular processes including metabolism, cell cycle, differentiation, apoptosis, senescence, and DNA replication (Grandori, 2000).

In more recent studies, Myc is shown to regulate global euchromatin structure and is tightly associated with certain histone modifications (Knoepfler et al., 2006). Myc and the cofactor transformation/transcription domain-associated protein (TRRAP) are known to recruit histone acetyltransferases (HATs) (McMahon, 2000). HATs are well-characterized proteins that function in the acetylation of amino-terminal lysine residues of histone proteins, especially those located near transcriptional start sites (Knoepfler, 2007). Myc is shown to regulate euchromatin on a global scale and to some extent independently of its role as a classical transcription factor (Cotterman et al., 2008). In a recent review, a model is proposed that defines Myc's role in the self-renewal of stem cells and tumorigenesis through both local transcriptional activation, by classical mechanisms, and global euchromatin structure, whereby overexpression results in tumorigenesis (Knoepfler, 2007). In iPS cell formation, c-Myc is hypothesized to allow Oct4 and Sox2, two other defined factors in iPS cell formation, to bind to genomic targets through its mediation of global histone acetylation (Takahashi & Yamanaka, 2006). While Myc is best known for its transcriptional activating function, interest in its repressive functions is growing, and these functions are implicated in its regulation of stem cell pluripotency and self-renewal.
Miz-1, named for its interaction with Myc (Myc-interacting zinc finger protein-1), was first characterized in 1997 and found to function strongly in growth arrest (Peukert, 1997). Miz-1 is a BTB/POZ (BR-C, ttk and bab/pox virus and zinc finger) domain-containing transcription factor that is thought to directly bind core promoter initiator (INR) sequences and recruit the co-activator protein p300 in order to activate target genes, such as negative regulators of cell cycle control and growth and positive regulators of differentiation (Kime & Wright, 2003; Seoane et al., 2001; Staller, 2001; Wanzel et al., 2003; Wu, 2003). Myc can bind Miz-1 through its HLH domain and is thought to repress Miz-1 gene activation through competition with p300 (Staller, 2001).

Recent epigenetic studies also support the hypothesis that the mechanism by which Myc represses expression of differentiation genes, thereby maintaining pluripotency and self-renewal, is related to Miz-1. Co-immunoprecipitation using anti-c-Myc antibody suggests recruitment of histone deacetylase 2 (HDAC2) and DNA (cytosine-5)-methyltransferase 3a (DNMT3a) to form a repressor complex (Varlakhanova, unpublished data). The DNA CpG methyltransferase DNMT3a is known to interact with Myc by means of Miz-1, forming a co-repressor complex that functions at the promoters of target genes such as p21Cip1, a cyclin-dependent kinase inhibitor (Brenner et al., 2005), and Mad4, another transcriptional regulator of proliferation and differentiation (Kime & Wright, 2003). Together, these observations indicate that Myc represses differentiation gene expression in hESCs through Miz-1. Biological and epigenetic characterization of Miz-1 in hESCs demonstrates that Myc and Miz-1 function coordinately in the regulation of pluripotency and self-renewal through their repression of differentiation-associated genes.
Myc and Miz-1 also display antagonistic roles in the regulation of genes important to stem cell pluripotency, self-renewal and differentiation. Genome-wide chromatin immunoprecipitation-microarray (ChIP-chip) analysis demonstrates that Myc also occupies nearly 30% of Miz-1 targets, and that these are predominantly differentiation-associated genes, including many members of the Hox gene family (Varlakhanova et al., 2011). Additionally, parallel ChIP-chip analysis of activating euchromatin marks, including acetylation of lysine 9 on histone 3 (AcH3K9) and trimethylation of lysine 4 on histone 3 (H3K4me3), and of Miz-1 DNA binding shows a significant overlap among genes involved in cellular metabolism and growth. Conversely, genes not associated with active euchromatin marks and those associated with the inactivation mark trimethylation of lysine 27 on histone 3 (H3K27me3) were predominantly differentiation-associated genes, including many Hox genes (Varlakhanova et al., 2011). Myc knockdown in hESCs results in an upregulation of differentiation-associated genes and a downregulation of pluripotency- and growth-associated genes, while conversely Miz-1 knockdown in hESCs results in a downregulation of differentiation-associated genes and an upregulation of pluripotency- and growth-associated genes (Varlakhanova et al., 2011).

Interestingly, and contrary to the current literature (Kime & Wright, 2003; Seoane et al., 2001; Staller, 2001; Wanzel et al., 2003; Wu, 2003), which describes Miz-1 binding localized to core promoter INR sequences, the recent work of Varlakhanova demonstrates that the global distribution of Miz-1 binding is predominantly localized to regions more than 1000 bases upstream of the transcriptional start sites of target genes (Varlakhanova et al., 2011).
It is important to note that, unlike the Varlakhanova study, previous studies analyzed Miz-1 regulation of only a few candidate genes and did not assess global genomic binding. Cis-Regulatory Element Annotation System (CEAS) analysis (Ji X, 2006) of Miz-1 ChIP-chip data from the Varlakhanova study failed to identify potential DNA binding motifs for Miz-1; however, INR sequence-independent DNA binding by Miz-1 is of clear significance to its global function. INR-independent Miz-1 binding represents more than half of total Miz-1 genomic binding sites, and identification of novel Miz-1 DNA binding motifs is central to furthering our understanding of this important Myc antagonist.

Understanding the inner workings of the complex Myc network of transcription factors, including Miz-1, that mediates stem cell self-renewal, pluripotency and differentiation will help to further our knowledge of both stem and cancer cell biology. Teasing out the subtle differences between Myc-mediated pluripotency and self-renewal in stem cells and that in cancer cells is of vital importance in furthering stem cell based therapies so that they are both safe and effective, and may also lead to novel cancer treatments. The present work identifies putative Miz-1 DNA binding motifs and potential protein cofactors and serves as a platform for further investigation into specific Miz-1 DNA and protein interactions in hESCs.

METHODS

Cloning, Recombinant Protein Expression and Purification. A plasmid vector coding for an N-terminal fusion of E. coli maltose binding protein (MBP) to full-length human Miz-1 was cloned by restriction ligation using pMAL-c5G (New England Biolabs, Ipswich, MA) and Miz-1 cDNA generated from H9 hESC mRNA. The sequence encoding the 13 C2H2 zinc fingers (nucleotides 805-2379) of Miz-1 cDNA was cloned by Gateway (Invitrogen Life Technologies, Carlsbad, CA) into a plasmid vector coding for an N-terminal GSTMBP tag (Segal Lab, UC Davis Genome Center, Davis, CA). Transformed E. coli BL21STAR (Invitrogen Life Technologies) were grown at 37°C and 225 rpm. Expression of the MBP-hMiz-1 fusion constructs was induced at 2.5 hours with isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were harvested 5 hours post-induction by centrifugation (3500 rpm, 20 min, 4°C) and lysed in Zinc Buffer A [ZBA; 10 mM Tris (pH 7.5), 90 mM KCl, 1 mM MgCl2, 90 μM ZnCl2, 5 mM DTT] by sonication (6 rounds: 30 sec (high), 30 sec rest). Protein lysate was isolated by centrifugation (20,000 rpm, 30 min, 4°C), then incubated at 4°C with amylose-linked agarose beads (New England Biolabs) for 20 min. Protein lysate was cleared by gravity flow and the beads were subsequently washed with 10 column volumes of ZBA. MBP-hMiz-1 was eluted in 3 mL ZBA with 10 mM maltose, then dialyzed in 2 L ZBA overnight to deplete free maltose. MBP-hMiz-1 protein was concentrated using Amicon Ultra Filter units (Millipore, Billerica, MA). Purity and quantity of the MBP-hMiz-1 fusion protein were assessed by SDS-PAGE and Bradford Assay (Thermo Fisher Scientific, Waltham, MA).

Bind-n-Seq: in vitro DNA Binding Assay and de novo Motif Finding. MBP-hMiz-1 proteins at various concentrations (Tables 1-2) were bound to random oligonucleotides with barcodes in Bind-n-Seq (BnS) binding buffer [BnSBB; 0.12 μg/μL Herring Sperm DNA, 100 μM ZnCl2, 5 mM DTT, 5% BSA] for 30 minutes with agitation at room temperature. Binding reactions were then washed six times for 10 min each with BnS wash buffer [BnSWB; 10 mM Tris (pH 8.5), 100 μM ZnCl2, 1 mM MgCl2, 5 mM DTT] under various KCl salt concentrations. Bound oligonucleotides were eluted for 10 min in EB buffer (Qiagen, Hilden, Germany) with 10 mM maltose. Quantitative PCR was performed using the Opticon Monitor system with SYBR green detection (program: 94°C for 4 min initial denaturation; 26 cycles of 94°C for 30 sec, 63°C for 30 sec and 72°C for 1 min) to determine optimal amplification cycles for each set of oligonucleotides.
Oligonucleotides were amplified using iProof DNA Polymerase (Bio Rad, Hercules, CA), cleaned by PCR Purification Kit (Qiagen) and quantified by NanoDrop (Thermo Fisher Scientific), and 100 ng of each sample was pooled for sequencing. Amplified, pooled oligonucleotides with barcodes were sequenced on the MiSeq (Illumina, San Diego, CA) at the UCD Genome Center Core facility, and reads were sorted and filtered for quality by the MiSeq platform software. Bioinformatics analysis, including de novo motif finding, was performed by the Segal Lab at the UC Davis Genome Center. Sorted, filtered reads were analyzed by MEME in randomly sampled clusters of 10,000 reads. Intermediate motifs were matched back to the original dataset and subsequent rounds of MEME were performed to generate the most enriched motifs for each BnS condition (Table 1).

Electrophoretic Mobility Shift Assay. EMSA was performed by binding MBP-hMiz-1 to synthesized and hybridized probes (5’ CAAAAGTGCGATCGATGCTGCGTGGT 3’ and 5’ CAAAAGTGCGGATTACCGAGCTGCGTGGT 3’) and poly(deoxyinosinic-deoxycytidylic) acid nonspecific competitor in ZBA for 20 min at room temperature. Reactions were then visualized on Novex 6% DNA Retardation polyacrylamide gels (Invitrogen Life Technologies) by SYBR green DNA stain and SYPRO ruby protein stain (Invitrogen Life Technologies) under 300 nm UV transillumination.

Bioinformatics Analysis of Motifs Identified by Bind-n-Seq. Gapped Local Alignment of Motifs (GLAM2) (Frith, 2008) analysis was performed on the motifs identified by BnS for each of the full-length and zinc-finger Miz-1 constructs. GLAM2 results were then searched against Miz-1 ChIP-chip sequence data (Varlakhanova et al., 2011) for motif occurrences by Find Individual Motif Occurrences (FIMO) (Grant, 2011) analysis with a p-value cut-off of 0.0001.
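As a rough illustration of what searching sequence data for motif occurrences involves, the sketch below scans a sequence on both strands for exact matches to a degenerate consensus written in the X(A/B) notation used for the GLAM2 motifs. It is a simplification and not the FIMO algorithm, which scores position-specific matrices and assigns p-values to each site. The function names and the use of Python are assumptions made for illustration only.

```python
import re

# Degenerate consensus motifs in the "X(A/B)" notation reported for GLAM2.
FULL_LENGTH_MOTIF = "G(A/C)(T/A)(T/A)(A/G)(C/T)CGA"
ZINC_FINGER_MOTIF = "ATCG(A/G)(T/C)"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Hypothetical helper: turn 'G(A/C)T' into the regex 'G[AC]T'."""
    pattern = re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    return re.compile(pattern)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_occurrences(motif, sequence):
    """Return sorted (start, strand, site) tuples for exact motif matches.

    Starts are 0-based positions on the forward strand. A lookahead is
    used so that overlapping matches are also reported.
    """
    sequence = sequence.upper()
    lookahead = re.compile("(?=(" + motif_to_regex(motif).pattern + "))")
    hits = []
    for m in lookahead.finditer(sequence):
        hits.append((m.start(), "+", m.group(1)))
    n = len(sequence)
    for m in lookahead.finditer(reverse_complement(sequence)):
        site = m.group(1)
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((n - (m.start() + len(site)), "-", site))
    return sorted(hits)


if __name__ == "__main__":
    probe = "CAAAAGTGCGATCGATGCTGCGTGGT"  # ATCGAT probe from the EMSA methods
    for start, strand, site in find_occurrences(ZINC_FINGER_MOTIF, probe):
        print(start, strand, site)
```

Because ATCGAT is its own reverse complement, a scan of this kind reports the same site on both strands; a real analysis would decide whether to count such palindromic sites once or twice.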
The subsequent motif-containing gene list was analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang, 2009a, 2009b) with a p-value cut-off of 0.001 for identification of enriched gene ontology clusters.

MBP-Miz-1 Pull-down Mass Spectrometry Analysis. MBP-hMiz-1 was bound to amylose-linked agarose beads (New England Biolabs) and in vitro transcribed and translated c-Myc (TNT Translation/Transcription Kit, Promega) in ZBA for 20 minutes, then washed with 10 volumes of ZBA, eluted with LDS sample loading buffer (Invitrogen Life Technologies) and detected by Western blot against human c-Myc. MBP-hMiz-1 was also bound to amylose-linked agarose beads (New England Biolabs) and H9 human embryonic stem cell nuclear extracts in ZBA for 20 minutes, then washed with 10 volumes of ZBA and eluted in ½X ZBA with 10 mM maltose. Samples were submitted to the UC Davis Proteomics Core for mass spectrometry analysis. Data were analyzed by Scaffold 3 (Proteome Software, Inc., Portland, Oregon).

RESULTS

Miz-1 Expression and Purification by MBP. Induction of recombinant MBP-hMiz-1 protein expression by IPTG under the tac promoter in E. coli is an efficient and effective means of robust protein production. The MBP tag allows for efficient purification by amylose-linked agarose beads and elution with maltose. SDS-PAGE and Bradford Assay confirm MBP-Miz-1 purity and concentrations of greater than 2 μM (Figure 1), important for subsequent implementation in the in vitro DNA binding assay.

De Novo Motif Finding by Bind-n-Seq. Bind-n-Seq is a high-throughput, in vitro DNA binding assay that allows for the systematic and rapid detection of DNA binding motifs in parallel.
While other protein-DNA binding approaches have been developed and widely implemented, including ChIP-chip, ChIP-Seq, protein-binding microarrays (PBM), cyclical amplification and selection of targets (CAST), systematic evolution of ligands by exponential enrichment (SELEX) and even one- and two-hybrid systems, Bind-n-Seq has distinct advantages over these other analyses. In vivo approaches, including ChIP technologies and one- and two-hybrid systems, are powerful but experimentally complex and limited in their application by, for example, the availability of ChIP-quality antibodies. In vitro analyses, including PBM, CAST and SELEX, are limited in scope by the size of the DNA library available for analysis and/or the labor required to successfully execute the technique. Bind-n-Seq, through its simple design and implementation of next-generation sequencing technology, overcomes challenges of experimental complexity and scope.

Figure 1. SDS-PAGE Detection of Protein from Purification by MBP. Significant amounts of recombinant protein were obtained by purification by MBP, while little carryover of bacterial proteins is evident.

Short, randomly generated oligonucleotides (21 bp binding region) with barcodes are bound to MBP-protein constructs and amylose-linked agarose beads, washed, eluted with maltose and identified by massively parallel sequencing to generate approximately 100,000 reads per sample, while maintaining the ability to run up to 64 samples in parallel (Zykovich, 2009). In this study, MBP fused to full-length Miz-1 and MBP fused to the Miz-1 zinc fingers (residues 269-793) were each analyzed by Bind-n-Seq across 5 different binding buffer and wash buffer conditions. The 5 most highly enriched consensus sequence motifs identified for each condition are presented in Tables 1 and 2 for the full-length and zinc-finger constructs respectively. All motifs had significant enrichment of greater than 5-fold and up to 25-fold over background.
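One common way to express enrichment over background is the ratio of a k-mer's frequency among the selected reads to its frequency in a background read set. The sketch below illustrates that idea only. It is not the Bind-n-Seq analysis pipeline used in this study, and the function names, the add-one smoothing, and the choice of background reads are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer made only of A, C, G, T across a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def fold_enrichment(selected_reads, background_reads, k, pseudocount=1.0):
    """Hypothetical helper: smoothed frequency ratio, selected over background.

    The pseudocount is added to every one of the 4**k possible k-mers so
    that k-mers absent from the background do not cause division by zero.
    Only k-mers observed in the selected reads are returned.
    """
    selected = kmer_counts(selected_reads, k)
    background = kmer_counts(background_reads, k)
    vocabulary = 4 ** k
    selected_total = sum(selected.values()) + pseudocount * vocabulary
    background_total = sum(background.values()) + pseudocount * vocabulary
    return {
        kmer: ((n + pseudocount) / selected_total)
        / ((background[kmer] + pseudocount) / background_total)
        for kmer, n in selected.items()
    }


if __name__ == "__main__":
    # Toy reads, invented for illustration; real reads carry a 21 bp random region.
    selected = ["AAATCGATAA", "CCATCGATCC", "GGATCGATGG"]
    background = ["AAATCGATAA", "ACGTACGTAC", "TTGCATGCAA"]
    scores = fold_enrichment(selected, background, k=6)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

With read sets as small as the toy example, the smoothing term dominates the ratio; the calculation only becomes informative at sequencing depths such as the roughly 100,000 reads per sample described above.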
Interestingly, conditions of higher stringency (higher salt concentration) did not yield the lowest enrichment; rather, conditions of low protein concentration produced the lowest enrichment values. Further Gapped Local Alignment of Motifs (GLAM2) analysis (Frith, 2008) was performed on each set of consensus sequences identified for the respective protein constructs and the results are presented in Table 3. The motifs GATTACCGA and ATCGAT were identified as the most significant matches for the full-length and zinc finger constructs respectively.

Table 1. Full-length Miz-1 Motif Consensus Sequences Identified by Bind-n-Seq. The five most highly enriched consensus sequences for each binding condition of the BnS assay for the full-length Miz-1 construct are shown along with the enrichment score.

Binding Condition               Consensus Sequence   Fold-Enrichment
50 nM protein, 1 mM salt        ATAATCGAT            17.417
                                GATTACCGA            16.667
                                CGATTAATCG           14.5
                                ATTACCGATC           14.312
                                AATCGATCTC           13.6
50 nM protein, 50 mM salt       ATCGGTAATC           20.867
                                GATTACCGA            19.31
                                ATCGGCAATC           17.5
                                ATCGGTATTC           16.952
                                GGCTTACCGA           15.529
50 nM protein, 100 mM salt      ATCGGTAATC           14.125
                                GATTACCGA            13.478
                                GGATTACCGA           12.8
                                AGATTACCGA           11.8
                                GATTGCCGAA           11.667
5 nM protein, 100 mM salt       GATTACCGA            9.808
                                AGATTGCCGA           9.6
                                ATCGGTAATC           9.375
                                GATTGCCGA            6.333
                                ATCGATTAA            5.871
350 nM protein, 100 mM salt     GATTACCGA            11.162
                                ATCGATTAC            11.029
                                GATTGCCGA            11
                                AATCGATTA            10.514
                                TAATCGATTA           9.474

Table 2. Zinc finger Miz-1 Motif Consensus Sequences Identified by Bind-n-Seq. The five most highly enriched consensus sequences for each binding condition of the Bind-n-Seq assay for the zinc finger Miz-1 construct are shown along with the enrichment score.

Binding Condition               Consensus Sequence   Fold-Enrichment
50 nM protein, 1 mM salt        ATCGATTAAT           25.053
                                TAATCGATTA           23.71
                                ATAATCGATC           19.611
                                ATCGGTAATC           16.909
                                ATCGATTAA            16.81
50 nM protein, 50 mM salt       AAAAATCGAT           26.2
                                ATCGGTAATC           17.267
                                ATCGGCAATC           16.053
                                ATCGATTAAA           16
                                ATCGATTAC            14.2
50 nM protein, 100 mM salt      AACATCGAT            16.421
                                GATTGCCGA            16.31
                                AGTAATCGAT           15.053
                                CATCGATCG            14.864
                                ATCGATCGAT           13.333
5 nM protein, 100 mM salt       ATCGATCGAT           19.6
                                ATCGATCGA            16
                                GATTGCCGA            14.227
                                ATCGGTACTC           12.2
                                ATCGGTACTC           11
120 nM protein, 100 mM salt     ATCGGTATC            10.75
                                ATCGATTG             9.619
                                AATCATCGAT           8.833
                                GATTGCCGA            7.781
                                GATTACCGA            6.892

Bioinformatics Analysis of Motifs Identified by Bind-n-Seq. Motifs retrieved from BnS were analyzed by GLAM2 to generate a list of principal motifs for each of the full-length and zinc finger constructs. The two top-scoring motifs are shown in Table 3. The MEME-formatted motifs from GLAM2 were subsequently used to search Miz-1 ChIP-chip data by Find Individual Motif Occurrences (FIMO). FIMO analysis revealed 3052 and 2411 motif occurrences with a p-value less than 0.0001 for the full-length and zinc finger constructs respectively. Database for Annotation, Visualization and Integrated Discovery (DAVID) analysis of Miz-1 bound genes containing motif occurrences identified by FIMO shows enriched clusters of functionally related genes. Significant gene ontology clusters are outlined for both full-length and zinc finger constructs in Table 4 and include genes involved in cell growth; differentiation and development, including Hox genes; regulation of transcription and DNA binding; and chromatin structure, including acetylation.

EMSA Supports Miz-1 Binding ATCGAT and GATTACCGA. Electrophoretic mobility shift analysis is a long-standing, robust method of confirming protein-DNA interaction and serves as a means of validating the affinity of the human Miz-1 protein for the motifs ATCGAT and GATTACCGA. Poly-dI/dC non-specific competitor oligonucleotide was added to the binding reactions and results were analyzed by polyacrylamide DNA retardation gel electrophoresis.
SYBR green DNA stain reveals that probes containing ATCGAT and GATTACCGA were bound and shifted in the gel in the presence of full-length Miz-1 protein, while reactions lacking Miz-1 protein ran unaltered.

MBP-Miz-1 Pull-down Mass Spectrometry Analysis. MBP-Miz-1 pull-down experiments were performed to screen for novel Miz-1 cofactors in hESCs. To assess the feasibility of a pull-down experiment with the MBP-Miz-1 constructs, MBP-Miz-1 proteins were incubated with a known interactor, c-Myc, and analyzed by western blot (Figure 4). c-Myc binds the full-length MBP-Miz-1 construct but does not bind the MBP-Miz-1 zinc-finger construct. Although the zinc-finger construct lacks the BTB/POZ domain, it does contain the reported c-Myc interaction domain (Sakamuro & Prendergast, 1999). Stringent washing with high ionic strength buffer (2 M NaCl) did not disrupt the interaction between MBP-Miz-1 and c-Myc; however, treatment with detergent abolished the interaction. Less stringent washing with ZBA yielded background bands in the bead-only control.

The MBP-Miz-1 pull-down mass spectrometry analysis was performed with an MBP-only control, and the results are summarized in Table 5. Significant matches were identified with the full-length Miz-1 construct, while the MBP-only and Miz-1 zinc-finger constructs yielded far fewer peptides, all of which were present in the MBP-only control. Miz-1 recombinant protein was readily detected and was the most abundant protein in the samples, as expected. Ribosomal and collagen-associated proteins identified as top hits in the Scaffold 3 analysis can be disregarded as background from the hESC nuclear extracts and as contamination, respectively. Nucleophosmin (NPM) and developmental pluripotency-associated protein 4 (DPPA4), however, are of interest as putative Miz-1 cofactors, although further validation by western blot is needed.

Table 3. Putative Miz-1 DNA Binding Motifs. Consensus seed motifs from Bind-n-Seq were analyzed by GLAM2, and the highest scoring motifs for the full-length and zinc-finger constructs are shown.

Construct      Consensus    Motif                            Score
Full-length    GATTACCGA    G(A/C)(T/A)(T/A)(A/G)(C/T)CGA    198.362
Zinc fingers   ATCGAT       ATCG(A/G)(T/C)                   188.623

Table 4. Gene Ontology Clusters Identified by DAVID Analysis. Gene lists generated by FIMO for the full-length (FL) and zinc-finger (ZF) constructs were submitted for DAVID analysis. The most significant gene ontology clusters are shown (p ≤ 0.001), along with the percentage of genes that comprise each cluster and the fold enrichment.

Gene Ontology Cluster                       Construct   % Total Genes in GO Category   P-Value        Fold Enrichment
Positive regulation of cell proliferation   FL          8                              2.50x10^-05    3.5
Regulation of transcription                 FL          23                             4.10x10^-04    1.6
Regulation of cell proliferation            FL          9.9                            7.80x10^-04    2.3
Embryonic organ development                 FL          4.2                            8.90x10^-04    4.5
Transcription cofactor activity             FL          7                              1.30x10^-04    3.4
Transcription regulator activity            FL          16.4                           2.30x10^-04    1.9
DNA binding                                 FL          22.1                           3.20x10^-04    1.7
Transcription corepressor activity          FL          4.2                            3.80x10^-04    5.1
Transcription factor binding                FL          8                              4.80x10^-04    2.7
Nucleus                                     FL          36.2                           7.80x10^-07    1.7
DNA-binding                                 FL          18.8                           2.90x10^-05    2
Homeobox                                    FL          5.6                            5.70x10^-05    4.7
Phosphoprotein                              FL          48.8                           1.30x10^-04    1.3
Acetylation                                 FL          22.5                           2.00x10^-04    1.7
Transcription regulation                    FL          17.8                           7.00x10^-04    1.8
Chromatin                                   ZF          0.5                            4.20x10^-04    5
Transcription factor activity               ZF          1.2                            6.00x10^-04    2.3
Transcription regulator activity            ZF          1.6                            7.70x10^-04    1.9
Phosphoprotein                              ZF          4.8                            3.60x10^-05    1.4
Nucleus                                     ZF          3.2                            1.80x10^-04    1.6
Acetylation                                 ZF          2.1                            9.30x10^-04    1.7

Figure 2. DAVID Analysis of Motif-containing Miz-1 Bound Genes. Gene ontology clusters from DAVID analysis of genes both bound by Miz-1 and containing motifs. (Chart: fold enrichment, axis 0 to 6, for each gene ontology cluster; series Miz-1 FL and Miz-1 ZF.)

Figure 3. EMSA Indicates Miz-1 Binding Motif-containing Oligonucleotide Probes. Oligonucleotide probes and poly-dI/dC nonspecific competitor DNA were incubated with or without full-length Miz-1 protein, separated by nondenaturing polyacrylamide gel electrophoresis, and visualized by SYBR green staining under 300 nm UV transillumination. Protein-containing lanes show retardation of the short oligonucleotides containing the motifs ATCGAT and GATTACCGA. (Lane labels: GATTACCGA; Miz FL + GATTACCGA; ATCGAT; Miz FL + ATCGAT; STD, 100 bp.)

Figure 4. MBP-Miz-1 Pull-down of c-Myc. Western blot of MBP-Miz-1 pull-down with c-Myc (72 kDa) input under increasingly stringent washing conditions reveals specific binding under high salt. Under the 2 M NaCl wash condition, c-Myc appears to maintain its interaction with Miz-1.
NP-40 detergent abolishes the interaction, and ZBA wash buffer yields higher background.

Table 5. MBP-Miz-1 Mass Spectrometry Analysis by Scaffold 3. Protein coverage and percent identification probability are given for targets of interest identified by MS. Two putative cofactors were identified. DPPA4 is of interest for its role in mediating pluripotency in mouse embryonic stem cells, and NPM is of interest for its important known transcriptional regulatory functions. Further validation by direct pull-down and western blot is needed to confirm these interactions.

Protein   Coverage   Protein Identification Probability
Miz-1     9%         100%
NPM       8%         100%
DPPA4     8%         50%

Figure 5. Current Model Proposed for Miz-1 Target Gene Regulation. Miz-1 gene repression involves the recruitment of protein cofactors including Myc, HDAC and Dnmt3a, while Miz-1 gene activation involves the co-activators p300 and NPM. Adapted from Varlakhanova, et al., 2011. (Diagram labels: Dppa4, Myc, Max, p300, Dnmt3a, HDAC, Miz-1, NPM; binding sites INR, ATCGAT, GATTACCGA; target gene categories growth, cell cycle, chromatin structure, acetylation, transcriptional control, development, Hox.)

DISCUSSION

Essential to the success of regenerative medicine therapies is our basic scientific understanding of hESCs. The complex regulatory network that governs pluripotency and self-renewal, the distinguishing characteristics of hESCs, is an important topic of basic stem cell research. c-Myc, the well-studied oncogene, is also a key player in the maintenance of pluripotency and self-renewal in stem cells (Varlakhanova, et al., 2010). Myc regulation of pluripotency and self-renewal is evidenced by its function in the generation of iPS cells (Takahashi, et al., 2007; Takahashi & Yamanaka, 2006). Myc regulates target gene expression both locally, by classical mechanisms, and globally, through euchromatin remodeling (Knoepfler, et al., 2006).
In this way, Myc can affect gene expression on a large enough scale to reprogram differentiated cells into iPS cells.

Miz-1, named for its interaction with Myc, is known to bind initiator sequences in the core promoters of target genes, thereby modulating their expression (Kime & Wright, 2003; Peukert, et al., 1997; Seoane, Le, & Massague, 2002). In the current model (Figure 5), Miz-1 is thought to form a co-repressor complex with Myc, silencing Miz-1 target genes (Peukert, et al., 1997); alternatively, Miz-1 forms a co-activating complex with p300 and NPM, activating target genes (Wanzel, Herold, & Eilers, 2003; Wanzel, et al., 2008). Still, relatively little is known about the function of Miz-1 as a transcriptional regulator, and recent gene expression and epigenetic analyses in human ESCs suggest that Miz-1 binds alternative sequences not associated with the initiator sequences of target gene promoters (Varlakhanova, et al., 2011).

CEAS analysis of Miz-1 ChIP-chip data failed to identify putative motifs for Miz-1 DNA binding. While this is not surprising given the size and complexity of the data, finding motifs is still vital to the study of Miz-1 in hESCs. An in vitro approach to finding DNA binding motifs is an efficient and comprehensive alternative way to examine Miz-1-DNA binding. The production of an MBP-Miz-1 fusion protein allows for flexible analysis using both protein-DNA and protein-protein biochemical assays. Bind-n-Seq overcomes problems associated with other motif-finding approaches, including the limited detection and sensitivity of in vivo methods and the time and labor demands of in vitro methods. Instead, Bind-n-Seq employs massively parallel sequencing and an MBP purification scheme to turn de novo motif finding into a high-throughput in vitro assay.
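One common way to think about de novo motif finding from sequenced, protein-bound DNA is as a comparison of k-mer frequencies between the bound pool and a background pool. The sketch below is a toy illustration of that idea only. It is not the Bind-n-Seq software or its statistics, and the read sequences, function names, and pseudocount are hypothetical choices made for the example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rank_enriched_kmers(bound_reads, background_reads, k, pseudocount=1.0):
    """Rank k-mers by (frequency in bound pool) / (frequency in background).

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    for k-mers that never occur in the background pool.
    """
    bound = kmer_counts(bound_reads, k)
    background = kmer_counts(background_reads, k)
    bound_total = sum(bound.values())
    background_total = sum(background.values())
    scores = {}
    for kmer, n in bound.items():
        bound_freq = n / bound_total
        background_freq = (background.get(kmer, 0) + pseudocount) / (
            background_total + pseudocount
        )
        scores[kmer] = bound_freq / background_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy reads: the bound pool shares one 6-mer, the background does not.
bound_pool = ["AAATCGATAA", "CCATCGATGG", "TTATCGATCC"]
background_pool = ["ACGTACGTAC", "TTGACCAGTA", "GGCATTCAAG"]
ranked = rank_enriched_kmers(bound_pool, background_pool, k=6)
print(ranked[0][0])  # the shared 6-mer ranks first
```

Real data would need far more reads, handling of both strands, and a statistical model of the background, which is where dedicated motif-finding tools come in.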
Identification of novel human Miz-1 DNA binding motifs by the Bind-n-Seq assay (Zykovich, et al., 2009) revealed highly enriched DNA binding motifs for Miz-1 with both the full-length and zinc-finger protein constructs. Currently, the Bind-n-Seq assay is optimized for the study of zinc-finger containing proteins, and even then, results depend heavily on the specific properties of each protein fusion construct. Additionally, Bind-n-Seq analysis is likely to identify only the most highly enriched DNA motifs, while there may be several motifs for a given protein depending on its structural conformation. Because of these limitations, Bind-n-Seq serves best as a starting point for further analysis of important protein-DNA interactions, and it is important to incorporate other analyses to corroborate Bind-n-Seq findings.

The consensus motifs identified for Miz-1, ATCGAT and GATTACCGA, were both highly enriched over background and were further analyzed for their ability to bind Miz-1 protein in vitro and for their presence near Miz-1 bound genes, as determined from hESC ChIP-chip data (Varlakhanova, et al., 2011). EMSA confirmed that Miz-1 binds DNA probes containing ATCGAT and GATTACCGA in the presence of poly-dI/dC non-specific competitor oligonucleotide. Probing the Miz-1 ChIP-chip data with FIMO revealed an abundance of motif-containing sequences associated with the Miz-1 ChIP-chip peaks. In total, over 2000 and over 3000 sequences contained significant matches to the GLAM2 motifs for the zinc-finger and full-length constructs, respectively. Of these thousands of hits, many are sequences corresponding to genes that were present in replicate in the ChIP-chip data or that belong to genes that have yet to be fully annotated. In total, 166 and 212 well-annotated genes identified by FIMO were submitted for DAVID analysis for the zinc-finger and full-length constructs, respectively.
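To make the motif scan concrete, the degenerate motifs in Table 3 can be written as character classes and searched on both DNA strands. This is a simplified sketch: FIMO scores sequences against a position-specific matrix and reports p-values, whereas the code below does exact pattern matching only. The test sequence and function names are hypothetical.

```python
import re

# Degenerate positions from Table 3, written as regex character classes.
MOTIFS = {
    "full_length": "G[AC][TA][TA][AG][CT]CGA",  # consensus GATTACCGA
    "zinc_finger": "ATCG[AG][TC]",              # consensus ATCGAT
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan(seq, pattern):
    """Find motif matches on both strands.

    Returns sorted (start, strand, site) tuples, where start is the 0-based
    position of the site's leftmost base on the forward strand.
    """
    seq = seq.upper()
    regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    rc = reverse_complement(seq)
    for m in regex.finditer(rc):
        site = m.group(1)
        start = len(seq) - m.start() - len(site)  # map back to forward coordinates
        hits.append((start, "-", site))
    return sorted(hits)


# Hypothetical sequence containing one copy of each consensus motif.
example = "TTGATTACCGAAATCGATCC"
print(scan(example, MOTIFS["full_length"]))
print(scan(example, MOTIFS["zinc_finger"]))
```

Because ATCGAT is its own reverse complement, a two-strand scan reports the same site once per strand, so hits for that motif should be de-duplicated by position before counting. GATTACCGA does not have this property and is reported on one strand only.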
Significant, functionally related gene ontology clusters overlapped between the two constructs, including acetylation, phosphoprotein, and transcriptional regulators. The full-length construct yielded more significant matches in both FIMO and DAVID analysis, yet it is important to note the substantial overlap between the two constructs. Across both constructs, the most highly enriched gene ontology clusters from DAVID analysis include chromatin, homeobox, transcription corepressor, embryonic organ development, and cell proliferation associated genes. These results agree with known functional roles of Miz-1 in hESCs and support the hypothesis that Myc maintains hESC pluripotency and self-renewal in part through a corepression program with Miz-1.

Additionally, the motif ATCGAT is of particular interest because of its similarity to known transcription factor binding motifs, including that of c-Myc. ATCGAT is a palindromic sequence, like the Myc E-box, CACGTG. However, ATCGAT has not previously been associated with other human proteins in the literature. Palindromic motifs in DNA are common and are not unique to transcription factor binding motifs. They can be highly conserved across many species and are important for mobile, repetitive DNA elements like transposons. These shared features of palindromic DNA motifs may imply an important function and significance for the DNA-binding motifs of master transcriptional regulators like Myc and Miz-1.

The MBP-Miz-1 fusion protein construct also allowed for detection of novel Miz-1 protein interactors by MBP pull-down mass spectrometry analysis. Pull-down of hESC nuclear extracts followed by mass spectrometry identification of proteins revealed two candidate cofactors for Miz-1, DPPA4 and NPM.
DPPA4 is not well studied; however, it has been shown to be important to mESC pluripotency and self-renewal, whereby overexpression resulted in cell proliferation and inhibition of differentiation (Masaki, et al., 2007). In contrast, NPM is a better-characterized protein that functions in diverse cellular processes, from histone assembly and cell proliferation to regulation of important tumor suppressors like p53 and ARF (Okuwaki, 2008; Swaminathan, et al., 2005; Wang, et al., 2011). The coordinate role of Miz-1 and NPM is characterized only by their association in a co-activating complex (Figure 5), and further investigation is of particular interest. Additional biochemical analysis, such as direct immunoprecipitation or pull-down with western blot detection, will need to be conducted to validate the putative protein cofactors. Like the Bind-n-Seq assay, MBP pull-down mass spectrometry analysis serves as a great starting point for more in-depth biochemical and in vivo studies.

Bind-n-Seq has revealed important INR-independent DNA binding functions of Miz-1, and MBP pull-down mass spectrometry has identified interesting putative cofactors of Miz-1 in hESCs, while supporting the antagonistic functions of Myc and Miz-1 in hESCs. Though these analyses require further validation and study in vivo, the work represents an essential advance in the understanding of an important master transcriptional regulator and Myc antagonist in hESCs. Continued study of the basic regulation of pluripotency and self-renewal in hESCs is vital to our understanding of their purpose and potential in regenerative medicine so that therapies may be safe and effective.

LITERATURE CITED

Brenner, C., Deplus, R., Didelot, C., Loriot, A., Vire, E., De Smet, C., et al. (2005). Myc represses transcription through recruitment of DNA methyltransferase corepressor. EMBO Journal, 24(2), 336-346.

Chaudhary, J., Skinner, M. K. (1999). Basic Helix-Loop-Helix Proteins Can Act at the E-Box within the Serum Response Element of the c-fos Promoter to Influence Hormone-Induced Promoter Activation in Sertoli Cells. Molecular Endocrinology, 12(5), 774-786.

Cotterman, R., Jin, V. X., Krig, S. R., Lemen, J. M., Wey, A., Farnham, P. J., et al. (2008). N-Myc Regulates a Widespread Euchromatic Program in the Human Genome Partially Independent of Its Role as a Classical Transcription Factor. Cancer Research, 68(23).

Eisenman, R. N. (2001). Deconstructing Myc. Genes and Development, 15, 2023-2030.

Evans, M. J., Kaufman, M. H. (1981). Establishment in culture of pluripotent cells from mouse embryos. Nature, 292, 154-156.

Frith, M. C., Saunders, N. F., Kobe, B., Bailey, T. L. (2008). Discovering sequence motifs with arbitrary insertions and deletions. PLoS Computational Biology, 4(5).

Grandori, C., Cowley, S. M., James, L. P., Eisenman, R. N. (2000). The MYC/MAX/MAD network and the transcriptional control of cell behavior. Annual Review of Cell and Developmental Biology, 16, 653-699.

Grant, C. E., Bailey, T. L., Noble, W. S. (2011). FIMO: Scanning for occurrences of a given motif. Bioinformatics, 27(7), 1017-1018.

Huang, D. W., Sherman, B. T., Lempicki, R. A. (2009a). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research, 37(1), 1-13.

Huang, D. W., Sherman, B. T., Lempicki, R. A. (2009b). Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protocols, 4(1), 44-57.

Ji, X., Li, W., Song, J., Wei, L., Liu, X. S. (2006). CEAS: cis-regulatory element annotation system. Nucleic Acids Research, 34, W551-W554.

Kim, J. B., Sebastiano, V., Wu, G., Araúzo-Bravo, M. J., Sasse, P., Gentile, L., Ko, K., Ruau, D., Ehrich, M., van den Boom, D., Meyer, J., Hübner, K., Bernemann, C., Ortmeier, C., Zenke, M., Fleischmann, B. K., Zaehres, H., Schöler, H. R. (2009). Oct4-induced pluripotency in adult neural stem cells. Cell, 136(3), 411-419.

Kime, L., & Wright, S. C. (2003). Mad4 is regulated by a transcriptional repressor complex that contains Miz-1 and c-Myc. Biochemical Journal, 370(1), 291-298.

Knoepfler, P. S. (2007). Myc goes global: New tricks for an old oncogene. Cancer Research, 67(11), 5061-5063.

Knoepfler, P. S., Zhang, X.-y., Cheng, P. F., Gafken, P. R., McMahon, S. B., & Eisenman, R. N. (2006). Myc influences global chromatin structure. EMBO Journal, 25(12).

Masaki, H., Nishida, T., Kitajima, S., Asahina, K., Teraoka, H. (2007). Developmental Pluripotency-associated 4 (DPPA4) Localized in Active Chromatin Inhibits Mouse Embryonic Stem Cell Differentiation into a Primitive Ectoderm Lineage. Journal of Biological Chemistry, 282, 33034-33042.

McMahon, S. B., Wood, M. A., Cole, M. D. (2000). The essential cofactor TRRAP recruits the histone acetyltransferase hGCN5 to c-Myc. Molecular and Cellular Biology, 20, 556-562.

Meyer, N., Penn, L. Z. (2008). Reflecting on 25 years with MYC. Nature Reviews Cancer, 8(12), 976-990.

Nakagawa, M., Koyanagi, M., Tanabe, K., Takahashi, K., Ichisaka, T., Aoi, T., Okita, K., Mochiduki, Y., Takizawa, N., Yamanaka, S. (2008). Generation of induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nature Biotechnology, 26(1), 101-106.

Okuwaki, M. (2008). The structure and functions of NPM1/Nucleophosmin/B23, a multifunctional nucleolar acidic protein. Journal of Biochemistry, 143(4), 441-448.

Peukert, K., Staller, P., Schneider, A., Carmichael, G., Hanel, F., Eilers, M. (1997). An alternative pathway for gene regulation by Myc. EMBO Journal, 16, 5672-5686.

Pierce, G. B. Jr., Dixon, F. J. Jr., Verney, E. L. (1960). Teratocarcinogenic and tissue-forming potentials of the cell types comprising neoplastic embryoid bodies. Laboratory Investigation, 9, 583-602.

Sakamuro, D., & Prendergast, G. C. (1999). New Myc-interacting proteins: A second Myc network emerges. Oncogene, 18(19), 2942-2954.

Seoane, J., Le, H.-V., & Massague, J. (2002). Myc suppression of the p21Cip1 Cdk inhibitor influences the outcome of the p53 response to DNA damage. Nature, 419(6908), 729-734.

Seoane, J., Pouponnot, C., Staller, P., Schader, M., Eilers, M., & Massague, J. (2001). TGFbeta influences Myc, Miz-1 and Smad to control the CDK inhibitor p15INK4b. Nature Cell Biology, 3(4), 400-408.

Stadtfeld, M., Hochedlinger, K. (2010). Induced pluripotency: history, mechanisms, and application. Genes & Development, 24(20), 2239-2263.

Staller, P., Peukert, K., Kiermaier, A., Seoane, J., Lukas, J., Karsunky, H., Moroy, T., Bartek, J., Massague, J., Hanel, F., Eilers, M. (2001). Repression of p15INK4b expression by Myc through association with Miz-1. Nature Cell Biology, 3, 392-399.

Swaminathan, V., Kishore, A. H., Febitha, K. K., Kundu, T. K. (2005). Human histone chaperone nucleophosmin enhances acetylation-dependent chromatin transcription. Molecular and Cellular Biology, 25(17), 7534-7545.

Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., et al. (2007). Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 131(5), 861-872.

Takahashi, K., & Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4), 663-676.

Varlakhanova, N. V., Cotterman, R. F., deVries, W. N., Morgan, J., Donahue, L. R., Murray, S., Knowles, B. B., Knoepfler, P. S. (2010). Myc maintains embryonic stem cell pluripotency and self-renewal. Differentiation, 80, 9-19.

Varlakhanova, N. V., et al. (2011). Myc and Miz-1 have coordinate genomic functions including targeting Hox genes in human embryonic stem cells. Epigenetics and Chromatin, 4(20).

Wang, H. F., Takenaka, K., Nakanishi, A., Miki, Y. (2011). BRCA2 and nucleophosmin coregulate centrosome amplification and form a complex with the Rho effector kinase ROCK2. Cancer Research, 71(1), 68-77.

Wanzel, M., Herold, S., Eilers, M. (2003). Transcriptional Repression by Myc. Trends in Cell Biology, 13(3), 146-150.

Wanzel, M., Russ, A. C., Kleine-Kohlbrecher, D., Colombo, E., Pelicci, P. G., Eilers, M. (2008). A ribosomal protein L23-nucleophosmin circuit coordinates Miz1 function with cell growth. Nature Cell Biology, 10(9), 1051-1061.

Wu, S., Cetinkaya, C., Munoz-Alonso, M., von der Lehr, N., Bahram, F., Beuger, V., Eilers, M., Leon, J., Larsson, L. G. (2003). Myc represses differentiation-induced p21CIP1 expression via Miz-1-dependent interaction with the p21 core promoter. Oncogene, 22, 351-360.

Zykovich, A., Korf, I., Segal, D. J. (2009). Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Research, 37(22), e151.