Response to reviews, MS ID#: GENOME/2013/164830
MS TITLE: Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis
Weisheng Wu et al.

The comments from reviewers are in black, and our responses are in blue.

Reviewer 1

Reviewer 1 Comments for the Author...

In this manuscript, Wu and colleagues investigate the role of TAL1 in hematopoiesis, and specifically how changes in the binding specificity of TAL1 are modulated by other transcription factors and contribute to commitment to the erythroid lineage. The authors identify the GATA factors as key transcription factors in this process, and demonstrate that the ensuing choreography of TAL1 occupancy is highly dynamic through differentiation. I am overall enthusiastic about the manuscript. My major critique is that the manuscript suffers, in my opinion, from multiple sections that are very dense and difficult to work through. In several of these cases, it was difficult for me to figure out what was the main point of the section, and it took several reads before I understood. Much of the discussion closely recapitulates the results, and it would perhaps be better if the discussion was focused on integrating the complex results into a bigger picture, and putting them into context. Overall, I feel that this is interesting, revealing, and well-executed research that would strongly benefit from efforts to streamline the text and to focus attention on the key points.

We are gratified by the complimentary comments, and we have worked for clarity during our revision and streamlining. We revised the sections specifically commented on by the Reviewer, and made revisions throughout to improve clarity. We re-wrote the Discussion to emphasize how our results integrate into a larger picture.

Specific critiques / descriptions of some areas where writing could be improved:

I struggle with the analysis presented in Table 2 (pages 7-8) of the manuscript.
In this section, the authors demonstrate a divergence in the function of candidate TAL1 target genes. They describe how TAL1 targets in precursor cells are enriched in functions such as proliferation and apoptosis, while targets in megakaryocytic cells and erythroid cells are enriched in related functions, respectively. In addition to being difficult to parse through, I was also distracted/confused by several results that seem to suggest that the functional enrichment is not specific. For example, several of the megakaryocyte gene categories are also enriched in TAL1 target genes that are specific to erythroid cells (e.g. abnormal megakaryocyte morphology, p = 2E-129). Also, the authors describe enriched innate immunity in the gene targets of Ter119- cells, but it appears that innate immunity is more strongly enriched in the G1E cell type. Anything that could be done to make this section more approachable would, in my opinion, greatly benefit the reader.

The reviewer’s comment led us to make extensive revisions to this section. We now emphasize from the start of the section two confounding factors in this analysis: (1) each gene is associated with multiple TAL1 OSs, each of which can have a different occupancy pattern, and (2) the target for each TAL1 OS is unknown. For the latter issue, we took two approaches. One is an inclusive method that assigns every gene within an enhancer-promoter unit (EPU) as a potential target for a TAL1 OS. The second, more exclusive method assigns the gene with the nearest TSS as the target. In the inclusive approach, each TAL1 OS has multiple gene targets. These several multiplicities allow a given gene to be included in the functional term enrichment for more than one TAL1 OS type (each column in the Table). We emphasize even more strongly in the revised text that the results are significant and robust, even with these serious confounding issues that could have hidden the associations.
We also added the point that some TAL1 OSs could be playing a negative role, e.g. erythroid TAL1 OSs in megakaryocytic genes could be involved in repression. Indeed, we found that the genes contributing to the enrichment for megakaryocytic function in targets of TAL1 occupancy in erythroid cells tended to be expressed at higher levels in megakaryocytes than in erythroblasts. This is now stated in the Results and presented as Supplementary Figure 7. We also made extensive edits to improve clarity.

I found Figure 5C confusing. I am not sure how to interpret TF binding in some cell types with TAL1 occupancy in different cell types. For example, what does the high overlap of GATA1 binding (and comparatively less overlap with GATA2 binding) mean in the Gata1- cell line? What is driving TAL1 absent GATA1 and 2?

We have re-written and expanded the presentation to address this issue. The reviewer is correct that some of the TFs evaluated for overlap with TAL1 OSs are not present in the cell type in which TAL1 was ascertained. We interpret these overlaps as reflecting re-use of the previously bound DNA segments, e.g. early binding by GATA2 could mark places that are later bound by TAL1 and GATA1. Likewise, GATA1 (and TAL1) can bind to places that were previously bound by other factors. This is now explicitly stated in the revised text. Also, we added an analysis of GATA switch sites in a later section. All this is consistent with our previous results that GATA1 binding sites are largely pre-determined by the chromatin environment in the G1E cell line, even though GATA1 is absent there (Wu, et al. 2011). This is added to the revised Discussion.

In Figure 6B, the legend is confusing because it aligns with the cell condition. Moving the legend would make the figure easier to read. In the same figure, I struggle to understand what it means for the sites with indirect gain or retention of TAL1 to be depleted in _both_ repression and induction.
Intuitively, this seems contradictory -- are these sites just largely non-functional? Perhaps it would help to be more clear about how the enrichment and depletion are measured in these analyses.

We appreciate the recommendations for clarification. The legend with the + and – designations was found to be redundant, and it was removed. The method for evaluating enrichment and depletion, relative to the induction and repression frequency of all TAL1 occupied genes, now is explained more completely in the Results and Methods. Also, two new graphs were generated to convey the results more clearly (Fig. 6B). Because the reference point to calculate enrichment of induction is independent of the reference point for enrichment of repression (the two dotted lines now in the bar plot), it is possible for the presumptive target genes for a group of TAL1 OSs to be depleted for both induction and repression. The question as to whether these sites associated with groups of genes depleted for both induction and repression could be nonfunctional led to a new paragraph in the Discussion, where we acknowledge the possibility that some of this binding could be “opportunistic” and point to further experiments to address this issue.

Reviewer 2

This manuscript reports the analysis of genomic occupancy by the key transcriptional regulator TAL1 across hematopoietic differentiation in progenitor cells, at various stages of erythroid maturation and in megakaryocytes. One of the aims was to determine whether the genes regulated by TAL1 in the different cell types are distinct and how differential binding might occur, in order to get better insight into TAL1 function. The authors have compiled their own data with that from previously published reports to correlate the pattern of TAL1 genomic occupancy with gene expression, histone modifications, DNA motifs and co-occupancy by other critical transcription factors.
This is a very thorough and integrated analysis that offers a comprehensive map and a clear view of the highly dynamic shifts in TAL1-occupied genomic sites across hematopoiesis. This is, to my knowledge, the first time a study of that kind is reported. It is a very good demonstration of how large-scale data analyses can validate and extend important previous observations, thereby linking individual studies in a meaningful way. The authors have created a complete genomic platform that will facilitate further analyses of some of the regulatory mechanisms underlying lineage specification and erythroid differentiation. This is an important resource for the community that will certainly serve as a paradigm for the study of other transcription factors and differentiation systems.

We thank the reviewer for these comments.

Some interesting observations are made that could be further developed, as detailed below.

1. Page 8. The increasing number of TAL1 binding sites in the Cpox locus as erythroid differentiation progresses is of interest but its significance is unclear and not discussed. Would inspection of the sequences underlying the peaks provide some clues? Could the authors discuss how this might lead to increased levels of expression as differentiation proceeds?

We examined this locus more carefully in response to these questions. Matches to transcription factor binding sites for GATA factors and KLF proteins, as well as E-boxes, were found underlying the peaks. These motifs, especially the GATA and KLF motifs, are enriched at a large number of TAL1 OSs, which is discussed in a separate section. Thus we did not add a motif analysis at this point in the manuscript. The observation of multiple KLF binding site motifs motivated us to examine another KLF1 occupancy dataset (Tallack et al, 2010, Genome Research), which showed peaks overlapping the TAL1 OSs in Cpox, and we modified the figure to include this.
Also, to address the potential significance of the additional binding sites, we added the statement “These additional sites of binding by TAL1 and other hematopoietic transcription factors may serve to keep the Cpox gene expressed while the bulk of the genome is repressed during later stages of erythroid maturation.” Inspection of both the microarray and RNA-seq data during erythroid maturation shows that Cpox is already expressed at a moderate level in erythroid progenitors, and its level of expression increases about two-fold during maturation. Thus the additional TAL1 binding sites may not be needed for a strong increase in expression, at least for this locus, but there is a need to block the encroaching repressive chromatin. We also added to the Discussion some thoughts about the complex relationships between number of TF OSs and expression levels.

2. Page 8. How do the authors explain the complex and changing binding pattern of TAL1 in some genes such as Cbfa2t3, with limited changes in expression levels? As in point 1, what are the sequences underlying each of those peaks and could different combinations of DNA motifs account for the differential binding?

This is a challenging observation, which we now emphasize more in the text in response to this comment. After observing “This indicates a complex set of regulatory regions that are utilized dynamically during hematopoiesis,” we expanded the text to say “Remarkably, these dynamic changes in occupancy are not accompanied by large changes in expression of Cbfa2t3, suggesting that distinct sets of TF-bound DNA segments are utilized in different lineages to achieve a similar level of expression”. This was not what we expected, and it is not clear why different sets of CRMs would be used in different lineages to achieve the same end, but we agree with the reviewer that this is important to explicitly point out.
We also discuss possible explanations for the multiple binding sites per gene in the Discussion (complex regulation, redundancy, opportunistic binding) and suggest experiments that could test them. As in our response to comment 1, we devote a separate section to motif analysis, which leads to some helpful insights about other TFs that help guide TAL1 to lineage-specific binding sites. Thus we do not describe separately the motif occurrences underlying TAL1 OSs in Cbfa2t3.

3. Page 11. The analysis of TAL1 binding in G1E cells before and after induction of GATA1 expression gives a good picture of the possible combinations of TAL1 and GATA1 binding and their association with gene expression. It would be interesting to take this further and provide some data on how the interplay between GATA2 and GATA1, as hinted at in the text, might explain TAL1 genomic binding. To do this, the authors might want to incorporate in their study data from the recent paper by Suzuki et al (Genes to Cells, 2013) who have contrasted GATA1 and GATA2 binding by ChIP in primary erythroid cells. This might give molecular insights into how the shifts in TAL1 occupancy may occur during erythroid differentiation and thereby strengthen one of the key points of the paper.

We thank the reviewer for this suggestion to explore the role of GATA2. Suzuki et al (2013) mapped sites of occupancy by GATA2 and GATA1 in a cell model system similar to the G1E system, using ChIP-chip. They found about 1800 GATA2-occupied segments and 1600 GATA1 OSs. We wanted to compare their results to the approximately 4000 GATA2 OSs and 14,000 GATA1 OSs we had previously reported, but after multiple contacts with the authors, we were not able to obtain their peak calls. It does appear from the results in the paper that we are seeing similar patterns, albeit with quite different numbers of bound sites.
Following the suggestion of the reviewer, we added an analysis of how TAL1 occupancy is influenced by GATA2 binding, in particular at the GATA switch sites. We added a panel (C) to Fig. 6, a Supplementary Figure 10, a new paragraph at the end of the Results, and a new paragraph in the Discussion. The new analysis reveals two prominent classes of TAL1 OSs at GATA switch sites. The most abundant are sites at which TAL1 is retained after the switch, and these are in a category associated with gene induction. The other class is characterized by a loss of TAL1 after the switch, which is a well-known model for repression during erythroid differentiation.

Other points

4. The authors need to soften some of their statements with regards to the novelty of their findings. As examples, the fact that Gata motifs are stronger determinants of TAL1 binding than Ebox motifs has been reported previously: (i) before large-scale ChIP assays were performed, as mentioned in Palii et al (Embo J 2011) and (ii) more recently in ChIP-seq studies by the authors themselves (Tripic et al) and others. Similarly, the broader statement that recruitment of TAL1 to different genomic sites in different cell contexts operates through association with distinct, additional regulators clearly comes through this analysis to strengthen previous observations. It is important to mention that molecular mechanisms that underlie the redistribution of TAL1 in different cell types are now coming to light, as described in El Omari et al, Cell Reports, 2013.

We agree with the reviewer and we have revised our text to ensure that the previous reports are incorporated. We thank the reviewer for emphasizing the insights from the structural analysis in El Omari et al, and we have incorporated these into the Discussion.

5. Page 4. The sentence referring to the heptad of TFs described by Wilson et al suggests the existence of one large multiprotein complex containing the 7 TFs.
I don’t think it has ever been shown that such a complex exists; it is more likely that several smaller complexes may form on a given element. Please rephrase.

We have rephrased this to refer to co-associating proteins rather than a large complex.

6. The recent paper by Sanjuan-Pla et al, Nature 2013 identifies a platelet-primed HSC population and should therefore be referred to when mentioning the priming of megakaryopoiesis in HPC7 cells.

We thank the reviewer for pointing this out, and we now incorporate the reference as recommended.

7. Suppl Figure S3. The correlation in signal strength for TAL1 and GATA1 ChIP-seq in G1E-ER4+E2 (p.11) is not clear. Please provide additional examples.

We thank the reviewer for pointing out this error. We meant to refer to Supplementary Figure 9C, and we have corrected this.

8. Page 9. The authors make the assumption that the TAL1-bound DNA segments are active in HPC7 cells. However, as TAL1 interacts with co-repressors, one would assume that it can be associated with repressed elements. The authors should define “active chromatin”.

We now explicitly state our operational definition of “active chromatin” as being in a state dominated by the histone modifications H3K4me1, H3K4me3, or H3K36me3. The issue of association with co-repressors and how that impacts chromatin is interesting and it needs more data. For all the TFs we have examined, most of the occupancy occurs in DNase-accessible chromatin with histone modifications associated with activity. We rarely see TF binding to chromatin with the repressive modifications (an example is in Fig. 4). We hope that future work will allow a direct examination of chromatin states in HPC7 cells, or better still, in primary multipotential progenitors. Given the current data, we think that stating this assumption that the HPC7 binding is in “active chromatin” allows us to see some evidence of a more repressed state for some genes upon maturation.
However, that conclusion is dependent on the accuracy of that initial assumption.

9. Page 9, last paragraph and page 10, second paragraph, Supplemental Figure 1 should be corrected to Supplemental Figure 7.

Thanks for pointing this out; we made the change.

10. Figure 6B. This figure is confusing. Please label the occupancy pattern as in Figure 7. The columns of + and – should be relabeled and ordered differently. It should be clear that the first column corresponds to binding of TAL1 in G1E cells, the second column of GATA1 in G1E-ER4+E2 cells and the third column of TAL1 in G1E-ER4+E2 cells (to reflect the representation of the occupancy pattern).

We appreciate the recommendations to improve clarity, and we made several changes, including dropping the + and – designations. As discussed in more detail in our reply to Reviewer 1, we also re-made the graphs to make it clearer how we assessed frequency of induction and repression, and calculated enrichment or depletion relative to the frequency of those responses for all genes that are potential targets for TAL1 occupancy.

11. Figure 8. The cartoon suggests direct interactions between TAL1/E47 and RUNX1, RUNX1 and EKLF, GATA2 and ERG. Have these direct interactions been demonstrated or is this just for drawing purposes? Please clarify in the legend.

The reviewer is correct and we have clarified this in the legend. We also re-drew the figure to fit better with the models in El Omari et al, 2013, and added a drawing for co-associations in megakaryocytes.

12. Suppl Figures 3-6. The WT TAL1 track should be relabeled Epro TAL1.

We made this change.