Overview of Articles for the literature talks

No. | PMID | Title | Date | Student | Supervisor
1 | 17229949 | The prognostic role of a gene signature from tumorigenic breast-cancer cells | 09.07.2014 | Friederike Christen | Bertram Klinger
2 | 17322878 | A module of negative feedback regulators defines growth factor signaling. | 09.07.2014 | Dominique Sydow | Florian Uhlitz
3 | 24506884 | The two active X chromosomes in female ESCs block exit from the pluripotent state by modulating the ESC signaling network | 10.07.2014 | Isabelle Jurke | Johannes Meisig
4 | 21746896 | Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer | 10.07.2014 | Laura Schneider / Lukas Daniel | Bertram Klinger
5 | 20703300 | Mammalian microRNAs predominantly act to decrease target mRNA levels | 10.07.2014 | Vera Schützhold | Jörn Schmiedel
6 | 23774758 | Efficient translation initiation dictates codon usage at gene start. | 14.07.2014 | Stefanie Blaue | Nils Blüthgen
7 | 20232936 | Structure-based analysis of DNA sequence patterns guiding nucleosome positioning in vitro. | 14.07.2014 | Jana Biermann | Robert Lehmann
8 | 20655353 | Circadian transcription in liver | 14.07.2014 | Eva-Maria Bendel | Sarah Lück
9 | 22955616 | An integrated encyclopedia of DNA elements in the human genome | 15.07.2014 | Janine Kuntze | Florian Uhlitz
10 | 22955983 | Predicting cell-type-specific gene expression from regions of open chromatin | 15.07.2014 | Mike Schröder | Manuela Benary
11 | 22955621 | The long-range interaction landscape of gene promoters | 15.07.2014 | Ferdinand Krupp | Manuela Benary
12 | 19995984 | ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells | 16.07.2014 | Valentina Rausch | Johannes Meisig
13 | 24843027 | Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons | 16.07.2014 | Alexander Kiefer | Manuela Benary
14 | 24417896 | Differential protein occupancy profiling of the mRNA transcriptome | 16.07.2014 | Claudia Bohg | Florian Uhlitz
15 | 18555785 | Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells | 17.07.2014 | Urs Kindler | Johannes Meisig
16 | 19213877 | Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling | 17.07.2014 | Janine Arndt | Florian Uhlitz
17 | 21723171 | Transcriptome-wide Analysis of Regulatory Interactions of the RNA-Binding Protein HuR | 17.07.2014 | Johannes Scholz | Jörn Schmiedel

Corresponding abstracts

1. The Prognostic Role of a Gene Signature from Tumorigenic Breast-Cancer Cells
Liu et al. 2007
Breast cancers contain a minority population of cancer cells characterized by CD44 expression but low or undetectable levels of CD24 (CD44+CD24−/low) that have higher tumorigenic capacity than other subtypes of cancer cells. We compared the gene-expression profile of CD44+CD24−/low tumorigenic breast-cancer cells with that of normal breast epithelium.
Differentially expressed genes were used to generate a 186-gene “invasiveness” gene signature (IGS), which was evaluated for its association with overall survival and metastasis-free survival in patients with breast cancer or other types of cancer. There was a significant association between the IGS and both overall and metastasis-free survival (P<0.001, for both) in patients with breast cancer, which was independent of established clinical and pathological variables. When combined with the prognostic criteria of the National Institutes of Health, the IGS was used to stratify patients with high-risk early breast cancer into prognostic categories (good or poor); among patients with a good prognosis, the 10-year rate of metastasis-free survival was 81%, and among those with a poor prognosis, it was 57%. The IGS was also associated with the prognosis in medulloblastoma (P = 0.004), lung cancer (P = 0.03), and prostate cancer (P = 0.01). The prognostic power of the IGS was increased when combined with the wound-response (WR) signature. The IGS is strongly associated with metastasis-free survival and overall survival for four different types of tumors. This genetic signature of tumorigenic breast-cancer cells was even more strongly associated with clinical outcomes when combined with the WR signature in breast cancer.

2. A module of negative feedback regulators defines growth factor signaling.
Amit et al. 2007
Signaling pathways invoke interplays between forward signaling and feedback to drive robust cellular response. In this study, we address the dynamics of growth factor signaling through profiling of protein phosphorylation and gene expression, demonstrating the presence of a kinetically defined cluster of delayed early genes that function to attenuate the early events of growth factor signaling.
Using epidermal growth factor receptor signaling as the major model system and concentrating on regulation of transcription and mRNA stability, we demonstrate that a number of genes within the delayed early gene cluster function as feedback regulators of immediate early genes. Consistent with their role in negative regulation of cell signaling, genes within this cluster are downregulated in diverse tumor types, in correlation with clinical outcome. More generally, our study proposes a mechanistic description of the cellular response to growth factors by defining architectural motifs that underlie the function of signaling networks.

3. The two active X chromosomes in female ESCs block exit from the pluripotent state by modulating the ESC signaling network
Schulz et al. 2014
During early development of female mouse embryos, both X chromosomes are transiently active. X gene dosage is then equalized between the sexes through the process of X chromosome inactivation (XCI). Whether the double dose of X-linked genes in females compared with males leads to sex-specific developmental differences has remained unclear. Using embryonic stem cells with distinct sex chromosome compositions as a model system, we show that two X chromosomes stabilize the naive pluripotent state by inhibiting MAPK and Gsk3 signaling and stimulating the Akt pathway. Since MAPK signaling is required to exit the pluripotent state, differentiation is paused in female cells as long as both X chromosomes are active. By preventing XCI or triggering it precociously, we demonstrate that this differentiation block is released once XX cells have undergone X inactivation. We propose that double X dosage interferes with differentiation, thus ensuring a tight coupling between X chromosome dosage compensation and development.

4. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer.
Cheung et al. 2011
A comprehensive understanding of the molecular vulnerabilities of every type of cancer will provide a powerful roadmap to guide therapeutic approaches. Efforts such as The Cancer Genome Atlas Project will identify genes with aberrant copy number, sequence, or expression in various cancer types, providing a survey of the genes that may have a causal role in cancer. A complementary approach is to perform systematic loss-of-function studies to identify essential genes in particular cancer cell types. We have begun a systematic effort, termed Project Achilles, aimed at identifying genetic vulnerabilities across large numbers of cancer cell lines. Here, we report the assessment of the essentiality of 11,194 genes in 102 human cancer cell lines. We show that the integration of these functional data with information derived from surveying cancer genomes pinpoints known and previously undescribed lineage-specific dependencies across a wide spectrum of cancers. In particular, we found 54 genes that are specifically essential for the proliferation and viability of ovarian cancer cells and also amplified in primary tumors or differentially overexpressed in ovarian cancer cell lines. One such gene, PAX8, is focally amplified in 16% of high-grade serous ovarian cancers and expressed at higher levels in ovarian tumors. Suppression of PAX8 selectively induces apoptotic cell death of ovarian cancer cells. These results identify PAX8 as an ovarian lineage-specific dependency. More generally, these observations demonstrate that the integration of genome-scale functional and structural studies provides an efficient path to identify dependencies of specific cancer types on particular genes and pathways.

5. Mammalian microRNAs predominantly act to decrease target mRNA levels
Guo et al. 2010
MicroRNAs (miRNAs) are endogenous approximately 22-nucleotide RNAs that mediate important gene-regulatory events by pairing to the mRNAs of protein-coding genes to direct their repression.
Repression of these regulatory targets leads to decreased translational efficiency and/or decreased mRNA levels, but the relative contributions of these two outcomes have been largely unknown, particularly for endogenous targets expressed at low-to-moderate levels. Here, we use ribosome profiling to measure the overall effects on protein production and compare these to simultaneously measured effects on mRNA levels. For both ectopic and endogenous miRNA regulatory interactions, lowered mRNA levels account for most (≥84%) of the decreased protein production. These results show that changes in mRNA levels closely reflect the impact of miRNAs on gene expression and indicate that destabilization of target mRNAs is the predominant reason for reduced protein output.

6. Efficient translation initiation dictates codon usage at gene start
Bentele et al. 2013
The genetic code is degenerate; thus, protein evolution does not uniquely determine the coding sequence. One of the puzzles in evolutionary genetics is therefore to uncover evolutionary driving forces that result in specific codon choice. In many bacteria, the first 5-10 codons of protein-coding genes are often codons that are less frequently used in the rest of the genome, an effect that has been argued to arise from selection for slowed early elongation to reduce ribosome traffic jams. However, genome analysis across many species has demonstrated that the region shows reduced mRNA folding consistent with pressure for efficient translation initiation. This raises the possibility that unusual codon usage is a side effect of selection for reduced mRNA structure. Here we discriminate between these two competing hypotheses, and show that in bacteria selection favours codons that reduce mRNA folding around the translation start, regardless of whether these codons are frequent or rare. Experiments confirm that primarily mRNA structure, and not codon usage, at the beginning of genes determines the translation rate.

7. Structure-based analysis of DNA sequence patterns guiding nucleosome positioning in vitro.
Cui et al. 2010
Recent studies of genome-wide nucleosomal organization suggest that the DNA sequence is one of the major determinants of nucleosome positioning. Although the search for underlying patterns encoded in nucleosomal DNA has been going on for about 30 years, our knowledge of these patterns still remains limited. Based on our evaluations of DNA deformation energy, we developed new scoring functions to predict nucleosome positioning. There are three principal differences between our approach and earlier studies: (i) we assume that the length of nucleosomal DNA varies from 146 to 147 bp; (ii) we consider the anisotropic flexibility of pyrimidine-purine (YR) dimeric steps in the context of their neighbors (e.g., YYRR versus RYRY); (iii) we postulate that alternating AT-rich and GC-rich motifs reflect sequence-dependent interactions between histone arginines and DNA in the minor groove. Using these functions, we analyzed 20 nucleosome positions mapped in vitro at single nucleotide resolution (including clones 601, 603, 605, the pGUB plasmid, chicken beta-globin and three 5S rDNA genes). We predicted 15 of the 20 positions with 1-bp precision, and two positions with 2-bp precision. The predicted position of the '601' nucleosome (i.e., the optimum of the computed score) deviates from the experimentally determined unique position by no more than 1 bp - an accuracy exceeding that of earlier predictions. Our analysis reveals a clear heterogeneity of the nucleosomal sequences which can be divided into two groups based on the positioning 'rules' they follow. The sequences of one group are enriched by highly deformable YR/YYRR motifs at the minor-groove bending sites SHL +/- 3.5 and +/- 5.5, which is similar to the alpha-satellite sequence used in most crystallized nucleosomes.
Apparently, the positioning of these nucleosomes is determined by the interactions between histones H2A/H2B and the terminal parts of nucleosomal DNA. In the other group (that includes the '601' clone) the same YR/YYRR motifs occur predominantly at the sites SHL +/- 1.5. The interaction between the H3/H4 tetramer and the central part of the nucleosomal DNA is likely to be responsible for the positioning of nucleosomes of this group, and the DNA trajectory in these nucleosomes may differ in detail from the published structures. Thus, from the stereochemical perspective, the in vitro nucleosomes studied here follow either an X-ray-like pattern (with strong deformations in the terminal parts of nucleosomal DNA), or an alternative pattern (with the deformations occurring predominantly in the central part of the nucleosomal DNA). The results presented here may be useful for genome-wide classification of nucleosomes, linking together structural and thermodynamic characteristics of nucleosomes with the underlying DNA sequence patterns guiding their positions.

8. Circadian transcription in liver
Bozek et al. 2010
Circadian rhythms regulate a wide range of cellular, physiological, metabolic and behavioral activities in mammals. The complexity of tissue- and day-time specific regulation of thousands of clock controlled genes (CCGs) suggests that many transcriptional regulators are involved. Our bioinformatic analysis is based on two published DNA-array studies from mouse liver. We search overrepresented transcription factor binding sites in promoter regions of CCGs using GC-matched controls. Analyzing a large set of CCG promoters, we find known motifs such as E-boxes, D-boxes and cAMP responsive elements. In addition, we find overrepresented GC-rich motifs (Sp1, ETF, Nrf1), AT-rich motifs (TBP, Fox04, MEF-2), Y-box motifs (NF-Y, C/EBP) and cell cycle regulators (E2F, Elk-1).
In a subset of system-driven genes, we find overrepresented motifs of the serum response factor SRF and the estrogen receptor ER. The analysis of published ChIP data reveals that some of our predicted regulators (C/EBP, E2F, HNF-1, Myc, MEF-2) target relatively many clock controlled genes. Our analysis of CCG promoters contributes to an understanding of the complex transcriptional regulation of circadian rhythms in liver.

9. An integrated encyclopedia of DNA elements in the human genome
ENCODE Project Consortium 2012
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

10. Predicting cell-type-specific gene expression from regions of open chromatin
Natarajan et al. 2012
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions.
To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.

11. The long-range interaction landscape of gene promoters
Sanyal et al. 2012
The vast non-coding portion of the human genome is full of functional elements and disease-causing regulatory variants. The principles defining the relationships between these elements and distal target genes remain unknown. Promoters and distal elements can engage in looping interactions that have been implicated in gene regulation.
Here we have applied chromosome conformation capture carbon copy (5C) to interrogate comprehensively interactions between transcription start sites (TSSs) and distal elements in 1% of the human genome representing the ENCODE pilot project regions. 5C maps were generated for GM12878, K562 and HeLa-S3 cells and results were integrated with data from the ENCODE consortium. In each cell line we discovered >1,000 long-range interactions between promoters and distal sites that include elements resembling enhancers, promoters and CTCF-bound sites. We observed significant correlations between gene expression, promoter-enhancer interactions and the presence of enhancer RNAs. Long-range interactions show marked asymmetry with a bias for interactions with elements located ∼120 kilobases upstream of the TSS. Long-range interactions are often not blocked by sites bound by CTCF and cohesin, indicating that many of these sites do not demarcate physically insulated gene domains. Furthermore, only ∼7% of looping interactions are with the nearest gene, indicating that genomic proximity is not a simple predictor for long-range interactions. Finally, promoters and distal elements are engaged in multiple long-range interactions to form complex networks. Our results start to place genes and regulatory elements in three-dimensional context, revealing their functional relationships.

12. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells
Ouyang et al. 2009
Next-generation sequencing has greatly increased the scope and the resolution of transcriptional regulation study. RNA sequencing (RNA-Seq) and ChIP-Seq experiments are now generating comprehensive data on transcript abundance and on regulator–DNA interactions. We propose an approach for an integrated analysis of these data based on feature extraction of ChIP-Seq signals, principal component analysis, and regression-based component selection.
Compared with traditional methods, our approach not only offers higher power in predicting gene expression from ChIP-Seq data but also provides a way to capture cooperation among regulators. In mouse embryonic stem cells (ESCs), we find that a remarkably high proportion of variation in gene expression (65%) can be explained by the binding signals of 12 transcription factors (TFs). Two groups of TFs are identified. Whereas the first group (E2f1, Myc, Mycn, and Zfx) act as activators in general, the second group (Oct4, Nanog, Sox2, Smad1, Stat3, Tcfcp2l1, and Esrrb) may serve as either activator or repressor depending on the target. The two groups of TFs cooperate tightly to activate genes that are differentially up-regulated in ESCs. In the absence of binding by the first group, the binding of the second group is associated with genes that are repressed in ESCs and derepressed upon early differentiation.

13. Genome-wide dynamics of Pol II elongation and its interplay with promoter proximal pausing, chromatin, and exons
Jonkers et al. 2014
Production of mRNA depends critically on the rate of RNA polymerase II (Pol II) elongation. To dissect Pol II dynamics in mouse ES cells, we inhibited Pol II transcription at either initiation or promoter-proximal pause escape with Triptolide or Flavopiridol, and tracked Pol II kinetically using GRO-seq. Both inhibitors block transcription of more than 95% of genes, showing that pause escape, like initiation, is a ubiquitous and crucial step within the transcription cycle. Moreover, paused Pol II is relatively stable, as evidenced from half-life measurements at ∼3200 genes. Finally, tracking the progression of Pol II after drug treatment establishes Pol II elongation rates at over 1000 genes. Notably, Pol II accelerates dramatically while transcribing through genes, but slows at exons.
Furthermore, intergenic variance in elongation rates is substantial, and is influenced by a positive effect of H3K79me2 and negative effects of exon density and CG content within genes.

14. Differential protein occupancy profiling of the mRNA transcriptome
Schueler et al. 2014
RNA-binding proteins (RBPs) mediate mRNA biogenesis, translation and decay. We recently developed an approach to profile transcriptome-wide RBP contacts on polyadenylated transcripts by next-generation sequencing. A comparison of such profiles from different biological conditions has the power to unravel dynamic changes in protein-contacted cis-regulatory mRNA regions without a priori knowledge of the regulatory protein component. We compared protein occupancy profiles of polyadenylated transcripts in MCF7 and HEK293 cells. Briefly, we developed a bioinformatics workflow to identify differential crosslinking sites in cDNA reads of 4-thiouridine crosslinked polyadenylated RNA samples. We identified 30,000 differential crosslinking sites between MCF7 and HEK293 cells at an estimated false discovery rate of 10%. 73% of all reported differential protein-RNA contact sites cannot be explained by local changes in exon usage as indicated by complementary RNA-seq data. The majority of differentially crosslinked positions are located in 3′ UTRs, show distinct secondary-structure characteristics and overlap with binding sites of known RBPs, such as ELAVL1. Importantly, mRNA transcripts with the most significant occupancy changes show elongated mRNA half-lives in MCF7 cells. We present a global comparison of protein occupancy profiles from different cell types, and provide evidence for altered mRNA metabolism as a result of differential protein-RNA contacts. Additionally, we introduce POPPI, a bioinformatics workflow for the analysis of protein occupancy profiling experiments.
Our work demonstrates the value of protein occupancy profiling for assessing cis-regulatory RNA sequence space and its dynamics in growth, development and disease.

15. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells
Chen et al. 2008
Transcription factors (TFs) and their specific interactions with targets are crucial for specifying gene-expression programs. To gain insights into the transcriptional regulatory networks in embryonic stem (ES) cells, we use chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12). These factors are known to play different roles in ES-cell biology as components of the LIF and BMP signaling pathways, self-renewal regulators, and key reprogramming factors. Our study provides insights into the integration of the signaling pathways into the ES-cell-specific transcription circuitries. Intriguingly, we find specific genomic regions extensively targeted by different TFs. Collectively, the comprehensive mapping of TF-binding sites identifies important features of the transcriptional regulatory networks that define ES-cell identity.

16. Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling
Ingolia et al. 2009
Techniques for systematically monitoring protein translation have lagged far behind methods for measuring messenger RNA (mRNA) levels. Here, we present a ribosome-profiling strategy that is based on the deep sequencing of ribosome-protected mRNA fragments and enables genome-wide investigation of translation with subcodon resolution. We used this technique to monitor translation in budding yeast under both rich and starvation conditions.
These studies defined the protein sequences being translated and found extensive translational control in both determining absolute protein abundance and responding to environmental stress. We also observed distinct phases during translation that involve a large decrease in ribosome density going from early to late peptide elongation as well as widespread regulated initiation at non-adenine-uracil-guanine (AUG) codons. Ribosome profiling is readily adaptable to other organisms, making high-precision investigation of protein translation experimentally accessible.

17. Transcriptome-wide Analysis of Regulatory Interactions of the RNA-Binding Protein HuR
Lebedeva et al. 2011
Posttranscriptional gene regulation relies on hundreds of RNA binding proteins (RBPs) but the function of most RBPs is unknown. The human RBP HuR/ELAVL1 is a conserved mRNA stability regulator. We used PAR-CLIP, a recently developed method based on RNA-protein crosslinking, to identify transcriptome-wide ∼26,000 HuR binding sites. These sites were on average highly conserved, enriched for HuR binding motifs and mainly located in 3' untranslated regions. Surprisingly, many sites were intronic, implicating HuR in mRNA processing. Upon HuR knockdown, mRNA levels and protein synthesis of thousands of target genes were downregulated, validating functionality. HuR and miRNA binding sites tended to reside nearby but generally did not overlap. Additionally, HuR knockdown triggered strong and specific upregulation of miR-7. In summary, we identified thousands of direct and functional HuR targets, found a human miRNA controlled by HuR, and propose a role for HuR in splicing.
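
Note for the talk on abstract 17: the statement that HuR and miRNA binding sites "tended to reside nearby but generally did not overlap" comes down to an interval comparison. The Python sketch below is one possible way to tabulate it and is not the authors' pipeline. The function names (`nearest_gap`, `classify_sites`), the 50-nucleotide `window` and all coordinates are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def nearest_gap(site, others):
    """Gap in nucleotides between `site` and the closest interval in `others`.

    Intervals are half-open (start, end) pairs on the same transcript.
    `others` must be sorted by start and must not overlap one another.
    Returns 0 if the intervals overlap or touch, None if `others` is empty.
    """
    start, end = site
    starts = [s for s, _ in others]
    i = bisect_left(starts, start)
    best = None
    # Only the two neighbours around the insertion point can be the closest.
    for j in (i - 1, i):
        if 0 <= j < len(others):
            o_start, o_end = others[j]
            gap = max(0, o_start - end, start - o_end)
            best = gap if best is None else min(best, gap)
    return best


def classify_sites(hur_sites, mirna_sites, window=50):
    """Count HuR sites that overlap, lie near, or lie far from any miRNA site.

    `window` (nucleotides) is an arbitrary illustrative cutoff for "nearby";
    sites that merely touch a miRNA site are counted as overlapping.
    """
    counts = {"overlapping": 0, "nearby": 0, "distant": 0}
    for site in hur_sites:
        gap = nearest_gap(site, mirna_sites)
        if gap is None or gap > window:
            counts["distant"] += 1
        elif gap == 0:
            counts["overlapping"] += 1
        else:
            counts["nearby"] += 1
    return counts


# Made-up coordinates on one hypothetical 3' UTR, for illustration only.
hur_sites = [(100, 120), (300, 330), (900, 920)]
mirna_sites = [(110, 132), (350, 372), (2000, 2022)]
print(classify_sites(hur_sites, mirna_sites))
```

With real data, the same counts would have to be compared against a background (for example, shuffled site positions within the same 3' UTRs) before calling any enrichment; the sketch only covers the bookkeeping.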