The generalized transcription of the genome
Víctor Gámez Visairas
Genomics Course 2014/15

INTRODUCTION
• Human genome:
  – Non-transcribed
  – Transcribed
    • Coding
    • Non-coding
• Only 5% are protein-coding genes
• Nowadays: 20,000 genes
• 1990s: 60,000 genes
• Dark matter?

Non-coding RNA (ncRNA)
• Difficulties in defining a non-coding transcript
• Overlapping
• What fraction of all intergenic sequence in the human genome is transcribed into stable non-coding RNA products?
• What are their sequences and expression patterns?
• Function? Known functional ncRNAs: rRNA, tRNA, snRNA…
(Kapranov et al., Science 2007)

cDNA Sanger sequencing
• FANTOM project findings (early 2000s):
  – Majority of transcripts contain a single exon (no splicing)
  – Poorly conserved
  – Low expression levels
• Do they have a real function?

Tiling Arrays
• This technology detects transcription using probes that are regularly spaced along the genome
• Tiling microarrays found transcribed dark matter to be even more abundant in human, mouse and other genomes
• Limitation: transcription outside known genes

RNA-seq
• Tens of millions of fragment reads are mapped to a reference genome sequence and intersected with existing or novel transcript and gene annotations
• Both the proportion of reads that contribute to transcribed dark matter ('dark matter mass') and the fraction of the genome sequence covered by such reads ('dark matter coverage') can be calculated

RNA-seq
• Dark matter mass is relatively low, consistent with previous observations from cDNA sequencing and tiling arrays that ncRNAs are typically expressed at low levels
• Dark matter coverage, on the other hand, is relatively high: over a quarter of all transcribed regions do not overlap known genes
(Ponting et al., Hum. Mol. Gen. 2010)

RNA-seq
• Underestimation:
  – Ambiguous gene annotations
  – Antisense transcripts
  – Transposable elements (TEs) contained within non-coding transcripts

FUNCTIONALITY
• Together, cDNA sequencing, tiling arrays and RNA-seq have identified thousands of long (>200 bp) intergenic ncRNA (lincRNA) loci in the human and mouse genomes
• Differential expression among tissues
• Purifying selection?
• Low-abundance lincRNAs may act in cis. In contrast, lincRNAs that have stable secondary structures and act in trans are likely to be more abundant.

Chromatin modification (Qu and Adelson, Frontiers in Gen. 2012)

Transcriptional regulation (Qu and Adelson, Frontiers in Gen. 2012)

Post-transcriptional regulation (Qu and Adelson, Frontiers in Gen. 2012)

lncRNAs in Human Disease
• p53 response through regulation in trans
• lncRNAs in GWAS: diabetes, gliomas, coronary diseases
• Missing heritability?
• lncRNA-specific drugs?
• Use as biomarkers?

Conclusions
• We now know that the human genome contains thousands of lncRNAs, both genic and intergenic.
• This new class of non-protein-coding RNAs (ncRNAs) lacks functional ORFs, is modestly conserved and seems to regulate protein-coding gene expression both negatively and positively, in cis and in trans.
• Diverse mechanisms of action have been observed.
• Main goal: characterize their status and functionality.
• The concept of a 'gene' will increasingly appear incomplete and overly simplistic.

References
• Derrien et al., The long non-coding RNAs: a new (p)layer in the dark matter. Frontiers in Genetics (2), January 2012.
• Qu and Adelson, Evolutionary conservation and functional roles of ncRNA. Frontiers in Genetics (3), October 2012.
• Ponting et al., Transcribed dark matter: meaning or myth? Human Molecular Genetics (19), 2010.
• Wilhelm et al., Defining transcribed regions using RNA-seq. Nature Protocols (5), 2010.
• Shapiro et al., The coding and non-coding architecture of the Caulobacter crescentus genome. PLOS Genetics (10), July 2014.
• Cirulli et al., Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biology (11), 2010.
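Appendix: the 'dark matter mass' and 'dark matter coverage' measures from the RNA-seq slides can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the procedure used by Ponting et al.: reads and gene annotations are hypothetical 0-based, half-open (chrom, start, end) intervals; a read counts as dark matter if it overlaps no annotated gene; and coverage is taken as the bases covered by such reads divided by an assumed genome length. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical interval representation: (chromosome, start, end),
# 0-based and half-open, as in BED files.
Interval = Tuple[str, int, int]
Blocks = Dict[str, List[Tuple[int, int]]]


def merge_intervals(intervals: List[Interval]) -> Blocks:
    """Merge overlapping or touching intervals, per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: Blocks = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous block
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def overlaps_any(read: Interval, genes: Blocks) -> bool:
    """True if the read shares at least one base with any annotated gene."""
    chrom, start, end = read
    return any(g_start < end and start < g_end
               for g_start, g_end in genes.get(chrom, []))


def dark_matter_stats(reads: List[Interval], genes: List[Interval],
                      genome_length: int) -> Tuple[float, float]:
    """Return (dark matter mass, dark matter coverage).

    mass     = fraction of mapped reads that overlap no annotated gene
    coverage = bases covered by those reads / genome_length (assumed denominator)
    """
    if not reads:
        raise ValueError("need at least one mapped read")
    gene_blocks = merge_intervals(genes)
    dark_reads = [r for r in reads if not overlaps_any(r, gene_blocks)]
    mass = len(dark_reads) / len(reads)
    covered = sum(e - s for blocks in merge_intervals(dark_reads).values()
                  for s, e in blocks)
    return mass, covered / genome_length


if __name__ == "__main__":
    reads = [("chr1", 0, 100), ("chr1", 150, 250),
             ("chr1", 1000, 1100), ("chr1", 1050, 1150)]
    genes = [("chr1", 50, 200)]
    mass, coverage = dark_matter_stats(reads, genes, genome_length=10_000)
    print(mass, coverage)  # prints: 0.5 0.015
```

In this toy example two of the four reads fall outside the single annotated gene (mass 0.5), and together they cover 150 bases of a 10 kb genome (coverage 0.015). In practice the overlap step would be delegated to an indexed tool such as BEDTools (`bedtools intersect -v` reports entries in A with no overlap in B); the linear scan here only keeps the sketch self-contained.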
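Appendix: the working lincRNA criteria named on the Functionality and Conclusions slides (long, >200 bp; intergenic; lacking a functional ORF) can likewise be written as a simple candidate filter. This sketch is hypothetical: the function names, the 100-codon ORF cutoff, and scanning only the given strand for ATG-initiated ORFs are assumptions for illustration. A real annotation pipeline would also weigh conservation and other coding-potential evidence.

```python
from typing import List, Tuple

# Assumed thresholds, for illustration only.
MIN_LENGTH_NT = 200       # "long" = longer than 200 nt
MAX_ORF_CODONS = 100      # hypothetical cutoff for a "functional" ORF
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated ORF, in codons (stop excluded), on the given
    strand only, across all three reading frames. An ORF that runs off the
    end without a stop is still counted, the conservative choice when trying
    to rule out coding potential."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        if start is not None:  # open-ended ORF: count remaining full codons
            best = max(best, (len(seq) - start) // 3)
    return best


def is_candidate_lincrna(seq: str, locus: Tuple[str, int, int],
                         coding_genes: List[Tuple[str, int, int]]) -> bool:
    """Apply the three criteria: length, intergenic position, no long ORF."""
    chrom, start, end = locus
    if len(seq) <= MIN_LENGTH_NT:
        return False
    for g_chrom, g_start, g_end in coding_genes:  # half-open intervals
        if g_chrom == chrom and g_start < end and start < g_end:
            return False
    return longest_orf_codons(seq) < MAX_ORF_CODONS


if __name__ == "__main__":
    coding = [("chr2", 0, 1000)]
    print(is_candidate_lincrna("A" * 300, ("chr2", 5000, 5300), coding))  # True
    orf_rich = "ATG" + "GCT" * 150 + "TAA"  # 151-codon ORF
    print(is_candidate_lincrna(orf_rich, ("chr2", 5000, 5456), coding))  # False
```

Each criterion rejects independently, so a transcript that is long and intergenic but carries a long ORF is still excluded, mirroring the "lack functional ORFs" criterion in the Conclusions.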