The generalized transcription of
the genome
Víctor Gámez Visairas
Genomics Course
2014/15
INTRODUCTION
• Human genome:
– Non-transcribed
– Transcribed
• Coding
• Non-coding
• Only 5% are protein-coding genes
• Nowdays: 20000 genes
• 1990s: 60000 genes
dark matter?
Non-coding RNA (ncRNA)
• Difficulties in defining a non-coding transcript
• Overlapping
• What fraction of all intergenic sequence in the
human genome is transcribed into stable noncoding
RNA products?
• What are their sequences and expression patterns?
• Function? Ribosomes, tRNA, snRNA…
(Kapranov et al., Science 2007)
cDNA Sanger sequencing
• FANTOM project findings (early 2000s)
– Majority of transcripts contain single exons (no splicing)
– Poorly conserved
– Low expression levels
• Do they have a real function?
Tiling Arrays
• This technology detects transcription using probes
that are regularly spaced on the genome
• Transcribed dark matter was found by tiling
microarrays to be even more abundant in human,
mouse and other genomes
• Limitation: transcription outside known genes
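The idea of probes regularly spaced along the genome can be sketched as follows. This is only an illustrative toy: the function names, the 25 nt probe length, the 35 bp spacing and the intensity threshold are assumptions chosen for the example, not values from any particular array design.

```python
from typing import List, Tuple

Probe = Tuple[int, int]  # half-open [start, end) position on one chromosome


def tile_probes(region_start: int, region_end: int,
                probe_length: int = 25, step: int = 35) -> List[Probe]:
    """Place probes at a fixed step across a region (hypothetical design)."""
    last_start = region_end - probe_length
    return [(s, s + probe_length)
            for s in range(region_start, last_start + 1, step)]


def transcribed_probes(probes: List[Probe], intensities: List[float],
                       threshold: float) -> List[Probe]:
    """Keep probes whose hybridization signal reaches the threshold."""
    return [p for p, signal in zip(probes, intensities) if signal >= threshold]


probes = tile_probes(0, 200)
# Made-up intensities, one per probe, just to exercise the filter.
signals = [0.1, 2.5, 3.0, 0.2, 0.1, 4.0]
detected = transcribed_probes(probes, signals, threshold=1.0)
```

Probes that pass the threshold and fall outside annotated genes would be the candidates for transcribed dark matter.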
RNA-seq
• Tens of millions of fragment reads are mapped to a reference
genome sequence and intersected with existing or novel
transcript and gene annotations
• Both the proportion of reads that contribute to transcribed
dark matter (‘dark matter mass’) and the fraction of the
genome sequence covered by such reads (‘dark matter
coverage’) can be calculated.
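A minimal sketch of how the two quantities could be computed, assuming reads and gene annotations are already available as half-open intervals on a single chromosome. The function names and the toy data are hypothetical; a real analysis would work from aligned reads and an annotation file.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(read: Interval, genes: List[Interval]) -> bool:
    """True if the read shares at least one base with any annotated gene."""
    return any(read[0] < g_end and g_start < read[1] for g_start, g_end in genes)


def dark_matter_stats(reads: List[Interval], genes: List[Interval],
                      genome_length: int) -> Tuple[float, float]:
    """Return (dark matter mass, dark matter coverage)."""
    dark = [r for r in reads if not overlaps_any(r, genes)]
    mass = len(dark) / len(reads)                      # proportion of reads
    covered = sum(end - start for start, end in merge(dark))
    coverage = covered / genome_length                 # fraction of genome
    return mass, coverage


# Toy example: three reads inside one gene, two reads in intergenic sequence.
reads = [(0, 50), (10, 60), (20, 70), (500, 550), (900, 950)]
genes = [(0, 100)]
mass, coverage = dark_matter_stats(reads, genes, genome_length=1000)
```

Here the denominator for coverage is the whole genome length, following the definition on this slide; dividing by the total transcribed length instead would give the share of transcribed sequence that lies outside known genes.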
RNA-seq
• Dark matter mass is thus relatively low, consistent with
previous observations from cDNA sequencing and tiling arrays
that ncRNAs are expressed at low levels.
• Dark matter coverage, on the other hand, is relatively high
with over a quarter of all transcribed regions not overlapping
known genes
(Ponting et al., Hum. Mol. Gen. 2010)
RNA-seq
• Underestimation:
– Ambiguous gene annotations
– Antisense transcripts
– Transposable elements (TEs) contained within non-coding transcripts
FUNCTIONALITY
• Together cDNA sequencing, tiling arrays and RNA-Seq
approaches have identified thousands of long (>200 bp)
intergenic ncRNA (lincRNA) loci in human and mouse
genomes.
• Differential expression among different tissues
• Purifying selection?
• Low-abundance lincRNAs may act in cis. In contrast, lincRNAs
that have stable secondary structures and act in trans are
perhaps more likely to be abundant.
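The working definition of a lincRNA locus used above (longer than 200 bp and intergenic) can be expressed as a simple filter. This is a sketch under stated assumptions: transcripts are treated as unspliced (name, start, end) tuples on one chromosome, and the names and coordinates are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]            # half-open [start, end)
Transcript = Tuple[str, int, int]     # (name, start, end), assumed unspliced


def is_intergenic(start: int, end: int, genes: List[Interval]) -> bool:
    """True if the span shares no base with any annotated gene."""
    return all(end <= g_start or start >= g_end for g_start, g_end in genes)


def linc_candidates(transcripts: List[Transcript], genes: List[Interval],
                    min_length: int = 200) -> List[str]:
    """Names of transcripts longer than min_length that avoid known genes."""
    return [name for name, start, end in transcripts
            if end - start > min_length and is_intergenic(start, end, genes)]


transcripts = [
    ("t1", 1000, 1150),   # too short (150 bp)
    ("t2", 2000, 2600),   # 600 bp, intergenic
    ("t3", 4900, 5400),   # long enough, but overlaps the gene below
]
genes = [(5000, 8000)]
candidates = linc_candidates(transcripts, genes)
```

A filter like this says nothing about coding potential; checking for functional ORFs would be a separate step.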
Chromatin modification
(Qu and Adelson, Frontiers in Genetics 2012)
Transcriptional regulation
(Qu and Adelson, Frontiers in Genetics 2012)
Post-transcriptional regulation
(Qu and Adelson, Frontiers in Genetics 2012)
lncRNAs in Human Disease
• p53 response through regulation in trans
• lncRNAs in GWAS: diabetes, gliomas, coronary
diseases
• Missing heritability?
• lncRNA-specific drugs
• Use as biomarkers?
Conclusions
• We now know that the human genome contains thousands of
lncRNAs, both genic and intergenic.
• This new class of non-protein-coding RNAs (ncRNAs) lacks
functional ORFs, is modestly conserved and seems to
negatively and positively regulate protein-coding gene
expression, in cis and in trans.
• Diverse mechanisms of action have been observed
• Main goal: characterize their status and functionality
• The concept of a ‘gene’ will increasingly appear incomplete
and overly simplistic.
References
• Derrien et al., The long non-coding RNAs: a new (p)layer in the "dark matter".
Frontiers in Genetics 2, January 2012.
• Qu and Adelson, Evolutionary conservation and functional roles of
ncRNA. Frontiers in Genetics 3, October 2012.
• Ponting et al., Transcribed dark matter: meaning or myth? Human
Molecular Genetics 19, 2010.
• Wilhelm et al., Defining transcribed regions using RNA-seq. Nature
Protocols 5, 2010.
• Shapiro et al., The coding and non-coding architecture of the Caulobacter
crescentus genome. PLOS Genetics 10, July 2014.
• Cirulli et al., Screening the human exome: a comparison of whole genome
and whole transcriptome sequencing. Genome Biology 11, 2010.