Functional non-coding DNA # 1

Functional Non-Coding DNA
Part I
Non-coding genes and non-coding elements
of coding genes
BNFO 602/691
Biological Sequence Analysis
Mark Reimers, VIPBG
What Does ‘Functional Non-Coding
DNA’ Mean?
• DNA whose sequence affects transcripts made
from DNA in some way
• Could affect transcription levels, splicing or
sequestering of RNA
• Three main ways to identify functional non-coding elements
– Sequence characteristics – favored bases
– Genomic conservation
– Epigenetic marks and open chromatin
• especially outside of genes
Types of Non-Coding Elements
• Non-coding RNAs
– miRNAs, lncRNAs, etc
• Non-coding gene elements
– UTRs, splice sites and their regulatory elements,
poly-adenylation sites, RNA-binding sites
• DNA elements outside genes – our main focus
– Promoters
– Enhancers/Silencers
– Insulators
Types of Non-Coding RNA
• microRNAs
• Silencing RNAs
• Small nuclear/nucleolar RNAs
• Piwi-Interacting RNAs
• Long Non-Coding RNAs
• Circular RNAs
• Still other RNAs???
• Comprehensive database at www.ncrna.org
Micro-RNAs
• Micro-RNAs are small non-coding RNA molecules, about 21–
25 nucleotides in length
• They are processed from much longer genes, or from introns
within mRNA, by several molecular pathways
• Micro-RNAs base-pair with complementary sequences within
mRNA molecules, often in 3’ or 5’ UTR.
• miRNA binding usually results in gene repression either via
translational stalling or by triggering mRNA degradation
Image by Charles Mallery, U of Miami
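The base-pairing described above can be sketched in a few lines of Python. This is a minimal illustration, not a target-prediction tool: it assumes the common convention that the "seed" is miRNA nucleotides 2-8 and only looks for perfect matches to it. The function names and the toy UTR fragment are made up for this example; the let-7a sequence is the mature hsa-let-7a-5p sequence.

```python
# Sketch: find perfect seed-match sites for a miRNA in an mRNA sequence.
# Assumption: the seed is miRNA nucleotides 2-8 (counted from the 5' end).

def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def seed_sites(mirna, mrna):
    """Return 0-based start positions in mrna that pair with the miRNA seed."""
    mirna = mirna.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")
    seed = mirna[1:8]                       # nucleotides 2-8
    target = reverse_complement_rna(seed)   # what the mRNA must contain
    hits = []
    start = mrna.find(target)
    while start != -1:
        hits.append(start)
        start = mrna.find(target, start + 1)
    return hits


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"         # hsa-let-7a-5p
TOY_UTR = "AAGCUACCUCAGGGACUACCUCAUU"    # hypothetical UTR fragment

print(seed_sites(LET7A, TOY_UTR))        # positions of CUACCUC in the toy UTR
```

Real target prediction also weighs site conservation, flanking sequence and pairing outside the seed, none of which is modelled here.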
Micro-RNAs
• The human genome encodes over 1500 miRNAs,
which are believed to affect more than half of
human genes
• miRNAs are abundant in many cell types
– Thousands of copies per cell of some miRNAs
– Those within gene introns share regulation
• miRNAs are well-conserved across vertebrates
– No orthologs between plant and animal miRNAs
– miRBase is the comprehensive repository of microRNAs
Other Short RNAs: siRNA
• Small interfering RNAs are double-stranded
with an overhang
• They are processed by some of the same
machinery as miRNAs and have some of the
same effects
Other Short RNAs: piRNA
• Piwi-Interacting RNAs are longer 26-31 base
single-stranded RNAs
– PIWI (P-element Induces Wimpy Testis) protein
• Over 50,000 sequences known in mouse
– They are the largest class of nc-RNA
• They seem to play an ancient role in defense
against retro-viruses and transposons
Other Short RNAs: snRNAs & snoRNAs
• Small nuclear RNAs (snRNAs) are typically ~150 bases
long and associate with proteins
– Many conserved copies of each snRNA gene
– U1, U2, U4, U5 and U6 snRNAs are key parts of the splicing machinery
• Small nucleolar RNAs (snoRNAs)
– Guide chemical modifications of other RNAs
– Prader-Willi syndrome results from deletion of
region containing 29 copies of SNORD116 on chr
15q11
Long Non-Coding RNAs
• Many long (>200bp) stretches of
genome are transcribed and
have epigenetic marks like those
of protein-coding genes
• Most of these are spliced RNAs
with two (or more) exons
• GENCODE v15 has 13.5K lncRNA
• See also
– Derrien et al, Genome Research
2012
– Lee, Science 2012
From Derrien et al Genome Res 2012
Many lncRNAs Induce Silencing
• Coat nearby gene(s)
and silence them
• Xist binds to gene
clusters first
• Xist binds disparate
parts of chromosome
• Many lncRNAs are
antisense to genes
• Some lncRNAs
maintain pluripotency
of stem cells
From Jeannie Lee lab (Harvard) website
Long Non-Coding RNAs - 2
• Most lncRNAs are expressed
in only a few tissues
• Most human lncRNAs are
specific to the primate lineage
From Derrien et al Genome Res 2012
Circular RNAs
• Several thousand non-coding
RNAs apparently form circular
structures
• Many form complexes with
AGO and seem to absorb
attached miRNAs, blocking
processing
• CDR1as (ciRS-7), a circular antisense
transcript of CDR1, has about 70
conserved binding sites for miR-7
Functional Pseudo-Genes
• Pseudo-genes are decaying copies of genes that
rarely, if ever, make proteins
• Some pseudo-genes act to absorb negative
regulators of the original gene – e.g. SRGAP2B
How to Identify Non-Coding RNAs?
• Short (and long) RNA transcriptomes
• Promoter chromatin marks for independent
(non-embedded) miRNAs and lncRNAs
DEMO: Display HOTAIR & XIST Tracks in
UCSC Browser
Non-Coding Elements of Genes
• TSS
• 5' UTRs
• Introns
• Splicing regulation sites
• 3' UTRs
• Termination/Poly-adenylation sites
Transcription Start Sites
• Transcription of most genes may initiate at
several distinct clusters of locations with
distinct promoters for each TSS
• Two major types of metazoan TSS: CG-rich
broad TSS, and narrow (often tissue-specific)
TSS
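One rough way to tell whether the sequence around a TSS is of the CG-rich kind is to compute its GC fraction and its observed/expected CpG ratio. The sketch below is an illustration only; the function name is made up, and the thresholds mentioned in the comment (GC fraction above 0.5 and observed/expected CpG above 0.6, over windows of at least 200 bp) are the commonly used CpG-island criteria, not values taken from these slides.

```python
# Sketch: GC fraction and observed/expected CpG ratio for a DNA window.
# Assumption: commonly used CpG-island criteria are GC > 0.5 and
# obs/exp CpG > 0.6 over windows of >= 200 bp (not stated in the slides).

def cpg_stats(seq):
    """Return (gc_fraction, cpg_observed_over_expected) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")   # CG cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp


gc, ratio = cpg_stats("CGCGCGAT")   # tiny toy window, far shorter than 200 bp
print(round(gc, 2), round(ratio, 2))
```

In practice the statistic would be computed in sliding windows across the promoter region rather than on a single short string.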
Transcription Start Sites
• Transcription often starts at a CG within the promoter
5’ Untranslated Regions
• First exon often contains dozens to thousands
of bases before Start codon (median 150)
• Sometimes contains regulatory sequences,
e.g. binding sites for RNA binding proteins,
and translation initiators
Splice Regulatory Sites
• Splicing is achieved through binding of
spliceosome to recognition sequences on
nascent RNA molecule
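As a small illustration of "recognition sequences", most introns begin with GT (GU in RNA) and end with AG. The check below tests only those two dinucleotides; it is a sketch under that assumption and ignores the branch point, the polypyrimidine tract and the rarer non-canonical introns. The function name and example sequences are made up.

```python
# Sketch: test whether an intron sequence follows the canonical GT...AG rule.
# Assumption: only the terminal dinucleotides are checked.

def has_canonical_splice_sites(intron):
    """Return True if the intron starts with GT and ends with AG."""
    intron = intron.upper().replace("U", "T")
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


print(has_canonical_splice_sites("GTAAGTCCTTTTCAG"))
print(has_canonical_splice_sites("GCAAGTTTTCAG"))
```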
Splice Regulatory Sites
• Tissue-specific splice regulatory sites are
highly conserved
From Merkin et al Science 2012
Splicing Patterns Evolve in All Tissues
Except Brain
From Merkin et al Science 2012
Non-Coding Elements in Coding Exons
• Many regulatory sites occur within coding exons, esp.
toward 5’ end
• These constrain some codons as much as protein
sequence
• Many human SNPs break TFBS but have little effect
on protein (as far as we know)
From Stergachis et al Science 2013
3’ Untranslated Regions
• Longest exon is usually 3’UTR (>1000 nt)
• Typically 1/3 – 1/2 of a gene is in 5’ & 3’ UTRs
• 3’UTR has binding sites for miRNAs and RNA binding
proteins
• AU-rich elements (AREs) regulate mRNA stability, most often by promoting decay
• Proteins recognize complex secondary structure
GRIK4 3’UTR secondary structure is conserved
RNA Binding-Protein Sites
• mRNAs are usually further processed (e.g.
transported or sequestered)
• RNA binding proteins recognize specific motifs
within secondary structure of 3’ or 5’ UTR
• These sites are often highly conserved
From Ray et al Nature 2013
Poly-adenylation/Termination Sites
• Transcripts can be terminated and poly-adenylated at
sites with specific sequences
• Most genes have alternate poly-adenylation sites
• Median lengths of 3’UTR are 250 & 1773 bp (mouse)
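The "specific sequences" can be illustrated by scanning a 3' UTR for the canonical poly-adenylation signal hexamer AATAAA and its common variant ATTAAA. This is a minimal sketch: the function name and the toy sequence are made up, and real site prediction also considers downstream elements and the distance to the cleavage site.

```python
# Sketch: list candidate poly-adenylation signal hexamers in a 3' UTR.
# Assumption: only AATAAA and the variant ATTAAA are searched for.
import re

POLYA_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")  # lookahead keeps overlapping hits


def find_polya_signals(utr):
    """Return (0-based position, hexamer) for each signal found in the UTR."""
    utr = utr.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in POLYA_SIGNAL.finditer(utr)]


TOY_UTR = "GGCAATAAACCTTGGATTAAAGG"   # hypothetical 3' UTR fragment
print(find_polya_signals(TOY_UTR))
```

A gene with several such hits spaced along its 3' UTR is a candidate for alternate poly-adenylation, giving short and long 3' UTR isoforms.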
Poly-adenylation/Termination Sites
• Rapidly proliferating cells express gene
isoforms with short 3’ UTRs
• Neurons typically have longer 3’ UTRs
Types of alternate poly-adenylation. From Elkon et al, NRG 2013
DEMO:
GAPDH and GABRA1 in UCSC Browser