Non-coding RNAs (Gill) - CS273a - A computational tour of the

http://cs273a.stanford.edu
[Bejerano Fall10/11]
Lecture 11
HW1 Feedback (ours)
(Upcoming Project – discuss Wed)
Non-Coding RNAs
Halfway Feedback (yours)
“non coding” RNAs
Central Dogma of Biology:
RNA is an Active Player:
reverse transcription
long ncRNA
What is ncRNA?
• A non-coding RNA (ncRNA) is an RNA that functions without being translated into a protein.
• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.
– RNA catalyzes the maturation of tRNA.
– RNA catalyzes peptide bond formation.
– RNA is a required subunit in telomerase.
– RNA plays roles in immunity and development (RNAi).
– RNA plays a role in dosage compensation.
– RNA plays a role in carbon storage.
– RNA is a major subunit in the SRP, which is important in protein trafficking.
– RNA guides RNA modification.
– It is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.
RNA Folds into (Secondary and) 3D Structures
AAUUGCGGGAAAGGGGUCAA
CAGCCGUUCAGUACCAAGUC
UCAGGGGAAACUUUGAGAUG
GCCUUGCAAAGGGUAUGGUA
AUAAGCUGACGGACAUGGUC
CUAACCACGCAGCCAAGUCC
UAAGUCAACAGAUCUUCUGU
UGAUAUGGAUGCAGUUCA
We would like to predict them from sequence.
[Figure: RNA secondary structure diagram; helices labeled P4, P5, P5a, P5b, P5c, P6, P6a, and P6b; nucleotide positions numbered 120 through 260.]
Waring & Davies.
(1984) Gene 28: 277.
Cate, et al. (Cech & Doudna).
(1996) Science 273:1678.
RNA structure rules
• Canonical basepairs:
– Watson-Crick basepairs:
• G-C
• A-U
– Wobble basepair:
• G-U
• Stacks: continuous nested basepairs (energetically favorable).
• Non-basepaired loops:
– Hairpin loop.
– Bulge.
– Internal loop.
– Multiloop.
• Pseudo-knots
RNA structure: Basics
• Key: RNA is single-stranded. Think of a string over 4 letters: A, C, G, and U.
• The complementary bases form pairs.
• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.
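The non-crossing property can be checked mechanically. The sketch below assumes the common dot-bracket notation for secondary structures (not used on the slides) and two hypothetical helper names, `pairs_from_dot_bracket` and `has_crossing`. Two pairs (i, j) and (k, l) cross when i < k < j < l, which is the signature of a pseudoknot.

```python
def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)              # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def has_crossing(pairs):
    """True if any two pairs interleave (i < k < j < l), i.e. a pseudoknot."""
    pairs = sorted(pairs)
    for idx, (i, j) in enumerate(pairs):
        for k, l in pairs[idx + 1:]:
            if i < k < j < l:
                return True
    return False
```

A plain dot-bracket string is nested by construction, so `has_crossing` is only interesting for pair lists obtained some other way, for example `[(0, 4), (2, 6)]`.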
Bafna
Ab initio structure prediction:
lots of Dynamic Programming
• Maximizing the number of base pairs (Nussinov et al., 1978)
– Simple model: each base pair (i, j) scores 1.
• Pseudoknots drastically increase computational complexity.
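The base-pair maximization idea can be sketched as a small dynamic program. This is a minimal illustration, not the course's implementation: the function names are hypothetical, G-U wobble pairs are allowed alongside Watson-Crick pairs, and the minimum hairpin loop length of 3 is an assumption that does not appear on the slide. Only nested (non-crossing) structures are considered.

```python
# Pairs allowed in this sketch: Watson-Crick plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    return (x, y) in CANONICAL


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (each pair scores 1).

    min_loop is the assumed minimum number of unpaired bases enclosed
    by a hairpin-closing pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j], inclusive
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The table has O(n^2) cells and each cell scans O(n) split points, so the run time is O(n^3). Allowing crossing pairs would break the decomposition into independent subintervals, which is why pseudoknots are so much more expensive.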
Nearest Neighbor Model for RNA Secondary Structure Free Energy at 37 °C:
[Figure: hairpin with a 6-nucleotide loop, annotated with stacking energies of -2.0, -2.1, -0.9, -0.9, and -1.8 kcal/mol and loop terms of +5.0 and -1.6 kcal/mol.]
$\Delta G_{\text{helix}} = -2.0 - 2.1 + 2 \times (-0.9) - 1.8 = -7.7$ kcal/mol (sum of the nearest-neighbor stacking terms; one stack occurs twice)
$\Delta G_{\text{hairpin loop}} = \Delta G_{\text{initiation}}(6\ \text{nucleotides}) + \Delta G_{\text{mismatch}} = 5.0 - 1.6 = 3.4$ kcal/mol
$\Delta G_{\text{total}} = \Delta G_{\text{hairpin}} + \Delta G_{\text{helix}} = 3.4 - 7.7 = -4.3$ kcal/mol
Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.
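The arithmetic of this worked example can be replayed in a few lines. The energy values below are the ones quoted on the slide; the function and constant names are hypothetical, and the sketch only adds up terms that are supplied to it. It does not look up nearest-neighbor parameters from a sequence.

```python
# Values in kcal/mol, copied from the slide's worked hairpin example.
STACK_TERMS = [-2.0, -2.1, -0.9, -0.9, -1.8]   # one stack occurs twice
LOOP_INITIATION = 5.0                          # 6-nucleotide hairpin loop
TERMINAL_MISMATCH = -1.6


def hairpin_free_energy(stack_terms, loop_initiation, terminal_mismatch):
    """Return (helix, hairpin loop, total) free energies, rounded to 0.1."""
    helix = sum(stack_terms)
    loop = loop_initiation + terminal_mismatch
    return round(helix, 1), round(loop, 1), round(helix + loop, 1)


helix, loop, total = hairpin_free_energy(
    STACK_TERMS, LOOP_INITIATION, TERMINAL_MISMATCH
)
print(helix, loop, total)
```

The helix contributes a favorable (negative) term and the loop an unfavorable (positive) one, so the hairpin is predicted to be stable only because the stacking energy outweighs the loop penalty.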
Zuker’s algorithm MFOLD: computing
loop dependent energies
Energy Landscape of Real & Inferred Structures
Unfortunately…
– Random DNA (with high GC content) often folds into low-energy structures.
– What other signals determine non-coding genes?
Evolution to the Rescue
Stochastic context-free grammar (SCFG)
S → aSu
S → uSa
S → gSc
S → cSg
S → L
L → aL
L → cL
L → a
L → c
[Figure: derivation tree with nonterminals S and L over the sequence c guu aga aac cucucccc.]
• Each derivation tree corresponds to a structure.
Stochastic context-free grammar (cont’)
1. A CFG:
S → aSu
S → cSg
S → gSc
S → uSa
S → a
S → c
S → g
S → u
S → SS
2. A derivation of “accuggacccuuagaggu”:
S ⇒ aSu
⇒ acSgu
⇒ accSggu
⇒ accuSaggu
⇒ accuSSaggu
⇒ accugScSaggu
⇒ accuggSccSaggu
⇒ accuggaccSaggu
⇒ accuggacccSgaggu
⇒ accuggacccuSagaggu
⇒ accuggacccuuagaggu
3. Corresponding structure
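To see how a derivation determines a structure, the leftmost derivation on this slide can be replayed in code. In this sketch every rule of the form S → xSy contributes one base pair, S → x contributes one unpaired base, and S → SS concatenates two substructures. The function name `replay_derivation` and the dot-bracket output format are assumptions made for illustration; the rule order is the one used in the slide's derivation.

```python
def replay_derivation(rules):
    """Expand a leftmost derivation given as right-hand sides of S rules.

    Returns (sequence, dot-bracket structure). Each right-hand side is one of
    "xSy" (a base pair x-y), "SS" (a bifurcation), or "x" (an unpaired base).
    """
    rule = next(rules)
    if rule == "SS":
        left_seq, left_db = replay_derivation(rules)     # leftmost S first
        right_seq, right_db = replay_derivation(rules)
        return left_seq + right_seq, left_db + right_db
    if len(rule) == 3 and rule[1] == "S":
        inner_seq, inner_db = replay_derivation(rules)
        return rule[0] + inner_seq + rule[2], "(" + inner_db + ")"
    if len(rule) == 1:
        return rule, "."
    raise ValueError("unsupported rule: S -> %s" % rule)


# Rules in the order they are applied in the slide's derivation.
SLIDE_DERIVATION = ["aSu", "cSg", "cSg", "uSa", "SS",
                    "gSc", "gSc", "a", "cSg", "uSa", "u"]

sequence, structure = replay_derivation(iter(SLIDE_DERIVATION))
print(sequence)
print(structure)
```

The result is a four-pair stem enclosing two small hairpins, one from each branch of the S → SS rule. In a stochastic CFG each rule would also carry a probability, and the probability of a structure would be the product over the rules used in its derivation.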
MicroRNA
Genomic context
known miRNAs in human
intergenic
polycistronic
monocistronic
intronic
tRNA
tRNA Activity
Human specific accelerated evolution
[Figure labels: rapid change; Human; Chimp; conserved]
Human Accelerated Regions
Human-specific substitutions in conserved sequences
[Figure labels: rapid change; Human (derived); Chimp (ancestral); conserved]
[Pollard, K. et al., Nature, 2006]
HAR1:
• Novel ncRNA.
• Co-expressed in Cajal-Retzius cells with reelin.
• Similar expression in human, chimp, rhesus.
• 18 unique human substitutions leading to novel conformation.
• All weak (AT) to strong (GC).
[Beniaminov, A. et al., RNA, 2008]
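The "weak to strong" observation can be expressed as a simple count over an alignment. The sketch below assumes two equal-length, gap-free aligned DNA strings (ancestral and derived); the function name is hypothetical and the sequences in the usage line are toy strings, not HAR1 data.

```python
WEAK = set("AT")     # bases that form two hydrogen bonds when paired
STRONG = set("GC")   # bases that form three hydrogen bonds when paired


def weak_to_strong_counts(ancestral, derived):
    """Return (weak-to-strong substitutions, all substitutions)."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = [(a, d) for a, d in zip(ancestral.upper(), derived.upper())
                     if a != d]
    weak_to_strong = [(a, d) for a, d in substitutions
                      if a in WEAK and d in STRONG]
    return len(weak_to_strong), len(substitutions)


# Toy example only: two of two substitutions are weak to strong.
print(weak_to_strong_counts("ATTACA", "GTCACA"))
```

A region in which the first number equals the second, as reported for HAR1, has changed only in the AT-to-GC direction.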
Other Non Coding Transcripts
mRNA
EST
lincRNAs (long intergenic non coding RNAs)
X chromosome inactivation in mammals
Dosage compensation
[Figure: sex chromosome diagram with X and Y labels.]
Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67
Microarrays, Next Gen(eration) Sequencing etc.
End Results
Transcripts, transcripts everywhere
Human Genome
Transcribed (Tx)
Leaky tx?
Tx from both strands
Functional?
Or are they?
Halfway Feedback