Non-protein Coding Genes

CS173
Lecture 6: NON protein coding genes
MW 11:00-12:15 in Beckman B302
Prof: Gill Bejerano
TAs: Jim Notwell & Harendra Guturu
http://cs173.stanford.edu [BejeranoWinter12/13]
Announcements
• HW1 due in one week.
ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA
TATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC
TAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC
TGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT
CTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG
AATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA
GCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT
TTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA
CTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG
TTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT
TGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT
TTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG
CGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC
ATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA
GAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA
ATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA
TTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA
ATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT
ATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTT
TGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGT
TCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATAC
ATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT
GCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTA
CGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGA
ATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACA
TCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAAC
GGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAA
CTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTG
GCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTC
TTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAAT
TGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT
GCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT
AATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCT
TCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT
AATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGA
TTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTA
CTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTT
TACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTT
ACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAA
AATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGT
“non coding” RNAs (ncRNA)
Central Dogma of Biology:
Active forms of “non coding” RNA
[Figure labels: long non-coding RNA; reverse transcription; microRNA; rRNA, snRNA, snoRNA]
What is ncRNA?
• Non-coding RNA (ncRNA) is an RNA that functions without being
translated to a protein.
• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.
– RNA catalyzes the maturation of tRNA.
– RNA catalyzes peptide bond formation.
– RNA is a required subunit in telomerase.
– RNA plays roles in immunity and development (RNAi).
– RNA plays a role in dosage compensation.
– RNA plays a role in carbon storage.
– RNA is a major subunit in the SRP, which is important in protein trafficking.
– RNA guides RNA modification.
– RNA performs so many different functions that life is thought to have begun as an
RNA World, in which RNA was both the information carrier and the active molecule.
“non coding” RNAs (ncRNA)
Small structural RNAs (ssRNA)
ssRNA Folds into Secondary and 3D Structures
AAUUGCGGGAAAGGGGUCAA
CAGCCGUUCAGUACCAAGUC
UCAGGGGAAACUUUGAGAUG
GCCUUGCAAAGGGUAUGGUA
AUAAGCUGACGGACAUGGUC
CUAACCACGCAGCCAAGUCC
UAAGUCAACAGAUCUUCUGU
UGAUAUGGAUGCAGUUCA
We would like to predict them from sequence.
[Figure: RNA secondary structure diagram with helices P4, P5, P5a, P5b, P5c, P6, P6a and P6b labeled; nucleotide positions 120 to 260 marked]
Waring & Davies. (1984) Gene 28: 277.
Cate, et al. (Cech & Doudna). (1996) Science 273: 1678.
For example, tRNA
tRNA Activity
ssRNA structure rules
• Canonical basepairs:
– Watson-Crick basepairs:
• G–C
• A–U
– Wobble basepair:
• G–U
• Stacks: continuous nested basepairs (energetically favorable).
• Non-basepaired loops:
– Hairpin loop
– Bulge
– Internal loop
– Multiloop
• Pseudo-knots
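The rules above can be sketched in a few lines of Python. This is only an illustration: the dot-bracket notation for structures, and the helper names `can_pair`, `pairs_from_dot_bracket`, `all_canonical` and `is_pseudoknotted`, are assumptions introduced here and do not come from the slides.

```python
# Canonical pairs from the slide: Watson-Crick (G-C, A-U) plus the G-U wobble.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """True if bases x and y form a canonical base pair."""
    return (x.upper(), y.upper()) in CANONICAL


def pairs_from_dot_bracket(structure):
    """Return the sorted (i, j) base pairs of a nested dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def all_canonical(seq, pairs):
    """True if every listed pair joins two bases that may pair."""
    return all(can_pair(seq[i], seq[j]) for i, j in pairs)


def is_pseudoknotted(pairs):
    """True if two pairs cross, i.e. i < k < j < l for pairs (i, j) and (k, l)."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


# A toy hairpin: a stack of three pairs closing a three-base hairpin loop.
toy_seq = "GGGAAAUCC"
toy_pairs = pairs_from_dot_bracket("(((...)))")
print(toy_pairs, all_canonical(toy_seq, toy_pairs), is_pseudoknotted(toy_pairs))
```

Nested stacks never trigger the crossing test, whereas a pair list such as `[(0, 5), (3, 8)]` does, which is the property that makes pseudoknots harder to handle.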
Ab initio RNA structure prediction: lots of Dynamic Programming
• Objective: Maximizing the number of base pairs (Nussinov et al, 1978)
– simple model: pair score (i, j) = 1 if allowed
– fancier model: GC > AU > GU
• Pseudoknots drastically increase computational complexity
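A minimal sketch of base-pair maximization by dynamic programming, in the spirit of Nussinov et al. Several details are assumptions made for illustration and are not stated on the slide: the minimum hairpin loop length of 3, the numeric weights 3/2/1 used for the "fancier" GC > AU > GU model, and all function names. Pseudoknots are not handled.

```python
# Weights for the "fancier" model; only the ordering GC > AU > GU is from the slide.
WEIGHTS = {frozenset("GC"): 3, frozenset("AU"): 2, frozenset("GU"): 1}


def simple_score(x, y):
    """Simple model: 1 if the two bases are allowed to pair, else 0."""
    return 1 if frozenset((x, y)) in WEIGHTS else 0


def weighted_score(x, y):
    """Fancier model: GC pairs score more than AU, which score more than GU."""
    return WEIGHTS.get(frozenset((x, y)), 0)


def nussinov(seq, score=simple_score, min_loop=3):
    """Best total pair score over all nested (pseudoknot-free) structures."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best score for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            s = score(seq[i], seq[j])
            if s > 0:
                best = max(best, dp[i + 1][j - 1] + s)  # pair i with j
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])  # split in two
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAACCC"))                  # three nested G-C pairs
print(nussinov("GGGAAACCC", weighted_score))  # same pairs, weight 3 each
```

The three nested loops give cubic running time in the sequence length, which is the cost of considering every split point for every subsequence.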
Objective: Minimize Secondary Structure Free Energy at 37 °C:
Instead of scoring pairs (i, j), measure and sum energies:
[Figure: example hairpin annotated with stacking energies of -2.0, -2.1, -0.9, -0.9 and -1.8 kcal/mol, a terminal mismatch of -1.6 kcal/mol, and a 6-nucleotide loop initiation of +5.0 kcal/mol]
$\Delta G_{\text{helix}} = -2.0 - 2.1 + 2 \times (-0.9) - 1.8 = -7.7$ kcal/mol (sum of the stacking terms; one stack occurs twice)
$\Delta G_{\text{hairpin loop}} = \Delta G_{\text{initiation}}(6\ \text{nucleotides}) + \Delta G_{\text{mismatch}} = 5.0 - 1.6 = 3.4$ kcal/mol
$\Delta G_{\text{total}} = \Delta G_{\text{hairpin loop}} + \Delta G_{\text{helix}} = 3.4 - 7.7 = -4.3$ kcal/mol
Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.
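The arithmetic of the worked example can be checked with a short script. The numbers are the ones given on the slide; the variable names are chosen here for illustration.

```python
# Stacking energies of the helix in kcal/mol; the -0.9 stack occurs twice.
helix_terms = [-2.0, -2.1, -0.9, -0.9, -1.8]
loop_initiation = 5.0     # initiation penalty for a 6-nucleotide hairpin loop
terminal_mismatch = -1.6  # mismatch bonus next to the closing pair

dg_helix = sum(helix_terms)
dg_hairpin = loop_initiation + terminal_mismatch
dg_total = dg_hairpin + dg_helix

# Round to one decimal to hide floating-point noise.
print(round(dg_helix, 1), round(dg_hairpin, 1), round(dg_total, 1))
```

The loop term is positive (unfavorable) and the helix term is negative (favorable), so the structure is predicted to be stable only because the stacks outweigh the cost of closing the loop.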
http://cs273a.stanford.edu
[Bejerano Fall11/12]
Zuker’s algorithm MFOLD: computing loop dependent energies
RNA structure
• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.
Bafna
Stochastic context-free grammar
1. A CFG:
S → aSu | cSg | gSc | uSa | a | c | g | u | SS
2. A derivation of “accuggacccuuagaggu”:
S ⇒ aSu
⇒ acSgu
⇒ accSggu
⇒ accuSaggu
⇒ accuSSaggu
⇒ accugScSaggu
⇒ accuggSccSaggu
⇒ accuggaccSaggu
⇒ accuggacccSgaggu
⇒ accuggacccuSagaggu
⇒ accuggacccuuagaggu
3. Corresponding structure
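The derivation can be replayed mechanically, which also shows how each rule maps onto the structure. The sketch below applies each production to the leftmost S and builds a dot-bracket string in parallel; the dot-bracket output, the function name `derive` and the list `SLIDE_STEPS` are assumptions introduced here. The slide gives no rule probabilities, so the stochastic part of the grammar is left out.

```python
# Right-hand sides of the grammar: four pairing rules, four unpaired bases, one split.
RULES = ["aSu", "cSg", "gSc", "uSa", "a", "c", "g", "u", "SS"]


def derive(steps):
    """Apply each right-hand side to the leftmost S.

    Returns the list of sentential forms and the structure in dot-bracket
    notation: pairing rules emit '(' and ')', single bases emit '.'.
    """
    seq_form, struct_form = "S", "S"
    forms = [seq_form]
    for rhs in steps:
        if rhs not in RULES:
            raise ValueError("unknown production S -> %s" % rhs)
        if "S" not in seq_form:
            raise ValueError("no nonterminal left to expand")
        if rhs == "SS":
            shape = "SS"      # bifurcation: two substructures side by side
        elif len(rhs) == 3:
            shape = "(S)"     # a base pair enclosing a substructure
        else:
            shape = "."       # a single unpaired base
        seq_form = seq_form.replace("S", rhs, 1)
        struct_form = struct_form.replace("S", shape, 1)
        forms.append(seq_form)
    return forms, struct_form


# The eleven productions used in the derivation above, in order.
SLIDE_STEPS = ["aSu", "cSg", "cSg", "uSa", "SS", "gSc", "gSc", "a", "cSg", "uSa", "u"]
forms, structure = derive(SLIDE_STEPS)
print(forms[-1])
print(structure)
```

Because every terminal in the sequence form corresponds to exactly one character in the structure form, the leftmost S sits at the same position in both strings, so the two can be rewritten in lockstep.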
Cool algorithmics. Unfortunately…
– Random DNA (with high GC content) often folds into low-energy structures.
– We will mention powerful newer methods later on.
ssRNA transcription
• ssRNAs like tRNAs are usually encoded by short
“non coding” genes, that transcribe independently.
• Found in both the UCSC “known genes” track, and
as a subtrack of the RepeatMasker track
“non coding” RNAs (ncRNA)
microRNAs (miRNA/miR)
MicroRNA (miR)
[Figure labels: ~70nt, ~22nt, mRNA]
miR match to target mRNA is quite loose.
→ a single miR can regulate the expression of hundreds of genes.
MicroRNA Transcription
mRNA
MicroRNA (miR)
Computational challenges:
Predict miRs.
Predict miR targets.
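One common starting point for target prediction, sketched here under stated assumptions, is to scan an mRNA for sites complementary to the miR "seed". The seed definition (nucleotides 2 to 8 of the mature miR) is a widely used heuristic that is not stated on the slides, and the miR and UTR sequences below are made-up toy strings, not real genes. A short seed is part of why the match is loose and one miR can hit many mRNAs.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna.upper()))


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """0-based start positions in utr that perfectly match the miR seed.

    The seed is mirna[seed_start:seed_end], i.e. nucleotides 2-8 by default
    (an assumed heuristic). The target site is its reverse complement.
    """
    site = reverse_complement(mirna.upper()[seed_start:seed_end])
    utr = utr.upper()
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping sites
    return hits


toy_mir = "UACGGAUCCAUGCAAGUCUGAA"     # hypothetical 22-nt mature miR
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUUU"  # hypothetical 3' UTR fragment
print(seed_sites(toy_mir, toy_utr))
```

A perfect seed match is neither necessary nor sufficient for regulation, so real predictors layer further evidence on top of a scan like this one.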
MicroRNA Therapeutics
Idea: bolster/inhibit miR production to broadly modulate protein production.
Hope: “right” the good guys and/or “wrong” the bad guys.
Challenge: and not vice versa.
Other Non Coding Transcripts
lncRNAs (long non coding RNAs)
Don’t seem to fold into clear structures (or only a sub-region does).
Diverse roles only now starting to be understood.
→ Hard to detect or predict function computationally (currently)
lncRNAs come in many flavors
X chromosome inactivation in mammals
Dosage compensation
[Figure labels: X, X, X, X, Y]
Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67
Transcripts, transcripts everywhere
Human Genome
Transcribed* (Tx)
Tx from both strands*
* True size of set unknown
Or are they?
The million dollar question
Human Genome
Transcribed* (Tx)
Leaky tx?
Tx from both strands*
Functional?
* True size of set unknown
Coding and non-coding gene production
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
The cell is constantly making new proteins and ncRNAs.
These perform their function for a while, and are then degraded.
Newly made coding and non coding gene products take their place.
The picture within a cell is constantly “refreshing”.
Cell differentiation
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
That is exactly what happens when cells differentiate during development from stem cells to their different final fates.
Human manipulation of cell fate
To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.
We have learned (in a dish) to:
1 control differentiation
2 reverse differentiation
3 hop between different states
Cell replacement therapies
We want to use this knowledge to provide a patient with healthy self cells of a needed type.
We have learned (in a dish) to:
1 control differentiation
2 reverse differentiation
3 hop between different states
(iPS = induced pluripotent stem cells)
How does this happen?
Different cells in our body hold copies of (essentially) the same genome.
Yet they express very different repertoires of proteins and non-coding RNAs.
How do cells do it?
A: like they do everything else: using their proteins & ncRNAs…
Gene Regulation
Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off.
[Figure labels: Gene, DNA, Proteins]
To be continued…
Review
Lecture 6
• Central dogma recap
– Genes, proteins and non coding RNAs
• RNA world hypothesis
• Small structural RNAs
– Sequence, structure, function
– Structure prediction
– Transcription mode
• MicroRNAs
– Functions
– Modes of transcription
• lncRNAs
– Xist
• Genome wide (and context wide) transcription
– How much?
– To what goals?
• Gene transcription and cell identity
– Cell differentiation
– Human manipulation of cell fates
• Gene regulation control
(On Mondays) ask students to stack
the chairs without wheels at the back
of the room at the end of class.