Mira Abraham-Cohen and Haim J. Wolfson
Blavatnik School of Computer Science
Tel Aviv University
Tel Aviv, Israel
Why RNA?
• RNA (ribonucleic acid) is:
  – not solely a carrier of genetic information (non-coding RNAs)
  – a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing)
  – involved in pathological processes (e.g. cancerous tumors, AIDS)
  – a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)
[Figure: The Central Dogma of Molecular Biology (DNA, RNA, Protein)]
RNA Structure
[Figure: RNA structure at three levels: 1D, 2D, 3D]
Why RNA secondary structure?
• “RNA structure” usually refers to 2D structure
  – Easier to obtain (2D structures are more common than 3D structures)
• Secondary structure elements
  – Helix
  – Loop
Secondary structure elements
[Figure: secondary structure elements: helix, bulge, internal loop, multibranch loop, hairpin]
GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
.((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))...
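The dot-bracket line above can be turned into a list of base pairs with one stack per bracket type. This is only a minimal sketch: `parse_dot_bracket` is a hypothetical helper name, positions are 0-based, and it assumes that only `(`, `)`, `[`, `]` and `.` appear in the string.

```python
def parse_dot_bracket(structure):
    """Return the sorted (i, j) base pairs (0-based) of a dot-bracket string.

    Round and square brackets use separate stacks, so pairs written with
    square brackets may cross pairs written with round brackets.
    """
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in openers:
            stacks[symbol].append(pos)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


# Sequence and structure copied from the slide.
sequence = "GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA"
structure = ".((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))..."
pairs = parse_dot_bracket(structure)
print(len(sequence), len(structure), len(pairs))
```

The square-bracket pairs are the ones that cross a round-bracket helix, which is what makes this structure a pseudoknot.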
Pseudoknot structural motif
• Important for the function of many RNAs
• Two helices, helix1 (i1, j1) and helix2 (i2, j2), form a pseudoknot when i1 < i2 < j1 < j2
• RNA 2D structure alignment
  – Disregarding pseudoknots: $O(n^4)$ [Zhang and Shasha 1989]
  – Including pseudoknots: NP-hard [Zhang et al. 1999]
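The crossing condition i1 < i2 < j1 < j2 can be checked directly. In the sketch below, `helices_cross` is a hypothetical helper, and each helix is assumed to be represented by its outermost base pair (i, j) with i < j; the example positions are 0-based.

```python
def helices_cross(helix1, helix2):
    """Return True if the two helices satisfy i1 < i2 < j1 < j2.

    Each helix is given by its outermost base pair (i, j) with i < j.
    The helices are sorted first, so the argument order does not matter.
    """
    (i1, j1), (i2, j2) = sorted([helix1, helix2])
    return i1 < i2 < j1 < j2


nested = helices_cross((1, 19), (2, 18))      # one helix inside the other
disjoint = helices_cross((1, 19), (24, 38))   # side by side
crossing = helices_cross((45, 62), (40, 53))  # interleaved: a pseudoknot
print(nested, disjoint, crossing)  # False False True
```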
Why do pseudoknots make a difference?
Are they common?
Over 30% of the functional groups
Less than 70% 2D similarity
Previous work – RNA 2D alignment
• Methods disregarding pseudoknots
  – RNAforester [Hofacker et al. 2004]
  – Migals [Allali and Sagot 2005]
  – MARNA [Siebert and Backofen 2005]
• Methods that deal with limited cases
  – rna_align (DP) [Jiang et al. 2001]
  – pkalign (DP) [Mohl et al. 2009]
• A method that deals with the general problem
  – LARA (ILP) [Bauer et al. 2007]
• All current methods dealing with pseudoknots
  – High time and memory complexity
  – Impractical for big structures
    – rna_align < 150 nts
    – pkalign < 800 nts
    – LARA < 1600 nts on pc-wolfson1 (2 GB RAM)
HARP Motivation
[Figure: three panels, each pairing a preserved property with preserved function: preserved 3D structure, preserved relative 3D distances, preserved relative 2D distances; the figure also contains a question mark]
HARP
• Aligns RNA 2D structures with no limitation on the pseudoknot type
• Exploits inherent RNA distance constraints
  – Distances between 2D elements are usually conserved
  – Pseudoknots often create spatial distance constraints
• Goal: finding the largest set of conserved helices
• Heuristic method based on an analog of Geometric Hashing
Geometric hashing
• Point of “view”: each pair of points defines a “view”
• Voting table
HARP - Overview
1. Generate reduced “helix” graph representations G1 and G2 of the input structures R1 and R2
2. Build a look-up table of geodesic distances in all bases
3. Query the look-up table
4. Refine alignments and extend the match
Reduced graph representation
• Vertices: stable helices
  – Helix beginning, termination and length
• Edges connect adjacent helices
  – Direction: polymerization direction
  – Weight: minimal number of nucleotides needed for connection
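One way to obtain helix vertices from a list of base pairs is to group stacked pairs (i, j), (i+1, j-1), ... into runs. The sketch below is an illustration under assumptions, not the method used by HARP: `find_helices` is a hypothetical helper, and the `min_length` threshold is only a stand-in for "stable", since the slides do not say how stability is decided.

```python
def find_helices(pairs, min_length=2):
    """Group stacked base pairs into helices.

    Returns a list of (start, end, length) tuples, where (start, end) is the
    outermost base pair of the helix and length is its number of stacked pairs.
    Runs shorter than min_length are dropped (assumed stability criterion).
    """
    remaining = set(pairs)
    helices = []
    # Sorting by the 5' position visits the outermost pair of each run first.
    for i, j in sorted(pairs):
        if (i, j) not in remaining:
            continue  # already absorbed into an earlier helix
        length = 0
        while (i + length, j - length) in remaining:
            remaining.discard((i + length, j - length))
            length += 1
        if length >= min_length:
            helices.append((i, j, length))
    return helices


example_pairs = [(1, 19), (2, 18), (3, 17), (24, 38), (25, 37), (40, 53)]
helices = find_helices(example_pairs)
print(helices)
```

Each tuple carries the helix beginning, termination and length listed on the slide; edge weights between adjacent helices would be computed in a separate step.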
Graph representation
• Construction: $O(n \log n)$
[Figure: example graph over helices i, k and j, with forward and backward edges labeled by their weights]
Building a look-up table
• Shortest path between any two vertices
• Any two vertices (i, j) define a “view” (forward or backward)
• Similar views: inserting G1 triangles, querying with G2 triangles
• $O(n^3)$
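The slides do not name the algorithm used for the shortest paths between all vertex pairs. As an illustration only, the sketch below uses Floyd-Warshall, a standard cubic-time choice, on a small directed graph; `all_pairs_shortest_paths` and the toy edge weights are hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph with vertices 0..n-1.

    edges is a list of (u, v, weight) tuples. Returns an n x n matrix of
    shortest-path lengths, with math.inf where no path exists.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, weight in edges:
        dist[u][v] = min(dist[u][v], weight)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Toy graph: going 0 -> 1 -> 2 is shorter than the direct edge 0 -> 2.
dist = all_pairs_shortest_paths(3, [(0, 1, 7), (1, 2, 4), (0, 2, 20)])
print(dist[0][2], dist[2][0])
```

Because edges follow the polymerization direction, the matrix is not symmetric, which is consistent with distinguishing forward and backward views.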
Querying the vote table
• Querying the table with the indexing edges of G2
[Figure labels: ε-vicinity, indexing edges, basis edge]
• Filtering by
  – Triangle type F/B
  – ε-vicinity
• $O(n^3)$
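A voting table with an ε-vicinity filter can be sketched as a hash table over quantized triangle side lengths. Everything below is an assumption made for illustration: the triangle encoding `(basis, kind, sides)`, the function names, and the use of bins of width ε with neighbor-bin lookup are not taken from HARP itself.

```python
from collections import Counter, defaultdict
from itertools import product


def bin_key(kind, sides, eps):
    """Hash key: triangle type plus each side length quantized to width eps."""
    return (kind,) + tuple(int(s // eps) for s in sides)


def build_table(g1_triangles, eps):
    """Insert G1 triangles, given as (basis, kind, sides), into the table."""
    table = defaultdict(list)
    for basis, kind, sides in g1_triangles:
        table[bin_key(kind, sides, eps)].append((basis, sides))
    return table


def vote(table, g2_triangles, eps):
    """Vote for (G1 basis, G2 basis) pairs whose triangles agree within eps."""
    votes = Counter()
    for basis2, kind, sides in g2_triangles:
        home = bin_key(kind, sides, eps)[1:]
        # Values within eps of each other fall in the same or adjacent bins.
        for offset in product((-1, 0, 1), repeat=len(sides)):
            key = (kind,) + tuple(b + o for b, o in zip(home, offset))
            for basis1, stored in table.get(key, ()):
                if all(abs(a - b) <= eps for a, b in zip(stored, sides)):
                    votes[(basis1, basis2)] += 1
    return votes


g1 = [(("a", "b"), "F", (4, 7, 11)),
      (("a", "b"), "F", (5, 9, 3)),
      (("c", "d"), "B", (4, 7, 11))]
g2 = [(("x", "y"), "F", (5, 7, 10)),
      (("x", "y"), "F", (5, 8, 3))]
votes = vote(build_table(g1, eps=1), g2, eps=1)
print(votes.most_common(1))
```

Basis pairs that collect many votes are the candidates passed on to refinement; the backward triangle on basis ("c", "d") gets no vote because its type does not match.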
 
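Alignment refinement
[Equation: weight w for matching vertices of G1 and G2, defined in terms of a constant $C_1$, the distance $d(v_i, v_j)$ between the vertices, and the helix lengths $l(v_i)$ and $l(v_j)$]
• Terms of w
  – Distance between the vertices
  – Correlation between helices’ lengths
• Hungarian algorithm: $O(n^3)$
• Refinement overall: $O(n^7)$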
Alignment extension and scoring
• Greedy approach: $O(n^6)$
  – Starting with the largest (pair of bases) match
  – Extending by adding the pair that contributes most to the extension
• Score
  $NS_{bp}(R_1, R_2) = \frac{S_{bp}(R_1, R_2)}{\min(bp_1, bp_2)}$
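The normalized score divides S_bp(R1, R2) by min(bp1, bp2). A small worked sketch follows, assuming that S_bp counts matched base pairs and that bp1 and bp2 are the base-pair counts of the two structures (the slides do not define these symbols); the numbers are made up.

```python
def normalized_score(s_bp, bp1, bp2):
    """NS_bp = S_bp / min(bp1, bp2), returned as a fraction between 0 and 1."""
    smaller = min(bp1, bp2)
    if smaller == 0:
        raise ValueError("both structures need at least one base pair")
    return s_bp / smaller


# Hypothetical example: 15 matched base pairs, structures with 20 and 30 base pairs.
score = normalized_score(15, 20, 30)
print(f"{score:.1%}")  # 75.0%
```

Normalizing by the smaller structure means a small structure fully contained in a larger one can still reach 100%.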
Complexity
• Generating reduced graph representations: $O(n \log n)$
• Building a look-up table: $O(n^3)$
• Querying the look-up table: $O(n^3)$
• Generating alignments:
  1. Alignment refinement: $O(n^7)$
  2. Alignment extension: $O(n^6)$
In practice:
• Average-size structures: less than a second
• Big structures (~2800 nucleotides): less than a minute and 10 MB
Results
• HARP’s statistics
  – Average score and p-value
• Comparison with LARA
• Alignment examples
HARP’s statistics

Functional group (based on DARTS)   Group size   Average size (nts)   Average score   p-value
tRNA                                4            78                   100%            0
Ribosomal 23S subunit               4            2852                 71.9%           0
Ribosomal 5S subunit                4            120                  77.2%           0.18
Ribosomal 16S subunit               2            1530                 86.7%           0
Self splicing group I introns       2            224                  78.0%           0.02
Thi-box riboswitch                  2            80                   95.0%           0.07
Guanine riboswitch                  2            69                   100%            0
SRP S domain                        2            114                  73.2%           0.17
RNase P catalytic domain            2            298                  68.9%           0.02
Total                               24           596                  83.4%           0.05
Similar 2D yet different function
5S ribosomal RNA
SRP
Comparison with LARA
[Figure: ROC curves for HARP and LARA; panel label: 23 rRNA. y-axis: Sensitivity = TP / P = TP / (TP + FN); x-axis: 1 - Specificity = FPR = FP / N = FP / (FP + TN)]
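The two axis definitions, Sensitivity = TP / (TP + FN) and FPR = FP / (FP + TN), give one ROC point per score threshold. A minimal sketch with made-up counts (`roc_point` is a hypothetical helper, and the numbers are not results from the comparison):

```python
def roc_point(tp, fn, fp, tn):
    """Return (sensitivity, false positive rate) for one score threshold."""
    sensitivity = tp / (tp + fn)          # TP / P
    false_positive_rate = fp / (fp + tn)  # FP / N = 1 - specificity
    return sensitivity, false_positive_rate


# Made-up counts for illustration only.
sensitivity, fpr = roc_point(tp=8, fn=2, fp=1, tn=9)
print(sensitivity, fpr)  # 0.8 0.1
```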
Self splicing group I introns
68.9% similarity
(left) PDB id 1zzn chain B, 10 stable helices.
(right) PDB id 1y0q chain A, 13 stable helices.
Catalytic domains of ribonuclease P
(left) PDB id 2a2e chain A, 19 stable helices.
(right) PDB id 2a64 chain A, 16 stable helices.
Conclusions
HARP
• HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots
• Accurate tool capable of distinguishing between homologous and non-homologous structures
• Highly efficient
  – Takes less than a second for average-size structures
  – Less than a minute and 10 MB for very big structures
• Web server: http://bioinfo3d.cs.tau.ac.il/HARP
Thank you for your attention!