Presentation

Mira Abraham-Cohen and Haim J. Wolfson
Blavatnik School of Computer Science
Tel Aviv University, Tel Aviv, Israel

Why RNA?
RNA (ribonucleic acid) is:
• not solely a carrier of genetic information (non-coding RNAs)
• a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing)
• involved in pathological processes (e.g. cancerous tumors, AIDS)
• a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)
[Figure: The Central Dogma of Molecular Biology (DNA, RNA, Protein)]

RNA Structure
[Figure: 1D, 2D and 3D representations of an RNA molecule]

Why RNA secondary structure?
• "RNA structure" usually refers to 2D structure
• Easier to achieve (more common than 3D structures)
• Secondary structure elements
  - Helix
  - Loop

Secondary structure elements
• Helix
• Bulge
• Internal loop
• Multi-branch loop
• Hairpin

GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
.((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))...

Pseudoknot structural motif
• Important for the function of many RNAs
• Helix 1 and helix 2 cross: $i_1 < i_2 < j_1 < j_2$
• RNA 2D structure alignment
  - Disregarding pseudoknots: $O(n^4)$ [Zhang and Shasha 1989]
  - Including pseudoknots: NP-hard [Zhang et al. 1999]

Why do pseudoknots make a difference?
• Are they common? Over 30% of the functional groups
• Less than 70% 2D similarity

Previous work – RNA 2D alignment
• Methods disregarding pseudoknots
  - RNAforester [Hofacker et al. 2004]
  - Migals [Allali and Sagot 2005]
  - MARNA [Siebert and Backofen 2005]
• Methods that deal with limited cases
  - rna_align (DP) [Jiang et al. 2001]
  - pkalign (DP) [Mohl et al. 2009]

Previous work – RNA 2D alignment
• A method that deals with the general problem
  - LARA (ILP) [Bauer et al. 2007]
• All current methods dealing with pseudoknots
  - High time and memory complexity
  - Impractical for big structures
• Size limits on pc-wolfson1 (2 GB RAM)
  - rna_align < 150 nts
  - pkalign < 800 nts
  - LARA < 1600 nts

HARP Motivation
• Preserved 3D structure ? Preserved function
• Preserved relative 3D distances -> Preserved function
• Preserved relative 2D distances -> Preserved function

HARP
• Aligns RNA 2D structures with no limitation on the pseudoknot type
• Exploits inherent RNA distance constraints
  - Distances between 2D elements are usually conserved
  - Pseudoknots often create spatial distance constraints
• Goal: finding the largest set of conserved helices
• Heuristic method based on an analog of Geometric Hashing

Geometric hashing
• Point of "view"
• Each pair of points defines a "view"
• Voting table

HARP - Overview
Input: R1, R2
1. Generate reduced "helix" graph representations G1, G2
2. Build a look-up table of geodesic distances in all bases
3. Query the look-up table
4. Refine alignments and extend the match

Reduced graph representation
• Vertices: stable helices
  - Helix beginning, termination and length
• Edges connect adjacent helices
  - Direction: polymerization direction
  - Weight: minimal number of nucleotides needed for connection
• Complexity: $O(n \log n)$

Graph representation
[Figure: example triangles on vertices i, j, k with forward and backward edge weights]

Building a look-up table
• Shortest path between any two vertices
• Any two vertices (i, j) define a "view" (forward or backward)
• Similar views: inserting G1 triangles, querying with G2 triangles
• Complexity: $O(n^3)$

Querying the vote table
• Querying the table with the indexing edges of G2 (basis edge, indexing edges)
• Filtering by
  - Triangle type F/B
  - ε-vicinity
• Complexity: $O(n^3)$

Alignment refinement
[Figure: matching between the vertices of G1 and G2]
[Formula: matching weight $w$, combining the distance $d(v_i, v_j)$ between the vertices with the correlation between the helices' lengths $l(v_i)$, $l(v_j)$]
• Hungarian algorithm: $O(n^3)$
• Complexity: $O(n^7)$

Alignment extension and scoring
• Greedy approach: $O(n^6)$
  - Starting with the largest (pair of bases) match
  - Extending by adding the pair that contributes most to the extension
• Score: $NS_{bp}(R_1, R_2) = \frac{S_{bp}(R_1, R_2)}{\min(bp_1, bp_2)}$

Complexity
• Generating reduced graph representations: $O(n \log n)$
• Building a look-up table: $O(n^3)$
• Querying the look-up table: $O(n^3)$
• Generating alignments:
  1. Alignment refinement: $O(n^7)$
  2. Alignment extension: $O(n^6)$
• In practice:
  - Average-size structures: less than a second
  - Big structures (~2800 nucleotides): less than a minute and 10 MB

Results
• HARP's statistics
  - Average score and p-value
• Comparison with LARA
• Alignment examples

HARP's statistics

Functional group (based on DARTS)   Group size   Average size (nts)   Average score   p-value
tRNA                                4            78                   100%            0
Ribosomal 23S subunit               4            2852                 71.9%           0
Ribosomal 5S subunit                4            120                  77.2%           0.18
Ribosomal 16S subunit               2            1530                 86.7%           0
Self splicing group I introns       2            224                  78.0%           0.02
Thi-box riboswitch                  2            80                   95.0%           0.07
Guanine riboswitch                  2            69                   100%            0
SRP S domain                        2            114                  73.2%           0.17
RNase P catalytic domain            2            298                  68.9%           0.02
Total                               24           596                  83.4%           0.05

Similar 2D yet different function
[Figure: 5S ribosomal RNA and SRP]

Comparison with LARA
[Figure: 23 rRNA. Sensitivity, TP / P = TP / (TP + FN), plotted against 1 - specificity = FPR = FP / N = FP / (FP + TN), for HARP and LARA]

Self splicing group I introns
68.9% similarity
• (left) PDB id 1zzn chain B, 10 stable helices
• (right) PDB id 1y0q chain A, 13 stable helices

Catalytic domains of ribonuclease P
• (left) PDB id 2a2e chain A, 19 stable helices
• (right) PDB id 2a64 chain A, 16 stable helices

Conclusions
• HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots
• Accurate tool capable of distinguishing between homologous and non-homologous structures
• Highly efficient
  - Takes less than a second for average-size structures
  - Less than a minute and 10 MB for very big structures
• Web server: http://bioinfo3d.cs.tau.ac.il/HARP

Thank you for your attention!
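Note on the dot-bracket example and the pseudoknot condition $i_1 < i_2 < j_1 < j_2$: the sketch below reads the structure string from the "Secondary structure elements" slide, collects its base pairs, groups stacked pairs into helices, and tests whether any two pairs cross. It is an illustration only, not HARP's code. The function names (`parse_dot_bracket`, `group_helices`, `has_pseudoknot`) are hypothetical, and treating every run of stacked pairs as one helix is a simplifying assumption; the slides do not say how "stable" helices are selected.

```python
# Illustrative sketch only; names and the helix definition are assumptions.

STRUCTURE = ".((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))..."


def parse_dot_bracket(structure):
    """Return the sorted list of 0-indexed base pairs (i, j) with i < j."""
    openers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in openers:
            if not stacks[openers[ch]]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[openers[ch]].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def group_helices(pairs):
    """Group stacked pairs (i, j), (i + 1, j - 1), ... into helices."""
    helices = []
    for i, j in sorted(pairs):
        if helices and helices[-1][-1] == (i - 1, j + 1):
            helices[-1].append((i, j))
        else:
            helices.append([(i, j)])
    return helices


def has_pseudoknot(pairs):
    """True if two pairs cross, i.e. i1 < i2 < j1 < j2."""
    ordered = sorted(pairs)
    for a, (i1, j1) in enumerate(ordered):
        for i2, j2 in ordered[a + 1:]:
            if i1 < i2 < j1 < j2:
                return True
    return False


if __name__ == "__main__":
    pairs = parse_dot_bracket(STRUCTURE)
    helices = group_helices(pairs)
    print(len(pairs), "base pairs in", len(helices), "helices")
    print("pseudoknot:", has_pseudoknot(pairs))
```

The square brackets mark the helix that crosses the last round-bracket helix, which is what makes the example a pseudoknot; the quadratic crossing test is chosen for clarity, not speed.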
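Note on the look-up table: the slides state only that the table stores the shortest path between any two vertices and that building it costs $O(n^3)$. One standard algorithm with that cost is Floyd-Warshall, sketched below. Using it here is an assumption, not a statement about HARP's implementation; the three-vertex graph, its weights and the helper `within_epsilon` (one possible reading of the ε-vicinity filter) are hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph with n vertices.

    edges is an iterable of (u, v, weight); the result is an n x n matrix
    whose entry [u][v] is the shortest directed distance from u to v
    (math.inf when v cannot be reached from u).
    """
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


def within_epsilon(d1, d2, eps):
    """Hypothetical epsilon-vicinity test: distances differ by at most eps."""
    return abs(d1 - d2) <= eps


if __name__ == "__main__":
    # Hypothetical helix graph: vertices 0, 1, 2 in polymerization order.
    edges = [(0, 1, 4), (1, 2, 7), (0, 2, 20)]
    dist = all_pairs_shortest_paths(3, edges)
    print(dist[0][2])  # path 0 -> 1 -> 2 is shorter than the direct edge
    print(within_epsilon(dist[0][2], 12, eps=2))
```

Because the edges are directed, `dist[u][v]` and `dist[v][u]` generally differ, which is one way to think about the forward and backward views on the slides.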
