
The University of Hong Kong
CSIS0801 Final Year Project
Alignment Algorithms for RNA Molecules
Project Plan
V1.0
Supervisor: Dr. S.M.Yiu
Member: Kwong Lap Man, Levin
Email: fyp11016@cs.hku.hk
Website: http://i.cs.hku.hk/fyp/2011/fyp11016/public_html/index.html
Date: September 23, 2011
Final Year Project 2011-2012 Alignment Algorithms for RNA Molecules
Kwong Lap Man, Levin
Introduction:
Ribonucleic acid, or RNA, is a major macromolecule that is essential for
all known forms of life. RNA is responsible for regulating genetic and
metabolic activities in cells. RNA is made up of a single long chain of
components called nucleotides, which come in four basic types, namely
Adenine (A), Cytosine (C), Guanine (G) and Uracil (U). The sequence of
nucleotides affects the structure of an RNA molecule, which in turn
affects its function.
Many functional genes have now been identified and classified into
families by function. Given a query RNA with known structure and a
genome, it is useful to identify all genomic substrings that match the
query in both sequence and structure, because this helps biologists
determine the function of the RNA.
This problem has been studied for many years, and many computer
scientists have developed algorithms for it. This project focuses on
using different programming techniques to improve those algorithms.
Possible directions:
(a) Improve the fundamental DP algorithm:
One direction is to improve the fundamental dynamic programming (DP)
algorithm on which the existing methods are built.
The first step is to study the O(mn^4) algorithm (Buhm Han, Banu
Dost, Vineet Bafna and Shaojie Zhang, 2008);
The second step is to study techniques that have already been used
to speed up DP algorithms, such as the Four-Russians speed-up;
The final step is to apply such a technique to the algorithm and
test its performance.
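To show the kind of table-filling that direction (a) aims to speed up, here is a minimal sketch of a much simpler DP: sequence-only approximate matching of a query against a genome with unit edit costs. It ignores structure entirely and is not the structural alignment algorithm cited above. The function name `best_match_cost` and the unit costs are assumptions made for illustration only.

```python
def best_match_cost(query: str, genome: str) -> tuple:
    """Return (cost, end) for the genome substring closest to query.

    cost is the unit-cost edit distance; end is the exclusive end index
    of the first best-scoring substring in genome.
    """
    n = len(genome)
    # Row 0: an empty query matches at any genome position for free,
    # which lets a match start anywhere in the genome.
    prev = [0] * (n + 1)
    for i in range(1, len(query) + 1):
        # Column 0: i query bases against an empty substring cost i.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            mismatch = query[i - 1] != genome[j - 1]
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or substitute
                prev[j] + 1,             # query base with no genome partner
                curr[j - 1] + 1,         # genome base with no query partner
            )
        prev = curr
    # The last row holds the best cost for a match ending at each position.
    end = min(range(n + 1), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    print(best_match_cost("GAUC", "AAGAUCAA"))
```

This sketch fills an m-by-n table one row at a time, so it needs O(mn) time and O(n) memory. A structure-aware algorithm has to track paired positions as well, which is where the higher polynomial cost comes from and why speed-up techniques are worth studying.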
(b) Index the RNA sequence:
Another direction is to index the RNA sequence, because the
sequence is long and contains redundancies (mainly in the
non-functional substrings). The idea is to use a suffix tree to
eliminate these redundancies.
Step one is to build a suffix tree of the RNA sequence;
Step two is to adapt the DP algorithm to work on the suffix tree;
The final step is to test the performance of the suffix-tree DP
algorithm.
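The redundancy argument in direction (b) can be illustrated with a naive suffix trie. This is only a sketch: it is built in quadratic time and is uncompressed, so it is not the suffix tree the plan refers to, and the helper names `build_suffix_trie`, `count_nodes` and `contains` are hypothetical. It shows that repeated substrings share a single path, so work done along a path need not be repeated for every occurrence.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Reuse the child if this prefix was already seen.
            node = node.setdefault(ch, {})
    return root


def count_nodes(node: dict) -> int:
    """Count trie nodes below node (one per distinct substring)."""
    return sum(1 + count_nodes(child) for child in node.values())


def contains(trie: dict, pattern: str) -> bool:
    """Return True if pattern occurs somewhere in the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    text = "GAUGAU"
    trie = build_suffix_trie(text)
    total_suffix_chars = sum(len(text) - i for i in range(len(text)))
    print(total_suffix_chars, count_nodes(trie))
    print(contains(trie, "UGA"), contains(trie, "GG"))
```

For the repetitive input "GAUGAU", the suffixes contain 21 characters in total but the trie has only 15 nodes, because the repeated "GAU" is stored once. A DP that walks the trie instead of every genome position could, under this assumption, compute shared prefixes only once.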
(c) Index the genome families:
Another direction is to index the genome families, because each
RNA substring needs to be compared to all genes from each genome
family, and genes in the same family have similar structure. The
idea is therefore to index each genome family.
Step one is to find a way to index a family;
Step two is to adapt the DP algorithm to work with indexed families;
The final step is to test the performance of the improved DP
algorithm.
Project Schedule:
Items                                                         Time required
Study existing algorithms                                     1.5 months
Design improved algorithm focusing on the core DP             1.5 months
Implement and test the algorithm                              0.5 months
Design improved algorithm based on indexing the RNA sequence  1.5 months
Implement and test the algorithm                              0.5 months
Design improved algorithm based on indexing the genomic
families                                                      1.5 months
Implement and test the algorithm                              0.5 months
References:
(1) Samuel Ieong, Ming-Yang Kao, Tak-Wah Lam, Wing-Kin Sung and Siu-Ming Yiu. Predicting RNA
Secondary Structures with Arbitrary Pseudoknots by Maximizing the Number of Stacking Pairs.
Journal of Computational Biology, Volume 10, Number 6, 2003.
(2) Thomas K.F. Wong and S.M. Yiu. Structural Alignment of RNAs with Pseudoknots.
(3) Buhm Han, Banu Dost, Vineet Bafna and Shaojie Zhang. Structural Alignment of Pseudoknotted
RNA. Journal of Computational Biology, 2008.