Nandini

Whole Genome Sequence Alignment
Overview: Multiple whole genome sequence alignment provides a basis for comparative genomics.
Pairwise whole genome alignment is a fundamentally different problem from the alignment of short
sequences. Sequence alignment is the primary tool for detecting similarity between genomes, which is
important for the evolutionary analysis of functional elements.
Many tools and software packages perform whole genome alignment; however, their performance
levels differ. The results of the “Alignathon” competition brought to light the performance
metrics of various aligners such as Cactus, MULTIZ, TBA, VISTA-LAGAN, Pecan, GenomeMatch, Mugsy,
Robusta, PSAR-Align, AutoMZ and EPO.
Goals: Most of the tools use BLASTZ/LASTZ-like techniques to begin the alignment process. We intend to
improve anchor selection in the initial stages, and thereby the accuracy of the multiple sequence
alignment of whole genomes.
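
To make the idea of anchor selection concrete, the following is a minimal Python sketch of one possible seeding step. It is not the procedure proposed here and not how LASTZ works internally (LASTZ uses its own seeding scheme); it simply indexes exact k-mers and keeps those that occur exactly once in each genome as candidate anchors. The function names, the default k = 12 and the uniqueness filter are all illustrative assumptions, and the sketch ignores the reverse strand.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def unique_seed_anchors(reference, query, k=12):
    """Return sorted (ref_pos, query_pos) pairs for shared, unique k-mers.

    Assumption for illustration only: a k-mer is a usable anchor if it
    occurs exactly once in the reference and exactly once in the query.
    Repeated k-mers are dropped because they give ambiguous anchors.
    """
    ref_index = kmer_index(reference, k)
    qry_index = kmer_index(query, k)
    anchors = []
    for kmer, ref_pos in ref_index.items():
        qry_pos = qry_index.get(kmer)
        if qry_pos is not None and len(ref_pos) == 1 and len(qry_pos) == 1:
            anchors.append((ref_pos[0], qry_pos[0]))
    anchors.sort()
    return anchors


if __name__ == "__main__":
    # Toy sequences, not real genome data.
    print(unique_seed_anchors("AAACGTACGGTT", "CCACGTACGGAA", k=6))
```

A real anchor selection step would also need to handle reverse-complement matches and chain overlapping seeds into longer anchors; both are left out of this sketch.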
Our approach is to determine a better anchor selection procedure and then align two E. coli
genomes using the Smith-Waterman algorithm.
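
For reference, a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty is given below. The scoring values (match = 2, mismatch = -1, gap = -2) and the function name are assumptions chosen for illustration; the proposal does not specify a scoring scheme. The full score matrix is quadratic in sequence length, so on megabase-scale genomes this form would presumably be applied to bounded regions (for example, windows around anchors) rather than to whole chromosomes; that is an assumption, not something stated above.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring parameters are illustrative defaults, not values from the proposal.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences, not real genome data.
    print(smith_waterman("TTACGTAA", "GGACGTCC"))
```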
Dataset:
Reference genome: >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655
chromosome, complete genome
Query genome: >gi|218687878|ref|NC_011745.1| Escherichia coli ED1a chromosome, complete
genome
Outcome:
a) Calculate the alignment score of our approach and compare it with that of LASTZ to determine
whether our approach performs better.
b) Identify coding regions in other bacteria and see how well genes are matched.