GREEDY ALGORITHMS AND GENOMIC REARRANGEMENTS

SORTING ALGORITHMS AND GENOMIC REARRANGEMENTS
INTRODUCTION
Genomic rearrangements are crucial for evolution and are responsible for the existing
variety of genome architectures. The genomic sequences of human and mouse provide
evidence for a large number of rearrangements. The most common types of genomic
rearrangements are reversals, translocations, fusions, and fissions. Comparing
sequences through sorting techniques can reveal significant information about their
ancestry and evolution. This paper discusses greedy and insertion sort algorithms for
sequence comparison.
Reversal
A reversal occurs when a segment of a gene sequence is read in the opposite direction,
so that the genes in that segment appear in reverse order. Reversals introduce
breakpoints and hence disruptions in the gene order.
Examples of Reversals:
i. 1 2 3 4 5 6 7 8 9 10 could be misread as:
1 2 3 -8 -7 -6 -5 -4 9 10
The negative signs mark the genes whose order has been reversed.
ii. 5’ ATGCCTGTACTA 3’
3’ TACGGACATGAT 5’
iii. P = 1 2 3 4 5 6 7 8
Applying r(3, 5) gives:
P = 1 2 5 4 3 6 7 8
Applying r(5, 6) then gives:
P = 1 2 5 4 6 3 7 8
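Example i can be reproduced with a short routine. This is only a sketch: the function name signedReversal and the 1-based position arguments are assumptions made for illustration, not part of the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reverses the genes at 1-based positions i..j of a signed gene order and
// flips the sign of each gene in that segment (hypothetical helper).
std::vector<int> signedReversal(std::vector<int> genes, std::size_t i, std::size_t j)
{
    assert(i >= 1 && i <= j && j <= genes.size());
    std::reverse(genes.begin() + (i - 1), genes.begin() + j);
    for (std::size_t k = i - 1; k < j; ++k)
        genes[k] = -genes[k];
    return genes;
}
```

Calling signedReversal on 1 2 3 4 5 6 7 8 9 10 with i = 4 and j = 8 yields 1 2 3 -8 -7 -6 -5 -4 9 10, as in example i.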
A. Reversal Distance Problem (using a greedy algorithm)
Goal: Given two permutations (one being the identity sequence 1 through n), find the
shortest series of reversals that transforms one to the other.
Input: Two permutations (One permutation is the identity sequence).
Output: A series of reversals r1, ..., rt transforming one to the other such that t
is minimum.
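To make the problem statement concrete, the minimum t can be found for very small permutations by exhaustive search. The sketch below is not the greedy method of this paper; it is a plain breadth-first search over unsigned permutations, and the name reversalDistance is an assumption. Its running time grows factorially with n, so it is useful only as a reference on tiny inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Exact reversal distance from `start` to the identity 1..n, found by
// breadth-first search over every permutation reachable by reversals.
int reversalDistance(const std::vector<int>& start)
{
    std::vector<int> identity(start);
    std::sort(identity.begin(), identity.end());
    for (std::size_t k = 0; k < identity.size(); ++k)
        assert(identity[k] == static_cast<int>(k) + 1); // must be a permutation of 1..n

    std::map<std::vector<int>, int> dist;  // permutation -> reversals used so far
    std::queue<std::vector<int>> frontier;
    dist[start] = 0;
    frontier.push(start);

    while (!frontier.empty()) {
        std::vector<int> cur = frontier.front();
        frontier.pop();
        if (cur == identity)
            return dist[cur];
        // Try every reversal of a segment of length >= 2 (0-based indices here).
        for (std::size_t i = 0; i + 1 < cur.size(); ++i) {
            for (std::size_t j = i + 1; j < cur.size(); ++j) {
                std::vector<int> next(cur);
                std::reverse(next.begin() + i, next.begin() + j + 1);
                if (dist.find(next) == dist.end()) {
                    dist[next] = dist[cur] + 1;
                    frontier.push(next);
                }
            }
        }
    }
    return -1;  // not reached for a valid permutation
}
```

For instance, 1 2 5 4 6 3 7 8 has reversal distance 2, while 1 2 5 4 3 6 7 8 has distance 1.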
B. Comparison of two sequences using insertion sort:
Goal: Given two sequences (DNA or protein), determine through insertion sorting whether
they share a common ancestor.
Input: Two sequences (DNA or protein) read from files.
Output: A series of sorts confirming common ancestry.
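One possible reading of this comparison, based on the description in the conclusions (sequences sorted by ASCII value and then compared), is sketched below. The function names are hypothetical, the sequences are passed as strings rather than read from files, and treating equal sorted strings as the test is an assumption, since the text does not state the exact criterion. Note that this check compares only the residue composition of the two sequences.

```cpp
#include <cstddef>
#include <string>

// Insertion sort of the characters of s by their ASCII (char) values.
std::string sortedByAscii(std::string s)
{
    for (std::size_t p = 1; p < s.size(); ++p) {
        char tmp = s[p];
        std::size_t j = p;
        // Shift larger characters one slot to the right, then insert tmp.
        for (; j > 0 && tmp < s[j - 1]; --j)
            s[j] = s[j - 1];
        s[j] = tmp;
    }
    return s;
}

// True when both sequences contain the same residues with the same counts.
bool sameSortedSequence(const std::string& a, const std::string& b)
{
    return sortedByAscii(a) == sortedByAscii(b);
}
```

For the strand ATGCCTGTACTA, sortedByAscii returns AAACCCGGTTTT.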
ABSTRACT
Greedy and Insertion sort algorithms:
A. Greedy Algorithm:
Greedy algorithms are shortsighted in their approach, i.e. they make each decision based
only on the information available at hand. A greedy algorithm is built from four
functions:
i. A solution function that checks whether a chosen set of items provides a solution.
ii. A feasibility function that checks whether a set is feasible.
iii. A selection function that tells which of the candidates is the most promising.
iv. An objective function, which does not appear explicitly, that gives the value of a
solution.
A feasible set is promising if it can be extended to produce not merely a solution but
an optimal solution to the problem. Dynamic programming solves the subproblems
bottom-up, but a greedy strategy usually progresses in a top-down fashion, making
one greedy choice after another and reducing each problem to a smaller one. The greedy
choice property and optimal substructure are the two key ingredients that make a
problem suitable for a greedy strategy. The “greedy choice property” says that a
globally optimal solution can be arrived at by making locally optimal choices.
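The four functions can be seen in a small change-making routine. This is an illustrative sketch only: the change-making problem, the name makeChange, and the coin set used in the usage note are assumptions chosen for the example, and the greedy choice is optimal only for suitable coin systems such as 25, 10, 5, 1.

```cpp
#include <cassert>
#include <vector>

// Greedy change-making. `coins` must be sorted from largest to smallest.
std::vector<int> makeChange(int amount, const std::vector<int>& coins)
{
    assert(amount >= 0);
    std::vector<int> chosen;
    int total = 0;
    for (int coin : coins) {              // selection: largest remaining coin first
        assert(coin > 0);
        while (total + coin <= amount) {  // feasibility: never exceed the amount
            chosen.push_back(coin);
            total += coin;
        }
        if (total == amount)              // solution check: the amount is paid
            break;
    }
    // Objective (implicit, never computed here): the number of coins used.
    return chosen;
}
```

With coins 25, 10, 5, 1 and an amount of 67, the routine picks 25, 25, 10, 5, 1, 1, which is six coins.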
Reversals and gene orders:
Gene order is represented by a permutation p:
p = p_1 ... p_{i-1} p_i p_{i+1} ... p_{j-1} p_j p_{j+1} ... p_n
Reversal r(i, j) reverses (flips) the elements from position i to position j in p, giving:
p * r(i, j) = p_1 ... p_{i-1} p_j p_{j-1} ... p_{i+1} p_i p_{j+1} ... p_n
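On an unsigned permutation stored in a vector, r(i, j) amounts to a single call to std::reverse. The wrapper below is a minimal sketch; the name applyReversal and the 1-based positions are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Applies reversal r(i, j) to permutation p in place (1-based positions).
void applyReversal(std::vector<int>& p, std::size_t i, std::size_t j)
{
    assert(i >= 1 && i <= j && j <= p.size());
    std::reverse(p.begin() + (i - 1), p.begin() + j);
}
```

Starting from 1 2 3 4 5 6 7 8, applying r(3, 5) and then r(5, 6) produces 1 2 5 4 3 6 7 8 and then 1 2 5 4 6 3 7 8.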
B. Insertion Sort:
Insertion sort is a simple comparison sort in which the sorted array (or list) is built
one entry at a time.
Advantages of Insertion sort:
- Simple to implement
- Efficient on (quite) small data sets
- Efficient on data sets which are already substantially sorted: it runs in O(n + d)
  time, where d is the number of inversions
- More efficient in practice than most other simple O(n^2) algorithms such as
  selection sort or bubble sort: the average running time is about n^2/4 steps, and it
  is linear in the best case
- Stable (does not change the relative order of elements with equal keys)
- In-place (requires only a constant amount, O(1), of extra memory space)
- Online: it can sort a list as it receives it
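The O(n + d) bound can be checked by counting the shifts the inner loop performs. In the sketch below (the function name is hypothetical), each shift removes exactly one inversion, so the returned count equals d for the input.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that sorts `a` in place and returns the number of element
// shifts, which equals the number of inversions d in the original input.
std::size_t insertionSortCountingShifts(std::vector<int>& a)
{
    std::size_t shifts = 0;
    for (std::size_t p = 1; p < a.size(); ++p) {
        int tmp = a[p];
        std::size_t j = p;
        while (j > 0 && tmp < a[j - 1]) {
            a[j] = a[j - 1];  // move the larger element one slot to the right
            --j;
            ++shifts;
        }
        a[j] = tmp;
    }
    return shifts;
}
```

An already sorted input gives 0 shifts, while the reversed input 4 3 2 1 gives 6, the maximum n(n - 1)/2 for n = 4.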
ALGORITHMS
A. Greedy Algorithm
Simple Reversal Sort (p)
for i ← 1 to n – 1
    j ← position of element i in p (i.e., p_j = i)
    if j ≠ i
        p ← p * r(i, j)
        output p
    if p is the identity permutation
        return
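The pseudocode above can be sketched in C++ as follows. The function name and the choice to return every intermediate permutation (instead of printing it) are assumptions, and this is not necessarily how the project's program was written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simple reversal sort of a permutation p of 1..n. Returns the permutation
// obtained after each reversal, in order.
std::vector<std::vector<int>> simpleReversalSort(std::vector<int> p)
{
    std::vector<std::vector<int>> steps;
    const std::size_t n = p.size();
    for (std::size_t i = 1; i < n; ++i) {
        // Elements 1..i-1 are already in place, so element i is at position >= i.
        auto it = std::find(p.begin() + (i - 1), p.end(), static_cast<int>(i));
        assert(it != p.end());  // fails if p is not a permutation of 1..n
        std::size_t j = static_cast<std::size_t>(it - p.begin()) + 1;
        if (j != i) {
            std::reverse(p.begin() + (i - 1), p.begin() + j);  // p <- p * r(i, j)
            steps.push_back(p);
        }
        if (std::is_sorted(p.begin(), p.end()))
            break;  // p is the identity permutation
    }
    return steps;
}
```

On 6 1 2 3 4 5 this greedy procedure performs five reversals, although two suffice (r(1, 6) followed by r(1, 5)), so it does not always find the minimum t.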
B. Insertion Sort:

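#include <cstddef>
#include <vector>

// Sorts a into nondecreasing order; Comparable must support operator<.
template <class Comparable>
void insertionSort( std::vector<Comparable> & a )
{
    for( std::size_t p = 1; p < a.size( ); p++ )
    {
        Comparable tmp = a[ p ];   // next element to insert into the sorted prefix
        std::size_t j;
        // Shift larger elements one slot to the right to open a gap for tmp.
        for( j = p; j > 0 && tmp < a[ j - 1 ]; j-- )
            a[ j ] = a[ j - 1 ];
        a[ j ] = tmp;
    }
}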
RESULTS
Greedy Algorithm sort:
Input:
Output: (Next page)….
Insertion Sort (Result 1: protein sequences)
Result 2:
Result 3 (nucleic acid)
CONCLUSIONS
Two sorting algorithms studied as part of a data structures course were extended to
this bioinformatics project. Sorting is a fundamental operation and is important to
almost all applications. In the implementation of the greedy algorithm, an identity
sequence was supplied as the base sequence in the program. The second sequence was
obtained from the user and sorted to the identity sequence using simple reversal sort,
swapping two elements at a time. In the implementation of the insertion sort
algorithm, two sequences, either DNA or protein, were read from files, sorted
according to their ASCII values, and then compared to check for a common ancestor.
To conclude, two sorting techniques were implemented for bioinformatics-related
problems.
FUTURE WORK:
Future work could examine whether the two algorithms can be combined to work more
efficiently; the time complexity might be reduced if they can work together. It could
also be studied whether greedy algorithms have further applications in bioinformatics.