SORTING ALGORITHMS AND GENOMIC REARRANGEMENTS

INTRODUCTION

Genomic rearrangements are crucial for evolution and are responsible for the variety of existing genome architectures. Comparison of the human and mouse genome sequences provides evidence for a large number of rearrangements. The most common types of genomic rearrangements are reversals, translocations, fusions, and fissions. Comparing sequences with sorting techniques can reveal significant information about their ancestry and evolution. This paper applies greedy and insertion sort algorithms to sequence comparison.

Reversal

A reversal (also called an inversion) occurs when a segment of a chromosome is flipped end to end, so that both the order and the orientation of the genes in that segment are reversed. Reversals introduce breakpoints, that is, disruptions in the gene order.

Examples of reversals:

i. The gene order
       1 2 3 4 5 6 7 8 9 10
   becomes, after a reversal of the segment 4 ... 8,
       1 2 3 -8 -7 -6 -5 -4 9 10
   The minus signs indicate that the orientation of these genes has been flipped.

ii. A double-stranded DNA segment; the two strands are complementary and run in opposite directions:
       5’ ATGCCTGTACTA 3’
       3’ TACGGACATGAT 5’

iii. Two successive reversals applied to a permutation p:
       p                        = 1 2 3 4 5 6 7 8
       p · r(3, 5)              = 1 2 5 4 3 6 7 8
       p · r(3, 5) · r(5, 6)    = 1 2 5 4 6 3 7 8

A. Reversal Distance Problem (using a greedy algorithm)

Goal: Given two permutations, one of them the identity permutation 1 2 ... n, find the shortest series of reversals that transforms one into the other.
Input: Two permutations (one of them the identity permutation).
Output: A series of reversals r1, ..., rt transforming one permutation into the other such that t is minimal.

B. Comparison of two sequences using insertion sort

Goal: Given two sequences (DNA or protein), use insertion sort to check whether they share a common ancestor.
Input: Two sequences (DNA or protein) read from files.
Output: The sorted sequences, which are then compared to check for common ancestry.

ABSTRACT

Greedy and insertion sort algorithms

A. Greedy Algorithm

Greedy algorithms are shortsighted: at each step they make the decision that looks best on the information at hand, without reconsidering earlier choices. A greedy algorithm is typically built from four functions:

i. A solution function, which checks whether a chosen set of items provides a solution.
ii. A feasibility function, which checks whether a set is feasible.
iii. A selection function, which tells which of the candidates is the most promising.
iv. An objective function, which does not appear explicitly and gives the value of a solution.

A feasible set is promising if it can be extended to produce not merely a solution but an optimal solution to the problem. Dynamic programming solves subproblems bottom up, whereas a greedy strategy usually progresses top down, making one greedy choice after another and reducing each problem to a smaller one. The greedy-choice property and optimal substructure are the two key properties that make a problem suitable for a greedy strategy. The greedy-choice property says that a globally optimal solution can be reached by making locally optimal choices; optimal substructure means that an optimal solution to the problem contains optimal solutions to its subproblems.

Reversals and gene orders

Gene order is represented by a permutation p. A reversal r(i, j) reverses (flips) the elements of p from position i to position j:

$$p = p_1 \cdots p_{i-1}\, p_i\, p_{i+1} \cdots p_{j-1}\, p_j\, p_{j+1} \cdots p_n$$
$$p \cdot r(i, j) = p_1 \cdots p_{i-1}\, p_j\, p_{j-1} \cdots p_{i+1}\, p_i\, p_{j+1} \cdots p_n$$

B. Insertion Sort

Insertion sort is a simple comparison sort in which the sorted array (or list) is built one entry at a time. Its advantages are:

i. It is simple to implement.
ii. It is efficient on (quite) small data sets.
iii. It is efficient on data sets that are already substantially sorted: it runs in O(n + d) time, where d is the number of inversions.
iv. It is more efficient in practice than most other simple O(n²) algorithms such as selection sort or bubble sort: on average it performs about n²/4 comparisons, and it runs in linear time in the best case.
v. It is stable: it does not change the relative order of elements with equal keys.
vi. It is in-place: it requires only a constant amount, O(1), of extra memory.
vii. It is online: it can sort a list as it receives it.

ALGORITHMS

A. Greedy Algorithm

    SimpleReversalSort(p)
        for i ← 1 to n − 1
            j ← position of element i in p (i.e., p_j = i)
            if j ≠ i
                p ← p · r(i, j)
                output p
            if p is the identity permutation
                return

Each iteration moves element i to position i with at most one reversal, so the algorithm uses at most n − 1 reversals. Because each choice is made greedily, it does not always find the minimum number: for p = 6 1 2 3 4 5 it performs five reversals, whereas two suffice (r(1, 6) followed by r(1, 5)).

B. Insertion Sort

    #include <cstddef>
    #include <vector>

    // Sorts a in ascending order; Comparable must support operator<.
    template <class Comparable>
    void insertionSort(std::vector<Comparable>& a)
    {
        for (std::size_t p = 1; p < a.size(); ++p)
        {
            Comparable tmp = a[p];  // next element to insert into the sorted prefix a[0..p-1]
            std::size_t j;
            // Shift larger elements one slot to the right to open a gap for tmp.
            for (j = p; j > 0 && tmp < a[j - 1]; --j)
                a[j] = a[j - 1];
            a[j] = tmp;
        }
    }

RESULTS

Greedy algorithm sort: input and output (figure not reproduced).
Insertion sort: Result 1 (protein sequences), Result 2, and Result 3 (nucleic acid sequences) (figures not reproduced).

CONCLUSIONS

Two sorting algorithms studied in a data structures course were extended to this bioinformatics project. Sorting is a fundamental operation and is used in a wide range of applications. In the implementation of the greedy algorithm, the identity permutation was supplied to the program as the base sequence. The second sequence was obtained from the user and sorted into the identity permutation using simple reversal sort, which moves one element into its final position at each step. In the implementation of the insertion sort algorithm, two sequences, either DNA or protein, were read from files, sorted by the ASCII values of their characters, and then compared to check for a common ancestor. Because sorting discards the order of the residues, this comparison tests whether the two sequences have the same composition; it is a coarse indicator of relatedness rather than proof of common ancestry. In conclusion, two sorting techniques were implemented for bioinformatics-related problems.

FUTURE WORK

Future work could investigate whether the two algorithms can be combined to work more efficiently, for example by reducing the overall time complexity. It could also be studied whether greedy algorithms have further applications in bioinformatics.
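The statement that reversals introduce breakpoints can be made concrete with a small breakpoint counter. This is a minimal sketch under an assumed convention that the document does not spell out: the gene order is treated as a signed permutation of 1..n, extended with 0 at the front and n + 1 at the end, and neighbours a, b count as a breakpoint whenever b − a ≠ 1. The function name countBreakpoints is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts breakpoints in a signed permutation p of 1..n, where a negative entry
// marks a gene whose orientation was flipped by a reversal.
// Assumed convention (not defined in the document): extend p with 0 in front
// and n + 1 at the end; neighbours a, b are an adjacency when b - a == 1,
// and every other neighbouring pair is a breakpoint.
int countBreakpoints(const std::vector<int>& p)
{
    const int n = static_cast<int>(p.size());

    std::vector<int> ext;
    ext.reserve(p.size() + 2);
    ext.push_back(0);
    for (int v : p)
    {
        assert(v != 0 && v >= -n && v <= n);  // entries must be +/-1 .. +/-n
        ext.push_back(v);
    }
    ext.push_back(n + 1);

    int breakpoints = 0;
    for (std::size_t k = 0; k + 1 < ext.size(); ++k)
        if (ext[k + 1] - ext[k] != 1)
            ++breakpoints;
    return breakpoints;
}
```

For the first reversal example, 1 2 3 -8 -7 -6 -5 -4 9 10, the function returns 2 (one breakpoint at each end of the reversed segment), while the unreversed order 1 2 ... 10 has none.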
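The Simple Reversal Sort pseudocode from the ALGORITHMS section can be turned into a short C++ sketch. It is not the program that produced the results above: it assumes an unsigned permutation of 1..n stored in a std::vector<int>, keeps the document's 1-based positions for r(i, j), and returns each intermediate permutation instead of printing it (the pseudocode's "output p" step). The names applyReversal and simpleReversalSort are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Applies the reversal r(i, j) to p in place, using 1-based positions:
// the elements at positions i..j are flipped.
void applyReversal(std::vector<int>& p, std::size_t i, std::size_t j)
{
    assert(1 <= i && i <= j && j <= p.size());
    std::reverse(p.begin() + (i - 1), p.begin() + j);
}

// Greedy simple reversal sort: for i = 1..n-1, bring element i to position i
// with a single reversal. Assumes p is an unsigned permutation of 1..n.
// Returns every intermediate permutation produced.
std::vector<std::vector<int>> simpleReversalSort(std::vector<int> p)
{
    std::vector<std::vector<int>> steps;
    const std::size_t n = p.size();
    for (std::size_t i = 1; i < n; ++i)
    {
        // j = 1-based position of element i in p, i.e. p_j = i.
        auto it = std::find(p.begin(), p.end(), static_cast<int>(i));
        assert(it != p.end());
        std::size_t j = static_cast<std::size_t>(it - p.begin()) + 1;
        if (j != i)
        {
            applyReversal(p, i, j);  // p <- p * r(i, j)
            steps.push_back(p);      // "output p"
        }
        // Stop early once p is the identity permutation.
        bool identity = true;
        for (std::size_t k = 0; k < n; ++k)
            if (p[k] != static_cast<int>(k + 1)) { identity = false; break; }
        if (identity)
            break;
    }
    return steps;
}
```

Calling simpleReversalSort({6, 1, 2, 3, 4, 5}) returns five intermediate permutations, the last being the identity 1 2 3 4 5 6. applyReversal also reproduces example iii: r(3, 5) turns 1 2 3 4 5 6 7 8 into 1 2 5 4 3 6 7 8.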
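The insertion sort comparison of two sequences (sort the characters of each DNA or protein sequence by ASCII value, then compare) can be sketched as follows. Assumptions: the sequences are passed in as std::string values rather than read from files, the sort is the insertion sort above specialized to characters, and the helper names insertionSortResidues and sameComposition are hypothetical. A true result means only that both sequences contain the same residues in the same numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Insertion sort over the characters of a sequence: residues are ordered by
// their character (ASCII) codes, e.g. "TACG" becomes "ACGT".
std::string insertionSortResidues(std::string s)
{
    for (std::size_t p = 1; p < s.size(); ++p)
    {
        char tmp = s[p];  // next residue to insert into the sorted prefix s[0..p-1]
        std::size_t j = p;
        for (; j > 0 && tmp < s[j - 1]; --j)
            s[j] = s[j - 1];  // shift larger residues one slot right
        s[j] = tmp;
    }
    return s;
}

// Hypothetical helper: sorts both sequences and compares the results.
// Equality means both sequences contain the same residues with the same
// counts; the original residue order is discarded by the sort.
bool sameComposition(const std::string& a, const std::string& b)
{
    return insertionSortResidues(a) == insertionSortResidues(b);
}

// Example check: assert(sameComposition("ATGC", "CGTA"));
```

For example, sameComposition("ATGC", "CGTA") is true, while sameComposition("ATGC", "ATGG") is false.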