Greedy Algorithms
CS 6030
by
Savitha Parur Venkitachalam
Outline
• Greedy approach to Motif searching
• Genome rearrangements
• Sorting by Reversals
• Greedy algorithms for sorting by reversals
• Approximation algorithms
• Breakpoint Reversal sort
Greedy motif searching
• Developed by Gerald Hertz and Gary Stormo
in 1989
• CONSENSUS is a tool based on this greedy
algorithm
• Faster than Brute force and Simple motif
search algorithms
• An approximation algorithm with an unknown
approximation ratio
Greedy motif search – Pseudocode
Greedy motif search – Steps
• Input – DNA sequences, t (number of sequences), n (length of
each sequence), l (length of the motif to search for)
• Output – set of starting positions of l-mers, one per sequence
• Performs an exhaustive search, using Hamming
distance, on the first two sequences
• Forms a 2 x l seed matrix from the two closest l-mers
• Scans each of the remaining t-2 sequences to find the l-mer that
best matches the seed and adds it as the next row of
the seed matrix
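The steps above can be sketched in Python as follows. This is a minimal illustration, not the CONSENSUS implementation: the function names (`hamming`, `consensus_score`, `greedy_motif_search`) are hypothetical, all sequences are assumed to have the same length n, and "best matches the seed" is interpreted here as maximizing a column-wise consensus score, which is an assumption.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def consensus_score(lmers):
    """Sum over columns of the count of the most frequent letter (assumed scoring)."""
    return sum(max(col.count(c) for c in set(col)) for col in zip(*lmers))

def greedy_motif_search(dna, l):
    """Return one 0-based start position per sequence, chosen greedily."""
    n = len(dna[0])
    starts = range(n - l + 1)
    # Exhaustive search over the first two sequences for the closest pair of l-mers.
    s1, s2 = min(product(starts, starts),
                 key=lambda st: hamming(dna[0][st[0]:st[0] + l],
                                        dna[1][st[1]:st[1] + l]))
    positions = [s1, s2]
    seed = [dna[0][s1:s1 + l], dna[1][s2:s2 + l]]  # the 2 x l seed matrix
    # Sequential scan: add the best-matching l-mer of each remaining sequence.
    for seq in dna[2:]:
        best = max(starts, key=lambda s: consensus_score(seed + [seq[s:s + l]]))
        positions.append(best)
        seed.append(seq[best:best + l])
    return positions

print(greedy_motif_search(["TTACGTTT", "ACGTGGGG", "CCCCACGT"], 4))
```

Because each row is fixed once it is chosen, an unlucky choice in the first two sequences cannot be undone later.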
Complexity
• Exhaustive search on the first two sequences
requires l(n-l+1)^2 operations, which is O(l n^2)
• The sequential scan on the remaining t-2 sequences requires
l(n-l+1)(t-2) operations, which is O(l n t)
• Thus the running time of greedy motif search is
O(l n^2 + l n t)
• If t is small compared to n, the algorithm behaves as
O(l n^2)
Consensus tool
• The greedy motif algorithm may miss the optimal
motif
• The Consensus tool saves a large number of seed
matrices
• The Consensus tool can check sequences in
random order
• The Consensus tool is therefore less likely to miss the
optimal motif
Genome rearrangements
• A gene rearrangement results in a change of
gene ordering
• A series of gene rearrangements can alter the
genomic architecture of a species
• 99% similarity between cabbage and turnip
genes
• Fewer than 250 genomic rearrangements
since the divergence of humans and mice
History of Chromosome X
Rat Consortium, Nature, 2004
Types of Rearrangements
Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3
Fusion: 1 2 3 4 | 5 6 → 1 2 3 4 5 6
Fission: 1 2 3 4 5 6 → 1 2 3 4 | 5 6
Greedy algorithms in Gene Rearrangements
• Biologists are interested in finding the smallest
number of reversals in an evolutionary
sequence
• This number gives a lower bound on the number of
rearrangements and a measure of the similarity between
two species
• Two greedy algorithms used
- Simple reversal sort
- Breakpoint reversal sort
Gene Order
• Gene order is represented by a permutation p:
p = p_1 ... p_{i-1} p_i p_{i+1} ... p_{j-1} p_j p_{j+1} ... p_n
• Reversal r(i, j) reverses (flips) the elements
from position i to position j in p:
p * r(i, j) = p_1 ... p_{i-1} p_j p_{j-1} ... p_{i+1} p_i p_{j+1} ... p_n
Reversal example
p = 1 2 3 4 5 6 7 8
↓ r(3,5)
1 2 5 4 3 6 7 8
↓ r(5,6)
1 2 5 4 6 3 7 8
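As a small illustration, the reversal r(i, j) can be written as a list operation in Python. The function name `reversal` is hypothetical, and positions are taken to be 1-based and inclusive, as in the example above.

```python
def reversal(p, i, j):
    """Return p * r(i, j): a copy of p with positions i..j (1-based, inclusive) flipped."""
    return p[:i - 1] + p[i - 1:j][::-1] + p[j:]

p = [1, 2, 3, 4, 5, 6, 7, 8]
p = reversal(p, 3, 5)
print(p)  # [1, 2, 5, 4, 3, 6, 7, 8]
p = reversal(p, 5, 6)
print(p)  # [1, 2, 5, 4, 6, 3, 7, 8]
```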
Reversal distance problem
• Goal: Given two permutations, find the shortest series
of reversals that transforms one into another
• Input: Permutations p and s
• Output: A series of reversals r1,…rt transforming p into
s, such that t is minimum
• t - number of reversals in the series
• d(p, s) - reversal distance between p and s: the smallest possible value of t, given p and s
Sorting by reversal
• Goal: Given a permutation, find a shortest
series of reversals that transforms it into the
identity permutation.
• Input: Permutation p
• Output: A series of reversals r1,…rt
transforming p into the identity permutation, such
that t is minimum
Sorting by reversal - Greedy algorithm
• If sorting permutation p = 1 2 3 6 4 5, the first
three elements are already in order so it does
not make any sense to break them.
• The length of the already sorted prefix of p is
denoted prefix(p)
– prefix(p) = 3
• This results in an idea for a greedy algorithm:
increase prefix(p) at every step
Simple Reversal sort – Pseudocode
• A very general approach leads to an algorithm that sorts
by moving the ith element to the ith position
SimpleReversalSort(p)
  for i ← 1 to n – 1
    j ← position of element i in p (i.e., p_j = i)
    if j ≠ i
      p ← p * r(i, j)
      output p
    if p is the identity permutation
      return
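The pseudocode above translates almost line for line into Python. The sketch below uses a hypothetical function name, `simple_reversal_sort`, and returns the intermediate permutations instead of printing them, so the number of reversals is simply the length of the returned list.

```python
def simple_reversal_sort(p):
    """Greedily sort a permutation of 1..n; return the permutation after each reversal."""
    p = list(p)
    n = len(p)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):                  # i = 1 .. n-1, as in the pseudocode
        j = p.index(i) + 1                 # 1-based position of element i
        if j != i:
            p[i - 1:j] = p[i - 1:j][::-1]  # apply r(i, j)
            steps.append(list(p))
        if p == identity:
            break
    return steps

steps = simple_reversal_sort([6, 1, 2, 3, 4, 5])
print(len(steps))  # 5
```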
Example – SimpleReversalSort is not optimal
Input – 612345
612345 -> 162345 -> 126345 -> 123645 -> 123465 -> 123456
Greedy SimpleReversalSort takes 5 steps, whereas
the optimal solution takes only 2 steps
612345 -> 543216 -> 123456
• A closely related problem is the ‘Pancake
Flipping problem’, in which only prefix reversals are allowed
Approximation Ratio
• These algorithms produce an approximate
solution rather than an optimal one
• The approximation ratio of an algorithm A on input p is
given by A(p) / OPT(p)
– For an algorithm A that minimizes the objective
function (minimization algorithm):
• max_{|p| = n} A(p) / OPT(p)
– For a maximization algorithm:
• min_{|p| = n} A(p) / OPT(p)
Breakpoints – A different face of greed
• In a permutation p = p_1 ... p_n
- if p_i and p_{i+1} are consecutive numbers, the pair is an adjacency
- if p_i and p_{i+1} are not consecutive numbers, the pair is a breakpoint
Example:
p = 1 | 9 | 3 4 | 7 8 | 2 | 6 5
• Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints
• Pairs (3,4), (7,8) and (6,5) form adjacencies
• b(p) - number of breakpoints in permutation p
• Our goal is to eliminate all breakpoints, thereby forming the
identity permutation
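Counting breakpoints is a one-line scan over neighbouring pairs. In the sketch below, the function name `count_breakpoints` and the `extend` flag are hypothetical; `extend=True` frames the permutation with 0 and n+1, while `extend=False` counts only the interior pairs, as the example above does.

```python
def count_breakpoints(p, extend=True):
    """Count neighbouring pairs in p that are not consecutive numbers."""
    q = ([0] + list(p) + [len(p) + 1]) if extend else list(p)
    return sum(abs(a - b) != 1 for a, b in zip(q, q[1:]))

print(count_breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5], extend=False))  # 5
print(count_breakpoints([2, 3, 1, 4, 6, 5]))                         # 5
```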
Breakpoint Reversal Sort – Steps
• Put two elements p_0 = 0 and p_{n+1} = n+1 at the ends of p
• Eliminate breakpoints using reversals
• Each reversal eliminates at most 2 breakpoints
• This implies reversal distance ≥ #breakpoints / 2
Example: p = 2 3 1 4 6 5
0 2 3 1 4 6 5 7    b(p) = 5
0 1 3 2 4 6 5 7    b(p) = 4
0 1 2 3 4 6 5 7    b(p) = 2
0 1 2 3 4 5 6 7    b(p) = 0
• Not efficient, as it may run forever
Pseudocode – Breakpoint reversal Sort
BreakPointReversalSort(p)
  while b(p) > 0
    Among all possible reversals,
      choose reversal r minimizing b(p • r)
    p ← p • r
    output p
  return
Using strips
A strip is an interval between two consecutive
breakpoints in a permutation
• Decreasing strip: strip of elements in decreasing
order
• Increasing strip: strip of elements in increasing
order
0 1 | 9 | 4 3 | 7 8 | 2 | 5 6 | 10
• A single-element strip can be declared either increasing or decreasing. We
will choose to declare them as decreasing, with the exception of the strips containing 0
and n+1
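To make the strip convention concrete, the following sketch splits an extended permutation into strips and labels each one. The function name `strips` and the labels 'inc' and 'dec' are hypothetical; single-element strips are labelled decreasing unless they hold 0 or n+1, following the convention stated above.

```python
def strips(q):
    """Split an extended permutation q (0 first, n+1 last) into (elements, kind) pairs."""
    result, start = [], 0
    for k in range(1, len(q) + 1):
        # A strip ends at the end of q or wherever a breakpoint occurs.
        if k == len(q) or abs(q[k] - q[k - 1]) != 1:
            s = q[start:k]
            if len(s) == 1:
                # Singletons are decreasing, except the ones holding 0 or n+1.
                kind = 'inc' if s[0] in (0, len(q) - 1) else 'dec'
            else:
                kind = 'inc' if s[1] > s[0] else 'dec'
            result.append((s, kind))
            start = k
    return result

for elements, kind in strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(elements, kind)
```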
Reducing breakpoints
• Choose the decreasing strip with the smallest element
k in p
• Find k-1 in the permutation
• Reverse the segment between k and k-1, so that they become adjacent
Eg: p = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9    b(p) = 5
0 1 2 3 8 7 5 6 4 9    b(p) = 4
0 1 2 3 4 6 5 7 8 9    b(p) = 2
0 1 2 3 4 5 6 7 8 9    b(p) = 0
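One step of this procedure might look as follows in Python. The function name `reduce_breakpoints_once` is hypothetical, and the input is assumed to be the extended permutation (0 first, n+1 last). An interior element is treated as lying in a decreasing strip when it belongs to no increasing adjacency, which also covers single-element strips.

```python
def reduce_breakpoints_once(q):
    """Apply one reversal that places k next to k-1; return None if no decreasing strip."""
    n = len(q) - 2
    # Interior elements that are not part of an increasing adjacency.
    dec = [q[i] for i in range(1, n + 1)
           if q[i - 1] != q[i] - 1 and q[i + 1] != q[i] + 1]
    if not dec:
        return None
    k = min(dec)                        # smallest element of any decreasing strip
    b, a = q.index(k), q.index(k - 1)   # positions of k and k-1
    lo, hi = (a + 1, b) if a < b else (b + 1, a)
    return q[:lo] + q[lo:hi + 1][::-1] + q[hi + 1:]

q = [0, 1, 4, 6, 5, 7, 8, 3, 2, 9]
while q is not None:
    print(q)
    q = reduce_breakpoints_once(q)
```

The loop prints the same four permutations as the example and stops once no decreasing strip remains.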
ImprovedBreakpointReversalSort
• Sometimes the permutation may not contain any decreasing strips
• In that case an increasing strip has to be reversed so that it becomes a
decreasing strip
• Taking this into consideration, we have an improved algorithm
ImprovedBreakpointReversalSort(p)
  while b(p) > 0
    if p has a decreasing strip
      Among all possible reversals, choose reversal r
        that minimizes b(p • r)
    else
      Choose a reversal r that flips an increasing strip in p
    p ← p • r
    output p
  return
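A runnable sketch of this pseudocode is given below. All names (`breakpoints`, `strip_bounds`, `flip`, `improved_breakpoint_reversal_sort`) are hypothetical, positions are indices into the extended permutation (so position 0 holds the framing 0), and the choice of flipping the first interior increasing strip is an assumption, since the pseudocode allows any increasing strip. The search over all reversals is brute force and is meant for small inputs only.

```python
from itertools import combinations

def breakpoints(q):
    """Number of neighbouring pairs in q that are not consecutive numbers."""
    return sum(abs(a - b) != 1 for a, b in zip(q, q[1:]))

def strip_bounds(q):
    """Inclusive (start, end) index pairs of the strips of q."""
    cuts = [0] + [k for k in range(1, len(q)) if abs(q[k] - q[k - 1]) != 1] + [len(q)]
    return [(a, b - 1) for a, b in zip(cuts, cuts[1:])]

def flip(q, i, j):
    """Reverse positions i..j (inclusive) of the extended permutation q."""
    return q[:i] + q[i:j + 1][::-1] + q[j + 1:]

def improved_breakpoint_reversal_sort(p):
    """Return the list of permutations produced after each reversal."""
    q = [0] + list(p) + [len(p) + 1]
    last = len(q) - 1
    steps = []
    while breakpoints(q) > 0:
        interior = [(a, b) for a, b in strip_bounds(q) if a > 0 and b < last]
        if any(q[a] >= q[b] for a, b in interior):   # a decreasing strip exists
            # Brute force: try every reversal and keep one minimizing b(p • r).
            i, j = min(combinations(range(1, last), 2),
                       key=lambda ij: breakpoints(flip(q, *ij)))
        else:
            # No decreasing strip: flip the first interior increasing strip.
            i, j = next((a, b) for a, b in interior if q[a] < q[b])
        q = flip(q, i, j)
        steps.append(q[1:-1])
    return steps

for step in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(step)
```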
Example – ImprovedBreakPointSort
• There are no decreasing strips in p, for:
p = 0 1 2 | 5 6 7 | 3 4 | 8    b(p) = 3
p • r(6,7) = 0 1 2 | 5 6 7 | 4 3 | 8    b(p • r(6,7)) = 3
r(6,7) does not change the number of breakpoints
r(6,7) creates a decreasing strip, thus
guaranteeing that the next step will decrease
the number of breakpoints.
Approximation Ratio ImprovedBreakpointReversalSort
• Approximation ratio is 4
– It eliminates at least one breakpoint in every two
steps; at most 2b(p) steps
– Approximation ratio: 2b(p) / d(p)
– Optimal algorithm eliminates at most 2
breakpoints in every step: d(p) ≥ b(p) / 2
– Performance guarantee:
• ( 2b(p) / d(p) ) ≤ [ 2b(p) / (b(p) / 2) ] = 4
References
• An Introduction to Bioinformatics Algorithms
- Neil C. Jones and Pavel A. Pevzner
• http://bix.ucsd.edu/bioalgorithms/slides.php#Ch5
Questions