Designing Algorithms Csci 107 Lecture 4

Outline
Last time
• Sequential search
Today: More algorithms
• Variations of sequential search
• Pattern matching
Reading: Finish Chapter 2 textbook
The (sequential) search algorithm (Fig 2.9)
Variables: target, n, list (a list of n values)
Get the value of target, n, and the list of n values
Set index to 1
Set found to false
Repeat until found = true or index > n
    If the value of list[index] = target
    then
        Output the index
        Set found to true
    else
        Increment the index by 1
If found is false then
    Output a message that target was not found
Stop
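The pseudocode above can be sketched in Python roughly as follows. This is one possible translation, not the textbook's code: the function name `sequential_search` and the choice to return the position (or `None`) instead of printing are assumptions made for illustration. Positions are counted from 1, as in the lecture, even though Python lists are indexed from 0.

```python
def sequential_search(values, target):
    """Return the 1-based position of the first occurrence of target, or None."""
    index = 1          # the lecture counts positions from 1
    found = False
    # "Repeat until found = true or index > n" becomes a while loop
    # that runs as long as neither stopping condition holds.
    while not found and index <= len(values):
        if values[index - 1] == target:   # Python lists are 0-based
            found = True
        else:
            index += 1
    return index if found else None


print(sequential_search([7, 3, 9, 3], 9))   # position 3
```

Note that the loop stops at the first match, so a target that occurs several times is reported only once.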
Iterating through a list
• Assume the input list is stored in a1, a2, …, an
• In general, an algorithm that will have to explore
every single element in the list in order will look
something like this
Set i = 1
Repeat until (i>n)
    <do something with element ai>
    Set i = i+1
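As a rough Python sketch of this template, the loop below visits every element in order. The helper name `visit_all` and its `action` parameter are hypothetical, introduced only to stand in for "do something with element ai".

```python
def visit_all(values, action):
    """Call action(i, a_i) for every element, with positions counted from 1."""
    i = 1
    while i <= len(values):          # "Repeat until (i > n)"
        action(i, values[i - 1])     # Python lists are 0-based
        i += 1


# Example: print each position together with its element.
visit_all(["a", "b", "c"], lambda i, x: print(i, x))
```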
More algorithms
Modify the sequential search algorithm in order
– To find all occurrences of target in the list and print the positions
where they occur
– To count the number of occurrences of target in the list
– To count how many elements in the list are larger than target
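One possible way to approach these three variations is sketched below in Python. The function names are assumptions chosen for illustration. The key change from sequential search is that the loop no longer stops at the first match; it always runs through the whole list.

```python
def find_all_positions(values, target):
    """Return the 1-based positions of every occurrence of target."""
    positions = []
    i = 1
    while i <= len(values):
        if values[i - 1] == target:
            positions.append(i)
        i += 1                      # keep going even after a match
    return positions


def count_occurrences(values, target):
    """Count how many elements are equal to target."""
    count = 0
    for x in values:
        if x == target:
            count += 1
    return count


def count_larger(values, target):
    """Count how many elements are strictly larger than target."""
    count = 0
    for x in values:
        if x > target:
            count += 1
    return count


data = [4, 9, 4, 1, 7]
print(find_all_positions(data, 4))   # [1, 3]
print(count_occurrences(data, 4))    # 2
print(count_larger(data, 4))         # 2
```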
Given a list of numbers from the user, write algorithms to find
– the largest number in a list of numbers (and the position where it
occurs)
– the smallest number in a list of numbers (and the position where it
occurs)
– the (arithmetic) average of a list of numbers
– the sum of a list of numbers
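A minimal Python sketch of these four tasks follows, assuming the list is non-empty (an empty list has no largest element and no average). The function names are hypothetical. Each one is the same "iterate through the list" loop with a different running value kept along the way.

```python
def largest_with_position(values):
    """Return (largest value, 1-based position of its first occurrence)."""
    largest, position = values[0], 1     # assumes at least one element
    i = 2
    while i <= len(values):
        if values[i - 1] > largest:
            largest, position = values[i - 1], i
        i += 1
    return largest, position


def smallest_with_position(values):
    """Return (smallest value, 1-based position of its first occurrence)."""
    smallest, position = values[0], 1    # assumes at least one element
    i = 2
    while i <= len(values):
        if values[i - 1] < smallest:
            smallest, position = values[i - 1], i
        i += 1
    return smallest, position


def total(values):
    """Return the sum of the list, accumulated one element at a time."""
    running_sum = 0
    for x in values:
        running_sum += x
    return running_sum


def average(values):
    """Return the arithmetic average; assumes the list is non-empty."""
    return total(values) / len(values)


data = [4, 9, 4, 1, 7]
print(largest_with_position(data))    # (9, 2)
print(smallest_with_position(data))   # (1, 4)
print(total(data))                    # 25
print(average(data))                  # 5.0
```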
A Search Application in Bioinformatics
• Human genome: sequence of billions of nucleotides
• Gene
– Determines human behavior
– Sequence of tens of thousands of nucleotides {A, C, G, T}
– The sequence is not fully known, only a portion of it
• Problem: How to locate a gene in the human genome?
Genome: …….TCAGGCTAATCGTAGG…….
Gene probe: TAATC
Idea: Find all matches of the probe within the genome and then examine the
nucleotides in that neighborhood
A Search Application in Bioinformatics
• Problem:
– Suppose we have a text T = TCAGGCTAATCGTAGG and a pattern P =
TA. Design an algorithm that searches T to find the position of every
instance of P that appears in T.
• E.g., for this text, the algorithm should return the answer:
There is a match at position 7
There is a match at position 13
This problem is similar to sequential search
– except that at every possible starting position, each character of P must
be compared with the corresponding character of T.
Pattern Matching
• Input
– Text of n characters T1, T2, …, Tn
– Pattern of m (m < n) characters P1, P2, …, Pm
• Output:
– Location (index) of every occurrence of pattern within text
• Algorithm:
– What is the idea?
Pattern Matching
• Algorithm idea:
– Check if pattern matches starting at position 1
– Then check if it matches starting at position 2
– …and so on
• How to check if pattern matches text starting at
position k?
– Check that every character of pattern matches
corresponding character of text
• How many loops will you need?
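The check at a single starting position k can be sketched in Python as below. The function name `matches_at` is an assumption, and the sketch assumes the pattern fits inside the text at position k (that is, k + m - 1 ≤ n). This check is itself one loop over the pattern; stepping k through the text takes a second loop around it.

```python
def matches_at(text, pattern, k):
    """Return True if pattern occurs in text starting at 1-based position k.

    Assumes k + len(pattern) - 1 <= len(text).
    """
    m = len(pattern)
    i = 1
    while i <= m:
        # Compare P_i with T_(k+(i-1)); subtract 1 for Python's 0-based strings.
        if pattern[i - 1] != text[k + (i - 1) - 1]:
            return False      # one mismatch is enough to rule out position k
        i += 1
    return True


print(matches_at("TCAGGCTAATCGTAGG", "TA", 7))   # True
print(matches_at("TCAGGCTAATCGTAGG", "TA", 1))   # False
```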
Pattern Matching
• Algorithm idea
– Get input (text and pattern)
– Set starting location k to 1
– Repeat until the end of the text is reached
• Attempt to match every character in the pattern beginning at
pos k in text
• If there was a match, print k
• Add 1 to k
– Stop
• Question: is this an algorithm?
– Yes, at a high level of abstraction
– Now we need to write it in pseudocode
Pattern Matching Algorithm (Fig. 2.12)
Variables: n, m, T1T2…Tn, P1P2…Pm, k, mismatch
Get values for n, m, the text T1T2…Tn and the pattern P1P2…Pm
Set k = 1
Repeat until k > n-m+1
    Set i to 1
    Set mismatch = “NO”
    Repeat until either (i > m) or (mismatch = “YES”)
        if P[i] ≠ T[k+(i-1)] then
            Set mismatch = “YES”
        else Increment i by 1
    if mismatch = “NO” then
        Print the message “There is a match at position” k
    Increment k by 1
Stop
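A Python sketch of this pseudocode is given below. It is a direct translation under a few assumptions: the function name `find_matches` is made up for illustration, the `mismatch` flag is a boolean instead of “YES”/“NO”, and the matching positions are collected in a list and printed afterwards. Positions are 1-based, as in the lecture.

```python
def find_matches(text, pattern):
    """Return the 1-based start position of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    k = 1
    while k <= n - m + 1:                  # outer loop: each starting position
        i = 1
        mismatch = False
        while i <= m and not mismatch:     # inner loop: each pattern character
            # Compare P_i with T_(k+(i-1)); subtract 1 for 0-based strings.
            if pattern[i - 1] != text[k + (i - 1) - 1]:
                mismatch = True
            else:
                i += 1
        if not mismatch:
            matches.append(k)
        k += 1
    return matches


for k in find_matches("TCAGGCTAATCGTAGG", "TA"):
    print("There is a match at position", k)
# There is a match at position 7
# There is a match at position 13
```

The outer loop stops at k = n-m+1 because a pattern starting any later would run past the end of the text.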
Variations on the pattern matching algorithm
How would you modify the algorithm in order to
• Find only the first match for P in T.
• Find only the last match for P in T.
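One possible answer to each question is sketched below in Python; other modifications work as well. The names `first_match` and `last_match` are assumptions, and for brevity the inner character-by-character loop is replaced by a string slice comparison, which performs the same check. For the first match, the scan stops as soon as a match is found. For the last match, this sketch scans starting positions from right to left and stops at the first match it meets.

```python
def first_match(text, pattern):
    """Return the 1-based position of the first match, or None if there is none."""
    n, m = len(text), len(pattern)
    k = 1
    while k <= n - m + 1:
        if text[k - 1:k - 1 + m] == pattern:   # slice stands in for the inner loop
            return k                           # stop at the first match
        k += 1
    return None


def last_match(text, pattern):
    """Return the 1-based position of the last match, or None if there is none."""
    n, m = len(text), len(pattern)
    k = n - m + 1                              # start from the last possible position
    while k >= 1:
        if text[k - 1:k - 1 + m] == pattern:
            return k
        k -= 1
    return None


print(first_match("TCAGGCTAATCGTAGG", "TA"))   # 7
print(last_match("TCAGGCTAATCGTAGG", "TA"))    # 13
```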