Last time
• Computing 1+2+…+n
• Adding 2 n-digit numbers
Today: More algorithms
• Sequential search
• Variations of sequential search
• Pattern matching
A Search Algorithm
Problem statement: Write a pseudocode algorithm to find the location of a target value in a list of values.
Input: a list of values and the target value
Output: the location of the target value, or else a message that the value does not appear in the list.
The (sequential) search algorithm (Fig 2.9)
Variables: target, n, list (a list of n values)
Get the value of target, n, and the list of n values
Set index to 1
Set found to false
Repeat until found = true or index > n
    If the value of list_index = target then
        Output the index
        Set found to true
    else
        Increment the index by 1
If found is false then
    Output a message that target was not found
Stop
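The pseudocode above can be sketched in Python roughly as follows. This is only an illustration: the function name sequential_search and the sample list grades are assumptions, not part of Fig 2.9, and positions are reported starting from 1 to match the pseudocode even though Python lists are indexed from 0.

```python
def sequential_search(values, target):
    """Return the 1-based position of the first occurrence of target, or None."""
    n = len(values)
    index = 1
    found = False
    position = None
    # Repeat until found = true or index > n
    while not found and index <= n:
        if values[index - 1] == target:  # list_index in the pseudocode
            position = index
            found = True
        else:
            index += 1
    return position


grades = [70, 85, 92, 85]  # hypothetical sample data
result = sequential_search(grades, 85)
if result is None:
    print("target was not found")
else:
    print(result)
```

As in the pseudocode, the loop stops at the first occurrence, so later copies of the target are not reported.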
• Modify the sequential search algorithm in order
– To find all occurrences of target in the list and print the positions where they occur
– To count the number of occurrences of target in the list
– To count how many elements in the list are larger than target
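One possible way to approach these three variations, sketched in Python. The function names find_all, count_occurrences, and count_larger are hypothetical; the common idea is to drop the found flag so that the loop always runs to the end of the list. Positions are 1-based, as in the pseudocode.

```python
def find_all(values, target):
    """Return the 1-based positions of every occurrence of target."""
    positions = []
    i = 1
    while i <= len(values):
        if values[i - 1] == target:
            positions.append(i)
        i += 1
    return positions


def count_occurrences(values, target):
    """Count how many elements are equal to target."""
    count = 0
    i = 1
    while i <= len(values):
        if values[i - 1] == target:
            count += 1
        i += 1
    return count


def count_larger(values, target):
    """Count how many elements are strictly larger than target."""
    count = 0
    i = 1
    while i <= len(values):
        if values[i - 1] > target:
            count += 1
        i += 1
    return count
```

The three functions differ only in the test inside the loop and in what they do when the test succeeds.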
• Assume the input list is stored in a_1, a_2, …, a_n
• In general, an algorithm that must examine every element of the list in order looks something like this:
Set i = 1
Repeat until (i > n)
    <do something with element a_i>
    Set i = i+1
• Write algorithms to find
– the largest number in a list of numbers (and the position where it occurs)
– the smallest number in a list of numbers (and the position where it occurs)
– the range of a list of numbers
• Range= largest - smallest
– the average of a list of numbers
– the sum of a list of numbers
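As a rough sketch of how the list-traversal loop applies to these exercises, the Python function below computes all of the quantities in a single pass. The name summarize and the dictionary keys are assumptions made for illustration, and the list is assumed to be non-empty; when the largest or smallest value occurs more than once, the first position is reported.

```python
def summarize(values):
    """Compute largest, smallest, range, sum, and average in one traversal."""
    if not values:
        raise ValueError("the list must contain at least one number")
    largest = smallest = values[0]
    largest_pos = smallest_pos = 1  # 1-based positions, as in the pseudocode
    total = values[0]
    i = 2
    while i <= len(values):
        a_i = values[i - 1]
        if a_i > largest:
            largest, largest_pos = a_i, i
        if a_i < smallest:
            smallest, smallest_pos = a_i, i
        total += a_i
        i += 1
    return {
        "largest": largest,
        "largest_pos": largest_pos,
        "smallest": smallest,
        "smallest_pos": smallest_pos,
        "range": largest - smallest,
        "sum": total,
        "average": total / len(values),
    }
```

Each quantity corresponds to a different choice of "do something with element a_i" inside the same loop.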
Given a list of numbers from the user, write algorithms to find
– the largest number in a list of numbers (and the position where it occurs)
– the smallest number in a list of numbers (and the position where it occurs)
– the (arithmetic) average of a list of numbers
– the sum of a list of numbers
• Human genome: sequence of billions of nucleotides
• Gene
– Determines human behavior
– Sequence of tens of thousands of nucleotides {A, C, G, T}
– The sequence is not fully known, only a portion of it.
• Problem: How do we locate a gene in the human genome?
Genome: …….TCAGGCTAATCGTAGG…….
Gene probe: TAATC
Idea: Find all matches of the probe within the genome and then examine the nucleotides in that neighborhood
• Problem:
– Suppose we have a text T = TCAGGCTAATCGTAGG and a pattern P = TA. Design an algorithm that searches T to find the position of every instance of P that appears in T.
• E.g., for this text, the algorithm should return the answer:
There is a match at position 7
There is a match at position 13
• This problem is similar to sequential search, except that at every possible starting position the characters of P must be compared one by one with the corresponding characters of T.
• Input
– Text of n characters T1, T2, …, Tn
– Pattern of m (m < n) characters P1, P2, …Pm
• Output:
– Location (index) of every occurrence of pattern within text
• Algorithm:
– What is the idea?
• Algorithm idea:
– Check if pattern matches starting at position 1
– Then check if it matches starting at position 2
– …and so on
• How to check if pattern matches text starting at position k?
– Check that every character of pattern matches corresponding character of text
• How many loops will you need?
• Algorithm idea
– Get input (text and pattern)
– Set starting location k to 1
– Repeat until the end of the text is reached
• Attempt to match every character in the pattern, beginning at position k in the text
• If there was a match, print k
• Add 1 to k
– Stop
• Question: is this an algorithm?
– Yes, at a high level of abstraction
– Now we need to write it in pseudocode
Pattern Matching Algorithm (Fig. 2.12)
Variables: n, m, T_1 T_2 … T_n, P_1 P_2 … P_m, k, mismatch
Get values for n, m, the text T_1 T_2 … T_n, and the pattern P_1 P_2 … P_m
Set k = 1
Repeat until k > n-m+1
    Set i to 1
    Set mismatch = “NO”
    Repeat until either (i > m) or (mismatch = “YES”)
        If P_i ≠ T_(k+(i-1)) then
            Set mismatch = “YES”
        else
            Increment i by 1
    If mismatch = “NO” then
        Print the message “There is a match at position” k
    Increment k by 1
Stop
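A minimal Python sketch of the pattern matching pseudocode, for illustration only. The function name find_matches is an assumption, a boolean replaces the “YES”/“NO” flag, and the pattern is assumed to contain at least one character. Positions are 1-based as in the pseudocode, so each index is shifted by one when reading from Python's 0-based strings.

```python
def find_matches(text, pattern):
    """Return the 1-based starting positions of every match of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    k = 1
    # Outer loop: try every possible starting position k
    while k <= n - m + 1:
        i = 1
        mismatch = False
        # Inner loop: compare P_i with T_(k+(i-1)) until a mismatch or the end of P
        while i <= m and not mismatch:
            if pattern[i - 1] != text[k + (i - 1) - 1]:
                mismatch = True
            else:
                i += 1
        if not mismatch:
            matches.append(k)
        k += 1
    return matches


for position in find_matches("TCAGGCTAATCGTAGG", "TA"):
    print("There is a match at position", position)
```

For the text and pattern from the earlier example, this reports matches at positions 7 and 13.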
Variations on the pattern matching algorithm
How would you modify the algorithm in order to
• Find only the first match for P in T.
• Find only the last match for P in T.
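One possible approach to both variations, sketched in Python. The names find_first_match and find_last_match are hypothetical, and a string slice comparison stands in for the inner character-by-character loop to keep the sketch short. For the first match, the scan stops as soon as a match is found; for the last match, the scan runs from the final possible starting position backwards.

```python
def find_first_match(text, pattern):
    """Return the 1-based position of the first match, or None if there is none."""
    n, m = len(text), len(pattern)
    k = 1
    while k <= n - m + 1:
        # The slice covers T_k … T_(k+m-1); this replaces the inner loop
        if text[k - 1:k - 1 + m] == pattern:
            return k  # stop at the first match
        k += 1
    return None


def find_last_match(text, pattern):
    """Return the 1-based position of the last match, or None if there is none."""
    n, m = len(text), len(pattern)
    k = n - m + 1  # last possible starting position
    while k >= 1:
        if text[k - 1:k - 1 + m] == pattern:
            return k  # the first match found from the right is the last match
        k -= 1
    return None
```

An alternative for the last match is to keep the left-to-right scan and remember the most recent matching k instead of printing it.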