Designing Algorithms
Csci 107
Lecture 4

Outline
Last time
• Sequential search
Today: More algorithms
• Variations of sequential search
• Pattern matching
Reading: Finish Chapter 2 textbook

The (sequential) search algorithm (Fig 2.9)
Variables: target, n, list (a list of n values)

Get the value of target, n, and the list of n values
Set index to 1
Set found to false
Repeat until found = true or index > n
    If the value of list_{index} = target then
        Output the index
        Set found to true
    else
        Increment the index by 1
If found is false then
    Output a message that target was not found
Stop

Iterating through a list
• Assume the input list is stored in a_1, a_2, …, a_n
• In general, an algorithm that will have to explore every single element in the list in order will look something like this

Set i = 1
Repeat until (i > n)
    <do something with element a_i>
    Set i = i+1

More algorithms
Modify the sequential search algorithm in order
– To find all occurrences of target in the list and print the positions where they occur
– To count the number of occurrences of target in the list
– To count how many elements in the list are larger than target
Given a list of numbers from the user, write algorithms to find
– the largest number in a list of numbers (and the position where it occurs)
– the smallest number in a list of numbers (and the position where it occurs)
– the (arithmetic) average of a list of numbers
– the sum of a list of numbers

A Search Application in Bioinformatics
• Human genome: sequence of billions of nucleotides
• Gene
  – Determines human behavior
  – Sequence of tens of thousands of nucleotides {A, C, G, T}
  – The sequence is not fully known, only a portion of it.
• Problem: How to determine a gene in the human genome?

Genome: …….TCAGGCTAATCGTAGG…….
Gene probe: TAATC
Idea: Find all matches of the probe within the genome and then examine the nucleotides in that neighborhood

A Search Application in Bioinformatics
• Problem:
  – Suppose we have a text T = TCAGGCTAATCGTAGG and a pattern P = TA.
    Design an algorithm that searches T to find the position of every instance of P that appears in T.
• E.g., for this text, the algorithm should return the answer:
    There is a match at position 7
    There is a match at position 13
This problem is similar to the search algorithm, except that for every possible starting position every character of P must be compared with a character of T.

Pattern Matching
• Input
  – Text of n characters T_1, T_2, …, T_n
  – Pattern of m (m < n) characters P_1, P_2, …, P_m
• Output:
  – Location (index) of every occurrence of the pattern within the text
• Algorithm:
  – What is the idea?

Pattern Matching
• Algorithm idea:
  – Check if the pattern matches starting at position 1
  – Then check if it matches starting at position 2
  – …and so on
• How to check if the pattern matches the text starting at position k?
  – Check that every character of the pattern matches the corresponding character of the text
• How many loops will you need?

Pattern Matching
• Algorithm idea
  – Get input (text and pattern)
  – Set starting location k to 1
  – Repeat until the end of the text is reached
    • Attempt to match every character in the pattern beginning at position k in the text
    • If there was a match, print k
    • Add 1 to k
  – Stop
• Question: is this an algorithm?
  – Yes, at a high level of abstraction
  – Now we need to write it in pseudocode

Pattern Matching Algorithm (Fig. 2.12)
Variables: n, m, T_1 T_2 … T_n, P_1 P_2 … P_m, k, mismatch

Get values for n, m, the text T_1 T_2 … T_n and the pattern P_1 P_2 … P_m
Set k = 1
Repeat until k > n-m+1
    Set i to 1
    Set mismatch = “NO”
    Repeat until either (i > m) or (mismatch = “YES”)
        if P_i ≠ T_{k+(i-1)} then
            Set mismatch = “YES”
        else
            Increment i by 1
    if mismatch = “NO” then
        Print the message “There is a match at position” k
    Increment k by 1
Stop

Variations on the pattern matching algorithm
How would you modify the algorithm in order to
• Find only the first match for P in T.
• Find only the last match for P in T.
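
The pattern matching pseudocode (Fig. 2.12) and its two variations can be sketched in Python as follows. This is one possible translation, not the textbook's code. The function names matches_at, find_matches, find_first_match and find_last_match are made up for illustration, and positions are kept 1-based to agree with the slides, so each index is shifted by one when reading from Python's 0-based strings.

```python
def matches_at(text, pattern, k):
    """Return True if pattern occurs in text starting at 1-based position k."""
    i = 1
    mismatch = False
    # "Repeat until either (i > m) or (mismatch = YES)"
    while i <= len(pattern) and not mismatch:
        # P_i is pattern[i - 1]; T_{k+(i-1)} is text[k + i - 2] in 0-based indexing
        if pattern[i - 1] != text[k + i - 2]:
            mismatch = True
        else:
            i += 1
    return not mismatch


def find_matches(text, pattern):
    """Return the 1-based positions of every occurrence of pattern in text."""
    last_start = len(text) - len(pattern) + 1   # the loop stops once k > n - m + 1
    return [k for k in range(1, last_start + 1) if matches_at(text, pattern, k)]


def find_first_match(text, pattern):
    """Variation: stop at the first match; return None if there is none."""
    last_start = len(text) - len(pattern) + 1
    for k in range(1, last_start + 1):
        if matches_at(text, pattern, k):
            return k
    return None


def find_last_match(text, pattern):
    """Variation (one possible approach): scan the start positions right to left."""
    last_start = len(text) - len(pattern) + 1
    for k in range(last_start, 0, -1):
        if matches_at(text, pattern, k):
            return k
    return None


if __name__ == "__main__":
    for k in find_matches("TCAGGCTAATCGTAGG", "TA"):
        print("There is a match at position", k)
```

Run on the example text and pattern from the slides, this prints matches at positions 7 and 13. Scanning right to left is only one way to find the last match; an alternative is to keep the left-to-right loop and remember the most recent matching k.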