Searching
UC Berkeley Fall 2004, E77
http://jagger.me.berkeley.edu/~pack/e77
Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Searching
Searching means finding a specific entry in a 1-dimensional object (e.g., a column vector or row vector). The brute-force approach scans (potentially) the entire list.

    function Idx = bfsearch(A,Key)
    N = length(A);
    Idx = 1;
    while Idx<=N && A(Idx)~=Key   % loop exits if Idx>N or A(Idx)==Key
        Idx = Idx + 1;
    end
    if Idx==N+1
        Idx = [];   % entire list scanned, no match found
    end
    % otherwise A(Idx)==Key: match found

Matlab logical conditionals
The Matlab relational operators (==, ~=, >, >=, <, <=) all perform this form of searching, but
- they actually check every entry, and
- they return an array of 0s and 1s (logicals) reflecting the outcome.
They are implemented at a "low level" and are essentially as fast as brute force can be. Pseudocode for A<Key looks like:

    function Idx = ltsearch(A,Key)
    szA = size(A);
    Idx = false(szA);        % logical 0s, same size as A
    for i=1:prod(szA)        % linear indexing visits every entry
        if A(i)<Key
            Idx(i) = true;
        end
    end

If the number of elements in A doubles, we expect the program to take about twice as long to run.

Matlab find
The Matlab command find returns the (linear) indices of all nonzero entries. Again, this is brute force, but implemented at a "low level" and essentially as fast as brute force can be. Pseudocode looks like:

    function Idx = find(A)   % pseudocode only; a file named find.m would shadow the built-in
    szA = size(A);
    Idx = zeros(prod(szA),1);
    cnt = 0;                 % number of nonzero entries found so far
    for i=1:prod(szA)
        if A(i)~=0
            cnt = cnt + 1;
            Idx(cnt) = i;
        end
    end
    Idx = Idx(1:cnt);        % discard the unused preallocated slots

If the number of elements in A doubles, we expect the program to take about twice as long to run.
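To connect the brute-force loop with the built-in tools, here is a minimal script-form sketch (no function files, so it can be pasted at the prompt). The data A = [7 3 9 3 5] and the variable names Mask, AllIdx and FirstIdx are hypothetical, chosen only for illustration. The explicit loop stops at the first match, whereas find(A == Key, 1) still evaluates A == Key on every entry before returning the first nonzero index.

```matlab
% Hypothetical example data (unsorted) and key
A   = [7 3 9 3 5];
Key = 3;

% Explicit brute-force scan, same logic as bfsearch: stop at the first match
N   = length(A);
Idx = 1;
while Idx <= N && A(Idx) ~= Key
    Idx = Idx + 1;
end
if Idx == N+1
    Idx = [];                  % entire list scanned, no match found
end

% Built-in equivalents (each relational test checks every entry of A)
Mask     = (A == Key);         % logical array, true where A equals Key
AllIdx   = find(A == Key);     % indices of every match
FirstIdx = find(A == Key, 1);  % index of the first match only

fprintf('loop: %d, find(...,1): %d, all matches: %s\n', ...
        Idx, FirstIdx, mat2str(AllIdx));
```

With this data the loop and find(A == Key, 1) both give 2, and find(A == Key) gives [2 4].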
Searching in a sorted list
If the list is sorted, it should be easier to search, since checking for a match at any location in the list either
- finds the match, or (more likely)
- splits the list (at that location) into two lists, one to the "left" of the location and one to the "right".
Since the list is sorted, we immediately know which of the two sublists must contain the match.

Take a sorted list A of length N, and check location k:

$$A_1 \le A_2 \le \cdots \le A_{k-1} \le A_k \le A_{k+1} \le \cdots \le A_N$$

- If $A_k < \text{Key}$, the match must be among $A_{k+1}, \ldots, A_N$ (to the right of k).
- Else (i.e., $A_k \ge \text{Key}$), the match must be among $A_1, \ldots, A_k$ (at k or to its left).

Choose k in the "middle", halving the size of the relevant list each time, until only one candidate remains. This looks a lot like bisection for root finding.

Simple Binary Search Example
Let's do an example with A as below, and Key = 0.1:

    A = [-3.2 -1.9 -0.2 0.1 1.3 1.9 2.1 3.5 3.9 4.0 4.4 4.8 6.0 9.7];
    Key = 0.1;
    L = 1; R = length(A);
    while R>L
        M = floor((L+R)/2);
        if A(M)<Key
            L = M+1;
        else
            R = M;
        end
    end

Matlab code for Binary Search

    function Idx = bsearch(A,Key)
    Left = 1;
    Right = length(A);
    while Right>Left
        Mid = floor((Left+Right)/2);
        if A(Mid)<Key
            Left = Mid+1;    % match, if any, lies beyond Mid
        else
            Right = Mid;     % match, if any, is at Mid or before it
        end
    end
    if ~isempty(A) && A(Left)==Key
        Idx = Left;
    else
        Idx = [];            % no match found (or A is empty)
    end

If Right-Left>1, then Mid is strictly between them, and the subsequent value of Right-Left is reduced (essentially halved). If Right-Left==1, then Mid equals Left, and subsequently Left==Right, leading to the correct final case.

Operation Count for Binary Search
Let R(n) denote the number of operations it takes to search for a key in a sorted list of length n. Clearly, if n=1, then R(1)=0, as there is nothing to split. Also, it takes only one comparison to split the list length in half (since it is sorted!), so $R(n) = R(n/2) + 1$.
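To make the example concrete, the sketch below traces the iterative loop on the example list and prints Left, Right and Mid at each step. It assumes the first three entries are negative (-3.2, -1.9, -0.2); the minus signs appear to have been lost in extraction, since the list must be sorted for binary search to apply. The iteration counter nIter is a hypothetical addition used only to count comparisons of the form A(Mid)<Key.

```matlab
% Example list (assumed sorted, first three entries negative) and key
A   = [-3.2 -1.9 -0.2 0.1 1.3 1.9 2.1 3.5 3.9 4.0 4.4 4.8 6.0 9.7];
Key = 0.1;

Left = 1; Right = length(A); nIter = 0;
fprintf('%5s %5s %5s %8s\n', 'Left', 'Right', 'Mid', 'A(Mid)');
while Right > Left
    Mid = floor((Left+Right)/2);
    fprintf('%5d %5d %5d %8.1f\n', Left, Right, Mid, A(Mid));
    if A(Mid) < Key
        Left = Mid + 1;     % match, if any, lies beyond Mid
    else
        Right = Mid;        % match, if any, is at Mid or before it
    end
    nIter = nIter + 1;      % one comparison A(Mid)<Key per iteration
end

% Final equality check on the single remaining candidate
if A(Left) == Key
    Idx = Left;
else
    Idx = [];               % no match found
end
fprintf('Idx = %d after %d iterations\n', Idx, nIter);
```

The printed rows are (Left, Right, Mid) = (1, 14, 7), (1, 7, 4), (1, 4, 2), (3, 4, 3), and the search ends with Idx = 4 after 4 iterations, consistent with the count growing like $\log_2(14) \approx 3.8$.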
We can then prove that $R(n) \le \log_2(n)$.

Rough Operation Count for BSearch
The recursive relation for R is

$$R(1) = 0, \qquad R(n) = R(n/2) + 1.$$

Claim: for $n = 2^m$, it is true that $R(n) \le \log_2(n)$.
- Case m = 0: true, since $R(1) = 0 = \log_2(1)$.
- Case m = k+1, derived from the case m = k:

$$R(2^{k+1}) = R(2^k) + 1 \le \log_2(2^k) + 1 = k + 1 = \log_2(2^{k+1}),$$

where the equality uses the recursive relation and the inequality uses the induction hypothesis.

Matlab code for Binary Search (Recursive)

    function Idx = bsearch(A,Key,Left,Right)
    if Left==Right           % base case: one candidate left
        if A(Left)==Key
            Idx = Left;
        else
            Idx = [];        % no match found
        end
    else
        Mid = floor((Left+Right)/2);
        if A(Mid) < Key      % match must be beyond Mid
            Idx = bsearch(A,Key,Mid+1,Right);
        else                 % match, if any, is at Mid or before it
            Idx = bsearch(A,Key,Left,Mid);
        end
    end

Call it as Idx = bsearch(A,Key,1,length(A)) with A nonempty. If Right-Left>1, then Mid is strictly between them, so both recursive calls involve "smaller" intervals. If Right-Left==1, then Mid equals Left, and both possible recursive calls have Left==Right, leading to the correct base case.
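As a rough numerical check of the operation count, the sketch below runs the iterative halving loop for every list length n from 1 to 64 and every key position (keys equal to an entry and keys falling between, before or after entries), recording the worst-case number of iterations. Using A = 1:n as the sorted list and the names maxIter and nIter are assumptions for illustration; only the loop iterations (one comparison A(Mid)<Key each) are counted, not the final equality check.

```matlab
maxIter = zeros(1,64);                % worst-case iteration count for each n
for n = 1:64
    A = 1:n;                          % any sorted list of length n will do
    for Key = 0.5:0.5:n+0.5           % every entry, plus keys between and outside entries
        Left = 1; Right = n; nIter = 0;
        while Right > Left
            Mid = floor((Left+Right)/2);
            if A(Mid) < Key
                Left = Mid + 1;
            else
                Right = Mid;
            end
            nIter = nIter + 1;        % one comparison per iteration
        end
        maxIter(n) = max(maxIter(n), nIter);
    end
end

% Row 1: n = 2^m; row 2: worst-case number of iterations for that n
disp([1 2 4 8 16 32 64; maxIter([1 2 4 8 16 32 64])])
```

For $n = 2^m$ each iteration splits the remaining list into two halves of equal length, so the count is exactly m (the second row reads 0 1 2 3 4 5 6), matching $R(n) \le \log_2(n)$. For other n the count never exceeds $\lceil \log_2(n) \rceil$, since the larger half has length $\lceil s/2 \rceil$.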