Search Algorithms

This presentation is on Search Algorithms.

[Slide 1] Searching is what we do when we want to find a specific item among a group of items. In computer science, we look at the problem like this: we have a list and we want to find a specific item, so we specify it. We define the property that we're looking for and then we use a search algorithm to find it. We're going to look at 2 search algorithms: Linear Search, which is sometimes called Sequential Search, and Binary Search.

[Slide 2] Linear Search is the simplest type of search algorithm to use. It works like this: you have a list and you check every one of its elements, one at a time, in sequence, until you find the desired item. Here's the pseudocode. Specify the item you're looking for and then, for each item in your list, check the list item to see if it's the desired item. If it is, then stop the search and return the item's location so you can find it. If not, go on to the next item in the list and keep repeating until you find the item you're looking for.

[Slide 3] Let's look at an example. Say you have a list that's thousands of items long. In this case it's numbers, and you're looking for a specific number. Let's look for 25. You look at the first item in the list, the number 50, and you compare it to the number 25. Since 50 is not equal to 25, you move on to the next item, 539. Well, 539 is also not equal to 25, so you move on to the next item, 810. 810 is not equal to 25, so you keep going down the list, thousands of items long, until you get to the number 25. Well, 25 is equal to 25 and you say "Hallelujah. I don't have to keep doing this anymore. I found what I'm looking for." You don't have to look at the rest of the items in the list.

[Slide 4] There are pros and cons to a Linear Search. The pros are that it's very simple to understand and therefore very simple to implement, and it works well for small lists or for a list of unsorted items that you only have to search a few times.
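The linear search walked through on Slides 2 and 3 can be sketched in Python as follows. This is only an illustration, since the presentation gives pseudocode rather than a real implementation: the function name `linear_search` and the short sample list are assumptions (the first three values come from the example; the rest are made up), and the sketch returns `None` when the item is not in the list, a case the slides do not cover.

```python
def linear_search(items, target):
    """Check each item in order; return the index of the first match."""
    for index, item in enumerate(items):
        if item == target:
            # Found it: stop searching and report where it is.
            return index
    # Reached the end of the list without a match.
    return None


# Hypothetical short list; the one on the slide is thousands of items long.
numbers = [50, 539, 810, 7, 25, 300]
print(linear_search(numbers, 25))  # prints 4
```

In the worst case, when the item is last or missing, the loop looks at every item in the list.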
The cons are that it's more time consuming than other methods and, in the worst case scenario, if your item is at the end of the list, then you have to search all the items in the list to get to it.

[Slide 5] Let's look at a Binary Search. Binary Search is very different from a Linear Search. The problem is the same, though, except that you can only perform a binary search on a sorted list of items. The list has to be arranged in some order already, from lowest to highest, brightest to darkest or darkest to brightest, in some order that you know, and you'll see why in a minute. This is the method: you start with the sorted list and then you divide the list in half. You check to see which half your item is in, and because the list is sorted, you only have to look at the numbers where you divided it in half. You'll see what I mean in a minute. Then you select the half that the item you're looking for is in, and you repeat: you divide that list in half, check to see which half your item is in, and keep on going until you get to your item. Let's take a look.

[Slide 6] Okay, here we have a list. The list is sorted. In this case it's sorted in numerical order, from lowest to highest, and we're looking for the number 18. Now you're saying "I can see the number 18, we don't need to look for it." Well, I know. This seems like a trivial example, but I just wanted to show you how a Binary Search works. Okay. The first step is to divide the list in half, and then we look at the numbers that are closest to the point where we divided the list in half. Because we have sorted according to numerical order, we ask the question "Is 18 less than or equal to 27?" or "Is 18 greater than or equal to 39?" Well, we know that 18 is less than or equal to 27, so it must be in the list on the left, and that's the list we're going to look at from now on. Okay.
We divide this list in half again and we look at the 2 numbers that are closest to the division point, where we divided the list in half. We ask the question "Is 18 less than or equal to 5?" or "Is 18 greater than or equal to 18?" Well, 18 is equal to 18, so we found the number we're looking for. Let's take a look at a video.

[Slide 7] Here's the information about the website where you can find the whole video. This is just a portion of the video.

[Slide 8] Speaker 2: What do we mean though by how ... teaching people how to think more carefully, more algorithmically? Well, this little visual always seems to go over well and it seems to be memorable, and I was even asked just yesterday by a former student, "Oh, you're going to do the phone book thing again," and I pretty much on the spot decided "All right. Sure. We'll do the phone book thing." And he asked me "When you tore it in half, right?" I was like "Well, yeah. Technically we tore it in half but not in the way ..." in the computer science way, well I'd be able to tear this thing in half.

And so here was the problem that we presented for some time. Here's the phone book. It's got at least a thousand or so pages, and the simple goal at hand, very real world, is to find, say, a person in here. Mike Smith, last name starting with S. I'm a typical human, I pick up this phone book and then you went out there. What would you, a typical person, do to start finding Mike Smith? Obviously not knowing in advance what page he is actually on? All right. You go roughly in the middle? Right? And at this point in the story, I'm probably in the Ms or Ns. Roughly halfway through the phone book. Turns out the last time I did this example, I somehow found myself in the escort section. It's actually not equally balanced between A through M and N through Z. But today, we are in fact in the Ms. All right. Now I'm at the Ms, but what's my takeaway now, just as a normal human off the street?
Where do I go next for Mike Smith? He's probably in this half, right? Because S comes after N, and so herein lies the visual drama. So it's not really tearing it in half, right? I kind of cheated down the center. But we now know that Mike is at least not in that half. We can literally throw half of the problem away, and I'm left with a problem that's fundamentally still the same thing, find Mike Smith in a really big book, but the problem is now half as large. If it had a thousand pages before, now it's got 500. You know what? I can do the same thing again. I can kind of recursively or repeatedly do the same thing. Now I'm not quite at S, I'm at T, and so, oh, I went a little too far, but I know now that Mike is not to the right. There's got to be some class on there or I just can't tear the damn thing I bet. But now, I know he's not to the right. Now the problem has been quartered. I've gone from a thousand to 500 to 250 pages and again, if you continue the logic, continue the mathematics, I'm chopping this problem in half, in half, in half until finally I'm either going to not find any Smiths at all, unlikely, or I'm going to find the one I'm looking for.

But that then begs the question: is this any better than the simple approach of just saying "A, no. B, no," starting from left to right, going linearly through the book? Well, instinctively, yes. It's going to be a lot faster. But how much less? Well, if I have a thousand page phone book, or maybe let's say 1,024 for those of you who like powers of 2, how many times might I have to split this problem in half before finding Mr. Smith? 10, right? If you have 1,024 pages and you split them in half, in half, in half, in half, I do that 10 times, which means I go from a thousand pages to the person I'm looking for in just 10 page turns, and that's kind of neat, but if you think about it, you've been doing this all of your life, not that dramatic.
But now suppose the phone book isn't just for Boston, it's for the entire US or the entire world, and this thing has billions of pages in it. Imagine a phone book with 4 billion pages. How many page turns am I going to have to do, maximally, to find Mike Smith in a 4 billion page phone book? Now, so yeah, if you're kind of like the math type, because this is log base 2, but if you think 4 billion to 2 billion to 1 billion to half a billion, I mean, that actually whittles itself down pretty darn fast, and in fact within 32 halvings of the phone book, even from 4 billion, I'll get down to 1 page, and that's when this stuff gets powerful, I think. That's when these ideas get compelling, when you can have a 4 billion page problem and in 32 steps you can find the person you're looking for. And so that's what we mean when we say that you'll learn how to think more carefully, more algorithmically, more intelligently about solving problems, and the returns are huge when you can actually do this.

Speaker 1: [Slide 9] Okay. A Binary Search can be very useful, and here are the pros and cons. A Binary Search Algorithm is very powerful, it's fast, and actually it's relatively easy to understand, and it works well for large sorted data sets like we saw with the phone book. Another pro is that you would want to use a Binary Search if you had a large data set that you have to search repeatedly. You wouldn't want to use a Linear Search in that case because it takes a long time each time you're looking for an item. The cons are that you must sort your list first and it's a little bit more difficult to implement.

[Slide 10] Let's compare the Linear Search Algorithm with the Binary Search Algorithm. In this case we're looking at the worst case scenario, where the item that you're looking for turns up in the absolute last check you do using each search method. For a Linear Search, it would be the very last item in your list.
In a Binary Search, it would be that very last division that you do of your list, when you have only 1 element left in your list. Well, let's look at the case where you have 10 items in your list. In that case, for a Linear Search the worst case would be the 10th item, so you do 10 comparisons. For Binary Search, the worst case would be 4 comparisons. Well, there's not much difference there. I mean, 10 comparisons is not that big of a deal. But if we go down and, say, look at 10,000, your list is now 10,000 items long. Well, in the Linear Search case, you'd have to look at 10,000 items for the worst case scenario, but in the Binary Search case, you'd only have to divide that list 14 times before you got to the very last item that you were looking for. How about if you had a billion items in your list? Well, for Linear Search, you'd have to look at a billion items if the item you were looking for was the last item in the list. But for Binary Search, you'd only have to do 30 searches. That's great.

[Slide 11] In summary, Search Algorithms are what you use to find an item with specific properties among a group of items. In a Linear or Sequential Search, you check every element or item in the list one at a time. It's simple to understand and simple to implement. It's good for small lists or single searches of unsorted large data sets, but it is more time consuming, and the worst case search equals the number of items in the list. In a Binary Search you keep dividing the list in half and checking to see which half of the list your item is in until you find it. This is powerful and it's fast and it works really well for large sorted data sets. But you must sort the list first, and it is slightly more difficult to implement than a Linear Search. Thank you.
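As an illustration of the binary search from Slides 5 and 6 and the worst-case figures from Slide 10, here is a minimal Python sketch. It uses the common textbook formulation, which compares the target against a single middle element on each step, rather than the two numbers on either side of the division point as on the slide; the idea of discarding half the list each time is the same. The function name `binary_search`, the step counter, and the sample list are assumptions made for this example (only 5, 18, 27 and 39 appear in the presentation), and `range(n)` stands in for a sorted list of `n` items so that a billion-item case can be tried without building the list.

```python
def binary_search(items, target):
    """Search a sequence sorted in ascending order.

    Returns (index, steps): index is None if target is absent, and steps
    counts how many middle elements were examined.
    """
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return None, steps


# Hypothetical sorted list containing the numbers mentioned on Slide 6.
sorted_numbers = [2, 5, 18, 27, 39, 44, 51, 60]
print(binary_search(sorted_numbers, 18))  # prints (2, 3)

# Worst case for this sketch: the last item, for the sizes on Slide 10.
for n in (10, 10_000, 1_000_000_000):
    _, steps = binary_search(range(n), n - 1)
    print(n, steps)  # steps are 4, 14 and 30; a linear search would need n
```

The step counts match the slide's figures because each step halves the remaining list, so the worst case grows with the base-2 logarithm of the list length rather than with the length itself.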