transcript - New Mexico Computer Science for All

Search Algorithms
This presentation is on Search Algorithms.
[Slide 1] Searching is what we do when we want to find a specific item among a
group of items.
In computer science, we look at the problem like this: we have a list and we want
to find a specific item, so we specify it. We define the property that we're looking
for and then we use a search algorithm to find it.
We're going to look at 2 search algorithms. Linear Search which is sometimes
called Sequential Search and Binary Search.
[Slide 2] Linear Search is the simplest search algorithm to use. It works like this:
you have a list and you check every one of its elements, one at a time, in sequence,
until you find the desired item.
Here's the pseudo code. Specify the item you're looking for and then for each
item in your list, check the list item to see if it's the desired item. If it is, then
stop the search and return the item's location so you can find it. If not, go on to
the next item in the list and keep repeating until you find the item you're looking
for.
[Slide 3] Let's look at an example. Say you have a list, it's thousands of items
long, in this case it's numbers and you're looking for a specific number. Let's look
for 25. You look at the first item in the list, the number 50 and you compare it to
the number 25. Since 50 is not equal to 25, you move on to the next item, 539.
Well 539 is also not equal to 25 so you move on to the next item, 810. 810 is not
equal to 25 so you keep going down the list thousands of items long until you get
to the number 25. Well, 25 is equal to 25 and you say "Hallelujah. I don't have to
keep doing this anymore. I found what I'm looking for." You don't have to look at
the rest of the items in the list.
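
The pseudocode and the example above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the presentation: the function name linear_search is an assumption, and the list is a short stand-in for the slide's list of thousands of items (only 50, 539, 810 and 25 come from the slide; the other values are made up).

```python
def linear_search(items, target):
    """Check each item in sequence; return the index of the first match, or None."""
    for index, item in enumerate(items):
        if item == target:
            return index  # found it, so stop and report the location
    return None  # went through the whole list without finding the target


# Short stand-in list: 50, 539, 810 and 25 are from the slide, the rest is filler.
numbers = [50, 539, 810, 7, 1204, 25, 96]
print(linear_search(numbers, 25))  # prints 5, the position of 25 (counting from 0)
```

Returning None when nothing matches is a design choice of this sketch. The pseudocode only describes the case where the item is eventually found.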
[Slide 4] There are pros and cons to a Linear Search. The pros are that it's very
simple to understand and therefore very simple to implement, and it works well
for small lists or a list of unsorted items that you only have to search a few times.
The cons are that it's more time consuming than other methods and, in the worst
case scenario, if your item is at the end of the list, then you have to search all the
items in the list to get to it.
[Slide 5] Let's look at a Binary Search. Binary Search is very different from a
Linear Search. The problem is the same though except that you can only perform
a binary search on a sorted list of items. The list has to be arranged in some
order already, from lowest to highest, brightest to darkest or darkest to
brightest, in some order that you know and you'll see why in a minute.
This is the method: you start with the sorted list and then you divide the list in
half. You check to see which half your item is in, and because it's sorted, then
you'd only have to look at the numbers where you divided it in half. You'll see
what I mean in a minute.
Then you select the half that the item you're looking for is in and you repeat:
you divide that list in half, check to see which half your item is in, and keep on
going until you get to your item.
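
As a rough sketch, the method just described is usually coded like this in Python. This is the common midpoint formulation, which compares the target with a single middle item rather than the two items on either side of the division, so it is a close relative of what the slide describes and not a literal copy of it. The function name binary_search and the sample list are assumptions for illustration.

```python
def binary_search(sorted_items, target):
    """Repeatedly halve a sorted list; return the target's index, or None."""
    low, high = 0, len(sorted_items) - 1  # current search range, inclusive
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None  # the range shrank to nothing, so the target is absent


# Hypothetical sorted list, lowest to highest.
numbers = [2, 5, 18, 27, 39, 44, 51, 60]
print(binary_search(numbers, 18))  # prints 2
```

The list must already be sorted. On an unsorted list this function can miss items that are actually present.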
Let's take a look. [Slide 6] Okay, here we have a list. The list is sorted. In this case
it's sorted in numerical order, from lowest to highest and we're looking for the
number 18. Now you're saying "I can see the number 18, we don't need to look
for it." Well I know. This seems like a trivial example but I just wanted to show
you how a Binary Search works.
Okay. The first step is to divide the list in half and then we look at the numbers
that are closest to the division that we made where we divided the list in half.
We ask the question in this case because we have sorted according to numerical
order, "Is 18 less than or equal to 27?" Or "Is 18 greater than or equal to 39?"
Well we know that 18 is less than or equal to 27, so it must be in the list on the
left and that's the list we're going to look at from now on.
Okay. We divide this list in half again and we look at the 2 numbers that are
closest to where we divided the list in half, to the division point there. We ask
the question "Is 18 less than or equal to 5?" Or "Is 18 greater than or equal to
18?" Well, 18 is equal to 18, so we found the number we're looking for.
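
The walk-through above can be traced with a small Python sketch that follows the slide's wording more literally: split the current list in half and compare the target with the two items on either side of the split. The transcript does not give the full list, so the list below is a hypothetical one chosen so that the splits land between 27 and 39, and then between 5 and 18. The function name halving_search is also an assumption.

```python
def halving_search(sorted_items, target):
    """Split the list in half and compare the target with the two items at the split."""
    low, high = 0, len(sorted_items)  # current sublist is sorted_items[low:high]
    while high - low > 1:
        mid = (low + high) // 2
        left_edge, right_edge = sorted_items[mid - 1], sorted_items[mid]
        print(f"split between {left_edge} and {right_edge}")
        if target == left_edge:
            return mid - 1
        if target == right_edge:
            return mid
        if target < left_edge:
            high = mid  # keep the left half
        elif target > right_edge:
            low = mid   # keep the right half
        else:
            return None  # target falls between the two edges, so it is not in the list
    if high - low == 1 and sorted_items[low] == target:
        return low
    return None


# Hypothetical list; only 5, 18, 27 and 39 are taken from the slide.
numbers = [2, 5, 18, 27, 39, 44, 51, 60]
print(halving_search(numbers, 18))
# prints:
# split between 27 and 39
# split between 5 and 18
# 2
```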
Let's take a look at a video. [Slide 7] Here's the information about the website
where you can find the whole video. This is just a portion of the video. [Slide 8]
Speaker 2:
What do we mean though by how ... teaching people how to think more
carefully, more algorithmically? Well this little visual always seems to go over
well and it seems to be memorable and I was even asked just yesterday by a
former student, "Oh, you're going to do the phone book thing again" and I pretty
much on the spot decided "All right. Sure. We'll do the phone book thing." And
he asked me "When you tore it in half right?" I was like "Well yeah. Technically
we tore it in half but not in the way ..." in the computer science way, well I'd be
able to tear this thing in half.
And so here was the problem that we presented for some time. Here's the phone
book. It's got at least a thousand or so pages and the simple goal at hand, very
real world is to find say a person in here. Mike Smith, last name starting with S.
I'm a typical human, I pick up this phone book and then you went out there.
What would you, a typical person do to start finding Mike Smith? Obviously not
knowing in advance what page he is actually on?
All right. You go roughly in the middle? Right? And at this point in the story, I'm
probably in the Ms or Ns. Roughly half way through the phone book. Turns out
the last time I did this example, I somehow found myself in the escort section.
It's actually not equally balanced between A through M and N through Z. But
today, we are in fact in the Ms.
All right. Now I'm at the Ms but what's my takeaway now just as a normal human
off the street? Where do I go next for Mike Smith? He's probably on this half
right? Because S comes after N and so herein lies the visual drama. So it's not
really tearing it in half right? I kind of cheated down the center. But we now
know that Mike is at least not in that half. We can literally throw half of the
problem away and I'm left with the problem that's fundamentally still the same
thing, find Mike Smith in a really big book but the problem is now half as large. If
it had a thousand pages before, now it's got 500.
You know what? I can do the same thing again. I can kind of recursively or
repeatedly do the same thing. Now I'm not quite at S, I'm at T and so oh, I went a
little too far but I know now that Mike is not to the right. There's got to be some
class on there or I just can't tear the damn thing I bet. But now, I know he's not
to the right. Now the problem has been quartered. I've gone from a thousand to
500 to 250 pages and again, if you continue the logic, continue the mathematics,
I'm chopping this problem in half, in half, in half until finally I'm either going to
not find any Smiths at all, unlikely. Or I'm going to find the one I'm looking for.
But that then begs the question, "Is this any better than the simple approach of
just saying 'A, no. B, no,' starting from left to right, going linearly through the
book?"
Well instinctively yes. It's going to be a lot faster. But how much less? Well if I
have a thousand page phone book or maybe let's say a 1,024 for those of you
who like powers of 2. Well how many times might I have to split this problem in
half before finding Mr. Smith?
10 right? If you have 1,024 pages and you split them in half, in half, in half, in
half, I do that 10 times which means I go from a thousand pages to the person
I'm looking for in just 10 page turns and that's kind of neat but if you think about
it, you've been doing this all of your life, not that dramatic.
But now suppose the phone book isn't just for Boston, it's for the entire US or
the entire world and this thing has billions of pages in it. Imagine a phone book
with 4 billion pages. How many page turns am I going to have to do maximally to
find Mike Smith in a 4 billion page phone book?
Now, so yeah, if you're kind of like the math type, because this is log base 2, but if
you think 4 billion to 2 billion to 1 billion to half a billion. I mean that actually
whittles itself down pretty darn fast and in fact within 32 halvings of the
phone book even from 4 billion I'll get down to 1 page and that's when this stuff
gets powerful I think. That's when these ideas get compelling when you can have
a 4 billion page problem and in 32 steps you can find the person you're looking
for.
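
The halving arithmetic in the phone book example can be checked with a few lines of Python. This is only a sketch: the function name halvings_needed is an assumption, and it simply counts how many times a page count can be cut in half (keeping the larger half when the count is odd) before a single page is left.

```python
def halvings_needed(pages):
    """Count how many times a page count must be halved to get down to one page."""
    count = 0
    while pages > 1:
        pages = (pages + 1) // 2  # keep the larger half when the count is odd
        count += 1
    return count


for pages in (1_024, 4_000_000_000):
    print(pages, halvings_needed(pages))
# prints:
# 1024 10
# 4000000000 32
```

This is the "log base 2" mentioned above: the count is the base-2 logarithm of the number of pages, rounded up.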
And so that's what we mean when we say that you'll learn how to think more
carefully, more algorithmically, more intelligently about solving problems and
the returns are huge when you can actually do this.
Speaker 1:
[Slide 9] Okay. A Binary Search can be very useful and here are the pros and
cons.
A Binary Search Algorithm is very powerful, it's fast and actually it's relatively
easy to understand and it works well for large sorted data sets like we saw with
the phone book. Another pro is that you would want to use a Binary Search if
you had a large data set that you have to search repeatedly. You wouldn't want
to use a Linear Search in that case because it takes a long time each time you're
looking for an item.
The cons are that you must sort your list first and it's a little bit more difficult to
implement.
[Slide 10] Let's compare the Linear Search Algorithm with the Binary Search
Algorithm.
In this case we're looking at the worst case scenario, where the item that you're
looking for is found in the absolute last search you do using each search method. For a
Linear Search, it would be the very last item in your list. In a Binary Search, it
would be that very last division that you do of your list until you have only 1
element in your list.
Well, let's look at the case where you have 10 items in your list. In that case, the
worst case for a Linear Search would be the 10th item, so you do 10
comparisons.
For Binary Search, the worst case would be 4 comparisons. Well there's not
much difference there. I mean 10 comparisons is not that big of a deal.
But if we go down and say look at 10,000, your list is now 10,000 items long,
well, in a Linear Search case, you'd have to look at 10,000 items for the worst
case scenario but in the Binary Search case, you'd only have to divide that list 14
times before you got to the very last item that you were looking for.
How about if you had a billion items in your list? Well, for Linear Search, you'd
have to look at a billion items if that item you were looking for was the last item
in the list. But for Binary Search, you'd only have to do 30 searches. That's great.
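
The worst-case numbers quoted above (4, 14 and 30) can be reproduced with a short Python sketch. The function names are assumptions. The binary count uses the standard result that a midpoint binary search on n items needs at most floor(log2(n)) + 1 comparisons, which for a positive integer is the same as the number of bits needed to write n in binary.

```python
def worst_case_linear(n):
    """Worst case for linear search: the item is last, so every item is checked."""
    return n


def worst_case_binary(n):
    """Worst case for binary search on n >= 1 items: floor(log2(n)) + 1 comparisons."""
    return n.bit_length()  # for n >= 1, int.bit_length() equals floor(log2(n)) + 1


for n in (10, 10_000, 1_000_000_000):
    print(f"{n:>13,} items: linear {worst_case_linear(n):>13,}   binary {worst_case_binary(n)}")
```

For the three list sizes on the slide this gives 4, 14 and 30 for Binary Search, against 10, 10,000 and 1,000,000,000 for Linear Search.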
[Slide 11] In summary, Search Algorithms are what you use to find an item with
specific properties among a group of items. In a Linear or Sequential Search, you
check every element or item in the list one at a time. It's simple to understand
and simple to implement. It's good for small lists or single searches of unsorted
large data sets but it is more time consuming and the worst case search equals
the number of items in the list.
In a Binary Search you keep dividing the list in half and checking to see which half
of the list your item is in until you find it. This is powerful and it's fast and it
works really well for large sorted data sets. But you must sort the list first. It is
slightly more difficult to implement than a Linear Search.
Thank you.