Lab 13

Laboratory 13
Searching
Introduction
We now turn our attention to a second fundamental list-processing algorithm, that of
searching a list for a desired object. In this lab, the objects belong to the class Player.
Specifically, each Player object corresponds to a record containing the name, team, and
some batting statistics for a major league baseball player. The class LeagueData allows
us to manipulate the entire list of Player objects for a given league, performing
operations such as searching for a particular player’s record, and listing the top ten hitters
in the league. As an exercise, you will be asked to add more member functions to this
class. Our primary goal, however, is to use this setting to compare the two algorithms for
searching: linear searching vs. binary searching.
Key Concepts
Abstract Data Types
Arrays
Problem Solving: Linear vs. Binary Searching
Complexity: Average case and worst case measures
Before the Lab
Read Section 9.8 in Cohoon and Davidson.
Preliminary
Turn the PC on, if necessary.
Access the network.
Copy the source files from the Lab 13 Source Files folder.
Start up CodeWarrior.
Open the LabWork project.
Linear and Binary Searches
Observation
The purpose of this program is to compare the efficiency of the linear search to the binary
search. The difference is dramatic, as you will see: the binary search needs far fewer
comparisons than the linear search. On the other hand, the binary search requires that the
list be sorted, and sorting itself takes more time than a single linear search.
Open the files PlayerSearch.cpp, leaguedata.cpp, and player.cpp, and add them to the
LabWork project.
Read the program PlayerSearch.cpp to see what the program does.
In this program, we perform a linear search for Jeff Kent’s record and a binary search for
Mike Piazza’s record. In each case, we pass the last name of the Player we are interested
in, as a string constant, to the appropriate search member function. Then, to see what
happens when a record is not found, we perform a linear search for Valente and a binary
search for Koether.
The LinSearch() and BinSearch() functions return the number of comparisons made
during the search. The program PlayerSearch.cpp reports the count after each search.
Each function also has a second parameter, verbose. If this parameter is true, then the
search will print the record if it is found, and it will print a message if the record is not
found. If this parameter is false or omitted, then the search functions will print nothing.
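The behavior described above can be sketched as follows. This is a minimal illustration, not the code in leaguedata.cpp: the Player struct, the vector parameter, and the names LinSearchSketch() and BinSearchSketch() are assumptions made for the example. Each function returns its comparison count and prints only when verbose is true.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the lab's Player record (last name only).
struct Player {
    std::string lastName;
};

// Linear search: examine the records front to back.
// Returns the number of name comparisons made.
int LinSearchSketch(const std::vector<Player>& list, const std::string& name,
                    bool verbose = false) {
    int comparisons = 0;
    for (const Player& p : list) {
        ++comparisons;
        if (p.lastName == name) {
            if (verbose) std::cout << "Found " << p.lastName << '\n';
            return comparisons;
        }
    }
    if (verbose) std::cout << name << " not found\n";
    return comparisons;
}

// Binary search: the list must already be sorted by last name.
// Each probe of a middle record counts as one comparison.
int BinSearchSketch(const std::vector<Player>& sorted, const std::string& name,
                    bool verbose = false) {
    // Debug-only check of the precondition; it is not part of the count.
    assert(std::is_sorted(sorted.begin(), sorted.end(),
                          [](const Player& a, const Player& b) {
                              return a.lastName < b.lastName;
                          }));
    int comparisons = 0;
    int low = 0;
    int high = static_cast<int>(sorted.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        ++comparisons;
        int order = name.compare(sorted[mid].lastName);
        if (order == 0) {
            if (verbose) std::cout << "Found " << sorted[mid].lastName << '\n';
            return comparisons;
        }
        if (order < 0) {
            high = mid - 1;  // the name sorts before the middle record
        } else {
            low = mid + 1;   // the name sorts after the middle record
        }
    }
    if (verbose) std::cout << name << " not found\n";
    return comparisons;
}
```

In this sketch, an unsuccessful linear search always makes one comparison per record, while an unsuccessful binary search stops as soon as the range under consideration is empty.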
Run PlayerSearch.cpp.
How many comparisons were required to find Jeff Kent’s record?
How many comparisons were required to find Mike Piazza’s record?
Assume this particular list has 341 Player objects.
How many comparisons, in the worst case, should we expect of the linear search?
How many comparisons, on the average, should we expect the linear search to perform?
How many comparisons, in the worst case, should we expect of the binary search? Note that
with each comparison, the number of records remaining under consideration is cut in half.
How many comparisons, on the average, should we expect of the binary search to locate an
object in this list of 341 objects?
Write down your answers to the above questions. Later in this lab we will experiment to
see if your answers are close to the empirical results.
Open the file leaguedata.cpp.
Read the functions LinSearch() and BinSearch() to see how the search functions are
implemented.
Open the file PlayerRandomSearch.cpp.
Add PlayerRandomSearch.cpp to the LabWork project.
Remove PlayerSearch.cpp from the LabWork project. Leave player.cpp and
leaguedata.cpp in the project.
Earlier in this lab, you guessed the number of comparisons in the average and worst cases
for the linear and binary search algorithms performed on a list of 341 records. Now we
will test your hypotheses by running a program that will conduct 1000 random linear
searches and report the average and the maximum number of comparisons. The
maximum and the average should be good approximations to the worst case and the true
average number of comparisons, respectively.
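An experiment of this kind might look like the sketch below. It is not PlayerRandomSearch.cpp: it searches a list of integers rather than Player records, and the names SearchStats, countLinear(), and runLinearTrials() are invented for the example. It also assumes that every search succeeds and that each position is equally likely to be the target.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical summary of one experiment.
struct SearchStats {
    int maxComparisons;
    double avgComparisons;
};

// Linear search over integers, returning the number of comparisons made.
int countLinear(const std::vector<int>& list, int key) {
    int comparisons = 0;
    for (int value : list) {
        ++comparisons;
        if (value == key) break;
    }
    return comparisons;
}

// Run `trials` successful searches for randomly chosen keys in a list of
// n distinct keys, and record the maximum and average comparison counts.
SearchStats runLinearTrials(int n, int trials, unsigned seed) {
    assert(n > 0 && trials > 0);
    std::vector<int> list(n);
    for (int i = 0; i < n; ++i) list[i] = i;  // keys 0 .. n-1

    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);

    long total = 0;
    int maxSeen = 0;
    for (int t = 0; t < trials; ++t) {
        int c = countLinear(list, pick(gen));
        total += c;
        if (c > maxSeen) maxSeen = c;
    }
    return {maxSeen, static_cast<double>(total) / trials};
}
```

Calling runLinearTrials(341, 1000, seed) with any seed mirrors the setup of this part of the lab. The exact figures depend on the seed and on the standard library's random number implementation, so none are quoted here.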
Run PlayerRandomSearch.cpp.
What are the maximum and average numbers of comparisons?
Is this result surprising? How does it compare to your guesses?
If your guesses were not close, then ask your instructor for an explanation.
Now we will repeat this experiment for binary searches by performing 1000 random
binary searches and reporting the results.
In the program PlayerRandomSearch.cpp, change the function call LinSearch() to the call BinSearch().
Run PlayerRandomSearch.cpp.
Is this result surprising? How does it compare to your guesses?
If your guesses were not close, then ask your instructor for an explanation.
Sorting an Array by a Field
Observation
Note that the class LeagueData stores two different lists of Players, one called
alphaPlayerList and the other called playerList. These lists contain exactly the same
Players, but in different orders. The array alphaPlayerList always remains in
alphabetical order according to the Players’ last names. That is why the BinSearch()
member function is applied only to alphaPlayerList.
Normally, the records stored in a database contain many individual fields. In this lab,
some of the fields are the player’s name, team, and batting average. It is possible to sort
the database by any of these fields. In our next example, we would like to find the
players who have the highest batting averages. To do this, we will sort the records by the
field of batting averages.
Open the file BALeaders.cpp.
Add BALeaders.cpp to the LabWork project.
This program will sort the players in the playerList array into order according to
batting average. Then it will print the names of the players with the top ten batting
averages.
Note that, in sorting the Players from highest to lowest batting average, the SortByBA()
member function is careful to sort the playerList array, not the alphaPlayerList
array. Were it instead to sort alphaPlayerList by batting average, it would render
useless all further calls to BinSearch() since BinSearch() depends on the names in
alphaPlayerList being in alphabetical order.
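The idea of keeping one list in alphabetical order while sorting the other by a field can be sketched as follows. This is only an illustration under assumptions: the LeagueSketch struct, its TopHitters() helper, and the use of std::sort are not taken from leaguedata.cpp, whose SortByBA() may use a different sorting algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical two-field version of the lab's Player record.
struct Player {
    std::string lastName;
    double battingAverage;
};

// Ordering used throughout: higher batting average first.
inline bool higherAverage(const Player& a, const Player& b) {
    return a.battingAverage > b.battingAverage;
}

// Hypothetical stand-in for LeagueData holding the same Players twice.
struct LeagueSketch {
    std::vector<Player> alphaPlayerList;  // stays sorted by last name
    std::vector<Player> playerList;       // free to be reordered

    // Sort only playerList, from highest to lowest batting average.
    void SortByBA() {
        std::sort(playerList.begin(), playerList.end(), higherAverage);
    }

    // Names of the first `count` players; call SortByBA() beforehand.
    std::vector<std::string> TopHitters(std::size_t count) const {
        assert(std::is_sorted(playerList.begin(), playerList.end(),
                              higherAverage));
        std::vector<std::string> names;
        for (std::size_t i = 0; i < count && i < playerList.size(); ++i) {
            names.push_back(playerList[i].lastName);
        }
        return names;
    }
};
```

Because SortByBA() never touches alphaPlayerList in this sketch, a binary search by last name on that list remains valid after the sort.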
As a good review of sorting, study the SortByBA() function.
Run BALeaders.cpp.
Application
Open the MS Word document Lab 13 App.doc, found in the Lab 13 App subfolder, and
follow the instructions.
Summary
In this lab, we conducted experiments with two different searching algorithms: linear
search and binary search. We obtained evidence to support the theory that a linear search
of a list with n objects requires, on average, (n + 1)/2 comparisons when the object is in
the list. For example, our list of 341 objects requires (341 + 1)/2 = 171 comparisons on
average. In effect, if one doubles the size
of the list, the linear search algorithm will require twice the time to search it. This is a
property of all “linear” or O(n) algorithms.
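The (n + 1)/2 figure can be derived in one line, assuming the object is in the list and is equally likely to be at any of the n positions. Finding the object at position i costs i comparisons, so the average is:

```latex
\[
\frac{1}{n}\sum_{i=1}^{n} i
  \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
  \;=\; \frac{n+1}{2}.
\]
```

Under the same assumption, a list of size 2n gives an average of (2n + 1)/2, which is roughly twice the average for a list of size n.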
Our experiments demonstrated that a binary search is vastly superior to a linear search.
Of course, a binary search can work properly only if the list is ordered. To appreciate the
differences in performance of the two searching algorithms, we note that if the list
contains 341 names, the binary search algorithm requires, in the worst case, just nine
comparisons of the search name to names in the list. This number comes about as the
number of terms in the sequence of numbers that represent the sizes of the sublists under
consideration:
341 170 85 42 21 10 5 2 1
The above sequence suggests that if the list size is doubled, the binary search algorithm’s
worst case will involve only 1 additional comparison! This is a property of all O(log n)
algorithms.
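The halving argument can be checked with a short sketch. The function names halvingSequence() and worstCaseComparisons() are invented for this example; the code simply counts how many times a list size can be halved, using integer division, before it drops below 1.

```cpp
#include <cassert>
#include <vector>

// Sizes of the sublist still under consideration, starting at n and
// halving (integer division) until the size reaches 1.
std::vector<int> halvingSequence(int n) {
    assert(n >= 1);
    std::vector<int> sizes;
    for (int size = n; size >= 1; size /= 2) {
        sizes.push_back(size);
    }
    return sizes;
}

// Worst-case number of comparisons for a binary search of n objects,
// taken as the number of terms in the halving sequence.
int worstCaseComparisons(int n) {
    return static_cast<int>(halvingSequence(n).size());
}
```

For n = 341 the sequence has nine terms, as listed above. For n = 682 the sequence begins 682, 341 and then continues exactly as before, giving ten terms.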