select_sort - Computer Science

advertisement
For the birds
Selecting and sorting data
CS112 Scientific Computation
Department of Computer Science
Wellesley College
A bird database
Name
GreatEgret
GreatBlueHeron
SnowyEgret
GreenHeron
AmericanBittern
MuteSwan
CanadaGoose
SnowGoose
AmericanBlackDuck
NorthernPintail
Mallard
BlueWingedTeal
Osprey
BaldEagle
NorthernHarrier
CoopersHawk
RedTailedHawk
PeregrineFalcon
...
Family
heron
heron
heron
heron
heron
waterfowl
waterfowl
waterfowl
waterfowl
waterfowl
waterfowl
waterfowl
hawk/eagle
hawk/eagle
hawk/eagle
hawk/eagle
hawk/eagle
hawk/eagle
Habitat
Size
marshes
39
marshes
48
marshes/ponds
24
marshes/ponds
19
marshes
28
lagoons/ponds
60
wetlands/fields 40
marshes/fields 28
wetlands
23
marshes/ponds
26
wetlands
24
marshes/ponds
15
wetlands
23
wetlands
32
marshes/fields 22
woods
17
woods/fields
22
marshes/cities 18
Wingspan
51
72
38
25
45
96
72
58
36
35
36
24
72
80
54
28
58
44
BackColor UnderColor HeadColor Spotted Comment
white
white
white
no
longLegs
blue/gray black
black/white no
longLegs
white
white
white
no
longLegs
green
brown
black
no
darkBill
brown
brown
brown
no
greenLegs
white
white
white
no
orangeBill
brown
brown
black/white no
fliesInV
white
white
white
no
orangeBill
brown
brown
brown
no
yellowBill
gray
white
brown
no
longTail
brown
gray
green
no purpleChest
brown
brown
blue/gray
yes blueShoulders
brown
white
brown/white yes
whiteLegs
brown
brown
white
no
yellowBeak
gray
gray
brown
no
longTail
gray
white
gray
no
orangeLegs
brown
white
brown
yes
orangeTail
gray
gray
black/white no
sideburn
Suppose you’d like to select all the herons,
select the birds with giant wingspans,
sort by size or wingspan,
sort alphabetically by name…
Databases
18-2
Let’s load it in and see what we have
% loadBirds.m
birds = cell(1,10);
[birds{1} birds{2} birds{3} birds{4} birds{5} birds{6} birds{7} …
birds{8} birds{9} birds{10}] = ...
textread('birds.txt', '%s %s %s %u %u %s %s %s %s %s',
'headerlines', 1);
function printBirdInfo (birds)
% prints out all the information stored in the input cell array
for i = 1:length(birds{1})
disp(sprintf('%22s %10s %17s %3u %3u %12s %12s %12s %4s %17s', ...
birds{1}{i}, birds{2}{i}, birds{3}{i}, birds{4}(i), birds{5}(i), ...
birds{6}{i}, birds{7}{i}, birds{8}{i}, birds{9}{i}, birds{10}{i}))
end
Databases
18-3
Selecting all the herons
function herons = getHerons (birds)
% find the indices of all birds from the heron family
indices = find(strcmp(birds{2},'heron'));
% create an empty cell array and fill it with all the
% information from the heron family
herons = cell(1,10);
for i = 1:10
herons{i} = birds{i}(indices);
end
Exercise: select birds with large
wingspans (> 48”)
Databases
18-4
MATLAB sort function
>> nums = [7 2 9 7 8 3 6 1 3 4];
>> sortNums = sort(nums)
sortNums =
1 2 3 3 4 6 7 7 8 9
>> sortNums = sort(nums, 'descend')
sortNums =
9 8 7 7 6 4 3 3 2 1
>> [sortNums sortIndices] = sort(nums, 'ascend')
sortNums =
1 2 3 3 4 6 7 7 8 9
sortIndices =
8 2 6 9 10 7 1 4 5 3
Exercise: What does the expression nums(sortIndices) return?
Databases
18-5
Now let’s sort a cell array of strings
>> words = {'early' 'cloud' 'heights' 'a' 'black' 'great' 'from’ 'descended'};
>> sortWords = sort(words)
sortWords =
'a' 'black' 'cloud' 'descended' 'early' 'from' 'great' 'heights'
>> words = {'early' 'Cloud' 'heights' 'A' 'black' 'Great' 'from' 'Descended'};
>> [sortWords sortIndices] = sort(words)
sortWords =
'A' 'Cloud' 'Descended' 'Great' 'black' 'early' 'from' 'heights'
sortIndices =
4 2 8 6 5 1 7 3
Hmmmm…. What’s going on here???
Databases
18-6
Remember the ASCII code?
When comparing the order of
two strings MATLAB uses the
order of characters in the
ASCII code in which all
capital letters appear before
all lowercase letters
Exercise: Write a function that
sorts a cell array of words
alphabetically, independent of
capitalization
Databases
18-7
Sorting the bird data
function sortedData = sortByWingspan (birds)
% sort the bird information by wingspan
[temp indices] = sort(birds{5});
% create an empty cell array and fill it with all the bird
% information in sorted order
sortedData = cell(1,10);
for i = 1:10
sortedData{i} = birds{i}(indices);
end
Exercise: sort the birds
alphabetically by name
Databases
18-8
On the flip side…
Suppose we have some nice, orderly information that we want to
scramble
>> order = randperm(10)
order =
6 2 5 1 4 8 10
3 7 9
>> conditions = [2.0 -2.0 2.5 -2.5 3.0 -3.0 3.5 -3.5 4.0 -4.0];
>> newConditions = conditions(order)
newConditions =
-3.0 -2.0 3.0 2.0 -2.5 -3.5 -4.0 2.5 3.5 4.0
Exercise: write a function makeAnagram
that has an input string and returns an
anagram of the string
Databases
18-9
Revisiting selection
function selection = getHerons (birds)
% select birds from the heron family
indices = find(strcmp(birds{2},'heron'));
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
function selection = getSmalls (birds)
% select small birds
indices = find(birds{4} < 10);
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
function selection = getBigWings (birds)
% select birds with large wingspans
indices = find(birds{5} > 48);
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
function selection = getWetlands (birds)
% select birds who inhabit wetlands
indices = [];
for i = 1:length(birds{3})
if findstr(birds{3}{i},'wetlands')
indices(end+1) = i;
end
end
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
Databases 18-10
Tailoring the selection criterion
function indices = getHerons (birds)
% select birds from the heron family
indices = find(strcmp(birds{2}, 'heron');
function indices = getBigWings (birds)
% select birds with large wingspans
indices = find(birds{5} > 48);
function indices = getWetlands (birds)
% select birds who inhabit wetlands
indices = [];
for i = 1:length(birds{3})
if findstr(birds{3}{i},'wetlands')
indices(end+1) = i;
end
function indices = getSmalls (birds)
end
% select small birds
indices = find(birds{4} < 10);
Databases
18-11
Functions as inputs
function newVal = calc (val)
newVal = 2*val+3;
function results = applyFunc1 (func, vect)
for i = 1:length(vect)
results(i) = feval(func, vect(i));
end
>> results = applyFunc1('calc', nums)
function results = applyFunc2 (func, vect)
for i = 1:length(vect)
results(i) = func(vect(i))
end
>> results = applyFunc2(@calc, nums)
Databases 18-12
Revisiting selection
function selection = getBirds1 (birds, criterion)
% select birds using input criterion function
indices = feval(criterion, birds)
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
>> herons = getBirds1(birds, 'getHerons');
function selection = getBirds2 (birds, criterion)
% select birds using input criterion function
indices = criterion(birds)
selection = cell(1,10);
for i = 1:10
selection{i} = birds{i}(indices);
end
>> herons = getBirds2(birds, @getHerons)
Databases 18-13
The search is on ...
Suppose we have our data in sorted order, how can we search
for a particular value, in an efficient way?
Let’s play a game . . .
I’m thinking of a number
between 1 and 100
What is it?
You’re probably using the
binary search strategy
Databases 18-14
Download