For the birds Selecting and sorting data CS112 Scientific Computation Department of Computer Science Wellesley College A bird database Name GreatEgret GreatBlueHeron SnowyEgret GreenHeron AmericanBittern MuteSwan CanadaGoose SnowGoose AmericanBlackDuck NorthernPintail Mallard BlueWingedTeal Osprey BaldEagle NorthernHarrier CoopersHawk RedTailedHawk PeregrineFalcon ... Family heron heron heron heron heron waterfowl waterfowl waterfowl waterfowl waterfowl waterfowl waterfowl hawk/eagle hawk/eagle hawk/eagle hawk/eagle hawk/eagle hawk/eagle Habitat Size marshes 39 marshes 48 marshes/ponds 24 marshes/ponds 19 marshes 28 lagoons/ponds 60 wetlands/fields 40 marshes/fields 28 wetlands 23 marshes/ponds 26 wetlands 24 marshes/ponds 15 wetlands 23 wetlands 32 marshes/fields 22 woods 17 woods/fields 22 marshes/cities 18 Wingspan 51 72 38 25 45 96 72 58 36 35 36 24 72 80 54 28 58 44 BackColor UnderColor HeadColor Spotted Comment white white white no longLegs blue/gray black black/white no longLegs white white white no longLegs green brown black no darkBill brown brown brown no greenLegs white white white no orangeBill brown brown black/white no fliesInV white white white no orangeBill brown brown brown no yellowBill gray white brown no longTail brown gray green no purpleChest brown brown blue/gray yes blueShoulders brown white brown/white yes whiteLegs brown brown white no yellowBeak gray gray brown no longTail gray white gray no orangeLegs brown white brown yes orangeTail gray gray black/white no sideburn Suppose you’d like to select all the herons, select the birds with giant wingspans, sort by size or wingspan, sort alphabetically by name… Databases 18-2 Let’s load it in and see what we have % loadBirds.m birds = cell(1,10); [birds{1} birds{2} birds{3} birds{4} birds{5} birds{6} birds{7} … birds{8} birds{9} birds{10}] = ... textread('birds.txt', '%s %s %s %u %u %s %s %s %s %s', 'headerlines', 1); function printBirdInfo (birds) % prints out all the information stored in the input cell array for i = 1:length(birds{1}) disp(sprintf('%22s %10s %17s %3u %3u %12s %12s %12s %4s %17s', ... birds{1}{i}, birds{2}{i}, birds{3}{i}, birds{4}(i), birds{5}(i), ... birds{6}{i}, birds{7}{i}, birds{8}{i}, birds{9}{i}, birds{10}{i})) end Databases 18-3 Selecting all the herons function herons = getHerons (birds) % find the indices of all birds from the heron family indices = find(strcmp(birds{2},'heron')); % create an empty cell array and fill it with all the % information from the heron family herons = cell(1,10); for i = 1:10 herons{i} = birds{i}(indices); end Exercise: select birds with large wingspans (> 48”) Databases 18-4 MATLAB sort function >> nums = [7 2 9 7 8 3 6 1 3 4]; >> sortNums = sort(nums) sortNums = 1 2 3 3 4 6 7 7 8 9 >> sortNums = sort(nums, 'descend') sortNums = 9 8 7 7 6 4 3 3 2 1 >> [sortNums sortIndices] = sort(nums, 'ascend') sortNums = 1 2 3 3 4 6 7 7 8 9 sortIndices = 8 2 6 9 10 7 1 4 5 3 Exercise: What does the expression nums(sortIndices) return? Databases 18-5 Now let’s sort a cell array of strings >> words = {'early' 'cloud' 'heights' 'a' 'black' 'great' 'from’ 'descended'}; >> sortWords = sort(words) sortWords = 'a' 'black' 'cloud' 'descended' 'early' 'from' 'great' 'heights' >> words = {'early' 'Cloud' 'heights' 'A' 'black' 'Great' 'from' 'Descended'}; >> [sortWords sortIndices] = sort(words) sortWords = 'A' 'Cloud' 'Descended' 'Great' 'black' 'early' 'from' 'heights' sortIndices = 4 2 8 6 5 1 7 3 Hmmmm…. What’s going on here??? Databases 18-6 Remember the ASCII code? When comparing the order of two strings MATLAB uses the order of characters in the ASCII code in which all capital letters appear before all lowercase letters Exercise: Write a function that sorts a cell array of words alphabetically, independent of capitalization Databases 18-7 Sorting the bird data function sortedData = sortByWingspan (birds) % sort the bird information by wingspan [temp indices] = sort(birds{5}); % create an empty cell array and fill it with all the bird % information in sorted order sortedData = cell(1,10); for i = 1:10 sortedData{i} = birds{i}(indices); end Exercise: sort the birds alphabetically by name Databases 18-8 On the flip side… Suppose we have some nice, orderly information that we want to scramble >> order = randperm(10) order = 6 2 5 1 4 8 10 3 7 9 >> conditions = [2.0 -2.0 2.5 -2.5 3.0 -3.0 3.5 -3.5 4.0 -4.0]; >> newConditions = conditions(order) newConditions = -3.0 -2.0 3.0 2.0 -2.5 -3.5 -4.0 2.5 3.5 4.0 Exercise: write a function makeAnagram that has an input string and returns an anagram of the string Databases 18-9 Revisiting selection function selection = getHerons (birds) % select birds from the heron family indices = find(strcmp(birds{2},'heron')); selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end function selection = getSmalls (birds) % select small birds indices = find(birds{4} < 10); selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end function selection = getBigWings (birds) % select birds with large wingspans indices = find(birds{5} > 48); selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end function selection = getWetlands (birds) % select birds who inhabit wetlands indices = []; for i = 1:length(birds{3}) if findstr(birds{3}{i},'wetlands') indices(end+1) = i; end end selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end Databases 18-10 Tailoring the selection criterion function indices = getHerons (birds) % select birds from the heron family indices = find(strcmp(birds{2}, 'heron'); function indices = getBigWings (birds) % select birds with large wingspans indices = find(birds{5} > 48); function indices = getWetlands (birds) % select birds who inhabit wetlands indices = []; for i = 1:length(birds{3}) if findstr(birds{3}{i},'wetlands') indices(end+1) = i; end function indices = getSmalls (birds) end % select small birds indices = find(birds{4} < 10); Databases 18-11 Functions as inputs function newVal = calc (val) newVal = 2*val+3; function results = applyFunc1 (func, vect) for i = 1:length(vect) results(i) = feval(func, vect(i)); end >> results = applyFunc1('calc', nums) function results = applyFunc2 (func, vect) for i = 1:length(vect) results(i) = func(vect(i)) end >> results = applyFunc2(@calc, nums) Databases 18-12 Revisiting selection function selection = getBirds1 (birds, criterion) % select birds using input criterion function indices = feval(criterion, birds) selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end >> herons = getBirds1(birds, 'getHerons'); function selection = getBirds2 (birds, criterion) % select birds using input criterion function indices = criterion(birds) selection = cell(1,10); for i = 1:10 selection{i} = birds{i}(indices); end >> herons = getBirds2(birds, @getHerons) Databases 18-13 The search is on ... Suppose we have our data in sorted order, how can we search for a particular value, in an efficient way? Let’s play a game . . . I’m thinking of a number between 1 and 100 What is it? You’re probably using the binary search strategy Databases 18-14