Pearls of Functional Algorithm Design, Chapter 2, Surpasser

Pearls of Functional Algorithm Design
Chapter 2
Roger L. Costello
July 2011
1
The Problem We Will Solve
2
Recurring Problem
• Stock Market: each day I record the closing
value of the DOW. Occasionally, I pick a date
and ask, “How many days after this date has
the stock market closed at a higher value?”
• A more challenging question is, “Which day
has the greatest number of following days on which
the stock market closed at a higher value?”
3
Date      DOW      Number of following days that surpassed this day
6/1/11    12,324   6
6/2/11    12,214   7
6/3/11    12,390   1
6/6/11    12,400   0
6/7/11    12,367   1
6/8/11    12,380   0
6/9/11    12,310   2
6/10/11   12,330   1
6/13/11   12,340   0
4
Recurring Problem (cont.)
• People’s Height: line up a bunch of people.
Pick one person and ask, “How many of the
following people are taller than this person?”
• A more challenging question is, “Which
person has the greatest number of following
people who are taller?”
5
Person   Height (inches)   Number of following persons that surpass this person's height
Tom      72                1
John     68                4
George   69                2
Jim      73                0
Pete     65                3
Sam      68                2
Bill     69                1
Mike     64                1
Shaun    71                0
6
Recurring Problem (cont.)
• Word Analysis: take a letter in a word and
ask, “How many of the following letters are
bigger (occur later in the alphabet) than this
letter?”
• A more challenging question is, “Which letter
has the greatest number of following letters that
are bigger?”
7
Word: GENERATING
Letter   Number of following letters that surpass this letter
G        5
E        6
N        2
E        5
R        1
A        4
T        0
I        1
N        0
G        0
8
Problem Statement
• Create a list of values.
– Example: create a list of stock market values, or a list
of people’s heights, or a list of letters.
• Simple Problem: select one value in the list, and
count the number of following values that surpass
it.
• Harder Problem: for every value in the list solve
the simple problem; this produces a list of
numbers; return the maximum number.
– This is called the “surpasser problem”
9
Solve the Simple Problem
• Let’s create a function that counts the number
of surpassers of a value.
• The function takes two arguments:
1. The value, x
2. A list, xs, that consists of all the values that
follow x
10
Select the list items
that are greater than 'G'
E N E R A T I N G
filter (>'G') ____
[N, R, T, I, N]
11
Count the selected list items
[N, R, T, I, N]
length ____
5
Five items surpass “G”. That’s the answer!
12
scount
• “scount” (surpasser count) is a user-defined
function; it is the composition of the two functions
shown on the previous two slides (filter, then length).
scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (>x) xs)
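As a quick check of the definition, here is a minimal sketch that applies scount to two of the earlier examples. The names johnHeight and heightsAfterJohn are ours, introduced only for illustration; the data comes from the height slide.

```haskell
-- Surpasser count of x with respect to the values xs that follow it.
scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- Illustrative data (hypothetical names): John's height and the
-- heights of the people standing after him, taken from the height slide.
johnHeight :: Int
johnHeight = 68

heightsAfterJohn :: [Int]
heightsAfterJohn = [69, 73, 65, 68, 69, 64, 71]

-- Surpasser counts for John and for the first 'G' of "GENERATING".
johnCount, firstGCount :: Int
johnCount   = scount johnHeight heightsAfterJohn   -- 69, 73, 69 and 71 are taller
firstGCount = scount 'G' "ENERATING"               -- N, R, T, I and N come later
```

Loaded into GHCi, johnCount evaluates to 4 and firstGCount to 5. Because scount only needs an Ord instance, the same function works for numbers and for characters.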
13
Solve the Harder Problem
• We need to apply “scount” to each item in the
list, producing a list of numbers; then take the
maximum of the numbers.
14
Invoke “scount” multiple times
"GENERATING"
scount 'G' "ENERATING" = 5
scount 'E' "NERATING"  = 6
scount 'N' "ERATING"   = 2
scount 'E' "RATING"    = 5
scount 'R' "ATING"     = 1
scount 'A' "TING"      = 4
scount 'T' "ING"       = 0
scount 'I' "NG"        = 1
scount 'N' "G"         = 0
scount 'G' ""          = 0
maximum: 6
15
tails
• “tails” is a standard function.
• It takes one argument, a list.
• It returns a list of lists, i.e., a list of all items,
then a list of all items but the first, then a list
of all items but the first and second, etc.
tails "GENERATING"
["GENERATING","ENERATING","NERATING",…,"G",""]
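For a shorter word the full result of tails fits on one line. This is a small sketch using Data.List.tails from the standard library; the name suffixes is ours.

```haskell
import Data.List (tails)

-- Every suffix of "GENE", longest first, ending with the empty string.
suffixes :: [String]
suffixes = tails "GENE"
```

In GHCi, suffixes evaluates to ["GENE","ENE","NE","E",""]. Note that tails always returns one more list than the input has items, because the empty suffix is included.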
16
List Comprehension
• Recall that “scount” takes as arguments a
value, x, and a list consisting of its following
items.
• A list comprehension will be used to provide
the arguments to “scount”:
[scount z zs | z : zs <- tails xs]
“For each list produced by the tails function, take its first
item and the remaining items, and use them as arguments to
the scount function.”
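One detail is easy to miss: tails also produces the empty list, which has no first item. In a list comprehension, a list that does not match the pattern z : zs is simply skipped, so no error occurs. A minimal sketch (the helper name followers is ours, not part of the slides):

```haskell
import Data.List (tails)

-- Pair each item with the items that follow it.
-- The pattern z : zs does not match the empty list, so the final
-- empty suffix produced by tails is silently dropped.
followers :: [a] -> [(a, [a])]
followers xs = [(z, zs) | z : zs <- tails xs]
```

In GHCi, followers "GENE" evaluates to [('G',"ENE"),('E',"NE"),('N',"E"),('E',"")]: four pairs for four letters, each ready to be passed to scount.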
17
Set of surpasser counts
"GENERATING"
[scount z zs | z: zs <- tails ____]
[5,6,2,5,1,4,0,1,0,0]
18
maximum surpasser count (msc)
[5,6,2,5,1,4,0,1,0,0]
maximum ____
6
That’s the answer!
19
msc
• “msc” (maximum surpasser count) is a user-defined function; it is the composition of the
functions shown on the previous two slides.
msc :: Ord a => [a] -> Int
msc xs = maximum [scount z zs | z : zs <- tails xs]
20
Here’s the Solution
import Data.List (tails)

-- msc = maximum surpasser count (the list must be non-empty)
msc :: Ord a => [a] -> Int
msc xs = maximum [scount z zs | z : zs <- tails xs]

-- scount = number of values in xs that surpass x
scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (>x) xs)
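To see this solution at work on the three recurring examples, here is a minimal sketch. The helper scounts and the data names dow and heights are ours; the values are those on the stock market and height slides.

```haskell
import Data.List (tails)

scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- All surpasser counts, in list order (hypothetical helper).
scounts :: Ord a => [a] -> [Int]
scounts xs = [scount z zs | z : zs <- tails xs]

-- Maximum surpasser count; the list must be non-empty.
msc :: Ord a => [a] -> Int
msc = maximum . scounts

-- Closing values from the stock market slide.
dow :: [Int]
dow = [12324, 12214, 12390, 12400, 12367, 12380, 12310, 12330, 12340]

-- Heights (inches) from the people's height slide.
heights :: [Int]
heights = [72, 68, 69, 73, 65, 68, 69, 64, 71]
```

In GHCi, msc dow evaluates to 7 (the second day, 6/2/11), msc heights to 4 (John), and msc "GENERATING" to 6 (the first 'E').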
21
Time Requirements
• With a list of length “n” the msc function shown on the
previous slide takes on the order of n^2 steps.
• Here’s why: recall that n surpasser counts are generated
(see slide 18). To generate the first surpasser count, we
take the first list item and compare it against the
remaining n-1 items. To generate the second surpasser
count, we take the second list item and compare it
against the remaining n-2 items. And so forth. So, the
total number of comparisons is:
(n-1) + (n-2) + … + 1 = n(n-1)/2, i.e., T(n) = O(n^2)
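The comparison count is the sum of the first n-1 positive integers, which can be written out as:

```latex
\[
  (n-1) + (n-2) + \dots + 1 \;=\; \sum_{k=1}^{n-1} k \;=\; \frac{n(n-1)}{2},
  \qquad\text{so}\qquad T(n) = O(n^2).
\]
```

For the 10-letter word "GENERATING", for example, this gives 9 + 8 + … + 1 = 45 comparisons.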
22
Divide and Conquer
Solution
23
The Key Concepts
1. Determine the maximum surpasser count (msc) of list ws.
2. Divide ws into two lists: ws = xs ++ ys
3. Determine the scount of each value in xs and the scount
of each value in ys.
4. Assume that xs and ys are sorted in increasing order and ys
is of length n.
5. x is the first value in xs and it has an scount (within xs)
of c. y is the first value in ys and it has an scount (within
ys) of d. There are two cases to consider:
a) x < y: then the scount of x equals c + n (remember, ys is
sorted, so if x < y then x is less than all n values in ys).
b) x ≥ y: then the scount of y equals d (remember, xs and ys are
sorted, so if x ≥ y then y is no greater than any value in xs or
ys; the values in xs precede y in ws, so none of them is a
surpasser of y).
24
The Simplest Example
GE
Split into xs and ys
G
E
25
GE
('G',0)
('E',0)
The scount of 'G' in xs is zero and
the scount of 'E' in ys is zero.
26
GE
('G',0)
('E',0)
xs is sorted in increasing order
and so is ys. Obviously.
27
GE
('G',0)
('E',0)
Compare 'G' with 'E'.
'G' ≥ 'E' so 'E' must be the smallest
value. Output 'E' then 'G'.
28
GE
('G',0)
('E',0)
('E',0) : ('G',0)
29
GE
('G',0)
('E',0)
('E',0) : ('G',0)
These are the correct surpasser
counts for GE. Furthermore, the
resulting list is sorted!
30
Another Simple Example
NE
Split into xs and ys
N
E
31
NE
('N',0)
('E',0)
The scount of 'N' in xs is zero and
the scount of 'E' in ys is zero.
32
NE
('N',0)
('E',0)
xs is sorted in increasing order
and so is ys. Obviously.
33
NE
('N',0)
('E',0)
Compare 'N' with 'E'.
'N' ≥ 'E' so 'E' must be the smallest value.
Output 'E' then 'N'.
34
NE
('N',0)
('E',0)
('E',0) : ('N',0)
35
NE
('N',0)
('E',0)
('E',0) : ('N',0)
These are the correct surpasser
counts for NE. Furthermore, the
resulting list is sorted!
36
A larger example
GENE
Split into xs and ys
GE
NE
37
GENE
GE
NE
('E',0) : ('G',0)
('E',0) : ('N',0)
The previous slides showed
how to process the two sublists.
38
GENE
GE
NE
('E',0) : ('G',0)
('E',0) : ('N',0)
Compare 'E' with 'E'.
'E' ≥ 'E' so the right 'E' must be the smallest value.
Output 'E' and process the remaining sub-lists.
39
GENE
GE
NE
('E',0) : ('G',0)
('N',0)
Output: ('E', 0)
40
GENE
GE
NE
('E',0) : ('G',0)
('N',0)
Compare 'E' with 'N'.
'E' < 'N' so all the values in ys must be surpassers of 'E'.
Output 'E', but first increment its surpasser count by length ys.
41
GENE
GE
NE
('G',0)
('N',0)
Output: ('E', 0) : ('E', 1)
42
GENE
GE
NE
('G',0)
('N',0)
Compare 'G' with 'N'.
'G' < 'N' so all the values in ys must be surpassers of 'G'.
Output 'G', but first increment its surpasser count by length ys.
43
GENE
GE
NE
""
('N',0)
Output: ('E', 0) : ('E', 1) : ('G', 1)
44
GENE
GE
NE
""
('N',0)
Output 'N'.
45
GENE
GE
NE
""
""
Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)
46
Surpasser Counts
GENE
Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)
let zs = the list of second values in each pair
msc = the maximum of zs
47
Terminology: table
GENE
('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)
The result of processing is a list of pairs. The
second value is the scount of the first value. This
list of pairs is called a "table".
The "table function" takes as its argument a list and
returns a table.
48
Terminology: join
NE
('N',0)
('E',0)
('E',0) : ('N',0)
Processing two sub-lists to create one list is
called "join".
The "join function" takes as its arguments two
lists, xs and ys, and returns a table.
49
Here's how to implement the
table function
-- Precondition: the list is non-empty (table [] would recurse forever).
table :: Ord a => [a] -> [(a,Int)]
table (w:[]) = [(w, 0)]
table ws     = join (table xs) (table ys)
  where m       = length ws
        n       = m `div` 2
        (xs,ys) = splitAt n ws
"Process a list. If there is just one value in the list then its
surpasser count is zero and return a list containing one
pair, where the second value is zero. If there's more than
one value in the list then divide the list in half, into xs and
ys; get the table of xs and the table of ys (i.e., recurse) and
then join those two tables."
50
Here's how to implement the
join function
join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
join [] tys = tys
join txs [] = txs
join xs@((x,c):txxs) ys@((y,d):tyys)
  | x < y  = (x, c + length ys) : join txxs ys
  | x >= y = (y, d) : join xs tyys
"Join two tables, txs and tys. If txs is empty then return tys.
If tys is empty then return txs. Otherwise, compare the first value of
txs with the first value of tys. Specifically, compare the
first value of each pair, (x,c) and (y,d). If x < y then
x's surpasser count is c plus the length of ys (ys is an alias
for the whole second table). If x >= y then y's surpasser count is d.
Join the remaining tables."
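The GENE walk-through from the earlier slides can be replayed with this function. A minimal sketch, assuming the two half-tables shown on those slides (the names tableGE, tableNE and tableGENE are ours):

```haskell
-- join as defined on the slide: merge two tables sorted by value,
-- adding the number of remaining right-hand values to each left-hand
-- count when the left-hand value is output.
join :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
join [] tys = tys
join txs [] = txs
join xs@((x, c) : txxs) ys@((y, d) : tyys)
  | x < y     = (x, c + length ys) : join txxs ys
  | otherwise = (y, d) : join xs tyys

-- Tables for the halves "GE" and "NE", as computed on the slides.
tableGE, tableNE :: [(Char, Int)]
tableGE = [('E', 0), ('G', 0)]
tableNE = [('E', 0), ('N', 0)]

-- Joining them gives the table for "GENE".
tableGENE :: [(Char, Int)]
tableGENE = join tableGE tableNE
```

In GHCi, tableGENE evaluates to [('E',0),('E',1),('G',1),('N',0)], matching the output built up step by step on the slides. The guard otherwise is used here in place of x >= y; the two are equivalent once x < y has failed.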
51
Efficiency improvement
• Each time the join function takes the x < y branch it
computes the length of tys, which takes time proportional
to that length.
• To avoid this repeated work,
invoke join with an additional argument: a
value, n, corresponding to the length of tys.
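One way this could look is sketched below. The names joinN and tableN and the Table type synonym are ours, chosen only to keep this variant apart from the slide's join and table; treat it as a sketch under those assumptions rather than the slides' own code.

```haskell
-- A table pairs each value with its surpasser count, sorted by value.
type Table a = [(a, Int)]

-- joinN n txs tys behaves like join, but requires n == length tys,
-- so the length never has to be recomputed.
joinN :: Ord a => Int -> Table a -> Table a -> Table a
joinN _ txs [] = txs
joinN _ [] tys = tys
joinN n txs@((x, c) : txs') tys@((y, d) : tys')
  | x < y     = (x, c + n) : joinN n txs' tys     -- all n values left in tys surpass x
  | otherwise = (y, d) : joinN (n - 1) txs tys'   -- y leaves tys, so one fewer remains

-- Precondition: the list is non-empty.
tableN :: Ord a => [a] -> Table a
tableN [w] = [(w, 0)]
tableN ws  = joinN (m - n) (tableN xs) (tableN ys)
  where
    m        = length ws
    n        = m `div` 2
    (xs, ys) = splitAt n ws   -- ys has m - n items, and so does its table
```

In GHCi, tableN "GENE" evaluates to [('E',0),('E',1),('G',1),('N',0)], the same table as before. The length is now maintained by subtracting one whenever a value leaves tys, so each step of the join does a constant amount of work.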
52
Here's how to implement msc
msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))
"Invoke the table function with the list, ws. It returns a
list of pairs, (value, surpasser count). Create a list
containing all the surpasser counts. Use map snd to
accomplish this. Now get the largest surpasser count."
53
Here's the complete
implementation
-- Only Prelude functions are used, so no import is needed.

-- msc = maximum surpasser count (the list must be non-empty)
msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))

-- table = list of (value, surpasser count) pairs, sorted by value
table :: Ord a => [a] -> [(a,Int)]
table (w:[]) = [(w, 0)]
table ws     = join (table xs) (table ys)
  where m       = length ws
        n       = m `div` 2
        (xs,ys) = splitAt n ws

-- join = merge two tables, updating the counts of the left-hand values
join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
join [] tys = tys
join txs [] = txs
join xs@((x,c):txxs) ys@((y,d):tyys)
  | x < y  = (x, c + length ys) : join txxs ys
  | x >= y = (y, d) : join xs tyys
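A simple way to gain confidence in the divide and conquer version is to compare it with the first solution on the same inputs. The sketch below does that; the names mscNaive and agree are ours, and mscNaive is just the first solution renamed to avoid a clash.

```haskell
import Data.List (tails)

-- First solution, renamed (hypothetical name) so both can coexist.
mscNaive :: Ord a => [a] -> Int
mscNaive xs = maximum [scount z zs | z : zs <- tails xs]
  where
    scount x ys = length (filter (> x) ys)

-- Divide and conquer solution, as on the slide.
msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))

table :: Ord a => [a] -> [(a, Int)]
table [w] = [(w, 0)]
table ws  = join (table xs) (table ys)
  where
    n        = length ws `div` 2
    (xs, ys) = splitAt n ws

join :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
join [] tys = tys
join txs [] = txs
join xs@((x, c) : txxs) ys@((y, d) : tyys)
  | x < y     = (x, c + length ys) : join txxs ys
  | otherwise = (y, d) : join xs tyys

-- True when both solutions give the same answer for a non-empty list.
agree :: Ord a => [a] -> Bool
agree ws = mscNaive ws == msc ws
```

In GHCi, agree "GENERATING" evaluates to True, and both functions return 6 for that word. Agreement on a few examples is evidence, not proof; a property-testing library could extend the check to many random non-empty lists.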
54
Time Requirements
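• With a list of length “n” the msc function
shown on the previous slide takes on the order
of n log n steps, provided join is given the length
of tys as an argument (the efficiency improvement
described earlier) rather than recomputing it.
That's a lot faster than the n^2 steps of the first
solution, especially for a large list.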
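The n log n figure follows from the usual divide and conquer recurrence. As a sketch, assuming that splitting the list and joining the two tables together take time linear in n (which for join requires that the length of tys is passed in rather than recomputed):

```latex
\[
  T(n) = 2\,T(n/2) + O(n), \qquad T(1) = O(1)
  \quad\Longrightarrow\quad
  T(n) = O(n \log n).
\]
```

The list is halved about log n times, and each level of halving does a total amount of work proportional to n.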
55