How to Catch a Tiger:
Understanding Putting Performance on the PGA TOUR
Jason Acimovic
MIT Operations Research Center, acimovic@mit.edu
Douglas Fearing
MIT Operations Research Center, dfearing@mit.edu
Professor Stephen Graves
MIT Sloan School of Management, sgraves@mit.edu
February 19, 2010
Agenda
• Introduction
– Project Question
– Applications
– Approach and contribution
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis
Project Question
• How well do people perform on tasks?
– Tasks differ from each other
– Not everyone performs every task
– Even the same task can be different from person to person
Applications
• Evaluating employees in a distribution center
– Pickers in a warehouse vary in skill (picks per hour)
– Pick zones vary in difficulty (books vs. electronics)
– Difficulty also varies by hour of day and day of week
– Pickers shift around, but not enough to ensure perfect mixing
– How do you compensate the best employees and identify underperformers?
• Golf putting
– Different golfers play different tournaments
– Greens vary in their difficulty
– Different golfers start on the green from different distances
– How do we identify the best putters?
Project approach and contribution
• Develop statistical models to predict strokes-to-go
• Correct for player skill and course difficulty
• Evaluate incremental value of each shot taken relative to
the expectation for the field
– Compare predicted strokes-to-go before and after shot
• Aggregate shot value across players, shot types, etc. to
better understand player performance
• Compare our model to current metrics, namely, Putting
Average
• Paper: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1538300 (or email us)
Agenda
• Introduction
• Golf and data overview
– Strokes-to-go example
– ShotLink data
• Putting model
• Off-green model
• Situational analysis
Quick golf primer
• The goal is to get from the tee to the pin in the fewest number
of strokes
• 18 holes in a round of golf
• Typically 4 rounds in a tournament
• Lowest total score wins
[Figure: a golf hole with the Tee, Fairway, and Green labeled]
Strokes-to-go example
Shot Location   Strokes-To-Go   Shot Gain
1               4.4             0.4
2               3.0             0.2
3               1.8             0.8
Shot gain for shot 1: 4.4 – 3.0 – 1 = 0.4
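The arithmetic on this slide can be sketched in a few lines of Python. The function name `shot_gains` is a hypothetical helper, not something from the talk; it assumes strokes-to-go is 0 once the ball is holed, which is what the third row of the example implies (1.8 – 0 – 1 = 0.8).

```python
def shot_gains(strokes_to_go):
    """Gain of each shot = strokes-to-go before - strokes-to-go after - 1."""
    # After the final shot the ball is in the hole, so strokes-to-go is 0.
    after = strokes_to_go[1:] + [0.0]
    return [round(before - nxt - 1, 2) for before, nxt in zip(strokes_to_go, after)]


# Strokes-to-go values from the slide's three-shot example.
gains = shot_gains([4.4, 3.0, 1.8])
print(gains)
```

The gains sum to 4.4 – 3 = 1.4, that is, the initial expectation minus the strokes actually taken.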
ShotLink Data
• Every tournament, 250 volunteers gather data on every
shot
– Lasers pinpoint the ball location to within an inch
– Field volunteers gather qualitative characteristics
• Data are used for both real-time reporting and
detailed analyses
• 5 million shot data points
• 2 million putt data points
Visual explanation of ShotLink™ dataset
[Figure: 16th hole at Colonial, annotated with X, Y, and Z coordinates]
Course
Year
Round Number
Hole Number
Tee Location
Ball Location
Pin Location
Player
Shot Number
Location Type
Ball Lie
Hole Par
Stimp Reading
Green Length
Data for the 14th hole at Quail Hollow – 1 day
[Figure: one day of shot data on the 14th hole at Quail Hollow; legend: Bunker, Fairway, Green, Rough, Water, Pin]
Agenda
• Introduction
• Golf and data overview
• Putting model
– Empirical data
– Two stage model
• Holing out submodel
• Distance-to-go submodel
– Markov chain
– Correct for hole difficulty and player skill
– Putts-gained per round and results
• Off-green model
• Situational analysis
Empirical mean and std. dev. of putts-to-go
[Figure: two panels, Mean and Std. Dev., plotting number of putts against putt distance (feet, 0 to 100); series: Empirical]
Two-stage model to predict putts-to-go
• First stage sub-model
– From anywhere on the green, the first model predicts the
probability of sinking the putt
[Figure callout: probability of 0.1 of making this putt]
Second stage finds conditional distance-to-go
• Second stage sub-model
– If the golfer misses the putt, the second model calculates
the distribution of the distance-to-go for the green
[Figure callout: if I miss, I have a 0.0021 probability of ending up in this blue area (calculate this for the entire green)]
Combine and …
• We can calculate the putts-to-go distribution from
anywhere on the green
[Figure callout: consider only distance in our model]
Empirical probabilities of holing out
[Figure: empirical probability of holing out vs. putt distance (feet, 0 to 100)]
Normal regression is inappropriate
• With ordinary least squares regression, one might
predict the probability of making a putt based on
starting distance:
$Y = \beta_0 + \beta_1 d$
• But…
– We want to predict a probability, which must lie between 0 and 1
– Errors are not normal
One-putt logistic regression model
• Y – putts-to-go
• d – initial distance to the pin
• Fitted model parameters: $\beta_0, \ldots, \beta_5$
• Probability:
$P[Y = 1 \mid d] = \left[ 1 + \exp\left( \beta_0 + \beta_1 d + \cdots + \beta_4 d^4 + \beta_5 \log d \right) \right]^{-1}$
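As a rough illustration of the one-putt logistic form, the sketch below evaluates the holing-out probability for a given distance. The coefficient values in `BETA` are made up for illustration only (the fitted values are not given on the slide), and the function name `p_hole_out` is hypothetical.

```python
import math

# Hypothetical coefficients (beta_0 .. beta_5); NOT the fitted values from the talk.
BETA = (-3.0, 0.25, 0.0, 0.0, 0.0, 0.6)


def p_hole_out(d, beta=BETA):
    """P[Y = 1 | d] = 1 / (1 + exp(b0 + b1*d + ... + b4*d^4 + b5*log d)), d in feet."""
    poly = sum(beta[j] * d ** j for j in range(5))   # b0 + b1*d + ... + b4*d^4
    return 1.0 / (1.0 + math.exp(poly + beta[5] * math.log(d)))


for d in (3.0, 10.0, 30.0):
    print(d, p_hole_out(d))
```

Unlike a straight-line fit, this form always returns a value strictly between 0 and 1, which is the property the preceding slide asks for.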
Model holing out as a logistic regression
[Figure: model probability of holing out vs. putt distance (feet, 0 to 100); series: Empirical, Model]
2nd-stage problem, determining distance-to-go
• What happens if we miss the first putt?
[Figure: distance-to-go z after a missed first putt]
Empirical mean and std. dev. of distance-to-go
[Figure: two panels vs. putt distance (feet, 0 to 100). Mean: empirical distance-to-go (feet). Std. Dev.: empirical standard deviation of distance-to-go (feet) and empirical coefficient of variation]
Empirical distributions of distance-to-go
[Figure: empirical probability density of distance-to-go (feet, 0 to 10); panels: From 10 ft. (initial distance = 10ft) and From 30 ft. (initial distance = 30ft)]
Distance-to-go gamma regression model
• d – initial distance to the pin
• z – distance-to-go (assuming a miss)
• Fitted model parameters: shape $k$, $\beta_0, \ldots, \beta_3$
• Mean: $\mu_d = \exp\{ \beta_0 + \beta_1 \log d + \beta_2 d + \beta_3 d^2 \}$
• Density:
$f(z \mid d) = \mathrm{gamma}(z; k, \mu_d) = \dfrac{k^k z^{k-1} e^{-zk/\mu_d}}{\Gamma(k)\, \mu_d^k}$
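The gamma density parameterized by shape and mean can be written out directly with the standard library. This is a minimal sketch: the function names are hypothetical, and the coefficients passed to `mean_distance_to_go` below are placeholders rather than fitted values.

```python
import math


def gamma_density(z, k, mu):
    """Gamma density with shape k and mean mu (i.e. scale mu / k), for z > 0."""
    # Work in logs for numerical stability: k^k z^(k-1) e^(-zk/mu) / (Gamma(k) mu^k).
    log_f = (k * math.log(k) + (k - 1) * math.log(z) - z * k / mu
             - math.lgamma(k) - k * math.log(mu))
    return math.exp(log_f)


def mean_distance_to_go(d, beta):
    """mu_d = exp(b0 + b1*log d + b2*d + b3*d^2), with d in feet."""
    return math.exp(beta[0] + beta[1] * math.log(d) + beta[2] * d + beta[3] * d * d)


# Placeholder coefficients, for illustration only.
mu_30 = mean_distance_to_go(30.0, (-1.0, 0.7, 0.0, 0.0))
print(mu_30, gamma_density(2.0, 2.0, mu_30))
```

Parameterizing by the mean rather than the scale lets the regression act on the mean distance-to-go while a single shape parameter controls the spread.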
Distance-to-go model: mean and std. dev.
[Figure: two panels, Mean and Std. Dev. of distance-to-go (feet) vs. putt distance (feet, 0 to 100); series: Empirical, Model]
Distance-to-go model distributions
[Figure: probability density of distance-to-go (feet, 0 to 10); panels: From 10 ft. (initial distance = 10ft) and From 30 ft. (initial distance = 30ft); series: Empirical, Model]
Putts-to-go as Markov chain
$p = [1 + \exp(…)]^{-1}$
$g(z \mid d) = \left( 1 - [1 + \exp(…)]^{-1} \right) \times f(z \mid d)$
Probability of holing out in n putts is the probability of reaching
the absorbing state H in n transitions
Where
g(z|d): probability density of ending up at z conditioned on starting at d
f(z|d): probability density of ending up at z conditioned on missing and starting at d
(from the distance-to-go gamma regression model)
[Figure: transition diagram over distance, from d to z or to the absorbing state H]
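One way to turn the Markov chain into numbers is to discretize distance into cells and iterate the "holed out within n putts" probability. The sketch below does that under several assumptions that are not from the talk: the coefficients in `p_hole` and `mean_miss`, the shape `K`, and the 0.5 ft grid are all hypothetical, and the miss density is renormalized over the grid.

```python
import math


# All coefficients below are hypothetical, chosen only to make the sketch run.
def p_hole(d):
    """Probability of holing out from d feet (logistic form)."""
    return 1.0 / (1.0 + math.exp(-3.0 + 0.25 * d + 0.6 * math.log(d)))


def mean_miss(d):
    """Mean distance-to-go after a miss from d feet."""
    return math.exp(-0.5 + 0.5 * math.log(d))


def gamma_pdf(z, k, mu):
    return math.exp(k * math.log(k) + (k - 1) * math.log(z) - z * k / mu
                    - math.lgamma(k) - k * math.log(mu))


STEP, K = 0.5, 2.0
GRID = [STEP * (i + 0.5) for i in range(200)]   # cell midpoints covering 0 to 100 ft


def within_n_putts(n_max):
    """rows[n][i] = P(holed out within n putts | start in distance cell i)."""
    trans = []
    for d in GRID:
        weights = [gamma_pdf(z, K, mean_miss(d)) for z in GRID]
        total = sum(weights)
        # Discretized g(z|d): miss probability times normalized miss density.
        trans.append([(1.0 - p_hole(d)) * w / total for w in weights])
    rows = [[0.0] * len(GRID)]                  # zero putts: never holed out
    for _ in range(n_max):
        prev = rows[-1]
        rows.append([p_hole(d) + sum(t * v for t, v in zip(trans[i], prev))
                     for i, d in enumerate(GRID)])
    return rows


rows = within_n_putts(3)
print(rows[2][60])   # two-putt-or-better probability from about 30 ft
```

Each pass adds one more transition, so `rows[n]` is the probability of having reached the absorbing state H within n putts.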
Making it within n putts (model prediction)
• Over 90% of golfers 2-putt or better within 35 ft.
• Only a 1.6% chance of 4-putting or worse at 100 ft.
[Figure: Two-Stage Model Within N Putts. Probability vs. putt distance (feet, 0 to 100); series: Model and Empirical for 1 Putt, 2 Putts, and 3 Putts]
Two-stage model mean and std. dev.
[Figure: two panels, Mean and Std. Dev. of number of putts vs. putt distance (feet, 0 to 100); series: Empirical, Model]
Comparing putt quality
• Greens vary in difficulty
– Fast vs. slow greens
– Type and length of grass
• Good putts on a hard green should be valued more
than the same on an easy green
• Adjust the parameters of the logistic and gamma
regression models for each hole
Revised logistic and gamma regressions
• Every player p and hole h have their own dummy
variables and specific holing-out probabilities*
$P(Y_i = 1) = \left[ 1 + \exp\left\{ \beta_0 + \beta_1 d + \cdots + \beta_4 d^4 + \beta_5 \log d + \sum_p I_p (\beta_{0p} + \beta_{1p} d) + \sum_h I_h \beta_{0h} \right\} \right]^{-1}$
– I_p is the indicator variable: it equals 1 if observation
i contains player p and 0 otherwise.
– Instead of a regression with 6 parameters, we now have
thousands of parameters
• E.g., there is a β0h parameter for every hole
• The gamma regression is adjusted similarly
*The actual analysis accounts for the number of observations per player and per hole, so that the model is more
complex for players about whom we know more.
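Because exactly one player indicator and one hole indicator are switched on for any observation, the sums collapse to a single player term and a single hole term. The sketch below shows that collapsed form; the function name, argument names, and all coefficient values are hypothetical.

```python
import math


def p_hole_out_adjusted(d, beta, player_adj=(0.0, 0.0), hole_adj=0.0):
    """Baseline polynomial plus player terms (b0p + b1p*d) and a hole term b0h."""
    base = sum(beta[j] * d ** j for j in range(5)) + beta[5] * math.log(d)
    b0p, b1p = player_adj
    return 1.0 / (1.0 + math.exp(base + b0p + b1p * d + hole_adj))


# Hypothetical baseline coefficients and adjustments, for illustration only.
beta = (-3.0, 0.25, 0.0, 0.0, 0.0, 0.6)
print(p_hole_out_adjusted(10.0, beta))                  # field-average putter, average hole
print(p_hole_out_adjusted(10.0, beta, hole_adj=0.3))    # same putter, harder hole
```

With this sign convention a positive adjustment lowers the holing-out probability, so a difficult green would carry a positive hole term and a strong putter a negative player term.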
Visualizing player skill level differences
• Comparison of above average (Brent Geiberger),
below average (John Huston), and field average
putter for an average green
Visualizing green difficulty differences
• Comparison of an easy green (Bay Hill #9), difficult
green (Sawgrass #1), and average green based on a
field average golfer
Calculating putts gained per round
• Calculate the gain associated with each putt
– Relative to the putts-to-go for each specific hole
– Example: Golfer starts at 12 ft. and takes 2 putts to sink
the ball
• Expected putts-to-go: 1.71
• Actual number of putts: 2
• Relative gain: 1.71 – 2 = –0.29
• Sum the relative gains for each player
• Divide by the number of rounds played
[Figure callout: 12 feet, 1.71 putts to go]
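The aggregation described on this slide can be sketched as follows. The function name and the record layout (one pair per green: expected putts-to-go from the first putt, and putts actually taken) are assumptions made for illustration.

```python
def putts_gained_per_round(putt_records, rounds_played):
    """putt_records: (expected putts-to-go, actual putts) for each green played."""
    total_gain = sum(expected - actual for expected, actual in putt_records)
    return total_gain / rounds_played


# The slide's example: 1.71 expected putts from 12 ft., 2 putts taken.
example = putts_gained_per_round([(1.71, 2)], rounds_played=1)
print(round(example, 2))
```

A negative value means the golfer took more putts than the hole-adjusted expectation.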
Top 10 putts gained per round
Rank  Golfer            Putts Gained / Round  Number of Rounds  Putts Gained / Round Stdev
1     Tiger Woods       0.69                  230               0.12
2     David Frost       0.67                  113               0.16
3     Fredrik Jacobson  0.56                  248               0.11
4     Nathan Green      0.55                  197               0.12
5     Aaron Baddeley    0.53                  303               0.10
6     Jesper Parnevik   0.50                  315               0.10
7     Stewart Cink      0.49                  375               0.09
8     Darren Clarke     0.45                  107               0.17
9     Ben Crane         0.44                  273               0.11
10    Willie Wood       0.42                  72                0.20
Putting average is the most popular metric today
• Putting Average
– Average number of putts per green*
• When a golfer reaches a green
– Count the putts it takes to get it in the hole
– Average this among all his green appearances
– Regardless of how close he starts on the green
*Strictly, per green in regulation, which means the green was reached in no more than (par – 2) strokes
Comparing with putting average
Golfer            Putts Gained / Round  PG/R Rank  Putting Average  PA Rank
Tiger Woods       0.69                  1          1.71             1
David Frost       0.67                  2          1.77             60
Fredrik Jacobson  0.56                  3          1.74             4
Nathan Green      0.55                  4          1.74             5
Aaron Baddeley    0.53                  5          1.74             3
Jesper Parnevik   0.50                  6          1.76             47
Stewart Cink      0.49                  7          1.75             12
Darren Clarke     0.45                  8          1.75             19
Ben Crane         0.44                  9          1.75             17
Willie Wood       0.42                  10         1.77             92
Understanding the discrepancies
• Insert first-putt distance histograms for most severe outlier.
Golfer          Putts Gained / Round  PG/R Percentile  Putting Average  PA Percentile
Stephen Leaney  0.26                  9th              1.79             59th
Ernie Els       -0.63                 88th             1.75             5th
Percentage of 1st putts 20 ft. or closer
• 54% for All Players
• 51% for Stephen Leaney
• 60% for Ernie Els
[Callout (Ernie Els): On average he starts closer to the hole, so his putting average is inflated by his excellent approach shots]
Agenda
• Introduction
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis
Evaluating off-green performance
• For each hole, calculate “field par”
– Empirical average number of strokes corrected for player
skill and hole difficulty
• Calculate total strokes gained per round for each
player
• Calculate off-green strokes gained per round
(Off-green strokes gained = Total strokes gained – putts gained)
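A minimal sketch of the two calculations on this slide, assuming "field par" is already available per hole. The function names and the per-hole numbers in the first call are hypothetical; the second call uses the per-round figures the deck reports for Tiger Woods (3.22 total, 0.69 putts gained).

```python
def total_strokes_gained_per_round(holes, rounds_played):
    """holes: (field par, actual strokes) for each hole played."""
    return sum(field_par - strokes for field_par, strokes in holes) / rounds_played


def off_green_gain(total_gain, putts_gain):
    """Off-green strokes gained = total strokes gained - putts gained."""
    return total_gain - putts_gain


# Hypothetical field pars and scores for three holes of one round.
print(total_strokes_gained_per_round([(4.1, 4), (3.2, 3), (4.9, 5)], rounds_played=1))

# Per-round values reported in the deck for Tiger Woods.
print(round(off_green_gain(3.22, 0.69), 2))
```

Defining the off-green gain as a residual means no separate off-green shot model is needed, only the putting model and the field par.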
Top 10 golfers (on and off green performance)
Rank  Golfer          Putts Gained / Round  Off-Green Gain / Round  Total
1     Tiger Woods     0.69                  2.53                    3.22
2     Vijay Singh     -0.36                 2.65                    2.29
3     Jim Furyk       0.00                  2.03                    2.03
4     Phil Mickelson  0.19                  1.74                    1.94
5     Ernie Els       -0.63                 2.48                    1.85
6     Adam Scott      0.08                  1.69                    1.77
7     Sergio Garcia   -0.67                 2.20                    1.52
8     David Toms      0.16                  1.27                    1.43
9     Retief Goosen   -0.44                 1.84                    1.40
10    Stewart Cink    0.49                  0.89                    1.39
Agenda
• Introduction
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis
– Player specific putts
– Fourth round pressure
– Tiger Woods’ fourth round performance
Situational putting performance
• Above, we used the general putting model to evaluate
putting relative to the field of professionals
• We also have the capability to evaluate a golfer’s
putting relative to his own expected performance
• For instance, even if Tiger Woods usually putts better
than the field, we can also determine when he putts
worse than himself
– Does he putt better or worse after the cut?
– Does he putt better or worse for birdie vs. for par?
Player-specific putts gained – example
• On the 10th green at Quail Hollow, 9 feet from the pin:
– Tiger Woods’ personal expected putts-to-go is 1.54
– Vijay Singh’s personal expected putts-to-go is 1.59
– If they each sink it, Tiger gains only 0.54 strokes whereas
Vijay gains 0.59 strokes
[Figure callouts at 9 ft: Vijay: E[putts] = 1.59; Tiger: E[putts] = 1.54]
Advantages of player-specific putts gained
• Easy to test various hypotheses
– After calculating the shot value for every putt, we need
only to filter and aggregate the results
• Describes the magnitude in terms of score impact
• Suggests areas for further investigation
– Standard deviation of putts gained provides the relative
significance of the effect
Fourth round pressure
• Putting does not seem to be affected by the pressures
of being in the fourth round
            Putt Count  Putts Gained Per Putt  Putts Gained Per Putt Deviation
3rd Round   359,079     0.00237                0.00027
4th Round   353,979     0.00246                0.00027
Difference              0.00009                0.00038
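The "Difference" row can be reproduced with a short calculation. This sketch assumes the "Deviation" column is the standard error of each round's mean and that the two rounds are independent, so the errors combine in quadrature; neither assumption is stated on the slide, and the function name is hypothetical.

```python
import math


def difference_z(mean_a, se_a, mean_b, se_b):
    """Difference of two means, its standard error, and the resulting z-score."""
    diff = mean_b - mean_a
    se = math.sqrt(se_a ** 2 + se_b ** 2)   # assumes independent estimates
    return diff, se, diff / se


# 3rd-round and 4th-round values from the table.
diff, se, z = difference_z(0.00237, 0.00027, 0.00246, 0.00027)
print(round(diff, 5), round(se, 5), round(z, 2))
```

The difference is a small fraction of its standard error, which is consistent with the slide's reading that fourth-round pressure has no detectable effect on putting.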
Tiger Woods’ fourth round performance
• A common perception is that Tiger has the ability to
kick it up a notch during the final round
• Looking at his putts-gained suggests otherwise
            Putt Count  Putts Gained Per Putt  Putts Gained Per Putt Deviation
1st Round   1,614       0.00036                0.00386
2nd Round   1,589       0.00847                0.00395
3rd Round   1,654       -0.00293               0.00375
4th Round   1,671       -0.00022               0.00380
Conclusion
• Developed a model for putting
– Corrected for player skill and hole difficulty
– Intuitive model that describes how putts occur
• Demonstrated the differences between our metric and
current putting statistics
• Developed a “field par” which corrects for hole
difficulty and quality of field
• Compared on- and off-green performance
• Examined situational putting performance