How to Catch a Tiger: Understanding Putting Performance on the PGA TOUR

Jason Acimovic, MIT Operations Research Center, acimovic@mit.edu
Douglas Fearing, MIT Operations Research Center, dfearing@mit.edu
Professor Stephen Graves, MIT Sloan School of Management, sgraves@mit.edu
February 19, 2010

Agenda
• Introduction
  – Project Question
  – Applications
  – Approach and contribution
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis

Project Question
• How well do people perform on tasks?
  – Tasks differ from each other
  – Not everyone performs every task
  – Even the same task can be different from person to person

Applications
• Evaluating employees in a distribution center
  – Pickers in a warehouse vary in skill (picks per hour)
  – Pick zones vary in difficulty (books vs. electronics)
  – Difficulty also varies by hour of day and day of week
  – Pickers shift around, but not enough to ensure perfect mixing
  – How do you compensate the best employees and identify underperformers?
• Golf putting
  – Different golfers play different tournaments
  – Greens vary in their difficulty
  – Different golfers start on the green from different distances
  – How do we identify the best putters?

Project approach and contribution
• Develop statistical models to predict strokes-to-go
• Correct for player skill and course difficulty
• Evaluate incremental value of each shot taken relative to the expectation for the field
  – Compare predicted strokes-to-go before and after shot
• Aggregate shot value across players, shot types, etc.
  to better understand player performance
• Compare our model to current metrics, namely, Putting Average
• Paper: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1538300 (or email us)

Agenda
• Introduction
• Golf and data overview
  – Strokes-to-go example
  – ShotLink data
• Putting model
• Off-green model
• Situational analysis

Quick golf primer
• The goal is to get from the tee to the pin in the fewest number of strokes
• 18 holes in a round of golf
• Typically 4 rounds in a tournament
• Lowest total score wins
[Figure: hole diagram labeled Tee, Fairway, Green]

Strokes-to-go example
Shot Location  Strokes-To-Go  Shot Gain
1              4.4            0.4
2              3.0            0.2
3              1.8            0.8
Shot gain from location 1: 4.4 – 3.0 – 1 = 0.4

ShotLink Data
• Every tournament, 250 volunteers gather data on every shot
  – Lasers pinpoint the ball location to within an inch
  – Field volunteers gather qualitative characteristics
• Data is used for both real-time reporting and detailed analyses
• 5 million shot data points
• 2 million putt data points

Visual explanation of ShotLink™ dataset
• Recorded fields: Course, Year, Round Number, Hole Number, Tee Location, Ball Location, Pin Location, Player, Shot Number, Location Type, Ball Lie, Hole Par, Stimp Reading, Green Length
• Locations are given as X, Y, and Z coordinates
[Figure: 16th Hole on Colonial]

Data for the 14th hole at Quail Hollow – 1 day
[Figure: one day of data for the hole; legend: Bunker, Fairway, Green, Rough, Water, Pin]

Agenda
• Introduction
• Golf and data overview
• Putting model
  – Empirical data
  – Two stage model
    • Holing out submodel
    • Distance-to-go submodel
  – Markov chain
  – Correct for hole difficulty and player skill
  – Putts-gained per round and results
• Off-green model
• Situational analysis

Empirical mean and std. dev. of putts-to-go (panels: Mean, Std. Dev.)
[Figure: number of putts vs. putt distance (feet), 0 to 100; empirical data]

Two-stage model to predict putts-to-go
• First stage sub-model
  – From anywhere on the green, the first model predicts the probability of sinking the putt
[Figure callout: probability of 0.1 of making this putt]

Second stage finds conditional distance-to-go
• Second stage sub-model
  – If the golfer misses the putt, the second model calculates the distribution of the distance-to-go for the green
[Figure callout: If I miss, I have a 0.0021 probability of being in this blue area. (Calculate this for the entire green.)]

Combine and …
• We can calculate the putts-to-go distribution from anywhere on the green
• Consider only distance in our model

Empirical probabilities of holing out
[Figure: empirical probability of holing out vs. putt distance (feet), 0 to 100]

Normal regression is inappropriate
• With Ordinary Least Squares regression, one might predict the probability of making a putt based on starting distance:
  $Y = \beta_0 + \beta_1 d$
• But…
  – We want to predict a probability with a range between 0 and 1
  – Errors are not normal

One-putt logistic regression model
• $Y$ – putts-to-go
• $d$ – initial distance to the pin
• Fitted model parameters: $\beta_0, \ldots, \beta_5$
• Probability: $P[Y = 1 \mid d] = \left[1 + \exp(\beta_0 + \beta_1 d + \cdots + \beta_4 d^4 + \beta_5 \log d)\right]^{-1}$

Model holing out as a logistic regression
[Figure: model and empirical probability of holing out vs. putt distance (feet), 0 to 100]

2nd-stage problem, determining distance-to-go
• What happens if we miss the first putt?
[Figure: distance-to-go z after a missed putt]

Empirical mean and std. dev. of distance-to-go (panels: Mean, Std. Dev.)
[Figure: left panel, empirical mean distance-to-go (feet) vs. putt distance (feet), 0 to 100; right panel, empirical standard deviation of distance-to-go (feet) and empirical coefficient of variation vs. putt distance (feet)]

Empirical distributions of distance-to-go
[Figure: empirical probability density of distance-to-go (feet); panels for initial distance = 10 ft and initial distance = 30 ft]

Distance-to-go gamma regression model
• $d$ – initial distance to the pin
• $z$ – distance-to-go (assuming a miss)
• Fitted model parameters: shape ($k$), $\alpha_0, \ldots, \alpha_3$
• Mean: $\mu_d = \exp\{\alpha_0 + \alpha_1 \log d + \alpha_2 d + \alpha_3 d^2\}$
• Density: $f(z \mid d) = \mathrm{gamma}(z; k, \mu_d) = \dfrac{k^k z^{k-1} e^{-zk/\mu_d}}{\Gamma(k)\,\mu_d^k}$

Distance-to-go model: mean and std. dev.
[Figure: model and empirical mean and standard deviation of distance-to-go (feet) vs. putt distance (feet), 0 to 100]

Distance-to-go model distributions (panels: From 10 ft., From 30 ft.)
[Figure: model and empirical probability density of distance-to-go (feet); panels for initial distance = 10 ft and initial distance = 30 ft]

Putts-to-go as Markov chain
[Diagram: states for distances d and z, plus absorbing state H]
• Transition density from $d$ to $z$: $g(z \mid d) = \left(1 - [1 + \exp(\ldots)]^{-1}\right) \times f(z \mid d)$
• Transition probability from $d$ to H: $p = [1 + \exp(\ldots)]^{-1}$
• Probability of holing out in n putts is probability of reaching absorbing state in n transitions
• Where
  – $g(z \mid d)$: probability density of ending up at $z$ conditioned on starting at $d$
  – $f(z \mid d)$: probability density of ending up at $z$ conditioned on missing and starting at $d$ (from the distance-to-go gamma regression model)

Making it within n putts (model prediction)
• Over 90% of golfers 2-putt or better within 35 ft.
• Only a 1.6% chance of 4-putting or worse at 100 ft.
[Figure: Two-Stage Model Within N Putts. Probability vs. putt distance (feet), 0 to 100; model and empirical curves for 1, 2, and 3 putts]

Two-stage model mean and std. dev.
[Figure: model and empirical mean and standard deviation of the number of putts vs. putt distance (feet), 0 to 100]

Comparing putt quality
• Greens vary in difficulty
  – Fast vs. slow greens
  – Type and length of grass
• Good putts on a hard green should be valued more than the same putts on an easy green
• Adjust the parameters of the logistic and gamma regression models for each hole

Revised logistic and gamma regressions
• Every player p and hole h have their own dummy variables and specific holing-out probabilities*
  $P(Y_i = 1) = \left[1 + \exp\left\{\beta_0 + \beta_1 d + \cdots + \beta_4 d^4 + \beta_5 \log d + \sum_p I_p(\beta_{0p} + \beta_{1p} d) + \sum_h I_h \beta_{0h}\right\}\right]^{-1}$
  – $I_p$ is the indicator variable: it equals 1 if observation i contains player p and is zero otherwise.
  – Instead of a regression with 6 parameters, we now have thousands of parameters
    • E.g., there is a $\beta_{0h}$ parameter for every hole
• The gamma regression is adjusted similarly
*The actual analysis accounts for the number of observations per player and per hole, so that the model is more complex for players about whom we know more.

Visualizing player skill level differences
• Comparison of above average (Brent Geiberger), below average (John Huston), and field average putter for an average green

Visualizing green difficulty differences
• Comparison of an easy green (Bay Hill #9), difficult green (Sawgrass #1), and average green based on a field average golfer

Calculating putts gained per round
• Calculate the gain associated with each putt
  – Relative to the putts-to-go for each specific hole
  – Example: Golfer starts at 12 ft. and takes 2 putts to sink ball
    • Expected putts-to-go: 1.71
    • Actual number of putts: 2
    • Relative gain: 1.71 – 2 = -0.29
• Sum the relative gains for each player
• Divide by the number of rounds played
[Figure: 12 feet, 1.71 putts to go]

Top 10 putts gained per round
Rank  Golfer            Putts Gained / Round  Number of Rounds  Putts Gained / Round Stdev
1     Tiger Woods       0.69                  230               0.12
2     David Frost       0.67                  113               0.16
3     Fredrik Jacobson  0.56                  248               0.11
4     Nathan Green      0.55                  197               0.12
5     Aaron Baddeley    0.53                  303               0.10
6     Jesper Parnevik   0.50                  315               0.10
7     Stewart Cink      0.49                  375               0.09
8     Darren Clarke     0.45                  107               0.17
9     Ben Crane         0.44                  273               0.11
10    Willie Wood       0.42                  72                0.20

Putting average is the most popular metric today
• Putting Average
  – Average number of putts per green*
• When a golfer reaches a green
  – Count the putts it takes to get it in the hole
  – Average this among all his green appearances
  – Regardless of how close he starts on the green
*Actually, a green in regulation, which means the green was reached in no more than (par – 2) strokes

Comparing with putting average
Golfer            Putts Gained / Round  PG/R Rank  Putting Average  PA Rank
Tiger Woods       0.69                  1          1.71             1
David Frost       0.67                  2          1.77             60
Fredrik Jacobson  0.56                  3          1.74             4
Nathan Green      0.55                  4          1.74             5
Aaron Baddeley    0.53                  5          1.74             3
Jesper Parnevik   0.50                  6          1.76             47
Stewart Cink      0.49                  7          1.75             12
Darren Clarke     0.45                  8          1.75             19
Ben Crane         0.44                  9          1.75             17
Willie Wood       0.42                  10         1.77             92

Understanding the discrepancies
• Insert first-putt distance histograms for most severe outlier.
PG/R Percentile  Golfer          Putts Gained / Round  Putting Average  PA Percentile
9th              Stephen Leaney  0.26                  1.79             59th
88th             Ernie Els       -0.63                 1.75             5th
Percentage of 1st putts 20 ft. or closer
• 54% for All Players
• 51% for Stephen Leaney
• 60% for Ernie Els
On average, Els starts closer to the hole, so his putting average is flattered by his excellent approach shots

Agenda
• Introduction
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis

Evaluating off-green performance
• For each hole, calculate “field par”
  – Empirical average number of strokes corrected for player skill and hole difficulty
• Calculate total strokes gained per round for each player
• Calculate off-green strokes gained per round
  (Off-green strokes gained = Total strokes gained – putts gained)

Top 10 golfers (on and off green performance)
Rank  Golfer          Putts Gained / Round  Off-Green Gain / Round  Total
1     Tiger Woods     0.69                  2.53                    3.22
2     Vijay Singh     -0.36                 2.65                    2.29
3     Jim Furyk       0.00                  2.03                    2.03
4     Phil Mickelson  0.19                  1.74                    1.94
5     Ernie Els       -0.63                 2.48                    1.85
6     Adam Scott      0.08                  1.69                    1.77
7     Sergio Garcia   -0.67                 2.20                    1.52
8     David Toms      0.16                  1.27                    1.43
9     Retief Goosen   -0.44                 1.84                    1.40
10    Stewart Cink    0.49                  0.89                    1.39

Agenda
• Introduction
• Golf and data overview
• Putting model
• Off-green model
• Situational analysis
  – Player specific putts
  – Fourth round pressure
  – Tiger Woods’ fourth round performance

Situational putting performance
• Above, we used the
  general putting model to evaluate putting relative to the field of professionals
• We can also evaluate a golfer’s putting relative to his own expected performance
• For instance, even if Tiger Woods usually putts better than the field, we can also determine when he putts worse than his own expectation
  – Does he putt better or worse after the cut?
  – Does he putt better or worse for birdie vs. for par?

Player-specific putts gained – example
• On the 10th green at Quail Hollow, 9 feet from the pin:
  – Tiger Woods’ personal expected putts-to-go is 1.54
  – Vijay Singh’s personal expected putts-to-go is 1.59
  – If they each sink it, Tiger gains only 0.54 strokes whereas Vijay gains 0.59 strokes
[Figure: two putts from 9 ft; Vijay: E[putts] = 1.59, Tiger: E[putts] = 1.54]

Advantages of player-specific putts gained
• Easy to test various hypotheses
  – After calculating the shot value for every putt, we need only to filter and aggregate the results
• Describes the magnitude in terms of score impact
• Suggests areas for further investigation
  – Standard deviation of putts gained provides the relative significance of the effect

Fourth round pressure
• Putting does not seem to be affected by the pressures of being in the fourth round
Round       Putt Count  Putts Gained Per Putt  Putts Gained Per Putt Deviation
3rd Round   359,079     0.00237                0.00027
4th Round   353,979     0.00246                0.00027
Difference              0.00009                0.00038

Tiger Woods’ fourth round performance
• A common perception is that Tiger has the ability to kick it up a notch during the final round
• Looking at his putts-gained suggests otherwise
Round       Putt Count  Putts Gained Per Putt  Putts Gained Per Putt Deviation
1st Round   1,614       0.00036                0.00386
2nd Round   1,589       0.00847                0.00395
3rd Round   1,654       -0.00293               0.00375
4th Round   1,671       -0.00022               0.00380

Conclusion
• Developed a model for putting
  – Corrected for player skill and hole difficulty
  – Intuitive model that describes how putts
    occur
• Demonstrated the differences between our metric and current putting statistics
• Developed a “field par” which corrects for hole difficulty and quality of field
• Compared on- and off-green performance
• Examined situational putting performance
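
Worked example: shot gain arithmetic

The strokes-to-go example and the 12 ft. putt example earlier in the deck both reduce to the same subtraction, sketched here in Python. This is only an illustration, not the authors' code: the function names `shot_gains` and `putts_gained` are hypothetical, the inputs are the numbers shown on the slides, and the third shot is assumed to hole out (strokes-to-go of 0 afterwards), which is what the slide's gain of 0.8 implies.

```python
from typing import List


def shot_gains(strokes_to_go: List[float]) -> List[float]:
    """Gain of each shot: strokes-to-go before, minus strokes-to-go after, minus 1.

    strokes_to_go[i] is the expected strokes-to-go at the location the
    i-th shot is played from. Assumption: the last shot holes out, so the
    strokes-to-go after it is 0.
    """
    after = strokes_to_go[1:] + [0.0]
    return [before - nxt - 1.0 for before, nxt in zip(strokes_to_go, after)]


def putts_gained(expected_putts_to_go: float, actual_putts: int) -> float:
    """Relative gain on one green: expected putts-to-go minus putts taken."""
    return expected_putts_to_go - actual_putts


# Strokes-to-go example: locations with 4.4, 3.0, and 1.8 strokes-to-go.
gains = shot_gains([4.4, 3.0, 1.8])
print([round(g, 2) for g in gains])       # [0.4, 0.2, 0.8]

# Putting example: 12 ft., 1.71 expected putts-to-go, 2 putts taken.
print(round(putts_gained(1.71, 2), 2))    # -0.29
```

Because the per-shot gains telescope, they sum to the initial strokes-to-go minus the strokes actually taken (4.4 – 3 = 1.4 here), which is why summing shot values over a round gives a total gain relative to the field expectation.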