Baseball Enigma: In Quest of An Optimal Batting Order
Palisade User Conference - 2007
Miami, Florida USA
© 2007 Analytical Advantages, LLC, all rights reserved

Time Line – Exciting Journey!
• 2004: Optimal Batting Order Model
• 2005: “Can you Predict Who wins”
• 2005: Build Baseball Gambling Model
• 2006: Beta Test baseball gambling Model
• 2006: Palisade User Conference
• January 2007: First Meeting Boston Red Sox
• June 2007: First Meeting New York Yankees
• October 2007: Palisade User Conference

Agenda – Kind of?
• Traditions
• Good batting order characteristics
• Nature of game
• Batter vs. Pitcher
• Field of play
• Umpire bias

Traditional Thinking
• “…the leadoff man should be a good base stealer.”
• “…number two should be a contact hitter who can hit behind a runner.”
• “…bat your best hitter third.”

Characteristics of a Good Batting Order to Consider
• Runs scored from those on base
• Expected Run Production vs. Actual
• Runs Scored Consistency

Runs are the Currency of Baseball
It’s simple – the more runs you score, the greater the likelihood you’ll win.
[Chart: Probability of Winning (0% to 100%) by Runs/Game (1 to 14). Source: MLB.COM 2003-2005 AL Scores]

When on Base Do You Score?
[Chart: Scoring When On Base. Y-axis: Percent Scoring (31% to 41%). Source: MLB.COM. Tabulated by: Universal Analytics, llc]

Scoring Expectations: Doesn’t Mean Winning
[Chart: Expected Runs per Game. Y-axis: Expected Runs/Game (4.00 to 5.75). Source: MLB, Baseball Prospectus, CBSSportline.com, 2007 Season. Tabulated by: Analytical Advantages, llc]

Scoring Effectiveness: Actual - Expected Runs / Game (essential to winning)
[Chart: Performance Variation (O/U_runs/game). Y-axis: -0.75 to 0.75. Source: MLB, Baseball Prospectus, CBSSportline.com, 2007 Season. Tabulated by: Analytical Advantages, llc]

Winning Tied to Effective Scoring
[Chart: Winning as a Function of Run Scoring Efficiency. X-axis: Runs/OB (0.30 to 0.44). Y-axis: %Win (20% to 80%). R² = 0.51. Source: MLB, Baseball Prospectus, Team Offensive Statistics, 1996 - 2007 Seasons. Tabulated by: Analytical Advantages, llc]

Dichotomous Objectives!
• Maximize the Expected Value of Runs Scored for Each and Every Game
• Consistency (small variability)
• Not the same for all games (teams)!

Fundamentals of The Game
• Runs
• Pitcher vs. batters
• Field of play (layout, surface and game temperature)
• Umpire characteristics
• Understanding performance

Scoring Profile: Interdependencies (Building Blocks of Scoring)
Runs are the result of multiple factors
[Diagram: Batter, Pitcher, Park and Ump all feeding into RUNS]

Batter vs. Pitcher: The Fundamental Conflict
Runs per Plate Appearance – Measurement Foundation
[Chart: Runs / Plate Appearance (0.100 to 0.150) by Winning Percentage (35% to 65%). Series: Pitchers, Batters. Source: MLB.COM (2006 Season). Tabulated by: Analytical Advantage, llc]

Park Physical Layouts Vary
Ability to Score Dependent Upon Field Configuration
Infield Layout: Only Consistency
[Figure: park outlines for Yankee Stadium (New York), Fenway Park, Skydome (Toronto), Camden Yards (Baltimore), Tropicana Field (Tampa Bay), Wrigley Field (Chicago), Minute Maid Park (Houston)]

Batters Favored on Artificial Surface
Hitting Performance Improved on Artificial Surface
[Chart: Performance Factor (Avg. = 1.00) by Type of Hit (Singles, Doubles, Triples). Series: Artificial, Grass. Data labels shown: 29%, 10%, 7%]

Ballpark Fair Territory: Yankee Stadium Among Smaller Fields in Baseball
[Chart: Area by Field. Y-axis: Square Feet (90,000 to 106,000). Source: MLB.COM. Tabulated by: Universal Analytics, llc]

Expected Runs Vary With Park
[Chart: Run Index (0.80 to 1.40) by Venue. Source: MLB.COM. Tabulated by: Universal Analytics, llc]

Quantifying Umpire Bias
[Chart: Comparison of Input Distribution and Gamma(47.19, 4.28e-2). X-axis ticks from 1.0 to 3.4. Regions: Pitcher Favored, Batter Favored. Source: Baseball Prospectus 2005-2007. Tabulated by: Universal Analytics, llc]

On Base Percentage Key Characteristic
[Chart: On Base Percentage - MLB. Y-axis: On-base Percentage (0.30 to 0.36). Source: CBSSportline.com 6/11/2007. Tabulated by: Analytical Advantages, llc]

More Wins Require Scoring Consistency - Redistribution
Yankees’ Probability of Winning Vulnerable, Especially While Scoring Less Than 7 Runs
[Chart: Percent of Games Won 2006 (0% to 100%) by Runs Per Game (1 to 14). Annotation: “Areas of high opportunity to win more”. Source: MLB.COM 2006 AL Scores]

Therefore: Effective Batting Order Key to Winning More!

Batting Order: Consider the Possibilities
• 9 batters
• Over 200 possible pitchers (1 to 5 per game)
• 14 American League parks (plus 16 National)
• 80+ umpires
Hence: Calculations too numerous for anyone to enumerate!

It’s All About Raising The Probabilities of Winning

Create an Objective Foundation
Simultaneously Integrate the Building Blocks of Scoring
• Ump Bias
• Field & Temp Elements
• Pitcher Characteristics
• Batter Profile

Method of Approach and Path to Optimality: Improving the probability of winning
• Data driven analytics
• Model the process resultant game
• Stochastic performance
• Genetic programming algorithm
• Expectations

Genetic Programming
• Random vs. Deterministic
• Population vs. Single Best Solution
• Creating new Solutions through Mutation
• Combining Solutions through Crossover
• Selecting Solutions via “Survival of the Fittest”
• Drawback of Genetic (evolutionary) Algorithms
  • Better compared to what you know
  • Never know when to quit

Lack of Consistency Burdens Yankee Winning
Yankees Most Volatile Scoring Team in American League
[Chart: Standard Deviation. Y-axis: 2.7 to 3.7]

Scoring Skewness Robs Yankee Scoring Potential (Getting Runs When You Need Them)
Scoring Frequency Shows Yankees Higher Output Bias
[Chart: Percent (0% to 18%) by Runs (0 to 20). Series: 2006 AL % Averages, 2006 Yankees %, 2007 Yankees %]

Discrete Density Functions for Each Player, Pitcher, Field and Umpire Combination
Outcome Probabilities By Batter (P/U: Perez/Wegner) at Yankee Stadium
[Chart: probability (0% to 70%) of each outcome (Out, BB, HP, 1, 2, 3, HR) for Melky Cabrera, Derek Jeter, Bobby Abreu, Jorge Posada, Miguel Cairo, Josh Phelps, Robinson Cano, Hideki Matsui, Alex Rodriguez]

An Example: Angels vs. Yankees May 27th
65,500

Raises Expected Value of Runs Scored
Expected Runs from 3.8 to 4.0
[Chart: run distributions, Original vs. Optimal. X-axis: Runs/Game (0 to 20). Y-axis: 0 to .16]

Optimal Batting Order: Shifts Distribution to be more Effective (winning)
Key to More Wins!
[Chart: run distributions, Original vs. Optimal. X-axis: Runs/Game (0 to 20). Y-axis: 0 to .16]

Subtle Changes in Batting Order Raise Expected Runs Scored by 7% (huge)
LA Angels at New York May 27 (lost 4-3)

    Slot   Starting Order   Optimal Order
    1      Cabrera          Cabrera
    2      Jeter            Giambi
    3      Matsui           Matsui
    4      Rodriguez        Rodriguez
    5      Giambi           Cano
    6      Cano             Jeter
    7      Abreu            Abreu
    8      Mientkiewicz     Mientkiewicz
    9      Nieves           Nieves

The Further From Optimum: The Greater the Probability of Loss
Probability of Losing over 60%
Potential High Risk Games Identified
[Chart: Runs/Game from Optimal (0.00 to 3.50) for Selected 2006 Games With Boston. Series: Loss, Wins. Annotation: “Vulnerable Games Identified”]

The Further From Optimum: The Greater the Probability of Losing
• Runs(Optimal - Original) > .6, P(winning) < 40% (further from optimal solution)
• Runs(Optimal - Original) < .6, P(winning) > 60% (closer to optimal)

Profiles of Batter, Pitcher, Field and Umpire Yield a Discrete Density Function for Every At Bat

From Density Function to Reaction Matrix (Finite number of situations)
• Number of outs
• Players on base
• At bat
• Performance

Game Summary

What Can Be Gained?
• Increased expected value of runs for each and every game
• Greater consistency (reduced sigma)
• Skewness favorably shifted
• Confidence doing best with what you’ve got
• Give Yankee pitchers greater support

What Can Be Accomplished?
• Raise the probability of winning each and every game!
• Increase likelihood of making playoffs
• Significantly enhance the possibility of getting to and winning the World Series!

What Can You Expect?
Win between 6 and 15 (or more) games that would have otherwise been lost!

Next Step
Only 119 Days Until Pitchers and Catchers Show up for Spring Training!
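
Appendix: Illustrative Sketch of the Approach
The slides describe three pieces: a discrete outcome density for every at bat, a reaction matrix keyed on outs and players on base, and a genetic search over batting orders. The Python below is a minimal sketch of how those pieces could fit together, not the authors' model. Everything specific in it is an assumption: the batter labels B1 to B9 and their probabilities are invented, pitcher, park and umpire effects are assumed to be already folded into each batter's probabilities, the `advance` rule (runners move exactly as far as the batter) is a simplified stand-in for the reaction matrix, and the genetic algorithm is a generic permutation GA (order crossover, swap mutation, elitist selection).

```python
import random

OUTCOMES = ["out", "bb", "1", "2", "3", "hr"]  # "bb" also stands in for hit-by-pitch

def make_profile(p_bb, p_1, p_2, p_3, p_hr):
    """Outcome weights in OUTCOMES order; 'out' takes the remaining probability."""
    return [1.0 - (p_bb + p_1 + p_2 + p_3 + p_hr), p_bb, p_1, p_2, p_3, p_hr]

# Hypothetical batters and probabilities, invented for illustration only.
PROFILES = {
    "B1": make_profile(0.08, 0.18, 0.04, 0.01, 0.01),
    "B2": make_profile(0.12, 0.15, 0.05, 0.00, 0.04),
    "B3": make_profile(0.10, 0.16, 0.06, 0.01, 0.05),
    "B4": make_profile(0.14, 0.14, 0.05, 0.00, 0.06),
    "B5": make_profile(0.09, 0.17, 0.05, 0.01, 0.03),
    "B6": make_profile(0.07, 0.19, 0.04, 0.01, 0.02),
    "B7": make_profile(0.11, 0.15, 0.05, 0.00, 0.03),
    "B8": make_profile(0.06, 0.16, 0.03, 0.00, 0.01),
    "B9": make_profile(0.05, 0.14, 0.03, 0.00, 0.01),
}

def advance(bases, outcome):
    """Simplified reaction rule: (runners on base, outcome) -> (new bases, runs)."""
    first, second, third = bases
    if outcome == "bb":  # a walk only moves runners who are forced
        runs = 1 if (first and second and third) else 0
        return (True, first or second, (first and second) or third), runs
    n = {"1": 1, "2": 2, "3": 3, "hr": 4}[outcome]
    runners = [0] + [i + 1 for i, on in enumerate(bases) if on]  # 0 = batter
    moved = [pos + n for pos in runners]
    runs = sum(1 for pos in moved if pos >= 4)
    return tuple(b in moved for b in (1, 2, 3)), runs

def simulate_game(order, profiles, rng, innings=9):
    """Play one game's worth of half-innings and return runs scored."""
    runs, slot = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            batter = order[slot % len(order)]
            slot += 1
            outcome = rng.choices(OUTCOMES, weights=profiles[batter])[0]
            if outcome == "out":
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
    return runs

def expected_runs(order, profiles, games=200, seed=0):
    """Monte Carlo estimate; the fixed seed makes the fitness repeatable."""
    rng = random.Random(seed)
    return sum(simulate_game(order, profiles, rng) for _ in range(games)) / games

def crossover(a, b, rng):
    """Order crossover: keep a slice of parent a, fill the rest in b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [x for x in b if x not in middle]
    return tuple(rest[:i] + list(middle) + rest[i:])

def mutate(order, rng):
    """Swap two batting slots."""
    i, j = rng.sample(range(len(order)), 2)
    order = list(order)
    order[i], order[j] = order[j], order[i]
    return tuple(order)

def optimize(players, profiles, generations=10, pop_size=12, seed=1):
    """Evolve batting orders; the starting order is always in the first population."""
    rng = random.Random(seed)
    population = [tuple(players)] + [
        tuple(rng.sample(players, len(players))) for _ in range(pop_size - 1)
    ]
    cache = {}
    def fitness(order):
        if order not in cache:
            cache[order] = expected_runs(order, profiles)
        return cache[order]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # "survival of the fittest"
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(*rng.sample(survivors, 2), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    start = tuple(PROFILES)
    best, runs = optimize(start, PROFILES)
    print("starting order:", start, round(expected_runs(start, PROFILES), 2))
    print("best found:    ", best, round(runs, 2))
```

Because the best orders survive each generation unchanged, the result is never worse than the starting order under this fitness estimate. That also illustrates the drawback noted on the Genetic Programming slide: the search only shows an order is better than what you already know, and the stopping point (here a fixed number of generations) is arbitrary.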