Aaron Abstatz, Whelan Boyd, and Nate Davis
Math 20: Term Project
Zachary Hamaker
Conditional Outcomes in Boxing Matches

Introduction: Predicting the outcomes of boxing matches has sparked the interest of many people. For some it is a hobby or pastime, and for bookies and gamblers it is a source of profit. Instead of analyzing Las Vegas trends such as bookies' favorites or betting patterns, we hoped to identify patterns within the match itself that might help predict the outcome while the fight is underway. Hence, we set out to examine various pools of data on specific boxing-match statistics in order to derive an algorithm that could be used to predict outcomes. In particular, we analyze statistics on the number of jabs and power punches thrown and landed per round and per fight, focusing on the victorious fighter. We observe that these data often fit a normal distribution. This lets us use standard statistical tools to compute confidence intervals for the range of point totals within which a winner is likely to fall. Furthermore, we observe that the rate of punches thrown and landed per round is relatively stable, which lends itself to the use of Poisson approximations to predict conditional outcomes of the fight. For our data set, we attempted to use fights from many different fighters in order to achieve some semblance of randomness. However, there were various constraints. Notably, the statistical match data available to us are weighted unevenly toward a few fighters, namely Floyd Mayweather and Manny Pacquiao among other well-known boxers, because more of their fights are televised. To account for this, we selected fights at random from among all heavyweight fighters and deliberately excluded a proportional number of the aforementioned boxers' fights. Our principal conclusion is that early rounds have very little effect on the outcome of the match.
This suggests that bettors can take advantage of ostensibly lopsided early rounds, because the statistical probability of winning is higher than one might intuitively expect.

Methodologies: We obtained our data set from www.compubox.com and from our own viewings of boxing matches. Our data set includes jabs thrown, jabs landed, power shots thrown, and power shots landed per fight, for as close to a random subset of boxers as our means allowed. The probability that a punch lands follows directly from these data. We began by entering the data into an Excel spreadsheet. We then devised a point system to determine the total number of successful punches needed to win a match, assigning a coefficient of 1 to jabs and 2 to power punches. That is, a fighter who lands 10 jabs and 5 power punches earns 20 points:

1*(10 jabs) + 2*(5 power shots) = 20 points

We noticed that the point totals achieved by the victors loosely followed a normal distribution when aggregated. Hence, we used the formulas from class to compute the expected value and the variance of points, both per fight and per round. For the variance, we used the summation formula. Over the observed point totals X, we summed the probability of X multiplied by the square of the difference between X and the expected number of points m:

$\mathrm{Var} = \sum_{X} \mathrm{prob}(X)\,(X - m)^2$

We then calculated various confidence intervals. For example, our 95% confidence interval means that, of all victorious fighters, we can say with 95% confidence that the victor of a particular fight has achieved a point total within our interval. Next, for the active component of the project, we set out to determine the number of points a fighter must achieve by a given round in order to have a given probability of winning.
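The point system, the summation formula for the variance, and a normal-approximation interval can be sketched in Python as below. This is a minimal sketch under stated assumptions, not our spreadsheet. The fight records in `sample` are made-up placeholder numbers, not CompuBox data, apart from the (10, 5) example in the text. Each observed fight is weighted equally, so prob(X) = 1/n, and the interval uses z = 1.96. The text does not say whether our interval was built from the standard deviation of individual totals or from the standard error of the mean, so the sketch computes both. All function names are hypothetical.

```python
import math
from statistics import NormalDist

JAB_POINTS = 1    # coefficient for a landed jab
POWER_POINTS = 2  # coefficient for a landed power punch


def points(jabs_landed: int, power_landed: int) -> int:
    """Point total for one fighter under the 1-per-jab, 2-per-power-punch scheme."""
    return JAB_POINTS * jabs_landed + POWER_POINTS * power_landed


def mean_and_variance(totals: list[int]) -> tuple[float, float]:
    """Expected value m and Var = sum of prob(X) * (X - m)^2,
    assuming every observed fight is equally likely (prob(X) = 1/n)."""
    n = len(totals)
    prob = 1 / n
    m = sum(prob * x for x in totals)
    var = sum(prob * (x - m) ** 2 for x in totals)
    return m, var


def intervals(totals: list[int], level: float = 0.95):
    """Return (range for an individual winner, confidence interval for the mean)."""
    m, var = mean_and_variance(totals)
    sd = math.sqrt(var)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    winner_range = (m - z * sd, m + z * sd)
    se = sd / math.sqrt(len(totals))  # standard error of the mean
    mean_ci = (m - z * se, m + z * se)
    return winner_range, mean_ci


if __name__ == "__main__":
    # Hypothetical (jabs landed, power punches landed) for five winning fighters.
    sample = [(10, 5), (60, 110), (85, 140), (40, 95), (70, 120)]
    totals = [points(j, p) for j, p in sample]
    m, var = mean_and_variance(totals)
    print("totals:", totals)
    # Per-round figures divide by 12, the number of rounds in a title bout.
    print(f"mean = {m:.1f}, per-round mean = {m / 12:.1f}")
    print(f"variance = {var:.1f}, per-round variance = {var / 12:.1f}")
    print("95% winner range and mean CI:", intervals(totals))
```

Dividing the variance by 12 mirrors the per-round step in our method. It treats the twelve rounds as contributing equally and independently.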
We obtained the variance per round by dividing the total variance by twelve, the number of rounds in a title bout. This gives a figure for the "average" boxer. We used the expected number of points per round as the Poisson parameter λ and then applied a Poisson approximation. To determine the threshold x, we first subtracted the number of points already achieved from the lower limit of our confidence interval for the expected number of points. We then divided this figure by the number of rounds remaining. The result is the average number of points per round the fighter must score to reach the interval. We can disregard the upper limit of the interval because the fighter need only achieve a minimum number of points to enter it. A point total above the interval only leads to a higher (though quite marginally higher) probability of winning. Lastly, since the figures we used in our Poisson approximation refer to the "average" boxer, it follows that his probability of winning is 50%. Our Poisson approximation gives the probability that a boxer scores enough points to reach the 95% confidence interval:

$$P(K \ge x) = \sum_{k=x}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}$$

where x is the average number of points per round needed to reach the lower limit of the range of points that 95% of winning boxers achieve. We multiply this probability by 0.5 to obtain the probability that the fighter will win. We then calculated the number of points needed through a given round to have roughly a 50%, 33%, or 25% chance of winning the fight.

Results: Our findings (active component) show that the total number of points scored in rounds 1 and 2 has no perceivable effect on the probability of winning the fight. At the start of round 4, a fighter who has scored 0 points has a 49.7% chance of winning. With 19 points scored, he has a 50% chance of winning.
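The round-by-round calculation described in our Methodologies can be sketched as follows. This is a minimal sketch, not our spreadsheet. The function names are hypothetical. The per-round requirement is rounded up to a whole point, which is an assumption, since a Poisson count must be an integer. The values of `lam` (λ, renamed because `lambda` is reserved in Python) and `lower_limit` must come from one's own data. The sketch computes the tail probability P(K ≥ x) as 1 minus the sum of the Poisson terms for k < x, then multiplies it by 0.5 as in the text.

```python
import math


def poisson_tail(x: int, lam: float) -> float:
    """P(K >= x) for K ~ Poisson(lam), computed as 1 - P(K <= x - 1)."""
    if x <= 0:
        return 1.0
    term = math.exp(-lam)  # P(K = 0)
    cdf = term
    for k in range(1, x):
        term *= lam / k  # P(K = k) obtained from P(K = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def points_needed_per_round(lower_limit: float, points_so_far: float,
                            rounds_remaining: int) -> int:
    """Average points per remaining round needed to reach the interval's lower limit,
    rounded up to a whole number of points (an assumption)."""
    shortfall = max(0.0, lower_limit - points_so_far)
    return math.ceil(shortfall / rounds_remaining)


def win_probability(lower_limit: float, points_so_far: float,
                    rounds_remaining: int, lam: float) -> float:
    """0.5 * P(K >= x): the 'average' boxer's 50% times the chance of reaching the range."""
    x = points_needed_per_round(lower_limit, points_so_far, rounds_remaining)
    return 0.5 * poisson_tail(x, lam)
```

For example, a fighter who has scored nothing before round 4 has nine rounds left in a twelve-round bout. His chance would be `win_probability(lower_limit, 0, 9, lam)`, with `lower_limit` and `lam` taken from one's own data.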
Having scored 0 points by the end of round 4, the fighter's probability of winning drops significantly, to 39.2%. By scoring 36 points by the end of round 4, the fighter raises his probability of winning to 50%. The accumulation of points begins to have more than a marginal effect on the outcome at the start of round 5, and its influence grows steadily with each round thereafter. See the spreadsheet for the empirical data.

Discussion: It is quite surprising, and important, that the first two rounds have virtually no effect on the outcome of the match. This assumes there is no KO, which is something we would take into account if we pursued this project in more depth. However, as the fight moves into later rounds, the number of points needed for a given probability of winning varies more widely. For example, at the start of round 8, a boxer needs to have already scored 85 points to have a 50% chance of winning, but only 15 points to have a 33% chance of winning, and only 5 points to have a 26% chance of winning! These results may seem startling; however, they make sense because a common boxing strategy is to conserve energy and tire your opponent. Furthermore, there is a 50% chance that a boxer goes on a streak of at least 32 points in one round (boxers often throw flurries of punches, which our research suggests). Lastly, some fights produce a victor who has achieved fewer points than his opponent. This does not contradict our results. Our results give the probability of winning given that a fighter achieves a point total within the necessary range; that is not to say that he will win every time he reaches this total. For example, see the attached stat sheet for Williams defeats Lara. This happens in real fights because the actual effect of each punch varies, and with it the judges' perception of the fight (e.g., a head shot trumps a body shot). For future analysis, a detailed examination of punch location would benefit our findings.
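The claim that a boxer has a 50% chance of scoring at least 32 points in one round can be related to the Poisson model. This sketch assumes that the claim comes from the same Poisson approximation, which the text does not state explicitly. It searches a grid of hypothetical rates λ for the smallest one at which P(K ≥ 32) reaches 0.5. It does not use our actual λ, and the function names are hypothetical.

```python
import math


def poisson_tail(x: int, lam: float) -> float:
    """P(K >= x) for K ~ Poisson(lam)."""
    if x <= 0:
        return 1.0
    term = math.exp(-lam)  # P(K = 0)
    cdf = term
    for k in range(1, x):
        term *= lam / k  # P(K = k) from P(K = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def rate_for_median(threshold: int, target: float = 0.5, step: float = 0.01) -> float:
    """Smallest lam on a grid of spacing `step` with P(K >= threshold) >= target."""
    i = 1
    while poisson_tail(threshold, i * step) < target:
        i += 1
    return i * step


if __name__ == "__main__":
    lam = rate_for_median(32)
    print(f"lambda = {lam:.2f}, P(K >= 32) = {poisson_tail(32, lam):.3f}")
```

The search lands at a rate of roughly 32 points per round, because the median of a Poisson count lies close to its mean λ. Under this model, the 50% figure therefore amounts to saying that a typical round's expected point total is about 32.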
Our research has limitations, because the scope of this project is smaller than would be needed for a more accurate study. One of our major assumptions, which is obviously not always true, is that all boxers are equal. We could address this by going through each boxer's fights round by round and computing his individual ratio of power punches to jabs, as well as an individual λ (rate of points scored). Furthermore, these statistics can never be perfectly accurate. Whether a punch counts as a jab or a power shot, and as a hit or a miss, is decided by human scorers, and that process is obviously susceptible to error. Our choice of coefficients is also admittedly somewhat arbitrary. Probably the most significant improvement to our methods in the future would be to compute more accurate coefficients for the jab and the power punch. Another problem with this research is that our sample is not ideal. First, we have too few observations. Second, the sample is not completely random, because more statistics are available for famous boxers (for example, there are more statistics for Floyd Mayweather, who is known for his power boxing style). If we had a larger data pool, the standard error of our estimates would shrink and the 95% confidence intervals would narrow toward the mean. As a result, a fighter would need more points at the start of a given round to maintain our given chances of winning. Consequently, the earlier rounds would matter more, although their effect would still be very small compared to that of later rounds. Furthermore, if we wanted to include matches that ended in a KO, it might be more appropriate to use a fat-tailed distribution and treat KOs as black swans (events that would be very rare under a normal distribution). Such a distribution is very similar to the normal distribution, except that outcomes far from the mean are more likely (as shown by the picture below).
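The point that a larger data pool would narrow our 95% confidence intervals can be illustrated with the standard error of the mean, σ/√n. In this sketch, σ = 100 points is a hypothetical placeholder, not a figure from our data, and the function name is hypothetical. The point is only that the half-width 1.96σ/√n falls as the number of fights n grows, so the lower limit of the interval moves up toward the mean.

```python
import math
from statistics import NormalDist


def ci_half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width z * sigma / sqrt(n) of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    sigma = 100.0  # hypothetical standard deviation of winners' point totals
    for n in (10, 40, 160, 640):
        print(f"n = {n:4d}: half-width = {ci_half_width(sigma, n):6.1f} points")
```

Each fourfold increase in the number of fights halves the half-width. This is why more data would raise the number of points a fighter needs by a given round.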
Conclusions: Our results provide information for bettors who may choose to place their bets during the fight, given certain conditions from previous rounds. Punching statistics only begin to affect the outcome of the match by the end of round 4 or the start of round 5. That is, how many punches a boxer lands in the first 3 rounds has very little statistical effect on the outcome of the match; landed punches only begin to positively influence the result in round 4. Thus, while many observers may view lopsided early rounds as a reason to bet against a particular fighter, a bettor ought to keep in mind that, according to our results, early rounds do not statistically matter. For example, if a boxer conserves his energy and does not land a single punch by the start of round 7, many people will believe that he will lose the match (and the announcers will most likely fuel that fire). However, our results show that this fighter still has a 39% chance of winning! Therefore, as long as the odds on that fighter are longer than about 1.56-to-1 (61-to-39, slightly longer than 3-to-2), a bettor should place a bet on him. These results are somewhat surprising. Still, it is not hard to see how a boxer could accumulate very few points in early rounds, allowing his opponent to tire, and still make a comeback by tallying 40 to 50 points or more in each of the remaining rounds. Muhammad Ali provides strong anecdotal evidence of this with the rope-a-dope strategy he championed in the 1974 fight against George Foreman, "The Rumble in the Jungle."
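The break-even odds in our Conclusions follow from a one-line expected-value calculation. The sketch assumes fractional odds of a-to-b, where a win pays a/b per unit staked and a loss forfeits the stake. It takes the 39% win probability from our results, and the function names are hypothetical.

```python
from fractions import Fraction


def expected_value(p_win: float, odds: Fraction) -> float:
    """Expected profit per unit staked at fractional odds (e.g. Fraction(3, 2) for 3-to-2)."""
    return p_win * float(odds) - (1 - p_win)


def break_even_odds(p_win: float) -> float:
    """Odds-against, as an x-to-1 ratio, at which the bet has zero expected value."""
    return (1 - p_win) / p_win


if __name__ == "__main__":
    p = 0.39  # win probability taken from our results
    print(f"break-even odds: {break_even_odds(p):.3f}-to-1")
    for odds in (Fraction(3, 2), Fraction(8, 5), Fraction(2, 1)):
        ev = expected_value(p, odds)
        print(f"{odds.numerator}-to-{odds.denominator}: EV = {ev:+.3f} per unit")
```

With p = 0.39, the break-even point is about 1.56-to-1. A bet at exactly 3-to-2 therefore has a slightly negative expected value, about -0.025 per unit staked, while a bet at 8-to-5 has a slightly positive one, about +0.014 per unit.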