Nick Jones LOGOM-5300 Term Project

Big Ten Men’s Basketball

Overview

College athletics, and specifically men's college basketball, generates millions of dollars in revenue for universities in America. The highest level of collegiate men’s basketball is NCAA Division I. As of 2015-16, there are 351 teams spread across 32 conferences in NCAA Division I Men’s Basketball (www.ncaa.org). One such conference, the Big Ten, is consistently among the top conferences in terms of competitiveness and overall strength. The Big Ten, which began in 1895, currently consists of 14 schools spread from the Midwest to the East Coast. Its current members are Illinois, Indiana, Iowa, Maryland, Michigan, Michigan State, Minnesota, Nebraska, Northwestern, Ohio State, Penn State, Purdue, Rutgers, and Wisconsin (www.bigten.org).

This report explores the Big Ten Men’s Basketball Conference over the past 11 seasons and which team statistics most significantly affect the overall success of teams. The goal is to develop a statistical model and equation that can help predict the success (win total) of a team given certain variables. The report also compares the conference to the NCAA as a whole to discover whether there are any significant statistical differences and, if so, what may be causing them.

Data Collection

To begin, season totals for each team in the conference were compiled from data stored on www.bigten.org. Data from the 2004-05 through the 2014-15 seasons were used to provide an accurate representation of overall success over the past decade. The table below shows an example of the data collected for one team. Once the totals were entered, further calculations were completed to determine the per-game averages for each statistical category (see below). After the per-game averages for each season were determined, the 11-season averages were computed in the bottom row.
These averages were then compiled for each team in the table seen below.

The table above allows for sorting by statistical category in order to determine which school fared best over the past 11 seasons. This is useful because it shows which statistics may best correlate to wins. However, to accurately determine the correlation between a statistic and wins, additional work is needed.

For the second portion of the study, total NCAA Division I per-game averages were taken from www.ncaa.org. As with the Big Ten team analysis, the per-game averages for each season were compiled, after which a total average was calculated for the 11-season period. Totals for all the Big Ten teams were also calculated. These data were then combined to allow for comparison of the Big Ten to the NCAA as a whole.

**The number of teams in the NCAA varies from year to year as teams move up and down divisions or leave the NCAA entirely. Thus it was necessary to include the total number of teams among the statistics used to compute the total averages.

Hypothesis

For the study of the Big Ten teams, a correlation analysis was conducted for each statistic to test its relationship to wins and win percentage. The table below summarizes what was found.

According to the correlation analysis, there are 9 statistics that have a strong correlation (r > .5 or r < -.5) and 4 that have a very strong positive correlation (r > .7). Using the four statistics with a very strong correlation, a multiple regression equation will be developed to predict the average wins per year. If this model does not prove viable, other combinations of independent variables will be tested using multiple regression analysis to build the most effective model.

Null Hypothesis: There is no relation between Y (wins) and the four variables with the strongest correlation coefficients (FG%, 3FG%, FGM, & PTS/GM).
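The correlation screening described above can be sketched in a few lines of Python. This is only an illustration, not the Excel workflow used in the study: the function names `pearson_r` and `strength` are hypothetical, the sample numbers are invented, and applying the .7 cutoff to negative values as well is an assumption (the report only mentions very strong positive correlations).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def strength(r):
    """Label a correlation using the report's cutoffs (.5 and .7)."""
    if abs(r) > 0.7:
        return "very strong"
    if abs(r) > 0.5:
        return "strong"
    return "weak or moderate"


# Hypothetical example values, NOT data from the study.
fg_pct = [0.42, 0.44, 0.45, 0.47, 0.48]
wins = [14, 17, 19, 22, 25]

r = pearson_r(fg_pct, wins)
print(f"r = {r:.3f} ({strength(r)})")
```

Running the same two functions over every statistical category would reproduce the kind of summary table the correlation analysis refers to.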
For the second portion of the study, after analyzing the Big Ten averages versus the NCAA averages, the A/TO Ratio and FT% stand out as noticeably different. A T-Test will be used to determine whether they are significantly different.

Null Hypothesis: The mean A/TO Ratio for the Big Ten is 0.961.
Null Hypothesis: The mean FT% for the Big Ten is .705.

Testing Plan/Analysis

To find out whether the independent variables can be used to predict wins, a multiple regression analysis was performed in Microsoft Excel. The analysis used the 4 independent variables with the highest correlation found in the correlation analysis: Field Goal Percentage (X1), 3-Point Field Goal Percentage (X2), Field Goals Made (X3), and Points Per Game (X4). These independent variables are tested against the dependent variable of total wins (Y). The results of the regression analysis are shown below.

Using the coefficients and intercept, we can develop the equation below to predict win totals:

Y = 163.02(X1) + 76.48(X2) + 2.54(X3) - 0.81(X4) - 85.39

However, we must analyze the regression summary to determine whether the equation can be trusted. While the results appear to show a significant relationship between the independent variables and the dependent variable (wins), this may not be the case. The Significance F and R Square values do show promise in the model. The Significance F value is 0.0106, meaning there is only about a 1% chance of seeing a fit this strong if the independent variables had no relationship to wins. This is well within the accepted 5% threshold for significance. The R Square value is .7369, meaning that about 74% of the variation in wins can be explained by the independent variables. However, when looking deeper into the results of the regression analysis, a clear sign of multicollinearity stands out. The P-Values of the independent variables are all well above the .05 threshold for significance.
This means that two or more of the variables are highly correlated with one another and, in turn, the model is not an acceptable predictor. In order to find a more effective model, other variables needed to be explored.

After numerous multiple regression analyses, a best-fit model was attained. This model used A/TO Ratio (X1) and Defensive Rebounds per Game (X2) to predict wins (see the multiple regression results below).

The model above has a relatively high R-Square value of .81 as well as a low Significance F value of .0001. These two values point to this model being an accurate predictor of the dependent variable, wins. The model also corrects the previous problem of multicollinearity. The P-Values of both variables are below the .05 threshold, meaning they can be considered significant. The Standard Residuals using this model are all below 2, which is a sign that there are no outliers in the data set. All of this leads to a more accurate equation to predict wins, shown below:

Y = 22.72(X1) + 2.49(X2) - 63.62

To determine whether the Big Ten statistical averages are the same as the overall NCAA averages, a T-test was used. The two statistics that stood out to the naked eye as different were A/TO Ratio and FT Percentage. In order to compute the T-test, descriptive statistics were calculated on both the NCAA average and the Big Ten average. First, a 2-tail test is needed because the goal is to find out whether the sample is significantly different in either direction. The critical value for a 2-tail t-test with an alpha of .05 was computed using the “=TINV” function in Microsoft Excel. The critical value in this test was determined to be 2.23. To find the T-Stat, or the number of standard errors the sample mean is from the population mean, the mean of the population (NCAA) is subtracted from the sample mean (Big Ten) and that value is divided by the sample’s standard error. The value of the T-Stat for this test is 6.85.
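As a rough illustration of the T-Stat calculation just described, the sketch below computes a one-sample t-statistic using only the Python standard library. The 11 season values are invented placeholders, not the study's data, and the helper names `one_sample_t` and `is_significant` are hypothetical. The critical value of 2.23 is taken from the text. It matches a two-tailed test at alpha .05 with 10 degrees of freedom, which assumes the sample is the 11 season averages.

```python
from math import sqrt
from statistics import mean, stdev

# Critical value quoted in the report (Excel TINV, two-tailed, alpha = .05).
CRITICAL_T = 2.23


def one_sample_t(sample, population_mean):
    """(sample mean - population mean) / standard error of the sample."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - population_mean) / standard_error


def is_significant(t_stat, critical=CRITICAL_T):
    """Two-tailed decision: reject the null if |t| exceeds the critical value."""
    return abs(t_stat) > critical


# Hypothetical Big Ten A/TO ratios for 11 seasons, NOT data from the study.
big_ten_ato = [1.05, 1.10, 1.02, 1.12, 1.08, 1.15, 1.07, 1.11, 1.04, 1.13, 1.09]
ncaa_ato_mean = 0.961  # null-hypothesis mean stated in the report

t_stat = one_sample_t(big_ten_ato, ncaa_ato_mean)
print(f"t = {t_stat:.2f}, significant: {is_significant(t_stat)}")
```

The same two helpers apply unchanged to the FT% comparison by swapping in the FT% sample and its null-hypothesis mean.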
Because the T-Stat of approximately 6.9 is larger than our critical value of 2.23, it can be determined that the Big Ten’s average A/TO Ratio is significantly different from the NCAA average.

The same process outlined above was used to find out whether the Free Throw Percentage of the Big Ten was significantly different from that of the NCAA. The results are below. Again, because the calculated T-Stat of the sample (3.85) is larger than the critical value for this 2-tailed test with an alpha of .05 (2.23), we determine that the sample mean is significantly different. The sample mean falls 3.85 standard errors from the mean of the population.

Findings

For the first portion of the study, the findings suggest that a multiple regression equation can be used to predict wins, just not the equation initially developed using the best-correlated statistics. The null hypothesis stated that there was no significant relationship between our first set of X variables and the Y variable of wins. Based on the findings, we fail to reject the null hypothesis. The X variables (FG%, 3FG%, FGM, PTS/GM) showed clear signs of multicollinearity and were not significant variables to use in predicting the win total. However, when further analysis was done, a multiple regression equation was found that could be used to predict win totals. Using the X variables A/TO Ratio and Defensive Reb/Game, an equation was created that had a high R-Squared value and significant Significance F and P-Values.

I think that these results are extremely valuable and interesting. The shooting and scoring statistics used in the first regression analysis did not meet the necessary criteria for an accurate model even though they had the highest direct correlation with wins. The A/TO Ratio and Def Reb/Game variables provided the most accurate model.
These two statistics are constantly emphasized as keys to winning games, and my model backs up that point. For each additional Def Reb/Game a team averages, its predicted win total rises by about 2.5 games. The typical win total a team needs to make the NCAA tournament is around 20 games, so increasing your win total by 2.5 games is an enormous benefit. The A/TO Ratio coefficient of roughly 22.7 is a little misleading because that statistic usually changes by tenths rather than whole units, but even raising your A/TO Ratio by one-tenth raises your predicted wins by a little over 2 games. These numbers support the view that these statistics are important and that coaches are right to emphasize their effect on wins over the course of a season.

For the second portion of the study, I tested whether the two statistics (A/TO Ratio and FT%) are significantly different in the Big Ten than in the NCAA as a whole. In both cases we reject the null hypothesis that the Big Ten average is the same as the NCAA average in favor of the alternative. The Big Ten averages are significantly different from the NCAA averages, as demonstrated in the analysis portion of the report above. Again, I did not find these results too surprising. The Big Ten is known nationally as a physical conference in which the teams are disciplined. This is consistent with both findings and suggests that the national perception is fairly accurate. The A/TO Ratio differs the most, with a T-Stat roughly three times the critical value, showing that teams in the Big Ten take better care of the ball than the rest of the teams in the NCAA. This speaks to their discipline both in sharing the ball, which produces a high number of assists, and in limiting turnovers. A high FT% is also commonly seen as a sign of a disciplined team.

In conclusion, I was really excited when I dove deeper than my initial regression analysis that explored the four most highly correlated variables.
It showed me that the preaching I’d heard for so long from coaches about the importance of defensive rebounding and sharing/protecting the ball was backed up by statistical evidence. I also enjoyed finding evidence that some long-believed narratives about the Big Ten conference were true.
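As a closing illustration, the final regression equation from the analysis, Y = 22.72(X1) + 2.49(X2) - 63.62, can be evaluated directly to check the coefficient interpretations given in the findings. The sketch below is a minimal example. The function name `predict_wins` and the sample team values (an A/TO Ratio of 1.10 and 24 defensive rebounds per game) are hypothetical and chosen only for demonstration.

```python
# Coefficients taken from the report's final two-variable model.
ATO_COEF = 22.72    # wins per 1.0 change in A/TO Ratio (X1)
DREB_COEF = 2.49    # wins per additional defensive rebound per game (X2)
INTERCEPT = -63.62


def predict_wins(ato_ratio, def_reb_per_game):
    """Predicted season wins: Y = 22.72*X1 + 2.49*X2 - 63.62."""
    return ATO_COEF * ato_ratio + DREB_COEF * def_reb_per_game + INTERCEPT


# Hypothetical team profile, NOT a team from the study.
baseline = predict_wins(1.10, 24.0)

# Effect of improving A/TO Ratio by one-tenth, rebounds held constant.
ato_gain = predict_wins(1.20, 24.0) - baseline

# Effect of one more defensive rebound per game, A/TO held constant.
dreb_gain = predict_wins(1.10, 25.0) - baseline

print(f"baseline prediction: {baseline:.1f} wins")
print(f"+0.1 A/TO Ratio: {ato_gain:+.2f} wins")
print(f"+1 Def Reb/Game: {dreb_gain:+.2f} wins")
```

Because the model is linear, these marginal gains are the same for any starting profile, which is why the findings can quote them without reference to a particular team.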