Big Ten Men’s Basketball

Nick Jones
LOGOM-5300
Term Project
Big Ten Men’s Basketball
Overview
College athletics, and specifically men’s college basketball, generate millions of dollars in
revenue for universities in America. The highest level of collegiate men’s basketball is NCAA Division I.
As of 2015-16, there are 351 teams spread across 32 conferences in NCAA Division I Men’s Basketball
(www.ncaa.org). One such conference, the Big Ten, is consistently among the top conferences in terms
of competitiveness and overall strength. The Big Ten, which began in 1895, currently consists of 14
schools spread from the Midwest to the East Coast. Its current members are Illinois, Indiana, Iowa,
Maryland, Michigan, Michigan State, Minnesota, Nebraska, Northwestern, Ohio State, Penn State,
Purdue, Rutgers, and Wisconsin (www.bigten.org).
This report will explore the Big Ten Men’s Basketball Conference over the past 11 seasons and
which team statistics most significantly affect the overall success of teams. The goal is to develop a
statistical model and equation that can help predict the success (win total) of a team given
certain variables. It will also compare the conference to the NCAA as a whole to discover whether
there are any significant statistical differences and, if so, what may be causing those differences.
Data Collection
To begin, season totals for each team in the conference were compiled via data stored on
www.bigten.org. Data was used from the 2004-05 through the 2014-15 seasons. This was done to
provide an accurate representation of overall success over the past decade. The table below shows an
example of the data collected for one team.
Once the totals were entered, further calculations were completed to determine the per-game
averages for each statistical category (see below).
After the per-game averages for each season were determined, the 11-season averages were
calculated in the bottom row. These averages were then compiled for each team in the table seen below.
The table above allows for sorting by statistical category in order to determine which
school fared best over the past 11 seasons. This is useful because it shows which statistics may best
correlate with wins. However, to accurately determine the correlation between a statistic and wins,
additional work is needed.
For the second portion of the study, total NCAA Division I per-game averages were taken from
www.ncaa.org. As with the Big Ten team analysis, the per-game averages for each season were
compiled, after which a total average was calculated for the 11-season period.
Totals for all the Big Ten teams were also calculated. This data was then combined to allow for
comparison of the Big Ten to the NCAA as a whole.
**The number of teams in the NCAA varies from year to year as teams move up and down
divisions or leave the NCAA entirely. Thus, it was necessary to include the total number of teams among
the statistics used to compute the total averages.
Hypothesis
For the study of the Big Ten teams, a correlation analysis was conducted for each statistic to test
the strength of its relationship to wins and win percentage. The table below summarizes what was found.
According to the correlation analysis, there are 9 statistics that have a strong correlation (r > .5
or r < -.5) and 4 that have a very strong positive correlation (r > .7). Using the four statistics with a very
strong correlation, a multiple regression equation will be developed to predict average wins
per year. If this model does not prove to be viable, other combinations of independent variables will be
tested using multiple regression analysis to build the most effective model.
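The correlation screening described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet procedure used in the report: the function names `pearson_r` and `strength` and the sample numbers are assumptions made for the example, while the cutoffs (.5 and .7) are the ones stated in the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label r with the report's cutoffs: r > .7 very strong positive, |r| > .5 strong."""
    if r > 0.7:
        return "very strong positive"
    if abs(r) > 0.5:
        return "strong"
    return "weak or moderate"

# Hypothetical 11-season averages for five teams (NOT the report's data).
fg_pct = [0.470, 0.455, 0.448, 0.440, 0.425]
wins = [24.1, 22.0, 19.5, 18.2, 14.6]

r = pearson_r(fg_pct, wins)
print(round(r, 3), strength(r))
```

Running the same function once per statistic against wins, and again against win percentage, would reproduce the kind of summary table the analysis refers to.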

Null Hypothesis: There is no relation between Y (wins) and the four variables with the
strongest correlation coefficients (FG%, 3FG%, FGM, & PTS/GM).
For the second portion of the study, after analyzing the Big Ten averages versus the NCAA
averages, the A/TO Ratio and FT% stand out as noticeably different. A T-test will be used to
determine if they are significantly different.

Null Hypothesis: The mean A/TO Ratio for the Big Ten is 0.961.

Null Hypothesis: The mean FT% for the Big Ten is .0705
Testing Plan/Analysis
To find out whether the independent variables can be used to predict the dependent variable
(wins), a multiple regression analysis was done. Using Microsoft Excel, the multiple regression
analysis was completed using the 4 independent variables with the highest correlation found in the
correlation analysis: Field Goal Percentage (X1), 3-Point Field Goal Percentage (X2), Field Goals Made (X3),
and Points Per Game (X4). These independent variables are tested against the dependent variable of
total wins (Y). The results of the regression analysis are shown below.
Using the coefficients and intercept, we can develop the equation below to predict win totals:
Y = 163.02X1 + 76.48X2 + 2.54X3 - 0.81X4 - 85.39
However, we must analyze the regression summary to determine if the equation can be trusted.
While the results appear to show there is a significant relationship between the independent variables
and the dependent variable (wins), this may not be the case. The Significance F and R Square values
do show promise in the model. The Significance F value is 0.0106, meaning that if wins were unrelated
to all four variables, a fit this strong would occur by chance only about 1% of the time. This is well
within the accepted 5% threshold for significance. The R Square value is .7369, meaning that about 74%
of the variation in wins can be explained by the independent variables.
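For readers who want to see where the coefficients and the R Square value come from, the following is a minimal sketch of an ordinary least-squares fit written with the Python standard library only. It is not the Excel procedure used in the report; the helper names `solve` and `fit_ols` and the sample rows are assumptions made for the illustration.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares fit of y on the predictor rows; returns (intercept, slopes, r_squared)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta[0], beta[1:], 1 - ss_res / ss_tot

# Hypothetical team rows with two predictors each (NOT the report's data).
rows = [(1.05, 24.1), (0.98, 23.0), (1.20, 25.2), (0.90, 22.4), (1.10, 23.8), (1.00, 24.6)]
wins = [20.5, 17.0, 26.0, 14.0, 21.0, 20.0]

intercept, slopes, r2 = fit_ols(rows, wins)
print(intercept, slopes, r2)
```

R Square is computed here as one minus the ratio of unexplained variation to total variation in wins, which is why it reads as the share of variation in wins accounted for by the predictors.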
However, when looking deeper into the results of the regression analysis, a clear sign of
multicollinearity stands out. The P-Values of the independent variables are all well above the .05
threshold for significance even though the model as a whole is significant. This suggests that two or
more of the variables are highly correlated with each other and, in turn, that the model is not an
acceptable predictor. In order to find a more effective model, other variables need to be explored.
After numerous multiple regression analyses, a best-fit model was attained. This model
used A/TO Ratio (X1) and Defensive Rebounds per Game (X2) to predict wins (see Multiple Regression
Results below).
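One simple way to screen for the multicollinearity discussed above is to correlate the candidate predictors with each other rather than with wins. The sketch below is an illustration only: the function names, the sample values, and the 0.8 cutoff are assumptions (0.8 is a common rule of thumb, not a figure from this report).

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical per-team averages for six teams (NOT the report's data).
predictors = {
    "FGM": [25.0, 24.2, 23.5, 26.1, 22.8, 24.9],
    "PTS/GM": [70.1, 68.0, 66.2, 73.0, 64.5, 69.8],
    "DefReb/GM": [24.0, 22.5, 25.1, 23.2, 24.4, 22.9],
}

flagged = flag_collinear_pairs(predictors)
print(flagged)
```

In this made-up sample, field goals made and points per game move almost in lockstep, which is the kind of overlap that can inflate individual P-Values even when the overall regression is significant.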
The model above, built on A/TO Ratio and Defensive Rebounds per Game, has a relatively high R-Square
value of .81 as well as a low Significance F value of .0001. These two values point to this model being
an accurate predictor of wins. The model also avoids the earlier multicollinearity problem: the P-Values
of both variables are below the .05 threshold, so both can be treated as significant. The Standard
Residuals using this model are all below 2, a sign that there are no outliers in the data set. All of this
leads to a more accurate equation to predict wins, shown below:
Y = 22.72X1 + 2.49X2 – 63.62
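As a worked example of using this equation, the short sketch below plugs in a team profile and shows how the prediction moves when each input changes. The coefficients come from the equation above; the function name `predicted_wins` and the input values (an A/TO Ratio of 1.10 and 24.0 defensive rebounds per game) are hypothetical.

```python
def predicted_wins(ato_ratio, def_reb_per_game):
    """Evaluate the report's final equation: Y = 22.72*X1 + 2.49*X2 - 63.62."""
    return 22.72 * ato_ratio + 2.49 * def_reb_per_game - 63.62

# Hypothetical team profile (illustrative values, not from the report's tables).
base = predicted_wins(1.10, 24.0)
more_rebounds = predicted_wins(1.10, 25.0)   # one more defensive rebound per game
better_ato = predicted_wins(1.20, 24.0)      # A/TO Ratio higher by one-tenth

print(round(base, 2), round(more_rebounds - base, 2), round(better_ato - base, 2))
```

Because the model is linear, one additional defensive rebound per game always adds 2.49 predicted wins and a one-tenth increase in A/TO Ratio always adds about 2.27, regardless of the starting profile.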
To determine if the Big Ten statistical averages are the same as the overall NCAA averages, a
T-test was used. The two statistics that stood out to the naked eye as different were A/TO Ratio and FT
Percentage. In order to compute the T-test, descriptive statistics were calculated for both the NCAA
average and the Big Ten average.
First, a 2-tail test is needed because the goal is to find out whether the sample is
significantly different in either direction. The critical value for a 2-tail T-test with an alpha of .05
was computed using the “=TINV” function in Microsoft Excel. The critical value in this test is 2.23. To find the
T-Stat, or the number of standard errors the sample mean is from the population mean, the mean of
the population (NCAA) is subtracted from the sample mean (Big Ten) and that value is divided by the
sample’s standard error. The value of the T-Stat for this test is 6.85. Because this value is larger
than our critical value of 2.23, it can be determined that the Big Ten’s average A/TO Ratio is significantly
different from the NCAA average.
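The T-Stat calculation described above can be sketched as follows. The population mean (0.961) and the critical value (2.23) are the figures given in this report; the eleven season values and the function name `one_sample_t` are hypothetical, so the printed T-Stat will not match the report's.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / (sample std dev / sqrt(n))."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / std_error

# Hypothetical Big Ten A/TO Ratio averages, one per season for 11 seasons.
# Illustrative only; the report's descriptive statistics are not reproduced here.
big_ten_ato = [1.05, 1.08, 1.02, 1.10, 1.07, 1.04, 1.09, 1.06, 1.11, 1.03, 1.08]

NCAA_MEAN_ATO = 0.961  # null-hypothesis mean stated in the report
CRITICAL_T = 2.23      # two-tailed critical value at alpha = .05, as reported

t_stat = one_sample_t(big_ten_ato, NCAA_MEAN_ATO)
reject_null = abs(t_stat) > CRITICAL_T  # two-tailed: compare the magnitude of t
print(round(t_stat, 2), reject_null)
```

Comparing the absolute value of the T-Stat to the critical value is what makes the test two-tailed: a Big Ten mean far below the NCAA mean would be flagged just as readily as one far above it.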
The same process outlined above was used to find out if the Free Throw Percentage of the Big
Ten was significantly different from that of the NCAA. The results are below.
Again, because the calculated T-Stat of the sample (3.85) is larger than the critical value
calculated for this 2-tailed test with an alpha of .05 (2.23), we determine that the sample mean is
significantly different. The sample mean falls 3.85 standard errors from the mean of the population.
Findings
For the first portion of the study, the findings suggest that a multiple regression equation
can be used to predict wins, just not the equation initially developed using the
most highly correlated statistics. The null hypothesis stated that there was no significant relationship
between our first set of X variables and the Y variable of wins. Based on the findings, we fail to reject
the null hypothesis. The X variables (FG%, 3FG%, FGM, PTS/GM) showed clear signs of
multicollinearity and were not significant variables to use in predicting the win total. However, when
further analysis was done, a multiple regression equation was found that could be used to predict
win totals. Using the X variables A/TO Ratio and Defensive Reb/Game, an equation was created that
had significant R-Squared, Significance F, and P-Values.
I think that these results are extremely valuable and interesting. The shooting
and scoring statistics used in the first regression analysis did not meet the necessary criteria for an
accurate model even though they had the highest direct correlation with wins. The A/TO Ratio and Def
Reb/Game variables provided the most accurate model. These two statistics are constantly emphasized
as keys to winning games, and my model backs up that point. For each additional Def Reb/Game a team
averages, its win prediction rises by about 2.5 games. The typical win total a team needs to make the
NCAA tournament is around 20 games, so increasing a win total by 2.5 games is an enormous
benefit. Also, the A/TO Ratio coefficient of about 22.7 is a little misleading because that statistic usually
changes by tenths and not whole units, but even raising A/TO Ratio by one-tenth raises
predicted wins by a little over 2 games. These numbers support the view that these statistics are important
and that coaches are right to emphasize their effect on wins over the course of a season.
The second portion of the study tested whether the two statistics (A/TO Ratio and FT%) are significantly
different in the Big Ten than in the NCAA as a whole. In both cases we reject the null hypothesis that
the Big Ten average is the same as the NCAA average in favor of the alternative. The Big Ten averages
are significantly different from the NCAA averages. This is demonstrated in the analysis portion of the
report above.
I did not find these results too surprising. The Big Ten conference is known nationally as a
physical conference in which the teams are disciplined. This is consistent with both findings and suggests
that this national perception is fairly accurate. The A/TO Ratio differs the most: its T-Stat is
about three times the critical value, showing that the teams in the Big Ten take better care of the ball
than the rest of the teams in the NCAA. This speaks to their discipline both in sharing the ball, which
produces a high number of assists, and in limiting turnovers. The high FT% is also
commonly seen as a sign of a disciplined team.
In conclusion, I was really excited when I dove deeper than my initial regression analysis, which
explored the four most highly correlated variables. It showed me that the preaching I’d heard for so
long from coaches about the importance of defensive rebounding and sharing/protecting the ball was
backed up by statistical evidence. I also enjoyed finding evidence that some long-believed narratives
about the Big Ten conference were true.