TO DRAFT OR NOT TO DRAFT?
A Statistical Analysis of the NFL Draft
Abstract
America’s up-and-coming pastime, the National Football League, is gaining popularity every
year. This paper uses NFL draft results, combine results, and NCAA data to analyze 2,639
drafted college football players. Using one-way ANOVAs, we find significant effects of
NCAA conference and player position on draft position. We also fit statistically significant multiple
regression models to predict draft position for each football position group. Potential predictor
variables include: height, weight, 40-yard dash, vertical leap, broad jump, 3 cone drill, shuttle
run, bench press, and the number of collegiate awards obtained throughout college.
Introduction & Significance:
Many believe that America’s game is no longer baseball, but rather, American football.
The United States President, Barack Obama, had six NFL players in attendance (Clarke) and
even called Seattle Seahawks’ Richard Sherman by name during his most recent White House
Correspondents’ Dinner (Florio). As the NFL becomes more and more accessible through
television and the internet, America’s game continues to gain viewers and fans. Last season’s
Super Bowl became the most-watched television event in U.S. history and led to 24.9 million
tweets on Twitter (“Super”). In addition to the Super Bowl, 2013 regular season games
accounted for 34 of the 35 most-watched fall television shows in America (David Smith). As fans become more
interested in football and NFL teams, they start to care more about what players are on their
favorite teams. The number one way NFL teams add new and young players is through the
NFL draft, which takes place in the spring of each year. At this event the 32 NFL teams are
allowed to draft (in reverse order of the previous year’s standings) new football players to join
their respective teams. Coming off a 2-14 season, the Indianapolis Colts used their
number one pick in the 2012 draft on Stanford’s Andrew Luck. That season, the team went 11-5
and made it to the playoffs. But for every Andrew Luck there is also a JaMarcus Russell or Ryan
Leaf, who end up actually making their teams worse! Clearly the importance of the NFL draft to
team performance cannot be overstated.
When one player can have such an extreme effect on a team’s performance, how do
NFL general managers decide who to draft, and maybe more importantly, who not to draft? At
the annual NFL combine, NFL team scouts interview players and record performance-related
measures. In addition to these combine results, NFL teams often use players’ college position
and performance to determine a player’s draft stock.
One method used to assess a player’s collegiate performance is through the number of
awards the player has won. Every year the National Collegiate Athletic Association (NCAA)
gives out awards to recognize the best performing collegiate football players overall and for
each football position (Appendix 1, Note 1). It is often believed that some NCAA football
conferences are tougher than others, and produce better players. Others also believe offensive
players are the most important players on the field, and should be drafted earlier than defensive
players.
Methods
For the purposes of this paper we will analyze the possible effects of NFL combine
measures, collegiate performance and player position on the overall draft position of NFL
players. We will use recorded NFL combine data (“NFL Combine”), draft results (“NFL Draft”),
and NCAA award results (“College”) from 1999 to 2012, to create a data set containing
numerical variables for the number of collegiate awards obtained and overall draft pick of each
player, as well as the player’s NFL combine height (inches), weight (pounds), 40-yard dash time
(seconds), shuttle time (seconds), vertical leap (inches), broad jump (inches), bench press
(number of 225-pound repetitions), and three cone drill time (seconds), with categorical
variables to indicate the player’s position and collegiate NCAA Division I-A conference
(Appendix 1, Figure 2). This study utilizes eight player position groups: defensive backs,
linebackers, defensive linemen, wide receivers, offensive linemen, tight ends, quarterbacks, and
running backs (Appendix 1, Figure 1).
This data set contains 2,639 players who play the positions listed in Figure 1. We
assume this sample is a representative sample of all NFL draft data, which allows us to do
inference even though our sample is not technically random and independent. Through the use
of this compiled data set we will attempt to determine whether or not the position or collegiate
conference a player plays in has an effect on overall draft position, and what the best predictors
or combination of predictors of overall draft position are. We will primarily use one-way
ANOVAs and multiple linear regression techniques to study these research questions.
Results
The first question we chose to analyze was whether the conference a player plays in
affects overall draft position. This question will be analyzed with a one-way ANOVA. In addition
to independence, the ANOVA conditions are normality and zero mean of the residuals, and
similar standard deviation for each group. The residual vs. fitted plot shows an even distribution
of points above and below zero signifying that the zero mean assumption is met. The normal
QQ plot shows slight curvature in the lower and upper tails. Because ANOVA is robust to
modest departures from normality, however, we still deemed this condition met. The discussed plots are shown in Appendix 1,
Figures 3 & 4. The ANOVA results suggest that college conference does have an effect on
overall draft position (F(10, 2274) = 2.94, p < .01). Figure 2 in Appendix 1 shows that, as expected,
the big powerhouse conferences such as the SEC, ACC, and Pac-12 have the lowest average
overall draft position. (Note: a lower overall draft position indicates a player was chosen earlier
in the draft.)
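As a minimal sketch of the computation behind a one-way ANOVA F statistic, the following uses made-up draft picks for three hypothetical conferences; it is not the study's data or software. The function name `one_way_anova` and the group labels are illustrative assumptions. The degrees of freedom are k - 1 and n - k, which is how a reported F(10, 2274) arises from 11 conference groups.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group name -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (e.g. conferences)
    n = len(all_values)    # total number of observations (players)

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual observations around their own group mean
    ss_within = 0.0
    for values in groups.values():
        group_mean = mean(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up overall draft picks for three hypothetical conferences (illustration only)
picks = {
    "Conf A": [12, 45, 80, 101],
    "Conf B": [60, 95, 150, 170],
    "Conf C": [120, 180, 200, 240],
}
f_stat, df_b, df_w = one_way_anova(picks)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the group means differ by more than the within-group spread would suggest; the p-value then comes from the F distribution with these degrees of freedom.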
Before determining the best predictors of overall draft position, we need to determine
whether it is appropriate to construct a single multiple regression model for all players
(regardless of position), or construct position-specific regression models. To test for a position
effect on the overall draft position we used a one-way ANOVA with overall draft position as the
response variable and position as the categorical explanatory variable. All of the conditions were
assessed as described in the previous paragraph and were met. The plots used are shown in
Appendix 1, Figures 5 & 6. The ANOVA results (F(7, 2631) = 3.86, p < .01) indicate that player
position does have an effect on draft position, suggesting that a separate multiple regression model for
each player position is appropriate.
We began by using statistical software to run through all subsets of possible predictors of
overall draft position for each player position. From these we chose candidate models with high
adjusted R-squared and low Mallows’ Cp values. Then, after various alterations and analysis of the overall
adjusted R-squared, the significance of the predictors, and residual plots, we determined the
best models. We wanted our final models to have a relatively high adjusted R-squared,
significant predictors, and residual plots that met the multiple regression conditions.
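The two selection criteria mentioned above can be sketched as follows, using the standard textbook definitions. The function names and all numbers are made up for illustration and are not taken from the study. Here `p` counts predictors excluding the intercept, and `mse_full` is the mean squared error of the model containing every candidate predictor.

```python
def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R-squared for a model with p predictors (intercept not counted)."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp for a subset model with p predictors plus an intercept."""
    return sse_subset / mse_full - n + 2 * (p + 1)

# Made-up sums of squares for illustration only
n = 100                      # number of players
sst = 1000.0                 # total sum of squares
sse_full, p_full = 600.0, 5  # model with all five candidate predictors
sse_sub, p_sub = 620.0, 3    # a candidate subset with three predictors

mse_full = sse_full / (n - p_full - 1)
adj_r2_sub = adjusted_r_squared(sse_sub, sst, n, p_sub)
cp_sub = mallows_cp(sse_sub, mse_full, n, p_sub)
cp_full = mallows_cp(sse_full, mse_full, n, p_full)
print(f"subset: adjusted R^2 = {adj_r2_sub:.3f}, Cp = {cp_sub:.2f}")
```

A subset whose Cp is close to its number of parameters (p + 1) shows little evidence of bias from the omitted predictors; by construction the full model's Cp equals p + 1 exactly.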
The conditions for a multiple linear regression are: randomness and independence in the
sample, and linearity in the coefficients, as well as zero mean, constant variance, and normality
of the residuals. To meet the linearity condition, we specified each model so that it is
linear in its coefficients. The zero mean assumption is satisfied automatically by least squares
regression. We tried many transformations to obtain residual vs. fitted plots that showed a random
scatter of points and normal QQ plots that followed a straight line. All of the plots are shown in Appendix 1,
Pages 4 & 5.
The final models are shown in Table 4. More detailed fitted models are shown in Appendix
1, Pages 2 & 3. In every model we used a square root transformation on the response
variable, overall draft position, and at least one other transformation on an explanatory
variable as well. The majority of the models also use interaction or squared terms. Although
the adjusted R-squared values were rather small, each model was statistically significant
overall, indicating that its predictors jointly explain some of the variation in overall draft position. In addition, every
predictor in every model was significant at the 10% level.
TABLE 4. (T.C.A. = Total Collegiate Awards)
WR Model: -19.08 + 38.1*log(40 Time) - 2.9*log(Bench) - 0.03*Height - 4.55*T.C.A.
QB Model: 166.2 - 24.65*log(Weight) - 18.15*log(Broad Jump) + 32.08*log(3 Cone) - 5.38*T.C.A. + 1.26*T.C.A.^2
RB Model: -308 + 112.45*40 Time - 0.17*Broad Jump - 4.09*Shuttle - 2.86*T.C.A. - 1.65*40 Time^3
TE Model: 76.61 + 5.94*40 Time - 12.05*log(Broad Jump) - 2.06*sqrt(Weight) - 2.97*T.C.A. - 0.16*Bench
OL Model: 35.167 + 4.59*40 Time - 4.23*sqrt(Height) - 2.34*log(Bench) - 0.07*Weight - 6.3*T.C.A. + 3.81*T.C.A.^2 + 8.45*log(3 Cone)
DL Model: -22.18 + 51.25*40 Time - 0.16*Broad Jump + 0.2*Vert. Leap - 0.08*Weight - 2.37*T.C.A. + 32.71*3 Cone - 2.63*Shuttle - 5.91*40 Time*3 Cone
LB Model: -103.9 + 8.58*40 Time - 9.03*log(Broad Jump) + 1.03*Weight - 0.01*Weight^2 - 9.43*T.C.A. + 3.46*T.C.A.^2
DB Model: 6.34 + 56.01*log(40 Time) - 4.15*log(Vert. Leap) - 17.57*log(Height) - 4.19*T.C.A. + 1.4*3 Cone
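To illustrate how a fitted model in Table 4 turns combine measurements into a predicted pick, the sketch below evaluates the RB model and squares the result, since the response was modeled on the square root scale. The coefficients are the rounded values from Table 4; the function names and the player's measurements are hypothetical and chosen only for illustration.

```python
def rb_sqrt_pick(forty_time, broad_jump, shuttle, awards):
    """Fitted RB model from Table 4: predicted sqrt(overall draft position)."""
    return (
        -308
        + 112.45 * forty_time
        - 0.17 * broad_jump
        - 4.09 * shuttle
        - 2.86 * awards
        - 1.65 * forty_time ** 3
    )

def rb_predicted_pick(forty_time, broad_jump, shuttle, awards):
    """Square the fitted value to get back to the draft-pick scale."""
    return rb_sqrt_pick(forty_time, broad_jump, shuttle, awards) ** 2

# Hypothetical running back: 4.50 s forty, 120 in broad jump, 4.20 s shuttle
no_awards = rb_predicted_pick(4.50, 120, 4.20, awards=0)
one_award = rb_predicted_pick(4.50, 120, 4.20, awards=1)
print(f"no awards: pick {no_awards:.0f}, one award: pick {one_award:.0f}")
```

For this hypothetical player the sketch gives a predicted pick of roughly 102 with no awards and roughly 52 with one award. Because the published coefficients are rounded and the model is nonlinear, such predictions are only approximate, and the back-transformation is meaningful only when the fitted square root value is positive and the inputs fall within the range of the observed data.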
Discussion & Conclusion
During the NFL combine, much emphasis is placed on players’ 40-yard dash times and
vertical leap heights. The 40-yard dash variable was a significant predictor for all positions
except quarterbacks, yet the vertical leap measurement was significant for only defensive backs
and defensive linemen. Many analysts associate vertical leap skills with receivers, yet the
variable was not deemed significant in our receiver model. However, the broad jump variable
was a significant predictor for five of the position groups. In all models that include height,
broad jump, or bench reps, as the variable increases, predicted overall draft position decreases.
For the speed variables, 40-yard dash and 3 cone drill, faster (lower) times lowered the predicted
overall draft pick. Shuttle run, used in two models, had the opposite effect: faster times raised the
predicted draft position. This counterintuitive result suggests that confounding variables may be in play.
The number of total collegiate awards won by players entering the draft was a significant
predictor for every single position, and in every case lowered the predicted overall draft position.
This suggests NFL teams put a lot of weight on collegiate performance.
The models with the lowest R-squared values were offensive linemen and wide
receivers. Offensive linemen need to understand intricate NFL blocking schemes, while wide
receivers need to quickly know which routes will work against each defensive scheme. In both
cases smart players may have an advantage. The same argument could be made for any
position, and it suggests NFL teams may covet smart, quick-witted players by putting a lot of
weight on IQ (Wonderlic) scores, interviews, and even GPAs, all of which are unavailable to
the public. Our models also do not account for college game statistics, only collegiate awards.
NFL teams likely use game statistics and unreported player attributes when determining a
player’s worth. All of these missing variables more than likely contribute to our low R-squared values.
Future research should be done to integrate game statistics into these models, as well as
attempt to attain Wonderlic scores and other psychological or personality measures.
There is an almost infinite number of variables that could be used by NFL general
managers to determine who to draft, which makes it very difficult to create a highly effective
model for draft position. Although the NFL combine is now televised and more combine results
are becoming available to the public, some things are still unavailable. Interviewing sessions,
Wonderlic scores, college GPAs, and even criminal records are all sources of information the
public does not have access to. Even if those sources were available, they still could not account
for everything that affects draft order. However, we do conclude that a player’s position and
collegiate conference have an effect on where the player is drafted. It’s evident that some
variables are more important than others when determining a player’s draft position, and
that their importance depends on player position. In order to make a more reliable model, many variables need to be
considered and private information needs to become available. NFL combine results
undoubtedly have an effect on where the next Hall-of-Famer will be drafted, but what happens
off the field may be even more important.
References
Clarke, Patrick. “NFL Players Reportedly Will Attend White House Correspondents’ Dinner.”
Bleacherreport.com. Web. 2014. <http://bleacherreport.com/articles/2048459-nfl-players-reportedly-will-attend-white-house-correspondents-dinner>
“College Football Awards.” ESPN.com. Web. 2014. <http://espn.go.com/college-football/awards>
David Smith, Michael. “34 of America’s 35 most-watched fall TV shows were NFL games.”
Profootballtalk.com. Web. 2014. <http://profootballtalk.nbcsports.com/2014/01/08/34-of-americas-35-most-watched-fall-tv-shows-were-nfl-games/>
Florio, Mike. “Obama mimics Sherman during Correspondents Dinner.” Profootballtalk.com.
Web. 2014. <http://profootballtalk.nbcsports.com/2014/05/05/obama-mimics-sherman-during-correspondents-dinner/>
“NFL Combine Results.” NFLcombineresults.com. Web. 2014.
<http://nflcombineresults.com/nflcombinedata.php>
“NFL Draft History.” CBSSports.com. Web. 2014. <http://www.cbssports.com/nfl/draft/history>
“Super Bowl XLVIII breaks record for most-watched TV event in US history.” Foxnews.com.
Web. 2014. <http://www.foxnews.com/entertainment/2014/02/04/super-bowl-xlviii-breaks-record-for-most-watched-tv-event-in-us-history/>
Appendix 1
Page |1
Figure 1. Red implies offense, black implies defense, and italicized implies skill position.¹
Figure 2. Box plots of overall draft pick for each conference.²
Note 1: NCAA awards used in calculation of Total Collegiate Awards (T.C.A.): Heisman Trophy, Maxwell Award, Walter Camp
Award, Doak Walker Award, Davey O’Brien Award, Johnny Unitas Golden Arm, Fred Biletnikoff Award, John Mackey Award, Outland
Trophy, Vince Lombardi/Rotary Award, Rimington Trophy, Chuck Bednarik Award, Bronko Nagurski Award, Dick Butkus Award, Jim
Thorpe Award.
¹ DB (Defensive Backs; Safeties & Cornerbacks), LB (Linebackers), DL (Defensive Linemen), OL (Offensive Linemen), QB (Quarterback), WR (Wide Receiver), TE (Tight End), RB (Running Backs; Running Backs & Full Backs)
² NCAA D1A Conferences: American Athletic, ACC, Big 12, Big Ten, Conference USA, FBS Independents, Mid-American, Mountain West, Pac-12, SEC, Sun Belt.
Page |2
Table 1A. Defensive Lineman Model
Table 1B. Offensive Lineman Model
Table 1C. Linebacker Model
Table 1D. Quarterback Model
Table 1E. Running Back Model
Page |3
Table 1F. Tight End Model
Table 1G. Defensive Back Model
Table 1H. Wide Receiver Model
Figure 3. Normal QQ
Figure 4. Residual vs. Fitted
Figure 5. Normal QQ
Figure 6. Residual vs. Fitted
Page |4
Page |5