TO DRAFT OR NOT TO DRAFT?
A Statistical Analysis of the NFL Draft

Abstract

America’s up-and-coming pastime, the National Football League, is gaining popularity every year. This paper uses NFL draft results, combine results, and NCAA data to analyze 2,639 drafted college football players. Using one-way ANOVAs, we find that NCAA conference and player position each have a significant effect on draft position. We also fit statistically significant multiple regression models to predict draft position for each football position group. Potential predictor variables include height, weight, 40-yard dash, vertical leap, broad jump, 3 cone drill, shuttle run, bench press, and the number of collegiate awards obtained throughout college.

Introduction & Significance:

Many believe that America’s game is no longer baseball but American football. The United States President, Barack Obama, had six NFL players in attendance at his most recent White House Correspondents’ Dinner (Clarke) and even called out the Seattle Seahawks’ Richard Sherman by name (Florio). As the NFL becomes more accessible through television and the internet, America’s game continues to gain viewers and fans. Last season’s Super Bowl became the most-watched television event in U.S. history and led to 24.9 million tweets on Twitter (“Super”). In addition, 2013 regular season games accounted for 34 of the 35 most-watched fall TV shows in America (David Smith).

As fans become more interested in football and NFL teams, they start to care more about which players are on their favorite teams. The primary way NFL teams add new, young players is through the NFL draft, which takes place in the spring of each year. At this event the 32 NFL teams take turns drafting new football players, in reverse order of the previous year’s standings.
After finishing the 2011 season 2-14, the Indianapolis Colts used the number one pick in the 2012 draft on Stanford’s Andrew Luck. The following season, the team went 11-5 and made the playoffs. But for every Andrew Luck there is also a JaMarcus Russell or a Ryan Leaf, who ends up making his team worse. Clearly, the NFL draft can have a large effect on team performance.

When one player can have such an extreme effect on a team’s performance, how do NFL general managers decide whom to draft and, perhaps more importantly, whom not to draft? At the annual NFL combine, team scouts interview players and record performance-related measures. In addition to these combine results, NFL teams often use a player’s college position and performance to determine his draft stock. One way to assess a player’s collegiate performance is to count the awards the player has won. Every year the National Collegiate Athletic Association (NCAA) gives out awards to recognize the best performing collegiate football players overall and at each football position (Appendix 1, Note 1). It is often believed that some NCAA football conferences are tougher than others and produce better players. Others believe that offensive players are the most important players on the field and should be drafted earlier than defensive players.

Methods

For the purposes of this paper, we analyze the possible effects of NFL combine measures, collegiate performance, and player position on the overall draft position of NFL players.
We use recorded NFL combine data (“NFL Combine”), draft results (“NFL Draft”), and NCAA award results (“College”) from 1999 to 2012 to create a data set. Its numerical variables are the number of collegiate awards obtained and the overall draft pick of each player, along with each player’s NFL combine height (inches), weight (pounds), 40-yard dash time (seconds), shuttle time (seconds), vertical leap (inches), broad jump (inches), bench press (number of 225-pound repetitions), and three cone drill time (seconds). Categorical variables indicate each player’s position and collegiate NCAA Division 1-A conference (Appendix 1, Figure 2). This study uses eight player position groups: defensive backs, linebackers, defensive linemen, wide receivers, offensive linemen, tight ends, quarterbacks, and running backs (Appendix 1, Figure 1). The data set contains 2,639 players who play the positions listed in Figure 1. We assume this sample is representative of all NFL draft data, which allows us to do inference even though our sample is not technically random and independent.

Using this compiled data set, we attempt to determine whether a player’s position or collegiate conference has an effect on overall draft position, and which predictors, or combinations of predictors, best predict overall draft position. We primarily use one-way ANOVAs and multiple linear regression to study these research questions.

Results

The first question we analyzed was whether the conference a player plays in affects overall draft position. We analyzed this question with a one-way ANOVA. In addition to independence, the ANOVA conditions are normality and zero mean of the residuals, and a similar standard deviation for each group. The residual vs. fitted plot shows an even distribution of points above and below zero, signifying that the zero mean condition is met.
The normal QQ plot shows slight curvature in the lower and upper tails. Because ANOVA is robust to mild departures from normality, however, we still deemed this condition met. The plots discussed are shown in Appendix 1, Figures 3 & 4. The ANOVA results suggest that college conference does have an effect on overall draft position (F(10, 2274) = 2.94, p < .01). Figure 2 in Appendix 1 shows that, as expected, big powerhouse conferences such as the SEC, ACC, and Pac-12 have the lowest average overall draft position. (Note: a lower overall draft position indicates a player was chosen earlier in the draft.)

Before determining the best predictors of overall draft position, we need to determine whether it is appropriate to construct a single multiple regression model for all players (regardless of position) or to construct position-specific regression models. To test for a position effect on overall draft position, we used a one-way ANOVA with overall draft position as the response variable and position as the categorical explanatory variable. The conditions were assessed as described above for the conference ANOVA, and all were met. The plots used are shown in Appendix 1, Figures 5 & 6. The ANOVA results (F(7, 2631) = 3.86, p < .01) indicate that player position does have an effect on draft position, suggesting that separate multiple regression models for each player position are appropriate.

We began by using statistical software to evaluate all subsets of possible predictors of overall draft position for each player position. From these we chose models with high adjusted R-squared and low Mallows’s Cp values. Then, after various alterations and analysis of the adjusted R-squared, the significance of the predictors, and the residual plots, we determined the best models. We wanted our final models to have a relatively high adjusted R-squared, significant predictors, and residual plots that met the multiple regression conditions.
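The all-subsets screening described above can be sketched as follows. This is a minimal illustration only: the paper does not name its statistical software, so the helper names (`ols_rss`, `all_subsets`), the predictor names, and the synthetic data are all assumptions made for this example, not the authors' code or data. The sketch fits every non-empty subset of predictors by least squares and ranks the subsets by adjusted R-squared, reporting Mallows's Cp alongside.

```python
import itertools
import math
import random


def ols_rss(rows, y):
    """Residual sum of squares from least squares of y on rows (intercept added)."""
    A = [[1.0] + list(r) for r in rows]
    k = len(A[0])
    # Augmented normal equations [A'A | A'y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, a))) ** 2
               for a, yi in zip(A, y))


def all_subsets(rows, names, y):
    """Score every non-empty predictor subset by adjusted R^2 and Mallows's Cp."""
    n, m = len(y), len(names)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    sigma2 = ols_rss(rows, y) / (n - m - 1)  # error variance from the full model
    scored = []
    for size in range(1, m + 1):
        for idx in itertools.combinations(range(m), size):
            rss = ols_rss([[r[i] for i in idx] for r in rows], y)
            p = size + 1  # number of parameters, counting the intercept
            adj_r2 = 1 - (rss / (n - p)) / (tss / (n - 1))
            cp = rss / sigma2 - n + 2 * p
            scored.append(([names[i] for i in idx], adj_r2, cp))
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Synthetic stand-in for one position group (hypothetical values, not real data)
random.seed(0)
names = ["log_forty", "log_bench", "height", "awards"]
rows, y = [], []
for _ in range(200):
    forty = random.uniform(4.3, 4.8)
    bench = random.randint(8, 25)
    height = random.uniform(69, 77)
    awards = random.choice([0, 0, 0, 1, 2])
    rows.append([math.log(forty), math.log(bench), height, awards])
    # Made-up relationship for sqrt(draft pick); NOT the paper's fitted model
    y.append(-45 + 38 * math.log(forty) - 4.5 * awards + random.gauss(0, 1.5))

ranked = all_subsets(rows, names, y)
for subset, adj_r2, cp in ranked[:3]:
    print(f"{' + '.join(subset):45s} adj R^2 = {adj_r2:.3f}  Cp = {cp:.2f}")
```

In a screen like this, a subset is attractive when its adjusted R-squared is high and its Cp is close to its number of parameters. With real data one would more likely rely on an established statistics package than on hand-rolled least squares; the pure-Python solver here only keeps the sketch self-contained.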
The conditions for a multiple linear regression are randomness and independence in the sample, linearity in the coefficients, and zero mean, constant variance, and normality of the residuals. To meet the linearity condition, we built each model so that it is linear in its coefficients. The zero mean condition holds automatically under least squares regression with an intercept. We tried many transformations to obtain residual vs. fitted plots showing a random scatter of points and normal QQ plots that follow a straight line. All of the plots are shown in Appendix 1, Pages 4 & 5. The final models are shown in Table 4. More detailed fitted models are shown in Appendix 1, Pages 2 & 3.

In every model we used a square root transformation on the response variable, overall draft position, and at least one other transformation on an explanatory variable. The majority of the models also use interaction or squared terms. Although the adjusted R-squared values were rather small, each model’s overall p-value was significant, indicating that each model explains a statistically significant share of the variation in overall draft position. In addition, every predictor in every model was significant at the 10% level.

TABLE 4. Final fitted models for the square root of overall draft position, by position group (T.C.A. = Total Collegiate Awards)

WR Model: -19.08 + 38.1*log(40 Time) - 2.9*log(Bench) - 0.03*Height - 4.55*T.C.A.
QB Model: 166.2 - 24.65*log(Weight) - 18.15*log(Broad Jump) + 32.08*log(3 Cone) - 5.38*T.C.A. + 1.26*T.C.A.^2
RB Model: -308 + 112.45*40 Time - 0.17*Broad Jump - 4.09*Shuttle - 2.86*T.C.A. - 1.65*40 Time^3
TE Model: 76.61 + 5.94*40 Time - 12.05*log(Broad Jump) - 2.06*sqrt(Weight) - 2.97*T.C.A. - 0.16*Bench
OL Model: 35.167 + 4.59*40 Time - 4.23*sqrt(Height) - 2.34*log(Bench) - 0.07*Weight - 6.3*T.C.A. + 3.81*T.C.A.^2 + 8.45*log(3 Cone)
DL Model: -22.18 + 51.25*40 Time - 0.16*Broad Jump + 0.2*Vert. Leap - 0.08*Weight - 2.37*T.C.A. + 32.71*3 Cone - 2.63*Shuttle - 5.91*40 Time*3 Cone
LB Model: -103.9 + 8.58*40 Time - 9.03*log(Broad Jump) + 1.03*Weight - 0.01*Weight^2 - 9.43*T.C.A. + 3.46*T.C.A.^2
DB Model: 6.34 + 56.01*log(40 Time) - 4.15*log(Vert. Leap) - 17.57*log(Height) - 4.19*T.C.A. + 1.4*3 Cone

Discussion & Conclusion

During the NFL combine, much emphasis is placed on players’ 40-yard dash times and vertical leaps. The 40-yard dash was a significant predictor for every position except quarterback, yet vertical leap was significant only for defensive backs and defensive linemen. Many analysts associate vertical leap with receivers, yet the variable was not significant in our receiver model. Broad jump, however, was a significant predictor for five of the position groups. In all models that include height, broad jump, or bench reps, predicted overall draft position decreases as the variable increases. For the speed variables, 40-yard dash and 3 cone drill, faster (lower) times lowered the predicted overall draft pick. Shuttle run, used in two models, had the opposite effect: faster times raised the predicted draft position. This result suggests that confounding variables may be in play.

The number of total collegiate awards won by players entering the draft was a significant predictor for every single position, and in every case it lowered the predicted overall draft position. This suggests NFL teams put a lot of weight on collegiate performance.

The models with the lowest R-squared values were those for offensive linemen and wide receivers. Offensive linemen need to understand intricate NFL blocking schemes, while wide receivers need to quickly recognize which routes will work against each defensive scheme. In both cases smart players may have an advantage. The same sort of argument could be made for any position, which suggests that NFL teams may covet smart, quick-witted players and put a lot of weight on IQ (Wonderlic) scores, interviews, and even GPAs, all of which are unavailable to the public. Our models also do not account for college game statistics, only collegiate awards.
NFL teams likely use game statistics and unreported player attributes when determining a player’s worth. These missing variables are a likely reason for our low R-squared values. Future research should integrate game statistics into these models and attempt to obtain Wonderlic scores and other psychological or personality measures.

NFL general managers could use an almost unlimited number of variables to decide whom to draft, which makes it very difficult to create a highly effective model for draft position. Although the NFL combine is now televised and more combine results are becoming available to the public, some things are still unavailable. Interview sessions, Wonderlic scores, college GPAs, and even criminal records are all sources of information the public cannot access. Even if those sources were available, they still could not account for everything that affects draft order.

However, we do conclude that a player’s position and collegiate conference have an effect on where a player is drafted. It is evident that some variables are more important than others in determining a player’s draft position, and that their importance depends on player position. To build a more reliable model, many more variables need to be considered, and private information would need to become available. NFL combine results undoubtedly have an effect on where the next Hall-of-Famer will be drafted, but what happens off the field may be even more important.

References

Clarke, Patrick. “NFL Players Reportedly Will Attend White House Correspondents’ Dinner.” Bleacherreport.com. Web. 2014. <http://bleacherreport.com/articles/2048459-nfl-playersreportedly-will-attend-white-house-correspondents-dinner>

“College Football Awards.” ESPN.com. Web. 2014. <http://espn.go.com/collegefootball/awards>

David Smith, Michael. “34 of America’s 35 most-watched fall TV shows were NFL games.” Profootballtalk.com. Web. 2014.
<http://profootballtalk.nbcsports.com/2014/01/08/34-ofamericas-35-most-watched-fall-tv-shows-were-nfl-games/>

Florio, Mike. “Obama mimics Sherman during Correspondents Dinner.” Profootballtalk.com. Web. 2014. <http://profootballtalk.nbcsports.com/2014/05/05/obama-mimics-shermanduring-correspondents-dinner/>

“NFL Combine Results.” NFLcombineresults.com. Web. 2014. <http://nflcombineresults.com/nflcombinedata.php>

“NFL Draft History.” CBSSports.com. Web. 2014. <http://www.cbssports.com/nfl/draft/history>

“Super Bowl XLVIII breaks record for most-watched TV event in US history.” Foxnews.com. Web. 2014. <http://www.foxnews.com/entertainment/2014/02/04/super-bowl-xlviiibreaks-record-for-most-watched-tv-event-in-us-history/>

Appendix 1

Page |1

Figure 1. Red implies offense, black implies defense, and italicized implies skill position. [1]

Figure 2. Box plots of overall draft pick for each conference. [2]

Note 1: NCAA awards used in calculation of Total Collegiate Awards (T.C.A.): Heisman Trophy, Maxwell Award, Walter Camp Award, Doak Walker Award, Davey O’Brien Award, Johnny Unitas Golden Arm, Fred Biletnikoff Award, John Mackey Award, Outland Trophy, Vince Lombardi/Rotary Award, Rimington Trophy, Chuck Bednarik Award, Bronko Nagurski Award, Dick Butkus Award, Jim Thorpe Award.

[1] DB (Defensive Backs; Safeties & Cornerbacks), LB (Linebackers), DL (Defensive Linemen), OL (Offensive Linemen), QB (Quarterback), WR (Wide Receiver), TE (Tight End), RB (Running Backs; Running Backs & Fullbacks)
[2] NCAA D1A Conferences: American Athletic, ACC, Big 12, Big Ten, Conference USA, FBS Independents, Mid-American, Mountain West, Pac-12, SEC, Sun Belt.

Page |2

Table 1A. Defensive Lineman Model
Table 1B. Offensive Lineman Model
Table 1C. Linebacker Model
Table 1D. Quarterback Model
Table 1E. Running Back Model

Page |3

Table 1F. Tight End Model
Table 1G. Defensive Back Model
Table 1H. Wide Receiver Model
Figure 3. Normal QQ
Figure 4. Residual vs. Fitted
Figure 5. Normal QQ
Figure 6. Residual vs. Fitted

Page |4

Page |5