Moneyball
Are your students getting on base?

Planning and Research Office
Craig Hayward, Ph.D.
Natalia Cordoba-Velasquez
Cabrillo College
January 31st, 2012

First, a primer on baseball:
◦ http://www.youtube.com/watch?v=cMha-DjYMqQ

What is sabermetrics?
The search for objective knowledge about baseball
The value of getting on base
Baseball statistics, unlike statistics in any other area, have acquired the power of language.
◦ Bill James, 1985 Statistical Abstract

The four inefficiencies
1. Not basing decisions on data
2. Using the wrong data to make decisions
3. Using good data but in the wrong way
4. Not collecting the right data

The Moneyball odyssey
Where did Bill James start?
◦ He started where the data was the best
◦ “Looking at places where the stats don’t tell the whole truth – or even lie about the situation.”

See a pattern?
Credit Grade Count and Credit Grade Count (%) by term

                   Spring 1993      Spring 1995      Spring 1997      Spring 1999      Spring 2001
                 Count       %    Count       %    Count       %    Count       %    Count       %
Cabrillo Total  38,954 100.00%   38,289 100.00%   38,780 100.00%   38,339 100.00%   38,159 100.00%
Grade A         10,602  27.22%   10,603  27.69%   11,525  29.72%   11,651  30.39%   11,297  29.61%
Grade B          6,083  15.62%    6,121  15.99%    5,626  14.51%    6,013  15.68%    6,011  15.75%
Grade C          3,568   9.16%    3,327   8.69%    2,995   7.72%    3,025   7.89%    3,007   7.88%
Grade D            815   2.09%      746   1.95%      699   1.80%      754   1.97%      643   1.69%
Grade F            695   1.78%      830   2.17%      651   1.68%      974   2.54%      921   2.41%
Pass             6,591  16.92%    5,405  14.12%    5,687  14.66%    5,856  15.27%    6,358  16.66%
[label unclear]  2,513   6.45%    2,604   6.80%    2,812   7.25%    3,048   7.95%    3,218   8.43%
[label unclear]    575   1.48%      730   1.91%      710   1.83%      413   1.08%      339   0.89%
[label unclear]          0.00%       54   0.14%       82   0.21%      125   0.33%       60   0.16%
Withdrew         6,326  16.24%    6,999  18.28%    6,857  17.68%    5,535  14.44%    6,305  16.52%
[label unclear]  1,186   3.04%      870   2.27%    1,136   2.93%      945   2.46%            0.00%

The remaining row labels are No Pass, Incomplete, No Credit, Report Delayed, Dropped, Military Withdrawal and Unknown; which label belongs to which of the [label unclear] rows cannot be recovered here. The rows not shown are 0.00% in every term.

Proportion of “A” Grades relative to all other grade notations
Grade inflation?
[Chart: Grade A share by term, Spring 1993 to Spring 2011 (every second spring); y-axis 24% to 33%; linear trend y = 0.0052x + 0.2721, R² = 0.8349]

A bit less inflated
[Chart: shares of Grade A, Grade B, Grade C, Grade D, Grade F and Withdrew by term, Spring 1993 to Spring 2011; y-axis 0% to 45%; linear trend for Grade A: y = 0.0041x + 0.3774, R² = 0.4208]

Grade inflation in proper context
[Chart: shares of Grade A, Grade B, Grade C, Grade D and Grade F by term, Spring 1993 to Spring 2011; y-axis 0% to 60%; linear trend y = 2E-05x + 0.5076, R² = .00002]

Cabrillo College’s Campus Climate Study
Biennial survey
Major revision in 2008
◦ Dropped demographic questions
◦ Collected data sufficient for a “fuzzy match”
◦ Added engagement, behavioral & tech questions
2,055 cases from Fall 2008 & Fall 2010

Sample Description

Characteristic         College Population     Campus Climate Sample
                       (Fact Book 2011)       (n=2,055)
Gender: Female         53%                    56%
Ethnicity: Latino      30%                    33%
Ethnicity: White       57%                    47%
Age: 18-20             29%                    46%
Age: 21-25             25%                    24%
Age: 26-30             12%                     9%
Workload: Full Time    28%                    55%

Student Engagement
Student engagement is “…the interaction or fusion of behavior, emotion, and cognition in the process of learning.”
Fredricks, Blumenfeld & Paris (2004)

Student Engagement
[Chart: mean score per item, axis from 1.00 to 4.00]

Item                                                                Mean
Participated in class                                               2.92
Rapid instructor feedback                                           2.77
Asked instructor re: assignments                                    2.75
Had meaningful conversations with students of different ethnicity   2.54
Worked with other students                                          2.52
Sought advice re: career plans                                      2.05
Used a chat or email for class                                      2.00

Full Time students are engaged
(statistically significant differences for all items)
[Chart: mean score per item for full-time (FT) and part-time (PT) students, axis from 1.0 to 4.0]

Item                                                FT      PT
Average Engagement scale                            2.58    2.40
Participated in class                               2.96    2.86
Rapid instructor feedback                           2.84    2.66
Asked instructor re: assignments                    2.82    2.67
Meaningful conversations with different ethnicity   2.59    2.45
Worked with other students                          2.59    2.42
Sought advice re: career plans                      2.13    1.91
Used a chat or email for class                      2.10    1.90

Technology usage 2008
[Chart: Frequency of Usage 2008; stacked bars (Never, Seldom, Sometimes, Often) for Facebook, My Space and C. Wireless]

Technology usage 2010
[Chart: Frequency of Usage 2010; stacked bars (Never, Seldom, Sometimes, Often) for Facebook, My Space and C. Wireless]

Building a model of student achievement
Multivariate Linear Regression
◦ Does the inclusion of a factor change the model?
◦ Standardized Beta coefficients typically range from -1 to 1
Dependent Variable: GPA
16 Independent/predictor variables tested
Hypothesis: student engagement has a direct effect on student achievement

Is the influence of Student Engagement on achievement mediated by other factors or does it have a direct effect?
[Diagram: Demographics (Age, Gender, Ethnicity, SES, Other factors) and Student Engagement, leading to Student Achievement (GPA)]

The basic relationship
[Diagram: Student Engagement → GPA]

Unit Load and Working
[Diagram: Unit Load, Hours Working, and Interaction of load*work as predictors of GPA]

Unit Load and Working - details

                          Unstandardized           Standardized
                          Coefficients             Coefficients
Model 1                   B        Std. Error      Beta          t         Sig.
(Constant)                3.040    .146                          20.867    .000
Hours worked               .083    .031             .201          2.679    .007
Term Units                -.004    .012            -.018          -.327    .744
Work*units interaction    -.006    .003            -.197         -2.424    .015
a. Dependent Variable: GPA

Demographics
[Diagram: Gender, Age and Ethnicity as predictors of GPA]

Demographic model - details

                Unstandardized           Standardized
                Coefficients             Coefficients
Model 1         B        Std. Error      Beta          t         Sig.
(Constant)      2.507    .063                          40.070    .000
Age              .022    .002             .249         10.273    .000
Gender           .165    .041             .096          4.001    .000
Latino          -.265    .043            -.149         -6.192    .000
a. Dependent Variable: GPA

N.B. Age has a bivariate association with GPA of virtually zero! r = .022

Über model
[Diagram: Age, Ethnicity, Gender, Teacher support, Unit load, Home Tech, Live with parents and Engagement as predictors of GPA]

Über model - details

                             Unstandardized           Standardized
                             Coefficients             Coefficients
Model 1                      B        Std. Error      Beta          t         Sig.
(Constant)                   2.375    .181                          13.100    .000
Age                           .017    .003             .191          5.924    .000
Gender                        .151    .045             .089          3.326    .001
Latino                       -.184    .049            -.105         -3.775    .000
Perception of Instructors     .086    .030             .080          2.860    .004
Technology in Home            .084    .031             .074         -2.716    .007
Term Units                   -.014    .006            -.069         -2.399    .017
Living with parents          -.168    .052            -.100         -3.218    .001
Engagement                    .100    .037             .076          2.698    .007
a. Dependent Variable: GPA

What opportunities are we missing?
What data do you think might be important to predict achievement that we are not currently collecting/using?

Next Steps
Continue to reflect on how to use predictive information
◦ In what contexts can this information be used to enhance student success?
Integrating psychological measures
◦ the College Student Self-Assessment Survey (CSSAS)
◦ Research question: Do psychological measures enhance our ability to predict student performance?
Consider benefits of integrating Climate survey with Instructional Planning survey

[Diagram: Interventions, then CSSAS constructs, then Achievement (GPA)]
Interventions
• Learning communities
• Grant activities
• Curricular innovation
• Matched comparison groups
CSSAS CONSTRUCTS (grouped in the diagram under Relationship to Self and Relationship to Others):
Academic Self-Efficacy, Hope, Communication, Academic Identity, Goals, Self-Regulation, Personal Responsibility, Leadership & Teamwork
Achievement (GPA)

The four inefficiencies
1. Not basing decisions on data
◦ “Death by anecdote”
2. Using the wrong data to make decisions
◦ Granularity; Simpson’s paradox
3. Using good data but in the wrong way
◦ Grade inflation
4. Not collecting the right data
◦ Missed classes, missed opportunities

Final thought
The answers I arrive at – and thus the methods that I choose – are almost never wholly satisfactory, never wholly disappointing.
The most consistent problems that I have arise from the limitations on my information sources.
◦ Bill James as quoted in Moneyball, page 82

Mauriello and Armbruster’s goal was to value the events that occurred on a baseball field more accurately than they had ever been valued before. In 1994, they stopped analyzing derivatives and formed a company to analyze baseball players, called AVM Systems. Ken Mauriello had seen a connection between the new complex financial markets and baseball: “the inefficiency caused by sloppy data.” As Bill James had shown, baseball data conflated luck and skill, and simply ignored a lot of what happened in a game.
◦ Moneyball, page 131
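
Worked example: the denominator behind "grade inflation"
Inefficiency 3, using good data but in the wrong way, can be made concrete with the Spring 1993 counts from the "See a pattern?" grade table. The sketch below computes the Grade A share under three denominators: all grade notations, A to F plus Withdrew, and A to F only. That these are the denominators behind the three grade charts is an assumption, inferred from the series each chart lists; `a_share` and the constant names are hypothetical helpers for illustration, not part of the original analysis.

```python
# Spring 1993 counts copied from the "See a pattern?" grade table.
SPRING_1993 = {
    "A": 10_602,
    "B": 6_083,
    "C": 3_568,
    "D": 815,
    "F": 695,
    "Withdrew": 6_326,
}
ALL_NOTATIONS_1993 = 38_954  # "Cabrillo Total" row: every grade notation


def a_share(counts, denominator):
    """Return the Grade A share for a chosen denominator (hypothetical helper).

    `denominator` is either an integer total supplied directly,
    or a list of keys in `counts` whose values are summed.
    """
    if isinstance(denominator, int):
        total = denominator
    else:
        total = sum(counts[key] for key in denominator)
    return counts["A"] / total


# Assumption: these three denominators mirror the three grade charts.
share_all = a_share(SPRING_1993, ALL_NOTATIONS_1993)
share_letters_w = a_share(SPRING_1993, ["A", "B", "C", "D", "F", "Withdrew"])
share_letters = a_share(SPRING_1993, ["A", "B", "C", "D", "F"])

for label, share in [
    ("All grade notations", share_all),
    ("A-F plus Withdrew", share_letters_w),
    ("A-F only", share_letters),
]:
    print(f"{label:<22}{share:6.2%}")
```

Under these assumptions the same 10,602 A grades come to roughly 27%, 38% or 49% of the total, so the level of the "A" share depends heavily on which notations are counted in the denominator.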