Moneyball: Using stats to look at grade inflation, student engagement, and achievement; includes data from Campus Climate Fall 2010 Survey

Moneyball
Are your students getting on base?
Planning and Research Office
Craig Hayward, Ph.D.
Natalia Cordoba-Velasquez
Cabrillo College
January 31st, 2012

• First, a primer on baseball:
◦ http://www.youtube.com/watch?v=cMha-DjYMqQ
• What is sabermetrics? The search for objective knowledge about baseball
• The value of getting on base
Baseball statistics, unlike statistics in any other area, have acquired the power of language.
◦ Bill James, 1985 Statistical Abstract
The four inefficiencies
1. Not basing decisions on data
2. Using the wrong data to make decisions
3. Using good data but in the wrong way
4. Not collecting the right data
The Moneyball odyssey

• Where did Bill James start?
◦ He started where the data was the best
◦ “Looking at places where the stats don’t tell the whole truth – or even lie about the situation.”
See a pattern?
Credit Grade Count and Credit Grade Count (%), by Spring term

                           Spring 1993        Spring 1995        Spring 1997        Spring 1999        Spring 2001
Grade notation          Count      (%)     Count      (%)     Count      (%)     Count      (%)     Count      (%)
Cabrillo Total         38,954  100.00%    38,289  100.00%    38,780  100.00%    38,339  100.00%    38,159  100.00%
Grade A                10,602   27.22%    10,603   27.69%    11,525   29.72%    11,651   30.39%    11,297   29.61%
Grade B                 6,083   15.62%     6,121   15.99%     5,626   14.51%     6,013   15.68%     6,011   15.75%
Grade C                 3,568    9.16%     3,327    8.69%     2,995    7.72%     3,025    7.89%     3,007    7.88%
Grade D                   815    2.09%       746    1.95%       699    1.80%       754    1.97%       643    1.69%
Grade F                   695    1.78%       830    2.17%       651    1.68%       974    2.54%       921    2.41%
Pass                    6,591   16.92%     5,405   14.12%     5,687   14.66%     5,856   15.27%     6,358   16.66%
No Pass                 2,513    6.45%     2,604    6.80%     2,812    7.25%     3,048    7.95%     3,218    8.43%
Incomplete No Credit      575    1.48%       730    1.91%       710    1.83%       413    1.08%       339    0.89%
Report Delayed              0    0.00%        54    0.14%        82    0.21%       125    0.33%        60    0.16%
Dropped                     0    0.00%         0    0.00%         0    0.00%         0    0.00%         0    0.00%
Withdrew                6,326   16.24%     6,999   18.28%     6,857   17.68%     5,535   14.44%     6,305   16.52%
Military Withdrawal         0    0.00%         0    0.00%         0    0.00%         0    0.00%         0    0.00%
Unknown                 1,186    3.04%       870    2.27%     1,136    2.93%       945    2.46%         0    0.00%
Grade inflation?
Proportion of “A” Grades relative to all other grade notations
[Chart: Grade A share of all grade notations, odd-year Spring terms from Spring 1993 to Spring 2011; vertical axis from 24% to 33%. Linear trend: y = 0.0052x + 0.2721, R² = 0.8349]
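The trend line on this chart can be approximated from the tabulated values. The following is a minimal sketch, assuming the five Grade A percentages for Spring 1993 through Spring 2001 from the grade table and an x axis coded as the term index 0, 1, 2, ... (both the coding and the variable names are assumptions). The slide's own line was fit to ten terms through Spring 2011, which are not tabulated here, so the coefficients will not match it exactly.

```python
import numpy as np

# Grade A share of all grade notations, Spring 1993 to Spring 2001,
# copied from the grade table (percentages expressed as proportions).
a_share = np.array([0.2722, 0.2769, 0.2972, 0.3039, 0.2961])

# Assumption: x is the term index (one step = two years), not the calendar year.
x = np.arange(len(a_share))

# Ordinary least-squares line; polyfit returns the highest-degree term first.
slope, intercept = np.polyfit(x, a_share, deg=1)

# R^2 = 1 - (residual sum of squares / total sum of squares).
fitted = slope * x + intercept
ss_res = np.sum((a_share - fitted) ** 2)
ss_tot = np.sum((a_share - a_share.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.3f}")
```

A positive slope here only says that the A share of all notations rose over these five terms; it does not say what the A share was being compared against.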
A bit less inflated
[Chart: share of grades by notation (Grade A, Grade B, Grade C, Grade D, Grade F, Withdrew), odd-year Spring terms from Spring 1993 to Spring 2011; vertical axis from 0% to 45%. Linear (Grade A): y = 0.0041x + 0.3774, R² = 0.4208]
Grade inflation in proper context
[Chart: share of grades by notation (Grade A, Grade B, Grade C, Grade D, Grade F), odd-year Spring terms from Spring 1993 to Spring 2011; vertical axis from 0% to 60%. Linear (Grade A): y = 2E-05x + 0.5076, R² = .00002]
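The three charts differ mainly in what the A count is divided by. The sketch below recomputes the Grade A share under three denominators using the counts from the grade table. The choice of denominators (all notations; A through F plus Withdrew; A through F only) is inferred from the chart legends and is an assumption, as are the helper names.

```python
# Counts copied from the grade table for Spring 1993, 1995, 1997, 1999, 2001.
terms = [1993, 1995, 1997, 1999, 2001]
counts = {
    "A": [10602, 10603, 11525, 11651, 11297],
    "B": [6083, 6121, 5626, 6013, 6011],
    "C": [3568, 3327, 2995, 3025, 3007],
    "D": [815, 746, 699, 754, 643],
    "F": [695, 830, 651, 974, 921],
    "W": [6326, 6999, 6857, 5535, 6305],  # Withdrew
}
total = [38954, 38289, 38780, 38339, 38159]  # all grade notations


def share_of_a(i, keys):
    """Share of A grades in term i among the notations named in keys."""
    return counts["A"][i] / sum(counts[k][i] for k in keys)


for i, year in enumerate(terms):
    vs_all = counts["A"][i] / total[i]          # assumed basis of the first chart
    vs_letters_w = share_of_a(i, "ABCDFW")      # assumed basis of the second chart
    vs_letters = share_of_a(i, "ABCDF")         # assumed basis of the third chart
    print(f"Spring {year}: {vs_all:.1%} of all, "
          f"{vs_letters_w:.1%} of A-F+W, {vs_letters:.1%} of A-F")
```

The same A counts produce a share near 27% or near 49% for Spring 1993 depending only on the denominator, which is why the trend has to be read against a stated base.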
Cabrillo College’s Campus Climate Study
• Biennial survey
• Major revision in 2008
◦ Dropped demographic questions
◦ Collected data sufficient for a “fuzzy match”
◦ Added engagement, behavioral & tech questions
• 2,055 cases from Fall 2008 & Fall 2010
Sample Description
Characteristics           College Population     Campus Climate Sample
                          (Fact Book 2011)       (n=2,055)
Gender      Female        53%                    56%
Ethnicity   Latino        30%                    33%
            White         57%                    47%
Age         18-20         29%                    46%
            21-25         25%                    24%
            26-30         12%                    9%
Workload    Full Time     28%                    55%
Student Engagement
Student engagement is “…the interaction or fusion of behavior, emotion, and cognition in the process of learning.”
Fredricks, Blumenfeld & Paris (2004)
Student Engagement
[Bar chart: mean response per engagement item; vertical axis from 1.00 to 4.00]
Item                                                                  Mean
Participated in class                                                 2.92
Rapid instructor feedback                                             2.77
Asked instructor re: assignments                                      2.75
Had meaningful conversations with students of different ethnicity     2.54
Worked with other students                                            2.52
Sought advice re: career plans                                        2.05
Used a chat or email for class                                        2.00
Full Time students are engaged
(statistically significant differences for all items)
[Bar chart: mean response per engagement item, Full Time (FT) vs. Part Time (PT); vertical axis from 1.0 to 4.0]
Item                                                  FT      PT
Average Engagement scale                              2.58    2.40
Participated in class                                 2.96    2.86
Rapid instructor feedback                             2.84    2.66
Asked instructor re: assignments                      2.82    2.67
Meaningful conversations with different ethnicity     2.59    2.45
Worked with other students                            2.59    2.42
Sought advice re: career plans                        2.13    1.91
Used a chat or email for class                        2.10    1.90
Technology usage 2008
[Stacked bar chart: Frequency of Usage 2008. Items: Facebook, My Space, C. Wireless. Response categories: Never, Seldom, Sometimes, Often. Vertical axis from 0% to 100%.]
Technology usage 2010
[Stacked bar chart: Frequency of Usage 2010. Items: Facebook, My Space, C. Wireless. Response categories: Never, Seldom, Sometimes, Often. Vertical axis from 0% to 100%.]
Building a model of student achievement
• Multivariate Linear Regression
◦ Does the inclusion of a factor change the model?
◦ Standardized Beta coefficients typically range from -1 to 1
• Dependent Variable: GPA
• 16 Independent/predictor variables tested
• Hypothesis – student engagement has a direct effect on student achievement
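To make the distinction between unstandardized B and standardized Beta concrete, here is a minimal sketch of a two-predictor linear regression. The data are synthetic and hypothetical (none of the values come from the survey), and the variable and function names are assumptions chosen for illustration. It fits the model on raw values to get B, then on z-scored values to get Beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, hypothetical data; these are NOT survey values.
engagement = rng.normal(2.5, 0.5, n)
term_units = rng.normal(10.0, 4.0, n)
gpa = 2.0 + 0.3 * engagement - 0.02 * term_units + rng.normal(0.0, 0.6, n)

X = np.column_stack([engagement, term_units])


def ols(predictors, y):
    """Least-squares coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def zscore(a):
    """Center to mean 0 and scale to sample standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


b = ols(X, gpa)                      # unstandardized B (original units)
beta = ols(zscore(X), zscore(gpa))   # standardized Beta (standard-deviation units)

# Beta is B rescaled by sd(x) / sd(y), so the two fits agree.
rescaled = b[1:] * X.std(axis=0, ddof=1) / gpa.std(ddof=1)

print("B:", b.round(3))
print("Beta:", beta[1:].round(3))
print("B * sd(x) / sd(y):", rescaled.round(3))
```

Because Beta is expressed in standard-deviation units, predictors measured on different scales (units, years of age, a 1 to 4 survey scale) can be compared within one model.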
Is the influence of Student Engagement on achievement mediated by other factors or does it have a direct effect?
[Diagram: Student Engagement; Demographics (Age, Gender, Ethnicity, SES, Other factors); Student Achievement (GPA)]
The basic relationship
[Diagram: Student Engagement → GPA]
Unit Load and Working
[Diagram: Unit Load, Hours Working, and the interaction of load*work → GPA]
Unit Load and Working - details
Model 1                    Unstandardized Coefficients    Standardized Coefficients
                           B         Std. Error           Beta          t          Sig.
(Constant)                 3.040     .146                               20.867     .000
Hours worked                .083     .031                  .201          2.679     .007
Term Units                 -.004     .012                 -.018          -.327     .744
Work*units interaction     -.006     .003                 -.197         -2.424     .015
a. Dependent Variable: GPA
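With an interaction term in the model, the effect of hours worked depends on unit load. The sketch below plugs the unstandardized coefficients from this table into the fitted equation. It assumes the values map to the rows in the order listed (Constant, Hours worked, Term Units, Work*units interaction); how "Hours worked" was coded (raw hours or survey categories) is not stated, so a "step" in hours is left undefined. The function names are hypothetical.

```python
# Unstandardized coefficients (B) as read from the table; the row order is an assumption.
B_CONSTANT = 3.040
B_HOURS = 0.083
B_UNITS = -0.004
B_INTERACTION = -0.006


def predicted_gpa(hours, units):
    """Fitted GPA: constant + hours + units + hours*units interaction."""
    return (B_CONSTANT
            + B_HOURS * hours
            + B_UNITS * units
            + B_INTERACTION * hours * units)


def hours_slope(units):
    """Change in predicted GPA per one-step increase in hours worked, at a given unit load."""
    return B_HOURS + B_INTERACTION * units


# Unit load at which the slope for hours worked changes sign.
break_even_units = -B_HOURS / B_INTERACTION

for units in (6, 12, 15):
    print(f"{units} units: slope for hours worked = {hours_slope(units):+.3f}")
print(f"slope is zero at about {break_even_units:.1f} units")
```

Under these coefficients, additional work hours are associated with a lower predicted GPA only once the unit load passes roughly 14 units, which is the pattern an interaction term is meant to capture.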
Demographics
[Diagram: Gender, Age, Ethnicity → GPA]
Demographic model - details
Model 1         Unstandardized Coefficients    Standardized Coefficients
                B         Std. Error           Beta          t          Sig.
(Constant)      2.507     .063                               40.070     .000
Age              .022     .002                  .249         10.273     .000
Gender           .165     .041                  .096          4.001     .000
Latino          -.265     .043                 -.149         -6.192     .000
a. Dependent Variable: GPA
N.B. Age has a bivariate association with GPA of virtually zero! r = .022
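In a coefficient table of this kind, each t value is the coefficient B divided by its standard error. The following sketch checks the demographic model's reported t values against the rounded B and Std. Error columns. Because the table rounds to three decimals, the check uses the range of ratios that the rounding allows instead of an exact match; the helper name and the rounding tolerance are assumptions.

```python
# (B, Std. Error, reported t) for each row of the demographic model table.
rows = {
    "(Constant)": (2.507, 0.063, 40.070),
    "Age": (0.022, 0.002, 10.273),
    "Gender": (0.165, 0.041, 4.001),
    "Latino": (-0.265, 0.043, -6.192),
}

HALF_UNIT = 0.0005  # maximum rounding error for a value shown to three decimals


def t_is_consistent(b, se, t_reported):
    """True if |t| lies within the range of |B| / SE allowed by rounding."""
    low = (abs(b) - HALF_UNIT) / (se + HALF_UNIT)
    high = (abs(b) + HALF_UNIT) / (se - HALF_UNIT)
    same_sign = (b >= 0) == (t_reported >= 0)
    return same_sign and low <= abs(t_reported) <= high


for name, (b, se, t_reported) in rows.items():
    print(f"{name:<12} B/SE = {b / se:8.2f}   reported t = {t_reported:8.3f}   "
          f"consistent: {t_is_consistent(b, se, t_reported)}")
```

The wide allowable range for Age shows how much a three-decimal standard error of .002 can hide, which is one reason to read t and Sig. directly from the output.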
Über model
[Diagram: Age, Ethnicity, Gender, Teacher support, Unit load, Home Tech, Live with parents, Engagement → GPA]
Über model - details
Model 1                       Unstandardized Coefficients    Standardized Coefficients
                              B         Std. Error           Beta          t          Sig.
(Constant)                    2.375     .181                               13.100     .000
Age                            .017     .003                  .191          5.924     .000
Gender                         .151     .045                  .089          3.326     .001
Latino                        -.184     .049                 -.105         -3.775     .000
Perception of Instructors      .086     .030                  .080          2.860     .004
Technology in Home             .084     .031                  .074         -2.716     .007
Term Units                    -.014     .006                 -.069         -2.399     .017
Living with parents           -.168     .052                 -.100         -3.218     .001
engagement                     .100     .037                  .076          2.698     .007
a. Dependent Variable: GPA
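Since standardized Beta coefficients share a common scale, one way to read this table is to rank the predictors by the absolute size of Beta. The sketch below does that with the Beta column as listed; it assumes the values map to the predictors in the order shown, and the variable names are hypothetical.

```python
# Standardized Beta per predictor, as read from the table (row order is an assumption).
betas = {
    "Age": 0.191,
    "Gender": 0.089,
    "Latino": -0.105,
    "Perception of Instructors": 0.080,
    "Technology in Home": 0.074,
    "Term Units": -0.069,
    "Living with parents": -0.100,
    "engagement": 0.076,
}

# Rank by magnitude; the sign only gives the direction of the association.
ranked = sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True)

for name, beta in ranked:
    direction = "positive" if beta > 0 else "negative"
    print(f"{name:<28} Beta = {beta:+.3f} ({direction})")
```

Read this way, engagement keeps a small positive association with GPA after the other predictors are included, which bears on the direct-effect hypothesis stated earlier.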
What opportunities are we missing?

• What data do you think might be important to predict achievement that we are not currently collecting/using?
Next Steps

• Continue to reflect on how to use predictive information
◦ In what contexts can this information be used to enhance student success?
• Integrating psychological measures
◦ the College Student Self-Assessment Survey (CSSAS)
◦ Research question: Do psychological measures enhance our ability to predict student performance?
• Consider benefits of integrating Climate survey with Instructional Planning survey
CSSAS CONSTRUCTS:
[Diagram: constructs grouped under Relationship to Self and Relationship to Others, shown alongside Interventions and Achievement (GPA)]
Constructs: Academic Self-Efficacy, Hope, Communication, Academic Identity, Goals, Self-Regulation, Personal Responsibility, Leadership & Teamwork
Interventions
• Learning communities
• Grant activities
• Curricular innovation
• Matched comparison groups
The four inefficiencies
1. Not basing decisions on data
◦ “Death by anecdote”
2. Using the wrong data to make decisions
◦ Granularity; Simpson’s paradox
3. Using good data but in the wrong way
◦ Grade inflation
4. Not collecting the right data
◦ Missed classes, missed opportunities
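Simpson’s paradox, named under the second inefficiency, is easiest to see with numbers. The sketch below uses hypothetical success counts for two made-up programs split into day and evening sections; none of these figures are Cabrillo data, and all names are illustrative assumptions. Program X does better within each section type, yet Program Y does better in the aggregate, because the two programs enroll very different mixes of day and evening students.

```python
# Hypothetical (successes, enrollments) per section type; NOT Cabrillo data.
sections = {
    "Program X": {"day": (18, 20), "evening": (48, 80)},
    "Program Y": {"day": (68, 80), "evening": (10, 20)},
}


def rate(successes, enrollments):
    """Success rate as a proportion."""
    return successes / enrollments


def overall_rate(program):
    """Success rate after pooling all section types for one program."""
    successes = sum(s for s, _ in sections[program].values())
    enrollments = sum(n for _, n in sections[program].values())
    return rate(successes, enrollments)


for program, by_type in sections.items():
    parts = ", ".join(f"{kind} {rate(*counts):.0%}" for kind, counts in by_type.items())
    print(f"{program}: {parts}, overall {overall_rate(program):.0%}")
```

The level of granularity chosen for a comparison can therefore reverse its conclusion, which is the sense in which aggregated figures can be the wrong data for a decision.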
Final thought

• The answers I arrive at – and thus the methods that I choose – are almost never wholly satisfactory, never wholly disappointing. The most consistent problems that I have arise from the limitations on my information sources.
◦ Bill James as quoted in Moneyball, page 82
• Mauriello and Armbruster’s goal was to value the events that occurred on a baseball field more accurately than they had ever been valued before. In 1994, they stopped analyzing derivatives and formed a company to analyze baseball players, called AVM Systems. Ken Mauriello had seen a connection between the new complex financial markets and baseball: “the inefficiency caused by sloppy data.” As Bill James had shown, baseball data conflated luck and skill, and simply ignored a lot of what happened in a game. – Moneyball, page 131