Estimating Grade Differentials and Grade Inflation when
Students Chase Grades, and when Student Quality and
Instructors Matter
Michael Beenstock
Dan Feldman
Department of Economics
Hebrew University of Jerusalem
Previous studies have shown that there are persistent differences in university course
grades across subjects, suggesting that some departments grade more leniently than
others. Since better students are expected to obtain higher grades, some of these
differences might be due to student quality. The grades of the 2003 cohort of BA social
science students at the Hebrew University of Jerusalem are decomposed into an ability
effect, an instructor effect, a choice of major effect, a departmental grading effect, and a
random effect. Two methods are used to estimate these effects. The first conditions on
student entry grades to measure high school ability. The second conditions on student
specific effects to measure university ability.
We show that instructors who teach in different departments adjust grades to
departmental norms. This suggests that inter-departmental grade differentials are induced
by grading standards rather than the quality of instruction.
Whereas departmental grade differentials are large and persistent, there is little evidence
of grade inflation during 2000 – 2008. A simple test is proposed to determine whether
grade-chasing by students induces grade inflation. We show that grade-chasing has been
stable during the study period.
June 4, 2012
Keywords: differential grading, measuring academic ability, grade inflation, grade-chasing.
1. Introduction
The scientific literature on course grade comparability at universities has three related
components. The first, chronologically, is concerned with measuring the degree of
grade comparability, or the lack of it (Brogden and Taylor, 1950). Differential grading has a
hierarchical structure; it may exist between universities, between faculties at universities,
between departments and academic disciplines in faculties, between courses in
departments and between instructors. Apart from its inherent inequity, grade
incomparability has practical and financial implications since universities typically award
scholarships according to the grades of the best students. Also, acceptance into graduate
programs depends on BA grades, as may also the employment prospects of graduates.
Whereas grade differentials are concerned with horizontal inequity, grade inflation
(Johnson 2003) is concerned with vertical inequity. Grade inflation creates the misleading
impression that students who graduated later performed better than students belonging to
an earlier cohort. Grade incomparability and grade inflation seem to be widespread, if not
universal, phenomena.
The second component follows the first and is concerned, in the interests of
fairness, with designing adjustment methods that correct for differences in grading
standards.1 Goldman and Widawski (1976) suggested adjustments based on the grades of
the same students who studied in different departments. This idea was extended by Elliot
and Strenta (1988) to the course grades of the same students within departments. Stricker
et al (1993, 1994) “residualized” course fixed effects by regressing course grades on
student covariates, and suggested that these fixed effects be applied for purposes of grade
adjustment. However, these adjustments ignore the possibility that grades might depend
on the quality of instruction. Given everything else, the students of superior instructors
may achieve higher grades.
The third and most recent component is concerned with the behavioral causes of
differential grading. For example, Krautman and Sander (1999) and Johnson (2003)
claim that instructors “buy” favorable student evaluations by grading leniently. Also,
instructors may offer higher grades to compete in the market for students. The “grade race” triggered by this competitive behavior induces grade inflation (Johnson 2003). Bar and Zussman (2011) claim that at Cornell University instructors affiliated to the Republican party graded less equally and favored white students relative to instructors affiliated to the Democrats.

1 These adjustment methods for university and college grades have their counterparts for high school grades (Linn 1966 and Young 1993).
The present paper is concerned with the first component, the measurement of
differential grading, intra-temporally and inter-temporally. Departments typically set
differential rules for accepting students and set higher entry standards according to the
pressure of applications. If student quality varies between departments, we naturally
expect grades to be higher in departments with better students. However, student quality
is not observable, which complicates the empirical estimation of differential grading.
Specifically, two measures of student quality are compared. The first measures student
quality by their entry or acceptance grades, which measure student ability post high school and pre-university, and are based on high school matriculation grades and
psychometric scores. The second measures student quality at university. The latter is
estimated as a student specific effect using course grade histories on individual students.
We show that measures of differential grading may depend on how student quality is
measured. We also show that pre-university ability and university ability (as measured by student specific effects) are correlated at only 0.41, i.e. pre-university ability is an imperfect indicator of university ability.
Just as student quality is unobserved, so is the quality of their instructors
unobserved. Given everything else, students taught by better instructors should obtain
higher grades. Therefore, some of the grade differences might be induced by the quality
of instructors as well as the quality of students. We do not think that student evaluations
are reliable measures of instructor quality because, as mentioned, instructors might grade
to curry favor with students. Instead, we estimate instructor fixed effects to measure
instructor specific grade differentials. These differentials embody three components:
leniency, quality and a peer group effect induced by departmental grading norms. We
propose a quasi-experiment to establish that the dominant component is the departmental
grading norm.
As pointed out by Achen and Courant (2009), data on grades are rare. There is
unfortunately no centralized data depository for course grades that may be used for
purposes of research. Universities tend to be discreet about grades to the point of secrecy.
Such data have three main levels of aggregation: by department, by course and by
student. The latter consist of micro data for the grades of individual students by course.
Achen and Courant (2009) and Bar et al (2009) used data for the University of Michigan
and Cornell University respectively which are aggregated by course and discipline.
Johnson (2003) and Sabot and Wakeman-Linn (1991) also used aggregated data for Duke
University and Williams College respectively. Indeed, the few studies that are available
refer to grading in what is perhaps a non-representative handful of US universities.
There seem to be no studies for universities outside the US.
There are even fewer studies using micro data on individual students. Stricker et
al (1993, 1994) used first semester grades of students at a large state university in the US
to compare different methods of adjusting GPAs for differential grading standards. Bar and Zussman (2011) matched students to their instructors at Cornell University. In the
present study we use longitudinal micro data for BA students at the Hebrew University of
Jerusalem.2 Specifically, we use complete cohort data on 1,217 students who registered in
2003 and who majored in at least one of the social sciences. The data are longitudinal and
cover the years 2003 – 2008 by the end of which the vast majority of students had
graduated. To the best of our knowledge, this is the first time that data on entire grade histories
of individual students are being used. It is this feature of the data that enables the
estimation of student specific effects. We have also matched instructors to courses, which
enables the estimation of instructor specific effects.
Social science students in Israeli universities typically choose two majors. They
might major in two departments in the Faculty of Social Sciences (FSS), or they might
choose a second department from outside FSS as their second major. To broaden their
education students are also required to attend courses outside their two majors. This
means that students have course grades from three or more departments. Since most
students major in two departments we may compare the performance of the same
students across departments as in Goldman and Widawski (1976) and across courses in
departments as in Elliot and Strenta (1988).
2 I am grateful to Billy Shapira and Rachel Amir for supplying the data. The research was instigated by the Pedagogic Committee of the Faculty of Social Sciences chaired by Menachem Hoffnung.
A further effect not previously investigated is related to choice of major. For
example, we show that given their ability, students majoring in economics obtain higher
grades in non-economics courses. This effect might be induced by knowledge spillovers;
i.e. economics is a discipline that benefits learning in other disciplines. Or, it might be a
peer group effect: economics students are more competitive. Although we are unable to
distinguish between these interpretations, these “major effects” complicate the estimation
of differential grading. If, for example, students of economics happen to take courses in
sociology, the mean grade in sociology will increase for reasons unrelated to differential
grading.
Measuring grade inflation involves index number problems similar to those of measuring price inflation. If students chase higher grades, student-weighted measures of grade
inflation will be biased upwards since courses with higher grades will attract more
students. We suggest a simple test of grade-chasing based on a comparison of weighted
and simple grade averages. We show that during the study period (2000 – 2008) there is
little evidence of grade inflation and that grade-chasing was, on the whole, stable.
2. Methodology
The entry grade for student i is denoted by Xi, which is a weighted average of high school matriculation grades and grades obtained in the nationwide psychometric test for university entrance.3 X is scaled between 16 and 25 points. Course grades for student i in course j in year t are denoted by Yijt and are scaled between 0 and 100. Course j is supplied by department k, and there are K departments. A set of K-1 dummy variables is generated to identify departments: Djk = 1 if course j was supplied by department k and zero otherwise. Another set of dummy variables is generated such that Mik = 1 if student i majored in department k and zero otherwise. A further set of dummy variables is generated such that Ct = 1 if the course was attended in year t (years since registration) and zero otherwise, and Pnjt = 1 if instructor n taught course j in year t and zero otherwise. Finally, the data include a vector of demographic controls for students (Zi) covering gender, age in 2003, and year of immigration for immigrants.

3 In Israel psychometric tests are carried out by a national body, the Center for Psychometric Evaluation; test results are therefore inter-personally comparable. The matriculation result is made up of two components: results from nationwide examinations of the Ministry of Education, which are inter-personally comparable, and an assessment made by the high school, which is not. Students must have entry grades of at least 16 points to study at the Hebrew University.
2.1 Method 1: Ability Measured by Entry Grades
Method 1 conditions on entry grades (X), which measure high school ability:
T 1
K 1
K 1
N 1
t 1
k 1
k 1
n 1
Yijt    X i  Z i   t Ct    k D jk    k M ik    n Pnjt  u ijt
(1)
In equation (1), u denotes a residual error, which captures unobserved phenomena such as ability, luck, study habits, ethnicity etc., and whose expectation is zero. Including X in equation (1) is intended to control for observed ability; therefore, β is expected to be positive. Since high school ability and university ability are positively correlated, and u embodies ability at university, u and X may not be independent. This dependence would induce positive bias in the estimate of β. Another problem is that E(u_j u_h) is unlikely to be zero, because students who have better grades in course j may have better grades in course h and other courses. This problem does not induce statistical bias, but it is detrimental to statistical efficiency. The standard errors of the parameters are therefore clustered by student, which mitigates this problem.
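To make the estimation concrete, a minimal sketch of method 1 in Python follows. The DataFrame `df` and its column names (grade, entry, female, age, immigrant, year, dept, major, student) are hypothetical, not those of the study's data, and the instructor dummies are omitted, as in the estimates of Section 4:

```python
# Method 1 (equation 1): OLS of course grades on entry grades, demographics,
# and year/department/major dummies, with standard errors clustered by student.
import statsmodels.formula.api as smf

model1 = smf.ols(
    "grade ~ entry + female + age + immigrant"   # X_i and Z_i
    " + C(year) + C(dept) + C(major)",           # C_t, D_jk and M_ik dummies
    data=df,                                     # one row per (student, course, year)
)
# Clustering by student allows residuals of the same student to be correlated
# across courses; it does not affect the coefficients, only the standard errors.
res1 = model1.fit(cov_type="cluster", cov_kwds={"groups": df["student"]})
print(res1.summary())
```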
The γ coefficients capture the potential effects of demographic variables on grades, and the δ coefficients capture the potential effect of time on grades. If δt exceeds δt-1, students who attended courses in year t obtained better grades than those who attended the same courses in the previous year. This sounds like grade inflation, but it is
not. It may simply be the case that students do better in their second year than in the first,
as they gain more university experience. To estimate grade inflation, it would be
necessary to add additional cohorts to the analysis. We return to the issue of grade
inflation in section 6.
The μ coefficients capture potential motivational effects and peer group pressures, such that if μk is positive, students majoring in department k outperform otherwise identical students majoring in the base department in common courses. Since equation (1) conditions on student ability and on the department supplying the course, μ invites interpretation as a peer group effect or a knowledge spillover effect, since ability and course identity have already been factored out. If μk is positive, studying major k helps students perform better in the courses supplied by other departments. For example, statistics might help students perform better in economics and vice-versa.
In the present context, the most important parameters in equation (1) are the θ's, since if θk is positive the grades in department k are higher than in the base department. Given everything else, including student ability as measured by X and the choice of majors, the θ's measure differential grading by department. In summary, method 1 suffers from a number of statistical problems, which arise from the fact that X is an imperfect measure of university ability. Method 2 does not suffer from these problems, and is based on panel data econometrics (Baltagi 2005).
2.2 Method 2: Ability Measured by Student Specific Effects
Method 2 requires the estimation of equation (2):
T 1
K 1
K 1
N 1
t 1
k 1
k 1
n 1
Yijt   i   t Ct    k D jk    k M ik    n Pnjt  vijt
(2)
The difference between equations (1) and (2) is that equation (2) does not specify X and Z. Instead it specifies a separate intercept term for each student, or student specific effects (αi). Since the other covariates in equation (2) are the same as in equation (1), αi is larger the more able is student i at getting good grades. Therefore the α coefficients are measures of student ability at university. Because they measure ability at university rather than at high school, the statistical problems that arose with method 1 do not arise with method 2, since by definition the residuals (v) in equation (2) are independent of the α's, whereas the residuals of equation (1) may not be independent of X and Z. The θ coefficients estimated by equation (2) may therefore be different from their counterparts estimated by equation (1), since the latter may be biased whereas the former are unbiased.

The α coefficients may be specified as fixed effects or as random effects. The latter assumes that the students have been sampled randomly from the population of students. Since the data refer to an entire student cohort, the α's are specified as fixed effects rather than random effects. Although the specification of fixed effects increases the burden of estimation, it avoids possible misspecification error resulting from inappropriate parametric assumptions regarding the distribution of the random effects. It is impossible to combine methods 1 and 2 by specifying X and Z in equation (2) because these variables are perfectly correlated with the specific effects.
Equations (1) and (2) are not nested and are essentially different because equation (1) uses up a relatively small number of degrees of freedom to estimate the β and γ coefficients, whereas equation (2) specifies separate intercept terms for each student, which in our case uses up 1180 degrees of freedom. Since the equations are non-nested,4 adjusted R-squared does not indicate whether method 1 is superior to method 2. It would take a non-nested test of the two models to determine which method is preferable.

4 Non-nested because equation (1) is not a special case of equation (2), nor is equation (2) a special case of equation (1). See e.g. Davidson and MacKinnon (2009), chapter 14.
Because the panel data are unbalanced (the number of observations per student is
not constant), the standard errors of the parameters are likely to be heteroscedastic.
Therefore, robust standard errors are reported for the parameters. Also, the standard
errors are clustered, as mentioned, by course and student.
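A corresponding sketch of method 2 follows, again under the hypothetical column names used above (plus a course identifier column). The student specific effects αi are estimated here as a full set of student dummies (the LSDV estimator), which yields the same coefficients as the within (fixed effects) estimator; standard errors are robust and clustered by course, as described above:

```python
# Method 2 (equation 2): replace X_i and Z_i with a separate intercept for
# each student; the C(dept) coefficients are the theta_k grade differentials
# relative to the omitted base department (economics in the paper).
import statsmodels.formula.api as smf

model2 = smf.ols(
    "grade ~ C(student) + C(year) + C(dept) + C(major)",
    data=df,
)
res2 = model2.fit(cov_type="cluster", cov_kwds={"groups": df["course"]})
print(res2.params.filter(like="C(dept)"))   # departmental grade differentials
```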
2.3 Major Effects
If the majors selected by students were fixed during the entire BA, the μ coefficients in equation (2) could not be estimated because Mi would be perfectly correlated with the student fixed effects (αi). However, majors are not fixed, for two reasons. First, departments operate different policies regarding entry into year 2. For example, students who registered for economics in year 1 will not be allowed to register for economics in year 2 if their first year grades in economics were inadequate. Such students will be required to seek a replacement major. Second, students switch majors on their own initiative. For example, in the previous example students might not wish to continue with economics even if they were allowed to register in year 2. Whereas departments select students at the end of year 1, students may switch majors at any time. Indeed, there is substantial mobility in majors. Therefore, the μ coefficients are identified because M is not perfectly correlated with the student fixed effects.
Nevertheless, the interpretation of the μ coefficients is problematic because of selectivity in the choice of majors. Since methods 1 and 2 control for generic ability, the μ coefficients are unbiased if students choose majors according to their ability. If, however, ability is specific rather than generic, matters might be different. For example, a student switches from economics to sociology because he is more suited to sociology. This self-sorting process is efficient because students are better matched to both economics and sociology in terms of their specific abilities. If the generic ability of these sociology students is greater than the generic ability of economics students, the estimate of μ for sociology will be positive.

Therefore, in addition to peer group and knowledge spillover effects, the μ coefficients may capture the effects of specific ability. Since the θ coefficients constitute the parameters of interest, controlling for M in equations (1) and (2) also controls for otherwise unobserved peer group effects, knowledge spillover effects, and effects induced by differences between generic and specific abilities. Therefore, specification of M reduces the risk of omitted variable bias.
3. The Data
3.1 Students
The data were supplied by the Department of Student Administration of the Faculty of
Social Sciences at the Hebrew University of Jerusalem. The data comprise the entire
student cohort of 2003 who registered in the Faculty of Social Sciences. These comprise
1,217 BA students whose grades during 2003 – 2008 are included in the data. The total
number of grade observations is about 40,000, which works out at an average of 33
grades per student. By 2008, 79 percent of this student cohort had graduated.
Table 1 Cohort Demographics (percent)

Male                      45.7
Immigrated after 1989     11.0
Immigrated after 1998      2.6
Born before 1975           2.1
Born 1975-1977            10.9
Born 1978                 12.3
Born 1979                 19.3
Born 1980                 22.7
Born 1981                 15.5
Born 1982                  7.7
Born 1983                  5.8
Born 1984                  2.8
Table 1 shows the demographic composition of the students in the cohort. There were slightly more women than men, and 11 percent were new immigrants (immigrated after 1989, when the former USSR permitted Jews to emigrate). The modal student was born in 1980 and was therefore 23 years old in 2003, but the age dispersion was quite large.5 The second column in Table 2 records the number of students who registered by major in 2003. The fourth column reports the number of students who graduated by 2008. Since a significant minority of students did not graduate, the numbers in column 4 are expected to be smaller than the numbers in column 2. The graduation rates are particularly low in sociology and statistics. In the case of business studies there were more graduates than initial registrations because apparently some students transferred to business studies after 2003.

5 Undergraduates in Israel are older than their counterparts abroad because military conscription is 3 years for men and 2 years for women. Since Arabs are not required to serve in the army, the younger students are less likely to be Jewish. Many Bedouins and Druzes serve in the army. Student ethnicity is not identified in the data.
Table 2 Majors

                     Students   Entry Grade   Graduates   Final Grade
Psychology              116        22.7          115         89.7
Sociology               240        18.3          157         84.8
Political Science       179        17.9          121         85.0
Int Relations           255        18.1          213         84.8
Statistics               57        16.9           27         78.0
Economics               333        20.7          248         83.8
Business Studies         81        21.0           85         87.5
Accountancy              86        20.6           86         81.3
Communications          139        21.1          119         88.5
PPE                      31        20.9           23         88.3
Geography                75        17.4           58         86.4
Islamic Studies           -          -            52         84.4
E. Asian Studies          -          -            35         86.3
History                   -          -            20         86.9
L. America                -          -            27         90.3
Italian                   -          -            17         89.0
Education                 -          -            57         87.8
General BA                -          -            41         85.0
Law                       -          -            38         84.7
Economics & Law           -          -            18         87.1
Most students initially chose two majors. Table 2 indicates that in many cases these
majors were from the Faculty of Humanities. The most popular pairings of majors are
economics and accountancy (260 students), economics and business studies (150),
political science and international relations (162), communications and sociology (65),
communications and international relations (73) and communications and psychology.
Recall that majors may vary during the BA as a whole.
Table 2 also reports entry grades (on a scale of 16 – 25) by department. Since the offer of a university place is by entry grade alone, the minimum entry grade is in fact the
threshold for acceptance. These thresholds vary by department. For example, the
threshold for economics was lower than for psychology, but was higher than for
sociology. The hardest departments to enter from this point of view were (and still are)
psychology and communications. On the other hand, these departments enrolled fewer
students than the economics department. Choosier departments naturally enroll fewer
students and set higher entry thresholds. However, some departments, such as statistics,
enroll fewer students even though the threshold is relatively low, because the underlying
demand to study statistics is low.
Entry grades and student registrations are not available for majors outside the Faculty
of Social Sciences. For example, there were 52 students who majored in Islamic Studies
who also majored in one of the social sciences. The total number of students majoring in
Islamic Studies is obviously much larger than this. Since the minimal entry grade in Islamic Studies is set by the Faculty of Humanities, the entry grade and registrations in Islamic Studies do not feature in Table 2. These data feature only for the social science majors.
Table 2 also records the final BA grades by major. The grand mean grade was 86.5
and the median grade was 87.2. The highest final grades were obtained by 27 students in
the department of Latin American Studies (Faculty of Humanities). Within the Faculty of
Social Sciences the highest average final grade was achieved by students majoring in
psychology, and the lowest by far was achieved by students majoring in statistics.
Table 3 Graduation

Year       2005   2006   2007   2008-2010
Percent     3.5   56.2   26.4     13.7
As mentioned, 80 percent of the students in the cohort had graduated by 2008. Table 3 reports the years in which these students graduated. The modal year for graduation was 2006, which is three years after registration.6 However, only 60 percent of graduates completed their BA within three years.

6 These data refer to the year in which the BA degree was awarded. In practice there are students who missed the deadline for various administrative reasons and were awarded their degree the following year. Some of the 26 percent of students in 2007 no doubt fall into this category.
3.2 Instructors
We have matched 1500 individual instructors to the courses attended by the students in
the cohort. The number of instructors exceeds the number of students due to high turnover among instructors, especially external instructors. There are 628 instructors who
supplied at least 8 grades during 2003-2008. To estimate instructor fixed effects, we set 8
grades as a minimum. These instructors supplied a total of 33,166 grades out of the
almost 40,000 grades mentioned in section 3.1. Therefore, estimation of instructor fixed
effects reduces the sample size by slightly more than 5,000 observations. Instructor fixed
effects embody three components: a departmental grading norm, leniency or strictness,
and instruction quality.
Observed characteristics of the 628 instructors are reported in Table 4. Apart from members of faculty, instructors include graduate students as well as external instructors. Note that 68 instructors are affiliated to more than one department. We use these instructors for quasi-experimental purposes to test the hypothesis that instructors are influenced by departmental grading norms.
Table 4 Instructor Characteristics

Men
Young
Members of faculty
PhD students
External instructors
Affiliated to more than 1 department
4. Results Ignoring Instructor Fixed Effects
4.1 Overview
We begin by ignoring instructor fixed effects because, as mentioned, this involves a
substantial reduction in the sample size. In section 5 we estimate models including
instructor fixed effects. Two sets of results are reported. The first (method 1) is based on
equation (1) and conditions on students’ entry grade as a measure of their ability. The
second (method 2) is based on equation (2) and conditions on the student fixed effect as a
measure of ability. Since method 2 does not require data on entrance grades, which are
missing for some students, it uses more observations than method 1. In fact method 1
uses 36,867 observations on grades for 1120 students while method 2 uses 38,542
observations on grades for 1180 students. Table 5 summarizes the sample sizes and
presents goodness-of-fit statistics for the two methods. The base department for courses
is economics, the base major is economics and the base year is 2004 since these
categories cover the largest number of courses and students.
Table 5 Summary Statistics

                               Method 1          Method 2
Students                       1120              1180
Observations                   36,867            38,542
R-squared (adjusted):
  With major effects           0.1346            0.0485
  Without major effects        0.1246            0.0362
P-value for fixed effects      Na                <0.0001
Non-nested test                1.0055            0.7153
                               (se = 0.1155)     (se = 0.1857)
Methods 1 and 2 come in two versions, with and without major effects, which are jointly statistically significant. The goodness-of-fit of method 1, as measured by R2, is not necessarily better than the goodness-of-fit of method 2, since the R2 reported for method 2 excludes the explanatory power of the student fixed effects. In any case the two methods are not directly comparable because they are non-nested; method 1 conditions on high school ability whereas method 2 conditions on ability at university.
A non-nested test7 is carried out to discriminate between the two methods. There are three potential outcomes of this test: method 1 encompasses method 2, method 2 encompasses method 1, or neither method encompasses the other. When, for the common set of observations, the predicted grade of method 2 is added to method 1, it obtains a coefficient of 1.0055, which is statistically significant. Therefore, method 2 explains grades that method 1 failed to explain. When the predicted grade of method 1 is added to method 2, it obtains a coefficient of 0.7153, which is also statistically significant. Therefore, method 1 explains grades that method 2 failed to explain. The test therefore shows that neither method encompasses its rival, in which case there is no preferred model. This result implies that both high school ability and university ability matter. However, for reasons already stated, it is not possible to specify both types of ability in the same model.

7 See in particular Davidson and MacKinnon (2009) chapter 14 on non-nested hypothesis testing and the encompassing principle. The reported results refer to both variance encompassing and parameter encompassing. Method 1 encompasses method 2 if method 1 explains what method 2 fails to explain, while method 2 does not explain what method 1 fails to explain. The non-nested test uses the specification with major effects.
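The encompassing test can be sketched as follows, continuing with the hypothetical setup of the earlier code: fit each method on the common sample, then add the fitted values of its rival as an extra regressor. A significant coefficient on the rival's fitted values means the rival explains variation that the first method misses.

```python
# Non-nested encompassing test between methods 1 and 2.
import statsmodels.formula.api as smf

common = df.dropna(subset=["entry"])     # the sample both methods can use
f1 = "grade ~ entry + female + age + immigrant + C(year) + C(dept) + C(major)"
f2 = "grade ~ C(student) + C(year) + C(dept) + C(major)"
fit1 = smf.ols(f1, data=common).fit()
fit2 = smf.ols(f2, data=common).fit()

common = common.assign(yhat1=fit1.fittedvalues, yhat2=fit2.fittedvalues)
enc1 = smf.ols(f1 + " + yhat2", data=common).fit()   # does method 2 add anything?
enc2 = smf.ols(f2 + " + yhat1", data=common).fit()   # does method 1 add anything?
print(enc1.params["yhat2"], enc1.tvalues["yhat2"])   # 1.0055 in the paper
print(enc2.params["yhat1"], enc2.tvalues["yhat1"])   # 0.7153 in the paper
```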
4.2 Departmental Grade Differentials
There are 55 grade differential coefficients (θk). Since courses supplied by the Department of Economics comprise the base group, these differentials are expressed relative to economics. The estimates of grade differentials for key departments are reported in Table 6. For example, grades in the Department of Political Science exceed grades in the Department of Economics by about 5.12 points according to method 1 with major effects. The standard errors are robust and clustered by student in method 1, and they are robust in method 2 because the panel is unbalanced.8 In the case of method 2 robust standard errors are also clustered by the 3,028 courses. Therefore, the grade differential in political science relative to economics is statistically significant.

8 Longer panels naturally have smaller variances. In unbalanced panels the panel length varies by course, which induces heteroscedasticity in the residuals. Robust standard errors take this heteroscedasticity into account.
The grade differential estimates are all positive except for mathematics. Therefore all departments grade higher than economics except for mathematics. Notice that grades in other departments in the Faculty of Natural Sciences are higher than in economics. The largest grade differentials occur in the Humanities Program and in grades supplied by the School of Education. Within the Faculty of Social Sciences the largest grade differentials occur in geography and communications. Note that the estimates of grade differentials are not on the whole sensitive to the method of estimation or the specification of major effects. However, in the case of accountancy and statistics, grade differentials are not statistically significant in the case of method 1, but are small and statistically significant in the case of method 2. With these exceptions, the grade differential estimates are robust to the way in which one controls for student ability and choice of majors.
A t-test may be used to determine whether grade differentials differ significantly from one another. To determine whether the grade differential in, say, psychology is significantly different from that in geography, the difference of 1.68 points (method 2 with major effects) is divided by the square root of the sum of the estimated variances minus twice their covariance. The t-statistic equals 1.61, which is smaller than conventional critical values. Therefore, the difference in grades between these departments is not statistically significant. However, other grade differentials are statistically significant.
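A sketch of this t-test, assuming `res2` is the fitted method 2 model from the earlier code and that the dummy coefficients carry the hypothetical labels below:

```python
# t-test for the equality of two departmental grade differentials.
import numpy as np

b, V = res2.params, res2.cov_params()
g, p = "C(dept)[T.geography]", "C(dept)[T.psychology]"
diff = b[g] - b[p]                                         # 1.68 points
se = np.sqrt(V.loc[g, g] + V.loc[p, p] - 2 * V.loc[g, p])  # se of the difference
print(diff / se)                                           # t = 1.61 in the paper

# Equivalently: res2.t_test(f"{g} - {p} = 0")
```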
Table 6 Departmental Grade Differentials

                                   Method 1                     Method 2
Major Effects:               Yes            No            Yes            No
Political Science        5.12 (10.47)   5.49 (9.55)   5.29 (11.72)   5.44 (11.82)
Geography                7.56 (11.90)   7.90 (11.09)  7.60 (12.39)   7.74 (12.34)
International Relations  5.18 (11.10)   5.49 (9.55)   5.30 (11.81)   5.47 (11.99)
Communications           6.88 (14.02)   6.58 (11.37)  7.45 (15.64)   7.63 (15.89)
Psychology               5.80 (10.4)    5.51 (9.49)   5.91 (9.69)    5.93 (9.70)
Sociology                5.27 (8.84)    5.07 (7.48)   5.47 (11.43)   5.63 (11.86)
Statistics               0.67 (1.11)    0.34 (0.47)   1.63 (3.73)    1.68 (3.83)
Business Studies         5.77 (12.82)   5.60 (10.45)  6.09 (15.56)   6.09 (15.44)
Accountancy              0.93 (1.92)    0.24 (0.40)   2.12 (4.87)    2.06 (4.71)
Latin American Studies  11.51 (19.09)  11.60 (16.61) 10.61 (16.28)  10.82 (16.38)
Law                      5.06 (8.69)    3.96 (5.59)   4.36 (5.76)    3.31 (4.24)
E. Asian Studies         4.01 (5.14)    4.25 (4.64)   4.41 (7.07)    4.60 (7.35)
Islamic & Middle
  East Studies           4.80 (5.78)    4.32 (5.99)   4.34 (7.82)    4.49 (8.10)
Philosophy               4.53 (6.97)    4.97 (7.24)   4.87 (7.89)    5.04 (8.05)
PPE                      5.17 (8.47)    5.58 (5.97)   5.06 (8.35)    4.81 (7.99)
Education               10.58 (18.65)  10.06 (11.63) 11.62 (18.58)  11.79 (18.40)
Humanities              13.30 (13.06)  13.13 (12.82) 13.63 (13.74)  13.80 (13.88)
Maths                   -3.85 (-2.11)  -5.58 (-2.16) -4.51 (-3.13)  -4.65 (-3.06)
Physics                  3.36 (2.69)    1.91 (1.49)   3.42 (2.88)    3.10 (2.52)
Biology                  3.90 (5.32)    3.23 (4.02)   3.01 (4.12)    2.84 (3.66)

Note: t-statistics reported in parentheses.
The estimates of grade differentials reported in Table 6 are on the whole robust with
respect to the method of estimation, except in the cases of statistics and accountancy.
They are also robust with respect to the specification of major effects.
4.3 The Role of Majors
Table 7 reports the estimated μ coefficients in equations (1) and (2). Only six of these coefficients are statistically significant according to method 1 at conventional levels of probability. With the exception of geology, the significant μ coefficients are negative. Since the base refers to economics, this means that students majoring in economics tend to do better, given their ability etc., than other students studying the same courses. This effect might be
due to one or more of the three reasons mentioned in Section 2. First, there may be
intellectual complementarity or spillover between economics and other subjects so that
economics helps students obtain better grades in other subjects. Second, peer group
effects in economics are more conducive to learning. Third, students with specific ability
in economics have higher than average generic ability.
Table 7 Major Effects

                             Method 1               Method 2
                       Coefficient  t-stat    Coefficient  t-stat
Political Science          0.04      0.07         0.30      0.76
Geography                  0.03      0.04         0.16      0.37
Int Relations             -0.37     -0.80         0.09      0.28
Communications            -1.16     -1.95        -0.28     -0.76
Psychology                -0.82     -1.67        -0.13      0.38
Sociology                 -0.76     -1.53        -0.30     -0.78
Statistics                -1.82      1.31        -0.72     -1.95
Business Studies          -0.77     -1.20        -0.72     -1.95
Accountancy               -2.23     -3.52        -0.84     -2.33
Law                       -3.70     -3.36        -4.78     -3.39
E. Asian Studies           0.02      0.02        -0.07     -0.10
L. American Studies       -0.15     -0.15        -0.70     -1.03
Philosophy                 0.40      0.42         0.59      0.70
PPE                        0.03      0.02        -5.04     -3.04
Education                 -1.60     -1.80         0.69      1.25
Islam and M. East          0.25      0.40        -0.23     -0.44
Russian & Slavic Studies -14.12     -7.67         0.75      0.27
Geology                    6.38      6.52        -0.43     -0.35
Chemistry                 -9.47     -7.08        -0.15     -0.20
Maths                     -4.61     -1.34        -1.18     -0.96
See notes to Table 6
The coefficient estimates in Table 7 are more sensitive to the method of estimation than their counterparts in Table 6. Since, as suggested, the choice of majors may be related to ability, the estimates of the μ's are likely to depend upon how ability is measured. Therefore, it is not surprising that the two methods produce quite different results. For example, the major effect for accountancy is -2.23 according to method 1 while it is only -0.84 according to method 2. Some major effects that are statistically significant according to method 1 (geology, chemistry, and Russian and Slavic Studies) are not statistically significant according to method 2.
4.4 Progress during the BA
Table 8 Year Effects

                     Method 1                  Method 2
             Coefficient  t-statistic  Coefficient  t-statistic
2003-2004       -0.45       -2.30         -0.44       -3.12
2004-2005        base                      base
2005-2006        0.46        2.37          1.02        7.46
2006-2007       -2.25       -4.28          1.60        4.57
2007-2008       -4.21       -4.09          2.30        3.42
2008-2009       -6.49       -4.53          1.94        1.52

Note: Estimated with major effects.
Table 8 reports the year fixed effects (δ's) for the years in which the course was studied (the base year is 2004-5). According to both methods students tend to perform weakly in their first year (2003-2004); their grades are lower by about 0.45 points. Subsequently, they perform
better. However, from 2006-7 onwards the two methods give opposite results. Students
who took courses after 2005-6 obtained significantly lower grades according to method 1,
whereas the opposite is true according to method 2. As noted in Table 3, many students
took longer than three years to graduate and some did not graduate at all.
Since method 2 estimates students' university ability and less able students take
longer to graduate (and might not graduate at all), the time fixed effects are less affected
by adverse selection among students who graduated later. Given their university ability,
method 2 indicates that students who took longer to graduate in fact obtained better
grades. This premium reaches 2.3 points in 2007-2008.
When a dummy variable is specified in method 1 for students who failed to
graduate by 2008, its estimated coefficient is -8.05 with a standard error of 0.58. As
expected, students who failed to graduate by 2008 are negatively selected, and it is not
simply a matter that these students were slow to graduate. The results for method 1
reported above are robust, however, with respect to this specification.
4.5 High School Ability v University Ability
The two methods handle ability differently. Method 1 hypothesizes that university ability
is correlated with high school ability but may differ between men and women,
immigrants and natives, and may be age dependent. Method 2 estimates university ability
directly. These estimates may depend on age etc but if they do, such effects are taken
directly into consideration. Method 1 shows (Table 9) that entry grade matters and is
highly significant. Since entry grades vary by 9 points between departments, the
contribution of high school ability adds at most 12½ grade points. Table 9 shows that
there are only small sex differences (women obtain slightly higher grades), that new immigrants perform like natives, and that older students do slightly better.
Table 9 Cross Section Variables (Model 1)

                 Major Effects: Yes       Major Effects: No
              Coefficient  t-statistic  Coefficient  t-statistic
Entry Grade      1.419       12.14         1.424       13.33
Female           0.962        2.30         0.925        2.24
Age in 2003      0.426        3.99         0.458        4.30
Immigrant        0.604        1.21         0.648        1.29
Figure 1 plots the relationship between student fixed effects and their entry grades. The bold lines are drawn through the means of the two axes. The four quadrants indicate that there are many students with low entry grades who did well at university (top left) and many with high entry grades who performed weakly (bottom right). On the whole, however, the two measures of ability are positively correlated, but the correlation is only 0.417. In fact there is a substantial degree of mobility in ability between high school and university. The mean reversion coefficient between university ability (measured by normalized fixed effects) and high school ability is only 0.32, indicating a high degree of mobility between high school performance and university performance. Students with top entry grades also had the greatest university ability. However, many students with intermediate entry grades had similar university ability. On the whole, Figure 1 indicates that high school ability is a poor predictor of university ability.
Figure 1 The Correlation between University Ability and High School Ability
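The quadrant counts and the correlation behind Figure 1 can be reproduced along the following lines, assuming a hypothetical DataFrame `students` holding each student's entry grade and estimated fixed effect from method 2:

```python
# Cross-tabulate students by whether they are above the mean on each ability
# measure; the off-diagonal cells are the "low entry grade but strong at
# university" students and vice-versa.
import pandas as pd

quad = pd.crosstab(students["fe"] > students["fe"].mean(),
                   students["entry"] > students["entry"].mean())
print(quad)
print(students["fe"].corr(students["entry"]))   # 0.417 in the paper
```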
Figure 2 plots the distribution of student fixed effects, which are approximately normalized to zero (mean = -1.07). The empirical distribution is clearly different from the normal distribution (indicated in Figure 2) due to excess kurtosis and left skewness. There is a long left tail of weaker students.
4.6 Robustness Checks
When the number of students attending the various courses is specified in the model, the estimated coefficient according to method 2 is -0.0162 with a standard error of 0.001. Method 1 returns an almost identical result. This estimate means that students obtain lower grades in larger classes,9 and grades decrease by 1 point when the number of students attending the course increases by 60. Specifying the number of students attending the course does not, however, significantly alter the other parameters. If instruction quality varies inversely with class size, this robustness check suggests that the estimates of grade differentials reported in Table 6 are unrelated to teaching quality.
Method 1 is also estimated using data for 2003 alone. Since compulsory courses are taught in the first year of the BA program, the estimates for 2003 should not be affected by potential course selection bias. On the other hand, grading policies might vary by department in the first year and for compulsory courses. The sample size is inevitably reduced (from 36,163 to 10,453 observations) and the standard errors of the parameter estimates consequently increase.

9 Some departments allocate students to separate classes. Therefore, the number of students in the course would only equal class size if there is only one class.
Results are reported in Table 10. Comparing Tables 6 and 10 reveals that in the first year the grade differential is largest in geography and smallest in international relations. Since economics continues to serve as the base, this means that grades in economics continue to be low in the first year. On the other hand, grade differentials in business studies, communications and education are large in both Tables 6 and 10. It seems therefore that for some departments, such as psychology, the grade differential intensifies during the second and third years.
Table 10 Grade Differentials in the First Year (2003)

                    Grade Differential   t-statistic
Political Science         3.85              1.91
Geography                11.92              3.30
Int Relations            -2.18             -1.94
Communications            8.42              5.15
Psychology                4.33              2.11
Sociology                 3.93              1.18
Statistics                1.33              2.10
Business Studies          9.39              5.00
Accountancy               5.79              2.35
Law                       3.50              3.53
Philosophy                5.91              3.10
PPE                       1.51              0.86
Education                 9.42              4.31
Biology                   9.84              3.81
5. Results with Instructor Fixed Effects
We now estimate methods 1 and 2 with instructor fixed effects. As mentioned, this
involves a reduction in sample size by about 5,000 course grades. Since there are 628
instructors and almost 1,200 students, method 2 involves the estimation of 1,868 fixed
effects. This is feasible because there are over 33,000 observations on course grades.
However, we streamline by abstracting from major effects, which turn out to be
unimportant when instructor fixed effects are specified.
We focus on a number of related questions. First, does the specification of instructor
fixed effects alter the estimates of grade differentials? For these purposes we define the
departmental grade differential by the weighted average of instructor fixed effects by
departmental affiliation. Secondly, does grading heterogeneity among instructors vary by
department? Suppose, for example, that the mean instructor fixed effect is the same in
departments A and B, but the variance in B is larger than in A. In that case, instructors are more heterogeneous in their grading in B than in A. Third, do instructors grade according
to departmental norms? Do instructors grade more leniently in more lenient departments?
Since some instructors are affiliated to more than one department, do they grade more
leniently (strictly) in their department which grades more leniently (strictly)? Fourth, do instructors grade differentially depending on their sex, age, and status? In particular, is it the case, as suggested by Johnson (2003), that external instructors grade more leniently to increase their popularity among students and enhance their employment prospects?
5.1 Grade Differentials
Models 1 and 3 in Table 11 report estimated grade differentials for methods 2 and 1 respectively, in which instructor fixed effects are specified. For instructors affiliated to more than one department, we specify separate instructor fixed effects. Since these estimated grade differentials use smaller samples than their counterparts in Table 6, models 2 and 4 are reported for purposes of comparison. For example, according to Table 6 the grade differential for geography is 7.09 when using method 2. In Table 11 this parameter is estimated at 6.52 using 33,166 observations (model 2). When instructor fixed effects are specified the estimated differential is 6.98 (model 1). A comparison of models 1 and 2 shows that the estimated grade differentials are, on the whole, insensitive to the specification of instructor fixed effects. The same applies to method 1, based on a comparison of models 3 and 4. The maximum difference is about half a grade point. Therefore, ignoring the identity of instructors is unimportant when estimating departmental grade differentials.
Table 11 Departmental Grading Differentials

Method:             2             2             1             1
Model:              1             2             3             4
Fixed effects:  Students &    Students      Instructors   None
                instructors
300                4.16          4.44          3.78          3.10
301                4.87          4.54          3.73          3.20
311                4.45          4.43          3.75          3.91
312                4.39          4.68          3.88          4.23
320                1.17          0.94         -0.30         -0.79
321 (base)         0             0             0             0
322                5.95          5.51          5.69          4.86
323                6.32          5.95          4.94          4.21
325                0.91          0.93         -0.91         -1.03
326                2.17          2.29          2.93          2.82
401                1.96          2.65          2.87          2.85
802                6.98          6.52          6.44          5.91
Observations      33,166        33,166        31,792        31,792

Note: departments are identified by administrative codes; code 802 is geography and 321 is the base department.
The 668 instructor fixed effects are jointly statistically significant. The sum of their squared t-statistics is 1,106, which greatly exceeds its critical chi-squared value. Also, there is heterogeneity between departments in the variance of the estimated fixed effects. Figure 3 plots the kernel densities of the estimated instructor specific effects by department. The distribution is tighter in departments where instructors grade more similarly, and it is more diffuse in departments where instructors grade less similarly.
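The joint significance check can be sketched as follows, with `tvals` assumed to be an array holding the 668 estimated t-statistics. Under the null that all instructor fixed effects are zero, the sum of squared t-statistics is compared with a chi-squared critical value with one degree of freedom per effect:

```python
# Joint significance of the instructor fixed effects.
import numpy as np
from scipy import stats

stat = np.sum(tvals ** 2)                      # 1,106 in the paper
crit = stats.chi2.ppf(0.95, df=len(tvals))     # 5 percent critical value
print(stat, crit, stat > crit)
```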
5.2 Decomposing Instructor Fixed Effects
Table 12: Regression Model for Instructor Fixed Effects

                  Coef.     t-stat
300               2.39       2.01
301               4.00       3.73
311               3.76       3.50
312               2.32       2.12
320              -1.88      -1.45
321 (base)
322               1.64       1.61
323               4.59       4.21
326              -1.28      -0.43
350               2.74       0.65
399               7.86       1.87
802               5.08       4.09
PhD students      1.23       1.25
Faculty (base)
External          1.81       2.96
Sex              -0.80      -1.29
Age               0.077      2.87
Intercept         0.17       0.11
Observations      274
R2                0.245
We obtained demographic data for 274 of the 648 instructors. In Table 12 we report the results of a regression model for the instructor fixed effects estimated in model 1 of Table 11. Table 12 shows that departmental affiliation is a key determinant of grading by individual instructors. The estimated coefficients naturally reflect the estimates of grade differentials in Table 11. For example, the coefficient for geography is 5.08 in Table 12 and 6.98 in Table 11. There is no difference in the grading behavior of male and female instructors. However, external instructors grade more leniently (by almost 2 grade points). So do older instructors, although the size of this effect is small.
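The Table 12 decomposition is an OLS regression of the estimated instructor fixed effects on instructor traits; a minimal sketch follows, assuming a hypothetical DataFrame `inst` with one row per estimated fixed effect and illustrative column names:

```python
# Decompose instructor fixed effects into departmental affiliation,
# status (faculty / PhD student / external), sex and age.
import statsmodels.formula.api as smf

dec = smf.ols("fe ~ C(dept) + C(status) + male + age", data=inst).fit()
print(dec.summary())   # departmental affiliation dominates, as in Table 12
```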
5.3 A Quasi-experiment
The interpretation of the departmental coefficients in Table 12 is ambiguous. Either instructors in geography grade more leniently, or the grades in geography are higher because instructor quality is superior in the Department of Geography. A simple quasi-experiment to resolve this ambiguity is to compare the grade differentials of instructors who are affiliated to two departments, since instructor quality is specific to the instructor rather than the department.

We use the following difference-in-differences (DID) estimator. Let πn1 and πn2 denote instructor n's fixed effects in departments 1 and 2, let G1 and G2 denote the departmental grade averages, let fen denote an instructor fixed effect capturing instructor quality and leniency, and let h denote a residual error:

$$\pi_{n1} = fe_n + \gamma_1 G_1 + h_{n1}$$
$$\pi_{n2} = fe_n + \gamma_2 G_2 + h_{n2}$$

If $\gamma_1 = \gamma_2 = \gamma$, the DID estimator eliminates $fe_n$:

$$d_n = \pi_{n1} - \pi_{n2} = \gamma (G_1 - G_2) + (h_{n1} - h_{n2})$$
Using model 1 in Table 11, the DID estimate of γ is 0.856 with a t-statistic of 3.55 (Figure 4). Instructors affiliated to more than one department grade more leniently in the department with the higher grades. The rate of convergence to the departmental grading norm is 86 percent. This result strongly suggests that instructors are affected by departmental grading norms. Indeed, it strengthens the suspicion that grade differentials are not induced by differential instructor quality. Rather, they are induced by academic policy to grade more strictly or leniently.
Figure 4 Testing for Departmental Grading Bias
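A sketch of the DID regression, assuming a hypothetical DataFrame `dual` with one row per dual-affiliation instructor, where `pi1` and `pi2` are the two estimated instructor fixed effects and `G1`, `G2` the corresponding departmental average grades:

```python
# Differencing the two fixed effects eliminates fe_n; regressing the
# difference on (G1 - G2) without an intercept recovers gamma.
import statsmodels.api as sm

d = dual["pi1"] - dual["pi2"]
dG = dual["G1"] - dual["G2"]
did = sm.OLS(d, dG).fit()                        # no constant: fe_n already removed
print(did.params.iloc[0], did.tvalues.iloc[0])   # 0.856 with t = 3.55 in the paper
```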
6. Grade Inflation
The cohort data for 2003 are not informative about grade inflation. The statistical models
specified the year in which courses were studied, and the estimates indicate that grades
were lowest in 2003 and rose subsequently. However, this effect reflects the gradual
adjustment of students to the university environment rather than grade inflation. They
achieved lower grades in 2003 simply because this was their first year of study and the
university environment was new. One would need additional student cohorts to estimate
grade inflation, which implies that later cohorts obtain higher grades than earlier ones.
Another way to estimate grade inflation is to track grade averages over time. This
assumes that average student quality does not change over time. If average student
quality happened to increase over time, and better students obtain higher grades, average
grades should increase over time, as noted by Bar et al (2009). This could not, of course,
be counted as grade inflation.10 For example, the entry requirement at the Hebrew
University has been raised in economics and lowered in psychology. Given everything
else, this might have been expected to raise grades in economics and to lower them in
psychology.
Series A in Table 13 reports the weighted average grades on courses supplied by
departments in the Faculty of Social Sciences. The weights (w) are based on student
participation by course, so that more popular courses are given greater weight. On the
whole, these data do not suggest that grade inflation occurred during 2000 to 2008. Grade
inflation occurred in the Department of Political Science (4 points) and in the Department
of Communications (2 points) while in the Department of International Relations there
was grade disinflation (-3 points). The most remarkable feature of Table 13 is the large
and persistent differences in average grades across the departments. Psychology and
communications head the league table while statistics and economics share bottom
places.
Let Gjt denote the average grade in course j in year t. Since series A is weighted by student course participation, i.e.

$$A_t = \sum_{j=1}^{J} w_{jt} G_{jt},$$

there is an obvious index number problem. If students increasingly choose courses that award higher grades, average course grades would increase even if the course grades themselves did not change. Bar et al (2009) point out that at Cornell University the incentive to chase grades11 increased in 1998 when the university began to publish median grades. Since at the Hebrew University grades have always been public knowledge, this incentive to chase grades has not changed. Therefore a simple rather than a weighted course grade average might be a superior measure of grade inflation. This is shown by series

$$B_t = \frac{1}{J} \sum_{j=1}^{J} G_{jt}$$

in Table 13. Since A is a student-weighted average of course grades, A is equal to B plus the covariance between w and G, i.e. $A_t = B_t + \mathrm{cov}_t(wG)$. Cov(wG) may differ from zero for two main reasons. First, grade-chasing increases the covariance, in which case causality runs from G to w. Second, unpopular instructors may ingratiate themselves with students by grading more generously to boost student numbers. This ingratiation effect has a negative effect on cov(wG). If the covariance is zero then A = B. The covariance equals A - B.

10 The parallel between actual inflation and grade inflation is complete. Inflation ignores improvements in the quality of goods, and is understated if consumers purchase cheaper goods.

11 Grade-chasing occurs when students choose courses for their grades rather than for their academic content (Sabot and Wakeman-Linn 1991). Sabot and Wakeman-Linn show that the probability of taking a further course in a subject varies directly with the previous grade. They interpret this as grade-chasing, when it might simply be the case that students choose courses that suit them. Johnson (2003) argues that grade inflation is induced by competition between departments over student numbers, which induces “grade races” as in arms race models.
Equilibrium in the market for students and grades is illustrated in Figure x, where schedule S is induced by grade-chasing and schedule I is induced by ingratiation. Equilibrium grades and student rolls (G*, w*) are determined where the two schedules intersect.

[Figure: Equilibrium with Ingratiation & Grade Chasing. Grades G are plotted against student rolls w; the I and S schedules intersect at (w*, G*).]
We cannot decompose the covariance into its two components. However, if the class-size (ingratiation) component is constant, changes in the covariance may be attributed to the intensity of grade-chasing. The estimated covariances range from -4.9 (economics 2004) to 1.4 (statistics 2007). Therefore the covariance between grades and course choice can make a substantial contribution to grade differentials. The mean covariances range from -3.48 in political science to 0.2 in international relations, and the grand mean is -1.61. Since the covariance is typically negative, the class-size component dominates the grade-chasing component. The covariance increases in statistics by about 2 and slightly decreases in international relations and psychology, suggesting that grade-chasing has increased in the former and decreased in the latter. In other departments grade-chasing has remained stable.
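The A, B, and covariance series in Table 13 can be computed along the following lines, assuming a hypothetical course-level DataFrame `courses` with columns dept, year, students, and mean_grade:

```python
# Grade-chasing diagnostic: A is the participation-weighted average grade,
# B the simple average; their gap is the covariance term cov(wG).
import pandas as pd

def chasing_stats(g: pd.DataFrame) -> pd.Series:
    w = g["students"] / g["students"].sum()     # participation weights, sum to 1
    A = (w * g["mean_grade"]).sum()             # series A
    B = g["mean_grade"].mean()                  # series B
    return pd.Series({"A": A, "B": B, "cov": A - B})

print(courses.groupby(["dept", "year"]).apply(chasing_stats))
```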
Table 13 Average Grades

                      2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Communications  A     85.8  87.6  88.7  86.3  87.7  86.5  87.6  86.9  88.1
                B     85.2  88.9  89.1  85.3  87.6  87.6  88.0  86.8  88.8
                cov    0.6  -1.3  -0.4  -1.0   0.1  -1.1  -0.4   0.1  -0.7
                C                                   89.0  88.1  87.8  87.9  88.2
Economics       A     77.2  80.0  79.2  79.4  77.5  77.3  77.5  77.3  79.4
                B     81.9  81.2  82.7  82.5  82.4  81.3  82.0  82.4  83.0
                cov   -4.7  -1.2  -3.5  -3.3  -4.9  -4.0  -4.5  -5.1  -3.6
                C                                   84.1  84.7  84.3  84.5  84.5
Geography       A     84.1  85.4  85.6  84.9  84.7  83.8  84.5  85.2  85.9
                B     85.4  85.8  86.8  87.1  86.2  84.4  86.6  86.9  87.4
                cov   -1.3  -0.4  -1.2  -2.2  -1.5  -0.6  -1.1  -1.7  -1.5
                C                                   85.9  85.5  83.4  85.1  85.7
International   A     86.5  85.4  85.0  84.5  83.1  82.8  82.9  83.4  83.7
Relations       B     86.4  84.6  82.7  82.9  82.4  83.4  83.6  84.4  85.1
                cov    0.1   0.8   2.3   1.6   0.7  -0.6  -0.7  -1.0  -1.4
                C                                   85.1  85.5  84.9  85.0  85.5
Political       A     79.9  79.0  80.6  81.4  82.8  83.3  83.1  84.0  83.9
Science         B     81.8  83.6  84.4  85.1  86.9  86.3  86.7  87.1  86.5
                cov   -1.9  -4.6  -3.8  -3.7  -4.1  -3.0  -3.6  -2.9  -2.6
                C                                   84.1  84.4  84.7  85.1  85.1
Psychology      A     88.5  88.5  88.7  88.7  88.0  88.7  88.6  88.7  89.4
                B     89.6  90.6  90.8  90.2  90.2  90.9  91.2  91.6  91.5
                cov   -1.1  -2.1  -2.1  -1.5  -2.2  -2.2  -2.6  -2.9  -2.1
                C                                   90.5  89.8  90.6  90.0  90.5
Sociology       A     80.7  78.3  80.5  80.2  79.1  80.8  82.0  81.6  81.2
                B     82.4  79.8  83.4  81.6  81.8  83.2  83.2  84.3  84.7
                cov   -1.7  -1.5  -2.9  -1.4  -2.7  -2.4  -1.2  -2.7  -3.5
                C                                   84.3  85.5  86.5  86.3  86.9
Statistics      A     77.8  81.8  80.8  79.7  80.0  80.3  76.3  78.3  78.8
                B     80.3  83.3  80.2  79.0  79.7  79.9  77.1  76.9  79.8
                cov   -2.5  -1.5   0.6   0.7   0.3   0.4  -0.8   1.4  -1.0
                C                                   82.6  82.7  83.8  83.1  81.7

A: average course grade weighted by the number of students in the course. B: simple average course grade. C: average final grade (GPA) by major. I am grateful to Benny Yakir for the data on A and B. Source: Department of Student Administration.
Series C in Table 13 reports average final grades in the various majors, and is not
directly comparable to series A and B because it is based on courses attended over a
period of at least three years and it only refers to students who registered by major. By
contrast series A and B refer to all students irrespective of their major. Although it is
available for a shorter period of time, series C does not suggest that there has been grade
inflation during 2003 – 2009.
7. Conclusion
Two statistical methods have been compared for estimating departmental grade
differentials. The first controls for student ability by using their university entrance
grades (matriculation and psychometric scores). The second controls for student ability
by estimating specific effects for each student using panel data estimation. The former
measures high school ability while the latter measures ability at university. It turns out
that using data for the 2003 cohort of BA students studying Social Science at the Hebrew
University of Jerusalem the correlation between the two measures of ability is only 0.41.
Indeed, there is substantial upward mobility between high-school and university ability.
Although the strongest high-school students tend to be the strongest university students,
there are many weaker high-school students who do well and even excel at university.
Despite differences in the two measures of ability, both methods suggest that departments
grade to different standards. Indeed, the differences are significant and can be as large as
15 points (out of 100). In the Faculty of Social Sciences grades are lowest in economics
and highest in communications and geography. Another interpretation of these results is
that they reflect differential teaching quality. However, attempts to control for teaching
quality suggest that this interpretation is unreasonable. Instructors affiliated to more than
one department grade more leniently in the department where grades are higher and more
strictly in the department where grades are lower. These instructors adjust their grading to
norms set by their department. Therefore, we claim that departmental grade differentials
31
are not caused by the quality of instructors or the quality of students. They are caused by
arbitrary standards of leniency and strictness.
A simple methodology is also proposed for estimating grade inflation under the
assumption that student quality does not vary over time. The methodology takes account
of grade-chasing by students, i.e. students study courses where grades are higher.
Surprisingly, there is no evidence of grade inflation during the last decade. This is
surprising because the available evidence indicates that grade inflation is a problem in a
number of countries. On the other hand, the result that economics seems to be the strictest
of the social sciences in terms of grading is consistent with what seems to be happening
in other countries.
References

Achen A.C. and P.N. Courant (2009) What are grades made of? Journal of Economic Perspectives, 23: 77-92.

Baltagi B.H. (2005) Econometric Analysis of Panel Data, 3rd edition, Chichester, John Wiley & Sons.

Bar T., V. Kadiyali and A. Zussman (2009) Grade information and grade inflation: the Cornell experiment. Journal of Economic Perspectives, 23: 93-108.

Bar T. and A. Zussman (2011) Partisan grading. American Economic Journal: Applied Economics (forthcoming).

Brogden H.E. and E.K. Taylor (1950) The theory and classification of criterion bias. Educational and Psychological Measurement, 10: 159-186.

Davidson R. and J.G. MacKinnon (2009) Econometric Models and Methods, Oxford, Oxford University Press.

Elliot R. and A.C. Strenta (1988) Effects of improving the reliability of the GPA on prediction generally and on comparative predictions for gender and race particularly. Journal of Educational Measurement, 25: 333-347.

Goldman R.D. and M.H. Widawski (1976) A within-subjects technique for comparing college grading standards: implications in the validity of evaluations of college achievement. Educational and Psychological Measurement, 36: 381-390.

Johnson V.E. (2003) Grade Inflation: A Crisis in College Education, New York, Springer.

Krautman A.C. and W. Sander (1999) Grades and student evaluations of teachers. Economics of Education Review, 18: 59-63.

Linn R.L. (1966) Grade adjustments for prediction of academic performance. Journal of Educational Measurement, 3: 313-329.

Sabot R. and J. Wakeman-Linn (1991) Grade inflation and course choice. Journal of Economic Perspectives, 5: 159-171.

Strenta A.C. and R. Elliot (1987) Differential grading standards revisited. Journal of Educational Measurement, 24: 282-291.

Stricker L.J., D.A. Rock and N.W. Burton (1993) Sex differences in prediction of college grades from scholastic aptitude test scores. Journal of Educational Psychology, 85: 710-718.

Stricker L.J., D.A. Rock, N.W. Burton, E. Mutaki and T.J. Jirele (1994) Adjusting college grade point average criteria for variations in grading standards: a comparison of methods. Journal of Applied Psychology, 79: 178-183.

Young J.W. (1993) Grade adjustment methods. Review of Educational Research, 63: 151-165.