Estimating Grade Differentials and Grade Inflation when Students Chase Grades, and when Student Quality and Instructors Matter

Michael Beenstock
Dan Feldman

Department of Economics
Hebrew University of Jerusalem

Abstract

Previous studies have shown that there are persistent differences in university course grades across subjects, suggesting that some departments grade more leniently than others. Since better students are expected to obtain higher grades, some of these differences might be due to student quality. The grades of the 2003 cohort of BA social science students at the Hebrew University of Jerusalem are decomposed into an ability effect, an instructor effect, a choice-of-major effect, a departmental grading effect, and a random effect. Two methods are used to estimate these effects. The first conditions on student entry grades to measure high school ability. The second conditions on student specific effects to measure university ability. We show that instructors who teach in different departments adjust grades to departmental norms. This suggests that inter-departmental grade differentials are induced by grading standards rather than by the quality of instruction. Whereas departmental grade differentials are large and persistent, there is little evidence of grade inflation during 2000-2008. A simple test is proposed to determine whether grade-chasing by students induces grade inflation. We show that grade-chasing was stable during the study period.

June 4, 2012

Keywords: differential grading, measuring academic ability, grade inflation, grade-chasing.

1. Introduction

The scientific literature on course grade comparability at universities has three related components. The first, in order of chronology, is concerned with measuring the degree of grade comparability, or its lack (Brogden and Taylor, 1950). Differential grading has a hierarchical structure; it may exist between universities, between faculties at universities, between departments and academic disciplines in faculties, between courses in departments, and between instructors. Apart from its inherent inequity, grade incomparability has practical and financial implications since universities typically award scholarships according to the grades of the best students. Also, acceptance into graduate programs depends on BA grades, as may the employment prospects of graduates. Whereas grade differentials are concerned with horizontal inequity, grade inflation (Johnson 2003) is concerned with vertical inequity. Grade inflation creates the misleading impression that students who graduated later performed better than students belonging to an earlier cohort. Grade incomparability and grade inflation seem to be widespread, if not universal, phenomena.

The second component follows the first and is concerned, in the interests of fairness, with designing adjustment methods that correct for differences in grading standards.[1] Goldman and Widawski (1976) suggested adjustments based on the grades of the same students who studied in different departments. This idea was extended by Elliot and Strenta (1988) to the course grades of the same students within departments. Stricker et al (1993, 1994) "residualized" course fixed effects by regressing course grades on student covariates, and suggested that these fixed effects be applied for purposes of grade adjustment. However, these adjustments ignore the possibility that grades might depend on the quality of instruction.

[1] These adjustment methods for university and college grades have their counterparts for high school grades (Linn 1966 and Young 1993).
Other things being equal, the students of superior instructors may achieve higher grades.

The third and most recent component is concerned with the behavioral causes of differential grading. For example, Krautman and Sander (1999) and Johnson (2003) claim that instructors "buy" favorable student evaluations by grading leniently. Also, instructors may offer higher grades to compete in the market for students. The "grade race" triggered by this competitive behavior induces grade inflation (Johnson 2003). Bar and Zussman (2011) claim that at Cornell University instructors affiliated with the Republican Party graded less equally, and favored white students, relative to instructors affiliated with the Democratic Party.

The present paper is concerned with the first component: the measurement of differential grading, intra-temporally and inter-temporally. Departments typically set differential rules for accepting students, setting higher entry standards according to the pressure of applications. If student quality varies between departments, we naturally expect grades to be higher in departments with better students. However, student quality is not observable, which complicates the empirical estimation of differential grading. Specifically, two measures of student quality are compared. The first measures student quality by entry or acceptance grades, which measure student ability post high school and pre-university, and are based on high school matriculation grades and psychometric scores. The second measures student quality at university. The latter is estimated as a student specific effect using the course grade histories of individual students. We show that measures of differential grading may depend on how student quality is measured. We also show that pre-university ability and university ability (as measured by student specific effects) are correlated at only 0.41, i.e. pre-university ability is an imperfect indicator of university ability.

Just as student quality is unobserved, so is the quality of instructors. Other things being equal, students taught by better instructors should obtain higher grades. Therefore, some of the grade differences might be induced by the quality of instructors as well as by the quality of students. We do not think that student evaluations are reliable measures of instructor quality because, as mentioned, instructors might grade to curry favor with students. Instead, we estimate instructor fixed effects to measure instructor specific grade differentials. These differentials embody three components: leniency, quality, and a peer group effect induced by departmental grading norms. We propose a quasi-experiment to establish that the dominant component is the departmental grading norm.

As pointed out by Achen and Courant (2009), data on grades are rare. There is unfortunately no centralized data depository for course grades that may be used for purposes of research. Universities tend to be discreet about grades to the point of secrecy. Such data have three main levels of aggregation: by department, by course and by student. The latter consist of micro data for the grades of individual students by course. Achen and Courant (2009) and Bar et al (2009) used data for the University of Michigan and Cornell University respectively, which are aggregated by course and discipline.
Johnson (2003) and Sabot and Wakeman-Linn (1991) also used aggregated data, for Duke University and Williams College respectively. Indeed, the few studies that are available refer to grading in a perhaps non-representative sample of a handful of US universities. There seem to be no studies for universities outside the US. There are even fewer studies using micro data on individual students. Stricker et al (1993, 1994) used first semester grades of students at a large state university in the US to compare different methods of adjusting GPAs for differential grading standards. Bar and Zussman (2011) matched students to their instructors at Cornell University.

In the present study we use longitudinal micro data for BA students at the Hebrew University of Jerusalem.[2] Specifically, we use complete cohort data on 1,217 students who registered in 2003 and who majored in at least one of the social sciences. The data are longitudinal and cover the years 2003-2008, by the end of which the vast majority of students had graduated. To the best of our knowledge this is the first time that data on the entire grade histories of individual students are being used. It is this feature of the data that enables the estimation of student specific effects. We have also matched instructors to courses, which enables the estimation of instructor specific effects.

[2] We are grateful to Billy Shapira and Rachel Amir for supplying the data. The research was instigated by the Pedagogic Committee of the Faculty of Social Sciences chaired by Menachem Hoffnung.

Social science students in Israeli universities typically choose two majors. They might major in two departments in the Faculty of Social Sciences (FSS), or they might choose a second department from outside FSS as their second major. To broaden their education students are also required to attend courses outside their two majors. This means that students have course grades from three or more departments. Since most students major in two departments we may compare the performance of the same students across departments, as in Goldman and Widawski (1976), and across courses within departments, as in Elliot and Strenta (1988).

A further effect not previously investigated is related to the choice of major. For example, we show that given their ability, students majoring in economics obtain higher grades in non-economics courses. This effect might be induced by knowledge spillovers, i.e. economics is a discipline that benefits learning in other disciplines. Or it might be a peer group effect: economics students are more competitive. Although we are unable to distinguish between these interpretations, these "major effects" complicate the estimation of differential grading. If, for example, students of economics happen to take courses in sociology, the mean grade in sociology will increase for reasons unrelated to differential grading.

Measuring grade inflation involves index number problems similar to those involved in measuring price inflation. If students chase higher grades, student-weighted measures of grade inflation will be biased upwards since courses with higher grades will attract more students. We suggest a simple test of grade-chasing based on a comparison of weighted and simple grade averages. We show that during the study period (2000-2008) there is little evidence of grade inflation and that grade-chasing was, on the whole, stable.
2. Methodology

The entry grade for student i is denoted by X_i, which is a weighted average of high school matriculation grades and grades obtained in the nationwide psychometric test for university entrance.[3] X is scaled between 16 and 25 points. Course grades for student i in course j in year t are denoted by Y_ijt and are scaled between 0 and 100. Course j is supplied by department k, and there are K departments. A set of K-1 dummy variables denoted by D_k is generated to identify departments, so that D_jk = 1 if course j was supplied by department k and is zero otherwise. Another set of dummy variables is generated such that M_ik = 1 if student i majored in department k and is zero otherwise. A further set of dummy variables is generated such that C_t = 1 if the course was attended in year t (years since registration) and is zero otherwise. P_njt = 1 if instructor n taught course j in year t and is zero otherwise. Finally, the data include a vector of demographic controls for students (Z_i): gender, age in 2003, and year of immigration for immigrants.

[3] In Israel psychometric tests are carried out by a national body, the Center for Psychometric Evaluation. Therefore test results are inter-personally comparable. The matriculation result is made up of two components: results from nation-wide examinations of the Ministry of Education, which are inter-personally comparable, and an assessment made by the high school, which is not. Students must have entry grades of at least 16 points to study at the Hebrew University.

2.1 Method 1: Ability Measured by Entry Grades

Method 1 conditions on entry grades (X), which measure high school ability:

$$Y_{ijt} = \alpha + \beta X_i + \gamma Z_i + \sum_{t=1}^{T-1}\delta_t C_t + \sum_{k=1}^{K-1}\theta_k D_{jk} + \sum_{k=1}^{K-1}\lambda_k M_{ik} + \sum_{n=1}^{N-1}\pi_n P_{njt} + u_{ijt} \qquad (1)$$

In equation (1) u denotes a residual error, which captures unobserved phenomena such as ability, luck, study habits, ethnicity etc, and whose expectation is zero. Including X in equation (1) is intended to control for observed ability; therefore β is expected to be positive. Since high school ability and university ability are positively correlated, and u embodies ability at university, u and X may not be independent. This dependence would induce positive bias in the estimate of β. Another problem is that E(u_ijt u_iht) is unlikely to be zero, because students who have better grades in course j may have better grades in course h and in other courses. This problem does not induce statistical bias, but it is detrimental to statistical efficiency. The standard errors of the parameters are therefore clustered by student, which mitigates this problem.
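To fix ideas, equation (1) can be estimated as a pooled OLS regression with dummy sets and student-clustered standard errors. The following is a minimal sketch in Python; the long-format DataFrame and all column names (grade, entry, female, age, immigrant, year, dept, major, instructor, student) are illustrative assumptions, not the authors' actual data layout.

# Sketch of method 1 (equation 1): pooled OLS with student-clustered SEs.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("grades.csv")            # hypothetical file: one row per course grade
df1 = df.dropna(subset=["entry"])         # method 1 needs entry grades

# X (entry) and Z (female, age, immigrant) enter directly; years, departments,
# majors and instructors enter as dummy sets, with the base categories
# absorbed automatically by the formula interface.
model1 = smf.ols(
    "grade ~ entry + female + age + immigrant"
    " + C(year) + C(dept) + C(major) + C(instructor)",
    data=df1,
)
# Cluster the standard errors by student, as described in the text.
res1 = model1.fit(cov_type="cluster", cov_kwds={"groups": df1["student"]})
print(res1.summary())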
The γ coefficients capture the potential effects of demographic variables on grades, and the δ coefficients capture the potential effect of time on grades. If δ_t exceeds δ_{t-1}, students who attended courses in year t obtained better grades than those who attended the same courses in the previous year. This sounds like grade inflation, but it is not. It may simply be the case that students do better in their second year than in their first, as they gain more university experience. To estimate grade inflation it would be necessary to add additional cohorts to the analysis. We return to the issue of grade inflation in section 6.

The λ coefficients capture potential motivational effects and peer group pressures, such that if λ_k is positive, students majoring in department k outperform otherwise identical students majoring in department K (the base category) in common courses. Since equation (1) conditions on student ability and on the department supplying the course, λ_k begs the interpretation of a peer group effect or a knowledge spillover effect, since ability and course identity have already been factored out. If λ_k is positive, studying major k helps students perform better in courses of department K. For example, statistics might help students perform better in economics and vice-versa.

In the present context, the most important parameters in equation (1) are the θs, since if θ_k is positive the grades in department k are higher than in department K. Given everything else, including student ability as measured by X and the choice of majors etc, the θs measure differential grading by department.

In summary, method 1 suffers from a number of statistical problems, which arise from the fact that X is an imperfect measure of university ability. Method 2 does not suffer from these problems, and is based on panel data econometrics (Baltagi 2005).

2.2 Method 2: Ability Measured by Student Specific Effects

Method 2 requires the estimation of equation (2):

$$Y_{ijt} = \mu_i + \sum_{t=1}^{T-1}\delta_t C_t + \sum_{k=1}^{K-1}\theta_k D_{jk} + \sum_{k=1}^{K-1}\lambda_k M_{ik} + \sum_{n=1}^{N-1}\pi_n P_{njt} + v_{ijt} \qquad (2)$$

The difference between equations (1) and (2) is that equation (2) does not specify X and Z. Instead it specifies a separate intercept term, or specific effect (μ_i), for each student. Since the other covariates in equation (2) are the same as in equation (1), μ_i is larger the more able is student i at getting good grades. Therefore the μ coefficients are measures of student ability at university. Because they measure ability at university rather than at high school, the statistical problems that arose with method 1 do not arise with method 2, since by definition the residuals (v) in equation (2) are independent of the μs, whereas the residuals of equation (1) may not be independent of X and Z. The θ coefficients estimated by equation (2) may therefore differ from their counterparts estimated by equation (1), since the latter may be biased whereas the former are unbiased.

The μ coefficients may be specified as fixed effects or as random effects. The latter assumes that the students have been sampled randomly from the population of students. Since the data refer to an entire student cohort, the μs are specified as fixed effects rather than random effects. Although the specification of fixed effects increases the burden of estimation, it avoids possible misspecification error resulting from inappropriate parametric assumptions regarding the distribution of the random effects. It is impossible to combine methods 1 and 2 by specifying X and Z in equation (2) because these variables are perfectly correlated with the specific effects.

Equations (1) and (2) are not nested and are essentially different, because equation (1) uses up a relatively small number of degrees of freedom to estimate the β and γ coefficients, whereas equation (2) specifies separate intercept terms for each student, which in our case uses up 1,180 degrees of freedom. Since the equations are non-nested,[4] adjusted R-squared does not indicate whether method 1 is superior to method 2. It would take a non-nested test of the two models to determine which method is preferable.

[4] Non-nested because equation (1) is not a special case of equation (2), nor is equation (2) a special case of equation (1). See e.g. Davidson and MacKinnon (2009), chapter 14.

Because the panel data are unbalanced (the number of observations per student is not constant), the standard errors of the parameters are likely to be heteroscedastic. Therefore, robust standard errors are reported for the parameters. Also, the standard errors are clustered, as mentioned, by course and student.
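As a companion to the previous sketch, equation (2) replaces X and Z with a dummy for every student. Again this is a sketch under the same hypothetical data layout, with course an assumed course-identifier column.

# Sketch of method 2 (equation 2): a separate intercept per student.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("grades.csv")  # hypothetical file, as in the sketch above

# X and Z are dropped because they are perfectly correlated with the student
# fixed effects; with about 1,200 students the dummy-variable estimator
# remains tractable.
model2 = smf.ols(
    "grade ~ C(student) + C(year) + C(dept) + C(major) + C(instructor)",
    data=df,
)
# Robust standard errors clustered by course, as described in the text.
res2 = model2.fit(cov_type="cluster", cov_kwds={"groups": df["course"]})

# The estimated student effects (university ability) can be pulled out of the
# parameter vector, e.g. for the ability comparison in section 4.5.
mu = res2.params.filter(like="C(student)")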
2.3 Major Effects

If the majors selected by students were fixed during the entire BA, the λ coefficients in equation (2) could not be estimated because M_i would be perfectly correlated with the student fixed effects (μ_i). However, majors are not fixed, for two reasons. First, departments operate different policies regarding entry into year 2. For example, students who registered for economics in year 1 will not be allowed to register for economics in year 2 if their first year grades in economics were inadequate. Such students will be required to seek a replacement major. Second, students switch majors on their own initiative. For example, in the previous example students might not wish to continue with economics even if they were allowed to register in year 2. Whereas departments select students at the end of year 1, students may switch majors at any time. Indeed, there is substantial mobility in majors. Therefore, the λ coefficients are identified because M is not perfectly correlated with the student fixed effects.

Nevertheless, the interpretation of the λ coefficients is problematic because of selectivity in the choice of majors. Since methods 1 and 2 control for generic ability, the λ coefficients are unbiased if students choose majors according to their ability. If, however, ability is specific rather than generic, matters might be different. For example, a student switches from economics to sociology because he is more suited to sociology. This self-sorting process is efficient because students are better matched to both economics and sociology in terms of their specific abilities. If the generic ability of these sociology students is greater than the generic ability of economics students, the estimate of λ for sociology will be positive. Therefore, in addition to peer group and knowledge spillover effects, the λ coefficients may capture the effects of specific ability.

Since the θ coefficients constitute the parameters of interest, controlling for M in equations (1) and (2) also controls for otherwise unobserved peer group effects, knowledge spillover effects, and effects induced by differences between generic and specific abilities. Therefore, the specification of M reduces the risk of omitted variable bias.

3. The Data

3.1 Students

The data were supplied by the Department of Student Administration of the Faculty of Social Sciences at the Hebrew University of Jerusalem. The data comprise the entire student cohort of 2003 who registered in the Faculty of Social Sciences: 1,217 BA students whose grades during 2003-2008 are included in the data. The total number of grade observations is about 40,000, which works out at an average of 33 grades per student. By 2008, 79 percent of this student cohort had graduated.

Table 1 Cohort Demographics (percent)

Male                    45.7
Immigrated after 1989   11.0
Immigrated after 1998    2.6
Born before 1975         2.1
Born 1975-1977          10.9
Born 1978               12.3
Born 1979               19.3
Born 1980               22.7
Born 1981               15.5
Born 1982                7.7
Born 1983                5.8
Born 1984                2.8

Table 1 shows the demographic composition of the students in the cohort. There were slightly more women than men, and 11 percent were new immigrants (immigrated after 1989, when the former USSR permitted Jews to emigrate). The modal age was 23 years in 2003, but the age dispersion was quite large.[5]

[5] Undergraduates in Israel are older than their counterparts abroad because military conscription is 3 years for men and 2 years for women. Since Arabs are not required to serve in the army, the younger students are less likely to be Jewish. Many Bedouins and Druzes serve in the army. Student ethnicity is not identified in the data.
The second column of Table 2 records the number of students who registered by major in 2003. The fourth column reports the number of students who graduated by 2008. Since a significant minority of students did not graduate, the numbers in column 4 are expected to be smaller than the numbers in column 2. The graduation rates are particularly low in sociology and statistics. In the case of business studies there were more graduates than initial registrations, apparently because some students transferred to business studies after 2003.

Table 2

Major               Students  Entry Grade  Graduates  Final Grade
Psychology            116        22.7         115        89.7
Sociology             240        18.3         157        84.8
Political Science     179        17.9         121        85.0
Int Relations         255        18.1         213        84.8
Statistics             57        16.9          27        78.0
Economics             333        20.7         248        83.8
Business Studies       81        21.0          85        87.5
Accountancy            86        20.6          86        81.3
Communications        139        21.1         119        88.5
PPE                    31        20.9          23        88.3
Geography              75        17.4          58        86.4
Islamic Studies                                52        84.4
E. Asian Studies                               35        86.3
History                                        20        86.9
L. America                                     27        90.3
Italian                                        17        89.0
Education                                      57        87.8
General BA                                     41        85.0
Law                                            38        84.7
Economics & Law                                18        87.1

Most students initially chose two majors. Table 2 indicates that in many cases these majors were from the Faculty of Humanities. The most popular pairings of majors are economics and accountancy (260 students), economics and business studies (150), political science and international relations (162), communications and sociology (65), communications and international relations (73), and communications and psychology. Recall that majors may vary during the BA as a whole.

Table 2 also reports entry grades (on a scale of 16-25) by department. Since the offer of a university place is by entry grade alone, the minimum entry grade is in fact the threshold for acceptance. These thresholds vary by department. For example, the threshold for economics was lower than for psychology, but higher than for sociology. The hardest departments to enter from this point of view were (and still are) psychology and communications. On the other hand, these departments enrolled fewer students than the economics department. Choosier departments naturally enroll fewer students and set higher entry thresholds. However, some departments, such as statistics, enroll fewer students even though the threshold is relatively low, because the underlying demand to study statistics is low.

Entry grades and student registrations are not available for majors outside the Faculty of Social Sciences. For example, there were 52 students who majored in Islamic Studies who also majored in one of the social sciences. The total number of students majoring in Islamic Studies is obviously much larger than this. Since the minimal entry grade in Islamic Studies is set by the Faculty of Humanities, the entry grade and registrations in Islamic Studies do not feature in Table 2. These data only feature for the social science majors.

Table 2 also records the final BA grades by major. The grand mean grade was 86.5 and the median grade was 87.2. The highest final grades were obtained by the 27 students in the department of Latin American Studies (Faculty of Humanities).
Within the Faculty of Social Sciences the highest average final grade was achieved by students majoring in psychology, and the lowest by far was achieved by students majoring in statistics.

Table 3 Graduation Year

Year      2005   2006   2007   2008-2010
Percent    3.5   56.2   26.4     13.7

As mentioned, about 80 percent of the students in the cohort had graduated by 2008. Table 3 reports the years in which these students graduated. The modal year for graduation was 2006, which is three years after registration.[6] However, only 60 percent of graduates completed their BA within three years.

[6] These data refer to the year in which the BA degree was awarded. In practice there are students who missed the deadline for various administrative reasons and were awarded their degree the following year. Some of the 26 percent of students in 2007 no doubt fall into this category.

3.2 Instructors

We have matched 1,500 individual instructors to the courses attended by the students in the cohort. The number of instructors exceeds the number of students due to high turnover among instructors, especially external instructors. There are 628 instructors who supplied at least 8 grades during 2003-2008. To estimate instructor fixed effects, we set 8 grades as a minimum. These instructors supplied a total of 33,166 grades out of the almost 40,000 grades mentioned in section 3.1. Therefore, estimation of instructor fixed effects reduces the sample size by slightly more than 5,000 observations. Instructor fixed effects embody three components: a departmental grading norm, leniency or strictness, and instruction quality.

Observed characteristics of the 628 instructors are reported in Table 4. Apart from members of faculty, instructors include graduate students as well as external instructors. Note that 68 instructors are affiliated to more than one department. We use these instructors for quasi-experimental purposes to test the hypothesis that instructors are influenced by departmental grading norms.

Table 4 Instructor Characteristics

Men
Young
Members of faculty
PhD students
External instructors
Affiliated to more than 1 department

4. Results Ignoring Instructor Fixed Effects

4.1 Overview

We begin by ignoring instructor fixed effects because, as mentioned, their estimation involves a substantial reduction in the sample size. In section 5 we estimate models including instructor fixed effects. Two sets of results are reported. The first (method 1) is based on equation (1) and conditions on students' entry grades as a measure of their ability. The second (method 2) is based on equation (2) and conditions on the student fixed effects as a measure of ability. Since method 2 does not require data on entrance grades, which are missing for some students, it uses more observations than method 1. In fact method 1 uses 36,867 observations on grades for 1,120 students while method 2 uses 38,542 observations on grades for 1,180 students. Table 5 summarizes the sample sizes and presents goodness-of-fit statistics for the two methods. The base department for courses is economics, the base major is economics, and the base year is 2004, since these categories cover the largest number of courses and students.
Table 5 Summary Statistics

                               Method 1        Method 2
Students                       1,120           1,180
Observations                   36,867          38,542
R-squared (adjusted):
  With major effects           0.1346          0.0485
  Without major effects        0.1246          0.0362
P-value for fixed effects      n/a             <0.0001
Non-nested test                1.0055          0.7153
                               (se = 0.1155)   (se = 0.1857)

Methods 1 and 2 come in two versions, with and without major effects, which are jointly statistically significant. The goodness-of-fit of method 1, as measured by R2, is not necessarily better than the goodness-of-fit of method 2, since in the latter case it excludes the explanatory power of the student fixed effects. In any case the two methods are not directly comparable because they are non-nested; method 1 conditions on high school ability whereas method 2 conditions on ability at university.

A non-nested test[7] is carried out to discriminate between the two methods. There are three potential outcomes of this test: method 1 encompasses method 2, method 2 encompasses method 1, or neither method encompasses the other. When, for the common set of observations, the predicted grade of method 2 is added to method 1, it obtains a coefficient of 1.0055, which is statistically significant. Therefore, method 2 explains grades that method 1 failed to explain. When the predicted grade of method 1 is added to method 2, it obtains a coefficient of 0.7153, which is also statistically significant. Therefore, method 1 explains grades that method 2 failed to explain. The test therefore shows that neither method encompasses its rival, in which case there is no preferred model. This result implies that both high school ability and university ability matter. However, for reasons already stated, it is not possible to specify both types of ability.

[7] See in particular Davidson and MacKinnon (2009) chapter 14 on non-nested hypothesis testing and the encompassing principle. The reported results refer to both variance encompassing and parameter encompassing. Method 1 encompasses method 2 if method 1 explains what method 2 fails to explain, while method 2 does not explain what method 1 fails to explain. The non-nested test uses the specification with major effects.
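The encompassing test can be sketched using the two fitted models from the sketches in section 2; variable names remain the hypothetical ones used there, and this is an illustration of the idea rather than the authors' exact procedure.

# Sketch of the non-nested (encompassing) test: each model's fitted values
# are added to the rival model on the common set of observations.
common = df.dropna(subset=["entry"]).copy()   # method 1 needs entry grades
common["yhat1"] = res1.predict(common)
common["yhat2"] = res2.predict(common)

# Does method 2 explain what method 1 misses? A significant coefficient on
# yhat2 means that method 1 does not encompass method 2.
aug1 = smf.ols(
    "grade ~ entry + female + age + immigrant"
    " + C(year) + C(dept) + C(major) + yhat2",
    data=common,
).fit(cov_type="cluster", cov_kwds={"groups": common["student"]})
print(aug1.params["yhat2"], aug1.bse["yhat2"])  # 1.0055 (se 0.1155) in Table 5

# The symmetric regression adds yhat1 to the method 2 specification and
# corresponds to the 0.7153 (se 0.1857) entry in Table 5.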
4.2 Departmental Grade Differentials

There are 55 grade differential coefficients (θ_k). Since courses supplied by the Department of Economics comprise the base group, these differentials are expressed relative to economics. The estimates of grade differentials for key departments are reported in Table 6. For example, grades in the Department of Political Science exceed grades in the Department of Economics by about 5.12 points according to method 1 with major effects. The standard errors are robust and clustered by student in method 1, and they are robust in method 2 because the panel is unbalanced.[8] In the case of method 2, robust standard errors are also clustered by the 3,028 courses. Therefore, the grade differential in political science relative to economics is statistically significant.

[8] Longer panels naturally have smaller variances. In unbalanced panels the panel length varies by course, which induces heteroscedasticity in the residuals. Robust standard errors take this heteroscedasticity into account.

The grade differential estimates are all positive except for mathematics. Therefore all departments award higher grades than economics, except for mathematics. Notice that grades in the other departments of the Faculty of Natural Sciences are higher than in economics. The largest grade differentials occur in the Humanities Program and in courses supplied by the School of Education. Within the Faculty of Social Sciences the largest grade differentials occur in geography and communications.

Note that the estimates of grade differentials are, on the whole, not sensitive to the method of estimation or to the specification of major effects. However, in the case of accountancy and statistics, grade differentials are not statistically significant according to method 1, but are small and statistically significant according to method 2. With this exception, the grade differential estimates are robust to the way in which one controls for student ability and choice of majors.

A t-test may be used to determine whether differences between grade differentials are statistically significant. To determine whether the grade differential in, say, psychology is significantly different from that in geography, the difference of 1.68 points (method 2 with major effects) is divided by the square root of the sum of the estimated variances minus twice their covariance. The t-statistic equals 1.61, which is smaller than conventional critical values. Therefore, the difference in grades between these two departments is not statistically significant. However, other grade differentials are statistically significant.
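In symbols, the verbal recipe above corresponds to

$$t = \frac{\hat{\theta}_{geog} - \hat{\theta}_{psych}}{\sqrt{\widehat{\operatorname{var}}(\hat{\theta}_{geog}) + \widehat{\operatorname{var}}(\hat{\theta}_{psych}) - 2\,\widehat{\operatorname{cov}}(\hat{\theta}_{geog}, \hat{\theta}_{psych})}} = \frac{1.68}{1.04} \approx 1.61,$$

implying a standard error of about 1.04 for the difference between the two differentials.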
Table 6 Departmental Grade Differentials

                            Method 1             Method 2
Major effects            Yes        No        Yes        No
Political Science        5.12       5.49      5.29       5.44
                        (10.47)    (9.55)   (11.72)    (11.82)
Geography                7.56       7.90      7.60       7.74
                        (11.90)   (11.09)   (12.39)    (12.34)
International Relations  5.18       5.49      5.30       5.47
                        (11.10)    (9.55)   (11.81)    (11.99)
Communications           6.88       6.58      7.45       7.63
                        (14.02)   (11.37)   (15.64)    (15.89)
Psychology               5.80       5.51      5.91       5.93
                        (10.40)    (9.49)    (9.69)     (9.70)
Sociology                5.27       5.07      5.47       5.63
                         (8.84)    (7.48)   (11.43)    (11.86)
Statistics               0.67       0.34      1.63       1.68
                         (1.11)    (0.47)    (3.73)     (3.83)
Business Studies         5.77       5.60      6.09       6.09
                        (12.82)   (10.45)   (15.56)    (15.44)
Accountancy              0.93       0.24      2.12       2.06
                         (1.92)    (0.40)    (4.87)     (4.71)
Latin American Studies  11.51      11.60     10.61      10.82
                        (19.09)   (16.61)   (16.28)    (16.38)
Law                      5.06       3.96      4.36       3.31
                         (8.69)    (5.59)    (5.76)     (4.24)
E. Asian Studies         4.01       4.25      4.41       4.60
                         (5.14)    (4.64)    (7.07)     (7.35)
Islamic & Middle East    4.80       4.32      4.34       4.49
Studies                  (5.78)    (5.99)    (7.82)     (8.10)
Philosophy               4.53       4.97      4.87       5.04
                         (6.97)    (7.24)    (7.89)     (8.05)
PPE                      5.17       5.58      5.06       4.81
                         (8.47)    (5.97)    (8.35)     (7.99)
Education               10.58      10.06     11.62      11.79
                        (18.65)   (11.63)   (18.58)    (18.40)
Humanities              13.30      13.13     13.63      13.80
                        (13.06)   (12.82)   (13.74)    (13.88)
Maths                   -3.85      -5.58     -4.51      -4.65
                        (-2.11)   (-2.16)   (-3.13)    (-3.06)
Physics                  3.36       1.91      3.42       3.10
                         (2.69)    (1.49)    (2.88)     (2.52)
Biology                  3.90       3.23      3.01       2.84
                         (5.32)    (4.02)    (4.12)     (3.66)

Note: t-statistics reported in parentheses.

The estimates of grade differentials reported in Table 6 are on the whole robust with respect to the method of estimation, except in the cases of statistics and accountancy. They are also robust with respect to the specification of major effects.

4.3 The Role of Majors

Table 7 reports the estimated λ coefficients in equations (1) and (2). Only six of these coefficients are statistically significant according to method 1 at conventional levels of probability. With the exception of geology, the significant coefficients are negative. Since the base refers to economics, this means that students majoring in economics tend to do better, given their ability etc, than other students studying the same courses. This effect might be due to one or more of the three reasons mentioned in Section 2. First, there may be intellectual complementarity or spillover between economics and other subjects, so that economics helps students obtain better grades in other subjects. Second, peer group effects in economics may be more conducive to learning. Third, students with specific ability in economics may have higher than average generic ability.

Table 7 Major Effects

                          Method 1               Method 2
                     Coefficient  t-stat    Coefficient  t-stat
Political Science        0.04      0.07         0.30      0.76
Geography                0.03      0.04         0.16      0.37
Int Relations           -0.37     -0.80         0.09      0.28
Communications          -1.16     -1.95        -0.28     -0.76
Psychology              -0.82     -1.67        -0.13      0.38
Sociology               -0.76     -1.53        -0.30     -0.78
Statistics              -1.82      1.31        -0.72     -1.95
Business Studies        -0.77     -1.20        -0.72     -1.95
Accountancy             -2.23     -3.52        -0.84     -2.33
Law                     -3.70     -3.36        -4.78     -3.39
E. Asian Studies         0.02      0.02        -0.07     -0.10
L. American Studies     -0.15     -0.15        -0.70     -1.03
Philosophy               0.40      0.42         0.59      0.70
PPE                      0.03      0.02        -5.04     -3.04
Education               -1.60     -1.80         0.69      1.25
Islam and M. East        0.25      0.40        -0.23     -0.44
Russian & Slavic       -14.12     -7.67         0.75      0.27
Studies
Geology                  6.38      6.52        -0.43     -0.35
Chemistry               -9.47     -7.08        -0.15     -0.20
Maths                   -4.61     -1.34        -1.18     -0.96

See notes to Table 6.

The coefficient estimates in Table 7 are more sensitive to the method of estimation than their counterparts in Table 6. Since, as suggested, the choice of majors may be related to ability, the estimates of the λs are likely to depend upon how ability is measured. Therefore, it is not surprising that the two methods produce quite different results. For example, the major effect for accountancy is -2.23 according to method 1 while it is only -0.84 according to method 2. Some major effects which are statistically significant according to method 1 (geology, chemistry, and Russian and Slavic Studies) are not statistically significant according to method 2.

4.4 Progress during the BA

Table 8 Year Effects

              Method 1               Method 2
           Coefficient  t-stat   Coefficient  t-stat
2003-2004     -0.45     -2.30       -0.44     -3.12
2004-2005      base                  base
2005-2006      0.46      2.37        1.02      7.46
2006-2007     -2.25     -4.28        1.60      4.57
2007-2008     -4.21     -4.09        2.30      3.42
2008-2009     -6.49     -4.53        1.94      1.52

Note: Estimated with major effects.

Table 8 reports the fixed effects (δs) for the years in which the course was studied (the base year is 2004-5). According to both methods students tend to perform weakly in their first year (2003-2004): their grades are lower by about 0.45 points. Subsequently, they perform better. However, from 2006-7 onwards the two methods give opposite results. Students who took courses after 2005-6 obtained significantly lower grades according to method 1, whereas the opposite is true according to method 2. As noted in Table 3, many students took longer than three years to graduate and some did not graduate at all. Since method 2 estimates students' university ability, and less able students take longer to graduate (and might not graduate at all), its time fixed effects are less affected by adverse selection among students who graduated later. Given their university ability, method 2 indicates that students who took longer to graduate in fact obtained better grades. This premium reaches 2.3 points in 2007-2008. When a dummy variable is specified in method 1 for students who failed to graduate by 2008, its estimated coefficient is -8.05 with standard error 0.58. As expected, students who failed to graduate by 2008 are negatively selected; it is not simply a matter of these students being slow to graduate. The results for method 1 reported above are, however, robust with respect to this specification.
4.5 High School Ability v University Ability

The two methods handle ability differently. Method 1 hypothesizes that university ability is correlated with high school ability but may differ between men and women, and between immigrants and natives, and may be age dependent. Method 2 estimates university ability directly. These estimates may depend on age etc, but if they do, such effects are taken directly into consideration. Method 1 shows (Table 9) that the entry grade matters and is highly significant. Since entry grades vary by up to 9 points between departments, the contribution of high school ability adds at most 12.5 grade points. Table 9 also shows that women perform slightly better than men, that new immigrants perform like natives, and that older students do slightly better.

Table 9 Cross Section Variables (Model 1)

Major effects:          Yes                    No
               Coefficient  t-stat    Coefficient  t-stat
Entry Grade       1.419     12.14        1.424     13.33
Female            0.962      2.30        0.925      2.24
Age in 2003       0.426      3.99        0.458      4.30
Immigrant         0.604      1.21        0.648      1.29

Figure 1 plots the relationship between student fixed effects and their entry grades. The bold lines are drawn through the means of the two axes. The four quadrants indicate that there are many students with low entry grades who did well at university (top left) and many with high entry grades who performed weakly (bottom right). On the whole, however, the two measures of ability are positively correlated, but the correlation is only 0.417. In fact there is a substantial degree of mobility in ability between high school and university. The mean reversion coefficient between university ability (measured by normalized fixed effects) and high school ability is only 0.32, indicating a high degree of mobility between high school performance and university performance. Students with top entry grades also had the greatest university ability. However, many students with intermediate entry grades had similar university ability. On the whole, Figure 1 indicates that high school ability is a poor predictor of university ability.

Figure 1 The Correlation between University Ability and High School Ability

Figure 2 plots the distribution of student fixed effects, which are approximately normalized to zero (mean = -1.07). The empirical distribution is clearly different from the normal distribution (which is indicated in Figure 2) due to excess kurtosis and left skewness. There is a long left tail of weaker students.

4.6 Robustness Checks

When the number of students attending the various courses is specified in the model, the estimated coefficient according to method 2 is -0.0162 with standard error 0.001. Method 1 returns an almost identical result. This estimate means that students obtain lower grades in larger classes,[9] with grades decreasing by 1 point when the number of students attending the course increases by 60. Specifying the number of students attending the course does not, however, significantly alter the other parameters. If instruction quality varies inversely with class size, this robustness check suggests that the estimates of grade differentials reported in Table 6 are unrelated to teaching quality.

[9] Some departments allocate students to separate classes. Therefore, the number of students in the course would only equal class size if there is only one class.
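This check can be expressed compactly in the notation of the earlier sketches; class_size is an assumed column giving the number of students attending the course, so the snippet below is illustrative only.

# Sketch of the robustness check: course enrolment added to method 2.
model2b = smf.ols(
    "grade ~ C(student) + C(year) + C(dept) + C(major) + class_size",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["course"]})
print(model2b.params["class_size"])  # about -0.016: 1 point per 60 students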
Method 1 is also estimated using data for 2003 only. Since compulsory courses are taught in the first year of the BA program, the estimates for 2003 should not be affected by potential course selection bias. On the other hand, grading policies might vary by department in the first year and for compulsory courses. The sample size is inevitably reduced (from 36,163 to 10,453 observations) and the standard errors of the parameter estimates consequently increase. Results are reported in Table 10. Comparing Tables 6 and 10 reveals that the first-year grade differential is largest in geography and smallest in international relations. Since economics continues to serve as the base, this means that grades in economics continue to be low in the first year. On the other hand, grade differentials in business studies, communications and education are large in both Tables 6 and 10. It seems, therefore, that for some departments, such as psychology, the grade differential intensifies during the second and third years.

Table 10 Grade Differentials in the First Year (2003)

                   Grade Differential  t-statistic
Political Science        3.85             1.91
Geography               11.92             3.30
Int Relations           -2.18            -1.94
Communications           8.42             5.15
Psychology               4.33             2.11
Sociology                3.93             1.18
Statistics               1.33             2.10
Business Studies         9.39             5.00
Accountancy              5.79             2.35
Law                      3.50             3.53
Philosophy               5.91             3.10
PPE                      1.51             0.86
Education                9.42             4.31
Biology                  9.84             3.81

5. Results with Instructor Fixed Effects

We now estimate methods 1 and 2 with instructor fixed effects. As mentioned, this involves a reduction in sample size of about 5,000 course grades. Since there are 628 instructors and almost 1,200 students, method 2 involves the estimation of 1,868 fixed effects. This is feasible because there are over 33,000 observations on course grades. However, we streamline by abstracting from major effects, which turn out to be unimportant when instructor fixed effects are specified.

We focus on a number of related questions. First, does the specification of instructor fixed effects alter the estimates of grade differentials? For these purposes we define the departmental grade differential as the weighted average of instructor fixed effects by departmental affiliation. Second, does grading heterogeneity among instructors vary by department? Suppose, for example, that the mean instructor fixed effect is the same in departments A and B, but the variance in B is larger than in A. Then instructors are more heterogeneous in their grading in B than in A. Third, do instructors grade according to departmental norms? Do instructors grade more leniently in more lenient departments? Since some instructors are affiliated to more than one department, do they grade more leniently (strictly) in the department which grades more leniently (strictly)? Fourth, do instructors grade differentially depending on their sex, age and status? In particular, is it the case, as suggested by Johnson (2003), that external instructors grade more leniently to increase their popularity among students and enhance their employment prospects?

5.1 Grade Differentials

Models 1 and 3 in Table 11 report estimated grade differentials, for method 2 and method 1 respectively, in which instructor fixed effects are specified. For instructors affiliated to more than one department, we specify separate instructor fixed effects. Since these estimated grade differentials use smaller samples than their counterparts in Table 6, models 2 and 4 are reported for purposes of comparison. For example, according to Table 6 the grade differential for geography is 7.09 when using method 2. In Table 11 this parameter is estimated at 6.52 using 33,166 observations (model 2).
When instructor fixed effects are specified, the estimated differential is 6.98 (model 1). A comparison of models 1 and 2 shows that the estimated grade differentials are, on the whole, insensitive to the specification of instructor fixed effects. The same applies to method 1, based on a comparison of models 3 and 4. The maximum difference is about half a grade point. Therefore, ignoring the identity of instructors is unimportant when estimating departmental grade differentials.

Table 11 Departmental Grading Differentials

Model                      1           2           3           4
Method                     2           2           1           1
Fixed effects      Students &    Students    Instructors    None
                  instructors

Department
300                    4.16        4.44        3.78        3.10
301                    4.87        4.54        3.73        3.20
311                    4.45        4.43        3.75        3.91
312                    4.39        4.68        3.88        4.23
320                    1.17        0.94       -0.30       -0.79
321 (base)             0           0           0           0
322                    5.95        5.51        5.69        4.86
323                    6.32        5.95        4.94        4.21
325                    0.91        0.93       -0.91       -1.03
326                    2.17        2.29        2.93        2.82
401                    1.96        2.65        2.87        2.85
802                    6.98        6.52        6.44        5.91
Observations         33,166      33,166      31,792      31,792

Note: Departments are identified by administrative codes; 321 is economics (the base) and 802 is geography (see text).

The 668 instructor fixed effects are jointly statistically significant. The sum of their squared t-statistics is 1,106, which greatly exceeds its critical chi-squared value. Also, there is heterogeneity between departments in the variance of the estimated fixed effects. Figure 3 plots the kernel densities of the estimated instructor specific effects by department. The distribution is tighter in departments where instructors grade more similarly, and more diffuse in departments where instructors grade less similarly.

5.2 Decomposing Instructor Fixed Effects

We obtained demographic data for 274 of the 628 instructors. In Table 12 we report the results of a regression model for the instructor fixed effects estimated in model 1 of Table 11.

Table 12 Regression Model for Instructor Fixed Effects

                  Coef.     t-stat
300               2.39       2.01
301               4.00       3.73
311               3.76       3.50
312               2.32       2.12
320              -1.88      -1.45
321              (base)
322               1.64       1.61
323               4.59       4.21
326              -1.28      -0.43
350               2.74       0.65
399               7.86       1.87
802               5.08       4.09
PhD students      1.23       1.25
Faculty          (base)
External          1.81       2.96
Sex              -0.80      -1.29
Age               0.08       2.87
Intercept         0.17       0.11
Observations      274
R2                0.245

Table 12 shows that departmental affiliation is a key determinant of grading by individual instructors. The estimated coefficients naturally reflect the estimates of grade differentials in Table 11. For example, the coefficient for geography is 5.08 in Table 12 and 6.98 in Table 11. There is no difference in grading behavior between male and female instructors. However, external instructors grade more leniently (by almost 2 grade points). So do older instructors, although the size of the effect is small.

5.3 A Quasi-experiment

The interpretation of the departmental coefficients in Table 12 is ambiguous: either instructors in geography grade more leniently, or the grades in geography are higher because instructor quality is superior in the Department of Geography. A simple quasi-experiment to resolve this ambiguity is to compare the grade differentials of instructors who are affiliated to two departments, since instructor quality is specific to the instructor rather than to the department.
We use the following difference-in-differences (DID) estimator. Let π_n1 and π_n2 denote instructor n's fixed effects in departments 1 and 2, let G_1 and G_2 denote the departmental grade averages, let fe_n denote a fixed effect capturing instructor quality and leniency, and let h denote a residual error:

$$\pi_{n1} = fe_n + \gamma_1 G_1 + h_{n1}$$
$$\pi_{n2} = fe_n + \gamma_2 G_2 + h_{n2}$$

If γ_1 = γ_2 = γ, the DID estimator eliminates fe_n:

$$d_n = \pi_{n1} - \pi_{n2} = \gamma (G_1 - G_2) + (h_{n1} - h_{n2})$$

Using model 1 in Table 11, the DID estimate of γ is 0.856 with a t-statistic of 3.55 (Figure 4). Instructors affiliated to more than one department grade more leniently in the department with the higher grade average. The rate of convergence to the departmental grading norm is 86 percent. This result strongly suggests that instructors are affected by departmental grading norms. Indeed, it strengthens the suspicion that grade differentials are not induced by differential instructor quality. Rather, they are induced by academic policy to grade more strictly or leniently.

Figure 4 Testing for Departmental Grading Bias
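The DID regression can be sketched as follows; dual is a hypothetical DataFrame with one row per dual-affiliation instructor, holding the two estimated fixed effects and the two departmental grade averages, and the column names are illustrative assumptions.

# Sketch of the DID test of section 5.3.
import pandas as pd
import statsmodels.formula.api as smf

dual = pd.read_csv("dual_instructors.csv")  # hypothetical file

dual["d"] = dual["fe_dept1"] - dual["fe_dept2"]   # pi_n1 - pi_n2
dual["dG"] = dual["G_dept1"] - dual["G_dept2"]    # G_1 - G_2

# Differencing removes the instructor component fe_n, so the slope on dG
# estimates gamma, the rate of convergence to the departmental grading norm.
did = smf.ols("d ~ dG", data=dual).fit()
print(did.params["dG"], did.tvalues["dG"])  # 0.856 (t = 3.55) in the text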
6. Grade Inflation

The cohort data for 2003 are not informative about grade inflation. The statistical models specified the year in which courses were studied, and the estimates indicate that grades were lowest in 2003 and rose subsequently. However, this effect reflects the gradual adjustment of students to the university environment rather than grade inflation. Students achieved lower grades in 2003 simply because this was their first year of study and the university environment was new. One would need additional student cohorts to estimate grade inflation, which implies that later cohorts obtain higher grades than earlier ones.

Another way to estimate grade inflation is to track grade averages over time. This assumes that average student quality does not change over time. If average student quality happened to increase over time, and better students obtain higher grades, average grades should increase over time, as noted by Bar et al (2009). This could not, of course, be counted as grade inflation.[10] For example, the entry requirement at the Hebrew University has been raised in economics and lowered in psychology. Given everything else, this might have been expected to raise grades in economics and to lower them in psychology.

[10] The parallel between actual inflation and grade inflation is complete. Inflation ignores improvements in the quality of goods, and is understated if consumers purchase cheaper goods.

Series A in Table 13 reports the weighted average grades of courses supplied by departments in the Faculty of Social Sciences. The weights (w) are based on student participation by course, so that more popular courses are given greater weight. On the whole, these data do not suggest that grade inflation occurred during 2000 to 2008. Grade inflation occurred in the Department of Political Science (4 points) and in the Department of Communications (2 points), while in the Department of International Relations there was grade disinflation (-3 points). The most remarkable feature of Table 13 is the large and persistent differences in average grades across the departments. Psychology and communications head the league table, while statistics and economics share the bottom places.

Let G_jt denote the average grade in course j in year t. Since series A is weighted by student course participation, i.e.

$$A_t = \sum_{j=1}^{J} w_{jt} G_{jt},$$

there is an obvious index number problem. If students increasingly choose courses that award higher grades, average course grades would increase even if the course grades themselves did not change. Bar et al (2009) point out that at Cornell University the incentive to chase grades[11] increased in 1998 when the university began to publish median grades. Since at the Hebrew University grades have always been public knowledge, this incentive to chase grades has not changed. Therefore a simple rather than a weighted course grade average might be a superior measure of grade inflation. This is shown by series

$$B_t = \frac{1}{J} \sum_{j=1}^{J} G_{jt}$$

in Table 13.

[11] Grade-chasing occurs when students choose courses for their grades rather than for their academic content (Sabot and Wakeman-Linn 1991). Sabot and Wakeman-Linn show that the probability of taking a further course in a subject varies directly with the previous grade. They interpret this as grade-chasing, when it might simply be the case that students choose courses that suit them. Johnson (2003) argues that grade inflation is induced by competition between departments over student numbers, which induces "grade races" as in arms race models.

Since A is a student-weighted average of course grades, A equals B plus the covariance between w and G:

$$A_t = B_t + \mathrm{cov}_t(w, G), \qquad \mathrm{cov}_t(w, G) = \sum_{j=1}^{J} (w_{jt} - \bar{w}_t)(G_{jt} - \bar{G}_t),$$

so the covariance equals A - B, and if the covariance is zero then A = B. Cov(wG) may differ from zero for two main reasons. First, grade-chasing increases the covariance, in which case causality runs from G to w. Second, unpopular instructors may ingratiate themselves with students by grading more generously to boost student numbers. This ingratiation effect has a negative effect on cov(wG). So does the mechanical effect of class size found in section 4.6: since larger courses award lower grades, w and G are negatively related. Equilibrium in the market for students and grades is illustrated in the figure below, where schedule S is induced by grade-chasing and schedule I is induced by ingratiation. Equilibrium grades and student rolls are determined where the two schedules intersect.

Figure: Equilibrium with Ingratiation and Grade-Chasing (equilibrium at G*, w*)

We cannot decompose the covariance into its two components. However, if the class-size covariance is constant, changes in the covariance may be attributed to the intensity of grade-chasing. The estimated covariances range from -4.9 (economics, 2004) to 1.4 (statistics, 2007). Therefore the covariance between grades and course choice can make a substantial contribution to grade differentials. The mean covariances range between -3.48 in political science and 0.2 in international relations, and the grand mean is -1.61. Since the covariance is typically negative, the class-size component dominates the grade-chasing component. The covariance increases in statistics by about 2 points and slightly decreases in international relations and psychology, suggesting that grade-chasing has increased in the former and decreased in the latter. In other departments grade-chasing has remained stable.
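The decomposition behind Table 13 is straightforward to compute from course-level data. The sketch below is illustrative: the input file and column names (dept, year, G for the course mean grade, n for enrolment) are assumptions, not the authors' actual data.

# Sketch: series A, B and cov(w, G) per department and year.
import pandas as pd

courses = pd.read_csv("course_means.csv")  # hypothetical course-level file

def decompose(g):
    w = g["n"] / g["n"].sum()        # participation weights, summing to 1
    A = (w * g["G"]).sum()           # series A: student-weighted average
    B = g["G"].mean()                # series B: simple average
    return pd.Series({"A": A, "B": B, "cov": A - B})  # cov_t(w, G) = A - B

table13 = courses.groupby(["dept", "year"]).apply(decompose).round(1)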
Table 13 Average Grades

                        2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Communications   A      85.8  87.6  88.7  86.3  87.7  86.5  87.6  86.9  88.1
                 B      85.2  88.9  89.1  85.3  87.6  87.6  88.0  86.8  88.8
                 cov     0.6  -1.3  -0.4  -1.0   0.1  -1.1  -0.4   0.1  -0.7
                 C                                    89.0  88.1  87.8  87.9  88.2
Economics        A      77.2  80.0  79.2  79.4  77.5  77.3  77.5  77.3  79.4
                 B      81.9  81.2  82.7  82.5  82.4  81.3  82.0  82.4  83.0
                 cov    -4.7  -1.2  -3.5  -3.3  -4.9  -4.0  -4.5  -5.1  -3.6
                 C                                    84.1  84.7  84.3  84.5  84.5
Geography        A      84.1  85.4  85.6  84.9  84.7  83.8  84.5  85.2  85.9
                 B      85.4  85.8  86.8  87.1  86.2  84.4  86.6  86.9  87.4
                 cov    -1.3  -0.4  -1.2  -2.2  -1.5  -0.6  -1.1  -1.7  -1.5
                 C                                    85.9  85.5  83.4  85.1  85.7
International    A      86.5  85.4  85.0  84.5  83.1  82.8  82.9  83.4  83.7
Relations        B      86.4  84.6  82.7  82.9  82.4  83.4  83.6  84.4  85.1
                 cov     0.1   0.8   2.3   1.6   0.7  -0.6  -0.7  -1.0  -1.4
                 C                                    85.1  85.5  84.9  85.0  85.5
Political        A      79.9  79.0  80.6  81.4  82.8  83.3  83.1  84.0  83.9
Science          B      81.8  83.6  84.4  85.1  86.9  86.3  86.7  87.1  86.5
                 cov    -1.9  -4.6  -3.8  -3.7  -4.1  -3.0  -3.6  -2.9  -2.6
                 C                                    84.1  84.4  84.7  85.1  85.1
Psychology       A      88.5  88.5  88.7  88.7  88.0  88.7  88.6  88.7  89.4
                 B      89.6  90.6  90.8  90.2  90.2  90.9  91.2  91.6  91.5
                 cov    -1.1  -2.1  -2.1  -1.5  -2.2  -2.2  -2.6  -2.9  -2.1
                 C                                    90.5  89.8  90.6  90.0  90.5
Sociology        A      80.7  78.3  80.5  80.2  79.1  80.8  82.0  81.6  81.2
                 B      82.4  79.8  83.4  81.6  81.8  83.2  83.2  84.3  84.7
                 cov    -1.7  -1.5  -2.9  -1.4  -2.7  -2.4  -1.2  -2.7  -3.5
                 C                                    84.3  85.5  86.5  86.3  86.9
Statistics       A      77.8  81.8  80.8  79.7  80.0  80.3  76.3  78.3  78.8
                 B      80.3  83.3  80.2  79.0  79.7  79.9  77.1  76.9  79.8
                 cov    -2.5  -1.5   0.6   0.7   0.3   0.4  -0.8   1.4  -1.0
                 C                                    82.6  82.7  83.8  83.1  81.7

Notes: A: average course grade weighted by the number of students in the course. B: simple average course grade. cov(wG) = A - B. C: average final grade (GPA) of graduates by major. We are grateful to Benny Yakir for the data on A and B. Source: Department of Student Administration.

Series C in Table 13 reports average final grades in the various majors. It is not directly comparable to series A and B because it is based on courses attended over a period of at least three years, and it only refers to students who registered by major. By contrast, series A and B refer to all students irrespective of their major. Although it is available for a shorter period of time, series C does not suggest that there was grade inflation during 2003-2009.

7. Conclusion

Two statistical methods have been compared for estimating departmental grade differentials. The first controls for student ability by using university entrance grades (matriculation and psychometric scores). The second controls for student ability by estimating specific effects for each student using panel data estimation. The former measures high school ability while the latter measures ability at university. It turns out that, using data for the 2003 cohort of BA students studying social science at the Hebrew University of Jerusalem, the correlation between the two measures of ability is only 0.41. Indeed, there is substantial upward mobility between high-school and university ability. Although the strongest high-school students tend to be the strongest university students, there are many weaker high-school students who do well and even excel at university.

Despite the differences between the two measures of ability, both methods suggest that departments grade to different standards. Indeed, the differences are significant and can be as large as 15 points (out of 100). In the Faculty of Social Sciences grades are lowest in economics and highest in communications and geography.
Another interpretation of these results is that they reflect differential teaching quality. However, attempts to control for teaching quality suggest that this interpretation is unreasonable. Instructors affiliated to more than one department grade more leniently in the department where grades are higher and more strictly in the department where grades are lower. These instructors adjust their grading to the norms set by their departments. Therefore, we claim that departmental grade differentials are not caused by the quality of instructors or the quality of students. They are caused by arbitrary standards of leniency and strictness.

A simple methodology is also proposed for estimating grade inflation under the assumption that student quality does not vary over time. The methodology takes account of grade-chasing by students, i.e. students studying courses where grades are higher. Surprisingly, there is no evidence of grade inflation during the last decade. This is surprising because the available evidence indicates that grade inflation is a problem in a number of countries. On the other hand, the result that economics seems to be the strictest of the social sciences in terms of grading is consistent with what seems to be happening in other countries.

References

Achen A.C. and P.N. Courant (2009) What are grades made of? Journal of Economic Perspectives, 23: 77-92.

Baltagi B.H. (2005) Econometric Analysis of Panel Data, 3rd edition. Chichester, Wiley.

Bar T., V. Kadiyali and A. Zussman (2009) Grade information and grade inflation: the Cornell experiment. Journal of Economic Perspectives, 23: 93-108.

Bar T. and A. Zussman (2011) Partisan grading. American Economic Journal: Applied Economics (forthcoming).

Brogden H.E. and E.K. Taylor (1950) The theory and classification of criterion bias. Educational and Psychological Measurement, 10: 159-186.

Davidson R. and J.G. MacKinnon (2009) Econometric Models and Methods. Oxford, Oxford University Press.

Elliot R. and A.C. Strenta (1988) Effects of improving the reliability of the GPA on prediction generally and on comparative predictions for gender and race particularly. Journal of Educational Measurement, 25: 333-347.

Goldman R.D. and M.H. Widawski (1976) A within-subjects technique for comparing college grading standards: implications in the validity of evaluations of college achievement. Educational and Psychological Measurement, 36: 381-390.

Johnson V.E. (2003) Grade Inflation: A Crisis in College Education. New York, Springer.

Krautman A.C. and W. Sander (1999) Grades and student evaluations of teachers. Economics of Education Review, 18: 59-63.

Linn R.L. (1966) Grade adjustments for prediction of academic performance. Journal of Educational Measurement, 3: 313-329.

Sabot R. and J. Wakeman-Linn (1991) Grade inflation and course choice. Journal of Economic Perspectives, 5: 159-171.

Strenta A.C. and R. Elliot (1987) Differential grading standards revisited. Journal of Educational Measurement, 24: 282-291.

Stricker L.J., D.A. Rock and N.W. Burton (1993) Sex differences in prediction of college grades from scholastic aptitude test scores. Journal of Educational Psychology, 85: 710-718.

Stricker L.J., D.A. Rock, N.W. Burton, E. Mutaki and T.J. Jirele (1994) Adjusting college grade point average criteria for variations in grading standards: a comparison of methods. Journal of Applied Psychology, 79: 178-183.

Young J.W. (1993) Grade adjustment methods. Review of Educational Research, 63: 151-165.