5. IDENTIFYING EXEMPLARY TEACHERS AND TEACHING:
EVIDENCE FROM STUDENT RATINGS1
Kenneth A. Feldman
SUNY-Stony Brook
Kenneth.Feldman.1@sunysb.edu
Key Words: College teaching; dimensions of teaching; exemplary teaching; student
ratings of instruction; reliability and validity; teaching myths vs. research evidence;
faculty evaluation; faculty development
Formal or systematic evaluation by college students of their
teachers has long been used to help students in their selection of
courses, to provide feedback to faculty about their teaching, and to
supply information for administrators and personnel committees in
their deliberations on the promotion and tenure of individual faculty
members. Moreover, with the increasing emphasis that many colleges
and universities are currently putting on good teaching and on designating, honoring, and rewarding good teachers, the use of student
ratings is, if anything, likely to increase. Yet, for all their use, student
ratings of instructors and instruction are hardly universally accepted.
It is no secret, for example, that some college teachers have little
regard for them. For these faculty, student evaluations of teachers (or
courses)—whether sponsored by the university administration, faculty-development institutes, individual academic departments, or student-run organizations—are not reliable, valid, or useful, and may even be
1 This paper is based on an earlier one (Feldman, 1994) commissioned by the National Center on Postsecondary Teaching, Learning, and Assessment for presentation at the Second AAHE
Conference on Faculty Roles & Rewards held in New Orleans (January 28–30, 1994). The earlier
paper benefited from the thoughtful suggestions of Robert Menges and Maryellen Weimer. As for
the present paper, I am grateful to Herbert Marsh, Harry Murray, and Raymond Perry for their
helpful comments. A brief version of this paper is to appear in an issue of New Directions for
Teaching and Learning, edited by Marilla Svinicki and Robert Menges (Feldman, forthcoming).
harmful. Others, of course, believe more or less the opposite; and still
others fall somewhere in between these two poles of opinion.
If the credibility of teacher evaluations is to be based on more than
mere opinion, one asks what the research on their use shows. This
question turns out to be more difficult to answer than might be thought
because, even apart from the substance of the pertinent research, the
number of relevant studies is voluminous. A few years ago, in a letter
to the editor in The Chronicle of Higher Education (Sept. 5, 1990),
William Cashin pointed out that 1,300 citations could be found in
the Educational Resources Information Center on “student evaluation
of teacher performance” at the postsecondary level. This same year,
my own collection of books and articles on instructional evaluation
numbered about 2,000 items (Feldman, 1990b). This collection has
grown still larger since then, of course. It is true that, at a guess, well
over one-half of the items in this collection are opinion pieces (filled
with insightful observations at best and uninformed polemics at worst).
Even so, this still leaves a large number of research pieces.
Luckily, this research—either as a whole or subportions of it—
has been reviewed relatively often (see, among others, Aubrect, 1981;
Braskamp, Brandenburg and Ory, 1984; Braskamp and Ory, 1994; Centra,
1979, 1989, 1993; Costin, Greenough and Menges, 1971; Doyle, 1975,
1983; Kulik and McKeachie, 1975; Marsh, 1984, 1987; Marsh and
Dunkin, 1992; McKeachie, 1979; Miller, 1972, 1974; and Murray, 1980).
Cashin (1988, 1995) has even supplied particularly useful reviews of the
major reviews. My own series of reviews started in the mid-1970s and
has continued to the present. (See Feldman, 1976a, 1976b, 1977, 1978,
1979, 1983, 1984, 1986, 1987, 1989a, 1989b, 1990a, 1993; two other
analyses—Feldman, 1988, 1992—are indirectly relevant.)
One of the best overviews in the area is that by Marsh (1987),
which is an update and elaboration of an earlier review of his (Marsh,
1984). In this review, after 100 pages or so of careful, critical, and
reflective analysis of the existing research and major reviews of student
ratings of instruction, Marsh (1987) sums up his findings and observations, as follows:
Research described in this article demonstrates that student ratings
are clearly multidimensional, quite reliable, reasonably valid,
relatively uncontaminated by many variables often seen as sources
of potential bias, and are seen to be useful by students, faculty,
and administrators. However, the same findings also demonstrate
that student ratings may have some halo effect, have at least
some unreliability, have only modest agreement with some criteria
of effective teaching, are probably affected by some potential
sources of bias and are viewed with some skepticism by faculty
as a basis for personnel decisions. It should be noted that this
level of uncertainty probably also exists in every area of applied
psychology and for all personnel evaluation systems. Nevertheless,
the reported results clearly demonstrate that a considerable amount
of useful information can be obtained from student ratings; useful
for feedback to faculty, useful for personnel decision, useful to
students in the selection of courses, and useful for the study of
teaching. Probably, students’ evaluations of teaching effectiveness
are the most thoroughly studied of all forms of personnel evaluation, and one of the best in terms of being supported by empirical
research (p. 369).
Marsh’s tempered conclusions set the stage for the present
comments. This discussion first explores various interpretations that
can be made of information gathered from students about their teachers
(which includes a consideration of the possible half-truths and myths
that continue to circulate about teacher and course evaluations). It
then analyzes the differential importance of the individual items that
constitute the rating forms used to evaluate teachers. The primary aim
of this discussion is to see how student evaluations can be used to help
identify exemplary teachers and instruction.
TRUTHS, HALF-TRUTHS, AND MYTHS: INTERPRETING
STUDENT RATINGS
The unease felt by some faculty, and perhaps by some administrators and
students as well, in using teacher and course evaluations to help identify
exemplary teachers and instruction may in part be due to the half-truths
if not outright myths that have cropped up about these evaluations. Some
of the myths can be laid to rest; and the half-truths can be more fully
analyzed to separate the real from the imagined. To do so requires a
consideration of certain factors or influences that have been said to “bias”
ratings. At the moment there is no clear consensus on the definition of
bias in the area of student ratings (see Marsh, 1984, 1987; and Marsh
and Dunkin, 1992). I take bias to mean something other than (or more
than) the fact that student ratings may be influenced by conditions not
under the teacher’s control or that conditions may somehow be “unfair”
to the instructor (making it harder for him or her to teach well and thus
to get high ratings compared to teachers in “easier” situations). Rather,
bias here refers to one or more factors directly and somehow inappropriately influencing students’ judgments about and evaluation of teachers
or courses. In essence, the question is whether a condition or influence
actually affects teachers and their instruction, which is then accurately
reflected in students’ evaluations (a case of nonbias), or whether in some
way this condition or influence only affects students’ attitudes toward the
course and students’ perceptions of instructors (and their teaching) such
that evaluations do not accurately reflect the instruction that students
receive (a case of bias). (For a more extensive discussion of the meaning
of bias as it pertains to student ratings, see Feldman, 1984, 1993;
Marsh, 1987, and Marsh and Dunkin, 1992.) Implications and examples
of this conceptualization of bias will be given as the discussion proceeds.
Myths
Aleamoni (1987) has listed a number of speculations, propositions, and
generalizations about students’ ratings of instructors and instruction
that he declares “are (on the whole) myths.” Although I would
not go so far as to call each of the generalizations on his list a
myth, some of them indeed are—at least as far as current research
shows—as follows: students cannot make consistent judgments about
the instructor and instruction because of their immaturity, lack of
experience, and capriciousness (untrue); only colleagues with excellent
publication records and expertise are qualified to teach and to evaluate
their peers’ instruction—good instruction and good research being so
closely allied that it is unnecessary to evaluate them separately (untrue);
most student rating schemes are nothing more than a popularity
contest, with the warm, friendly, humorous instructor emerging as the
winner every time (untrue); students are not able to make accurate
judgments until they have been away from the course, and possibly
away from the university for several years (untrue); student ratings are
both unreliable and invalid (untrue); the time and day the course is
offered affect student ratings (untrue); student ratings cannot meaningfully
be used to improve instruction (untrue). I call these statements untrue
because supporting evidence was not found for them in one or another
of the following research reviews: Abrami, Leventhal, and Perry (1982);
Cohen (1980b); Feldman (1977, 1978, 1987, 1989a, 1989b); LevinsonRose and Menges (1981); L’Hommedieu, Menges and Brinko (1988,
1990); Marsh (1984, 1987); and Marsh and Dunkin (1992).
For the most part, Aleamoni (1987) also seems correct in calling
the following statement a myth: “Gender of the student and the
instructor affects student ratings.” Consistent evidence cannot be found
that either male or female college students routinely give higher ratings
to teachers (Feldman, 1977). As for the gender of the teacher, a recent
review (Feldman, 1993) of three dozen or so studies showed that a
majority of these studies found male and female college teachers not
to differ in the global ratings they receive from their students. In
those studies in which statistically significant differences were found,
more of them favored women than men. However, across all studies,
the average association between gender and overall evaluation of the
teacher, while favoring women, is so small (average r = +.02) as to
be insignificant in practical terms. This would seem to show that the
gender of the teacher does not bias students’ ratings (unless, of course,
it can be shown by other indicators of teachers’ effectiveness that the
ratings of one gender “should” be higher than the other to indicate the
reality of this group’s better teaching).
This said, it should also be noted that there is some indication of
an interaction effect between the gender of the student and the gender
of the teacher: across studies, there is some evidence to suggest that
students may rate same-gendered teachers a little more highly than they
do opposite-gendered teachers. What is unknown from the existing
studies, however, is what part of this tendency is due to male and female
students taking different classes (and thus having different teachers)
and what part is due to differences in preferences of male and female
students within classes (thus possibly indicating a bias in their ratings).
Half-truths and the Question of Bias in Ratings
Aleamoni (1987) also presents the following statements as candidates for the status of myth: the size of the class affects student
ratings; the level of the course affects student ratings; the rank of the
instructor affects student ratings; whether students take the course as
a requirement or as an elective affects their ratings; whether students
are majors or nonmajors affects their ratings. That these are myths is
not clear-cut. Each of these course, instructor or student factors is, in
fact, related to student evaluation. The real question is: “Why?”
Although the results of pertinent studies are somewhat mixed, some
weak trends can be discerned: slightly higher ratings are given (a) to
teachers of smaller rather than larger courses (Feldman, 1984; Marsh,
1987); (b) to teachers of upper-level rather than lower-level courses
(Feldman, 1978); (c) to teachers of higher rather than lower academic
ranks (Feldman, 1983; Marsh, 1987); (d) by students taking a course
as an elective rather than as a requirement (Feldman, 1978; Marsh,
1987); and (e) by students taking a course that is in their major rather
than one that is not (Feldman, 1978; Marsh, 1987). These associations do not prove causation, of course; each of these factors may not
actually and directly “affect” ratings, but may simply be associated with
the ratings due to their association with other factors affecting ratings.
Even if it can be shown that one or more of these factors actually
and directly “affect” students’ ratings, the ratings are not necessarily
biased by these factors, as is often inferred when such associations
are found (probably an important underlying worry of those prone to
discount teacher or course evaluations). To give an example, at certain
colleges and universities teachers of higher rank may in fact typically
be somewhat better teachers, and thus “deserve” the slightly higher
ratings they receive. To give another example, teachers in large classes
may receive slightly lower ratings because they indeed are somewhat
less effective in larger classes than they are in smaller classes, not
because students take out their dislike of large classes by rating them a
little lower than they otherwise would. So, while it may be somewhat
“unfair” to compare teachers in classes of widely different sizes, the
unfairness lies in the difference in teaching conditions, not in a rating
bias as defined here.1
To put the matter in general terms, certain course characteristics
and situational contexts—conditions that may not necessarily be under
full control of the teachers—may indeed affect teaching effectiveness;
and student ratings may then accurately reflect differences in teaching
effectiveness. Although rating bias may not necessarily be involved,
those interested in using teaching evaluations to help in decisions about
promotions and teaching awards may well want to take into account the
fact that it may be somewhat harder to be effective in some courses than
in others. Along these lines, note that student ratings gathered from
the Instructional Development and Effectiveness Assessment (IDEA)
system are reported separately for four categories of class size—small
(1–14 students), medium (15–34), large (35–99) and very large (100
or more)—as well as for five levels of student motivation for the class
as a whole (determined by the average of the students’ responses to
the background question, “I have a strong desire to take this course”).
The reason for this procedure is made clear to users of the evaluation
instrument, as follows:
1 Using a different definition of bias, Cashin (1988) would consider the size of class a source
of bias if its correlation with student ratings of teachers were sufficiently large (but see Cashin,
1995).
In addition to using flexible criteria, the IDEA system also controls
for level of student motivation or the students’ desire to take the
course and the size of the class—two variables which the research
has shown are correlated with student rating. The IDEA system
assumes that it is harder to teach large groups of students who
do not want to take a course than it is to teach small groups of
students who do want to take a course. IDEA controls for this by
comparing an instructor’s ratings, not only with “All” courses in
the comparative data pool, but with “Similar” courses [same level
of student motivation and same class size] as well (Cashin and
Sixbury, 1993, pp. 1–2, emphasis in original).
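As a rough illustration of the grouping just described (and not the IDEA system's actual implementation), a class could be assigned to a comparison cell as in the sketch below; the size boundaries follow the categories quoted above, while the motivation cut-points are invented for the example.

```python
def idea_size_category(enrollment):
    """Class-size categories as described for the IDEA system's reports."""
    if enrollment <= 14:
        return "small"
    if enrollment <= 34:
        return "medium"
    if enrollment <= 99:
        return "large"
    return "very large"

def motivation_level(class_mean_desire, cutpoints=(1.5, 2.5, 3.5, 4.5)):
    """Five-level grouping of the class mean response to 'I have a strong
    desire to take this course.' The cut-points here are invented; only the
    fact that five levels are used comes from the text."""
    return 1 + sum(class_mean_desire > c for c in cutpoints)

# An instructor's ratings would then be compared not only with "all" courses
# but with "similar" courses, i.e., those falling in the same cell.
print(idea_size_category(42), motivation_level(3.8))  # -> large 4
```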
Another candidate for the status of myth concerns students’
grades. As Aleamoni (1987) words it, “the grades or marks students
receive in the course are highly correlated with their ratings of the
course and instructor.” On the one hand, the word “highly” indeed
makes the statement mythical; grades are not highly correlated with
students’ ratings. On the other hand, almost all of the available research
does show a small or even modest positive association between grades
and evaluation (usually a correlation somewhere between +.10 and +.30), whether the unit of analysis is the individual student or the
class itself (see Feldman, 1976a, 1977; Stumpf and Freedman, 1979).
Research has shown that some part of the positive correlation
between students’ grades (usually expected grades) and students’ evaluation of teachers is due to “legitimate” reasons and therefore is
unbiased: students who learn more earn higher grades and thus legitimately give higher evaluations. This has been called the “validity
hypothesis” or “validity effect” (see Marsh, 1987, and Marsh and
Dunkin, 1992). Moreover, some part of the association may be
spurious, attributable to some third factor—for example, students’
interest in the subject matter of the course—which has been referred to
as the “student characteristics hypothesis” or “student characteristics
effect” (see Marsh, 1989, and Marsh and Dunkin, 1992). Yet another
part of the positive correlation may indeed be due to a rater bias in the
ratings, although the bias might not be large. Researchers currently are
trying to determine the degree to which an attributional bias (students’
tendency to take credit for successes and avoid blame for failure) and a
retributional bias (students “rewarding” teachers who give them higher
grades by giving them higher evaluations, and “punishing” teachers
who give them lower grades by giving them lower evaluations) are at
work (see Gigliotti and Buchtel, 1990; Theall, Franklin, and Ludlow,
1990a, 1990b). The second of these two biases has been called a
“grading leniency hypothesis” or “grading leniency effect” (Marsh,
1987; Marsh and Dunkin, 1992). In their review of research relating
grades and teacher evaluations, Marsh and Dunkin (1992) conclude as
follows:
Evidence from a variety of different types of research clearly
supports the validity hypothesis and the student characteristics
hypothesis, but does not rule out the possibility that a grading
leniency effect operates simultaneously. Support for the grading
leniency effect was found with some experimental studies, but
these effects were typically weak and inconsistent, may not generalize to nonexperimental settings where SETs [students’ evaluations of teaching effectiveness] are actually used, and in some
instances may be due to the violation of grade expectations that
students had falsely been led to expect or that were applied to other
students in the same course. Consequently, while it is possible that
a grading leniency effect may produce some bias in SETs, support
for this suggestion is weak and the size of such an effect is likely
to be insubstantial in the actual use of SETs (p. 202).
Yet another correlate of—and, therefore, a possible influence on—
teacher evaluations is not mentioned by Aleamoni (1987): academic
discipline of the course. Reviewing eleven studies available at the time
(Feldman, 1978), I found that teachers in different academic fields
tend to be rated somewhat differently. Teachers in English, humanities,
arts, and language courses tend to receive somewhat higher student
ratings than those in social science courses (especially political science, sociology, psychology, and economics courses); this latter group of
teachers in turn receive somewhat higher ratings than teachers in the
sciences (excepting certain subareas of biological sciences), mathematics and engineering courses. Recently, based on data from tens of
thousands of classes either from the IDEA system only (Cashin and
Clegg, 1987; Cashin and Sixbury, 1993) or from this system and the
Student Instructional Report (SIR) of the Educational Testing Service
combined (Cashin, 1990), differences among major fields similar to
those in my review have been reported.
Cashin and his associates have suggested several possible causes
that could be operating to produce these differences in ratings of
teachers in different academic disciplines, including the following:
some courses are harder to teach than others; some fields have
better teachers than others; and students in different major fields rate
differently because of possible differences in their attitudes, academic
skills, goals, motivation, learning styles, and perceptions of the
constituents of good teaching. The following practical advice given by
Cashin and Sixbury (1993) is informative:
There is increasing evidence that different academic fields are
rated differently. What is not clear is why. Each institution should
examine its own data to determine to what extent the differences
found in the general research hold true at that particular institution. If an institution concludes that the differences found at
that institution are due to something other than the teaching effectiveness of the instructors, e.g., because low rated courses are more
difficult to teach, or reflect a stricter rating response set on the
part of the students taking those courses, then some control for
those differences should be instituted. Using the comparative data
in this technical report is one possibility. If however, it is decided
that the differences in ratings primarily reflect differences in teaching
effectiveness, that is, that the low rated courses are so rated because
they are not as well taught, then of course no adjustments should
be made (pp. 2–3, emphases in original).
IDENTIFYING INSTRUCTIONAL DIMENSIONS IMPORTANT
TO EFFECTIVE TEACHING
Thus far, I have explored how student ratings can be used to identify
those persons who are seen by students as exemplary teachers (as well
as those who are not), noting certain precautions in doing so. Now,
I turn to the related topic of how exemplary teaching itself can be
identified through the use of student ratings of specific pedagogical
dispositions, behaviors and practices of teachers.2 Teaching comprises
many different elements—a multidimensionality that instruments of
teacher evaluation usually attempt to capture. The construction of
most of these instruments, as Marsh and Dunkin (1992) point out, is
based on “a logical analysis of the content of effective teaching and the
purposes the ratings are intended to serve, supplemented by reviews
of previous research and feedback” (p. 146). Less often used is an
2 As with overall evaluation of teachers, the characteristics of courses, of teachers themselves,
and of situational contexts have all been found to correlate with specific evaluations. Those
characteristics most frequently studied have been class size, teacher rank/experience and the
gender of the teacher. Class size and the rank/experience of the teacher each correlate more
highly with some specific evaluations than with others (for details, see Feldman, 1983, 1984).
(The degree to which these factors actually affect teaching rather than “biasing” students in their
ratings has yet to be determined.) With the possible exception of their sensitivity to and concern
with class level and progress, male and female teachers do not consistently differ in the specific
evaluations they receive across studies (Feldman, 1993).
empirical approach that emphasizes statistical techniques such as factor
analysis or multitrait-multimethod analysis.
Marsh and Dunkin (1992) also note that “for feedback to teachers,
for use in student course selection, and for use in research in
teaching there appears to be general agreement that a profile of
distinct components of SETs [students’ evaluations of teaching effectiveness] based on an appropriately constructed multidimensional
instrument is more useful than a single summary score” (p. 146).
However, whether a multidimensional profile score is more useful than
a single summary score for personnel decisions has turned out to be
more controversial (see Abrami, 1985, 1988, 1989a, 1989b; Abrami
and d’Apollonia, 1991; Abrami, d’Apollonia, and Rosenfield, 1993,
1996; Cashin and Downey, 1992; Cashin, Downey, and Sixbury, 1994;
Hativa and Raviv, 1993; and Marsh, 1987, 1991a, 1991b, 1994).
In earlier reviews (Feldman, 1976b, 1983, 1984, 1987, 1989a),
I used a set of roughly 20 instructional dimensions into which the
teaching components of relevant studies could be categorized. In recent
years, I extended this set in one way or another to include more
dimensions (see Feldman, 1988, 1989b, 1993). The fullest set—28
dimensions—is given in the Appendix, along with specific examples of
evaluation items that would be categorized in each dimension. Unlike
studies using factor analyses or similar techniques to arrive at instructional dimensions, the categories are based on a logical analysis of the
single items and multiple-item scales found in the research literature on
students’ views of effective teaching and on their evaluations of actual
teachers. Over the years, I have found the system of categorization to
be useful in classifying the characteristics of instruction analyzed in
various empirical studies even though it may differ from the definitions
and categories found in any one of these studies.3
Teaching That is Associated with Student Learning
Although all 28 dimensions of instruction found in the Appendix
would seem to be important to effective teaching, one would assume
that some of them are more important than others. One way of establishing this differential importance is to see how various teaching
3 Abrami and d'Apollonia (1990) adapted these categories for use in their own work (also see
d’Apollonia and Abrami, 1988). More recently, they have made more extensive refinements
and modifications to the dimensions and concomitant coding scheme (Abrami, d’Apollonia, and
Rosenfield, 1993, 1996).
dimensions relate to student learning, which Cohen (1980a, 1981,
1987) did in his well-known meta-analytic study of the relationships
of student achievement with eight different instructional dimensions.4
Based in large part on work by d’Apollonia and Abrami (1987, 1988)
and Abrami, Cohen, and d'Apollonia (1988), I extended Cohen's meta-analysis a few years ago by using less heterogeneous categories for
coding the evaluation items and scales in the studies under review,
widening the range of instructional dimensions under consideration,
and preserving more of the information in the studies Cohen used in his
meta-analysis (see Feldman, 1989b, 1990a). To be included in Cohen’s
meta-analysis or my own, a study had to provide data from actual
college classes rather than from experimental analogues of teaching.
The unit of analysis in the study had to be the class or instructor and
not the individual student. Its data had to be based on a multisection
course with a common achievement measure used for all sections of
the course (usually an end-of the course examination as it turned out).
Finally, the study had to provide data from which a rating/achievement
correlation could be calculated (if one was not given).
The correlations between specific evaluations and student
achievement from the studies under review were distributed among
28 instructional dimensions (given in the present Appendix), with
weighting procedures used to take into account evaluational items or
scales that were coded in more than one dimension. Average correlations were calculated for each of the instructional dimensions having
information from at least three studies. These average correlations are
given in Table 1, along with the percent of variance explained (r²).5
4 These dimensions are labeled: Skill; Rapport; Structure; Difficulty; Interaction; Feedback; Evaluation; and Interest/Motivation.
5 The results given in Table 1 are similar to those shown in an analysis in d'Apollonia and Abrami (1988), although there are some differences (see Abrami, d'Apollonia, and Rosenfield, 1996).

Table 1: Average Correlations of Specific Evaluations of Teachers with Student Achievement

(Columns: Percent Variance Explained; Instructional Dimension; Average r)

30.0%-34.9%
  No. 5   Teacher's Preparation; Organization of the Course   .57
  No. 6   Clarity and Understandableness   .56
25.0%-29.9%   (none)
20.0%-24.9%
  No. 28  Teacher Pursued and/or Met Course Objectives   .49
  No. 12  Perceived Outcome or Impact of Instruction   .46
15.0%-19.9%   (none)
10.0%-14.9%
  No. 1   Teacher's Stimulation of Interest in the Course and Its Subject Matter   .38
  No. 20  Teacher Motivates Students to Do Their Best; High Standard of Performance Required   .38
  No. 16  Teacher's Encouragement of Questions, and Openness to Opinions of Others   .36
  No. 19  Teacher's Availability and Helpfulness   .36
  No. 7   Teacher's Elocutionary Skills   .35
  No. 9   Clarity of Course Objectives and Requirements   .35
5.0%-9.9%
  No. 3   Teacher's Knowledge of the Subject   .34
  No. 8   Teacher's Sensitivity to, and Concern with, Class Level and Progress   .30
  No. 2   Teacher's Enthusiasm (for Subject or for Teaching)   .27
  No. 13  Teacher's Fairness; Impartiality of Evaluation of Students; Quality of Examinations   .26
  No. 25  Classroom Management   .26
  No. 17  Intellectual Challenge and Encouragement of Independent Thought (by the Teacher and the Course)   .25
  No. 14  Personality Characteristics ("Personality") of the Teacher   .24
  No. 18  Teacher's Concern and Respect for Students; Friendliness of the Teacher   .23
  No. 15  Nature, Quality, and Frequency of Feedback from the Teacher to the Students   .23
  No. 26  Pleasantness of Classroom Atmosphere   .23
0.0%-4.9%
  No. 10  Nature and Value of the Course (Including Its Usefulness and Relevance)   .17
  No. 23  Difficulty of the Course (and Workload)—Description   .09
  No. 24  Difficulty of the Course (and Workload)—Evaluation   .07
  No. 11  Nature and Usefulness of Supplementary Materials and Teaching Aids   −.11

Note: This table has been constructed from data given in Table 1 in Feldman (1989b), which itself was based on information in the following studies: Benton and Scott (1976); Bolton and Marr (1979); Braskamp, Caulley, and Costin (1979); Bryson (1974); Centra (1977); Chase and Keene (1979); Cohen and Berger (1970); Costin (1978); Doyle and Crichton (1978); Doyle and Whitely (1974); Elliott (1950); Ellis and Rickard (1977); Endo and Della-Piana (1976); Frey (1973); Frey (1976); Frey, Leonard, and Beatty (1975); Greenwood, Hazelton, Smith, and Ware (1976); Grush and Costin (1975); Hoffman (1978); Marsh, Fleiner, and Thomas (1975); Marsh and Overall (1980); McKeachie, Lin and Mann (1971); Mintzes (1976–77); Morgan and Vasché (1978); Morsh, Burgess, and Smith (1956); Murray (1983); Orpen (1980); Rankin, Greenmun, and Tracy (1965); Remmers, Martin, and Elliott (1949); Rubinstein and Mitchell (1970); Solomon, Rosenberg, and Bezdek (1964); and Turner and Thompson (1974). Each r given in (or derived from information in) individual studies was converted to a Fisher's Z transformation (z_r) and weighted by the inverse of the number of instructional dimensions in which it was coded. For each instructional dimension, the weighted z_r's were averaged and then back-transformed to produce the weighted average r's given in this table. These r's are shown only for those instructional dimensions having information from at least three studies; thus there are no entries for Dimensions 4, 21, 22 and 27. All correlations in this table are statistically significant except those for Dimensions 11, 23, and 24.

Note that average r's for the instructional dimensions range from +.57 to −.11. All but one (Dimension No. 11) are positive, and all but three (Dimensions No. 11, No. 23, No. 24) are statistically significant. The two highest correlations of .57 and .56—explained variance of over 30%—are for Dimensions No. 5 (teacher's preparation and course organization) and No. 6 (teacher's clarity and understandableness). The teacher's pursuit and/or meeting of course objectives and the student-perceived outcome or impact of the course (Dimensions No. 28 and No. 12) are the dimensions next most highly related to achievement (r = +.49 and +.46). Somewhat more moderately-sized correlations—indicating between roughly 10% and 15% of explained variance—were found for several instructional dimensions: teacher's stimulation of students' interest in the course and its subject (Instructional Dimension No. 1, average r = +.38); teacher's motivation of students to do their best (No. 20, +.38); teacher's encouragement of questions and discussion, and openness to the opinions of others (No. 16, +.36); teacher's availability and helpfulness (No. 19, +.36); teacher's elocutionary skills (No. 7, +.35); clarity of course objectives and requirements (No. 9, +.35); and teacher's knowledge of subject (No. 3, +.34).
Less strongly associated with student achievement are: the
teacher’s sensitivity to, and concern with, class level and progress
(No. 8); teacher’s enthusiasm (No. 2); teacher’s fairness and impartiality
of evaluation (No. 13); classroom management (No. 25); intellectual
challenge and encouragement of students’ independent thought (No.
17); teacher’s “personality” (No. 14); teacher’s friendliness and respect
or concern for students (No. 18); the quality and frequency of teacher’s
feedback to students (No. 15); the pleasantness of the classroom
atmosphere (No. 26); and the nature and value of the course material
(No. 10). The nature and usefulness of supplementary materials and
teaching aids as well as the difficulty and workload of the course (either
as a description or as an evaluation by students) are not related to
student achievement. Because of insufficient data in the set of studies
under consideration, the relationship of the following dimensions to
student achievement is not clear from these studies: No. 4 (teacher’s
intellectual expansiveness); No. 21 (teacher's encouragement of self-initiated learning); No. 22 (teacher's productivity in research); and No.
27 (individualization of teaching).
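The note to Table 1 states the averaging procedure only verbally (convert each correlation to Fisher's Z, weight it by the inverse of the number of dimensions in which its item was coded, average within each dimension, and back-transform). A minimal computational sketch of one reading of that procedure, with invented item correlations and dimension codings used purely for illustration, might look as follows.

```python
import math
from collections import defaultdict

# Hypothetical study-level correlations between a rating item and achievement,
# each tagged with the instructional dimension(s) the item was coded into.
# (Invented numbers; the real data are summarized in Feldman, 1989b.)
study_items = [
    {"r": 0.55, "dims": [5]},      # e.g., a "well organized" item -> Dimension 5
    {"r": 0.48, "dims": [5, 6]},   # an item coded into two dimensions
    {"r": 0.60, "dims": [6]},
    {"r": 0.30, "dims": [2]},
]

def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform a Fisher Z value to a correlation."""
    return math.tanh(z)

# Accumulate weighted z's per dimension; the weight is the inverse of the
# number of dimensions in which the item was coded.
z_sums, w_sums = defaultdict(float), defaultdict(float)
for item in study_items:
    weight = 1.0 / len(item["dims"])
    for dim in item["dims"]:
        z_sums[dim] += weight * fisher_z(item["r"])
        w_sums[dim] += weight

# Weighted mean z per dimension, back-transformed to an average r.
average_r = {dim: inverse_fisher_z(z_sums[dim] / w_sums[dim]) for dim in z_sums}
for dim, r in sorted(average_r.items()):
    print(f"Dimension {dim}: average r = {r:.2f}, variance explained = {r**2:.1%}")
```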
Do Certain Kinds of Teaching Actually Produce
Student Achievement?
It is important to recognize that the associations between specific
evaluations of teachers and student achievement by themselves do not
establish the causal connections between the instructional characteristics under investigation and student achievement. For example, it is
possible that the correlations that have been found in some proportion
of the studies (whose results were used to create Table 1) do not
necessarily indicate that the instructional characteristics were causal
in producing the students’ achievement. Rather, as Leventhal (1975)
was one of the first to point out, some third variable such as student
motivation, ability or aptitude of the class might independently affect
both teacher performance and student learning, which would account
for the correlations between instructional characteristics and student
achievement even if there were no direct causal connection.
Leventhal (1975) has suggested that causality can be more clearly
established in studies in which students are randomly assigned to
sections of a multisection course rather than self-selected into them,
for the “random assignment of students promotes equivalence of the
groups of students by disrupting the causal processes which ordinarily
control student assignment” (p. 272). It is not always possible,
however, to assign students randomly to class sections. In some of the
studies reviewed by Cohen (and by Feldman, 1989b), students were
randomly assigned to class sections, whereas in other studies they were
not. Interestingly, in his meta-analysis, Cohen (1980a) found that, for
each of the four instructional dimensions that he checked, studies in
which students were randomly assigned to sections gave about the same
results as did studies where students picked their own class sections.
Cohen (1980a) also compared studies where the ability of students in
class sections was statistically controlled with studies where it was not.
Again, for each of the four instructional dimensions that he checked,
the correlations for the two sets of studies did not differ. Results such
as these increase the likelihood that the instructional characteristics
and student achievement are causally connected, although the possibility of spurious elements has not been altogether ruled out. Even
with random assignment, the results of multisection validation studies
may still permit certain elements of ambiguity in interpretation and
generalization (Marsh, 1987; and Marsh and Dunkin, 1992; but see
Abrami, d’Apollonia, and Rosenfield, 1993, 1996).
The results of experimental studies—whether field experiments
or laboratory experiments—are obviously useful here, for they can help
clarify cause-effect relationships in ways that the correlational studies
just reviewed cannot. Relevant research has been reviewed (selectively)
by Murray (1991), who notes in his analysis of pertinent studies that
either teacher’s enthusiasm/expressiveness or teacher clarity (or both)
has been a concern in nearly all relevant experimental research, and that
these studies usually include measures of amount learned by students.
In his overview of this research, Murray (1991) reports that “classroom
teaching behaviors, at least in the enthusiasm and clarity domains,
appear to be causal antecedents (rather than mere correlates) of various
instructional outcome measures” (p. 161, emphasis added).
Although Murray’s (1991) definitions of these domains are not
completely identical with the definitions of pertinent dimensions of
the present analysis, it is still of interest to compare his conclusions
and the findings given here. Thus, in the present discussion, teacher
clarity has also been shown to be of high importance to teaching,
whether indicated by the correlation of teacher clarity with student
achievement in the multisection correlational studies or, as will be seen
in a later section of this paper, by the association of teacher clarity with
the global evaluation of the teacher. As for the enthusiastic/expressive
attitudes and behaviors of teachers, highlighted in Murray’s (1991)
analysis, the instructional dimension of “teacher's enthusiasm (for
subject or for teaching)” referred to in the present discussion is, in
fact, associated with achievement in the multisection correlational
studies, but only moderately so compared to some of the other
instructional dimensions. However, the instructional dimension of
“teacher's elocutionary skills,” which assumedly is an aspect of enthusiasm/expressiveness, is more strongly associated with achievement in
the multisectional-correlational studies. Furthermore, note that Murray
writes that “behaviors loading on the Enthusiasm [Expressive] factor
share elements of spontaneity and stimulus variation, and thus are
perhaps best interpreted as serving to elicit and maintain student
attention to material presented in class” (p. 146). Given this interpretation, it is of relevance that the instructional dimension of “teacher’s
stimulation of interest in the course and its subject matter” has been
found to be rather highly correlated (albeit less so than the top
four dimensions) with students’ achievement in multisectional correlational studies; moreover, this particular dimension is highly associated,
as well, with global evaluation of instruction relative to the other
instructional dimensions (to be discussed in a later section of this
paper).
Underlying Mechanisms and Other Considerations
Whether the associations between student learning and teacher’s
attitudes, behaviors, and practices are established by correlational
studies or by experimental studies, the exact psychological and social
psychological mechanisms by which these instructional characteristics
influence student learning need to be more fully and systematically
detailed than they have been. When a large association between an
instructional characteristic and student achievement is found, the
tendency is to see the finding as obvious—that is, as being a self-explanatory result. For example, given the size of the correlation
involved, it would seem obvious that a teacher who is clear and understandable naturally facilitates students’ achievement; little more needs
to be said or explained, it might be thought. But, in a very real sense,
the “obviousness” or “naturalness” of the connection appears only after
the fact (of a substantial association). Were the correlation between
the dimension of “feedback” and student achievement a great deal larger
than was found, then this instructional characteristic, too, would be
seen by some as obviously facilitative of student achievement: naturally,
teachers who give frequent and good feedback effect high cognitive
achievement in their students. But, as previously noted, frequency and
quality of feedback has not been found to correlate particularly highly
with student achievement, and there is nothing natural or obvious
about either a high or low association between feedback and students’
achievement; and, in fact, to see either as natural or obvious ignores
the specific psychological and social psychological mechanisms that
may be involved in either a high or low correlation.
In short, although a case can be made that many of the different
instructional characteristics could be expected to facilitate student
learning (see, for example, Marsh and Dunkin, 1992, pp. 154–156),
what is needed are specific articulations about which particular dimensions of instruction theoretically and empirically are more likely
and which less likely to produce achievement. A crucial aspect of
this interest is specifying exactly how those dimensions that affect
achievement do so—even when, at first glance, the mechanisms
involved would seem to be obvious. Indeed, conceptually and empirically specifying such mechanisms in perhaps the most “obvious”
connection of them all in this area—that between student achievement
and the clarity and understandableness of instructors—has turned out
to be particularly complex, not at all simple or obvious (see, for
example, Land, 1979, 1981; Land and Combs, 1981, 1982; Land and
Smith, 1979, 1981; and Smith and Land, 1980). Likewise, the mechanisms underlying the correlation between teacher’s organization and
student achievement have yet to be specifically and fully determined,
although Perry (1991) has recently started the attempt by offering the
following hypothetical linkages:
Instructor organization involves teaching activities intended to
structure course material into units more readily accessible from
students’ long-term memory. An outline for the lecture provides
encoding schemata and advanced organizers which enable students
to incorporate new, incoming material into existing structures.
Presenting linkages between content topics serves to increase the
cognitive integration of the new material and to make it more
meaningful, both of which should facilitate retrieval. (p. 26)
One other consideration may be mentioned at this point.
McKeachie (1987) has recently reminded educational researchers and
practitioners that the achievement tests assessing student learning in
the sorts of studies being considered here typically measure lower-level
educational objectives such as memory of facts and definitions rather
than the higher-level outcomes such as critical thinking and problem
solving that are usually taken as important in higher education. He
points out that “today cognitive and instructional psychologists are
placing more and more emphasis upon the importance of the way in
which knowledge is structured as well as upon skills and strategies
for learning and problem solving” (p. 345). Moreover, although not
a consideration of this paper, there are still other cognitive skills and
intellectual dispositions as well as a variety of affective and behavioral
outcomes of students that may be influenced in the college classroom
(see, for example, discussions in Baxter Magolda, 1992; Bowen, 1977;
Chickering and Reisser, 1993; Doyle, 1972; Ellner and Barnes, 1983;
Feldman and Newcomb, 1969; Feldman and Paulsen, 1994; Hoyt, 1973;
King and Kitchener, 1994; Marsh, 1987; Pascarella and Terenzini, 1991;
Sanders and Wiseman, 1990; Sockloff, 1973; and Turner, 1970).
Specific Aspects of Teaching as Related to Overall
Evaluation of the Teacher
There is another way of determining the differential importance of
various instructional dimensions, one that uses information internal to
the evaluation form itself. If it is assumed that each student’s overall
evaluation of an instructor is an additive combination of the student’s
evaluation of specific aspects of the teacher and his or her instruction,
weighted by the student’s estimation of the relative importance of these
aspects to good teaching, then it would be expected that students’
overall assessment of teachers would be more highly associated with
instructional characteristics that students generally consider to be
more important to good teaching than with those they consider to
be less important (cf. Crittenden and Norr, 1973). Thus, one way to
establish the differential importance of various instructional characteristics is to compare the magnitudes of the correlations between
the actual overall evaluations by students of their teachers and their
ratings of each of the specific attitudinal and behavioral characteristics
of these teachers. Otherwise put, the importance of an instructional
dimension is indicated by its ability to discriminate among students’
global assessment of teachers.6
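Put schematically (with notation introduced here only for illustration, not a model estimated in any of the studies cited), the assumption can be written as follows.

```latex
% Each student's overall evaluation O_s is assumed to be an additive
% combination of that student's ratings R_{sk} on the specific instructional
% dimensions k, weighted by the importance w_{sk} the student attaches to each:
O_s \;=\; \sum_{k} w_{sk}\, R_{sk} \;+\; \varepsilon_s .
% To the extent that students broadly share these weights, dimensions with
% larger typical weights should correlate more strongly with O across teachers,
% which is why the size of corr(O, R_k) is taken as an index of a dimension's
% importance.
```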
In an analysis (Feldman, 1976b) done a while ago now, though
one still of full relevance here, I located some 23 studies containing
correlations (or comparable information showing the extent of the
associations) between students’ overall evaluations of their teachers
and their ratings of specific attitudinal and behavioral characteristics
of these teachers.
6 Limitations of this approach to determining the importance of instructional dimensions are
discussed in Feldman (1976b, 1988; also see Abrami, d’Apollonia and Rosenfield, 1993, 1996).
This information in each study was used to rank order the importance of these characteristics (in terms of the size of their association with
overall evaluation) and then to calculate for each study standardized
ranks (rank of each item divided by the number of items ranked) for
the specific evaluations in the study. Finally, for each of the instructional dimensions under consideration (see Feldman, 1976, Table 1
and note 5), standardized ranks were averaged across the pertinent
studies.
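To make the standardization concrete, the sketch below (with invented correlations standing in for one study's results) ranks a study's specific items by the size of their association with the overall evaluation and divides each rank by the number of items ranked; averaging such standardized ranks across studies, dimension by dimension, yields the values reported in Column 2 of Table 2.

```python
# Hypothetical correlations, within one study, between specific rating items
# and the overall evaluation of the teacher (invented values for illustration).
study_correlations = {
    "preparation/organization": 0.68,
    "clarity": 0.72,
    "elocutionary skills": 0.51,
    "feedback to students": 0.33,
}

# Rank items by the size of the association (1 = most strongly associated),
# then standardize each rank by dividing by the number of items ranked.
# (Ties are ignored here for simplicity.)
n_items = len(study_correlations)
ordered = sorted(study_correlations.items(), key=lambda kv: kv[1], reverse=True)
standardized_ranks = {
    item: (rank + 1) / n_items for rank, (item, _) in enumerate(ordered)
}
print(standardized_ranks)
# The smaller the (average) standardized rank, the more important the item or
# dimension is taken to be.
```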
These average standardized ranks are given in Column 2 of
Table 2. Column 1 of this same table repeats those data previously
given in Table 1 on the associations between instructional dimensions and student achievement for just those instructional dimensions
considered in both analyses. The two analyses, each determining the
importance of instructional dimensions in its own way, have eighteen
instructional dimensions in common, although data for only seventeen
of them are given in the table. Instructional Dimension No. 4 (teacher’s
intellectual expansiveness) has been left out, as it was in Table 1,
because of insufficient data about the correlation between it and student
achievement. Table 2 also shows (in parentheses) the rank in importance of each of the instructional dimensions that is produced by each
of the two different methods of gauging importance of the dimensions.
Table 2: Comparison of Instructional Dimensions on Two Different Indicators of Importance

(Column 1: Importance Shown by Correlation with Student Achievement. Column 2: Importance Shown by Correlation with Overall Evaluations. Ranks from 1 to 17 are shown in parentheses.)

No. 5   Teacher's Preparation; Organization of the Course   .57 (1)   .41 (6)
No. 6   Clarity and Understandableness   .56 (2)   .25 (2)
No. 12  Perceived Outcome or Impact of Instruction   .46 (3)   .28 (3)
No. 1   Teacher's Stimulation of Interest in the Course and Its Subject Matter   .38 (4)   .20 (1)
No. 16  Teacher's Encouragement of Questions and Discussion, and Openness to Opinions of Others   .36 (5.5)   .60 (11)
No. 19  Teacher's Availability and Helpfulness   .36 (5.5)   .74 (16)
No. 7   Teacher's Elocutionary Skills   .35 (7.5)   .49 (10)
No. 9   Clarity of Course Objectives and Requirements   .35 (7.5)   .45 (7)
No. 3   Teacher's Knowledge of the Subject   .34 (9)   .48 (9)
No. 8   Teacher's Sensitivity to, and Concern with, Class Level and Progress   .30 (10)   .40 (5)
No. 2   Teacher's Enthusiasm (for Subject or for Teaching)   .27 (11)   .46 (8)
No. 13  Teacher's Fairness; Impartiality of Evaluation of Students; Quality of Examinations   .26 (12)   .72 (14.5)
No. 17  Intellectual Challenge and Encouragement of Independent Thought (by the Teacher and the Course)   .25 (13)   .33 (4)
No. 18  Teacher's Concern and Respect for Students; Friendliness of the Teacher   .23 (14.5)   .65 (12)
No. 15  Nature, Quality, and Frequency of Feedback from the Teacher to Students   .23 (14.5)   .87 (17)
No. 10  Nature and Value of the Course Material (Including Its Usefulness and Relevance)   .17 (16)   .70 (13)
No. 11  Nature and Usefulness of Supplementary Materials and Teaching Aids   −.11 (17)   .72 (14.5)

Note: This table is adapted from Table 3 in Feldman (1989b). The correlations shown in Column 1 are the same as those in Table 1 of the present analysis. The higher the correlation, the more important the instructional dimension. The correlations have been ranked from 1 to 17 (with the ranks shown in parentheses). The average standardized ranks given in Column 2 originally were given in Feldman (1976b, see Table 2 and footnote 5), and are based on information in the following studies: Brooks, Tarver, Kelley, Liberty, and Dickerson (1971); Centra (1975); Cobb (1956); French-Lazovik (1974, two studies); Garber (1964); Good (1971); Harry and Goldner (1972); Harvey and Barker (1970); Jioubu and Pollis (1974); Leftwich and Remmers (1962); Maas and Owen (1973); Owen (1967); Plant and Sawrey (1970); Remmers (1929); Remmers and Weisbrodt (1964); Rosenshine, Cohen, and Furst (1973); Sagen (1974); Spencer (1967); Van Horn (1968); Walker (1968); Widlak, McDaniel, and Feldhusen (1973); and Williams (1965). The lower the average standardized rank (that is, the smaller the fraction), the more important the dimension. The average standardized ranks in Column 2 have been ranked from 1 to 17 (with the ranks shown in parentheses). This table includes only those dimensions considered in both Feldman (1976b) and Feldman (1989b), and thus there are fewer dimensions in this table than there are in Table 1.

There is no overlap in the studies on which the data in Columns 1 and 2 of Table 2 are based. Furthermore, because the studies considered in the student achievement analyses (Col. 1) are mostly of students in multisection courses of an introductory nature, these students and courses are less representative of college students and courses in general than are the students and courses in the second set of studies (Col. 2). Despite these circumstances, the rank-order correlation (rho) between the ranks shown in the two columns is +.61. Those specific instructional dimensions that are the most highly associated with student achievement tend to be the same ones that best discriminate among teachers with respect to the overall evaluation they receive from students. The correlation is not a perfect one, however. The largest discrepancies are for teacher's availability and helpfulness (relatively high importance in terms of its association with achievement and relatively low importance in terms of its association with students' global evaluations) and for intellectual challenge and encouragement of students' independent thought (relatively low importance by the first indicator and relatively high importance by the second indicator). The other large "shifts" between the two indicators of importance are less dramatic: teacher's preparation and course organization (from Rank 1 to Rank 6, the latter still relatively high in importance), and teacher's encouragement of questions and openness to others' opinion (from rank 5.5 to rank 11).
If ranks 1 through 6 are thought of as indicating high importance
(relative to the other dimensions), ranks 7–12 as indicating moderate
importance, and ranks 13–17 as indicating low importance (low, that
is, relative to the other dimensions, not necessarily unimportant),
then the two methods determining the importance of instructional
dimensions show the following pattern. Both methods indicate that the
teacher’s preparation and course organization, the teacher’s clarity and
understandableness, the teacher’s stimulation of students’ interest and
the students’ perceived outcome or impact of the course are of high
importance (relative to the other dimensions). Although the teacher’s
encouragement of questions and openness to others’ opinion as well
as his or her availability and helpfulness are also of high importance
in terms of the association of each with achievement, the first is only
of moderate importance and the second of low importance in terms of
its association with global evaluation of teachers.
Both methods of determining the importance of the instructional dimensions show the following to be of moderate importance
relative to other dimensions: teacher’s elocutionary skill, clarity of
course objective and requirements, teacher’s knowledge of subject, and
teacher’s enthusiasm. The importance of the teacher’s sensitivity to class
level and progress is also moderate by the first indicator (association
with student learning) but high by the second (association with overall
evaluation of the teacher), whereas the teacher’s fairness and impartiality of evaluation is moderate by the first and low by the second.
Each of the following five dimensions is of low relative importance in
terms of its association with student achievement, although only the
first three are also relatively low in importance in terms of their association with global evaluation: nature, quality and frequency of feedback
to students; nature and value of course material; nature and usefulness
of supplementary materials and teaching aids; intellectual challenge
and encouragement of independent thought (which is of relatively high
importance in the strength of its association with the global evaluation
of teachers); and teacher's friendliness and concern/respect for students
(of moderate importance in its association with global evaluation).
Table 3 offers a summary of the results of using the two different
ways considered here of determining the importance of various instructional dimensions from student ratings of teachers. By averaging (when
possible) the rank order of the dimensions produced by the two
methods, information in Table 2 (and, in some cases, Table 1 as well)
has been used to classify roughly the instructional dimensions into
four categories of importance: high importance; moderate importance;
moderate-to-low importance; and low (or no) importance. For most of
the instructional dimensions, placement into the categories depended
on information from both indicators of importance (association with
achievement and association with global rating); in the other cases,
classification was based on information from only one indicator (association with achievement).
Table 3: Summary of the Importance of Various Instructional Dimensions Based on
Student Ratings

High Importance
(Two Sources)  No. 6   Clarity and Understandableness
(Two Sources)  No. 1   Teacher’s Stimulation of Interest in the Course and Its Subject Matter
(Two Sources)  No. 12  Perceived Outcome or Impact of Instruction
(Two Sources)  No. 5   Teacher’s Preparation; Organization of the Course
(One Source)   No. 28  Teacher Pursued and/or Met Course Objectives
(One Source)   No. 20  Teacher Motivates Students to Do Their Best; High Standard of Performance Required

Moderate Importance
(Two Sources)  No. 9   Clarity of Course Objectives and Requirements
(Two Sources)  No. 8   Teacher’s Sensitivity to, and Concern with, Class Level and Progress
(Two Sources)  No. 16  Teacher’s Encouragement of Questions and Discussion, and Openness to Opinions of Others
(Two Sources)  No. 17  Intellectual Challenge and Encouragement of Independent Thought
(Two Sources)  No. 7   Teacher’s Elocutionary Skills
(Two Sources)  No. 3   Teacher’s Knowledge of the Subject
(Two Sources)  No. 2   Teacher’s Enthusiasm for the Subject
(Two Sources)  No. 19  Teacher’s Availability and Helpfulness

Moderate-to-Low Importance
(Two Sources)  No. 13  Teacher’s Fairness; Impartiality of Evaluation of Students; Quality of Examinations
(Two Sources)  No. 18  Teacher’s Concern and Respect for Students; Friendliness of the Teacher
(One Source)   No. 25  Classroom Management
(One Source)   No. 14  Personality Characteristics (“Personality”) of the Teacher
(One Source)   No. 26  Pleasantness of Classroom Atmosphere

Low Importance or No Importance
(Two Sources)  No. 10  Nature and Value of the Course (Including Its Usefulness and Relevance)
(Two Sources)  No. 15  Nature, Quality, and Frequency of Feedback from the Teacher to the Student
(Two Sources)  No. 11  Nature and Usefulness of Supplementary Materials and Teaching Aids
(One Source)   No. 23  Difficulty of the Course (and Workload)—Description
(One Source)   No. 24  Difficulty of the Course (and Workload)—Evaluation

Note: By averaging (when possible) the rank ordering of dimensions produced by
the two different methods of determining the importance of various instructional
dimensions, information in Table 2 (and, in some cases, Table 1) has been used to
classify instructional dimensions into one of the four categories shown in this table.
As indicated in the table, for some instructional dimensions two sources of information
were available (association of the instructional dimension with achievement and with
global evaluations, as given in Table 2); for other instructional dimensions, only one
source of information was available (association of the instructional dimension with
achievement, as given in Table 1).
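As an illustrative aside not found in the original chapter, the averaging-and-bucketing procedure summarized in the note above can be sketched in a few lines of Python. The rank values, the cutoff points, and the function name below are hypothetical choices made purely for exposition; they are not the actual figures from Tables 1 and 2, nor the precise classification rules used to build Table 3.

    # Illustrative sketch only (hypothetical ranks and cutoffs, not the data behind Table 3):
    # average a dimension's rank on whichever importance indicators are available,
    # then bucket the averaged rank into rough categories of relative importance.
    def classify(rank_achievement, rank_global=None, cutoffs=(6, 12, 14)):
        ranks = [r for r in (rank_achievement, rank_global) if r is not None]
        average = sum(ranks) / len(ranks)
        high, moderate, moderate_low = cutoffs  # assumed boundaries, for illustration
        if average <= high:
            return "high importance"
        if average <= moderate:
            return "moderate importance"
        if average <= moderate_low:
            return "moderate-to-low importance"
        return "low (or no) importance"

    print(classify(3, 9))   # hypothetical ranks of 3 and 9 average to 6.0 -> "high importance"
    print(classify(15))     # only an achievement rank available -> "low (or no) importance"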
Although the present paper has concentrated on data derived from
student ratings of actual teachers, I want to note briefly another way of
determining the importance of various instructional dimensions using
different information: Those most involved with teaching and learning
can be asked directly about the importance of various components
of instruction. In one analysis (Feldman, 1988), I collected thirty-one studies in which both students and faculty (separately) specified
the instructional characteristics they considered particularly important
to good teaching and effective instruction. Students and faculty were
generally similar, though not identical, in their views, as indicated
by an average correlation of +.71 between them in their valuation of
various aspects of teaching. However, the ordering of the instructional
dimensions by either of these groups shows differences (as well as
some similarities) with that based on the two indicators of importance
using student ratings of actual teachers.
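As a further illustrative aside (not part of the original analysis), the kind of agreement summarized by such an average correlation can be shown with a short Python sketch that computes a product-moment correlation between two groups’ importance ratings of the same dimensions. The numbers and variable names below are invented for illustration; they are not the data from the thirty-one studies analyzed in Feldman (1988).

    # Hypothetical importance ratings (say, on a 1-to-5 scale) that students and
    # faculty might assign to the same six instructional dimensions; invented values.
    students = [4.6, 4.4, 3.9, 3.5, 3.0, 2.7]
    faculty = [4.5, 3.8, 4.1, 3.2, 3.3, 2.5]

    def pearson(x, y):
        """Product-moment correlation between two equal-length lists of ratings."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return covariance / (var_x * var_y) ** 0.5

    # A value near +1 would indicate that the two groups value the dimensions similarly.
    print(round(pearson(students, faculty), 2))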
A few examples may be given. Similar to the results shown in
Table 3, Instructional Dimensions No. 5 (teacher’s preparation and
organization of the course) and No. 6 (clarity and understandableness)
are of high importance to students and to faculty when these groups are
asked directly about what is important to good teaching and effective
instruction. Further, when asked directly, students again place high
importance on Dimension No. 1 (teacher’s stimulation of interest), but
in this case faculty (when asked directly) see this aspect of teaching as
less important than do the students (when asked directly) or by the two
indicators of importance derived from student evaluations (summarized in Table 3). Moreover, compared to the importance determined
by the analysis of data from student evaluations, students and faculty,
when asked directly, place less importance on Instructional Dimension
No. 12 (perceived outcome or impact of instruction) but more importance on Dimensions No. 8 (teacher’s sensitivity to, and concern with,
class level and progress), No. 3 (teacher’s knowledge of subject matter),
and No. 2 (teacher’s enthusiasm).7
CONCLUDING COMMENTS
This paper was not intended as a comprehensive review of the research
literature on evaluation of college students of their teachers or on the
correlates of effective teaching in college. Indeed, several topics or
areas usually explored in such reviews have not been considered in this
paper. To take two instances, I have ignored an analysis of whether
there is a connection between research productivity and teaching effectiveness as well as a discussion of the usefulness of student ratings as
feedback to faculty to improve their teaching (other than to label as
myths the statements that good instruction and good research are so
closely allied as to make it unnecessary to evaluate them separately and
that student ratings cannot meaningfully be used to improve teaching).
Rather, I have somewhat single-mindedly focused on the use of student
ratings to identify exemplary teachers and teaching. In doing so, I
have drawn together relevant parts of my own work over the years
in addition to incorporating findings and conclusions from selected
others.
Nothing I have written in this paper is meant to imply that the
use of teacher evaluations is the only means of identifying exemplary
teachers and teaching at the college level. The recent discussion
of the multitude of items that would be appropriate for “teaching
portfolios” by itself suggests otherwise (see, among others, Centra,
1993; Edgerton, Hutchings and Quinlan, 1991; and Seldin, 1991).
For instance, in a project sponsored by the Canadian Association of
University Teachers to identify the kinds of information a faculty
member might use as evidence of teaching effectiveness, some
forty-nine specific items were suggested as possible items for inclusion
in a dossier (Shore and associates, 1986); only one of these items
referred to student ratings (listed as “student course and teaching
evaluation data”). Given the diverse ways noted in these dossiers of
“capturing the scholarship of teaching,” as Edgerton, Hutchings and
Quinlan (1991) put it, gathering teacher evaluations may or may not
be the one best way to identify excellence in teaching. But it is an
important way; and current research evidence does show that when
teacher evaluation forms are properly constructed and administered
(Feldman, 1979), the global and specific ratings contained in them,
as interpreted with appropriate caution, are undeniably helpful in
identifying exemplary teachers and teaching.

7 Other similarities and differences can be found in Feldman, 1989b (Table 3), where data
for all four indicators of the importance of various instructional dimensions—association with
achievement, association with global ratings, direct report of students, and direct report of
faculty—are given.
Reprinted by permission of Agathon Press, New York.
References
Abrami, P.C. (1985). Dimensions of effective college instruction. Review of Higher
Education 8: 211–228.
Abrami, P.C. (1988). SEEQ and ye shall find: A review of Marsh’s “Students’ evaluation
of university teaching.” Instructional Evaluation 9: 19–27.
Abrami, P.C. (1989a). How should we use student ratings to evaluate teaching?
Research in Higher Education 30: 221–227.
Abrami, P.C. (1989b). SEEQing the truth about student ratings of instruction. Educational Researcher 43: 43–45.
Abrami, P.C., Cohen, P.A., and d’Apollonia, S. (1988). Implementation problems in
meta-analysis. Review of Educational Research 58: 151–179.
Abrami, P.C., and d’Apollonia, S. (1990). The dimensionality of ratings and their
use in personnel decisions. In M. Theall and J. Franklin (eds.), Student Ratings of
Instruction: Issues for Improving Practice (New Directions for Teaching and Learning
No. 43). San Francisco: Jossey-Bass.
Abrami, P.C., and d’Apollonia, S. (1991). Multidimensional students’ evaluations of
teaching effectiveness—generalizability of “N = 1” research: Comments on Marsh.
Journal of Educational Psychology 83: 411–415.
Abrami, P.C., d’Apollonia, S., and Rosenfield, S. (1993). The Dimensionality of Student
Ratings of Instruction: Introductory Remarks. Paper presented at the annual meeting
of the American Educational Research Association.
Abrami, P.C., d’Apollonia, S., and Rosenfield, S. (1996). The dimensionality of student
ratings of instruction: What we know and what we do not. In J.C. Smart (ed.) Higher
Education: Handbook of Theory and Research (Vol. 11). New York: Agathon Press.
Abrami, P.C., Leventhal, L., and Perry R.P. (1982). Educational seduction. Review of
Educational Research 52: 446–464.
Aleamoni, L. (1987). Student rating myths versus research facts. Journal of Personnel
Evaluation in Education 1: 111–119.
Aubrecht, J.D. (1981). Reliability, validity and generalizability of student ratings of
instruction. (IDEA Paper No. 6). Manhattan, KS: Kansas State University, Center for
Faculty Evaluation and Development. (ERIC Document Reproduction Service No.
ED 213 296)
Baxter Magolda, M.B. (1992). Knowing and Reasoning in College: Gender-Related Patterns
in Students’ Intellectual Development. San Francisco: Jossey-Bass.
Benton, S.E., and Scott, O. (1976). A Comparison of the Criterion Validity of Two
Types of Student Response Inventories for Appraising Instruction. Paper presented at
the annual meeting of the National Council on Measurement in Education.
Bolton, B., Bonge, D., and Marr, J. (1979). Ratings of instruction, examination performance, and subsequent enrollment in psychology courses. Teaching of Psychology
6: 82–85.
Bowen, H.R. (1977). Investment in Learning: The Individual and Social Value of American
Higher Education. San Francisco: Jossey-Bass.
Braskamp, L.A., Brandenburg, D.C., and Ory, J.C. (1984). Evaluating Teaching Effectiveness: A Practical Guide. Beverly Hills, Calif.: Sage.
Braskamp, L.A., Caulley, D., and Costin, F. (1979). Student ratings and instructor self-ratings and their relationship to student achievement. American Educational Research
Journal 16: 295–306.
Braskamp, L.A., and Ory, J.C. (1994). Assessing Faculty Work: Enhancing Individual
and Institutional Performance. San Francisco: Jossey-Bass.
Brooks, T.E., Tarver, D.A., Kelley, H.P., Liberty, P.G., Jr., and Dickerson, A.D. (1971).
Dimensions underlying student ratings of courses and instructors at the University of
Texas at Austin: Instructor Evaluation Form 2. (Research Bulletin RB-71-4). Austin,
Texas: University of Texas at Austin, Measurement and Evaluation Center.
Bryson, R. (1974). Teacher evaluations and student learning: A reexamination. Journal
of Educational Research 68: 11–14.
Cashin, W.E. (1988). Student Ratings of Teaching: A Summary of the Research. (IDEA
Paper No. 20). Manhattan, KS: Kansas State University, Center for Faculty Evaluation
and Development.
Cashin, W.E. (1990). Students do rate different academic fields differently. In M. Theall
and J. Franklin (eds.), Student Ratings of Instruction: Issues for Improving Practice
(New Directions for Teaching and Learning No. 43). San Francisco: Jossey-Bass.
Cashin, W.E. (1995). Student Ratings of Teaching: The Research Revisited. (IDEA Paper
No. 32). Manhattan, KS: Kansas State University, Center for Faculty Evaluation and
Development.
Cashin, W.E., and Clegg, V.L. (1987). Are Student Ratings of Different Academic Fields
Different? Paper presented at the annual meeting of the American Educational
Research Association. (ERIC Document Reproduction Service No. ED 286 935)
Cashin, W.E., and Downey, R.G. (1992). Using global student rating items for
summative evaluation. Journal of Educational Psychology 84: 563–572.
Cashin, W.E., Downey, R.G., and Sixbury, G.R. (1994). Global and specific ratings of
teaching effectiveness and their relation to course objectives: Reply to Marsh. Journal
of Educational Psychology 86: 649–657.
Cashin, W.E., and Sixbury, G.R. (1993). Comparative Data by Academic Field. (IDEA
Technical Report No. 8). Manhattan, KS: Kansas State University, Center for Faculty
Evaluation and Development.
Centra, J.A. (1975). Colleagues as raters of classroom instruction. Journal of Higher
Education 46: 327–337.
Centra, J.A. (1977). Student ratings of instruction and their relationship to student
learning. American Educational Research Journal 14: 17–24.
Centra, J.A. (1979). Determining Faculty Effectiveness: Assessing Teaching, Research,
and Service for Personnel Decisions and Improvement. San Francisco: Jossey-Bass.
Centra, J.A. (1989). Faculty evaluation and faculty development in higher education.
In J.C. Smart (ed.), Higher Education: Handbook of Theory and Research (Vol. 5).
New York: Agathon Press.
Centra, J.A. (1993). Reflective Faculty Evaluation: Enhancing Teaching and Determining
Faculty Effectiveness. San Francisco: Jossey-Bass.
Chase, C.I., and Keene, J.M., Jr. (1979). Validity of student ratings of faculty. (Indiana
Studies in Higher Education No. 40). Bloomington, Ind.: Indiana University, Bureau
of Evaluation Studies and Testing, Division of Research and Development. (ERIC
Document Reproduction Service No. ED 169 870).
Chickering, A.W., and Reisser, L. (1993). Education and Identity (2nd edition).
San Francisco: Jossey-Bass.
Cobb, E.B. (1956). Construction of a Forced-choice University Instructor Rating Scale.
Unpublished doctoral dissertation, University of Tennessee, Knoxville.
Cohen, P.A. (1980a). A Meta-analysis of the Relationship between Student Ratings of
Instruction and Student Achievement. Unpublished doctoral dissertation, University
of Michigan, Ann Arbor.
Cohen, P.A. (1980b). Effectiveness of student-rating feedback for improving college
instruction: A meta-analysis of findings. Research in Higher Education 13: 321–341.
Cohen, P.A. (1981). Student ratings of instruction and student achievement. Review
of Educational Research 51: 281–309.
Cohen, P.A. (1987). A Critical Analysis and Reanalysis of the Multisection Validity
Meta-analysis. Paper presented at the annual meeting of the American Educational
Research Association. (ERIC Document Reproduction Service No. ED 283 876)
Cohen, S.H., and Berger, W.G. (1970). Dimensions of students’ ratings of college
instructors underlying subsequent achievement on course examinations. Proceedings
of the 78th Annual Convention of the American Psychological Association 5: 605–606.
Costin, F. (1978). Do student ratings of college teachers predict student achievement?
Teaching of Psychology 5: 86–88.
Costin, F., Greenough, W.T., and Menges, R. J. (1971). Student ratings of college
teaching: Reliability, validity and usefulness. Review of Educational Research 41:
511–535.
Crittenden, K.S., and Norr, J.L. (1973). Student values and teacher evaluation:
A problem in person perception. Sociometry 36: 143–151.
d’Apollonia, S., and Abrami, P.C. (1987). An Empirical Critique of Meta-analysis: The
Literature on Student Ratings of Instruction. Paper presented at the annual meeting of
the American Educational Research Association.
d’Apollonia, S., and Abrami, P.C. (1988). The Literature on Student Ratings of
Instruction: Yet Another Meta-analysis. Paper presented at the annual meeting of the
American Educational Research Association.
d’Apollonia, S., Abrami, P.C., and Rosenfield, S. (1993). The Dimensionality of Student
Ratings of Instruction: A Meta-Analysis of the Factor Studies. Paper presented at the
annual meeting of the American Educational Research Association.
Doyle, K.O., Jr. (1972). Construction and Evaluation of Scale for Rating College
Instructors. Unpublished doctoral dissertation, University of Minnesota, Minneapolis.
Doyle, K.O., Jr. (1975). Student Evaluation of Instruction. Lexington, MA: D. C. Heath.
Doyle, K.O., Jr. (1983). Evaluating Teaching. Lexington, MA: D. C. Heath.
Doyle, K.O., Jr., and Crichton, L.I. (1978). Student, peer, and self evaluation of college
instruction. Journal of Educational Psychology 70: 815–826.
Doyle, K.O., Jr., and Whitely, S.E. (1974). Student ratings as criteria for effective
teaching. American Educational Research Journal 11: 259–274.
Edgerton, R., Hutchings, P., and Quinlan, K. (1991). The Teaching Portfolio: Capturing
the Scholarship in Teaching. Washington, DC: American Association for Higher
Education.
Elliott, D.N. (1950). Characteristics and relationship of various criteria of college and
university teaching. Purdue University Studies in Higher Education 70: 5–61.
Ellis, N.R., and Rickard, H.C. (1977). Evaluating the teaching of introductory
psychology. Teaching of Psychology 4: 128–132.
Ellner, C.L., and Barnes, C.P. (1983). Studies of College Teaching: Experimental Results,
Theoretical Interpretations, and New Perspectives. Lexington, MA: D. C. Heath.
Endo, G.T., and Della-Piana, G. (1976). A validation study of course evaluation ratings.
Improving College and University Teaching 24: 84–86.
Feldman, K.A. (1976a). Grades and college students’ evaluation of their courses and
teachers. Research in Higher Education 4: 69–111.
Feldman, K.A. (1976b). The superior college teacher from the students’ view. Research
in Higher Education 5: 243–288.
Feldman, K.A. (1977). Consistency and variability among college students in rating
their teachers and courses: A review and analysis. Research in Higher Education 6:
223–274.
Feldman, K.A. (1978). Course characteristics and college students’ ratings of their
teachers: What we know and what we don’t. Research in Higher Education 9: 199–242.
Feldman, K.A. (1979). The significance of circumstances for college students’ ratings
of their teachers and courses. Research in Higher Education 10: 149–172.
Feldman, K.A. (1983). Seniority and experience of college teachers as related to
evaluation they receive from students. Research in Higher Education 18: 3–124.
Feldman, K.A. (1984). Class size and college students’ evaluations of teachers and
courses: A closer look. Research in Higher Education 21: 45–116.
Feldman, K.A. (1986). The perceived instructional effectiveness of college teachers as
related to their personality and attitudinal characteristics: A review and synthesis.
Research in Higher Education 24: 139–213.
Feldman, K.A. (1987). Research productivity and scholarly accomplishment of college
teachers as related to their instructional effectiveness: A review and exploration.
Research in Higher Education 26: 227–298.
Feldman, K.A. (1988). Effective college teaching from the students’ and faculty’s view:
Matched or mismatched priorities? Research in Higher Education 28: 291–344.
Feldman, K.A. (1989a). Instructional effectiveness of college teachers as judged by
teachers themselves, current and former students, colleagues, administrators, and
external (neutral) observers. Research in Higher Education 30: 137–194.
Feldman, K.A. (1989b). The association between student ratings of specific instructional dimensions and student achievement: Refining and extending the synthesis of
data from multisection validity studies. Research in Higher Education 30: 583–645.
Feldman, K.A. (1990a). An afterword for “The association between student ratings of
specific instructional dimensions and student achievement: Refining and extending
the synthesis of data from multisection validity studies.” Research in Higher Education
31: 315–318.
Feldman, K.A. (1990b). Instructional evaluation. The Teaching Professor 4: 5–7.
Feldman, K.A. (1992). College students’ views of male and female college teachers:
Part I—evidence from the social laboratory and experiments. Research in Higher
Education 33: 317–375.
Feldman, K.A. (1993). College students’ views of male and female college teachers:
Part II—evidence from students’ evaluations of their classroom teachers. Research in
Higher Education 34: 151–211.
Feldman, K.A. (1994). Identifying Exemplary Teaching: Evidence from Course and
Teacher Evaluations. Paper commissioned by the National Center on Postsecondary
Teaching, Learning, and Assessment for presentation at the Second AAHE
Conference on Faculty Roles and Rewards.
Feldman, K.A. (forthcoming). Identifying exemplary teaching: Using data from course
and teacher evaluations. In M.D. Svinicki and R.J. Menges (eds.), Honoring Exemplary
Teaching (New Directions for Teaching and Learning). San Francisco: Jossey-Bass.
Feldman, K.A., and Newcomb, T.M. (1969). The Impact of College on Students. San
Francisco: Jossey-Bass.
Feldman, K.A., and Paulsen, M.B. (eds.) (1994). Teaching and Learning in the College
Classroom. Needham Heights, MA: Ginn Press.
French-Lazovik, G. (1974). Predictability of students’ evaluation of college teachers
from component ratings. Journal of Educational Psychology 66: 373–385.
Frey, P.W. (1973). Student ratings of teaching: Validity of several rating factors. Science
182: 83–85.
Frey, P.W. (1976). Validity of student instructional ratings as a function of their timing.
Journal of Higher Education 47: 327–336.
Frey, P.W., Leonard, D.W., and Beatty, W.W. (1975). Student ratings of instruction:
Validation research. American Educational Research Journal 12: 435–444.
Garber, H. (1964). Certain Factors Underlying the Relationship between Course Grades and
Student Judgments of College Teachers. Unpublished doctoral dissertation, University
of Connecticut, Storrs.
Gigliotti, R.J., and Buchtel, F.S. (1990). Attributional bias and course evaluation.
Journal of Educational Psychology 82: 341–351.
Good, K.C. (1971). Similarity of Student and Instructor Attitudes and Student’s
Attitudes Toward Instructors. Unpublished doctoral dissertation, Purdue University,
West Lafayette.
Greenwood, G.E., Hazelton, A., Smith, A.B., and Ware, W.B. (1976). A study of the
validity of four types of student ratings of college teaching assessed on a criterion
of student achievement gains. Research in Higher Education 5: 171–178.
Grush, J.E., and Costin, F. (1975). The student as consumer of the teaching process.
American Educational Research Journal 12: 55–66.
Harry, J., and Goldner, N.S. (1972). The null relationship between teaching and
research. Sociology of Education 45: 47–60.
Harvey, J.N., and Barker, D.G. (1970). Student evaluation of teaching effectiveness.
Improving College and University Teaching 18: 275–278.
Hativa, N., and Raviv, A. (1993). Using a single score for summative teacher evaluation
by students. Research in Higher Education 34: 625–646.
Hoffman, R.G. (1978). Variables affecting university student ratings of instructor
behavior. American Educational Research Journal 15: 287–299.
Hoyt, D.P. (1973). Measurement of instructional effectiveness. Research in Higher
Education 1: 367–378.
Jiobu, R.M., and Pollis, C.A. (1971). Student evaluations of courses and instructors.
American Sociologist 6: 317–321.
King, P.M., and Kitchener, K.S. (1994). Developing Reflective Judgment: Understanding
and Promoting Intellectual Growth and Critical Thinking in Adolescents and Adults.
San Francisco: Jossey-Bass.
Kulik, J.A., and McKeachie, W.J. (1975). The evaluation of teachers in higher
education. In F.N. Kerlinger (ed.), Review of Research in Education (Vol. 3). Itasca, IL:
F.E. Peacock.
Land, M.L. (1979). Low-inference variables of teacher clarity: Effects on student
concept learning. Journal of Educational Psychology 71: 795–799.
Land, M.L. (1981). Actual and perceived teacher clarity: Relations to student
achievement in science. Journal of Research in Science Teaching 18: 139–143.
Land, M.L., and Combs, A. (1981). Teacher Clarity, Student Instructional Ratings, and
Student Performance. Paper read at the annual meeting of the American Educational
Research Association.
Land, M.L., and Combs, N. (1982). Teacher behavior and student ratings. Educational
and Psychological Research 2: 63–68.
Land, M.L., and Smith, L.R. (1979). The effect of low inference teacher clarity inhibitors
on student achievement. Journal of Teacher Education 30: 55–57.
Land, M.L., and Smith, L.R. (1981). College student ratings and teacher behavior: An
Experimental Study. Journal of Social Studies Research 5: 19–22.
Leftwich, W.H., and Remmers, H.H. (1962). A comparison of graphic and forced-choice ratings of teaching performance at the college and university level. Purdue
University Studies in Higher Education 92: 3–31.
Leventhal, L. (1975). Teacher rating forms: Critique and reformulation of previous
validation designs. Canadian Psychological Review 16: 269–276.
Levinson-Rose, J., and Menges, R.J. (1981). Improving college teaching: A critical
review of research. Review of Educational Research 51: 403–434.
L’Hommedieu, R., Menges, R.J., and Brinko, K.T. (1988). The Effects of Student Ratings
Feedback to College Teachers: A Meta-analysis and Review of Research. Unpublished manuscript, Northwestern University, Center for the Teaching Professions,
Evanston.
L’Hommedieu, R., Menges, R.J., and Brinko, K.T. (1990). Methodological explanations
for the modest effects of feedback. Journal of Educational Psychology 82: 232–241.
Maas, J.B., and Owen, T.R. (1973). Cornell Inventory for Student Appraisal of Teaching
and Courses: Manual of Instructions. Ithaca, NY: Cornell University, Center for
Improvement of Undergraduate Education.
Marsh, H.W. (1984). Students’ evaluations of university teaching: Dimensionality,
reliability, validity, potential biases, and utility. Journal of Educational Psychology 76:
707–754.
Marsh, H.W. (1987). Students’ evaluations of university teaching: Research findings,
methodological issues, and directions for future research. International Journal of
Educational Research 11: 253–388.
Marsh, H.W. (1991a). Multidimensional students’ evaluation of teaching effectiveness:
A test of alternative higher-order structures. Journal of Educational Psychology 83:
285–296.
Marsh, H.W. (1991b). A multidimensional perspective on students’ evaluations of
teaching effectiveness: A reply to Abrami and d’Apollonia. Journal of Educational
Psychology 83: 416–421.
Marsh, H.W. (in press). Weighting for the right criterion in the IDEA system: Global
and specific ratings of teaching effectiveness and their relation to course objectives.
Journal of Educational Psychology.
Marsh, H.W., and Dunkin, M.J. (1992). Students’ evaluations of university teaching:
A multidimensional approach. In J.C. Smart (ed.), Higher Education: Handbook of
Theory and Research (Vol. 8). New York: Agathon Press.
Marsh, H.W., Fleiner, H., and Thomas, C.S. (1975). Validity and usefulness of student
evaluations of instructional quality. Journal of Educational Psychology 67: 833–839.
Marsh, H.W., and Overall, J.U. (1980). Validity of students’ evaluations of teaching
effectiveness: Cognitive and affective criteria. Journal of Educational Psychology 72:
468–475.
McKeachie, W.J. (1979). Student ratings of faculty: A reprise. Academe 65: 384–397.
McKeachie, W.J. (1987). Instructional evaluation: Current issues and possible improvements. Journal of Higher Education 58: 344–350.
McKeachie, W.J., Lin, Y-G., and Mann, W. (1971). Student ratings of teacher effectiveness: Validity studies. American Educational Research Journal 8: 435–445.
Miller, R.I. (1972). Evaluating Faculty Performance. San Francisco: Jossey-Bass.
Miller, R.I. (1974). Developing Programs for Faculty Evaluation. San Francisco:
Jossey-Bass.
Mintzes, J.J. (1976–77). Field test and validation of a teaching evaluation instrument:
The Student Opinion Survey of Teaching (A report submitted to the Senate
Committee for Teaching and Learning, Faculty Senate, University of Windsor).
Windsor, ON: University of Windsor.
Morgan, W.D., and Vasché, J.D. (1978). An Educational Production Function
Approach to Teaching Effectiveness and Evaluation. Journal of Economic Education
9: 123–126.
Morsh, J.E., Burgess, G.G., and Smith, P.N. (1956). Student achievement as a measure
of instructor effectiveness. Journal of Educational Psychology 47: 79–88.
Murray, H.G. (1980). Evaluating University Teaching: A Review of Research. Toronto:
Ontario Confederation of University Faculty Associations.
Murray, H.G. (1983). Low-inference classroom teaching behaviors in relation to six
measures of college teaching effectiveness. Proceedings of the Conference on the
Evaluation and Improvement of University Teaching: The Canadian Experience
(pp. 43–73). Montreal: McGill University, Centre for Teaching and Learning Service.
Murray, H.G. (1991). Effective teaching behaviors in the college classroom. In
J.C. Smart (ed.), Higher Education: Handbook of Theory and Research (Vol. 7). New
York: Agathon Press.
Orpen, C. (1980). Student evaluations of lecturers as an indicator of instructional
quality: A validity study. Journal of Educational Research 74: 5–7.
Owen, P.H. (1967). Some Dimensions of College Teaching: An Exploratory Study Using
Critical Incidents and Factor Analyses of Student Ratings. Unpublished doctoral dissertation, University of Houston, Houston.
Pascarella, E.T., and Terenzini, P.T. (1991). How College Affects Students: Findings and
Insights from Twenty Years of Research. San Francisco: Jossey-Bass.
Perry, R.P. (1991). Perceived control in college students: Implications for instruction
in higher education. In J.C. Smart (ed.), Higher Education: Handbook of Theory and
Research (Vol. 7). New York: Agathon Press.
Plant, W.T., and Sawrey, J.M. (1970). Student ratings of psychology professors
as teachers and the research involvement of the professors rated. The Clinical
Psychologist 23: 15–16, 19.
Rankin, E.F., Jr., Greenmun, R., and Tracy, R.J. (1965). Factors related to student
evaluations of a college reading course. Journal of Reading 9: 10–15.
Remmers, H.H. (1929). The college professor as the student sees him. Purdue University
Studies in Higher Education 11: 1–63.
Remmers, H.H., Martin, F.D., and Elliott, D.N. (1949). Are students’ ratings of
instructors related to their grades? Purdue University Studies in Higher Education 66:
17–26.
Remmers, H.H., and Weisbrodt, J.A. (1964). Manual of Instructions for Purdue Rating
Scale of Instruction. West Lafayette, IN: Purdue Research Foundation.
Rosenshine, B., Cohen, A., and Furst, N. (1973). Correlates of student preference
ratings. Journal of College Student Personnel 14: 269–272.
Rubinstein, J., and Mitchell, H. (1970). Feeling free, student involvement, and appreciation. Proceedings of the 78th Annual Convention of the American Psychological
Association 5: 623–624.
Sagen, H.B. (1974). Student, faculty, and department chairmen ratings of instructors:
Who agrees with whom? Research in Higher Education 2: 265–272.
Sanders, J.A., and Wiseman, R.L. (1990). The effects of verbal and nonverbal teacher
immediacy on perceived cognitive, affective, and behavioral learning in the multicultural classroom. Communication Education 39: 341–353.
Seldin, P. (1991). The Teaching Portfolio. Boston: Anker Publishing.
Shore, B.M., and associates (1986). The Teaching Dossier: A Guide to Its Preparation
and Use (Rev. Ed.). Montreal: Canadian Association of University Teachers.
Smith, L.R., and Land, M.L. (1980). Student perception of teacher clarity in mathematics. Journal for Research in Mathematics Education 11: 137–146.
Sockloff, A.L. (1973). Instruments for student evaluation of faculty: Ideal and actual.
In A.L. Sockloff (ed.), Proceedings of the First Invitational Conference on Faculty Effectiveness as Evaluated by Students. Philadelphia, PA: Temple University, Measurement
and Research Center.
Solomon, D., Rosenberg, L., and Bezdek, W.E. (1964). Teacher behavior and student
learning. Journal of Educational Psychology 55: 23–30.
Spencer, R.E. (1967). Analysis of the Instructor Rating Form—General Engineering
Department. (Research Report No. 253). Urbana, IL: University of Illinois,
Measurement and Research Division, Office of Instructional Resources.
Stumpf, S.A., and Freedman, R.D. (1979). Expected grade covariation with student
ratings of instruction: Individual versus class effects. Journal of Educational
Psychology 71: 293–302.
Theall, M., Franklin, J., and Ludlow, L. (1990a). Attributions and retributions: Student
ratings and the perceived causes of performance. Instructional Evaluation 11: 12–17.
Theall, M., Franklin, J., and Ludlow, L. (1990b). Attributions or Retributions: Student
Ratings and the Perceived Causes of Performance. Paper presented at the annual meeting
of the American Educational Research Association.
Turner, R.L. (1970). Good teaching and its contexts. Phi Delta Kappan 52: 155–158.
Turner, R.L., and Thompson, R.P. (1974). Relationships between College Student Ratings
of Instructors and Residual Learning. Paper presented at the annual meeting of the
American Educational Research Association.
Van Horn, C. An Analysis of the 1968 Course and Instructor Evaluation Report.
(Institutional Research Bulletin No. 2–68). West Lafayette, IN: Purdue University,
Measurement and Research Center.
Walker, B.D. (1968). An Investigation of Selected Variables Relative to the Manner in
which a Population of Junior College Students Evaluate their Teachers. Unpublished
doctoral dissertation, University of Houston, Houston.
Widlak, F.W., McDaniel, E.D., and Feldhusen, J.F. (1973). Factor Analysis of an
Instructor Rating Scale. Paper presented at the annual meeting of the American
Educational Research Association.
Williams, H.Y., Jr. (1965). College Students’ Perceptions of the Personal Traits and
Instructional Procedures of Good and Poor Teachers. Unpublished doctoral dissertation, University of Minnesota, Minneapolis.
Appendix
This appendix, with its listing of 28 instructional dimensions, first appeared in
Feldman (1989b) in a slightly different version. For each of the instructional dimensions, examples of evaluation items that would be classified into it are given.
For refinements and modifications to this list of dimensions and attendant coding
scheme, see d’Apollonia, Abrami and Rosenfield (1993) and Abrami, d’Apollonia and
Rosenfield (1996).
No. 1 Teacher’s Stimulation of Interest in the Course and Its Subject Matter: “the
instructor puts material across in an interesting way”; “the instructor gets
students interested in the subject”; “it was easy to remain attentive”; “the teacher
stimulated intellectual curiosity”; etc.
No. 2 Teacher’s Enthusiasm (for Subject or for Teaching): “the instructor shows interest
and enthusiasm in the subject”; “the instructor seems to enjoy teaching”; “the
teacher communicates a genuine desire to teach students”; “the instructor never
showed boredom for teaching this class”; “the instructor shows energy and
excitement”; etc.
No. 3 Teacher’s Knowledge of Subject Matter: “the instructor has a good command
of the subject material”; “the teacher has a thorough knowledge, basic and
current, of the subject”; “the instructor has good knowledge about or beyond
the textbook”; “the instructor knows the answers to questions students ask”;
“the teacher keeps lecture material updated”; etc.
No. 4 Teacher’s Intellectual Expansiveness (and Intelligence): “the teacher is well
informed in all related fields”; “the teacher has respect for other subject areas
and indicates their relationship to his or her own subject of presentation”; “the
teacher exhibited a high degree of cultural attainment”; etc.
No. 5 Teacher’s Preparation; Organization of the Course: “the teacher was well prepared
for each day’s lecture”; “the presentation of the material is well organized”;
“the overall development of the course had good continuity”; “the instructor
planned the activities of each class period in detail”; etc.
No. 6 Clarity and Understandableness: “the instructor made clear explanations”; “the
instructor interprets abstract ideas and theories clearly”; “the instructor makes
good use of examples and illustrations to get across difficult points”; “the teacher
effectively synthesizes and summarizes the material”; “the teacher answers
students’ questions in a way that helps students to understand”; etc.
No. 7 Teacher’s Elocutionary Skills: “the instructor has a good vocal delivery”; “the
teacher speaks distinctly, fluently and without hesitation”; “the teacher varied
the speech and tone of his or her voice”; “the teacher has the ability to speak
distinctly and be clearly heard”; “the instructor changed pitch, volume, or
quality of speech”; etc.
No. 8 Teacher’s Sensitivity to, and Concern with, Class Level and Progress: “the teacher
was skilled in observing student reactions”; “the teacher was aware when
students failed to keep up in class”; “the instructor teaches near the class level”;
“the teacher takes an active personal interest in the progress of the class and
shows a desire for students to learn”; etc.
No. 9 Clarity of Course Objectives and Requirements: “the purposes and policies of
the course were made clear to the student”; “the instructor gave a clear idea of
the student requirements”; “the teacher clearly defined student responsibilities
in the course”; “the teacher tells students which topics are most important and
what they can expect on tests”; “the instructor gave clear assignments”; etc.
No. 10 Nature and Value of the Course Material (Including Its Usefulness and Relevance):
“the teacher has the ability to apply material to real life”; “the instructor makes
the course practical”; “there is worthwhile and informative material in lectures
that doesn’t duplicate the text”; “the course has excellent content”; “the class
considers what we are learning worth learning”; etc.
No. 11 Nature and Usefulness of Supplementary Materials and Teaching Aids: “the
homework assignments and supplementary readings were helpful in understanding the course”; “the teacher made good use of teaching aids such as films
and other audio-visual materials”; “the instructor provided a variety of activities
in class and used a variety of media (slides, films, projections, drawings) and
outside resource persons”; etc.
No. 12 Perceived Outcome or Impact of Instruction: “gaining of new knowledge was
facilitated by the instructor”; “I developed significant skills in the field”;
“I developed increased sensitivity and evaluative judgment”; “the instructor
has given me tools for attacking problems”; “the course has increased my
general knowledge”; “apart from your personal feelings about the teacher,
has he/she been instrumental in increasing knowledge of the course’s subject
matter”; etc.
No. 13 Teacher’s Fairness; Impartiality of Evaluation of Students; Quality of Examinations: “grading in the course was fair”; “the instructor has definite standards
and is impartial in grading”; “the exams reflect material emphasized in the
course”; “test questions were clear”; “coverage of subject matter on exams was
comprehensive”; etc.
No. 14 Personality Characteristics (“Personality”) of the Teacher: “the teacher has a
good sense of humor”; “the teacher was sincere and honest”; “the teacher is
highly personable at all times in dress, voice, social grace, and manners”; “the
instructor was free of personal peculiarities”; “the instructor is not autocratic
and does not try to force us to accept his ideas and interpretations”; “the
teacher exhibits a casual, informal attitude”; “the instructor laughed at his own
mistakes”; etc.
No. 15 Nature, Quality, and Frequency of Feedback from the Teacher to Students: “the
teacher gave satisfactory feedback on graded material”; “criticism of papers was
helpful to students”; “the teacher told students when they had done a good
job”; “the teacher is prompt in returning tests and assignments”; etc.
No. 16 Teacher’s Encouragement of Questions and Discussion, and Openness to Opinions
of Others: “students felt free to ask questions or express opinions”; “the instructor
stimulated class discussions”; “the teacher encouraged students to express differences of opinions and to evaluate each other’s ideas”; “the instructor invited
criticisms of his or her own ideas”; “the teacher appeared receptive to new ideas
and the viewpoints of others”; etc.
No. 17 Intellectual Challenge and Encouragement of Independent Thought (by the Teacher
and the Course): “this course challenged students intellectually”; “the teacher
encouraged students to think out answers and follow up ideas”; “the teacher
attempts to stimulate creativity”; “the instructor raised challenging questions
and problems”; etc.
No. 18 Teacher’s Concern and Respect for Students; Friendliness of the Teacher: “the
instructor seems to have a genuine interest in and concern for students”; “the
teacher took students seriously”; “the instructor established good rapport with
students”; “the teacher was friendly toward all students”; etc.
No. 19 Teacher’s Availability and Helpfulness: “the instructor was willing to help
students having difficulty”; “the instructor is willing to give individual
attention”; “the teacher was available for consultation”; “the teacher was accessible to students outside of class”; etc.
No. 20 Teacher Motivates Students to Do Their Best; High Standard of Performance
Required: “Instructor motivates students to do their best work”; “the instructor
sets high standards of achievement for students”; “the teacher raises the aspirational level of students”; etc.
No. 21 Teacher’s Encouragement of Self-Initiated Learning: “Students are encouraged
to work independently”; “students assume much responsibility for their own
learning”; “the general approach used in the course gives emphasis to learning
on the students’ own”; “the teacher does not suppress individual initiative”; etc.
No. 22 Teacher’s Productivity in Research Related Activities: “The teacher talks about
his own research”; “instructor displays high research accomplishments”; “the
instructor publishes material related to his subject field”; etc.
No. 23 Difficulty of the Course (and Workload)—Description: “the workload and pace of
the course was difficult”; “I spent a great many hours studying for this course”;
“the amount of work required for this course was very heavy”; “this course
required a lot of time”; “the instructor assigned very difficult reading”; etc.
No. 24 Difficulty of the Course (and Workload)—Evaluation: “the content of this course
is too hard”; “the teacher’s lectures and oral presentations are ‘over my head’ ”;
“the instructor often asked for more than students could get done”; “the instructor
attempted to cover too much material and presented it too rapidly”; etc.
No. 25 Classroom Management: “the instructor controls class discussion to prevent
rambling and confusion”; “the instructor maintained a classroom atmosphere
conducive to learning”; “students are allowed to participate in deciding the
course content”; “the teacher did not ‘rule with an iron hand’ ”; etc.
No. 26 Pleasantness of Classroom Atmosphere: “the class does not make me nervous”;
“I felt comfortable in this class”; “the instructor created an atmosphere in which
students in the class seemed friendly”; “this was not one of those classes where
students failed to laugh, joke, smile or show other signs of humor”; “the teacher
is always criticizing and arguing with students”; etc.
No. 27 Individualization of Teaching: “instead of expecting every student to do the same
thing, the instructor provides different activities for different students”; “my
grade depends primarily upon my improvement over my past performance”;
“in this class each student is accepted on his or her own merits”; “my grade is
influenced by what is best for me as a person as well as by how much I have
learned”; “the instructor evaluated each student as an individual”; etc.
No. 28 Teacher Pursued and/or Met Course Objectives: “the instructor accomplished
what he or she set out to do”; “there was close agreement between the announced
objectives of the course and what was actually taught”; “course objectives stated
agreed with those actually pursued”; etc.
COMMENTARY AND UPDATE ON FELDMAN’S (1997)
“IDENTIFYING EXEMPLARY TEACHERS AND TEACHING:
EVIDENCE FROM STUDENT RATINGS”
Michael Theall∗ and Kenneth A. Feldman†
∗ Youngstown State University, mtheall@ysu.edu
† SUNY-Stony Brook
Abstract
In the original chapter (1997), Feldman explores how student ratings can be used to
identify those teachers who are seen by students as exemplary, while noting certain
precautions (which involve myths, half-truths and bias) in doing so. He also analyzes
how exemplary teaching itself can be identified in terms of specific pedagogical dispositions, behaviors and practices of teachers. While the essential findings of this earlier
analysis remain valid, there have been changes in the nature and focus of research
on college teaching and its evaluation. As well, new challenges and developments
are forcing higher education to rethink its paradigms and practices in such areas
as teaching, the evaluation of faculty performance, and the kinds of support faculty
need to meet the increasingly complex professional demands placed on the professoriate. The co-authors of the commentary and update (Theall and Feldman) review the
principal findings of the original chapter, discuss the literature of the past decade, and
offer suggestions for ways in which higher education and the professoriate can survive
and flourish in the future.
Key Words: College teaching; dimensions of teaching; exemplary teaching; student
ratings of instruction; reliability and validity; myths vs. research evidence; faculty
evaluation; the professoriate; higher education; paradigm shift; research and practice;
faculty development; professional enrichment; faculty as meta-professionals; faculty
careers
Reviewing the extant literature, Feldman (1997) explored how student
ratings could be used to identify those persons who are seen by students
as exemplary teachers, while noting certain precautions (involving
current myths and half-truths as well as issues of bias) in doing so.
He then analyzed how exemplary teaching itself can be identified in
terms of specific pedagogical dispositions, behaviors and practices of
teachers. He reviewed dimensions of teaching that are associated with
student learning and with overall evaluations of teachers.
Since Feldman’s chapter appeared, the number of publications
about student ratings has not noticeably diminished nor has the amount
of discussion abated. Although, in general, the major conclusions of
his chapter still hold, two tracks of activity have become apparent—
both of which we consider in various places in this commentary and
update. One track is the continuation of scholarship by researchers
and practitioners in the field, with an increasing emphasis on bringing
the research into practice. This activity has not gone unnoticed as
evidenced by the fact that the 2005 American Educational Research
Association’s “Interpretive Scholarship, Relating Research to Practice
Award” went to evaluation and ratings researchers doing just this
kind of work. The second track has been less productive in terms of
improving practice. It is represented by an array of opinion and reports
of investigations attempting to prove that ratings are biased, are the
cause of grade inflation, and are threats to promotion, tenure, academic
freedom, and the general quality of higher education. One result of
this activity has been the extension of misinformation and mythology
surrounding ratings, which in effect has made improved practice more
difficult (Aleamoni, 1987; Feldman, 1997).
GENERALIZATIONS AND CONCLUSIONS ABOUT RATINGS
AS EVIDENCE OF EXEMPLARY TEACHERS
AND TEACHING: SOME CAUTIONS
Feldman cautioned that his 1997 chapter was “not intended as a
comprehensive review of the research literature on evaluation of college
students (ratings) of their teachers or on the correlates of effective
teaching in college.” (p. 385, parenthetical term added). This caution
still applies for four reasons. First, it is clear that the number and
variety of issues affecting teaching and learning are exceptionally large
and complex, and thus beyond the scope of the present update. For
example, recent work by Pascarella and Terenzini (2005) and Kuh et al.
(2005) demonstrates that student performance is affected by a number
of conditions beyond classroom teaching and other efforts of the faculty
members. College instructors, existing in this same set of conditions,
cannot help but be influenced as well, and thus their satisfaction as well
as that of their students can affect their teaching and their students’
perceptions of it (Cranton & Knoop, 1991). Ratings reflect students’
opinions about teaching, and they do correlate with learning (Cohen,
1981), but to some degree they also indicate students’ general satisfaction with their experiences. Environmental factors can affect those
experiences and thus complicate an already complex measurement
situation. Indeed, the complexity of the whole teaching-learning picture
demands a broader view that both includes and goes beyond ratings as
evidence of exemplary teaching.
A second reason for repeating the caveat is that while Feldman’s
chapter can remain essentially unchallenged in terms of its conclusions
(because there is little in the way of substantial new evidence that contradicts his interpretations), at the same time there is an absence of new
literature about exemplary teaching in contexts essentially nonexistent
when the earlier analysis was completed. Of particular note, for example,
is the growth of technologically enhanced instruction and on-line or
other “distance” teaching and learning experiences. Thus, Feldman’s
(1989) work refining and extending the synthesis of data from multisection validity studies (reviewed in Feldman’s 1997 chapter) remains a primary
source of information about the dimensions of college teaching in traditional teaching and learning settings (also, see Abrami, d’Apollonia and
Rosenfield, 1996). But, the 1997 chapter cannot be updated without also
considering Feldman’s (1998) chapter urging readers to consider the
effects of context and “unresolved issues.”
The growth of instruction in settings other than traditional classrooms raises questions about the extent to which established models
and psychometric techniques can be transplanted into these new
situations. Because there has not been a great deal of research on
how to evaluate teaching in these contexts, these questions remain
unresolved. Using the same traditional questionnaires and producing
the same traditional reports raises serious validity issues. In addition,
and given the number of references found in opinion pieces in the
press and elsewhere, the emergence of private or for-profit on-line
ratings has added to the faculty’s legitimate concern about the misuse
of ratings data. Clearly, the issues related to technological innovations
are numerous, and while we note their impact here we cannot explore
them in depth.
The third reason involves the nature and variety of publications
specifically on student ratings. Earlier literature contained many in-depth analyses of ratings issues characterized by reports drawn from
validation studies (e.g., Cashin, 1990, with IDEA; Centra, 1972, with
SIR; Marsh, 1987, with SEEQ), syntheses or meta-analyses (e.g., Cohen,
1981; Feldman, 1989; Abrami, d’Apollonia and Rosenfield, 1996), the
use of large databases to explore specific issues (e.g., Franklin and
Theall, 1992, with TCEP), the involvement of scientists/researchers
whose primary research emphases were teaching, learning, or ratings
themselves (e.g., Doyle, 1975; Centra, 1979; Marsh, 1984) and, importantly, the extended discussion of reported results. An example of
this last factor can be found in commentary and studies following the
Naftulin, Ware and Donnelly (1973) “Dr. Fox” article. Commentaries
were published in four issues of Instructional Evaluation between 1979
and 1982,1 and Raymond Perry and associates conducted a series of
studies on educational seduction and instructor expressiveness between
1979 and 1986, incorporating perceived control as a major variable in
their later work.2 Though there was public notice of the “Dr. Fox”
study, the primary participants in the dialogue were the researchers
themselves. This is less the case today, as almost any opinion from
any quarter (it would seem) is deemed worthy of publication, and
because communications technologies allow anyone to publish and
widely circulate an opinion without the process required in traditional
refereed environments.
Finally, affecting the scope of this update is the general decline
in the status of the professoriate and of higher education itself. Even
respected academicians have produced work with clear sarcasm in their
titles—for example, “Dry Rot in the Ivory Tower” (Campbell, 2000) and
“Declining by Degrees” (Hersh and Merrow, 2005). A spate of books
and opinions has inflamed the general public, editorial writers, and
legislators who feel ever more comfortable demanding “accountability.”
The interesting irony is that if, to the joy of many critics, ratings were
to be eliminated in favor of student learning as a measure of teaching
excellence, then the same arguments used to question the reliability
and validity of ratings would arise with respect to testing and grading.
“Grade inflation,” which according to these same critics (e.g., Johnson,
2003; Trout, 2000) is the by-product of ratings, would not disappear.
Rather, grades might either become favored by faculty as evidence
of teaching excellence (“Look how well I did. My students all got
As!”) or they would become the new criteria by which poor teaching
would be characterized (“S/he must be a poor teacher! Look how
many students got As”). Thus, in this brave new world, teachers might
provide evidence of excellence by either “dumbing down” courses to
maximize the numbers of As or, in the opposite perversion, failing as
many students as possible.

1 Instructional Evaluation (now Instructional Evaluation and Faculty Development) is a semi-annual
publication of the Special Interest Group in Faculty Teaching, Evaluation, and Development
of the American Educational Research Association. Issues from 1996 are available on-line at:
http://www.umanitoba.ca/uts/sigfted/backissues.php. Earlier issues can be purchased using information provided at: http://www.umanitoba.ca/uts/sigfted/iefdi/spring00/bkissues.htm.
2 Studies with instructor expressiveness as a variable include Perry, Abrami and Leventhal (1979)
through Perry, Magnusson, Parsonson and Dickens (1986). The conclusions of the research were
that expressiveness alone does not enhance achievement but can influence ratings of specific
presentation skills, and that in combination with appropriate content it can positively influence
both ratings and achievement.
THE PUBLIC DEBATE ON STUDENT RATINGS
Discussion of ratings issues has continued, and perhaps has even
been unduly influenced by recent publications. For example, Johnson
(2003) has supported a proposal at Duke University (see Gose, 1997)
whereby class grade profiles would be used to rate classes so that
the grades students received could then be given more or less weight
in a calculation of the GPA. Such reaction and over-reaction seem
based primarily on assumptions that grade inflation has been caused
by the use of ratings and by a focus on learners as customers
or consumers. The language of complaints almost always includes
these issues in a simplistic way without crediting established findings
(e.g., Cohen, 1981; Marsh, 1987) or taking account of the larger
picture of improving evaluation and teaching in complementary ways
(e.g., Theall & Franklin, 1991).
Many recent publications are based on one-time and/or small-sample studies that vary substantially from methodologically accepted
practice (e.g., Williams & Ceci, 1997), many include ratings issues in
work from other disciplinary perspectives (e.g., Hamermesh & Parker,
2003), many are more opinion pieces than specific research on ratings
(e.g., Trout, 2000), and few are by researchers whose primary emphasis
has been faculty evaluation or student ratings (a description that applies to all of the above-cited items). Many of these pieces have become well
known by virtue of the interest of widely distributed publications
(e.g., Academe, The Chronicle of Higher Education) in the controversy
surrounding ratings.
One partial exception was the substantial work by Greenwald and
Gillmore (1997a, 1997b) that culminated in a “Current Issues” section
of American Psychologist (Vol. 52, No. 11) devoted to the topic of grade
inflation and ratings. That was followed by an AERA symposium on the
same topic. While there was considerable disagreement with Greenwald
and Gillmore’s contention that grading leniency was a “contaminant” of
ratings to be statistically corrected, the point is that the work included
a series of studies using a substantial database, and it was followed by
an extended debate on the work and its conclusions by experienced
ratings researchers. Nonetheless, the Chronicle published a lengthy
article (Wilson, 1998) that contained errors serious enough to attract
critical letters from many researchers who were quoted, including
Gerald Gillmore himself.
This over-emphasis on criticisms of ratings has led to another
problem: the propagation of the criticisms themselves as a separate and
widely believed mythology about ratings. Arreola (2005a) maintains
that these myths have become a kind of “common knowledge, so
pervasive that it far overshadows the ‘truth’ concerning student ratings
and other faculty evaluation tools buried in the pages of psychometric
journals” (p. 1). This pattern has even progressed to the point where
writers critical of ratings (e.g., Johnson, 2003) refer to well-established
and often-replicated ratings findings as “myths.” Not surprisingly, there
has been criticism of Johnson’s book from within the community of
ratings researchers and practitioners (e.g., Perry, 2004). One implication of Johnson’s notoriety is that experienced evaluation and ratings
researchers need to do a better job of putting their findings before two
critical audiences: the faculty and administrators who use these data
(Arreola, 2005a, 2005b).
A POSITIVE SHIFT IN EMPHASIS: RESEARCH INFORMING
PRACTICE
Apart from the public debate on student ratings, there has been a
shift in emphasis in recent years in the study and consideration of
student ratings. Ratings researchers, writers, and practitioners have
tended to move from necessary but sometimes narrow psychometric
investigations concerned with validity and reliability of ratings, to the
application of evaluation and ratings research to practice. Beginning
as early as Theall and Franklin (1990a), through updates of previous
work, this pattern has led to detailed descriptions of, and guidelines
for, the development of “comprehensive evaluation systems” (Arreola,
2000). As recently as the 2005 meeting of the American Educational
Research Association, the question, “Valid Faculty Evaluation Data:
Are There Any?” was raised with an emphasis on improving evaluation
practice rather than establishing or re-establishing the purely technical
validity and reliability of ratings.
To a large extent this stream of thinking echoes Feldman’s
(1998) emphasis on the importance of a “continuing quest” (when
analyzing the correlates and determinants of effective instruction) for
“establishing the conditions or contexts under which relationships become stronger or weaker or change in some other way. . . . The quest calls attention to determining the importance of ‘interaction effects’ as well as ‘main effects’” (p. 36). The context in which evaluation
takes place has been shown to have a potentially serious effect on the
way that ratings data can be both interpreted and used. For example,
Franklin and Theall (1993) found gender differences in ratings in
certain academic departments. Although women had lower average
ratings, further exploration showed that in those departments women
had been disproportionately assigned to teach large, introductory,
required classes—those where teachers in general might be expected
to have lower ratings. Replication of the study at another institution
where course assignments were equally distributed found no gender
differences. The information available to faculty and administrators
rarely includes analysis that goes beyond mean scores or averages;
thus, contextual subtleties are lost, misinterpretation is more likely,
and integration of ratings research with practice is hindered.
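To make the aggregation point concrete, the following is a minimal, purely hypothetical sketch in Python; the numbers are invented for illustration and are not Franklin and Theall's data. It shows how an overall gap in mean ratings can appear solely because one group is assigned more of the large, introductory, required classes, and how the apparent gap disappears once ratings are disaggregated by course type.

# Illustrative only: invented numbers, not data from Franklin and Theall (1993).
# Each record: (instructor group, course type, mean rating for that class).
classes = [
    ("women", "large introductory required", 3.6),
    ("women", "large introductory required", 3.5),
    ("women", "large introductory required", 3.7),
    ("women", "small upper-level elective",  4.4),
    ("men",   "large introductory required", 3.6),
    ("men",   "small upper-level elective",  4.5),
    ("men",   "small upper-level elective",  4.4),
    ("men",   "small upper-level elective",  4.3),
]

def mean(values):
    return sum(values) / len(values)

# Overall means suggest a gender gap...
for group in ("women", "men"):
    overall = mean([r for g, _, r in classes if g == group])
    print(f"{group}: overall mean rating = {overall:.2f}")

# ...but within each course type the "gap" vanishes, because the overall
# difference reflects who was assigned to which kind of class.
for course in ("large introductory required", "small upper-level elective"):
    for group in ("women", "men"):
        ratings = [r for g, c, r in classes if g == group and c == course]
        if ratings:
            print(f"{course} / {group}: mean = {mean(ratings):.2f} (n = {len(ratings)})")

The sketch simply restates the finding above in computational terms: a report that stops at overall means invites precisely the misinterpretation that contextual analysis is meant to prevent.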
Another contextual factor that can influence ratings is institutional
type as exemplified, say, by the different emphases and operations of
community colleges and research universities described by Birnbaum
(1988). Such differences can affect the perceptions of faculty, students,
and administrators at these institutions, thus influencing the expectations held for faculty work and the definitions of “exemplary teaching.”
Contextual differences can further occur across disciplines in average
ratings of teachers and courses (Cashin, 1990); in instructional choices
of faculty (Franklin & Theall, 1992); in their effects on students’ assimilation into the disciplines (Smart, Feldman, and Ethington, 2000);
and in the extent to which teachers communicate expectations about
course work (Franklin & Theall, 1995).
IMPROVING THE PRACTICE OF RATING TEACHERS
AND INSTRUCTION
In the past half-dozen years or so, there have been several new attempts
to improve ratings and evaluation practice. Perhaps the most focused
work is by Arreola (2000), who describes a detailed process for “Developing a Comprehensive Faculty Evaluation System.” Arreola outlines
an eight-step process that can be used to generate institutional dialogue
on issues that need to be discussed before any evaluation or ratings
process is begun. Theall and Franklin (1990b) proposed that ratings are
only one part of “complex evaluation systems,” and Arreola’s (2000)
process outline remains the only articulated approach that takes into
account and deals with the contextual issues that greatly influence
evaluation and ratings practice on a campus-by-campus basis.
Arreola has not been alone in pressing for improved practice.
Indeed, no fewer than six volumes of the Jossey Bass “New Directions” series have been devoted to ratings issues since Feldman’s
(1997) chapter was published. The first contribution to this extended
discussion was from Ryan (2000), who proposed a “Vision for
the Future” based not only on sound measurement, but on
“philosophical issues that need to be addressed if faculty evaluation is to receive the respect and attention it deserves” (back page,
“From the Editor”). Theall, Abrami and Mets (2001) asked about
ratings, “Are they valid? How can we best use them?” Included in
their volume are chapters reminiscent of the depth and extent of
earlier exemplary dialogue and debate in the field (noted earlier in the
present commentary). Lewis (2001) edited a volume concentrating on
“Techniques and Strategies for Interpreting Student Evaluations.” In
particular, this set of articles connects issues of accountability to the
faculty evaluation process and considers ratings as existing within the
context of department, college, and institutional imperatives (and as
needing to be responsive to these pressures). This volume was immediately followed by one (Knapper & Cranton, 2001) presenting “Fresh
Approaches to the Evaluation of Teaching.” Colbeck (2002) took an
appropriately broad view of “Evaluating Faculty Performance,” noting
that “Forces for change within and outside academe are modifying
faculty work and the way that work is—or should be—evaluated”
(p. 1). Finally, Johnson and Sorenson (2004) presented a specific
discussion of a new aspect of the ratings and evaluation picture: the
rapidly increasing use of on-line systems. Acknowledging that this
rapid growth is occurring “even amidst challenges and doubt” (p. 1),
they and other contributors present a balanced review of the advantages
and disadvantages of on-line systems.3
BEYOND RATINGS AS EVIDENCE OF EXEMPLARY
TEACHING: ENHANCING FACULTY CAREERS
It can be argued that college teaching and learning, evaluations of
teachers, and higher education itself have changed to the point where
it is no longer reasonable or prudent to consider student ratings of
teaching effectiveness without also considering the context in which they occur. These ratings are or should be embedded in processes (faculty evaluation and development) that are connected to department and college issues (staffing, funding, competition for resources), institutional issues (assessment, accreditation, reputation) and other matters that extend beyond campus (public support, legislation, accountability). A systematic and ecological approach to evaluation and ratings is needed because effective practice is not possible without consideration of (and coordination with) these other issues.
3
Although there is space only to list references here, in the past ten years or so various volumes of Higher Education: Handbook of Theory and Research have published articles dealing with evaluation of teaching, teaching effectiveness, and improvement in instructional practices: see, for example, Boice (1997), Feldman (1998), Murray (2001), Cashin (2003), and Centra (2004).
Recent work (Arreola, 2005b, 2005c; Arreola, Theall, & Aleamoni,
2003; Theall, 2002)4 has expanded on past approaches and incorporated a wider view that encompasses past literature on faculty evaluation and development, the professoriate, business, science, communications, and reaction to change, as well as new discussions of how
contemporary changes and forces are affecting higher education and
the professoriate (e.g., Hersh & Merrow, 2005; Newman, Couturier, &
Scurry, 2004).
Defining the professoriate as requiring in-depth expertise in a
disciplinary or “base profession” as well as professional skills in
several other “meta-professional” areas, Arreola, Theall and Aleamoni
(2003) have developed a two-dimensional matrix that arrays four
faculty roles (Teaching, Scholarly and Creative Activities, Service, and
Administration) against three base-profession skills (content expertise,
practice/clinical skills, and research techniques), and twenty meta-professional skill sets (e.g., instructional design skills, group process
and team-building skills, public speaking skills). The frequency of need
for each skill-by-role cell in the matrix is indicated by color-coding.
Five additional matrices are provided, in which the roles are broken
down into component parts. For example, the “teaching” role has seven
contextual components ranging from traditional classroom situations
to on-line and distance learning. Scholarly and Creative Activities,
Service, and Administration are also broken down into their contextual
components, and a special matrix demonstrates how Boyer’s (1990)
“scholarship of teaching (and learning)” presents a special case of
meta-professional requirements.
4
The “Meta-Profession Project” is an ongoing effort to improve practice in faculty evaluation
and development. It is based on the analysis of the roles faculty are required to fill and
the skills necessary to successfully carry out role responsibilities. The basic roles and skill
sets are displayed in a series of two-dimensional matrices available at the project website at
http://www.cedanet.com/meta. The site also contains an overview of the concept and project,
copies of various papers and publications, and related information.
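As a purely hypothetical illustration of the kind of structure being described, a roles-by-skills matrix might be sketched in Python as follows; the role names follow the text above, but the skills shown and the frequency labels are invented placeholders, not the Meta-Profession Project's actual matrices or color codings (those are available at the project website cited in the footnote above).

# Hypothetical sketch only: placeholder skills and frequencies, not the
# Meta-Profession Project's actual matrices (see http://www.cedanet.com/meta).
ROLES = [
    "Teaching",
    "Scholarly and Creative Activities",
    "Service",
    "Administration",
]

SKILLS = [
    "content expertise",                 # base-profession skill
    "practice/clinical skills",          # base-profession skill
    "research techniques",               # base-profession skill
    "instructional design",              # meta-professional skill (one of twenty)
    "group process and team building",   # meta-professional skill
    "public speaking",                   # meta-professional skill
]

# matrix[(role, skill)] = how often the skill is needed in that role; simple
# labels stand in for the color coding of the original matrices.
matrix = {
    ("Teaching", "content expertise"): "frequent",
    ("Teaching", "instructional design"): "frequent",
    ("Teaching", "public speaking"): "frequent",
    ("Scholarly and Creative Activities", "research techniques"): "frequent",
    ("Service", "group process and team building"): "occasional",
    ("Administration", "group process and team building"): "frequent",
}

def skills_needed(role, frequency="frequent"):
    """Return the skills the matrix marks at a given frequency for one role."""
    return [s for (r, s), f in matrix.items() if r == role and f == frequency]

print(skills_needed("Teaching"))
# ['content expertise', 'instructional design', 'public speaking']

The additional matrices described above, which break each role into its contextual components (for example, the seven teaching contexts), could be represented in the same way.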
The conceptualization of the meta-profession and the matrices
provide a framework for exploring the nature and demands of faculty
work on a campus-by-campus basis and thus for improving practice in
faculty evaluation and development. Similarly, this exploration can lead
to improved policy and provide numerous opportunities to investigate
faculty work on a broad scale, particularly as it is affected by variables
such as institutional type, individual and campus demographics, and
changes in prevailing economic and other conditions.
A FINAL COMMENT
Clearly, various psychometric issues (including reliability and validity)
are important to the study and improvement of student ratings
(Feldman, 1998). Even so, and despite these technical requirements,
faculty evaluation and the use of student ratings involve more
than psychometric issues; important professional, political, social,
personnel, and personal issues also come into play. Recent years have
seen the potential threat of a seemingly endless and unproductive
debate on reliability and validity issues—unproductive in the sense
that what has been established in over fifty years of substantial research
has been largely ignored for reasons that include administrative convenience, ignorance, personal biases, suspicion, fear, and the occasional
hostility that surrounds any evaluative process.
Evaluation and student ratings are unlikely to improve until
practice is based on a careful and accurate analysis of the work
required of faculty, the skills required to do that work, and the
levels of performance expected. Further, good practice requires the
creation of complete systems that acknowledge the absolute need
to blend professional and faculty development resources with those
necessary for fair and equitable faculty evaluation. Student ratings
form a part of this picture, but too often have been inappropriately
employed with the result that there has been a disproportionate amount
of attention, debate and dissension, accompanied by misinformation
based on questionable research, a ratings mythology, and the propagation of a general sense that the use of ratings somehow demeans the
teaching profession. To the extent that this negative feeling contains a degree of truth rooted in poor practice, improving practice becomes a critical agenda.
References
Abrami, P.C., d’Apollonia, S., and Rosenfield, S. (1996). The dimensionality of student
ratings of instruction: What we know and what we do not. In J.C. Smart (ed.),
Higher Education: Handbook of Theory and Research. New York: Agathon Press.
Aleamoni, L.M. (1987). Student rating myths versus research facts. Journal of Personnel
Evaluation in Education 1: 111–119.
Arreola, R.A. (2000). Developing a Comprehensive Faculty Evaluation System (2nd
edition). Bolton, MA: Anker Publishing Company.
Arreola, R.A. (2005a). Validity, like beauty is… Paper presented at the annual meeting of the American Educational Research Association. (Available at: http://www.cedanet.com/meta/ and http://www.umanitoba.ca/uts/sigfted/backissues.php)
Arreola, R.A. (2005b). Crossing over to the dark side: Translating research in faculty
evaluation into academic policy and practice. Invited address presented at the
annual meeting of the American Educational Research Association. (Available at:
http://www.cedanet.com/meta/)
Arreola, R.A. (2005c). The monster at the foot of the bed. To Improve the Academy
24: 15–28.
Arreola, R.A., Theall, M., and Aleamoni, L.M. (2003). Beyond scholarship: Recognizing the multiple roles of the professoriate. Paper presented at the annual
meeting of the American Educational Research Association. (Available at:
http://www.cedanet.com/meta/)
Birnbaum, R. (1988). How Colleges Work. San Francisco: Jossey Bass.
Boice, B. (1997). What discourages research-practitioners in faculty development.
In J.C. Smart (ed.), Higher Education: Handbook of Theory and Research (Vol. 12).
New York: Agathon Press.
Boyer, E.L. (1990). Scholarship Reconsidered. San Francisco: Jossey Bass.
Campbell, J.R. (2000). Dry Rot in the Ivory Tower. Lanham, MD: University Press of
America.
Cashin, W.E. (1990). Students do rate different academic fields differently. In M. Theall
and J. Franklin (eds.), Student Ratings of Instruction: Issues for Improving Practice
(New Directions for Teaching and Learning No. 43). San Francisco: Jossey Bass.
Cashin, W.E. (2003). Evaluating college and university teaching: Reflections of a
practitioner. In J.C. Smart (ed.), Higher Education: Handbook of Theory and Research
(Vol. 18). Norwell, MA: Kluwer Academic Publishers.
Centra, J.A. (1972). The student instructional report: Its development and uses.
(Student Instructional Report No. 1). Princeton, NJ: Educational Testing Service.
Centra, J.A. (1979). Determining Faculty Effectiveness: Assessing Teaching, Research, and
Service for Personnel Decisions and Improvement. San Francisco: Jossey Bass.
Centra, J.A. (2004). Service through research: My life in higher education. In J.C. Smart
(ed.), Higher Education: Handbook of Theory and Research (Vol. 19). Norwell, MA:
Kluwer Academic Publishers.
Cohen, P.A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research 51: 281–309.
Colbeck, C.L. (ed.) (2002). Evaluating Faculty Performance (New Directions for
Institutional Research No. 114). San Francisco: Jossey Bass.
Cranton, P.A., and Knoop, R. (1991). Incorporating job satisfaction into a model of
instructional effectiveness. In M. Theall and J. Franklin (eds.), Effective Practices
for Improving Teaching (New Directions for Teaching and Learning No. 48).
San Francisco: Jossey Bass.
Doyle, K.O. (1975). Student Evaluation of Instruction. Lexington, MA: D. C. Heath.
Feldman, K.A. (1989). The association between student ratings of specific instructional
dimensions and student achievement: Refining and extending the synthesis of data
from multisection validity studies. Research in Higher Education 30: 583–645.
Feldman, K.A. (1997). Identifying exemplary teachers and teaching: Evidence from
student ratings. In R.P. Perry and J.C. Smart (eds.), Effective Teaching in Higher
Education: Research and Practice. New York: Agathon Press.
Feldman, K.A. (1998). Reflections on the effective study of college teaching and student
ratings: One continuing quest and two unresolved issues. In J.C. Smart (ed.), Higher
Education: Handbook of Theory and Research (Vol. 13). New York: Agathon Press.
Franklin, J., and Theall, M. (1992). Disciplinary differences, instructional goals and
activities, measures of student performance, and student ratings of instruction. Paper
presented at the annual meeting of the American Educational Research Association.
(ERIC Document Reproduction Service No. ED 346 786)
Franklin, J., and Theall, M. (1993). Student ratings of instruction and gender
differences revisited. Paper presented at the annual meeting of the American Educational Research Association.
Franklin, J., and Theall, M. (1995). The relationship of disciplinary differences and the
value of class preparation time to student ratings of instruction. In N. Hativa and
M. Marincovich (eds.), Disciplinary Differences in Teaching and Learning: Implications
for Practice (New Directions for Teaching and Learning No. 64). San Francisco:
Jossey-Bass.
Gose, B. (1997). Duke may shift grading system to reward students who take
challenging classes. The Chronicle of Higher Education February 14: A43.
Greenwald, A.G., and Gillmore, G.M. (1997a). Grading leniency is a removable
contaminant of student ratings. American Psychologist 52: 1209–1217.
Greenwald, A.G., and Gillmore, G.M. (1997b). No pain, no gain? The importance of
measuring course workload in student ratings of instruction. Journal of Educational
Psychology 89: 743–751.
Hamermesh, D., and Parker, A.M. (2003). Beauty in the classroom: Professors’
pulchritude and putative pedagogical productivity. (NBER Working Paper No.
W9853). Austin, TX: University of Texas at Austin. Abstract available at:
http://ssrn.com/abstract=425589
Hersh, R.H., and Merrow, J. (2005). Declining by Degrees: Higher Education at Risk.
New York: Palgrave Macmillan.
Johnson, T., and Sorenson, L. (2004). Online Student Ratings of Instruction (New Directions for Teaching and Learning No. 96). San Francisco: Jossey Bass.
Johnson, V. (2003). Grade Inflation: A Crisis in College Education. New York: Springer-Verlag.
Knapper, C., and Cranton, P.A. (eds.) (2001). Fresh Approaches to the Evaluation
of Teaching (New Directions for Teaching and Learning No. 88). San Francisco:
Jossey Bass.
Kuh, G.D., Kinzie, J., Schuh, J.H., Whitt, E.J., and Associates (2005). Student Success
in College: Creating Conditions That Matter. San Francisco: Jossey-Bass.
Lewis, K.G. (ed.) (2001). Techniques and Strategies for Interpreting Student Evaluations
(New Directions for Teaching and Learning No. 87). San Francisco: Jossey Bass.
Marsh, H.W. (1984). Students’ evaluations of university teaching: Dimensionality,
reliability, validity, potential biases, and utility. Journal of Educational Psychology 76:
707–754.
Marsh, H.W. (1987). Students’ evaluations of university teaching: Research findings,
methodological issues, and directions for future research. International Journal of
Educational Research 11: 253–388.
Murray, H.G. (2001). Low-inference teaching behaviors and college teaching effectiveness: Recent developments and controversies. In J.C. Smart (ed.), Higher
Education: Handbook of Theory and Research (Vol. 16). New York: Agathon Press.
Naftulin, D.H., Ware, J.E., and Donnelly, F.A. (1973). The Dr. Fox lecture: A paradigm
of educational seduction. Journal of Medical Education 48: 630–635.
Newman, F., Couturier, L., and Scurry, J. (2004). The Future of Higher Education:
Rhetoric, Reality, and the Risks of the Market. San Francisco: Jossey Bass.
Pascarella, E.T., and Terenzini, P.T. (2005). How College Affects Students, Volume 2:
A Third Decade of Research. San Francisco: Jossey Bass.
Perry, R.P. (2004). Review of V. Johnson’s Grade Inflation: A Crisis in College
Education. Academe January-February: 10–13.
Perry, R.P., Abrami, P.C., and Leventhal, L. (1979). Educational seduction: The effect
of instructor expressiveness and lecture content on student ratings and achievement.
Journal of Educational Psychology 71: 107–116.
Perry, R.P., Magnusson, J.L., Parsonson, K.L., and Dickens, W.J. (1986). Perceived
control in the college classroom: Limitations in instructor expressiveness due to
non-contingent feedback and lecture content. Journal of Educational Psychology 78:
96–107.
Ryan, K.E. (ed.) (2000). Evaluating Teaching in Higher Education: A Vision of the Future
(New Directions for Teaching and Learning No. 83). San Francisco: Jossey Bass.
Smart, J.C., Feldman, K.A., and Ethington, C.A. (2000). Academic Disciplines: Holland’s
Theory and the Study of College Students and Faculty. Nashville, TN: Vanderbilt
University Press.
Theall, M. (2002). Leadership in faculty evaluation and development: Some thoughts
on why and how the meta-profession can control its own destiny. Invited address at
the annual meeting of the American Educational Research Association. (Available
at: http://www.cedanet.com/meta/)
Theall, M., Abrami, P.C., and Mets, L.A. (eds.) (2001). The Student Ratings Debate: Are
They Valid? How Can We Best Use Them? (New Directions for Institutional Research
No. 109). San Francisco: Jossey Bass.
Theall, M., and Franklin, J.L. (eds.) (1990a). Student Ratings of Instruction: Issues
for Improving Practice (New Directions for Teaching and Learning No. 43).
San Francisco: Jossey Bass.
Theall, M., and Franklin, J.L. (1990b). Student ratings in the context of complex
evaluation systems. In M. Theall and J. Franklin (eds.), Student Ratings of Instruction:
Issues for Improving Practice (New Directions for Teaching and Learning No. 43).
San Francisco: Jossey Bass.
Theall, M., and Franklin, J. (eds.) (1991). Effective Practices for Improving Teaching
(New Directions for Teaching and Learning No. 48). San Francisco: Jossey Bass.
Trout, P. (2000). Flunking the test: The dismal record of student evaluations. Academe
July-August: 58–61.
Williams, W.M., and Ceci, S.J. (1997). How’m I doing?: Problems with student ratings
of instructors and courses. Change 29(5): 13–23.
Wilson, R. (1998). New research casts doubt on value of student evaluations of
professors. The Chronicle of Higher Education January 16: A1, A16.