Student Perceptions of High Course Workloads are Not Associated with Poor Student Evaluations of Instructor Performance
KAY C. DEE
Department of Applied Biology and Biomedical Engineering
Rose-Hulman Institute of Technology
ABSTRACT
Many engineering faculty believe that when students perceive a
course to have a high workload, students will rate the course and
the performance of the course instructor poorly. This belief can be
particularly worrying to engineering faculty since engineering
courses are often perceived as uniquely demanding. The present
investigation demonstrated that student ratings of workload and
of overall instructor performance in engineering courses were not
correlated (e.g., Spearman’s rho = 0.068) in data sets from either
of two institutions. In contrast, a number of evaluation items were
strongly correlated (Spearman’s rho = 0.7 to 0.899) with ratings
of overall instructor performance across engineering, mathematics and science, and humanities courses. The results of the present
study provide motivation for faculty seeking to improve their
teaching and course evaluations to focus on teaching methods,
organization/preparation, and interactions with students, rather
than course workload.
Keywords: student evaluations, student workload, teaching
evaluations
I. INTRODUCTION
Student evaluations of teaching are frequently used in faculty
performance reviews, often with a focus on one or two numerically-coded, broadly-phrased evaluation items (e.g., “Overall, the instructor’s performance was:” rated on a scale of 1 to 5). Evaluation
scores, and factors that may affect these scores, can therefore be a
source of worry for instructors. For example, many faculty members
believe that when students perceive a course to have a high workload, students rate that course, and the performance of the instructor, poorly. The corollary belief—that students give favorable
course and instructor ratings when course workload is low—is also
common. In one study, 54 percent of faculty (but only 26 percent of
students) agreed or strongly agreed with the statement “To get favorable evaluations, professors demand less from students” [1].
A number of published studies have shown little to no relationship between student ratings of course workload and overall course
quality or instructor performance (e.g., [2–5]). Why, then, do beliefs relating workload and teaching evaluations persist among engineering faculty? One reason may be that many published studies
[2–5] utilized data from a variety of course types rather than specifically engineering students/courses. Engineering courses are often
held to be uniquely demanding, and engineering students are often
considered to be a unique type of audience. Evidence that students
do or do not rate engineering courses differently than other types of
courses would provide guidance in applying research findings from
courses in other disciplines to those in engineering. Some studies
[6, 7] have shown little to no relationship between student ratings
of engineering course workload and overall course/instructor ratings, but engineering faculty may be reluctant to accept either results from an individual campus system and culture [7] or from a
very large pool of multiple types of campuses and cultures [6] as applicable to their specific situation. Results from only one institution
may seem too specific; results from a large pool of institutions may
seem too broadly homogenized. Evidence that students do or do
not rate engineering courses/instructors differently at different institutions would provide guidance regarding the generalizability of
published results from different study populations.
The present investigation sought to address some of the informational needs of engineering faculty who are seeking to understand and improve their course/teaching evaluations. First, data
from engineering courses at two different types of institutions were
examined for evidence of a relationship between student perceptions of engineering course workload and evaluations of overall instructor effectiveness. Second, data from multiple types of courses
were used to investigate whether students appear to evaluate engineering courses/instructors and mathematics, science, and humanities courses/instructors differently. Third, the collected data were
examined to identify trends that may be helpful to engineering faculty working to improve their teaching and course evaluations.
II. METHODS
A. Data Collection
Data from two schools were used in this study. The School of
Engineering at Tulane University was, between 1997 and 2002, a
relatively small unit within a large doctoral-granting research university: roughly 700 of the approximately 6500 undergraduates were
engineering majors (data are from 1997 to 2002). Engineering students at Tulane interacted
regularly with peers from different academic areas (liberal arts, business, etc.). In this situation, faculty worry that students will anecdotally compare workloads across majors, perceive their engineering
workload as too high, and punish engineering instructors with poor
course/teaching evaluations. In contrast, Rose-Hulman Institute of
Technology is a small school, strongly focused on undergraduate
engineering and science education. At present, roughly 84 percent
of Rose-Hulman’s approximately 1900 students (M.S. and B.S.
only) major in some form of engineering. In this situation, faculty
worry that career-oriented students may perceive less value in completing work for non-engineering courses, and may rate non-engineering courses differently from those directly related to an engineering major. Because the environments and campus cultures of
Tulane and Rose-Hulman are different, common trends in student
evaluations of teaching across the two data sets may be more
generally-applicable than trends discerned from one campus population only.
Course-averaged evaluation scores for all classes offered at Rose-Hulman Institute of Technology during the 2004–2005 academic
year were obtained from the Rose-Hulman Office of Institutional
Research, Planning, and Assessment. These evaluations were administered electronically via a Web-based form. Students had one week
to complete evaluations—the final week of classes, prior to the final
exam period each academic quarter. Table 1 shows the set of numerically-rated evaluation items with the applicable response scales; spaces
were provided for students to type additional or explanatory comments as well. If fewer than five students submitted evaluations for a
course, no data from that course were used for analyses. Evaluations
of military science (i.e., Reserve Officers’ Training Corps) courses and
courses designated as graduate-level only (i.e., graduate seminars)
were not used. The resulting data set consisted of information from
490 engineering courses, 390 mathematics and science courses, and
165 humanities courses. The response rates for the engineering,
mathematics and science, and humanities course evaluations used in
the present study were 78 ± 15%, 78 ± 15%, and 82 ± 13% (means ± standard deviations), respectively. Therefore, the data used
in the present study are likely a fair to good representation of the perceptions of students in these classes.
A second data set consisted of course-averaged evaluation
scores for all classes offered through the Tulane University
School of Engineering from the fall of 1997 to the fall of 2002.
These evaluations consisted of a 17-item “bubble sheet” paper
questionnaire, which asked students to indicate whole numbers
ranging from 1 to 5 to signify their level of agreement with a
given statement. The evaluations were administered in each
class by School of Engineering staff at the end of each semester,
prior to the final exam period. Course instructors left their
classrooms while evaluations were administered. Evaluation response rates were not calculated or tracked as part of the administrative process, but since these evaluations were administered
during a normal class period with anticipated normal attendance, the data used in the present study are likely to be a fair to
good representation of the perceptions of students in these
classes. Further information on the Tulane evaluation form
items and procedures can be found in reference [8]. If fewer
than five students submitted evaluations for a course, no data
from that course were used for analyses. The resulting data set
consisted of information from 823 courses offered through the
School of Engineering.
B. Data Analysis
Simple correlational analyses were chosen for the present study
since they provide easily-understandable verbal information (numerical coefficients, significance values) and visual information
(scatterplots, trendlines) accessible to a broad audience. The Pearson correlation coefficient is a common way to characterize the association between two variables; this parametric technique carries a
number of assumptions about the variance and distribution of the
data to be examined [9]. The nonparametric Spearman’s rho correlation is based on relative ranks of data rather than on the observed
numerical values of data, and does not depend on stringent assumptions about the shape of the population from which the observations were drawn [9].
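As a concrete illustration of the two measures, the following sketch computes both coefficients with scipy (the calculations in the present study were performed in SPSS and Excel, as noted below); the data frame and column names are hypothetical.

```python
# A minimal sketch comparing the parametric and nonparametric correlation
# measures described above. Column names and values are hypothetical.
import pandas as pd
from scipy import stats

# Hypothetical course-averaged evaluation scores: one row per course.
evals = pd.DataFrame({
    "workload":           [3.1, 3.4, 2.9, 3.8, 3.0, 3.5],
    "overall_instructor": [4.2, 3.1, 4.5, 3.9, 2.8, 4.0],
})

# Parametric Pearson correlation: assumes an approximately linear association
# and reasonably well-behaved score distributions.
r_pearson, p_pearson = stats.pearsonr(evals["workload"], evals["overall_instructor"])

# Nonparametric Spearman's rho: computed from ranks, so it does not rely on
# those distributional assumptions.
r_spearman, p_spearman = stats.spearmanr(evals["workload"], evals["overall_instructor"])

print(f"Pearson r     = {r_pearson:.3f} (p = {p_pearson:.3f})")
print(f"Spearman rho  = {r_spearman:.3f} (p = {p_spearman:.3f})")
```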
Pearson correlation coefficients and Spearman’s rho correlation
coefficients were first calculated from the Tulane data. Statistical
outliers, or courses with at least one course-averaged item score
more than three standard deviations away from the overall mean
score for that item, were identified and removed from the data set.
Pearson and Spearman’s correlation coefficients were then recalculated for this data set. The Pearson correlation coefficients
were compared with previously-reported Pearson correlation coefficients obtained from the same original data set after outliers were
removed, a natural log transform was applied to the scores, and z
scores were subsequently calculated from the log-transformed data
[8]. The transforms and z scores reduced skew and equalized variances, making the data fit better with the assumptions inherent in
the use of the parametric Pearson correlation (see reference [8] for
detailed data descriptors, discussion of model adequacy checking
and data transformations, and a complete set of inter-item correlations from the Tulane data). The (previously-reported) Pearson
correlations calculated after multiple data transformations and the
Pearson correlations calculated (in the present study) after the removal of statistical outliers were only slightly different (mean
change of 0.017 across all items, maximum change of 0.04 on any
one item) from Pearson correlations calculated from the original
data set with no data transformation or outlier removal. Spearman’s
rho correlation coefficients calculated after removing outliers were
only slightly different (mean change of 0.007 across all items, maximum change of 0.02 on any one item) from Spearman’s coefficients
calculated without removing outliers. Subsequent data analyses utilized solely Spearman’s rho correlation coefficients and original data
sets with no alterations or transformations.
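The outlier screen described above (removing any course with at least one course-averaged item score more than three standard deviations from that item's overall mean) can be sketched as follows; this is an illustrative reconstruction in pandas rather than the original workflow, and the data frame name is hypothetical.

```python
# A minimal sketch of the 3-standard-deviation outlier screen and the
# recalculation of Spearman's rho on the trimmed data set.
import pandas as pd

def drop_outlier_courses(evals: pd.DataFrame) -> pd.DataFrame:
    """Remove rows (courses) with any item score more than 3 SD from that item's mean."""
    z = (evals - evals.mean()) / evals.std(ddof=1)   # column-wise z scores
    return evals[(z.abs() <= 3).all(axis=1)]

def spearman_matrix(evals: pd.DataFrame) -> pd.DataFrame:
    """Spearman's rho for every pair of evaluation items."""
    return evals.corr(method="spearman")

# Usage (with a hypothetical data frame `course_evals` of course-averaged scores):
# trimmed = drop_outlier_courses(course_evals)
# print(spearman_matrix(trimmed).round(3))
```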
Spearman’s correlation coefficients were calculated from the
Rose-Hulman data for all courses, for engineering courses only, for
mathematics, science, and humanities courses only, for mathematics and science courses only, and for humanities courses only. Selected correlation coefficients were quantitatively compared using
Fisher’s Z statistic [10], a method of testing whether two population correlation coefficients are equal. Linear regressions were conducted on selected items from the Rose-Hulman data, and the
slope and intercept values from the regression lines were statistically
compared using t-tests [9]. Data from the Rose-Hulman engineering courses were sorted into quartiles according to the numerical
ratings of the overall instructor performance. In other words, courses were sorted in descending order of overall instructor performance
scores; courses in the top 25 percent of overall instructor performance scores were considered “highest quartile” courses and courses
in the bottom 25 percent of overall instructor performance scores
were considered “lowest quartile” courses. Mean scores on each evaluation item from courses in the highest and lowest quartiles (e.g., ostensibly viewed by students as the “best-taught” and “worst-taught” courses) were then compared using the Mann-Whitney test, a nonparametric way of determining whether two independent samples are from the same population [9].

Table 1. Evaluation items and response scales. Evaluations were administered electronically in a Web-based form. For each evaluation item, students clicked on a “radio button” corresponding to the numerical and text ratings shown in this table.
All calculations of correlation coefficients, linear and quadratic
regressions, and Mann-Whitney tests were conducted using SPSS
for Windows© (SPSS Inc.). Determination of statistical outliers,
quartiles and associated means, and comparisons of correlation coefficients and linear regression parameters were conducted using
Excel© (Microsoft Corporation).
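For readers wishing to reproduce the comparison of correlation coefficients, the sketch below implements a standard form of the Fisher's Z test for the equality of two independent population correlations; the numerical inputs are illustrative only and are not taken from the study's own tables.

```python
# A minimal sketch of a Fisher's Z comparison of two independent correlation
# coefficients (e.g., an engineering coefficient versus a mathematics/science/
# humanities coefficient). The example values below are illustrative only.
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Test H0: the two population correlation coefficients are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher's r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of z1 - z2
    z_stat = (z1 - z2) / se
    p_two_tailed = 2 * norm.sf(abs(z_stat))
    return z_stat, p_two_tailed

# Illustrative call with hypothetical coefficients and the sample sizes used here:
z_stat, p = compare_correlations(r1=0.88, n1=472, r2=0.89, n2=540)
print(f"Z = {z_stat:.2f}, two-tailed p = {p:.3f}")
```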
III. RESULTS
Figure 1 shows that Rose-Hulman student ratings of overall instructor performance were neither linearly nor quadratically related
to student ratings of course workload required in relation to other
courses of equal credit.

Figure 1. Ratings of overall instructor performance as a function of perceived relative workload. Axis scales are the same for each subframe within this figure. For engineering courses, n = 472; for mathematics and science courses, n = 390; for humanities courses, n = 165. Correlation coefficients shown (rS) are Spearman’s rho. Solid lines represent linear regression lines; dashed lines represent quadratic curve fits.

Figure 2 presents the means and standard
deviations of scores on all of the Rose-Hulman evaluation items, for
courses in the highest and lowest quartiles of ratings on the item assessing overall instructor performance. Examining Figure 2 in conjunction with Table 1 reveals that courses in the lowest quartile (i.e., with the poorest ratings of overall instructor performance) received different (p < 0.05, Mann-Whitney test) mean ratings from highest-quartile courses on all evaluation items except for items related to
the pace of the material and the workload relative to courses of equal
credit. Lowest-quartile courses and highest-quartile courses did not
receive significantly different workload ratings from students.
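The two analyses summarized above (the linear and quadratic fits of Figure 1, and the quartile comparison of Figure 2) can be sketched as follows; the column names, and the use of numpy/scipy rather than the SPSS/Excel workflow of the Methods section, are assumptions for illustration.

```python
# A minimal sketch of (1) linear and quadratic least-squares fits of overall
# instructor rating against perceived relative workload, and (2) a Mann-Whitney
# comparison of an item's ratings between the top and bottom quartiles of
# overall instructor ratings. Column names are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats

def fit_linear_and_quadratic(workload: pd.Series, overall: pd.Series):
    """Return (linear, quadratic) polynomial coefficients from least-squares fits."""
    linear = np.polyfit(workload, overall, deg=1)      # [slope, intercept]
    quadratic = np.polyfit(workload, overall, deg=2)   # [a, b, c]
    return linear, quadratic

def compare_quartiles(evals: pd.DataFrame, item: str,
                      overall_col: str = "overall_instructor"):
    """Mann-Whitney test of `item` between highest and lowest quartiles of the overall rating."""
    ranked = evals.sort_values(overall_col, ascending=False)
    q = len(ranked) // 4
    top, bottom = ranked.head(q), ranked.tail(q)       # highest and lowest quartiles
    u_stat, p_value = stats.mannwhitneyu(top[item], bottom[item],
                                         alternative="two-sided")
    return u_stat, p_value
```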
In contrast to the workload-related results, ratings on a number
of other Rose-Hulman evaluation items were strongly correlated
with ratings of overall instructor performance (Figure 3 and
Table 2). For example, the strength of agreement with statements
such as “The professor used teaching methods that helped me
learn,” “The professor met the stated course objectives,” and “The
professor generally was well-prepared for class” was strongly associated with better ratings of overall instructor performance (Figure 3).
Items most strongly correlated with overall instructor performance
ratings (Table 2) tended to focus on student perceptions of the professor’s teaching/presentation methods, preparation, sensitivity to
students and interest in the subject, and overall learning experience
and course quality. The items listed in Table 2 also yielded the
strongest correlations when considering only humanities courses
(with correlations ranging from 0.880 to 0.673) and when considering only mathematics and science courses (with correlations ranging
from 0.892 to 0.703), with some minor re-ordering and, in the case of humanities courses, the substitution of “Grading was objective
and impartial” for “The professor seemed genuinely interested in
teaching this subject.” Qualitatively similar results were obtained
from the Tulane data (Table 3), in which items most strongly correlated with overall instructor performance ratings tended to focus on
student perceptions of the professor’s teaching/presentation methods, interest in teaching and students, and overall learning experience. Some items on the Tulane evaluation were phrased similarly
to items on the Rose-Hulman form; a few of these “matched” items
yielded differing correlations with overall instructor performance
(Table 4).
Items most weakly associated with overall instructor performance
ratings (Table 5) tended to focus on the number or percent of responses to evaluation items and student perceptions of course pace,
workload, textbook, and coordination between laboratory exercises
and course materials. The items listed in Table 5 also yielded the
weakest correlations when considering only humanities courses (with
correlations ranging from 0.003 to 0.381) and when considering
only mathematics and science courses (with correlations ranging from 0.022 to 0.333), with some minor re-ordering. The correlations between the workload evaluation item and the overall instructor performance item were similar across engineering (0.07), mathematics and science (0.05), and humanities (0.08) courses.

Figure 2. Comparisons of evaluation ratings from engineering courses with the lowest- and highest-quartile ratings of overall instructor performance. All evaluation items were ranked on a numerical scale from 1 to 5; text descriptors of numerical rankings for each evaluation item are given in Table 1. Lowest-quartile courses received significantly (p < 0.01, two-tailed Mann-Whitney test) different ratings from highest-quartile courses on all evaluation items except those denoted by arrows (i.e., ratings of the pace of the material, and of the amount of work relative to other courses of the same credit). Data shown are mean ± one standard deviation; n = 118 for each quartile.
None of the correlation coefficients presented in Tables 2 or 5
and calculated using information from engineering courses were
significantly (p < 0.05) different from coefficients calculated for the
same evaluation item using information from mathematics, science,
and humanities courses. Differences between linear regression
slopes and intercepts from engineering and from mathematics, science, and humanities courses were only observed for two of the six
evaluation items most strongly correlated (Table 2) with overall instructor performance. The largest differences in slope and intercept
were observed for the item “The professor generally was well-prepared for class.” For this item, the regression slope from mathematics, science, and humanities courses (1.17) was higher (p < 0.01) than that from engineering courses (0.952); the intercept of the regression line from mathematics, science, and humanities courses (−1.04) was more negative (p < 0.01) than that from engineering courses (−0.10).
IV. DISCUSSION
The analyses conducted in the present study revealed a very
small correlation (0.07) between Rose-Hulman student ratings of
overall instructor performance and of course workload. This is in
agreement with a previous investigation of the data from Tulane
University, which detected a small correlation at best between students’ perceptions of course workload and instructor performance,
and which was unable to determine an even marginally-acceptable
descriptive relationship (either linear or quadratic) between workload and instructor performance evaluation items [8]. Marsh [3,
11] has reported the correlation of higher levels of course
workload/difficulty with higher student ratings of instructional
quality, citing, for example, a correlation coefficient of 0.16 between student ratings of course workload/difficulty and of the instructor overall [3]. Similarly, Chau and Hocevar [2] calculated a
mean correlation of 0.13 between the workload/difficulty factor
and all other factors of the Students’ Evaluations of Educational
Quality rating instrument. Furthermore, analysis of the Individual
Development and Educational Assessment system revealed low
correlations between ratings of “Amount of reading” and “Amount
of work in other (non-reading) assignments” and items regarding the instructor’s teaching procedures [5].

Figure 3. Scatterplots of overall instructor performance ratings versus the strongest-correlating evaluation items from all engineering, mathematics, science, and humanities courses. Correlation coefficients shown (rS) are Spearman’s rho. Solid lines represent linear regression lines. n = 1014 for all subframes within this figure.

These investigations [2–5]
utilized data from a variety of course types. Gall et al. have reported
a weak correlation (0.21) between student ratings of mechanical
engineering course workload and instructor performance [7].
Centra [6] has examined student ratings of instruction across multiple fields of study, including a group of engineering and technology courses from two- and four-year colleges. Combining level of
difficulty, workload, and course pace into a single “Difficulty/Workload” factor, Centra calculated a correlation of 0.06 between this factor and overall course evaluations [6], which agrees well with the results of the present study.
Centra [12] has also reported that courses in natural sciences,
mathematics, and engineering tend to receive lower course and instructor ratings than courses in other disciplines. In the present
study, instructor ratings for humanities courses (4.15 ± 0.44, mean ± standard deviation) were statistically (p < 0.01, Mann-Whitney test) higher than for either engineering or mathematics and science courses (3.90 ± 0.55 and 3.96 ± 0.57, respectively; means ± standard deviations), in agreement with Centra’s report
[12]. The largest differences observed in the present study between
regression line slopes and intercepts imply that engineering professors with the same overall instructor performance rating as mathematics, science, and humanities professors may be rated as slightly
more prepared for class—or that engineering instructors rated at the
same level of preparation as mathematics, science, and humanities
instructors, may receive slightly lower overall instructor performance
ratings. Further research could explore this observation. The preponderance of observations from the present study implies that the
population of students examined used similar standards to rate
courses in different disciplines, as evidenced by similar correlations
between the same teaching/course evaluation items and ratings of
overall instructor performance. A key result of the present study is
the observation of poor relationships—regardless of type of courses
or campus culture—between student ratings of instructor performance and course workload.
Definitions of “workload” may vary from student to student [13].
Different student or researcher definitions (i.e., assigned homework
only, or assigned homework plus hours spent studying) of course-associated work may account for some differences between studies
[6]. For example, Greenwald and Gillmore constructed a model
presenting expected grade as a direct mediator and workload as an
indirect mediator of course evaluation ratings, positing that grading
leniency influences student evaluations of teaching [14].

Table 2. Strongest correlations with ratings of overall instructor performance. All correlation coefficients (Spearman’s rho) ≥ 0.7 are reported. Response scales for evaluation items are given in Table 1; the item used to rate overall instructor performance was “Overall, how would you rate the professor’s performance in this course?” For engineering courses, n = 472; for mathematics, science, and humanities courses, n = 540. All correlations are statistically significant at the 0.01 (two-tailed) level. None of the engineering correlation coefficients for the evaluation items in this table were statistically different (at or below the 0.05, two-tailed level) from the related mathematics, science, or humanities coefficients.

A re-analysis
of Greenwald and Gillmore’s data [15] concluded that the relationship between expected grade and course evaluations could be eliminated by including perceived learning as a factor, and that workload
not associated with perceived learning (i.e., viewed by students as
unnecessary, excessive, etc.) had a negative effect on student evaluations of teaching [15]. Structural models which treat workload
viewed by students as valuable to learning (“good”) distinctly from
unneeded “bad” workload have shown that “good” workload is positively and “bad” workload is negatively associated with ratings of
overall teaching and perceived learning [4]. If the data sets presented
in Figure 1 were shaped like inverted letter U’s or V’s, this could have
provided evidence that in addition to viewing too much work as
“bad,” students view too little work as “bad” (i.e., not enough to help
them learn), and automatically give negative course/teaching evaluations in both cases. However, the present study does not provide
such evidence. The quadratic curve fits shown in Figure 1 are poor.
Furthermore, replotting the data shown in Figure 1, such that workload ratings that deviate to either side of the middle “About the
Same” rating are equivalently ranked, produces poor correlations and
curve fits similar to those displayed in Figure 1.
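A minimal sketch of that “folded” re-analysis, under the assumption of a 1-to-5 workload scale with “About the Same” at the midpoint, is shown below; the function and column names are hypothetical.

```python
# A minimal sketch of the "folded" re-analysis described above: workload ratings
# on either side of the midpoint ("About the Same" = 3 on a 1-to-5 scale) are
# treated as equivalent deviations, so a pure inverted-U relationship would show
# up as a strong negative correlation with overall instructor ratings.
from scipy import stats

def folded_workload_correlation(workload, overall, midpoint: float = 3.0):
    """Spearman's rho between |workload - midpoint| and the overall instructor rating."""
    deviation = abs(workload - midpoint)     # distance from the midpoint rating
    rho, p_value = stats.spearmanr(deviation, overall)
    return rho, p_value
```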
The main goal of the present study was not to define or discriminate between types of workloads, but rather to test the simple hypothesis that students perceive high workloads as “bad” and low
workloads as “good” and bias course/teaching evaluations accordingly. The correlations reported in the present study (and others)
between workload and instructor performance are small, and large
sample sizes (hundreds, as in the present study and others [3, 4], to
many thousands [2, 6], to over a hundred thousand [5]) allow small
effects that are not necessarily of practical importance [16] to be
detected with statistical significance. The results of the present
study indicate that for the real-world purpose of attempting to determine reasons for poor evaluations or to focus efforts to improve
evaluations, student perceptions of higher course workloads are not
simply associated with poorer student evaluations of instructor
performance.
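To illustrate this point about sample size, the sketch below applies a common large-sample significance test for a correlation coefficient to a fixed, practically negligible value of r at several hypothetical sample sizes.

```python
# A minimal sketch of how a fixed, small correlation becomes "statistically
# significant" only as the sample grows, using the common approximate test
#   t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
import math
from scipy.stats import t as t_dist

def correlation_p_value(r, n):
    """Approximate two-tailed p-value for a correlation r from a sample of size n."""
    t_stat = r * math.sqrt((n - 2) / (1 - r**2))
    return 2 * t_dist.sf(abs(t_stat), df=n - 2)

# Hypothetical sample sizes spanning the range discussed above:
for n in (100, 472, 10_000):
    print(f"r = 0.07, n = {n:>6}: p = {correlation_p_value(0.07, n):.4f}")
```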
In contrast to workload, some evaluation items (dealing with the
general areas of organization and preparation, teacher/student
interactions, and teaching methods that help students stay attentive
and learn) were easily and strongly associated with the overall
instructor performance item, across different populations of engineering students from different campus cultures and across different
disciplines within a given campus culture. These results concur with
other reports that ratings of teacher organization/preparation
[17–19] and instructor accessibility [7] are strongly associated with
overall course/instructor ratings. It is possible that: when instructors
use organization strategies that students understand combined with
teaching methods that help students stay attentive, students may
learn more; when students learn more, they may rate the quality of
their overall learning experience more highly; when instructors
create a learning environment for students that is organized, understandable, and supportive, students may rate the performance of instructors more highly. Certainly, correlation does not imply causation. However, the areas found in the present study to be strongly
associated with ratings of overall instructor performance are appropriate professional development topics for educators regardless of
their academic discipline, and there are many resources (e.g.,
[20–24]) for faculty seeking ideas.

Table 3. Strongest correlations with ratings of overall instructor performance—Tulane University School of Engineering evaluations. All correlation coefficients reported are Spearman’s rho. Response scales for evaluation items were a reverse Likert scale with five divisions ranging from “Strongly Agree” (1) to “Strongly Disagree” (5). The evaluation item used to rate overall instructor performance was “Overall, how would you rate the professor’s performance in this course?” and the response scale for this item was five divisions ranging from “Excellent” (1) to “Poor” (5). n = 822 except for items 1, 6, and 7, for which n = 821, 820, and 823, respectively. All correlations are statistically significant at the 0.01 level (two-tailed).

Table 4. Comparison of correlations calculated from similar evaluation items. All correlation coefficients reported are Spearman’s rho. Response scales for Rose-Hulman items are given in Table 1. Superscripts associated with Tulane evaluation items denote the response scales: (a) five divisions ranging from “Strongly Agree” (1) to “Strongly Disagree” (5); (b) five divisions ranging from “Excellent” (1) to “Poor” (5); (c) five divisions of “Definitely Too Little” (1), “Somewhat Too Little” (2), “About Right” (3), “Somewhat Too Much” (4), and “Definitely Too Much” (5). n = 472 for Rose-Hulman items and n = 822 for Tulane items, except for the items associated with delivery, accessibility, interest in teaching, and the text, for which n = 820, 822, 822, and 808, respectively. Shaded arrows denote a pair of correlations that are statistically different (p < 0.01, two-tailed); unfilled arrows denote a pair of correlations that are not statistically different.

Table 5. Weakest correlations with ratings of overall instructor performance. Evaluation Factors are aspects other than responses to items on the evaluation. All correlation coefficients reported are Spearman’s rho. Response scales for evaluation items are given in Table 1; the evaluation item used to rate overall instructor performance was “Overall, how would you rate the professor’s performance in this course?” For engineering courses, n = 472 except for the laboratory assignment item (n = 448) and the textbook item (n = 447). For mathematics, science, and humanities courses, n = 540 except for the laboratory assignment item (n = 490) and the textbook item (n = 531). A marked correlation is statistically significant at the 0.05 level; the * symbol indicates that a correlation is statistically significant at the 0.01 level. None of the engineering correlation coefficients for the evaluation items in this table were statistically different (at or below the 0.05, two-tailed level) from the related mathematics, science, or humanities coefficients.

Not every idea in the educational literature will fit every instructor’s style, courses, or goals; it is better
for an instructor to try a few small changes that feel appropriate and
could be sustainable over time rather than to attempt a drastic revision of style and/or substance. For example, here are two small ideas
that can be applicable across a broad range of instructor styles and
pedagogical goals:
1) Tell your students what you are doing and why. As the course
progresses, briefly explain: your choice of teaching methods;
the design of assignments; the organization of the material;
your expectations for student learning and performance; that
you are interested in helping students learn.
2) Seek and use formative feedback. Make at least one opportunity early in the course (say, after the first three weeks) for
students to anonymously give you (a) ideas of things that are
going well, and (b) things that could be changed. Choose at
least one thing students believe to be going well, tell students
that you will continue that practice, and then do so. Choose
at least one thing that students have suggested as a potential
change, thank students for their ideas, describe the change
you will make, and then make the change. Consider seeking
additional student feedback on the impact of the change after
two or three more weeks.
Communicating with the class should improve their understanding of your organization and preparation, your approach to
teacher/student interactions, and your choice of teaching methods.
Seeking and using formative feedback is not only a clear demonstration of considering and meeting students’ needs, it will give you
a chance to improve the course prior to the final summative
evaluations.
V. CONCLUSION
Hundreds of studies have been conducted on student evaluations
of teaching. As quoted from reference [11]: “Probably, students’ evaluations of teaching effectiveness are the most thoroughly studied of all
forms of personnel evaluation, and one of the best in terms of being
supported by empirical research.” The present study assumes that student evaluations reflect student opinions reliably, validly, and usefully
[11, 17, 18]. This investigation did not attempt to address a number of
questions that could be asked about student evaluations of teaching,
such as whether factors about the instructor or course [3, 17, 18, 23,
25] beyond the criteria on evaluation forms affect ratings, whether
characteristics of the student population [25] affect ratings, what other
types of evaluations or assessments can or should be used for faculty
performance reviews, etc. This study sought to determine what, if any,
strong correlations could be found within teaching evaluation data because student evaluations of teaching are part of faculty performance
review systems at many institutions. Such review systems may prioritize numerical scores from one or a few general overall items (e.g., an
“overall instructor performance” item) as quantifiable and general descriptors of a multifaceted practice. Whether faculty agree with this
practice [26] or not [3, 17], faculty generally have to work within the
system (i.e., earn tenure/promotion) before they can lead efforts to
promote changes in the system. With this in mind, the present investigation provides motivation for faculty across academic disciplines and
campus cultures to focus on teaching methods that help students stay
attentive and learn, on organization and preparation, and on
teacher/student interactions, rather than on course workloads.
ACKNOWLEDGMENTS
I thank Mark Schawitsch of the Rose-Hulman Institutional
Research, Planning, and Assessment Office, for providing the
Rose-Hulman course evaluation data. I also thank Glen A. Livesay
for offering comments on and suggestions for this project, as well as
former colleagues at Tulane University for their interest in this work.
REFERENCES
[1] Sojka, J., Gupta, A.K., and D.R. Deeter-Schmelz, “Student and Faculty Perceptions of Student Evaluations of Teaching: A Study of Similarities
and Differences,” College Teaching, Vol. 50, No. 2, 2002, pp. 44–49.
[2] Chau, H., and D. Hocevar, “Higher-Order Factor Analysis of
Multidimensional Students’ Evaluations of Teaching Effectiveness,” presented at The Annual Conference of the American Educational Research
Association (AERA), New Orleans, Louisiana, 1994, obtainable via Educational Resources Information Center (ERIC).
[3] Marsh, H.W., “Students’ Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases, and Utility,” Journal of
Educational Psychology, Vol. 76, No. 5, 1984, pp. 707–754.
[4] Marsh, H.W., “Distinguishing Between Good (Useful) and Bad
Workloads on Students’ Evaluations of Teaching,” American Educational
Research Journal, Vol. 38, No. 1, 2001, pp. 183–212.
[5] Sixbury, G.R., and W.E. Cashin, “Description of Database for the
Idea Diagnostic Form,” IDEA Technical Report No. 9, 1995, Center for Faculty Evaluation & Development, Kansas State University, Manhattan, Kansas.
[6] Centra, J.A., “Will Teachers Receive Higher Student Evaluations
by Giving Higher Grades and Less Course Work?,” Research in Higher
Education, Vol. 44, No. 5, 2003, pp. 495–518.
[7] Gall, K., D.W. Knight, L.E. Carlson, and J.F. Sullivan, “Making
the Grade with Students: The Case for Accessibility,” Journal of Engineering Education, Vol. 92, No. 4, 2003, pp. 337–343.
[8] Dee, K.C., “Reducing the Workload in Your Class Won’t “Buy”
You Better Teaching Evaluation Scores: Re-Refutation of a Persistent
Myth,” Proceedings, 2004 American Society for Engineering Education Annual
Conference and Exposition, Salt Lake City, Utah: American Society for Engineering Education, 2004, Session 1331.
[9] Glantz, S.A., Primer of Biostatistics, 4th ed., New York, New York:
McGraw-Hill Health Professions Division, 1997.
[10] Mickey, R.M., O.J. Dunn, and V.A. Clark, Applied Statistics:
Analysis of Variance and Regression, 3rd ed., Hoboken, New Jersey: John
Wiley & Sons, Inc., 2004.
[11] Marsh, H.W., “Students’ Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research,”
International Journal of Educational Research, Vol. 11, 1987, pp. 253–388.
[12] Centra, J.A., Reflective Faculty Evaluation: Enhancing Teaching
and Determining Faculty Effectiveness, San Francisco, CA: Jossey-Bass
Publishers, 1993.
[13] Kember, D., “Interpreting Student Workload and the Factors
Which Shape Students’ Perceptions of Their Workload,” Studies in Higher
Education, Vol. 29, No. 2, 2004, pp. 165–184.
[14] Greenwald, A.G., and G.M. Gillmore, “No Pain, No Gain?
The Importance of Measuring Course Workload in Student Ratings of
Instruction,” Journal of Educational Psychology, Vol. 89, No. 4, 1997,
pp. 743–751.
[15] Marsh, H.W., “Effects of Grading Leniency and Low Workload
on Students’ Evaluations of Teaching: Popular Myth, Bias, Validity, or
Innocent Bystanders?,” Journal of Educational Psychology, Vol. 92, No. 1,
2000, pp. 202–228.
[16] Kirk, R.E., “Practical Significance: A Concept Whose Time Has
Come,” Educational and Psychological Measurement, Vol. 56, No. 5, 1996,
pp. 746–759.
[17] Cashin, W.E., “Student Ratings of Teaching: The Research Revisited,” IDEA Paper no. 32, September 1995, Center for Faculty Evaluation &
Development, Division of Continuing Education, Kansas State University,
Manhattan, Kansas; http://www.idea.ksu.edu/, accessed August 2, 2006.
[18] Cohen, P.A., “Student Ratings of Instruction and Student
Achievement: A Meta-Analysis of Multisection Validity Studies,” Review
of Educational Research, Vol. 51, No. 3, 1981, pp. 281–309.
[19] Pittman, R.B., “Perceived Instructional Effectiveness and Associated Teaching Dimensions,” Journal of Experimental Education, Vol. 54,
No. 1, 1985, pp. 34–39.
[20] Cashin, W.E., “Readings to Improve Selected Teaching Methods,”
IDEA Paper no. 30, September 1994, Center for Faculty Evaluation & Development, Division of Continuing Education, Kansas State University,
Manhattan, Kansas; http://www.idea.ksu.edu/, accessed August 2, 2006.
[21] Felder, R.M., “Additional Higher Education Resources,”
http://www.ncsu.edu/felder-public/coolsites.html, accessed August 3, 2006.
[22] Felder, R.M., “Resources in Science and Engineering Education,”
http://www.ncsu.edu/felder-public/RMF.html, accessed August 3, 2006.
[23] McKeachie, W.J., Teaching Tips: Strategies, Research, and Theory
for College and University Teachers, 9th ed., Lexington, Massachusetts: D.C.
Heath and Company, 1994.
[24] Marsh, H.W., and L.A. Roche, “Appendix 5: Targeted Teaching
Strategy Booklets,” in The Use of Students’ Evaluations of University
Teaching to Improve Teaching Effectiveness, Canberra, ACT: Higher
Education Division, Evaluations and Investigations Program, Department
of Employment, Education and Training, Australian Government Publishing Service, 1994.
[25] Wachtel, H.K., “Student Evaluation of College Teaching Effectiveness: A Brief Review,” Assessment & Evaluation in Higher Education,
Vol. 23, No. 2, 1998, pp. 191–211.
[26] d’Apollonia, S., and P.D. Abrami, “Navigating Student Ratings of
Instruction,” American Psychologist, Vol. 52, No. 11, 1997, pp. 1198–1208.
BIOGRAPHICAL SKETCH
Dr. Kay C. Dee is an associate professor of Applied Biology and
Biomedical Engineering at Rose-Hulman Institute of Technology.
She writes papers and gives workshops and presentations on topics
such as student learning styles, evaluations of teaching, assessment
and accreditation, and helping faculty be effective in the classroom.
Her biomedical engineering research focuses on tissue engineering
and biomaterials. Her teaching, educational research, and mentoring of students and faculty have been recognized with a Carnegie
Foundation for the Advancement of Teaching “Professor of the
Year” award for the state of Louisiana; a Tulane University “Inspirational Undergraduate Professor” award; the opportunity to serve
as a Teaching Fellow for a National Effective Teaching Institute; a
Graduate Alliance for Education in Louisiana “Award for Excellence in Mentoring Minority Researchers,” and more.
Address: Department of Applied Biology and Biomedical Engineering, Rose-Hulman Institute of Technology, 5500 Wabash
Avenue, Terre Haute, Indiana, 47803; telephone: (1) 812.877.8502;
fax: (1) 812.877.8545; e-mail: dee@rose-hulman.edu.