Student Perceptions of High Course Workloads are Not Associated with Poor Student Evaluations of Instructor Performance

KAY C. DEE
Department of Applied Biology and Biomedical Engineering
Rose-Hulman Institute of Technology

ABSTRACT

Many engineering faculty believe that when students perceive a course to have a high workload, students will rate the course and the performance of the course instructor poorly. This belief can be particularly worrying to engineering faculty since engineering courses are often perceived as uniquely demanding. The present investigation demonstrated that student ratings of workload and of overall instructor performance in engineering courses were not correlated (e.g., Spearman's rho = 0.068) in data sets from either of two institutions. In contrast, a number of evaluation items were strongly correlated (Spearman's rho ranging from 0.7 to 0.899) with ratings of overall instructor performance across engineering, mathematics and science, and humanities courses. The results of the present study provide motivation for faculty seeking to improve their teaching and course evaluations to focus on teaching methods, organization/preparation, and interactions with students, rather than course workload.

Keywords: student evaluations, student workload, teaching evaluations

I. INTRODUCTION

Student evaluations of teaching are frequently used in faculty performance reviews, often with a focus on one or two numerically-coded, broadly-phrased evaluation items (e.g., "Overall, the instructor's performance was:" rated on a scale of 1 to 5). Evaluation scores, and factors that may affect these scores, can therefore be a source of worry for instructors. For example, many faculty members believe that when students perceive a course to have a high workload, students rate that course, and the performance of the instructor, poorly. The corollary belief, that students give favorable course and instructor ratings when course workload is low, is also common. In one study, 54 percent of faculty (but only 26 percent of students) agreed or strongly agreed with the statement "To get favorable evaluations, professors demand less from students" [1]. A number of published studies have shown little to no relationship between student ratings of course workload and overall course quality or instructor performance (e.g., [2–5]). Why, then, do beliefs relating workload and teaching evaluations persist among engineering faculty? One reason may be that many published studies [2–5] utilized data from a variety of course types rather than specifically engineering students/courses. Engineering courses are often held to be uniquely demanding, and engineering students are often considered to be a unique type of audience. Evidence that students do or do not rate engineering courses differently than other types of courses would provide guidance in applying research findings from courses in other disciplines to those in engineering. Some studies [6, 7] have shown little to no relationship between student ratings of engineering course workload and overall course/instructor ratings, but engineering faculty may be reluctant to accept either results from an individual campus system and culture [7] or from a very large pool of multiple types of campuses and cultures [6] as applicable to their specific situation. Results from only one institution may seem too specific; results from a large pool of institutions may seem too broadly homogenized.
Evidence that students do or do not rate engineering courses/instructors differently at different institutions would provide guidance regarding the generalizability of published results from different study populations. The present investigation sought to address some of the informational needs of engineering faculty who are seeking to understand and improve their course/teaching evaluations. First, data from engineering courses at two different types of institutions were examined for evidence of a relationship between student perceptions of engineering course workload and evaluations of overall instructor effectiveness. Second, data from multiple types of courses were used to investigate whether students appear to evaluate engineering courses/instructors and mathematics, science, and humanities courses/instructors differently. Third, the collected data were examined to identify trends that may be helpful to engineering faculty working to improve their teaching and course evaluations.

II. METHODS

A. Data Collection

Data from two schools were used in this study. The School of Engineering at Tulane University was, between 1997 and 2002, a relatively small unit within a large doctoral-granting research university: roughly 700 of the approximately 6500 undergraduates were engineering majors (data are from 1997 to 2002). Engineering students at Tulane interacted regularly with peers from different academic areas (liberal arts, business, etc.). In this situation, faculty worry that students will anecdotally compare workloads across majors, perceive their engineering workload as too high, and punish engineering instructors with poor course/teaching evaluations. In contrast, Rose-Hulman Institute of Technology is a small school, strongly focused on undergraduate engineering and science education. At present, roughly 84 percent of Rose-Hulman's approximately 1900 students (M.S. and B.S. only) major in some form of engineering. In this situation, faculty worry that career-oriented students may perceive less value in completing work for non-engineering courses, and may rate non-engineering courses differently from those directly related to an engineering major. Because the environments and campus cultures of Tulane and Rose-Hulman are different, common trends in student evaluations of teaching across the two data sets may be more generally applicable than trends discerned from one campus population only.

Course-averaged evaluation scores for all classes offered at Rose-Hulman Institute of Technology during the 2004–2005 academic year were obtained from the Rose-Hulman Office of Institutional Research, Planning, and Assessment. These evaluations were administered electronically via a Web-based form. Students had one week to complete evaluations: the final week of classes, prior to the final exam period each academic quarter. Table 1 shows the set of numerically-rated evaluation items with the applicable response scales; spaces were provided for students to type additional or explanatory comments as well. If fewer than five students submitted evaluations for a course, no data from that course were used for analyses. Evaluations of military science (i.e., Reserve Officers' Training Corps) courses and courses designated as graduate-level only (i.e., graduate seminars) were not used. The resulting data set consisted of information from 490 engineering courses, 390 mathematics and science courses, and 165 humanities courses.
The response rates for the engineering, mathematics and science, and humanities course evaluations used in the present study were 78 ± 15%, 78 ± 15%, and 82 ± 13% (means ± standard deviations), respectively. Therefore, the data used in the present study are likely a fair to good representation of the perceptions of students in these classes.

A second data set consisted of course-averaged evaluation scores for all classes offered through the Tulane University School of Engineering from the fall of 1997 to the fall of 2002. These evaluations consisted of a 17-item "bubble sheet" paper questionnaire, which asked students to indicate whole numbers ranging from 1 to 5 to signify their level of agreement with a given statement. The evaluations were administered in each class by School of Engineering staff at the end of each semester, prior to the final exam period. Course instructors left their classrooms while evaluations were administered. Evaluation response rates were not calculated or tracked as part of the administrative process, but since these evaluations were administered during a normal class period with anticipated normal attendance, the data used in the present study are likely to be a fair to good representation of the perceptions of students in these classes. Further information on the Tulane evaluation form items and procedures can be found in reference [8]. If fewer than five students submitted evaluations for a course, no data from that course were used for analyses. The resulting data set consisted of information from 823 courses offered through the School of Engineering.

B. Data Analysis

Simple correlational analyses were chosen for the present study since they provide easily understandable numerical information (correlation coefficients, significance values) and visual information (scatterplots, trendlines) accessible to a broad audience. The Pearson correlation coefficient is a common way to characterize the association between two variables; this parametric technique carries a number of assumptions about the variance and distribution of the data to be examined [9]. The nonparametric Spearman's rho correlation is based on relative ranks of data rather than on the observed numerical values of data, and does not depend on stringent assumptions about the shape of the population from which the observations were drawn [9]. Pearson correlation coefficients and Spearman's rho correlation coefficients were first calculated from the Tulane data. Statistical outliers, or courses with at least one course-averaged item score more than three standard deviations away from the overall mean score for that item, were identified and removed from the data set. Pearson and Spearman's correlation coefficients were then recalculated for this data set. The Pearson correlation coefficients were compared with previously-reported Pearson correlation coefficients obtained from the same original data set after outliers were removed, a natural log transform was applied to the scores, and z scores were subsequently calculated from the log-transformed data [8]. The transforms and z scores reduced skew and equalized variances, making the data fit better with the assumptions inherent in the use of the parametric Pearson correlation (see reference [8] for detailed data descriptors, discussion of model adequacy checking and data transformations, and a complete set of inter-item correlations from the Tulane data).
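For readers who wish to apply the same screening and correlation steps to their own course-averaged evaluation data, a minimal sketch is shown below. Python is used for illustration (it was not the software used in this study), and the file and column names are hypothetical placeholders rather than the item labels used here.

```python
# Minimal sketch of the outlier screening and correlation calculations
# described above; file and column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

def drop_outliers(data: pd.DataFrame, items: list, k: float = 3.0) -> pd.DataFrame:
    """Remove courses with any course-averaged item score more than k standard
    deviations away from that item's overall mean score."""
    keep = pd.Series(True, index=data.index)
    for item in items:
        z = (data[item] - data[item].mean()) / data[item].std()
        keep &= z.abs() <= k
    return data[keep]

items = ["workload", "overall_instructor"]            # hypothetical item names
df = pd.read_csv("course_averaged_scores.csv").dropna(subset=items)
screened = drop_outliers(df, items)

r, p_r = stats.pearsonr(screened["workload"], screened["overall_instructor"])
rho, p_rho = stats.spearmanr(screened["workload"], screened["overall_instructor"])
print(f"Pearson r = {r:.3f} (p = {p_r:.3g}); Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
```

Because Spearman's rho depends only on ranks, it is unchanged by monotonic transformations such as the natural log transform described above, which is consistent with using untransformed data for the rank-based analyses.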
The (previously-reported) Pearson correlations calculated after multiple data transformations and the Pearson correlations calculated (in the present study) after the removal of statistical outliers were only slightly different (mean change of 0.017 across all items, maximum change of 0.04 on any one item) from Pearson correlations calculated from the original data set with no data transformation or outlier removal. Spearman's rho correlation coefficients calculated after removing outliers were only slightly different (mean change of 0.007 across all items, maximum change of 0.02 on any one item) from Spearman's coefficients calculated without removing outliers. Subsequent data analyses utilized solely Spearman's rho correlation coefficients and original data sets with no alterations or transformations.

Spearman's correlation coefficients were calculated from the Rose-Hulman data for all courses, for engineering courses only, for mathematics, science, and humanities courses only, for mathematics and science courses only, and for humanities courses only. Selected correlation coefficients were quantitatively compared using Fisher's Z statistic [10], a method of testing whether two population correlation coefficients are equal. Linear regressions were conducted on selected items from the Rose-Hulman data, and the slope and intercept values from the regression lines were statistically compared using t-tests [9]. Data from the Rose-Hulman engineering courses were sorted into quartiles according to the numerical ratings of the overall instructor performance. In other words, courses were sorted in descending order of overall instructor performance scores; courses in the top 25 percent of overall instructor performance scores were considered "highest quartile" courses and courses in the bottom 25 percent of overall instructor performance scores were considered "lowest quartile" courses. Mean scores on each evaluation item from courses in the highest and lowest quartiles (e.g., ostensibly viewed by students as the "best-taught" and "worst-taught" courses) were then compared using the Mann-Whitney test, a nonparametric way of determining whether two independent samples are from the same population [9].

Table 1. Evaluation items and response scales. Evaluations were administered electronically in a Web-based form. For each evaluation item, students clicked on a "radio button" corresponding to the numerical and text ratings shown in this table.

All calculations of correlation coefficients, linear and quadratic regressions, and Mann-Whitney tests were conducted using SPSS for Windows© (SPSS Inc.). Determination of statistical outliers, quartiles and associated means, and comparisons of correlation coefficients and linear regression parameters were conducted using Excel© (Microsoft Corporation).
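As an illustration of the Fisher's Z comparison of correlation coefficients mentioned above, the sketch below uses the standard large-sample form of the test; this form is an assumption made for illustration (reference [10] describes the procedure cited in this study), and the correlation values passed to the function are invented rather than taken from the tables that follow.

```python
# Sketch of a Fisher's Z test for the equality of two independent correlation
# coefficients; the standard large-sample form is assumed for illustration,
# and the r values below are placeholders, not results from this study.
import numpy as np
from scipy import stats

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test of whether two independent population correlations are equal."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)          # Fisher z-transform of each r
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))    # standard error of z1 - z2
    z = (z1 - z2) / se
    return z, 2.0 * stats.norm.sf(abs(z))            # z statistic and p value

# e.g., an engineering coefficient (n = 472) versus a mathematics, science, and
# humanities coefficient (n = 540); the r values are illustrative only.
z_stat, p_value = compare_correlations(0.85, 472, 0.87, 540)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```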
III. RESULTS

Figure 1 shows that Rose-Hulman student ratings of overall instructor performance were neither linearly nor quadratically related to student ratings of course workload required in relation to other courses of equal credit.

Figure 1. Ratings of overall instructor performance as a function of perceived relative workload. Axis scales are the same for each subframe within this figure. For engineering courses, n = 472; for mathematics and science courses, n = 390; for humanities courses, n = 165. Correlation coefficients shown (rS) are Spearman's rho. Solid lines represent linear regression lines; dashed lines represent quadratic curve fits.

Figure 2 presents the means and standard deviations of scores on all of the Rose-Hulman evaluation items, for courses in the highest and lowest quartiles of ratings on the item assessing overall instructor performance. Examining Figure 2 in conjunction with Table 1 reveals that courses in the lowest quartile (i.e., with the poorest ratings of overall instructor performance) received different (p < 0.05, Mann-Whitney test) mean ratings from highest-quartile courses on all evaluation items except for items related to the pace of the material and the workload relative to courses of equal credit. Lowest-quartile courses and highest-quartile courses did not receive significantly different workload ratings from students.

Figure 2. Comparisons of evaluation ratings from engineering courses with the lowest- and highest-quartile ratings of overall instructor performance. All evaluation items were ranked on a numerical scale from 1 to 5; text descriptors of numerical rankings for each evaluation item are given in Table 1. Lowest-quartile courses received significantly (p < 0.01, two-tailed Mann-Whitney test) different ratings from highest-quartile courses on all evaluation items except those denoted by arrows (i.e., ratings of the pace of the material, and of the amount of work relative to other courses of the same credit). Data shown are mean ± one standard deviation; n = 118 for each quartile.
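A minimal sketch of this quartile comparison follows. The arrays are randomly generated placeholders standing in for course-averaged ratings; the actual analysis was carried out in SPSS and Excel as described in the Data Analysis section.

```python
# Sketch of the quartile comparison described above: split courses by their
# course-averaged overall-instructor rating, then compare another item's
# ratings between the top and bottom quartiles with a Mann-Whitney test.
# The data below are random placeholders, not the study's ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
overall = rng.uniform(2.0, 5.0, 472)    # placeholder overall-instructor ratings
workload = rng.uniform(2.0, 4.0, 472)   # placeholder relative-workload ratings

q_low, q_high = np.quantile(overall, [0.25, 0.75])
lowest_quartile = workload[overall <= q_low]     # "worst-taught" courses
highest_quartile = workload[overall >= q_high]   # "best-taught" courses

u_stat, p_value = stats.mannwhitneyu(lowest_quartile, highest_quartile,
                                     alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")
```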
In contrast to the workload-related results, ratings on a number of other Rose-Hulman evaluation items were strongly correlated with ratings of overall instructor performance (Figure 3 and Table 2). For example, the strength of agreement with statements such as "The professor used teaching methods that helped me learn," "The professor met the stated course objectives," and "The professor generally was well-prepared for class" was strongly associated with better ratings of overall instructor performance (Figure 3). Items most strongly correlated with overall instructor performance ratings (Table 2) tended to focus on student perceptions of the professor's teaching/presentation methods, preparation, sensitivity to students and interest in the subject, and overall learning experience and course quality. The items listed in Table 2 also yielded the strongest correlations when considering only humanities courses (with correlations ranging from 0.880 to 0.673) and when considering only mathematics and science courses (with correlations ranging from 0.892 to 0.703), with some minor re-ordering and, in the case of humanities courses, the substitution of "Grading was objective and impartial" for "The professor seemed genuinely interested in teaching this subject."

Figure 3. Scatterplots of overall instructor performance ratings versus the strongest-correlating evaluation items from all engineering, mathematics, science, and humanities courses. Correlation coefficients shown (rS) are Spearman's rho. Solid lines represent linear regression lines. n = 1014 for all subframes within this figure.

Qualitatively similar results were obtained from the Tulane data (Table 3), in which items most strongly correlated with overall instructor performance ratings tended to focus on student perceptions of the professor's teaching/presentation methods, interest in teaching and students, and overall learning experience. Some items on the Tulane evaluation were phrased similarly to items on the Rose-Hulman form; a few of these "matched" items yielded differing correlations with overall instructor performance (Table 4).

Items most weakly associated with overall instructor performance ratings (Table 5) tended to focus on the number or percent of responses to evaluation items and student perceptions of course pace, workload, textbook, and coordination between laboratory exercises and course materials. The items listed in Table 5 also yielded the weakest correlations when considering only humanities courses (with correlations ranging from 0.003 to 0.381) and when considering only mathematics and science courses (with correlations ranging from 0.022 to 0.333), with some minor re-ordering. The correlations between the workload evaluation item and the overall instructor performance item were similar across engineering (0.07), mathematics and science (0.05), and humanities (0.08) courses. None of the correlation coefficients presented in Tables 2 or 5 and calculated using information from engineering courses were significantly (p < 0.05) different from coefficients calculated for the same evaluation item using information from mathematics, science, and humanities courses.

Differences between linear regression slopes and intercepts from engineering and from mathematics, science, and humanities courses were only observed for two of the six evaluation items most strongly correlated (Table 2) with overall instructor performance. The largest differences in slope and intercept were observed for the item "The professor generally was well-prepared for class." For this item, the regression slope from mathematics, science, and humanities courses (1.17) was higher (p < 0.01) than that from engineering courses (0.952); the intercept of the regression line from mathematics, science, and humanities courses (-1.04) was more negative (p < 0.01) than that from engineering courses (0.10).

IV. DISCUSSION

The analyses conducted in the present study revealed a very small correlation (0.07) between Rose-Hulman student ratings of overall instructor performance and of course workload. This is in agreement with a previous investigation of the data from Tulane University, which detected a small correlation at best between students' perceptions of course workload and instructor performance, and which was unable to determine an even marginally-acceptable descriptive relationship (either linear or quadratic) between workload and instructor performance evaluation items [8]. Marsh [3, 11] has reported that higher levels of course workload/difficulty correlate with higher student ratings of instructional quality, citing, for example, a correlation coefficient of 0.16 between student ratings of course workload/difficulty and of the instructor overall [3]. Similarly, Chau and Hocevar [2] calculated a mean correlation of 0.13 between the workload/difficulty factor and all other factors of the Students' Evaluations of Educational Quality rating instrument. Furthermore, analysis of the Individual Development and Educational Assessment system revealed low correlations between ratings of "Amount of reading" and "Amount of work in other (non-reading) assignments" and items regarding the instructor's teaching procedures [5]. These investigations [2–5] utilized data from a variety of course types. Gall et al.
have reported a weak correlation (0.21) between student ratings of mechanical engineering course workload and instructor performance [7]. Centra [6] has examined student ratings of instruction across multiple fields of study, including a group of engineering and technology courses from two- and four-year colleges. Combining level of difficulty, workload, and course pace into a single "Difficulty/Workload" factor, Centra calculated a correlation of 0.06 between this factor and overall course evaluations [6], which agrees well with the results of the present study. Centra [12] has also reported that courses in natural sciences, mathematics, and engineering tend to receive lower course and instructor ratings than courses in other disciplines. In the present study, instructor ratings for humanities courses (4.15 ± 0.44, mean ± standard deviation) were statistically (p < 0.01, Mann-Whitney test) higher than for either engineering or mathematics and science courses (3.90 ± 0.55 and 3.96 ± 0.57, respectively; means ± standard deviations), in agreement with Centra's report [12].

The largest differences observed in the present study between regression line slopes and intercepts imply that engineering professors with the same overall instructor performance rating as mathematics, science, and humanities professors may be rated as slightly more prepared for class, or that engineering instructors rated at the same level of preparation as mathematics, science, and humanities instructors may receive slightly lower overall instructor performance ratings. Further research could explore this observation. The predominance of the observations from the present study implies that the population of students examined used similar standards to rate courses in different disciplines, as evidenced by similar correlations between the same teaching/course evaluation items and ratings of overall instructor performance.

A key result of the present study is the observation of poor relationships, regardless of type of courses or campus culture, between student ratings of instructor performance and course workload. Definitions of "workload" may vary from student to student [13]. Different student or researcher definitions (i.e., assigned homework only, or assigned homework plus hours spent studying) of course-associated work may account for some differences between studies [6]. For example, Greenwald and Gillmore constructed a model presenting expected grade as a direct mediator and workload as an indirect mediator of course evaluation ratings, positing that grading leniency influences student evaluations of teaching [14].

Table 2. Strongest correlations with ratings of overall instructor performance. All correlation coefficients (Spearman's rho) ≥ 0.7 are reported. Response scales for evaluation items are given in Table 1; the item used to rate overall instructor performance was "Overall, how would you rate the professor's performance in this course?" For engineering courses, n = 472; for mathematics, science, and humanities courses, n = 540. All correlations are statistically significant at the 0.01 (two-tailed) level. None of the engineering correlation coefficients for the evaluation items in this table were statistically different (at or below the 0.05, two-tailed level) from the related mathematics, science, or humanities coefficients.
A re-analysis of Greenwald and Gillmore's data [15] concluded that the relationship between expected grade and course evaluations could be eliminated by including perceived learning as a factor, and that workload not associated with perceived learning (i.e., viewed by students as unnecessary, excessive, etc.) had a negative effect on student evaluations of teaching [15]. Structural models which treat workload viewed by students as valuable to learning ("good") distinctly from unneeded "bad" workload have shown that "good" workload is positively and "bad" workload is negatively associated with ratings of overall teaching and perceived learning [4]. If the data sets presented in Figure 1 were shaped like inverted letter U's or V's, this could have provided evidence that in addition to viewing too much work as "bad," students view too little work as "bad" (i.e., not enough to help them learn), and automatically give negative course/teaching evaluations in both cases. However, the present study does not provide such evidence. The quadratic curve fits shown in Figure 1 are poor. Furthermore, replotting the data shown in Figure 1, such that workload ratings that deviate to either side of the middle "About the Same" rating are equivalently ranked, produces poor correlations and curve fits similar to those displayed in Figure 1. The main goal of the present study was not to define or discriminate between types of workloads, but rather to test the simple hypothesis that students perceive high workloads as "bad" and low workloads as "good" and bias course/teaching evaluations accordingly.

The correlations reported in the present study (and others) between workload and instructor performance are small, and large sample sizes (hundreds, as in the present study and others [3, 4], to many thousands [2, 6], to over a hundred thousand [5]) allow small effects that are not necessarily of practical importance [16] to be detected with statistical significance. The results of the present study indicate that for the real-world purpose of attempting to determine reasons for poor evaluations or to focus efforts to improve evaluations, student perceptions of higher course workloads are not simply associated with poorer student evaluations of instructor performance.

In contrast to workload, some evaluation items (dealing with the general areas of organization and preparation, teacher/student interactions, and teaching methods that help students stay attentive and learn) were easily and strongly associated with the overall instructor performance item, across different populations of engineering students from different campus cultures and across different disciplines within a given campus culture. These results concur with other reports that ratings of teacher organization/preparation [17–19] and instructor accessibility [7] are strongly associated with overall course/instructor ratings. It is possible that: when instructors use organization strategies that students understand combined with teaching methods that help students stay attentive, students may learn more; when students learn more, they may rate the quality of their overall learning experience more highly; when instructors create a learning environment for students that is organized, understandable, and supportive, students may rate the performance of instructors more highly. Certainly, correlation does not imply causation.
However, the areas found in the present study to be strongly associated with ratings of overall instructor performance are appropriate professional development topics for educators regardless of their academic discipline, and there are many resources (e.g., [20–24]) for faculty seeking ideas. Not every idea in the educational literature will fit every instructor's style, courses, or goals; it is better for an instructor to try a few small changes that feel appropriate and could be sustainable over time rather than to attempt a drastic revision of style and/or substance.

Table 3. Strongest correlations with ratings of overall instructor performance (Tulane University School of Engineering evaluations). All correlation coefficients reported are Spearman's rho. Response scales for evaluation items were a reverse Likert scale with five divisions ranging from "Strongly Agree" (1) to "Strongly Disagree" (5). The evaluation item used to rate overall instructor performance was "Overall, how would you rate the professor's performance in this course?" and the response scale for this item was five divisions ranging from "Excellent" (1) to "Poor" (5). n = 822 except for items 1, 6, and 7, for which n = 821, 820, and 823, respectively. All correlations are statistically significant at the 0.01 level (two-tailed).

Table 4. Comparison of correlations calculated from similar evaluation items. All correlation coefficients reported are Spearman's rho. Response scales for Rose-Hulman items are given in Table 1. Superscripts associated with Tulane evaluation items denote the response scales: (a) five divisions ranging from "Strongly Agree" (1) to "Strongly Disagree" (5); (b) five divisions ranging from "Excellent" (1) to "Poor" (5); (c) five divisions of "Definitely Too Little" (1), "Somewhat Too Little" (2), "About Right" (3), "Somewhat Too Much" (4), and "Definitely Too Much" (5). n = 472 for Rose-Hulman items and n = 822 for Tulane items except for the items associated with delivery, accessibility, interest in teaching, and the text, for which n = 820, 822, 822, and 808, respectively. Shaded arrows denote a pair of correlations that are statistically different (p < 0.01, two-tailed); unfilled arrows denote a pair of correlations that are not statistically different.

Table 5. Weakest correlations with ratings of overall instructor performance. Evaluation Factors are aspects other than responses to items on the evaluation. All correlation coefficients reported are Spearman's rho. Response scales for evaluation items are given in Table 1; the evaluation item used to rate overall instructor performance was "Overall, how would you rate the professor's performance in this course?" For engineering courses, n = 472 except for the laboratory assignment item (n = 448) and the textbook item (n = 447). For mathematics, science, and humanities courses, n = 540 except for the laboratory assignment item (n = 490) and the textbook item (n = 531). The symbol indicates that a correlation is statistically significant at the 0.05 level; the * symbol indicates that a correlation is statistically significant at the 0.01 level. None of the engineering correlation coefficients for the evaluation items in this table were statistically different (at or below the 0.05, two-tailed level) from the related mathematics, science, or humanities coefficients.
For example, here are two small ideas that can be applicable across a broad range of instructor styles and pedagogical goals:

1) Tell your students what you are doing and why. As the course progresses, briefly explain: your choice of teaching methods; the design of assignments; the organization of the material; your expectations for student learning and performance; and that you are interested in helping students learn.

2) Seek and use formative feedback. Make at least one opportunity early in the course (say, after the first three weeks) for students to anonymously give you (a) ideas of things that are going well, and (b) things that could be changed. Choose at least one thing students believe to be going well, tell students that you will continue that practice, and then do so. Choose at least one thing that students have suggested as a potential change, thank students for their ideas, describe the change you will make, and then make the change. Consider seeking additional student feedback on the impact of the change after two or three more weeks.

Communicating with the class should improve their understanding of your organization and preparation, your approach to teacher/student interactions, and your choice of teaching methods. Seeking and using formative feedback is not only a clear demonstration of considering and meeting students' needs, it will give you a chance to improve the course prior to the final summative evaluations.

V. CONCLUSION

Hundreds of studies have been conducted on student evaluations of teaching. As quoted from reference [11]: "Probably, students' evaluations of teaching effectiveness are the most thoroughly studied of all forms of personnel evaluation, and one of the best in terms of being supported by empirical research." The present study assumes that student evaluations reflect student opinions reliably, validly, and usefully [11, 17, 18]. This investigation did not attempt to address a number of questions that could be asked about student evaluations of teaching, such as whether factors about the instructor or course [3, 17, 18, 23, 25] beyond the criteria on evaluation forms affect ratings, whether characteristics of the student population [25] affect ratings, what other types of evaluations or assessments can or should be used for faculty performance reviews, etc. This study sought to determine what, if any, strong correlations could be found within teaching evaluation data because student evaluations of teaching are part of faculty performance review systems at many institutions. Such review systems may prioritize numerical scores from one or a few general overall items (e.g., an "overall instructor performance" item) as quantifiable and general descriptors of a multifaceted practice. Whether faculty agree with this practice [26] or not [3, 17], faculty generally have to work within the system (i.e., earn tenure/promotion) before they can lead efforts to promote changes in the system. With this in mind, the present investigation provides motivation for faculty across academic disciplines and campus cultures to focus on teaching methods that help students stay attentive and learn, on organization and preparation, and on teacher/student interactions, rather than on course workloads.

ACKNOWLEDGMENTS

I thank Mark Schawitsch of the Rose-Hulman Institutional Research, Planning, and Assessment Office, for providing the Rose-Hulman course evaluation data. I also thank Glen A.
Livesay for offering comments on and suggestions for this project, as well as former colleagues at Tulane University for their interest in this work.

REFERENCES

[1] Sojka, J., A.K. Gupta, and D.R. Deeter-Schmelz, "Student and Faculty Perceptions of Student Evaluations of Teaching: A Study of Similarities and Differences," College Teaching, Vol. 50, No. 2, 2002, pp. 44–49.
[2] Chau, H., and D. Hocevar, "Higher-Order Factor Analysis of Multidimensional Students' Evaluations of Teaching Effectiveness," presented at The Annual Conference of the American Educational Research Association (AERA), New Orleans, Louisiana, 1994, obtainable via Educational Resources Information Center (ERIC).
[3] Marsh, H.W., "Students' Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases, and Utility," Journal of Educational Psychology, Vol. 76, No. 5, 1984, pp. 707–754.
[4] Marsh, H.W., "Distinguishing Between Good (Useful) and Bad Workloads on Students' Evaluations of Teaching," American Educational Research Journal, Vol. 38, No. 1, 2001, pp. 183–212.
[5] Sixbury, G.R., and W.E. Cashin, "Description of Database for the IDEA Diagnostic Form," IDEA Technical Report No. 9, 1995, Center for Faculty Evaluation & Development, Kansas State University, Manhattan, Kansas.
[6] Centra, J.A., "Will Teachers Receive Higher Student Evaluations by Giving Higher Grades and Less Course Work?," Research in Higher Education, Vol. 44, No. 5, 2003, pp. 495–518.
[7] Gall, K., D.W. Knight, L.E. Carlson, and J.F. Sullivan, "Making the Grade with Students: The Case for Accessibility," Journal of Engineering Education, Vol. 92, No. 4, 2003, pp. 337–343.
[8] Dee, K.C., "Reducing the Workload in Your Class Won't 'Buy' You Better Teaching Evaluation Scores: Re-Refutation of a Persistent Myth," Proceedings, 2004 American Society for Engineering Education Annual Conference and Exposition, Salt Lake City, Utah: American Society for Engineering Education, 2004, Session 1331.
[9] Glantz, S.A., Primer of Biostatistics, 4th ed., New York, New York: McGraw-Hill Health Professions Division, 1997.
[10] Mickey, R.M., O.J. Dunn, and V.A. Clark, Applied Statistics: Analysis of Variance and Regression, 3rd ed., Hoboken, New Jersey: John Wiley & Sons, Inc., 2004.
[11] Marsh, H.W., "Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research," International Journal of Educational Research, Vol. 11, 1987, pp. 253–388.
[12] Centra, J.A., Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness, San Francisco, California: Jossey-Bass Publishers, 1993.
[13] Kember, D., "Interpreting Student Workload and the Factors Which Shape Students' Perceptions of Their Workload," Studies in Higher Education, Vol. 29, No. 2, 2004, pp. 165–184.
[14] Greenwald, A.G., and G.M. Gillmore, "No Pain, No Gain? The Importance of Measuring Course Workload in Student Ratings of Instruction," Journal of Educational Psychology, Vol. 89, No. 4, 1997, pp. 743–751.
[15] Marsh, H.W., "Effects of Grading Leniency and Low Workload on Students' Evaluations of Teaching: Popular Myth, Bias, Validity, or Innocent Bystanders?," Journal of Educational Psychology, Vol. 92, No. 1, 2000, pp. 202–228.
[16] Kirk, R.E., "Practical Significance: A Concept Whose Time Has Come," Educational and Psychological Measurement, Vol. 56, No. 5, 1996, pp. 746–759.
[17] Cashin, W.E., "Student Ratings of Teaching: The Research Revisited," IDEA Paper no. 32, September 1995, Center for Faculty Evaluation & Development, Division of Continuing Education, Kansas State University, Manhattan, Kansas; http://www.idea.ksu.edu/, accessed August 2, 2006.
[18] Cohen, P.A., "Student Ratings of Instruction and Student Achievement: A Meta-Analysis of Multisection Validity Studies," Review of Educational Research, Vol. 51, No. 3, 1981, pp. 281–309.
[19] Pittman, R.B., "Perceived Instructional Effectiveness and Associated Teaching Dimensions," Journal of Experimental Education, Vol. 54, No. 1, 1985, pp. 34–39.
[20] Cashin, W.E., "Readings to Improve Selected Teaching Methods," IDEA Paper no. 30, September 1994, Center for Faculty Evaluation & Development, Division of Continuing Education, Kansas State University, Manhattan, Kansas; http://www.idea.ksu.edu/, accessed August 2, 2006.
[21] Felder, R.M., "Additional Higher Education Resources," http://www.ncsu.edu/felder-public/coolsites.html, accessed August 3, 2006.
[22] Felder, R.M., "Resources in Science and Engineering Education," http://www.ncsu.edu/felder-public/RMF.html, accessed August 3, 2006.
[23] McKeachie, W.J., Teaching Tips: Strategies, Research, and Theory for College and University Teachers, 9th ed., Lexington, Massachusetts: D.C. Heath and Company, 1994.
[24] Marsh, H.W., and L.A. Roche, "Appendix 5: Targeted Teaching Strategy Booklets," in The Use of Students' Evaluations of University Teaching to Improve Teaching Effectiveness, Canberra, ACT: Higher Education Division, Evaluations and Investigations Program, Department of Employment, Education and Training, Australian Government Publishing Service, 1994.
[25] Wachtel, H.K., "Student Evaluation of College Teaching Effectiveness: A Brief Review," Assessment & Evaluation in Higher Education, Vol. 23, No. 2, 1998, pp. 191–211.
[26] d'Apollonia, S., and P.D. Abrami, "Navigating Student Ratings of Instruction," American Psychologist, Vol. 52, No. 11, 1997, pp. 1198–1208.

BIOGRAPHICAL SKETCH

Dr. Kay C Dee is an associate professor of Applied Biology and Biomedical Engineering at Rose-Hulman Institute of Technology. She writes papers and gives workshops and presentations on topics such as student learning styles, evaluations of teaching, assessment and accreditation, and helping faculty be effective in the classroom. Her biomedical engineering research focuses on tissue engineering and biomaterials. Her teaching, educational research, and mentoring of students and faculty have been recognized with a Carnegie Foundation for the Advancement of Teaching "Professor of the Year" award for the state of Louisiana; a Tulane University "Inspirational Undergraduate Professor" award; the opportunity to serve as a Teaching Fellow for a National Effective Teaching Institute; a Graduate Alliance for Education in Louisiana "Award for Excellence in Mentoring Minority Researchers"; and more.

Address: Department of Applied Biology and Biomedical Engineering, Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, Indiana, 47803; telephone: (1) 812.877.8502; fax: (1) 812.877.8545; e-mail: dee@rose-hulman.edu.