Int. J. Engng Ed. Vol. 22, No. 5, pp. 1070–1076, 2006
Printed in Great Britain.
0949-149X/91 $3.00+0.00
© 2006 TEMPUS Publications.

Comparisons Between Performances in a Statics Concept Inventory and Course Examinations*

PAUL S. STEIF
Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Email: steif@andrew.cmu.edu

MARY HANSEN
Department of Secondary Education and Graduate Studies, Robert Morris University, Pittsburgh, PA 15213, USA. Email: hansen@rmu.edu

Multiple analyses of results are presented from the Statics Concept Inventory, a multiple-choice test assessing conceptual knowledge in engineering statics. Results are based on the administration of this test to 1331 students in ten classes at seven US universities during the 2004–2005 academic year. Evidence confirming the validity and reliability of the test is offered. Detailed comparisons are made between inventory scores and performance on course examinations, including evaluations of how inventory sub-scores on specific concepts correlate with performance on certain types of problems. Finally, based on analysis of the prevalence of various answer choices, common misconceptions are identified.

Keywords: statics; assessment; multiple choice test; test administration

* Accepted 20 November 2005.

INTRODUCTION

ASSESSMENT IS increasingly recognized as crucial to improving learning [1, 2]. Since conceptual understanding is strongly tied to the ability to transfer newly acquired knowledge [3] (a major goal of many engineering science courses), assessment ought to include conceptual knowledge. One approach, pioneered with the Force Concept Inventory [4], has been to assess conceptual understanding through multiple-choice tests in which the questions and the various answer choices are carefully constructed to measure knowledge of important concepts and to identify common misconceptions. Concept inventories have recently been under development for a number of engineering subjects [5]. One such inventory, the Statics Concept Inventory (henceforth referred to here as 'the inventory'), has been developed to assess conceptual understanding for the engineering course Statics [6]. Underlying the test was an extensive observational analysis of the products of novice problem solving and a formulation of those observations into a conceptual framework [7]. The inventory comprises 27 questions which focus on eight concepts in Statics.
Like most concept inventories, this test is intended to be administered towards the beginning and towards the end of a Statics course, offering a summative assessment of conceptual gains. However, there is also the potential for using the inventory as a means of formative assessment, particularly since the analysis of the test results provides scores on distinct concepts. Consider, for example, the inventory to be administered a few weeks prior to the end of Statics, or at the beginning of a follow-on course (e.g., Dynamics or Mechanics of Materials). Then, the performance of individual students on different concepts could be fed back in real time to enable additional instruction that is tailored to the specific needs of individual students. The current paper presents results for the 2004–2005 version of the inventory, which has been administered at a number of institutions. In addition, through detailed comparisons of inventory results with performances on course examinations, evidence is offered for the potential value of such concept-specific feedback.

RESULTS FOR 2004–2005 ADMINISTRATIONS

This paper reports on data from administrations of the inventory during the 2004–2005 academic year at US universities. In all cases, the test was administered near or after the completion of Statics (post-test) to a total of 1331 students in 10 different classes (see Table 1). (The test was administered at two additional schools; one lost the data and a second mistakenly used the 2003–2004 test.) As reported below, the inventory was also administered as a pre-test (prior to Statics) in six of these classes.

Table 1. Classes taking inventory (test method: paper and pencil for classes 1, 4, 5, 7, 10; electronic for classes 2, 3, 6, 8, 9)

Class    N     Course           University
1        97    Statics          A
2        38    Statics          B
3        35    Statics          B
4        42    Statics          C
5        419   Statics          D
6        262   Mech. of Mat.    D
7        158   Statics          E
8        75    Mech. of Mat.    E
9        42    Mech. of Mat.    F
10       163   Statics          G

In some cases, the test was administered in class with pencil and paper; in other cases, students took the test electronically. (In one class, students entered answers into a spreadsheet that was emailed in.) Students were aware that their score on the inventory would not affect their class grade, although at some institutions points were given for merely completing the test. Time limits of 50 to 60 minutes were imposed for tests taken in class; no limit was imposed when the test was completed electronically. While students were told to complete the test on their own, there was no monitoring of the students who took the test electronically. As will be seen from the data below, no noticeable differences in performance were found between the different methods of test administration.

Classes 2 and 3 were taught by the same instructor in two successive semesters of Statics. Classes 5 and 6 correspond to Statics and Mechanics of Materials, respectively, taught during successive semesters at the same university; the Mechanics of Materials course consisted mostly of students from the prior Statics course. Classes 7 and 8 likewise corresponded to Statics and Mechanics of Materials courses. With the exception of Class 1, which was taught at a private research university (CMU), the classes were at public universities, all with undergraduate and graduate programs of various rankings.

Means and standard deviations for the post-test scores, and the means for the pre-test scores (where administered), are shown in Table 2 for each class. In the transition from a paper-based to a web-based administration, personal data were not collected for many students. Therefore, the authors cannot report reliably on any differences in performance between genders or ethnic groups. With the exception of Class 1, all schools had highly comparable pre-test scores. Students in Class 4 were instructed to leave blank any questions that they did not know how to answer; considering the fraction of questions that were left blank, the pre-test scores in Class 4 would likely have been comparable to those of the other schools.

Table 2. Means and standard deviations for post-test, means for pre-test, and normalized gains (* no pre-test administered)

Class    Post-Test   S.D.    Pre-Test   Norm. Gain
1        20.51       4.24    10.46      0.61
2        13.87       4.66     7.04      0.34
3        14.17       4.12     7.13      0.35
4        11.62       4.14     4.82      0.31
5        12.14       4.59     7.35      0.24
6        12.08       4.92     *         *
7        13.64       5.14     7.41      0.32
8        14.37       5.09     *         *
9        12.24       4.10     *         *
10       13.04       4.96     *         *

It can be seen that inventory scores were generally similar for classes taking the paper and pencil and the electronic formats of the inventory. The variation in the post-test scores between classes is more significant. Normalized gain was used by Hake to compare pre-post differences in performance on the Force Concept Inventory among classes with very different starting points [8]. Normalized gain is defined as the improvement divided by the maximum possible improvement. Clearly, the use of normalized gain is less important here, where classes have similar pre-test scores.
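In symbols, with 27 being the maximum possible inventory score, Hake's normalized gain for a class is

```latex
g = \frac{\overline{\text{post}} - \overline{\text{pre}}}{27 - \overline{\text{pre}}},
\qquad \text{e.g., Class 1: } g = \frac{20.51 - 10.46}{27 - 10.46} \approx 0.61,
```

which reproduces the Norm. Gain column of Table 2.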
Students in Class 1 had a three-week introduction to Statics in the prior year as part of a freshman Fundamentals of Mechanical Engineering course. The relatively high post-test scores in Class 1 should also be viewed in light of the fact that the instructor is also the main developer of the inventory (the first author). As was the case for all other classes, students in Class 1 did not see the inventory questions during the course of the semester. Nevertheless, it is likely that instruction in this class was influenced by the developer/instructor's views of Statics. Still, the higher pre-test scores for this class may indicate a greater readiness for the subject that allowed for greater gains.

A more extensive discussion of the evidence supporting the reliability and validity of the 2003–2004 Statics Concept Inventory scores has been presented elsewhere [6]. For the current version of the inventory, Cronbach's coefficient alpha, a measure of internal consistency reliability, was found to be 0.82. This indicates that the inventory items consistently measure the same underlying content.
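For readers who wish to reproduce this statistic on their own item data, a minimal sketch follows (our illustration, not the authors' code; it assumes a students-by-items matrix of 0/1 item scores):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's coefficient alpha for a (students x items) matrix of 0/1 scores."""
    k = scores.shape[1]                         # number of items (27 for the inventory)
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item across students
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustration on random responses; the actual inventory data gave alpha = 0.82.
rng = np.random.default_rng(0)
demo = (rng.random((200, 27)) < 0.5).astype(int)
print(round(cronbach_alpha(demo), 2))
```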
One measure of the quality of an individual test question is its ability to discriminate between students. A question is deemed poor if students who answered it correctly tended to perform worse on the remaining questions than students who answered it incorrectly. To determine the extent to which individual questions discriminate between students, the sample was divided into students who scored in the upper 27% overall and those who scored in the lower 27% overall. The fraction of students in each of these two groups that answered a given question correctly was then computed; the Discrimination Index for that question is defined as the difference between these fractions. The numbers of questions with Discrimination Index in various ranges are shown in Fig. 1. Questions with a Discrimination Index above 0.4 are considered very good discriminators, those between 0.3 and 0.4 are considered good discriminators, and those below 0.3 are considered marginal or poor discriminators [9]. Most items on this test discriminate quite well.

Fig. 1. Numbers of questions with Discrimination Index in various ranges.
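The computation just described is easy to express directly; the sketch below (our illustration, not the authors' code) uses the same upper/lower 27% convention:

```python
import numpy as np

def discrimination_index(scores: np.ndarray) -> np.ndarray:
    """Upper-lower Discrimination Index for each item.

    scores: (students x items) matrix of 0/1 item scores. Returns, per item,
    the fraction correct in the top 27% of total scorers minus the fraction
    correct in the bottom 27% (ties at the cutoffs are broken arbitrarily).
    """
    order = np.argsort(scores.sum(axis=1))    # students sorted by total score
    cut = int(round(0.27 * scores.shape[0]))  # size of the upper and lower groups
    lower, upper = order[:cut], order[-cut:]
    return scores[upper].mean(axis=0) - scores[lower].mean(axis=0)

# Values above 0.4 are very good discriminators, 0.3-0.4 good,
# and below 0.3 marginal or poor [9].
```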
RELATION TO CLASS EXAMINATIONS

Ultimately, instructors are concerned that students perform well in Statics courses, and also in subsequent courses. Thus, the inventory will be a more valuable tool if it is indicative of performance on course examinations. Inventory scores of students were correlated with their examination scores (when they were available) using the Pearson correlation coefficient r. Correlations for five Statics classes are shown in Table 3: for four of these the comparison was based on the final examination; in Class 1 there was no final exam, so the correlation was based on the average of the four exams spread throughout the semester. For the two Mechanics of Materials courses, the inventory was compared with each of the three course examinations (Table 4).

Table 3. Correlations between inventory and final exams in Statics classes

Class    r
1        0.62
2        0.59
3        0.24
7        0.48
10       0.41

Table 4. Correlations between inventory and 1st, 2nd and final exams in Mechanics of Materials classes

Class    1st exam   2nd exam   Final exam
8        0.65       0.39       0.19
9        0.57       0.46       0.30

Given that the inventory asks questions on highly isolated and idealized aspects of statics, we argue that these correlations are quite meaningful. As a point of comparison, consider the correlations between exams within each class (Table 5). For example, in Class 1, the correlations 0.65, 0.66, 0.63, 0.59, 0.63, and 0.71 are between exams 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4, respectively. With four exams there are six pairwise correlations; with three exams there are three pairwise correlations.

Table 5. Correlations between course examinations, which are found to be comparable to correlations between inventory and examinations

Class    Correlations between course examinations
1        0.65, 0.66, 0.63, 0.59, 0.63, 0.71
2        0.57, 0.34, 0.42, 0.55, 0.71, 0.73
3        0.42, 0.33, 0.66, 0.13, 0.25, 0.47
7        0.32, 0.44, 0.49, 0.48, 0.48, 0.59
8        0.60, 0.60, 0.70
9        0.54, 0.69, 0.64

By comparing the correlations provided in Tables 3, 4 and 5, it can be seen that the correlations between the inventory and class exams are generally of the same order as the correlations between class exams within each class.
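The r values in Tables 3–5 are ordinary Pearson coefficients over matched student scores. With hypothetical data (the numbers below are invented for illustration), the computation looks like this:

```python
from scipy.stats import pearsonr

# Hypothetical matched scores for one class: each position is one student.
inventory  = [20, 14, 17, 9, 23, 12, 18, 15, 11, 21]   # inventory totals (max 27)
final_exam = [88, 70, 75, 52, 95, 60, 80, 74, 58, 85]  # final exam grades

r, p = pearsonr(inventory, final_exam)
print(f"r = {r:.2f} (p = {p:.4f})")   # r plays the role of the entries in Table 3
```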
CONCEPT-SPECIFIC SUB-SCORES

Feedback on students' performances that is more specific than an overall score will be of greater value to instructors. The questions in the inventory were each devised to correspond to one of several concepts in Statics, as follows:

- Separating bodies (FBD): Identifying the forces acting on a subset of a system of bodies
- Equilibrium (Equil.): Consideration of both force and moment balance in equilibrium
- Friction: Trade-off between the implications of equilibrium and Coulomb's law
- Static equivalence (St.Eq.): Static equivalence between forces, couples and combinations thereof
- Roller: Direction of the force between a roller and the rolled surface
- Negligible friction (Neg.Fr.): Direction of the force between frictionless bodies in point contact
- Slot: Direction of the force between a pin and the slot of a member
- Representation (Repres.): Representing unknown loads at connections.

Hence, one should be able to infer from the scores information regarding specific concepts and provide it to instructors. To investigate the feasibility and usefulness of such inferences, we have undertaken: (1) to determine whether independent sub-scores can be extracted consistently from the inventory; and (2) to determine whether concept-specific sub-scores are correlated with concept-specific performance on class examinations. The psychometric analysis regarding sub-scores has included extensive factor analysis (both exploratory and confirmatory). Details of this analysis will be presented elsewhere. In short, exploratory factor analysis provided evidence that sub-scales arise from the data, and confirmatory factor analysis showed that these sub-scales are associated with the distinct concepts on which the test questions are based. In the following section we offer comparisons between performance on course examinations and specific concept sub-scores from the inventory.

EXAM ERRORS AND CONCEPT SUB-SCORES

Given that statistical analysis legitimized the extraction of concept-specific sub-scores from the data, how are sub-scores related to performance on course examinations? Two examination problems from Class 1, depicted in Figs 2 and 3, were used to study these relations in detail.

Fig. 2. Statics examination problem (Class 1) with multiple connected bodies (dimensions removed). The motor exerts a given torque on the arm, which leads the ram to crush the object with a force to be determined. Problem analyzed for errors in forces at roller and slot (usual assumptions regarding frictionless pins apply).

Fig. 3. Statics examination problem (Class 1) with friction. Levels of force P to cause sliding of the blocks to be determined for different levels of friction coefficient μ. Problem analyzed for the error of automatically equating the friction force with μN.

From the problem in Fig. 2, we assessed the ability of students to recognize the directions of the force exerted by a roller on a surface and by a pin on a slot. From the problem in Fig. 3, we assessed the ability of students to allow the friction force to be less than μN when equilibrium is satisfied and no slippage occurs.

Students in Class 1 were divided into two groups: those who erred and those who did not err in recognizing the direction of the force of the roller in the exam problem of Fig. 2. Students in Class 1 were likewise split into those who did and did not err in the force of the pin on the slot (Fig. 2) and in the friction force (Fig. 3). For all three splits of the class, students who erred had, on average, lower overall inventory scores and lower sub-scores on all concepts. However, for some concepts the differences between the sub-scores of those who erred and those who did not were not statistically significant. One expects the sub-score Roller, for example, to be significantly different between the two groups formed by the roller-force error in the examination problem. For each of the other two exam errors, there is likewise one concept sub-score (Slot and Friction, respectively) that ought to discriminate most strongly between those who erred and those who did not. Based on their content, one expects the remaining concepts to be less directly related to the three exam errors.

In Table 6, we display the mean inventory sub-scores on these three concepts for the groups of students that did and did not make each of the three exam errors. In addition, a t-test was performed, and the p-value for the test that the two groups have the same mean concept sub-score is shown. The smallest p-values are for the concept sub-scores that correspond to the concept at issue in the exam problem. This indicates that the groups that erred and did not err in the use of a concept in a class exam differed significantly on those inventory questions specifically addressing that concept. In fact, with one exception (the Roller sub-score on the slot error), the other differences between the groups were not statistically significant (p > 0.05).

Table 6. Comparison of concept sub-scores for students who made and did not make specific errors in course examinations. The most significant differences between groups were for the concept most directly related to the error

             Roller error (47/97 erred)    Slot error (33/97 erred)      Friction error (53/97 erred)
Sub-score    Error   No error   p          Error   No error   p          Error   No error   p
Roller       0.645   0.867      0.002      0.646   0.818      0.023      0.723   0.803      0.265
Slot         0.872   0.927      0.199      0.778   0.964      0.001      0.868   0.939      0.075
Friction     0.652   0.760      0.127      0.667   0.729      0.42       0.610   0.826      0.001
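Each cell of Table 6 rests on a two-sample t-test of a concept sub-score between the two groups. A sketch with hypothetical sub-scores follows; the exact t-test variant used by the authors is not stated, so the SciPy default (equal variances) is assumed:

```python
from scipy.stats import ttest_ind

# Hypothetical Roller sub-scores (fraction of Roller items correct) for students
# who erred / did not err on the roller force in the problem of Fig. 2.
erred    = [0.33, 0.67, 0.67, 1.00, 0.33, 0.67, 0.33, 0.67, 0.33, 0.67]
no_error = [1.00, 0.67, 1.00, 1.00, 0.67, 1.00, 1.00, 0.67, 1.00, 1.00]

t, p = ttest_ind(erred, no_error)   # p is the quantity reported in Table 6
print(f"t = {t:.2f}, p = {p:.3f}")
```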
The above exam-inventory comparison was based on results from Class 1 (students taught by the primary inventory developer). Given the correlations between class examinations and overall inventory scores shown above for other universities, it is likely that similar comparisons could be made for exam problems at other schools. One such analysis was conducted on a problem on the final exam for Class 2 (Fig. 4).

Fig. 4. Statics examination problem from Class 2 compared with inventory results. Students were asked to draw the shear force and bending moment diagrams.

Grading of this problem was done by the instructor of Class 2 prior to, and independent of, comparisons with the inventory. Scores for this problem were based primarily on a student's ability to draw free body diagrams and write down equations of equilibrium, all of which lead to the shear force and bending moment diagrams. Students could earn a maximum of 6 points, and one point was taken off for very minor errors. Thus, we compared the inventory scores of two groups of students: those who earned 5 or 6 points and those who earned fewer than 5, the latter indicating more substantial errors in free body diagrams or equilibrium conditions (Table 7). Sub-scores on all concepts are shown, along with (i) the difference in the means divided by the standard deviation of the sub-scores for the full group, and (ii) the p-value from a t-test comparing the means of the two groups.

Table 7. Inventory performances of students in Class 2 with at most minor errors (score 5 or 6, N = 19) on the problem in Fig. 4, compared with students with more significant errors (score 4 or less, N = 19)

            FBD     Equil.   Friction   St. Eq.   Roller   Neg. Fr.   Slot    Repres.   Total
Score 5–6   0.80    0.58     0.33       0.33      0.82     0.46       0.77    0.67      16.47
Score 0–4   0.55    0.34     0.18       0.12      0.61     0.32       0.67    0.49      11.26
Diff/SD     0.80    0.90     0.56       0.78      0.65     0.53       0.36    0.52      1.12
p           0.013   0.004    0.087      0.015     0.045    0.102      0.278   0.109     <0.0005

One should note the concepts that most discriminate between students who did well (5–6) on this exam problem and those who did not (0–4). Sub-scores on concepts that are irrelevant to this problem, namely Friction, Negligible Friction, Slot, and Representation, display non-significant differences. (The concept Roller is also irrelevant to this problem; with p = 0.045, its difference is marginally significant.) On the other hand, the most relevant concepts, Separating Bodies (FBD), Equilibrium, and Static Equivalence, display the most significant differences. The relevance of these concepts to this problem is noteworthy. Certainly, the inventory was devised with a focus on the concepts necessary to solve problems of multiple inter-connected bodies [6], such as that of Fig. 2. It is clear, however, that the inventory is also quite relevant to problems that instructors would view as very distinct, namely the drawing of shear force and bending moment diagrams, as demanded by the problem in Fig. 4.
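The two statistics reported per concept in Table 7 can be computed as below; this is a sketch under the same caveats as before, with Diff/SD taken, as stated above, to be the difference of the group means divided by the standard deviation of the sub-scores for the full group:

```python
import numpy as np
from scipy.stats import ttest_ind

def diff_over_sd_and_p(group_hi, group_lo):
    """Return (mean difference / full-group SD, t-test p-value) for one sub-score."""
    hi, lo = np.asarray(group_hi, float), np.asarray(group_lo, float)
    full_sd = np.concatenate([hi, lo]).std(ddof=1)  # SD over both score bands together
    _, p = ttest_ind(hi, lo)
    return (hi.mean() - lo.mean()) / full_sd, p
```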
COMMON MISCONCEPTIONS IN STATICS

One important use for the inventory is to assess the prevalence of various types of errors. These can point to widely held misconceptions that ought to be addressed in instruction. Here we point out the errors that occurred most frequently.

An example of a question in the group 'FBD' is shown in Fig. 5. The student is asked to choose the correct free body diagram for the collection of blocks 1, 3 and 6, and the cords connecting them. The most commonly chosen wrong answer was the distractor with an 'internal force', that is, a force acting between two of the bodies in the diagram (Tc in Fig. 6). This incorrect answer was chosen by 15% of students, while the correct answer was chosen by 66% of students. (In other questions, the internal force distractor was chosen by 26% of students.)

Fig. 5. Question requesting student to draw the free body diagram of a subset of a system.

Fig. 6. Commonly selected wrong answer to question in Fig. 5, featuring an internal force.

An example of a question in the group 'Static Equivalence' is shown in Fig. 7. Students are given one loading and asked to find a second loading that is equivalent. The most commonly chosen wrong answer was the distractor (Fig. 8) in which the couple is taken to be equivalent to a force that apparently produces the same moment. This incorrect answer was chosen by 26% of students, while the correct answer was chosen by 32% of students. For one question in the group 'Static Equivalence', distractors of this type were chosen more frequently than the correct answer.

Fig. 7. Question requesting student to choose the loading that is statically equivalent to the 20 N·m couple.

Fig. 8. Commonly chosen wrong answer to question in Fig. 7 (force taken to be statically equivalent to couple).

An example of a question in the group 'Roller' is shown in Fig. 9. Students are shown a system and asked for the direction of the force of the roller on the body on which it rolls (the usual assumptions of a frictionless pin and so forth are given). The most commonly chosen wrong answer was the distractor in which the force acts parallel to the arm to which the roller is pinned, rather than perpendicular to the rolled surface. This incorrect answer was chosen by 20% of students, while the correct answer was chosen by 58% of students. In general, there is a strong tendency to assume that forces always act parallel to elongated members (as they do in a two-force member).

Fig. 9. Question requesting student to choose the direction of the force on the surface contacted by a roller.

An example of a question in the group 'Friction' is shown in Fig. 10. In this question, students are asked for the upward force exerted by the left block on the center block; the choices are all numbers. The distractor of 8 N was chosen by 39% of students, while the correct answer of 3 N was chosen by only 32% of students. Students are powerfully drawn to assume that the friction force equals the coefficient of friction times the normal force, rather than choosing the lesser force that maintains equilibrium.

Fig. 10. Question requesting student to choose the magnitude of the force between bodies in frictional contact.

An example of a question in the group 'Equilibrium' is shown in Fig. 11. Students are shown the bar (a) with the force applied and asked for the loading exerted by a hand that grips the right end and maintains the bar in equilibrium. The most commonly chosen wrong answer is the distractor in which a force and a moment try to balance each other. This incorrect answer, shown as (b) in Fig. 11, was chosen by 25% of students, while the correct answer was chosen by 58% of students. This parallels the error in static equivalence.

Fig. 11. (a) Question in which student is requested to find the load exerted by the hand at right which balances the bar. (b) Most commonly chosen wrong answer.

A second example of a question in the group 'Equilibrium' is shown in Fig. 12. The student is asked whether one or both of the loadings could be in equilibrium (all forces and couples have positive magnitudes and act in the directions shown, but the magnitudes are adjustable). Very few students (11%) recognized that neither loading could produce equilibrium: (I) because the forces cannot balance and (II) because the moments cannot balance. In particular, 70% of students accepted that (II) could be in equilibrium, and 53% of students accepted that (I) could be in equilibrium.

Fig. 12. Question in which student is asked if loadings (I) and/or (II) could be in equilibrium.

Students exhibit a wide range of misconceptions, as evidenced by their choices of wrong answers; however, certain errors stand out. Many students have trouble rejecting internal forces, perhaps revealing a lack of clarity regarding which body exerts each force, or which bodies' forces ought legitimately to be included in a free body diagram. Students often fail to grasp the necessity of independently satisfying force and moment summation, and/or that a couple carries no net force. Forces at various connections sometimes have their directions set by the nature of the connection, but students are distracted from this by the shape of the body or by other applied forces. Finally, the quantity μN is only the limit on the friction force; students are too often convinced that μN must be the actual level of the friction force, even when there is no slip and a lesser force is sufficient for equilibrium.
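This last misconception can be captured in a few lines of code. The sketch below is our illustration (the function and the numbers are hypothetical, not taken from the inventory), encoding the rule that μN bounds, but does not generally equal, the friction force:

```python
def friction_force(required: float, mu: float, normal: float) -> float:
    """Friction force on a non-slipping body in equilibrium.

    `required` is the tangential force that equilibrium demands; mu * normal
    is only an upper bound. The common misconception is to return mu * normal
    unconditionally.
    """
    if abs(required) > mu * normal:
        raise ValueError("equilibrium impossible without slip: |F| > mu*N")
    return required   # the lesser force that maintains equilibrium

# In the spirit of Fig. 10: if equilibrium requires only 3 N while mu*N = 8 N
# (say mu = 0.4, N = 20 N), the friction force is 3 N, not 8 N.
print(friction_force(3.0, 0.4, 20.0))
```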
SUMMARY AND CONCLUSIONS

The Statics Concept Inventory is a multiple-choice test that assesses the conceptual knowledge of students in Statics. The test consists of 27 questions that capture eight distinct concepts and include distractors (wrong answers) that have been constructed based on observations of student work. This paper is based on results from the administration of this test to 1331 engineering students in ten classes at seven US universities during the 2004–2005 academic year. Previous observations as to the psychometric soundness of the test, including reliability and the quality of items in terms of discrimination indices, have been reconfirmed.

The bulk of the paper addresses comparisons between performance on the inventory and performance on class examinations. It has been found that there are significant correlations between overall inventory scores and examination scores, correlations that are of the same order as correlations between class examinations. In addition, we have shown that inventory sub-scores on specific concepts can offer insight into the propensity to make the analogous conceptual errors in examinations. Finally, by investigating the distribution of wrong answers, we have identified common misconceptions, information that may be valuable for instruction. The inventory is available to all instructors, who can learn more by visiting www.engineering-education.com/CATS.

Acknowledgements: Support by the National Science Foundation under grant REC-0440295 and by the Department of Mechanical Engineering at Carnegie Mellon University is gratefully acknowledged.

REFERENCES

1. P. Black and D. Wiliam, Assessment and classroom learning, Assessment in Education, 5(1), pp. 7–73 (1998).
2. National Research Council, Knowing What Students Know: The Science and Design of Educational Assessment, J. W. Pellegrino, N. Chudowsky and R. Glaser (Eds.), National Academy Press, Washington, D.C. (2001).
3. National Research Council, How People Learn: Brain, Mind, Experience and School, Committee on Developments in the Science of Learning, J. D. Bransford, A. L. Brown and R. R. Cocking (Eds.), National Academy Press, Washington, D.C. (1999).
4. D. Hestenes, M. Wells and G. Swackhamer, Force Concept Inventory, The Physics Teacher, 30, pp. 141–158 (1992).
5. D. Evans et al., Panel discussion: progress on concept inventory assessment tools, 33rd ASEE/IEEE Frontiers in Education Conference, Boulder, CO, November 5–8 (2003).
6. P. S. Steif and J. A. Dantzler, A statics concept inventory: development and psychometric analysis, Journal of Engineering Education, 94(4), pp. 363–371 (2005).
7. P. S. Steif, An articulation of the concepts and skills which underlie engineering statics, 34th ASEE/IEEE Frontiers in Education Conference, Savannah, GA, October 20–23 (2004).
8. R. Hake, Interactive engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses, American Journal of Physics, 66(1), pp. 64–74 (1998).
9. L. Crocker and J. Algina, Introduction to Classical and Modern Test Theory, Holt, Rinehart and Winston, Philadelphia (1986).

Paul S. Steif received undergraduate and graduate degrees in engineering mechanics from Brown University and Harvard University. He is currently Professor of Mechanical Engineering at Carnegie Mellon University. He has been active as a teacher and researcher in the field of engineering mechanics, for which he has received a number of awards. Dr. Steif is currently involved in research to study student learning in basic engineering subjects, to measure student conceptual progress, and to construct educational materials that facilitate learning. Many of these developments have reached an international audience, including educational software published with widely selling textbooks.

Mary Hansen received M.S. and Ph.D. degrees in Research Methodology from the University of Pittsburgh. She is currently an Assistant Professor in the School of Education and Social Sciences at Robert Morris University. Her research interests include educational measurement and assessment.