Automatic assessment of math skills

Przemysław Kajetanowicz
Institute of Mathematics and Computer Science, Wrocław University of Technology
przemyslaw.kajetanowicz@pwr.wroc.pl

Jędrzej Wierzejewski
Institute of Mathematics and Computer Science, Wrocław University of Technology
jedrzej.wierzejewski@pwr.wroc.pl

Abstract
For three semesters now, the authors have been delivering e-courses in linear algebra to students of Wrocław University of Technology. Each course covers a few hundred problems in algebra with analytical geometry. The problems are offered to students as interactive, randomly generated exercises whose difficulty level and other parameters can be controlled at design time. Individual problems can be combined into comprehensive exams (in particular, the whole course material can be covered by a single exam). The article presents an effective solution for automated assessment of math skills. Partial grading of selected individual problems is discussed. A grading system relying entirely on remote electronic tests and on exams proctored in a computer lab has been in use for three semesters, with considerable success.

1. Introduction
Regardless of the specific field of education, mechanisms for automatic assessment of a learner's progress and skills have nowadays become an essential objective for designers of computer-aided learning systems. Testing utilities can be found in every major learning management system (LMS). In the well-known WebCT e-learning platform, for example, the teacher is given a tool called the online quiz, which offers convenient means to create questions of various types: multiple-choice, matching, calculated, short answer or paragraph. Such tools provide relatively universal and decent means to automate typical exams in which the student is assessed on his or her knowledge of facts or the ability to apply simple formulae.
Mathematics is one of those areas of education where the student's progress is measured almost exclusively by testing his or her ability to solve problems related to a specific part of the material. The solution of an individual math problem usually breaks down into two or more steps (except, of course, for simplistic problems involving the application of a single correctly identified formula or rule). When grading a math exam, the teacher typically devises a grading procedure for each individual problem in such a way that each part of the solution is assigned a given fraction of the total score for that problem. In this way, the grading procedure reflects the solution process: a student who arrives at a partial solution earns the corresponding partial credit. (One can say that there is a one-to-one correspondence between the structure of the scoring system and the structure of a typical solution.) Obviously, there are limits to this kind of approach: in many math problems it is very difficult or impossible to determine what "a typical solution" might mean. Yet in many cases a scoring procedure that mimics the solution process works well and is widely used in mathematics and related fields.
In view of those remarks, it is evident that the general-purpose testing utilities offered by a typical LMS are of limited value when one tries to map them onto a more flexible math grading system. In the remainder of the paper the authors present their experience in implementing a grading system based entirely on automatic knowledge assessment, with extensive use of the above-mentioned partial credit feature.
2. E-course in algebra – first version

2.1. Project history
During the first 9 months of 2005, the authors designed and created an e-course in algebra with analytical geometry (see [5] for details). The course employed and extended the idea of automatic knowledge assessment first used in an experimental lesson on the quadratic function ([4]). The reader is referred to [5] for the details of the functionality of the first version of the course and for the results of its first implementation in the spring of 2005.

2.2. Functional features and first implementation
For the sake of completeness, let us repeat the most important features of the e-course (again, refer to [5] for details):
• The math content of the course covers complex numbers, polynomials and rational functions, matrices and linear systems, analytical geometry in space, and conic sections. The whole content can easily be aggregated into an IMS or a SCORM package (see [1]).
• The material is offered in the form of Web pages supplemented with highly interactive Java-driven exercises and interactive math simulators.
• Nearly 80 types of math problems are supported in the form of interactive tests, varying from simple, drill-type exercises to quite sophisticated graphing problems or problems in which the student is expected to demonstrate mastery of a given method.
• Individual problems can be combined into sophisticated comprehensive exams that can be used both for practice and as an element of the official grading system.
The grading system that accompanied the first implementation of the course was based entirely on 3 electronic exams that the students took in a computer lab.

3. Further development
The results of the first implementation of the course were promising enough to decide to apply the automatic assessment system on a larger scale. In the meantime, a few more types of problems were supported to enhance the course functionality.
In response to some students' complaints that the automatic tests are "ruthless", in the sense that forgetting to fill out a single edit field results in the solution being rejected, two safety features were added to the Java-driven tests.
• Completeness check. When a student submits an exam, the program checks whether all the necessary edit fields in the exam window have been filled out (a minimal sketch of such a check is given after this list). The program warns the student if the solution has not been completely entered. The student can then go back to the exam window and complete his or her work.
• Initial correctness check. Many students complained that they were punished for "technical" errors made while entering the solution into the exam window. As a result, the solution was graded as incorrect even though the student had arrived at the correct solution on paper. In the enhanced version, the program first checks for incorrect solutions and then gives the student a certain number of chances to correct the solution. Since this feature obviously gives extra chances also to students who simply do not know how to reach the correct solution, the mechanism should be used with caution (e.g. only one or two tries are offered to the student).
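To illustrate the completeness check, the following minimal sketch scans a list of edit fields and reports whether any of them was left blank. It is written for this paper only; the class and field names (CompletenessCheck, requiredFields) are assumptions and do not come from the actual course applets.

// Minimal sketch of a pre-submission completeness check (illustrative only;
// the names and structure are assumed, not taken from the actual applets).
import java.awt.TextField;
import java.util.List;

public class CompletenessCheck {

    // Returns true only if every required edit field contains some input.
    public static boolean isComplete(List<TextField> requiredFields) {
        for (TextField field : requiredFields) {
            if (field.getText().trim().isEmpty()) {
                return false;   // at least one field was left blank
            }
        }
        return true;
    }

    // Before grading, the applet would warn the student instead of submitting:
    // if (!isComplete(fields)) { /* show a warning and return to the exam window */ }
}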
4. Second implementation
In the fall semester of 2005, the course was delivered to a group of almost 400 students of the Civil Engineering Department of Wrocław University of Technology. The course was taught, as before, on a blended learning basis, with two hours of lecture and one hour of recitation per week. The authors continued to use WebCT as the learning platform.
The grading system was again to be based on automatic assessment. One natural technical barrier emerged: with the number of students totaling almost 400, it was practically impossible to carry out mid-term electronic exams in computer labs. The authors therefore decided that only the final exam would be held under teachers' supervision in a lab. In addition, each student was to take 6 online exams whose results were automatically stored in a database. The obvious lack of security (a student can, after all, take an online exam with a third party's help) was taken into account when the final grade was calculated. The next section presents the details of the grading system.

5. Administrative grading system
The grading system was based entirely on automatic assessment. During the semester, each student took 6 online exams (without teachers' supervision). At the end of the semester, the final exam was administered in a computer lab. The following grading procedures were used (a worked sketch of the resulting arithmetic is given after Table 1).
• Each of the 6 online exams consisted of 5 problems, with a maximum score of 4 points for a single problem. The time limit for an individual exam was 60 minutes. In this way, a student could earn a total of 120 points throughout the semester.
• The final exam consisted of 6 problems, each carrying a maximum score of 5 points (30 points in total). A student was given 90 minutes for the solution.
• Additionally, a student could earn up to 6 points for in-class activity.
Regardless of how well a student had done on the online exams, he or she had to demonstrate an appropriate level of skills on the final exam. Specifically, a minimum of 16 points on the final exam was a necessary (but not sufficient) condition for a student to pass. Otherwise the student failed and had to take the make-up exam.
• If a student scored at least 16 points on the final exam, then 20% of the student's online-earned score (up to 24 points) as well as the activity credit (up to 6 points) were added to the result of the final exam. In this way, a maximum of 60 points was available for all the elements of the course. The final grade was based on the table below.
• A student who did not get a passing grade after the final exam had to take the make-up exam, consisting of 8 problems to be solved within 120 minutes. Up to 60 points could be earned for the solutions. The final grade was again based on the table below, with online exams and in-class activity no longer taken into account.
The following table contains the conversion of scores into grades (the grades in the Polish education system form the set 2, 3, 3.5, 4, 4.5 and 5, with "2" corresponding to "fail" and "5" to "very good").

Table 1. Score-grade conversion
Total score   Grade
0 – 35        2
36 – 40       3
41 – 45       3.5
46 – 50       4
51 – 55       4.5
56 – 60       5
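To make the arithmetic of these rules concrete, the sketch below computes a final grade from the three score components and the conversion in Table 1. It is an illustration prepared for this paper, not the actual course software; the names (FinalGrade, computeGrade, toGrade) and the rounding of the 20% component are assumptions, and the make-up exam path is not modeled.

// Illustrative sketch of the administrative grading rules described above.
public class FinalGrade {

    // onlineScore: 0..120, finalExamScore: 0..30, activityCredit: 0..6
    public static double computeGrade(int onlineScore, int finalExamScore, int activityCredit) {
        // A minimum of 16 points on the final exam is required to pass at all.
        if (finalExamScore < 16) {
            return 2.0;                                      // fail; make-up exam required
        }
        // 20% of the online-earned points (at most 24) plus activity credit are added.
        int total = finalExamScore
                  + Math.min(24, (int) Math.round(0.20 * onlineScore))
                  + activityCredit;                          // total is at most 60
        return toGrade(total);
    }

    // Score-grade conversion from Table 1.
    private static double toGrade(int total) {
        if (total >= 56) return 5.0;
        if (total >= 51) return 4.5;
        if (total >= 46) return 4.0;
        if (total >= 41) return 3.5;
        if (total >= 36) return 3.0;
        return 2.0;
    }
}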
6. How automatic assessment works
The whole course includes nearly 80 problem types, covering the whole material listed in 2.2. Recall that each problem can be delivered to the student either as a single interactive exercise or as a part of a more comprehensive exam. Regardless of the actual role played by a problem at a specific point of study (a single exercise or an exam problem), the student is required to solve the problem by hand and then enter the appropriate elements of the solution in the applet window. A specially designed user interface is provided for that purpose: edit fields, spin controls, and – in graphical problems – sophisticated graphical tools, including a virtual ruler and a virtual eraser. Additionally, in some problems involving matrices and linear systems, the student is provided with additional highly specialized tools so that elementary row/column operations can be performed conveniently, without the risk of computation errors.
The grading begins once a student submits an exercise or an exam for evaluation. Although different problem types implement different grading algorithms, grading is governed by a few common rules, which we will now discuss in greater detail.

6.1. Three types of grading
One of the following three grading types can be associated with an individual problem.
Two-state grading – the solution receives one of two possible grades, "correct" or "incorrect".
Three-state grading – the student receives a grade equal to 0, ½ or 1, with ½ reflecting partial credit for a partial solution.
Score grading – a specified maximum number of points can be earned for the solution.
The teacher's freedom in choosing one of the above-described grading types depends essentially on the role that an individual problem plays.
Single-problem exercise. If a problem acts as a single-problem exercise, it is up to the teacher/designer which of the three grading types to choose. Two-state and three-state grading are commonly applied in such cases, although score grading is also available. No intervention in the source code is necessary – the teacher defines the desired grading type through a parameter in an external configuration file.
Exam problem. Score grading is the teacher's sole option in the case of an exam problem. To each problem on the exam, the teacher/designer has to assign a maximum score M by setting an appropriate parameter in an external configuration file.

6.2. Grading algorithm for a single problem
The actual grading of a single problem is carried out in two steps, of which the first is completely hard-coded, while the second partially depends on the type of grading that the teacher has chosen; if score grading has been chosen, then the maximum score M assigned by the teacher is also taken into account.
Step 1. A hard-coded initial grading function is associated with every problem type. The function first verifies the correctness of the individual elements of the solution entered by the student. A real number p from the interval [0, 100] is then returned, based on the function's internal algorithm, specific to the particular problem type. (From the user's point of view, the number p can be viewed as the evaluated "percentage of correctness" of the solution.) We will refer to the number p as the raw score. Note that the raw score depends heavily on the specific type of problem and on the structure of the solution. We will give illustrative examples in the sequel.
Step 2. The actual grade for the solution of a single problem is based on two elements: a) the raw score that the initial grading function returns, and b) the associated type of grading (see the remarks on the teacher's options above).
Two-state grading. If the returned raw score equals 100, then the solution is graded as "correct". Otherwise the solution is graded as "incorrect".
Three-state grading. If the returned raw score is equal to 100, then the solution is graded as "1"; if the raw score satisfies 50 ≤ p < 100, then the solution is graded as "½"; otherwise the solution is graded as "0".
Score grading. Let M denote the maximum score set by the teacher for the solution of the problem, and let p stand for the raw score returned by the initial grading function (see Step 1 above). The actual score P is then computed from the formula
P = E(p·M / 100),
where E stands for the greatest integer function. For example, if the initial grading function has returned p = 40 and the teacher has set M = 5, then the actual score that the student gets for the solution is P = 2. Recall that the raw score depicts the "percentage of correctness" of the solution. It is easy to see that the formula for P simply converts the raw score into the corresponding (rounded down) fraction of the maximum score assigned to the problem by the teacher.
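The conversion performed in Step 2 can be summarized in a few lines of code. The following sketch was written for this paper rather than taken from the course applets; it shows the three grading types side by side, and the class and method names are assumptions.

// Sketch of Step 2: converting the raw score p (0..100) into the delivered grade.
public class GradeConversion {

    // Two-state grading: "correct" only for a fully correct solution.
    public static boolean twoState(double p) {
        return p == 100.0;
    }

    // Three-state grading: 0, 0.5 or 1, with 0.5 as partial credit.
    public static double threeState(double p) {
        if (p == 100.0) return 1.0;
        if (p >= 50.0)  return 0.5;
        return 0.0;
    }

    // Score grading: P = E(p*M/100), where E is the greatest integer function.
    public static int score(double p, int maxScore) {
        return (int) Math.floor(p * maxScore / 100.0);
    }
}

For p = 40 and M = 5, score(40, 5) returns 2, in agreement with the example above.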
6.3. Inside the initial grading function
As already mentioned, the initial grading function acts in accordance with hard-coded rules that reflect the specific nature of the problem in question. Three examples now follow, in order to illustrate how the idea of initial grading works in practice. Each example consists of a typical formulation of the problem, followed by a description of the initial grading method. We start with an example where the grading rules are applied in a linear way, so to speak.
Problem. Find the inverse of a given matrix.
Initial grading algorithm. (Recall that the minimum possible value of the raw score is 0 and the maximum possible value is 100.) The student provides the solution as a collection of numbers – the entries of the required matrix inverse. If c denotes the number of the student's correct entries, and i denotes the total number of entries, then the value of the raw score p is given by the formula
p = 100·c / i.
Note that the above formula maps the percentage of correct entries linearly onto p, thus resulting in a somewhat "mechanical" way of grading. On the other hand, a single computational error on the student's part does not result in losing much of the maximum credit.
The next example illustrates the application of a more sophisticated grading algorithm in the initial grading function.
Problem. Find the roots of a given polynomial. Determine the multiplicity of each root.
Here, the solution is to be provided as the collection of all roots together with the collection of all corresponding multiplicities. Thus the number of entries that the student is required to provide is twice the total number of roots.
Initial grading algorithm. Let N be the total number of roots. Denote by n the number of correct roots provided by the student, by m the number of correct multiplicities provided by the student, and by i the number of incorrect roots provided by the student. The raw score is then given by the formula
p = 100·(n + m − i) / (2N),
provided that the expression on the right-hand side is nonnegative. If the expression is negative, then p is set to 0.
The last example addresses the Gauss-Jordan elimination method.
Problem. Reduce a given system of linear equations to echelon form. Then determine the number of solutions. If there are infinitely many solutions, determine the number of parameters and then provide an arbitrary integer-valued solution. If the system has a unique solution, provide that solution.
Initial grading algorithm. If the linear system in question has no solutions, then the raw score p is set to 0 or 100, depending on the student's response. If the system has exactly one solution, then the available maximum of 100 is partitioned in the following way: 50 (for reducing the system to echelon form) + 20 (for correctly determining that the number of solutions is equal to 1) + 30 (for the correct numeric solution). If the system has infinitely many solutions, then the partition has the form 40 (for reducing the system to echelon form) + 10 (for the correct number of solutions) + 20 (for the correct number of parameters) + 30 (for a correct integer-valued solution).
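The formulas above translate directly into code. The sketch below was prepared for this paper and is not taken from the actual initial grading functions; the class and method names, and the boolean flags used for the Gauss-Jordan case, are assumptions. It only shows how the raw score would be computed once the individual elements of the solution have been checked.

// Illustrative raw-score computations for the three examples above.
public class RawScoreExamples {

    // Matrix inverse: c correct entries out of i entries in total.
    public static double matrixInverse(int c, int i) {
        return 100.0 * c / i;
    }

    // Polynomial roots: N roots in total, n correct roots, m correct
    // multiplicities, i incorrect roots offered by the student.
    public static double polynomialRoots(int N, int n, int m, int i) {
        double p = 100.0 * (n + m - i) / (2.0 * N);
        return Math.max(0.0, p);                 // a negative value is clipped to 0
    }

    // Gauss-Jordan elimination, the case of a unique solution:
    // the 50 + 20 + 30 partition of the maximum raw score of 100.
    public static double uniqueSolutionCase(boolean echelonFormCorrect,
                                            boolean solutionCountCorrect,
                                            boolean numericSolutionCorrect) {
        double p = 0.0;
        if (echelonFormCorrect)     p += 50.0;
        if (solutionCountCorrect)   p += 20.0;
        if (numericSolutionCorrect) p += 30.0;
        return p;
    }
}

For instance, a student who inverts a 3×3 matrix with one wrong entry receives matrixInverse(8, 9) ≈ 88.9 as the raw score.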
It should be emphasized at this point that the grading algorithm for each problem type has been made available to students. Whenever the raw score can assume more than two values, a detailed description of the initial grading is given. The student is thus aware of the correspondence between a partial solution and a partial credit.

7. Results
The administrative grading system, based exclusively on automatic assessment, turned out to be far more effective than the traditional system, in which students are given two "paper" mid-term exams followed by the final exam. In each of the two tables below, the second column gives the percentage of students that received the grade shown in the left column. Of course, other factors in favor of the e-course can easily be identified, such as the availability of a complete set of detailed lecture notes, access to interactive exercises, and the presence of practice exams. The fact that Table 2 gives more detail is caused by the limitations of the university data-gathering system when it comes to traditional forms of instruction. In the case of an e-course, detailed data can easily be obtained from the database where the results of the automatic exams are stored.

Table 2. E-course – final grades
Grade   Percentage of students
2       48.72%
3       6.41%
3.5     17.95%
4       3.85%
4.5     8.97%
5       14.10%

Table 3. Traditional course – final grades
Grade         Percentage of students
2             84.11%
3 or better   15.89%

8. Conclusion
One question was often raised by outside observers: how reliable is the final grade if it includes credit that the student earns through remotely taken exams? As justified as questions of that sort are, it is not uncommon in academic courses that a student's homework assignments, say, are taken into account when the final grade is determined. (Online exams can, in fact, be viewed as homework assignments in electronic form.) Accordingly, the sole fact that some of the student's activity is not supervised cannot serve as an argument against including online exams as part of the grading system, provided that additional conditions are imposed (e.g. the requirement that a student obtain a specified minimum on the final exam before remote exams are taken into account at all). The anonymous survey that the students were given at the end of the course showed that a vast majority of students highly appreciate the new form of learning.
As of this writing, in the fall semester of 2006, a total of nearly 1000 students take part in algebra e-courses where automatic assessment again serves as the basis for the grading system.

9. References
[1] ADL Technical Team, "Sharable Content Object Reference Model (SCORM)", Version 1.2, Advanced Distributed Learning, www.adlnet.org, 2001.
[2] E. Cosyn, J-P. Doignon, J-C. Falmagne, N. Thiery, "The Assessment of Knowledge, in Theory and Practice", www.aleks.com/about/Science_Behind_ALEKS.pdf, 2004.
[3] J. Engelbrecht, A. Harding, "Teaching Undergraduate Mathematics on the Internet", Educational Studies in Mathematics, 58 (2005), pp. 235-252.
[4] P. Kajetanowicz, J. Wierzejewski, "E-lesson on Quadratic Function. A Step Towards an Online Remedial Math Course", Proceedings, 5th International Conference Virtual University, Bratislava, Slovakia, December 2004.
[5] P. Kajetanowicz, J. Wierzejewski, "E-learning in College Mathematics – an Online Course in Algebra with Automatic Knowledge Assessment", Proceedings, 6th International Conference Virtual University, Bratislava, Slovakia, 2005.