Automatic assessment of math skills

Przemysław Kajetanowicz
Institute of Mathematics and Computer Science
Wrocław University of Technology
przemyslaw.kajetanowicz@pwr.wroc.pl
Jędrzej Wierzejewski
Institute of Mathematics and Computer Science
Wrocław University of Technology
jedrzej.wierzejewski@pwr.wroc.pl
Abstract
For three semesters now, the authors have been delivering
e-courses in linear algebra to students of Wrocław
University of Technology. A few hundred problems in
algebra with analytical geometry are covered in each
course. The problems are offered to students in the form
of interactive, randomly generated exercises, where the
difficulty level and other parameters can be controlled at
design time. Individual problems can be combined into
comprehensive exams (in particular, the whole course
material can be covered by a single exam). The article
presents an effective solution for the automated assessment
of math skills. Partial grading of selected individual
problems is discussed. For three semesters now, a
grading system that relies completely on remote
electronic tests and on exams proctored in a computer lab
has been in use, with considerable success.
1. Introduction
Regardless of any specific field of education, the
mechanisms of automatic assessment of a learner’s
progress and/or skills have nowadays become an essential
objective for designers of computer-aided learning
systems. Testing utilities can be found in every
major learning management system (LMS). In the
well-known WebCT e-learning platform, for example, a
teacher is given a tool called online quiz, which offers a
convenient way to create questions of various types:
multiple-choice, matching, calculated, short answer or
paragraph. Such tools provide relatively universal and
adequate means to automate typical exams, in which the
student is assessed on his or her knowledge of
facts or the ability to apply simple formulae.
Mathematics is one of those areas of education where the
student’s progress is measured almost exclusively by
testing his or her ability to solve problems related to a
specific part of the material. The solution of an individual
math problem usually breaks down into two or more steps
(except, of course, simplistic problems involving the
application of a single correctly identified formula or
rule). When grading a math exam, the teacher typically
devises a grading procedure for each individual problem
in such a way that each part of the solution is assigned a
given fraction of the total score for the graded problem. In
that way, the grading procedure reflects the solution
process: a student who arrives at a partial solution earns
the corresponding partial credit. (One can say that there is
a one-to-one correspondence between the structure of a
scoring system and the structure of a typical solution.)
Obviously, there are limits to that kind of approach: in
many math problems, it is very difficult or impossible to
determine what “a typical solution” might mean. Yet in
many cases the scoring procedure that mimics the solution
process works well and is widely used in mathematics and
related fields.
In view of those remarks, it is evident that the
general-purpose testing utilities that a typical LMS offers are of
limited value when one tries to map them to a more
flexible math grading system.
In the remainder of the paper, the authors present their
experience in implementing a grading system based entirely
on automatic knowledge assessment, with extensive
use of the above-mentioned partial credit feature.
2. E-course in algebra – first version
2.1. Project history
During the first 9 months of 2005, the authors designed
and created an e-course in algebra with analytical
geometry (see [5] for details). The course employed and
extended the idea of automatic knowledge assessment
first used in an experimental lesson on quadratic function
([4]).
The reader is referred to [5] for the details concerning the
functionality of the first version of the course and for the
results of its first implementation in the spring of 2005.
2.2. Functional features and first implementation
For the sake of completeness, let us repeat the most
important features of the e-course (again, refer to [5] for
details):
• The math content of the course covers complex
numbers, polynomials and rational functions,
matrices and linear systems, analytical geometry in
space, and conic sections. The whole content can be
easily aggregated into an IMS or a SCORM package
(see [1]).
• The material is offered in the form of Web pages
supplemented with highly interactive Java-driven
exercises and interactive math simulators.
• Nearly 80 types of math problems are supported in
the form of interactive tests, varying from simple,
drill-type exercises to quite sophisticated graphing
problems or problems where the user is supposed to
demonstrate mastery of a given method.
• Individual problems can be combined into
sophisticated comprehensive exams that can be used
both for practice and as an element of the official
grading system.
The grading system that accompanied the first
implementation of the course was based entirely on 3
electronic exams that the students took in a
computer lab.
3. Further development
The results of the first implementation of the course were
promising enough to justify applying the automatic
assessment system on a larger scale. In the meantime,
support for a few more types of problems was added to
enhance the course functionality. In response to some
students’ complaints that the automatic tests were
“ruthless” in the sense that forgetting to fill out a single
edit field resulted in the solution being rejected, two
safety features were added to the Java-driven tests.
• Completeness check. When a student submits an
exam, the program checks whether all the necessary
edit fields in the exam window have been filled out.
The program warns the student if the solution is not
completely entered. The student can then go back to
the exam window and complete his or her work.
• Initial correctness check. Many students
complained that they were punished for “technical”
errors made while entering the
solution into the exam window. As a result, the solution
was graded as incorrect, even though the student had
arrived at the correct solution on paper. In the
enhanced version, the program first checks
for incorrect solutions and then gives the student a
certain number of chances to correct the solution.
Since that feature obviously gives extra chances also
to students who simply do not know how to get to the
correct solution, the mechanism should be used with
caution (e.g. only one or two tries are offered to the
student).
4. Second implementation
In the fall semester of 2005, the course was delivered to a
group of almost 400 students of the Civil Engineering
Department of Wrocław University of Technology.
The course was taught, as before, on a blended learning
basis, with two hours of lecture and one hour of recitation
per week. The authors continued to use WebCT
as the learning platform. The grading system was again
to be based on automatic assessment.
One natural technical barrier emerged: with the number of
students totaling almost 400, it was practically
impossible to carry out mid-term electronic exams in
computer labs. Therefore, the authors decided that only a
final exam was to be held under teachers’ supervision in a
lab. In addition, each student was supposed to take 6
online exams whose results were automatically stored in a
database.
The obvious lack of security (a student can be
taking online exams with a third party’s help) was taken
into account when the final grade was calculated. The
next section presents the details of the grading system.
5. Administrative grading system
The grading system was based entirely on automatic
assessment. During the semester, each student took
6 online exams (without teachers’ supervision). At the
end of the semester, the final exam was administered in a
computer lab. The following grading procedures were
used.
• Each of the 6 online exams consisted of 5 problems,
with a maximum score of 4 points for a single
problem. The time constraint for an individual exam
was 60 minutes. In that way, a student could earn a
total of 120 points throughout the semester.
• The final exam consisted of 6 problems, each
carrying a maximum score of 5 points (totaling
30). A student was given 90 minutes for the solution.
• Additionally, a student could earn up to 6 points for
the in-class activity.
• Regardless of how well a student had done on the
online exams, he or she had to demonstrate
an appropriate level of skills on the final exam.
Specifically, a minimum of 16 points on the final
exam was a necessary (but not sufficient) condition
for a student to pass. Otherwise the student failed
and had to take the make-up exam.
• If a student scored at least 16 points on the final
exam, then 20% of the student’s online-earned scores
(up to 24 points) as well as the activity credit (up to 6
points) were added to the result of the final exam. In
that way, a maximum of 60 points was available for a
student to earn for all the elements of the course. The
final grade was based on the table below.
• A student who did not get a passing grade after the
final exam had to take the make-up exam consisting
of 8 problems to be solved within 120 minutes. Up to
60 points could be earned for the solutions. The final
grade was based on the table below, with online
exams or in-class activity no longer taken into
account.
The following table contains the conversion of scores into
grades (the grades in Polish education system form the set
2, 3, 3.5, 4, 4.5 and 5, with “2” corresponding to “fail”
and “5” corresponding to “very good”).
Table 1. Score-grade conversion
Total score    Grade
0 – 35         2
36 – 40        3
41 – 45        3.5
46 – 50        4
51 – 55        4.5
56 – 60        5
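The rules of this section can be summarized in a short sketch. It is only an illustration of the procedure described above, not the authors' actual implementation. The function and constant names are hypothetical. One detail is an assumption: the text does not say how a fractional total (20% of the online points need not be an integer) is matched against the integer bands of Table 1, so here the total is simply compared with the lower bound of each band.

```python
from typing import Sequence

# Lower bounds of the score bands in Table 1, highest band first.
GRADE_BANDS = [(56, 5.0), (51, 4.5), (46, 4.0), (41, 3.5), (36, 3.0)]
FINAL_EXAM_MINIMUM = 16  # minimum final-exam score required to pass


def score_to_grade(total: float) -> float:
    """Convert a total score (0 to 60) into a grade using Table 1.

    Assumption: fractional totals are compared with the lower bound
    of each band without rounding.
    """
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return grade
    return 2.0


def course_grade(final_exam: int, online_exams: Sequence[int], activity: int) -> float:
    """Grade after the final exam, before any make-up exam."""
    if final_exam < FINAL_EXAM_MINIMUM:
        # Below the threshold the student fails regardless of online results.
        return 2.0
    online_bonus = sum(online_exams) / 5  # 20% of the online points (at most 24)
    return score_to_grade(final_exam + online_bonus + activity)


def makeup_grade(makeup_exam: int) -> float:
    """Grade after the make-up exam: Table 1 applied to that exam alone."""
    return score_to_grade(makeup_exam)
```

For instance, a student with 16 points on the final exam, full marks on all six online exams and no activity credit reaches 16 + 24 = 40 points, which Table 1 converts to the grade 3.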
6. How automatic assessment works
The whole course includes nearly 80 problem types,
covering the whole material, as listed in 2.2. Recall that
each problem can be delivered to a student either in the
form of a single interactive exercise or as a part of a more
comprehensive exam. Regardless of the actual role played
by a problem at a specific point of study (a single exercise
or an exam problem), a student is required to solve the
problem by hand and then to enter appropriate elements
of the solution in the applet window. A specially designed
user interface is provided for that purpose: edit fields, spin
controls, and – in graphical problems – sophisticated
graphical tools, including a virtual ruler and virtual eraser.
Additionally, in some problems involving matrices and
linear systems, a student is provided with
highly specialized tools so that elementary row/column
operations can be performed conveniently without the risk
of computation errors.
The grading begins once a student submits an exercise or
an exam for evaluation. Regardless of the fact that
different problem types implement different grading
algorithms, grading is governed by a few common rules,
which we will now discuss in greater detail.
6.1. Three types of grading
One of the following three grading types can be
associated with an individual problem.
Two-state grading – the solution is graded as either
“correct” or “incorrect”.
Three-state grading – a student receives the grade equal
to 0, ½ or 1, with ½ reflecting partial credit for a partial
solution.
Score grading – a specified maximum number of points
can be earned for the solution.
The teacher’s freedom in choosing one of the
above-described types of grading essentially depends on the role
that an individual problem plays.
Single-problem exercise. If a problem “acts” as a
single-problem exercise, then it is up to the teacher/designer
to choose one of the three grading types. Two-state and
three-state grading is commonly applied in such cases,
although score grading is also available. No intervention
in the source code is necessary – the teacher defines the
desired grading type through a parameter in an external
configuration file.
Exam problem. The score grading is the teacher’s sole
option in the case of an exam problem. To each problem
on the exam, the teacher/designer has to assign a
maximum score M by setting an appropriate parameter in
an external configuration file.
6.2. Grading algorithm for a single problem
The actual grading of a single problem is carried out in
two steps, of which the first is completely hard-coded,
while the second partially depends on the type of grading
that the teacher has chosen; if the score grading has been
chosen, then the maximum score M assigned by the
teacher is also taken into account.
Step 1. A hard-coded initial grading function is
associated with every problem type. The function first
verifies the correctness of individual elements of the
solution entered by the student. A real number p from the
interval [0,100] is then returned, based on the function’s
internal algorithm, specific to a particular problem type.
(From the user’s point of view, the number p can be
viewed as the evaluated “percentage of correctness” of the
solution). We will refer to the number p as the raw score.
Note that the raw score heavily depends on the specific
type of a problem and the structure of the solution. We
will give illustrative examples in the sequel.
Step 2. The actual grade for the solution of a single
problem is based on two elements: a) the actual raw score
that the initial grading function returns, and b) the
associated type of grading (see the remarks on the
teacher’s options above).
Two-state grading. If the returned raw score equals 100,
then the solution is graded as “correct”. Otherwise the
solution is graded as “incorrect”.
Three-state grading. If the returned raw score is equal to
100, then the solution is graded as “1”; if the raw score
satisfies 50 ≤ p < 100, then the solution is graded as “½”;
otherwise the solution is graded as “0”.
Score grading. Let M denote the maximum score set by
the teacher for the solution of the problem, and let p stand
for the raw score returned by the initial grading function
(see Step 1 above). The actual score P is then computed
from the formula
$$P = E\left(\frac{p \cdot M}{100}\right),$$
where E stands for the greatest integer function. For
example, if the initial grading function has returned p =
40 and the teacher has set M = 5, then the actual score that
a student gets for the solution is P = 2. Recall that the raw
score depicts the “percentage of correctness” of the
solution. It is easy to see that the formula for P simply
converts the raw score into the corresponding fraction of
the maximum score assigned to the problem by the
teacher, rounded down to an integer.
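As an illustration of Step 2, the three conversions of the raw score p can be sketched as follows. This is a minimal sketch rather than the course's actual Java code, and the function names are hypothetical. The greatest integer function E is rendered with `math.floor`.

```python
import math


def two_state_grade(p: float) -> str:
    """Two-state grading: only a raw score of 100 counts as correct."""
    return "correct" if p == 100 else "incorrect"


def three_state_grade(p: float) -> float:
    """Three-state grading: 1 for p = 100, 1/2 for 50 <= p < 100, else 0."""
    if p == 100:
        return 1.0
    if 50 <= p < 100:
        return 0.5
    return 0.0


def score_grade(p: float, max_score: int) -> int:
    """Score grading: P = E(p * M / 100), with E the greatest integer function."""
    return math.floor(p * max_score / 100)
```

With p = 40 and M = 5, `score_grade(40, 5)` gives 2, in agreement with the example in the text.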
6.3. Inside the initial grading function
As stated above, the initial grading function acts in
accordance with hard-coded rules that reflect the specific
nature of the problem in question. Three examples will
now follow, in order to illustrate how the idea of initial
grading works in practice. Each example will consist of
the problem’s typical formulation, followed by the
description of the initial grading method.
We start with an example where the grading rules are
applied in a linear way, so to speak.
Problem. Find the inverse of a given matrix.
Initial grading algorithm. (Recall that the minimum
possible value of the raw score is 0 and the maximum
possible value is 100.) The student provides the solution
as a collection of numbers – the entries of the required
matrix inverse. If c denotes the number of the student’s
correct entries, and i denotes the total number of entries,
then the value of the raw score p is given by the formula
$$p = 100 \cdot \frac{c}{i}.$$
Note that the above formula linearly maps the percentage
of correct elements into p, thus resulting in a somewhat
“mechanical” way of grading. On the other hand, a single
computational error on the student’s part does not result
in losing much of the maximum credit.
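The entry-counting rule above can be sketched in a few lines. The sketch assumes that entries are compared exactly (here as `Fraction` values). How the actual applet compares a student's entry with the expected one is not specified in the text, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Sequence

Matrix = Sequence[Sequence[Fraction]]


def inverse_raw_score(student: Matrix, expected: Matrix) -> float:
    """Raw score p = 100 * c / i for the matrix-inverse problem.

    c is the number of entries that match the expected inverse,
    i is the total number of entries. Exact comparison is an assumption.
    """
    total = sum(len(row) for row in expected)
    correct = sum(
        s == e
        for s_row, e_row in zip(student, expected)
        for s, e in zip(s_row, e_row)
    )
    return 100 * correct / total


# The inverse of [[2, 1], [1, 1]] is [[1, -1], [-1, 2]].
expected = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]
# A student's answer with one wrong entry (3 instead of 2).
student = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(3)]]
raw = inverse_raw_score(student, expected)  # three of four entries are correct
```

In this example three of the four entries are correct, so the raw score is 75.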
The next example illustrates the application of a more
sophisticated grading algorithm in the initial grading
function.
Problem. Find the roots of a given polynomial.
Determine the multiplicity of each root.
Here, the solution is to be provided as the collection of all
roots together with the collection of all corresponding
multiplicities. Thus the number of entries that the student
is required to provide is twice the total number of roots.
Initial grading algorithm. Let N be the total number of
roots. Denote by n the number of correct roots provided
by the student. Let m denote the number of correct
multiplicities provided by the student. Finally, denote by i
the number of incorrect roots provided by the student. The
raw score is then given by the formula
$$p = 100 \cdot \frac{n + m - i}{2N},$$
provided that the expression on the right-hand side is
non-negative. If the expression is negative, then p is set to 0.
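The formula above can be sketched as follows. The sketch has to fix details that the text leaves open, so they are assumptions: N is taken to be the number of distinct roots, the student's answer is a mapping from each proposed root to its multiplicity, and a multiplicity counts as correct only if it is attached to a correct root. The function name is hypothetical.

```python
from typing import Dict


def roots_raw_score(student: Dict[complex, int], expected: Dict[complex, int]) -> float:
    """Raw score p = 100 * (n + m - i) / (2N), clipped at 0.

    Both arguments map a root to its multiplicity.
    """
    N = len(expected)                                        # number of distinct roots
    n = sum(1 for root in student if root in expected)       # correct roots
    m = sum(1 for root, mult in student.items()
            if expected.get(root) == mult)                   # correct multiplicities
    i = sum(1 for root in student if root not in expected)   # incorrect roots
    return max(0.0, 100 * (n + m - i) / (2 * N))


# The polynomial (x - 1)^2 (x + 3) has the root 1 (multiplicity 2) and the root -3.
expected = {1: 2, -3: 1}
p_mixed = roots_raw_score({1: 2, 2: 1}, expected)    # n = 1, m = 1, i = 1
p_no_hit = roots_raw_score({5: 1, 6: 1}, expected)   # n = 0, m = 0, i = 2
```

Here the first answer earns 100 · (1 + 1 − 1) / 4 = 25, while the second would give a negative value and is therefore set to 0. The penalty term i discourages guessing extra roots.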
The last example addresses the Gauss-Jordan elimination
method.
Problem. Reduce a given system of linear equations to
the echelon form. Then determine the number of
solutions. If there are infinitely many solutions, determine
the number of parameters and then provide an arbitrary
integer-valued solution. If the system has a unique
solution, provide that solution.
Initial grading algorithm. If the linear system in
question has no solutions, then the raw score p is set to 0
or 100, depending on the student’s response. If the system
has 1 solution, then the available maximum value of 100
is partitioned in the following way: 50 (for reducing the
system to the echelon form) + 20 (for correctly
determining that the number of solutions is equal to 1) +
30 (for the correct numeric solution). If the system has
infinitely many solutions, then the partition has the form
40 (for reducing the system to the echelon form) + 10 (for
the correct number of solutions) + 20 (for the correct
number of parameters) + 30 (for a correct integer-valued
solution).
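The partition of the raw score described above can be written down as a short sketch. It assumes that the components are awarded independently and simply added, which the text suggests but does not state explicitly. The function name, the case labels and the boolean parameters are hypothetical.

```python
def elimination_raw_score(case: str, echelon_ok: bool, count_ok: bool,
                          params_ok: bool = False, solution_ok: bool = False) -> int:
    """Raw score for the elimination problem.

    case is the true number of solutions of the system:
    "none", "unique" or "infinite". The flags say which parts
    of the student's answer are correct.
    """
    if case == "none":
        # All or nothing, depending on the student's response.
        return 100 if count_ok else 0
    if case == "unique":
        # 50 (echelon form) + 20 (number of solutions) + 30 (the solution)
        return 50 * echelon_ok + 20 * count_ok + 30 * solution_ok
    if case == "infinite":
        # 40 (echelon form) + 10 (number of solutions)
        # + 20 (number of parameters) + 30 (an integer-valued solution)
        return 40 * echelon_ok + 10 * count_ok + 20 * params_ok + 30 * solution_ok
    raise ValueError(f"unknown case: {case!r}")
```

For example, a student who reduces a uniquely solvable system correctly and recognizes that it has one solution, but enters a wrong numeric solution, receives a raw score of 50 + 20 = 70.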
It should be emphasized at this point that the grading
algorithm for each problem type has been made available
to students. Whenever the raw score assumes more than
two values, a detailed description of the initial grading is
given. The student is thus aware of the correspondence
between a partial solution and partial credit.
7. Results
The administrative grading system exclusively based on
the automatic assessment turned out to be far more
effective than the traditional system, in which students are
given two “paper” mid-term exams followed by the final
exam. In each of the two tables below, the second column
gives the percentage of students that received a grade
given in the left column. Of course, other factors in favor
of the e-course can be easily identified, like availability of
the complete set of detailed lecture notes, access to
interactive exercises, and the presence of practice exams.
Table 2 gives more detail than Table 3 because of the
limitations of the university data-gathering system when it
comes to traditional forms of instruction. In the case of an
e-course, detailed data can be easily obtained from the
database where the results of automatic exams are
stored.
Table 2. E-course - final grades
Grade    Percentage of students
2        48.72%
3        6.41%
3.5      17.95%
4        3.85%
4.5      8.97%
5        14.10%
Table 3. Traditional course - final grades
Grade          Percentage of students
2              84.11%
3 or better    15.89%
8. Conclusion
One question was often raised by outside observers: how
reliable is the final grade, if it includes credit that the
student earns through remotely taken exams? As justified
as questions of that sort are, it is not uncommon in
academic courses that a student’s homework assignments,
say, are taken into account when the final grade is
determined. (In fact, online exams can be viewed as
homework assignments in electronic form.)
Accordingly, the mere fact that some of the student’s
activity is not supervised cannot serve as an argument
against including online exams as part of the grading
system, provided that additional conditions are imposed
(e.g. the requirement that a student obtains a specified
minimum on the final exam before remote exams are
taken into account at all).
An anonymous survey given to the students
at the end of the course showed that a vast majority of
students highly appreciated the new form of learning.
As of this writing, in the fall semester of 2006, a total of
nearly 1000 students are taking part in algebra e-courses where
the automatic assessment again serves as the basis for the
grading system.
9. References
[1] ADL Technical Team, “Sharable Content Object
Reference Model (SCORM)” Version 1.2, Advanced
Distributed Learning, www.adlnet.org, 2001
[2] E.Cosyn, J-P.Doignon, J-C.Falmagne, N.Thiery, “The
Assessment of Knowledge, in Theory and Practice”,
www.aleks.com/about/Science_Behind_ALEKS.pdf,
2004
[3] J.Engelbrecht, A.Harding, “Teaching Undergraduate
Mathematics On the Internet”, Educational Studies in
Mathematics, 58 (2005), p. 235-252.
[4] P.Kajetanowicz, J.Wierzejewski, “E-lesson on
Quadratic Function. A Step Towards an Online Remedial
Math Course”, Proceedings, 5th International Conference
Virtual University, Bratislava, Slovakia, December 2004
[5] P.Kajetanowicz, J.Wierzejewski, “E-learning in
College Mathematics – an Online Course in Algebra with
Automatic Knowledge Assessment”, Proceedings, 6th
International Conference Virtual University, Bratislava,
Slovakia, 2005