Practical Online Formative Assessment for First Year Accounting

B. Bauslaugh
bruce@lyryx.com
N. Friess
nathan@lyryx.com
Lyryx Learning Inc.
#205, 301 - 14th Street N.W.
Calgary, Alberta, Canada T2N 2A1
C. Laflamme
laflamme@ucalgary.ca
University of Calgary
Department of Mathematics and Statistics
2500 University Dr. NW. Calgary Alberta Canada T2N1N4
August 10, 2011
Abstract
Lyryx Learning began as a project in the Department of Mathematics and
Statistics at the University of Calgary, Alberta, Canada. The objective was
to provide students and instructors with a learning and assessment tool better
than the available multiple-choice style systems. Lyryx soon began focusing
on Business and Economics courses at the higher education level, eventually
offering course assessment to 100,000 students annually.
This article describes its efforts toward online formative assessment in first
year accounting courses. An appropriate user interface was first designed to
allow for the presentation of genuine accounting objects, in particular financial
statements. Next, grading algorithms were implemented to analyze student
work and provide students with personalized feedback, sometimes
even going beyond what is normally provided by a human grader. Finally,
performance analysis is presented to students and instructors, enabling them
both to better assess learning and make necessary adjustments.
1 Introduction
Multiple-choice style (MCS) assessment has now been around for about a century.
The simplicity of the MCS structure made it a perfect candidate for early automated
implementation, and in the 1970s PLATO may very well have been the first
large-scale system to do so. Although still widely used for summative assessment (i.e.
for testing mastery of learning outcomes), it is precisely the simplicity of the MCS
structure which undermines its value and potential as a learning tool (see for example
Scouller [14]). A grader that merely indicates whether answers to questions are
correct or not can only provide limited learning assistance. However, “computer-aided assessment” (CAA) has made significant progress over the last forty years (see
Conole and Warburton [4]), now offering a much wider range of tools for learning
and assessment. In particular CAA now has a much better chance to be effectively
used toward formative assessment, generally defined as the use of assessment to
provide information to students and instructors over the course of instruction to be
used in improving the learning process (see for example [2]). Boston [3] goes on
to say that “Feedback given as part of formative assessment helps learners become
aware of any gaps that exist between their desired goal and their current knowledge,
understanding, or skill and guides them through actions necessary to obtain the goal”
(see also Ramaprasad [12], and Sadler [13]).
One of the best forms of formative assessment is what is traditionally known as
“homework”, which is the focus of this project and of the current article, as applied to the
area of first year accounting. Because of its procedural aspects, accounting lends
itself naturally to CAA. Bangert-Drowns, Kulick and Morgan [1], and Elawar and
Corno [6], emphasize that one of the most helpful types of feedback on homework
and examinations is specific comments on the mistakes and errors in a student’s
work, together with targeted suggestions for improvement. As we will see, this
capability is now much closer to reality for CAA, with the further ability to allow
students to attempt a given task again and again, each time receiving feedback
and personal guidance on their work. This provides students with a rich learning
environment: it is not simply getting the right answer that is important, but rather
the opportunity to succeed while focusing their attention on the task at hand.
2 Overview of the Formative Assessment Implementation
Lyryx assessment was initially based on the pedagogical premise that assessment
methods can be categorized as either formative or summative. Formative assessment methods are primarily concerned with providing feedback to students to assist
their learning of the subject material. Typically this takes the form of homework assignments, lab work, etc. in which the student is given extensive feedback regarding
what errors they made and how they should be corrected. Summative assessment
is primarily concerned with gauging the extent to which a student has mastered
the subject material. This usually takes the form of quizzes or exams, and detailed
feedback may not be provided. MCS assessment performs reasonably well as a summative assessment tool, but fails in the formative role. Our goal was to develop and
implement a formative CAA.
Our system uses four main features to accomplish this goal:
1. Randomized Questions
2. The User Interface
3. The Grading Algorithms
4. Feedback Information to Students and Instructors
Randomization of the question content allows the students to attempt the same
question as many times as they require to master a concept without allowing mere
memorization of the answer to a specific question. Each time a question in our
system is accessed, the specifics are regenerated randomly to provide an effectively
infinite pool of similar questions. In this respect CAA can be superior to traditional homework, where a student has only one (or very few) attempts, and does
not have the opportunity to make use of the feedback provided by the instructor
by attempting the assignment again. With our system the student can practice the
same question over and over, each time receiving informative feedback, until they
succeed. Furthermore, only the student’s best grade over all attempts at the question is recorded, providing a low-threat environment and motivating the student to
continue practicing until the material is mastered.
The user interface (UI) is necessary so that students can be presented with objects appropriate for the intended field of study, and should be as intuitive as possible
to avoid placing any barriers to learning. Students need to focus on learning the
subject material rather than the interface of a piece of software. The user interface
also needs to be flexible enough to avoid inadvertently giving the student extra information; a restrictive user interface can end up making decisions for the student
that they should be making using their own knowledge. In the case of accounting, students must learn how to create financial statements, and the experience should be as
similar as possible to creating a financial statement using a spreadsheet, or free-form
on paper.
Grading homework is generally viewed as one of the most onerous duties of an instructor. Effective grading for formative assessment requires skills obtained through
years of teaching experience: the ability to understand and anticipate student difficulties, and to provide helpful and fair comments on their work and suggestions for
improvement. Replicating these skills to any extent in CAA is a significant undertaking. The UI plays a role in helping the computer understand the actual student
work, by imposing some restrictions and structure on the student’s input. Then the
grading knowledge of an experienced instructor must be harnessed and implemented as a series of computer instructions. The correct answer is typically easy
enough to recognize, although even here rounding errors in calculations need to be
carefully handled. When the answer is incorrect, however, an experienced instructor
can often recognize where the student has gone astray and show them how to correct
their error. This ability is reproduced in our system by comparing incorrect student
responses to a pool of ’common mistakes’ for a given problem. If such a common
mistake is recognized, appropriate feedback is given. Clearly quantitative subjects,
in particular procedural ones, lend themselves much better to this implementation.
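
As a rough illustration of how such a common-mistake check might work, consider the following Python sketch (the names, values, and feedback strings are invented for illustration and are not Lyryx's actual implementation). A small tolerance is used when comparing dollar amounts so that ordinary rounding differences are not flagged as errors.

    from dataclasses import dataclass

    # Tolerance for comparing dollar amounts; value chosen for illustration.
    TOLERANCE = 0.005

    @dataclass
    class CommonMistake:
        value: float   # the amount this known error produces
        feedback: str  # targeted comment shown to the student

    def grade_amount(student: float, correct: float, mistakes: list) -> str:
        """Return feedback for a single numeric answer."""
        if abs(student - correct) <= TOLERANCE:
            return "Correct."
        # Wrong answer: compare against the pool of known common mistakes
        # before falling back to a generic message.
        for m in mistakes:
            if abs(student - m.value) <= TOLERANCE:
                return m.feedback
        return "Incorrect; please review this calculation."

    # Hypothetical depreciation question where forgetting to subtract the
    # salvage value is a known common mistake.
    pool = [CommonMistake(2500.0, "Did you forget to subtract the salvage value?")]
    print(grade_amount(2500.0, 2250.0, pool))  # prints the targeted feedback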
As experienced instructors know, there is a subtle and difficult balance to be
achieved between helping students with feedback and suggestions and not spoon-feeding them: restraint must be exercised so that students are not simply handed
recipes for solving the problems at hand, but are instead given enough guidance to
assist them and avoid excessive frustration. Moreover, the content of the feedback, and grading
procedures in general, should be closely aligned with the content learned by the
students in class. For example, notation and procedures, as well as difficulty level,
should match what is used by the student’s textbook and instructor as closely as
possible. This is simply so that all parts of the learning process, whether they be
lessons, assessment, or any other component, form as cohesive a learning experience
as possible.
Finally, computers also have the ability to store vast amounts of information,
and this can provide valuable feedback to students and instructors, completing the
framework for effective formative assessment. In this context privacy is always a
concern: it is our opinion that details of when and how a student works are personal
information that should not be tracked. However, aggregate information such as
the average number of attempts at a question taken by students in a class, average
grade per attempt, and similar statistics can be extremely valuable to an instructor
for determining where the class is struggling. This can be used to focus class time
more effectively. For individual students, the system can track their performance
on individual topics and provide suggestions as to where their efforts would be best
spent during their study time.
3 Randomized Questions
It is well known that the key to learning any skill is practice. By far the most effective
way for students to master a subject such as accounting is to actively engage with the
material - actually doing accounting and receiving feedback on their performance.
Unfortunately this requires a significant amount of resources per student, and
a typical instructor with several large classes of students can devote only a limited
amount of effort to any individual student. As a result, each topic in a course
will generally be tested only by a single assignment or quiz, with limited feedback
provided to each student. Students will often give only a cursory glance to the
feedback, since they will likely not be tested on the same material again until the
final exam. Ideally, students should be provided with immediate feedback on their
work and have the opportunity to apply the new information as quickly as possible.
In this respect an automated system is ideal - the feedback is instant and the student
can immediately attempt the question again to verify that they have understood the
feedback.
However, this is undermined if the student simply attempts an identical question.
For the feedback to be useful it should contain the correct answer, but there is no
value in the student simply copying the correct answer into a new instance of the
question. What is required is a similar question, testing the same content, but with
enough difference that the student must engage the new question using the concepts
explained in the feedback.
A key component of the Lyryx system is randomization of the questions. Each
question in the test bank is really a question type that can be instantiated in an
essentially unlimited number of specific randomized instances. Obviously the details
of how this is done are dependent on the question being implemented. In accounting,
for example, a journalization question will typically have a large pool of transaction
templates from which a subset is selected, and then realistic random values are
generated for the amounts within each transaction.
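
A minimal sketch of this instantiation step is shown below; the templates and value ranges are invented for illustration, and the production system naturally draws on much larger pools with more realistic constraints.

    import random

    # Hypothetical transaction templates for one journalization question type.
    TEMPLATES = [
        "Purchased office supplies for ${amount} cash.",
        "Billed a client ${amount} for services performed.",
        "Paid ${amount} cash toward accounts payable.",
        "Received ${amount} cash from a client on account.",
        "Paid ${amount} cash for one month's rent.",
    ]

    def instantiate_question(n_transactions: int = 3) -> list:
        """Build one random instance: pick a subset of templates and
        fill in realistic random amounts."""
        chosen = random.sample(TEMPLATES, n_transactions)
        instance = []
        for template in chosen:
            amount = random.randrange(100, 5001, 50)  # a multiple of $50
            instance.append(template.format(amount=f"{amount:,}"))
        return instance

    # Each call yields a fresh instance testing the same concepts.
    for line in instantiate_question():
        print(line)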
4 The User Interface
One of the key parts of the Lyryx assessment engine is the user interface. How a
question is presented to the student ties in with the level of grading that will be
applied to the student’s answer and the detail of feedback that can be provided. If an
instructor wants to provide a more granular response than “right” or “wrong”, then
the student must be allowed to explore the course material in a rich user interface. In
the field of accounting, two of the most important and frequently used user interface
elements are journals and financial statements. Both of these feature prominently in
the Lyryx system.
Early on in learning financial accounting, students typically encounter the concept of journalizing transactions. A student is given a list or narrative describing transactions performed by a fictitious company and is asked to record
these transactions in a journal using double-entry bookkeeping. One journal entry
will contain the date that the transaction is performed, a list of one or more accounts to which the transaction applies (either as a credit or debit), and the dollar
amounts that are credited or debited to the associated account. Finally, there can
be a comment or explanation of the transaction, which may include a short English
statement, a related invoice number, and so on. The Lyryx journal entry interface is
presented in a fixed format that is similar to the format found in a typical journal.
A blank journal entry is shown in Figure 1.
Figure 1: A blank journal entry
When a student clicks on the date, debit, or credit fields of a journal entry, a text
box allows the student to enter their value for that field. Clicking on the account
field brings up a dialog that allows the student to choose from a chart of accounts,
containing a full selection of accounts for a company of the type the question refers
to. This allows the student the freedom to make mistakes and learn which accounts
are appropriate for the transaction presented in the question. Formulas can be input
for numerical amounts (credits and debits), and these are checked immediately for
correct syntax. Similarly, dates are parsed using typical day/month formats. For
each journal entry, two rows are displayed by default, as each entry will require at
least two accounts. For more complex entries, the student will need to add more
rows to the entry using the buttons under the date field. This avoids giving the
student extra information (i.e. how many accounts the transaction requires). When
a student is presented with multiple transactions to be journalized, they can enter
the entries into the journal in any order, and individual account names can be listed
in each entry in any order. A typical completed journal entry is shown in Figure 2.
Figure 2: A completed journal entry
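
To illustrate the kind of immediate input checking described above, the following sketch validates a simple arithmetic formula and parses a date in a couple of typical formats. The accepted syntax and the date formats here are assumptions for illustration, not a specification of the Lyryx interface.

    import ast
    from datetime import datetime

    def formula_is_valid(text: str) -> bool:
        """Accept only simple arithmetic expressions such as '1200*0.8+50'."""
        allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                   ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub)
        try:
            tree = ast.parse(text, mode="eval")
        except SyntaxError:
            return False
        return all(isinstance(node, allowed) for node in ast.walk(tree))

    def parse_entry_date(text: str):
        """Try a few typical day/month formats; return None if none match."""
        for fmt in ("%d/%m", "%b %d", "%B %d"):
            try:
                return datetime.strptime(text.strip(), fmt)
            except ValueError:
                continue
        return None

    print(formula_is_valid("1200*0.8+50"))  # True
    print(formula_is_valid("x + 1"))        # False: variable names rejected
    print(parse_entry_date("Jan 5"))        # a datetime (year defaults to 1900)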
Another important user interface object in Lyryx accounting products is the
financial statement. Technically, a financial statement is very similar to a journal;
both are variants on a spreadsheet, and both have fields for accounts and numerical
amounts. Financial statements, however, have a much less strict layout and are far
more free-form than journals. In the Lyryx system
they start as an empty list of rows with no formatting. An empty income statement
is shown in Figure 3.
Figure 3: A blank income statement
The student can click on the large column to bring up a chart of accounts, similar
to the one shown in journals, but also including a list of section headings that can
be used in the statement. The last narrow columns are used for numbers such as
the account balances and totals for the rows. The student can add and delete rows
using the down arrow and X buttons. The left and right arrows change the indenting
of each row, to help the student format the various sections and sub-sections of the
financial statement (although this is purely for display purposes and is not considered
in the grading process). For financial statements, it is left to the student to know
which sections are appropriate for this kind of statement, such as Assets, Liabilities,
and Equity, as well as how these should be organized into sub-sections like Current
Liabilities and Long-Term Liabilities. The student can enter the various sections
in any order, and choose any account from the general list of accounts and place
them in the section that they believe is appropriate. The student will also need to
total each section and choose the appropriate heading for that line, such as Total
Liabilities. Here the chart of accounts dialog has various distractors for headings
that may be used in one kind of statement but are not appropriate for the current
question. Figure 4 shows a completed income statement in the Lyryx system.
Figure 4: A completed income statement
5 The Grading Algorithms
The primary goal for the Lyryx assessment engine is to grade all student work as
closely as possible to the way a human (instructor) would grade the work, if not
better, drawing on the strengths of computer algorithms. When a student makes a
mistake, the computer will attempt to determine where the student made the error
in their work, and will use the instructor knowledge that is encoded in the grading
algorithm to determine the specific error committed and provide targeted feedback
to the student. This helps the student to understand where they went wrong and
how they can improve their grade on a future attempt.
A simple example of the level of grading desired can be shown in a financial
statement. If a student makes an error in one of the account balances in the Liabilities section of a financial statement, then the system should identify the line
where the error was made. Perhaps a common mistake is for students to enter a
negative number instead of a positive number, and the system can make a note of
this in the feedback presented to the student. Furthermore, if the student sums up
the Liabilities section correctly, based on the incorrect number from the earlier line,
then the system should not penalize them again. Marks should be deducted only
once for the initial mistake and not for totals or other values that were computed
consistently with the initial mistake. All of these are relatively simple details that a
human would be able to manage easily, but they require considerable sophistication
in an automated system.
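
The sketch below illustrates this deduct-once principle for a single section. The data structures and mark values are hypothetical; the essential point is that the section total is checked against the student's own figures before any further marks are deducted.

    def grade_section(correct: dict, student: dict,
                      student_total: float) -> list:
        """Grade one financial-statement section, deducting marks only once
        for an initial mistake and not for a consistently computed total."""
        feedback = []
        for account, value in correct.items():
            entered = student.get(account)
            if entered is None or abs(entered - value) > 0.005:
                feedback.append(f"'{account}' should be {value:,.2f}: 1 mark lost.")
        correct_total = sum(correct.values())
        consistent_total = sum(student.values())  # total of the student's own figures
        if abs(student_total - correct_total) <= 0.005:
            feedback.append("Total correct.")
        elif abs(student_total - consistent_total) <= 0.005:
            feedback.append("Total is wrong but consistent with your own "
                            "entries, so no further marks are lost.")
        else:
            feedback.append("Total does not match your own entries: 1 mark lost.")
        return feedback

    fb = grade_section({"Advertising expense": 5700.0, "Rent expense": 12000.0},
                       {"Advertising expense": 5200.0, "Rent expense": 12000.0},
                       student_total=17200.0)
    print("\n".join(fb))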
When grading journal entries, there are a number of types of mistakes that a
student can make. For an individual transaction, a student can select an incorrect
account, enter an incorrect dollar amount, date, or comment. There may also be
more than one correct way to journalize the same transaction, or set of transactions. For example, a purchase transaction may be entered either as one transaction
involving four accounts or as two transactions involving two accounts each. The
system should accept all correct methods of journalizing the transactions, while rejecting any incorrect variants. The grading process should not be dependent upon
the order in which the transactions are entered in the journal, or the order in which
the accounts are entered in the transaction, although there are often conventions for
these which the system should point out to the student.
As mentioned earlier, correct solutions are relatively straightforward to grade.
The difficulty arises when the input is incorrect, because it is often unclear (especially to an automated system) what the student intended to do, and hence it is
difficult to provide meaningful feedback. For example, if two transactions are entered in a journal incorrectly, it may be unclear which entry was intended for which
transaction.
The Lyryx grading algorithm for journals takes as its input the list of transactions from the question, and the journal that the student entered via the user
interface. The list of transactions can include information on alternative correct
ways to journalize the transactions, individually or in combination. The algorithm
considers every possible matching of the transactions to the student’s journal entries,
and selects the best one. We define the best matching to be the one that awards the
student the highest number of marks, yielding the interpretation of the student’s
answer most favourable to them. We opted for this metric in
acknowledgment of the fact that any automated system may misinterpret the student’s input, and it would be particularly frustrating for the student if they received
a lower grade due to an error by the grading process. Clearly for this method to be
effective, it is important for the assignment of marks to be an accurate reflection of
how good a fit an entry is for a transaction.
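
A minimal sketch of this best-matching search follows. The scoring function is a crude stand-in for the much finer-grained marking described above, and the brute-force search over permutations is only practical for small journals, but it captures the principle of choosing the interpretation most favourable to the student.

    from itertools import permutations

    def entry_score(transaction: dict, entry: dict) -> int:
        """Marks for reading `entry` as this transaction (a crude stand-in
        for the real scoring of accounts, amounts, dates, and comments)."""
        score = 0
        if entry.get("accounts") == transaction["accounts"]:
            score += 2
        if entry.get("amounts") == transaction["amounts"]:
            score += 2
        if entry.get("date") == transaction["date"]:
            score += 1
        return score

    def best_matching(transactions: list, entries: list) -> int:
        """Try every pairing of transactions to student entries and return
        the marks of the interpretation most favourable to the student."""
        best = 0
        for perm in permutations(entries):
            total = sum(entry_score(t, e) for t, e in zip(transactions, perm))
            best = max(best, total)
        return best

    # Entries journalized in a different order still earn full marks, because
    # some permutation pairs each entry with the transaction it matches.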
Financial statements are similar to journals in that the algorithm takes in a
correct financial statement and the student’s financial statement. However, the
basic unit being marked is not a journal entry but a section of the financial
statement. Each section of the student’s answer (such as
Assets, Liabilities, etc.) is decomposed into its smallest possible sub-sections. The
sub-sections are then combined into the higher-level sections, building a tree of all
possible ways to combine the sections and sub-sections of the student’s answer. The
algorithm then runs through each possible interpretation and marks it against the
correct answer, choosing the interpretation that will award the student with the best
possible marks. Again, this may not be the same interpretation that the student
intended. However, this approach not only allows for a few misplaced or incorrect
accounts in various sections, but also allows the marker to recognize a misplaced
sub-section within a larger section, as well as the simple cases of differing orders of
accounts or sections.
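
The following drastically simplified sketch conveys the enumerate-and-score pattern: real statements form a tree of nested sub-sections, but even flat, contiguous groupings of rows show the idea of scoring every interpretation and keeping the one most favourable to the student. The scoring function itself is left abstract here.

    def partitions(rows: list):
        """Yield every way to split a list of rows into contiguous sections."""
        if not rows:
            yield []
            return
        for i in range(1, len(rows) + 1):
            head = rows[:i]
            for rest in partitions(rows[i:]):
                yield [head] + rest

    def best_interpretation(rows: list, score):
        """Score every grouping of rows into sections against the answer key
        (via the caller-supplied `score` function) and keep the best one."""
        return max(partitions(rows), key=score)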
To illustrate how the grading algorithm for financial statements works and the
level of feedback provided, we provide an example of a correct income statement, a
student answer with some common kinds of mistakes, and the feedback provided by
the marker. The correct answer and student answer are shown in Figure 4 above:
these are the inputs to the grading algorithm. The resulting feedback is shown in
Figure 5.
Your solution was:
Grading:
Congratulations. You have completed this income statement correctly.
Figure 5: Typical feedback for a correct income statement
Note in particular how the feedback deals with errors that are used in subsequent
totals in Figure 6.
Your solution was:
Grading:
Revenues (Lines 1-2)
You have completed this part correctly.
Expenses (Lines 3-8)
Line 4: ’Advertising expense’ should be $5,700, but you have not entered this. This will cost you 1 mark.
Line 8: Total expenses should be $183,700, but you have not entered this. However, your answer is consistent with the accounts you listed in this section so you will not lose any marks.
Operating income (Line 9)
Line 9: Operating income should be $66,300, but you have not entered this. However, your answer is consistent with the accounts you listed in this section so you will not lose any marks.
Other gains and losses (Lines 10-11)
Line 11: ’Loss on sale of investment’ should be $5,500, but you have not entered this. This will cost you 1 mark.
Income before income taxes (Line 12)
Line 12: Income before income taxes should be $60,800, but you have not entered this. However, your answer is consistent with the accounts you listed in this section so you will not lose any marks.
Income tax expense (Line 13)
You have completed this part correctly.
Net income (Line 14)
Line 14: Net income should be $56,200, but you have not entered this. However, your answer is consistent with the accounts you listed in this section so you will not lose any marks.
Figure 6: Typical feedback for grading consistent values in an income statement
These algorithms are based on brute force attempts to find the best interpretation, as defined by giving the student the best possible grade. Currently, these
algorithms are effective enough for students to receive a fair grade and useful
feedback. In practice we have found that students and instructors are very satisfied
with the feedback generated by these algorithms, and that the interpretations made
by the algorithms match closely with the intentions of the students. However, the
brute force algorithms are not perfect; they can be very resource intensive and, as
mentioned previously, the interpretation that the algorithm uses may not align with
the student’s mental model of their answer. There is room for more research into
other algorithms for interpreting students’ answers.
6 Feedback Information to Students and Instructors
An important part of formative assessment is the opportunity to reflect on the
learning process and make adjustments as required by individual learners. This is
true for both students and instructors.
It is straightforward, of course, to record student grades and provide instructors
with a gradebook for their class. This provides both students and instructors with
a general overview of their performance in the class.
For the student, receiving a lower grade than expected on an assignment is
typically a wake-up call for action, but where to start? There is much information
that can be useful to assist the student in that direction. For example, the system
stores grades broken down by question across all assignments, where each question
corresponds to particular concepts in the subject material. Thus the system can
report to the student not only on their overall performance, but on how they are
performing on each concept covered in their assignments, a much more granular
approach. The student may learn that they have performed better on some aspects of
the course than on others, and can thus focus on improving in the weaker areas.
Further, the system can suggest practice questions exactly in these areas.
Also of importance here is that this environment is for the student alone: a
low-threat tool not visible even to the instructor. There are no grades
involved, but simply the opportunity for the student to improve. A simple colored
green/yellow/red Performance Meter informs the student on their performance for
each of the concepts in the subject. The more practice and success, the longer the
green bar.
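
Such a meter is simple to express in code; the sketch below uses thresholds invented purely for illustration.

    def performance_meter(success_rate: float) -> str:
        """Map a concept-level success rate in [0, 1] to a meter colour."""
        if success_rate >= 0.8:
            return "green"
        if success_rate >= 0.5:
            return "yellow"
        return "red"

    print(performance_meter(0.92))  # green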
For the instructor, that same granularity of measurement of student performance
on each question is averaged over the entire class and presented on a graph. Additional data is provided, such as the average number of attempts on each question,
as well as the average grade on all attempts by the students. This data provides
instructors with an overall measure of student mastery of the course material. They
can then address any perceived shortcomings and better assist students with their
needs.
There are other variables worth considering. One is how long students
typically require to complete an assignment, which could in theory indicate the
difficulty level of the material. However, that data is difficult to record and gauge
accurately; students can typically do their work at any time and anywhere they wish,
and thus may well start an assignment and then take a break for a long period of
time without the system being aware. Even a long period of inactivity cannot be
interpreted with certainty.
7 Limitations and Future Work
Currently Lyryx assessment focuses on subjects in fields that have a significant
amount of quantitative material, such as accounting, finance, economics, statistics,
and mathematics. These subject areas are chosen because they lend themselves well
to questions that allow for multiple steps in the student’s work, which can then be
analyzed by a grading algorithm. Also, quantitative questions have answers that
can be classified as “right” or “wrong” more easily than verbal questions.
In verbal or qualitative subject areas, Lyryx is still primarily limited to multiple choice or simple “fill in the blank” style answers. Currently, Lyryx is actively
engaged in researching other methodologies for grading qualitative questions. One
such approach is to use a peer-to-peer style grading system, where students write
short paragraphs and grade each other’s work, thereby removing the requirement
for a computer to analyze and grade these answers. More research is needed to
effectively harness computing power for qualitative grading.
Even in the quantitative fields, much better algorithms may be applied compared to the brute force algorithms that Lyryx currently uses. For example, various
classification algorithms used in artificial intelligence may be applicable to grading
student work, such as a Naive Bayesian classifier, or a Perceptron classifier. These
sorts of algorithms are current used in classifying email SPAM (see Graham [7]),
where a similar binary verdict is determined based on inputs (SPAM versus not
SPAM, or in our case, correct versus incorrect answers).
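
As a toy illustration of what such a classifier might look like (the feature names are invented, and this is in no way a description of an existing Lyryx component), a Naive Bayes model could be trained on labeled examples of previously graded answers:

    import math
    from collections import Counter

    def train(examples: list):
        """examples: (feature_list, is_correct) pairs, e.g.
        (["sign_error", "wrong_account"], False). Assumes both labels occur."""
        counts = {True: Counter(), False: Counter()}
        labels = Counter()
        vocab = set()
        for features, label in examples:
            labels[label] += 1
            counts[label].update(features)
            vocab.update(features)
        return counts, labels, vocab

    def classify(features, counts, labels, vocab) -> bool:
        """Return the label with the higher Laplace-smoothed log-posterior."""
        def score(label):
            total = sum(counts[label].values())
            s = math.log(labels[label] / sum(labels.values()))
            for f in features:
                s += math.log((counts[label][f] + 1) / (total + len(vocab)))
            return s
        return score(True) > score(False)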
Finally, although we have demonstrated two algorithms that mark particular
kinds of questions used in accounting courses, more research is needed into user
interfaces and grading algorithms for other kinds of quantitative questions. For
example, Lyryx has a user interface for allowing a student to draw a graph, as well
as a corresponding grading algorithm. Increasing the pool of possible questions
allows instructors to assess students’ knowledge of the material in more depth.
References
[1] R.L. Bangert-Drowns, J.A. Kulick, and M.T. Morgan, The instructional effect
of feedback in test-like events, Review of Educational Research, 61 no. 2 (1991),
pp. 213-238.
[2] P. Black and D. Wiliam, ’In Praise of Educational Research’: formative assessment, British Educational Research Journal, 29 no. 5 (2003), pp. 623-637.
[3] C. Boston, The concept of formative assessment, Practical Assessment, Research
& Evaluation (http://pareonline.net), 8 no. 9 (2002).
[4] G. Conole and B. Warburton, A review of computer-assisted assessment, ALT-J,
Research in Learning Technology, 13, No. 1 (2005), pp. 17-31.
[5] T. Crooks, The Validity of Formative Assessments, British Educational Research
Association Annual Conference, University of Leeds, September 2001. Available
at http://www.leeds.ac.uk/educol/documents/00001862.htm
[6] M.C. Elawar and L. Corno, A factorial experiment in teachers’ written feedback
on student homework: Changing teacher behaviour a little rather than a lot,
Journal of Educational Psychology, 77 no. 2 (1985), pp. 162-173.
[7] P. Graham, Better Bayesian filtering, 2003 Spam Conference, January 2003.
Available at: http://www.paulgraham.com/better.html
[8] K. Howie and N. Sclater, User requirements of the ultimate online assessment
engine, Computers and Education, 40, no. 3 (2003), pp. 285-306.
[9] T. Jensen, Enhancing the critical thinking skills of first year business students,
presentation at Kwantlen Polytechnic University, Vancouver, 2008.
[10] Report for JISC, Roadmap for e-assessment, June 2006. Available at
http://www.jiscinfonet.ac.uk/InfoKits/effective-use-ofVLEs/resources/roadmap-for-eassessment
[11] D. Nicol and C. Milligan, Rethinking technology-supported assessment in terms
of the seven principles of good feedback practice. In C. Bryan and K. Clegg
(Eds), Innovative Assessment in Higher Education, Taylor and Francis Group
Ltd, London (2006).
[12] A. Ramaprasad, On the definition of feedback, Behavioral Science, 28 no. 1,
(1983) pp. 4-13.
[13] D.R. Sadler, Formative assessment and the design of instructional systems,
Instructional Science, 18 no.2, (1989) pp. 119-144.
[14] K. Scouller, The influence of assessment method on students’ learning approaches: multiple choice question examination versus assignment essay, Higher
Education, 35 (1998), pp. 453-472.