Multiple Choice Items: How to Gain the Most Out of Them
Biochemical Education 19(4) 1991. doi:10.1016/0307-4412(91)90094-O
PINCHAS TAMIR
School of Education and Israel Science Teaching Center
Hebrew University
Jerusalem, Israel
Introduction
Multiple choice questions have often been blamed for a
variety of negative educational outcomes such as rote or
superficial learning, under-development of communication skills, deficient ability to develop and present an
argument, and more. The major justifications offered for
their widespread use, especially in the United States, are:
(1) they permit coverage of a wide range of topics in a
relatively short time;
(2) they are objective in terms of scoring and are
therefore more reliable;
(3) they are easily and quickly scored and lend themselves
to computer marking; and
(4) they avoid unjustified penalties to students who know
their subject matter but are poor writers.
The purpose of this article is to show how multiple
choice items can be designed and used as an effective
diagnostic tool by avoiding their pitfalls and by taking
advantage of their potential benefits.
The following issues will be discussed:
(a) 'correct' versus 'best' answers;
(b) construction of diagnostic multiple choice items;
(c) the problem of guessing;
(d) the use of justifications to choices; and
(e) positive versus negative items.
Correct Versus Best Answers
It is relatively easy to design multiple choice items in
which one option is correct and the rest (the distractors)
are incorrect. Items of this kind tend to be of a lower
cognitive level, requiring most often no more than
memorization of particular facts. Since most teacher-made tests consist of such items, they do indeed
deserve the harsh criticism put forward against them.
However, as shown by many authors (e.g., Schwab, 1963,
ref 1), as the focus turns away from correct/incorrect to
the best answer, the picture changes dramatically. Now
the student is faced with the task of carefully analyzing the
various options, each of which may present factually
correct information, and selecting that answer which best
fits the context and the data given in the item's stem.
Multiple choice items of this kind cater for a wide range of
cognitive abilities. When compared with open-ended
questions, they admittedly do not require the student to
formulate an answer, but they do impose the additional
requirement of weighing the evidence provided by the
different options. Take, for example, the following item:
The dry weight of corn plants at the end of their growth is 6 tons per acre. All of this crop was produced from

A   Water and minerals absorbed from the soil (1)
B   Minerals from the soil and oxygen from the air (13)
*C  Water and minerals from the soil and carbon dioxide from the air (64)
D   Water, minerals and organic substances from the soil (22)

The asterisk (*) indicates that C is the best answer. The
figures in parentheses show the percentage of students
who took the low level matriculation examination in
biology in Israel in the year 1985 (n = 2405), who chose
each option.
Had this been an open question asking "How has all this
crop been produced?" the expected answer would have
been: "By photosynthesis." However, in the actual
context of the multiple choices, the students had to know
that the carbon dioxide from the air is a raw substance in
photosynthesis as well as to rule out the notion that plants
absorb their organic substances from the soil. They also
had to realize that although the information in option (A)
is not incorrect, this is not the best answer to the question.
As for diagnostic purposes, it is certainly worth knowing that 22% believe that plants obtain their organic
matter from the soil and that 13% somehow confuse
photosynthesis with respiration.
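The diagnostic reading of this item can be sketched in a few lines of Python. The option percentages are those reported for the item above, and the mapping of distractors B and D to misconceptions follows the interpretation given in the text. The function name `diagnose`, the data layout and the 10% reporting threshold are assumptions made only for illustration.

```python
# Percentage of students choosing each option (1985 item, n = 2405).
responses = {"A": 1, "B": 13, "C": 64, "D": 22}
best_answer = "C"

# Misconceptions the text associates with particular distractors.
misconceptions = {
    "B": "confuses photosynthesis with respiration",
    "D": "believes plants obtain organic matter from the soil",
}


def diagnose(responses, best_answer, misconceptions, threshold=10):
    """Return (option, percent, misconception) for each distractor chosen
    by at least `threshold` percent of students, most popular first."""
    flagged = [
        (option, percent, misconceptions.get(option, "no known misconception"))
        for option, percent in responses.items()
        if option != best_answer and percent >= threshold
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


for option, percent, label in diagnose(responses, best_answer, misconceptions):
    print(f"Option {option}: {percent}% - {label}")
```

A teacher could lower the threshold to see rarely chosen distractors as well; an option that almost nobody selects (such as A here) carries little diagnostic information.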
In a way, the distractors in a multiple choice item
function much like one of the standard procedures in a
Piagetian classical interview, in which the interviewer is
not fully satisfied even when the child gives a correct
answer and proceeds to check the understanding by
suggesting a competing answer. Take for example the
famous interview regarding area conservation. When the
child indicates that the cows have the same grazing area,
regardless of the manner in which the houses are scattered
in the field, the interviewer keeps pressing: "Yesterday
another child told me that here (where the houses are
scattered all over) the cows have more food than here
(where the houses are close together). What do you
say?" Children who do not really understand that the two
areas are equal may fall into the trap. Thus, the distractors
in a good multiple choice item serve as such traps. It may
be concluded that wisely designed multiple choice items
have a high diagnostic potential.
Construction of Diagnostic Multiple Choice Items
There are two ways to go about constructing diagnostic
multiple choice items: (a) using known misconceptions as
distractors; and (b) using students' answers to open-ended
questions as a basis for constructing distractors. 2
The research literature in science education is full of
studies which identify students' misconceptions in relation
to a variety of topics. These misconceptions may serve as
excellent distractors. For example, option D in the item
cited above represents a common belief of many students
in many countries that plants obtain their food, including
organic matter, from the soil much like animals which
obtain their food by eating. 2-7 When such items are used
the results quickly indicate not only how many students
chose the best answer but also how many students hold
particular misconceptions.
In spite of extensive research there are, and will
continue to exist, many topics and concepts for which
there is no a priori information regarding misconceptions.
In such cases the alternative approach suggested by
Tamir 2 is still viable. According to this approach, teachers
who administer open questions to their students may
collect, while assessing the papers, typical student
answers, correct and incorrect. These answers, which
represent the ways students think about given questions,
reveal certain conceptions, including misconceptions, which are excellent sources for item options. This
approach has been used in the study of student conceptions about natural selection. 8
The Problem of Guessing
It is generally recognized that multiple choice items lend
themselves to guessing so that the probability of obtaining
correct answers in items comprised of four options by
purely random selection is 25%. However, different
evaluators have taken different positions regarding the
way this problem should be dealt with. Those who
consider guessing as 'noise' causing measurement error
tend to use a formula according to which incorrect choices
incur a penalty in the form of lost points
(marks). Under such a procedure, students who do not know
the correct answer are advised not to respond to that
particular item, since a nil response does not result in
losing points.
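The article does not state which penalty formula is meant. A minimal sketch, assuming the conventional correction for guessing (number right minus number wrong divided by the number of options less one), is given below; the function name `corrected_score` and the example figures are hypothetical.

```python
def corrected_score(right, wrong, options=4):
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items are neither rewarded nor penalised, which is why
    students are advised to leave an item blank rather than guess.
    """
    if options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (options - 1)


# A student who guesses at random on 40 four-option items expects
# 10 right and 30 wrong, so the expected corrected score is zero.
print(corrected_score(right=10, wrong=30))

# A student with 25 right, 5 wrong and 10 omitted loses 5/3 points.
print(corrected_score(right=25, wrong=5))
```

Under this assumption the penalty exactly cancels the expected gain from purely random selection, which is the sense in which guessing is treated as 'noise'.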
My position is the following: as long as we deal with
cognitively low level recall items, in which one option is
clearly correct while the distractors are factually incorrect,
guessing should indeed be discouraged. However, when
cognitively high level items are considered where we ask
for the best answer, the situation is totally different. Here
the students have to think, compare, weigh evidence,
apply, analyze, synthesize or evaluate; in short, they have
to solve a problem by utilizing their knowledge and
intellectual skills. Under these circumstances, choices are
often made by 'educated guess', which, in my opinion,
should be encouraged. It may be concluded that, in this
kind of multiple choice test, correction for guessing is
neither necessary nor desirable, and students should be
advised to attempt all items.
At the same time, however, it would be worth knowing
to what extent guessing is indeed 'educated' rather than
totally random. The following procedure offers at least a
partial solution: the students are asked to make two
responses to each item: first, to choose the best answer
and, second, to indicate if they are sure or not sure of their
choice. The following marking scheme is used:
correct, sure: 2 points
correct, not sure: 1 point
incorrect, not sure: 1/2 point
incorrect, sure: 0 points
This marking scheme has been found to be very useful
in two ways. First, its reward hierarchy facilitates honest
reporting by students; and second, it provides very
important feedback to the teacher as well as to the
student. Thus, for example, if most students in the class
are not sure about a particular item, the teacher may
conclude that there is a need to revisit the relevant subject
matter in class. As for the students, they learn how to self-evaluate their knowledge. If many mismatches occur, the
student may attempt to find the reasons and adjust his/her
learning strategies. Conveniently, this procedure lends
itself easily to machine scoring. Finally, test reliability
increases substantially.
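Because the procedure lends itself to machine scoring, the marking scheme described in this section can be sketched as a small Python program. The point values are those given in the scheme; the function names, the data layout and the reading of a 'mismatch' as any item where confidence and correctness disagree are assumptions made for illustration.

```python
# Points awarded for each (correct?, sure?) combination in the scheme.
POINTS = {
    (True, True): 2.0,    # correct, sure
    (True, False): 1.0,   # correct, not sure
    (False, False): 0.5,  # incorrect, not sure
    (False, True): 0.0,   # incorrect, sure
}


def score_item(chosen, best, sure):
    """Score one response under the two-response marking scheme."""
    return POINTS[(chosen == best, sure)]


def score_test(answers, key):
    """Total score for one student.

    `answers` is a list of (chosen_option, sure) pairs and `key` is the
    list of best answers in the same order.
    """
    return sum(score_item(chosen, best, sure)
               for (chosen, sure), best in zip(answers, key))


def mismatches(answers, key):
    """Count items where confidence and correctness disagree."""
    return sum((chosen == best) != sure
               for (chosen, sure), best in zip(answers, key))


# Hypothetical four-item test and one student's responses.
key = ["C", "A", "D", "B"]
student = [("C", True), ("A", False), ("B", False), ("D", True)]
print(score_test(student, key))   # 2 + 1 + 0.5 + 0 = 3.5
print(mismatches(student, key))   # items 2 and 4 disagree: 2
```

The same per-item records could be pooled across a class to find items on which most students report being not sure.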
The Use of Justifications
In the context of this article the term justification is
assigned to reasons and arguments given by a respondent
to a multiple choice item for the choice she/he has made.
Very little is reported in the literature about the use of
justifications, mainly because very little use has been
made of this approach. The main reason for avoiding the
use of justifications is, most probably, that certain
advantages associated with objective items, namely, high
reliability, machine scoring, and economical coverage of a
wide range of topics, are lost. In essence the items become
very similar to short essay questions. If so, why do we
need the multiple choice item?
There are at least two important reasons for using
justifications with multiple choice rather than using short
essay questions. First, as already explained in relation to
the example given in the first section of this article, the
distractors serve as traps. When students are required to
justify their choice, they have to consider the data in all
the options and explain why a certain option is better than
others. By including wise options, both as the best answer
and as distractors, we 'force' the students to consider
specific matters and to express their position in writing.
Thus, in the particular item cited, it is not enough to know
that the dry weight of corn plants is mainly a result of
photosynthesis, but in addition, the student needs to
relate to the following: (a) the role of carbon dioxide in
the process, (b) that minerals and water have a share, (c)
that plants do not obtain their organic matter from the
soil, and (d) that oxygen is not a source of that organic
matter.
The second reason for requiring justifications for
multiple choice items is the 'back-wash' effect. Students
who know that they may be asked to justify their choices
will attempt to learn their subject matter in a more
meaningful way and in more depth so that they will be
prepared to write an adequate and complete justification.
Treagust and Haslam 9 designed two-tier items: in the
first step the student chooses the option, and then is
presented with several possible justifications from which
she or he has to choose the best. Although this approach
has the advantage of objective and quick scoring, often
the relationships between the options in the first tier and
those in the second tier are quite awkward.
The results of the Israeli matriculation examination in
biology reveal substantial gaps between the scores on the
multiple choice responses and those on the justifications.
For example, in the four items included in the 1985
examination, the percentages of students correctly choosing the best answers were 78, 65, 81 and 38, and those of
students providing satisfactory justifications were 59, 48,
61 and 29, respectively. On average, the gap between
choice and justification scores reached a whole
standard deviation.
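The size of the gap in percentage points can be checked directly from the eight figures quoted for the 1985 examination. The short sketch below does only that arithmetic; the variable names are illustrative, and the statement about a whole standard deviation cannot be reproduced from these percentages alone.

```python
from statistics import mean

# Percentages reported for the four items of the 1985 examination.
chose_best = [78, 65, 81, 38]
justified = [59, 48, 61, 29]

# Gap per item, in percentage points, and the average gap.
gaps = [c - j for c, j in zip(chose_best, justified)]
print(gaps)         # [19, 17, 20, 9]
print(mean(gaps))   # 16.25
```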
This gap indicates that a considerable number of
students who choose correctly the best answer do not
really understand the relevant subject matter. This itself is
worth noting. However, the most important contribution
of the justifications is that they provide information about
students' conceptions and reasoning patterns beyond that
which can be obtained by the various procedures outlined
above. For more details see Tamir. 10
Positive and Negative Multiple Choice Items: How Different are They?
The usual form of a multiple choice item in most countries
is a stem followed by 4 or 5 options, one of which
represents the best or correct answer. The task of the
student is to identify the best answer. However, the world
is full of surprises. While visiting Australia I discovered
that all multiple choice items included in the biology
matriculation examinations used in the State of Victoria
were of the negative type. It was explained to me that
there was an explicit decision to prefer this format because
of the belief that it is better for students to be exposed to
correct information than to incorrect information. The following is
the rationale: since responding to a test is in itself a
learning experience, why not include many correct facts
which will reinforce students' knowledge and restrict
incorrect information to the minimum necessary? An additional argument has been that it is easier to construct a
good multiple choice item which has only one incorrect
answer, whereas it is quite difficult to
invent good distractors. Thus, natural conditions were created for an
interesting study, described below.
The purpose of this study was to examine certain issues
related to the different item modes, namely Negative (N)
and Positive (P), in an attempt to gain a better insight into
the underlying reasoning processes involved in responding
to the N and P modes.
Most test constructors object to N items arguing that
"the danger of confusion inherent in negative items
outweighs any possible value" (Tinkelman, ref 11, p 58).
Similarly, Wesman 12 writes: "One occasionally finds a
stem phrased in a negative context ... This may lead the
students to respond with the wrong answer because they
have been tripped up by the tricky or careless item writing
rather than through lack of knowledge" (p 96). Cassels &
Johnstone 13 investigated the effect of language and
context on students' performance in multiple choice tests.
They found that in some cases a change of one word in the
stem improved the performance in certain items by about
15%. Another finding relates directly to the problem of
our study: questions in chemistry set in a positive form
brought better performance from pupils than negative
ones. If questions contained double negatives (one in the
stem and one in the options) the performance was very
poor. Johnstone 14 discusses these results and writes:
"Linguistic literature (Wason 15,16) has shown that ideas in
a negative form occupy twice as much space in the
working memory as positive forms. Double negatives may
even occupy four times the space occupied by a positive
form. It is little wonder that negative questions fail so
badly in tests in that they leave less space in the working
memory for thought" (p 115).
Seventy multiple choice items selected from biology
matriculation examinations in Victoria by the local chief
examiner were mailed from Australia to the author.
Thirty-five items were selected and translated by the
author into Hebrew. The accuracy of the translation was
checked independently by three biology educators. The
translated test consisted of negative items. A corresponding form consisting of matching positive items was
prepared by the author. An attempt was made to use as
much as possible the same options in the two modes and
to design distractors which would be as similar as possible
to their matching correct options in terms of the content
and concepts included. The Appendix presents an
example.
Out of the 35 items the first 20 items were shared by all.
In these 20 items students had to choose either the best
(correct) answer in positive items or the least acceptable
(incorrect) answer in the negative items. The remaining 15
items were of high cognitive level and required the
students to choose and justify their choices. 10 The tests
were administered to 254 Israeli 12th grade students from
nine high schools all over the country, by teachers who
had agreed to do so, in April-May 1990, just a short time
before the date of their matriculation examination.
The results were analyzed using regular test scoring
procedures yielding reliability indices, frequency distributions, means and standard deviations, and point biserial
correlations. The justifications were subjected to two
analyses. Firstly each was evaluated on a 3 point scale in
which 1 = incorrect; 2 = partially correct; and 3 = correct and complete. Secondly, the justifications were
content-analysed and appropriate categories were created
to accommodate the various arguments. Having established the categories, two independent evaluators read all
the justifications and classified them into the agreed upon
categories so that frequencies could be calculated.
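The article names point biserial correlations among its regular scoring procedures without giving a formula. As a minimal sketch, assuming the standard textbook definition, the statistic for a single item can be computed as follows; the function name and the six-student data set are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    totals of students who got the item right and wrong, s is the
    population standard deviation of all totals, p is the proportion
    correct and q = 1 - p.
    """
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        raise ValueError("item must have both correct and incorrect responses")
    spread = pstdev(total_scores)
    if spread == 0:
        raise ValueError("total scores must vary")
    p = len(right) / len(total_scores)
    return (mean(right) - mean(wrong)) / spread * sqrt(p * (1 - p))


# Hypothetical data: six students, one item, total test scores.
item = [1, 1, 1, 0, 0, 0]
totals = [18, 16, 15, 12, 10, 9]
print(round(point_biserial(item, totals), 3))
```

A high positive value indicates that students who answer the item correctly also tend to score well on the test as a whole, which is the usual sense in which an item is said to discriminate.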
On the average there were no significant differences in
the scores of items of low cognitive level. On the other
hand, in items of higher cognitive levels, the scores on P
items were substantially higher than on N items. This
result was explained in terms of reasoning patterns
developed in the cognitive structure of students as a result
of long experience with P items, as well as by the larger
space required to process negative items in the working
memory. There were practically no gender differences.
Justification scores were substantially lower than multiple
choice scores. Providing satisfactory justifications in
the N mode was on the average more difficult than in the
P mode.
Substantive differences were found between the kinds
of justifications provided to the P and N items. An
example is presented in the Appendix. The majority of
students who chose the correct answer in the positive
mode justified their choice by saying: "In tubes 4, 5 the
substrate is the same, the pH is different and the products
are different. This indicates that the pH had an effect."
The majority of students who chose the correct answer in
the negative mode justified their choice by saying that
option C is incorrect since "the amount of product
depends on the amount of substrate, not on the amount of
enzyme". As may be seen in the Appendix, about a third of
those responding to the N mode chose option B. The
justification provided by most of those choosing option B
was "since a different compound is composed of different
substances the end products must be different". These
students had failed to notice that tubes 1, 2 which
contained different compounds yielded the same end
product.
In this case one may speculate that the positive mode
was easier since students had known from their experiences with enzymes that pH was an important factor
which usually affects enzymes' activity. On the other
hand, the decision regarding options B and C in the
negative mode required careful evaluation of the meaning
of the information provided and reliance on prior knowledge was not enough.
Based on the results of this pioneering study there
appear to be a variety of differences pertaining to student
performance in P and N modes of multiple choice items.
The main findings and conclusions of this study are the
following:
(1) In items of low cognitive level there are, on the
average, no differences in performance between the N
and the P modes.
(2) In items which require high cognitive reasoning the N
mode is, on the average, more difficult than the P mode.
(3) This difference between low and high cognitive items
may lend support to the hypothesis that processing N
items requires more space in the working memory.
(4) There may be interactions between performance in
the P/N mode and the items' content.
(5) In items for which algorithms are used, such as the
checkerboard used to solve crosses in genetics, there are no
P/N mode differences.
(6) Multiple choice scores are positively correlated with
the extent of offering justifications as well as with the
justification scores. Stated differently a student choosing
correctly the best answer in both P/N modes is more likely
to offer a justification and, as well, more likely to have a
higher justification score.
(7) On the average, justification scores in the P mode are
higher than in the N mode even when the contents of the
items and the actual options are very similar. The detailed
analysis of the justifications lends support to the assertion
that information processing in the N mode is more
complex and involves more steps than in the P mode.
(8) There are no gender interactions in any of the
measures and processes related to the P/N mode effects
identified in this study.
(9) The level of performance on the various measures is
positively correlated with the school grade in biology. The
magnitude of the correlations in the P mode is very similar
to that of the N mode. If we consider the school grade as a
measure of concurrent validity, the two modes may
be regarded as equally valid measures of students'
performance, even though they may differ in their
difficulty level.
(10) A detailed content analysis of the justifications
shows that a plausible explanation for the higher difficulty
level of N items is that the necessary information
processing involves more steps and is more complex than
in the P mode. The data also lend support to the
hypothesis that processing negative items occupies more
space in the working memory.
It still remains to be seen whether or not the performance of Australian students who have been used to the N
mode will be different from that of the Israeli students,
who like most students in other countries have been used
to the P mode.
References
1 Schwab, J J (1963) The Biology Teacher Handbook, Wiley, New York
2 Tamir, P (1971) 'An alternative approach to the construction of multiple choice test items' J Biol Educ 5, 305-307
3 Bell, B (1985) 'Students' ideas about plant nutrition: what are they?' J Biol Educ 19, 213-219
4 Simpson, M and Arnold, B (1980) 'An investigation of the development of the concept of photosynthesis to SCE 'O' grade', Aberdeen College of Education, Aberdeen, Scotland
5 Smith, E L and Lott, G W (1983) 'Teaching for conceptual change: some ways to go wrong', in Helm, H and Novak, J D (Editors), International Seminar on Misconceptions in Science and Mathematics, Cornell University Press, Ithaca, pp 57-66
6 Stavy, R, Eisen, Y and Yaakobi, D (1987) 'How Israeli students aged 13-25 understood photosynthesis' Int J Science Education 9, 105-115
7 Wandersee, J H (1983) 'Students' misconceptions about photosynthesis: a cross age study', in Helm, H and Novak, J D (Editors), International Seminar on Misconceptions in Science and Mathematics, Cornell University Press, Ithaca, pp 441-463
8 Brumby, M (1979) 'Problems in learning the concept of natural selection' J Biol Educ 13, 119-122
9 Treagust, D F and Haslam, F (1986) 'Evaluating secondary students' misconceptions of photosynthesis and respiration in plants using a two-tier diagnostic instrument', Paper presented at the annual meeting of the National Association of Research in Science Teaching, San Francisco
10 Tamir, P (1990) 'Some issues related to the use of justifications to multiple choice items', J Biol Educ 24, 285-292
11 Tinkelman, S N (1971) 'Planning the Objective Test', in Thorndike, R L (Editor) Educational Measurement, American Council of Education, Washington, DC, pp 46-80
12 Wesman, A (1971) 'Writing the Test Item', in Thorndike, R L (Editor) Educational Measurement, American Council of Education, Washington, DC, pp 81-129
13 Cassels, J R T and Johnstone, A H (1980) 'Understanding of Non-technical Words in Science', London, Royal Society of Chemistry
14 Johnstone, A H (1983) 'Training teachers to be aware of the student learning difficulties', in Tamir, P, Holstein, A and Ben Peretz, M (Editors) Preservice and Inservice Education of Science Teachers, Balaban International Science Services, Rehovot-Philadelphia, pp 109-116
15 Wason, P C (1959) 'The processing of positive and negative information', Quarterly J Experimental Psychology 11, 92-107
16 Wason, P C (1961) 'Response to the affirmative and negative binary statements', Brit J Psychol 52, 133-142
Appendix: An item featuring differences in justifications
Mammalian liver tissue was finely ground, filtered and treated so that only enzymes remained in the solution. Some of
the solution was added to a series of test tubes (see below) and incubated at 37°C for one hour. The treatments and
results were as follows:
Tube no   Compound added   pH           Compounds detected after 1 hour
1         tryptophan       not tested   nicotinic acid
2         kynurenine       not tested   nicotinic acid
3         histidine        not tested   glutamic acid + formic acid
4         maltose          6.6          glucose
5         maltose          10           maltose
6         protein          5            various amino acids

From these results it may be concluded that:

Negative
A   the amount of glucose in tube 4 will depend on the amount of maltose put there (6)
B   adding a seventh tube with a different compound would not necessarily result in a different end product (32)
*C  addition of more enzyme solution to tube 2 will increase the amount of nicotinic acid (38)
D   at least one of the reactions indicated is likely to be affected by pH (4)

Positive
A   addition of more enzyme solution to tube 2 will increase the amount of nicotinic acid (15)
B   adding a seventh tube with a different compound would result in a different end product (2)
*C  at least one of the reactions is likely to be affected by pH (66)
D   addition of more maltose to tube 5 will increase the amount of glucose (4)

* = correct answer; the figures in parentheses indicate the percentage of students choosing the corresponding option

Reshaping the preclinical medical curriculum:
modest proposal
BRUCE G CHARLTON
Department of Anatomy
University of Glasgow
Glasgow G12 8QQ, UK
Introduction
If you are happy with the current preclinical medical
training in British Medical Schools, then you need read no
further. If, on the other hand, you consider it to be a dull
anachronism, consisting of too many 'facts', overtaught,
encouraging passive learning, insufficiently interactive
between staff and students, lacking in clinical relevance,
unscientific and boring, then you may consider that we
should be looking for ways to improve it.
There has been no shortage of suggestions for improvement dating back over more than a century, 1 but most of
these have been flawed by excessive idealism. For
example, the complete integration of the preclinical and
clinical components (with much basic science being
taught by practicing clinicians) is one excellent idea; but
is probably logistically impossible in an established
medical school without an unrealistic investment of time,
money and dedication to the project. Another idea is the
universal extension of the course by a year, so that every
student does a bachelor's degree in medical science
(instead of just a selected minority, as at present). But this
would be too expensive, would expand an already bloated
training programme, would not be to all students' tastes;
and anyway would leave the problems of the existing
curriculum untouched, an extra year merely serving to
undo the bad habits of the previous two.
What is required is a simple method of reducing the
bulk of compulsory basic 'factual' material, and replacing
it with the kind of challenging, in-depth study which is the
norm (or at least the ideal) for many other university