Uploaded by Châu Thy

Language-testing-assessment: Techniques for Summative assessment - Interpreting test score

 Techniques for Summative assessment
 Interpreting test score
Multiple choice items are very frequently used in tests of reading and listening.
Discrete point and text based multiple choice
True / false
Gap-filling (cloze) with multiple choice options
Gap-filling with selection from bank
Gap-filling at paragraph level
Extra word error detection
Discrete point and text based multiple choice items
- Text-based multiple choice items are often presented as a question followed by
three, four or five options which include the key, or correct answer.
 Discrete point multiple choice item
 Text-based multiple choice item
True/ False
 The true/false item is one in which test takers have to make a choice as to the truth or otherwise
of a statement, normally in relation to a reading or listening text.
 Candidates who rely on guessing still have a chance of achieving a reasonably high score.
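The effect of guessing can be estimated with a short calculation. The sketch below is only an illustration: the 20-item test and the pass mark of 12 are hypothetical figures, and `prob_at_least` is a helper name introduced here. It compares blind guessing on true/false items with blind guessing on four-option multiple choice items.

```python
from math import comb


def prob_at_least(correct_needed: int, n_items: int, p_guess: float) -> float:
    """Probability of getting at least `correct_needed` items right by pure guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(correct_needed, n_items + 1)
    )


# Hypothetical test: 20 items, pass mark 12 (60%).
true_false = prob_at_least(12, 20, 1 / 2)   # two choices per item
four_option = prob_at_least(12, 20, 1 / 4)  # four options per item

print(f"true/false: {true_false:.3f}")
print(f"four-option multiple choice: {four_option:.5f}")
```

Under these assumptions roughly one guesser in four reaches the pass mark on the true/false version, while almost none do on the four-option version.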
Gap-filling (cloze passage) with multiple choice options
 Words are deleted from a text, creating gaps which the candidate has to fill,
normally with either one or two words.
 Multiple choice cloze tests are typically used to test reading, grammar or
Gap-filling with selection from bank  CLT
 Suitable for use in elementary level tests of reading.
Gap-filling at paragraph level
 A test of reading skills at a relatively high level, involving a test of candidates' understanding of
an extensive text as a whole and of its structural and narrative coherence.
 Should be presented either on one page or on facing pages in the examination paper so that
candidates can easily look back and forth between the text and the options they must choose from.
 The fact that there are six gaps and seven options prevents the last one chosen from being
automatically correct if the previous five choices have been correct.
Gap-filling at paragraph level  CLT
 Used in tests of reading
 Can be used with candidates at quite an elementary level.
Extra word error detection
 Requires candidates to focus on their conscious knowledge of the way language structures
work, and has a particular use in tests of structural competence.
 It is difficult to construct items which represent plausible errors, or errors which could not (if
this were a 'real-life' task) be corrected in more than one way.
Extra word error detection
Writing extended writing questions
Writing tasks with detailed input
Writing tasks with titles only
Writing extended writing questions
 Examples of extended writing tasks at a variety of levels
Elementary level
Intermediate level
Writing tasks with detailed input
 Detailed input produces a more uniform set of responses from candidates, which makes
marking quicker, easier and more reliable
 Reading and writing skills are being tested, which may detract from the objectives and raise
questions about the validity of the task as a test of writing.
 An example at B1 level
Writing tasks with titles only
 A stimulus which consists of an essay title only produces very varied responses, which are
more difficult to mark fairly, but it does not test comprehension of the input in any
significant way.
 Successful essay writing may depend on the candidate's background, culture and
education, which may not be part of what the language tester is trying to assess.
 Set a task that gives more input than an essay question and includes the text type to
write, a reason for writing and a sense of who the target reader will be.
 Candidates are able to express personal ideas and to some extent to adapt the topic
to their own interests.
An example at B2 level
- Presentation
- Use of picture prompts
- Written prompts
- Information gap tasks
 Candidates speak for a specified number of minutes on a topic prepared in advance or
given shortly before the test.
Pro: candidates can talk on topics of real concern to themselves.
Con: speeches on mainstream topics may be prepared and/or memorized.
If more than one candidate is being assessed, tasks consist of questioning
each other on the content of their presentations and answering questions
=>Test listening and interactive communicative skills.
Use of picture prompts
 Candidates look at a photograph or set of photographs.
 They describe what they can see -> suggest a common theme for the photos or give opinions
on the situation depicted.
 A discussion may be developed out of the theme introduced in the photos.
Written prompts
 Suitable for candidates at elementary level.
 Discussions can be prompted by a brief written statement.
 If candidates are being examined in pairs, they are given oral instructions and visual stimuli
to form the basis of a task (listening / turn-taking / initiating / responding) => Test interactive
Information gap tasks
 Each candidate has a picture the other cannot see. Although similar, they are not identical, and
one has to describe the pictures to the other to discover what the differences are.
 Each candidate has a set of pictures which are identical but arranged in a different order. One
candidate must describe one of the pictures, and the other must identify by number which
picture was being described.
Short answer item
Sentence completion
Open gap-filling (cloze)
Word formation
Transformation cloze
Note expansion
Error correction (proof reading)
Information transfer
Short answer item
 Consists of a question which can be answered in one word or a short phrase (related to a text).
 Used in tests of reading and listening.
 Example: Complete the information below. Write NO MORE THAN TWO
WORDS AND/OR A NUMBER for each answer.
Sentence completion
 Part of a sentence is provided, and the candidate uses information derived from a text to complete it.
 Used in tests of reading and listening.
 Example: Complete the sentence using no more than two words and/or a number
Open gap-filling (cloze)
 Used in tests of structural competence.
 There is often only one possible correct answer.
 Gaps should occur approximately every seven to ten words.
 Example:
 Used in tests of structural competence or writing at sentence level.
 It is important to consider the number of testing points, and the acceptable
answers for each.
 Limit the number of words in order to restrict the range of possible correct answers.
 Example:
Word formation
 Used in tests of structural competence or writing where there is a focus on testing
knowledge of vocabulary
 Sentence should give an economical and unambiguous context to the target word.
 It looks more natural to put in a proper name rather than to use 'he' or 'she' all the time.
 Example:
Transformation cloze
 Used in tests of structural competence or writing at sentence level.
 Consists of a text with a word missing in each line; a different grammatical form of the
required word is supplied.
 The candidate has to find the location of the missing word and supply it in its correct form.
Note expansion
 Used in a test of structural competence or writing.
 The item writer must be clear about what the testing
points are, and must construct a mark scheme as part
of the item writing task.
 A disadvantage of the item type is that it necessitates a
rather complicated mark scheme, and is difficult to
mark accurately.
 Example:
Error correction (proof reading)
 This is a tightly controlled type of
candidate-supplied response item, used
in tests of structural competence.
 There is only one correct answer.
 It is important that the item writer
knows the range of types of incorrect
words to be used.
 Example:
Information transfer
 Used in tests of writing and structural competence.
 Involves taking information given in a certain form and presenting it in a different form.
 May involve mail, maps, diagrams, graphs, etc.
 Example:
Interpreting test score
1.Frequency distribution
 Table 3:
+ Contains a frequency distribution showing the number of students who obtained each mark awarded.
+ Tallies (///) simply illustrate the method of counting the frequency of scores.
 The frequency polygon illustrates the distribution of the scores.
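A frequency distribution of this kind can be built by counting how often each mark occurs. The sketch below is an illustration only: the ten marks are hypothetical and are not the data from Table 3. It prints each mark with its tallies and its frequency.

```python
from collections import Counter

# Hypothetical marks for ten students (not the data in Table 3).
scores = [26, 30, 26, 22, 30, 26, 19, 22, 26, 35]

# Map each mark to the number of students who obtained it.
frequency = Counter(scores)

# One row per mark awarded, highest mark first: mark, tallies, frequency.
for mark in sorted(frequency, reverse=True):
    count = frequency[mark]
    print(f"{mark:>4}  {'/' * count:<6} {count}")
```

Plotting the frequency against the mark and joining the points gives the frequency polygon.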
2.Measures of central tendency
 Mode: the score which most candidates obtained (5 testees have scored this mark).
 Median: the score gained by the middle candidate in the order of merit. With an even number
of testees there is no middle candidate, so the score halfway between the lowest score in the
top half and the highest score in the bottom half is taken as the median. Here the median
score is also 26.
 Mean: the arithmetical average (the sum of the separate scores divided by the total number
of testees).
 The mode, median and mean are all measures of central tendency.
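The three measures can be checked quickly with Python's standard `statistics` module. This is a minimal sketch on hypothetical marks (not the data behind the slides); note that for an even number of testees `statistics.median` takes the value halfway between the two middle scores, as described above.

```python
import statistics

# Hypothetical marks for ten students, listed in order of merit.
scores = [35, 30, 30, 26, 26, 26, 26, 22, 22, 19]

mode = statistics.mode(scores)      # the most frequently obtained score
median = statistics.median(scores)  # halfway between the two middle scores (even N)
mean = statistics.mean(scores)      # sum of the scores divided by the number of testees

print(f"mode={mode}, median={median}, mean={mean}")
```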
x: denotes the score
N: the number of testees
m: the mean
f: denotes the frequency with which a score occurs
∑: the sum of
• ∑fx = 702: the total number of items which the group of 26 students got right
between them. 702/26 = 27, the average (m = ∑fx / N).
• There is a fairly close correspondence between the mean (27) and the median (26).
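The mean can be worked out directly from a frequency table using the symbols defined above: multiply each score x by its frequency f, total the products, and divide by N. The table in this sketch is hypothetical; only the final check of 702/26 uses figures from the slides.

```python
# Hypothetical frequency table: score x -> frequency f.
frequency_table = {35: 1, 30: 2, 26: 4, 22: 2, 19: 1}

n = sum(frequency_table.values())                        # N: number of testees
sum_fx = sum(x * f for x, f in frequency_table.items())  # sum of f times x
m = sum_fx / n                                           # the mean

print(f"N={n}, sum of fx={sum_fx}, mean={m}")

# The figures quoted in the slides: 702 correct items shared among 26 students.
slide_mean = 702 / 26
print(f"mean from the slides: {slide_mean}")
```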
3.Measures of dispersion (range or spread of scores)
 Range: a way of measuring the spread of marks, based on the difference between the
highest and lowest scores.
Ex: in a 50-item test, the highest score is 43 and the lowest is 21, so the range is 22.
Ex: The range in Table 1 is 16.
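As a quick check of the first example, the range is simply the highest score minus the lowest. In this sketch only 43 and 21 come from the example; the scores in between are hypothetical, and `score_range` is a name introduced here.

```python
def score_range(scores):
    """Return the difference between the highest and the lowest score."""
    return max(scores) - min(scores)


# Highest (43) and lowest (21) are from the example; the rest are made up.
scores = [43, 38, 30, 27, 21]
print(score_range(scores))
```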
Standard deviation (s.d.)
+ Shows the spread of scores.
+ Measures the degree to which the group of scores deviates from the mean.
+ Gives a fuller description than the range, which only describes the gap between the highest and lowest marks.
s.d. = √(∑d² / N)
N: the number of scores.
d: the deviation of each score from the mean.
Step 1: find out the amount by which each score deviates from the mean (d).
Step 2: square each result (d²).
Step 3: total the results (∑d²).
Step 4: divide the total by the number of scores (∑d² / N).
Step 5: find the square root of this result.
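The steps can be followed one by one in code. This is a minimal sketch on hypothetical scores; it divides by N, as in the steps, so it should give the same result as `statistics.pstdev` in the standard library rather than the sample version `statistics.stdev`.

```python
from math import sqrt


def standard_deviation(scores):
    """Standard deviation computed by dividing the summed squares by N."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]  # Step 1: deviation d of each score
    squared = [d**2 for d in deviations]     # Step 2: square each deviation
    total = sum(squared)                     # Step 3: total the squares
    average_square = total / n               # Step 4: divide by the number of scores
    return sqrt(average_square)              # Step 5: take the square root


# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(scores)
print(sd)
```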
4. Item analysis (There are 3 sections that need to be taken into consideration)
Item difficulty
 The index of difficulty (facility value): shows how easy or difficult the particular item proved in the
test. It is generally expressed as the fraction (or percentage) of the students who answered the item
correctly.
 Formula: FV = R / N
 FV: the index of difficulty
 R: number of correct answers
 N: number of students taking the test
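The facility value is a one-line calculation. In this sketch the figures (15 correct answers from 20 students) are hypothetical, and `facility_value` is a name introduced for illustration.

```python
def facility_value(correct: int, total: int) -> float:
    """FV = R / N: the fraction of students who answered the item correctly."""
    if total <= 0:
        raise ValueError("the number of students must be positive")
    return correct / total


# Hypothetical item: 15 of the 20 students answered correctly.
fv = facility_value(15, 20)
print(f"FV = {fv:.2f} ({fv:.0%})")
```

A value close to 1 indicates an easy item, and a value close to 0 a difficult one.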
 Item discrimination
 The discrimination index: the extent to which the item discriminates between the testees,
separating the more able testees from the less able.
 This (D) index tells us whether those students who performed well on the whole test tended to
do well or badly on each item in the test.
 D = (Correct U − Correct L) / n
D: Discrimination index
U: Upper half
L: Lower half
n: Number of candidates in one group (either the U or L group)
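One way to compute D is to rank the candidates by total score, split them into upper and lower halves, and compare the number of correct answers to the item in each half. The sketch below works under stated assumptions: the six results are hypothetical, `discrimination_index` is a name introduced here, and with an odd number of candidates the middle candidate is simply left out of both groups.

```python
def discrimination_index(results):
    """results: (total_score, answered_item_correctly) pairs, one per candidate."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = len(ranked) // 2  # number of candidates in each group
    if n == 0:
        raise ValueError("at least two candidates are needed")
    upper, lower = ranked[:n], ranked[-n:]
    correct_u = sum(1 for _, correct in upper if correct)
    correct_l = sum(1 for _, correct in lower if correct)
    return (correct_u - correct_l) / n


# Hypothetical results: total test score, and whether this item was answered correctly.
results = [(40, True), (25, True), (38, True), (20, False), (35, True), (30, False)]
d = discrimination_index(results)
print(f"D = {d:.2f}")
```

A positive D means the more able candidates tended to get the item right; a value near zero or below suggests the item does not separate the two groups.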
 Extended answer analysis:
 Helps to know:
 Why items have not performed according to expectations
 Why certain testees have answered a particular item incorrectly.
 Applies well to the multiple-choice technique.
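For a multiple-choice item, this kind of analysis can be sketched by counting how many candidates in each group chose each option. Everything in the example is hypothetical: the responses, the four options A to D, and the key B.

```python
from collections import Counter

KEY = "B"  # hypothetical correct option

# Hypothetical responses to one four-option item, by group.
upper_group = ["B", "B", "A", "B", "B", "C"]
lower_group = ["A", "B", "A", "C", "A", "B"]

upper_counts = Counter(upper_group)
lower_counts = Counter(lower_group)

print("option  upper  lower")
for option in "ABCD":
    marker = "*" if option == KEY else " "  # mark the key with an asterisk
    print(f"{option}{marker}      {upper_counts[option]:>5}  {lower_counts[option]:>5}")
```

In this made-up data option D attracts nobody, so it is not working as a distractor, while option A draws mainly the less able candidates.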
5.Moderation
 Items sometimes contain only the minimum of context => there are bound to be many blind spots in tests, especially in the field of objective testing.
 It is essential that test writers submit the test to colleagues for moderation so that
the most appropriate and efficient measuring instrument is produced for the particular purpose at
hand. In these cases, moderation is also frequently concerned with the scoring of the test and
with the evaluation of the test results.
 Some items in a test may appear too difficult or else far too easy, so seek moderation from a
friend, a spouse or an older student.
6.Item cards and banks
 The construction of objective tests takes a great deal of time and trouble.
 The best way of recording and storing items is by means of small cards.
 Only 1 item is entered on each card. After being arranged according to the element or skills
which they are intended to test, the items on the separate cards are grouped according to
difficulty level and the particular area tested.
 Although building up an item bank consisting of a few hundred items takes time, the item bank will prove of
enormous value and will save the teacher a great deal of time and trouble later.
 The same items can be used again many times, with the order of the items changed each time.
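An item bank on cards maps naturally onto a small data structure. The sketch below is one possible layout, not a prescribed one: the `ItemCard` fields, the sample items and the `draw_items` helper are all hypothetical. It stores one item per card, selects the cards for one skill, and returns them in a different order each time.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemCard:
    """One card holds exactly one item (all field names are hypothetical)."""
    stem: str
    skill: str             # element or skill the item is intended to test
    facility_value: float  # difficulty recorded from earlier use


bank = [
    ItemCard("She ___ to school every day.", "grammar", 0.85),
    ItemCard("I have lived here ___ 2010.", "grammar", 0.60),
    ItemCard("Choose the synonym of 'rapid'.", "vocabulary", 0.70),
    ItemCard("If I ___ you, I would apologise.", "grammar", 0.40),
]


def draw_items(bank, skill, seed=None):
    """Select the cards for one skill and return them in a shuffled order."""
    selected = [card for card in bank if card.skill == skill]
    random.Random(seed).shuffle(selected)
    return selected


paper_a = draw_items(bank, "grammar", seed=1)
paper_b = draw_items(bank, "grammar", seed=2)
for card in paper_a:
    print(card.facility_value, card.stem)
```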
 Multiple-choice items testing most areas of the various language elements and skills can
be rewritten in this way: the same options are generally kept, but the context is changed so
that one of the distractors now becomes the correct option.