LANGUAGE TESTING & ASSESSMENT
GROUP 2 - WEEK 9 CONTENT
- Techniques for summative assessment
- Interpreting test scores

TECHNIQUES FOR SUMMATIVE ASSESSMENT

I. MULTIPLE CHOICE AND OTHER SELECTION ITEM TYPES
Multiple choice items are very frequently used in tests of reading and listening. The main selection item types are:
- Discrete point and text-based multiple choice
- True/false
- Gap-filling (cloze) with multiple choice options
- Gap-filling with selection from a bank
- Gap-filling at paragraph level
- Matching
- Extra word error detection

Discrete point and text-based multiple choice
Text-based multiple choice items are often presented as a question followed by three, four or five options which include the key, or correct answer. (Examples: a discrete point multiple choice item; a text-based multiple choice item.)

True/false
The true/false item is one in which test takers have to decide whether a statement is true or false, normally in relation to a reading or listening text. Candidates who rely on guessing still have a chance of achieving a reasonably high score.

Gap-filling (cloze passage) with multiple choice options
Words are deleted from a text, creating gaps which the candidate has to fill, normally with one or two words. Multiple choice cloze tests are typically used to test reading, grammar or vocabulary.

Gap-filling with selection from a bank
Suitable for use in elementary level tests of reading.

Gap-filling at paragraph level
A test of reading skills at a relatively high level, testing candidates' understanding of an extensive text as a whole and of its structural and narrative coherence. The task should be presented either on one page or on facing pages of the examination paper, so that candidates can easily look back and forth between the text and the options they must choose among. The fact that there are six gaps and seven options prevents the last gap from being automatically correct once the previous five choices are correct.
Matching
Used in tests of reading. Can be used with candidates at quite an elementary level.

Extra word error detection
Requires candidates to focus on their conscious knowledge of the way language structures work, and has a particular use in tests of structural competence. It is difficult to construct items which represent plausible errors, or errors which could not (if this were a 'real-life' task) be corrected in more than one way.

II. NON-ITEM-BASED TASK TYPES
* WRITING:
- Extended writing questions
- Writing tasks with detailed input
- Writing tasks with titles only

Extended writing questions
Examples of extended writing tasks exist at a variety of levels, from elementary to intermediate.

Writing tasks with detailed input
Detailed input produces a more uniform set of responses from candidates, which makes marking quicker, easier and more reliable. However, both reading and writing skills are then being tested, which may detract from the objectives and raise questions about the validity of the task as a test of writing. (An example at B1 level.)

Writing tasks with titles only
A stimulus consisting of an essay title only produces very varied responses, which are more difficult to mark fairly, although it does not test comprehension of the input in any significant way. Successful essay writing may also depend on the candidate's background, culture and education, which may not be part of what the language tester is trying to assess. It is therefore better to set a task that gives more input than a bare essay question: include the text type to be written, a reason for writing and a sense of who the target reader will be. Candidates are then able to express personal ideas and, to some extent, to adapt the topic to their own interests. (An example at B2 level.)
NON-ITEM-BASED TASK TYPES
* SPEAKING:
- Presentation
- Use of picture prompts
- Written prompts
- Information gap tasks

Presentation
Candidates speak for a specified number of minutes on a topic prepared in advance or given shortly before the test.
Pro: candidates can talk on topics of real concern to themselves.
Con: speeches on mainstream topics may be prepared and/or memorized.
If more than one candidate is being assessed, the task can include candidates questioning each other on the content of their presentations and answering those questions. => This also tests listening and interactive communicative skills.

Use of picture prompts
Candidates look at a photograph or set of photographs. They describe what they can see, then suggest a common theme for the photos or give opinions on the situation depicted. A discussion may be developed out of the theme introduced in the photos.

Written prompts
Suitable for candidates at elementary level. Discussions can be prompted by a brief written statement. If candidates are being examined in pairs, they are given oral instructions and visual stimuli to form the basis of a task (listening / turn-taking / initiating / responding). => Tests interactive skills.

Information gap tasks
Each candidate has a picture the other cannot see. Although similar, the pictures are not identical, and one candidate has to describe their picture to the other in order to discover what the differences are. Alternatively, each candidate has a set of pictures which are identical but arranged in a different order; one candidate describes one of the pictures, and the other must identify by number which picture is being described.

III. CANDIDATE-SUPPLIED RESPONSE ITEM TYPES
- Short answer item
- Sentence completion
- Open gap-filling (cloze)
- Transformation
- Word formation
- Transformation cloze
- Note expansion
- Error correction (proofreading)
- Information transfer

Short answer item
Consists of a question which can be answered in one word or a short phrase (related to a text). Used in tests of reading and listening.
Example: Complete the information below.
Write NO MORE THAN TWO WORDS AND/OR A NUMBER for each answer.

Sentence completion
Part of a sentence is provided, and the candidate uses information derived from a text to complete it. Used in tests of reading and listening.
Example: Complete the sentence using no more than two words and/or a number.

Open gap-filling (cloze)
Used in tests of structural competence. There is often only one possible correct answer. Gaps should occur approximately every seven to ten words.

Transformation
Used in tests of structural competence or writing at sentence level. It is important to consider the number of testing points and the acceptable answers for each. Limiting the number of words limits the range of possible correct answers.

Word formation
Used in tests of structural competence or writing where there is a focus on testing knowledge of vocabulary. The sentence should give an economical and unambiguous context for the target word. It looks more natural to use a proper name rather than 'he' or 'she' all the time.

Transformation cloze
Used in tests of structural competence or writing at sentence level. Consists of a text with a word missing from each line, with a different grammatical form of the required word supplied. The candidate has to find the location of the missing word and supply it in its correct form.

Note expansion
Used in tests of structural competence or writing. The item writer must be clear about what the testing points are, and must construct a mark scheme as part of the item writing task. A disadvantage of this item type is that it necessitates a rather complicated mark scheme and is difficult to mark accurately.

Error correction (proofreading)
A tightly controlled type of candidate-supplied response item, used in tests of structural competence. There is only one correct answer. It is important that the item writer knows the range of types of incorrect words to be used.
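The open-cloze guideline above (a gap roughly every seven to ten words) can be sketched as a small gap-making routine. This is only an illustration: the function name, the eight-word interval and the sample passage are all invented, not part of the original materials.

```python
def make_cloze(text, interval=8, start=5):
    """Blank out every `interval`-th word (an assumed rate within the
    recommended seven-to-ten-word range), leaving the first `start`
    words intact so the opening of the text provides context."""
    words = text.split()
    answers = []
    for i in range(start, len(words), interval):
        answers.append(words[i])
        words[i] = "(%d) ______" % len(answers)
    return " ".join(words), answers

# Invented sample passage for demonstration only.
passage = ("Language testing is a field in which teachers and researchers "
           "design tasks that measure how well learners can use a second "
           "language in reading, writing, listening and speaking.")
cloze_text, answer_key = make_cloze(passage)
```

A real open cloze would of course start from an authentic text and be checked by hand, since mechanical deletion cannot guarantee that each gap has only one plausible answer.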
Information transfer
Used in tests of writing and structural competence. Candidates take information given in one form and present it in a different form, involving mail, maps, diagrams, graphs, etc.

INTERPRETING TEST SCORES

1. Frequency distribution
+ Table 3 contains a frequency distribution showing the number of students who obtained each mark awarded.
+ Tallies (///) simply illustrate the method of counting the frequency of scores.
+ The frequency polygon illustrates the distribution of the scores.

2. Measures of central tendency
- Mode: the score which most candidates obtained (here, 5 testees scored this mark).
- Median: the score gained by the middle candidate in the order of merit. With an even number of candidates, the score halfway between the lowest score in the top half and the highest score in the bottom half is taken as the median; here the median is also 26.
- Mean: the arithmetical average (the sum of the separate scores divided by the total number of testees).
The mode, median and mean are all measures of central tendency.

Notation:
- x: a score
- N: the number of testees
- m: the mean
- f: the frequency with which a score occurs
- ∑: the sum of
∑x = 702 is the total number of items which the group of 26 students got right between them, so the mean is 702/26 = 27. There is fairly close correspondence between the mean (27) and the median (26).

3. Measures of dispersion (range or spread of scores)
- Range: a measure of the spread of marks, based on the difference between the highest and lowest scores. E.g. in a 50-item test where the highest score is 43 and the lowest is 21, the range is 22. The range in Table 1 is 16.
- Standard deviation (s.d.):
+ Shows the spread of scores.
+ Measures the degree to which the group of scores deviates from the mean, taking account of every score rather than only the gap between the highest and lowest marks.
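The measures of central tendency and the range described above can be checked with Python's statistics module. The score list below is invented for illustration; it is not the actual set of 26 scores behind the source tables.

```python
import statistics

# Hypothetical scores for 12 testees (not the data from Table 1 or Table 3).
scores = [21, 23, 24, 26, 26, 26, 27, 28, 29, 30, 31, 33]

mode = statistics.mode(scores)            # the score most candidates obtained
median = statistics.median(scores)        # middle score in the order of merit
mean = statistics.mean(scores)            # sum of scores / number of testees
score_range = max(scores) - min(scores)   # highest score minus lowest score
```

With an even number of testees, `statistics.median` takes the value halfway between the two middle scores, exactly as described above.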
In the standard deviation formula, s.d. = √(∑d²/N):
- N: the number of scores
- d: the deviation of each score from the mean

To calculate the standard deviation:
Step 1: find the amount by which each score deviates from the mean (d).
Step 2: square each result (d²).
Step 3: total the results (∑d²).
Step 4: divide the total by the number of testees (N).
Step 5: find the square root of this result.

4. Item analysis (three aspects need to be taken into consideration)

Item difficulty
The index of difficulty (facility value) shows how easy or difficult a particular item proved in the test. It is generally expressed as the fraction (or percentage) of the students who answered the item correctly.
Formula: FV = R/N
- FV: the index of difficulty
- R: the number of correct answers
- N: the number of students taking the test

Item discrimination
The discrimination index measures the extent to which an item discriminates between the testees, separating the more able from the less able. This index (D) tells us whether those students who performed well on the whole test tended to do well or badly on each item in the test.
Formula: D = (Correct U − Correct L) / n
- D: the discrimination index
- U: the upper half; L: the lower half
- n = N/2: the number of candidates in one group (either the U or the L group)

Extended answer analysis
Helps to explain:
- why items have not performed according to expectations;
- why certain testees have answered a particular item incorrectly.
It applies particularly well to the multiple-choice technique.

5. Moderating
Items sometimes contain only the minimum of context, so there are bound to be many blind spots in tests, especially in the field of objective testing. It is essential that test writers submit the test for moderation to colleagues, so that the most appropriate and efficient measuring instrument is produced for the particular purpose at hand.
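The five standard-deviation steps and the two item-analysis formulas (FV = R/N and D = (Correct U − Correct L)/n) can be sketched as follows. The scores and item counts below are invented for illustration, not taken from the source tables.

```python
import math

# Hypothetical scores for 8 testees.
scores = [21, 24, 26, 27, 27, 28, 30, 33]
mean = sum(scores) / len(scores)            # 216 / 8 = 27

# Steps 1-2: deviation of each score from the mean, squared (d²).
squared_devs = [(x - mean) ** 2 for x in scores]
# Steps 3-5: total the results, divide by N, take the square root.
sd = math.sqrt(sum(squared_devs) / len(scores))

def facility_value(correct, total):
    """Index of difficulty: FV = R / N."""
    return correct / total

def discrimination(correct_upper, correct_lower, n):
    """Discrimination index: D = (Correct U - Correct L) / n."""
    return (correct_upper - correct_lower) / n

fv = facility_value(18, 24)     # 18 of 24 students answered the item correctly
d = discrimination(10, 4, 12)   # U group: 10 correct; L group: 4; n = 24/2
```

An item answered correctly by more of the upper group than the lower group, as here, gets a positive D and so discriminates in the expected direction.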
In these cases, moderation is also frequently concerned with the scoring of the test and with the evaluation of the test results. When some items in a test appear too difficult or else far too easy, moderation can come even from a friend, a spouse or an older student.

6. Item cards and banks
The construction of objective tests takes a great deal of time and trouble. The best way of recording and storing items is by means of small cards, with only one item entered on each card. After being arranged according to the element or skill which they are intended to test, the items on the separate cards are grouped according to difficulty level and the particular area tested. Although building up an item bank of a few hundred items takes effort, the bank will prove of enormous value and will save the teacher a great deal of time and trouble later. The same items can be used many times, with the order of the items changed each time.

Multiple-choice items testing most areas of the various language elements and skills can be rewritten as follows: the same options are generally kept, but the context is changed so that one of the distractors now becomes the correct option.

THANKS FOR LISTENING!