Basic Issues in Language Assessment
袁韻璧, Department of English, Fu Jen Catholic University

Contents
- Introduction: the relationship between teaching and testing
- Forms of test delivery
- Characteristics of a good test: validity, reliability, practicality, positive washback
- Multiple-choice reading tests
- Computer-based testing: advantages and disadvantages
- Conclusion

Relationship between Teaching & Testing
- Testing is a subordinate partner to teaching (supportive, corrective)

Forms of Test Delivery
- Alternative assessment
- Paper-and-pencil tests
- Computer-based testing

Characteristics of a Good Test
- Validity
- Reliability
- Practicality (feasibility)
- Positive washback: the effect of tests on teaching and learning

Validity
- Definition: a test should measure what it is intended to measure, and nothing else (i.e., it should not require outside knowledge or measure other skills at the same time).
- Types of validity: face validity, content validity, construct validity, criterion-related validity

Face Validity
- You can judge whether a test has face validity simply by "looking" at it: it "looks right" to other testers, teachers, test takers, etc.
- Essential to all kinds of tests, but not sufficient on its own.

Content Validity
- "A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned." (Hughes, 1989, p. 22)
- Also called rational or logical validity.
- Check the test against:
  - the test specification (test plan)
  - teaching materials and textbooks
  - the course syllabus/objectives
  - another teacher or subject-matter experts

Definition of Reliability
- "The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context" (Bachman, 1990, p. 24).
- The accuracy or precision with which a test measures something; the consistency, dependability, or stability of test results.

How Teachers Can Make a Test More Reliable
- Take enough samples of behavior
- Avoid ambiguous items
- Provide clear and explicit instructions
- Use a clear, well-planned layout
- Provide uniform, non-distracting test conditions
- Use objective tests where possible
- Use direct tests where possible
- Have independent, trained raters
- Identify test takers by number, not by name
- Use multiple independent scorings for subjective tests
(Hughes, 1989, pp. 36-41)

Practicality
- Practical considerations when planning tests or other measures, including cost and the time/effort required:
  - economy
  - ease of scoring and score interpretation
  - ease of administration
  - ease of test compilation
- A test should be practical to use, but it must also be valid and reliable.

Multiple-choice Reading Tests
- Comprehension: being able to find meaning in what is read
- Three levels of comprehension: literal, interpretive (inferential), and critical
- Problems of multiple-choice reading tests:
  - items answerable by recalling or recycling the text rather than by comprehending it
  - ambiguous or flawed texts and items
  - information gaps in passages
  - unfair, tricky tasks (e.g., full of unfamiliar words)
  - too much background knowledge assumed
  - items answered correctly for the wrong reasons, or incorrectly for the right ones
  - test-taking techniques (test-wiseness) inflating scores

Advantages of CBT
- Scoring is done automatically and immediately
- Tests can be tailored to the particular abilities of each test taker
- Tests can be provided on demand
- Many item combinations are possible, which improves test security
- Multimedia supports multiple-intelligence learning
(A brief illustrative sketch of automatic scoring and item tailoring appears after the Conclusion.)

Disadvantages of CBT
- Writing tests: do raters react differently to printed vs. handwritten texts? For test takers, composing on a computer involves different processes.
- Reading tests: do test takers respond in the same way to texts presented on a computer screen as to texts printed on paper?
- Speaking tests (semi-direct tests): communication is by nature a shared human activity involving interlocutors and interaction, which a machine-delivered test cannot fully reproduce.

Conclusion
- Variables affecting test performance: types/formats of tasks, nervousness, the physical condition of test takers, rater factors, etc.
- Adopt multiple methods of assessment, including alternative assessment.
- Use valid, reliable paper-and-pencil tests that have positive washback.
- CBT can serve classroom teachers, depending on the testing purpose and needs.
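
Illustrative sketch (referenced under "Advantages of CBT")
The Python sketch below is a minimal, hypothetical illustration of how a computer-based test might score each multiple-choice response the moment it is entered and use the result to tailor the difficulty of the next item. The item bank, the three difficulty levels, and the simple step-up/step-down rule are invented for illustration and are not features of any particular testing system.

    # Hypothetical sketch: automatic, immediate scoring plus a very simple
    # adaptive rule for choosing the next item. All items and the up/down
    # difficulty rule are invented for illustration only.

    # Item bank: one sample item per difficulty level (1 = easy, 3 = hard).
    ITEM_BANK = {
        1: ("'Happy' is closest in meaning to:", ["glad", "tall", "cold"], "glad"),
        2: ("She ___ to school yesterday.", ["goes", "went", "gone"], "went"),
        3: ("The author's tone is best described as:", ["ironic", "joyful", "neutral"], "ironic"),
    }

    def run_adaptive_test(get_answer, num_items=3):
        """Administer items one at a time, scoring each response as soon as it is entered."""
        level = 2      # start at medium difficulty
        score = 0
        for _ in range(num_items):
            question, options, key = ITEM_BANK[level]
            answer = get_answer(question, options)
            correct = (answer == key)
            score += correct                      # immediate, automatic scoring
            # Tailor the next item: harder after a correct answer, easier after a miss.
            level = min(3, level + 1) if correct else max(1, level - 1)
        return score

    # Example: simulate a test taker who always chooses the first option.
    if __name__ == "__main__":
        print("Automatically scored:", run_adaptive_test(lambda q, opts: opts[0]), "correct")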