Course: G0444 / MATERIAL DESIGN and TESTING
Year: 2009
Session 24
Bina Nusantara

RELIABILITY

Reliability is concerned with how far we can depend on the results that a test produces; in other words, whether the results could be produced consistently.

A good test should give consistent results. For example, if the same group of students took the same test twice within two days, without reflecting on the first test before they sat it again, they should get the same results on each occasion. If they took another similar test, the results should be consistent. If two groups who were demonstrably alike took the test, the range of marks would be the same.

In practice, reliability is enhanced by making the test instructions absolutely clear, restricting the scope for variety in the answers, and making sure that the test conditions remain constant.

Reliability also depends on the people who mark the test: the scorers. Clearly, a test is unreliable if the result depends to any large extent on who is marking it.

The reliability coefficient

The ideal reliability coefficient is 1. A test with a reliability coefficient of 1 would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. A test with a reliability coefficient of 0 would give sets of results quite unconnected with each other. Lado says that good vocabulary, structure and reading tests are usually in the .90 to .99 range, while auditory comprehension tests are often in the .80 to .89 range. Oral production tests may be in the .70 to .79 range.

How do we arrive at the reliability coefficient?

The requirement is to have two sets of scores for the same candidates, obtained by:
1. getting a group of subjects to take the same test twice (test-retest method); or
2. using two different forms of the same test (alternate forms method).
The reliability coefficient is then the correlation between the two sets of scores.
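To make the comparison of the two sets of scores concrete, here is a minimal Python sketch. It assumes Pearson's product-moment correlation as the statistic (the slides do not name one), and the function name `reliability_coefficient` and the six candidates' scores are hypothetical, invented for illustration only.

```python
from math import sqrt
from statistics import mean


def reliability_coefficient(first, second):
    """Pearson correlation between two sets of scores for the same candidates.

    first[i] and second[i] must belong to the same candidate, e.g. scores
    from two sittings (test-retest) or from two forms (alternate forms).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length score lists, at least two candidates")
    m1, m2 = mean(first), mean(second)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    # Sums of squared deviations for each set of scores.
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    if ss1 == 0 or ss2 == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss1 * ss2)


# Hypothetical scores for six candidates who sat the same test twice.
first_sitting = [62, 75, 48, 90, 55, 70]
second_sitting = [60, 78, 50, 88, 57, 68]
print(round(reliability_coefficient(first_sitting, second_sitting), 2))
```

The same function serves the alternate forms method: pass each candidate's score on form A and on form B instead of two sittings. Two identical score lists give exactly 1, matching the ideal coefficient described above.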
The standard error of measurement and the true score

While the reliability coefficient allows us to compare the reliability of tests, it does not tell us directly how close an individual's actual score is to what he or she might have scored on another occasion. With a little further calculation, however, it is possible to estimate how close a person's actual score is to what is called their “true score”. For the calculation, see Appendix 1 of Arthur Hughes, “Testing for Language Teachers”, page 159.

HOW TO MAKE TESTS MORE RELIABLE
1. Take enough samples of behaviour.
2. Do not allow candidates too much freedom.
3. Write unambiguous items.
4. Provide clear and explicit instructions.
5. Ensure that tests are well laid out and perfectly legible.
6. Make sure candidates are familiar with the format and testing techniques.
7. Provide uniform and non-distracting conditions of administration.
8. Use items that permit scoring which is as objective as possible.
9. Make comparisons between candidates as direct as possible.
10. Provide a detailed scoring key.
11. Train scorers.
12. Agree acceptable responses and appropriate scores at the outset of scoring.
13. Identify candidates by number, not name.
14. Employ multiple, independent scoring.
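The appendix calculation is not reproduced in these notes. As a hedged sketch, the widely used classical formula is SEM = SD × √(1 − r), where SD is the standard deviation of the candidates' scores and r is the reliability coefficient. Assuming measurement error is roughly normally distributed, a candidate's true score lies within ±1 SEM of the observed score about 68% of the time and within ±2 SEM roughly 95% of the time. The function names and the figures (SD of 10, r = .91, observed score 56) are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: sd * sqrt(1 - reliability).

    sd          -- standard deviation of the candidates' observed scores
    reliability -- reliability coefficient between 0 and 1
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


def true_score_band(observed, sem, n_sem=1):
    """Band of observed +/- n_sem standard errors of measurement.

    Assuming normally distributed error, n_sem=1 covers roughly 68% of
    cases and n_sem=2 roughly 95%.
    """
    return observed - n_sem * sem, observed + n_sem * sem


# Hypothetical figures: score SD of 10, reliability .91, observed score 56.
sem = standard_error_of_measurement(10, 0.91)
low, high = true_score_band(56, sem)
print(f"SEM = {sem:.1f}; about 68% band: {low:.1f} to {high:.1f}")
low2, high2 = true_score_band(56, sem, n_sem=2)
print(f"about 95% band: {low2:.1f} to {high2:.1f}")
```

The formula ties back to the reliability coefficient: with r = 1 the SEM is 0, so the observed score is the true score, while with r = 0 the SEM equals the full spread of the scores.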