FACTSHEET 30 Estimating Test Reliability

BPS Occupational Testing Level A
BPS Occupational Testing Level B Intermediate

© 2009 Knight Chapman Psychological Ltd. All rights reserved.

The reliability coefficient is in fact a correlation coefficient. To arrive at a correlation coefficient we have to collect two sets of data for a single sample and correlate them. In this section we will look at the kinds of data that can be used to provide an estimate of a test’s reliability. The reliability coefficients published in test manuals are derived using one (or more) of the three methods described below.

Test-Retest Reliability

The test-retest reliability coefficient is the correlation between the scores obtained by the same people on two separate administrations of the test. If results on the second administration generally differ from those on the first, the test-retest reliability coefficient will be low. If scores on the two occasions are closely related, the coefficient will be high. Test-retest reliability therefore provides an index of the test’s stability over time.

The test-retest reliability coefficient will be weakened by random variations in scores from one test session to another. These variations may result from:

i) Changes in test conditions, such as distractions, changes in instructions, poorly controlled timing etc.
ii) Changes in candidates themselves, such as illness, tiredness, tension, mood, recent experiences etc.
iii) Ambiguous or opaque test items: if the test questions are unclear or open to different interpretations, candidates are more likely to respond differently on the two test sessions than would be the case if items were clear and unambiguous.

When test-retest reliability is reported in a test manual, it should be accompanied by details of the time interval between the two testing sessions. This is because test-retest coefficients decrease as this interval increases. It is therefore arguable that there is potentially an infinite number of test-retest reliabilities for a given test, one for every possible time interval.

The benefits of the test-retest method for estimating reliability are:

• Only one form of the test is required.
• The reliability estimate provides an indication of the extent to which scores on a test can be generalised over different occasions; the higher the test-retest reliability, the less the test is influenced by random variations in the conditions of the candidates or the testing environment.

The disadvantages are:

• Even though test-retest reliability provides some measure of item quality, it is likely that even poor (i.e. ambiguous or unclear) items will be answered in the same way on both testing sessions by many candidates. This may lead to over-confidence in the quality of test items.
• Candidates are likely to vary in the extent to which they benefit from practice effects when retaking the same test.
• Memory of the first session may influence responses in the second session, resulting in non-independence of the two sets of scores and a spuriously high correlation.

Alternate-Form Reliability

This is computed by correlating scores obtained by the same individuals on two forms (or versions) of the same test. Of course, this method is limited to tests for which more than one form exists. Like test-retest coefficients, alternate-form reliabilities should be accompanied by details of the interval between sessions. If this period is longer rather than shorter, alternate-form reliability will be influenced by the same factors affecting the condition of the candidate as test-retest reliability.

Benefits of estimating reliability in this way are:

• This provides a good measure of item quality, in that ambiguous or unclear items should result in reduced alternate-form reliability.
• Practice effects and memory of test content/solutions are attenuated.
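
Both test-retest and alternate-form reliability are obtained by correlating two sets of scores from the same people. The following is a minimal sketch of that calculation, assuming the Pearson product-moment correlation is the coefficient used (the factsheet does not name a specific formula). The function name `pearson_r` and the scores are hypothetical and purely illustrative.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("correlation is undefined when a score set has no variance")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical raw scores for the same five candidates (illustrative only).
# For test-retest these are two sessions of one test; for alternate-form
# reliability they would be scores on form A and form B.
session_1 = [12, 15, 9, 20, 17]
session_2 = [13, 14, 10, 19, 18]

reliability = pearson_r(session_1, session_2)
print(f"Estimated reliability coefficient: {reliability:.2f}")
```

Each position in the two lists must refer to the same candidate; the coefficient is meaningless if the lists are not paired in this way.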
Disadvantages of the alternate-form method include the following:

• Most occupational test batteries do not provide alternate forms. In these cases the method obviously cannot be used.
• Though attenuated, practice effects may transfer across very similar parallel forms.

Internal Consistency

Only one administration of the test is required to compute the internal consistency coefficient. This is derived by splitting the test questions into two sets and then correlating the respective totals for the two halves. One approach is to separate the odd-numbered and even-numbered questions and then correlate odds with evens. More sophisticated ways of dividing the questions into two sets are also available.

If all of the questions are measuring the same characteristic, the internal consistency coefficient should be high: perhaps in the range 0.65 - 0.90. If a test has been designed to assess a number of heterogeneous attributes, as is the case with some tests of general intelligence, the internal consistency will be lower. Generally tests are designed to measure discrete attributes and so should have good internal consistencies.

Benefits of this method of measuring reliability are:

• Internal consistency provides a good measure of item quality because (as with alternate-form reliability) ambiguous or unclear items will reduce the internal consistency coefficient.
• Only one administration of the test is required.
• Practice/memory effects are eliminated.
• The reliability coefficient also indicates the extent to which the instrument measures a discrete attribute.

Disadvantages are:

• This method is not appropriate for heterogeneous tests (i.e. those consisting of a variety of item types, such as tests of general intelligence).
• Internal consistency does not indicate a test’s stability over time.
• Internal consistency may be spuriously high if items are so similar as to cause interdependence between items (e.g. if a self-report scale consists of items of almost identical wording).
• Internal consistency can be less useful with speeded tests, because not all candidates will complete all of the test items.
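
The odd-even split described under Internal Consistency can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure of any particular test manual: items are assumed to be scored 1 (correct) or 0 (incorrect), the correlation is assumed to be Pearson's, and the names `split_half_r`, `spearman_brown` and `responses` are hypothetical. The Spearman-Brown step is not mentioned in the factsheet; it is included only as the adjustment conventionally applied because each half is half the length of the full test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of totals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def split_half_r(item_scores):
    """Correlate odd-question totals with even-question totals.

    item_scores holds one row per candidate and one item score per question.
    """
    # Questions are numbered from 1, so list index 0 is question 1 (odd).
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return pearson_r(odd_totals, even_totals)


def spearman_brown(half_r):
    """Assumed adjustment (not from the factsheet): step a half-test
    correlation up to an estimate for the full-length test."""
    return 2 * half_r / (1 + half_r)


# Hypothetical item scores: six candidates, eight questions (illustrative only).
responses = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0],
]

half_r = split_half_r(responses)
print(f"Odd-even correlation: {half_r:.2f}")
print(f"Stepped-up estimate:  {spearman_brown(half_r):.2f}")
```

With a speeded test, questions near the end would be unanswered by many candidates, so the odd and even totals would partly reflect how far each candidate got rather than item consistency.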