Estimating Test Reliability - Knight Chapman Psychological Ltd

FACTSHEET 30
Estimating Test Reliability
BPS Occupational Testing Level B Intermediate
BPS Occupational Testing Level A
© 2009 Knight Chapman Psychological Ltd. All rights reserved.

The reliability coefficient is in fact a correlation coefficient. To arrive at one, we collect two sets of scores from a single sample of candidates and correlate them. In this section we look at the kinds of data that can be used to estimate a test’s reliability.
The reliability coefficients published in test manuals are derived using one (or more) of the three methods described below.
Test-Retest Reliability
The test-retest reliability coefficient is the correlation between the scores obtained by the same candidates on two separate administrations of the test. If candidates’ scores, and in particular their standing relative to one another, differ markedly between the first and second administrations, the test-retest reliability coefficient will be low. If there is a close relationship between scores on the two occasions, the coefficient will be high.
Test-retest reliability therefore provides an index of the test’s stability over time. The test-retest reliability coefficient will be weakened by random variations in scores from one test session to another. These variations may result from:
i) Changes in test conditions, such as distractions, changes in instructions, poorly controlled timing, etc.
ii) Changes in the candidates themselves, such as illness, tiredness, tension, mood, recent experiences, etc.
iii) Ambiguous or opaque test items: if the test questions are unclear or open to different interpretations, candidates are more likely to respond differently on the two test sessions than they would if the items were clear and unambiguous.
When test-retest reliability is reported in a test manual, it should be accompanied by details of the time interval between the two testing sessions, because test-retest coefficients generally decrease as this interval increases.
It is therefore arguable that a given test has potentially an infinite number of test-retest reliabilities, one for every possible time interval.
The benefits of the test-retest method for estimating reliability are:
• Only one form of the test is required.
• The reliability estimate indicates the extent to which scores on a test can be generalised over different occasions; the higher the test-retest reliability, the less the test is influenced by random variations in the condition of the candidates or the testing environment.
The disadvantages are:
• Although test-retest reliability provides some measure of item quality, even poor (ie ambiguous or unclear) items are likely to be answered in the same way on both testing sessions by many candidates. This may lead to over-confidence in the quality of test items.
• Candidates are likely to vary in the extent to which they benefit from practice effects when retaking the same test. Because these gains are uneven, they alter candidates’ relative standing and can lower the coefficient.
• Memory of the first session may influence responses in the second session, making the two sets of scores non-independent and producing a spuriously high correlation.
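As a rough illustration, the Python sketch below computes a test-retest coefficient as a Pearson correlation between two sessions. The scores and the `pearson_r` helper are hypothetical. It also shows why uneven practice effects matter: if every candidate gains the same number of marks on the retest, relative standing is unchanged and the correlation stays at 1.0, whereas gains that differ between candidates lower it.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from the same candidates (hypothetical helper)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores for at least two candidates")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Cross-products and sums of squares of deviations from each session's mean.
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary within each session")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical scores for six candidates on the first administration.
session_1 = [12, 15, 9, 20, 17, 11]

# Retest where every candidate gains exactly 2 marks: relative standing is
# unchanged, so the coefficient stays at 1.0.
uniform_retest = [score + 2 for score in session_1]

# Retest where practice gains differ between candidates (0 to 6 marks):
# the rank order shifts, so the coefficient falls below 1.0.
gains = [0, 6, 1, 0, 5, 2]
uneven_retest = [score + gain for score, gain in zip(session_1, gains)]

print(f"uniform gain: r = {pearson_r(session_1, uniform_retest):.2f}")
print(f"uneven gains: r = {pearson_r(session_1, uneven_retest):.2f}")
```

The same function applies to any pair of score lists, so it can be reused for retests at different intervals to see how the coefficient changes as the interval grows.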
Alternate-Form Reliability
This is computed by correlating the scores obtained by the same individuals on two forms (or versions) of the same test. Of course, this method is limited to tests for which more than one form exists.
Like test-retest coefficients, alternate-form reliabilities should be accompanied by details of the interval between sessions. If this interval is long rather than short, alternate-form reliability will be influenced by the same factors affecting the condition of the candidate as test-retest reliability.
Benefits of estimating reliability in this way are:
• This provides a good measure of item quality in that ambiguous or unclear items should result in reduced alternate-form reliability.
• Practice effects and memory of test content/solutions are attenuated.
Disadvantages include the following:
• Most occupational test batteries do not provide alternate forms. In these cases the method obviously cannot be used.
• Though attenuated, practice effects may transfer across very similar parallel forms.
Internal Consistency
Only one administration of the test is required to compute the internal consistency coefficient. One way to derive it is to split the test questions into two halves and then correlate the candidates’ totals on the two halves. One approach is to separate the odd and even questions and then correlate odds with evens.
More sophisticated ways of dividing the questions into two sets are also available. If all of the questions measure the same characteristic, the internal consistency coefficient should be high: perhaps in the range 0.65 to 0.90. If a test has been designed to assess a number of heterogeneous attributes, as is the case with some tests of general intelligence, the internal consistency will be lower. Generally, tests are designed to measure discrete attributes and so should have good internal consistencies.
Benefits of this method of measuring reliability are:
• Internal consistency provides a good measure of item quality because (as with alternate-form reliability) ambiguous or unclear items will reduce the internal consistency coefficient.
• Only one administration of the test is required.
• Practice/memory effects are eliminated.
• The reliability coefficient also indicates the extent to which the instrument measures a discrete attribute.
Disadvantages are:
• This method is not appropriate for heterogeneous tests (ie those consisting of a variety of item types, such as tests of general intelligence).
• Internal consistency does not indicate a test’s stability over time.
• Internal consistency may be spuriously high if items are so similar as to cause interdependence between items (eg if a self-report scale consists of items with almost identical wording).
• Internal consistency can be less useful with speeded tests, because not all candidates will complete all of the test items; unattempted items tend to score the same in both halves, which can make the coefficient spuriously high.