Uploaded by Miles Justine Rivera

Reliability in Psychological Assessment

Refers to the degree to which test scores are
consistent, dependable, or repeatable. It is a function
of the degree to which test scores are free from
errors of measurement (Drummond, 2000)
Refers to the consistency of test scores obtained by
the same persons when they are re-examined with
the same test on different occasions, or with
different sets of equivalent items, or under other
variable examining conditions (Anastasi & Urbina,
Underlies the error of measurement of a single
score, whereby we can predict the range of
fluctuation likely to occur in a single individual’s
scores as a result of irrelevant chance factors
Refers to the internal consistency of a test, based
on the number of items in the test and the average
of the inter-correlations among all test items
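The definition above about the error of measurement of a single score can be made concrete with the conventional standard error of measurement, SEM = SD × sqrt(1 − reliability). This formula is not stated in the slides; it is the usual textbook form. The function names, the standard deviation of 15, the reliability of 0.91, and the 1.96 multiplier (an approximate 95% band) are all assumptions chosen for illustration.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Conventional SEM: spread of observed scores around a true score."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Range within which one person's score is likely to fluctuate."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: standard deviation 15, reliability coefficient 0.91.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = score_band(observed=100, sd=15, reliability=0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

The lower the reliability, the wider the band around any single observed score.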
Some Sources of Error Variance
Characteristics of an individual
General skills and techniques of taking
tests (test-wiseness or test naivete)
Stable response sets (e.g., marking option A
more frequently than other options of
multiple-choice items, or marking true-false items
“true” when undecided)
Level of practice on the specific skills involved
(especially psychomotor skills)
Factors affecting performance on many
or all tests at a particular time
-Noises and distraction
-Emotional Strain
Administration of the test/
Appraisal of test performance
• Conditions of testing: adherence to time
limits, freedom from distraction, clarity
of instructions, etc.
• Interaction of personality, sex, or race of
examiner with that of examinee that
facilitates or inhibits performance
• Unreliability or bias in grading or rating
• Luck in selection of answers by sheer guessing
• Momentary distraction
Psychological Assessment for the Guidance Practitioner by Alexa PrielaAbrenica (2009)
Key Concepts
Sources of Error
• Time Sampling: Test-Retest Method
• Item Sampling: Parallel Forms Method
• Internal Consistency
◦ Split-half Method
◦ KR20 Formula
◦ Coefficient Alpha
• Inter-rater Reliability
Measurement error will have an effect
on reliability.
• True scores vs. observed scores
• Reliability coefficient – ratio of the variance
of true scores on a test to the variance
of observed scores
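The ratio definition above can be illustrated with a small sketch. True scores are never observed directly in practice, so the numbers below are made-up values, with errors chosen to be uncorrelated with the true scores. The names `true_scores`, `errors`, and `reliability_coefficient` are assumptions introduced for illustration only.

```python
from statistics import pvariance

def reliability_coefficient(true_scores, observed_scores):
    """Ratio of true-score variance to observed-score variance."""
    return pvariance(true_scores) / pvariance(observed_scores)

# Hypothetical data: each observed score = true score + measurement error.
true_scores = [10.0, 12.0, 14.0, 16.0, 18.0]
errors = [1.0, -2.0, 2.0, -2.0, 1.0]  # mean 0, uncorrelated with true scores
observed_scores = [t + e for t, e in zip(true_scores, errors)]

r_xx = reliability_coefficient(true_scores, observed_scores)
print(round(r_xx, 3))  # 0.741
```

Here the true-score variance is 8.0 and the error variance is 2.8, so the observed variance is 10.8 and the coefficient is 8.0 / 10.8. Larger measurement error inflates the denominator and lowers the coefficient.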
Source of Error: Time
Test-retest reliability
• Same test is given twice
• Find the correlation of scores on both
administrations
• Disadvantage: carryover effect
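As a minimal sketch of the procedure above, the scores from the two administrations can be correlated with a Pearson product-moment correlation. The score lists and the helper name `pearson_r` are hypothetical and introduced only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of five examinees on two administrations of one test.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

stability = pearson_r(first_administration, second_administration)
print(round(stability, 3))
```

A coefficient close to 1.0 means examinees kept nearly the same rank order across the two occasions. The same function would apply to two parallel forms instead of two occasions.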
Source of Error: Items
Parallel Forms
• Different items are used to measure the
same attribute
• Pearson product-moment correlation
between scores on the two forms
Source of Error: Internal Consistency
• Tests are given and divided into halves
and scored separately
• Compare scores on the first half and scores
on the second half
• Cronbach’s coefficient alpha – a lower-bound
estimate of the reliability to be expected
• Kuder-Richardson 20 (for dichotomous items)
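The split-half comparison and coefficient alpha listed above can be sketched as follows. The item-score matrix is made up (rows are examinees, columns are items). The Spearman-Brown step that adjusts the half-test correlation to full length is a standard addition that the slide does not mention, and population variances are an assumed convention.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def split_half(scores):
    """Correlate first-half and second-half totals, then adjust for length."""
    half = len(scores[0]) // 2
    first = [sum(row[:half]) for row in scores]
    second = [sum(row[half:]) for row in scores]
    r_half = pearson_r(first, second)
    corrected = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
    return r_half, corrected

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of six examinees to four items rated 1 to 5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]

r_half, corrected = split_half(scores)
alpha = cronbach_alpha(scores)
print(round(r_half, 3), round(corrected, 3), round(alpha, 3))
```

A first-half versus second-half split is only one of many possible splits; an odd-even split is a common alternative when items get harder toward the end of the test.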
KR20 (Kuder-Richardson Formula 20)
• An index of the internal consistency
reliability of a measurement instrument,
such as a test or questionnaire
• Can be applied to any test item responses
that are dichotomously scored
• Values of KR-20 generally range from
0.0 to 1.0, with higher values
representing a more internally
consistent instrument
• In very rare cases, typically with very
small samples, values less than 0.0 can
occur, which indicates an extremely
unreliable measurement
• 0.7 is an acceptable value
• 0.8 for longer tests of 50 items or more
Methods of assessing reliability
(Drummond, 2006)
Test-Retest
• Procedure: same test given twice with a
time interval between testings
• Coefficient: stability
• Problems: memory effect, practice
effect, change over time
Alternate Forms-Version 1
• Procedure: equivalent tests given
one after the other
• Coefficient: equivalence
• Problems: hard to develop equivalent
tests
Alternate Forms-Version 2
• Procedure: equivalent tests given with
time between testings
• Coefficient: equivalence and stability
• Problems: hard to develop equivalent
tests, may reflect change in behavior
over time
Internal Consistency
• Procedure: one test given at one time
only (test divided into parts in split-half)
• Coefficient: equivalence and internal
consistency
• Problems: uses shortened forms (split-half),
only good if traits are unitary or
homogeneous, gives high estimates on a
speeded test, hard to compute by hand
Inter-Rater/Inter-Observer Reliability
The degree of agreement among raters
or observers
• Percentage of agreement
• Correlations
• Cohen’s Kappa
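Two of the indices above, percentage of agreement and Cohen's kappa, can be sketched for two raters as follows. The ratings are made up and the function names are hypothetical. Kappa is computed in its usual form, (observed agreement − chance agreement) / (1 − chance agreement).

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal proportions per category.
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical pass/fail judgments of ten examinees by two raters.
rater_a = ["pass"] * 6 + ["fail"] * 4
rater_b = ["pass"] * 5 + ["fail"] * 4 + ["pass"]

print(round(percent_agreement(rater_a, rater_b), 2))  # 0.8
print(round(cohens_kappa(rater_a, rater_b), 3))       # 0.583
```

The raters agree on 80% of cases, but 52% agreement would be expected by chance from their marginal totals, so kappa is noticeably lower than the raw percentage.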
Factors Affecting Reliability
• Length increases reliability
• Homogeneity increases reliability
• Shorter time interval between testings,
higher reliability
• Type of reliability estimate used
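The first factor above, test length, is commonly illustrated with the Spearman-Brown prophecy formula, which the slides do not name. The sketch below assumes that the added items are comparable to the existing ones. The starting reliability of 0.60 and the function name are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability 0.60, kept as is, doubled, and tripled.
for factor in (1, 2, 3):
    print(factor, round(spearman_brown(0.60, factor), 3))
```

Doubling the test is predicted to raise the coefficient from 0.60 to 0.75, and tripling it to about 0.82, so each added block of items yields a smaller gain than the one before.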
How reliable is reliable?
Reliability coefficients of at least
0.70 to 0.80 are usually considered acceptable