Chapter 5: Reliability

advertisement
Chapter 5: Reliability Concepts
–
–
–
–
Definition of Reliability
Test consistency
Classical Test Theory
X
=
T
+
Obtained
True
Score
Score
E
Random
Error
–
–
–
–
–
Estimation of Reliability
Correlation coefficients (r) are used to estimate reliability
The proportion of variance attributable to individual differences
Directly interpretable
Reliability of .90 accounts for what % of the variance?
•
–
–
–
–
–
–
Conceptual True Score Variance
Fine for true score variance
T
=
X
E
Convert formula to variances (2)
Partial out the ratio of obtained score variance and error variance
Substitute the ratio of obtained score variance for 1
Reliability is the ratio of error variance to obtained score variance subtracted from 1
–
–
–
–
–
–
Types of Reliability
Test-retest
Alternate Forms
Split-Half
Inter-item consistency
Interscorer
–
–
Test-Retest Reliability
Coefficient of stability is the correlation of the two sets of test scores
•
–
Alternate (Equivalent)
Forms Reliability
Coefficient of equivalence is the correlation of the two sets of test scores
–
Split-half Reliability
–
–
Coefficient of internal consistency is the correlation of the two equal halves of the test.
Reliability tends to decrease when test length decreases
–
–
Spearman-Brown Formula
A correction estimate The S-B formula is computed with the following ratio:
# new items
# original items
–
–
–
Spearman-Brown Example
Reducing the test length reduces reliability
What is the new estimated reliability for a 100-item test with a reliability of .90 that is
reduced to 50 items?
–
–
–
–
–
Inter-item Consistency
The degree to which test items correlate with each other
Two special formulas to look at all possible splits of a test
a) Kuder-Richardson 20
b) Coefficient Alpha
–
–
Inter-scorer reliability
Tests (or performance) are scored by two independent judges and the scores are correlated
- What are fluctuations attributed?
•
–
Possible Sources of Error Variance
Error differences associated with differences in test scores
– Time Sampling
– Item Sampling
– Inter-Scorer Differences
–
–
Time sampling
Conditions associated with administering a test across two different occasions
–
–
–
Item Sampling
Conditions associated with item content
Content heterogeneity v. homogeneity
–
–
Inter-scorer differences
Error associated with differences among raters
•
–
Factors affecting the reliability coefficient
Test length
– The greater the number of reliable test items, the higher the reliability coefficient
– Larger test length increases the probability of obtaining reliable items that accurately
measure the behavior domain
Heterogeneity of scores
Item Difficulty
Speeded Tests (Timed tests)
Based on speed of work, not consistency of the test
For example, consistency of speed, not performance
Test situation
– Conditions associated with test administration
Examinee-related
–
–
–
–
–
–
–
–
–
–
–
– Conditions associated with the test taker
Examiner-related
– conditions associated with scoring and interpretation
Stability of the construct
– dynamic v. stable variables
stable variables more reliable
Homogeneity of the items
– The more homogeneous the items, the higher the reliability
–
–
–
–
Interpreting Reliability
A test is never perfectly reliable
A method for interpreting individual test scores takes into account random error
We may never obtain a test-taker’s true score
•
•
•
•
Standard Error of Measurement (SEM)
Provides an index of test measurement error
SEM interpreted as standard deviations within a normally distributed curve.
SEM is used to estimate true score by constructing a range (confidence interval) within
which the examinee's true score is likely to fall given the obtained score
SEM Formula
St is the standard deviation
rtt is the reliability
–
•
•
–
–
–
–
–
SEM Example
For example, X = 86, St = 10, rtt = .84
What is the SEM?
What is the Confidence Interval (CI)?
Within 1 standard deviation, there is a 68% chance that the true score falls within the
confidence interval
– 2 SDs = 95%
– 3 SDs = 99%
–
–
–
–
–
–
–
–
Generalizability Theory
Extension of Classical test theory
Based on domain sampling theory (Tryson, 1957)
Classical Theory emphasizes test error
Generalizability Theory emphasizes test circumstances, conditions, and content
Test score is considered relatively stable
Estimates sources of error that contribute to test scores
Variability is the result of variables or error in the test situation
–
–
–
–
Importance of Reliability
Estimates accuracy/consistency of a test
Recognizes that error plays a role in testing
Understanding reliability helps a test administrator decide which test to use
– Strong reliability contributes to the validity of a test
Related documents
Download