Reliability
• Consistency
• Test Scores & Error
– X = T + E (observed score = true score + error)
• As the share of true-score variance rises and error variance falls, reliability increases
• Variance & Error Variance
– σ² = σ²_tr + σ²_e (total variance = true variance + error variance)
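A minimal simulation of this decomposition, with hypothetical numbers (true-score SD of 10, error SD of 5), shows the variances adding and reliability emerging as the ratio of true variance to observed variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: 1,000 examinees, true-score SD = 10, error SD = 5
T = rng.normal(50, 10, size=1_000)   # true scores
E = rng.normal(0, 5, size=1_000)     # random measurement error
X = T + E                            # observed scores: X = T + E

# Total variance is approximately var(T) + var(E) = 100 + 25
print(X.var())
# Reliability = proportion of observed variance that is true variance, ~.80
print(T.var() / X.var())
```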
Sources of Error
• Test Construction/Content
– Sampling; finite number of questions
– Poorly written questions
• Test Administration
– Error related to the test taker
– Error related to the test environment
– Error related to the examiner
Sources of Error (cont.)
• Test scoring & interpretation
– Objective v. subjective
– Scoring rubrics
Parallel Tests
• Theoretical underpinning of reliability
– Similar content
• Same true score & same error variance
– Theoretical, not produced in reality
– Not to be confused with “alternate forms”
• Reliability can be defined as the correlation between 2 parallel tests, denoted r_xx
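A sketch of the parallel-tests idea, assuming a shared true score and independent errors of equal variance (the numbers are hypothetical); the correlation between the two forms approximates the reliability:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(50, 10, size=5_000)   # shared true scores

# Two "parallel" tests: same true score, independent equal-variance error
X1 = T + rng.normal(0, 5, size=5_000)
X2 = T + rng.normal(0, 5, size=5_000)

# r_xx between the forms approximates reliability = 100 / 125 = .80
print(np.corrcoef(X1, X2)[0, 1])
```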
Types of Reliability
• Reliability over time
• Internal consistency/reliability
• Inter-rater reliability
Reliability over time
• Test-retest reliability
– Obtained by correlating pairs of scores from the
same sample on two different administrations
of the same test
• Error related to passage of time & intervening
factors
• Alternate-Form (Immediate & Delayed)
– Error related to time & content
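A minimal sketch of test-retest reliability with made-up scores for ten examinees; the Pearson r between the two administrations is the reliability estimate:

```python
import numpy as np

# Hypothetical scores from the same ten examinees at two administrations
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
time2 = np.array([13, 14, 10, 19, 18, 12, 13, 17, 11, 15])

# Test-retest reliability is the correlation between the two administrations
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 2))
```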
Internal Consistency
• Split-half
1. Divide the test into two equivalent halves
• Odd-even
• Randomly assign items
• Divide by equivalency of items
2. Calculate r between the 2 halves
3. Correct with Spearman-Brown
• Allows estimation of reliability of a test that has been shortened or lengthened
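A sketch of the split-half procedure using an odd-even split; the function name and score-matrix layout are assumptions, not from the slides. The Spearman-Brown step estimates what r would be for the full-length test from the half-test correlation:

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability with the Spearman-Brown correction.

    items: examinees x items matrix of item scores.
    """
    odd = items[:, 0::2].sum(axis=1)    # scores on items 1, 3, 5, ...
    even = items[:, 1::2].sum(axis=1)   # scores on items 2, 4, 6, ...
    r_half = np.corrcoef(odd, even)[0, 1]
    # Spearman-Brown: reliability of the full-length test from a half-test r
    return 2 * r_half / (1 + r_half)
```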
Internal Consistency (cont.)
• Inter-item consistency
– Index of homogeneity of test; degree to which
all items measure same construct
– Desirable: aids in interpretation of test (as
opposed to homogeneity of groups)
Internal Consistency (cont.)
• Kuder-Richardson formulas
– KR-20: statistic of choice for determining
reliability of tests with dichotomous items
(right-wrong)
– KR-21: can be used under the assumption that all items are of similar difficulty
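A sketch of KR-20 under these definitions (the function name and matrix layout are assumptions): the sum of item variances p·q is compared against the variance of total scores.

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR-20 for dichotomous (right-wrong, 0/1) items.

    items: examinees x items matrix of 0s and 1s.
    """
    k = items.shape[1]
    p = items.mean(axis=0)                      # proportion passing each item
    q = 1 - p
    var_total = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / var_total)
```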
Internal Consistency (cont.)
• Cronbach’s coefficient alpha
– Function of all items on test & the total test
score
– Each item conceptualized as a test (a 36-item test is treated as 36 parallel tests)
– In addition to dichotomous tests, can be used with tests containing nondichotomous items, e.g., opinion scales or tests that allow partial credit
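A sketch of coefficient alpha (same assumed matrix layout as above); with 0/1 items it reduces to KR-20:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha; items may be polytomous (e.g., partial credit).

    items: examinees x items matrix of item scores.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)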
Inter-rater reliability
• How well do 2 raters/judges agree?
– Correlation between scores from 2 raters
– Percentage of agreement; percentage of
intervals where both raters agreed behavior
occurred
– Kappa
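A sketch of Cohen's kappa, which corrects the raw percentage of agreement for the agreement expected by chance:

```python
import numpy as np

def cohens_kappa(rater1: np.ndarray, rater2: np.ndarray) -> float:
    """Kappa: agreement between two raters corrected for chance."""
    categories = np.union1d(rater1, rater2)
    p_o = np.mean(rater1 == rater2)                     # observed agreement
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c)
              for c in categories)                      # chance agreement
    return (p_o - p_e) / (1 - p_e)
```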
Factors influencing reliability
• Length of test
– Longer tests increase percentage of domain that
can be sampled
– Point of diminishing returns
• Homogeneity of items
– Measure same construct; easier to interpret
• Dynamic or static characteristics
Factors influencing reliability (cont.)
• Homogeneity of sample
– Restriction of range
– If the sample is homogeneous, then any observed variance must be error (see the sketch after this list)
• Power v. Speed tests
– For speed tests use test-retest, alternate forms, or split-half from 2 separately timed half-tests
– Internal consistency not applicable
• Speed-test items are easy, so internal-consistency estimates spuriously inflate reliability
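A sketch of restriction of range, assuming the same hypothetical parallel-forms setup as earlier (true-score SD 10, error SD 5); keeping only a homogeneous slice of the sample shrinks the observed reliability correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(50, 10, size=10_000)
X1 = T + rng.normal(0, 5, size=10_000)   # parallel form 1
X2 = T + rng.normal(0, 5, size=10_000)   # parallel form 2

# Full sample: r approximates the nominal reliability (~.80)
print(np.corrcoef(X1, X2)[0, 1])

# Homogeneous (restricted) sample: keep only the top 20% on form 1
top = X1 > np.percentile(X1, 80)
print(np.corrcoef(X1[top], X2[top])[0, 1])   # noticeably lower r
```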
Reliability of Individual Scores
• How much error is in an individual score?
– How much confidence do we have in a
particular score?
• Standard Error of Measurement
– Extent to which one individual’s scores vary
over tests that are presumed to be parallel
• Assume error is distributed “normally”
– Where is the individual’s “true” score?
Standard Error of Measurement
S_meas = SD √(1 − r_xx)

S_meas = 15 √(1 − .96) = 3
SEM (cont.)
• Odds are 68% that “true” score falls within
plus or minus 1 SEM.
• Odds are 95% that “true” score falls within
plus or minus 2 (1.96) SEM.
• Odds are 99.7% that “true” score falls within
plus or minus 3 SEM.
• What is the relationship between reliability & SEM?
Standard Error of the Difference of Two Scores
• Compare test takers performance on two
different tests
• Compare two test takers on the same test
• Compare two test takers on two different
tests
Standard Error of the Difference
σ_diff = √(σ²_meas1 + σ²_meas2)

σ_diff = SD √(2 − r_1 − r_2)
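A sketch with assumed values (two tests on the same scale, SD = 15, reliabilities r_1 = .90 and r_2 = .84) showing that the two forms of the formula agree:

```python
import math

# Assumed: two tests on the same scale, SD = 15, r1 = .90, r2 = .84
sd, r1, r2 = 15, 0.90, 0.84

# Form 1: combine the two standard errors of measurement
sem1 = sd * math.sqrt(1 - r1)
sem2 = sd * math.sqrt(1 - r2)
sigma_diff = math.sqrt(sem1**2 + sem2**2)
print(sigma_diff)                      # ~7.65

# Form 2: the shortcut from the slide gives the same value
print(sd * math.sqrt(2 - r1 - r2))     # ~7.65

# A difference must exceed ~1.96 * sigma_diff to be significant at .05
print(1.96 * sigma_diff)
```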
Standard Error of the Difference
• Set confidence intervals for difference
scores
• Difference scores contain error from both of
the comparison measures.
– Difference scores are less reliable than scores
from individual tests.
Test-retest reliability:
Social Interaction Self-Statement
– (+/− = positive/negative self-statement subscale; 1, 2 = first/second administration)
• r_+1,+2 = .99
• r_−1,−2 = .99
• r_+1,−1 = −.45
• r_+1,−2 = −.55
• r_+2,−1 = −.47
• r_+2,−2 = −.56