Reliability

Topics: Quality of Measurements
• Reliability
• Validity
The Quality of Measuring Instruments: Definitions
• Reliability: Consistency - the extent to which
the data are consistent
• Validity: Accuracy - the extent to which the
instrument measures what it purports to
measure
Hitting the Bull’s Eye
The Questions of Reliability
• To what degree does a subject’s measured performance
remain consistent across repeated testings? How
consistently will results be reproduced if we measure the
same individuals again?
• What is the equivalence of results of two measurement
occasions using “parallel” tests?
• To what extent do the individual items that go together to
make up a test or inventory consistently measure the same
underlying characteristic?
• How much consistency exists among the ratings provided
by a group of raters?
• When we have obtained a score, how precise is it?
True and Error Score
Parallel Tests
Individual   Test 1   Test 2   Test 3   …   Test n   True Score                          Error
1            X11      X12      X13      …   X1n      True Score1 = Mean Score of Tests   Error1 = Variance of Test Scores
2            X21      X22      X23      …   X2n      True Score2 = Mean Score of Tests   Error2 = Variance of Test Scores
3            X31      X32      X33      …   X3n      True Score3 = Mean Score of Tests   Error3 = Variance of Test Scores
4            X41      X42      X43      …   X4n      True Score4 = Mean Score of Tests   Error4 = Variance of Test Scores
.
.
N            XN1      XN2      XN3      …   XNn      True ScoreN = Mean Score of Tests   ErrorN = Variance of Test Scores
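The table above can be illustrated with a minimal sketch. The scores and the names `scores` and `true_and_error` are hypothetical, and the population variance is used here as one reasonable reading of "variance of test scores"; with only a few tests, the sample variance could be substituted.

```python
from statistics import mean, pvariance

# Hypothetical data: one entry per individual, one value per parallel test.
scores = {
    "person_1": [20, 19, 21, 20],
    "person_2": [15, 17, 16, 16],
}

def true_and_error(test_scores):
    """Estimate the true score as the mean of the repeated scores
    and the error as their variance."""
    return mean(test_scores), pvariance(test_scores)

estimates = {person: true_and_error(xs) for person, xs in scores.items()}

for person, (true_score, error) in estimates.items():
    print(f"{person}: true score ~ {true_score:.2f}, error variance ~ {error:.2f}")
```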
Sources of Error: Conditions of Test Administration and Construction
• Changes in time limits
• Changes in directions
• Different scoring procedures
• Interrupted testing session
• Qualities of test administrator
• Time test is taken
• Sampling of items
• Ambiguity in wording of items/questions
• Ambiguous directions
• Climate of test situation (heating, light, ventilation, etc.)
• Differences in observers
Sources of Error: Conditions of the Person Taking the Test
• Reaction to specific items
• Health
• Motivation
• Mood
• Fatigue
• Luck
• Memory and/or attention fluctuations
• Attitudes
• Test-taking skills (test-wiseness)
• Ability to understand instructions
• Anxiety
Reliability
• Reliability: ratio of true variance to
observed variance
• Reliability coefficient: a numerical index
which assumes a value between 0 and +1.00
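In symbols, the ratio can be written as below. The notation is an assumption, not from the slides: an observed score X is taken to be the sum of a true score T and an error E that is uncorrelated with T, so the observed variance is the sum of the true and error variances.

```latex
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
        = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
\qquad 0 \le r_{XX'} \le 1
```

When the error variance is zero the coefficient is 1.00; as error variance grows relative to true variance, the coefficient approaches 0.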
Relation between Reliability and Error
[Figure: true-score variability and error for a reliable measure (A) and an unreliable measure (B)]
Methods of Estimating Reliability
• Test-Retest: Repeated measures with the same test
(coefficient of stability)
• Parallel Forms: Repeated measures with
equivalent forms of a test (coefficient of
equivalence)
• Internal Consistency: Repeated measures using
items on a single test
• Inter-Rater: Judgments by more than one rater.
Reliability Is The Consistency Of A Measurement

Repeated Measurements/Observations
Person    X1   X2   X3   ...   Xk-->infinity
Charlie   20   19   21   ...   20
Harry     15   17   16   ...   16
Reliable

Repeated Measurements/Observations
Person    X1   X2   X3   ...   Xk-->infinity
Charlie   20   10   8    ...   23
Harry     2    11   4    ...   15
Unreliable
Test-Retest Reliability
• Situation: Same people taking two administrations
of the same test
• Procedure: Correlate scores on the two tests which
yields the coefficient of stability
• Meaning: the extent to which scores on a test can be
generalized over different occasions (temporal
stability).
• Appropriate use: Information about the stability of
the trait over time.
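The procedure above amounts to a Pearson correlation between the two administrations. A minimal sketch follows; the scores and the names `pearson_r`, `time1`, and `time2` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five people on two occasions.
time1 = [12, 15, 11, 18, 14]
time2 = [13, 14, 10, 19, 15]

# Coefficient of stability: correlation between the two administrations.
stability = pearson_r(time1, time2)
print(f"coefficient of stability = {stability:.2f}")
```

The same function applied to scores from two comparable forms of a test would give a coefficient of equivalence.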
Parallel (Alternate) Forms Reliability
• Situation: Testing of same people on different but
comparable forms of the test
• Procedure: correlate the scores from the two tests
which yields a coefficient of equivalence
• Meaning: the consistency of response to different item
samples (where testing is immediate) and across
occasions (where testing is delayed).
• Appropriate use: to provide information about the
equivalence of forms
Internal Consistency Reliability
• Situation: a single administration of one test form
• Procedure: Divide test into comparable halves and
correlate scores from both halves.
– Split Half with Spearman Brown adjustment
– Kuder Richardson #20 and #21
– Cronbach’s Alpha
• Meaning: consistency across the parts of a measuring
instrument (“parts” = individual items or subgroups of
items).
• Appropriate Use: Where the focus is on the degree to which
the same characteristic is being measured. A measure of test
homogeneity.
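Two of the procedures listed above can be sketched as follows. The response data and function names are hypothetical, and the odd/even item split is only one possible way to form "comparable halves". For items scored 0/1, Cronbach's alpha computed with population variances gives the same value as Kuder-Richardson #20.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown adjustment to estimate full-length reliability."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a persons-by-items score matrix."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 6 people (rows) by 4 items (columns), scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

split_half = split_half_reliability(responses)
alpha = cronbach_alpha(responses)
print(f"split-half (Spearman-Brown) = {split_half:.2f}, alpha = {alpha:.2f}")
```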
Inter-rater Reliability
• Situation: Having a sample of test papers (essays) scored
independently by two examiners
• Procedure: correlate the two sets of scores
– Kendall’s coefficient of concordance
– Cohen’s kappa
– Intraclass correlation
– Pearson product moment
• Meaning: measure of scorer (rater) reliability (consistency,
agreement) which yields the coefficient of concordance.
• Appropriate Use: For ensuring consistency between raters
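As one illustration of the indices listed above, here is a minimal sketch of Cohen's kappa for two raters assigning categorical grades. The ratings and the names `cohens_kappa`, `rater_a`, and `rater_b` are hypothetical; kappa corrects the observed agreement for the agreement expected by chance.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail grades given to ten essays by two examiners.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "fail", "pass", "pass"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```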
When is a reliability satisfactory?
• Depends on the type of instrument
• Depends on the purpose of the study
• Depends on who is affected by results
Factors Affecting Reliability
Estimates
• Test length
• Range of scores
• Item similarity
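The effect of test length can be illustrated with the general Spearman-Brown formula. This is a sketch under the assumption that added (or removed) items are parallel to the existing ones; the function name and the starting reliability of 0.60 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor,
    assuming the added or removed items are parallel to the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability 0.60.
doubled = spearman_brown(0.60, 2)    # twice as many items
halved = spearman_brown(0.60, 0.5)   # half as many items
print(f"doubled: {doubled:.2f}, halved: {halved:.2f}")
```

Under this assumption, lengthening a test raises the reliability estimate and shortening it lowers the estimate.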
Standard Error of Measurement
• All test scores contain some error
• For any test, the higher the reliability
estimate, the lower the error
• The standard error of measurement is the
standard deviation of the error scores, that is,
the typical distance between observed scores
and true scores
• Can be used to estimate a range within
which a true score would likely fall
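The usual formula relating the standard error of measurement to reliability is shown below. The symbols are assumptions, not from the slides: s_X is the standard deviation of observed scores and r_XX' is the reliability estimate.

```latex
SEM = s_X \sqrt{1 - r_{XX'}}
```

With perfect reliability (r = 1.00) the standard error is 0; with r = 0 it equals the standard deviation of the observed scores.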
Use of Standard Error of
Measurement
• We never know the true score
• By knowing the s.e.m. and by
understanding the normal curve, we can
assess the likelihood of the true score being
within certain limits.
• The higher the reliability, the lower the
standard error of measurement, and hence the more
confidence we can place in the accuracy of
a person’s test score.
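A minimal sketch of how such limits might be computed, assuming normally distributed errors and the formula SEM = SD x sqrt(1 - reliability). The function names and the example values (SD of 10, reliability of 0.91, observed score of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_of_measurement(sd, reliability):
    """SEM from the observed-score standard deviation and a reliability estimate."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, confidence=0.95):
    """Range around an observed score that would contain the true score with
    the given probability, assuming normally distributed errors."""
    sem = standard_error_of_measurement(sd, reliability)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD = 10, reliability = 0.91, observed score = 100.
sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = score_band(observed=100, sd=10, reliability=0.91)
print(f"SEM = {sem:.2f}, 95% band = ({low:.2f}, {high:.2f})")
```

The 95% band uses a critical value of about 1.96 standard errors, which is commonly rounded to 2.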
Normal Curve
Areas Under the Curve
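[Figure: normal curve centered on X = test score, with the horizontal axis marked from -3se to +3se]
Region                       Area on each side   Total within limits
X to 1se                     .3413               68% within ±1se
1se to 2se                   .1359               95% within ±2se
2se to 3se                   .0214               99% within ±3se
beyond 3se                   .0013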
Warnings about Reliability
• No such thing as “the” reliability; different
methods assess consistency from
different perspectives
• Reliability coefficients apply to the data,
NOT to the instrument
• Any reliability is only an estimate of
consistency