Reliability

MGTO 324 Recruitment and Selections
Reliability
Kin Fai Ellick Wong Ph.D.
Department of Management of Organizations
Hong Kong University of Science & Technology
Prologue
• A quick test of your understanding of reliability
– Which of the following is the most reliable thermometer?
Trial   Thermometer A   Thermometer B   Thermometer C
1       101             103             108
2       99              97              108
3       100             102             107
4       101             103             108
5       99              95              108
Tests that are relatively free of random measurement error are deemed reliable.
Reliability refers to the consistency, or repeatability, of measurement.
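One quick way to see which thermometer is most consistent is to compare the spread of its readings. Below is a minimal Python sketch (not part of the original slides) that computes the standard deviation of each thermometer's five readings from the table above; the thermometer with the smallest spread has the least random error and is therefore the most reliable, even if its readings are systematically off.

```python
from statistics import stdev

# Readings from the table above (five trials per thermometer)
readings = {
    "Thermometer A": [101, 99, 100, 101, 99],
    "Thermometer B": [103, 97, 102, 103, 95],
    "Thermometer C": [108, 108, 107, 108, 108],
}

# A smaller standard deviation means less random error, i.e. higher reliability,
# regardless of whether the readings are systematically too high or too low.
for name, values in readings.items():
    print(f"{name}: SD = {stdev(values):.2f}")
```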
Prologue
How reliable are your eyes?
How many digits are there?
“456”
“8734868412473”
“24368414863247821475235874123547532547”
Prologue
How reliable are your eyes?
How many digits are there?
“888”
“88888888888888”
“88888888888888888888888888888888888”
Prologue
• Some items are more reliable than others.
• Which “item” is more reliable?
– “456” vs. “888”
– “8734868412473” vs. “88888888888888”
– “24368414863247821475235874123547532547” vs. “88888888888888888888888888888888888”
Outline
Reliability
Part I: Measurement errors
Part II: Estimating reliability
Part I: Measurement errors
• Reliability
– Consistency of measurement of an attribute.
– When we want to assess how reliable your eyes are…
• “8734868412473” vs. “88888888888888”
• Measurement errors
– Error is a source of inaccuracy and variability
– Error can be reduced with a more precise measuring device,
but it cannot be completely eliminated
Part I: Measurement errors
• True score theory
– Observed score = True score + systematic errors + random errors
– Systematic errors
• A constant is added to every measure
• A constant CANNOT make a test/score “inconsistent”
• Hence, it has no effect on reliability
– Random errors
• A random value is added to every measure
• The random values are normally distributed, with mean = 0
• These random measurement errors lower consistency (i.e., reduce reliability)
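In symbols (a minimal sketch of notation, not taken verbatim from the slides), true score theory can be written as:

```latex
% True score theory: observed score = true score + systematic error + random error
X = T + C + E, \qquad E \sim N(0, \sigma_E^2)
% C is a constant (systematic error) added to every measurement, so it shifts all
% scores equally and does not affect consistency; E is the random error component.
```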
Part I: Measurement errors
• Reliability and measurement errors
– A test that truly reflects the “true score” is considered a reliable test
– For a perfectly reliable test
• The correlation between the true score and the observed score is 1
• There is no random error; the test measures something meaningful
– For a perfectly unreliable test
• The correlation between the true score and the observed score is 0
• The test is “full” of random error; it measures nothing meaningful
Part I: Measurement errors
• Interpretation of reliability
– Consistency
– Theoretical reliability =
• Variance of the true score / variance of the observed score (σ²T / σ²X)
– When the reliability coefficient of an exam = 0.4
• 40% of the variance in the obtained scores is due to the true score
• 60% of the variance in the obtained scores is due to random errors
– When the reliability coefficient = 1
• All observed-score (X) variance reflects true-score (T) variance (σ²X = σ²T)
– When the reliability coefficient = 0
• All observed-score variance reflects error variance (σ²X = σ²E)
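As a hypothetical worked example (the variance figures below are illustrative, not from the slides), suppose the true-score variance is 4 and the error variance is 6:

```latex
% Theoretical reliability = true-score variance / observed-score variance
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
           = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
           = \frac{4}{4 + 6} = 0.4
% i.e. 40% of the observed-score variance reflects true scores,
% and the remaining 60% reflects random error.
```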
Outline
Reliability
Part I: Measurement errors
Part II: Estimating reliability
Part II: Estimating reliability
• Three common methods
– Time sampling
• Same test given at two points in time
– Test-retest reliability
– Item sampling
• Different items used to assess the same attribute
– Parallel/alternate forms reliability
– Internal consistency
• Consistency of items within the same test
– Split half reliability
– Coefficient Alpha
Part II: Estimating reliability
Estimating
reliability
Test-Retest
Parallel/Alternate
Split Half
Coefficient Alpha
Part II: Estimating reliability
• Test-retest reliability
– Same test given at two points in time
– How to assess?
• Correlation between scores obtained from two occasions
Part II: Estimating reliability
Subject   Time 1   Time 2
S001      5        4
S002      6        6
S003      5        4
S004      4        4
S005      2        1
S006      1        1
S007      3        2
S008      1        1
S009      7        6
S010      2        1
• Test-retest reliability
– Same test given at two points in time
– How to assess?
• Correlation between scores obtained on the two occasions
Part II: Estimating reliability
• Test-retest reliability: the same Time 1 and Time 2 scores as above, with the SPSS correlation output.
Correlations (SPSS output)
                              Time1     Time2
Time1   Pearson Correlation   1         .970**
        Sig. (2-tailed)       .         .000
        N                     10        10
Time2   Pearson Correlation   .970**    1
        Sig. (2-tailed)       .000      .
        N                     10        10
**. Correlation is significant at the 0.01 level (2-tailed).
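The SPSS result above can be reproduced with a few lines of Python. This is a sketch (numpy is assumed to be available) using the Time 1 / Time 2 scores from the table; it should give a correlation of about .970.

```python
import numpy as np

# Time 1 and Time 2 scores for S001-S010 (from the table above)
time1 = [5, 6, 5, 4, 2, 1, 3, 1, 7, 2]
time2 = [4, 6, 4, 4, 1, 1, 2, 1, 6, 1]

# Test-retest reliability = Pearson correlation between the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r = {r:.3f}")  # approximately .970
```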
Part II: Estimating reliability
• Considerations before using the test-retest method
– Carryover effects
• The first testing session influences scores in the second session
– Test-retest time interval
• When the two sessions are too close together, carryover effects increase
• When they are too far apart, the influence of other factors increases
Part II: Estimating reliability
• When should I use test-retest reliability?
– For traits or characteristics that do not change over time
• Intelligence, personality
• Test-retest is not appropriate for
– Changing characteristics
• Knowledge, mood, emotion, motivation
Part II: Estimating reliability
Estimating
reliability
Test-Retest
Parallel/Alternate
Split Half
Coefficient Alpha
Part II: Estimating reliability
Subject   Form 1   Form 2
S001      5        4
S002      6        6
S003      5        4
S004      4        4
S005      2        1
S006      1        1
S007      3        2
S008      1        1
S009      7        6
S010      2        1
• Parallel/alternate forms reliability
– Different items used to assess the same attribute
– Two forms of the test are developed
• How to assess?
– Correlation between equivalent forms of the test that contain different items
Part II: Estimating reliability
• Use of parallel/alternate forms
– The two forms are constructed with an effort to make them parallel
• Equal (or very similar) observed means, SDs, and correlations with other measures
– The correlation between the two forms reflects
• How reliable they are
• How parallel they are
– The two forms may be administered in the same period or in different periods
Part II: Estimating reliability
• Practical constraints
– Developing two forms is time- and resource-consuming
– It may be difficult to retest the same group of individuals
– Test developers usually prefer to base their estimate of reliability on a single form of the test
Part II: Estimating reliability
Estimating
reliability
Test-Retest
Parallel/Alternate
Split Half
Coefficient Alpha
Part II: Estimating reliability
• Split-half reliability
– Dividing a test into two halves
• First half vs. second half (vulnerable to carryover effects)
• Odd-even split
• Random split
• How to assess?
– The correlation between the two halves
• Other things being equal, a test with more items is more reliable than one with fewer items
• Splitting a test in half therefore underestimates its reliability
– Spearman-Brown method of split-half reliability
• Adjusts for this underestimation
Part II: Estimating reliability
Subject   Mean of odd items (1,3,5,7,9)   Mean of even items (2,4,6,8,10)
S001      5                               4
S002      6                               6
S003      5                               4
S004      4                               4
S005      2                               1
S006      1                               1
S007      3                               2
S008      1                               1
S009      7                               6
S010      2                               1
Part II: Estimating reliability
• Split-half reliability
– Spearman-Brown method of split-half reliability
• Adjusting the problems of underestimation
– Example
• If the correlation between the scores of the two halves is 0.5, what is the split-half reliability?
• (2 × 0.5) / (1 + 0.5) = 1 / 1.5 ≈ 0.67
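The correction used above is the standard Spearman-Brown formula for a test split into two halves:

```latex
% Spearman-Brown corrected split-half reliability,
% where r_{hh} is the correlation between the two halves
r_{SB} = \frac{2\, r_{hh}}{1 + r_{hh}}
       = \frac{2 \times 0.5}{1 + 0.5} = \frac{1}{1.5} \approx 0.67
```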
Part II: Estimating reliability
• Split-half reliability
– Other applications of Spearman-Brown formula
– Estimate the reliability of a lengthened test
• A test with 20 items has a reliability coefficient of 0.4. If we increase
the length of the test to 40 items, what is the reliability of the
lengthened test?
– Estimate the reliability of a shortened test
• A test with 10 items has a reliability coefficient of 0.8. If we decrease
the length of the test to 2 items, what is the reliability of the shortened
test?
– Estimate how many items are needed to increase the
reliability to a specific value
• A test with 20 items has a reliability coefficient of 0.6. If we want to
increase the reliability to 0.9, how many items do we need?
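All three questions above can be answered with the general Spearman-Brown prophecy formula. The sketch below is my own illustration; the numeric answers are computed from that standard formula rather than taken from the slides, with n the factor by which the test length changes.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test's length is multiplied by factor n."""
    return (n * r) / (1 + (n - 1) * r)

def length_factor_needed(r, r_target):
    """Factor by which the test must be lengthened to reach r_target."""
    return (r_target * (1 - r)) / (r * (1 - r_target))

# 20 items (r = 0.4) lengthened to 40 items -> n = 2
print(spearman_brown(0.4, 40 / 20))   # about 0.57

# 10 items (r = 0.8) shortened to 2 items -> n = 0.2
print(spearman_brown(0.8, 2 / 10))    # about 0.44

# 20 items (r = 0.6), target reliability 0.9
n = length_factor_needed(0.6, 0.9)    # n = 6
print(n * 20)                         # 120 items
```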
Part II: Estimating reliability
• Split-half reliability
– Advantages
• One test is enough
• No carryover or practice effect
– For random and odd-even splitting methods
– Disadvantages
• Different splitting methods yield different estimates
• Solutions
– Coefficient alpha or Cronbach’s Alpha
Part II: Estimating reliability
Estimating
reliability
Test-Retest
Parallel/Alternate
Split Half
Coefficient Alpha
Part II: Estimating reliability
• Cronbach’s alpha
– Can be viewed as the average of all split-half reliability coefficients resulting from the different possible splittings of a test
– Used when test items are scored on a continuous scale
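As a minimal sketch (the item data below are hypothetical, and numpy is assumed), Cronbach’s alpha can be computed directly from a respondents-by-items matrix of scores:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix of continuous scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total test scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents, 4 items (for illustration only)
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
print(f"alpha = {cronbach_alpha(data):.2f}")
```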
Part II: Estimating reliability
• Cronbach’s alpha
– You are not required to memorize the formula
– Alpha ranges from 0 to 1
– How reliable is reliable?
• For basic research
– 0.7 to 0.8 is usually acceptable
– Refining tests up to 0.9 or above may waste resources
– High reliability might be expected for very focused tests
• For clinical settings
– High reliability is extremely important
– Decisions might affect a person’s future
– 0.9 or even 0.95
• For selection purposes
– 0.7 to 0.8 is reasonable
– Selection is not determined by the test score alone…
Part II: Estimating reliability
• Cronbach’s alpha
– The most popular method of assessing reliability with continuous variables
– I will show you how to assess the different types of reliability using SPSS in the next Workshop