Michael J. Walk
Modern Measurement Theories
Homework #2
Question #1
Estimated reliability coefficients with 95% confidence intervals:

Type            Estimate   95% CI Lower   95% CI Upper
Test-retest     .60        0.52           0.68
Parallel        .80        0.76           0.84
Cronbach's α    .95        0.94           0.96

Reliability coefficient as a function of test length:

Test Length   Test-retest   Parallel Forms   Cronbach's α
1/3           .33           .57              .86
1/2           .43           .67              .90
1             .60           .80              .95
2             .75           .89              .97
3             .82           .92              .98
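The length-adjusted values in the second table are consistent with the Spearman-Brown prophecy formula, rho_k = k*rho / (1 + (k - 1)*rho), applied to the three estimates in the first table. A minimal sketch that reproduces them (the confidence-interval bounds are taken as given):

```python
# Sketch: reproduce the length-adjusted reliability estimates with the
# Spearman-Brown prophecy formula, rho_k = k*rho / (1 + (k - 1)*rho).

def spearman_brown(rho, k):
    """Projected reliability when test length is changed by a factor of k."""
    return k * rho / (1 + (k - 1) * rho)

estimates = {"Test-retest": 0.60, "Parallel forms": 0.80, "Cronbach's alpha": 0.95}
lengths = [1/3, 1/2, 1, 2, 3]

for label, rho in estimates.items():
    print(label, [round(spearman_brown(rho, k), 2) for k in lengths])
# Test-retest      [0.33, 0.43, 0.6, 0.75, 0.82]
# Parallel forms   [0.57, 0.67, 0.8, 0.89, 0.92]
# Cronbach's alpha [0.86, 0.9, 0.95, 0.97, 0.98]
```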
Question #2
(a)
Tests that are parallel measure the same construct in the same way. For example, if one
developed a test to measure Algebra 1 ability, one could develop a similar test designed to
measure the same ability. However, for these two tests to be parallel, they must be statistically
equivalent. First, true scores on both tests must be equal. Second, the error variance on both tests
must be equal (i.e., measurement error is the same across forms). Third, there can be no
correlation between the test errors. I think that Raykov (1997, as cited in Graham, 2006) put it
succinctly: “All items must measure the same latent variable, on the same scale, with the same
degree of precision, and with the same amount of error” (p. 934). Practically speaking, if the two
tests are parallel, an individual’s score on both tests should be the same (on average) after
repeated administrations of the tests (assuming no time-based effects, e.g., fatigue, carry-over,
etc.). If the tests are parallel, they are statistically and functionally equivalent.
(b)
To create parallel tests, the first step would be to choose items for both forms that measure
the same construct. In addition, all items should be scored in the same way, and, to keep things
simple, both forms should contain the same number of items.
In practical terms, a method for creating parallel tests is to create a pair of almost
identical items and put one on each form. For example, I may want to test a student’s ability to
solve radical equations. So, I create one item (open-response) that asks the examinee to solve the
following equation for x: √(x – 3) = 7. Then, I create a similar item (with the same conceptual
difficulty), √(x – 4) = 8, by changing some numbers. One could repeat this procedure for all
items on both tests, and one would be off to a good start creating parallel forms.
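A toy sketch of this item-pairing idea (entirely hypothetical, not part of the assignment): generate matched radical-equation items of the form sqrt(x - a) = b for the two forms by varying only the constants, so each pair keeps the same conceptual difficulty.

```python
import random

def make_item_pair(rng):
    """Create two near-identical radical-equation items, one for each form.

    Each item reads "solve sqrt(x - a) = b"; its key is x = a + b**2.
    Only the constants a and b differ between the two forms.
    """
    pair = []
    for _ in range(2):                      # one item for Form A, one for Form B
        a = rng.randint(1, 9)               # constant under the radical
        b = rng.randint(2, 9)               # right-hand side
        pair.append({"prompt": f"Solve for x: sqrt(x - {a}) = {b}",
                     "answer": a + b ** 2})
    return tuple(pair)

rng = random.Random(2024)                   # fixed seed for reproducibility
form_a_item, form_b_item = make_item_pair(rng)
print(form_a_item["prompt"], "->", form_a_item["answer"])
print(form_b_item["prompt"], "->", form_b_item["answer"])
```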
(c)
Apart from the appearance of being parallel (i.e., the tests both seem to measure the
same thing), one would have to conduct a study in which each examinee completes both forms of
the test (with counterbalancing for order effects). Then, one could use confirmatory factor
analysis to evaluate how well a parallel measurement model fits the data. That is, if
examinees' observed scores load equally onto a single latent variable, the scores have equal error
variances, and the errors are uncorrelated, then the forms are parallel. One would then
compare the fit of the parallel measurement model to the fit of a tau-equivalent or an
essentially parallel model to see whether these alternative (less restrictive) models provide better fit to the data.
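The homework itself fits these models in LISREL (see Question 5). Purely as an illustration of what the parallel constraints imply, the sketch below simulates two parallel forms (hypothetical data, not the homework data) and shows the pattern a parallel model expects in the observed covariance matrix: equal variances and equal means, with the common covariance equal to the true-score variance.

```python
import numpy as np

# Simulated parallel forms: both share the same true score T and have
# independent, equal-variance errors, so their observed variances are equal
# and their correlation estimates the reliability var_T / (var_T + var_E).

rng = np.random.default_rng(0)
n, var_t, var_e = 100_000, 9.0, 3.0

t = rng.normal(0.0, np.sqrt(var_t), n)          # common true scores
x1 = t + rng.normal(0.0, np.sqrt(var_e), n)     # Form A = T + E1
x2 = t + rng.normal(0.0, np.sqrt(var_e), n)     # Form B = T + E2

print(np.cov(x1, x2))             # variances ~12, covariance ~9
print(np.corrcoef(x1, x2)[0, 1])  # ~ 9/12 = .75, the forms' reliability
print(x1.mean(), x2.mean())       # equal means (equal true scores on average)
```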
Question #3
(a)
Cronbach’s α is a measure of internal consistency (i.e., the degree to which responses
across items represent true score variance). For at least essentially tau-equivalent measures, it
provides the average of all possible split-half correlations. For congeneric measures, Cronbach’s
α actually underestimates score reliability. It is important to remember that this reliability is a
function of the test scores (which are a function of participant characteristics, test characteristics,
and administration characteristics; Traub & Rowley, 1991) and not a characteristic of the test itself. So,
reliability coefficients do not give information about the quality of a test but about the quality of the
scores it produces for a given sample and administration mode. In addition, Cronbach's α does
not provide information about the dimensionality of the test scores; that is, scores can fail to be
unidimensional and still yield a high value of Cronbach's α.
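For reference, coefficient α is computed from the item and total-score variances, α = [k / (k - 1)] * (1 - sum of item variances / variance of the total score). A minimal sketch on made-up item scores (not data from this assignment):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Made-up example: 5 examinees x 4 items
scores = [[2, 3, 3, 2],
          [4, 4, 5, 4],
          [1, 2, 2, 1],
          [3, 3, 4, 3],
          [5, 4, 5, 5]]
print(round(cronbach_alpha(scores), 2))   # ~0.97 for this toy data
```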
(b)
Theoretically, the true reliability of a test is equal to the ratio of true score variance to
total observed variance. However, estimation of α can be affected by a number of factors. For
example, a single poorly written item that violates the tau-equivalent model can impact the
estimated Cronbach’s α for a set of scores (Graham, 2006). Therefore, even though the test
scores may be theoretically reliable (i.e., all items accurately assess a person’s standing on a
given construct), the numerical value of the reliability coefficient (i.e., the value of Cronbach’s
α) may underestimate the scores’ actual reliability.
Since Cronbach’s α is calculated based on sample data, sample characteristics also affect
its estimated value. For a given sample, Cronbach's α may be underestimated because of a lack of
heterogeneity in examinee true scores: with less true score variance to account for, the ratio of
true score variance to total observed variance is lower than it would be in a more heterogeneous
sample. If the population of interest contains a large amount of true score variance, a
homogeneous sample will therefore yield an underestimate of the population value of Cronbach's α.
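A quick simulation sketch of this range-restriction effect (made-up data; the true-score and error standard deviations below are arbitrary): the same ten items yield a noticeably lower α when examinees' true scores are nearly homogeneous.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons x n_items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
n, k, error_sd = 5_000, 10, 1.0

def simulate(true_sd):
    """k items, each = common true score + independent item error."""
    t = rng.normal(0.0, true_sd, size=(n, 1))
    return t + rng.normal(0.0, error_sd, size=(n, k))

print(cronbach_alpha(simulate(true_sd=2.0)))  # heterogeneous sample: alpha is high
print(cronbach_alpha(simulate(true_sd=0.5)))  # homogeneous sample: alpha is lower
```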
Question #4
[Figure: Predicted True Scores Based on Observed Scores. Predicted true score, with lower and upper estimates, plotted against observed score (x = 2 to 29; y = 0.00 to 30.00).]

[Figure: Predicted Parallel Test Scores. Predicted score, with lower and upper estimates, plotted against observed test score (x = 2 to 29; y = 0.00 to 30.00).]

[Figure: Conditional vs. Unconditional SEM. CSEM plotted against the sum score (SUM, 0 to 30); y-axis roughly 1.00 to 3.00.]
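Only the titles and axis labels of the three figures survive extraction. For orientation, the quantities they name are the standard classical test theory ones, so the sketch below is a hypothetical reconstruction of how such curves are typically computed; the reliability, mean, SD, and interval width used here are placeholders, not values recovered from the homework data, and the actual figures may have used different bands.

```python
import numpy as np

# Placeholder inputs (not the homework values): reliability, observed mean/SD,
# number of items, and the observed sum scores on the x-axis.
rho, mean_x, sd_x, n_items = 0.80, 18.0, 5.0, 30
x = np.arange(2, 30)

true_hat = mean_x + rho * (x - mean_x)          # Kelley regressed true-score estimate
se_est = sd_x * np.sqrt(rho * (1 - rho))        # SE of estimating the true score
se_pred = sd_x * np.sqrt(1 - rho ** 2)          # SE of predicting a parallel-form score

lower_true = true_hat - 1.96 * se_est           # one common choice of band
upper_true = true_hat + 1.96 * se_est

unconditional_sem = sd_x * np.sqrt(1 - rho)     # one value for all examinees
conditional_sem = np.sqrt(x * (n_items - x) / (n_items - 1))  # binomial-error CSEM

print(true_hat[:3], lower_true[:3], upper_true[:3])
print(unconditional_sem, conditional_sem[:3])
```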
Question #5
In order to test which measurement model fits the data, I conducted confirmatory factor
analysis using LISREL 8. Since the data are binary, I analyzed a matrix of tetrachoric
correlations using the weighted least squares estimation method. The congeneric model is the
least restrictive and would provide the best possible fit to the data, so I tested the congeneric
model first: factor loadings and error variances were freely estimated, with the loading of the first item
fixed to 1. I then tested the fit of the tau-equivalent model by constraining all factor loadings to 1.
Fit indices for both models are presented in the table below.
The congeneric model fits the data significantly better than the tau-equivalent model,
Δχ²(29) = 25486.63, p < .001. In addition, the GFI (goodness-of-fit index) and the AIC (Akaike
information criterion) indicate that the congeneric model provides better data fit than the tau-equivalent
model. Because the parallel model is even more restrictive than the tau-equivalent
model, its fit would only be worse; therefore, that model was not tested.
In addition, the congeneric model was a significantly better fit than the null model,
suggesting that there is a latent variable influencing the correlations between the observed items.
Consequently, a congeneric model best describes these data. (Incidentally, I tried to fit the essentially congeneric and
essentially tau-equivalent models to the data, and they failed to converge after several attempts.)
Model            χ²           df    GFI    PGFI   AIC
Null             200089.03    435
Congeneric       4162.59      405   .98    .85    4282.59
Tau-Equivalent   29649.22     434   .86    .80    29711.22
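As a check, the nested-model comparison reported above can be reproduced directly from the table values; a minimal sketch using scipy:

```python
from scipy.stats import chi2

# Chi-square difference test for the nested congeneric vs. tau-equivalent
# models, using the chi-square and df values reported in the table above.
chi2_con, df_con = 4162.59, 405
chi2_tau, df_tau = 29649.22, 434

delta_chi2 = chi2_tau - chi2_con         # 25486.63
delta_df = df_tau - df_con               # 29
p_value = chi2.sf(delta_chi2, delta_df)  # effectively 0, i.e., p < .001

print(delta_chi2, delta_df, p_value)
```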