Walk Homework #2

Michael J. Walk
Modern Measurement Theories
Homework #2

Question #1

Reliability Coefficient by Test Length

                    Test Length
Type                1/3    1/2    1      2      3
Test-retest         .33    .43    .60    .75    .82
Parallel Forms      .57    .67    .80    .89    .92
Cronbach’s α        .86    .90    .95    .97    .98

Type                Estimate    95% CI Lower    95% CI Upper
Test-retest         .60         0.52            0.68
Parallel            .80         0.76            0.84
Cronbach’s α        .95         0.94            0.96

Question #2

(a) Tests that are parallel measure the same construct in the same way. For example, if one developed a test to measure Algebra 1 ability, one could develop a similar test designed to measure the same ability. However, for these two tests to be parallel, they must be statistically equivalent. First, true scores on both tests must be equal. Second, the error variance on both tests must be equal (i.e., measurement error is the same across forms). Third, the errors on the two tests must be uncorrelated. I think that Raykov (1997, as cited in Graham, 2006) put it succinctly: “All items must measure the same latent variable, on the same scale, with the same degree of precision, and with the same amount of error” (p. 934). Practically speaking, if the two tests are parallel, an individual’s score on both tests should be the same (on average) after repeated administrations of the tests (assuming no time-based effects, e.g., fatigue or carry-over). If the tests are parallel, they are statistically and functionally equivalent.

(b) To create parallel tests, the first step would be to choose items for both forms that measure the same construct. In addition, all items should be scored the same way, and to make things easier, there should be the same number of items on both forms of the test. In practical terms, one method for creating parallel tests is to write pairs of almost identical items and put one item from each pair on each form. For example, I may want to test a student’s ability to solve radical equations.
So, I create one open-response item that asks the examinee to solve the following equation for x: √(x – 3) = 7. Then, by changing some numbers, I create a similar item with the same conceptual difficulty: √(x – 4) = 8. One could repeat this procedure for all items on both tests, and one would be off to a good start in creating parallel forms.

(c) Apart from the appearance of being parallel (i.e., the tests both seem to measure the same thing), one would have to conduct a study in which each examinee completes both forms of the test (with counterbalancing to control for order effects). Then, one could use confirmatory factor analysis to assess how well a parallel measurement model fits the data. That is, if examinees’ observed scores load equally onto a single latent variable, the scores have equal error variances, and the errors are uncorrelated, then the forms are parallel. One would have to compare the fit of the parallel measurement model to the fit of a tau-equivalent or possibly an essentially parallel model to see if alternative (less restrictive) models provide better data fit.

Question #3

(a) Cronbach’s α is a measure of internal consistency (i.e., the degree to which responses across items represent true score variance). For measures that are at least essentially tau-equivalent, it equals the average of all possible split-half reliability estimates. For congeneric measures, Cronbach’s α underestimates score reliability. It is important to remember that reliability is a property of the test scores (which are a function of participant characteristics, test characteristics, and administration characteristics; Traub & Rowley, 1991) and not a characteristic of the test itself. So, reliability coefficients do not give information about the quality of a test but about the quality of the scores it produces for a given sample and administration mode.
In addition, Cronbach’s α does not provide information about the dimensionality of the test scores. That is, scores can be multidimensional and still yield a high estimate of Cronbach’s α.

(b) Theoretically, the true reliability of a test is equal to the ratio of true score variance to total observed variance. However, the estimate of α can be affected by a number of factors. For example, a single poorly written item that violates the tau-equivalent model can affect the estimated Cronbach’s α for a set of scores (Graham, 2006). Therefore, even though the test scores may be theoretically reliable (i.e., all items accurately assess a person’s standing on a given construct), the numerical value of the reliability coefficient (i.e., the value of Cronbach’s α) may underestimate the scores’ actual reliability.

Since Cronbach’s α is calculated from sample data, sample characteristics also affect its estimated value. For a given sample, the value of Cronbach’s α may be underestimated because of a lack of heterogeneity in examinee true scores. When the sample contains less true score variance while the error variance stays the same, the ratio of true score variance to total observed variance is lower than it would be in a sample with a large amount of true score variance. Assuming that the population of interest contains a large amount of true score variance, a homogeneous sample will underestimate Cronbach’s α in the population.
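The effect of a homogeneous sample on Cronbach’s α can be illustrated with a small simulation. The following is a minimal sketch and not part of the original analysis: the function names (`cronbach_alpha`, `simulate`, `compare_alphas`), the 10 items, the error SD of 1, and the cutoff of 0.5 are all hypothetical choices, and selecting examinees on their true score is possible only because the data are simulated.

```python
import random
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinees, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [statistics.variance([row[i] for row in scores]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def simulate(n_examinees=2000, n_items=10, seed=1):
    """Hypothetical data: each item = true score + independent error (both SD 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0, 1)  # examinee true score
        items = [theta + rng.gauss(0, 1) for _ in range(n_items)]
        data.append((theta, items))
    return data


def compare_alphas(data, cutoff=0.5):
    """Alpha in the full sample vs. a sample restricted to |true score| < cutoff."""
    full = [items for _, items in data]
    restricted = [items for theta, items in data if abs(theta) < cutoff]
    return cronbach_alpha(full), cronbach_alpha(restricted)


if __name__ == "__main__":
    alpha_full, alpha_restricted = compare_alphas(simulate())
    print(f"alpha, full sample:        {alpha_full:.2f}")
    print(f"alpha, homogeneous sample: {alpha_restricted:.2f}")
```

Under these assumptions the items are parallel with a single-item reliability of .50, so the population α is 10(.50)/(1 + 9(.50)) ≈ .91. The restricted sample keeps the same error variance but far less true score variance, so its α should come out well below that value.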
Question #4

[Figure: Predicted True Scores Based on Observed Scores. Plotted series: True Score, Lower Estimate, Upper Estimate. x-axis: Observed Score; y-axis: Predicted True Score.]

[Figure: Predicted Parallel Test Scores. Plotted series: Predicted Score, Upper Estimate, Lower Estimate. x-axis: Observed Test Score; y-axis: Predicted Score.]

[Figure: Conditional vs. Unconditional SEM. x-axis: SUM; y-axis: CSEM.]

Question #5

To test which measurement model fits the data, I conducted confirmatory factor analysis using LISREL 8. Since the data are binary, I analyzed a matrix of tetrachoric correlations using the weighted least squares estimation method. The congeneric model is the least restrictive of the models considered and would provide the best possible fit to the data, so I tested it first: factor loadings and error variances were freely estimated, except that the loading of the first item was fixed to 1. I then tested the fit of the tau-equivalent model by constraining all factor loadings to 1. Fit indices for both models are presented in the table below.

The congeneric model fits the data significantly better than the tau-equivalent model, Δχ²(29) = 25486.63, p < .001. In addition, the GFI (goodness-of-fit index) and the AIC (Akaike information criterion) indicate that the congeneric model provides better data fit than the tau-equivalent model. Because the parallel model is even more restrictive than the tau-equivalent model, its fit could be no better; therefore, that model was not tested. In addition, the congeneric model was a significantly better fit than the null model, suggesting that there is a latent variable influencing the correlations between the observed items. Consequently, the data are best described by the congeneric model.
(Incidentally, I tried to fit the essentially congeneric and essentially tau-equivalent models to the data, and they failed to converge after several attempts.)

Model             χ²           df     GFI    PGFI    AIC
Null              200089.03    435
Congeneric        4162.59      405    .98    .85     4282.59
Tau-Equivalent    29649.22     434    .86    .80     29711.22
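The model comparison reported for Question #5 can be re-derived from the χ² and df values in the table. The sketch below is an illustration rather than the LISREL output itself: the helper names are hypothetical, the item count of 30 is an assumption inferred from the null-model df (435 = 30 × 29 / 2), and the AIC form χ² + 2q (with q free parameters) is an assumption that happens to reproduce the tabled AIC values.

```python
# Chi-square and df values as reported in the fit table.
FIT = {
    "null":           {"chi2": 200089.03, "df": 435},
    "congeneric":     {"chi2": 4162.59,   "df": 405},
    "tau_equivalent": {"chi2": 29649.22,  "df": 434},
}

N_ITEMS = 30  # assumption: inferred from the null-model df, not stated in the text


def chi2_difference(restricted, general):
    """Return (delta chi2, delta df) for a restricted model nested in a general one."""
    return restricted["chi2"] - general["chi2"], restricted["df"] - general["df"]


def implied_free_parameters(n_items, df):
    """Free parameters q implied by df = p(p + 1)/2 - q for p observed items."""
    return n_items * (n_items + 1) // 2 - df


def aic(chi2, n_free):
    """AIC in the chi2 + 2q form (assumed; see the lead-in)."""
    return chi2 + 2 * n_free


if __name__ == "__main__":
    d_chi2, d_df = chi2_difference(FIT["tau_equivalent"], FIT["congeneric"])
    print(f"delta chi2({d_df}) = {d_chi2:.2f}")
    for name in ("congeneric", "tau_equivalent"):
        q = implied_free_parameters(N_ITEMS, FIT[name]["df"])
        print(f"{name}: q = {q}, AIC = {aic(FIT[name]['chi2'], q):.2f}")
```

Under the 30-item assumption, the congeneric model has 60 free parameters (29 loadings, 30 error variances, and 1 factor variance) and the tau-equivalent model has 31, which accounts for the 29-df difference between the two models.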