SEM Power Analysis (N. K. Bowen)

v.2 N.K. Bowen, 2014
Estimating Power for SEM Analyses
Brief Background on Power Analysis
The power (1 − β) of a statistical test is the probability of correctly rejecting a
false null hypothesis. The calculation of power for common statistical procedures is
based on a pre-established significance level (alpha, α), effect size (ES), and sample size (N).
Factors that increase power:
Bigger effect size → more power
Smaller SD → more power
Bigger alpha → more power
Bigger sample → more power
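These relationships can be demonstrated numerically. Below is a minimal SAS sketch (not from MacCallum et al.; all values are hypothetical) that computes power for a two-sided one-sample z test with a known SD, with the effect size expressed in SD units. Power rises with bigger ES, bigger alpha, and bigger N.

* Sketch: power of a two-sided one-sample z test with known SD,
  illustrating that bigger ES, bigger alpha, and bigger N all
  increase power. All values below are hypothetical.;
data power_demo;
  do es=.2, .5;                   * effect size in SD units;
    do alpha=.01, .05;            * significance level;
      do n=25, 100;               * sample size;
        zcrit=probit(1-alpha/2);  * two-tailed critical z;
        power=(1-probnorm(zcrit-es*sqrt(n)))
              + probnorm(-zcrit-es*sqrt(n));
        output;
      end;
    end;
  end;
run;
proc print data=power_demo;
  var es alpha n power;
run;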
By convention we want power to be .80 or higher.
Power (1 − β) expresses the probability of detecting a true alternative hypothesis (or
rejecting a false null hypothesis).
β = .20 is then the probability of failing to detect a true alternative
hypothesis (making a Type II error; saying we do not have a finding when we
really do have one).
Alpha is typically set at .05.
Alpha expresses the probability of rejecting a true null hypothesis (making a Type
I error; saying we have a finding when we really don’t). We make it small (5%).
We are more willing to inappropriately reject a true alternative hypothesis (say we
have no finding when we really do) than to inappropriately reject a true null
hypothesis (say we have a finding when we don’t).
Effect size
Different statistics can be transformed into comparable ESs. Correlations,
regression coefficients, and the difference between two means can be expressed
as effect sizes. The size of an effect has a big influence on power. Think of a
logical extreme. If an intervention cures a mental health disorder 100% of the
time compared to 0% getting better without intervention, it won’t take too many
cases before you are convinced the intervention is effective. If an intervention
cures 60% of cases when 50% would get better even without the intervention, it
would take a lot of cases before you decide the intervention is better than doing
nothing. The 100% fix is a big effect; the 10% improvement is a small effect. A
smaller number of cases is needed to detect a bigger ES.
Effect size is partially determined by the spread of scores in a distribution. A
common effect size in intervention research is calculated from the difference in
group means on an outcome. The difference in two group means is divided by the
SD of one of them, or their average or pooled SD. Using this formula, you can see
that an effect size is bigger and easier to detect if the SD is small.
ES = (X̄1 − X̄2) / SD

For example, with a mean difference of 10 − 5 = 5:
ES = 5/1 = 5 if SD = 1
ES = 5/10 = .5 if SD = 10
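The arithmetic can also be checked in SAS (a minimal illustration using the hypothetical means above):

* Effect size for a mean difference of 10 - 5 = 5 at two SDs;
data es_demo;
  xbar1=10; xbar2=5;
  do sd=1, 10;
    es=(xbar1-xbar2)/sd;   * ES = 5 when SD = 1 and ES = .5 when SD = 10;
    output;
  end;
run;
proc print data=es_demo;
  var sd es;
run;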
[Figure: pairs of overlapping distributions. With a big SD there is much more overlap between the two distributions, so it will be harder to show a difference.]
We use distributions and known probabilities of scores given distributions to
evaluate our statistics. We know, for example, that with a normally distributed
variable with mean of 0 and SD of 1, a score with an absolute value over 1.96 is
highly unlikely—only a 5% probability given the distribution. Therefore if we get a
score in that range or beyond, we figure it could very well come from a different
population (which would be represented with a different curve with a different
mean).
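This 5% figure can be confirmed in SAS (a quick check, not part of the MacCallum et al. code):

* P(|Z| > 1.96) for a standard normal variable is about .05;
data zcut;
  upper_tail=1-probnorm(1.96);   * P(Z > 1.96) = .0250;
  two_tailed=2*upper_tail;       * P(|Z| > 1.96) = .0500;
  critical=probit(.975);         * recovers 1.96;
run;
proc print data=zcut;
run;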
The chi-square distribution is used for evaluating overall fit in SEM as well as for
comparing models.
One Approach to Estimating Power in SEM
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between
nested covariance structure models: Power analysis and null hypotheses. Psychological
Methods, 11(1), 19-35. doi:10.1037/1082-989X.11.1.19
Article Highlights
MacCallum, Browne, & Sugawara (1996) suggest that the RMSEA (ε) can be treated
like an effect size for testing the power of SEM models. The confidence interval
of the RMSEA also plays a critical role in their evaluation of power. In addition to
sample size (N), effect size (ES), and alpha (α), the degrees of freedom (df) of a
model contribute to the calculation of power in MacCallum et al.’s approach.
After describing the statistical theory behind their approach (see notes below),
MacCallum et al. (1996) make the point that the null hypothesis in SEM (that the
input and implied matrices are the same, Σ = Σ*) can be restated in terms of the
value of RMSEA. The RMSEA is a function of the minimization function value (F)
and a model’s df: ε = √(F0/df). The null hypothesis analogous to the usual one in SEM
using RMSEA is H0: ε = 0, because ε is a measure of the discrepancy between Σ and
Σ*. The authors recommend using RMSEA (ε) in tests of power, but not in the
simple null form given above.
Specifically, they recommend examination of a null hypothesis of “not-close fit,”
instead of the usual χ² test of the null hypothesis of exact fit. Here is their
thinking:
1. Exact fit is not a plausible finding with an over-identified model.
2. Tests of close fit would be better than tests of exact fit, because close
fit between the input and implied matrices is plausible with well-specified
models.
--To determine power to test a hypothesis of close fit, the authors
suggest setting the null hypothesis as ε0 ≤ .05 and the alternative
hypothesis as εa = .08.
3. Tests of “not-close fit” are even better than tests of close fit, however.
Tests of “not-close fit” make the null hypothesis more similar to the null
in most statistical tests—i.e., the null is now the undesirable option; we
hope to reject the null hypothesis.
--To determine power to test a hypothesis of not-close fit, the
authors suggest setting the null hypothesis as ε0 ≥ .05 and the
alternative hypothesis as εa = .01.
Note that the value of .05 as the transition point between good fit and bad fit is
recommended by MacCallum et al., but a different value could be used. Some
researchers suggest .06 as the upper bound for good fit (see West et al., 2012).
Different values can be used in the computer program provided by MacCallum et
al., but the tables they provide are based on .05. Given that any cutoff is arbitrary,
as admitted by the authors, using other cutoffs is okay. It is important to
remember that the tests here are not tests of fit, but of the power to accept or
reject hypotheses (therefore the specific cutoff is less important).
Practical Use of the Tables in the 1996 MacCallum Article
Table 1: Using RMSEA Confidence Intervals to Test Hypotheses of Fit
Table 1 on page 137 of MacCallum et al. (1996) can be used as a guide to interpret
RMSEA values if it is determined that a model has adequate power to test the
specified hypothesis. With just the RMSEA reported for a model and its CI, we
can decide whether we can accept or reject hypotheses of exact fit, close fit, or
not-close fit. To use Table 1 in conjunction with RMSEA information reported in a
study (or obtained in your own study), do the following:
1. Ascertain the author’s definition of close fit (.05, .06, .08 are commonly
chosen definitions; we recommend either .05 or .06).
2. Substitute the author’s definition of close fit into the statements in the
first column of the first table (if it is not .05).
3. Examine the RMSEA’s CI and determine which of the three statements in
the first column accurately describes the values in relation to the
author’s definition of close fit.
4. Look across the row of the applicable statement to see its implications for
the rejection of the hypotheses of close and not-close fit. (For example, if
the entire CI lies below .05, the hypothesis of not-close fit can be rejected;
if the entire CI lies above .05, the hypothesis of close fit can be rejected.)
Note that for some studies you may not be able to reject either hypothesis.
Table 2: Determining the Power of SEM Models based on DF and N
Table 2 on page 142 presents selected results that can be obtained with the first
SAS program provided in the Appendix of MacCallum et al. (1996). Code and
information on using the program are included at the end of this handout. The table
can be used to quickly estimate power for proposals and studies (and provide an
authoritative citation), and to evaluate whether completed studies actually had
power to test their hypotheses. As noted by MacCallum et al., many published
studies did not have power to support the researchers’ conclusions.
Note that the table’s values are based on an α of .05. The test of close fit is based
on a null hypothesis of ε0 ≤ .05 and an alternative hypothesis of εa = .08. With this
test you hope to retain the null hypothesis of close fit by obtaining a value of ε
that is below the critical value. A value above the critical value supports the
not-close fit alternative hypothesis and fails to support the hypothesis of close fit.
Power is based on the area under the non-central curve for the alternative
hypothesis (as shown in Figure 3, page 140), which is the probability that χ² ≥ the
critical value for the test. This feels backwards because it is!
As pointed out by MacCallum et al., it makes more sense to test (and hopefully
reject) the null hypothesis of not-close fit. Table 2’s values for hypotheses of
not-close fit are based on a null hypothesis that ε0 ≥ .05 and an alternative hypothesis
that εa = .01. In this case, as in conventional statistical tests, the researcher hopes
to reject the null hypothesis of not-close fit and retain the alternative hypothesis
of close fit. A value below the critical value supports the close fit alternative
hypothesis in this test (because lower values of ε are better) and fails to support
the null hypothesis of not-close fit. Power is based on the area under the non-central
curve for the alternative hypothesis, as shown in Figure 4 on page 141.
To use Table 2 to determine the power of a study with a given df and N to reject
the null hypothesis of not-close fit, do the following:
1. Find the df value in the leftmost column in the table that is closest to the
df of the study whose power you are examining (choices range from 5 to
100).
2. Find the closest sample size among the column headings under Sample
Size (choices range from 100 to 500).
3. Look at the value at the intersection of the df row labeled “not close” and
the sample size column. The value is the power of a study with the given
df and sample size to reject the null hypothesis of not-close fit.
Adequate power is conventionally defined as .80 or higher.
Above and below the values for not-close fit in Table 2 are values for close
and exact fit, respectively. The default test in SEM programs is a test of
exact fit.
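If the df or N you need is not among Table 2’s choices, the same values can be computed directly. The sketch below loops the power calculation from the MacCallum et al. program (given in full at the end of this handout) over a few illustrative df and N combinations for the not-close-fit test:

* Mini version of Table 2 for the not-close-fit test
  (alpha = .05, null RMSEA = .05, alternative RMSEA = .01);
data table2_sketch;
  alpha=.05; rmsea0=.05; rmseaa=.01;
  do d=25, 50, 100;               * degrees of freedom;
    do n=100, 200, 500;           * sample size;
      ncp0=(n-1)*d*rmsea0**2;
      ncpa=(n-1)*d*rmseaa**2;
      cval=cinv(alpha,d,ncp0);    * lower tail because rmsea0 > rmseaa;
      power=probchi(cval,d,ncpa);
      output;
    end;
  end;
run;
proc print data=table2_sketch;
  var d n power;
run;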
Table 4: Determining the Minimum Sample Size Needed for Adequate Power in
SEM Studies
Table 4 on page 144 of MacCallum et al. (1996) provides selected results that can be
obtained with the second SAS program provided in the Appendix of the article.
The table and program help you determine the minimum sample size
required to have adequate power for a proposed study (or to see if a completed
study had power to accept or reject hypotheses about fit).
To use Table 4 to find the minimum sample size required for a study with a given
df and with a hypothesis of close or not-close fit, do the following:
1. Find the degrees of freedom of the study in the first column (choices
range from 2 to 100).
2. In the appropriate column for your hypothesis, find the sample size in the
df row. The N value is the minimum number of cases needed to have
adequate power for your hypothesis testing.
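For df values not covered by Table 4, a minimum N can also be found by search. The sketch below is not the authors’ Appendix program (use that one for published work); it simply increases n until power for the not-close-fit test reaches .80, reusing the power calculation from the program at the end of this handout:

* Find the smallest n giving power >= .80 for the not-close-fit test;
data min_n;
  alpha=.05; rmsea0=.05; rmseaa=.01; d=50; target=.80;
  do n=50 to 5000;
    ncp0=(n-1)*d*rmsea0**2;
    ncpa=(n-1)*d*rmseaa**2;
    cval=cinv(alpha,d,ncp0);
    power=probchi(cval,d,ncpa);
    if power>=target then do;
      output;     * keep the first n that reaches the target power;
      stop;
    end;
  end;
run;
proc print data=min_n;
  var d target n power;
run;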
Highly Recommended References
Lee, T., Cai, L., & MacCallum, R. C. (2012). Power analysis for tests of structural
equation models. In R. H. Hoyle (Ed.), Handbook of structural equation
modeling (pp. 181-194). New York, NY: Guilford Press.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor
analysis. Psychological Methods, 4(1), 84-99.
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between
nested covariance structure models: Power analysis and null
hypotheses. Psychological Methods, 11(1), 19-35. doi:10.1037/1082-989X.11.1.19
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and
determination of sample size for covariance structure modeling. Psychological
Methods, 1, 130-149.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting"
models. Journal of Abnormal Psychology, 112(4), 578-598. doi:10.1037/0021-843X.112.4.578
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in
structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural
equation modeling (pp. 209-231). New York, NY: Guilford Press.
RMSEA and Power Analysis for SEM Using the SAS Calculator
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination
of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
RMSEA is a fit index that MacCallum et al. suggest can be used to calculate the power of
any given SEM to detect close fit or not-close fit. MacCallum and others note that finding
exact fit in an over-identified model (where the input and implied matrices are identical,
Σ = Σ*) is virtually impossible. Therefore tests of close fit or not-close fit are
recommended. MacCallum et al. recommend testing for “not-close fit” because in this test,
the null hypothesis is the one we want to reject (as in most statistical tests). If we can
reject the null hypothesis of not-close fit, we can claim support for our model.
RMSEA as described by the authors is a function of the estimate of the population chi-square,
the degrees of freedom in a model, and sample size. Specifically, it equals the square
root of (F0/df), where df = degrees of freedom and F0 = (χ² − df)/(N − 1). RMSEA takes
into account the degrees of freedom in a model: the more df for a given χ² and sample
size, the lower the RMSEA. In addition, larger sample sizes are rewarded—the larger the
sample size for a given χ² and degrees of freedom, the lower the RMSEA.
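As a concrete illustration of the formula (the chi-square, df, and N values below are hypothetical, not from the article):

* RMSEA from a model chi-square, df, and N, per the formula above;
data rmsea_demo;
  chisq=120; d=50; n=200;        * hypothetical model results;
  f0=max((chisq-d)/(n-1), 0);    * F0, floored at zero;
  rmsea=sqrt(f0/d);              * about .084 for these values;
run;
proc print data=rmsea_demo;
  var chisq d n rmsea;
run;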
To conduct an SEM power analysis using the RMSEA, we need an alpha level (α, or
pre-established level of significance of our statistical test); the degrees of freedom of
the model (d), calculated the usual way as the difference between unique covariance model
elements and the number of parameters to be estimated; sample size (n); and two values of
RMSEA that serve as the null and alternative hypothesis values (ε0 and εa, respectively).
The tables in the 1996 article can be used to get approximate ideas about the power of
different analyses, but the authors’ SAS code can also be used to get exact power values for
SEM analyses with different characteristics. As mentioned above, they use ε0 and εa to
represent the null and alternative hypothesis RMSEA values, respectively.
To test a null hypothesis of close fit, use ε0 ≤ .05 for the null hypothesis RMSEA and
εa = .08 for the alternative hypothesis. To test a null hypothesis of not-close fit, use
ε0 ≥ .05 for the null and εa = .01 for the alternative hypothesis. In tests to determine
power or sample size requirements, these hypothesized RMSEA values are used in conjunction
with the model’s degrees of freedom and the sample size.
To run this SAS code, open SAS, paste the code below into the editor, and run it. You can
substitute values for df (d) and n as needed. The code for estimating power is below.
Code for determining minimum sample size can be found in the Appendix of the 1996 article.
SAS code from MacCallum et al. (1996) for determining the power of tests of close fit and
not-close fit in models with given degrees of freedom and sample sizes. (The example below
is for a test of close fit; to change it to a test of not-close fit, change the
“rmseaa=.08” statement to “rmseaa=.01”.) The commented elements are changeable; the rest
will usually stay the same.
title "power estimate for SEM";
data one;
alpha=.05; *significance level;
rmsea0=.05; *null hyp value;
rmseaa=.08; *alt hyp value;
d=50; *degrees of freedom;
n=200; *sample size;
ncp0=(n-1)*d*rmsea0**2;
ncpa=(n-1)*d*rmseaa**2;
if rmsea0<rmseaa then do;
cval=cinv(1-alpha,d,ncp0);
power=1-probchi(cval,d,ncpa);
end;
if rmsea0>rmseaa then do;
cval=cinv(alpha,d,ncp0);
power=probchi(cval,d,ncpa);
end;
output;
proc print data=one;
var rmsea0 rmseaa alpha d n power;
run;
Examples of output:

1. power estimate for SEM (default values in code; test of close fit)

Obs    rmsea0    rmseaa    alpha    d       n      power
1      0.05      0.08      0.05     50      200    0.7691

2. power estimate for SEM (from a measurement model with 61 observed variables; test of close fit)

Obs    rmsea0    rmseaa    alpha    d       n      power
1      0.05      0.08      0.05     1579    391    .

(In SAS output, “.” indicates a missing value.)

3. power estimate for SEM (default d and n; test of not-close fit)

Obs    rmsea0    rmseaa    alpha    d       n      power
1      0.05      0.01      0.05     50      200    0.60824

4. power estimate for SEM (test of not-close fit in the CFA with 61 variables)

Obs    rmsea0    rmseaa    alpha    d       n      power
1      0.05      0.01      0.05     1579    391    1