v.2 N.K. Bowen, 2014

Estimating Power for SEM Analyses

Brief Background on Power Analysis

The power of a statistical test is the probability of correctly rejecting a false null hypothesis. The calculation of power for common statistical procedures is based on a pre-established significance level (alpha, α), effect size (ES), and sample size (N).

Bigger effect size → more power
Smaller SD → bigger ES → more power
Bigger alpha → more power
Bigger sample → more power

By convention we want power to be .80 or higher. Power (1 − β) expresses the probability of detecting a true alternative hypothesis (or rejecting a false null hypothesis). Beta (β), which is 20% when power is 80%, is the probability of failing to detect a true alternative hypothesis (making a Type II error; saying we do not have a finding when we really do have one).

Alpha is typically set at .05. Alpha (α) expresses the probability of rejecting a true null hypothesis (making a Type I error; saying we have a finding when we really don't). We make it small (5%). We are more willing to miss a true alternative hypothesis (say we have no finding when we really do) than to inappropriately reject a true null hypothesis (say we have a finding when we don't).

Effect size

Different statistics can be transformed into comparable ESs. Correlations, regression coefficients, and the difference between two means can all be expressed as effect sizes. The size of an effect has a big influence on power. Think of a logical extreme. If an intervention cures a mental health disorder 100% of the time, compared to 0% getting better without intervention, it won't take many cases before you are convinced the intervention is effective. If an intervention cures 60% of cases when 50% would get better even without the intervention, it will take a lot of cases before you decide the intervention is better than doing nothing. The 100% fix is a big effect; the 10-percentage-point improvement is a small effect. A smaller number of cases is needed to detect a bigger ES.

Effect size is partially determined by the spread of scores in a distribution. A common effect size in intervention research is calculated from the difference in group means on an outcome. The difference in the two group means is divided by the SD of one of them, or by their average or pooled SD. Using this formula, you can see that an effect size is bigger, and easier to detect, when the SD is small:

ES = (X̄1 − X̄2) / SD

(10 − 5) / 1  = 5     ES = 5  if SD = 1
(10 − 5) / 10 = .5    ES = .5 if SD = 10

[Figure: pairs of overlapping distributions illustrating the same mean difference with a small vs. a large SD. There is much more overlap with the big SD, so it will be harder to show a difference.]

We use distributions, and the known probabilities of scores given those distributions, to evaluate our statistics. We know, for example, that for a normally distributed variable with a mean of 0 and an SD of 1, a score with an absolute value over 1.96 is highly unlikely (only a 5% probability given the distribution). Therefore, if we get a score in that range or beyond, we figure it could very well come from a different population (which would be represented by a different curve with a different mean). The chi-square distribution is used for evaluating overall fit in SEM as well as for comparing models.
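To make the SD's influence on power concrete, here is a minimal SAS sketch using PROC POWER (part of SAS/STAT). The mean difference, SDs, and group size are made-up values chosen to echo the example above; they are not from the handout.

title "illustrative power for a two-group mean difference";
proc power;
   twosamplemeans test=diff
      alpha = .05
      meandiff = 5       /* difference between the two group means */
      stddev = 1 10      /* two scenarios: ES = 5 vs. ES = .5 */
      npergroup = 20     /* hypothetical group size */
      power = .;         /* missing value tells SAS to solve for power */
run;

With SD = 1 (ES = 5), power is essentially 1.0 even with only 20 cases per group; with SD = 10 (ES = .5), the same 5-point mean difference yields far lower power, so many more cases would be needed.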
One Approach to Estimating Power in SEM

MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19-35. doi:10.1037/1082-989X.11.1.19

Article Highlights

MacCallum, Browne, and Sugawara (1996) suggest that the RMSEA (ε) can be treated like an effect size for testing the power of SEM models. The confidence interval (CI) of the RMSEA also plays a critical role in their evaluation of power. In addition to sample size (N), effect size (ES), and alpha (α), the degrees of freedom (df) of a model contribute to the calculation of power in MacCallum et al.'s approach.

After describing the statistical theory behind their approach (see notes below), MacCallum et al. (1996) make the point that the null hypothesis in SEM (that the input and implied matrices are the same, Σ = Σ*) can be restated in terms of the value of the RMSEA. The RMSEA is a function of the minimization function value (F) and a model's df. The null hypothesis that is analogous to the usual one in SEM is H0: ε = 0, because ε is a measure of the discrepancy between Σ and Σ*.

The authors recommend using the RMSEA in tests of power, but not in the simple null form given above. Specifically, they recommend examining a null hypothesis of "not-close fit" instead of the usual null hypothesis of exact fit. Here is their thinking:

1. Exact fit is not a plausible finding with an over-identified model.
2. Tests of close fit would be better than tests of exact fit, because close fit between the input and implied matrices is plausible with well-specified models. To determine power to test a hypothesis of close fit, the authors suggest setting the null hypothesis as ε ≤ .05 and the alternative hypothesis as εa = .08.
3. Tests of "not-close fit" are even better than tests of close fit, however. Tests of "not-close fit" make the null hypothesis more similar to the null in most statistical tests; that is, the null is now the undesirable option, and we hope to reject it. To determine power to test a hypothesis of not-close fit, the authors suggest setting the null hypothesis as ε ≥ .05 and the alternative hypothesis as εa = .01.

Note that the value of .05 as the transition point between good fit and bad fit is recommended by MacCallum et al., but a different value could be used. Some researchers suggest .06 as the upper bound for good fit (see West et al., 2012). Different values can be used in the computer program provided by MacCallum et al., but the tables they provide are based on .05. Given that any cutoff is arbitrary, as admitted by the authors, using other cutoffs is okay. It is important to remember that the tests here are not tests of fit, but of the power to accept or reject hypotheses (therefore the specific cutoff is less important).

Practical Use of the Tables in the 1996 MacCallum Article

Table 1: Using RMSEA Confidence Intervals to Test Hypotheses of Fit

Table 1 on page 137 of MacCallum et al. (1996) can be used as a guide to interpreting RMSEA values, if it is determined that a model has adequate power to test the specified hypothesis. With just the RMSEA reported for a model and its CI, we can decide whether we can accept or reject hypotheses of exact fit, close fit, or not-close fit. To use Table 1 in conjunction with RMSEA information reported in a study (or obtained in your own study), do the following (a compact restatement of the decision rules appears after this list):

1. Ascertain the author's definition of close fit (.05, .06, and .08 are commonly chosen definitions; we recommend either .05 or .06).
2. Substitute the author's definition of close fit into the statements in the first column of the table (if it is not .05).
3. Examine the RMSEA's CI and determine which of the three statements in the first column accurately describes the values in relation to the author's definition of close fit.
4. Look across the row of the applicable statement to see its implications for the rejection of the hypotheses of close and not-close fit. Note that for some studies you may not be able to reject either hypothesis.
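Assuming the 90% confidence interval conventionally reported with the RMSEA (and used by MacCallum et al.), the Table 1 logic reduces to the following decision rules, where c stands for the chosen definition of close fit (e.g., .05). This is our summary, not a reproduction of the table:

Reject close fit (H0: ε ≤ c)      when the CI's lower bound is above c
Reject not-close fit (H0: ε ≥ c)  when the CI's upper bound is below c
Reject neither                    when the CI contains c

For example, an RMSEA of .045 with a 90% CI of (.030, .062) contains .05, so neither hypothesis can be rejected; a CI of (.012, .044) would allow rejection of not-close fit.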
Table 2: Determining the Power of SEM Models Based on df and N

Table 2 on page 142 presents selected results that can be obtained with the first SAS program provided in the Appendix of MacCallum et al. (1996). Code and information on using the program are included at the end of this handout. The table can be used to quickly estimate power for proposals and studies (and provide an authoritative citation), and to evaluate whether completed studies actually had the power to test their hypotheses. As noted by MacCallum et al., many published studies did not have the power to support the researchers' conclusions. Note that the table's values are based on an α of .05.

The test of close fit is based on a null hypothesis of ε ≤ .05 and an alternative hypothesis of εa = .08. With this test you hope to retain the null hypothesis of close fit by obtaining a value of ε below the critical value. A value above the critical value supports the not-close-fit alternative hypothesis and fails to support the hypothesis of close fit. Power is based on the area under the noncentral curve for the alternative hypothesis (as shown in Figure 3, page 140), which is the probability that ε will be at or above the critical value for the test. This feels backwards because it is! As pointed out by MacCallum et al., it makes more sense to test (and hopefully reject) the null hypothesis of not-close fit.

Table 2's values for hypotheses of not-close fit are based on a null hypothesis that ε ≥ .05 and an alternative hypothesis that εa = .01. In this case, as in conventional statistical tests, the researcher hopes to reject the null hypothesis of not-close fit and retain the alternative hypothesis of close fit. A value below the critical value supports the close-fit alternative hypothesis in this test (because lower values of ε are better) and fails to support the null hypothesis of not-close fit. Power is based on the area under the noncentral curve for the alternative hypothesis, as shown in Figure 4 on page 141.

To use Table 2 to determine the power of a study with a given df and N to reject the null hypothesis of not-close fit, do the following (the calculation behind the table values is sketched after this list):

1. Find the df value in the leftmost column of the table that is closest to the df of the study whose power you are examining (choices range from 5 to 100).
2. Find the closest sample size among the column headings under Sample Size (choices range from 100 to 500).
3. Look at the value at the intersection of the df row labeled "not close" and the sample size column. The value is the power of a study with the given df and sample size to reject the null hypothesis of not-close fit. Adequate power is conventionally defined as .80 or higher.

Above and below the values for not-close fit in Table 2 are values for close and exact fit, respectively. The default test in SEM programs is a test of exact fit.
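For readers who want the machinery behind Table 2's entries, each power value comes from a pair of noncentral chi-square distributions. The steps below restate, in equation form, exactly what the MacCallum et al. SAS code at the end of this handout computes for the test of not-close fit (ε0 = .05, εa = .01):

λ0 = (N − 1) × df × ε0²     noncentrality parameter under the null hypothesis
λa = (N − 1) × df × εa²     noncentrality parameter under the alternative
c  = the αth quantile of the noncentral χ²(df, λ0) distribution
power = P[χ²(df, λa) < c]   the area under the alternative curve below c

With df = 50, N = 200, and α = .05, these steps yield the power of .608 shown in the example output at the end of this handout. (For the test of close fit, where εa > ε0, the critical value is instead the (1 − α)th quantile and power is the area above it.)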
Table 4: Determining the Minimum Sample Size Needed for Adequate Power in SEM Studies

Table 4 on page 144 of MacCallum et al. (1996) provides selected results that can be obtained with the second SAS program provided in the Appendix of the article. The table and program help you determine the minimum sample size required to have adequate power for a proposed study (or to see if a completed study had the power to accept or reject hypotheses about fit). To use Table 4 to find the minimum sample size required for a study with a given df and with a hypothesis of close or not-close fit, do the following:

1. Find the degrees of freedom of the study in the first column (choices range from 2 to 100).
2. In the appropriate column for your hypothesis, find the sample size in the df row. The N value is the minimum number of cases needed to have adequate power for your hypothesis testing.

Highly Recommended References

Lee, T., Cai, L., & MacCallum, R. C. (2012). Power analysis for tests of structural equation models. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 181-194). New York, NY: Guilford Press.

MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84-99.

MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19-35. doi:10.1037/1082-989X.11.1.19

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130-149.

Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112(4), 578-598. doi:10.1037/0021-843X.112.4.578

West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 209-231). New York, NY: Guilford Press.

RMSEA and Power Analysis for SEM Using the SAS Calculator

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130-149.

RMSEA is a fit index that MacCallum et al. suggest can be used to calculate the power of any given SEM to detect close fit or not-close fit. MacCallum and others note that finding exact fit in an over-identified model (where S = Σ*) is virtually impossible. Therefore, tests of close fit or not-close fit are recommended. MacCallum et al. recommend testing for "not-close fit" because in this test the null hypothesis is the one we want to reject (as in most statistical tests). If we can reject the null hypothesis that the implied covariance matrix does not have close fit, we can claim support for our model.

The RMSEA as described by the authors is a function of the estimate of the population chi-square, the degrees of freedom in the model, and the sample size. Specifically, RMSEA = sqrt(F0/df), where df is the model's degrees of freedom and F0 = (χ² − df)/(N − 1) is the estimate of the population discrepancy. The RMSEA thus takes into account the degrees of freedom in a model: the more df for a given χ² and sample size, the lower the RMSEA. In addition, larger sample sizes are rewarded: the larger the sample size for a given χ² and degrees of freedom, the lower the RMSEA.
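A quick worked example of the formula, with made-up values of χ², df, and N chosen only to keep the arithmetic clean:

χ² = 100, df = 50, N = 201:  F0 = (100 − 50)/200 = .25     RMSEA = sqrt(.25/50) ≈ .071
χ² = 100, df = 80, N = 201:  F0 = (100 − 80)/200 = .10     RMSEA = sqrt(.10/80) ≈ .035   (more df, lower RMSEA)
χ² = 100, df = 50, N = 401:  F0 = (100 − 50)/400 = .125    RMSEA = sqrt(.125/50) = .05   (larger N, lower RMSEA)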
To conduct an SEM power analysis using the RMSEA, we need an alpha level (α, the pre-established significance level of our statistical test); the degrees of freedom of the model (d), calculated the usual way as the difference between the number of unique elements of the input covariance matrix and the number of parameters to be estimated; the sample size (n); and two values of RMSEA that serve as the null and alternative hypothesis values (ε0 and εa, respectively).

The tables in the 1996 article can be used to get approximate ideas about the power of different analyses, but the authors' SAS code can also be used to get exact power values for SEM analyses with different characteristics. As mentioned above, they use ε0 and εa to represent the RMSEA null and alternative hypothesis values, respectively. To test a null hypothesis of close fit, use ε0 = .05 and εa = .08 (the not-close-fit alternative). To test a null hypothesis of not-close fit, use ε0 = .05 and εa = .01 (the close-fit alternative). In tests to determine power or sample size requirements, these hypothesized RMSEA values are used in conjunction with the model's degrees of freedom and the sample size.

To run the SAS code, simply open SAS, paste the code below into the editor, and run it. Substitute values for the degrees of freedom (d) and sample size (n) as needed. The code for estimating power is below; code for determining the minimum sample size can be found in the Appendix of the 1996 article (a home-grown alternative is also sketched at the end of this handout).

SAS code from MacCallum et al. (1996) for determining the power of tests of close fit and not-close fit in models with given degrees of freedom and sample sizes. The example below is set up for a test of close fit; to change it to a test of not-close fit, change the "rmseaa=.08" statement to "rmseaa=.01". The changeable elements are marked in the comments; the rest will usually stay the same.

title "power estimate for SEM";
data one;
 alpha=.05;    *significance level;
 rmsea0=.05;   *null hypothesis value;
 rmseaa=.08;   *alternative hypothesis value (.08 can be changed to .01);
 d=50;         *degrees of freedom (changes based on your model);
 n=200;        *sample size (changes based on your study);
 ncp0=(n-1)*d*rmsea0**2;
 ncpa=(n-1)*d*rmseaa**2;
 if rmsea0<rmseaa then do;
  cval=cinv(1-alpha,d,ncp0);
  power=1-probchi(cval,d,ncpa);
 end;
 if rmsea0>rmseaa then do;
  cval=cinv(alpha,d,ncp0);
  power=probchi(cval,d,ncpa);
 end;
 output;
proc print data=one;
 var rmsea0 rmseaa alpha d n power;
run;

Examples of output:

1. Test of close fit (default values in the code):

   Obs   rmsea0   rmseaa   alpha    d     n    power
     1    0.05     0.08     0.05   50   200   0.7691

2. Test of close fit, from a measurement model with 61 observed variables:

   Obs   rmsea0   rmseaa   alpha     d      n    power
     1    0.05     0.08     0.05   1579   391      .

   (Power is printed as a missing value here, presumably because the noncentrality values for this very large model exceed what the CINV/PROBCHI routines can handle; compare example 4.)

3. Test of not-close fit (default d and n):

   Obs   rmsea0   rmseaa   alpha    d     n    power
     1    0.05     0.01     0.05   50   200   0.60824

4. Test of not-close fit, for the CFA with 61 variables:

   Obs   rmsea0   rmseaa   alpha     d      n    power
     1    0.05     0.01     0.05   1579   391        1
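The Appendix of the 1996 article contains the authors' own program for finding the minimum N; the sketch below is not that program, but a simple home-grown alternative that reuses the power logic from the code above and loops over n until power first reaches .80. The search range (50 to 5000) and the target of .80 are arbitrary choices that can be edited.

title "minimum n for power of .80 (sketch, not the MacCallum et al. Appendix code)";
data two;
 alpha=.05;
 rmsea0=.05;   *null hypothesis value (not-close fit test);
 rmseaa=.01;   *alternative hypothesis value;
 d=50;         *degrees of freedom (changes based on your model);
 target=.80;   *desired power;
 do n=50 to 5000;
  ncp0=(n-1)*d*rmsea0**2;
  ncpa=(n-1)*d*rmseaa**2;
  if rmsea0<rmseaa then do;
   cval=cinv(1-alpha,d,ncp0);
   power=1-probchi(cval,d,ncpa);
  end;
  else do;
   cval=cinv(alpha,d,ncp0);
   power=probchi(cval,d,ncpa);
  end;
  if power>=target then do;
   output;    *keep the first n that reaches the target;
   stop;      *and end the search;
  end;
 end;
proc print data=two;
 var rmsea0 rmseaa alpha d n power;
run;

Because the required N rises steeply as df falls, the upper bound of the loop may need to be raised for models with very few degrees of freedom.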