Guidelines for Reliability, Confirmatory and Exploratory Factor Analysis

Diana Suhr, Ph.D., University of Northern Colorado, Greeley CO
Mary Shay, Cherry Creek Schools, Denver CO

Abstract

Reliability refers to accuracy and precision of a measurement instrument. Confirmatory factor analysis (CFA) is a statistical technique used to verify the factor structure of a measurement instrument. EFA, traditionally, is used to explore the possible underlying factor structure of a measurement instrument. Guidelines for reliability, confirmatory and exploratory factor analysis will be discussed. Examples using the Race and Schooling Instrument (Revised, Shay 2008) and PROC CORR, PROC CALIS and PROC FACTOR illustrate reliability, CFA and EFA statistical techniques.

Introduction

CFA and EFA are powerful statistical techniques. An example of CFA and EFA could occur with the development of measurement instruments, e.g. a satisfaction scale, attitudes toward health, customer service questionnaire. During the process, a blueprint is developed, questions written, a response scale determined, the instrument pilot tested, data collected, and CFA completed. The blueprint identifies the factor structure or what we think it is. If questions do not measure what we thought they should, the factor structure does not follow our blueprint, the factor structure is not confirmed, and EFA is the next step. EFA helps us determine what the factor structure looks like according to participant responses. Exploratory factor analysis is essential to determine underlying constructs for a set of measured variables.

Measurement Instrument

The example for this presentation analyzes data collected with the Race and Schooling School-Based Instrument (Shay, 2008). The 23-item instrument measures the essential principles of culturally responsive teaching and learning.
Each question, aligning to a principle and category, provides insight into participants’ levels of involvement in culturally responsive teaching practices and attitudes and beliefs toward confronting institutional bias and discrimination in schools (Shay, in press). The Race and Schooling School-Based Instrument was developed by Singleton (2003) to include seven categories. Permission to use and adjust the instrument for research purposes was granted by Singleton to Shay (2008). See Appendix A to review the survey instrument.

The original survey instrument had a 5-point Likert scale with (1) Rarely, (3) Sometimes, and (5) Often as response choices to survey questions. In the revised version, responses were changed to (1) Almost Never; (2) Seldom; (3) Sometimes; (4) Frequently; and (5) Almost Always. Additional questions were added to allow respondents to provide information on demographics, e.g., years of experience in the district and in the field of education.

Data were collected in a metropolitan school district in the Midwestern United States. Of the 282 respondents, 16% were male (n=42) and 84% were female (n=220). The majority of respondents were white (89.1%, n=229), with 3.5% Hispanic (n=9), 3.5% Black (n=9), 2.0% Asian (n=5), and 2.0% who identified their race as Other.

Reliability

Reliability refers to the accuracy and precision of a measurement procedure (Thorndike, Cunningham, Thorndike, & Hagen, 1991). Reliability may be viewed as an instrument’s relative lack of error. In addition, reliability is a function of properties of the underlying construct being measured, the test itself, the groups being assessed, the testing environment, and the purpose of assessment. Reliability answers the question, How well does the instrument measure what it purports to measure?

Some degree of inconsistency is present in all measurement procedures.
The variability in a set of item scores reflects the actual variation across individuals in the phenomenon that the scale measures, and is made up of true score and error. Therefore, each observation of a measurement (X) is equal to true score (T) plus measurement error (e), or X = T + e. Another way to think about total variation is that it has two components: “signal” (i.e., true differences in the latent construct) and “noise” (i.e., differences caused by everything but true differences).

Sources of measurement inconsistency could be due to
1) a person changing from one testing to the next
   a. the amount of time between tests may have resulted in growth or change
   b. motivation to perform may be different at each testing
   c. the individual may be physically better able to perform, e.g., more rested
   d. the individual may have received tutoring between testings
2) the task being different from one testing to the next
   a. different environment
   b. different administrator
   c. different test items on parallel forms
3) the sample of behavior resulting in an unstable or undependable score
   a. the sample of behavior and evaluation of it are subject to chance or random influences
   b. a small sample of behavior does not provide a stable and dependable characterization of an individual
   c. for example, the average distance of 100 throws of a football would provide a more stable and accurate index of ability than a single throw.

Reliability may be expressed
1) as an individual’s position in a group (correlation between first and second measurements; the more nearly individuals are ranked in the same order, the higher the correlation and the more reliable the test)
2) within a set of repeated measures for an individual (internal consistency, how consistently items are answered)

Reliability can be assessed by
1) repeating the same test or measure (test-retest)
2) administering an equivalent form (parallel test forms)
3) using single-administration methods
   a. subdividing the test into two or more equivalent parts
   b. internal consistency, measured with Cronbach’s coefficient alpha.

Internal Consistency

Internal consistency is a procedure to estimate the reliability of a test from a single administration of a single form. Internal consistency depends on the individual’s performance from item to item based on the standard deviation of the test and the standard deviations of the items.

\[ \alpha = \left(\frac{n}{n - 1}\right)\left(\frac{SD_t^2 - \sum SD_i^2}{SD_t^2}\right) \tag{1} \]

where
$\alpha$ is the estimate of reliability,
$n$ is the number of items in the test,
$SD_t$ is the standard deviation of the test scores,
$\sum$ means “take the sum” and covers the $n$ items,
$SD_i$ is the standard deviation of the scores from a group of individuals on an item.

KR20, Kuder-Richardson Formula 20, is a special form of coefficient alpha that applies when items are dichotomous (e.g., yes/no, true/false) or are scored as right or wrong.

Factors Influencing Reliability

Many factors can affect the reliability of a measurement instrument. They are the
1) range of the group
   a. pooling a wider range of grades or ages produces a reliability coefficient of higher magnitude
   b. take into account the sample on which the reliability coefficient is based when comparing instruments
2) level of ability in the group
   a. precision of measurement can be related to ability level of the people being measured
   b. report the standard error of measurement for different score levels
3) methods used for estimating reliability
   a. amount of time between administrations
   b. method of calculating reliability
4) length of the test
   a. when reliability is moderately high, it takes a considerable increase in test length to increase reliability
   b. relationship of reliability to length of test can be expressed with

\[ r_{kk} = \frac{k\, r_{tt}}{1 + (k - 1)\, r_{tt}} \tag{2} \]

where
$r_{kk}$ is the reliability of the test $k$ times as long as the original test,
$r_{tt}$ is the reliability of the original test, and
$k$ is the factor by which the length of the test is changed.
For example, if reliability is 0.60 for a 10-item instrument, what is the reliability for a 20-item instrument?

\[ r_{kk} = \frac{2(0.60)}{1 + (2 - 1)(0.60)} = \frac{1.20}{1.60} = 0.75 \]

Levels of Reliability

Acceptable levels of reliability depend on the purpose of the instrument. Acceptable reliability of instruments developed for research purposes can be as low as 0.60. An acceptable reliability level of a diagnostic instrument used for making decisions about individuals (e.g., a psychological measure) should be much higher, e.g., 0.95.

Comparisons

The reliability coefficient provides a basis for assessment instrument comparison when measurement is expressed in different scales. The assessment with the higher reliability coefficient could provide a more consistent measurement of individuals.

Statistical Power

An often overlooked benefit of more reliable scales is that they increase statistical power for a given sample size (or allow a smaller sample size to yield equivalent power), relative to less reliable measures. A reliable measure, like a larger sample, contributes relatively less error to the statistical analysis.

The power gained from improving reliability depends on a number of factors including
1) the initial sample size
2) the probability level set for detecting a Type I error
3) the effect size (e.g., mean difference) that is considered significant
4) the proportion of error variance that is attributable to measure unreliability rather than sample heterogeneity or other sources.

To raise the power, substitute a highly reliable scale for a substantially poorer one. For example, when n = 50, two scales with reliabilities of 0.38 have a correlation of 0.24, p < 0.10; the correlation would be significant at p < 0.01 if their reliabilities were increased to 0.90 or if the sample were more than twice as large (n > 100).
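The figures in this example are consistent with the classical attenuation relationship between an observed correlation and the reliabilities of the two scales. The source does not state that formula, so the worked check below is a sketch under that assumption; $\rho$ (the correlation between true scores) and $r_{xx}$, $r_{yy}$ (the two scale reliabilities) are notation introduced here.

```latex
\begin{align*}
% Assumed relation: the observed correlation is the true-score correlation
% shrunk by the square root of the product of the two reliabilities.
r_{xy} &= \rho \sqrt{r_{xx}\, r_{yy}} \\
% Step 1: implied true-score correlation from the example values.
\rho &= \frac{0.24}{\sqrt{(0.38)(0.38)}} = \frac{0.24}{0.38} \approx 0.63 \\
% Step 2: expected observed correlation when both reliabilities are 0.90.
r_{xy} &\approx 0.63 \times \sqrt{(0.90)(0.90)} \approx 0.57
\end{align*}
```

With n = 50, an observed correlation near 0.57 comfortably exceeds the two-tailed critical value at p < 0.01 (roughly 0.36 with 48 degrees of freedom), which agrees with the statement above.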
PROC CORR and options for Reliability

    DATA=       input data set
    OUTP=       output data set with Pearson correlation statistics
    ALPHA       compute Cronbach’s coefficient alpha
    NOMISS      exclude observations with missing analysis values
    NOCORR      suppresses printing Pearson correlations
    NOSIMPLE    suppresses printing descriptive statistics

Confirmatory Factor Analysis

CFA allows the researcher to test the hypothesis that a relationship between the observed variables and their underlying latent construct(s) exists. The researcher uses knowledge of the theory, empirical research, or both, postulates the relationship pattern a priori, and then tests the hypothesis statistically.

PROC CALIS and options for CFA

    DATA=           specifies dataset to be analyzed
    OUTSTAT=        output statistics data set
    COV             analyzes covariance matrix
    CORR            analyzes correlation matrix
    METHOD=         estimation method
    MAXITER=        max iterations
    KURTOSIS        compute and display kurtosis
    MODIFICATION    modification indices

Exploratory Factor Analysis

The search by psychologists for a neat and tidy description of human intellectual abilities led to the development of factor analytic methods. Galton, a scientist during the 19th and 20th centuries, laid the foundations for factor analytic methods by developing quantitative methods to determine the interdependence between 2 variables. Karl Pearson was the first to explicitly define factor analysis. In 1902, Macdonnell was the first to publish an application of factor analysis. His study compared physical characteristics of 3000 criminals and 1000 Cambridge undergraduates.

Factor analysis could be described as orderly simplification of interrelated measures. Traditionally factor analysis has been used to explore the possible underlying structure of a set of interrelated variables without imposing any preconceived structure on the outcome (Child, 1990). By performing exploratory factor analysis (EFA), the number of constructs and the underlying factor structure are identified.
Goals of factor analysis are
1) to help an investigator determine the number of latent constructs underlying a set of items (variables)
2) to provide a means of explaining variation among variables (items) using few newly created variables (factors), e.g., condensing information
3) to define the content or meaning of factors, e.g., latent constructs

Assumptions underlying EFA are
• Interval or ratio level of measurement
• Random sampling
• Relationship between observed variables is linear
• A normal distribution (each observed variable)
• A bivariate normal distribution (each pair of observed variables)
• Multivariate normality

Limitations of EFA are
• the correlations, the basis of factor analysis, describe relationships. No causal inferences can be made from correlations alone.
• the reliability of the measurement instrument (avoid an instrument with low reliability)
• sample size (larger sample → larger correlation)
  o minimal number for reliable results is greater than 100 and 5 times the number of items
  o since some subjects may not answer every item, a larger sample is desirable, e.g., for 30 items, at least 150 subjects (5*30); a sample of 200 subjects would allow for missing data.
• sample selection
  o Representative of population
  o Do not pool populations
• variables could be sample specific, e.g., a unique quality possessed by a group does not generalize to the population
• nonnormal distribution of data

Factor Extraction

Factor analysis seeks to discover common factors. The technique for extracting factors attempts to take out as much common variance as possible in the first factor. Subsequent factors are, in turn, intended to account for the maximum amount of the remaining common variance until, hopefully, no common variance remains.

Direct extraction methods obtain the factor matrix directly from the correlation matrix by application of specified mathematical models. Most factor analysts agree that direct solutions are not sufficient.
Adjustment to the frames of reference by rotation methods improves the interpretation of factor loadings by reducing some of the ambiguities which accompany the preliminary analysis (Child, 1990). The process of manipulating the reference axes is known as rotation. The results of rotation methods are sometimes referred to as derived solutions because they are obtained as a second stage from the results of direct solutions.

Rotation applied to the reference axes means the axes are turned about the origin until some alternative position has been reached. The simplest case is when the axes are held at 90° to each other, orthogonal rotation. Rotating the axes through different angles gives an oblique rotation (not at 90° to each other).

Methods

As an aside, names given to factor extraction methods have some interesting origins.
• Procrustes was a highwayman who tied his victims to a bed and shaped them to its structure either by stretching them or by cutting off their limbs. In factor analysis, the Procrustes technique/method involves testing data to see how closely they fit a hypothesized factor structure.
• The plasmode method is taken from well-established areas (e.g., physics, chemistry) so that the factor structure is predictable.

Criteria for Extracting Factors

Determining the number of factors to extract in a factor analytic procedure is dependent on meeting appropriate criteria. They are
1) Kaiser’s criterion, suggested by Guttman and adapted by Kaiser, considers factors with an eigenvalue greater than one as common factors (Nunnally, 1978)
2) Cattell’s (1966) scree test. The name is based on an analogy between the debris, called scree, that collects at the bottom of a hill after a landslide, and the relatively meaningless factors that result from overextraction. On a scree plot, because each factor explains less variance than the preceding factors, an imaginary line connecting the markers for successive factors generally runs from top left of the graph to the bottom right. If there is a point below which factors explain relatively little variance and above which they explain substantially more, this usually appears as an “elbow” in the plot. This plot bears some physical resemblance to the profile of a hillside. The portion beyond the elbow corresponds to the rubble, or scree, that gathers. Cattell’s guidelines call for retaining factors above the elbow and rejecting those below it. This amounts to keeping the factors that contribute most to the variance.
3) Proportion of variance accounted for keeps a factor if it accounts for a predetermined amount of the variance (e.g., 5%, 10%).
4) Interpretability criteria
   a. Are there at least 3 items with significant loadings (>0.30)?
   b. Do the variables that load on a factor share some conceptual meaning?
   c. Do the variables that load on different factors seem to measure different constructs?
   d. Does the rotated factor pattern demonstrate simple structure? Are there relatively
      i. high loadings on one factor?
      ii. low loadings on other factors?

Statistics

EFA decomposes an adjusted correlation matrix. Variables are standardized in EFA, e.g., mean=0, standard deviation=1, and diagonals are adjusted for unique factors, 1 - u. The amount of variance explained is equal to the trace of the matrix, the sum of the adjusted diagonals or communalities. Squared multiple correlations (SMC) are used as communality estimates on the diagonals. Observed variables are a linear combination of the underlying and unique factors. Factors are estimated ($X_1 = b_1 F_1 + b_2 F_2 + \cdots + e_1$, where $e_1$ is a unique factor). Factors account for common variance in a data set. The amount of variance explained is the trace (sum of the diagonals) of the decomposed adjusted correlation matrix.
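The decomposition described above can be written out for a single standardized variable. This is an illustrative sketch that assumes uncorrelated (orthogonal) common factors; the loadings 0.60 and 0.30 are hypothetical values chosen only for the arithmetic, and $h_1^2$ (communality) and $u_1$ (unique variance) are notation introduced here.

```latex
\begin{align*}
% Standardized variable as a linear combination of two common factors and a unique factor
X_1 &= b_1 F_1 + b_2 F_2 + e_1 \\
% With orthogonal factors, the unit variance splits into common and unique parts
1 = \operatorname{Var}(X_1) &= \underbrace{b_1^2 + b_2^2}_{h_1^2} + u_1 \\
% Hypothetical loadings b_1 = 0.60 and b_2 = 0.30
h_1^2 &= 0.60^2 + 0.30^2 = 0.36 + 0.09 = 0.45 \\
u_1 &= 1 - h_1^2 = 0.55
\end{align*}
```

Under these assumptions the adjusted diagonal entry for this variable would be 0.45 rather than 1, and summing such entries over all variables gives the trace referred to above.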
Eigenvalues indicate the amount of variance explained by each factor. Eigenvectors are the weights that could be used to calculate factor scores. In common practice, factor scores are calculated with a mean or sum of measured variables that “load” on a factor.

The EFA Model is Y = Xβ + E where
Y is a matrix of measured variables
X is a matrix of common factors
β is a matrix of weights (factor loadings)
E is a matrix of unique factors, error variation

Communality is the variance of observed variables accounted for by a common factor. A large communality value indicates a strong influence by an underlying construct. Communality is computed by summing squares of factor loadings.

$d_1^2 = 1 - \text{communality}$ = % variance accounted for by the unique factor
$d_1 = \sqrt{1 - \text{communality}}$ = unique factor weight (parameter estimate)

EFA Steps
1) initial extraction
   • each factor accounts for a maximum amount of variance that has not previously been accounted for by the other factors
   • factors are uncorrelated
   • eigenvalues represent amount of variance accounted for by each factor
2) determine number of factors to retain
   • scree test, look for elbow
   • proportion of variance
     o prior communality estimates are not perfectly accurate; cumulative proportion must equal 100%, so some eigenvalues will be negative after factors are extracted, e.g., if 2 factors are extracted and cumulative proportion equals 100% with 6 items, then 4 items have negative eigenvalues
   • interpretability
     o at least 3 observed variables per factor with significant loadings
     o common conceptual meaning
     o measure different constructs
     o rotated factor pattern has simple structure (no cross loadings)
3) rotation - a transformation
4) interpret solution
5) calculate factor scores
6) results in a table
7) prepare results, paper

PROC FACTOR and options for EFA

    DATA=             specifies dataset to be analyzed
    PRIORS=SMC        squared multiple correlations used as adjusted diagonals of the correlation matrix
    METHOD=ML, ULS    specifies maximum likelihood and unweighted least squares methods
    ROTATE=           VARIMAX (orthogonal), PROMAX (oblique)
    SCREE             requests a scree plot of the eigenvalues
    N=                specifies number of factors
    MINEIGEN=1        specifies select factors with eigenvalues greater than 1
    OUT=              data and estimated factor scores, use raw data and N=
    FLAG=             include a flag (*) for factor loadings above a specified value
    REORDER           arranges factor loadings from largest to smallest for each factor

An example of SAS code to run EFA. PRIORS= specifies the prior communality estimate.

    proc factor method=ml   priors=smc;   * maximum likelihood factor analysis;
    proc factor method=uls  priors=smc;   * unweighted least squares factor analysis;
    proc factor method=prin priors=smc;   * principal factor analysis;

Similarities between CFA and EFA
• Both techniques are based on linear statistical models.
• Statistical tests associated with both methods are valid if certain assumptions are met.
• Both techniques assume a normal distribution.
• Both incorporate measured variables and latent constructs.

Differences between CFA and EFA

CFA requires specification of
• a model a priori
• the number of factors
• which items load on each factor
• a model supported by theory or previous research
• error explicitly

EFA
• determines the factor structure (model)
• explains a maximum amount of variance

Statistical Analysis

With background knowledge of reliability, exploratory and confirmatory factor analysis, we’re ready to proceed to the statistical analysis!

Reliability Analysis with PROC CORR (7 factors)

Data were analyzed with PROC CORR to determine the degree of internal consistency (reliability). A Cronbach alpha statistic indicates the level of reliability. Values could range from 0.0 to 1.0, with a value closer to 1.0 indicating a higher level of reliability. Reliability ranged from 0.62 to 0.88 on the total and category subscales for the sample data on the Race and Schooling School-Based Instrument. For research purposes, reliability was acceptable (see Table 1).

Table 1.
The Race and Schooling School-Based Instrument - Reliability

    Factor (category)           Items            Alpha (Standardized)
    Professional Development    1, 2, 3, 4, 5    0.80
    Relationships               6, 7, 8          0.63
    Engagement                  9, 10, 11        0.70
    Learning and Teaching       12, 13, 14       0.73
    Achievement Results         15, 16, 17       0.62
    Community                   18, 19, 20       0.63
    School Culture              21, 22, 23       0.79
    Total                       1-23             0.88

SAS Code

PROC CORR calculates the Cronbach alpha statistic for the reliability analysis. The following code calculates Cronbach alpha for the total scale (questions 1-23) and for each category subscale.

    proc corr data=rawsub2 nocorr alpha nomiss; var q1-q23;  run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q1-q5;   run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q6-q8;   run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q9-q11;  run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q12-q14; run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q15-q17; run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q18-q20; run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q21-q23; run;

Confirmatory Factor Analysis (7 factor)

Category subscale scores (factors) were created with the MEAN function. The factors were subjected to a confirmatory factor analysis (CFA) with PROC CALIS. Figure 1 illustrates the CFA model. The SAS code below tests the underlying factor structure.

    proc calis data=rawsub2 kurtosis;
       lineqs
          Professional_Development = s1 F1 + e1,
          Relationships            = s2 F1 + e2,
          Engagement               = s3 F1 + e3,
          Learning_Teaching        = s4 F1 + e4,
          AchResults               = s5 F1 + e5,
          Community                = s6 F1 + e6,
          SchoolCulture            = s7 F1 + e7;
       std
          e1-e7 = vare1-vare7,
          F1 = 1;
       var Professional_Development Relationships Engagement Learning_Teaching
           AchResults Community SchoolCulture;
    run;

Another step that could be taken in the confirmatory approach is to test the structure for each of the 7 factors. The following SAS code would test a model for the Professional Development factor.
    proc calis data=rawsub2 kurtosis; **modification;
       lineqs
          q1 = p1 F1 + e1,
          q2 = p2 F1 + e2,
          q3 = p3 F1 + e3,
          q4 = p4 F1 + e4,
          q5 = p5 F1 + e5;
       std
          e1-e5 = vare1-vare5,
          F1 = 1;
       var q1-q5;
    run;

If each factor can be confirmed, then a combined model is evaluated. If this model is confirmed, then a model including one latent construct (race and schooling) with a direct relationship to each factor is evaluated. Each factor influences responses to items. For example, the latent construct professional development influences responses to items 1, 2, 3, 4 and 5.

[Figure 1. Confirmatory Factor Analysis. Path diagram of the latent construct Race and Schooling and the seven factors: Professional Development, Relationships, Engagement, Learning and Teaching, Achievement Results, Community, School Culture.]

Confirmatory Factor Analysis Results

PROC CALIS provides the number of observations, variables, estimated parameters, and informations (related to model specification), descriptive statistics, and multivariate kurtosis. PROC CALIS also indicates the observations with the largest contribution to kurtosis (see Figure 2).

Fit Statistics

Determine criteria a priori to assess model fit and confirm the factor structure. Some of the criteria indicate acceptable model fit while others are close to meeting values for acceptable fit (see Figure 3).
• Chi-square describes similarity of the observed and expected matrices. Acceptable model fit is indicated by a chi-square probability greater than or equal to 0.05. For this CFA model, the chi-square value is not close to zero (Chi-square=35.7367) and p = 0.0011, not greater than 0.05.
• RMSEA indicates the amount of unexplained variance or residual. The RMSEA value of 0.0743 is larger than the criterion of 0.06 or less.
• CFI (0.9472), NNI (0.9208), and NFI (0.9174) values meet the criterion (0.90 or larger) for acceptable model fit.
The CALIS Procedure
Covariance Structure Analysis: Maximum Likelihood Estimation

    Observations     282      Model Terms        1
    Variables          7      Model Matrices     4
    Informations      28      Parameters        14

    Variable                       Mean     Std Dev    Skewness    Kurtosis
    Professional_Development    6.50000     1.39712    -0.44553    -0.08904
    Relationships               6.44681     1.48504    -0.22743    -0.18439
    Engagement                  6.49291     1.54008    -0.23384     0.12440
    Learning_Teaching           5.83688     1.51681    -0.07231    -0.00706
    AchResults                  7.68440     1.43037    -0.34269    -0.01039
    Community                   7.01418     1.40404    -0.02535    -0.06631
    SchoolCulture               9.32979     1.04068    -1.47618     1.39821

    Mardia's Multivariate Kurtosis               3.9089
    Relative Multivariate Kurtosis               1.0620
    Normalized Multivariate Kurtosis             2.9239
    Mardia Based Kappa (Browne, 1982)            0.0620
    Mean Scaled Univariate Kurtosis              0.0555
    Adjusted Mean Scaled Univariate Kurtosis     0.0555

    Observation Numbers with Largest Contribution to Kurtosis
         145         238          87          45          47
    455.0658    290.2460    282.2438    270.1272    203.2169

Figure 2. Descriptives and Multivariate Kurtosis

The CALIS Procedure
Covariance Structure Analysis: Maximum Likelihood Estimation

    Fit Function                                   0.1272
    Goodness of Fit Index (GFI)                    0.9621
    . . .
    Chi-Square                                    35.7367
    Chi-Square DF                                      14
    Pr > Chi-Square                                0.0011
    . . .
    RMSEA Estimate                                 0.0743
    . . .
    Bentler's Comparative Fit Index                0.9472
    . . .
    Bentler & Bonett's (1980) Non-normed Index     0.9208
    Bentler & Bonett's (1980) NFI                  0.9174
    . . .

Figure 3. Fit Statistics

For purposes of this example, 3 fit statistics indicate acceptable fit and 2 fit statistics indicate unacceptable fit. The CFA has not confirmed the factor structure. When the analysis indicates unacceptable model fit, the factor structure cannot be confirmed, and an exploratory factor analysis is the next step.

Because the factor structure is not confirmed, no further investigation of the confirmatory model (parameter estimates, variances, covariances) is necessary. Proceed with exploratory factor analysis to determine the factor structure.
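As a check on the RMSEA estimate reported in Figure 3, the value can be reproduced from the chi-square, its degrees of freedom, and the sample size. The expression below is one common definition and is assumed here rather than taken from the source (some software divides by N instead of N - 1); N = 282 is the number of observations reported in Figure 2.

```latex
\begin{align*}
% RMSEA from the model chi-square, degrees of freedom, and sample size
\text{RMSEA} &= \sqrt{\frac{\max(\chi^2 - df,\ 0)}{df\,(N - 1)}} \\
% Values from Figure 3: chi-square = 35.7367, df = 14; N = 282
&= \sqrt{\frac{35.7367 - 14}{14 \times 281}}
 = \sqrt{\frac{21.7367}{3934}}
 \approx \sqrt{0.005525} \approx 0.0743
\end{align*}
```

The result agrees with the reported estimate of 0.0743, which exceeds the 0.06 criterion.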
Exploratory factor analysis with PROC FACTOR
• method is maximum likelihood
• diagonals of the correlation matrix are equal to squared multiple correlations
• the criterion set a priori is that each factor retained will explain at least 10% of the variance

SAS Code

    proc factor data=rawsub2 method=ml priors=smc rotate=varimax reorder;
       var q1-q23;
    run;

    Preliminary Eigenvalues: Total = 19.0588542   Average = 0.82864583

          Eigenvalue    Difference    Proportion    Cumulative
     1    11.3729772     7.7310207        0.5967        0.5967
     2     3.6419565     1.5094061        0.1911        0.7878
     3     2.1325504     0.4437191        0.1119        0.8997
     4     1.6888314     0.6648680        0.0886        0.9883
     5     1.0239633     0.2676242        0.0537        1.0420
     6     0.7563391     0.0995490        0.0397        1.0817
     7     0.6567901     0.2574806        0.0345        1.1162
     8     0.3993095     0.2098389        0.0210        1.1371
     9     0.1894706     0.0503316        0.0099        1.1471
    10     0.1391390     0.0341571        0.0073        1.1544
    11     0.1049819     0.0930199        0.0055        1.1599
    12     0.0119620     0.0365710        0.0006        1.1605
    13    -0.0246090     0.0379670       -0.0013        1.1592
    14    -0.0625760     0.0487740       -0.0033        1.1560
    15    -0.1113500     0.1032831       -0.0058        1.1501
    16    -0.2146331     0.0303684       -0.0113        1.1388
    17    -0.2450015     0.0125781       -0.0129        1.1260
    18    -0.2575795     0.0750623       -0.0135        1.1125
    19    -0.3326418     0.0667751       -0.0175        1.0950
    20    -0.3994169     0.0351375       -0.0210        1.0741
    21    -0.4345544     0.0224600       -0.0228        1.0513
    22    -0.4570144     0.0630259       -0.0240        1.0273
    23    -0.5200403                     -0.0273        1.0000

    5 factors will be retained by the PROPORTION criterion.

Figure 4. Eigenvalues

Results show 5 factors. Three factors each explain at least 10% of the variance (59.67%, 19.11%, 11.19%) while 2 factors each explain less than 10% of the variance (8.86%, 5.37%). According to the criterion set a priori, that each factor retained will explain at least 10% of the variance, three factors will be retained. The analysis will be rerun specifying three factors.

    proc factor data=rawsub2 method=ml priors=smc rotate=varimax reorder n=3;
       var q1-q23;
    run;

Figure 5 indicates that both null hypotheses are rejected: that there are no common factors, and that 3 factors are sufficient.
In practice, we want to reject the first hypothesis and accept the second hypothesis. Tucker and Lewis’s Reliability Coefficient indicates good reliability. Reliability is a value between 0 and 1, with a larger value indicating better reliability.

    Significance Tests Based on 282 Observations

    Test                                   DF    Chi-Square    Pr > ChiSq
    H0: No common factors                 253     2392.4292        <.0001
    HA: At least one common factor
    H0: 3 Factors are sufficient          187      522.6155        <.0001
    HA: More factors are needed

    Chi-Square without Bartlett's Correction       542.90188
    Akaike's Information Criterion                 168.90188
    Schwarz's Bayesian Criterion                  -512.13474
    Tucker and Lewis's Reliability Coefficient       0.78776

Figure 5. Hypothesis Tests, Reliability

Results show items 21, 22, 23 as a factor, items 1, 2, 3, 4 as a factor, and items 5-20 as a factor. Two of the factors are the same as or close to the original factor structure. Items 21, 22, 23 are the School Culture factor. Items 1, 2, 3, 4, 5 are the Professional Development factor in the original factor structure. Item 5 is not included in that factor in the revised factor structure.

Interpretability

Is there some conceptual meaning for each factor? Could the factors be given a name? Following the rotated factor pattern in Figure 6, Factor1 could be called Attitudes (items 5-20). Factor2 could be called Professional Development (items 1, 2, 3, 4). Factor3 could be called School Culture (items 21, 22, 23).
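The summary statistics in Figure 5 can be checked by hand from the two chi-square tests. The formulas below are the usual definitions and are an assumption here, since the source reports only the values; the subscripts 0 (no common factors) and 3 (three-factor model) are notation introduced for this sketch, and N = 282.

```latex
\begin{align*}
% Tucker-Lewis coefficient from the two chi-square tests in Figure 5
\text{TLI} &= \frac{\chi^2_0/df_0 - \chi^2_3/df_3}{\chi^2_0/df_0 - 1}
 = \frac{2392.4292/253 - 522.6155/187}{2392.4292/253 - 1}
 \approx \frac{9.4562 - 2.7947}{9.4562 - 1} \approx 0.7878 \\
% Information criteria from the chi-square without Bartlett's correction
\text{AIC} &= 542.90188 - 2(187) = 168.90188 \\
\text{SBC} &= 542.90188 - 187 \ln(282) \approx 542.9019 - 1055.0366 \approx -512.13
\end{align*}
```

All three values agree with Figure 5 to the precision shown, which suggests the reported coefficient of 0.78776 compares the three-factor model against the no-common-factor baseline.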
    Rotated Factor Pattern

            Factor1     Factor2     Factor3
    q10     0.72350     0.22819     0.05126
    q12     0.63153     0.32713    -0.01767
    q14     0.57995     0.18075     0.03112
    q11     0.57515     0.14190    -0.06608
    q7      0.56258     0.27168     0.04450
    q13     0.55962     0.11049     0.02234
    q9      0.54358     0.07180     0.08688
    q16     0.54234     0.27697     0.04569
    q20     0.49741     0.17016     0.12907
    q8      0.49444     0.01762     0.02129
    q15     0.47474     0.15822     0.16368
    q6      0.45938     0.34502     0.08328
    q19     0.43811     0.09902     0.12647
    q5      0.42258     0.42156    -0.02872
    q17     0.37788     0.10460     0.06759
    q18     0.34111     0.08108     0.17346
    q3      0.10059     0.86961    -0.06967
    q4      0.16327     0.75458    -0.11327
    q2      0.25783     0.64261     0.05330
    q1      0.26039     0.41033     0.03917
    q22     0.07401     0.01893     0.76583
    q23     0.13707    -0.07665     0.75179
    q21     0.07095    -0.03185     0.70232

Figure 6. Factor Loadings

Reliability with PROC CORR tested the reliability for the revised factor structure. The SAS code for reliability is

    proc corr data=rawsub2 nocorr alpha nomiss; var q1-q4;   run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q5-q20;  run;
    proc corr data=rawsub2 nocorr alpha nomiss; var q21-q23; run;

Reliability with Cronbach alpha for items 21, 22, 23 is 0.79, for items 1, 2, 3, 4 is 0.79, and for items 5-20 is 0.88. The range is 0.79 to 0.88 for the revised factor structure. The range for the original factor structure is 0.62 to 0.80.

Conclusion

Confirmatory and Exploratory Factor Analysis are powerful statistical techniques. The techniques have similarities and differences. Determine the type of analysis a priori to answer research questions and maximize your knowledge.

WAM (Walk away message)

Select CFA to verify the factor structure and EFA to determine the factor structure.

References

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
Child, D. (1990). The essentials of factor analysis, second edition. London: Cassel Educational Limited.
DeVellis, R. F. (1991). Scale Development: Theory and Applications. Newbury Park, California: Sage Publications.
Hatcher, L. (1994). A step-by-step approach to using the SAS® System for factor analysis and structural equation modeling. Cary, NC: SAS Institute Inc.
Hoyle, R. H. (1995). The structural equation modeling approach: Basic concepts and fundamental issues. In Structural equation modeling: Concepts, issues, and applications, R. H. Hoyle (editor). Thousand Oaks, CA: Sage Publications, Inc., pp. 1-15.
Hu, L. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.
Joreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183-202.
Kline, R. B. (1998). Principles and Practice of Structural Equation Modeling. New York: The Guilford Press.
Nunnally, J. C. (1978). Psychometric theory, 2nd edition. New York: McGraw-Hill.
SAS® Language and Procedures, Version 6, First Edition. Cary, N.C.: SAS Institute, 1989.
SAS® Online Doc 9. Cary, N.C.: SAS Institute.
SAS® Procedures, Version 6, Third Edition. Cary, N.C.: SAS Institute, 1990.
SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 1. Cary, N.C.: SAS Institute, 1990.
SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 2. Cary, N.C.: SAS Institute, 1990.
Schumacker, R. E. & Lomax, R. G. (1996). A Beginner’s Guide to Structural Equation Modeling. Mahwah, New Jersey: Lawrence Erlbaum Associates, Publishers.
Shay, Mary. (in press). An Investigation of the Attitudes, Beliefs, and Values of Elementary School Teachers Toward Race and Schooling. Greeley CO: University of Northern Colorado.
Shay, Mary (2008). Race and Schooling Instrument.
Thorndike, R. M., Cunningham, G. K., Thorndike, R. L., & Hagen E. P. (1991). Measurement and evaluation in psychology and education. New York: Macmillan Publishing Company.
Truxillo, Catherine. (2003). Multivariate Statistical Methods: Practical Research Applications Course Notes. Cary, N.C.: SAS Institute.
Contact Information

Diana Suhr, Ph.D., Statistical Analyst
Office of Budget & Institutional Analysis
University of Northern Colorado
Greeley, CO 80639
diana.suhr@unco.edu

Mary Shay, Principal
Cottonwood Creek Elementary School
Cherry Creek School District
Denver CO 80237

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Appendix A
Race and Schooling Instrument
Revised – Shay (2008)

Professional Staff Development
1. To what extent do teachers and administrators in your school talk openly and constructively about race with each other?
2. To what extent do staff development activities help educators understand the ways in which race influences student behavior?
3. To what extent do staff development activities help educators acquire knowledge about the history and culture of various racial groups?
4. To what extent do staff development activities help educators become knowledgeable about the diverse racial perspectives on historical and current events?
5. To what extent are successful efforts being made in your school to attract, retain, and advance teachers and administrators of color?
Response scale for items 1-5: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Relationships
6. To what extent do teachers in your school talk openly and constructively about race with students?
7. To what extent do teachers encourage students to have open and constructive conversations about being victims of racial discrimination and about possessing racial power and white privilege?
8.
To what extent do teachers structure interracial cooperative groups that enable students of different racial groups to become acquainted with each other?
Response scale for items 6-8: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Engagement
9. To what extent is decision-making in the school widely shared among administrators, teachers, parents and students of a variety of racial groups?
10. To what extent do teachers encourage students of different racial groups to talk with each other openly and constructively about race?
11. To what extent are deliberate actions taken by administrators and teachers to insure that students of color are represented proportionately in extracurricular activities and leadership roles?
Response scale for items 9-11: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Learning and Teaching
12. To what extent do teachers help students acquire the knowledge and skills needed to have thoughtful, constructive, heartfelt discussions about race?
13. To what extent are students provided factual information in social studies and other subject areas that contradicts misconceptions about people of color?
14. To what extent does the school curriculum include a focus on racial power and white privilege through examples in history, art, science, and other disciplines?
Response scale for items 12-14: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Achievement Results
15. To what extent do teachers use a variety of assessment devices to ensure that students of all racial groups meet rigorous standards across the curriculum?
16.
To what extent do teachers use a variety of assessment devices to measure improved race relations between students of different racial groups?
17. To what extent are students of all racial groups consistently exposed to and supported in the most rigorous curricular opportunities?
Response scale for items 15-17: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Community
18. To what extent do teachers have regular personal contact (e.g. phone, face to face) with Black and Hispanic families to advise them on the achievement of their children?
19. To what extent are Black and Hispanic children specifically invited to become a part of school-wide activities, committees, and councils?
20. To what extent are families of Black and Hispanic families visible in school and leadership positions?
Response scale for items 18-20: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

School Culture
21. To what extent do you believe that Black and Hispanic students should perform at least as well as White students?
22. To what extent do you believe it is our responsibility as educators to make this level of achievement occur for our Black and Hispanic students?
23. To what extent do you believe that our Black and Hispanic students can actually reach this level of achievement?
Response scale for items 21-23: 1 Almost Never, 2 Seldom, 3 Sometimes, 4 Frequently, 5 Almost Always

Note: Information questions are not shown in Appendix A.