Discussion of Cronbach's Alpha on the EDSTAT-L List

I close with some messages that were posted on the EDSTAT list.

Date: Sun, 10 Jan 1999 11:28:25 -0600
Sender: owner-edstat-l@eos.ncsu.edu
From: "Kevin F. Spratt" <Kevin-Spratt@Uiowa.edu>

Perhaps the easiest way to conceptualize Cronbach's alpha is to think of it as the average of all possible split-half reliabilities for a set of items. A split-half reliability is simply the reliability between two parts of a test or instrument, where those two parts are halves of the total instrument. If you want to get technical, the reliabilities of these two halves should then be stepped up with the Spearman-Brown prophecy formula to estimate the reliability of the full-length test rather than the reliability between two half-length tests (for two halves, r_full = 2*r_half / (1 + r_half)).

Assuming, for ease of interpretation, that a test has an even number of items (e.g., 10), then items 1-5 vs. 6-10 would be one split and evens vs. odds would be another; in fact, with 10 items chosen 5 at a time, there are 252 ways to choose a half (126 distinct splits, since choosing one half determines the other). If you computed each of these split-half reliabilities and averaged them all, this average would be Cronbach's alpha. Since some splits will be better than others in terms of creating two more closely parallel halves, and the reliability between parallel halves is probably the most appropriate estimate of an instrument's reliability, Cronbach's alpha is often considered a relatively conservative estimate of the internal consistency of a test.

As for using SAS to compute coefficient alpha (Cronbach's alpha), it is an option in the PROC CORR procedure. In SAS, for a WORK data set called ONE, if you wanted the internal consistency (coefficient alpha, Cronbach's alpha) for X1-X10, the syntax is:

PROC CORR DATA=WORK.ONE ALPHA;
  VAR X1-X10;
RUN;

There are at least three important caveats to consider when computing coefficient alpha.

1. How to handle "missing" values. In achievement testing, a missing value or a not-reached value is traditionally coded as 0, or wrong. The CORR procedure in SAS does NOT treat missing as wrong. It is not difficult to write code to force this to happen, but you must write the code. In the above example you could do so as follows:

DATA WORK.ONE; SET WORK.ONE;   /* ESTABLISHING A DATA STEP SO YOU CAN MANIPULATE THE DATA */
  ARRAY X {10} X1-X10;         /* DEFINING AN ARRAY FOR THE 10 ITEMS */
  DO I=1 TO 10;
    IF X(I) = . THEN X(I) = 0; /* FOR EACH ITEM X1-X10, CHANGING MISSING VALUES (.) TO 0 */
  END;
RUN;

2. The use of the NOMISS option in the CORR procedure. This is related to point 1 above. Another way to handle missing observations is to use the NOMISS option in the CORR procedure. The syntax is as follows:

PROC CORR DATA=WORK.ONE ALPHA NOMISS;
  VAR X1-X10;
RUN;

The effect of this is to remove from the analysis every record in which at least one of the items X1-X10 is missing. Obviously, for achievement testing, and especially for speeded tests where most examinees might not be expected to complete all items, this would be a problem. The NOMISS option would restrict the analysis to the subset of examinees who completed all items, and this quite often is not the population of interest when establishing an internal consistency reliability estimate. One common approach to resolving this problem is to define a number of items that must be attempted for the record to be included, as in the sketch below.
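As a minimal sketch of that approach (the threshold of 7 attempted items is an arbitrary choice for illustration; the data set and item names are carried over from the examples above), a subsetting IF can drop sparse records before PROC CORR is run:

DATA WORK.ONE; SET WORK.ONE;
  /* N() counts the non-missing values in the list, i.e., the items attempted */
  IF N(OF X1-X10) >= 7;  /* keep only records with at least 7 of the 10 items attempted */
RUN;

PROC CORR DATA=WORK.ONE ALPHA;
  VAR X1-X10;
RUN;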
Some health status measures, for example the SF-36, have scoring rules that require that at least 50% of the items be answered for the scale to be defined. If fewer than half of the items are attempted, the scale is not interpreted. If the scale is considered valid, by THEIR definition, then all missing values on that scale are replaced by the average of the non-missing items on that scale. The SAS code to implement this scoring algorithm is summarized below, under the assumption that the scale has 10 items.

DATA WORK.ONE; SET WORK.ONE;
  ARRAY X {10} X1-X10;
  IF NMISS(OF X1-X10) > 5 THEN DO I=1 TO 10;
    X(I) = .;
  END;
  ELSE IF NMISS(OF X1-X10) <= 5 THEN DO I=1 TO 10;
    IF X(I) = . THEN X(I) = MEAN(OF X1-X10);
  END;
RUN;

Note that replacing all missing values with the average of the non-missing values, in the cases where the number of missing values is not greater than half the total number of items, will result in an inflated Cronbach's alpha. A better approach would be to remove from consideration records where fewer than 50% of the items are completed and to leave the remaining records intact, with the missing values still in. In other words, implement the first IF statement above, eliminate the ELSE IF clause, and then run PROC CORR without the NOMISS option; a sketch of this appears below. The bottom line: the NOMISS option in PROC CORR in general, and with the ALPHA option in particular, must be considered carefully.
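A minimal sketch of that better approach (data set and item names carried over from the examples above; note that whether the ALPHA option by itself excludes incomplete records may depend on the SAS release, so this follows the author's description):

DATA WORK.TWO; SET WORK.ONE;
  IF NMISS(OF X1-X10) > 5 THEN DELETE; /* drop records missing more than half the items */
RUN;

PROC CORR DATA=WORK.TWO ALPHA;         /* no NOMISS: records with some missing items remain */
  VAR X1-X10;
RUN;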
3. Making sure that all items in the set are coded in the same direction. Although 0/1 (wrong/right) coding rarely presents this problem, for Likert or other scales with more than 2 points it is not uncommon for the scale to remain constant (e.g., Strongly Agree, Agree, Disagree, Strongly Disagree) while the wording of the questions reverses the appropriate interpretation of the scale. For example:

Q1. President Clinton should be removed from office.      SA  A  D  SD
Q2. President Clinton should NOT be removed from office.  SA  A  D  SD

Any criticism of the wording, of the value of having these two questions in the same set, or of the advisability of the 4-point Likert scale defined above aside, it should be clear that the two questions are on the same scale, but the meanings of the end points are opposite. In SAS, the way to adjust for this problem is to pick the direction in which you want the scale to be coded (that is, do you want SA to be a positive statement about the President or a negative one?) and then reverse scale those items where SA reflects negatively (or positively) on the President. In the above example, SA for Q1 is a negative position relative to the President and therefore should be reverse scaled if the decision is to scale so that SA implies positive attitudes.

If the coding of the 4-point Likert scale was SA-0, A-1, D-2, SD-3, then the item would be reverse scaled as follows: Q1 = 3-Q1 (in this way 0 becomes 3-0 = 3; 1 becomes 3-1 = 2; 2 becomes 3-2 = 1; and 3 becomes 3-3 = 0). If the coding was SA-1, A-2, D-3, SD-4, then the item would be reverse scaled as follows: Q1 = 5-Q1 (in this way 1 becomes 5-1 = 4; 2 becomes 5-2 = 3; 3 becomes 5-3 = 2; and 4 becomes 5-4 = 1).

From the earlier example, if items X1, X3, X5, X7, and X9 need to be reverse scaled before computing an internal consistency estimate, then the following SAS code would do the job, assuming the 4-point Likert scale illustrated above with 1-4 scoring.

DATA WORK.ONE; SET WORK.ONE;  /* ESTABLISHING A DATA STEP SO YOU CAN MANIPULATE THE DATA */
  ARRAY X {10} X1-X10;        /* DEFINING AN ARRAY FOR THE 10 ITEMS */
  DO I=1,3,5,7,9;             /* INDICATING WHICH ITEMS IN THE ARRAY ARE TO BE REVERSE SCALED */
    X(I) = 5-X(I);            /* REVERSE SCALING FOR 1-4 CODING OF THE 4-POINT LIKERT SCALE */
  END;
RUN;

It should be noted that some of the output from PROC CORR with the ALPHA option, such as each item's correlation with the total and the internal consistency estimate for the scale with each individual item removed, provides very useful diagnostics that should alert the researcher to poorly functioning items or to items that were missed when considering reverse scaling. An item that correlates negatively with the total usually needs to be reverse scaled or is poorly formed.

I hope this longer-than-intended summary of coefficient alpha (Cronbach's alpha) and some SAS coding issues has been helpful.

___________________________________________________________
Kevin F. Spratt, Ph.D.
Iowa Testing Programs & Spine Diagnostic & Treatment Center & Iowa Spine Research Center
224-D Lindquist Center, University of Iowa, Iowa City, Iowa 52242
(319) 335-5572 (voice)  (319) 335-6399 (fax)
Kevin-Spratt@Uiowa.edu (e-mail)

================================================================
Date: Mon, 11 Jan 1999 11:41:52 +1100
From: Paul Gardner <Paul.Gardner@Education.monash.edu.au>
To: edstat-l@jse.stat.ncsu.edu
Subject: Re: Kronbach's Alph

Dennis Roberts wrote about Cronbach's alpha:

> essentially ... it goes something like this. the more a set of items
> INTERcorrelate with one another, the HIGHER will be cronbachs alpha ...
> this is not exactly it but, good enough for government work

This is perfectly true, but the problem is that this assertion is frequently misinterpreted. Naive students of psychometrics often interpret such assertions by arguing the converse, i.e., that if Cronbach's alpha is high, this "proves" that the items all intercorrelate with each other. In the words of the classic Gershwin song, it ain't necessarily so. This is a wonderful illustration of the fallacy of affirming the consequent, i.e., the fallacy that results from arguing: if p then q; therefore if q then p. Cronbach's alpha can be quite high if there are separate clusters of items which intercorrelate well within each cluster, even if the clusters don't correlate with other clusters. Alpha can reach high levels as long as every item in a scale correlates well with SOME other items, but not necessarily ALL of them. I interpret Cronbach's alpha as an indicator of the relative absence of item error variance. It is NOT an indicator of unidimensionality.

(Dr) Paul Gardner

=================================================================
Dr Paul Gardner, Reader in Education and Director, Research Degrees,
Faculty of Education, Monash University, Clayton, Victoria, Australia 3168.
Tel: Int+ 61 3 9905 2854; fax Int+ 61 3 9905 2779
Home: 61 3 9578 4724; mobile: 0412 275 623
Email: paul.gardner@education.monash.edu.au
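Gardner's point is easy to demonstrate. The following sketch (entirely made-up data; the cluster sizes, loadings, sample size, and seed are arbitrary choices for illustration) simulates two five-item clusters that correlate well internally but not with each other, and then computes alpha for the combined ten items:

DATA WORK.SIM;
  CALL STREAMINIT(1999);   /* arbitrary seed for reproducibility */
  ARRAY A {5} X1-X5;
  ARRAY B {5} X6-X10;
  DO REC = 1 TO 1000;
    F1 = RAND('NORMAL');   /* factor behind items X1-X5 */
    F2 = RAND('NORMAL');   /* independent factor behind items X6-X10 */
    DO I = 1 TO 5;
      A(I) = F1 + 0.5*RAND('NORMAL');
      B(I) = F2 + 0.5*RAND('NORMAL');
    END;
    OUTPUT;
  END;
  KEEP X1-X10;
RUN;

PROC CORR DATA=WORK.SIM ALPHA NOMISS;
  VAR X1-X10;
RUN;

With these settings the within-cluster correlations are about .8 and the between-cluster correlations are about 0, giving an average inter-item correlation around .36 and an alpha in the mid .80s even though the scale is plainly two-dimensional.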
================================================================
Date: Sat, 03 Apr 1999 23:33:47 -0800
From: "Hoi K. Suen" <HoiSuen@psu.edu>
To: edstat-l@jse.stat.ncsu.edu
Subject: Cronbach and Standardized item alpha

Dennis Roberts forwarded the note below to me. Not being on this listserv, I am not sure what the context of the discussion was, nor what precisely Rich Ulrich meant by "coefficient to describe an actual, computed score". If we wish to be very precise about it:

Standardized item alpha makes the strict classically parallel tests assumptions and is an application of the Spearman-Brown formula to K one-item parallel tests. It is theoretically equivalent to the average of all Spearman-Brown-corrected reliability estimates from all possible splits in a series of applications of the split-half method on a single test.

Cronbach alpha, on the other hand, requires only the less restrictive essentially tau-equivalent assumptions. Numerically, it is less than or equal to standardized item alpha. It is also numerically equivalent to the G-coefficient in generalizability theory from a single-item-facet, crossed-design, relative-model D-study in which the number of items is defined as that of the instrument, and it is numerically equivalent to Hoyt's intraclass correlation.

Standardized item alpha is more appropriate if we accept the classically parallel tests assumptions. Cronbach alpha requires less restrictive assumptions and may be more practical.

The issue of "accuracy of internal consistency" is not relevant here, since the concept of internal consistency is different from that of a reliability coefficient. Internal consistency is best viewed either as a strategy to estimate reliability coefficients or as a measure of item interchangeability. The former is a method, not a coefficient. The latter is not a reliability coefficient, given that a reliability coefficient is defined as true variance/(true variance + error variance), with the focus on scores, not items. (This is similar to the idea that interrater reliability does not describe the reliability of rating scores, which has been discussed by others elsewhere.)

(My own opinion is that a reliability coefficient is not that important a statistic to begin with. Judgments of the magnitude of a reliability coefficient and concerns about the accuracy of its estimation are misplaced emphases. It was either Lindquist or Thorndike who said that if we must make decisions about an individual, we will do so with the best available information, no matter what the reliability coefficient is, provided that it is better than zero.)

Hoi Suen

> dennis roberts (dmr@psu.edu) wrote:
> : from hoi suen's principles of test theories ... erlbaum 1990, page 34/35
> :
> : if you find the average of the intercorrelations amongst all the items and
> : then use the spearman brown prophecy formula to estimate the reliability of
> : a k length test ... you have the standardized alpha (laborious to calculate
> : but, most accurate for internal consistency)
> :
> : if you use what typically we see ... (k/(k-1)) * (1 - (sum of item variances /
> : total test variance)) ... you get the most common form of cronbachs alpha ...
> : which is a lower bound for standardized alpha
>
> - that is either an erroneous or a sloppy phrase, "most accurate for
> internal consistency"; if he encourages you to apply that coefficient
> to describe an actual, computed score -- he is absolutely wrong.
>
> So, the standardized alpha is an *upper bound* to the practical
> alpha? That sounds reasonable.
>
> Rich Ulrich, biostatistician  wpilib+@pitt.edu
> http://www.pitt.edu/~wpilib/index.html  Univ. of Pittsburgh

_____________________
Hoi K. Suen
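To make the arithmetic in the quoted passage concrete, here is a minimal sketch (data set and item names carried over from the earlier examples; note that PROC CORR's ALPHA option already prints both the raw and the standardized coefficients, so this only makes the computation explicit) that averages the inter-item correlations and steps the average up with the Spearman-Brown prophecy formula:

PROC CORR DATA=WORK.ONE OUTP=WORK.CORRS NOPRINT;
  VAR X1-X10;
RUN;

DATA _NULL_;
  SET WORK.CORRS END=LAST;
  ARRAY X {10} X1-X10;
  RETAIN TOTAL 0;
  IF _TYPE_ = 'CORR' THEN DO I=1 TO 10;      /* accumulate all entries of the correlation matrix */
    TOTAL + X(I);
  END;
  IF LAST THEN DO;
    K = 10;
    RBAR = (TOTAL - K) / (K*(K-1));          /* average off-diagonal (inter-item) correlation */
    STDALPHA = (K*RBAR) / (1 + (K-1)*RBAR);  /* Spearman-Brown stepped up to a k-item test */
    PUT 'Average inter-item r: ' RBAR 6.4 '  Standardized alpha: ' STDALPHA 6.4;
  END;
RUN;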
Professor of Educational Psychology
103 CEDAR, Penn State University
University Park, PA 16802-3108
Email: HoiSuen@psu.edu
Phone: 814-865-2235  Fax: 814-863-1002
Website: http://espse.ed.psu.edu/suen/suen.htm

================================================================
Date: Sun, 4 Apr 1999 08:52:17 -0500
From: "Rick Bello" <spcd-rsb@nich-nsunet.nich.edu>
To: <edstat-l@jse.stat.ncsu.edu>
Subject: Re: standardized alpha

Dale, Dennis, Lazar, etc.:

Thanks to all who responded to my request for information about the differences between alpha and standardized item alpha. You guys were a great help. By the way, the crux of my problem was that while it is *generally* true that (as Dale said) there is very little substantial difference between the two, in my case the standardized alpha was producing a result five points higher than alpha. While I would like to have legitimately used standardized alpha, the information on Hoi Suen's web site convinced me (along with some additional tinkering in SPSS) that the scale I was using did not meet the *parallel* test assumptions that underlie standardized alpha. Specifically, one analysis in SPSS showed that the item variances were significantly different from one another, which would preclude (I think) any reliability analysis that includes parallel test assumptions. Therefore, I decided to report alpha only. Please, someone correct me if you think my reasoning here is wrong! This involves my dissertation, and I have very limited time for turning in the final, revised copy.

Thanks again for your help.

Rick Bello