Discussion of Cronbach’s Alpha On the EDSTAT-L List
I close with some messages that were posted on the EDSTAT list.
Date: Sun, 10 Jan 1999 11:28:25 -0600
Sender: owner-edstat-l@eos.ncsu.edu
From: "Kevin F. Spratt" <Kevin-Spratt@Uiowa.edu>
Perhaps the easiest way to conceptualize Cronbach's Alpha is to think of it as the
average of all possible split half reliabilities for a set of items. A split half reliability is
simply the reliability between two parts of a test or instrument where those two parts are
halves of the total instrument. If you want to get technical, the reliability between these two
halves should then be stepped up (Spearman-Brown Prophecy Formula) to estimate the
reliability for the full-length test rather than the reliability between two half-length tests.
Assuming, for ease of interpretation, that a test has an even number of items (e.g., 10),
then items 1-5 vs. 6-10 would be one split, evens vs. odds would be another and, in
fact, with 10 items taken 5 at a time, there are 126 distinct ways to split this test into
two halves (10 choose 5 = 252 half-tests, with each split counted twice). If
you computed each of these split half reliabilities and averaged them all, this average
would be Cronbach's Alpha. Since some splits will be better than others in terms of
creating two more closely parallel halves, and the reliability between parallel halves is
probably the most appropriate estimate of an instrument's reliability, Cronbach's alpha is
often considered a relatively conservative estimate of the internal consistency of a test.
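To make the split-half idea concrete, here is a minimal sketch (not part of the original post), assuming a WORK data set called ONE with ten items X1-X10 and complete responses: it computes one split-half correlation (odd items vs. even items) and steps it up with the Spearman-Brown formula, r_full = 2r/(1+r). Averaging such split-half reliabilities over all possible splits yields Cronbach's alpha, as described above. The data set names HALVES and SPLIT are invented for the illustration.
DATA WORK.HALVES; SET WORK.ONE;
ODD = SUM(X1, X3, X5, X7, X9);
/* SCORE ON THE ODD-ITEM HALF */
EVEN = SUM(X2, X4, X6, X8, X10);
/* SCORE ON THE EVEN-ITEM HALF */
RUN;
PROC CORR DATA=WORK.HALVES OUTP=WORK.SPLIT NOPRINT;
/* WRITE THE CORRELATION BETWEEN THE TWO HALVES TO A DATA SET */
VAR ODD EVEN;
RUN;
DATA _NULL_; SET WORK.SPLIT;
IF _TYPE_ = 'CORR' AND UPCASE(_NAME_) = 'ODD';
R_HALF = EVEN;
/* CORRELATION BETWEEN THE TWO HALF-TEST SCORES */
R_FULL = 2*R_HALF / (1 + R_HALF);
/* SPEARMAN-BROWN STEPPED-UP ESTIMATE FOR THE FULL-LENGTH TEST */
PUT 'SPLIT-HALF R = ' R_HALF ' STEPPED-UP RELIABILITY = ' R_FULL;
RUN;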
As for the use of SAS for computing coefficient alpha or Cronbach's Alpha, it is an
option in the PROC CORR procedure. In SAS, for a WORK data set called ONE, if you
wanted the internal consistency or coefficient alpha or Cronbach's alpha for x1-x10, the
syntax is:
PROC CORR DATA=WORK.ONE ALPHA;
VAR X1-X10;
RUN;
There are at least three important caveats to consider when figuring coefficient alpha.
1. How to handle "missing" values. In achievement testing, a missing value or a
not-reached value is traditionally coded as 0 or wrong. The CORR procedure in SAS
DOES NOT treat missing as wrong. It is not difficult to write code to force this to
happen, but you must write the code.
In the above example you could do so as follows:
DATA WORK.ONE;SET WORK.ONE;
/* ESTABLISHING A DATA STEP SO YOU CAN MANIPULATE THE DATA */
ARRAY X {10} X1-X10;
/* DEFINING AN ARRAY FOR THE 10 ITEMS */
DO I=1 TO 10;
IF X(I) = . THEN X(I) = 0;
/* FOR EACH ITEM X1-X10 CHANGING MISSING VALUES (.) TO 0 */
END; RUN;
2. The use of the NOMISS option in the CORR procedure. This is related
to point 1 above.
Another way to handle missing observations is to use the NOMISS
option in the CORR procedure. The syntax is as follows:
PROC CORR DATA=WORK.ONE ALPHA NOMISS; VAR X1-X10; RUN;
The effect of this is to drop from the analysis any record
where at least one of the items X1-X10 is missing. Obviously, for achievement
testing, especially for speeded tests, where most examinees might not be expected to
complete all items, this would be a problem. The use of the NOMISS option would
restrict the analysis to the subset of examinees who did complete all items and this
quite often would not be the population of interest when wishing to establish an internal
consistency reliability estimate.
One common approach to resolving this problem might be to define a number of
items that must be attempted for the record to be included. Some health status
measures, for example the SF-36, have scoring rules that require that at least 50% of
the items be answered for the scale to be defined. If fewer than half of the items
are attempted, then the scale is not interpreted. If the scale is considered valid, by
THEIR definition, then all missing values on that scale are replaced by the average of
the non-missing items on that scale. The SAS code to implement this scoring algorithm
is summarized below, under the assumption that the scale has 10 items.
DATA WORK.ONE;SET WORK.ONE; ARRAY X {10} X1-X10;
IF NMISS(OF X1-X10) > 5 THEN DO I=1 TO 10; X(I) = .; END;
ELSE IF NMISS(OF X1-X10) <= 5 THEN DO I=1 TO 10;
IF X(I) =. THEN X(I) = MEAN(OF X1-X10); END; RUN;
Note that replacing all missing values with the average of the
non-missing values in the cases where the number of missing values is not
greater than half of the total number of items will result in an inflated
Cronbach's alpha. A better approach would be to remove from consideration
records where fewer than 50% of the items are completed and to leave the
remaining records intact, with the missing values still in. In other words, to
implement that first IF statement above, but to eliminate the ELSE IF clause and
then to run the PROC CORR without the NOMISS option. The bottom line: the
NOMISS option in PROC CORR in general, and with the ALPHA option in
particular, must be considered carefully.
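A minimal sketch of that recommended approach (not from the original post; WORK.TWO is a hypothetical data set name), assuming the same WORK.ONE data set with items X1-X10: keep only records with at least half the items answered, leave the remaining missing values alone, and run PROC CORR with ALPHA but without NOMISS.
DATA WORK.TWO; SET WORK.ONE;
IF NMISS(OF X1-X10) > 5 THEN DELETE;
/* DROP RECORDS WITH FEWER THAN HALF THE ITEMS ANSWERED; LEAVE OTHER MISSING VALUES IN */
RUN;
PROC CORR DATA=WORK.TWO ALPHA;
/* NO NOMISS OPTION, PER THE RECOMMENDATION ABOVE */
VAR X1-X10;
RUN;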
3. Making sure that all items in the set are coded in the same direction.
Although this is rarely a problem with 0/1 (wrong/right) coding, for Likert or other
scales with more than 2 points, it is not uncommon for the response scale to remain
constant (e.g., Strongly Agree, Agree, Disagree, Strongly Disagree) while the wording
of the questions reverses the appropriate interpretation of the scale. For example,
Q1. President Clinton should be removed from office. SA A D SD
Q2. President Clinton should NOT be removed from office. SA A D SD
Setting aside any criticism of the wording, of the value of having these two questions in the
same set, or of the advisability of the 4-point Likert scale defined above, it should be
clear that the two questions are on the same scale, but the meanings of the end points
are opposite.
In SAS, the way to adjust for this problem is to pick the direction in which you want
the scale to be coded (that is, do you want SA to be a positive statement about the
President or a negative one?) and then reverse scale those items where SA reflects
negatively (or positively) on the President. In the above example, SA for Q1 is a
negative position relative to the President and, therefore, should be reverse scaled if the
decision is to scale so that SA implies positive attitudes.
If the coding of the 4-point Likert Scale was SA-0, A-1, D-2, SD-3, then the item
will be reverse scaled as follows:
Q1 = 3-Q1 (in this way 0 becomes 3-0 = 3;
1 becomes 3-1 = 2;
2 becomes 3-2 = 1; and
3 becomes 3-3 = 0).
If the coding of the 4-point Likert Scale was SA-1, A-2, D-3, SD-4, then the item
will be reverse scaled as follows:
Q1 = 5-Q1 (in this way 1 becomes 5-1 = 4;
2 becomes 5-2 = 3;
3 becomes 5-3 = 2; and
4 becomes 5-4 = 1).
From the earlier example, if items X1, X3, X5, X7, and X9 needed to be
reverse scaled before computing an internal consistency estimate, then the following
SAS code would do the job, assuming the 4-point Likert scale illustrated above with
1-4 scoring.
DATA WORK.ONE;SET WORK.ONE;
/* ESTABLISHING A DATA STEP SO YOU CAN MANIPULATE THE DATA */
ARRAY X {10} X1-X10;
/* DEFINING AN ARRAY FOR THE 10 ITEMS */
DO I=1,3,5,7,9;
/* INDICATING WHICH ITEMS IN THE ARRAY TO BE REVERSE SCALED */
X(I) = 5-X(I);
/* REVERSE SCALING FOR 1-4 CODING OF 4-POINT LIKERT SCALE */
END; RUN;
It should be noted that some of the output from PROC CORR with the ALPHA
option, such as the correlation of each item with the total and the internal consistency
estimate for the scale with each individual item removed, provides very
useful diagnostics that should alert the researcher to either poorly functioning items
or items that were missed when considering reverse scaling. An item that correlates
negatively with the total usually needs to be reverse scaled or is poorly formed.
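As a reminder of where those diagnostics appear (a sketch, not from the original post; the table and column labels cited in the comments are from memory and may vary across SAS releases):
PROC CORR DATA=WORK.ONE ALPHA;
VAR X1-X10;
/* IN THE OUTPUT, THE "CRONBACH COEFFICIENT ALPHA WITH DELETED VARIABLE" TABLE
   SHOWS EACH ITEM'S CORRELATION WITH THE TOTAL AND THE ALPHA THAT WOULD RESULT
   IF THAT ITEM WERE DROPPED; A NEGATIVE CORRELATION WITH TOTAL FLAGS AN ITEM
   THAT MAY NEED REVERSE SCALING OR REVISION */
RUN;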
I hope this longer-than-intended summary of coefficient Alpha (Cronbach's Alpha)
and some SAS coding issues has been helpful.
___________________________________________________________
Kevin F. Spratt, Ph.D.
Iowa Testing Programs &
Spine Diagnostic & Treatment Center &
Iowa Spine Research Center
224-D Lindquist Center
University of Iowa
Iowa City, Iowa 52242
(319) 335-5572 (voice)
(319) 335-6399 (fax)
Kevin-Spratt@Uiowa.edu (e-mail)
================================================================
Date: Mon, 11 Jan 1999 11:41:52 +1100
From: Paul Gardner <Paul.Gardner@Education.monash.edu.au>
To: edstat-l@jse.stat.ncsu.edu,
Subject: Re: Kronbach's Alph
Dennis Roberts wrote about Cronbach's alpha:
> essentially ... it goes something like this. the more a set of items
> INTERcorrelate with one another, the HIGHER will be cronbachs alpha ...
> this is not exactly it but, good enough for government work
>
This is perfectly true, but the problem is that this assertion is frequently misinterpreted.
Naive students of psychometrics often interpret such assertions by arguing the
converse, i.e. that if Cronbach's alpha is high, this "proves" that the items all
intercorrelate with each other. In the words of the classic Gershwin song, it ain't
necessarily so. This is a wonderful illustration of the fallacy of affirming the consequent:
i.e. the fallacy that results from arguing that if p then q, therefore if q then p. Cronbach's
alpha can be quite high if there are separate clusters of items which intercorrelate well
within each cluster, even if the clusters don't correlate with each other. Alpha can
reach high levels as long as every item in a scale correlates well with SOME other
items, but not necessarily ALL of them. I interpret Cronbach's alpha as an indicator of
the relative absence of item error variance. It is NOT an indicator of unidimensionality.
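A small simulation makes this concrete (my own sketch, not part of Gardner's post; the data set and variable names are invented, and the RAND function requires a recent SAS release): ten items form two clusters of five, each cluster driven by its own latent trait, with the two traits independent of each other. Alpha for the full ten-item set still comes out high (roughly .85 with these settings), even though items in different clusters are essentially uncorrelated.
DATA WORK.CLUSTERS;
CALL STREAMINIT(1999);
ARRAY X {10} X1-X10;
DO PERSON = 1 TO 1000;
F1 = RAND('NORMAL');
/* LATENT TRAIT BEHIND ITEMS X1-X5 */
F2 = RAND('NORMAL');
/* INDEPENDENT LATENT TRAIT BEHIND ITEMS X6-X10 */
DO I = 1 TO 5; X(I) = F1 + 0.5*RAND('NORMAL'); END;
DO I = 6 TO 10; X(I) = F2 + 0.5*RAND('NORMAL'); END;
OUTPUT;
END;
KEEP X1-X10;
RUN;
PROC CORR DATA=WORK.CLUSTERS ALPHA NOSIMPLE;
/* ALPHA IS HIGH EVEN THOUGH THE TWO CLUSTERS ARE UNRELATED TO EACH OTHER */
VAR X1-X10;
RUN;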
(Dr) Paul Gardner
=================================================================
Dr Paul Gardner,
Reader in Education and Director, Research Degrees,
Faculty of Education, Monash University,
Clayton, Victoria, Australia 3168.
Tel: Int+ 61 3 9905 2854; fax Int+ 61 3 9905 2779
Home: 61 3 9578 4724; mobile: 0412 275 623
Email: paul.gardner@education.monash.edu.au
================================================================
Date: Sat, 03 Apr 1999 23:33:47 -0800
From: "Hoi K. Suen" <HoiSuen@psu.edu>
To: edstat-l@jse.stat.ncsu.edu
Subject: Cronbach and Standardized item alpha
Dennis Roberts forwarded the note below to me. Not being on this listserv, I am not
sure what the context of the discussion was nor what precisely Rich Ulrich meant by
"coefficient to describe an actual, computed score". If we wish to be very precise about
it -Standardized item alpha makes strictly classically parallel tests assumptions and is an
application of the Spearman-Brown formula to K one-item parallel tests. It is
theoretically equivalent to the average of all Spearman-Brown-corrected reliability
estimates from all possible splits in a series applications of the split half method on a
single test.
Cronbach alpha, on the other hand, requires only the less restrictive essentially
tau-equivalent assumption. Numerically, it is less than or equal to standardized item
alpha. It is also numerically equivalent to the G-coefficient in generalizability theory
from a single-item-facet, crossed-design, relative-model D-study in which the number of
items is defined as that of the instrument. It is also numerically equivalent to Hoyt's
intraclass correlation.
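In symbols (a standard statement of the two coefficients being contrasted, not taken from either post), with k items, item variances s_i^2, total-score variance s_T^2, and average inter-item correlation \bar{r}:
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right), \qquad
\alpha_{\text{standardized}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
The second expression is simply the Spearman-Brown formula applied to the average inter-item correlation, which is why it rests on the classically parallel tests assumption noted above.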
Standardized item alpha is more appropriate if we accept the classically parallel tests
assumptions. Cronbach alpha requires less restrictive assumptions and may be more
practical. The issue of "accuracy of internal consistency" is not relevant to the issue
here since the concept of internal consistency is different from that of a reliability
coefficient. Internal consistency is best viewed as either a strategy to estimate reliability
coefficients or a measure of item interchangeability. The former is a method, not a
coefficient. The latter is not a reliability coefficient, given that a reliability coefficient is
defined as true variance / (true variance + error variance), with the focus on scores, not
items. (This is similar to the idea that interrater reliability does not describe the reliability
of rating scores, which has been discussed by others elsewhere.)
(My own opinion is that a reliability coefficient is not that important of a statistic to begin
with. Judgments of the magnitude of a reliability coefficient and concerns about the
accuracy of its estimation are misplaced emphases. It was either Lindquist or Thorndike
who said that if we must make decisions about an individual, we'll do so with the best
available information no matter what the reliability coefficient is, provided that it is better
than zero.)
Hoi Suen
>dennis roberts (dmr@psu.edu) wrote:
>: from hoi suen's principles of test theories ... erlbaum 1990, page 34/35
>
>: if you find the average of the intercorrelations amongst all the items and
>: then use the spearman brown prophecy formula to estimate the reliability of
>: a k length test ... you have the standardized alpha (laborious to calculate
>: but, most accurate for internal consistency)
>
>: if you use what typically we see ... k/(k-1) * (1 - (sum of item variances / total test
>: variance)) ... you get the most common form of cronbachs alpha ... which is a
>: lower bound for standardized alpha
>
>
> - that is either an erroneous or a sloppy phrase, "most accurate for
>internal consistency"; if he encourages you to apply that coefficient
>to describe an actual, computed score -- he is absolutely wrong.
>
>So, the standardized alpha is an *upper bound* to the practical
>alpha? That sounds reasonable.
>
>
>->Rich Ulrich, biostatistician wpilib+@pitt.edu
>http://www.pitt.edu/~wpilib/index.html Univ. of Pittsburgh
_____________________
Hoi K. Suen
Professor of Educational Psychology
103 CEDAR, Penn State University
University Park, PA 16802-3108
Email: HoiSuen@psu.edu
Phone: 814-865-2235
Fax: 814-863-1002
Website: http://espse.ed.psu.edu/suen/suen.htm
================================================================
Date: Sun, 4 Apr 1999 08:52:17 -0500
From: "Rick Bello" <spcd-rsb@nich-nsunet.nich.edu>
To: <edstat-l@jse.stat.ncsu.edu>
Subject: Re: standardized alpha
Dale, Dennis, Lazar, etc:
Thanks to all who responded to my request for information about the differences
between alpha and stand. item alpha. You guys were a great help.
By the way, the crux of my problem was that while it is *generally* true that (as Dale
said) there is very little substantial difference between the two, in my case the stand.
alpha was producing a result five points higher than alpha. While I would like to have
legitimately used stand. alpha, the info. on Hoi Suen's web site convinced me (along
with some additional tinkering on SPSS) that the scale I was using did not meet the
*parallel* test assumptions that underlie stand. alpha. Specifically, one analysis on
SPSS showed that item variances were sig. diff. from one another, which would
preclude (I think) any reliability analysis that includes parallel test assumptions.
Therefore, I decided to report alpha only. Please, someone correct me if you think my
reasoning here is wrong! This involves my dissertation, and I have very limited
time for turning in the final, revised copy.
Thanks again for your help.
Rick Bello