Reliability analysis

HANDOUT ON RELIABILITY
Reliability refers to the consistency and stability in the results of a test or scale. A test is
said to be reliable if it yields similar results in repeated administrations when the attribute being
measured is believed not to have changed in the interval between measurements, even though the
test may be administered by different people or alternative forms of the test may be used. For
example, if you weighed yourself twice consecutively and the scale read 130 lbs. the first time
and 140 lbs. the second time, we would say that the scale was an unreliable measure of weight.
In addition, to be reliable, an instrument or test must be confined to measuring a single construct
and only one dimension. For example, if a questionnaire designed to measure anxiety
simultaneously measured depression, the instrument would not be a reliable measure of anxiety.
A reliable instrument or test must meet two conditions: it must have a small random error; and it
must measure a single dimension.
One major source of inconsistency in test results is random measurement error. A primary concern of test developers and test users is therefore to determine the extent to which random measurement errors influence test performance. The classical true score model provides a useful theoretical framework for defining reliability and for developing practical reliability investigations. In the classical true score model, an examinee’s or a subject’s observed score on a particular test is viewed as a random sample of one from the many possible test scores that a person could have earned under repeated administrations of the same test; and the
observed score (X) is envisioned as the composite of two hypothetical components - a true score
(T) and a random error component (E). T is defined as the expected value of the examinee’s test
scores over many repeated testings with the same test and E is the discrepancy between an
examinee’s observed score and his/her true score. The following equation summarizes the
relationship between X, T and E:
X=T+E
An important question that follows from the above is: how closely related are the examinees’ true and observed scores on a particular test or instrument? Based on the classical true score model,[1] two indices are derived to measure the relationship between true and observed scores.
1. Reliability coefficient: defined as the correlation between parallel measures,[2] denoted $\rho_{XX'}$. This coefficient can be shown to equal the ratio $\sigma_T^2/\sigma_X^2$, the proportion of observed score variance due to true score variance.
2. Reliability index: defined as the correlation between true and observed scores on a single measure, $\rho_{XT}$, which is equivalent to $\sigma_T/\sigma_X$ (the square root of the reliability coefficient).

[1] “X = T + E” is only one of the assumptions of classical true score theory. Please consult texts on measurement/test theory for the other assumptions of the model, as well as for how the reliability coefficient and the reliability index are derived from it.
[2] According to classical true score theory, two measures/tests are defined as parallel when 1) each examinee or subject has the same true score on both measures/tests, and 2) the error variances of the two measures/tests are equal. Based on this definition, it is sensible to assume that parallel tests are matched in content.

Dr. Robert Gebotys 2003

However, in reality we rarely know the true scores. In addition, the reliability coefficient defined above is purely a theoretical concept, because it is not possible to verify that two tests are truly parallel. The reliability of a test therefore has to be estimated using other methods.

Methods of Estimating Reliability:
The methods of estimating reliability can be roughly divided into two groups: methods that require two separate test administrations, and methods that use a single test administration.

1. Methods Requiring Two Separate Test Administrations:
a. Test-Retest Method
The test-retest method yields a reliability estimate, $r_{12}$, based on testing the same examinees/subjects twice with the same test/scale and then correlating the results. If each examinee/subject receives exactly the same observed score on the second testing as he/she did on the first, and if there is some variance in the observed scores among examinees/subjects, then the correlation is 1.0, indicating perfect reliability. The correlation coefficient obtained from this test-retest procedure is called the coefficient of stability. It measures how consistently examinees/subjects respond to the test/scale at different times.
b. Alternate-Forms Method
This method involves constructing two similar forms of a test/scale (i.e. both forms have the same content) and administering both forms to the same group of examinees within a very short time period. The correlation between observed scores on the alternate test/scale forms ($r_{XY}$, computed using the Pearson product-moment formula) is an estimate of the reliability of either one of the alternate forms. This correlation coefficient is known as the coefficient of equivalence.
c. Test-Retest with Alternate Forms Method
This method is a combination of the test-retest and alternate-forms methods. In this case, the procedure is to administer form 1 of the test/scale, wait, and then administer form 2. The correlation coefficient between the two sets of observed scores is an estimate of the reliability of either one of the alternate forms. It is known as the coefficient of stability and equivalence.
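True scores are never observed in practice. The link between a correlational estimate such as $r_{12}$ and the theoretical ratio $\sigma_T^2/\sigma_X^2$ can therefore only be illustrated by simulation. The hypothetical sketch below (Python, standard library only) makes the following assumptions:
- True scores are normally distributed.
- Two forms are generated as X1 = T + E1 and X2 = T + E2, with independent errors of equal variance, so the forms are parallel by construction.
- The mean, variances, sample size and seed are all arbitrary choices.

```python
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_parallel_forms(n=20000, var_t=100.0, var_e=25.0, seed=1):
    """Simulate two parallel measures X1 = T + E1 and X2 = T + E2.

    All parameter values are arbitrary illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, var_t ** 0.5) for _ in range(n)]
    # Same true score on both forms; independent errors with equal variance.
    x1 = [t + rng.gauss(0.0, var_e ** 0.5) for t in true_scores]
    x2 = [t + rng.gauss(0.0, var_e ** 0.5) for t in true_scores]
    r12 = pearson_r(x1, x2)              # what a test-retest/alternate-forms study computes
    rho_xx = var_t / (var_t + var_e)     # theoretical sigma_T^2 / sigma_X^2
    rho_xt = pearson_r(true_scores, x1)  # reliability index; T is known only in a simulation
    return r12, rho_xx, rho_xt


r12, rho_xx, rho_xt = simulate_parallel_forms()
print(f"r12 = {r12:.3f}  sigma_T^2/sigma_X^2 = {rho_xx:.3f}  rho_XT = {rho_xt:.3f}")
```

With these settings $\sigma_T^2/\sigma_X^2 = 100/125 = .80$. The simulated test-retest correlation should therefore be close to .80, and the correlation between true and observed scores close to $\sqrt{.80} \approx .89$, the reliability index.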
2. Methods Using One Test Administration:
There are many situations in which a single form of a test/scale is administered only once to a group of examinees/subjects. The following are methods of estimating reliability based on scores from a single test administration. These methods focus mainly on how consistently the examinees/subjects performed or scored across items or subsets of items on this single test/scale form. The reliability estimates generated by these methods are usually called coefficients of internal consistency.
These methods of estimating reliability are based on the argument that if the scores of the subjects/examinees are consistent across items or subsets of items on the single test/scale form, then it is reasonable to think that these items or subsets of items came from the same content domain and were constructed according to the same specifications. In addition, if the examinees/subjects’ performance is consistent across subsets of items within a test/scale, the test/scale administrator can also have some confidence that this performance would generalize to other possible items in the content domain.
a. Reliability Estimates Based on Item Variances: Calculation of Cronbach’s Alpha
This is the most widely used method of estimating reliability from a single test administration. Cronbach’s alpha ($\alpha$) is calculated with the following formula:
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)$$
where $k$ is the number of items on the test/scale, $\sigma_i^2$ is the variance of item $i$, and $\sigma_X^2$ is the total test variance. Cronbach’s $\alpha$ can be conceived as the average of all the possible split-half reliabilities estimated on the single test/scale, specifically split-half coefficients computed with the formula for coefficient $\alpha$ for split halves. The calculation of split-half reliabilities is discussed in the following section. However, unlike the split-half methods, Cronbach’s $\alpha$ is not affected by how the items are arranged in the test/scale.
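The formula can be checked against Example 1 later in this handout. The sketch below is a Python illustration; the function and variable names are ours, not SPSS's. It copies the 14 × 6 data table from Example 1 and uses sample variances (n - 1 divisor). It should print the alpha of .8396 that SPSS reports for the 6-item scale.

```python
from statistics import variance  # sample variance with an n - 1 divisor

# Example 1 responses (14 adolescents x 6 items), copied from the data table in this handout.
EXAMPLE1 = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(round(cronbach_alpha(EXAMPLE1), 4))  # 0.8396
```

Because the same divisor is used for the item variances and the total variance, using n instead of n - 1 would give the same alpha.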
b. Split-Half Method
Under this method, test/scale developers divide the test/scale into two halves, so that the first half forms the first part of the entire test/scale and the second half forms the remaining part. Both halves are normally of equal length, and they are designed in such a way that each is an alternate form of the other. Reliability is estimated by correlating the results on the two halves of the same test/scale. If the two halves of the test/scale are parallel forms of one another, the Spearman-Brown prophecy formula is used to estimate the reliability coefficient of the entire test/scale. The Spearman-Brown prophecy formula is:
$$\rho_{XX'} = \frac{2\rho_{YY'}}{1 + \rho_{YY'}}$$
where $\rho_{XX'}$ is the reliability projected for the full-length test/scale, and $\rho_{YY'}$ is the correlation between the half-tests. $\rho_{YY'}$ is also an estimate of the reliability of a test/scale containing the same number of items as the half-test.
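The prophecy formula is easy to check against the SPSS output reported later for Example 1. There, the correlation between the two halves is .7328 and the equal-length Spearman-Brown coefficient is .8458. A minimal sketch (the function name is ours):

```python
def spearman_brown_double(r_half):
    """Spearman-Brown prophecy for a test twice the length of each (parallel) half."""
    return 2 * r_half / (1 + r_half)


# Correlation between the two halves reported for Example 1 ("Correlation between forms").
print(round(spearman_brown_double(0.7328), 4))  # 0.8458
```

Because the projected value is always at least as large as the half-test correlation when that correlation is positive, a full-length scale is estimated to be more reliable than either of its halves.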
If the two halves of the test/scale are not parallel, the reliability of the full-length test/scale is calculated using the formula for coefficient $\alpha$ for split halves:
$$\alpha = \frac{2\left[\sigma_X^2 - \left(\sigma_{Y_1}^2 + \sigma_{Y_2}^2\right)\right]}{\sigma_X^2}$$
where $\sigma_{Y_1}^2$ and $\sigma_{Y_2}^2$ are the variances of scores on the two halves of the test, and $\sigma_X^2$ is the variance of the scores on the whole test, with $X = Y_1 + Y_2$.
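In the SPSS program, the Spearman-Brown coefficients reported by the “SPLIT-HALF” model for reliability analysis are computed on the assumption that the two halves of the test/scale are parallel forms. Coefficient $\alpha$ for split halves does not have to be obtained by hand, however: it is the value that the same SPSS output labels “Guttman Split-half” (.8439 in Example 1 below).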
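The formula can be applied to the part and whole-scale variances reported in the SPSS output for Example 1. It reproduces the values that SPSS labels “Guttman Split-half”: .8439 for the first-half/second-half split and .7234 for the odd/even split. A minimal sketch; the function name is ours:

```python
def split_half_alpha(var_half1, var_half2, var_total):
    """Coefficient alpha for split halves: 2 * [var_X - (var_Y1 + var_Y2)] / var_X."""
    return 2 * (var_total - (var_half1 + var_half2)) / var_total


# Part and scale variances reported by SPSS for Example 1.
print(round(split_half_alpha(1.6044, 1.3462, 5.1044), 4))  # items 1-3 vs 4-6: 0.8439
print(round(split_half_alpha(1.8077, 1.4505, 5.1044), 4))  # odd vs even items: 0.7234
```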
It must also be noted that the split-half reliability estimate depends on how the items in the test/scale are arranged. Reordering and/or regrouping the items in the test/scale can result in different reliability estimates under the split-half method. Hence, the reliability estimate obtained on the same test/scale from the even/odd method (a method similar to the split-half method, described below) will most likely differ from the estimate obtained with the split-half method.
c. Even/Odd Method
The even/odd method is similar to the split-half method, except that the reliability of the entire test/scale is estimated by correlating the even-numbered items with the odd-numbered items, rather than the first half of the test/scale with the second half.
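To show how much the choice of split matters, the sketch below computes both the first-half/second-half and the odd/even split-half correlations for the Example 1 data. The data are copied from the data table that follows, and each correlation is stepped up with the Spearman-Brown formula. The function names are ours, and the code is an illustration, not a description of what SPSS does internally.

```python
# Example 1 responses (14 adolescents x 6 items), copied from the data table in this handout.
EXAMPLE1 = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 1, 1], [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(rows, part1, part2):
    """Correlate two part scores (0-based item indices) and apply Spearman-Brown."""
    y1 = [sum(row[i] for i in part1) for row in rows]
    y2 = [sum(row[i] for i in part2) for row in rows]
    r = pearson_r(y1, y2)
    return r, 2 * r / (1 + r)


first_second = split_half(EXAMPLE1, [0, 1, 2], [3, 4, 5])  # items 1-3 vs items 4-6
odd_even = split_half(EXAMPLE1, [0, 2, 4], [1, 3, 5])      # items 1, 3, 5 vs items 2, 4, 6
print("first/second halves: r = %.4f, Spearman-Brown = %.4f" % first_second)
print("odd/even items:      r = %.4f, Spearman-Brown = %.4f" % odd_even)
```

It should reproduce the values that SPSS reports later in this handout:
- First/second split: r = .7328 and a Spearman-Brown coefficient of .8458.
- Odd/even split: r = .5700 and a Spearman-Brown coefficient of .7262.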
Determining Reliability Using SPSS:
Example 1:
The following illustrative example contains six items extracted from a scale used to measure adolescents’ attitudes towards the use of physically aggressive behaviour in their daily life. Each item in the scale refers to a situation where physically aggressive behaviour is or is not used. Adolescents are asked whether they agree or disagree with each item on the scale. Their responses to the items are converted to scores of either 1 or 0, where 0 represents endorsement of the use of physically aggressive behaviour and 1 represents disapproval of it. Below are the contents of the six items as well as the scores of 14 adolescents on these six items:
Item No.   Content
1          When there are conflicts, people won’t listen to you unless you get physically aggressive.
2          It is hard for me not to act aggressively if I am angry with someone.
3          Physical aggression does not help to solve problems, it only makes situations worse.
4          There is nothing wrong with a husband hitting his wife if she has an affair.
5          Physical aggression is often needed to keep things under control.
6          When someone makes me mad, I don’t have to use physical aggression. I can think of other ways to express my anger.
The following is the data obtained from 14 adolescents:
         Items
Person   1  2  3  4  5  6
   1     0  0  0  0  0  0
   2     0  0  0  0  1  0
   3     1  0  1  1  1  0
   4     1  1  1  1  1  1
   5     1  1  1  1  1  1
   6     0  0  1  0  0  0
   7     0  0  1  1  1  0
   8     1  1  1  1  1  0
   9     0  0  0  1  0  0
  10     0  1  0  1  0  1
  11     1  1  1  0  1  1
  12     0  0  1  1  1  1
  13     0  0  0  0  0  0
  14     0  0  0  0  0  0
In the pages that follow, we will first outline the major commands for different models of
reliability analyses and briefly explain the usage of these commands. Then, the whole program
for the different reliability analyses will be reproduced in the next section, which will in turn be
followed by a discussion of the output.
SPSS Commands for Reliability Analyses:[3]
1. Calculation of Cronbach’s Alpha:
reliability
variables=item1 to item6/
scale (testscore)=item1 to item6/
model=alpha/
statistics all.
The subcommand “scale (testscore)=item1 to item6” specifies the items on which reliability analysis is to be carried out. In this case, item1 to item6 will form the “scale” on which the analysis will be done. The subcommand “model=alpha” instructs the computer to perform the “ALPHA” model (i.e. to calculate Cronbach’s alpha) for reliability analysis.
[3] In the following illustrations and explanations, only a sample of commonly used commands and syntax is shown. Students are advised to consult the SPSS User’s Guide for other appropriate commands and syntax in reliability analyses.

The command “statistics all” will instruct the computer to give us the following additional statistics from reliability analysis:[4]
a. Item means and standard deviations;
b. Inter-item covariance matrix;
c. Inter-item correlation matrix;
d. Scale mean, variance and standard deviation;
e. Summary statistics for item means, item variances, inter-item covariances and inter-item correlations, and item-total statistics (i.e. statistics comparing each item to the scale composed of the other items, including alpha (α) if that item is deleted);
f. ANOVA;
g. Hotelling’s T-squared;
h. Other statistics, such as Friedman’s chi-square, Kendall’s coefficient of concordance and Cochran’s Q, if applicable.

2. Assessing Split-Half Reliability:
reliability
variables=item1 to item6/
statistics=scale/
summary=means variances covariance correlations/
scale (testscore)=item1 to item6/
model=split.
The “scale (testscore)=item1 to item6” subcommand specifies the number as well as the order of the items on which the subsequent reliability analysis is to be performed. The subcommand “model=split” instructs the computer to use the “SPLIT-HALF” model for reliability analysis on the scale. A split-half reliability analysis will be performed based on the order in which the items were named on the preceding “scale” subcommand. That is, the first half of the items (rounding up if the number of items is odd) forms the first part/half, and the remaining items form the second part/half. In this case, items 1, 2 and 3 will form the first part and items 4, 5 and 6 will form the second part.
The inter-item covariance matrix, inter-item correlation matrix, item means and standard deviations, and item-total statistics produced by this reliability analysis are the same as those produced by the preceding “ALPHA” model, because the two analyses were performed on the same set of data. We may therefore not want to look at them again at this stage. However, we may be interested in knowing the following:
a. the means and standard deviations of each of the two parts of the scale;
b. the summary statistics (i.e. item means, item variances, inter-item covariances and inter-item correlations) of each of the two parts of the scale.
Inserting the two subcommands, namely “statistics=scale” and “summary= . . . correlations”, into the program enables us to obtain the above-mentioned statistics, which were not provided by the previous analysis based on the “ALPHA” model.[5]
[4] Only outputs containing statistics in categories a to e will be reproduced and discussed in subsequent pages, because these are sufficient for the purposes of our present analyses. Statistics in categories f to h will not be reported.
3. Estimating Even/Odd Reliability:
reliability
variables=item1 to item6/
scale (testscore)=item1 item3 item5 item2 item4 item6/
model=split/
statistics all.
Since an “EVEN/ODD” model for reliability analysis is not an available option in SPSS, the “SPLIT-HALF” model is used for this analysis. However, for the “SPLIT-HALF” model to estimate even/odd reliability, the items listed in the preceding “scale” subcommand must be arranged so that the odd items form the first part of the scale and the even items form the remaining part. Please see the above “scale” subcommand for an illustration.
As already mentioned, the command “statistics all” instructs the computer to produce the eight categories of additional statistics from reliability analysis.[6] A later section will show that this analysis produces the same item-total summary statistics, item means and standard deviations, inter-item covariance matrix and inter-item correlation matrix as the “ALPHA” model of reliability analysis. The only exception is that the statistics are displayed slightly differently as a result of reordering the six items. Alternatively, additional statistics that are specific to this model of reliability analysis and that are of interest to us can be obtained with the same “statistics=scale” and “summary= . . . correlations” subcommands as those shown in the program for the “SPLIT-HALF” model of reliability analysis.
Conducting All Three of the Above-mentioned Models of Reliability Analysis on the Set of Scores Obtained from 14 Adolescents for the 6 Items Using SPSS
1. SPSS Computer Program
[5] If you want the full set of additional statistics from split-half reliability analysis, you have to include the command “statistics all” in the program, in the same manner as shown in the program for the “ALPHA” model of reliability analysis.
[6] Again, only statistics in categories a to e will be reported and discussed in subsequent pages.
2. SPSS Outputs and Discussions[7]
a. Reliability Analysis - “ALPHA” Model
The initial part of the output contains descriptive statistics on each of the items (i.e. means and standard deviations), an inter-item covariance[8] matrix and an inter-item correlation matrix. These are followed by descriptive statistics for the scale and the summary statistics.
****** Method 2 (covariance matrix) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (A L P H A)

                      Mean   Std Dev     Cases
  1.  ITEM1          .3571     .4972      14.0
  2.  ITEM2          .3571     .4972      14.0
  3.  ITEM3          .5714     .5136      14.0
  4.  ITEM4          .5714     .5136      14.0
  5.  ITEM5          .5714     .5136      14.0
  6.  ITEM6          .3571     .4972      14.0

[7] The discussions and explanations interspersed with the output are not part of the original computer output.
[8] Covariance ($S_{xy}$) is defined as the average product of the deviations in X and Y, where a deviation is a distance from the mean. Its relation to the Pearson product-moment correlation coefficient is given by $r_{xy} = S_{xy}/(S_x S_y)$.
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (T E S T S C O R)

Correlation Matrix

             ITEM1     ITEM2     ITEM3     ITEM4     ITEM5     ITEM6
ITEM1       1.0000
ITEM2        .6889    1.0000
ITEM3        .6455     .3443    1.0000
ITEM4        .3443     .3443     .4167    1.0000
ITEM5        .6455     .3443     .7083     .4167    1.0000
ITEM6        .3778     .6889     .3443     .3443     .3443    1.0000
The above inter-item correlation matrix shows that the largest correlation coefficient occurs between items 3 and 5 (r = .7083). Item 2 is also fairly highly correlated with both item 1 and item 6 (r = .6889 in both cases). The lowest correlation coefficient is .3443, which occurs between a number of pairs of items (e.g. between item 1 and item 4).
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (A L P H A)

N of Cases =      14.0

Statistics for          Mean  Variance   Std Dev  N of Variables
   Scale              2.7857    5.1044    2.2593               6

Item Means              Mean   Minimum   Maximum     Range   Max/Min  Variance
                       .4643     .3571     .5714     .2143    1.6000     .0138

Item Variances          Mean   Minimum   Maximum     Range   Max/Min  Variance
                       .2555     .2473     .2637     .0165    1.0667     .0001

Inter-item
Covariances             Mean   Minimum   Maximum     Range   Max/Min  Variance
                       .1190     .0879     .1868     .0989    2.1250     .0015

Inter-item
Correlations            Mean   Minimum   Maximum     Range   Max/Min  Variance
                       .4665     .3443     .7083     .3641    2.0575     .0234
The section of output reproduced above gives us descriptive statistics for the scale[9] and
summary statistics for the items.
From the above section, it can be seen that the average score for the scale is 2.7857 and the
standard deviation is 2.2593. The average score on an item is 0.4643, with a range of 0.2143
(i.e. maximum minus minimum). The average of the item variances is 0.2555, with a minimum
of 0.2473 and a maximum of 0.2637. These show that the items in the scale have fairly
comparable variances. The average covariance between the items is .119. The correlations
between the items range from .3443 to .7083. The ratio between the largest and the smallest
correlations is .7083/.3443, or 2.0575. The average correlation between the items is .4665.
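These summary figures can be checked directly from the Example 1 data table. The sketch below recomputes the scale mean and variance, the mean item variance and the mean inter-item covariance. The names are ours, and the data are copied from the table above. It also verifies the identity that the variance of a sum equals the sum of the item variances plus twice the sum of the inter-item covariances. This identity is why, in the alpha formula, the total-score variance exceeds the summed item variances when the items covary positively.

```python
from itertools import combinations
from statistics import mean, variance

# Example 1 responses (14 adolescents x 6 items), copied from the data table in this handout.
EXAMPLE1 = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 1, 1], [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
]


def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 divisor)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


items = [[row[i] for row in EXAMPLE1] for i in range(6)]  # one list of scores per item
totals = [sum(row) for row in EXAMPLE1]                    # scale score per adolescent
item_vars = [variance(col) for col in items]
pair_covs = [covariance(items[i], items[j]) for i, j in combinations(range(6), 2)]

print("scale mean, variance:", round(mean(totals), 4), round(variance(totals), 4))
print("mean item variance:", round(mean(item_vars), 4))
print("mean inter-item covariance:", round(mean(pair_covs), 4))
# Variance of a sum = sum of item variances + 2 * (sum of the 15 inter-item covariances).
print(abs(variance(totals) - (sum(item_vars) + 2 * sum(pair_covs))) < 1e-9)
```

The printed values should agree with the SPSS summary statistics above: a scale mean of 2.7857, a variance of 5.1044, a mean item variance of .2555 and a mean inter-item covariance of .1190.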
The item-total summary statistics form the next section of the output and are reproduced below:

Item-total Statistics

                Scale        Scale    Corrected
                 Mean     Variance        Item-      Squared        Alpha
              if Item      if Item        Total     Multiple      if Item
              Deleted      Deleted  Correlation  Correlation      Deleted

ITEM1          2.4286       3.4945        .7330        .7511        .7901
ITEM2          2.4286       3.6484        .6364        .7469        .8095
ITEM3          2.2143       3.5659        .6572        .6000        .8051
ITEM4          2.2143       3.8736        .4784        .2533        .8404
ITEM5          2.2143       3.5659        .6572        .6000        .8051
ITEM6          2.4286       3.8022        .5440        .5733        .8273
For each item, the first column of the above set of statistics shows what the average score for the scale would be if the item were excluded from the scale. For example, if item 1 were deleted from the scale, the mean score of the scale would be 2.4286. The next column is the scale variance if the item were eliminated. The column labeled “Corrected Item-Total Correlation” is the Pearson correlation coefficient between the score on the individual item and the sum of the scores on the remaining items. For example, the smallest correlation reported is .4784, which occurs between the score on item 4 and the sum of the scores on items 1, 2, 3, 5 and 6. We can say that the relationship between item 4 and the other items is not very strong. Comparatively speaking, the relationship between item 1 and the other items is much stronger, with r = .7330.
Another way of looking at the relationship between an individual item and the rest of the scale is to try to predict a person’s score on the item from the scores obtained on the other items. We can do this by calculating a multiple regression equation with the item of interest as the dependent variable and all of the other items as independent variables. The multiple R² from this regression equation is displayed for each item in the column labeled “Squared Multiple Correlation”. We can see that about 75% of the observed variability in the responses to item 1 can be explained by the other items. As expected, item 4 is less well predicted from the other items; its multiple R² is only .2533.
[9] The scale in this case is formed by items 1 to 6. For each individual adolescent (or case), a score on the scale is computed by adding his/her scores on the six items.
The final column, “Alpha if Item Deleted”, tells us how the reliability of the scale is affected by each of the items. Six values of Cronbach’s α are reported in this column, each representing the Cronbach’s α of the scale when one item is removed. As will be shown later, Cronbach’s α for the entire 6-item scale is .8396. This column shows that removing item 4 from the scale causes α to increase from .8396 to .8404. On the other hand, eliminating any item other than item 4 causes α to decrease. If for some reason the scale had to be shortened, item 4 would logically be the first one to go. Conversely, it would be most undesirable to remove item 1 from the scale, because α would decrease to .7901 as a result.
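The “Corrected Item-Total Correlation” and “Alpha if Item Deleted” columns can be reproduced from the raw data. Each item is correlated with the sum of the other items, and alpha is recomputed without that item. The minimal sketch below uses our own function names and the Example 1 data copied from the table above. It does not reproduce the squared multiple correlations, which would require regressing each item on the others.

```python
from statistics import variance

# Example 1 responses (14 adolescents x 6 items), copied from the data table in this handout.
EXAMPLE1 = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1], [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 1, 1], [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_statistics(rows):
    """Return (item number, corrected item-total r, alpha if item deleted) for each item."""
    results = []
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]         # sum of the other items
        reduced = [row[:i] + row[i + 1:] for row in rows]  # scale with item i removed
        results.append((i + 1, pearson_r(item, rest), cronbach_alpha(reduced)))
    return results


for item_no, r_it, alpha_del in item_total_statistics(EXAMPLE1):
    print(f"ITEM{item_no}: corrected item-total r = {r_it:.4f}, alpha if deleted = {alpha_del:.4f}")
```

For item 1 this gives .7330 and .7901, and for item 4 it gives .4784 and .8404, matching the SPSS item-total statistics.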
The final results of the reliability analysis based on the “ALPHA” model are reported in the last section of the output and are reproduced below:

Reliability Coefficients     6 items

Alpha =    .8396           Standardized item alpha =    .8399
Cronbach’s alpha is shown in the above output. Its value, .8396, can be regarded as quite large, indicating that the 6-item scale is quite reliable. “Standardized item alpha” refers to the α that would be obtained if all of the items were standardized to have a variance of 1. Since there is not much variation among the variances of the 6 items in the scale,[10] there is little difference between the two reported α’s. If the items in a scale have widely differing variances, the two α’s may differ substantially.
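Standardized item alpha can also be written as $k\bar{r}/(1 + (k-1)\bar{r})$, where $k$ is the number of items and $\bar{r}$ is the mean inter-item correlation. This is a standard identity, not stated in the handout itself. Plugging in the values from the Example 1 summary statistics reproduces the SPSS figure; the function name is ours:

```python
def standardized_alpha(k, mean_inter_item_r):
    """Standardized item alpha from the number of items and the mean inter-item correlation."""
    return k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)


# Example 1: six items, mean inter-item correlation .4665 (from the summary statistics).
print(round(standardized_alpha(6, 0.4665), 4))  # 0.8399
```

Because the item variances in Example 1 are nearly equal (.2473 to .2637), this value is almost identical to the unstandardized alpha of .8396.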
[10] Please refer to the statistics reported in the section on summary statistics for the items, as well as the discussion in that section. The variances of individual items can be computed by squaring the standard deviations reported for individual items in the initial section of the output.

b. Reliability Analysis - “SPLIT-HALF” Model
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (T E S T S C O R)
The subcommand “statistics=scale” instructs the computer to produce the descriptive statistics (mean, variance, standard deviation and number of variables) for each part and for the whole scale. The subcommand “summary=means . . . correlations” instructs the computer to produce the summary statistics for item means, item variances, inter-item covariances and inter-item correlations, as follows:
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (S P L I T)

N of Cases =      14.0

Statistics for          Mean  Variance   Std Dev  N of Variables
   Part 1             1.2857    1.6044    1.2666               3
   Part 2             1.5000    1.3462    1.1602               3
   Scale              2.7857    5.1044    2.2593               6

Item Means              Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .4286     .3571     .5714     .2143    1.6000     .0153
   Part 2              .5000     .3571     .5714     .2143    1.6000     .0153
   Scale               .4643     .3571     .5714     .2143    1.6000     .0138

Item Variances          Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .2527     .2473     .2637     .0165    1.0667     .0001
   Part 2              .2582     .2473     .2637     .0165    1.0667     .0001
   Scale               .2555     .2473     .2637     .0165    1.0667     .0001

Inter-item
Covariances             Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .1410     .0879     .1703     .0824    1.9375     .0017
   Part 2              .0952     .0879     .1099     .0220    1.2500     .0001
   Scale               .1190     .0879     .1868     .0989    2.1250     .0015

Inter-item
Correlations            Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .5596     .3443     .6889     .3446    2.0010     .0282
   Part 2              .3684     .3443     .4167     .0724    1.2103     .0014
   Scale               .4665     .3443     .7083     .3641    2.0575     .0234
Please note that the descriptive statistics for the entire scale and the summary statistics over
all items in the entire scale given in these sections of the computer output are identical to those
produced in the corresponding sections of the output based on the “ALPHA” model of
reliability analyses (check statistics on the “scale” row of corresponding sets of statistics).
The significant feature of these sections of the output is that descriptive and summary statistics
are given for each of the two parts of the scale, namely, Part 1 which is formed by items 1, 2
and 3, and Part 2 which is composed of items 4, 5 and 6. It is clearly evident that the two
Parts have different means and standard deviations, as well as different item means, item
variances, inter-item covariances and inter-item correlations.
Reliability Coefficients     6 items

Correlation between forms =     .7328     Equal-length Spearman-Brown =     .8458
Guttman Split-half =            .8439     Unequal-length Spearman-Brown =   .8458
Alpha for part 1 =              .7911     Alpha for part 2 =                .6367
3 items in part 1                         3 items in part 2
The above section of the output contains the results of the reliability analysis based on the “SPLIT-HALF” model. The correlation between the two halves (or parts), labeled “Correlation between forms” in the output, is .7328. This is an estimate of the reliability of the scale if it had only three items. The equal-length Spearman-Brown coefficient, .8458 in this case, tells us what the reliability of the entire scale would be if it were made up of two equal (or parallel) parts, each with a three-item reliability of .7328. If the two parts do not contain the same number of items, the unequal-length Spearman-Brown coefficient can be used to estimate the reliability of the overall scale. In the present example the two parts of the scale are of equal length, so the two Spearman-Brown coefficients are identical. The Guttman split-half coefficient (.8439) is another estimate of the reliability of the overall scale. It does not assume that the two parts are equally reliable or have the same variance. It equals the Spearman-Brown value only when the two parts have equal variances; when the parts are positively correlated and their variances differ, as here (1.6044 and 1.3462), it is smaller. Finally, separate values of Cronbach’s α are also shown in the output for each of the two parts of the scale.
c. Reliability Analysis - “EVEN/ODD” Model
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (T E S T S C O R)
****** Method 2 (covariance matrix) will be used for this analysis ******
R E L I A B I L I T Y   A N A L Y S I S   -   S C A L E   (S P L I T)

                      Mean   Std Dev     Cases
  1.  ITEM1          .3571     .4972      14.0
  2.  ITEM3          .5714     .5136      14.0
  3.  ITEM5          .5714     .5136      14.0
  4.  ITEM2          .3571     .4972      14.0
  5.  ITEM4          .5714     .5136      14.0
  6.  ITEM6          .3571     .4972      14.0

Correlation Matrix

             ITEM1     ITEM3     ITEM5     ITEM2     ITEM4     ITEM6
ITEM1       1.0000
ITEM3        .6455    1.0000
ITEM5        .6455     .7083    1.0000
ITEM2        .6889     .3443     .3443    1.0000
ITEM4        .3443     .4167     .4167     .3443    1.0000
ITEM6        .3778     .3443     .3443     .6889     .3443    1.0000
The additional statistics produced in this section of the output are basically similar to those shown in
corresponding sections of the output based on the “ALPHA” model. The only difference is that as a
result of reordering the items in the scale, the statistics are displayed differently here.
N of Cases =      14.0

Statistics for          Mean  Variance   Std Dev  N of Variables
   Part 1             1.5000    1.8077    1.3445               3
   Part 2             1.2857    1.4505    1.2044               3
   Scale              2.7857    5.1044    2.2593               6

Item Means              Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .5000     .3571     .5714     .2143    1.6000     .0153
   Part 2              .4286     .3571     .5714     .2143    1.6000     .0153
   Scale               .4643     .3571     .5714     .2143    1.6000     .0138

Item Variances          Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .2582     .2473     .2637     .0165    1.0667     .0001
   Part 2              .2527     .2473     .2637     .0165    1.0667     .0001
   Scale               .2555     .2473     .2637     .0165    1.0667     .0001

Inter-item
Covariances             Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .1722     .1648     .1868     .0220    1.1333     .0001
   Part 2              .1154     .0879     .1703     .0824    1.9375     .0018
   Scale               .1190     .0879     .1868     .0989    2.1250     .0015

Inter-item
Correlations            Mean   Minimum   Maximum     Range   Max/Min  Variance
   Part 1              .6664     .6455     .7083     .0628    1.0973     .0011
   Part 2              .4591     .3443     .6889     .3446    2.0010     .0317
   Scale               .4665     .3443     .7083     .3641    2.0575     .0234
The above section of the output looks similar to the corresponding section produced under the
“SPLIT-HALF” model. In fact, the descriptive and summary statistics reported in these two outputs
for the entire scale are identical. However, descriptive and summary statistics for corresponding
parts of the scale reported in the two outputs are not the same. The differences originate from the
fact that the compositions of Part 1 and Part 2 are altered in the present analysis, i.e., Part 1 is made
up of items 1, 3 and 5; and Part 2 is composed of items 2, 4 and 6.
Item-total Statistics

                Scale        Scale    Corrected
                 Mean     Variance        Item-      Squared        Alpha
              if Item      if Item        Total     Multiple      if Item
              Deleted      Deleted  Correlation  Correlation      Deleted

ITEM1          2.4286       3.4945        .7330        .7511        .7901
ITEM3          2.2143       3.5659        .6572        .6000        .8051
ITEM5          2.2143       3.5659        .6572        .6000        .8051
ITEM2          2.4286       3.6484        .6364        .7469        .8095
ITEM4          2.2143       3.8736        .4784        .2533        .8404
ITEM6          2.4286       3.8022        .5440        .5733        .8273
The item-total statistics reported in the present analysis are exactly the same as those reported under
the “ALPHA” model, with the only exception that the statistics are arranged differently. Again this is
a direct result of reordering the items in the scale.
Reliability Coefficients     6 items

Correlation between forms =     .5700     Equal-length Spearman-Brown =     .7262
Guttman Split-half =            .7234     Unequal-length Spearman-Brown =   .7262
Alpha for part 1 =              .8571     Alpha for part 2 =                .7159
3 items in part 1                         3 items in part 2
The above are the results of the reliability analysis based on the “EVEN/ODD” model. Note that the correlation between the parts formed by the odd and the even items is smaller than the correlation reported under the “SPLIT-HALF” model (.5700 compared with .7328). As a result, the Spearman-Brown coefficients reported in this analysis are also smaller (.7262 against .8458). This example shows that “split-half” reliability analyses can produce different reliability estimates for the same scale, depending on how the researcher divides the items into two parts.
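The Spearman-Brown step-up behind these comparisons can be checked by hand. The Python sketch below applies the standard prophecy formula, 2r / (1 + r) for a test twice the length of each half, to the two correlations between halves quoted above; the function name `spearman_brown` is illustrative and not part of SPSS.

```python
def spearman_brown(r: float, factor: float = 2.0) -> float:
    """Spearman-Brown prophecy formula: predicted reliability of a test
    `factor` times as long as one whose reliability is r.
    factor=2 steps a half-test correlation up to the full test."""
    return factor * r / (1 + (factor - 1) * r)


# Correlations between the two halves reported for the first example
for model, r_halves in [("EVEN/ODD", 0.5700), ("SPLIT-HALF", 0.7328)]:
    print(f"{model}: r = {r_halves:.4f}, Spearman-Brown = {spearman_brown(r_halves):.4f}")
```

The EVEN/ODD value prints as .7261 rather than .7262 only because SPSS works from the unrounded correlation; the SPLIT-HALF value reproduces .8458.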
Determining Reliability Using SPSS:
Example 2:
The following questionnaire was developed by a researcher as part of an effort to collect participants’ feedback on a five-week community-based program designed to teach individuals disease prevention and to encourage healthier lifestyles. The questionnaire contained six items. Respondents were asked to respond to each item according to the following scale:

Dr. Robert Gebotys 2003
Strongly Agree    Agree    No Opinion    Disagree    Strongly Disagree
      1             2          3            4               5
The 6 items in the questionnaire were:
1. The goals of the program are clear.
2. I feel comfortable in discussing my plans, concerns and experiences with the group.
3. The materials covered in the program are helpful.
4. The health contract is useful in assisting me to make healthy lifestyle changes.
5. Overall speaking, the group is supportive.
6. Overall, the program is useful in assisting me develop positive changes towards healthy lifestyles.
The following are the data obtained from the 10 participants:

                  Items
Person    1    2    3    4    5    6
   1      2    3    1    3    4    2
   2      1    2    1    1    3    1
   3      4    3    4    5    3    3
   4      5    3    2    4    3    2
   5      2    1    2    2    1    1
   6      3    3    1    3    3    1
   7      4    5    2    3    4    2
   8      2    1    2    2    1    1
   9      2    2    2    2    2    2
  10      3    4    2    5    4    2
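Before running the reliability procedures, it can help to confirm that the data were entered correctly. The Python sketch below types in the responses listed above (one row per participant, one column per item) and recomputes the item means and standard deviations and the scale mean and variance; the names `DATA`, `items` and `totals` are illustrative. `statistics.stdev` and `statistics.variance` use the n - 1 denominator, which matches the values SPSS reports.

```python
import statistics

# Rows = participants 1-10, columns = items 1-6
# (1 = Strongly Agree ... 5 = Strongly Disagree)
DATA = [
    [2, 3, 1, 3, 4, 2],
    [1, 2, 1, 1, 3, 1],
    [4, 3, 4, 5, 3, 3],
    [5, 3, 2, 4, 3, 2],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 1, 3, 3, 1],
    [4, 5, 2, 3, 4, 2],
    [2, 1, 2, 2, 1, 1],
    [2, 2, 2, 2, 2, 2],
    [3, 4, 2, 5, 4, 2],
]

items = list(zip(*DATA))                # one tuple of 10 scores per item
totals = [sum(row) for row in DATA]     # each participant's scale score

for i, scores in enumerate(items, start=1):
    print(f"ITEM{i}: mean = {statistics.mean(scores):.4f}, "
          f"sd = {statistics.stdev(scores):.4f}")
print(f"Scale: mean = {statistics.mean(totals):.4f}, "
      f"variance = {statistics.variance(totals):.4f}")
```

The printed values should agree with the item means, standard deviations and scale statistics in the SPSS output that follows (for example, a scale mean of 14.9000 and variance of 25.8778).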
Conducting Cronbach’s Alpha, Split-Half and Even-Odd Reliability Analyses on the Scores Obtained from 10 Respondents for the 6 Items Using SPSS
1. SPSS Computer Program
2. SPSS Outputs and Results¹¹
a. Reliability Analysis - “Alpha” Model
****** Method 2 (covariance matrix) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E   (A L P H A)

                     Mean      Std Dev     Cases
  1.   ITEM1        2.8000     1.2293      10.0
  2.   ITEM2        2.7000     1.2517      10.0
  3.   ITEM3        1.9000      .8756      10.0
  4.   ITEM4        3.0000     1.3333      10.0
  5.   ITEM5        2.8000     1.1353      10.0
  6.   ITEM6        1.7000      .6749      10.0

                    Correlation Matrix

            ITEM1      ITEM2      ITEM3      ITEM4      ITEM5      ITEM6
ITEM1      1.0000
ITEM2       .6066     1.0000
ITEM3       .4955      .0710     1.0000
ITEM4       .7457      .5992      .5710     1.0000
ITEM5       .3662      .8914     -.1341      .5138     1.0000
ITEM6       .5892      .5392      .6956      .7408      .4930     1.0000
N of Cases =        10.0

                                                            N of
Statistics for         Mean      Variance     Std Dev    Variables
    Scale           14.9000      25.8778      5.0870         6

                   Mean    Minimum   Maximum    Range   Max/Min   Variance
Item Means        2.4833    1.7000    3.0000   1.3000    1.7647      .2937
Item Variances    1.2278     .4556    1.7778   1.3222    3.9024      .2621
Inter-item
  Covariances      .6170    -.1333    1.2667   1.4000   -9.5000      .1434
Inter-item
  Correlations     .5189    -.1341     .8914   1.0255   -6.6457      .0651

¹¹ Only the sections of output relevant to the purposes and needs of the present analysis are reproduced here. The brief discussions that follow each output are not part of the original computer output.
Item-total Statistics

           Scale        Scale       Corrected
           Mean         Variance    Item-         Squared       Alpha
           if Item      if Item     Total         Multiple      if Item
           Deleted      Deleted     Correlation   Correlation   Deleted

ITEM1      12.1000      16.9889     .7281         .7344         .8192
ITEM2      12.2000      16.8444     .7267         .9060         .8196
ITEM3      13.0000      22.0000     .3788         .8634         .8750
ITEM4      11.9000      15.4333     .8273         .7752         .7973
ITEM5      12.1000      18.9889     .5660         .9363         .8499
ITEM6      13.2000      20.6222     .7830         .8531         .8311

Reliability Coefficients     6 items

Alpha =   .8584          Standardized item alpha =   .8662
The Cronbach’s α reported in the above analysis is .8584, which indicates that the 6-item questionnaire is quite reliable. The last column of the Item-total Statistics shows that removing item 4 from the questionnaire would lower Cronbach’s α from .8584 to .7973, while removing item 3 would raise Cronbach’s α from .8584 to .8750.
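Cronbach’s α can be reproduced from its definition, α = k/(k - 1) × (1 - sum of the item variances / variance of the total scores), where k is the number of items. The Python sketch below recomputes α for the 10 × 6 data set and the “Alpha if Item Deleted” column by dropping one item at a time; the function names are illustrative, not SPSS’s.

```python
from statistics import variance

# Rows = participants 1-10, columns = items 1-6 (Example 2 data)
DATA = [
    [2, 3, 1, 3, 4, 2],
    [1, 2, 1, 1, 3, 1],
    [4, 3, 4, 5, 3, 3],
    [5, 3, 2, 4, 3, 2],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 1, 3, 3, 1],
    [4, 5, 2, 3, 4, 2],
    [2, 1, 2, 2, 1, 1],
    [2, 2, 2, 2, 2, 2],
    [3, 4, 2, 5, 4, 2],
]


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows, j):
    """Alpha recomputed with item j (0-based) removed from every row."""
    return cronbach_alpha([row[:j] + row[j + 1:] for row in rows])


print(f"alpha = {cronbach_alpha(DATA):.4f}")
for j in range(len(DATA[0])):
    print(f"ITEM{j + 1} deleted: alpha = {alpha_if_deleted(DATA, j):.4f}")
```

Run as is, it should print α = 0.8584 and values matching the last column of the Item-total Statistics, including 0.8750 when item 3 is dropped and 0.7973 when item 4 is dropped.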
b. Reliability Analysis - “SPLIT-HALF” Model
R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E   (T E S T S C O R)
R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E   (S P L I T)

N of Cases =        10.0

                                                            N of
Statistics for         Mean      Variance     Std Dev    Variables
    Part 1           7.4000       6.9333      2.6331         3
    Part 2           7.5000       7.1667      2.6771         3
    Scale           14.9000      25.8778      5.0870         6

Item Means         Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1        2.4667    1.9000    2.8000    .9000    1.4737      .2433
    Part 2        2.5000    1.7000    3.0000   1.3000    1.7647      .4900
    Scale         2.4833    1.7000    3.0000   1.3000    1.7647      .2937
Item Variances     Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1        1.2815     .7667    1.5667    .8000    2.0435      .1995
    Part 2        1.1741     .4556    1.7778   1.3222    3.9024      .4470
    Scale         1.2278     .4556    1.7778   1.3222    3.9024      .2621

Inter-item
Covariances        Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1         .5148     .0778     .9333    .8556   12.0000      .1466
    Part 2         .6074     .3778     .7778    .4000    2.0588      .0341
    Scale          .6170    -.1333    1.2667   1.4000   -9.5000      .1434

Inter-item
Correlations       Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1         .3910     .0710     .6066    .5356    8.5474      .0639
    Part 2         .5825     .4930     .7408    .2478    1.5026      .0151
    Scale          .5189    -.1341     .8914   1.0255   -6.6457      .0651
Reliability Coefficients     6 items

Correlation between forms =   .8354     Equal-length Spearman-Brown =    .9103
Guttman Split-half =          .9103     Unequal-length Spearman-Brown =  .9103
Alpha for part 1 =            .6683     Alpha for part 2 =               .7628
3 items in part 1                       3 items in part 2
The Spearman-Brown coefficient reported in the output of the reliability analysis based on the “SPLIT-HALF” model indicates that the reliability of the entire scale/questionnaire is .9103, assuming it is made up of two equal-length (parallel) halves, each with a three-item reliability of .8354 (the correlation between the two halves). Separate values of Cronbach’s α are shown for each half of the scale/questionnaire: α for the first half is .6683 and α for the second half is .7628.
c. Reliability Analysis - “EVEN/ODD” Model
R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E   (T E S T S C O R)

N of Cases =        10.0

                                                            N of
Statistics for         Mean      Variance     Std Dev    Variables
    Part 1           7.5000       5.3889      2.3214         3
    Part 2           7.4000       8.0444      2.8363         3
    Scale           14.9000      25.8778      5.0870         6

Item Means         Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1        2.5000    1.9000    2.8000    .9000    1.4737      .2700
    Part 2        2.4667    1.7000    3.0000   1.3000    1.7647      .4633
    Scale         2.4833    1.7000    3.0000   1.3000    1.7647      .2937
Item Variances     Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1        1.1889     .7667    1.5111    .7444    1.9710      .1460
    Part 2        1.2667     .4556    1.7778   1.3222    3.9024      .5046
    Scale         1.2278     .4556    1.7778   1.3222    3.9024      .2621

Inter-item
Covariances        Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1         .3037    -.1333     .5333    .6667   -4.0000      .1147
    Part 2         .7074     .4556    1.0000    .5444    2.1951      .0603
    Scale          .6170    -.1333    1.2667   1.4000   -9.5000      .1434

Inter-item
Correlations       Mean    Minimum   Maximum    Range   Max/Min   Variance
    Part 1         .2425    -.1341     .4955    .6296   -3.6942      .0885
    Part 2         .6264     .5392     .7408    .2016    1.3738      .0086
    Scale          .5189    -.1341     .8914   1.0255   -6.6457      .0651
Reliability Coefficients     6 items

Correlation between forms =   .9450     Equal-length Spearman-Brown =    .9717
Guttman Split-half =          .9618     Unequal-length Spearman-Brown =  .9717
Alpha for part 1 =            .5072     Alpha for part 2 =               .7914
3 items in part 1                       3 items in part 2
The Spearman-Brown coefficient reported in the results of the reliability analysis based on the “EVEN/ODD” model is .9717, indicating that the 6-item questionnaire is very reliable. This Spearman-Brown coefficient is even higher than that reported under the “SPLIT-HALF” model. The correlation between the parts formed by the odd and the even items is also larger than the correlation reported under the “SPLIT-HALF” model (.9450 compared with .8354). Cronbach’s α for the first part of the questionnaire (i.e., the odd-numbered items) is .5072, while that for the second part (the even-numbered items) is .7914.
Part Five: Using SPSS for Windows to Implement Reliability Analyses
This section outlines the steps needed to carry out three different forms of ‘single test administration’ reliability analysis: (1) Cronbach alpha, (2) Split-half, and (3) Even-odd. For further discussion of these reliability measures, students are encouraged to consult Bob Gebotys’ “Handout on Reliability.”
5.1 Conducting a Reliability Analysis Using the Cronbach Alpha (α) Measure
For this analysis we will use the data regarding adolescent attitudes toward physical aggression, as
outlined on page 6 of Gebotys’ “Handout on Reliability.”
In order to conduct this analysis, the following steps are required.
1. Enter the aforementioned data set into an SPSS Data Editor window (see Section 2.1 for instructions, if necessary).
2. Next, click Statistics on the main menu bar, followed by Scale, and then Reliability Analysis… This series of clicks opens a Reliability Analysis dialog box similar to the one shown below.
3. Note that all of the variables (all items) are listed in the text box on the left-hand side of the dialog box. Click on “item1” and, holding down the left mouse button, drag downward until all variables (i.e., item1 through item6) are highlighted. Once they are highlighted, click the right-arrow button in the centre of the dialog box to move the selected variables into the ‘Items:’ text box.
4. Next, check that the text in the ‘Model:’ text box reads “Alpha.” If it does not, click on the downward arrow to the right of the text box and select Alpha from the list that appears.
5. Next, click on the Statistics… pushbutton, which will open a ‘Reliability Analysis: Statistics’ subdialog box similar to the one below.
6. Next, select (i.e., click on) all options under ‘Descriptives for’ (i.e., item, scale, scale if item deleted), ‘Summaries’ (i.e., means, variances, covariances, correlations), and ‘Inter-item’ (i.e., correlations, covariances). These are the primary statistics you will need to interpret your reliability analyses. If you would like further statistics, such as the ‘F test’ and ‘Hotelling’s T-square,’ you can select them from the options in this subdialog box.
7. Once you have made your selections, click the Continue command pushbutton at the top right-hand corner of the subdialog box. This returns you to the main ‘Reliability Analysis’ dialog box.
8. You have now completed all the necessary steps in specifying the reliability procedure. If you would like to examine the SPSS syntax for this procedure, please read the note below. If you would like to run this procedure now, without examining the syntax, click the OK command pushbutton at the top right-hand corner of the dialog box.
Note: If you would like to examine the SPSS syntax for this procedure, click
on the Paste command pushbutton to open an SPSS Syntax Window. The
syntax window should then resemble the one below. In order to run this syntax
and complete the reliability analysis, click Run on the menu bar, followed by
All.
Once you have run the Cronbach Alpha reliability procedure, the results should appear in an SPSS
Viewer window similar to the one shown below.
At this stage it is recommended that you save and print the contents of the SPSS Viewer window.
The steps that you take to save and print this reliability analysis are identical to the steps taken to
save and print the Scatterplot, as outlined in section 2.4 of this guide.
Your output should resemble the information on the following pages.
5.2 Conducting the Split-Half Reliability Analysis
Please note that the steps necessary for conducting the Split-Half reliability analysis are
almost identical to the procedures outlined above for the Cronbach Alpha analysis. The only
difference when using SPSS for Windows is that you must specify the “Split-Half” model instead of
the “Alpha” model in the Reliability Analysis dialog box. Therefore, for the Split-Half model, step
#4 should read as follows:
#4. Next, check that the text in the ‘Model:’ text box reads “Split-half.” If it does not, click on the downward arrow to the right of the text box and select Split-half from the list that appears.
The SPSS Syntax window for the Split-half analysis should resemble the example below.
Again, it is recommended that you save and print the contents of the SPSS Viewer window. Your
output should resemble the output on the following pages.
5.3 Conducting the Even-Odd Reliability Analysis
Unlike the Cronbach Alpha and Split-Half models, the Even-Odd method of assessing reliability
cannot be accessed using the “point and click” approach in SPSS. In order to utilize the Even-Odd
option, one needs to modify the syntax file for the Split-Half model. More specifically, the order of
the items examined needs to be changed so that the odd items form the first part of the scale and the
even items form the remaining part. Therefore, your first step should be to follow the instructions
noted above for undertaking the Split-Half model, but then be sure to click the Paste command
pushbutton in the Reliability Analysis dialog box in order to open the SPSS Syntax window.
Recall that the syntax for the Split-Half model contains the following information:
RELIABILITY
/VARIABLES=item1 item2 item3 item4 item5 item6
/FORMAT=NOLABELS
/SCALE(SPLIT)=ALL/MODEL=SPLIT
/STATISTICS=DESCRIPTIVE SCALE HOTELLING CORR COV ANOVA
/SUMMARY=TOTAL MEANS VARIANCE COV CORR .
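For the Even-Odd model, you will need to make the following change to line 4 of the syntax file (the part that changes is what follows “/SCALE(SPLIT)=”: the keyword ALL is replaced by the items listed with the odd items first, then the even items):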
Before:
/SCALE(SPLIT)=ALL/MODEL=SPLIT
After:
/SCALE(SPLIT)= item1 item3 item5 item2 item4 item6/MODEL=SPLIT
The entire syntax file should now read:
RELIABILITY
/VARIABLES=item1 item2 item3 item4 item5 item6
/FORMAT=NOLABELS
/SCALE(SPLIT)= item1 item3 item5 item2 item4 item6/MODEL=SPLIT
/STATISTICS=DESCRIPTIVE SCALE HOTELLING CORR COV ANOVA
/SUMMARY=TOTAL MEANS VARIANCE COV CORR .
Once you have made the change noted above, click Run on the menu bar followed by All. The
analysis output should then appear in an SPSS Viewer window. You should then proceed to save and
print the analysis. Your output should resemble the information provided on the following pages.
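For longer scales, typing the odd-then-even item order by hand is error-prone. The hypothetical Python helper below generates the modified line 4 of the syntax, assuming the items are named item1, item2, and so on, consecutively, and that the scale has an even number of items; it only reproduces the pattern shown above and is not an SPSS feature.

```python
def even_odd_scale_line(n_items: int, prefix: str = "item") -> str:
    """Hypothetical helper: build the /SCALE(SPLIT) subcommand listing the
    odd-numbered items first and the even-numbered items second.
    Assumes items are named prefix1 ... prefixN and that N is even."""
    if n_items < 2 or n_items % 2:
        raise ValueError("expected an even number of items (at least 2)")
    odd = [f"{prefix}{i}" for i in range(1, n_items + 1, 2)]
    even = [f"{prefix}{i}" for i in range(2, n_items + 1, 2)]
    return "/SCALE(SPLIT)= " + " ".join(odd + even) + "/MODEL=SPLIT"


print(even_odd_scale_line(6))
# /SCALE(SPLIT)= item1 item3 item5 item2 item4 item6/MODEL=SPLIT
```

The generated line can be pasted over line 4 of the Split-Half syntax in place of the hand-edited version.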