Supplemental Materials
“Structural Invariance of General Behavior Inventory (GBI) Scores in Black and White Young Adults”
By L. L. Pendergast et al., 2014, Psychological Assessment
Description of Findings from Differential Item Functioning Analyses
Data Analysis
An item response theory (IRT) framework was used to examine item-level differential item functioning (DIF) across racial groups for the items within each of the 20 item parcels. (Item parcel names and content are provided in Table 1 of the article.) Specifically, within each parcel, DIF across racial groups was evaluated for each item relative to the other items in the same parcel. DIF was assessed using the IRTPRO program (Cai, du Toit, & Thissen, 2011). Samejima’s (1997) graded response model was fitted to the responses to the items within each parcel. Because the GBI has four response options, the model estimates one slope (a) and three threshold (b1, b2, b3) parameters for each item (see Embretson & Reise, 2000, for a review of graded IRT models).
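As a minimal sketch of the graded response model described above, the Python functions below compute the four category probabilities for one item from its slope a and ordered thresholds b1 < b2 < b3. They assume the logistic form without the 1.7 scaling constant (an assumption about the parameterization, not a statement about IRTPRO's internals), and the function names are illustrative.

```python
import math
from typing import List, Sequence


def boundary_prob(theta: float, a: float, b: float) -> float:
    """P(X >= k | theta) for the boundary with threshold b (logistic, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under Samejima's graded response model.

    With four response options there are three ordered thresholds (b1 < b2 < b3),
    so the result has four probabilities, one per response option, summing to 1.
    """
    # Cumulative curve: P(X >= 0) = 1, one P(X >= k) per threshold, P(X >= K) = 0.
    cumulative = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    # Probability of exactly category k = P(X >= k) - P(X >= k + 1).
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Item 6, White group: slope and thresholds as reported in Table S1.
    for theta in (0.0, 2.0, 3.61):
        probs = grm_category_probs(theta, 1.33, [0.31, 1.65, 3.61])
        print(theta, [round(p, 3) for p in probs])
```

At a latent level equal to a threshold, the probability of responding at or above the corresponding option is exactly .50, which is why a threshold can be read as the level of the mood factor needed to be likely to endorse the next higher option.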
In IRT analyses, some items are commonly designated as anchors. Anchor items are presumed to be DIF-free and are used to establish a common scale for the latent variable across groups. In this study, DIF for each item was tested by designating all other items in the parcel as anchors and comparing the fit of a model in which the studied item’s parameters were constrained to be equal across racial groups with the fit of a model in which they were free to differ (Cohen, Kim, & Wollack, 1996). Although many methods are available for selecting anchor items (see Woods, 2009, for a review), we designated all items within a parcel, except the studied item, as anchors because each parcel contains only three or four items, and using too few anchor items (e.g., a single item) can lead to low power and less stable estimation (Woods, 2009). Thus, in each analysis, DIF for each item was tested individually, with the remaining two or three items in the parcel used as anchors. Items that exhibited slope or threshold DIF were not used as anchors.
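The anchoring scheme and the model comparison described above can be sketched in Python as follows. This is an illustrative outline, not IRTPRO's interface: the item labels are hypothetical placeholders, and the log-likelihoods are assumed to come from fitting the constrained (parameters equal across groups) and free models to the same data. The degrees of freedom follow from the one slope and three thresholds estimated per item.

```python
from typing import Dict, Iterable, List, Sequence

# One slope and three thresholds per item, so freeing them across groups costs:
DF_SLOPE = 1
DF_THRESHOLDS = 3
DF_OMNIBUS = DF_SLOPE + DF_THRESHOLDS  # all four parameters tested as a set


def anchor_sets(parcel: Sequence[str], flagged: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Map each studied item to its anchors: every other item in the parcel,
    minus any item already flagged for slope or threshold DIF."""
    excluded = set(flagged)
    return {
        studied: [item for item in parcel if item != studied and item not in excluded]
        for studied in parcel
    }


def likelihood_ratio(loglik_constrained: float, loglik_free: float) -> float:
    """G^2 = -2 * (log L of constrained model - log L of free model).

    The constrained model forces the studied item's parameters to be equal across
    groups; the free model lets them differ. For nested models G^2 >= 0 and is
    referred to a chi-square distribution with df = number of freed parameters.
    """
    return -2.0 * (loglik_constrained - loglik_free)


if __name__ == "__main__":
    # Hypothetical four-item parcel; labels are placeholders, not GBI item numbers.
    print(anchor_sets(["i1", "i2", "i3", "i4"]))
    print(anchor_sets(["i1", "i2", "i3", "i4"], flagged={"i3"}))
```

A flagged item is still tested itself; it is only dropped from the anchor sets of the other items in its parcel.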
Results
Most GBI items (64 of 73) did not display significant DIF (p < .01), as evidenced by the omnibus chi-square values, which test the item thresholds and slope as a set. One item (Item 42) showed significant slope DIF only (i.e., the degree to which the item is related to the factor differed across racial groups); seven items showed significant threshold DIF only (i.e., there were racial differences in the way participants with similar levels of the mood factor responded); and one item (Item 71) displayed both slope and threshold DIF; see Table S1.
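To make the classification concrete, the sketch below converts the chi-square values reported in Table S1 to p values (df = 4 for the omnibus test, 1 for slopes, and 3 for thresholds, following from the four parameters per item) and sorts the nine items at α = .01. The closed-form chi-square tail is a standard identity used here in place of a statistics library (scipy.stats.chi2.sf would serve the same purpose). The reported p values were presumably computed from unrounded statistics, so recomputed values agree only to about two decimal places. In the table, the omnibus statistic is also, up to rounding, the sum of the slope and threshold statistics, which would be expected if the threshold test was conditional on equal slopes; that is an inference, not something the source states.

```python
import math
from typing import Dict, List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2.0
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1, then add one series term per step of 2 in df.
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total


# Chi-square values copied from Table S1: (all equal, equal slopes, equal thresholds).
TABLE_S1: Dict[int, Tuple[float, float, float]] = {
    2: (14.80, 0.90, 13.90),
    6: (27.50, 0.00, 27.50),
    9: (17.30, 3.60, 13.70),
    42: (12.00, 8.10, 3.90),
    47: (20.60, 2.90, 17.70),
    52: (22.70, 2.70, 19.90),
    56: (20.90, 0.90, 20.00),
    65: (18.20, 0.80, 17.40),
    71: (27.70, 13.40, 14.30),
}


def classify(alpha: float = 0.01) -> Tuple[List[int], List[int], List[int]]:
    """Split items into slope-only, threshold-only, and slope-and-threshold DIF."""
    slope_only, threshold_only, both = [], [], []
    for item, (_, slope_chi2, threshold_chi2) in TABLE_S1.items():
        slope_dif = chi2_sf(slope_chi2, 1) < alpha
        threshold_dif = chi2_sf(threshold_chi2, 3) < alpha
        if slope_dif and threshold_dif:
            both.append(item)
        elif slope_dif:
            slope_only.append(item)
        elif threshold_dif:
            threshold_only.append(item)
    return slope_only, threshold_only, both


if __name__ == "__main__":
    for item, (omnibus, slope, threshold) in TABLE_S1.items():
        print(item, round(chi2_sf(omnibus, 4), 3), round(slope + threshold, 2))
    print(classify())
```

Calling classify() returns ([42], [2, 6, 9, 47, 52, 56, 65], [71]): one item with slope DIF only, seven with threshold DIF only, and one with both.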
Significant slope DIF on Items 42 and 71 indicates that these items are more strongly related to mania for White participants than for Black participants. Significant threshold differences on Items 2, 6, 9, and 65 indicate that Black participants would need a lower level of depression to report experiencing a particular depressive symptom more frequently. For example, on Item 6, “Have others said you seem down or lonely?,” the last threshold is 3.61 for White participants and 2.27 for Black participants, suggesting that Black participants would need a lower level of depression to endorse “Very often or almost constantly” rather than “Often.” Conversely, on Item 47, “Have there been times when you hated yourself?,” the findings indicate that Black participants would require a higher level of depression than their White counterparts to report that the symptom occurred more frequently. On Items 42, 52, 56, and 71, the direction of the threshold differences varied within each item, rather than White or Black participants consistently reaching every threshold at lower levels of the mood factor.
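The Item 6 comparison can be illustrated with the top boundary curve of the graded response model, P(X = 3 | θ) = 1 / (1 + exp(−a(θ − b3))) for responses scored 0 to 3, using the slope and last-threshold estimates from Table S1. This is a minimal sketch that assumes the logistic parameterization and that both groups’ parameters lie on the common metric set by the anchor items; the function names are illustrative.

```python
import math

# Slope (a) and last threshold (b3) for Item 6, as reported in Table S1.
ITEM6 = {"White": (1.33, 3.61), "Black": (1.40, 2.27)}


def p_top_category(theta: float, a: float, b3: float) -> float:
    """P(top response option | theta): the highest GRM boundary curve (logistic, no 1.7)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b3)))


def theta_at_probability(prob: float, a: float, b3: float) -> float:
    """Latent level at which the top-option probability equals prob (inverse logistic)."""
    return b3 + math.log(prob / (1.0 - prob)) / a


if __name__ == "__main__":
    theta = 2.27
    for group, (a, b3) in ITEM6.items():
        print(group, round(p_top_category(theta, a, b3), 3), theta_at_probability(0.5, a, b3))
```

At θ = 2.27, the modeled probability of “Very often or almost constantly” is .50 for Black participants but only about .14 for White participants; White participants reach the same .50 point 1.34 units higher on the latent scale.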
Discussion
Item analyses found that 64 of 73 GBI items did not show evidence of significant DIF in either the item slope, which indicates the strength of association with the underlying mood factor, or the thresholds, which refer to the level of mood needed for respondents to be likely to endorse the next higher score on the item. Two items showed slope DIF, with the relationship to mood significantly stronger in the White sample: Items 42 (urges to do mischievous or dangerous things) and 71 (felt fearful or suspicious at times). Eight items showed DIF in thresholds, but many of them showed complex patterns in which one group would endorse mid-range scores at lower levels of the mood factor but then require higher levels of mood to endorse the top score on the item. Overall, patterns of threshold DIF tended to cancel out across items, so that subscale scores on the GBI showed little bias between groups, consistent with the findings of Freeman et al. (2012), who examined a smaller set of parent-rated GBI items in a sample of children and adolescents.
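How item-level threshold DIF can cancel out at the scale level can be illustrated by summing expected item scores, E[X | θ] = Σk P(X ≥ k | θ) for responses scored 0 to 3. The parameters below are hypothetical, not GBI estimates: two items share a slope and symmetric thresholds, and the focal group’s thresholds are shifted 0.3 lower on one item and 0.3 higher on the other. The sketch assumes the logistic graded response model and uses illustrative function names.

```python
import math
from typing import Sequence, Tuple

Params = Tuple[float, Sequence[float]]  # (slope a, thresholds b1 < b2 < b3)


def expected_item_score(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """E[X | theta] for responses scored 0..K-1: the sum of the GRM boundary curves."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def expected_score_gap(theta: float, items: Sequence[Tuple[Params, Params]]) -> float:
    """Sum over items of E[X | theta] in the focal group minus the reference group."""
    gap = 0.0
    for (a_ref, b_ref), (a_foc, b_foc) in items:
        gap += expected_item_score(theta, a_foc, b_foc) - expected_item_score(theta, a_ref, b_ref)
    return gap


# Hypothetical parameters (not GBI estimates): shared slope and symmetric thresholds;
# focal-group thresholds shifted 0.3 lower on item A and 0.3 higher on item B.
BASE = (-1.0, 0.0, 1.0)
ITEM_A = ((1.5, BASE), (1.5, tuple(b - 0.3 for b in BASE)))
ITEM_B = ((1.5, BASE), (1.5, tuple(b + 0.3 for b in BASE)))

if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0):
        single = expected_score_gap(theta, [ITEM_A])
        combined = expected_score_gap(theta, [ITEM_A, ITEM_B])
        print(f"theta={theta:+.1f}  item A alone: {single:+.3f}  items A+B: {combined:+.3f}")
```

In this toy case each item alone shifts the expected score by about 0.25 points at θ = 0, and the two shifts offset exactly there; at θ = 1 the combined gap is roughly −0.03, against about 0.19 for item A alone. Real items differ in slopes and in the size of their threshold shifts, so any cancellation has to be checked with the estimated parameters.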
References for Supplemental Materials
Cai, L., du Toit, S. H. C., & Thissen, D. (2011). IRTPRO: Flexible, multidimensional, multiple
categorical IRT modeling. Chicago, IL: Scientific Software International.
Cohen, A. S., Kim, S.-H., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for
detection of differential item functioning. Applied Psychological Measurement, 20, 15–26.
doi:10.1177/014662169602000102
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Freeman, A. J., Youngstrom, E. A., Frazier, T. W., Youngstrom, J. K., Demeter, C., & Findling, R. L.
(2012). Portability of a screener for pediatric bipolar disorder to a diverse setting. Psychological
Assessment, 24, 341–351. doi:10.1037/a0025617
Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook
of modern item response theory (pp. 85–100). New York, NY: Springer-Verlag.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied
Psychological Measurement, 33, 42–57. doi:10.1177/014662160731404
Table S1
Results for General Behavior Inventory Items Displaying Significant Differential Item Functioning Across Racial Groups

Item^a              All equal    Equal slopes Equal thresholds  Slope and threshold parameters
                    χ²     p     χ²     p     χ²     p          a      b1     b2     b3
2. Has your enjoyment in social interaction changed?
                    14.80  .005  0.90   .345  13.90  .003
   White                                                        1.26   1.42   0.85   2.47
   Black                                                        1.05   1.44   0.74   2.11
6. Have others said you seem down or lonely?
                    27.50  .000  0.00   .923  27.50  .000
   White                                                        1.33   0.31   1.65   3.61
   Black                                                        1.40   0.67   1.40   2.27
9. Have you lost interest in being with others at times?
                    17.30  .002  3.60   .059  13.70  .003
   White                                                        2.72   0.07   1.22   2.12
   Black                                                        1.98   0.43   1.18   2.03
42. Have you had urges to do mischievous/dangerous things?
                    12.00  .017  8.10   .004  3.90   .271
   White                                                        2.16   0.25   1.34   2.43
   Black                                                        1.24   0.51   1.97   3.20
47. Have there been times when you hated yourself?
                    20.60  .000  2.90   .091  17.70  .001
   White                                                        3.25   0.43   0.86   1.82
   Black                                                        2.51   0.15   0.98   1.82
52. Have you had sleep difficulty while depressed?
                    22.70  .000  2.70   .100  19.90  .000
   White                                                        1.92   0.01   1.37   2.54
   Black                                                        2.75   0.24   1.58   2.04
56. Have you felt worthless for periods of several days?
                    20.90  .000  0.90   .339  20.00  .000
   White                                                        4.89   0.08   1.05   1.94
   Black                                                        4.00   0.12   1.34   1.77
65. Have you felt more down in the morning than in the evening?
                    18.20  .001  0.80   .363  17.40  .001
   White                                                        2.37   0.30   1.63   2.86
   Black                                                        3.23   0.03   1.55   1.95
71. Have you felt fearful/suspicious at times?
                    27.70  .000  13.40  .000  14.30  .003
   White                                                        1.96   0.10   1.67   2.83
   Black                                                        1.09   0.40   2.40   3.55

^a Item wording has been modified for test security purposes.