Supplemental Materials

“Structural Invariance of General Behavior Inventory (GBI) Scores in Black and White Young Adults”
By L. L. Pendergast et al., 2014, Psychological Assessment

Description of Findings from Differential Item Functioning Analyses

Data Analysis

An item response theory (IRT) framework was used to examine differential item functioning (DIF) across racial groups for the items within each of the 20 item parcels. (Item parcel names and content are provided in Table 1 in the article.) Specifically, within each parcel, DIF across racial groups was evaluated for each item relative to the other items in the same parcel. DIF was assessed using the IRTPRO program (Cai, du Toit, & Thissen, 2011). Samejima’s (1997) graded response model was fitted to responses to the items within each parcel. Because the GBI has four response options, each analysis estimates one slope and three threshold parameters per item (see Embretson & Reise, 2000, for a review of graded IRT models).

In IRT analyses, some items are commonly designated as anchors. Anchor items are presumed to be DIF-free and create a scale for the latent variable. In this study, DIF for each item was tested by designating all other items in the parcel as anchors and comparing model fit for the non-anchor item with and without equality constraints imposed between racial groups (Cohen, Kim, & Wollack, 1996). Although many methods are available for selecting anchor items (see Woods, 2009, for a review), we decided to designate all items within a parcel, except the tested item, as anchors because each parcel contains only three or four items. Using too few anchor items (e.g., single items) can lead to low power and less stable estimation (Woods, 2009). Thus, for each analysis, DIF of each item was tested individually with the remaining two or three items used as anchors. Items that exhibited slope or threshold DIF were not used as anchors.
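To make the parameterization concrete, the following is a minimal sketch of how a graded response model with one slope and three thresholds yields probabilities for four response categories. It is an illustration only, not the IRTPRO estimation routine; the function name `grm_category_probs` and the parameter values are hypothetical and are not taken from the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: latent trait level; a: item slope; thresholds: increasing b_k values.
    Returns one probability per response category (len(thresholds) + 1).
    """
    # Cumulative probability of responding at or above each threshold.
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with P(at or above lowest category) = 1 and P(above highest) = 0,
    # then take differences of adjacent cumulative probabilities.
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Hypothetical parameters for one four-option item (not study estimates).
probs = grm_category_probs(theta=1.0, a=1.5, thresholds=[0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

In a DIF test of the kind described above, the slope and thresholds of the studied item would be estimated once with these parameters constrained equal across groups and once with them free, and the two fits would be compared.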
Results

The majority of GBI items (n = 64) did not display significant DIF at the p < .01 level, as evidenced by the omnibus chi-square values (testing the item thresholds and slope as a set). Two items showed evidence of significant slope DIF (i.e., the degree to which the item is related to the factor differed across racial groups), and one of these two also displayed significant threshold DIF. A further seven items showed evidence of significant threshold DIF only (i.e., there were racial differences in the way participants with similar levels of the mood factor responded); see Table S1.

Significant slope DIF on Items 42 and 71 indicates that these items are more strongly related to mania for White participants than for Black participants. Significant threshold differences on Items 2, 6, 9, and 65 reveal that Black participants would need a lower mean level of depression in order to indicate that they experienced a particular depressive symptom more frequently. For example, on Item 6, “Have others said you seem down or lonely?,” the last threshold for White participants (3.61) compared with that for Black participants (2.27) suggests that Black participants would need a lower mean level of depression to endorse “Very often or almost constantly” instead of “Often.” Conversely, on Item 47, “Have there been times when you hated yourself?,” findings indicate that Black participants would require a higher mean level of depression than their White counterparts to indicate that the symptom occurred more frequently. On Items 42, 52, 56, and 71, the pattern of threshold differences varied within each item; neither White nor Black participants consistently endorsed all thresholds at lower levels of the mood factor.
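The Item 6 example can be illustrated numerically. The sketch below computes the probability of choosing the top response category for each group at the same latent level, using the Item 6 slopes and last thresholds as read from Table S1. It assumes that both groups' parameters are expressed on a common latent metric; the latent level of 2.5 is a hypothetical value chosen only for illustration, and the function name is not from the source.

```python
import math

def prob_at_or_above(theta, a, b):
    """Probability of responding at or above the category bounded by threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Item 6 estimates as read from Table S1: slope a and last threshold b3.
item6 = {"White": {"a": 1.33, "b3": 3.61}, "Black": {"a": 1.40, "b3": 2.27}}

theta = 2.5  # hypothetical latent depression level, identical for both groups
top = {g: prob_at_or_above(theta, p["a"], p["b3"]) for g, p in item6.items()}
for group, p in top.items():
    print(f"{group}: P(top category | theta = {theta}) = {p:.2f}")
```

Under these assumptions the lower last threshold for Black participants translates into a higher probability of endorsing the top category at the same latent level, which is the direction of threshold DIF described in the text.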
Discussion

Item analyses found that 64 of 73 GBI items did not show evidence of significant DIF in either the item slope (indicating strength of association with the underlying mood factor) or the thresholds (the level of mood needed for people to be likely to endorse the next higher score on the item). Two items showed slope DIF where the relationship to mood was significantly stronger in the White sample: Items 42 (urges to do mischievous or dangerous things) and 71 (felt fearful or suspicious at times). Eight of the items showed DIF in thresholds, but many of them showed complex patterns where one group would endorse mid-range scores with lower levels of the mood factor, but then require higher levels of mood to endorse the top score on the item. Overall, patterns of threshold DIF tended to cancel out across items, so that subscale scores on the GBI showed little bias between groups, consistent with the findings of Freeman et al. (2012), who examined a smaller set of parent-rated GBI items in a sample of children and adolescents.

References for Supplemental Materials

Cai, L., du Toit, S. H. C., & Thissen, D. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling. Chicago, IL: Scientific Software International.

Cohen, A. S., Kim, S. O., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15–26. doi:10.1177/014662169602000102

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Freeman, A. J., Youngstrom, E. A., Frazier, T. W., Youngstrom, J. K., Demeter, C., & Findling, R. L. (2012). Portability of a screener for pediatric bipolar disorder to a diverse setting. Psychological Assessment, 24, 341–351. doi:10.1037/a0025617

Samejima, F. (1997). Graded response models. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). New York, NY: Springer–Verlag.

Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42–57. doi:10.1177/014662160731404

Table S1
Results for General Behavior Inventory Items Displaying Significant Differential Item Functioning Across Racial Groups

| Itemᵃ | All equal χ² | p | Equal slopes χ² | p | Equal thresholds χ² | p | a | b1 | b2 | b3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2. Has your enjoyment in social interaction changed? | 14.80 | .005 | 0.90 | .345 | 13.90 | .003 | | | | |
| White | | | | | | | 1.26 | 1.42 | 0.85 | 2.47 |
| Black | | | | | | | 1.05 | 1.44 | 0.74 | 2.11 |
| 6. Have others said you seem down or lonely? | 27.50 | .000 | 0.00 | .923 | 27.50 | .000 | | | | |
| White | | | | | | | 1.33 | 0.31 | 1.65 | 3.61 |
| Black | | | | | | | 1.40 | 0.67 | 1.40 | 2.27 |
| 9. Have you lost interest in being with others at times? | 17.30 | .002 | 3.60 | .059 | 13.70 | .003 | | | | |
| White | | | | | | | 2.72 | 0.07 | 1.22 | 2.12 |
| Black | | | | | | | 1.98 | 0.43 | 1.18 | 2.03 |
| 42. Have had urges to do mischievous/dangerous things? | 12.00 | .017 | 8.10 | .004 | 3.90 | .271 | | | | |
| White | | | | | | | 2.16 | 0.25 | 1.34 | 2.43 |
| Black | | | | | | | 1.24 | 0.51 | 1.97 | 3.20 |
| 47. Have there been times when you hated yourself? | 20.60 | .000 | 2.90 | .091 | 17.70 | .001 | | | | |
| White | | | | | | | 3.25 | 0.43 | 0.86 | 1.82 |
| Black | | | | | | | 2.51 | 0.15 | 0.98 | 1.82 |
| 52. Have you had sleep difficulty while depressed? | 22.70 | .000 | 2.70 | .100 | 19.90 | .000 | | | | |
| White | | | | | | | 1.92 | 0.01 | 1.37 | 2.54 |
| Black | | | | | | | 2.75 | 0.24 | 1.58 | 2.04 |
| 56. Have you felt worthless for periods of several days? | 20.90 | .000 | 0.90 | .339 | 20.00 | .000 | | | | |
| White | | | | | | | 4.89 | 0.08 | 1.05 | 1.94 |
| Black | | | | | | | 4.00 | 0.12 | 1.34 | 1.77 |
| 65. Have you felt more down in the morning than in the evening? | 18.20 | .001 | 0.80 | .363 | 17.40 | .001 | | | | |
| White | | | | | | | 2.37 | 0.30 | 1.63 | 2.86 |
| Black | | | | | | | 3.23 | 0.03 | 1.55 | 1.95 |
| 71. Have you felt fearful/suspicious at times? | 27.70 | .000 | 13.40 | .000 | 14.30 | .003 | | | | |
| White | | | | | | | 1.96 | 0.10 | 1.67 | 2.83 |
| Black | | | | | | | 1.09 | 0.40 | 2.40 | 3.55 |

Note. a = slope parameter; b1, b2, b3 = threshold parameters.
ᵃ Item wording has been modified for test security purposes.
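As a reading aid for Table S1, the sketch below recomputes p values from the reported chi-square statistics and checks that each omnibus statistic is approximately the sum of the slope and threshold statistics. The degrees of freedom (1 for the slope test, 3 for the threshold test, 4 for the omnibus test) are an assumption inferred from the one slope and three thresholds estimated per item; they are not stated in the source. The helper `chi2_sf` is a hypothetical name and uses closed-form expressions valid only for these three cases.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for df in {1, 3, 4} (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        tail = math.erfc(math.sqrt(x / 2.0))
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1, 3, or 4 are handled in this sketch")

# (item, omnibus, slope, threshold) chi-square values as read from Table S1.
rows = [(2, 14.80, 0.90, 13.90), (42, 12.00, 8.10, 3.90), (71, 27.70, 13.40, 14.30)]

for item, omnibus, slope, thresh in rows:
    # Assumed degrees of freedom: omnibus = 4, slope = 1, thresholds = 3.
    p_values = (chi2_sf(omnibus, 4), chi2_sf(slope, 1), chi2_sf(thresh, 3))
    additive = abs(omnibus - (slope + thresh)) < 0.11  # allow for rounding
    print(item, [round(p, 3) for p in p_values], additive)
```

Small discrepancies from the tabled p values are expected because the chi-square statistics are reported to only one or two decimal places.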