Using Mixed-Model Item Response Theory to Analyze Organizational Survey Responses: An Illustration Using the Job Descriptive Index

Nathan T. Carter1, Dev K. Dalal1, Christopher J. Lake1, Bing C. Lin1,2, and Michael J. Zickar1

Organizational Research Methods 14(1), 116-146. © The Author(s) 2011. DOI: 10.1177/1094428110363309. http://orm.sagepub.com

1 Bowling Green State University, OH, USA
2 Department of Psychology, College of Arts and Sciences, Portland State University, OR, USA

Corresponding Author: Nathan T. Carter, Department of Psychology, Bowling Green State University, 214 Psychology Building, Bowling Green, OH 43402, USA. Email: ntcarte@bgsu.edu

Abstract

In this article, the authors illustrate the use of mixed-model item response theory (MM-IRT) and explain its usefulness for analyzing organizational surveys. The authors begin by giving an overview of MM-IRT, focusing on both technical aspects and previous organizational applications. Guidance is provided on how researchers can use MM-IRT to check scoring assumptions, identify the influence of systematic responding that is unrelated to item content (i.e., response sets), and evaluate individual and group difference variables as predictors of class membership. After summarizing the current body of research using MM-IRT to address problems relevant to organizational researchers, the authors present an illustration of the use of MM-IRT with the Job Descriptive Index (JDI), focusing on the use of the "?" response option. Three classes emerged, one most likely to respond in the positive direction, one most likely to respond in the negative direction, and another most likely to use the "?" response. Trust in management, job tenure, age, race, and sex were considered as correlates of class membership. Results are discussed in terms of the applicability of MM-IRT and future research endeavors.

Keywords: item response theory, latent class analysis, invariance testing

Item response theory (IRT) models have played a large role in organizational researchers' understanding regarding measures of a variety of domains, including job attitudes (e.g., Collins, Raju, & Edwards, 2000; Donovan, Drasgow, & Probst, 2000; Wang & Russell, 2005), personality
(e.g., Chernyshenko, Stark, Chan, Drasgow, & Williams, 2001; Zickar, 2001; Zickar & Robie, 1999), general mental ability (e.g., Chan, Drasgow, & Sawin, 1999), performance ratings (e.g., Maurer, Raju, & Collins, 1998), vocational interests (e.g., Tay, Drasgow, Rounds, & Williams, 2009), and employee opinions (e.g., Ryan, Horvath, Ployhart, Schmitt, & Slade, 2000). Recent extensions of IRT models provide increased flexibility in the questions researchers and practitioners may ask about the response–trait relationship in organizational surveys.

In this article, we explain how organizational survey data can be analyzed using mixed- (or mixture-) model IRT (MM-IRT). MM-IRT combines features of traditional IRT models (e.g., the partial credit model [PCM]) and latent class analysis (LCA; see Lazarsfeld & Henry, 1968) and identifies subgroups of respondents for whom the item response–latent variable relationships indicated by the item response functions (IRFs) are considerably different. Here, we explain the technical features of MM-IRT and discuss some of its potential uses, including checking scoring assumptions, identifying systematic responding, and evaluating potential correlates of class membership. We then illustrate the use of MM-IRT by applying the framework to answer some fundamental questions about the measurement properties of the Job Descriptive Index (JDI; Balzer et al., 1997; Smith, Kendall, & Hulin, 1969), a commercially available survey that measures facets of job satisfaction with five separate scales measuring persons' satisfaction with their work, coworkers, supervision, pay, and opportunities for promotions. Our discussion and results are presented in concert with Table 1, which outlines the process from beginning to end that a researcher would use when conducting an MM-IRT analysis. We first show how these steps are accomplished and then how we address each of these steps in our analysis of the JDI.

The JDI was chosen for this illustration for several reasons. First, it has consistently been found to be one of the most frequently used measures of job satisfaction (see Bowling, Hendricks, & Wagner, 2008; Connolly & Viswesvaran, 2000; Cooper-Hakim & Viswesvaran, 2005; Judge, Heller, & Mount, 2002). In fact, the JDI Office at Bowling Green State University continues to acquire around 150 data-sharing agreements per year (J. Z. Gillespie, personal communication, October 28, 2009). In addition, the JDI's "Yes," "No," "?" response format has been examined with conventional IRT analyses (e.g., Hanisch, 1992), and measures using similar response scales (i.e., "Yes," "No," "?") have been investigated in past MM-IRT research (e.g., Hernandez, Drasgow, & Gonzalez-Roma, 2004). In sum, the JDI provided a well-known measure with properties that have been of interest in the past and in the current research and application, allowing for an accessible and substantively interesting measure for the illustrative MM-IRT analyses.

An Introduction to Mixed-Model IRT

IRT is essentially a collection of formulated models that attempt to describe the relationship between observed item responses and latent variables (e.g., attitude, personality, and interests). The IRFs of IRT are logistic regressions that start with observed responses and use conditional probabilities of responding to an item in a particular way (e.g., strongly agree) to find an appropriate transformation of the sum score to represent the underlying or latent variable.
Additionally, the models parameterize different properties of items (depending on the model) that are on a scale common to the estimates of persons' standing on the latent variable. Although any number of item properties could be included, IRT models are generally concerned with the location and discrimination of items. The item's location reflects its difficulty (in an ability context) or extremity (in attitudes or personality). In situations where items have more than two options, the item's location is often quantified by threshold parameters. Thresholds represent the point on the latent variable continuum at which the probability of responding to one option becomes greater than choosing another; thus, there will be one less threshold parameter than there are options. Generally, the average of these thresholds can be considered an estimate of the item's location. An item's discrimination is a reflection of its sensitivity to variability in the latent variable. Discrimination parameters are typically quantified as the regression line's slope at the point of the item's location on the trait continuum.

IRT models are advantageous for several reasons. Most pertinent to this article is that they place persons and items on a common scale, providing measurement researchers an appropriate framework for determining whether group differences in observed sum scores are likely to be due to differences on the latent variable or to other, extraneous reasons, such as membership in a protected class, response sets, or individual differences other than what the researcher is attempting to measure.

Conventional IRT models assume that item responses are drawn from one homogenous subpopulation. This implies that one set of IRFs can be used to describe the relationship between item responses and the latent trait. However, it may be plausible that there are subgroups of respondents with different response–trait relationships; in other words, more than one set of item parameters may be needed to model item responding. In fact, it has been noted that such heterogeneity can be expected in situations where the studied population is complex (von Davier, Rost, & Carstensen, 2007), which organizational researchers are likely to encounter.

Typically, researchers examine the viability of the homogenous subpopulation assumption by conducting differential item functioning (DIF) analyses based on important manifest group variables. For organizational researchers, these groups are typically based on legally, ethically, or practically important manifest variables such as respondent sex and race. However, it is possible that differences in the response–trait relationship are better described by latent groupings that may or may not correspond to the previously mentioned observed variables. These unobserved groups may exist for a variety of reasons, including differential endorsement strategies and comparison processes that may result from different sociocultural experiences (Rost, 1997). MM-IRT identifies unobservable groups of respondents with different response–trait relationships, in effect an exploratory method of DIF detection wherein subgroups are identified a posteriori (Mislevy & Huang, 2007). In the following sections, we provide a general definition of MM-IRT and discuss the estimation of item and person parameters under the Rasch family of models.
We focus here on the use of Rasch-based IRT models because these are the models available in the WINMIRA program (von Davier, 2001), a user-friendly software program that can be used by researchers to conduct such analyses without the more intensive programming involved when estimating other mixed IRT models.

MM-IRT Model Definition

As noted above, MM-IRT is a hybrid of IRT and LCA, which uncovers unobservable groups whose item parameters, and therein IRFs, differ substantially. The most general form of the LCA model can be written

P(\mathbf{u}) = \sum_{g=1}^{G} \pi_g \, P(\mathbf{u} \mid g),   (1)

where P(u) denotes the probability of a vector of observed responses, u = (x_1, x_2, ..., x_i). The term P(u|g) denotes the probability of the response vector within an unobservable group, g, of size \pi_g. The \pi parameter, also called the mixing proportions parameter, is used to represent the proportion of respondents belonging to the gth group and carries the assumption

\sum_{g=1}^{G} \pi_g = 1,   (2)

that is, the summation across the proportion parameter estimates must be equal to 1.

The conditional probability of observed responses within a group, P(u|g), can be replaced by any number of conventional dichotomous or polytomous IRT models, assuming that the basic form of the model holds across groups with a different set of values for the IRT model's parameters. Here, we focus on the use of the PCM (Masters, 1982), which is stated

P(U_{ij} = h) = \frac{\exp(h\theta_j - \sigma_{ih})}{\sum_{s=0}^{m} \exp(s\theta_j - \sigma_{is})}, \quad \text{for } h = 0, 1, 2, \ldots, m,   (3)

or that the probability of person j responding h to an item, i, is determined by the distance between person j's standing on the latent trait, θ, and the sum of the h option thresholds for the s = h + 1 possible observed response options:

\sigma_{ih} = \sum_{s=1}^{h} \tau_{is}, \quad \text{with } \sigma_{i0} = 0.   (4)

The option location parameter, τ_is, is the location of the threshold, s, on the scale of θ for the ith item. This IRT model assumes that there is one latent population of respondents who use the same response process (i.e., who use scales similarly). Note that the s term here is simply a counting variable for purposes of summation that corresponds to the levels of h.

The PCM is a polytomous Rasch model and therefore assumes equal discriminations across items, whereas other polytomous IRT models do not (e.g., the graded response model). Although Rasch models often have worse fit than models that allow item discriminations to vary (see Maij-de Meij et al., 2008), the WINMIRA program uses only Rasch-based measurement models. Estimation is much easier using the simple Rasch models because of the exclusion of multiplicative parameter terms (Rost, 1997). Although we hope that commercially available user-friendly software will someday incorporate more complex models, we proceeded cautiously with the Rasch-based PCM, paying careful attention to item-level fit to test whether the model was tenable for our data. Other researchers using attitudinal (Eid & Rauber, 2000) and personality data (Hernandez et al., 2004; Zickar, Gibby, & Robie, 2004) have found acceptable fit using similar models. Although we focus on the Rasch models available in WINMIRA, it should be noted that our discussion of MM-IRT generalizes easily to IRT models that allow for discrimination to vary. However, doing so requires more extensive programming experience using a more complex program such as LatentGOLD 4.5 (Vermunt & Magidson, 2005).
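To make the form of Equations 3 and 4 concrete, the minimal sketch below computes PCM category probabilities for a single three-option item ("No," "?," "Yes") at a given trait level. The threshold values are invented for the example and are not parameters estimated from the JDI data.

```python
import numpy as np

def pcm_probabilities(theta, thresholds):
    """Category probabilities under the partial credit model (Masters, 1982).

    theta      : a person's standing on the latent trait
    thresholds : option thresholds (tau_i1, ..., tau_im); a three-option item
                 has two thresholds.
    Returns an array of m + 1 probabilities that sum to 1.
    """
    # Numerators are exp(h*theta - sigma_ih), where sigma_ih is the cumulative
    # sum of the first h thresholds and sigma_i0 = 0 (Equations 3 and 4).
    exponents = np.concatenate(([0.0], np.cumsum(theta - np.asarray(thresholds, dtype=float))))
    numerators = np.exp(exponents)
    return numerators / numerators.sum()

# Illustrative thresholds only: the "No"/"?" crossing at -0.5, the "?"/"Yes" crossing at 0.8.
print(pcm_probabilities(theta=0.0, thresholds=[-0.5, 0.8]))
```

The thresholds are the trait values at which adjacent option response curves cross; this is the property exploited later when threshold ordering is used to check scoring assumptions.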
The mixed PCM (MPCM) is obtained by substituting the IRF (Equation 3) in place of the P(u|g) term in Equation 1, or

P(U_{ij} = h) = \sum_{g=1}^{G} \pi_g \, \frac{\exp(h\theta_{jg} - \sigma_{ihg})}{\sum_{s=0}^{m} \exp(s\theta_{jg} - \sigma_{isg})}, \quad \text{with } \sigma_{ihg} = \sum_{s=1}^{h} \tau_{isg}.   (5)

According to this model, each person and item have as many sets of relevant parameters as there are groups; so in a 3-class solution, each person will have three estimates of θ, and each item will have three times the number of item parameters in the single-group version of the PCM. The item and group-proportion parameters are estimated simultaneously using an extended expectation-maximization (EM) algorithm, using conditional maximum likelihood in the maximization step. Because the sum score can be considered a sufficient estimate of θ in Rasch models, the trait estimate is not involved in the EM procedure. The θ estimates are obtained using the item and group-proportion parameters established in the EM procedure and then are estimated by setting the observed total score equal to the right-hand side of the model in Equation 5 and solving for θ iteratively. This is an important point, because it means θ for person j in one group will not differ greatly from their estimate in another group; this is because estimation of θ is based on the observed sum score, which is the same for a given person, j, across groups (Rost, 1997).

The researcher using MM-IRT need only specify the number of subpopulations believed to underlie what would typically be considered data coming from one homogenous subpopulation. First, a 1-class model is fit, then a 2-class model, a 3-class model, and so on. Once an increase in the number of classes no longer shows an increase in model-data fit, the models are compared and the best fitting is retained for further analysis. By identifying the latent class structure and therein unmixing groups with different response–trait relationships nested within the 1-class model, the researcher can ask both practically and theoretically interesting questions.
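The mixing in Equation 5 amounts to weighting class-specific PCM probabilities by the class sizes π_g. The sketch below illustrates only this combination step with invented two-class values; in practice the class sizes, thresholds, and class memberships are estimated by WINMIRA's EM procedure described above.

```python
import numpy as np

def pcm_probs(theta, thresholds):
    """Class-specific PCM category probabilities (Equations 3 and 4)."""
    e = np.exp(np.concatenate(([0.0], np.cumsum(theta - np.asarray(thresholds, dtype=float)))))
    return e / e.sum()

# Hypothetical 2-class example (sizes and thresholds invented for illustration).
class_sizes = [0.7, 0.3]                    # mixing proportions; must sum to 1 (Equation 2)
class_thresholds = [[-0.5, 0.8],            # class 1: "?" rarely the most likely option
                    [-1.5, 1.5]]            # class 2: "?" attractive over a wide middle range

def mixed_pcm_probs(theta, sizes, thresholds_by_class):
    """Marginal category probabilities under the mixed PCM (Equation 5)."""
    marginal = np.zeros(len(thresholds_by_class[0]) + 1)
    for pi_g, taus in zip(sizes, thresholds_by_class):
        marginal += pi_g * pcm_probs(theta, taus)
    return marginal

print(mixed_pcm_probs(0.0, class_sizes, class_thresholds))
```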
In the following section, we show some past uses of MM-IRT in organizational research before presenting our illustration using the JDI.

Applications of MM-IRT in Organizational Research

In the first application of MM-IRT to organizational research, Eid and Rauber (2000) found two distinct response styles in a measure of organizational leadership satisfaction: one class that made use of the whole response scale and one that preferred extreme responses. These authors found that length of service and level within the organization explained group membership, suggesting that usage of the entire response scale may be too complex or too time-consuming for some respondents and that certain employees may be more prone to use global judgments such as "good" and "bad." Additionally, a larger percentage of females belonged to the second class compared to males.

Zickar et al. (2004) examined faking in two personality inventories. In their first analysis, Zickar and colleagues applied MM-IRT to an organizational personality inventory (Personal Preferences Inventory; Personnel Decisions International, 1997) with subscales mapping onto the Big 5 and found three classes of respondents for each of the five dimensions of the personality inventory, with the exception of the Neuroticism subscale, which extracted four classes. In general, results suggested that there were likely three extents of faking: no faking, slight faking, and extreme faking. In their second analysis, the authors applied MM-IRT to a personality inventory (Assessment of Background and Life Events; White, Nord, Mael, & Young, 1993) in a sample of military personnel. Prior to survey administration, the authors placed respondents in honest, ad lib faking, and trained faking conditions. Results suggested two response classes for each of these conditions: a faking and an honest class. Interestingly, 7.2% to 22.9% of respondents in the honest condition were placed in the faking class, and 27.2% to 41.6% of participants in the coached faking condition were placed in the honest class, reflecting the inherent complexity and high variability in personality test faking.

Finally, in a study of measures of extraversion and neuroticism in the Amsterdam Biographical Questionnaire (ABQ; Wilde, 1970), Maij-de Meij, Kelderman, and van der Flier (2008) found that a 3-class solution best fit the data for each scale. The classes were differentiated by the probability of using each of the "Yes," "No," and "?" responses. Participants also completed a measure of social desirability as part of the ABQ, and scores on social desirability and ethnic background had significant effects on class membership. Results from this study suggest that personality measure scores from ethnically diverse and high-stakes contexts must be interpreted with caution, as there were strong effects of ethnicity and social desirability on response class membership.

Although the analyses discussed above provide valuable insight into the use of MM-IRT in organizational research, a straightforward illustrative application of MM-IRT involving attitudinal surveys is not available in the current literature. Hernandez et al. (2004) provided a thorough investigation of personality questionnaires, and Eid and Rauber (2000) investigated the use of a satisfaction questionnaire. However, their discussion considered the case of a 2-class solution in regard to predicting class membership, which has recently been noted as possibly too restrictive (Maij-de Meij et al., 2008), and use of solutions with more than two classes requires the use of different analytic techniques and considerations. Additionally, some unique issues related to the use of Rasch models have not seen coverage in the literature on MM-IRT. In this illustration, we intend to address these as yet unattended issues, providing a fairly comprehensive illustration of the use of the method in organizational research.

Background to the Illustration: The JDI and Similar Scales

Before presenting our own illustration of MM-IRT analysis using the JDI, we begin by briefly reviewing the background of the measure and its typical assumptions for scoring to inform the interpretation of results. In addition, we review MM-IRT research analyzing measures using a "Yes," "No," and "?" response format. We then proceed with our analyses, discussing our use of the technique to (a) identify and assess the fit of the appropriate model, (b) qualify the latent classes, (c) check scoring assumptions, (d) examine the possibility of systematic response styles, and (e) examine relevant individual difference and group membership variables as a reason for the latent class structure.
Step 1: Background Review of the JDI and Similar Scales

Originally, Smith et al. (1969) scored the items in the five JDI scales as follows: "Yes" = 3, "?" = 2, and "No" = 1. Taking data from 236 persons responding to the Work scale, they split the sample into "satisfied" and "dissatisfied" groups and found that dissatisfied persons tended to have significantly more "?" responses than those in the satisfied group. This led them to recommend the currently used asymmetric scoring scheme of 3 (Yes), 1 (?), and 0 (No) because the data suggested the "?" response was more likely to be associated with dissatisfied than satisfied persons (Balzer et al., 1997).

Hanisch (1992) evaluated the viability of the Smith et al. (1969) scoring procedure using Bock's (1972) nominal IRT model. In observing option response functions (ORFs), it was clear that the intervals between options were not equidistant, suggesting that the "Yes" option was well above "?" and "No" on the trait continuum and that those moderately low in θ were more likely to endorse the "?" response, thus verifying the original scoring by Smith et al. One of the limitations of the nominal IRT model used by Hanisch is that it assumes that all respondents use the same type of response process when answering items. It may be possible, however, that even though a majority of individuals interpret the "?" response option as a neutral response, there are others who interpret this ambiguous option in other ways. MM-IRT will allow us to further probe how different groups of respondents interpret and use the "?" option.

Expectations concerning class structure. Due to our review of measures using response anchors similar to the JDI, we were particularly concerned with the use of the "?" option. Although the research concerning the "?" response of the JDI has been mostly supportive of the common conceptualization of the "?" response as being between "Yes" and "No" responses (e.g., Hanisch, 1992; Smith et al., 1969), other researchers have shown less confidence about this assumption in the analyses of other scales (e.g., Bock & Jones, 1968; Dubois & Burns, 1975; Goldberg, 1971; Hernandez et al., 2004; Kaplan, 1972; Worthy, 1969). To date, the research available using MM-IRT to investigate the use of "?" has agreed with the latter group of researchers (see Hernandez et al., 2004; Maij-de Meij et al., 2008; Smit, Kelderman, & van der Flier, 2003), finding that the vast majority of respondents avoid the "?" response. Thus, we hypothesize (Hypothesis 1) that a class will be identified that has a higher probability of using the "?" and that the remaining classes will avoid the "?" response (Hernandez et al., 2004; Maij-de Meij et al., 2008; Smit et al., 2003).

Eid and Zickar (2007) noted that latent classes can also uncover groups of respondents that are "yea-sayers" and "nay-sayers," who use one end of the scale regardless of item content (Hernandez et al., 2004; Maij-de Meij et al., 2008; Reise & Gomel, 1995; Smit et al., 2003). Thus, we expected that the persons who avoid the "?" response would constitute extreme responders, manifested in either one class using only extremes, or two classes, each preferring one of the two extremes (i.e., "Yes" or "No") over other options.
Because there was some inconsistency in the past research concerning the division of extreme respondents, we do not formulate formal hypotheses concerning the expected number of classes.

Expectations concerning systematic responding. Concerning consistency of classification across scales, we found only one study, by Hernandez et al. (2004), that directly addressed this issue. These researchers found only moderate consistency in class membership across the 16 PF, as indicated by the average size of phi correlations between class membership in every scale, M = .39, SD = .04, most of which was due to the largest class. That is, even if a respondent was in the class that uses the "?" frequently for one of the scales, this did not mean that the same person was placed in the same class for other scales. We postulated (Hypothesis 2) that respondents will be classified into a particular class across the five JDI scales with moderate consistency. Support for this hypothesis would suggest that although latent class membership is to some extent determined by a specific response style, it is by no means the only possible consideration.

Checking scoring assumptions. Although there is some available research using MM-IRT to evaluate scoring assumptions of scales using the "?" option, we believe this literature is not so established that these findings can be generalized across scales. In this investigation, we will be using the MPCM to examine the viability of the assumption that "?" falls between the "Yes" and "No" response options and therein whether summing across options is an appropriate scoring algorithm. The Rasch models are especially useful for examining the viability of this assumption due to the additivity property of Rasch models (Rost, 1991). This property implies that if the total score does not hold as a meaningful additive trait representation, one potential consequence is the disordering of item threshold estimates (Andrich, 2005; Rost, 1991). Take the example of measuring perceived length and relating it to the physical length of rods: If additivity holds, a 10-foot rod will be judged as longer than a 5-foot rod; if additivity does not hold, there will be a considerable number of persons judging the 5-foot rod as longer than the 10-foot rod.

Correlates of class membership. In addition to the above hypotheses, we were also interested in identifying variables that could explain respondents' classification into the subpopulation that does not avoid the "?" response. Hernandez et al. (2004) noted that "Other factors have been suggested (Cruickshank, 1984; Dubois & Burns, 1975; Worthy, 1969) . . ." for respondents' choice of "?" other than representing a point on the trait continuum between "Yes" and "No," noting it is possible that respondents ". . . (a) have a specific response style, (b) do not understand the statement, (c) do not feel competent enough or sufficiently informed to take a position, or (d) do not want to reveal their personal feelings about the question asked" (p. 688). Although Hypothesis 2 above addresses the question of whether response style influences are the reason for classification, the remaining points should also be addressed. We identified variables to explore in the available data set to address points (c) and (d): job tenure (JT) and trust in management (TIM), respectively. Unfortunately, as will be discussed later, we did not have data available to evaluate point (b), that comprehension drives the decision to use or not use the "?" response.
JT was used as an approximation of the idea of Hernandez and colleagues (2004) that those using the "?" response ". . . do not feel competent enough or sufficiently informed to take a position" (p. 688). Although knowledge concerning the job is variable among new employees, knowledge regarding its domains can be expected to increase during an employee's tenure (Ostroff & Kozlowski, 1992). JT can be expected to affect the extent to which an employee feels informed regarding the five dimensions on the JDI, which map onto the four aspects of organizational characteristics that employees must learn (Work: job-related tasks; Supervision and Coworkers: group processes and work roles; Pay and Promotion: organizational attributes).

TIM was used to approximate the idea of Hernandez et al. (2004) that the "?" response may indicate a lack of willingness to divulge feelings. In work settings, it has long been thought that trust moderates the level of honest and accurate upward communication that takes place (see Jablin, 1979; Mellinger, 1956). Evidence of the link between trust and open communication comes from recent empirical studies. First, Detert and Burris (2007) found that employee voice, the act of openly giving feedback to superiors, was related to a feeling of safety and trust. Levin, Whitener, and Cross (2006) found that subordinates' perceived trust in supervisors related to levels of subordinate–supervisor communication. Finally, Abrams, Cross, Lesser, and Levin (2003), in a series of employee interviews, found that trust facilitated the sharing of knowledge with others. In sum, giving honest, open feedback about perceived shortcomings of an organization is likely to be seen as risky (Rousseau, Sitkin, Burt, & Camerer, 1998). Based on this link, we expect those with low TIM to use the "?" response more than those with high TIM, as it allows responding in a non-risky way to sensitive scale items.

We also decided to examine sex and race as explanatory variables that have been suggested as important considerations in MM-IRT studies (Hernandez et al., 2004). Past research has found that sex is not a significant predictor of class membership when examining scales with the "Yes"–"No"–"?" response scheme (e.g., Hernandez et al., 2004), whereas ethnic background has been found to have a significant effect on class membership in one study (see Maij-de Meij et al., 2008). Here, we consider minority status (i.e., Caucasian vs. non-Caucasian), a typical concern among researchers investigating bias in organizational contexts in the United States. No research we are aware of has examined the influence of age on class membership. These variables were investigated in an exploratory fashion; we do not postulate formal hypotheses.

Table 1. Common Steps for Conducting Mixed-Model Item Response Theory (MM-IRT) Analyses and Explaining Class Membership

Step 1. Background/Review
  Description: Conduct a review of the measure being studied and measures using similar response scales. Formulate hypotheses and expectations concerning class structure and correlates of class membership.
  Useful tools/statistics: Typical search engines such as PsycInfo.

Step 2. Assess Relative Fit/Model Choice
  Description: Determine the appropriate number of classes; fit first the 1-class, then the 2-class model, and so on, comparing their fit. May need to defer to absolute fit when relative differences are small.
  Useful tools/statistics: Information theory statistics such as CAIC; bootstrapped Pearson's χ2.
  Key citations: Bozdogan (1987); WINMIRA user manual (von Davier, 2001).

Step 3. Assess Absolute Fit
  Description: Determine whether the model fits the data well without reference to other models, and whether there is sufficient item-level fit.
  Useful tools/statistics: Item-level Q index for item fit; bootstrapped Pearson's χ2; model parameter values.
  Key citations: Rost & von Davier (1994); WINMIRA user manual (von Davier, 2001).

Step 4. Name the Classes
  Description: Name the latent classes (LCs) in a way that is behaviorally meaningful according to response behavior.
  Useful tools/statistics: Category probability histograms.
  Key citations: Eid & Zickar (2007).

Steps 5/6. Check Scoring Assumptions
  Description: Determine that thresholds are ordered appropriately. If not, determine whether freeing discrimination parameters alleviates disordering. Determine whether categories are appropriately ordered. Ensure that observed scores and trait estimates are commensurate via correlation coefficients.
  Useful tools/statistics: Item threshold parameter plots; option response function plots; alternate Bilog, Multilog, or Parscale parameterizations to free discrimination parameters across items.
  Key citations: Rost (1991); Andrich (2005); Multilog user manual (Thissen, Chen, & Bock, 2003); Thissen & Steinberg (1986) for the PCM.

Steps 5/6. Evaluate the Influence of Response Sets
  Description: Determine whether persons are consistently classified into similar latent groups across scales (only applicable to multiple-scale measures).
  Useful tools/statistics: Phi correlations for the 2-class case; contingency statistics (e.g., chi-square and related effect sizes); Cohen's kappa.
  Key citations: Hernandez, Drasgow, & Gonzalez-Roma (2004) for the 2-class case; this article for more than two classes.

Step 7. Assess Correlates of Class Membership
  Description: Using past research and organizational initiatives/concerns, identify variables that may be useful for explaining class membership. Conduct analyses to determine whether these variables explain class membership.
  Useful tools/statistics: Logistic regression (bi- or multinomial); covariate integration; descriptive statistics across classes.
  Key citations: Hernandez et al. (2004) for the 2-class case; this article for more than two classes; Kutner, Nachtsheim, Neter, & Li (2005) for logistic regression; Maij-de Meij, Kelderman, & van der Flier (2008) for covariate integration.

Note: PCM = partial credit model.

Illustrative Analyses

Data Set

The data were obtained from the JDI offices at Bowling Green State University and included responses from 1,669 respondents to the five facet scales: Pay (9 items), Work (18 items), Opportunities for Promotion (9 items), Supervision (18 items), and Coworkers (18 items). Sample sizes for each scale after deletion of respondents showing no response variance are included in Table 2. The sample consisted of mostly full-time workers (87.8%) and had a roughly equal number of supervisors (42.7%) and nonsupervisors. The mean age was 44, and 45.2% were female. Race demographics were 77% Caucasian/White, 17% Black/African American, 2% Hispanic/Latino, 1% Asian, and 0.5% Native American; the remaining respondents used the "Other" category. From the spring to summer of 1996, data were collected approximately uniformly from the north, midwest, west, and south of the United States as part of norming efforts for the JDI (Balzer et al., 1997). JT was measured by asking respondents, "How many years have you worked on your current job?" Responses ranged from 0 to 50, M = 9.84, SD = 9.2, and were positively skewed.
We transformed JT by the method of Hart and Hart (2002) to approach normality. TIM was measured with the JDI TIM scale, a 24-item scale with a format similar to the JDI, using the same response scale, with M = 35.34, SD = 23.11, and α = .96 in this sample. For more information on these data and the TIM measure, see Balzer et al. (1997).

Results

Step 2: Assessing relative fit. The process of identifying the number of latent classes involves a sequential search. First, the single-group PCM is estimated by setting G to 1, or 100% of respondents; this model represents the case where it is assumed that all items and persons can be represented by one set of parameters and reduces the model in Equation 5 to Equation 3. The number of classes is then specified as one greater, until a decrease in fit is observed. The most appropriate model is chosen with reference to one of several relative fit statistics. Here, we use the Consistent Akaike's Information Criterion (CAIC; Bozdogan, 1987). CAIC is considered useful because it penalizes less parsimonious models based on the number of person and item parameters estimated:

\mathrm{CAIC} = -2\ln(L) + p[\ln(N) + 1],   (6)

where −2 ln(L) is −2 times the log-likelihood function (common to conventional IRT model fit assessment) taken from the maximization step of the EM procedure. This statistic is increased by correcting for the number of parameters estimated, p, and ln(N) + 1, the log-linear sample size with an additive constant of 1. The related statistic, Akaike's Information Criterion (Akaike, 1973), does not correct for either of these, and the Bayesian Information Criterion (Schwarz, 1978) only corrects for p. CAIC is used because it promotes parsimony more than the alternatives.

Table 2. Mixed-Model Item Response Theory (MM-IRT) Model Fit Statistics for Job Descriptive Index (JDI) Scales by the Number of Latent Classes in the Model

CAIC fit statistic by number of classes in the model (1 / 2 / 3 / 4 / 5 classes):
Work (N = 1,563): 34,549.18 / 33,927.93 / 33,529.71 / 33,505.90 / 33,621.13
Coworker (N = 1,529): 41,415.55 / 39,326.18 / 39,104.40 / 39,102.27 / 39,112.48
Supervisor (N = 1,562): 42,472.62 / 41,020.70 / 40,745.23 / 40,770.56 / —
Promotions (N = 1,586): 20,406.90 / 19,532.70 / 19,026.70 / 18,868.63 / 18,968.41
Pay (N = 1,564): 22,081.51 / 20,133.49 / 19,556.70 / 19,687.08 / —

Note: CAIC = Consistent Akaike's Information Criterion. In the original table, italicized CAIC statistics denote the most appropriate latent class solution.

According to CAIC, all JDI subscales fit a 3-class model well relative to other models estimated (see Table 2). Initially, the 4- and 5-class models for the Promotions scale would not converge within 9,999 iterations to meet the accuracy criterion of .0005 (the default of the WINMIRA program). We solved this problem by using several different starting values to avoid local maxima. Nonconvergence is not uncommon and is often due to low sample size, as this can lead to near-zero within-class response frequencies; such a finding should spur the researcher to consider the bootstrapped fit statistics in addition to model choice statistics (M. von Davier, personal communication, October 23, 2009). For the Work, Coworker, and Promotions scales, the 4-class model showed slightly better fit than the 3-class model. As noted earlier, in MM-IRT analyses model choice can become difficult due to small incremental increases in fit, as is the case here (see Table 2).
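Equation 6 is simple to apply to the log-likelihood reported for each candidate solution. The sketch below is a minimal illustration of the comparison summarized in Table 2; the log-likelihood values and parameter counts are placeholders rather than the WINMIRA output behind the table.

```python
import math

def caic(log_likelihood, n_params, n):
    """Consistent AIC (Bozdogan, 1987): -2 ln(L) + p[ln(N) + 1] (Equation 6)."""
    return -2.0 * log_likelihood + n_params * (math.log(n) + 1.0)

# Placeholder values for a 1- through 3-class comparison on a single scale.
solutions = {1: (-17250.0, 37), 2: (-16880.0, 76), 3: (-16690.0, 115)}   # classes: (ln L, p)
fits = {g: caic(ll, p, n=1563) for g, (ll, p) in solutions.items()}
best = min(fits, key=fits.get)   # the smallest CAIC marks the most appropriate solution
print(fits, "-> retain the", best, "-class model")
```

Because the parameter count p grows quickly with each added class, CAIC's penalty term is what keeps the search from always favoring the model with the most classes.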
It is important to understand that the probability of choosing an overparameterized model increases sharply as the difference between information criteria and sample size become smaller and that model choice should be largely motivated by parsimony (Bozdogan, 1987). In fact, Smit et al. (2003) constrained their number of groups to 2, underscoring the importance of this issue. However, it has been suggested more recently that this approach is likely too restrictive (Maij-de Meij et al., 2008). Following the strategy of Hernandez et al. (2004), we analyzed both solutions further to determine which was more appropriate by examining the absolute (as opposed to relative) fit of the competing models via item-fit and bootstrapped-fit statistics and by checking for unreasonable parameter estimates.

As noted previously, we attempted to proceed cautiously in our use of the Rasch-based PCM. Therefore, we were concerned with the extent to which there would be misfit due to the fact that the PCM does not take discrimination into account. Therefore, we thought it important to investigate the relative fit of the PCM and the generalized PCM (GPCM). We estimated the equivalent parameterizations of the 1-class PCM and the GPCM (which allows for discrimination to vary across items) by Thissen and Steinberg (1986) in Multilog 7.03 (Thissen, Chen, & Bock, 2003) for the Work scale. We found that the GPCM did show somewhat better fit, CAIC = 12,034, than the PCM, CAIC = 12,458. However, the two models appeared to fit similarly at the absolute level. Additionally, the locations of items under the two models were correlated .93. These results suggested that although the GPCM showed somewhat better fit, the PCM fit reasonably well. Below, we proceed with our analysis using the PCM.

Table 3. Mixed-Model Item Response Theory (MM-IRT) Item Fit and Latent Class Size Estimate, π, by Class Type (i.e., "Y," "N," and "?")

JDI Scale: Misfit item rate / π (AC) / π (DC) / π (MLQC)
Work: 5/54 / .52 / .27 / .21
Coworker: 3/54 / .40 / .27 / .33
Supervisor: 8/54 / .48 / .29 / .22
Promotions: 2/27 / .32 / .57 / .11
Pay: 0/27 / .34 / .51 / .15

Note: AC = acquiescent class; DC = demurring class; JDI = Job Descriptive Index; MLQC = most likely to use the question mark class. Misfit item rate = number of misfitting items/total number of item fit tests. "Y" corresponds to the class most likely to respond "Yes"; "N" corresponds to the class most likely to respond "No"; "?" corresponds to the class most likely to respond "?".

Step 3: Assessing absolute fit. Item-level fit provides important information on the interpretability of item parameters and is an indicator of absolute fit; up to now the focus has been on relative fit (i.e., model choice statistics). The significance test of the z-transformed Q statistic (see Rost & von Davier, 1994) was used to test for item misfit. For scales showing competitive solutions (i.e., lack of clarity in relative fit), we calculated and compared item misfit rates for the 3- and 4-class solutions. Item misfit rates for the 3-class solution of all scales are provided in Table 3. The 3-class solution for the Work scale showed 5 misfitting items, which was above 5% chance levels (i.e., .05[i × g], or .05[18 × 3] = 2.7), whereas the 4-class solution showed 2 misfitting items, below the expected value (i.e., .05[18 × 4] = 3.6).
For the Coworker scale, the 3-class solution showed 3 items with significant misfit, approximately the number expected by chance (i.e., 2.7); for the 4-class solution, 2 items were misfitting, which was just below chance levels (i.e., 3.6). For the Promotions scale, the 3-class solution showed a smaller proportion of misfitting items (i.e., 2/27 = .07) than the 4-class solution (i.e., 3/36 = .08). The empirical p values of bootstrapped Pearson χ2 fit statistics showed better fit for the 3-class solution than the 4-class solution in the Coworker scale (p = .03 vs. p < .001), both solutions showed acceptable fit for the Promotions scale (p = .08 vs. p = .10), and both showed misfit for the Work scale (both p < .001), suggesting absolute fit was either similar or better for the 3-class model in each of these scales. More importantly, inspection of item threshold parameter estimates for the 4-class solution of the Work, Coworker, and Promotions scales showed that 6 of the 72, 4 of the 72, and 5 of the 36 items, respectively, had threshold values beyond ±4, whereas for the 3-class solutions only 2 of 27 items for the Promotions scale were found to exceed this value; these are unreasonable values for item threshold parameters. These findings indicate that the 4-class solution cannot be "trusted" and the more parsimonious model is more appropriate (M. von Davier, personal communication, October 26, 2009).

Figure 1. Within-class category probability chart for the acquiescent class (AC) in the Supervisor scale (class size = .48).

Step 4: Naming the latent classes (LCs). After the most appropriate LC structure has been identified, one can qualify or find a way to refer to LCs that is behaviorally meaningful. This can be accomplished by identifying important differences in response within and between classes. Qualifying the LCs can help both researchers and readers avoid confusion in interpreting the results of MM-IRT studies (Eid & Zickar, 2007).

An overall inspection and comparison of within-class item-category probability histograms showed a clear trend in category use. One class, the largest in all scales with the exception of Pay and Promotions, was more likely than any other class to respond in the positive (i.e., satisfied) direction; we named this the Acquiescent Class (AC; see Figure 1). Another class emerged that was more likely than any other class to respond negatively and was the largest class in the Pay and Promotions scales; we named this LC the Demurring Class (DC; see Figure 2). Finally, for all scales, there was one class, the smallest for all but the Coworkers scale, that was more likely than the others to use the "?" response; we named it the Most Likely to use the Question mark Class (MLQC; see Figure 3). Those belonging to the AC and DC avoided the "?" response, and only the MLQC used the "?" with any regularity, as indicated by its nonzero mode for frequency of "?" usage (see Table 4). These results offer support for Hypothesis 1, that the majority of respondents would avoid using this option (AC and DC). The sizes of the AC and DC also confirmed our expectations that most respondents would prefer extremes. Table 3 shows the size of each class labeled by the names given above.
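In practice, Step 4 amounts to tabulating how each class uses the response options, as in the category probability charts of Figures 1 through 3. The minimal sketch below assumes the researcher has exported respondents' modal class assignments and 0/1/2-coded item responses into arrays; the variable names and coding are assumptions for the example rather than output produced by WINMIRA itself.

```python
import numpy as np

def category_use_by_class(responses, class_assignments, n_options=3):
    """Within-class proportions of option use, the basis for naming classes.

    responses         : (n_persons, n_items) integer array, 0 = "No", 1 = "?", 2 = "Yes"
    class_assignments : length n_persons array of modal latent class labels
    """
    profiles = {}
    for g in np.unique(class_assignments):
        block = responses[class_assignments == g]
        counts = np.bincount(block.ravel(), minlength=n_options)
        profiles[g] = counts / counts.sum()
    return profiles

# A class whose profile peaks on "Yes" would be labeled acquiescent (AC), one
# peaking on "No" demurring (DC), and one with a clearly nonzero share of "?"
# the MLQC, mirroring the naming used in this illustration.
```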
Steps 5/6: Identification of systematic responding. One possible reason for latent class membership is the manifestation of particular systematic response styles. This question can be addressed by comparing the consistency of class assignment across several scales measuring different attitudes or other latent variables. The current authors could find only one instance of this type of analysis, in Hernandez et al. (2004), where it was found that classification consistency was low to moderate across scales of the 16 Personality Factors questionnaire. Conducting 10 χ2 analyses based on 3 × 3 contingency tables for each of the possible scale-by-scale combinations (e.g., the concordance between the three latent classes for the Pay and Work scales), we found significant χ2 statistics (p < .001) and small to medium effect sizes for all 10 tests (see Table 5). These findings support Hypothesis 2: although class membership is not purely a function of response style, it is still a considerable factor, in agreement with Hernandez et al. (2004; see above).

Figure 2. Within-class category probability chart for the demurring class (DC) in the Supervisor scale (class size = .29).

Figure 3. Within-class category probability chart for the most likely to use the question mark class (MLQC) in the Supervisor scale (class size = .22).

Table 4. Central Tendency Estimates for the Number of Times "?" Was Used by Members of Each Class

Scale (number of items), latent class: M / SD / Mode
Work (18 items): AC 0.41 / 0.70 / 0; DC 0.95 / 1.13 / 0; MLQC 4.43 / 2.11 / 3
Coworker (18 items): AC 0.63 / 0.88 / 0; DC 0.92 / 1.26 / 0; MLQC 6.10 / 3.11 / 5
Supervision (18 items): AC 0.70 / 0.98 / 0; DC 1.15 / 1.35 / 0; MLQC 6.09 / 3.10 / 5
Promotion (9 items): AC 0.34 / 0.72 / 0; DC 0.49 / 0.85 / 0; MLQC 2.99 / 2.15 / 3
Pay (9 items): AC 0.26 / 0.55 / 0; DC 0.46 / 0.74 / 0; MLQC 2.71 / 1.18 / 2

Note: AC = acquiescent class; DC = demurring class; MLQC = most likely to use the question mark class.

Table 5. χ2 Statistics and Cohen's w (Below the Diagonal) and Cohen's Kappa (Above the Diagonal) by Scale Pairs

Scale pair: χ2 (Cohen's w based on Cramér's φ) / Cohen's kappa
Work–Coworker: 162.14 (.34) / .01
Work–Supervisor: 176.56 (.35) / .03
Work–Promotions: 51.31 (.18) / .07
Work–Pay: 113.95 (.28) / .01
Coworker–Supervisor: 222.84 (.38) / .02
Coworker–Promotions: 90 (.24) / .23
Coworker–Pay: 73.37 (.22) / .07
Supervisor–Promotions: 80.38 (.22) / .03
Supervisor–Pay: 92.17 (.25) / .01
Promotions–Pay: 110.9 (.27) / .06

Note: All tests performed on 3 (latent class membership, scale 1) × 3 (class membership, scale 2) contingency tables with df = 4. Negative Cohen's kappa suggests no agreement and can effectively be set to 0.
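The scale-by-scale cross-classifications summarized in Table 5 can be reproduced with standard routines once modal class assignments are available for two scales. The sketch below is one way to do so, assuming integer-coded class labels (0, 1, 2) in two arrays with hypothetical names; it uses scipy for the chi-square test and sklearn for Cohen's kappa, and computes Cohen's w from the chi-square statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

def class_agreement(classes_scale1, classes_scale2, n_classes=3):
    """Chi-square, Cohen's w, and Cohen's kappa for class membership on two scales."""
    table = np.zeros((n_classes, n_classes))
    for a, b in zip(classes_scale1, classes_scale2):   # build the 3 x 3 contingency table
        table[a, b] += 1
    chi2, p, dof, _ = chi2_contingency(table)
    w = np.sqrt(chi2 / table.sum())                    # Cohen's w effect size
    kappa = cohen_kappa_score(classes_scale1, classes_scale2)
    return {"chi2": chi2, "p": p, "df": dof, "w": w, "kappa": kappa}
```

Repeating this for all 10 scale pairs yields entries of the kind reported in Table 5.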
Steps 5/6: Checking scoring assumptions. Inspecting item parameters revealed that the threshold locations of the PCM were disordered for the majority of classes, suggesting a potential violation of the property of additivity (discussed above). Across all scales of the JDI, the AC and DC showed disordering with large distances between thresholds (e.g., see Figures 4 and 5). For the MLQC, the thresholds were ordered as would be expected (e.g., see Figures 6–8), suggesting the sum score is an appropriate representation of these classes' satisfaction levels. For the Pay and Work scales, thresholds were nearly identical (see Figures 9 and 10). When thresholds are disordered, as was the case for the AC and DC, it is possible that ordered integer scoring (e.g., 0, 1, 2 or 0, 1, 3) may not be appropriate. However, it is also possible that the model has been properly estimated in such cases (Borsboom, 2005; Rost, 1991).

Thus, we looked closer at these classes to determine whether the typical ordered sum-scoring of the JDI is viable for representing the latent trait. First, we consider whether the observed score distributions are consistent with the type discussed by Rost (1991) in which a measurement model may be estimated properly. Additionally, we consider the influence of excluding a discrimination parameter in the measurement model (see Borsboom, 2005, chap. 4). The observed score distributions for the AC and DC were either highly skewed (e.g., Figure 11) or U-shaped (e.g., see Figure 12), whereas the MLQC showed a quasi-normal distribution with low kurtosis (e.g., see Figure 13). This is consistent with Rost's (1991) guidance that disordered thresholds could be properly estimated under such distributional conditions.

Although this threshold disordering may seem serious at first glance, it should be noted that this means the intersections of categories, and not the categories themselves, are disordered. Thresholds represent the point on the trait continuum at which endorsing one option becomes more likely than endorsing another (e.g., the level of the trait where endorsing "Yes" becomes more likely than endorsing "?"), or the intersection of the category curves. Thus, it is possible for thresholds to be disordered whereas category curves are not. Consulting response curve plots for the 1-class PCM for each of the scales (which also showed disordered thresholds), we noted that the disordering of thresholds was a result of the low probability of using the "?" option and not disordered categories (e.g., see Figure 14). Additionally, θ estimates were highly positively correlated with the sum score across classes for the Work (r = .97), Coworker (r = .96), Supervisor (r = .97), Pay (r = .99), and Promotions (r = .98) scales.

Figure 4. Within-class item threshold parameter plot for the acquiescent class (AC) in the Supervisor scale (class size = .48).

As noted above, it is also important to consider the possibility that the threshold disordering is due to the Rasch-based model being too restrictive by not taking discrimination into account (M. von Davier, personal communication, October 23, 2009). We determined that this was not the case. As noted above, we estimated the equivalent parameterizations of the 1-class PCM and GPCM shown by Thissen and Steinberg (1986) for the Work scale. Although the GPCM fit the data somewhat better than the PCM, the inclusion of varying discriminations did not alleviate disordering for the large majority of items.
These results suggest the disordering in the AC and DC is due to these groups' low probability of choosing the "?" option, as can be seen by examining the intersections of the ORFs in Figure 14. Were the probability of using "?" higher in this plot, the thresholds would be ordered as expected. Furthermore, observed score distributions were consistent with those discussed by Rost (1991), and the disordering was not due to the exclusion of a discrimination parameter. Thus, the threshold disordering did not appear because of problems with the data or the model, which appeared to have been properly specified, gave trait estimates consistent with sum scores, and showed appropriate category ordering in spite of the disordered parameters.

Figure 5. Within-class item threshold parameter plot for the demurring class (DC) in the Supervisor scale (class size = .29).

Figure 6. Within-class item threshold parameter plot for the most likely to use the question mark class (MLQC) in the Supervisor scale (class size = .22).

Figure 7. Within-class item threshold parameter plot for the most likely to use the question mark class (MLQC) in the Coworker scale (class size = .32).

Figure 8. Within-class item threshold parameter plot for the most likely to use the question mark class (MLQC) in the Promotions scale (class size = .11).

Figure 9. Within-class item threshold parameter plot for the most likely to use the question mark class (MLQC) in the Pay scale (class size = .15).

Figure 10. Within-class item threshold parameter plot for the most likely to use the question mark class (MLQC) in the Work scale (class size = .21).

Figure 11. Observed score distribution for the Coworker scale in the acquiescent class (AC; class size = .40).

Figure 12. Observed score frequency distribution for the Coworker scale in the demurring class (DC; class size = .27).

Step 7: Correlates of class membership.
Two basic ways of investigating this question were found by the authors in the available MM-IRT literature: (a) use of regression techniques and (b) integration of covariates into measurement models. In our illustration, we focus on the former because more organizations and researchers are likely to possess the statistical and/or programming expertise to accomplish the regression-based approach. In this method, the latent class structure is used as a multinomial dependent variable, and therefore multinomial logistic regression (MLR) is an appropriate mode of analysis in determining predictors of class membership. This is an important extension of previous research that has (appropriately for their purposes) used only binary logistic models for 2-class solutions (e.g., Hernandez et al., 2004) or complex covariate-measurement model integration when more than two classes are retained (e.g., Maij-de Meij et al., 2008).

Figure 13. Observed score frequency distribution for the Coworker scale in the most likely to use the question mark class (MLQC; class size = .32).

Figure 14. Example option response curve for an item in the Work scale under the 1-class partial credit model (PCM). Note: 1 = No, 2 = ?, 3 = Yes.

In all, we used five variables as possible predictors of class membership: (a) JT, the transformed number of years on the job; (b) TIM, the self-report measure; (c) age; (d) sex; and (e) race. We performed multinomial logistic regressions to investigate these variables as correlates of latent class membership (i.e., AC, DC, and MLQC), one regression for each variable representing latent class membership on a JDI scale (i.e., the outcome variables). The correlations between these variables are shown in Table 6. The AC was used as the reference group in these analyses as it had a consistently large relative size compared to the DC and MLQC. First, likelihood ratio statistics were calculated to select variables explaining class membership (see Table 7). Variables that were nonsignificant were not included in the final MLR analyses. TIM was the only variable selected for all scales, and race was the only variable that did not contribute to any of the five models. JT was only included in the MLR analyses for the Promotions and Pay scales; sex was included for Work, Promotions, and Pay; and age for Work and Promotions.

Next, the final MLR analyses can be conducted using only the variables selected in the first step. Logit regression coefficients, B, can be tested by the Wald statistic, which is distributed approximately χ2 with one degree of freedom. The exponentiation of the logit coefficient, exp(B), can be interpreted as the multiplicative change in the odds of belonging to one class rather than the reference class (AC here) associated with a one-unit increase in the predictor.
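For Step 7, the regression-based approach can be carried out with any multinomial logistic regression routine. The sketch below uses statsmodels' MNLogit with synthetic stand-in data; the column names, the coding (AC = 0 so that it serves as the reference category), and the values are assumptions for illustration, not the data analyzed here.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in data; in practice these columns would hold the survey
# variables and the modal class assignments for one JDI scale.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "tim": rng.normal(35, 23, n),            # trust in management score
    "age": rng.normal(44, 10, n),
    "sex": rng.integers(0, 2, n),            # 0 = male, 1 = female
    "work_class": rng.integers(0, 3, n),     # 0 = AC (reference), 1 = DC, 2 = MLQC
})

X = sm.add_constant(df[["tim", "age", "sex"]])
fit = sm.MNLogit(df["work_class"], X).fit(disp=False)
print(fit.summary())          # Wald tests for the DC-vs-AC and MLQC-vs-AC logits
print(np.exp(fit.params))     # exp(B): multiplicative change in odds relative to the AC
```

Likelihood ratio statistics for variable selection (as in Table 7) can be obtained by comparing the log-likelihoods of models fit with and without each predictor.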
However, for dichotomous predictor variables, two special considerations are necessary. First, B, and therefore exp(B), will likely be much larger than for continuous predictors, because a one-unit change in, for example, TIM is much smaller than a one-unit change in Sex (i.e., moving from male to female). Second, the standard errors used to compute the Wald statistic are often unduly inflated and thus can inflate Type II error rates (Menard, 2002). Thus, for dichotomous variables, the likelihood ratios described above should be used for variable selection, and exp(B) should be considered over the Wald statistic.

The final MLR models showed that the selected variables explained low to moderate amounts of variance in class membership (Nagelkerke R2 from .04 to .20; see the model information in Table 8). For all JDI scales, TIM was a significant predictor of belonging to the MLQC and the DC, such that a higher TIM score decreased the chances of belonging to either group relative to the AC by between 97% and 99%, as indicated by the exp(B) coefficients (see Note 4). The only exception was the Promotions scale, for which higher TIM increased the chance of belonging to the MLQC, though by a negligible factor (i.e., 3%). JT was a significant predictor of belonging to the MLQC and DC for the Promotions and Pay scales; more JT increased the likelihood of belonging to these classes for the Promotions scale and decreased this likelihood for the Pay scale. Higher age decreased the likelihood of belonging to the DC and MLQC for the Work scale; for the Promotions scale, higher age increased the chances of belonging to the DC but decreased the chances of belonging to the MLQC. Females had larger chances than males of being in the MLQC for the Work and Pay scales and in the DC for the Pay scale. Females showed lower chances than males of being in the DC for the Work scale and in the MLQC for the Promotions scale. Results of the logistic regression analyses for the MLQC classes are briefly summarized in Table 9.

Discussion

In this article, we provided a comprehensive overview of MM-IRT and past research using the method, in addition to related techniques that can be used to understand latent class structures in item response data. We have included here a table (Table 1) outlining the major steps commonly undertaken in conducting MM-IRT analyses. Given that MM-IRT is an underutilized tool in organizational research, this study used MM-IRT to investigate some interesting questions about a popular work-related satisfaction measure, the JDI.
Table 6. Predictor Variables and Dummy Variables by Scale

[Correlation matrix among the predictor variables (1. JT, 2. TIM, 3. Age, 4. Race, 5. Sex) and the class-membership dummy variables for each JDI scale: Work (6a. AC, 6b. DC, 6c. MLQC), Coworker (7a-7c), Supervisor (8a-8c), Pay (9a-9c), and Promotions (10a-10c).]
Note: AC = acquiescent class; DC = demurring class; JT = job tenure; MLQC = most likely to use the question mark class; TIM = trust in management. Dummy coding: Sex (1 = Female, 0 = Male); Race (1 = Not Caucasian, 0 = Caucasian). *p < .05; **p < .01.

Table 7. Likelihood Ratio Statistics for Model Predictors by Scale

JDI Scale     TIM        JT        Age       Sex       Race
Work          68.09a     0.58      13.96a    15.01a    3.85
Coworker      51.91a     2.59      6.29      4.56      0.75
Supervisor    143.63a    1.53      7.32      5.75      1.28
Promotions    193.01a    15.03a    39.21a    14.83a    3.64
Pay           88.49a     20.31a    2.96      31.12a    7.22

Note: Dummy coding: Sex (1 = Female, 0 = Male); Race (1 = Not Caucasian, 0 = Caucasian). JDI = Job Descriptive Index; JT = job tenure; TIM = trust in management.
a The predictor significantly improved the model, p < (or essentially equivalent to) .01 (i.e., .05 corrected for the number of JDI scales analyzed, or .05/5), and was selected to remain in the model.
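The likelihood ratio screening summarized in Table 7 can be approximated by comparing the fit of the full multinomial model with that of a model omitting one predictor at a time. The sketch below shows one way to do this, continuing the hypothetical column names used earlier; it illustrates the general approach rather than the authors' code.

```python
# Minimal sketch (not the authors' code): likelihood ratio statistic for dropping
# one predictor from the multinomial model, as in the Table 7 screening step.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def lr_drop_one(df, outcome, predictors, dropped):
    """LR chi-square and p value for removing `dropped` from the model."""
    y = df[outcome].astype("category").cat.codes
    full = sm.MNLogit(y, sm.add_constant(df[predictors])).fit(disp=False)
    reduced_cols = [p for p in predictors if p != dropped]
    reduced = sm.MNLogit(y, sm.add_constant(df[reduced_cols])).fit(disp=False)
    lr = 2 * (full.llf - reduced.llf)                   # likelihood ratio statistic
    df_diff = full.params.size - reduced.params.size    # parameters removed
    return lr, stats.chi2.sf(lr, df_diff)

# Hypothetical usage: retain predictors with p < .01 (.05 corrected for the five
# JDI scales, per the Table 7 note).
# predictors = ["TIM", "job_tenure", "age", "sex", "race"]
# for name in predictors:
#     lr, p = lr_drop_one(df, "latent_class", predictors, name)
#     print(name, round(lr, 2), "retain" if p < .01 else "drop")
```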
Table 8. Multinomial Logistic Regression Results Using the AC as the Reference Category for All Scales

Scale (Class)        Variable   B       Wald     Significance   exp(B)   95% CI for exp(B) (Low:High)
Work (DC)            TIM*       -0.02   62.97    <.001          0.98     0.97:0.98
Work (DC)            Age*       -0.02   7.59     .012           0.98     0.97:1.00
Work (DC)            Sex        -0.30   3.89     .026           0.74     0.57:0.96
Work (MLQC)          TIM*       -0.01   21.37    <.001          0.99     0.98:0.99
Work (MLQC)          Age*       -0.02   7.75     .005           0.98     0.97:1.00
Work (MLQC)          Sex        0.31    5.16     .023           1.37     1.04:1.79
Coworker (DC)        TIM*       -0.02   42.08    <.001          0.98     0.98:0.98
Coworker (MLQC)      TIM*       -0.01   26.08    <.001          0.99     0.98:0.99
Supervisor (DC)      TIM*       -0.03   127.52   <.001          0.97     0.96:0.97
Supervisor (MLQC)    TIM*       -0.02   47.33    <.001          0.98     0.97:0.98
Promotions (DC)      TIM*       -0.02   41.35    <.001          0.98     0.98:0.99
Promotions (DC)      JT*        0.66    14.40    <.001          1.93     1.37:2.70
Promotions (DC)      Age*       0.03    15.03    <.001          1.03     1.01:1.04
Promotions (DC)      Sex        <0.01   <0.01    .998           1.00     0.77:1.30
Promotions (MLQC)    TIM*       0.03    44.29    <.001          1.03     1.02:1.04
Promotions (MLQC)    JT         0.25    1.18     .277           1.28     0.82:1.99
Promotions (MLQC)    Age*       -0.02   6.95     .008           0.98     0.96:0.99
Promotions (MLQC)    Sex*       -0.55   10.10    .001           0.58     0.41:0.81
Pay (DC)             TIM*       -0.02   84.92    <.001          0.98     0.97:0.98
Pay (DC)             JT*        -0.59   12.89    <.001          0.60     0.45:0.79
Pay (DC)             Sex*       0.71    34.05    <.001          2.04     1.60:2.59
Pay (MLQC)           TIM*       -0.01   17.88    <.001          0.99     0.98:0.99
Pay (MLQC)           JT*        -0.68   13.85    <.001          0.51     0.35:0.72
Pay (MLQC)           Sex*       0.55    12.65    .001           1.74     1.28:2.35

Model information. Work: N = 1,471; χ2(10) = 102.96; Nagelkerke R2 = .08. Coworker: N = 1,506; χ2(10) = 51.79; Nagelkerke R2 = .04. Supervisor: N = 1,510; χ2(10) = 154.77; Nagelkerke R2 = .11. Promotions: N = 1,529; χ2(10) = 282.89; Nagelkerke R2 = .20. Pay: N = 1,505; χ2(10) = 140.94; Nagelkerke R2 = .10.
Note: Dummy coding: Sex (1 = Female, 0 = Male). AC = acquiescent class; DC = demurring class; MLQC = most likely to use the question mark class; TIM = trust in management.

Table 9. Summary of Predictor Variable Influence and Directionality on the Odds of Belonging to the DC and MLQC Over the AC

                        DC                                    MLQC
JDI Scale     TIM     JT      Age     Female       TIM     JT        Age     Female
Work          -98%            -98%    -74%b        -99%              -98%    +134%b
Coworker      -98%                                 -99%
Supervisor    -97%                                 -98%
Promotions    -98%    +93%    +3%     +100%b       +3%     +128%b    -98%    -58%
Pay           -98%    -60%            +103%        -99%    -51%              +73%

Note: + signifies that as the variable increases, the odds of being in the indicated class relative to the AC increase; - signifies that as the variable increases, the odds of being in the indicated class relative to the AC decrease. No value signifies that the variable was excluded from the model. b Not found to be significant by the Wald statistic. TIM = trust in management; JT = job tenure; AC = acquiescent class; DC = demurring class; MLQC = most likely to use the question mark class.

The data at hand afforded us a way to show some potential issues that can arise and should be investigated when using mixed Rasch models as available in the WINMIRA (von Davier, 2001) software program. First, we hope to have elucidated the procedure and importance of model selection in both the relative and absolute sense, how model selection can be complicated by data and model issues, and the courses of action that can be taken to determine whether the model is appropriate. Second, by showing how one may go about identifying and qualifying the latent class structure, we intended to show the utility of this strategy in enhancing the clarity of analyses for both the researcher and the reader.

Our results showed that the majority of respondents were not classified into the MLQC. This result supported our first hypothesis, in agreement with past research (see Hernandez et al., 2004; Maij-de Meij et al., 2008; Smit et al., 2003).
However, given the small size of this class, it is difficult to tell whether these results would generalize to future samples. Therefore, similar studies should be conducted to determine whether this finding is replicable, ideally with larger sample sizes and among several measures using similar "Yes"-"No"-"?" scales.

Our illustration demonstrated that model selection is not always a simple, clear-cut process. In fact, it can require several indicators of absolute and relative fit to determine the number of classes necessary to appropriately explain heterogeneity in response data. Although relative fit statistics such as the CAIC may be highly helpful, they should not be viewed as the only indication of the proper number of classes. As was shown here, in situations where model selection becomes unclear, it may be necessary to turn to measures of absolute fit (see Table 1, Step 3), such as item-level fit and the p values of bootstrapped fit statistics (e.g., Pearson's χ2). In this situation, however, the simpler indication was the value of the parameter estimates for the 3- and 4-class solutions. We found that although the 3-class solution showed reasonable estimates for threshold locations, the 4-class solution did not. Although the two solutions were similar on relative and absolute fit measures, the 4-class solution's threshold values were unreasonable in that they indicated that, within some classes, endorsing one option over another became probable only at extreme levels of the trait, where an unreasonably small number of persons are located. One rough heuristic in these cases is to interpret thresholds in a fashion similar to z scores, wherein values larger than ±2.5 become questionable (Reise & Waller, 2003) and those larger than our relatively liberal cutoff of ±4 are unacceptable.

We also showed how MM-IRT results can be analyzed to identify the influence of a systematic response set by comparing classification consistency across scales. Results also supported our second hypothesis, in that effect sizes based on contingency tables indicated only a small to medium degree of consistency of class membership across scales, suggesting that respondents' latent class membership is not due solely to a particular response set, though such a set is a considerable influence. This result is also consistent with the study of Hernandez et al. (2004). Given that the "?" does not appear to be merely the result of a response set, these results suggest that those who use the "?" do so purposefully when responding to items addressing particular facets of the job (i.e., in different scales of the JDI).
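As a concrete illustration of this cross-scale consistency check, class membership on two scales can be cross-tabulated and the association converted to Cohen's w via Cramér's φ, following the formula given in Note 3. The sketch below, with hypothetical column names, shows the computation; it is not the authors' code.

```python
# Minimal sketch (not the authors' code): consistency of class membership across
# two JDI scales, expressed as Cohen's w computed from Cramer's phi (see Note 3).
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cross_scale_w(df, scale_a, scale_b):
    """Cohen's w for the r x c table of class membership on two scales."""
    table = pd.crosstab(df[scale_a], df[scale_b])   # e.g., AC/DC/MLQC by AC/DC/MLQC
    chi2, p, dof, _ = chi2_contingency(table)
    n = table.values.sum()
    h = min(table.shape)                            # smaller of r and c
    phi = np.sqrt(chi2 / (n * (h - 1)))             # Cramer's phi (V)
    return phi * np.sqrt(h - 1)                     # Note 3: w = phi * sqrt(h - 1)

# Per Cohen (1988), w values near .1 are small and values between .3 and .5 are medium.
```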
Additionally, we showed how scoring processes can be checked by taking advantage of the strong assumptions of Rasch-based models and checking whether the sum score is a sufficient estimate of the latent variable. Our results suggested that the ordered-scoring assumption of the JDI is tenable across classes. Although the disordered thresholds suggest that the "?" response does not lie between the "Yes" and "No" options for the AC and DC, and that no information about these persons' satisfaction levels is gleaned from the "?" response, this is because they simply do not use the option with any frequency. For example, on item 12 of the Supervisor scale, only 1.9% of the AC and 3.1% of the DC used the "?" response, whereas 32.1% of the MLQC chose it. We also found that the threshold disordering was not due to the exclusion of a discrimination parameter: although the GPCM showed better fit to the data, it still did not show ordered thresholds for the majority of items. Furthermore, the observed score distributions were similar to those Rost (1991) indicated would be observed in disordered but properly estimated measurement models. Therefore, for the respondents for whom the "?" does not fall between "Yes" and "No," this option has effectively no influence on the sum score; for the respondents who actually use the "?" response, the option does fall between the other categories, and the sum score is conceptually valid.

Hernandez et al. (2004) offered four potential explanations for using the "?" option. We used proxy variables to test the notions that (a) respondents may not feel sufficiently informed to make a decision (JT) and (b) respondents may not feel comfortable divulging their view (TIM). Additionally, we tested age, race, and sex because of their importance in organizational research. MLR was used to explain class membership from these individual difference variables, which provides a unique contribution in that other MM-IRT studies have either used binary logistic regression (e.g., Hernandez et al., 2004) or integrated covariates into the latent trait model (e.g., Maij-de Meij et al., 2008).

Results of the MLR supported the willingness-to-divulge explanation of Hernandez et al. (2004; see also DuBois & Burns, 1975). Those with lower levels of TIM were more likely to be classified into the MLQC over the AC. This supports the view that if respondents are willing to divulge their opinions (i.e., respondents trust management), they are less likely to use the "?" response. In addition, TIM predicted class membership such that, for all scales of the JDI, lower TIM also increased the probability of belonging to the DC over the AC. For all scales, the strength of the relationship was similar, as indicated by similar exp(B) values.

Not having enough information about the topic was also considered a reason to use the "?" option (Hernandez et al., 2004). Using JT as a proxy for this assertion, we found that the analyses supported this explanation only for the Pay scale of the JDI; more JT decreased the chances of being in the MLQC and DC. This may suggest that those who have been on their jobs for a number of years are less uncertain concerning their pay and are therefore more likely to use the "Yes" or "No" responses. JT had the opposite influence for the Promotions scale: the longer one has been on the job, the more likely one is to be in the DC or MLQC. This may be because any lack of clarity in the promotional structure of an organization becomes more apparent or salient as one approaches the level of tenure at which those opportunities are realistic options for employees.

Age can also be considered a proxy for information about the job; it is reasonable to assume younger workers have had less contact with the work world than older workers. Age, as a proxy for job information, helped explain class membership for the Work and Promotions scales of the JDI. Older workers were less likely to be classified into the "?" class, consistent with the expectation that older workers have more solidified attitudes toward work satisfaction facets, as noted by Adams, Carter, Wolford, Zickar, and Highhouse (2009).
Sex and race as explanations were harder to evaluate, as the Wald statistic is less reliable for dichotomous variables. Although some of these variables were nonsignificant according to this statistic, we considered each predictor in every model into which it was selected by the LR statistic (see Table 7). The clearest findings were those for the Pay scale. Here, we found that females were around twice as likely as males to be categorized into the MLQC or DC over the AC. This may be a product of the well-known inequality in salary between males and females in the United States. From an organizational perspective, this would be an unsettling finding for a business, as it may indicate that females do not feel they have been provided a clear understanding of the pay structure or of the reasons for differentials in pay between themselves and their male colleagues. Note that such a finding for age could be equally problematic, especially if the variable were coded into "Over-40" and "Under-40" categories consistent with the Age Discrimination in Employment Act. The finding for race is more reassuring, as there appeared to be no relationship between class membership and this important group membership variable.

Another trend we noticed is the consistency of findings for the DC and MLQC. This is especially interesting in light of Smith et al.'s (1969) decision to score the response options 0, 1, and 3. The authors concluded that this was the appropriate scoring scheme because respondents who were less satisfied tended to use the "?" response more than satisfied persons did, as determined by a distributional split on satisfaction scores. In fact, the MLQC was found to have generally lower observed scores (see Figure 13) than the AC (see Figure 11) but higher scores than the DC (see Figure 12). However, for these respondents, there appears to be little difference between choosing "?" over "No" and choosing "Yes" over "?" in terms of satisfaction.

Unfortunately, not all of the suggestions of Hernandez et al. (2004) could be followed, given the range of variables available in the data set. The idea that respondents who use the "?" category do not understand the content of the item could be tested by including a cognitive or general mental ability measure. Research by Krosnick et al. (2002) suggests that if the "?" response is used as a no-opinion response, then lower intelligence respondents are more likely to use the "?" category. That is, if respondents are using the "?" response because of misunderstanding, less intelligent respondents would be expected to use the "?" more often. This remains open for researchers to investigate, as it has yet to be addressed empirically in the literature. In the present data, however, potential alternate and/or complementary proxy variables for testing some of these suggestions were unavailable. For example, one reviewer noted that the personality trait of neuroticism could have been used to address the idea that persons do not feel competent enough to provide a response other than "?" and that both neuroticism and agreeableness could be used to test whether those in the MLQC were reluctant to divulge their feelings.
Furthermore, it could be that the inclusion of personality variables would aid prediction over and above TIM, as TIM is situational and has shown a low correlation with personality-based trust, r = .16 (Dirks & Ferrin, 2002). However, personality-based measures of trust would tell us little about the work situation faced by respondents, as they tell us nothing about the target of trust (i.e., the manager; Clark & Payne, 1997; Dirks & Ferrin, 2002; Mayer, Davis, & Schoorman, 1995; Payne & Clark, 2003). Furthermore, TIM's strong relationship to organizationally relevant, situation-driven variables such as managerial leadership styles and organizational justice (see Dirks & Ferrin, 2002), together with the finding here that class membership varies across scales, may indicate that use of the "?" is situation driven. However, the inclusion of personality variables related to trust in future studies may shed light on the stability or instability of "?" usage. Low correlations of TIM with more stable individual difference variables such as trait anxiety, and the finding that relatively more variance in TIM is explained by situational aspects (Payne & Clark, 2003), suggest that TIM is likely not redundant with personality or mental ability. As of now, the question remains: Is use of the "?" determined situationally, or is it a manifestation of a more stable individual difference? Although Hernandez et al. (2004) found that traits such as social boldness, abstractedness, and impression management predict membership in the group likely to use the "?," it is worth noting that those MM-IRT analyses were based on personality measures. Therefore, it may be that stable individual differences best predict the use of "?" for personality measures (as in Hernandez et al., 2004), whereas its use in attitudinal measures may best be explained by situational variables (as shown here). Additionally, comparisons of "?" usage across different organizational conditions or stages, such as restructuring or layoff periods, would also give some indication of the nature of "?" usage. Finally, as one reviewer noted, it could be interesting to examine the use of the "?" response across time. Although this was not likely an issue here, given the short time frame for data collection, we believe this would be an interesting and relevant question for future studies.

Although our illustration shows the utility of the MM-IRT technique, one limitation is the sample size required for accurate standard errors of parameter estimates, which is larger than for typical IRT analyses (Zickar & Broadfoot, 2009), given the need to also estimate class membership parameters. However, we still encourage researchers to use this tool in their research. The increased cost of obtaining more respondents will likely be outweighed by the rich information that MM-IRT offers. Here, we have shown how MM-IRT can be used by organizational researchers to determine the nature of the group structure by closely examining the most appropriate multigroup solution and by considering individual and group difference variables as explanations of class membership.
We hope our illustration has provided researchers with a reasonably comprehensive reference, expanding on prior research with regard to the topics addressed (e.g., use of Rasch models, qualifying class structures, and attitudinal measures) and the complementary methods of analysis used (e.g., MLR), as well as providing further insight into respondents' use of the "?" response. Additionally, we hope that organizational researchers will harness this powerful tool to conduct further research on organizational surveys in general, as well as to further investigate the use of "Yes"/"No"/"?" and other types of response scales.

Notes

1. Readers unfamiliar with conventional item response theory (IRT) models are referred to Zickar (2001) for a concise introduction to polytomous IRT modeling, which is beyond the scope of this manuscript; for a more comprehensive treatment, we suggest Item Response Theory for Psychologists by Embretson and Reise (2000).
2. Note that any other IRT model could be substituted in the same fashion for P(u|g).
3. To estimate association, we calculated the w effect size from Cramér's (1946) φ by w = φ√(h − 1), where h is the number of categories of the smaller dimension (r or c) of an r × c contingency table (Sheskin, 2004). Using Cohen's (1988) criteria for w, values of .1 are considered small and values between .3 and .5 are considered medium.
4. Exp(B) can be interpreted as a probability and thus converted to the percentage chance of belonging to a group over the reference group (see Kutner, Nachtsheim, Neter, & Li, 2005).

Acknowledgments

The authors would like to thank Dr. Matthias von Davier of Educational Testing Service for his kindness in answering their questions and for providing them with his personal communications, and Dr. Jennifer Z. Gillespie of Bowling Green State University and the JDI Office for her willingness to share these data and her personal communications.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the authorship and/or publication of this article.

Funding

The authors received no financial support for the research and/or authorship of this article.

References

Abrams, L. C., Cross, R., Lesser, E., & Levin, D. Z. (2003). Nurturing interpersonal trust in knowledge-sharing networks. Academy of Management Executive, 17, 64-71.
Adams, J. E., Carter, N. T., Wolford, K., Zickar, M. J., & Highhouse, S. (2009). The job descriptive index: A reliability generalization study. Presented at the 24th Annual Meeting of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Akaike, H. (1973). Information theory and the extension of the maximum likelihood principle. In V. N. Petrov & F. Scáki (Eds.), Second international symposium of information theory (pp. 267-281). Budapest: Akadémiai Kiadó.
Andrich, D. (2005). The Rasch model explained. In S. Alagumalai, D. D. Durtis, & N. Hungi (Eds.), Applied Rasch measurement: A book of exemplars. New York: Springer.
Balzer, W. K., Kihm, J. A., Smith, P. C., Irwin, J. L., Bachiochi, P. D., Robie, C., et al. (1997). User's manual for the Job Descriptive Index (JDI; 1997 revision) and the Job in General (JIG) scales. Bowling Green, OH: Bowling Green State University.
Bock, R. D. (1972). Estimating item parameters and latent ability when the responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
Bock, R., & Jones, L. V. (1968). The measurement and prediction of judgment and choice. San Francisco: Holden Day.
Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. New York: Cambridge University Press.
Bowling, N. A., Hendricks, E. A., & Wagner, S. H. (2008). Positive and negative affectivity and facet satisfaction: A meta-analysis. Journal of Business and Psychology, 23, 115-125.
Bozdogan, H. (1987). Model selection and Akaike's Information Criteria: The general theory and its analytic extensions. Psychometrika, 52, 345-370.
Chan, K.-Y., Drasgow, F., & Sawin, L. L. (1999). What is the shelf life of a test? The effect of time on the psychometrics of a cognitive ability test battery. Journal of Applied Psychology, 84, 610-619.
Chernyshenko, O. S., Stark, S., Chan, K.-Y., Drasgow, F., & Williams, B. (2001). Fitting item response theory models to two personality inventories: Issues and insights. Multivariate Behavioral Research, 36, 523-562.
Clark, M. C., & Payne, R. L. (1997). The nature and structure of workers' trust in management. Journal of Organizational Behavior, 18, 205-224.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Collins, W. C., Raju, N. S., & Edwards, J. E. (2000). Assessing differential functioning in a satisfaction scale. Journal of Applied Psychology, 85, 451-461.
Connolly, J. J., & Viswesvaran, C. (2000). The role of affectivity in job satisfaction: A meta-analysis. Personality and Individual Differences, 29, 265-281.
Cooper-Hakim, A., & Viswesvaran, C. (2005). The construct of work commitment: Testing an integrative framework. Psychological Bulletin, 131, 241-259.
Cramér, H. (1946). Mathematical methods of statistics. Uppsala, Sweden: Almqvist & Wiksells.
Cruickshank, P. J. (1984). A stress arousal mood scale for low vocabulary subjects: A reworking of Mackay et al. (1978). British Journal of Psychology, 75, 89-94.
Detert, J. R., & Burris, E. R. (2007). Leadership behavior and employee voice: Is the door really open? Academy of Management Journal, 50, 869-884.
Dirks, K. T., & Ferrin, D. L. (2002). Trust in leadership: Meta-analytic findings and implications for research and practice. Journal of Applied Psychology, 87, 611-628.
Donovan, M. A., Drasgow, F., & Probst, T. M. (2000). Does computerizing paper-and-pencil job attitude scales make a difference? New IRT analyses offer insight. Journal of Applied Psychology, 85, 305-313.
Dubois, B., & Burns, J. A. (1975). An analysis of the meaning of the question mark response category in attitude scales. Educational and Psychological Measurement, 35, 869-884.
Eid, M., & Rauber, M. (2000). Detecting measurement invariance in organizational surveys. European Journal of Psychological Assessment, 16, 20-30.
Eid, M., & Zickar, M. J. (2007). Detecting response styles and faking in personality and organizational assessments by mixed Rasch models. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications. New York: Springer.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Goldberg, G. (1971). Response format in attitude scales. Unpublished manuscript, Northwestern University.
Hanisch, K. A. (1992). The job descriptive index revisited: Questions about the question mark. Journal of Applied Psychology, 77, 377-382.
Hart, R., & Hart, M. (2002). Statistical process control for health care. Pacific Grove, CA: Duxbury.
Hernandez, A., Drasgow, F., & Gonzalez-Roma, V. (2004). Investigating the functioning of a middle category by means of a mixed-measurement model. Journal of Applied Psychology, 89, 687-699.
Jablin, F. M. (1979). Superior-subordinate communication: The state of the art. Psychological Bulletin, 86, 1201-1222.
Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87, 530-541.
Kaplan, K. J. (1972). On the ambivalence-indifference problem in attitude theory: A suggested modification of the semantic differential technique. Psychological Bulletin, 77, 361-372.
Krosnick, J. A., Holbrook, A. L., Berent, M. K., Carson, R. T., Hanemann, W. M., Kopp, R. J., et al. (2002). The impact of "no opinion" response options on data quality: Non-attitude reduction or an invitation to satisfice? Public Opinion Quarterly, 66, 371-403.
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). Boston: McGraw-Hill.
Lazarsfeld, P. E., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Levin, D. Z., Whitener, E. M., & Cross, R. (2006). Perceived trustworthiness of knowledge sources: The moderating impact of relationship length. Journal of Applied Psychology, 91, 1163-1171.
Maij-de Meij, A. M., Kelderman, H., & van der Flier, H. (2008). Fitting a mixture item response theory model to personality questionnaire data: Characterizing latent classes and investigating possibilities for improving prediction. Applied Psychological Measurement, 32, 611-631.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Maurer, T. J., Raju, N. S., & Collins, W. C. (1998). Peer and subordinate performance appraisal measurement equivalence. Journal of Applied Psychology, 83, 693-702.
Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20, 709-734.
Mellinger, G. D. (1956). Interpersonal trust as a factor in communication. Journal of Abnormal Social Psychology, 52, 304-309.
Menard, S. (2002). Applied logistic regression analysis (2nd ed.). Thousand Oaks, CA: Sage.
Mislevy, R., & Huang, C.-W. (2007). Measurement models as narrative structures. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications. New York: Springer.
Ostroff, C., & Kozlowski, S. (1992). Organizational socialization as a learning process: The role of information acquisition. Personnel Psychology, 45, 849-874.
Payne, R., & Clark, M. (2003). Dispositional and situational determinants of trust in two types of managers. International Journal of Human Resource Management, 14, 128-138.
Personnel Decisions International. (1997). Selection systems test scale manual. Unpublished document. Minneapolis, MN: Author.
Reise, S. P., & Gomel, J. N. (1995). Modeling qualitative variation within latent trait dimensions: Application of mixed-measurement to personality assessment. Multivariate Behavioral Research, 30, 341-358.
Reise, S. P., & Waller, N. G. (2003). How many IRT parameters does it take to model personality items? Psychological Methods, 8, 164-184.
Rost, J. (1991). A logistic mixture distribution for polychotomous item responses. British Journal of Mathematical and Statistical Psychology, 44, 75-92.
Rost, J. (1997). Logistic mixture models. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 449-463). New York: Springer.
Rost, J., & von Davier, M. (1994). A conditional item-fit index for Rasch models. Applied Psychological Measurement, 18, 171-182.
Rousseau, D. M., Sitkin, S. B., Burt, R. S., & Camerer, C. (1998). Not so different after all: A cross-discipline view of trust. Academy of Management Review, 23, 393-404.
Ryan, A. M., Horvath, M., Ployhart, R. E., Schmitt, N., & Slade, L. (2000). Hypothesizing differential item functioning in global employee opinion surveys. Personnel Psychology, 53, 531-562.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Sheskin, D. J. (2004). Handbook of parametric and non-parametric statistical procedures (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Smit, A., Kelderman, H., & van der Flier, H. (2003). Latent trait latent class analysis of an Eysenck personality questionnaire. Methods of Psychological Research Online, 8, 23-50.
Smith, P. C., Kendall, L. M., & Hulin, C. L. (1969). The measurement of satisfaction in work and retirement: A strategy for the study of attitudes. Skokie, IL: Rand-McNally.
Tay, L., Drasgow, F., Rounds, J., & Williams, B. A. (2009). Fitting measurement models to vocational interest data: Are dominance models ideal? Journal of Applied Psychology, 94, 1287-1304.
Thissen, D., Chen, W.-H., & Bock, R. D. (2003). Multilog (Version 7) [Computer software]. Lincolnwood, IL: Scientific Software International.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567-577.
Vermunt, J. K., & Magidson, J. (2005). LatentGOLD (v4.5). Belmont, MA: Statistical Innovations, Inc.
von Davier, M. (2001). WINMIRA 2001: Windows mixed Rasch model analysis [Computer software and user manual]. Kiel, Germany: Institute for Science Education.
von Davier, M., Rost, J., & Carstensen, C. H. (2007). Introduction: Extending the Rasch model. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions and applications. New York: Springer.
Wang, M., & Russell, S. S. (2005). Measurement equivalence of the Job Descriptive Index across Chinese and American workers: Results from confirmatory factor analysis and item response theory. Educational and Psychological Measurement, 65, 709-732.
White, L. A., Nord, R. D., Mael, F. A., & Young, M. C. (1993). The Assessment of Background and Life Experiences (ABLE). In T. Trent & J. H. Laurence (Eds.), Adaptability screening for the armed forces (pp. 101-162). Washington, DC: Office of the Assistant Secretary of Defense (Force Management and Personnel).
Wilde, G. (1970). Neurotische labiliteit gemeten volgens de vragenlijstmethode [Neurotic lability measured by the questionnaire method]. Amsterdam: Van Rossen.
Worthy, M. (1969). Note on scoring midpoint responses in extreme response style scores. Psychological Reports, 24, 189-190.
Zickar, M. J. (2001). Conquering the next frontier: Modeling personality data with item response theory. In B. Roberts & R. Hogan (Eds.), Applied personality psychology: The intersection of personality and I/O psychology (pp. 141-158). Washington, DC: American Psychological Association.
Zickar, M. J., & Broadfoot, A. A. (2009). The partial revival of a dead horse? Comparing classical test theory and item response theory. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity, and fable in the organizational and social sciences (pp. 37-60). New York: Routledge.
Zickar, M. J., Gibby, R. E., & Robie, C. (2004). Uncovering faking samples in applicant, incumbent, and experimental data sets: An application of mixed-model item response theory. Organizational Research Methods, 7, 168-190.
Zickar, M. J., & Robie, C. (1999). Modeling faking good on personality items: An item-level analysis. Journal of Applied Psychology, 84, 551-563.

Bios

Nathan T. Carter is currently a doctoral student in Industrial-Organizational Psychology at Bowling Green State University. His research concerns the application of psychometric techniques in organizational and educational settings. He is also interested in individual differences and the history of applied psychology.

Dev K. Dalal is currently a doctoral student in the Industrial-Organizational Psychology program at Bowling Green State University. His research interests include methodological issues as applied to behavioral and applied research, application of psychometric theories to measurement issues, and investigating how individuals respond to items.

Christopher J. Lake is a graduate student currently working toward his PhD in Industrial-Organizational Psychology at Bowling Green State University. His research focuses on testing, measurement, and methodological issues in the social sciences.

Bing C. Lin is a doctoral student in the Industrial-Organizational Psychology area of the Portland State University Applied Psychology program. His research interests include various occupational health psychology topics such as interruptions and recovery at work. In addition, he is interested in the application of psychometric techniques to organizational issues.

Michael J. Zickar is an associate professor of Psychology at Bowling Green State University, where he is also department chair. He has published widely in the area of psychological measurement as well as the history of applied psychology.