Supplementary Material File 1

Title: Organizationally-Relevant Configurations: The Value of Modeling Local Dependence
Journal: Quality and Quantity
Author Names:
Pankaj C. Patel
Assistant Professor of Management
Department of Marketing and Management
Miller College of Business
Ball State University
2000 W. University Ave.
Muncie, IN 47306
Phone: 765-285-3174
Fax: 765-285-4315
Email: pcpatel@bsu.edu
Sherry M.B. Thatcher
Katerina Bezrukova

TABLE S.1. Sample of Research Studies on Organizational Configurations [1]

[1] The list is by no means exhaustive. It lists recent papers in respective areas; our goal is to draw attention to the fact that in recent years research on OR configurations has been very robust.

Inductive Configurations (key area, followed by the empirical method used)

Level of configuration: Nation
- Institutional configurations (Hall & Gingerich, 2009)
  Method: Fixed and random effects cross-country cluster analysis
- Capitalism complementarities (Amable, 2003)
  Method: Distance based clustering
- Cross-country innovation systems (Castellacci & Archibugi, 2008)
  Method: Enhances robustness of cluster analysis by using three different hierarchical agglomerative methods, three different distance measures, and using k-means iterative partitioning method to confirm findings of hierarchical agglomerative clustering approach

Level of configuration: Industry
- Technology diffusion across (Sundqvist, Frank, & Puumalainen, 2005)
  Method: Hierarchical cluster analysis using Ward's approach
- Regional knowledge and capital accumulation (Powell, Koput, Bowie, & Smith-Doerr, 2002)
  Method: Spatial clustering
- Industry classification (Kelton, Pasquale, & Rebelein, 2008)
  Method: Update to Feser and Berger's factor analysis based approach to identify industry clusters at the national level
- Strategic groups (Chen, 1996; DeSarbo, Grewal, & Wang, 2009; J. C. Short et al., 2007; Wang, 2007)
  Method: Cluster analysis, bi-linear spatial multidimensional scaling, and finite mixture structural equation modeling
- Operational and supply chain capabilities (Heim & Sinha, 2002)
  Method: Hierarchical cluster analysis using Ward's approach
- Marketing and sales strategy (Homburg, Jensen, & Krohmer, 2008)
  Method: A three stage clustering approach: (a) number of clusters were determined using cubic clustering criteria and pseudo-t²; (b) observations were assigned to clusters using a hybrid approach combining Ward's approach and k-means; (c) stability of cluster assignment was assessed using cross validation procedure.

Level of configuration: Firms, Teams
- Information systems (Ferratt, Agarwal, Brown, & Moore, 2005)
  Method: Two-stage clustering approach
- Human resource management (Toh, Morgeson, & Campion, 2008)
  Method: Hierarchical cluster analysis using Ward's approach
- Organizational climate systems (Schulte, Ostroff, & Kinicki, 2006; Schulte, Ostroff, Shmulyian, & Kinicki, 2009)
  Method: Two-stage clustering approach
- Organizational commitment (Sinclair, Tucker, Cullen, & Wright, 2005; Somers, 2009)
  Method: Hierarchical cluster analysis using Ward's approach
- Organizational relationship processes (Song, Tsui, & Law, 2009)
  Method: Hierarchical cluster analysis using Ward's approach
- Employee coping (Cortina & Wasti, 2005)
  Method: Hierarchical cluster analysis using Ward's approach
- Leadership (Foti & Hauenstein, 2007)
  Method: Hierarchical cluster analysis using Ward's approach
- Demographic faultlines (Barkema & Shvyrkov, 2007; Tuggle, Schnatterly, & Johnson, 2010)
  Method: LCCA

Deductive Configurations (key areas)

Level of configuration: Nation
- Classification of legal systems: common law and civil law (e.g. Glaeser & Shleifer, 2002)
- Economic systems: capitalism, socialism, communism (e.g. Peng, Wang, & Jiang, 2008)
- Hofstede's cultural values: power distance, uncertainty avoidance, individualism vs. collectivism, masculinity vs. femininity, long-term vs. short-term orientation (Hofstede, 1983)

Level of configuration: Industry
- Regional development: rural, urban, semi-urban areas (Stimson, Stough, & Roberts, 2006)
- Nature of product innovation: hi-tech, low-tech, and medium tech (Von Tunzelmann & Acha, 2005)
- Strategic orientation: prospectors, analyzers, defenders, and reactors (Miles & Snow, 1984)
- Niche partitioning: specialist vs. generalist (Carroll, 1985)

Level of configuration: Firms, Teams
- Organizational structure: mechanistic or organic (Lawrence & Lorsch, 1967)
- Industry/organizational/product life cycle: birth, growth, maturity, decline, and death (Audretsch & Feldman, 1996; Jawahar & McLaughlin, 2001; Klepper, 1996)
- Deviant workplace behavior: production deviance, property deviance, political deviance, and personal aggression (Robinson & Bennett, 1995)
- Power: expert, referent, reward, charisma, legitimate, and coercive (Hinkin & Schriesheim, 1989)
- Leadership styles: charismatic, participative, situational, transactional, transformational (Avolio, Bass, & Jung, 1999)
- Teams based on work context: advice and involvement teams, production and service teams, action and negotiation teams, project and development teams (Sundstrom, De Meuse, & Futrell, 1990)
- Demographic, personal and ability, and geographical faultlines [2] (Bezrukova et al., 2009; Homan, van Knippenberg, Van Kleef, & De Dreu, 2007; Lau & Murnighan, 2005; Li & Hambrick, 2005; Molleman, 2005; Polzer, Crisp,
Jarvenpaa, & Kim, 2006; Rico, Molleman, Sanchez-Manzanares, & Van der Vegt, 2007; Thatcher et al., 2003)
- Team temporal status: long-term teams and short-term teams (Marks, Mathieu, & Zaccaro, 2001)
- Uncertainty, complexity, and routine: project teams vs. production teams (De Dreu & Weingart, 2003)
- Reward structure configuration (Homan et al., 2008)

[2] These papers use either multivariate cluster analysis, where configurations are defined a priori based on theory, or experimental manipulations, where configurations are created from theory.

Supplementary Material File 2

Title: Organizationally-Relevant Configurations: The Value of Modeling Local Dependence
Journal: Quality and Quantity
Author Names: Pankaj C. Patel, Sherry M.B. Thatcher, Katerina Bezrukova

Table S.2. Comparison between CA and LCCA with Local Dependence for OR Configurations

Empirical Steps

Selection of Variables
  Description: Researcher must choose variables to include in inductive or deductive approach (Ketchen Jr et al., 1993).
  Cluster Analysis (CA): Variables selected by researcher are used as inputs to the cluster analysis.
  LCCA with local dependence: Variable selection acts as input to LCCA for inductive selection. By modeling local dependence for deductive approaches one can empirically assess whether theoretically relevant group variables are interrelated in an empirical context.
  Comparing LCCA with local dependence and CA: As this step is driven by researcher volition, one statistical method may not provide a distinct advantage over the other. However, in the context of deductive configurations, LCCA with local dependence can help assess the relevance of the variables used by the researcher to classify cases into configurations. Local dependence tests can help model interrelationships to ensure robustness of the configurations.

Standardization of Variables
  Description: Allows variables to contribute equally. However, meaningful differences may be lost.
  Cluster Analysis (CA): Equivocal suggestions on standardization.
  LCCA with local dependence: Standardization not required.
  Comparing LCCA with local dependence and CA: By not requiring standardization, LCCA with local dependence preserves meaningful differences that are typically lost through standardization in CA.

Multicollinearity
  Description: High correlation among a set or sets of variables could overestimate the presence and effects of the underlying construct or constructs.
  Cluster Analysis (CA): Use factor analysis or assess if the cluster solution remains the same under different methods of addressing multicollinearity. Using factor analysis results in excluding factors below an eigenvalue of 1, and hence it may eliminate meaningful variance (Dillon, Mulani, & Frederick, 1989). Furthermore, assessing reliability under different multicollinearity conditions is not exhaustive and not all methods for reducing multicollinearity may be available to the researcher.
  LCCA with local dependence: Local dependence modeling allows for explicit modeling of correlations that are inductively or deductively driven. Modeling local dependence helps identify efficient and non-redundant classification under inductive or deductive configurations.
  Comparing LCCA with local dependence and CA: This is the key advantage of LCCA with local dependence over CA. Modeling local dependence helps theoretically and empirically model relationships between two or more sets of variables that correlate.

Classification Algorithms
  Description: Organization method used to classify cases in a sample to different groups.
  Cluster Analysis (CA): Hierarchical and non-hierarchical approaches based on distance measures.
  LCCA with local dependence: Cases are classified into clusters using posterior membership probabilities estimated from maximum likelihood methods.
  Comparing LCCA with local dependence and CA: Drawing on numerical methods, likelihood based approaches in LCCA with local dependence have been shown to be more robust and reliable by providing asymptotically unbiased and efficient parameter estimates (Vermunt, 2003).

Determining number of clusters
  Description: Different classification approaches may result in a different number of clusters. A more valid classification approach helps determine the number of clusters that closely resemble the underlying empirical phenomenon.
  Cluster Analysis (CA): Hierarchical: visual inspection of dendrogram. Non-hierarchical: visual inspection of 'elbow' graph. Cubic Clustering Criterion: measures within and between cluster homogeneity. However, it results in identifying too many clusters (Milligan & Cooper, 1985).
  LCCA with local dependence: Robust statistical indices such as log-likelihood, AIC and BIC are available to assess fit for a different number of cluster solutions.
  Comparing LCCA with local dependence and CA: Comparison between competing models is possible using LCCA with local dependence. In CA such comparison between multiple cluster solutions is based on visual inspection and researcher skill.

Reliability
  Description: The extent to which the current cluster solution can be replicated.
  Cluster Analysis (CA): (a) Run CA multiple times; (b) split the sample to assess if the cluster solution is replicated in the other half of the sample.
  LCCA with local dependence: Stable solutions with no need for replication (Winsberg & De Soete, 1993).
  Comparing LCCA with local dependence and CA: LCCA with local dependence fit indices (based on asymptotically distributed chi-square random variables) determine the optimal number of clusters. Such indices are less dependable when calculated from multi-dimensional contingency tables with sparse data (Formann & Kohlmann, 1996). Increased reliability in estimation of configurations makes LCCA with local dependence more appealing.

External Validity
  Description: Testing theoretically meaningful relationships between cluster solutions and externally defined variables.
  Cluster Analysis (CA): Using multivariate techniques such as ANOVA, OLS, and F-tests.
  LCCA with local dependence: Using mixture regressions, clustering and exogenous variables can be simultaneously modeled, thus providing a robust test of external validity (McLachlan & Peel, 2000).
  Comparing LCCA with local dependence and CA: With the ability to create configurations simultaneously with exogenous variables, Type I errors are reduced and better causality can be established.

Other Issues

Types of variables
  Description: The extent to which different types of variables can be incorporated in analysis.
  Cluster Analysis (CA): Typically need continuous or categorical variables to calculate distances.
  LCCA with local dependence: Any combination of continuous, categorical (nominal or ordinal), or count variables is acceptable.
  Comparing LCCA with local dependence and CA: LCCA with local dependence helps accommodate a variety of measurement scales. LCCA with local dependence helps model unobserved heterogeneity (i.e. different configurations may have different effect sizes with exogenous variables).

Sample Size
  Description: Number of cases required to minimize Type II errors.
  Cluster Analysis (CA): Small sample sizes are acceptable.
  LCCA with local dependence: Typically larger sample sizes are required.
  Comparing LCCA with local dependence and CA: LCCA with local dependence requires a significantly larger sample size and hence may lead to large Type II errors with smaller samples.

User Expertise
  Description: Programming expertise and in-depth knowledge of modeling.
  Cluster Analysis (CA): Minimal.
  LCCA with local dependence: Significant.
  Comparing LCCA with local dependence and CA: LCCA with local dependence requires programming and the ability to analyze outputs in order to make adjustments to models.

Convergence
  Description: Likelihood of model convergence failure.
  Cluster Analysis (CA): Minimal.
  LCCA with local dependence: Significant.
  Comparing LCCA with local dependence and CA: LCCA with local dependence could result in limited convergence.

Software
  Description: Software cost and accessibility.
  Cluster Analysis (CA): Available in the majority of statistical packages, and accessible to users at different expertise levels.
  LCCA with local dependence: Software (MPlus and LatentGold) is relatively expensive but is accessible to advanced users.
  Comparing LCCA with local dependence and CA: LCCA with local dependence software is relatively costly and not easily accessible to users with low to moderate programming expertise.

Supplementary Material File 3

Title: Organizationally-Relevant Configurations: The Value of Modeling Local Dependence
Journal: Quality and Quantity
Author Names: Pankaj C.
Patel, Sherry M.B. Thatcher, Katerina Bezrukova

Data and Variables for the Search Strategy Sample

To assess the role of organizational configurations in the context of strategy, we explore the degree to which search routines for innovation differ among four industry groups: high-technology, medium-high-technology, medium-low-technology, and low-technology sectors. The data are drawn from the Community Innovation Survey 3 (CIS-3). Data description and variable operationalizations are listed below.

Data Description

The Community Innovation Survey (CIS-3) was conducted by Eurostat in 2001 in EU member states, including neighboring and acceding countries. The key focus of the survey was to elicit innovation activities in firms with at least 10 employees. The survey focused on innovation activities between 1998 and 2000 in 27 EU member states (micro-data are available for 18 countries). Although macro-data are publicly available, only recently have the micro-data been made available. Micro-data is available for 13 countries. The micro-data include information on the NACE 2 sector, which allows identification of different innovation intensities in different industry sectors. Based on the technology aggregation levels proposed in OECD (2006), such sectors are identified as low-tech, high-tech, medium-high-tech, and medium-low-tech. Table S.3 provides details on the industries represented in our analysis. CIS data have been used in over 200 published articles (Laursen & Salter, 2004).

Search strategies are central to the innovation process. Depending on industry conditions, firms may adopt different search strategies to successfully innovate.
Research has found that firms are less likely to adopt innovation patterns that are inconsistent with environmental conditions (Laursen & Salter, 2004, 2006). Based on the OECD technology classification of industries, we were able to identify firms as belonging to a specific technological group. We used training samples so that firms could be grouped according to k-means and LCCA. We then used the individual dimension scores to classify firms in the holdout sample and measure classification accuracy.

The dataset consists of 71,602 manufacturing and service firms. We eliminated 26,783 service firms. Of the remaining 44,819 manufacturing firms, we eliminated 17,327 firms that did not report R&D expenditures and 13,830 firms with missing values. For the remaining 13,662 firms, we cleaned the data for inconsistent values such as negative R&D expenditures. After eliminating 198 such cases, our final sample consisted of 13,464 firms representing 18 countries: Italy, Belgium, Finland, Norway, Spain, Germany, Iceland, Greece, Slovenia, Portugal, Czech Republic, Latvia, Slovakia, Hungary, Estonia, Lithuania, Romania, and Bulgaria. To create the training and holdout samples, we randomly picked half the firms in each country cell so that every country is represented in both. Our final sample therefore has 6,732 firms in each of the training and holdout samples.

We operationalized five search strategies across different groups of stakeholders. Respondents (R&D managers) were asked to evaluate the importance of the information sources for their innovation activities on a 4-point Likert scale ranging from "0 = not used" to "3 = highly important".
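The sample construction described above (attrition counts, then a random half split within each country cell) can be sketched as follows. This is only an illustrative sketch: the function name `stratified_half_split`, the `country` key, the random seed, and the toy records are assumptions, and the handling of country cells with an odd number of firms (the extra firm goes to the holdout sample here) is a choice the text does not specify.

```python
import random
from collections import defaultdict


def stratified_half_split(firms, key, seed=0):
    """Randomly assign half of the firms in each stratum to a training sample
    and the remaining firms to a holdout sample."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for firm in firms:
        by_stratum[firm[key]].append(firm)
    training, holdout = [], []
    for members in by_stratum.values():
        shuffled = members[:]          # copy so the input order is untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2      # assumption: odd remainder goes to holdout
        training.extend(shuffled[:half])
        holdout.extend(shuffled[half:])
    return training, holdout


# Attrition counts as reported in the text.
total = 71_602
manufacturing = total - 26_783             # drop service firms
usable = manufacturing - 17_327 - 13_830   # drop no-R&D and missing-value firms
final = usable - 198                       # drop inconsistent cases
print(manufacturing, usable, final, final // 2)

# Hypothetical toy data: three country cells with four firms each.
firms = [{"id": i, "country": c} for i, c in enumerate(["IT", "BE", "FI"] * 4)]
train, hold = stratified_half_split(firms, key="country")
print(len(train), len(hold))
```

Splitting within each country cell, rather than over the pooled sample, keeps every country's share the same in the training and holdout samples.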
In line with Grimpe & Sofka (2009) we use six potential sources of information that are central to innovation:
(a) Internal sources - 'within your enterprise or enterprise group'
(b) Suppliers - 'suppliers of equipment, materials, components or software'
(c) Customers - 'clients or customers'
(d) Competitors - 'competitors or other enterprises in your sector'
(e) Universities - 'universities or higher education institutes'
(f) Government - 'government or public research institutes'

TABLE S.3 Industry Classification

Industry | NACE Code | Industry Group
Chemicals and pharmaceuticals | 24 | High-technology
Office and computing machinery | 30 | High-technology
Radio, TV and communication equipment | 32 | High-technology
Medical, precision and optical equipment | 33 | High-technology
Food and tobacco | 15 – 16 | Low-technology
Textiles and leather | 17 – 19 | Low-technology
Wood / paper / publishing | 20 – 22 | Low-technology
Manufacturing n.e.c. (e.g. furniture, jewelry, sports equipment and toys) | 36 – 37 | Low-technology
Machinery and equipment | 29 | Medium-high-technology
Electrical machinery and apparatus | 31 | Medium-high-technology
Motor vehicles and trailers | 34 | Medium-high-technology
Transport equipment | 35 | Medium-high-technology
Plastics / rubber | 25 | Medium-low-technology
Glass / ceramics | 26 | Medium-low-technology
Metals | 27 – 28 | Medium-low-technology

Supplementary Material File 4

Title: Organizationally-Relevant Configurations: The Value of Modeling Local Dependence
Journal: Quality and Quantity
Author Names: Pankaj C. Patel, Sherry M.B. Thatcher, Katerina Bezrukova

Data and variables for the employee outcome sample

Supplementary Material File 3 describes the data and variables used to classify firms' search strategies into the relevant industry classes.
Similarly, we tested different configurations of employee outcomes based on the industry in which the employees work. Different industry sectors may result in different employee outcomes, ceteris paribus.

Data description

To test the differences in employee outcomes we use data from the Future of Work project in the UK [3]. The survey, conducted between 2001 and 2003, focused on the effects of workplace practices at the employee level. It covered employees in six organizations in different industrial sectors: two aerospace firms (604 responses, 62% response rate; 878 responses, 80% response rate), a medium-sized finance company (128 responses, 32% response rate), an insurance subsidiary (127 responses, 25% response rate), a local authority employer (386 responses, 52% response rate), and a hospital (452 responses, 38% response rate). Our final sample consisted of a total of 1,697 employees from five different sectors of the economy: aerospace, finance, insurance, local authority, and hospitals. The firms represent the following industries: manufacturing, service, and non-profit.

[3] http://www.leeds.ac.uk/esrcfutureofwork/

Before identifying the variables and conducting the analysis, it is essential to ensure that the environments across these firms are similar. If the firm environments are dissimilar, then the employee outcomes could be a result of endogenous factors and not due to industry-related issues. To assess this, we compiled a list of the organizational practices available in the dataset. We first conducted a within-unit analysis to assess the degree to which employees differ in their assessment of practices, and then tested for between-unit differences in practices across the five sets of firms. The details are listed in Table S.4.
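For readers unfamiliar with the within-unit agreement index reported as rwg(1) in Table S.4, the sketch below computes a single-item index in the commonly used form rwg(1) = 1 - s²/σ²_EU, where s² is the observed variance of ratings within one organization and σ²_EU = (A² - 1)/12 is the variance expected under a uniform "no agreement" null with A response options. The text does not state which null distribution or response format was used, so the uniform null, the 5-point scale, and the ratings below are assumptions for illustration only.

```python
from statistics import variance


def rwg_single_item(ratings, n_options):
    """Single-item within-group agreement index r_wg(1).

    r_wg(1) = 1 - s^2 / sigma_EU^2, where s^2 is the sample variance of the
    ratings and sigma_EU^2 = (A^2 - 1) / 12 is the variance of a discrete
    uniform distribution over A response options (assumed null).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two raters")
    expected_var = (n_options ** 2 - 1) / 12.0
    observed_var = variance(ratings)  # sample variance, n - 1 denominator
    return 1.0 - observed_var / expected_var


# Hypothetical ratings of one practice by eight employees of one organization,
# on an assumed 5-point response scale.
ratings = [4, 4, 5, 4, 4, 3, 4, 4]
print(round(rwg_single_item(ratings, n_options=5), 3))
```

Values near 1 indicate that employees within an organization rate a practice similarly; values near 0 indicate ratings no more consistent than the assumed uniform null.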
Table S.4

Entries in the six organization columns are rwg(1) values. Aerospace 1 and aerospace 2 are manufacturing, finance and insurance are service, and local authority and hospital are non-profit.

Practice | Aerospace 1 | Aerospace 2 | Finance | Insurance | Local authority | Hospital | Between group difference [4]
1. Self-directed teams | 0.87 | 0.81 | 0.89 | 0.81 | 0.86 | 0.84 | 0.24
2. Integrated project teams | 0.81 | 0.89 | 0.86 | 0.81 | 0.86 | 0.87 | 0.18
3. Problem solving groups | 0.81 | 0.82 | 0.88 | 0.9 | 0.87 | 0.93 | 0.26
4. Job rotation within teams | 0.83 | 0.83 | 0.83 | 0.81 | 0.87 | 0.83 | 0.27
5. Job rotation between teams | 0.87 | 0.84 | 0.91 | 0.81 | 0.82 | 0.9 | 0.23
6. Team briefing | 0.89 | 0.84 | 0.85 | 0.84 | 0.83 | 0.84 | 0.23
7. Formal consulting practices | 0.87 | 0.81 | 0.84 | 0.92 | 0.87 | 0.90 | 0.11
8. Work Council | 0.9 | 0.82 | 0.83 | 0.88 | 0.9 | 0.87 | 0.26
9. Employee appraisals | 0.83 | 0.88 | 0.85 | 0.83 | 0.81 | 0.84 | 0.2
10. On-the-job training | 0.85 | 0.87 | 0.86 | 0.85 | 0.89 | 0.81 | 0.12
11. Off-the-job training | 0.89 | 0.82 | 0.82 | 0.88 | 0.9 | 0.86 | 0.26
12. Merit ownership pay | 0.88 | 0.84 | 0.86 | 0.84 | 0.62 | 0.60 | 0.23
13. Share-ownership scheme | 0.88 | 0.84 | 0.9 | 0.89 | 0.73 | 0.77 | 0.17
14. Profit-sharing schemes | 0.82 | 0.84 | 0.83 | 0.81 | 0.75 | 0.70 | 0.26

[4] See Pasisz & Hurtz (2009) for comparing Rwg values among groups using Games-Howell approach; only the highest p-values among 15 possible comparisons among groups are reported. Detailed results are available upon request.

As shown in the table there is no significant difference in practices across the firms. This provides further evidence of increasing convergence of workplace practices across industries (Carr & Pudelko, 2006; Tregaskis, 2006). Given the uniformity in practices, there were no significant differences in perceptions of practices across firms.

The employee outcomes that we investigated include the following:

Job Satisfaction. Job satisfaction was measured using four items. The respondents were asked, 'How satisfied are you with' (a) the amount of influence you have over your job; (b) the amount of pay you receive; (c) the sense of achievement you get from your work; and (d) the respect you get from supervisors/line managers.
The items were rated on a four-point scale (3 = very satisfied, 2 = satisfied, 1 = dissatisfied, 0 = very dissatisfied). The reliability of the measure was 0.68.

Employee Commitment. Employee commitment was measured using six items on a four-point scale (3 = strongly agree, 2 = agree, 1 = disagree, 0 = strongly disagree). Employees were asked, 'Do you agree or disagree with the following': (a) I share many of the values of my employer; (b) I feel loyal to my employer; (c) I am proud to tell people who I work for; (d) I am willing to work harder than I have to in order to help this organisation succeed; (e) I will take almost any job to keep working for this organisation; (f) I would turn down another job with more pay in order to stay with this organisation. The reliability of the measure was 0.80.

Workplace Stress. This measure was based on three items rated on a four-point scale (3 = strongly agree, 2 = agree, 1 = disagree, 0 = strongly disagree). Employees were asked, 'Do you agree or disagree with the following': (a) I never seem to have enough time to get my job done; (b) I worry a lot about my work outside working hours; (c) I feel very tired at the end of a workday. The reliability of the measure was 0.68.

Work-Family Conflict. Work-family conflict was measured using four items rated on a five-point scale (0 = almost always, 1 = often, 2 = sometimes, 3 = rarely, 4 = never). The scale items were: (a) I have too little time to carry out my family responsibilities; (b) After work I have enough time to pursue other interests (reverse coded); (c) My partner/family get fed up with the pressure of my job; (d) My job allows me to give the time I like to my partner or family (reverse coded). The reliability for this measure was 0.78.

Family-Work Conflict. While work-family conflict captures the spillover of work into family life, family-work conflict focuses on spillover from family life to work life. The construct was based on a dichotomous response (Yes/No).
The respondents were asked if 'Household or family responsibilities prevented you from (a) accepting a full-time job (b) accepting promotion (c) changing jobs (d) devoting to your work'. We summed the responses to these four questions to create an index.

Overall, we use these five employee outcomes to assess differences in turnover based on the industry in which the employee worked.

Supplementary Material File 5

Title: Organizationally-Relevant Configurations: The Value of Modeling Local Dependence
Journal: Quality and Quantity
Author Names: Pankaj C. Patel, Sherry M.B. Thatcher, Katerina Bezrukova

As explained earlier, CA and LCCA with local dependence approaches were conducted on the training samples. The approach that leads to fewest classification errors in the training sample is deemed the best approach. We used two traditional techniques in binary classification to show the comparative significance of the classification approaches: (a) the Kruskal-Wallis test, and (b) the multinomial Receiver Operating Characteristic (ROC) approach [5]. The Kruskal-Wallis test, a non-parametric variant of ANOVA, compares the distribution of objects across three or more groups. In the current analyses we have four externally defined technology groups for the search strategy sample and three externally defined organizational groups for the employee outcome sample. Using multivariate discriminant analyses we develop the discriminant function from the training samples. The multivariate discriminant function is used to classify the firms into the four technology groups for the search strategy sample, and employees into the three organizational groups for the employee outcome sample.
Based on the classification function from the multivariate discriminant analyses, we compared empirical classifications against externally defined classifications. The Kruskal-Wallis test evaluates the null hypothesis that the specific groups in each of the two training samples are from the same underlying population. When the statistic for the Kruskal-Wallis test (H) is above a critical value, there is a low overlap among the groups. In other words, a significant H statistic indicates that classifications are more differentiated. Typically, a chi-square distribution (df = number of groups - 1) is used as the reference distribution. Therefore a low p-value signals a large difference among the groups. A close examination of Table S.5.1 shows that LCCA with local dependence classifies firms and employees into more distinct groups than CA does.

[5] While parametric assumptions may not be valid in the current set of classification techniques, we conducted additional analyses and found that the results did not differ using the additional analyses.

A more precise test of classification accuracy uses the Receiver Operating Characteristic (ROC) curve. Traditionally, the ROC curve is used to assess classification accuracies in binary outcomes (Hanley & McNeil, 1982). In our examples, we must be able to assess classification accuracies in more than two classes. While one may conduct pairwise comparisons between ROC curves, such individual comparisons may not be an accurate indicator of the overall effectiveness of the classification system (Li & Fine, 2008; Obuchowski, 2005). Li and Fine (2008) propose an alternative method for determining the overall effectiveness of classification when more than two classes are present. Therefore, we use pairwise comparisons of the area under the ROC curve (AUC) and the hypervolume under manifold (HUM) as proposed by Li and Fine (2008). Table S.5.2 shows the differences in HUM scores and pairwise AUC comparisons for CA, LCCA without local dependence, and LCCA with local dependence. For HUM calculations we use the macro in Matlab provided by Li and Fine (2008) [6] and for pairwise comparison we use the ROCCONTRAST macro in SAS 9.2. A high value of HUM indicates a high overall classification ability of a given approach. In pairwise AUC comparisons, a higher value indicates a greater degree of classification accuracy. The p-value based on the chi-square tests informs us of whether the areas covered by each classification approach are significantly different.

Based on the results we see that the separation in the Kruskal-Wallis test provided by LCCA without local dependence is greater than that of CA, and the separation for LCCA with local dependence is larger than that for LCCA without local dependence. Overall, we demonstrate that using LCCA and modeling local dependence provides a better fit than traditional CA.

Taken together, our results suggest that accounting for local dependence not only identifies fewer clusters than those identified in LCCA without local dependence and k-means clustering, but that the differences in classifications are statistically significant. More importantly, LCCA with local dependence reproduces externally defined classifications more accurately, at least with respect to the two samples illustrated in this manuscript. While both CA and LCCA without local dependence resulted in five classes for the search strategy sample, LCCA with local dependence provided a four-class solution: low-tech, high-tech, and two medium-tech categories. Similarly, in the case of the employee outcome data, LCCA with local dependence provided a theoretically valid, three-class solution. This validity was also reflected in the holdout sample comparison for both samples.
Overall, LCCA with local dependence predicted the externally defined classifications more accurately than did CA and LCCA without local dependence.

[6] The code for the analysis is available at http://www.stat.nus.edu.sg/~stalj/3droc.m

TABLE S.5. Trial and Holdout Sample Comparison

TABLE S.5.1. Cluster analysis, LCCA, and LCCA with local dependence holdout sample classification accuracy using Wilks' lambda difference tests.

Comparison | $\lambda_{\text{LCCA}} - \lambda_{k\text{-means}}$ | $\lambda_{\text{LCCA-local dependence}} - \lambda_{k\text{-means}}$ | $\lambda_{\text{LCCA-local dependence}} - \lambda_{\text{LCCA}}$
Search strategy sample
High-tech through low-tech | 2.833*** [7] | 3.443*** | 6.964***
Low-tech through medium-high tech | 1.241 | 2.184*** | 4.786***
Medium-high tech through medium-low tech | 0.998 | 4.993*** | 5.138***
Employee outcome sample
Manufacturing through service | 0.525 | 8.507*** | 7.981***
Service through non-profit | 0.533 | 7.182*** | 6.705***

[7] F-values based on difference tests for Wilks' lambda are reported.

Table S.5.2: ROC curve results

Technology groups in the search strategy sample
Approach | HUM (confidence interval) | AUC(1:2) | AUC(1:3) | AUC(1:4) | AUC(2:3) | AUC(2:4) | AUC(3:4)
k-means | 0.683 (0.942, 0.424) | -- | -- | -- | -- | -- | --
LCA | 0.784 (0.938, 0.424) | 0.1417 | 0.1420 | 0.2053 | 0.1249 | 0.1254 | 0.1442
LCA-local dependence | 0.916 (1.00, 0.73) | 0.044 | 0.103 | 0.039 | 0.0347 | 0.0253 | 0.0121

Firm groups in the employee outcome sample
Approach | HUM (confidence interval) | AUC(1:2) | AUC(1:3) | AUC(2:3)
k-means | 0.559 (0.843, 0.275) | -- | -- | --
LCA | 0.687 (0.846, 0.275) | 0.2914 | 0.2244 | 0.5719
LCA-local dependence | 0.849 (0.973, 0.725) | 0.038 | 0.016 | 0.015

HUM (Hypervolume under manifold): a higher value indicates that the model approaches a perfect classifier.
AUC (Area under curve) comparison p-value: compares the areas under three different approaches (k-means cluster analysis, LCCA without local dependence, and LCCA with local dependence) and reports the p-value of the differences based on chi-square tests.
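To make the quantities used in this file concrete, the sketch below computes a Kruskal-Wallis H statistic, a pairwise AUC, and a three-class ordering probability on made-up scores. It is a simplified illustration under stated assumptions, not the Matlab macro of Li and Fine (2008) or the SAS procedure used for the reported results: H is computed with average ranks and no tie correction, and the three-class quantity is the share of triples (one scalar score per class) that are correctly ordered, which stands in for HUM only when each case has a single ordered score. All function names and data are hypothetical.

```python
from itertools import product


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0      # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) for a list of score lists."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


def pairwise_auc(low, high):
    """P(score from `high` exceeds score from `low`), ties counted as 1/2."""
    wins = sum((b > a) + 0.5 * (b == a) for a, b in product(low, high))
    return wins / (len(low) * len(high))


def ordered_triple_share(g1, g2, g3):
    """Share of triples (one score per class) ordered as g1 < g2 < g3."""
    hits = sum(a < b < c for a, b, c in product(g1, g2, g3))
    return hits / (len(g1) * len(g2) * len(g3))


# Hypothetical discriminant scores for three externally defined groups.
g1, g2, g3 = [1.0, 2.0, 3.0], [2.5, 4.0, 5.0], [4.5, 6.0, 7.0]
print(kruskal_wallis_h([g1, g2, g3]))
print(pairwise_auc(g1, g2), ordered_triple_share(g1, g2, g3))
```

With three groups, H would be compared against a chi-square distribution with 2 degrees of freedom, as described in the text. A random classifier would order about one sixth of three-class triples correctly, which is why HUM values are read against a lower baseline than the 0.5 familiar from binary AUC.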