Supplementary Material File 1
Title: Organizationally-Relevant Configurations: The Value of Modeling Local
Dependence
Journal: Quality and Quantity
Author Names:
Pankaj C. Patel
Assistant Professor of Management
Department of Marketing and Management
Miller College of Business
Ball State University
2000 W. University Ave.
Muncie, IN 47306
Phone: 765-285-3174
Fax: 765-285-4315
Email: pcpatel@bsu.edu
Sherry M.B. Thatcher
Katerina Bezrukova
TABLE S.1.
Sample of Research Studies on Organizational Configurations [1]

Inductive Configurations (key areas, each followed by the empirical method used)

Level of configuration: Nation
  Institutional configurations (Hall & Gingerich, 2009)
    Empirical method used: Fixed and random effects cross-country cluster analysis
  Capitalism complementarities (Amable, 2003)
    Empirical method used: Distance based clustering
  Cross-country innovation systems (Castellacci & Archibugi, 2008)
    Empirical method used: Enhances robustness of cluster analysis by using three different hierarchical agglomerative methods, three different distance measures, and using k-means iterative partitioning method to confirm findings of hierarchical agglomerative clustering approach

Level of configuration: Industry
  Technology diffusion across (Sundqvist, Frank, & Puumalainen, 2005)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Regional knowledge and capital accumulation (Powell, Koput, Bowie, & Smith-Doerr, 2002)
    Empirical method used: Spatial clustering
  Industry classification (Kelton, Pasquale, & Rebelein, 2008)
    Empirical method used: Update to Feser and Berger's factor analysis based approach to identify industry clusters at the national level
  Strategic groups (Chen, 1996; DeSarbo, Grewal, & Wang, 2009; J. C. Short et al., 2007; Wang, 2007)
    Empirical method used: Cluster analysis, Bi-linear spatial multidimensional scaling, and Finite mixture structural equation modeling
  Operational and supply chain capabilities (Heim & Sinha, 2002)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Marketing and sales strategy (Homburg, Jensen, & Krohmer, 2008)
    Empirical method used: A three stage clustering approach: (a) number of clusters were determined using cubic clustering criteria and pseudo-t2; (b) observations were assigned to clusters using a hybrid approach combining Ward's approach and k-means; (c) stability of cluster assignment was assessed using cross validation procedure.

Level of configuration: Firms; Teams
  Information systems (Ferratt, Agarwal, Brown, & Moore, 2005)
    Empirical method used: Two-stage clustering approach
  Human resource management (Toh, Morgeson, & Campion, 2008)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Organizational climate systems (Schulte, Ostroff, & Kinicki, 2006; Schulte, Ostroff, Shmulyian, & Kinicki, 2009)
    Empirical method used: Two-stage clustering approach
  Organizational commitment (Sinclair, Tucker, Cullen, & Wright, 2005; Somers, 2009)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Organizational relationship processes (Song, Tsui, & Law, 2009)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Employee coping (Cortina & Wasti, 2005)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Leadership (Foti & Hauenstein, 2007)
    Empirical method used: Hierarchical cluster analysis using Ward's approach
  Demographic faultlines (Barkema & Shvyrkov, 2007; Tuggle, Schnatterly, & Johnson, 2010)
    Empirical method used: LCCA

Deductive Configurations (key areas)
  Classification of legal systems – common law and civil law (e.g. Glaeser & Shleifer, 2002)
  Economic Systems – capitalism, socialism, communism (e.g. Peng, Wang, & Jiang, 2008)
  Hofstede’s cultural values – power distance, uncertainty avoidance, individualism vs. collectivism, masculinity vs. femininity, long-term vs. short-term orientation (Hofstede, 1983)
  Regional development -- Rural, urban, semi-urban areas (Stimson, Stough, & Roberts, 2006)
  Nature of product innovation -- Hi-tech, low-tech, and medium tech (Von Tunzelmann & Acha, 2005)
  Strategic orientation – Prospectors, analyzers, defenders, and reactors (Miles & Snow, 1984)
  Niche partitioning – Specialist vs. generalist (Carroll, 1985)
  Organizational Structure – Mechanistic or organic (Lawrence & Lorsch, 1967)
  Industry/organizational/product life cycle – Birth, growth, maturity, decline, and death (Audretsch & Feldman, 1996; Jawahar & McLaughlin, 2001; Klepper, 1996)
  Deviant workplace behavior – production deviance, property deviance, political deviance, and personal aggression (Robinson & Bennett, 1995)
  Power – Expert, referent, reward, charisma, legitimate, and coercive (Hinkin & Schriesheim, 1989)
  Leadership styles – Charismatic, participative, situational, transactional, transformational (Avolio, Bass, & Jung, 1999)
  Teams based on work context -- Advice and involvement teams, production and service teams, action and negotiation teams, project and development teams (Sundstrom, De Meuse, & Futrell, 1990)
  Demographic, personal and ability, and geographical faultlines [2] (Bezrukova et al., 2009; Homan, van Knippenberg, Van Kleef, & De Dreu, 2007; Lau & Murnighan, 2005; Li & Hambrick, 2005; Molleman, 2005; Polzer, Crisp, Jarvenpaa, & Kim, 2006; Rico, Molleman, Sanchez-Manzanares, & Van der Vegt, 2007; Thatcher et al., 2003)
  Team temporal status -- long term teams and short term teams (Marks, Mathieu, & Zaccaro, 2001)
  Uncertainty, complexity, and routine -- project teams vs. production teams (De Dreu & Weingart, 2003)
  Reward structure configuration16 (Homan et al., 2008)

[1] The list is by no means exhaustive. It lists recent papers in respective areas; our goal is to draw attention to the fact that in recent years research on OR configurations has been very robust.
[2] These papers use either multivariate cluster analysis where configurations are defined a priori based on theory or experimental manipulations where configurations are created from theory.
Supplementary Material File 2
Table S.2
Comparison between CA and LCCA with Local Dependence for OR Configurations

Empirical Steps

Selection of Variables
  Description: Researcher must choose variables to include in inductive or deductive approach (Ketchen Jr et al., 1993)
  Cluster Analysis (CA): Variables selected by researcher used as inputs to the cluster analysis.
  LCCA with local dependence: Variable selection acts as input to LCCA for inductive selection. By modeling local dependence for deductive approaches one can empirically assess whether theoretically relevant group variables are interrelated in an empirical context.
  Comparing LCCA with local dependence and CA: As this step is driven by researcher volition, one statistical method may not provide a distinct advantage over the other. However, in the context of deductive configurations, LCCA with local dependence can help assess the relevance of the variables used to classify cases into configurations by researcher. Local dependence tests can help model interrelationships to ensure robustness of the configurations.

Standardization of Variables
  Description: Allows variables to contribute equally. However, meaningful differences may be lost.
  Cluster Analysis (CA): Equivocal suggestions on standardization
  LCCA with local dependence: Standardization not required
  Comparing LCCA with local dependence and CA: By not requiring standardization, LCCA with local dependence preserves meaningful differences that are typically lost through standardization in CA

Multicollinearity
  Description: High correlation among a set or sets of variables could overestimate presence and effects of underlying construct or constructs
  Cluster Analysis (CA): Use factor analysis or assess if cluster solution remains the same under different methods of addressing multicollinearity
  LCCA with local dependence: Local dependence modeling allows for explicit modeling of correlations that are inductively or deductively driven. Using factor analysis results in excluding factors below eigenvalue of 1, and hence it may eliminate meaningful variance (Dillon, Mulani, & Frederick, 1989). Furthermore, assessing reliability under different multicollinearity conditions is not exhaustive and not all methods for reducing multicollinearity may be available to the researcher. Modeling local dependence helps identify efficient and non-redundant classification under inductive or deductive configurations.
  Comparing LCCA with local dependence and CA: This is the key advantage of LCCA with local dependence over CA. Modeling local dependence helps theoretically and empirically model relationships between two or more sets of variables that correlate.

Classification Algorithms
  Description: Organization method used to classify cases in a sample to different groups
  Cluster Analysis (CA): Hierarchical and non-hierarchical approaches based on distance measures
  LCCA with local dependence: Cases classified into clusters using posterior membership probabilities estimated from maximum likelihood methods.
  Comparing LCCA with local dependence and CA: Drawing on numerical methods, likelihood based approaches in LCCA with local dependence have been shown to be more robust and reliable by providing asymptotically unbiased and efficient parameter estimates (Vermunt, 2003).

Determining number of clusters
  Description: Different classification approaches may result in a different number of clusters. A more valid classification approach helps determine the number of clusters that closely resemble the underlying empirical phenomenon.
  Cluster Analysis (CA): Hierarchical: Visual inspection of dendrogram. Non-hierarchical: Visual inspection of ‘elbow’ graph. Cubic Clustering Coefficient: Measures within and between cluster homogeneity. However, it results in identifying too many clusters (Milligan & Cooper, 1985)
  LCCA with local dependence: Robust statistical indices such as Log-likelihood, AIC and BIC are available to assess fit for a different number of cluster solutions.
  Comparing LCCA with local dependence and CA: Comparison between competing models is possible using LCCA with local dependence. In CA such comparison between multiple cluster solutions is based on visual inspection and researcher skill.

Reliability
  Description: The extent to which the current cluster solution can be replicated
  Cluster Analysis (CA): a) Run CA multiple times b) Split the sample to assess if cluster solution is replicated in the other half of the sample
  LCCA with local dependence: Stable solutions with no need for replication (Winsberg & De Soete, 1993)
  Comparing LCCA with local dependence and CA: LCCA with local dependence fit indices (based on asymptotically distributed Chi-square random variables) determine the optimal number of clusters. Such indices are less dependable when calculated from multi-dimensional contingency tables with sparse data (Formann & Kohlmann, 1996). Increased reliability in estimation of configurations makes LCCA with local dependence more appealing.

External Validity
  Description: Testing theoretically meaningful relationships between cluster solutions and externally-defined variables
  Cluster Analysis (CA): Using multivariate techniques such as ANOVA, OLS, and F-tests.
  LCCA with local dependence: Using mixture regressions, clustering and exogenous variables can be simultaneously modeled; thus providing a robust test of external validity (McLachlan & Peel, 2000).
  Comparing LCCA with local dependence and CA: With the ability to create configurations simultaneously with exogenous variables, Type I errors are reduced and better causality can be established.

Other Issues

Types of variables
  Description: The extent to which different types of variables can be incorporated in analysis
  Cluster Analysis (CA): Typically need continuous or categorical variables to calculate distances.
  LCCA with local dependence: Any combination of continuous, categorical (nominal or ordinal), or count variables is acceptable
  Comparing LCCA with local dependence and CA: LCCA with local dependence helps accommodate a variety of measurement scales. LCCA with local dependence helps model unobserved heterogeneity (i.e. different configurations may have different effect sizes with exogenous variables).

Sample Size
  Description: Number of cases required to minimize Type II errors
  Cluster Analysis (CA): Small sample sizes are acceptable
  LCCA with local dependence: Typically larger sample sizes are required.
  Comparing LCCA with local dependence and CA: LCCA with local dependence requires a significantly larger sample size and hence may lead to large Type II errors with smaller samples.

User Expertise
  Description: Programming expertise and in-depth knowledge of modeling
  Cluster Analysis (CA): Minimal
  LCCA with local dependence: Significant
  Comparing LCCA with local dependence and CA: LCCA with local dependence requires programming and the ability to analyze outputs in order to make adjustments to models.

Convergence
  Description: Likelihood of model convergence failure
  Cluster Analysis (CA): Minimal
  LCCA with local dependence: Significant
  Comparing LCCA with local dependence and CA: LCCA with local dependence could result in limited convergence.

Software
  Description: Software cost and accessibility
  Cluster Analysis (CA): Available in the majority of statistical packages, and accessible to users at different expertise levels
  LCCA with local dependence: Software (MPlus and LatentGold) are relatively expensive but are accessible to advanced users
  Comparing LCCA with local dependence and CA: LCCA with local dependence software is relatively costly and not easily accessible to users with low to moderate programming expertise.
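The fit indices named in Table S.2 can be illustrated with a short sketch. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the log-likelihoods, parameter counts, and sample size below are hypothetical numbers chosen for illustration and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
fits = {2: (-5210.4, 13), 3: (-5105.9, 20), 4: (-5061.2, 27), 5: (-5049.8, 34)}
n_obs = 1000  # hypothetical sample size

criteria = {k: (aic(ll, p), bic(ll, p, n_obs)) for k, (ll, p) in fits.items()}
best_by_aic = min(criteria, key=lambda k: criteria[k][0])
best_by_bic = min(criteria, key=lambda k: criteria[k][1])

for k, (a, b) in criteria.items():
    print(f"{k} classes: AIC={a:.1f}  BIC={b:.1f}")
print("preferred by AIC:", best_by_aic, "| preferred by BIC:", best_by_bic)
```

With these made-up inputs the two criteria disagree, which shows why the number of classes is usually judged on several indices together rather than on one alone.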
Supplementary Material File 3
Data and Variables for the Search Strategy Sample
To assess the role of organizational configurations in the context of strategy, we explore
the degree to which search routines for innovation differ among four industry groups: high-technology, low-technology, medium-high-technology, and medium-low-technology
sectors. The data are drawn from the Community Innovation Survey 3 (CIS-3). The data description and
variable operationalizations are given below.
Data Description
The Community Innovation Survey (CIS-3) was conducted by Eurostat in 2001 in EU
member states, including neighboring countries and acceding members. The key focus of the
survey was to elicit innovation activities in firms with at least 10 employees. The survey focused
on innovation activities between 1998 and 2000 in 27 EU member states (micro-data are available
for 18 countries). Although macro-data are publicly available, the micro-data have only recently
been made available. The micro-data include
information on the NACE 2 sector, which allows identification of different innovation intensities
in different industry sectors. Based on the technology aggregation levels proposed in OECD (2006),
such sectors are identified as low-tech, high-tech, medium-high-tech, and medium-low-tech.
Table S.3 provides details on the industries represented in our analysis. CIS data have been used in
over 200 published articles (Laursen & Salter, 2004).
Search strategies are central to the innovation process. Depending on industry conditions,
firms may adopt different search strategies to successfully innovate. Research has found that
firms are less likely to adopt innovation patterns that are not consistent with environmental
conditions (Laursen & Salter, 2004, 2006). Based on the OECD technology classification of
industries, we were able to identify firms as belonging to a specific technological group. We
used training samples so that firms could be grouped according to k-means and LCCA. We then
used the individual dimension scores to classify firms in the holdout sample and measure
classification accuracy.
The dataset consists of 71,602 manufacturing and service firms. We eliminated 26,783
service firms. Of the remaining 44,819 manufacturing firms, we eliminated 17,327 firms that did
not report R&D expenditures and 13,830 firms with missing values. Of the remaining 13,662
firms, we cleaned the data for inconsistent values such as negative values for R&D expenditures.
After eliminating 198 cases, our final sample consisted of 13,464 firms representing 18
countries. The representative countries were Italy, Belgium, Finland, Norway, Spain, Germany,
Iceland, Greece, Slovenia, Portugal, Czech Republic, Latvia, Slovakia, Hungary, Estonia,
Lithuania, Romania and Bulgaria. To create training and holdout samples, we randomly picked
half the firms in each country cell to ensure representativeness from each country. Therefore, in
our final sample we have 6,732 firms in each of the training and holdout samples.
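The sample construction above can be checked with a short sketch. The counts are taken from the text; the function `split_within_country`, the toy firm identifiers, and the random seed are hypothetical and only illustrate one way to draw half of the firms within each country cell.

```python
import random
from collections import defaultdict

# Sample funnel, using the counts reported in the text.
total_firms = 71_602
manufacturing = total_firms - 26_783           # drop service firms
complete_cases = manufacturing - 17_327 - 13_830  # drop no-R&D and missing values
final_sample = complete_cases - 198            # drop inconsistent values
print(manufacturing, complete_cases, final_sample, final_sample // 2)


def split_within_country(firms, seed=0):
    """Randomly assign half of each country's firms to training, rest to holdout.

    firms: list of (firm_id, country) pairs. Illustrative helper, not the
    authors' procedure.
    """
    rng = random.Random(seed)
    by_country = defaultdict(list)
    for firm_id, country in firms:
        by_country[country].append(firm_id)
    training, holdout = [], []
    for country in sorted(by_country):
        ids = by_country[country]
        rng.shuffle(ids)
        half = len(ids) // 2
        training.extend(ids[:half])
        holdout.extend(ids[half:])
    return training, holdout


# Toy data: three countries with 6, 4, and 8 firms.
toy_firms = [(f"{c}-{i}", c) for c, n in (("IT", 6), ("BE", 4), ("FI", 8)) for i in range(n)]
training, holdout = split_within_country(toy_firms)
print(len(training), len(holdout))
```

Splitting within each country, rather than across the pooled sample, keeps every country represented in the same proportion in both halves.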
We operationalized five search strategies across different groups of stakeholders.
Respondents (R&D managers) were asked to evaluate the importance of the information sources
for their innovation activities on a 4-point Likert scale ranging from “0 - not used” to “3 - highly
important”. In line with Grimpe & Sofka (2009), we use six potential sources of information that
are central to innovation: (a) internal sources - 'within your enterprise or enterprise group'; (b)
suppliers - 'suppliers of equipment, materials, components or software'; (c) customers - 'clients or
customers'; (d) competitors - 'competitors or other enterprises in your sector'; (e) universities - 'universities or higher education institutes'; and (f) government - 'government or public research
institutes'.
TABLE S.3 Industry Classification
Industry                                                                    NACE Code   Industry Group
Chemicals and pharmaceuticals                                               24          High-technology
Office and computing machinery                                              30          High-technology
Radio, TV and communication equipment                                       32          High-technology
Medical, precision and optical equipment                                    33          High-technology
Food and tobacco                                                            15 – 16     Low-technology
Textiles and leather                                                        17 – 19     Low-technology
Wood / paper / publishing                                                   20 – 22     Low-technology
Manufacturing n.e.c. (e.g. furniture, jewellery, sports equipment and toys) 36 – 37     Low-technology
Machinery and equipment                                                     29          Medium-high-technology
Electrical machinery and apparatus                                          31          Medium-high-technology
Motor vehicles and trailers                                                 34          Medium-high-technology
Transport equipment                                                         35          Medium-high-technology
Plastics / rubber                                                           25          Medium-low-technology
Glass / ceramics                                                            26          Medium-low-technology
Metals                                                                      27 – 28     Medium-low-technology
Supplementary Material File 4
Data and variables for the employee outcome sample
Supplementary Material File 3 describes the data and variables used to classify firms'
search strategies into relevant industrial classes. Similarly, we tested different configurations of
employee outcomes based on the industry in which the employees work. Different industry sectors may
result in different employee outcomes, ceteris paribus.
Data description
To test the differences in employee outcomes we use data from the Future of Work
project in the UK [3]. The survey, conducted between 2001 and 2003, focused on the effects of
workplace practices at the employee level. It covered employees in six organizations
in different industrial sectors: two aerospace firms (604 responses, 62%
response rate; 878 responses, 80% response rate), one medium-sized finance company (128
responses, 32% response rate), an insurance subsidiary (127 responses, 25% response rate), a
local authority employer (386 responses, 52% response rate), and a hospital (452 responses, 38%
response rate). Our final sample consisted of a total of 1697 employees from five
different sectors of the economy - aerospace, finance, insurance, local authority, and hospitals.
The firms represent the following industries: manufacturing, service, and non-profits.
[3] http://www.leeds.ac.uk/esrcfutureofwork/
Before identifying the variables and conducting the analysis, it is essential to ensure that
the environments across these firms are similar. If the firm environments are dissimilar, then the
employee outcomes could be a result of endogenous factors and not due to industry-related
issues. To assess this, we created a list of the organizational practices available
in the dataset and examined differences in these practices across the five sets of firms. We first conducted a within-unit analysis to assess the degree to which employees differ
in their assessment of practices, and then tested for between-unit differences. The details are
listed in Table S.4.
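The within-unit agreement index reported in Table S.4, rwg(1), can be sketched as follows. This uses the standard single-item form, rwg(1) = 1 - s^2 / sigma_EU^2, where s^2 is the observed variance of ratings within a unit and sigma_EU^2 = (A^2 - 1) / 12 is the variance expected under a uniform (no-agreement) distribution over A response options. The number of response options and the ratings below are assumptions for illustration; the source does not report the response format of the practice items.

```python
from statistics import variance


def rwg_single_item(ratings, n_options):
    """Single-item within-group agreement index, rwg(1).

    ratings: ratings of one item by the members of one unit.
    n_options: number of response options (A) on the item's scale.
    """
    expected_uniform_variance = (n_options ** 2 - 1) / 12
    observed_variance = variance(ratings)  # sample variance (n - 1 denominator)
    return 1 - observed_variance / expected_uniform_variance


# Hypothetical ratings from five employees on an assumed 4-option item.
ratings = [2, 2, 3, 2, 2]
agreement = rwg_single_item(ratings, n_options=4)
print(round(agreement, 2))
```

Values near 1 indicate that members of a unit rate the practice almost identically; values near 0 indicate agreement no better than random responding.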
Table S.4
                                  Manufacturing           Service             Non-profit
Practice                          aerospace1  aerospace2  finance  insurance  local authority  hospital  Between group difference [4]
                                  rwg(1)      rwg(1)      rwg(1)   rwg(1)     rwg(1)           rwg(1)
1. Self-directed teams            0.87        0.81        0.89     0.81       0.86             0.84      0.24
2. Integrated project teams       0.81        0.89        0.86     0.81       0.86             0.87      0.18
3. Problem solving groups         0.81        0.82        0.88     0.9        0.87             0.93      0.26
4. Job rotation within teams      0.83        0.83        0.83     0.81       0.87             0.83      0.27
5. Job rotation between teams     0.87        0.84        0.91     0.81       0.82             0.9       0.23
6. Team briefing                  0.89        0.84        0.85     0.84       0.83             0.84      0.23
7. Formal consulting practices    0.87        0.81        0.84     0.92       0.87             0.90      0.11
8. Work Council                   0.9         0.82        0.83     0.88       0.9              0.87      0.26
9. Employee appraisals            0.83        0.88        0.85     0.83       0.81             0.84      0.2
10. On-the-job training           0.85        0.87        0.86     0.85       0.89             0.81      0.12
11. Off-the-job training          0.89        0.82        0.82     0.88       0.9              0.86      0.26
12. Merit ownership pay           0.88        0.84        0.86     0.84       0.62             0.60      0.23
13. Share-ownership scheme        0.88        0.84        0.9      0.89       0.73             0.77      0.17
14. Profit-sharing schemes        0.82        0.84        0.83     0.81       0.75             0.70      0.26
[4] See Pasisz & Hurtz (2009) for comparing Rwg values among groups using Games-Howell approach; only the
highest p-values among 15 possible comparisons among groups are reported. Detailed results are available upon
request.
As shown in the table there is no significant difference in practices across the firms. This
provides further evidence of increasing convergence of workplace practices across industries
(Carr & Pudelko, 2006; Tregaskis, 2006). Given the uniformity in practices, there were no
significant differences in perceptions of practices across firms. The employee outcomes that we
investigated include the following:
Job Satisfaction. Job satisfaction was measured using four items. The respondents were asked,
'How satisfied are you with' (a) the amount of influence you have over your job; (b) the amount
of pay you receive; (c) the sense of achievement you get from your work; and (d) the respect you
get from supervisors/line managers. The items were rated on a four-point
scale (3 - very satisfied, 2 - satisfied, 1 - dissatisfied, 0 - very dissatisfied). The reliability of the
measure was 0.68.
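The text does not name the reliability coefficient it reports. Assuming it is Cronbach's alpha, which is the most common choice for multi-item scales, a minimal sketch of the calculation is given below. The response matrix is hypothetical and is not drawn from the survey.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to four items scored 0-3.
toy_responses = [
    [3, 2, 3, 2],
    [2, 2, 1, 2],
    [1, 0, 1, 1],
    [3, 3, 2, 3],
    [0, 1, 1, 0],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, so a scale whose items move together yields a value close to 1.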
Employee Commitment. Employee commitment was measured using six items on a four-point
scale (strongly agree (3), agree (2), disagree (1), strongly disagree (0)). Employees were asked,
'Do you agree or disagree with the following': (a) I share many of the values of my employer (b)
I feel loyal to my employer (c) I am proud to tell people who I work for (d) I am willing to work
harder than I have to in order to help this organisation succeed (e) I will take almost any job to
keep working for this organisation (f) I would turn down another job with more pay in order to
stay with this organisation. The reliability of the measure was 0.80.
Workplace Stress. This measure was based on a three item four-point scale (strongly agree (3),
agree (2), disagree (1), strongly disagree (0)). Employees were asked, 'Do you agree or disagree
with the following': (a) I never seem to have enough time to get my job done (b) I worry a lot
about my work outside working hours (c) I feel very tired at the end of a workday. The reliability
of the measure was 0.68.
Work-Family Conflict. Work-family conflict was measured using a four-item, five-point scale (0 - almost always, 1 - often, 2 - sometimes, 3 - rarely, 4 - never). The scale items were: (a) I have too
little time to carry out my family responsibilities (b) After work I have enough time to pursue
other interests (reverse coded) (c) My partner/family get fed up with the pressure of my job (d)
My job allows me to give the time I like to my partner or family (reverse coded). The reliability
for this measure was 0.78.
Family-Work Conflict. While work-family conflict explains spillover of work on family life,
family-work conflict focuses on spillover from family life to work life. The construct was based
on a dichotomous response (Yes/No). The respondents were asked if 'Household or family
responsibilities prevented you from' (a) accepting a full-time job (b) accepting promotion (c)
changing jobs (d) devoting to your work. We summed the responses to these four
questions to create an index.
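The scoring rules described for the two conflict measures can be sketched as follows. Reverse coding on a 0-4 scale maps a score x to 4 - x, and the family-work index counts the "Yes" answers. Averaging the four work-family items into one score is an assumption made here for illustration; the text does not state how the items were combined, and the function names are hypothetical.

```python
def reverse_code(score, scale_max):
    """Flip an item so that all items of a scale point in the same direction."""
    return scale_max - score


def work_family_conflict(a, b, c, d, scale_max=4):
    """Mean of the four items after reverse coding items (b) and (d).

    Averaging is an illustrative assumption, not a rule stated in the source.
    """
    items = [a, reverse_code(b, scale_max), c, reverse_code(d, scale_max)]
    return sum(items) / len(items)


def family_work_index(answers):
    """Count of 'Yes' answers across the four dichotomous questions."""
    return sum(1 for answer in answers if answer == "Yes")


# Hypothetical respondent.
print(work_family_conflict(a=1, b=3, c=2, d=4))
print(family_work_index(["Yes", "No", "Yes", "No"]))
```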
Overall, we use these five employee outcomes to assess differences in turnover based on the
industry in which the employee worked.
Supplementary Material File 5
As explained earlier, CA and LCCA with local dependence approaches were conducted on the
training samples. The approach that leads to fewest classification errors in the training sample is
deemed the best approach. We used two traditional techniques in binary classification to show
the comparative significance of the classification approaches: (a) the Kruskal-Wallis test, and (b)
the multinomial Receiver Operating Characteristic (ROC) approach [5].
The Kruskal-Wallis test, a non-parametric variant of ANOVA, compares the distribution
of objects across three or more groups. In the current analyses we have four externally-defined
technology groups for the search strategy sample and three organizational groups for the
employee outcome sample. Using multivariate discriminant analyses we develop the
discriminant function from the training samples. The multivariate discriminant function is used
to classify the firms into the four technology groups for the search strategy sample, and
employees into the three organizational groups for the employee outcome sample. Based on the
classification function from the multivariate discriminant analyses, we compared empirical
classifications against externally-defined classifications. The Kruskal-Wallis test evaluates the null
hypothesis that the specific groups in each of the two training samples are from the same
underlying population. When the statistic for the Kruskal-Wallis test (H) is above a critical value,
there is a low overlap among the groups. In other words, a significant H statistic indicates that
classifications are more differentiated. Typically, a chi-square distribution (df = number of
groups - 1) is used. Therefore a low p-value signals a large difference among the groups.
[5] While parametric assumptions may not be valid in the current set of classification techniques, we conducted
additional analyses and found that the results did not differ using the additional analyses.
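The Kruskal-Wallis statistic and its chi-square reference distribution can be sketched in plain Python. The statistic is H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with R_i the rank sum of group i, divided by the usual tie correction. The helper names and the three toy groups are hypothetical; the data are not from the study.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def kruskal_wallis(groups):
    """Return (H, df, p-value) for a list of groups of numeric scores."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)  # tie correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


# Hypothetical discriminant scores for three well-separated groups.
h_stat, dof, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, dof, p_value)
```

In this toy case the groups do not overlap at all, so H is large relative to the chi-square critical value for two degrees of freedom and the p-value is small.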
A close examination of Table S.5.1. shows that LCCA with local dependence classifies
firms and employees into more distinct groups than CA does. A more precise test of
classification accuracy uses the Receiver Operating Characteristics (ROC) curve. Traditionally,
the ROC curve is used to assess classification accuracies in binary outcomes (Hanley & McNeil,
1982). In our examples, we must be able to assess classification accuracies in more than two
classes. While one may conduct pairwise comparison between ROC curves, such individual
comparisons may not be an accurate indicator of the overall effectiveness of the classification
system (Li & Fine, 2008; Obuchowski, 2005). Li and Fine (2008) propose an alternative method
for determining the overall effectiveness of classification when more than two classes are
present. Therefore, we use pairwise comparisons of the area under the ROC curve (AUC) and
hypervolume under manifold (HUM) as proposed by Li and Fine (2008).
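The two quantities named above can be illustrated with a simplified sketch. A pairwise AUC equals the probability that a randomly drawn case from the higher class scores above a randomly drawn case from the lower class, with ties counted as one half. For the multi-class measure, the sketch assumes a single scalar score and three ordered classes, in which case the hypervolume reduces to the probability that one draw from each class falls in the correct order. This is a simplification for illustration and is not the Matlab macro of Li and Fine (2008); the function names and scores are hypothetical.

```python
from itertools import product


def pairwise_auc(scores_low, scores_high):
    """Probability that a 'high' class score exceeds a 'low' class score."""
    wins = 0.0
    for low, high in product(scores_low, scores_high):
        if high > low:
            wins += 1.0
        elif high == low:
            wins += 0.5
    return wins / (len(scores_low) * len(scores_high))


def hum_three_classes(scores_1, scores_2, scores_3):
    """Share of triples (one score per class) that are strictly correctly ordered."""
    triples = product(scores_1, scores_2, scores_3)
    ordered = sum(1 for a, b, c in triples if a < b < c)
    return ordered / (len(scores_1) * len(scores_2) * len(scores_3))


# Hypothetical classifier scores for three ordered classes.
class_1 = [0.10, 0.25, 0.30]
class_2 = [0.20, 0.45, 0.55]
class_3 = [0.50, 0.70, 0.90]
print(pairwise_auc(class_1, class_2))
print(hum_three_classes(class_1, class_2, class_3))
```

Under random scoring the three-class value would be about 1/6 (one of the 3! possible orderings), compared with 1/2 for a pairwise AUC, which is why multi-class values should not be read on the binary AUC scale.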
Table S.5.2. shows the differences in HUM scores and pairwise AUC comparisons for
CA, LCCA without local dependence, and LCCA with local dependence. For HUM calculations
we use the macro in Matlab provided by Li and Fine (2008) [6] and for pairwise comparison we use
the ROCCONTRAST macro in SAS 9.2. A high value of HUM indicates a high overall classification
ability of a given approach. In pairwise AUC comparisons, a higher value indicates a greater
degree of classification accuracy. The p-value based on the Chi-Square tests informs us of
whether the areas covered by each classification approach are significantly different. Based on
the results we see that the separation in the Kruskal-Wallis test provided by LCCA without local
dependence is greater than that of CA, and the separation for LCCA with local dependence is
larger than that for the LCCA without local dependence. Overall, we demonstrate that using
LCCA and modeling local dependence provides a better fit than traditional CA.
Taken together, our results suggest that accounting for local dependence not only
identifies fewer clusters than those identified in LCCA without local dependence and k-means
clustering, but that the differences in classifications are statistically significant. More
importantly, LCCA with local dependence reproduces the externally-defined classifications more accurately, at least with respect to the two samples illustrated in this manuscript.
While both CA and LCCA without local dependence resulted in five classes for the search
strategy sample, LCCA with local dependence provided a four-class solution – low-tech, high-tech, and two medium-tech categories. Similarly, in the case of the employee outcome data,
LCCA with local dependence provided a theoretically-valid, three-class solution. This validity
was also reflected in the holdout sample comparison for both samples. Overall, LCCA with local
dependence predicted the externally-defined classifications more accurately than did CA and
LCCA without local dependence.
[6] The code for the analysis is available at http://www.stat.nus.edu.sg/~stalj/3droc.m
TABLE S.5.
Trial and Holdout Sample Comparison
TABLE S.5.1. Cluster analysis, LCCA, and LCCA with local dependence holdout sample classification accuracy using Wilks' Lambda difference tests.
Columns: (1) λ(LCCA) − λ(k-means); (2) λ(LCCA-local dependence) − λ(k-means); (3) λ(LCCA-local dependence) − λ(LCCA)
                                              (1)            (2)         (3)
Search strategy sample
  High-tech through low-tech                  2.833*** [7]   3.443***    6.964***
  Low-tech through medium-high tech           1.241          2.184***    4.786***
  Medium-high tech through medium low-tech    0.998          4.993***    5.138***
Employee outcome sample
  Manufacturing through service               0.525          8.507***    7.981***
  Service through non-profit                  0.533          7.182***    6.705***
[7] F-values based on difference tests for Wilks' lambda are reported.
Table S.5.2: ROC curve results

Technology groups in the search strategy sample
Approach               HUM (confidence interval)   AUC(1:2)  AUC(1:3)  AUC(1:4)  AUC(2:3)  AUC(2:4)  AUC(3:4)
k-means                0.683 (0.942, 0.424)        --        --        --        --        --        --
LCA                    0.784 (0.938, 0.424)        0.1417    0.1420    0.2053    0.1249    0.1254    0.1442
LCA-local dependence   0.916 (1.00, 0.73)          0.044     0.103     0.039     0.0347    0.0253    0.0121

Firm groups in the employee outcome sample
Approach               HUM (confidence interval)   AUC(1:2)  AUC(1:3)  AUC(2:3)
k-means                0.559 (0.843, 0.275)        --        --        --
LCA                    0.687 (0.846, 0.275)        0.2914    0.2244    0.5719
LCA-local dependence   0.849 (0.973, 0.725)        0.038     0.016     0.015

HUM (Hypervolume under manifold): a higher value indicates that the model approaches a perfect classifier.
AUC (Area under curve) comparison p-value: compares the areas under three different approaches (k-means cluster analysis, LCCA
without local dependence, and LCCA with local dependence) and reports the p-value of the differences based on Chi-Square tests.