Master Thesis in Statistics, Data Analysis and Knowledge Discovery
Customer Satisfaction Analysis
Laura Funa
Abstract
The objective of this master thesis is to identify “key-drivers” embedded in customer
satisfaction data. The data was collected by a large transportation sector corporation
over five years and in four different countries. The questionnaire comprised several
different sections of questions, ranging from demographical information to
satisfaction attributes concerning the vehicle, the dealer and several problem areas. Various
regression, correlation and cooperative game theory approaches were used to identify
the key satisfiers and dissatisfiers. The theoretical and practical advantages of using
the Shapley value, Canonical Correlation Analysis and Hierarchical Logistic
Regression have been demonstrated and applied to market research.
Acknowledgements
This work would not have been completed without the support of many individuals. I
would like to thank everyone who has helped me along the way. Particularly: Prof.
Anders Nordgaard and Malte Isacsson, for providing guidance, encouragement and
support over the course of my master’s research; Prof. Anders Grimvall, for serving on
my thesis committee and for valuable suggestions; Volvo Car Corporation, for providing
the data. Lastly, to everyone else without whose support none of this would have been
possible.
Table of contents
1 Introduction
  1.1 Background
  1.2 Objective
2 Data
  2.1 Raw data
  2.2 Secondary data
  2.3 Assessment of data quality
    2.3.1 Univariate Analysis of the Satisfaction Attributes
    2.3.2 Univariate Analysis of Problem Areas
3 Methods
  3.1 Kano Modeling
  3.2 Shapley Value Regression
    3.2.1 Assessing Importance in a Regression Model
    3.2.2 Potential, Value and Consistency
    3.2.3 Shapley-based R2 Decomposition
    3.2.4 Choosing “key-drivers”
  3.3 Trend Analysis
    3.3.1 The time consistent Shapley value
  3.4 Hierarchical Logistic Regression Modeling
    3.4.1 Ordinary logistic regression model
    3.4.2 Hierarchical logistic regression
  3.5 Canonical Correlation Analysis
    3.5.1 Formulation
    3.5.2 Issues and practical usage
4 Computations and Results
  4.1 Shapley Value
    4.1.1 Ranked Satisfiers (related to the satisfaction with the dealer)
    4.1.2 Ranked Satisfiers (related to the satisfaction with the vehicle)
    4.1.3 Ranked Dissatisfiers
    4.1.4 “Key attributes” identification
  4.2 Time Series and Trend Analysis
  4.3 Hierarchical Logistic Regression: SAS Modeling
  4.4 Canonical Correlation Analysis
5 Discussion and conclusions
  5.1 Proposed further research
    5.1.1 Kernel Canonical Correlation Analysis
    5.1.2 Moving Coalition Analysis
6 Literature and sources
Appendix A: SAS and R codes
Appendix B: Outputs
Index of tables

Table 1: Datasets Summary
Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)
Table 3: Problem areas occurrence (Country A, Year 2006)
Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively
Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively
Table 6: Dealer Satisfiers, Country A, Year 2010
Table 7: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively
Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively
Table 9: Vehicle Satisfiers, Country A, Year 2010
Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents with no problems)
Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents with no problems)
Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)
Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively
Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively
Table 15: Dissatisfiers, Country A, Year 2010
Table 16: Ven problem area sub-categories, Country A, Year 2006
Table 17: Building the GLIMMIX procedure
Table 18: Country A, Year 2006
Table 19
Table 20: Solution for fixed effects
Table 21
Table 22
Table 23
Index of figures

Figure 1: Frequency distribution of variable V191 in Country A, Year 2006
Figure 2: Proportions of problem areas
Figure 3: Kano Model Attributes
Figure 4: Two-level hierarchical regression
Figure 5: Satisfaction Attribute V90, Country A, Year 2006
Figure 6: Noise-Reach table, Country A, Year 2006
Figure 7: Time Series Analysis, Country A
Figure 8: Trend in V1, Country A
Figure 9: Time Series Analysis, Country A (respondents with no problems)
Figure 10: Trend in V1, Country A (respondents with no problems)
Figure 11: Trend in V8, Country A, all respondents vs. only those with no problems
Figure 12: Trend in V10, Country A, all respondents vs. only those with no problems
Figure 13: Trend in V17, Country A, all respondents vs. only those with no problems
Figure 14: Time Series Analysis, problem areas, Country A
Index of equations

Equation 1: Regression Model
Equation 2: R-squared
Equation 3: Nash Equilibrium
Equation 4: Marginal contribution of a player in a game
Equation 5: Potential Function
Equation 6: Differences
Equation 7: Payoff
Equation 8: Shapley Value
Equation 9: Regression model
Equation 10: Variance
Equation 11: Relative contributions
Equation 12: Marginal Effect
Equation 13: Shapley Value R-squared decomposition
Equation 14: Fields R-squared decomposition
Equation 15: Success
Equation 16: Ordinary logistic regression model
Equation 17: Random effects
Equation 18: Fixed effects
Equation 19: CCA parameter
1 Introduction
Predictive analytics rely on different statistical techniques deriving from fields such as data
mining, modeling and game theory. The main reason for using these is to extract information
from large and complex datasets and use it to forecast future trend patterns. In business,
predictive models search for patterns and hidden relationships in historical or transactional data to
serve as a guide for decision making and for identifying risks and opportunities.
Several data mining techniques have been developed over the years and have shown a positive
impact across a wide range of business fields. The most well-known applications are in finance
(e.g. credit scoring), marketing and fraud detection. Each field by itself offers an enormous number
of possibilities where the analysis of large datasets can be exploited in a profitable fashion.
Taking marketing into consideration, the main topics covered by predictive analytics include
CRM (Customer Relationship Management), cross-selling, customer retention and direct
marketing. Achieving these goals is in general based on conducting an appropriate customer
analysis.
A large portion of the data required by customer analytics is more often than not acquired by
conducting customer satisfaction surveys. Customer satisfaction is a well-known term in marketing;
it indicates how well the products or services provided by a supplier meet customer
expectations. In a highly competitive market, companies may take advantage of such
information to differentiate or improve their products and/or services in order to increase their
market share and their customers’ loyalty. Such data is among the most frequently
collected indicators of market perceptions.
The purpose of this master thesis is to develop an appropriate customer satisfaction analysis
procedure that will provide an indicator of customer behavior on a highly competitive automotive
market. Thus, the aim of the analysis is to find hidden relationships, patterns and trends in the
datasets provided.
The master thesis is divided into five chapters, starting with the objectives and the
motivation of the problem addressed, followed by a description and assessment of data quality,
including data sources, raw data and secondary data. The third part presents the methods used
and the model building, while the fourth focuses on the results and their discussion. The
research is concluded with a critical assessment of the results obtained and the adequacy of the
methods used.
1.1 Background
The research is based on a customer satisfaction survey performed on new car owners (i.e.
owners of cars that have been three months in service). The survey was conducted on several different
markets and consists of different areas of customer characteristics, customer satisfaction and
related issues. As cars are consumer products, automotive businesses are driven by customer
satisfaction. Hence an improvement in consumer insight and information gain through customer
data is constantly sought. It is essential to mention that satisfaction is a very abstract concept
and the actual state of satisfaction varies between different individuals and different products or
services. It depends on several psychological and physical variables. Additional options or
alternative products and services that are available to customers in a particular industry can also be
seen as a source of variability of the satisfaction level. The most valuable satisfaction behaviors to
investigate are loyalty and recommendation rate.
The main purpose of customer satisfaction analysis is often understanding the impact of
explanatory variables on the overall dependent variable. This means that a list of priority items
that can be improved needs to be established, since an improvement in any of these will have a
positive impact on overall satisfaction or customer loyalty and retention (Tang & Weiner, 2005).
When choosing an appropriate statistical technique it is necessary to have a clear vision of whether
the purpose of the analysis is solely exploratory or predictive.
Some of the most common customer satisfaction techniques include: ordinary least squares,
Shapley value regression, penalty & reward analysis, Kruskal’s relative importance, partial least
squares and logistic regression. Since customer satisfaction studies are usually tracking studies,
the results can be monitored over time and allow for trend detection. Moreover, one of the
challenges when choosing the methodology and building the model is to ensure that the results
are consistent when tracking the market over time (Tang & Weiner, 2005).
1.2 Objective
The main objective of the master thesis is to find appropriate statistical techniques that are well
suited to customer satisfaction analysis. Furthermore, they should provide tools for
identifying “key drivers”, patterns and relationships among several sets of (dependent and
independent) variables, as well as measures of relative importance.
More specific objectives of the thesis are: finding an exact measure of the contributions of the
explanatory variables to the dependent variable and identifying the greatest satisfiers and
dissatisfiers influencing customer satisfaction with the dealer and with the vehicle; exploring the
nature of the satisfaction attributes and evaluating whether it is possible to establish a
“clean” measure of experienced problems and consequently classify them into “fixable” ones and
those that cannot be repaired but are a matter of customers’ personal preferences (i.e. the “annoying
concept”); and finally, examining the relationships between two sets of variables (i.e. satisfaction
related problems and satisfaction attributes).
The thesis aims to be consistent with the most commonly used customer satisfaction analysis
techniques and available literature. What can and cannot be modeled and predicted needs to be
clearly stated at all points.
2 Data
The data used in this research was collected by conducting a customer satisfaction survey among
new car owners (i.e. customers who had purchased a new car within three months). The
questionnaire was divided into twelve different sections; ranging from personal, demographical
questions to questions directly connected to satisfaction with the new car, previous cars and
views on automotive industry. Data on customer satisfaction is often taken as a key performance
indicator within business and is often incorporated in balance scorecards.
An important basic requirement for effective research on customer satisfaction is building an
appropriate questionnaire that provides reliable and representative measures. The general
guideline is to build questions on whether the product or service has met or exceeded
expectations. Expectations, and consequently customer perceptions, are therefore the key factor
behind satisfaction.
Questions are based on individual level perceptions but are usually reported on an aggregate level.
According to Batra and Athola (Batra & Athola, 1990), customers purchase products and services
based on two types of benefits: hedonic and utilitarian. The first is connected to experiential
attributes and the latter is linked to the functional attributes of the product.
The survey used in this research involved the most common measures of customer satisfaction: sets
of statements using the Likert technique and scales (Likert, 1932).
2.1 Raw data
The data provided was based on the survey conducted in four different countries (A, B, C and D)
and ranging over five years (2006 to 2010), except for country C, where the survey was
conducted every second year (i.e. 2001, 2003, 2005, 2007, 2009).
Table 1: Datasets Summary

Country/Year   Number of Variables   Recorded responses
A 2006         394                   41474
A 2007         411                   41657
A 2008         387                   42783
A 2009         382                   40531
A 2010         346                   39879
B 2006         390                   46690
B 2007         400                   46148
B 2008         387                   49918
B 2009         382                   48833
B 2010         385                   46987
C 2001         301                   10830
C 2003         362                   9738
C 2005         371                   10912
C 2007         398                   10592
C 2009         365                   12341
D 2006         403                   13667
D 2007         410                   16509
D 2008         385                   18968
D 2009         382                   20875
D 2010         426                   23664
TOTAL          /                     592996
In total there were 592996 responses and 229218251 data-points. The number of variables in
each survey ranged from 301 to 426.
The results from the development of the methodology are based on a survey1 in country A
that included 41474 responses and 394 variables in the year 2006. In the trend analysis the
datasets included all five years.
The survey in question yielded 31351 valid responses, representing 76% of all customers who
participated. The variables used in the core part of the analysis were 34 satisfaction attributes and
14 problem areas, where each problem area in general consists of 20 sub-categories.
The satisfaction attributes were evaluated on a 1 to 10 scale, where 1 represented the worst and
10 the best possible outcome. Problem areas on the other hand allowed for several nominal
values.
2.2 Secondary data
Since the survey comprised several questions that allowed more than one answer (e.g. problem
areas), the first step of the analysis was to transform these into binary form, using dummy
variables. However, various variables in the research posed a bigger challenge and required
further investigation to decide whether they should be treated as being ordinal or interval.
1 The methodology and models developed in this research were then re-applied to the remaining datasets; the
results can be found in Appendix C.
Variables ranked on a “never, occasionally, sometimes, always” scale present a problem of the
relative placement of the two middle categories; thus Knapp (Knapp, 1990) argues that this
produces a less-than-ordinal scale. The controversy arises from key terms such as
“appropriateness” and “meaningfulness”. Conservative views (Siegel, 1956) are based on the
assumption that once the ordinal level has been adopted, the inferences are restricted to
population medians and non-parametric procedures must be used, hence the power of the
statistics is lower. Labovitz (Labovits, 1967, pp. 151-160) on the other hand argues that there are
no true restrictions on using parametric procedures for ordinal scales, since the assumptions behind
the validity of the t and F distributions do not involve the type of the scale, which consequently
provides statistics of higher power.
The number of categories building the scale is important too. The remaining variables varied
in scale level and the two types of scales occurring were a 1 to 4 scale and a 1 to 10 scale, where
the latter behaves more like a continuous scale than the former. Moreover, there have been several studies
(Hausknecht, 1990) on measurement scales in customer satisfaction analysis, which attempt to
prove the validity of treating an ordinal scale with several categories as interval.
2.3 Assessment of data quality
The quality and the nature of the data provided were first assessed by applying a univariate
approach: identifying the distributions, response rates and percentages of missing values. As a last
step of this pre-analysis, the most common issues when dealing with customer satisfaction data
were pointed out.
2.3.1 Univariate Analysis of the Satisfaction Attributes
Figure 1: Frequency distribution of variable V191 in Country A, Year 2006
Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)

Scale               V191     V14      V193     V7       V3
1                   0,41%    0,06%    0,24%    0,07%    0,18%
2                   0,27%    0,04%    0,14%    0,07%    0,16%
3                   0,57%    0,12%    0,22%    0,20%    0,32%
4                   1,17%    0,31%    0,62%    0,63%    0,86%
5                   1,42%    1,01%    1,11%    1,47%    1,72%
6                   5,20%    3,61%    3,51%    5,13%    4,95%
7                   8,71%    8,89%    8,90%    12,11%   11,94%
8                   28,68%   24,65%   25,37%   27,32%   26,21%
9                   29,09%   28,46%   31,62%   28,01%   27,42%
10                  24,48%   32,85%   28,27%   24,99%   26,24%
Total (responses)   94,78%   97,59%   97,53%   97,59%   97,54%
Missing values      5,22%    2,41%    2,47%    2,41%    2,46%

Scale               V6       V8       V17      V23      V1
1                   0,15%    0,11%    0,09%    0,27%    0,48%
2                   0,17%    0,15%    0,05%    0,18%    0,27%
3                   0,31%    0,29%    0,06%    0,33%    0,41%
4                   0,98%    1,00%    0,19%    1,06%    0,75%
5                   2,18%    2,25%    0,82%    2,68%    1,00%
6                   6,51%    7,33%    2,77%    8,44%    2,64%
7                   14,35%   14,31%   8,37%    14,38%   6,75%
8                   27,12%   27,57%   24,16%   25,95%   19,63%
9                   25,39%   26,08%   29,59%   22,60%   28,05%
10                  22,83%   20,90%   33,91%   24,11%   40,03%
Total (responses)   97,36%   96,96%   96,14%   96,99%   94,15%
Missing values      2,64%    3,04%    3,86%    3,01%    5,85%

Scale               V12      V15      V208     V9       V19
1                   0,10%    0,09%    0,27%    0,13%    0,15%
2                   0,06%    0,07%    0,24%    0,08%    0,19%
3                   0,14%    0,16%    0,47%    0,16%    0,34%
4                   0,57%    0,49%    1,41%    0,49%    1,14%
5                   1,38%    1,06%    2,16%    0,92%    1,81%
6                   4,49%    3,65%    5,95%    3,08%    5,31%
7                   11,25%   9,54%    11,64%   8,98%    11,12%
8                   27,18%   25,44%   24,79%   25,16%   25,76%
9                   27,95%   29,00%   25,96%   29,17%   26,68%
10                  26,88%   30,50%   27,11%   31,83%   27,50%
Total (responses)   97,35%   97,52%   94,37%   97,46%   97,41%
Missing values      2,65%    2,48%    5,63%    2,54%    2,59%

Scale               V211     V4       V16      V13      V11
1                   0,52%    0,28%    0,13%    0,07%    0,12%
2                   0,35%    0,19%    0,14%    0,07%    0,10%
3                   0,64%    0,24%    0,28%    0,11%    0,22%
4                   1,61%    0,70%    0,92%    0,45%    0,78%
5                   5,63%    2,56%    2,06%    1,15%    1,72%
6                   10,72%   5,81%    5,86%    3,94%    5,22%
7                   16,59%   12,44%   13,22%   11,09%   13,18%
8                   24,61%   24,94%   27,81%   27,74%   28,08%
9                   19,90%   24,13%   25,38%   27,96%   26,36%
10                  19,44%   28,71%   24,19%   27,43%   24,22%
Total (responses)   90,23%   94,66%   97,29%   97,26%   96,84%
Missing values      9,77%    5,34%    2,71%    2,74%    3,16%

Scale               V20      V26      V10      V22      V2
1                   0,13%    0,10%    0,19%    0,18%    0,14%
2                   0,11%    0,09%    0,16%    0,15%    0,09%
3                   0,22%    0,16%    0,31%    0,28%    0,25%
4                   0,64%    0,77%    0,87%    1,11%    0,73%
5                   1,75%    1,47%    1,37%    1,86%    1,45%
6                   5,34%    5,09%    4,41%    5,67%    4,47%
7                   12,42%   13,01%   10,06%   12,32%   11,42%
8                   27,76%   27,95%   25,13%   27,12%   27,57%
9                   26,63%   25,78%   27,97%   26,06%   27,79%
10                  25,02%   25,57%   29,52%   25,24%   26,09%
Total (responses)   94,84%   97,03%   97,69%   96,91%   97,50%
Missing values      5,16%    2,97%    2,31%    3,09%    2,50%

Scale               V221     V222     V223     V18
1                   0,06%    0,26%    0,19%    0,09%
2                   0,09%    0,27%    0,18%    0,13%
3                   0,15%    0,56%    0,41%    0,23%
4                   0,54%    1,80%    1,19%    0,73%
5                   1,51%    3,21%    2,68%    1,78%
6                   4,45%    8,24%    6,92%    5,37%
7                   10,95%   12,70%   13,33%   12,67%
8                   25,89%   23,91%   26,06%   26,89%
9                   27,23%   23,09%   24,06%   26,05%
10                  29,15%   25,97%   24,97%   26,05%
Total (responses)   97,50%   97,43%   97,16%   97,52%
Missing values      2,50%    2,57%    2,84%    2,48%
Considering the most favorable rating scores, meaning that the attribute scores were at least
“very satisfied” (i.e. 7 or higher), the above tables illustrate that between 77,5% and 96% of the customers
were at least “very satisfied” on each of the satisfaction attributes. The lowest satisfaction
score was associated with V202, at 77,5%; however, this still represents a majority attitude. The
lowest response rate was associated with the attribute V211, with a missing rate of 9,8%.
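As a brief illustration of how such univariate summaries can be produced (a minimal R sketch, not the thesis code; the data frame survey and the attribute V191 are stand-ins for the actual data):

# frequency distribution of one satisfaction attribute, as in Table 2
att <- factor(survey$V191, levels = 1:10)

freq    <- table(att)                       # counts per scale point 1..10 (missing values excluded)
prop    <- 100 * freq / nrow(survey)        # proportions of all respondents, in %
missing <- 100 * mean(is.na(survey$V191))   # missing-value rate, in %

round(c(prop, total = sum(prop), missing = missing), 2)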
2.3.2 Univariate Analysis of Problem Areas
A total of 17950 problems were reported, meaning that 43,3% of all respondents experienced at least
one problem. The tables below present the frequencies of the individual problem areas. The most
common problems appear in the “Vel” category, with a 10,9% occurrence. The least common are
“Vs” problems.
Table 3: Problem areas occurrence (Country A, Year 2006)

Problem Area                     Vp      Ve      Vw      Vb      Vo      Vi      Vel
Number of experienced problems   2081    1759    738     3401    3155    3635    4500
% in the total population        5,20%   4,24%   1,78%   8,20%   7,61%   8,76%   10,85%

Problem Area                     Ven     Vcl     Vbr     Vsw     Vs      Vex     Vot
Number of experienced problems   2452    1568    1480    1216    502     364     428
% in the total population        5,91%   3,78%   3,57%   2,93%   1,21%   0,88%   1,03%

Figure 2 presents the proportion of each problem area.

Figure 2: Proportions of problem areas
A very common challenge when dealing with customer satisfaction data is how to overcome the
problem of multicollinearity. It can be controlled and avoided by a well-designed questionnaire;
however, in most cases this is difficult to achieve. The attributes measured in the survey were in
general highly correlated with each other. An example of such a problem arises when
evaluating the dealer where the car was purchased: the dealer’s ability to solve problems is highly
correlated with the dealer’s friendliness.
Another issue relates to the tracking nature of customer satisfaction data. It is
challenging to ensure that the results obtained reflect real changes in the market and not just a
small number of respondents ticking different satisfaction levels (e.g. 8 instead of 9).
It is important to note that the percentage of customers who had experienced at least one problem
but were still at least “very satisfied” is 84,5%, which is a clear majority. Combined with the fact that
the survey offers only very scarce information on problem areas, this imbalance may lead to several
restrictions when analyzing the latter. Since the objectives of the thesis involve a deep analysis
of the experienced problems, more appropriate measures should be provided by further expansion
and development of the “things go wrong” section of the questionnaire.
3 Methods
The main methods used in the research are:

• Kano Modeling; providing deeper understanding of the customer satisfaction data and what can be achieved using the available data.
• Shapley Value; overcoming the problem of multicollinearity, providing better regression results and allowing for trend analysis.
• Hierarchical logistic regression; exploring different, hierarchically ranked layers of the data.
• Canonical correlation; analyzing relationships between two different sets of variables.
3.1 Kano Modeling
The theory has been developed by Professor Noriaki Kano (Mikulic & Prebežac, 2011, pp. 44-66) and involves product development and customer satisfaction. It classifies product attributes
into five categories based on customer perceptions: enhancers (attractive), one-dimensional, must-be,
indifferent and reverse.
The theory states that the relationship between the performance of a product attribute and
satisfaction level is not necessarily linear. Certain attributes can be asymmetrically related with
satisfaction levels. These relationships are visually presented in figure 3.
Figure 3: Kano Model Attributes
An attractive attribute provides satisfaction when it is fully implemented; its non-fulfillment does not,
however, cause dissatisfaction. Must-be attributes, on the other hand, result in dissatisfaction if not
fulfilled, but their fulfillment does not increase satisfaction. One-dimensional attributes increase
satisfaction when implemented, and dissatisfaction appears if the attribute is not fulfilled.
Indifferent attributes do not affect consumer satisfaction in any way, while reverse attributes result
in customer dissatisfaction when fulfilled and satisfaction when not fulfilled (e.g. technology that is
difficult to understand and complicated to use or maneuver may cause dissatisfaction when implemented).
There are several advantages to integrating Kano modeling: the classification of attributes can be
used to optimize and improve the products, discover the attractors and develop product
differentiation. Moreover, attribute classification provides valuable help in prioritizing
requirements and identifying attributes that need attention. An important measure separating
experienced problems with the product into those that can be fixed and those that are a matter of
personal preference may be introduced by using Kano modeling. The nature of the attractive and
must-be attributes would allow applying attributable and relative risk techniques.
Attributable risk measures the reduction in dissatisfaction that would be observed if the
consumers did not experience a particular problem, compared to the actual pattern. Relative
risk is the ratio of the probability of dissatisfaction occurring among the group of consumers that
experienced a particular problem to the probability of dissatisfaction occurring
among the group of consumers that did not experience the problem. However, the data used in
this research did not include any additional indicator on problem attributes, hence the classification
of these is problematic.
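For reference, a standard way to write the two measures just described (a sketch, using D for dissatisfaction and F for the presence of a particular problem, mirroring the notation of Section 3.2.4):

\[ RR = \frac{P(D \mid F)}{P(D \mid F')} \qquad AR = P(D) - P(D \mid F') \]

where RR is the relative risk and AR the (population) attributable risk associated with the problem.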
3.2 Shapley Value Regression
Regression models offer a convenient method for summarizing and achieving two very different
goals in data analysis: one is prediction and the other is inference about the relationships between the
predictor variables and the outcome variable. Yet, regression models do not prove that such
relationships exist; they simply summarize the likely effects if the models are as hypothesized
(Lipovetsky & Conklin, 2001).
3.2.1 Assessing Importance in a Regression Model
Considering a simple model:

\[ Y \approx f(X, \beta) \]

Equation 1: Regression Model
where all of the predictor variables x are uncorrelated with each other, the standardized
regression coefficients (called beta coefficients, β) are taken as measures of importance. These
measure the expected change in Y (i.e. the dependent variable) when x changes by one standard
deviation.
Having a negative β (for one particular predictor) can present a potential complication. However,
since the magnitude of β measures the size of the effect and the sign only represents its direction,
importance can be assessed by either squaring the values or simply taking the absolute value.
The sum of the squared standardized coefficients is then equal to the overall R2 of the model, where R2
(the coefficient of multiple determination) is a measure of the overall quality of the fit of the
model (Lipovetsky & Conklin, 2001). Hence, each individual squared coefficient can be
interpreted as the percentage of the variance explained by that individual variable.
\[ R^2 = \frac{\sum_i (f_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{SS_{reg}}{SS_{tot}} \]

Equation 2: R-squared
Nevertheless, the situation explained above almost never occurs in real data. Consequently,
assessing standardized regression coefficients as explained above does not lead to a good
indication of the importance of each individual variable. The greater the correlation between the
predictor variables, the less meaningful the evaluated coefficients are (e.g. taking two variables
with a correlation of 1 into consideration, their coefficients would yield an infinite number of
combinations, each making exactly the same contribution). As a solution to this, I
propose a technique used in Game Theory – the Shapley Value.
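The effect of strong predictor correlation on the estimated coefficients can be illustrated with a small simulation (a sketch, not part of the thesis analysis):

# two almost perfectly correlated predictors
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
y  <- x1 + x2 + rnorm(n)

coef(lm(y ~ x1 + x2))                 # individual coefficients are unstable (inflated variance)
summary(lm(y ~ x1 + x2))$r.squared    # while the overall fit is essentially unaffected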
3.2.2 Potential, Value and Consistency
Shapley value, a solution concept in cooperative game theory, was introduced by Lloyd Shapley
in 1953 (Shapley, 1953). It assigns a unique distribution of total surplus generated by the
coalition of all players and it produces a unique solution satisfying the general requirements of
the Nash equilibrium (i.e. choosing an optimal strategy under uncertainty) (Kuhn & Tucker,
1959). There is always exactly one such allocation procedure.
3.2.2.1 Nash Equilibrium
Nash Equilibrium is a solution concept in game theory (named after John Forbes Nash, who
introduced it). It involves a game of two or more players, where each player is assumed to be
aware of the equilibrium strategies x_{-i}^* of the other players and makes the best decision they
can, taking into consideration the decisions of the remaining players. Moreover, no player
can gain anything by changing their decision if the decisions of the others remain
unchanged. The set of strategies chosen under such circumstances and its payoff then constitute
the Nash Equilibrium.
\[ \forall i,\; x_i \in S_i,\; x_i \neq x_i^*:\quad f_i(x_i^*, x_{-i}^*) \geq f_i(x_i, x_{-i}^*) \]

Equation 3: Nash Equilibrium
where:

• (S, f) is a game of n players and S_i is the strategy set of player i
• S = S_1 × S_2 × … × S_N is the set of strategy combinations, and f = (f_1(x), …, f_n(x)) is the payoff function for x ∈ S
• x_i is a strategy of player i, while x_{-i} is a strategy combination of all players except player i

Thus, when each player i chooses strategy x_i, it follows that x = (x_1, …, x_n), and the resulting
payoff for player i equals f_i(x). Once no player i can improve their payoff by unilaterally changing their
strategy away from x_i^*, the strategy combination x^* ∈ S is the Nash Equilibrium.
3.2.2.2 Potential
Theorem 1: “There exists a unique real function on games – called the potential – such that the
marginal contributions of all players (according to this function) are always efficient. Moreover,
the resulting payoff vector is precisely the Shapley value (Econometrica, 1989).”
\[ D^i P(N, v) = P(N, v) - P(N \setminus \{i\}, v) \]

Equation 4: Marginal contribution of a player in a game
where:

• N is a finite set of players
• v is a characteristic function satisfying v(∅) = 0
• (N \ {i}, v) is a subgame
• D^i P(N, v) is a payoff vector
Thus, a function P(N, v) is called the potential function if it satisfies the following for all games:

\[ \sum_{i \in N} D^i P(N, v) = v(N) \]

Equation 5: Potential Function
Moreover, the satisfaction of the above condition determines the uniqueness of the potential
function. According to Hart & Mas-Colell (Hart & Mas-Colell, 1989, pp. 589-614), it follows
that the potential function is such that the allocation of marginal contributions always adds up
exactly to the worth of the grand coalition. This is referred to as efficiency.
Furthermore, D^i P(N, v) = Sh_i(N, v), where Sh_i denotes the Shapley value of player i in the game
(N, v).
3.2.2.3 Preservation of differences
Preservation of differences looks at the payoff allocation problem from another view: what would
player i gain if player j were not included, and what would player j get if player i were not
included in the model? Hart & Mas-Colell (ibid.) show that one obtains a unique efficient
outcome which simultaneously preserves all these differences.
\[ d_{ij} = x_i(N \setminus \{j\}) - x_j(N \setminus \{i\}) \]

Equation 6: Differences
Thus,

\[ x_i(N) - x_i(N \setminus \{j\}) = x_j(N) - x_j(N \setminus \{i\}) \]

Equation 7: Payoff
The above equality has been used by Myerson (Myerson, 1980) and it has been proven that any
solution that is obtained by a potential function satisfies the condition. Hence, any such solution
clearly coincides with the Shapley value.
3.2.2.4 Consistency
An important characterization of the value is its internal consistency property.
Theorem 2: “Consider the class of solutions that, for two-person games, divide the surplus
equally. Then the Shapley value is the unique consistent solution in this class (Econometrica,
1989).”
In general, the consistency requirement as stated above may be described as follows:

• φ is a function that associates a payoff to every player in every game
• a reduced game, among any group of players in a game, is defined by giving the rest of the players their payoffs according to φ

It follows that φ is consistent if and only if, when applied to any reduced game, it yields the same
payoffs as in the original game (Econometrica, 1989).
3.2.2.5 Value
In regression, the attributes are thought of as players and the total value of the game as the R2.
The formulation of the Shapley value of a single attribute is defined as:
\[ SV_j = \sum_k \sum_i \gamma_k \left[ v(M_{i|j}) - v(M_{i|j(-j)}) \right] \]

Equation 8: Shapley Value

where:

• v(M_{i|j}) is the R2 of a model i containing predictor j
• v(M_{i|j(-j)}) is the R2 of the same model i without j
• \gamma_k = \frac{k!\,(n-k-1)!}{n!} is a weight based on the total number of predictors (n) and the number of predictors in this model (k)
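To make Equation 8 concrete, the decomposition can be computed by brute force for a small number of attributes (a minimal R sketch, purely illustrative; the data frame dat, the response name and the predictor names are hypothetical, and for the full attribute sets the relaimpo package used in Section 4.1 is the practical choice):

shapley_r2 <- function(dat, response, predictors) {
  # R^2 of the model containing exactly the predictors in vars
  r2 <- function(vars) {
    if (length(vars) == 0) return(0)
    summary(lm(reformulate(vars, response = response), data = dat))$r.squared
  }
  n <- length(predictors)
  sapply(predictors, function(j) {
    others <- setdiff(predictors, j)
    total <- 0
    for (k in 0:length(others)) {
      subsets <- if (k == 0) list(character(0)) else combn(others, k, simplify = FALSE)
      w <- factorial(k) * factorial(n - k - 1) / factorial(n)   # the weight gamma_k
      for (S in subsets) {
        total <- total + w * (r2(c(S, j)) - r2(S))              # marginal contribution of j
      }
    }
    total
  })
}

# e.g. shapley_r2(dat, "y", c("x1", "x2", "x3")); the contributions sum to the full-model R^2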
3.2.3 Shapley-based R2 Decomposition
The Shapley value offers a very robust estimate of the relative importance of predictor variables, even
when there are high levels of correlation and/or skewness in the data.
The most common approach to R2 decomposition in cases of multicollinearity is stepwise
regression and its procedures. However, this method is of an arbitrary nature and does not always
lead to efficient conclusions. Moreover, the significance test does not always allow ranking
the independent variables in order of importance (Israeli, 2007, pp. 199-212). An alternative
approach has been proposed by Chantreuil and Trannoy (Chantreuil & Trannoy, 1999), who used
the concept of the Shapley value. Shorrocks (Shorrocks, 1999) then argues that Shapley value
based procedures can be applied in various situations, leading to different results.
While traditional decompositions, such as the Fields (Fields, 2003) decomposition, can be applied to
simple linear regression models and perform well in finding the effects of the explanatory
variables, the new approach (i.e. the Shapley value based approach) may also be applied to more
complicated regression models. These may include interactions, dummy variables and high
multicollinearity between explanatory variables.
3.2.3.1 Decomposing R2
Consider a regression model:

\[ y = a + \sum_{j=1}^{J} b_j x_j + e \]

Equation 9: Regression model
where the total sum of squares (in essence the raw variance of y) can be decomposed into the
model sum of squares (SSreg) and the error sum of squares (SSerror):
\[ Var(y) = SS_{tot} = Var(\hat{y}) + Var(e) \]

Equation 10: Variance
The R2 of the regression is then taken as previously stated:

\[ R^2 = \frac{SS_{reg}}{SS_{tot}} \]
Following the Mood, Graybill and Boes (Mood et al., 1974) theorem the relative contributions
may be stated as:
\[ Var(y) = \sum_{j=1}^{J} Cov(b_j x_j, y) + Cov(e, y) \]

Equation 11: Relative contributions
Omitting the residuals, it follows that:

\[ R^2(y) = \frac{\sum_{j=1}^{J} b_j\, Cov(x_j, y)}{Var(y)} = 1 - \frac{Cov(e, y)}{Var(y)} \]
Continuing from the above equation, the explanatory variables can be ranked according to their
importance. However, this fails to account for probable correlation between the contribution of an
individual explanatory variable and that of the remaining variables. The Shapley decomposition
procedure, on the other hand, requires the contribution of a variable to be equal to its marginal effect.
The marginal effect can be expressed as:

\[ M_k = R^2\Big( y = a + \sum_{j \in S} b_j x_j + b_k x_k + e \Big) - R^2\Big( y = a^* + \sum_{j \in S} b_j^* x_j + e^* \Big) \]

Equation 12: Marginal Effect

where S is a subgroup of explanatory variables not including variable k.
Taking a simple example into consideration, where y = a + b_1 x_1 + b_2 x_2 + e, the difference between the
two decompositions may be seen from the following:

• Shapley decomposition:

\[ C_1 = \tfrac{1}{2}\left[ R^2(a + b_1 x_1 + b_2 x_2 + e) - R^2(a^* + b_2^* x_2 + e^*) \right] + \tfrac{1}{2}\, R^2(a^{**} + b_1^{**} x_1 + e^{**}) \]

\[ C_2 = \tfrac{1}{2}\left[ R^2(a + b_1 x_1 + b_2 x_2 + e) - R^2(a^{**} + b_1^{**} x_1 + e^{**}) \right] + \tfrac{1}{2}\, R^2(a^* + b_2^* x_2 + e^*) \]

Equation 13: Shapley Value R-squared decomposition
• Fields decomposition:

\[ C_1 = \frac{(b_1 + b_1^{**})}{2}\,\frac{Cov(x_1, y)}{Var(y)} + \frac{(b_2 + b_2^{*})}{2}\,\frac{Cov(x_2, y)}{Var(y)} \]

\[ C_2 = \frac{(b_2 + b_2^{*})}{2}\,\frac{Cov(x_2, y)}{Var(y)} + \frac{(b_1 + b_1^{**})}{2}\,\frac{Cov(x_1, y)}{Var(y)} \]

Equation 14: Fields R-squared decomposition
Of special interest when comparing the two decompositions are models that include high
multicollinearity. This issue is particularly problematic for the Fields decomposition
because it uses the estimated coefficients: their estimated variances will be
large and, consequently, the estimated coefficients will deviate substantially from the population
coefficients. Moreover, a small change in the model can result in a large change in the estimated
coefficients. In contrast, the Shapley based decomposition uses the marginal contributions of a variable
from all sequences. The value of the contribution will be high or low depending on whether the
variable with which the variable in question is correlated is already included in the model.
Consequently, two strongly correlated variables will end up with similar contributions.
Israeli (Israeli, 2007, pp. 199-212) then argues that cases where non-linear effects of a variable or
interacting variables are included in the regression model can be treated similarly. For the Fields
decomposition it is not evident how the contribution should be divided in such cases, while this
represents no problem for the Shapley decomposition.
3.2.4 Choosing “key-drivers”
Up to this point, a method that successfully measures the relative importance of attributes in the
model has been established. The following analytical design is proposed to effectively identify
the key dissatisfiers (i.e. attributes that need attention).
The notation used includes:

• P(D) – probability of dissatisfaction
• P(F) – probability of failure on any of the independent attributes
• P(D|F) – conditional probability of dissatisfaction among failed
• P(D|F') – conditional probability of dissatisfaction among non-failed
• P(F|D) – conditional probability of failure among those dissatisfied – the reach value
• P(F|D') – conditional probability of failure among those non-dissatisfied – the noise value
In general, it is possible to say that values in the several bottom levels (less than 5) of the ordinal
satisfaction scale indicate dissatisfaction (D), and an identified problem corresponds to failure (F).
The opposite events, non-dissatisfaction and non-failure, are denoted as D' and F' respectively.
To identify the attributes that need attention, it is necessary to find the maximum value of:
\[ Success = Reach - Noise = P(F \mid D) - P(F \mid D') \]

Equation 15: Success
This is a measure of the prevalence of failed respondents among those who are dissatisfied, in
comparison with failed respondents among those who are not dissatisfied.
Consider a situation where all the attributes are ordered by their Shapley values in descending
order and the corresponding reach and noise values are given. According to Conklin and Lipovetsky
(Conklin & Lipovetsky, 2004), adding the second ranked attribute to the model along with the
first one will increase the reach function (i.e. failure on either of the two attributes captures more
of the dissatisfied customers). However, the noise function increases correspondingly
(i.e. among the non-dissatisfied ones). Adding more attributes results in the same pattern.
In general, reach means ensuring that a large part of the total number of dissatisfied customers
is taken into consideration (which needs to be maximized), while a large noise value would
mean focusing on problems that are not actual causes of dissatisfaction (Conklin & Lipovetsky,
2004).
Once the added noise overwhelms the added reach when including the next attribute in the model,
success begins to decrease. At that point the final set of key dissatisfiers is defined (Conklin &
Lipovetsky, 2004).
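The selection rule can be sketched as follows (illustrative R code under stated assumptions, not the thesis code: dissat is a logical vector flagging dissatisfied respondents, fail is a logical matrix with one column per attribute flagging a reported failure, and shapley_order holds the attribute names in descending Shapley-value order):

success_curve <- function(dissat, fail, shapley_order) {
  out <- data.frame(attribute = shapley_order, reach = NA, noise = NA, success = NA)
  any_fail <- rep(FALSE, length(dissat))
  for (i in seq_along(shapley_order)) {
    any_fail <- any_fail | fail[, shapley_order[i]]   # failed on any attribute included so far
    out$reach[i]   <- mean(any_fail[dissat])          # P(F | D)
    out$noise[i]   <- mean(any_fail[!dissat])         # P(F | D')
    out$success[i] <- out$reach[i] - out$noise[i]     # Equation 15
  }
  out
}

# keep attributes up to the point where success stops increasing, e.g.
# head(shapley_order, which.max(success_curve(dissat, fail, shapley_order)$success))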
3.3 Trend Analysis
Using the Shapley value as the measure of importance allows us to track the market over time; the
differences between two waves are then due to actual changes in the market.
3.3.1 The time consistent Shapley value
The Shapley value is one of the most commonly used sharing mechanisms in static cooperation
games with transferable payoffs (Yeung, 2010, pp. 137-149). Actually, the time-consistency
property of the Shapley value means that if one renegotiates the agreement at any intermediate
instant of time, assuming that cooperation has prevailed from initial date until that instant, then
one would obtain the same outcome (Petrosjan & Zaccour, 2001, pp. 381-398). This property thus
allows us to compare the marginal contribution of each satisfaction attribute over time.
3.4 Hierarchical Logistic Regression Modeling
A hierarchical logistic regression model is proposed to examine data with a group structure and a
binary response variable. The group structure is usually characterized by two levels: micro and
macro. The structure is visually presented in figure 4.
Figure 4: Two-level hierarchical regression
The same predictor variables are used in each context, but the micro predictors are allowed to
vary over contexts. At the first (micro) level, an ordinary logistic regression model is applied. At the
second (macro) level, the micro coefficients are treated as functions of macro predictors. A Bayes
estimation procedure is used to estimate the micro and macro coefficients. The components of the
model represent within- and between-macro variance. An algorithm for finding the maximum
likelihood estimates of the covariance of the components is proposed. The car make-model is
viewed as the macro observation and individual cars as the micro observations. Dai, Li and Rocke (Dai et al., NN)
propose the following procedure.
3.4.1 Ordinary logistic regression model
Let y be a binary outcome variable (i.e. the customer is satisfied or dissatisfied) that follows a
Bernoulli distribution, y ~ Bin(1, π), and let x be a car level predictor. Then the model can be written
as:

\[ y_{ij} = \pi_{ij} + e_{ij}, \qquad \mathrm{logit}(\pi_{ij}) = \log\!\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \alpha + \beta x_{ij} \]

Equation 16: Ordinary logistic regression model

where:

- i = 1 … I_j is the car level indicator
- j = 1 … J is the make-model level indicator
- π_{ij} is the probability of dissatisfaction for car i within make-model j, conditional on x

The assumptions made in this model are that the micro level random errors e_{ij} are independent with
moments E(e_{ij}) = 0 and Var(e_{ij}) = σ_e^2 = π_{ij}(1 − π_{ij}).
3.4.2 Hierarchical logistic regression
Extending the ordinary model and accounting for effects of the second (macro) level may be done
by including design variables (dummy variables). Each second level unit (i.e. each make-model
unit) has its own intercept in the model. These intercepts are used to measure the differences
between make-models:

\[ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij} \]

where α_j is the make-model intercept and its effect can be either fixed or
random (Demidenko, 2004). For simplicity it is possible to treat the effects as random
and re-write the model as follows:

\[ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}, \quad \alpha_j = \alpha + u_j \]

Equation 17: Random effects

It is then possible to add second level predictors. The above equation will therefore be extended
to:

\[ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}, \quad \alpha_j = \alpha + \gamma z_j + u_j \]

Equation 18: Fixed effects

where the added term γ is a fixed effect and z is the second level predictor. Using the same
predictors, the model can be extended further to investigate possible cross-level interactions.
The algorithm can be applied using the SAS procedure PROC GLIMMIX.
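The thesis applies this with SAS PROC GLIMMIX (see Section 4.3 and Appendix A); purely as an illustration, a roughly equivalent model can be sketched in R with the lme4 package (the data frame cars and all column names are hypothetical):

library(lme4)

# random make-model intercept, as in Equation 17
m1 <- glmer(dissatisfied ~ x + (1 | makemodel), data = cars, family = binomial)

# adding a make-model level (macro) predictor z as a fixed effect, as in Equation 18
m2 <- glmer(dissatisfied ~ x + z + (1 | makemodel), data = cars, family = binomial)

summary(m2)   # fixed-effect estimates and the between make-model variance component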
3.5 Canonical Correlation Analysis
Canonical correlation was introduced by Harold Hotelling (Johnson & Wichern, 2001) and
is a way of exploring cross-covariance matrices.
Consider two sets of variables x_1, …, x_n and y_1, …, y_m and assume there are correlations among
these variables. Canonical correlation analysis will then find linear combinations of the x’s
and y’s which have maximum correlation with each other.
3.5.1 Formulation
Given the vectors

• X = (x_1, …, x_n) and
• Y = (y_1, …, y_m),

let

• Σ_XX = cov(X, X),
• Σ_YY = cov(Y, Y) and
• Σ_XY = cov(X, Y).

The parameter to maximize is

\[ \rho = \frac{a' \Sigma_{XY} b}{\sqrt{a' \Sigma_{XX} a}\; \sqrt{b' \Sigma_{YY} b}} \]

Equation 19: CCA parameter

The canonical variables are then defined by

• U = a'X
• V = b'Y
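As an illustration (not the thesis code), the canonical correlations and the coefficient vectors a and b can be obtained with base R's cancor(); X_probs and Y_attrs are hypothetical numeric matrices holding the two variable sets (e.g. problem-area dummies and satisfaction attributes) with complete cases only:

cc <- cancor(X_probs, Y_attrs)

cc$cor   # canonical correlations, largest first
# canonical variables U = a'X and V = b'Y (cancor() centers the data by default)
U <- scale(X_probs, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
V <- scale(Y_attrs, center = cc$ycenter, scale = FALSE) %*% cc$ycoef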
3.5.2 Issues and practical usage
The main benefit of canonical correlation analysis is that, unlike other (otherwise appropriate)
multivariate techniques, it imposes very few restrictions. It is generally believed
that techniques with more rigid restrictions provide results of higher quality. However, for the purpose of this
research and when dealing with this type of data, the fact that canonical correlation places the fewest
restrictions makes it the most appropriate and powerful multivariate technique. It may be seen as a
generalization of multiple linear regression.
Variables included in the analysis should be on a ratio or interval scale; however, nominal or
ordinal variables can be used after converting them into sets of dummy variables. Even though
testing the significance of the canonical correlations requires the data to be multivariate normal, the
technique performs well for descriptive purposes even if this requirement is not
fulfilled. Hair (Hair et al., 1998) discusses the flexibility of canonical correlation and its
advantages, particularly in the context where the dependent and explanatory variables can be
either metric or non-metric. Hence, the application is broadly consistent with existing literature.
4 Computations and Results
The very first step of the analyses was to use the SAS statistical software to transform
the variables that allowed more than one answer (e.g. problem areas) into binary form by adding
dummy variables.2
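SAS was used for this step (see Appendix A); purely as an illustration, the same transformation can be sketched in R, assuming a hypothetical character column problems in which the reported problem-area codes are separated by semicolons:

problem_codes <- c("Vp", "Ve", "Vw", "Vb", "Vo", "Vi", "Vel",
                   "Ven", "Vcl", "Vbr", "Vsw", "Vs", "Vex", "Vot")

# one 0/1 dummy variable per problem area
dummies <- t(sapply(strsplit(survey$problems, ";", fixed = TRUE),
                    function(codes) as.integer(problem_codes %in% codes)))
colnames(dummies) <- problem_codes
survey_bin <- cbind(survey, dummies)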
4.1 Shapley Value
I used the R statistical language, more specifically the package relaimpo (Relative Importance for
Linear Regression in R). This package implements six different metrics for assessing the relative
importance of predictors in a linear model. Moreover, it offers exploratory bootstrap confidence
intervals (Journal of Statistical Software, 2006).
For the purpose of this research, there are three particularly useful metrics, “lmg”, “first” and
“last”, described in the following:

• “lmg”: these are the Shapley values. The metric is a decomposition of R2 into non-negative contributions that automatically sum to the total R2. It is the recommended metric for calculating relative importance, since it uses both the direct effect and the effects adjusted for the other predictors in the model.
• “first”: these are the univariate R2 values from regression models with one predictor only. They express what each predictor individually is able to explain. If the predictors are correlated, the sum of all “firsts” will be well above the overall R2 of the model.
• “last”: these express what each predictor is able to explain in addition to all other predictors. The values represent the increase in R2 when the specific predictor is added to the model. In case of correlation among the predictors, summing the “lasts” will not add up to the overall R2.
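A minimal usage sketch of relaimpo (illustrative only; the formula below is a stand-in that uses the dealer response V90 and a few of the dealer attributes from Section 4.1.1 rather than the full attribute set, and the data frame dealer is hypothetical):

library(relaimpo)

fit <- lm(V90 ~ V91 + V92 + V93 + V94 + V95, data = dealer)

# "lmg" gives the Shapley-based decomposition; rela = TRUE rescales it to sum to 100 %
ri <- calc.relimp(fit, type = c("lmg", "first", "last"), rela = TRUE)
ri@lmg

# exploratory bootstrap confidence intervals for the relative importances
bo <- boot.relimp(fit, type = "lmg", b = 1000)
booteval.relimp(bo, level = 0.95)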
A potential drawback is the computational burden; hence sampling of the attributes is necessary.
Theil (Theil, 1987) suggests that an information measure may be introduced; thus the information
coefficient was used as a pre-analysis step. The information coefficient is a measure for
evaluating the quality and usefulness of attributes. Consequently, 20 vehicle related attributes
were chosen in each dataset.
The following analysis is based on the R-output3 and includes the relative importance of 15
satisfaction attributes regarding the dealer where the vehicle was purchased, followed by 20
attributes regarding the vehicle, both ranging over five years.
2 See Appendix A for the SAS codes.
3 See Appendix B.
4.1.1 Ranked Satisfiers (related to the satisfaction with the dealer)
Figure 5 illustrates the frequency distribution of the response variable.
Figure 5: Satisfaction Attribute V90, Country A, Year 2006
Tables 4 to 6 display the “lmg” metrics of the attributes regarding satisfaction with the dealer,
ordered according to their relative importance.
Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively

Year 2006
        lmg         RI %
V91     0,185213    18,52%
V94     0,098906    9,89%
V103    0,093681    9,37%
V98     0,085307    8,53%
V93     0,072336    7,23%
V101    0,06959     6,96%
V95     0,064972    6,50%
V97     0,062902    6,29%
V102    0,058982    5,90%
V96     0,055878    5,59%
V99     0,055683    5,57%
V100    0,048342    4,83%
V92     0,048208    4,82%

Year 2007
        lmg         RI %
V91     0,1910097   19,10%
V94     0,09609172  9,61%
V103    0,08891168  8,89%
V198    0,08803514  8,80%
V93     0,07442355  7,44%
V101    0,06921337  6,92%
V95     0,06514394  6,51%
V97     0,06124171  6,12%
V96     0,05719067  5,72%
V102    0,05623089  5,62%
V99     0,05405284  5,41%
V92     0,05121555  5,12%
V100    0,04723924  4,72%
Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively

Year 2008
        lmg          RI %
V91     0,1876471    18,76%
V94     0,0988604     9,89%
V103    0,0874508     8,75%
V98     0,0854664     8,55%
V93     0,0752088     7,52%
V101    0,0711366     7,11%
V95     0,0653855     6,54%
V97     0,0627659     6,28%
V102    0,0596454     5,96%
V96     0,0561828     5,62%
V99     0,0542581     5,43%
V92     0,0498752     4,99%
V100    0,0461172     4,61%

Year 2009
        lmg           RI %
V91     0,18835054    18,84%
V94     0,09666288     9,67%
V103    0,09215811     9,22%
V98     0,08446151     8,45%
V93     0,0736499      7,36%
V101    0,07036984     7,04%
V95     0,06565629     6,57%
V97     0,06140672     6,14%
V102    0,0573845      5,74%
V96     0,05621675     5,62%
V99     0,05484663     5,48%
V92     0,0518594      5,19%
V100    0,04697694     4,70%
Table 6: Dealer Satisfiers, Country A, Year 2010

        RI %
V91     18,78%
V94     10,01%
V103     8,75%
V98      8,35%
V93      7,49%
V101     7,21%
V95      6,77%
V102     6,12%
V97      6,01%
V96      5,50%
V92      5,21%
V99      5,18%
V100     4,63%
4.1.2 Ranked Satisfiers (related to the satisfaction with the vehicle)
Tables 7 to 9 display the satisfaction attributes regarding the vehicle, ordered by relative
importance.
Table 7: Vehicle satisfiers, Country A, Years 2006 and 2007 respectively

Year 2006
        lmg           RI %
V1      0,17533809    17,53%
V2      0,06934197     6,93%
V3      0,06650988     6,65%
V4      0,05235531     5,24%
V5      0,05018064     5,02%
V6      0,04983194     4,98%
V7      0,04902994     4,90%
V8      0,04524226     4,52%
V9      0,04373201     4,37%
V10     0,04235835     4,24%
V11     0,04168734     4,17%
V12     0,03989079     3,99%
V13     0,03868629     3,87%
V14     0,03835219     3,84%
V15     0,03749254     3,75%
V16     0,03447326     3,45%
V17     0,03417306     3,42%
V18     0,03296703     3,30%
V19     0,03095085     3,10%
V20     0,02740625     2,74%

Year 2007
        lmg          RI %
V1      0,1712068    17,12%
V2      0,0652616     6,53%
V21     0,0643264     6,43%
V3      0,0581126     5,81%
V7      0,0481799     4,82%
V11     0,0468997     4,69%
V6      0,0458007     4,58%
V9      0,0452338     4,52%
V5      0,0443357     4,43%
V8      0,0433154     4,33%
V13     0,0409352     4,09%
V10     0,0400688     4,01%
V12     0,0396294     3,96%
V25     0,0386817     3,87%
V15     0,0373659     3,74%
V17     0,0370816     3,71%
V14     0,0359364     3,59%
V16     0,0352603     3,53%
V22     0,0334706     3,35%
V20     0,0288976     2,89%
Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively

Year 2008
        lmg           RI %
V1      0,12683183    12,68%
V23     0,08183276     8,18%
V2      0,06149193     6,15%
V21     0,05708817     5,71%
V15     0,04966849     4,97%
V7      0,04958275     4,96%
V3      0,04926621     4,93%
V24     0,04758717     4,76%
V4      0,04521186     4,52%
V8      0,04422061     4,42%
V10     0,04354485     4,35%
V9      0,04270511     4,27%
V11     0,04203345     4,20%
V17     0,03907649     3,91%
V13     0,03894794     3,89%
V5      0,03793851     3,79%
V14     0,03757862     3,76%
V25     0,0361539      3,62%
V12     0,03570495     3,57%
V16     0,03353438     3,35%

Year 2009
        lmg           RI %
V1      0,11186845    11,19%
V23     0,0769934      7,70%
V25     0,06519498     6,52%
V21     0,06440816     6,44%
V7      0,05385323     5,39%
V3      0,05198625     5,20%
V4      0,04671821     4,67%
V11     0,04539        4,54%
V9      0,04407208     4,41%
V10     0,04395921     4,40%
V8      0,04258536     4,26%
V13     0,04169902     4,17%
V12     0,04096882     4,10%
V17     0,0406323      4,06%
V14     0,0406012      4,06%
V26     0,03910197     3,91%
V15     0,03835864     3,84%
V24     0,03791848     3,79%
V22     0,03720759     3,72%
V16     0,03648265     3,65%
Table 9: Vehicle Satisfiers, Country A, Year 2010

        lmg           RI %
V27     0,15284015    15,28%
V1      0,11910833    11,91%
V2      0,05708717     5,71%
V21     0,05692879     5,69%
V7      0,04802484     4,80%
V4      0,04617533     4,62%
V8      0,04512617     4,51%
V3      0,04414864     4,41%
V9      0,04018143     4,02%
V11     0,0388276      3,88%
V13     0,03744338     3,74%
V17     0,03739872     3,74%
V10     0,03708772     3,71%
V12     0,0362962      3,63%
V15     0,03521464     3,52%
V14     0,03482542     3,48%
V25     0,03391699     3,39%
V22     0,03373433     3,37%
V16     0,03357805     3,36%
V19     0,0320561      3,21%
4.1.2.1 Among customers that did not experience any problems
The follow-up analysis took a closer look at the customers who did not experience any problems
and compared the resulting relative importances with those obtained in the previous section,
where all customers were included in the analysis.
Tables 10 to 12 display the satisfaction attributes regarding the vehicle, ordered by relative
importance.
Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents
with no problems)

Year 2006
        lmg           RI %
V7      0,06927767     6,93%
V14     0,06706004     6,71%
V2      0,06605162     6,61%
V1      0,0636296      6,36%
V8      0,0587735      5,88%
V3      0,05574437     5,57%
V11     0,05187558     5,19%
V6      0,05026635     5,03%
V13     0,04927452     4,93%
V10     0,04775618     4,78%
V5      0,046488       4,65%
V9      0,04626672     4,63%
V12     0,04400858     4,40%
V16     0,04299308     4,30%
V28     0,04233533     4,23%
V15     0,04196933     4,20%
V26     0,04179348     4,18%
V22     0,03961751     3,96%
V17     0,03906054     3,91%
V4      0,035758       3,58%

Year 2007
        lmg           RI %
V14     0,07256916     7,26%
V2      0,06769435     6,77%
V7      0,0675463      6,75%
V11     0,0565514      5,66%
V1      0,05623229     5,62%
V8      0,05400056     5,40%
V3      0,052504       5,25%
V10     0,05111744     5,11%
V13     0,05007357     5,01%
V6      0,0469457      4,69%
V9      0,04605065     4,61%
V25     0,04549564     4,55%
V29     0,04543294     4,54%
V5      0,04505041     4,51%
V22     0,04427986     4,43%
V12     0,04359521     4,36%
V26     0,04337561     4,34%
V15     0,04075267     4,08%
V17     0,04038925     4,04%
V30     0,03034298     3,03%
Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents
with no problems)

Year 2008
        lmg           RI %
V2      0,0751163      7,51%
V7      0,06607087     6,61%
V14     0,06505198     6,51%
V10     0,05773926     5,77%
V8      0,05375072     5,38%
V3      0,05281364     5,28%
V1      0,05021427     5,02%
V11     0,04988962     4,99%
V31     0,04841213     4,84%
V26     0,04825039     4,83%
V9      0,04797203     4,80%
V12     0,04751064     4,75%
V17     0,04722728     4,72%
V13     0,04678278     4,68%
V21     0,04508103     4,51%
V16     0,04223649     4,22%
V25     0,04173557     4,17%
V15     0,04111215     4,11%
V28     0,03983678     3,98%
V27     0,03319606     3,32%

Year 2009
        lmg           RI %
V14     0,07195362     7,20%
V7      0,0706223      7,06%
V2      0,06242987     6,24%
V10     0,05606256     5,61%
V1      0,0551477      5,51%
V11     0,05439075     5,44%
V31     0,0535409      5,35%
V8      0,05202964     5,20%
V13     0,05015063     5,02%
V3      0,0497964      4,98%
V26     0,04625483     4,63%
V16     0,04612255     4,61%
V17     0,04419138     4,42%
V21     0,04413617     4,41%
V25     0,04400253     4,40%
V28     0,04375152     4,38%
V9      0,0433502      4,34%
V12     0,04258554     4,26%
V15     0,03882214     3,88%
V27     0,03065876     3,07%
Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)

        lmg           RI %
V2      0,06599232     6,60%
V7      0,06496484     6,50%
V14     0,060577       6,06%
V8      0,05773304     5,77%
V1      0,05416185     5,42%
V31     0,05325409     5,33%
V11     0,05198424     5,20%
V13     0,05113744     5,11%
V10     0,05081902     5,08%
V3      0,04954933     4,95%
V21     0,04667376     4,67%
V17     0,04555793     4,56%
V9      0,04555003     4,56%
V26     0,04542082     4,54%
V25     0,04530835     4,53%
V22     0,04492645     4,49%
V16     0,04480985     4,48%
V15     0,041844       4,18%
V12     0,04176165     4,18%
V27     0,03797399     3,80%
The above tables show that the contributions of the satisfaction attributes are very close in terms
of importance. A number of attributes that were previously less important entered the new model
(e.g. V28, V29, V30 and V31). The importance of attribute V14 increased greatly, and it appears
in the top three every year.
4.1.3 Ranked Dissatisfiers
In contrast to the previous section, this part focuses on identifying the greatest dissatisfier. The
Shapley value was calculated for all experienced problem areas, followed by an analysis of the
problems within each problem area (i.e. the sub-categories).
Tables 13 to 15 show the problem areas ranked according to their relative importance.
Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively

Year 2006
        lmg            RI %
Ven     0,212139073    21,21%
Vb      0,12142325     12,14%
Vc      0,120215922    12,02%
Vel     0,108495021    10,85%
Vi      0,100405042    10,04%
Vo      0,062205607     6,22%
Vsw     0,0606551       6,07%
Vbr     0,045555738     4,56%
Ve      0,042242747     4,22%
Vp      0,039842387     3,98%
Vs      0,033740768     3,37%
Vw      0,03095202      3,10%
Vex     0,013582749     1,36%
Vot     0,008544577     0,85%

Year 2007
        lmg            RI %
Ven     0,212809172    21,28%
Vc      0,137871419    13,79%
Vb      0,123632357    12,36%
Vel     0,089960019     9,00%
Vi      0,074496974     7,45%
Vo      0,061172457     6,12%
Vsw     0,056500635     5,65%
Vw      0,05154355      5,15%
Vbr     0,048614873     4,86%
Vs      0,046087759     4,61%
Vp      0,042708995     4,27%
Ve      0,035370297     3,54%
Vex     0,0104424       1,04%
Vot     0,008789093     0,88%
Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively

Year 2008
        lmg            RI %
Ven     0,252488469    25,25%
Vb      0,110277994    11,03%
Vc      0,103348859    10,33%
Vel     0,093153875     9,32%
Vi      0,087610414     8,76%
Vo      0,05987042      5,99%
Vw      0,05203434      5,20%
Vbr     0,048519557     4,85%
Vp      0,045876248     4,59%
Vs      0,042098418     4,21%
Vsw     0,039320631     3,93%
Ve      0,033029765     3,30%
Vex     0,025816516     2,58%
Vot     0,006554494     0,66%

Year 2009
        lmg            RI %
Ven     0,214316999    21,43%
Vc      0,151047967    15,10%
Vb      0,122505803    12,25%
Vel     0,088606572     8,86%
Vi      0,079500754     7,95%
Vs      0,055625147     5,56%
Vbr     0,050078129     5,01%
Vo      0,049666289     4,97%
Ve      0,048929548     4,89%
Vsw     0,047450551     4,75%
Vp      0,047259639     4,73%
Vw      0,027595428     2,76%
Vex     0,015697143     1,57%
Vot     0,001720031     0,17%
Table 15: Dissatisfiers, Country A, Year 2010

        lmg          RI %
Ven     0,292895     29,29%
Vc      0,136178     13,62%
Vb      0,089357      8,94%
Vel     0,08703       8,70%
Vi      0,065648      6,56%
Vo      0,05879       5,88%
Vbr     0,049947      4,99%
Vs      0,049329      4,93%
Vsw     0,041235      4,12%
Vp      0,039328      3,93%
Ve      0,036337      3,63%
Vw      0,028256      2,83%
Vex     0,014915      1,49%
Vot     0,010756      1,08%
The analysis was then applied to the sub-categories in order to identify the single greatest dissatisfier.
Table 16: Ven problem area sub-categories, Country A, Year 2006

        lmg            RI %
Ve4     0,244029091    24,40%
Ve1     0,20440145     20,44%
Ve7     0,149881772    14,99%
Ve98    0,105140416    10,51%
Ve5     0,071517411     7,15%
Ve8     0,041940597     4,19%
Ve6     0,033103608     3,31%
Ve18    0,019165955     1,92%
Ve15    0,018551414     1,86%
Ve9     0,018305727     1,83%
Ve19    0,016829286     1,68%
Ve2     0,016335526     1,63%
Ve11    0,01555382      1,56%
Ve16    0,013381763     1,34%
Ve10    0,008700418     0,87%
Ve27    0,007703845     0,77%
Ve26    0,006485322     0,65%
Ve22    0,003608788     0,36%
Ve12    0,002974477     0,30%
Ve14    0,00104175      0,10%
Ve17    0,000853017     0,09%
Ve3     0,000494547     0,05%
4.1.4 “Key attributes” identification
Figure 6: Noise-Reach table, Country A, Year 2006
Figure 6 (above) illustrates the identification of the "key attributes". The problem areas are
ranked according to their Shapley values, and "reach" and "noise" are calculated according to
equation 15. Once the added noise exceeds the added reach, the cut-off point has been found; all
problem areas with a success value below 0 are considered unimportant.
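A hedged R sketch of this cut-off rule is given below. The exact definitions of reach and noise follow equation 15; here they are approximated by the assumption that added reach counts newly covered dissatisfied respondents and added noise newly covered satisfied respondents, with 'P' (a 0/1 respondent-by-problem-area matrix, columns already ordered by Shapley value) and 'dissatisfied' as hypothetical inputs.

   # Sketch of the noise-reach cut-off under the assumptions stated above.
   key_cutoff <- function(P, dissatisfied) {
     covered <- rep(FALSE, nrow(P))
     for (k in seq_len(ncol(P))) {
       new_hit <- P[, k] == 1 & !covered
       reach_k <- mean(new_hit & dissatisfied)   # added reach of problem area k
       noise_k <- mean(new_hit & !dissatisfied)  # added noise of problem area k
       if (noise_k > reach_k) return(k - 1)      # stop once added noise overtakes added reach
       covered <- covered | P[, k] == 1
     }
     ncol(P)
   }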
4.2 Time Series and Trend Analysis
Time series analysis, aimed at detecting possible trends in relative importance, was applied to
those satisfaction attributes (in relation to the vehicle) that recurred in the model over the five
years. This is illustrated in Figure 7.
Figure 7: Time Series Analysis, Country A
According to the chart above, the relative importance of satisfaction attribute V1 fluctuates the
most over time, while the remaining attributes are fairly stable.
Figure 8 shows a fitted linear trend line, which illustrates the changes in satisfaction attribute V1
over the five consecutive years of the study. The R2 indicates how well the trend line fits the data;
its value of 0,8153 confirms a fairly good fit.4
4 Trends fitted to the remaining variables are displayed in Appendix B.
Figure 8: Trend in V1, Country A
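The trend line itself is easy to reproduce; the following R sketch fits a linear trend to the lmg values of V1 taken from Tables 7 to 9 and recovers the R2 reported above.

   # Linear trend in the relative importance (lmg) of V1, Country A, 2006-2010.
   years  <- 2006:2010
   lmg_v1 <- c(0.17533809, 0.17120680, 0.12683183, 0.11186845, 0.11910833)
   trend  <- lm(lmg_v1 ~ years)
   summary(trend)$r.squared   # approximately 0.8153, as reported for Figure 8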
Since there were significant differences in the relative importance of the attributes between the
analysis of all respondents and the analysis restricted to respondents who did not experience any
problems, time series analysis was applied to the latter group as well. Figure 9 shows the changes
in relative importance of the attributes that were included in all five years.5
Figure 9: Time Series Analysis, Country A (respondents with no problems)
5 Trend analysis graphs for the remaining attributes are in Appendix B.
Trend analysis was then applied to the same satisfaction attribute (V1). While in the previous case
(where all respondents were included in the analysis) a linear trend provided a good fit, here a
polynomial trend was the better option (R2 = 0,8429).
Figure 10: Trend in V1, Country A (respondents with no problems)
Several attributes appeared in both analyses (i.e. when all respondents were included and when
only those who did not report any problems were considered). However, there are differences to
note when comparing their trends over the five years. This is illustrated in Figure 11.
Figure 11: Trend in V8, Country A, all respondents vs. only those with no problems
While satisfaction attribute V8 still follows a rather similar pattern, a very big difference can be
noticed for attribute V10 in the following figure (Figure 12).
Figure 12: Trend in V10, Country A, all respondents vs. only those with no problems
Figure 13: Trend in V17, Country A, all respondents vs. only those with no problems
The trend pattern of attribute V17 shows the expected similarities. Since the perception of this
particular attribute is directly linked to whether a certain problem occurred (especially Ven, which
also makes the greatest contribution to overall dissatisfaction), the slope is steeper when all
respondents are included in the model.
As a last step of the time series analysis, the relative contribution of the problem areas to overall
dissatisfaction was inspected.
Figure 14: Time Series Analysis, problem areas, Country A
Figure 14 illustrates an increase in the relative importance of the problem area Ven for overall
dissatisfaction, while the remaining problem areas show rather stable patterns or minor decreases.
4.3 Hierarchical Logistic Regression: SAS Modeling
Whether the results depend on the make-model of the car was investigated with hierarchical
logistic modeling. Table 17 lists the variables chosen for each level of the regression. The
corresponding SAS code can be found in Appendix A; a hedged R sketch of an equivalent model
follows below.
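The sketch below is an R analogue of that SAS model, fitted with lme4::glmer; the variable names are illustrative stand-ins and the thesis results themselves were produced with PROC GLIMMIX.

   # Hierarchical (random-intercept) logistic regression, R analogue of PROC GLIMMIX.
   library(lme4)
   hlm <- glmer(satisfied ~ number_problems + recommendation + (1 | make_model),
                data = survey, family = binomial(link = "logit"))
   summary(hlm)   # fixed effects and the between make-model intercept variance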
Table 17: Building the GLIMMIX procedure

Variable             Definition
Satisfaction         Dependent variable measured at the car level; within the j-th make-model
Number of problems   Car (micro) level variable measuring the number of problems identified
Recommendation       Make-model (macro) level variable indicating whether the customer would
                     recommend the model in question

Table 18: Country A, Year 2006

Fit Statistics
-2 Res Log Pseudo-Likelihood    210468
Generalized Chi-Square          36400,5
Gener. Chi-Square / DF          0,99

Covariance Parameter Estimates
Cov Parameter   Subject   Estimate   Standard Error
Intercept       V4        0,02303    0,01175
Table 18 is part of the SAS output and displays the between make-model variance, which equals
0,02303.
Table 19: Type III Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
V387     1        36469    158.92    <.0001
V227     3        508      431.09    <.0001
The P-values from these tests are <.0001, indicating a statistically significant association between
the make-model and the variables V387 and V227.
Table 20: Solutions for fixed effects

Effect      Recommendation New Car   Estimate   Standard Error   DF      t Value   Pr > |t|
Intercept                            0.8552     0.1593           232     5.37      <.0001
V387                                 0.2161     0.01714          36469   12.61     <.0001
V227        1.00                     -3.9507    0.1582           508     -24.97    <.0001
V227        2.00                     -3.5416    0.1592           508     -22.24    <.0001
The coefficient of V387 is 0,2161, with a standard error of 0,01714. The corresponding P-value
is <.0001, which indicates statistical significance. This shows that V387 and V227 have a
significant effect on overall satisfaction.
4.4 Canonical Correlation Analysis
Canonical correlation analysis (CCA) was applied to all satisfaction attributes and the a priori
specified problem areas (including only problems of the "annoying" type).
Table 21 shows the strongest possible linear combinations between the two sets of variables. In
addition, it provides information on how many of the canonical variates are significant (here, the
first 15). In general, the number of canonical variates equals the number of variables in the
smaller set, but the number of significant canonical variates is usually smaller. The first F-test
corresponds to the hypothesis that all canonical correlations are zero, the second to the hypothesis
that all correlations excluding the first are zero, and so on.
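For reference, the analysis can be sketched in base R with stats::cancor (the actual computations were done with SAS PROC CANCORR, listed in Appendix A); 'problem_areas' and 'satisfaction_items' are hypothetical complete-case matrices.

   # Minimal CCA sketch in base R.
   X  <- as.matrix(problem_areas)        # problem-area dummies
   Y  <- as.matrix(satisfaction_items)   # satisfaction attributes, same rows as X
   cc <- cancor(X, Y)
   cc$cor          # canonical correlations, largest first
   length(cc$cor)  # at most min(ncol(X), ncol(Y)) canonical variates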
Table 21

        Canonical     Adjusted      Approximate   Squared
        Correlation   Canonical     Standard      Canonical
                      Correlation   Error         Correlation
 1      0.367972      0.364525      0.004763      0.135404
 2      0.193745      .             0.005302      0.037537
 3      0.185235      .             0.005320      0.034312
 4      0.153670      .             0.005379      0.023614
 5      0.102792      .             0.005451      0.010566
 6      0.096696      .             0.005457      0.009350
 7      0.084292      .             0.005470      0.007105
 8      0.078367      .             0.005475      0.006141
 9      0.077105      .             0.005476      0.005945
10      0.070409      .             0.005482      0.004957
11      0.067683      .             0.005484      0.004581
12      0.065785      .             0.005485      0.004328
13      0.063492      .             0.005487      0.004031
14      0.061129      .             0.005488      0.003737
15      0.058516      .             0.005490      0.003424
16      0.057007      .             0.005491      0.003250
17      0.052718      .             0.005494      0.002779
18      0.051726      .             0.005494      0.002676
19      0.050328      .             0.005495      0.002533
20      0.045737      .             0.005497      0.002092
21      0.044686      .             0.005498      0.001997
22      0.043131      .             0.005499      0.001860
23      0.042241      .             0.005499      0.001784
24      0.037202      .             0.005501      0.001384
25      0.036330      .             0.005502      0.001320
26      0.035254      .             0.005502      0.001243
27      0.032469      .             0.005503      0.001054
28      0.029588      .             0.005504      0.000875
29      0.027873      .             0.005505      0.000777
30      0.023221      .             0.005506      0.000539
31      0.021481      .             0.005506      0.000461
32      0.018175      .             0.005507      0.000330

        Eigenvalue   Difference   Proportion   Cumulative   Likelihood   Approximate   Num DF   Den DF   Pr > F
                                                            Ratio        F Value
 1      0.1566       0.1176       0.4514       0.4514       0.71610449   5.39          2048     941133   <.0001
 2      0.0390       0.0035       0.1124       0.5638       0.82825292   3.18          1953     914643   <.0001
 3      0.0355       0.0113       0.1024       0.6662       0.86055586   2.66          1860     888037   <.0001
 4      0.0242       0.0135       0.0697       0.7359       0.89113248   2.15          1769     861308   <.0001
 5      0.0107       0.0012       0.0308       0.7667       0.91268499   1.79          1680     834451   <.0001
 6      0.0094       0.0023       0.0272       0.7939       0.92243167   1.67          1593     807459   <.0001
 7      0.0072       0.0010       0.0206       0.8146       0.93113789   1.56          1508     780326   <.0001
 8      0.0062       0.0002       0.0178       0.8324       0.93780107   1.48          1425     753044   <.0001
 9      0.0060       0.0010       0.0172       0.8496       0.94359606   1.42          1344     725608   <.0001
10      0.0050       0.0004       0.0144       0.8640       0.94923947   1.36          1265     698007   <.0001
11      0.0046       0.0003       0.0133       0.8772       0.95396867   1.31          1188     670236   <.0001
12      0.0043       0.0003       0.0125       0.8898       0.95835891   1.26          1113     642284   <.0001
13      0.0040       0.0003       0.0117       0.9014       0.96252443   1.21          1040     614145   <.0001
14      0.0038       0.0003       0.0108       0.9122       0.96642027   1.16          969      585807   0.0004
15      0.0034       0.0002       0.0099       0.9221       0.97004514   1.11          900      557262   0.0106
16      0.0033       0.0005       0.0094       0.9315       0.97337812   1.07          833      528501   0.0918
17      0.0028       0.0001       0.0080       0.9396       0.97655169   1.02          768      499513   0.3675
Table 22 shows several multivariate statistics. The small p-values for these tests imply rejection
of the null hypothesis that all the canonical correlations are zero.
Table 22

Multivariate Statistics and F Approximations
S=32   M=15.5   N=16426.5

Statistic                Value         F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.71610449    5.39      2048     941133   <.0001
Pillai's Trace           0.32198874    5.22      2048     1.05E6   <.0001
Hotelling-Lawley Trace   0.34693587    5.57      2048     693852   <.0001
Roy's Greatest Root      0.15660903    80.47     64       32886    <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
The canonical variables, despite being "artificial", can be interpreted in terms of the original
variables. The standardized canonical coefficients (Table 23) are interpreted in a similar manner
as standardized regression coefficients. For example, a one standard deviation increase in the first
variable (V192) leads to a 0,1928 standard deviation increase in the score on the first canonical
variate for set 1, ceteris paribus.
Table 23

        Satisf1   Satisf2   Satisf3   Satisf4   Satisf5   Satisf6
V192    0.1628    0.1983    -0.0386   0.0514    -0.2083   0.4022
V194    0.1018    0.3363     0.1476   0.1327     0.2528   0.2809
The first step in interpreting the CCA is to examine the sign and magnitude of the canonical
weights. However, canonical weights may be affected by multicollinearity, so examining the
canonical loadings is considered more reliable. Finally, the cross-loadings can be examined. A
cross-loading is the correlation between an observed variable from the satisfaction set and a
canonical variate from the problem-area set (and vice versa).
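Continuing the base-R sketch from Section 4.4, loadings and cross-loadings can be obtained from the canonical scores as follows; this is only an illustration of the calculation, not the SAS output used in the thesis.

   # Canonical scores, loadings and cross-loadings from the cancor() sketch.
   U <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef   # problem-side scores
   V <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef   # satisfaction-side scores
   loadings_problems     <- cor(X, U)   # each problem variable vs. its own canonical variates
   cross_loadings_satisf <- cor(Y, U)   # satisfaction variables vs. the problem-side variates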
The CCA provided a good model for identifying the linkages between satisfaction attributes and
problem areas. However, that was not the case when only the a priori specified problem
sub-categories (those of the "annoying concept" type) were taken into consideration. While
significant relationships were found, the structural interpretation remained superficial.
5 Discussion and conclusions
Throughout this paper, several statistical techniques and applied economics methods have been
implemented in order to build exploratory and predictive models that lead to accurate outputs.
There were three major objectives: exploring the relative importance and marginal contributions
of several satisfaction attributes to overall customer satisfaction, evaluating the relationships
between experienced problems with the product and the satisfaction attributes, and investigating
whether the former depends on the volume mix.
The first major challenge encountered when selecting an appropriate methodology was the nature
and dimensionality of the input data. The very large number of input variables, of different types
and with different distributions, presented severe problems for several rigid techniques. In
addition, there may be several externalities affecting customer behavior and perceptions that were
not measured by the survey, and therefore remain unknown. The second challenge was to
overcome the problem of multicollinearity, which commonly appears in sciences dominated by
observational data. Finally, the measurements used needed to be consistent over time and allow
for trend detection. Given this, the methodology needed to be very flexible, with as few
underlying requirements as possible, yet computationally efficient and accurate.
The first method used was the Shapley value, as it solved the core part of the research. The topic
of assigning relative importance to predictors in regression is in general quite old. However, more
recent developments in computational capabilities have led to applications of advanced methods
and enabled different approaches to the decomposition of R2. This type of decomposition is often
encountered in sciences that rely on observational data (e.g. psychology, economics and so forth).
The metric "lmg" offered by the R package relaimpo is based on a heuristic approach of averaging
over all orderings (Grömping, 2007, pp. 139-147). In many previous studies relative importance
is described in a purely descriptive fashion (i.e. no explanation of the statistical behavior of the
variance is given). This research takes a step forward and offers a more illustrative example of the
R2 decomposition, which is important for understanding the Shapley value (which is exactly the
"lmg" metric).
The results obtained were very satisfactory, since the Shapley value is a very robust estimator that
can handle very complex datasets, including large portions of missing values and different
measurement levels of the input variables. It successfully avoids the multicollinearity trap, and
the method is very stable in evaluating the impact of attributes measured over time: changes
between consecutive time periods reflect real changes in the market.
Furthermore, the Shapley value is the basis for the "key attributes" analysis. It provides a very
useful tool that can be applied to numerous data modeling problems in various managerial fields.
This research effectively identified the attributes that need managerial attention and which, if
improved, increase sales and profitability. As a result of such analysis, decision makers can
implement several strategies for customer acquisition and retention.
The results based on the relative importance when all respondents were included in the model
were compared with the results from a dataset limited to respondents who did not experience any
problems. This can be seen as a type of segmentation that groups customers with similar
behaviors, and consequently similar attribute preferences, into two distinct groups, which in turn
supports the optimization of targeting processes.
The latter group (i.e. the group that did not experience any problems) perceives the attributes that
are directly related to the features and characteristics of the new vehicle as much more important
than the group that experienced problems does. Among the customers who experienced problems,
the attributes of a broader nature (i.e. those connected to performance and overall quality)
contribute heavily to overall satisfaction. Moreover, the gap between the importance of the
feature-related attributes and the overall quality attributes is much wider than within the first group.
There are also differences between the relative importance of the attributes regarding the vehicle
and those regarding the dealer. The latter did not change much over time; even the ranking of the
attributes did not change significantly, and the top of the list always consists of the same attributes.
Time series analysis was then applied to the satisfaction attributes (for both previously mentioned
groups of respondents) and to the problem areas. Because the questionnaire changed over the
years, not all satisfaction-related attributes re-appeared in every consecutive year; therefore, only
those that appeared in all five models were used in the trend analysis. The data displayed many
fluctuations, so a polynomial trend usually represented the best fit. However, even this fit was
very weak in the majority of cases, and several attributes (V2, V11, V13, V9, V12) did not show
any trend pattern whatsoever.
There were several differences to note when comparing the trend patterns of the same attribute
between the group of all respondents and the group of those who did not report any problems.
The nature of attribute V1 is such that its relative contribution to overall satisfaction is greater
when problems are experienced (i.e. the attribute is perceived as more valuable by customers who
had problems). Hence, the trend shows a similar but steeper pattern in the first group. While
satisfaction attribute V8 still follows a rather similar shape, a very big difference can be noticed
for attribute V10. Since the proportion of customers experiencing problems did not change
significantly over the years, the explanation for these different behaviors lies in the nature of the
attributes and in perceptions affected by psychological factors among those who experience a
certain problem.
The research continued with an investigation combining individual-level and aggregate data, a
rather common situation. The method used was hierarchical logistic regression. The advantage of
such modeling is that it takes the hierarchical structure of the data into account: it specifies
random effects at all levels of the analysis and consequently provides more conservative inference
for the aggregate fixed effects. Such aggregate data often contain valuable hints about individual
behavior.
An important variable to take into consideration is the willingness to recommend, which is a key
metric relating to customer satisfaction. The results obtained showed a statistically significant
association between the make-model and the variables V387 and V227.
While the main benefit of using CCA is that it provides a good exploratory technique for
comparing two sets of variables, an issue of "meaningfulness" versus "significance" appeared in
this case. The CCA performed well with the restricted datasets (i.e. limiting the data to attributes
and problems that are assumed to be correlated), and it was an appropriate method to choose. The
problem appeared when it was applied to all the satisfaction attributes and an a priori specified
set of sub-problems. While it did couple the attributes, it failed to provide satisfactory results in
terms of meaningfulness (i.e. it was not logical which satisfaction attributes were coupled with
which experienced problem).
In order to automate the methods used, the surveys should not vary in terms of variables and
attributes over the years or between the countries; a standardization of the surveys is therefore
needed. Moreover, the coding, labels and formats of the variables in question should be
synchronized. Some of the code in Appendix A may then be re-applied.
The results are consistent with a key element of the company's objectives, namely the intention to
build on customer satisfaction and retention. The results are also broadly compatible with research
done in other sectors. Hence, the most important objective of the customer satisfaction analysis is
addressed by this research.
While all the models used performed fairly well, there is room for further investigation and research.
5.1 Proposed further research
5.1.1 Kernel Canonical Correlation Analysis
The issue with classical canonical correlation is that it is limited to linear associations. Using
kernel methods as a pre-step in the analysis can enhance the results by extending the classical
model to a general nonlinear setting. In addition, it no longer requires the Gaussian distributional
assumption for the observations.
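A minimal sketch of a regularized kernel CCA is given below, assuming an RBF kernel and the common "invert-and-multiply" formulation from the kernel CCA literature; this is not an implementation evaluated in the thesis, only an indication of how such an extension could look.

   # Regularized kernel CCA sketch (first kernel canonical correlation only).
   rbf_kernel <- function(Z, sigma = 1) {
     d2 <- as.matrix(dist(Z))^2
     exp(-d2 / (2 * sigma^2))
   }
   center_kernel <- function(K) {
     n <- nrow(K)
     H <- diag(n) - matrix(1 / n, n, n)
     H %*% K %*% H
   }
   kcca_first <- function(X, Y, sigma = 1, reg = 0.1) {
     n  <- nrow(X)
     Kx <- center_kernel(rbf_kernel(scale(X), sigma))
     Ky <- center_kernel(rbf_kernel(scale(Y), sigma))
     M  <- solve(Kx + reg * diag(n)) %*% Ky %*% solve(Ky + reg * diag(n)) %*% Kx
     rho2 <- Re(eigen(M, only.values = TRUE)$values[1])
     sqrt(min(max(rho2, 0), 1))   # first kernel canonical correlation
   }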
5.1.2 Moving Coalition Analysis
Mansor and Ohsato (2010) proposed a method called Moving Coalition Analysis (MCA) to
observe the performance trend of a coalition over time. It divides the coalition into several
sub-coalitions and determines the characteristic function of all sub-coalitions; each period is then
treated as a player (Mansor & Ohsato, 2010).
6 Literature and sources
1. Alterman, T., Deddens, A.J., Constella, J.L. (n.d.). Analysis of Large Hierarchical Data with Multilevel Logistic Modeling Using PROC GLIMMIX. SUGI - SAS Users Group International, 151 (32).
2. Dai, J., Li, Z., Rocke, D. (n.d.). Hierarchical Logistic Regression Modeling with SAS GLIMMIX. University of California: Davis.
3. Chantreuil, F., Trannoy, A. (1999). Inequality decomposition values: the trade-off between marginality and consistency. THEMA Discussion Paper. Universite de Cergy-Pontoise: France.
4. Conklin, M., Powaga, K., Lipovetsky, S. (2004). Customer Satisfaction Analysis: Identification of Key Drivers. European Journal of Operational Research, 3 (154), 819-827.
5. Feldman, B. (December 1999). The Proportional Value of a Cooperative Game. Accessed 21 September 2011 at http://ideas.repec.org/p/ecm/wc2000/1140.html
6. Feldman, B. (March 2007). A theory of attribution. Accessed 5 September 2011 at http://mpra.ub.uni-muenchen.de/3349/
7. Garavaglia, S., Sharma, A. (n.d.). A smart guide to dummy variables: Four applications and a macro. New Jersey: Murray Hill.
8. GFK Customer Loyalty (n.d.). Getting Better Regression Results with Shapley Value Regression. Accessed 28 September 2011 at http://marketing.gfkamerica.com/website/articles/ShapelyValueRegression.pdf
9. Grömping, U. (May 2007). Estimators of Relative Importance in Linear Regression Based on Variance Decomposition. The American Statistician, 2 (61), 139-147.
10. Grömping, U. (October 2007). Relative Importance for Linear Regression in R: The Package relaimpo. Journal of Statistical Software, 1 (17).
11. Hair, J.F., Anderson, R.E., Tatham, R.L., Black, C.W. (1998). Multivariate Data Analysis (5th ed.). New Jersey: Prentice Hall, Inc.
12. Hart, S., Mas-Colell, A. (May 1989). Potential, Value and Consistency. Econometrica, 57 (3), 589-614.
13. Hausknecht, R.D. (1990). Measurement Scales in Customer Satisfaction/Dissatisfaction. Accessed 16 June 2011 at http://lilt.ilstu.edu/staylor/csdcb/articles/Volume3/Hausknecht%201990.pdf
14. Huang, S-Y., Lee, H-M., Hsiao, C.K. (August 2006). Kernel Canonical Correlation Analysis and its Applications to Nonlinear Measures of Association and Test of Independence. Institute of Statistical Science: Academia Sinica, Taiwan.
15. Israeli, O. (March 2007). A Shapley-based decomposition of the R-square of a linear regression. The Journal of Economic Inequality, 5, 199-212.
16. Johnson, A.R., Wichern, W.D. (2001). Applied Multivariate Statistical Analysis (5th ed.). New Jersey: Prentice Hall.
17. Knapp, T. (March/April 1990). Commentary: Treating Ordinal Scales as Interval Scales: An Attempt to Resolve the Controversy. Psychometrica, 39 (2), 121-123.
18. Kruskal, W. (1987). Relative Importance by averaging over orderings. The American Statistician, 41 (1).
19. Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140 (32), 55.
20. Lipovetsky, S., Conklin, M. (2001). Analysis of Regression in a Game Theory Approach. Applied Stochastic Models in Business and Industry, 17, 319-330.
21. Mansor, M.A., Ohsato, A. (2010). The Concept of Moving Coalition Analysis and its Transpose. European Journal of Scientific Research, 4 (39), 548-557.
22. Mikulic, J., Prebežac, D. (2011). A critical review of techniques for classifying quality attributes in the Kano model. Managing Service Quality, 1 (21), 46-66.
23. Petrosjan, L., Zaccour, G. (June 2001). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics & Control, 2003 (27), 381-398.
24. Siegel, S. (1967). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill Book Co.
25. Shapley, L.S. (1953). A value for n-person games. In: Kuhn, H.W. and Tucker, A.W. (eds.), Contributions to the theory of games. Princeton: Princeton University Press, 307-317.
26. Sharrocks, A.F. (1999). Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value. United Kingdom: University of Essex.
27. Theil, H. (1987). How many bits of information does an independent variable yield in a multiple regression? Statistics and Probability Letters, 6 (2).
28. Von Neumann, J. (1928). On theory of playing games. English translation in: Kuhn, H.W. and Tucker, A.W. (1959), Contributions to the Theory of Games. Princeton: Princeton University Press, 13-41.
29. Weiner, J.L., Tang, J. (n.d.). Multicollinearity in Customer Satisfaction Research. Ipsos Loyalty: www.ipsosloyalty.com
30. Yeung, D.W.K. (2010). Time consistent Shapley Value Imputations for Cost-Saving Joint Ventures. Accessed 21 September 2011 at http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=mgta&paperid=44&option_lang=eng
31. Yeung, D.W.K., Petrosyan, L.A. (2004). Subgame consistent cooperative solutions in stochastic differential games. Journal of Optimization Theory and Applications, 120 (3), 651-666.
Appendix A: SAS and R codes
• Univariate Analysis (SAS graphics)
goptions reset = (axis, legend, pattern, symbol, title, footnote)
         colors = (black blue green red yellow cyan gold)
         norotate
         hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;

title1 color=gold 'Frequency Distribution';
title3 underlin=1 color=red 'V191';
footnote color=green 'Dataset: ';

goptions device=WIN ctext=blue graphrc interpol=join;

pattern1  color=blue value=X1;
pattern2  color=blue value=X1;
pattern3  color=blue value=X1;
pattern4  color=blue value=X1;
pattern5  color=blue value=X1;
pattern6  color=blue value=X1;
pattern7  color=blue value=X1;
pattern8  color=blue value=X1;
pattern9  color=blue value=X1;
pattern10 color=blue value=X1;

axis1 color=blue width=2.0;
axis2 color=blue width=2.0;
axis3 color=blue width=2.0;

/* horizontal bar chart of V191 (dataset name left blank as in the original) */
proc gchart data= ;
   hbar V191 / discrete;
run;
• Dummy variables MACRO (SAS)
options nosymbolgen mlogic mprint obs=999999999;
libname ;                /* library reference omitted in the original */
filename out ;           /* flat-file path omitted in the original */
data ;
   set ;
run;

/* MACRO PARAMETERS:
   dsn    = input dataset name,
   var    = variable to be categorized,
   prefix = categorical variable prefix,
   flat   = flatfile name with code (referenced in the filename statement) */
%macro dmycode (dsn =, var =, prefix =, flat =);

   /* one observation per distinct value of &var */
   proc summary data = &dsn nway;
      class &var;
      output out = x (keep = &var);
   run;
   proc print; run;

   /* store the number of distinct values (&tot) and each value (&z1, &z2, ...) */
   data _null_;
      set x nobs = totx end = last;
      if last then call symput('tot', trim(left(put(totx, best.))));
      call symput('z' || trim(left(put(_n_, best.))), trim(left(&var)));
   run;

   /* write the dummy-coding statements to the flat file */
   data _null_;
      file &flat;
      %do i = 1 %to &tot;
         put "&prefix&&z&i = 0;";
      %end;
      put "SELECT;";
      %do i = 1 %to &tot;
         put "   when (&var = &&z&i) &prefix&&z&i = 1;";
      %end;
      put "   otherwise V_oth = 1;";
      put "end;";
   run;

%mend dmycode;

%dmycode (dsn = , var = , prefix = , flat = out);
run; quit;
• Relative Importance (R-code)
> library(relaimpo)                                       # provides calc.relimp()
> linmod  <- lm(response_variable ~ ., data = )           # dataset name left blank as in the original
> metrics <- calc.relimp(linmod, type = c("lmg", "first", "last"), rela = TRUE)
> metrics
• Canonical Correlation (SAS-Code)
proc cancorr corr data=
   vprefix = problems wprefix = satisfaction
   vname = 'Problem Areas' wname = 'Satisfaction Areas';
   var  Vp Ve Vw Vb Vo Vi Vel Ven Vcl Vbr Vsw Vs Vex Vot;
   with v191 v192 v193 v194 v195 v196 v197 v198 v199 v200 v201 v202 v203 v204 v205 v206 v207
        v208 v209 v210 v211 v212 v213 v214 v215 v216 v217 v218 v219 v220 v221 v222 v223 v224;
run;
• Hierarchical Logistic Regression (SAS Code)
data ;
   set ;
   /* dichotomize overall satisfaction: ratings of 5 or higher are coded 1 */
   if V >= 5 then V_S = 1;
   else V_S = 0;
   keep V_S make_model number_problems recommendation;   /* make_model: valid SAS name for "make-model" */
run;

proc glimmix;
   class make_model recommendation;
   model V_S = number_problems recommendation / dist=binary link=logit ddfm=bw solution;
   random intercept / subject=make_model;
run;
Appendix B: Outputs
I. Relative Importance (R-Output)
• Attributes related to the vehicle
Country A: Year 2006
Response variable: V191
Total response variance: 2.058392
Analysis based on 34510 observations
20 Predictors:
V200 V195 V220 V194 V196 V198 V201 V212 V216 V213 V214 V192 V206 V207 V209 V197
V215 V218 V224 V210
Proportion of variance explained by model: 56.21%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V200 0.17533809 6.670676e-01 0.07014639
V195 0.06650988 3.989017e-02 0.06035864
V220 0.06934197 9.345311e-02 0.05804864
V194 0.04902994 5.570296e-03 0.05454818
V196 0.04983194 1.916273e-07 0.05512604
V198 0.03417306 2.064780e-02 0.04590987
V201 0.05018064 3.947526e-05 0.05408146
V212 0.05235531 3.462571e-02 0.05210374
V216 0.02740625 1.781458e-02 0.03937961
V213 0.03447326 4.347548e-04 0.04692400
V214 0.03868629 8.282556e-03 0.05142443
V192 0.03835219 2.292815e-02 0.04229709
V206 0.03989079 3.311699e-03 0.04849407
V207 0.03749254 7.339540e-04 0.04698004
V209 0.04373201 1.269670e-03 0.05122101
V197 0.04524226 4.550039e-02 0.04726832
V215 0.04168734 1.427227e-02 0.04995258
V218 0.04235835 1.150227e-02 0.04707904
V224 0.03296703 5.192290e-03 0.03948656
V210 0.03095085 7.463055e-03 0.03917028
Country A, Year 2007
Response variable: V243
Total response variance: 2.0479
Analysis based on 35070 observations
20 Predictors:
V252 V247 V272 V376 V250 V246 V266 V270 V253 V259 V267 V261 V244 V248 V374 V268
V265 V258 V271 V249
Proportion of variance explained by model: 54.55%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V252 0.17120676 7.033375e-01 0.06804549
V247 0.05811261 1.483454e-02 0.05774994
V272 0.06526155 8.042678e-02 0.05630593
V376 0.06432643 5.451424e-02 0.05757349
V250 0.03708160 4.096109e-03 0.04747884
V246 0.04817986 1.375960e-02 0.05293146
V266 0.04093524 4.557363e-03 0.05166054
V270 0.04006884 9.694720e-03 0.04584990
V253 0.04433569 6.528559e-03 0.05098552
V259 0.03736594 6.950527e-05 0.04676655
V267 0.04689966 2.915300e-02 0.05087537
V261 0.04523375 4.699467e-03 0.05100752
V244 0.03593635 1.401735e-02 0.04100905
V248 0.04580069 4.881238e-05 0.05297541
V374 0.03868168 9.356847e-03 0.04424975
V268 0.02889764 2.119193e-02 0.04031436
V265 0.03526031 1.385710e-04 0.04622123
V258 0.03962943 2.872651e-03 0.04817641
V271 0.03347058 1.434143e-04 0.04254681
V249 0.04331541 2.655901e-02 0.04727644
Country A, Year 2008
Response variable: V185
Total response variance: 2.086333
Analysis based on 35992 observations
20 Predictors:
V358 V350 V189 V193 V188 V186 V201 V348 V191 V204 V345 V198 V199 V190 V200 V192
V346 V205 V356 V357
Proportion of variance explained by model: 57.11%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V358 0.05708817 0.0162859062 0.05756880
V350 0.06149193 0.0589116256 0.05669807
V189 0.04926621 0.0172649684 0.05199343
V193 0.12683183 0.5019156978 0.06398757
V188 0.04958275 0.0149977892 0.05311521
V186 0.03757862 0.0192935222 0.04208401
V201 0.04270511 0.0001110518 0.05140442
V348 0.04354485 0.0117737912 0.04826778
V191 0.03907649 0.0189771626 0.04858283
V204 0.04521186 0.0225305517 0.04850728
V345 0.03894794 0.0087256304 0.05034929
V198 0.03793851 0.0002560687 0.04798146
V199 0.03570495 0.0019793658 0.04624136
V190 0.04422061 0.0148505132 0.04666335
V200 0.04966849 0.0629385786 0.04740075
V192 0.08183276 0.1805968122 0.05385416
V346 0.04203345 0.0131359458 0.04957674
V205 0.03353438 0.0030582268 0.04488937
V356 0.03615390 0.0016817661 0.04343725
V357 0.04758717 0.0307150258 0.04739688
Country A, Year 2009
Response variable: V166
Total response variance: 1.970236
Analysis based on 34290 observations
20 Predictors:
V174 V200 V172 V169 V192 V170 V185 V182 V198 V167 V190 V186 V179 V187 V188 V173
V180 V171 V191 V189
Proportion of variance explained by model: 56.19%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V174 0.11186845 0.4531807766 0.06079830
V200 0.06440816 0.0440922785 0.05898719
V172 0.04063230 0.0287191872 0.04937835
V169 0.05385323 0.0289676112 0.05400725
V192 0.06519498 0.1002344739 0.05615337
V170 0.05198625 0.0154245596 0.05291007
V185 0.04671821 0.0252078885 0.04884621
V182 0.04407208 0.0022455794 0.05098877
V198 0.03791848 0.0050820263 0.04381818
V167 0.04060120 0.0268104287 0.04310929
V190 0.04395921 0.0131949586 0.04841819
V186 0.03648265 0.0007514274 0.04630650
V179 0.04096882 0.0042731491 0.04813841
V187 0.04169902 0.0062132525 0.05152353
V188 0.04539000 0.0283392387 0.05035166
V173 0.07699340 0.2045259857 0.05167943
V180 0.03835864 0.0001828287 0.04660620
V171 0.04258536 0.0111044666 0.04600678
V191 0.03720759 0.0006127727 0.04367013
V189 0.03910197 0.0008371102 0.04830217
Country A, Year 2010
Response variable: V136
Total response variance: 2.300986
Analysis based on 26077 observations
20 Predictors:
V169 V144 V142 V140 V162 V156 V158 V157 V139 V160 V161 V155 V150 V153 V149 V152
V167 V141 V170 V137
Proportion of variance explained by model: 57.96%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V169 0.05692879 8.767741e-03 0.05894160
V144 0.11910833 1.788261e-01 0.06802633
V142 0.03739872 4.597620e-03 0.04892917
V140 0.04414864 2.612953e-03 0.05170097
V162 0.05708717 8.176049e-02 0.05466183
V156 0.03357805 9.584655e-04 0.04639863
V158 0.03882760 9.703937e-03 0.04888020
V157 0.03744338 4.698080e-03 0.05086771
V139 0.04802484 2.619986e-02 0.05315768
V160 0.03708772 3.509546e-03 0.04665635
V161 0.03373433 5.328507e-08 0.04430867
V155 0.04617533 2.388766e-02 0.05029013
V150 0.03521464 5.273087e-04 0.04678968
V153 0.03205610 1.749369e-02 0.04033196
V149 0.03629620 1.551674e-03 0.04755669
V152 0.04018143 1.951319e-04 0.05043326
V167 0.03391699 3.368393e-03 0.04288100
V141 0.04512617 5.022643e-02 0.04709774
V170 0.15284015 5.597033e-01 0.06050640
V137 0.03482542 2.141156e-02 0.04158400
II. Attributes related to the vehicle (among those that did not experience any problems)
Country A, Year 2006
Response variable: V191
Total response variance: 1.200345
Analysis based on 16594 observations
20 Predictors:
V200 V194 V195 V209 V220 V192 V214 V201 V215 V196 V198 V207 V218 V217 V206 V213
V197 V219 V221 V212
Proportion of variance explained by model: 54.84%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V200 0.06362960 1.701838e-01 0.05289687
V194 0.06927767 9.967971e-02 0.05819301
V195 0.05574437 4.577733e-03 0.05618415
V209 0.04626672 2.446540e-03 0.05102899
V220 0.06605162 1.471059e-01 0.05597795
V192 0.06706004 2.282329e-01 0.05029801
V214 0.04927452 2.204473e-03 0.05422015
V201 0.04648800 1.070166e-02 0.04940747
V215 0.05187558 5.265314e-02 0.05175723
V196 0.05026635 9.860334e-04 0.05284699
V198 0.03906054 3.515196e-02 0.04659809
V207 0.04196933 2.526534e-03 0.04782239
V218 0.04775618 2.814527e-02 0.04838327
V217 0.04179348 7.071445e-03 0.04639525
V206 0.04400858 1.192303e-02 0.04819595
V213 0.04299308 5.352590e-04 0.04854995
V197 0.05877350 1.771389e-01 0.04963180
V219 0.03961751 2.378401e-05 0.04453943
V221 0.04233533 1.815867e-02 0.04425729
V212 0.03575800 5.532474e-04 0.04281576
Country A, Year 2007
Response variable: V243
Total response variance: 1.201164
Analysis based on 18070 observations
20 Predictors:
V252 V246 V376 V247 V266 V261 V244 V272 V250 V253 V267 V248 V259 V270 V269 V377
V258 V265 V249 V271
Proportion of variance explained by model: 52.59%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V252 0.05623229 1.017877e-01 0.05136808
V246 0.06754630 9.680398e-02 0.05734186
V376 0.04549564 3.427819e-03 0.05280071
V247 0.05250400 6.948057e-03 0.05504152
V266 0.05007357 2.359134e-03 0.05422168
V261 0.04605065 8.670221e-03 0.05048469
V244 0.07256916 2.958595e-01 0.05076800
V272 0.06769435 1.747223e-01 0.05517171
V250 0.04038925 1.117182e-02 0.04778182
V253 0.04505041 1.067390e-02 0.04923535
V267 0.05655140 8.625700e-02 0.05258238
V248 0.04694570 3.807004e-04 0.05179084
V259 0.04075267 8.682217e-06 0.04770874
V270 0.05111744 4.356310e-02 0.04881679
V269 0.04337561 8.680529e-03 0.04756474
V377 0.03034298 1.094301e-02 0.03464764
V258 0.04359521 1.220453e-02 0.04818256
V265 0.04543294 3.480866e-03 0.04936963
V249 0.05400056 1.152797e-01 0.04890685
V271 0.04427986 6.777423e-03 0.04621440
Country A, Year 2008
Response variable: V185
Total response variance: 1.207184
Analysis based on 17654 observations
20 Predictors:
V193 V188 V201 V350 V186 V345 V191 V358 V199 V348 V346 V347 V198 V189 V359 V205
V200 V190 V356 V351
Proportion of variance explained by model: 55.29%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V193 0.05021427 5.073150e-02 0.04978496
V188 0.06607087 7.592560e-02 0.05703007
V201 0.04797203 3.820524e-06 0.05333964
V350 0.07511630 2.209559e-01 0.05823800
V186 0.06505198 2.113912e-01 0.05023616
V345 0.04678278 7.661378e-03 0.05329954
V191 0.04722728 3.099617e-03 0.05135730
V358 0.04508103 2.198063e-03 0.05198768
V199 0.04111215 8.108458e-03 0.04879660
V348 0.05773926 7.494597e-02 0.05237052
V346 0.04988962 3.962038e-02 0.05182858
V347 0.04825039 2.344735e-02 0.05052732
V198 0.04751064 2.433252e-02 0.05071958
V189 0.05281364 5.274360e-03 0.05250057
V359 0.03319606 2.819822e-02 0.03645728
V205 0.04223649 9.318714e-05 0.04824157
V200 0.04841213 7.716896e-02 0.04753308
V190 0.05375072 1.160624e-01 0.04698353
V356 0.04173557 2.981868e-02 0.04456383
V351 0.03983678 9.624043e-04 0.04420420
Country A, Year 2009
Response variable: V166
Total response variance: 1.262197
Analysis based on 17592 observations
20 Predictors:
V174 V169 V182 V172 V187 V192 V167 V200 V190 V189 V180 V188 V170 V179 V201 V181
V186 V198 V171 V193
Proportion of variance explained by model: 55.26%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V174 0.05514770 9.096436e-02 0.05112949
V169 0.07062230 1.016150e-01 0.05823730
V182 0.04335020 4.510340e-03 0.05073319
V172 0.04419138 3.126511e-04 0.05033357
V187 0.05015063 2.018111e-03 0.05492181
V192 0.06242987 9.690852e-02 0.05513183
V167 0.07195362 2.595020e-01 0.05178702
V200 0.04413617 1.741172e-03 0.05163339
V190 0.05606256 7.166346e-02 0.05157028
V189 0.04625483 4.974454e-03 0.05093626
V180 0.03882214 1.082792e-02 0.04750050
V188 0.05439075 4.933711e-02 0.05345940
V170 0.04979640 8.583846e-06 0.05174134
V179 0.04258554 1.375565e-02 0.04836944
V201 0.03065876 1.004690e-02 0.03526965
V181 0.05354090 1.463955e-01 0.04778978
V186 0.04612255 5.872432e-03 0.05001568
V198 0.04400253 3.034489e-02 0.04604635
V171 0.05202964 8.247920e-02 0.04760875
V193 0.04375152 1.672166e-02 0.04578497
Country A, Year 2010
Response variable: V136
Total response variance: 1.309809
Analysis based on 13350 observations
20 Predictors:
V144 V139 V152 V142 V157 V137 V162 V169 V150 V170 V160 V159 V158 V140 V149 V156
V161 V141 V167 V151
Proportion of variance explained by model: 51.52%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V144 0.05416185 6.985963e-02 0.05137624
V139 0.06496484 8.374789e-02 0.05659155
V152 0.04555003 1.590906e-03 0.05152652
V142 0.04555793 1.901789e-04 0.05086216
V157 0.05113744 2.641150e-05 0.05497880
V137 0.06057700 1.698714e-01 0.04935664
V162 0.06599232 1.777644e-01 0.05482148
V169 0.04667376 1.989268e-04 0.05260723
V150 0.04184400 3.455916e-05 0.04888765
V170 0.03797399 4.087843e-02 0.03916625
V160 0.05081902 2.807938e-02 0.05047922
V159 0.04542082 5.158531e-03 0.05004528
V158 0.05198424 4.155107e-02 0.05201739
V140 0.04954933 7.062043e-04 0.05147886
V149 0.04176165 9.967343e-04 0.04833916
V156 0.04480985 3.378128e-03 0.04907881
V161 0.04492645 6.361530e-03 0.04752775
V141 0.05773304 1.556097e-01 0.04871006
V167 0.04530835 4.845730e-02 0.04568283
V151 0.05325409 1.655388e-01 0.04646612
III. Attributes related to the dealer
Country A, Year 2006
Response variable: V90
Total response variance: 3.757734
Analysis based on 17669 observations
13 Predictors:
V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102 V103
Proportion of variance explained by model: 83.44%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V91 0.18521298 6.847516e-01 0.11288476
V92 0.04820797 9.116609e-04 0.05945564
V93 0.07233644 2.862797e-05 0.08168633
V94 0.09890643 6.431012e-02 0.09279135
V95 0.06497206 3.968239e-06 0.07616506
V96 0.05587773 2.360987e-05 0.06826234
V97 0.06290219 7.867090e-03 0.06919138
V98 0.08530652 2.806959e-02 0.08517191
V99 0.05568341 5.348390e-04 0.06452460
V100 0.04834171 3.576874e-03 0.05786154
V101 0.06958985 3.141434e-03 0.08005059
V102 0.05898171 5.458162e-04 0.07266655
V103 0.09368100 2.062347e-01 0.07928795
Country A, Year 2007
Response variable: V93
Total response variance: 3.600228
Analysis based on 18132 observations
13 Predictors:
V94 V95 V96 V97 V98 V99 V100 V101 V102 V103 V104 V105 V106
Proportion of variance explained by model: 84.2%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V94 0.19100970 7.264426e-01 0.11368961
V95 0.05121555 5.483709e-04 0.06250500
V96 0.07442355 1.696865e-03 0.08262941
V97 0.09609172 3.716799e-02 0.09258794
V98 0.06514394 5.963570e-04 0.07646783
V99 0.05719067 7.870795e-06 0.06944981
V100 0.06124171 6.756278e-03 0.06833179
V101 0.08803514 3.947776e-02 0.08559081
V102 0.05405284 2.101083e-03 0.06243708
V103 0.04723924 4.862534e-04 0.05785069
V104 0.06921337 3.359284e-03 0.07973824
V105 0.05623089 3.475600e-03 0.07081507
V106 0.08891168 1.778837e-01 0.07790672
Country A, Year 2008
Response variable: V51
Total response variance: 3.639024
Analysis based on 22321 observations
13 Predictors:
V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64
Proportion of variance explained by model: 84.63%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V52 0.18764711 7.271273e-01 0.11268902
V53 0.04987515 6.951197e-04 0.06112621
V54 0.07520875 1.230900e-03 0.08297386
V55 0.09886036 5.301004e-02 0.09296528
V56 0.06538550 7.392421e-04 0.07628896
V57 0.05618275 6.938766e-05 0.06820079
V58 0.06276593 7.147703e-03 0.06906188
V59 0.08546641 3.079432e-02 0.08493816
V60 0.05425806 1.389492e-03 0.06313038
V61 0.04611721 1.128925e-03 0.05628250
V62 0.07113660 4.435687e-03 0.08107075
V63 0.05964542 3.203869e-03 0.07349821
V64 0.08745076 1.690280e-01 0.07777400
Country A, Year 2009
Response variable: V129
Total response variance: 3.53333
Analysis based on 21084 observations
13 Predictors:
V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142
Proportion of variance explained by model: 84.65%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V130 0.18835054 7.301989e-01 0.11221974
V131 0.05185940 9.211642e-04 0.06318838
V132 0.07364990 3.366056e-04 0.08213865
V133 0.09666288 4.502798e-02 0.09244620
V134 0.06565629 9.256717e-04 0.07634026
V135 0.05621675 9.789573e-05 0.06868110
V136 0.06140672 3.467800e-03 0.06779338
V137 0.08446151 2.312392e-02 0.08471682
V138 0.05484663 6.909861e-05 0.06376546
V139 0.04697694 1.766338e-03 0.05676957
V140 0.07036984 5.329674e-03 0.08013106
V141 0.05738450 5.201283e-03 0.07168433
V142 0.09215811 1.835337e-01 0.08012505
Country A, Year 2010
Response variable: V99
Total response variance: 3.549067
Analysis based on 21352 observations
13 Predictors:
V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110 V111 V112
Proportion of variance explained by model: 85.27%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
V100 0.18775470 0.7362742584 0.11205994
V101 0.05209857 0.0004553937 0.06360624
V102 0.07491296 0.0004533549 0.08297144
V103 0.10011112 0.0511773923 0.09361036
V104 0.06765425 0.0024312043 0.07772847
V105 0.05504420 0.0020952891 0.06784081
V106 0.06010152 0.0032824960 0.06856897
V107 0.08353759 0.0240797591 0.08380773
V108 0.05177882 0.0009654425 0.06056569
V109 0.04627701 0.0001588560 0.05682245
V110 0.07205349 0.0061650701 0.08096007
V111 0.06121120 0.0005103106 0.07423479
V112 0.08746456 0.1719511732 0.07722304
IV. Problem Areas
Country A, Year 2006
Response variable: V191
Total response variance: 2.140151
Analysis based on 39307 observations
14 Predictors:
Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.41%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
Vp 0.039842387 0.02977609 0.048137509
Ve 0.042242747 0.03239782 0.050406735
Vw 0.030952020 0.02632669 0.034680935
Vb 0.121423250 0.11532126 0.125186465
Vo 0.062205607 0.05878978 0.064523369
Vi 0.100405042 0.09893048 0.101098128
Vel 0.108495021 0.11726448 0.101685549
Ven 0.212139073 0.25024403 0.183571241
Vc 0.120215922 0.12446019 0.116230932
Vbr 0.045555738 0.03933297 0.050383691
Vsw 0.060655100 0.05583672 0.064556730
Vs 0.033740768 0.02881239 0.037625045
Vex 0.013582749 0.01195273 0.014853031
Vot 0.008544577 0.01055439 0.007060639
Country A, Year 2007
Response variable: V243
Total response variance: 2.117815
Analysis based on 39485 observations
14 Predictors:
Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.12%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
Vp 0.042708995 0.034500288 0.049491698
Ve 0.035370297 0.026804428 0.042385400
Vw 0.051543550 0.049246900 0.053372287
Vb 0.123632357 0.111864701 0.131278734
Vo 0.061172457 0.057526995 0.063762537
Vi 0.074496974 0.071446048 0.076935143
Vel 0.089960019 0.097802332 0.084192155
Ven 0.212809172 0.250392368 0.184893020
Vc 0.137871419 0.141512990 0.134150987
Vbr 0.048614873 0.044353022 0.051915504
Vsw 0.056500635 0.054858109 0.057818221
Vs 0.046087759 0.040707305 0.050150517
Vex 0.010442400 0.008255915 0.012225820
Vot 0.008789093 0.010728598 0.007427978
Country A, Year 2008
Response variable: V185
Total response variance: 2.150441
Analysis based on 40678 observations
14 Predictors:
Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.39%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
Vp 0.045876248 0.037517477 0.052662217
Ve 0.033029765 0.023748463 0.040566347
Vw 0.052034340 0.047325520 0.055719729
Vb 0.110277994 0.097681766 0.118646910
Vo 0.059870420 0.056588090 0.062211016
Vi 0.087610414 0.083404228 0.090635488
Vel 0.093153875 0.100558400 0.087406287
Ven 0.252488469 0.302818865 0.215539736
Vc 0.103348859 0.105917495 0.100859107
Vbr 0.048519557 0.044443819 0.051469277
Vsw 0.039320631 0.034126298 0.043349360
Vs 0.042098418 0.035666986 0.046912783
Vex 0.025816516 0.022517877 0.028298956
Vot 0.006554494 0.007684718 0.005722788
Country A, Year 2009
Response variable: V166
Total response variance: 2.051197
Analysis based on 38628 observations
14 Predictors:
Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 12.77%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
Vp 0.047259639 0.036880618 0.055496184
Ve 0.048929548 0.044519069 0.052543048
Vw 0.027595428 0.024629018 0.030038289
Vb 0.122505803 0.112297562 0.128963244
Vo 0.049666289 0.044956433 0.053164563
Vi 0.079500754 0.074262146 0.083292872
Vel 0.088606572 0.096729137 0.082742135
Ven 0.214316999 0.250222913 0.187771898
Vc 0.151047967 0.165427418 0.139916901
Vbr 0.050078129 0.041951225 0.056202797
Vsw 0.047450551 0.044491043 0.049717297
Vs 0.055625147 0.049164918 0.060388137
Vex 0.015697143 0.012480133 0.018231952
Vot 0.001720031 0.001988368 0.001530682
Country A, Year 2010
Response variable: V136
Total response variance: 2.393666
Analysis based on 30264 observations
14 Predictors:
Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.41%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg
last
first
Vp 0.03932781 0.02999155 0.047216946
Ve 0.03633730 0.02897662 0.042543982
Vw 0.02825591 0.02583377 0.030373011
Vb 0.08935714 0.07966546 0.096309440
Vo 0.05878993 0.05458021 0.061575447
Vi 0.06564755 0.05853186 0.071027124
Vel 0.08702988 0.08889453 0.085019095
Ven 0.29289464 0.35551348 0.245493547
Vc 0.13617786 0.13824893 0.133451330
Vbr 0.04994728 0.04271740 0.055541753
Vsw 0.04123526 0.03442902 0.046479289
Vs 0.04932866 0.03731569 0.058440264
Vex 0.01491473 0.01038544 0.018715628
Vot 0.01075603 0.01491603 0.007813144
V. Trend Analysis
• Trend Analysis (comparison)
The above trend line is fitted to attribute V7 and includes only the respondents who did not
experience any problems. The fit is quite poor; however, when the whole dataset was considered,
there was no pattern at all. The figure below, on the other hand, shows a weak pattern when all
respondents were included in the analysis, but no trend pattern at all among those who did not
experience any problems.
The trend line fitted to the relative importance of attribute V3 (including all respondents)
displayed a good fit, while it did not show any pattern at all when only respondents who did not
report any problems were considered.
To emphasise the differences between the two datasets, V15 provides a clear example of different
movements in relative importance over the five years (figures below).