Master Thesis in Statistics, Data Analysis and Knowledge Discovery

Customer Satisfaction Analysis

Laura Funa

Abstract

The objective of this master thesis is to identify “key-drivers” embedded in customer satisfaction data. The data was collected by a large transportation sector corporation over five years and in four different countries. The questionnaire comprised several sections of questions, ranging from demographic information to satisfaction attributes concerning the vehicle and the dealer, and several problem areas. Various regression, correlation and cooperative game theory approaches were used to identify the key satisfiers and dissatisfiers. The theoretical and practical advantages of using the Shapley value, Canonical Correlation Analysis and Hierarchical Logistic Regression have been demonstrated and applied to market research.

Acknowledgements

This work would not have been completed without the support of many individuals. I would like to thank everyone who has helped me along the way. Particularly:
Prof. Anders Nordgaard and Malte Isacsson for providing guidance, encouragement and support over the course of my master’s research.
Prof. Anders Grimvall for serving on my thesis committee and for valuable suggestions.
Volvo Car Corporation for providing the data.
Lastly, to everyone else without whose support none of this would have been possible.

Table of contents
1 Introduction
1.1 Background
1.2 Objective
2 Data
2.1 Raw data
2.2 Secondary data
2.3 Assessment of data quality
2.3.1 Univariate Analysis of the Satisfaction Attributes
2.3.2 Univariate Analysis of Problem Areas
3 Methods
3.1 Kano Modeling
3.2 Shapley Value Regression
3.2.1 Assessing Importance in a Regression Model
3.2.2 Potential, Value and Consistency
3.2.3 Shapley-based R2 Decomposition
3.2.4 Choosing “key-drivers”
3.3 Trend Analysis
3.3.1 The time consistent Shapley value
3.4 Hierarchical Logistic Regression Modeling
3.4.1 Ordinary logistic regression model
3.4.2 Hierarchical logistic regression
3.5 Canonical Correlation Analysis
3.5.1 Formulation
3.5.2 Issues and practical usage
4 Computations and Results
4.1 Shapley Value
4.1.1 Ranked Satisfiers (related to the satisfaction with the dealer)
4.1.2 Ranked Satisfiers (related to the satisfaction with the vehicle)
4.1.3 Ranked Dissatisfiers
4.1.4 “Key attributes” identification
4.2 Time Series and Trend Analysis
4.3 Hierarchical Logistic Regression: SAS Modeling
4.4 Canonical Correlation Analysis
5 Discussion and conclusions
5.1 Proposed further research
5.1.1 Kernel Canonical Correlation Analysis
5.1.2 Moving Coalition Analysis
6 Literature and sources
Appendix A: SAS and R codes
Appendix B: Outputs

Index of tables
Table 1: Datasets Summary
Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)
Table 3: Problem areas occurrence (Country A, Year 2006)
Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively
Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively
Table 6: Dealer Satisfiers, Country A, Year 2010
Table 7: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively
Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively
Table 9: Vehicle Satisfiers, Country A, Year 2010
Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents with no problems)
Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents with no problems)
Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)
Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively
Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively
Table 15: Dissatisfiers, Country A, Year 2010
Table 16: Ven problem area sub-categories, Country A, Year 2006
Table 17: Building the GLIMMIX procedure
Table 18: Country A, Year 2006
Table 19
Table 20: Solution for fixed effects
Table 21
Table 22
Table 23

Index of figures
Figure 1: Frequency distribution of variable V191 in Country A, Year 2006
Figure 2: Proportions of problem areas
Figure 3: Kano Model Attributes
Figure 4: Two-level hierarchical regression
Figure 5: Satisfaction Attribute V90, Country A, Year 2006
Figure 6: Noise-Reach table, Country A, Year 2006
Figure 7: Time Series Analysis, Country A
Figure 8: Trend in V1, Country A
Figure 9: Time Series Analysis, Country A (respondents with no problems)
Figure 10: Trend in V1, Country A (respondents with no problems)
Figure 11: Trend in V8, Country A, all respondents vs. only those with no problems
Figure 12: Trend in V10, Country A, all respondents vs. only those with no problems
Figure 13: Trend in V17, Country A, all respondents vs. only those with no problems
Figure 14: Time Series Analysis, problem areas, Country A

Index of equations
Equation 1: Regression Model
Equation 2: R-squared
Equation 3: Nash Equilibrium
Equation 4: Marginal contribution of a player in a game
Equation 5: Potential Function
Equation 6: Differences
Equation 7: Payoff
Equation 8: Shapley Value
Equation 9: Regression model
Equation 10: Variance
Equation 11: Relative contributions
Equation 12: Marginal Effect
Equation 13: Shapley Value R-squared decomposition
Equation 14: Fields R-squared decomposition
Equation 15: Success
Equation 16: Ordinary logistic regression model
Equation 17: Random effects
Equation 18: Fixed effects
Equation 19: CCA parameter

1 Introduction

Predictive analytics relies on statistical techniques drawn from fields such as data mining, modeling and game theory. The main reason for using them is to extract information from large and complex datasets and to use it to forecast future trends. In business, predictive models search for patterns and hidden relationships in historical or transactional data to guide decision making and to identify risks and opportunities. Several data mining techniques have been developed over the years and have shown a positive impact in a wide range of business fields. The best-known applications are in finance (e.g. credit scoring), marketing and fraud detection. Each field in turn offers an enormous number of possibilities for exploiting the analysis of large datasets profitably.

In marketing, the main topics covered by predictive analytics include CRM (Customer Relationship Management), cross-selling, customer retention and direct marketing. Achieving these goals generally depends on conducting an appropriate customer analysis. A large portion of the data required by customer analytics is usually acquired by conducting customer satisfaction surveys.
Customer satisfaction is a well-known term in marketing; it indicates how well the products or services provided by a supplier meet customer expectations. In a highly competitive market, companies may use such information to differentiate or improve their products and/or services in order to increase their market share and their customers’ loyalty. Such data is among the most frequently collected indicators of market perceptions.

The purpose of this master thesis is to develop an appropriate customer satisfaction analysis procedure that will provide an indicator of customer behavior in a highly competitive automotive market. Thus, the aim of the analysis is to find hidden relationships, patterns and trends in the datasets provided.

The master thesis is divided into five chapters. It starts by stating the objectives and the motivation of the problem addressed, followed by a description of the data (data sources, raw data and secondary data) and an assessment of its quality. The third part presents the methods used and the model building, while the last part focuses on the results and their discussion. The research is concluded with a critical assessment of the results obtained and the adequacy of the methods used.

1.1 Background

The research is based on a customer satisfaction survey of new car owners (i.e. owners of cars that have been in service for three months). The survey was conducted in several different markets and covers different areas of customer characteristics, customer satisfaction and related issues. As cars are consumer products, automotive businesses are driven by customer satisfaction. Hence, improvements in consumer insight and information gain from customer data are constantly sought. It is essential to mention that satisfaction is a very abstract concept and that the actual state of satisfaction varies between individuals and between products or services.
It depends on several psychological and physical variables. Additional options or alternative products and services that are available to customers in a particular industry can also be seen as a source of variability in the satisfaction level. The most valuable satisfaction behaviors to investigate are loyalty and recommendation rate.

The main purpose of customer satisfaction analysis is often to understand the impact of explanatory variables on the overall dependent variable. This means that a list of priority items that can be improved needs to be established, since an improvement in any of these will have a positive impact on overall satisfaction or on customer loyalty and retention (Tang & Weiner, 2005). When choosing an appropriate statistical technique it is necessary to have a clear view of whether the purpose of the analysis is solely exploratory or predictive. Some of the most common customer satisfaction techniques include ordinary least squares, Shapley value regression, penalty & reward analysis, Kruskal’s relative importance, partial least squares and logistic regression.

Since customer satisfaction studies are usually tracking studies, the results can be monitored over time and allow for trend detection. Moreover, one of the challenges when choosing the methodology and building the model is to ensure that the results are consistent when tracking the market over time (Tang & Weiner, 2005).

1.2 Objective

The main objective of the master thesis is to find appropriate statistical techniques that are well suited to customer satisfaction analysis. Furthermore, they should provide tools for identifying “key drivers”, patterns, relationships among several sets of (dependent and independent) variables and measures of relative importance.
More specific objectives of the thesis are:
• Finding an exact measure of the contributions of the explanatory variables to the dependent variable and identifying the greatest satisfiers and dissatisfiers influencing customer satisfaction with the dealer and with the vehicle.
• Exploring the nature of the satisfaction attributes and evaluating whether it is possible to establish a “clean” measure of experienced problems and consequently classify them into those that are “fixable” and those that cannot be repaired but are a matter of customers’ personal preferences (i.e. the “annoying concept”).
• Examining the relationships between two sets of variables (i.e. satisfaction related problems and satisfaction attributes).

The thesis aims to be consistent with the most commonly used customer satisfaction analysis techniques and the available literature. What can and cannot be modeled and predicted needs to be clearly stated at all points.

2 Data

The data used in this research was collected by conducting a customer satisfaction survey among new car owners (i.e. customers who had purchased a new car within the previous three months). The questionnaire was divided into twelve sections, ranging from personal, demographic questions to questions directly connected to satisfaction with the new car, previous cars and views on the automotive industry. Data on customer satisfaction is often taken as a key performance indicator within business and is often incorporated in balanced scorecards.

An important basic requirement for effective research on customer satisfaction is building an appropriate questionnaire that provides reliable and representative measures. The general guideline is to build questions on whether the product or service has met or exceeded expectations. Expectations, and consequently customer perceptions, are therefore the key factor behind satisfaction. Questions are based on individual-level perceptions but are usually reported at the aggregate level.
According to Batra and Athola (Batra & Athola, 1990), customers purchase products and services based on two types of benefits: hedonic and utilitarian. The first is connected to experiential attributes and the latter to the functional attributes of the product. The survey used in this research involved the most common measures of customer satisfaction: sets of statements using the Likert technique and scales (Likert, 1932).

2.1 Raw data

The data provided was based on the survey conducted in four different countries (A, B, C and D) over five years, 2006 to 2010, except for country C, where the survey was conducted every second year (i.e. 2001, 2003, 2005, 2007, 2009).

Table 1: Datasets Summary

| Country/Year | Number of Variables | Recorded responses |
|---|---|---|
| A 2006 | 394 | 41474 |
| A 2007 | 411 | 41657 |
| A 2008 | 387 | 42783 |
| A 2009 | 382 | 40531 |
| A 2010 | 346 | 39879 |
| B 2006 | 390 | 46690 |
| B 2007 | 400 | 46148 |
| B 2008 | 387 | 49918 |
| B 2009 | 382 | 48833 |
| B 2010 | 385 | 46987 |
| C 2001 | 301 | 10830 |
| C 2003 | 362 | 9738 |
| C 2005 | 371 | 10912 |
| C 2007 | 398 | 10592 |
| C 2009 | 365 | 12341 |
| D 2006 | 403 | 13667 |
| D 2007 | 410 | 16509 |
| D 2008 | 385 | 18968 |
| D 2009 | 382 | 20875 |
| D 2010 | 426 | 23664 |
| TOTAL | / | 592996 |

In total there were 592996 responses and 229218251 data-points. The number of variables in each survey ranged from 301 to 426. The results from the development of the methodology are based on a survey¹ in country A that included 41474 responses and 394 variables in the year 2006. In the trend analysis the datasets included all five years. The survey in question yielded 31351 valid responses, representing 76% of all customers who participated. The variables used in the core part of the analysis were 34 satisfaction attributes and 14 problem areas, where each problem area in general consists of 20 sub-categories. The satisfaction attributes were evaluated on a 1 to 10 scale, where 1 represented the worst and 10 the best possible outcome. Problem areas, on the other hand, allowed for several nominal values.
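The totals quoted for Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the figures are transcribed from Table 1, the name `DATASETS` is hypothetical, and it assumes that a "data-point" means one variable recorded for one response.

```python
# (number of variables, recorded responses) per dataset, transcribed from Table 1.
DATASETS = {
    ("A", 2006): (394, 41474), ("A", 2007): (411, 41657), ("A", 2008): (387, 42783),
    ("A", 2009): (382, 40531), ("A", 2010): (346, 39879),
    ("B", 2006): (390, 46690), ("B", 2007): (400, 46148), ("B", 2008): (387, 49918),
    ("B", 2009): (382, 48833), ("B", 2010): (385, 46987),
    ("C", 2001): (301, 10830), ("C", 2003): (362, 9738), ("C", 2005): (371, 10912),
    ("C", 2007): (398, 10592), ("C", 2009): (365, 12341),
    ("D", 2006): (403, 13667), ("D", 2007): (410, 16509), ("D", 2008): (385, 18968),
    ("D", 2009): (382, 20875), ("D", 2010): (426, 23664),
}

# Total number of recorded responses over all 20 datasets.
total_responses = sum(responses for _, responses in DATASETS.values())

# Assumption: one data-point = one variable for one response.
total_data_points = sum(variables * responses for variables, responses in DATASETS.values())

# Share of valid responses in the Country A, 2006 survey (31351 valid out of 41474).
valid_share_percent = 100 * 31351 / DATASETS[("A", 2006)][1]

print(total_responses, total_data_points, round(valid_share_percent))
```

Under that assumption the computed totals agree with the figures stated in the text.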
¹ The methodology and models developed in this research were then re-applied to the remaining datasets and the results can be found in Appendix C.

2.2 Secondary data

Since the survey comprised several questions that allowed more than one answer (e.g. problem areas), the first step of the analysis was to transform these into binary form, using dummy variables. However, various variables in the research posed a bigger challenge and required further investigation to decide whether they should be treated as ordinal or interval.

Variables ranked on a “never, occasionally, sometimes, always” scale present a problem regarding the relative placement of the two middle categories; Knapp (Knapp, 1990) argues that this produces a less-than-ordinal scale. The controversy arises from key terms such as “appropriateness” and “meaningfulness”. Conservative views (Siegel, 1956) hold that once the ordinal level has been adopted, inferences are restricted to population medians and non-parametric procedures must be used, hence the power of the statistics is lower. Labovitz (Labovits, 1967, pp. 151-160), on the other hand, argues that there are no true restrictions on using parametric procedures for ordinal scales, since the assumptions behind the validity of the t and F distributions do not involve the type of scale, which consequently provides statistics of higher power.

The number of categories building the scale is important too. The remaining variables varied in scale level; the two types of scale occurring were a 1 to 4 scale and a 1 to 10 scale, where the latter comes closer to a continuous measurement than the former. Moreover, there have been several studies (Hausknecht, 1990) on measurement scales in customer satisfaction analysis, which attempt to establish the validity of treating an ordinal scale with several categories as interval.
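The transformation of multiple-answer questions into dummy variables described at the start of this section can be sketched as follows. This is a minimal illustration rather than the code used in the thesis: the function `to_dummies` and the three example respondents are hypothetical, and only a subset of the problem-area codes is used.

```python
# A subset of the problem-area codes used in the survey (illustrative selection).
PROBLEM_AREAS = ["Vp", "Ve", "Vw", "Vb", "Vo", "Vi", "Vel"]


def to_dummies(selected, categories):
    """Turn one multiple-answer response into one 0/1 indicator per category."""
    chosen = set(selected)
    unknown = chosen - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {category: int(category in chosen) for category in categories}


# Hypothetical respondents: two problems, no problems, one problem.
responses = [["Vel", "Vb"], [], ["Vp"]]
rows = [to_dummies(answer, PROBLEM_AREAS) for answer in responses]

for row in rows:
    print(row)
```

Each respondent then contributes one binary column per problem area, so a respondent who reported no problems is represented by a row of zeros rather than by a missing value.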
2.3 Assessment of data quality

The quality and nature of the data provided were first assessed with a univariate approach, identifying the distributions, response rates and percentages of missing values. As a last step of this pre-analysis, the most common issues when dealing with customer satisfaction data were pointed out.

2.3.1 Univariate Analysis of the Satisfaction Attributes

Figure 1: Frequency distribution of variable V191 in Country A, Year 2006

Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)

| Attribute | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total (responses) | Missing values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V191 | 0,41% | 0,27% | 0,57% | 1,17% | 1,42% | 5,20% | 8,71% | 28,68% | 29,09% | 24,48% | 94,78% | 5,22% |
| V14 | 0,06% | 0,04% | 0,12% | 0,31% | 1,01% | 3,61% | 8,89% | 24,65% | 28,46% | 32,85% | 97,59% | 2,41% |
| V193 | 0,24% | 0,14% | 0,22% | 0,62% | 1,11% | 3,51% | 8,90% | 25,37% | 31,62% | 28,27% | 97,53% | 2,47% |
| V7 | 0,07% | 0,07% | 0,20% | 0,63% | 1,47% | 5,13% | 12,11% | 27,32% | 28,01% | 24,99% | 97,59% | 2,41% |
| V3 | 0,18% | 0,16% | 0,32% | 0,86% | 1,72% | 4,95% | 11,94% | 26,21% | 27,42% | 26,24% | 97,54% | 2,46% |
| V6 | 0,15% | 0,17% | 0,31% | 0,98% | 2,18% | 6,51% | 14,35% | 27,12% | 25,39% | 22,83% | 97,36% | 2,64% |
| V8 | 0,11% | 0,15% | 0,29% | 1,00% | 2,25% | 7,33% | 14,31% | 27,57% | 26,08% | 20,90% | 96,96% | 3,04% |
| V17 | 0,09% | 0,05% | 0,06% | 0,19% | 0,82% | 2,77% | 8,37% | 24,16% | 29,59% | 33,91% | 96,14% | 3,86% |
| V23 | 0,27% | 0,18% | 0,33% | 1,06% | 2,68% | 8,44% | 14,38% | 25,95% | 22,60% | 24,11% | 96,99% | 3,01% |
| V1 | 0,48% | 0,27% | 0,41% | 0,75% | 1,00% | 2,64% | 6,75% | 19,63% | 28,05% | 40,03% | 94,15% | 5,85% |
| V12 | 0,10% | 0,06% | 0,14% | 0,57% | 1,38% | 4,49% | 11,25% | 27,18% | 27,95% | 26,88% | 97,35% | 2,65% |
| V15 | 0,09% | 0,07% | 0,16% | 0,49% | 1,06% | 3,65% | 9,54% | 25,44% | 29,00% | 30,50% | 97,52% | 2,48% |
| V208 | 0,27% | 0,24% | 0,47% | 1,41% | 2,16% | 5,95% | 11,64% | 24,79% | 25,96% | 27,11% | 94,37% | 5,63% |
| V9 | 0,13% | 0,08% | 0,16% | 0,49% | 0,92% | 3,08% | 8,98% | 25,16% | 29,17% | 31,83% | 97,46% | 2,54% |
| V19 | 0,15% | 0,19% | 0,34% | 1,14% | 1,81% | 5,31% | 11,12% | 25,76% | 26,68% | 27,50% | 97,41% | 2,59% |
| V211 | 0,52% | 0,35% | 0,64% | 1,61% | 5,63% | 10,72% | 16,59% | 24,61% | 19,90% | 19,44% | 90,23% | 9,77% |
| V4 | 0,28% | 0,19% | 0,24% | 0,70% | 2,56% | 5,81% | 12,44% | 24,94% | 24,13% | 28,71% | 94,66% | 5,34% |
| V16 | 0,13% | 0,14% | 0,28% | 0,92% | 2,06% | 5,86% | 13,22% | 27,81% | 25,38% | 24,19% | 97,29% | 2,71% |
| V13 | 0,07% | 0,07% | 0,11% | 0,45% | 1,15% | 3,94% | 11,09% | 27,74% | 27,96% | 27,43% | 97,26% | 2,74% |
| V11 | 0,12% | 0,10% | 0,22% | 0,78% | 1,72% | 5,22% | 13,18% | 28,08% | 26,36% | 24,22% | 96,84% | 3,16% |
| V20 | 0,13% | 0,11% | 0,22% | 0,64% | 1,75% | 5,34% | 12,42% | 27,76% | 26,63% | 25,02% | 94,84% | 5,16% |
| V26 | 0,10% | 0,09% | 0,16% | 0,77% | 1,47% | 5,09% | 13,01% | 27,95% | 25,78% | 25,57% | 97,03% | 2,97% |
| V10 | 0,19% | 0,16% | 0,31% | 0,87% | 1,37% | 4,41% | 10,06% | 25,13% | 27,97% | 29,52% | 97,69% | 2,31% |
| V22 | 0,18% | 0,15% | 0,28% | 1,11% | 1,86% | 5,67% | 12,32% | 27,12% | 26,06% | 25,24% | 96,91% | 3,09% |
| V2 | 0,14% | 0,09% | 0,25% | 0,73% | 1,45% | 4,47% | 11,42% | 27,57% | 27,79% | 26,09% | 97,50% | 2,50% |
| V221 | 0,06% | 0,09% | 0,15% | 0,54% | 1,51% | 4,45% | 10,95% | 25,89% | 27,23% | 29,15% | 97,50% | 2,50% |
| V222 | 0,26% | 0,27% | 0,56% | 1,80% | 3,21% | 8,24% | 12,70% | 23,91% | 23,09% | 25,97% | 97,43% | 2,57% |
| V223 | 0,19% | 0,18% | 0,41% | 1,19% | 2,68% | 6,92% | 13,33% | 26,06% | 24,06% | 24,97% | 97,16% | 2,84% |
| V18 | 0,09% | 0,13% | 0,23% | 0,73% | 1,78% | 5,37% | 12,67% | 26,89% | 26,05% | 26,05% | 97,52% | 2,48% |

Taking the most favorable rating scores into consideration, meaning attribute scores of at least “very satisfied” (i.e. 7), the above table illustrates that, depending on the attribute, 77,5% to 96% of the customers were at least “very satisfied”. The lowest satisfaction score was associated with V202, at 77,5%; however, it still represents a majority attitude. The lowest response rate was associated with the attribute V211, with a missing rate of 9,8%.

2.3.2 Univariate Analysis of Problem Areas

A total of 17950 problems appeared, meaning that 43,3% of all respondents experienced at least one problem. The tables below show the frequencies of the individual problem areas.
The most common problems appear in the “Vel” category, with 10,9% occurrence. The least common are “Vs” problems.

Table 3: Problem areas occurrence (Country A, Year 2006)

| Problem Area | Number of experienced problems | % in the total population |
|---|---|---|
| Vp | 2081 | 5,20% |
| Ve | 1759 | 4,24% |
| Vw | 738 | 1,78% |
| Vb | 3401 | 8,20% |
| Vo | 3155 | 7,61% |
| Vi | 3635 | 8,76% |
| Vel | 4500 | 10,85% |
| Ven | 2452 | 5,91% |
| Vcl | 1568 | 3,78% |
| Vbr | 1480 | 3,57% |
| Vsw | 1216 | 2,93% |
| Vs | 502 | 1,21% |
| Vex | 364 | 0,88% |
| Vot | 428 | 1,03% |

Figure 2 represents the proportion of each problem area.

Figure 2: Proportions of problem areas

A very common challenge when dealing with customer satisfaction data is how to overcome the problem of multicollinearity. It can be controlled and avoided by a well-designed questionnaire; in most cases, however, this is difficult to achieve. The attributes measured in the survey were in general highly correlated with each other. An example arises when evaluating the dealer where the car was purchased: the dealer’s ability to solve problems is highly correlated with the dealer’s friendliness.

Another issue relates to the tracking nature of customer satisfaction data. It is challenging to ensure that the results obtained reflect real changes in the market and not just a small number of respondents ticking a different satisfaction level (e.g. 8 instead of 9).

It is important to note that 84,5% of the customers who had experienced at least one problem are still at least “very satisfied”, which is a clear majority. Combined with the fact that the survey offers only very scarce information on problem areas, this imbalance may lead to several restrictions when analyzing the latter.
Since the objectives of the thesis involve a deep analysis of the experienced problems, more appropriate measures should be provided by further expanding and developing the “things go wrong” section of the questionnaire.

3 Methods

The main methods used in the research are:
• Kano Modeling: providing a deeper understanding of the customer satisfaction data and of what can be achieved using the available data.
• Shapley Value: overcoming the problem of multicollinearity, providing better regression results and allowing for trend analysis.
• Hierarchical logistic regression: exploring different, hierarchically ranked layers of the data.
• Canonical correlation: analyzing relationships between two different sets of variables.

3.1 Kano Modeling

The theory was developed by Professor Noriaki Kano (Mikulic & Prebežac, 2011, pp. 4466) and concerns product development and customer satisfaction. It classifies product attributes into five categories based on customer perceptions: enhancers (attractive), one-dimensional, must-be, indifferent and reverse. The theory states that the relationship between the performance of a product attribute and the satisfaction level is not necessarily linear. Certain attributes can be asymmetrically related to satisfaction levels. These relationships are presented visually in Figure 3.

Figure 3: Kano Model Attributes

An attractive attribute provides satisfaction when it is fully implemented; its non-fulfillment, however, does not cause dissatisfaction. Must-be attributes, on the other hand, result in dissatisfaction if not fulfilled, but their fulfillment does not increase satisfaction. One-dimensional attributes increase satisfaction when implemented and cause dissatisfaction when not fulfilled. Indifferent attributes do not affect consumer satisfaction in any way, while reverse attributes result in customer dissatisfaction when fulfilled and satisfaction when not fulfilled (e.g. implementing technology that is difficult to understand and complicated to use or maneuver may cause dissatisfaction).

There are several advantages to integrating Kano modeling: the classification of attributes can be used to optimize and improve the products, discover the attractors and develop product differentiation. Moreover, attribute classification provides valuable help in prioritizing requirements and identifying attributes that need attention. Kano modeling may also introduce an important measure for separating experienced problems with the product into those that can be fixed and those that are a matter of personal preference. The nature of the attractive and must-be attributes would allow attributable and relative risk techniques to be applied. Attributable risk measures the reduction in dissatisfaction that would be observed if the consumers did not experience a particular problem, compared to the actual pattern. Relative risk is the ratio of the probability of dissatisfaction among the consumers who experienced a particular problem to the probability of dissatisfaction among the consumers who did not experience the problem. However, the data used in this research did not include any additional indicator on problem attributes, hence classifying them is problematic.

3.2 Shapley Value Regression

Regression models offer a convenient way of summarizing data and serve two very different goals in data analysis. One is prediction; the other is inference about the relationship between the predictor variables and the outcome variable. Yet regression models do not prove that such relationships exist; they simply summarize the likely effects if the models are as hypothesized (Lipovetsky & Conklin, 2001).
3.2.1 Assessing Importance in a Regression Model

Consider a simple model:

$$ Y \approx f(X, \beta) $$
Equation 1: Regression Model

where all of the predictor variables x are uncorrelated with each other. In this case the standardized regression coefficients (called Beta coefficients, β) are taken as measures of importance. They measure the expected change in Y (i.e. the dependent variable) when x changes by one standard deviation. A negative β (for one particular predictor) can present a potential complication. However, since the magnitude of β is its absolute value and the sign represents the direction of the effect, importance can be represented by either squaring β or simply taking its absolute value.

The sum of the squared standardized coefficients is then equal to the overall R² of the model, where R² (the coefficient of multiple determination) is a measure of the overall quality of the fit of the model (Lipovetsky & Conklin, 2001). Hence, each individual squared coefficient can be interpreted as the share of the variance explained by that individual variable.

$$ R^2 = \frac{\sum_i (f_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{SS_{reg}}{SS_{tot}} $$
Equation 2: R-squared

Nevertheless, the situation described above almost never occurs in real data. Consequently, assessing standardized regression coefficients in this way does not give a good indication of the importance of each individual variable. The greater the correlation between the predictor variables, the less meaningful the estimated coefficients are (e.g. for two variables with a correlation of 1, an infinite number of coefficient combinations would make exactly the same contribution). As a solution to this, I propose a technique used in game theory: the Shapley value.

3.2.2 Potential, Value and Consistency

The Shapley value, a solution concept in cooperative game theory, was introduced by Lloyd Shapley in 1953 (Shapley, 1953).
It assigns a unique distribution of the total surplus generated by the coalition of all players, and it produces a unique solution satisfying the general requirements of the Nash equilibrium (i.e. choosing an optimal strategy under uncertainty) (Kuhn & Tucker, 1959). There is always exactly one such allocation procedure.

3.2.2.1 Nash Equilibrium

Nash Equilibrium is a solution strategy in game theory (named after John Forbes Nash, who introduced it). It involves a game of two or more players, where each player is assumed to be aware of the equilibrium strategies $x_{-i}^*$ of the other players and makes the best decision they can, taking into consideration the decisions of the remaining players. Moreover, none of the players can gain anything by changing their decision if the decisions of the others remain unchanged. The set of strategies chosen under such circumstances and its payoff then constitute the Nash Equilibrium.

$$ \forall i,\; x_i \in S_i,\; x_i \neq x_i^* :\quad f_i(x_i^*, x_{-i}^*) \geq f_i(x_i, x_{-i}^*) $$
Equation 3: Nash Equilibrium

Where;
• (S, f) is a game of n players and $S_i$ is the strategy set of player i
• $S = S_1 \times S_2 \times \dots \times S_n$ is the set of strategy combinations, and $f = (f_1(x), \dots, f_n(x))$ is the payoff function for $x \in S$
• $x_i$ is a strategy of player i, while $x_{-i}$ is a strategy combination for all players except player i

Thus, when each player i chooses strategy $x_i$, it follows that $x = (x_1, \dots, x_n)$, and the resulting payoff for player i equals $f_i(x)$. Once player i cannot improve their payoff by changing their strategy, the strategy has reached $x_i^*$. Consequently, the strategy combination $x^* \in S$ is the Nash Equilibrium.

3.2.2.2 Potential

Theorem 1: “There exists a unique real function on games – called the potential – such that the marginal contributions of all players (according to this function) are always efficient.
Moreover, the resulting payoff vector is precisely the Shapley value (Econometrica, 1989).”

$$ D^i P(N, v) = P(N, v) - P(N \setminus \{i\}, v) $$
Equation 4: Marginal contribution of a player in a game

Where;
• N is a finite set of players
• v is a characteristic function satisfying $v(\varnothing) = 0$
• $(N \setminus \{i\}, v)$ is a subgame
• $D^i P(N, v)$ is the marginal contribution of player i; together these contributions form the payoff vector

Thus, a function P(N, v) is called the potential function if it satisfies the following for all games:

$$ \sum_{i \in N} D^i P(N, v) = v(N) $$
Equation 5: Potential Function

Moreover, the satisfaction of the above condition determines the uniqueness of the potential function. According to Hart & Mas-Colell (Hart & Mas-Colell, 1989, pp. 589-614), it follows that the potential function is such that the allocation of marginal contributions always adds up exactly to the worth of the grand coalition. This is referred to as efficiency. Furthermore, $D^i P(N, v) = Sh_i(N, v)$, where $Sh_i$ denotes the Shapley value of player i in the game (N, v).

3.2.2.3 Preservation of differences

Preservation of differences looks at the payoff allocation problem from another view: what would player i gain if player j were not included, and what would player j gain if player i were not included in the model. Hart & Mas-Colell (Ibid.) show that one obtains a unique efficient outcome which simultaneously preserves all these differences.

$$ d_{ij} = x_i(N \setminus \{j\}) - x_j(N \setminus \{i\}) $$
Equation 6: Differences

Thus,

$$ x_i(N) - x_i(N \setminus \{j\}) = x_j(N) - x_j(N \setminus \{i\}) $$
Equation 7: Payoff

The above equality has been used by Myerson (Myerson, 1980), and it has been proven that any solution obtained from a potential function satisfies the condition. Hence, any such solution coincides with the Shapley value.

3.2.2.4 Consistency

An important characterization of the value is its internal consistency property.

Theorem 2: “Consider the class of solutions that, for two-person games, divide the surplus equally.
Then the Shapley value is the unique consistent solution in this class (Econometrica, 1989).”

In general, the consistency requirement as stated above may be described with:
• φ being a function that associates a payoff to every player in every game
• a reduced game, among any group of players in a game, defined as: giving the payoff according to φ to the rest of the players

It follows that φ is consistent if and only if, when applied to any reduced game, it yields the same payoffs as in the original game (Econometrica, 1989).

3.2.2.5 Value

In regression, the attributes are thought of as players and the total value of the game as the R². The Shapley value of a single attribute j is defined as:

$$ SV_j = \sum_k \sum_i \gamma_k \left[ v(M_{i|j}) - v(M_{i|j(-j)}) \right] $$
Equation 8: Shapley Value

Where;
• $v(M_{i|j})$ is the R² of a model i containing predictor j
• $v(M_{i|j(-j)})$ is the R² of the same model i without j
• $\gamma_k = \frac{k!\,(n-k-1)!}{n!}$ is a weight based on the total number of predictors (n) and the number of predictors in this model (k, not counting j)

3.2.3 Shapley-based R² Decomposition

The Shapley value offers a very robust estimate of the relative importance of predictor variables even when there are high levels of correlation and/or skewness in the data. The most common approach to R² decomposition in cases of multicollinearity is stepwise regression and its procedures. However, this method is arbitrary in nature and does not always lead to efficient conclusions. Moreover, the significance test does not always allow the ranking of the independent variables in order of importance (Israeli, 2007, pp. 199-212). An alternative approach has been proposed by Chantreuil and Trannoy (Chantreuil & Trannoy, 1999), who used the concept of the Shapley value. Shorrocks (Shorrocks, 1999) then argues that Shapley value based procedures can be applied in various situations, leading to different results.
While traditional decompositions, such as the Fields decomposition (Fields, 2003), can be applied to simple linear regression models and perform well in finding the effects of the explanatory variables, the new approach (i.e. the Shapley value based approach) may also be applied to more complicated regression models. These may include interactions, dummy variables and high multicollinearity between explanatory variables.

3.2.3.1 Decomposing R²

Consider a regression model:

$$ y = a + \sum_{j=1}^{J} b_j x_j + e $$
Equation 9: Regression model

where the total sum of squares (in essence the raw variance of y) can be decomposed into the model sum of squares ($SS_{reg}$) and the error sum of squares ($SS_{error}$):

$$ Var(y) = SS_{tot} = Var(\hat{y}) + Var(e) $$
Equation 10: Variance

The R² of the regression is then taken as previously stated:

$$ R^2 = \frac{SS_{reg}}{SS_{tot}} $$

Following the theorem of Mood, Graybill and Boes (Mood et al., 1974), the relative contributions may be stated as:

$$ Var(y) = \sum_{j=1}^{J} Cov(b_j x_j, y) + Cov(e, y) $$
Equation 11: Relative contributions

Omitting the residuals, it follows that:

$$ R^2(y) = \frac{\sum_{j=1}^{J} b_j\, Cov(x_j, y)}{Var(y)} = 1 - \frac{Cov(e, y)}{Var(y)} $$

Continuing from the above equation, the explanatory variables can be ranked according to their importance. However, this fails to account for probable correlation between the contribution of an individual explanatory variable and that of the remaining variables. The Shapley decomposition procedure, on the other hand, requires the contribution of a variable to be equal to its marginal effect. The marginal effect can be expressed as:

$$ M_k = R^2\Big( y = a + \sum_{j \in S} b_j x_j + b_k x_k + e \Big) - R^2\Big( y = a^* + \sum_{j \in S} b_j^* x_j + e^* \Big) $$
Equation 12: Marginal Effect

Where S is a subgroup of explanatory variables not including variable k.
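As a worked step linking the weight in Equation 8 to the marginal effect in Equation 12, the weights can be written out for the smallest non-trivial case. This is only a sketch under two assumptions: the model contains exactly two predictors (J = 2), and the model without predictors has R² = 0. The notation $C_k$ for the contribution of variable k and $M_k(S)$ for the marginal effect of k given subgroup S is used here for illustration.

```latex
% Contribution of variable k: weighted average of its marginal effects
% over all subgroups S that do not contain k (weights as in Equation 8).
C_k = \sum_{S \subseteq \{1,\dots,J\} \setminus \{k\}}
      \frac{|S|!\,(J-|S|-1)!}{J!}\; M_k(S)

% For J = 2 and k = 1 there are two subgroups: the empty set and {2}.
% Both weights equal 1/2:
\frac{0!\,1!}{2!} = \frac{1}{2}, \qquad \frac{1!\,0!}{2!} = \frac{1}{2}

% Hence, with R^2 of the empty model taken as 0:
C_1 = \tfrac{1}{2}\, M_1(\varnothing) + \tfrac{1}{2}\, M_1(\{2\})
    = \tfrac{1}{2}\bigl[ R^2(x_1) - 0 \bigr]
    + \tfrac{1}{2}\bigl[ R^2(x_1, x_2) - R^2(x_2) \bigr]
```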
Taking a simple example into consideration, where $y = a + b_1 x_1 + b_2 x_2 + e$, the difference between the two decompositions may be seen from the following:

• Shapley decomposition:

$$ C_1 = \tfrac{1}{2}\left[ R^2(a + b_1 x_1 + b_2 x_2 + e) - R^2(a^* + b_2^* x_2 + e^*) + R^2(a^{**} + b_1^{**} x_1 + e^{**}) \right] $$
$$ C_2 = \tfrac{1}{2}\left[ R^2(a + b_1 x_1 + b_2 x_2 + e) - R^2(a^{**} + b_1^{**} x_1 + e^{**}) + R^2(a^* + b_2^* x_2 + e^*) \right] $$
Equation 13: Shapley Value R-squared decomposition

• Fields decomposition:

$$ C_1 = \frac{(b_1 + b_1^{**})}{2}\,\frac{Cov(x_1, y)}{Var(y)} + \frac{(b_2 - b_2^{*})}{2}\,\frac{Cov(x_2, y)}{Var(y)} $$
$$ C_2 = \frac{(b_2 + b_2^{*})}{2}\,\frac{Cov(x_2, y)}{Var(y)} + \frac{(b_1 - b_1^{**})}{2}\,\frac{Cov(x_1, y)}{Var(y)} $$
Equation 14: Fields R-squared decomposition

Of special interest when comparing the two decompositions are models with high multicollinearity. This issue is particularly problematic for the Fields decomposition because it uses the estimated coefficients. Under multicollinearity the estimated variances of these coefficients are large, and consequently the estimated coefficients may deviate substantially from the population coefficients. Moreover, a small change in the model can result in a large change in the estimated coefficients. In contrast, the Shapley based decomposition uses the marginal contributions of a variable from all sequences. The value of the contribution will be high or low depending on whether the variable with which the variable in question is correlated is already included in the model. Consequently, two strongly correlated variables will have similar contributions. Israeli (Israeli, 2006, pp. 199-212) then argues that it is possible to treat similarly cases where non-linear effects of a variable are included in the regression model and models where interacting variables are introduced. For the Fields decomposition there is no indication of how the contribution should be divided in such cases, while this poses no problem for the Shapley decomposition.
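The two-predictor case above extends to any number of predictors by averaging marginal effects over all sub-models. The following Python sketch is illustrative only: the function name `shapley_r2` and the R² values in `r2_by_subset` are hypothetical and not taken from the thesis data. It assumes that the R² of every sub-model has already been computed and that the model without predictors has R² = 0.

```python
from itertools import combinations
from math import factorial


def shapley_r2(predictors, r2_by_subset):
    """Split the full-model R^2 among predictors using Shapley weights.

    r2_by_subset maps a frozenset of predictor names to the R^2 of the
    regression that uses exactly those predictors (empty set -> 0.0).
    """
    n = len(predictors)
    shares = {}
    for j in predictors:
        others = [p for p in predictors if p != j]
        total = 0.0
        for k in range(n):  # k = number of predictors already in the model
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_j = frozenset(subset)
                with_j = without_j | {j}
                # marginal effect of adding j to this sub-model
                total += weight * (r2_by_subset[with_j] - r2_by_subset[without_j])
        shares[j] = total
    return shares


# Hypothetical R^2 values for two strongly correlated predictors.
r2_by_subset = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.40,
    frozenset({"x2"}): 0.38,
    frozenset({"x1", "x2"}): 0.44,
}
shares = shapley_r2(["x1", "x2"], r2_by_subset)
print(shares)
```

With these hypothetical inputs the two shares are about 0.23 and 0.21. They are similar in size and add up to the full-model R² of 0.44, which illustrates the efficiency property. The number of sub-models doubles with every added predictor, so the enumeration quickly becomes expensive.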
3.2.4 Choosing “key-drivers”

Up to this point, a method that successfully measures the relative importance of attributes in the model has been established. The following analytical design is proposed to effectively identify the key dissatisfiers (i.e. attributes that need attention). The notations used include:
• P(D): probability of dissatisfaction
• P(F): probability of failure by any of the independent attributes
• P(D|F): conditional probability of dissatisfaction among failed
• P(D|F’): conditional probability of dissatisfaction among non-failed
• P(F|D): conditional probability of failure among those dissatisfied (reach value)
• P(F|D’): conditional probability of failure among those non-dissatisfied (noise value)

In general, values on the bottom levels (less than 5) of the ordinal satisfaction scale indicate dissatisfaction (D), and an identified problem corresponds to failure (F). The opposite events, non-dissatisfaction and non-failure, are denoted D’ and F’ respectively. To identify the attributes that need attention, it is necessary to find the maximum value of:

$$ Success = Reach - Noise = P(F \mid D) - P(F \mid D') $$
Equation 15: Success

This is a measure of the prevalence of failed respondents among those who are dissatisfied, in comparison with failed respondents among those who are non-dissatisfied. Consider a situation where all the attributes are ordered by their Shapley values in descending order and the corresponding reach and noise values are given. According to Conklin and Lipovetsky (Conklin & Lipovetsky, 2004), adding the second-ranked attribute to the model along with the first one will increase the reach function (i.e. failure on either of the two attributes covers a larger share of the dissatisfied customers). However, the noise function increases correspondingly (i.e. so does the share of non-dissatisfied customers with a failure). Adding more attributes results in the same pattern.
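The reach, noise and success quantities from Equation 15 can be sketched in Python as follows. This is a minimal illustration, not the procedure used to produce the results: the function name `success_by_step`, the respondent records and the attribute names are hypothetical. It assumes each respondent has a dissatisfaction flag and a set of failed attributes, and that the attributes are already ordered by decreasing Shapley value.

```python
def success_by_step(respondents, ranked_attributes):
    """Return (attribute, reach, noise, success) for each nested attribute set.

    respondents: list of (dissatisfied: bool, failed: set of attribute names).
    A respondent counts as failed (F) if any attribute in the current set failed.
    """
    dissatisfied = [failed for d, failed in respondents if d]
    others = [failed for d, failed in respondents if not d]
    rows = []
    current = set()
    for attribute in ranked_attributes:
        current.add(attribute)
        # reach = P(F|D), noise = P(F|D')
        reach = sum(1 for f in dissatisfied if f & current) / len(dissatisfied)
        noise = sum(1 for f in others if f & current) / len(others)
        rows.append((attribute, reach, noise, reach - noise))
    return rows


# Hypothetical data: (dissatisfied?, failed attributes)
respondents = [
    (True, {"A"}), (True, {"A", "B"}), (True, {"B"}), (True, {"B"}), (True, set()),
    (False, {"C"}), (False, {"B", "C"}), (False, {"C"}), (False, set()), (False, set()),
]
rows = success_by_step(respondents, ["A", "B", "C"])
# Keep attributes up to the step with the highest success value.
best = max(range(len(rows)), key=lambda i: rows[i][3])
key_dissatisfiers = [row[0] for row in rows[: best + 1]]
print(rows)
print(key_dissatisfiers)
```

In this made-up example success rises from 0.4 to 0.6 when attribute B is added and falls to 0.2 when C is added, so the sketch keeps A and B.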
In general, reach means ensuring that a large part of the total number of dissatisfied customers is taken into consideration (it therefore needs to be maximized), while a large noise value would mean focusing on problems that are not actual causes of dissatisfaction (Conklin & Lipovetsky, 2004). Once the added noise exceeds the added reach when the next attribute is included in the model, success begins to decrease. At that point the final set of key dissatisfiers is defined (Conklin & Lipovetsky, 2004).

3.3 Trend Analysis

Using the Shapley value as the measure of importance allows us to track the market over time. The differences between two waves can then be attributed to actual changes in the market.

3.3.1 The time consistent Shapley value

The Shapley value is one of the most commonly used sharing mechanisms in static cooperation games with transferable payoffs (Yeung, 2010, pp. 137-149). The time-consistency property of the Shapley value means that if one renegotiates the agreement at any intermediate instant of time, assuming that cooperation has prevailed from the initial date until that instant, then one would obtain the same outcome (Petrosjan & Zaccour, 2001, pp. 381-398). This property thus allows us to compare the marginal contribution of each satisfaction attribute over time.

3.4 Hierarchical Logistic Regression Modeling

A hierarchical logistic regression model is proposed to examine data with group structure and a binary response variable. The group structure is usually characterized by two levels: micro and macro. The structure is visually presented in Figure 4.

Figure 4: Two-level hierarchical regression

The same predictors are used in each context, but the micro coefficients are allowed to vary over contexts. At the first (micro) level, an ordinary logistic regression model is applied. At the second (macro) level, the micro coefficients are treated as functions of macro predictors.
A Bayes estimation procedure is used to estimate the micro and macro coefficients. The components of the model represent within- and between-macro variance. An algorithm for finding the maximum likelihood estimates of the covariance components is proposed. The make-models are viewed as macro observations and individual cars as micro observations. Dai, Li and Rocke (Dai et al., NN) propose the following procedure.

3.4.1 Ordinary logistic regression model

Let y be a binary outcome variable (i.e. the customer is satisfied or dissatisfied) that follows a Bernoulli distribution, y ~ Bin(1, π), and let x be a car level predictor. Then the model can be written as:

$$ y_{ij} = \pi_{ij} + e_{ij} $$
$$ \mathrm{logit}(\pi_{ij}) = \log\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \alpha + \beta x_{ij} $$
Equation 16: Ordinary logistic regression model

Where;
- i = 1 … $I_j$ is the car level indicator and
- j = 1 … J is the make-model level indicator
- $\pi_{ij}$ is the probability of dissatisfaction for car i among make-model j, conditional on x

The model assumes that the micro level random errors $e_{ij}$ are independent with moments $E(e_{ij}) = 0$ and $Var(e_{ij}) = \sigma_e^2 = \pi_{ij}(1 - \pi_{ij})$.

3.4.2 Hierarchical logistic regression

The ordinary model can be extended to account for effects of the second (macro) level by including design variables (dummy variables). Each second level unit (i.e. each make-model unit) has its own intercept in the model. These intercepts are used to measure the differences between make-models.

$$ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij} $$

where $\alpha_j$ is the make-model intercept and its effect can be either fixed or random (Domidenko, 2004). For simplicity it is possible to treat the effects as random and rewrite the model as follows:

$$ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}, \quad \text{where } \alpha_j = \alpha + u_j $$
Equation 17: Random effects

It is then possible to add second level predictors.
The above equation will therefore be extended to:

$$ \mathrm{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij} $$
$$ \alpha_j = \alpha + \gamma z_j + u_j $$
Equation 18: Fixed effects

where the added coefficient γ is a fixed effect and z is the second level predictor. Using the same predictors, the model can be extended further to investigate possible cross-level interaction. The algorithm can be applied using the SAS procedure PROC GLIMMIX.

3.5 Canonical Correlation Analysis

Canonical correlation analysis was introduced by Harold Hotelling (Johnson & Wichern, 2001) and is a way of exploring cross-covariance matrices. Consider two sets of variables $x_1, \dots, x_n$ and $y_1, \dots, y_m$ and assume there are correlations among these variables. Canonical correlation analysis then finds linear combinations of the x’s and y’s which have maximum correlation with each other.

3.5.1 Formulation

Given vectors;
• $X = (x_1, \dots, x_n)$ and,
• $Y = (y_1, \dots, y_m)$

Let;
• $\Sigma_{XX} = cov(X, X)$,
• $\Sigma_{YY} = cov(Y, Y)$ and,
• $\Sigma_{XY} = cov(X, Y)$

The parameter to maximize is;

$$ \rho = \frac{a' \Sigma_{XY} b}{\sqrt{a' \Sigma_{XX} a}\,\sqrt{b' \Sigma_{YY} b}} $$
Equation 19: CCA parameter

The canonical variables are then defined by;
• U = a’X
• V = b’Y

3.5.2 Issues and practical usage

The main benefit of canonical correlation analysis is that it differs from other (appropriate) multivariate techniques that impose very rigid restrictions. It is generally believed that those provide results of higher quality. However, for the purpose of this research and for this type of data, the fact that canonical correlation places the fewest restrictions makes it the most appropriate and powerful multivariate technique. It may be seen as a generalization of multiple linear regression. Variables included in the analysis should be on a ratio or interval scale. However, nominal or ordinal variables can be used after converting them to sets of dummy variables.
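The conversion of a nominal or ordinal variable into a set of dummy variables can be sketched as follows. The function name `to_dummies` and the example categories are hypothetical. The sketch assumes that one category is dropped as the reference level, so that the dummy columns are not perfectly collinear.

```python
def to_dummies(values, reference=None):
    """Convert a categorical variable into 0/1 dummy columns.

    One level (the reference) is dropped; by default the first in sorted order.
    Returns (column_names, rows), where each row is a list of 0/1 indicators.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = [level for level in levels if level != reference]
    rows = [[1 if value == column else 0 for column in columns] for value in values]
    return columns, rows


# Hypothetical answers on a three-level scale.
answers = ["low", "high", "medium", "low"]
columns, rows = to_dummies(answers, reference="low")
print(columns)  # ['high', 'medium']
print(rows)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

A respondent at the reference level is coded with zeros in every column, so each dummy coefficient is read relative to that level.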
Even though testing the significance of the canonical correlations requires the data to be multivariate normal, the technique performs well for descriptive purposes even if this requirement is not fulfilled. Hair (Hair et al., 1998) discusses the flexibility of canonical correlation and its advantages, particularly when the dependent and explanatory variables can be either metric or non-metric. Hence, the application is broadly consistent with existing literature.

4 Computations and Results

The very first step in conducting the analyses was to use the SAS statistical software to transform the variables that allowed more than one answer (e.g. problem areas) into binary form by adding dummy variables.[2]

4.1 Shapley Value

I used the R statistical language, more specifically the package relaimpo (Relative Importance for Linear Regression in R). This package implements six different metrics for assessing the relative importance of predictors in the linear model. Moreover, it offers exploratory bootstrap confidence intervals (Journal of Statistical Software, 2006). For the purpose of this research, three metrics are particularly useful: “lmg”, “first” and “last”, described in the following:

• “lmg”; these are the Shapley values. The metric is a decomposition of R² into non-negative contributions that automatically sum to the total R². It is recommended for calculating relative importance, since it uses both the direct effect and the effects adjusted for other predictors in the model.
• “first”; these are univariate R² values from regression models with one predictor only. They show what each predictor is able to explain individually. If predictors are correlated, the sum of all “first” values will often be well above the overall R² of the model.
• “last”; these show what each predictor is able to explain in addition to all other predictors. The values represent the increase in R² when the specific predictor is added to the model last. In case of correlation among the predictors, the “last” values will not add up to the overall R².

A potential drawback is the computational difficulty, hence a selection of attributes is necessary. Theil (Theil, 1987) suggests that an information measure may be introduced; thus an information coefficient was introduced as a pre-analysis step. The information coefficient is a measure for evaluating the quality and usefulness of attributes. Consequently, 20 vehicle related attributes were chosen in each dataset.

The following analysis is based on the R output[3] and includes the relative importance of 15 satisfaction attributes regarding the dealer where the vehicle was purchased, followed by 20 attributes regarding the vehicle, both ranging over 4 years.

[2] See Appendix A for SAS codes
[3] See Appendix B

4.1.1 Ranked Satisfiers (related to the satisfaction with the dealer)

Figure 5 illustrates the frequency distribution of the response variable.

Figure 5: Satisfaction Attribute V90, Country A, Year 2006

Tables 4 to 6 display the “lmg” metrics of the attributes regarding satisfaction with the dealer, ordered according to their relative importance.
Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively

2006                                 2007
Attribute  lmg         RI %          Attribute  lmg          RI %
V91        0,185213    18,52%        V91        0,1910097    19,10%
V94        0,098906    9,89%         V94        0,09609172   9,61%
V103       0,093681    9,37%         V103       0,08891168   8,89%
V98        0,085307    8,53%         V198       0,08803514   8,80%
V93        0,072336    7,23%         V93        0,07442355   7,44%
V101       0,06959     6,96%         V101       0,06921337   6,92%
V95        0,064972    6,50%         V95        0,06514394   6,51%
V97        0,062902    6,29%         V97        0,06124171   6,12%
V102       0,058982    5,90%         V96        0,05719067   5,72%
V96        0,055878    5,59%         V102       0,05623089   5,62%
V99        0,055683    5,57%         V99        0,05405284   5,41%
V100       0,048342    4,83%         V92        0,05121555   5,12%
V92        0,048208    4,82%         V100       0,04723924   4,72%

Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively

2008                                 2009
Attribute  lmg         RI %          Attribute  lmg          RI %
V91        0,1876471   18,76%        V91        0,18835054   18,84%
V94        0,0988604   9,89%         V94        0,09666288   9,67%
V103       0,0874508   8,75%         V103       0,09215811   9,22%
V98        0,0854664   8,55%         V98        0,08446151   8,45%
V93        0,0752088   7,52%         V93        0,0736499    7,36%
V101       0,0711366   7,11%         V101       0,07036984   7,04%
V95        0,0653855   6,54%         V95        0,06565629   6,57%
V97        0,0627659   6,28%         V97        0,06140672   6,14%
V102       0,0596454   5,96%         V102       0,0573845    5,74%
V96        0,0561828   5,62%         V96        0,05621675   5,62%
V99        0,0542581   5,43%         V99        0,05484663   5,48%
V92        0,0498752   4,99%         V92        0,0518594    5,19%
V100       0,0461172   4,61%         V100       0,04697694   4,70%

Table 6: Dealer Satisfiers, Country A, Year 2010

Attribute  lmg      RI %
V91        18,78%   18,78%
V94        10,01%   10,01%
V103       8,75%    8,75%
V98        8,35%    8,35%
V93        7,49%    7,49%
V101       7,21%    7,21%
V95        6,77%    6,77%
V102       6,12%    6,12%
V97        6,01%    6,01%
V96        5,50%    5,50%
V92        5,21%    5,21%
V99        5,18%    5,18%
V100       4,63%    4,63%

4.1.2 Ranked Satisfiers (related to the satisfaction with the vehicle)

Tables 7 to 9 show the satisfaction attributes regarding the vehicle, ordered according to their relative importance.
Table 7: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively

2006                                  2007
Attribute  lmg          RI %          Attribute  lmg         RI %
V1         0,17533809   17,53%        V1         0,1712068   17,12%
V2         0,06934197   6,93%         V2         0,0652616   6,53%
V3         0,06650988   6,65%         V21        0,0643264   6,43%
V4         0,05235531   5,24%         V3         0,0581126   5,81%
V5         0,05018064   5,02%         V7         0,0481799   4,82%
V6         0,04983194   4,98%         V11        0,0468997   4,69%
V7         0,04902994   4,90%         V6         0,0458007   4,58%
V8         0,04524226   4,52%         V9         0,0452338   4,52%
V9         0,04373201   4,37%         V5         0,0443357   4,43%
V10        0,04235835   4,24%         V8         0,0433154   4,33%
V11        0,04168734   4,17%         V13        0,0409352   4,09%
V12        0,03989079   3,99%         V10        0,0400688   4,01%
V13        0,03868629   3,87%         V12        0,0396294   3,96%
V14        0,03835219   3,84%         V25        0,0386817   3,87%
V15        0,03749254   3,75%         V15        0,0373659   3,74%
V16        0,03447326   3,45%         V17        0,0370816   3,71%
V17        0,03417306   3,42%         V14        0,0359364   3,59%
V18        0,03296703   3,30%         V16        0,0352603   3,53%
V19        0,03095085   3,10%         V22        0,0334706   3,35%
V20        0,02740625   2,74%         V20        0,0288976   2,89%

Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively

2008                                  2009
Attribute  lmg          RI %          Attribute  lmg          RI %
V1         0,12683183   12,68%        V1         0,11186845   11,19%
V23        0,08183276   8,18%         V23        0,0769934    7,70%
V2         0,06149193   6,15%         V25        0,06519498   6,52%
V21        0,05708817   5,71%         V21        0,06440816   6,44%
V15        0,04966849   4,97%         V7         0,05385323   5,39%
V7         0,04958275   4,96%         V3         0,05198625   5,20%
V3         0,04926621   4,93%         V4         0,04671821   4,67%
V24        0,04758717   4,76%         V11        0,04539      4,54%
V4         0,04521186   4,52%         V9         0,04407208   4,41%
V8         0,04422061   4,42%         V10        0,04395921   4,40%
V10        0,04354485   4,35%         V8         0,04258536   4,26%
V9         0,04270511   4,27%         V13        0,04169902   4,17%
V11        0,04203345   4,20%         V12        0,04096882   4,10%
V17        0,03907649   3,91%         V17        0,0406323    4,06%
V13        0,03894794   3,89%         V14        0,0406012    4,06%
V5         0,03793851   3,79%         V26        0,03910197   3,91%
V14        0,03757862   3,76%         V15        0,03835864   3,84%
V25        0,0361539    3,62%         V24        0,03791848   3,79%
V12        0,03570495   3,57%         V22        0,03720759   3,72%
V16        0,03353438   3,35%         V16        0,03648265   3,65%

Table 9: Vehicle Satisfiers, Country A, Year 2010

Attribute  lmg          RI %
V27        0,15284015   15,28%
V1         0,11910833   11,91%
V2         0,05708717   5,71%
V21        0,05692879   5,69%
V7         0,04802484   4,80%
V4         0,04617533   4,62%
V8         0,04512617   4,51%
V3         0,04414864   4,41%
V9         0,04018143   4,02%
V11        0,0388276    3,88%
V13        0,03744338   3,74%
V17        0,03739872   3,74%
V10        0,03708772   3,71%
V12        0,0362962    3,63%
V15        0,03521464   3,52%
V14        0,03482542   3,48%
V25        0,03391699   3,39%
V22        0,03373433   3,37%
V16        0,03357805   3,36%
V19        0,0320561    3,21%

4.1.2.1 Among customers that did not experience any problems

The follow-up analysis took a closer look at the customers who did not experience any problems and compared the obtained relative importances to those obtained in the previous section, where all the customers were included in the analysis. Tables 10 to 12 display the satisfaction attributes regarding the vehicle, ordered according to their relative importance.

Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents with no problems)

2006                                  2007
Attribute  lmg          RI %          Attribute  lmg          RI %
V7         0,06927767   6,93%         V14        0,07256916   7,26%
V14        0,06706004   6,71%         V2         0,06769435   6,77%
V2         0,06605162   6,61%         V7         0,0675463    6,75%
V1         0,0636296    6,36%         V11        0,0565514    5,66%
V8         0,0587735    5,88%         V1         0,05623229   5,62%
V3         0,05574437   5,57%         V8         0,05400056   5,40%
V11        0,05187558   5,19%         V3         0,052504     5,25%
V6         0,05026635   5,03%         V10        0,05111744   5,11%
V13        0,04927452   4,93%         V13        0,05007357   5,01%
V10        0,04775618   4,78%         V6         0,0469457    4,69%
V5         0,046488     4,65%         V9         0,04605065   4,61%
V9         0,04626672   4,63%         V25        0,04549564   4,55%
V12        0,04400858   4,40%         V29        0,04543294   4,54%
V16        0,04299308   4,30%         V5         0,04505041   4,51%
V28        0,04233533   4,23%         V22        0,04427986   4,43%
V15        0,04196933   4,20%         V12        0,04359521   4,36%
V26        0,04179348   4,18%         V26        0,04337561   4,34%
V22        0,03961751   3,96%         V15        0,04075267   4,08%
V17        0,03906054   3,91%         V17        0,04038925   4,04%
V4         0,035758     3,58%         V30        0,03034298   3,03%

Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents with no problems)

2008                                  2009
Attribute  lmg          RI %          Attribute  lmg          RI %
V2         0,0751163    7,51%         V14        0,07195362   7,20%
V7         0,06607087   6,61%         V17        0,0706223    7,06%
V14        0,06505198   6,51%         V2         0,06242987   6,24%
V10        0,05773926   5,77%         V10        0,05606256   5,61%
V8         0,05375072   5,38%         V1         0,0551477    5,51%
V3         0,05281364   5,28%         V11        0,05439075   5,44%
V1         0,05021427   5,02%         V31        0,0535409    5,35%
V11        0,04988962   4,99%         V8         0,05202964   5,20%
V31        0,04841213   4,84%         V13        0,05015063   5,02%
V26        0,04825039   4,83%         V3         0,0497964    4,98%
V9         0,04797203   4,80%         V26        0,04625483   4,63%
V12        0,04751064   4,75%         V16        0,04612255   4,61%
V17        0,04722728   4,72%         V17        0,04419138   4,42%
V13        0,04678278   4,68%         V21        0,04413617   4,41%
V21        0,04508103   4,51%         V25        0,04400253   4,40%
V16        0,04223649   4,22%         V28        0,04375152   4,38%
V25        0,04173557   4,17%         V9         0,0433502    4,34%
V15        0,04111215   4,11%         V12        0,04258554   4,26%
V28        0,03983678   3,98%         V15        0,03882214   3,88%
V27        0,03319606   3,32%         V27        0,03065876   3,07%

Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)

Attribute  lmg          RI %
V2         0,06599232   6,60%
V7         0,06496484   6,50%
V14        0,060577     6,06%
V8         0,05773304   5,77%
V1         0,05416185   5,42%
V31        0,05325409   5,33%
V11        0,05198424   5,20%
V13        0,05113744   5,11%
V10        0,05081902   5,08%
V3         0,04954933   4,95%
V21        0,04667376   4,67%
V170       0,04555793   4,56%
V9         0,04555003   4,56%
V26        0,04542082   4,54%
V25        0,04530835   4,53%
V22        0,04492645   4,49%
V16        0,04480985   4,48%
V15        0,041844     4,18%
V12        0,04176165   4,18%
V27        0,03797399   3,80%

From the above tables it is possible to notice that the contributions of the satisfaction attributes are very close in terms of importance. A number of new attributes, which were previously less important, entered the new model (e.g. V28, V29, V30 and V31). The importance of attribute V14 increased greatly, and it appears among the top three each year.

4.1.3 Ranked Dissatisfiers

In contrast to the previous section, this part focuses on the identification of the greatest dissatisfier. The Shapley value was calculated for all experienced problem areas, followed by an analysis of problems in each problem area (i.e. sub-categories). Tables 13 to 15 show the problem areas ranked according to their relative importance.
Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively

2006                                     2007
Problem area  lmg           RI %         Problem area  lmg           RI %
Ven           0,212139073   21,21%       Ven           0,212809172   21,28%
Vb            0,12142325    12,14%       Vc            0,137871419   13,79%
Vc            0,120215922   12,02%       Vb            0,123632357   12,36%
Vel           0,108495021   10,85%       Vel           0,089960019   9,00%
Vi            0,100405042   10,04%       Vi            0,074496974   7,45%
Vo            0,062205607   6,22%        Vo            0,061172457   6,12%
Vsw           0,0606551     6,07%        Vsw           0,056500635   5,65%
Vbr           0,045555738   4,56%        Vw            0,05154355    5,15%
Ve            0,042242747   4,22%        Vbr           0,048614873   4,86%
Vp            0,039842387   3,98%        Vs            0,046087759   4,61%
Vs            0,033740768   3,37%        Vp            0,042708995   4,27%
Vw            0,03095202    3,10%        Ve            0,035370297   3,54%
Vex           0,013582749   1,36%        Vex           0,0104424     1,04%
Vot           0,008544577   0,85%        Vot           0,008789093   0,88%

Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively

2008                                     2009
Problem area  lmg           RI %         Problem area  lmg           RI %
Ven           0,252488469   25,25%       Ven           0,214316999   21,43%
Vb            0,110277994   11,03%       Vc            0,151047967   15,10%
Vc            0,103348859   10,33%       Vb            0,122505803   12,25%
Vel           0,093153875   9,32%        Vel           0,088606572   8,86%
Vi            0,087610414   8,76%        Vi            0,079500754   7,95%
Vo            0,05987042    5,99%        Vs            0,055625147   5,56%
Vw            0,05203434    5,20%        Vbr           0,050078129   5,01%
Vbr           0,048519557   4,85%        Vo            0,049666289   4,97%
Vp            0,045876248   4,59%        Ve            0,048929548   4,89%
Vs            0,042098418   4,21%        Vsw           0,047450551   4,75%
Vsw           0,039320631   3,93%        Vp            0,047259639   4,73%
Ve            0,033029765   3,30%        Vw            0,027595428   2,76%
Vex           0,025816516   2,58%        Vex           0,015697143   1,57%
Vot           0,006554494   0,66%        Vot           0,001720031   0,17%

Table 15: Dissatisfiers, Country A, Year 2010

Problem area  lmg        RI %
Ven           0,292895   29,29%
Vc            0,136178   13,62%
Vb            0,089357   8,94%
Vel           0,08703    8,70%
Vi            0,065648   6,56%
Vo            0,05879    5,88%
Vbr           0,049947   4,99%
Vs            0,049329   4,93%
Vsw           0,041235   4,12%
Vp            0,039328   3,93%
Ve            0,036337   3,63%
Vw            0,028256   2,83%
Vex           0,014915   1,49%
Vot           0,010756   1,08%

The analysis was then applied to sub-categories in order to identify the absolute dissatisfier.
Table 16: Ven problem area sub-categories, Country A, Year 2006

Sub-category  lmg          RI %
Ve4           0,244029091  24,40%
Ve1           0,20440145   20,44%
Ve7           0,149881772  14,99%
Ve98          0,105140416  10,51%
Ve5           0,071517411  7,15%
Ve8           0,041940597  4,19%
Ve6           0,033103608  3,31%
Ve18          0,019165955  1,92%
Ve15          0,018551414  1,86%
Ve9           0,018305727  1,83%
Ve19          0,016829286  1,68%
Ve2           0,016335526  1,63%
Ve11          0,01555382   1,56%
Ve16          0,013381763  1,34%
Ve10          0,008700418  0,87%
Ve27          0,007703845  0,77%
Ve26          0,006485322  0,65%
Ve22          0,003608788  0,36%
Ve12          0,002974477  0,30%
Ve14          0,00104175   0,10%
Ve17          0,000853017  0,09%
Ve3           0,000494547  0,05%

4.1.4 “Key attributes” identification

Figure 6: Noise-Reach table, Country A, Year 2006

Figure 6 illustrates the “key attributes” identification. Problem areas are ranked by their Shapley values, and “reach” and “noise” are calculated according to equation 15. The cutting point is reached once the added noise exceeds the added reach. All problem areas with a corresponding success of less than 0 are considered unimportant.

4.2 Time Series and Trend Analysis

Time series analysis was applied to detect possible trends in the relative importance of those vehicle-related satisfaction attributes that appeared in the model in all five years. This is illustrated in Figure 7.

Figure 7: Time Series Analysis, Country A

According to Figure 7, the relative importance of satisfaction attribute V1 fluctuates most over time, while the remaining attributes are fairly stable. Figure 8 shows a fitted linear trend line, which illustrates the changes in satisfaction attribute V1 over the five consecutive years of the study. The R2 measures how well the trend line fits the data. Its value of 0,8153 confirms a fairly good fit of the line to the data.4

4 Trends fitted to the remaining variables are displayed in Appendix B.
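A linear trend line of the kind described above, together with its R2, can be obtained by ordinary least squares. The following Python sketch uses hypothetical yearly importance values (the actual V1 series is only shown graphically), so its output is illustrative and does not reproduce the reported 0,8153. The function name `linear_trend` is likewise an assumption made for this example.

```python
def linear_trend(years, values):
    """Least-squares line through (year, value) points; returns slope, intercept, R^2."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical relative importance of one attribute over the five survey years.
years = [2006, 2007, 2008, 2009, 2010]
importance = [0.050, 0.048, 0.052, 0.054, 0.055]

slope, intercept, r2 = linear_trend(years, importance)
print(f"slope per year = {slope:.4f}, R2 = {r2:.4f}")
```

With only five yearly observations, a single unusual year can change both the slope and the R2 noticeably, which is worth keeping in mind when reading the trend figures.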
Figure 8: Trend in V1, Country A

Since there were significant differences in the relative importance of the attributes between the analysis of all respondents and the analysis restricted to respondents who did not experience any problems, time series analysis was applied to the latter group as well. Figure 9 shows the changes in relative importance of the attributes that were included in all five years.5

Figure 9: Time Series Analysis, Country A (respondents with no problems)

5 The trend analysis graphs for the remaining attributes are in Appendix B.

Trend analysis was then applied to the same satisfaction attribute (i.e. V1). While in the previous case (where all respondents were included in the analysis) a linear trend provided a good fit, here a polynomial trend was the better option (with R2 = 0,8429).

Figure 10: Trend in V1, Country A (respondents with no problems)

Several attributes appeared in both analyses (i.e. when all respondents were included and when only those who had not experienced any problems were considered). However, there are differences to note when comparing the trends over the five years. This is illustrated in Figure 11.

Figure 11: Trend in V8, Country A, all respondents vs. only those with no problems

While satisfaction attribute V8 still follows a rather similar pattern in both groups, a very large difference can be seen for attribute V10 (Figure 12).

Figure 12: Trend in V10, Country A, all respondents vs. only those with no problems

Figure 13: Trend in V17, Country A, all respondents vs. only those with no problems

The trend patterns of attribute V17 show the expected similarities. Since the perception of this particular attribute is directly linked to whether a certain problem occurred (especially Ven, which also has the greatest contribution to the overall satisfaction), the slope is steeper when all respondents are included in the model.
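Because each trend is fitted to only five yearly values, a polynomial will always reach an R2 at least as high as a straight line fitted to the same data. One hedged way to take the extra parameters into account is the adjusted R2, sketched below in Python. The polynomial degree is not stated in the text, so degree 2 is an assumption made purely for illustration, and the two R2 values come from different respondent groups, so the results are not a direct comparison of the two fits.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_years = 5  # one relative importance value per survey year

# Linear trend in V1, all respondents (R2 = 0,8153, one slope parameter).
linear_adj = adjusted_r2(0.8153, n_years, 1)

# Polynomial trend in V1, respondents with no problems (R2 = 0,8429);
# two polynomial terms (degree 2) is an assumption, not taken from the thesis.
quadratic_adj = adjusted_r2(0.8429, n_years, 2)

print(f"adjusted R2, linear:    {linear_adj:.4f}")
print(f"adjusted R2, quadratic: {quadratic_adj:.4f}")
```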
As a last step of the time series analysis, the relative contribution of the problem areas to the overall dissatisfaction was inspected.

Figure 14: Time Series Analysis, problem areas, Country A

Figure 14 illustrates an increase in the relative importance of the problem area Ven for overall dissatisfaction, while the remaining problem areas show rather stable patterns or minor decreases.

4.3 Hierarchical Logistic Regression: SAS Modeling

Whether the results depend on the make-model of the car was investigated with hierarchical logistic modeling. Table 17 lists the variables chosen for each level of the regression. The corresponding SAS code can be found in Appendix A.

Table 17: Building the GLIMMIX procedure

Variable            Definition
Satisfaction        Dependent variable measured at the car level; within the j-th make-model
Number of problems  Car (micro) level variable; measuring the number of problems identified
Recommendation      Make-model (macro) level variable; indicating whether the customer would recommend the model in question

Table 18: Country A, Year 2006

Fit Statistics
-2 Res Log Pseudo-Likelihood  210468
Generalized Chi-Square        36400,5
Gener. Chi-Square / DF        0,99

Covariance Parameter Estimates
Cov Parameter  Subject  Estimate  Standard Error
Intercept      V4       0,02303   0,01175

Table 18 is part of the SAS output and displays the between make-model variance, which equals 0,02303.

Table 19: Type III Tests of Fixed Effects

Effect  Num DF  Den DF  F Value  Pr > F
V387    1       36469   158.92   <.0001
V227    3       508     431.09   <.0001

The P-values of the Type III tests are <.0001, indicating a statistically significant association between make-model and variables V387 and V227.
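The size of the between make-model variance in Table 18 is easier to judge when expressed as an intraclass correlation (ICC). A common latent-variable approximation for a random-intercept logistic model fixes the car-level residual variance at π²/3. The Python sketch below applies that approximation to the reported estimate; the formula is a standard convention brought in here as an assumption and is not part of the SAS output, and the function name `latent_icc` is illustrative.

```python
import math


def latent_icc(between_variance):
    """ICC of a random-intercept logit model, residual variance fixed at pi^2 / 3."""
    return between_variance / (between_variance + math.pi ** 2 / 3)


# Between make-model variance (random intercept) reported in Table 18.
icc = latent_icc(0.02303)
print(f"ICC = {icc:.4f}")
```

Under this approximation, less than one percent of the latent variation in satisfaction lies between make-models.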
Table 20: Solution for fixed effects

Solutions for fixed effects
Effect     Recommendation New Car  Estimate  Standard Error  DF     t Value  Pr > |t|
Intercept                          0.8552    0.1593          232    5.37     <.0001
V387                               0.2161    0.01714         36469  12.61    <.0001
V227       1.00                    -3.9507   0.1582          508    -24.97   <.0001
V227       2.00                    -3.5416   0.1592          508    -22.24   <.0001

The coefficient of V387 is 0,2161, and its standard error is 0,01714. The corresponding P-value is <.0001, which indicates statistical significance. The results indicate that V387 and V227 have a significant effect on overall satisfaction.

4.4 Canonical Correlation Analysis

Canonical correlation analysis (CCA) was applied to all satisfaction attributes and a priori specified problem areas (including only problems that are of “annoying concept”). Table 21 shows the canonical correlations, i.e. the strongest possible correlations between linear combinations of the two sets of variables. In addition, it provides information on how many of the canonical variates are significant (i.e. the first 15). In general, the number of canonical variates equals the number of variables in the smaller set; however, the number of significant canonical variates is usually smaller. The first F-test corresponds to the hypothesis that all canonical correlations are zero, the second to the hypothesis that all canonical correlations except the first are zero, and so on.

Table 21

     Canonical    Adjusted Canonical  Approximate     Squared Canonical
     Correlation  Correlation         Standard Error  Correlation
1    0.367972     0.364525            0.004763        0.135404
2    0.193745     .                   0.005302        0.037537
3    0.185235     .                   0.005320        0.034312
4    0.153670     .                   0.005379        0.023614
5    0.102792     .                   0.005451        0.010566
6    0.096696     .                   0.005457        0.009350
7    0.084292     .                   0.005470        0.007105
8    0.078367     .                   0.005475        0.006141
9    0.077105     .                   0.005476        0.005945
10   0.070409     .                   0.005482        0.004957
11   0.067683     .                   0.005484        0.004581
12   0.065785     .                   0.005485        0.004328
13   0.063492     .                   0.005487        0.004031
14   0.061129     .                   0.005488        0.003737
15   0.058516     .                   0.005490        0.003424
16   0.057007     .                   0.005491        0.003250
17   0.052718     .                   0.005494        0.002779
18   0.051726     .                   0.005494        0.002676
19   0.050328     .                   0.005495        0.002533
20   0.045737     .                   0.005497        0.002092
21   0.044686     .                   0.005498        0.001997
22   0.043131     .                   0.005499        0.001860
23   0.042241     .                   0.005499        0.001784
24   0.037202     .                   0.005501        0.001384
25   0.036330     .                   0.005502        0.001320
26   0.035254     .                   0.005502        0.001243
27   0.032469     .                   0.005503        0.001054
28   0.029588     .                   0.005504        0.000875
29   0.027873     .                   0.005505        0.000777
30   0.023221     .                   0.005506        0.000539
31   0.021481     .                   0.005506        0.000461
32   0.018175     .                   0.005507        0.000330

     Eigenvalue  Difference  Proportion  Cumulative  Likelihood Ratio  Approximate F Value  Num DF  Den DF  Pr > F
1    0.1566      0.1176      0.4514      0.4514      0.71610449        5.39                 2048    941133  <.0001
2    0.0390      0.0035      0.1124      0.5638      0.82825292        3.18                 1953    914643  <.0001
3    0.0355      0.0113      0.1024      0.6662      0.86055586        2.66                 1860    888037  <.0001
4    0.0242      0.0135      0.0697      0.7359      0.89113248        2.15                 1769    861308  <.0001
5    0.0107      0.0012      0.0308      0.7667      0.91268499        1.79                 1680    834451  <.0001
6    0.0094      0.0023      0.0272      0.7939      0.92243167        1.67                 1593    807459  <.0001
7    0.0072      0.0010      0.0206      0.8146      0.93113789        1.56                 1508    780326  <.0001
8    0.0062      0.0002      0.0178      0.8324      0.93780107        1.48                 1425    753044  <.0001
9    0.0060      0.0010      0.0172      0.8496      0.94359606        1.42                 1344    725608  <.0001
10   0.0050      0.0004      0.0144      0.8640      0.94923947        1.36                 1265    698007  <.0001
11   0.0046      0.0003      0.0133      0.8772      0.95396867        1.31                 1188    670236  <.0001
12   0.0043      0.0003      0.0125      0.8898      0.95835891        1.26                 1113    642284  <.0001
13   0.0040      0.0003      0.0117      0.9014      0.96252443        1.21                 1040    614145  <.0001
14   0.0038      0.0003      0.0108      0.9122      0.96642027        1.16                 969     585807  0.0004
15   0.0034      0.0002      0.0099      0.9221      0.97004514        1.11                 900     557262  0.0106
16   0.0033      0.0005      0.0094      0.9315      0.97337812        1.07                 833     528501  0.0918
17   0.0028      0.0001      0.0080      0.9396      0.97655169        1.02                 768     499513  0.3675

Table 22 illustrates several multivariate statistics. The small p-values for these tests imply rejection of the null hypothesis that all the canonical correlations are zero.
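The multivariate statistics are simple functions of the squared canonical correlations in Table 21, which gives a quick consistency check on the SAS output. The Python sketch below recomputes them from the 32 rounded values; the variable names are illustrative, and small deviations from the SAS figures are to be expected because the inputs are rounded to six decimals.

```python
import math

# Squared canonical correlations, copied from Table 21.
squared_corr = [
    0.135404, 0.037537, 0.034312, 0.023614, 0.010566, 0.009350, 0.007105, 0.006141,
    0.005945, 0.004957, 0.004581, 0.004328, 0.004031, 0.003737, 0.003424, 0.003250,
    0.002779, 0.002676, 0.002533, 0.002092, 0.001997, 0.001860, 0.001784, 0.001384,
    0.001320, 0.001243, 0.001054, 0.000875, 0.000777, 0.000539, 0.000461, 0.000330,
]

# Eigenvalue belonging to each canonical variate: rho^2 / (1 - rho^2).
eigenvalues = [r2 / (1.0 - r2) for r2 in squared_corr]

wilks_lambda = math.prod(1.0 - r2 for r2 in squared_corr)  # product of (1 - rho^2)
pillai_trace = sum(squared_corr)                            # sum of rho^2
hotelling_lawley = sum(eigenvalues)                         # sum of the eigenvalues
roy_root = eigenvalues[0]                                   # largest eigenvalue, as reported by SAS

print(f"Wilks' Lambda          {wilks_lambda:.4f}")
print(f"Pillai's Trace         {pillai_trace:.4f}")
print(f"Hotelling-Lawley Trace {hotelling_lawley:.4f}")
print(f"Roy's Greatest Root    {roy_root:.4f}")
```

The first likelihood ratio in Table 21 and Wilks' Lambda are the same quantity, which is why both are reported as 0.71610449.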
Table 22
Multivariate Statistics and F Approximations

S=32  M=15.5  N=16426.5

Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.71610449  5.39     2048    941133  <.0001
Pillai's Trace          0.32198874  5.22     2048    1.05E6  <.0001
Hotelling-Lawley Trace  0.34693587  5.57     2048    693852  <.0001
Roy's Greatest Root     0.15660903  80.47    64      32886   <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.

The canonical variables, despite being “artificial”, can be identified in terms of the original variables. The standardized canonical coefficients (Table 23) are interpreted in a similar manner to standardized regression coefficients. For example, a one standard deviation increase in the first variable (V192) leads to a 0,1928 standard deviation increase in the score on the first canonical variate for set 1, ceteris paribus.

Table 23
V192 V194 Satisf1 Satisf2 Satisf3 Satisf4 Satisf5 0.1628 0.1018 0.1983 0.3363 -0.0386 0.1476 0.0514 0.1327 -0.2083 0.2528 Satisf6 0.4022 0.2809

The first step in interpreting the CCA is to examine the sign and magnitude of the canonical weights. However, canonical weights may be affected by multicollinearity. Hence, examining canonical loadings is considered more reliable. Finally, cross-loadings can be examined. A cross-loading is the correlation between an observed variable from the satisfaction set and a canonical variate from the problem set (and vice versa). The CCA did provide a good model for identifying the linkages between satisfaction attributes and problem areas. However, that was not the case when only a priori specified problem sub-categories (those of the type “annoying concept”) were taken into consideration. While significant relationships were found, the structural context was perfunctory.

5 Discussion and conclusions

Throughout this paper several statistical techniques and applied economics methods have been implemented in order to build exploratory and predictive models that lead to accurate outputs.
There were three major objectives: exploring the relative importance and marginal contributions of several satisfaction attributes to overall customer satisfaction, evaluating the relationships between experienced problems with the product and satisfaction attributes, and investigating whether the former depends on volume mix. The first major challenge encountered when selecting an appropriate methodology was the nature and the dimensionality of the input data. The very large number of input variables of different types, and their distributions, presented severe problems for several rigid techniques. In addition, there may be several externalities affecting customer behavior and perceptions that were not measured by the survey and therefore remain unknown. The second challenge was to overcome the problem of multicollinearity, which is common in sciences dominated by observational data. Finally, the measurements used needed to be consistent over time and allow for trend detection. Given this, the methodology needed to be very flexible, with as few underlying requirements as possible, yet computationally efficient and accurate.

The first method used was the Shapley value, as it solved the core part of the research. The topic of assigning relative importance to predictors in regression is quite old. However, more recent developments in computational capabilities have led to applications of advanced methods and enabled different approaches to the decomposition of the R2. This type of decomposition is often encountered in sciences that rely on observational data (e.g. psychology, economics and so forth). The metric “lmg” offered by the R package relaimpo is based on the heuristic approach of averaging over all orderings of the predictors (Grömping, 2007, pp. 139 - 147). In many previous studies relative importance is described in a purely descriptive fashion (i.e. no explanation of the statistical behavior of the variance is given).
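The averaging over orderings behind “lmg” can be made concrete with a small Python sketch. It computes the Shapley share of each predictor in the model R2 directly from a correlation matrix, weighting the gain in R2 from adding a predictor to every possible subset of the others. The three-predictor correlations are hypothetical, all function names are illustrative, and the code demonstrates the principle rather than the relaimpo implementation.

```python
from itertools import combinations
from math import factorial


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(corr_xx, corr_xy, subset):
    """R^2 of regressing y on the predictors in `subset`, computed from correlations."""
    if not subset:
        return 0.0
    a = [[corr_xx[i][j] for j in subset] for i in subset]
    b = [corr_xy[i] for i in subset]
    beta = solve(a, b)  # standardized regression coefficients
    return sum(bi * ri for bi, ri in zip(beta, b))


def shapley_r2(corr_xx, corr_xy):
    """Shapley share of each predictor in the full-model R^2."""
    p = len(corr_xy)
    shares = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Fraction of orderings in which exactly `size` predictors precede j.
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                gain = r_squared(corr_xx, corr_xy, s + (j,)) - r_squared(corr_xx, corr_xy, s)
                total += w * gain
        shares.append(total)
    return shares


# Hypothetical correlations among three predictors and with the response.
corr_xx = [[1.0, 0.5, 0.3],
           [0.5, 1.0, 0.4],
           [0.3, 0.4, 1.0]]
corr_xy = [0.6, 0.5, 0.4]

shares = shapley_r2(corr_xx, corr_xy)
relative = [s / sum(shares) for s in shares]  # normalized to sum to 1, as with rela=TRUE
print([round(s, 4) for s in shares])
print([round(r, 4) for r in relative])
```

The shares add up to the R2 of the full model, which is what allows them to be read as a decomposition of the explained variance. The number of sub-models grows as 2^p, so the brute-force enumeration shown here is only practical for a moderate number of predictors.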
This research takes a step forward and offers a more illustrative example of the R2 decomposition, which is important for understanding the Shapley value (which is exactly the “lmg” metric). The results obtained were very satisfactory, since the Shapley value is a very robust estimator and can handle very complex datasets, including large portions of missing values and input variables with different measurement levels. It successfully avoids falling into the multicollinearity trap. The method is very stable in evaluating the impact of attributes measured over time, so the changes between consecutive time periods can be attributed to real changes in the market. Furthermore, the basis for the “key attributes” analysis is in fact the Shapley value. It provides a very useful tool that can be applied to numerous data modeling problems in various managerial fields. This research effectively identified the attributes that need managerial attention and that, if improved, can increase sales and profitability. As a result of such analysis, decision makers can implement several strategies for customer acquisition and retention.

The results based on the relative importance when all respondents were included in the model were compared with the results from a dataset limited to respondents who did not experience any problems. This can be seen as a type of segmentation technique that groups customers with similar behaviors, and consequently similar attribute preferences, into two distinct groups. This placement offers optimization of the targeting processes. The latter group (i.e. the group that did not experience any problems) perceives the attributes that are directly related to the features and characteristics of the new vehicle as much more important than the group that experienced problems does. Among those customers, the attributes that are of a broader nature (i.e. are connected to the performance and overall quality) contribute heavily to the overall satisfaction.
Moreover, the gap between the importance of the feature-related attributes and the overall quality attributes is much wider than within the first group. There are also differences to note between the relative importance of the attributes regarding the vehicle and those regarding the dealer. The latter did not show much change over time; even the ranking of the attributes did not change significantly, and the top of the list always consists of the same attributes.

Time series analysis was then applied to the satisfaction attributes (for both previously mentioned groups of respondents) and to the problem areas. Because the questionnaire changed over the years, not all satisfaction-related attributes re-appeared in all consecutive years. Therefore, only those that appeared in all five models were used in the trend analysis. The data displayed many fluctuations; therefore a polynomial trend represented the best fit. However, even this fit was very weak in the majority of cases, and several attributes (V2, V11, V13, V9, V12) did not show any trend pattern whatsoever. There were several differences to note when comparing the trend patterns of the same attribute within the group of all respondents and the group of those who reported no problems. The nature of attribute V1 is such that its relative contribution to the overall satisfaction is greater when problems are experienced (i.e. the attribute is perceived as more valuable by customers who had problems). Hence, the trend shows a similar but steeper pattern in the first group. While satisfaction attribute V8 still follows a rather similar shape, a very large difference can be seen in attribute V10. Since the proportion of customers experiencing problems did not change significantly over the years, the explanation of these different behaviors lies in the nature of the attributes and in the perceptions, affected by psychological factors, of those who experienced a certain problem.
The research continued with an investigation combining individual-level and aggregate data, a combination that is rather common. The method used was hierarchical logistic regression. The advantage of such modeling is that it takes the hierarchical structure of the data into account. It specifies random effects on all levels of the analysis and consequently provides more conservative inference about the aggregate fixed effects. Such aggregate data often include valuable hints on individual behavior. An important variable to take into consideration is the willingness to recommend. It is the key metric relating to customer satisfaction. The results obtained showed a statistically significant association between the make-model and variables V387 and V227.

While the main benefit of the CCA is that it provides a good exploratory technique for comparing two sets of variables, an issue of “meaningfulness” versus “significance” appeared in this case. The CCA performed well with manipulated datasets (i.e. when the dataset was limited to attributes and problems that are assumed to be correlated), and it was an appropriate method to choose. The problem appeared when it was applied to all the satisfaction attributes and an a priori specified set of sub-problems. While it did couple the attributes, it failed to provide satisfactory results in terms of meaningfulness (i.e. the results were not logical with respect to which satisfaction attributes coupled with which experienced problems).

In order to automate the methods used, the surveys should not vary in terms of variables and attributes over the years or between the countries. Hence a standardization of the surveys is needed. Moreover, the coding, labels and formats of the variables in question should be synchronized. Some of the code in Appendix A may be re-applied.

The results are consistent with the key element of the company objectives, which is the intention of building on customer satisfaction and retention.
The results are also broadly compatible with research done in other sectors. Hence the most important objective of the customer satisfaction analysis is revealed in this research. While all the models used performed fairly well, there is room for further investigation and research.

5.1 Proposed further research

5.1.1 Kernel Canonical Correlation Analysis

The issue with classical canonical correlation is that it is limited to linear associations. Using kernel methods as a pre-step in the analysis can enhance the results by extending the classical model to a general nonlinear setting. In addition, it no longer requires the Gaussian distributional assumption for the observations.

5.1.2 Moving Coalition Analysis

Mansor and Ohsato (Mansor & Ohsato, 2010) proposed a method called “Moving Coalition Analysis” (MCA) to observe the performance trends of a coalition over time. It divides the coalition into several sub-coalitions and determines the characteristic function of all sub-coalitions. Each period is then treated as a player (Mansor & Ohsato, 2010).

6 Literature and sources

1. Alterman, T., Deddens, A.J., Constella, J.L. (NN). Analysis of Large Hierarchical Data with Multilevel Logistic Modeling Using PROC GLIMMIX. SUGI – SAS Users Group International. 151 (32).
2. Dai, J., Li, Z., Rocke, D. (NN). Hierarchical Logistic Regression Modeling with SAS GLIMMIX. University of California: Davis.
3. Chantreuil, F., Trannoy, A. (1999). Inequality decomposition values: the trade-off between marginality and consistency. THEMA Discussion Paper. Universite de Cergy-Pontoise: France.
4. Conklin, M., Powaga, K., Lipovetsky, S. (NN, 2004). Customer Satisfaction Analysis: Identification of Key Drivers. European Journal of Operational Research, 3 (154), 819-827.
5. Feldman, B. (December, 1999). The proportional Value of a Cooperative Game. Accessed 21st September, 2011 on webpage http://ideas.repec.org/p/ecm/wc2000/1140.html
6. Feldman, B. (March, 2007). A theory of attribution.
Accessed 5th September, 2011 on webpage http://mpra.ub.uni-muenchen.de/3349/
7. Garavaglia, S., Sharma, A. (NN). A smart guide to dummy variables: Four applications and a macro. New Jersey: Murray Hill.
8. GFK Customer Loyalty (NN). Getting Better Regression Results with Shapley Value Regression. Accessed on 28th September, 2011 on webpage http://marketing.gfkamerica.com/website/articles/ShapelyValueRegression.pdf
9. Grömping, U. (May, 2007). Estimators of Relative Importance in Linear Regression Based on Variance Decomposition. The American Statistician. 2 (61), pp. 139 - 147.
10. Grömping, U. (October, 2007). Relative Importance for Linear Regression in R: The Package relaimpo. Journal of Statistical Software. 1 (17).
11. Hair, J.F, Anderson, R.E., Tatham, R.L., Black, C.W. (1998). Multivariate Data Analysis (5th ed.). New Jersey: Prentice Hall, Inc.
12. Hart, S., Mas-Colell, A. (May, 1989). Potential, Value and Consistency. Econometrica, 57 (3), 589-614.
13. Hausknecht, R.D. (NN, 1990). Measurement Scales in Customer Satisfaction/Dissatisfaction. Accessed 16th June, 2011 on webpage http://lilt.ilstu.edu/staylor/csdcb/articles/Volume3/Hausknecht%201990.pdf.
14. Huang, S-Y., Lee, H-M, Hsiao, C.K. (August, 2006). Kernel Canonical Correlation Analysis and its Applications to Nonlinear Measures of Association and Test of Independence. Institute of Statistical Science: Academia Sinica, Taiwan.
15. Israeli, O. (March, 2007). A Shapley-based decomposition of the R-square of a linear regression. The Journal of Economic Inequality. 5, 199-212.
16. Johnson, A.R., Wichern, W.D. (2001). Applied Multivariate Statistical Analysis (5th ed.). New Jersey: Prentice Hall.
17. Knapp, T. (March/April, 1990). Commentary: Treating Ordinal Scales as Interval Scales: An Attempt to Resolve the Controversy. Psychometrica, 39 (2), 121-123.
18. Kruskal, W. (NN, 1987). Relative Importance by averaging over orderings. The American Statistician, 41 (1).
19. Likert, R. (NN, 1932).
A technique for the measurement of attitudes. Archives of Psychology. 140 (32), 55.
20. Lipovetsky S., Conklin M. (NN, 2001). Analysis of Regression in a Game Theory Approach. Applied Stochastic Models in Business and Industry. 17, 319-330.
21. Mansor, M.A., Ohsato, A. (NN, 2010). The Concept of Moving Coalition Analysis and its Transpose. European Journal of Scientific Research. 4 (39), 548-557.
22. Mikulic, J., Prebežac, D. (NN, 2011). A critical review of techniques for classifying quality attributes in the Kano model. Managing Service Quality, 1 (21), 46 – 66.
23. Petrosjan, L., Zaccour, G. (June, 2001). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics & Control. 2003 (27), 381-398.
24. Siegel, S. (1967). Nonparametric statistics for the behavioral science. New York: McGraw-Hill Book Co.
25. Shapley, L.S. (1953). A value for n-person games. In: Kuhn, H.W. and Tucker, A.W. (eds.), (1953). Contributions to the theory of games. Princeton: Princeton University Press, 307-317.
26. Shorrocks, A.F. (1999). Decomposition Procedures for Distributional Analysis: A unified Framework Based on the Shapley Value. United Kingdom: University of Essex.
27. Theil, H. (NN, 1987). How many bits of information does an independent variable yield in a multiple regression? Statistics and Probability Letters, 6 (2).
28. Von Neumann, J. (1928). On theory of playing games. English translation in: Kuhn, H.W. and Tucker, A.W. (1959) Contribution to the Theory of Games. Princeton: Princeton University Press, 13-41.
29. Weiner, J.L., Tang, J. (NN). Multicollinearity in Customer Satisfaction Research. Ipsos Loyalty: www.ipsosloyalty.com
30. Yeung, W.K. D. (NN, 2010). Time consistent Shapley Value Imputations for Cost-Saving Joint Ventures. Accessed 21st September, 2011 on webpage http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=mgta&paperid=44&option_lang=eng.
31. Yeung D.W.K., Petrosyan L.A. (NN, 2004).
Subgame consistent cooperative solutions in stochastic differential games. Journal of Optimization Theory and Applications. 120 (3), 651-666.

Appendix A: SAS and R codes

• Univariate Analysis (SAS graphics)

goptions reset=(axis, legend, pattern, symbol, title, footnote)
         colors=(black blue green red yellow cyan gold)
         norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ;
title1 'Frequency Distribution' color=gold;
title3 underlin=1 'V191' color=red;
footnote color=green 'Dataset: ';
goptions device=WIN ctext=blue graphrc interpo=join;
pattern1 color=blue value=X1;
pattern2 color=blue value=X1;
pattern3 color=blue value=X1;
pattern4 color=blue value=X1;
pattern5 color=blue value=X1;
pattern6 color=blue value=X1;
pattern7 color=blue value=X1;
pattern8 color=blue value=X1;
pattern9 color=blue value=X1;
pattern10 color=blue value=X1;
axis1 color=blue width=2.0;
axis2 color=blue width=2.0;
axis3 color=blue width=2.0;
/* horizontal bar chart of the frequency distribution of V191 */
proc gchart data=;
   hbar V191 / discrete;
run;

• Dummy variables MACRO (SAS)

option nosymbolgen mlogic mprint obs=999999999;
libname;
filename out;
data; set; ;
/* MACRO PARAMETERS:
   dsn    = input dataset name,
   var    = variable to be categorized,
   prefix = categorical variable prefix,
   flat   = flatfile name with code (referenced in file name statement) */
%macro dmycode (dsn = , var = , prefix = , flat = );
   /* one row per distinct level of &var */
   proc summary data = &dsn nway;
      class &var;
      output out = x (keep = &var);
   proc print; *;
   /* store the number of levels in &tot and each level in &z1, &z2, ... */
   data _null_;
      set x nobs=totx end=last;
      if last then call symput('tot', trim(left(put(totx, best.))));
      call symput('z' || trim(left(put(_n_, best.))), trim(left(&var)));
   /* write the dummy-coding statements to the flat file */
   data _null_;
      file &flat;
      %do i = 1 %to &tot;
         put "&prefix&&z&i =0;";
      %end;
      put "SELECT;";
      %do i = 1 %to &tot;
         put " when (&var = &&z&i) &prefix&&z&i = 1;";
      %end;
      put " otherwise V_oth=1;";
      put "end;";
   run;
%mend dmycode;
%dmycode (dsn = , var = , prefix = , flat = out);
run; quit;

• Relative Importance (R-code)

> library(relaimpo)
> linmod <- lm('response_variable' ~ .. , data=)
> metrics <- calc.relimp(linmod, type = c("lmg", "first", "last"), rela = TRUE)
> metrics

• Canonical Correlation (SAS-Code)

proc cancorr corr data=
      vprefix = problems wprefix = satisfaction
      vname = 'Problem Areas' wname = 'Satisfaction Areas';
   var Vp Ve Vw Vb Vo Vi Vel Ven Vcl Vbr Vsw Vs Vex Vot;
   with v191 v192 v193 v194 v195 v196 v197 v198 v199 v200 v201 v202 v203 v204 v205 v206 v207
        v208 v209 v210 v211 v212 v213 v214 v215 v216 v217 v218 v219 v220 v221 v222 v223 v224;
run;

• Hierarchical Logistic Regression (SAS Code)

data;
   set;
   /* binary response: 1 if the satisfaction score is 5 or higher */
   if V >= 5 then V_S = 1;
   else V_S = 0;
   keep V_S make_model number_problems recommendation;
run;
proc glimmix;
   class make_model recommendation;
   model V_S = number_problems recommendation / dist=binary link=logit ddfm=bw solution;
   /* random intercept for each make-model */
   random intercept / subject=make_model;
run;

Appendix B: Outputs

I. Relative Importance (R-Output)

• Attributes related to the vehicle

Country A: Year 2006
Response variable: V191
Total response variance: 2.058392
Analysis based on 34510 observations
20 Predictors: V200 V195 V220 V194 V196 V198 V201 V212 V216 V213 V214 V192 V206 V207 V209 V197 V215 V218 V224 V210
Proportion of variance explained by model: 56.21%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

      lmg         last          first
V200  0.17533809  6.670676e-01  0.07014639
V195  0.06650988  3.989017e-02  0.06035864
V220  0.06934197  9.345311e-02  0.05804864
V194  0.04902994  5.570296e-03  0.05454818
V196  0.04983194  1.916273e-07  0.05512604
V198  0.03417306  2.064780e-02  0.04590987
V201  0.05018064  3.947526e-05  0.05408146
V212  0.05235531  3.462571e-02  0.05210374
V216  0.02740625  1.781458e-02  0.03937961
V213  0.03447326  4.347548e-04  0.04692400
V214  0.03868629  8.282556e-03  0.05142443
V192  0.03835219  2.292815e-02  0.04229709
V206  0.03989079  3.311699e-03  0.04849407
V207  0.03749254  7.339540e-04  0.04698004
V209  0.04373201  1.269670e-03  0.05122101
V197  0.04524226  4.550039e-02  0.04726832
V215  0.04168734  1.427227e-02  0.04995258
V218  0.04235835  1.150227e-02  0.04707904
V224  0.03296703  5.192290e-03  0.03948656
V210  0.03095085  7.463055e-03  0.03917028

Country A, Year 2007
Response variable: V243
Total response variance: 2.0479
Analysis based on 35070 observations
20 Predictors: V252 V247 V272 V376 V250 V246 V266 V270 V253 V259 V267 V261 V244 V248 V374 V268 V265 V258 V271 V249
Proportion of variance explained by model: 54.55%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

      lmg         last          first
V252  0.17120676  7.033375e-01  0.06804549
V247  0.05811261  1.483454e-02  0.05774994
V272  0.06526155  8.042678e-02  0.05630593
V376  0.06432643  5.451424e-02  0.05757349
V250  0.03708160  4.096109e-03  0.04747884
V246  0.04817986  1.375960e-02  0.05293146
V266  0.04093524  4.557363e-03  0.05166054
V270  0.04006884  9.694720e-03  0.04584990
V253  0.04433569  6.528559e-03  0.05098552
V259  0.03736594  6.950527e-05  0.04676655
V267  0.04689966  2.915300e-02  0.05087537
V261  0.04523375  4.699467e-03  0.05100752
V244  0.03593635  1.401735e-02  0.04100905
V248  0.04580069  4.881238e-05  0.05297541
V374  0.03868168  9.356847e-03  0.04424975
V268  0.02889764  2.119193e-02  0.04031436
V265  0.03526031  1.385710e-04  0.04622123
V258  0.03962943  2.872651e-03  0.04817641
V271  0.03347058  1.434143e-04  0.04254681
V249  0.04331541  2.655901e-02  0.04727644

Country A, Year 2008
Response variable: V185
Total response variance: 2.086333
Analysis based on 35992 observations
20 Predictors: V358 V350 V189 V193 V188 V186 V201 V348 V191 V204 V345 V198 V199 V190 V200 V192 V346 V205 V356 V357
Proportion of variance explained by model: 57.11%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

      lmg         last          first
V358  0.05708817  0.0162859062  0.05756880
V350  0.06149193  0.0589116256  0.05669807
V189  0.04926621  0.0172649684  0.05199343
V193  0.12683183  0.5019156978  0.06398757
V188  0.04958275  0.0149977892  0.05311521
V186  0.03757862  0.0192935222  0.04208401
V201  0.04270511  0.0001110518  0.05140442
V348  0.04354485  0.0117737912  0.04826778
V191  0.03907649  0.0189771626  0.04858283
V204  0.04521186  0.0225305517  0.04850728
V345  0.03894794  0.0087256304  0.05034929
V198  0.03793851  0.0002560687  0.04798146
V199  0.03570495  0.0019793658  0.04624136
V190  0.04422061  0.0148505132  0.04666335
V200  0.04966849  0.0629385786  0.04740075
V192  0.08183276  0.1805968122  0.05385416
V346  0.04203345  0.0131359458  0.04957674
V205  0.03353438  0.0030582268  0.04488937
V356  0.03615390  0.0016817661  0.04343725
V357  0.04758717  0.0307150258  0.04739688

Country A, Year 2009
Response variable: V166
Total response variance: 1.970236
Analysis based on 34290 observations
20 Predictors: V174 V200 V172 V169 V192 V170 V185 V182 V198 V167 V190 V186 V179 V187 V188 V173 V180 V171 V191 V189
Proportion of variance explained by model: 56.19%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

      lmg         last          first
V174  0.11186845  0.4531807766  0.06079830
V200  0.06440816  0.0440922785  0.05898719
V172  0.04063230  0.0287191872  0.04937835
V169  0.05385323  0.0289676112  0.05400725
V192  0.06519498  0.1002344739  0.05615337
V170  0.05198625  0.0154245596  0.05291007
V185  0.04671821  0.0252078885  0.04884621
V182  0.04407208  0.0022455794  0.05098877
V198  0.03791848  0.0050820263  0.04381818
V167  0.04060120  0.0268104287  0.04310929
V190  0.04395921  0.0131949586  0.04841819
V186  0.03648265  0.0007514274  0.04630650
V179  0.04096882  0.0042731491  0.04813841
V187  0.04169902  0.0062132525  0.05152353
V188  0.04539000  0.0283392387  0.05035166
V173  0.07699340  0.2045259857  0.05167943
V180  0.03835864  0.0001828287  0.04660620
V171  0.04258536  0.0111044666  0.04600678
V191  0.03720759  0.0006127727  0.04367013
V189  0.03910197  0.0008371102  0.04830217

Country A, Year 2010
Response variable: V136
Total response variance: 2.300986
Analysis based on 26077 observations
20 Predictors: V169 V144 V142 V140 V162 V156 V158 V157 V139 V160 V161 V155 V150 V153 V149 V152 V167 V141 V170 V137
Proportion of variance explained by model: 57.96%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V169 0.05692879 8.767741e-03 0.05894160
V144 0.11910833 1.788261e-01 0.06802633
V142 0.03739872 4.597620e-03 0.04892917
V140 0.04414864 2.612953e-03 0.05170097
V162 0.05708717 8.176049e-02 0.05466183
V156 0.03357805 9.584655e-04 0.04639863
V158 0.03882760 9.703937e-03 0.04888020
V157 0.03744338 4.698080e-03 0.05086771
V139 0.04802484 2.619986e-02 0.05315768
V160 0.03708772 3.509546e-03 0.04665635
V161 0.03373433 5.328507e-08 0.04430867
V155 0.04617533 2.388766e-02 0.05029013
V150 0.03521464 5.273087e-04 0.04678968
V153 0.03205610 1.749369e-02 0.04033196
V149 0.03629620 1.551674e-03 0.04755669
V152 0.04018143 1.951319e-04 0.05043326
V167 0.03391699 3.368393e-03 0.04288100
V141 0.04512617 5.022643e-02 0.04709774
V170 0.15284015 5.597033e-01 0.06050640
V137 0.03482542 2.141156e-02 0.04158400

II. Attributes related to the vehicle (among those that did not experience any problems)

Country A, Year 2006
Response variable: V191
Total response variance: 1.200345
Analysis based on 16594 observations
20 Predictors: V200 V194 V195 V209 V220 V192 V214 V201 V215 V196 V198 V207 V218 V217 V206 V213 V197 V219 V221 V212
Proportion of variance explained by model: 54.84%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V200 0.06362960 1.701838e-01 0.05289687
V194 0.06927767 9.967971e-02 0.05819301
V195 0.05574437 4.577733e-03 0.05618415
V209 0.04626672 2.446540e-03 0.05102899
V220 0.06605162 1.471059e-01 0.05597795
V192 0.06706004 2.282329e-01 0.05029801
V214 0.04927452 2.204473e-03 0.05422015
V201 0.04648800 1.070166e-02 0.04940747
V215 0.05187558 5.265314e-02 0.05175723
V196 0.05026635 9.860334e-04 0.05284699
V198 0.03906054 3.515196e-02 0.04659809
V207 0.04196933 2.526534e-03 0.04782239
V218 0.04775618 2.814527e-02 0.04838327
V217 0.04179348 7.071445e-03 0.04639525
V206 0.04400858 1.192303e-02 0.04819595
V213 0.04299308 5.352590e-04 0.04854995
V197 0.05877350 1.771389e-01 0.04963180
V219 0.03961751 2.378401e-05 0.04453943
V221 0.04233533 1.815867e-02 0.04425729
V212 0.03575800 5.532474e-04 0.04281576

Country A, Year 2007
Response variable: V243
Total response variance: 1.201164
Analysis based on 18070 observations
20 Predictors: V252 V246 V376 V247 V266 V261 V244 V272 V250 V253 V267 V248 V259 V270 V269 V377 V258 V265 V249 V271
Proportion of variance explained by model: 52.59%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V252 0.05623229 1.017877e-01 0.05136808
V246 0.06754630 9.680398e-02 0.05734186
V376 0.04549564 3.427819e-03 0.05280071
V247 0.05250400 6.948057e-03 0.05504152
V266 0.05007357 2.359134e-03 0.05422168
V261 0.04605065 8.670221e-03 0.05048469
V244 0.07256916 2.958595e-01 0.05076800
V272 0.06769435 1.747223e-01 0.05517171
V250 0.04038925 1.117182e-02 0.04778182
V253 0.04505041 1.067390e-02 0.04923535
V267 0.05655140 8.625700e-02 0.05258238
V248 0.04694570 3.807004e-04 0.05179084
V259 0.04075267 8.682217e-06 0.04770874
V270 0.05111744 4.356310e-02 0.04881679
V269 0.04337561 8.680529e-03 0.04756474
V377 0.03034298 1.094301e-02 0.03464764
V258 0.04359521 1.220453e-02 0.04818256
V265 0.04543294 3.480866e-03 0.04936963
V249 0.05400056 1.152797e-01 0.04890685
V271 0.04427986 6.777423e-03 0.04621440

Country A, Year 2008
Response variable: V185
Total response variance: 1.207184
Analysis based on 17654 observations
20 Predictors: V193 V188 V201 V350 V186 V345 V191 V358 V199 V348 V346 V347 V198 V189 V359 V205 V200 V190 V356 V351
Proportion of variance explained by model: 55.29%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V193 0.05021427 5.073150e-02 0.04978496
V188 0.06607087 7.592560e-02 0.05703007
V201 0.04797203 3.820524e-06 0.05333964
V350 0.07511630 2.209559e-01 0.05823800
V186 0.06505198 2.113912e-01 0.05023616
V345 0.04678278 7.661378e-03 0.05329954
V191 0.04722728 3.099617e-03 0.05135730
V358 0.04508103 2.198063e-03 0.05198768
V199 0.04111215 8.108458e-03 0.04879660
V348 0.05773926 7.494597e-02 0.05237052
V346 0.04988962 3.962038e-02 0.05182858
V347 0.04825039 2.344735e-02 0.05052732
V198 0.04751064 2.433252e-02 0.05071958
V189 0.05281364 5.274360e-03 0.05250057
V359 0.03319606 2.819822e-02 0.03645728
V205 0.04223649 9.318714e-05 0.04824157
V200 0.04841213 7.716896e-02 0.04753308
V190 0.05375072 1.160624e-01 0.04698353
V356 0.04173557 2.981868e-02 0.04456383
V351 0.03983678 9.624043e-04 0.04420420

Country A, Year 2009
Response variable: V166
Total response variance: 1.262197
Analysis based on 17592 observations
20 Predictors: V174 V169 V182 V172 V187 V192 V167 V200 V190 V189 V180 V188 V170 V179 V201 V181 V186 V198 V171 V193
Proportion of variance explained by model: 55.26%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V174 0.05514770 9.096436e-02 0.05112949
V169 0.07062230 1.016150e-01 0.05823730
V182 0.04335020 4.510340e-03 0.05073319
V172 0.04419138 3.126511e-04 0.05033357
V187 0.05015063 2.018111e-03 0.05492181
V192 0.06242987 9.690852e-02 0.05513183
V167 0.07195362 2.595020e-01 0.05178702
V200 0.04413617 1.741172e-03 0.05163339
V190 0.05606256 7.166346e-02 0.05157028
V189 0.04625483 4.974454e-03 0.05093626
V180 0.03882214 1.082792e-02 0.04750050
V188 0.05439075 4.933711e-02 0.05345940
V170 0.04979640 8.583846e-06 0.05174134
V179 0.04258554 1.375565e-02 0.04836944
V201 0.03065876 1.004690e-02 0.03526965
V181 0.05354090 1.463955e-01 0.04778978
V186 0.04612255 5.872432e-03 0.05001568
V198 0.04400253 3.034489e-02 0.04604635
V171 0.05202964 8.247920e-02 0.04760875
V193 0.04375152 1.672166e-02 0.04578497

Country A, Year 2010
Response variable: V136
Total response variance: 1.309809
Analysis based on 13350 observations
20 Predictors: V144 V139 V152 V142 V157 V137 V162 V169 V150 V170 V160 V159 V158 V140 V149 V156 V161 V141 V167 V151
Proportion of variance explained by model: 51.52%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V144 0.05416185 6.985963e-02 0.05137624
V139 0.06496484 8.374789e-02 0.05659155
V152 0.04555003 1.590906e-03 0.05152652
V142 0.04555793 1.901789e-04 0.05086216
V157 0.05113744 2.641150e-05 0.05497880
V137 0.06057700 1.698714e-01 0.04935664
V162 0.06599232 1.777644e-01 0.05482148
V169 0.04667376 1.989268e-04 0.05260723
V150 0.04184400 3.455916e-05 0.04888765
V170 0.03797399 4.087843e-02 0.03916625
V160 0.05081902 2.807938e-02 0.05047922
V159 0.04542082 5.158531e-03 0.05004528
V158 0.05198424 4.155107e-02 0.05201739
V140 0.04954933 7.062043e-04 0.05147886
V149 0.04176165 9.967343e-04 0.04833916
V156 0.04480985 3.378128e-03 0.04907881
V161 0.04492645 6.361530e-03 0.04752775
V141 0.05773304 1.556097e-01 0.04871006
V167 0.04530835 4.845730e-02 0.04568283
V151 0.05325409 1.655388e-01 0.04646612

III. Attributes related to the dealer

Country A, Year 2006
Response variable: V90
Total response variance: 3.757734
Analysis based on 17669 observations
13 Predictors: V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101 V102 V103
Proportion of variance explained by model: 83.44%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V91  0.18521298 6.847516e-01 0.11288476
V92  0.04820797 9.116609e-04 0.05945564
V93  0.07233644 2.862797e-05 0.08168633
V94  0.09890643 6.431012e-02 0.09279135
V95  0.06497206 3.968239e-06 0.07616506
V96  0.05587773 2.360987e-05 0.06826234
V97  0.06290219 7.867090e-03 0.06919138
V98  0.08530652 2.806959e-02 0.08517191
V99  0.05568341 5.348390e-04 0.06452460
V100 0.04834171 3.576874e-03 0.05786154
V101 0.06958985 3.141434e-03 0.08005059
V102 0.05898171 5.458162e-04 0.07266655
V103 0.09368100 2.062347e-01 0.07928795

Country A, Year 2007
Response variable: V93
Total response variance: 3.600228
Analysis based on 18132 observations
13 Predictors: V94 V95 V96 V97 V98 V99 V100 V101 V102 V103 V104 V105 V106
Proportion of variance explained by model: 84.2%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

            lmg         last      first
V94  0.19100970 7.264426e-01 0.11368961
V95  0.05121555 5.483709e-04 0.06250500
V96  0.07442355 1.696865e-03 0.08262941
V97  0.09609172 3.716799e-02 0.09258794
V98  0.06514394 5.963570e-04 0.07646783
V99  0.05719067 7.870795e-06 0.06944981
V100 0.06124171 6.756278e-03 0.06833179
V101 0.08803514 3.947776e-02 0.08559081
V102 0.05405284 2.101083e-03 0.06243708
V103 0.04723924 4.862534e-04 0.05785069
V104 0.06921337 3.359284e-03 0.07973824
V105 0.05623089 3.475600e-03 0.07081507
V106 0.08891168 1.778837e-01 0.07790672

Country A, Year 2008
Response variable: V51
Total response variance: 3.639024
Analysis based on 22321 observations
13 Predictors: V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64
Proportion of variance explained by model: 84.63%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V52  0.18764711 7.271273e-01 0.11268902
V53  0.04987515 6.951197e-04 0.06112621
V54  0.07520875 1.230900e-03 0.08297386
V55  0.09886036 5.301004e-02 0.09296528
V56  0.06538550 7.392421e-04 0.07628896
V57  0.05618275 6.938766e-05 0.06820079
V58  0.06276593 7.147703e-03 0.06906188
V59  0.08546641 3.079432e-02 0.08493816
V60  0.05425806 1.389492e-03 0.06313038
V61  0.04611721 1.128925e-03 0.05628250
V62  0.07113660 4.435687e-03 0.08107075
V63  0.05964542 3.203869e-03 0.07349821
V64  0.08745076 1.690280e-01 0.07777400

Country A, Year 2009
Response variable: V129
Total response variance: 3.53333
Analysis based on 21084 observations
13 Predictors: V130 V131 V132 V133 V134 V135 V136 V137 V138 V139 V140 V141 V142
Proportion of variance explained by model: 84.65%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

            lmg         last      first
V130 0.18835054 7.301989e-01 0.11221974
V131 0.05185940 9.211642e-04 0.06318838
V132 0.07364990 3.366056e-04 0.08213865
V133 0.09666288 4.502798e-02 0.09244620
V134 0.06565629 9.256717e-04 0.07634026
V135 0.05621675 9.789573e-05 0.06868110
V136 0.06140672 3.467800e-03 0.06779338
V137 0.08446151 2.312392e-02 0.08471682
V138 0.05484663 6.909861e-05 0.06376546
V139 0.04697694 1.766338e-03 0.05676957
V140 0.07036984 5.329674e-03 0.08013106
V141 0.05738450 5.201283e-03 0.07168433
V142 0.09215811 1.835337e-01 0.08012505

Country A, Year 2010
Response variable: V99
Total response variance: 3.549067
Analysis based on 21352 observations
13 Predictors: V100 V101 V102 V103 V104 V105 V106 V107 V108 V109 V110 V111 V112
Proportion of variance explained by model: 85.27%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg         last      first
V100 0.18775470 0.7362742584 0.11205994
V101 0.05209857 0.0004553937 0.06360624
V102 0.07491296 0.0004533549 0.08297144
V103 0.10011112 0.0511773923 0.09361036
V104 0.06765425 0.0024312043 0.07772847
V105 0.05504420 0.0020952891 0.06784081
V106 0.06010152 0.0032824960 0.06856897
V107 0.08353759 0.0240797591 0.08380773
V108 0.05177882 0.0009654425 0.06056569
V109 0.04627701 0.0001588560 0.05682245
V110 0.07205349 0.0061650701 0.08096007
V111 0.06121120 0.0005103106 0.07423479
V112 0.08746456 0.1719511732 0.07722304

IV. Problem Areas

Country A, Year 2006
Response variable: V191
Total response variance: 2.140151
Analysis based on 39307 observations
14 Predictors: Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.41%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

            lmg       last       first
Vp  0.039842387 0.02977609 0.048137509
Ve  0.042242747 0.03239782 0.050406735
Vw  0.030952020 0.02632669 0.034680935
Vb  0.121423250 0.11532126 0.125186465
Vo  0.062205607 0.05878978 0.064523369
Vi  0.100405042 0.09893048 0.101098128
Vel 0.108495021 0.11726448 0.101685549
Ven 0.212139073 0.25024403 0.183571241
Vc  0.120215922 0.12446019 0.116230932
Vbr 0.045555738 0.03933297 0.050383691
Vsw 0.060655100 0.05583672 0.064556730
Vs  0.033740768 0.02881239 0.037625045
Vex 0.013582749 0.01195273 0.014853031
Vot 0.008544577 0.01055439 0.007060639

Country A, Year 2007
Response variable: V243
Total response variance: 2.117815
Analysis based on 39485 observations
14 Predictors: Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.12%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg        last       first
Vp  0.042708995 0.034500288 0.049491698
Ve  0.035370297 0.026804428 0.042385400
Vw  0.051543550 0.049246900 0.053372287
Vb  0.123632357 0.111864701 0.131278734
Vo  0.061172457 0.057526995 0.063762537
Vi  0.074496974 0.071446048 0.076935143
Vel 0.089960019 0.097802332 0.084192155
Ven 0.212809172 0.250392368 0.184893020
Vc  0.137871419 0.141512990 0.134150987
Vbr 0.048614873 0.044353022 0.051915504
Vsw 0.056500635 0.054858109 0.057818221
Vs  0.046087759 0.040707305 0.050150517
Vex 0.010442400 0.008255915 0.012225820
Vot 0.008789093 0.010728598 0.007427978

Country A, Year 2008
Response variable: V185
Total response variance: 2.150441
Analysis based on 40678 observations
14 Predictors: Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.39%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

            lmg        last       first
Vp  0.045876248 0.037517477 0.052662217
Ve  0.033029765 0.023748463 0.040566347
Vw  0.052034340 0.047325520 0.055719729
Vb  0.110277994 0.097681766 0.118646910
Vo  0.059870420 0.056588090 0.062211016
Vi  0.087610414 0.083404228 0.090635488
Vel 0.093153875 0.100558400 0.087406287
Ven 0.252488469 0.302818865 0.215539736
Vc  0.103348859 0.105917495 0.100859107
Vbr 0.048519557 0.044443819 0.051469277
Vsw 0.039320631 0.034126298 0.043349360
Vs  0.042098418 0.035666986 0.046912783
Vex 0.025816516 0.022517877 0.028298956
Vot 0.006554494 0.007684718 0.005722788

Country A, Year 2009
Response variable: V166
Total response variance: 2.051197
Analysis based on 38628 observations
14 Predictors: Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 12.77%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:

            lmg        last       first
Vp  0.047259639 0.036880618 0.055496184
Ve  0.048929548 0.044519069 0.052543048
Vw  0.027595428 0.024629018 0.030038289
Vb  0.122505803 0.112297562 0.128963244
Vo  0.049666289 0.044956433 0.053164563
Vi  0.079500754 0.074262146 0.083292872
Vel 0.088606572 0.096729137 0.082742135
Ven 0.214316999 0.250222913 0.187771898
Vc  0.151047967 0.165427418 0.139916901
Vbr 0.050078129 0.041951225 0.056202797
Vsw 0.047450551 0.044491043 0.049717297
Vs  0.055625147 0.049164918 0.060388137
Vex 0.015697143 0.012480133 0.018231952
Vot 0.001720031 0.001988368 0.001530682

Country A, Year 2010
Response variable: V136
Total response variance: 2.393666
Analysis based on 30264 observations
14 Predictors: Vp Ve Vw Vb Vo Vi Vel Ven Vc Vbr Vsw Vs Vex Vot
Proportion of variance explained by model: 14.41%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

           lmg       last       first
Vp  0.03932781 0.02999155 0.047216946
Ve  0.03633730 0.02897662 0.042543982
Vw  0.02825591 0.02583377 0.030373011
Vb  0.08935714 0.07966546 0.096309440
Vo  0.05878993 0.05458021 0.061575447
Vi  0.06564755 0.05853186 0.071027124
Vel 0.08702988 0.08889453 0.085019095
Ven 0.29289464 0.35551348 0.245493547
Vc  0.13617786 0.13824893 0.133451330
Vbr 0.04994728 0.04271740 0.055541753
Vsw 0.04123526 0.03442902 0.046479289
Vs  0.04932866 0.03731569 0.058440264
Vex 0.01491473 0.01038544 0.018715628
Vot 0.01075603 0.01491603 0.007813144

V. Trend Analysis

• Trend Analysis (comparison)

The trend line above is fitted to the V7 attribute and includes only the respondents who did not experience any problems. The fit is quite poor; however, when the whole dataset was taken into consideration, there was no pattern at all. The figure below, on the other hand, shows a weak pattern when all respondents were included in the analysis, but no trend pattern at all among those who did not experience any problem.
The trend line fitted to the relative importance of attribute V3 showed a good fit when all respondents were included, but no pattern at all when only the respondents who did not report any problem were considered. To emphasise the differences between the two datasets, V15 provides a clear example of different movements in relative importance over the five years (figures below).
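As a rough illustration of how a trend line over yearly relative importance can be fitted and its quality judged, the following minimal Python sketch applies ordinary least squares to five yearly values. It is not the procedure used for the figures: the method of fitting is assumed here, the helper name `fit_trend` is hypothetical, and the example series is the lmg share of the problem area Ven taken from the Section IV tables for Country A (2006 to 2010), chosen only because its values are listed in this appendix.

```python
# Illustrative sketch only. The "lmg" shares for Ven are copied from the
# Section IV tables (Country A, 2006-2010); the attributes discussed in the
# trend analysis (V3, V7, V15) are not tabulated here, so Ven is a stand-in.
years = [2006, 2007, 2008, 2009, 2010]
ven_lmg = [0.212139073, 0.212809172, 0.252488469, 0.214316999, 0.29289464]


def fit_trend(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared). The name and signature are
    hypothetical helpers for this example, not part of any library.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r_squared


intercept, slope, r2 = fit_trend(years, ven_lmg)
print(f"slope per year: {slope:.4f}, R^2: {r2:.3f}")
```

A low R² from such a fit corresponds to what the text calls a poor fit, while a value near 1 indicates that the relative importance moves almost linearly over the five years. With only five points per attribute, any such fit should be read as descriptive rather than as evidence of a trend.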