Chapter 1 Introduction
Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall.

LEARNING OBJECTIVES
Upon completing this chapter, you should be able to do the following:
1. Explain what multivariate analysis is and when its application is appropriate.
2. Define and discuss the specific techniques included in multivariate analysis.
3. Determine which multivariate technique is appropriate for a specific research problem.
4. Discuss the nature of measurement scales and their relationship to multivariate techniques.
5. Describe the conceptual and statistical issues inherent in multivariate analyses.

What is Multivariate Analysis?
What is it?
Multivariate data analysis comprises all statistical methods that simultaneously analyze multiple measurements on each individual or object under investigation.
Why use it?
• Measurement
• Explanation & Prediction
• Hypothesis Testing

Basic Concepts of Multivariate Analysis
• The Variate
• Measurement Scales
  o Nonmetric
  o Metric
• Multivariate Measurement
• Measurement Error
• Types of Techniques

The Variate
The variate is a linear combination of variables with empirically determined weights. The weights are determined to best achieve the objective of the specific multivariate technique.
Variate equation: $Y' = W_1 X_1 + W_2 X_2 + \dots + W_n X_n$
Each respondent has a variate value (Y'). The Y' value is a linear combination of the entire set of variables. In a dependence technique such as multiple regression, Y' is the predicted value of the dependent variable.
Potential independent variables:
X1 = income
X2 = education
X3 = family size
X4 = ??
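The variate equation can be illustrated with a minimal sketch. The weights and the respondent's values below are made up for illustration; in practice the weights are estimated by the chosen multivariate technique, and the function name `variate_value` is hypothetical.

```python
def variate_value(weights, values):
    """Return Y' = W1*X1 + W2*X2 + ... + Wn*Xn for one respondent."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights for X1 = income (in thousands), X2 = education (years),
# X3 = family size. These numbers are assumptions, not estimates from data.
weights = [0.5, 2.0, -1.0]
respondent = [40.0, 16.0, 3.0]

y_prime = variate_value(weights, respondent)
print(y_prime)
```

Each respondent gets one Y' value; changing the weights changes how strongly each variable contributes to it.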
Types of Data and Measurement Scales
Data
• Nonmetric or Qualitative
  o Nominal Scale
  o Ordinal Scale
• Metric or Quantitative
  o Interval Scale
  o Ratio Scale

Measurement Scales
Nonmetric
• Nominal: the size of the number is not related to the amount of the characteristic being measured.
• Ordinal: larger numbers indicate more (or less) of the characteristic measured, but not how much more (or less).
Metric
• Interval: contains ordinal properties, and in addition, there are equal differences between scale points.
• Ratio: contains interval scale properties, and in addition, there is a natural zero point.
NOTE: The level of measurement is critical in determining the appropriate multivariate technique to use!

Measurement Error
• All variables have some error. What are the sources of error?
• Measurement error distorts observed relationships and makes multivariate techniques less powerful.
• To reduce its impact, researchers use summated scales, for which several variables are summed or averaged together to form a composite representation of a concept.

Measurement Error
In addressing measurement error, researchers evaluate two important characteristics of measurement:
• Validity = the degree to which a measure accurately represents what it is supposed to.
• Reliability = the degree to which the observed variable measures the "true" value and is thus error free.

Statistical Significance and Power
• Type I error, or α, is the probability of rejecting the null hypothesis when it is true.
• Type II error, or β, is the probability of failing to reject the null hypothesis when it is false.
• Power, or 1 − β, is the probability of rejecting the null hypothesis when it is false.
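The definitions of α, β, and power can be made concrete with a small sketch. It assumes a one-sided z-test with a known standard deviation, which is a simplification chosen for illustration and not a procedure taken from the slides; `effect_size` is a standardized mean difference and `n` is the sample size, both illustrative inputs.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test, under the assumptions above."""
    z = NormalDist()  # standard normal distribution
    critical = z.inv_cdf(1 - alpha)  # rejection threshold implied by alpha
    # Probability that the test statistic exceeds the threshold when H0 is false.
    return 1 - z.cdf(critical - effect_size * math.sqrt(n))


power = z_test_power(effect_size=0.5, n=30, alpha=0.05)
beta = 1 - power  # probability of a Type II error
print(round(power, 3), round(beta, 3))
```

When the effect size is zero the null hypothesis is true, and the function returns α itself: the rejection rate is then the Type I error rate.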
                     H0 true               H0 false
Fail to Reject H0    1 − α                 β (Type II error)
Reject H0            α (Type I error)      1 − β (Power)

Power is Determined by Three Factors
• Effect size: the actual magnitude of the effect of interest (e.g., the difference between means or the correlation between variables).
• Alpha (α): as α is set at smaller levels, power decreases. Typically, α = .05.
• Sample size: as sample size increases, power increases. With very large sample sizes, even very small effects can be statistically significant, raising the issue of practical significance vs. statistical significance.

[Figure: Impact of Sample Size on Power]

Rules of Thumb 1-1
Statistical Power Analysis
• Researchers should always design the study to achieve a power level of .80 at the desired significance level.
• More stringent significance levels (e.g., .01 instead of .05) require larger samples to achieve the desired power level.
• Conversely, power can be increased by choosing a less stringent alpha level (e.g., .10 instead of .05).
• Smaller effect sizes always require larger sample sizes to achieve the desired power.
• Any increase in power is most likely achieved by increased sample size.

Types of Multivariate Techniques
Dependence techniques: a variable or set of variables is identified as the dependent variable to be predicted or explained by other variables known as independent variables.
o Multiple Regression
o Multiple Discriminant Analysis
o Logit/Logistic Regression
o Multivariate Analysis of Variance (MANOVA) and Covariance
o Conjoint Analysis
o Canonical Correlation
o Structural Equations Modeling (SEM)

[Figure: The relationships between multivariate dependence methods]

Types of Multivariate Techniques
Interdependence techniques: involve the simultaneous analysis of all variables in the set, without distinction between dependent variables and independent variables.
o Principal Components and Common Factor Analysis
o Cluster Analysis
o Multidimensional Scaling (perceptual mapping)
o Correspondence Analysis

Selecting a Multivariate Technique
1. What type of relationship is being examined: dependence or interdependence?
2. Dependence relationship: How many variables are being predicted? What is the measurement scale of the dependent variable? What is the measurement scale of the predictor variable?
3. Interdependence relationship: Are you examining relationships between variables, respondents, or objects?

Two Broad Types of Multivariate Methods:
1. Dependence: one or more variables are designated as dependent and are predicted or explained by a set of independent variables.
2. Interdependence: all variables are analyzed simultaneously, with no distinction between dependent and independent variables.

[Figure: Selecting a multivariate technique]
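The selection questions can be sketched as a simple lookup. The mapping follows this chapter's classification of methods by type of relationship, number of dependent variables, and measurement scale; the function name, argument names, and string labels are hypothetical, and a real choice would also weigh research objectives and assumptions.

```python
# Dependence methods, keyed by (number of dependent variables, scale of the dependent variable).
DEPENDENCE = {
    ("one", "metric"): ["multiple regression", "conjoint analysis"],
    ("one", "nonmetric"): ["discriminant analysis", "logistic regression"],
    ("several", "metric"): ["MANOVA", "canonical correlation"],
    ("several", "nonmetric"): ["canonical correlation with dummy variables"],
}

# Interdependence methods, keyed by the measurement scale of the variables.
INTERDEPENDENCE = {
    "metric": ["CFA", "factor analysis", "cluster analysis", "metric MDS"],
    "nonmetric": ["nonmetric MDS", "correspondence analysis"],
}


def suggest_techniques(relationship, scale=None, n_dependent=None):
    """Return candidate techniques for a research problem (illustrative only)."""
    if relationship == "interdependence":
        return INTERDEPENDENCE[scale]
    if relationship == "dependence":
        if n_dependent == "multiple relationships":
            return ["structural equations modeling (SEM)"]
        return DEPENDENCE[(n_dependent, scale)]
    raise ValueError("relationship must be 'dependence' or 'interdependence'")


print(suggest_techniques("dependence", scale="nonmetric", n_dependent="one"))
```

Keeping the mapping in plain dictionaries makes it easy to compare against the classification chart and to adjust if a course uses a different grouping.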
Selecting the Correct Multivariate Method
Multivariate Methods
• Dependence Methods
  o One Dependent Variable
    - Metric: Multiple Regression and Conjoint
    - Nonmetric: Discriminant Analysis and Logit
  o Several Dependent Variables
    - Metric: MANOVA and Canonical
    - Nonmetric: Canonical Correlation with Dummy Variables
  o Multiple Relationships: Structural Equations (SEM)
• Interdependence Methods
  o Metric: CFA, Factor Analysis, Cluster Analysis, Metric MDS
  o Nonmetric: Nonmetric MDS and Correspondence Analysis

Multiple Regression
. . . a single metric dependent variable is predicted by several metric independent variables.
E.g.
• Monthly expenditures on dining out (Y), predicted from family income, family size, and age of head of household.
• Sales (Y), predicted from expenditures on advertising, number of salespeople, and number of stores carrying the products.
[Diagram: X1, X2 → Y]

Discriminant Analysis
• What is it? . . . a single nonmetric (categorical) dependent variable is predicted by several metric independent variables.
• Why use it?

Logistic Regression (Logit analysis)
• A single nonmetric dependent variable is predicted by several independent variables.
• This technique is similar to discriminant analysis (the difference is that it accepts both metric and nonmetric independent variables and does not require multivariate normality), but it relies on calculations more like regression (with differences in estimation method and assumptions).
E.g. Financial advisors trying to select emerging firms for start-up investment review past records and place firms in two groups: successful over a 5-year period and unsuccessful over a 5-year period. Using financial and managerial data, they identify the variables that best differentiate between successful and unsuccessful firms, in order to select the best candidates in the future.
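The start-up example can be sketched as the scoring step of a logit model that has already been estimated. The intercept, coefficients, predictor names, and the 0.5 cutoff below are all assumptions made up for illustration; the slides do not give a fitted model.

```python
import math


def logit_probability(intercept, coefficients, values):
    """Predicted probability of group membership from a logit model."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Numerically stable form of 1 / (1 + exp(-linear)).
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


def classify(probability, cutoff=0.5):
    """Assign a firm to a group using an assumed cutoff."""
    return "successful" if probability >= cutoff else "unsuccessful"


# Hypothetical coefficients for two predictors: a debt ratio (metric) and
# whether the founders have prior start-up experience (nonmetric, coded 0/1).
intercept = -0.5
coefficients = [-2.0, 1.5]

firm = [0.3, 1]  # debt ratio 0.3, experienced founders
p = logit_probability(intercept, coefficients, firm)
print(round(p, 3), classify(p))
```

The 0/1 experience indicator shows how a nonmetric predictor enters the model alongside a metric one.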
MANOVA
Several metric dependent variables are predicted by a set of nonmetric (categorical) independent variables.
E.g. A company wants to know if a humorous ad will be more effective than a nonhumorous ad. It develops two ads, one humorous and one nonhumorous, and shows them to a group of customers. After seeing the ads, the customers rate the company and its products on dimensions such as modern versus traditional or high quality versus low quality. MANOVA would be the technique to use to determine the extent of any statistical differences between the perceptions of customers who saw the humorous ad and those who saw the nonhumorous one.

CANONICAL ANALYSIS
Logical extension to multiple regression analysis
• Several metric dependent variables are predicted by several metric independent variables.
• A linear combination of each set of variables (both independent and dependent) is developed so as to maximize the correlation between the two sets.
• The procedure involves obtaining a set of weights for the dependent and independent variables that provides the maximum simple correlation between the set of dependent variables and the set of independent variables.
E.g. A company collects information on its service quality based on answers to 50 metrically measured questions.
• The study includes benchmarking information on perceptions of the service quality of world-class companies as well as the company for which the research is being conducted.
• Canonical correlation is used to compare the perceptions of the world-class companies on the 50 questions with the perceptions of the company. The research could then conclude whether the perceptions of the company are correlated with those of world-class companies.
• The technique provides information on the overall correlation of perceptions as well as the correlation between each of the 50 questions.

CONJOINT ANALYSIS
. . . is used to understand respondents' preferences for products and services.
• In doing this, it determines the importance of both attributes and levels of attributes . . .
• . . . based on a smaller subset of combinations of attributes and levels.
E.g. Assume a product concept has three attributes (price, quality, and color), each at three possible levels (e.g., red, yellow, and blue for color). Instead of having to evaluate all 27 (3 * 3 * 3) possible combinations, a subset (9 or more) can be evaluated for their attractiveness to consumers (e.g., the attractiveness of red versus yellow versus blue). The results can also be used in product design simulators, which show customer acceptance for any number of product formulations and aid in the design of the optimal product.

CONJOINT ANALYSIS
Typical Applications:
• Soft Drinks
• Candy Bars
• Cereals
• Beer
• Apartment Buildings; Condos
• Solvents; Cleaning Fluids

Structural Equations Modeling (SEM)
• Technique that allows separate relationships for each of a set of dependent variables.
• Provides the appropriate and most efficient estimation technique for a series of separate multiple regression equations estimated simultaneously.
• Two basic components:
  o (1) the structural model: the path model, which relates independent to dependent variables. In such situations, theory, prior experience, or other guidelines enable the researcher to distinguish which independent variables predict each dependent variable.
  o (2) the measurement model: enables the researcher to use several variables for a single independent or dependent variable.
For example, the dependent variable might be a concept represented by a summated scale, such as self-esteem.

Structural Equations Modeling (SEM)
E.g.
• A study by management consultants identified several factors that affect worker satisfaction: supervisor support, work environment, and job performance.
• Also, supervisor support and the work environment not only affected worker satisfaction directly, but had possible indirect effects through the relationship with job performance, which was also a predictor of worker satisfaction.
• To assess these relationships, multi-item scales were developed for each construct (supervisor support, work environment, job performance, and worker satisfaction).
[Path diagram: Work environment, Supervisor support, Job performance, Worker satisfaction]

Factor analysis
. . . analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).
E.g.
• A researcher can use factor analysis to better understand the relationships between customers' ratings of a fast-food restaurant.
• Assume you ask customers to rate the restaurant on the following six variables: food taste, food temperature, freshness, waiting time, cleanliness, and friendliness of employees.
• The analyst would like to combine these six variables into a smaller number. By analyzing the customer responses, the analyst might find that:
  o the variables food taste, temperature, and freshness combine to form a single factor of food quality, whereas
  o the variables waiting time, cleanliness, and friendliness of employees combine to form another single factor, service quality.

Cluster Analysis
• . . . groups objects (respondents, products, firms, variables, etc.)
so that each object is similar to the other objects in the cluster and different from objects in all the other clusters.
• Cluster analysis usually involves at least three steps:
  o 1) measurement of some form of similarity or association among the entities to determine how many groups really exist in the sample.
  o 2) the actual clustering process, whereby entities are partitioned into groups (clusters).
  o 3) profiling of the persons or variables to determine their composition.
E.g. A restaurant owner wants to know whether customers are patronizing the restaurant for different reasons. Data could be collected on perceptions of pricing, food quality, and so forth. Cluster analysis could be used to determine whether some subgroups (clusters) are highly motivated by low prices, whereas others are much less motivated to come to the restaurant by price considerations.

Perceptual mapping (multidimensional scaling)
• The objective is to transform consumer judgments of similarity or preference into distances represented in multidimensional space.
• If objects A and B are judged by respondents as being the most similar compared with all other possible pairs of objects, perceptual mapping techniques will position objects A and B in such a way that the distance between them in multidimensional space is smaller than the distance between any other pair of objects.
E.g. The owner of a Burger King franchise wants to know whether the strongest competitor is McDonald's or Wendy's. A sample of customers is given a survey and asked to rate the pairs of restaurants from most similar to least similar. The results show that Burger King is most similar to Wendy's, so the owner knows that the strongest competitor is Wendy's, because it is thought to be the most similar. Follow-up analysis can identify what attributes influence perceptions of similarity or dissimilarity.
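The restaurant example can be sketched as a small metric MDS fit. This is a minimal sketch that minimizes raw stress by plain gradient descent, which is one simple approach and not necessarily the algorithm a statistics package would use; the dissimilarity ratings, step size, and iteration count are made-up assumptions.

```python
import math
import random


def map_distance(coords, i, j):
    """Euclidean distance between points i and j on the perceptual map."""
    return math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])


def stress(coords, dissimilarities):
    """Raw stress: squared gaps between map distances and rated dissimilarities."""
    return sum((map_distance(coords, i, j) - target) ** 2
               for (i, j), target in dissimilarities.items())


def mds_2d(dissimilarities, n_points, steps=2000, learning_rate=0.05, seed=0):
    """Place n_points in two dimensions by gradient descent on raw stress."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_points)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n_points)]
        for (i, j), target in dissimilarities.items():
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            dist = math.hypot(dx, dy) or 1e-9  # guard against division by zero
            scale = 2.0 * (dist - target) / dist
            grads[i][0] += scale * dx
            grads[i][1] += scale * dy
            grads[j][0] -= scale * dx
            grads[j][1] -= scale * dy
        for point, grad in zip(coords, grads):
            point[0] -= learning_rate * grad[0]
            point[1] -= learning_rate * grad[1]
    return coords


# Hypothetical average dissimilarity ratings (higher = less similar).
BURGER_KING, MCDONALDS, WENDYS = 0, 1, 2
ratings = {(BURGER_KING, MCDONALDS): 5.0, (BURGER_KING, WENDYS): 2.0, (MCDONALDS, WENDYS): 4.0}

positions = mds_2d(ratings, n_points=3)
for (i, j) in ratings:
    print(i, j, round(map_distance(positions, i, j), 2))
```

With these made-up ratings, the pair rated most similar (Burger King and Wendy's) should end up closest together on the map, which is the property the slide describes.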
Correspondence Analysis
. . . uses nonmetric data and evaluates either linear or nonlinear relationships in an effort to develop a perceptual map representing the association between objects (firms, products, etc.) and a set of descriptive characteristics of the objects.
E.g.
• Respondents' brand preferences can be cross-tabulated on demographic variables (gender, income categories, occupation) by indicating how many people preferring each brand fall into each category of the demographic variables.
• Through correspondence analysis, the association of brands and the distinguishing characteristics of those preferring each brand are then shown in a two- or three-dimensional map of both brands and respondent characteristics.
• Brands perceived as similar are located close to one another.
• The most distinguishing characteristics of respondents preferring each brand are also determined by the proximity of the demographic variable categories to the brand's position.

Guidelines for Multivariate Analysis
o Establish Practical Significance as well as Statistical Significance.
o Sample Size Affects All Results.
o Know Your Data.
  o Descriptive data; charts
o Strive for Model Parsimony.
  o Irrelevant variables
  o Multicollinearity
o Look at Your Errors.
  o Starting points to diagnose validity and an indication of the remaining unexplained relationships
o Validate Your Results.
  o Splitting the sample (one part to estimate the model, the second to estimate predictive accuracy)
  o Gathering a separate sample
  o Bootstrapping: draw a large number of subsamples, estimate models, calculate means of estimated coefficients, and examine the actual values from the repeated samples.
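The bootstrapping guideline can be sketched with a deliberately small model. A one-predictor least-squares slope stands in here for "the model", which is a simplification, and the data values are made up; resamples are drawn with replacement and have the same size as the original sample, which is the usual bootstrap convention rather than something the slides specify.

```python
import random
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=500, seed=0):
    """Re-estimate the slope on many resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    return slopes


# Made-up data, roughly y = 2x + 1 with small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.2, 16.9, 19.1, 20.8]

slopes = bootstrap_slopes(xs, ys)
mean_slope = mean(slopes)
print(round(mean_slope, 2), round(min(slopes), 2), round(max(slopes), 2))
```

Looking at the spread between the smallest and largest resampled slopes, not only their mean, corresponds to examining the actual values from the repeated samples.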
A Structured Approach to Multivariate Model Building:
Stage 1: Define the Research Problem, Objectives, and Multivariate Technique(s) to be Used
Stage 2: Develop the Analysis Plan
Stage 3: Evaluate the Assumptions Underlying the Multivariate Technique(s)
Stage 4: Estimate the Multivariate Model and Assess Overall Model Fit
Stage 5: Interpret the Variate(s)
Stage 6: Validate the Multivariate Model

Description of HBAT Primary Database Variables

Variable  Description                                          Variable Type

Data Warehouse Classification Variables
X1        Customer Type                                        nonmetric
X2        Industry Type                                        nonmetric
X3        Firm Size                                            nonmetric
X4        Region                                               nonmetric
X5        Distribution System                                  nonmetric

Performance Perceptions Variables
X6        Product Quality                                      metric
X7        E-Commerce Activities/Website                        metric
X8        Technical Support                                    metric
X9        Complaint Resolution                                 metric
X10       Advertising                                          metric
X11       Product Line                                         metric
X12       Salesforce Image                                     metric
X13       Competitive Pricing                                  metric
X14       Warranty & Claims                                    metric
X15       New Products                                         metric
X16       Ordering & Billing                                   metric
X17       Price Flexibility                                    metric
X18       Delivery Speed                                       metric

Outcome/Relationship Measures
X19       Satisfaction                                         metric
X20       Likelihood of Recommendation                         metric
X21       Likelihood of Future Purchase                        metric
X22       Current Purchase/Usage Level                         metric
X23       Consider Strategic Alliance/Partnership in Future    nonmetric

Multivariate Analysis Learning Checkpoint
1. What is multivariate analysis?
2. Why use multivariate analysis?
3. Why is knowledge of measurement scales important in using multivariate analysis?
4. What basic issues need to be examined when using multivariate analysis?
5. Describe the process for applying multivariate analysis.