Multivariate Analysis of Variance – MANOVA
Joe F. Hair, Jr.
Kennesaw State University
For more details, see Chapter 7, Hair, et
al, Multivariate Data Analysis, 7e, 2010.
What is MANOVA? Concepts
• A statistical method for testing whether the vectors of means (variates)
  across groups on multiple variables are equal (i.e., whether any
  differences in the variate means across several groups could be due
  solely to sampling error).
• Variables in ANOVA (Analysis of Variance):
   - Dependent variables are metric.
   - Independent variable(s) are nominal with two or more levels – also
     called treatments, manipulations, or factors.
• One-way MANOVA: only one independent variable with two or more
  levels.
• Two-way MANOVA: two independent variables each with two or more
  levels (factorial design). One or more could be control variables.
• With MANOVA (Multivariate Analysis of Variance), two or more metric
  dependent variables are tested as the outcome of one or more treatments.
• With MANOVA there are two variates – one for the dependent
  variables and another for the independent variables. The dependent
  variate optimally combines the multiple dependent measures into a single
  value that maximizes the differences across the groups.
MANOVA Concepts continued . . .
• Factorial design – a research design that has two or more non-metric
  independent variables (factors or treatments).
• Control (blocking) variables – independent variables not of interest to
  the study that may be a source of differences that obscure some
  results of interest to the study.
• Metric control variables – independent variables you wish to control
  for in the analysis. They are included as covariates (ANCOVA &
  MANCOVA).
• Any time your analysis includes more than one independent variable,
  interaction effects are created.
Null and Alternative Hypotheses
H0: The vectors of means on multiple dependent variables
across groups are the same (equal).
Ha: The vectors of means on multiple dependent variables
for at least one pair of groups are different.
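A minimal sketch of how the null hypothesis can be assessed numerically, assuming two dependent variables and small made-up groups (not HBAT data). It computes Wilks' lambda, one of the four multivariate statistics SPSS reports, as det(W) / det(T), where W is the within-groups and T the total sums-of-squares-and-cross-products matrix. The function names are illustrative only.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, center):
    """2x2 sums-of-squares-and-cross-products matrix around `center`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(T) for two dependent variables."""
    all_rows = [r for g in groups.values() for r in g]
    total = sscp(all_rows, mean_vector(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups.values():
        s = sscp(g, mean_vector(g))
        for i in range(2):
            for j in range(2):
                within[i][j] += s[i][j]
    return det2(within) / det2(total)


# Hypothetical ratings on two dependent variables for two groups.
toy_groups = {
    "group_0": [(4.0, 5.0), (5.0, 6.0), (6.0, 5.5)],
    "group_1": [(7.0, 8.0), (8.0, 7.5), (9.0, 9.0)],
}
print(round(wilks_lambda(toy_groups), 3))
```

Values near 1 mean the group mean vectors are nearly identical (consistent with H0); values near 0 mean most of the total variation lies between groups. Turning lambda into a significance level requires an F approximation that is not shown here.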
How do you determine which means
are significantly different?
• Hotelling’s T² assesses whether you can conclude
that statistical differences are present somewhere
among the vectors of means across the groups.
• To identify where the differences are between each of
the groups (when there are three or more groups) you
must use follow-up (post-hoc) tests called “multiple
comparison tests”. Many multiple comparison tests
are available in SPSS.
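For the two-group case, Hotelling’s T² can be sketched as follows, assuming two dependent variables and a pooled covariance matrix. The data are made up for illustration, and the helper names are not from SPSS or any library.

```python
def mean_vector(rows):
    """Column means of a list of (y1, y2) observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, center):
    """2x2 sums-of-squares-and-cross-products matrix around `center`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m


def hotelling_t2(group_a, group_b):
    """T^2 = n1*n2/(n1+n2) * d' S^-1 d, with S the pooled covariance."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = mean_vector(group_a), mean_vector(group_b)
    w1, w2 = sscp(group_a, m1), sscp(group_b, m2)
    # Pooled covariance matrix: summed within-group SSCP / (n1 + n2 - 2).
    s = [[(w1[i][j] + w2[i][j]) / (n1 + n2 - 2) for j in range(2)]
         for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    # Quadratic form d' S^-1 d using the closed-form 2x2 inverse.
    quad = (d[0] ** 2 * s[1][1] - 2 * d[0] * d[1] * s[0][1]
            + d[1] ** 2 * s[0][0]) / det
    return n1 * n2 / (n1 + n2) * quad


def t2_to_f(t2, n1, n2, p=2):
    """Exact F transformation of T^2 with (p, n1 + n2 - p - 1) df."""
    return (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2


indirect = [(4.0, 5.0), (5.0, 6.0), (6.0, 5.5)]  # hypothetical ratings
direct = [(7.0, 8.0), (8.0, 7.5), (9.0, 9.0)]    # hypothetical ratings
t2 = hotelling_t2(indirect, direct)
print(round(t2, 2), round(t2_to_f(t2, len(indirect), len(direct)), 2))
```

A significant F only says the mean vectors differ somewhere; with three or more groups, the post-hoc comparisons described above are still needed to locate the differences.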
MANOVA Terms
Main Effect = the impact any single experimental
variable has on a response (dependent) variable.
Interaction Effect = the combined impact of multiple
independent variables on a response variable; e.g., is the
difference in the mean ratings of the ads (response
variable) the same when we compare males and
females?
Blocking Variable = a grouping variable the researcher
doesn’t manipulate or control in any way, such as gender.
What multiple comparison tests
are available in SPSS?
• Games-Howell: recommended if equal variances are not assumed.
• Scheffe: recommended if equal variances are assumed.
What assumptions need to be considered?
• Samples are independent (independence of observations).
• Dependent variables are normally distributed for each of the
  samples – with larger sample sizes ( > 20/group) this is not a serious
  problem should it be violated somewhat.
• Dependent variables should be correlated with one another, but not so
  highly that they are redundant (extreme multicollinearity).
• Sample sizes for the groups should not be very different (a
  ratio of 1.5 or higher across groups may be a problem).
• The variances for the different populations from which the
  samples are drawn are equal (homoscedasticity = homogeneity of
  the variance-covariance matrices among the groups) – possibly
  a problem if they are not equal or at least comparable.
• Outliers need to be identified and removed from the sample, or
  other adjustments undertaken.
• In general MANOVA is a fairly robust procedure.
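The two sample-size rules of thumb in the list above can be checked with a short sketch. The thresholds (more than 20 per group, largest-to-smallest ratio below 1.5) come from the slide; the function name and the group sizes are hypothetical.

```python
def check_group_sizes(group_sizes, min_n=20, max_ratio=1.5):
    """Return warnings for groups that break the rules of thumb."""
    warnings = []
    for name, n in group_sizes.items():
        if n <= min_n:
            warnings.append(
                f"{name}: n = {n} is not above {min_n}; "
                "normality violations matter more"
            )
    ratio = max(group_sizes.values()) / min(group_sizes.values())
    if ratio >= max_ratio:
        warnings.append(
            f"largest/smallest group ratio = {ratio:.2f} (>= {max_ratio})"
        )
    return warnings


# Hypothetical group sizes, not taken from the HBAT output.
for message in check_group_sizes({"Group A": 16, "Group B": 40}):
    print(message)
```

These checks are screening aids only; they flag designs that deserve a closer look rather than deciding whether MANOVA is appropriate.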
HBAT DATABASE VARIABLES
Example 1: Two Group MANOVA
HBAT 200 Database (MDA, pp. 383-388)
• Independent variables are: X5 (Distribution System) and X1
  (Customer Type).
• Dependent variables are: X19 (Satisfaction), X20 (Likelihood of
  Recommending HBAT), and X21 (Likelihood of Future Purchases).
• Research questions are:
1. What differences are present in customer satisfaction
and other purchase outcomes between the two channels in
the distribution system?
2. Is HBAT establishing better relationships with its customers
over time, as reflected in customer satisfaction and other
purchase outcomes?
3. What is the relationship between the distribution system and
these relationships with customers in terms of the purchase
outcomes?
USING SPSS TO DO A Two Group MANOVA
The metric dependent variables for this hypothesis are X19 (Satisfaction), X20
(Likelihood of Recommending HBAT), and X21 (Likelihood of Future Purchases).
The nonmetric independent variable is X5 (Distribution System). The click-through
sequence to run the two group MANOVA is: ANALYZE → GENERAL LINEAR
MODEL → MULTIVARIATE. Click on X19, X20 and X21 to highlight them and then on
the arrow box to move them into the Dependent Variables box. Click on X5 to
highlight it and then on the arrow box to move it to the box labeled “Fixed Factors.”
Click on the Options tab and highlight the (OVERALL) and X5 in the Factors and
Factor Interactions window, and move them into the Display Means for: window.
Next check the Compare main effects box and then in the Display window below click
on Descriptive statistics, Observed power and Homogeneity Tests (Levene test of
equal variances) and then Continue. Now go to the Plots tab and highlight X5 and
move it to the Horizontal Axis window and then under Plots below click Add. Finally,
click on Continue and then OK to execute the program.
Differences between Two Independent Groups
X5 (Distribution System)
0 = Indirect through Broker, 1 = Direct to Customer
Higher means = more favorable ratings (0-10 scale). Direct to Customer
consistently rated higher.
Assessing Assumptions – Homoscedasticity
Levene’s Test is a univariate assessment of homoscedasticity. The tests
for all three metric dependent variables are not significant, indicating
univariate homogeneity of variance across the two groups.
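To make the logic of Levene’s test concrete, here is a minimal sketch of the classic mean-centered W statistic for one dependent variable. It is an illustration, not a reproduction of the SPSS output: the data are made up, and the significance level (which requires the F distribution with k - 1 and N - k degrees of freedom) is not computed.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W: one-way ANOVA F on absolute deviations from group means."""
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    n_total = sum(len(zi) for zi in z)
    k = len(z)
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2
                  for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical ratings: similar spread in both groups gives a small W.
group_0 = [5.0, 6.0, 7.0, 8.0, 9.0]
group_1 = [6.5, 7.5, 8.5, 9.5, 10.0]
print(round(levene_w([group_0, group_1]), 3))
```

A small W (non-significant) is what the slide reports for X19, X20 and X21: the spread of scores is comparable in the two distribution groups.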
Assessing Assumptions – Homoscedasticity
Box’s M is a multivariate
assessment of homoscedasticity.
The non-significant result
indicates homogeneity of the
variance-covariance matrices
across the two groups.
Assessing Assumptions – Outliers
The three dependent variables were examined for
outliers using the SPSS Explore and Box Plots
options. These three plots identify several outliers,
but none was an outlier for all three variables.
Respondent 38 was an outlier twice for the Direct
to Customer group and was considered for
removal, but a decision was made to retain this respondent.
The outliers are at the high end of the distribution,
individuals that are very favorable about HBAT. But
overall the observations are about equally above
and below the mode.
Assessing Assumptions – Normality
Histograms
The histograms below show the data for X19 – Satisfaction divided by the two
distribution groups (X5). There is some deviation from normal, but for social
science empirical data this type of deviation is typical. Similar charts were
completed for variables X20 and X21 and the overall conclusion was the
deviations from normal were not sufficient to justify data transformations.
Assessing Assumptions – Normality
Normal Q-Q Plot
A Q-Q plot displays observed values against a known distribution, in this case a normal
distribution. If the sample distribution is normal, the plot will have observations distributed
closely around the straight line. In the above chart, the expected normal distribution is the
straight line angled at 45 degrees and the line of little circles is the observed values from the
HBAT sample data for X19 – Satisfaction (i.e., the actual sample distribution shown as it
deviates from the straight line). The plot shows the distribution is approximately normal.
Assessing Assumptions – Normality
Detrended Normal Q-Q Plot
The detrended normal Q-Q plot shows the differences between the observed and
expected values of a normal distribution. If the distribution is normal, the points should
cluster in a horizontal band around zero with no pattern. The charts above suggest some
deviation from normal, particularly the left chart, which has an up-and-down pattern. Our
overall conclusion is that this distribution is close enough to treat as a normal distribution,
and similar to most empirical data in social science research.
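The points behind a normal Q-Q plot and its detrended version can be computed with the Python standard library. This is a sketch under stated assumptions: the plotting position (i - 0.5) / n is one common convention and may differ from the one SPSS uses, and the ratings are made up.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (observed z, expected normal z, deviation) per observation."""
    xs = sorted(values)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    points = []
    for i, x in enumerate(xs, start=1):
        # Expected z-score for the i-th smallest value under normality.
        expected_z = std_normal.inv_cdf((i - 0.5) / n)
        observed_z = (x - m) / s
        # The deviation is what the detrended plot puts on its y-axis.
        points.append((observed_z, expected_z, observed_z - expected_z))
    return points


ratings = [5.4, 6.1, 6.8, 7.0, 7.3, 7.9, 8.2, 8.8, 9.9]  # hypothetical
for observed, expected, deviation in qq_points(ratings):
    print(f"{observed:6.2f} {expected:6.2f} {deviation:6.2f}")
```

Plotting the first two columns against each other gives the Q-Q plot; plotting the third column against the first gives the detrended plot, where a patternless band around zero suggests normality.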
Assessing Assumptions – Normality
Statistical Tests
The above tests of normality can be obtained from the Explore option. Both tests calculate
the level of significance for the differences from a normal distribution. The tests are not
useful for very small samples (N < 30), are quite sensitive with large samples (N = 1,000+),
and should be interpreted cautiously with samples that range from N = 200 to 1,000.
The tests for variables X20 and X21 were both not significant, indicating no issues with
normality. Both tests for X19 are significant but based on other information the deviations
from normality for this variable are acceptable and do not require transformation.
Running Explore – SPSS Steps
Statistics and Plots tabs: what to check?
Multivariate Statistical Testing
Question: Do the two groups exhibit statistically significant
differences on the three purchase outcome variables? Yes
The four multivariate tests all confirm statistically significant
differences between the two types of distribution. These
four tests assess the set of three variables combined.
The univariate tests below assess each variable separately
and all indicate statistically significant differences. See
means on slide 12 of this presentation.
The power for the statistical tests was 1.0, indicating the
sample sizes and effect sizes were sufficient to detect
significant differences if they exist.
Overall Interpretation of MANOVA Results
Differences between Two Independent Groups
X5 (Distribution System)
0 = Indirect through Broker 1 = Direct to Customer
The results summarized in the previous slides on
variable X5 confirm that the type of distribution channel does
affect customer perceptions with regard to the three purchase
outcomes. The statistically significant differences, which are
of sufficient magnitude to base managerial decisions on, and
the consistently more favorable evaluations (higher means)
on the purchase outcomes variables (X19, X20 & X21),
indicate that the direct distribution channel is more effective
in creating positive customer perceptions on a wide range of
purchase outcomes.
Differences between Three Independent Groups
X1 (Length of Time a Customer)
1 = Less than 1 Year
2 = 1 to 5 Years
3 = Over 5 Years
The sample sizes are almost equally split between the groups. The
group means are larger for longer term customers indicating more
favorable purchase outcome perceptions.
The multivariate test of homoscedasticity above
(Box’s M) indicates that heteroscedasticity is not
present (not significant, p = .069).
Levene’s univariate test for equality of error
variances is not significant for two variables
(X20 & X21), indicating homoscedasticity, but is
highly significant for X19 (.001), indicating that
heteroscedasticity is likely present in that
variable. Given the relatively large sample sizes
in all the groups and homoscedasticity for
variables X20 and X21, corrective remedies are
not necessary for X19.
MANOVA Results
Differences between Three Independent Groups
X1 (Length of Time a Customer)
1 = Less than 1 Year 2 = 1 to 5 Years 3 = Over 5 Years
Interpreting MANOVA results when an independent variable has three or
more levels requires a two-step process:
1. Examination of the main effect of the independent variable on the
dependent variables. The two tests on the next slide (#28) – Multivariate
Tests and Univariate Tests – indicate there is a significant main effect
across the three groups on the dependent variables.
2. Identifying differences, if any, on each of the three dependent variables
between the individual groups of the independent variable. The
post hoc tests on slide #29 indicate significant differences between all
three groups on variable X19, but for variables X20 and X21 the
differences between the 1 to 5 Years and Over 5 Years groups are
not statistically significant.
This second level of analysis is necessary when an independent
variable has three or more groups.
Main Effects Tests
The four multivariate tests all confirm statistically significant differences on the three
dependent variables combined (as a variate) between the three groups based on length of
time a customer. The four tests assess the set of three dependent variables combined.
The univariate tests above assess each variable separately across
all groups. The results indicate statistically significant differences
between the three groups. See means on previous slide.
The post hoc tests below assess each variable separately and between all combinations
of the three groups. Results indicate statistically significant differences between all three
groups for variable X19, but the 1 – 5 Years and Over 5 Years groups for variables X20
and X21 are not significantly different.
MANOVA Results
Differences between Three Independent Groups
X1 (Length of Time a Customer)
1 = Less than 1 Year 2 = 1 to 5 Years 3 = Over 5 Years
The results summarized in the previous slides on variable
X1 confirm that the Length of Time a firm is a customer does affect
customer perceptions with regard to the three purchase outcomes.
The overall main effects measured using both multivariate and
univariate tests are statistically significant. Moreover, the consistently
more favorable evaluations (higher means) on the three purchase
outcomes variables (X19, X20 & X21) indicate that longer term
customers are more favorable. But when examined individually we
see that for X20 and X21 the differences between the 1 – 5 Years
and Over 5 Years customer groups are not statistically
significant.
Factorial Design – MANOVA with Two Independent Variables
X1 (Length of Time a Customer) and X5 (Distribution System)
MANOVA Example – 2 x 3 Design
Factorial Design with Two Independent variables
X1 and X5
Research Questions:
1. What are the main effects of the two independent variables?
2. What are the interaction effects?
Sample Size Considerations
The sample sizes are generally OK with the exception of Direct to Customer
for Less than 1 year (N = 16). This means statistical significance related
to this group must be interpreted cautiously.
Homoscedasticity Tests
As with prior MANOVA examples, the assumption of greatest
importance is the homogeneity of variance-covariance matrices
across groups.
The multivariate test of homoscedasticity (Box’s M) indicates
that heteroscedasticity is not present (not significant, p = .153).
Levene’s univariate test for equality of error variances
is not significant for X20 and X21 (.113 & .425), indicating
homoscedasticity. X19 is also not significant (.059),
indicating that heteroscedasticity is likely not present in any
of the three variables.
The MANOVA model for a factorial design tests not only for the main effects of both
independent variables, but also their interaction or joint effect on the dependent variables. The
first step is to examine the interaction effect and determine whether it is statistically significant.
If it is significant, then the researcher must confirm that the interaction effect is ordinal. If it is
found to be disordinal, the statistical tests of main effects are not valid. But assuming a
significant ordinal or a nonsignificant interaction effect, the main effects can be interpreted
directly without adjustment.
Interaction effects can be identified both graphically and statistically. The most common
graphical means is to create line charts depicting pairs of independent variables. As illustrated
in the graphs on the following slides, significant interaction effects are represented by
nonparallel lines (with parallel lines denoting no interaction effect). If the lines depart from
parallel but never cross to a significant degree, then the interaction is deemed ordinal. If the
lines do cross to the degree that in at least one instance the relative ordering of the lines is
reversed, then the interaction is deemed disordinal.
The next slide portrays each dependent variable across the six groups, indicating by the
nonparallel pattern that an interaction may exist. As we can see in each graph, the middle
level of X1 (1 to 5 years with HBAT) has a substantially smaller difference between the two
lines (representing the two distribution channels) than the other two levels of X1. We can
confirm this observation by examining the group means from slide #34. Using X19
(Satisfaction) as an example, we see that the difference between direct and indirect
distribution channels is 1.138 for customers of less than 1 year, which is quite similar to the
difference between channels (1.325) for customers of greater than 5 years. However, for
customers served by HBAT from 1 to 5 years, the difference between customers of the two
channels is only .285. Thus, the differences between the two distribution channels, although
found to be significant in earlier examples, can be shown to differ (interact) based on how long
the customer has been with HBAT. The interaction is deemed ordinal because in all instances
the direct distribution channel has higher satisfaction scores.
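The reasoning above can be sketched in a few lines, using the three X19 differences (direct minus indirect) quoted in the text. The function name and labels are illustrative, and the classification is purely descriptive: it does not replace the statistical test of the interaction.

```python
def classify_interaction(differences, tolerance=0.0):
    """Describe an interaction from line-to-line gaps at each factor level.

    `differences` maps each level of one factor to the gap between the two
    lines (here: direct minus indirect channel mean) at that level.
    """
    values = list(differences.values())
    if max(values) - min(values) <= tolerance:
        return "no interaction (parallel lines)"
    if all(v > 0 for v in values) or all(v < 0 for v in values):
        return "ordinal"      # gaps differ, but the ordering never reverses
    return "disordinal"       # the ordering of the lines reverses somewhere


# X19 (Satisfaction) differences between channels, as reported in the text.
x19_differences = {
    "Less than 1 year": 1.138,
    "1 to 5 years": 0.285,
    "Over 5 years": 1.325,
}
print(classify_interaction(x19_differences))  # prints "ordinal"
```

Because all three differences are positive, the direct channel is always higher and the interaction is ordinal, so the main effects can still be interpreted.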
Graphical Displays of Interaction
Effects of Purchase Outcomes
(X19, X20 & X21)
Across Groups (X1 & X5)
Testing Interaction and Main Effects
Effect size is much smaller for interaction
than for independent variables (X1 & X5).
The above table contains the MANOVA results for testing both the interaction and main effects. To
test for a significant interaction effect you first examine the multivariate effects and in this case all four tests
are statistically significant. Next (see next slide), univariate tests for each dependent variable are
examined – results show the interaction effect is significant for each of the three dependent variables.
The statistical tests confirm what was shown in the graphs: A significant ordinal interaction effect
occurs between X5 and X1.
Note: normality and presence of outliers for the variables included in this example were
examined and determined to be within acceptable ranges.
Testing Main Effects – Univariate
All relationships tested above (overall effects and the interaction effect)
were significant. Effect size (η²) is smaller for the interaction than for
the independent variables (X1 & X5).
Estimating Main Effects
If the interaction effect is deemed nonsignificant, or significant but ordinal, then the
next step examines the significance of the main effects for their differences across the
groups. If a disordinal interaction effect (crossing lines) is found, the main effects are
confounded by the disordinal interaction and tests for differences should not be
performed.
With a significant ordinal interaction (non-parallel lines that do not cross), the next step is
to determine whether both independent variables still have significant main effects when considered
simultaneously. The previous slides show the MANOVA results for the main effects of X1
and X5 and the tests for the interaction effect. X1 (Customer Type) and X5 (Distribution
System) have a significant impact (main effect) on the three purchase outcome variables,
both as a set (multivariate) and separately (univariate).
The impact of the two independent variables can be compared by examining the relative
effect sizes as shown by η² (eta squared). The effect sizes for each variable are
somewhat higher for X1 when compared to X5 on either the multivariate or univariate
tests. For example, with the multivariate tests the eta squared values for X1 range from
.244 to .488, but they are lower (all equal to .285) for X5. Similar patterns can be seen
on the univariate tests. This comparison gives an evaluation of practical significance
separate from the statistical significance tests. When compared to either independent
variable, however, the effect size attributable to the interaction effect is much smaller
(e.g., multivariate eta squared values ranging from .062 to .101).
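For the univariate tests, the effect size SPSS labels partial eta squared is the effect sum of squares divided by the effect plus error sums of squares. A minimal sketch, with hypothetical sums of squares rather than values from the HBAT output:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of (effect + error) variation attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Same quantity recovered from a reported F ratio and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical ANOVA table entries for one dependent variable.
ss_effect, df_effect = 30.0, 2
ss_error, df_error = 70.0, 100
f_value = (ss_effect / df_effect) / (ss_error / df_error)

print(round(partial_eta_squared(ss_effect, ss_error), 3))
print(round(partial_eta_squared_from_f(f_value, df_effect, df_error), 3))
```

Both routes give the same number, which is useful when only F and its degrees of freedom are reported. Comparing these values across X1, X5 and the interaction is what allows the statements about practical significance made above.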
Interpretation of Results
Interaction of X1 by X5
The non-parallel lines for each dependent measure are shown by the marked
narrowing of the differences in distribution channels for customers of one to five
years (see slide #38). While the effects of X1 and X5 are still present, there are
marked differences in these impacts depending on which specific sets of
customers we examine – direct distribution versus broker.
Main Effect of X1
This is illustrated for all three purchase outcomes by the upward sloping lines
across the three levels of X1 on the x-axis. The effects are consistent with
earlier findings in that all three purchase outcomes increase favorably as the
length of the relationship with HBAT increases.
Main Effect of X5
Shown by the separation of the two lines representing the two distribution
channels, with the direct distribution curve being consistently higher. Thus, the
direct distribution channel consistently generates more favorable purchase
outcomes.