Chapter 7
Statistics for Marketing & Consumer Research
Copyright © 2008 - Mario Mazzocchi
• Consider the situation where the means for more than two groups are compared, e.g. mean alcohol expenditure for:
(a) students; (b) unemployed; (c) employees
• One could run a set of two-mean comparison tests (students vs. unemployed, students vs. employees, employees vs. unemployed)
• However, as seen in lecture 6, each of these tests is subject to Type one error (the level of significance α), i.e. the probability of rejecting the null hypothesis when it is actually true
• Thus, the overall Type one error for the three tests taken jointly is larger than α, because the probability of committing at least one Type one error increases with the number of tests
• This is the so-called problem of inflated family-wise (or experiment-wise) Type one error
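The inflation can be illustrated numerically. The sketch below assumes the tests are independent, which is a simplification (pair-wise tests on the same groups are not), and the function name is chosen only for this illustration.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type one error across n_tests tests.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


alpha = 0.05
# Three pair-wise tests, as in the students / unemployed / employees example
fwer = familywise_error(alpha, 3)
print(f"Family-wise error for 3 tests at alpha={alpha}: {fwer:.4f}")
```

Under this assumption the family-wise error for three tests at α = 0.05 is about 0.14, almost three times the nominal level.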
• Analysis of variance (ANOVA) is an alternative approach to mean comparison for multiple groups
• It is a set of techniques for a variety of situations
• It is applicable to a sample of individuals that differ in one or more given factors
• It allows testing whether the variability in a variable is attributable to one (or more) factors
EFS: Total Alcoholic Beverages, Tobacco

Economic position of Household Reference Person    Mean     St. Dev.
Self-employed                                      18.56    19.0
Fulltime employee                                  14.64    18.5
Pt employee                                        12.39    15.0
Unempl.                                            19.48    19.7
Ret unoc over min ni age                            7.34    14.6
Unoc - under min ni age                            11.99    19.1
TOTAL                                              12.67    17.8

Are there significant differences across the means of these groups?
Or do the differences depend on the different levels of variability across the groups?
• Here the target variable is alcohol expenditure, the factor is the economic position of the HRP
• Basically, we investigate whether variation in the metric target variable can be attributed to variations in one or more categorical explanatory variables (the factors)
• A treatment is a combination of different factors in n-way analysis of variance
• Only one categorical variable (a single factor)
• Several levels (categories) for that factor
• The typical hypothesis tested through ANOVA is that the factor is irrelevant to explain differences in the target variable (i.e. the means are equal, as in bivariate mean comparisons/t-tests)
• Apart from the tested factor(s), the groups should be reasonably homogeneous with one another
• Null hypothesis (H0): all the means are equal
• Alternative hypothesis (H1): at least two means are different
Economic position of Household Reference Person

Group (g)                     1               2                   3             4         5                          6
                              Self-employed   Fulltime employee   Pt employee   Unempl.   Ret unoc over min ni age   Unoc - under min ni age
Observations                  x_11            x_12                x_13          x_14      x_15                       x_16
                              x_21            x_22                x_23          x_24      x_25                       x_26
                              x_31            x_32                x_33          x_34      x_35                       x_36
                              …               …                   …             …         …                          …
Number of observations (n)    n_1             n_2                 n_3           n_4       n_5                        n_6
Means                         x̄_1             x̄_2                 x̄_3           x̄_4       x̄_5                        x̄_6
Overall mean                  x̄

1. Decompose the total variation (sum of squares corrected for the mean)
2. Compute the F-test statistic
3. Choose the critical value
4. Interpret the result
• Suppose that we have n observations within each group and g groups

              Group (factor level)
Obs.          1       2       …    j       …    g
1             x_11    x_12    …    x_1j    …    x_1g
2             x_21    x_22    …    x_2j    …    x_2g
…             …       …       …    …       …    …
i             x_i1    x_i2    …    x_ij    …    x_ig
…             …       …       …    …       …    …
n             x_n1    x_n2    …    x_nj    …    x_ng
Group mean    x̄_1     x̄_2     …    x̄_j     …    x̄_g

TOTAL MEAN
\[ \bar{x} = \frac{1}{g}\sum_{j=1}^{g}\bar{x}_j \]

• SUM OF SQUARES (corrected for the mean)
VARIATION BETWEEN THE GROUPS +
VARIATION WITHIN EACH GROUP=
________________________________
TOTAL VARIATION
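With n_c observations in group c and n observations overall:

\[ s_T^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2}{n-1} \qquad \text{(TOTAL VARIANCE)} \]

\[ s_{BW}^2 = \frac{\sum_{c=1}^{g}\left(\bar{x}_c-\bar{x}\right)^2 n_c}{g-1} \qquad \text{(VARIANCE BETWEEN GROUPS, also written } s_B^2\text{)} \]

\[ s_W^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2}{n-g} \qquad \text{(VARIANCE WITHIN GROUPS)} \]

The sums of squares add up, and so do the degrees of freedom:

\[ \sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2 = \sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2 + \sum_{c=1}^{g}\left(\bar{x}_c-\bar{x}\right)^2 n_c \]

\[ (n-1) = (n-g) + (g-1) \]
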
• If the variation between the groups (the part explained by the factor) is significantly larger than the variation within the groups, then the factor is considered statistically relevant in explaining the differences
• The test statistic is computed as:
\[ F = \frac{s_B^2}{s_W^2} = \frac{\text{Variance between groups}}{\text{Variance within groups}} \]
• This test statistic compares the weight of the variance explained by the factors to the weight of the variance not explained by the factors
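The decomposition and the F statistic can be sketched in a few lines. The data below are made up purely for illustration, and the function name is not from the slides; the formulas are the between-groups and within-groups variances with g-1 and n-g degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of groups of observations."""
    all_obs = [x for grp in groups for x in grp]
    n, g = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    # Variation between groups: squared distance of each group mean from the
    # overall mean, weighted by the group size
    ss_between = sum(len(grp) * (mean(grp) - grand_mean) ** 2 for grp in groups)
    # Variation within groups: squared distance of each observation from its
    # own group mean
    ss_within = sum((x - mean(grp)) ** 2 for grp in groups for x in grp)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n - g)
    return ss_between, ss_within, ms_between / ms_within


# Hypothetical data: three groups with three observations each
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
ss_b, ss_w, f_stat = one_way_anova(groups)
print(ss_b, ss_w, f_stat)
```

For these made-up numbers the between-groups sum of squares is 38 and the within-groups sum of squares is 6, so the total variation (44) is split exactly between the two components.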
[Figure: rejection area of the test]
The F distribution F(df1, df2)
• Its shape (and hence the critical value) changes according to the degrees of freedom (number of observations and groups)
• It is a non-negative statistic (one-tailed test)
• For ANOVA testing: df1 = g-1 and df2 = n-g
[Screenshot: one-way ANOVA dialog box, with the Target variable and Factor fields labelled]
ANOVA
EFS: Total Alcoholic Beverages, Tobacco

                  Sum of Squares   df    Mean Square   F       Sig.
Between Groups    6171.784         5     1234.357      4.024   .001
Within Groups     151535.3         494   306.752
Total             157707.1         499

Sum of Squares: variation decomposition
df: degrees of freedom
Mean Square: variance between (first row), variance within (second row)
Sig.: p-value < 0.05
The null is rejected
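As a quick check, the mean squares and the F statistic in the table follow directly from the sums of squares and the degrees of freedom. The variable names below are chosen for this sketch only; the numbers are those reported in the table.

```python
# Figures taken from the ANOVA table above
ss_between, df_between = 6171.784, 5
ss_within, df_within = 151535.3, 494

ms_between = ss_between / df_between   # variance between groups
ms_within = ss_within / df_within      # variance within groups
f_stat = ms_between / ms_within

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
```

The p-value itself requires the F distribution with (5, 494) degrees of freedom, which is not computed here.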
• Contrasts allow testing hypotheses on specific sub-sets of the treatments (factor combinations).
• They open the way to further exploring the sources of variability when the null hypothesis of mean equality is rejected.
• Comparisons are usually based on a theory and planned before the analysis, thus they are also called planned comparisons.
• Linear contrasts are linear combinations of the means, allowing one to test other hypotheses than mean equality
• For example, one may want to test whether the mean for group one is double the mean for groups two and three, while the means of groups two and three are equal
• Contrasts are also useful after rejection of the null hypothesis of mean equality
• Rejection of the null hypothesis means that at least two means differ, but it does not say which ones actually differ
• Planned comparisons through linear contrasts can help
• Test whether chicken expenditure increases linearly with household size
• Check whether there are significant differences:
• Between households with one or two components and households with more components
• Considering the following groups
• Households with one component
• Households with two components
• Households with more than two components
• Considering the following comparison
• Households with four, five, six and seven components have equal means
Descriptives
In a typical week how much do you spend on fresh or frozen chicken (Euro)?

                                                                 95% Confidence Interval for Mean
Household components   N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
0                      1     4.8000   .                .            .             .             4.80      4.80
1                      82    4.2470   2.82338          .31179       3.6266        4.8673        .37       15.00
2                      145   5.0548   3.41626          .28371       4.4941        5.6156        .00       20.00
3                      93    6.3231   4.71695          .48912       5.3517        7.2946        .00       30.00
4                      87    6.7334   3.87396          .41533       5.9078        7.5591        .00       18.00
5                      24    7.5613   7.64258          1.56003      4.3341        10.7884       .00       30.00
6                      10    6.2730   3.25606          1.02966      3.9438        8.6022        .00       10.49
7                      1     6.7500   .                .            .             .             6.75      6.75
Total                  443   5.6677   4.13383          .19640       5.2817        6.0537        .00       30.00

7 GROUPS
• 1 and 2 components versus 3, 4, 5, 6 and 7
• Weights (they need to sum to 0)
Group:    1      2      3    4    5    6    7
Weight:   -2.5   -2.5   1    1    1    1    1
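A short sketch of how such a contrast is evaluated: the weights are checked to sum to zero and then applied to the group means. The means are those reported in the Descriptives table for households with one to seven components; assigning -2.5 to groups 1 and 2 and +1 to groups 3 to 7 is an assumption about the slide's layout (the opposite signs would define the same comparison).

```python
# Group means from the Descriptives table (household components 1..7)
means = {1: 4.2470, 2: 5.0548, 3: 6.3231, 4: 6.7334,
         5: 7.5613, 6: 6.2730, 7: 6.7500}

# Assumed assignment: groups 1 and 2 versus groups 3 to 7
weights = {1: -2.5, 2: -2.5, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}

# A valid contrast needs weights that sum to zero
if abs(sum(weights.values())) > 1e-12:
    raise ValueError("contrast weights must sum to zero")

# Value of the contrast: weighted sum of the group means
contrast_value = sum(weights[k] * means[k] for k in means)
print(round(contrast_value, 4))
```

A contrast value far from zero (relative to its standard error) speaks against the hypothesis that the two sets of households spend the same on average.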
• Helmert contrasts: the first treatment is compared with all of the remaining treatments, the second treatment with all the remaining treatments but the first, the third treatment with all of the remaining ones but the first two, and so on.
• By looking at the results of this battery of tests, it becomes possible to identify those groups whose difference from the others is most relevant.
• Polynomial contrasts: it is possible to test whether the trend in means follows a linear, quadratic or cubic sequence, or any polynomial relationship between the treatments
• Repeated contrasts: each treatment is compared with the one which follows
• Reverse Helmert contrasts (or difference contrasts): Helmert contrasts going backwards
• Simple contrasts: the user can choose the benchmark treatment between the first and the last category.
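The Helmert scheme described above can be written out as sets of weights. This is a sketch under the assumption that each treatment is compared with the plain average of the treatments that follow it; statistical packages may scale the weights differently. The function name is hypothetical.

```python
from fractions import Fraction


def helmert_contrasts(g):
    """Weights for g treatments: row k compares treatment k with the
    average of all the treatments that follow it."""
    rows = []
    for k in range(g - 1):
        remaining = g - k - 1
        row = ([Fraction(0)] * k            # treatments already compared
               + [Fraction(1)]              # the treatment under comparison
               + [Fraction(-1, remaining)] * remaining)  # the ones that follow
        rows.append(row)
    return rows


for row in helmert_contrasts(4):
    print([str(w) for w in row])
```

For four treatments this gives three contrasts, each summing to zero; the last one compares the third treatment with the fourth only.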
• Linear contrasts are carried out independently from each other
• Post-hoc tests consist of a set of paired comparisons, where the critical values are corrected to account for the problem of inflating the Type I error risk (rejecting the null hypothesis when it is true), measured by the cumulative Type I error or family-wise error.
• The approach to correcting the critical values determines the type of test being used. In SPSS:
– Scheffe’s test
– Bonferroni’s test
– Tukey’s test.
• Scheffé test: simultaneous comparisons for all potential pair-wise and linear combinations of treatments, using F critical values corrected to account for the number of treatments being compared
• Bonferroni post-hoc method: (1) run the usual pair-wise t-tests; (2) to account for the inflated Type one error rate, adjust the significance level of each test by dividing the family-wise error by the number of tests
• Tukey's test: also known as the Honestly Significant Difference (HSD) test, it can be used when samples are of equal size, but statistical packages usually provide variants for unequal sizes. With this test, significant differences are identified through an adjusted Studentized range distribution (an extension of the Student t statistic which uses a pooled estimate of the standard errors)
• More tests in the textbook
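The Bonferroni adjustment is simple enough to sketch directly. The p-values below are hypothetical, and reporting adjusted p-values as min(1, p × number of tests) is the usual equivalent way of presenting the same correction; the function name is chosen for this illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test significance level and the adjusted p-values."""
    m = len(p_values)
    per_test_alpha = alpha / m                      # family-wise error / number of tests
    adjusted = [min(1.0, p * m) for p in p_values]  # capped at 1
    return per_test_alpha, adjusted


g = 4                                # number of groups
n_pairs = g * (g - 1) // 2           # number of pair-wise comparisons
# Hypothetical unadjusted p-values from the six pair-wise t-tests
raw_p = [0.075, 0.011, 0.0001, 0.450, 0.0008, 0.009]

per_test_alpha, adjusted_p = bonferroni(raw_p)
print(n_pairs, round(per_test_alpha, 4), [round(p, 4) for p in adjusted_p])
```

With four groups there are six pair-wise tests, so each test is run at 0.05/6 ≈ 0.0083; a raw p-value of 0.009 would be significant on its own but not after the correction.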
• The experimental factor matters, but how much? (effect size)
• Larger F statistics do not necessarily imply larger effect sizes, because they also depend on sample sizes
• A typical measure of effect size is η² (the ratio between the variation between groups and the total variation)
• The power of a test is 1 − β, where β is the probability of not rejecting the null hypothesis when the alternative is true (Type II error)
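As an illustration, η² can be computed from the sums of squares of the one-way ANOVA of alcohol and tobacco expenditure by economic position reported earlier in the chapter (between groups 6171.784, total 157707.1). Variable names are chosen for this sketch.

```python
# Sums of squares from the ANOVA of alcohol and tobacco expenditure
# by economic position of the household reference person
ss_between = 6171.784
ss_total = 157707.1

# Eta squared: share of total variation attributable to the factor
eta_squared = ss_between / ss_total
print(f"eta squared = {eta_squared:.4f}")
```

The factor is statistically significant in that table, yet it accounts for only about 4% of the total variation, which is the point of separating significance from effect size.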
• One should check both the probability of Type I error and the power (Type II error)
• There is a trade-off between power and Type I error
• Conservative tests: low Type I error, low power (Scheffé, Bonferroni)
• Tukey's test is more appropriate for a large number of means
[Screenshot: one-way ANOVA dialog box, with callouts: Target variable (scale); Factor (categorical); Planned comparisons; Post-hoc tests]
\[ w_1\mu_{L} + w_2\mu_{ML} + w_3\mu_{MH} + w_4\mu_{H} = 0 \]
The polynomial contrast assumes that the mean follows a given polynomial (linear, quadratic, etc.)
Note: the null hypothesis is that the polynomial contrast does not hold
Other contrasts
Insert a weight for each subgroup
Note: the null hypothesis is that the contrast holds…
Click here to insert other sets of weights (one set of weights per comparison)
Descriptives
EFS: Total Alcoholic Beverages, Tobacco

                                                                  95% Confidence Interval for Mean
                      N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Low income            125   7.467    12.8693          1.1511       5.188         9.745         .0        70.0
Medium-low income     125   11.381   17.9038          1.6014       8.212         14.551        .0        93.9
Medium-high income    125   13.040   16.9137          1.5128       10.046        16.035        .0        79.4
High income           125   18.789   20.8025          1.8606       15.106        22.472        .0        92.5
Total                 500   12.669   17.7777          .7950        11.107        14.231        .0        93.9

ANOVA
EFS: Total Alcoholic Beverages, Tobacco

                                            Sum of Squares   df    Mean Square   F        Sig.
Between Groups   (Combined)                 8289.482         3     2763.161      9.172    .000
                 Linear Term   Contrast     7932.717         1     7932.717      26.333   .000
                               Deviation    356.765          2     178.382       .592     .554
Within Groups                               149417.6         496   301.245
Total                                       157707.1         499

(Combined): Mean equality is rejected
(Linear Term, Contrast): The means are compatible with a linear polynomial
(Linear Term, Deviation): And not compatible with a non-linear one
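The linear term can be approximately reproduced by hand. The sketch below assumes the four income bands are treated as equally spaced, so that the standard linear polynomial weights (-3, -1, 1, 3) apply, and it uses the rounded group means from the Descriptives table, which is why the result matches the SPSS output only up to rounding.

```python
# Group means (Low, Medium-low, Medium-high, High income), 125 households each
means = [7.467, 11.381, 13.040, 18.789]
n_per_group = 125
ms_within = 301.245                 # within-groups mean square from the table

# Linear polynomial weights for four equally spaced levels (assumption)
weights = [-3, -1, 1, 3]

contrast = sum(w * m for w, m in zip(weights, means))
# Sum of squares of a contrast with equal group sizes: n * L^2 / sum(w^2)
ss_linear = n_per_group * contrast ** 2 / sum(w * w for w in weights)
f_linear = ss_linear / ms_within    # the contrast has 1 degree of freedom

print(round(ss_linear, 1), round(f_linear, 2))
```

The result is close to the 7932.717 and 26.333 reported in the table; the small gap comes from using means rounded to three decimals.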
Contrast Coefficients

            Anonymised hhold inc + allowances (Banded)
Contrast    Low income   Medium-low income   Medium-high income   High income
1           0            1                   -1                   0
2           1            0                   0                    -1

Contrast Tests
EFS: Total Alcoholic Beverages, Tobacco

                                   Contrast   Value of Contrast   Std. Error   t        df        Sig. (2-tailed)
Assume equal variances             1          -1.659              2.1954       -.756    496       .450
                                   2          -11.322             2.1954       -5.157   496       .000
Does not assume equal variances    1          -1.659              2.2029       -.753    247.202   .452
                                   2          -11.322             2.1879       -5.175   206.788   .000

The first contrast (0, 1, -1, 0) holds (not rejected)
The second contrast (1, 0, 0, -1) (rejected)
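The equal-variance rows of the Contrast Tests table can be checked from the group means and the within-groups mean square. This is a sketch with illustrative names; it uses the pooled standard error sqrt(MS within × Σw²/n) for equal group sizes, with the values taken from the Descriptives and ANOVA tables for the income bands.

```python
from math import sqrt

# Group means and common group size from the Descriptives table
means = {"L": 7.467, "ML": 11.381, "MH": 13.040, "H": 18.789}
n_per_group = 125
ms_within = 301.245   # within-groups mean square from the ANOVA table


def contrast_t(weights):
    """Value, standard error and t statistic of a linear contrast
    (equal variances assumed, equal group sizes)."""
    value = sum(weights[k] * means[k] for k in means)
    se = sqrt(ms_within * sum(w * w for w in weights.values()) / n_per_group)
    return value, se, value / se


v1, se1, t1 = contrast_t({"L": 0, "ML": 1, "MH": -1, "H": 0})
v2, se2, t2 = contrast_t({"L": 1, "ML": 0, "MH": 0, "H": -1})
print(round(v1, 3), round(se1, 4), round(t1, 3))
print(round(v2, 3), round(se2, 4), round(t2, 3))
```

Both t statistics have 496 degrees of freedom (n - g), as in the table; the p-values would require the Student t distribution, which is not computed here.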
Multiple Comparisons
Dependent Variable: EFS: Total Alcoholic Beverages, Tobacco
(I), (J): Anonymised hhold inc + allowances (Banded)

                                                         Mean Difference                         95% Confidence Interval
             (I)                   (J)                   (I-J)             Std. Error   Sig.     Lower Bound   Upper Bound
Tukey HSD    Low income            Medium-low income     -3.9147           2.1954       .283     -9.574        1.745
                                   Medium-high income    -5.5737           2.1954       .055     -11.233       .086
                                   High income           -11.3224*         2.1954       .000     -16.982       -5.663
             Medium-low income     Low income            3.9147            2.1954       .283     -1.745        9.574
                                   Medium-high income    -1.6590           2.1954       .874     -7.318        4.000
                                   High income           -7.4077*          2.1954       .004     -13.067       -1.748
             Medium-high income    Low income            5.5737            2.1954       .055     -.086         11.233
                                   Medium-low income     1.6590            2.1954       .874     -4.000        7.318
                                   High income           -5.7487*          2.1954       .045     -11.408       -.089
             High income           Low income            11.3224*          2.1954       .000     5.663         16.982
                                   Medium-low income     7.4077*           2.1954       .004     1.748         13.067
                                   Medium-high income    5.7487*           2.1954       .045     .089          11.408
Bonferroni   Low income            Medium-low income     -3.9147           2.1954       .451     -9.730        1.901
                                   Medium-high income    -5.5737           2.1954       .069     -11.389       .242
                                   High income           -11.3224*         2.1954       .000     -17.138       -5.507
             Medium-low income     Low income            3.9147            2.1954       .451     -1.901        9.730
                                   Medium-high income    -1.6590           2.1954       1.000    -7.474        4.156
                                   High income           -7.4077*          2.1954       .005     -13.223       -1.592
             Medium-high income    Low income            5.5737            2.1954       .069     -.242         11.389
                                   Medium-low income     1.6590            2.1954       1.000    -4.156        7.474
                                   High income           -5.7487           2.1954       .055     -11.564       .067
             High income           Low income            11.3224*          2.1954       .000     5.507         17.138
                                   Medium-low income     7.4077*           2.1954       .005     1.592         13.223
                                   Medium-high income    5.7487            2.1954       .055     -.067         11.564

*. The mean difference is significant at the .05 level.

Results for each paired comparison are reported, with significance levels adjusted
1. Explore differences in monthly food expenditure for different geographical regions
2. Explore differences in monthly food expenditure according to the point of purchase for the last food shopping
• 1. is a fixed effect which implies that the researcher can fully control the factor (treatment)
• 2. is a random effect where the factor (treatment) cannot be fully controlled and is subject to a (random) measurement error
Two key assumptions are needed for running analysis of variance without risks
1) that the sub-samples defined by the treatment are independent
2) that no big discrepancies exist in the variances of the different sub-samples
• Normality within the sub-sample: within limits, departure from normality is not a serious issue
• Different variances: results are still reliable if the sizes of sub-samples are equal
• Both variances and sample sizes differ: high risk of biased results
• Adjustments: Brown-Forsythe test and/or the Welch test instead of the usual F test
Click on OPTIONS to request descriptive statistics for a random effect model, the Brown-Forsythe and Welch tests (plus more plots and descriptive statistics)
• Exclusive samples extracted from the same population
• Kruskal–Wallis test: extends the Mann-Whitney test to the case of a higher number of sub-samples. It tests the null hypothesis that all the sub-populations have the same distribution function.
• Jonckheere-Terpstra test: the same null hypothesis, but against the alternative that an increase in treatment leads to an increase in the (median of the) dependent variable.
• Related samples (the same respondent may appear in several treatment sub-samples)
• Friedman test, Kendall test or Cochran Q test, extend to the multiple sample case some of the non-parametric tests for mean comparisons
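To make the Kruskal–Wallis idea concrete, here is a minimal sketch of its H statistic: all observations are ranked together and the rank sums of the sub-samples are compared. The data are made up, tied values receive average ranks, and no tie correction is applied to H, so this is an illustration rather than a replacement for a statistical package.

```python
def average_ranks(values):
    """Ranks from 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_c^2 / n_c) - 3 (N + 1), no tie correction."""
    pooled = [x for grp in groups for x in grp]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for grp in groups:
        rank_sum = sum(ranks[start:start + len(grp)])
        total += rank_sum ** 2 / len(grp)
        start += len(grp)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data: three exclusive sub-samples
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2))
```

Under the null hypothesis H is approximately chi-square distributed with g-1 degrees of freedom, so large values point to differences between the sub-populations.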
[Screenshots: Exhaustive sub-samples; Related samples]
Number of target variables   Number of factors   Measurement of factors                                                                                       Technique
1                            1                   nominal / ordinal, independent samples                                                                       One-way ANOVA
1                            2 or more           nominal / ordinal, independent samples                                                                       Factorial ANOVA
2 or more                    1 or more           nominal / ordinal, independent samples                                                                       MANOVA
1                            2 or more           nominal / ordinal and continuous, independent samples                                                        ANCOVA
2 or more                    2 or more           nominal / ordinal and continuous, independent samples                                                        MANCOVA
1                            1 or more           nominal / ordinal, repeated samples                                                                          Repeated ANOVA
1                            1 or more           nominal / ordinal, mixed samples                                                                             Mixed ANOVA
1                            1 or more           nominal / ordinal, random effects                                                                            Variance Component Model
1                            1                   nominal / ordinal, independent samples, non-normal data and/or non-homogeneous, independent samples          Non-parametric tests: Kruskal–Wallis test or Jonckheere-Terpstra test
1                            1                   nominal / ordinal, independent samples, non-normal data and/or non-homogeneous, related samples              Non-parametric tests: Friedman, Cochran Q or Kendall's test
• Multi-way (factorial) ANOVA
• Multivariate ANOVA (MANOVA)
• (Multivariate) Analysis of Covariance (MANCOVA)
• One-way ANOVA is equivalent to a linear model where the target variable is the dependent variable and each of the treatments is transformed into a dummy variable, which takes a value of one if the respondent is subject to that treatment (here, belongs to that economic condition) and zero otherwise.
Target variable: Alcohol and tobacco expenditure
Factor: employment status

\[ y_i = \beta_0 + \beta_1 SE_i + \beta_2 FT_i + \beta_3 PT_i + \beta_4 UN_i + \beta_5 RE_i + \beta_6 UA_i + \varepsilon_i \]

• y_i is the amount spent on alcohol and tobacco by the i-th respondent
• SE_i = 1 if the respondent is self-employed
• FT_i = 1 for full-time employees
• PT_i = 1 for part-time employees
• UN_i = 1 for unemployed respondents
• RE_i = 1 for retired or inactive respondents
• UA_i = 1 for those under working age
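The link between the dummy-variable model and group means can be sketched without a regression routine. The data below are hypothetical and cover only three of the groups. One detail is an assumption of this sketch rather than something stated on the slide: when an intercept is included, one category is normally left out as the reference (otherwise the dummies add up to the intercept column), so here SE is the reference group.

```python
from statistics import mean

# Hypothetical expenditure data for three of the groups (illustrative only)
data = {
    "SE": [20.0, 16.0, 18.0],
    "FT": [15.0, 13.0],
    "PT": [12.0, 11.0, 13.0, 12.0],
}
reference = "SE"   # assumed reference category (its dummy is dropped)

group_means = {k: mean(v) for k, v in data.items()}

# Least-squares solution of the dummy model: the intercept is the reference
# group mean, each coefficient is a difference from that mean
b0 = group_means[reference]
coefs = {k: group_means[k] - b0 for k in data if k != reference}


def fitted(group):
    """Fitted value for a respondent in the given group."""
    return b0 + coefs.get(group, 0.0)


# Least-squares check (normal equations): residuals sum to zero in each group
residual_sums = {k: sum(y - fitted(k) for y in v) for k, v in data.items()}
print(b0, coefs, residual_sums)
```

Each coefficient is therefore a two-group mean difference, which is why the t-test on a single coefficient corresponds to a bivariate mean comparison and the overall F-test to the one-way ANOVA.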
• T-test on each coefficient: bivariate mean comparison
• F-test: one-way ANOVA
Other analyses of variance
Multi-way (Factorial) ANOVA: More than one factor (interactions)
MANOVA: More than one target variable: allows one to test whether the factors lead to significant differences in a set of variables.
• A final generalization which is quite interesting for consumer research is the Analysis of Covariance (ANCOVA), which is the appropriate technique when some of the factors are continuous quantitative variables instead of being measured on a nominal or ordinal scale
Univariate GLM: ANOVA, n-way ANOVA, ANCOVA
Multivariate GLM: MANOVA, MANCOVA
[Screenshot callouts: Target variable (more than one target variable for MANOVA or MANCOVA); Factors (more than one for n-way ANOVA, random factors are allowed); Scale variables for ANCOVA]