Statistics for Marketing and Consumer Research

Chapter 8: Correlation and regression
Copyright © 2008 - Mario Mazzocchi
Correlation
• Measures the strength and direction of a
relationship between two metric variables
• For example, the relationship between
weight and height or between consumption
and price: it is not a perfect (deterministic)
relationship, but we expect to find one on
average
Correlation
• Correlation measures to what extent two (or
more) variables are related
• Correlation expresses a relationship that is not
necessarily precise (e.g. height and weight)
• Positive correlation indicates that the two variables
move in the same direction
• Negative correlation indicates that they move in
opposite directions
• The question is – do two variables move together?
Covariance
• Measures the co-movement of two variables x and y across observations
• Sample covariance estimate:

$$\mathrm{COV}(x,y) = s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
• For each observation i, a situation where both x and y are above or
below their respective sample means increases the covariance value,
while the situation where one of the variables is above the sample
mean and the other is below decreases the total covariance.
• Unlike variance, covariance can take both positive and negative values.
• If x and y always move in opposite directions, all terms in the summation above will be negative, leading to a large negative covariance. If they always move in the same direction, there will be a large positive covariance.
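A minimal Python sketch of this estimator (using NumPy, with made-up price and consumption figures, so the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical data: price (x) and weekly consumption (y)
x = np.array([2.1, 2.5, 3.0, 3.4, 4.0])
y = np.array([5.0, 4.6, 4.1, 3.5, 3.2])

n = len(x)
# Sample covariance from the definition: cross-deviations summed, divided by n-1
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov uses the same n-1 denominator by default
assert np.isclose(s_xy, np.cov(x, y)[0, 1])
print(s_xy)  # negative here: the two series move in opposite directions
```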
From covariance to correlation
• Covariance, like variance, depends on the
measurement units.
• If one measures prices in dollars and consumption in ounces, a different covariance value is obtained than with prices in euros and consumption in kilograms, even if both cases refer to exactly the same goods and observations.
• Some form of normalization is needed to avoid the
measurement unit problem
• The usual approach is standardization, which
requires subtracting the mean and dividing by the
standard deviation.
Correlation
• Considering the covariance expression, where the numerator is already based on differences from the means, all that is required is dividing by the sample standard deviations of both x and y.
$$\mathrm{CORR}(x,y) = r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})/(n-1)}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}\,\sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Correlation coefficient
• The correlation coefficient r gives a measure (in the range −1 to +1) of the relationship between two variables
• r = 0 means no correlation
• r = +1 means perfect positive correlation
• r = −1 means perfect negative correlation
• Perfect correlation indicates an exact linear relationship: a given variation in x always corresponds to a proportional variation in y
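A minimal Python sketch, reusing the made-up data from the covariance example, showing that standardizing the covariance gives the same value as NumPy's built-in correlation:

```python
import numpy as np

x = np.array([2.1, 2.5, 3.0, 3.4, 4.0])
y = np.array([5.0, 4.6, 4.1, 3.5, 3.2])

# r = s_xy / (s_x * s_y), with ddof=1 for sample estimates throughout
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(r)  # close to -1: a strong negative correlation
```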
Correlation and causation
• Note that no assumption or consideration is
made on causality
• The existence of a positive correlation of x and y does not mean that it is the increase in x which leads to an increase in y, but only that the two variables move together (to some extent)
• Thus, correlation is symmetric, so that $r_{xy} = r_{yx}$
Correlation as a sample statistic
• Correlation is more than an indicator; it can
be regarded as a sample statistic (which
allows hypothesis testing)
• The sample measure of correlation is affected by sampling error: for example, a small but positive correlation observed in a sample might hide a null (or negative) true correlation in the population
Correlation and inference
• Some assumptions (checks) are needed:
a) the relationship between the two variables should be linear (a scatterplot allows the identification of non-linear relationships)
b) the error variance should be similar for different correlation levels
c) the two variables should come from similar statistical distributions
d) if the two variables can be assumed to derive from normal distributions, it becomes possible to run hypothesis testing
Hypothesis testing on correlations
• This fourth condition (d above) can be ignored when the sample is large enough (fifty or more observations).
• In these cases one can exploit the probabilistic nature of sampling to run a hypothesis test on the sample correlation coefficient
• The null hypothesis to be tested is that the
correlation coefficient in the population is zero.
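A minimal Python sketch with simulated data; scipy.stats.pearsonr returns the sample r together with the two-tailed p-value for exactly this null hypothesis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=100)            # simulated sample, n >= 50
y = 0.3 * x + rng.normal(size=100)  # weakly related to x by construction

r, p_value = stats.pearsonr(x, y)   # H0: population correlation = 0
print(f"r = {r:.3f}, p = {p_value:.4f}")
# Reject H0 when the p-value falls below the chosen significance level
```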
Bivariate correlations
• There are two elements to be considered:
• the value of the correlation coefficient r, which indicates to what extent the two variables move together
• the significance of the correlation (a p-value), which helps decide whether the hypothesis that r = 0 (no correlation in the population) should be rejected.
• Examples
– a correlation coefficient r = 0.6 suggests a relatively strong relationship, but a p-value well above 0.05 indicates that the hypothesis that the actual correlation is zero cannot be rejected at the 95% confidence level.
– r = 0.1 with a p-value below 0.01: (thanks to a larger sample) one can be confident (at the 99% level) that there is a positive relationship between the two variables, although the relationship is weak.
The influence of third factors
• The correlation coefficient between x and y is only meaningful if one
can safely assume that there is no other intervening variable which
affects the values of x and y.
• In the supermarket example, a negative correlation between prices and
consumption is expected.
• However, suppose that one day the government introduces a new
tax which reduces the average available income by 10%. Consumers
have less money and consume less. The supermarket tries to
maintain its customers by cutting all prices, so that the reduction in
prices mitigates the impact of the new tax. If we only observe
prices and consumption, it is possible that we observe lower prices
and lower consumption and the bivariate correlation coefficient
might return a positive value.
• Thus, we can only use the correlation coefficient when the ceteris
paribus condition holds (all other relevant variables being constant)
• This is rarely the case, so it is necessary to control for other influential
variables like income in the price-consumption relationship.
Partial correlation
• The partial correlation coefficient allows one to evaluate
the relationship between two variables after controlling for
the effects of one or more additional variables.
• For example, if x is price, y is consumption and z is income,
the partial correlation coefficient is obtained by correcting
the correlation coefficient between x and y after
considering the correlation between x and z and the
correlation between y and z
$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}$$
• This can be generalized to control for more variables
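As a sketch of the formula, here is a small Python function; the variable roles are an assumption for illustration (x = consumption, y = income, z = average price), with the pairwise r values borrowed from the Pearson output shown later in the chapter:

```python
import numpy as np

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / (np.sqrt(1 - r_xz**2) * np.sqrt(1 - r_yz**2))

# Consumption-income correlation, controlling for price
print(partial_corr(r_xy=0.088, r_xz=-0.327, r_yz=0.091))  # about 0.125
```

The result is close to the .129 reported in the SPSS partial correlation output later on; the small gap arises because each pairwise correlation in that output is computed on a slightly different set of observations.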
Other correlation statistics
• Part (semi-partial) correlation: controls for the
correlation between the influencing variable z and
only one of the two variables x and y
• Non-parametric correlation statistics (rank-order association):
• Spearman's rho
• Kendall's tau-b statistic
• Multiple correlation coefficient (regression
analysis): joint relationship between one variable
(the dependent variable) and a set of other
variables.
Correlation and covariance in SPSS
[SPSS screenshot: choose between bivariate & partial correlations]
Bivariate correlation
[SPSS dialog: select the variables you want to analyse; options for non-parametric association measures, the significance level (two-tailed), and descriptive statistics (incl. covariance)]
Pearson bivariate correlation output
Variables: weekly chicken purchases ("In a typical week how much fresh or frozen chicken do you buy for your household consumption (Kg.)?"), Average price, Income level.

Pearson correlations (Sig. 2-tailed; N):
• Chicken purchases × Average price: r = -.327**, Sig. = .000, N = 438
• Chicken purchases × Income level: r = .088, Sig. = .125, N = 304
• Average price × Income level: r = .091, Sig. = .117, N = 300
(valid N per variable: chicken 446, price 438, income 342)

**. Correlation is significant at the 0.01 level (2-tailed).
Non-parametric tests output
Variables: weekly chicken purchases (kg), Average price, Income level.

Kendall's tau-b (Sig. 2-tailed; N):
• Chicken purchases × Average price: -.469**, Sig. = .000, N = 438
• Chicken purchases × Income level: .059, Sig. = .176, N = 304
• Average price × Income level: -.008, Sig. = .854, N = 300

Spearman's rho (Sig. 2-tailed; N):
• Chicken purchases × Average price: -.630**, Sig. = .000, N = 438
• Chicken purchases × Income level: .079, Sig. = .171, N = 304
• Average price × Income level: -.009, Sig. = .880, N = 300

**. Correlation is significant at the 0.01 level (2-tailed).
Partial correlations
[SPSS dialog: list of variables to be analysed; control variables]
Partial correlation output
Control variable: Average price
Partial correlation between weekly chicken purchases (Kg.) and Income level: r = .129, Significance (2-tailed) = .026, df = 297

Partial correlations still measure the correlation between two variables, but eliminate the effect of other variables; i.e. here the correlation reflects the relationship between consumption and income for consumers facing the same price.
Bivariate linear regression
$$y_i = \alpha + \beta x_i + \varepsilon_i$$

where $y_i$ is the dependent variable, $x_i$ the explanatory variable, $\alpha$ the intercept, $\beta$ the regression coefficient, and $\varepsilon_i$ the (random) error term.
• Causality (from x to y) is now assumed
• Regressing stands for going backwards from the dependent variable to
its determinant
• The error term embodies anything which is not accounted for by the linear relationship
• The unknown parameters (α and β) need to be estimated (usually on sample data). We refer to the sample parameter estimates as a and b
Least squares estimation of the
unknown parameters
• For a given value of the parameters, the error (residual) term for each observation is

$$e_i = y_i - a - b x_i$$

• The least squares parameter estimates are those which minimize the sum of squared errors:
$$SSE = \sum_{i=1}^{n}(y_i - a - b x_i)^2 = \sum_{i=1}^{n} e_i^2$$
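A minimal NumPy sketch of the closed-form bivariate least squares estimates, using made-up data:

```python
import numpy as np

# Hypothetical data: x explanatory, y dependent
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

# Closed-form estimates for y = a + b*x + e
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

residuals = y - a - b * x       # e_i = y_i - a - b*x_i
sse = np.sum(residuals ** 2)    # the quantity these estimates minimize
print(a, b, sse)
```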
Assumptions on the error term (1)
1. The error term has a zero mean
2. The variance of the error term does not vary across cases (homoskedasticity)
3. The error term for each case is independent of the error term for other cases
4. The error term is also independent of the values of the explanatory (independent) variable
5. The error term is normally distributed
Assumptions on the error term (2)
1. The error term has a zero mean
• (otherwise there would be a systematic bias which should be captured by the intercept)
2. The variance of the error term does not vary across cases (homoskedasticity)
• for example, the error variability should not become larger for cases with very large values of the independent variable. Heteroskedasticity is the opposite condition
3. The error term for each case is independent from the error term for other cases
• The omission of relevant explanatory variables would break this assumption, as an omitted independent variable (correlated across cases by definition) ends up in the residual term and induces correlation.
Assumptions on the error term(3)
4. The error term is also independent of the values
of the explanatory (independent) variable
• otherwise it would mean that the variable is not truly independent and is affected by changes in the dependent variable
• Frequent problem: sample selection bias, which occurs when non-probabilistic samples are used, that is, the sample only includes units from a specific group
• Example: response to advertising by sampling those who purchase a deodorant after seeing an advert; those who decided not to buy the deodorant are not taken into account, even if they saw the advert. Correlated observations do not enter the analysis and this leads to a correlated error term
5. The error term is normally distributed
• This corresponds to the assumption that the dependent variable is normally distributed for any value of the independent variable(s).
• Normality makes life easier with hypothesis testing, but there are ways to overcome the problem if the distribution is not normal.
Prediction
• Once a and b have been estimated, it is
possible to predict the value of the
dependent variable for any given value of
the explanatory variable
$$\hat{y}_j = a + b x_j$$
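For example (a self-contained snippet with hypothetical estimates a and b, such as might come from the least squares sketch above):

```python
a, b = 1.01, 0.99      # hypothetical sample estimates
x_new = 6.0            # a new value of the explanatory variable
y_hat = a + b * x_new  # predicted value of the dependent variable
print(y_hat)
```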
Model evaluation
• An evaluation of the model performance can be based on the residuals, which provide information on the capability of the model predictions to fit the original data (goodness-of-fit)
• Since the parameters a and b are estimated on the sample, just like a mean, they are accompanied by the standard error of the parameters, which measures the precision of these estimates and depends on the sample size.
• Knowledge of the standard errors opens the way to run
hypothesis testing and compute confidence intervals for
the regression coefficients (see lecture 6).
Hypothesis testing on regression
coefficients
• t-test on each of the individual coefficients
• Null hypothesis: the corresponding population coefficient is zero.
• t statistic: simply divide the estimate (for example a, the estimate of α) by its standard error (s_a).
• The p-value allows one to decide whether or not to reject the null hypothesis that α = 0, depending on the confidence level
• F-test (multiple independent variables, as discussed later)
• It is run jointly on all coefficients of the regression model
• Null hypothesis: all coefficients are zero
• The F-test in linear regression corresponds to the ANOVA test (and
the GLM is a regression model which can be adopted to run ANOVA
techniques)
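To make the t-test concrete, here is a minimal Python sketch; the estimate and standard error are borrowed from the eggs regression output later in the chapter (b = .095, s_b = .018, n = 500, one regressor), so the result matches the reported t of about 5.3 up to rounding:

```python
from scipy import stats

b, s_b, n, k = 0.095, 0.018, 500, 1

t_stat = b / s_b                                     # H0: coefficient = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - k - 1)  # two-tailed p-value
print(t_stat, p_value)
```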
Coefficient of determination
The natural candidate for measuring how well the model fits the data is the coefficient of determination, which varies between 0 (when the model does not explain any of the variability of the dependent variable) and 1 (when the model fits the data perfectly):

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

The sum of squared errors $SSE = \sum_{i=1}^{n}(y_i - a - b x_i)^2 = \sum_{i=1}^{n} e_i^2$ is a measure of the variability which is not explained by the regression model. It is a portion of the total variation, measured by $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$, where $SSR = \sum_{i=1}^{n}(a + b x_i - \bar{y})^2$ is the portion of variability which is explained by the regression model, so that $SST = SSR + SSE$.

In a BIVARIATE REGRESSION, the coefficient of determination R² is the square of the correlation coefficient between y and x.
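A minimal NumPy sketch, reusing the made-up data from the least squares example, that verifies both the decomposition SST = SSR + SSE and the bivariate identity R² = r²:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

sse = np.sum((y - y_hat) ** 2)          # unexplained variation
sst = np.sum((y - y.mean()) ** 2)       # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation

r2 = ssr / sst
assert np.isclose(sst, ssr + sse)
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
print(r2)
```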
Bivariate regression in SPSS
Regression output
Model Summary (Predictors: (Constant), Household size):
R = .232, R Square = .054, Adjusted R Square = .052, Std. Error of the Estimate = .532591
Note: only 5% of total variation is explained by the model (correlation is 0.23).

ANOVA (Dependent Variable: Eggs):
Regression: Sum of Squares = 8.036, df = 1, Mean Square = 8.036
Residual: Sum of Squares = 141.259, df = 498, Mean Square = .284
Total: Sum of Squares = 149.295, df = 499
F = 28.329, Sig. = .000
Note: the F-test rejects the hypothesis that all coefficients are zero.

Coefficients (Dependent Variable: Eggs):
(Constant): B = .235, Std. Error = .049, t = 4.834, Sig. = .000
Household size: B = .095, Std. Error = .018, Beta = .232, t = 5.323, Sig. = .000
Note: both parameters are statistically different from zero according to the t-test.
Multiple regression
• The principle is identical to bivariate regression, but there are more explanatory variables:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$
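A minimal sketch with simulated data; statsmodels' OLS is one convenient way to estimate the k-variable model (an assumption for illustration, any least squares routine would do):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                    # two explanatory variables
y = 1.0 + 0.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

X = sm.add_constant(X)                         # adds the intercept column
results = sm.OLS(y, X).fit()                   # least squares estimation
print(results.params)                          # b0, b1, b2
print(results.rsquared, results.rsquared_adj)  # R-squared, adjusted
```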
Additional assumption
6. The independent variables are also independent of each other. Otherwise we could run into a double-counting problem and it would become very difficult to separate the effect of each variable. Consequences of violating this assumption:
• Inefficient estimates
• An apparently good model but poor forecasts
Collinearity
• Assumption six refers to the so-called collinearity (or
multicollinearity) problem
• Collinearity exists when two explanatory variables are correlated.
• Perfect collinearity: one of the variables has a perfect (1 or
-1) correlation with another variable, or with a linear
combination of more than one variable. This makes
estimation impossible.
• Strong collinearity makes estimates of the coefficients
unstable and inefficient, which means that the standard
errors of the estimates are inflated as compared to the
best possible solution.
• Furthermore, the solution becomes very sensitive to the
choice of which variables to include in the model.
• When there is multicollinearity the model might look very
good at a first glance, but produces poor forecasts.
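The diagnostic SPSS reports for this problem is the variance inflation factor (VIF = 1/(1 − R²_j), where R²_j comes from regressing variable j on the other regressors), shown in the coefficient tables below. A minimal NumPy sketch of how it can be computed (the helper name vif is hypothetical):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        # Regress column j on the remaining columns (plus an intercept)
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2_j = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2_j))
    return out

rng = np.random.default_rng(2)
u = rng.normal(size=100)
v = 0.9 * u + 0.1 * rng.normal(size=100)   # strongly collinear with u
print(vif(np.column_stack([u, v])))        # large VIFs flag the problem
```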
Goodness-of-fit
• The coefficient of determination R² (still computed as the ratio between SSR and SST) always increases with the inclusion of additional regressors
• This is against the parsimony principle; models with many explanatory variables are more demanding in terms of data (higher costs) and computations
• If alternative nested models are compared, those with more explanatory variables result in a better fit
• Thus, a proper indicator is the adjusted R², which accounts for the number of explanatory variables (k) in relation to the number of observations (n):
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$
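A quick check of the formula in Python, plugging in figures from the chicken consumption model shown later (R² = .193, n = 297, k = 6):

```python
def adjusted_r2(r2, n, k):
    """Penalize R-squared for the number of regressors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.193, 297, 6), 3))  # 0.176, matching the SPSS output
```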
Multiple regression in SPSS
• Analyze / Regression / Linear
[SPSS dialog: simply select more than one explanatory variable; click on STATISTICS for collinearity diagnostics and more statistics]
Additional statistics
[SPSS Statistics dialog: part and partial correlations are provided]
Output
Model Summary:
R = .439, R Square = .193, Adjusted R Square = .176, Std. Error of the Estimate = 1.54949
Predictors: (Constant), Average price, Chicken is a safe food, In my household we like chicken, Please indicate your gross annual household income range, Age, Number of people currently living in your household (including yourself)
Note: the model accounts for 19.3% of variability in the dependent variable. After adjusting for the number of regressors, the R² is 0.176.

ANOVA (Dependent Variable: In a typical week how much fresh or frozen chicken do you buy for your household consumption (Kg.)?):
Regression: Sum of Squares = 166.371, df = 6, Mean Square = 27.728
Residual: Sum of Squares = 696.268, df = 290, Mean Square = 2.401
Total: Sum of Squares = 862.639, df = 296
F = 11.549, Sig. = .000
Note: the null hypothesis that all regressors are zero is strongly rejected.
Output
Coefficients (Dependent Variable: weekly chicken purchases, Kg.):
• (Constant): B = -.362 (SE .684), t = -.529, Sig. = .597
• In my household we like chicken: B = .122 (SE .078), Beta = .084, t = 1.562, Sig. = .119; zero-order r = .134, partial = .091, part = .082; Tolerance = .974, VIF = 1.027
• Chicken is a safe food: B = .090 (SE .062), Beta = .077, t = 1.449, Sig. = .148; zero-order = .097, partial = .085, part = .076; Tolerance = .973, VIF = 1.028
• Age: B = .005 (SE .006), Beta = .049, t = .900, Sig. = .369; zero-order = .069, partial = .053, part = .047; Tolerance = .954, VIF = 1.049
• Number of people currently living in your household (including yourself): B = .277 (SE .074), Beta = .210, t = 3.756, Sig. = .000; zero-order = .286, partial = .215, part = .198; Tolerance = .890, VIF = 1.124
• Please indicate your gross annual household income range: B = .098 (SE .068), Beta = .080, t = 1.440, Sig. = .151; zero-order = .103, partial = .084, part = .076; Tolerance = .910, VIF = 1.099
• Average price: B = -.108 (SE .020), Beta = -.299, t = -5.492, Sig. = .000; zero-order = -.343, partial = -.307, part = -.290; Tolerance = .940, VIF = 1.064

Note: tolerance values close to 1 and low VIF indicate multicollinearity is not an issue. Only household size and average price emerge as significantly different from 0.
Coefficient interpretation –
intercept
• The constant represents the amount purchased when all other variables are zero. It takes a negative value here, but the hypothesis that the constant is zero is not rejected
• A household of zero members, with no income, is unlikely to consume chicken
• However, estimates for the intercept are often
unsatisfactory, because frequently there are no
data points with values for the independent
variables close or equal to zero
Coefficient interpretation
• The significant coefficients tell one that:
• Each additional household member means an increase in consumption of 277 grams
• A £1 increase in price leads to a decrease in consumption of 108 grams
Stepwise regression procedure
• Explores each single explanatory variable before entering it in the model. The procedure:
1. Adds the variable which shows the highest bivariate correlation with the dependent variable
2. The partial correlations of all remaining potential independent variables (after controlling for the independent variable already included in the model) are explored, and the explanatory variable with the highest partial correlation coefficient enters the model
3. The model is re-estimated with two explanatory variables; the decision whether to keep the second one is based on the variation of the F-value or other goodness-of-fit statistics like the adjusted R-square, information criteria, etc. If the variation is not significant, the second variable is not included in the model. Otherwise it stays in the model and the process continues for the inclusion of a third variable (go back to step 2)
• At each step, the procedure may drop one of the variables already included in the model if there is no significant decrease in the F-value (or any other targeted stepwise criterion); see the sketch below
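A simplified Python sketch of the forward part of this logic (assumptions: it enters variables by the lowest t-test p-value rather than the F-change criterion SPSS uses, and forward_select is a hypothetical helper name):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, X, names, enter_p=0.05):
    """Greedy forward selection: repeatedly add the candidate variable with
    the lowest p-value, as long as that p-value is below enter_p."""
    selected = []
    while True:
        best_p, best_j = 1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = sm.add_constant(X[:, selected + [j]])
            p = sm.OLS(y, cols).fit().pvalues[-1]  # p-value of candidate j
            if p < best_p:
                best_p, best_j = p, j
        if best_j is None or best_p >= enter_p:
            return [names[j] for j in selected]
        selected.append(best_j)
```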
Forward and backward
• Forward regression works exactly like step-wise
regression, but variables are only entered and not
dropped. The process stops when there is no
further significant increase in the F-value
• Backward regression starts by including all the
independent variables and works backward
(according to the step-wise approach), so that at
each step the procedure drops the variable which
causes the minimum decrease in the F-value and
stops when such decrease is not significant.
Stepwise regression in SPSS
[SPSS dialog: choose the variable selection method here and proceed as usual]
Stepwise regression output
Coefficients (Dependent Variable: weekly chicken purchases, Kg.):

Model 1:
• (Constant): B = 2.067 (SE .170), t = 12.190, Sig. = .000
• Average price: B = -.124 (SE .020), Beta = -.343, t = -6.270, Sig. = .000; zero-order = -.343, partial = -.343, part = -.343; Tolerance = 1.000, VIF = 1.000

Model 2:
• (Constant): B = 1.128 (SE .268), t = 4.204, Sig. = .000
• Average price: B = -.111 (SE .019), Beta = -.305, t = -5.685, Sig. = .000; zero-order = -.343, partial = -.315, part = -.302; Tolerance = .975, VIF = 1.026
• Number of people currently living in your household (including yourself): B = .314 (SE .071), Beta = .238, t = 4.427, Sig. = .000; zero-order = .286, partial = .250, part = .235; Tolerance = .975, VIF = 1.026
The first model only includes the “average price” variable
In the second step, the household size is included
No other variable enters the model
Regression ANOVA and the GLM
• (Factorial) ANOVA can be seen as a
regression model where all explanatory
variables are binary (dummy) variables
• Each of the dummy variables indicates
whether a given factor is present or not
• The t-tests on the coefficients of the dummies are mean comparison tests
• The F-test on the regression model is the
test for factorial (n-way) ANOVA
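A minimal sketch of this equivalence with simulated data (statsmodels is an assumption; any OLS routine would do): regressing on a 0/1 dummy reproduces a two-group mean comparison, and the regression F-test reproduces the one-way ANOVA F-test:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=120)          # binary factor: absent/present
y = 3.0 + 1.5 * group + rng.normal(size=120)  # group shifts the mean by 1.5

X = sm.add_constant(group.astype(float))      # dummy variable as regressor
fit = sm.OLS(y, X).fit()
# t-test on the dummy coefficient = mean comparison test;
# F-test on the regression = one-way ANOVA F-test
print(fit.tvalues[1], fit.pvalues[1], fit.fvalue)
```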