Uploaded by dostoyohima

8 ANOVA

Analysis of Variance (ANOVA)
One way ANOVA
Two-Way (Factorial) ANOVA
Repeated measures (within-subjects) ANOVA
One way ANOVA (one-way analysis of variance)
Also known as:
• One-Factor ANOVA
• One-Way Analysis of Variance
• Between Subjects ANOVA
Common Applications
• One way ANOVA is used to study the effect of one independent, qualitative variable on a continuous dependent variable when the independent variable has more than two subgroups.
• One way ANOVA compares the means of independent groups.
• It can be thought of as an extension of the two independent samples t-test, used to detect a difference in the means of 3 or more independent groups.
Required variables
• The requirements for one way ANOVA:
1. One (single) dependent (also called outcome or response) continuous (scale/interval/ratio) variable, such as weight, blood pressure or cholesterol.
2. One (single) independent (factor) categorical variable with more than 2 groups (at least 3 unrelated/independent groups), such as marital status (single, married, divorced) or economic status (low, middle, high).
Example
• Assume that we have recorded the biomass of certain
bacterial species in broth medium at three pH levels.
• The researcher wishes to know if the biomass means
(measured by optical density, O.D) of bacterial species are
different between the three pH levels.
Replicate   pH 5.5   pH 6.5   pH 7.5
1           12       20       40
2           15       19       35
3           9        23       42
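As a first look at the bacterial biomass example, the group means can be computed directly from the table. This is a minimal sketch using only the Python standard library; the dictionary name `biomass` is an illustrative choice, not something from the slides.

```python
from statistics import mean

# Optical density readings from the table above, three replicates per pH level.
biomass = {
    "pH 5.5": [12, 15, 9],
    "pH 6.5": [20, 19, 23],
    "pH 7.5": [40, 35, 42],
}

# Mean of each pH group and the grand mean over all nine readings.
group_means = {level: mean(values) for level, values in biomass.items()}
grand_mean = mean(v for values in biomass.values() for v in values)

for level, m in group_means.items():
    print(f"{level}: mean O.D. = {m:.2f}")
print(f"grand mean = {grand_mean:.2f}")
```

The three means are clearly separated in this small sample; whether the separation is larger than expected by chance is the question one way ANOVA answers.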
Example
• Data: The data set ‘Diet.sav’ contains
information on 78 people who undertook one of
three diets. There is background information such
as age, gender and height as well as weight lost
on the diet (a positive value means they lost
weight). The aim of the study was to see which
diet was best for losing weight so the
independent variable (group) is diet.
Cast the data into a table, labeling each group as Diet 1, Diet 2 and Diet 3 (weight lost by type of diet):

Subject   Diet 1 (K1)   Diet 2 (K2)   Diet 3 (K3)
1         3.8           0             7
2         6             0             5.6
3         0.7           -2.1          3.4
4         2.9           2             6.8
5         2.8           1.7           7.8
6         2             4.3           5.4
7         2             7             6.8
8         8.5           0.6           7.2
9         1.9           2.7           7
10        3.1           3.6           7.3
11        1.5           3             0.9
12        3             2             7.6
13        3.6           4.2           4.1
14        0.9           4.7           6.3
15        -0.6          3.3           5
16        1.1           -0.5          2.5
17        4.5           4.2           0.9
18        4.1           2.4           3.5
19        9             5.8           0.5
20        2.4           3.5           2.8
21        3.9           5.3           8.6
22        3.5           1.7           4.5
23        5.1           5.4           2.8
24        3.5           6.1           4.1
25                      7.9           5.3
26                      -1.4          9.2
27                      4.3           6.1
n         n1 = 24       n2 = 27       n3 = 27
Mean      3.3           3.03          5.15

j indexes the observations within each group: j = 1, 2, 3, …, n.
In this example n = 24, 27 and 27 for the first, second and third diet respectively.
i indexes the diet groups: i = 1, 2, …, k.
In this example k = 3.
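The group sizes and means in the last two rows of the table can be reproduced from the raw values. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from statistics import mean

# Weight lost per subject, transcribed from the diet table.
diet1 = [3.8, 6.0, 0.7, 2.9, 2.8, 2.0, 2.0, 8.5, 1.9, 3.1, 1.5, 3.0,
         3.6, 0.9, -0.6, 1.1, 4.5, 4.1, 9.0, 2.4, 3.9, 3.5, 5.1, 3.5]
diet2 = [0.0, 0.0, -2.1, 2.0, 1.7, 4.3, 7.0, 0.6, 2.7, 3.6, 3.0, 2.0, 4.2, 4.7,
         3.3, -0.5, 4.2, 2.4, 5.8, 3.5, 5.3, 1.7, 5.4, 6.1, 7.9, -1.4, 4.3]
diet3 = [7.0, 5.6, 3.4, 6.8, 7.8, 5.4, 6.8, 7.2, 7.0, 7.3, 0.9, 7.6, 4.1, 6.3,
         5.0, 2.5, 0.9, 3.5, 0.5, 2.8, 8.6, 4.5, 2.8, 4.1, 5.3, 9.2, 6.1]
groups = {"Diet 1": diet1, "Diet 2": diet2, "Diet 3": diet3}

sizes = {name: len(values) for name, values in groups.items()}   # n_i
means = {name: mean(values) for name, values in groups.items()}  # group means
n_total = sum(sizes.values())                                     # N
grand_mean = sum(sum(values) for values in groups.values()) / n_total

for name in groups:
    print(f"{name}: n = {sizes[name]}, mean = {means[name]:.2f}")
print(f"N = {n_total}, grand mean = {grand_mean:.2f}")
```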
7
Hypothesis
The one-way analysis of variance is used to test the claim
that three or more population means are equal
H0: µ1 = µ2 = µ3 =………. =µk
Ha: At least two of the means µ1, µ2, µ3 …….. , µk are different.
Or Ha: not all µ are equal
Where µ1 is the population mean of Diet 1,
µ2 is the population mean of Diet 2 and
µ3 is the population mean of Diet 3
µk is the population mean of group k
Hypothesis test:
• Null hypothesis: There is no difference in the average weight loss of persons given the three diets.
• Alternative hypothesis: The average weight loss differs between at least two of the three diets.
The Use of Computers
• The calculations required by analysis of variance are long and complicated; for this reason, the computer assumes an important role in analysis of variance.
9
Enter data
10
Assumptions
• The data are randomly sampled
• Observations of the dependent (outcome) variable within
each group were obtained independently. The independence
assumption means that there is no association between the
observations in the different groups and between the
observations in the same group.
• Observations of the dependent (outcome) variable within
each group are drawn from normally distributed populations.
• Residuals should be normally distributed
• Homogeneity of variance i.e. Variances of the populations are
equal. That is the variances of various groups are
homogenous.
Notes on assumptions
• Of all the assumptions, independence is the most crucial. If this assumption is violated, the inferences based on ANOVA are invalid.
• The analysis of variance is not heavily dependent on the normality assumption; ANOVA is robust to minor departures from normality. It is worth mentioning that the ANOVA test is especially resistant to departures from normality when the sample sizes are equal.
• Similarly, the assumption of equal variances is important but not critical. If the number of observations in each group is the same, inferences about means are not seriously affected by unequal population variances.
• Before carrying out any analysis, summarize weight loss by treatment using a box-plot and some summary statistics.
• Do the group means and standard deviations look similar or very different?
• Diet 3 seems better than
the other diets as the
mean weight lost is
greater.
• The standard deviations
are similar so weight lost
within each group is
equally spread out.
The assumption of equal variances can also be checked by examining the spread of
the observations in the boxplots.
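The informal comparison of spreads can be made concrete by computing the sample standard deviation of each diet group. This is a sketch with the Python standard library; the ratio of largest to smallest variance is added here only as a rough descriptive aid and is not part of the slides.

```python
from statistics import stdev

# Weight lost per subject, transcribed from the diet table.
diet1 = [3.8, 6.0, 0.7, 2.9, 2.8, 2.0, 2.0, 8.5, 1.9, 3.1, 1.5, 3.0,
         3.6, 0.9, -0.6, 1.1, 4.5, 4.1, 9.0, 2.4, 3.9, 3.5, 5.1, 3.5]
diet2 = [0.0, 0.0, -2.1, 2.0, 1.7, 4.3, 7.0, 0.6, 2.7, 3.6, 3.0, 2.0, 4.2, 4.7,
         3.3, -0.5, 4.2, 2.4, 5.8, 3.5, 5.3, 1.7, 5.4, 6.1, 7.9, -1.4, 4.3]
diet3 = [7.0, 5.6, 3.4, 6.8, 7.8, 5.4, 6.8, 7.2, 7.0, 7.3, 0.9, 7.6, 4.1, 6.3,
         5.0, 2.5, 0.9, 3.5, 0.5, 2.8, 8.6, 4.5, 2.8, 4.1, 5.3, 9.2, 6.1]
groups = {"Diet 1": diet1, "Diet 2": diet2, "Diet 3": diet3}

# Sample standard deviation (divisor n - 1) of each group.
sds = {name: stdev(values) for name, values in groups.items()}

# Rough descriptive check: how far apart are the largest and smallest variances?
variance_ratio = max(sds.values()) ** 2 / min(sds.values()) ** 2

for name, sd in sds.items():
    print(f"{name}: SD = {sd:.2f}")
print(f"largest / smallest variance = {variance_ratio:.2f}")
```

The three standard deviations come out close to one another, which agrees with the reading of the boxplots.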
15
Checking the assumptions of normally
distributed dependent variable in each group
• Assumptions
– The dependent variable in each group is normally distributed .
• How to check
– Click on Analyze in the main menu > Descriptive Statistics >
Explore
– Move the WeightLost variable to the Dependent List and the
Diet variable to the Factor List. Click on Plots button and fill out
its dialog box to produce histograms/ QQ plot / Shapiro Wilk
tests for each diet group.
• What to do if the assumption is not met
– If the assumption is violated, a natural logarithm or other data transformation may be tried to correct the problem.
– If none of the available transformations is successful (for example when the data contain seriously outlying observations), the Kruskal-Wallis test can be applied to the data.
Check normality
As p > 0.05 in each of the three groups, there is no evidence against normality, so the dependent variable can be treated as normally distributed in each group.
18
Start ANOVA
• Enter the dependent variable in Dependent List
• Enter the independent variable in Factor
• Save option: this selection will create a new variable of the standardized residuals for each subject, which will be added to the dataset.
• Homogeneity of variance option: this selection is essential to check the assumption of homogeneity (equality) of variance.
• Click OK.
Checking the assumptions of normally
distributed Residuals
• Assumptions
– Residuals should be normally distributed
• How to check
– Use the Save menu within General Linear Model to request the standardized residuals for each subject to be added to the dataset (already done), and
– then use Analyze > Descriptive Statistics > Explore to produce histograms / QQ plots / Shapiro-Wilk tests of the residuals.
• What to do if the assumption is not met
– If the residuals are very skewed, the results of the ANOVA are less reliable. The Kruskal-Wallis test should be used instead of ANOVA.
Check normality
As p > 0.05, the residuals are normally distributed
23
Checking the assumption of
Homogeneity (equality) of variance
• Assumptions
– Homogeneity (equality) of variance: The variances (SD
squared) should be similar for all the groups.
• How to check
– The Levene’s test is carried out if the Homogeneity of
variance test option is selected in the Options menu.
– If p > 0.05, equal variances can be assumed.
• What to do if the assumption is not met
– If p < 0.05, the results of the ANOVA are less reliable.
– The Welch test is more appropriate and can be accessed
via the Options menu using Analyze > Compare Means >
One-way ANOVA.
– The Games Howell post hoc test should also be used
instead of Tukey’s.
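Levene's test can be understood as a one-way ANOVA carried out on the absolute deviations of each observation from its group mean. The sketch below follows that definition with the Python standard library. Two assumptions are made here and are not taken from the slides: the mean-centred version of the test is used, and the p-value uses the closed form (1 + 2F/df2)^(-df2/2), which holds only when the numerator degrees of freedom equal 2 (three groups).

```python
from statistics import mean

# Weight lost per subject, transcribed from the diet table.
diet1 = [3.8, 6.0, 0.7, 2.9, 2.8, 2.0, 2.0, 8.5, 1.9, 3.1, 1.5, 3.0,
         3.6, 0.9, -0.6, 1.1, 4.5, 4.1, 9.0, 2.4, 3.9, 3.5, 5.1, 3.5]
diet2 = [0.0, 0.0, -2.1, 2.0, 1.7, 4.3, 7.0, 0.6, 2.7, 3.6, 3.0, 2.0, 4.2, 4.7,
         3.3, -0.5, 4.2, 2.4, 5.8, 3.5, 5.3, 1.7, 5.4, 6.1, 7.9, -1.4, 4.3]
diet3 = [7.0, 5.6, 3.4, 6.8, 7.8, 5.4, 6.8, 7.2, 7.0, 7.3, 0.9, 7.6, 4.1, 6.3,
         5.0, 2.5, 0.9, 3.5, 0.5, 2.8, 8.6, 4.5, 2.8, 4.1, 5.3, 9.2, 6.1]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_bet = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_bet, df_w = k - 1, n_total - k
    return (ss_bet / df_bet) / (ss_w / df_w), df_bet, df_w


def levene_mean(groups):
    """Levene's statistic: one-way ANOVA on |x - group mean| (mean-centred form)."""
    abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)


w_stat, df1, df2 = levene_mean([diet1, diet2, diet3])
# Upper-tail probability of F; this closed form is valid only for df1 == 2.
p_value = (1 + 2 * w_stat / df2) ** (-df2 / 2)
print(f"Levene W = {w_stat:.3f}, df = ({df1}, {df2}), p = {p_value:.3f}")
```

A p-value above 0.05 here means the hypothesis of equal variances is not rejected, so the ordinary ANOVA and Tukey's test can be used.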
24
Output
Test of homogeneity of variances:
H0: variances are equal
Ha: variances are different
Since p = 0.520 > 0.05, we cannot reject H0.
ANOVA table:
F = test statistic = MSBET/MSW = 35.547/5.736 = 6.197
Since the significance is less than 0.05, you may reject the null hypothesis.
• When writing up the results, it is common to report certain figures from the ANOVA table.
• F(df between, df within) = test statistic, p = …
F(2, 75) = 6.197, p = 0.003.
There was a significant difference in weight lost [F(2, 75) = 6.197, p = 0.003] between the diets.
New Terms in Analysis of Variance
• Sum of squares and mean square are only new names for
familiar concepts.
• The sum of squares (abbreviated SS)
– another name for variation
– the sum of the squared deviations between each value and the mean of the values
– the numerator of the variance.
• The mean square (abbreviated MS)
– is just the variance
– The mean squares (or variances) are obtained by dividing the sums of squares by their degrees of freedom (df):
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
Sum of the squares in ANOVA table
• There are two sources of variation in the ANOVA table
– the variation between the groups, SS(BET)
– the variation within the groups, SS(W)
– There is also another sum of squares (Total), but it is not a source of variation.
Sum of the squares(between)
– Sometimes called the
variation due to the
factor
– Denoted SS(BET) for Sum
of Squares (variation)
between the groups
– The
between
group
variation measures how
much the group means
vary from the grand
(overall) mean (= 3.84)
$$ SS_{BET} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \dots + n_k (\bar{x}_k - \bar{x})^2 $$
For the diet data (group sizes 24, 27 and 27; group means 3.3, 3.03 and 5.15; grand mean 3.84):
$$ SS_{BET} = 24(3.3 - 3.84)^2 + 27(3.03 - 3.84)^2 + 27(5.15 - 3.84)^2 = 71.0936 $$
(The value 71.0936 is obtained with the unrounded means; the rounded means shown here give about 71.05.)
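The between-groups sum of squares can be checked with a few lines of Python. In this sketch the group totals 79.2, 81.7 and 139.0 are the column sums of the diet table (they are not printed on the slides), and the second calculation repeats the arithmetic with the rounded means to show where the small discrepancy comes from.

```python
sizes = [24, 27, 27]
totals = [79.2, 81.7, 139.0]  # column sums of the diet table (Diet 1, 2, 3)

# Unrounded group means and grand mean.
means = [t / n for t, n in zip(totals, sizes)]
grand_mean = sum(totals) / sum(sizes)

# SS(BET) = sum over groups of n_i * (group mean - grand mean)^2
ss_bet = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

# Same formula with the rounded means quoted on the slide.
rounded_means = [3.3, 3.03, 5.15]
ss_bet_rounded = sum(n * (m - 3.84) ** 2 for n, m in zip(sizes, rounded_means))

print(f"SS(BET), unrounded means: {ss_bet:.4f}")
print(f"SS(BET), rounded means:   {ss_bet_rounded:.4f}")
```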
Sum of the squares(within)
– This is called the within group
variation
– Denoted SS(W) for Sum of
Squares (variation) within the
groups
– SS(W) measures how much the
individuals vary from their group
mean.
– Each difference between an
individual and its group mean is
called a residual.
– These residuals are squared and
added together to give (SSW).
$$ SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = SS_1 + SS_2 + \dots + SS_k $$
SSW = 115.42 + 165.5523 + 149.2075 = 430.1798
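The within-groups sum of squares is the sum of the squared residuals, computed group by group. The sketch below recomputes it from the raw diet data with the Python standard library. With unrounded group means the per-group values differ from the slide figures only in the fourth decimal place.

```python
from statistics import mean

# Weight lost per subject, transcribed from the diet table.
diet1 = [3.8, 6.0, 0.7, 2.9, 2.8, 2.0, 2.0, 8.5, 1.9, 3.1, 1.5, 3.0,
         3.6, 0.9, -0.6, 1.1, 4.5, 4.1, 9.0, 2.4, 3.9, 3.5, 5.1, 3.5]
diet2 = [0.0, 0.0, -2.1, 2.0, 1.7, 4.3, 7.0, 0.6, 2.7, 3.6, 3.0, 2.0, 4.2, 4.7,
         3.3, -0.5, 4.2, 2.4, 5.8, 3.5, 5.3, 1.7, 5.4, 6.1, 7.9, -1.4, 4.3]
diet3 = [7.0, 5.6, 3.4, 6.8, 7.8, 5.4, 6.8, 7.2, 7.0, 7.3, 0.9, 7.6, 4.1, 6.3,
         5.0, 2.5, 0.9, 3.5, 0.5, 2.8, 8.6, 4.5, 2.8, 4.1, 5.3, 9.2, 6.1]


def sum_of_squared_residuals(values):
    """Sum of squared deviations of each value from the group mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


ss_groups = [sum_of_squared_residuals(g) for g in (diet1, diet2, diet3)]
ss_within = sum(ss_groups)

for i, ss in enumerate(ss_groups, start=1):
    print(f"SS{i} = {ss:.4f}")
print(f"SS(W) = {ss_within:.3f}")
```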
31
Sum of the squares (Total)
– This is called the total
variation
– Denoted SS(TOT) for
the total Sum of
Squares (variation)
– SS(TOT) = SS(BET) + SS(W)
OR
$$ SS_{TOT} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 $$
Degrees of freedom (df)
In one-way ANOVA, there are three degrees of freedom:
• The degrees of freedom between groups, df.BET, is one less than the number of groups:
df.BET = k – 1 = 2, where k is the number of groups (k = 3).
• The degrees of freedom within groups, df.W:
df.W = N – k = 75, where N is the total number of observations (N = 78).
Alternatively, the within group df is the sum of the individual df’s of each group:
– The sample sizes are 24, 27 and 27
– df(W) = 23 + 26 + 26 = 75
• The total df is one less than the total sample size:
– df(TOT) = N – 1 = 78 – 1 = 77
– Or df(TOT) = df.BET + df.W
Mean Squares (variances)
– The variances are also called the mean squares, abbreviated MS.
– They are found by dividing the variation (SS) by the degrees of freedom (df):
$$ \text{Variance} = \frac{\text{Variation}}{df}, \qquad MS = \frac{SS}{df} $$
In the ANOVA table there are two mean squares:
• MSBET, the variance between groups (read mean square between)
$$ MS_{BET} = \frac{SS_{BET}}{df_{BET}} = \frac{SS_{BET}}{k - 1} $$
• MSW, the variance within groups (read mean square within)
$$ MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k} $$
Mean Squares
$$ MS_{BET} = \frac{SS_{BET}}{df_{BET}} = \frac{SS_{BET}}{k - 1} = \frac{71.094}{3 - 1} = 35.547 $$
$$ MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k} = \frac{430.179}{78 - 3} = 5.736 $$
F-test
– F - statistic is the ratio of two variances, the MSBET and
MSW.
– In fact, ANOVA stands for ‘Analysis of variance’ as it uses
the ratio of between group variability (MSBET) to within
group variability (MSW), when deciding if there is a
statistically significant difference between the groups.
$$ F = \frac{\text{Between-groups variability } (MS_{BET})}{\text{Within-groups variability } (MS_W)} $$
For our data,
$$ F = \frac{MS_{BET}}{MS_W} = \frac{35.547}{5.736} = 6.197 $$
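The steps from sums of squares to the F statistic can be collected into one short calculation. The sketch below starts from the sums of squares and sample sizes quoted in the slides and rebuilds the remaining columns of the ANOVA table; the layout of the printed table is an illustrative choice.

```python
# Inputs quoted in the slides.
ss_bet, ss_w = 71.094, 430.179
k, n_total = 3, 78

# Degrees of freedom.
df_bet = k - 1
df_w = n_total - k
df_tot = n_total - 1

# Mean squares and F statistic.
ms_bet = ss_bet / df_bet
ms_w = ss_w / df_w
f_stat = ms_bet / ms_w

print(f"{'Source':<16}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between Groups':<16}{ss_bet:>10.3f}{df_bet:>5}{ms_bet:>10.3f}{f_stat:>8.3f}")
print(f"{'Within Groups':<16}{ss_w:>10.3f}{df_w:>5}{ms_w:>10.3f}")
print(f"{'Total':<16}{ss_bet + ss_w:>10.3f}{df_tot:>5}")
```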
Decision rule
• ANOVA procedures utilize a distribution called the F distribution.
• A given F distribution has two separate degrees of freedom, represented by df1 and df2.
– The first, df1, is called the degrees of freedom for the numerator (represents dfbetween = 2) and
– the second, df2, is called the degrees of freedom for the denominator (represents dfwithin = 75).
• The critical F value for 2 (dfbetween) and 75 (dfwithin) is about 3.12 for α of 0.05 and about 4.90 for α of 0.01. The values 3.18 and 5.06 read from the F table are the tabulated critical values for df2 = 50, a slightly conservative choice when the table has no row for df2 = 75.
Decision
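• The computed F value 6.197 is greater than the critical value for α = 0.05 (F2,75 ≈ 3.12; conservative table value 3.18) and also for α = 0.01 (F2,75 ≈ 4.90; conservative table value 5.06).
Decision: Reject H0 at the 0.05 and even at the 0.01 level of α, i.e. p < 0.01.
[Figure: F distribution with the non-rejection region to the left of the critical value F = 3.18]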
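When the numerator degrees of freedom equal 2, as in this three-group example, the upper-tail probability of the F distribution has the closed form (1 + 2F/df2)^(-df2/2), so critical values and the p-value can be checked without statistical tables. The sketch below relies on that special case only; the function names are illustrative, and for other numerator degrees of freedom a statistics library would be needed.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with df1 = 2 and df2 denominator df."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Critical value c with P(F > c) = alpha, for df1 = 2 (inverse of the above)."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)


f_stat = 6.197
for df2 in (75, 50):
    c05 = f_critical_df1_2(0.05, df2)
    c01 = f_critical_df1_2(0.01, df2)
    print(f"df2 = {df2}: critical F = {c05:.2f} (alpha 0.05), {c01:.2f} (alpha 0.01)")

p_value = f_upper_tail_df1_2(f_stat, 75)
print(f"p-value for F = {f_stat} with (2, 75) df: {p_value:.3f}")
```

The df2 = 50 row reproduces the table values 3.18 and 5.06, and the p-value agrees with the 0.003 reported in the ANOVA output.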
P-value
You may reach the same decision and reject the null hypothesis based on the significance (p-value = 0.003), which is less than 0.05 and even less than 0.01.
Conclusion: This indicates that at least one of the means is significantly different from the others, that is, type of diet appears to be associated with (to affect) weight loss.
Where does variability come from?
Three types:
1. INDIVIDUAL DIFFERENCES: Variability between all participants (gender, age, height, mood, favorite food, etc.). People bring different experiences to your study.
2. EXPERIMENTAL ERROR: Inaccurate measurement or poor planning of the study, for example measuring weight with a broken scale.
3. ?
$$ F = \frac{\text{Between-groups variability } (MS_{BET})}{\text{Within-groups variability } (MS_W)} = \frac{\text{Indiv. Diff.} + \text{Exper. Error} + \; ?}{\text{Indiv. Diff.} + \text{Exper. Error}} $$
3. TREATMENT/ EXPOSURE EFFECT:
– This is a between group variance.
– Cannot influence within-group variance since all the
subjects in a group are given the same treatment
(Diet).
So, the treatment effect is the only source of variance that can influence between-groups variance without influencing within-groups variance.
$$ F = \frac{\text{Indiv. Diff.} + \text{Exper. Error} + \text{Treatment effect}}{\text{Indiv. Diff.} + \text{Exper. Error}} $$
Summary of calculation equations
45
Which treatments differ from one another?
• ANOVA tests the null hypothesis ‘all group means are the same’, so the resulting p-value only tells us whether or not there is a difference between at least one pair of groups. It does not tell us which groups differ from one another.
• Which pairs of means are different from one another and which are not?
• To find out which groups differ from one another, further ‘post hoc’ tests have to be carried out to confirm where those differences are found.
Post-hoc Comparisons of Treatments
• Post hoc tests are a family of several tests, named after their developers.
• They are mostly t-tests with an adjustment for testing all possible pairs.
• Although the tests show some theoretical differences, in practice they rarely give very different results.
• One of these tests needs to be undertaken only when the results of ANOVA indicate that there is a significant difference between the means of the groups.
• Repeat the ANOVA making the following adjustments in the post
hoc window
• Move the independent variable (factor) from the Factor to the Post
hoc Tests for box, then choose from the available tests.
– Tukey’s and Scheffe’s tests are the most commonly used post hoc tests.
– Hochberg’s GT2 is better where the sample sizes for the groups are
very different.
48
Report each of the three pairwise comparisons, e.g. there was a significant difference between diet 3 and diet 1 (p = 0.02).
Use the mean difference between each pair, e.g. people on diet 3 lost on average 1.85 kg more than those on diet 1, or use the individual group means to conclude which diet is best.
Mean of diet 1 – Mean of diet 3 = 3.3 – 5.1481 = –1.8481
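The mean differences reported alongside the post hoc tests can be tabulated for all three pairs. The sketch below uses the group sizes, the column totals of the diet table and the within-groups mean square from the ANOVA table; the standard error shown is the usual pooled form sqrt(MSW × (1/n_a + 1/n_b)). Tukey-adjusted p-values need the studentized range distribution, which is not in the Python standard library, so they are deliberately not computed here.

```python
from itertools import combinations
from math import sqrt

sizes = {"Diet 1": 24, "Diet 2": 27, "Diet 3": 27}
totals = {"Diet 1": 79.2, "Diet 2": 81.7, "Diet 3": 139.0}  # column sums of the table
ms_within = 5.736  # MSW from the ANOVA table

means = {name: totals[name] / sizes[name] for name in sizes}

# Difference (second group minus first group) and its pooled standard error.
comparisons = {}
for a, b in combinations(sizes, 2):
    diff = means[b] - means[a]
    se = sqrt(ms_within * (1 / sizes[a] + 1 / sizes[b]))
    comparisons[(a, b)] = (diff, se)
    print(f"{b} - {a}: difference = {diff:+.2f} kg, SE = {se:.2f}")
```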
Reporting ANOVA
• A one-way ANOVA was conducted to compare the effectiveness of three diets. Normality checks and Levene’s test were carried out and the assumptions were met.
• There was a significant difference in mean weight lost [F(2,75) = 6.197, p = 0.003] between the diets.
• Post hoc comparisons using the Tukey test were carried out.
• There was a significant difference between diets 1 and 3 (p = 0.02), with people on diet 3 losing on average 1.85 kg more than those on diet 1.
• There was also a significant difference between diets 2 and 3 (p = 0.005), with people on diet 3 losing on average 2.12 kg more than those on diet 2.
Welch test
• If the assumption of homogeneity (equality) of variance is not met, the Welch test is more appropriate.
• The Games-Howell post hoc test should also be used instead of Tukey’s.
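For illustration, Welch's version of the one-way ANOVA weights each group by n/s², so groups with larger variances count for less. The sketch below applies the standard Welch formula to the diet summary statistics (sizes, column totals and per-group sums of squares from the slides). The p-value again uses the closed form that is valid only for df1 = 2. The slides do not print the Welch output for this dataset, so no expected values are quoted.

```python
sizes = [24, 27, 27]
totals = [79.2, 81.7, 139.0]              # column sums of the diet table
ss_within = [115.42, 165.5523, 149.2075]  # per-group sums of squares from the slides

k = len(sizes)
means = [t / n for t, n in zip(totals, sizes)]
variances = [ss / (n - 1) for ss, n in zip(ss_within, sizes)]

# Weights n_i / s_i^2 and the variance-weighted grand mean.
weights = [n / v for n, v in zip(sizes, variances)]
w_total = sum(weights)
weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total

numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))

welch_f = numerator / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
df1 = k - 1
df2 = (k ** 2 - 1) / (3 * lam)  # usually not an integer

# Upper-tail probability; this closed form holds only because df1 == 2.
p_value = (1 + 2 * welch_f / df2) ** (-df2 / 2)
print(f"Welch F({df1}, {df2:.1f}) = {welch_f:.3f}, p = {p_value:.4f}")
```

Because the three variances are similar in this dataset, the Welch statistic stays close to the ordinary F value.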
Two-Way (Factorial) ANOVA
• Two way ANOVA is an extension of one way ANOVA. It can be used when we want to study the effect of two independent categorical variables (factors) on a continuous dependent variable, based on comparisons of means.
• Two way ANOVA
– calculates main effects for each independent variable
– calculates interaction effects between the independent variables.
Required variables
• The requirements for two way (between-groups)
ANOVA:
1. One (single) dependent (also called outcome or
response), continuous (scale/interval/ratio) variable
such as weight, blood pressure, cholesterol.
2. Two independent, categorical (grouping factors)
variables.
55
Example
• Data: The data set ‘Diet.sav’ contains information on 78 people who undertook one of three diets. There is background information such as age, gender and height as well as weight lost on the diet (a positive value means they lost weight).
• The aim of the study was to see which diet was best for losing weight, but it was also thought that the best diets for males and females may be different, so the independent variables are diet and gender.
Assumptions
Basic assumptions of two-way ANOVA
1. The outcome (dependent) variable must be normally
distributed. The measurements in each cell are
assumed to be drawn from a population with a normal
distribution.
2. Homogeneity of variance in groups. The measurements
in each cell are assumed to come from distributions
with approximately the same variance.
3. The groups must be independent. The measurements in each cell come from independent random samples.
4. The residuals are normally distributed.
Hypotheses
• There are three hypotheses with a two-way
ANOVA.
– Two for the main effects (diet and gender), and
– one for the interaction of the two, diet and
gender.
60
Hypothesis test
Effect of diet on weight loss
Is there a significant difference in the average weight loss due to type of diet?
Does the type of diet affect weight loss?
• Type of diet (main effect 1)
• Null hypothesis: The effect of diet type on weight loss is not significant.
• Alternative hypothesis: The effect of diet type on weight loss is significant.
H0: There is no difference in average weight loss due to diet type,
i.e. H0: µ(1) = µ(2) = µ(3),
i.e. the type of diet has no effect on weight loss.
Ha: At least two of the averages µ(1), µ(2), µ(3) are different.
Hypothesis test
Effect of gender on weight loss
Is there a significant difference in the weight loss due to gender?
• Gender (main effect 2)
• Null hypothesis: The effect of gender on weight loss is not significant.
• Alternative hypothesis: The effect of gender on weight loss is significant.
H0: There is no difference in the average weight loss due to gender,
i.e. H0: µ(male) = µ(female),
i.e. gender has no effect on weight loss.
Ha: µ(male) ≠ µ(female).
The interaction (diet & gender)
Effect of diet and gender on weight loss (the interaction between type of diet and gender)
Is there a significant difference in the weight loss due to the interaction of the two variables, type of diet and gender?
• Null hypothesis: The interaction between diet and gender on weight lost is not significant.
• Alternative hypothesis: The interaction between diet and gender on weight lost is significant.
Steps in SPSS
• To carry out an ANOVA, select
Analyze > General Linear Model > Univariate
• Put the dependent variable (weight lost) in the Dependent
Variable box
• Put the independent variables (Diet and Gender) in the Fixed
Factors box.
64
• In the Plots menu, move Diet to the Horizontal Axis
box, Gender to the Separate Lines box and click Add.
65
• Request Tukey’s from the Post hoc menu for both factors.
66
• Ask for standardised residuals via the Save menu to check the
assumption of residuals normality.
67
Checking the assumptions for two-way ANOVA
• Assumptions
– Residuals should be normally distributed
• How to check
– Use the Save menu to request the standardized residuals for each subject to be added to the dataset and then use Analyze > Descriptive Statistics > Explore to produce a histogram of the residuals.
• What to do if the assumption is not met
– If the residuals are very skewed, the results of the ANOVA are less reliable. There is no equivalent non-parametric test in SPSS, but two options are:
• transforming the dependent variable, or
• a separate ANOVA by gender (i.e. for males and females separately).
Check normality of residuals
• (Using Analyze > Descriptive Statistics > Explore to produce
the histogram). The residuals are normally distributed.
69
Checking the assumptions for two-way ANOVA
• Assumptions
– Homogeneity (equality) of variance (Levene’s test).
• How to check
– Use the Options menu to select Homogeneity tests for
equality of variances. If p > 0.05, equal variances can
be assumed
– If p < 0.05, the results of the ANOVA are less reliable.
• What to do if the assumption is not met
– There is no equivalent test but comparing the p-values
from the ANOVA with 0.01 instead of 0.05 is
acceptable.
70
• Select Homogeneity tests from the Options menu to check
the assumption of equal variances for each combination of
diet/ gender.
71
The output
• Checking the assumptions for equality of variances
As p > 0.05, equal variances can be assumed.
72
Calculation Equations
Factor A (diet), Factor B (gender)
Sum of squares due to diet effect (SSDiet) = 49.679
Sum of squares due to gender effect (SSgender) = 0.428
Sum of squares due to interaction effect (SSDiet*gender) = 33.904
Sum of squares due to error (SSError) = 376.329
dfDiet = k – 1 = 3 – 1 = 2, where k is the number of diets
dfgender = n – 1 = 2 – 1 = 1, where n is the number of gender categories
dfDiet*gender = (k – 1)(n – 1) = (3 – 1)(2 – 1) = 2
dfError (residuals) = total number of observations – (k)(n) = 76 – (3 × 2) = 70
F statistic of diet = MSDiet / MSError = 4.620
F statistic of gender = MSgender / MSError = 0.08
F statistic of Diet*gender = MSDiet*gender / MSError = 3.153
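The three F statistics follow from the sums of squares and degrees of freedom by the same rule as in one-way ANOVA: each effect's mean square is divided by the error mean square. The sketch below redoes that arithmetic from the figures quoted in the slides. The p-value is computed only for the interaction, using the closed form that is valid when the numerator df equals 2; the dictionary names are illustrative.

```python
# Sums of squares and degrees of freedom quoted in the slides.
ss = {"Diet": 49.679, "Gender": 0.428, "Diet*Gender": 33.904}
df = {"Diet": 2, "Gender": 1, "Diet*Gender": 2}
ss_error, df_error = 376.329, 70

ms_error = ss_error / df_error

# F for each effect = (SS_effect / df_effect) / MS_error
f_stats = {effect: (ss[effect] / df[effect]) / ms_error for effect in ss}

for effect, f_value in f_stats.items():
    print(f"{effect:<12} F({df[effect]}, {df_error}) = {f_value:.3f}")

# Upper-tail probability for the interaction; closed form valid only for df1 == 2.
f_int = f_stats["Diet*Gender"]
p_interaction = (1 + 2 * f_int / df_error) ** (-df_error / 2)
print(f"interaction p = {p_interaction:.3f}")
```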
• The results of the two-way ANOVA and post hoc tests are
reported for the main effects and the interaction.
• There was a statistically significant interaction effect of Diet
and Gender on weight loss [F(2, 70)=3.153, p = 0.049].
• Since the interaction effect is significant (p = 0.049),
interpreting the main effects can be misleading.
75
• The easiest way to interpret the interaction is to use the plot from the output known as a means or interaction plot, which shows the means for each combination of diet and gender.
• The plot clearly shows a difference between males and females in the way that diet affects weight lost, since the lines are not parallel.
• The differences between the mean weight lost on the diets are much bigger for females.
• The values of the means can be obtained by splitting the file as follows.
Interactions
• In two-way ANOVA an interaction is the combined effect of two independent variables on one dependent variable.
• In a two-way ANOVA with a single observation per cell, however, interaction cannot be measured, because each cell of the table contains only one observation.
• In tests where no interaction occurs, the lines in the means (interaction) plot are reasonably parallel.
• Some people just use the interaction plot to
describe the combined effect of diet and
gender but others prefer to carry out one way
ANOVA’s for each group of one factor.
• For this example, it makes sense to look at the
differences between the diets by gender.
79
• To carry out separate ANOVA’s by gender, use
Data > Split File
• Select ‘Compare groups’ and move Gender to the
‘Groups Based on’ box.
• After clicking ‘OK’, all analyses and charts will appear separately for males and females until the split is cancelled by going back to this box and selecting ‘Analyze all cases, do not create groups’.
• Run a one-way ANOVA for Diet:
Analyze > General Linear Model > Univariate
• The results appear separately for males and females.
Reporting results
• There was a difference between the mean weight lost on the 3 diets for females (F(2,40) = 10.64, p < 0.001) but not for males (F(2,30) = 0.148, p = 0.863).
• Only the post hoc tests for females should be interpreted.
• You should also report the mean weight lost for each diet for females.
• Tukey’s post hoc tests were carried out for females.
• Diet 3 was significantly different from diet 1 (p = 0.002) and diet 2 (p < 0.001), but there is no evidence to suggest that diets 1 and 2 differ (p = 0.841).
• For females, the mean weight lost on diet 3 was 5.88 kg compared to only 3.05 kg and 2.61 kg on diets 1 and 2 respectively.
• Normality checks and Levene’s test were carried out and the assumptions were met.
Two-Way ANOVA
A single (one) observation per cell Example
• A physical therapist wished to compare three methods
for teaching patients to use a certain prosthetic device.
He felt that the rate of learning would be different for
patients of different ages and wished to design an
experiment in which the influence of age could be
taken into account.
84
• Note that the five age groups and three teaching methods give rise to data with only one observation per ‘cell’. For example, the age group “20 to 29” using teaching method C needs 10 days to learn the use of the prosthetic device, while using the same teaching method, the “50 and over” age group needs 14 days. With one observation per cell we cannot measure variation within a cell, so we cannot check for interaction between age group and teaching method, the two factors used in this example.
• Running an experiment several times results in multiple observations per cell. In that case we should assume that there may be interaction between the factors and check for it.
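The reason interaction cannot be tested with one observation per cell shows up in the degrees of freedom: once the interaction term is in the model, nothing is left over for the error term. The sketch below counts the degrees of freedom for a balanced two-way layout; the function name and its arguments are illustrative.

```python
def two_way_df(levels_a, levels_b, n_per_cell, with_interaction):
    """Degrees of freedom for a balanced two-way ANOVA layout."""
    n_total = levels_a * levels_b * n_per_cell
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = (levels_a - 1) * (levels_b - 1) if with_interaction else 0
    df_error = (n_total - 1) - df_a - df_b - df_ab
    return {"A": df_a, "B": df_b, "A*B": df_ab, "error": df_error, "total": n_total - 1}


# Three teaching methods, five age groups, one observation per cell.
print(two_way_df(3, 5, 1, with_interaction=True))   # error df is 0: no F test possible
print(two_way_df(3, 5, 1, with_interaction=False))  # main-effects model: error df is 8
print(two_way_df(3, 5, 2, with_interaction=True))   # two replicates: error df is 15
```

With a single observation per cell, the variation that would otherwise estimate the interaction has to serve as the error term, which is why only the main-effects model is fitted.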
SPSS
To fit a model without an
interaction, click the
Model button to open
the Univariate: Model
dialog box.
86
Select a model type
(Main effects)
87
88
F = MSmethod/MSError
F = MSAge/MSError
Statistical decision.
Since the p value of method = 0.001, we reject the null hypothesis of no effect of teaching method on the time needed to learn to use the prosthetic device.
Similarly, since the p value of age = 0.001, we also reject the null hypothesis of no effect of patient age on the time needed to learn to use the prosthetic device.
In a two-way ANOVA with a single observation per cell, interaction cannot be measured, because each cell of the table contains only one observation.
• The 15 means can be displayed in a line/means plot.
• For all teaching methods (A, B and C), the fastest learning (i.e. the shortest time) is for those under 20, followed by 20 – 29, 30 – 39, 40 – 49 and then 50 and over.
• The learning times for teaching method A are higher than those for B and C in all age groups.
• There is no interaction between teaching method and age group, as the lines are reasonably parallel.
• An interaction occurs when the lines are not parallel, such that the means of one group do not follow the same pattern as those of the other group.
Repeated measures (within-subjects)
ANOVA
93
Common Applications:
• Used when several measurements of the same dependent variable are taken at different time points or under different conditions.
Repeated measures ANOVA tests
(1) changes in mean score over 3 or more time points or
(2) differences in mean score under 3 or more conditions.
• This is the equivalent of a one-way ANOVA but for repeated samples, and is an extension of the paired-samples t-test.
• Repeated measures ANOVA is also known as ‘within-groups’ ANOVA (within-groups analysis of variance test).
Required variables
• One (single) dependent variable: continuous (scale)
• One (single) independent variable: categorical, e.g. time/condition
Assumptions for repeated measures
ANOVA
• Assumptions
– Normality of residuals by time point
• How to check
– In the Save menu, ask for the standardised residuals. A set
of residuals will be produced for each time point and
added to the data set.
– Use histograms/ Shapiro-Wilk tests to check they are
approximately normally distributed.
• What to do if the assumption is not met
– If the residuals are very skewed, ANOVA is not reliable so
use the non-parametric Friedman test instead
97
Assumptions for repeated measures
ANOVA
• Assumptions
– Sphericity: the variances of the differences between all combinations of the related conditions/time points are equal (similar to the assumption of equal variances in ANOVA).
• How to check
– Mauchly’s test of Sphericity is automatically given in the
output. If p > 0.05, Sphericity can be assumed.
• What to do if the assumption is not met
– Use the p-value from the Greenhouse-Geisser correction
row in the ‘Tests of Within-Subjects Effects’ ANOVA table.
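To make the sphericity definition concrete, the sketch below computes the variance of the differences for every pair of time points. The cholesterol readings are hypothetical numbers made up for illustration (they are not the values in ‘Cholesterol.sav’), and the code only describes the variances; it does not carry out Mauchly's test.

```python
from itertools import combinations
from statistics import variance

# Hypothetical cholesterol readings (mmol/L), one entry per person and time point.
data = {
    "before": [6.4, 5.8, 7.1, 6.0, 5.5, 6.9],
    "after 4 weeks": [5.8, 5.3, 6.4, 5.5, 5.0, 6.2],
    "after 8 weeks": [5.7, 5.2, 6.4, 5.4, 5.0, 6.1],
}

# Variance of the within-person differences for each pair of time points.
diff_variances = {}
for a, b in combinations(data, 2):
    differences = [x - y for x, y in zip(data[a], data[b])]
    diff_variances[(a, b)] = variance(differences)

for (a, b), v in diff_variances.items():
    print(f"var({a} - {b}) = {v:.4f}")
```

Sphericity holds when these variances are roughly equal; large differences between them are what Mauchly's test is designed to detect.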
98
Example
• Data: Participants used Flora margarine for 8 weeks. Their cholesterol (in mmol/L) was measured before the special diet, after 4 weeks and after 8 weeks.
• Use the SPSS file ‘Cholesterol.sav’ to see if the use of margarine has changed the mean cholesterol.
Make sure that your data set has one row per person and a separate column for each of the three time points or conditions.
Hypothesis
H0: µ(before) = µ(after 4 weeks) = µ(after 8 weeks)
Ha: Not all µ are equal .
100
Steps in SPSS
• To carry out a repeated measures ANOVA, use
Analyse > General Linear Model > Repeated measures.
101
• This screen is where we define the levels of our repeated measures factor
which in our case is time.
• You need to name it using whatever name you like (we have used “time”
in this case) and then state how many time points there are (which here is
3; before the experiment, after 4 weeks and after 8 weeks).
• Make sure you click on the Add button and then click on the Define
button.
• Move the three cholesterol variables across into the Within-Subjects Variables box.
• Post hoc tests for repeated measures are in the Options menu.
• Move time to the Display Means for box.
• Choose Bonferroni from the Confidence interval adjustment menu.
• In the Save menu, ask for the standardised residuals to
be added to the dataset (ZRE_1 – ZRE_3).
• These will be added to your dataset by SPSS when you run
the analysis.
• They should then be checked for normality using
histograms/ Shapiro-Wilk tests in Analyze > Descriptive
Statistics > Explore.
The output
• Histograms of the standardised residuals at the three time
points showed an approximate normal distribution.
• Mauchly’s test is significant (p < 0.001), so the assumption of
Sphericity has not been met.
• If Sphericity can be assumed, use the top row of the ‘Tests of
Within-Subjects Effects’ table.
• If it cannot be assumed, use the Greenhouse-Geisser row, which
adjusts the degrees of freedom of the repeated measures ANOVA.
• As p < 0.001, there is a difference in cholesterol between at
least 2 time points.
• Report the results of this table as [F(df_time, df_Error(time)) =
test statistic F, p = …].
• Here a Greenhouse-Geisser correction was applied to the
degrees of freedom, so use [F(1.235, 21.001) = 212.321, p <
0.001] when reporting the results.
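The non-integer degrees of freedom come from multiplying the usual values, k - 1 and (k - 1)(n - 1), by the Greenhouse-Geisser epsilon. The sketch below shows that arithmetic and a small formatter for the reporting pattern. The helper names `gg_corrected_df` and `format_report` are assumptions for illustration; the only figures taken from the slides are the reported F(1.235, 21.001) = 212.321 and the 3 time points.

```python
def gg_corrected_df(epsilon, k, n):
    """Degrees of freedom after a Greenhouse-Geisser correction.

    epsilon: correction factor (1.0 means no correction)
    k: number of time points, n: number of participants
    """
    df_time = epsilon * (k - 1)
    df_error = epsilon * (k - 1) * (n - 1)
    return df_time, df_error

def format_report(f_value, df_time, df_error, p_text):
    """Format a result as F(df_time, df_error) = F, p ..."""
    return f"F({df_time:.3f}, {df_error:.3f}) = {f_value:.3f}, {p_text}"

# With k = 3 time points the uncorrected df_time is 2, so the reported
# df_time of 1.235 implies an epsilon of roughly 1.235 / 2.
k = 3
implied_epsilon = 1.235 / (k - 1)
print(round(implied_epsilon, 4))

report = format_report(212.321, 1.235, 21.001, "p < 0.001")
print(report)
```

Note that the F statistic itself is unchanged by the correction; only the degrees of freedom, and therefore the p-value, are adjusted.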
• The main ANOVA is significant, so there is a difference
between at least two time points.
• The Pairwise Comparisons table contains multiple paired t-tests with a
Bonferroni correction.
• There was a significant difference between each pair of time points.
• Cholesterol fell by 0.566 mmol/L between baseline and 4 weeks (p <
0.001) and by a further 0.063 mmol/L between 4 and 8
weeks (p = 0.004).
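The pairwise comparisons described above amount to a paired t-test for every pair of time points, with each p-value multiplied by the number of comparisons (and capped at 1). A minimal sketch follows, using SciPy's `ttest_rel`. The function name `bonferroni_pairwise` and the six rows of cholesterol-like values are hypothetical and are not the contents of Cholesterol.sav.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def bonferroni_pairwise(data, labels):
    """Paired t-tests between all pairs of columns with Bonferroni-adjusted p-values."""
    data = np.asarray(data, dtype=float)
    pairs = list(combinations(range(data.shape[1]), 2))
    m = len(pairs)  # number of comparisons
    results = []
    for i, j in pairs:
        t_stat, p_raw = stats.ttest_rel(data[:, i], data[:, j])
        results.append({
            "pair": (labels[i], labels[j]),
            "mean_diff": float(np.mean(data[:, i] - data[:, j])),
            "p_raw": float(p_raw),
            "p_bonferroni": float(min(1.0, p_raw * m)),
        })
    return results

# Hypothetical values (mmol/L), one row per participant (assumed data).
toy = [[6.4, 5.9, 5.8],
       [6.8, 6.1, 6.1],
       [5.9, 5.6, 5.4],
       [7.1, 6.6, 6.5],
       [6.2, 5.6, 5.6],
       [6.6, 6.2, 5.9]]
results = bonferroni_pairwise(toy, ["before", "after 4 weeks", "after 8 weeks"])
for r in results:
    print(r["pair"], f"mean diff = {r['mean_diff']:.3f}",
          f"adjusted p = {r['p_bonferroni']:.4f}")
```

With 3 time points there are 3 comparisons, so each raw p-value is multiplied by 3.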
Reporting ANOVA
• Participants used Flora margarine for 8 weeks. Their
cholesterol was measured before the special diet, after 4
weeks and after 8 weeks. Normality checks were carried
out on the residuals which were approximately normally
distributed.
• A repeated measures ANOVA with a Greenhouse-Geisser
correction showed that mean cholesterol differed
significantly between time points [F(1.235, 21.001)=
212.321, p < 0.001].
• Post hoc tests using the Bonferroni correction revealed that
cholesterol fell by an average of 0.566 mmol/L after 4
weeks (p < 0.001) and by a further 0.063
mmol/L between 4 and 8 weeks (p = 0.004).
Statistical Soup
ANOVA, ANCOVA, MANOVA, & MANCOVA
ANOVA
• The core component of all four of these
analyses (ANOVA, ANCOVA, MANOVA and
MANCOVA) is the first in the list, the ANOVA.
ANOVA
• An "Analysis of Variance" (ANOVA) tests three
or more groups for mean differences on
a continuous (i.e. scale or interval) response
(dependent) variable.
• Group membership could be race, level of
education, or treatment condition.
ANOVA
• There are two main types of ANOVA:
(1) A "one-way" ANOVA compares levels (i.e. groups) of a single
factor on a single continuous response variable (e.g.
comparing Weight loss by 'Type of diet').
Single independent variable (factor):
1. Type of diet = 1, 2, 3
Single continuous dependent variable (response or outcome):
1. Weight loss
(2) A "two-way" ANOVA compares levels of two
factors for mean differences on a single continuous response
variable (e.g. comparing Weight loss by both 'Type of diet' and
'Favorite soft drink').
Two independent variables (factors):
1. Type of diet = 1, 2, 3
2. Favorite soft drink = Don’t like, Sugarless, With sugar
Single continuous dependent variable (response or outcome):
1. Weight loss
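As a small illustration of the one-way case, the sketch below runs a one-way ANOVA on the bacterial biomass readings from the pH example earlier in these slides (three replicates at each of three pH levels), using SciPy's `f_oneway`. The variable names are assumptions; the two-way case needs a model with two factors and is not shown here.

```python
from scipy import stats

# Optical density readings from the pH example (3 replicates per pH level).
ph_5_5 = [12, 15, 9]
ph_6_5 = [20, 19, 23]
ph_7_5 = [40, 35, 42]

# One factor (pH level) with three groups, one continuous response (biomass).
f_stat, p_value = stats.f_oneway(ph_5_5, ph_6_5, ph_7_5)

n_groups = 3
n_total = len(ph_5_5) + len(ph_6_5) + len(ph_7_5)
df_between = n_groups - 1
df_within = n_total - n_groups
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.5f}")
```

A significant result only says that at least one group mean differs; post hoc comparisons are still needed to find which ones.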
ANCOVA (Analysis of Covariance)
• Suppose we want to study the effect of an independent variable (Type of diet)
on a dependent variable (Weight loss). A person’s Age may have an added
influence on the amount of weight they lose on a particular diet.
• In this case ANCOVA takes the covariate Age, which may directly
affect Weight loss, into consideration.
• In general, one or more variables that could be related to the dependent
variable can be controlled for.
Two independent variables:
1. One categorical variable (factor): Type of diet = 1, 2, 3
2. One or more controlled variable/s (covariate/s): Age (continuous variable)
Single continuous dependent variable (response or outcome):
1. Weight loss
ANCOVA COMPARES A CONTINUOUS RESPONSE VARIABLE (E.G. WEIGHT LOSS) BY LEVELS OF A FACTOR VARIABLE (E.G. TYPE OF DIET), CONTROLLING FOR COVARIATE/S (E.G. AGE).
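One way to see what "controlling for a covariate" means is to write ANCOVA as a linear model: an intercept, one indicator (dummy) column for each non-reference diet, and a column for age. The sketch below fits that model by least squares with NumPy. The function name `fit_ancova` is hypothetical, and the data are made up and deliberately noise-free (built from an intercept of 1, diet effects of 2 and 3, and an age slope of 0.1) so that the fitted coefficients are easy to check. It estimates the adjusted effects only and does not produce the F tests that SPSS reports.

```python
import numpy as np

def fit_ancova(diet, age, weight_loss):
    """Least-squares fit of weight_loss ~ diet (dummy coded) + age."""
    diet = np.asarray(diet)
    age = np.asarray(age, dtype=float)
    y = np.asarray(weight_loss, dtype=float)
    levels = sorted(set(diet.tolist()))
    # Intercept, one dummy column per non-reference diet, then the covariate.
    columns = [np.ones_like(age)]
    names = ["intercept"]
    for level in levels[1:]:
        columns.append((diet == level).astype(float))
        names.append(f"diet_{level}_vs_{levels[0]}")
    columns.append(age)
    names.append("age")
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(names, coef.tolist()))

# Hypothetical, noise-free data: weight_loss = 1 + diet effect + 0.1 * age.
diet = [1, 1, 1, 2, 2, 2, 3, 3, 3]
age = [25, 40, 55, 30, 45, 60, 35, 50, 65]
weight_loss = [3.5, 5.0, 6.5, 6.0, 7.5, 9.0, 7.5, 9.0, 10.5]

coefficients = fit_ancova(diet, age, weight_loss)
for name, value in coefficients.items():
    print(f"{name}: {value:.3f}")
```

The diet coefficients are the differences from Diet 1 at the same age, which is the comparison ANCOVA makes.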
ANCOVA with three covariate variables
• In this example, the change in forced expiratory volume (FEV) was studied as the dependent variable across 3 age groups as the independent variable, taking height, sex and smoking status into account as covariates (covariate variables) that may affect FEV.
• Note here that the covariate/s could include continuous or categorical variables.
• Forced expiratory volume (FEV) measures how much air a person can exhale during a forced breath.
ANCOVA -> regression
• If an ANCOVA is carried out with a single continuous response
variable and no factors, the analysis is a regression
(when the covariate variable is continuous).
Two independent variables:
1. One categorical variable (factor): Type of diet = 1, 2, 3
2. One or more controlled variable/s (covariate/s): Age (continuous variable)
Single continuous dependent variable (response or outcome):
1. Weight loss
MANOVA
• The obvious difference between ANOVA and a "Multivariate Analysis of
Variance" (MANOVA) is the “M”, which stands for multivariate.
• In basic terms, a MANOVA is an ANOVA with two or more continuous
response variables.
• Like ANOVA, MANOVA has both one-way and two-way forms.
• The number of factor variables involved distinguishes a one-way MANOVA
from a two-way MANOVA.
One-way MANOVA
Single independent variable (factor):
1. Type of diet = 1, 2, 3
Two or more continuous dependent variables (response, dependent or outcome):
1. Weight loss
2. Cholesterol level
Two-way MANOVA
At least two independent variables (factors):
1. Type of diet = 1, 2, 3
2. Favorite soft drink = Don’t like, Sugarless, With sugar
Two or more continuous dependent variables (response, dependent or outcome):
1. Weight loss
2. Cholesterol level
• A more subtle difference from ANOVA is that MANOVA can also
compare the levels of a factor that has only two levels
(binary).
• When dealing with a single response variable and a binary factor (e.g.
gender), one uses an independent-samples t-test.
Independent-samples t-test
Single binary independent variable (factor):
1. Gender = Male, Female
Single continuous dependent variable (response or outcome):
1. Weight loss
• However, a t-test cannot estimate differences for more than one
response variable together, so a MANOVA fills that need.
One-way MANOVA
Single independent variable (factor):
1. Gender = Male, Female
Two or more continuous dependent variables (response or outcome):
1. Weight loss
2. Cholesterol level
Why Should You Do a MANOVA?
• You do a MANOVA instead of a series of one-at-a-time
ANOVAs
• MANOVA takes into account the intercorrelations among the
dependent variables (DVs).
One-way MANOVA
Single independent variable (factor):
1. Type of diet = 1, 2, 3
Two or more continuous dependent variables (response, dependent or outcome):
1. Weight loss
2. Cholesterol level
Two-way MANOVA
At least two independent variables (factors):
1. Type of diet = 1, 2, 3
2. Favorite soft drink = Don’t like, Sugarless, With sugar
Two or more continuous dependent variables (response, dependent or outcome):
1. Weight loss
2. Cholesterol level
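One common MANOVA test statistic, Wilks' lambda, shows how the dependent variables are analysed together rather than one at a time: it compares the within-group scatter of all dependent variables (including their cross-products, which carry the intercorrelations) with the total scatter. The sketch below computes it with NumPy for a one-way design. The function name `wilks_lambda` and the small data set are hypothetical, and the conversion to an F statistic and p-value is left out.

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda for a one-way MANOVA.

    groups: list of arrays, each of shape (n_i, p) with p dependent variables.
    Returns det(E) / det(E + H); values near 0 suggest group differences,
    a value of 1 means the group mean vectors are identical.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_data = np.vstack(groups)
    grand_mean = all_data.mean(axis=0)
    p = all_data.shape[1]
    E = np.zeros((p, p))  # within-group sums of squares and cross-products
    H = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in groups:
        centred = g - g.mean(axis=0)
        E += centred.T @ centred
        d = (g.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += g.shape[0] * (d @ d.T)
    return float(np.linalg.det(E) / np.linalg.det(E + H))

# Hypothetical data: columns are (weight loss, cholesterol level), one array per diet.
diet_1 = [[2.0, 6.1], [3.1, 5.8], [2.5, 6.4], [3.6, 6.0]]
diet_2 = [[3.0, 5.9], [4.2, 5.5], [3.8, 5.7], [4.9, 5.6]]
diet_3 = [[5.1, 5.2], [6.0, 5.0], [5.4, 5.5], [6.3, 4.9]]

lam = wilks_lambda([diet_1, diet_2, diet_3])
print(round(lam, 4))
```

Running separate ANOVAs would use only the diagonal entries of these matrices and so ignore how weight loss and cholesterol vary together.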
MANCOVA
• As with ANOVA and ANCOVA, the main difference between MANOVA
and MANCOVA is the “C,” which again stands for “covariance.”
• Both a MANOVA and a MANCOVA feature two or more response
variables; the key difference between the two is the nature of
the independent variables.
• A MANOVA can include only factors. The analysis becomes a
MANCOVA when one or more covariates are added to
the mix.
One-way MANCOVA
Two independent variables:
1. One categorical variable (factor): Type of diet = 1, 2, 3
2. One or more controlled variable/s (covariate/s): Age
Two or more continuous dependent variables (response or outcome):
1. Weight loss
2. Cholesterol level