one-way anova - My Illinois State

ANALYSIS OF VARIANCE
(ANOVA)
STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
1. Examine Strength and Direction of Relationships
a. Bivariate (e.g., Pearson Correlation, r)
• Between one variable and another:
r_xy or Y = a + b1 x1
b. Multivariate (e.g., Multiple Regression Analysis)
• Between one dep. var. and each of several indep.
variables, while holding all other indep. variables constant:
Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk
2. Compare Groups
a. Compare Proportions (e.g., Chi-Square Test, χ²)
H0: P1 = P2 = P3 = … = Pk
b. Compare Means (e.g., Analysis of Variance)
H0: µ1 = µ2 = µ3 = …= µk
ONE-WAY ANOVA
ANOVA was developed in 1919 by Sir Ronald Fisher, a
British statistician and geneticist/evolutionary biologist
Sir Ronald Fisher
(1890-1962)
When Do You Use ANOVA?
• To compare the mean values of a certain characteristic
among two or more groups.
• To see whether two or more groups are equal (or different)
on a given metric characteristic.
• To examine whether a metric dependent variable is a
function of a categorical independent variable.
Remember: Level of measurement determines choice of
statistical method.
Statistical Techniques and Levels of Measurement:

                       INDEPENDENT VARIABLE
DEPENDENT VARIABLE     NOMINAL/CATEGORICAL          METRIC (ORDERED METRIC or HIGHER)
NOMINAL                * Chi-Square                 * Discriminant Analysis
                       * Fisher’s Exact Prob.       * Logit Regression
METRIC                 * T-Test                     * Correlation Analysis
                       * Analysis of Variance       * Regression Analysis
                         (An Example ?)
ONE-WAY ANOVA
H0 in ANOVA?
H0: There are no differences among the mean values of the
groups being compared (i.e., the group means are all equal)–
H0: µ1 = µ2 = µ3 = …= µk
Ha (Conclusion if H0 rejected)?
Not all group means are equal
(i.e., at least one group mean is different from the rest).
ONE-WAY ANOVA
So, the number of steps involved in ANOVA depends on
whether we are comparing 2 groups or more than 2 groups:
• Scenario 1. When comparing 2 groups, it is a one-step test:
2 Groups: A, B
Step 1: Check to see if the two groups are different or not, and if so, how.
• Scenario 2. When comparing 3 or more groups, if H0 is rejected, it is
a two-step test:
3 or More Groups: A, B, C
Step 1: Overall test that examines if all groups are equal or not.
And, if not all are equal (H0 rejected), then:
Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among
which groups) the differences exist, and how.
Typical solution presented in statistics classes requires…
• Constructing an ANOVA TABLE

ANOVA TABLE
Source                                     Sum of Squares   df      Mean Squares           F-Ratio (Test Statistic)
Between Groups (Between Groups SS)         SSB              K – 1   MSB = SSB / (K – 1)    F = MSB / MSW
Within Groups (Within Groups SS)           SSW              N – K   MSW = SSW / (N – K)
Total (Total Sum of Squares)               SST              N – 1

where

$$ MSB = \frac{n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2}{K - 1} $$

$$ MSW = \frac{\sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \dots + \sum_i (x_{ki} - \bar{x}_k)^2}{N - K} $$

Let’s see the intuitive logic…
ONE-WAY ANOVA
EXAMPLE: Whether or not average earnings per share (EPS) for commercial banks,
retailing operations, & utility companies (variable Industry) was the same last year.
• Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.
• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries

Banking     Retailing   Utility
$6.42       $3.52       $3.55
 2.83        4.21        2.13
 8.94        4.36        3.24
 6.80        2.67        6.47
 5.70        3.49        3.06
 4.65        4.68        1.80
 6.20        3.30        5.29
 2.71        2.68        2.96
 8.34        7.25        2.90
 -----       0.16        1.73
nB = 9      nR = 10     nU = 10     n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do?
Group means: $\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$
Grand mean: $\bar{\bar{x}} = 4.21$
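As a quick check of that first step, the group means and the grand mean can be recomputed from Table 1. This is only a sketch using Python's standard library; the dictionary name `eps` and the variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# One mean per industry.
group_means = {name: mean(values) for name, values in eps.items()}

# The grand mean pools all 29 firms; with unequal group sizes it is
# not simply the average of the three group means.
all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name:<10} n = {len(eps[name]):>2}  mean = {m:.2f}")
print(f"Grand mean (n = {len(all_values)}) = {grand_mean:.2f}")
```

Rounded to two decimals, these should agree with the means listed under Table 1.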
ONE-WAY ANOVA
Why is it called ANOVA?
• Differences in EPS (Dep. Var.) among all 29 firms have
two components: differences among the groups and
differences within the groups. That is,
a. There are some differences in EPS among the three groups of
firms (Banks vs. Retailers vs. Utilities), and
b. There are also some differences/variations in EPS of the firms
within each of these groups (among banks themselves, among
retailers themselves, and among utilities themselves).
• ANOVA will partition/analyze the variance of the dependent
variable (i.e., the differences in EPS) and trace it to its two
components/sources, i.e., to differences between groups vs.
differences within groups.
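This partition can be written as an identity on sums of squares. The notation is assumed here for illustration: $x_{ji}$ is the EPS of firm $i$ in group $j$, $\bar{x}_j$ is the mean of group $j$, and $\bar{\bar{x}}$ is the grand mean.

$$
\underbrace{\sum_{j=1}^{K}\sum_{i=1}^{n_j}(x_{ji}-\bar{\bar{x}})^2}_{SST}
=
\underbrace{\sum_{j=1}^{K} n_j(\bar{x}_j-\bar{\bar{x}})^2}_{SSB}
+
\underbrace{\sum_{j=1}^{K}\sum_{i=1}^{n_j}(x_{ji}-\bar{x}_j)^2}_{SSW}
$$

For the EPS example, the sums of squares reported in the ANOVA table satisfy this identity: 122.509 = 35.397 + 87.112.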
WHY?
ONE-WAY ANOVA
The underlying intuitive logic in ANOVA:
If the groups that are being compared come from the same
population (i.e., if the groups are alike/equal):
• They should exhibit similar differences (have equal variability)
• Hence, the differences among these groups
should be no more than the differences within
them (i.e., among members within same groups).
• That is, groups that are alike/similar are expected to have about
as much variability between them as they have within them.
ONE-WAY ANOVA
On the other hand…
If the groups being compared are divergent/dissimilar/unequal?
• They would exhibit more difference between them than
they show within them (i.e., among members within the same groups).
• That is, they will have greater similarity/commonality
internally than they have externally (with members of the
other groups).
ONE-WAY ANOVA
CRITERION USED BY ANOVA: Groups can be considered
different if there exist larger differences among these groups
than there are among members within them.
QUESTION:
Given the above, what would one have to do to conduct
ANOVA?
• That is, what do you have to do to judge whether or not two
or more groups can be considered different/equal (with
respect to a given characteristic)?
a. Compute the differences that exist among these groups, and
b. Compare it with the differences that exist within these groups.
And, that is exactly what ANOVA does….
ONE-WAY ANOVA
QUESTION: How do we usually measure differences/variations?
• VARIANCE:
A useful index of differences/variations/
dispersion among a set of values/scores.
– Estimate of average (i.e., per observation) difference from the mean
• Computation?
S² = (Sum of squared deviations from the mean) / (Sample Size – 1)

$$ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
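The variance formula can be tried out on one column of Table 1. The sketch below applies it to the nine banking EPS values and compares the result with `statistics.variance` from the standard library, which uses the same n – 1 denominator; the function name `sample_variance` is illustrative.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

# EPS of the 9 banks in Table 1.
banking = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]

s2 = sample_variance(banking)
print(f"S^2 for the 9 banks (manual)       = {s2:.3f}")
print(f"S^2 for the 9 banks (statistics)   = {variance(banking):.3f}")
```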
ONE-WAY ANOVA
• So, steps in performing ANOVA:
a. Compute the BETWEEN-GROUP VARIANCE for the
characteristic under study (i.e., the dependent variable),
b. Compute the WITHIN-GROUP VARIANCE for the same
characteristic/variable, and then
c. COMPARE the two
(i.e., check to see if Between Group var. > Within Group Var.)
NOTE: In ANOVA the term “MEAN SQUARE,” rather than
variance, is utilized.
ONE-WAY ANOVA
Total WITHIN Group Variance (or Mean Square WITHIN)?
Mean Square WITHIN Groups (MSW):

$$ MSW = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{9 + 10 + 10 - 3} = \frac{87.112}{26} = 3.350 $$
Let’s see what we just did:

MSW = (Sum of Squared Deviations of All Observations From Their Respective Group Means) / (Total Sample Size – Number of Groups) = SS Within / (N – K)

The generic mathematical formula for MSW:

$$ MSW = \frac{\sum_i (x_{Bi} - \bar{x}_B)^2 + \sum_i (x_{Ri} - \bar{x}_R)^2 + \sum_i (x_{Ui} - \bar{x}_U)^2}{n - K} $$

The denominator, n – K, is called “Degrees of Freedom” = (nB – 1) + (nR – 1) + (nU – 1).
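The MSW computation can be reproduced from the Table 1 data. The following is a minimal sketch; the helper name `ss_within` and the variable names are illustrative only.

```python
# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def ss_within(groups):
    """Sum of squared deviations of each observation from its own group mean."""
    total = 0.0
    for values in groups:
        m = sum(values) / len(values)
        total += sum((x - m) ** 2 for x in values)
    return total

groups = list(eps.values())
n_total = sum(len(g) for g in groups)  # N: total sample size
k = len(groups)                        # K: number of groups

ssw = ss_within(groups)
df_within = n_total - k                # same as (nB - 1) + (nR - 1) + (nU - 1)
msw = ssw / df_within

print(f"SSW = {ssw:.3f}, df = {df_within}, MSW = {msw:.3f}")
```

Because the code uses unrounded group means, small differences in the last decimal place relative to hand calculation with rounded means are possible.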
ONE-WAY ANOVA
Let’s now compute the BETWEEN Group Variance (Mean Square
BETWEEN, MSB)?
Mean Square BETWEEN Groups (MSB):

$$ MSB = \frac{9(5.84 - 4.21)^2 + 10(3.63 - 4.21)^2 + 10(3.31 - 4.21)^2}{3 - 1} = \frac{35.397}{2} = 17.698 $$
Let’s see what we just did:

MSB = (Sum of Squared Deviations of Group Means from the Grand Mean, weighted by respective group sizes) / (Number of Groups – 1) = SS Between / (K – 1)

Mathematical formula for MSB:

$$ MSB = \frac{n_B(\bar{x}_B - \bar{\bar{x}})^2 + n_R(\bar{x}_R - \bar{\bar{x}})^2 + n_U(\bar{x}_U - \bar{\bar{x}})^2}{K - 1} $$

The denominator, K – 1, is called Degrees of Freedom.
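The MSB computation can be checked in the same way. This sketch weights each squared deviation of a group mean from the grand mean by the group size; the helper name `ss_between` is illustrative.

```python
# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def ss_between(groups):
    """Size-weighted squared deviations of group means from the grand mean."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

groups = list(eps.values())
k = len(groups)            # K: number of groups

ssb = ss_between(groups)
df_between = k - 1
msb = ssb / df_between

print(f"SSB = {ssb:.3f}, df = {df_between}, MSB = {msb:.3f}")
```

Plugging the two-decimal means into the formula by hand gives a slightly different numerator; the slide's 35.397 corresponds to unrounded means, which is what the code uses.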
ONE-WAY ANOVA
Mean Square Between Groups = MSB = 17.698
MSB represents the portion of the total differences/variations in
EPS (the dependent variable) that is attributable to (or explained
by) differences BETWEEN groups (e.g., industries)
• That is, the part of differences in companies’ EPS that result
from whether they are banks, retailers, or utilities.
ONE-WAY ANOVA
Mean Square Within Groups (MS Residual/Error) =
MSW = 3.35
MSW represents:
a. The differences in EPS (the dependent variable) that are
due to all other factors that are not examined and not controlled
for in the study (e.g., diversification level, firm size, etc.)
Plus . . .
b. The natural variability of EPS (the dependent variable) among
members within each of the comparison groups (Note that even
banks with the same size and same level of diversification would
have different EPS levels).
ONE-WAY ANOVA
Now, let’s compare MSB & MSW:
MSB = 17.698 and MSW = 3.35.
QUESTION:
Based on the logic of ANOVA, when would we consider two (or
more) groups as different/unequal?
When MSB is significantly larger than MSW.
QUESTION:
What would be a reasonable index (a single number) that will
show how large MSB is compared to MSW
(i.e., a single number that will show if MSB is larger than, equal
to, or smaller than MSW)?
Compare BETWEEN and WITHIN Group
Variances/Mean Squares--Compute the F-Ratio:
• Ratio of MSB and MSW (Call it F-Ratio):
$$ F = \frac{MSB}{MSW} $$
• What can we infer when F-ratio is close to 1?
– MSB and MSW are likely to be equal and, thus, there
is a strong likelihood that NO difference exists among
the comparison groups.
• How about when F-ratio is significantly larger than 1?
– The more F-ratio exceeds 1, the larger MSB is
compared to MSW and, thus, the stronger would be the
likelihood/evidence that group difference(s) exist.
$$ F = \frac{MSB}{MSW} = \frac{17.698}{3.350} = 5.282 $$
• Results of the above computations are usually summarized
in an ANOVA TABLE such as the one that follows:
ANOVA TABLE
Source            Sum of Squares   df            Mean Squares            F
Between Groups    35.397           K – 1 = 2     35.397 / 2 = 17.698     17.698 / 3.350 = 5.282
Within Groups     87.112           N – K = 26    87.112 / 26 = 3.350
Total             122.509          N – 1 = 28
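The whole ANOVA table can be generated in one pass. The function below is a minimal sketch using only built-in Python; `one_way_anova` and its dictionary keys are illustrative names, not part of SPSS or of any library.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups: group means vs. grand mean, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups: each observation vs. its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {
        "SSB": ssb, "df_between": k - 1, "MSB": msb,
        "SSW": ssw, "df_within": n_total - k, "MSW": msw,
        "SST": ssb + ssw, "df_total": n_total - 1,
        "F": msb / msw,
    }

# EPS values transcribed from Table 1.
banking = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]
retailing = [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16]
utility = [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73]

table = one_way_anova([banking, retailing, utility])
print(f"Between  SS = {table['SSB']:8.3f}  df = {table['df_between']:2d}  "
      f"MS = {table['MSB']:7.3f}  F = {table['F']:.3f}")
print(f"Within   SS = {table['SSW']:8.3f}  df = {table['df_within']:2d}  "
      f"MS = {table['MSW']:7.3f}")
print(f"Total    SS = {table['SST']:8.3f}  df = {table['df_total']:2d}")
```

Here the total sum of squares is obtained as SSB + SSW, which relies on the partition of variance rather than on a separate computation.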
ONE-WAY ANOVA
Interpretation and Conclusion:
QUESTION: What does the F = 5.28 mean, intuitively?
For our sample companies, EPS difference across the three
industries (MSB) is more than 5 times the EPS difference
among firms within the industries (MSW)
• QUESTION: What is our null Hypothesis?
• QUESTION: Is the above F-ratio of 5.28 large enough to
warrant rejecting the null?
– ANSWER: It would be if the chance of being wrong (in rejecting the
null) does not exceed 5%.
– So, look up the F-value in the table of F-distribution (under appropriate
degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.
• Degrees of Freedom: v1 = k – 1 = 2
v2 = n – k = 26
F = 3.37 is significant at α = 0.05
(If F = 3.37 and we reject H0, 5% chance of being wrong)
F = 4.27 is significant at α = 0.025.
• That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.
But, our F = 5.28 > 4.27
• So, what can we say about our α-level? Will it be larger or smaller than 0.025?
ONE-WAY ANOVA
• Our F = 5.28 > 4.27
• The odds of being wrong, if we decide to reject the null, would
be less than 2.5% (i.e., α < 0.025).
Would rejecting the null be a safe bet?
Conclusion?
Reject the null and conclude that the average EPS is NOT EQUAL
FOR ALL GROUPS (industries) being compared.
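The table look-up can be cross-checked numerically. When the numerator degrees of freedom equal exactly 2, as in this example, the upper-tail probability of the F distribution has a simple closed form, P(F > f) = (1 + 2f / v2)^(–v2 / 2). This shortcut is not from the slides and does not apply to other numerator degrees of freedom; in general a statistics package (for example `scipy.stats.f.sf`) would be used instead. The function name below is illustrative.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom.

    Closed form that is valid only when the numerator df is exactly 2.
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)

df2 = 26  # v2 = n - k for the EPS example

# The two critical values quoted from the F table, then the observed F-ratio.
for f_value in (3.37, 4.27, 5.282):
    p = f_upper_tail_df1_2(f_value, df2)
    print(f"F = {f_value:.3f}  ->  upper-tail probability = {p:.4f}")
```

The first two probabilities should come out close to 0.05 and 0.025, and the one for the observed F-ratio should fall below 0.025, in line with the conclusion above.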
Is the analysis complete?
ONE-WAY ANOVA
• Is our analysis complete?
– It would be if we were comparing only two groups; simply
examine which sample mean is larger than which and report!!
HOWEVER, …
– If null is rejected and more than two groups are being compared:
• REMAINING QUESTION: Where exactly (i.e., between
which groups) do the differences lie? And, which group(s) of
firms exhibit relatively higher, lower, or equal EPS levels?
• ANSWER: Perform post hoc, multiple comparison tests.
– SPSS (and other software packages) offer a variety of
options (e.g., LSD, Bonferroni, Tukey, etc.) to choose from.
Let’s now review the steps involved…
ONE-WAY ANOVA
Overall Ho: All Group Means Are Equal
H1: Not All Groups Are Equal
Is overall F significant? (i.e., α < 0.05)
• No (α > .05): Don’t reject Ho; no group diff. found; Stop.
• Yes (α < .05): Reject Ho; not all group means are equal (i.e., at least 2 groups are diff.).
How many groups are being compared?
– If only 2: Examine the group means. Report which group has higher/lower mean. Stop.
– If more than 2: Conduct post-hoc pairwise comparison tests to see where the differences lie. Examine the results. Examine the group means. Report which groups have higher/lower means. Stop.
ANOVA in SPSS
Let’s now use SPSS to perform the same analysis.
NOTE: Students are supposed to have printed and
brought the “SPSS OUTPUT One-Way ANOVA”
PDF file with them to class.
ONE_WAY_EPS_SPSS_FILE
TWO-WAY ANOVA (with Interaction)
In our EPS example, suppose you suspect that a company’s size category
(small vs large) also may have a sig. effect on EPS. As such, since you did
not attempt to control for company size when selecting your sample firms,
small and large companies may not have been equally represented in the
three industry groups (e.g., what if compared to the banks in the sample, all
or a much greater % of retailers and utilities were small?). As such you are
concerned that the potential confounding effect of company size may have
distorted your earlier results. So, you now wish to examine possible EPS
differences among the 3 industries while controlling for the possible
confounding effect of company size (i.e., holding size constant/equal for the
firms in our three industries). In other words, you wish to know if there are
any differences among average EPS of banks, retailers, and utilities of equal
size.
SOURCES OF BETWEEN GROUP DIFFERENCES

                  INDUSTRY
COMPANY SIZE      Bank      Retailer      Utility
Small
Large
TWO-WAY ANOVA (with Interaction)
• So, Two-Way ANOVA will help us learn if banks in general, even
after controlling for co. size, would, on average, have higher EPS
than retailers and utilities.
• But an additional advantage of Two-Way ANOVA is that it can
also show us whether a particular group of banks (i.e., a CERTAIN
COMBINATION of industry and size category) is more/less
conducive to EPS than other combinations of the two
characteristics.
As just one example, it can show us if only the larger banks
(and not all banks in general) have significantly higher EPS
compared to firms in the other two industries (or compared to
only the smaller firms in the other two industries).
ANOVA Using SPSS
• TWO-WAY ANOVA (with Main & Interaction Effects):
– Analyze: General Linear Model
– Univariate: Y to “Dependent” box, Categorical X1 & X2 to the “Fixed Factors” box
– Model: Full, Continue
– Plots: X1 to “Horizontal”, X2 to “Separate Lines”, Add, Continue
– Post Hoc: Move factors (IVs) with >2 groups to “Post Hoc Tests” box, select “Tukey” or “Bonferroni”, Continue
– Options: Move Overall, X1, X2, and X1*X2 to “Display Means” box, check “Descriptive Stats.”, Continue
– OK
NOTE: Students are supposed to have printed and brought the “SPSS
OUTPUT Two-Way ANOVA with Interaction” PDF file with them to class.
TWO_WAY_EPS_SPSS_FILE
TWO-WAY ANOVA (Main & Interaction Effects Model)
Ho: There are no differences among the groups represented by either variable
Is overall F significant? (i.e., α < 0.05)
• No: Don’t reject Ho; no group diff. found; STOP.
• Yes: Reject Ho; some differences exist among the groups represented by at least one of the variables.
Determine if the interaction effect is significant:
– YES: Examine plot of interaction effect for results. STOP.
– NO:
a. Examine which main effect, if any, is significant (i.e., differences exist across categories of which independent variable).
b. Is the significant indep. var. dichotomous (i.e., represents only 2 groups)?
Yes, only 2 groups: Examine the group means for that variable; report which group has higher/lower mean. STOP.
No, more than 2 groups: Conduct post-hoc pairwise comparison tests for that var. to see where the differences lie. Examine the results. Examine the group means for that variable; report which groups have higher/lower means. STOP.
ANOVA
CAUTION:
Don’t get carried away with the number of factors (independent
categorical variables); DON’T DO N-WAY ANOVA !!!
ANOVA Using SPSS
ANOTHER EXAMPLE:
• Using the gss.sav data file, we wish to find out if the
age at which one gets married (agewed) is a function
of one’s gender (sex) and highest educational degree
(degree). That is, if average marriage age is different
among the two genders and various educational
groups. If so, in what way?
• NOTE: Here, we are considering/treating educational
degree as a nominal/categorical variable, and NOT as
an ordered metric variable.
ASSIGNMENT 4
1. Suppose, as a social scientist, you are interested in studying
gender differences in preference for different types of music.
Specifically, you wish to know if there are differences between
men and women relative to how much they like classical music
(variable classical). The gss.sav data file (on your SPSS Data
Disk) includes data regarding such issues. This data set represents
1500 randomly selected cases from the 1993 General Social
Survey. Use the data from this SPSS file to address the above
questions.
NOTE:
If you check the value labels for the variables classical, opera, and
country in the gss.sav file, you will see that they were measured
on 5-point scales (1=Like Very Much, 5=Dislike Very Much) and,
thus, can be considered metric.
ASSIGNMENT 4
2. As a staff researcher in the HR Department of a major
company, you are interested in learning if there are
differences among male and female employees and among
employees who have different levels of education regarding
the level of importance that they attach (a) to having a
fulfilling job. Data regarding such issues have been
obtained through the General Social Survey using
a representative sample of approximately 1500 working
men and women in the U.S. You have access to the
resulting data (see gss.sav SPSS data file, variables sex,
impjob, and degree). Use this data set to address the above
issues.
IMPORTANT NOTES FOR QUESTIONS 2, 3, AND 4:
• Treat variable “degree” as a categorical/nominal variable.
• When interpreting the results, please pay attention to the fact that if you check
the value labels for the dependent variable, you will notice that it was
measured on a 5-point scale (1=One of Most Important, 5=Not at All
Important).
• If you find it necessary to conduct post hoc multiple comparison tests,
use the Tukey option.
• IMPORTANT: If the alpha level for a given test is just slightly higher than 0.05
(e.g., 0.054), consider that difference statistically significant.
REMINDERS:
– For each analysis, include the Notes part of the SPSS output in the printout.
Also edit the first page of every output to include your name. Make sure
that you state your complete interpretations and explanations on the
appropriate pages of the output. Be specific as to how you have used what
parts of the output to reach your conclusions. Make sure that your
explanations are complete. For example, it is not enough to say that there is
a difference between groups A and B regarding characteristic C. You have to
go on to indicate how the two groups are different on characteristic C (e.g.,
“on average, group A exhibits more/less of the characteristic C”).
QUESTIONS OR
COMMENTS