Uploaded by Muhammad Wasi

2023/6/27
Chapter 9. Analysis of Variance
➢ Main Idea
○ To compare the means of two samples, we use a t-test. To compare the means of
three or more groups, we should use an F-test rather than a series of t-tests.
○ With three groups, carrying out t-tests on every pair of groups would involve
three separate tests: 1 vs. 2, 2 vs. 3, and 1 vs. 3.
– If each of these t-tests uses a 0.05 level of significance, then for each test
the probability of falsely rejecting the null hypothesis (Type I error) is only
5%. If we assume the tests are independent, the overall probability of no Type I
error is 0.95^3 = 0.857.
– So the probability of at least one Type I error is 1 - 0.857 = 0.143, or 14.3%.
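○ An ANOVA produces an F-statistic, which is similar to the t-statistic in that it
compares the amount of systematic variance to the amount of unsystematic variance.
– It tests the overall null hypothesis that all population means are equal,
$H_0: \mu_1 = \mu_2 = \mu_3$, against the alternative that at least two means
differ. The sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3$ are the evidence used
to judge this.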
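The Type I error arithmetic above can be checked with a few lines of R. This is only a sketch under the same independence assumption stated in the text; the variable names (`alpha`, `k`, `m`, `p_none`, `p_any`) are chosen here for illustration.

```r
# Familywise Type I error rate when every pair of k groups is t-tested,
# assuming the tests are independent (as the text does).
alpha <- 0.05               # significance level of each t-test
k <- 3                      # number of groups
m <- choose(k, 2)           # number of pairwise comparisons (3 when k = 3)

p_none <- (1 - alpha)^m     # probability of no Type I error in any test
p_any <- 1 - p_none         # probability of at least one Type I error

print(round(c(no_error = p_none, at_least_one = p_any), 3))
```

Changing `k` shows how quickly the rate grows: with more groups there are more pairs, which is the motivation for a single overall F-test.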
Treatment        1         2       ...      j       ...      k
                x11       x12      ...     x1j      ...     x1k
                x21       x22      ...     x2j      ...     x2k
                 .         .                .                .
                 .         .                .                .
                 .         .                .                .
               xn11      xn22      ...    xnjj      ...    xnkk
Sample size      n1        n2      ...      nj      ...      nk
Sample mean      x̄1        x̄2      ...      x̄j      ...      x̄k
– $x_{ij}$ = the ith observation of the jth sample.
– $n_j$ = the number of observations in sample j.
– $\bar{x}_j$ = the mean of the jth sample.
– $\bar{\bar{x}}$ = the grand mean of all the observations.
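○ Test statistic
– Sum of Squares for Model (between-treatments variation)
$$\mathrm{SSM} = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$
➢ If $\bar{x}_1 = \bar{x}_2 = \dots = \bar{x}_k$, then SSM = 0.
– Sum of Squares for Error (within-treatments variation)
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
– Mean Square for Model
$$\mathrm{MSM} = \frac{\mathrm{SSM}}{k-1}$$
– Mean Square for Error, where $n = n_1 + \dots + n_k$ is the total number of
observations
$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n-k}$$
– Test statistic, which follows an F distribution with $k-1$ and $n-k$ degrees of
freedom when the null hypothesis is true
$$F = \frac{\mathrm{MSM}}{\mathrm{MSE}}$$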
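As a sketch of how these formulas work in practice, the R code below computes SSM, SSE, MSM, MSE and F by hand for the bumper repair costs tabulated in this chapter. The column names `Bumper1` to `Bumper4` and the variable names are chosen here for illustration; `pf()` is base R's F distribution function.

```r
# Bumper repair costs (the wide-type table from this chapter).
bumper <- data.frame(
  Bumper1 = c(610, 354, 234, 399, 278, 358, 379, 548, 196, 444),
  Bumper2 = c(404, 663, 521, 518, 499, 374, 562, 505, 375, 438),
  Bumper3 = c(599, 426, 429, 621, 426, 414, 332, 460, 494, 637),
  Bumper4 = c(272, 405, 197, 363, 297, 538, 181, 318, 412, 499)
)

k <- ncol(bumper)                   # number of treatments
n_j <- sapply(bumper, length)       # sample size of each treatment
n <- sum(n_j)                       # total number of observations
xbar_j <- sapply(bumper, mean)      # sample means
grand <- mean(unlist(bumper))       # grand mean

SSM <- sum(n_j * (xbar_j - grand)^2)                          # between treatments
SSE <- sum(sapply(bumper, function(x) sum((x - mean(x))^2)))  # within treatments
MSM <- SSM / (k - 1)
MSE <- SSE / (n - k)
F_stat <- MSM / MSE
p_value <- pf(F_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

print(round(c(SSM = SSM, SSE = SSE, MSM = MSM, MSE = MSE), 0))
print(round(c(F = F_stat, p = p_value), 4))
```

The hand-computed values should agree with the `aov()` table reported in Ex 9-1.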
➢ Two types of data file
– Wide-type file
   `Bumper 1` `Bumper 2` `Bumper 3` `Bumper 4`
 1        610        404        599        272
 2        354        663        426        405
 3        234        521        429        197
 4        399        518        621        363
 5        278        499        426        297
 6        358        374        414        538
 7        379        562        332        181
 8        548        505        460        318
 9        196        375        494        412
10        444        438        637        499
– Long-type data file
  values      ind
1    610 Bumper 1
2    354 Bumper 1
3    234 Bumper 1
4    399 Bumper 1
5    278 Bumper 1
6    358 Bumper 1
○ R's modelling functions, such as aov() for ANOVA and lm() for regression,
expect long-type data. Use the commands stack() and unstack() to convert a data
file from one type to the other.
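A minimal sketch of the conversion, using the first three rows of the bumper table. The column names `Bumper1` to `Bumper4` are written without spaces here as an assumption, so that they survive the round trip through `unstack()` unchanged.

```r
# Wide type: one column per treatment.
wide <- data.frame(
  Bumper1 = c(610, 354, 234),
  Bumper2 = c(404, 663, 521),
  Bumper3 = c(599, 426, 429),
  Bumper4 = c(272, 405, 197)
)

# stack() gives the long type: 'values' holds the observations and
# 'ind' is a factor recording which column each value came from.
long <- stack(wide)
print(head(long, 4))

# unstack() reverses the operation and returns the wide type.
back <- unstack(long, values ~ ind)
print(back)
```

The column names `values` and `ind` are fixed by `stack()`, which is why the model formula in Ex 9-1 is written as `values ~ ind`.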
Ex 9-1)
A car company is considering several new types of bumpers. To test how well
they react to low-speed collisions, 10 bumpers of each of four different types
were installed. The cost of repairing the damage in each case was assessed. Is
there sufficient evidence at the 5% significance level to infer that the bumpers
differ in their reactions to collisions?
> library(readxl)
> Ex9_1 <- read_excel("R/data/22F data/Ex9-1.xlsx")
> View(Ex9_1)
> bumperstack <- stack(as.data.frame(Ex9_1))   # Change into a long file.
> summary(aov(values ~ ind, data = bumperstack))
            Df Sum Sq Mean Sq F value Pr(>F)
ind          3 150884   50295   4.056 0.0139 *
Residuals   36 446368   12399
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value for bumper type is .014 < .05, so we reject the null hypothesis. We
can conclude that the mean repair cost differs between at least two of the
bumper types.
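The same decision can be reached by comparing the observed F value with the critical value of the F distribution. The sketch below takes the F value and degrees of freedom from the output above; `qf()` is base R's F quantile function, and the variable names are chosen here for illustration.

```r
alpha <- 0.05
F_obs <- 4.056    # F value reported by aov() above
df1 <- 3          # k - 1 = 4 - 1
df2 <- 36         # n - k = 40 - 4

# Critical value: reject H0 when the observed F exceeds it.
crit <- qf(1 - alpha, df1 = df1, df2 = df2)
reject <- F_obs > crit

print(round(crit, 3))
print(reject)
```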
Ex 9-2) Two-factor Experiment
A survey was conducted in which Americans aged between 37 and 45 were
asked how many jobs they have held in their lifetimes. The education levels are
1: Less than high school
2: High school
3: Some college without degree
4: At least one university degree.
The gender categories are 1: male and 2: female.
Can we infer that differences exist between genders and between educational
levels?
> Ex9_2 <- read_excel("R/data/22F data/Ex9-2.xlsx")
> View(Ex9_2)
> twoaov <- aov(Jobs ~ Gender*Education, data = Ex9_2)
> anova(twoaov)
Analysis of Variance Table

Response: Jobs
                 Df Sum Sq Mean Sq F value    Pr(>F)
Gender            1  11.25  11.250  1.1655 0.2837386
Education         1 134.56 134.560 13.9406 0.0003624 ***
Gender:Education  1   0.16   0.160  0.0166 0.8978966
Residuals        76 733.58   9.652
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The p-value for Gender is .28 > .05. There is no evidence at the 5%
significance level to infer that differences in the number of jobs exist between
men and women.
The p-value for Education is .00036 < .05. There is sufficient evidence to
conclude that differences in the number of jobs exist between educational levels.
The p-value for the Gender:Education interaction is .90 > .05, so there is no
evidence that the effect of education differs between men and women.
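In the table above, Education has 1 degree of freedom although it has four levels, which suggests that R read the numeric codes 1 to 4 as a continuous variable rather than as categories. The sketch below shows the effect of converting the codes with `factor()`. It uses simulated data, not the survey data: the data frame `sim` and its randomly generated `Jobs` values are assumptions made purely for illustration, so only the degrees of freedom are meaningful.

```r
# Simulated stand-in for the survey (NOT the Ex 9-2 data):
# 10 people in each of the 2 x 4 Gender-by-Education cells.
set.seed(1)
sim <- expand.grid(rep = 1:10, Gender = 1:2, Education = 1:4)
sim$Jobs <- rpois(nrow(sim), lambda = 8)

# Numeric codes: each predictor is fitted as a slope and uses 1 df.
fit_num <- aov(Jobs ~ Gender * Education, data = sim)

# Factors: a variable with 4 levels uses 4 - 1 = 3 df.
sim$Gender <- factor(sim$Gender, labels = c("male", "female"))
sim$Education <- factor(sim$Education)
fit_fac <- aov(Jobs ~ Gender * Education, data = sim)

df_num <- anova(fit_num)$Df   # Gender, Education, interaction, Residuals
df_fac <- anova(fit_fac)$Df

print(df_num)
print(df_fac)
```

If the survey codes are meant as categories, applying the same `factor()` conversion to `Ex9_2` before calling `aov()` would give Education 3 degrees of freedom; the F values and p-values would then differ from those shown above.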
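Because the interaction term is not significant, the model can be refitted with
the two main effects only.
> twoaov1 <- aov(Jobs ~ Gender + Education, data = Ex9_2)
> anova(twoaov1)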
Analysis of Variance Table

Response: Jobs
          Df Sum Sq Mean Sq F value    Pr(>F)
Gender     1  11.25  11.250  1.1806 0.2806246
Education  1 134.56 134.560 14.1210 0.0003316 ***
Residuals 77 733.74   9.529
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1