Analysis of Variance


Chapter 10: Analysis of Variance

1. a. The error sum of squares is the sum of the squared deviations of each observation from its group average.

b. The within-group sum of squares is the sum of squares within each group.

c. The between-group sum of squares is the sum of the squared deviations of each group average from the overall mean.

d. The mean square error is the error sum of squares divided by the error degrees of freedom.

2. The mean square error

3. F = 3.5, p-value = 0.0352

4. The Bonferroni correction is a method for adjusting the probabilities associated with multiple comparisons. You essentially divide the significance level required for statistical significance by the number of planned comparisons you wish to make between the various factor levels.

5. a. 0.0389

b. 0.0175

c. 0.0082

d. 0.0040

e. 0.0021

6. a. The completed table is:

Term              SS    df          MS        F
Region         9,305     3    3,101.67    3.025
Method        12,204     1   12,204.00   11.903
Interaction    6,023     3    2,007.67    1.958
Error         32,809    32    1,025.28
Total         60,341    39

b. R² = 0.4563

c. The p-value for the Region term is 0.0437.

The p-value for the Method term is 0.0016.

The p-value for the Interaction term is 0.1401.

d. Both the Region and the Method effects are statistically significant. There is no statistical evidence of an interaction between the two factors.
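The completed table above follows mechanically from the sums of squares and degrees of freedom. As a minimal sketch (an illustration added here, not part of the original solution; the variable names are ours), the following Python recomputes each mean square, each F ratio against the error mean square, and the R² value:

```python
# Sums of squares and degrees of freedom taken from the completed table.
ss = {"Region": 9305, "Method": 12204, "Interaction": 6023, "Error": 32809}
df = {"Region": 3, "Method": 1, "Interaction": 3, "Error": 32}

# Mean square = sum of squares / degrees of freedom.
ms = {term: ss[term] / df[term] for term in ss}

# Each F ratio compares a term's mean square with the error mean square.
f_ratio = {term: ms[term] / ms["Error"] for term in ss if term != "Error"}

# R^2 = share of the total sum of squares not left in the error term.
ss_total = sum(ss.values())
r_squared = 1 - ss["Error"] / ss_total

for term in ss:
    f_text = f"{f_ratio[term]:.3f}" if term in f_ratio else ""
    print(f"{term:<12}{ss[term]:>8,}{df[term]:>4}{ms[term]:>12,.2f}  {f_text}")
print(f"Total SS = {ss_total:,}, R^2 = {r_squared:.4f}")
```

The p-values for each F ratio would additionally require the F distribution, which is left to a statistics package.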


7. a. Use the delete command to remove the data.

b. The one-way ANOVA is:

SUMMARY
Groups   Count    Sum   Average   Variance
LA           8   1083    135.38     980.84
SF           7    946    135.14    1524.81
DC           8   1083    135.38    1771.13
NY           8   1585    198.13    1649.55

ANOVA
Source of Variation         SS   df         MS       F   P-value   F crit
Between Groups        23424.26    3   7808.087   5.276     0.005    2.960
Within Groups         39959.48   27   1479.981
Total                 63383.74   30

c. The means matrix is:

                     City = LA   City = SF   City = DC   City = NY
Count                    8.000       7.000       8.000       8.000
Average                135.375     135.143     135.375     198.125
Standard Deviation      31.318      39.049      42.085      40.615
Minimum                 79.000      99.000      64.000     135.000
Maximum                175.000     185.000     189.000     250.000

Pairwise Mean Difference (row - column)
             City = LA   City = SF   City = DC   City = NY
City = LA        0.000       0.232       0.000     -62.750
City = SF                    0.000      -0.232     -62.982
City = DC                                0.000     -62.750
City = NY                                            0.000

MSE = 1479.98082010582

Pairwise Probabilities (Bonferroni Correction)
             City = LA   City = SF   City = DC   City = NY
City = LA            -       1.000       1.000       0.018
City = SF                        -       1.000       0.023
City = DC                                    -       0.018
City = NY                                                -

d. There is a significant difference in hotel prices between those in New York and those in Los Angeles, San Francisco, and Washington D.C. By removing the outlier from the San Francisco data set, we've reduced the MSE of the analysis, and thus the pairwise differences are statistically significant where they weren't before.
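The one-way ANOVA for the four cities can be rebuilt from the group summaries alone (count, sum, and variance), without the raw prices. The sketch below is an added illustration in Python with variable names of our choosing; small differences from the printed table come from the rounded variances.

```python
# Group summaries from the SUMMARY table: (count, sum, variance).
groups = {
    "LA": (8, 1083, 980.84),
    "SF": (7, 946, 1524.81),
    "DC": (8, 1083, 1771.13),
    "NY": (8, 1585, 1649.55),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(s for _, s, _ in groups.values()) / n_total

# Between-groups SS: weighted squared distance of each group mean from the grand mean.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
# Within-groups SS: each sample variance times its own degrees of freedom.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"Between: SS={ss_between:.2f}, df={df_between}, MS={ms_between:.3f}")
print(f"Within:  SS={ss_within:.2f}, df={df_within}, MS={ms_within:.3f}")
print(f"F = {f_ratio:.3f}")
```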


8. a. The interaction plot appears as:

[Figure: interaction plot of Average of Price (vertical axis, 0 to 250) against Stars (2, 3, 4), with one line per City: DC, LA, NY, SF]

The lines are not exactly parallel, so there may be an interaction between the number of stars and the city with respect to the hotel price, but if it exists, it will be a small one.

b. The two-way ANOVA is:

Source of Variation          SS   df           MS       F   P-value   F crit
Sample                23,794.08    2   11,897.042   9.214     0.004    3.885
Columns               20,462.79    3    6,820.931   5.283     0.015    3.490
Interaction            5,295.58    6      882.597   0.684     0.667    2.996
Within                15,493.50   12    1,291.125
Total                 65,045.96   23

c. The City effect (Columns) is significant with a p-value of 0.015. The Stars effect (Sample) is significant with a p-value of 0.004. There is no significant interaction between the City and Stars factors.

Average 2-star price: $131.50

Average 3-star price: $170.50

Average 4-star price: $208.63

d. In both cases the City factor is significant. By performing the two-way analysis we can discount hotel quality as the reason for the difference, since the price is still different after adjusting for the effect of hotel quality.

e. The graph is:

[Figure: scatter plot "Price vs. Stars" with a linear trendline for each city (DC, LA, NY, SF). Trendline equations: SF: y = 56.5x - 15.833; LA: y = 19x + 75.167; DC: y = 42.5x + 60; NY: y = 36.25x + 98.75]

The slopes appear similar for San Francisco, Washington, and New York, ranging from $36.25 to $56.50 per star. However, the slope appears much lower for Los Angeles, at $19 per star.

9. a. The boxplot appears as:

[Figure: boxplots for Cola = coke, Cola = pepsi, Cola = shasta, Cola = generic]

The multiple histograms appear as:

b. The one-way ANOVA is:

Source of Variation           SS   df          MS       F    P-value   F crit
Between Groups        183,750.50    3   61,250.17   33.54   1.94E-11    2.816
Within Groups          80,355.96   44    1,826.27
Total                 264,106.46   47

c. The means matrix is:

                     Cola = coke   Cola = pepsi   Cola = shasta   Cola = generic
Count                     12.000         12.000          12.000           12.000
Average                  307.275        142.442         275.725          239.958
Standard Deviation        34.614         29.554          42.670           58.419
Minimum                  245.800         89.700         214.600          156.100
Maximum                  362.900        210.700         362.900          327.800

Pairwise Mean Difference (row - column)
                 Cola = coke   Cola = pepsi   Cola = shasta   Cola = generic
Cola = coke            0.000        164.833          31.550           67.317
Cola = pepsi                          0.000        -133.283          -97.517
Cola = shasta                                         0.000           35.767
Cola = generic                                                         0.000

MSE = 1826.27189393939

Pairwise Probabilities (Bonferroni Correction)
                 Cola = coke   Cola = pepsi   Cola = shasta   Cola = generic
Cola = coke                -          0.000           0.464            0.002
Cola = pepsi                              -           0.000            0.000
Cola = shasta                                             -            0.278
Cola = generic                                                             -

The pairs with significant differences in foam volume are: (Coke, Pepsi), (Coke, Generic), (Pepsi, Shasta), and (Pepsi, Generic).
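The Bonferroni-corrected probabilities for the cola pairs appear consistent with a common recipe: compute a two-sided t-test p-value for each pair using the pooled MSE from the ANOVA, multiply it by the number of pairs, and cap the result at 1. The sketch below illustrates that recipe in plain Python. The helper names (`t_pdf`, `two_sided_p`, `bonferroni_p`) are ours, the t-distribution tail is obtained by simple numerical integration rather than a statistics library, and we are assuming, not asserting, that this is how the original software computed its values.

```python
import math
from itertools import combinations

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return max(0.0, 1 - central_area)

# Group means and sizes from the means matrix; MSE and its df from the ANOVA.
means = {"coke": 307.275, "pepsi": 142.442, "shasta": 275.725, "generic": 239.958}
n = {name: 12 for name in means}
mse, df_error = 1826.27189393939, 44

pairs = list(combinations(means, 2))

def bonferroni_p(a, b):
    """Pairwise p-value multiplied by the number of pairs, capped at 1."""
    se = math.sqrt(mse * (1 / n[a] + 1 / n[b]))
    t = (means[a] - means[b]) / se
    return min(1.0, len(pairs) * two_sided_p(t, df_error))

for a, b in pairs:
    print(f"{a:>8} vs {b:<8} adjusted p = {bonferroni_p(a, b):.3f}")
```

Under these assumptions, a pair is declared significant at the 5% level when its adjusted probability falls below 0.05, which is equivalent to comparing the unadjusted p-value with 0.05 divided by the number of pairs.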

10. a. The multiple histograms appear as:

The histograms show that tuition drops as you go from the most prestigious schools to the least prestigious. Notice that the spread is much narrower in the first group, which suggests a possible problem with unequal variance. However, the analysis of variance is fairly robust with respect to this assumption.

b. The one-way ANOVA is:

SUMMARY
Groups   Count     Sum     Average       Variance
1            6   93754   15,625.67     288,624.67
2            6   83038   13,839.67   5,679,268.27
3            6   65550   10,925.00   9,738,750.00
4            6   60948   10,158.00   2,854,328.00

ANOVA
Source of Variation         SS   df              MS      F   P-value   F crit
Between Groups        1.17E+08    3   38,909,841.06   8.39   0.00083    3.098
Within Groups         92804855   20    4,640,242.73
Total                  2.1E+08   23

The p-value of 0.00083 allows us to reject the hypothesis of equal means at the 0.1% level. Group does have a significant effect. Note that the means of the four groups decrease with the group numbers, meaning that there is a possible trend towards lower tuition for colleges lower in the rating system.

c. The means matrix is:

Descriptive Statistics
                     Group = 1   Group = 2   Group = 3   Group = 4
Count                    6.000       6.000       6.000       6.000
Average                 15,626      13,840      10,925      10,158
Standard Deviation     537.238    2383.122    3120.697    1689.476
Minimum                 14,710      10,945       6,450       8,150
Maximum                 16,250      17,200      15,000      12,400

Pairwise Mean Difference (row - column)
             Group = 1   Group = 2   Group = 3   Group = 4
Group = 1        0.000    1786.000    4700.667    5467.667
Group = 2                    0.000    2914.667    3681.667
Group = 3                                0.000     767.000
Group = 4                                            0.000

MSE = 4640242.73333335

Pairwise Probabilities (Bonferroni Correction)
             Group = 1   Group = 2   Group = 3   Group = 4
Group = 1            -       0.999       0.007       0.002
Group = 2                        -       0.177       0.046
Group = 3                                    -       1.000
Group = 4                                                -

There are significant differences between groups that are not adjacent, but adjacent groups do not differ significantly. The first quartile is significantly more expensive than groups 3 and 4, but not group 2. The bottom quartile is significantly less expensive than the first two quartiles, but not the third quartile.


11.

d. The results of the regression command are:

Regression Statistics
Multiple R               0.731
R Square                 0.534
Adjusted R Square        0.513
Standard Error        2106.081
Observations                24

ANOVA
              df               SS         MS        F   Significance F
Regression     1   111,951,673.63   1.12E+08   25.239      4.97237E-05
Residual      22    97,582,704.20    4435577
Total         23   209,534,377.83

            Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
Intercept      17,466.50         1053.041   16.587   6.39E-14   15,282.625   19,650.375
Group          -1,931.77          384.516   -5.024   4.97E-05   -2,729.205   -1,134.328

The coefficient for Group is -1,932, which means that tuition drops by about $1,932 when the Group is increased by 1 quartile.

e. F-ratio = [(97,582,704 - 92,804,855)/2]/4,640,243 = 0.515
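The F-ratio above compares two fits of the same tuition data: the straight-line regression on Group (residual sum of squares on 22 degrees of freedom) and the one-way ANOVA with a separate mean for each group (within-groups sum of squares on 20 degrees of freedom). As a small added sketch in Python, with variable names of our choosing, the arithmetic is:

```python
# Residual (error) sums of squares and degrees of freedom from the two fits.
sse_regression, df_regression = 97_582_704.20, 22  # straight-line regression on Group
sse_anova, df_anova = 92_804_855, 20               # one-way ANOVA with four group means

# Extra variation left unexplained by the line but explained by separate group means.
extra_ss = sse_regression - sse_anova
extra_df = df_regression - df_anova

ms_extra = extra_ss / extra_df
ms_error = sse_anova / df_anova
f_ratio = ms_extra / ms_error

print(f"F = {f_ratio:.3f} on {extra_df} and {df_anova} degrees of freedom")
```

A ratio this far below 1 suggests that separate group means explain little beyond what the straight line already captures.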

a. Use the delete command to delete the appropriate rows from the worksheet.

b. The boxplot appears as:

[Figure: boxplots for Group = 1, Group = 2, Group = 3, Group = 4; vertical axis from -10 to 60]

The outlier is Hampshire College at 55 students per computer.

c. The one-way ANOVA is:

SUMMARY
Groups   Count      Sum   Average   Variance
1            3   20.790     6.930     22.270
2            5   77.040    15.408    497.568
3            3   32.610    10.870     20.379
4            3   23.310     7.770      6.305

ANOVA
Source of Variation         SS   df        MS       F   P-value   F crit
Between Groups         178.192    3    59.397   0.284     0.836    3.708
Within Groups         2088.179   10   208.818
Total                 2266.371   13

The high p-value of 0.836 does not permit rejection of the hypothesis of equal means among the four groups. Note the very high variance in the second group due to the outlier from Hampshire College.

d. Without the outlier the one-way ANOVA is:

SUMMARY
Groups   Count     Sum   Average   Variance
1            3   20.79      6.93     22.270
2            4   22.04      5.51     10.288
3            3   32.61     10.87     20.379
4            3   23.31      7.77      6.305

ANOVA
Source of Variation        SS   df       MS       F   P-value   F crit
Between Groups         50.984    3   16.995   1.188     0.368    3.863
Within Groups         128.771    9   14.308
Total                 179.756   12

With a p-value of 0.368 there is still no indication of a significant difference between groups.

Does this mean that the four quartiles are equal in terms of access to computers for their students?

Because of missing data, the analysis here is limited to just a few colleges, and you should be reluctant to make any assertions about equal access based on such small samples. It is conceivable that a larger sample would find a significant difference between quartiles.


12. a. The boxplot and multiple histograms appear as:

[Figure: boxplot with vertical axis from -500 to 3000 and groups Position = 3B, Position = SS, Position = 2B, Position = 1B]

b. The plots for LN salary are:

[Figure: boxplot with vertical axis from 3 to 9 and groups Position = 3B, Position = SS, Position = 2B, Position = 1B]

The distribution of Salary is highly skewed, whereas the distribution of LN Salary is more symmetric.

c. The results of the one-way ANOVA are:

SUMMARY
Groups   Count       Sum   Average   Variance
3B          30   181.243     6.041      0.897
SS          26   147.114     5.658      0.820
2B          26   158.635     6.101      0.542
1B          24   148.968     6.207      1.171

ANOVA
Source of Variation       SS    df      MS       F   P-value   F crit
Between Groups         4.384     3   1.461   1.713     0.169    2.694
Within Groups         87.012   102   0.853
Total                 91.395   105

The p-value is 0.169, so we cannot claim that there is a significant difference between positions.

Although the middle infielders (second base and shortstop) are not necessarily equally productive as hitters, they compensate with their fielding and therefore make salaries comparable to the others. For example, Ozzie Smith, a top shortstop, was not a great hitter but was paid well for his fielding. Furthermore, with LN Salary, there would need to be great disparities in salary to see significant differences using ANOVA, because the logarithmic transformation reduces the differences between numbers so dramatically.


13. a. The boxplot and histograms appear as:

[Figure: boxplot with vertical axis from 0 to 0.25 and groups Position = 3B, Position = SS, Position = 2B, Position = 1B]

The plots do not give us any strong reason to disbelieve a hypothesis of equal variance among the four groups.

b. The one-way ANOVA is:

SUMMARY
Groups   Count       Sum   Average   Variance
3B          30   3.77774   0.12592    0.00074
SS          26   2.34500   0.09019    0.00046
2B          26   2.34864   0.09033    0.00042
1B          24   3.52020   0.14668    0.00055

ANOVA
Source of Variation       SS    df      MS        F   P-value   F crit
Between Groups        0.0591     3   0.020   35.789     0.000    2.694
Within Groups         0.0562   102   0.001
Total                 0.1153   105

The p-value is < 0.001, indicating that we can reject the null hypothesis that there is no difference between the groups and accept the alternative hypothesis that there exists a difference in RBI average for players from different positions.

c. The means matrix is:

Descriptive Statistics (RBI Aver)
                     Position = "3B"   Position = "SS"   Position = "2B"   Position = "1B"
Count                             30                26                26                24
Average                      0.12592           0.09019           0.09033           0.14668
Standard Deviation          0.027206          0.021363          0.020535          0.023534
Sum of Squares              0.497175          0.222910          0.222700          0.529065

Pairwise Mean Difference (row - column)
          "3B"     "SS"     "2B"     "1B"
"3B"     0.000    0.036    0.036   -0.021
"SS"              0.000    0.000   -0.056
"2B"                       0.000   -0.056
"1B"                                0.000

MSE = 5.50540487668649E-04

Pairwise Probabilities (Bonferroni Correction)
          "3B"     "SS"     "2B"     "1B"
"3B"         -    0.000    0.000    0.010
"SS"                  -    1.000    0.000
"2B"                           -    0.000
"1B"                                    -

First basemen have significantly higher RBI averages than all other position players. Shortstops and second basemen are significantly lower in their RBI averages than third basemen. Usually the second basemen and the shortstop are small and less powerful because they must be agile and quick for defense. First base is the least demanding defensive position in terms of agility, so the first baseman is often the biggest and most powerful hitter among the infield players.


14. a. The results of the two-sample t-test are:

Descriptive Statistics
        N       Mean   Std. Dev.   Std. Err.
5sp    10   4,754.00   2,862.252     905.124
at     15   8,860.13   3,102.878     801.160

t-Test Analysis
Mean Diff.   Std. Err.        t      df   p-value   lower 95%   upper 95%
 -4,106.13   1,229.240   -3.340   23.00     0.003   -6,649.01   -1,563.26

Equality of Variance Tests
F-Test     0.829
Bartlett   0.795
Levene     0.951

b. The results of the one-way ANOVA are:

SUMMARY
Groups   Count          Sum    Average       Variance
5sp         10    47,540.00   4,754.00   8,192,487.78
at          15   132,902.00   8,860.13   9,627,853.12

ANOVA
Source of Variation         SS   df               MS       F   P-value   F crit
Between Groups        1.01E+08    1   101,161,985.71   11.16    0.0028   4.2793
Within Groups         2.09E+08   23     9,066,188.42
Total                  3.1E+08   24

c. (t-ratio)² = (-3.340)² = 11.160 = F-ratio

d. The results of the ANOVA are:

SUMMARY
Groups   Count   Sum   Average   Variance
5sp         10    64     6.400     12.489
at          15    52     3.467      7.838

ANOVA
Source of Variation        SS   df        MS        F   P-value   F crit
Between Groups         51.627    1   51.6267   5.3455    0.0301   4.2793
Within Groups         222.133   23    9.6580
Total                  273.76   24

Automatic transmissions are significantly more expensive, but they're also (in this data set) significantly younger. Therefore it's hard to determine whether the difference in price is a result of the transmission or the age of the model.
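The equality of the squared t-ratio and the F-ratio for the price comparison (t-test and one-way ANOVA on the same two transmission groups) is not a coincidence: with two groups, the pooled variance of the t-test is the within-groups mean square. The following added sketch in Python (variable names are ours) rebuilds both statistics from the summary values of the price analysis; the results differ from the printed 11.160 only through rounding of the inputs.

```python
import math

# Summary statistics for price by transmission type: (n, mean, variance).
five_speed = (10, 4754.00, 8_192_487.78)
automatic = (15, 8860.13, 9_627_853.12)

n1, mean1, var1 = five_speed
n2, mean2, var2 = automatic

# Pooled variance; this is the same quantity as the within-groups mean square.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Two-sample t statistic with equal variances assumed.
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F ratio built from the same summaries (between-groups df = 1).
grand_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
ms_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
f_ratio = ms_between / pooled_var

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_ratio:.3f}")
```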

e. The slope and confidence interval for the 5-speed transmission is -720.16 (-1,022.32, -418.00).

For the automatic transmission it is -900.36 (-1,287.60, -513.12).

The slopes are not exactly the same. From the intercepts, it would appear that new automatics are more expensive, but from the slopes they seem to depreciate faster as well (another explanation is that most of them are not as old as the 5-speeds and are still in the stage of rapid depreciation). This problem attempts to incorporate the fact that the 5-speed transmission cars are older by extrapolating the effects of age using linear regression.

It is difficult to draw conclusions about the value of used cars from the two regression lines because the two confidence intervals overlap. Also, there is not much data in this data set, nor are the data series necessarily linear. The chief problem with this data set is that the type of transmission is not independent of age, which has such a dramatic effect on price. Because of these issues, it is not easy to draw any firm conclusions about the relationship between price and transmission type.


15. a. The boxplots and histograms are:

[Figure: boxplots with vertical axis from 0 to 14000 and groups Trans Age = 1 5sp, Trans Age = 1 at, Trans Age = 4 5sp, Trans Age = 4 at, Trans Age = 6 5sp, Trans Age = 6 at]

There is not enough data to determine whether the variance differs between the groups. There do not appear to be any large violations of the constant variance assumption.

b. The interaction plot appears as:

[Figure: "Interaction Plot" with Age (1 to 3, 4 to 5, 6 or more) on the horizontal axis, one line per transmission type (5sp, at), and a vertical axis from 0 to 12,000]

There is no evidence of a strong interaction effect.

c. The results of the two-way ANOVA are:

SUMMARY          1 to 3      4 to 5   6 or more        Total
5sp
Count                 2           2           2            6
Sum              13,250      16,990       5,400       35,640
Average           6,625       8,495       2,700        5,940
Variance        781,250     500,000   1,280,000    7,510,190

at
Count                 2           2           2            6
Sum              19,895      16,687       7,250       43,832
Average           9,948       8,344       3,625        7,305
Variance     18,574,513   3,615,361   6,661,250   14,411,700

Total
Count                 4           4           4
Sum              33,145      33,677      12,650
Average           8,286       8,419       3,163
Variance     10,131,590   1,379,438   2,932,292

ANOVA
Source of Variation               SS   df              MS      F   P-value   F crit
Sample                  5,592,405.33    1    5,592,405.33   1.07      0.34     5.99
Columns                71,871,898.17    2   35,935,949.08   6.86      0.03     5.14
Interaction             6,325,178.17    2    3,162,589.08   0.60      0.58     5.14
Within                 31,412,373.00    6    5,235,395.50
Total                 115,201,854.67   11

The Columns sum of squares is large relative to the others and also shows significance with a p-value of 0.028. This means that the age effect is responsible for much of the variance in price.

The ANOVA accounts for 73% of the variation in price.


16. a. There is an outlier in Heat 7 in which a runner finished in 22.69 seconds.

[Figure: plot of race times by Heat (1 to 12); vertical axis from 9 to 13]

Aside from the outlier, there is no strong visual evidence that the race times for one heat are substantially different from race times from other heats.

b. The results of the one-way ANOVA are:

SUMMARY
Groups   Count      Sum   Average   Variance
1            9    96.09     10.68       0.27
2            8    85.07     10.63       0.49
3            9    94.00     10.44       0.09
4            9    93.84     10.43       0.05
5            9    96.15     10.68       0.12
6            9    94.43     10.49       0.03
7            9   108.48     12.05      16.09
8            9    95.07     10.56       0.08
9            9    94.50     10.50       0.04
10           9    95.38     10.60       0.08
11           8    83.82     10.48       0.06
12           9    96.51     10.72       0.15

ANOVA
Source of Variation       SS    df     MS      F   P-value   F crit
Between Groups         19.19    11   1.74   1.17      0.32     1.89
Within Groups         139.95    94   1.49
Total                 159.15   105

The p-value for the ANOVA is 0.317, so we fail to reject the null hypothesis that the mean race times of the 12 heats are equal.

c. The pairwise mean differences are:

Pairwise Mean Difference (row - column)
         1       2       3       4       5       6       7       8       9      10      11      12
 1   0.000   0.043   0.232   0.250  -0.007   0.184  -1.377   0.113   0.177   0.079   0.199  -0.047
 2           0.000   0.189   0.207  -0.050   0.142  -1.420   0.070   0.134   0.036   0.156  -0.090
 3                   0.000   0.018  -0.239  -0.048  -1.609  -0.119  -0.056  -0.153  -0.033  -0.279
 4                           0.000  -0.257  -0.066  -1.627  -0.137  -0.073  -0.171  -0.051  -0.297
 5                                   0.000   0.191  -1.370   0.120   0.183   0.086   0.206  -0.040
 6                                           0.000  -1.561  -0.071  -0.008  -0.106   0.015  -0.231
 7                                                   0.000   1.490   1.553   1.456   1.576   1.330
 8                                                           0.000   0.063  -0.034   0.086  -0.160
 9                                                                   0.000  -0.098   0.023  -0.223
10                                                                           0.000   0.120  -0.126
11                                                                                   0.000  -0.246
12                                                                                           0.000

MSE = 1.4888752216312

Pairwise Probabilities (Bonferroni Correction)
         1       2       3       4       5       6       7       8       9      10      11      12
 1       -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 2               -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 3                       -   1.000   1.000   1.000   0.413   1.000   1.000   1.000   1.000   1.000
 4                               -   1.000   1.000   0.378   1.000   1.000   1.000   1.000   1.000
 5                                       -   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 6                                               -   0.522   1.000   1.000   1.000   1.000   1.000
 7                                                       -   0.733   0.542   0.861   0.610   1.000
 8                                                               -   1.000   1.000   1.000   1.000
 9                                                                       -   1.000   1.000   1.000
10                                                                               -   1.000   1.000
11                                                                                       -   1.000
12                                                                                               -

There are no significant pairwise differences.

d. The significance level for this test is 5%, and since the recorded p-value of 0.317 is greater than 0.05, we do not reject the null hypothesis. We have no evidence that the average race times differ among the 12 heats, which is consistent with the best runners in the world being evenly divided among the 12 heats.


17. The boxplots of the reaction times are:

[Figure: boxplots of reaction time by Heat (1 to 12); vertical axis from 0.1 to 0.24]

The ANOVA is:

ANOVA
Source of Variation         SS    df         MS          F    P-value     F crit
Between Groups        0.005739    11   0.000522   1.269628   0.254133   1.891991
Within Groups         0.038629    94   0.000411
Total                 0.044368   105

It is difficult from the boxplots to make any conclusions regarding the reaction times in the different heats. It is possible that Heat 10 has higher reaction times than other heats and Heat 11 has lower, but nothing substantial appears in the plot. The p-value for the ANOVA is 0.254, so we fail to reject the null hypothesis that the mean reaction times of the 12 heats are equal.

The means matrix shown below also does not indicate any pairwise differences.

Pairwise Mean Difference (row - column)
         1       2       3       4       5       6       7       8       9      10      11      12
 1   0.000   0.006  -0.004  -0.005   0.003   0.000  -0.010   0.003   0.010  -0.007   0.018   0.002
 2           0.000  -0.011  -0.011  -0.003  -0.006  -0.017  -0.004   0.004  -0.013   0.012  -0.004
 3                   0.000  -0.001   0.007   0.004  -0.006   0.007   0.014  -0.002   0.023   0.007
 4                           0.000   0.008   0.005  -0.005   0.008   0.015  -0.002   0.023   0.007
 5                                   0.000  -0.003  -0.013   0.000   0.007  -0.010   0.015  -0.001
 6                                           0.000  -0.010   0.003   0.010  -0.007   0.018   0.002
 7                                                   0.000   0.013   0.020   0.004   0.029   0.013
 8                                                           0.000   0.007  -0.009   0.016   0.000
 9                                                                   0.000  -0.017   0.008  -0.008
10                                                                           0.000   0.025   0.009
11                                                                                   0.000  -0.016
12                                                                                           0.000

MSE = 4.10947695035463E-04

Pairwise Probabilities (Bonferroni Correction)
         1       2       3       4       5       6       7       8       9      10      11      12
 1       -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 2               -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 3                       -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 4                               -   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 5                                       -   1.000   1.000   1.000   1.000   1.000   1.000   1.000
 6                                               -   1.000   1.000   1.000   1.000   1.000   1.000
 7                                                       -   1.000   1.000   1.000   0.299   1.000
 8                                                               -   1.000   1.000   1.000   1.000
 9                                                                       -   1.000   1.000   1.000
10                                                                               -   0.848   1.000
11                                                                                       -   1.000
12                                                                                               -

18. a. The two factors are the runner and the round. The ANOVA table appears as follows (after formatting):

Source of Variation        SS   df        MS         F   P-value    F crit
Rows                  0.01225   13   0.00094   5.23409   0.00017   2.11917
Columns               0.00155    2   0.00078   4.31024   0.02417   3.36901
Error                 0.00468   26   0.00018
Total                 0.01849   41

b. Based on the ANOVA, we reject the null hypothesis that the runners have equal reaction times (p-value = 0.00017), and we reject the null hypothesis that the reaction times are the same in every round, concluding that they differ from one round to another (p-value = 0.02417).

The R² value is equal to the sum of squares for the rows and columns divided by the total sum of squares. In this case that is (0.01225 + 0.00155)/(0.01849) = 0.746746. So about 75% of the variation in reaction times can be explained by the runner and round factors.
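The R² calculation for the runner and round factors can be written out directly. This is an added sketch in Python with our own variable names; because it starts from the rounded sums of squares printed in the table, it gives a value near 0.746 rather than matching the quoted 0.746746 digit for digit.

```python
# Sums of squares from the two-factor ANOVA table (runner = Rows, round = Columns).
ss_rows, ss_columns, ss_error = 0.01225, 0.00155, 0.00468
ss_total = 0.01849

# R^2: share of the total variation attributed to the two factors.
r_squared = (ss_rows + ss_columns) / ss_total

# Equivalent form: one minus the share left in the error term.
r_squared_alt = 1 - ss_error / ss_total

print(f"R^2 = {r_squared:.3f} (from factors), {r_squared_alt:.3f} (from error)")
```

The two forms agree up to the rounding of the table entries, and both support the statement that about 75% of the variation is explained.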

c. Round 1: Average Reaction = 0.1675 (std. dev. = 0.0211)

Round 2: Average Reaction = 0.1640 (std. dev. = 0.0299)

Round 3: Average Reaction = 0.1532 (std. dev. = 0.0182)

It would appear that the reaction times decrease as the competition proceeds through each round in the meet.

d. Round 1 vs. Round 2 t-test: Paired Difference = 0.004, p-value = 0.541

Round 2 vs. Round 3 t-test: Paired Difference = 0.011, p-value = 0.060

Round 1 vs. Round 3 t-test: Paired Difference = 0.014, p-value = 0.006

There is a significant difference between Round 1 and Round 3 reaction times (p-value = 0.006). The difference between Round 2 and Round 3 is only marginal (p-value = 0.060, which is not significant at the 5% level). In both cases the earlier round had the slower reaction time. This lends some credence to the theory that reaction times decrease as the runner proceeds through the three rounds.

e. The interaction plot is:

[Figure: "Reaction Times for each runner", one line per runner across Round 1, Round 2, Round 3; vertical axis from 0.100 to 0.220]

Since the line for each runner appears different (some reaction times stay level through the three rounds, others go up in Round 2, and so forth), it seems likely that there is an interaction between runner and round. This means that the conclusion that reaction times decrease with each round may not hold for each runner in the competition.

19. a. The two-way table appears as:

ActiveHR      Height
Frequency     0     1
0            75   108
             99   120
             93    93
             87    99
             84    99
1            84   141
             93    99
             96   111
             90   129
            108    90
2            93   135
            129   153
             99   129
            123   120
             96   147

b. The two-way ANOVA is:

ANOVA
Source of Variation         SS   df        MS        F   P-value   F crit
Sample                 3727.80    2   1863.90    9.551     0.001    3.403
Columns                3499.20    1   3499.20   17.931     0.000    4.260
Interaction             210.60    2    105.30    0.540     0.590    3.403
Within                 4683.60   24    195.15
Total                  12121.2   29

The Height and Frequency factors are both significant, but the interaction between the two factors is not.

c. The interaction plot is:

[Figure: interaction plot of Average of ActiveHR (vertical axis, 0 to 160) against Frequency (0, 1, 2), with one line per Height (0, 1)]

The lines are roughly parallel, indicating no interaction between the factors.

d. The two-way table for the change in HR is:

DiffHR        Height
Frequency     0     1
0            15    39
              9    33
              6    15
              9    15
              0    15
1            21    45
             24    27
             15    24
             15    39
             18     6
2            24    66
             42    60
             27    51
             48    39
             18    57

The two-way ANOVA is:

ANOVA
Source of Variation         SS   df        MS        F   P-value   F crit
Sample                 4048.80    2   2024.40   17.995     0.000    3.403
Columns                1920.00    1   1920.00   17.067     0.000    4.260
Interaction             218.40    2    109.20    0.971     0.393    3.403
Within                 2700.00   24    112.50
Total                   8887.2   29

As before, the Height and Frequency factors are significant, but the interaction term is not.


20. a. The boxplot and histograms appear as:

[Figure: boxplots with vertical axis from 740 to 880 and groups small Standard, medium Standard, large Standard, small Octel, medium Octel, large Octel]

b. The interaction plot between the Size and Type factors is:

[Figure: interaction plot with Size (large, medium, small) on the horizontal axis, one line per Type (Octel, Standard), and a vertical axis from 720 to 860]

c. The two-way ANOVA is:

ANOVA
Source of Variation         SS   df         MS          F    P-value     F crit
Sample                26051.39    2   13025.69   199.1189    4.8E-18   3.315833
Columns                1056.25    1    1056.25    16.1465   0.000363   4.170886
Interaction           804.1667    2   402.0833   6.146497   0.005792   3.315833
Within                  1962.5   30   65.41667
Total                 29874.31   35

There is a significant difference between the two types of filters. However, the degree of difference depends on the filter size. The greatest difference occurs for the medium size filters.

21. a. The boxplots appear as:

[Figure: boxplots for Plant1, Plant2, Plant3, Plant4, Plant5; vertical axis from -30 to 80]

There are significant outliers in the 1st and 2nd plants.

b. The one-way ANOVA is:

SUMMARY
Groups   Count     Sum   Average   Variance
Plant1      22    99.5     4.523    100.642
Plant2      22   194.3     8.832    235.729
Plant3      19    91.8     4.832     19.388
Plant4      19   142.3     7.489     13.374
Plant5      13   134.9    10.377     91.299

ANOVA
Source of Variation         SS   df        MS       F   P-value   F crit
Between Groups        450.9207    4   112.730   1.160     0.334    2.473
Within Groups         8749.088   90    97.212
Total                 9200.009   94

There is no evidence of a statistically significant difference between the plants.

c. The matrix of paired differences is:

Pairwise Mean Difference (row - column)
          Plant1   Plant2   Plant3   Plant4   Plant5
Plant1     0.000   -4.309   -0.309   -2.967   -5.854
Plant2              0.000    4.000    1.342   -1.545
Plant3                       0.000   -2.658   -5.545
Plant4                                0.000   -2.887
Plant5                                         0.000

MSE = 97.2120931991985

Pairwise Probabilities (Bonferroni Correction)
          Plant1   Plant2   Plant3   Plant4   Plant5
Plant1         -    1.000    1.000    1.000    0.931
Plant2                  -    1.000    1.000    1.000
Plant3                           -    1.000    1.000
Plant4                                    -    1.000
Plant5                                             -

There is no evidence of differences between any pair of plants.

d. The boxplot for the reduced data set is:

[Figure: boxplot titled "Revised Data", by plant]

The revised one-way ANOVA is:

SUMMARY
Groups   Count     Sum   Average   Variance
Plant1      20    37.4     1.870     15.469
Plant2      20   135.7     6.785     35.948
Plant3      19    91.8     4.832     19.388
Plant4      19   142.3     7.489     13.374
Plant5      13   134.9    10.377     91.299

ANOVA
Source of Variation         SS   df        MS       F   P-value   F crit
Between Groups         670.433    4   167.608   5.414     0.001    2.478
Within Groups         2662.210   86    30.956
Total                 3332.643   90

After removing the outliers the p-value is significant, indicating that there is a difference between the plants. The paired differences are:

Pairwise Mean Difference (row - column)
          Plant1   Plant2   Plant3   Plant4   Plant5
Plant1     0.000   -4.915   -2.962   -5.619   -8.507
Plant2              0.000    1.953   -0.704   -3.592
Plant3                       0.000   -2.658   -5.545
Plant4                                0.000   -2.887
Plant5                                         0.000

MSE = 30.9559247010639

Pairwise Probabilities (Bonferroni Correction)
          Plant1   Plant2   Plant3   Plant4   Plant5
Plant1         -    0.064    1.000    0.022    0.000
Plant2                  -    1.000    1.000    0.735
Plant3                           -    1.000    0.069
Plant4                                    -    1.000
Plant5                                             -

There is a significant difference between Plant 1 and Plants 4 and 5.


22. a.

b.

The one-way ANOVA is:

SUMMARY
Groups          Count        Sum   Average   Variance
Meadow Pipit       45   1003.450    22.299      0.848
Tree Pipit         15    346.350    23.090      0.813
Hedge Sparrow      14    323.700    23.121      1.142
Robin              16    361.200    22.575      0.469
Pied Wagtail       15    343.550    22.903      1.140
Wren               15    316.950    21.130      0.553

ANOVA
Source of Variation        SS    df      MS        F   P-value   F crit
Between Groups         42.940     5   8.588   10.388     0.000    2.294
Within Groups          94.248   114   0.827
Total                 137.188   119

There is a significant difference in the egg sizes between the host birds (p-value < 0.001).

c. The boxplots are:

[Figure: boxplots for Hedge Sparrow, Meadow Pipit, Pied Wagtail, Robin, Tree Pipit, Wren; vertical axis from 19 to 25]

d. The means matrix is:

Pairwise Mean Difference (row - column)
                 Hedge Sparrow   Meadow Pipit   Pied Wagtail    Robin   Tree Pipit    Wren
Hedge Sparrow            0.000          0.823          0.218    0.546        0.031   1.991
Meadow Pipit                            0.000         -0.604   -0.276       -0.791   1.169
Pied Wagtail                                           0.000    0.328       -0.187   1.773
Robin                                                           0.000       -0.515   1.445
Tree Pipit                                                                   0.000   1.960
Wren                                                                                 0.000

MSE = .826739905318742

Pairwise Probabilities (Bonferroni Correction)
                 Hedge Sparrow   Meadow Pipit   Pied Wagtail    Robin   Tree Pipit    Wren
Hedge Sparrow                -          0.057          1.000    1.000        1.000   0.000
Meadow Pipit                                -          0.416    1.000        0.064   0.001
Pied Wagtail                                               -    1.000        1.000   0.000
Robin                                                               -        1.000   0.000
Tree Pipit                                                                       -   0.000
Wren                                                                                     -

e. There is a significant difference in the egg size between the wren and all other host birds. There are no other significant paired differences.

The eggs laid in the nests of wrens are significantly smaller than the eggs laid in the nests of other host birds. This lends credence to the theory that cuckoos lay their eggs in the nests of a particular host species.
