Feedback Lab 6 - Trinity College Dublin

Trinity College, Dublin

Generic Skills Programme

Statistics for Research Students

Laboratory 6, Feedback

1 Testing filter membranes

1.1 Initial data analysis

[Figure: Dotplot of Strength, one panel per membrane (A, B, C, D); horizontal axis: Strength.]

[Figure: Individual Value Plot of Strength; horizontal axis: Membrane (A, B, C, D); vertical axis: Strength.]

Which plot do you prefer? Why?

Minitab's standard dotplot is badly designed, with the multiple plots unevenly spaced and lines in between obscuring comparisons. The lines act as "chart junk" and violate the "maximise data-to-ink ratio" rule referred to in Laboratory 1.

The four plots are also too close together, which further inhibits comparison.

The "individual values" plot overcomes these flaws.

The dotplot has the advantage that it shows the basic variable on the horizontal axis, in agreement with the traditional histogram, although this is merely convention and not based on any well-defined principles of data display.

Interpret the output. Make tentative conclusions regarding comparisons of strengths of different membrane types, with a corresponding recommendation to the company, keeping in mind the origins of the four membrane types.

Leaving aside the possibility of one or two exceptional values, Membrane B appears best and Membrane C appears worst, with not much difference between Membranes A and D.

Membrane C can be dismissed immediately. Membrane A, the current choice, may be preferred to Membrane D as it is manufactured "in house". Subject to statistical significance and cost considerations, Membrane B, the new design, appears worth recommending.

1.2 Formal analysis

Report on the statistical significance of the differences between the sample means.

There are statistically significant differences between the mean strengths of the membranes; p < 0.0005.

Explain the entries in the DF column of the ANOVA table.

The Membrane SS involves the deviations of the 4 sample means from the overall mean. These 4 deviations sum to 0 and so have 4 – 1 = 3 degrees of freedom.

The Error SS is the sum of the 4 within-sample sums of squares of deviations of strength values from sample means. Each of these involves 10 deviations that sum to 0 and, therefore, 10 – 1 = 9 degrees of freedom, giving a total of 4 × 9 = 36 degrees of freedom.

The Total SS is the sum of squares of the deviations of the 40 strength values from the overall mean, involving 40 – 1 = 39 degrees of freedom.

Note that 3 + 36 = 39, paralleling the fact that Membrane SS + Error SS = Total SS, the basic decomposition of sums of squares that makes Analysis of Variance work.
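The decomposition can be checked numerically. The following is a minimal sketch in Python, not part of the Minitab exercise; the three small groups are hypothetical values chosen for illustration and are not the laboratory data. It computes the three sums of squares directly from their definitions and confirms that they, and their degrees of freedom, add up.

```python
from statistics import mean

# Hypothetical readings for three groups; NOT the laboratory data.
groups = {
    "X": [90.0, 92.0, 94.0],
    "Y": [95.0, 97.0, 99.0],
    "Z": [84.0, 85.0, 89.0],
}


def anova_decomposition(groups):
    """Return (SS, DF) for the factor, error and total rows."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    k, n_total = len(groups), len(values)
    # Factor SS: deviations of group means from the overall mean,
    # counted once for every observation in the group.
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Error SS: deviations of each value from its own group mean.
    ss_error = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    # Total SS: deviations of each value from the overall mean.
    ss_total = sum((v - grand) ** 2 for v in values)
    return {
        "factor": (ss_factor, k - 1),
        "error": (ss_error, n_total - k),
        "total": (ss_total, n_total - 1),
    }


def degrees_of_freedom(k, n_per_group):
    """DF for factor, error and total with k groups of equal size."""
    n_total = k * n_per_group
    return k - 1, n_total - k, n_total - 1


table = anova_decomposition(groups)
for row, (ss, df) in table.items():
    print(f"{row:<7} SS = {ss:8.3f}  DF = {df}")

# The membrane layout: 4 membrane types, 10 strength values each.
print(degrees_of_freedom(4, 10))
```

For the membrane layout the last line gives 3, 36 and 39 degrees of freedom, matching the DF column discussed above.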

Using the Minitab Calc menu, confirm the p-value for F and calculate the critical value for F.

Use the Cumulative probability function of the F distribution to calculate the p-value (a probability) from the F calculated value.

The output value is the area to the left of 15.54, that is, 1.00000 to five decimal places. The p-value is 1 minus this, that is, 0.00000 to five decimal places.

Use the inverse function to calculate the critical F value from the significance level (a probability).

The input constant is 0.95 = 1 – 0.05. The critical value is 2.86627.
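For readers without Minitab to hand, the same two calculations can be sketched in plain Python. This is an illustrative approximation only: the function names `f_cdf` and `f_inverse` are made up for this sketch, the cumulative probability is obtained by numerical integration of the F density (Simpson's rule), and the inverse by bisection. A statistics library would normally be used instead.

```python
import math


def f_cdf(x, d1, d2, steps=2000):
    """Approximate P(F <= x) for an F(d1, d2) variable by Simpson's rule.

    The substitution x = u**2 keeps the integrand smooth near zero.
    `steps` must be even.
    """
    if x <= 0:
        return 0.0
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                 - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))

    def integrand(u):
        # F density evaluated at u**2, times the Jacobian 2u.
        if u == 0.0:
            return 2.0 * math.exp(log_const) if d1 == 1 else 0.0
        log_pdf = (log_const + (d1 / 2 - 1) * math.log(u * u)
                   - ((d1 + d2) / 2) * math.log1p(d1 * u * u / d2))
        return 2.0 * u * math.exp(log_pdf)

    upper = math.sqrt(x)
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def f_inverse(p, d1, d2, hi=1000.0):
    """Value x with P(F <= x) = p, found by bisection on f_cdf."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Membrane ANOVA: F = 15.54 with 3 and 36 degrees of freedom.
p_value = 1 - f_cdf(15.54, 3, 36)
critical_value = f_inverse(0.95, 3, 36)
print(f"p-value = {p_value:.5f}, critical value = {critical_value:.3f}")
```

The p-value prints as 0.00000 to five decimal places because the area to the right of 15.54 is extremely small, not because it is exactly zero.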

1.3 Make pairwise comparisons of membrane strengths

Report on the statistical significance of the differences between the sample means, pairwise, using both Tukey's and Fisher's methods.

Membrane B mean is significantly bigger than Membranes C and D means and close to significantly bigger than Membrane A mean.

Membrane C mean is significantly smaller than the other three means.

Membranes A and D means are not significantly different.

The same conclusions may be drawn from both methods.

Display using the underline format

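Membrane C    Membrane D    Membrane A    Membrane B

84.63         89.89         92.84         96.08

Means joined by a common underline are not significantly different: Membranes D and A share one underline and Membranes A and B share another, while Membrane C stands alone.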

Compare the width of the Tukey intervals and the corresponding Fisher intervals. By how much do they differ?

All the Tukey intervals have width 9.4; all the Fisher intervals have width 7.076. This reflects the fact that the Tukey standard error multiplier is bigger than the Fisher multiplier.

Note that the Tukey multiplier [1] is $q(0.95, 4, 36)/\sqrt{2} = 3.81 / 1.414 = 2.69$, while Fisher's multiplier is $t_{36,\,0.05} = 2.03$. Their ratio is 2.69 / 2.03 = 1.33, the same as the ratio of the interval widths, 9.4 / 7.076 = 1.33.
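The arithmetic can be reproduced in a few lines of Python. This sketch uses only the figures quoted in the text (3.81, 2.03, 9.4 and 7.076); the variable names are illustrative. It also backs out the standard error of a difference implied by each interval, since a half-width is the multiplier times the standard error.

```python
import math

q_value = 3.81         # studentized range value quoted in the text
t_value = 2.03         # t critical value quoted in the text
tukey_width = 9.4      # width of each Tukey interval
fisher_width = 7.076   # width of each Fisher interval

tukey_multiplier = q_value / math.sqrt(2)
multiplier_ratio = tukey_multiplier / t_value
width_ratio = tukey_width / fisher_width

# Half-width = multiplier x standard error of a difference between means,
# so both methods should imply (almost) the same standard error.
se_from_tukey = (tukey_width / 2) / tukey_multiplier
se_from_fisher = (fisher_width / 2) / t_value

print(f"Tukey multiplier    = {tukey_multiplier:.2f}")
print(f"ratio of multipliers = {multiplier_ratio:.2f}")
print(f"ratio of widths      = {width_ratio:.2f}")
print(f"implied SE: Tukey {se_from_tukey:.3f}, Fisher {se_from_fisher:.3f}")
```

The two implied standard errors agree up to the rounding of the quoted multipliers, which is why the ratio of widths equals the ratio of multipliers.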

Explain the differences between the two methods, having regard to simultaneous and individual confidence levels.

When making a single comparison between two means, there is a possibility that we are wrong, which is why we are not 100% confident. With several comparisons, there are more possibilities for being wrong, which must reduce our overall level of confidence.

[1] See Course Manual, Chapter 4, §4.3, p. 23.

The Tukey method adjusts for this reduced level of confidence by increasing the width of each interval, which it achieves by using a bigger multiplier. The Fisher method makes no such adjustment.

Note that the difference between Membranes A and B falls just short of significance according to the Fisher method and further short according to the Tukey method. This shows that, while both methods led to the same conclusions here, they may not always do so.

In so far as one needs to be cautious when making multiple comparisons, the Tukey method is the safer option.
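To get a feel for how quickly overall confidence erodes, the sketch below counts the pairwise comparisons among four means and computes two rough figures for the overall confidence when each comparison is made at the 95% level. Both are illustrative simplifications and not what Tukey's method computes: the first assumes, hypothetically, that the comparisons are independent (they are not, since they share sample means), and the second is the Bonferroni lower bound.

```python
from math import comb


def familywise_confidence(n_groups, individual_level=0.95):
    """Rough overall confidence when all pairs of groups are compared."""
    n_comparisons = comb(n_groups, 2)
    # Idealised figure: treats the comparisons as independent (an assumption).
    if_independent = individual_level ** n_comparisons
    # Bonferroni lower bound: holds whatever the dependence between comparisons.
    bonferroni_bound = max(0.0, 1 - n_comparisons * (1 - individual_level))
    return n_comparisons, if_independent, bonferroni_bound


n_comparisons, if_independent, bound = familywise_confidence(4)
print(f"{n_comparisons} comparisons")
print(f"overall confidence if independent: {if_independent:.3f}")
print(f"Bonferroni lower bound:            {bound:.2f}")
```

With four membranes there are six comparisons, so individual 95% intervals give simultaneous confidence of roughly 70% to 74% on these rough reckonings, well short of 95%.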

1.4 Diagnostic analysis

Provide interpretations of the diagnostic plots.

Neither plot shows exceptional patterns.

Note that the Residuals versus Fitted Values plot is the Individual Values plot with each sample centred at its mean. The sample means are the fitted values and the residuals are the individual values less the corresponding fitted values.

The Normal plot is not perfectly linear but the departure from linearity is not exceptional and is consistent with chance causes. To check this, you can make Normal reference plots based on Normal data generated by Minitab, as in Laboratory 2, part 2.

What course of action is suggested?

Proceed to a conclusion!

Prepare a short report for management with your final recommendation(s).

Having compared all four membrane types, the first conclusion is that the competitors' membranes have no particular advantage over this company's membranes.

There is a suggestion that the newly developed Membrane B is, on average, stronger than the existing Membrane A. The difference in average strength is estimated to be between 1.46 kPa in favour of Membrane A and 7.94 kPa in favour of Membrane B. This estimate is made with 95% confidence.

However, as the result is not statistically significant, one should be cautious about opting for the new membrane without further testing.
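The interval quoted in the report can be rebuilt from figures given earlier in this feedback: the sample means for Membranes A and B (92.84 and 96.08) and the interval widths (9.4 for Tukey, 7.076 for Fisher). The short Python sketch below does this and checks whether each interval covers zero; the helper names are illustrative.

```python
mean_a, mean_b = 92.84, 96.08   # sample means quoted in the underline display
difference = mean_b - mean_a     # B minus A, in kPa


def interval(centre, width):
    """Interval of the given total width centred on `centre`."""
    half = width / 2
    return centre - half, centre + half


def includes_zero(bounds):
    """True when the interval covers zero, i.e. the difference is not significant."""
    return bounds[0] <= 0 <= bounds[1]


tukey = interval(difference, 9.4)
fisher = interval(difference, 7.076)

for name, bounds in (("Tukey", tukey), ("Fisher", fisher)):
    verdict = "not significant" if includes_zero(bounds) else "significant"
    print(f"{name:<6} {bounds[0]:6.2f} to {bounds[1]:5.2f}  ({verdict})")
```

The Tukey interval reproduces the limits quoted in the report. The Fisher interval only just covers zero, which is the sense in which the A versus B difference is close to significance.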

2 A study of river pollution

2.1 Initial data analysis

Numerical summary of Aldrin levels

Depth      N   Mean  StDev  Minimum  Maximum
Surface   10   4.19   0.67     3.08     5.17
Middepth  10   5.02   1.10     3.17     6.57
Bottom    10   6.02   1.58     3.76     8.79

[Figure: Individual Value Plot of Aldrin levels; horizontal axis: Depth (Surface, Middepth, Bottom).]

Provide a detailed interpretation of the output, including tentative responses to the questions raised above.

The level of Aldrin appears to increase from surface to bottom, with mean levels of 4.2, 5.0 and 6.0 at Surface, Middepth and Bottom, respectively. The spread of the recorded levels also increases from Surface to Bottom, with standard deviations of 0.7, 1.1 and 1.6 and corresponding measurement ranges of 3.1 to 5.2 (2.1), 3.2 to 6.6 (3.4) and 3.8 to 8.8 (5.0), respectively.

This calls into question the practice of using mid-depth measurements as representative of the levels of pollution in rivers.

2.2 ANOVA

Report on the statistical significance of the differences between the Aldrin sample means.

The F-ratio for comparing the sample means was 6.05, exceeding the 5% critical value of 3.35. The p-value was 0.007, less than 0.05. Thus, the differences between the sample means are statistically significant.
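With three depths and ten measurements at each, the F-ratio has 2 and 27 degrees of freedom. When the numerator has exactly 2 degrees of freedom, the upper-tail area of the F distribution has a simple closed form, so both figures can be checked without statistical software. The Python sketch below relies on that special case only; the function names are illustrative and the formulas do not apply to other numerator degrees of freedom.

```python
def f_upper_tail_2df(x, d2):
    """P(F > x) for an F(2, d2) variable.

    Closed form valid ONLY when the numerator has 2 degrees of freedom.
    """
    return (1 + 2 * x / d2) ** (-d2 / 2)


def f_critical_2df(alpha, d2):
    """Value exceeded with probability alpha by an F(2, d2) variable."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


error_df = 3 * 10 - 3   # 30 observations less 3 depth means
p_value = f_upper_tail_2df(6.05, error_df)
critical_value = f_critical_2df(0.05, error_df)

print(f"p-value        = {p_value:.3f}")
print(f"critical value = {critical_value:.2f}")
```

Both results agree with the values reported from Minitab to the precision quoted.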

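[Figure: F distribution curve with the 5% critical value 3.35 (upper-tail area = 0.05) and the observed F-ratio 6.05 marked on the horizontal axis.]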

2.3 Make pairwise comparisons of pollutant levels

Report on the statistical significance of the differences between the Aldrin sample means, pairwise, using Tukey's method. Report specifically on differences from standard (Middepth).

Surface and Bottom are significantly different from each other; neither is significantly different from Middepth.

Report on the statistical significance of the differences of the Surface and Bottom Aldrin sample means from standard (Middepth).

Using Dunnett's method, neither is significantly different from Middepth.

Compare with the corresponding Tukey differences; explain any variations.

The conclusion is the same. However, the Dunnett intervals are narrower (and so could lead to a different conclusion in another problem).

This is because there are fewer comparisons to be made using the Dunnett procedure.

Dunnett compares standard to everything else and so, in this case, does not compare Surface and Bottom. Tukey compares everything to everything else and so, in this case, does compare Surface and Bottom.

Note that pairwise comparisons could be misleading in this problem in that they could lead to the conclusion that Middepth was representative of all depths. The pairwise comparisons ignore the ordering of the depths and the evident trend in Aldrin concentration associated with this ordering. Conceivably, a regression approach could be used to take the ordering into account.
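One way such a regression approach might look is sketched below in Python, working only from the summary statistics in section 2.1. It rests on several assumptions that are not part of the laboratory: the depths are scored 1, 2, 3 as if equally spaced, the error variance is estimated by pooling the three sample variances (27 degrees of freedom), and the unequal spreads noted earlier are ignored. The comparison value 2.052 is the tabulated two-sided 5% point of t with 27 degrees of freedom.

```python
import math

# Summary statistics from the numerical summary of Aldrin levels.
depth_scores = [1, 2, 3]      # assumed equally spaced scores (hypothetical)
means = [4.19, 5.02, 6.02]    # Surface, Middepth, Bottom
sds = [0.67, 1.10, 1.58]
n_per_depth = 10

score_mean = sum(depth_scores) / len(depth_scores)
sxx = sum((x - score_mean) ** 2 for x in depth_scores)

# Least squares slope; with equal sample sizes it depends only on the means.
slope = sum((x - score_mean) * m for x, m in zip(depth_scores, means)) / sxx

# Pooled within-depth variance (equal sample sizes, so a simple average).
pooled_variance = sum(s ** 2 for s in sds) / len(sds)
slope_se = math.sqrt(pooled_variance / (n_per_depth * sxx))

t_ratio = slope / slope_se
t_critical = 2.052            # tabulated t value, 27 df, two-sided 5%

print(f"slope = {slope:.3f} per depth step, SE = {slope_se:.3f}")
print(f"t = {t_ratio:.2f} versus critical value {t_critical}")
```

Under these assumptions the trend is clearly significant, which illustrates how an analysis that uses the ordering of the depths can be more sensitive than pairwise comparisons.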

2.4 Diagnostic analysis

[Figure: Versus Fits plot (response is Aldrin); horizontal axis: Fitted Value.]

[Figure: Normal Probability Plot (response is Aldrin); horizontal axis: Score.]

Provide interpretations of the diagnostic plots.

The Residuals versus Fitted Values plot shows a clear relationship of spread with concentration of Aldrin; the spread increases with the fitted values.

The Normal plot is unremarkable.

2.5 Formal comparison of spreads

Test for Equal Variances for Aldrin

[Figure: 95% Bonferroni Confidence Intervals for StDevs at Surface, Middepth and Bottom.]

Bartlett's Test: Test Statistic 5.76, P-Value 0.056

Levene's Test: Test Statistic 2.11, P-Value 0.141

Interpret the results, recalling that the Normal diagnostic plot supported Normality.

Bartlett's test suggests that the differences between the standard deviations are close to statistical significance at the conventional 5% level (p = 0.056). Levene's test suggests that they are not statistically significant (p = 0.141). Bartlett's test is intended for problems where the Normal model applies, whereas Levene's test is intended for problems where Normality may or may not apply. Bartlett's test is therefore the more powerful of the two when the Normal model applies, and so more likely to indicate significance, as here. In these circumstances, it makes sense to rely on Bartlett's test.

What do you conclude?

There is evidence of a departure from the assumption of homogeneous spread.
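Bartlett's statistic depends only on the sample standard deviations and sample sizes, so it can be recomputed from the summary figures. The Python sketch below applies the standard textbook form of the statistic to the three standard deviations quoted in section 2.6 (0.671, 1.104, 1.582), with ten measurements at each depth. The p-value uses the fact that a chi-square upper-tail area with 2 degrees of freedom is exp(-x/2); that shortcut applies only when three groups are compared. The function names are illustrative.

```python
import math


def bartlett_from_summary(sds, sizes):
    """Bartlett's test statistic from group standard deviations and sizes."""
    k = len(sds)
    dfs = [n - 1 for n in sizes]
    df_total = sum(dfs)
    pooled_variance = sum(df * s ** 2 for df, s in zip(dfs, sds)) / df_total
    numerator = (df_total * math.log(pooled_variance)
                 - sum(df * math.log(s ** 2) for df, s in zip(dfs, sds)))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


def chi2_upper_tail_2df(x):
    """P(chi-square > x); closed form valid ONLY for 2 degrees of freedom."""
    return math.exp(-x / 2)


statistic = bartlett_from_summary([0.671, 1.104, 1.582], [10, 10, 10])
p_value = chi2_upper_tail_2df(statistic)
print(f"Bartlett statistic = {statistic:.2f}, p-value = {p_value:.3f}")
```

The result agrees with the Bartlett's test figures in the Minitab output to the precision shown there.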

2.6 Use weighted least squares to adjust for unequal standard deviations

What are the weights for Surface, Middepth, Bottom?

The standard deviations at the different depths were given with the ANOVA output: 0.671 for Surface, 1.104 for Middepth, 1.582 for Bottom.

Thus, the corresponding weights are $1/0.671^2 = 2.22$, $1/1.104^2 = 0.82$ and $1/1.582^2 = 0.40$.
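The weights are simply reciprocals of the sample variances, so that the noisiest depth counts least. A short Python check, with illustrative variable names:

```python
# Standard deviations from the ANOVA output, by depth.
standard_deviations = {"Surface": 0.671, "Middepth": 1.104, "Bottom": 1.582}

# Weight = 1 / variance: less variable depths receive more weight.
weights = {depth: 1 / sd ** 2 for depth, sd in standard_deviations.items()}

for depth, weight in weights.items():
    print(f"{depth:<9} weight = {weight:.2f}")
```

In Minitab, each observation would carry the weight for its depth, for example in a column of weights alongside the Aldrin values.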

Review the output, compare point by point with the unweighted output.

Note any qualitative correspondences and differences in results.

Note any quantitative correspondences and differences in results.

The conclusions from both analyses are broadly similar; the means are statistically significantly different, pairwise comparisons suggest that Surface and Bottom are significantly different from each other; neither is significantly different from Middepth.

The F ratio for equal means calculated using the weighted analysis is bigger, 6.63 > 6.05. Correspondingly, the p-value is smaller, 0.005 < 0.007.

The Tukey comparison intervals calculated from the weighted analysis are narrower.

Prepare a short report on the effects of weighting, with a final conclusion.

Applying a weighted analysis brings the assumptions required for the validity of the analysis of variance into line. This provides a more powerful analysis, as evidenced by the more statistically significant results.

The formal conclusion is that the means of the Aldrin concentration at the different depths are statistically significantly different. Informally, it may be seen that there is a strong trend from low to high concentration as depth increases from Surface to Bottom.

3 Two sample t-tests and ANOVA

Prepare short reports on both sets of results. Include commentary on the residual analysis incorporated in the ANOVA command.

Two sample t-test:

The mean IQ for Boys was 111.0 and for Girls 105.8.

The difference between means, 5.12, was not statistically significant; t = 1.7 with p-value 0.093.

One-way ANOVA:

The mean IQ for Boys was 111.0 and for Girls 105.8.

The difference between means, 5.12, was not statistically significant; F = 2.89 with p-value 0.09.

How many correspondences can you establish between the two sets of results?

The numerical summaries, means and standard deviations, are identical. t = 1.7 and $\sqrt{F} = \sqrt{2.89} = 1.7$. The p-value is 0.093 in both analyses (reported as 0.09 in the ANOVA summary above). s = Pooled StDev = 13.01.
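The correspondence between t and F is exact whenever the pooled two-sample t-test is used, not a coincidence of this data set. The Python sketch below illustrates this on two small hypothetical samples (not the IQ data from the laboratory), computing the pooled t statistic and the one-way ANOVA F-ratio from their definitions.

```python
import math
from statistics import mean

# Hypothetical scores for two groups; NOT the laboratory IQ data.
boys = [112, 98, 105, 120, 117, 101]
girls = [104, 95, 110, 99, 108, 102]


def within_ss(sample):
    """Sum of squared deviations of a sample from its own mean."""
    centre = mean(sample)
    return sum((x - centre) ** 2 for x in sample)


def pooled_t(a, b):
    """Pooled two-sample t statistic and pooled standard deviation."""
    pooled_variance = (within_ss(a) + within_ss(b)) / (len(a) + len(b) - 2)
    standard_error = math.sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error, math.sqrt(pooled_variance)


def anova_f(a, b):
    """One-way ANOVA F-ratio for two groups and the root mean square error."""
    grand = mean(a + b)
    ss_factor = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    ms_factor = ss_factor / 1                       # k - 1 = 1 degree of freedom
    ms_error = (within_ss(a) + within_ss(b)) / (len(a) + len(b) - 2)
    return ms_factor / ms_error, math.sqrt(ms_error)


t_stat, pooled_sd = pooled_t(boys, girls)
f_ratio, root_mse = anova_f(boys, girls)
print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
print(f"pooled SD = {pooled_sd:.4f}, s from ANOVA = {root_mse:.4f}")
```

Because F is the square of t and the error degrees of freedom are the same, the two p-values must also agree.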

Explain the DF in the first row of the ANOVA table

When comparing k means, the Factor SS involves the deviations of the k sample means from the overall mean. These k deviations sum to 0 and so have k – 1 degrees of freedom. Here, k = 2 so k – 1 = 1.

4 Review Exercise; analysis of the HCB data

The results of this analysis are less pronounced than the analysis of Aldrin concentrations.

The sample means appear less different. The sample spreads are also less different, in fact, the spreads for Middepth and Bottom are virtually identical.

Formally, the F test for differences between means is not statistically significant at the conventional 5% significance level; p = 0.065.

None of the Tukey comparisons are significant.

The results of both tests for homogeneity of spread are not statistically significant.

There is little point in implementing a weighted analysis; the results will not change dramatically.

The overall conclusion is that the same broad trends appear in HCB concentrations but not as strongly as for Aldrin, and the formal tests are not statistically significant.

It should be noted, however, that the formal tests did not take account of the visible trend; a formal analysis that did take account of the trend would be more likely to produce statistically significant results.
