Laboratory 6 - Trinity College Dublin

Trinity College, Dublin
Generic Skills Programme
Statistics for Research Students
Laboratory 6:
Analysis of Variance
To complete the laboratory exercise, work your way through this handout, which is
self-contained and self-explanatory. Work in pairs (two per machine), and learn from each other.
Keep separate logs of your work. The tutor is available to help with technicalities and discuss
substantive issues.
Invitations to consider the results of Minitab analysis and their statistical and substantive
interpretations are printed in italics. Take some time for this; consult your neighbour or
tutor. Enter your responses in a Word document, as if draft contributions to a report on
the experiment and its analysis.
Topics:
1. Testing filter membranes
2. A study of river pollution
3. Two sample t-tests and ANOVA
4. Review Exercise
Learning Objectives:
Be able to

formulate a substantive problem in statistical terms

conduct an initial data analysis using appropriate numerical and graphical
summaries of raw data

summarise the results of an initial data analysis in management report format

use Minitab to implement a one-way analysis of variance and explain and interpret
the results

use Minitab to implement Tukey's and Fisher's methods of comparison of several
means

interpret the results of multiple comparisons of means

compare and contrast Tukey's and Fisher's approaches

use Minitab to produce standard diagnostic plots for the analysis of variance and
explain and interpret the results

use Minitab to test the statistical significance of differences between several
standard deviations and explain and interpret the results

explain the rationale for weighted least squares in one-way analysis of variance and
derive suitable weights

use Minitab to implement weighted one-way analysis of variance and explain and
interpret the results

implement a comprehensive one-way analysis of variance using the Minitab General
Linear Model feature and explain and interpret the results

demonstrate the correspondence between two-sample t-tests and analysis of
variance applied to the comparison of two means
Refresher
To start Minitab,

click Start, Programs, Minitab 15 for Windows, Minitab.
To access the Excel data files,


click on the Start button in the bottom left hand corner and choose Run...
in the dialog box, type \\tholos\shared and click OK,



in the window that opens double click on the ST1001 folder,
double click on the GET folder,
double click on the GenericSkillsData folder.
The data sets for today's Laboratory are in files
MembraneStrength.xls,
IQ Scores.xls (from Laboratory 4) and
RiverPollution.xls.
To access the membrane strengths data:




click on MembraneStrength.xls, then Open
copy the two data columns,
in the Minitab active data window, click in the Name cell for Column 1 (C1),
from the Minitab Edit menu select Paste Cells.
1 Testing filter membranes
A company that manufactures liquid filters is concerned with improving the burst strength of the
membranes which constitute the critical part of the filter. They have conducted a study of four
types of filter membrane, labelled A, B, C and D. Membrane A is the standard type currently
used by the company. Membrane B is an alternative developed by the company using a new
material they have developed. Membranes C and D were supplied by other manufacturers.
Following a review of historical data, it was decided to test sample membranes, one from each
of 10 batches of each membrane type. The standard measure of burst strength involved
increasing the pressure of liquid through the filter until the filter failed and recording the fail
pressure. The measurement unit was kilopascal (thousands of Pascals, kPa). The resulting
data follow.
Membrane A   Membrane B   Membrane C   Membrane D
      95.5         90.5         86.3         89.5
     103.2         98.1         84.0         93.4
      93.1         97.8         86.2         87.5
      89.3         97.0         80.2         89.4
      90.4         98.0         83.7         87.9
      92.1         95.2         93.4         86.2
      93.1         95.3         77.1         89.9
      91.9         97.1         86.8         89.5
      95.3         90.5         83.7         90.0
      84.5        101.3         84.9         95.6

1.1 Initial data analysis
Having already copied the data to Minitab


use the Dotplot command in Minitab's Graph menu, One Y, With Groups, to make
standard dotplots of the data,
use the Individual Value Plot command in Minitab's Graph menu, One Y, With Groups,
to make vertical dotplots of the data.
Which plot do you prefer? Why?
Interpret the output. Make tentative conclusions regarding comparisons of strengths
of different membrane types, with a corresponding recommendation to the company,
keeping in mind the origins of the four membrane types.
1.2 Formal analysis
The formal comparison of membrane strengths may be accomplished by an application of the
Analysis of Variance (ANOVA) which, effectively, compares the four sample means. Minitab
may be used to achieve this, as follows:




from the Stat menu, select ANOVA, then One-Way...,
select Strength as the Response and Membrane as the Factor,
click on Graphs..., uncheck any checked residual plots (we will return to these),
click OK, OK.
Report on the statistical significance of the differences between the sample means.
Explain the entries in the DF column of the ANOVA table.
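For readers who want to cross-check the Minitab ANOVA table outside Minitab, the following is a minimal Python sketch using SciPy's `f_oneway`. It assumes the data table above lists ten burst strengths per membrane type; the variable names are illustrative and not part of the handout.

```python
from scipy import stats

# Burst strengths (kPa); assumes ten values per membrane type as tabulated above.
strength = {
    "A": [95.5, 103.2, 93.1, 89.3, 90.4, 92.1, 93.1, 91.9, 95.3, 84.5],
    "B": [90.5, 98.1, 97.8, 97.0, 98.0, 95.2, 95.3, 97.1, 90.5, 101.3],
    "C": [86.3, 84.0, 86.2, 80.2, 83.7, 93.4, 77.1, 86.8, 83.7, 84.9],
    "D": [89.5, 93.4, 87.5, 89.4, 87.9, 86.2, 89.9, 89.5, 90.0, 95.6],
}

k = len(strength)                                  # number of membrane types
n_total = sum(len(v) for v in strength.values())   # number of observations
df_between = k - 1                                 # DF for the Membrane row
df_error = n_total - k                             # DF for the Error row

result = stats.f_oneway(*strength.values())
print(f"DF: Membrane = {df_between}, Error = {df_error}, Total = {n_total - 1}")
print(f"F = {result.statistic:.2f}, p-value = {result.pvalue:.4g}")
```

The F statistic and p-value printed here should agree with the Minitab output up to rounding.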
Using the Minitab Calc menu, confirm the p-value for F and calculate the critical
value for F.
Note that the Cumulative probability function is used to calculate probabilities
from F values while the Inverse cumulative probability function is used to
calculate F values from probabilities.
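The same two calculations can be sketched in Python with SciPy's F distribution: `cdf` plays the role of Minitab's Cumulative probability and `ppf` the role of Inverse cumulative probability. The value of `f_obs` below is a hypothetical placeholder, not a result from the handout; substitute the F value from your own ANOVA table. The 5% significance level is also an assumption.

```python
from scipy import stats

df_num, df_den = 3, 36   # 4 membrane types and 40 observations
f_obs = 14.0             # hypothetical placeholder: use the F from your ANOVA table

# p-value: probability of an F value at least as large as the one observed
p_value = 1 - stats.f.cdf(f_obs, df_num, df_den)

# critical value: the F value exceeded with probability 0.05 (assumed level)
f_crit = stats.f.ppf(0.95, df_num, df_den)

print(f"p-value = {p_value:.4g}")
print(f"5% critical value of F({df_num}, {df_den}) = {f_crit:.3f}")
```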
1.3 Make pairwise comparisons of membrane strengths
To assist in understanding the pattern of differences found between means of different
membrane types, multiple comparisons of the several means may be undertaken. Minitab
provides two approaches, corresponding to Tukey's HSD and Fisher's LSD [1]. They may be
implemented as follows:



edit the previous ANOVA dialog, click the Comparisons button,
check the Tukey's and Fisher's boxes,
click OK, OK.
The ANOVA table is repeated, followed by the multiple comparisons.
Report on the statistical significance of the differences between the sample means,
pairwise, using both Tukey's and Fisher's methods. Display using the underline
format (see, e.g., Course Manual, Figure 4.13, p. 21 or Figure 4.16, p. 28).
Compare the width of the Tukey intervals and the corresponding Fisher intervals. By
how much do they differ?
Explain the differences between the two methods, having regard to simultaneous
and individual confidence levels.
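As a rough check on the width comparison, note that with equal sample sizes both methods give intervals of the form (difference of means) ± multiplier × s × sqrt(2/n), where s is the square root of MS(Error). Only the multiplier differs. The sketch below, which assumes a 95% confidence level and uses SciPy's `studentized_range` distribution (SciPy 1.7 or later), compares the two multipliers.

```python
import math
from scipy import stats

k, df_error = 4, 36   # four membrane types, 40 - 4 error degrees of freedom
alpha = 0.05          # assumed: 95% individual (Fisher) and simultaneous (Tukey) levels

# Fisher's LSD uses the ordinary t critical value for each pair separately.
fisher_mult = stats.t.ppf(1 - alpha / 2, df_error)

# Tukey's HSD uses the studentized range critical value, divided by sqrt(2).
tukey_mult = stats.studentized_range.ppf(1 - alpha, k, df_error) / math.sqrt(2)

ratio = tukey_mult / fisher_mult
print(f"Fisher multiplier = {fisher_mult:.3f}")
print(f"Tukey multiplier  = {tukey_mult:.3f}")
print(f"Tukey intervals are wider by a factor of {ratio:.3f}")
```

Because s and n are common to both methods, the ratio of multipliers is also the ratio of interval widths.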
1.4 Diagnostic analysis
The validity of the statistical inferences relies on key assumptions, specifically, homogeneous
standard deviation and Normality of errors. These may be assessed using standard diagnostic
plots, which may be produced as follows:






edit the previous dialog (CTRL+E) and click the Graphs button,
check the Normal plot of residuals and Residuals versus fits boxes,
click OK,
click the Comparisons button,
uncheck the Tukey's and Fisher's boxes, click OK,
click OK.
Provide interpretations of the diagnostic plots.
What course of action is suggested?
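The quantities behind the two diagnostic plots can be sketched in Python as follows. In a one-way layout the fitted value for each case is its group mean and the residual is the observation minus that mean. This is an illustrative sketch only; it prints text summaries rather than drawing the plots, and the variable names are assumptions.

```python
import numpy as np
from scipy import stats

# Burst strengths (kPa); assumes ten values per membrane type as tabulated in Section 1.
strength = {
    "A": [95.5, 103.2, 93.1, 89.3, 90.4, 92.1, 93.1, 91.9, 95.3, 84.5],
    "B": [90.5, 98.1, 97.8, 97.0, 98.0, 95.2, 95.3, 97.1, 90.5, 101.3],
    "C": [86.3, 84.0, 86.2, 80.2, 83.7, 93.4, 77.1, 86.8, 83.7, 84.9],
    "D": [89.5, 93.4, 87.5, 89.4, 87.9, 86.2, 89.9, 89.5, 90.0, 95.6],
}

fits, resid = [], []
for values in strength.values():
    y = np.asarray(values)
    fits.extend([y.mean()] * len(y))   # fitted value = group mean
    resid.extend(y - y.mean())         # residual = observation - group mean
fits, resid = np.array(fits), np.array(resid)

# Normal plot: theoretical quantiles against ordered residuals,
# with the correlation r of the straight line fitted through the points.
(quantiles, ordered), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print(f"Normal plot correlation r = {r:.3f}")

# Text stand-in for the residuals versus fits plot: residual range at each fit.
for fit in np.unique(fits):
    group = resid[fits == fit]
    print(f"fit {fit:6.2f}: residuals from {group.min():6.2f} to {group.max():6.2f}")
```

Plotting `quantiles` against `ordered`, and `fits` against `resid`, with any graphics package gives the analogues of the two Minitab plots.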
[1] See Course Manual, Chapter 4, §4.3, pp. 20-28.
Follow the suggested course of action. Proceed to a conclusion.
Prepare a short report for management with your final recommendation(s).
2 A study of river pollution
Jaffe, Parker and Wilson (1982) [2] have investigated the concentration of several hydrophobic
(i.e., not dissolving in, absorbing, or mixing easily with water) organic substances (such as
hexachlorobenzene, chlordane, heptachlor, aldrin, dieldrin, endrin) in a river, downstream of an
abandoned dump site that had previously been used by the pesticide industry to dispose of its
waste products.
It was expected that these hydrophobic substances might have a nonhomogeneous vertical
distribution in the river because of differences in density between these compounds and water
and because of the adsorption of these compounds on sediments, which could lead to higher
concentrations on the bottom. It is important to check this hypothesis because the standard
procedure of sampling at six-tenths of the depth could miss the bulk of these pollutants if the
distribution were not uniform.
Grab samples were taken at various depths of the river. Ten surface, 10 mid-depth and 10
bottom samples were collected, all within a relatively short period. A gas chromatograph
measurement procedure was used to establish the concentrations of a range of pollutants in
each sample. The average aldrin and hexachlorobenzene (HCB) concentrations (in nanograms
per litre) in the 30 samples are shown below.
               Aldrin                            HCB
Surface   Middepth   Bottom      Surface   Middepth   Bottom
   3.08       5.17     4.81         3.74       6.03     5.44
   3.58       6.17     5.71         4.61       6.55     6.88
   3.81       6.26     4.90         4.00       3.55     5.37
   4.31       4.26     5.35         4.67       4.59     5.44
   4.35       3.17     5.26         4.87       3.77     5.03
   4.40       3.76     6.26         5.12       4.81     6.48
   3.67       4.76     3.76         4.52       5.85     3.89
   5.17       4.90     8.07         5.29       5.74     5.85
   5.17       6.57     8.79         5.74       6.77     6.85
   4.35       5.17     7.30         5.48       5.64     7.16

[2] Jaffe, P. R., Parker, F. L., and Wilson, D. J. (1982). Distribution of toxic substances in rivers. Journal of
the Environmental Engineering Division, Proceedings of the American Society of Civil Engineers, Vol. 108,
No. EE4, pp. 639-649.

2.1 Initial data analysis; Aldrin
Copy the River Pollution data into Minitab columns and conduct an initial data analysis:


make dotplots and numerical summaries of the Aldrin levels, stratified by depth,
use the Value Order command in the Editor menu to arrange the depths in order of their
typical Aldrin concentrations; refer to Laboratory 1, Part 2, to see how to do this.
Provide a detailed interpretation of the output, as if for an interim management report,
including tentative responses to the questions raised above.
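The numerical summaries can be cross-checked with a short Python sketch such as the one below. It uses the Aldrin values tabulated above; sorting the depths by their sample means mirrors the purpose of the Value Order step. Names are illustrative.

```python
import numpy as np

# Aldrin concentrations (nanograms per litre) at each depth, from the table above.
aldrin = {
    "Surface":  [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom":   [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

# Arrange the depths in order of their typical (mean) concentration.
order = sorted(aldrin, key=lambda depth: np.mean(aldrin[depth]))

for depth in order:
    y = np.asarray(aldrin[depth])
    print(f"{depth:9s} n = {len(y):2d}  mean = {y.mean():.3f}  "
          f"sd = {y.std(ddof=1):.3f}  min = {y.min():.2f}  max = {y.max():.2f}")
```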
2.2 ANOVA
The formal comparison of pollutant levels at the three depths may be accomplished by an
application of the Analysis of Variance (ANOVA) which, effectively, compares the three sample
means. Minitab may be used to achieve this, as follows:




from the Stat menu, select ANOVA, then One-Way...,
select Aldrin as the Response and Depth as the Factor,
click on Graphs ... , uncheck any checked residual plots,
click OK, OK.
Report on the statistical significance of the differences between the Aldrin sample
means.
2.3 Make pairwise comparisons of pollutant levels
To assist in understanding the pattern of differences found between pollutant levels, make
multiple comparisons of the several means:



edit the previous ANOVA dialog and click the Comparisons button,
check the Tukey's box,
click OK, OK.
Report on the statistical significance of the differences between the Aldrin sample
means, pairwise, using Tukey's method. Report specifically on differences from
standard (Middepth).
Since Middepth is a "standard", there may be interest in comparing the responses at the other
two depth levels to those at the standard. For this purpose, an alternative approach to multiple
comparisons is available. To implement this,




edit the previous dialog (CTRL+E) and click the Comparisons button,
uncheck the Tukey's box,
check the Dunnett's box, enter "Middepth" (including quotation marks) in the Control
group level box,
click OK, OK.
Report on the statistical significance of the differences of the Surface and Bottom
Aldrin sample means from standard (Middepth).
Compare with the corresponding Tukey differences; explain any variations.
2.4 Diagnostic analysis
Implement the standard diagnostic analysis:





edit the previous dialog (CTRL+E) and click the Comparisons button,
uncheck the Dunnett's box, click OK,
click the Graphs button,
check the Normal plot of residuals and Residuals versus fits boxes,
click OK, OK.
Provide interpretations of the diagnostic plots.
2.5 Formal comparison of spreads
The diagnostic plot suggests that the spread of pollutant values increases with depth, in parallel
with pollutant levels. This merely echoes the initial data analysis. Minitab provides formal tests
of the statistical significance of the variation in spread, specifically, variation in standard
deviations. This may be achieved as follows:



from the Stat menu, select ANOVA, then Test for Equal Variances,
select Aldrin as the Response and Depth as the Factor,
click OK.
The results appear in a graph window. Results of two significance tests are shown, Bartlett's
test and Levene's test. Get help on these by editing the last dialog (CTRL+E) and clicking on
the Help button, then click the "Bartlett's and Levene's tests" link. For more detail, click the
Back button, then click "see also", "Methods and formulas".
Interpret the results, recalling that the Normal diagnostic plot supported Normality.
What do you conclude?
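Both tests are also available in SciPy, which allows a cross-check of the Minitab results. In the sketch below, `levene` is called with `center="median"`; whether this corresponds exactly to the version of Levene's test implemented in Minitab is an assumption that should be checked against Minitab's "Methods and formulas" help page.

```python
from scipy import stats

# Aldrin concentrations (nanograms per litre) at each depth, from Section 2.
aldrin = {
    "Surface":  [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom":   [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

# Bartlett's test relies on Normality of the data.
bartlett = stats.bartlett(*aldrin.values())

# Levene's test (median-centred here) is less sensitive to non-Normality.
levene = stats.levene(*aldrin.values(), center="median")

print(f"Bartlett: statistic = {bartlett.statistic:.3f}, p-value = {bartlett.pvalue:.4f}")
print(f"Levene:   statistic = {levene.statistic:.3f}, p-value = {levene.pvalue:.4f}")
```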
2.6 Use weighted least squares to adjust for unequal standard deviations
The estimation process may be adjusted to allow for the possibly unequal standard deviations.
The basic principle is that cases with smaller standard deviation should contribute more and
cases with larger standard deviation should contribute less to the estimation process. This is
achieved by using a weighting process where each case's contribution is weighted by the
reciprocal of the relevant standard deviation.
In Minitab, the weighting is implemented as part of the Least Squares estimation process; the
weights used there are the reciprocals of the relevant variances. For the cases at each depth,
Surface, Middepth and Bottom, the variances are the squares of the standard deviations shown in the
original numerical summaries, see 2.1 above. To calculate the Least Squares weights, use the
formula
weight = 1/(standard deviation)^2
What are the weights for Surface, Middepth, Bottom?
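The weight calculation can be checked with a minimal Python sketch. It applies the formula above to the sample standard deviation at each depth, then repeats each weight once per observation, which corresponds to a Weights column for data stacked in the order Surface, Middepth, Bottom (an assumption about the worksheet layout).

```python
import numpy as np

# Aldrin concentrations (nanograms per litre) at each depth, from Section 2.
aldrin = {
    "Surface":  [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom":   [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

# weight = 1 / (standard deviation)^2, using the sample standard deviation
weights = {depth: 1 / np.std(values, ddof=1) ** 2 for depth, values in aldrin.items()}
for depth, w in weights.items():
    print(f"{depth:9s} sd = {np.std(aldrin[depth], ddof=1):.3f}  weight = {w:.3f}")

# One weight per case, in the same order as the stacked observations.
weight_column = np.repeat(list(weights.values()), [len(v) for v in aldrin.values()])
```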
Next, the weights must be entered in a column:
name C4 as Weights,
from the Calc menu, select Make Patterned Data, then Arbitrary Set of Numbers,
select Weights as the column in which to "Store patterned data",
enter the calculated weights in the "Arbitrary set of numbers" box,
enter 10 as the "Number of times to list each value",
click OK.
Check the assignment of weights in the Worksheet, ensure correct correspondence
with Depths.
Now, we are ready to implement the weighted ANOVA. The weighting option is not available
using the simple One-Way command; use the General Linear Model command instead:







from Stat, select ANOVA, then General Linear Model,
select Aldrin as Response,
enter Depth as Model,
click on the Options button and enter Weights in the appropriate box, click OK,
click on the Comparisons button, enter Depth in the Terms box, check Tukey, click OK,
click on the Graphs button, check Deleted, Normal plot of residuals, Residuals versus
fits,
click OK, OK.
Review the output, compare point by point with the unweighted output.
Note any qualitative correspondences and differences in results,
Note any quantitative correspondences and differences in results.
Prepare a short report on the effects of weighting, with a final conclusion.
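To see what the weighting does to the ANOVA table, the weighted sums of squares can be computed by hand. The sketch below is not Minitab's General Linear Model routine; it is a direct weighted least squares calculation for the one-way layout, assuming weights equal to the reciprocals of the sample variances. With these weights the group means are unchanged, but the overall mean becomes a weighted mean and the weighted MS(Error) equals 1 by construction.

```python
import numpy as np
from scipy import stats

# Aldrin concentrations (nanograms per litre) at each depth, from Section 2.
aldrin = {
    "Surface":  [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom":   [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

groups = [np.asarray(v) for v in aldrin.values()]
k = len(groups)
n_total = sum(len(g) for g in groups)

w = [1 / g.var(ddof=1) for g in groups]   # one weight per depth
means = [g.mean() for g in groups]        # weighted fits are still the group means

# Weighted overall mean: the fit under the hypothesis of no depth effect.
grand_w = sum(wi * g.sum() for wi, g in zip(w, groups)) / sum(wi * len(g) for wi, g in zip(w, groups))

ss_depth = sum(wi * len(g) * (m - grand_w) ** 2 for wi, g, m in zip(w, groups, means))
ss_error = sum(wi * ((g - m) ** 2).sum() for wi, g, m in zip(w, groups, means))

ms_depth = ss_depth / (k - 1)
ms_error = ss_error / (n_total - k)
f_weighted = ms_depth / ms_error
p_weighted = stats.f.sf(f_weighted, k - 1, n_total - k)

print(f"Weighted MS(Error) = {ms_error:.3f}")
print(f"Weighted F = {f_weighted:.2f}, p-value = {p_weighted:.4g}")
```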
3 Two sample t-tests and ANOVA
Recall the comparison of boys' and girls' IQ scores in Laboratory 4. As part of a larger study of
academic progress by males and females, IQ scores of samples of seventh grade boys and
girls in a Mid-West USA school district were measured. Assuming that these samples were
representative of all the seventh graders, male and female, in the school district, a basic
question is: Is there evidence of a difference in IQ scores for boys and girls?
The data are available in the Week 1 Day 4 section of the Moodle. Copy the data to Minitab
and repeat the 2-sample t-test of Laboratory 4; use the "Assume equal variances" option.
One-way ANOVA is intended for use in comparison of several sample means. If so, then it
should be applicable to the comparison of two sample means. Minitab allows application of
ANOVA to the two samples in separate columns, without the necessity of stacking them in a
single column with group identifiers in another column:




from the Stat menu, select ANOVA, then One-Way (Unstacked),
select Boys, Girls as the Responses,
click the Graphs button, check Normal plot of residuals, Residuals versus fits,
click OK, OK.
To facilitate establishing correspondences,

calculate the square roots of F and MS(Error) in the ANOVA table.
Prepare short reports on both sets of results. Include commentary on the residual
analysis incorporated in the ANOVA command.
How many correspondences can you establish between the two sets of results?
Explain the DF in the first row of the ANOVA table.
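The correspondence can be illustrated numerically in Python. The IQ scores are not reproduced in this handout, so the sketch below borrows the Surface and Bottom Aldrin samples from Section 2 purely as stand-in data for two groups; the same relationships hold for any two samples.

```python
import math
import numpy as np
from scipy import stats

# Stand-in data: Surface and Bottom Aldrin samples from Section 2 (not the IQ scores).
a = np.array([3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35])
b = np.array([4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30])

t_res = stats.ttest_ind(a, b, equal_var=True)   # two-sample t-test, equal variances
f_res = stats.f_oneway(a, b)                    # one-way ANOVA with two groups

# Pooled standard deviation from the t-test, and MS(Error) from the ANOVA.
df_error = len(a) + len(b) - 2
pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / df_error
ms_error = (((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()) / df_error

print(f"t = {t_res.statistic:.4f}, sqrt(F) = {math.sqrt(f_res.statistic):.4f}")
print(f"p (t-test) = {t_res.pvalue:.6f}, p (ANOVA) = {f_res.pvalue:.6f}")
print(f"pooled sd = {math.sqrt(pooled_var):.4f}, sqrt(MS(Error)) = {math.sqrt(ms_error):.4f}")
```

The absolute value of t equals the square root of F, the two p-values coincide, and the pooled standard deviation equals the square root of MS(Error).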
4 Review Exercise
Carry out a careful analysis of the HCB data in the River Pollution study.
Conclusion
This concludes Laboratory 6. The learning objectives listed at the outset are reproduced here.
Check them individually and ensure that you have achieved each one; seek help from the Tutor
if necessary.
Learning Objectives:
Be able to
formulate a substantive problem in statistical terms

conduct an initial data analysis using appropriate numerical and graphical
summaries of raw data

summarise the results of an initial data analysis in management report format

use Minitab to implement a one-way analysis of variance and explain and interpret
the results

use Minitab to implement Tukey's and Fisher's methods of comparison of several
means

interpret the results of multiple comparisons of means

compare and contrast Tukey's and Fisher's approaches

use Minitab to produce standard diagnostic plots for the analysis of variance and
explain and interpret the results

use Minitab to test the statistical significance of differences between several
standard deviations and explain and interpret the results

explain the rationale for weighted least squares in one-way analysis of variance and
derive suitable weights

use Minitab to implement weighted one-way analysis of variance and explain and
interpret the results

implement a comprehensive one-way analysis of variance using the Minitab General
Linear Model feature and explain and interpret the results

demonstrate the correspondence between two-sample t-tests and analysis of
variance applied to the comparison of two means