Statistics 305 CHAPTER 7.1 – THE ONE-WAY NORMAL MODEL

To begin, imagine a Normal population having mean zero and variance $\sigma^2$, i.e. $N(0, \sigma^2)$.
Obviously if the same number $\mu$ is added to every element of this population the result is a
Normal population having mean $\mu$ and variance $\sigma^2$, i.e. $N(\mu, \sigma^2)$. Let $\epsilon$ denote a random
variable defined on the first population, i.e. $\epsilon \sim N(0, \sigma^2)$, and $Y$ denote a random variable
defined on the second population, i.e. $Y \sim N(\mu, \sigma^2)$. The relationship between these random
variables is described mathematically as
$$Y = \mu + \epsilon.$$
Sampling random variables $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ for a random sample of size $n$ from $N(0, \sigma^2)$ lead to
sampling random variables $Y_1, Y_2, \ldots, Y_n$ from $N(\mu, \sigma^2)$ where
$$Y_i = \mu + \epsilon_i, \quad i = 1, 2, \ldots, n.$$
Sample mean random variables have the obvious relationship $\bar{Y} = \mu + \bar{\epsilon}$.
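The shift relationship can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of mu, sigma and n, and the seed, are illustrative assumptions and do not come from the text.

```python
import random
import statistics

random.seed(305)  # arbitrary seed so the sketch is reproducible

# Illustrative values (assumptions, not taken from the text)
mu, sigma, n = 10.0, 2.0, 25

# Draw epsilon_1, ..., epsilon_n from N(0, sigma^2)
eps = [random.gauss(0.0, sigma) for _ in range(n)]

# Adding mu to every draw gives Y_1, ..., Y_n from N(mu, sigma^2)
y = [mu + e for e in eps]

y_bar = statistics.fmean(y)
eps_bar = statistics.fmean(eps)

# The sample means satisfy Y-bar = mu + epsilon-bar, up to rounding error
print(abs(y_bar - (mu + eps_bar)) < 1e-9)
```

The check holds for any sample, not just on average, because it is an algebraic identity rather than a statistical approximation.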
In Chapter 7 we deal with the theoretical situation wherein there are $r \ge 2$ Normal populations
with means $\mu_1, \mu_2, \ldots, \mu_r$ and common variance $\sigma^2$. We are interested in estimating
relationships among the means $\mu_i$ based on independent random samples from the populations.
Let $n_i$, $i = 1, 2, \ldots, r$, denote the sample sizes. Sampling random variables are $Y_{ij}$, $i = 1, 2, \ldots, r$;
$j = 1, 2, \ldots, n_i$. We assume samples are to be taken independently from each population. The
Normal $N(\mu_i, \sigma^2)$ sampling random variables can all be related to independent $N(0, \sigma^2)$ random
variables as
$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, n_i \tag{1}$$
where
$$Y_{ij} \sim N(\mu_i, \sigma^2) \quad \text{and} \quad \epsilon_{ij} \sim N(0, \sigma^2).$$
All the $Y$'s are independent, hence so are all the $\epsilon$'s. The relationship (1) is a mathematical one
which describes the nature of the sampling scheme we envision. The overall scheme,
summarized by (1), is called the One-Way Normal Model. Expressing all the $Y$ random
variables, from the various $N(\mu_i, \sigma^2)$ distributions, in terms of $N(0, \sigma^2)$ random variables is
quite useful as we will see in what follows.
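To make the sampling scheme in (1) concrete, here is a minimal simulation sketch. The means, sample sizes, common standard deviation and the helper name simulate_one_way are hypothetical choices made for illustration only.

```python
import random

random.seed(7)  # arbitrary seed for reproducibility

# Hypothetical parameters (assumptions for illustration)
mus = [5.0, 8.0, 6.5]   # mu_1, ..., mu_r with r = 3
sizes = [4, 6, 5]       # n_1, ..., n_r
sigma = 1.5             # common standard deviation

def simulate_one_way(mus, sizes, sigma):
    """Return samples[i][j] = mu_i + eps_ij, with independent eps_ij ~ N(0, sigma^2)."""
    return [
        [mu_i + random.gauss(0.0, sigma) for _ in range(n_i)]
        for mu_i, n_i in zip(mus, sizes)
    ]

samples = simulate_one_way(mus, sizes, sigma)
for i, group in enumerate(samples, start=1):
    print(f"population {i}: n_i = {len(group)}, sample = {[round(v, 2) for v in group]}")
```

Every observation is built from its own independent N(0, sigma^2) draw, so the only thing that differs between populations is the shift mu_i.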
When the sample is taken, realizations of the $Y$ sampling random variables are $y_{ij}$, $i = 1, 2, \ldots, r$;
$j = 1, 2, \ldots, n_i$. For each sampled population, the sample mean is
$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$$
and it is the estimate of $\mu_i$. Define residuals $e_{ij}$ as
$$e_{ij} = y_{ij} - \bar{y}_i, \quad i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, n_i.$$
These residuals are viewed as approximate realizations of independent $N(0, \sigma^2)$ random
variables (i.e. a random sample from $N(0, \sigma^2)$). The $e_{ij} / s_i$, where $s_i$ is the standard
deviation of the $i$th sample, are called standardized residuals.
They are viewed as approximate $N(0, 1)$ sample values. Residuals are used to check the validity
of the normality and equal variance assumptions as follows.
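As a concrete illustration of these definitions, the sketch below computes sample means, residuals and standardized residuals for a small made-up data set. The data values and function names are hypothetical, and s_i is assumed to be the usual sample standard deviation (divisor n_i - 1) of the ith sample.

```python
import statistics

# Hypothetical data: r = 3 samples with n_i = 4, 3, 5
samples = [
    [12.1, 11.4, 13.0, 12.5],
    [15.2, 14.8, 16.1],
    [9.9, 10.4, 10.1, 11.0, 9.6],
]

def residuals(samples):
    """Return e_ij = y_ij - ybar_i for every sample i."""
    out = []
    for group in samples:
        ybar_i = statistics.fmean(group)
        out.append([y - ybar_i for y in group])
    return out

def standardized_residuals(samples):
    """Return e_ij / s_i, assuming s_i is the sample standard deviation of sample i."""
    out = []
    for group in samples:
        ybar_i = statistics.fmean(group)
        s_i = statistics.stdev(group)  # divisor n_i - 1
        out.append([(y - ybar_i) / s_i for y in group])
    return out

e = residuals(samples)
z = standardized_residuals(samples)
for i, (e_i, z_i) in enumerate(zip(e, z), start=1):
    print(i, [round(v, 3) for v in e_i], [round(v, 3) for v in z_i])
```

Within each sample the residuals sum to zero by construction, which is one reason they are only approximate realizations of independent random variables.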
In practical applications we have $r$ sampled populations, but we probably don't know whether
they are approximately Normal populations having equal variance. We can make $r$ different
Normal Quantile Plots and look for linearity in every one and approximately equal slopes of the
linear traces. The problem with this is that the $n_i$ are often quite small so the plots may be hard to
interpret. An alternative is to make one Normal Quantile Plot of the residuals or the standardized
residuals. The number of points in this plot is $n = \sum_{i=1}^{r} n_i$. If the sampled populations are
approximately Normal then this plot should be fairly linear. If any of the sampled populations
are not nearly Normal, this should disturb the linearity of the Normal plot of residuals. Thus the
Normal Quantile Plot of residuals is the preferred method for checking the assumption of Normal
populations when one wishes to use the One-Way Normal Model. Figure 7.6, page 451, is an
example.
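The coordinates of a pooled Normal Quantile Plot can be sketched as follows. The data are made up, the plotting positions (k - 0.5)/n are one common convention assumed here (JMP may use a different one), and the correlation between the two coordinates is offered only as a rough numerical companion to the visual check of linearity, not as a method from the text.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical data: r = 3 samples
samples = [
    [12.1, 11.4, 13.0, 12.5],
    [15.2, 14.8, 16.1],
    [9.9, 10.4, 10.1, 11.0, 9.6],
]

# Pool the residuals e_ij = y_ij - ybar_i from all samples and sort them
pooled = []
for group in samples:
    ybar_i = statistics.fmean(group)
    pooled.extend(y - ybar_i for y in group)
pooled.sort()
n = len(pooled)  # n = n_1 + ... + n_r

# Theoretical N(0, 1) quantiles at the assumed plotting positions (k - 0.5) / n
theoretical = [NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points (theoretical[k], pooled[k]) are what the plot would display
r_corr = pearson(theoretical, pooled)
print(f"n = {n}, correlation of plot coordinates = {r_corr:.3f}")
```

A correlation close to 1 corresponds to a nearly linear plot, but with small n the plot itself remains the more informative diagnostic.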
The other assumption in the model is that all populations have the same variance $\sigma^2$. A
graphical method for checking this is a scatterplot of the $n$ ordered pairs $(\bar{y}_i, e_{ij})$, $i = 1, 2, \ldots, r$;
$j = 1, 2, \ldots, n_i$. The scatter should be spread over roughly the same vertical interval at each value
$\bar{y}_i$. Unequal variances across populations will likely show in this plot as somewhat different
spreads over the $\bar{y}_i$ points on the horizontal axis. Figure 7.5, page 451, is an example. An
alternative is to plot residuals against observation number, assuming the observations are in
contiguous subsets grouped by sampled population. Then one looks across each cluster of
plotted residuals to see if the vertical spread of the various clusters is roughly the same.
Additionally, look at the traces of the Normal Quantile Plots for each sample to see if they are
approximately parallel.
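The points for the residual scatterplot, together with a numerical summary of the vertical spread at each sample mean, can be sketched as below. The data are hypothetical, and the summary (minimum, maximum and standard deviation per sample) is an illustrative supplement to the graphical check rather than a procedure from the text.

```python
import statistics

# Hypothetical data: r = 3 samples
samples = [
    [12.1, 11.4, 13.0, 12.5],
    [15.2, 14.8, 16.1],
    [9.9, 10.4, 10.1, 11.0, 9.6],
]

pairs = []    # the (ybar_i, e_ij) points of the scatterplot
spreads = []  # one (ybar_i, min e_ij, max e_ij, s_i) summary per sample
for group in samples:
    ybar_i = statistics.fmean(group)
    res = [y - ybar_i for y in group]
    pairs.extend((ybar_i, e) for e in res)
    # stdev of the sample equals stdev of its residuals (a shift does not change spread)
    spreads.append((ybar_i, min(res), max(res), statistics.stdev(group)))

for ybar_i, lo, hi, s_i in spreads:
    print(f"ybar = {ybar_i:6.2f}  residual range = [{lo:6.2f}, {hi:5.2f}]  s = {s_i:.3f}")
```

Roughly similar ranges and standard deviations across the rows are consistent with the equal variance assumption; markedly different ones suggest it deserves a closer look.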
Using JMP, the data table is formed with one column having all sample values, grouped
sequentially by sampled population. A companion column is defined as a categorical variable
(Nominal or Ordinal modeling type) column. Categories are populations, and a different variable
value is used in rows corresponding to different populations. The FIT Y BY X platform is used
with Y identified with the column of sample values and X the column of category identifications.
Residuals can be saved in a new column of the data table by specifying
SAVE → SAVE CENTERED
or standardized as
SAVE → SAVE STANDARDIZED.
Then a Normal Quantile Plot of residuals can be made using
ANALYZE → DISTRIBUTION → NORMAL QUANTILE PLOT
for the residuals variable. A plot of residuals grouped by sampled population can be made using
GRAPH → OVERLAY PLOT
and specifying the residual column in the data table as the Y variable and the column of
category identifications as the X variable.