Statistics 305
CHAPTER 7.1 – THE ONE-WAY NORMAL MODEL

To begin, imagine a Normal population having mean zero and variance $\sigma^2$, i.e. $N(0, \sigma^2)$. Obviously, if the same number $\mu$ is added to every element of this population, the result is a Normal population having mean $\mu$ and variance $\sigma^2$, i.e. $N(\mu, \sigma^2)$. Let $\varepsilon$ denote a random variable defined on the first population, i.e. $\varepsilon \sim N(0, \sigma^2)$, and let $Y$ denote a random variable defined on the second population, i.e. $Y \sim N(\mu, \sigma^2)$. The relationship between these random variables is described mathematically as $Y = \mu + \varepsilon$. Sampling random variables $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ for a random sample of size $n$ from $N(0, \sigma^2)$ lead to sampling random variables $Y_1, Y_2, \ldots, Y_n$ from $N(\mu, \sigma^2)$, where $Y_i = \mu + \varepsilon_i$, $i = 1, 2, \ldots, n$. The sample mean random variables then have the obvious relationship $\bar{Y} = \mu + \bar{\varepsilon}$, since averaging both sides of $Y_i = \mu + \varepsilon_i$ over $i$ leaves the constant $\mu$ unchanged.

In Chapter 7 we deal with the theoretical situation in which there are $r \ge 2$ Normal populations with means $\mu_1, \mu_2, \ldots, \mu_r$ and common variance $\sigma^2$. We are interested in estimating relationships among the means $\mu_i$ based on independent random samples from the populations. Let $n_i$, $i = 1, 2, \ldots, r$, denote the sample sizes. The sampling random variables are $Y_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, n_i$. We assume the samples are taken independently from each population. The Normal $N(\mu_i, \sigma^2)$ sampling random variables can all be related to independent $N(0, \sigma^2)$ random variables as

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, n_i, \tag{1}$$

where $Y_{ij} \sim N(\mu_i, \sigma^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$. All the $Y$'s are independent, hence so are all the $\varepsilon$'s. Relationship (1) is a mathematical one that describes the nature of the sampling scheme we envision. The overall scheme, summarized by (1), is called the One-Way Normal Model. Expressing all the $Y$ random variables, from the various $N(\mu_i, \sigma^2)$ distributions, in terms of $N(0, \sigma^2)$ random variables is quite useful, as we will see in what follows.
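The sampling scheme in (1) can be mimicked directly by simulation: draw the $\varepsilon_{ij}$ from $N(0, \sigma^2)$ and shift each sample by its $\mu_i$. The Python sketch below does this with the standard library only. The function name `simulate_one_way`, the means, sample sizes, $\sigma = 2$ and the random seed are illustrative assumptions, not part of these notes. It also checks numerically that each sample mean satisfies $\bar{Y}_i = \mu_i + \bar{\varepsilon}_i$.

```python
import random
from statistics import fmean

def simulate_one_way(mus, sizes, sigma, seed=None):
    """Draw Y_ij = mu_i + eps_ij with eps_ij ~ N(0, sigma^2), all independent.

    Hypothetical helper. Returns (ys, eps): lists of r samples, the i-th of
    length sizes[i].
    """
    if len(mus) != len(sizes):
        raise ValueError("need one sample size per population mean")
    rng = random.Random(seed)
    ys, eps = [], []
    for mu, n in zip(mus, sizes):
        e = [rng.gauss(0.0, sigma) for _ in range(n)]  # random sample from N(0, sigma^2)
        eps.append(e)
        ys.append([mu + x for x in e])                # add mu_i to every value
    return ys, eps

# Example: r = 3 populations, unequal sample sizes, common sigma = 2
mus = [10.0, 12.0, 15.0]
ys, eps = simulate_one_way(mus, [4, 6, 5], sigma=2.0, seed=1)
for i, (y, e) in enumerate(zip(ys, eps), start=1):
    # Sample mean of Y_i versus mu_i + mean of eps_i; they agree up to rounding
    print(i, round(fmean(y), 3), round(mus[i - 1] + fmean(e), 3))
```

Because every $Y_{ij}$ is built from its own $\varepsilon_{ij}$, the independence of the $\varepsilon$'s carries over to the $Y$'s, exactly as stated for model (1).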
When the samples are taken, the realizations of the $Y$ sampling random variables are $y_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, n_i$. For each sampled population, the sample mean is

$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij},$$

and it is the estimate of $\mu_i$. Define residuals $e_{ij}$ as

$$e_{ij} = y_{ij} - \bar{y}_i, \qquad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, n_i.$$

These residuals are viewed as approximate realizations of independent $N(0, \sigma^2)$ random variables (i.e. a random sample from $N(0, \sigma^2)$). The ratios $e_{ij}/s_i$, where $s_i$ is the standard deviation of the $i$th sample, are called standardized residuals; they are viewed as approximate $N(0, 1)$ sample values. Residuals are used to check the validity of the normality and equal-variance assumptions as follows.

In practical applications we have $r$ sampled populations, but we probably don't know whether they are approximately Normal populations having equal variance. We can make $r$ different Normal Quantile Plots and look for linearity in every one and approximately equal slopes of the linear traces. The problem with this is that the $n_i$ are often quite small, so the plots may be hard to interpret. An alternative is to make one Normal Quantile Plot of the residuals or the standardized residuals. The number of points in this plot is $n = \sum_{i=1}^{r} n_i$. If the sampled populations are approximately Normal, then this plot should be fairly linear. If any of the sampled populations is not nearly Normal, this should disturb the linearity of the Normal plot of residuals. Thus the Normal Quantile Plot of residuals is the preferred method for checking the assumption of Normal populations when one wishes to use the One-Way Normal Model. Figure 7.6, page 451, is an example.

The other assumption in the model is that all populations have the same variance $\sigma^2$. A graphical method for checking this is a scatterplot of the $n$ ordered pairs $(\bar{y}_i, e_{ij})$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, n_i$. The scatter should be spread over roughly the same vertical interval above each value $\bar{y}_i$.
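The computations behind the pooled residual plot are short enough to write out. The Python sketch below (standard library only) computes $\bar{y}_i$, the residuals $e_{ij}$, the standardized residuals $e_{ij}/s_i$, and the coordinates of one Normal Quantile Plot of all $n$ residuals. The function names and the data are hypothetical, $s_i$ is taken to be the usual sample standard deviation of the $i$th sample, and the plotting positions $(k - 0.5)/n$ are one common convention that may differ from what JMP uses.

```python
from statistics import NormalDist, fmean, stdev

def residuals(samples):
    """Return (ybar, e): ybar[i] is the i-th sample mean, e[i][j] = y_ij - ybar[i]."""
    ybar = [fmean(y) for y in samples]
    e = [[v - m for v in y] for y, m in zip(samples, ybar)]
    return ybar, e

def standardized_residuals(samples):
    """Return e_ij / s_i, taking s_i as the standard deviation of the i-th sample (n_i >= 2)."""
    _, e = residuals(samples)
    return [[v / stdev(y) for v in ei] for y, ei in zip(samples, e)]

def normal_quantile_points(values):
    """Return (Normal score, ordered value) pairs for a single Normal Quantile Plot.

    Plotting positions (k - 0.5) / n are one common convention; JMP and other
    packages may use a slightly different formula.
    """
    n = len(values)
    z = NormalDist()  # standard Normal, N(0, 1)
    return [(z.inv_cdf((k - 0.5) / n), v)
            for k, v in enumerate(sorted(values), start=1)]

# Hypothetical data: r = 3 small samples of sizes 4, 5 and 3
samples = [[9.1, 11.4, 10.2, 8.7],
           [12.5, 11.0, 13.8, 12.1, 10.9],
           [15.2, 14.1, 16.8]]
ybar, e = residuals(samples)
pooled = [v for ei in e for v in ei]      # all n = sum of n_i residuals together
points = normal_quantile_points(pooled)   # plot these pairs; look for a straight line
print(len(points))  # 12, i.e. n = 4 + 5 + 3
```

Plotting the pairs in `points` gives the single pooled plot described above; passing the flattened output of `standardized_residuals(samples)` instead gives the standardized version. Note that the residuals within each sample sum to zero, which is one reason they are only approximately independent.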
Unequal variances across populations will likely show in this plot as noticeably different spreads above the $\bar{y}_i$ points on the horizontal axis. Figure 7.5, page 451, is an example. An alternative is to plot the residuals against observation number, assuming the observations are arranged in contiguous subsets grouped by sampled population. One then looks across the clusters of plotted residuals to see whether their vertical spreads are roughly the same. Additionally, look at the traces of the Normal Quantile Plots for the individual samples to see whether they are approximately parallel.

Using JMP, the data table is formed with one column holding all sample values, grouped sequentially by sampled population. A companion column is defined as a categorical variable (Nominal or Ordinal modeling type). The categories are the populations, and a different value of this variable is used in the rows corresponding to each population. The FIT Y BY X platform is used, with Y identified with the column of sample values and X with the column of category identifications. Residuals can be saved in a new column of the data table by specifying SAVE → SAVE CENTERED, or standardized residuals by specifying SAVE → SAVE STANDARDIZED. A Normal Quantile Plot of the residuals can then be made using ANALYZE → DISTRIBUTION → NORMAL QUANTILE PLOT for the residuals variable. A plot of residuals grouped by sampled population can be made using GRAPH → OVERLAY PLOT, specifying the residual column as the Y variable and the column of category identifications as the X variable.
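Outside JMP, the data for the two equal-variance plots, $(\bar{y}_i, e_{ij})$ and residual versus observation number, can be assembled directly. The Python sketch below also prints the smallest residual, largest residual and standard deviation of each cluster as a rough numerical companion to "similar vertical spread". The function names and data are hypothetical, and this is not a JMP feature; judging whether the spreads are roughly the same is still left to the reader, as with the plots.

```python
from statistics import fmean, stdev

def residual_plot_data(samples):
    """Return the (ybar_i, e_ij) scatter pairs and the (observation number, e_ij) pairs.

    Observations are numbered 1..n consecutively, with each sample kept as a
    contiguous block, as the observation-number plot assumes.
    """
    scatter, by_obs = [], []
    k = 0
    for y in samples:
        m = fmean(y)
        for v in y:
            k += 1
            scatter.append((m, v - m))
            by_obs.append((k, v - m))
    return scatter, by_obs

def residual_spreads(samples):
    """Vertical spread of each cluster of residuals: (min, max, standard deviation)."""
    out = []
    for y in samples:
        m = fmean(y)
        e = [v - m for v in y]
        out.append((min(e), max(e), stdev(e)))
    return out

# Hypothetical data: r = 3 samples stored contiguously, population by population
samples = [[9.1, 11.4, 10.2, 8.7],
           [12.5, 11.0, 13.8, 12.1, 10.9],
           [15.2, 14.1, 16.8]]
scatter, by_obs = residual_plot_data(samples)
for i, (lo, hi, sd) in enumerate(residual_spreads(samples), start=1):
    print(f"sample {i}: residuals from {lo:.2f} to {hi:.2f}, s = {sd:.2f}")
```

Plotting `scatter` reproduces the $(\bar{y}_i, e_{ij})$ scatterplot, and plotting `by_obs` reproduces the residual-versus-observation-number plot, with one cluster per sampled population.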