Stat 401B Lab 2

Overview
In this lab you will examine residuals using JMP. A residual is the difference
between an observed value and a predicted value. The predicted value, and thus the
residual, depends on the statistical model for the type of problem you have.
Computer Exercises
The objective for these exercises is to show you how to get residuals using JMP.
1. We will go back to the length of stay in hospital data. The JMP file is available
from the course web page. Recall that this file contains a sample of 40 lengths of
stay in hospital for newborns. For this one-sample problem the model and
definition of a residual are:
Model: $y_i = \mu + \varepsilon_i$
Residual: $y_i - \bar{y}$
a. To create residuals for the one-sample problem, subtract the sample mean
from each length of stay value. To do this:
• Go to the data table and add a new column: click on the red triangle next to
Columns, select New Column, and name the column Residual.
• Highlight the Residual column in the data table and go to the Cols pull-down
menu. Select Formula. A formula window will open. Enter the formula using
the following mouse clicks:
[Days] – Statistical + Col Mean[Days]
The resulting formula is Days minus the column mean of Days.
b. Use Analyze + Distribution to analyze the residuals. You should include a
histogram, box plot, and Normal Quantile Plot. You may also wish to use
Fit Distribution + Normal to superimpose a normal curve on the
histogram.
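To check the JMP output by hand, the computation in Exercise 1 can be sketched in
Python using only the standard library. The `days` values below are hypothetical
placeholders, not the course data, and the names `residuals`, `normal_quantiles`,
and `qq_pairs` are illustrative. The sketch subtracts the sample mean from each
value, then pairs the sorted residuals with standard normal quantiles, which is the
information a Normal Quantile Plot displays. The plotting positions (i - 0.5)/n are
one common convention and are an assumption here; JMP may use a different one.

```python
from statistics import NormalDist, mean

# Hypothetical lengths of stay (days); replace with the Days column from the JMP file.
days = [2, 3, 3, 4, 2, 5, 3, 7, 2, 4]

# One-sample residual: observed value minus the sample mean.
y_bar = mean(days)
residuals = [y - y_bar for y in days]

# Coordinates for a normal quantile plot: sorted residuals paired with
# standard normal quantiles at plotting positions (i - 0.5) / n (assumed convention).
n = len(residuals)
normal_quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_pairs = list(zip(normal_quantiles, sorted(residuals)))

for z, r in qq_pairs:
    print(f"{z:7.3f}  {r:7.3f}")
```

If the residuals are roughly normal, the printed pairs fall close to a straight line
when plotted. The residuals always sum to zero, which is a quick check that the
mean was subtracted correctly.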
2. We will now go back to the contents of cans of cola data. The JMP file is
available from the course web page. Recall that this file contains the contents
(ml) for 12 cans of regular and 12 cans of diet cola. For this two-sample problem
the model and definition of a residual are:
Model: $y_{1i} = \mu_1 + \varepsilon_{1i}$, $y_{2i} = \mu_2 + \varepsilon_{2i}$
Residual: $y_{1i} - \bar{y}_1$, $y_{2i} - \bar{y}_2$
a. JMP can automatically calculate residuals for you from the Fit Y by X
platform. Go to Analyze + Fit Y by X and enter Weight as the Y,
Response and Type as the X, Factor. Click on OK. Go to the red triangle
pull down in the output window and select Save + Residual. This will
create a new column in your data table labeled Weight centered by Type.
These are the residuals for the two-sample problem.
b. Use Analyze + Distribution to analyze the residuals, Weight centered by
Type. You should have a histogram, box plot, and Normal Quantile Plot.
You can use Fit Distribution + Normal, to superimpose a normal curve on
the histogram. Note that even though you have two samples of Weights
(Diet and Regular) you analyze a combined (single) set of residuals.
c. For the two-sample model there is also the condition of equal standard
deviations for the random errors. You can check this condition both
graphically and numerically.
• Graphically: Use Fit Y by X and cast the residuals (Weight
centered by Type) as the Y, Response and Type as the X, Factor.
You should see side-by-side dot plots of the residuals. JMP
should automatically add a horizontal line at zero. Look at the
spread for each Type of cola.
• Numerically: Use Analyze + Distribution and cast Weight as the
Y, Columns and Type as the By variable. This will calculate the
sample standard deviations for each Type of cola. You can
compare these two sample standard deviations.
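The two-sample steps in Exercise 2 can be sketched the same way in Python with the
standard library. The values in `weights` are hypothetical placeholders, not the
course data, and the names `residuals`, `combined`, and `group_sd` are illustrative.
The sketch centers each observation on the mean of its own Type, pools the
residuals into a single set, and computes the sample standard deviation for each
Type.

```python
from statistics import mean, stdev

# Hypothetical can contents; replace with the Weight and Type columns from the JMP file.
weights = {
    "Regular": [355.2, 354.8, 356.1, 355.5, 354.9, 355.7],
    "Diet": [354.1, 355.0, 353.8, 354.6, 354.3, 354.9],
}

# Two-sample residual: each observation minus the mean of its own group.
residuals = {
    cola: [y - mean(values) for y in values]
    for cola, values in weights.items()
}

# Combined (single) set of residuals, as analyzed in step b.
combined = [r for group in residuals.values() for r in group]

# Sample standard deviation for each Type, as compared in step c.
group_sd = {cola: stdev(values) for cola, values in weights.items()}

for cola, sd in group_sd.items():
    print(f"{cola}: sample SD = {sd:.3f}")
```

Centering shifts each group by a constant, so the standard deviation of each
group's residuals equals the standard deviation of its raw values. This is why the
numerical check in step c can use Weight directly.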