Stat 401B Lab 2

Overview
In this lab you will look at residuals using JMP. A residual is the difference
between an observed value and a predicted value (observed minus predicted). The
predicted value, and thus the residual, is determined by the type of problem you
have (the statistical model).
Computer Exercises
The objective for these exercises is to show you how to get residuals using JMP.
1. We will go back to the fuel economy data. The JMP file is available from the
course web page. Recall that this file contains a sample of 36 vehicles and their
combined city and highway mpg. For this one-sample problem the model and
definition of a residual are:
Model: $y_i = \mu + \varepsilon_i$
Residual: $y_i - \bar{y}$
a. To create residuals for the one-sample problem you need to subtract the
value of the sample mean from each Average MPG value. To do this:
• Go to the data table and add a new column. To do this click on
the red triangle next to Columns. Select New Column. Name this
column Residual.
• Highlight the Residual column in the data table and go to the
Cols pull down menu. Select Formula. A formula window will
open. Enter a formula using the following mouse clicks.
[Average MPG] – Statistical + Col Mean [Average MPG]
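For anyone who wants to check the JMP column by hand, the same subtraction can be sketched in Python. This is only an illustration: the mpg values and the function name one_sample_residuals are made up for the example and are not the course data or part of JMP.

```python
from statistics import mean

# Hypothetical mpg values for illustration only; NOT the 36 vehicles in the course file.
average_mpg = [22.0, 25.5, 19.0, 30.5, 28.0]


def one_sample_residuals(values):
    """Return each observation minus the sample mean (y_i - y_bar)."""
    y_bar = mean(values)
    return [y - y_bar for y in values]


residuals = one_sample_residuals(average_mpg)
print(residuals)
```

Because each value has the sample mean subtracted from it, the residuals always sum to zero (up to rounding), which is a quick sanity check on the Residual column.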
b. Use Analyze + Distribution to analyze the residuals. You should include a
histogram, box plot and Normal Quantile Plot. You may also wish to use
Fit Distribution + Normal, to superimpose a normal curve on the
histogram.
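To see what a Normal Quantile Plot is built from, the sketch below pairs each sorted residual with a standard normal quantile. The plotting position (i − 0.5)/n is one common convention and is an assumption here; JMP may use a different one. The residuals and the function name normal_quantile_points are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(residuals):
    """Pair each sorted residual with a standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting position (i - 0.5) / n; other conventions exist.
    z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z_scores, ordered))


# Hypothetical residuals for illustration only.
points = normal_quantile_points([-3.0, 0.5, -6.0, 5.5, 3.0])
for z, r in points:
    print(f"{z:6.2f} {r:6.2f}")
```

If the residuals are roughly normal, these points fall close to a straight line when the residual is plotted against the normal quantile.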
2. We will now go back to the body mass index data for men and women. The JMP
file is available from the course web page. Recall that this file contains the Body
Mass Index (BMI) for 50 men and 50 women. For this two-sample problem the
model and definition of a residual are:
Model: $y_{1i} = \mu_1 + \varepsilon_{1i}$, $y_{2i} = \mu_2 + \varepsilon_{2i}$
Residual: $y_{1i} - \bar{y}_1$, $y_{2i} - \bar{y}_2$
a. JMP can automatically calculate residuals for you from the Fit Y by X
platform. Go to Analyze + Fit Y by X and enter BMI as the Y, Response
and Gender as the X, Factor. Click on OK. Go to the red triangle pull
down in the output window and select Save + Residual. This will create a
new column in your data table labeled BMI centered by Gender. These are
the residuals for the two-sample problem.
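The "centered by Gender" column amounts to subtracting each group's own sample mean from the values in that group. A minimal Python sketch of that idea follows; the BMI values and the function name centered_by_group are hypothetical and are not the course data or JMP's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (gender, BMI) rows for illustration only; NOT the course data.
rows = [
    ("Male", 26.0), ("Male", 24.0), ("Male", 28.0),
    ("Female", 22.0), ("Female", 25.0), ("Female", 25.0),
]


def centered_by_group(data):
    """Subtract each group's sample mean from the values in that group."""
    groups = defaultdict(list)
    for group, y in data:
        groups[group].append(y)
    group_means = {group: mean(ys) for group, ys in groups.items()}
    # Keep the original row order so residuals line up with the data table.
    return [(group, y - group_means[group]) for group, y in data]


bmi_residuals = centered_by_group(rows)
print(bmi_residuals)
```

The result is a single column of residuals, one per row, with the residuals within each gender summing to zero.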
b. Use Analyze + Distribution to analyze the residuals, BMI centered by
Gender. You should have a histogram, box plot, and Normal Quantile
Plot. You can use Fit Distribution + Normal, to superimpose a normal
curve on the histogram. Note that even though you have two samples of
BMI values (Male and Female) you analyze a combined (single) set of
residuals.
c. For the two-sample model there is also the condition of equal standard
deviations for the random errors. You can check this condition both
graphically and numerically.
• Graphically: Use Fit Y by X and cast the residuals (BMI
centered by Gender) as the Y, Response and Gender as the X,
Factor. You should see side-by-side dot plots of the residuals.
JMP should automatically add a horizontal line at zero. Look at
the spread for each Gender.
• Numerically: Go to the red pull down in the Fit Y by X output
and select Means and Std Dev. This will provide the standard
deviations for the residuals for Males and Females separately.
You can compare these two sample standard deviations. The
condition of equal population standard deviations is satisfied
provided one sample standard deviation is less than 3 times as
large as the other sample standard deviation.
Alternatively, use Analyze + Distribution and cast BMI as the Y,
Columns and Gender as the By variable. This will calculate the
sample standard deviations for each Gender. You can compare
the two sample standard deviations.
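The numerical check can be sketched in Python as a ratio of the larger sample standard deviation to the smaller one, compared with the factor of 3 stated above. The residual values and the function name sd_ratio are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import stdev


def sd_ratio(group_a, group_b):
    """Return the larger sample standard deviation divided by the smaller."""
    s_a, s_b = stdev(group_a), stdev(group_b)
    return max(s_a, s_b) / min(s_a, s_b)


# Hypothetical residuals for each gender; NOT the course data.
male_residuals = [0.0, -2.0, 2.0]
female_residuals = [-2.0, 1.0, 1.0]

ratio = sd_ratio(male_residuals, female_residuals)
# Rule from the lab text: one SD must be less than 3 times the other.
condition_met = ratio < 3
print(round(ratio, 3), condition_met)
```

Subtracting a group mean does not change the spread within a group, so the standard deviations of the residuals by Gender match the standard deviations of the original BMI values by Gender.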