Stat 401B Lab 2

Overview

In this lab you will look at residuals using JMP. A residual is the difference between an observed value and a predicted value. The predicted value, and thus the residual, depends on the type of problem you have (the statistical model).

Computer Exercises

The objective of these exercises is to show you how to get residuals using JMP.

1. We will go back to the length of stay in hospital data. The JMP file is available from the course web page. Recall that this file contains a sample of 40 lengths of stay in hospital for newborns. For this one-sample problem the model and the definition of a residual are:

Model: $y_i = \mu + \epsilon_i$
Residual: $y_i - \bar{y}$

a. To create residuals for the one-sample problem, subtract the sample mean from each length of stay value. To do this:
• Go to the data table and add a new column: click on the red triangle next to Columns, select New Column, and name the column Residual.
• Highlight the Residual column in the data table, go to the Cols pull-down menu, and select Formula. A formula window will open. Enter the formula [Days] – Col Mean[Days] with the following mouse clicks: click Days, click the minus sign, choose Statistical + Col Mean, then click Days again. The result is each length of stay minus the sample mean.

b. Use Analyze + Distribution to analyze the residuals. You should include a histogram, box plot, and Normal Quantile Plot. You may also wish to use Fit Distribution + Normal to superimpose a normal curve on the histogram.

2. We will now go back to the contents of cans of cola data. The JMP file is available from the course web page. Recall that this file contains the contents (ml) of 12 cans of regular and 12 cans of diet cola. For this two-sample problem the model and the definition of a residual are:

Model: $y_{1i} = \mu_1 + \epsilon_{1i}$ and $y_{2i} = \mu_2 + \epsilon_{2i}$
Residual: $y_{1i} - \bar{y}_1$ and $y_{2i} - \bar{y}_2$

a. JMP can automatically calculate residuals for you from the Fit Y by X platform. Go to Analyze + Fit Y by X and enter Weight as the Y, Response and Type as the X, Factor. Click on OK.
Go to the red triangle pull-down in the output window and select Save + Residual. This will create a new column in your data table labeled Weight centered by Type. These are the residuals for the two-sample problem: each Weight minus the sample mean for its own Type.

b. Use Analyze + Distribution to analyze the residuals, Weight centered by Type. You should have a histogram, box plot, and Normal Quantile Plot. You can use Fit Distribution + Normal to superimpose a normal curve on the histogram. Note that even though you have two samples of Weights (Diet and Regular), you analyze a single combined set of residuals.

c. For the two-sample model there is also the condition that the random errors have equal standard deviations. You can check this condition both graphically and numerically.
• Graphically: Use Fit Y by X and cast the residuals (Weight centered by Type) as the Y, Response and Type as the X, Factor. You should see side-by-side dot plots of the residuals, and JMP should automatically add a horizontal line at zero. Compare the spread of the residuals for each Type of cola.
• Numerically: Use Analyze + Distribution and cast Weight as the Y, Columns and Type as the By variable. This calculates the sample standard deviation for each Type of cola, and you can compare the two. Using Weight instead of the residuals gives the same answer, because subtracting a group's mean shifts its values without changing their standard deviation.
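To check the arithmetic behind exercise 1 outside JMP, here is a minimal Python sketch of the one-sample residual $y_i - \bar{y}$, the same quantity the Col Mean formula in step 1a produces. The function name `one_sample_residuals` and the five lengths of stay are hypothetical illustration values, not the 40 values in the course file.

```python
from statistics import mean


def one_sample_residuals(values):
    """Residual y_i - ybar for the one-sample model y_i = mu + eps_i."""
    ybar = mean(values)  # sample mean, like Col Mean[Days] in JMP
    return [y - ybar for y in values]


# Hypothetical lengths of stay (days); NOT the data in the course JMP file.
days = [2.0, 3.0, 3.0, 4.0, 8.0]

resid = one_sample_residuals(days)
print(resid)
print(sum(resid))  # residuals about the sample mean sum to zero
```

Because the same sample mean is subtracted from every value, the residuals always sum to zero (up to rounding), which is a quick way to confirm a residual column was built correctly.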
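For exercise 2, each residual subtracts the mean of that can's own Type, $y_{ji} - \bar{y}_j$. The minimal Python sketch below computes these residuals as one combined list (as in step 2b) and the per-Type sample standard deviations (as in step 2c). The function names `two_sample_residuals` and `sd_by_group` and the six can contents are hypothetical, made up for illustration rather than taken from the course data.

```python
from statistics import mean, stdev


def two_sample_residuals(values, groups):
    """Residual y_ji - ybar_j: each value minus the mean of its own group."""
    group_means = {
        g: mean([y for y, label in zip(values, groups) if label == g])
        for g in set(groups)
    }
    return [y - group_means[g] for y, g in zip(values, groups)]


def sd_by_group(values, groups):
    """Sample standard deviation (n - 1 divisor) within each group."""
    return {
        g: stdev([y for y, label in zip(values, groups) if label == g])
        for g in set(groups)
    }


# Hypothetical contents (ml) for six cans; NOT the 24 cans in the course file.
weight = [355.0, 357.0, 356.0, 351.0, 353.0, 355.0]
cola_type = ["Regular", "Regular", "Regular", "Diet", "Diet", "Diet"]

# One combined list of residuals covering both samples.
resid = two_sample_residuals(weight, cola_type)
print(resid)

# Per-Type spread, computed from the raw values and from the residuals.
print(sd_by_group(weight, cola_type))
print(sd_by_group(resid, cola_type))
```

In this made-up example the Diet cans are twice as spread out as the Regular cans, which is the kind of difference the equal standard deviation check in step 2c is meant to reveal.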