Stat 401B Lab 3 Overview

Overview
In this lab you will be introduced to using JMP for simple linear regression analysis. For this lab
you need to be sitting in front of a Windows PC that has JMP installed.
Warm-up Exercise
Simple Linear Regression (SLR) is available in two ways in JMP: Fit Y by X and Fit Model. Fit
Y by X is under Basic Stats on the JMP Starter. Fit Model is under Modeling on the JMP Starter.
Both are available under the Analyze pull-down menu. Multiple Regression and some advanced
SLR features are available only under Fit Model. This handout illustrates obtaining regression
output from Fit Y by X. JMP’s Fit Model analysis platform will be covered later.
In class we looked at the relationship between the annual global temperature (Y) and the CO2
concentration (X) for 20 years randomly selected from between 1943 and 2003. The most recent
year, 2007, is reserved to see whether the simple linear regression line can predict the annual
global temperature for that year.
Year    CO2 (ppmv), X    Global Temperature, Y
1965    320.03           13.85
1966    321.37           13.92
1968    323.05           13.91
1970    325.68           14.04
1971    326.32           13.90
1972    327.46           13.95
1973    329.68           14.18
1974    330.25           13.94
1977    333.90           14.16
1979    336.85           14.14
1981    339.93           14.40
1984    344.42           14.15
1986    347.15           14.19
1987    348.93           14.35
1989    352.91           14.26
1991    355.59           14.44
1993    357.04           14.19
1996    362.64           14.39
1999    368.31           14.46
2002    373.10           14.69
2007    382.43           .
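If you want to check the line that JMP's Fit Line will report, the least-squares estimates can be computed directly from the 20 complete rows listed above. The Python below is a minimal sketch, not JMP's own code: the helper `fit_line` and the variable names are hypothetical, and, as in the lab, 2007 is left out of the fit and only predicted.

```python
# Minimal sketch of ordinary least squares for simple linear regression.
# fit_line and the variable names are hypothetical, not part of JMP.

# The 20 complete rows from the table (1965-2002).
CO2 = [320.03, 321.37, 323.05, 325.68, 326.32, 327.46, 329.68,
       330.25, 333.90, 336.85, 339.93, 344.42, 347.15, 348.93,
       352.91, 355.59, 357.04, 362.64, 368.31, 373.10]
TEMP = [13.85, 13.92, 13.91, 14.04, 13.90, 13.95, 14.18,
        13.94, 14.16, 14.14, 14.40, 14.15, 14.19, 14.35,
        14.26, 14.44, 14.19, 14.39, 14.46, 14.69]
CO2_2007 = 382.43  # 2007 has no Temp value, so it is only predicted


def fit_line(x, y):
    """Return (b0, b1): the least-squares intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # slope = Sxy / Sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(CO2, TEMP)
pred_2007 = b0 + b1 * CO2_2007
print(f"Predicted Temp = {b0:.4f} + {b1:.6f} * CO2")
print(f"Predicted Temp for 2007 (CO2 = {CO2_2007}): {pred_2007:.3f}")
```

Refitting with one row removed, for example `fit_line(CO2[1:], TEMP[1:])` to drop 1965, mirrors JMP's exclude-a-row-and-refit comparison.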
Fit Y by X
1. Create a data worksheet with three columns. To do this, click the red triangle next to
Columns and select Add Multiple Columns. Name the columns Year, CO2 and Temp. Do not
enter a value for Temp in 2007 (arrow down past this cell; JMP automatically puts a period
in it to indicate a missing value).
2. Click Fit Y by X under Basic Stats on the JMP Starter, or select Analyze + Fit Y by X.
3. Select Temp as the Response (Y) and CO2 as the Factor (X), then click OK. This
produces a plot of the data.
4. Using the analysis pop-up menu (the small red triangle to the left of Bivariate Fit of
Temp by CO2), you can choose various fits and options, such as Fit Mean, Fit Line, and
Fit Polynomial. Choosing a fit produces a table of output below the plot and draws the
fitted line or curve on the plot. It also creates, just below the plot, a new pop-up menu
of options specific to that fit. Choosing options from this new menu either creates more
output or creates new columns in the worksheet (e.g., you can save the residuals from a
particular fit as a new column for further analysis). You can also fit a model, exclude a
row, and re-fit the same model to see the effect of the excluded point: JMP graphs both
fitted equations on the same scatter plot and shows both sets of output in the same
analysis window. JMP also provides many formatting and statistical options when you
right-click on displayed items, both graphical and numerical/text.
5. Select Fit Line from the analysis pop-up menu (red triangle icon). Note that the fitted
line now appears on the plot of the data. Explore the output displayed below the scatter
plot. Also notice the new pop-up menu below the plot (labeled Linear Fit); click it to see
your analysis options.
6. Identify the following items from the Linear Fit output:
• The equation of the fitted line. (Note: JMP does not indicate that the left-hand side is
really a predicted annual global temperature. You will need to add this when you write
the prediction equation.)
• $R^2$ = RSquare
• $s_{Y|X}$ = Root Mean Square Error (our estimate of the error standard deviation, $\sigma$)
• $n$, the number of observations used in the fit
• Test statistic and P-value for the test of $\beta_1 = 0$ vs. $\beta_1 \neq 0$
7. Right-click on the Parameter Estimates portion of the output. From the resulting
contextual pop-up menu, select Lower 95% from the Columns menu item. Repeat, this time
selecting Upper 95%. This adds two columns to this section of the output, giving 95%
confidence intervals for the intercept and the slope of the model
$Y = \beta_0 + \beta_1 X + \varepsilon$.
8. From the Linear Fit pop-up menu (just below the scatter plot), choose Confid Curves Fit
to plot 95% confidence bands for $\mu_{Y|X} = \beta_0 + \beta_1 X$. Now, select Confid Curves Indiv
to plot 95% prediction bands for individual values of $Y$.
9. In the same Linear Fit pop-up menu, Save Predicteds and Save Residuals each create a new
column in the JMP data table. Note that even though 2007 was not used in the analysis, JMP
will predict a value for this year. Plot Residuals adds a plot to the bottom of the output:
Residuals ($Y - \hat{Y}$) vs. CO2 ($X$). Choose all three of these menu commands and note the
new columns and new plot. (You could now use JMP's Analyze + Distribution platform to
analyze the residuals from this model, for example checking for Normality with JMP's
Normal Quantile Plot.)
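The quantities identified in steps 6 and 7 follow from standard SLR formulas, so they can be cross-checked outside JMP. Below is a minimal Python sketch under stated assumptions: it uses the same 20 complete rows, takes the 95% critical value $t_{0.975,18} \approx 2.1009$ from a t table rather than computing it, and obtains the P-value by integrating the t density with Simpson's rule (a numerical stand-in, not necessarily how JMP computes it). The helper names `t_pdf` and `two_sided_p` are hypothetical. The printed values should agree with JMP's Linear Fit output up to rounding.

```python
import math

# The 20 complete rows from the data table (2007 excluded).
CO2 = [320.03, 321.37, 323.05, 325.68, 326.32, 327.46, 329.68,
       330.25, 333.90, 336.85, 339.93, 344.42, 347.15, 348.93,
       352.91, 355.59, 357.04, 362.64, 368.31, 373.10]
TEMP = [13.85, 13.92, 13.91, 14.04, 13.90, 13.95, 14.18,
        13.94, 14.16, 14.14, 14.40, 14.15, 14.19, 14.35,
        14.26, 14.44, 14.19, 14.39, 14.46, 14.69]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * (integral of t_pdf over [0, |t|]), by Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


n = len(CO2)
df = n - 2                        # error degrees of freedom
x_bar, y_bar = sum(CO2) / n, sum(TEMP) / n
sxx = sum((x - x_bar) ** 2 for x in CO2)
syy = sum((y - y_bar) ** 2 for y in TEMP)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(CO2, TEMP))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Step 6: RSquare, Root Mean Square Error (s_{Y|X}), n, and the slope test.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(CO2, TEMP))
r_square = 1 - sse / syy
rmse = math.sqrt(sse / df)
se_b1 = rmse / math.sqrt(sxx)
se_b0 = rmse * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b1 = b1 / se_b1                 # tests beta_1 = 0 vs. beta_1 != 0
p_b1 = two_sided_p(t_b1, df)

# Step 7: Lower 95% / Upper 95% columns, estimate +/- t * SE.
T_CRIT = 2.100922                 # t_{0.975, 18} from a t table; valid for df = 18 only
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"n = {n}, RSquare = {r_square:.4f}, RMSE = {rmse:.4f}")
print(f"slope t = {t_b1:.2f}, P-value = {p_b1:.2e}")
print(f"95% CI intercept: ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
print(f"95% CI slope:     ({ci_b1[0]:.5f}, {ci_b1[1]:.5f})")
```

Hard-coding `T_CRIT` keeps the sketch within the standard library; with a different number of rows you would need the matching t-table value.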
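The bands from step 8 and the columns saved in step 9 can also be reproduced outside JMP. Assuming the same 20 complete rows and the t-table value $t_{0.975,18} \approx 2.1009$, the half-width of the confidence band for $\mu_{Y|X}$ at $X = x_0$ is $t \, s_{Y|X} \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, where $S_{xx} = \sum (x_i - \bar{x})^2$; the prediction band for an individual $Y$ adds 1 under the square root. The helper name `bands` is hypothetical, and this is an illustration rather than JMP's implementation.

```python
import math

# The 20 complete rows from the data table (2007 excluded).
CO2 = [320.03, 321.37, 323.05, 325.68, 326.32, 327.46, 329.68,
       330.25, 333.90, 336.85, 339.93, 344.42, 347.15, 348.93,
       352.91, 355.59, 357.04, 362.64, 368.31, 373.10]
TEMP = [13.85, 13.92, 13.91, 14.04, 13.90, 13.95, 14.18,
        13.94, 14.16, 14.14, 14.40, 14.15, 14.19, 14.35,
        14.26, 14.44, 14.19, 14.39, 14.46, 14.69]
CO2_2007 = 382.43                 # 2007 is not used in the fit
T_CRIT = 2.100922                 # t_{0.975, 18} from a t table; assumes n = 20

n = len(CO2)
x_bar, y_bar = sum(CO2) / n, sum(TEMP) / n
sxx = sum((x - x_bar) ** 2 for x in CO2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(CO2, TEMP)) / sxx
b0 = y_bar - b1 * x_bar

# Step 9: Save Predicteds and Save Residuals (residual = Y - Y_hat).
predicted = [b0 + b1 * x for x in CO2]
residuals = [y - y_hat for y, y_hat in zip(TEMP, predicted)]
rmse = math.sqrt(sum(e * e for e in residuals) / (n - 2))


def bands(x0):
    """Return (y_hat, confidence half-width, prediction half-width) at CO2 = x0."""
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    conf = T_CRIT * rmse * math.sqrt(leverage)       # like Confid Curves Fit
    indiv = T_CRIT * rmse * math.sqrt(1 + leverage)  # like Confid Curves Indiv
    return y_hat, conf, indiv


# A predicted value for 2007, the row whose Temp is missing.
y07, conf07, indiv07 = bands(CO2_2007)
print(f"2007 prediction: {y07:.3f}")
print(f"95% CI for mean Temp:    ({y07 - conf07:.3f}, {y07 + conf07:.3f})")
print(f"95% prediction interval: ({y07 - indiv07:.3f}, {y07 + indiv07:.3f})")
```

Both bands are narrowest at $\bar{x}$ and widen as $x_0$ moves away from it, so the 2007 intervals are relatively wide: its CO2 value lies beyond the largest value used in the fit.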