Lab 10 instructions

Stat 301 – Lab 10
Goals: In this lab, we will see how to:
construct an F test for multiple parameters
fit models to indicator variables
fit models to factor variables with automatically created indicator variables
estimate cell means and marginal means for 2 way factorial designs
Constructing an F test for multiple parameters:
You can get an F test for any set of parameters by requesting it from JMP. The advantage is that JMP reports the p-value as well as the F statistic. We will use the tamsales1.txt data set as our example. We will compare the linear model
E SalePrice = β0 + β1 Land + β2 Improve
to the quadratic model
E SalePrice = β0 + β1 Land + β2 Improve + β3 Land² + β4 Improve² + β5 Land*Improve.
We need to test whether β3 = 0, β4 = 0, and β5 = 0.
Use Analyze / Fit Model to fit the quadratic model. Then click the red triangle at the top of the report (by the Response title), select Estimates, then Custom Test.
The dialog allows you to specify each piece of the null hypothesis to be tested. The test we want has 3 pieces: 1) β3 = 0, 2)
β4 = 0, and 3) β5 = 0. Each piece is specified as an equation (so if you wanted to test β3 - 2 β4 + 3 β5 = 8, you could). For us,
each piece is 1 x a coefficient = 0, AND WE NEED 3 PIECES. Put a 1 in the column of numbers next to LAND*LAND, then
click Add Column. Put a 1 next to IMPROVE*IMPROVE in the second column (for the second piece of the null
hypothesis), and click Add Column. Put a 1 next to LAND*IMPROVE in the third column, then click Done. Before you
click Done, the Custom Test box should show three columns, each containing a single 1: next to LAND*LAND in the
first column, IMPROVE*IMPROVE in the second, and LAND*IMPROVE in the third.
We need three columns because the null hypothesis has three pieces. We only want one 1 in each column, because we
want to test that parameter = 0 for each piece.
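In matrix terms, each column of the Custom Test dialog is one row of a contrast matrix applied to the coefficient vector. A sketch of the hypothesis above, with the coefficients in the order of the quadratic model (the name L is notation introduced here for illustration, not a JMP label):

```latex
H_0:\; L\beta = 0, \qquad
L = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \qquad
\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5)^{\top}
```

Putting two 1's in the same column would instead give a single row such as (0, 0, 0, 1, 1, 0), which tests β3 + β4 = 0 as one piece with one numerator DF.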
JMP gives you information about each piece, then the F test output. The numerator DF is the number of pieces in your
hypothesis. It is a good idea to check that this is what you expected (here, 3). The F statistic and p-value are
labelled F Ratio and Prob > F. Here, the p-value is < 0.0001.
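The same F statistic can be obtained by fitting the linear and quadratic models separately and comparing their error sums of squares. The sketch below does this in plain Python, assuming made-up numbers (the lists land, improve, and price are hypothetical and are not the tamsales1.txt values) and two small helper functions, solve and fit_sse, written only for this illustration. It does not compute the p-value, which needs the F distribution from a statistics package.

```python
# Hypothetical data for illustration only (NOT the tamsales1.txt values).
land    = [1, 2, 3, 4, 5, 6, 2, 4, 6, 3, 5, 1]
improve = [2, 1, 4, 3, 6, 5, 5, 2, 1, 6, 3, 4]
price   = [14.3, 12.6, 29.4, 27.5, 48.6, 46.8, 29.1, 23.5, 22.4, 38.3, 31.2, 21.4]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_sse(X, y):
    """Error sum of squares from the least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))


# Design matrices: intercept plus the terms of each model.
reduced = [[1, l, i] for l, i in zip(land, improve)]
full = [[1, l, i, l * l, i * i, l * i] for l, i in zip(land, improve)]

sse_reduced = fit_sse(reduced, price)
sse_full = fit_sse(full, price)

num_df = len(full[0]) - len(reduced[0])  # number of pieces being tested
den_df = len(price) - len(full[0])       # error DF of the full model
f_ratio = ((sse_reduced - sse_full) / num_df) / (sse_full / den_df)

print("Numerator DF:", num_df)
print("Denominator DF:", den_df)
print("F Ratio:", round(f_ratio, 2))
```

The numerator DF printed here plays the same role as the numerator DF in the JMP output: it counts the coefficients set to 0 in the smaller model.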
Note: Question 1 in the self assessment concerns constructing F tests for multiple parameters.
We will use the text examples of maintenance bids (BIDMAINT.txt) and diesel engine performance (DIESEL.txt) for the
last parts of the lab, on analyzing qualitative variables.
Fit models to indicator variables for qualitative variables:
Load the BIDMAINT.txt data set. That data set has one qualitative factor, State, with 3 levels: Kansas, Kentucky, and
Texas. This data set already includes an X1 and X2 variable.
1. Look at the data set and compare the value in the State variable and the value in the X1 variable. X1 is an
indicator variable. That means it has the value of 1 for some level of State. X1 is an indicator for which state?
X2 is also an indicator variable. For which state?
2. To fit a regression, use Analyze / Fit Model and use X1 and X2 as the X variables. The parameter estimates are
the values reported in the text. The interpretation of these values is explained in the text.
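To see how 0/1 indicator variables relate to group means, here is a minimal Python sketch. The costs are made up, and the choice of which state each indicator marks is an assumption for illustration only; it is not necessarily the coding used for X1 and X2 in BIDMAINT.txt. With one qualitative factor, the least-squares intercept is the mean of the level that has no indicator, and each slope is the difference between a level's mean and that baseline mean.

```python
from statistics import mean

# Hypothetical costs (NOT the BIDMAINT.txt values), three bids per state.
state = ["Kansas"] * 3 + ["Kentucky"] * 3 + ["Texas"] * 3
cost = [10, 12, 14, 20, 22, 24, 30, 32, 34]

# Assumed coding for illustration: x1 marks Kansas, x2 marks Kentucky,
# so Texas (0 on both indicators) is the baseline level.
x1 = [1 if s == "Kansas" else 0 for s in state]
x2 = [1 if s == "Kentucky" else 0 for s in state]

group_mean = {lvl: mean(c for s, c in zip(state, cost) if s == lvl)
              for lvl in ("Kansas", "Kentucky", "Texas")}

# Least-squares estimates for the model  cost = b0 + b1*x1 + b2*x2
b0 = group_mean["Texas"]                         # baseline mean
b1 = group_mean["Kansas"] - group_mean["Texas"]  # Kansas minus baseline
b2 = group_mean["Kentucky"] - group_mean["Texas"]

# Each fitted value equals the mean of that observation's state.
fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

print("b0, b1, b2:", b0, b1, b2)
print("fitted:", fitted)
```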
Automatically create indicator variables for a qualitative variable:
Load the BIDMAINT.txt data set. Select Analyze / Fit Model
1. You will notice a set of red bars by State. That tells you that State is a qualitative variable (which JMP calls a
nominal variable).
2. Select Cost as the Y variable and State as the X variable, then run the model.
3. Some parts of the output are identical to the multiple regression output we’re familiar with:
Summary of Fit: familiar, Root Mean Square Error is a very useful number
Analysis of Variance: familiar. Compares the full model (3 means) to the “intercept only” model.
Parameter Estimates: these are “hidden”. To see them, click the sideways open triangle by “Parameter
Estimates”. You see three rows: Intercept, STATE[Kansas], and STATE[Kentucky]. The two “slopes” are for
indicator variables that JMP creates for you. These are slightly different from the 0/1 indicators described in the
text. I will briefly discuss these in lecture.
Residual by Predicted Plot: familiar
Some parts of the output are new:
Effect Tests: This is a comparison of the model that includes STATE to a model without. When there is only one
factor, the effect test for STATE is the same as the test in the Analysis of Variance box because both are
comparing the same pair of models.
Least Squares Means: When a model has a qualitative variable, the real interest is the means for each level of
the factor. These are calculated from the parameter estimates and reported in this table. For the models we’re
considering, the Least Squares Mean and Mean are the same quantity. The Standard Error is what you expect it
to be. You can also get a 95% confidence interval for each mean by clicking in the table, and selecting the Lower
95% and Upper 95% items.
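The numbers in these boxes can be reproduced by hand for a one-factor model. The following sketch uses made-up costs (not the BIDMAINT.txt values) to compute the group means, the Root Mean Square Error, the standard error of each mean (RMSE divided by the square root of the group size), and the F Ratio that compares the 3-means model to the intercept-only model.

```python
from math import sqrt
from statistics import mean

# Hypothetical costs (NOT the BIDMAINT.txt values).
groups = {
    "Kansas":   [10, 12, 14],
    "Kentucky": [20, 22, 24],
    "Texas":    [30, 32, 34],
}

n = sum(len(v) for v in groups.values())  # total observations
k = len(groups)                           # number of levels
grand = mean(y for v in groups.values() for y in v)
means = {g: mean(v) for g, v in groups.items()}

# Error SS: variation of observations around their own group mean.
sse = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
# Model SS: variation of group means around the grand mean.
ss_model = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())

mse = sse / (n - k)
rmse = sqrt(mse)                       # Root Mean Square Error
f_ratio = (ss_model / (k - 1)) / mse   # full model vs. intercept only
se_mean = {g: rmse / sqrt(len(v)) for g, v in groups.items()}

print("means:", means)
print("RMSE:", rmse, " F Ratio:", f_ratio)
print("Std Error of each mean:", se_mean)
```

A 95% confidence interval for each mean is the mean plus or minus a t quantile (with n - k error degrees of freedom) times this standard error.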
To see what indicator variables JMP creates for you (optional):
1. After fitting the ANOVA model (using Analyze / Fit Model), click the red triangle by Response Cost, select Save
Columns and Save Coding Table (very last item on the menu). A new data set window opens with one row for
each observation and the indicator variables that JMP creates for you. If you look at them, you see that JMP
uses 1 / 0 / -1 values. This coding changes the parameter estimates for each indicator variable (which is why
they’re hidden) but doesn’t change the SS associated with each factor or the least squares means for each level.
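A small Python sketch of the 1 / 0 / -1 coding follows. It assumes, as the parameter labels STATE[Kansas] and STATE[Kentucky] suggest, that the last level (Texas) is the one coded -1 on every column; the group means are made up and the function name effect_code is hypothetical. With this coding and one factor, the intercept is the average of the level means and each coefficient is a level's deviation from that average, so the level means themselves are unchanged.

```python
from statistics import mean

levels = ["Kansas", "Kentucky", "Texas"]


def effect_code(level):
    """Return the 1 / 0 / -1 codes for one level (last level is all -1)."""
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == other else 0 for other in levels[:-1]]


# Hypothetical level means (NOT the BIDMAINT.txt values).
level_mean = {"Kansas": 12, "Kentucky": 22, "Texas": 32}

# Parameter estimates under this coding for a one-factor model.
intercept = mean(level_mean.values())  # average of the level means
coef = [level_mean[lvl] - intercept for lvl in levels[:-1]]

# Rebuild each level mean from the intercept, coefficients, and codes.
rebuilt = {lvl: intercept + sum(c * x for c, x in zip(coef, effect_code(lvl)))
           for lvl in levels}

for lvl in levels:
    print(lvl, effect_code(lvl), rebuilt[lvl])
```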
Fitting a two-way factorial ANOVA model:
Load the DIESEL.txt data set. Select Analyze / Fit Model. Put FUEL and BRAND in the Construct Model Effects box. Then, select
both variables in the Select Columns box and click Cross to add FUEL*BRAND to the model. Then run the model.
The ANOVA table is divided into two pieces:
1. The Error information is in the Analysis of Variance box of output.
2. F tests for individual factors are in the Effect Tests box of output.
The other really useful pieces of information are in the right-hand columns. These are the Least Squares Means tables
for each factor. The one under FUEL contains the lsmeans for each fuel type, averaged over the two brands of engine.
The one under BRAND contains the lsmeans for each brand, averaged over the three types of fuel. These are sometimes
called marginal means because these means are usually put in the margin of the table of means.
Note: The numbers we will use are in the Least Sq Means column. These are the averages that I defined in lecture. The
values in the Mean column are another way to calculate an average.
The one under FUEL*BRAND contains the six cell means, i.e. for each combination of fuel and brand.
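The difference between the Least Sq Mean and Mean columns shows up when the cells have unequal numbers of observations. In the sketch below, the level labels F1, F2, F3 and B1, B2 and all response values are hypothetical (they are not the DIESEL.txt data). For the model with interaction, the marginal lsmean for a fuel is the simple average of its cell means, while the Mean column averages all the observations for that fuel.

```python
from statistics import mean

# Hypothetical, deliberately unbalanced data (NOT the DIESEL.txt values):
# key = (fuel, brand), value = responses observed in that cell.
cells = {
    ("F1", "B1"): [10, 12],
    ("F1", "B2"): [20],
    ("F2", "B1"): [14],
    ("F2", "B2"): [22, 24, 26],
    ("F3", "B1"): [16, 18],
    ("F3", "B2"): [28, 30],
}

fuels = sorted({fuel for fuel, _ in cells})
brands = sorted({brand for _, brand in cells})

# Cell means: one per combination of fuel and brand.
cell_mean = {key: mean(ys) for key, ys in cells.items()}

# Marginal lsmean: average the cell means over the levels of the other factor.
lsmean_fuel = {f: mean(cell_mean[(f, b)] for b in brands) for f in fuels}

# "Mean" column: average every observation at that fuel level.
raw_mean_fuel = {
    f: mean(y for (fuel, _), ys in cells.items() if fuel == f for y in ys)
    for f in fuels
}

for f in fuels:
    print(f, "Least Sq Mean:", lsmean_fuel[f], " Mean:", raw_mean_fuel[f])
```

With equal numbers of observations in every cell, the two columns agree.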
There is a shortcut to creating the interaction variable: Select the two variables in the Select Columns box, then click
Macros and select Full factorial from the menu. JMP will put both variables and their interaction into the model box.
Further things you can do with LSMEANS:
If you click the red triangle by the name of the factor (or by the name of any one factor if you have more than one), the menu
lists various things you can do with the LSMEANS. The most relevant is LSMeans Student’s t. You get a table
with a box for each pair of levels. The four numbers in each box, from top to bottom, are:
The estimated difference
The standard error of that difference
and the lower and upper values for the 95% confidence interval for the difference.
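The four numbers in each box are tied together by the usual t interval. As a sketch, writing d-hat for the estimated difference between two lsmeans (notation introduced here, not JMP's):

```latex
\hat{d} \;\pm\; t_{0.975,\ \mathrm{df}_{\mathrm{error}}} \cdot \mathrm{SE}(\hat{d})
```

Here df_error is the error degrees of freedom from the Analysis of Variance box, so the estimated difference always sits halfway between the lower and upper limits.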
Self Assessment Questions:
The data in PRACEXAM.txt are from a study of perceived helpfulness of two types of exam preparation (review or doing
a practice exam). Students are grouped by their class standing (low, medium or high) and the type of exam prep. they
used. The response variable is an 11 point rating scale for the perceived helpfulness of the exam prep (0 = not helpful at
all to 10 = extremely helpful). Start by fitting a two-way ANOVA model, with interaction, to these data.
1. Carry out the test of no interaction by testing whether both coefficients for the interaction equal 0. JMP will label
these PREP[PRACTICE]*STANDING[HI] and PREP[PRACTICE]*STANDING[LOW]. What are the F statistic and p-value?
2. Does average perceived helpfulness differ among the three levels of class standing?
3. Is the difference between the two types of preparation the same for all levels of class standing?
4. Estimate the average difference between practice and review, averaged over class standing.
5. What is a 95% confidence interval for the average difference between practice and review?
6. Is it appropriate to use the average difference (from Q4) to describe the difference seen in low standing students?
Answers:
1. F = 1.77 with p=0.17.
Note: These should be the same as reported automatically by JMP as the F test for PREP*STANDING.
If you got F= 3.35, p = 0.069, you put both 1’s in the same column, which tests whether the sum of the two interaction
coefficients = 0. If you check the Numerator DF, you’ll see that there is only one “piece” in this test; you want two.
2. No evidence of a difference (p = 0.118), looking at the F test for the STANDING factor
3. No evidence of an interaction (p = 0.174).
4. 1.29 (Practice higher than Review)
5. (0.62, 1.96)
6. Yes, because there is no evidence of an interaction.