Lab 13 instructions

Lab 13 – Fall 2015
Goals: This lab demonstrates:
1) How to fit a logistic regression
2) How to reshape contingency table information into an analyzable form
3) How to set JMP defaults
4) How to get JMP help
The first part uses donner.csv, the Donner party data set. The second uses Vit C 2.csv, the Vitamin C
study data set in a new format.
Logistic regression:
Read the donner.csv data file into JMP. You will notice that both sex and status are categorical variables
(red bars). If you want the unadjusted analysis (just comparing sexes, no adjustment for age), use
Analyze / Fit Y by X to get the contingency table and Chi-square test of equal probability of death (or
equal probability of survival) for the two sexes. You can also get the odds ratio by clicking the red
triangle by Contingency Analysis and selecting Odds Ratio. The first column in the contingency table is
died, so the odds ratio is the odds of death in the first row (females) divided by the odds of death in
the second row (males). The female odds of death are (1/3) / (2/3) = 0.5, while the male odds are
(2/3) / (1/3) = 2. Hence, the odds ratio is 0.5 / 2 = 0.25. JMP automatically calculates the 95%
confidence interval for the odds ratio.
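If you want to check the odds ratio arithmetic outside JMP, here is a minimal Python sketch. It uses only the proportions quoted above (1/3 of females and 2/3 of males died); the function and variable names are my own and are not anything JMP produces.

```python
from fractions import Fraction


def odds(probability):
    """Convert a probability into odds: p / (1 - p)."""
    return probability / (1 - probability)


# Proportions that died, as quoted in the text above.
p_died_female = Fraction(1, 3)
p_died_male = Fraction(2, 3)

odds_female = odds(p_died_female)  # odds of death for females
odds_male = odds(p_died_male)      # odds of death for males

# Odds ratio: first row (female) odds divided by second row (male) odds.
odds_ratio = odds_female / odds_male

print("female odds:", float(odds_female))
print("male odds:", float(odds_male))
print("odds ratio:", float(odds_ratio))
```

Exact fractions are used so that the result is not affected by floating point rounding.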
To get the adjusted analysis (comparing sexes at the same age), we need to shift to the Analyze / Fit
Model platform. To match lecture results, we first need to create an indicator variable that is 1 for
female and 0 for male. Create a new column, which I’ll call Ifemale, and open the formula dialog.
Select Sex in the Table Columns box, Conditional in the Functions box, and choose Match (2nd in the list).
You should get a new menu with two items. Select Add Match Arguments from Data. You will (or
should) see three lines: “FEMALE” with a box labelled then clause, “MALE” with a box, and a line with
two blank boxes. Since we want the variable Ifemale to be 1 when the sex is FEMALE, put a 1 in the
then clause box to the right of FEMALE. Then click on the box to the right of MALE and put 0 into it. The
third line is for observations with a missing value. Leave that line blank. The dialog, when ready, should
look like:
Click OK, and you will see that the Ifemale column has values of 0 or 1, depending on the SEX value, as
desired.
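The logic of the Match formula can be sketched in Python as follows. This is only an illustration of what the formula does, not JMP code; the function name indicator_female is hypothetical, and returning None stands in for the blank third line (missing values stay missing).

```python
def indicator_female(sex):
    """Return 1 for FEMALE, 0 for MALE, and None (missing) otherwise."""
    match_table = {"FEMALE": 1, "MALE": 0}
    # dict.get returns None when the value is not in the table,
    # which mirrors leaving the third Match line blank.
    return match_table.get(sex)


# Apply the formula to a few example SEX values, including a missing one.
sex_values = ["MALE", "FEMALE", "FEMALE", None]
ifemale = [indicator_female(s) for s in sex_values]
print(ifemale)
```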
To actually fit the logistic regression: Analyze / Fit Model, make sure that the response variable is
categorical (red bar), then put that variable in the Y box. You will see the Personality change from
Standard Least Squares to Nominal Logistic. Put Age and Ifemale into the Model Effects box and click the
Run button.
From top to bottom, the default output includes:
Whole Model Test: Tests whether all slopes = 0 (equivalent of the model F test in multiple regression).
For the Donner party data, tests whether beta for age = 0 and beta for Ifemale = 0.
This block also includes the AICc and BIC statistics for the model.
It is followed by various measures that compare observations and predictions.
Lack of Fit: A comparison of the specified regression model to a model with a different probability for
each unique X value. The equivalent of the ANOVA lack of fit test for continuous responses. Stat 301
hasn’t discussed these.
Parameter Estimates: The estimates and standard errors for each parameter. Also tests of whether each
parameter = 0. These are called Wald tests and are the equivalent of a t-test.
Effect Likelihood Ratio Tests: Tests for each effect. These are likelihood ratio (drop in deviance) tests,
which are more reliable than the tests in the Parameter Estimates box. These are the equivalent of F
tests for each effect. Unlike Least Squares (where t tests and F tests of a single parameter have the
same p-value), p-values from the likelihood ratio test are not the same as the p-values from Wald tests.
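To see how a drop in deviance turns into a p-value, here is a rough Python sketch for a test of a single parameter (1 degree of freedom). It assumes the deviance is -2 times the log-likelihood and that the drop follows a chi-square distribution with 1 df under the null hypothesis. The two deviance values are made up for illustration; they are not the Donner party results.

```python
import math


def drop_in_deviance_test(deviance_reduced, deviance_full):
    """Likelihood ratio test for one parameter (1 degree of freedom).

    Returns the drop in deviance and its chi-square (1 df) p-value.
    """
    drop = deviance_reduced - deviance_full
    if drop < 0:
        raise ValueError("the full model cannot have a larger deviance")
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(drop / 2.0))
    return drop, p_value


# Hypothetical deviances: model without the effect, then model with it.
drop, p_value = drop_in_deviance_test(56.3, 51.3)
print("drop in deviance:", round(drop, 3))
print("p-value:", round(p_value, 4))
```

The Wald test uses the estimate divided by its standard error instead, so its p-value will generally differ from this one.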
JMP will calculate the odds ratios for each effect if you click the red triangle by Nominal Logistic Fit and
select Odds Ratios. You get two boxes of output: one for the odds ratio when each variable increases by
1 (Unit Odds Ratios) and the other that compares the odds at the largest value of X to the odds at the
smallest value of X (Range Odds Ratios). For both, you also get the 95% confidence interval. The value
labelled Reciprocal is 1 divided by the odds ratio, which is useful if JMP calculated the ratio “the wrong
way round”.
The odds of a female surviving are 4.9 times that of a male of the same age.
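The relationship between a logistic regression coefficient and these odds ratios can be sketched in Python. The only number taken from this lab is the odds ratio of 4.9 (which is rounded); the coefficient is back-calculated from it, and the variable and function names are my own.

```python
import math

# Odds ratio quoted above (female vs. male survival, same age); rounded value.
odds_ratio_female_vs_male = 4.9

# A unit odds ratio is exp(beta), so the implied coefficient is log(odds ratio).
beta_female = math.log(odds_ratio_female_vs_male)

# The reciprocal is the same comparison the other way round (male vs. female).
reciprocal = 1 / odds_ratio_female_vs_male


def range_odds_ratio(beta, x_min, x_max):
    """Odds at the largest X divided by odds at the smallest X."""
    return math.exp(beta * (x_max - x_min))


print("implied coefficient:", round(beta_female, 3))
print("reciprocal odds ratio:", round(reciprocal, 3))
# For a 0/1 indicator the range is 1, so the range and unit odds ratios agree.
print("range odds ratio:", round(range_odds_ratio(beta_female, 0, 1), 3))
```

For a continuous variable such as Age, the range odds ratio uses the difference between the largest and smallest observed ages, so it can be very different from the unit odds ratio.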
Reshape contingency table information:
Lab 12 showed how to analyze contingency table data that came as one row for each cell in the table,
with the count for that cell (the Vit C 1 data) and data that came as one row for each subject (the Vit C 3
data). You may also get data formatted as a table, i.e.:
where the columns are the group and each of the possible responses. Each value is the number of
observations. To analyze these data, we need to reshape the table into a form like Vit C 1, i.e.:
To do this:
click on the data window
select Tables / Stack. We will combine the two columns (no cold and cold) of numbers into one
column of numbers and one column of labels (cold or no cold).
select the names of the data columns and click Stack Columns
type in names for the new output table (you will get a new data window), the column that will contain
the data values and the column that will contain the labels
columns not involved in the stacking (e.g. treatment) will be duplicated as needed
here’s what my dialog box looks like just before doing the stacking:
click OK
The result is a new data window with four rows of data with the counts for each cell in the contingency
table. The order of the groups (Placebo first or Vit C first) depends on the order in the original data
set. That order is irrelevant for the analysis.
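What Tables / Stack does can be sketched in plain Python. This is an illustration of the reshaping only, not JMP code. The counts below are made up (they are not the Vitamin C study values), and the names stack, count and outcome are hypothetical choices for the new columns.

```python
def stack(rows, stack_columns, data_name, label_name):
    """Stack several count columns into one data column and one label column."""
    stacked = []
    for row in rows:
        # Columns not involved in the stacking are duplicated as needed.
        kept = {k: v for k, v in row.items() if k not in stack_columns}
        for column in stack_columns:
            new_row = dict(kept)
            new_row[label_name] = column      # which column the value came from
            new_row[data_name] = row[column]  # the count itself
            stacked.append(new_row)
    return stacked


# Table-format data: one row per group, one column per response.
# These counts are invented for the example.
wide = [
    {"treatment": "Placebo", "no cold": 10, "cold": 30},
    {"treatment": "Vit C", "no cold": 15, "cold": 25},
]

long_rows = stack(wide, ["no cold", "cold"], data_name="count", label_name="outcome")
for r in long_rows:
    print(r)
```

Two groups times two responses gives four rows, one for each cell of the contingency table.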
Set JMP defaults:
You have probably noticed that JMP has many default behaviours. For example, Fit Model centers
polynomials by default when you include square or product terms in the model. If you don’t want this,
you can turn this off each time or you can change the default so that “Center Polynomials” is not
selected. To change defaults, select File / Preferences from the main menu in any JMP window. There
are a huge number of options, organized by categories. All the statistical options are in the Platforms
dialog. That opens a long list of analyses as a menu of Platforms. The ones that we have worked with
are:
Distribution: to change default plots produced by Analyze / Distribution
Distribution Summary Statistic: to change default numerical summaries produced by Analyze /
Distribution
Overlay Plot: to change default characteristics of the Graph / Overlay Plot graphs
Scatterplot Matrix: to change default characteristics of the scatterplot matrix (Analyze / Multivariate
Methods / Multivariate)
Fit Least Squares: to change default output for Fit Y by X and Fit Model when Y is continuous
Contingency: to change default options for Fit Y by X and Fit Model when Y is discrete
Help for JMP: This is available in various places.
The 301 lab pages will be kept available until the next time I teach 301. They are then revised for each
year.
The Help menu in JMP provides various resources:
Help Contents / Search / Index: typical help file information that can be searched for specific things
Books: electronic versions of the books that were (long ago) the printed documentation for JMP.
These are up-to-date for the current version of JMP.
The books that are most relevant for what we’ve covered are:
Basic Analysis: for Fit Distribution and Fit Y by X.
Fitting Linear Models: for multiple regr., logistic regr. and ANOVA (the Fit Model stuff)