Stat 401 A – Lab 12: Analyzing proportions

Goals of today’s lab:
Reading a contingency table in count form
Contingency table tests, proportions, and measures of association
Reading event data in individual form
Exact tests for general contingency tables (more than 2x2).
All instructions use the Vitamin C study from lecture.
Reading a contingency table in count form:
The data are in vitc.txt, which will require import / preview because fields are separated by spaces. That
data set has four rows, one for each combination of group (Vit C or Placebo) and cold (Yes or No). The n
variable is the number of people in that cell of the contingency table (cell = combination of group and
cold).
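If it helps to see what "count form" means outside of JMP, here is a minimal Python sketch. The text below only imitates vitc.txt: the variable names (trt, cold, n) and the four counts come from this lab, but the header line, the column order, and the exact group labels are assumptions.

```python
# Hypothetical stand-in for vitc.txt. The counts are the ones used in this lab;
# the header line, column order, and labels are assumptions about the real file.
VITC_TEXT = """\
trt cold n
Placebo No 76
Placebo Yes 335
VitC No 105
VitC Yes 302
"""


def read_count_form(text):
    """Turn space-separated count-form rows into table[group][outcome] = count."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    table = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        table.setdefault(row["trt"], {})[row["cold"]] = int(row["n"])
    return table


table = read_count_form(VITC_TEXT)
for group, counts in table.items():
    print(group, counts)
```

Each row of the file fills one cell of the table, and n is the number of people in that cell.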
The trt and cold variables should be red bar (nominal, i.e. categorical) variables. JMP sets this up
automatically here because both variables are stored as character strings. If one of the grouping
variables is numeric, you will need to change its modeling type to “nominal” (i.e. make it a red bar
variable) to get a contingency table analysis.
Contingency table tests and proportions:
Analyze / Fit Y by X. Put one variable into the Y box and the other into the X box. I prefer to put the
variable that is closer to being a “response” into the Y box and the variable that is closer to being a
“treatment” variable into the X box. That means X is trt and Y is cold for this study. Put the sample size
(n) into the Freq box.
Note: For many contingency table analyses, the Y and X roles are interchangeable. So, if you’re
struggling to identify treatment and response, don’t worry about it.
The second part of the output is the contingency table (counts in each cell) with various proportions. If
every count is 1, you forgot to put n in the Freq box, so JMP does not know that each row in the data
set represents n individuals.
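The proportions reported with each count are just the count divided by the grand total, the column total, or the row total. A rough Python sketch, using the four counts from this study (the function and key names are made up for illustration and are not JMP's labels):

```python
# Counts from the Vitamin C study; rows are groups, columns are cold outcomes.
table = {
    "Placebo": {"No": 76, "Yes": 335},
    "Vit C": {"No": 105, "Yes": 302},
}


def cell_summary(table):
    """For each cell: the count and its percent of the total, column, and row."""
    row_totals = {g: sum(c.values()) for g, c in table.items()}
    col_totals = {}
    for counts in table.values():
        for outcome, n in counts.items():
            col_totals[outcome] = col_totals.get(outcome, 0) + n
    grand = sum(row_totals.values())
    summary = {}
    for g, counts in table.items():
        for outcome, n in counts.items():
            summary[(g, outcome)] = {
                "count": n,
                "total_pct": 100 * n / grand,
                "col_pct": 100 * n / col_totals[outcome],
                "row_pct": 100 * n / row_totals[g],
            }
    return summary


summary = cell_summary(table)
for cell, values in summary.items():
    print(cell, {k: round(v, 2) for k, v in values.items()})
```

The row percent for the (Placebo, Yes) cell is 335 / 411, the 81.51% quoted in this lab.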
The output includes:
1. a mosaic plot which graphically compares the proportions, emphasizing the treatment groups
with more individuals. (I’m happy to explain this plot, but I’m not going to do it here).
2. The contingency table. The first number in each box is the count of individuals for that cell. The
second is that count divided by the total sample size (proportion of the total N). The third is the
count divided by the column total (proportion of the response totals). The fourth is the count
divided by the row total (proportion of the group totals). For these data, the counts and the row
% are the most interesting numbers. You should see 335 individuals got colds on the Placebo
treatment, which is 81.51% of the Placebo individuals.
3. A block of tests. The Chi-square test is labeled “Pearson”. For Fisher’s exact test, which we’ll
discuss in lecture, use the 2-tail number. If you prefer what some people call the “G test”, that
is the Likelihood Ratio test on the output.
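To see where the three tests come from, here is a hedged Python sketch for a 2x2 table. It uses the textbook formulas: the Pearson and likelihood ratio statistics compared to a chi-square with 1 df, and a two-tail Fisher p-value that sums the probabilities of all tables no more likely than the observed one. I am assuming, not claiming, that JMP uses these same conventions, so small differences from the JMP output are possible.

```python
import math

# 2x2 counts from the Vitamin C study: rows = (Placebo, Vit C), columns = (No, Yes).
counts = [[76, 335], [105, 302]]


def expected_counts(obs):
    """Expected counts under independence: row total * column total / N."""
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


def pearson_and_lr(obs):
    """Pearson chi-square and likelihood ratio (G) statistics."""
    exp = expected_counts(obs)
    pairs = [(o, e) for ro, re in zip(obs, exp) for o, e in zip(ro, re)]
    pearson = sum((o - e) ** 2 / e for o, e in pairs)
    lr = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    return pearson, lr


def chi2_pvalue_1df(stat):
    """Upper-tail area of a chi-square with 1 df (2x2 tables only)."""
    return math.erfc(math.sqrt(stat / 2))


def fisher_two_tail(obs):
    """Two-tail Fisher exact p-value with all margins held fixed."""
    (a, b), (c, d) = obs
    r1, c1, c2 = a + b, a + c, b + d
    denom = math.comb(c1 + c2, r1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return math.comb(c1, k) * math.comb(c2, r1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, r1 - c2), min(r1, c1)
    # Small relative tolerance so ties with the observed table are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


pearson, lr = pearson_and_lr(counts)
print("Pearson:", round(pearson, 3), "p =", round(chi2_pvalue_1df(pearson), 4))
print("Likelihood ratio:", round(lr, 3), "p =", round(chi2_pvalue_1df(lr), 4))
print("Fisher 2-tail p =", round(fisher_two_tail(counts), 4))
```

The erfc shortcut for the p-value only works with 1 df, so this sketch does not extend to larger tables without a general chi-square tail function.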
Odds ratio and relative risk:
Click the red triangle by Contingency Analysis of cold By trt to open its drop-down menu.
Select Odds Ratio to get the odds ratio and a confidence interval. Select Relative Risk to get relative risk
computations. Relative Risk will open a menu asking which computation you want. I generally choose
all combinations and then figure out which I want.
The output when both Odds Ratio and Relative Risk are selected is:
The odds ratio is calculated as the odds of the first column in the first row divided by the odds of the
first column in the second row, i.e. (76/335) / (105/302). If you want the odds ratio for the “other”
event, or for the second row relative to the first, take the reciprocal (by calculator). The confidence
interval is a 95% interval by default. You can change this by first selecting “Set α level” from the red
triangle menu. The confidence interval is calculated as discussed in class: by working on the log odds
scale and back transforming.
Note: if you want the “other” odds ratio, remember to also take the reciprocal of the CI endpoints. The
reciprocal of the upper endpoint becomes the new lower endpoint, and vice versa.
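Here is a small Python sketch of that calculation. It assumes the usual large-sample standard error for the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d). I believe that matches the "log odds scale" approach from class, but treat the match with JMP's exact numbers as an assumption.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio (a/b)/(c/d) with an interval built on the log odds scale."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # assumed large-sample SE
    # Back-transform the estimate, lower endpoint, and upper endpoint.
    return tuple(math.exp(log_or + k * z * se) for k in (0, -1, 1))


# Placebo: 76 No, 335 Yes; Vit C: 105 No, 302 Yes (odds of No, Placebo vs Vit C).
or_hat, lower, upper = odds_ratio_ci(76, 335, 105, 302)

# The "other" odds ratio: reciprocal estimate, reciprocal endpoints in swapped order.
inv = (1 / or_hat, 1 / upper, 1 / lower)

print("OR:", round(or_hat, 3), "CI:", (round(lower, 3), round(upper, 3)))
print("1/OR:", round(inv[0], 3), "CI:", (round(inv[1], 3), round(inv[2], 3)))
```

Note that the endpoints trade places when you take reciprocals, which is easy to forget with a calculator.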
The relative risks are computed for different choices of response (Yes or No) in both directions (Placebo
/ Vit C and Vit C / Placebo). JMP notation is the same as lecture notation: P( ) means probability of
(something). The vertical bar separates the response category from the group category. If the
responses and groups are backwards, i.e., JMP reports P(Placebo | No) and you want P(No | Placebo),
redo the analysis and switch what goes in the Y and X boxes.
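The four relative risks are ratios of conditional proportions, which you can check by hand. A minimal Python sketch with the counts from this study (the function names are mine, not JMP's):

```python
# Counts from the Vitamin C study; rows are groups, columns are cold outcomes.
counts = {
    "Placebo": {"No": 76, "Yes": 335},
    "Vit C": {"No": 105, "Yes": 302},
}


def conditional_prob(counts, response, group):
    """P(response | group): the cell count divided by the group (row) total."""
    return counts[group][response] / sum(counts[group].values())


def relative_risk(counts, response, numerator, denominator):
    """P(response | numerator group) / P(response | denominator group)."""
    return conditional_prob(counts, response, numerator) / conditional_prob(
        counts, response, denominator
    )


# All combinations: both responses, both directions.
for response in ("Yes", "No"):
    for num, den in (("Placebo", "Vit C"), ("Vit C", "Placebo")):
        rr = relative_risk(counts, response, num, den)
        print(f"P({response} | {num}) / P({response} | {den}) = {rr:.3f}")
```

The sketch conditions on the group, i.e. P(response | group). Swapping the roles of the two variables gives the "backwards" quantities such as P(Placebo | No).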
Confidence intervals for proportions in each group:
Analyze / Distribution. Put the response variable (cold) in the Y box, Sample size (n) in the Freq box, and
Grouping variable (trt) in the By box. The default output repeats information in the contingency table.
What is new is the ability to get a confidence interval for the proportion (of No or of Yes) in each group.
Click the red triangle by cold under Distributions trt=Placebo to open its menu.
Select Confidence Interval, and choose the desired coverage (e.g. 0.95). A box of Confidence
Intervals appears below the Frequencies output. The first line includes the CI for P(No | Placebo). The
second line includes the CI for P(Yes | Placebo). These are computed using a refinement of the large
sample approach described in class.
To get intervals for probabilities in the Vit C group, repeat the red triangle step, but this time use the
triangle by cold under Distributions trt=Vit.C.
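For comparison, here is a Python sketch of two intervals for P(Yes | Placebo) = 335/411: the plain large-sample (Wald) interval and the Wilson score interval. This lab does not say which refinement JMP applies, so using the Wilson interval here is my assumption. Compare it with your JMP output before relying on it.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Plain large-sample interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, level=0.95):
    """Wilson score interval, one common refinement of the Wald interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


# 335 of the 411 Placebo individuals got colds.
wald = wald_interval(335, 411)
wilson = wilson_interval(335, 411)
print("Wald:  ", tuple(round(v, 4) for v in wald))
print("Wilson:", tuple(round(v, 4) for v in wilson))
```

With a sample this large the two intervals are close. They differ more when n is small or the proportion is near 0 or 1.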
Reading event data in individual form:
Often, the data are one row per individual, with a column for the group and a column for the outcome.
The data in madeup.txt are of this form. JMP can easily calculate the counts. Once you have the
contingency table, everything above follows without change.
Read the madeup.txt file into JMP. Again, you probably need import / preview because there is one
space between the fields. Change both variables (a and b) to red bar. Remember, there are various
ways to do this. I find right-clicking the blue ramp next to the variable name in the Columns frame, then
changing “Continuous” to “Nominal” to be the quickest. You may get a third column with missing values
because the import is confused by spaces at the end of each line. Just ignore this column
throughout the instructions. You can delete that third column if you like, but it is not necessary.
Analyze / Fit Y by X, then put one variable into the X box and the other into the Y box. Click OK. Nothing
goes in the Freq box because each row is one observation.
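What JMP does behind the scenes is tally the rows into cell counts. A minimal Python sketch of that tally follows. The rows are invented for illustration and are not the contents of madeup.txt. The header line is also an assumption; only the column names a and b come from this lab.

```python
from collections import Counter

# Invented rows standing in for madeup.txt (columns a and b); not the real data.
# Some lines end in spaces on purpose, to mimic the trailing-space problem.
MADEUP_TEXT = "a b\n1 1 \n1 2\n2 1 \n1 1\n2 2\n2 2 \n"


def tally(text):
    """Count how many individuals fall in each (a, b) combination."""
    lines = text.strip().splitlines()[1:]  # skip the header line
    # split() ignores trailing spaces, so no phantom third column appears.
    # Values stay as strings, i.e. they are treated as nominal categories.
    return Counter(tuple(line.split()[:2]) for line in lines)


cell_counts = tally(MADEUP_TEXT)
for cell, n in sorted(cell_counts.items()):
    print(cell, n)
```

The result is the same count-form table as before, with one count per combination of a and b.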
Note: If the output is not labeled Contingency Analysis, you didn’t correctly set the modeling types (blue
ramp or red bar). Remember that we’ve also used Fit Y by X to fit regressions and to do t-tests. JMP
uses the modeling type to determine what analysis to do. That’s why it is important to make sure you
have red bars when you want a contingency table analysis.
Once you have set up the analysis, everything else is identical to when you analyzed summary data.
Exact analysis of general tables (more than 2 x 2):
JMP doesn’t seem to do this analysis. Sorry.