Stat 401 A – Lab 12: Analyzing proportions

Goals of today’s lab:
- Reading a contingency table in count form
- Contingency table tests, proportions, and measures of association
- Reading event data in individual form
- Exact tests for general contingency tables (more than 2 x 2)

All instructions use the Vitamin C study from lecture.

Reading a contingency table in count form:
The data are in vitc.txt, which will require import / preview because fields are separated by spaces. That data set has four rows, one for each combination of group (Vit C or Placebo) and cold (Yes or No). The n variable is the number of people in that cell of the contingency table (cell = combination of group and cold).

The treatment (trt) and cold variables should be red bar (categorical) variables. They will be red bars here because the values stored for both variables are character strings. If one of the grouping variables is numeric, you will need to change its modeling type to “Nominal” (i.e., make it a red bar variable) to get a contingency table analysis.

Contingency table tests and proportions:
Analyze / Fit Y by X. Put one variable into the Y box and the other into the X box. I prefer to put the variable that is closer to being a “response” into the Y box and the variable that is closer to being a “treatment” into the X box. For this study, that means X is trt and Y is cold. Put the sample size (n) into the Freq box.

Note: For many contingency table analyses, the Y and X roles are interchangeable. So if you’re struggling to identify treatment and response, don’t worry about it.
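For checking JMP’s arithmetic by hand, here is a minimal Python sketch of what the Freq box does: each count-form row stands for n individuals, so its n is added to that cell. It then computes the four numbers JMP prints in each cell (count, total %, column %, row %) as proportions. The counts (76, 335, 105, 302) are the Vitamin C counts used in this lab; the label spellings “Placebo”, “Vit.C”, “No”, and “Yes” are assumptions about vitc.txt, and the function names are hypothetical.

```python
from collections import defaultdict

# Count-form data in the shape of vitc.txt: one row per cell, n = people in it.
# Counts are from this lab; the exact label spellings are assumptions.
rows = [
    ("Placebo", "No", 76),
    ("Placebo", "Yes", 335),
    ("Vit.C", "No", 105),
    ("Vit.C", "Yes", 302),
]

def build_table(rows):
    """Add each row's n into its (group, response) cell, as the Freq box does."""
    table = defaultdict(int)
    for group, response, n in rows:
        table[(group, response)] += n
    return table

def cell_summaries(table):
    """Return the four numbers JMP prints in each cell, as proportions:
    (count, count / N, count / column total, count / row total)."""
    groups = sorted({g for g, _ in table})
    responses = sorted({r for _, r in table})
    total = sum(table.values())
    row_tot = {g: sum(table[(g, r)] for r in responses) for g in groups}
    col_tot = {r: sum(table[(g, r)] for g in groups) for r in responses}
    return {
        (g, r): (table[(g, r)], table[(g, r)] / total,
                 table[(g, r)] / col_tot[r], table[(g, r)] / row_tot[g])
        for g in groups for r in responses
    }

summ = cell_summaries(build_table(rows))
for (g, r), (count, p_total, p_col, p_row) in summ.items():
    print(f"{g:8} {r:4} {count:4d} {p_total:.4f} {p_col:.4f} {p_row:.4f}")
```

For the Placebo/Yes cell the last number is 0.8151, the 81.51% row proportion JMP reports.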
The second part of the JMP output is the contingency table (counts in each cell) with various proportions. If every cell count is 1, you forgot to tell JMP that each row in the data set represents n individuals (put n in the Freq box).

The output includes:
1. A mosaic plot, which graphically compares the proportions, emphasizing the treatment groups with more individuals. (I’m happy to explain this plot, but I’m not going to do it here.)
2. The contingency table. The first number in each box is the count of individuals in that cell. The second is that count divided by the total sample size (proportion of the total N). The third is the count divided by the column total (proportion of the response total). The fourth is the count divided by the row total (proportion of the group total). For these data, the counts and the row % are the most interesting numbers. You should see that 335 individuals got colds on the Placebo treatment, which is 81.51% of the Placebo individuals.
3. A block of tests. The chi-square test is labeled “Pearson”. The Fisher’s exact test we’ll discuss in lecture is the 2-tail number. If you prefer what some people call the “G test”, that is the Likelihood Ratio test in the output.

Odds ratio and relative risk:
Click the red triangle by Contingency Analysis of cold By trt. From the drop-down menu, select Odds Ratio to get the odds ratio and a confidence interval. Select Relative Risk to get relative risk computations. Relative Risk will open a menu asking which computation you want. I generally choose all combinations and then figure out which I want.

The odds ratio is calculated as the odds of the first column in the first row divided by the odds of the first column in the second row, i.e., (76/335) / (105/302). If you want the odds of the “other” event, or the odds for the second row divided by the odds for the first, calculate (by calculator) the reciprocal. The confidence interval is a 95% interval, by default.
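The tests and measures of association above can be reproduced outside JMP. The Python sketch below assumes the standard textbook formulas: Pearson and likelihood-ratio statistics built from expected counts (row total × column total / N), a 1-df chi-square p-value, the odds ratio in the (76/335) / (105/302) orientation with a Wald interval computed on the log-odds scale and back-transformed, and one relative risk. Function names are hypothetical, and JMP’s printed values could differ slightly if it uses other variants.

```python
import math

# 2x2 table in JMP's default layout: rows = trt (Placebo, Vit.C),
# columns = cold (No, Yes). Counts are the ones quoted in this lab.
a, b = 76, 335   # Placebo: No, Yes
c, d = 105, 302  # Vit.C:   No, Yes

def expected(a, b, c, d):
    """Expected counts under independence: row total * column total / N."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]

def pearson_chi2(a, b, c, d):
    """Pearson statistic: sum of (O - E)^2 / E over the four cells."""
    return sum((o - e) ** 2 / e for o, e in zip([a, b, c, d], expected(a, b, c, d)))

def likelihood_ratio_g2(a, b, c, d):
    """Likelihood-ratio ("G") statistic: 2 * sum of O * ln(O / E); needs no zero cells."""
    return 2 * sum(o * math.log(o / e) for o, e in zip([a, b, c, d], expected(a, b, c, d)))

def p_value_1df(stat):
    """Upper-tail chi-square probability with 1 df: P(X > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2))

def odds_ratio_ci(a, b, c, d, z=1.96):
    """(odds of column 1 in row 1) / (odds of column 1 in row 2), with a Wald
    interval computed for log(OR) and back-transformed with exp()."""
    or_hat = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (or_hat,
            math.exp(math.log(or_hat) - z * se_log),
            math.exp(math.log(or_hat) + z * se_log))

def relative_risk(a, b, c, d):
    """P(column 2 | row 1) / P(column 2 | row 2), e.g. P(Yes | Placebo) / P(Yes | Vit.C)."""
    return (b / (a + b)) / (d / (c + d))

chi2 = pearson_chi2(a, b, c, d)
g2 = likelihood_ratio_g2(a, b, c, d)
or_hat, lo, hi = odds_ratio_ci(a, b, c, d)
rr = relative_risk(a, b, c, d)
print(f"Pearson X2 = {chi2:.3f} (p = {p_value_1df(chi2):.4f}), G2 = {g2:.3f}")
print(f"OR = {or_hat:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
print(f"Other direction: OR = {1 / or_hat:.4f}, CI ({1 / hi:.4f}, {1 / lo:.4f})")
print(f"RR of Yes, Placebo / Vit.C = {rr:.4f}")
```

The sketch hard-codes z = 1.96, which gives 95% coverage, JMP’s default.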
To change the 95% default coverage, first select “Set α level” from the red triangle menu. The confidence interval is calculated as discussed in class: by working on the log odds scale and back-transforming. Note: if you want the “other” odds ratio, remember to also take the reciprocals of the ci endpoints; the reciprocal of the upper endpoint becomes the new lower endpoint, and vice versa.

The relative risks are computed for different choices of response (Yes or No) in both directions (Placebo / Vit C and Vit C / Placebo). JMP notation is the same as lecture notation: P( ) means the probability of (something), and the vertical bar separates the response category from the group category. If the responses and groups are backwards, i.e., JMP reports P(Placebo | No) and you want P(No | Placebo), redo the analysis and switch what goes in the Y and X boxes.

Confidence intervals for proportions in each group:
Analyze / Distribution. Put the response variable (cold) in the Y box, the sample size (n) in the Freq box, and the grouping variable (trt) in the By box. The default output repeats information in the contingency table. What is new is the ability to get a confidence interval for the proportion (of No or of Yes) in each group. Click the red triangle by cold under Distributions trt=Placebo, select Confidence Interval from the menu, and choose the desired coverage (e.g., 0.95). You get a box of Confidence Intervals below the Frequencies output. The first line includes the ci for P(No | Placebo); the second line includes the ci for P(Yes | Placebo). These are computed using a refinement of the large-sample approach described in class.

To get intervals for probabilities in the Vit C group, repeat “Click the red triangle by cold”, but this time use the triangle under Distributions trt=Vit.C.

Reading event data in individual form:
Often, the data are one row per individual, with a column for the group and a column for the outcome. The data in madeup.txt are of this form. JMP can easily calculate the counts.
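The tallying step is simple enough to sketch. The Python below assumes integer codes in both columns (the instructions have you switch a and b from Continuous to Nominal, which suggests numeric values); the records are invented for illustration and are not the contents of madeup.txt. It counts each (a, b) pair and rewrites the tallies as count-form rows, the same shape as vitc.txt.

```python
from collections import Counter

# HYPOTHETICAL individual-form data: one (a, b) pair per person. These values
# are invented; madeup.txt's real contents are not reproduced here.
records = [(1, 0), (1, 1), (2, 1), (1, 1), (2, 0), (2, 0)]

def tally(records):
    """Count the individuals in each (a, b) cell of the contingency table."""
    return Counter(records)

def to_count_form(counts):
    """Turn the tallies into count-form rows (a, b, n), like vitc.txt."""
    return [(a, b, n) for (a, b), n in sorted(counts.items())]

rows = to_count_form(tally(records))
for row in rows:
    print(*row)
```

Each row’s n plays the role of the Freq variable, exactly as n does for vitc.txt.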
Once you have the contingency table, everything above follows without change.

Read the madeup.txt file into JMP. Again, you probably need import / preview because there is one space between the fields. Change both variables (a and b) to red bar. Remember, there are various ways to do this. I find right-clicking the blue ramp next to the variable name in the Columns frame, then changing “Continuous” to “Nominal”, to be the quickest. You may get a third column of missing values because the import gets confused by spaces at the end of each line. Just ignore this column throughout the instructions; you may delete it, but that is not necessary.

Analyze / Fit Y by X, then put one variable into the X box and the other into the Y box. Click OK. Nothing goes in the Freq box because each row is one observation.

Note: If the output is not labeled Contingency Analysis, you didn’t correctly set the modeling types (blue ramp or red bar). Remember that we’ve also used Fit Y by X to fit regressions and to do t-tests. JMP uses the modeling types to determine which analysis to do. That’s why it is important to make sure you have red bars when you want a contingency table analysis.

Once you have set up the analysis, everything else is identical to when you analyzed summary data.

Exact analysis of general tables (more than 2 x 2):
JMP doesn’t seem to do this analysis. Sorry.
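If you need an exact test for a larger table, one option outside JMP is brute-force enumeration, sketched here in Python as an illustration rather than a recommended tool. It implements the Freeman-Halton extension of Fisher’s exact test: list every table with the observed row and column totals, compute each table’s probability under independence (multivariate hypergeometric), and add up the probabilities of tables no more likely than the observed one. The number of tables grows very quickly, so this is only practical for small counts; the function names and example tables are hypothetical.

```python
import math

def log_prob(table, row_tot, col_tot, n):
    """Log probability of a table given fixed margins (multivariate hypergeometric):
    prod(row_i!) * prod(col_j!) / (n! * prod(cell_ij!))."""
    lp = sum(math.lgamma(t + 1) for t in row_tot + col_tot) - math.lgamma(n + 1)
    return lp - sum(math.lgamma(x + 1) for row in table for x in row)

def compositions(total, caps):
    """Yield lists x with sum(x) == total and 0 <= x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in compositions(total - x, caps[1:]):
            yield [x] + rest

def tables_with_margins(row_tot, col_tot):
    """Yield every table (list of rows) with the given row and column totals."""
    def fill(i, cols_left):
        if i == len(row_tot) - 1:
            yield [list(cols_left)]  # last row is forced by the column totals
            return
        for row in compositions(row_tot[i], cols_left):
            rest = [c - x for c, x in zip(cols_left, row)]
            for tail in fill(i + 1, rest):
                yield [row] + tail
    yield from fill(0, list(col_tot))

def exact_test(observed):
    """Sum the probabilities of all tables with the observed margins that are
    no more likely than the observed table."""
    row_tot = [sum(r) for r in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    lp_obs = log_prob(observed, row_tot, col_tot, n)
    p = 0.0
    for t in tables_with_margins(row_tot, col_tot):
        lp = log_prob(t, row_tot, col_tot, n)
        if lp <= lp_obs + 1e-7:  # small tolerance so ties are not lost to rounding
            p += math.exp(lp)
    return min(p, 1.0)

# Hypothetical tables, for illustration only.
print(exact_test([[3, 1], [1, 3]]))        # 2 x 2: 34/70, about 0.4857
print(exact_test([[2, 0, 3], [1, 4, 0]]))  # 2 x 3
```

For a 2 x 2 table this criterion gives one common definition of the two-sided Fisher’s exact p-value; whether JMP’s 2-tail number uses exactly this definition is an assumption worth checking.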
