Analysis of Categorical Data I. Dichotomous Observations

STA 100 Lecture 16
Analysis of Categorical Data

I. Dichotomous Observations

A Practical Problem: Estimate the 5-year survival rate of patients receiving a standard treatment for breast cancer.

a. Recall the binomial random variable.
b. An estimate for the population proportion $p$.
c. The Wilson-adjusted sample proportion.
d. Sampling distribution of the estimators.

II. Confidence Intervals for a Population Proportion

Example: Suppose that among 75 cancer patients, 67 survived 10 years after being diagnosed and treated for the cancer. Find a 90% confidence interval for the survival rate.

III. Sample Size for Estimation of a Population Proportion

A Practical Problem: We would like to estimate the proportion of patients visiting a local primary care clinic who will be diagnosed as depressed at their first visit. How many patients do we need to sample in order to estimate this proportion with 90% confidence and a margin of error of 2%?

Consider the large-sample confidence interval for $p$:

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

The margin of error is

$$M = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

which leads to

$$n = \left( \frac{z_{\alpha/2}}{M} \right)^2 \hat{p}(1 - \hat{p})$$

In practice $\hat{p}$ is not known before the sample is taken, but we may have a pilot estimate or guess, which we denote by $p^*$. Then

$$n = \left( \frac{z_{\alpha/2}}{M} \right)^2 p^*(1 - p^*)$$

Now, if we have no idea about $p$, we can use the largest possible value of the product:

$$\max_{p^*} \; p^*(1 - p^*) = 0.5(1 - 0.5) = 0.25$$

This leads to

$$n = 0.25 \left( \frac{z_{\alpha/2}}{M} \right)^2$$

Example: Diagnosis of depression.

IV. The Chi-Square Goodness-of-Fit Test

Categorical data analysis is used when the variable under study is classified into several categories.

A Practical Problem: The famous biologist and father of modern genetics, Gregor Mendel, crossed round yellow pea plants with wrinkled green pea plants, obtaining plants bearing peas in the following four categories:

Category           Frequency
Round Yellow          315
Round Green           108
Wrinkled Yellow       101
Wrinkled Green         32

According to his theory, the expected frequencies of these characteristics should be in the proportion 9:3:3:1.
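To see what the 9:3:3:1 theory predicts for this particular sample, each expected count is the total number of plants multiplied by that category's share of the ratio. The short Python sketch below is only an illustration; the variable names are hypothetical and not part of the lecture.

```python
# Observed pea counts from Mendel's table.
observed = {
    "Round Yellow": 315,
    "Round Green": 108,
    "Wrinkled Yellow": 101,
    "Wrinkled Green": 32,
}

# Theoretical 9:3:3:1 ratio, keyed by the same categories.
ratio = {
    "Round Yellow": 9,
    "Round Green": 3,
    "Wrinkled Yellow": 3,
    "Wrinkled Green": 1,
}

total = sum(observed.values())    # total number of plants
ratio_sum = sum(ratio.values())   # 9 + 3 + 3 + 1 = 16 parts

# Expected count = total * (category's share of the ratio).
expected = {cat: total * parts / ratio_sum for cat, parts in ratio.items()}

for cat in observed:
    print(f"{cat:16s} observed {observed[cat]:4d}   expected {expected[cat]:7.2f}")
```

With 556 plants in all, the expected counts come out to 312.75, 104.25, 104.25 and 34.75, which are the values of E used in the goodness-of-fit statistic.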
Do the observed data agree with the theory?

Let $O$ represent the observed and $E$ the expected frequencies, respectively. Then the chi-square test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The chi-square statistic has $df = \nu - 1$ degrees of freedom, where $\nu$ is the number of categories. We use Table 9 to find the critical values.

Example: Mendel's experiment

Category           Frequency
Round Yellow          315
Round Green           108
Wrinkled Yellow       101
Wrinkled Green         32

V. The 2×2 Contingency Tables

2×2 contingency tables arise in dealing with two binary categorical responses.

Practical Problems:

1. In a study of the side effects of a certain prescription drug, the following data were observed:

                Side Effects
Treatment    Present    Absent
Drug            15         35
Placebo          4         46

Is there a side effect due to the drug?

2. To study the relationship between smoking and cardiovascular disease (CVD), the following table was created for a random sample of size 100:

                 CVD
Smoking      No      Yes
No           50       15
Yes          10       25

Based on these data, is there an indication that smoking may cause CVD?

The chi-square test statistic for testing independence or homogeneity in 2×2 tables is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with $df = 1$ degree of freedom. Here, for each cell,

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

Example 1: Side Effects

                Side Effects
Treatment    Present    Absent
Drug            15         35
Placebo          4         46

Example 2: CVD

                 CVD
Smoking      No      Yes
No           50       15
Yes          10       25
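The chi-square computations for Mendel's data and the two 2×2 tables can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the lecture's worked solution. The function names are hypothetical, no continuity correction is applied, and the p-value shortcut is valid only for df = 1.

```python
import math

def expected_counts(table):
    """Expected cell counts: E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_2x2(table):
    """Return (statistic, p-value) for a 2x2 table with df = 1."""
    exp = expected_counts(table)
    obs_flat = [o for row in table for o in row]
    exp_flat = [e for row in exp for e in row]
    stat = chi_square(obs_flat, exp_flat)
    # For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Goodness of fit: Mendel's peas against the 9:3:3:1 ratio (df = 3).
mendel_obs = [315, 108, 101, 32]
n_plants = sum(mendel_obs)
mendel_exp = [n_plants * parts / 16 for parts in (9, 3, 3, 1)]
mendel_stat = chi_square(mendel_obs, mendel_exp)

# 2x2 tables: rows are Drug/Placebo and Smoking No/Yes respectively.
side_effects = [[15, 35], [4, 46]]
cvd = [[50, 15], [10, 25]]
se_stat, se_p = chi_square_2x2(side_effects)
cvd_stat, cvd_p = chi_square_2x2(cvd)

print(f"Mendel:       chi-square = {mendel_stat:.3f} (df = 3)")
print(f"Side effects: chi-square = {se_stat:.3f}, p = {se_p:.4f} (df = 1)")
print(f"CVD:          chi-square = {cvd_stat:.3f}, p = {cvd_p:.6f} (df = 1)")
```

The statistics come out to roughly 0.47 for Mendel's data, 7.86 for the side-effects table and 22.16 for the CVD table. The notes do not state a significance level; assuming the common 5% level, the critical values are 7.81 for df = 3 and 3.84 for df = 1, so Mendel's data are consistent with the 9:3:3:1 theory while both 2×2 tables show a significant association.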
