Chapter 8 Inference Concerning Proportions

Inference for a Single Proportion (p)
• Goal: Estimate the proportion of individuals in a population with a certain characteristic (p). This is equivalent to estimating a binomial probability.
• Sample: Take an SRS of n individuals from the population and observe X that have the characteristic. The sample proportion is X/n and has the following sampling properties:
– Sample proportion: $\hat{p} = X/n$
– Mean and standard deviation of the sampling distribution: $\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
– Estimated standard error: $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
– Shape: approximately normal for large samples (rule of thumb: $X \ge 15$ and $n - X \ge 15$)

Large-Sample Confidence Interval for p
• Take an SRS of size n from a population where p is the true (unknown) proportion of successes.
– Observe X successes
– Set confidence level C and choose z* such that $P(-z^* \le Z \le z^*) = C$ (C = 90% ⇒ z* = 1.645, C = 95% ⇒ z* = 1.96, C = 99% ⇒ z* = 2.576)
• Point estimate: $\hat{p} = X/n$
• Estimated standard error: $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
• Margin of error: $m = z^* \, SE_{\hat{p}}$
• C% confidence interval for p: $\hat{p} \pm m$

Example - Ginkgo and Acetazolamide for AMS
• Study Goal: Measure the effect of Ginkgo and Acetazolamide on the occurrence of Acute Mountain Sickness (AMS) in Himalayan trekkers
• Parameter: p = true proportion of all trekkers receiving Ginkgo & Acetazolamide (G&A) who would suffer from AMS
• Sample Data: n = 126 trekkers received G&A, X = 18 suffered from AMS
$$\hat{p} = \frac{18}{126} = .143 \qquad SE_{\hat{p}} = \sqrt{\frac{(.14)(.86)}{126}} = .031$$
• Margin of error (C = 95%): $m = 1.96(.031) = .061$
• 95% CI for p: $.143 \pm .061 \equiv (.082, .204)$

Wilson's "Plus 4" Method
• For moderate to small sample sizes, large-sample methods may not work well with respect to coverage probabilities
• Simple approach that works well in practice ($n \ge 10$):
– Pretend you have 4 extra individuals: 2 successes and 2 failures
– Compute the estimated sample proportion in light of the new "data", as well as its standard error
• Point estimate: $\tilde{p} = \dfrac{X + 2}{n + 4}$
• Estimated standard error: $SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n + 4}}$
• Margin of error: $m = z^* \, SE_{\tilde{p}}$
• C% confidence interval for p: $\tilde{p} \pm m$

Example: Lister's Tests with Antiseptic
• Experiments with antiseptic in patients with upper limb amputations (Joseph Lister, circa 1870)
• n = 12 patients received antiseptic, X = 1 died
$$\tilde{p} = \frac{1 + 2}{12 + 4} = \frac{3}{16} = .1875 \qquad SE_{\tilde{p}} = \sqrt{\frac{.1875(.8125)}{16}} = .0976$$
• Margin of error (C = 95%): $1.96(.0976) = .1913$
• 95% CI for p: $.1875 \pm .1913 \equiv (-.0038, .3788)$, reported as approximately (0, .38) because a proportion cannot be negative

Significance Test for a Proportion
• Goal: test whether a proportion (p) equals some null value $p_0$
• $H_0: p = p_0$
• Test statistic: $z_{obs} = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
• P-values:
– $H_a: p > p_0$: P-value $= P(Z \ge z_{obs})$
– $H_a: p < p_0$: P-value $= P(Z \le z_{obs})$
– $H_a: p \ne p_0$: P-value $= 2P(Z \ge |z_{obs}|)$
• The large-sample test works well when $np_0$ and $n(1-p_0)$ are both > 10

Ginkgo and Acetazolamide for AMS
• Can we claim that the incidence rate of AMS is less than 25% for trekkers receiving G&A?
• $H_0: p = 0.25$ versus $H_a: p < 0.25$
• Data: n = 126, X = 18, $\hat{p} = 18/126 = 0.143$, $p_0 = 0.25$
• Test statistic: $z_{obs} = \dfrac{.143 - .25}{\sqrt{.25(.75)/126}} = \dfrac{-.107}{.039} \approx -2.75$
• P-value $= P(Z \le -2.75) = .0030$
• Strong evidence that the incidence rate is below 25% (p < 0.25)

Comparing Two Population Proportions
• Goal: Compare two populations/treatments with respect to a nominal (binary) outcome
• Sampling Design: Independent vs Dependent Samples
• Methods based on large vs small samples
• Contingency tables used to summarize data
• Measures of Association: Absolute Risk, Relative Risk, Odds Ratio

Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts

2x2 Tables - Notation
                 Outcome Present   Outcome Absent       Group Total
Group 1          X1                n1-X1                n1
Group 2          X2                n2-X2                n2
Outcome Total    X1+X2             (n1+n2)-(X1+X2)      n1+n2

Example - Firm Type/Product Quality
                        High Quality   Low Quality   Group Total
Not Integrated          33             55            88
Vertically Integrated   5              79            84
Outcome Total           38             134           172
• Groups: Not Integrated (Weave only) vs Vertically Integrated (Spin and Weave) Cotton Textile Producers
• Outcomes: High Quality (High Count) vs Low Quality (Low Count)
Source: Temin (1988)

Notation
• Proportion in Population 1 with the characteristic of interest: $p_1$
• Sample size from Population 1: $n_1$
• Number of individuals in Sample 1 with the characteristic of interest: $X_1$
• Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1/n_1$
• Similar notation for Population/Sample 2

Example - Cotton Textile Producers
• $p_1$ = true proportion of all non-integrated firms that would produce High quality
• $p_2$ = true proportion of all vertically integrated firms that would produce High quality
• Data: $n_1 = 88$, $X_1 = 33$, $n_2 = 84$, $X_2 = 5$
$$\hat{p}_1 = \frac{X_1}{n_1} = \frac{33}{88} = 0.375 \qquad \hat{p}_2 = \frac{X_2}{n_2} = \frac{5}{84} = 0.060$$

Notation (Continued)
• Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions with the characteristic (2 other measures given below)
• Estimator: $D = \hat{p}_1 - \hat{p}_2$
• Standard error (and its estimate):
$$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \qquad SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
• Pooled estimated standard error when $p_1 = p_2 = p$:
$$SE_{D_p} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

Cotton Textile Producers (Continued)
• Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions that produce High quality output
• Estimator: $D = \hat{p}_1 - \hat{p}_2 = 0.375 - 0.060 = 0.315$
• Estimated standard error:
$$SE_D = \sqrt{\frac{0.375(0.625)}{88} + \frac{0.060(0.94)}{84}} = \sqrt{.003335} = .0577$$
• Pooled estimated standard error when $p_1 = p_2 = p$:
$$\hat{p} = \frac{33 + 5}{88 + 84} = 0.221 \qquad SE_{D_p} = \sqrt{0.221(0.779)\left(\frac{1}{88} + \frac{1}{84}\right)} = .0633$$

Confidence Interval for p1-p2 (Wilson's Estimate)
• Method adds a success and a failure to each group to improve the coverage rate under certain conditions:
$$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2} \qquad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2} \qquad \tilde{D} = \tilde{p}_1 - \tilde{p}_2$$
$$SE_{\tilde{D}} = \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2 + 2}}$$
• The confidence interval is of the form: $(\tilde{p}_1 - \tilde{p}_2) \pm z^* \, SE_{\tilde{D}}$

Example - Cotton Textile Production
$$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2} = \frac{33 + 1}{88 + 2} = \frac{34}{90} = 0.378 \qquad \tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2} = \frac{5 + 1}{84 + 2} = \frac{6}{86} = 0.070$$
$$\tilde{D} = \tilde{p}_1 - \tilde{p}_2 = 0.378 - 0.070 = 0.308$$
$$SE_{\tilde{D}} = \sqrt{\frac{0.378(0.622)}{90} + \frac{0.070(0.930)}{86}} = \sqrt{.00261 + .00076} = .0581$$
• 95% Confidence Interval for $p_1 - p_2$: $0.308 \pm 1.96(0.0581) \equiv 0.308 \pm 0.114 \equiv (0.194, 0.422)$
• This provides evidence that non-integrated producers are more likely to provide high quality output ($p_1 - p_2 > 0$)

Significance Tests for p1-p2
• Deciding whether $p_1 = p_2$ can be done by interpreting "plausible values" of $p_1 - p_2$ from the confidence interval:
– If the entire interval is positive, conclude $p_1 > p_2$ ($p_1 - p_2 > 0$)
– If the entire interval is negative, conclude $p_1 < p_2$ ($p_1 - p_2 < 0$)
– If the interval contains 0, do not conclude that $p_1 \ne p_2$
• Alternatively, we can conduct a significance test:
– $H_0: p_1 = p_2$ versus $H_a: p_1 \ne p_2$ (2-sided) or $H_a: p_1 > p_2$ (1-sided)
– Test statistic: $z_{obs} = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
– P-value: $2P(Z \ge |z_{obs}|)$ (2-sided) or $P(Z \ge z_{obs})$ (1-sided)

Example - Cotton Textile Production
• $H_0: p_1 = p_2$ ($p_1 - p_2 = 0$) versus $H_A: p_1 \ne p_2$ ($p_1 - p_2 \ne 0$)
• Test statistic:
$$z_{obs} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.375 - 0.060}{\sqrt{0.221(0.779)\left(\frac{1}{88} + \frac{1}{84}\right)}} = \frac{0.315}{0.0633} = 4.98$$
• Rejection region: $|z_{obs}| \ge z_{.025} = 1.96$
• P-value $= 2P(Z \ge 4.98) \approx 0$
• Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms
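
The intervals and tests in this chapter can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and it assumes only the standard library (`statistics.NormalDist`, Python 3.8+). Because it carries unrounded values through each step, results can differ slightly from the hand calculations in the examples (for the AMS test it gives z of about -2.78 rather than -2.75).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_star(conf):
    """Critical value z* with P(-z* <= Z <= z*) = conf."""
    return Z.inv_cdf(0.5 + conf / 2)


def p_value(z, alternative):
    """P-value for 'less', 'greater', or 'two-sided' alternatives."""
    if alternative == "less":
        return Z.cdf(z)
    if alternative == "greater":
        return 1 - Z.cdf(z)
    return 2 * (1 - Z.cdf(abs(z)))


def large_sample_ci(x, n, conf=0.95):
    """Large-sample interval: p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    m = z_star(conf) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_ci(x, n, conf=0.95):
    """Plus-4 interval: the same formula applied to x + 2 and n + 4.

    The lower limit is not truncated at 0 here.
    """
    return large_sample_ci(x + 2, n + 4, conf)


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """z test of H0: p = p0, with the standard error computed under H0."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return z, p_value(z, alternative)


def two_prop_wilson_ci(x1, n1, x2, n2, conf=0.95):
    """Interval for p1 - p2 after adding 1 success and 1 failure per group."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    m = z_star(conf) * se
    return (p1 - p2) - m, (p1 - p2) + m


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """z test of H0: p1 = p2 using the pooled standard error."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, p_value(z, alternative)


if __name__ == "__main__":
    print(large_sample_ci(18, 126))                 # AMS interval
    print(plus_four_ci(1, 12))                      # Lister interval
    print(one_prop_z_test(18, 126, 0.25, "less"))   # AMS test
    print(two_prop_wilson_ci(33, 88, 5, 84))        # cotton interval
    print(two_prop_z_test(33, 88, 5, 84))           # cotton test
```

Writing the plus-4 interval as the large-sample formula applied to X + 2 and n + 4 makes the "pretend you have 4 extra individuals" idea explicit in the code.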
