Stat 226 – Introduction to Business Statistics I
Spring 2009
Professor: Dr. Petrutza Caragea
Section A, Tuesdays and Thursdays 9:30-10:50 a.m.

Chapter 8, Section 8.1: Inference for population proportions

Inference for a population proportion p

Suppose we are interested in the proportion of people with credit card debt larger than \$5,000. The parameter of interest is no longer a mean but a proportion, e.g. 25% of all credit card holders. We denote the population proportion by $p$.

The Census Bureau obtains a random sample of 2500 people and finds that 750 have more than \$5,000 in credit card debt. How would you estimate $p$?

We can use the sample proportion $\hat{p} = x/n$ (the number of "yes" observations $x$ divided by the sample size $n$) to estimate the population proportion $p$.

Sampling distribution of $\hat{p}$

Because the sample is random, the value of $\hat{p}$ is random as well (our statistic $\hat{p}$ is a random variable). The properties of the sampling distribution of $\hat{p}$ are:

- The shape is close to normal.
- The mean of $\hat{p}$ is the population proportion $p$, i.e. $\mu_{\hat{p}} = p$.
- The standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$.

Because the mean of the sampling distribution is indeed $p$, $\hat{p}$ is an unbiased estimator of $p$. As the sample size $n$ increases, the spread of the sampling distribution of $\hat{p}$ decreases.

For sufficiently large sample sizes we have

$$\hat{p} \sim N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$

Knowing the sampling distribution of $\hat{p}$, we can do inference for the population proportion $p$ in the form of

- confidence intervals
- hypothesis tests
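As a minimal sketch of the point estimate for the Census Bureau example, the snippet below computes $\hat{p}$ and plugs it into the standard-deviation formula. The function names are illustrative assumptions, not part of the course material, and substituting $\hat{p}$ for the unknown $p$ gives only an estimate of $\sigma_{\hat{p}}$.

```python
import math


def sample_proportion(successes: int, n: int) -> float:
    """Sample proportion p-hat = x / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    return successes / n


def std_dev_of_p_hat(p: float, n: int) -> float:
    """Standard deviation of p-hat: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Census Bureau example from the slides: 750 of 2500 sampled people
# have more than $5,000 in credit card debt.
p_hat = sample_proportion(750, 2500)

# Assumption: p is unknown, so p-hat is plugged in to estimate the spread.
estimated_sd = std_dev_of_p_hat(p_hat, 2500)

print(f"p-hat = {p_hat:.3f}")
print(f"estimated standard deviation of p-hat = {estimated_sd:.5f}")
```

With a sample this large the estimated spread is under one percentage point, which illustrates why $\hat{p}$ is a precise estimator here.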
A $(1-\alpha)\cdot 100\%$ confidence interval for the population proportion p

A $(1-\alpha)\cdot 100\%$ confidence interval for the population proportion $p$ is given by

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where we estimate $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$ by using $\hat{p}$ instead of the unknown $p$.

Example: Bob wonders what proportion of students at his school think that tuition is too high. He interviews a random sample of 50 of the 2400 students at his (small) college and finds that 38 think that tuition is too high. Construct a 95% confidence interval for the proportion of students of the entire college who think that tuition is too high.

Assumptions

- Independence assumption
  - Is independence plausible?
  - Is it a random sample?
  - 10% condition: population size $> 10 \cdot n$
- Sample size assumption: the sample must be large enough for the CLT
  - Check that $n \cdot p > 10$ and $n \cdot (1-p) > 10$

Question: If this is not true, can we still estimate $p$?

Answer: Yes. Use the so-called Wilson's estimator:

$$\tilde{p} = \frac{x+2}{n+4}$$

How? We simply add 4 "phony" observations: 2 "yes" (positive) counts and 2 "no" (negative) counts. Then we estimate $p$. This helps to move $\hat{p}$ further away from 0 or 1, respectively.

A $(1-\alpha)\cdot 100\%$ confidence interval for the population proportion $p$ using Wilson's estimate is given by

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

As long as $n \ge 5$, this works very well.
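The two interval formulas can be sketched in a few lines for Bob's data (38 of 50 students). This is an illustration under stated assumptions, not the course's official solution: the function name `proportion_ci` and its `wilson` flag are hypothetical, and $z^*$ is taken from the standard library's normal distribution rather than from a table.

```python
import math
from statistics import NormalDist


def proportion_ci(x: int, n: int, conf: float = 0.95, wilson: bool = False):
    """Confidence interval for a proportion.

    With wilson=True, 2 "yes" and 2 "no" phony observations are added
    first, as described on the slides, and n + 4 replaces n.
    """
    if wilson:
        x, n = x + 2, n + 4
    estimate = x / n
    # Critical value z* leaving (1 - conf) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * math.sqrt(estimate * (1 - estimate) / n)
    return estimate - margin, estimate + margin


# Bob's sample: 38 of 50 students think tuition is too high.
standard_low, standard_high = proportion_ci(38, 50)
wilson_low, wilson_high = proportion_ci(38, 50, wilson=True)

print(f"standard 95% CI: ({standard_low:.3f}, {standard_high:.3f})")
print(f"Wilson   95% CI: ({wilson_low:.3f}, {wilson_high:.3f})")
```

For these data the conditions look reasonable when $\hat{p}$ stands in for $p$: the counts are 38 and 12 (both above 10), and the population of 2400 exceeds $10 \cdot 50 = 500$. The Wilson interval is centered slightly closer to 0.5 than the standard one.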
Margin of error (m) too large to be useful

Example: Bob's data cont'd

Determining sample size

Back to Bob's data: a confidence interval that says that the percentage of people who think they pay too much for tuition is between 10% and 90% wouldn't be of much use. Most likely, you have a sense of how large a margin of error you can tolerate. You would like to get a narrower interval without giving up confidence, so you need less variability in your sample proportions. How can you do that? Choose a larger sample. How large?

Recall that when estimating unknown means, we used

$$n \ge \left(\frac{z^* \cdot \sigma}{m}\right)^2$$

All we need to do is adjust the standard deviation: for a single yes/no observation, use $\sigma = \sqrt{p(1-p)}$, so that $\sigma_{\hat{p}} = \sigma/\sqrt{n}$. This yields a required sample size of

$$n \ge \left(\frac{z^*}{m}\right)^2 \cdot p(1-p)$$

Round up!

- What we know: $z^*$, $m$
- What we usually don't know: $p$

What can we do? One possibility: consider the worst case scenario (the one that needs the largest sample size): $p = 0.5$.

Other possibilities:

1. Use information on $p$, $\hat{p}$, or $\tilde{p}$ from previous studies (e.g. prior belief, a pilot study, historical data).
2. If we are going to use Wilson's estimate $\tilde{p}$, we need to remember that we add 4 "phony" observations, so we really only need

$$n \ge \left(\frac{z^*}{m}\right)^2 \cdot \tilde{p}(1-\tilde{p}) - 4$$

observations. Again, round up!

Example: Bob's data. We want a 95% CI for $p$ that has a width of only 0.1.
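The sample-size rules above can be sketched as one small function. The name `required_sample_size` and its parameters are assumptions made for illustration; the default `p=0.5` encodes the worst case scenario, and `wilson=True` subtracts the 4 phony observations before rounding up.

```python
import math
from statistics import NormalDist


def required_sample_size(m: float, conf: float = 0.95,
                         p: float = 0.5, wilson: bool = False) -> int:
    """Smallest n with n >= (z*/m)^2 * p * (1 - p), minus 4 if Wilson."""
    if m <= 0 or not 0 < p < 1:
        raise ValueError("need m > 0 and 0 < p < 1")
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = (z_star / m) ** 2 * p * (1 - p)
    if wilson:
        n -= 4  # the 4 phony observations do not have to be collected
    return math.ceil(n)  # always round up


# Worst case (p = 0.5) for a 95% CI with margin of error 0.05,
# i.e. a total width of 0.1.
worst_case_n = required_sample_size(0.05)
print(worst_case_n)  # 385
```

Because $p(1-p)$ is largest at $p = 0.5$, any prior guess for $p$ further from 0.5 yields a smaller required sample than this worst case.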
For Bob's target width of 0.1, the margin of error is $m = \text{width}/2 = 0.1/2 = 0.05$. Based on Bob's previous data we have

$$\tilde{p} = \frac{x+2}{n+4} = \frac{4}{14} = 0.2857$$

With $z^* = 1.96$ for 95% confidence, the sample size formula for Wilson's estimate gives

$$n \ge \left(\frac{z^*}{m}\right)^2 \cdot \tilde{p}(1-\tilde{p}) - 4 = \left(\frac{1.96}{0.05}\right)^2 (0.2857)(0.7143) - 4 \approx 309.6,$$

so after rounding up Bob needs at least 310 observations.

Significance test for a single proportion

- When performing a hypothesis test, the null hypothesis specifies a value for $p$, which we will call $p_0$.
- When calculating p-values we again act as if the hypothesized value $p_0$ were actually true.
- When testing $H_0: p = p_0$, we substitute $p_0$ for $p$ in the expression for $\sigma_{\hat{p}}$ and then standardize $\hat{p}$ in order to obtain our test statistic. That is, we get

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

See Handout: Hypothesis Testing for one proportion p.
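A minimal sketch of the test statistic follows. The slides give no numeric test example, so the null value $p_0 = 0.6$ applied to Bob's sample (38 of 50) is purely hypothetical, as are the function name and the `alternative` parameter; the choice of a two-sided alternative is also an assumption.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float,
                          alternative: str = "two-sided"):
    """z statistic and p-value for H0: p = p0."""
    p_hat = x / n
    # Under H0 the standard deviation of p-hat uses p0, not p-hat.
    sd_under_h0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_under_h0
    lower_tail = NormalDist().cdf(z)
    if alternative == "two-sided":
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    elif alternative == "greater":
        p_value = 1 - lower_tail
    elif alternative == "less":
        p_value = lower_tail
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Hypothetical null value p0 = 0.6, tested against Bob's sample.
z, p_value = one_proportion_z_test(38, 50, p0=0.6)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Note that the standard deviation in the denominator is computed from $p_0$, whereas the confidence interval uses $\hat{p}$; this is the substitution described in the last bullet above.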