STATISTICS Self-Assessment Quiz 8 Answers

1. The online social networking site Facebook has found that the amount of time spent by American high school students on their site on weeknights is bimodal with mean 118 minutes per night and standard deviation of 14 minutes. What is the probability of collecting a random sample of fifty high school students and finding that the mean amount of time these students spend on Facebook per weeknight is at most 110 minutes? Answer this question with a complete English sentence.
---------------------------------------------------------------------------
Let x represent the number of minutes per weeknight a high school student spends on Facebook.

Want: P(x̄ ≤ 110) = P(0 ≤ x̄ ≤ 110), where x̄ is the mean Facebook time (in minutes) of a sample of n = 50 high school students.

Since a good (unbiased) sampling strategy was employed, and assuming that there are more than 500 high school students, 10n = 10 ⋅ 50 = 500 ≤ N. In addition, since n = 50 ≥ 40, the sampling distribution of sample means (i.e. the distribution of all possible x̄'s) is approximately normal, even though the population itself is bimodal, with mean µ_x̄ = µ = 118 minutes and standard deviation σ_x̄ = σ/√n = 14/√50 ≈ 1.98 minutes.

Hence, P(x̄ ≤ 110) = P(0 ≤ x̄ ≤ 110) = normalcdf(0, 110, 118, 14/√50) ≈ 2.7E−5 ≈ 0.000027 ≈ 0.003%.

[Figure: normal curve for the mean Facebook time (in minutes) of a sample of 50 high school students, with 110 and 118 marked on the axis.]

Answer: The probability of randomly selecting 50 high school students, from a population of high school students who spend an average of 118 minutes per weeknight on Facebook, and finding that this sample spends an average of at most 110 minutes is about 0.003%.
---------------------------------------------------------------------------
2. Premieres of children’s movies are often accompanied by special promotional offers through one of the leading fast-food chains.
During the many years that McDonald’s has run such promotions, the company has found that 15% of all its customers will buy the specialty item. When Disney re-released its movie “Snow White and the Seven Dwarfs” on DVD last year, McDonald’s simultaneously initiated a promotional offer enabling its customers to purchase a specially designed set of drink glasses featuring each of the seven dwarfs. Halfway through the promotional period, a study was performed at several restaurants throughout the country; 17.6% of the 500 customers sampled had purchased a set of these glasses. What was the probability of having randomly sampled 500 customers and finding at least 17.6% of them had purchased a set of the collectible glassware?

WANT: P(p̂ ≥ 0.176) = P(0.176 ≤ p̂ ≤ 1). Thus, we need information about the distribution of all possible values of p̂.

GIVEN:
Population: All McDonald’s customers (whether nationwide or worldwide is not clear here)
Parameter: p = 0.15 = 15%
Statistic: With x representing the number of customers who purchased the collectible glassware amongst 500 McDonald’s patrons, then since n = 500, p̂ = x/n = 88/500 = 0.176 = 17.6%.

FACT: The sampling distribution of the sample proportion (the distribution of all sample proportions, the p̂'s) will be approximately normally distributed with mean µ_p̂ = p and standard deviation σ_p̂ = √(pq/n) = √(p ⋅ (1 − p)/n) PROVIDED THAT all three of the following conditions hold:
• N ≥ 10 ⋅ n (In English, the population must be at least 10 times larger than the sample to ensure the trials can be treated as if they’re independent even though sampling is done without replacement)
• np ≥ 10
• nq = n ⋅ (1 − p) ≥ 10

Checking all three of these conditions for this specific problem …

Since n = 500, the condition N ≥ 10 ⋅ n can be restated as N ≥ 10 ⋅ 500 ⇒ N ≥ 5,000.
Although I have no actual value for N, the total number of all McDonald’s customers (whether nationwide or worldwide), I am convinced that there are at least 5,000 of them in this country alone (and certainly worldwide). Therefore, N ≥ 10 ⋅ n.
np = (500) ⋅ (0.15) = 75 ≥ 10
nq = n ⋅ (1 − p) = (500) ⋅ (1 − 0.15) = (500) ⋅ (0.85) = 425 ≥ 10

Therefore, we can deduce (i.e. conclude) that the distribution of sample proportions, p̂, is approximately normally distributed with mean µ_p̂ = p = 0.15 and standard deviation σ_p̂ = √(pq/n) = √((0.15)(0.85)/500) ≈ 0.016.

Hence, P(p̂ ≥ 88/500) = P(p̂ ≥ 0.176) = P(0.176 ≤ p̂ ≤ 1) = normalcdf(0.176, 1, 0.15, √((0.15)(0.85)/500)) ≈ 0.0517430168 ≈ 0.0517

[Figure: normal curve of p̂ with 0.15 and 0.176 marked on the axis; the area at or above 0.176 is ≈ 0.0517.]

Answer: Assuming that the proportion of special glassware purchasers amongst all McDonald’s patrons is 0.15, the proportion of all random samples of 500 McDonald’s customers which contain at least 88 special glassware purchasers is approximately 0.0517. This is equivalent to saying that the probability of having sampled 500 McDonald’s customers and finding at least 17.6% (i.e. 88 or more) of them who had purchased a set of this collectible glassware is approximately 0.0517; that is, only about 5 out of 100 such groups of 500 McDonald’s customers would contain 17.6% or more people who purchased the specially designed set of drink glasses featuring each of the seven dwarfs.

3. A drug manufacturer has developed a drug that is said to cure postnatal depression. A random sample of 150 women who gave birth, suffered postnatal depression, and who used the drug in a two-year period revealed that 120 of them found it effective. Address parts (a) - (c); you do NOT have to answer with a sentence.
(a) Construct a 99% confidence interval for the percentage of all postnatal depression cases that are cured by this new drug.
Since a good (unbiased) sampling strategy was employed and since the population of interest here consists of all cases of postnatal depression, the size of this population is at least in the hundreds of thousands, and so 10n = 10 ⋅ 150 = 1500 ≤ N.
p̂ = 120/150 = 0.80. Also, np̂ = (150)(120/150) = (150)(0.80) = 120 ≥ 10 and nq̂ = (150)(1 − 0.80) = (150)(0.20) = 30 ≥ 10.
Consequently, the sampling distribution of proportions is approximately normally distributed, and so we can use the calculator’s 1-PropZInt function to construct the requested confidence interval. We get the following rounded values: (0.71587, 0.88413) ≈ (71.6%, 88.4%).
Answer: Based on this sample, we are 99% confident that the percentage of all postnatal depression cases that are cured by this new drug is in the (approximate) interval (71.6%, 88.4%).

(b) Identify the margin of error (to the nearest whole percent) that’s in the parameter’s estimate you gave in part (a).
Answer: The margin of error is (approximately) 8.4%, or 8% to the nearest whole percent. The interval (71.6%, 88.4%) is equivalent to p̂ ± ME_p̂, where p̂ = 120/150 = 0.80 = 80% and the margin of error is ME_p̂ = 8.4% (since 80% − 8.4% = 71.6% and 80% + 8.4% = 88.4%).
Alternatively, the confidence interval is 80% ± 8.4%, i.e. p̂ ± ME_p̂, where p̂ = 120/150 = 0.80 = 80% and ME_p̂ = z* ⋅ √(p̂q̂/n) = 2.58 ⋅ √((0.8)(0.2)/150) ≈ 0.084 = 8.4%, where z* = InvNorm(0.99 + 0.01/2, 0, 1) = InvNorm(0.995, 0, 1) ≈ 2.58.

(c) Suppose p represents the percentage of all postnatal depression cases that are cured by this new drug. Is the probability that p is in the interval you created in part (a) equal to 99%? Justify your answer clearly and completely.
Answer: No, the parameter p (the percentage of all postnatal depression cases that are cured by this new drug) is either in the specific 99% confidence interval (71.6%, 88.4%), or it’s not. The method used in part (a) to construct a 99% confidence interval here is what had a 99% chance of succeeding.
That is, before any particular sample was drawn (and before any specific statistics were available), there was a 99% chance of creating a 99% confidence interval that captures the parameter p. Whether the specific interval (71.6%, 88.4%) is one of the successful intervals, or not, is unknown.

4. The number of hours per week that high school seniors spend on homework is normally distributed with a mean of 10 hours and a standard deviation of 3 hours. Address parts (a) and (b) below.
(a) What is the probability that one randomly chosen high school senior spends more than 15 hours per week on his or her homework?
Let x represent the number of hours per week that one high school senior spends on homework.
Want: P(x > 15) = P(15 < x ≤ 168), since there are at most 168 hours per week (7 ⋅ 24 = 168).
Given: The variable I called x is normally distributed with mean µ = 10 hours and standard deviation σ = 3 hours.
Hence, P(x > 15) = P(15 < x ≤ 168) = Area under the given normal curve between x = 15 & x = 168 = normalcdf(15, 168, 10, 3) ≈ 0.0477903304 ≈ 0.05

[Figure: normal curve of x, the hours per week a high school senior spends on homework, with 10 and 15 marked on the axis.]

Answer: The probability that one randomly chosen high school senior spends more than 15 hours per week on his or her homework is approximately 5%. That is, about 5% of all high school seniors spend more than 15 hours per week on their homework.

(b) What is the probability that the mean number of hours spent on homework per week of 36 randomly chosen high school seniors is greater than 15 hours?
Want: P(x̄ > 15), where x̄ is the mean number of hours spent on homework per week of a sample of n = 36 high school seniors. Since there is a maximum of 168 hours per week, we want P(15 < x̄ ≤ 168).
Although no actual value for N, the total number of all high school seniors, is given, I believe that it is at least 10n = 10 ⋅ 36 = 360. Therefore, 10n ≤ N.
Even though n = 36 < 40, we are still able to conclude that the sampling distribution of the sample mean (i.e. the distribution of all possible x̄'s) is normally distributed, because we were told that x is normally distributed in the population. Specifically, the sampling distribution of the sample mean is normally distributed with mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n. In other words, the sampling distribution here is N(µ, σ/√n) = N(10, 3/√36) = N(10, 3/6) = N(10, 1/2), and so we can use the TI calculator’s normalcdf function.
So, P(x̄ > 15) = P(15 < x̄ ≤ 168) = Area under the deduced curve between x̄ = 15 and x̄ = 168 = normalcdf(15, 168, 10, 1/2) ≈ 7.62E−24 ≈ 8E−24 ≈ 0.000000000000000000000008 ≈ 0.0000000000000000000008% (a very small number)

[Figure: normal curve of x̄, the mean number of hours per week spent on homework by a group of 36 random high school seniors, with 10 and 15 marked on the axis.]

Answer: The probability that the mean number of hours spent on homework per week of 36 randomly chosen high school seniors exceeds 15 hours is approximately 0.0000000000000000000008% … it’s near 0%, but it is not equal to 0% (for this would imply that it was never possible). That is, it’s highly unlikely to randomly select 36 high school seniors and find that their average weekly homework time exceeds 15 hours.

5. Suppose a 95% confidence interval is accurately computed for µ resulting in the interval (112.4, 121.6). Identify those statements that are definitely true. Write the number of each true statement in your Blue Book. If none of the statements are true, write NONE.
95% of the time, µ falls within the interval (112.4, 121.6).
One can have 95% confidence that µ is 117.
95% of all possible values for µ will fall within the interval (112.4, 121.6).
95% of the time, p falls within the interval (112.4, 121.6).
Using this method, 95% of all the possible samples will produce the interval (112.4, 121.6) for µ.
The standard error is 4.6.
µ is 117.
There is a 95% chance that µ will fall within the interval (112.4, 121.6).
---------------------------------------------------------------------------
The correct conclusions here are:
• One can be 95% confident that µ is in the interval (112.4, 121.6).
• The 95% confidence interval for µ, (112.4, 121.6), can be expressed as 117 ± 4.6.
• x̄ = 117 and ME = 4.6, where ME = t* ⋅ SE_x̄ with t* ≐ 1.960; thus, SE_x̄ = ME/t* = 4.6/1.960 ≈ 2.35.
• Before any sample was selected, the method employed here had a 95% chance of creating a 95% confidence interval that successfully captures µ. Whether the specific interval, (112.4, 121.6), is one of these, or not, is unknown.
• Any statement involving 95% and the specific interval (112.4, 121.6) that does not contain the phrase “95% confident” is a false statement.
• Any statement involving the phrase “95% confident” that doesn’t involve an interval of infinitely many numbers is false.
Answer: NONE

6. A newspaper reports that the governor’s approval rating stands at 65%. The article adds that the poll is based on a random sample of 972 adults and has a margin of error of 2.5%. What level of confidence did the pollsters use?
Since a good sampling strategy (SRS) was used and 10n = 10(972) = 9720 ≤ N, where N, the number of adults in any one state in America (no specific state was given here), is in the millions, np̂ = (972)(0.65) = 631.8 ≥ 10, and nq̂ = (972)(0.35) = 340.2 ≥ 10, the sampling distribution of sample proportions is approximately normal. Therefore, the margin of error in any confidence interval for the true proportion is ME_p̂ = z* ⋅ √(p̂q̂/n).
ME_p̂ = z* ⋅ √(p̂q̂/n) ⇒ 0.025 = z* ⋅ √((0.65)(0.35)/972) ⇒ z* = 0.025 / √((0.65)(0.35)/972) ≈ 1.63.

[Figure: normal curve with −1.63, 0, and 1.63 marked on the z-axis; the area of the region between −1.63 and 1.63 is the level of confidence. On the p̂ axis the center is p, whose value is unknown.]

P(−1.63 ≤ z ≤ 1.63) = normalcdf(−1.63, 1.63, 0, 1) ≈ 0.89689 ≈ 90%
Answer: Rounded to the nearest whole percent, the pollsters used a 90% confidence level.
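
The calculator results quoted in these answers can be cross-checked without a TI calculator. What follows is a minimal Python sketch, not part of the original answer key. The helper names upper_tail, normalcdf, and one_prop_z_interval are made up here to mirror the TI functions normalcdf and 1-PropZInt, and tail areas are computed with math.erfc (an implementation choice of this sketch) so that very small probabilities, such as the one in 4(b), are not lost to rounding.

```python
from math import erfc, sqrt
from statistics import NormalDist


def upper_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for X ~ Normal(mu, sigma); erfc keeps tiny tail areas accurate."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve between lower and upper.

    Hypothetical stand-in for the TI calculator's normalcdf function.
    """
    if lower >= mu:
        # Both bounds are at or above the mean: subtract two upper-tail areas.
        return upper_tail(lower, mu, sigma) - upper_tail(upper, mu, sigma)
    # Otherwise use lower-tail areas, via the symmetry P(X <= x) = P(X > 2*mu - x).
    return upper_tail(2 * mu - upper, mu, sigma) - upper_tail(2 * mu - lower, mu, sigma)


def one_prop_z_interval(successes, n, confidence):
    """Large-sample interval p-hat +/- z* * sqrt(p-hat * q-hat / n).

    Hypothetical stand-in for the TI calculator's 1-PropZInt function.
    Returns (lower bound, upper bound, margin of error).
    """
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Problem 1: sample mean of n = 50, population mean 118, population SD 14.
p1 = normalcdf(0, 110, 118, 14 / sqrt(50))

# Problem 2: sample proportion with p = 0.15 and n = 500.
p2 = normalcdf(0.176, 1, 0.15, sqrt(0.15 * 0.85 / 500))

# Problem 3: 99% interval from 120 successes out of 150.
low3, high3, me3 = one_prop_z_interval(120, 150, 0.99)

# Problem 4: one senior (SD 3) versus the mean of 36 seniors (SD 3/sqrt(36)).
p4a = normalcdf(15, 168, 10, 3)
p4b = normalcdf(15, 168, 10, 3 / sqrt(36))

# Problem 6: solve ME = z* * sqrt(p-hat * q-hat / n) for z*, then find the level.
z6 = 0.025 / sqrt(0.65 * 0.35 / 972)
level6 = normalcdf(-z6, z6)

print(f"1:    {p1:.2e}")
print(f"2:    {p2:.4f}")
print(f"3:    ({low3:.5f}, {high3:.5f}), ME = {me3:.3f}")
print(f"4(a): {p4a:.4f}")
print(f"4(b): {p4b:.2e}")
print(f"6:    z* = {z6:.2f}, level = {level6:.3f}")
```

Because this sketch carries the unrounded z* ≈ 1.634 through problem 6, its confidence level differs slightly in the third decimal place from the 0.89689 obtained with z* rounded to 1.63; both round to 90%.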