C7 S2 Sampling distribution day 2 2015.notebook, February 12, 2015

Advanced Placement Statistics
Friday, February 13, 2015

1. Collect folders and materials
2. Start pennies activity - Austin B.
3. Review and check homework
4. LSRL Valentine
5. FRQ with partners (2 to 4 people in a group)
6. Return materials

OTL C7#4
page 437: 10, 12
page 437-8: 14, 15, 17, 19, 20
page 439: MC 21, 22, 23, 24, 25 (review), 26 (review)
FINISH READING NOTES 7.2 pages 440-447

Exercise C7#10 Tall Girls page 437
Tall girls According to the National Center for Health Statistics, the distribution of heights for 16-year-old females is modeled well by a Normal density curve with mean μ = 64 inches and standard deviation σ = 2.5 inches. To see if this distribution applies at their high school, an AP® Statistics class takes an SRS of 20 of the 300 16-year-old females at the school and measures their heights. What values of the sample mean would be consistent with the population distribution being N(64, 2.5)? To find out, we used Fathom software to simulate choosing 250 SRSs of size n = 20 students from a population that is N(64, 2.5). The figure below is a dotplot of the sample mean height of the students in each sample.

[Dotplot of the 250 simulated sample means (figure not reproduced). Slide annotation: about 25 dots lie at 64.7 or above.]

(a) There is one dot on the graph at 62.4. Explain what this value represents.
In one of the 250 simulated SRSs of 20 girls drawn from the N(64, 2.5) population, the sample mean height was 62.4 inches.
(The sample randomly found 20 shorter girls ... it can happen, just not very often!)

(b) Describe the distribution. Are there any obvious outliers?
The approximate sampling distribution of the sample mean height of 16-year-old girls is roughly symmetric and bell-shaped. The center of the distribution is approximately 64 inches. The range of the distribution is 65.75 - 62.4 = 3.35 inches. There appear to be a few potential outliers. Because the shape is roughly symmetric, the distribution can be summarized using the mean and standard deviation.

(c) Would it be surprising to get a sample mean of 64.7 or more in an SRS of size 20 when μ = 64? Justify your answer.
No. A sample mean of 64.7 inches or more is not very surprising. A sample mean this large occurred in approximately 25 of the 250 simulated samples, or about 10% of the time. So it is a bit unusual but not a huge surprise.

(d) Suppose that the average height of the 20 girls in the class's actual sample is x̄ = 64.7. What would you conclude about the population mean height μ for the 16-year-old females at the school? Explain.
If the sample mean was actually 64.7 inches, I would conclude that it is quite plausible that the true mean height μ of the 16-year-old females at the school is 64 inches. A result this large happens about 10% of the time by chance alone, so there is NO convincing evidence against the claim that μ = 64 inches.

Exercise C7#12 Tall Girls page 437
Tall girls Refer to Exercise 10.
(a) Make a graph of the population distribution.
(b) Sketch a possible dotplot of the distribution of sample data for the SRS of size 20 taken by the AP® Statistics class.
Yep, sorry guys, I just took this from the solution manual ... my bad ...

Exercise C7#14 Cold Cabin? page 438
Exercises 13 and 14 refer to the following setting. During the winter months, outside temperatures at the Starneses' cabin in Colorado can stay well below freezing (32°F, or 0°C) for weeks at a time. To prevent the pipes from freezing, Mrs.
Starnes sets the thermostat at 50°F. The manufacturer claims that the thermostat allows variation in home temperature that follows a Normal distribution with σ = 3°F. To test this claim, Mrs. Starnes programs her digital thermometer to take an SRS of n = 10 readings during a 24-hour period. Suppose the thermostat is working properly and that the actual temperatures in the cabin vary according to a Normal distribution with mean μ = 50°F and standard deviation σ = 3°F.

Cold cabin? The Fathom screen shot below shows the results of taking 500 SRSs of 10 temperature readings from a population distribution that is N(50, 3) and recording the sample minimum each time.

(a) Describe the approximate sampling distribution.
The approximate sampling distribution of the sample minimum temperature is slightly skewed to the left (but not too badly). The center is around 45°F. The values vary from 39°F to 51°F, for a range of 12°F. Because of the skew, the distribution should be summarized with the five-number summary.

(b) Suppose that the minimum of an actual sample is 40°F. What would you conclude about the thermostat manufacturer's claim? Explain.
If the thermostat is working as claimed, so that temperatures follow N(50, 3), a sample minimum of 40°F or lower would very rarely happen: only about 3 of the 500 simulated samples, or 0.006 (0.6%), by random chance. This sample result provides convincing evidence that the manufacturer's claim is FALSE. The temperature variation does not appear to have a σ of 3°F.

Exercise C7#15 A Sample of teens page 438
A sample of teens A study of the health of teenagers plans to measure the blood cholesterol levels of an SRS of 13- to 16-year-olds. The researchers will report the mean x̄ from their sample as an estimate of the mean cholesterol level μ in this population. Explain to someone who knows little about statistics what it means to say that x̄ is an unbiased estimator of μ.
If we chose many SRSs and calculated the sample mean x̄ for each sample, we would not consistently underestimate μ or consistently overestimate μ.

A statistic is an unbiased estimator of the true population value (the "God knows" real answer) when a graph of this statistic from many random samples produces a "picture" that is balanced at the real population value, here the mean blood cholesterol level of 13- to 16-year-olds (presumably in the USA).

Unbiased estimator can be explained by saying that our sample was selected by a method that will make its result (the average) point to the average of the population. If many other samples of the same size are selected in the same way, then the average of all of the sample averages will equal the average of the population that we are trying to estimate.

Exercise C7#17 A sample of teens page 438
A sample of teens Refer to Exercise 15. The sample mean x̄ is an unbiased estimator of the population mean μ no matter what size SRS the study chooses. Explain to someone who knows nothing about statistics why a large random sample will give more trustworthy results than a small random sample.
A sampling distribution contains the results from all possible samples of a given size n. So the center of the entire sampling distribution of x̄ will be exactly the center of the population. However, smaller sample sizes will have more variability than larger sample sizes. The histograms for small n will spread out more from left to right than the histograms for large n. So choosing a large n will reduce your chances of missing the population center by a large amount. Individual samples will most likely miss the true center, but samples with a large n will miss by less "distance" than samples with a small n.

Exercise C7#19 Bias and variability page 438
Bias and variability The figure below shows histograms of four sampling distributions of different statistics intended to estimate the same parameter.
[Four histograms (figure not reproduced). Slide labels: High Bias, High Variability; Low Bias, Low Variability; Low Bias, High Variability; High Bias, Low Variability.]

Exercise C7#20 IRS Audits page 438
IRS audits The Internal Revenue Service plans to examine an SRS of individual federal income tax returns. The parameter of interest is the proportion of all returns claiming itemized deductions. Which would be better for estimating this parameter: an SRS of 20,000 returns or an SRS of 2000 returns? Justify your answer.
Choosing an SRS of size 20,000 would be better for estimating the population parameter. It will produce a sampling distribution that is much LESS variable than the one for a sample size of 2000, i.e., sample proportions from samples of size 20,000 will tend to be closer to the true population proportion.

Exercise C7#21 MC page 439
At a particular college, 78% of all students are receiving some kind of financial aid. The school newspaper selects a random sample of 100 students and 72% of the respondents say they are receiving some sort of financial aid. Which of the following is true?
(a) 78% is a population and 72% is a sample.
(b) 72% is a population and 78% is a sample.
(c) 78% is a parameter and 72% is a statistic.
(d) 72% is a parameter and 78% is a statistic.
(e) 78% is a parameter and 100 is a statistic.

Exercise C7#22 MC page 439
A statistic is an unbiased estimator of a parameter when
(a) the statistic is calculated from a random sample.
(b) in a single sample, the value of the statistic is equal to the value of the parameter.
(c) in many samples, the values of the statistic are very close to the value of the parameter.
(d) in many samples, the values of the statistic are centered at the value of the parameter.
(e) in many samples, the distribution of the statistic has a shape that is approximately Normal.
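The ideas in Exercises 20 and 22 (sample results center at the parameter, and a larger SRS gives a less variable estimate) can be checked with a quick simulation. This is only a rough sketch: the true proportion of 0.30 is a made-up value (the exercise does not give one), the function name is hypothetical, and the population is treated as very large so each return is drawn independently.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Take `reps` random samples of size n from a very large population
    with true proportion p and return the sample proportion from each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

TRUE_P = 0.30  # hypothetical parameter value, NOT given in the exercise

small = simulate_sample_proportions(TRUE_P, n=2000, reps=200, seed=1)
large = simulate_sample_proportions(TRUE_P, n=20000, reps=200, seed=2)

for label, props in (("n = 2000", small), ("n = 20000", large)):
    center = statistics.mean(props)   # should land near TRUE_P (low bias)
    spread = statistics.stdev(props)  # should shrink as n grows
    print(f"{label}: center = {center:.4f}, spread (SD) = {spread:.4f}")
```

Both sets of sample proportions should center near 0.30, but the n = 20,000 results should bunch much more tightly around it than the n = 2000 results.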
Exercise C7#23 MC page 439
In a residential neighborhood, the median value of a house is $200,000. For which of the following sample sizes is the sample median most likely to be above $250,000?
(a) n = 10
(b) n = 50
(c) n = 100
(d) n = 1000
(e) Impossible to determine without more information.

Exercise C7#24 MC page 439
Increasing the sample size of an opinion poll will reduce the
(a) bias of the estimates made from the data collected in the poll.
(b) variability of the estimates made from the data collected in the poll.
(c) effect of nonresponse on the poll.
(d) variability of opinions in the sample.
(e) variability of opinions in the population.

Exercise C7#25 Dem Bones page 439
Dem bones (2.2) Osteoporosis is a condition in which the bones become brittle due to loss of minerals. To diagnose osteoporosis, an elaborate apparatus measures bone mineral density (BMD). BMD is usually reported in standardized form. The standardization is based on a population of healthy young adults. The World Health Organization (WHO) criterion for osteoporosis is a BMD score that is 2.5 standard deviations below the mean for young adults. BMD measurements in a population of people similar in age and gender roughly follow a Normal distribution.
You better show lots of work!

(a) What percent of healthy young adults have osteoporosis by the WHO criterion?
[Sketch of the N(0, 1) curve with the z-axis marked from -3 to 3.]
P(z < -2.5) ≈ 0.0062
Interpretation in the context of this problem: the probability of randomly choosing a young adult with a BMD more than 2.5 standard deviations below the "norm" is approximately 0.62%, or 62 out of 10,000.

(b) Women aged 70 to 79 are, of course, not young adults. The mean BMD in this age group is about -2 on the standard scale for young adults.
Suppose that the standard deviation is the same as for young adults. What percent of this older population has osteoporosis?
[Sketch of the N(-2, 1) curve. The x-axis runs from -5 to 1, matching z-scores from -3 to 3.]
For x = -2.5:
$z = \dfrac{-2.5 - (-2)}{1} = \dfrac{-0.5}{1} = -0.5$
P(x < -2.5) = P(z < -0.5) ≈ 0.3085
Interpretation in the context of this problem: the probability of randomly choosing an older woman with a BMD more than 2.5 standard deviations below the "norm" (for young adults) is approximately 30.85%, or 31 out of 100.

Exercise C7#26 Squirrels and their food supply page 439
Squirrels and their food supply (3.2) Animal species produce more offspring when their supply of food goes up. Some animals appear able to anticipate unusual food abundance. Red squirrels eat seeds from pinecones, a food source that sometimes has very large crops. Researchers collected data on an index of the abundance of pinecones and the average number of offspring per female over 16 years. Computer output from a least-squares regression on these data and a residual plot are shown below.
FIND YOUR LSRL SHEETS!

(a) Give the equation for the least-squares regression line. Define any variables you use.
offspring = 0.4399(pinecone) + 1.4146
offspring = predicted average number of offspring per female
pinecone = the index of the abundance of pinecones

(b) Is a linear model appropriate for these data? Explain.
A linear model is appropriate because there is no pattern in the residual plot.

(c) Interpret the values of r² and s in context.
r² = 57.2%. 57.2% of the variation in the average number of offspring per female is explained by the LSRL of offspring on the pinecone abundance index.
s = 0.600309. 0.60 is the standard deviation of the residuals.
This is the typical amount by which an observed average number of offspring differs from its predicted average number of offspring on the least-squares regression line.

OTL C7#5
page 448: 35 & 37 (milk in cereal bowl). Please follow the directions on the problem we did in class.
QUIZ WEDNESDAY FEBRUARY 18th
Read and Notes Section 7.3 Page 450-461

Exercise C7#35 Do You Drink the Cereal Milk? page 448
A USA Today poll asked a random sample of 1012 U.S. adults what they do with the milk in the bowl after they have eaten the cereal. Let p̂ be the proportion of people in the sample who drink the cereal milk. A spokesperson for the dairy industry claims that 70% of all U.S. adults drink the cereal milk. Suppose this claim is true.

(a) What is the mean of the sampling distribution of p̂? Why?
$\mu_{\hat{p}} = p = 0.70$, because the sample proportion p̂ is an unbiased estimator of p.

(b) Find the standard deviation of the sampling distribution of p̂. Check to see if the 10% condition is met.
We must check whether N ≥ 10(1012). 10(1012) = 10,120, and this is definitely less than the U.S. adult population, so the condition is met.
Standard deviation of the sampling distribution of p̂:
$\sigma_{\hat{p}} = \sqrt{\dfrac{0.7(1 - 0.7)}{1012}} \approx 0.0144$

(c) Is the sampling distribution of p̂ approximately Normal? Check to see if the Large Counts Condition is met.
We must check whether np ≥ 10 and n(1 - p) ≥ 10.
1012(0.70) = 708.4, and 708.4 ≥ 10: yes
1012(0.30) = 303.6, and 303.6 ≥ 10: yes
We can use the Normal approximation.

(d) Of the poll respondents, 67% said that they drink the cereal milk. Find the probability of obtaining a sample of 1012 adults in which 67% or fewer say they drink the cereal milk if the milk industry spokesperson's claim is true. Does this poll give convincing evidence against the claim?
NOTATION CHANGE: P(x < __) = P(z < __) becomes P(p̂ < __) = P(z < __)

[Sketch of the N(0.70, 0.0144) curve with the axis marked at 0.6568, 0.6712, 0.6856, 0.70, 0.7144, 0.7288, 0.7432.]

$z = \dfrac{0.67 - 0.70}{0.0144} = -2.08$
P(p̂ ≤ 0.67) = P(z ≤ -2.08) = 0.0188

There is only a 0.0188, or about 2 out of 100, chance that a survey result this low would happen by random chance if the claim were true. I think something is fishy!
There is a 0.0188 probability of obtaining a sample in which 67% or fewer say they drink the milk. Because this is a small probability, there is convincing evidence against the claim.

Exercise C7#37 Do You Drink the Cereal Milk? page 448
(a) What sample size would be required to reduce the standard deviation of the sample proportion to one-half the value you found in 35?
Standard deviation of the sampling distribution of p̂:
$\sigma_{\hat{p}} = \sqrt{\dfrac{0.7(1 - 0.7)}{1012}} \approx 0.0144$
Do not make 1012 two times bigger; make it 4 times bigger. Because n sits under the square root, multiplying n by 4 cuts the standard deviation in half: n = 4(1012) = 4048.

(b) If the pollsters had surveyed 1012 teenagers instead of adults, do you think the sample proportion would have been greater or less than 0.67?
I believe it would be less because teenagers do not drink as much milk.

HONESTLY, THIS SHOULD NOT TAKE YOU VERY MUCH TIME !!!

OTL C7#6
FINISH VALENTINE
FINISH FRQ
Remember to do ....
QUIZ WEDNESDAY FEBRUARY 18th
Read and Notes Section 7.3 Page 450-461
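If you want to check your calculator work on Exercises 35 and 37, the arithmetic can be sketched in a few lines of Python. This is just one way to do it, not the required method: the variable names are made up for illustration, and z is rounded to two decimals on purpose so the result lines up with a Table A lookup.

```python
from math import sqrt
from statistics import NormalDist

p = 0.70      # claimed population proportion (from Exercise 35)
n = 1012      # sample size (from Exercise 35)
p_hat = 0.67  # observed sample proportion (from Exercise 35)

# Standard deviation of the sampling distribution of p-hat
sd = sqrt(p * (1 - p) / n)

# Standardize, rounding z to two decimals as a Table A lookup would
z = round((p_hat - p) / sd, 2)

# Area to the left of z under the standard Normal curve
prob = NormalDist().cdf(z)

# Exercise 37: cutting the SD in half takes 4 times the sample size
n_needed = 4 * n
sd_needed = sqrt(p * (1 - p) / n_needed)

print(f"sd = {sd:.4f}, z = {z:.2f}, P(p-hat <= 0.67) = {prob:.4f}")
print(f"n needed = {n_needed}, new sd = {sd_needed:.4f}")
```

Using the unrounded z gives a slightly different probability than the table value, which is why the rounding step is included here.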