AP Class Notes 2-12b-15

C7 S2 Sampling distribution day 1 2015.notebook
February 12, 2015
Advanced Placement Statistics
Thursday February 12, 2015
1. Collect folders and materials
2. Notes Quiz 7.2
3. Sampling distributions for proportions
4. Homework discussion AP Free + MC
5. Return materials
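Example:
The length of human pregnancies has a mean of 266 days and a standard deviation of 16 days. A random sample of 9 pregnant women was observed to have a mean pregnancy length of 270 days, with a standard deviation of 14 days. Identify the parameters and accompanying statistics in this situation.
Number | Population (parameter) or sample (statistic) | Name | Symbol
266 | population (parameter) | mean pregnancy length | μ
16 | population (parameter) | standard deviation of pregnancy length | σ
9 | sample | sample size | n
270 | sample (statistic) | sample mean pregnancy length | x̄
14 | sample (statistic) | sample standard deviation of pregnancy length | s_x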
OTL C7#4
page 437: 10, 12
page 437-8: 14,15,17,19,20
page 439: MC 21,22,23,24
25(review), 26(review)
FINISH READING NOTES 7.2 pages 440-447
Exercise C7#10 Tall Girls page 437
Tall girls According to the National Center for Health Statistics, the distribution of heights for 16-year-old females is modeled well by a Normal density curve with mean μ = 64 inches and standard deviation σ = 2.5 inches. To see if this distribution applies at their high school, an AP® Statistics class takes an SRS of 20 of the 300 16-year-old females at the school and measures their heights. What values of the sample mean would be consistent with the population distribution being N(64, 2.5)? To find out, we used Fathom software to simulate choosing 250 SRSs of size n = 20 students from a population that is N(64, 2.5). The figure below is a dotplot of the sample mean height of the students in each sample.
[Figure: dotplot of the 250 simulated sample means. Annotation: about 25 dots lie at or above approximately 64.7.]
> (a) There is one dot on the graph at 62.4. Explain what this value represents.
One of the 250 simulated SRSs of 20 heights, drawn from a population that is N(64, 2.5), had a sample mean of 62.4 inches. (That sample happened to contain 20 shorter girls. It can happen, just not very often.)
> (b) Describe the distribution. Are there any obvious outliers?
The approximate sampling distribution of the sample mean height is roughly symmetric and bell shaped. Its center is approximately 64 inches. The sample means run from about 62.4 to 65.75 inches, a range of 65.75 − 62.4 = 3.35 inches. There appear to be a few potential outliers. The distribution can be summarized using the mean and standard deviation.
> (c) Would it be surprising to get a sample mean of 64.7 or more in an SRS of size 20 when μ = 64? Justify your answer.
No. A sample mean of 64.7 inches or more is not totally surprising. A sample mean this large occurred in approximately 25 of the 250 simulated samples, or about 10% of the time. So it is a bit unusual but not a huge surprise.
> (d) Suppose that the average height of the 20 girls in the class's actual sample is x̄ = 64.7. What would you conclude about the population mean height μ for the 16-year-old females at the school? Explain.
If the sample mean was actually 64.7 inches, I would conclude that it is quite plausible that the true mean height μ for the 16-year-old females at the school is 64 inches. There is NO convincing evidence against the 64-inch claim.
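The Fathom simulation in Exercise 10 can be imitated with a few lines of Python. This is only a sketch: the seed and the function name are made up for illustration, and the comparison value uses the standard result that the sample mean from a Normal population has standard deviation σ/√n, which these notes have not reached yet.

```python
import random
from statistics import NormalDist, mean

MU, SIGMA, SAMPLE_SIZE = 64.0, 2.5, 20  # values given in Exercise 10


def simulate_sample_means(num_samples, seed=0):
    """Draw num_samples SRSs of size 20 from N(64, 2.5); return each sample mean."""
    rng = random.Random(seed)  # arbitrary seed so the run is repeatable
    return [
        mean(rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE))
        for _ in range(num_samples)
    ]


means = simulate_sample_means(250)
share_sim = sum(m >= 64.7 for m in means) / len(means)

# Theory check (assumes sd of x-bar = sigma / sqrt(n))
sd_xbar = SIGMA / SAMPLE_SIZE ** 0.5
p_theory = 1 - NormalDist(MU, sd_xbar).cdf(64.7)

print(f"simulated share of sample means >= 64.7: {share_sim:.3f}")
print(f"theoretical probability: {p_theory:.3f}")
```

The simulated share will bounce around from run to run, just like the dotplot would, but it should land in the neighborhood of the "about 10%" read off the graph.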
Exercise C7#12 Tall Girls page 437
Tall girls Refer to Exercise 10.
> (a) Make a graph of the population distribution.
> (b) Sketch a possible dotplot of the distribution of sample data for the SRS of size 20 taken by the AP® Statistics class.
Yep sorry guys, I just took this from the solution manual ... my bad ....
Exercise C7#14 Cold Cabin? page 438
Exercises 13 and 14 refer to the following setting. During the winter months, outside temperatures at the Starneses' cabin in Colorado can stay well below freezing (32°F, or 0°C) for weeks at a time. To prevent the pipes from freezing, Mrs. Starnes sets the thermostat at 50°F. The manufacturer claims that the thermostat allows variation in home temperature that follows a Normal distribution with σ = 3°F. To test this claim, Mrs. Starnes programs her digital thermometer to take an SRS of n = 10 readings during a 24-hour period. Suppose the thermostat is working properly and that the actual temperatures in the cabin vary according to a Normal distribution with mean μ = 50°F and standard deviation σ = 3°F.
Cold cabin? The Fathom screen shot below shows the results of taking 500 SRSs of 10 temperature readings from a population distribution that is N(50, 3) and recording the sample minimum each time.
> (a) Describe the approximate sampling distribution.
The approximate sampling distribution of the sample minimum temperature is slightly skewed to the left (but not too badly). The center is around 45°F. The values vary from about 39°F to 51°F, a range of 12°F. Because of the skew, the distribution should be summarized with the five-number summary.
> (b) Suppose that the minimum of an actual sample is 40°F. What would you conclude about the thermostat manufacturer's claim? Explain.
If the thermostat is really working as claimed (μ = 50°F, σ = 3°F), a sample minimum as low as 40°F would very rarely happen: only about 3 out of 500 times by chance, or 0.006 (0.6%). This sample result provides convincing evidence that the manufacturer's claim is FALSE. The thermostat does not appear to have a σ of 3°F.
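The "3 out of 500" count can be sanity checked by repeating the simulation and by a direct calculation. The sketch below assumes the 10 readings behave like independent draws from N(50, 3); the seed and function name are illustrative only.

```python
import random
from statistics import NormalDist

MU, SIGMA, NUM_READINGS = 50.0, 3.0, 10  # values given in the exercise


def simulate_minimums(num_samples, seed=1):
    """Take num_samples SRSs of 10 readings from N(50, 3); return each sample minimum."""
    rng = random.Random(seed)  # arbitrary seed so the run is repeatable
    return [
        min(rng.gauss(MU, SIGMA) for _ in range(NUM_READINGS))
        for _ in range(num_samples)
    ]


mins = simulate_minimums(500)
share_sim = sum(m <= 40 for m in mins) / len(mins)

# Exact calculation under the independence assumption:
# P(min <= 40) = 1 - P(all 10 readings are above 40)
p_single = NormalDist(MU, SIGMA).cdf(40)
p_exact = 1 - (1 - p_single) ** NUM_READINGS

print(f"simulated share of minimums <= 40: {share_sim:.4f}")
print(f"calculated probability: {p_exact:.4f}")
```

Both numbers should come out well under 1%, which is why a minimum of 40°F counts as convincing evidence against the claim.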
Exercise C7#15 A Sample of teens page 438
A sample of teens A study of the health of teenagers plans to measure the blood cholesterol levels of an SRS of 13- to 16-year-olds. The researchers will report the mean x̄ from their sample as an estimate of the mean cholesterol level μ in this population. Explain to someone who knows little about statistics what it means to say that x̄ is an unbiased estimator of μ.
If we chose many SRSs and calculated the sample mean x̄ for each sample, we would not consistently underestimate μ or consistently overestimate μ.
A statistic is an unbiased estimator of the population's "God knows" real answer when a graph of this statistic from many random samples produces a "picture" that is balanced at the real population mean blood cholesterol level of 13- to 16-year-olds (presumably in the USA).
Unbiased estimator can be explained by saying that our sample was selected by a method that makes its result (the average) point to the average of the population.
If many other samples of the same size are selected in the same way, then in the long run the average of all of the sample averages will equal the average of the population that we are trying to estimate.
Exercise C7#17 A sample of teens page 438
A sample of teens Refer to Exercise 15. The sample mean x̄ is an unbiased estimator of the population mean μ no matter what size SRS the study chooses. Explain to someone who knows nothing about statistics why a large random sample will give more trustworthy results than a small random sample.
A sampling distribution describes the results from all possible samples of a given size n. So the center of the entire sampling distribution will be exactly the center of the population. However, smaller sample sizes will have more variability than larger sample sizes. The histograms for a smaller n will spread out more left to right than the histograms for a bigger n. So choosing a large n will reduce your chances of missing the population center by a large amount.
Individual samples will most likely miss the true center.
Samples with a large n will tend to miss by less "distance" than samples with a small n.
Exercise C7#19 Bias and variability page 438
Bias and variability The figure below shows histograms of four sampling distributions of different statistics intended to estimate the same parameter.
[Figure: four histograms of sampling distributions. Labels written on the four histograms:]
High Bias, High Variability
Low Bias, Low Variability
Low Bias, High Variability
High Bias, Low Variability
Exercise C7#20 IRS Audits page 438
IRS audits The Internal Revenue Service plans to examine an SRS of individual federal income tax returns. The parameter of interest is the proportion of all returns claiming itemized deductions. Which would be better for estimating this parameter: an SRS of 20,000 returns or an SRS of 2000 returns? Justify your answer.
Choosing an SRS of size 20,000 would be better for estimating the population parameter. It will produce a sampling distribution that is much LESS variable than a sample size of 2000, i.e., sample proportions from samples of size 20,000 will tend to be closer to the true population parameter.
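How much less variable? A rough sketch, using the standard deviation formula for a sample proportion that is derived later in these notes, √(p(1−p)/n). The true proportion of itemized returns is not given, so the p values below are purely hypothetical; the point is that the ratio does not depend on p.

```python
from math import sqrt


def sd_of_sample_proportion(p, n):
    """Standard deviation of p-hat for an SRS of size n (10% condition assumed met)."""
    return sqrt(p * (1 - p) / n)


for p in (0.1, 0.3, 0.5):  # hypothetical population proportions
    sd_small = sd_of_sample_proportion(p, 2000)
    sd_large = sd_of_sample_proportion(p, 20000)
    print(f"p = {p}: sd(n=2000) = {sd_small:.4f}, sd(n=20000) = {sd_large:.4f}")

# Ten times the sample size shrinks the spread by a factor of sqrt(10), about 3.16
ratio = sd_of_sample_proportion(0.3, 2000) / sd_of_sample_proportion(0.3, 20000)
print(f"ratio of standard deviations: {ratio:.4f}")
```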
Exercise C7#21 MC page 439
At a particular college, 78% of all students are receiving some kind of financial aid. The school newspaper selects a random sample of 100 students and 72% of the respondents say they are receiving some sort of financial aid. Which of the following is true?
(a) 78% is a population and 72% is a sample.
(b) 72% is a population and 78% is a sample.
(c) 78% is a parameter and 72% is a statistic.
(d) 72% is a parameter and 78% is a statistic.
(e) 78% is a parameter and 100 is a statistic.
Exercise C7#22 MC page 439
A statistic is an unbiased estimator of a parameter when
(a) the statistic is calculated from a random sample.
(b) in a single sample, the value of the statistic is equal to the value of the parameter.
(c) in many samples, the values of the statistic are very close to the value of the parameter.
(d) in many samples, the values of the statistic are centered at the value of the parameter.
(e) in many samples, the distribution of the statistic has a shape that is approximately Normal.
Exercise C7#23 MC page 439
In a residential neighborhood, the median value of a house is $200,000. For which of the following sample sizes is the sample median most likely to be above $250,000?
(a) n = 10
(b) n = 50
(c) n = 100
(d) n = 1000
(e) Impossible to determine without more information.
Exercise C7#24 MC page 439
Increasing the sample size of an opinion poll will reduce the
(a) bias of the estimates made from the data collected in the poll.
(b) variability of the estimates made from the data collected in the poll.
(c) effect of nonresponse on the poll.
(d) variability of opinions in the sample.
(e) variability of opinions in the population.
Exercise C7#25 Dem Bones page 439
Dem bones (2.2) Osteoporosis is a condition in which the bones become brittle due to loss of minerals. To diagnose osteoporosis, an elaborate apparatus measures bone mineral density (BMD). BMD is usually reported in standardized form. The standardization is based on a population of healthy young adults. The World Health Organization (WHO) criterion for osteoporosis is a BMD score that is 2.5 standard deviations below the mean for young adults. BMD measurements in a population of people similar in age and gender roughly follow a Normal distribution.
You better show lots of work!
(a) What percent of healthy young adults have osteoporosis by the WHO criterion?
N(0, 1)
[Figure: standard Normal curve with the z-axis marked at −3, −2, −1, 0, 1, 2, 3.]
$P(z < -2.5) \approx 0.0062$
Interpretation in the context of this problem: The probability of randomly choosing a healthy young adult with a BMD more than 2.5 standard deviations below the "norm" is approximately 0.62%, or 62 out of 10,000.
(b) Women aged 70 to 79 are, of course, not young adults. The mean BMD in this age group is about −2 on the standard scale for young adults. Suppose that the standard deviation is the same as for young adults. What percent of this older population has osteoporosis?
N(−2, 1)
[Figure: Normal curve with two axes. z: −3, −2, −1, 0, 1, 2, 3. x: −5, −4, −3, −2, −1, 0, 1.]
$x = -2.5$
$z = \dfrac{-2.5 - (-2)}{1} = \dfrac{-0.5}{1} = -0.5$
$P(x < -2.5) = P(z < -0.5) \approx 0.3085$
Interpretation in the context of this problem: The probability of randomly choosing an older woman with a BMD more than 2.5 standard deviations below the "norm" (for young adults) is approximately 30.85%, or about 31 out of 100.
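Both table lookups can be double checked with Python's built-in `statistics.NormalDist`. This is just a quick check of the arithmetic above, not a substitute for showing the work; the variable names are illustrative.

```python
from statistics import NormalDist

CUTOFF = -2.5  # WHO criterion on the young-adult standard scale

young_adults = NormalDist(mu=0, sigma=1)   # part (a): N(0, 1)
older_women = NormalDist(mu=-2, sigma=1)   # part (b): N(-2, 1)

p_young = young_adults.cdf(CUTOFF)
p_older = older_women.cdf(CUTOFF)

# z-score of the cutoff within the older group
z_older = (CUTOFF - older_women.mean) / older_women.stdev

print(f"(a) P(z < -2.5) = {p_young:.4f}")
print(f"(b) z = {z_older}, P(x < -2.5) = {p_older:.4f}")
```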
Exercise C7# 26 Squirrels and their food supply page 439
Squirrels and their food supply (3.2) Animal species produce more offspring when their supply of food goes up. Some animals appear able to anticipate unusual food abundance. Red squirrels eat seeds from pinecones, a food source that sometimes has very large crops. Researchers collected data on an index of the abundance of pinecones and the average number of offspring per female over 16 years. Computer output from a least-squares regression on these data and a residual plot are shown below.
FIND YOUR LSRL SHEETS!
> (a) Give the equation for the least-squares regression line. Define any variables you use.
predicted offspring = 0.4399(pinecone) + 1.4146
predicted offspring = predicted average number of offspring per female
pinecone = the index of the abundance of pinecones
> (b) Is a linear model appropriate for these data? Explain.
A linear model is appropriate because there is no pattern in the residual plot.
> (c) Interpret the values of r² and s in context.
r² = 57.2%. 57.2% of the variation in the average number of offspring per female is accounted for by the LSRL of offspring on the pinecone abundance index.
s = 0.600309. About 0.60 is the standard deviation of the residuals. This is the typical amount by which an observed average number of offspring differs from its predicted average number of offspring on the Least Squares Regression Line.
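To see how the LSRL and s work together, here is a small sketch that plugs a cone index into the fitted line. The slope, intercept, and s come from the answer above; the cone index of 2.0 is a made-up input for illustration, not a value from the data set.

```python
SLOPE = 0.4399      # from the regression output
INTERCEPT = 1.4146  # from the regression output
S = 0.600309        # standard deviation of the residuals


def predicted_offspring(pinecone_index):
    """Predicted average number of offspring per female for a given cone index."""
    return INTERCEPT + SLOPE * pinecone_index


example_index = 2.0  # hypothetical cone index, chosen only for illustration
prediction = predicted_offspring(example_index)

print(f"predicted offspring at index {example_index}: {prediction:.4f}")
print(f"typical prediction error: about {S:.2f} offspring per female")
```

Each one-unit increase in the cone index raises the predicted average number of offspring by 0.4399, and actual years typically land about 0.60 away from the line.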
9.2 SAMPLE PROPORTIONS: QUALITATIVE WORLD
"ρ, p̂ world"
$\hat{p} = \dfrac{\text{count of successes}}{\text{total sample size } n}$
This is a BINOMIAL SITUATION
http://www.youtube.com/watch?v=3aAtFrWft2k
CONCLUSION for NOTATION
For "both" worlds ...
Population Parameters: N = population size
Sample Statistics: n = sample size
For "qualitative" worlds ... (surveys and proportions)
Population Parameters: ρ = population proportion
Sample Statistics: p̂ = sample proportion
There are three "center" proportions to understand.

ρ (pronounced rho): population proportion
It is the proportion of the population that has the given characteristic.
It is usually unknown, or I call it the (God knows) value.
This value is permanent for the population.
Example: 53% of the voters voted for Obama.

p̂ (pronounced p-hat): sample proportion
It is the proportion of the sample that has the given characteristic.
It is always calculated from one sample.
This value varies from one sample to the next.
Example: Gallup, after asking a random sample of 750 voters, predicts that 51% of the voters will vote for Obama.

μ_p̂ (pronounced master p-hat; this is the notation our book uses): center of the sampling distribution of the sample proportion
It is the center of the sampling distribution of p̂ for the given characteristic.
It is NEVER calculated from scratch but always known by theory.
This value is always equal to the population proportion. It does not vary.
Binomial Mean and Standard Deviation
Return to introductory example ...
How do we arrive at these new formulas? In Chapter 6.3, for a binomial count X, $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$.
$\hat{p} = \dfrac{\text{count of successes}}{\text{total sample size } n} = \dfrac{X}{n}$
Binomial example ...
If there are 60 trials of a coin toss (p = 0.5), then we would expect 60(0.5) = 30 heads.
In the "Inference" world we want percentages, not counts, so divide both formulas by n ...
$\mu_{\hat{p}} = \dfrac{np}{n} = p$
and
$\sigma_{\hat{p}} = \dfrac{\sqrt{np(1-p)}}{n} = \sqrt{\dfrac{np(1-p)}{n^2}} = \sqrt{\dfrac{p(1-p)}{n}}$
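The "divide both formulas by n" step can be checked numerically with the coin-toss example (n = 60, p = 0.5). A minimal sketch; the variable names are illustrative.

```python
from math import isclose, sqrt

n, p = 60, 0.5  # coin-toss example from the notes

# Count world (binomial, Chapter 6.3)
mean_count = n * p
sd_count = sqrt(n * p * (1 - p))

# Proportion world: divide the count formulas by n
mean_prop = mean_count / n
sd_prop_by_division = sd_count / n

# Simplified formula for the standard deviation of p-hat
sd_prop_formula = sqrt(p * (1 - p) / n)

print(f"count:      mean = {mean_count}, sd = {sd_count:.4f}")
print(f"proportion: mean = {mean_prop}, sd = {sd_prop_formula:.4f}")
print("both routes agree:", isclose(sd_prop_by_division, sd_prop_formula))
```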
10% condition (n ≤ N/10): This rule lets us know if we can use the formula for the new standard deviation of the sample proportion.
If this rule fails (we are overachieving)
Large Counts condition (np ≥ 10 and n(1−p) ≥ 10): This rule tells us if we can use the Normal curve to help us make predictions about the future.
Remember the "wave" and program BISHAPE
Exercise C7#31 Airport security page 447
Airport security The Transportation Security Administration (TSA) is responsible for airport safety. On some flights, TSA officers randomly select passengers for an extra security check before boarding. One such flight had 76 passengers: 12 in first class and 64 in coach class. TSA officers selected an SRS of 10 passengers for screening. Let p̂ be the proportion of first-class passengers in the sample.
(a) Is the 10% condition met in this case? Justify your answer.
(b) Is the Large Counts condition met in this case? Justify your answer.
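Both conditions are quick arithmetic checks, so they are easy to script. The sketch below applies them to the airport numbers (N = 76 passengers, n = 10 screened, p = 12/76 in first class); the function names are made up for illustration.

```python
def ten_percent_condition(n, population_size):
    """True when the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


def large_counts_condition(n, p):
    """True when expected successes and failures are both at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10


N, n = 76, 10
p = 12 / 76  # proportion of first-class passengers on the flight

print(f"10% of population = {0.10 * N:.1f}, sample size = {n}")
print("10% condition met:", ten_percent_condition(n, N))
print(f"np = {n * p:.2f}, n(1-p) = {n * (1 - p):.2f}")
print("Large Counts condition met:", large_counts_condition(n, p))
```

A sample of 10 out of 76 is more than 10% of the flight, and the expected number of first-class passengers in the sample is far below 10, so this sketch reports that neither condition is met.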
Exercise C7#33 Hispanic workers page 448
Hispanic workers A factory employs 3000 unionized workers, of whom 30% are Hispanic. The 15-member union executive committee contains 3 Hispanics. What would be the probability of 3 or fewer Hispanics if the executive committee were chosen at random from all the workers?
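One way to approach this is to treat the count of Hispanic committee members as binomial with n = 15 and p = 0.30. That is an approximation (the committee is chosen without replacement), but 15 is far less than 10% of 3000, so it should be reasonable. A sketch using only the standard library; the helper name is illustrative.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial count X with n trials and success probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 15, 0.30  # committee size and proportion of Hispanic workers
prob_3_or_fewer = binomial_cdf(3, n, p)

print(f"P(X <= 3) = {prob_3_or_fewer:.4f}")
```

A probability near 0.30 means a committee with 3 or fewer Hispanic members would not be unusual under random selection.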
Exercise C7#36 Do you go to church? page 448
Do you go to church? The Gallup Poll asked a random sample of 1785 adults whether they attended church during the past week. Let p̂ be the proportion of people in the sample who attended church. A newspaper report claims that 40% of all U.S. adults went to church last week. Suppose this claim is true.
(a) What is the mean of the sampling distribution of p̂? Why?
(b) Find the standard deviation of the sampling distribution of p̂. Check to see if the 10% condition is met.
(c) Is the sampling distribution of p̂ approximately Normal? Check to see if the Large Counts condition is met.
(d) Of the poll respondents, 44% said they did attend church last week. Find the probability of obtaining a sample of 1785 adults in which 44% or more say they attended church last week if the newspaper report's claim is true. Does this poll give convincing evidence against the claim? Explain.
Exercise C7#38 Do you go to church? page 448
Do you go to church? What sample size would be required to reduce the standard deviation of the sampling distribution to one-third the value you found in Exercise 36(b)? Justify your answer.
OTL C7#5
page 448: 35 & 37
milk in cereal bowl
please follow the directions on the problem we
did in class.
QUIZ WEDNESDAY FEBRUARY 18th
Read and Notes Section 7.3
Page 450-461
Exercise C7#35 Do You Drink the Cereal Milk? page 448
A USA Today poll asked a random sample of 1012 U.S. adults what they do with the milk in the bowl after they have eaten the cereal. Let p̂ be the proportion of people in the sample who drink the cereal milk. A spokesperson for the dairy industry claims that 70% of all U.S. adults drink the cereal milk. Suppose this claim is true.
a) What is the mean of the sampling distribution of p̂? Why?
Master sample proportion: $\mu_{\hat{p}} = 0.70$
This is because p̂ is an unbiased estimator of ρ.
b) Find the standard deviation of the sampling distribution of p̂. Check to see if the 10% condition is met.
We must check to see if N ≥ 10(1012).
10(1012) = 10,120, and this is definitely less than the U.S. adult population.
U.S. adults ≥ 10,120
Standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{\dfrac{0.7(1-0.7)}{1012}} = 0.0144$
c) Is the sampling distribution of p̂ approximately Normal? Check to see if the Large Counts condition is met.
We must check to see if np ≥ 10 and if n(1−p) ≥ 10.
1012(0.70) = 708.4, and 708.4 ≥ 10: yes
1012(0.30) = 303.6, and 303.6 ≥ 10: yes
We can use the Normal approximation.
d) Of the poll respondents, 67% said that they drink the cereal milk. Find the probability of obtaining a sample of 1012 adults in which 67% or fewer say they drink the cereal milk if the milk industry spokesperson's claim is true. Does this poll give convincing evidence against the claim?
NOTATION CHANGE
P(x < ___) = P(z < ___) becomes P(p̂ < ___) = P(z < ___)
N(0.70, 0.0144)
[Figure: Normal curve with the p̂-axis marked at 0.6568, 0.6712, 0.6856, 0.7, 0.7144, 0.7288, 0.7432.]
$z = \dfrac{0.67 - 0.70}{0.0144} = -2.08$
$P(\hat{p} \le 0.67) = P(z \le -2.08) = 0.0188$
There is only a 0.0188, or ≈ 2 out of 100, chance that a survey result this low would happen by random chance alone if the claim were true. I think something is fishy!
There is a 0.0188 probability of obtaining a sample in which 67% or fewer say they drink the milk. Because this is a small probability, there is convincing evidence against the claim.
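The whole chain for Exercise 35 (standard deviation, z-score, tail probability) fits in a few lines of Python, which makes it easy to check the hand calculation. This sketch keeps full precision, so the probability comes out slightly different from the table value found with z rounded to −2.08.

```python
from math import sqrt
from statistics import NormalDist

n = 1012          # sample size
p_claim = 0.70    # claimed population proportion
p_hat = 0.67      # observed sample proportion

sd_p_hat = sqrt(p_claim * (1 - p_claim) / n)
z = (p_hat - p_claim) / sd_p_hat

# Normal approximation to the sampling distribution of p-hat
prob = NormalDist(mu=p_claim, sigma=sd_p_hat).cdf(p_hat)

print(f"standard deviation of p-hat = {sd_p_hat:.4f}")
print(f"z = {z:.2f}")
print(f"P(p-hat <= 0.67) = {prob:.4f}")
```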
Exercise C7#37 Do You Drink the Cereal Milk? page 448
What sample size would be required to reduce the standard deviation of the sample proportion to one-half the value you found in Exercise 35?
Standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{\dfrac{0.7(1-0.7)}{1012}} = 0.0144$
Do not make 1012 two times bigger; make it 4 times bigger, because n sits under the square root: n = 4(1012) = 4048.
b) If the pollsters had surveyed 1012 teenagers instead of adults, do you think the sample proportion would have been greater or less than 0.67?
I believe it would be less because teenagers do not drink as much milk.
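The "4 times bigger" rule for the sample-size question generalizes: to shrink the standard deviation of p̂ by a factor of k, multiply n by k². A short sketch that checks this for the cereal numbers; the function name is illustrative.

```python
from math import isclose, sqrt


def required_sample_size(current_n, shrink_factor):
    """Sample size needed to divide the sd of p-hat by shrink_factor."""
    return current_n * shrink_factor ** 2


def sd_of_sample_proportion(p, n):
    return sqrt(p * (1 - p) / n)


p, n = 0.70, 1012
new_n = required_sample_size(n, 2)  # cut the standard deviation in half

old_sd = sd_of_sample_proportion(p, n)
new_sd = sd_of_sample_proportion(p, new_n)

print(f"new sample size: {new_n}")
print(f"old sd = {old_sd:.4f}, new sd = {new_sd:.4f}")
print("new sd is half the old sd:", isclose(new_sd, old_sd / 2))
```

The same function with a shrink factor of 3 gives 9 times the original sample size, which is the idea behind the one-third question in Exercise 38.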