Bias - s3.amazonaws.com

advertisement
Chapter 7 Notes Sampling Distributions AP Stats
Name__________________________
7.1 What is a Sampling Distribution?
Parameter – a number that describes some
characteristic of the population. The value of a
parameter is usually not known because we cannot
examine the entire population.
Statistic – number that describes some
characteristics of a sample.
μ – for the population mean and 𝑥̅ for the sample mean.
ρ – (greek letter rho) for the population proportion and 𝑝̂ (p-hat) the sample proportion is used to represent the
unknown parameter ρ
EXAMPLE:
Identify the population, the parameter, the sample, and the statistics in each of the following settings.
a) A pediatrician wants to know the 75th percentile for the distribution of heights of 10 year old boys, so she takes a
sample of 50 patients and calculates Q3 = 56 inches.
The population is all 10 year old boys, parameter in the 75th percentile, or Q3. The sample is the 50 10 yr old boys
included in the sample, the statistic is the sample Q3 = 56 inches.
b) A Pew Research Center Poll asked 1102 12 to 17 year olds in the US if they have a cell phone. Of the
respondents, 71% said yes.
The population is all 12 to 17 yr old in the US, Parameter is the proportion p with cell phones. The sample is the 1102
12 to 17 yr old in the sample; the statistic is the sample proportion with a cell phone p-hat = 0.71
SAMPLING VARIABILITY
sampling variability: the value of a statistic varies in repeated random sampling
ACTIVITY:
Draw the class Dot Plot below:
Proportion of red chips is p = ½
1. Choose 20 red chips
2. Write the proportion of red chips
3. Record that proportion on the class dot plot
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Describe the distribution:
Definition:
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible
samples of the same size from the same population.
Be careful of interchanging the following terms:
1) Population distribution describes individuals from the population
2) Sample distribution describes individuals from the sample
3) Sampling Distribution describes how a statistic varies in many samples from the population
Describing Sampling Distributions
The fact that statistics from samples have definite sampling distributions allows us to answer the question:
“How trustworthy is a statistic as an estimator of the parameter?”
To answer this: consider the center spread and shape.
The dot plot on the right shows the approximate sampling
distribution of p-hat. The center is very close to the parameter
value of 0.5, the parameter value.
Understanding a sampling distribution:
If we took all possible samples of 20 chips from the population,
calculated p-hat for each sample and then found the mean of all
those p-values, we’d get exactly 0.5.
We say that p-hat is an unbiased estimator of p.
Definition:
A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is
equal to the true value of the parameter being estimated.
Unbiased does not mean perfect! An unbiased estimator will almost always provide an estimate that is not
equal to the value of the population parameter. It is called “unbiased” because in repeated samples, the
estimates won’t consistently be too high or consistently too low.
n=100
n=1000
Larger samples have a clear advantage over smaller samples. They are much more likely to produce an
estimate close to the true value of the parameter.
The variability of a statistic is described by the spread of its sampling distribution. This spread is determined
primarily by the size of the random sample. Larger samples give smaller spread. The spread of the sampling
distribution does not depend on the size of the population, as long as the population is at least 10
times larger than the sample.
Bias, variability, and shape
We can think of the true value of the population parameter as the bull’s- eye on a target and of the sample
statistic as an arrow fired at the target. Both bias and variability describe what happens when we take many
shots at the target.
Bias means that our aim is off and
we consistently miss the bull’s-eye
in the same direction. Our sample
values do not center on the
population value.
High variability means that
repeated shots are widely
scattered on the target. Repeated
samples do not give very similar
results.
The lesson about center and spread is clear:
given a choice of statistics to estimate an
unknown parameter, choose one with no
or low bias and minimum variability.
The blue line represents the true number of tanks.
Note the different shapes. Which statistic gives the
best estimator? Why?
Why n-1 when calculating sample variance and standard deviation?
𝑠2 =
1
∑(𝑥𝑖 − 𝑥̅ )2
𝑛−1
If we replace 1/n with 1/(n-1) then the variance becomes a biased estimator. (a very watered down
explanation)
P 429 1,3,5,7,9,11,13,17-20
Identify the population, the parameter, the sample, and the statistic in each setting.
1. STOP SMOKING. A random sample of 1000 people who signed a card saying they intended to quit
smoking were contacted nine months later. It turned out that 210 (21%) of the sampled individuals that
had not smoked over the past six months.
Population: people who signed a card saying that they intend to quit smoking
Parameter of interest: proportion of the population who signed the cards saying they would not smoke who
actually quit smoking
Sample: a random sample of 1,000 people who signed the cards
Sample statistics: 𝑝̂ = 0.21
3. Hot turkey. Tom is cooking a large turkey breast for a holiday meal. He wants to be sure that the turkey
is safe to eat, which requires a minimum internal temperature of 165° F. Tom uses a thermometer to measure
the temperature of the turkey meat at four randomly chosen points. The minimum reading in the sample is
170° F.
Population: all the turkey meat.
Parameter of interest: minimum temperature.
Sample: 4 randomly chosen points in the turkey.
Sample statistic: sample minimum = 170F.
For each boldface number in 5 and 7, 1) state whether it is a parameter or a statistic and 2) use appropriate
notation to describe each number for example p = 0.65
5. Get your bearings. A large container of ball bearings has mean diameter 2.5003 centimeters (cm). This is
within the specifications for acceptances of the container by the purchases. By chance, an inspector chooses
100 bearings from the container that have mean diameter 2.5009 cm. Because, this is outside the specified
limits, the container is mistakenly rejected.
μ = 2.5003 is a parameter (related to the population of all the ball bearings in the container)
𝑥̅ = 2.5009 is a statistic (related to the sample of 100 ball bearings).
7. Unlisted Numbers. A telemarketing firm in LA uses a device that dials residential phone numbers in that
city at random. Of the first 100 numbers dialed, 48%, are unlisted. This is not surprising because 52% of all
LA residential phones are unlisted.
𝑝̂ = 0.48 is a statistic (related to the sample of 100 numbers dialed) and p = 0.52 is a parameter (related to the
population of all residential phone numbers in Los Angeles).
9. Doing homework. A school newspaper article claims that 60% of the students at a large high school did
their assigned homework last week. Some skeptical AP Statistics students want to investigate whether this
claim is true, so they choose an SRS of 100 students from the school to interview. What values of the sample
proportion 𝑝̂ would be consistent with the claim that the population proportion of students who completed all
their homework is p = 0.60? To find out, we used Fathom software to simulate choosing 250 SRSs of size n =
100 students from a population in which p = 0.60. The figure below is the dotplot of the sample proportion 𝑝̂ of
students who did all their homework.
a) Is this the sampling distribution of 𝑝̂ ? Justify your answer.
This is not the exact sampling distribution because that
would require a value of 𝑝̂ for all possible samples of size 100.
However, it is an approximation of the sampling distribution that
we created through simulation.
b) Describe the distribution. Are there any obvious outliers?
The distribution is centered at 0.60 and is reasonably symmetric and bell-shaped. Values vary from about 0.47
to 0.74. The values at 0.47, 0.73 and 0.74 are outliers.
c) Suppose that 45 of the 100 students in the actual sample say that they did all their homework last week.
What would you conclude about the newspaper article’s claim? Explain.
If we found that only 45 students said that they did all their homework last week, we would be skeptical of the
newspaper’s claim that 60% of students did their homework last week. None of the simulated samples had a
proportion this low.
11. Doing Homework. Refer to #9.
a) Make a graph of the population distribution given that there are 3,000 students in the school.
REMEMBER THESE ARE INDIVIDUALS SINCE IT IS A POPULATION
DISTRIBUTION
b) Sketch a possible graph of the distribution of sample data for the SRS of size 100 taken by the AP Statistics
students.
REMEMBER THESE ARE INDIVIDUALS SINCE IT IS A SAMPLE
DISTRIBUTION
13. During winter months outside temperatures at the Starnes’s cabin in Colorado can stay well below freezing
(32° F or 0° C) for weeks at a time. To prevent the pipes from freezing, Mrs. Starnes sets the thermostat at
50° F. The manufacturer claims that the thermostat allows variation in home temperature that follows a Normal
distribution with σ = 3° F. To test this claim, Mrs. Starnes programs her digital thermometer to take an SRS of n
= 10 readings during a 24 hour period. Suppose the thermostat is working properly and that the actual
temperatures in the cabin vary according to a Normal distribution with mean μ = 50° F and standard deviation
σ = 3° F.
The Fathom screen shot below shows the results of taking 500 SRSs of 10 temperature readings from a
population distribution that’s N(50,3) and recording the sample variance s2 each time.
a) Describe the approximate sampling distribution
The approximate sampling distribution is skewed to the right with
a center at 9° F2. The values vary from about 2 to 27.5° F2
b) Suppose that the variance from an actual sample is sx2 = 25. What would you conclude about the
thermostat manufacture’s claim? Explain.
A sample variance of 25 is quite large compared with what we would expect, since only one out of 500 SRSs had a
variance that high. It suggests that the manufacturer’s claim is false and that the thermostat actually has more variability
than claimed.
17. IRS AUDITS the IRS plans to examine an SRS of individual federal income tax returns from each state.
One variable of interest is the proportion of returns claiming itemized deductions. The total number of tax
returns in each state varies from over 15 million in California to about 240,000 in Wyoming.
a) Will the sampling variability of the sample proportion change from state to state if an SRS of 2000 tax
returns is selected in each state? Explain your answer.
Since the smallest number of total tax returns (i.e., the smallest population) is still more than 10 times
the sample size, the variability of the sample proportion will be (approximately) the same for all states.
b) Will the sampling variability of the sample proportion change from state to state if an SRS of 1% of all tax
returns is selected in each state? Explain your answer.
Yes. It will change—the sample taken from Wyoming will be about the same size, but the sample in, for
example, California will be considerably larger, and therefore the variability of the sample proportion
will be smaller.
18. Predict the election Just before a presidential election, a national opinion poll increases the size of its
weekly random sample form the usual 1500 people to 4000 people.
a) Does the larger random sample reduce the bias of the poll result?
A larger sample does not reduce the bias of a poll result. If the sampling technique results in
bias, simply increasing the sample size will not reduce the bias.
b) Does it reduce the variability of the result? Explain.
A larger sample will reduce the variability of the result. More people means more information which means less
variability.
19. The figure below shows histograms of four sampling distributions of different statistics intended to estimate
the same parameter.
a) Which statistics are unbiased estimators? Justify your answer
Graph (c) shows an unbiased estimator because the mean of
the distribution is very close to the population parameter.
b) Which statistic does the best job of estimating the parameter?
Explain.
The graph in (b) shows the statistic that does the best job at
estimating the parameter. Although it is biased, the bias is small
and the statistic has very little variability.
20. A sample of teens. A study of the health of teenagers plans to measure the blood cholesterol of an SRS
of 13 to 16 year olds. The researchers will report the mean 𝑥̅ from their sample as an estimate of the mean
cholesterol level μ in this population.
a) Explain to someone who knows no statistics what it means to say that 𝑥̅ is an unbiased estimator of μ.
If we choose many samples, the average of the 𝑥̅ values from these samples will be close to μ. In other words,
the sampling distribution of 𝑥̅ is centered at the population mean μ we are trying to estimate.
b) The sample result 𝑥̅ is an unbiased estimator of the population μ no matter what size SRS the study
chooses. Explain to someone who knows no statistics why a large random sample gives more trustworthy
results than a small random sample.
A larger sample will give more information and, therefore, more precise results. The variability in the
distribution of the sample average decreases as the sample size increases.
7.2 SAMPLE PROPORTIONS
http://www.rossmanchance.com/applets/Reeses3/ReesesPieces.html
Describe the sampling distribution:
Describe the sampling distribution:
SHAPE: Roughly Symmetric, unimodal, and bell
shaped. A Normal curve would approximate
this distribution fairly well.
SHAPE: Roughly symmetric, unimodal, and bell
shaped. A Normal curve would approximate
this distribution fairly well.
CENTER: The mean of 400 sample proportions
is 0.440. This is quite close to the actual
proportion of 0.45.
CENTER: The mean of 400 sample proportions
is 0.451. This is quite close to the actual
proportion of 0.45.
SPREAD: The standard deviation of the 400
values of 𝑝̂ from these samples is 0.101.
SPREAD: The standard deviation of the 400
values of 𝑝̂ is 0.067.
So what happens as n increases: For a specific value of p, the standard deviation gets smaller as n gets
bigger. The value of σp depends on both p and n.
Go back to the applet and change p to .15
FORMULAS:
SUMMARY OF SAMPLING DISTRIBUTION of 𝑝̂
Choose an SRS of size n from a population of size N with proportion p of successes.
1) Let 𝑝̂ be the sample proportion of successes __context of problem_________________
2) μ = p = _____ from problem
3) Check 10% Rule n< 1/10N if so then,
σ=
or it could be given in problem
4) Check it is approximately normal: np> 10 and n(1 – p) >10.
EXAMPLE of Using Normal Approximation of 𝑝̂
A polling organization asks an SRS of 1500 first-year college students how far away their home is. Suppose that 35% of
all first-year students actually attend college within 50 miles of home. What is the probability that the random sample of
1500 students will give a result within 2 percentage points of this true value?
State: Find P(0.33< 𝑝̂ < 0.37)
Plan: n = 1500 p = 0.35 (the proportion of first year students who attend college within 50 miles of home)
𝜇𝑝̂ = p = 0.35
10% rule: We assume that there are more than 10(1500) = 15,000 first year college students
𝑝(1−𝑝)
𝜎𝑝̂ = √
𝑛
0.35(0.65)
=√
1500
=0.0123
Check for Approximately Normal: np = 1500(.35) = 525 > 10 and n(1-p)=1500(.65)=975 > 10
Therefore we can consider the distribution of 𝑝̂ to be Approximately Normal.
Draw and Shade the Normal Distribution
Do: For 𝑝̂ = 0.33 z =
0.33−0.35
0.0123
= -1.63
For 𝑝̂ = 0.37
z=
0.37−0.35
0.0123
= 1.63
P(0.33 < 𝑝̂ < 0.37) = P(-1.63 < z < 1.63) = 0.8968
Conclude: About 90% of all SRS’s of size 1500 will give a result within 2 percentage points of the truth about the
population.
P 431 21-24 Page 439 27,29,33,35,37,41
21. A newpaper poll reported that 73% of respondents liked business tycoon Donald Trump. The number 73% is
a) a population b) a parameter c) a sample
d) a statistic
e) an unbiased estimator
22. The name for the pattern of values that a statistic takes when we sample repeatedly from the same population is
a) the bias of the statistic
b) the variability of the statistic
d) the distribution of the sample data
c) the population distribution
e) the sampling distribution of the statistic
23. If we take simple random sample of size n = 500 from a population of size 5,000,000 the variability of our estimate will be
a) Much less than the variability for a sample of size n = 500 from a population of size 50,000,000
b) Slightly less than the variability for a sample of size n = 500 from a population of size 50,000,000.
c) About the same as the variability for a sample size n = 500 from a population of size 50,000,000.
d) Slightly greater than the variability for a sample of size n = 500 from a population of size 50,000,000
e) Much greater than the variability for a sample of n = 500 from a population of size 50,000,000.
24. Increasing the sample size of an opinion poll will
a) reduce the bias of the poll result
b) reduce the variability of the poll result
c) reduce the effect of nonresponse on the poll.
d) reduce the variability of opinions
e) all of the above.
Page 439
Explain why you cannot use the methods of this section to find the desired probability
33. Hispanic Workers A factory employs 3000 unionized workers of whom 30% are Hispanic. The 15-member union
executive committee contains 3 Hispanics. What would be the probability of 3 or fewer Hispanics if the executive
committee were chosen at random for all the workers?
The Normal condition is not met here. Np = 15(.3) = 4.5 < 10 No greater than or = to
How would you find this probability using a different method?
It is a binomial distribution. Binary Yes, Independent Yes, Fixed number of successes Yes, Probability of Success = Yes
15
15
( ) . 30 . 715 + ⋯ + ( ) . 33 . 712 = 0.2969.
0
3
35. Do you drink the cereal milk? A USA Today Poll asked a random sample of 1012 US adults what they do with the
milk in the bowl after they have eaten the cereal. Of the respondents, 67% said they drink it. Suppose that 70% of US
adults actually drink the cereal mile. Let 𝑝̂ be the proportion of people in the sample who drink the cereal milk.
a) What is the mean of the sampling distribution of 𝑝̂ ? Why?
The mean of the sampling distribution is the same as the population proportion so it is μp-hat = p = 0.67
b) Find the standard deviation of the sampling distribution of 𝑝̂ . Check to see if the 10% condition is met.
The population (all U.S. adults) is clearly at least 10 times as large as
the sample (the 1012 surveyed adults) so the 10% condition is met.
c) Is the sampling distribution of 𝑝̂ approximately Normal? Check to see if the Normal condition is met
The sampling distribution is approximately Normal since np = 1012(.70) = 708.4> 10 and n(1-p) = 1012(0.30)=303.60>10
Therefore approximately Normal.
d) Find the probability of obtaining a sample of 1012 adults in which 67% or fewer say they drink the cereal mile. Do you
have any doubts about the results of this poll?
Draw the Normal Curve:
z = 0.67 - .70 = -2.08 P(p-hat < .67) = P(z < -2.08) = 0.0188
0.0144
This is a fairly unusual result if 70% of the population actually drink the cereal milk.
37. Do you drink the cereal milk? What sample size would be required to reduce the standard deviation of the
sampling distribution to ½ the value you found in Exercise 35B? Justify your answer.
4048 since n is in the denominator, I would need to double the denominator to divide by two. However, since the
n is square root, I would to multiply it by 4, so that the the square root of n would be doubled, hence dividing the
original standard deviation by 2.
41. On time shipping. You mail order company advertises that is ships 90% of its orders within three working days.
You select an SRS of 100 of the 5000 orders received in the past week for an audit. The audit reveals that 86 of these
orders were shipped on time.
a) If the company really ships 90% of its order on time, what is the probability that the proportion in an SRS of 100 orders
is as small as the proportion in your sample or smaller?
DRAW:
b) A critic says, “Aha! You claim 90%, but in your sample the one time percentage is lower than that. So the 90% claim is
wrong.” Explain in simple language why your probability calculation in a) shows theat the result of the sample does not
refute the 90% claim.
7.3 Sample Means
A is one sample of 100 households
B is a distribution of the sample means from
many samples of size 100
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Example:
The height of young women follows a Normal distribution with mean μ = 64.5 inches and standard
deviation σ = 2.5 inches.
a) Find the probability that a randomly selected young woman is taller than 66.5 inches. Show you
work.
Let x = the height of a randomly selected young woman.
Normal? Yes since Normal distributed
μ = 64.5 and σ = 2.5 inches Find P(x > 66.5)
DRAW A NORMAL CURVE
x = 66.5 z =(66.5 – 64.5)/2.5 = 0.80
P(z > 0.80) = 0.2119
The probability of choosing a young woman at random whose height exceeds 66.5 inches is about
0.21
b) Find the probability that the mean height of an SRS of 10 young woman exceeds 66.5 inches.
Show your work.
μx-bar = 64.5
10% condition Met? We can assume there are atleast 100 young women in the population
σx-bar = σ/squareroot(n) = 2.5/squareroot(10) = 0.79
DRAW NORMAL CURVE
Normal? Yes since population is normal
Z =(66.5 – 64.5)/.79 = 2.53
P(z > 2.53) = 0.0057
It is very unlikely, less than 1 % chance, that we would choose an SRS of 10 young women whose average
height exceeds 66.5 inches.
Since the standard deviation of the sampling distribution is much
smaller, the distribution is much tighter around the mean, thus there is
less area (which equals probability) from which to draw a sample from
versus picking a woman a random from the population distribution.
P. 441 43-46, 454 49,51,53,55
The magazine Sports Illustrated askes a random sample of 750 Division 1 college athletes, “Do you
believe performance enhancing drugs are a problem in college sports?” Suppose that 30% of all
Division 1 athletes that these drugs are a problem. Let 𝑝̂ be the sample proportion who say that
these drugs are a problem.
43. The sampling distribution of 𝑝̂ has mean
a) 225
b) 0.30
c) none of these
d) 0.017
e) 0
44. The standard deviation of the sampling distribution is about
a) 0.0006
b) 0.033
c) 0.017
d) 1
e) none of these
45. Decreasing the sample size from 750 to 375 would multiply the standard deviation by
a) 2
b) √2
c) ½
d)
1
√2
e) none of these
46. The sampling distribution of 𝑝̂ is approximately Normal because
a) There are atleast 7570 Division I college athletes
b) np = 225 and n(1-p) = 525
c) a random sample was chosen
d) a large sample size like n = 750 guarantees it.
e) The sampling distribution of 𝑝̂ always have this shape.
49. Songs on an iPod. David’s ipod has about 10,000 songs. This distribution of the play times for
these songs is heavily skewed to the right with a mean of 225 seconds and a standard deviation of 60
seconds. Suppose we choose an SRS of 10 songs from this population and calculate the mean play
time x-bar of theses songs. What are the mean and the standard deviation of the sampling
distribution of x-bar?
The mean is μx-bar = 225 second and σx-bar = 60/squareroot(10) = 18.974seconds. These results do not depend on the
shape of the distribution of the individual play times.
These results do not depend of the shape of the distribution of individual play times.
51. Refer to 49. How many songs would you need to sample if you wanted the standard deviation of
the sampling distribution of x-bar to be 30 seconds? Justify your answer.
53. Larger Sample. Suppose that the blood cholesterol of all mean aged 20 to 34 follows the
Normal Distribution with mean μ = 188 milligrams per deciliter and standard deviation σ = 41 mg/dl.
a) Choose an SRS of 100 men from this population. What is the sampling distribution of x-bar?
b) Find the probability that x-bar estimates μ within ±3 mg/dl. Show your work.
c) Choose an SRS of 1000 men form this population. Now what is the probability that x-bar falls
within ±3 mg/dl of μ? Show your work. In what sense is the larger sample “better”?
55. Bottling cola. A bottling company uses a filing machine to fill plastic bottles with cola. The
bottles are supposed to contain 300 milliliters. In fact, the contents vary according to a Normal
Distribution with mean μ = 298 ml and standard deviation σ = 3 ml.
a) What is the probability that an individual bottle contains less than 295 ml? Show your work
b) What is the probability that the mean contents of six randomly selected bottles is less than 295 ml?
Show your work.
REVIEW QUESTIONS: Sharing music online. A sample survey reports that 29% of Internet users
download music files online, 21% share music files from their computers, and 12% both download
and share music. Make a Venn Diagram that displays this information. What percent of Internet
users neither download nor share music files?
The Central Limit Theorem:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
The Central Limit Theorem
Draw an SRS of size n form any population with mean μ and standard deviation σ. The Central Limit Theorem
(CLT) says that when n is large (n > 30), the sampling distribution of the sample mean x-bar is approximately
normal.
NORMAL CONDITIONS FOR SAMPLE MEANS
1) If the population is Normal, the so is the sampling distribution of x-bar, no matter what size n is.
2) If the population is not Normal, the CLT tells us that the sampling distribution of x-bar will be
approximately Normal if n >30
P 455 57,59,61,63,65-68
57. What does the CLT say? Asked what the central limit theorem says, a student replies, “As you
take larger and larger samples form a population, the histogram of the sample values looks more and
more Normal.” Is the student right? Explain your answer.
No. The histogram of the sample values will look like the population distribution, whatever it might happen to
be. The central limit theorem says that the histogram of the distribution of sample means (from many large
samples) will look more and more Normal.
59. Songs on an iPod. Refer to #49.
a) Explain why you cannot safely calculate the probability the mean play time x-bar is more than 4
minutes (240 seconds) for an SRS of 10 songs.
Since the distribution of the play times of the population of songs is heavily skewed to the right, a
sample size of 10 will not be enough for the Normal approximation to be appropriate.
b) Suppose we take an SRS of 36 songs instead. Explain how the central limit theorem allows ys to
find the probability that the mean play times is more than 240 seconds. Then calculate this
probability. Show your work.
With a sample size of 36, we now have enough observations in our sample for the Central Limit Theorem to
apply.
61. Airline passengers get heavier. In response to the increasing weight of airline passengers,
the Federal Aviation Administration (FAA) in 2003 told airlines to assume that passengers average
190 pounds in the summer, including clothes and carry-on baggage. But passengers vary, and the
FAA did not specify a standard deviation. A reasonable standard deviation is 35 pounds. Weights
are not Normally distributed, especially when the population includes both men and women, but they
are not very non-Normal. A commuter plan carries 30 passengers.
a) Explain why you cannot the calculate the probability that a randomly selected passenger weights
more than 200 pounds
This probability cannot be calculated, because we do not know the shape of the distribution of the
weights.
b) Find the probability that the total weight of the passengers of a full flight exceeds 6,000 pounds.
Show your work.
63. More on Insurance. An insurance company knows that in the entire population of homeowners,
the mean annual loss from fire is μ = $250 and the standard deviation of the loss is σ = $300. The
distribution of losses is strongly right skewed: many policies have $0 loss, but a few have large
losses. If the company sells 10,000 policies, can it safely base its rates on the assumption that its
average loss will be no greater than $275?
65 – 66. Scores on the mathematics part of the SAT exam in a recent year were roughly Normal with
mean 515 and standard deviation 114. You choose an SRS of 100 students and average their SAT
Math scores. Suppose that you do this many, many times.
65. The mean of the average scores you get should be close to
a. 515
b. 5.15
c. 51.5
d. 0
e. none of the above
66. The standard deviation of the average scores you get should be close to
a. 114
b. 1.14
c. 11.4
d. 1
e. none of these
67. A new born baby has extremely low birth weight (ELBW) if it weighs less than 1000 grams. A
study of the health of such children in later years examined a random sample of 219 children. Their
mean weight at birth was x-bar = 810 grams. This sample mean is an unbiased estimator of the
mean weight μ in the population of all ELBW babies, which means that
a) in all possible samples of size 219 from this population, the mean of the values of x-bar will equal
810.
b) in all possible samples of size 219 from this population, the mean of the values x-bar will equal μ.
c) as we take larger and larger samples from this population, x-bar will get closer and closer to μ.
d) in all possible samples of size 219 from this population, the values of x-bar will have a distribution
that is close to Normal
e) the person measuring the children’s weights does so without any systematic error.
68. The number of hours a light bulb burns before failing varies from light bulb to light bulb. The
distribution of burnout times is strongly skewed to the right. The central limit theorem says
a) as we look at more and more bulbs, their average burnout time gets ever closer to the mean μ for
all bulbs of this type.
b) the average burnout time of a large number of bulbs has a distribution of the same shape (strongly
skewed) as the population distribution.
c) the average burnout time of a large number of bulbs has a distribution with similar shape but not as
extreme (skewed, but not as strongly) as the population distribution.
d) the average burnout time of a large number of bulbs has a distribution that is close to Normal
e) the average burnout time of a large number of bulbs has a distribution that is exactly Normal.
Download