Inverse Normal Probability Calculations: Un-standardizing

Chapter 6: Normal Probability Distributions
The NORMAL DISTRIBUTION describes many different data sets.
Attributes of the Normal Curve:




Notation:
Symmetric: Mean = μ
Standard Deviation = σ
Range is approximately 4 standard deviations
The entire area under the curve is 100%, or 1
Standardizing with Z-Scores:
The standard deviation is the most common measure of spread used for normal curves and is a
natural ruler for comparing individual values to the mean. To determine how many standard
deviations the value is away from the mean, we can standardize this value.
z = (Observed value – mean) / (Standard deviation)
So,
$Z = \frac{X - \mu}{\sigma}$
where μ (population mean) and σ (population standard deviation) are given.
The z-score tells us how many standard deviations an observation is above or below the mean.
Example:
• A Z-score of 1 means the observation is 1 standard deviation larger than the mean.
• A Z-score of –2 means the observation is 2 standard deviations smaller than the mean.
**Note: In these examples, we are talking about a single observation (X) coming from an entire
population with a mean (μ) and a standard deviation (σ).
Example: Two boys, in different classes, each ran a race. Boy A finished in 6.5 minutes. The
average for his class was 8 minutes, with a standard deviation of 1.0 minutes. Boy B finished in
7.5 minutes. His class average was 9.5 minutes, with a standard deviation of 1.5 minutes.
Suppose the distribution of times for each class follows a bell curve. Use z-scores to determine
which boy did better with respect to the rest of his class. Explain your answer.
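The comparison in this example can be sketched in a few lines of Python. This is only an illustration of the z-score formula applied to the numbers given above; the function name `z_score` is a hypothetical helper, not part of the handout.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Values taken from the race example above.
z_a = z_score(6.5, mean=8.0, sd=1.0)   # Boy A
z_b = z_score(7.5, mean=9.5, sd=1.5)   # Boy B

print(f"Boy A: z = {z_a:.2f}")
print(f"Boy B: z = {z_b:.2f}")

# In a race a shorter time is better, so the more negative z-score
# marks the stronger performance relative to the class.
better = "A" if z_a < z_b else "B"
print(f"Boy {better} finished further below his class average, in standard deviations.")
```

Because lower times are better here, the sign of the z-score has to be read in context: a negative z-score is the desirable one.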
Finding Normal Percentiles using the Standard Normal Table:
Example #1:
Use the Z table to find the following. Draw the picture first, shade the region you want, and look
up the Z-value in Table 2 to find the proportion to the left of that z-score. This proportion is also
the probability that the value of a particular member of the population will fall in the given interval.
a. P(Z<-1.42)=
b. P(Z  1.95) =
c. P(-1.02 < Z ≤ 2.57) =
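If software is available, the table lookups can be checked numerically. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of Table 2; it covers parts a and c only.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

# a. Area to the left of z = -1.42, which is what the table lists directly.
p_a = Z.cdf(-1.42)

# c. Area between two z-values:
#    (area to the left of the upper value) - (area to the left of the lower value).
p_c = Z.cdf(2.57) - Z.cdf(-1.02)

print(f"P(Z < -1.42)         = {p_a:.4f}")
print(f"P(-1.02 < Z <= 2.57) = {p_c:.4f}")
```

A right-tail area is obtained the same way as with the table: subtract the left-tail area from 1.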
Example #2: Women’s heights
Assume that college women’s heights follow a normal curve with a mean height of 65 inches and a
standard deviation of 2.7 inches.
a) Find the probability that a college woman, selected at random, is shorter than 62 inches.
b) Find the probability that a college woman, selected at random, is at least 68 inches tall.
c) Find the probability that a college woman, selected at random, is between 60 and 68 inches tall.
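For a normal variable that is not standard, the same three kinds of area (left tail, right tail, between) apply once the mean and standard deviation are supplied. A minimal sketch, assuming Python's `statistics.NormalDist` and the mean and standard deviation stated in this example:

```python
from statistics import NormalDist

heights = NormalDist(mu=65.0, sigma=2.7)  # college women's heights, in inches

# a) left tail: P(X < 62)
p_shorter_62 = heights.cdf(62)

# b) right tail: P(X >= 68) = 1 - P(X < 68)
p_at_least_68 = 1 - heights.cdf(68)

# c) between: P(60 < X < 68) = P(X < 68) - P(X < 60)
p_between = heights.cdf(68) - heights.cdf(60)

print(f"a) {p_shorter_62:.4f}")
print(f"b) {p_at_least_68:.4f}")
print(f"c) {p_between:.4f}")
```

Parts a and b come out equal because 62 and 68 are the same distance from the mean and the curve is symmetric. Answers read from a printed table may differ slightly in the last digit because z is rounded to two decimals there.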
Example #3:
According to an article in Newsweek, in China, the mean emission of organic pollutants is 11.7
million pounds per day. Assume the water pollution in China is normally distributed throughout
the year with a standard deviation of 2.8 million pounds of organic emissions per day.
a) What is the probability that on any given day the water pollution in China is at least 15 million
pounds per day?
b) What is the probability that on any given day the water pollution in China is between 6.2 and
9.3 million pounds per day?
Inverse Normal Probability Calculations: Un-standardizing
Sometimes the proportion or percentage is given and you must find the corresponding Z-score,
then un-standardize it to find the X-value.
STEPS:
• Draw the picture.
• Identify the Z-value from the given value of the proportion. Solve for X:
$X = Z\sigma + \mu$
Example 1.
The distribution of heights of college women is normal, with mean 65 inches and standard
deviation 2.7 inches.
a) Find the height such that 10% of college women are shorter than that height.
b) Determine the two heights that make up the middle 90%.
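The un-standardizing steps for this example can be sketched as follows. The code assumes Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of reading the Z table backwards; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 65.0, 2.7   # mean and standard deviation from the example
Z = NormalDist()        # standard normal

# a) 10% of the area lies to the left: find z, then un-standardize.
z_10 = Z.inv_cdf(0.10)
x_10 = z_10 * sigma + mu

# b) The middle 90% leaves 5% in each tail.
z_low, z_high = Z.inv_cdf(0.05), Z.inv_cdf(0.95)
x_low = z_low * sigma + mu
x_high = z_high * sigma + mu

print(f"a) z = {z_10:.3f}, height = {x_10:.2f} in")
print(f"b) z = {z_low:.3f} and {z_high:.3f}, heights = {x_low:.2f} in to {x_high:.2f} in")
```

The two cutoffs in part b are symmetric about the mean, which is a quick sanity check on the picture.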
Example 2. An athletic association wants to sponsor a footrace. The time it takes to run the
course is normally distributed with a mean of 58.6 minutes, and a standard deviation of 3.9
minutes.
a) The association decides to have a tryout run, and eliminate the slowest 30% of the racers. What
should the cutoff time be in the tryout run for elimination?
b) What is the value of the first quartile for this distribution?
Practice problems.
1. A World Health Organization study of health in various countries reported that in Canada,
systolic blood pressure readings have a mean of 122 and a standard deviation of 16. It is known
that the distribution of systolic blood pressure is normal.
a) What is the probability a Canadian selected at random has systolic blood pressure
between 100 and 135?
b) High systolic blood pressures can be very dangerous. What systolic blood pressure
represents the boundary for the upper 7% of blood pressures?
2. Suppose that the distribution for the amount spent by students vacationing for a week in
Florida is normally distributed with a mean of $650 and a standard deviation of $120.
a) What is the probability that a randomly selected student vacationing for a week in Florida
will spend between $500 and $900?
b) Only 8% of students will spend more than what amount?
3. A machine that cuts corks for wine bottles operates in such a way that the distribution of the
diameter of the corks produced is normal with a standard deviation of 0.15 cm. Suppose that
15% of the corks have a diameter above 3.244 cm.
a) Find the mean diameter of the corks.
b) Suppose the machine has been recalibrated and the mean diameter of the corks produced
is now 4.5 cm with a standard deviation of 0.15 cm. Specifications for this machine
require that cork diameters should be no smaller than 4.42 cm. What is the probability
that a cork selected at random from this machine will have a diameter smaller than 4.42
cm?
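Part a of this problem uses the same Z-score relationship, but solved for the mean instead of X: from Z = (X − μ)/σ it follows that μ = X − Zσ. A hedged sketch of that rearrangement, assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

sigma = 0.15      # standard deviation of cork diameters, in cm
x_cut = 3.244     # 15% of corks are wider than this diameter

# 15% above the cutoff means 85% of the area lies to its left.
z = NormalDist().inv_cdf(0.85)

# Rearranged Z-score formula: mu = X - Z * sigma
mu = x_cut - z * sigma
print(f"z = {z:.3f}, mean diameter = {mu:.3f} cm")

# Check: with this mean, the area above the cutoff should come back as 0.15.
check = 1 - NormalDist(mu, sigma).cdf(x_cut)
print(f"P(X > {x_cut}) = {check:.3f}")
```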
The Distribution of a Sample Mean
Example #1: Tossing a Die
• Tossing a single die 10,000 times
[Figure: Histogram of single toss. x-axis: single toss; y-axis: Frequency]
• Tossing a pair of dice 10,000 times, calculating and graphing the averages of each pair.
[Figure: Histogram of Avg of pairs. x-axis: Avg of pairs; y-axis: Frequency]
• Tossing twenty dice 10,000 times, calculating and graphing the averages.
[Figure: Histogram of average of 20. x-axis: average of 20; y-axis: Frequency]
In general, taking the average of larger sample sizes gives a more precise estimate of the true
mean.
(The spread around the center gets smaller)
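The dice experiment can be reproduced with a short simulation. This is a sketch rather than the procedure used to draw the original histograms: the function name, the fixed seed, and the use of the population standard deviation as the measure of spread are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def simulate_averages(dice_per_toss, repetitions=10_000, seed=0):
    """Return `repetitions` averages, each taken over `dice_per_toss` fair dice."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.randint(1, 6) for _ in range(dice_per_toss)) / dice_per_toss
        for _ in range(repetitions)
    ]

results = {}
for k in (1, 2, 20):
    averages = simulate_averages(k)
    results[k] = (mean(averages), pstdev(averages))
    print(f"{k:>2} dice: mean of averages = {results[k][0]:.3f}, "
          f"spread (SD) = {results[k][1]:.3f}")
```

The center stays near 3.5 in all three cases, while the spread shrinks as more dice are averaged.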
A sampling distribution is the probability distribution of a sample statistic.
The Central Limit Theorem (CLT): When drawing a Simple Random Sample (SRS) of size n from
any non-normal population with a mean μ and a standard deviation σ, the sample mean
(x̄) has a sampling distribution that is approximately normal as long as the sample is large
enough.

Rule of thumb: If the population is not normally distributed, n should be greater than or
equal to 30.
Conditions:
1. The sampled values must be independent of one another.
2. Randomization condition: The data values must be sampled randomly.
3. If the sampling is done without replacement, the sample size should be no larger
than 10% of the population. Usually, populations are so large that 10% is a small
fraction.

The Sampling Distribution Model for a Sample Mean:
The mean of the sample averages is: $\mu_{\bar{x}} = \mu$
The standard deviation of the sample averages is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
1. If a population is normal and has the N(μ, σ) distribution, then the sample mean x̄ of n
independent observations has a distribution that is normal: $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
2. If a population is non-normal, then the sample mean x̄ of n independent observations has a
distribution that is approximately normal according to the CLT (as long as n is large):
$\bar{x} \sim AN\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
Now, when we are looking for area under a normal curve and we are dealing with a sample mean,
the new Z-score becomes:
$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Example #1: Weights of Adult Men
1. In engineering, weights of people are considered so that airplanes and elevators aren’t
overloaded, chairs won’t break and other embarrassing things won’t occur. Men’s weights are
normal with a mean of 173 lbs., and a standard deviation of 30 lbs.
a. What is the probability a randomly selected man weighs more than 180 lbs.?
b. If 9 men are randomly selected (say to be in an elevator), what is the probability that their
average weight is more than 180 lbs.?
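The difference between parts a and b is only the standard deviation that goes into the Z-score: σ for one man, σ/√n for the average of n men. A minimal sketch of both calculations, assuming Python's `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 173.0, 30.0, 9   # values from the example

# a. One randomly selected man: use the population standard deviation.
p_single = 1 - NormalDist(mu, sigma).cdf(180)

# b. Average of 9 men: use the standard deviation of the sample mean.
se = sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(180)

print(f"a. P(X > 180)     = {p_single:.4f}")
print(f"b. P(X-bar > 180) = {p_mean:.4f}  (sigma / sqrt(n) = {se:.1f})")
```

The second probability is smaller because averages vary less than individual observations.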
Example #2: As reported by Runner’s World magazine, the times of the finishers in the New
York City 10 km run are normally distributed with a mean of 61 minutes and a standard deviation
of 9 minutes. A simple random sample of 30 runners is selected.
a) Describe the sampling distribution of the average 10km finishing times.
b) Find the probability that the average for the sample of 30 finishing times will be more than 65
minutes.
Example #3: A rental car company has noticed that the distribution of the number of miles
customers put on rental cars per day is right skewed. The distribution has a mean of 60 miles and
a standard deviation of 25 miles. A random sample of 120 rental cars is selected.
a) Describe the sampling distribution of the average number of miles driven per day for the
sample of 120 rental cars. Use the appropriate notation.
b) What is the probability that the mean number of miles driven per day for the sample of 120
cars is less than 54?
c) What is the probability that the total number of miles driven per day in the sample of 120 cars
exceeds 7400?
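A question about a total can be turned into a question about a sample mean by dividing the total by n. The sketch below shows that conversion for part c and, as a cross-check, works on the scale of the total as well. The model for the total (mean nμ, standard deviation σ√n) is a standard result that is not stated in this handout, so treat it as an added assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60.0, 25.0, 120   # values from the example
total_cutoff = 7400.0

# Route 1: convert the total to an average, then use the sample-mean model.
mean_cutoff = total_cutoff / n
p_via_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(mean_cutoff)

# Route 2 (assumption noted above): model the total directly.
p_via_total = 1 - NormalDist(n * mu, sigma * sqrt(n)).cdf(total_cutoff)

print(f"Equivalent average cutoff: {mean_cutoff:.2f} miles")
print(f"P(total > 7400) via mean : {p_via_mean:.4f}")
print(f"P(total > 7400) via total: {p_via_total:.4f}")
```

Both routes give the same Z-score, so the two probabilities agree.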
Inverse Calculation: Un-standardizing
This Z-score calculation can also be rearranged to solve for a sample mean:
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$\bar{X} = Z \cdot \frac{\sigma}{\sqrt{n}} + \mu$
Example #1: The amounts of telephone bills for all households in a large city have a distribution
that is skewed to the right with a mean of $75 and a standard deviation of $27. A random sample
of 90 households is selected from this city.
What is the value representing the first quartile for the sampling distribution of $\bar{X}$?
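The rearranged formula can be applied to this example as follows. This is a sketch assuming Python's `statistics.NormalDist` and relying on the CLT to justify the normal model for the sample mean, since n = 90 is large.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 75.0, 27.0, 90   # values from the example

# First quartile: 25% of the area lies to the left.
z_q1 = NormalDist().inv_cdf(0.25)

# Un-standardize using the standard deviation of the sample mean.
se = sigma / sqrt(n)
xbar_q1 = z_q1 * se + mu

print(f"z = {z_q1:.3f}, sigma / sqrt(n) = {se:.3f}")
print(f"First quartile of the sample mean: ${xbar_q1:.2f}")
```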
Example #2: A waiter believes the distribution of his tips has a model that is right skewed, with a
mean of $9.60 and a standard deviation of $5.40. A random sample of 40 parties this waiter waits
on is selected.
a) Describe the sampling distribution of the sample mean tip.
b) What is the probability that the waiter will earn a total of less than $450 in tips when he waits
on 40 parties?
c) How much does the waiter earn on the best 10% of weekends in which he waits on 40 parties?
Example #3: In the library on a university campus, there is a sign in the elevator that indicates a
weight limit of 2500 pounds. Assume the average weight of students, faculty and staff on campus
is normally distributed, with a mean of 150 pounds, and standard deviation 27 pounds. A random
sample of 16 persons from the campus is selected.
a. Describe the sampling distribution of the sample mean weight.
b. What is the probability that the average weight of the 16 people in the sample is less than 160
pounds?
c. Suppose the sample of 16 people is placed in the library elevator. What is the probability that
the total weight of the 16 persons on the elevator will exceed the weight limit of 2500 pounds?
The Distribution of a Sample Proportion
Example 1: Classroom Experiment: Simulating a sampling distribution for a sample
proportion.
Suppose students are asked to spin a penny on their desk and record the number of heads they get.
Two hundred students were directed to consider the distribution of sample proportion values from
samples of 10 and 20 spins.
Trial #1:
Variable      N    n   Mean    SE Mean  StDev
Sample Prop   200  10  0.4840  0.0107   0.1515
[Figure: Histogram of sample proportion of heads (n=10 spins), 200 students. x-axis: sample proportion of heads; y-axis: Frequency]
Trial #2:
Variable      N    n   Mean     SE Mean  StDev
Sample Prop   200  20  0.48975  0.00815  0.11532
[Figure: Histogram of sample proportion of heads (n=20 spins), 200 students. x-axis: sample proportion of heads; y-axis: Frequency]
Question #1: What proportion of heads did you expect from each set of spins?
Question #2: Did the students get the same sample proportion every time? This is called
Sampling Variability.
Question #3: Compare the two graphs. Which one did a better job at estimating the true
proportion? Why? Give two reasons.
The main facts and formulas
The sample proportion (p-hat):
p̂ = (number of successes in sample) / (total number in the sample)
Notation: $\hat{p} = \frac{x}{n}$, where x is the number of successes.
In other words, p̂ is the sample proportion from an SRS of size n drawn from a population whose
proportion of successes is p.
Sample proportions summarize categorical variables.
Attributes, Assumptions and Conditions:
1. The sampled values must be independent of one another.
2. Mean of a sample proportion:
$\mu_{\hat{p}} = p$
3. Standard deviation of a sample proportion:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
4. When n is sufficiently large and the true proportion p is not too near 0 or 1, the sampling
distribution model for a proportion is approximately normal:
$\hat{p} \sim AN\left(p, \sqrt{\frac{pq}{n}}\right)$, where q = 1 − p.
5. As a safe (and conservative) rule of thumb, check that the number of successes and the
number of failures are at least 10:
$np \ge 10$
$nq \ge 10$
6. If the sampling is done without replacement, the sample size must be no larger
than 10% of the population. Usually, populations are so large that 10% is a small
fraction.
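These formulas can be compared with the classroom trials reported earlier. The sketch below assumes a true proportion of p = 0.5 for a spun penny, which the handout does not state, and the helper `proportion_model` is a hypothetical name used only for this illustration.

```python
from math import sqrt

def proportion_model(p, n):
    """Mean, standard deviation, and success/failure check for a sample proportion."""
    q = 1 - p
    return {
        "mean": p,
        "sd": sqrt(p * q / n),
        "large_enough": n * p >= 10 and n * q >= 10,
    }

p_assumed = 0.5                            # ASSUMPTION: not given in the handout
observed_sd = {10: 0.1515, 20: 0.11532}    # StDev values from Trial #1 and Trial #2

for n, sd_obs in observed_sd.items():
    model = proportion_model(p_assumed, n)
    print(f"n = {n}: model SD = {model['sd']:.4f}, observed SD = {sd_obs:.4f}, "
          f"np and nq at least 10: {model['large_enough']}")
```

Under this assumption the model's standard deviations are close to the observed ones, and only the n = 20 trial meets the rule of thumb in item 5.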
Standardized Statistics: The standardized z-score we will use for sample proportions is
as follows:
$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$
Examples. 1. The Associated Press reported that 71% of Americans ages 25 and older are
overweight. A researcher wants to know whether the proportion of such individuals in his state
that are overweight differs from the national proportion. A random sample of 600 adults in his
state results in 405 who are classified as overweight.
a. What is the sample proportion of overweight Americans?
b. Check and verify all of the attributes, assumptions and conditions.
c. Describe the sampling distribution of the sample proportion for size 600 using
the appropriate notation.
d. Find the probability that at most 405 of the 600 sampled adults are classified
overweight.
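The steps in this example (sample proportion, conditions, model, probability) can be sketched in code. The sketch assumes Python's `statistics.NormalDist` and uses the normal approximation exactly as written in the formulas above, with no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

p, n, x = 0.71, 600, 405    # national proportion, sample size, observed count

# a. Sample proportion.
p_hat = x / n

# b. Success/failure condition: np and nq should both be at least 10.
q = 1 - p
conditions_ok = n * p >= 10 and n * q >= 10

# c. Sampling distribution model: p-hat ~ AN(p, sqrt(pq / n)).
sd_p_hat = sqrt(p * q / n)

# d. P(at most 405 of 600) = P(p-hat <= 0.675).
z = (p_hat - p) / sd_p_hat
prob = NormalDist().cdf(z)

print(f"a. p-hat = {p_hat:.3f}")
print(f"b. np = {n * p:.0f}, nq = {n * q:.0f}, conditions met: {conditions_ok}")
print(f"c. p-hat ~ AN({p}, {sd_p_hat:.4f})")
print(f"d. z = {z:.2f}, probability = {prob:.4f}")
```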
2. According to the 2001 Youth Risk Behavior Surveillance by the Centers for Disease Control
and Prevention, 39% of the 10th-graders surveyed said that they watch three or more hours of
television on a typical school day. Assume that this percentage is true for the current population
of all 10th-graders. Suppose in a random sample of 200 10th-graders, 86 watched three or more
hours of television on a typical school day.
a. Check the general properties and describe the sampling distribution of the sample
proportion of size 200 using the appropriate notation.
b. Find the probability that 86 or more out of the 200 students watched three or more hours
of television on a typical day.
3. A nationwide survey by the University of Connecticut Center for Survey Research and
Analysis found that 30% of men aged 18 to 29 had tattoos in 2002. Suppose this result holds true
for the current population of all men in this age group. Find the probability that in a random
sample of 500 men aged 18 to 29, between 28.4% and 32.6% have tattoos.
4. 5% of the requests to a web server end up in a network error. A network technician monitors a
busy web server for one hour. He observes that 200 requests were received and 14 ended up in an
error.
Which of the following correctly describes the sampling distribution of p̂ , the sample proportion
of requests to the web server that end up in an error?
A. p̂ is approximately normal, with mean 0.07 and standard deviation 0.0180.
B. p̂ is normal, with mean 0.05 and standard deviation 0.0154.
C. p̂ is approximately normal, with mean 10 and standard deviation 3.082.
D. p̂ is approximately normal, with mean 0.05 and standard deviation 0.0154.
E. p̂ is normal, with mean 0.07 and standard deviation 0.0180.
Calculate the probability that the proportion of requests in the sample that end up in an error is at
least 0.07 (14 out of 200).
SUMMARY: Make sure that you understand this.
A. Normal Distribution




Symmetric: Mean = μ
Standard Deviation = σ
Range is approximately 4 standard deviations
The entire area under the curve is 100%, or 1
Standardizing with Z-Scores:
The standard deviation is the most common measure of spread used for normal curves and is a
natural ruler for comparing individual values to the mean. To determine how many standard
deviations the value is away from the mean, we can standardize this value.
z = (Observed value – mean) / (Standard deviation),
or
$Z = \frac{X - \mu}{\sigma}$
B. Finding Normal Percentiles using the Standard Normal Table.
C. Inverse Normal Probability Calculations
$X = Z\sigma + \mu$
D. The Distribution of a Sample Mean
$\bar{X}$ has a distribution that is approximately normal as long as the sample is large enough.

Rule of thumb: If the population is not normally distributed, n should be greater than or
equal to 30.

The sample size should be no larger than 10% of the population.
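The mean of the sample averages is:
$\mu_{\bar{x}} = \mu$
The standard deviation of the sample averages is:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
1. If a population is normal and has the N(μ, σ) distribution, then
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
2. If a population is non-normal, then for large n the sample mean is approximately
normal (CLT):
$\bar{X} \sim AN\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
The Z-Score becomes:
$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$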
Inverse Calculation: Un-standardizing
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$\bar{X} = Z \cdot \frac{\sigma}{\sqrt{n}} + \mu$
E. The Distribution of a Sample Proportion
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
When n is sufficiently large and the true proportion p is not too near 0 or 1, the sampling
distribution model for a proportion is approximately normal.
$\hat{p} \sim AN\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$