2007

Test 2 Covers
Topics 12, 13, 16, 17, 18,
14, 19 and 20
Skipping Topics 11 and 15
Topic 12
Normal Distribution
If a density curve is symmetric, single-peaked, and bell-shaped,
then it is called a normal distribution.
Remember that a density curve:
• Is always on or above the horizontal line
• Has area exactly 1 underneath it
[Figure: normal density curve with each half labeled 50% = 0.5]
Topic 12 – Normal Distribution
• Data that display the general shape seen in these
examples (page 248) occur frequently.
• The theoretical mathematical models used to
approximate such distributions are called normal
distributions.
• Every normal distribution shares three distinguishing
characteristics:
– all are symmetric, have a single peak at their center,
and follow a bell-shaped curve.
• Two things, however, distinguish one normal distribution
from another:
– its mean and standard deviation.
Topic 12 – Normal Distribution
• The mean of a normal distribution, represented by µ,
determines where its center is; the peak of a normal
curve occurs at its mean, which is also its point of
symmetry.
• The standard deviation of a normal distribution,
represented by σ, indicates the spread of the
distribution.
• (Note: We reserve the symbols x̄ (x-bar) and s to refer
to the mean and standard deviation computed from
sample data rather than a mathematical model.)
• The distance between the mean and the points where
the curvature changes is equal to the standard deviation σ.
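As a rough numerical check of the last bullet, the sketch below uses Python's standard-library statistics.NormalDist with hypothetical values µ = 100 and σ = 15 (chosen only for illustration, not taken from the text). The helper second_derivative is an illustrative name, not a library function; it estimates the curvature just inside and just outside µ + σ.

```python
from statistics import NormalDist

def second_derivative(f, x, h=0.01):
    """Central-difference estimate of the second derivative f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mu, sigma = 100.0, 15.0          # hypothetical mean and standard deviation
curve = NormalDist(mu, sigma)

# The peak of the curve sits at the mean.
peak_is_at_mean = curve.pdf(mu) > max(curve.pdf(mu - 1), curve.pdf(mu + 1))

# Curvature one unit inside and one unit outside the point mu + sigma.
inside = second_derivative(curve.pdf, mu + sigma - 1)    # concave down: negative
outside = second_derivative(curve.pdf, mu + sigma + 1)   # concave up: positive

print(peak_is_at_mean, inside < 0, outside > 0)
```

The sign of the curvature flips between the two points, which is consistent with the change in curvature happening one standard deviation from the mean.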
Normal distribution
• The probability of a randomly selected observation
falling in a certain interval is equivalent to the
proportion of the population’s observations falling in
that interval.
• Because the total area under the curve of a normal
distribution is 1, this probability can be calculated by
finding the area under the normal curve for that
interval.
• To find the area under a normal curve, you can use
either technology or tables.
• Table II in the back of the book (the Standard
Normal Probabilities Table) reports the area to the
left of a given z-score under the normal curve.
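To see that "probability" and "area under the curve" are the same number, the sketch below adds up thin rectangles under the standard normal curve and compares the result with the left-area value that a table or technology would report. The function name area_under_curve and the choice of z = 1.0 are assumptions made for illustration.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard normal curve

def area_under_curve(dist, lower, upper, steps=20_000):
    """Midpoint-rule approximation of the area under dist's density curve."""
    width = (upper - lower) / steps
    heights = (dist.pdf(lower + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width

total_area = area_under_curve(Z, -10, 10)    # essentially the whole curve
left_of_one = area_under_curve(Z, -10, 1.0)  # area to the left of z = 1.0
table_value = Z.cdf(1.0)                     # left area as technology reports it

print(round(total_area, 4), round(left_of_one, 4), round(table_value, 4))
```

The curve is cut off at -10 and 10 because the area beyond those points is negligibly small.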
Normal distribution
• It is customary to use the symbol Z to denote
observations from the standard normal
distribution, which has mean = 0 and standard
deviation = 1.
– The notation Pr(a < Z < b) or P(a < Z < b) denotes the
probability that Z lies between the values a and b,
calculated as the area under the standard normal
curve in that region.
– The notation Pr(Z < c) or P(Z < c) denotes the area to
the left of a particular value c,
– while Pr(Z > d) or P(Z > d) refers to the area to the
right of a particular value d.
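Because a table of standard normal probabilities reports only left areas, the three kinds of probability can all be written in terms of one function. In the sketch below, Φ(z) is a symbol introduced here (it is not used in the text) for the area to the left of z under the standard normal curve.

```latex
\begin{aligned}
P(Z < c)     &= \Phi(c) \\
P(Z > d)     &= 1 - \Phi(d) \\
P(a < Z < b) &= \Phi(b) - \Phi(a)
\end{aligned}
```

The second line uses the fact that the total area under the curve is 1; the third subtracts the left area at a from the left area at b.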
Example:
a. P(Z <= -2.25) =
b. P(Z >= -2.25) =
c. P(Z > 1.77) =
d. P(-2.25 < Z < 1.77) =
e. P(Z < a) = 5%
f. P(Z > b) = 3%
Using your calculator
TI83: [Distr] [2:normalcdf(] lower limit, upper limit, mean, standard deviation)
Between area P(a < Z < b)
Lower limit: a
Upper limit: b
Mean: 0
Standard deviation: 1
Lower end area P(Z < c)
Lower limit: -100000000
Upper limit: c
Mean: 0
Standard deviation: 1
Upper end area P(Z > d)
Lower limit: d
Upper limit: 100000000
Mean: 0
Standard deviation: 1
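For readers without a TI-83, the same three cases can be sketched in Python with the standard-library statistics.NormalDist. The function below is a hypothetical stand-in named after the calculator command; it is not the calculator's own algorithm, so the last few digits may differ. The values -1 and 1 are arbitrary illustration values.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

FAR = 100000000  # stand-in for "infinitely far out", as in the calculator settings

between = normalcdf(-1, 1)        # between area,   P(-1 < Z < 1)
lower_end = normalcdf(-FAR, 1)    # lower end area, P(Z < 1)
upper_end = normalcdf(1, FAR)     # upper end area, P(Z > 1)

print(round(between, 4), round(lower_end, 4), round(upper_end, 4))
```

The lower end and upper end areas at the same cutoff add up to 1, which is a handy sanity check.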
Example:
a. P(Z <= -2.25) =
TI83: [Distr] [2:normalcdf(] -10000, -2.25, 0, 1)
a. P(Z <= -2.25) = .0122244334
1.2% of data falls below –2.25
Example:
b. P(Z >= -2.25) =
TI83: [Distr] [2:normalcdf(] -2.25, 10000, 0, 1)
b. P(Z >= -2.25) = 0.9877755666
98.8% of data falls above –2.25
Another method: 1 - .0122244334 = 0.9877755666
Example:
c. P(Z > 1.77) =
TI83: [Distr] [2:normalcdf(]  1.77, 10000, 0, 1) 
c. P(Z > 1.77) = 0.0383635226
3.8% of data falls above 1.77
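The worked examples can be cross-checked in Python, again assuming the standard-library statistics.NormalDist as a substitute for the calculator. The sketch also computes P(-2.25 < Z < 1.77), which is listed among the examples but not worked above. Results may differ from the calculator display in the last few digits.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below = Z.cdf(-2.25)                  # P(Z <= -2.25)
p_above = 1 - p_below                   # P(Z >= -2.25), the complement
p_upper = 1 - Z.cdf(1.77)               # P(Z > 1.77)
p_between = Z.cdf(1.77) - Z.cdf(-2.25)  # P(-2.25 < Z < 1.77)

for label, p in [("P(Z <= -2.25)", p_below),
                 ("P(Z >= -2.25)", p_above),
                 ("P(Z > 1.77)", p_upper),
                 ("P(-2.25 < Z < 1.77)", p_between)]:
    print(f"{label} = {p:.4f}")
```

The between area is what remains after removing both tails, so it equals 1 minus the lower tail minus the upper tail.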
Using your calculator
TI83: [Distr]  [3:invNorm(]  lower proportion, mean,
standard deviation) 
Example:
e. P(Z < a) = 5%
TI83: [Distr] [3:invNorm(] lower proportion, mean, standard deviation)
TI83: [Distr] [3:invNorm(] 0.05, 0, 1)
5% = 0.05
Lower proportion = 0.05
e. P(Z < a) = 0.05;
a = -1.645
Example:
f. P(Z > b) = 3%
TI83: [Distr] [3:invNorm(] lower proportion, mean, standard deviation)
TI83: [Distr] [3:invNorm(] 0.97, 0, 1)
3% = 0.03
Lower proportion = 1 – 0.03 = 0.97
f. P(Z > b) = 0.03;
b = 1.8808
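The inverse calculation can be sketched the same way. Here NormalDist.inv_cdf from Python's standard library plays the role of invNorm: it takes the lower proportion (area to the left) and returns the matching value. Treat this as a cross-check under that assumption, not as the calculator's own method.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(Z < a) = 0.05: the lower proportion is given directly.
a = Z.inv_cdf(0.05)

# P(Z > b) = 0.03: convert the upper area to a lower proportion first.
b = Z.inv_cdf(1 - 0.03)

print(round(a, 3), round(b, 4))
```

As with invNorm, an upper-tail area must be converted to a lower proportion before it is passed in.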
Exercise 12-14: Dog Heights Page 265
Exercise 12-18: Critical Values Page 265
Exercise 12-22: SAT and SATs Page 266
Watch out
• Draw a sketch of the relevant normal curve,
• shade in the region of interest, and
• check whether or not the probability calculated seems reasonable in
light of the sketch.
• When you are using technology, make sure that the inequality
symbol is set in the correct direction for the question at hand.
• Be careful with phrases such as “at least” and “at most.”
– Weighing at least 3000 pounds means to weigh 3000 or more
pounds, so that indicates the area to the right of 3000.
– Weighing at most 2500 pounds means to weigh 2500 or fewer pounds,
so that indicates the area to the left of 2500.
• The probability of any one specific value is zero because the area
above one specific value is zero.
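A small sketch of these cautions, assuming a purely hypothetical normal model for weights with mean 2800 pounds and standard deviation 300 pounds (neither value comes from the text):

```python
from statistics import NormalDist

weights = NormalDist(2800, 300)   # hypothetical model for weights, in pounds

# "At least 3000" is the area to the right of 3000.
at_least_3000 = 1 - weights.cdf(3000)

# "At most 2500" is the area to the left of 2500.
at_most_2500 = weights.cdf(2500)

# "Exactly 3000" is an interval of zero width, so its area is zero.
exactly_3000 = weights.cdf(3000) - weights.cdf(3000)

print(round(at_least_3000, 4), round(at_most_2500, 4), exactly_3000)
```

Because a single value has zero area, "at least 3000" and "more than 3000" give the same probability under a normal model.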
READ In Brief
• When working with two normal curves,
– it is easy to get confused about which curve to use for a given
question.
– Read the questions carefully.
• Try to recognize whether you have been given a value for
– the variable and asked for a probability, or
– a probability and asked for the value of the variable.
• Avoid sloppy notation. For example, do not say that z = 0.12 = 0.5478.
Instead, say that the z-score is 0.12 and the probability (or area) to its
left is 0.5478. You could also say that P(Z < 0.12) = 0.5478.
• Remember the normal distribution is a mathematical model, so it is an
idealization that never describes real data perfectly.
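The two directions of question (value to probability, and probability to value) map onto two different function calls. The sketch below uses Python's statistics.NormalDist as assumed technology and keeps the z-score and the probability in separate, clearly named variables, in the spirit of the notation advice above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Given a value, asked for a probability.
z = 0.12
p = Z.cdf(z)
print(f"P(Z < {z}) = {p:.4f}")

# Given a probability, asked for the value.
z_back = Z.inv_cdf(p)
print(f"The z-score with area {p:.4f} to its left is {z_back:.2f}")
```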
Read Wrap Up pages 261-262
This topic introduced you to the most important mathematical model
in all of statistics: the normal distribution.
• You have seen that data from many quantitative variables follow a
familiar bell-shaped curve. This normal (bell-shaped) model can,
therefore, be used to approximate the behavior of many real-world
phenomena.
• In addition to histograms and other common graphical displays, you
can use normal probability plots to assess whether or not sample
data can reasonably be modeled with a normal distribution, which
is particularly helpful with smaller sample sizes.
• You have also seen that the process of standardization, or
calculating a z-score, allows you to use a table of standard normal
probabilities to perform calculations related to normal distributions.
• You practiced using this table, and also using technology, both to
calculate probabilities and to determine percentiles.
Wrap Up pages 261-262
• The normal distribution is a useful model for summarizing the
behavior of many quantitative variables.
• A normal probability plot is a useful tool for judging whether or not
sample data could plausibly have come from a normally distributed
population.
• You can calculate probabilities from normal distributions by determining
the area under the normal curve over the interval of interest.
• These areas can be interpreted either as the probability that a randomly
selected value falls in the interval or as the proportion of values in the
distribution that fall in the interval.
• To calculate probabilities from normal distributions, you can
standardize (use the z-score) the values of interest and use a table of
standard normal probabilities, or use technology.
• A percentile is the value with the given percentage of the
distribution's area below (to the left of) it.
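Standardizing and then using the standard normal curve gives the same answer as working with the original normal curve directly. The sketch below illustrates this with hypothetical values (mean 500, standard deviation 100, value of interest 650) that are not from the text, and then runs the calculation in reverse to find a 90th percentile.

```python
from statistics import NormalDist

mu, sigma = 500, 100      # hypothetical mean and standard deviation
x = 650                   # hypothetical value of interest

# Standardize: how many standard deviations is x above the mean?
z = (x - mu) / sigma

p_from_z = NormalDist().cdf(z)            # standard normal, using the z-score
p_direct = NormalDist(mu, sigma).cdf(x)   # original curve, no standardizing

# Percentile: find the z-score with 90% of the area to its left,
# then convert it back to the original scale.
z_90 = NormalDist().inv_cdf(0.90)
x_90 = mu + z_90 * sigma

print(z, round(p_from_z, 4), round(p_direct, 4), round(x_90, 1))
```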
Topic 13
Sampling Distributions: Proportion
Topic 13 – Sampling Distributions: Proportion
• Recall from Topic 4 that a population consists of the entire group of
observational units of interest to a researcher, while a sample refers to
the (often small) part of the population that the investigator actually
studies.
• Also remember that a parameter is a numerical characteristic of a
population, while a statistic is a numerical characteristic of a sample.
• In certain contexts, a population can also refer to a process (such as
flipping a coin or manufacturing a candy bar) that, in principle, can be
repeated indefinitely.
• Using this interpretation of population, a sample is a specific collection of
process outcomes. Throughout this topic, we will be careful to use
different symbols to denote parameters and statistics.
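• For example, we use the following symbols to denote proportions, means,
and standard deviations (note that we consistently use Greek letters for
parameters):
– proportion: π (parameter), p-hat (statistic)
– mean: µ (parameter), x-bar (statistic)
– standard deviation: σ (parameter), s (statistic)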
Activity 13-1: Candy Colors Page 270
Sampling variability:
• The distribution of the sample proportions from sample
to sample is called the sampling distribution of the
sample proportion.
• Even though the sample proportion of orange candies
varies from sample to sample, that variation has a
recognizable long-term pattern.
• These simulated sample proportions approximate the
theoretical sampling distribution derived from all possible
samples.
• Although you cannot use a sample proportion to
determine a population proportion exactly, you can be
reasonably confident that the population proportion is
within a certain distance of the sample proportion.
• This distance depends primarily on how confident you
want to be and on the size of the sample. You will study
this notion extensively when you encounter confidence
intervals in Topic 16.
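Sampling variability is easy to simulate. The sketch below draws many random samples from a process with a hypothetical population proportion of 0.45 and a hypothetical sample size of 25 (both chosen for illustration only) and summarizes the resulting sample proportions. The helper name sample_proportion is also an assumption.

```python
import random
from statistics import mean, stdev

def sample_proportion(rng, pi, n):
    """Proportion of successes in one random sample of size n."""
    return sum(rng.random() < pi for _ in range(n)) / n

rng = random.Random(13)   # fixed seed so the run is repeatable
pi, n = 0.45, 25          # hypothetical population proportion and sample size
p_hats = [sample_proportion(rng, pi, n) for _ in range(1000)]

print(min(p_hats), max(p_hats))   # individual samples vary
print(round(mean(p_hats), 3))     # but they center near pi
print(round(stdev(p_hats), 3))    # with a stable amount of spread
```

No single sample proportion equals the population proportion exactly, yet the collection of them clusters around it in a predictable way.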
Central Limit Theorem (CLT)
Central Limit Theorem (CLT) for a Sample Proportion
Suppose a simple random sample of size n is to be
taken from a large population in which the true
proportion possessing the attribute of interest is π.
• The sampling distribution of the sample proportion
p-hat is approximately normal with mean equal to π
and standard deviation equal to $\sqrt{\pi(1 - \pi)/n}$.
• This normal approximation becomes more and more
accurate as the sample size n increases, and it is
generally considered to be valid as long as nπ >= 10
and n(1 - π) >= 10.
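The CLT statement translates into a short calculation: compute the standard deviation of p-hat, check the two sample-size conditions, and then use the normal curve. The sketch below assumes a hypothetical population proportion of 0.45 and sample size of 75; the function name clt_for_proportion is illustrative, not from any library.

```python
from math import sqrt
from statistics import NormalDist

def clt_for_proportion(pi, n):
    """Approximate sampling distribution of p-hat, plus whether the
    conditions n*pi >= 10 and n*(1 - pi) >= 10 both hold."""
    sd = sqrt(pi * (1 - pi) / n)
    valid = n * pi >= 10 and n * (1 - pi) >= 10
    return NormalDist(pi, sd), valid

pi, n = 0.45, 75   # hypothetical population proportion and sample size
dist, valid = clt_for_proportion(pi, n)

# Probability that p-hat lands within 0.10 of the population proportion.
within = dist.cdf(pi + 0.10) - dist.cdf(pi - 0.10)
print(valid, round(dist.stdev, 4), round(within, 3))

# With a much smaller sample the conditions fail.
_, small_ok = clt_for_proportion(0.45, 10)
print(small_ok)
```

When the validity check fails, the normal approximation should not be trusted, even though the code will still return a number.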
Watch Out
• It’s essential to distinguish clearly between parameters and
statistics.
– A parameter is a fixed numerical value describing a population.
Typically, you do not know the value of a parameter in real life,
but you may perform calculations assuming a particular
parameter value.
– On the other hand, a statistic is a number describing a sample,
which varies from sample to sample if you were to repeatedly
take samples from the population.
• Notice that the Central Limit Theorem (CLT) specifies three things
about the distribution of a sample proportion: shape, center (as
measured by the mean), and spread (as measured by the standard
deviation).
• It’s easy to focus on one of these aspects of a distribution and
ignore the other two. As with other normal distributions, drawing a
sketch can help you to visualize the CLT.
Activity 13-3: Smoking Rates, Page 279
Exercise 13-5: Miscellany Page 286
Exercise 13-17: Smoking Rates Page 289