MATH 2560 C F03
Elementary Statistics I
LECTURE 4: Normal Distribution Calculations
1 Outline
⇒ standardizing observations;
⇒ the standard normal distribution;
⇒ normal distribution calculations;
⇒ normal quantile plots;
2 Standardizing Observations
All normal distributions are the same if we measure in units of size σ about
the mean µ as center.
⇒ changing to these units is called standardizing.
How to standardize a value:
1. Subtract the mean of the distribution;
2. Then divide by the standard deviation.
Standardizing and z-Scores
If x is an observation from a distribution that has mean µ
and standard deviation σ, the standardized value of x is
z = (x − µ)/σ.
A standardized value is often called a z-score.
Remark. A z-score tells us how many standard deviations the
original observation falls away from the mean, and in which direction. Observations larger than the mean are positive when standardized,
and observations smaller than the mean are negative.
Example 1.24 Heights of Young Women. µ = 64.5 inches and σ = 2.5 inches.
The standardized height is
z = (height − 64.5)/2.5.
For example, a woman 68 inches tall (x = 68) has standardized height
z = (68 − 64.5)/2.5 = 1.4,
or 1.4 standard deviations above the mean. And a woman 5 feet (60 inches)
tall has standardized height
z = (60 − 64.5)/2.5 = −1.8,
or 1.8 standard deviations less than the mean height.
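The two calculations in Example 1.24 can be checked with a few lines of Python. This is only a minimal sketch: the function name z_score is chosen here for illustration and does not come from the lecture.

```python
def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

# Heights of young women from Example 1.24 (inches)
MU, SIGMA = 64.5, 2.5

z_tall = z_score(68, MU, SIGMA)    # 68 inches is above the mean, so z is positive
z_short = z_score(60, MU, SIGMA)   # 60 inches is below the mean, so z is negative

print(f"z(68) = {z_tall:.1f}, z(60) = {z_short:.1f}")
```

The sign of the result gives the direction from the mean, and its size gives the distance in standard deviations.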
Using the Notations X and x: 1) we shorten "the height of a
young woman is less than 68 inches" to X < 68 (capital letter X);
2) lowercase x stands for any specific value of the variable X (for example,
64.5).
Note that X may take many values, while x denotes a single one.
Standardizing and Linear Transformation
Standardizing is a linear transformation that transforms the data into
the standard scale z-scores. So, standardizing does not change the shape
of a distribution, but the mean and standard deviation change in a simple manner.
Any variable obtained from a normal variable by a linear transformation remains
normal.
If X has a normal distribution with mean µ and standard deviation σ, then
Xnew = a + bX, for a positive b, has the normal distribution with mean a + bµ and
standard deviation bσ.
In particular, the standardized values for any distribution always have mean 0
and standard deviation 1, because standardizing is the linear transformation with a = −µ/σ and b = 1/σ.
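The effect of a linear transformation on the mean and standard deviation can be illustrated on a small sample. The data and the constants a and b below are hypothetical, and the sketch uses the sample mean and sample standard deviation as stand-ins for µ and σ.

```python
from statistics import mean, stdev

# Hypothetical sample, for illustration only
data = [61.0, 63.5, 64.5, 66.0, 68.0]
mu, sigma = mean(data), stdev(data)

# Linear transformation x_new = a + b*x with b > 0 (a and b chosen arbitrarily)
a, b = 10.0, 2.0
transformed = [a + b * x for x in data]
new_mean, new_sd = mean(transformed), stdev(transformed)

# Standardizing is the special case a = -mu/sigma, b = 1/sigma
z = [(x - mu) / sigma for x in data]
z_mean, z_sd = mean(z), stdev(z)

# Both differences are zero up to rounding error
print(new_mean - (a + b * mu), new_sd - b * sigma)
# Mean 0 and standard deviation 1, up to rounding error
print(z_mean, z_sd)
```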
3 Standard Normal Distribution
Standardizing a variable that has any normal distribution produces a new
variable that has the standard normal distribution.
The Standard Normal Distribution
The standard normal distribution is the normal distribution N (0, 1)
with mean 0 and standard deviation 1.
If a variable X has any normal distribution N (µ, σ) with
mean µ and standard deviation σ, then the standardized variable
Z = (X − µ)/σ
has the standard normal distribution.
Table A appears on the inside front cover of the textbook. You can use
Table A to do normal calculations.
Example 1.25: How to use Table A?
Problem 1. What proportion of observations on a standard normal
variable Z take values less than 1.4?
Solution: to find the area to the left of 1.40, locate 1.4 in the left-hand
column of Table A, then locate the remaining digit 0 as .00 in the top row.
The entry opposite 1.4 and under .00 is 0.9192. This is the area we seek.
Figure 1.26(a) illustrates this area.
Problem 2. Find the proportion of observations from the standard normal
distribution that are greater than −2.15.
Solution: enter Table A under z = −2.15. That is, find −2.1 in the
left-hand column and .05 in the top row. The table entry is 0.0158. This is
the area to the left of −2.15. Because the total area under the curve is 1, the
area lying to the right of −2.15 is
1 − 0.0158 = 0.9842.
Figure 1.26(b) shows this area.
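Software can stand in for Table A. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later), whose cdf method returns the area to the left of a value; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution N(0, 1)

# Problem 1: area to the left of z = 1.40
area_left = Z.cdf(1.40)

# Problem 2: area to the right of z = -2.15 is 1 minus the area to its left
area_right = 1 - Z.cdf(-2.15)

print(round(area_left, 4), round(area_right, 4))
```

Rounded to four decimals, both values agree with the Table A entries used above.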
4 Normal Distribution Calculations
We can find relative frequencies for any normal distribution by standardizing
and using Table A.
Let us consider several examples.
4.1 Example 1.26: Area to the Left
1. State the problem. Let the variable X, a student's score, have the N (1019, 209) distribution.
We want the proportion of students with X > 820.
2. Standardize. Subtract the mean and divide by the standard deviation on both sides:
X > 820 ⇒ (X − 1019)/209 > (820 − 1019)/209 ⇒ Z > −0.95.
3. Use the table. From Table A, we see that the proportion of observations
less than −0.95 is 0.1711. The area to the right of −0.95 is therefore 1 −
0.1711 = 0.8289. This means that about 17 percent of students score less than
820, and about 83 percent score greater than 820.
Figure 1.28 shows the standard normal curve with the area of interest.
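The three steps of Example 1.26 can be reproduced as a rough sketch with Python's statistics.NormalDist. The table-style answer rounds z to two decimals and the area to four, as Table A does; the unrounded answer differs slightly in the third decimal. Variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 1019, 209

# Step 2: standardize the cutoff 820
z = (820 - MU) / SIGMA            # about -0.952
z_table = round(z, 2)             # Table A works with two decimals: -0.95

# Step 3: area to the right = 1 - area to the left
area_left = round(NormalDist().cdf(z_table), 4)   # the Table A entry for -0.95
p_table = 1 - area_left

# Same question answered without rounding z
p_exact = 1 - NormalDist(MU, SIGMA).cdf(820)

print(f"table-style: {p_table:.4f}, unrounded: {p_exact:.4f}")
```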
4.2 Example 1.27: Area Between Different z
1. State the problem. Let the variable X have the N (1019, 209) distribution.
We want the proportion of scores in the interval 720 ≤ X < 820.
2. Standardize.
720 ≤ X < 820 ⇒ (720 − 1019)/209 ≤ (X − 1019)/209 < (820 − 1019)/209 ⇒ −1.43 ≤ Z < −0.95.
3. Use the table. The area between −1.43 and −0.95 is the area to the
left of −0.95 minus the area to the left of −1.43. From Table A,
area between −1.43 and −0.95 = (area left of −0.95) − (area left of −1.43) = 0.1711 − 0.0764 = 0.0947.
About 9.5 percent of students score in this interval.
Figure 1.29 shows the area under the standard normal curve.
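The subtraction of two left-hand areas in Example 1.27 can be written as a small helper. This is a sketch only: prob_between is a hypothetical name, and the unrounded result differs a little from the table-based 0.0947 because Table A rounds z to two decimals.

```python
from statistics import NormalDist

def prob_between(lo, hi, mu, sigma):
    """Area under the N(mu, sigma) curve between lo and hi."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)

# Unrounded answer for 720 <= X < 820 under N(1019, 209)
p_exact = prob_between(720, 820, 1019, 209)

# Table-style answer using the rounded z-scores -1.43 and -0.95
p_table = prob_between(-1.43, -0.95, 0, 1)

print(f"unrounded: {p_exact:.4f}, table-style: {p_table:.4f}")
```

For a continuous distribution such as the normal, a single point has area zero, so it makes no difference whether the endpoints use ≤ or <.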
4.3 Example 1.28: Using Table A Backward: Inverse Problem
1. State the problem. Find the score x with area 0.1 to its right under the
normal curve with mean µ = 505 and standard deviation σ = 110.
2. Use the table. Table A gives areas to the left, and area 0.1 to the right of x
means area 0.9 to its left. Look in the body of Table A for the entry closest to
0.9. It is 0.8997. This is the entry corresponding to z = 1.28. So z = 1.28 is
the standardized value with area 0.9 to its left.
3. Unstandardize: transform the solution from the z scale back to the original
x scale. Here z = 1.28, so x itself satisfies
(x − 505)/110 = 1.28.
Solving this equation for x gives
x = 505 + (1.28)(110) = 645.8.
We can see that a student must score at least 646 to place in the highest 10
percent.
Figure 1.30 poses the question in graphical form.
⇒ General Rule for Unstandardizing a z-score is x = µ + zσ.
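Reading Table A backward corresponds to an inverse cumulative distribution function. The sketch below uses the inv_cdf method of Python's statistics.NormalDist; it returns a slightly more precise z than the two-decimal table value, so the resulting x differs from 645.8 by a fraction of a point. Variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 505, 110

# Area 0.1 to the right means area 0.9 to the left
z = NormalDist().inv_cdf(0.90)    # about 1.2816; Table A gives 1.28

# Unstandardize with x = mu + z*sigma
x_exact = MU + z * SIGMA          # using the unrounded z
x_table = MU + 1.28 * SIGMA       # using the table value z = 1.28

print(f"z = {z:.4f}, x (unrounded z) = {x_exact:.2f}, x (table z) = {x_table:.1f}")
```

Either way, the cutoff rounds up to a score of 646.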
5 Normal Quantile Plots
How can we judge whether data are approximately normal? The most useful
tool for assessing normality is another graph, the normal quantile plot.
The Idea of a Normal Quantile Plot
1. Arrange the observed data values from smallest to largest, and record the
percentile of the data that each value occupies.
2. Do normal distribution calculations to find the z-scores at these same
percentiles.
3. Plot each data point x against the corresponding z.
If the data distribution is close to standard normal, the plotted points will
lie close to the 45-degree line x = z.
If the data distribution is close to any normal distribution, the plotted points
will lie close to some straight line.
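The three steps behind a normal quantile plot can be sketched without any plotting library by computing the (z, x) pairs that would be plotted. The sample is hypothetical, and the percentile assigned to the i-th smallest of n values, (i − 0.5)/n, is an assumption made here; statistical packages use several slightly different conventions.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return the (z, x) pairs that a normal quantile plot would display."""
    xs = sorted(data)                      # step 1: smallest to largest
    n = len(xs)
    std_normal = NormalDist()
    # Step 2: z-score at the percentile of each value.
    # Assumed plotting position: the i-th smallest value sits at (i - 0.5)/n.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))               # step 3: pair each x with its z

sample = [66.0, 61.0, 64.5, 68.0, 63.5]    # hypothetical data
points = normal_quantile_points(sample)

for z, x in points:
    print(f"z = {z:+.2f}   x = {x}")
```

Plotting x against z for these pairs and checking whether the points fall near a straight line is the visual test described above.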
Use of Normal Quantile Plots
If the points on a normal quantile plot lie close to a straight line,
the plot indicates that the data are normal.
Systematic deviations from a straight line indicate a nonnormal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
Figures 1.31 to 1.33 are normal quantile plots for data met earlier (see Figure
1.33).
6 Summary
To standardize any observation x, subtract the mean of the distribution
and then divide by the standard deviation. The resulting z-score z = (x − µ)/σ
says how many standard deviations x lies from the distribution mean. All
normal distributions are the same when measurements are transformed to
the standardized scale. In particular, all normal distributions satisfy the
68–95–99.7 rule.
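The rule mentioned above, that about 68, 95, and 99.7 percent of a normal distribution lies within 1, 2, and 3 standard deviations of the mean, can be checked numerically on the standardized scale. This is a small sketch using Python's statistics.NormalDist; the dictionary name is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# Area within k standard deviations of the mean, for k = 1, 2, 3
area_within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in area_within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```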
If X has the N (µ, σ) distribution, then the standardized variable Z = (X − µ)/σ
has the standard normal distribution N (0, 1). Relative frequencies for any
normal distribution can be calculated from the standard normal table
(Table A from the book), which gives relative frequencies for the events
Z < z for many values of z.
The adequacy of a normal model for describing a distribution of data is
best assessed by a normal quantile plot, which is available in most statistical
software packages. A pattern on such a plot that deviates substantially
from a straight line indicates that the data are not normal.