Chapter 5 - The Normal Curve - PART II

Chapter 5 - The Normal Curve
PART II : DESCRIPTIVE STATISTICS
Dr. Joseph Brennan
Math 148, BU
Histogram and the Density Curve
Density Curves
A density curve may be used to display
the distribution of the data in addition
to or instead of a histogram.
We can consider a density curve as a
smooth approximation to the
histogram computed from the data.
A density curve describes the
distribution of a quantitative
continuous variable.
For continuous response variables, the
histogram computed from the data
(sample), approximates the (unknown)
population density of the response
variable.
Properties of Density Curves
Like histograms, density curves may be described as symmetric or
skewed.
Density curves also have measures of center and spread.
µ is the mean of a density curve.
µ̃ is the median of a density curve.
σ is the standard deviation of a density curve.
q1 and q3 are the first and third quartiles of a density curve.
NOTE 1: The mean and median are the same for a symmetric
density curve. They both lie at the center of the curve.
NOTE 2: The mean of a skewed curve is pulled away from the
median in the direction of the long tail.
NOTE 3: The standard deviation of a density curve is computed
mathematically, and is difficult to estimate visually.
Population Parameters and Statistics
If the density curve describes the population distribution, then the mean µ
and standard deviation σ of the density curve are the (unknown)
population parameters.
The sample average x̄ and sample standard deviation s computed from a data set
estimate µ and σ, respectively, but the estimates are usually not exact.
Density curve (population): µ = 0.25, σ = 0.144338
Data (sample): x̄ = 0.2556, s = 0.144446
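The gap between parameters and statistics can be illustrated with a small simulation. The slide does not say which density is plotted; the sketch below assumes a uniform density on [0, 0.5], chosen only because its mean (0.25) and standard deviation (0.5/√12 ≈ 0.144338) match the values shown. The seed and the sample size of 1000 are also assumptions, so the sample values will not reproduce the slide's x̄ and s.

```python
import random
import statistics

# Assumption: the population density is uniform on [0, 0.5].
LOW, HIGH = 0.0, 0.5
mu = (LOW + HIGH) / 2               # population mean of a uniform density
sigma = (HIGH - LOW) / 12 ** 0.5    # population SD of a uniform density

rng = random.Random(148)            # arbitrary fixed seed, for repeatability
sample = [rng.uniform(LOW, HIGH) for _ in range(1000)]  # assumed sample size

x_bar = statistics.mean(sample)     # sample average, estimates mu
s = statistics.stdev(sample)        # sample SD (n - 1 divisor), estimates sigma

print(f"mu    = {mu:.4f}   sigma = {sigma:.6f}")
print(f"x_bar = {x_bar:.4f}   s     = {s:.6f}")
```

The statistics land close to the parameters but do not equal them, which is the point of the slide.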
The Normal Curve
Perhaps the most important density curve in statistics!
Figure 6. The (standard) normal density curve.
The curve is defined by the equation
\[ p(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}, \tag{1} \]
where e = 2.71828...
Properties of the Normal Curve
Properties of the (standard) normal
curve:
Symmetric about zero,
Unimodal,
The mean, median, and mode are
equal,
Bell-shaped,
The mean µ = 0 and the standard
deviation σ = 1,
The area under the whole normal
curve is 100% (or 1, if you use
decimals).
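These properties can be checked numerically. The sketch below codes the standard normal density p(z) and integrates it with a midpoint rule; the grid limits (−10 to 10) and the number of steps are illustrative choices, not part of the slides.

```python
import math

def p(z):
    """Standard normal density: exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Midpoint rule on [-10, 10]; the area outside this range is negligible.
A, B, N = -10.0, 10.0, 20_000
h = (B - A) / N
grid = [A + (i + 0.5) * h for i in range(N)]

area = sum(p(z) for z in grid) * h                       # total area
mean = sum(z * p(z) for z in grid) * h                   # mean of the curve
sd = math.sqrt(sum((z - mean) ** 2 * p(z) for z in grid) * h)

print(f"area = {area:.6f}, mean = {mean:.6f}, sd = {sd:.6f}")
print("symmetric about zero:", p(1.5) == p(-1.5))
print("peak at zero:", p(0) > p(0.1))
```

The area, mean, and standard deviation should come out very close to 1, 0, and 1, respectively.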
The Normal Approximation of Data
Many histograms for data are similar in shape to the normal curve,
provided they are drawn to an appropriate scale.
Normal Approximation:
Transforming the horizontal scale of a
histogram so that it aligns with the standard normal density curve.
z-units (standard units) are the values the data points take after this
transformation. (More information to come!)
If the histogram follows the normal curve, the area under the
histogram will be about the same as the area under the curve.
The area under the histogram corresponds to the percentage of
observations in the corresponding interval.
The goal of normal approximation is to use the normal density curve
to approximate the percentage of observations in a given interval.
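As a rough illustration of this goal, the sketch below compares the actual fraction of observations in an interval with the area under the normal curve over the same interval in z-units. The data are synthetic (drawn from a bell-shaped distribution with an arbitrary seed), and the interval 40 to 65 is a made-up choice; none of it comes from the text.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(5)                              # arbitrary seed
data = [rng.gauss(50, 10) for _ in range(5000)]     # synthetic, illustrative only

x_bar = statistics.mean(data)
s = statistics.pstdev(data)                         # SD with divisor n

lo, hi = 40.0, 65.0                                 # hypothetical interval

# Area under the histogram: the fraction of observations in the interval.
observed = sum(lo < x < hi for x in data) / len(data)

# Area under the normal curve between the same endpoints in z-units.
std_normal = NormalDist()
z_lo = (lo - x_bar) / s
z_hi = (hi - x_bar) / s
approx = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

print(f"observed: {100 * observed:.1f}%   normal approximation: {100 * approx:.1f}%")
```

For data whose histogram follows the normal curve, the two percentages should be close; for skewed data the approximation can be poor.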
The Empirical Rule
The (standard) normal curve is plotted against z, the standard units. The
following property of the normal curve explains the origins of the Empirical Rule.
THE 68-95-99.7 RULE for the NORMAL CURVE
Approximately 68% of observations fall within 1 standard unit of 0
(−1 < z < 1).
Approximately 95% of observations fall within 2 standard units of 0
(−2 < z < 2).
Approximately 99.7% of observations fall within 3 standard units of 0
(−3 < z < 3).
The Empirical Rule, which is applicable to bell-shaped normal-like
histograms, is the direct consequence of the above property of the normal
curve.
The range −1 < z < 1 in standard units corresponds to x̄ − s < x < x̄ + s in
the original, nonstandard units.
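The three percentages in the rule can be recovered from the standard normal curve itself. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` is an acceptable stand-in for a normal table:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve for -k < z < k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"-{k} < z < {k}: {100 * area_within(k):.2f}%")
```

The printed values are about 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.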
The 68-95-99.7 Rule
Figure: Normal curve and percentage of observations under it. The horizontal scale
uses the standard units z.
z-Scores
z-Score:
The transformation of data into standard units (normal
approximation):
\[ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]
Thus, any data point x may be recomputed in standard units as
\[ z_x = \frac{x - \bar{x}}{s}. \]
We call the z which corresponds to x the z-score z_x. Note that
\[ z_x < 0 \text{ if } x < \bar{x}; \qquad z_x = 0 \text{ if } x = \bar{x}; \qquad z_x > 0 \text{ if } x > \bar{x}. \tag{2} \]
We may reverse the transformation; if z_x is known, x can be found by
\[ x = \bar{x} + s \cdot z_x. \tag{3} \]
z-Scores
\[ z_x = \frac{x - \bar{x}}{s} \]
The z-score indicates how many standard deviations a
data point falls above or below the average x̄.
If the histogram plotted against the z-scores follows the normal
curve well, we say that the normal distribution provides a good
approximation for the distribution of the data.
The normal curve is well studied and many of its values have been stored
in normal tables. For data with a good normal approximation, these
tables can be used to estimate percentages of observations.
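The z-score transformation and its reverse are short enough to write out directly. In this sketch the function names `z_score` and `from_z` are hypothetical, and the sample values (average 69, SD 3) are borrowed from Example 8 later in the chapter.

```python
def z_score(x, mean, sd):
    """Convert a data point x to standard units: (x - mean) / sd."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Reverse the transformation: x = mean + sd * z."""
    return mean + sd * z

mean, sd = 69.0, 3.0

print(z_score(63, mean, sd))   # below the average, so the z-score is negative
print(z_score(69, mean, sd))   # at the average, so the z-score is zero
print(z_score(72, mean, sd))   # above the average, so the z-score is positive
print(from_z(1.0, mean, sd))   # back to original units
```

Applying `from_z` to the output of `z_score` returns the original data point, which is a quick sanity check on both formulas.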
Normal Table
A normal table, found in the text, provides the area between −z and z:
Figure 9. Fragment of a normal table.
Exercise 1, Set B, p.84
Using a normal table, let us find the area under the normal curve. The table
values quoted below are areas to the left of the given z.
(a) to the right of 1.25
Table Value of 1.25: 0.8944
0.1056 = 1 − 0.8944
(b) to the left of -0.4
Table Value of -0.4: 0.3446
(c) to the left of 0.8
Table Value of 0.8: 0.7881
(d) between 0.4 and 1.3
Table Value of 0.4: 0.6554
Table Value of 1.3: 0.9032
0.2478 = 0.9032 − 0.6554
(e) between -0.3 and 0.9
Table Value of -0.3: 0.3821
Table Value of 0.9: 0.8159
0.4338 = 0.8159 − 0.3821
(f) outside -1.5 to 1.5
Table Value of -1.5: 0.0668
Table Value of 1.5: 0.9332
0.1336 = (1 − 0.9332) + 0.0668
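The table lookups in this exercise can be cross-checked in software. A minimal sketch, assuming the standard-library `statistics.NormalDist` in place of the printed table; `phi(z)` is the area to the left of z.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area under the standard normal curve to the left of z

answers = {
    "a": 1 - phi(1.25),                  # to the right of 1.25
    "b": phi(-0.4),                      # to the left of -0.4
    "c": phi(0.8),                       # to the left of 0.8
    "d": phi(1.3) - phi(0.4),            # between 0.4 and 1.3
    "e": phi(0.9) - phi(-0.3),           # between -0.3 and 0.9
    "f": phi(-1.5) + (1 - phi(1.5)),     # outside -1.5 to 1.5
}

for part, area in answers.items():
    print(f"({part}) {area:.4f}")
```

Because the table entries are rounded to four decimals before subtracting, a computed answer can differ from the table-based one in the last digit.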
Example 8, p.85
The heights of the men age 18 and over in HANES5 averaged 69 inches;
the SD was 3 inches. Use the normal curve to estimate the percentage of
these men with heights between 63 inches and 72 inches.
Solution: The exact percentage is equal to the area under the height histogram
between 63 inches and 72 inches. We assume that the histogram can be well
approximated by the normal curve.
We will estimate the percentage of men between 63 and 72 inches by finding the
area of the corresponding region under the standard normal curve.
Step 1: Draw a number line and shade the interval of interest.
Example 8, p.85
Step 2: Mark the mean on the line and convert to standard units.
The z-score for the left endpoint is
\[ z_{63} = \frac{x - \bar{x}}{s} = \frac{63 - 69}{3} = -2. \]
The z-score for the right endpoint is
\[ z_{72} = \frac{x - \bar{x}}{s} = \frac{72 - 69}{3} = 1. \]
Step 3: Sketch the normal curve and find the area under the curve above
the shaded interval by using normal tables.
Example 8, p.85
Conclusion: From our table of z-scores, z63 = −2 is the 2.28th percentile
and z72 = 1 is the 84.13th percentile. The area between them is
84.13% − 2.28% = 81.85%.
Therefore, about 82% of the heights were between 63 inches and 72
inches. This is only an approximation; in truth, 81% of the men
were in that range.
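The three steps of this example can be sketched in a few lines. The average and SD are the ones given in the example; `statistics.NormalDist` stands in for the normal table, and the variable names are illustrative.

```python
from statistics import NormalDist

mean_height, sd_height = 69.0, 3.0   # inches, from the example

# Step 2: convert the endpoints to standard units.
z_63 = (63 - mean_height) / sd_height
z_72 = (72 - mean_height) / sd_height

# Step 3: area under the standard normal curve between the two z-scores.
std_normal = NormalDist()
percent_between = 100 * (std_normal.cdf(z_72) - std_normal.cdf(z_63))

print(f"z_63 = {z_63}, z_72 = {z_72}")
print(f"estimated percentage: {percent_between:.2f}%")
```

The estimate is the normal approximation only; it says nothing about how well the height histogram actually follows the curve.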
Example (S.A.T.)
The SAT is a test for readiness of students for college. The average SAT score (on
a 1600 point scale) is 1025 points and the standard deviation is 200 points. How
well must Jessica do on the SAT in order to place in the top 10% of all students?
Solution: The problem does not say that the histogram of the SAT scores is
bell-shaped, but it is reasonable to assume so. We will use the normal
approximation to the distribution of the SAT scores to solve the problem.
First, find a z-score representing the 90th percentile.
Example (S.A.T.)
Using the normal table provided in the textbook, Jessica is hoping for a
score that translates to z ≈ 1.3.
We know x̄ = 1025 and s = 200.
\[ z = \frac{x - \bar{x}}{s} \quad \Rightarrow \quad x = \bar{x} + s \cdot z = 1025 + 200 \cdot 1.3 = 1285 \]
So Jessica should score at least 1285 points to expect to be among the top 10%
of students.
The freshman average SAT score at Binghamton was 1305 in 2011.
In what percentile is the average freshman?
\[ z_{1305} = \frac{1305 - 1025}{200} = 1.4 \]
Using our z-table we find a value of 0.9192. Therefore the average
freshman at BU is at about the 92nd percentile.
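Both SAT questions can be cross-checked without a table. This sketch assumes, as the solution does, that SAT scores follow a normal curve with average 1025 and SD 200; it uses `statistics.NormalDist`, whose `inv_cdf` goes from a percentile to a score and whose `cdf` goes from a score to a percentile.

```python
from statistics import NormalDist

# Assumption (as in the solution): scores are approximately normal.
sat = NormalDist(mu=1025, sigma=200)

# Score at the 90th percentile, i.e. the cutoff for the top 10%.
cutoff_top_10 = sat.inv_cdf(0.90)

# Percentile of a score of 1305.
freshman_percentile = 100 * sat.cdf(1305)

print(f"top 10% cutoff: about {cutoff_top_10:.0f} points")
print(f"a score of 1305 is at about the {freshman_percentile:.0f}th percentile")
```

The computed cutoff is about 1281, slightly below 1285, because the slide rounds the 90th-percentile z-score up to 1.3.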
IQ Score
An intelligence quotient, or IQ, is a score derived from one of several
standardized tests designed to assess intelligence. The mean score is
normalized as 100 and the standard deviation is roughly 15.
An IQ score of 70 is what percentile?
\[ z_{70} = \frac{70 - 100}{15} = -2 \]
Table Value of −2: 0.0228, or about 2.3%
An IQ of 150 is required for entrance into a gifted program. What
percentage of students are considered eligible?
\[ z_{150} = \frac{150 - 100}{15} \approx 3.33 \]
Table Value of 3.33: 0.9996
With a requirement of a score of 150, only 1 − 0.9996 = 0.0004, or 0.04%, of
students will be considered "gifted".
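The two IQ calculations follow the same pattern: a percentile is an area to the left, and an eligibility rate is an area to the right. A minimal sketch, assuming IQ scores are well approximated by a normal curve with mean 100 and SD 15, using `statistics.NormalDist`:

```python
from statistics import NormalDist

# Assumption: IQ scores are approximately normal with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

# Percentile of a score of 70: area to the left of 70.
percentile_70 = 100 * iq.cdf(70)

# Percentage eligible for the program: area to the right of 150.
percent_above_150 = 100 * (1 - iq.cdf(150))

print(f"IQ 70 is at about the {percentile_70:.1f}th percentile")
print(f"about {percent_above_150:.2f}% of students score 150 or higher")
```

The second figure agrees with the table-based 0.04% to the precision the table allows.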