Unit 3 - Department of Mathematics and Statistics

Chapter 5
Page 1 of 10
Statistics 103
Chapter 5. The Normal Approximation of Data
We know from experience that many histograms look like the one below.
Figure 1. Sample Histogram
Around 1720 Abraham de Moivre discovered the normal curve. The normal
curve is a smooth, bell-shaped curve. Any histogram with a shape similar to the one
in Figure 1 can be compared to it.
Many histograms follow the bell curve because of the central limit theorem in
mathematics. It says that the distribution of the average of a random sample MUST
get closer and closer to the bell curve as the sample size gets bigger (provided the
population has a finite SD).
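A quick simulation shows the central limit theorem at work. This sketch rests on assumptions that are not in the notes. It uses Python's standard `random` and `statistics` modules and draws from a right-skewed exponential population with mean 1 and SD 1. For each sample it checks how often the sample average lands within one standard error of the population mean. For a histogram that follows the normal curve, that fraction would be about 68%.

```python
import random
import statistics

def fraction_within_one_se(sample_size, repetitions=10_000, seed=0):
    """Draw many samples from a right-skewed (exponential) population and
    return the fraction of sample averages that fall within one standard
    error of the population mean. Under the normal curve this is about 68%."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    pop_mean, pop_sd = 1.0, 1.0            # exponential with rate 1: mean 1, SD 1
    se = pop_sd / sample_size ** 0.5       # SD of the sample average
    hits = 0
    for _ in range(repetitions):
        avg = statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        if abs(avg - pop_mean) <= se:
            hits += 1
    return hits / repetitions

for n in (1, 5, 50):
    print(n, round(fraction_within_one_se(n), 3))
```

With these settings, single draws (n = 1) land within one SD roughly 86% of the time, which is far from the normal curve's 68%. Averages of 50 draws come out close to 68%.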
Some useful facts about the STANDARD normal curve (the normal curve with mean 0 and SD 1) are
1. It is symmetric about zero.
2. The total area under the curve is 100%.
3. The tails of the curve get close to zero but they never touch zero.
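The three facts can be checked numerically. This sketch assumes Python 3.8 or later. Its standard-library `statistics.NormalDist` class gives the height of the normal curve (`pdf`) and the area to the left of a point (`cdf`).

```python
from statistics import NormalDist

z_curve = NormalDist(mu=0, sigma=1)    # the standard normal curve

# Fact 1: symmetric about zero. The height at z equals the height at -z,
# and exactly half of the area lies to the left of 0.
print(z_curve.pdf(1.5) == z_curve.pdf(-1.5))
print(z_curve.cdf(0))

# Fact 2: the total area is 100%. Almost all of it lies between -6 and 6.
print(round(z_curve.cdf(6) - z_curve.cdf(-6), 6))

# Fact 3: the tails get very close to zero but stay positive.
print(z_curve.pdf(6) > 0)
```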
Here is a graph of the STANDARD NORMAL CURVE as an approximation to
histograms that have the bell shape.
Figure 2. The Normal Curve (vertical axis: percent per standard unit; horizontal axis: standard units)
Notice that the vertical axis uses the density scale (percent per standard unit). If we want to
approximate the percent of observations that fall between one value and another, we may
use the standard normal curve, whose areas are tabulated in the back of the book. Since
histograms can have any mean and SD and we want to use only one normal curve, we
have to convert to STANDARD UNITS first.
$$\text{DATA VALUE IN STANDARD UNITS} = \frac{\text{DATA VALUE} - \text{MEAN}}{\text{STANDARD DEVIATION}}$$
In real life, most data do not have mean 0 and SD 1, so they do not follow the standard
normal curve directly. However, many of them can be related to it. Another way to write
the procedure for converting to standard units is
$$z = \frac{x - \text{mean}}{\text{sd}} \qquad \text{(Formula A)}$$
If we convert values to standard units using Formula A, then the procedures for working
with all normal distributions are the same as those for the standard normal distribution.
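Formula A takes one line of code. The function name `to_standard_units` is a hypothetical helper and does not come from the course materials. The numbers are taken from the supermarket example that follows.

```python
def to_standard_units(x, mean, sd):
    """Formula A: z = (x - mean) / sd, the number of SDs that x lies
    above (positive z) or below (negative z) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd

# Supermarket example: mean = 45 minutes, sd = 12 minutes.
print(to_standard_units(24, 45, 12))   # -1.75
print(to_standard_units(54, 45, 12))   # 0.75
```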
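[Figure: a normal curve on the x scale, centered at the mean, and the standard normal curve on the z scale, centered at 0, where z = (x - mean)/sd; the shaded area A is the same under both curves.]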
Example. A survey indicates that on each trip to the supermarket, a shopper spends an
average of mean = 45 minutes, with a standard deviation of sd = 12 minutes. The length of
time spent in the store follows the normal curve and is represented by the variable x. A
shopper enters the store. Find the chance that the shopper will be in the store for each of
the lengths of time listed below.
1. Between 24 and 54 minutes
2. More than 39 minutes
Solution
1. The histogram for the amount of time spent in the store looks like the normal
curve, so we can use the normal curve to approximate the chance. It is the area
between 24 and 54 under the normal curve.
[Figure: normal curve for x, Time (minutes), with mean 45; the area between 24 and 54 is shaded.]
The graph above shows a normal curve with mean = 45 minutes and sd = 12 minutes. The
area for x between 24 and 54 minutes is shaded. The standard units that correspond to 24
minutes and to 54 minutes are
$$z_1 = \frac{24 - 45}{12} = -1.75 \quad \text{and} \quad z_2 = \frac{54 - 45}{12} = 0.75$$
So, the probability that a shopper will be in the store between 24 and 54 minutes is the
area under the standard normal curve from –1.75 to 0.75. The table gives the area
from –1.75 to +1.75 as 91.98%. By symmetry, the area from –1.75 to 0 is 91.98%/2, or about 46%.
The table gives the area from –0.75 to +0.75 as 54.6%. Half of that, about 27%, is the area
from 0 to 0.75. So the area from –1.75 to +0.75 is about 46% + 27% = 73%.
Another way of interpreting this probability is to say that 73% of the shoppers will be in
the store between 24 and 54 minutes.
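The table work above can be cross-checked in a few lines. This sketch assumes Python 3.8+. `NormalDist(mu=45, sigma=12).cdf(x)` gives the area to the left of x under the normal curve with mean 45 and SD 12, so it plays the role of the printed table.

```python
from statistics import NormalDist

# Normal curve for the time spent in the store, as stated in the example.
shopping_time = NormalDist(mu=45, sigma=12)

# Area between 24 and 54 minutes = area left of 54 minus area left of 24.
p_between = shopping_time.cdf(54) - shopping_time.cdf(24)
print(f"{p_between:.1%}")
```

It prints about 73.3%, which agrees with the 73% found from the table.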
2.
[Figure: normal curve for x, Time (minutes), with mean 45; the area to the right of 39 is shaded.]
The graph above shows a normal curve with mean = 45 minutes and sd = 12 minutes. The
area for x greater than 39 minutes is shaded. The z score that corresponds to 39 minutes
is
$$z = \frac{39 - 45}{12} = -0.5$$
The area under the standard normal curve from –0.5 to 0 is half of 38.29%, or about 19%.
The area from 0 to infinity is 50%. So the area from –0.5 to infinity is 50% + 19% = 69%.
The chance that a shopper spends more than 39 minutes in the store is therefore about 69%.
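The same kind of check works for part 2. This is again a sketch that assumes Python 3.8+ and `statistics.NormalDist`. The area to the right of 39 is 1 minus the area to its left.

```python
from statistics import NormalDist

shopping_time = NormalDist(mu=45, sigma=12)

# "More than 39 minutes" is everything to the right of 39.
p_more_than_39 = 1 - shopping_time.cdf(39)
print(f"{p_more_than_39:.1%}")   # 69.1%
```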

Steps for Converting to the Standard Normal Curve Using Table A-2
1. Sketch a normal curve, label the mean and the specific x values, then shade the
region representing the desired area.
2. For each relevant value x that is a boundary for the shaded region, use Formula A
to convert that value to the equivalent z score.
3. Refer to Table A-2 to find the area of the shaded region. This area is the desired
probability.
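Steps 2 and 3 can be combined in a small function. This is a sketch and is not part of the course materials. `normal_area` is a hypothetical name, and `statistics.NormalDist().cdf` (Python 3.8+) stands in for looking up Table A-2. Step 1, the sketch, is still worth doing by hand because it tells you which boundaries to pass in.

```python
import math
from statistics import NormalDist

def normal_area(mean, sd, low=-math.inf, high=math.inf):
    """Approximate chance that a value from a normal curve with the given
    mean and SD falls between low and high (hypothetical helper)."""
    z_curve = NormalDist(mu=0, sigma=1)    # the standard normal curve
    # Step 2: convert each boundary to a z score with Formula A.
    # An infinite boundary stays infinite, and cdf gives 0 or 1 there.
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    # Step 3: shaded area = area left of z_high minus area left of z_low.
    return z_curve.cdf(z_high) - z_curve.cdf(z_low)

print(round(normal_area(45, 12, 24, 54), 3))   # between 24 and 54 minutes
print(round(normal_area(45, 12, low=39), 3))   # more than 39 minutes
```

The two calls print about 0.733 and 0.691. These match the answers to the supermarket example.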
A value is converted to standard units by finding how many SDs it is above or below
the average. Here is a simple rule of thumb you should remember.
• The area under the normal curve between –1 and +1 is about 68%.
• The area under the normal curve between –2 and +2 is about 95%.
• The area under the normal curve between –3 and +3 is about 99.7%.
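[Figure: the normal curve with the areas between –1 and +1, –2 and +2, and –3 and +3 shaded, labeled Area = 68%, Area = 95% and Area = 99%.]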
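The rule of thumb can be compared with the exact areas. This sketch assumes Python 3.8+ and `statistics.NormalDist`. It prints about 68.3%, 95.4% and 99.7%.

```python
from statistics import NormalDist

z_curve = NormalDist()   # standard normal curve: mean 0, SD 1
for k in (1, 2, 3):
    area = z_curve.cdf(k) - z_curve.cdf(-k)   # area between -k and +k
    print(f"between -{k} and +{k}: {area:.1%}")
```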
Use the z-table in the back of your book to find the desired area under the normal curve,
which approximates the area under the histogram. I have also placed the z-table online. You can
look up the area in the online table OR go to the cell marked Enter a z value. The next cell
shows the desired area, and you can check it against the nearest z in the table. To go to the
online z table, click the link:
NORMAL_TABLE.
The normal table is also in the spreadsheet section of the course home unit.
The graphs below show more of the possible areas that you will be asked to find. You
will need to find areas underneath the normal curve in order to properly interpret surveys,
studies or data in general.
Example
Given a standard normal distribution, find the area under the curve that lies (a) to the
right of z = 1.7 and (b) between z = -1 and z = 2.5.
Solution
(a)
First sketch the region in question.
[Figure: standard normal curve with the area to the right of z = 1.7 shaded.]
Referring to Table A-2, the area between 0 and 1.7 is 0.4554.
[Figure: the same curve with a dotted line at z = 0; the area to the left of the dotted line is 0.50.]
We know that the area to the left of the dotted line (z = 0) is 0.5. From Table A-2, we
know that the area between 0 and 1.7 is 0.4554. The total area under the normal
curve is 1. To find the area to the right of 1.7:
1 - (area to left of dotted line) - (area between 0 and 1.7) =
1 - 0.5 - 0.4554 = 0.0446.
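Part (a) can be reproduced with the same subtraction. This sketch assumes Python 3.8+. `NormalDist().cdf(1.7) - 0.5` plays the role of the Table A-2 entry for the area between 0 and 1.7.

```python
from statistics import NormalDist

z_curve = NormalDist()
area_0_to_17 = z_curve.cdf(1.7) - 0.5    # Table A-2 style: area from 0 to z
area_right = 1 - 0.5 - area_0_to_17      # same subtraction as in the text
print(round(area_0_to_17, 4), round(area_right, 4))   # 0.4554 0.0446
```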
(b)
To find the area between z = -1 and z = 2.5, first sketch the region.
[Figure: standard normal curve with the area between z = -1 and z = 2.5 shaded.]
Referring to Table A-2, or the table in the spreadsheet section, the area between z
= 0 and z = 1 is 0.3413. The area between z = 0 and z = –1 is also 0.3413 by
symmetry. The area between z = 0 and z = 2.5 is 0.4938. To find the area
between z = -1 and z = 2.5, add 0.3413 and 0.4938. The shaded area is 0.8351.
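Part (b) can be checked the same way. This is a sketch that assumes Python 3.8+ and `statistics.NormalDist`. It splits the region at 0, just as the solution does.

```python
from statistics import NormalDist

z_curve = NormalDist()
left_part = 0.5 - z_curve.cdf(-1)      # area from -1 to 0 (0.3413 in the table)
right_part = z_curve.cdf(2.5) - 0.5    # area from 0 to 2.5 (0.4938 in the table)
print(round(left_part, 4), round(right_part, 4), round(left_part + right_part, 4))
```

It prints 0.3413, 0.4938 and their sum, 0.8351.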
Note that although a z score can be negative, the area under the curve (or the corresponding
probability) can never be negative.
Example 1
Find the area between –1.5 and 1.5 under the normal curve.
Solution
First, graph the normal curve and label the desired z-values.
[Figure: normal curve with the area between -1.5 and 1.5 shaded.]
Next, look at the z-table. (You can use the link to the online table: click
NORMAL_TABLE.)
Notice that the area in question is the same area shaded in the table at the back of the
book. In this example problem, the z-value is 1.5. To find this area, simply look up the
value of 1.5 in the table. The corresponding area is 86.64%.
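The table in the back of the book lists the area between -z and +z. Below is a sketch of that lookup. It assumes Python 3.8+, and `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_within(z):
    """Area under the standard normal curve between -z and +z
    (hypothetical helper mirroring the table in the back of the book)."""
    z_curve = NormalDist()
    return z_curve.cdf(z) - z_curve.cdf(-z)

print(f"{area_within(1.5):.2%}")   # 86.64%
```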
Example 2
Find the area between –1.7 and 0.9 under the normal curve. NORMAL_TABLE
Solution
Draw the normal curve and mark the desired area.
Example 2. Finding the Area Between -1.7 and 0.9 (vertical axis: percent per standard unit; horizontal axis: standard units)
The shaded area can be broken into two parts. First, find the area between –1.7
and 0.
Example 2. Area Between -1.7 and 0 (vertical axis: percent per standard unit; horizontal axis: standard units)
This area is half of the area between –1.7 and +1.7 shown in the z-table:
$$\frac{1}{2} \times 91.09\% \approx 45.55\%.$$
Now, find the area between 0 and 0.9.
Example 2. Finding the Area Between 0 and 0.9 (vertical axis: percent per standard unit; horizontal axis: standard units)
From the z-table NORMAL_TABLE, the area between –0.9 and +0.9 is 63.19%, so this area is
$$\frac{1}{2} \times 63.19\% \approx 31.60\%.$$
The total area is 45.55% + 31.60% = 77.15%.
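Example 2 can be cross-checked by computing the two parts directly. This is a sketch that assumes Python 3.8+ and `statistics.NormalDist`.

```python
from statistics import NormalDist

z_curve = NormalDist()
part_left = z_curve.cdf(0) - z_curve.cdf(-1.7)   # area from -1.7 to 0
part_right = z_curve.cdf(0.9) - z_curve.cdf(0)   # area from 0 to 0.9
total = part_left + part_right                    # same as cdf(0.9) - cdf(-1.7)
print(f"{part_left:.2%} + {part_right:.2%} = {total:.2%}")
```

This prints 45.54% + 31.59% = 77.14%. The last digit differs from the hand answer of 77.15% only because the hand calculation halves table entries that were already rounded.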
Percentiles and Histograms
A score or value X is the yth percentile if y% of the area of the histogram is to
the left of X. This is true no matter the shape of the histogram. If we don’t know the
shape of the histogram, we often use percentiles to give us an idea of what a score
means. For example, knowing that an income level of $25,000 per year is the 30th
percentile tells us that 30% of people have an annual income of less than $25,000.
Terms associated with percentiles:
1. quartiles—the first quartile is the 25th percentile, the second quartile is the 50th
percentile, the third quartile is the 75th percentile.
2. median—the same as the second quartile, which is the same as the 50th percentile.
3. interquartile range—the difference between the 75th and 25th percentile.
4. the quartiles of the standard normal curve are approximately z = -0.67, z = 0, and z = 0.67.
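The quartiles of the standard normal curve can be computed with the inverse of the area function. In this sketch (Python 3.8+), `NormalDist().inv_cdf(p)` returns the z that has area p to its left. Using p = 0.25, 0.50 and 0.75 gives the three quartiles, about -0.67, 0 and 0.67.

```python
from statistics import NormalDist

z_curve = NormalDist()
# The z with 25%, 50% and 75% of the area to its left.
q1, q2, q3 = (z_curve.inv_cdf(p) for p in (0.25, 0.50, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))
print("interquartile range:", round(q3 - q1, 2))   # 1.35
```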
Example: If it is known that scores on the LSAT follow the normal curve with mean 169
and SD=9, what is the 90th percentile score?
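Solution: We want the score Y such that 90% of people scored less than (or equal to) Y.
Using the Normal Table, we find that about 90% of the area is to the LEFT of z = 1.3: the
table gives 80.64% between –1.3 and +1.3, and half of the remaining 19.36%, or 9.68%, lies to
the left of –1.3, so 80.64% + 9.68% = 90.32% lies to the left of 1.3.
This is approximately the 90th percentile of the standard normal curve. Now we convert that to an LSAT
score.
$$1.3 = \frac{\text{SCORE} - 169}{9}, \quad \text{so} \quad \text{SCORE} = 9 \times 1.3 + 169 = 180.7$$
is the 90th percentile.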
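The LSAT calculation can be sketched the same way, assuming Python 3.8+ and `statistics.NormalDist`. First find the 90th percentile of the standard normal curve. Then undo Formula A with score = mean + z × SD.

```python
from statistics import NormalDist

z_90 = NormalDist().inv_cdf(0.90)   # 90th percentile of the standard normal curve
score_90 = 169 + 9 * z_90           # undo Formula A: x = mean + z * sd
print(round(z_90, 2), round(score_90, 1))   # 1.28 180.5
# Equivalent shortcut: NormalDist(mu=169, sigma=9).inv_cdf(0.90)
```

The exact z is about 1.28 rather than the table's 1.3. That puts the 90th percentile near 180.5 instead of 180.7, and the difference comes only from rounding z.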
Example 3 Pg. 95, #11
Solution
First calculate the average plus one standard deviation: 1.1 + 1.5 = 2.6.
Next, calculate the average minus one standard deviation: 1.1 – 1.5 = -0.4.
If the histogram followed the normal curve, about 68% of the Berkeley students would have
taken between -0.4 and 2.6 college mathematics courses (within one SD of the average).
However, it is not possible for a student to have taken –0.4 courses. The
minimum number of courses anyone could have taken is zero. Therefore, the
histogram does not follow the shape of the normal curve.
Histogram (iii) would say that a large percentage of students took a lot of math courses,
which does not fit an average of only 1.1 courses. That doesn’t seem right.
Histogram (i) says the majority of students took only a few math courses, most of them just one
course. That is about right, so (i) is the correct choice.
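Another way to see how badly a normal curve fits here is to ask how much of its area would fall below zero courses. This is a hypothetical check. It assumes Python 3.8+ and a normal curve with the average of 1.1 and SD of 1.5 used in the solution.

```python
from statistics import NormalDist

# Hypothetical: pretend the number of courses followed a normal curve
# with the average (1.1) and SD (1.5) from the solution.
courses = NormalDist(mu=1.1, sigma=1.5)
below_zero = courses.cdf(0)    # area for a negative number of courses
print(f"{below_zero:.0%}")     # 23%
```

About 23% of the area would correspond to a negative number of courses, which is impossible. The normal curve is therefore a poor approximation for this histogram.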