Chapter 1.3 – The Normal Distribution

Stat 226 – Introduction to Business Statistics I
Spring 2009, Section A
Professor: Dr. Petrutza Caragea
Tuesdays and Thursdays 9:30-10:50 a.m.

Density Curves
So far we have:
graphically displayed data: histogram, stemplot, boxplot
described the overall pattern and identified deviations and outliers
numerically quantified center and spread of the distribution
If the distribution (as displayed by the histogram) appears sufficiently
regular, we can approximate it with a smooth curve, a so-called density
curve.
The density curve is a simplified and idealized version of reality, but it can
still be useful!
Example:
gas mileage example from textbook:
Properties
A density curve is a curve that
is always on or above the horizontal axis, and
has an area of exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area
under the curve and above any range of values is the proportion of all
observations that fall in that range.
Examples:
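As one illustration of the two properties, the sketch below uses a hypothetical density curve, f(x) = x / 2 on the interval from 0 to 2 (not an example from the textbook), and approximates areas with the midpoint rule. The function names `density` and `area` are made up for this sketch.

```python
def density(x):
    """A hypothetical density curve: f(x) = x / 2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Property 1: the curve is never below the horizontal axis (checked on a grid).
never_negative = all(density(i / 100) >= 0 for i in range(-100, 301))

# Property 2: the total area underneath the curve is 1.
total_area = area(density, 0, 2)

# The area above a range of values is the proportion of observations in that range.
share_between_0_and_1 = area(density, 0, 1)

print(never_negative, total_area, share_between_0_and_1)
```

For this particular curve, about one quarter of all observations fall between 0 and 1, because that is the area above that range.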
Median and Mean of a Density Curve
Median: The equal-areas point, with 50% of the area under the curve on either side.
Mean: The balancing point of the curve, if it were made of solid material.
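A minimal sketch of both definitions, assuming a hypothetical left-skewed density f(x) = x / 2 on the interval from 0 to 2 (chosen only for illustration). The median is found as the point with half the area to its left, and the mean as the balancing point, i.e. the area under x times f(x). All function names here are made up for this sketch.

```python
def density(x):
    """A hypothetical left-skewed density curve: f(x) = x / 2 on [0, 2]."""
    return x / 2 if 0 <= x <= 2 else 0.0


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def median_of(f, a, b, tol=1e-6):
    """Equal-areas point: bisect until half of the area lies to the left."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(f, a, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_of(f, a, b):
    """Balancing point: the area under x * f(x)."""
    return area(lambda x: x * f(x), a, b)


median = median_of(density, 0, 2)
mean = mean_of(density, 0, 2)
print(round(median, 3), round(mean, 3))
```

Because this curve has its long tail on the left, the mean is pulled below the median. For a symmetric density curve the two would coincide.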
Introduction to Normal Distributions
The Normal (or Gaussian) distribution, associated with Carl Friedrich Gauss (1777-1855), is the single most important
distribution in Statistics.
Many variables can be modeled (described) using the Normal
distribution, e.g.
height of humans
SAT scores
length of human pregnancies, etc.
It is characterized by the following two parameters:
the mean µ and the standard deviation σ.
Overall shape: symmetric, unimodal, and bell-shaped.
pictures of various normal distributions:
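Notation: to denote the normal distribution with mean µ and standard deviation σ we use N(µ, σ).
Example (using the height and IQ distributions that appear later in this section): N(70, 3) denotes a normal distribution with mean 70 and
standard deviation 3, while N(100, 16) denotes a normal
distribution with mean 100 and standard deviation 16.
To denote that a variable X (e.g. heights, SAT scores, etc.) follows a
normal distribution we write X ∼ N(µ, σ).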
The 68-95-99.7 Rule
For a variable that follows a normal distribution N(µ, σ), we have that
approx. 68% of all the data fall within 1 standard deviation of the mean, i.e. within µ ± σ
approx. 95% of all the data fall within 2 standard deviations of the mean, i.e. within µ ± 2σ
approx. 99.7% of the data fall within 3 standard deviations of the mean, i.e. within µ ± 3σ
The rule holds for all normal distributions (i.e. for any choice of µ and σ).
[Figure: normal curve with the horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ. The areas between consecutive marks are 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, with 0.15% in each tail beyond µ ± 3σ.]
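The 68-95-99.7 percentages can be checked numerically. The sketch below is not part of the course material: it computes the area under a normal curve from the error function in Python's standard library, and the helper names `normal_cdf` and `share_within` are made up for this illustration.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) density curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of observations within k standard deviations of the mean."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)


for k in (1, 2, 3):
    # Second column uses a different mean and standard deviation on purpose:
    # the shares do not depend on the choice of mu and sigma.
    print(k, round(share_within(k), 4), round(share_within(k, mu=266, sigma=16), 4))
```

The three shares come out at roughly 0.68, 0.95, and 0.997, which is where the name of the rule comes from.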
Example: The length of human pregnancies follows a normal distribution
with mean µ = 266 days and a standard deviation of σ = 16 days.
1 How long do the middle 95% of all pregnancies last?
2 How long do the shortest 16% of all pregnancies last (at most)?
3 How long do the longest 0.15% of all pregnancies last (at least)?

The Standard Normal Distribution
The standard normal distribution
is a “special” normal distribution.
has a mean of 0 and a standard deviation of 1, denoted by N(0, 1).
Nearly all the area is between −3 and 3.
[Figure: Standard Normal Distribution, density curve for values from −3 to 3]
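Returning to the pregnancy-length example (µ = 266 days, σ = 16 days), the three questions can be answered with the 68-95-99.7 rule alone. The sketch below only does the arithmetic. The variable names are made up for this illustration.

```python
mu, sigma = 266, 16  # mean and standard deviation in days, from the example

# Middle 95%: within 2 standard deviations of the mean.
middle_95 = (mu - 2 * sigma, mu + 2 * sigma)

# Shortest 16%: 68% lie within 1 standard deviation of the mean,
# so (100 - 68) / 2 = 16% lie below mu - sigma.
shortest_16_at_most = mu - sigma

# Longest 0.15%: 99.7% lie within 3 standard deviations of the mean,
# so (100 - 99.7) / 2 = 0.15% lie above mu + 3 * sigma.
longest_015_at_least = mu + 3 * sigma

print(middle_95, shortest_16_at_most, longest_015_at_least)
```

Under the rule, the middle 95% last between 234 and 298 days, the shortest 16% last at most 250 days, and the longest 0.15% last at least 314 days.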
Knowing the mean and the standard deviation of a normal distribution
allows us to determine
1 What proportion of individuals fall in a specified range.
2 What percentile a given individual falls at if you know their data value.
3 What data value corresponds to a given percentile.

For the standard normal distribution, the proportion of
observations falling into a specified range is tabulated.
This is the only normal distribution for which we have
tabulated values.
We therefore need to transform any given normal
distribution to a standard normal distribution, i.e. the values from any
N(µ, σ) are transformed to the corresponding values from a N(0, 1).
This is called standardizing.
Standardizing, z-score
If x is an observation from a normal distribution that has mean µ and
standard deviation σ, the standardized value of x is given by
z = (x − µ) / σ
A standardized value is often called a z-score.
A z-score tells us how many standard deviations the original
observation is away from the mean and in which direction.
Observations larger than the mean have a positive
z-score when standardized, and observations smaller than the mean
have a negative z-score when standardized.
Example: (length of human pregnancies continued)
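As a small illustration of standardizing, the sketch below applies z = (x − µ) / σ to the pregnancy-length distribution (µ = 266 days, σ = 16 days). The three pregnancy lengths are hypothetical values picked for this sketch, not values from the slides.

```python
def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 266, 16  # pregnancy lengths in days

# Hypothetical observations: below the mean, at the mean, above the mean.
z_values = {days: z_score(days, mu, sigma) for days in (250, 266, 298)}

for days, z in z_values.items():
    print(days, z)
```

The sign of each z-score shows the direction (250 days is below the mean, 298 days is above it), and the size shows the distance in standard deviations.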
Why are z-scores helpful?
IQs follow a normal distribution with mean µ = 100 and standard
deviation σ = 16
heights of males follow approx. a normal distribution with mean
µ = 70 inches and standard deviation σ = 3 inches
Who is more unusual? A man being 73 inches tall or a man having an
IQ of 124?
Once we know the corresponding z-score of an observation we can look up
the overall proportion (percentage) of men in that population having a
height of 73 inches or more.
⇒ need to know how to read Table A (Table of the Standard Normal
Distribution)

Finding z-scores and corresponding proportions/areas
under the normal curve
⇒ Table A in your textbook
Note, in the following the terms proportion, probability, percentage, and
area are all interchangeable, i.e.
proportion = probability = percentage = area
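The "who is more unusual" comparison can be sketched in Python. Here `statistics.NormalDist` from the standard library stands in for Table A (an assumption of this sketch, since the course uses the printed table), and the helper name `z_score` is made up.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardized value of x for a N(mu, sigma) distribution."""
    return (x - mu) / sigma


std_normal = NormalDist()  # N(0, 1)

z_height = z_score(73, 70, 3)   # a man who is 73 inches tall
z_iq = z_score(124, 100, 16)    # a man with an IQ of 124

# Proportion of the population at or above each value (upper tail area).
share_height = 1 - std_normal.cdf(z_height)
share_iq = 1 - std_normal.cdf(z_iq)

print(z_height, round(share_height, 4))
print(z_iq, round(share_iq, 4))
```

The IQ of 124 has the larger z-score (1.5 versus 1.0) and therefore the smaller upper tail area, so it is the more unusual of the two observations.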
To find the proportion (corresponding to the area under the normal
curve) of observations that fall into a given range, e.g. between −z
and z, we look up the areas below the endpoints in Table A.
The first column gives the z-score values correct to one decimal
place, and the first row gives the second decimal place of a z-score. For example, to find the area below z = −2.24, we
find −2.2 in the first column and then look for .04 along the first
row. The entry where the corresponding row and column intersect is
0.0125.
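The row-and-column lookup can be mimicked in a few lines. This is only a sketch: `table_a_lookup` is a hypothetical helper, and it computes the area with `statistics.NormalDist` instead of reading a printed table, rounding to four decimals as such tables usually do.

```python
from statistics import NormalDist


def table_a_lookup(z):
    """Split z into its table row and column and return the area below z."""
    z = round(z, 2)
    sign = -1 if z < 0 else 1
    hundredths = round(abs(z) * 100)        # e.g. 224 for z = -2.24
    row = sign * (hundredths // 10) / 10    # first column of the table: -2.2
    column = (hundredths % 10) / 100        # first row of the table: 0.04
    area_below = round(NormalDist().cdf(z), 4)
    return row, column, area_below


row, column, area_below = table_a_lookup(-2.24)
print(row, column, area_below)
```

For z = −2.24 this reproduces the lookup described above: row −2.2, column .04, and an area of 0.0125.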
Using Table A to find proportions under the normal curve
Consider the following situations:
1 What proportion of observations is below z = −1.67, i.e. what is the
probability of observing a z-score of −1.67 or less?
2 What proportion is below z = 1.67?
3 What proportion of observations is greater than z = 1.67?
4 What proportion is less than z = −2.00 or greater than z = 2.00?
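These lookups can be cross-checked in Python, with `statistics.NormalDist` standing in for Table A (an assumption of this sketch). The table gives areas below a z-score, so an area above z is obtained as 1 minus the area below it. The variable names are made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

below_minus_167 = std_normal.cdf(-1.67)      # area below z = -1.67
below_167 = std_normal.cdf(1.67)             # area below z = 1.67
above_167 = 1 - std_normal.cdf(1.67)         # area above z = 1.67

# Both tails together: below z = -2.00 plus above z = 2.00.
outside_2 = std_normal.cdf(-2.00) + (1 - std_normal.cdf(2.00))

print(round(below_minus_167, 4), round(below_167, 4),
      round(above_167, 4), round(outside_2, 4))
```

By symmetry of the normal curve, the area below −1.67 equals the area above 1.67.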
1 What is the area between z = −1.25 and z = 1.25?
2 What proportion is between z = 0.96 and z = 2.33?
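An area between two z-scores is the area below the larger one minus the area below the smaller one. A minimal sketch, again assuming `statistics.NormalDist` in place of Table A and a made-up helper name `area_between`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)


def area_between(z_low, z_high):
    """Area under the standard normal curve between z_low and z_high."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


middle = area_between(-1.25, 1.25)
right_band = area_between(0.96, 2.33)

print(round(middle, 4), round(right_band, 4))
```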
1 What z-score does the 30th percentile correspond to?
2 What z-scores bound the middle 60%?
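Going from a percentile to a z-score reverses the table lookup: find the area in the body of the table, then read off the z-score. In this sketch `NormalDist.inv_cdf` from Python's standard library plays that role (an assumption, since the course uses the printed Table A).

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# 30th percentile: the z-score with an area of 0.30 below it.
z_30th = std_normal.inv_cdf(0.30)

# Middle 60%: 20% is left in each tail, so the bounds are the
# 20th and the 80th percentile.
z_lower = std_normal.inv_cdf(0.20)
z_upper = std_normal.inv_cdf(0.80)

print(round(z_30th, 2), round(z_lower, 2), round(z_upper, 2))
```

With Table A the answers are read to two decimals, so expect about −0.52 for the 30th percentile and about ±0.84 for the middle 60%.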
Applications of the Normal Distribution
1 State the problem, i.e. state the mean µ, the standard deviation σ,
and the value of the observation x.
2 Standardize x, i.e. find the corresponding z-score using
z = (x − µ) / σ
3 Draw a picture, i.e. locate the z-score under the normal curve and shade the area of
interest.
4 Use Table A to find the shaded area.

Example: male heights ∼ N(70, 3)
1 What proportion of men is shorter than 72 inches?
2 What proportion of men is taller than 65 inches?
3 What proportion of men is taller than 73 inches?
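The steps can be sketched for the male-heights example. `statistics.NormalDist` stands in for Table A here (an assumption of this sketch), the drawing step is skipped, and the helper name `z_score` is made up.

```python
from statistics import NormalDist

# Step 1: state the problem. Heights ~ N(70, 3), in inches.
mu, sigma = 70, 3
std_normal = NormalDist()  # N(0, 1), used in place of Table A


def z_score(x):
    """Step 2: standardize x using z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Step 4: find the area of interest.
shorter_than_72 = std_normal.cdf(z_score(72))       # area below z
taller_than_65 = 1 - std_normal.cdf(z_score(65))    # area above z
taller_than_73 = 1 - std_normal.cdf(z_score(73))    # area above z

print(round(shorter_than_72, 4), round(taller_than_65, 4), round(taller_than_73, 4))
```

Answers read from Table A can differ slightly in the third decimal, because the table requires rounding each z-score to two decimal places first.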
What proportion of men has an IQ of 124 or more? (IQ ∼ N(100, 16))

Backwards Calculations
We can also work backwards: given a certain percentile (or proportion),
what is the corresponding value of x?
Example: Heights ∼ N(70, 3)
1 What value does the 50th percentile of men’s height correspond to?
2 What value does the 10th percentile of men’s height correspond to?
In general, to do backward calculations use the following formula:
x = z · σ + µ
What value does the 85th percentile correspond to?
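A minimal sketch of the backward calculation x = z · σ + µ for the heights example. It assumes `NormalDist.inv_cdf` as a substitute for searching the body of Table A, and it assumes the 85th-percentile question also refers to men's heights. The helper name `value_at_percentile` is made up.

```python
from statistics import NormalDist

mu, sigma = 70, 3          # heights ~ N(70, 3), in inches
std_normal = NormalDist()  # N(0, 1)


def value_at_percentile(p):
    """Find the z-score with area p below it, then unstandardize."""
    z = std_normal.inv_cdf(p)
    return z * sigma + mu


height_50th = value_at_percentile(0.50)
height_10th = value_at_percentile(0.10)
height_85th = value_at_percentile(0.85)  # assumed to refer to heights as well

print(round(height_50th, 2), round(height_10th, 2), round(height_85th, 2))
```

The 50th percentile is the mean itself (z = 0), the 10th percentile lies below the mean (negative z), and the 85th percentile lies above it (positive z).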
Assessing Normality of Data
In general it is quite risky to assume normality without
looking at the data and verifying normality.
Normally distributed data allow the application of further statistical
procedures, which enable us to learn more about the data and to
derive additional information about the variable we are
interested in. (We will learn about such procedures in Chapters 6 & 7.)
If data are not normally distributed and we still apply statistical
procedures that require the assumption of normality, the derived
information can be wrong and misleading.

How to assess Normality
Based on experience and/or past data, the assumption of normality
might be justified.
Histogram/stemplot or boxplot: reveal non-normal features, such as
skewness
multiple modes
outliers
If the above graphical displays appear somewhat normal, i.e. they indicate
a symmetric, unimodal, bell-shaped distribution, we can use a so-called
normal quantile plot.
Normal quantile plots are a more sensitive tool, allowing us to take a closer
look to judge the adequacy of normality.
Chapter 1.3 – The Normal Distribution
Normal quantile plots:
Observations from a standard normal distribution for various sample sizes
n=1250
hard to construct by hand (use JMP)
n=100
4
for main idea see pages 67 & 68 of the textbook
.999
.99
.95
.90
If distribution is close to a normal distribution, the plots points in a
normal quantile plot will lie close to a straight line.
Some Caution:
Real data almost always show some departure from normality (i.e.
from a perfect normal distribution).
.75
.50
.25
.10
.05
.01
.001
3
2
1
0
4
.999
3
.99
2
.95
.90
1
.75
0
.50
.25
-1
-1
.10
.05
-2
-2
.01
-3
Normal Quantile Plot
Introduction to Business Statistics I
Normal Quantile Plot
Stat 226 (Spring 2009, Section A)
-3
.001
-4
-4
It is important to restrict the examination of a normal quantile plot to
searching for clear departures from normality.
We can ignore “minor wiggles” in the plot — most common methods
will work well as long as the data are reasonably close to a normal
distribution with no extreme outliers.
Stat 226 (Spring 2009, Section A)
Introduction to Business Statistics I
Section 1.3
35 / 38
small sample sizes
[Figure: Normal Quantile Plots for n=25 and n=10]

Observations from a skewed right and a triangular distribution
[Figure: Normal Quantile Plots for the two samples]
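The main idea behind a normal quantile plot can be sketched without plotting software: sort the data, pair each observation with the standard normal quantile of its rank, and check how close the pairs are to a straight line. This is an illustration under stated assumptions, not the procedure JMP uses. The plotting positions (i − 0.5) / n are one common convention, both data sets are made up, and the correlation is only a rough numeric stand-in for judging the plot by eye.

```python
from statistics import NormalDist, mean


def normal_quantile_points(data):
    """Pair each sorted observation with a standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # N(0, 1)
    # Plotting positions (i - 0.5) / n are an assumed convention.
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def straightness(points):
    """Correlation of the plotted points; values near 1 mean nearly a straight line."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mz, mx = mean(zs), mean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (szz * sxx) ** 0.5


# Synthetic, perfectly normal-looking heights: exact quantiles of N(70, 3).
normal_like = [NormalDist(70, 3).inv_cdf((i - 0.5) / 25) for i in range(1, 26)]

# Hypothetical right-skewed sample with one extreme value.
skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]

r_normal = straightness(normal_quantile_points(normal_like))
r_skewed = straightness(normal_quantile_points(skewed))
print(round(r_normal, 3), round(r_skewed, 3))
```

The synthetic normal sample falls on a straight line, while the skewed sample bends away from it. With real data, and especially with small samples, the plot itself should still be inspected for clear departures from normality.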