The Normal Distribution
This module introduces one of the most important distributions in statistics: the normal distribution.
Learning Objectives
After completing this module, the student will be able to:
recognize the normal distribution
recognize the standard normal distribution
perform calculations with the normal distribution using Excel
convert a normal distribution to a standard normal distribution
Knowledge and Skills
normal distribution
standard normal distribution
Prerequisites
sample mean, average
sample standard deviation
sample variance
percentile
This module relies heavily on Collaborative Statistics, Chapter 6, which is available online at
http://cnx.org/content/m16979/latest/
Citation: Neuhauser, C. Normal Distribution.
Created: September 9, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution
Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows
others to translate, make remixes, and produce new stories based on this work, provided the original author and source are
credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.
Guide to Chapter 6
Section 6.1
Read Subsections 6.1.1 and 6.1.2
Length at Birth
The U.S. Department of Health and Human Services (HHS) is a U.S. government agency responsible for
“protecting the health of all Americans and providing essential human services”
(http://www.hhs.gov/about/). One of the HHS agencies is the Centers for Disease Control and Prevention
(CDC), which oversees a number of coordinating centers and offices, including the Coordinating Center
for Health Information and Service (CCHIS). The National Center for Health Statistics (NCHS) is an office
within the CCHIS. According to the NCHS website, its mission “is to provide statistical information that
will guide actions and policies to improve the health of the American people. As the Nation's principal
health statistics agency, NCHS leads the way with accurate, relevant, and timely data”
(http://www.cdc.gov/nchs/). Data on the health and nutritional status of adults and children in the U.S.
are collected by the National Health and Nutrition Examination Survey (NHANES). We will look at one of
the NHANES data sets, namely the growth curves for infants. The data files can be found at
http://www.cdc.gov/growthcharts/html_charts/lenageinf.htm.
Growth charts that provide percentiles for length and weight at various ages are important screening
tools for monitoring the growth of infants and for assessing their nutritional status and health risks,
such as those caused by obesity. The following data¹ list selected percentiles of height (cm) for boys at
0 months, that is, at birth:
Percentile       3      5     10     25     50     75     90     95     97
Height [cm]  44.93  45.57  46.55  48.19  49.99  51.77  53.36  54.31  54.92
The distribution of length at birth follows a characteristic pattern that is well described by the normal distribution.
Subsection 6.1.3
We say that a random variable X is normally distributed with mean μ and standard deviation σ if its
probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
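Writing the density formula out as code can help with memorizing its structure: a normalizing constant 1/(σ√(2π)) times an exponential of the scaled squared distance from the mean. The sketch below is in Python; the function name normal_pdf is a hypothetical choice for illustration, not part of any library.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x) of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalizing constant
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)        # scaled squared distance from the mean
    return coefficient * math.exp(exponent)

# The density is largest at x = mu and symmetric about mu.
print(normal_pdf(0.0, 0.0, 1.0))                            # about 0.3989, i.e. 1/sqrt(2*pi)
print(normal_pdf(-1.0, 0.0, 1.0) == normal_pdf(1.0, 0.0, 1.0))
```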
¹ Source: http://www.cdc.gov/growthcharts/html_charts/lenageinf.htm
Because of its importance in statistics, it is worthwhile to memorize the form of the density function.
The cumulative distribution function F(x) = P(X ≤ x) cannot be written in terms of elementary functions,
so it must be calculated by a computer or looked up in tables.
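Although F(x) has no elementary formula, it can be written in terms of the error function erf, which most programming languages provide: F(x) = ½[1 + erf((x − μ)/(σ√2))]. The Python sketch below uses this identity (the name normal_cdf is hypothetical) and cross-checks the result against the standard library's statistics.NormalDist, available in Python 3.8 and later.

```python
import math
from statistics import NormalDist

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X normally distributed with mean mu and standard deviation sigma."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Cross-check against the standard library implementation.
x, mu, sigma = 44.93, 49.99, 2.66
print(normal_cdf(x, mu, sigma))
print(NormalDist(mu, sigma).cdf(x))
```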
Figure 1: The probability density function of a normal distribution with mean 0 and standard deviation 1.
The probability density function of a normal distribution with mean 0 and standard deviation 1 is
displayed in Figure 1. Note the bell-shaped curve.
EXCEL has a function that returns either the cumulative distribution function or the probability density
function of a normal distribution with a specified mean and standard deviation. The syntax is
NORMDIST(x,mean,standard_dev,cumulative)
where x is the value at which you want to evaluate the distribution, mean is the mean, standard_dev is
the standard deviation, and cumulative is a logical value (TRUE for the cumulative distribution function
and FALSE for the probability density function). In Excel 2010 and later, the function NORM.DIST takes
the same arguments; NORMDIST is still available for compatibility.
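Outside a spreadsheet, the behavior of NORMDIST can be mimicked with Python's statistics.NormalDist (Python 3.8+). The wrapper name normdist and its argument order are a hypothetical choice made to mirror the Excel syntax above; it is neither an Excel nor a Python API.

```python
from statistics import NormalDist

def normdist(x: float, mean: float, standard_dev: float, cumulative: bool) -> float:
    """Hypothetical Python stand-in for Excel's NORMDIST(x, mean, standard_dev, cumulative)."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    # cumulative=True  -> cumulative distribution function P(X <= x)
    # cumulative=False -> probability density function f(x)
    return dist.cdf(x) if cumulative else dist.pdf(x)

print(normdist(0, 0, 1, True))    # 0.5
print(normdist(0, 0, 1, False))   # about 0.3989
```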
In-class Activity 1
(a) The height of boys at age 0 is normally distributed with mean 49.99 cm and standard deviation
2.66 cm. Use EXCEL to confirm the percentiles given in the table above.
Instruction: To determine the percentile of the value 44.93 cm, calculate the probability that the height
is less than or equal to 44.93 cm. The following EXCEL formula calculates this probability:
=NORMDIST(44.93,49.99,2.66, TRUE)
Rounded to two decimal places, the answer is 0.03; that is, 44.93 cm is the 3rd percentile, which confirms
the value in the table. Repeat this for the other values given in the table.
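As a cross-check of the Excel results, the same calculation can be repeated for every entry of the boys' table. This is a Python sketch using statistics.NormalDist; each cdf call computes the same quantity as =NORMDIST(height,49.99,2.66,TRUE).

```python
from statistics import NormalDist

# Boys' length at birth: percentile -> height in cm (table above)
table = {3: 44.93, 5: 45.57, 10: 46.55, 25: 48.19, 50: 49.99,
         75: 51.77, 90: 53.36, 95: 54.31, 97: 54.92}

boys = NormalDist(mu=49.99, sigma=2.66)  # model stated in part (a)

for percentile, height in table.items():
    p = boys.cdf(height)  # P(X <= height)
    print(f"{height:6.2f} cm -> P = {p:.4f} (rounded {p:.2f}, table {percentile / 100:.2f})")
```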
(b) In EXCEL, plot the cumulative distribution function and the probability density function of a normally
distributed random variable with mean 49.99 and standard deviation 2.66 for 40 ≤ x ≤ 60.
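To check an Excel plot, or to prepare its data elsewhere, the values can be tabulated first. This Python sketch uses only the standard library (no plotting package is assumed) and evaluates f(x) and F(x) at steps of 0.5 over 40 ≤ x ≤ 60; the step size is an arbitrary choice. The tab-separated output can be pasted into a spreadsheet and charted.

```python
from statistics import NormalDist

dist = NormalDist(mu=49.99, sigma=2.66)

step = 0.5
xs = [40 + i * step for i in range(int((60 - 40) / step) + 1)]  # 40.0, 40.5, ..., 60.0

print("x\tpdf\tcdf")
for x in xs:
    print(f"{x:.1f}\t{dist.pdf(x):.5f}\t{dist.cdf(x):.5f}")
```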
Section 6.2
Read Section 6.2.
The normal distribution is described by two parameters, the mean μ and the standard deviation σ. There
are thus infinitely many such distributions. However, they are all related: if a random variable X is
normally distributed with mean μ and standard deviation σ, we can define a new random variable Z by
$$Z = \frac{X - \mu}{\sigma}$$
This random variable is then normally distributed with mean 0 and standard deviation 1; its distribution
is called the standard normal distribution.
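A small numerical illustration of standardization, assuming the boys' model from In-class Activity 1: the probability P(X ≤ 44.93) computed from N(49.99, 2.66) agrees with P(Z ≤ z) computed from the standard normal distribution at z = (44.93 − 49.99)/2.66. Python's statistics.NormalDist is used for both.

```python
from statistics import NormalDist

mu, sigma = 49.99, 2.66       # boys' length at birth, as in Activity 1
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)          # standard normal distribution

x = 44.93
z = (x - mu) / sigma          # standardize: Z = (X - mu) / sigma
print(round(z, 4))            # -1.9023
print(X.cdf(x), Z.cdf(z))     # the two probabilities agree
```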
Section 6.3
Homework: Read Section 6.3 and work through all examples and problems.
Section 6.4
Homework: Read Section 6.4.
Section 6.5
Homework: Read Section 6.5 and work through all examples and problems. Below are solutions to two
of the problems using EXCEL functions.
Problem 1: Find P(X > 65) when X is normally distributed with mean 63 and standard deviation 5, that is,
X ~ N(63, 5).
Solution: To find P(X > 65) in EXCEL, enter “=1-NORMDIST(65,63,5,TRUE)”. You should get “0.344578”
as your answer.
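The same complement rule, P(X > 65) = 1 − P(X ≤ 65), can be sketched in Python with statistics.NormalDist as an independent check of the Excel answer.

```python
from statistics import NormalDist

X = NormalDist(mu=63, sigma=5)
p = 1 - X.cdf(65)      # P(X > 65) = 1 - P(X <= 65), as in the Excel formula
print(round(p, 6))     # 0.344578
```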
EXCEL has a function that returns the inverse of the cumulative distribution function of a normal
distribution with mean μ and standard deviation σ. The syntax is
NORMINV(probability, mean, standard_dev)
This function is useful when calculating percentiles as in the following problem.
Problem 3: Find the 90th percentile of a random variable with distribution N(63, 5).
Solution: In EXCEL, enter “=NORMINV(0.9,63,5)”. You should get “69.40776” as your answer.
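For comparison, Python's statistics.NormalDist has an inv_cdf method that plays the role of NORMINV. The sketch below finds the 90th percentile and then confirms that plugging it back into the cumulative distribution function returns 0.9.

```python
from statistics import NormalDist

X = NormalDist(mu=63, sigma=5)
x90 = X.inv_cdf(0.9)    # counterpart of =NORMINV(0.9,63,5)
print(round(x90, 5))    # 69.40776
print(X.cdf(x90))       # recovers 0.9 (up to floating-point rounding)
```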
Homework Project
The following data give selected percentiles of length at birth for girls:
Percentile       3      5     10     25     50     75     90     95     97
Height [cm]  45.09  45.58  46.34  47.68  49.29  51.02  52.70  53.77  54.50
Assume that height at birth is normally distributed. Find the mean and the standard deviation.
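One possible approach, sketched in Python under the problem's assumption that height at birth is normal: take the 50th percentile as the mean (the mean and median of a normal distribution coincide), then use x_p = μ + z_p·σ, where z_p is the standard normal percentile, to estimate σ from each of the other percentiles. Averaging those estimates is a simple choice made for this sketch; fitting a least-squares line of height against z_p is a common alternative.

```python
from statistics import NormalDist, mean

# Girls' length at birth: percentile -> height in cm (table above)
girls = {3: 45.09, 5: 45.58, 10: 46.34, 25: 47.68, 50: 49.29,
         75: 51.02, 90: 52.70, 95: 53.77, 97: 54.50}

std_normal = NormalDist()      # mean 0, standard deviation 1
mu = girls[50]                 # mean = median for a normal distribution

sigma_estimates = []
for percentile, height in girls.items():
    if percentile == 50:
        continue               # z = 0 here, so this entry says nothing about sigma
    z = std_normal.inv_cdf(percentile / 100)
    sigma_estimates.append((height - mu) / z)  # solve height = mu + z * sigma for sigma

sigma = mean(sigma_estimates)
print(f"mean ~ {mu:.2f} cm, standard deviation ~ {sigma:.2f} cm")
print([round(s, 2) for s in sigma_estimates])
```

The individual estimates are not all equal (they grow from the lower to the upper percentiles), so the normal model fits these data only approximately.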