The Normal Distribution

This module introduces one of the most important distributions in statistics: the normal distribution.

Learning Objectives
After completing this module, the student will be able to
- recognize the normal distribution
- recognize the standard normal distribution
- perform calculations with the normal distribution using Excel
- convert a normal distribution to a standard normal distribution

Knowledge and Skills
- normal distribution
- standard normal distribution

Prerequisites
- sample mean, average
- sample standard deviation
- sample variance
- percentile

This module relies heavily on Collaborative Statistics: Chapter 6 (online at http://cnx.org/content/m16979/latest/).

Citation: Neuhauser, C. Normal Distribution. Created: September 9, 2009 Revisions: Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license.

Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.

Guide to Chapter 6

Section 6.1
Read Subsections 6.1.1 and 6.1.2.

Length at Birth
The U.S. Department of Health and Human Services (HHS) is a U.S. government agency responsible for “protecting the health of all Americans and providing essential human services” (http://www.hhs.gov/about/). The agencies of HHS include the National Institutes of Health and the Centers for Disease Control and Prevention (CDC). The CDC oversees a number of coordinating centers and offices, including the Coordinating Center for Health Information and Service (CCHIS). The National Center for Health Statistics (NCHS) is an office within the CCHIS.
According to NCHS’s website, its mission “is to provide statistical information that will guide actions and policies to improve the health of the American people. As the Nation's principal health statistics agency, NCHS leads the way with accurate, relevant, and timely data” (http://www.cdc.gov/nchs/). Data on the health and nutritional status of adults and children in the U.S. are collected by the National Health and Nutrition Examination Survey (NHANES). We will be looking at one of NHANES’ data sets, namely the growth curves for infants. The data files can be found at http://www.cdc.gov/growthcharts/html_charts/lenageinf.htm. Growth charts that provide percentiles for length and weight at various ages are an important screening tool for monitoring the growth of infants and assessing their nutritional status and health risks caused by, for instance, obesity.

The following data (source: http://www.cdc.gov/growthcharts/html_charts/lenageinf.htm) list selected percentiles for height (cm) for boys at 0 months:

Percentile     3      5      10     25     50     75     90     95     97
Height [cm]    44.93  45.57  46.55  48.19  49.99  51.77  53.36  54.31  54.92

The distribution of length at birth follows a pattern that is well described by the normal distribution.

Subsection 6.1.3
We say that a random variable X is normally distributed with mean μ and standard deviation σ if its probability density function is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$
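As an illustration only, the density formula above can be evaluated directly in code. The following is a minimal Python sketch using only the standard library; the function name normal_pdf is a hypothetical helper introduced here, not something defined in the source text or in Excel. It checks two features of the bell-shaped curve: the peak at the mean, with height 1/(σ√(2π)), and the symmetry of the curve about the mean.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma.

    normal_pdf is a hypothetical helper name, not an Excel or library function.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # 1 / (sigma * sqrt(2*pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)  # -(x - mu)^2 / (2 sigma^2)
    return coeff * math.exp(exponent)


# The density peaks at the mean; here mu = 49.99 cm and sigma = 2.66 cm.
peak = normal_pdf(49.99, 49.99, 2.66)

# Symmetry about the mean: f(mu - d) equals f(mu + d) up to rounding.
left = normal_pdf(49.99 - 3.0, 49.99, 2.66)
right = normal_pdf(49.99 + 3.0, 49.99, 2.66)

print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

Writing the coefficient and the exponent as separate lines keeps each piece of code aligned with one factor of the formula, which makes it easier to compare against the density when memorizing it.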
Because of its importance in statistics, it is worthwhile to memorize the form of the density function. The cumulative distribution function $F(x) = P(X \le x)$ has no closed-form expression in terms of elementary functions, so it must be calculated numerically by a computer or looked up in tables.

Figure 1: The probability density function of a normal distribution with mean 0 and standard deviation 1.

The probability density function of a normal distribution with mean 0 and standard deviation 1 is displayed in Figure 1. Note the bell-shaped curve, which is symmetric about the mean.

EXCEL has a function that returns both the cumulative distribution function and the density function of a normal distribution with a specified mean and standard deviation. The syntax is

NORMDIST(x,mean,standard_dev,cumulative)

where x is the value at which you want to evaluate the distribution, mean is the mean, standard_dev is the standard deviation, and cumulative is a logical value (TRUE for the cumulative distribution function and FALSE for the probability density function).

In-class Activity 1
(a) The height of boys at age 0 is normally distributed with mean 49.99 cm and standard deviation 2.66 cm. Use EXCEL to confirm the percentiles given in the table above.

Instruction: To determine the percentile of the value 44.93 cm, calculate the probability that the height is less than or equal to 44.93 cm. The following EXCEL formula calculates this value:

=NORMDIST(44.93,49.99,2.66,TRUE)
Rounded to two decimal places, the answer is 0.03, which corresponds to the 3rd percentile, thus confirming the entry in the table. Repeat this for the other values given in the table.

(b) Plot in EXCEL the cumulative distribution function and the probability density function of a normally distributed random variable with mean 49.99 and standard deviation 2.66 for $40 \le x \le 60$.

Section 6.2
Read Section 6.2.

The normal distribution is described by two parameters, the mean μ and the standard deviation σ. There are thus infinitely many normal distributions. However, they are all related: if a random variable X is normally distributed with mean μ and standard deviation σ, we can define a new random variable Z by

$$Z = \frac{X - \mu}{\sigma}.$$

This random variable Z is normally distributed with mean 0 and standard deviation 1; its distribution is called the standard normal distribution.

Section 6.3
Homework: Read Section 6.3 and work through all examples and problems.

Section 6.4
Homework: Read Section 6.4.

Section 6.5
Homework: Read Section 6.5 and work through all examples and problems. Below are solutions to two of the problems using EXCEL functions.

Problem 1: Find $P(X > 65)$ when X is normally distributed with mean 63 and standard deviation 5, written $X \sim N(63, 5)$ (in this notation the second parameter is the standard deviation, not the variance).

Solution: Since $P(X > 65) = 1 - P(X \le 65)$, enter “=1-NORMDIST(65,63,5,TRUE)” in EXCEL. You should get “0.344578” as your answer.
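The Excel calculations in In-class Activity 1(a) and Problem 1 can be cross-checked outside Excel. Below is a minimal Python sketch, assuming Python's standard math.erf and the identity Φ(z) = ½(1 + erf(z/√2)) for the standard normal cumulative distribution function. The function normdist is a hypothetical analogue that only mirrors the argument order of NORMDIST; it is not Excel's implementation, so the last few digits could differ. It standardizes first with Z = (X - μ)/σ, the conversion described in Section 6.2.

```python
import math


def normdist(x: float, mean: float, standard_dev: float, cumulative: bool) -> float:
    """Hypothetical Python analogue of Excel's NORMDIST(x, mean, standard_dev, cumulative)."""
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    # Standardize first (Section 6.2): Z = (X - mu) / sigma.
    z = (x - mean) / standard_dev
    if cumulative:
        # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Density of X: the standard normal density at z, rescaled by 1 / sigma.
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))


# In-class Activity 1(a): percentiles for boys at 0 months, from the table above.
percentiles = [3, 5, 10, 25, 50, 75, 90, 95, 97]
lengths_cm = [44.93, 45.57, 46.55, 48.19, 49.99, 51.77, 53.36, 54.31, 54.92]

for p, x in zip(percentiles, lengths_cm):
    prob = normdist(x, 49.99, 2.66, True)
    print(f"{x:5.2f} cm -> P(X <= x) = {prob:.2f} (table percentile: {p})")

# Problem 1: P(X > 65) for X ~ N(63, 5), the analogue of =1-NORMDIST(65,63,5,TRUE).
p_above = 1.0 - normdist(65, 63, 5, True)
print(f"{p_above:.6f}")  # 0.344578
```

Computing z before anything else means a single routine for the standard normal serves every mean and standard deviation, which is exactly why the standardization in Section 6.2 is useful.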
EXCEL also has a function that returns the inverse of the cumulative distribution function of a normal distribution with mean μ and standard deviation σ. The syntax is

NORMINV(probability, mean, standard_dev)

This function is useful for calculating percentiles, as in the following problem.

Problem 3: Find the 90th percentile of a random variable with distribution N(63, 5).

Solution: In EXCEL, enter “=NORMINV(0.9,63,5)”. You should get “69.40776” as your answer.

Homework Project
The following data give the length at birth for girls:

Percentile     3      5      10     25     50     75     90     95     97
Height [cm]    45.09  45.58  46.34  47.68  49.29  51.02  52.70  53.77  54.50

Assume that length at birth is normally distributed. Find the mean and the standard deviation.
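The NORMINV calculation in Problem 3 can be mirrored in Python as well. The sketch below assumes Python 3.8 or later, where the standard library class statistics.NormalDist provides inv_cdf (the inverse cumulative distribution function) and cdf; the wrapper norminv is a hypothetical name chosen to echo the argument order of Excel's NORMINV, not an Excel or library function. It also shows the same percentile obtained through the standard normal: find z with Φ(z) = 0.9, then convert back with x = μ + σz.

```python
from statistics import NormalDist


def norminv(probability: float, mean: float, standard_dev: float) -> float:
    """Hypothetical Python analogue of Excel's NORMINV(probability, mean, standard_dev)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)


# Problem 3: 90th percentile of N(63, 5), the analogue of =NORMINV(0.9,63,5).
p90 = norminv(0.9, 63, 5)
print(f"{p90:.5f}")  # 69.40776

# Same answer via the standard normal: z with Phi(z) = 0.9, then x = mu + sigma * z.
z90 = NormalDist().inv_cdf(0.9)
p90_via_z = 63 + 5 * z90

# Round trip: the CDF evaluated at the percentile gives back the probability.
back = NormalDist(mu=63, sigma=5).cdf(p90)
```

The round trip at the end is a quick sanity check that inv_cdf and cdf are inverses of each other, the same relationship that links NORMINV and NORMDIST with cumulative set to TRUE.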