Chapter 1.3 – The Normal Distribution
Stat 226 – Introduction to Business Statistics I
Spring 2009, Section A
Professor: Dr. Petrutza Caragea
Tuesdays and Thursdays 9:30-10:50 a.m.

Density Curves
So far we have:
- graphically displayed data: histogram, stemplot, boxplot
- described the overall pattern and identified deviations and outliers
- numerically quantified the center and spread of the distribution

If the distribution (as displayed by the histogram) appears sufficiently regular, we can approximate it with a smooth curve, a so-called density curve. The density curve is a simplified, idealized version of reality, but it can still be useful!

Example: gas mileage example from the textbook.

Properties
A density curve is a curve that
- is always on or above the horizontal axis, and
- has an area of exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range.

Examples:
[Figure: example density curves]

Median and Mean of a Density Curve
[Figure: example density curves]
Median: the equal-areas point, with 50% of the mass on either side.
Mean: the balancing point of the curve, if it were made of solid material.

Introduction to Normal Distributions
The Normal (or Gaussian) distribution, named after Carl Friedrich Gauss (1777-1855), is the single most important distribution in Statistics.
Many variables can be modeled (described) using the Normal distribution, e.g.
- height of humans
- SAT scores
- length of human pregnancies, etc.
It is characterized by the following two parameters:
- the mean µ and
- the standard deviation σ
Overall shape: symmetric, unimodal, bell-shaped.
[Figure: pictures of various normal distributions]

Notation: to denote the normal distribution we use N(µ, σ).
Example: N(___, ___) denotes a normal distribution with mean ___ and standard deviation ___, while N(___, ___) denotes a normal distribution with mean ___ and standard deviation ___.
To denote that a variable X (e.g. heights, SAT scores, etc.) follows a normal distribution we write X ∼ N(µ, σ).

The 68-95-99.7 Rule
For a variable that follows a normal distribution N(µ, σ), we have that
- approx. 68% of all the data fall within one standard deviation of the mean, i.e. within µ ± σ
- approx. 95% of all the data fall within two standard deviations of the mean, i.e. within µ ± 2σ
- approx. 99.7% of the data fall within three standard deviations of the mean, i.e. within µ ± 3σ
This holds for all normal distributions (i.e. for any choice of µ and σ).
[Figure: normal curve marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ; areas from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%]

Example: The length of human pregnancies follows a normal distribution with mean µ = 266 days and standard deviation σ = 16 days.
1. How long do the middle 95% of all pregnancies last?
2. How long do the shortest 16% of all pregnancies last (at most)?
3. How long do the longest 0.15% of all pregnancies last (at least)?

The Standard Normal Distribution
The standard normal distribution is a "special" normal distribution. It has mean µ = 0 and standard deviation σ = 1, denoted by N(0, 1). Nearly all the area is between −3 and 3.
[Figure: standard normal density curve, horizontal axis from −3 to 3]

For the standard normal distribution, the proportion of observations falling into a specified range is tabulated.

Knowing the mean and the standard deviation of a normal distribution allows us to determine
1. What proportion of individuals fall in a specified range.
2. What percentile a given individual falls at, if you know their data value.
3. What data value corresponds to a given percentile.

The standard normal distribution is the only normal distribution for which we have tabulated values. We therefore need to transform any given normal distribution to a standard normal distribution, i.e. the values from any N(µ, σ) are transformed to the corresponding values from an N(0, 1). This is called standardizing.
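The 68-95-99.7 rule and the pregnancy questions can be checked numerically. The sketch below is only an illustration, assuming Python 3.8+ and its standard-library `statistics.NormalDist` (our choice of tool; the course itself uses Table A and JMP). The helper name `within_k_sd` is hypothetical. It also shows why standardizing works: the proportion within µ ± kσ is the same for N(266, 16) as for N(0, 1).

```python
from statistics import NormalDist

def within_k_sd(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Proportion of an N(mu, sigma) distribution lying within mu ± k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The rule does not depend on mu or sigma: compare N(0, 1) with N(266, 16).
for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4), round(within_k_sd(k, 266, 16), 4))

# Pregnancy example, X ~ N(266, 16), answered with the 68-95-99.7 rule.
mu, sigma = 266, 16
middle_95 = (mu - 2 * sigma, mu + 2 * sigma)  # 95% lie within mu ± 2*sigma
shortest_16_at_most = mu - sigma               # (100% - 68%) / 2 = 16% lie below mu - sigma
longest_015_at_least = mu + 3 * sigma          # (100% - 99.7%) / 2 = 0.15% lie above mu + 3*sigma
print(middle_95, shortest_16_at_most, longest_015_at_least)
```

The unrounded proportions are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%; the pregnancy answers are 234 to 298 days, at most 250 days, and at least 314 days.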
Standardizing, z-score
If x is an observation from a normal distribution that has mean µ and standard deviation σ, the standardized value of x is given by
z = (x − µ) / σ
A standardized value is often called a z-score.
A z-score tells us how many standard deviations the original observation lies from the mean, and in which direction. Observations larger than the mean are positive when standardized (i.e. have a positive z-score), and observations smaller than the mean are negative when standardized (i.e. have a negative z-score).

Example: (length of human pregnancies, continued)

Why are z-scores helpful?
- IQs follow a normal distribution with mean µ = 100 and standard deviation σ = 16.
- Heights of males follow approximately a normal distribution with mean µ = 70 inches and standard deviation σ = 3 inches.
Who is more unusual: a man who is 73 inches tall or a man with an IQ of 124?
Once we know the z-score of an observation, we can look up the overall proportion (percentage) of men in that population having a height of 73 inches or more.

Finding z-scores and corresponding proportions/areas under the normal curve
⇒ we need to know how to read Table A (Table of the Standard Normal Distribution)
⇒ Table A in your textbook
Note: in the following, the terms proportion, probability, percentage, and area are all interchangeable, i.e.
proportion = probability = percentage = area

To find the proportion (corresponding to the area under the normal curve) of observations that fall into a given range, e.g. between −z and z, we use Table A.
The first column gives the z-score values correct to one decimal place, and the first row gives the second decimal place of a z-score. For example, to find the area below z = −2.24, find −2.2 in the first column, then look for 0.04 along the first row. The entry where that row and column intersect gives the value 0.0125.

Using Table A to find proportions under the normal curve
Consider the following situations:
1. What proportion of observations is below z = −1.67, i.e. what is the probability of observing a z-score of −1.67 or less?
2. What proportion is below z = 1.67?

1. What proportion of observations is greater than z = 1.67?
2. What proportion is less than z = −2.00 or greater than z = 2.00?

1. What is the area between z = −1.25 and z = 1.25?
2. What proportion is between z = 0.96 and z = 2.33?

1. What z-score does the 30th percentile correspond to?
2. What z-scores bound the middle 60%?

Applications of the Normal Distribution
1. State the problem, i.e. state the mean µ, the standard deviation σ, and the value of the observation x.
2. Standardize x, i.e. find the corresponding z-score using z = (x − µ) / σ.
3. Draw a picture, i.e. locate the z-score under the normal curve and shade the area of interest.
4. Use Table A to find the shaded area.

Example: male heights ∼ N(70, 3)
1. What proportion of men is shorter than 72 inches?
2. What proportion of men is taller than 65 inches?
3. What proportion of men is taller than 73 inches?

What proportion of men has an IQ of 124 or more? (IQ ∼ N(100, 16))

Backwards Calculations
We can also work backwards: given a certain percentile (or proportion), what is the corresponding value of x?
Example: Heights ∼ N(70, 3)
1. What value does the 50th percentile of men's height correspond to?
2. What value does the 10th percentile of men's height correspond to?

In general, to do backward calculations use the following formula:
x = z ∗ σ + µ
What value does the 85th percentile correspond to?
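The four application steps and the backward formula x = z ∗ σ + µ can be sketched in a few lines. This is an illustration only, assuming Python 3.8+: `statistics.NormalDist().cdf` stands in for looking up Table A, and `inv_cdf` for reading Table A backwards. The helper names `z_score`, `area_below`, and `value_at_percentile` are hypothetical, introduced just for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1); Z.cdf(z) plays the role of a Table A lookup

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def area_below(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return Z.cdf(z)

def value_at_percentile(p: float, mu: float, sigma: float) -> float:
    """Backward calculation x = z * sigma + mu, with z the p-th quantile of N(0, 1)."""
    return Z.inv_cdf(p) * sigma + mu

# Who is more unusual? Height 73 in. from N(70, 3) vs. IQ 124 from N(100, 16)
z_height = z_score(73, 70, 3)  # 1.0
z_iq = z_score(124, 100, 16)   # 1.5, so the IQ of 124 is more unusual

# Table A style questions
below_m224 = area_below(-2.24)                           # about 0.0125
above_167 = 1 - area_below(1.67)                         # about 0.0475
outside_2 = area_below(-2.00) + (1 - area_below(2.00))   # about 0.0455
between_125 = area_below(1.25) - area_below(-1.25)       # about 0.7887
between_096_233 = area_below(2.33) - area_below(0.96)    # about 0.1586

# Percentiles of N(0, 1)
z_30th = Z.inv_cdf(0.30)                        # about -0.52
middle_60 = (Z.inv_cdf(0.20), Z.inv_cdf(0.80))  # about (-0.84, 0.84)

# Applications: heights ~ N(70, 3), IQ ~ N(100, 16)
shorter_72 = area_below(z_score(72, 70, 3))          # about 0.748
taller_65 = 1 - area_below(z_score(65, 70, 3))       # about 0.952
taller_73 = 1 - area_below(z_score(73, 70, 3))       # about 0.159
iq_124_plus = 1 - area_below(z_score(124, 100, 16))  # about 0.067

# Backward calculations for heights ~ N(70, 3)
p50 = value_at_percentile(0.50, 70, 3)  # 70 inches
p10 = value_at_percentile(0.10, 70, 3)  # about 66.2 inches
p85 = value_at_percentile(0.85, 70, 3)  # about 73.1 inches
```

Table A rounds z to two decimals and areas to four, so hand answers can differ slightly in the last digit: for 72 inches, z = 0.67 gives 0.7486 from the table, while the unrounded z = 0.666... gives about 0.7475.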
Assessing Normality of Data
Normally distributed data allow the application of further statistical procedures, which enable us to learn more about the data and to derive additional information about the variable we are interested in. (We will learn about such procedures in Chapters 6 & 7.)
If data are not normally distributed and we still apply statistical procedures that require the assumption of normality, the derived information can be wrong and misleading.

How to assess Normality
- Based on experience and/or past data, the assumption of normality might be justified. In general, though, it is quite risky to assume normality without looking at the data and verifying normality.
- Histogram/stemplot or boxplot: reveal non-normal features, such as skewness, multiple modes, and outliers.
- If the above graphical displays appear somewhat normal, i.e. they indicate a symmetric, unimodal, bell-shaped distribution, we can use a so-called normal quantile plot. Normal quantile plots are a more sensitive tool that allows us to take a closer look and judge the adequacy of normality.

Normal quantile plots:
- hard to construct by hand (use JMP)
- for the main idea see pages 67 & 68 of the textbook
- If the distribution is close to a normal distribution, the points in a normal quantile plot will lie close to a straight line.

Some caution: Real data almost always show some departure from normality (i.e. from a perfect normal distribution). It is important to restrict the examination of a normal quantile plot to searching for clear departures from normality. We can ignore "minor wiggles" in the plot: most common methods will work well as long as the data are reasonably close to a normal distribution with no extreme outliers.

[Figure: normal quantile plots of observations from a standard normal distribution for various sample sizes (n = 1250, n = 100)]
[Figure: normal quantile plots for small sample sizes (n = 25, n = 10)]
[Figure: normal quantile plots of observations from a skewed-right and a triangular distribution]
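JMP draws normal quantile plots for us; to show the main idea numerically, the sketch below pairs each sorted observation with a standard normal quantile, which gives the points such a plot displays. The plotting positions (i − 0.5)/n and the correlation-based `straightness` summary are assumptions chosen for illustration, not necessarily what JMP or the textbook uses, and both helper names are hypothetical.

```python
import random
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair the sorted observations with standard normal quantiles (z, x).

    Assumption: plotting positions (i - 0.5) / n, one common convention;
    JMP and textbooks may use a slightly different formula.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def straightness(points):
    """Correlation between the z and x coordinates; near 1 means close to a straight line."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    s_zx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    s_zz = sum((z - mean_z) ** 2 for z, _ in points)
    s_xx = sum((x - mean_x) ** 2 for _, x in points)
    return s_zx / (s_zz * s_xx) ** 0.5

# Simulated data with a fixed seed so the sketch is reproducible.
rng = random.Random(226)
normal_sample = [rng.gauss(70, 3) for _ in range(500)]      # roughly N(70, 3)
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]  # skewed right
r_normal = straightness(normal_quantile_points(normal_sample))
r_skewed = straightness(normal_quantile_points(skewed_sample))
print(f"normal sample: {r_normal:.3f}, skewed sample: {r_skewed:.3f}")
```

For the normal sample the points lie close to a straight line and the correlation is very near 1; the skewed-right sample curves away from the line and gives a noticeably smaller value. This single number is only an informal summary: in line with the caution about "minor wiggles", the plot itself should be judged for clear departures.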