Continuous Probability Distributions: The Normal Distribution

Towards the Meaning of Continuous Probability Distribution Functions

When we introduced probabilities, we spoke of discrete events:
S = collection of all possible sample points e_i
• 0 ≤ P(e_i) ≤ 1: the probability of any event is between zero and one.
• Σ P(e_i) = 1: the probabilities of all elementary events sum to 1 (something happens).

In particular, for the binomial distribution, for the random variable X, where x stands for a particular value:
• 0 ≤ P[X = x] ≤ 1: the probability that the random variable X takes the value x is between 0 and 1, inclusive.
• Σ_{all x} P[X = x] = 1: the sum of the probabilities over all possible values of x is 1.

A continuous variable has infinitely many possible values.
• With infinitely many possible values, the probability of observing any one particular value is essentially zero: Pr(X = x) = 0, e.g., for x = 1.0 vs 1.02 vs 1.0195 vs 1.01947, …
• Pr(X = x) is meaningless for a continuous random variable.
• Instead, we consider a range of values for X: Pr(a ≤ X ≤ b). We can make this range quite broad or very narrow.

Comparing Probability Distributions for Discrete vs Continuous Random Variables

We need new notation to describe probability distributions for continuous variables.
• Discrete: list all possible sample points, e.g., S = {e_i}, i = 1 to k.
• Continuous: state the range of possible values of X, e.g., −∞ to ∞, 0 to ∞, or −∞ to 0.
Note: ∞ is the symbol for "infinity".

For a continuous random variable X,
• P(X = x) = 0
• Instead, we compute the probability of X within some interval:

$$P[a \le X \le b] = \int_a^b f_X(x)\,dx$$

The function f_X(x) is the probability density function of X. Don't worry: if you don't know or have forgotten calculus, I won't be asking you to work with this notation.

Much of statistical inference is based upon a particular choice of probability density function f_X(x): the Normal distribution.
• This function is a mathematical model describing one particular pattern of variation of values.
• It is appropriate for continuous variables only.

Practically speaking, the normal distribution function is appropriate for:
• Many phenomena that occur naturally.
• Special cases of other phenomena, e.g., averages of phenomena that individually are not normally distributed. For example, the sampling distribution of means may follow a normal distribution even when the underlying data do not.

The Normal Probability Density Function

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Features to note:
• The range of X is −∞ to ∞.
• π is the mathematical constant 3.14159…
• e is the mathematical constant 2.71828…
• μ is the mean of the distribution.
• σ is the standard deviation of the distribution.
• σ² is the variance.
• (x − μ)², the squared deviation from the mean, appears in the function.

Notation: X ~ N(μ, σ²)
We say "X follows a Normal distribution with mean μ and variance σ²" or "X is Normally distributed with mean μ and variance σ²".

A Picture of the Normal Distribution
[Figure: the density f_X(x) plotted against x. The infamous "bell-shaped curve".]

There are infinitely many normal distributions, each determined by different values of μ and σ².
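To make the density formula concrete, here is a minimal Python sketch (Python is our choice for illustration; it is not part of the original slides). The function names `normal_pdf` and `area_under_pdf` and the three (μ, σ) pairs are hypothetical, chosen only to show that different parameter values give different curves while the total area stays at 1. The area is approximated with a simple trapezoid rule over μ ± 10σ, which stands in for the full range −∞ to ∞.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written directly from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_pdf(mu, sigma, width=10.0, steps=20000):
    """Trapezoid-rule area over mu +/- width*sigma (approximates -inf to inf)."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h


if __name__ == "__main__":
    # Illustrative parameter values only; any mu and any sigma > 0 would do.
    for mu, sigma in [(0.0, 1.0), (0.0, 2.0), (3.0, 0.5)]:
        peak = normal_pdf(mu, mu, sigma)
        area = area_under_pdf(mu, sigma)
        print(f"mu={mu}, sigma={sigma}: height at mean={peak:.4f}, area={area:.6f}")
```

A larger σ lowers the height at the mean and spreads the curve out, while the computed area remains 1 up to numerical error.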
The shape of the Normal distribution is characteristically
• Smooth
• Defined everywhere on the real axis
• Bell-shaped
• Symmetric about the mean μ (it is defined in terms of deviations about the mean)

[Figure: the density f_X(x) plotted against x.]
The area under the curve represents probability, and the total area under the curve is 1:

$$\Pr[-\infty < X < \infty] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$$

[Figure: the area under the curve to the left of x, labeled Pr[X < x].]
The area under the curve up to the value x is often represented by the notation

$$\Phi(x) = \Pr[X \le x] = \Pr[X < x]$$

A Feeling for the Shape of the Normal Distribution
μ locates the center, and σ measures the spread.

If μ alone is changed, by adding a constant c,
• the entire curve is shifted in location,
• but the shape remains the same.

If σ alone is changed, by multiplying by a constant c,
• the shape of the bell is changed,
• a larger variance implies a wider spread (or flatter curve), because the area under the curve is always 1.

Picturing the Normal Probability Density
As the variance σ² increases:
• The bell flattens (gets wider).
• Values close to the mean are less likely.
• Values farther from the mean are more likely.
As the variance decreases:
• The bell narrows.
• Most values are close to the mean.
• Values close to the mean are more likely.

A Very Handy Rough Rule of Thumb
If X follows a Normal distribution, then:
• ~68% of the values of X are in the interval μ ± σ
• ~95% of the values of X are in the interval μ ± 1.96σ
• ~99% of the values of X are in the interval μ ± 2.576σ (often rounded to μ ± 2.58σ)

Why is the Normal Distribution So Important?
There are two types of data that follow a normal distribution:
1. A number of naturally occurring phenomena, for example:
   • heights of men (or women)
   • total blood cholesterol of adults
2. Special functions of some non-normally distributed phenomena, in particular sums and averages: the sampling distribution of sample means tends to be approximately Normal.
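The percentages in the rough rule of thumb given earlier (μ ± σ, μ ± 1.96σ, μ ± 2.576σ) can be checked numerically. The sketch below is an illustration in Python, not part of the original slides; the helper names `phi` and `prob_within` are hypothetical. It relies on the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal cumulative probability, and on the fact that the probability of falling within k standard deviations of the mean does not depend on μ or σ.

```python
import math


def phi(z):
    """Standard normal cumulative probability Pr(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k):
    """Pr(mu - k*sigma <= X <= mu + k*sigma) for any Normal X."""
    return phi(k) - phi(-k)


if __name__ == "__main__":
    for k in (1.0, 1.96, 2.576):
        print(f"within {k} standard deviations of the mean: {prob_within(k):.4f}")
```

The three printed probabilities should come out close to 0.68, 0.95, and 0.99, matching the rule of thumb.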
Research often focuses on sample means.
Example: blood pressure can vary with time of day, stress, food, illness, etc. One reading may not be a good representation of "typical".
[Figure: distribution of a single reading of blood pressure for an individual. It tends to be skewed, with a few high values.]

To have a better gauge of an individual's BP, we might use the average of 5 readings.
[Figure: sampling distribution of the mean of 5 readings for an individual. It tends to be approximately Normal, even when the original distribution is not.]

A Feeling for the Central Limit Theorem
• Shake a pair of dice.
• On each roll, note the total of the two die faces.
• This total can range from 2 to 12.
• The most likely total is 7. (Why?)
• How often do the other totals arise?
[Figure: histogram of dice totals (2 to 12) for n = 100 trials of rolling a pair of dice.]
[Figure: histogram of dice totals (2 to 12) for n = 1000 trials of rolling a pair of dice.]
As the number of trials n increases, the histogram of the sum of the 2 dice settles into a smooth, symmetric shape peaked at 7. It already looks roughly bell-like, even though only two dice are being summed.

A Statement of the Central Limit Theorem
For any population with
• mean μ and finite variance σ²,
• the sampling distribution of means, x̄,
• from samples of size n from this population,
• will be approximately normally distributed
• with mean μ
• and variance σ²/n,
• for n large.
That is, for n large, if X ~ ??(μ, σ²), then X̄_n ~ N(μ, σ²/n), where ?? stands for any distribution, whatever its shape.

This is the main reason for our interest in the normal distribution:
• regardless of the underlying distribution,
• if we take a large enough sample,
• we can make probability statements about means from such samples
• based upon the normal distribution.
This is true even when the underlying distribution is discrete.
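The statement of the Central Limit Theorem above can be explored with a small simulation. The following is a sketch in Python (our choice for illustration, not part of the original slides) using rolls of a single fair die as the discrete, non-normal population, so that μ = 3.5 and σ² = 35/12. The function name `sample_means`, the seed, and the numbers of repetitions are hypothetical choices. The check is that the simulated sample means have mean close to μ and variance close to σ²/n.

```python
import random
import statistics

MU = 3.5             # mean of one roll of a fair six-sided die
SIGMA2 = 35.0 / 12   # variance of one roll of a fair six-sided die


def sample_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    means = []
    for _ in range(reps):
        rolls = [rng.randint(1, 6) for _ in range(n)]
        means.append(statistics.fmean(rolls))
    return means


if __name__ == "__main__":
    for n in (1, 5, 30):
        means = sample_means(n, reps=5000)
        print(
            f"n={n}: mean of sample means={statistics.fmean(means):.3f} "
            f"(theory {MU}), variance={statistics.pvariance(means):.4f} "
            f"(theory {SIGMA2 / n:.4f})"
        )
```

Plotting a histogram of `means` for increasing n (not done here, to keep the sketch free of plotting libraries) would show the shape moving from flat at n = 1 towards the bell curve.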
Example: The Central Limit Theorem works even for VERY non-normal data.
A population has only 3 outcomes in it:

x          1     2     9
P(X = x)   1/3   1/3   1/3

• mean of {1, 2, 9}: μ = 4
• sum of {1, 2, 9} = 12
• standard deviation of {1, 2, 9}: σ = 3.6

Experiment: take a sample of size n with replacement. Compute the sum of all n values. Repeat…
[Figure: sampling distribution of sums for n = 25, n = 50, and n = 100.]

To compute probabilities for a normal distribution:
• Recall that we are looking at intervals of values of the random variable X.
• The probability that X has a value in the interval between a and b is the area under the curve corresponding to that interval:

$$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Note: since Pr(X = a), or the probability of any exact value, is zero, this can be written as Pr(a ≤ X ≤ b) or Pr(a < X < b).

The symmetry of the normal distribution can also help in computing probabilities.
• The normal distribution is symmetric about the mean μ.
• This tells us that the probability of a value less than the mean is .5 or 50%,
• and the probability of a value greater than the mean is also .5 or 50%:

$$\Pr(X \le \mu) = \int_{-\infty}^{\mu} f_X(x)\,dx = 0.5$$

The Standard Normal Distribution
The standard normal distribution is just one of infinitely many possible normal distributions. It has
• mean: μ = 0
• variance: σ² = 1 (so σ = 1)
By convention we let the letter Z represent a random variable that is distributed Normally with μ = 0 and σ² = 1: Z ~ N(0,1)

The standard normal distribution is important for several reasons:
• Probabilities of Z within any interval have been computed and tabulated.
• It is possible to look up Pr(a ≤ Z ≤ b) for any values of a and b in such tables.
• Any other normal distribution can be transformed to a standard normal for computing probabilities.
• Distances from the mean are equivalent to the number of standard deviations from the mean.
This last is perhaps of greatest interest to us, now that software does much of the transformation and computation for us.
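As one illustration of software doing this work, here is a minimal Python sketch (Python's standard-library `statistics.NormalDist` is our choice for illustration; the course materials themselves refer to tables and Minitab). The helper name `prob_between` and the example distribution N(50, 10²) are hypothetical. The sketch computes an interval probability as a difference of two cumulative probabilities, and shows that an interval for X gives the same probability as the matching interval for Z measured in standard deviations from the mean.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal, Z ~ N(0, 1)


def prob_between(a, b, dist=Z):
    """Pr(a <= X <= b): cumulative probability at b minus cumulative probability at a."""
    return dist.cdf(b) - dist.cdf(a)


if __name__ == "__main__":
    # Hypothetical example: X ~ N(50, 10^2), interval from 40 to 65.
    X = NormalDist(mu=50.0, sigma=10.0)
    p_x = prob_between(40.0, 65.0, X)

    # The same interval expressed as standard deviations from the mean.
    z_lo = (40.0 - 50.0) / 10.0   # -1.0
    z_hi = (65.0 - 50.0) / 10.0   #  1.5
    p_z = prob_between(z_lo, z_hi)

    print(f"Pr(40 <= X <= 65)  = {p_x:.4f}")
    print(f"Pr(-1 <= Z <= 1.5) = {p_z:.4f}")
```

Both lines print the same probability, which is the point of measuring distance from the mean in units of standard deviation.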
Table 3 in the Appendix of Rosner gives areas under the normal curve in 4 different ways:
• Column A gives areas between −∞ and z, where z is a particular value of the standard normal distribution (note: Rosner uses X rather than Z). That is, column A gives values for Pr(−∞ < Z ≤ z) = Pr(Z ≤ z). z is also known as a standard normal deviate.
• Column B gives areas between z and ∞: Pr(z ≤ Z < ∞) = Pr(z ≤ Z) = Pr(Z ≥ z).
• Column C gives areas between 0 and z: Pr(0 ≤ Z ≤ z).
• Column D gives areas between −z and z: Pr(−z ≤ Z ≤ z).
[Figures: standard normal curves with the area for each column shaded, relative to 0 and z.]

A probability calculation for any random variable X ~ Normal(μ, σ²) can be re-expressed as an equivalent probability calculation for a standard Normal(0,1). This is nice because
• we have tables for probabilities of the Normal(0,1) distribution;
• we can interpret probabilities in terms of the number of standard deviations from the mean.
Of course, we can also use computer programs to compute probabilities for any Normal distribution. The program does the translation for us.

The Normal(0,1) or Standard Normal Table
Positive values of z are read from the first column (under x in Rosner):

z      A      B      C      D
0.0    .5000  .5000  .0     .0
0.01   .5040  .4960  .0040  .0080
…
0.30   .6179  .3821  .1179  .2358
0.31   .6217  .3783  .1217  .2434

[Figure: standard normal curve with the area to the left of z = 0.31 shaded, labeled Pr[Z < 0.31].]
The shaded area, which is the probability of Z ≤ z, is shown under Col A of the table: Pr(Z < 0.31) = .6217
A check that this makes sense: any positive value of z is above the mean, and should have a probability > .5

Note that only positive values of z are tabulated. We can take advantage of a few important features of the standard normal to compute probabilities for values of z less than zero:
• Symmetry: Pr(Z ≤ −z) = Pr(Z ≥ z)
• Zero is the median: Pr(Z ≤ 0) = Pr(Z ≥ 0) = .50
• Total area is 1: Pr(Z ≤ z) + Pr(Z ≥ z) = 1

For example, we cannot read Pr(Z < −0.31) directly from the tables.
We can, however, use the property of symmetry:
• Pr(Z < −0.31) is the area to the left of z = −0.31.
• By symmetry, this equals the area to the right of z = 0.31, which we can read from Col B: Pr(Z > 0.31) = .3783
• Therefore Pr(Z < −0.31) = .3783
[Figure: standard normal curve with the matching tail areas below −z and above z shaded.]

Example Word Problem
What is the probability of a value of Z more than 1 standard deviation below the mean?
Solution: since μ = 0 and σ = 1, 1 standard deviation below the mean is z = μ − 1×σ = 0 − (1)(1) = −1.
Pr(Z < −1) = 0.1587
[Figure: standard normal curve with the area to the left of −1 shaded.]
The probability of observing a value more than 1 standard deviation below the mean is .1587, or just under 16%.

Example: What is the probability Z is between −1.5 and 1.5?
We can read this from Column D of the table in Rosner: Pr[−1.50 ≤ Z ≤ 1.50] = 0.8664

Example: What is the probability of Z more than 1.5 standard deviations from the mean in either direction?
Since probabilities sum to 1: Pr[Z ≤ −1.50 or Z ≥ 1.50] = 1 − 0.8664 = 0.1336
By symmetry, half of this, or 0.0668, lies at either end.
[Figure: standard normal curve with tail areas of .0668 shaded below −1.50 and above 1.50.]

Exercise
Find the area under the standard normal curve between Z = +1 and Z = +2.
Solution: it helps to draw pictures!
[Figure: the area between 1 and 2, shown as the area to the left of 2 minus the area to the left of 1.]
Pr(1 < Z < 2) = Pr(Z < 2) − Pr(Z < 1) = 0.9772 − 0.8413 = 0.1359

Notes on using Standard Normal Tables:
• These come in a variety of formats. The examples given here are for the version seen in Rosner, Table 3 in the Appendix.
• Look at the accompanying picture of the distribution to be clear about which probability is listed in the body of the table.
• Draw a sketch (paper and pencil) when computing probabilities. It always helps you keep track of what you are doing.
• Minitab provides the same probabilities as Column A, Pr(X < x), when Cumulative Probability is selected.

Using Minitab: Calc → Probability Distributions → Normal
• Select Cumulative Probability for Pr(Z < z) or Pr(X < x).
• Enter the value of z (or x).

Finding Percentiles of the Normal Distribution
Example: What is the 75th percentile of N(0,1)?
Solution: Again, it helps to draw a picture!
[Figure: standard normal curve with an area of 0.75 shaded to the left of z.75.]
We want the area under the curve to be 75%. The value of z we want is the value below which 75% of values are found. That is, find z.75 so that Pr(Z < z.75) = .75

Use the Inverse Cumulative option in Minitab and input the desired percentile:

  Inverse Cumulative Distribution Function
  Normal with mean = 0 and standard deviation = 1.00000
  P( X <= x)        x
      0.7500   0.6745

Standardizing a Normal Random Variate: From N(μ, σ²) to N(0,1)
We can transform any Normal distribution to a standard normal by means of a simple transformation:

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

Adding a constant: for X ~ N(μ, σ²), (X + b) ~ N(?, ?)
The mean is shifted over b units, but the variance or spread of the data is unchanged by adding a constant:
(X + b) ~ N(μ + b, σ²)

Multiplying by a constant: for X ~ N(μ, σ²), (aX) ~ N(?, ?)
The mean is adjusted to a times the original mean, and the variance to a² times the original variance. This is a shift in scale:
(aX) ~ N(aμ, a²σ²)

Adding a constant and multiplying by a constant: for X ~ N(μ, σ²), (aX + b) ~ N(?, ?)
Both adjustments are made. The mean is adjusted to a times the original mean plus b, and the variance to a² times the original variance:
(aX + b) ~ N(aμ + b, a²σ²)

Now, let a = 1/σ and b = −μ/σ. Then

$$Z = aX + b = \frac{1}{\sigma}X - \frac{\mu}{\sigma} = \frac{X - \mu}{\sigma}$$

For X ~ N(μ, σ²), Z ~ N(?, ?)
$$\mu_z = a\mu + b = \frac{1}{\sigma}\mu - \frac{\mu}{\sigma} = 0$$

$$\sigma_z^2 = a^2\sigma^2 = \frac{1}{\sigma^2}\sigma^2 = 1$$

So Z ~ N(0,1):

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

We have transformed the original scale
• to units measured in multiples of standard deviations,
• centered around zero.
• A value of z = −1 means the value of x is 1 standard deviation below the mean.
• A value of z = 2.5 means the value of x is 2.5 standard deviations above the mean.

This transformation is also important because, if we want to know Pr(a ≤ X ≤ b), we can convert it to an equivalent calculation:

$$\Pr(a \le X \le b) = \Pr\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) = \Pr\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$$

Word Problem
The profit from the Massachusetts state lottery in any given week is distributed Normally with mean 10.0 million dollars and variance 6.25 (in squared millions of dollars). What is the probability that this week's profit is between 8 and 10.5 million?
Let X = weekly profit in millions. Then X ~ N(μ, σ²), where μ = 10 and σ² = 6.25 (σ = 2.5).
What is Pr(8 ≤ X ≤ 10.5)?

Translate to the standard Normal:

$$\Pr(8 \le X \le 10.5) = \Pr\left(\frac{8 - 10}{2.5} \le Z \le \frac{10.5 - 10}{2.5}\right) = \Pr(-0.8 \le Z \le 0.2)$$

[Figure: standard normal curve with the area between −.8 and .2 shaded.]
Pr(−0.8 ≤ Z ≤ 0.2) = Pr(Z < 0.2) − Pr(Z < −0.8)
Read from Table 3, or use Minitab or another program:
= 0.5793 − 0.2119 = 0.3674
The probability of a weekly profit between 8 and 10.5 million dollars is 36.74%.

Application of the Central Limit Theorem
• Means of samples of size n
• from a population with
• mean μ and variance σ²
• follow a normal distribution
• with mean μ and variance σ²/n, for n large.
That is, for X ~ ?(μ, σ²) and n large, X̄ ~ N(μ, σ²/n)

Example: Consider a population of families with μ = 3.4 children per family and σ² = 4.37. What percentage of samples of size n = 4 families will have means greater than 5 children per family?
Sample means from samples with n = 4 approximately follow a normal distribution with μ_x̄ = 3.4 and σ_x̄² = σ²/n = 4.37/4 = 1.09.
Then σ_x̄ = 1.045.
We want Pr(X̄ > 5), where X̄ ~ N(3.4, 1.09):

$$\Pr(\bar{X} > 5) = \Pr\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{5 - 3.4}{1.045}\right) = \Pr(Z > 1.53) = 0.06$$

[Figure: standard normal curve with the area to the right of 1.53 shaded.]
The probability of observing a sample with a mean of 5 children per family or larger, when n = 4, is about 6%.

So far we have gone from
• X ~ N(μ, σ²) to Z ~ N(0,1): Z = (X − μ)/σ
We may be interested in the reverse:
• Z ~ N(0,1) to X ~ N(μ, σ²): X = σZ + μ

Example: The distribution of IQ scores is normal with a mean of 100 and a standard deviation of 15. What is the 95th percentile of this distribution?
Step 1: Find the 95th percentile of the standard normal. Use Minitab or another program to compute:

  Inverse Cumulative Distribution Function
  Normal with mean = 0 and standard deviation = 1.00000
  P( X <= x)        x
      0.9500   1.6449

or z.95 = 1.645

Step 2: X = σZ + μ
We know X ~ N(100, 15²) and z.95 = 1.645, so
x.95 = σ z.95 + μ = (15)(1.645) + 100 = 124.7
The 95th percentile of the IQ distribution is 124.7.

Another Example: Taking samples of size n = 4 from the population of families with μ = 3.4 children per family and σ² = 4.37, what is the middle 50% of the sampling distribution?
[Figure: sampling distribution with the middle 50% between a and b, and 25% in each tail.]
That is, find a and b so that Pr(a ≤ X̄ ≤ b) = .50
• a is the 25th percentile of the sampling distribution of X̄
• b is the 75th percentile of the sampling distribution of X̄

Use Minitab to find the 25th and 75th percentiles of the standard normal:

  Inverse Cumulative Distribution Function
  P( X <= x)         x
      0.2500   -0.6745
      0.7500    0.6745

For X̄ ~ N(μ, σ²/n), where μ = 3.4 and σ²/n = 1.09, convert z back to x̄ using x̄ = z σ_x̄ + μ:
x̄.75 = .675 (1.045) + 3.4 = 4.11
x̄.25 = −.675 (1.045) + 3.4 = 2.69
Pr(2.69 < X̄ < 4.11) = .50
50% of samples of size 4 from this population will have mean family size between 2.69 and 4.11 children per family.

Recap. . .
Introduction to the Normal Distribution
For continuous variables,
• we speak of a probability density function;
• we calculate the probabilities of intervals of values, not individual values.
The normal distribution is a good description of
• many naturally occurring phenomena,
• the average of non-normal phenomena.
This last is particularly important, since much statistical inference is based on the behavior of averages.

While there are infinitely many normal distributions, each determined by μ and σ²,
• they can all be standardized by using the transformation

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

• We use the standardized form to compute probabilities for any normal distribution.
• In the standardized form, distance from the mean is in units of standard deviation.
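The worked examples in these notes (the lottery profit, the family-size sample means, and the IQ percentile) can be cross-checked with a few lines of code. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, which is our substitute for the table lookups and Minitab steps described in the slides. The variable and function names are hypothetical, and small differences from the slide values come from rounding z to two or three decimals in the slides.

```python
import math
from statistics import NormalDist


def lottery_probability():
    """Pr(8 <= X <= 10.5) for weekly profit X ~ N(10, 2.5^2), in millions."""
    profit = NormalDist(mu=10.0, sigma=2.5)
    return profit.cdf(10.5) - profit.cdf(8.0)


def family_mean_distribution(n=4):
    """Sampling distribution of the mean of n families: N(3.4, 4.37/n)."""
    return NormalDist(mu=3.4, sigma=math.sqrt(4.37 / n))


def iq_percentile(p=0.95):
    """The p-th quantile of IQ ~ N(100, 15^2), i.e. x = sigma * z_p + mu."""
    return NormalDist(mu=100.0, sigma=15.0).inv_cdf(p)


if __name__ == "__main__":
    xbar = family_mean_distribution()
    print(f"Lottery: Pr(8 <= X <= 10.5)   = {lottery_probability():.4f}")
    print(f"Families: Pr(mean > 5), n = 4 = {1.0 - xbar.cdf(5.0):.4f}")
    print(f"Families: middle 50% of means = "
          f"{xbar.inv_cdf(0.25):.2f} to {xbar.inv_cdf(0.75):.2f}")
    print(f"IQ: 95th percentile           = {iq_percentile():.1f}")
```

Note that `NormalDist` takes the standard deviation, not the variance, as its second argument, which is why the family example passes the square root of σ²/n.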