Distribution of Continuous R.V.: Normal Distribution (Ch 1.4)

Topics: §1.4
  o What is the Normal Distribution, and its density function, mean, variance
  o Standard Normal Distribution: (a) Calculating Probability (b) Calculating Percentile
  o General Normal Distribution: (a) Calculating Probability (b) Calculating Percentiles
------------------------------------------------------------

I. Normal random variable/Normal Distribution

A distribution for describing continuous random variables.

Two common ways to describe a normal distribution:

1. Density plot
   Shape: symmetric, centered at $\mu$, which is also the median and the mean.
   The distribution is fully specified by two parameters: $\mu$ and $\sigma^2$. It is denoted by $N(\mu, \sigma)$; in the examples in these notes the second argument is the standard deviation. It can be shown that $\sigma^2$ is the variance of $X$ ($\sigma$ is the standard deviation of $X$).
   Ex. [Figure: normal density curves labeled A, B, C, D]

2. Density function (for your reference):

   $$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}$$

What problems are we interested in solving regarding the normal distribution?

1. Know how to calculate probabilities from a given normal distribution.
   Ex. Test score $X \sim N(75, 5)$. $P(90 < X < 100)$? $P(X < 60)$?

2. Be able to identify the percentiles of the population.
   Ex. Test score $X \sim N(75, 5)$.
   (1) What are the largest 10% of the scores? That is, we want to find $x_{0.1}$ such that $P[X \ge x_{0.1}] = 0.1$ (10%).
   (2) What are the most extreme 5% of the scores? That is, we want to find $x_{0.05}$ such that $P[X \ge x_{0.05}] = 0.05$ (5%).

II. Standard Normal Distribution

The normal distribution with mean 0 and SD 1, denoted by $N(0, 1)$. We usually use $Z$ to denote a standard normal r.v.

Why learn the standard normal distribution?
  o The area under the normal curve can only be calculated numerically. So statisticians have established a table that shows the left-tail area under the standard normal curve for any given number (see the very first page of the textbook).
  o Later we can use this table to solve problems for any normal distribution. How?
    One can standardize any given $N(\mu, \sigma)$ to $N(0, 1)$, and then use the area table of the standard normal to solve the problem (your HW2, Question #2, 2.61).

Use the area table of the standard normal curve

(1) Calculate probability

Ex. A variable $Z \sim N(0, 1)$. Calculate the following probabilities:
  1. $P(Z \le 1.25) = 0.8944$
  2. $P(Z \le -1.25) = 0.1056$ ($= 1 - 0.8944$, by symmetry)
  3. $P(Z \ge -1.25) = 1 - P(Z \le -1.25) = 1 - 0.1056 = 0.8944$
  4. $P(-0.38 \le Z \le 0.25) = P[Z \le 0.25] - P[Z \le -0.38] = 0.5987 - 0.3520 = 0.2467$
     In general, $P[a \le Z \le b] = P[Z \le b] - P[Z \le a]$.
  5. $P(Z \le -6) < P[Z \le -3.89] = 0.0000$ (to table precision), so $P(Z \le -6) \approx 0$.
  6. $P(Z \ge 2) = 1 - P[Z < 2] = 1 - P[Z \le 2] = 1 - 0.9772 = 0.0228$

(2) Obtain extreme values

Ex1. A variable $Z \sim N(0, 1)$. Find the $z^*$ that satisfies each probability statement:
  1. $P(Z \le z^*) = 0.1$, so $z^* = -1.28$ (the exact value is $-1.281552$).
  2. $P(Z \le z^*) = 0.5$, so $z^* = 0$.
  3. $P(Z \ge z^* \text{ or } Z \le -z^*) = 0.1$. By symmetry of $N(0, 1)$, $P[Z \le -z^*] = 0.05$, so $-z^* = -1.645$ and $z^* = 1.645$.

Ex2. Consider a standard normal r.v. $Z \sim N(0, 1)$. At what value $z^*$ is the area to the right 2.5%?
  We want $z^*$ such that $P[Z \ge z^*] = 0.025$, or equivalently $P[Z \le z^*] = 0.975$. The value is $z^* = 1.96$.

Ex3. Consider a standard normal r.v. $Z \sim N(0, 1)$. At what value $z^*$ is the area between $-z^*$ and $z^*$ equal to 68%?
  $P[-z^* \le Z \le z^*] = 0.68$, so $P[Z \le -z^*] = (1 - 0.68)/2 = 0.16$. Hence $-z^* = -0.995$ and $z^* = 0.995$.

III. General Normal Distribution

If $X$ has a normal distribution with mean $\mu$ and SD $\sigma$, then we can standardize $X$ to $Z$ by

  $$Z = \frac{X - \mu}{\sigma}$$

$Z$ has the standard normal distribution. Therefore,

  $$P[a \le X \le b] = P\left[\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right] = P[a^* \le Z \le b^*], \quad \text{where } a^* = \frac{a - \mu}{\sigma} \text{ and } b^* = \frac{b - \mu}{\sigma}.$$

Calculating probability and percentiles

Ex. A variable $X \sim N(100, 5)$. Calculate the following:
  1. $P(90 \le X \le 125) = P[a^* \le Z \le b^*]$, where $a^* = \frac{90 - 100}{5} = -2$ and $b^* = \frac{125 - 100}{5} = 5$. So
     $P(90 \le X \le 125) = P(-2 \le Z \le 5) = P(Z \le 5) - P(Z \le -2) \approx 1 - 0.0228 = 0.9772$.
  2. $P(X \ge 98) = P\left[Z \ge \frac{98 - 100}{5}\right] = P[Z \ge -0.4] = 1 - P[Z \le -0.4] = 1 - 0.3446 = 0.6554$
  3. Find the $x^*$ such that $P(X \le x^*) = 0.1$.
     $P[X \le x^*] = P\left[\frac{X - 100}{5} \le \frac{x^* - 100}{5}\right] = P\left[Z \le \frac{x^* - 100}{5}\right] = 0.1$.
     But $P[Z \le -1.28] = 0.1$, so $\frac{x^* - 100}{5} = -1.28$, which gives $x^* = 100 + 5 \times (-1.28) = 93.6$.
  4. Find the range that contains the middle 90% of the observations.
     We want $a$ such that $X$ is in $[100 - a, 100 + a]$ with 90% probability:
     $P[100 - a \le X \le 100 + a] = P\left[-\frac{a}{5} \le \frac{X - 100}{5} \le \frac{a}{5}\right] = P\left[-\frac{a}{5} \le Z \le \frac{a}{5}\right] = 0.9$, so $P\left[Z \le -\frac{a}{5}\right] = 0.05$.
     So $-a/5 = -1.645$ and $a = 5 \times 1.645 = 8.225$. The range is $[100 - 8.225, 100 + 8.225] = [91.775, 108.225]$.

Ex. $X$ is the diameter (in mm) of tires, normally distributed with mean 575 and SD 5.
  1. $P(575 < X < 579) = P\left[\frac{575 - 575}{5} < \frac{X - 575}{5} < \frac{579 - 575}{5}\right] = P[0 < Z < 0.8] = P[Z < 0.8] - 0.5 = 0.7881 - 0.5 = 0.2881$.
  2. $P(575 \le X \le 579) = 0.2881$ (the same as in 1, because for a continuous r.v. including the endpoints does not change the probability).
  3. Find the diameter $x^*$ such that only 1% of tires have a larger diameter.
     That is, $P[X > x^*] = 0.01$, or equivalently $P[X < x^*] = 0.99$. Since $P[Z < 2.33] = 0.99$, $x^* = 575 + 5 \times 2.33 = 586.65$.
  4. Find the (diameters of) tires that have the most extreme 5% of diameters.
     That is, $P[X > x^*] = 0.05$, or equivalently $P[X < x^*] = 0.95$. Since $P[Z < 1.645] = 0.95$, $x^* = 575 + 5 \times 1.645 = 583.225$.

Putting everything together

An overall example: The diameter of a tire follows a normal distribution with mean 575 and SD 5. We have 4 tires, and the diameters of these tires are independent of each other.

(a) What is the probability that a tire has its diameter between 570 and 580?
    Let $X_i$ be the diameter of tire $i$. Then $P[570 < X_i < 580] = P[-1 < Z < 1] = 0.8413 - 0.1587 = 0.6826$.

(b) What is the probability that all 4 tires have diameters between 570 and 580?
    Let $A_i = [570 < X_i < 580]$. Then $A_1, A_2, A_3, A_4$ are independent. So
    $P[A_1 \cap A_2 \cap A_3 \cap A_4] = P(A_1)\,P(A_2)\,P(A_3)\,P(A_4) = 0.6826^4 = 0.2171$.

(c) What is the probability that at least one tire is not between 570 and 580?
    This probability $= 1 - P[\text{all tires are between 570 and 580}] = 1 - P[A_1 \cap A_2 \cap A_3 \cap A_4] = 1 - 0.2171 = 0.7829$.
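
The table lookups in these notes can also be checked numerically. The following is a minimal sketch, not part of the original notes, that assumes Python 3.8+ and uses the standard library class statistics.NormalDist in place of the textbook area table: cdf(x) gives the left-tail area and inv_cdf(p) gives the percentile. All variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal Z ~ N(0, 1); cdf(z) plays the role of the left-tail area table.
Z = NormalDist(mu=0, sigma=1)

p_left = Z.cdf(1.25)                    # P(Z <= 1.25); table value in the notes: 0.8944
p_right = 1 - Z.cdf(2)                  # P(Z >= 2); table value in the notes: 0.0228
p_between = Z.cdf(0.25) - Z.cdf(-0.38)  # P(-0.38 <= Z <= 0.25); notes: 0.2467

# inv_cdf(p) reads the table "backwards": it returns z* with P(Z <= z*) = p.
z_10 = Z.inv_cdf(0.10)    # notes: about -1.28
z_975 = Z.inv_cdf(0.975)  # notes: 1.96

# General normal X ~ N(100, 5), where 5 is the SD (as in the notes' notation).
X = NormalDist(mu=100, sigma=5)
p_90_125 = X.cdf(125) - X.cdf(90)            # P(90 <= X <= 125)
p_ge_98 = 1 - X.cdf(98)                      # P(X >= 98)
x_10 = X.inv_cdf(0.10)                       # x* with P(X <= x*) = 0.1
lo, hi = X.inv_cdf(0.05), X.inv_cdf(0.95)    # middle 90% range

# Standardizing by hand gives the same probability as using X directly.
a_star = (90 - 100) / 5
b_star = (125 - 100) / 5
p_standardized = Z.cdf(b_star) - Z.cdf(a_star)

# Overall example: tire diameters ~ N(575, 5), four independent tires.
T = NormalDist(mu=575, sigma=5)
p_one = T.cdf(580) - T.cdf(570)  # (a) one tire between 570 and 580
p_all = p_one ** 4               # (b) all four, by independence
p_any_out = 1 - p_all            # (c) at least one outside the range

print(f"P(Z <= 1.25)          = {p_left:.4f}")
print(f"P(Z >= 2)             = {p_right:.4f}")
print(f"P(-0.38 <= Z <= 0.25) = {p_between:.4f}")
print(f"z* for left area 0.1  = {z_10:.6f}")
print(f"z* for left area .975 = {z_975:.4f}")
print(f"P(90 <= X <= 125)     = {p_90_125:.4f} (standardized: {p_standardized:.4f})")
print(f"P(X >= 98)            = {p_ge_98:.4f}")
print(f"10th percentile of X  = {x_10:.3f}")
print(f"middle 90% of X       = [{lo:.3f}, {hi:.3f}]")
print(f"tires: one, all, any out = {p_one:.4f}, {p_all:.4f}, {p_any_out:.4f}")
```

Because the table rounds z-values to two decimals, the computed numbers can differ slightly from the hand calculations; for example, the 10th percentile of X comes out near 93.59 rather than 93.6.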