Continuous Random Variables & The Normal Probability Distribution
Learning Objectives
1. Understand the characteristics of continuous random variables and probability distributions
2. Understand the uniform probability distribution
3. Graph a normal curve
4. State the properties of a normal curve
5. Understand the role of area in the normal density function
6. Understand the relation between a normal random variable and a standard normal random variable
Continuous Random Variable & Continuous Probability Distribution
Continuous Random Variable
• The outcomes of a continuous random variable consist of all possible values in an interval of the real number line.
• In other words, a continuous random variable has infinitely many possible outcomes.
Continuous Random Variable
• For instance, consider the birth weight of a randomly selected baby. Suppose the outcomes are between 1000 and 5000 grams, with all 1-gram intervals of weight between 1000 and 5000 grams equally likely.
• The probability that an observed baby's weight is exactly 3250.326144 grams is zero. There is only one way to observe 3250.326144, but there are infinitely many possible values between 1000 and 5000. Under the classical probability approach, a probability is found by dividing the number of ways an event can occur by the total number of possibilities; dividing 1 by an unlimited number of possibilities drives the probability to zero.
Continuous Random Variable
• To resolve this problem, we compute probabilities of continuous random variables over an interval of values. For instance, instead of asking for a weight of exactly 3250.326144 grams, we compute the probability that a selected baby's weight is between 3250 and 3251 grams.
• To find probabilities of continuous random variables, we use a probability density function (also called a density function).
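Under the slide's assumption that every 1-gram interval between 1000 and 5000 grams is equally likely (a uniform model), an interval probability is just the interval's length divided by the total length. The sketch below illustrates this in Python; uniform_interval_prob is a hypothetical helper written for this example, not a library function. It shows that a single exact value has probability zero while a 1-gram interval has probability 1/4000, and the same helper applies to the seconds-past-the-minute example in the uniform section.

```python
# Hypothetical helper (not from any library): interval probability for a uniform model.
def uniform_interval_prob(a, b, low, high):
    """Return P(a <= X <= b) when X is uniform on [low, high].

    The density is 1 / (high - low) on [low, high] and 0 elsewhere, so the
    probability is the length of the overlap of [a, b] with [low, high]
    divided by the total length (high - low).
    """
    if high <= low:
        raise ValueError("need low < high")
    overlap = max(0.0, min(b, high) - max(a, low))
    return overlap / (high - low)

# Birth-weight example, assuming weights are uniform on [1000, 5000] grams.
print(uniform_interval_prob(3250, 3251, 1000, 5000))                # 0.00025
print(uniform_interval_prob(3250.326144, 3250.326144, 1000, 5000))  # 0.0

# Seconds-past-the-minute example: uniform on [0, 60], intervals of length 3.
for a, b in [(14.4, 17.4), (31.2, 34.2), (47.9, 50.9)]:
    print(a, b, round(uniform_interval_prob(a, b, 0, 60), 4))       # 0.05 each
```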
Uniform Random Variable & Uniform Probability Distribution
Uniform Random Variable
• Sometimes we want to model a continuous random variable that is equally likely to fall anywhere between two limits
• Examples
– Choose a random time … the number of seconds past the minute is a random number in the interval from 0 to 60
– Observe a tire rolling at a high rate of speed … choose a random time … the angle of the tire valve to the vertical is a random number in the interval from 0 to 360 degrees
Uniform Probability Distribution
• When "every number" in an interval is equally likely, the variable has a uniform probability distribution
– Any specific number has zero probability of occurring
– The mathematically correct way to phrase "equally likely" is that any two intervals of equal length have the same probability
Example
• For the seconds-after-the-minute example
• Every interval of length 3 has probability 3/60
– The chance that it will be between 14.4 and 17.4 seconds after the minute is 3/60
– The chance that it will be between 31.2 and 34.2 seconds after the minute is 3/60
– The chance that it will be between 47.9 and 50.9 seconds after the minute is 3/60
Probability Density Function
• A probability density function is an equation used to specify and compute probabilities of a continuous random variable
• This equation must have two properties
– The total area under the graph of the equation is equal to 1 (the total probability is 1)
– The equation is always greater than or equal to zero (probabilities are never negative)
Probability Density Function
• This function is used to represent the probabilities for a continuous random variable
• To find the probability that X is between two numbers
– Compute the area under the curve between the two numbers
– That area is the probability
Area is the Probability
• The probability of being between 4 and 8
[Figure: the shaded area under a density curve from 4 to 8 is that probability.]
Probability Density Function
• An interpretation of the probability density function:
– The random variable is more likely to be in regions where the function is larger
– The random variable is less likely to be in regions where the function is smaller
– The random variable is never in regions where the function is zero
Probability Density Function
• A graph showing where the random variable has more likely and less likely values
[Figure: a density curve with its high region labeled "More likely values" and its low region labeled "Less likely values".]
Uniform Probability Density Function
• The time example … uniform between 0 and 60
– All values between 0 and 60 are equally likely, so the equation must have the same value everywhere between 0 and 60
Uniform Probability Density Function
• The time example … uniform between 0 and 60
– Values outside 0 to 60 are impossible, so the equation must be zero outside 0 to 60
Uniform Probability Density Function
• The time example … uniform between 0 and 60
– Because the total area must be 1 and the width of the rectangle is 60, the height must be 1/60. Therefore the uniform probability density is the constant $f(x) = \frac{1}{60}$ for $0 \le x \le 60$ (and $f(x) = 0$ otherwise)
Uniform Probability Density Function
• The time example … uniform between 0 and 60
– The probability that the variable is between two numbers a and b is the area under the curve between them: a rectangle of width (b – a) and height 1/60, so the probability is (b – a)/60
Normal Random Variable & Normal Probability Distribution
Overview
• The normal distribution models bell-shaped variables
• The normal distribution is the fundamental distribution underlying most of inferential statistics
Chapter 7 – Section 1
• The normal curve has a very specific bell-shaped distribution
• The normal curve looks like this:
[Figure: a bell-shaped normal curve.]
Normal Random Variable
• A normally distributed random variable, or a variable with a normal probability distribution, is a random variable whose relative frequency histogram has the shape of a normal curve
• This curve is also called the normal density curve/function or normal curve (a particular probability density function)
Normal Density Curve
• In drawing the normal curve, the mean μ and the standard deviation σ have specific roles
– The mean μ is the center of the curve
– The values (μ – σ) and (μ + σ) are the inflection points of the curve, where the concavity of the curve changes
Normal Density Curve
• There is a normal curve for each combination of μ and σ
• The curves look different, but they all have the same bell shape
• Different values of μ shift the curve left and right
• Different values of σ change the spread: a larger σ makes the curve wider and flatter, a smaller σ makes it narrower and taller
Normal Curve
• Two normal curves with different means (but the same standard deviation)
– The curves are shifted left and right
Normal Density Curve
• Two normal curves with different standard deviations (but the same mean)
– Both curves are centered at the same mean; the one with the larger standard deviation is wider and lower, the one with the smaller standard deviation is narrower and higher
Properties of Normal Curve
• Properties of the normal density curve
– The curve is symmetric about the mean
– The mean = median = mode, and this is the highest point of the curve
– The curve has inflection points at (μ – σ) and (μ + σ)
– The total area under the curve is equal to 1. (It is complicated to show this, but it is true.)
– The area under the curve to the left of the mean is equal to the area under the curve to the right of the mean (each is 0.5)
Properties of Normal Curve
• Properties of the normal density curve
– As x increases, the curve gets closer and closer to zero but never reaches it … as x decreases, the curve also gets closer and closer to zero but never reaches it
• In addition,
– The area within 1 standard deviation of the mean is approximately 0.68
– The area within 2 standard deviations of the mean is approximately 0.95
– The area within 3 standard deviations of the mean is approximately 0.997 (almost 100%)
This is the so-called empirical rule. Therefore, a normal curve is already close to zero at about 3 standard deviations below and above the mean.
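The 0.68, 0.95, and 0.997 areas can be checked numerically. This is a minimal sketch assuming Python 3.8+, whose standard-library class statistics.NormalDist provides a normal cdf; area_within is a hypothetical helper name used only for this illustration. It also confirms that the areas do not depend on μ and σ, and that half of the area lies to the left of the mean.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# Hypothetical helper: area under a normal curve within k standard deviations of the mean.
def area_within(k, mu=0.0, sigma=1.0):
    dist = NormalDist(mu, sigma)
    # cdf(upper) - cdf(lower) is the area between the two bounds
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# The same areas hold for any mean and standard deviation (here mu=2200, sigma=200).
print(round(area_within(1, 2200, 200), 4))  # 0.6827

# Symmetry: exactly half of the area lies to the left of the mean.
print(NormalDist(2200, 200).cdf(2200))  # 0.5
```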
Empirical Rule
• The empirical rule, or 68-95-99.7 rule, states that for a normal distribution
– Approximately 68% of the values lie between (μ – σ) and (μ + σ)
– Approximately 95% of the values lie between (μ – 2σ) and (μ + 2σ)
– Approximately 99.7% of the values lie between (μ – 3σ) and (μ + 3σ)
• These percentages are areas under the normal curve; they are difficult to calculate by hand, but they are true
Empirical Rule (68-95-99.7 Rule)
• An illustration of the Empirical Rule
[Figure: an illustration of the Empirical Rule.]
Histogram & Density Curve
• When we collect data, we can draw a histogram to summarize the results
• However, histograms have several drawbacks
• Histograms are grouped, so
– There are always grouping errors
– It is difficult to make detailed calculations
Histogram & Density Curve
• Instead of a histogram, we can use a probability density function that approximates the histogram
• Probability density functions are not grouped, so
– There are no grouping errors
– They can be used to make detailed calculations
Normal Histogram
• Frequently, histograms are bell shaped, such as this one:
[Figure: a bell-shaped histogram.]
• We can approximate these with normal curves
Normal Curve Approximation
• Lay a normal curve over the top of the histogram, such as this:
[Figure: a normal curve laid over the bell-shaped histogram.]
• In this case, the normal curve is close to the histogram, so the approximation should be accurate
Normal Probability Density Function
• The equation of the normal curve with mean μ and standard deviation σ is
$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• This is a complicated formula, but (thankfully) we will never need to use it directly to calculate probabilities
Modeling with Normal Curve
• When we model a distribution with a normal probability distribution, we use the area under the normal curve to
– Approximate the areas of the histogram being modeled
– Approximate probabilities that are too detailed to be computed from the histogram alone
Example
• Assume that the distribution of giraffe weights has μ = 2200 pounds and σ = 200 pounds
Example Continued
• What is an interpretation of the area under the curve to the left of 2100?
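Although we never need the density formula by hand, evaluating it once makes "area = probability" concrete. The sketch below assumes Python 3.8+; normal_pdf and area_left_of are hypothetical helpers. It codes the formula above, compares it with the standard library's statistics.NormalDist.pdf, and approximates the area to the left of 2100 for the giraffe model with the trapezoid rule (a numerical-integration technique chosen here for illustration, not the method used in these notes), comparing it with NormalDist.cdf.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

# Hypothetical helper: the normal density formula y = exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)).
def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical helper: trapezoid-rule approximation of the area to the left of x.
# Integration starts 10 standard deviations below the mean; the area beyond that is negligible.
def area_left_of(x, mu, sigma, steps=20_000):
    a = mu - 10 * sigma
    h = (x - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

mu, sigma = 2200, 200  # giraffe weights in pounds
print(math.isclose(normal_pdf(2100, mu, sigma), NormalDist(mu, sigma).pdf(2100)))  # True
print(round(area_left_of(2100, mu, sigma), 4))    # 0.3085
print(round(NormalDist(mu, sigma).cdf(2100), 4))  # 0.3085
```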
Example Continued
• It is the proportion of giraffes that weigh 2100 pounds or less
Note: Area = Probability = Proportion
Standardizing a Normal Random Variable
• How do we calculate the areas under a normal curve?
– If we needed a table for every combination of μ and σ, this would rapidly become unmanageable
– We would like to be able to compute these probabilities using just one table
– The solution is to use the standard normal random variable
Standard Normal Random Variable
• The standard normal random variable is the specific normal random variable that has μ = 0 and σ = 1
• We can relate general normal random variables to the standard normal random variable using a so-called Z-score calculation
Standard Normal Random Variable
• If X is a general normal random variable with mean μ and standard deviation σ, then
$Z = \frac{X - \mu}{\sigma}$
is a standard normal random variable (the Z-score)
• This equation connects general normal random variables with the standard normal random variable
• We only need a standard normal table
Example
• The area to the left of 2100 for a normal curve with mean 2200 and standard deviation 200
Example Continued
• To compute the corresponding value of Z, we use the Z-score
$Z = \frac{X - \mu}{\sigma} = \frac{2100 - 2200}{200} = -\frac{1}{2}$
• Thus the value X = 2100 corresponds to the value Z = –0.5
Summary
• Normal probability distributions can be used to model data that have bell-shaped distributions
• Normal probability distributions are specified by their means and standard deviations
• Areas under the curve of general normal probability distributions can be related to areas under the curve of the standard normal probability distribution
The Standard Normal Distribution
Objectives
• Find the area under the standard normal curve
• Find Z-scores for a given area
• Interpret the area under the standard normal curve as a probability
How to Compute Area under Standard Normal Curve
• There are several ways to calculate the area under the standard normal curve
– We can use a table (such as Table IV on the inside back cover)
– We can use technology (a calculator or software)
• Using technology is preferred
Compute Area under Standard Normal Curve
• Three different area calculations
– Find the area to the left of a Z-score
– Find the area to the right of a Z-score
– Find the area between two Z-scores
• Two different methods shown here
– From a table
– Using a TI graphing calculator (recommended method)
Finding Area under Standard Normal Curve using Z-table
• “Area to the left of” – using a Z-table (standard normal table)
• Calculate the area to the left of Z = 1.68
– Break up 1.68 as 1.6 + 0.08
– Find the row 1.6
– Find the column 0.08
• The probability is 0.9535
Note: The table always gives the area to the left of the Z-score.
Finding Area under Standard Normal Curve using Z-table
• “Area to the right of” – using a Z-table
• The area to the left of Z = 1.68 is 0.9535 from the table
• The area to the right is the remaining amount
• The two add up to 1, so the area to the right is 1 – 0.9535 = 0.0465
Finding Area under Standard Normal Curve using Z-table
• “Area between”
• Between Z = –0.51 and Z = 1.87
• This is not a one-step calculation
Finding Area under Standard Normal Curve using Z-table
• The area to the left of 1.87 (which is 0.9693) includes too much
• It is too much by the area to the left of –0.51 (which is 0.3050)
Finding Area under Standard Normal Curve using Z-table
• Area between Z = –0.51 and Z = 1.87: 0.9693 – 0.3050 = 0.6643
[Figure: the area we want equals the area to the left of 1.87 (0.9693), which is too much, corrected by subtracting the area to the left of –0.51 (0.3050).]
Finding Area under Standard Normal Curve using Z-table
• The area between –0.51 and 1.87 is
– the area to the left of 1.87, or 0.9693 … minus
– the area to the left of –0.51, or 0.3050 … which equals
– the difference, 0.6643
• Thus the area under the standard normal curve between –0.51 and 1.87 is 0.6643
Finding Area under Standard Normal Curve using Z-table
• A different way for “between” …
1 – (0.3050 + 0.0307) = 0.6643
[Figure: the area we want is what remains after deleting the extra area on the left (to the left of –0.51, 0.3050) and the extra area on the right (to the right of 1.87, 0.0307).]
Finding Area under Standard Normal Curve using Z-table
• The area between –0.51 and 1.87
– The area to the left of –0.51, or 0.3050 … plus
– The area to the right of 1.87, or 0.0307 … gives
– The total area to get rid of, 0.3357
• Thus the area under the standard normal curve between –0.51 and 1.87 is 1 – 0.3357 = 0.6643
Finding Area under Standard Normal Curve using TI Graphing Calculator
• Area to the left of 1.68 – using the TI graphing calculator
• The function is normalcdf( ). Follow the key sequence below:
1. Press [2ND] [VARS] (DISTR), select 2:normalcdf(, and press ENTER
2. Then enter -E99, 1.68, 0, 1) and press ENTER
• The probability is 0.9535
Notes:
1. -E99 = $-10^{99}$, a negative number so far out that it acts as negative infinity. We use it as the left bound to obtain “less than or equal to” probabilities, that is, $P(Z \le a)$. The E symbol is entered with the EE key, using the key sequence [2ND] [,].
2. normalcdf( ) (cdf stands for cumulative distribution function) accumulates the area, i.e. the probability, between two bounds. It differs from 1:normalpdf( ) on the calculator, which calculates the height of the normal density curve at a single point.
3. normalcdf( ) takes four entries (parameters). For instance, to find the probability that a normal variable lies in the interval from a to b, i.e. $a \le X \le b$: the 1st number entered is the left bound a; the 2nd is the right bound b; the 3rd is the mean of the normal variable (0 for a standard normal variable); the 4th is the standard deviation of the normal variable (1 for a standard normal variable).
Finding Area under Normal Curve using TI Graphing Calculator
• “Area to the right of” – using the TI graphing calculator
• The area to the right of Z = 1.68
1. Press [2ND] [VARS] (DISTR), select 2:normalcdf(, and press ENTER
2. Then enter 1.68, E99, 0, 1) and press ENTER
• The probability is 0.0465
Note:
1. E99 = $10^{99}$, a very large number that acts as infinity. We use it as the right bound to obtain “greater than or equal to” probabilities, that is, $P(Z \ge a)$. The E symbol is entered with the EE key, using the key sequence [2ND] [,].
Finding Area under Normal Curve using TI Graphing Calculator
• “Area between” – using the TI graphing calculator
• Between Z = –0.51 and Z = 1.87
1. Press [2ND] [VARS] (DISTR), select 2:normalcdf(, and press ENTER
2. Then enter -0.51, 1.87, 0, 1) and press ENTER
• The probability is 0.6642 (the table method gave 0.6643 because table entries are rounded to four decimal places)
Finding Z-score from Probability
• So far we have solved problems of the form: Z-score → area
• Now we will do the reverse: area → Z-score
• This means finding the Z-score (value) that corresponds to a specified area (percentile)
• As before, we can do this with a table or with the TI graphing calculator
Locate Z-score from Table
• “To the left of” – using a table
• Find the Z-score for which the area to the left of it is 0.32
– Look in the body of the table … find 0.32
– The nearest value to 0.32 is 0.3192 … a Z-score of –0.47
Locate Z-score from Table
• “To the right of” – using a table
• Find the Z-score for which the area to the right of it is 0.4332
– The area to the right of it is 0.4332 … so the area to the left of it is 1 – 0.4332 = 0.5668
– Look in the body of the table … find 0.5668. The nearest value is 0.5675
– This corresponds to a Z-score of 0.17
Note: The table always gives the area to the left of a Z-score, so we first convert to the area to the left.
Locate Z-score from TI Graphing Calculator
• “To the left of” – using the TI graphing calculator
• Find the Z-score for which the area to the left of it is 0.32
1. Press [2ND] [VARS] (DISTR), select 3:invNorm(, and press ENTER
2. Enter 0.32, 0, 1) and press ENTER
• Solution: the Z-score is –0.47
• “To the right of” – find the Z-score for which the area to the right of it is 0.4332
– The area to the right of it is 0.4332 … so the area to the left of it is 0.5668
1. Press [2ND] [VARS] (DISTR), select 3:invNorm(, and press ENTER
2. Enter 0.5668, 0, 1) and press ENTER
• Solution: the Z-score is 0.17
Note: invNorm( ) takes 3 parameters: the 1st is the area to the left of the Z-score; the 2nd is the mean; the 3rd is the standard deviation.
Finding a Middle Range
• We often want to find a middle range of Z-scores, from z0 to z1. For instance, find the middle 90%, the middle 95%, or the middle 99% of a standard normal distribution
• The middle 90% would be:
[Figure: the middle 90% of the area under the standard normal curve.]
How to Find a Middle 90% Range
• 90% in the middle means 10% outside the middle, i.e. 5% off each end
• The endpoints are
– the lower endpoint: the number for which 5% of the area is to the left
– the upper endpoint: the number for which 5% of the area is to the right
• Because the standard normal curve is symmetric about 0, the two endpoints are negatives of each other, so finding either one gives the other
• Using the TI calculator: invNorm(0.05, 0, 1) gives a lower Z-score of –1.64, and invNorm(0.95, 0, 1) gives an upper Z-score of 1.64 (more precisely, ±1.645). So the range that covers the middle 90% of the values of a standard normal distribution is from –1.64 to 1.64.
What is zα?
• The number zα denotes the Z-score such that the area to the right of zα is α (the Greek letter alpha)
• Some commonly used zα values are
– z.10 = 1.28; the area between –1.28 and 1.28 is 0.80
– z.05 = 1.64; the area between –1.64 and 1.64 is 0.90
– z.025 = 1.96; the area between –1.96 and 1.96 is 0.95
– z.01 = 2.33; the area between –2.33 and 2.33 is 0.98
– z.005 = 2.58; the area between –2.58 and 2.58 is 0.99
Area as the Probability
• The area under a normal curve can be interpreted as a probability
• The standard normal curve can be interpreted as a probability density function
• We will use Z to represent a standard normal random variable, so it has probabilities such as
– P(a < Z < b)
– P(Z < a)
– P(Z > a)
Note: A normal random variable is a continuous random variable.
The probability that a continuous random variable equals a single value is zero, as explained previously. So the probability is the same whether the inequalities are inclusive (include the endpoints) or exclusive (do not include the endpoints). For instance, $P(Z \le a) = P(Z < a)$.
Summary
• Calculations for the standard normal curve can be done using tables or using technology
• One can calculate the area under the standard normal curve to the left of or to the right of any Z-score
• One can calculate the Z-score so that the area to the left of it or to the right of it is a certain value
• Areas and probabilities are two different representations of the same concept
Applications of the Normal Distribution
Learning Objectives
1. Find and interpret the area under a normal curve
2. Find the value of a normal random variable
General Normal Probability Distribution
• So far, we have learned to find the area under a standard normal curve. Now we want to calculate areas and values for general normal probability distributions
• We can relate these problems to the standard normal calculations from the previous section
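The standard-normal calculator results from the previous section can be cross-checked in software. This is a minimal sketch assuming Python 3.8+ and its standard-library class statistics.NormalDist. The names normalcdf and inv_norm are hypothetical helpers chosen to mirror the TI-84 functions; they are not part of any Python library, and math.inf stands in for ±E99. Because both helpers also accept a mean and standard deviation, the same code carries over to the general normal case.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

# Hypothetical stand-in for the TI-84 normalcdf(lower, upper, mean, sd).
def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)  # area between the two bounds

# Hypothetical stand-in for the TI-84 invNorm(area, mean, sd): area is to the LEFT.
def inv_norm(area_left, mu=0.0, sigma=1.0):
    return NormalDist(mu, sigma).inv_cdf(area_left)

print(round(normalcdf(-math.inf, 1.68), 4))  # 0.9535  area to the left of 1.68
print(round(normalcdf(1.68, math.inf), 4))   # 0.0465  area to the right of 1.68
print(round(normalcdf(-0.51, 1.87), 4))      # 0.6642  area between -0.51 and 1.87
print(round(inv_norm(0.32), 2))              # -0.47   area 0.32 to the left
print(round(inv_norm(1 - 0.4332), 2))        # 0.17    area 0.4332 to the right
print(round(inv_norm(0.95), 3))              # 1.645   z.05
```

As with the calculator, a right-tail question is turned into a left-tail one (1 – 0.4332) before calling the inverse.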
Standardize a General Normal Variable
• For a general normal random variable X with mean μ and standard deviation σ, the variable
$Z = \frac{X - \mu}{\sigma}$
has a standard normal probability distribution
• We can use this relationship to perform calculations for X from Z
Convert X to Z
• Values of X ↔ Values of Z
• If x is a value for X, then $z = \frac{x - \mu}{\sigma}$ is a value for Z
• This is a very useful relationship
Example
• For example, if a normal variable X has μ = 3 and σ = 2, then the value x = 4 for X corresponds to
$z = \frac{4 - 3}{2} = 0.5$,
the value z = 0.5 for Z
Find P(X < x) from P(Z < z)
• Because of this relationship between values of X and values of Z, if $z = \frac{x - \mu}{\sigma}$, then P(X < x) = P(Z < z)
• To find P(X < x) for a general normal random variable, we can calculate P(Z < z) for the corresponding standard normal random variable
Find P(X < x) from P(Z < z)
• This relationship lets us compute all the different types of probabilities
• Probabilities for X are directly related to probabilities for Z using the (X – μ) / σ relationship
Find P(X < x) from P(Z < z)
• A different way to illustrate this relationship
[Figure: an X-axis marked a, μ, b aligned with a Z-axis marked (a – μ)/σ, 0, (b – μ)/σ.]
Find P(X < x) from P(Z < z)
• With this relationship, the following method can be used to compute areas for a general normal random variable X
– Shade the desired area to be computed for X
– Convert all values of X to Z-scores using $z = \frac{x - \mu}{\sigma}$
– Solve the problem for the standard normal Z
– The answer will be the same for the general normal X
Example
• For a general normal random variable X with μ = 3 and σ = 2, calculate P(X < 6)
• This corresponds to $z = \frac{6 - 3}{2} = 1.5$, so P(X < 6) = P(Z < 1.5) = 0.9332 [Use a Z-table or the TI calculator: normalcdf(-E99, 1.5, 0, 1)]
Example
• For a general normal random variable X with μ = –2 and σ = 4, calculate P(X > –3)
• This corresponds to $z = \frac{-3 - (-2)}{4} = -0.25$, so P(X > –3) = P(Z > –0.25) = 0.5987 [Use a Z-table or the TI calculator: normalcdf(-0.25, E99, 0, 1)]
Example
• For a general normal random variable X with μ = 6 and σ = 4, calculate P(4 < X < 11)
• This corresponds to $z = \frac{4 - 6}{4} = -0.5$ and $z = \frac{11 - 6}{4} = 1.25$, so P(4 < X < 11) = P(–0.5 < Z < 1.25) = 0.5858 [Use a Z-table or the TI calculator: normalcdf(-0.5, 1.25, 0, 1)]
Calculate P(X < x) Directly
• Technology often provides direct calculations for the general normal probability distribution
• For instance, for a general normal random variable X with μ = 6 and σ = 4, calculate P(4 < X < 11). With the TI graphing calculator, we can obtain the answer directly from normalcdf(4, 11, 6, 4) without converting X to Z.
Note: In general, to find the area under any normal curve over the interval from a to b, the sequence of parameters for the function normalcdf( ) is (a, b, mean, standard deviation). For a standard normal curve, you can enter just (a, b) instead of (a, b, 0, 1), because the calculator uses mean 0 and standard deviation 1 by default.
Compute X Values from Probabilities
• The inverse of the relationship $Z = \frac{X - \mu}{\sigma}$ is the relationship $X = \mu + Z\sigma$
• With this, we can solve value problems (convert a Z-score back to its original scale) for the general normal probability distribution
Compute X Values from Probabilities
• The following method can be used to compute values for a general normal random variable X
– Shade the desired area to be computed for X
– Find the Z-scores for the same probability problem
– Convert all the Z-scores to X using $X = \mu + Z\sigma$
Example
• For a general normal random variable X with μ = 3 and σ = 2, find the value x such that P(X < x) = 0.3
• Since P(Z < –0.5244) = 0.3 (from a Z-table or calculator: invNorm(0.3, 0, 1) = –0.5244), we convert Z to X:
$x = \mu + z\sigma = 3 + (-0.5244)(2) = 1.95$
so P(X < 1.95) = P(Z < –0.5244) = 0.3
Example
• For a general normal random variable X with μ = –2 and σ = 4, find the value x such that P(X > x) = 0.2
• Since P(Z > 0.8416) = 0.2 (the area to the left is 1 – 0.2 = 0.8, so from a Z-table or calculator: invNorm(0.8, 0, 1) = 0.8416), we convert the Z-score back to X:
$x = \mu + z\sigma = -2 + (0.8416)(4) = 1.37$
so P(X > 1.37) = P(Z > 0.8416) = 0.2
Example
• We know that z.05 ≈ 1.645 (1.64 when rounded to two decimal places), so P(–1.645 < Z < 1.645) = 0.90
• Thus for a general normal random variable X with μ = 6 and σ = 4, the middle 90% range is from –0.58 to 12.58:
$x_1 = 6 - (1.645)(4) = -0.58$
$x_2 = 6 + (1.645)(4) = 12.58$
Compute X Values Directly
• Technology often provides direct calculations for the general normal probability distribution
• For instance, for a general normal random variable X with μ = 3 and σ = 2, find the value x such that P(X < x) = 0.3. We can solve it with the TI graphing calculator: invNorm(0.3, 3, 2), which gives 1.95.
Note: In general, to find the x value for which the area to the left of x under any normal curve is a given value p, the sequence of parameters for the function invNorm( ) is (p, mean, standard deviation). For a standard normal curve, you can enter just (p) instead of (p, 0, 1), because the calculator uses mean 0 and standard deviation 1 by default.
Summary
• We can perform calculations for general normal probability distributions based on calculations for the standard normal probability distribution
• For tables, and for interpretation, values can be converted to Z-scores
• For technology, the parameters of the general normal probability distribution can often be entered directly into a routine
Summary
• The normal distribution
– Is the most important bell-shaped distribution
– Is used to model many random variables
• The standard normal probability distribution
– Has a mean of 0 and a standard deviation of 1
– Is the basis for normal distribution calculations
• The general normal probability distribution
– Has a general mean and general standard deviation
– Can be used in general modeling situations
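The general-normal examples in this section can be reproduced with the same standard-library class. This is a sketch assuming Python 3.8+. z_score and x_from_z are hypothetical helper names for the two conversion formulas $z = \frac{x - \mu}{\sigma}$ and $x = \mu + z\sigma$. NormalDist(mu, sigma).cdf and .inv_cdf play the roles of normalcdf( ) and invNorm( ) with the mean and standard deviation entered directly.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# Hypothetical helpers for the two conversion formulas.
def z_score(x, mu, sigma):
    return (x - mu) / sigma  # z = (x - mu) / sigma

def x_from_z(z, mu, sigma):
    return mu + z * sigma    # x = mu + z * sigma

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Areas, converting X to Z first
print(round(std.cdf(z_score(6, 3, 2)), 4))        # 0.9332  P(X < 6), mu=3, sigma=2
print(round(1 - std.cdf(z_score(-3, -2, 4)), 4))  # 0.5987  P(X > -3), mu=-2, sigma=4

# Area with the parameters entered directly, like normalcdf(4, 11, 6, 4)
x_dist = NormalDist(6, 4)
print(round(x_dist.cdf(11) - x_dist.cdf(4), 4))   # 0.5858  P(4 < X < 11)

# Values: find z from the area to the LEFT, then convert back to x
print(round(x_from_z(std.inv_cdf(0.3), 3, 2), 2))   # 1.95  P(X < x) = 0.3
print(round(x_from_z(std.inv_cdf(0.8), -2, 4), 2))  # 1.37  P(X > x) = 0.2

# Middle 90% of X with mu=6, sigma=4, parameters entered directly
lo, hi = x_dist.inv_cdf(0.05), x_dist.inv_cdf(0.95)
print(round(lo, 2), round(hi, 2))                   # -0.58 12.58
```

Both routes, converting to Z first or entering μ and σ directly, give the same answers, which is the point of the summary above.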