6.11 – The Normal Distribution
IB Math SL/HL Y1&Y2 - Santowski

(A) Random Variables
We now wish to combine some basic statistics with some basic probability. We are interested in the numbers associated with situations that result from elements of chance, i.e. in the values of random variables.
We also wish to know the probabilities with which these random variables take each value in their range of possible values, i.e. their probability distributions.
So 2 definitions need to be clarified:
(i) a discrete random variable is a variable quantity which occurs randomly in a given experiment and which can assume only certain well-defined values, usually integers. Examples: the number of bicycles sold in a week, the number of defective light bulbs in a shipment. Discrete random variables involve a count.
(ii) a continuous random variable is a variable quantity which occurs randomly in a given experiment and which can assume all possible values within a specified range. Examples: the heights of men in a basketball league, the volume of rainwater in a water tank in a month. Continuous random variables involve a measure.

(B) Classwork
To review the distinction between the 2 types of random variables:
Math SL text, p710, Chap 29A, Q1,2,3
Math HL text, p728, Chap 30A, Q1,2,3

(C) The Normal Distribution
Data obtained by direct measurement (e.g.
population heights) is usually continuous rather than discrete (all heights are possible, not just whole numbers).
Continuous data also has statistical distributions, and many physical quantities are distributed symmetrically and unimodally about the mean.
Statisticians observe this bell-shaped curve so often that its model is known as the normal distribution.
When expressed in z-scores, the graph of the normal distribution is referred to as the standard normal curve, and one defining equation for the curve, for our purposes, is
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty$
where z refers to a concept called the z-score, which takes into account the mean and standard deviation of a set of data.
We can graph the normal distribution as follows, where the x-axis is the number of standard deviations, σ, from the mean/median, μ (the idea behind our z-score).
The total area under the curve is 1 unit (arising from the fact that the total probability of all possible outcomes is 1, or 100%).
With our z-score, we "set" the mean, μ, to be 0, and each 1 unit of the x-axis is 1 standard deviation, σ.
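The claim that the total area under the curve is 1 unit can be checked numerically. The sketch below is only an illustration, assuming the curve is the usual standard normal density f(z) = e^(-z^2/2) / sqrt(2*pi); the function names and the choice of the trapezoidal rule are ours, not part of the course materials.

```python
import math

def standard_normal_pdf(z: float) -> float:
    """Height of the standard normal curve at z (assumed form of f)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def area_under_curve(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under f between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (standard_normal_pdf(a) + standard_normal_pdf(b))
    for i in range(1, steps):
        total += standard_normal_pdf(a + i * h)
    return total * h

peak = standard_normal_pdf(0.0)            # height of the curve at the mean, z = 0
total_area = area_under_curve(-8.0, 8.0)   # tails beyond 8 SDs are negligible
print(round(peak, 4), round(total_area, 4))  # prints: 0.3989 1.0
```

Integrating from -8 to 8 stands in for the whole z-axis, since the curve is essentially zero that far from the mean.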
To find the area under the curve between any two given z-scores, we can rely on graphs.
The area under the curve between our two given z-scores means the proportion of values between our two z-scores.
So if we write p(-2 < z < 1) = 0.81859, we mean that the proportion of data values that lie between 2 standard deviation units below the mean and 1 unit above it is 0.81859, or as a percentage, 81.859% of our data. Equivalently, the probability that a data value lies between 2 SDs below and 1 SD above the mean is 0.81859.
We can illustrate this on a normal distribution graph as follows:

(C) The Normal Distribution – Tables of z scores
We can work out the previous example without a graph and shading areas under a graph, by simply using prepared tables: SL Math text, p735 and HL Math text, p772.
So to determine p(-2 < z < 1), we check the table and see that a z value of -2.00 corresponds to a value of 0.0228. This means that the area under the curve, starting from -2.00 all the way left to -∞, is 0.0228 (or 2.28% of the data is more than 2 SD units below the mean).
Likewise, we check the table for our z value of 1.00 and see the value of 0.8413. This means that the area under the curve, starting from 1.00 all the way left to -∞, is 0.8413 (or 84.13% of the data is less than 1 SD unit above the mean).
So what do we do with the 2 numbers? The 0.8413 includes the data more than 2 SD units below the mean, which lies outside our range, so we subtract that area from it: 0.8413 - 0.0228 = 0.8185, as we saw before with the graph and graphing software.

(D) Examples
Use the table to evaluate p(z < 1.5). Interpret the value.
The table gives us the value 0.9332, which means that 93.32% of our data lies at or below 1.5 SD units above the mean. Equivalently, the probability of getting a random data point that is at most 1.5 SD units above the mean is 0.9332.
We can see this illustrated on the graph.

(D) Example Using Standard Normal Tables
For the standard normal variable, find:
(i) p(z < 1) (ii) p(z < 0.96) (iii) p(z < 0.03)
Some slightly more challenging examples:
(i) p(z > 1.7) (ii) p(z < -0.88) (iii) p(z > -1.53)
And now some in-between values:
(i) p(1.7 < z < 2.5) (ii) p(-1.12 < z < 0.67) (iii) p(-2.45 < z < -0.08)
We can also do some "inverse" problems, where we find the value of a:
(i) p(z < a) = 0.5478 (ii) p(z > a) = 0.6 (iii) p(z < a) = 0.05

(E) Homework
SL Math text, Chap 29H.1, p736, Q1-4
HL Math text, Chap 30K.1, p757, Q2-5

(F) Standardizing Normal Distributions
When we have applications wherein we apply a normal distribution (e.g. with a continuous random variable like the height or weight of people), each application has its own mean and standard deviation, along with its own distribution graph.
What we wish to accomplish now: can we somehow standardize a normal distribution so that one single standardized normal distribution applies to every possible normal distribution?
We can accomplish this by a combination of transformations of our data with its own normal distribution.
So from every data point in our distribution, we will subtract the population's mean and then divide this difference by the population's standard deviation. We will call this result a "z"-score.
So our "formula" for this data transformation is z = (x - μ)/σ.
So we then graph the newly transformed data points and we get a standardized normal distribution curve.
The two key features of the standardized normal distribution curve are (i) the mean is 0 and (ii) the standard deviation is 1.

(G) Graph of Standardized Normal Distribution

(H) Working with a Standardized Normal Distribution
Ex
1 The heights of all rugby players from India are normally distributed with a mean of 179 cm and a standard deviation of 5 cm. Find the probability that a randomly selected player (i) is less than 181 cm tall (ii) is at least 177.5 cm tall (iii) is between 175 cm and 190 cm tall.

Solution #1(i): use the z-score tables.
z = (181 - 179)/5 = 0.40
So find 0.40 on the tables, which gives 0.6554.
Since the table gives us the cumulative area under the curve up to the specified z-score (0.40), we can conclude that 65.5% of the players would be less than 181 cm tall.
Alternatively, we can use a GDC: we simply select the normalcdf( command and enter the specifics as follows: normalcdf(-1E99, 181, 179, 5), which tells the GDC that you want the heights less than 181 (basically from 181 down to -∞) and that the population mean is 179 and the SD is 5.
Our result is 0.6554, the same as the result from the table.

Solution #1(ii): use the z-score tables. However, we must realize that the table gives us a cumulative area under the curve up to the given z-score, whereas we are now looking for the area ABOVE the given z-score.
So, using the table, simply find the area under the curve BELOW the given z-score. Then, using the "complement" idea, simply subtract that area from 1.
z-score = (177.5 - 179)/5 = -0.30
Table value is 0.3821 (so 38.21% of the area under the curve is to the left of -0.30 on the z-axis).
Therefore, the area representing the probability of our players being at least 177.5 cm tall would be 1 - 0.3821 = 0.6179 (so this would be the area under the curve to the right of z = -0.30).
In using the GDC, we again simply enter the command normalcdf(177.5, 1E99, 179, 5) and get 0.6179 as our answer.

Solution #1(iii): use the z-score tables. However, we must realize that the table gives us a cumulative area under the curve up to the given
z-score, whereas we are now looking for the area BETWEEN 2 given values.
Our two z-scores for 175 and 190 are z = (175 - 179)/5 = -0.80 and z = (190 - 179)/5 = 2.20, which we can illustrate below.
So, again, our tables require several steps in the calculation:
(i) find the area under the curve that is LESS THAN -0.80, which is 0.2119
(ii) now find the area under the curve that is less than 2.20, which is 0.9861
So clearly, the 0.9861 total cumulative area includes the 0.2119 that we DO NOT have within our specified range of z-scores (player heights less than 175 cm).
This means that we need to subtract the 0.2119 from the 0.9861: 0.9861 - 0.2119 = 0.7742.
Alternatively, using the GDC, we enter normalcdf(175, 190, 179, 5) and get the same 0.7742.

(I) Homework
HL Math text: Chap 30K.2, p759, Q1-3; Chap 30K.3, p760, Q1-4; Chap 30L, p761, Q1-7
SL Math text: Chap 29H.2, p738, Q1-3; Chap 29H.3, p739, Q1-3; Chap 29I, p740, Q1-8
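The three parts of Example 1 in (H) can be cross-checked without tables or a GDC. The sketch below is one possible way to do this, not the method used in the course: it standardizes with z = (x - mean)/SD and evaluates the cumulative area with Python's math.erf. The helper name normal_cdf is hypothetical and simply plays the role of normalcdf with a lower bound of minus infinity.

```python
import math

def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Cumulative area to the left of x for a normal distribution.

    Standardizes x to a z-score, then uses the identity
    Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    """
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MEAN, SD = 179.0, 5.0  # rugby player heights in cm, from Example 1

# (i) less than 181 cm: area to the left of z = 0.40
p_less_181 = normal_cdf(181, MEAN, SD)

# (ii) at least 177.5 cm: complement of the area to the left of z = -0.30
p_at_least_177_5 = 1.0 - normal_cdf(177.5, MEAN, SD)

# (iii) between 175 cm and 190 cm: difference of two cumulative areas
p_between = normal_cdf(190, MEAN, SD) - normal_cdf(175, MEAN, SD)

print(round(p_less_181, 4), round(p_at_least_177_5, 4), round(p_between, 4))
# prints: 0.6554 0.6179 0.7742
```

The subtraction in part (iii) mirrors the table method: the larger cumulative area includes the part below 175 cm, so that part is removed.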