SENIOR HIGH SCHOOL
STATISTICS AND PROBABILITY
QUARTER 3
Module 3 – Means and Variances
Module 4 – The Normal Distribution and Its Properties

Name of the Learner:
Grade and Section:
Teacher:
Contact number: _________________________________________

LEARNING COMPETENCIES
At the end of the lesson, learners should be able to:
• interpret the mean and the variance of a discrete random variable;
• solve problems involving the mean and variance of probability distributions;
• illustrate a normal random variable and its characteristics;
• identify regions under the normal curve corresponding to different standard normal values;
• convert a normal random variable to a standard normal variable and vice versa; and
• compute probabilities and percentiles using the standard normal table.

LESSON 1: MEANS AND VARIANCES

INTRODUCTION/MOTIVATION: Importance of Mean and Variance (and Standard Deviation) / Chebyshev’s Inequality

You may wonder why the mean and standard deviation are by far the two most important summary measures of a distribution (whether a list of data or a probability distribution, including a probability density function). A mathematical result derived by the Russian mathematician Pafnuty Chebyshev, called Chebyshev’s Inequality, says that for any distribution (with a finite mean and standard deviation), (i) at least three fourths of the distribution is within two standard deviations of the mean; and (ii) at least eight ninths of the distribution is within three standard deviations of the mean. Because these bounds hold for every such distribution, they may be conservative for a particular one.

MAIN LESSON: PROPERTIES OF MEANS AND VARIANCES

In previous lessons, you were shown that adding a constant to data, or subtracting a constant from it, shifts the mean but does not change the variance or the standard deviation. This is also the case for random variables.
E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)

If a teacher decides to give extra points to everyone in an exam, the average score increases by the number of extra points given, but the variability of the new scores stays the same.

If a company (or the government) decides to double the income of its employees, this would double the average income and the standard deviation of income, and would increase the variance of income fourfold. (The latter is the reason why government should be careful about doubling incomes, as this would increase income inequality.)

You may have observed that multiplying or dividing data by a constant changes both the mean and the standard deviation by the same factor. Variance, being the square of the standard deviation, is affected even more: it changes by the square of this constant. This is also the case for random variables.

E(aX) = a E(X)
Var(aX) = a² Var(X)

To make it simple, consider the case of two independent random variables, X and Y. The expected value of the sum of X and Y is the sum of the expected values:

E(X + Y) = E(X) + E(Y)

while the expected value of the difference of X and Y is the difference of the expected values:

E(X - Y) = E(X) - E(Y)

How about the variance? If the random variables are independent, then there is a simple Addition Rule for variances (for a sum of random variables):

Var(X + Y) = Var(X) + Var(Y)

What about the variance of a difference? Surprisingly, variances also add for a difference of independent random variables:

Var(X - Y) = Var(X) + Var(Y)

Variances are added for both the sum and the difference of two independent random variables because the variation in each variable contributes to the variation in each case. The variability of the differences increases as much as the variability of the sums.
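The rules above can be checked numerically. The following is a minimal Python sketch and is not part of the original module: the two small distributions X and Y and the helper names (mean, variance, transform, combine) are hypothetical, chosen only for illustration. It computes exact values with fractions and assumes that X and Y are independent.

```python
from fractions import Fraction
from itertools import product


def mean(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Variance computed as E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def transform(pmf, func):
    """Distribution of func(X); probabilities of equal outcomes are merged."""
    out = {}
    for x, p in pmf.items():
        out[func(x)] = out.get(func(x), 0) + p
    return out


def combine(pmf_x, pmf_y, func):
    """Distribution of func(X, Y), assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[func(x, y)] = out.get(func(x, y), 0) + px * py
    return out


# Hypothetical example distributions (assumed for illustration only).
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {1: Fraction(1, 3), 4: Fraction(2, 3)}

shifted = transform(X, lambda x: x + 5)      # X + c with c = 5
doubled = transform(X, lambda x: 2 * x)      # aX with a = 2
total = combine(X, Y, lambda x, y: x + y)    # X + Y
diff = combine(X, Y, lambda x, y: x - y)     # X - Y

print("E(X), Var(X):", mean(X), variance(X))
print("E(X+5), Var(X+5):", mean(shifted), variance(shifted))
print("E(2X), Var(2X):", mean(doubled), variance(doubled))
print("Var(X+Y), Var(X-Y):", variance(total), variance(diff))
```

In this sketch the shift leaves the variance unchanged, doubling multiplies the variance by four, and the sum and the difference have the same variance, Var(X) + Var(Y).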
To illustrate this idea about sums (or differences) of random variables, consider a team of four swimmers in a medley relay in which each swimmer swims 100 meters. The swimmers’ performances are independent, with the following means and standard deviations of the times (in seconds) to finish 100 meters:

[Table: mean and standard deviation of each swimmer’s 100-meter time, in seconds]

What would be the mean and standard deviation of the relay team’s total time? If the team’s best time in practice was 201.62 seconds, would it be likely for the team to swim faster than this during actual competition?

The mean of the team’s total time is the sum of the means, 45.02 + 50.02 + 51.45 + 56.38 = 202.87 seconds, and its variance is the sum of the variances, 0.20² + 0.26² + 0.24² + 0.22² = 0.2136, so the standard deviation is the square root of 0.2136, or about 0.46 seconds. The best time of 201.62 seconds is (202.87 - 201.62)/0.46, or about 2.7, standard deviations below the mean; thus, it would be very unlikely for the team to swim faster than this best time.

The crucial assumption is independence of the random variables. Suppose the amount of money spent by learners for lunch is represented by the random variable X, and the amount of money the same group of learners spends on afternoon snacks is represented by the random variable Y. The variance of the sum X + Y is not the sum of the variances, since X and Y are not independent random variables.

Consider tossing a fair coin 10 times. How many heads would you expect? Likely, you will answer 5, obtained as 10 times one half. If you answered 5, your intuition is correct. Here’s why: Define Xi as 1 if the ith toss comes up heads and 0 if the ith toss comes up tails, and assume in general that the coin has a chance p of yielding heads (with p = 1/2 when the coin is fair). Then the probability mass function of Xi is

P(Xi = 1) = p, P(Xi = 0) = 1 - p

for all values of i: i = 1 up to 10 (or whatever number of tosses we make).
Here the mean of Xi is

E(Xi) = 1(p) + 0(1 - p) = p

while the variance of Xi is

Var(Xi) = (1 - p)² p + (0 - p)² (1 - p) = p(1 - p)

For tossing a fair coin ten times, the expected value of the number of heads is

E(X1 + X2 + … + X10) = 10 (1/2) = 5

while the variance is

Var(X1 + X2 + … + X10) = 10 (1/2)(1/2) = 2.5

and thus the standard deviation is approximately 1.58.

Using Chebyshev’s Inequality, we know that when tossing a fair coin ten times (and repeating this coin tossing process many, many times), at least three fourths of the time the number of heads will be within about 3 of the expected value of 5 (since 2 times the standard deviation 1.58 is about 3), that is, between 2 and 8 heads.

In general, when we have a sequence of independent random variables X1, X2, X3, …, Xn with a common mean μ and a common standard deviation σ, the sum X1 + X2 + … + Xn has an expected value of nμ and a variance of nσ².

If we were to toss a fair coin 100 times, the expected number of heads is 100(1/2) = 50, while the variance is 100(1/2)(1/2) = 25, so the standard deviation is 5. According to Chebyshev’s Inequality, at least three fourths of the distribution of the number of heads in 100 tosses of a fair coin is between 50 - 2(5) = 40 heads and 50 + 2(5) = 60 heads.

For tossing a coin n times where the probability of getting a head is p, if S is the number of heads, then E(S) = np while Var(S) = np(1 - p).

REMINDER!
Variances of independent random variables are the ones that add up, not the standard deviations. Variances have squared units, so the intuition here is the Pythagorean theorem: the square of the hypotenuse is the sum of the squares of the legs. In addition, remember that variances of independent random variables add even when we are considering differences between them.

ASSESSMENT
1. A Grade 12 student uses the Internet to get information on temperatures in the city where he intends to go for college. He finds information in degrees Fahrenheit.
Determine the equivalent summary statistics in the Celsius scale, given °C = (°F - 32)(5/9).

LESSON 2: THE NORMAL DISTRIBUTION AND ITS PROPERTIES

Many continuous random variables, such as IQ scores, heights of people, or weights of M&Ms, have histograms with bell-shaped distributions. The most important distribution in statistical science is the normal distribution, which has a "bell-shaped" curve.

There are many reasons why the normal distribution is considered the most important curve in statistics.
(a) Many random variables are either normally distributed or, at least, approximately normally distributed. Heights, weights, examination scores, and the log of the length of life of some equipment are a few of the random variables that are approximately normally distributed. Although the distributions are only approximately normal, the approximation is usually quite close.
(b) It is easy for mathematical statisticians to work with the normal curve. A number of hypothesis tests and the regression model are based on the assumption that the underlying data have normal distributions.
(Extra note: There are, however, other kinds of continuous distributions that are used in practice. For instance, a distribution that has been found convenient for modeling the length of life of equipment is the Weibull distribution.)

The normal distribution is a continuous distribution, just like the uniform and triangular distributions. However, the left and right tails of the normal distribution extend indefinitely, coming ever closer to the x-axis without touching it.

[Figure: the normal (bell-shaped) curve]

The graph of the normal distribution depends on two factors: the mean μ and the standard deviation σ. In fact, the mean and standard deviation characterize the whole distribution. That is, we can get areas under the normal curve given information about the mean and standard deviation. The mean determines the location of the center of the bell-shaped curve.
Thus, a change in the value of the mean shifts the graph of the normal curve to the right or to the left. For symmetric distributions with a single peak, such as the normal curve, remember that Mean = Median = Mode.

The standard deviation determines the shape of the graph (particularly, the height and width of the curve). When the standard deviation is large, the normal curve is short and wide, while a small value for the standard deviation yields a skinnier and taller graph.

[Figure: two normal curves with different standard deviations]

The curve on the left is shorter and wider than the curve on the right because the curve on the left has a bigger standard deviation. A normal curve is symmetric about its mean and is more concentrated in the middle than in the tails. Aside from that, observe that normal curves differ in how spread out they are (the spread or variability is measured by the standard deviation).

When a random variable X has a normal distribution with mean μ and variance σ², we denote this as X ~ N(μ, σ²).

Technical Note: The height of a normal curve at some value x is a formidable-looking expression that depends on the mean μ and the standard deviation σ:

f(x) = [1 / (σ √(2π))] e^( -(x - μ)² / (2σ²) )

KEY POINTS
• The normal distribution, a special continuous distribution, is extremely important in statistics because many random variables that occur in real applications have normal distributions (or approximately normal distributions).
• The normal distribution, characterized by its mean μ and its standard deviation σ, has a graph that is bell-shaped. It is also symmetric about the mean, so that the mean is also the median and the mode (since the curve is highest at the mean).
• The normal curve satisfies the Empirical Rule: (a) approximately 68% of the area under the normal curve is within one standard deviation of the mean; (b) approximately 95% of the area under the normal curve is within two standard deviations of the mean; and (c) nearly everything, approximately 99.7% of the area under the normal curve, is within three standard deviations of the mean.

ASSESSMENT
1. The data below and the accompanying histogram give the weights, to the nearest hundredth of a gram, of a sample of 100 coins (each with a value of P10). The mean weight is 8.69 grams and the standard deviation s is approximately 0.055 gram.

[Data table and histogram of the coin weights]

a. Compare the mean and median.
b. What percentage of the data is within one standard deviation of the mean? Within two standard deviations? Within three standard deviations?
c. Suppose you were to randomly select a coin from this collection. What is the chance that its weight would be within one standard deviation of the mean? Two standard deviations? Three standard deviations?
d. What percentage of the data is below the mean?
e. Suppose you were to randomly select a coin from this collection. What is the chance that its weight would be below the mean?

LESSON 3: AREAS UNDER A STANDARD NORMAL DISTRIBUTION

The Standard Normal Curve

The standard normal distribution is the normal distribution with a mean of 0 and a standard deviation of 1.

[Figure: the standard normal curve]

In the table of cumulative probabilities for the standard normal distribution, the rows show the whole number and tenths place of the z-score, the columns show the hundredths place, and the cumulative probability Φ(z) appears in the cell of the table. For example, a section of the standard normal table is reproduced below.

[Table: section of the standard normal table]

To find the cumulative probability of a z-score equal to -1.31, cross-reference the row of the table containing -1.3 with the column containing 0.01.
The table shows that the probability that a standard normal random variable will be less than -1.31 is 0.0951; that is, Φ(-1.31) = P(Z ≤ -1.31) = 0.0951.

Practice using this table of cumulative probabilities under a standard normal curve. Assume that we have a random variable Z that has a standard normal distribution. What would be the following?

(a) P(Z ≤ 0): The answer is 0.5, since the first entry of the first line (of the second page) of the Table of values of Φ(z) reads so.

(b) P(Z ≤ -1.54): As per the Table of values of Φ(z), the answer is 0.0618.

(c) P(-1.54 ≤ Z ≤ 1.54) = 0.8764. Sketch the area of interest, and note that the area between -1.54 and 1.54 can be obtained as the difference of the area to the left of 1.54 and the area to the left of -1.54:
P(-1.54 ≤ Z ≤ 1.54) = P(Z ≤ 1.54) - P(Z ≤ -1.54) = 0.9382 - 0.0618 (as per the table entries) = 0.8764

(d) P(Z ≥ 1.54) = 0.0618. P(Z ≥ 1.54) is an upper tail area, but the total area under the curve is 1, so P(Z ≥ 1.54) is the difference of 1 and the area to the left of 1.54, i.e.
P(Z ≥ 1.54) = 1 - P(Z ≤ 1.54) = 1 - 0.9382 = 0.0618
Alternatively, by symmetry, P(Z ≥ 1.54) = P(Z ≤ -1.54) = 0.0618.

KEY POINTS
• The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
• Tables of the Cumulative Distribution Function of a Standard Normal Distribution can be used to obtain various areas under a standard normal curve as well as percentiles of the distribution.

ASSESSMENT
1. The standard normal distribution
a. has a mean of zero (0) and a standard deviation of 1.
b. has a mean of 1 and a variance of zero (0).
c. has an area equal to 0.5.
d. cannot be used to approximate discrete probability distributions.
2. If Z has a standard normal distribution, and P(0 < Z < z) is 0.3770, then the value of z is
a. 0.18 b. 0.81 c. 1.16 d. 1.47
3. True or False: The probability that a standard normal random variable, Z, falls between -1.50 and 0.81 is 0.7242. __________
4.
Suppose Z has a standard normal distribution with a mean of zero (0) and standard deviation of 1. The probability that Z is less than 1.15 is __________
5. Suppose Z has a standard normal distribution with a mean of zero (0) and standard deviation of 1. The probability that Z values are larger than __________ is 0.3483.
6. Suppose Z has a standard normal distribution with a mean of zero (0) and standard deviation of 1. 85% of the possible Z values are smaller than __________.

HANDOUT
[Table of values of Φ(z)]

LESSON 4: AREAS UNDER A NORMAL DISTRIBUTION

Given a normally distributed random variable X ~ N(μ, σ²), we often wish to find various probabilities pertaining to where an arbitrary measurement may lie. For instance, we may want to find P(a ≤ X ≤ b), which is the probability that a random measurement X lies between a and b.

Standard Scores (or Z-scores)

Whatever the values of the mean and standard deviation of a normal curve, we can transform the whole normal curve into a standard normal curve (as illustrated in the following figure).

[Figure: transforming a normal curve into a standard normal curve]

This entails transforming all the data in a normal curve into standard units. An observation is in standard units (that is, it is a z-score) when it is expressed as the number of standard deviations it lies above or below the average. That is, if x, μ, and σ respectively represent the observation, its mean, and its standard deviation, then the standardized form (or z-score) of x is

z = (x - μ) / σ

Remember that a z-score indicates how many standard deviations a data value is from the mean. For instance, if examination scores in Statistics and Probability have an average of 75 and a standard deviation of 5, then an exam score of 90 has a z-score of (90 - 75)/5 = 3, while a score of 70 has a z-score of (70 - 75)/5 = -1. To interpret these z-scores, note that 90 is 3 standard deviations above the mean (75), while 70 is one standard deviation below the mean. Z-scores are a very good way of making variables comparable.
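The conversion to and from standard units can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original module; the function names z_score and from_z_score are hypothetical, and the numbers are the examination scores used above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def from_z_score(z, mean, sd):
    """Inverse transformation: recover the original value from its z-score."""
    return mean + z * sd


# Examination scores with an average of 75 and a standard deviation of 5.
print(z_score(90, 75, 5))       # 3.0
print(z_score(70, 75, 5))       # -1.0
print(from_z_score(3, 75, 5))   # 90
```

The second function covers the "vice versa" direction: given a z-score, it returns the value in the original units.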
Z-scores may also be used to transform normal random variables into standard normal random variables, and this, in turn, helps us relate probabilities for any normal distribution to areas under a standard normal curve, as the following example on heights illustrates.

Illustration for Finding Areas Under a Normal Curve

Assume that the distribution of heights of all female Grade 11 students can be modeled well by a normal curve with a mean of 1620 mm and a standard deviation of 50 mm. We wish to determine
(a) the proportion of female Grade 11 students shorter than 1550 mm;
(b) the proportion of female Grade 11 students taller than 1650 mm;
(c) the proportion of female Grade 11 students between 1600 and 1675 mm;
(d) the height below which 10 percent of female Grade 11 students fall;
(e) the height above which 75 percent of female Grade 11 students fall.

For (a), first transform 1550 to its z-score, (1550 - 1620)/50 = -1.4, so that we can associate the area to the left of 1550 (under a normal curve with mean 1620 and standard deviation 50) with the area to the left of z = -1.4 under a standard normal curve. Reading from the table of the Cumulative Distribution Function of a Standard Normal Curve, we find Φ(-1.4) = 0.0808.

For (b), transform the height value 1650 to standard units, (1650 - 1620)/50 = 0.6, and then note that the area to the right of z = 0.6 under the standard normal curve is the difference between the total area under a standard normal curve (100%) and the area to the left of z = 0.6, Φ(0.6) = 0.7257. In consequence, the desired probability (and area) is 1 - 0.7257 = 0.2743.
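Parts (a) and (b) can also be checked without the printed table. The sketch below is an illustration and not part of the original module: it assumes the standard identity Φ(z) = [1 + erf(z/√2)]/2 and uses math.erf from the Python standard library; the function names phi and normal_cdf are hypothetical.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), obtained by standardizing x first."""
    return phi((x - mu) / sigma)


mu, sigma = 1620.0, 50.0   # heights in mm, as in the illustration

p_shorter_1550 = normal_cdf(1550, mu, sigma)        # part (a)
p_taller_1650 = 1.0 - normal_cdf(1650, mu, sigma)   # part (b)

print(round(p_shorter_1550, 4))   # 0.0808
print(round(p_taller_1650, 4))    # 0.2743
```

Both values agree with the table readings to four decimal places.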
For (c), first transform 1600 and 1675 into their respective standardized forms, namely (1600 - 1620)/50 = -0.4 and (1675 - 1620)/50 = 1.1, and then obtain the area between these two z-scores as the difference between Φ(1.1) and Φ(-0.4), i.e. 0.8643 - 0.3446 = 0.5197.

For (d), a sketch of the curve helps to show what needs to be done. The 10th percentile of the height distribution may be obtained by first getting the 10th percentile of the standard normal curve, which can be read off from the table as -1.282. This means that the 10th percentile of the height distribution is 1.282 standard deviations below the mean. The required height is -1.282(50) + 1620 = 1555.9 mm.

Finally, for (e), we want the 25th percentile, as this is the value for which 75 percent of the height distribution lies above it. As in (d), first find the 25th percentile of a standard normal curve (-0.675), then obtain the required height as -0.675(50) + 1620 = 1586.25 mm.

KEY POINTS
• To obtain probabilities or percentiles under a normal curve, perform two steps. First, transform the normal curve into a standard normal curve by way of z-scores (which involves subtracting the mean and dividing the result by the standard deviation): z = (X - μ) / σ. Then, use the tables of the Cumulative Distribution Function of a Standard Normal Distribution to obtain the required areas under a standard normal curve and so find the probabilities associated with the z-scores.

ASSESSMENT
1. If a particular batch of data is approximately normally distributed, we would find that approximately
a) 2 of every 3 observations would fall between ±1 standard deviation around the mean.
b) 4 of every 5 observations would fall between ±1.28 standard deviations around the mean.
c) 19 of every 20 observations would fall between ±2 standard deviations around the mean.
d) All of the above.
For problems 2 to 4, consider the following case.
The length of time it takes a Grade 11 student to play the Candy Crush computer app follows a normal distribution with a mean of 3.5 minutes and a standard deviation of 1 minute.
2. The probability that a randomly selected Grade 11 student will play one game of Candy Crush in less than 3 minutes is
a) 0.3551 b) 0.3085 c) 0.2674 d) 0.1915
3. The probability that a randomly selected Grade 11 student will take between 2 and 4.5 minutes to play Candy Crush is
a) 0.0919 b) 0.2255 c) 0.4938 d) 0.7745
4. The time that 75.8% of the Grade 11 students exceed when playing Candy Crush is
a) 2.8 minutes b) 3.2 minutes c) 3.4 minutes d) 4.2 minutes
5. Rodrigo earned a score of 940 on a national achievement test. The mean test score was 850 with a standard deviation of 100. What proportion of students had a higher score than Rodrigo? (Assume that test scores are normally distributed.) If there were 100,000 students who took the test, how many would be expected to have a higher score than Rodrigo?