Estimating the Mean and Variance of a Normal Distribution

Learning Objectives
After completing this module, the student will be able to
- explain the value of repeating experiments
- explain the role of the law of large numbers in estimating population means
- describe the effect of increasing the sample size or reducing measurement errors or other sources of variability

Knowledge and Skills
- Properties of the arithmetic mean
- Estimating the mean of a normal distribution
- Law of Large Numbers
- Estimating the variance of a normal distribution
- Generating random variates in EXCEL

Prerequisites
1. Calculating sample mean and arithmetic average
2. Calculating sample variance and standard deviation
3. Normal distribution

Citation: Neuhauser, C. Estimating the Mean and Variance of a Normal Distribution. Created: September 9, 2009 Revisions: Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.

Pretest
1. Laura and Hamid are late for Chemistry lab. The lab manual asks them to determine the density of solid platinum by repeating the measurement three times. To save time, they decide to measure the density only once. Explain the consequences of this shortcut.
2. Tom and Bao Yu measured the density of solid platinum three times: 19.8, 21.4, and 21.9 g/cm³. Determine the arithmetic average of these three measurements accurate to three decimal places.
3. The following graphs are densities of probability distributions. Which represent the density of a normal distribution?
[Figure: three probability density curves, panels (a), (b), and (c), each plotted against t from 0 to 6.]
4. Which two parameters are typically used to describe the normal distribution?
   a. Median
   b. Variance
   c. Standard deviation
   d. Mean
5. Suppose X is normally distributed with mean 3 and standard deviation 1, that is, $X \sim N(3,1)$. Use EXCEL to (a) find $P(X \le 3)$, (b) find $P(1 \le X \le 4)$, and (c) determine $a$ so that $P(X \le a) = 0.74$.

Estimating the Mean of a Normally Distributed Population
Suppose an experiment is repeated n times under identical conditions. Denote by $x_i$, $i = 1, 2, \ldots, n$, the outcome of each individual experiment. The arithmetic average $\bar{x}_n$ is calculated as
\[ \bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
When outcomes are not all distinct, we can count the number of times each value occurs: Suppose again that an experiment is repeated n times under identical conditions. But now, we assume that there are only k distinct values $x_j$, $j = 1, 2, \ldots, k$, and that $x_j$ occurs $f_j$ times. Then the arithmetic average $\bar{x}_n$ is calculated as
\[ \bar{x}_n = \frac{1}{n} \left( x_1 f_1 + x_2 f_2 + \cdots + x_k f_k \right) = \frac{1}{n} \sum_{j=1}^{k} x_j f_j \]

Example
Suppose that the following data represent the ages of patients in a study: 17, 19, 19, 20, 21, 24, 26, 26, 26, and 27.
We find for the arithmetic average
\[ \bar{x}_{10} = \frac{17 + 19 + 19 + 20 + 21 + 24 + 26 + 26 + 26 + 27}{10} = \frac{225}{10} = 22.5 \]
Since some of the values occur more than once, we can also use the frequency distribution:

x_j   17   19   20   21   24   26   27
f_j    1    2    1    1    1    3    1

For the arithmetic average we find
\[ \bar{x}_{10} = \frac{1}{10} \bigl[ (17)(1) + (19)(2) + (20)(1) + (21)(1) + (24)(1) + (26)(3) + (27)(1) \bigr] = \frac{225}{10} = 22.5 \]

In-class Activity
We will explore the properties of the arithmetic mean when measurements are taken from a normal distribution. Open the first tab (Explore 1) on the accompanying spreadsheet. Column B has 100 random variates from a normal distribution with mean 3 and variance 1. Recall that the function “=NORMINV(probability,mean,standard_dev)” returns the inverse of the normal cumulative distribution for the specified mean and standard deviation. Column C calculates the cumulative sum and Column D has the corresponding arithmetic averages. The figure plots Column D against Column A. Use the F9 key to explore the arithmetic average. What do you observe?

Theory
In Explore 1, you observed that the arithmetic mean stabilizes around the mean of the normal distribution, regardless of the variance, as you increase the sample size. This is a consequence of the Law of Large Numbers.
While we do not yet have the background to understand its mathematical formulation completely, we state it here so that you can see how a mathematical result expressing this property is formulated. We will come back to this result later in the course when we have more background.

Law of Large Numbers
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed with $E|X_i| < \infty$, then as $n$ tends to infinity, $\bar{X}_n$ converges to $E X_1$ in probability.

Problems
1. A random variate is a particular outcome of a random variable. Assume that random variates are drawn repeatedly from a normal distribution with mean 4 and variance 9. If you calculated the arithmetic average for a large number of variates from this distribution, what would you expect the arithmetic average to be close to?
2. The Law of Large Numbers holds quite generally. Without going more deeply into the theory, can you guess the answer to the following problem? Suppose you repeatedly tossed a biased coin where heads occur with probability 0.2. What percentage of the time would you expect to see heads?

Based on our observations in Explore 1, we conclude that the mean of a normal distribution can be estimated by repeatedly sampling from the normal distribution and calculating the arithmetic average of the sample. This arithmetic average serves as an estimate for the mean of the normal distribution.
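The behavior seen in Explore 1 can also be reproduced outside the spreadsheet. The following is a minimal Python sketch, not part of the original module: the function name `running_averages`, the seed, and the sample size of 10,000 are illustrative assumptions. It draws normal variates one at a time and records the arithmetic average after each draw, mirroring Columns B, C, and D.

```python
import random


def running_averages(n, mean, sd, seed=None):
    """Return the arithmetic averages of the first 1, 2, ..., n normal variates.

    Hypothetical helper for illustration; it mirrors the spreadsheet layout:
    one variate per row, a cumulative sum, and the average so far.
    """
    rng = random.Random(seed)
    total = 0.0
    averages = []
    for i in range(1, n + 1):
        total += rng.gauss(mean, sd)  # new variate (Column B) added to the sum (Column C)
        averages.append(total / i)    # arithmetic average of the first i variates (Column D)
    return averages


# Mean 3 and standard deviation 1, as in Explore 1 (the seed is an arbitrary choice).
averages = running_averages(n=10_000, mean=3.0, sd=1.0, seed=1)
for i in (1, 10, 100, 1_000, 10_000):
    print(i, round(averages[i - 1], 3))
```

Changing the seed plays the role of pressing F9: the early averages move around from run to run, while the later ones settle near the mean of the distribution.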
Properties of the Arithmetic Average

Explore 2
When you compare the arithmetic averages of 100 random variates in Explore 1, you will realize that different runs of the simulation result in slightly different averages. Arithmetic averages are random variables, and we will explore their distribution as a function of the sample size. Again, we will use normally distributed random variables. A simulation is set up under the tab Explore 2 that simulates arithmetic averages of normally distributed random variables. We vary the sample sizes. Details are explained in the spreadsheet. Use the F9 key to explore the effect of the sample size on the arithmetic average. What do you observe?

Explore 3
The variation in the arithmetic mean comes from the fact that the random variates in each sample vary from run to run. The more the random variates vary, the more the arithmetic mean varies. The degree of variation is described by the standard deviation. To explore the effect of the variation, we simulate arithmetic means for two different scenarios in the spreadsheet under tab Explore 3: in the first scenario, we calculate arithmetic means for random variates that are normally distributed with mean 3 and standard deviation 1; in the second scenario, we calculate arithmetic means for random variates that are normally distributed with mean 3 and standard deviation 0.5. Details are explained in the spreadsheet. Use the F9 key to explore the effect of the standard deviation on the arithmetic average. What do you observe?

Problems (cont.)
3. Based on your observations in Explore 2 and 3, what is the effect on the arithmetic mean when you (a) increase the sample size and (b) reduce variation? What does this imply for experiments?
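For readers without the spreadsheet, the simulations in Explore 2 and Explore 3 can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the spreadsheet's actual setup: the function name `spread_of_averages`, the 2000 repetitions, the sample sizes 5, 20, and 80, and the seed are all chosen here for convenience. The sketch repeats the experiment many times and reports how much the resulting arithmetic averages vary from run to run.

```python
import random
import statistics


def spread_of_averages(sample_size, mean, sd, runs=2000, seed=None):
    """Simulate `runs` samples and return the standard deviation of their averages.

    Hypothetical helper for illustration: each run draws `sample_size` normal
    variates and records their arithmetic average.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(runs):
        sample = [rng.gauss(mean, sd) for _ in range(sample_size)]
        averages.append(statistics.fmean(sample))
    return statistics.stdev(averages)


# Two scenarios as in Explore 3 (standard deviation 1 and 0.5, mean 3),
# each with several sample sizes as in Explore 2.
for sd in (1.0, 0.5):
    for n in (5, 20, 80):
        spread = spread_of_averages(n, 3.0, sd, seed=1)
        print(f"sd={sd}, n={n}: spread of averages = {spread:.3f}")
```

Comparing the printed spreads across rows gives the same kind of evidence as pressing F9 repeatedly in the spreadsheet.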
The following result quantifies the effect on variance when we increase the sample size n. The larger the sample size, the smaller the variance of the arithmetic mean. That is, the larger the sample drawn from a normal distribution, the more accurately we can estimate the mean of the underlying normal distribution.

Theory
If X is normally distributed with mean $\mu$ and standard deviation $\sigma$, one can show that the arithmetic mean $\bar{X}_n$ is normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.

Estimating the Variance of a Normally Distributed Population
Suppose an experiment is repeated n times under identical conditions. Denote by $x_i$, $i = 1, 2, \ldots, n$, the outcome of each individual experiment. The sample variance $s_n^2$ is calculated as
\[ s_n^2 = \frac{(x_1 - \bar{x}_n)^2 + (x_2 - \bar{x}_n)^2 + \cdots + (x_n - \bar{x}_n)^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 \]
where $\bar{x}_n$ denotes the arithmetic average of the n outcomes $x_i$, $i = 1, 2, \ldots, n$. The sample standard deviation $s_n$ is the square root of the sample variance: $s_n = \sqrt{s_n^2}$.

The sample variance serves as an estimate for the variance of a normally distributed population. This implies that if we wish to estimate the variance of a normally distributed population, we take a sample and calculate the sample variance. As with estimating the mean, the larger the sample is, the better the estimate will be. We will learn later in the course why we divide by n-1 and not by n when we calculate the sample variance.
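To make the sample variance formula concrete, here is a small Python sketch. It is an illustration only: it reuses the patient ages from the earlier example on the arithmetic average (the module itself does not compute their variance), and the function name `sample_variance` is a hypothetical choice. The function follows the formula term by term, dividing by n-1, and the result is compared with `statistics.variance` from the Python standard library, which uses the same divisor.

```python
import math
import statistics

# Ages from the arithmetic-average example; reusing them here is an assumption
# made for illustration.
ages = [17, 19, 19, 20, 21, 24, 26, 26, 26, 27]


def sample_variance(values):
    """Sum of squared deviations from the arithmetic average, divided by n - 1."""
    n = len(values)
    average = sum(values) / n
    return sum((x - average) ** 2 for x in values) / (n - 1)


s2 = sample_variance(ages)   # sample variance
s = math.sqrt(s2)            # sample standard deviation

print("sample variance:", round(s2, 3))
print("sample standard deviation:", round(s, 3))
print("matches statistics.variance:", math.isclose(s2, statistics.variance(ages)))
```

In EXCEL, the corresponding calculations are done with “=VAR(…)” and “=SQRT(…)”, as used in the spreadsheet for this module.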
To gain some familiarity with the concept of estimation, we will simulate normally distributed variates and estimate the mean and the variance from the simulated data.

Explore 4
The spreadsheet under the tab Explore 4 is set up to simulate 20 random variates from a normal distribution with mean (Cell J3) and standard deviation (Cell J4). In Cell H10, we estimate the mean by calculating the arithmetic average (“=AVERAGE(number1, [number2], …)”). In Cell H11, we estimate the variance by calculating the sample variance (“=VAR(number1, [number2], …)”). In Cell H12, we calculate the sample standard deviation by taking the square root of the sample variance (“=SQRT(number)”).
(a) Use the F9 key to explore how the estimates for the mean and the variance change from run to run.
(b) Change the simulation so that it generates 40 random variates instead of 20. Calculate the arithmetic mean, the sample variance, and the sample standard deviation. How does increasing the sample size change your estimates?

Homework
(Reading assignments are from C. Neuhauser, Calculus for Biology and Medicine, 3rd edition, Prentice Hall.)
Read Section 12.7.1. Do Problems 1-8 and 11 in Section 12.7.