Estimating the Mean and Variance of a Normal Distribution

Learning Objectives
After completing this module, the student will be able to
explain the value of repeating experiments
explain the role of the law of large numbers in estimating population means
describe the effect of increasing the sample size or reducing measurement errors or other sources of variability
Knowledge and Skills
Properties of the arithmetic mean
Estimating the mean of a normal distribution
Law of Large Numbers
Estimating the variance of a normal distribution
Generating random variates in EXCEL
Prerequisites
1. Calculating sample mean and arithmetic average
2. Calculating sample variance and standard deviation
3. Normal distribution
Citation: Neuhauser, C. Estimating the Mean and Variance of a Normal Distribution.
Created: September 9, 2009 Revisions:
Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution
Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows
others to translate, make remixes, and produce new stories based on this work, provided the original author and source are
credited and the new work will carry the same license.
Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.
Pretest
1. Laura and Hamid are late for Chemistry lab. The lab manual asks them to determine the density of solid
platinum by repeating the measurement three times. To save time, they decide to measure
the density only once. Explain the consequences of this shortcut.
2. Tom and Bao Yu measured the density of solid platinum three times: 19.8, 21.4, and 21.9 g/cm³.
Determine the arithmetic average of these three measurements accurate to three decimal places.
3. The following graphs are densities of probability distributions. Which represent the density of a
normal distribution?
[Figure: three probability density plots, labeled (a), (b), and (c), each drawn over the horizontal range 0 to 6.]
4. Which two parameters are typically used to describe the normal distribution?
a. Median
b. Variance
c. Standard deviation
d. Mean
5. Suppose X is normally distributed with mean 3 and standard deviation 1, that is, $X \sim N(3,1)$. Use
EXCEL to (a) find $P(X \le 3)$, (b) find $P(1 \le X \le 4)$, and (c) determine $a$ so that $P(X \le a) = 0.74$.
Estimating the Mean of a Normally Distributed Population
Suppose an experiment is repeated $n$ times under identical conditions. Denote by $x_i$, $i = 1, 2, \ldots, n$, the
outcome of each individual experiment. The arithmetic average $\bar{x}_n$ is calculated as
$$\bar{x}_n = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
When outcomes are not all distinct, we can count the number of times each value occurs: Suppose again
that an experiment is repeated $n$ times under identical conditions. But now, we assume that there are
only $k$ distinct values $x_j$, $j = 1, 2, \ldots, k$, and that $x_j$ occurs $f_j$ times. Then the arithmetic average $\bar{x}_n$ is
calculated as
$$\bar{x}_n = \frac{1}{n}\left(x_1 f_1 + x_2 f_2 + \cdots + x_k f_k\right) = \frac{1}{n}\sum_{j=1}^{k} x_j f_j$$
Example
Suppose that the following data represent the ages of patients in a study: 17, 19, 19, 20, 21, 24, 26, 26,
26, and 27. We find for the arithmetic average
$$\bar{x}_{10} = \frac{17 + 19 + 19 + 20 + 21 + 24 + 26 + 26 + 26 + 27}{10} = \frac{225}{10} = 22.5$$
Since some of the values occur more than once, we can also use the frequency distribution:
$x_j$   17  19  20  21  24  26  27
$f_j$    1   2   1   1   1   3   1
For the arithmetic average we find
$$\bar{x}_{10} = \frac{1}{10}\left[(17)(1) + (19)(2) + (20)(1) + (21)(1) + (24)(1) + (26)(3) + (27)(1)\right] = \frac{225}{10} = 22.5$$
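The two ways of computing the arithmetic average in this example can be checked with a short script. This is a minimal sketch in Python rather than EXCEL; the variable names (ages, freq, mean_direct, mean_freq) are chosen only for this illustration.

```python
from collections import Counter

# Patient ages from the example above.
ages = [17, 19, 19, 20, 21, 24, 26, 26, 26, 27]
n = len(ages)

# Direct formula: add all n outcomes and divide by n.
mean_direct = sum(ages) / n

# Frequency formula: each distinct value x_j is weighted by its count f_j.
freq = Counter(ages)  # maps x_j -> f_j, e.g. 26 -> 3
mean_freq = sum(x * f for x, f in freq.items()) / n

print(mean_direct, mean_freq)  # both are 22.5
```

Both formulas add up the same 10 numbers; the frequency version only groups equal values before adding.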
In-class Activity
We will explore the properties of the arithmetic mean when measurements are taken from a normal
distribution. Open the first tab (Explore 1) on the accompanying spreadsheet. Column B has 100 random
variates from a normal distribution with mean 3 and variance 1. Recall that the function
“=NORMINV(probability,mean,standard_dev)” returns the inverse of the normal cumulative distribution
for the specified mean and standard deviation. Column C calculates the cumulative sum and Column D
has the corresponding arithmetic averages. The figure plots Column D against Column A.
Press the F9 key to recalculate the spreadsheet with fresh random variates and explore the arithmetic average. What do you observe?
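For readers without the spreadsheet, the same experiment can be approximated in a few lines of Python. This is a sketch under assumptions: it supposes the spreadsheet draws each variate by applying NORMINV to a uniform random number, and it uses Python's statistics.NormalDist.inv_cdf as the counterpart of NORMINV. The function name running_averages and the seed argument are introduced only for this illustration.

```python
import random
from statistics import NormalDist

def running_averages(n=100, mean=3.0, sd=1.0, seed=None):
    """Return the arithmetic averages of the first 1, 2, ..., n variates."""
    rng = random.Random(seed)
    dist = NormalDist(mean, sd)
    averages = []
    total = 0.0  # cumulative sum (Column C in the activity)
    for i in range(1, n + 1):
        # Inverse-CDF sampling: assumed analogue of NORMINV(RAND(), mean, sd).
        p = rng.random()
        while p == 0.0:  # inv_cdf requires 0 < p < 1
            p = rng.random()
        total += dist.inv_cdf(p)
        averages.append(total / i)  # arithmetic average (Column D)
    return averages

avgs = running_averages(seed=1)
print(avgs[0], avgs[9], avgs[99])  # averages after 1, 10, and 100 variates
```

Calling running_averages() without a seed plays the role of pressing F9: each call produces a new set of variates and a new sequence of averages.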
Theory
In Explore 1, you observed that the arithmetic mean stabilizes around the mean of the normal
distribution, regardless of the variance, as you increase the sample size. This is a consequence of the
Law of Large Numbers. While we do not yet have the background to completely understand its
mathematical formulation, we will give it here anyway so that you can see how a mathematical result
expressing this property is formulated. We will come back to this result later in the course when we
have more background.
Law of Large Numbers
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed with $E|X_i| < \infty$, then as $n$
tends to infinity, $\bar{X}_n$ converges to $E X_1$ in probability.
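Written out, convergence in probability means the following. This is the standard definition, added here as a supplement; the symbol $\varepsilon$ for the tolerance is introduced only for this illustration.

```latex
\text{For every } \varepsilon > 0:\quad
\lim_{n \to \infty} P\left( \left| \bar{X}_n - E X_1 \right| > \varepsilon \right) = 0,
\qquad \text{where } \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
```

In words: however small a tolerance is chosen, the chance that the arithmetic average misses the mean by more than that tolerance shrinks to zero as the sample grows.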
Problems
1. A random variate is a particular outcome of a random variable. Assume that random variates are
drawn repeatedly from a normal distribution with mean 4 and variance 9. If you calculated the
arithmetic average for a large number of variates from this distribution, what would you expect the
arithmetic average to be close to?
2. The Law of Large Numbers holds quite generally. Without going more deeply into the theory, can
you guess the answer to the following problem? Suppose you repeatedly tossed a biased coin where
heads occur with probability 0.2. What percentage of the time would you expect to see heads?
Based on our observations in Explore 1, we conclude that the mean of a normal distribution can be
estimated by repeatedly sampling from the normal distribution and calculating the arithmetic average of
the sample. This arithmetic average serves as an estimate for the mean of the normal distribution.
Properties of the Arithmetic Average
Explore 2
When you compare the arithmetic averages of 100 random variates in Explore 1, you will realize that
different runs of the simulation result in slightly different averages. Arithmetic averages are random
variables and we will explore their distribution as a function of the sample size. Again, we will use
normally distributed random variables.
A simulation is set up under the tab Explore 2 that simulates arithmetic averages of normally distributed
random variables. We vary the sample sizes. Details are explained in the spreadsheet. Use the F9 key to
explore the effect of the sample size on the arithmetic average. What do you observe?
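A rough Python counterpart of this exploration is sketched below. The sample sizes 5, 20, and 100, the number of runs, and the name simulate_averages are assumptions made for this illustration; the spreadsheet may use different values.

```python
import random
import statistics

def simulate_averages(sample_size, runs=1000, mean=3.0, sd=1.0, seed=None):
    """Simulate `runs` arithmetic averages, each from `sample_size` normal variates."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mean, sd) for _ in range(sample_size)])
        for _ in range(runs)
    ]

spreads = {}
for n in (5, 20, 100):  # hypothetical sample sizes
    avgs = simulate_averages(n, seed=0)
    # Center and spread of the simulated arithmetic averages.
    spreads[n] = statistics.stdev(avgs)
    print(n, round(statistics.fmean(avgs), 3), round(spreads[n], 3))
```

Comparing the printed spreads across the three sample sizes gives a numerical version of what the spreadsheet shows graphically.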
Explore 3
The variation in the arithmetic mean comes from the fact that the random variates in each sample vary
from run to run. The more the random variates vary, the more the arithmetic mean varies. The degree
of variation is described by the standard deviation. To explore the effect of the variation, we simulate
arithmetic means for two different scenarios in the spreadsheet under tab Explore 3: in one simulation,
we calculate arithmetic means for random variates that are normally distributed with mean 3 and
standard deviation 1; in the second scenario, we calculate arithmetic means for random variates that
are normally distributed with mean 3 and standard deviation 0.5. Details are explained in the
spreadsheet. Use the F9 key to explore the effect of the standard deviation on the arithmetic average.
What do you observe?
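The two scenarios can also be compared numerically. The sketch below is an illustration in Python, not the spreadsheet's own method; the sample size of 20, the 1000 runs, and the name spread_of_averages are assumptions made here.

```python
import random
import statistics

def spread_of_averages(sd, sample_size=20, runs=1000, mean=3.0, seed=None):
    """Standard deviation of `runs` simulated arithmetic means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mean, sd) for _ in range(sample_size)])
        for _ in range(runs)
    ]
    return statistics.stdev(means)

# Scenario 1: standard deviation 1; scenario 2: standard deviation 0.5.
spread_sd1 = spread_of_averages(sd=1.0, seed=0)
spread_sd05 = spread_of_averages(sd=0.5, seed=0)
print(spread_sd1, spread_sd05)
```

Using the same seed for both calls keeps the underlying random draws identical, so any difference in the two printed numbers comes from the standard deviation alone.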
Problems (cont.)
3. Based on your observations in Explore 2 and 3, what is the effect on the arithmetic mean when you
(a) increase the sample size and (b) reduce the variation? What does this imply for experiments?
The following result quantifies the effect of the sample size n on the variance of the arithmetic mean: the
larger the sample size, the smaller the variance. That is, the larger the
sample drawn from a normal distribution, the more accurately we can estimate the mean of the
underlying normal distribution.
Theory
If X is normally distributed with mean $\mu$ and standard deviation $\sigma$, one can show that the
arithmetic mean $\bar{X}_n$ is normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.
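The mean and standard deviation in this result can be traced to two standard properties of expectation and variance. The sketch below assumes that the $n$ observations $X_1, \ldots, X_n$ are independent, each with mean $\mu$ and variance $\sigma^2$; independence is an assumption made explicit here. It does not show that $\bar{X}_n$ is itself normal, only where its mean and standard deviation come from.

```latex
\begin{aligned}
E[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}, \\
\operatorname{SD}(\bar{X}_n) &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

For example, with $\sigma = 1$ and $n = 100$, the standard deviation of the arithmetic mean is $1/\sqrt{100} = 0.1$; quadrupling the sample size halves it.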
Estimating the Variance of a Normally Distributed Population
Suppose an experiment is repeated $n$ times under identical conditions. Denote by $x_i$, $i = 1, 2, \ldots, n$, the
outcome of each individual experiment. The sample variance $s_n^2$ is calculated as
$$s_n^2 = \frac{(x_1 - \bar{x}_n)^2 + (x_2 - \bar{x}_n)^2 + \cdots + (x_n - \bar{x}_n)^2}{n - 1} = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
where $\bar{x}_n$ denotes the arithmetic average of the $n$ outcomes $x_j$, $j = 1, 2, \ldots, n$. The sample standard
deviation $s_n$ is the square root of the sample variance: $s_n = \sqrt{s_n^2}$.
The sample variance serves as an estimate for the variance of a normally distributed population. This
implies that if we wish to estimate the variance of a normally distributed population, we take a sample
and calculate the sample variance. As with estimating the mean, the larger the sample is, the better the
estimate will be. We will learn later in the course why we divide by n-1 and not by n when we calculate
the sample variance.
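The sample variance formula can be applied by hand to a small data set. The sketch below, in Python, reuses the patient ages from the earlier example; the function name sample_variance is chosen only for this illustration. Python's built-in statistics.variance also divides by n - 1, so it serves as a cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the arithmetic average, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

ages = [17, 19, 19, 20, 21, 24, 26, 26, 26, 27]
s2 = sample_variance(ages)  # sample variance
s = math.sqrt(s2)           # sample standard deviation
print(s2, s)

# Cross-check against the standard library, which also divides by n - 1.
print(statistics.variance(ages), statistics.stdev(ages))
```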
To gain some familiarity with the concept of estimation, we will simulate normally distributed variates
and estimate the mean and the variance from the simulated data.
Explore 4
The spreadsheet under the tab Explore 4 is set up to simulate 20 random variates from a normal
distribution with mean $\mu$ (Cell J3) and standard deviation $\sigma$ (Cell J4). In Cell H10, we estimate the mean
by calculating the arithmetic average (“=AVERAGE(number 1, [number 2], …)”). In Cell H11, we estimate
the variance by calculating the sample variance (“=VAR(number 1, [number 2], …)”). In Cell H12, we
calculate the sample standard deviation by taking the square root of the sample variance
(“=SQRT(number)”).
(a) Use the F9 key to explore how the estimates for the mean and the variance change from run to run.
(b) Change the simulation so that it simulates 40 random variates instead of 20.
Calculate the arithmetic mean, the sample variance, and the sample standard deviation. How
does increasing the sample size change your estimates?
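The same estimation exercise can be sketched outside the spreadsheet. The Python code below is an illustration under assumptions: the values MU = 3 and SIGMA = 1 are hypothetical stand-ins for whatever is entered in Cells J3 and J4, and the function name estimate is introduced only here.

```python
import random
import statistics

def estimate(n, mu, sigma, seed=None):
    """Simulate n normal variates and return (mean, variance, sd) estimates."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean_hat = statistics.mean(sample)     # counterpart of =AVERAGE(...)
    var_hat = statistics.variance(sample)  # counterpart of =VAR(...), divides by n - 1
    sd_hat = statistics.stdev(sample)      # square root of the sample variance
    return mean_hat, var_hat, sd_hat

MU, SIGMA = 3.0, 1.0  # hypothetical values for Cells J3 and J4
for n in (20, 40):
    print(n, estimate(n, MU, SIGMA))
```

Running the loop several times corresponds to pressing F9 repeatedly: the estimates change from run to run, and the two sample sizes can be compared side by side.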
Homework (Reading assignments are from C. Neuhauser, Calculus for Biology and Medicine, 3rd
edition, Prentice Hall)
Read Section 12.7.1.
Do Problems 1-8 and 11 in Section 12.7.