STAT2800: Lecture 11
CHAPTER 4: COMMONLY USED DISTRIBUTIONS
The Central Limit Theorem (Section 4.11, page 289)
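The Central Limit Theorem is by far the most important result in
statistics. Many commonly used statistical methods rely on this
theorem for their validity. Roughly, the Central Limit Theorem says that
the mean of a large random sample is approximately normally distributed,
whatever the shape of the population it came from. To see why this
matters, consider the problem of estimating a population mean.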
Knowing the appropriate model for a population is like knowing the
entire population itself. In a real-life situation we would not know
which model is appropriate for the population of interest, and we
certainly would not know the actual value of its mean μ. The problem
then is to ESTIMATE μ in some way.
One way to do this is to take a simple random sample from the
population and use it to calculate the sample mean x̄. If we proceeded
in this way, we would soon notice that each time we took a new sample
from the population and calculated x̄, we would get a different
estimate of μ. Thus it is important to understand a bit more about the
behavior of x̄.
For example, if we roll a fair die and let X be the number on the
uppermost face, we know that the population mean (expected value) is
μ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
But if we take a sample of, say, four throws, the mean may be far
from 3.5. Here are the results of 5 such samples of 4 throws (a
random number generator was used to obtain these samples):
Sample   X1   X2   X3   X4   x̄
1        6    2    5    6    4.75
2        2    3    1    6    3
3        1    1    4    6    3
4        6    2    2    1    2.75
5        2    6    2    4    3.5
Since each sample consists of 4 throws, we say that the sample size
is n = 4. Notice that only one of the five samples has a mean equal to
the population mean of 3.5, and that the mean of the first sample is
far from it.
The table does reveal something interesting, however. Look at the
values of the sample mean x̄. The average (mean) of these five means is
$$\frac{4.75 + 3 + 3 + 2.75 + 3.5}{5} = \frac{17}{5} = 3.4.$$
Thus, although the mean of a particular sample may not be a good
predictor of the population mean, the average of many sample means
tends to be much closer to it.
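As a quick check of the arithmetic, the short Python sketch below recomputes each sample mean from the table and then averages the five sample means. The variable names are illustrative only.

```python
# The five samples of n = 4 die throws from the table
samples = [
    [6, 2, 5, 6],
    [2, 3, 1, 6],
    [1, 1, 4, 6],
    [6, 2, 2, 1],
    [2, 6, 2, 4],
]

# x-bar for each sample: sum of the throws divided by n
sample_means = [sum(s) / len(s) for s in samples]

# Average of the five sample means
grand_mean = sum(sample_means) / len(sample_means)

print(sample_means)
print(grand_mean)  # prints 3.4
```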
In other words, the values of x̄ are values of a random variable (take a
sample of size 4 and compute its mean), and its probability
distribution is called the sampling distribution of the sample mean.
The table suggests that the expected value of the sampling distribution
of the mean equals the population mean, and this turns out to be true
in general. (Five samples cannot prove it, but 3.4 being close to 3.5
is consistent with it.)
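To see the same pattern with far more than five samples, here is a small simulation sketch. It assumes Python's `random` module as the "random number generator"; the seed and the number of repetitions (100,000) are arbitrary choices for illustration. The average of many simulated sample means should settle very close to μ = 3.5.

```python
import random

random.seed(2800)  # fixed seed so the run is reproducible (value is arbitrary)

def sample_mean_of_die_throws(n):
    """Throw a fair die n times and return the sample mean x-bar."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

n = 4                  # sample size, as in the table
num_samples = 100_000  # many repetitions instead of just five

means = [sample_mean_of_die_throws(n) for _ in range(num_samples)]
average_of_means = sum(means) / num_samples

print(f"average of {num_samples} sample means: {average_of_means:.3f}")
```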
Mean, Variance and Standard Deviation of the Sample Mean X̄
Given: 1. A population with mean μ and standard deviation σ.
2. A random sample of size n from the population:
X1, X2, …, Xn with sample mean X̄ = (X1 + X2 + … + Xn)/n.
Then:
$$\mu_{\bar{X}} = \mu$$
[the mean of the distribution of x̄ is equal to the population mean]
$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$
[the variance of the distribution of x̄ is equal to (the population variance)/n]
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
[the standard deviation of the distribution of x̄ is equal to (the population standard deviation)/√n]
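For the fair die with n = 4, these formulas can be evaluated exactly. The sketch below uses Python's `fractions.Fraction` so the population mean and variance come out as exact fractions; the variable names are illustrative.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mu = sum(p * x for x in faces)                  # population mean
sigma2 = sum(p * (x - mu) ** 2 for x in faces)  # population variance

n = 4
mu_xbar = mu                    # mean of x-bar equals mu
var_xbar = sigma2 / n           # variance of x-bar equals sigma^2 / n
sd_xbar = math.sqrt(var_xbar)   # standard deviation of x-bar, sigma / sqrt(n)

print(mu, sigma2, var_xbar, round(sd_xbar, 4))  # prints: 7/2 35/12 35/48 0.8539
```

So a single sample mean of 4 throws typically lies within about 0.85 of 3.5, which is in line with the spread seen in the table.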
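THE CENTRAL LIMIT THEOREM
Given a large random sample of size n from a population with mean
μ and standard deviation σ, the sample mean X̄ is approximately
normally distributed with mean and standard deviation given by
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},$$
so that
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is approximately standard normal.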
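In practice the theorem is used by standardizing x̄ and reading probabilities off the standard normal curve. A minimal sketch of that recipe, assuming Python's `statistics.NormalDist` for the normal CDF; the function name `prob_sample_mean_between` is hypothetical and the die example (n = 36) is only an illustration:

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """CLT approximation to P(a < X-bar < b) for a random sample of size n
    from a population with mean mu and standard deviation sigma."""
    sd_xbar = sigma / sqrt(n)   # sigma_xbar = sigma / sqrt(n)
    z_a = (a - mu) / sd_xbar    # standardize each endpoint
    z_b = (b - mu) / sd_xbar
    Z = NormalDist()            # standard normal: mean 0, sd 1
    return Z.cdf(z_b) - Z.cdf(z_a)

# Illustration (hypothetical numbers): mean of n = 36 fair-die throws
p = prob_sample_mean_between(3, 4, mu=3.5, sigma=sqrt(35 / 12), n=36)
print(f"P(3 < x-bar < 4) is approximately {p:.3f}")
```

The result should come out near 0.92: with 36 throws, the sample mean lands within half a point of 3.5 most of the time.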
Notes:
1. For the Central Limit Theorem to apply, the sample size must be
large; n > 30 is usually large enough.
2. If the population from which we sample is normal, then x̄ is
exactly normally distributed, with the mean and standard deviation
above, for any sample size.
Example: Let X denote the number of flaws in a 1 inch length of
copper wire. The probability mass function of X is presented in the
following table:
x         0     1     2     3
P(X=x)    0.48  0.39  0.12  0.01
One hundred wires are sampled from this population. What is the
probability that the average number of flaws per wire in this sample is
less than 0.5?
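One way to work this example is sketched below: compute μ and σ² from the probability mass function, then apply the CLT with n = 100. The sketch uses Python's `statistics.NormalDist` and, as an assumption, applies no continuity correction (the normal approximation is used for "less than 0.5" directly).

```python
from math import sqrt
from statistics import NormalDist

# Probability mass function of X = number of flaws in a 1-inch length
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}

mu = sum(x * p for x, p in pmf.items())                   # population mean
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())   # population variance
sigma = sqrt(sigma2)

n = 100
sd_xbar = sigma / sqrt(n)   # standard deviation of x-bar

# CLT: x-bar is approximately normal with mean mu and sd sd_xbar
z = (0.5 - mu) / sd_xbar
prob = NormalDist().cdf(z)

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.4f}, z = {z:.2f}, P = {prob:.4f}")
```

With μ = 0.66 and σ² = 0.5244, this gives z ≈ −2.21 and a probability of roughly 0.0136.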
Example: The scores on a standardized reading test are known to
be normally distributed with mean μ = 10 and standard deviation
σ = 1.5.
(a) Find the probability that a single individual selected at random
scores between 9.4 and 10.6 on this test.
(b) Find the probability that a random sample of 25 students has a
mean score between 9.4 and 10.6 on this test.
(a) Let X be the score of the randomly selected individual. Then X
is normal with μ = 10 and σ = 1.5.
(b) Since the population the sample came from is normal, x̄ is
exactly normal for any n (hence for n = 25), with μ_x̄ = 10 and
σ_x̄ = σ/√n = 1.5/√25 = 0.3.
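Both parts reduce to a difference of two normal CDF values. A minimal sketch using Python's `statistics.NormalDist`; only the spread changes between (a) and (b):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 10, 1.5   # population mean and SD of the test scores

# (a) a single score X is N(10, 1.5)
X = NormalDist(mu, sigma)
p_a = X.cdf(10.6) - X.cdf(9.4)

# (b) the mean of n = 25 scores is N(10, 1.5 / sqrt(25)) = N(10, 0.3)
n = 25
Xbar = NormalDist(mu, sigma / sqrt(n))
p_b = Xbar.cdf(10.6) - Xbar.cdf(9.4)

print(f"(a) {p_a:.4f}   (b) {p_b:.4f}")
```

Part (a) corresponds to z = ±0.4 (probability about 0.31), while part (b) corresponds to z = ±2 (probability about 0.95): the same interval is far more likely to contain a sample mean than a single score.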
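Recall: the Continuity Correction
Since the normal distribution is continuous, including or excluding an
endpoint makes no difference: P(a ≤ X̄ ≤ b) = P(a < X̄ < b). When
the quantity being approximated is discrete (integer-valued) and we
want to guarantee that certain endpoints are included, we extend each
included endpoint outward by half a unit before standardizing.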
Example: Bags of concrete mix labeled as containing 100 lb have a
population mean weight of 100 lb and a population standard deviation
of 30 lb. What is the probability that the mean weight of a random
sample of 50 bags is between 98 and 99 lb (inclusive)?
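A sketch of the calculation with the values as stated (μ = 100, σ = 30, n = 50), assuming Python's `statistics.NormalDist`. Weight is a continuous measurement, so "inclusive" does not change the answer and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 30, 50              # values as given in the example
Xbar = NormalDist(mu, sigma / sqrt(n))  # CLT: x-bar approx N(100, 30 / sqrt(50))

# Continuous variable: including the endpoints 98 and 99 changes nothing
prob = Xbar.cdf(99) - Xbar.cdf(98)
print(f"P(98 <= x-bar <= 99) = {prob:.4f}")
```

Here σ_x̄ = 30/√50 ≈ 4.24, so the endpoints standardize to z ≈ −0.47 and z ≈ −0.24, giving a probability of roughly 0.088.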
SEE ANNOUNCEMENTS FOR
DATE/TIME/LOCATION FOR OUR MIDTERM
Midterm Covers:
Chapter 1 (all of it)
Chapter 2 (2.1, 2.2, 2.3, 2.4)
Chapter 4 (4.2, 4.3, 4.5, 4.6, 4.7, 4.8)
Approximately 6 questions, a mix of long-answer and short-answer.
The formula sheet is posted under “Midterm and Exam Resources”.
See announcements for more details.
CHAPTER 5: CONFIDENCE INTERVALS
In Chapter 4 we discussed estimates for various parameters: for
example, x̄ as an estimate of a population mean μ. These estimates
are called point estimates (the rules that produce them are called
estimators), because they are single numbers, or points. An important
thing to remember about point estimates is that they are almost never
exactly equal to the true values they are estimating. They are almost
always off, sometimes by a little, sometimes by a lot.
Definition
A point estimate of some parameter θ is a single number,
calculated from sample data, that can be regarded as an educated
guess for the value of θ.
One desirable property of a good estimator is that it be unbiased,
meaning that its expected value equals the parameter it estimates (for
example, μ_x̄ = μ, so x̄ is unbiased for μ). For a point estimate to be
useful, it is also necessary to describe how far from the true value it
is likely to be. One way to do this is to report an estimate of its
standard deviation (its standard error), which measures the
uncertainty in the point estimate. Keep in mind that when we use the
sample mean as a point estimate of μ, it may fall some distance to the
left or to the right of μ. This is where a confidence interval comes
into play.
Confidence Intervals for a Population Mean, Variance Known (Section 5.1, page 324)
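From the Central Limit Theorem we know that
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has an approximately standard normal distribution (exactly standard
normal if the population is normal).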
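Therefore, since
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) \approx 0.95,$$
let's try solving for μ. Multiplying each part of the inequality by
σ/√n gives
$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95,$$
and subtracting X̄ from each part, then multiplying by −1 (which
reverses the inequalities), gives
$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95.$$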
With probability approximately 0.95, the selected sample will be
such that μ is captured between these two interval limits.
Substituting the sample size n and the observed x̄ from a particular
sample, together with the known σ, into these expressions gives a
confidence interval for μ with a confidence level of approximately 95%.
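Definition
A confidence interval (with known variance) for μ with a
confidence level of (approximately) 95% has
$$\text{Upper confidence limit} = \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$$
$$\text{Lower confidence limit} = \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}}$$
The interval is centered at x̄ and extends the same distance,
1.96σ/√n, to each side, so it can be written in abbreviated form as
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}.$$
This formula is valid whatever the shape of the population
distribution, provided n is large enough for the Central Limit
Theorem to apply; if the population is normal, it is valid for any n.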
Example: The following summary statistics are given on breakdown
voltage of a particular circuit under certain conditions:
$$n = 48, \qquad \sum_{i=1}^{n} x_i = 2626, \qquad \sigma = 5.23.$$
Find the 95% confidence interval for μ.
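A sketch of the computation, treating 5.23 as the known population standard deviation σ (as the variance-known setting of this section assumes). The sample mean comes from the given sum, x̄ = 2626/48.

```python
from math import sqrt

n = 48
sum_x = 2626
sigma = 5.23   # treated as the known population standard deviation

xbar = sum_x / n                  # sample mean
margin = 1.96 * sigma / sqrt(n)   # half-width: 1.96 * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"x-bar = {xbar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

This gives x̄ ≈ 54.71 with a half-width of about 1.48, so the interval is roughly (53.23, 56.19).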
The correct interpretation is: if we take sample after sample from the
population and use each one separately to compute a 95% confidence
interval, then in the long run roughly 95% of these intervals will
capture μ.
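The long-run interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population (μ = 50, σ = 10, chosen only for illustration), draws 2000 samples of size 40 with Python's `random.gauss`, builds x̄ ± 1.96σ/√n from each, and counts how many intervals contain μ.

```python
import random
from math import sqrt

random.seed(11)  # arbitrary seed for reproducibility

# Hypothetical population: normal with these parameters (illustration only)
mu, sigma, n = 50.0, 10.0, 40
num_intervals = 2000

captured = 0
for _ in range(num_intervals):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    margin = 1.96 * sigma / sqrt(n)
    if xbar - margin < mu < xbar + margin:  # did this interval capture mu?
        captured += 1

coverage = captured / num_intervals
print(f"{captured} of {num_intervals} intervals captured mu ({coverage:.1%})")
```

Any single interval either contains μ or it does not; the 95% describes the proportion of intervals that do, over many repetitions.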
Summary
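Let X1, ..., Xn be a large (n > 30) random sample from a
population with mean μ and standard deviation σ, so that X̄ is
approximately normal. Then a level 100(1 − α)% two-sided
confidence interval for μ is
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{X}}, \qquad \text{where } \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},$$
and z_{α/2} is the value that cuts off an area of α/2 in the right tail
of the standard normal curve (z_{0.025} = 1.96 for 95% confidence).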
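The general interval only changes the multiplier z_{α/2}. A minimal sketch, assuming Python's `statistics.NormalDist.inv_cdf` for the standard normal quantile; the function names `z_critical` and `z_interval` are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the point with area alpha/2 to its right under N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The printed multipliers should be the familiar 1.645, 1.960 and 2.576: a higher confidence level gives a wider interval for the same data.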
Other Confidence Levels
Note: Other commonly used confidence intervals for a population or
process mean are given below: