Sampling Distributions

QBM117
Business Statistics
Statistical Inference
Sampling Distribution of the Sample Mean
Objectives
• To revise the differences between sample statistics
and population parameters.
• To introduce the sampling distribution of a sample
statistic.
• To understand the central limit theorem.
Populations and Samples
• A population is the entire collection of items about
which information is desired.
• A sample is a subset of the population that we collect
data from.
Parameters and Statistics
• A parameter is a number that describes a population.
- A parameter is a fixed number.
• A statistic is a number that describes a sample.
- A statistic is a random variable whose value
changes from sample to sample.
Statistical Inference
• Population parameters are almost always unknown.
• We take a random sample from the population of
interest and calculate the sample statistic.
• We then use the sample statistic as an estimate of
the population parameter.
• Statistical Inference involves drawing conclusions
about a population based on sample information.
Example 1
Electronics Associates Industry (EAI) is an
international company that manufactures a diverse
line of products. The firm’s Director of Personnel has
been assigned the task of developing a profile of the
company’s 2500 managers. One of the
characteristics to be identified is the mean annual
salary for the managers.
The population is the 2500 managers.
The population parameter is the mean annual salary
of the 2500 managers. It is unknown.
The Director of Personnel does not have the time or
the money required to develop a profile for all 2500
managers. He selects a simple random sample of 30
managers and finds that the mean annual salary for
the sample is $69616.48.
The sample is the 30 managers randomly selected.
The sample statistic is the mean annual salary of the
30 managers in the sample, $69616.48.
The Director of Personnel then uses the mean annual
salary of the sample of 30 managers to estimate the
mean annual salary of all 2500 managers.
The process of using the mean annual salary of the
sample of 30 managers as an estimate of the mean
annual salary of all 2500 managers is known as
statistical inference.
How do we know that the mean annual salary of the
sample of 30 managers is a good estimate of the
mean annual salary of all 2500 managers?
Suppose we select another simple random sample of
30 managers and find that this sample has a mean
annual salary of $71374.35.
The sample mean annual salary will vary from
sample to sample.
Sampling Distributions
• Sample statistics are random variables.
• The probability distribution of a sample statistic is
called its sampling distribution.
• We use the sampling distribution to make inferences
about the population parameters.
Sampling Distribution of the Sample Mean
• One of the most common statistical procedures
involves using a sample mean \(\bar{x}\) to make inferences
about an unknown population mean \(\mu\).
• We expect different samples to have different means.
• If we use random sampling, each possible sample of
size n has the same probability of being selected.
• If we were to take every possible sample of size n
from a population and calculate the mean for each
sample, we would be able to find the probability
distribution of the sample mean.
Example 1
To determine the sampling distribution of the sample
mean annual salary we would need to calculate the
sample mean for every possible sample of size 30.
There are \( {}^{2500}C_{30} \approx 2.745832 \times 10^{69} \) different
samples of 30 managers that can be taken from all
2500 managers.
This is too many sample means to calculate.
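As a quick check on the size of this count, the number of combinations can be evaluated directly. This is only a sketch using Python's standard library; the variable name `n_samples` is an illustrative choice, not something from the lecture.

```python
import math

# Number of distinct samples of 30 managers that can be chosen
# from a population of 2500 managers (order does not matter).
n_samples = math.comb(2500, 30)

# Python integers are exact; convert to float only for display
# in scientific notation.
print(f"{float(n_samples):.6e}")
```

The result has 70 digits, which is why enumerating every possible sample is not practical.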
We select 200 simple random samples of 30
managers and calculate the sample mean annual
salary for each of the 200 samples.
Sample    Sample Mean Annual Salary
1         $69616.48
2         $71374.35
3         $72034.22
...       ...
200       $72589.54
The histogram of the 200 sample mean annual
salaries will give an approximation of the sampling
distribution.
[Figure: Histogram of the Sample Mean Annual Salaries of 200 Simple Random Samples of Size 30. Horizontal axis: Sample Mean Annual Salary, $70000 to $74000 in steps of $500. Vertical axis: Frequency, 0 to 60.]
We can calculate the mean and the standard
deviation of the sample mean annual salaries for the
200 samples.
The mean of the 200 sample mean annual salaries is
$71842.13.
The standard deviation of the 200 sample mean
annual salaries is $680.01.
The sampling distribution of the mean annual salary
appears to be approximately normal with a mean of
$71842.13 and a standard deviation of $680.01.
If we were to take all \( 2.745832 \times 10^{69} \) possible
samples of 30 managers from all 2500 managers we
would be able to find the exact sampling distribution.
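The repeated-sampling procedure described in this example can be sketched in Python. The real salary records are not part of these notes, so the population below is synthetic: it is assumed, purely for illustration, to be 2500 values drawn from a normal distribution with mean $71800 and standard deviation $4000. The printed figures will therefore not reproduce $71842.13 and $680.01. The seed and the function name `sample_means` are also illustrative choices.

```python
import random
import statistics

random.seed(117)  # arbitrary seed so the run is repeatable

# ASSUMPTION: synthetic salaries stand in for the real EAI records.
population = [random.gauss(71800, 4000) for _ in range(2500)]

def sample_means(population, sample_size, num_samples):
    """Return the mean of each of num_samples simple random samples."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# 200 simple random samples of 30 managers, as in the example.
means = sample_means(population, sample_size=30, num_samples=200)

print(f"mean of the sample means: {statistics.fmean(means):.2f}")
print(f"sd of the sample means:   {statistics.stdev(means):.2f}")
```

A histogram of `means` would play the same role as the histogram of the 200 sample mean annual salaries: an approximation of the sampling distribution, not the exact distribution.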
Sampling Distribution of the Sample Mean
• In practice we only take a single sample from a
population and hence use a single sample mean \(\bar{x}\)
to make inferences about the population parameter \(\mu\).
• So how do we find the probability distribution of the
sample mean without finding the means of all possible
samples?
• We use some general results.
Mean and Standard Deviation of the
Sampling Distribution of the Sample Mean
• If \(\bar{x}\) is the mean of a random sample of size \(n\) from a
population with mean \(\mu\) and standard deviation \(\sigma\),
then the mean and standard deviation of the
sampling distribution of the sample mean are given
by
\[ \mu_{\bar{x}} = \mu \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
• Note that \(\sigma_{\bar{x}}\) is called the standard error of the
mean.
Example 1
Suppose that information has been obtained from all
2500 managers.
The population mean annual salary is \(\mu = \$71800\).
The population standard deviation is \(\sigma = \$4000\).
A sample of 30 managers is to be taken. What is the
mean and the standard deviation of the sampling
distribution of the sample mean annual salary?
The mean of the sampling distribution of the sample
mean annual salary is
\[ \mu_{\bar{x}} = \mu = \$71800 \]
The standard deviation of the sampling distribution of
the sample mean annual salary is
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4000}{\sqrt{30}} = \$730.30 \]
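The standard error calculation for the salary example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `standard_error` and the extra sample sizes 120 and 480 are illustrative and do not come from the lecture.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

# Salary example: sigma = $4000, n = 30.
print(f"{standard_error(4000, 30):.2f}")

# Illustrative only: each fourfold increase in n halves the standard error.
for n in (30, 120, 480):
    print(n, round(standard_error(4000, n), 2))
```

Because \(n\) appears under a square root, a larger sample reduces the spread of the sample mean, but with diminishing returns.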
Shape of the Sampling Distribution of the
Sample Mean
• We have described the centre and the spread of the
sampling distribution of the sample mean, but what
about the shape?
• The shape depends on the shape of the population
distribution.
• If the population is normally distributed, then the
sampling distribution of the sample mean is also
normally distributed.
Example 1
Suppose that the annual salary of all 2500 managers
is normally distributed with a mean of $71800 and a
standard deviation of $4000.
A sample of 30 managers is to be taken. What is the
sampling distribution of the mean annual salary of the
sample?
Let \(X\) = the annual salary
\[ X \sim N(71800, 4000^2) \]
Let \(\bar{X}\) = the mean annual salary of a sample of size 30
We have already determined that the mean of the
distribution of \(\bar{X}\) is $71800 and the standard deviation
is $730.30.
Since the population from which the sample is being
drawn is normally distributed, the sampling
distribution of the sample mean will be normally
distributed.
\[ \bar{X} \sim N(71800, 730.30^2) \]
Shape of the Sampling Distribution of the
Sample Mean
• What happens when the population distribution is not
normal?
• It turns out that as the sample size increases, the
distribution of \(\bar{X}\) gets closer to a normal distribution,
no matter what shape the population distribution
has.
Central Limit Theorem
If a random sample is drawn from any population, the
sampling distribution of the sample mean is
approximately normal for a sufficiently large sample
size.
The larger the sample size, the more closely the
sampling distribution of \(\bar{X}\) will resemble a normal
distribution.
Large Sample Size
• How large does a sample need to be to be
considered sufficiently large?
• Generally a sample size of \(n \ge 30\) is large enough to
ensure that the sampling distribution of \(\bar{X}\) is
approximately normal.
• However, if a population is extremely non-normal, the
sampling distribution will also be non-normal, even
for moderately large values of \(n\).
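The effect of sample size on the shape of the sampling distribution can be explored with a small simulation. The sketch below makes several assumptions that are not in the lecture: the population is exponential (chosen only because it is strongly right-skewed), skewness is used as a rough measure of departure from normality, and the sample sizes and number of repetitions are arbitrary.

```python
import random
import statistics

random.seed(117)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Skewness coefficient: close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulated_sample_means(n, num_samples=5000):
    """Means of num_samples random samples of size n.

    ASSUMPTION: the population is exponential with mean 1.
    """
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

# Skewness of the simulated sampling distribution for each sample size.
skew_by_n = {n: skewness(simulated_sample_means(n)) for n in (1, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink towards 0 as \(n\) grows, but for a population this skewed it is still noticeably positive at \(n = 30\), which illustrates the caution in the second bullet.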
Sampling Distribution of the Sample Mean
\[ \mu_{\bar{x}} = \mu \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
If \(X\) is normally distributed, then \(\bar{X}\) is normally
distributed.
If \(X\) is non-normal, then \(\bar{X}\) is approximately normally
distributed for sufficiently large sample sizes.
Using the Sampling Distribution for Inference
• Recall from Topic 2 that if \(X\) is normally distributed
with mean \(\mu\) and standard deviation \(\sigma\) then
\[ Z = \frac{X - \mu}{\sigma} \]
has a standard normal distribution with mean 0 and
standard deviation 1.
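• It follows that if \(\bar{X}\) is normally distributed with mean
\(\mu_{\bar{x}} = \mu\) and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) then
\[ Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
has a standard normal distribution with mean 0 and
standard deviation 1.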
• Hence we can use the standard normal tables to
make inferences about sample means.
Example 2
A federal inspector for weights and measures visits a
packaging plant to check that the net weight of
packages is as indicated on the packages. The
manager assures the inspector that the packaging
process results in a mean weight of 750g with a
standard deviation of 14g. The inspector selects 100
packages at random and finds their mean weight to
be 748.5g.
If the manager's claim is correct, how likely is a
sample mean of 748.5g or less?
Let \(X\) = weight of a package
The manager claims that \(X \sim N(750, 14^2)\)
The inspector has taken a sample of size 100.
Let \(\bar{X}\) = mean weight of a sample of 100 packages
Using the Central Limit Theorem we know that \(\bar{X}\)
will be approximately normally distributed with a
mean and standard deviation of
\[ \mu_{\bar{x}} = \mu = 750\text{g} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{14}{\sqrt{100}} = 1.4\text{g} \]
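If the manager's claim is correct, how likely is a
sample mean of 748.5g or less?
We want to find \(P(\bar{X} \le 748.5)\).
\[
\begin{aligned}
P(\bar{X} \le 748.5) &= P\left( \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \le \frac{748.5 - 750}{14/\sqrt{100}} \right) \\
&= P(Z \le -1.07) \\
&= 0.5 - 0.3577 \\
&= 0.1423
\end{aligned}
\]
[Figure: distribution of \(\bar{X}\) with 748.5 and 750 marked, and the standard normal curve with \(-1.07\) and 0 marked.]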
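The table lookup for Example 2 can be checked with `statistics.NormalDist` from Python's standard library. This is a sketch under the same assumption as the worked solution (the manager's claim is taken as true); the variable names are illustrative. Because the code does not round \(z\) to two decimal places, its answer may differ slightly from the table value of 0.1423.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 750, 14, 100   # manager's claim and the sample size
x_bar = 748.5                 # sample mean found by the inspector

# Sampling distribution of the sample mean: N(mu, (sigma / sqrt(n))^2).
sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

z = (x_bar - mu) / sampling_dist.stdev   # standardised sample mean
p = sampling_dist.cdf(x_bar)             # P(sample mean <= 748.5)

print(f"z = {z:.4f}")
print(f"P(sample mean <= 748.5) = {p:.4f}")
```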
Example 3
The weight of a ’32g’ chocolate bar is normally
distributed with a mean of 32.2g and a standard
deviation of 0.3g.
a. If a customer buys one chocolate bar, what is the
probability that the bar will weigh less than 32g?
b. If a customer buys a pack of 4 bars, what is the
probability that the mean weight of the 4 bars will
be less than 32g?
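a. \(X\) = weight of a chocolate bar
\[ X \sim N(32.2, 0.3^2) \]
If a customer buys one chocolate bar, the probability
that the bar will weigh less than 32g is
\[
\begin{aligned}
P(X < 32) &= P\left( \frac{X - \mu}{\sigma} < \frac{32 - 32.2}{0.3} \right) \\
&= P(Z < -0.67) \\
&= 0.5 - 0.2486 \\
&= 0.2514
\end{aligned}
\]
[Figure: distribution of \(X\) with 32 and 32.2 marked, and the standard normal curve with \(-0.67\) and 0 marked.]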
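b. \(\bar{X}\) = mean weight of a sample of 4 chocolate bars
We know that
\[ X \sim N(32.2, 0.3^2) \]
therefore
\[ \bar{X} \sim N\left( 32.2, \left( \frac{0.3}{\sqrt{4}} \right)^2 \right) \]
We want to find the probability that if a customer
buys a pack of 4 chocolate bars, the mean weight of
the 4 bars will be less than 32g.
Hence we want to find \(P(\bar{X} < 32)\).
\[
\begin{aligned}
P(\bar{X} < 32) &= P\left( \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{32 - 32.2}{0.3/\sqrt{4}} \right) \\
&= P(Z < -1.33) \\
&= 0.5 - 0.4082 \\
&= 0.0918
\end{aligned}
\]
[Figure: distribution of \(\bar{X}\) with 32 and 32.2 marked, and the standard normal curve with \(-1.33\) and 0 marked.]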
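Compare the distribution of \(X\) to the distribution of \(\bar{X}\).
[Figure: the distributions of \(X\) and \(\bar{X}\), each with 32 and 32.2 marked on the horizontal axis.]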
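The comparison between a single bar and the mean of several bars can also be made numerically. The sketch below uses the Example 3 values for the mean and standard deviation; the helper name `prob_mean_below` and the pack size of 16 are hypothetical additions for illustration. Results computed this way may differ slightly from the table-based answers because \(z\) is not rounded to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3   # weight of one bar in grams, from Example 3

def prob_mean_below(limit, n):
    """P(mean weight of n bars < limit) when one bar is N(mu, sigma^2)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(limit)

# n = 1 is part a, n = 4 is part b; n = 16 is a hypothetical larger pack.
for n in (1, 4, 16):
    print(f"n = {n:2d}: P(mean < 32g) = {prob_mean_below(32, n):.4f}")
```

The probability falls as \(n\) increases because the distribution of \(\bar{X}\) has the same centre as the distribution of \(X\) but a smaller spread.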
Reading for next lecture
• Chapter 8, Sections 8.1-8.3
Exercises
• 7.9
• 7.21
• 7.22
• 7.23