Sampling Distributions

Ch 5.5 + Ch 5.6 Sampling Distribution
Topics:
I. What is a Sampling Distribution?
II. Sampling Distribution of a Sample Mean $\bar{X}$
(a) X ~ Normal Distribution
(b) X ~ Non-normal Distribution
III. Central Limit Theorem
IV. Sampling Distribution of the Sample Proportion p
---------------------------------------------------------------------------------------------------------------------------
I. Sampling Distribution

Population vs. Sample:
• Population (or process) = the object of interest, about which we would like to make inferences.
• Because resources and time are limited, it is usually impossible to know every aspect of the population.
• Instead, we obtain a (random) sample from the population.
• We use the information from the sample to make inferences about the population.
  ◦ For example, we naturally use the sample mean $\bar{X}$ to estimate the population mean ($\mu_X$).
  ◦ Since $\bar{X}$ is obtained from a sample, we are not guaranteed to get the same value of $\bar{X}$ if we conduct the same experiment (to obtain the data) again.
  ◦ So $\bar{X}$ can be viewed as a random variable, and thus it has a distribution. This distribution is called the sampling distribution of $\bar{X}$.

A Statistic is a numerical quantity that is calculated from the sample (for example, the sample mean $\bar{X}$ is a statistic).

A Parameter is a fixed population characteristic, such as the success probability in a Binomial distribution.

The observed value of a statistic depends on the particular sample; hence it varies from sample to sample. Such variability is called sampling variability.

The probability distribution of a statistic is called its sampling distribution.

Why do we care about the sampling distribution? The sampling distribution of a statistic
tells us what values a statistic is likely to take; we can use the sampling distribution to
make inference about the population parameter.
Ex1. A neighborhood has 5 houses A, B, C, D and E. They respectively have 3, 2, 5, 3, and 4
bedrooms. We randomly draw 3 houses at a time and calculate the sample statistics
median and mean of bedrooms. What is the sampling distribution of the sample median?
What is the sampling distribution of the sample mean?
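| Houses drawn in the sample | # of bedrooms | Sample median | Sample mean | Probability |
|----------------------------|---------------|---------------|-------------|-------------|
| ABC                        | 3, 2, 5       | 3             | 10/3 = 3.3  | 0.1         |
| ABD                        | 3, 2, 3       | 3             | 8/3 = 2.7   | 0.1         |
| ABE                        | 3, 2, 4       | 3             | 9/3 = 3     | 0.1         |
| ACD                        | 3, 5, 3       | 3             | 11/3 = 3.7  | 0.1         |
| ACE                        | 3, 5, 4       | 4             | 12/3 = 4    | 0.1         |
| ADE                        | 3, 3, 4       | 3             | 10/3 = 3.3  | 0.1         |
| BCD                        | 2, 5, 3       | 3             | 10/3 = 3.3  | 0.1         |
| BCE                        | 2, 5, 4       | 4             | 11/3 = 3.7  | 0.1         |
| BDE                        | 2, 3, 4       | 3             | 9/3 = 3     | 0.1         |
| CDE                        | 5, 3, 4       | 4             | 12/3 = 4    | 0.1         |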
The sample median takes 2 values, 3 and 4, with probabilities 0.7 and 0.3. So the probability mass function of the sample median is

| Sample median | 3   | 4   |
|---------------|-----|-----|
| Probability   | 0.7 | 0.3 |
Similarly, the probability mass function of the sample mean is

| Sample mean | 2.7 | 3   | 3.3 | 3.7 | 4   |
|-------------|-----|-----|-----|-----|-----|
| Probability | 0.1 | 0.2 | 0.3 | 0.2 | 0.2 |
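The enumeration in Ex1 can be checked mechanically. The following is a minimal sketch, assuming (as in the example) that all 10 samples of 3 houses are equally likely; the function name `sampling_distribution` is only an illustrative choice, not something from the course material.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

# Bedrooms per house, taken from Ex1.
bedrooms = {"A": 3, "B": 2, "C": 5, "D": 3, "E": 4}

def sampling_distribution(statistic, population, size):
    """Return {value: probability} over all equally likely samples of `size`."""
    samples = list(combinations(population.values(), size))
    counts = Counter(statistic(sample) for sample in samples)
    return {value: Fraction(count, len(samples))
            for value, count in sorted(counts.items())}

def mean(sample):
    # Exact fractions, so that 10/3 from different samples counts as one value.
    return Fraction(sum(sample), len(sample))

median_pmf = sampling_distribution(median, bedrooms, 3)
mean_pmf = sampling_distribution(mean, bedrooms, 3)

for value, prob in median_pmf.items():
    print(f"median = {value}: probability {float(prob):.1f}")
for value, prob in mean_pmf.items():
    print(f"mean = {float(value):.2f}: probability {float(prob):.1f}")
```

Using exact fractions rather than rounded decimals avoids treating 3.3 from two different samples as two distinct values.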
II. Sampling Distribution of a Sample Mean $\bar{X}$
Let $\bar{X}$ be the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and SD $\sigma$. That is,
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$
We want to know the sampling distribution of $\bar{X}$.
• If X ~ Normal (mean = $\mu$, SD = $\sigma$), then $\bar{X}$, the mean of a random sample of n observations, follows a Normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
• $\sigma_{\bar{X}}$ is also called the standard error (SE) of $\bar{X}$, or the standard error of the mean.
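Ex 2. Thousands of boxes contain nuts. The weights are normally distributed with mean $\mu$ = 1 lb and SD $\sigma$ = 0.01 lb. We inspect 4 boxes and get their weights $X_1, X_2, X_3, X_4$. The sample mean is $\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$.
(a) What is the sampling distribution of $\bar{X}$? What are the mean and SE of $\bar{X}$?
$N\left(\mu_{\bar{X}} = 1,\ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{4}} = \frac{0.01}{2} = 0.005\right)$
(b) What is the probability that $\bar{X}$ lies between 0.99 and 1.01 lb?
$P[0.99 \le \bar{X} \le 1.01] = P\left[\frac{0.99 - 1}{0.005} \le Z \le \frac{1.01 - 1}{0.005}\right] = P[-2 \le Z \le 2] = 0.9545$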
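The arithmetic in Ex 2 can be reproduced with the normal distribution in Python's standard library. This is a sketch under the example's own assumption that the box weights are normally distributed; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 0.01, 4   # population mean, population SD, sample size (Ex 2)

se = sigma / sqrt(n)          # standard error of the sample mean
xbar_dist = NormalDist(mu=mu, sigma=se)

# P[0.99 <= Xbar <= 1.01] as a difference of two CDF values.
prob = xbar_dist.cdf(1.01) - xbar_dist.cdf(0.99)

print(f"SE = {se:.3f}")
print(f"P[0.99 <= Xbar <= 1.01] = {prob:.4f}")
```

The interval is exactly 2 standard errors on either side of the mean, which is why the answer matches the familiar 95.45% figure for a normal distribution.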
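• If X ~ any non-normal distribution with mean = $\mu$ and SD = $\sigma$, the sampling distribution of $\bar{X}$ based on a sample of size n is as follows.
(a) If n is small (i.e., n < 30), then
  ◦ Distribution: determined by the distribution of X
  ◦ Mean $\mu_{\bar{X}}$ and SE $\sigma_{\bar{X}}$: $\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(b) If n is large (i.e., n ≥ 30), then
  ◦ Distribution of $\bar{X}$ is approximately normal
  ◦ Mean $\mu_{\bar{X}}$ and SE $\sigma_{\bar{X}}$: $\mu_{\bar{X}} = \mu$, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• The approximate normality in (b) follows from the Central Limit Theorem (CLT).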
III. Central Limit Theorem
Assume X follows an arbitrary distribution with mean $\mu$ and SD $\sigma$.
When the sample size is sufficiently large (as a rule of thumb, n ≥ 30), the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu$ and SE $\frac{\sigma}{\sqrt{n}}$.
Usually, the less symmetric a distribution is, the larger the sample size needed to ensure approximate normality of $\bar{X}$.
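A small simulation can illustrate the CLT. The sketch below assumes, purely for illustration, that X follows an Exponential distribution with rate 1 (mean 1, SD 1, strongly right-skewed); this choice, the seed, and the number of repetitions are not from the notes.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so that repeated runs give the same numbers

def simulate_sample_means(n, reps=5000):
    """Means of `reps` independent samples of size n from Exponential(rate=1)."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(values):
    """Sample skewness; values near 0 indicate a roughly symmetric shape."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (2, 30, 100):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    center, se, skew = results[n]
    print(f"n={n:3d}  mean of means={center:.3f}  simulated SE={se:.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}  skewness={skew:.2f}")
```

In a run like this, the simulated SE should track $\sigma/\sqrt{n}$ for every n, while the skewness of the sample means should shrink toward 0 as n grows, which is the "more and more bell shaped" behaviour the theorem describes.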
Ex3. Let X be the number of major defects for each new automobile tested. Suppose the number of such defects for a certain model has some distribution with mean $\mu$ = 3.2 and SD $\sigma$ = 2.4. A sample of 100 new cars is collected.
(a) What is the sampling distribution of $\bar{X}$ based on samples of size 100? What is its center and what is the SE of $\bar{X}$?
Approximately $N\left(\mu_{\bar{X}} = 3.2,\ \sigma_{\bar{X}} = \frac{2.4}{\sqrt{n}} = \frac{2.4}{\sqrt{100}} = 0.24\right)$, by the CLT since n = 100 ≥ 30.
(b) What is the probability that the sample average number of major defects exceeds 4?
$P[\bar{X} > 4] = P\left[Z > \frac{4 - 3.2}{0.24}\right] = P[Z > 3.33] \approx 0.0004 \approx 0$
• Comments:

  ◦ If $\bar{X}$ is the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population with mean = $\mu$ and SD = $\sigma$, then regardless of the sample size n and the distribution of X,
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
  ◦ The variation of the sample means is always ≤ the variation of the original data ($\sigma_{\bar{X}} \le \sigma$).
  ◦ As the sample size n increases, $\sigma_{\bar{X}}$ (the SE of $\bar{X}$) decreases, the shape of the sampling distribution becomes more and more bell shaped, and the mass becomes more and more concentrated around the mean $\mu$. This implies a higher probability that $\bar{X}$ falls near $\mu$.
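The mean and SE formulas in these comments can be derived in a few lines. The derivation below is a sketch that assumes the observations $X_1, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and SD $\sigma$, which is the usual meaning of "random sample" here; independence is what allows the variances to add.

```latex
\begin{align*}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}, \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

Since $\sqrt{n} \ge 1$, the last line also shows why the sample mean never varies more than a single observation.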
Ex4. The heights of college age students (denoted by X) are known to have mean $\mu$ = 115 and SD $\sigma$ = 30.
(a) Assume that we were told that the heights of college age students are normally distributed. What is the sampling distribution of $\bar{X}$ based on samples of 9 college age students? What are the mean and SE?
$N\left(\mu_{\bar{X}} = 115,\ \sigma_{\bar{X}} = 30/\sqrt{9} = 10\right)$
(b) What is the sampling distribution of $\bar{X}$ based on samples of 9 college age students without the assumption that the heights have a normal distribution? What are the mean and SE of the sampling distribution of $\bar{X}$?
The distribution is determined by the height distribution, because n = 9 is too small for the CLT to apply. But $\mu_{\bar{X}} = 115$ and $\sigma_{\bar{X}} = 30/\sqrt{9} = 10$.
(c) What is the sampling distribution of $\bar{X}$, the average height of 36 college age students? What are the mean and SE of the sampling distribution of $\bar{X}$?
$N\left(\mu_{\bar{X}} = 115,\ \sigma_{\bar{X}} = 30/\sqrt{36} = 5\right)$ (n = 36 ≥ 30, so the CLT applies even without the normality assumption)
IV. Sampling Distribution of a Sample Proportion p
Ex. Consider a basket containing 100 balls of 2 colors: Red and White. The proportion of Red balls is denoted by $\pi$ (and is not known). Assume 30 balls were randomly picked from the basket with replacement, and 14 of the 30 balls were red.
(1) In the sample, what is the proportion of red balls?
$p = \frac{14}{30}$
(2) We refer to such a quantity, 14/30, as the sample proportion and denote it by p.
Our question of interest: what is the distribution of the sample proportion p?
Thoughts: we can define a r.v. X such that X = 1 if “red” and X = 0 if “not red”. Then p can be viewed as the mean of 30 X’s.
In general, p from a sample of size n can be written as
$$p = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$
Thus, by the CLT, p is approximately Normal if n is large. (However, different criteria for “large n” are needed here.)
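The idea that a sample proportion is just a sample mean of 0/1 indicators can be checked directly. This is a minimal sketch using the counts from the example (14 red, 16 not red); the order of the draws is not given in the notes, so the list below is an arbitrary arrangement.

```python
from fractions import Fraction

# 30 draws coded as indicators: 1 = red, 0 = not red.
# Only the counts (14 and 16) come from the example; the order is arbitrary.
draws = [1] * 14 + [0] * 16

sample_proportion = Fraction(draws.count(1), len(draws))   # reds / n
mean_of_indicators = Fraction(sum(draws), len(draws))       # (X1 + ... + Xn) / n

print(sample_proportion, mean_of_indicators, float(sample_proportion))
```

Both quantities are the same fraction, 14/30, which is why results about $\bar{X}$ carry over to p.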
Sampling Distribution of p
(a) If n is large ($n \ge 30$, $n\pi \ge 5$, and $n(1-\pi) \ge 5$), then the sample proportion p has
  ◦ An approximately normal distribution (by the CLT)
  ◦ Mean (denoted by $\mu_p$) = $\pi$, and SE (denoted by $\sigma_p$) = $\sqrt{\frac{\pi(1-\pi)}{n}}$
(b) If n is small (at least one of $n \ge 30$, $n\pi \ge 5$, $n(1-\pi) \ge 5$ is not satisfied), then the sample proportion p has
  ◦ A discrete distribution that need not be close to normal (the number of successes, $n \cdot p$, is Binomial(n, $\pi$))
  ◦ Mean (denoted by $\mu_p$) = $\pi$, and SE (denoted by $\sigma_p$) = $\sqrt{\frac{\pi(1-\pi)}{n}}$
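The mean and SE of p follow from treating p as the mean of n indicator variables. The derivation below is a sketch assuming the draws are independent with the same success probability $\pi$ (for example, sampling with replacement), so that each $X_i$ equals 1 with probability $\pi$ and 0 otherwise.

```latex
\begin{align*}
E[X_i] &= 1\cdot\pi + 0\cdot(1-\pi) = \pi, \\
\operatorname{Var}(X_i) &= E[X_i^2] - \left(E[X_i]\right)^2 = \pi - \pi^2 = \pi(1-\pi), \\
\mu_p &= E[p] = \pi, \qquad
\sigma_p = \sqrt{\frac{\operatorname{Var}(X_i)}{n}} = \sqrt{\frac{\pi(1-\pi)}{n}}.
\end{align*}
```

This is the general result $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ with $\sigma = \sqrt{\pi(1-\pi)}$.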
Ex5. In the population, the proportion of defectives is $\pi$ = 12%.
(a) What is the sampling distribution of p based on 100 observations? What is the mean? What is the standard error?
Approximately normal (n = 100 ≥ 30, $n\pi$ = 12 ≥ 5, $n(1-\pi)$ = 88 ≥ 5) with mean = 0.12 and SE = $\sqrt{\frac{0.12 \times (1 - 0.12)}{100}} \approx 0.03$
(b) What is the probability that p < 0.10?
$P[p < 0.1] = P\left[\frac{p - 0.12}{0.03} < \frac{0.1 - 0.12}{0.03} = -0.67\right] \approx 0.25$
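The calculation in Ex5 can be reproduced in code, which also shows how much the rounding of the SE to 0.03 matters. This is a sketch under the normal approximation; the helper names `normal_approx_ok` and `prob_p_below` are illustrative, not from the notes.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_ok(n, pi):
    """Rule of thumb from the notes: n >= 30, n*pi >= 5, n*(1 - pi) >= 5."""
    return n >= 30 and n * pi >= 5 and n * (1 - pi) >= 5

def prob_p_below(threshold, n, pi, se=None):
    """P[p < threshold] under the normal approximation.

    If `se` is given it is used as-is; otherwise sqrt(pi * (1 - pi) / n) is used.
    """
    if se is None:
        se = sqrt(pi * (1 - pi) / n)
    return NormalDist(mu=pi, sigma=se).cdf(threshold)

n, pi = 100, 0.12                                # values from Ex5
assert normal_approx_ok(n, pi)

se_exact = sqrt(pi * (1 - pi) / n)
p_rounded = prob_p_below(0.10, n, pi, se=0.03)   # SE rounded to 0.03, as in Ex5
p_exact = prob_p_below(0.10, n, pi)              # SE left unrounded

print(f"unrounded SE = {se_exact:.4f}")
print(f"P[p < 0.10] with SE = 0.03:      {p_rounded:.3f}")
print(f"P[p < 0.10] with unrounded SE:   {p_exact:.3f}")
```

The two probabilities differ in the second decimal place, so carrying an extra digit in the SE is worthwhile when the answer needs more than rough accuracy.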