Chapter 2 SAMPLING DISTRIBUTIONS

Two Definitions of Random Sample
- Random sample from a finite population (page 381)
- Random sample from an infinite population (page 382)
Chapter 2. Sampling Distribution
Random Sample from a Finite Population
(page 381)
Definition 11.1.
Suppose we select n distinct elements from a population
consisting of N elements, using a particular probability
sampling method. Let
X1=measure taken from the 1st element in the sample
X2=measure taken from the 2nd element in the sample
…
Xn=measure taken from the nth element in the sample
Then, (X1, X2,…,Xn) is called a random sample of size n from
a finite population.
Remarks About a Random Sample from a Finite Population
(page 381)
Let (X1, X2, …, Xn) be a random sample from a finite population.
- The Xi's are random variables, since probability sampling requires the use of a randomization mechanism in selecting the elements of the sample.
- The definition does not require that all the elements in the population have equal chances of inclusion in the sample.
- The definition requires that the selected elements in the sample be distinct from each other.
More Remarks
(page 381)
- If the elements in the sample were selected using SRSWOR and Xi is the measure taken from the ith selected element in the sample, then (X1, X2, …, Xn) is a random sample from a finite population.
- If the elements in the sample were selected using SRSWR and Xi is the measure taken from the ith selected element in the sample, then (X1, X2, …, Xn) is NOT a random sample from a finite population, because the same element may be selected more than once. However, it can be viewed as a random sample from an infinite population.
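A minimal Python sketch of the two selection schemes, assuming the standard library's `random` module as the randomization mechanism: `random.Random.sample` draws distinct elements (SRSWOR), while `random.Random.choices` draws with replacement (SRSWR), so an element can appear more than once. The population labels, the seed, and the helper names `srswor` and `srswr` are illustrative only.

```python
import random

# Illustrative population of N = 6 labelled elements
population = ["A1", "A2", "A3", "A4", "A5", "A6"]
rng = random.Random(2024)  # fixed seed so the sketch is reproducible


def srswor(pop, n, rng):
    """SRSWOR: n distinct elements, every subset of size n equally likely."""
    return rng.sample(pop, n)


def srswr(pop, n, rng):
    """SRSWR: n independent draws; the same element may be drawn again."""
    return rng.choices(pop, k=n)


wor = srswor(population, 4, rng)
wr = srswr(population, 4, rng)
print("SRSWOR:", wor, "all distinct:", len(set(wor)) == len(wor))
print("SRSWR: ", wr, "all distinct:", len(set(wr)) == len(wr))
```

Calling `srswr` repeatedly will sooner or later produce a repeated label, which is exactly why an SRSWR sample fails the distinctness requirement of Definition 11.1.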
Random Sample from an Infinite Population
(page 382)
Definition 11.2.
Let
X1=measure taken from the 1st element in the sample
X2=measure taken from the 2nd element in the sample
…
Xn=measure taken from the nth element in the sample
Then (X1, X2,…,Xn) is called a random sample of size n from an infinite
population if the values of X1, X2,…,Xn are n independent observations
generated from the same cumulative distribution function (CDF), F(.). This
common CDF or its corresponding probability mass/density function, f(.), is called
the parent population or the distribution of the population.
This definition is equivalent to saying that a random sample from an infinite population is a sample generated by a series of n independent trials performed under identical conditions: the independence of the trials makes the Xi's independent, and the identical conditions make the CDFs of the Xi's all the same.
Remarks About a Random Sample from an Infinite Population
(page 382)
- Many of the procedures in this course will require that (X1, X2, …, Xn) is a random sample from a normal population.
- Even if the collection of all elements under consideration is finite, sampling can still be viewed as sampling from an infinite population when we return each selected element to the population before the next selection (that is, under SRSWR).
- In many cases, the distinction between sampling from a finite and from an infinite population is irrelevant: when the sampling fraction, n/N, is close to zero, inferences on finite and infinite populations yield essentially the same results.
Example 10.34
(page 328)
Go back to our small barangay in Example 10.34. This barangay consists of 6
qualified voters: A1, A2, A3, A4, A5, and A6. Renzo and Sandro are 2 candidates
vying for the same position. A1, A2, A3 and A4 have already decided to vote for
Renzo while A5 and A6 will vote for Sandro. Suppose we select a sample of size 2
using SRSWR.
We have a finite population where N=6. For i=1,2,3,4,5,6, define:
$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the population elects Renzo} \\ 0 & \text{if the } i\text{th voter in the population elects Sandro} \end{cases}$$
Population Data = {1,1,1,1,0,0}
This time, let us define for i=1,2:
$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the sample elects Renzo} \\ 0 & \text{if the } i\text{th voter in the sample elects Sandro} \end{cases}$$
Note that (X1,X2) is a random sample from an infinite population, where the common PMF of the discrete random variables X1 and X2 is as follows:
x    | 0   | 1
f(x) | 2/6 | 4/6
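As a check on the claim that SRSWR turns (X1, X2) into a random sample from an infinite population, this Python sketch enumerates the 6 × 6 = 36 equally likely ordered draws and confirms that X1 and X2 share the PMF above and that their joint PMF factors into the product of the marginals (independence). It uses only the standard library (`fractions`, `itertools`); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Population data from Example 10.34: 1 = elects Renzo, 0 = elects Sandro
population = [1, 1, 1, 1, 0, 0]
N = len(population)

# SRSWR with n = 2: all N * N ordered draws are equally likely
pairs = list(product(population, repeat=2))
p_pair = Fraction(1, N * N)


def marginal_pmf(position):
    """PMF of X1 (position 0) or X2 (position 1)."""
    pmf = {}
    for pair in pairs:
        pmf[pair[position]] = pmf.get(pair[position], 0) + p_pair
    return pmf


def joint_prob(x1, x2):
    """P(X1 = x1, X2 = x2)."""
    return sum((p_pair for pair in pairs if pair == (x1, x2)), Fraction(0))


f1, f2 = marginal_pmf(0), marginal_pmf(1)
independent = all(joint_prob(a, b) == f1[a] * f2[b] for a in (0, 1) for b in (0, 1))
print("PMF of X1:", f1)
print("PMF of X2:", f2)
print("X1 and X2 independent:", independent)
```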
New Definition of Statistic
(page 383)
Definition 11.3.
Suppose (X1, X2,…,Xn) is a random sample. A statistic is a
random variable that is a function of X1, X2,…,Xn .
Example 11.1: Suppose (X1, X2,…,Xn) is a random sample.
a) $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a random variable that is a function of X1, X2,…,Xn. Thus, $\bar{X}$ is a statistic.
b) $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is a random variable that is a function of X1, X2,…,Xn. Thus, $S^2$ is a statistic.
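The two statistics in Example 11.1 translate directly into functions of the sample values. In this Python sketch the observed values are hypothetical; once a sample is realized, each statistic becomes a single number. The standard library's `statistics.variance` also uses the n − 1 divisor, so it serves as a cross-check.

```python
import statistics


def sample_mean(xs):
    """X-bar: the sum of the X_i divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S^2: the sum of squared deviations from X-bar divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical realized sample of size n = 5 (values are illustrative)
observed = [4.0, 7.0, 5.0, 8.0, 6.0]
print(sample_mean(observed), sample_variance(observed))
# statistics.variance also divides by n - 1, so the two should agree
print(statistics.mean(observed), statistics.variance(observed))
```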
Remarks About the Statistic
(page 383)
- The given definition of a statistic does not contradict the definition given in Stat 114, which states that a statistic is a summary measure computed from a sample. This time, though, we require the statistic to be a random variable.
- As a random variable, the statistic is a function whose value depends on the outcome of a random experiment. In this case, the random experiment is the selection of a random sample of size n. It is impossible to predict with certainty what the realized value of the statistic will be.
- However, as a random variable, it has a probability distribution, which can help us understand the behavior of the statistic in probabilistic terms.
Sampling Distribution of a Statistic
(page 383)
Definition 11.4
The sampling distribution of a statistic is its probability
distribution.
- If the statistic is a discrete random variable, then its sampling distribution is its probability mass function. On the other hand, if the statistic is a continuous random variable, then its sampling distribution is its probability density function.
- The sampling distribution of a statistic depends on various factors, such as:
  - the sample size;
  - the method of choosing the random sample;
  - the population under study.
Example 11.3
(page 383)
Go back to our small barangay in Example 10.34. This barangay consists of 6
qualified voters: A1, A2, A3, A4, A5, and A6. Renzo and Sandro are 2 candidates
vying for the same position. A1, A2, A3 and A4 have already decided to vote for
Renzo while A5 and A6 will vote for Sandro. Suppose we select a sample of size 2
using SRSWOR.
Construct the sampling distribution of $\bar{X} = \frac{1}{2}\sum_{i=1}^{2} X_i$, where we define for i=1,2:
$$X_i = \begin{cases} 1 & \text{if the } i\text{th voter in the sample elects Renzo} \\ 0 & \text{if the } i\text{th voter in the sample elects Sandro} \end{cases}$$
(Note that $\bar{X}$ can also be viewed in this example as a sample proportion because the numerator, $\sum_{i=1}^{2} X_i$, simply counts the total number of voters in a sample of size 2 who will elect Renzo, and we divide this by the number of voters in the sample.)
Example 11.3 cont’d.
Physical Sample | (X1, X2) | $\bar{x}$
{A1,A2} | (1,1) | 1
{A1,A3} | (1,1) | 1
{A1,A4} | (1,1) | 1
{A1,A5} | (1,0) | 1/2
{A1,A6} | (1,0) | 1/2
{A2,A3} | (1,1) | 1
{A2,A4} | (1,1) | 1
{A2,A5} | (1,0) | 1/2
{A2,A6} | (1,0) | 1/2
{A3,A4} | (1,1) | 1
{A3,A5} | (1,0) | 1/2
{A3,A6} | (1,0) | 1/2
{A4,A5} | (1,0) | 1/2
{A4,A6} | (1,0) | 1/2
{A5,A6} | (0,0) | 0
Here, for each selected element,
$$X_i = \begin{cases} 1 & \text{if A1, A2, A3, or A4 is selected} \\ 0 & \text{if A5 or A6 is selected} \end{cases}$$
All of these 15 possible samples have the same chance of selection because we select the sample using SRSWOR. We then use the classical definition of probability to construct the sampling distribution of $\bar{X}$.
Sampling Distribution of $\bar{X}$
$\bar{x}$    | 0    | 1/2  | 1
$f(\bar{x})$ | 1/15 | 8/15 | 6/15
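The enumeration in the table can be automated. This Python sketch lists all C(6, 2) = 15 equally likely SRSWOR samples with `itertools.combinations` and applies the classical definition of probability, using exact `Fraction` arithmetic. The helper name `srswor_sampling_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Example 11.3 population: A1-A4 elect Renzo (1); A5 and A6 elect Sandro (0)
voters = {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "A5": 0, "A6": 0}


def srswor_sampling_distribution(population, n):
    """Sampling distribution of X-bar under SRSWOR. Every subset of size n is
    equally likely, so each gets probability 1 / C(N, n)."""
    samples = list(combinations(population, n))
    p = Fraction(1, len(samples))
    dist = {}
    for sample in samples:
        xbar = Fraction(sum(population[label] for label in sample), n)
        dist[xbar] = dist.get(xbar, 0) + p
    return dict(sorted(dist.items()))


for value, prob in srswor_sampling_distribution(voters, 2).items():
    print(value, prob)
```

Changing the second argument to 3 gives the sampling distribution of $\bar{X}$ for samples of size 3 from the same population, which makes it easy to see how the sample size changes the result.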
Remarks:
- In Example 11.2 (page 384), the sample size was 3 instead of 2. Note that the sampling distribution of $\bar{X}$ is different even though the population and the sample selection procedure are the same.
- In Example 11.4 (page 386), the sample size was also 2, but the sample was selected using systematic sampling. Once again, the sampling distribution of $\bar{X}$ is different.
Standard error of a Statistic (page 387)
Definition 11.5.
The standard deviation of a statistic is called its standard error.
Recall: The sampling error is the error attributed to the variation present among the computed values of the statistic from the different possible samples consisting of n elements (page 77). The standard error gives us an idea of the expected size of the sampling error.
A small standard error indicates that the computed values of the statistic in the different possible samples are close to one another. So even though the value of a statistic varies from one sample to another, a small standard error assures us that this variation is not too large.
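To make Definition 11.5 concrete, this Python sketch computes the standard error of $\bar{X}$ directly from the sampling distribution obtained in Example 11.3, as the square root of its variance. Exact fractions are used until the final square root.

```python
import math
from fractions import Fraction

# Sampling distribution of X-bar from Example 11.3 (SRSWOR, n = 2)
dist = {Fraction(0): Fraction(1, 15), Fraction(1, 2): Fraction(8, 15), Fraction(1): Fraction(6, 15)}

mean = sum(x * p for x, p in dist.items())                     # E(X-bar)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # Var(X-bar)
standard_error = math.sqrt(float(variance))                    # SD of the statistic
print(mean, variance, round(standard_error, 4))
```

The variance obtained this way, 4/45, is what Theorem 11.1 gives for this population with N = 6 and n = 2.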
Remarks About Theorems 11.1 and 11.2
(pages 389-390)
Suppose a sample of size n is selected from a population (of size N, when finite) with mean µ and variance σ². Let $\bar{X}$ be the sample mean (viewed as a random variable).

                            | Theorem 11.1: SRSWOR | Theorem 11.2: SRSWR / sampling from an infinite population
Mean of $\bar{X}$           | µ | µ
Variance of $\bar{X}$       | $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ | $\frac{\sigma^2}{n}$
Standard error of $\bar{X}$ | $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ | $\frac{\sigma}{\sqrt{n}}$

- In both theorems, the mean of $\bar{X}$ is the mean of the population: $E(\bar{X}) = \mu$.
- In both theorems, the standard error of $\bar{X}$ is smaller when σ² is small, that is, for populations whose elements are homogeneous with respect to the characteristic of interest.
- In both theorems, increasing the sample size n decreases the standard error of $\bar{X}$.
- The term $\frac{N-n}{N-1}$ in Theorem 11.1 is called the finite population correction. This term is notably absent from $Var(\bar{X})$ when we sample from an infinite population. Because it is less than 1 whenever n > 1, the correction factor results in a smaller standard error under SRSWOR than under SRSWR. However, this factor approaches 1 as N approaches infinity while n remains fixed, so the standard errors under the two schemes are then approximately equal.
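The formulas in both theorems can be checked against brute-force enumeration for the barangay population of Example 11.3 (N = 6, n = 2). This Python sketch assumes, as the SRSWOR formula requires, that the population variance σ² is computed with divisor N. The helper `var_of_mean` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 1, 1, 1, 0, 0]          # Example 11.3 population (1 = elects Renzo)
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # assumed divisor N

var_srswor = sigma2 / n * Fraction(N - n, N - 1)       # Theorem 11.1
var_srswr = sigma2 / n                                 # Theorem 11.2


def var_of_mean(samples):
    """Variance of X-bar when each listed sample is equally likely."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    m = sum(means) / len(means)
    return sum((x - m) ** 2 for x in means) / len(means)


# Brute force: 15 unordered SRSWOR samples and 36 ordered SRSWR draws
print("SRSWOR:", var_srswor, var_of_mean(combinations(population, n)))
print("SRSWR: ", var_srswr, var_of_mean(product(population, repeat=n)))
```

Here the finite population correction is (6 − 2)/(6 − 1) = 4/5, so the SRSWOR variance comes out smaller than the SRSWR variance.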
Theorem 11.3. Central Limit Theorem
(page 393)
If $\bar{X}$ is the mean of a random sample of size n from a large or infinite population with population mean µ and population variance σ², then, when n is sufficiently large, the sampling distribution of $\bar{X}$ is approximately normal, with mean equal to the population mean µ and variance σ²/n.
The Central Limit Theorem basically states that when the sample size is sufficiently large, we can use the normal distribution to approximate the sampling distribution of $\bar{X}$.
The CLT does not state any requirement about the distribution of the population, aside from having mean µ and (finite) variance σ². The normal approximation holds for population distributions that are either discrete or continuous, and for population distributions that are either symmetric or skewed. We can use the approximation even for random samples from finite populations, as long as N is very large.
In most situations, the normal approximation will be good if n ≥ 30. If the distribution of the population is not very different from the normal distribution, then the approximation will be good even if n < 30. In fact, if the population is normally distributed, then $\bar{X}$ is exactly normally distributed for any sample size, even n = 1.
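The CLT's claim that the shape of the population does not matter can be checked by simulation. This Python sketch assumes an Exponential(1) population (right-skewed, with µ = 1 and σ² = 1), chosen only for illustration, together with n = 30 and 2,000 simulated samples; it compares the simulated sample means with the normal approximation N(µ, σ²/n).

```python
import random
import statistics

rng = random.Random(7)                  # fixed seed for reproducibility
n, reps = 30, 2000                      # sample size, number of simulated samples
mu, sigma2 = 1.0, 1.0                   # Exponential(1) population: mean 1, variance 1

# Record the mean of each simulated sample from a right-skewed population
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means), statistics.variance(means), sigma2 / n)

# Share of sample means within 1.96 standard errors of mu (about 0.95 if normal)
se = (sigma2 / n) ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(coverage)
```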
Examples
Example 11.8 (page 393)
Exercise 1 (page 395)
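A random sample of size 400 is taken from a large population with mean µ=50 and variance σ²=25. Approximate the probability of selecting a sample that satisfies:
a) $49.5 \le \bar{X} \le 50.75$
b) $|\bar{X} - \mu| \le 0.5$
By the CLT, since n=400 is large, $\bar{X}$ is approximately normally distributed, where the mean of $\bar{X}$ is $\mu = 50$ and the variance of $\bar{X}$ is $\sigma^2/n = 25/400 = 0.0625$.
a) Find $P(49.5 \le \bar{X} \le 50.75)$.
$$P(49.5 \le \bar{X} \le 50.75) = P\left(\frac{49.5 - 50}{\sqrt{0.0625}} \le \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \le \frac{50.75 - 50}{\sqrt{0.0625}}\right) = P(-2 \le Z \le 3) = P(Z \le 3) - P(Z \le -2) = 0.9987 - 0.0228 = 0.9759.$$
b) Find $P(|\bar{X} - \mu| \le 0.5)$.
$$P(|\bar{X} - \mu| \le 0.5) = P(-0.5 \le \bar{X} - \mu \le 0.5) = P\left(\frac{-0.5}{\sqrt{0.0625}} \le \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \le \frac{0.5}{\sqrt{0.0625}}\right) = P(-2 \le Z \le 2) = P(Z \le 2) - P(Z \le -2) = 0.9772 - 0.0228 = 0.9544.$$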
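The table lookups in Exercise 1 can be reproduced with the standard library's `statistics.NormalDist`, used here as the CLT approximation N(µ, σ²/n) to the distribution of $\bar{X}$. This is a sketch, not a replacement for Table B.1: results may differ slightly in the last decimal place because the table rounds the Φ values before subtracting.

```python
from statistics import NormalDist

mu, sigma2, n = 50, 25, 400
se = (sigma2 / n) ** 0.5                # sqrt(0.0625) = 0.25
xbar = NormalDist(mu, se)               # CLT approximation to the distribution of X-bar

p_a = xbar.cdf(50.75) - xbar.cdf(49.5)         # a) P(49.5 <= X-bar <= 50.75)
p_b = xbar.cdf(mu + 0.5) - xbar.cdf(mu - 0.5)  # b) P(|X-bar - mu| <= 0.5)
print(round(p_a, 4), round(p_b, 4))
```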
Exercise 3
(page 395)
Suppose the mean monthly income, µ, of the households in the exclusive subdivisions in Metro Manila is P200,000 with a standard deviation σ=P150,000. What is the probability of selecting a random sample of 100 families whose sample mean monthly income is larger than P250,000?
Let Xi = monthly income of the ith selected family in the sample, i=1,2,…,100.
(X1, X2,…,X100) is a random sample from a population with mean µ=200,000 and standard deviation σ=150,000.
State the problem: Find $P(\bar{X} > 250{,}000)$.
What can be concluded using the CLT?
The sampling distribution of $\bar{X}$ is approximately normal, the mean of $\bar{X}$ is µ = 200,000, and the standard error of $\bar{X}$ is $\sigma/\sqrt{n} = 150{,}000/\sqrt{100} = 15{,}000$.
Solution:
$$P(\bar{X} > 250{,}000) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{250{,}000 - 200{,}000}{15{,}000}\right) = P(Z > 3.33) = 1 - F(3.33) = 1 - 0.9996 = 0.0004.$$
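The same computation in Python, assuming the standard library's `statistics.NormalDist` in place of the normal table. The exact z-value is 10/3, which the table solution rounds to 3.33.

```python
from statistics import NormalDist

mu, sigma, n = 200_000, 150_000, 100
se = sigma / n ** 0.5                   # standard error of X-bar: 15,000
z = (250_000 - mu) / se                 # about 3.33
p = 1 - NormalDist().cdf(z)             # P(Z > z) under the CLT approximation
print(round(z, 2), round(p, 4))
```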
Assignment 5
1. A random sample of size 625 was selected from a large population with population mean µ=15 and population variance σ²=20.25. Approximate the probability of selecting a sample that satisfies:
   a) the sample mean is between 15.2736 and 15.4572;
   b) $|\bar{X} - \mu|$ … 0.45.
2. An anthropologist claims that the population mean height of men of the race he is studying is 55 inches with a standard deviation of 5 inches. Approximate the probability of selecting a random sample of 100 men of this race whose sample mean height is greater than 56.5 inches. Let Xi = height of the ith selected man in the sample.
The t-distribution
(page 396)
If X is a random variable that follows a t-distribution with v degrees of freedom, then we write X~t(v).
Just like the standard normal distribution, the t-distribution is a bell-shaped distribution that is symmetric about 0. Its tails also approach the x-axis without ever touching it. However, the t-distribution has heavier tails than the standard normal and, for v > 2, a larger variance, v/(v − 2). As the degrees of freedom increase, the variance of the t-distribution approaches 1 (the variance of the standard normal distribution).
[Figure: t-distribution curves for v=2 and v=5, plotted from −3 to 3]
t-Table, Table B.2
(page 605)
If X~t(v), then:
(i) $P(X < -t_\alpha(v)) = P(X > t_\alpha(v)) = \alpha$; and,
(ii) $t_{1-\alpha}(v) = -t_\alpha(v)$.
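Python's standard library has no t-distribution CDF, so this sketch integrates the t density numerically with Simpson's rule and finds critical values by bisection. That numerical approach is an assumption of this illustration, not how Table B.2 was produced, and the helper names `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical. It checks properties (i) and (ii) at the value $t_{0.01}(15) = 2.602$ used in Example 11.14.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c - (v + 1) / 2 * math.log1p(x * x / v))


def t_upper_tail(t, v, steps=1000):
    """P(X > t) for X ~ t(v): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    s = t_pdf(0.0, v) + t_pdf(a, v)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, v)
    area = s * h / 3                    # P(0 < X < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_critical(alpha, v, iterations=60):
    """t_alpha(v): the point with upper-tail probability alpha, by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, v) > alpha:
            lo = mid                    # tail still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2


v = 15
upper = t_upper_tail(2.602, v)          # P(X > 2.602)
lower = 1 - t_upper_tail(-2.602, v)     # P(X < -2.602)
print(round(upper, 4), round(lower, 4))                                # property (i)
print(round(t_critical(0.99, v), 3), round(-t_critical(0.01, v), 3))   # property (ii)
```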
Example 11.10 (page 397)
Other Examples:
1. Suppose X~t(v=15).
   a. P(X > 2.947) =
   b. P(X < -2.947) =
   c. P(X < -1.341) =
2. $t_{0.01}(v=4)$
3. $t_{0.005}(v=8)$
4. $t_{0.95}(v=24)$
The Chi-square Distribution
(page 397)
If X is a random variable that follows a chi-square distribution with v degrees of freedom, then we write $X \sim \chi^2(v)$.
The PDF of the chi-square distribution is positive for positive real numbers only; elsewhere, its value is 0. Its mean is equal to its degrees of freedom, and its variance is twice its degrees of freedom. Thus, as the degrees of freedom increase, both the mean and the variance also increase.
The PDF of the chi-square distribution is skewed to the right. Its skewness is more pronounced for smaller degrees of freedom; as the degrees of freedom increase, the distribution becomes more symmetric.
[Figure: chi-square distribution curves for v=2, 5, 10, and 15]
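The stated mean v and variance 2v can be checked by simulation, assuming the standard construction of a chi-square(v) variable as the sum of v squared independent standard normal variables. This Python sketch uses `random.gauss`; the seed, the 20,000 draws per curve, and the helper name `chi_square_draw` are arbitrary choices for illustration.

```python
import random
import statistics


def chi_square_draw(v, rng):
    """One chi-square(v) value: the sum of v squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))


rng = random.Random(11)                 # fixed seed; 20,000 draws per v
for v in (2, 5, 10, 15):
    draws = [chi_square_draw(v, rng) for _ in range(20_000)]
    mean, var = statistics.fmean(draws), statistics.variance(draws)
    # Compare with the stated mean v and variance 2v
    print(f"v={v}: mean {mean:.2f} (v={v}), variance {var:.2f} (2v={2 * v})")
```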
Chi-square Table, Table B.3
(page 606)
If $X \sim \chi^2(v)$, then:
(i) $P(X > \chi^2_\alpha(v)) = \alpha$; and,
(ii) $P(X < \chi^2_\alpha(v)) = 1 - \alpha$.
Example 11.11 (page 398)
Other Examples:
1. Suppose $X \sim \chi^2(v=18)$.
   a. P(X > 6.265) =
   b. P(X < -6.265) =
   c. P(X < 28.869) =
2. $\chi^2_{0.025}(v=6)$
3. $\chi^2_{0.1}(v=25)$
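Relations (i) and (ii) can be checked numerically. This Python sketch assumes the standard identity that the chi-square(v) CDF equals the regularized lower incomplete gamma function P(v/2, x/2), evaluated here by its power series (the standard library has no chi-square CDF). The helper name `chi2_cdf` is hypothetical, and the check value 20.483 with v = 10 is the one used in the IQ variance example of this chapter.

```python
import math


def chi2_cdf(x, v, max_terms=1000):
    """P(X < x) for X ~ chi-square(v), computed as the regularized lower
    incomplete gamma function P(v/2, x/2) via its power series."""
    if x <= 0:
        return 0.0                      # no probability at or below 0
    a, y = v / 2, x / 2
    term = 1.0 / a                      # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= y / (a + k)
        total += term
        if term < total * 1e-17:
            break
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


x_crit, v = 20.483, 10
print(round(1 - chi2_cdf(x_crit, v), 4))   # relation (i): upper-tail area alpha
print(round(chi2_cdf(x_crit, v), 4))       # relation (ii): 1 - alpha
print(chi2_cdf(-1.0, v))                   # the PDF is 0 for negative values
```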
The F-Distribution
(page 398)
If X is a random variable that follows an F-distribution with v1 numerator degrees of freedom and v2 denominator degrees of freedom, we write X~F(v1,v2).
The PDF of the F-distribution is positive for positive real numbers only; elsewhere, its value is 0. Its graph is skewed to the right. In general, distributions with higher degrees of freedom are less skewed.
If X and Y are two independent random variables such that $X \sim \chi^2(v_1)$ and $Y \sim \chi^2(v_2)$, then the random variable
$$F = \frac{X/v_1}{Y/v_2}$$
will follow an F-distribution with v1 numerator degrees of freedom and v2 denominator degrees of freedom.
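The ratio construction can be simulated directly. This Python sketch builds each chi-square value as a sum of squared standard normal draws and forms F = (X/v1)/(Y/v2), with illustrative choices v1 = 8 and v2 = 10. As a check it uses the known mean of an F(v1, v2) variable, v2/(v2 − 2) (valid for v2 > 2), which is not stated in the notes above; the helper names are hypothetical.

```python
import random
import statistics


def chi_square_draw(v, rng):
    """Chi-square(v) value: sum of v squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))


def f_draw(v1, v2, rng):
    """F = (X / v1) / (Y / v2) with independent X ~ chi-square(v1), Y ~ chi-square(v2)."""
    return (chi_square_draw(v1, rng) / v1) / (chi_square_draw(v2, rng) / v2)


rng = random.Random(12)
v1, v2 = 8, 10                            # illustrative degrees of freedom
draws = [f_draw(v1, v2, rng) for _ in range(20_000)]

print("all positive:", min(draws) > 0)
print("mean:", statistics.fmean(draws), "vs v2/(v2-2) =", v2 / (v2 - 2))
print("mean > median (right skew):", statistics.fmean(draws) > statistics.median(draws))
```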
F-Table, Table B.4
(pages 607-612)
If X~F(v1,v2), then:
(i) $P(X > F_\alpha(v_1,v_2)) = \alpha$;
(ii) $P(X < F_\alpha(v_1,v_2)) = 1 - \alpha$; and,
(iii) $F_\alpha(v_1,v_2) = \dfrac{1}{F_{1-\alpha}(v_2,v_1)}$.
Example 11.12 (page 399)
Other Examples:
1. Suppose X~F(v1=8,v2=4)
a. P(X > 5.1)=
b. P(X < 9.6) =
2. $F_{0.1}(v_1=3, v_2=6)$
3. $F_{0.975}(v_1=3, v_2=6)$
Sampling from the Normal Distribution
(page 400)
Suppose (X1,X2,…,Xn) is a random sample satisfying the condition that $X_i \sim \text{Normal}(\mu, \sigma^2)$ for i=1,2,…,n.
Define the statistics
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
as the sample mean and the sample variance, respectively, where n is the sample size.
TABLE 11.1. Sampling Distributions of Statistics Based on a Random Sample from a Normal Distribution
STATISTIC | SAMPLING DISTRIBUTION | PARAMETER/S
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ | standard normal distribution | mean = 0, variance = 1
$T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ | t-distribution | degrees of freedom: v = n − 1
$X^2 = \dfrac{(n-1)S^2}{\sigma^2}$ | chi-square distribution | degrees of freedom: v = n − 1
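Table 11.1 can be checked by simulation. This Python sketch draws repeated samples from an illustrative normal population (µ = 100, σ = 20, n = 11; all assumed values) and compares simulated moments of Z, T, and X² with those of the tabulated distributions. For T it uses the standard fact that a t(v) variable has variance v/(v − 2) when v > 2.

```python
import math
import random
import statistics

rng = random.Random(3)
mu, sigma, n = 100.0, 20.0, 11      # illustrative normal population and sample size
reps = 10_000

z_vals, t_vals, x2_vals = [], [], []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance S^2
    z_vals.append((xbar - mu) / (sigma / math.sqrt(n)))
    t_vals.append((xbar - mu) / (math.sqrt(s2) / math.sqrt(n)))
    x2_vals.append((n - 1) * s2 / sigma ** 2)

# Compare simulated moments with those of the distributions in Table 11.1
print("Z:  mean", statistics.fmean(z_vals), "variance", statistics.variance(z_vals))
print("T:  variance", statistics.variance(t_vals), "vs v/(v-2) =", (n - 1) / (n - 3))
print("X2: mean", statistics.fmean(x2_vals), "vs v =", n - 1)
```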
Examples
Example 11.13 (pages 400-401)
Exercise 5 (page 406)
IQ is normally distributed with mean µ=100 and standard deviation σ=20. What is the probability of selecting a random sample of size 100 with mean IQ larger than 105?
Let Xi = IQ of the ith selected student in the sample.
Given: $X_i \sim \text{Normal}(\mu = 100, \sigma^2 = 20^2)$, and (X1, X2, …, X100) is a random sample.
Find $P(\bar{X} > 105)$.
According to Table 11.1, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a standard normal random variable. Thus,
$$P(\bar{X} > 105) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{105 - 100}{20/\sqrt{100}}\right) = P(Z > 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062.$$
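Because the population is normal, $\bar{X}$ is exactly Normal(µ, σ²/n) here, so the answer can be reproduced with the standard library's `statistics.NormalDist` in place of the normal table. A short Python sketch:

```python
from statistics import NormalDist

mu, sigma, n = 100, 20, 100
# For a normal population, X-bar ~ Normal(mu, sigma^2 / n) exactly
xbar_dist = NormalDist(mu, sigma / n ** 0.5)
z = (105 - mu) / (sigma / n ** 0.5)     # standardized cutoff
p = 1 - xbar_dist.cdf(105)              # P(X-bar > 105)
print(z, round(p, 4))
```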
More Examples
Example 11.14 (page 401)
Exercise 6 (page 407)
The length of time it takes a student in a dormitory to take a bath follows a normal distribution with mean µ=22.5689 minutes. Suppose a random sample of 16 students was selected and its standard deviation is S=2.2. Find the probability of selecting a sample whose sample mean is more than 24 minutes.
Let Xi = length of time it takes the ith student in the sample to take a bath.
Given: $X_i \sim \text{Normal}(\mu = 22.5689, \sigma^2)$, and (X1, X2, …, X16) is a random sample with S=2.2.
Find $P(\bar{X} > 24)$.
According to Table 11.1, $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t-distribution with v = n − 1 degrees of freedom. Thus,
$$P(\bar{X} > 24) = P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} > \frac{24 - 22.5689}{2.2/\sqrt{16}}\right) = P(T > 2.602) = 0.01, \quad\text{where } T \sim t(v = 16 - 1 = 15).$$
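The standardization in Example 11.14 is plain arithmetic, reproduced in this Python sketch. Reading the tail probability 0.01 still requires Table B.2 ($t_{0.01}(15) = 2.602$), since the standard library has no t-distribution CDF.

```python
import math

mu, s, n, cutoff = 22.5689, 2.2, 16, 24
t_stat = (cutoff - mu) / (s / math.sqrt(n))   # (24 - 22.5689) / (2.2 / 4)
v = n - 1                                     # degrees of freedom for T
print(round(t_stat, 3), v)
```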
More Examples
IQ is normally distributed with mean µ=100 and standard deviation σ=20. What is the probability of selecting a random sample of size 11 with variance greater than 819.32?
Let Xi = IQ of the ith selected student in the sample.
Given: $X_i \sim \text{Normal}(\mu = 100, \sigma^2 = 20^2)$, and (X1, X2, …, X11) is a random sample.
Find $P(S^2 > 819.32)$.
According to Table 11.1, $X^2 = \frac{(n-1)S^2}{\sigma^2}$ is a chi-square random variable with n − 1 degrees of freedom. Thus,
$$P(S^2 > 819.32) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{(11-1)(819.32)}{20^2}\right) = P(X^2 > 20.483) = 0.025,$$
since $X^2$ is a chi-square random variable with n − 1 = 10 degrees of freedom.
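This Python sketch reproduces the standardization and then, as an independent check, estimates P(S² > 819.32) by simulating samples of size 11 from the stated Normal(100, 20²) population. The seed and the 40,000 replications are arbitrary choices; the estimate should land near the tabled 0.025 but will not match it exactly.

```python
import random
import statistics

n, s2_cutoff, mu, sigma = 11, 819.32, 100, 20

# Standardize the cutoff: (n - 1) S^2 / sigma^2
x2 = (n - 1) * s2_cutoff / sigma ** 2
print(round(x2, 3), "with", n - 1, "degrees of freedom")

# Simulation check of P(S^2 > 819.32) under the stated normal population
rng = random.Random(5)
reps = 40_000
hits = 0
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    hits += s2 > s2_cutoff
print(hits / reps)   # compare with 0.025 from Table B.3
```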
Assignment 6
The scores in the Stanford-Binet IQ test are known to be normally distributed with population mean 100 and population standard deviation 16. Suppose a random sample of size 9 will be selected.
1. What is the probability of selecting a sample whose mean is between 96 and 112?
2. What is the probability of selecting a sample whose variance is less than 642.88?
3. Suppose the population standard deviation is unknown. What is the probability of selecting a sample whose mean is greater than 111.16 if the standard deviation of the sample is 18?