Uploaded by reeceh543

Lecture 5 Normal Approx and Stratified Sampling

Lecture 5 (Survey Sampling continued)
Variance and Stratified Sampling
Recap from Lecture 4
The lecture was all about the population parameters $\mu$, $\sigma^2$ and $\tau$. Please remember that these are almost
always unknown. For the sake of exercises we give the population parameters, but in real life
we do not know them and therefore need to estimate them.
We also introduced sampling with and without replacement and are always interested in
the variance that the sample mean can have.
Again, we can get this variance if we have the value of the population parameter, that is, the value
of $\sigma^2$. Otherwise we use the sample to determine the sample variance $s^2$ and use that to
find the estimated variance of the sample mean:
$$\mathrm{Var}(\bar{X}) \approx \frac{s^2}{n}.$$
We are still continuing with Chapter 7 and busy with Section 7.3
[Diagram: Survey Sampling (Rice, Chapter 7). Population Parameters (7.2); Simple Random Sampling (7.3), with 7.3.1 Expectation and variance of sample mean, 7.3.2 Estimation of population variance and 7.3.3 Normal Approximation to the sampling distribution of the mean; Stratified Sampling.]
Please revise Chapter 8 of STA221.
The takeaway:
We are not doing
_________________________________________________________________________
[Diagram: Survey Sampling (Rice, Chapter 7). Population Parameters (7.2); Simple Random Sampling (7.3), with 7.3.1 Expectation and variance of sample mean and 7.3.2 Estimation of population variance; Stratified Sampling.]
Study this carefully and make sure you understand each aspect of the notation.
A town is divided into two suburbs (strata). Sub A has 250 households and Sub B has 150
households.
Give the fraction of the population in each of the two strata, $W_l$, $l = 1, 2$.
$$W_1 = \frac{250}{400} = 0.625 \quad \text{and} \quad W_2 = \frac{150}{400} = 0.375$$
Determine the population mean
$$\mu = \sum_{l=1}^{2} W_l \mu_l = W_1 \mu_1 + W_2 \mu_2 = 0.625\mu_1 + 0.375\mu_2$$
(we don't know the population mean in each stratum)
A simple random sample of household expenditure on food, $X$, is taken from the households of each
suburb: 110 households from Sub A and 80 households from Sub B.
Total expenditure in the samples: Sub A, R550 000.00 (110 households) and Sub B, R360 000.00 (80 households).
Give the mean expenditure of the suburbs.
$$\bar{X}_1 = \frac{1}{110}(550000.00) = 5000$$
The average spending on food is R5000.00 in Suburb A.
$$\bar{X}_2 = \frac{1}{80}(360000.00) = 4500$$
The average spending on food is R4500.00 in Suburb B.
What is the overall mean of the two strata?
$$\begin{aligned}
\bar{X}_s &= \sum_{l=1}^{2} W_l \bar{X}_l \\
&= W_1 \bar{X}_1 + W_2 \bar{X}_2 \\
&= 0.625(5000) + 0.375(4500) \\
&= 4812.50
\end{aligned}$$
The average spending on food over the two strata is R4812.50.
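The suburb calculation can be reproduced in a few lines. This is a minimal sketch using only the figures given in the example; the dictionary names (`N`, `n`, `total`, `W`, `xbar`) are illustrative choices, not notation from the lecture.

```python
# Figures from the two-suburb example.
N = {"A": 250, "B": 150}                    # households in each suburb (stratum sizes)
n = {"A": 110, "B": 80}                     # households sampled in each suburb
total = {"A": 550_000.0, "B": 360_000.0}    # total food expenditure in each sample (rand)

N_total = sum(N.values())
W = {k: N[k] / N_total for k in N}          # stratum weights W_l = N_l / N
xbar = {k: total[k] / n[k] for k in N}      # sample mean of each stratum
xbar_s = sum(W[k] * xbar[k] for k in N)     # stratified estimate of the population mean

print(W)        # weights 0.625 and 0.375
print(xbar)     # stratum means 5000.0 and 4500.0
print(xbar_s)   # 4812.5
```

Note that the weights come from the stratum sizes (250 and 150), not from the sample sizes (110 and 80).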
Can we show this with our example?
The samples gave a stratified estimate of R4812.50, but the actual population mean is unknown.
The unweighted average of the two sample means is
(5000 + 4500)/2 = 4750.00,
which ignores the fact that the suburbs differ in size.
Because the stratified estimate is unbiased, $E(\bar{X}_s) = \mu$, we take R4812.50 as our estimate
of the mean of the 400 households.
Why do we want to know the variance over all the strata? Because we know that if we sample
again within the strata we will get other values.
Can we find the variance of our stratified mean $\bar{X}_s = 4812.50$? That is, $\mathrm{Var}(\bar{X}_s)$?
Yes, if we have the population variance of each stratum or the sample variance of each
stratum.
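$$\mathrm{Var}(\bar{X}_s) = W_1^2\left(\frac{1}{n_1}\right)\left(1 - \frac{n_1 - 1}{N_1 - 1}\right)\sigma_1^2 + W_2^2\left(\frac{1}{n_2}\right)\left(1 - \frac{n_2 - 1}{N_2 - 1}\right)\sigma_2^2$$
The finite population correction of each stratum uses that stratum's own size, $N_1 = 250$ and $N_2 = 150$:
$$\begin{aligned}
\mathrm{Var}(\bar{X}_s) &= W_1^2\left(\frac{1}{110}\right)\left(1 - \frac{110 - 1}{250 - 1}\right)\sigma_1^2 + W_2^2\left(\frac{1}{80}\right)\left(1 - \frac{80 - 1}{150 - 1}\right)\sigma_2^2 \\
&= W_1^2(0.00511)\sigma_1^2 + W_2^2(0.00587)\sigma_2^2
\end{aligned}$$
Recall:
$$W_1 = \frac{250}{400} = 0.625 \quad \text{and} \quad W_2 = \frac{150}{400} = 0.375$$
so that
$$\begin{aligned}
\mathrm{Var}(\bar{X}_s) &= (0.625)^2(0.00511)\sigma_1^2 + (0.375)^2(0.00587)\sigma_2^2 \\
&= 0.00200\sigma_1^2 + 0.00083\sigma_2^2
\end{aligned}$$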
Hence, if we have the sample variance of a stratum, $s_l^2$, we can use it to estimate $\sigma_l^2$, where $l = 1, 2$.
Substituting the sample variances for the population variances gives an estimated value of the
variance of the sample mean across the strata.
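The coefficients that multiply the stratum variances can be computed directly. The sketch below assumes the finite population correction is taken with respect to each stratum's own size $N_l$ (250 and 150 households), as in the general formula $\sum_l W_l^2 \frac{1}{n_l}\left(1 - \frac{n_l - 1}{N_l - 1}\right)\sigma_l^2$. The function name `stratified_mean_variance` is a hypothetical helper, and no sample variances are assumed, since the example does not give them.

```python
def stratified_mean_variance(N, n, var):
    """Variance of the stratified mean with finite population correction.

    N   : stratum sizes N_l
    n   : sample sizes n_l
    var : stratum variances (population sigma_l^2, or sample s_l^2 as estimates)
    """
    N_total = sum(N)
    return sum(
        (N_l / N_total) ** 2 / n_l * (1 - (n_l - 1) / (N_l - 1)) * v_l
        for N_l, n_l, v_l in zip(N, n, var)
    )

N = [250, 150]   # households in Sub A and Sub B
n = [110, 80]    # households sampled in Sub A and Sub B

# Setting one variance to 1 and the other to 0 isolates each coefficient.
c1 = stratified_mean_variance(N, n, [1.0, 0.0])   # coefficient of sigma_1^2
c2 = stratified_mean_variance(N, n, [0.0, 1.0])   # coefficient of sigma_2^2
print(c1, c2)
```

Once sample variances $s_1^2$ and $s_2^2$ are available, passing them as `var` gives the estimated variance of the stratified mean.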
Note:
The tests and the exam will be mainly applications of the theory presented. Hence, study
the examples carefully.
Note that the population parameters are known.
Now we apply simple random sampling within each of the strata.
This simply asks: if a population consists of three strata of sizes $N_1$, $N_2$, $N_3$, how
large should the sample drawn from each stratum be? That is, how large must $n_1$ be if we
sample from the stratum of size $N_1$?
Recall: This is the variance of the stratified sample mean.
$$\mathrm{Var}(\bar{X}_s) \approx \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l}$$
where
• $W_l = \frac{N_l}{N}$, where $N_l$ is the size of stratum $l$ and $N$ is the population total
• $\sigma_l^2$ is the variance of the stratum
• $n_l$ is the size of the random sample drawn from the stratum of size $N_l$
Example:
We have 3 natural strata in a population. Three samples are drawn, one from each of the
strata.
If $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, then the weights of the strata are
$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3666$$
We also have $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$.
Now we need to sample from each stratum. Let us take a sample of 60 from each
stratum.
$$\begin{aligned}
\mathrm{Var}(\bar{X}_s) &\approx \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l} \\
&= \frac{(0.3333)^2(10)}{60} + \frac{(0.3)^2(8)}{60} + \frac{(0.3666)^2(11)}{60} \\
&= \frac{0.1111(10) + 0.09(8) + 0.1344(11)}{60} \\
&= \frac{1.1110 + 0.72 + 1.4784}{60} \\
&= \frac{3.3094}{60} \\
&= 0.0552
\end{aligned}$$
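The equal-allocation calculation can be checked with a short script. This is a minimal sketch of $\sum_l W_l^2\sigma_l^2/n_l$ (finite population correction neglected) using the three-stratum figures from the example; the variable names are illustrative, and the weights are kept unrounded.

```python
N = [100, 90, 110]        # stratum sizes N_l
var = [10.0, 8.0, 11.0]   # stratum variances sigma_l^2
n_equal = [60, 60, 60]    # 60 observations from each stratum

N_total = sum(N)
W = [N_l / N_total for N_l in N]   # weights W_l = N_l / N

# Var(X_bar_s) ~ sum of W_l^2 * sigma_l^2 / n_l
var_equal = sum(w ** 2 * v / n_l for w, v, n_l in zip(W, var, n_equal))
print(round(var_equal, 4))   # 0.0552
```

Changing `n_equal` to any other list of sample sizes gives the variance for that allocation.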
However, suppose we cannot simply sample equally from each stratum because of constraints on
money and time. Then we can obtain good sample sizes from the following result, in which the
total number that we may sample is constrained.
Let's apply Theorem A by choosing the appropriate sample size for each stratum, using
the previous values.
$$n_l = n\,\frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$
where
• $n$ is the total sample size that our resources allow
• $W_l$ and $\sigma_l$ are as before
If we allow $n = 200$, so that $n_1 + n_2 + n_3 = 200$, then we have to find the values of the $n_l$
so that their sum is equal to 200.
For the sample size from stratum 1:
$$n_1 = \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{n W_1 \sigma_1}{W_1 \sigma_1 + W_2 \sigma_2 + W_3 \sigma_3}$$
We have $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, so
$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3666$$
With $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$, the standard deviations are
$$\sigma_1 = 3.1623, \quad \sigma_2 = 2.8284 \quad \text{and} \quad \sigma_3 = 3.3166$$
The weight of each sample size is
$$w_l = \frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$
Then,
$$\begin{aligned}
n_1 &= \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} \\
&= \frac{200(0.3333)(3.1623)}{0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3666 \times 3.3166} \\
&= \frac{210.7989}{1.0539 + 0.8485 + 1.2158} \\
&= \frac{210.7989}{3.1182} \\
&= 67.603 \approx 68
\end{aligned}$$
$$\begin{aligned}
n_2 &= \frac{n W_2 \sigma_2}{\sum_{k=1}^{3} W_k \sigma_k} \\
&= \frac{200(0.3)(2.8284)}{0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3666 \times 3.3166} \\
&= \frac{169.704}{1.0539 + 0.8485 + 1.2158} \\
&= \frac{169.704}{3.1182} \\
&= 54.424 \approx 54
\end{aligned}$$
$$\begin{aligned}
n_3 &= \frac{n W_3 \sigma_3}{\sum_{k=1}^{3} W_k \sigma_k} \\
&= \frac{200(0.3666)(3.3166)}{3.1182} \\
&= \frac{243.173}{3.1182} \\
&= 77.985 \approx 78
\end{aligned}$$
Check:
$$n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$$
These sample sizes make the best use of the resources we have: we sample only these numbers
from the strata of the population (where the total is 300), since we can
only afford to sample 200 of the 300.
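The allocation rule $n_l = n W_l\sigma_l / \sum_k W_k\sigma_k$ is easy to script. The sketch below uses the figures from the example with unrounded weights and standard deviations, so the intermediate values can differ slightly in the later decimals from a hand calculation with rounded inputs. The function name `neyman_allocation` is an illustrative choice.

```python
import math

def neyman_allocation(N, var, n):
    """Return the (unrounded) sample sizes n_l = n * W_l*sigma_l / sum_k W_k*sigma_k."""
    N_total = sum(N)
    products = [(N_l / N_total) * math.sqrt(v) for N_l, v in zip(N, var)]  # W_l * sigma_l
    total = sum(products)
    return [n * p / total for p in products]

alloc = neyman_allocation([100, 90, 110], [10.0, 8.0, 11.0], 200)
rounded = [round(a) for a in alloc]

print([round(a, 2) for a in alloc])   # about 67.6, 54.42 and 77.99
print(rounded, sum(rounded))          # [68, 54, 78] 200
```

Rounding each $n_l$ separately happens to sum to 200 here; in general the rounded values may need a small adjustment to hit the target total.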
Neyman optimal allocation
Recall that the weights
$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3666$$
reflect the sizes of the strata, and that we have found the optimal sample sizes
$$n_1 = 68, \quad n_2 = 54 \quad \text{and} \quad n_3 = 78$$
Hence,
$$\begin{aligned}
\mathrm{Var}(\bar{X}_{so}) &= \frac{\left(\sum_{l=1}^{L} W_l \sigma_l\right)^2}{n} \\
&= \frac{\left(\sum_{k=1}^{3} W_k \sigma_k\right)^2}{200} \\
&= \frac{(0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3666 \times 3.3166)^2}{200} \\
&= \frac{(3.1182)^2}{200} \\
&= \frac{9.7232}{200} \\
&= 0.0486
\end{aligned}$$
Comparing this with the variance of the stratified mean under equal allocation:
$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{so}) = 0.0486$$
The variance of the stratified mean under optimal allocation is smaller than the variance under
equal allocation. Good. Keep in mind that the equal-allocation figure was based on 180 observations in total (60 per stratum) rather than 200.
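As a quick check, the optimal-allocation variance $(\sum_l W_l\sigma_l)^2/n$ can be compared with the variance obtained by plugging whole-number sample sizes into the general formula $\sum_l W_l^2\sigma_l^2/n_l$. The sketch below assumes the integer allocation 68, 54, 78 (the Neyman sample sizes rounded to whole numbers, summing to 200) and uses unrounded weights; all variable names are illustrative.

```python
import math

N = [100, 90, 110]        # stratum sizes
var = [10.0, 8.0, 11.0]   # stratum variances sigma_l^2
n = 200                   # total sample size

N_total = sum(N)
W = [N_l / N_total for N_l in N]

# Theoretical variance under Neyman allocation: (sum of W_l * sigma_l)^2 / n
sigma_bar = sum(w * math.sqrt(v) for w, v in zip(W, var))
var_opt = sigma_bar ** 2 / n

# Variance actually achieved with integer sample sizes (assumed: 68, 54, 78)
n_int = [68, 54, 78]
var_int = sum(w ** 2 * v / n_l for w, v, n_l in zip(W, var, n_int))

print(round(var_opt, 4), round(var_int, 4))   # both round to 0.0486
```

The integer allocation cannot beat the theoretical optimum, but here it comes very close, so the rounding costs almost nothing.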
Proportional Allocation
Example:
Recall:
$$n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$$
and if $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, then the sampling fractions differ between the strata:
$$\frac{n_1}{N_1} = 0.68 \neq \frac{n_2}{N_2} = 0.60 \neq \frac{n_3}{N_3} = 0.7091$$
Then we can use $n_l = n\,\frac{N_l}{N}$:
$$n_1 = 200 \cdot \frac{100}{300} = 66.7 \approx 67, \quad n_2 = 200 \cdot \frac{90}{300} = 60 \quad \text{and} \quad n_3 = 200 \cdot \frac{110}{300} = 73.3 \approx 73$$
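Proportional allocation needs only the stratum sizes. A minimal sketch with the figures from the example follows; the variable names are illustrative.

```python
N = [100, 90, 110]   # stratum sizes N_l
n = 200              # total sample size

N_total = sum(N)
alloc = [n * N_l / N_total for N_l in N]   # n_l = n * N_l / N
rounded = [round(a) for a in alloc]

print([round(a, 1) for a in alloc])   # [66.7, 60.0, 73.3]
print(rounded, sum(rounded))          # [67, 60, 73] 200

# Before rounding, every stratum has the same sampling fraction n_l / N_l = n / N.
fractions = [a / N_l for a, N_l in zip(alloc, N)]
print(fractions)
```

Unlike the optimal allocation, this rule does not use the stratum variances at all.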
These are the sample sizes for proportional allocation. We need them to find the mean of each
stratum sample and then the proportional stratified mean.
Mean of each sample from a stratum
$$\begin{aligned}
\bar{X}_{sp} &= \sum_{l=1}^{L} W_l \bar{X}_l \\
&= W_1 \bar{X}_1 + W_2 \bar{X}_2 + W_3 \bar{X}_3 \\
&= 0.3333 \cdot \frac{1}{67}\sum_{i=1}^{67} X_{i1} + 0.3 \cdot \frac{1}{60}\sum_{i=1}^{60} X_{i2} + 0.3666 \cdot \frac{1}{73}\sum_{i=1}^{73} X_{i3} \\
&= 0.0050\sum_{i=1}^{67} X_{i1} + 0.005\sum_{i=1}^{60} X_{i2} + 0.005\sum_{i=1}^{73} X_{i3}
\end{aligned}$$
If we have the data then we can complete the equation and find $\bar{X}_{sp}$.
$$\mathrm{Var}(\bar{X}_{sp}) = \frac{1}{n}\sum_{l=1}^{L} W_l \sigma_l^2$$
Recall: $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$,
also $W_1 = \frac{100}{300} = 0.3333$, $W_2 = \frac{90}{300} = 0.3$ and $W_3 = \frac{110}{300} = 0.3666$.
$$\begin{aligned}
\mathrm{Var}(\bar{X}_{sp}) &= \frac{1}{n}\left(W_1 \sigma_1^2 + W_2 \sigma_2^2 + W_3 \sigma_3^2\right) \\
&= \frac{1}{200}(0.3333 \times 10 + 0.3 \times 8 + 0.3666 \times 11) \\
&= \frac{1}{200}(3.333 + 2.4 + 4.0326) \\
&= \frac{1}{200}(9.7656) \\
&= 0.04883
\end{aligned}$$
Summary:
$$\mathrm{Var}(\bar{X}_s) = 0.0552$$
$$\mathrm{Var}(\bar{X}_{so}) = 0.0486$$
$$\mathrm{Var}(\bar{X}_{sp}) = 0.0488$$
$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{sp}) = 0.0488 \approx \mathrm{Var}(\bar{X}_{so}) = 0.0486$$
Since the optimal and the proportional allocation methods yield almost equal variances, both
smaller than the variance of the stratified mean under equal allocation, either one can be used.
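Why the two allocations are so close here can be seen numerically. When the finite population correction is neglected, a standard result is that the proportional-allocation variance exceeds the optimal one by $\frac{1}{n}\sum_l W_l(\sigma_l - \bar{\sigma})^2$, where $\bar{\sigma} = \sum_l W_l\sigma_l$. The symbol $\bar{\sigma}$ and the variable names below are notation introduced for this sketch, not taken from the lecture.

```python
import math

N = [100, 90, 110]        # stratum sizes
var = [10.0, 8.0, 11.0]   # stratum variances sigma_l^2
n = 200                   # total sample size

N_total = sum(N)
W = [N_l / N_total for N_l in N]
sigma = [math.sqrt(v) for v in var]

sigma_bar = sum(w * s for w, s in zip(W, sigma))     # weighted mean of the stratum SDs
var_prop = sum(w * v for w, v in zip(W, var)) / n    # proportional allocation
var_opt = sigma_bar ** 2 / n                         # optimal (Neyman) allocation

# Extra variance of proportional allocation: spread of the sigma_l around sigma_bar
gap = sum(w * (s - sigma_bar) ** 2 for w, s in zip(W, sigma)) / n

print(round(var_prop, 4), round(var_opt, 4))   # 0.0488 0.0486
print(var_prop - var_opt, gap)                 # the two numbers agree
```

The stratum standard deviations (about 3.16, 2.83 and 3.32) are similar, so the gap is tiny; strata with very different standard deviations would make optimal allocation noticeably better.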
___________________________________________________________________________
The End