Lecture 5 Normal Approx and Stratified Sampling

Lecture 5 (Survey Sampling continued): Variance and Stratified Sampling

Recap from Lecture 4

Lecture 4 was all about the population parameters $\mu$, $\sigma^2$ and $\tau$. Please remember that these are almost always unknown. For the sake of exercises we give the population parameters, but in real life we do not know them and therefore we need to estimate them.

We also introduced sampling with and without replacement, and we are always interested in the variance of the sample mean. Again, we can get this variance only IF we have the value of the population parameter $\sigma^2$. Otherwise we use the sample to determine the sample variance $s^2$ and use that to find the estimated variance of the sample mean:

$$\mathrm{Var}(\bar{X}) = \frac{s^2}{n}$$

We are still continuing with Chapter 7 and are busy with Section 7.3.

Survey Sampling (Rice, Chapter 7)
• 7.2 Population Parameters
• 7.3 Simple Random Sampling
• 7.3.1 Expectation and variance of the sample mean
• 7.3.2 Estimation of the population variance
• 7.3.3 Normal approximation to the sampling distribution of the mean
• Stratified Sampling

Normal approximation to the sampling distribution of the mean

Please revise Chapter 8 of STA221. The takeaway: We are not doing ________

Stratified Sampling

Study this carefully and make sure you understand each aspect of the notation.

In a town, we divide the town into two suburbs (the strata). Sub A has 250 households and Sub B has 150 households, so $N = 400$. Give the fraction of the population in each of the two strata, $W_l$, $l = 1, 2$.

$$W_1 = \frac{250}{400} = 0.625 \quad \text{and} \quad W_2 = \frac{150}{400} = 0.375$$

Determine the population mean.

$$\mu = \sum_{l=1}^{2} W_l \mu_l = W_1 \mu_1 + W_2 \mu_2 = 0.625\mu_1 + 0.375\mu_2$$

(We do not know the population mean in each stratum.)

A simple random sample of household expenditure on food, $X$, is taken from the households of each suburb: 110 households from Sub A and 80 households from Sub B.
The total expenditure in the samples is R550 000.00 for Sub A (110 households) and R360 000.00 for Sub B (80 households). Give the mean expenditure of the suburbs.

$$\bar{X}_1 = \frac{1}{110}(550000.00) = 5000$$

The average spending on food is R5000.00 in Suburb A.

$$\bar{X}_2 = \frac{1}{80}(360000.00) = 4500$$

The average spending on food is R4500.00 in Suburb B.

What is the overall mean of the two strata?

$$\bar{X}_s = \sum_{l=1}^{2} W_l \bar{X}_l = W_1 \bar{X}_1 + W_2 \bar{X}_2 = 0.625(5000) + 0.375(4500) = 4812.50$$

The average spending on food over the two strata is R4812.50.

Can we show this with our example? The samples gave a stratified mean of R4812.50, but the actual population mean is unknown. The simple unweighted average of the two sample means is $(5000 + 4500)/2 = 4750.00$, which ignores the sizes of the suburbs. Since $\bar{X}_s$ is an unbiased estimator of $\mu$, that is $E(\bar{X}_s) = \mu$, we use R4812.50 as our estimate of the mean of the 400 households.

Why do we want to know the variance over all the strata? Because we know that if we sample again within the strata we will get other values. Can we find the variance of our stratified mean $\bar{X}_s = 4812.50$, that is $\mathrm{Var}(\bar{X}_s)$?

Yes, if we have the population variance of each stratum or the sample variance of each stratum.

$$\mathrm{Var}(\bar{X}_s) = \sum_{l=1}^{2} W_l^2 \frac{1}{n_l}\left(1 - \frac{n_l - 1}{N_l - 1}\right)\sigma_l^2$$

The finite population correction in each term uses the size of that stratum, $N_1 = 250$ and $N_2 = 150$, not the population total of 400.

$$\mathrm{Var}(\bar{X}_s) = W_1^2 \left(\frac{1}{110}\right)\left(1 - \frac{110 - 1}{250 - 1}\right)\sigma_1^2 + W_2^2 \left(\frac{1}{80}\right)\left(1 - \frac{80 - 1}{150 - 1}\right)\sigma_2^2$$

$$= W_1^2 (0.0051)\sigma_1^2 + W_2^2 (0.0059)\sigma_2^2$$

Recall: $W_1 = \frac{250}{400} = 0.625$ and $W_2 = \frac{150}{400} = 0.375$.

$$= (0.625)^2 (0.0051)\sigma_1^2 + (0.375)^2 (0.0059)\sigma_2^2 = 0.0020\sigma_1^2 + 0.00083\sigma_2^2$$

Hence, if we have the sample variance of a stratum, $s_l^2$, we can use it to estimate $\sigma_l^2$, where $l = 1, 2$. Substituting the sample variances for these values gives an estimated value of the variance of the sample mean across the strata.

Note: The tests and the exam will be mainly applications of the theory presented. Hence, study the examples carefully. Note that in the examples that follow the population parameters are known.

Now we apply simple random sampling within each of the strata.
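The two-suburb calculation above can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and it assumes the finite population correction in each term uses the stratum size $N_l$ (250 and 150). It prints the weights, the stratum sample means, the stratified mean, and the coefficients that multiply $\sigma_1^2$ and $\sigma_2^2$ in $\mathrm{Var}(\bar{X}_s)$.

```python
# Two-suburb example: stratum sizes, sample sizes and sample totals from the text.
N_l = [250, 150]                  # households in Sub A and Sub B
n_l = [110, 80]                   # households sampled in each suburb
totals = [550_000.0, 360_000.0]   # total food expenditure in each sample (rand)

N = sum(N_l)
W = [N_i / N for N_i in N_l]                  # stratum weights W_l = N_l / N
xbar = [t / n for t, n in zip(totals, n_l)]   # stratum sample means
xbar_s = sum(w * x for w, x in zip(W, xbar))  # stratified estimate of mu

# Coefficient of sigma_l^2 in Var(Xbar_s).
# Assumption: the finite population correction uses the stratum size N_l.
coef = [
    w**2 * (1 / n) * (1 - (n - 1) / (N_i - 1))
    for w, n, N_i in zip(W, n_l, N_l)
]

print("weights:", W)
print("stratum means:", xbar)
print("stratified mean:", xbar_s)
print("variance coefficients:", [round(c, 5) for c in coef])
```

The stratified mean comes out as 4812.5, and the coefficients are roughly 0.0020 and 0.00083. Multiplying them by the stratum variances (or their sample estimates $s_l^2$) and adding gives the variance of the stratified mean.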
This simply says that if a population consists of three strata of sizes $N_1$, $N_2$, $N_3$, we must decide how large a sample to choose from each stratum. That is, how large must $n_1$ be if we sample from $N_1$?

Recall: this is the variance of the stratified sample mean (ignoring the finite population correction).

$$\mathrm{Var}(\bar{X}_s) \approx \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l}$$

where
• $W_l = N_l / N$, where $N_l$ is the size of stratum $l$ and $N$ is the population total
• $\sigma_l^2$ is the variance of stratum $l$
• $n_l$ is the size of the random sample drawn from the $N_l$ units of stratum $l$

Example: We have 3 natural strata in a population. Three samples are drawn, one from each of the strata. If $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, then the weights of the strata are

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

Also we have $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$.

Now we need to sample from each stratum. Let us take a sample of 60 from each stratum.

$$\mathrm{Var}(\bar{X}_s) \approx \sum_{l=1}^{3} \frac{W_l^2 \sigma_l^2}{n_l} = \frac{(0.3333)^2 (10)}{60} + \frac{(0.3)^2 (8)}{60} + \frac{(0.3667)^2 (11)}{60}$$

$$= \frac{1.1109 + 0.7200 + 1.4792}{60} = \frac{3.3101}{60}$$

$$\mathrm{Var}(\bar{X}_s) = 0.0552$$

However, say we cannot simply sample equally from each stratum because of constraints of money and time. Then we can get good sample sizes using the following result, where we constrain the total number that we may sample. Let us apply Theorem A by choosing the appropriate sample size for each stratum, using the previous values.

$$n_l = \frac{n W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$

where
• $n$ is the total sample size that our resources allow
• $W_l$ and $\sigma_l$ are as before

If we allow $n = 200$ so that $n_1 + n_2 + n_3 = 200$, then we have to find the values of the $n_l$ so that their sum is equal to 200.
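The equal-allocation variance from the three-stratum example can be reproduced in a few lines. This is a sketch with illustrative variable names; it uses the approximate formula $\sum_l W_l^2 \sigma_l^2 / n_l$, so the finite population correction is ignored, and it works with the exact weights rather than the four-decimal values used by hand.

```python
# Three-stratum example from the text.
N_l = [100, 90, 110]          # stratum sizes
sigma2 = [10.0, 8.0, 11.0]    # stratum variances sigma_l^2
n_l = [60, 60, 60]            # equal allocation: 60 sampled from each stratum

N = sum(N_l)
W = [N_i / N for N_i in N_l]  # stratum weights W_l = N_l / N

# Var(Xbar_s) is approximately the sum of W_l^2 * sigma_l^2 / n_l
# (finite population correction ignored).
var_equal = sum(w**2 * s2 / n for w, s2, n in zip(W, sigma2, n_l))

print(round(var_equal, 4))
```

With exact weights the sum in the numerator is 3.31, so the variance is 3.31/60, which rounds to 0.0552.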
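For the sample size from stratum 1:

$$n_1 = \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{n W_1 \sigma_1}{W_1 \sigma_1 + W_2 \sigma_2 + W_3 \sigma_3}$$

We have $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, so that

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

With $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$ we get $\sigma_1 = 3.1623$, $\sigma_2 = 2.8284$ and $\sigma_3 = 3.3166$.

The weight of each sample size is

$$w_l = \frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$

Then,

$$n_1 = \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3333)(3.1623)}{0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3667 \times 3.3166} = \frac{210.7989}{1.0540 + 0.8485 + 1.2162} = \frac{210.7989}{3.1187}$$

$$n_1 = 67.6 \approx 68$$

$$n_2 = \frac{n W_2 \sigma_2}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3)(2.8284)}{3.1187} = \frac{169.704}{3.1187}$$

$$n_2 = 54.4 \approx 54$$

$$n_3 = \frac{n W_3 \sigma_3}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3667)(3.3166)}{3.1187} = \frac{243.2394}{3.1187}$$

$$n_3 = 78.0 \approx 78$$

Check: $n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$

This means that these sample sizes make good use of the resources we have: the population total is 300, we can afford to sample only 200 of the 300, and these are the numbers to take from each stratum.

Neyman optimal allocation

Recall that the weights

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

were for the sizes of the strata. But we have now found the optimal sample sizes $n_1 = 68$, $n_2 = 54$ and $n_3 = 78$. Hence,

$$\mathrm{Var}(\bar{X}_{so}) = \frac{\left(\sum_{l=1}^{L} W_l \sigma_l\right)^2}{n} = \frac{\left(\sum_{k=1}^{3} W_k \sigma_k\right)^2}{200}$$

$$= \frac{(0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3667 \times 3.3166)^2}{200} = \frac{(3.1187)^2}{200} = \frac{9.7263}{200}$$

$$\mathrm{Var}(\bar{X}_{so}) = 0.0486$$

Compare this with the variance of the stratified mean under equal allocation (60 from each stratum):

$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{so}) = 0.0486$$

The variance of the stratified mean under optimal allocation is smaller than under equal allocation. Good.

Proportional Allocation

Example: Recall that $n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$ and $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$. Then

$$\frac{n_1}{N_1} = 0.68 \ne \frac{n_2}{N_2} = 0.60 \ne \frac{n_3}{N_3} = 0.7091$$

so the sampling fractions under optimal allocation are not equal. For proportional allocation we use the same sampling fraction in every stratum:

$$n_l = n \frac{N_l}{N}$$

$$n_1 = 200 \cdot \frac{100}{300} = 66.7 \approx 67, \quad n_2 = 200 \cdot \frac{90}{300} = 60 \quad \text{and} \quad n_3 = 200 \cdot \frac{110}{300} = 73.3 \approx 73$$

These are the sample sizes for the proportional allocation.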
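The two allocation rules for the three-stratum example can be compared side by side in code. The sketch below uses illustrative variable names and exact weights, so intermediate values may differ in the last digit from hand calculations that round the weights to four decimals. It computes the Neyman sizes $n_l = n W_l \sigma_l / \sum_k W_k \sigma_k$, the proportional sizes $n_l = n N_l / N$, and the variance under Neyman allocation (finite population correction ignored).

```python
import math

# Three-stratum example from the text.
N_l = [100, 90, 110]          # stratum sizes
sigma2 = [10.0, 8.0, 11.0]    # stratum variances sigma_l^2
n = 200                       # total sample size the resources allow

N = sum(N_l)
W = [N_i / N for N_i in N_l]
sigma = [math.sqrt(s2) for s2 in sigma2]

# Neyman allocation: n_l = n * W_l * sigma_l / sum_k (W_k * sigma_k)
denom = sum(w * s for w, s in zip(W, sigma))
n_neyman = [n * w * s / denom for w, s in zip(W, sigma)]

# Proportional allocation: n_l = n * N_l / N
n_prop = [n * w for w in W]

# Variance of the stratified mean under Neyman allocation (fpc ignored).
var_neyman = denom**2 / n

print("Neyman:", [round(x, 1) for x in n_neyman], "->", [round(x) for x in n_neyman])
print("Proportional:", [round(x, 1) for x in n_prop], "->", [round(x) for x in n_prop])
print("Var under Neyman allocation:", round(var_neyman, 4))
```

The Neyman sizes round to 68, 54 and 78 and the proportional sizes to 67, 60 and 73; both sets add up to 200.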
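We need these sample sizes to find the sample mean of each stratum and then the proportional stratified mean. With $X_{il}$ the $i$-th observation in the sample from stratum $l$:

$$\bar{X}_{sp} = \sum_{l=1}^{L} W_l \bar{X}_l = W_1 \bar{X}_1 + W_2 \bar{X}_2 + W_3 \bar{X}_3$$

$$= 0.3333 \cdot \frac{1}{67} \sum_{i=1}^{67} X_{i1} + 0.3 \cdot \frac{1}{60} \sum_{i=1}^{60} X_{i2} + 0.3667 \cdot \frac{1}{73} \sum_{i=1}^{73} X_{i3}$$

$$\bar{X}_{sp} = 0.0050 \sum_{i=1}^{67} X_{i1} + 0.0050 \sum_{i=1}^{60} X_{i2} + 0.0050 \sum_{i=1}^{73} X_{i3}$$

If we have the data then we can complete the equation and find $\bar{X}_{sp}$.

The variance of the stratified mean under proportional allocation is

$$\mathrm{Var}(\bar{X}_{sp}) = \frac{\sum_{l=1}^{L} W_l \sigma_l^2}{n}$$

Recall: $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$, and also $W_1 = \frac{100}{300} = 0.3333$, $W_2 = \frac{90}{300} = 0.3$ and $W_3 = \frac{110}{300} = 0.3667$.

$$\mathrm{Var}(\bar{X}_{sp}) = \frac{1}{n}\left(W_1 \sigma_1^2 + W_2 \sigma_2^2 + W_3 \sigma_3^2\right) = \frac{1}{200}(0.3333 \times 10 + 0.3 \times 8 + 0.3667 \times 11)$$

$$= \frac{1}{200}(3.333 + 2.4 + 4.0337) = \frac{1}{200}(9.7667)$$

$$\mathrm{Var}(\bar{X}_{sp}) = 0.0488$$

Summary:

• Equal allocation (60 from each stratum): $\mathrm{Var}(\bar{X}_s) = 0.0552$
• Optimal (Neyman) allocation, $n = 200$: $\mathrm{Var}(\bar{X}_{so}) = \left(\sum_l W_l \sigma_l\right)^2 / n = (3.1187)^2 / 200 = 0.0486$
• Proportional allocation, $n = 200$: $\mathrm{Var}(\bar{X}_{sp}) = 0.0488$

$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{sp}) = 0.0488 \approx \mathrm{Var}(\bar{X}_{so}) = 0.0486$$

Since the optimal and the proportional allocation methods yield almost equal variances, and both are smaller than the variance of the stratified mean under equal allocation, either one can be used. Keep in mind that the equal allocation used $3 \times 60 = 180$ observations in total while the other two used $n = 200$, so part of the reduction comes from the larger total sample.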
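As a final check, the three variances for the three-stratum example can be computed together and ranked. This is a sketch with illustrative names (`var_stratified` is a hypothetical helper, not something from the lecture), using exact weights and ignoring the finite population correction throughout.

```python
import math

# Three-stratum example from the text.
N_l = [100, 90, 110]
sigma2 = [10.0, 8.0, 11.0]
N = sum(N_l)
W = [N_i / N for N_i in N_l]


def var_stratified(n_l):
    """Approximate Var(Xbar_s) = sum of W_l^2 * sigma_l^2 / n_l (fpc ignored)."""
    return sum(w**2 * s2 / n for w, s2, n in zip(W, sigma2, n_l))


n = 200
var_equal = var_stratified([60, 60, 60])                # 180 observations in total
var_prop = sum(w * s2 for w, s2 in zip(W, sigma2)) / n  # proportional allocation
var_neyman = sum(w * math.sqrt(s2) for w, s2 in zip(W, sigma2)) ** 2 / n

for name, v in [("equal", var_equal), ("proportional", var_prop), ("Neyman", var_neyman)]:
    print(f"{name:>12}: {v:.4f}")
```

The ordering is Neyman, then proportional, then equal allocation, with the first two very close to each other.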
