Lecture 5 (Survey Sampling continued)
Variance and Stratified Sampling

Recap from Lecture 4

Lecture 4 was all about the population parameters $\mu$, $\sigma^2$ and $\tau$. Please remember that these are almost always unknown. For the sake of exercises we give the population parameters, but in real life we do not know them and therefore we need to estimate them.

We also introduced sampling with and without replacement, and we are always interested in the variance that the sample mean can have:

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

Again, we can get this only IF we have the population parameter's value, that is, the value of $\sigma^2$. Otherwise we use the sample to determine the sample variance $s^2$ and use that to find the estimated variance of the sample mean.

We are still continuing with Chapter 7 and are busy with Section 7.3.

Survey Sampling (Rice, Chapter 7)
• 7.2 Population Parameters
• 7.3 Simple Random Sampling
  • 7.3.1 Expectation and variance of the sample mean
  • 7.3.2 Estimation of the population variance
  • 7.3.3 Normal approximation to the sampling distribution of the mean
• Stratified Sampling

Please revise Chapter 8 of STA221.

The takeaway: We are not doing
_________________________________________________________________________

Survey Sampling (Rice, Chapter 7)
• 7.2 Population Parameters
• 7.3 Simple Random Sampling
  • 7.3.1 Expectation and variance of the sample mean
  • 7.3.2 Estimation of the population variance
• Stratified Sampling

Study this carefully and make sure you understand each aspect of the notation.

A town is divided into two suburbs (strata). Suburb A has 250 households and Suburb B has 150 households, so $N = 400$.

Give the fraction of the population in each of the two strata, $W_l$, $l = 1, 2$.

$$W_1 = \frac{250}{400} = 0.625 \quad \text{and} \quad W_2 = \frac{150}{400} = 0.375$$

Determine the population mean.

$$\mu = \sum_{l=1}^{2} W_l \mu_l = W_1 \mu_1 + W_2 \mu_2 = 0.625\,\mu_1 + 0.375\,\mu_2$$

(We do not know the population means of the individual strata.)

A simple random sample of household expenditure on food, $X$, is taken from the households of each suburb: 110 households from Suburb A and 80 households from Suburb B.
The total expenditure in the samples is R550 000.00 for Suburb A and R360 000.00 for Suburb B (110 and 80 households respectively).

Give the mean expenditure of each suburb.

$$\bar{X}_1 = \frac{1}{110}(550000.00) = 5000$$

The average spending on food is R5000.00 in Suburb A.

$$\bar{X}_2 = \frac{1}{80}(360000.00) = 4500$$

The average spending on food is R4500.00 in Suburb B.

What is the overall mean of the two strata?

$$\bar{X}_s = \sum_{l=1}^{2} W_l \bar{X}_l = W_1 \bar{X}_1 + W_2 \bar{X}_2 = 0.625(5000) + 0.375(4500) = 4812.50$$

The average spending on food over the two strata is R4812.50.

Can we illustrate the expectation of $\bar{X}_s$ with our example?

The samples gave a stratified mean of R4812.50, but the actual population mean is unknown. The unweighted average of the two sample means is $(5000 + 4500)/2 = 4750.00$, which ignores the fact that the suburbs differ in size. Because $E(\bar{X}_s) = \mu$, the stratified mean is an unbiased estimator, and we therefore take 4812.50 as our estimate of the mean of the 400 households.

Why do we want to know the variance over all the strata? Because we know that if we sample again within the strata we will get other values.

Can we find the variance of our stratified mean $\bar{X}_s = 4812.50$, that is, $\mathrm{Var}(\bar{X}_s)$?

Yes, if we have the population variance of each stratum, or the sample variance of each stratum. The finite population correction in each term uses the size of that stratum, $N_l$:

$$\mathrm{Var}(\bar{X}_s) = \sum_{l=1}^{2} W_l^2 \, \frac{1}{n_l}\left(1 - \frac{n_l - 1}{N_l - 1}\right)\sigma_l^2$$

$$= W_1^2 \left(\frac{1}{110}\right)\left(1 - \frac{110 - 1}{250 - 1}\right)\sigma_1^2 + W_2^2 \left(\frac{1}{80}\right)\left(1 - \frac{80 - 1}{150 - 1}\right)\sigma_2^2$$

$$= W_1^2 (0.0051)\,\sigma_1^2 + W_2^2 (0.0059)\,\sigma_2^2$$

Recall: $W_1 = \frac{250}{400} = 0.625$ and $W_2 = \frac{150}{400} = 0.375$.

$$= (0.625)^2 (0.0051)\,\sigma_1^2 + (0.375)^2 (0.0059)\,\sigma_2^2$$

$$= 0.0020\,\sigma_1^2 + 0.00083\,\sigma_2^2$$

Hence, if we have the sample variance $s_l^2$ of a stratum, we can use it to estimate $\sigma_l^2$, where $l = 1, 2$. Substituting the sample variances into the expression above gives an estimated value of the variance of the sample mean across the strata.

Note: The tests and the exam will be mainly applications of the theory presented. Hence, study the examples carefully. Note that in these examples the population parameters are known.

Now we apply simple random sampling within each of the strata.
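The arithmetic of the suburb example can be checked with a short Python sketch. It is only an illustration: the variable names are not from the lecture, and the variance coefficients assume that the finite population correction is taken with respect to each stratum's own size $N_l$ (250 and 150), as in the standard formula for the variance of a stratified mean.

```python
# Suburb example: stratum sizes, sample sizes and sample totals from the text.
N_l = [250, 150]                  # households in Suburb A and Suburb B
n_l = [110, 80]                   # sampled households per suburb
totals = [550_000.0, 360_000.0]   # total sampled food expenditure (rand)

N = sum(N_l)
W = [Nl / N for Nl in N_l]                     # stratum weights W_l = N_l / N
means = [t / n for t, n in zip(totals, n_l)]   # stratum sample means
strat_mean = sum(w * m for w, m in zip(W, means))

# Coefficient that multiplies sigma_l^2 in Var(X_bar_s).
# Assumption: the finite population correction uses the stratum size N_l.
coeffs = [
    w**2 * (1 / n) * (1 - (n - 1) / (Nl - 1))
    for w, n, Nl in zip(W, n_l, N_l)
]

print(W)            # stratum weights
print(means)        # stratum sample means
print(strat_mean)   # stratified estimate of the overall mean
print(coeffs)       # multiply each by sigma_l^2 (or s_l^2) and add them up
```

Once the sample variances $s_1^2$ and $s_2^2$ are available, the estimated variance is `coeffs[0] * s1_sq + coeffs[1] * s2_sq`, where `s1_sq` and `s2_sq` are hypothetical names for those two values.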
The question is this: if a population consists of three strata of sizes $N_1$, $N_2$, $N_3$, how large should we choose the sample from each stratum to be? That is, how large must $n_1$ be if we sample from the $N_1$ units of stratum 1?

Recall: this is the variance of the stratified sample mean, neglecting the finite population correction.

$$\mathrm{Var}(\bar{X}_s) \approx \sum_{l=1}^{L} \frac{W_l^2 \sigma_l^2}{n_l}$$

where
• $W_l = N_l / N$ is the size of stratum $l$ over the population total
• $\sigma_l^2$ is the variance of stratum $l$
• $n_l$ is the size of the random sample drawn from the $N_l$ units of stratum $l$

Example: We have 3 natural strata in a population. Three samples are drawn, one from each of the strata.

If $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, then the weights of the strata are

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

We also have $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$.

Now we need to sample from each stratum. Let us take a sample of 60 from each stratum.

$$\mathrm{Var}(\bar{X}_s) \approx \sum_{l=1}^{3} \frac{W_l^2 \sigma_l^2}{n_l} = \frac{(0.3333)^2 (10)}{60} + \frac{(0.3)^2 (8)}{60} + \frac{(0.3667)^2 (11)}{60}$$

$$= \frac{0.1111(10) + 0.09(8) + 0.1344(11)}{60} = \frac{1.1111 + 0.7200 + 1.4789}{60} = \frac{3.3100}{60}$$

$$\mathrm{Var}(\bar{X}_s) = 0.0552$$

However, suppose we cannot simply sample equally from each stratum because of constraints on money and time. We can then get good sample sizes using the following result, where we fix the total number that we may sample.

Let us apply Theorem A by choosing the appropriate sample size for each stratum, using the previous values.

$$n_l = n \, \frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$

where
• $n$ is the total sample size that our resources allow
• $W_l$ and $\sigma_l$ are as before

If we allow $n = 200$, so that $n_1 + n_2 + n_3 = 200$, then we have to find the values of the $n_l$ so that their sum is equal to 200.
For the sample size from stratum 1:

$$n_1 = \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{n W_1 \sigma_1}{W_1 \sigma_1 + W_2 \sigma_2 + W_3 \sigma_3}$$

We have $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$, so that

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

With $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$ we get $\sigma_1 = 3.1623$, $\sigma_2 = 2.8284$ and $\sigma_3 = 3.3166$.

The fraction of the total sample allocated to stratum $l$ is

$$w_l = \frac{W_l \sigma_l}{\sum_{k=1}^{L} W_k \sigma_k}$$

Then,

$$n_1 = \frac{n W_1 \sigma_1}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3333)(3.1623)}{0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3667 \times 3.3166} = \frac{210.82}{1.0541 + 0.8485 + 1.2161} = \frac{210.82}{3.1187}$$

$$n_1 = 67.60 \approx 68$$

$$n_2 = \frac{n W_2 \sigma_2}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3)(2.8284)}{3.1187} = \frac{169.70}{3.1187}$$

$$n_2 = 54.42 \approx 54$$

$$n_3 = \frac{n W_3 \sigma_3}{\sum_{k=1}^{3} W_k \sigma_k} = \frac{200(0.3667)(3.3166)}{3.1187} = \frac{243.22}{3.1187}$$

$$n_3 = 77.99 \approx 78$$

Check: $n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$

These are the sample sizes to draw from the three strata (where the population total is 300), given that our resources only allow us to sample 200 of the 300 units.

Neyman optimal allocation

Recall that the weights

$$W_1 = \frac{100}{300} = 0.3333, \quad W_2 = \frac{90}{300} = 0.3 \quad \text{and} \quad W_3 = \frac{110}{300} = 0.3667$$

describe the sizes of the strata. We have now also found the optimal sample sizes $n_1 = 68$, $n_2 = 54$ and $n_3 = 78$. Hence,

$$\mathrm{Var}(\bar{X}_{so}) = \frac{\left(\sum_{l=1}^{L} W_l \sigma_l\right)^2}{n} = \frac{\left(\sum_{l=1}^{3} W_l \sigma_l\right)^2}{200}$$

$$= \frac{(0.3333 \times 3.1623 + 0.3 \times 2.8284 + 0.3667 \times 3.3166)^2}{200} = \frac{(3.1187)^2}{200} = \frac{9.7264}{200}$$

$$\mathrm{Var}(\bar{X}_{so}) = 0.0486$$

Compared with the variance of the stratified mean under equal sample sizes,

$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{so}) = 0.0486$$

The variance of the stratified mean under optimal allocation is smaller than under equal sample sizes. Good.

Proportional Allocation

Example:

Recall $n_1 + n_2 + n_3 = 68 + 54 + 78 = 200$, and $N = 300$ where $N_1 = 100$, $N_2 = 90$, $N_3 = 110$. Then the sampling fractions under optimal allocation differ between the strata:

$$\frac{n_1}{N_1} = 0.68 \ne \frac{n_2}{N_2} = 0.60 \ne \frac{n_3}{N_3} = 0.7091$$

Under proportional allocation we use the same sampling fraction in every stratum:

$$n_l = n \, \frac{N_l}{N}$$

$$n_1 = 200 \cdot \frac{100}{300} = 66.7 \approx 67, \quad n_2 = 200 \cdot \frac{90}{300} = 60 \quad \text{and} \quad n_3 = 200 \cdot \frac{110}{300} = 73.3 \approx 73$$

These are the sample sizes for the proportional allocation.
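The two allocation rules used above can be sketched in a few lines of Python. This is a minimal illustration rather than a general-purpose routine: the variable names are not from the lecture, the finite population correction is neglected, and plain rounding is assumed to give sizes that add up to $n$ (which it does here, but need not in general).

```python
import math

N_l = [100, 90, 110]        # stratum sizes from the example
var_l = [10.0, 8.0, 11.0]   # stratum variances sigma_l^2 from the example
n = 200                     # total sample size the resources allow

N = sum(N_l)
W = [Nl / N for Nl in N_l]             # stratum weights W_l = N_l / N
sd = [math.sqrt(v) for v in var_l]     # stratum standard deviations sigma_l

# Neyman (optimal) allocation: n_l = n * W_l * sigma_l / sum_k W_k * sigma_k
denom = sum(w * s for w, s in zip(W, sd))
neyman_exact = [n * w * s / denom for w, s in zip(W, sd)]

# Proportional allocation: n_l = n * N_l / N
prop_exact = [n * Nl / N for Nl in N_l]

# Plain rounding; in general the rounded sizes may need adjusting to sum to n.
neyman = [round(x) for x in neyman_exact]
prop = [round(x) for x in prop_exact]

# Variance of the stratified mean under Neyman allocation (no correction term)
var_neyman = denom**2 / n

print(neyman, sum(neyman))    # [68, 54, 78] 200
print(prop, sum(prop))        # [67, 60, 73] 200
print(round(var_neyman, 4))   # 0.0486
```

Working from the unrounded weights and standard deviations avoids the small discrepancies that creep in when four-decimal values are carried through by hand.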
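We need these sample sizes to find the mean of each stratum sample and then the proportional stratified mean.

Mean of each sample from a stratum, combined into the stratified mean:

$$\bar{X}_{sp} = \sum_{l=1}^{L} W_l \bar{X}_l = W_1 \bar{X}_1 + W_2 \bar{X}_2 + W_3 \bar{X}_3$$

$$= 0.3333 \cdot \frac{1}{67} \sum_{i=1}^{67} X_{i1} + 0.3 \cdot \frac{1}{60} \sum_{i=1}^{60} X_{i2} + 0.3667 \cdot \frac{1}{73} \sum_{i=1}^{73} X_{i3}$$

$$\bar{X}_{sp} \approx 0.0050 \sum_{i=1}^{67} X_{i1} + 0.0050 \sum_{i=1}^{60} X_{i2} + 0.0050 \sum_{i=1}^{73} X_{i3}$$

Under proportional allocation every observation therefore gets (up to rounding of the $n_l$) the same weight, $1/n = 1/200 = 0.005$. If we have the data, we can complete the equation and find $\bar{X}_{sp}$.

The variance under proportional allocation is

$$\mathrm{Var}(\bar{X}_{sp}) = \frac{1}{n} \sum_{l=1}^{L} W_l \sigma_l^2$$

Recall: $\sigma_1^2 = 10$, $\sigma_2^2 = 8$ and $\sigma_3^2 = 11$, and also $W_1 = \frac{100}{300} = 0.3333$, $W_2 = \frac{90}{300} = 0.3$ and $W_3 = \frac{110}{300} = 0.3667$.

$$\mathrm{Var}(\bar{X}_{sp}) = \frac{1}{n}\left(W_1 \sigma_1^2 + W_2 \sigma_2^2 + W_3 \sigma_3^2\right) = \frac{1}{200}(0.3333 \times 10 + 0.3 \times 8 + 0.3667 \times 11)$$

$$= \frac{1}{200}(3.3333 + 2.4 + 4.0333) = \frac{1}{200}(9.7667)$$

$$\mathrm{Var}(\bar{X}_{sp}) = 0.0488$$

Summary:

• Equal sample sizes (60 per stratum): $\mathrm{Var}(\bar{X}_s) = 0.0552$
• Optimal (Neyman) allocation, $n = 200$: $\mathrm{Var}(\bar{X}_{so}) = 0.0486$
• Proportional allocation, $n = 200$: $\mathrm{Var}(\bar{X}_{sp}) = 0.0488$

$$\mathrm{Var}(\bar{X}_s) = 0.0552 > \mathrm{Var}(\bar{X}_{sp}) = 0.0488 \approx \mathrm{Var}(\bar{X}_{so}) = 0.0486$$

Note that the equal allocation used $3 \times 60 = 180$ observations while the other two used $n = 200$. With 200 observations split equally the variance would be $3.31 \times 3 / 200 \approx 0.0497$, which is still larger than both.

Since the optimal and the proportional allocation methods yield almost equal variances, and both are smaller than the variance of the stratified mean under equal sample sizes, either one can be used.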
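The comparison of the three allocations can be reproduced with the Python sketch below. It is only a check on the arithmetic under stated assumptions: the finite population correction is neglected throughout, the function and variable names are not from the lecture, and the equal split of 200 observations (about 66.7 per stratum) is a hypothetical case included purely to compare like with like.

```python
N_l = [100, 90, 110]        # stratum sizes from the example
var_l = [10.0, 8.0, 11.0]   # stratum variances sigma_l^2 from the example
n = 200                     # total sample size for the optimal and proportional cases

N = sum(N_l)
W = [Nl / N for Nl in N_l]  # stratum weights W_l = N_l / N


def stratified_variance(weights, variances, sample_sizes):
    """Var(X_bar_s) = sum_l W_l^2 * sigma_l^2 / n_l, without a correction term."""
    return sum(w**2 * v / m for w, v, m in zip(weights, variances, sample_sizes))


# Equal allocation as in the lecture: 60 per stratum, 180 in total
equal_180 = stratified_variance(W, var_l, [60, 60, 60])

# Hypothetical equal split of n = 200 (non-integer sizes, for comparison only)
equal_200 = stratified_variance(W, var_l, [n / 3] * 3)

# Proportional allocation: (1/n) * sum_l W_l * sigma_l^2
proportional = sum(w * v for w, v in zip(W, var_l)) / n

# Neyman allocation: (sum_l W_l * sigma_l)^2 / n
neyman = sum(w * v**0.5 for w, v in zip(W, var_l)) ** 2 / n

for name, value in [
    ("equal, 180", equal_180),
    ("equal, 200", equal_200),
    ("proportional", proportional),
    ("Neyman", neyman),
]:
    print(f"{name:>12}: {value:.4f}")
```

The closed-form expressions for the proportional and Neyman cases use the unrounded $n_l$, so they differ very slightly from what the rounded sample sizes would give.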