STT 825 F01 HOMEWORK #6 SOLUTIONS
Due Wednesday, December 5

1. (19 points) We have a population with 8 clusters, and each cluster is composed of two strata. Consider the psu's as brokers and the ssu's as accounts, which are stratified by type of account (I or II). Let y_ij = fee per account.

   Broker (psu)   Number of accounts Mi   Type I   Type II
        1                 100               50        50
        2                 100               50        50
        3                 100               60        40
        4                 100               30        70
        5                 200              100       100
        6                 200              100       100
        7                 200              120        80
        8                 200               40       160

   We will take a with-replacement PPS sample of 2 brokers (psu's), then a stratified random sample (without replacement) of 10 accounts from each selected broker, proportionally allocated.

a. i. If Broker #8 (psu #8) is selected, how many accounts (ssu's) would be selected from Type I and Type II, respectively?
      Select (40/200) x 10 = 2 accounts from Type I and (160/200) x 10 = 8 from Type II.

   ii. What is the probability that Broker #7 is selected for the sample?
      M_7/K = 200/1200 = 1/6 = .166667 = ψ_7, so the selection probability = 1 − (1 − 1/6)² = 11/36 = .3056

   iii. Given that broker #7 is selected, we would then take a stratified random sample of 6 and 4 accounts from Types I and II, respectively. Given that broker #7 is selected, what is the probability that account #26 of Type I is then selected for the sample?
      P(acct #26, Type I selected | Broker #7 selected) = 6/120 = 1/20 = .05

   iv. Give the PPS probabilities for each broker (psu).

      Broker    1      2      3      4      5      6      7      8
      ψ_i      1/12   1/12   1/12   1/12   1/6    1/6    1/6    1/6
      or       .083   .083   .083   .083   .167   .167   .167   .167

b. Suppose we took the sample and selected brokers #3 and #5. The data and descriptive statistics are given below:

      Row    y3Ii    y3IIi    y5Ii    y5IIi
       1    20.40    26.79   15.59    17.12
       2    12.37    19.05   19.28    17.63
       3    17.28    36.39   20.28    16.21
       4     9.14    20.27   33.46    25.01
       5    10.50            30.97    22.52
       6    15.57

   Descriptive Statistics: y3Ii, y3IIi, y5Ii, y5IIi

      Variable   N    Mean    Median   TrMean   StDev   SE Mean
      y3Ii       6   14.21    13.97    14.21    4.30     1.76
      y3IIi      4   25.63    23.53    25.63    7.94     3.97
      y5Ii       5   23.92    20.28    23.92    7.82     3.50
      y5IIi      5   19.70    17.63    19.70    3.85     1.72

   i.
Estimate the total fees for Broker #3 (use stratified sampling methods).
      Broker #3: t̂_3 = 60(14.21) + 40(25.63) = 852.6 + 1025.2 = 1877.8

   ii. Estimate the total fees for Broker #5 (use stratified sampling methods).
      Broker #5: t̂_5 = 100(23.92) + 100(19.70) = 2392 + 1970 = 4362

c. i. Now you have estimated totals for Brokers #3 and #5 (from part (b)). Get an estimate of the total fees for all brokers by combining the 2 estimates using unequal probability sampling formulas.
      t̂ = ½{t̂_3/ψ_3 + t̂_5/ψ_5} = ½{22533.6 + 26172} = 24352.8

   ii. Compute the standard error of the estimator in (i) using unequal probability sampling formulas.
      Just find the sample variance of 22533.6 and 26172: s² = (26172 − 22533.6)²/2 = (2572.7)². Then the estimated variance = ½(2572.7)² = 3,309,489. The SE = $1819.20

d. i. Estimate the mean fee per account by the unbiased estimator.
      = t̂/K = 24,352.8/1200 = $20.29

   ii. Give the standard error of the estimator in (i).
      SE = 1819.20/1200 = $1.52

   iii. Estimate K by the sum of the observed M_i/ψ_i.
      = ½{100/(1/12) + 200/(1/6)} = ½{1200 + 1200} = 1200. Note that the estimated K is equal to K here because we have PPS.

   iv. Estimate the mean fee per account by a ratio estimator (hint: use (iii)).
      = $20.29, since the estimated K is equal to K

e. i. Find π_j|3, the probability that ssu #j is in the sample given psu #3 was selected.
      π_j|3 does not depend on the stratum since we have exact proportional allocation.
      = 6/60 = 4/40 = 1/10 = .10 for both I and II.

   ii. Find π_j|5.
      = 5/100 = 1/20 = .05 for both I and II.

----------------------------------------------------------------------------------------------------
2. (6 points) Do text problem #13.
   We have a PPS sample (self-weighting): t̂ = (K/n) (sum of the observed psu means)

      Unit   14     12     09     14     16     01     14     10     21
      yi    1.75   1.25   .25    .75    1.0    2.25   1.0    1.25   2.0

   Sum = 17.0 (over all n = 10 sampled units)
   answer = (807/10)(17) = 1371.9
   The estimated variance = (K²/n){sample variance of the yi's} = ((807)²/10)(1.462)², which gives SE = 373.
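The combined estimate in Problem 1(c)-(d) is a with-replacement PPS estimator: each draw gives its own estimate t̂_i/ψ_i of the population total, the estimator is their average, and the standard error comes from their sample variance divided by n. A minimal Python sketch of that calculation with the Problem 1 numbers is given here; the function name pps_wr_estimate is hypothetical and introduced only for illustration.

```python
import math
import statistics


def pps_wr_estimate(t_hats, psis):
    """With-replacement PPS estimate of a population total and its SE.

    t_hats: estimated psu totals, one per draw
    psis:   draw probabilities psi_i of the same psus
    (Hypothetical helper for illustration; not part of the assignment.)
    """
    n = len(t_hats)
    # Each draw yields its own estimate of the population total.
    u = [t / p for t, p in zip(t_hats, psis)]
    total = sum(u) / n
    # statistics.variance uses the n - 1 divisor (sample variance).
    se = math.sqrt(statistics.variance(u) / n)
    return total, se


K = 1200                      # total number of accounts over all 8 brokers
t_hats = [1877.8, 4362.0]     # estimated totals for brokers #3 and #5
psis = [100 / K, 200 / K]     # psi_i = M_i / K

total, se = pps_wr_estimate(t_hats, psis)
print("estimated total:", round(total, 1), " SE:", round(se, 1))
print("estimated mean fee:", round(total / K, 2), " SE:", round(se / K, 2))
```

Dividing both the total and its standard error by the known K = 1200 gives the per-account mean and its standard error, as in part (d).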
   (Problem 2 table, continued: the tenth sampled unit is Unit 11, with yi = 5.5.)

----------------------------------------------------------------------------------------------------
3. (9 points) Consider a sample of 4 units as given below.

      i      46     15     07     59
      yi    17.9   18.4   14.1   15.8

a. Suppose the data were obtained from a simple random sample of size 4 from a population of size 80.

      π_i    .05    .05    .05    .05
      w_i    20     20     20     20

   i. Fill in the blanks in the π_i (= probability that the ith unit is in the sample) row above.

   ii. Fill in the blanks in the w_i (= sampling weights) row above.

   iii. Compute the Horvitz-Thompson estimator (show your work) of the population mean.
      t̂ = sum of the weights multiplied by yi = 20(17.9 + 18.4 + 14.1 + 15.8) = 1324.
      Estimated N = sum of the weights = 80.
      The H-T estimator of the mean = 1324/80 = 16.55

   iv. Compute the sample mean and compare with (iii).
      Sample mean = (17.9 + 18.4 + 14.1 + 15.8)/4 = 16.55, the same as (iii).

b. Now suppose that the sample above is a stratified random sample where labels i = 1-30 are from stratum #1 (size 30), and labels i = 31-80 are from stratum #2 (size 50) (note the label numbers of the sample units to identify their stratum).

      π_i    .04    .0667  .0667  .04
      w_i    25     15     15     25

   i. Fill in the blanks in the π_i (= probability that the ith unit is in the sample) row above.

   ii. Fill in the blanks in the w_i (= sampling weights) row above.

   iii. Compute the Horvitz-Thompson estimator (show your work) of the population mean.
      Find the estimate of N = sum of the weights = 80. Then the H-T estimator of the mean = {25(17.9) + 25(15.8) + 15(18.4) + 15(14.1)}/80 = 1330/80 = 16.625

   iv. Compute the unbiased estimator of the population mean and compare with (iii).
      ȳ_str = (N_1/N) ȳ_1 + (N_2/N) ȳ_2 = (30/80)(16.25) + (50/80)(16.85) = 16.625, the same as (iii).

----------------------------------------------------------------------------------------------------
4. (7 points) Consider the population of brokers and accounts in Problem #1.
   Suppose we took a without-replacement PPS sample of n = 2 brokers. The π_i's are given below.

      i       1       2       3       4       5       6       7       8
      π_i   .1732   .1732   .1732   .1732   .3276   .3276   .3276   .3276

a. Keep the 2nd stage sampling the same as in #1.

   i. Find the sampling weights for broker #3 accounts.
      w_3 w_j|3 = (1/.1732)(10) = 57.737

   ii. Find the sampling weights for broker #5 accounts.
      w_5 w_j|5 = (1/.3276)(20) = 61.05

b. i. Compute the Horvitz-Thompson estimator of the total fees using the weights formula.
      = 57.737 (sum of broker 3 sample fees) + 61.05 (sum of broker 5 sample fees)
      = 57.737(187.78) + 61.05(218.10) ≈ 24,157
      (the sample sums are t̂_3/10 and t̂_5/20 from Problem 1(b))

   ii. Use the with-replacement formula to get the approximate standard error of your estimator in (i).

      i     t̂_i      π_i/n               t̂_i/(π_i/n)
      3    1877.8    .1732/2 = .0866      21683.6
      5    4362      .3276/2 = .1638      26630

      Find the sample variance of the 2 values 21683.6 and 26630: = 12,233,450
      Estimated variance of t̂_HT = ½(1 − 2/8)(12,233,450) = 4,587,543.7, SE = 2142

c. Compute the Horvitz-Thompson estimator of the mean fee per account.
      Must find K̂_HT = sum of the weights = 57.737(10) + 61.05(10) = 1187.87.
      HT estimator of the mean = 24,157/1187.87 = $20.34

----------------------------------------------------------------------------------------------------
5. (9 points) The target population is adults in a designated City. The City is first divided into 3 geographic areas consisting of 100, 300, and 600 blocks each. A simple random sample of 40, 40, and 60 blocks is taken from each area, respectively. Then for each selected block, a simple random sample of 10 households is taken. For each household, one adult is selected at random for an interview.

a. A subset of the observations is given below. Give the sampling weight for each of the listed observational units.

      Area    Block   Number of    Household   Number of adults   Adult   Sampling
      Label   Label   Households   Label       in household       Label   Weight
        1       35       120          95              3             2     (2.5)(12)(3) = 90
        2      297        56          06              4             3     (7.5)(5.6)(4) = 168
        3      488        52          20              1             1     (10)(5.2)(1) = 52

b. Which geographic area has been OVERSAMPLED?
      Proportional allocation: 10%, 30%, 60% of 140, or 14, 42, 84. Area 1 (n_1 = 40) is oversampled.

c. Suppose we decided to post-stratify by type of housing, and this further subdivided the geographic areas into single family/multiple family housing. In Geographic area #1, there are 40 blocks of single family and 60 blocks of multiple family housing, and the sample of blocks from Geographic area #1 had 15 single family blocks and 25 multiple. Block #35 (in part (a)) is multiple. Give a new sampling weight for this adult (Area #1, Block #35, Household #95, Adult #2) with post-stratification.
      The post-stratification occurs at the psu (block) level. When you combine the geographic strata and the type of housing strata, you get 6 strata. Then "Area 1 multiple" has 60 blocks in the population, and 25 in the sample. The new weight for the stage 1 psu is then 60/25 = 2.4 (which would then replace the 2.5 in the first line product).
      ans. (2.4)(12)(3) = 86.4
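
The sampling weights in Problem 5 are products of one factor per stage, each factor being (number of units available)/(number of units sampled) at that stage. A minimal Python sketch of this bookkeeping follows; the function names stage_weight and adult_weight are hypothetical and used only for illustration.

```python
def stage_weight(pop_size, sample_size):
    """Inverse selection probability for a simple random sample at one stage."""
    return pop_size / sample_size


def adult_weight(blocks_in_stratum, blocks_sampled,
                 households_in_block, adults_in_household,
                 households_sampled=10):
    """Three-stage weight: block, then household, then one adult per household.

    Hypothetical helper for illustration; not part of the assignment.
    """
    w_block = stage_weight(blocks_in_stratum, blocks_sampled)
    w_household = stage_weight(households_in_block, households_sampled)
    w_adult = stage_weight(adults_in_household, 1)  # one adult chosen per household
    return w_block * w_household * w_adult


# Part (a): the three listed adults, one from each geographic area.
weights = [
    adult_weight(100, 40, 120, 3),  # Area 1, Block 35
    adult_weight(300, 40, 56, 4),   # Area 2, Block 297
    adult_weight(600, 60, 52, 1),   # Area 3, Block 488
]

# Part (c): post-stratification changes only the block-level factor,
# using the 60 "Area 1 multiple" blocks and the 25 of them in the sample.
post_stratified = adult_weight(60, 25, 120, 3)

print([round(w, 1) for w in weights], round(post_stratified, 1))
```

Keeping the stages as separate factors makes it clear that post-stratifying at the block level leaves the household and adult factors unchanged.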