SC971 Week 2: Stratified and Multistage Sampling

We will build upon the basic concepts of survey sampling introduced last week to discuss the main features of practical sample designs:
- Proportionate Stratification
- Systematic Sampling
- Disproportionate Stratification
- Variable Sampling Fractions
- Multistage Sampling
- Probability Proportional to Size (PPS) Sampling
- Components of Design Effects

Proportionate Stratified Sampling

Think of the population as being divided into $I$ subsets ($i = 1, \dots, I$), with $N_i$ units in the $i$th subset ($N = \sum_{i=1}^{I} N_i$).

Proportionate sampling involves:
- Selecting a sample independently from each subset (so we call the subsets sampling strata);
- Selecting the same proportion, $n/N$, of units from each stratum.

Hence, the sample size in stratum $i$ will be $n_i = n N_i / N$.

Motivation: to ensure (unlike SRS) that the sample proportion from any particular stratum equals the population proportion.

Effect: this will increase precision if strata are correlated with survey measures.

Standard Errors for Stratified Sampling

In general:

$$\mathrm{Var}(\bar{x}) = \sum_{i=1}^{I} \frac{N_i^2}{N^2}\,\frac{S_i^2}{n_i}\left(1 - \frac{n_i}{N_i}\right)$$ - (2.1)

where $S_i^2 = \mathrm{Var}_i(x)$ is the variance of $x$ within stratum $i$.

Note:
1. Differences between strata do not contribute to $\mathrm{Var}(\bar{x})$: only the within-stratum variances $S_i^2$ appear in (2.1).
2. Consequently, sampling variance will be reduced if strata are homogeneous (small $S_i^2$).
3. In the case of proportionate sampling, $n_i/N_i = n/N$, so

$$\mathrm{Var}(\bar{x}) = \frac{1}{n}\left(1 - \frac{n}{N}\right) \sum_{i=1}^{I} \frac{N_i}{N}\, S_i^2$$ - (2.2)

Example

Recall the example from last week of a population of 6 individuals with associated measures. Now suppose that we know in advance which are men and which are women, so we treat these as two sampling strata:

Men          Women
A    2       C    8
B    6       E   10
D   10       F   12

Now, the design is to sample one individual from each stratum and estimate the mean. There are now only 9 possible samples of 2 from 6 (cf. 15 with SRS):

Sample   Members of sample   Values in sample   Sample mean
1        A C                 2    8             5
2        A E                 2   10             6
3        A F                 2   12             7
4        B C                 6    8             7
5        B E                 6   10             8
6        B F                 6   12             9
7        C D                 8   10             9
8        D E                10   10            10
9        D F                10   12            11

The six SRS samples that draw both members from the same stratum (A B, A D, B D, C E, C F, E F) can no longer occur.

Example continued

As before, we can calculate the variance of this sampling distribution: $\bar{x} = 72/9 = 8$.

Sample mean   Frequency   Mean x freq.   Deviation from mean (8)   Squared deviation       Freq. x sq. dev.
$x_i$                                    $x_i - \bar{x}$           $(x_i - \bar{x})^2$
 5            1            5             -3                        9                       9
 6            1            6             -2                        4                       4
 7            2           14             -1                        1                       2
 8            1            8              0                        0                       0
 9            2           18              1                        1                       2
10            1           10              2                        4                       4
11            1           11              3                        9                       9
Total         9           72                                                               30

$SE^2 = 30/8 = 3.75$, cf. $64/14 = 4.57$ (SRS)

Design Effect due to Proportionate Stratified Sampling

So, $SE^2 = 3.75$ under proportionate stratification. Recall from last week, $SE^2_{SRS} = 64/14 = 4.57$.

Proportionate stratified sampling has reduced the sampling variance to $3.75/4.57 = 0.82$ of its value under SRS.

This is the design effect due to stratified sampling: $\mathrm{Deff}(\bar{x}) = 0.82$ [and $\mathrm{Deft}(\bar{x}) = 0.91$].

Stratum Construction for Proportionate Stratification

Choose strata that are homogeneous in terms of $x$. In other words, we want an association between $x$ and $i$, i.e. some variance between strata, not only variance within strata.

In general:
- a larger number of strata will be associated with greater precision gains;
- strata defined by cross-classification of multiple factors are better than the same number of strata defined by many categories of one factor (e.g. 4 age bands x gender x 4 regions vs. 32 age bands);
- stratum boundaries should be chosen carefully for continuous factors.

Implicit and Explicit Stratification

Stratified sampling: sorting (stratifying) the sampling frame prior to selection.
There are two ways of doing this:
- Explicit Stratification involves sorting the population list (frame) into distinct strata and then sampling independently from each stratum (as in the above example).
- Implicit Stratification involves sampling systematically from an ordered (stratified) list.

It is possible (and often desirable) to combine explicit and implicit stratification, i.e. to stratify implicitly within explicit strata.

Systematic Sampling

- Involves sampling at a fixed interval down a list.
- If the list is ordered in some meaningful way, this has the effect of (proportionate) stratification.
- It also has the advantage of being easy to implement.

The procedure is to calculate the required interval ($I$), then generate a random start ($R$). The sampled units are then the $R$th, $(R+I)$th, $(R+2I)$th etc. on the list. $I = N/n$, where $N$ is the total number of units on the list, and $n$ the desired sample size. $R$ is a random number between 1 and $I$.

Note that $I$ need not be an integer. E.g. if the desired $n$ is 500 and $N = 10{,}679$, using $I = 21.36$ will give $n = 500$, but rounding to $I = 21$ will give $n = 508$ or 509, depending on the random start. Do not use $I = 21$ and then stop once 500 are sampled: this is biased, because units near the end of the list can never be selected.

Stratified vs. Quota Sampling

- Imposing quotas has a similar effect to stratification, namely to reduce sampling variance.
- But quota sampling also has an inherent bias towards more accessible and more willing population members.
- This may manifest itself as a bias in the survey measures.
- Thus, quota sample estimates could have relatively high precision, but be biased and therefore have low accuracy (see last week’s lecture).
- So, if it is important to ensure correct representation in terms of a particular variable, it is generally better to do this through stratification and random selection than through the use of quotas and subjective selection.

Disproportionate Stratified Sampling

Recall (last week) that probability sampling does not require all units to have an equal probability of selection.

Disproportionate sampling involves:
- Selecting a sample independently from each stratum;
- Not selecting the same proportion of units from each stratum, but instead allowing the sampling fraction, $f_i = n_i/N_i$, to vary between strata.

Motivation 1: to increase sample size from particular strata of interest without unduly increasing overall sample size.
Effect 1: to increase precision for estimation within that stratum but, usually, to reduce precision for total sample estimation.

Motivation 2: to over-sample strata with particularly high variance.
Effect 2: to increase precision for total sample estimation.

Disproportionate Stratified Sampling: Estimation

For unbiased estimation, we can no longer use the direct sample analogue of the population parameter. Instead, we should use the Horvitz-Thompson estimator, which in the case of a mean is:

$$\hat{\bar{X}} = \frac{\sum_{j=1}^{n} w_j x_j}{\sum_{j=1}^{n} w_j}$$ - (2.3)

where $w_j$ is the design weight (or sampling weight) assigned to sample unit $j$. Design weights should be proportional to the inverse of the selection probability: for unit $j$ in stratum $i$, $w_{ij} = N_i/n_i$.
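
As a rough illustration of (2.3), the sketch below computes design weights and the weighted mean for a small disproportionate stratified sample. The data values, the stratum sizes and the function names `design_weights` and `weighted_mean` are all invented for this illustration; they are not part of the course material.

```python
def design_weights(strata, N_by_stratum, n_by_stratum):
    """Design weight N_i / n_i for each sampled unit, given its stratum label."""
    return [N_by_stratum[s] / n_by_stratum[s] for s in strata]


def weighted_mean(x, w):
    """Weighted mean as in (2.3): sum(w * x) / sum(w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


# Hypothetical sample: stratum "a" is over-sampled (2 of 10 units),
# stratum "b" is not (2 of 90 units).
strata = ["a", "a", "b", "b"]
x = [4.0, 6.0, 10.0, 12.0]
N_by_stratum = {"a": 10, "b": 90}
n_by_stratum = {"a": 2, "b": 2}

w = design_weights(strata, N_by_stratum, n_by_stratum)
unweighted = sum(x) / len(x)    # ignores the unequal selection probabilities
weighted = weighted_mean(x, w)  # weights each unit by N_i / n_i
print(unweighted, weighted)
```

With these made-up numbers the unweighted mean is 8.0, while the weighted mean is 10.4, which matches the stratified estimate 0.1 x 5 + 0.9 x 11. The gap arises because the over-sampled stratum has the lower values.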
Design Effect due to Disproportionate Stratified Sampling

Recall expression (2.1) (ignoring f.p.c.):

$$\mathrm{Var}(\bar{x}) = \sum_{i=1}^{I} \frac{N_i^2}{N^2}\,\frac{S_i^2}{n_i}$$

In the case of motivation 1 for disproportionate stratification, it will often be the case that $S_i^2$ varies little, so it is instructive to consider $\mathrm{Var}(\bar{x})$ when $S_i^2 = S^2$. Then,

$$\mathrm{Var}(\bar{x}) = \frac{S^2}{N^2} \sum_{i=1}^{I} \frac{N_i^2}{n_i}$$

And $\mathrm{Var}_{SRS}(\bar{x}) = S^2/n$.

So, writing $w_i = N_i/n_i$ for the design weight in stratum $i$ (so that $N_i^2/n_i = n_i w_i^2$),

$$\mathrm{Deff}(\bar{x}) = \frac{n}{N^2} \sum_{i=1}^{I} \frac{N_i^2}{n_i} = \frac{n}{N^2} \sum_{i=1}^{I} n_i w_i^2$$ - (2.4)

Example

Suppose $I = 2$; $N_1 = N_2$; $S_1^2 = S_2^2 = S^2$. Consider two alternative sampling schemes:
a) Proportionate allocation: $n_i/N_i = n/N$;
b) Disproportionate allocation, where $n_1/N_1 = 2\,n_2/N_2$; i.e. $n_1 = 2n/3$; $n_2 = n/3$.

Then, with design a), from (2.1) we have:

$$\mathrm{Var}(\bar{x}) = \frac{S^2}{N^2} \sum_{i=1}^{I} \frac{N_i^2}{n_i} = \frac{S^2}{N^2} \cdot 2\,\frac{(N/2)^2}{n/2} = \frac{S^2}{n}$$ (of course!)

And with design b), we have:

$$\mathrm{Var}(\bar{x}) = \frac{S^2}{N^2} \left( \frac{(N/2)^2}{2n/3} + \frac{(N/2)^2}{n/3} \right) = \frac{9}{8}\,\frac{S^2}{n}$$

So, $\mathrm{Deff}(\bar{x}) = 9/8 = 1.125$ [and $\mathrm{Deft}(\bar{x}) = 1.06$].

Deff due to Disproportionate Stratified Sampling, ctd.

For designs with variable sampling fractions, it may be reasonable to assume $S_i^2 \cong S^2$, so we can use the simplified expression (2.4), $\mathrm{Deff}(\bar{x}) = \frac{n}{N^2} \sum n_i w_i^2$, to approximate the impact on the variance of estimates.

In general, it will be found that:
- a larger range of sampling fractions (weights) will result in a larger Deff (greater loss of precision);
- over-sampling a large subgroup will result in a greater loss of precision than over-sampling a small subgroup;
- when the main aim is to produce estimates for subgroups, equal sample sizes per subgroup will be an efficient design; when the main aim is to produce estimates for the total population, equal sampling fractions will be efficient.
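
The approximation (2.4) is easy to evaluate numerically. The sketch below is one possible way to do so, assuming equal stratum variances as in the text. The function name `deff_unequal_weights` and the sample sizes are hypothetical. It uses the scale-free form n * sum(n_i w_i^2) / (sum(n_i w_i))^2, which coincides with (2.4) when w_i = N_i / n_i (because then the sum of n_i w_i is N) and also works when the weights are only known up to a constant.

```python
def deff_unequal_weights(n_by_stratum, w_by_stratum):
    """Approximate design effect due to variable sampling fractions.

    Scale-free version of (2.4): n * sum(n_i * w_i^2) / (sum(n_i * w_i))^2.
    Assumes equal variances in all strata.
    """
    n = sum(n_by_stratum)
    sum_w = sum(ni * wi for ni, wi in zip(n_by_stratum, w_by_stratum))
    sum_w2 = sum(ni * wi ** 2 for ni, wi in zip(n_by_stratum, w_by_stratum))
    return n * sum_w2 / sum_w ** 2


# Design a): proportionate allocation, so all weights are equal.
deff_a = deff_unequal_weights([150, 150], [1.0, 1.0])

# Design b): n1 = 2n/3, n2 = n/3 with equal stratum sizes, so relative weights are 1 : 2.
deff_b = deff_unequal_weights([200, 100], [1.0, 2.0])

# Same sample split as b) but a wider range of weights (1 : 4), made up for comparison.
deff_wide = deff_unequal_weights([200, 100], [1.0, 4.0])

print(deff_a, deff_b, deff_wide)
```

Design b) reproduces the value 9/8 = 1.125 from the example, and the wider 1 : 4 weight range gives 1.5, in line with the first bullet point above.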
Graphical Illustration of Approximate Design Effect

The graph shows the relationship between the proportion of the sample taken from a stratum with a relatively high sampling fraction (x-axis) and the consequent loss of precision, as measured by the design effect (y-axis).
- The three lines relate to three different possible relative weights: 2:1, 4:1 and 10:1 (i.e. $w_1 = 1$ in all cases).
- Obviously, these examples are all for the simple case of $I = 2$.
- It demonstrates the first two bullet points on the previous page.

[Figure: DEFF VSF (y-axis, 1 to 3.4) plotted against $n_1/n$ (x-axis, 0 to 1), with one line each for $w_2 = 2$, $w_2 = 4$ and $w_2 = 10$.]

Approximate Design Effect

The simplified expression (2.4) for the design effect due to variable sampling fractions (disproportionate stratified sampling) is commonly used in the situation of motivation 1 (see earlier), i.e. a desire to over-sample a small subgroup in order to increase its representation.

It is also used in another situation where variable sampling fractions arise, though not (explicitly) from stratification, namely when the sampling frame or method gives us no choice, e.g.
- When the frame contains duplicates that can only be identified as such at the data collection stage (example: survey of motorcyclists);
- In certain multi-stage sampling situations, where there are constraints on the number of selections that can be made at the final stage (example: address-based sampling, with one person selected per address).

Over-Sampling Variable Strata

Motivation 2 for using disproportionate stratified sampling:
- Sometimes, we can identify strata that have high population variances.
- Over-sampling these strata will tend to increase the precision of total-sample estimates (reduce standard errors).
- We can only do this if we have advance estimates of stratum variances.
- This is generally only used in rather specialist situations, e.g. business or agricultural surveys with large and predictable variation in stratum variances.
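
To see what over-sampling a high-variance stratum does to expression (2.1), the following sketch evaluates the formula for two allocations of the same total sample. The stratum sizes, stratum variances and allocations are invented for illustration, and `stratified_variance` is a hypothetical helper name. The f.p.c. is switched off here, as in the simplified derivations above.

```python
def stratified_variance(N_i, S2_i, n_i, fpc=True):
    """Var(x-bar) from (2.1): sum of (N_i/N)^2 * S_i^2 / n_i * (1 - n_i/N_i)."""
    N = sum(N_i)
    total = 0.0
    for Ni, S2, ni in zip(N_i, S2_i, n_i):
        term = (Ni / N) ** 2 * S2 / ni
        if fpc:
            term *= 1 - ni / Ni  # finite population correction
        total += term
    return total


# Hypothetical population: two equal-sized strata, the first four times as variable.
N_i = [5000, 5000]
S2_i = [4.0, 1.0]

# Same total sample size (300), allocated in two different ways.
proportionate = stratified_variance(N_i, S2_i, [150, 150], fpc=False)
oversampled = stratified_variance(N_i, S2_i, [200, 100], fpc=False)

ratio = oversampled / proportionate
print(proportionate, oversampled, ratio)
```

With these assumed variances, moving a third of the second stratum's sample into the more variable stratum cuts the sampling variance to 0.9 of its proportionate-allocation value. The gain depends entirely on having reasonable advance estimates of the stratum variances.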
Example

Suppose $I = 2$; $N_1 = N_2$; and that we predict $S_1^2 = 2S_2^2$ ($= 1.33 S^2$ if $\bar{X}_1 = \bar{X}_2$), so that $S_1^2 = \tfrac{4}{3}S^2$ and $S_2^2 = \tfrac{2}{3}S^2$. Consider two alternative sampling schemes:
a) Proportionate allocation: $n_1 = n/2$; $n_2 = n/2$;
b) Disproportionate allocation: $n_1 = 0.58n$; $n_2 = 0.42n$.

Then, with design a), from expression (2.1) we have:

$$\mathrm{Var}(\bar{x}) = \sum_{i=1}^{I} \frac{N_i^2}{N^2}\,\frac{S_i^2}{n_i} = \frac{1}{N^2} \left( \tfrac{4}{3}S^2 + \tfrac{2}{3}S^2 \right) \frac{(N/2)^2}{n/2} = \frac{S^2}{n}$$ (of course!)

And with design b), we have:

$$\mathrm{Var}(\bar{x}) = \frac{1}{N^2} \left( \frac{\tfrac{4}{3}S^2\,(N/2)^2}{0.58n} + \frac{\tfrac{2}{3}S^2\,(N/2)^2}{0.42n} \right) \approx 0.971\,\frac{S^2}{n}$$

So, $\mathrm{Deff}(\bar{x}) = 0.971$ [and $\mathrm{Deft}(\bar{x}) = 0.986$].

Optimum Allocation

Sampling variance for total sample estimates can in principle be minimised by following the optimum allocation rule:

$$\frac{n_i}{N_i} \propto \frac{S_i}{\sqrt{C_i}}$$

where $C_i$ is the unit cost of data collection for a unit in stratum $i$.

So, if data collection costs do not vary between strata, this simplifies to:

$$\frac{n_i}{N_i} \propto S_i$$

and if stratum variances are equal, it further simplifies to:

$$\frac{n_i}{N_i} \propto K \text{ (a constant)}$$

demonstrating that an equal probability selection method is optimum in the situation where variances and data collection costs are equal in all strata (other things being equal).

Practical Limits to Stratification

- In multistage sampling contexts, stratification is often only possible at PSU level (e.g.
household surveys)
- Correlation between strata and survey variables is typically modest
- Multi-purpose nature of surveys: optimal stratification for one estimate may produce no benefit for another
- Typically there is a lack of information about stratum variances

Example of Stratification

The Health Survey for England 1996 (DH)

Stage 1: Postcode Sectors stratified by:
- The 14 Regional Health Authorities (1st-level explicit strata)
- Proportion of adults with limiting long-term illness, in three bands (2nd-level explicit strata)
- Proportion of households with "non-manual" head, in two bands (3rd-level explicit strata)
- Proportion of households with no car, in two bands (4th-level explicit strata)
- Proportion "non-white" (5th-level stratification: implicit)

720 sectors were sampled systematically.

Stage 2: Within each sector, addresses are in postcode order, and selected systematically. This provides some geographical stratification.

Multi-Stage Sampling

The units in the population are arranged hierarchically. A 3-stage design would entail:
- Primary sampling units (PSUs)
- Secondary sampling units (SSUs)
- Sample elements

It would be necessary to assign every element uniquely to one SSU and every SSU uniquely to one PSU.

Stage 1: select sample of PSUs
Stage 2: select sample of SSUs within each selected PSU
Stage 3: select sample of elements within each selected SSU

PSUs, SSUs, Elements

Example: general population survey:
- PSUs might be postcode sectors
- SSUs might be households
- Elements might be persons

Example: business survey:
- PSUs might be companies
- SSUs might be workplaces
- Elements might be employees

There could be any number of stages; 2, 3 or 4 are common.

Why Multi-Stage Sampling?
- No frame of elements available, but frame of PSUs available (example: national sample of school pupils, where schools could be PSUs)
- Cost of data collection (example: general population sample involving face-to-face interviewing)
- Access to elements may only be via “gatekeepers” (examples: students, employees, trainees)
- Data quality (example: in the case of face-to-face interviewing, field work can be better supervised if in clusters)

Design Choices (clustering): Some General Points

- Larger sample sizes per cluster will generally result in larger design effects due to clustering (see later).
- But larger sample sizes per cluster will also generally result in larger cost savings (e.g. field interviewers, gatekeepers).
- It is necessary to make an appropriate compromise: i.e. where cost saving outweighs loss in precision, to produce higher overall accuracy per unit cost (cf. last week’s lecture).

Selection Probabilities for Multistage Sampling

With multi-stage sampling, the selection probability of each element is the product of the (conditional) selection probabilities at each stage, e.g. the probability of sampling unit $i$ in SSU $j$ in PSU $k$ is

$$P_{ijk} = P_k \times P_{j|k} \times P_{i|j,k}$$

For unbiased estimation, we need to weight each sampled element $ijk$ by $w_{ijk} = 1/P_{ijk}$. So, it is important to control and record the selection probabilities at each stage.

Other things being equal, it is desirable to keep selection probabilities equal for all elements. With multi-stage sampling, there are many ways to do this. For example, in 2-stage sampling, whatever we set $P_j$ to be, $P_{i|j}$ can be set proportional to $1/P_j$ and this will produce an epsem (self-weighting) design.

Selection Probabilities for Multistage Sampling ctd.

There are three intuitive alternative ways to set selection probabilities:

a) select PSUs with equal probabilities and then a fixed number of elements within each ($n_j = n_1 \;\forall j$).
This results in unequal selection probabilities and is therefore undesirable because it will generally cause a loss in precision compared with an epsem design.

b) select PSUs with equal probabilities and then a variable number of elements within each ($n_j \propto N_j$), to give equal overall selection probabilities. This design has practical problems. The elements in one PSU typically form one interviewer workload, so large variation in $n_j$ is undesirable. There is additionally a (usually modest) loss of precision associated with variation in $n_j$. And the sample size is not fixed in advance: it is a random variable!

c) select PSUs with PPS and then a fixed number of elements within each. This overcomes the problems associated with a) and b), but it depends on the availability of an accurate measure of the number of elements in each PSU (and SSU). We now discuss this design further.

Probability Proportional to Size (PPS) Selection

How it works: A 2-stage design. We set $P_j$ proportional to $N_j$ (the number of elements in the population in PSU $j$). So $P_j = C N_j$. We then select the same number of elements, $D$, from each sampled PSU, so $P_{i|j} = D/N_j$. Then,

$$P_i = P_j \times P_{i|j} = C N_j \times D/N_j = CD,$$

which is the same for every element.

Implementation: We do not need to calculate the selection probabilities at each stage in order to make the selection. We create a cumulative total down the list of PSUs and then sample systematically down that list of totals, including each PSU within which the interval falls.

Example of PPS 2-Stage Sample Selection

Example: Selection of 3 PSUs from 10 with PPS and 25 units from each selected PSU, so that n = 75.

PSU    Size    Cum. size    Selection
1      1000     1000
2       900     1900        *
3       800     2700
4      1200     3900
5      1500     5400        *
6      1300     6700
7      1100     7800        *
8       500     8300
9      1000     9300
10      700    10000

Now, N = 10,000 and n = 3 (PSUs). To select systematically (see earlier), I = 3,333 and R needs to be a random number between 1 and 3,333. Suppose we generate R = 1,050.
Then, we sample the PSUs that contain elements 1,050, (1,050 + 3,333) = 4,383 and (1,050 + 2 x 3,333) = 7,716, i.e. PSUs 2, 5 and 7.

Some Limitations of PPS Sampling of PSUs

- We may have only imperfect estimates of the number of elements in each PSU (the size measure). We could then adjust the sample size within each PSU to keep overall probabilities equal, or we might simply weight by $w_i = 1/P_i$.
- The sampling interval might be smaller than the number of elements in some PSUs. (This will only happen if the sampling fraction of PSUs is large and/or the size of PSUs is highly variable.) Those PSUs will be certain to be sampled, and could be sampled more than once. We might place these PSUs in a separate stratum and include them with certainty ($P_j = 1$). We might also increase their sample size of elements, to keep overall probabilities equal, or we might weight.

Design Effect due to Clustering

Clustering tends to increase sampling variance (but this may be offset by the fact that a larger sample size can be obtained for any given cost). This is because units within a cluster tend to be more homogeneous than units as a whole. Clustering therefore tends to have the opposite effect to stratification.

The design effect due to clustering takes the form:

$$\mathrm{Deff}_{cl} = 1 + (b - 1)\rho$$ - (2.6)

where $b$ is the sample size per cluster (in practice $b$ may vary; see the note on the next page), and $\rho$ (roh) is the intra-cluster correlation.

$\rho = 0$: randomly sorted clusters
$\rho = 1$: perfectly homogeneous clusters

Note: $\rho$ is a population characteristic relating to the chosen definition of PSU; $b$ is chosen by the researcher as part of the sample design.

E.g. $b = 10$: if $\rho = 0$ then $\mathrm{Deff}_{cl} = 1$; if $\rho = 1$ then $\mathrm{Deff}_{cl} = 10$; more realistically, if $\rho = 0.05$ then $\mathrm{Deff}_{cl} = 1.45$.

A Note about Cluster Sample Size

Expression (2.6) strictly holds only when there is no variation in cluster sample size, i.e. $b_j = b \;\forall j$.
For complex surveys, where $b_j$ may vary and, additionally, unequal selection probabilities may be used, the design effect due to clustering is:

$$\mathrm{Deff}_{cl} = 1 + (b^* - 1)\rho$$ - (2.7)

where

$$b^* = \frac{\sum_{j=1}^{J} \left( \sum_{i=1}^{b_j} w_{ij} \right)^2}{\sum_{j=1}^{J} \sum_{i=1}^{b_j} w_{ij}^2}.$$

Note that for an epsem design, this gives

$$b^* = \frac{\sum_{j=1}^{J} b_j^2}{\sum_{j=1}^{J} b_j}.$$

In some situations, notably when variation in $b_j$ is small, the mean cluster size, $\bar{b}$, may provide an adequate approximation. But often it is a poor approximation: see Lynn & Gabler (2005).

Lynn P & Gabler S (2005) Approximations to b* in the prediction of design effects due to clustering, Survey Methodology, 31, 101-104

Example of Intra-Cluster Correlations

From the British Social Attitudes Survey:

Variable                    $\hat{\rho}$   $\bar{b}$   $\widehat{\mathrm{Deft}}$   $\widehat{\mathrm{Deft}}$ if $b = 10$
Household size              0.070          16.6        1.45                        1.28
Owner-occupier              0.231          16.5        2.14                        1.75
Has telephone               0.102          16.5        1.61                        1.38
Asian                       0.334           8.3        1.86                        1.53
Roman Catholic              0.037          16.4        1.25                        1.15
Not racially prejudiced     0.021           8.4        1.08                        1.03
Extra-marital sex wrong     0.044           8.3        1.15                        1.08
Dodging VAT is OK           0.021           8.2        1.07                        1.04

Note: $\hat{\rho}$ is low for attitudinal variables, so design effects are small. But $\hat{\rho}$ is large for variables related to ethnicity and housing type. Thus, the most effective degree of clustering might be greater for an attitude survey (fewer clusters, with larger $b_j$) than for a housing survey.

http://courses.essex.ac.uk/sc/sc971/
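
As a small numerical companion to expressions (2.6) and (2.7), the sketch below computes b* from a list of per-cluster weights and plugs it into the clustering design effect. The function names `b_star` and `deff_cluster`, the cluster sizes and the value rho = 0.05 are illustrative assumptions only.

```python
def b_star(weights_by_cluster):
    """b* as in (2.7): sum over clusters of (sum of weights)^2,
    divided by the sum of all squared weights."""
    numerator = sum(sum(w) ** 2 for w in weights_by_cluster)
    denominator = sum(wi ** 2 for w in weights_by_cluster for wi in w)
    return numerator / denominator


def deff_cluster(b, rho):
    """Design effect due to clustering, 1 + (b - 1) * rho, as in (2.6) and (2.7)."""
    return 1 + (b - 1) * rho


rho = 0.05  # assumed intra-cluster correlation

# Four clusters of 10 with equal weights: b* reduces to b = 10.
equal = [[1.0] * 10 for _ in range(4)]

# Hypothetical epsem sample with cluster sizes 2, 10 and 18 (mean size 10).
unequal = [[1.0] * 2, [1.0] * 10, [1.0] * 18]

print(b_star(equal), deff_cluster(b_star(equal), rho))
print(b_star(unequal), deff_cluster(b_star(unequal), rho))
```

With equal clusters this recovers the design effect of about 1.45 quoted for b = 10 and rho = 0.05. With the unequal clusters b* is about 14.3 rather than the mean size of 10, so the design effect rises to about 1.66, which shows how the mean cluster size can understate the effect.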