Design Effects: What are they and how do they affect your analysis?
David R. Johnson
Population Research Institute & Department of Sociology
The Pennsylvania State University

What are Design Effects?
• Apply to the analysis of data gathered in a sample from a population.
• For Social Science folks, this is survey data.
• Design effects are the ways departures of the sampling design from a simple random sample (SRS) impact statistical estimates from the sample.
• These departures from an SRS can affect:
  – Standard errors and significance tests
  – Estimates of coefficients

Simple Random Sampling
• Much of the statistical theory used to develop inferential statistics assumes a simple random sample.
• SRS assumptions include:
  – Equal probability of selection for all elements
  – Each element selected at random, independently of the other elements in the sample
• If these assumptions are not met, the estimates are likely to be in error (biased).
• Yet most sample surveys depart from an SRS design.

Why Depart from a Simple Random Sample?
• To reduce data collection costs (increase the efficiency of the sample).
  – Cluster sampling
  – Stratification
  – Disproportionate sampling
• To adjust for bias in the sample.
  – Design weights (adjust for disproportionate sampling)
  – Post-estimation weights (adjust for non-response and coverage)

Use of Clustering in Sampling Designs
• Example of a cluster sampling design in a multistage area probability sample.
• The sample would include several (5–10) housing units in the final segment.
• Violates the SRS assumption that elements are sampled independently.
• Reduces cost by greatly decreasing listing and interviewer costs.
Source: http://ccnmtl.columbia.edu/projects/qmss/samples_and_sampling/types_of_sampling.html

Other common clustered designs
• Students in Schools. Schools are randomly sampled, but multiple students are surveyed in each selected school.
  – Example: Add Health (80 schools; many students in each school)
• Members in Organizations.
  – Example: A random sample of long-term care providers in which all employees were surveyed in each organization.

The Impact of Clustering
• Because two random elements sampled within the same cluster may be more similar than two random elements selected from different clusters, the information gained by adding more elements within clusters is less than that gained by adding more clusters.
• This can result in higher standard errors than would be found in a simple random sample.

A measure of Design Effect (deff)
• deff is a measure of how much the sampling variability in a sample differs from the sampling variability in a simple random sample.
• deff = 1 + rho (n – 1)
• Where rho is the intraclass correlation and n is the number of elements in the cluster.
• rho measures the similarity of two randomly selected elements within a cluster compared to two randomly selected elements from different clusters. The higher the value, the more similar elements are within clusters.
• A deff of 2, for example, would mean that the sample would have to be twice as large to yield the same sampling variability (standard errors) that would have been found with a simple random sample.

Example
• A study of rent rates in large apartment complexes.
• Draw a random sample of 50 apartment complexes in the population.
• Randomly sample 10 apartments in each complex (n = 10).
• If the rent of each apartment were the same within each complex and different between complexes, then rho = 1 and deff = 1 + 1(10 – 1) = 10
• In this extreme case, each additional apartment surveyed within a cluster adds no new information about the rental cost.
• Surveying only one apartment in each complex would give us the same information (with the same standard error) about the level of rent as we get from surveying 10.

Example
• Another extreme example…
• If we were studying a variable like “shoe size” of residents of apartments, the estimate of the design effect might be quite different.
• We would not expect “shoe size” to be clustered by apartment complex, so we expect rho = 0.
• deff = 1 + 0(10 – 1) = 1
• The sampling variability in our cluster sample would be the same as that found in a simple random sample.

An important point!!!
• The design effect is not a fixed characteristic of the sample but one that differs from variable to variable.
• Shown here for the clustering effect, but this is also true of design effects from stratification and weighting.
• When design effects are present, our estimates and standard errors are likely to be wrong unless we adjust for the sampling design in calculating our estimates.

Stratification
• Stratification can make our sample more accurate than a simple random sample.
• We use prior knowledge about the distribution in the population to reduce variability.
• For example, let’s say we have 1000 students in a school and we want to draw a representative sample of 100 of them.
• Assume we know the gender of each student in the school and 50% are male and 50% are female.
• If we randomly sample 50 from among the males and 50 from among the females, the distribution by gender in our sample will be exactly the same as in the population.
• With an SRS this might not have been the case.
• Will improve the estimates for other variables only if they are also related to gender.

Stratification
• The most widely used stratification variables in large national probability samples are geographical.
  – Census Region
  – Metropolitan areas
  – Population sizes of geographical subareas
• Census data and census estimates are often used to define the strata.

What estimates do clustering and stratification affect?
• These do not affect the point estimates
  – Means
  – Regression coefficients
• They only affect the standard errors, confidence intervals, and significance tests.
• Weights, however, can affect both the point estimates and the standard errors, confidence intervals, and significance tests.
• The impact of weights on point estimates is widely known, but the effects on inferential statistics are less so.

Weights – The Good and the Bad
• The Good
  – Weights are designed to increase the representativeness of our sample.
  – e.g. if the percent male is 40% in our sample but 50% in the population, we assign weights so that each male counts as more than one case and each female counts as less than one case, to yield the population percent.
  – Weights can adjust for design decisions as well, e.g., most surveys randomly select only one adult to interview per household, so adults in households with several adults are underrepresented.
  – These can reduce the bias in our sample.

Weights – The Good and the Bad
• The Bad
  – Weights always yield a deff > 1
  – The size of the design effect will be impacted by the variability in the weights.
    • Large differences in the size of the weights for the cases will result in a larger deff.
    • Very large weights appear to have more effect on the deff than very small weights.
  – Although weights decrease bias, they do it at the cost of increasing the variability of our estimates.

What to do…
• More Bad News:
• Most datasets used in the social sciences have at least one of these features that affect the estimates.
• Most standard statistical software does not adjust the estimates for these design factors.
• More journals and granting agencies are requiring that statistical findings be adjusted for design effects.

What to do…
• But the Good News is:
• The major statistical packages now have relatively easy-to-use procedures that adjust for them in most types of statistical analysis.
• Design effects appear to have substantially less impact on the standard errors of coefficients from multivariate analysis (e.g.
regression coefficients) than they do on descriptive statistics (means, percentages).
• Previously published analytic research findings are not likely to be affected very much by failing to adjust for such effects (especially the effects of clustering and stratification).

How can we adjust for the design effects?
• Documentation for most large datasets contains information on the variables included in the data that can be used to adjust for the design.
• The design data can take several forms, which require different adjustment procedures. The most common are:
  – Variables identifying the primary sampling units (psu), the strata, and the weight
  – A set of replicate variables (e.g. 40–80) that give the structure for a resampling (replication) method of adjusting standard errors and replace the need for information on the psu and strata
  – A set of replicate weights (e.g. 40–80) that replace psu, strata and weight information
• (The replicate methods are used to hide the psu information for confidentiality reasons.)

Software to adjust for Design Effects
• Until recently, specialized software, not an integrated part of standard packages, was required to include design information in the estimates.
  – Sudaan: A separate program later included in SAS
  – WesVar: A program using replicate methods; available to some degree in SPSS but also as a stand-alone program
  – IVEware: A public domain software package from the University of Michigan
• Flexible procedures for design effects are now available in:
  – SAS: A set of survey analysis procedures separate from Sudaan
  – Stata: A comprehensive set of svy: procedures
  – R: A set of survey analysis procedures
  – SPSS: A survey analysis module available for extra cost (not part of the SPSS site license at Penn State)

Computational procedures used to create the adjusted estimates
• Taylor series expansion method. Considered the “gold standard” method.
  – A computational method involving estimating non-linear equations.
    Equations are different for different statistics.
  – Requires information on the psu and strata to compute.
• Re-sampling or Replication methods.
  – Use techniques such as the Jackknife and Bootstrap to draw multiple replicate samples, which convey information on the dispersion in the sample.
  – These methods need a set of replicates, or (in some software) can generate them if the psu and strata are available.

The National Survey of Families and Households (NSFH)
• A large national personal interview survey with a complex, multistage area probability sampling design with clusters and strata.
• Over 13,000 respondents.
• There were 100 primary sampling units and 1,700 clusters, with an average of 7.1 respondents per cluster.
• Provides design information and weights to adjust for design effects in two ways:
  – Variables for the strata and psu’s
  – A set of replicate variables

Replicates in the NSFH
• The NSFH includes a set of 52 balanced, half-sample, random replicates instead of case-level information on the sampling units and strata.
• Balanced half-sample replicates require two or more primary sampling units in each stratum.
• For each replicate, one of the two primary sampling units in each stratum is assigned a value of zero, and the other is assigned a value of 1.
• The primary sampling units assigned zero are excluded from that replicate.
• Programs such as Stata or WesVar can use these to adjust for the design effects.

Design Information also available for the NSFH

  study id   stratum   psu   newla
         3       118    68      17
         8        12     3      14
        11       117    67      10
        16        14    12      14
        24        14    12      14
        29       117    67      13

  newla = listing area or cluster

The stratum and psu variables can be used to convey design information to many software packages.

Stata svyset command for NSFH
• svyset psu [pweight=weight], strata(stratum)
• To use the replicates in Stata you might want to consult a PRI programmer.
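
To make the role of the stratum, psu and weight variables concrete, here is a minimal Python sketch of a linearized (Taylor series) standard error for a weighted mean. It is an illustration only, not Stata's implementation: the function name `design_se_of_mean` and the toy data are hypothetical, and it assumes PSUs are treated as sampled with replacement within strata, with at least two PSUs per stratum.

```python
from collections import defaultdict
from math import sqrt


def design_se_of_mean(y, weight, stratum, psu):
    """Return (weighted mean, linearized standard error).

    Assumption: PSUs are treated as sampled with replacement within
    strata, and every stratum contains at least two PSUs.
    """
    total_w = sum(weight)
    mean = sum(w * v for w, v in zip(weight, y)) / total_w

    # Sum the linearized residuals w * (y - mean) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for v, w, h, j in zip(y, weight, stratum, psu):
        psu_totals[h][j] += w * (v - mean) / total_w

    # Accumulate the between-PSU variability stratum by stratum.
    variance = 0.0
    for totals in psu_totals.values():
        z = list(totals.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((t - z_bar) ** 2 for t in z)
    return mean, sqrt(variance)


# Toy data (made up): 2 strata, 2 PSUs per stratum, 2 cases per PSU.
y = [10, 12, 20, 22, 11, 13, 19, 21]
weight = [1.0] * 8
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 3, 3, 4, 4]
mean, se = design_se_of_mean(y, weight, stratum, psu)
print(mean, se)
```

Because the toy cases within a PSU resemble each other, the design-based standard error comes out larger than the usual SRS formula would give for the same eight values. Dividing the squared design-based standard error by the squared SRS standard error gives an estimate of deff for this variable.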
Design Information in the American Community Survey (ACS)
• Conducted by the Census Bureau as a substitute for the long form of the Census.
• A large mail survey with telephone and personal interview follow-ups of non-respondents.
• Considered a complex survey design, but it is not an area probability sample or an SRS.
• Available as a public use dataset.
• Conveys the design information through a set of 80 replicate weights that include both design and weight information.

Examples of replicate weights for ACS

  rw1   rw2   rw3   rw4   rw5   rw6
  136    27    83   167   161    77
  178   102   113   101   197    90
  181   101   103    93   184    97
  173   114   132   101   202    89
  265   136   132   126   231   139
  185    64   290    50   301   303
   86    31   133    27   177   204

Using the ACS weights
• The documentation suggests the following:
  – Conduct your analysis once with the main weight and then 80 more times, substituting each replicate weight in turn.
  – Save the parameter estimate from each run in a file.
  – The standard error comes from the variation of the 80 replicate estimates around the estimate from the main weight. The ACS documentation gives it as the square root of 4/80 times the sum of squared differences, which is not the ordinary standard deviation.
• It may also be possible to do this with a setting in the svyset command in Stata.

Setting the design parameters for a dataset
• Consult the documentation.
  – Examples for setting the design for some software packages are often provided.
• May need to consult with a PRI programmer if in doubt.
• Set the design and forget it!!
• You only need to do this once…

Thank You!!!
• This PowerPoint will be available on the PRI web site.
• There is also a list of references on the web site to sources that discuss and explain design effect issues.
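
As a supplement to the "Using the ACS weights" slide, the replicate-weight procedure can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Census Bureau code: the helper names `weighted_mean` and `replicate_se` are hypothetical, the data are made up, and the 4/80 scaling factor is the one the ACS documentation gives for its 80 replicate weights.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean of values using the given case weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def replicate_se(full_estimate, replicate_estimates, scale=4.0 / 80):
    """Standard error from replicate estimates.

    Sums squared differences between each replicate estimate and the
    full-sample estimate, multiplies by `scale` (4/80 is assumed here
    for the 80 ACS replicate weights), and takes the square root.
    """
    ss = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return sqrt(scale * ss)


# Made-up example: 4 cases, a main weight, and 80 artificial replicate weights.
income = [30.0, 45.0, 52.0, 38.0]
main_w = [100.0, 150.0, 120.0, 130.0]
rep_ws = [
    [w * (1.5 if (i + r) % 2 == 0 else 0.5) for i, w in enumerate(main_w)]
    for r in range(80)
]

full = weighted_mean(income, main_w)
reps = [weighted_mean(income, w) for w in rep_ws]  # one run per replicate weight
print(full, replicate_se(full, reps))
```

The same loop applies to any statistic (a percentage, a regression coefficient): replace `weighted_mean` with the analysis of interest, run it once per weight, and pass the saved estimates to `replicate_se`.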