STATISTICAL METHODS FOR REDUCING BIAS IN WEB SURVEYS
13th September 2012
Myoung Ho Lee

Contents
• Introduction
• Web surveys
• Methodology
  - Propensity Score Adjustment
  - Calibration (Rim weighting)
• Case Study
• Discussion and Conclusion

Introduction

• Trends in data collection: Paper and Pencil => Telephone => Computer => Internet (Web)
• Internet penetration

Pros and Cons of Web Surveys
• Pros
  - Low cost and speed
  - No interviewer effect
  - Visual, flexible and interactive
  - Respondents' convenience
• Cons
  - Quality of the sample estimates
Web surveys may be a solution, but they raise problems of their own.

Previous Studies
• Harris Interactive (2000~)
• Lee (2004), Lee and Valliant (2009)
• Huh and Cho (2009)
• Bethlehem (2010), etc.
Lee and Valliant (2009) report good performance in simulation, but most other results do not look as good.
  - Malhotra and Krosnick (2007), Huh and Cho (2009)

Web Surveys

Volunteer Panel Web Survey Protocol (Lee, 2004)
• Under-coverage
• Self-selection
• Non-response
Challenge: fix the anticipated biases in web survey estimates that result from under-coverage, self-selection and non-response.

Methodology

Proposed Adjustment Procedure for Volunteer Panel Web Surveys (Lee, 2004)

Propensity Score Adjustment (PSA)
• Original idea: comparison of two groups, treatment and control, in observational studies (Rosenbaum and Rubin, 1983)
  - by weighting with all auxiliary variables that are thought to account for the differences between the groups
• In the context of web surveys, the technique aims to correct for differences between offline people and online people
  - differences driven by the particular inclinations of the people who participate in the volunteer panel web survey

"Webographic" variables
• Overlapping variables between the web and reference surveys
  - chosen to capture the difference between the online and offline populations (Schonlau et al., 2007)
  - for example, "Do you feel alone?", "In the last month have you read a book?" (Harris Interactive)

Propensity score
• The propensity score is the probability of belonging to the web sample given the covariates, e(x_i) = P(z_i = 1 | x_i); it is assumed that the z_i are independent given the set of covariates x_i.
• 'Strong ignorability' assumption: the response variable is conditionally independent of treatment assignment given the propensity score.

Logistic regression model
• The propensity scores are estimated with a logistic regression model, log{e(x_i) / (1 - e(x_i))} = x_i'β.

Variable Selection
• Include variables related not only to treatment assignment but also to the response, in order to satisfy the 'strong ignorability' assumption (Rosenbaum and Rubin, 1984; Brookhart et al., 2006).
• In practice, stepwise selection has often been used to develop good predictive models for treatment assignment.
• Most previous web studies used all available covariates (5-30).
• Huh and Cho (2009) chose 9 or 7 out of 123 covariates on "subjective" grounds.
• Methods compared here:
  - Stepwise logistic regression using SIC: suited to a large number of covariates with little theoretical guidance
  - LASSO (PROC GLMSELECT in SAS): a good alternative to stepwise variable selection
  - Boosted tree ("gbm" in R): determines a set of split conditions

Applying methods for PSA
• Inverse propensity scores as weights
  - weights w_i = 1 / e(x_i), with e(x_i) the estimated propensity score
  - then multiply them by the sampling weights
• Subclassification (stratification)
  - grouping homogeneous people into each stratum

Subclassification (stratification) steps:
1. Combine the reference and web data into one sample.
2. Estimate each unit's propensity score from the combined sample.
3. Partition the units into C subclasses according to the ordered propensity scores, with about the same number of units in each subclass.
4. Compute the adjustment factor and apply it to all units in the cth subclass.
5. Multiply the factor by the sampling weights to get the PSA weights.
(A small R sketch of this procedure is given at the end of this section.)

Calibration (Rim weighting)
• Matches sample and population characteristics only with respect to the marginal distributions of selected covariates.
• Little and Wu (1991): an iterative algorithm that alternately adjusts the weights to each covariate's marginal distribution until convergence (see the raking sketch at the end of this section).
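To make the PSA steps concrete, here is a minimal R sketch (R is assumed because the deck mentions "gbm" in R; only base R is used below). It builds a toy stacked data set in place of the real reference and web files; the column names (web, d, x1, x2), the choice of C = 5 subclasses, and the form of the subclass adjustment factor (weighted reference share divided by weighted web share within the subclass) are illustrative assumptions, not details taken from the deck.

## Toy stand-in data; replace with the real stacked reference + web file.
set.seed(1)
n_ref <- 200; n_web <- 100
combined <- data.frame(
  web = rep(c(0, 1), c(n_ref, n_web)),            # 1 = volunteer web panel, 0 = reference survey
  d   = c(runif(n_ref, 50, 150), rep(1, n_web)),  # base sampling weights
  x1  = rnorm(n_ref + n_web),                     # placeholder "webographic" covariates
  x2  = rbinom(n_ref + n_web, 1, 0.5)
)

## 1. Estimate the propensity of being in the web sample by logistic regression.
ps_model <- glm(web ~ x1 + x2, data = combined, family = binomial())
combined$e_hat <- fitted(ps_model)

## 2a. Inverse-propensity weighting: for web respondents, divide the
##     sampling weight by the estimated propensity score.
is_web <- combined$web == 1
combined$w_ipw <- combined$d
combined$w_ipw[is_web] <- combined$d[is_web] / combined$e_hat[is_web]

## 2b. Subclassification: split the combined sample into C subclasses of
##     roughly equal size on the ordered propensity scores.
C <- 5
combined$strat <- cut(rank(combined$e_hat, ties.method = "first"),
                      breaks = C, labels = FALSE)

## Adjustment factor per subclass (assumed form): weighted share of the
## reference sample in the subclass divided by the weighted share of the
## web sample in the same subclass.
ref_share <- tapply(combined$d * (combined$web == 0), combined$strat, sum) /
             sum(combined$d[combined$web == 0])
web_share <- tapply(combined$d * (combined$web == 1), combined$strat, sum) /
             sum(combined$d[combined$web == 1])
adj_factor <- ref_share / web_share

## PSA weights for web respondents: sampling weight times the factor of
## the subclass the respondent falls into.
combined$w_psa <- combined$d
combined$w_psa[is_web] <- combined$d[is_web] * adj_factor[combined$strat[is_web]]

Swapping glm() for a LASSO or boosted-tree fit (for example with gbm::gbm()) would change only the step that produces e_hat; the weighting and subclassification steps stay the same.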
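The rim weighting step can likewise be sketched as a small iterative proportional fitting loop that cycles through the selected covariates and adjusts the weights to each covariate's marginal distribution until they stop changing, as described above. Everything here is an illustrative assumption: the function name rake_weights, the placeholder rims region, agegrp and gender, and the named vectors of population counts pop_region, pop_agegrp and pop_gender.

## Rim weighting (raking) by iterative proportional fitting, in base R.
##   w       : starting weights (e.g. PSA weights such as w_psa)
##   vars    : list of categorical variables (factors), one per rim
##   margins : list of named vectors of population counts, matching vars
rake_weights <- function(w, vars, margins, max_iter = 50, tol = 1e-6) {
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (j in seq_along(vars)) {
      current <- tapply(w, vars[[j]], sum)               # current weighted totals per category
      adj     <- margins[[j]][names(current)] / current  # ratio to the target margin
      w       <- w * adj[as.character(vars[[j]])]        # rescale each unit's weight
    }
    if (max(abs(w - w_old)) < tol) break                 # stop once the weights settle
  }
  unname(w)
}

## Hypothetical call with placeholder names for the region/age/gender rims:
# web$w_rim <- rake_weights(w       = web$w_psa,
#                           vars    = list(web$region, web$agegrp, web$gender),
#                           margins = list(pop_region, pop_agegrp, pop_gender))

In practice the rake() function in the R survey package performs the same kind of marginal adjustment with proper survey-design bookkeeping.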
Case Study

Reference survey: "2009 Social Survey" by Statistics Korea
  - Topics: Culture & Leisure, Income & Consumption, etc.
  - All persons aged 15 and over in 17,000 households
  - Sample size: 37,049
  - Face-to-face mode
  - Post-stratification estimation
  - Assumed to give the "true" values

Web survey
  - Volunteers recruited from web sites (6,854 households)
  - Systematic sampling with unequal selection probabilities (the inverse of rim weights based on region, age and gender)
  - Sample size: 1,500 households and 2,903 respondents
  - Overlapping covariates: 123

Model Selection
• M1 = Stepwise (22), M2 = Stepwise (17), M3 = LASSO (12), M4 = Boosted tree (18); the number in parentheses is the number of selected covariates.

Assessment methods
• 16 combinations: (Models 1-4) × (inverse weighting or subclassification) × (no calibration or rim weighting)
• 12 response variables
• Criterion: percentage of bias reduction (a small sketch of this metric is given after the Conclusion)

Percentage of bias reduction
[Table: percentage of bias reduction for M1-M4, for PSA alone and for PSA with calibration, each under inverse weighting and subclassification; the numerical results are not reproduced here.]

Discussion

• Why does PSA not work well on its own?
[Figure: propensity scores for each survey within the 5 strata of Model 1.]

What are the possible solutions to fix poor PSA?
• Setting a maximum value for the weights
• A different subclassification algorithm
  - a formula for the variance of the weights that depends on both the number of cases from each group within a stratum and the variability of the propensity scores within the stratum
• Matching PSA
  - a limited number of treated group members matched against a larger number of control group members

• Violation of some assumptions
  - 'strong ignorability' assumption
  - missing at random (MAR)
  - mode effects
• Variable selection (what are the webographic variables?)
  - the choice of model affects the performance of PSA significantly
  - perhaps expert knowledge rather than a purely statistical approach
  - further studies are needed

Conclusion

• Web surveys have attractive advantages.
• However, their estimates suffer from bias due to self-selection, under-coverage and non-response.
• According to the case study results, it seems difficult to apply PSA to the "real world" just yet.
• Further research on webographic variables and on different PSA methods is needed.
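A closing note on the assessment metric from the case study: the reference survey is taken to give the "true" values, so the bias of a web estimate is its deviation from the reference estimate, and the percentage of bias reduction compares the adjusted web estimate with the unadjusted one. The deck does not spell out its exact formula, so the version below is one common form and should be read as an assumption.

## One common definition of the percentage of bias reduction (an assumption;
## the deck does not show its exact formula):
##   100 * (|unadjusted bias| - |adjusted bias|) / |unadjusted bias|
pct_bias_reduction <- function(theta_true, theta_unadj, theta_adj) {
  100 * (abs(theta_unadj - theta_true) - abs(theta_adj - theta_true)) /
    abs(theta_unadj - theta_true)
}

## Example: reference estimate 0.40, raw web estimate 0.52, adjusted estimate 0.45.
pct_bias_reduction(0.40, 0.52, 0.45)   # about 58, i.e. 58% of the bias removed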