A Theoretical Framework for Adaptive Collection Designs

Jean-François Beaumont, Statistics Canada
David Haziza, Université de Montréal
International Total Survey Error Workshop
Québec, June 19-22, 2011

Overview
- Selected literature review
- Framework
  • Definition of the problem
  • Choice of quality indicator and cost function
  • Mathematical formulation of the problem
- Solution and discussion
- Conclusion

Literature review: Groves & Heeringa (2006, JRSS, Series A)
- Responsive designs: Use paradata to guide changes in the features of data collection in order to achieve higher quality estimates per unit cost
  • Paradata: Data about the data collection process
  • Examples of features: mode of data collection, use of incentives, …
  • Need to define quality and determine quality indicators
  • Two main concepts: phase and phase capacity

Literature review: Groves & Heeringa (2006, JRSS, Series A) (continued)
- Phase: Period of data collection during which the same set of methods is used
  • Phase 1: gather information about design features
  • Phases 2+: alter features (e.g., subsampling of nonrespondents, larger incentives, …)
- A phase is continued until its phase capacity is reached
  • Judged by the stability of an indicator as the phase matures

Literature review: Schouten, Cobben & Bethlehem (2009, SM)
- Goal: determine an indicator of nonresponse bias as an alternative to response rates
- Proposed a quality indicator, called the R-indicator:
    $$R(\rho) = 1 - 2\,S(\rho), \qquad 0 \le R(\rho) \le 1,$$
  where $S(\rho)$ is the population standard deviation of the response probabilities $\rho_i$, $i \in U$
  • The population standard deviation must be estimated
  • The response probabilities, $\rho_i$, must be estimated using some model
- An issue: the indicator depends on the proper choice of model (choice of auxiliary variables)

Literature review: Schouten, Cobben & Bethlehem (2009, SM) (continued)
- Another issue: the indicator does not depend on the variables of interest, but nonresponse bias does
- Maximal bias of $\hat{\theta}_{NA}$:
    $$\frac{\bigl(1 - R(\rho)\bigr)\,S(y)}{2\,\bar{\rho}},$$
  where $\bar{\rho}$ is the average response probability
- $\hat{\theta}_{NA}$ is the unadjusted estimator of the population mean:
    $$\hat{\theta}_{NA} = \frac{\sum_{i \in s_r} w_i\,y_i}{\sum_{i \in s_r} w_i}$$
- Two limitations of the maximal bias (and of the R-indicator):
  • The unadjusted estimator is rarely used in practice
  • It depends on the proper specification of $\rho_i$

Literature review: Peytchev, Riley, Rosen, Murphy & Lindblad (2010, SRM)
- Goal: Reduce nonresponse bias through case prioritization
- Suggest targeting individuals with lower estimated response probabilities
  • For instance, give them larger incentives or give interviewer incentives
  • Their approach is basically equivalent to trying to increase the R-indicator (or achieving a more balanced sample)
- Recommend using auxiliary variables that are associated with the variables of interest

Literature review: Laflamme & Karaganis (2010, ECQ)
- Development and implementation of responsive designs for CATI surveys at Statistics Canada
- Planning phase:
  • Before data collection starts (determination of strategies, analyses of previous data, …)
- Initial collection phase:
  • Evaluate different indicators to determine when the next phase should start
- Two responsive design (RD) phases

Literature review: Laflamme & Karaganis (2010, ECQ) (continued)
- RD phase 1:
  • Prioritize cases (based on paradata or other information) with the objective of improving response rates
  • Increase the number of respondents (desirable)
- RD phase 2:
  • Prioritize cases with the objective of reducing the variability of response rates between domains of interest (increasing the R-indicator)
  • Likely to reduce the variability of weight adjustments (desirable)

Literature review: Schouten, Calinescu & Luiten (2011, Stat. Netherlands)
- First paper to propose a theoretical framework for adaptive survey designs
- Suggest:
  • Maximizing quality for a given cost; or
  • Minimizing cost for a given quality
- Requires a quality indicator (e.g., overall response rate, R-indicator, maximal bias, …)
  • Which one to use?
Definition of the problem
- Adaptive collection design: Any procedure of call prioritization or resource allocation that is dynamic as data collection progresses
  • Uses paradata (or other information) to adapt itself to what is observed during data collection
  • Focus on call prioritization
- Our objective: Maximize quality for a given cost
- Context: CATI surveys

Choice of quality indicator
- Focus of the literature: Find collection designs that reduce the nonresponse bias (or maximize the R-indicator) of an unadjusted estimator
- We think the focus should not be on nonresponse bias. Why?
  • Any bias that can be removed at the collection stage can also be removed at the estimation stage
- We suggest reducing the nonresponse variance of an estimator adjusted for nonresponse

Quality indicator
- Suppose we want to estimate the total: $\theta = \sum_{i \in U} y_i$
- Assuming that nonresponse is uniform within cells $g = 1, \ldots, G$, an asymptotically unbiased estimator is:
    $$\hat{\theta}_A = \sum_{g=1}^{G} \sum_{i \in s_{rg}} \frac{w_{gi}\,y_{gi}}{\hat{\rho}_g}, \qquad \text{with } \hat{\rho}_g = \frac{n_{rg}}{n_g}$$
- Quality indicator: The nonresponse variance
    $$\mathrm{var}_q\bigl(\hat{\theta}_A \mid s\bigr) \approx \sum_{g=1}^{G} \bigl(\rho_g^{-1} - 1\bigr)\,(n_g - 1)\,S^2_{wy,g},$$
  where
    $$\rho_g = E_q\bigl(\hat{\rho}_g \mid s\bigr) = \frac{E_q\bigl(n_{rg} \mid s\bigr)}{n_g}$$

Overall cost
- Overall cost:
    $$C_{TOT} = \sum_{g=1}^{G} C_{TOT,g}$$
    $$C_{TOT,g} = \sum_{i \in s_{rg}} \bigl[(m_{gi} - 1)\,C_{NR,g} + C_{R,g}\bigr] + \sum_{i \in s_g - s_{rg}} m_{gi}\,C_{NR,g}$$
  • $m_{gi}$: total number of attempts for unit $i$
  • $C_{NR,g}$: cost of an unsuccessful attempt
  • $C_{R,g}$: cost of an interview

Expected overall cost
- Expected overall cost:
    $$\bar{C}_{TOT} = E_q\bigl(C_{TOT} \mid s\bigr) = \sum_{g=1}^{G} \bar{C}_{TOT,g}$$
    $$\bar{C}_{TOT,g} = \bigl(C_{R,g} - C_{NR,g}\bigr)\,n_g\,\rho_g + C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi}$$
    $$\bar{m}_{gi} = E_q\bigl(m_{gi} \mid s\bigr) = m\bigl(p_{gi}, M_{gi}\bigr)$$
- Assumption: $\bar{m}_{gi}$ does not depend on $\rho_g$
- Under this assumption the expected overall cost is linear in the $\rho_g$:
    $$\bar{C}_{TOT} = \alpha_0 + \sum_{g=1}^{G} \alpha_{1g}\,n_g\,\rho_g,$$
  where, from the expression for $\bar{C}_{TOT,g}$, $\alpha_{1g} = C_{R,g} - C_{NR,g}$ and $\alpha_0 = \sum_{g=1}^{G} C_{NR,g} \sum_{i \in s_g} \bar{m}_{gi}$

Mathematical formulation
- Objective: Find $\rho_g$, $g = 1, \ldots, G$, that minimize the nonresponse variance $\mathrm{var}_q\bigl(\hat{\theta}_A \mid s\bigr)$ subject to a fixed expected overall cost, $\bar{C}_{TOT} = K$
- Solution:
    $$\rho_g = \left[\frac{\bigl(1 - n_g^{-1}\bigr)\,S^2_{wy,g}}{\lambda\,\alpha_{1g}}\right]^{1/2},$$
  where $\lambda$ is the constant (Lagrange multiplier) for which the cost constraint $\bar{C}_{TOT} = K$ holds
- Note: Equivalent to maximizing the R-indicator only in a very special scenario

Implementation
- Find the effort $e_{gi}$ (number of attempts) necessary to achieve the target response probability $\rho_g$:
    $$e_{gi} = \frac{\ln(1 - \rho_g)}{\ln(1 - p_{gi})}$$
- Procedure: Select cases to be interviewed with probability proportional to the effort $e_{gi}$
- Issues:
  1) Avoid small estimated $p_{gi}$ to avoid an unduly large effort $e_{gi}$
  2) Might want to ensure that a certain time has elapsed between two consecutive calls

Graph of variance vs cost
[Figure: minimum nonresponse variance plotted against expected overall cost]

Revised solution
- The solution of the optimization problem is found before data collection starts
- It may be a good idea to revise the solution periodically (e.g., daily)
  • Some parameters might need to be modified
  • Update the remaining budget and the expected overall cost
  • The revised optimization problem is similar to the initial one

Revised solution (continued)
- Solution (same as before):
    $$\rho_g = \left[\frac{\bigl(1 - n_g^{-1}\bigr)\,S^2_{wy,g}}{\lambda\,\alpha_{1g}}\right]^{1/2}$$
- Revised target response probability for the remaining nonrespondents:
    $$\tilde{\rho}_g = \frac{n_g\,\rho_g - n_{rg}}{n_g - n_{rg}}$$
  • Could be negative
- Effort:
    $$e_{gi} = \frac{\ln(1 - \tilde{\rho}_g)}{\ln(1 - p_{gi})}$$

Conclusion
- Next steps:
  • Simulation study
  • Adapt the theory for practical applications
  • Test in a real production environment
- Which quality indicator? Nonresponse variance? Others?
- Reduction of nonresponse bias: subsampling of nonrespondents
  • Our approach could be used within the subsample

Thanks
- For more information, please contact:
  Jean-François Beaumont (Jean-Francois.Beaumont@statcan.gc.ca)
  David Haziza (David.Haziza@umontreal.ca)
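The allocation, effort and revision steps from the "Mathematical formulation", "Implementation" and "Revised solution" slides can be sketched in Python as follows. This is only an illustration under stated assumptions: the cell sizes, the variance terms, the per-respondent cost coefficients (`alpha1`), the fixed cost (`alpha0`) and the budget are taken as known inputs. The Lagrange multiplier is eliminated in closed form by substituting the square-root solution into the linear cost constraint. All function names are hypothetical, and the sketch simply raises an error if a target falls outside (0, 1), a case the slides do not address.

```python
import math
import random


def target_response_probs(n, s2_wy, alpha1, alpha0, budget):
    """Targets rho_g minimizing sum_g (1/rho_g - 1)(n_g - 1) S2_g
    subject to alpha0 + sum_g alpha1_g * n_g * rho_g = budget."""
    # rho_g is proportional to b_g = sqrt((1 - 1/n_g) * S2_g / alpha1_g)
    b = [math.sqrt((1.0 - 1.0 / ng) * s2 / a1)
         for ng, s2, a1 in zip(n, s2_wy, alpha1)]
    # The proportionality constant (1 / sqrt(lambda)) follows from the budget
    scale = (budget - alpha0) / sum(a1 * ng * bg
                                    for a1, ng, bg in zip(alpha1, n, b))
    rho = [scale * bg for bg in b]
    if any(not 0.0 < r < 1.0 for r in rho):
        raise ValueError("a target falls outside (0, 1); not handled here")
    return rho


def expected_cost(n, rho, alpha1, alpha0):
    """Linear expected overall cost."""
    return alpha0 + sum(a1 * ng * r for a1, ng, r in zip(alpha1, n, rho))


def nonresponse_variance(n, rho, s2_wy):
    """Approximate nonresponse variance used as the quality indicator."""
    return sum((1.0 / r - 1.0) * (ng - 1) * s2
               for ng, r, s2 in zip(n, rho, s2_wy))


def effort(rho_g, p_gi):
    """Attempts needed so that 1 - (1 - p_gi) ** e equals rho_g."""
    return math.log(1.0 - rho_g) / math.log(1.0 - p_gi)


def revised_target(n_g, n_rg, rho_g):
    """Target for the n_g - n_rg remaining cases; may be negative."""
    return (n_g * rho_g - n_rg) / (n_g - n_rg)


def select_case(efforts, rng):
    """Draw one case index with probability proportional to its effort."""
    return rng.choices(range(len(efforts)), weights=efforts, k=1)[0]


if __name__ == "__main__":
    n, s2, a1 = [100, 200], [4.0, 1.0], [10.0, 10.0]   # made-up inputs
    rho = target_response_probs(n, s2, a1, alpha0=500.0, budget=2000.0)
    efforts = [effort(rho[0], p) for p in (0.2, 0.4, 0.6)]
    print(rho, efforts, select_case(efforts, random.Random(0)))
```

A negative revised target means the cell already has more respondents than planned; `effort` would then return a negative value, so a production procedure would need a rule for such cells.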
