An ounce of planning is worth a pound of weighting: Measuring cash holdings
from the 2013 Bank of Canada Method-of-Payments Survey
Heng Chen, Bank of Canada
Chris Henry, Bank of Canada
Kim P. Huynh, Bank of Canada
Q. Rallye Shen, Bank of Canada
Kyle Vincent, Bank of Canada
Abstract
This article details the methodology used in the Bank of Canada's 2013 Method-of-Payments Survey to measure cash and non-cash payments. Measuring cash holdings is difficult due to the anonymous nature of cash and the fact that some populations are hard to reach. To ensure that the survey yields a representative sample, a variety of methods in survey design, weighting, and variance estimation are used to estimate cash holdings in the Canadian population. Overall, we find that Canadians hold on average about 84 dollars in cash.
Keywords
Sample weighting, non-probability samples, calibration, raking, variance estimation.
Acknowledgements: We thank Ben Fung, Geoff Gerdes, Catherine Haggerty, Arthur
Kennickell, Marcos Sanches, and numerous colleagues at the Bank of Canada for their
useful comments and encouragement in undertaking this survey. Maren Hansen
provided excellent editorial assistance. We acknowledge the tremendous collaboration
and support from Shelley Edwards, Jessica Wu, and Ipsos Reid for their dedication to
this study. Finally, we thank Statistics Canada for providing access to the 2011 National
Household Survey and the 2012 Canadian Internet Use Survey. The views expressed in this paper are those of the authors and do not represent the views of the Bank of Canada.
Section 1: Introduction
The Bank of Canada has an interest in understanding the level of cash holdings as it is the sole issuer of Canadian bank notes. Measuring the amount of cash holdings is difficult, however, as cash is an anonymous and untraceable payment method. Therefore, in 2009 the Bank of Canada undertook a Method-of-Payments (MOP) Survey, which found that demographic factors are strongly correlated with cash usage. Arango, Huynh, and Sabetti (2011), in their analysis of these factors, find that an increase in the availability and usage of non-cash payment methods such as debit, credit, and even mobile payments makes holding cash seem quaint among younger, educated, and wealthier demographics.
The 2009 MOP introduced a three-day payment diary, which served as a memory aid to
record all payments, including those in cash. This payment diary methodology has been
successfully used in six other countries: Austria, Australia, France, Germany, the
Netherlands, and the United States; more details are available in Bagnall et al. (2014).
In 2013, the Bank of Canada conducted a second MOP Survey to measure cash usage in Canada. The 2013 MOP is a mixed-mode survey involving both
paper and online collection methods. An Online sample was selected from a market
access panel accessible by email and an Offline sample from a panel accessible by
regular mail. In addition, a subsample was taken from another comprehensive annual
household survey, the Canadian Financial Monitor (CFM), which collects data on
household finances from approximately 12,000 Canadian households per annum. The
CFM survey instrument collects information that complements and overlaps with the
MOP, providing a complete picture of household finances. In total, 3600 surveys were
collected for the 2013 MOP from across the country and then weighted to ensure that
the sample is representative of the Canadian population.
Surveys of the aforementioned type require post-stratification (sample weighting) as
they are likely to be based on non-probability or, at best, highly unequal probability
samples. For example, in the 2009 MOP, males aged 18-24 were the most difficult to recruit and had the lowest response rates. Hence, as there is a large degree of
heterogeneity in the recruitment procedure, this type of study benefits from sample
calibration. The objective of this paper is to describe the methods undertaken to ensure
that the 2013 MOP is a representative sample.
The chief methods used consist of:
1) Revamping and redesigning the survey to ensure that it is user-friendly and easy
to fill out. Users report a satisfaction rating of about 90 percent on the survey.
2) Engaging respondents using a mixture of financial incentives and appeals to civic
duty, via an official letter from the Governor of the Bank of Canada.
3) Subsampling the CFM, whose data collection is conducted on a rolling monthly
basis. This provides our own survey the advantage of periodic updates.
4) Collaborating with the survey company to ensure that the field work achieves
preset population targets. The survey company hit all the targets with the help of
an additional boost wave, in which further invitations were sent out offering higher incentives for response.
5) Cleaning and editing the data using external data sources such as the market
access panel demographic profile, and verifying the subsample with the CFM. As
a result, the level of missing data is quite low, at only about 1-3 percent, and so
only light imputation is required.
6) Using raking methodology to conduct post-stratification weighting, based on external administrative and large-scale survey data from Statistics Canada. The resulting weights are insensitive to the choice of initial weights.
7) Using resampling methods with bootstrap replicate survey weights (BRSW) for
the variance estimation. Usage of BRSW results in a decrease of 20-50 percent in
the variance of the estimate.
The rest of this paper describes these procedures in detail. Section 2 discusses points 1-5, Section 3 discusses the post-stratification methods, and Section 4 highlights the resampling methods for variance estimation. Finally, Section 5 concludes.
Section 2: The 2013 Method-of-Payments Survey
The 2013 Method-of-Payments Survey is an update to the 2009 MOP. For further
information on the 2009 MOP survey, refer to Arango and Welte (2012); for the 2013
MOP, see Henry, Huynh, and Shen (2014). Indeed, planning for the survey included
incorporating lessons learned from both the 2009 MOP and the CFM.
One important lesson from the 2009 MOP is that certain hard-to-reach populations were
under-represented in the final sample. This led to zero/low cell counts, which in turn
gave rise to extreme weights. While sampling targets were achieved for marginal
demographic counts, missing cell counts at a nested level caused difficulties for the
weighting process.
Several measures were implemented to ensure that the sampling procedure in the 2013
MOP would avoid this problem. First, established sampling targets were nested by
region, age, and gender. Thus we knew beforehand, for example, how many males aged
18-24 from the Prairies region of Canada were required for the sample to reflect the
Canadian population. These pre-defined targets, built into the statement of services for
the survey company, facilitated the ongoing monitoring of returns during data
collection. Frequent updates allowed us to project which cells were likely to have an
excess or shortage. Finally, various levels and types of incentives were randomly
offered to potential respondents, which allowed us to determine the most effective
combination. Financial incentives ranged from $5 to $20; other types of incentives
included an advance letter signed by the Governor of the Bank of Canada requesting
participation, and a token $2 coin included in the survey package regardless of whether
or not the respondent participated.
Collaboration with the survey company, Ipsos Reid, was important to ensure that these
tools were effectively employed to hit the nested sampling targets. Ipsos Reid provided
almost daily updates to establish up-to-date projections for the final returns. During
data collection, certain cells were identified as in danger of being under-represented in
the final sample. Through timely collaboration with Ipsos Reid and senior management,
an additional sampling wave was added for the offline recruitment, which utilized the
full spectrum of high-level incentives: the $20 completion incentive, advance letter,
and token $2 incentive. As a result, we were able to hit (and exceed) all nested targets,
and ensure that no zero cells would impede the calibration.
The other main innovation of the 2013 MOP was to leverage the existing CFM survey, via
the survey instrument and the method of data collection. By comparing the 2009 MOP
with the CFM we discovered that the surveys contain some overlapping content. This led
to a sub-sampling of past CFM respondents (with the sampling frame provided by Ipsos
Reid) in addition to maintaining the Online and Offline panels employed for the 2009
MOP. For certain aspects of the questionnaire, for example information on a respondent's bank account(s) and credit card(s), the respondent had already provided the desired information in the CFM; this allowed us to shorten the 2013 MOP questionnaire, which likely had a positive impact on response rates.
Other topics in the 2009 survey, such as cash usage, were maintained in the 2013 MOP
but made directly comparable to questions in the CFM. This sub-sampling approach
proved very successful in recruiting respondents, with a response rate of over 50
percent. Furthermore, it provides an external benchmark with which to compare
measures of consumer cash holdings/usage.
Section 3: Survey Weighting
Data collection methods for large-scale surveys are almost always biased. Two common explanations are that the sampler does not have access to a full sampling frame, and that the sampler does not have full control over the sample selection procedure. However, sample calibration (balancing) is a post-stratification procedure that can be used to re-weight sample data so that it is more representative of the target population (Sarndal, 2007). Such sample weighting can facilitate more accurate estimates of population unknowns.
The calibration procedure leverages the availability of national-level counts of
auxiliary/demographic information, so as to balance the sample to these counts. For
example, in our calibration analysis we make use of national-level counts based on the
2011 National Household Survey (NHS) and the 2012 Canadian Internet Use Survey
(CIUS) for a variety of demographic variables. Calibration over such information offers
users the ability to reduce non-response and non-coverage bias effects. Consequently, the resulting estimators should be less biased than those based on the unweighted data (Kish, 1992).
In our calibration analysis we follow a series of steps to arrive at a suitable set of
calibration weights. We summarize the process with the flowchart in Figure 1 and
provide a detailed breakdown below. We then conclude the section with a reflection on
non-response weights.
Subsection 3.1 Outline of calibration analysis.
Stage A: We first consider a set of potential calibration variables that include both demographic and technology-oriented variables. These are chosen based on their conjectured relationship with important survey questions. A round of data editing and cleaning is undertaken, and imputation of missing values in the calibration variables is achieved with the aid of the R package mice (van Buuren, 2012).
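As a minimal sketch of this imputation step, assuming a data frame mop_data whose categorical calibration variables are illustrative placeholders (not the actual MOP fields):

    library(mice)
    # Impute missing values in categorical calibration variables;
    # "polyreg" uses polytomous regression, suitable for unordered factors
    vars <- c("region", "age_group", "gender", "income_band")
    imp <- mice(mop_data[, vars], m = 5, method = "polyreg", seed = 2013)
    mop_complete <- complete(imp, 1)  # extract one completed data set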
As recruitment came from three separate market access panels, a comparative analysis
is performed using the Epps-Singleton test for homogeneity on the demographic
variables. The three subsamples are found to be fairly homogeneous in composition and
we therefore concatenate them into one final sample for the calibration analysis.
A correlation analysis over the calibration variables is conducted, in order to determine
any potential collinear variables to eliminate from the analysis. As the calibration
variables are classified as categorical or ordinal variables, we use the polychoric
correlation measure (Drasgow, 2004) with the aid of the R package “polycor” (Fox,
2010) to compute the correlations.
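A brief sketch of this correlation check, reusing the hypothetical variables from the imputation example above:

    library(polycor)
    # Pairwise polychoric correlation between two categorical/ordinal variables
    rho <- polychor(mop_complete$age_group, mop_complete$income_band)
    # Full mixed correlation matrix: hetcor() chooses polychoric, polyserial,
    # or Pearson correlations as appropriate for each pair of variables
    cor_mat <- hetcor(mop_complete[, c("region", "age_group", "gender", "income_band")])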
Stage B: The calibration variables are found to be only mildly to moderately correlated,
and hence no variables are eliminated from the analysis. With respect to nesting calibration variables, we pair the two most correlated variables with each other, as well as the gender variable with several other calibration variables. Nesting allows us to avoid sparse cells. For example, the gender variable is often paired with other calibration variables because, as a binary variable, it avoids small cell counts while still accounting for disagreements between the joint sample and national distributions.
A range of calibration techniques can be used; see Deville et al. (1993) for a
mathematically detailed account of some commonly used procedures. We select the raking and generalized regression (GREG) procedures, as these are popular methods among both national statistical agencies and academics; see Sarndal (2007) and the R survey package by Lumley (2010).
Two sets of initial weights for the raking method are considered: one based on a simple random sampling design and one based on a stratified sampling design over several key demographics. The correlation between the two sets of generated weights, when using the raking algorithm and the full list of calibration variables, is high. Hence, we conclude that the final weights are likely to be insensitive to the choice of initial weights, and therefore base them on the simple random sampling design.
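A sketch of how such a raking step can be set up with the R survey package (Lumley, 2010) follows; the margins and population counts are illustrative rather than actual NHS totals, and the combined gender-by-age variable mirrors the nesting described above:

    library(survey)
    # Nest gender within age to control their joint distribution
    mop_complete$gender_age <- interaction(mop_complete$gender, mop_complete$age_group)
    # Initial design: simple random sampling with equal base weights
    des <- svydesign(ids = ~1, weights = rep(1, nrow(mop_complete)), data = mop_complete)
    # Marginal population counts (illustrative numbers only)
    pop_region <- data.frame(region = c("Atlantic", "BC", "Ontario", "Prairies", "Quebec"),
                             Freq = c(1.9e6, 3.6e6, 10.5e6, 4.8e6, 6.3e6))
    pop_gender_age <- data.frame(gender_age = levels(mop_complete$gender_age),
                                 Freq = c(3.1e6, 3.0e6, 5.4e6, 5.5e6, 5.0e6, 5.1e6))
    raked <- rake(des, sample.margins = list(~region, ~gender_age),
                  population.margins = list(pop_region, pop_gender_age))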
Stage C: The raking and GREG procedures are evaluated based on the range of weights they generate. With numerous combinations of calibration variables, the GREG procedure gives rise to a number of negative weights. Since raking avoids this problem and is a post-stratification method widely used in the statistics profession, we choose the raking method.
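As an illustration of this diagnostic, the weight ranges of the two procedures can be compared directly. The GREG fit below reuses the design and illustrative totals from the raking sketch; note that calibrate() expects population totals named after the columns of the model matrix implied by the formula:

    # Raking keeps the calibrated weights strictly positive
    summary(weights(raked))
    # Linear (GREG) calibration on comparable margins can yield negative weights
    pop_totals <- c(`(Intercept)` = 27.1e6,
                    regionBC = 3.6e6, regionOntario = 10.5e6,
                    regionPrairies = 4.8e6, regionQuebec = 6.3e6,
                    genderM = 13.3e6)
    greg <- calibrate(des, formula = ~region + gender,
                      population = pop_totals, calfun = "linear")
    summary(weights(greg))  # inspect the minimum for negative values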
Subsection 3.2 Non-response weights.
The issue of non-response is a common concern among survey practitioners; see Kish (1992) for more details. Typically, non-response will bias the estimates, inflate their variance, and increase survey costs (as follow-ups can be expensive). However, several procedures can be used to account for such non-random non-response.
In some cases, typically when the quantity of non-response is small, imputation can be
prescribed to resolve such issues. However, determining a suitable model and
imputation strategy can be resource-intensive and computationally expensive, as surveys comprise many questions. Instead, the original calibration weights can
be adjusted to compensate for the non-response issues.
Further, when responses are missing completely at random (MCAR) (Rubin, 1976) and
non-response counts are small, one approach is to base estimation on a rescaling of the
original calibration weights for those who have responded. However, the non-response
pattern will usually be such that it cannot be viewed as MCAR. A common approach in
such a case is to assume that the responses are missing at random (MAR) (Rubin, 1976).
In other words, the responses are MCAR within strata/classes of the survey respondents. This assumption allows the response probabilities to be estimated, possibly with a logistic regression or propensity scoring model, and appended to the original calibration weights. Essentially, a respondent i with inclusion probability P_i receives an original calibration weight 1/P_i and an estimated response probability r_h depending on their stratum membership h. Their corresponding weight, for the survey question under consideration, is then w_i = 1/(P_i r_h).
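A minimal sketch of this adjustment, assuming a hypothetical frame-level data frame panel_frame containing a response indicator, an inclusion probability incl_prob, and the illustrative covariates used above:

    # Estimate response propensities under the MAR assumption
    resp_model <- glm(responded ~ region + age_group + gender,
                      family = binomial, data = panel_frame)
    panel_frame$r_hat <- predict(resp_model, type = "response")
    # Combined weight for respondents: w_i = 1 / (P_i * r_h), with the estimated
    # propensity standing in for the stratum-level response probability r_h
    respondents <- subset(panel_frame, responded == 1)
    respondents$w <- 1 / (respondents$incl_prob * respondents$r_hat)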
The aforementioned approach comes with precautions. As noted by Sarndal (2007), an inherent bias in the procedure is likely when estimating the actual response probabilities. Hence, a high level of caution should be exercised when positing response models. However, rigorous methods have been developed to approximate
potential non-response bias as a function of the responses and the national-level
covariate information (Sarndal and Lundstrom, 2008); with such methods, suitable
calibration variables can be chosen to reduce the non-response bias.
Subsection 3.3 Non-probability samples
In an empirical setting, non-probability sampling is a common concern, as latent factors may influence selection probabilities (consider the discussion above on non-response).
When estimation is based on non-probability samples, the population attributes are
typically assumed to be distributed somewhat evenly so that the sample weighting
(through a posited probability sampling design and sample calibration) can still provide
accurate results.
Web-based studies are gaining popularity due to their convenience and efficiency in
obtaining data. Almost all of these surveys rely on convenience or volunteer sampling,
both of which are non-probability sampling designs. Apart from legally mandated surveys such as the national census, most surveys rely on volunteer sampling. Such
limitations have been acknowledged in the literature, and the demand is increasing for
calibration methods to improve the accuracy of results based on such samples; see
Schonlau et al. (2007) and DiSogra et al. (2011).
Internet and mobile usage information from nationwide surveys presents much
potential for use in calibration analyses. We therefore exploit such demographic information from the 2012 CIUS for our calibration analysis, namely ownership of a mobile device and online payment activity, as some of the MOP survey questions relate to mobile means of payment. More details about this weighting procedure are available in Vincent (2014).
Section 4: Variance Estimation
Since the 2013 MOP employs stratified random sampling (Section 2), and survey
weights are applied to ensure a representative sample (Section 3), variance estimation
should take both the sampling design and calibration procedure into account.
Heuristically speaking, the variance depends on the weighting procedure, not just on the
numerical values of the weights. However, in most payment surveys, variances are
usually calculated by taking the calibrated weights as fixed values, thereby biasing the
variance estimates. In order to capture the randomness from both sampling design and
weight calibration, we propose a resampling method, specifically the bootstrap replicate
survey weights (BRSW).
For example, if the weight calibration is raked over the external variables x, then the
estimated variance of the population-based weighted y variable is:

Var_hat(y) = Σ_{k,l ∈ S} [(π_kl − π_k π_l) / π_kl] [(y_k − x_k b) / π_k] [(y_l − x_l b) / π_l]
where π_k is the k-th unit's inclusion probability in the sample, π_kl is the pairwise inclusion probability of the k-th and l-th units, and the parameter b is the OLS estimate from regressing y on x. The first factor is proportional to the sampling covariance of the k-th and l-th units, while the second and third factors are the fitted residuals scaled by their inclusion probabilities.
Our reasons for choosing resampling over the linearization method are as follows. First,
most software packages use linearization as if the weights were fixed, which does not allow for model-based information (b), but rather uses w_k, the calibrated weight, in the denominator. Although the correct linearization method is suggested by Lu and Gelman (2003), their method can be difficult to implement when sample weights are complicated functions of sample sizes within strata. (Only under simple random sampling does w_kl have a straightforward formula depending on w_k and w_l.) Second, the linearization estimators use the initial weights. Hence, this requires the survey dataset to include both the calibrated weights and the base weights, which may confuse users.
Third, the end user of the data must be given the set of strata variables, which may not
be possible if confidential variables are used in calibration. These complications cause
resampling to be the more popular method for estimating variances when calibrated
weights are used (Shao 1996, Kolenikov 2010).
Among the many resampling methods for a complex survey, we choose to use BRSW.
We prefer the bootstrap to either the jackknife or balanced repeated replication (BRR),
because the jackknife is inconsistent for non-smooth functions (e.g. the median
estimate), and BRR is more suitable for a stratified clustered sampling design, which
was not used in our 2013 MOP Survey. As for choosing to use replicate survey weights,
we do so to protect the privacy of survey respondents (no strata information will be
provided), and because replicate weights incorporate information about corrections for
non-response bias (weights are re-adjusted for each replicate).
The construction of the BRSW involves first re-creating the sample in each replicate and
then adjusting the associated calibrated weights. For example, if a unit from a replicate
is not sampled, a zero weight is assigned to it, and then the weights of the other units in
the same stratum are expanded to compensate. In the next step, the weight calibration is
applied to each of these replicate sets of weights. These two steps generate the
bootstrap replicate weights. We use the bsweights package in Stata to implement this
method (Section 3.2 in Chen and Shen, 2014).
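A rough R analogue of this two-step construction, using the survey package and the illustrative design and margins from Section 3 (cash_on_hand is a hypothetical variable name), might look as follows:

    library(survey)
    # Step 1: draw bootstrap replicates and re-create the sample weights
    rep_des <- as.svrepdesign(des, type = "bootstrap", replicates = 500)
    # Step 2: re-rake every replicate to the same population margins, so each
    # replicate carries its own calibrated weights
    rep_raked <- rake(rep_des, sample.margins = list(~region, ~gender_age),
                      population.margins = list(pop_region, pop_gender_age))
    # Variance of mean cash on hand now reflects both the sampling design and
    # the calibration step
    svymean(~cash_on_hand, rep_raked)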
Table 1 shows mean and variance computations from the 2013 MOP and 2013 CFM data.
As a basis of comparison, we decompose the 2013 MOP Survey sample statistics into
online, offline and CFM subsamples. Overall, the weighted 2013 MOP total mean cash holding of 83.68 is lower than the 2013 CFM mean of 94.67. Part of this finding is driven by the MOP online mean cash holding, which is about 80.06 and constitutes about one-third of the sample. The MOP offline mean of 91.10 is closer to the 2013 CFM mean, partly because this subsample is drawn from the offline access panel. The CFM subsample respondents provide a point of comparison as they are participants of both the 2013 MOP and 2013 CFM. The average cash holdings for this overlapping set of respondents are 83.41 (2013 MOP) and 89.82 (2013 CFM). Overall, the 2013 MOP estimate of cash holdings is lower than the 2013 CFM estimate. Part of this discrepancy may be due to the timing of the surveys: the 2013 MOP was conducted in October-November 2013, while the 2013 CFM respondents were sampled from a subset of January-August 2013 participants.
We compute the variance based on linearization and the BRSW methodology. In the
second row, the variances are calculated by linearization without considering the
weighting procedure, while in the third row the variances are calculated by BRSW. The
BRSW variances are much smaller than those calculated by linearization, because the
resampling method takes into account the weight calibration procedure, which is
applied after the sample is collected.
Note that the paper-based (MOP CFM subsample and MOP offline) BRSW variances improve by about 20-25 percent, while the MOP online BRSW variance improves by over 50 percent. A plausible explanation is two-fold. First, the cash-on-hand variable is more correlated with the raking variables (e.g. online payment) for the online respondents than for the paper-based ones. By computing the R-squared for the different subsamples, we find that the R-squared for the online panel is highest at 0.0179, compared with 0.0048 and 0.0087 for the offline and CFM panels, respectively. Second, since the sample variance also depends on the sample size, the relatively larger sample size of the online subsample could also drive the sizable improvement.
Table 1: Mean and Variance Estimates for Average Cash on Hand

                                  2013 MOP                                    2013 CFM
              Raw      ------------- Weighted --------------       -------- Weighted --------
              Total    Total    Online   Offline  CFM Subsample    CFM Subsample    Total
Mean          87.62    83.68    80.06    91.10    83.41            89.82            94.67
VarLin        15.84    13.27    48.51    50.66    25.81            7.60             4.40
VarBRSW       N/A      7.76     17.90    41.10    20.09            N/A              N/A
R-squared     N/A      0.009    0.018    0.005    0.009            N/A              N/A
Observations  3413     3413     1294     679      1440             1440             12280
Notes: Average cash on hand is measured in Canadian dollars. Statistics are based
on respondents reporting positive figures (i.e. excluding zero responses). VarLin
is the linearized variance and VarBRSW is the variance based on bootstrap
replicate survey weights. R-squared is the goodness of fit from the raking
variables regression.
This exercise demonstrates that if we ignore the random fluctuations due to the weight calibration in the variance estimation, the resulting variances will tend to be conservative (too large) and the confidence intervals too wide, with coverage exceeding the pre-specified nominal level.
Section 5: Conclusion
Our experience from the 2013 Method-of-Payments Survey highlights that survey weighting should not be viewed as a panacea for the ills of a poorly designed survey. Rather, a well-thought-out survey design and careful preparation will go a long way toward ensuring that the survey is representative. Specifically, we suggest that survey teams:
1) Use mixed-mode survey methods so that each mode can be used to validate and
verify the others. In addition, various trusted external data should be brought in
to help calibrate the survey collection methods.
2) Use the methods espoused by Dillman (2007) to induce higher response rates.
3) Work closely with the data collection agency to ensure that the objectives are laid
out in advance. Closely monitor all fieldwork and collaborate to head off any
difficulties.
4) Conduct post-stratification using a variety of methods, but ensure that the
methods are robust and make sense. Again, use a variety of external data to
conduct post-stratification.
5) Compute variance estimates using resampling methods, as they result in variance reduction as well as provide a way to anonymize the sampling design.
Overall, we found in the 2013 MOP that average cash holdings are about 84 dollars, which is about 10 dollars less than what was found in the 2013 CFM. Understanding the source of the difference in estimates is left for future work.
References
Arango, C., Huynh, K. P., and Sabetti, L. (2011). How Do You Pay? The Role of Incentives at the Point-of-Sale. Working Paper 11-23, Bank of Canada.
Arango, C. and Welte, A. (2012). The Bank of Canada's 2009 Method-of-Payments Survey: Methodology and Key Results. Discussion Paper 12-6, Bank of Canada.
Chen, H. and Shen, R. (2014). Variance Estimation for Survey-Weighted Data using
Resampling Methods: 2013 Method-of-Payment Survey Questionnaire. Technical
Report.
Deville, J. C., Sarndal, C. E., and Sautory, O. (1993). Generalized Raking Procedures in
Survey Sampling. Journal of the American Statistical Association 88, 1013-1020.
Dillman, D. (2009). Internet, Mail and Mixed-Mode Surveys: The Tailored Design Method, 3rd ed.
DiSogra, C., Cobb, C., Chan, E., and Dennis, M. J. (2011). Calibrating non-probability internet samples with probability samples using early adopter characteristics. Proceedings of the Section on Survey Research Methods, JSM.
Drasgow, F. (2004). Polychoric and Polyserial Correlation. John Wiley and Sons, Inc.
Fox, J. (2010). Polycor: Polychoric and Polyserial Correlations. R package version 0.7-8.
Henry, C., Huynh, K. P., and Shen, Q. R. (2014). 2013 Method-of-Payments Survey Report.
Kish, L. (1992). Weighting for unequal pi. Journal of Official Statistics 8, 183-200.
Kolenikov, S. (2010). Resampling variance estimation for complex survey data. The Stata Journal 10 (2), 165-199.
Lu, H. and Gelman, A. (2003). A method for estimating design-based sampling variances for surveys with weighting, poststratification, and raking. Journal of Official Statistics 19 (2), 133-151.
Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. John Wiley & Sons,
Ltd.
Rubin, D. (1976). Inference and missing data. Biometrika 63, 581-592.
Sarndal, C.-E. (2007). The calibration approach in survey theory and practice. Survey
Methodology 33, 99-119.
Sarndal, C.-E. and Lundstrom, S. (2008). Assessing auxiliary vectors for control of non-response bias in the calibration estimator. Journal of Official Statistics 24 (2), 167-191.
Schonlau, M., van Soest, A., and Kapteyn, A. (2007). Are 'webographic' or attitudinal questions useful for adjusting estimates from web surveys using propensity scoring? RAND Corporation.
Shao, J. (1996). Resampling methods in sample surveys (with discussion). Statistics 37, 203-254.
van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman and Hall/CRC
Press.
Vincent, K. (2014). 2013 Method-of-Payments Survey Calibration Manual. Technical Report.
Figure 1: Survey Weighting Workflow