UKSUG10.DSouza

A Stata program for calibration weighting
John D’Souza
National Centre for Social Research

Outline
- Description of calibration
  - Adjust selection weights so that a weighted sample exactly matches the population
  - Generalizes post-stratification
  - Several methods: linear, logistic, …
- SAS, GenStat
- A new Stata program
- Limitations and extensions

Sampling
- Selection weights: $d_k = 1/P(\text{person } k \text{ is chosen})$
- Sample frame variables $X_{k1}, \dots, X_{kJ}$ with known population totals $P_1, \dots, P_J$.
- Horvitz-Thompson estimator of $P_i$: $\sum_k d_k X_{ki} \approx P_i$ for $i = 1, 2, \dots, J$.
- Calibration: adjust $d_k$ to get calibration weights, $w_k$, giving exact equality: $\sum_k w_k X_{ki} = P_i$ for $i = 1, 2, \dots, J$.

Example: School Census
Variables include
- Age, Gender, Ethnic Group, Exam results
- Type of School, Region
- Pupil’s Free School Meal (FSM) eligibility
We calibrate to J variables. E.g.
- Boy (binary)
- Girl (binary)
- Region (e.g. four categories)
- FSM eligibility (binary)
J = 1 + 1 + (4-1) + 1 = 6

Special case: post-stratification
Simplest case:
- One categorical variable
- Easy to deal with (post-stratification)
- svyset , poststrata() postweight()
More general case:
- Several variables (categorical and numerical)

Deville and Särndal (1992). Minimize the “distance” between w and d subject to the J calibration constraints.
- Linear calibration: minimize $\sum_S (w_k - d_k)^2 / d_k$
  - Involves solving J simultaneous linear equations
- Logistic calibration: minimize $\sum_S \left( w_k \log(w_k/d_k) - w_k + d_k \right)$
  - Involves solving J simultaneous non-linear equations

GenStat, SAS, Stata
GenStat and SAS
- Methods: linear, logistic and bounded.
- Estimation: GenStat gives SEs.
- SAS handles categorical variables directly. Enter as indicator variables in GenStat.
Stata
- Post-stratification (calibration to one categorical variable). Gives SEs.
- No routine for general calibration.

A new Stata program
Typical syntax.
    matrix M = (10000, 10000, 3000, 4000, 3000, 8000)
    calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        method(linear) print(final)

- 10,000 boys, 10,000 girls, 3,000 FSM
- Variables boy, girl, FSM are binary
- Categorical variable region (4 categories) is turned into 4 binary indicator variables. Only 3 are entered in the syntax (collinearity)

Output

Variable   Pop total   Weighted (entrywt)   Weighted (exitwt)   R
boy        10000       9619.7188            10000                .21373408
girl       10000       10380.281            10000                .13733883
FSM        3000        2915.4929            3000                 .04710333
ireg1      4000        4056.3379            4000                -.19511394
ireg2      3000        3197.1831            3000                -.24808005
ireg3      8000        8507.042             8000                -.2391432

Options
- Options available to:
  - Control amount of output/graphs
  - Set max number of iterations/tolerance
Methods
- linear, logistic, bounded linear (blinear) and nonresp
  - blinear sets bounds for $w_k/d_k$. GenStat and SAS have something very similar.
  - nonresp adjusts for non-response (see below)

Limitations (1)
- Solves the equations by finding a matrix inverse
  1. Won’t work if J is large
  2. Can have problems with singular or nearly singular matrices
  3. Iterative methods (logistic, blinear) won’t always converge
- No obvious solution to 1. Problems 2 and 3 are usually down to problems with the data

Limitations (2)
- We need to recode categorical variables (SAS doesn’t)
  - Stata: tab region, gen(ireg)
- More complicated (e.g. two-phase) problems aren’t handled directly
  - Need a bit of syntax to handle this
  - Other packages can handle this directly

Extensions: Standard errors
Calibration weights are often incorrectly treated as selection weights.
    calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3)
    calibmean , selwt(w1) calibwt(w2) yvar(y) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        psu(school) designops(strata(region))

This generalizes Stata’s poststrata() option.

Extension: Method nonresp (1)
Example
- Select schools, then classes, then pupils
- Assume all schools respond, pupils might not
Variables available on responders (pop totals available)
- Gender, Exam results, FSM, Region
Variables available on non-responders (pop totals not available)
- PTratio: Pupil-teacher ratio
- topset: Is pupil in the top set?

Extension: Method nonresp (2)

      serial   region   topset   outc   sex   FSM
     ---------------------------------------------
  1.    1001        1        1      0     .     .
  2.    1002        1        0      1     1     0
  3.    1003        2        0      0     .     .
  4.    1004        1        0      1     1     1
  5.    1005        3        1      0     .     .
     ---------------------------------------------
  6.    1006        1        0      1     0     1
  7.    1007        3        1      1     1     0
  8.    1008        2        1      0     .     .
  9.    1009        1        0      1     1     0

Extension: Method nonresp (3)
Population totals unknown, but variables are available on all the sample (including non-responders)

    calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        method(nonresp) outc(outc) ///
        svars(PTratio topset)

Responders are weighted to pop totals on “marginals” and to selected sample totals on “svars” (Lundström & Särndal, 2005)

Conclusions
- We’ve found the program can handle many practical problems
- Easy to calculate SEs (but theory assumes no non-response)
- Method nonresp isn’t available in many packages
- We don’t have to calibrate to population totals
  - E.g. calibrate Wave n+1 of a survey to totals from Wave n
  - Calibrate one sample to look like another

Questions

References
- Deville, J.-C. and Särndal, C.-E. 1992. Calibration estimators in survey sampling. Journal of the American Statistical Association 87: 376-382.
  Background and theory behind calibration
- Lundström, S. and Särndal, C.-E. 2005. Estimation in Surveys with Nonresponse. Wiley.
  Deals with non-response
- Singh, A.C. and Mohl, C.A. 1996. Understanding Calibration estimators in Survey Sampling. Survey Methodology 22: 107-115.
  Discusses several methods of doing bounded calibration
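
Worked sketch: linear calibration by hand
The linear method described in the slides reduces to J simultaneous linear equations. As a rough illustration of what that involves, the sketch below solves them directly in Mata for a toy data set. It is not the calibrate program's own code. It uses the standard closed form for the linear distance, $w_k = d_k (1 + x_k' \lambda)$ with $\lambda = (\sum_k d_k x_k x_k')^{-1} (P - \sum_k d_k x_k)$. The eight pupils, the selection weight of 100, the totals in M and the variable names fsm and w2 are all made up for the example.

```stata
* Toy sample: 8 pupils, each with selection weight 100 (illustrative values only)
clear
input boy girl fsm w1
1 0 1 100
1 0 0 100
1 0 0 100
1 0 1 100
0 1 0 100
0 1 1 100
0 1 0 100
0 1 0 100
end

* Hypothetical population totals for boy, girl, fsm (same order as the columns of X below)
matrix M = (420, 380, 310)

mata:
    X = st_data(., "boy girl fsm")      // n x J matrix of calibration variables
    d = st_data(., "w1")                // n x 1 selection weights
    P = st_matrix("M")'                 // J x 1 population totals

    // Solve the J linear equations: (X' diag(d) X) lambda = P - X'd
    lambda = invsym(cross(X, d, X)) * (P - cross(X, d))

    // Linear calibration weights: w_k = d_k * (1 + x_k' lambda)
    w = d :* (1 :+ X * lambda)

    // Save the calibrated weights as a new variable w2
    st_store(., st_addvar("double", "w2"), w)
end

* Weighted totals under w2 should now reproduce the totals in M
total boy girl fsm [pweight = w2]
```

The matrix being inverted is J x J, which is why a large J or collinear calibration variables (for example, entering all four region indicators alongside boy and girl) cause the problems listed under Limitations (1). Linear calibration can also produce negative weights, which is one motivation for the bounded and logistic methods.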
