United Nations Economic Commission for Europe
Statistical Division

Imputation in UNECE Statistical Databases: Principles and Practices
Steven Vale and Heinrich Brüngger, UNECE Statistical Division

Contents
The ECOSOC view of statistical imputation
Current practices
Basic principles
Step-by-step implementation
Conclusions and open questions

23 July 2016 Steven Vale - UNECE Statistical Division Slide 2

ECOSOC views
Resolution 2006/6 on strengthening statistical capacity
Sets limits for the use of imputation
... but also implicitly endorses it as a statistical technique
Statistical agencies need to review their practices to ensure compliance

Defining imputation
“A procedure for entering a value for a specific data item where the response is missing or unusable”
Boundary issues:
• Imputing and editing
• Imputing and forecasting

Current practice in UNECE
Very limited ad-hoc imputation
Four cases:
• Account identities
• Regional aggregates
  Poor quality national data with little impact on region totals
• Re-classification
• Using imputations from others
  Sufficient transparency in source metadata?

Basic principles (1)
Imputed national data are not published
• Avoids the need for consultation
Only official sources used for imputation
Preference for data from same country
Clear distinction between “real” and imputed data
Transparency: imputed data clearly flagged, and methods documented

Basic principles (2)
Aggregates must contain > 90% “real” data, covering > 50% of countries
Imputed data are re-calculated periodically to adjust for revisions
Method used defined at the level of the variable and stored as an attribute
Decisions on the use of imputation to be taken with regard to the quality framework

Step-by-step application
Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution
One step at a time, with pause and review to consider quality and cost / benefit
“Dashboard” to allow statisticians to choose the most appropriate method
Implemented in the context of reengineering of statistical database system

First step
Use a linear trend to impute missing values
Requirements:
• Sufficient time series observations (at least 3 out of previous 5 periods)
• Closeness of fit of linear trend (R² close to 1)
Constraints:
• Validity of R² for few observations
• Forward imputation only

[Figure: example time series for 2000 to 2007 marking, for each year, whether data are available and whether a value is imputed (Y = Yes, N = No)]

Next steps
More flexibility:
• Longer time series
• Imputing values at start and in middle of time series
• Non-linear trends?
Cross-country imputation in strictly limited cases?
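
Illustration (not part of the original slides): the "First step" rule and the aggregate limits from "Basic principles (2)" could be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not the UNECE implementation. The function names, the R² cut-off of 0.9 (the slides only say "close to 1"), the handling of a constant series, and the choice to measure the "real" share by value are all hypothetical.

```python
from typing import Optional, Sequence


def impute_next_by_linear_trend(
    series: Sequence[Optional[float]],
    window: int = 5,      # slides: look at the previous 5 periods
    min_obs: int = 3,     # slides: at least 3 of them must be observed
    min_r2: float = 0.9,  # ASSUMPTION: slides only say "R² close to 1"
) -> Optional[float]:
    """Forward-impute the period after `series` from a fitted linear trend.

    `series` is in chronological order, with None for missing periods.
    Returns None when the requirements are not met.
    """
    recent = list(series)[-window:]
    points = [(x, y) for x, y in enumerate(recent) if y is not None]
    if len(points) < max(min_obs, 2):  # a line needs at least two points
        return None

    # Ordinary least squares fit of y = intercept + slope * x.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Closeness of fit: R² = 1 - SS_res / SS_tot.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    # ASSUMPTION: a constant series is treated as a perfect fit.
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    if r2 < min_r2:
        return None

    # Forward imputation only: predict the period just after the window.
    return intercept + slope * len(recent)


def aggregate_within_limits(
    real_total: float,
    imputed_total: float,
    countries_with_real: int,
    countries_total: int,
) -> bool:
    """Check the > 90% "real" data and > 50% of countries limits.

    ASSUMPTION: the 90% share is measured by value of the aggregate.
    """
    real_share = real_total / (real_total + imputed_total)
    country_share = countries_with_real / countries_total
    return real_share > 0.9 and country_share > 0.5
```

For example, impute_next_by_linear_trend([None, 10.0, 12.0, None, 16.0]) has three observations lying exactly on a line, so it returns the trend value for the next period, whereas a series with only two observations in the last five periods returns None. With so few points R² is a weak test of fit, which is the constraint the slide itself flags.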

Conclusions
Strong links between imputation and quality
• Trade-off between accessibility and accuracy
Step-by-step, pause and review approach seems appropriate
Transparency is essential
Standardization of practices between international organizations would help

Open questions
1. Are other organizations interested in defining a common policy on the use of imputation, in response to the ECOSOC Resolution?
2. Could we go further and consider harmonization of methods and tools?
3. How should this be done? Is a specific forum needed, or can this be dealt with in combination with work on data quality?
4. Have other organizations modified their policies on imputation in the light of the ECOSOC Resolution, and if so, how?