United Nations Economic Commission for Europe Statistical Division

Imputation in UNECE Statistical Databases: Principles and Practices
Steven Vale and Heinrich Brüngger, UNECE Statistical Division
Contents
• The ECOSOC view of statistical imputation
• Current practices
• Basic principles
• Step-by-step implementation
• Conclusions and open questions

23 July 2016
Steven Vale - UNECE Statistical Division
Slide 2
ECOSOC views
• Resolution 2006/6 on strengthening statistical capacity
• Sets limits for the use of imputation
• ... but also implicitly endorses it as a statistical technique
• Statistical agencies need to review their practices to ensure compliance

Defining imputation
• “A procedure for entering a value for a specific data item where the response is missing or unusable”
• Boundary issues:
  • Imputing and editing
  • Imputing and forecasting

Current practice in UNECE
• Very limited ad-hoc imputation
• Four cases:
  • Account identities
  • Regional aggregates
    • Poor quality national data with little impact on region totals
  • Re-classification
  • Using imputations from others
    • Sufficient transparency in source metadata?

Basic principles (1)
• Imputed national data are not published
  • Avoids the need for consultation
• Only official sources used for imputation
• Preference for data from same country
• Clear distinction between “real” and imputed data
• Transparency: imputed data clearly flagged, and methods documented

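One way to picture the flagging principle is as a record that carries its own imputation flag and method. The sketch below is illustrative only: the `Observation` class, its field names and the `publishable_national` helper are assumptions, not part of the UNECE database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One data point; all field names here are hypothetical."""
    country: str
    period: int
    value: float
    imputed: bool = False         # separates imputed from "real" values
    method: Optional[str] = None  # documented method, e.g. "linear_trend"

    def __post_init__(self):
        # Transparency: an imputed value must name the method that produced it.
        if self.imputed and not self.method:
            raise ValueError("imputed observations must document their method")


def publishable_national(observations: List[Observation]) -> List[Observation]:
    """Keep only reported values, since imputed national data are not published."""
    return [o for o in observations if not o.imputed]


data = [
    Observation("AT", 2005, 101.2),
    Observation("AT", 2006, 103.0, imputed=True, method="linear_trend"),
]
published = publishable_national(data)
```

Here `published` holds only the 2005 value, while the imputed 2006 value stays available internally, for example for building regional aggregates.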
Basic principles (2)
• Aggregates must contain > 90% “real” data, covering > 50% of countries
• Imputed data are re-calculated periodically to adjust for revisions
• Method used defined at the level of the variable and stored as an attribute
• Decisions on the use of imputation to be taken with regard to the quality framework

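A rough sketch of how the two aggregate thresholds might be checked. It assumes the 90% applies to the share of the aggregate's value that comes from real data and the 50% to the share of contributing countries; the slides do not say how the shares are measured, and `aggregate_allowed` is a hypothetical name.

```python
from typing import Dict, Tuple


def aggregate_allowed(
    values: Dict[str, Tuple[float, bool]],
    min_real_share: float = 0.90,
    min_country_share: float = 0.50,
) -> bool:
    """values maps country -> (value, imputed flag).

    Returns True only if real data make up more than `min_real_share`
    of the aggregate and cover more than `min_country_share` of countries.
    """
    if not values:
        return False
    total = sum(v for v, _ in values.values())
    if total <= 0:
        return False  # value shares are undefined for an empty or zero aggregate
    real_total = sum(v for v, imputed in values.values() if not imputed)
    real_countries = sum(1 for _, imputed in values.values() if not imputed)
    return (
        real_total / total > min_real_share
        and real_countries / len(values) > min_country_share
    )


# Small imputed share, most countries reporting: aggregate can be built.
ok = aggregate_allowed({"A": (60.0, False), "B": (35.0, False), "C": (5.0, True)})
# 95% of the value is real, but only one of three countries reported.
too_few_countries = aggregate_allowed(
    {"A": (95.0, False), "B": (3.0, True), "C": (2.0, True)}
)
```

Both conditions have to hold together, so one large reporting country cannot carry an aggregate for which most countries are imputed.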
Step-by-step application
• Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution
• One step at a time, with pause and review to consider quality and cost / benefit
• “Dashboard” to allow statisticians to choose the most appropriate method
• Implemented in the context of re-engineering of the statistical database system

First step
• Use a linear trend to impute missing values
• Requirements:
  • Sufficient time series observations (at least 3 out of previous 5 periods)
  • Closeness of fit of linear trend (R² close to 1)
• Constraints:
  • Validity of R² for few observations
  • Forward imputation only

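The requirements above can be sketched as follows. This is a minimal illustration, not the UNECE routine: the function name, the `min_r2=0.95` cut-off standing in for "R² close to 1", and the choice to count only observed (not previously imputed) values are all assumptions.

```python
from typing import Dict, Optional


def impute_linear_trend(
    series: Dict[int, float],
    target: int,
    window: int = 5,
    min_obs: int = 3,
    min_r2: float = 0.95,  # assumed cut-off for "R² close to 1"
) -> Optional[float]:
    """Return a linear-trend estimate for `target`, or None if not allowed.

    `series` maps period -> observed value (imputed values excluded).
    """
    # Forward imputation only: no observation may exist at or after the target.
    if any(period >= target for period in series):
        return None

    # Use the observations available in the previous `window` periods.
    pts = [(p, series[p]) for p in range(target - window, target) if p in series]
    if len(pts) < min_obs:
        return None

    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A constant series is fitted exactly, so treat its R² as 1.
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    if r2 < min_r2:
        return None
    return intercept + slope * target


observed = {2001: 10.0, 2002: 12.0, 2004: 16.0, 2005: 18.0}
estimate = impute_linear_trend(observed, 2006)
```

With four of the previous five periods observed and a perfect linear fit, `estimate` continues the trend to 20.0. As the slide notes, R² computed from three to five points is a weak test of fit, so any fixed cut-off should be treated with caution.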
[Figure: example time series, 2000 to 2007, marking for each year whether data are available and whether a value is imputed (Y = Yes, N = No)]

Next steps
• More flexibility:
  • Longer time series
  • Imputing values at start and in middle of time series
  • Non-linear trends?
• Cross-country imputation in strictly limited cases?

Conclusions
• Strong links between imputation and quality
  • Trade-off between accessibility and accuracy
• Step-by-step, pause and review approach seems appropriate
• Transparency is essential
• Standardization of practices between international organizations would help

Open questions
1. Are other organizations interested in defining a common policy on the use of imputation, in response to the ECOSOC Resolution?
2. Could we go further and consider harmonization of methods and tools?
3. How should this be done? Is a specific forum needed, or can this be dealt with in combination with work on data quality?
4. Have other organizations modified their policies on imputation in the light of the ECOSOC Resolution, and if so, how?