Editing process

Outlining a Process Model for
Editing With Quality Indicators
Pauli Ollila (part 1)
Outi Ahti-Miettinen (part 2)
Statistics Finland
Process model for editing (top level)
[Diagram labels: DATA (target of the process); ACTION ENTITIES: data studies and planning of editing process, editing process, process and quality evaluation; MAIN EVALUATIONS; MAIN DECISIONS]
Draft; English terms not necessary in final form
Data studies and planning of editing process
DATA STUDIES slide
• Preliminary analysis gives an overview of the state of the content of the current data (raw or partially processed).
• Error diagnostics gives an overview of typical errors in the data and of possible changes in the error profile of the data.
Both phases have a program-based part and an interactive part:
• Program-based analysis: tabulation and calculation of statistics in relevant subgroups, targeted at the variables essential for the editing process (preliminary analysis) and at known fatal errors and clear suspicions found in the data (error diagnostics).
• Interactive analysis: drawing on the researcher's experience and using suitable IT solutions (analysis methods, graphical methods, views of observation values), this may catch new or changed data characteristics (preliminary analysis) or errors (error diagnostics) that prepared programs cannot find or that have not been encountered before.
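The program-based part of the preliminary analysis can be illustrated with a minimal sketch. The records, the variable (turnover) and the subgroups below are hypothetical, and the choice of statistics is an assumption made for illustration, not a list taken from the presentation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (subgroup, turnover); None marks a missing value.
records = [
    ("A", 120.0), ("A", 80.0), ("A", None),
    ("B", 300.0), ("B", 500.0),
]

def subgroup_summary(rows):
    """Tabulate count, missing count, total and mean per subgroup."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    summary = {}
    for group, values in groups.items():
        observed = [v for v in values if v is not None]
        summary[group] = {
            "n": len(values),
            "missing": len(values) - len(observed),
            "total": sum(observed),
            "mean": mean(observed) if observed else None,
        }
    return summary

summary = subgroup_summary(records)
for group in sorted(summary):
    print(group, summary[group])
```

Because the tabulation is a fixed program, it can be rerun unchanged on each new round of data, which is what separates it from the interactive part.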
Editing process (in general)
• All editing is carried out in the editing process phase.
• The editing process can include several error identification and correction actions, so it is iterative.
• Error identification includes actions that result in identifying certain and possible errors at the level of single observations or of groups of observations.
• Error correction corrects all or some of the identified errors, following the decisions made in the error identification phase.
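The iteration between error identification and error correction can be sketched as a simple loop. This is an illustration under stated assumptions only: the two edit rules, the variables and the sign-flip correction are hypothetical placeholders, not methods prescribed by the model.

```python
# Hypothetical edit rules: each returns True when the record passes.
RULES = {
    "turnover_nonnegative": lambda r: r["turnover"] >= 0,
    "wages_le_turnover": lambda r: r["wages"] <= r["turnover"],
}

def identify(records):
    """Error identification: map record id to the names of failed rules."""
    failures = {}
    for rec in records:
        failed = [name for name, rule in RULES.items() if not rule(rec)]
        if failed:
            failures[rec["id"]] = failed
    return failures

def correct(records, failures):
    """Error correction: treat only errors that have an automatic fix.

    Returns the number of corrections made in this round.
    """
    corrections = 0
    for rec in records:
        if "turnover_nonnegative" in failures.get(rec["id"], []):
            rec["turnover"] = abs(rec["turnover"])  # placeholder: assume a sign error
            corrections += 1
    return corrections

def edit(records, max_rounds=5):
    """Repeat identification and correction until nothing more can be fixed."""
    rounds = 0
    failures = identify(records)
    while failures and rounds < max_rounds:
        if correct(records, failures) == 0:
            break  # remaining errors need other treatment, e.g. manual review
        rounds += 1
        failures = identify(records)
    return rounds, failures

data = [
    {"id": 1, "turnover": 100.0, "wages": 40.0},
    {"id": 2, "turnover": -50.0, "wages": 10.0},
    {"id": 3, "turnover": 20.0, "wages": 90.0},
]
rounds, remaining = edit(data)
print(rounds, remaining)
```

In this sketch the second record is repaired in the first round, while the third record remains flagged because no automatic correction is defined for its failed rule.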
Editing process: ERROR IDENTIFICATION
(probably subject to further specification)
[Diagram: action steps are data processing; realisation of the information view for error identification; evaluation of the error identification and decisions on further measures. Other labels: methodological group, information for editing, information view, view group.]
Editing process: ERROR CORRECTION
[Diagram: correction of identified errors; evaluation of the realisation of the correction and calculation of "state of data" indicators; control of the constraints of the edited data and possible corrections; output: edited data.]
Process and quality evaluation
• Process and quality can be evaluated with indicators, which should be calculated automatically. The calculation process has a fixed form.
Process and quality evaluation
[Diagram label: edited data]
• Indicators describing the editing process
• "State of data" indicators (essential estimates at the population level and in relevant subgroups, as in the preliminary analysis and during the editing process)
• Indicators revealing the influence of editing on results
• Indicators in relation to previous results
More about the indicators in the 2nd part of the presentation
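Two of the indicator types listed above, the influence of editing on results and the relation to previous results, can be sketched as relative changes in an estimated total. The data values, the previous total and the use of a simple relative change are assumptions for illustration; the presentation does not fix a formula here.

```python
def relative_change(new, old):
    """Relative change (new - old) / old; None when old is zero."""
    if old == 0:
        return None
    return (new - old) / old

# Hypothetical values of one variable before and after editing.
unedited = [100.0, 250.0, 9000.0, 40.0]
edited = [100.0, 250.0, 90.0, 40.0]
previous_total = 500.0  # hypothetical edited total from the previous round

total_unedited = sum(unedited)
total_edited = sum(edited)

# Influence of editing on the result.
editing_effect = relative_change(total_edited, total_unedited)
# Result in relation to the previous round.
change_from_previous = relative_change(total_edited, previous_total)

print(total_unedited, total_edited)
print(editing_effect, change_from_previous)
```

A large editing effect combined with a small change from the previous round, as in this made-up case, is the pattern one would expect when editing removes a single gross error.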
Methodological resources (1)
• The actions carried out in the editing model are supported by the knowledge contained in the methodology bank, which describes the methods belonging to the methodology groups in the different phases of the editing model. It also helps in interpreting the statistical measures produced by the methods.
• The structure of the methodology bank strictly follows the methodology groups appearing in the editing model.
• The concept library defines the concepts used in the model and in the methodology bank.
Methodological resources (2)
• An instruction collection is needed to help in the different phases: for decisions (forthcoming actions and choices of methods), for interpreting and evaluating the results obtained before and during the processes, and for the actions required to carry out the methods. The instruction collection is based on research work and recommendations, on international experiences and practices, and on the statistical office's own experiences with data sets, error types and practices.
• The methodology bank, concept library and instructions should be easily available whenever needed (e.g. wiki-based). In part, these could also be used for documenting the quality of editing.
IT solutions (1)
[Figure: data blocks by round]
Edited t-2, Unedited t-2, Tot t-2
Edited t-1, Unedited t-1, Tot t-1, Pre t-2
Edited t, Unedited t, Tot t, Pre t-1
Example: "philosophy" of data preparations for SELEKT (whole lines).
Pre = previous values (for some variables), Tot = totals (or other statistics)
The data architecture should make all the data from previous rounds available (including unedited versions), with a coherent variable structure, for editing purposes. The system should easily calculate statistics (totals, means, …) at the overall level and in subgroups, and make them available for creating edit rules and functions and for monitoring changes over time.
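A data architecture of this kind can be sketched as a small store that keeps edited and unedited versions for each round and attaches previous values and totals to the current data. The store layout, the field names and the function `build_lines` are hypothetical; this is one possible reading of the "whole lines" idea, not the actual SELEKT input format.

```python
# Hypothetical store: round -> version -> {unit id: value of one variable}.
store = {
    "t-1": {
        "unedited": {1: 100.0, 2: 210.0},
        "edited": {1: 100.0, 2: 200.0},
    },
    "t": {
        "unedited": {1: 110.0, 2: 2300.0},
        "edited": {},  # not yet edited
    },
}

def build_lines(store, current, previous):
    """Attach the previous edited value (Pre) and previous total (Tot)
    to each unit's current unedited value."""
    prev_edited = store[previous]["edited"]
    prev_total = sum(prev_edited.values())
    lines = []
    for unit, value in sorted(store[current]["unedited"].items()):
        lines.append({
            "unit": unit,
            "unedited": value,
            "pre": prev_edited.get(unit),  # None for units new in this round
            "tot_prev": prev_total,
        })
    return lines

lines = build_lines(store, "t", "t-1")
for line in lines:
    print(line)
```

With the previous value on the same line, an edit rule can compare each unit with its own history, for example by flagging unit 2, whose value rose from 200 to 2300.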
IT solutions (2)
The IT environment should provide solutions for the methods in the methodology bank (e.g. modules, procedures, macro packages), or at least a flexible platform for constructing a program or another way of carrying out the method.
For larger entities of methods and practices it may provide applications or systems. The environment can include existing software (e.g. Banff, Selekt, LogiPlus) for carrying out some parts.
The aim is to move from programming by the user and manually written program updates to parameterization, in which time- and case-specific information, modification information and methodological choices are given as parameters.
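The move towards parameterization can be sketched as follows: the checking program stays fixed, and the edit rules are supplied as a parameter table. The table layout, the variable names and the limits are hypothetical and only show the principle.

```python
import operator

# Comparison operators that a parameter row may refer to.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Hypothetical parameter table; in practice it could be kept outside the
# program and updated for each round without touching the code.
PARAMS = [
    {"variable": "turnover", "op": ">=", "limit": 0.0},
    {"variable": "employees", "op": "<=", "limit": 5000},
]

def failed_edits(record, params):
    """Return a description of every parameterized rule the record fails."""
    failed = []
    for p in params:
        if not OPS[p["op"]](record[p["variable"]], p["limit"]):
            failed.append(f"{p['variable']} {p['op']} {p['limit']}")
    return failed

print(failed_edits({"turnover": -5.0, "employees": 12}, PARAMS))
print(failed_edits({"turnover": 75.0, "employees": 12}, PARAMS))
```

Changing a limit or adding a rule then means editing one row of the table rather than rewriting the program.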
IT solutions (3)
The IT environment should allow flexible processing and retrieval of metadata and process data in order to monitor the process (E&I indicators) and the state of the data (data indicators) during the process, and to evaluate the quality of the final data. These indicators should be easily available for quality reporting.
In principle, every action should leave a "trace" for this purpose. In practice, interactive operations in particular pose a substantial challenge to this objective.
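The idea that every action leaves a trace can be sketched by routing all changes through one function that records process data. The in-memory list `trace`, the field names and the method labels are hypothetical; a production system would store this information persistently.

```python
from collections import Counter
from datetime import datetime, timezone

trace = []  # hypothetical process-data store

def apply_action(record, variable, new_value, method):
    """Change one value and record what was changed, how and when."""
    trace.append({
        "unit": record["id"],
        "variable": variable,
        "old": record[variable],
        "new": new_value,
        "method": method,
        "time": datetime.now(timezone.utc).isoformat(),
    })
    record[variable] = new_value

rec = {"id": 7, "turnover": -50.0, "wages": None}
apply_action(rec, "turnover", 50.0, "automatic: sign correction")
apply_action(rec, "wages", 20.0, "interactive: manual entry")

# Process indicators can later be derived from the trace alone.
changes_by_method = Counter(entry["method"] for entry in trace)
print(changes_by_method)
```

The difficulty with interactive operations is that they easily bypass such a function, so a manual change has to be routed through the same interface to stay visible in the indicators.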
Quality Indicators for Editing
(part 2)
• UNECE
• Work session on Statistical Data Editing
• (Oslo, Norway, 24-26 September 2012)
Functions of indicators
• To achieve transparency of the data for users
• Systematic quality control
– To control error identification and correction
actions
• effects on the data
– To evaluate the development of the quality of data
• in different stages of editing process
• overall quality of the final data
Aims of the paper
• To study useful indicators related to editing
process from different sources
• To create a consistent and systematic practice
for using indicators in Statistics Finland
• To collect a list of useful indicators and their
definitions
Indicators were collected from
• EUREDIT / Ray Chambers (2004). Evaluation Criteria
for Statistical Editing and Imputation
• EDIMBUS (2007). Recommended Practices for Editing
and Imputation in Cross-sectional Business Surveys
• EUROSTAT (2009a). ESS Standard for Quality Reports
• STATISTICS FINLAND (2007). Quality Guidelines for
Official Statistics
• STATISTICS CANADA (2007). Functional Description of
the Banff System for Edit and Imputation
Indicators are presented in three categories
• Indicators related to raw data
– At the beginning of the editing process
– Give information about errors in the data, the effect of errors on results, and the significance of variables and subgroups
• Indicators related to error identification
– Describe the amount of errors in variables and observations
– Evaluate the efficiency of error identification procedures
• Indicators related to error correction
– Describe the quality of the data after error correction
– Indicate the amount of error correction actions
– Evaluate the effect of error correction on results or variables
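As an illustration of indicators related to error identification, the sketch below computes a flag rate and a hit rate. Both definitions are assumptions made for this example (the hit rate is taken as the share of flagged observations that were actually changed); the paper's own definitions may differ, and the unit ids are made up.

```python
def flag_rate(flagged, n_observations):
    """Share of observations flagged by error identification."""
    return len(flagged) / n_observations

def hit_rate(flagged, changed):
    """Share of flagged observations that were actually changed.

    Returns None when nothing was flagged.
    """
    if not flagged:
        return None
    return len(flagged & changed) / len(flagged)

# Hypothetical unit ids.
flagged = {2, 5, 8, 9}   # flagged as suspicious
changed = {2, 8, 11}     # changed during error correction
n_observations = 20

print(flag_rate(flagged, n_observations))
print(hit_rate(flagged, changed))
```

A low hit rate would suggest that the identification procedure sends many correct observations to review, which is one way of reading its efficiency.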
Discussion
• The number of indicators presented in the
paper is quite substantial
• Not all indicators are suitable for every type of
statistics
• Some indicators presented are important but
suitable only for specific situations
• Therefore, it is not possible to define a detailed list of indicators to be published with all statistics
Discussion (2)
• Some standard indicators for editing process should
always be computed
• Indicators measuring missingness in the data are valuable tools for the statistics production process, as they describe the coverage of the data at each stage of the process
• If any editing actions are taken, the user of the final product should have access to information on the edit rates of the data
• Altogether, indicators are an essential part of the editing process, as they provide information on the quality of the data, the results and the editing process in general
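The two standard indicators mentioned above, missingness at each stage and the edit rate, can be sketched for a single variable. The data are made up, and counting an imputed value as an edit is an assumption of this sketch rather than a definition from the paper.

```python
def missing_rate(values):
    """Share of missing (None) values of one variable."""
    return sum(v is None for v in values) / len(values)

def edit_rate(before, after):
    """Share of values that differ between two stages.

    An imputed value (None replaced by a number) counts as an edit here.
    """
    changed = sum(b != a for b, a in zip(before, after))
    return changed / len(before)

# Hypothetical values of one variable at two stages of the process.
raw = [100.0, None, 250.0, None, -40.0]
edited = [100.0, 120.0, 250.0, None, 40.0]

print(missing_rate(raw), missing_rate(edited))
print(edit_rate(raw, edited))
```

Reporting the missing rate for every stage shows how the coverage of the data develops during the process, while the edit rate tells the user how much of the final data has been changed.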