Outlining a Process Model for Editing With Quality Indicators
Pauli Ollila (part 1)
Outi Ahti-Miettinen (part 2)
Statistics Finland

Process model for editing (top level)
[Diagram. Phases: Data studies and planning of editing process; Editing process; Process and quality evaluation. Legend: DATA (target of the process), ACTION ENTITIES, MAIN EVALUATIONS, MAIN DECISIONS. Draft; English terms not necessary in final form.]

Data studies and planning of editing process: DATA STUDIES (slide)
• Preliminary analysis gives an overview of the content of the current data (raw or partially processed).
• Error diagnostics gives an overview of typical errors in the data and of possible changes in the error profile of the data.
• Both phases have a program-based and an interactive part:
– Program-based analysis: tabulation and calculation of statistics, with relevant subgroups, for the variables essential to the editing process (preliminary analysis) and for known fatal errors and clear suspicions found in the data (error diagnostics).
– Interactive analysis: the researcher, drawing on experience and using suitable IT solutions (analysis methods, graphical methods, views of observation values), may catch new or changed data characteristics (preliminary analysis) or errors (error diagnostics) that cannot be found with prepared programs or have not been encountered before.

Editing process (in general)
• All editing is carried out in the editing process phase.
• The editing process can include several error identification and correction actions, applied iteratively.
• Error identification includes actions that result in identifying certain and possible errors at the level of an observation or of a group of observations.
• Error correction carries out corrections of all or some of the identified errors, following the decisions made in the error identification phase.
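The division of the editing process into iterated error identification and error correction can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not part of the model: the variable names (`turnover`, `prev_turnover`), the two edit rules, the threshold `max_ratio`, and the carry-forward correction. In this sketch only certain errors are corrected automatically; possible errors are left flagged for a later decision.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    unit_id: str
    variable: str
    certain: bool  # True: certain (fatal) error; False: possible error


def identify_errors(records, max_ratio=10.0):
    """Error identification: flag certain and possible errors per observation."""
    flags = []
    for rec in records:
        value, previous = rec["turnover"], rec["prev_turnover"]
        if value < 0:
            # Hypothetical fatal edit: turnover must not be negative.
            flags.append(Flag(rec["id"], "turnover", certain=True))
        elif previous > 0 and value / previous > max_ratio:
            # Hypothetical query edit: suspiciously large growth.
            flags.append(Flag(rec["id"], "turnover", certain=False))
    return flags


def correct_errors(records, flags):
    """Error correction: correct only the errors flagged as certain."""
    to_fix = {f.unit_id for f in flags if f.certain}
    corrected = []
    for rec in records:
        rec = dict(rec)  # copy, so the unedited data stay available
        if rec["id"] in to_fix:
            # Placeholder correction: carry forward the previous value.
            rec["turnover"] = rec["prev_turnover"]
        corrected.append(rec)
    return corrected


def run_editing(records, max_rounds=3):
    """Iterate identification and correction until no certain errors remain."""
    for _ in range(max_rounds):
        flags = identify_errors(records)
        if not any(f.certain for f in flags):
            break
        records = correct_errors(records, flags)
    return records, identify_errors(records)


# Made-up example data.
raw = [
    {"id": "a", "turnover": -5, "prev_turnover": 100},
    {"id": "b", "turnover": 5000, "prev_turnover": 100},
    {"id": "c", "turnover": 110, "prev_turnover": 100},
]
edited, open_flags = run_editing(raw)
```

Keeping the unedited records untouched and returning the remaining flags separately mirrors the idea that the decision on what to correct is made at the identification stage, while correction only carries it out.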
Editing process: ERROR IDENTIFICATION (probably subject to further specification)
[Diagram. Steps: Data processing; Realisation of information view for error identification; Evaluation of the error identification and decisions on further measures. Labels: ACTION, Methodological group, Information for editing, Information view, View group.]

Editing process: ERROR CORRECTION
[Diagram. Steps: Correction of identified errors; Evaluation of realisation of correction and calculation of ”state of data” indicators; Controlling constraints of the edited data and possible corrections. Label: Edited data.]

Process and quality evaluation
• Process and quality can be evaluated with indicators, which should be calculated automatically.
• The calculation follows a fixed procedure.

Process and quality evaluation (indicators calculated from the edited data)
• Indicators describing the editing process
• ”State of data” indicators (essential estimates at the population level and in relevant subgroups, as in the preliminary analysis and during the editing process)
• Indicators revealing the influence of editing on results
• Indicators in relation to previous results
More about the indicators in the second part of the presentation.

Methodological resources (1)
• The actions carried out in the editing model are supported by the knowledge in the methodology bank, which describes the methods belonging to the methodology groups in the different phases of the editing model. It also helps in interpreting the statistical measures produced by the methods.
• The structure of the methodology bank strictly follows the methodology groups appearing in the editing model.
• The concept library defines the concepts used in the model and in the methodology bank.

Methodological resources (2)
• An instruction collection is needed to help in the different phases: with decisions (forthcoming actions and choices of methods), with the interpretation and evaluation of results obtained before and during the processes, and with the actions required to apply the methods.
• The instruction collection is based on research work and recommendations, international experiences and practices, and the statistical office's own experience of data sets, error types and practices.
• The methodology bank, concept library and instructions should be easily available whenever needed (e.g. wiki-based).
• In part, these could also be used for documenting the quality of editing.

IT solutions (1)
Example: ”philosophy” of data preparations for SELEKT (whole lines). Pre = previous values (for some variables), Tot = totals (or other statistics).

Round   Data sets
t-2     Edited t-2, Unedited t-2, Tot t-2
t-1     Edited t-1, Unedited t-1, Tot t-1, Pre t-2
t       Edited t, Unedited t, Tot t, Pre t-1

• The data architecture should make all the data of the previous rounds available (including unedited versions) with a coherent variable structure for editing purposes.
• The system should easily calculate statistics (totals, means …) at the overall level and in subgroups, and make them available for creating edit rules and functions and for monitoring changes over time.

IT solutions (2)
• The IT environment should provide solutions for the methods in the methodology bank (e.g. modules, procedures, macro packages), or at least a flexible platform for constructing a program or another way of applying the method.
• For larger entities of methods and practices it may provide applications or systems.
• The environment can include existing software (e.g. Banff, Selekt, LogiPlus) for carrying out some parts.
• Move from programming by the user and manually written program updates to parameterization, through which time- and case-specific information, modification information and methodological choices are given.

IT solutions (3)
• The IT environment should allow flexible processing and retrieval of metadata and process data in order to control the process (E&I indicators) and the state of the data (data indicators) during the process, and to evaluate the quality of the final data.
• These indicators should be easily available for quality reporting.
• In principle, every action should leave a ”trace” for this purpose. In practice, interactive operations in particular pose a substantial challenge for this objective.

Quality Indicators for Editing (part 2)
• UNECE
• Work session on Statistical Data Editing
• (Oslo, Norway, 24-26 September 2012)

Functions of indicators
• To achieve transparency of data for users
• Systematic quality control
– To control error identification and correction actions
  • effects on the data
– To evaluate the development of the quality of data
  • in different stages of the editing process
  • overall quality of the final data

Aims of the paper
• To study useful indicators related to the editing process from different sources
• To create a consistent and systematic practice for using indicators in Statistics Finland
• To collect a list of useful indicators and their definitions

Indicators were collected from
• EUREDIT / Ray Chambers (2004). Evaluation Criteria for Statistical Editing and Imputation
• EDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys
• EUROSTAT (2009a). ESS Standard for Quality Reports
• STATISTICS FINLAND (2007). Quality Guidelines for Official Statistics
• STATISTICS CANADA (2007). Functional Description of the Banff System for Edit and Imputation

Indicators are presented in three categories
• Indicators related to raw data
– At the beginning of the editing process
– Give information about errors in the data, the effect of errors on results, and the significance of variables and subgroups
• Indicators related to error identification
– Describe the amount of errors in variables and observations
– Evaluate the efficiency of error identification procedures
• Indicators related to error correction
– Describe the quality of data after error correction
– Indicate the amount of error correction actions
– Evaluate the effect of error correction on results or variables

Discussion
• The number of indicators presented in the paper is quite substantial
• Not all indicators are suitable for every type of statistics
• Some indicators presented are important but suitable only for specific situations
• Therefore, it is not possible to define a detailed list of indicators to be published with all statistics

Discussion (2)
• Some standard indicators for the editing process should always be computed
• Indicators measuring missingness in data are valuable tools for the statistics production process, as they describe the coverage of data at each stage of the process
• If any editing actions are carried out, the user of the final product should have access to information on the edit rates of the data
• Altogether, indicators are an essential part of the editing process, as they provide information on the quality of data, results and the editing process in general
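
As an illustration of the kind of standard indicators mentioned in the discussion, the following sketch computes an item missingness rate, an edit rate and the relative effect of editing on a total from an unedited and an edited version of the same data. The formulas are simple unweighted versions chosen for illustration only; they are assumptions and not the definitions given in the paper or in the cited sources. The function names, the variable `turnover` and the data are hypothetical.

```python
def item_missing_rate(records, variable):
    """Share of observations in which `variable` is missing (None)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(variable) is None)
    return missing / len(records)


def edit_rate(unedited, edited, variable, key="id"):
    """Share of observations whose value of `variable` changed in editing."""
    edited_by_key = {r[key]: r for r in edited}
    common = [r for r in unedited if r[key] in edited_by_key]
    if not common:
        return 0.0
    changed = sum(
        1 for r in common
        if r.get(variable) != edited_by_key[r[key]].get(variable)
    )
    return changed / len(common)


def relative_effect_on_total(unedited, edited, variable):
    """Relative change in the unweighted total caused by editing."""
    raw_total = sum(r[variable] for r in unedited if r.get(variable) is not None)
    new_total = sum(r[variable] for r in edited if r.get(variable) is not None)
    if raw_total == 0:
        return None  # relative effect undefined for a zero raw total
    return (new_total - raw_total) / raw_total


# Made-up example: one missing value imputed, one outlier corrected.
unedited = [
    {"id": 1, "turnover": 100},
    {"id": 2, "turnover": None},
    {"id": 3, "turnover": 5000},
    {"id": 4, "turnover": 80},
]
edited = [
    {"id": 1, "turnover": 100},
    {"id": 2, "turnover": 90},
    {"id": 3, "turnover": 500},
    {"id": 4, "turnover": 80},
]

missing_before = item_missing_rate(unedited, "turnover")   # 1 of 4 missing
missing_after = item_missing_rate(edited, "turnover")      # none missing
rate = edit_rate(unedited, edited, "turnover")              # 2 of 4 changed
effect = relative_effect_on_total(unedited, edited, "turnover")
```

In production use such indicators would normally be weighted and calculated in relevant subgroups as well as at the overall level; the sketch leaves both out for brevity.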