Case Study: Integrated Metadata Driven Statistical Data Management System (IMD SDMS)
CSB of Latvia
Julija.Drozdova@csb.gov.lv
METIS 2010

Outline
• The main steps for IMD SDMS creation
• IMD SDMS fundamental elements
• Costs & benefits
• IMD SDMS implementation strategy
• GSBPM versus SBPM of CSB
• Current situation and further developments
• The main lessons learned
• Proposal for GSBPM

The main steps for IMD SDMS creation (1)
• Data and metadata collection (1999)
• Thorough analysis of data and metadata flows (1999)
• Setting the requirements for the system (1997-1999)

The main steps for IMD SDMS creation (2)
The main requirements for the IMD SDMS were:
• it covers the full cycle of statistical data processing;
• it uses a process-oriented approach;
• the IMD SDMS must be:
  - standardized;
  - integrated;
  - metadata-driven;
  - capable of automatically generating user application forms (incl. web);
  - centralized;
  - modular in structure;
  - transparent.

IMD SDMS fundamental elements (1)
• The core metadata base module handles all processes of the IMD SDMS
• Structure of micro data [Bo Sundgren model]
  Object characteristics: Co = O(t).V(t)
  where:
  O is an object type;
  V is a variable;
  t is a time parameter.
  Every observation result is a value of a variable (data element), Co.
• Two types of tables
• Structure of macro data

IMD SDMS fundamental elements (2)
• Structure of micro data (an example)

IMD SDMS fundamental elements (3)
• Two types of tables:
  - fixed table (data matrix);
  - open table (data matrix with a variable number of rows or columns).
A questionnaire consists of chapters, and chapters consist of tables.

IMD SDMS fundamental elements (4)
• Structure of macro data
Estimates are made on the basis of a set of micro data.
Statistical characteristics: Cs = O(t).V(t).f
where:
O and V are the object type and the variable of the object characteristic O(t).V(t);
t is a time parameter;
f is an aggregation function (sum, count, average, etc.) summarizing the true values of V(t) for the objects in O(t).
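The micro and macro data model above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the IMD SDMS implementation: it assumes micro data are stored as (object, variable, time, value) records, and the names `MicroValue`, `aggregate` and `in_population`, as well as the enterprise turnover figures, are hypothetical. Passing `sum`, `len` or `statistics.mean` as f gives the sum, count and average aggregations listed for Cs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable

# Hypothetical sketch of the Sundgren-style micro/macro data model; not IMD SDMS code.


@dataclass(frozen=True)
class MicroValue:
    """One micro data value Co = O(t).V(t)."""
    obj: str       # identifier of an object of type O
    variable: str  # variable V
    period: str    # time parameter t
    value: float   # observed value of V(t) for this object


def aggregate(
    data: Iterable[MicroValue],
    variable: str,
    period: str,
    f: Callable[[list[float]], float],
    in_population: Callable[[str], bool] = lambda obj: True,
) -> float:
    """Macro data value Cs = O(t).V(t).f.

    Collects the values of V(t) for the objects in O(t), selected by the
    hypothetical predicate in_population, and summarizes them with f.
    """
    values = [
        m.value
        for m in data
        if m.variable == variable and m.period == period and in_population(m.obj)
    ]
    return f(values)


# Hypothetical micro data: turnover of three enterprises
micro = [
    MicroValue("ent-1", "turnover", "2009", 120.0),
    MicroValue("ent-2", "turnover", "2009", 80.0),
    MicroValue("ent-3", "turnover", "2009", 100.0),
    MicroValue("ent-1", "turnover", "2008", 90.0),  # other period, excluded below
]

total = aggregate(micro, "turnover", "2009", sum)     # 300.0
count = aggregate(micro, "turnover", "2009", len)     # 3
average = aggregate(micro, "turnover", "2009", mean)  # 100.0
```

Keeping f as a parameter mirrors the definition of Cs, where the aggregation function is part of the statistical characteristic rather than being written into separate code for each statistic.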
Costs & benefits
• Standardization of statistical data production processes
• The basis for the CSB regional restructuring (2003-2004): 5 data collection and processing centres replaced the previously existing 26 regional statistical offices and the Riga city office
• The number of statisticians was reduced from 180 to 115

IMD SDMS implementation strategy (1)
• Step-wise approach
• 1997-1999: CSB and PricewaterhouseCoopers experts prepared the General Technical Requirements for the project “Modernisation of CSB – Data Management System”

IMD SDMS implementation strategy (2)
• The main requirement: metadata should be used as the key element in statistical data processing
• Additional requirements:
  - increase the efficiency of producing statistical information;
  - avoid hard-coded programming by standardising procedures and using metadata in statistical data processing;
  - increase the quality of the information produced;
  - improve the processes of statistical data analysis;
  - modernise data dissemination and increase its quality.

GSBPM versus SBPM of CSB (draft version)
[Figure: GSBPM versus SBPM of CSB, labelled ~51 %]

Current situation (1)
[Figure: ADS]

Current situation (2)
Subsystems:
• Metadata description and analysis subsystem
• Data entry and validation subsystem
• Web data entry subsystem
• Missing data imputation subsystem
• Data analysis subsystem (OLAP)
• Data archiving for State archive subsystem
• Registries operational subsystem
• Data aggregation and retrieval subsystem
• Import/export facilities
• User administration subsystem
• Data dissemination subsystem
• GIS subsystem
• Active data archiving subsystem
Data stores: STATISTICAL REGISTRIES, ACTIVE MICRO DATA, REFERENCE METADATA, MACRO DATA, RAW DATA, ARCHIVED, META DATA, USER ADMINISTRATION

Further developments
• Since 2009, a project has been under way to extend the IMD SDMS to the social statistics domain.
Starting with:
  - Population Census;
  - Agricultural Census;
  - Labour Force Survey;
  - EU-SILC …

The main lessons learned (1)
• The design of the new information system should be based on the results of an in-depth analysis of statistical surveys:
  - statistical questionnaires and variables;
  - statistical processes and data flows;
• Statistical data processes and the “Variables and questionnaires system” must be harmonized and standardized before the new system is created.

The main lessons learned (2)
• The system should provide the full cycle of statistical data processing;
• The system should be:
  - standardized;
  - integrated;
  - metadata-driven;
  - capable of automatically generating user application forms (incl. web);
  - centralized;
  - modular in structure;
  - transparent.

The main lessons learned (3)
• Motivating statisticians to move from the stove-pipe approach to the new process-oriented data processing environment is essential;
• A metadata group should be established;
• Electronic data archiving reduces the human resources needed, CSB expenses for deposition in the State Archives, archiving time and the physical volume of archived information (2000 Population Census: 21 m³ = 4 DVDs).

Proposal for GSBPM (1)
• Extension of phase 4 (Collect) between sub-processes 4.1 and 4.2
• Extension between sub-processes 4.3 and 4.4
Why?
  - the statistician's work with respondents and with the list of respondents is a difficult, labour-intensive and time-consuming process (…; sending letters to respondents; maintaining the respondent lists; creating the sample matrix; clarifications; response control; reminders; …);
  - sometimes the statistician's work is under time pressure (… Business tendencies survey …).

Proposal for GSBPM (2)
Survey integration
• From the analyst's view: list of indicators
• From the statistician's view: sample matrix
  - amount of work
  - respondent burden
  - statistician burden
  - …
  - response control
  - etc.
• From the mathematician's view
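The response control and reminder steps named in the proposal can be illustrated with a small Python sketch. It is a hypothetical example, not the CSB procedure: it assumes a respondent list, a set of identifiers of received questionnaires, a due date and a grace period before reminders are sent. All names (`Respondent`, `response_rate`, `reminder_list`, `grace_days`) and the sample data are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sketch of response control and reminders; not CSB code.


@dataclass(frozen=True)
class Respondent:
    respondent_id: str  # hypothetical identifier in the respondent list
    name: str


def response_rate(respondents: list[Respondent], received: set[str]) -> float:
    """Share of listed respondents whose questionnaire has been received."""
    if not respondents:
        return 0.0
    answered = sum(1 for r in respondents if r.respondent_id in received)
    return answered / len(respondents)


def reminder_list(
    respondents: list[Respondent],
    received: set[str],
    due: date,
    today: date,
    grace_days: int = 3,  # assumed grace period after the due date
) -> list[Respondent]:
    """Non-respondents to remind, once the due date plus grace period has passed."""
    if today <= due + timedelta(days=grace_days):
        return []
    return [r for r in respondents if r.respondent_id not in received]


# Hypothetical respondent list and received questionnaires
sample = [
    Respondent("LV001", "Enterprise A"),
    Respondent("LV002", "Enterprise B"),
    Respondent("LV003", "Enterprise C"),
    Respondent("LV004", "Enterprise D"),
]
received = {"LV001", "LV003", "LV004"}

rate = response_rate(sample, received)  # 0.75
late = reminder_list(sample, received, due=date(2010, 3, 15), today=date(2010, 3, 20))
```

Here `late` contains only Enterprise B; calling `reminder_list` before 18 March 2010 returns an empty list because the grace period has not yet expired.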