Increased efficiency by harmonizing metadata and quality

Blagica Novkovska, M.Sc
State Statistical Office
Dame Gruev – 4
1000 Skopje, Republic of Macedonia
blagica.novkovska@stat.gov.mk

Helena Papazoska
State Statistical Office
Dame Gruev – 4
1000 Skopje, Republic of Macedonia
helena.papazoska@stat.gov.mk

Keywords: quality reporting, quality management, reference metadata, business process model

1. Introduction

The State Statistical Office (SSO) has made significant progress in quality management over the last four years. The main quality framework implemented at the SSO is the EFQM model (European Foundation for Quality Management). The fifteen principles of the European Statistics Code of Practice (CoP) (Eurostat 2005), covering the institutional environment, statistical processes and statistical outputs, have been set across the institution as clear values and are also recognized in the SSO Strategic Plan 2010 – 2012 (SSO 2010).

The quality system and the statistical metainformation system are being developed at the SSO side by side. A bi-directional link between statistical metadata and quality was easy to identify: metadata describe the quality of statistics, and, conversely, statistical metadata are themselves defined as a component of quality. Principle 15 of the CoP, which deals with the accessibility and clarity of European Statistics, also emphasizes that accompanying metadata should be documented according to a standardized metadata system (2009/498/EC).

After a brief description of the quality-related initiatives at the SSO, this paper focuses on tying metadata and quality together. First, it presents the steps that led from the initial concept of connecting reference metadata with quality reporting to a practical implementation that has increased efficiency in quality reporting according to different quality frameworks.
Secondly, it develops the notion that the link between metadata and quality should go beyond quality reporting. Finally, it addresses the main open issues that arise when using the Generic Statistical Business Process Model (GSBPM) (UNECE Secretariat 2009) to harmonize metadata and quality, together with the approach taken to tackle them. The conclusion summarizes the current status of the work and the future challenges.

2. Quality Related Initiatives at the SSO

The first step towards a systematic approach to quality management was taken in 2006, when the EFQM Excellence Model was adopted, with its nine evaluation criteria: five relating to enablers and four relating to results. From the very first steps, the intersections between the CoP and EFQM (Eurostat 2005) were identified, as well as their operational synergies. EFQM puts more emphasis on internal management processes, whereas the CoP, when dealing with processes, focuses more on aspects of statistical production. EFQM, particularly through its public-sector adaptation, the Common Assessment Framework (CAF), takes the characteristics of public-sector organizations into account. The CoP, on the other hand, is specific to statistical offices and sets out the highest common level of business principles in official statistics. Some aspects of the CoP are not covered by EFQM, such as Principle 2 (mandate for data collection), Principle 6 (impartiality and objectivity) and individual indicators of some other principles. The integration of the prevailing ESS quality frameworks, namely the CoP and the EFQM model, is suggested in a Eurostat document (Eurostat 2005). The SSO has performed two self-assessments based on the EFQM model endorsed by the European Commission. The first self-assessment exercise, in 2006, was coached by experts from Destatis and the Czech Statistical Office.
A user-friendly introductory model, the Common Assessment Framework (CAF) (EIPA 2006), was used as the self-assessment tool. It offered the SSO an opportunity to learn more about itself and to improve its performance by applying quality management techniques. Based on the findings and recommendations of the 2007 CAF report, short-term and long-term priorities were defined and an action plan was drafted. The best evidence of the successful completion of the short-term priorities is the organizational performance analysis and the results of the second CAF self-assessment, conducted in 2010.

3. Organizational impact of closer connections between metadata and quality

The current SSO organizational structure has no dedicated unit dealing with general methodology, metadata and quality issues: there is neither a chief methodologist nor a general methodology department. This has a significant impact on how the work is organized, and in a way this “disadvantage” has become an organizational advantage in building a closer connection between metadata and quality. Since dealing with metadata is a multidisciplinary task, it was decided to establish a cross-sectional working group (WG) as an “umbrella” over all related activities. WG members come from different job positions and organizational units: subject-matter departments, IT, and the staff responsible for the quality of statistical outputs. The Director General leads the WG, which means that the work has the support of the highest authority in the SSO, and the development of a central metadata repository has been defined as one of the highest priorities. High-level management commitment and support are key factors in the success of the project.
Members of the WG are responsible for:
- development and implementation of centralized storage of reference metadata;
- dissemination of reference metadata;
- documentation, standardization and harmonization of the SSO's Business Process Model (BPM);
- implementation of the new BPM in the SSO environment;
- standardization, documentation and inventory of variables;
- establishment of a harmonized variables repository;
- tying metadata and quality.

The metadata WG is free to set priorities, choose methods, build the production system, etc. In line with the SSO's commitment to building a teamwork culture, the WG is supported by several teams (for analyzing respondent burden and for implementing the cost calculation system) and by the Statistical Confidentiality Committee, which provide the necessary input from their fields of expertise.

4. Reference metadata and quality reporting

In early 2009, Eurostat published new versions of the ESS Handbook for Quality Reports (EHQR) and the ESS Standards for Quality Reports (ESQR). It was easy to recognize that many of the quality and performance indicators of the ESQR, covering relevance, accuracy, timeliness, punctuality, accessibility and clarity, comparability and coherence, can be clearly identified in the Euro-SDMX Metadata Structure (ESMS) (Eurostat 2009). Regulation (EC) No 223/2009 of the European Parliament and of the Council of 11 March 2009 on European Statistics provided the reference framework for the subsequent Commission Recommendation of 23 June 2009 on reference metadata, which lists the 21 cross-domain metadata concepts and sub-concepts of the ESMS.

Compiling ESMS concepts into a standardized quality report is advantageous for supporting internal self-assessment, for reporting to Eurostat and for user-oriented quality reporting, because the ESMS puts considerable emphasis on output quality. As a next step, user-oriented quality reports should be extended with producer-oriented reports (i.e. reports aimed at the internal assessment of process and output quality) in order to meet the quality reporting standards of the ESQR.

An important efficiency milestone was reached at the SSO with the implementation of the available mapping between reference metadata (ESMS) and the concepts applied in other quality frameworks (OECD Metastore, IMF DQAF, SDMX), produced within the SDMX initiative (SDMX Guidelines 2009). This mapping triggered the SSO's idea of tying reference metadata to quality assessment data in quality reporting, thereby implementing a core principle of metadata: re-usability.

5. Development of the Euro-SDMX Database and Application

Adhering to the principle of metadata re-usability, the SSO has increased efficiency in quality reporting through a multi-stage process. Activities started with the Managerial Board's decision to adopt the Euro-SDMX Metadata Structure at the SSO. Because reference metadata at the SSO existed in various forms and storage locations (in particular, no standard had been officially promulgated and adopted at the organizational level), adopting the Eurostat recommendation to implement ESMS at the national level was simplified.

Before the implementation of ESMS started, the SSO's three-level Theme/Statistics structure was converted into a five-level structure to achieve better harmonization. The old structure consisted of Statistical Area, Statistical Sub-area and Statistics. The revision followed the Eurostat Compendium structure of Domain, Theme and Module, and the Module level was further broken down into two additional levels: Group and Statistics.

The core of the SDMX model of reference metadata is the concept of the “Metadata Structure Definition”.
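The five-level classification can be pictured as a simple tree in which every statistics has exactly one Group, Module, Theme and Domain above it. The Python sketch below only illustrates that structure under our own assumptions; it is not the design of the SSO's ESMS database, and the `Node` class, the `add_child` helper and all codes and labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Level names as described in the text: Domain, Theme and Module follow the
# Eurostat Compendium structure; Group and Statistics are the two extra levels.
LEVELS = ("Domain", "Theme", "Module", "Group", "Statistics")


@dataclass(frozen=True)
class Node:
    """One entry of the (hypothetical) five-level classification."""
    level: str
    code: str  # invented identifier scheme, for illustration only
    label: str
    parent: Optional[Node] = None

    def path(self) -> list[Node]:
        """Return the chain of nodes from the Domain level down to this node."""
        chain = []
        node: Optional[Node] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))


def add_child(parent: Optional[Node], code: str, label: str) -> Node:
    """Create a node one level below `parent`; a Domain when `parent` is None."""
    depth = 0 if parent is None else LEVELS.index(parent.level) + 1
    if depth >= len(LEVELS):
        raise ValueError("Statistics is the lowest level and cannot have children")
    return Node(LEVELS[depth], code, label, parent)


# Illustrative use: every code and label below is invented.
domain = add_child(None, "D1", "Example domain")
theme = add_child(domain, "T1", "Example theme")
module = add_child(theme, "M1", "Example module")
group = add_child(module, "G1", "Example group")
stats = add_child(group, "S1", "Example statistics")
print(" / ".join(node.level for node in stats.path()))
# prints: Domain / Theme / Module / Group / Statistics
```

Keeping a parent link on each node makes it cheap to find the Group, Module, Theme and Domain of any statistics, which matters whenever metadata are to be attached at a level above the level of statistics.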
Reference metadata may be attached to different object types (a data set, a time series, an observation), and the Metadata Structure Definition also identifies the object to which the metadata are attached (Concept 21). For the time being, metadata are attached at the level of statistics; the intention is to attach them in future at the highest possible level, i.e. at the Group level or even the Module level where appropriate.

A methodological clarification of the ESMS was carried out, and its impact on existing reference metadata was analyzed. Although the descriptions and ESS guidelines in the ESMS structure are very helpful, we attached additional explanations from other sources at the concept and sub-concept level, so that the subject-matter staff responsible for the various statistics in the Office have enough information to interpret them correctly. All documents were translated into and from Macedonian. This was considered a preparatory phase before the actual development of the IT system that serves as the national editor for reference metadata: the ESMS Application and the ESMS database.

Figure 1 shows the overall conceptual model of the system. It is divided into three inter-linked sectors: Organizational Structure, Module/Group/Statistics Structure and ESMS Reference Metadata Framework. The system is organized as a comprehensive, structured collection of ESMS concepts for the Module/Group/Statistics structure, in particular at the level of statistics, with the aim of documenting methodologies, quality and the statistical production processes in general.

Figure 1: ESMS DB basic scheme (building blocks of the SSO's ESMS system: Domain/Theme/Module/Group/Statistics Classification; Organizational Structure; ESMS Concepts/Subconcepts and Mapping Schemes)

A history and version handling mechanism is implemented.
History management makes it possible to maintain consistency and quality across different versions of reference metadata. It is very important to decide how many versions to keep (all of them, or milestones only) in order to reduce uncertainty when comparing versions. Only significant changes create new versions, and only one version is current and active at any time: the previous version, on which the new one is based, becomes inactive. Besides its activity status, each version carries additional information: the user, the date and time of modification, and the type of modification (initial data entry, modification due to a technical correction, modification due to an approved significant correction, and deletion). In this context, modifying "shared" objects must be carefully managed, since it affects the components, the model itself and any previously shared data.

The application is multilingual and, to support the Cyrillic code page, automatically switches the keyboard language and layout as the user needs them. Since reference metadata are mainly free text, the use of formatted text (styled or rich text) instead of plain text has been discussed. The advantage of formatted text is that it carries styling information beyond the minimum of semantic elements: colors, styles (boldface, italic), sizes and special features such as images, tables and hyperlinks. As enhancements, we plan to introduce formatted-text capabilities and to develop advanced search capabilities. The application is user-friendly and flexible, and it gives users control through an intuitive structure of information; a user manual is also available.
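The version rules described above (a single active version, new versions only for significant changes, and a record of user, time and modification type for every change) can be sketched as a small in-memory model. This hypothetical Python illustration is not the ESMS Application's implementation: the class and method names are invented, and two behaviours are our own assumptions, namely that a technical correction edits the active version in place (it is logged but creates no new version) and that a deletion leaves no active version.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional, Tuple


class ModType(Enum):
    """Modification types listed in the text."""
    INITIAL_ENTRY = "initial data entry"
    TECHNICAL_CORRECTION = "technical correction"
    SIGNIFICANT_CORRECTION = "approved significant correction"
    DELETION = "deletion"


@dataclass
class Version:
    number: int
    text: str
    user: str
    modified: datetime
    mod_type: ModType
    active: bool = True


@dataclass
class MetadataItem:
    """Version history of one reference-metadata entry (hypothetical model)."""
    versions: List[Version] = field(default_factory=list)
    log: List[Tuple[str, datetime, ModType]] = field(default_factory=list)

    def current(self) -> Optional[Version]:
        """Return the single active version, if any."""
        return next((v for v in self.versions if v.active), None)

    def modify(self, text: str, user: str, mod_type: ModType,
               when: Optional[datetime] = None) -> Optional[Version]:
        when = when or datetime.now()
        current = self.current()
        if mod_type is ModType.INITIAL_ENTRY and self.versions:
            raise ValueError("initial entry already exists")
        if mod_type is not ModType.INITIAL_ENTRY and current is None:
            raise ValueError("no active version to modify")
        self.log.append((user, when, mod_type))  # every change is recorded
        if mod_type is ModType.TECHNICAL_CORRECTION:
            current.text = text  # assumption: no new version for minor fixes
            return current
        if current is not None:
            current.active = False  # the previous version is no longer active
        if mod_type is ModType.DELETION:
            return None
        new = Version(len(self.versions) + 1, text, user, when, mod_type)
        self.versions.append(new)
        return new


item = MetadataItem()
item.modify("Draft description", "user_a", ModType.INITIAL_ENTRY)
item.modify("Draft description (typo fixed)", "user_a", ModType.TECHNICAL_CORRECTION)
item.modify("Revised description", "user_b", ModType.SIGNIFICANT_CORRECTION)
print(item.current().number, len(item.versions), len(item.log))  # prints: 2 2 3
```

Separating the audit log from the list of versions reflects the distinction drawn in the text: every modification is traceable, while only significant changes produce a version that can later be compared with its predecessor.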
In the first use of the ESMS application, five users documented five pilot statistics from different statistical domains: Foreign Trade Statistics, Household Budget Statistics, Marriages Statistics and, from Industry Statistics, the Monthly Industry Survey and Turnover and New Orders in Industry. Figure 2 shows the ESMS Application user interface (Papazoska, Ristevska Karajovanovikj and Lipikj 2010).

Figure 2: ESMS Application Interface

6. Increased efficiency in quality reporting by meeting various quality reporting requirements

The SSO devotes a lot of time and energy to reporting to international organizations. The ESMS structure includes data quality information, so once a statistics is documented through the ESMS user interface, a predefined quality report can easily be built from the concepts and sub-concepts related to quality alone. The most important benefit of a central database of reference metadata (the ESMS Database) is its use for reporting. Reports can be produced in different forms depending on the user (for internal assessment, for submission to the corresponding Eurostat units or other international organizations, for dissemination, etc.), and they can contain all concepts or only some of them, together with the appropriate sub-concepts (Figure 3). Because the ESMS is compatible with, and can be mapped to, the SDMX cross-domain concepts and the IMF DQAF and OECD Metastore metadata frameworks, users can quickly and accurately build different reports for different organizations. Re-usability thus increases efficiency over time. The Report Manager, which is included in Microsoft SQL Server Reporting Services, serves from the user's perspective as a report viewer and provides the user interface to the report server where reports are deployed. The report layout is very similar to that of Eurostat ESMS reports.
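The two operations just described, selecting the quality-related concepts for a predefined report and re-expressing ESMS content in another framework through a mapping table, can be illustrated with plain dictionaries. In this hypothetical Python sketch the quality concept names are taken from the quality components listed in Section 4, while the record texts, the non-quality concept and the target identifiers (`FW.*`) are invented placeholders, not actual SDMX, IMF DQAF or OECD Metastore codes.

```python
from __future__ import annotations

from collections import defaultdict

# Quality components named in the text; used to pick the "quality" subset.
QUALITY_CONCEPTS = {
    "Relevance", "Accuracy", "Timeliness", "Punctuality",
    "Accessibility and clarity", "Comparability", "Coherence",
}


def quality_report(record: dict[str, str]) -> dict[str, str]:
    """Keep only the quality-related concepts of a documented statistics."""
    return {c: text for c, text in record.items() if c in QUALITY_CONCEPTS}


def map_report(record: dict[str, str],
               mapping: dict[str, list[str]]) -> dict[str, str]:
    """Re-express a record in a target framework via a mapping table.

    A concept mapped to several targets is a 1:n case; several concepts
    mapped to the same target is an n:1 case, whose texts are concatenated.
    Concepts without a mapping are dropped.
    """
    merged: dict[str, list[str]] = defaultdict(list)
    for concept, text in record.items():
        for target in mapping.get(concept, []):
            merged[target].append(text)
    return {target: "\n".join(texts) for target, texts in merged.items()}


# Invented example content for one statistics.
record = {
    "Statistical presentation": "Monthly data on industrial output.",
    "Relevance": "Main users are government and researchers.",
    "Accuracy": "Sampling errors are estimated each month.",
    "Comparability": "Comparable over time since the base year.",
    "Coherence": "Consistent with the annual survey.",
}
# Invented mapping to a hypothetical target framework "FW".
mapping = {
    "Relevance": ["FW.RELEVANCE"],
    "Accuracy": ["FW.ACCURACY", "FW.RELIABILITY"],  # 1:n
    "Comparability": ["FW.CONSISTENCY"],            # n:1 together with Coherence
    "Coherence": ["FW.CONSISTENCY"],
}

print(sorted(quality_report(record)))
# prints: ['Accuracy', 'Coherence', 'Comparability', 'Relevance']
print(map_report(quality_report(record), mapping)["FW.CONSISTENCY"])
```

Concatenating texts is only one possible policy for n:1 cases; a real implementation might instead require a subject-matter reviewer to merge them.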
We implemented, or rather tried to implement, the proposed mapping documents between the ESMS and the SDMX Cross-Domain Concepts, IMF DQAF and OECD Metastore. This was quite a challenging task, because different types of mapping (1:1, 1:n and n:1) existed, and at different levels. For the time being, the ESS Standards for Quality Reports (ESQR) have not been implemented, because the mapping has not yet been clarified. Users can export reports in various formats (xls, csv, xml, pdf, tiff, web archive) for internal assessment, documentation and dissemination by selecting the desired format (Papazoska, Ristevska Karajovanovikj and Lipikj 2010).

Figure 3: Production of ESMS metadata report for Foreign Trade Statistics

7. Tying metadata and quality beyond quality reporting

Metadata and quality are two sides of the same coin: both are used to describe a statistical system and its outputs. At the SSO, we see metadata in a broad perspective, as an adequate supply of information on statistical data that gives insight into the production process, makes it possible to analyze that process, and helps decide how the data can be used correctly. Metadata are neither a new nor a modern phenomenon. Documentation of statistical operations, including definitions of concepts, methods of data collection, processing and dissemination, and data on coverage, response rates, numbers of missing values, estimates of errors, etc., has always been provided as part of daily work. Nevertheless, subject-matter staff have constantly complained that they have deadlines to meet and no time to provide metadata. The SSO's experience in collecting metadata is probably not unique. However, with the successful implementation of the ESMS database and user application, the SSO has to some extent “demystified” the word metadata.
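Some of the documentation listed above, such as response rates and the number of missing values, is a direct by-product of processing and can be recorded at the moment it is produced. The hypothetical Python sketch below derives both from a list of survey records. The record layout (one entry per sampled unit, `None` for a non-respondent or for a missing item) and the variable names are assumptions, and the response rate is a simple unweighted unit response rate, not an official SSO definition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessMetadata:
    """Simple ex-post process metadata for one survey round."""
    sampled_units: int
    responding_units: int
    missing_values: int

    @property
    def response_rate(self) -> float:
        """Unweighted unit response rate (0.0 for an empty sample)."""
        if self.sampled_units == 0:
            return 0.0
        return self.responding_units / self.sampled_units


def summarize(responses: list[Optional[dict[str, object]]],
              variables: list[str]) -> ProcessMetadata:
    """Count respondents and missing items across the given variables."""
    respondents = [r for r in responses if r is not None]
    missing = sum(1 for r in respondents for v in variables if r.get(v) is None)
    return ProcessMetadata(len(responses), len(respondents), missing)


# Invented records: one per sampled unit; None marks a non-respondent.
responses = [
    {"turnover": 120, "employees": 5},
    None,
    {"turnover": None, "employees": 3},
    {"turnover": 80},  # "employees" not reported
]
meta = summarize(responses, ["turnover", "employees"])
print(meta.response_rate, meta.missing_values)  # prints: 0.75 2
```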
The benefits and efficiency of meeting various quality reporting requirements from the reference metadata provided became obvious shortly after the initial effort to provide them. Subject-matter staff were encouraged to undertake further activities on centralized metadata collection that would provide the metadata needed for quality assurance and quality assessment. Now that the relation between reference metadata and quality reporting has matured, the SSO is focusing on connecting process metadata with quality assessment, which increases the value of metadata already provided elsewhere. Metadata are present in every activity of the statistical business process, either created there or transferred from a previous phase (UNECE Secretariat 2009). The key challenge is to ensure that these metadata are captured as early as possible (i.e. when they are produced), stored, and transferred from phase to phase alongside the data they refer to, without asking producers to deliver metadata in specific formats or on specific forms. Our idea is to make less ambitious, but still useful, metadata demands. In this regard, the SSO will rely on the experience gained with the ESMS application and on the potential of the BPM to help identify the metadata relevant to each activity and their link with process quality management. Identifying the full framework of metadata used within the statistical process, including quality assessment, requires sound knowledge and understanding of the SSO's BPM, which will eventually lead to its adoption of, i.e. mapping to, the GSBPM.

8. Impact of Statistical Business Process Model in harmonizing metadata and quality

8.1. The purpose of modeling and how the Generic Statistical Business Process Model (GSBPM) fits in

The term Business Process Model (BPM) is the noun form of Business Process Modeling and refers to a structural representation that defines a specified flow of activities in a particular business. Business process modeling is performed to improve process efficiency, which usually improves quality as well. Quality management and quality assurance concepts have long existed in the private sector, but in public administration they became important when taxpayers began to expect quality products and services in return for their money. Concepts such as documentation, process description, metadata, quality assurance and quality assessment were first used in engineering and manufacturing, and later in the information and communication technology industry, in software development in particular. Statisticians now face the difficult task of adopting (or customizing) frameworks and models originally developed in the engineering and software industries so that they fit the particularities of statistical production, where the main process is to deliver information that customers can use for their decisions. The original intention in designing the GSBPM was to give statistical organizations a basis for agreeing on standard terminology to aid their discussions on developing statistical metadata systems and processes; the GSBPM was later extended to include the over-arching processes (UNECE Secretariat 2009). However, treating metadata management and quality management only as over-arching processes gives a very vague picture of the metadata generated and used, and of the quality assurance concepts implemented, in the individual processes and sub-processes.
At the SSO we consider that evaluation and feedback throughout the statistical business process have to be clearly recognized and made more visible, so that the BPM can fulfil the role it has by definition: supporting the improvement of process efficiency and quality. In other words, the original idea behind the GSBPM could be extended to give statistical organizations a basis for agreeing on some standardization of process and quality metadata and of the measurement of process performance, covering all aspects of the statistical value chain. Generally speaking, two types of metadata can be distinguished: metadata needed to ensure proper production leading to statistics of reasonable quality (ex-ante metadata), and metadata produced by the production process and used to evaluate the quality of the result (ex-post metadata). The SSO aims to identify these metadata and link them properly with the processes and sub-processes (and further with the phases and activities) of the BPM and with each other, resulting in recognized feedback loops.

8.2 GSBPM vs. the Plan-Do-Check-Act (PDCA) Cycle and how to standardize feedback loops

The SSO finds that it would be advantageous to have more explicit references to metadata and quality management throughout the BPM, including evaluation and feedback loops. Feedback loops and mechanisms are intrinsic to the PDCA Cycle, the four-step model for carrying out change for improvement.

Figure 4: Plan-Do-Check-Act (PDCA) Cycle

Viewed from a statistical BPM perspective, the four steps of the PDCA cycle are as follows:
- Plan: establish the objectives and processes necessary to deliver results in accordance with the expected output (Specify needs, Design and Build);
- Do: implement the processes (Collect);
- Check: compare the results against the expected results (Process and Analyse);
- Act: determine the differences between users' needs and the disseminated products, and identify where to apply changes that will bring improvements (Disseminate and Evaluate).

At the moment, the SSO is looking for a way to translate the PDCA cycle into the statistical business process model. The GSBPM starts with Specify needs and ends with Evaluate, but ideally it should be viewed as a system with feedback from Disseminate into Specify needs. Such a system could provide a sound basis for identifying and linking process metadata and quality metadata. Viewed from another perspective, the current GSBPM resembles the waterfall model: progress flows from top to bottom, like a waterfall (Wikipedia 2010), except in the occasional cases where some elements form iterative loops.

Figure 5: GSBPM viewed as Waterfall Model (phases from top to bottom: Specify Needs, Design, Build, Collect, Process, Analyse, Disseminate, Archive, Evaluate)

An acceptable alternative to this model, the V Model (Wikipedia 2010), which is usually used in the software industry, could help manage the complexity of the statistical BPM. The SSO is investigating the option of embracing it in its own business process model.

Figure 6: From the Waterfall Model to the V Model

In this V Model, two major views of the metadata life cycle can be clearly identified:
- left side (inputs): an activity-oriented view of the metadata life cycle, i.e. a description of processes;
- right side (outputs): an entity-oriented view of the metadata life cycle, i.e. a description of deliverables.

The right side (outputs) relies heavily on technology, i.e. on the design of the IT systems, which should be evaluated with similar quality assessment tools; this is beyond the scope of this paper and of the conference topics in general, and is not discussed here.
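The comparison of ex-ante metadata (requirements set when planning, on the left side of the V) with ex-post metadata (values produced by the process, on the right side) can be sketched as a simple check whose findings are fed back into the next Specify needs or Design round, closing the PDCA loop. In this hypothetical Python example the `Requirement` class, the `check` function, the indicator names and the thresholds are all invented for illustration; they are not SSO quality targets.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """Ex-ante metadata: an acceptable range for one process indicator."""
    indicator: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check(requirements: list[Requirement],
          observed: dict[str, float]) -> list[str]:
    """Compare ex-post values with ex-ante requirements (the "Check" step).

    Returns a list of findings for the "Act" step; an empty list means
    every requirement was met.
    """
    findings = []
    for req in requirements:
        value = observed.get(req.indicator)
        if value is None:
            findings.append(f"{req.indicator}: no ex-post value recorded")
        elif req.minimum is not None and value < req.minimum:
            findings.append(f"{req.indicator}: {value} is below {req.minimum}")
        elif req.maximum is not None and value > req.maximum:
            findings.append(f"{req.indicator}: {value} is above {req.maximum}")
    return findings


# Invented targets (planning) and observed values (after processing).
plan = [
    Requirement("response_rate", minimum=0.70),
    Requirement("days_to_release", maximum=30),
    Requirement("missing_item_share", maximum=0.05),
]
observed = {"response_rate": 0.55, "days_to_release": 28}
for finding in check(plan, observed):
    print(finding)
```

The sketch only records deviations; deciding what to change in response remains a subject-matter decision. Treating a missing ex-post value as a finding in its own right reflects the point that metadata should be captured when they are produced.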
If the outputs do not meet the desired specifications, the process itself or the system behind it has to be changed. A simple example is a survey with an unacceptably high non-response rate. System changes may include changes to the questionnaires or to the fieldwork schedule; a more fundamental change would be, for example, to base the survey on other data sources.

The V Model is a lifecycle project process model that provides a framework for process quality assessment and improvement. It is a variation of the waterfall model that makes explicit the dependency between planned activities and resulting activities. All activities from requirements to deliverables focus on permanent evaluation: the system is evaluated through the peer processes, in addition to the evaluating iterations of separate processes and sub-processes. Evaluation does not appear explicitly as a phase of this process model, not because the SSO lacks such phases, but because it is treated as feedback from the peer processes in the V process model. Ex-ante metadata generated by the sub-processes considered as inputs would be compared against ex-post metadata generated in the sub-processes considered as outputs. This approach recognizes the importance of evaluation and feedback throughout the statistical business process, which is a prerequisite for sound quality management. Quality assessments take existing process metadata and quality requirements as input, evaluate the statistical process and its outputs against pre-defined standards, identify strengths and weaknesses, and derive the corresponding improvement actions. It is clear that the V Model is favorable in terms of the framework it provides for process quality assessment and improvement. An exhaustive study of processes, phases and sub-phases will be carried out at the SSO to establish whether this model could foster better integration of work on statistical metadata and quality.

9. Conclusion

The key points from the SSO's experience and the current development work presented in this paper can be summarized as follows: it is possible and desirable to link two frameworks, metadata management and quality management, in order to improve the efficiency of the statistical production system. The paper describes how tying reference metadata to the quality indicators of the ESMS structure has had an immediate impact on quality reporting for different needs and users. Future development at the SSO will focus on harmonizing process metadata and quality assurance concepts, which should further increase the efficiency of statistical production. This will be rather challenging, because process quality is less straightforward to define, and there are no ESS standards comparable to those for product quality, i.e. quality reporting. However, the SSO will rely on the experience gained with the ESMS application and on the potential of the BPM to help identify the metadata relevant to each process and link them with process quality concepts. The GSBPM has proven valuable for a standardized approach to metadata and quality data, by providing a common terminology for describing the statistical business process. On the other hand, for it to support the improvement of process efficiency and quality, feedback loops and mechanisms for carrying out change for improvement should be clearly recognized and made more visible. In other words, the original idea behind the GSBPM could be extended to give statistical organizations a basis for agreeing on some standardization of process and quality metadata and of the measurement of process performance, covering in detail all aspects of the statistical value chain. As a first step, the SSO has to finish mapping its own business process model against the GSBPM.
Development of the metainformation system will continue by harmonizing process metadata and quality indicators wherever possible, in full awareness that turning the general principles of the European Statistics Code of Practice (sound methodology, appropriate statistical procedures, non-excessive burden on respondents and cost effectiveness) (Eurostat 2005) into process metadata is a very complex task.

REFERENCES

EIPA 2006. “Common Assessment Framework (CAF)”. http://www.eipa.eu/en/pages/show/&tid=102

European Commission 2009. Commission Recommendation (2009/498/EC) of 23 June 2009 on reference metadata for the European Statistical System. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:168:0050:0055:EN:PDF

European Parliament and the Council 2009. Regulation (EC) No 223/2009 of the European Parliament and of the Council of 11 March 2009 on European Statistics. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:087:0164:0173:En:PDF

Eurostat 2005. European Statistics Code of Practice.

Eurostat 2005. Mapping of intersections between the European Statistics Code of Practice, the LEG on Quality recommendations and the EFQM Excellence Model Criteria. http://epp.eurostat.ec.europa.eu/portal/pls/portal/!PORTAL.wwpob_page.show?_docname=68123.PDF

Eurostat 2009. “EURO-SDMX Metadata Structure (release 3, March 2009)”. http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/metadata/metadata_structure

Götzfried, A. and Linden, H. 2010. “Quality reporting within the Eurostat and ESS metadata systems simplification and improvement of ESS data quality reporting”. Paper presented at the Work Session on Statistical Metadata (METIS), Geneva, 9-11 March 2010. http://www.unece.org/stats/documents/2010.03.metis.htm

Papazoska, H., Ristevska Karajovanovikj, B. and Lipikj, S. 2010. “The value of adopting and implementation of ESMS structure in Macedonian State Statistical Office”. Paper presented at the Work Session on Statistical Metadata (METIS), Geneva, 9-11 March 2010. http://www.unece.org/stats/documents/2010.03.metis.htm

SDMX Guidelines 2009. “Mapping of SDMX Cross-Domain Concepts to metadata frameworks at international organisations (IMF-Data Quality Assessment Framework, Eurostat-SDMX Metadata Structure and OECD-Metastore)”. http://sdmx.org/?page_id=11

SSO 2010. “Strategic Plan 2010 – 2012”. State Statistical Office of the Republic of Macedonia. http://www.stat.gov.mk/pdf/StrateskiPlan/StrateskiPlan2010_2012en.pdf

UNECE Secretariat 2009. “Generic Statistical Business Process Model, Version 4.0 – April 2009”. http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model

Wikipedia 2010. http://en.wikipedia.org/wiki/V-Model; http://en.wikipedia.org/wiki/Waterfall_model