Increased efficiency by harmonizing metadata and quality
Blagica Novkovska, M.Sc
State Statistical Office
Dame Gruev – 4
1000 Skopje, Republic of Macedonia
blagica.novkovska@stat.gov.mk
Helena Papazoska
State Statistical Office
Dame Gruev – 4
1000 Skopje, Republic of Macedonia
helena.papazoska@stat.gov.mk
Keywords:
quality reporting, quality management, reference metadata, business process model
1. Introduction
The State Statistical Office (SSO) has made significant progress in the field of quality
management over the last four years. The main quality framework implemented at the SSO is
EFQM (European Foundation for Quality Management). Fifteen principles from the European
Statistics Code of Practice (CoP) (Eurostat 2005) regarding the institutional environment, statistical
processes and statistical outputs have been set across the institution as clear values, which are also
recognized in the SSO Strategic Plan 2010 – 2012 (SSO 2010). The quality system and the statistical
meta-information system are being developed at the SSO side by side. It was easy to identify a
bi-directional link between statistical metadata and quality: metadata describe the quality of
statistics and, vice versa, statistical metadata are defined as a quality component. Principle 15 of the
CoP, dealing with accessibility and clarity of European statistics, also emphasizes that the
accompanying metadata should be documented according to a standardized metadata system
(2009/498/EC).
After a brief description of the quality-related initiatives at the SSO, this paper focuses on
tying metadata and quality together. First, it presents the steps that led from the initial concept of
connecting reference metadata with quality reporting to the practical implementation, which
resulted in increased efficiency in quality reporting according to different quality frameworks.
Secondly, it develops the notion that the link between metadata and quality should go beyond
quality reporting. Finally, the paper addresses the main open issues that arise when using the
Generic Statistical Business Process Model (GSBPM) (UNECE Secretariat 2009) to harmonize
metadata and quality, as well as the approach taken to tackle these issues. The conclusion
summarizes the current status of the work and the future challenges.
2. Quality Related Initiatives at the SSO
The first step in the systematic approach to quality management was taken in 2006, when the
EFQM Excellence Model, with its nine evaluation criteria (five relating to enablers and four relating
to results), was adopted. Even in the initial steps, the intersections between the CoP and EFQM
(Eurostat 2005) were identified, as well as their operational synergies. EFQM puts more emphasis
on internal management processes, whereas the CoP, when dealing with processes, focuses more on
statistical production aspects. EFQM is specially designed for public-sector organizations, taking
their characteristics into account. The CoP, on the other hand, is rather specific to statistical offices
and is a document that elaborates the highest common level of business principles in official
statistics. For example, some aspects of the CoP are not covered by EFQM, such as Principle 2
(mandate for data collection) and Principle 6 (impartiality and objectivity), or individual indicators
of some principles. The integration of the prevailing ESS quality frameworks, namely the CoP and
the EFQM model, is suggested in the Eurostat document (Eurostat 2005).
The SSO has performed two self-assessments based on EFQM and endorsed by the European
Commission. The first self-assessment exercise, in 2006, was coached by experts from Destatis and
the Czech Statistical Office. CAF (EIPA 2006), a user-friendly introductory model, was used as the
self-assessment tool; it offered the SSO an opportunity to learn more about itself and also to
improve its own performance by using quality management techniques. Based on the findings and
recommendations from the 2007 CAF report, short-term and long-term priorities were defined
and an action plan was drafted. The best evidence of the successful completion of the short-term
priorities is the analysis of organizational performance and the results of the second CAF
self-assessment conducted in 2010.
3. Organizational impact of closer connections between metadata and quality
In the current SSO organizational structure there is no dedicated unit that deals with general
methodologies, metadata and quality issues. There is no chief methodologist or general
methodology department. This has a significant impact on how the work is organized, and in a way
this “disadvantage” has become an organizational advantage in building a closer connection
between metadata and quality. Dealing with metadata is a multidisciplinary task, so it was decided
that a cross-sectional working group (WG) should be established as an “umbrella” over all the
activities. Members of the WG come from different job positions and different organizational units:
subject-matter departments, IT staff and persons responsible for the quality of statistical outputs.
The Director General leads the WG, which means that there is support from the highest authority in
the SSO, and the development of a central metadata repository has been defined as one of the
highest priorities. High-level management commitment and support is a key factor contributing to
the success of the project.
Members of the WG are responsible for:
- Development and implementation of centralized storage of reference metadata;
- Dissemination of reference metadata;
- Documentation, standardization and harmonization of SSO's Business Process Model (BPM);
- Implementation of the new BPM in the SSO environment;
- Standardization, documentation and variables inventory;
- Establishment of harmonized variables repository;
- Tying metadata and quality.
The metadata WG has the freedom to set priorities, choose methods, build the production
system, etc. In accordance with the SSO's commitment to building a teamwork culture in the
Office, this WG is supported by several teams (for analyzing respondents’ burden, for
implementation of the cost calculation system, and the Statistical Confidentiality Committee), which
provide the necessary input from their fields of expertise.
4. Reference metadata and quality reporting
In early 2009, new versions of the ESS Handbook on Quality Reporting (EHQR) and the ESS
Standards for Quality Reports (ESQR) were published by Eurostat, and it was easy to recognize that
many of the quality and performance indicators from the ESQR (relevance, accuracy, timeliness,
punctuality, accessibility and clarity, comparability, coherence) can be clearly identified in the
"Euro-SDMX Metadata Structure" (ESMS) (Eurostat 2009). Regulation (EC) No 223/2009 of the
European Parliament and of the Council of 11 March 2009 on European statistics provided the
reference framework for the subsequent Commission Recommendation of 23 June 2009 on
reference metadata, in which the 21 cross-domain metadata concepts and sub-concepts from the
ESMS are listed. Compiling the ESMS concepts into a standardized quality report is advantageous
for internal self-assessment, for reporting to Eurostat and also for user-oriented quality reporting,
because it puts considerable emphasis on output quality. In the next step, user-oriented quality
reports should be extended with producer-oriented ones (i.e. aimed at internal assessment of process
and output quality) in order to fulfill the quality reporting standards in accordance with the ESQR.
An important milestone in improving efficiency was reached at the SSO with the
implementation of the available mapping between reference metadata (ESMS) and the concepts
applied in different quality frameworks (OECD Metastore, IMF DQAF, SDMX), carried out within
the SDMX initiative (SDMX Guidelines 2009). This mapping triggered the SSO's idea of tying
reference metadata to quality assessment data in quality reporting, thereby implementing the core
metadata principle of re-usability.
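To illustrate this re-use in concrete terms, the following minimal sketch (in Python, with invented concept names and an invented correspondence table rather than the official SDMX mapping documents) shows how values stored once per ESMS concept could be re-expressed under the concepts of another quality framework.
```python
# Reference metadata captured once, per ESMS concept, for one statistics
# (hypothetical texts).
esms_record = {
    "Accuracy": "Coefficient of variation of the main estimate below 2 %.",
    "Timeliness and punctuality": "Results released at T+60 days, no delays.",
    "Comparability": "Fully comparable with EU aggregates since 2005.",
}

# Simplified, invented correspondence to the concepts of another framework;
# the official SDMX mapping tables are more detailed.
ESMS_TO_OTHER = {
    "Accuracy": "Accuracy and reliability",
    "Timeliness and punctuality": "Periodicity and timeliness",
    "Comparability": "Consistency",
}

def reuse_for_framework(record, mapping):
    """Re-express stored ESMS values under the target framework's concepts."""
    return {mapping[c]: text for c, text in record.items() if c in mapping}

for concept, text in reuse_for_framework(esms_record, ESMS_TO_OTHER).items():
    print(f"{concept}: {text}")
```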
5. Development of Euro SDMX Database and Application
Adhering to the concept of metadata re-usability, increased efficiency in quality reporting
at the SSO has been achieved in a multiple-stage process. Activities started with the Managerial
Board decision to adopt the Euro-SDMX Metadata Structure in the SSO. The fact that reference
metadata at the SSO existed in various shapes and storage forms (in particular, there was no
officially promulgated and adopted standard at the organizational level) simplified the adoption of
the Eurostat recommendation to implement the ESMS at the national level.
Before the actual implementation of the ESMS started, the SSO's three-level Theme/Statistics
structure was converted into a five-level structure in order to achieve better harmonization. The old
Theme/Statistics structure consisted of: Statistical Area, Statistical Sub-area and Statistics. The
revision was done in accordance with the Eurostat Compendium structure: Domain, Theme and
Module. Furthermore, the module level was broken down into two additional levels: the group level
and the level of statistics.
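As an illustration only, the following sketch shows how a statistics node could be located in the revised five-level classification; the names used are hypothetical examples, not the official nomenclature.
```python
from dataclasses import dataclass

# Hypothetical illustration of the five-level Domain/Theme/Module/Group/
# Statistics classification; the names are examples, not official codes.
@dataclass(frozen=True)
class StatisticsNode:
    domain: str
    theme: str
    module: str
    group: str
    statistics: str

    def path(self) -> str:
        return " / ".join(
            (self.domain, self.theme, self.module, self.group, self.statistics)
        )

node = StatisticsNode(
    domain="Business statistics",
    theme="Short-term business statistics",
    module="Industry",
    group="Monthly indicators",
    statistics="Monthly Industry Survey",
)
print(node.path())
```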
The core of the SDMX model of reference metadata is the concept of the “Metadata Structure
Definition”. Reference metadata may be attached to different object types (a data set, a time series,
an observation), and the Metadata Structure Definition also identifies the object the metadata are
attached to (Concept 21). For the time being they are attached at the level of statistics, with the
intention of attaching them in the future at the highest possible level - the group level, or even the
module level where appropriate. A methodological clarification of the ESMS was carried out and
the impact on existing reference metadata was analyzed as well. Although the descriptions and ESS
guidelines in the ESMS structure are very helpful, at the concept and sub-concept level we attached
additional explanations from other sources in order to provide enough information for correct
interpretation by the subject-matter staff responsible for the different statistics in the Office. All
documents were translated into/from Macedonian. This was considered a preparatory phase before
the actual start of the development of the IT system that is used as the national editor for reference
metadata: the ESMS application and the ESMS database.
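The sketch below is a simplified, assumed in-memory representation (not the actual ESMS database schema) of a reference metadata set that records both the concept values and the object the set is attached to.
```python
from dataclasses import dataclass, field

@dataclass
class EsmsMetadataSet:
    """A set of ESMS reference metadata together with the object (and level)
    of the Module/Group/Statistics structure it is attached to."""
    attached_to: str       # e.g. the name of the statistics
    attachment_level: str  # "statistics", "group" or "module"
    concepts: dict = field(default_factory=dict)  # concept code -> free text

    def set_concept(self, code: str, text: str) -> None:
        self.concepts[code] = text

ms = EsmsMetadataSet(attached_to="Foreign Trade Statistics",
                     attachment_level="statistics")
ms.set_concept("Data description", "Exports and imports of goods ...")
ms.set_concept("Accuracy", "Administrative source, full coverage assumed.")
```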
Figure 1 shows the overall conceptual model of the system. It is divided into three inter-linked
sectors: Organizational Structure, Module-Group/Statistics Structure and ESMS Reference
Metadata Framework. The system is organized as a comprehensive structured collection of ESMS
concepts for the Module/Group/Statistics structure, and in particular for the level of statistics,
aiming at documenting methodologies, quality and the statistical production processes in general.
Figure 1: ESMS DB Basic scheme (building blocks of the SSO's ESMS system: the Domain/Theme/Module/Group/Statistics classification, the Organizational Structure, and the ESMS Concepts/Sub-concepts and Mapping Schemes)
A history and version handling mechanism has been implemented. History management makes it
possible to maintain consistency and quality among different versions of reference metadata. It is
important to decide how many versions to keep (all of them or milestones only) in order to reduce
uncertainty when comparing different versions. Only significant changes are recorded as new
versions, and only one version is current and active. The previous version, which is the basis for the
new one, is no longer active. In addition to this activity flag, each version is accompanied by further
information: user information, date and time of modification, and type of modification (initial data
entry, modification due to technical correction, modification due to an approved and significant
correction, and deletion). In this regard, modifying “shared” objects must be carefully managed,
since it affects the components, the model itself and any previously shared data.
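A rough sketch of this versioning logic is given below; the field names and the helper function are assumptions for illustration, not the actual database design.
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class ModificationType(Enum):
    INITIAL_ENTRY = "initial data entry"
    TECHNICAL_CORRECTION = "modification due to technical correction"
    SIGNIFICANT_CORRECTION = "modification due to approved significant correction"
    DELETION = "deletion"

@dataclass
class MetadataVersion:
    version_no: int
    content: dict                  # ESMS concept code -> text
    user: str
    modified_at: datetime
    modification: ModificationType
    active: bool = True

def create_new_version(history, content, user, modification):
    """Add a new active version; the previously active version is deactivated."""
    for v in history:
        v.active = False
    new = MetadataVersion(
        version_no=len(history) + 1,
        content=content,
        user=user,
        modified_at=datetime.now(),
        modification=modification,
    )
    history.append(new)
    return new

history = []
create_new_version(history, {"Accuracy": "draft"}, "editor1",
                   ModificationType.INITIAL_ENTRY)
create_new_version(history, {"Accuracy": "final"}, "editor2",
                   ModificationType.SIGNIFICANT_CORRECTION)
assert sum(v.active for v in history) == 1  # only one version stays active
```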
The application has a multi-lingual design and automatically switches the keyboard language and
layout as needed from the user perspective in order to support the Cyrillic code page. Since
reference metadata consist mainly of free text, the use of formatted text (styled text, rich text), as
opposed to plain text, has been discussed. The advantage of the former is that it carries styling
information beyond the minimum of semantic elements: colors, styles (boldface, italics), sizes and
special features (such as images, tables and hyperlinks). As enhanced features, we plan to introduce
formatted text capabilities and to develop advanced search capabilities.
The application is user-friendly and flexible, and it provides user control via an intuitive structure
of information; a user manual for the application is nevertheless also available. The ESMS
application was first used by five users, and five pilot statistics from different statistical domains
were documented: Foreign Trade Statistics, Household Budget Statistics, Marriages Statistics and,
from Industry Statistics, the Monthly Industry Survey and Turnover and New Orders in Industry.
Figure 2 displays the ESMS application user interface (Papazoska, H., Ristevska Karajovanovikj, B.
and Lipikj, S. 2010).
Figure 2: ESMS Application Interface
6. Increased efficiency in quality reporting by meeting various quality reporting requirements
The SSO devotes a lot of time and energy to reporting to international organizations. The
ESMS structure includes data quality information, and once statistics are documented through the
ESMS user interface, a predefined quality report can easily be built just by using the concepts and
sub-concepts related to quality. The most important benefit of having a central database of reference
metadata (the ESMS database) is its exploitation for reporting.
Reports can be produced in different forms, depending on the user (for internal assessment,
for submission to the corresponding Eurostat units or other international organizations, for
dissemination purposes, etc.). They can also contain all or just part of the concepts and the
appropriate sub-concepts (Figure 3). The compatibility and the possibility of mapping between the
ESMS and the SDMX cross-domain concepts, IMF-DQAF and OECD-Metastore metadata
frameworks allow users to quickly and accurately build different reports for different organizations.
The concept of re-usability translates into increased efficiency over time. Report Manager, included
in Microsoft SQL Server Reporting Services, is used from the user perspective as a report viewer
and provides the user interface to the report server where the reports are deployed. The report
layout design is very similar to the Eurostat ESMS reports. We implemented, or rather tried to
implement, the proposed documents mapping the ESMS to the SDMX cross-domain concepts,
IMF-DQAF and OECD-Metastore. This was quite a challenging task because different types of
mapping (1:1, 1:n and n:1) existed, and at different levels as well. For the time being, the ESQR has
not yet been implemented because the mapping has not been clarified.
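The following sketch, using invented concept names, illustrates how the different cardinalities could be handled when assembling a report for another framework: a concept listed under several targets gives a 1:n mapping, while a target fed by several ESMS concepts (n:1) concatenates their values.
```python
# Hypothetical mapping with mixed cardinalities: each target concept lists
# the ESMS concepts that feed it (n:1 when the list has several entries);
# an ESMS concept appearing under several targets gives a 1:n mapping.
TARGET_FROM_ESMS = {
    "Relevance": ["Relevance"],                                   # 1:1
    "Accuracy and reliability": ["Accuracy"],                     # 1:1
    "Serviceability": ["Timeliness and punctuality", "Accuracy"], # n:1
    "Accessibility": ["Accessibility and clarity"],               # 1:n, with the next entry
    "Assistance to users": ["Accessibility and clarity"],
}

def build_report(esms_values):
    """Assemble a target-framework report from stored ESMS concept values."""
    report = {}
    for target, sources in TARGET_FROM_ESMS.items():
        texts = [esms_values[s] for s in sources if s in esms_values]
        if texts:
            report[target] = "\n".join(texts)  # n:1 concatenates the sources
    return report

example = {
    "Relevance": "Main users: ministries, the central bank, Eurostat.",
    "Accuracy": "Sampling error within published thresholds.",
    "Timeliness and punctuality": "Published at T+60 days.",
    "Accessibility and clarity": "Releases and methodological notes on the web.",
}
for concept, text in build_report(example).items():
    print(concept, "->", text)
```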
Users can export reports in various printable formats (xls, csv, xml, pdf, tiff, web archive) for
internal assessment, documentation and dissemination by selecting the desired format (Papazoska,
H., Ristevska Karajovanovikj, B. and Lipikj, S. 2010).
Figure 3: Production of ESMS metadata report for Foreign Trade Statistics
7. Tying metadata and quality beyond quality reporting
Metadata and quality are two sides of the same coin; both are used to describe the statistical
system and its outputs. At the SSO, we see metadata from a broad perspective, as an adequate supply
of information on statistical data that gives an insight into the production process, together with the
possibility of analyzing it and deciding how the data can be used correctly.
Metadata are neither a new nor a modern phenomenon. Documentation of statistical
operations, including definitions of concepts, methods of data collection, processing and
dissemination, data about coverage, response rates, numbers of missing values, estimates of errors,
etc., has always been provided as part of the daily work. Nevertheless, subject-matter staff have
constantly complained that they have to meet deadlines and have no time to provide metadata. The
SSO's experience in collecting metadata is probably not unique. However, with the successful
implementation of the ESMS database and user application, the SSO has somehow “demystified”
the word metadata. The benefits and efficiency gains in meeting various quality reporting
requirements on the basis of the provided reference metadata became obvious shortly after the
initial efforts to provide reference metadata. Subject-matter staff were encouraged to undertake
further activities concerning centralized metadata collection that would provide the metadata
necessary for quality assurance and quality assessment.
Now that the relationship between reference metadata and quality reporting has matured, the SSO
is focusing on connecting process metadata and quality assessment, which extends the importance of
using metadata already provided elsewhere. Metadata are present in every activity of the statistical
business process, either created there or transferred from a previous phase (UNECE Secretariat 2009).
The key challenge is to ensure that these metadata are captured as early as possible (i.e. when they
are produced), stored and transferred from phase to phase alongside the data they refer to, but
without asking producers to deliver metadata in specific formats using specific forms. Our idea is
to make less ambitious, but still useful, metadata demands. In this regard, the SSO will rely on the
experience gained with the ESMS application and on the potential of the BPM to facilitate the
identification of the metadata relevant to each activity and their link with process quality
management.
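As a rough illustration of this intention, the sketch below shows process metadata being captured phase by phase and carried along with the data, without requiring producers to fill in separate forms; the phase names follow the GSBPM, while the container and the measured indicators are assumptions.
```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class StatisticalDataset:
    """Data moving through the business process, carrying the process
    metadata captured in each phase alongside it (a simplified assumption
    about how such a container could look)."""
    name: str
    process_metadata: list = field(default_factory=list)

    def record(self, phase: str, **measurements) -> None:
        # Capture metadata at the moment the phase produces them.
        self.process_metadata.append(
            {"phase": phase, "captured_at": datetime.now(), **measurements}
        )

ds = StatisticalDataset("Monthly Industry Survey, 2010-03")
ds.record("Collect", response_rate=0.87, reminders_sent=2)
ds.record("Process", imputation_rate=0.04, edit_failures=112)
ds.record("Analyse", revision_to_previous_release=0.3)
# Later phases (and quality assessment) read the same accumulated metadata.
```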
In order to identify the full framework of metadata used within the statistical process,
including quality assessment, a prerequisite is sound knowledge and understanding of the SSO's
BPM, which will eventually result in its adoption of, i.e. mapping to, the GSBPM.
8. Impact of Statistical Business Process Model in harmonizing metadata and quality
8.1. The purpose of modeling and how the Generic Statistical Business Process Model (GSBPM) fits
into it
The term Business Process Model (BPM) is the noun form of business process modeling and
refers to a structural representation that defines a specified flow of activities in a particular
business. Business process modeling is performed in order to improve process efficiency, usually
resulting in improved quality as well. Quality management and quality assurance concepts have
existed for a long time in the private sector, but they have become an important issue for public
administration since taxpayers increased their expectations of quality products and services in
return for their money. Concepts such as documentation, process description, metadata, quality
assurance and quality assessment were initially used in engineering and manufacturing, and later in
the information and communication technology industry and software development in particular.
Statisticians now face the difficult task of adopting (or customizing) frameworks and models
originally developed in the engineering and software industries, in order to make them relevant to
the particularities of the statistical production system, whose main process is to deliver information
that customers can use for their decisions.
The original intention when designing the GSBPM was to provide a basis for statistical
organizations to agree on standard terminology to aid their discussions on developing statistical
metadata systems and processes, but the GSBPM was later extended with the inclusion of the
over-arching processes (UNECE Secretariat 2009). However, metadata and quality management
included as over-arching processes give only a vague picture of the metadata generated and used,
and of the quality assurance concepts implemented, in the individual processes and sub-processes.
At the SSO we consider that evaluation and feedback throughout the statistical business process
have to be clearly recognized and become more visible, so that the BPM can fulfill the role it has by
definition: supporting the improvement of process efficiency and quality.
This simply means that the original idea behind the GSBPM could be further extended to
provide a basis for statistical organizations to agree on some standardization with regard to process
and quality metadata and the measurement of process performance, covering all aspects of the
statistical value chain. Generally speaking, two types of metadata can be distinguished: metadata
needed to ensure proper production leading to statistics of reasonable quality (ex-ante metadata),
and metadata produced by the production process and used to evaluate the quality of the result
(ex-post metadata). The aim of the SSO is to identify these metadata and properly link them with
the processes and sub-processes (and further with the phases and activities) of the BPM and with
each other, resulting in recognized feedback loops.
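A minimal sketch of this distinction, under the assumption of two simple record types linked to the same GSBPM sub-process, is given below; the sub-process label and the indicator fields are illustrative only.
```python
from dataclasses import dataclass

@dataclass
class ExAnteMetadata:
    """Planned parameters ensuring proper production (hypothetical fields)."""
    sub_process: str            # GSBPM sub-process the metadata refer to
    target_response_rate: float
    planned_release: str        # e.g. "T+60 days"

@dataclass
class ExPostMetadata:
    """Values measured by the production run for the same sub-process."""
    sub_process: str
    achieved_response_rate: float
    actual_release: str

plan = ExAnteMetadata("Run collection", 0.85, "T+60 days")
outcome = ExPostMetadata("Run collection", 0.78, "T+63 days")
# Linking both records to the same sub-process is what makes the feedback
# loop between output quality and process design visible.
```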
8.2 GSBPM vs. Plan-Do-Check-Act (PDCA) Cycle and how to standardize feedback loops
The SSO finds that it would be advantageous to see more explicit references to metadata and
quality management throughout the BPM, including evaluation and feedback loops. Feedback loops
and mechanisms are intrinsic to the four-step model for carrying out change for improvement, the
PDCA cycle.
Figure 4: Plan-Do-Check-Act (PDCA) Cycle
The four steps of the PDCA cycle, viewed from the perspective of the statistical BPM, are as follows (a compact sketch of this correspondence is given after the list):
- Plan: establish the objectives and processes necessary to deliver results in accordance with the expected output (Specify Needs, Design and Build).
- Do: implement the processes (Collect).
- Check: compare the results against the expected ones (Process and Analyse).
- Act: determine the differences between users’ needs and the disseminated products and identify where to apply changes that will bring improvements (Disseminate and Evaluate).
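A compact way to express this correspondence, assuming the phase names of GSBPM version 4.0, is a simple lookup from each PDCA step to the related phases:
```python
# Correspondence between PDCA steps and GSBPM phases as read from the list
# above (GSBPM 4.0 phase names; the grouping itself reflects the SSO's view).
PDCA_TO_GSBPM = {
    "Plan":  ["Specify Needs", "Design", "Build"],
    "Do":    ["Collect"],
    "Check": ["Process", "Analyse"],
    "Act":   ["Disseminate", "Evaluate"],
}

def pdca_step_for(phase: str) -> str:
    """Return the PDCA step a given GSBPM phase belongs to."""
    for step, phases in PDCA_TO_GSBPM.items():
        if phase in phases:
            return step
    raise ValueError(f"Phase not mapped: {phase}")

assert pdca_step_for("Analyse") == "Check"
```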
At the moment, the SSO is looking for a way to translate the PDCA cycle into the statistical
business process model. The GSBPM starts with Specify Needs and ends with Evaluate, but ideally
it should be viewed as a system with feedback from Disseminate into Specify Needs. Such a system
could provide a sound basis for identifying and linking process metadata and quality metadata.
Viewed from another perspective, the current GSBPM shows similarities to the waterfall model:
progress flows from the top to the bottom, like a waterfall (Wikipedia 2010), except in the
occasional situations where some elements form iterative loops.
Figure 5: GSBPM viewed as Waterfall Model (phases flowing from top to bottom: Specify Needs, Design, Build, Collect, Process, Analyse, Disseminate, Archive, Evaluate)
An acceptable alternative to this model, the V Model (Wikipedia 2010), commonly used in the
software industry, could be applicable to managing the complexity of the statistical BPM. The
SSO's activities are focused on investigating the option of embracing it in its own business process
model.
Figure 6: From the Waterfall Model to the V Model (the input phases Specify Needs, Design, Build and Collect on the left side are paired with the output phases Process, Analyse, Disseminate and Archive on the right side, with Evaluate acting as feedback between the peer phases)
In the above V Model, two major views of the metadata life cycle can be clearly identified:
- Left side (inputs): an activity-oriented view of the metadata life cycle, i.e. a description of processes;
- Right side (outputs): an entity-oriented view of the metadata life cycle, i.e. a description of deliverables.
The right side (outputs) is very much based on technology, i.e. on the design of the IT systems,
which should be evaluated with similar quality assessment tools; this is not discussed here because
it is beyond the scope of this paper and of the conference topics in general.
If the outputs do not meet the desired specifications, the process itself or the system behind it
has to be changed. A simple example is a survey with an unacceptably high non-response rate.
System changes may comprise changes to the questionnaires or to the fieldwork time schedule. A
more fundamental change would be to base the survey on other data sources, for example.
The V Model is a life-cycle project process model that provides a framework for process quality
assessment and improvement. It is a variation of the waterfall model that makes explicit the
dependency between planned activities and the resulting activities. All activities from requirements
to deliverables focus on permanent evaluation, whereas evaluation of the system lies in the peer
processes, in addition to the evaluation iterations of the separate processes and sub-processes.
Evaluation does not explicitly appear as a phase of this process model, not because the SSO lacks
such a phase, but because it is treated as feedback between the peer processes in the V Model.
Ex-ante metadata generated by the different sub-processes considered as inputs would be
compared against ex-post metadata generated in the sub-processes considered as outputs. This
approach recognizes the importance of evaluation and feedback throughout the statistical business
process, which is a prerequisite for sound quality management. Quality assessments take existing
process metadata and quality requirements as input, evaluate the statistical process and its outputs
against pre-defined standards, identify strengths and weaknesses and derive the corresponding
improvement actions. The V Model is clearly favorable with respect to the framework it provides
for process quality assessment and improvement. An exhaustive study of processes, phases and
sub-phases will be carried out at the SSO in order to establish whether this model can foster better
integration of the work on statistical metadata and quality.
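To make the evaluation step more concrete, the sketch below compares hypothetical ex-ante targets with ex-post measurements per sub-process and lists the deviations as candidate improvement actions; the indicators, values and thresholds are invented for illustration.
```python
# Each ex-ante target records the sub-process, the indicator, the planned
# value and whether higher or lower measured values are better (all invented).
ex_ante = [
    ("Run collection", "response_rate", 0.85, "higher_is_better"),
    ("Disseminate", "days_after_reference", 60, "lower_is_better"),
]
ex_post = {
    ("Run collection", "response_rate"): 0.78,
    ("Disseminate", "days_after_reference"): 63,
}

def evaluate(targets, measured):
    """Compare ex-post measurements against ex-ante targets and flag misses."""
    findings = []
    for sub_process, indicator, target, direction in targets:
        actual = measured.get((sub_process, indicator))
        if actual is None:
            findings.append((sub_process, indicator, "not measured"))
            continue
        missed = actual < target if direction == "higher_is_better" else actual > target
        if missed:
            findings.append((sub_process, indicator,
                             f"target {target}, achieved {actual}"))
    return findings

for sub_process, indicator, detail in evaluate(ex_ante, ex_post):
    print(f"Improvement candidate - {sub_process}: {indicator} ({detail})")
```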
9. Conclusion
The key points from the SSO's experience and the current development work presented in this
paper can be summarized in the statement that it is both possible and desirable to link the two
frameworks, metadata management and quality management, in order to improve the efficiency of
the statistical production system.
The paper elaborates how tying together reference metadata and quality indicators from the ESMS
structure has had an immediate impact on quality reporting for different needs and users. Future
development activities at the SSO are focused on harmonizing process metadata and quality
assurance concepts, which should further increase the efficiency of statistical production. This will
be a rather challenging task, because process quality is less straightforward to define and there are
no ESS standards as there are for product quality, i.e. quality reporting. However, the SSO will rely
on the experience gained with the ESMS application and on the potential of the BPM to facilitate
the identification of the metadata relevant to each process, as well as their linking with process
quality concepts.
The GSBPM has proven valuable for a standardized approach to dealing with metadata and quality
data by providing a common terminology to describe the statistical business process. On the other
hand, when it comes to its role in the improvement of process efficiency and quality, feedback loops
and mechanisms for carrying out change for improvement should be clearly recognized and made
more visible. This simply means that the original idea behind the GSBPM could be further extended
to provide a basis for statistical organizations to agree on some standardization with regard to
process and quality metadata and the measurement of process performance, covering in detail all
aspects of the statistical value chain.
As a first step, the SSO has to finish the mapping of its own business process model against
the GSBPM. The development work on the meta-information system will continue by harmonizing
process metadata and quality indicators wherever possible, with full awareness that it is a very
complex task to turn the general principles of the European Statistics Code of Practice (sound
methodology, appropriate statistical procedures, non-excessive burden on respondents and cost
effectiveness) (Eurostat 2005) into process metadata.
REFERENCES
EIPA 2006. “Common Assessment Framework (CAF)”
http://www.eipa.eu/en/pages/show/&tid=102
European Commission 2009. Commission Recommendation (2009/498/EC) of 23 June 2009 on
reference metadata for the European Statistical System
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:168:0050:0055:EN:PDF
European Parliament and the Council 2009. Regulation (EC) No 223/2009 of the European
Parliament and of the Council of 11 March 2009 on European Statistics
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:087:0164:0173:En:PDF
Eurostat 2005. European Statistics Code of Practice
Eurostat 2005. Mapping of intersections between the European Statistics Code of Practice, the LEG
on Quality recommendations and the EFQM Excellence Model Criteria
http://epp.eurostat.ec.europa.eu/portal/pls/portal/!PORTAL.wwpob_page.show?_docname=68123.P
DF
Eurostat 2009. “EURO – SDMX Metadata Structure (release 3, March 2009)”
http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/metadata/metadata_structure
Götzfried, A. and Linden, H. 2010. “Quality reporting within the Eurostat and ESS metadata
systems - simplification and improvement of ESS data quality reporting”, Paper presented at the
Work Session on Statistical Metadata (METIS), Geneva, 9-11 March 2010
http://www.unece.org/stats/documents/2010.03.metis.htm
Papazoska, H., Ristevska Karajovanovikj, B. and Lipikj, S. 2010. “The value of adopting and
implementation of ESMS structure in Macedonian State Statistical Office”, Paper presented at the
Work Session on Statistical Metadata (METIS), Geneva, 9-11 March 2010
http://www.unece.org/stats/documents/2010.03.metis.htm
SDMX Guidelines 2009. “Mapping of SDMX Cross-Domain Concepts to metadata frameworks at
international organisations (IMF-Data Quality Assessment Framework, Eurostat-SDMX Metadata
Structure and OECD-Metastore)” http://sdmx.org/?page_id=11
SSO 2010. “Strategic Plan 2010 – 2012”, State Statistical Office of the Republic of Macedonia.
http://www.stat.gov.mk/pdf/StrateskiPlan/StrateskiPlan2010_2012en.pdf
UNECE Secretariat 2009. “Generic Statistical Business Process Model, Version 4.0 – April 2009”
http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Mo
del
Wikipedia 2010.
http://en.wikipedia.org/wiki/V-Model
http://en.wikipedia.org/wiki/Waterfall_model