The role of geospatial information in the modernisation of statistical production and services

(Version 0.1, August 2013)

About this document

This document is the draft output for Work Package 6 of the Frameworks and Standards for Statistical Modernisation project. This project is one of two key projects chosen and overseen by the High-Level Group for the Modernisation of Statistical Production and Services, to be completed during 2013. For more information, please see: www1.unece.org/stat/platform/display/hlgbas

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. If you re-use all or part of this work, please attribute it to the United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community.

I. Introduction

1. The mission of the High Level Group for the Modernization of Statistical Production and Services (HLG) is to oversee development of frameworks, and sharing of information, tools and methods, which support the modernisation of statistical organisations. The aim is to improve the efficiency of the statistical production process, and the ability to produce outputs that better meet user needs.

2. The objectives of the HLG are:

To promote common standards, models, tools and methods to support the modernization of official statistics;

To drive new developments in the production, organisation and products of official statistics, ensuring effective coordination and information sharing within official statistics, and with relevant external bodies;

To advise the Bureau of the Conference of European Statisticians on the direction of strategic developments in the modernisation of official statistics, and ensure that there is a maximum of convergence and coordination within the statistical "industry".

3. In order to achieve the goal of industrialization and standardization, there is a complex set of prerequisites that must be aligned and converged. The HLG is responsible for stimulating development of global standards and overseeing activities undertaken in collaboration.

4. The HLG focuses on defining and promoting frameworks and standards that support statistical modernisation. Each year, it sponsors a small number of projects.

5. In 2013, one of these sponsored projects is the Frameworks and Standards for Statistical Modernization Project. The project has four main objectives:

To ensure that the international statistical community has access to the standards needed to support the modernisation of statistical production and services

To increase coherence between these standards

To provide support mechanisms for the practical implementation of these standards within national and international statistical organisations

To ensure effective promotion and maintenance of the GSBPM and the GSIM, including the release of new versions as appropriate

6. The HLG Strategy envisions new and existing products and services making use of the vast amounts of data becoming available, to provide better measurements of new aspects of society. Statistical organizations are in a unique position to connect to the data of the emerging information society and transform them into something useful.

7. In order to investigate this area, the Frameworks and Standards for Statistical Modernization Project includes a work package that focuses on geospatial standards. The output of this work package is to provide an initial assessment of the role of geospatial standards in the modernisation of official statistics, including how they may relate to the other industry standards and frameworks.

II. Geospatial information in Statistical Organisations

8. Statistical organizations provide official statistics to the community. These statistics are often used by governments and other parties to make informed, evidence-based decisions. They are increasingly used to understand and make decisions on complex economic, social and environmental issues.

9. Increasingly often, consumers of the products and services from statistical organisations are seeking an integrated view of the economy, society and/or environment rather than being interested only in the outputs of one particular domain. The modernisation or transformation work being undertaken in many statistical organisations focuses on breaking down barriers between these domains to enable easier integration of this information.

10. One additional information source that can be utilized by statistical organizations is geospatial information. The availability and use of geospatial information has increased in recent years. Governments around the world are looking at using spatial approaches to support policy development, evaluation and service delivery, and to improve their internal business functions through understanding the location of their constituents.

11. The need to link statistical information to location is widely accepted and acknowledged in a range of fora. The 2012 UN Statistical Commission survey [1] (undertaken as part of a Programme Review by the Australian Bureau of Statistics) of geospatial activities in member countries identified an extremely strong demand for location-based statistics from the 53 countries that responded to the survey.

12. However, this is an area which has received less focus in the modernisation of statistical organisations until recently. According to the 2012 UN Statistical Commission survey, 85% of national statistical organisations have an internal group which provides spatial support. In many organisations there is only a loose connection between this group and the statisticians in the organisation [2]. It has been suggested that there needs to be more integration between these two groups. Some organisations have already implemented strategies to encourage cooperation between the areas, for example by holding regular meetings between statisticians, geospatial teams and methodologists.

III. Frameworks and Standards for Statistical Modernization

13. The geospatial world is well advanced in terms of standards and standardisation. There are a number of geospatial standards which are being used by many statistical organizations. The Open Geospatial Consortium (OGC), founded in 1994, has produced over 50 specifications aimed at enabling geo-processing technologies to interoperate. The most commonly used specifications include:

GML (Geography Markup Language)

WMS (Web Map Service) and WFS (Web Feature Service) used by developers of web services using geospatial information

1 Report of the Australian Bureau of Statistics on developing a statistical-geospatial framework, 2012, http://www1.unece.org/stat/platform/download/attachments/81297858/2013-2-ProgReview-E.pdf

2 ESS Coordination of Geospatial Information and Statistics, 2012, http://www.czso.cz/dgins2012/dgins.nsf/i/session_iii_eurostat/$File/Contribution_Eurostat%20rev.pdf

14. There are also internationally recognized standards for geospatial metadata. ISO 19115 is the current best practice standard for geospatial metadata and ISO 19139 is the XML representation that corresponds to ISO 19115. Many statistical organizations use these standards, or profiles of them.

15. The HLG aims to establish and support industry frameworks and standards which will aid modernisation work within statistical organisations and the statistical industry as a whole.

16. The most notable of the frameworks are the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM). Also of note are two standards which can be used to implement GSIM. These are the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX) standards. The following paragraphs examine the role of geospatial information and processes in each of these frameworks and standards.

A. The Generic Statistical Business Process Model (GSBPM)

17. The GSBPM [3] was developed by the United Nations Economic Commission for Europe (UNECE) and the Conference of European Statisticians Steering Group on Statistical Metadata (better known as "METIS"). Since its release in April 2009, version 4.0 of this model has already been widely adopted by national and international statistical organisations around the world. It is intended to facilitate the convergence of statistical production processes, both within and between organisations.

18. The GSBPM provides a basis for statistical organisations to agree on standard terminology to aid their discussions and is a flexible tool to describe and define the set of processes needed to produce official statistics. The GSBPM is also increasingly being used in other contexts such as harmonising statistical computing infrastructures, facilitating the sharing of software components, and providing a framework for process quality assessment and improvement.

19. Geospatial information is often integrated with statistical information at the end of the production process. To be efficient, it could be considered much earlier in the process. Currently there is no mention in GSBPM of where in the process the use of geospatial data or information needs to be considered. In the future, this could be included. Some examples of where in the production process geospatial data and information could be considered are:

In the planning of a statistical product, geospatial elements could be considered. For example, when users want geospatial data, what do they mean by that? What types of classifications / geographies are being used or would be most useful to users? Collections could be designed to meet user requirements for the level of geospatial detail in outputs.

When designing how data is collected or assembled, statistical organisations could think about how they will assist data providers in providing data with geospatial components.

When processing data, what are the editing tools and methodologies needed to handle geospatial data? Also, if providers cannot provide geospatial data, how could this information be imputed?

3 GSBPM http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model

B. The Generic Statistical Information Model (GSIM)

20. The GSIM [4] is the first internationally endorsed reference framework for statistical information. This overarching conceptual framework plays an important part in modernising, streamlining and aligning the standards and production associated with official statistics at both national and international levels.

21. GSIM is a reference framework of information objects, which enables generic descriptions of the definition, management and use of data and metadata throughout the statistical production process. It provides a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. As a reference framework, GSIM helps to explain significant relationships among the entities involved in statistical production, and can be used to guide the development and use of consistent implementation standards or specifications.

22. GSIM is one of the cornerstones for modernising official statistics and moving away from subject matter silos. It is a key element of the strategic vision prepared by the High-Level Group for the Modernization of Statistical Production and Services (HLG), and endorsed by the Conference of European Statisticians.

23. Currently, GSIM does not include objects or attributes related to geospatial information. In future work, these could be included in the model. An example could be a review of how dimensions in a dataset could be denoted as geospatial data.

C. Data Documentation Initiative (DDI)

24. DDI [5] is a standard for describing data from the social, behavioural and economic sciences. It can be used to describe microdata sets and the accompanying metadata (such as questions, variables, concepts, codes) and the tabulation of those datasets into aggregate data cubes. DDI is one of the standards that can be used to implement GSIM.

25. DDI has two major development lines. These are DDI Codebook, which focusses on documenting datasets after the data have been collected, and DDI Lifecycle, which focusses on describing metadata as it is created and used throughout the production process. DDI Codebook and DDI Lifecycle are used by statistical organisations across the world.

26. Both branches of DDI have the ability to capture geospatial information.

4 GSIM, http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=59703371

5 DDI Alliance, www.ddialliance.org/

27. DDI Codebook focusses on geospatial discovery systems by providing users with the ability to describe the geographical coverage of the data through bounding boxes, bounding polygons, spatial types and the highest and lowest levels of geography available, and to identify variables that contain geospatial information (codes, coordinate points, names, etc.).
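As an illustration of the bounding-box idea mentioned above, the following minimal Python sketch derives a bounding box from a set of coordinate points. It is not DDI syntax and does not use DDI element names; the function name `bounding_box`, the (west, south, east, north) ordering and the sample coordinates are assumptions made for this example only. It also assumes the points do not straddle the 180-degree meridian.

```python
def bounding_box(points):
    """Return (west, south, east, north) for (longitude, latitude) pairs.

    Hypothetical helper for illustration; not part of any DDI tooling.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


# Invented sample coordinates (longitude, latitude), for illustration only.
sample_points = [(149.1, -35.3), (151.2, -33.9), (144.9, -37.8)]
bbox = bounding_box(sample_points)
print(bbox)
```

A box of this kind is a coarse description of coverage: it is easy to compute and search on, but it will usually include territory that the data do not actually cover, which is why bounding polygons are offered as an alternative.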

28. DDI Lifecycle has the additional ability to describe the relational structure of the geographic hierarchies used, describe specific geographic locations, and to reference external maps and spatial data sets (i.e. shape or boundary files).

29. While this functionality is available in DDI, it is not believed to be used widely by statistical organisations.

D. Statistical Data and Metadata eXchange (SDMX)

30. SDMX [6] is a standard designed to describe statistical data and standardise how data and metadata are exchanged. It is particularly strong in describing dimensional data.

31. Many statistical organisations use SDMX in the dissemination phase of the statistical production process to describe and exchange dimensional data with other agencies (for example national statistical organisations exchanging data with supranational organisations like Eurostat). The dissemination phase of statistical production is the point where statistical information and geospatial information are most commonly integrated.

32. SDMX is designed for handling large amounts of tabular data, some of which may have geographic meaning, through a location name (e.g. region/area) or a Unique Identifier that relates to a boundary, e.g. Meshblock, Area Unit etc. Using SDMX it is possible to denote a geospatial role for a dimension (for example a column of data in a dataset). Other than this, there is no geospatial component within SDMX [7].

IV. A new framework for linking geospatial and statistical information

33. The frameworks and standards that support modernization do not describe geospatial information well. This raises the question of why this is the case. It seems that the statistical industry first needs an agreed conceptual basis regarding geospatial information.

34. Many statistical organisations are linking statistics to geography using geographic boundaries developed for purposes other than statistical representation. Thus, the method of linkage used by many countries is not based on statistical requirements and does not reflect the need to be able to make meaningful comparisons given the wide diversity of population numbers that exist in these ‘politically’ driven administrative areas. Some areas may represent many thousands of people and others only hundreds, making statistical comparisons less meaningful.

6 SDMX http://sdmx.org/

7 http://opensdmxdevelopers.wikispaces.com/SDMX+Spatial

35. The statistical community currently has no agreed method of linking geospatial and statistical data. Statistical organizations have an opportunity to support this increasing demand, but more importantly to influence a common and consistent international approach to undertaking this linkage.

36. Using an agreed number of dwellings to establish the smallest area geography and then agreeing on a hierarchy of larger areas where these initial building blocks can be aggregated consistently provides a method to support statistical comparisons of social and demographic data within and between countries.
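The building-block approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block codes, the parent regions and the dwelling counts are invented, and the two-level hierarchy (block, region) stands in for whatever hierarchy might eventually be agreed.

```python
from collections import defaultdict

# Hypothetical smallest-area building blocks: code -> (parent region, dwellings).
# All codes and counts are invented for illustration.
BLOCKS = {
    "B001": ("R1", 40),
    "B002": ("R1", 35),
    "B003": ("R2", 42),
    "B004": ("R2", 38),
    "B005": ("R2", 45),
}


def aggregate_dwellings(blocks):
    """Sum the dwelling counts of building blocks into their parent regions."""
    totals = defaultdict(int)
    for parent, dwellings in blocks.values():
        totals[parent] += dwellings
    return dict(totals)


region_totals = aggregate_dwellings(BLOCKS)
print(region_totals)
```

Because every block belongs to exactly one parent, totals at the higher level are always the sum of their building blocks, which is what allows figures to be aggregated consistently up the hierarchy.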

37. Other approaches are possible, as there are different geographies for different types of statistical units. These include grid-based systems, which use a hierarchy of grids that starts with relatively small cells and increases in size to larger and larger cells (in a similar fashion to the hierarchy based on population as described above). The issue with grids is that they are not population centric. That is, the number of people in a grid cell of a given size will differ between urban and rural locations. However, grids may be an effective approach for environmental and other non-people-based statistics.
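A hierarchy of nested grids of this kind can be sketched as follows. This is a minimal illustration, not a description of any existing national or international grid system: it assumes projected coordinates in metres, square cells, and a cell size that doubles at each level, and the function names and sample point are invented for the example.

```python
import math


def grid_cell(x, y, cell_size):
    """Return the (column, row) index of the square cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def grid_hierarchy(x, y, base_size, levels):
    """Return the cell index of (x, y) at each level.

    Assumption for this sketch: the cell size doubles at every level,
    so each larger cell contains exactly four cells of the level below.
    """
    return [grid_cell(x, y, base_size * 2 ** level) for level in range(levels)]


# Invented sample point, coordinates in metres.
cells = grid_hierarchy(2350.0, 4100.0, base_size=1000.0, levels=3)
print(cells)
```

The cell a point falls in depends only on its coordinates, so the grid is stable over time and independent of administrative boundaries; the trade-off, as noted above, is that cells of equal size can hold very different numbers of people.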

38. Establishing common and consistent statistical geographies to be used by statistical organisations across the world would increase the usability and relevance of these data, as well as enhance the ability to compare data both within and across countries. This would also facilitate the ability to build statistical services to link statistical and geospatial information in alignment with HLG’s Common Statistical Production Architecture.

E. Global Geospatial Information Management

39. At the forty-fourth session of the Statistical Commission, held in 2013, the importance of the integration of statistical and geospatial information was recognized. The Statistical Commission [8] asked the United Nations Statistics Division (UNSD) to establish an expert group on the topic. This expert group would bring together representatives of the statistical and geospatial communities from Member States. The work of this group would be to develop an international statistical geospatial framework. The process of establishing this group has been initiated by UNSD.

40. The United Nations co-ordinates an initiative on Global Geospatial Information Management (UN-GGIM) [9]. This initiative aims to set the agenda relating to global geospatial information management and to address the key challenges in the field.

41. At the meeting of UN-GGIM in July 2013 [10], national geospatial organizations were encouraged to have more active engagement with their national statistical organization, particularly in regard to how to integrate statistical and geospatial information. UNSD [11] is currently considering organizing a conference which brings the two communities together to exchange ideas and best practices in the integration of statistical and geospatial information.

8 Statistical Commission Report on the forty-fourth session (26 February-1 March 2013), http://unstats.un.org/unsd/statcom/doc13/FinalReport-SubmittedVersion14Mar.pdf

9 http://ggim.un.org/default.html

10 Committee of Experts on Global Geospatial Information Management Report on the third session, 2013, http://ggim.un.org/docs/meetings/3rd%20UNCE/GGIM_3%20final%20report%209%20Aug_FINAL.pdf

V. Conclusion

42. The Frameworks and Standards Project supports the work being undertaken by GGIM and UNSD. The work to create a spatial framework is an excellent initiative. It is thought that this work is an essential prerequisite before any further work is undertaken on updating or making changes to the frameworks and standards for statistical modernization.

43. There are a number of points of alignment between these pieces of work. It will be important to ensure that these are monitored. A simple example of a point of alignment is to ensure that the statistical concepts being used (e.g. statistical unit) align with the definitions used in GSIM. The HLG Project and its associated task teams are keen to take part in the review process of the spatial framework to ensure there is alignment between it and the frameworks and standards supported by the HLG.

11 Linking of geospatial information to statistics and other data, 2013, http://ggim.un.org/docs/meetings/3rd%20UNCE/E-C20-2013-9linking%20GI%20to%20Statistics%20Summary_en.pdf
