WP.9abs

Modern methods of data collection in Poland

Janusz Dygaszewicz, CSO Poland

Introduction

1.

The purpose of searching for new solutions in survey execution is to reduce the financial costs of surveys and the burden placed on respondents by the transfer of data, while maintaining the high quality of statistical data. This is made possible by the wide range of new technologies and ICT tools available on the market. New tools, technologies and methods of data collection were first applied successfully during the censuses: the Agricultural Census 2010 and the Population and Housing Census 2011. The innovations implemented then, namely the exclusive use of electronic forms completed by CAxI methods (CAII: Computer Assisted Internet Interview, CATI: Computer Assisted Telephone Interview, CAPI: Computer Assisted Personal Interview), made it possible to eliminate paper entirely and were accepted by statisticians and respondents. This initiated a new trend in data collection that continues in surveys and will be developed gradually, in line with global trends in the development of statistics.

2.

The programme of public statistics surveys is developed annually, together with the schedule of survey execution. The programme includes, among other things, the dates and methods of data collection from respondents. Information about the surveys in a given year, including the methods and dates of data collection, is published on the website of the Central Statistical Office of Poland. In addition to the information on the website, each randomly selected respondent receives a letter from the President of the Central Statistical Office. Besides explanations, the letter encourages the respondent to complete the questionnaire on the Internet. The letter from the President of the Central Statistical Office of Poland remains the most effective tool for recruiting respondents to a survey and for building a positive image of statistics.

Collecting data by using the Internet

3.

In Poland there are separate names for the different methods of data collection using the Internet. CAWI (Computer Assisted Web Interview) is used for entrepreneurs and other units obliged to report in electronic form via a reporting website. The Reporting Portal is the tool for collecting data from legal entities, organizational entities without legal status and natural persons conducting economic activity. It allows for registration of data by the reporting entity, data validation and correction (both by the reporting person and by the statistician), monitoring of the report's completion, and communication between the reporting person and the statistician. Notifications of approaching submission deadlines are sent from the Reporting Portal in electronic form, and reminders are sent after the deadline has expired to those units which have failed to submit or complete their reports, in order to eliminate nonresponse.

4.

The Reporting Portal was implemented in 2007 and has been systematically extended and improved. Currently all surveys of business entities of the national economy are carried out with the use of electronic forms. Only units with fewer than 5 employees may complete and send paper reports.

5.

For individuals randomly selected to participate in a survey, the CAII method has been adopted: these persons answer the survey questions using an application available online. The CAII method (alongside the CATI and CAPI methods) has already been implemented in agricultural surveys, while in social surveys its introduction is a longer-term process and still has a pilot character. Significant effects of introducing the CAII method are expected within a few years; nevertheless, progress in this field is already perceptible and is reducing the workload of the statistical interviewers.

Computer Assisted Telephone Interview (CATI)

6.

The CATI method has been implemented in all agricultural surveys and in most social surveys. It is carried out by telephone interviewers employed by statistical units. Telephone interviewers are provided with professional equipment, and their workstations are located in separate call centre studios. The organization of telephone contact campaigns varies, depending on the specific nature of the survey conducted. In agricultural surveys, all designated farms are subject to telephone contact; this is the second channel of data collection, following the CAII method. Data collection is divided into two sub-stages. In the first, nationwide sub-stage, telephone interviewers call respondents from all over the country. In the second, so-called provincial sub-stage, telephone interviewers from a given province call only the respondents from that province. In social surveys, depending on the sample size and the predefined criteria, one or more provinces are chosen to gather information from all over the country.

Regardless of the type of telephone survey, a helpline service is provided in addition to data collection. It is available to respondents who choose the CAII method, in order to answer their questions and clarify their doubts.

Computer Assisted Personal Interview (CAPI)

7.

The CAPI method has been fully implemented as the third channel of data collection in agricultural surveys, used when a complete set of data has not been obtained via the CAII and CATI channels. In social surveys, statistical interviewers conduct direct interviews in households where this is required by the adopted methodology or whose members have not consented to a telephone survey. In the household budget survey, in which the CAII method is currently being implemented on a pilot basis, the interviewers can remotely preview the form completed independently by respondents. As a result, they can actively assist the respondents and quickly clarify possible mistakes. All statistical interviewers are equipped with tablets prepared for surveying with an electronic form, checking data and sending correct data to the server, while meeting all data security requirements.

Statistical Interviewers network

8.

In Poland the statistical interviewers' network is organised within official statistics. The network includes field interviewers and telephone interviewers. The size of the network is planned annually so as to ensure the execution of the surveys planned for a particular year. Only when the number of farms to be surveyed is very high (the agricultural farms survey executed every 3 years by order of, and according to the criteria of, Eurostat) is the team of statistical interviewers supplemented with additional, external people employed for the duration of the survey. Censuses (population and housing as well as agriculture), during which the number of entities to survey exceeds the operating capacity of the statistical interviewers who conduct statutory surveys, require hiring external census enumerators. Because the CATI and CAPI methods use form applications equipped with auxiliary functions, such as hints, dictionaries, classifications and error signalling, the statistical interviewers can conduct interviews in various thematic areas.

CORstat system

9.

Collecting data through different channels and by various groups of interviewers requires a management system that controls the flow of surveys between channels and between interviewers and telephone interviewers, to avoid duplicate contacts or the omission of respondents. In Poland this system operates under the name CORstat. It enables not only a reasonable allocation of tasks to statistical interviewers but also monitoring and control of their work, monitoring of the course of the survey (including the possibility of remedial action in case of any problems) and the creation of reports according to stated parameters. It is a complex system with defined functionalities, but it can be expanded further if necessary. Currently, a system module is being created that automates the analysis of interviewers' workload in order to calculate the number of interviewers necessary to carry out the tasks.
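The flow control described above can be illustrated with a minimal sketch. It is not the CORstat implementation: the names `Case`, `escalate` and `workload` are hypothetical, and the only thing taken from the text is the order of channels (CAII, then CATI, then CAPI) and the rule that a case is active in exactly one channel at a time, so that a respondent is neither contacted twice nor omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Channel order taken from the text: CAII first, then CATI, then CAPI.
CHANNELS = ["CAII", "CATI", "CAPI"]


@dataclass
class Case:
    """One surveyed unit (hypothetical structure, for illustration only)."""
    unit_id: str
    channel: str = "CAII"   # exactly one active channel per case
    completed: bool = False


def escalate(case: Case) -> Optional[str]:
    """Move an incomplete case to the next channel and return that channel.

    Returns None when the case is complete or is already in the last channel.
    """
    if case.completed:
        return None
    position = CHANNELS.index(case.channel)
    if position + 1 >= len(CHANNELS):
        return None
    case.channel = CHANNELS[position + 1]
    return case.channel


def workload(cases: List[Case]) -> Dict[str, int]:
    """Count open cases per channel, e.g. as input for planning interviewer numbers."""
    counts = {name: 0 for name in CHANNELS}
    for case in cases:
        if not case.completed:
            counts[case.channel] += 1
    return counts


if __name__ == "__main__":
    cases = [Case("farm-001"), Case("farm-002", completed=True), Case("farm-003", channel="CATI")]
    for case in cases:
        escalate(case)
    print(workload(cases))
```

Keeping a single `channel` field per case, rather than one flag per channel, is what rules out a unit being open in two channels at once.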

A geographic information system (GIS) as a tool to support, monitor and control the work of field interviewers

10.

In the 2010 census round in Poland, a combination of data from administrative sources and registers containing spatial data was used for the first time. The application of digital maps and GPS technology brought a revolutionary change in the possibilities of planning and managing census operations, both prior to and during the census.

11.

For that purpose, the data obtained from the State geodetic and cartographic resources, as well as orthophotomaps (processed aerial photographs), were used. With the use of the materials obtained, both from geodetic and statistical resources, it was possible to develop sampling frames for censuses, comprising statistical address points and their spatial reference.

12.

The digital maps based on the GIS technology were used during:

(a) gminas update,

(b) pre-census field check,

(c) census survey.

13.

Digital maps were an indispensable tool facilitating the work of census enumerators (moving around the area, verifying the sampling frame, etc.), gmina leaders, and voivodship and central dispatchers. Using a dispatching application, or the GIS application supporting the work of a gmina leader, they could verify on a map the progress of the census and, for example, the route or location of an enumerator in the field.

14.

Census enumerators had mobile terminals which were equipped with the GIS application. It enabled revisions and showed on the map, among other things, the current location of the census enumerator (GPS) and address points assigned to him. The GIS application was also actively applied during the census – to manage its course. It enabled the monitoring and control of the enumerator’s work, as well as the tracing of his movement in the field (among other things, to ensure his safety).

15.

Today, thanks to the experience gained in using digital maps, enumerators performing field work can be managed with GIS tools. The GPS-equipped devices showed online, on the orthophotomaps, the current position of a given enumerator and the address point to which they were to go in order to carry out the survey.

Model of transformation of administrative data to statistical data

16.

In Poland, an Internet application is used to collect administrative data for statistical purposes. It transfers data from the keepers of administrative data sets to the statistical system via an encrypted, secure channel. The application checks the correctness of the structure of the transferred XML file. If it identifies errors, the keeper of the data receives a message giving the line numbers of the errors. Before data from administrative registers are used for statistical purposes, they are transformed and quality indicators are calculated so that the source data can be assessed.
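A structure check that reports line numbers could look roughly like the sketch below. It is an illustration only, not the application described above: the record tag `record` and the required fields `id` and `address` are hypothetical, and the sketch uses the `xml.parsers.expat` module from the Python standard library, where a real system would more likely validate against an agreed schema.

```python
import xml.parsers.expat
from typing import List, Set, Tuple

RECORD_TAG = "record"                 # hypothetical element name
REQUIRED_FIELDS = {"id", "address"}   # hypothetical required child elements


def check_structure(xml_text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs; an empty list means no errors were found."""
    errors: List[Tuple[int, str]] = []
    stack: List[str] = []
    record_line = 0
    seen: Set[str] = set()

    def start(name, attrs):
        nonlocal record_line, seen
        if name == RECORD_TAG:
            # Remember where the record starts so errors can point at it.
            record_line = parser.CurrentLineNumber
            seen = set()
        elif stack and stack[-1] == RECORD_TAG:
            seen.add(name)
        stack.append(name)

    def end(name):
        stack.pop()
        if name == RECORD_TAG:
            for field in sorted(REQUIRED_FIELDS - seen):
                errors.append((record_line, f"missing field <{field}>"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as exc:
        message = xml.parsers.expat.ErrorString(exc.code)
        errors.append((exc.lineno, f"not well-formed: {message}"))
    return errors


if __name__ == "__main__":
    sample = "<data>\n<record>\n<id>2</id>\n</record>\n</data>"
    for line, message in check_structure(sample):
        print(f"line {line}: {message}")
```

The returned list of line numbers and messages corresponds to the kind of feedback the keeper of the data would receive.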

Fig.1 ETL process scheme

17.

The transformation of data sets from administrative systems into statistical data sets aims to improve the quality of the data sets and to increase their usefulness in statistics. The data obtained can have various formats, and they need to be consolidated and converted into a format suitable for processing: SAS tables. It is very important to check the correctness of the data and their structures. The data are then loaded into a protected IT environment specially designed for this purpose (the Operational Microdata Base, OMB), where data transformation is carried out. Administrative data sets with unit data are converted in terms of identification and address characteristics. Data transformation means a series of activities in the production environment consisting of: profiling (the creation of a report on data quality), data unification, parsing (separation) or combining of variables, standardization with schemes, conversion, validation, de-duplication and data integration. The basic transformation is performed according to a determined model and rules, on the basis of standard algorithms. In the case of non-standard requirements (relating to de-normalization), the algorithms may be suitably modified. The transformation is carried out once on a given set, for all further compilations.
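Three of the steps listed above (parsing a variable, standardization with a dictionary, and de-duplication) can be sketched as follows. The production environment described in the text works on SAS tables; this Python sketch only illustrates the logic, and the field names, the street-type dictionary and the address pattern are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Hypothetical dictionary for standardising street-type abbreviations.
STREET_TYPES = {"ul.": "ulica", "al.": "aleja", "pl.": "plac"}


def parse_address(raw: str) -> Dict[str, str]:
    """Parsing step: split a free-text address into street and house number."""
    text = " ".join(raw.split())  # collapse repeated whitespace
    match = re.match(r"^(?P<street>.+?)\s+(?P<number>\d+\w*)$", text)
    if match is None:
        return {"street": text, "number": ""}
    return {"street": match.group("street"), "number": match.group("number")}


def standardise_street(street: str) -> str:
    """Standardization step: lower-case and expand known abbreviations."""
    return " ".join(STREET_TYPES.get(word, word) for word in street.lower().split())


def transform(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Parse, standardise and de-duplicate records, keeping the first occurrence."""
    seen = set()
    result = []
    for record in records:
        parts = parse_address(record["address"])
        clean = {
            "id": record["id"].strip(),
            "street": standardise_street(parts["street"]),
            "number": parts["number"].upper(),
        }
        key = (clean["id"], clean["street"], clean["number"])
        if key not in seen:  # de-duplication on the standardised values
            seen.add(key)
            result.append(clean)
    return result


if __name__ == "__main__":
    rows = [
        {"id": "1 ", "address": "ul.  Długa 5a"},
        {"id": "1", "address": "UL. Długa 5A"},
        {"id": "2", "address": "Rynek"},
    ]
    for row in transform(rows):
        print(row)
```

De-duplication is applied after standardization on purpose: the two spellings of the same address only become identical once both have been normalised.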

18.

Validation procedures and quality assessment of data from administrative sources include the validation and assessment of address data and reference numbers, and of compliance with the applicable standards and dictionaries. The next validation phase entails substantive validation and processing carried out according to set algorithms. The data quality indicators for the phase of converting sets from administrative sources are standardized and uniform for all data sets, and were prepared on the basis of the quality indicators applied in the Agricultural Census and the Population and Housing Census.

Processing data

19.

The “cleaned” data are loaded into the Operational Microdata Base (OMB) as successive logical layers corresponding to the registers obtained. For surveys carried out not only by CAxI methods but also with the use of data from administrative and non-administrative sources, it is crucial to gather all the data in one database. The basic data structure in the OMB is the layer: a set of records, each of which relates to one unit (a person, a dwelling, a household). The records include the values of attributes derived from source data, collected by CAxI methods or defined in a different way (e.g. in the process of imputation).

Fig.2. Golden record Generation

OMB layers

20.

In the first step of processing, a layer referred to as the Master Record is created on the basis of the source sets and the frame; it consists of the initial values of a selected subset of attributes. The values from this layer are transferred to the CAPI, CATI and CAWI processes for the personalization of electronic questionnaires. After data have been collected in the census period with the use of the CAxI processes, a corresponding layer is created in the database on the basis of the information collected. Layers already saved in the system can serve as the basis for creating a new, internal layer to which new attributes (derived from the existing ones) can be added. The last layer in the OMB (referred to as the Golden Record) is created by combining data from individual layers with data from forms and data obtained from registers and other sources. After depersonalization, the Golden Record is transferred to the Analytical Microdata Base for further processing.
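The combination of layers into a Golden Record can be pictured with the following minimal sketch. The text does not state the rules by which values from different layers are chosen, so the precedence rule used here (the first layer in the list that holds a non-missing value wins) is an assumption, as are the function names and the attribute names in the example.

```python
from typing import Dict, List, Optional, Set

# A layer maps a unit identifier to that unit's attribute values.
Layer = Dict[str, Dict[str, Optional[str]]]


def build_golden_record(layers: List[Layer]) -> Layer:
    """Combine layers; earlier layers in the list take precedence (an assumed rule)."""
    golden: Layer = {}
    for layer in layers:
        for unit_id, attributes in layer.items():
            record = golden.setdefault(unit_id, {})
            for name, value in attributes.items():
                # Fill an attribute only if no earlier layer supplied a value.
                if value is not None and record.get(name) is None:
                    record[name] = value
    return golden


def depersonalise(golden: Layer, identifying: Set[str]) -> Layer:
    """Drop the attributes listed as identifying; the unit key is left untouched here."""
    return {
        unit_id: {name: value for name, value in record.items() if name not in identifying}
        for unit_id, record in golden.items()
    }


if __name__ == "__main__":
    forms = {"u1": {"name": "Jan", "age": "40", "dwelling": None}}
    register = {"u1": {"age": "41", "dwelling": "flat"}, "u2": {"age": "30"}}
    golden = build_golden_record([forms, register])
    print(depersonalise(golden, {"name"}))
```

In this example the form layer is listed first, so its value for `age` is kept, while the missing `dwelling` value is filled from the register layer.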

21.

The role of the Analytical Microdata Base (AMB) is to store depersonalised census data in their final form. All types of statistical analysis are carried out on this dataset to obtain results for publication, i.e. the census products. The AMB allows all recipients of statistical information to quickly acquire data in the form of aggregates. The AMB system constitutes an analytical and reporting platform that currently enables the statistical processing of the results of the National Population and Housing Census 2011. The results of analyses, in the form of documents, reports and breakdowns, are shared with internal and external users.

22.

The following processes took place in the Analytical Microdata Base: data integration, validation, automatic correction, imputation, calibration, and the creation of new secondary variables and new statistical units (i.e. families and households). ETL processes in the OMB and AMB were executed repeatedly until the results were approved by the methodologists.

23.

In the next step, the processes creating multidimensional objects (OLAP cubes) for the national and international levels were carried out. The AMB also makes it possible to calculate the aggregates available in the Geostatistics Portal as digital maps (cartograms and cartodiagrams).
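The idea of deriving aggregates along chosen dimensions, which underlies both the cubes and the map aggregates mentioned above, can be shown with a very small sketch. It is an illustration only and does not reflect the actual OLAP tooling; the function name `aggregate` and the attributes `voivodship` and `sex` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def aggregate(records: Iterable[Dict[str, str]], dimensions: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Count depersonalised unit records for each combination of dimension values."""
    counts: Counter = Counter()
    for record in records:
        counts[tuple(record[name] for name in dimensions)] += 1
    return dict(counts)


if __name__ == "__main__":
    microdata = [
        {"voivodship": "A", "sex": "F"},
        {"voivodship": "A", "sex": "M"},
        {"voivodship": "A", "sex": "F"},
        {"voivodship": "B", "sex": "M"},
    ]
    # One cell per (voivodship, sex) combination.
    print(aggregate(microdata, ["voivodship", "sex"]))
    # Aggregating on a single dimension gives totals per area, e.g. for a cartogram.
    print(aggregate(microdata, ["voivodship"]))
```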

24.

The Metainformation Subsystem gathered metainformation describing data and census processes, including the processes indispensable for drawing up quality reports. The task of the Metainformation Subsystem was to ensure the coherent definition of statistical objects for the OMB and AMB. The Metainformation Subsystem was also used to store depersonalised operational metadata of the OMB and AMB systems. This Subsystem constitutes the Central Metadata Repository (CMR).

25.

The Metainformation Subsystem gathered the following metadata:

• methodological,

• operational,

• systems,

• definitional,

• quality.

26.

At present, intensive work on the implementation of modern methods of data collection is under way.
