Paper - IAOS 2014 Conference

New data sources for Egyptian statistics
Opportunities and Challenges
Nany Abd El-kader
Statistician, CAPMAS, Egypt
nany_elfar@hotmail.com
Abstract
Countries around the world rely on traditional sources, such as surveys and
administrative sources, to produce their statistics, whether general or official.
Because these methods face obstacles such as time and cost, some countries have begun
to address these problems by turning to new sources of data.
Recently, new data sources such as the Internet have emerged; the Netherlands has used
the Internet as a new source of information for the production of official statistics.
This paper sheds light on the possibility of adopting the Internet as a new source of
data in Egypt for compiling the consumer price index (CPI), as well as on the associated
opportunities and challenges.
Keywords: big data, official statistics, CPI
1- Introduction
There are several reasons to discuss the internet as a new data source for statistics.
Among them: the internet is now a main and fast communication channel, and sample
surveys place a response burden on respondents. The sample survey is one of the
traditional methods used for producing official statistics. Because of obstacles in the
traditional methods, new data sources such as the internet have emerged.
As more data are produced by a growing number of electronic devices, big data and
official statistics have become an emerging topic in the international statistical
community. This paper therefore focuses on the Netherlands’ experiments in using the
internet as a data source for compiling the Consumer Price Index, and asks whether
Egypt can benefit from this experience.
Some of the literature points to multiple benefits and good opportunities that arise
when the Internet is used as a new source of data for statistical work in general and
for official statistics in particular; see Heerschap (2013).
This paper first presents the classification of data sources into primary traditional
data sources, secondary traditional data sources and secondary new data sources. It
then discusses official statistics. The following sections present the experiment of
the Netherlands and the Internet as a new data source in Egypt. Finally, the
opportunities and challenges are discussed, followed by the summary and conclusions.
2- Sources of Data
According to the types of data sources described in many references, sources of data
fall into three classes:
1- Primary traditional data source: this is the sample survey, which depends
on censuses, to collect data on all kinds of social and economic phenomena.
2- Secondary traditional data source: this is the administrative source;
“administrative sources are used whenever possible to avoid duplicating
requests for information” (Laux et al., 2009).
National statistical authorities depend on surveys and administrative sources to
compile their statistics, and sometimes they combine these two types of source.
3- Secondary new data sources: these are the electronic sources of data
used for the production of statistics. Daas et al. (2011) identified four such
electronic sources:
i) Product prices on the internet,
ii) Mobile phone location data,
iii) Twitter text messages,
iv) Global Positioning System (GPS) data and traffic loop information.
3- Official Statistics
Nordbotten (2008) describes official statistics as follows: “Almost every country in the
world has one or more government agencies (usually national institutes) that supply
decision-makers and other users including the general public and the research
community with a continuing flow of information. This bulk of data is usually called
official statistics. Official statistics should be objective and easily accessible and
produced on a continuing basis so that measurement of change is possible”.
In ancient times, Egyptian pharaohs carried out censuses as a source of data for many
purposes: to gather taxes, to determine fitness for military and labor services, and so
on. Statistics produced from these censuses are known as official statistics. Recently,
a major part of official statistical products has come from administrative data. During
the last three decades, statistical institutes and statistical offices have gradually
been replacing survey data with administrative data. This is because of the wish to
decrease the response burden on data providers and the desire to produce statistics of
sufficient quality in a cost-efficient way.
Because official statistics should be objective, easily accessible and produced on a
continuing basis so that measurement of change is possible, effective use of these data
now requires the development of new electronic methods for the transfer and editing of
data, as well as for the estimation and quality evaluation of statistical products. We
also need to know how to deal with big data.
Big data is characterized as data sets of huge volume, velocity and variety and often
largely unstructured, i.e., it has no pre-defined data model and/or does not fit well into
conventional relational databases. Big data is also potentially very interesting as an
input for official statistics. It can be used as a source of data, or in combination with
more traditional data sources such as sample surveys. However, harvesting the
information from big data and incorporating it into a statistical production process is
not easy.
4- The experiment of Netherland
This section sheds light on the Netherlands' experiment in using the internet as a data
source for statistics, in particular for compiling the Consumer Price Index (CPI) as an
official statistic.
The national statistical institute of the Netherlands has used traditional data sources,
such as sample surveys and administrative sources. In general, the production of
official statistics faces major challenges:
- There is a need to satisfy an increasing demand from users for more and
better statistics, including the faster measurement of new phenomena.
- The users' requests are formulated in an environment requiring the response
burden placed on businesses and citizens to be limited.
- Response rates, especially to household surveys, are declining.
- In the future, official statistics will be confronted by a situation that will
enforce a systematic increase in productivity and efficiency.
Because shopping, travel arrangements, hotel and restaurant reservations and other
services are increasingly done online, more electronic sources of information are
available, and these sources can potentially be used for the production of statistics.
Here, we focus on collecting the data used to compile the Consumer Price Index (CPI) as
an important official statistic.
At Statistics Netherlands, the Consumer Price Index (CPI) department gathers data from
the web on articles such as airline fares, copies it and stores it in a local database.
Special protocols have been developed per CPI item to ensure that statisticians collect
the data using a consistent and structured statistical procedure.
There are two ways to collect data:
1. Collection of data from internet sites with many similar items, e.g. prices and
characteristics of sets of electronic devices in consumer electronics web
shops. This is called automated data collection.
2. Collection of data from internet data sources with only a few items, e.g. the
price of a cinema ticket from a cinema website. This is called robot-assisted data
collection.
The difference between these methods is that the automated method runs without
user interaction.
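The automated method can be illustrated with a minimal Python sketch. It is not the
tooling used by Statistics Netherlands; the page fragment, the attribute names
(data-name, data-price), the shop name, the date and the table layout are all
hypothetical, chosen only to show the copy-and-store step for a page with many similar
items.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a web shop listing.
SAMPLE_PAGE = """
<ul>
  <li class="product" data-name="Phone A" data-price="3499.00"></li>
  <li class="product" data-name="Phone B" data-price="5250.50"></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <li class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self.items.append((attrs["data-name"], float(attrs["data-price"])))


def collect_prices(page_html):
    """Extract all product prices from one page without user interaction."""
    parser = ProductParser()
    parser.feed(page_html)
    return parser.items


def store_prices(conn, shop, day, items):
    """Copy the collected observations into a local database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices "
        "(shop TEXT, day TEXT, item TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?, ?)",
        [(shop, day, name, price) for name, price in items],
    )
    conn.commit()


# In-memory database used here only to keep the example self-contained.
conn = sqlite3.connect(":memory:")
items = collect_prices(SAMPLE_PAGE)
store_prices(conn, "example-shop", "2014-01-15", items)
rows = conn.execute("SELECT item, price FROM prices ORDER BY item").fetchall()
```

In a real setting the page would be downloaded from the retailer's website and the
parser would have to follow that site's actual markup, which is why a protocol per CPI
item is needed.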
Statistics Netherlands started experimenting with automated collection of price data
from online retailers and, in 2009 and 2010, showed that it was feasible to calculate
price statistics from internet data for the consumer price statistics. Although these
experiments were successful in a technical sense, doubts remained about three aspects:
cost efficiency, methodology and legal aspects.
These experiments concluded that:
1. On cost efficiency, they wondered whether collection by robots would be
cheaper than manual collection, since a robot would have to be
maintained technically for every small website change, which could become
expensive.
2. On methodology, they recognised that processing huge volumes of
internet data would have an impact on the required statistical methodology.
3. On the legal framework, they concluded that further legal advice was
required.
In addition, three major problems were observed with the scripts and tools during the
data collection period: the interaction with dynamic web pages (i.e. web pages
generated by a web application), the occasional change of websites, and the response
time of websites (delays in requests).
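Two of these problems, website changes and slow responses, can be guarded against in a
collection script. The following Python sketch shows only one possible approach under
stated assumptions: the fetch function is supplied by the caller, and the names
fetch_with_retry, check_extraction and LayoutChanged, as well as the URL, are
hypothetical and not taken from the Dutch experiments.

```python
import time


class LayoutChanged(Exception):
    """Raised when a page yields fewer items than expected."""


def fetch_with_retry(fetch, url, attempts=3, wait_seconds=0.0):
    """Call fetch(url), retrying after a timeout with a growing pause."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TimeoutError as err:
            last_error = err
            time.sleep(wait_seconds * (attempt + 1))
    raise last_error


def check_extraction(items, minimum=1):
    """Flag a probable website change when too few items were extracted."""
    if len(items) < minimum:
        raise LayoutChanged(f"expected at least {minimum} items, got {len(items)}")
    return items


# Hypothetical fetch function that times out twice before answering.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("slow response")
    return '<li class="product"></li>'


page = fetch_with_retry(flaky_fetch, "https://shop.example/list")
```

A check such as check_extraction does not repair a robot after a website change; it
only makes the failure visible so that the script can be maintained.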
5- Internet as a new data source in Egypt
The researcher recommends using the automated data collection method in Egypt to gather
the data used to compile the CPI. Automated data collection can yield more detailed
data than traditional collection, which can improve efficiency and reduce response
burden. Also, as noted in the Netherlands' experiment, this kind of collection method
may be used to study phenomena in a completely new way.
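As a hedged illustration of how collected prices could feed an index, the sketch below
computes a Jevons elementary index (the geometric mean of price relatives over matched
items). The sources cited here do not state which formula would be used in Egypt; the
choice of Jevons, the function name and the sample prices are assumptions made for this
example.

```python
import math


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives for items priced in both periods.

    Both arguments map an item name to its price; the result is scaled so
    that the base period equals 100.
    """
    matched = sorted(set(base_prices) & set(current_prices))
    if not matched:
        raise ValueError("no items are priced in both periods")
    log_sum = sum(
        math.log(current_prices[item] / base_prices[item]) for item in matched
    )
    return 100.0 * math.exp(log_sum / len(matched))


# Hypothetical prices; "Phone C" has no base price and is left out.
base = {"Phone A": 100.0, "Phone B": 200.0}
current = {"Phone A": 110.0, "Phone B": 220.0, "Phone C": 50.0}
index = jevons_index(base, current)
```

Here both matched items rose by 10 percent, so the index is about 110. New items
without a base price are simply dropped in this sketch; a production method would need
an explicit rule for them.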
6- Opportunities and Challenges
Using the internet as a data source may provide many opportunities:
- It makes it easier for the Egyptian statistical office and others to observe
data patterns than is possible with traditional data collection.
- Prices can be observed more frequently: daily or even hourly, if desired.
- It can be a more efficient way to collect data that would otherwise be costly to
collect or would involve response burden.
- More sources of information become available.
Using the internet as a data source may also face many challenges:
- Processing and analysing big data.
- Dealing with big data; the emergence of new ways to visualise data.
- Developing methods to quickly uncover information from the massive amounts of
data available.
- Methods capable of integrating the information into the statistical process.
- Visualisation methods.
7- Summary and Conclusions
This paper sheds light on:
- The possibility of adopting the Internet as a new source of data.
- The Netherlands’ experiments, which used a new source of data for compiling the CPI.
- The three types of data sources.
- The literature on producing official statistics, especially in Egypt.
- The application of new data sources in Egypt.
8- References
[1] Bosch, O. and Windmeijer, D. (2014). “On the Use of Internet Robots for
Official Statistics”. Meeting on the Management of Statistical Information Systems,
Working Paper.
[2] Daas, P. and Loo, M. (2013). “Big Data (and official statistics)”. Meeting on the
Management of Statistical Information Systems, Working Paper.
[3] Daas, P., Puts, M., Buelens, B. and Hurk, P. “Big Data as a data source for
official statistics”. Big Data Target Conference, April 4, Groningen.
http://omegate.astro.rug.nl/~target_conference/presentations/Splinter5B/Piet_Daas.pdf
[4] Heerschap, N. (2013). “Internet as a new source of information for the production
of official statistics. Experiences of Statistics Netherlands”. 59th ISI World Statistics
Congress, p. 1431-1436.
[5] Laux, R., Baigorri, A. and Radermacher, W. (2009). “Building Confidence in the Use of
Administrative Data for Statistical Purposes”. 57th Session of the International Statistical
Institute.