New Data Sources for Egyptian Statistics: Opportunities and Challenges

Nany Abd El-kader
Statistician, CAPMAS, Egypt
nany_elfar@hotmail.com

Abstract

All countries depend on traditional sources, such as surveys and administrative sources, to produce their statistics, whether general or official. Because these methods face obstacles such as time and cost, some countries have begun to overcome these problems by turning to new sources of data. Recently a new trend has emerged toward data sources such as the Internet, which the Netherlands has used as a new source of information for the production of official statistics. This paper sheds light on the possibility of adopting the Internet as a new source of data in Egypt for compiling the consumer price index (CPI), as well as on the opportunities and challenges involved.

Keywords: big data, official statistics, CPI

1- Introduction

There are many reasons to discuss the internet as a new data source for statistics. Two stand out: the internet is now considered a main and fast communication channel, and sample surveys place a response burden on respondents. The sample survey is one of the traditional methods used for producing official statistics. Because of the obstacles in the traditional methods, new data sources such as the internet have emerged. As more data are produced by an increasing number of electronic devices, big data and official statistics have become an emerging topic in international statistical discussions. This paper therefore focuses on the Netherlands’ experiments in using the internet as a data source for compiling the Consumer Price Index, and asks whether Egypt can benefit from this experience.
The literature points to many benefits and good opportunities from using the Internet as a new source of data, both for statistical work in general and for official statistics in particular; see Heerschap (2013). This paper first presents the classification of data sources: primary traditional data sources, secondary traditional data sources and secondary new data sources. It then discusses official statistics. The following sections present the experiment of the Netherlands and the Internet as a new data source in Egypt. Finally, the opportunities and challenges and the summary and conclusions are presented.

2- Sources of Data

According to the types of data sources described in many references, data sources fall into three classes:

1- Primary traditional data source: this is the sample survey, which depends on censuses, used to collect data on all kinds of social and economic phenomena.
2- Secondary traditional data source: this is the administrative source; “administrative sources are used whenever possible to avoid duplicating requests for information”, Laux et al. (2009). National statistical authorities depend on surveys and administrative sources to compile their statistics, and sometimes they combine these two types of source.
3- Secondary new data sources: these are the electronic sources of data used for the production of statistics. Daas et al. (2011) give four examples of electronic sources: i) product prices on the internet, ii) mobile phone location data, iii) Twitter text messages, iv) Global Positioning System (GPS) data and traffic loop information.

3- Official Statistics

Nordbotten (2008) describes official statistics as follows: “Almost every country in the world has one or more government agencies (usually national institutes) that supply decision-makers and other users including the general public and the research community with a continuing flow of information. This bulk of data is usually called official statistics. Official statistics should be objective and easily accessible and produced on a continuing basis so that measurement of change is possible”.

In ancient times, Egyptian pharaohs carried out censuses as a source of data for many purposes: to gather taxes, to determine fitness for military and labor service, and so on. The statistics produced from these censuses are known as official statistics. More recently, a major part of official statistical products has come from administrative data. During the last three decades, statistical institutes and statistical offices have gradually been replacing survey data with administrative data. This reflects the wish to decrease the response burden on data providers and the desire to produce statistics of sufficient quality in a cost-efficient way.

Because official statistics should be objective, easily accessible and produced on a continuing basis so that measurement of change is possible, the effective use of these data now requires the development of new electronic methods for transferring and editing data, as well as for the estimation and quality evaluation of statistical products. We also need to know how to deal with big data. Big data is characterized as data sets of huge volume, velocity and variety that are often largely unstructured, i.e., they have no pre-defined data model and/or do not fit well into conventional relational databases. Big data is also potentially very interesting as an input for official statistics. It can be used as a source of data on its own, or in combination with more traditional data sources such as sample surveys.
However, harvesting the information from big data and incorporating it into a statistical production process is not easy.

4- The Experiment of the Netherlands

In this section the researcher sheds light on the Netherlands' experiment in using the internet as a data source for statistics, and in particular for compiling the Consumer Price Index (CPI) as an official statistic. The national statistical institute of the Netherlands has used traditional data sources, such as sample surveys and administrative sources. Generally, the production of official statistics faces major challenges. There is a need to satisfy an increasing demand from users for more and better statistics, including the faster measurement of new phenomena. The users' requests are formulated in an environment that requires the response burden placed on businesses and citizens to be limited. Response rates, especially to household surveys, are declining. In the future, official statistics will be confronted by a situation that will enforce a systematic increase in productivity and efficiency.

Because shopping, travel arrangements, hotel and restaurant reservations and other services are all increasingly done online, more electronic sources of information are available, and these can potentially be used for the production of statistics. Here, we focus on collecting the data used to compile the Consumer Price Index (CPI) as an important official statistic.

At Statistics Netherlands, the CPI department gathers data from the web on articles such as airline fares, copies them, and stores them in a local database. Special protocols have also been developed per CPI item to ensure that statisticians collect the data using a consistent and structured statistical procedure. There are two ways to collect data:

1. Collection of data from internet sites with many similar items, e.g. prices and characteristics of sets of electronic devices in consumer electronics web shops.
This is called automated data collection.
2. Collection of data from internet data sources with only a few items, e.g. the price of a cinema ticket from a cinema website. This is called robot-assisted data collection.

The difference between these methods is that the automated method runs without user interaction.

Statistics Netherlands started experimenting with automated collection of price data from online retailers and showed that it was feasible to calculate price statistics from internet data for the consumer price statistics in 2009 and 2010. Although these experiments were successful in a technical sense, doubts remained in three areas: cost efficiency, methodology and legal aspects. The experiments concluded that:

1. On cost efficiency, it was questioned whether collection by robots would actually be cheaper than manual collection, since a robot would have to be maintained technically for every small website change, which could become expensive.
2. On methodology, it was recognised that processing huge volumes of internet data would have an impact on the required statistical methodology.
3. On the legal framework, it was concluded that further legal advice was required.

In addition, three major problems were observed with the scripts and tools during the data collection period: the interaction with dynamic web pages (i.e. web pages generated by a web application), the occasional change of websites, and the response time of the websites (delays in requests).

5- Internet as a New Data Source in Egypt

The researcher advises using the automated data collection method in Egypt to gather the data used to compile the CPI. Automated data collection can yield more detailed data than data collected in traditional ways, which will improve efficiency and reduce response burden.
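
The automated collection described here (gather prices from a web page, copy them, and store them in a local database) can be sketched as follows. This is a minimal illustration in Python, not the tool used by Statistics Netherlands: the HTML fragment, the `name` and `price` class names, the `prices` table and all function names are assumptions made for the example. It parses an in-memory page rather than a live website, and it raises an error when no prices are found, as one simple way to notice an occasional website change.

```python
import sqlite3
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a web shop's product listing.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Phone A</span><span class="price">4,999.00</span></li>
  <li class="product"><span class="name">Tablet B</span><span class="price">7,250.50</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self.items = []      # parsed (name, price) tuples
        self._field = None   # which span we are currently inside, if any
        self._name = None    # last product name seen, waiting for its price

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price" and self._name is not None:
            # Remove thousands separators before converting to a number.
            self.items.append((self._name, float(data.strip().replace(",", ""))))
            self._name = None


def store_prices(conn, source, items, observed_at=None):
    """Append price observations, with a timestamp, to a local database table."""
    observed_at = observed_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices "
        "(source TEXT, item TEXT, price REAL, observed_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?, ?)",
        [(source, name, price, observed_at) for name, price in items],
    )
    conn.commit()


def collect(html, conn, source):
    """Parse one page and store its prices; fail loudly if nothing was parsed."""
    parser = PriceParser()
    parser.feed(html)
    if not parser.items:
        # An empty result usually means the page layout changed.
        raise ValueError(f"no prices parsed from {source}; page layout may have changed")
    store_prices(conn, source, parser.items)
    return parser.items


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # a file path would give a persistent store
    print(collect(SAMPLE_HTML, connection, "example-shop"))
```

Because every observation is stored with its timestamp, the same script can be scheduled daily or hourly. A real robot would additionally need to fetch pages over the network, respect each site's terms of use, and handle dynamic pages and slow responses, none of which is shown in this sketch.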
As stated in the Netherlands' experiment, this kind of collection method may also be used to study phenomena in a completely new way.

6- Opportunities and Challenges

Using the internet as a data source may provide many opportunities:

1. It makes it easier to observe data patterns between the Egyptian statistical office and other parties than traditional data collection does.
2. Prices can be observed more frequently: daily, or even hourly if desired.
3. It can be a more efficient way to collect data that would otherwise be costly to collect or would involve response burden.
4. More sources of information become available.

Using the internet as a data source may also face many challenges:

1. Processing and analysing big data.
2. Dealing with big data, including the emergence of new ways to visualise data.
3. Developing methods to quickly uncover information from the massive amounts of data available.
4. Developing methods capable of integrating the information into the statistical process.
5. Visualisation methods.

7- Summary and Conclusions

This paper sheds light on:

1. The possibility of adopting the Internet as a new source of data.
2. The Netherlands’ experiments, which used a new source of data for compiling the CPI.
3. The three types of data sources.
4. The literature on producing official statistics, especially in Egypt.
5. The application of new data sources in Egypt.

8- References

[1] Bosch, O. and Windmeijer, D. (2014). “On The Use Of Internet Robots For Official Statistics”. Meeting on the Management of Statistical Information Systems, Working Paper.
[2] Daas, P. and Loo, M. (2013). “Big Data (and official statistics)”. Meeting on the Management of Statistical Information Systems, Working Paper.
[3] Daas, P., Puts, M., Buelens, B. and Hurk, P. “Big Data as a data source for official statistics”. Big Data Target Conference, April 4, Groningen. http://omegate.astro.rug.nl/~target_conference/presentations/Splinter5B/Piet_Daas.pdf
[4] Heerschap, N. (2013). “Internet as a new source of information for the production of official statistics. Experiences of Statistics Netherlands”. 59th ISI World Statistics Congress, p. 1431-1436.
[5] Laux, R., Baigorri, A. and Radermacher, W. (2009). “Building Confidence in the Use of Administrative Data for Statistical Purposes”. 57th Session of the International Statistical Institute.