Bridging economic statistics with people: A role for alternative sources of data?

Zeynep Orhun Girard
Statistician, ESCAP Statistics Division
IAOS, Danang, Viet Nam
9 October 2014

DISCLAIMER: The views presented here are the author’s and do not necessarily reflect the views and position of the United Nations.

“No wind favors he who has no destined port”
Michel de Montaigne

“We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns science cannot. […] Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all”.
Chris Anderson, Editor of Wired Magazine

For official statistics to extract value from alternative sources of data like Big data
1) It has to be guided closely by statistical policy
2) with the goal of filling actual methodological and data gaps in different domains of statistics

Methodological/policy developments are guiding economic statistics

Macroeconomic statistical frameworks are constantly updated, e.g.
SNA
Timeline: 1936, 1947, 1952, 1968, 1993, 2008
- Input-Output analysis
- First econometric model of business cycle and the General Theory
- Report on measurement of national income and the construction of social accounts
- SNA published
- Allowed for national statistical policies, recommended IOT and constant prices
- Introduced satellite accounts
- Some non-market production in production boundary
- Concept of employment introduced in the sub-sectoring of household sector
- Use of PPPs for international comparison
- Balance sheets and SAMs
- Chapter on informal aspects of economy

3 key policy-related initiatives are shaping the future of economic statistics

SSF Commission Report
• Five recommendations on material wellbeing
• Follow-up work on disparities in national accounts, distribution of Household Income, Consumption and Wealth (OECD)

G-20 Data Gaps Initiative
• Recommendations 15-20 on Sectoral and Other Financial and Economic Datasets

Post-2015 development agenda
• Data revolution for targeted policy making
• Measurement of progress on sustainable development that complements GDP (SDG 17)

We have witnessed a move towards an integrated approach to statistics and an emphasis on the household perspective and the distributional aspects of economic activity

Big Data: 3 v’s yes but not only…
• Exhaustiveness in scope (n=all)
• Granularity
• Indexical in identification
• Relational
• Flexible in fields and scalable in size

Big data and economic statistics so far?
Data sources: Online search queries/web scraping
Substantive areas: Housing market, labour market, prices
Methodologies/results: Correlations and predictive modelling
$V_{t\ \text{or}\ t+1} = f(\text{Proxy}_{t-1,\,t-2,\,\dots},\ \text{Value}_{t-1,\,t-2,\,\dots})$

Use of some big data sources for economic statistics
1. Housing market (Google Trends)
– Bank of England: McLaren and Shanbhogue (2011)
– Wu and Brynjolfsson (2009)
2. Labour/employment market (Google Trends and Word Tracker)
– Bank of England: McLaren and Shanbhogue (2011)
– D’Amuri and Marcucci (2009)
– Askitas (2009)
– Ettredge et al. (2005), using Word Tracker
3. Prices (Scraping and non-traditional enumeration)
– Billion Prices
– Premise (hybrid)

Common points of these studies
• Compare aggregate trends of online search data against official/administrative statistics
• Emphasize correlation rather than causality
• Find that online search data can predict observed trends within the appropriate lead time (which depends on the individuals and the area of economic statistics)

What can big data do for economic statistics?
Beyond correlations and predictive modelling:
1. Enhance quality and granularity of economic statistics?
– Increase resolution and distributional information, e.g. demographics and geographical location
2. Enhance availability of economic statistics?
– Example: Components of a household balance sheet, e.g. consumer durables

Selecting the Main Source of Data
Define measurement objective based on policy question, e.g. distribution of wealth across different quintiles of households at provincial level
Identify approach based on statistical policy
Identify main data source based on FPOS and QAF (Relevance, accuracy, timeliness, punctuality, accessibility, clarity, and comparability and consistency over time) + Cost-efficiency
[Diagram labels: Data requirement X; Traditional Data Source (surveys, administrative records, registers); Alternative Data Source; Existing dataset; Design new data collection; Big data set]

Using big data for distributional aspect

Select dataset
Example
• Online search keyword, e.g. “insurance” and “repair/garage” for automobiles, yellow pages data for business address searches
• Test correlations with any existing official statistics/other data source, e.g. household surveys covering consumer durables

Select variable of disaggregation
Example
• Location, sex, age, etc.
• Test distribution of groups by demographic characteristics
• Population Census data and demographic distribution at the national and sub-national levels
• Household Income and Expenditure Data for the item in question, e.g. vehicle ownership and its distribution

Apply in analysis
Example
• Use distribution of vehicle ownership obtained through big data sources on macroeconomic aggregates

Using big data for enhancing data availability

Select dataset
Example
• Value of vehicle owned through purchase and repair data, e.g. insurance databases

Process data
Example
• Blow up to national (if possible sub-national) level figures
• Calculate depreciation
• Differentiate household enterprises

Apply in analysis
• In construction of balance sheets
• Memo item for national accounts

Challenges: Big data in official statistics
• Shift from planned data collection activities
• Possible mismatch between what big data can offer and what economic policy makers need (comprehensiveness and comparability)
• Privacy of individuals and confidentiality of data
• Lack of a code of conduct covering all stakeholders (public and private)

Opportunities: Big data in official statistics
• In the policy context we live in, we need to integrate different data sources
• Alternative sources of data can respond to such needs (exhaustive, relational, flexible and scalable)
• Maintaining TRUST of individuals is key
– “Fifty-four per cent of global consumers indicated that they would be comfortable with the use of information about them if they believed that the uses would not embarrass them, damage their interests, or otherwise harm them” (BCG Global Consumer Sentiment Survey 2013)

Conclusions
1. Big data to complement official statistics
a. Conduct research for innovative statistics development;
b. Provide quality insights through data confrontation; and
c. Enhance availability of data by closing data gaps.
2. Statistical policy and actual methodological and data gaps need to guide big data research to allow for meaningful results that can be used
3. Big data has a potential role to bring in the distributional and household aspect to economic statistics

Next steps?
• Multiply the number of proposals embedded in methodological and data needs
• Conduct studies with official and private sources of data

Thanks and for comments/questions:
Zeynep Orhun Girard
orhun@un.org
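
Illustrative sketch (not part of the original slides)
The predictive form quoted for the existing studies, V_t = f(Proxy_{t-1}, ..., Value_{t-1}, ...), leaves f unspecified. The Python sketch below assumes the simplest case: a linear f with one lag of the official series and one lag of the online proxy, fitted by ordinary least squares. The function names, the single-lag choice, and the synthetic numbers are all assumptions made for illustration; none of them come from the studies cited above.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_lagged_model(value: Sequence[float], proxy: Sequence[float]) -> List[float]:
    """Fit V_t = a + b * V_{t-1} + c * Proxy_{t-1} by ordinary least squares.

    Assumption: f is linear and uses a single lag of each series.
    Returns the coefficients [a, b, c].
    """
    if len(value) != len(proxy) or len(value) < 4:
        raise ValueError("need two aligned series with at least 4 observations")
    rows = [[1.0, value[t - 1], proxy[t - 1]] for t in range(1, len(value))]
    targets = [value[t] for t in range(1, len(value))]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def forecast_next(value: Sequence[float], proxy: Sequence[float],
                  coef: Sequence[float]) -> float:
    """One-step-ahead forecast from the latest observed value and proxy."""
    a, b, c = coef
    return a + b * value[-1] + c * proxy[-1]


if __name__ == "__main__":
    # Synthetic series, invented purely to exercise the functions.
    search_index = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0]
    official = [10.0]
    for t in range(1, len(search_index)):
        official.append(1.0 + 0.5 * official[t - 1] + 2.0 * search_index[t - 1])
    coefficients = fit_lagged_model(official, search_index)
    print("coefficients:", coefficients)
    print("next-period forecast:", forecast_next(official, search_index, coefficients))
```

In practice one would compare this against a baseline that uses only the lagged official series; the proxy adds value only if it improves out-of-sample forecasts, which fits the deck's point that such models establish correlation, not causality.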