Bridging economic statistics with people:
A role for alternative sources of data?
Zeynep Orhun Girard
Statistician, ESCAP Statistics Division
IAOS, Danang, Viet Nam
9 October, 2014
DISCLAIMER: The views presented here are the author’s and do not necessarily reflect the views and position of the United Nations.
“No wind favors he
who has no destined
port”
Michel de Montaigne
“We can analyze the data without hypotheses
about what it might show. We can throw the
numbers into the biggest computing clusters the
world has ever seen and let statistical algorithms
find patterns science cannot. […] Correlation
supersedes causation, and science can advance
even without coherent models, unified theories,
or really any mechanistic explanation at all”.
Chris Anderson
Editor of Wired Magazine
For official statistics to extract value from
alternative sources of data such as big data, their use
1) has to be guided closely by statistical policy,
2) with the goal of filling actual methodological
and data gaps in different domains of statistics
Methodological/policy developments are
guiding economic statistics
Macroeconomic statistical frameworks are
constantly updated, e.g. SNA
1936
- Input-Output analysis
- First econometric model of business cycle and the General Theory
1947
- Report on measurement of national income and the construction of social accounts
1952
- SNA published
1968
- Allowed for national statistical policies, recommended IOT and constant prices
1993
- Introduced satellite accounts
- Some non-market production in production boundary
- Concept of employment introduced in the sub-sectoring of household sector
- Use of PPPs for international comparison
- Balance sheets and SAMs
2008
- Chapter on informal aspects of economy
3 key policy-related initiatives are shaping the
future of economic statistics
SSF Commission Report
• Five recommendations on material wellbeing
• Follow-up work on disparities in national accounts, distribution of Household Income, Consumption and Wealth (OECD)
G-20 Data Gaps Initiative
• Recommendations 15-20 on Sectoral and Other Financial and Economic Datasets
Post-2015 development agenda
• Data revolution for targeted policy making
• Measurements of progress on sustainable development that complement GDP (SDG 17)
We have witnessed a move towards an integrated approach to statistics and an emphasis on
the household perspective and the distributional aspects of economic activity
Big Data: 3 v’s yes but not only…
• Exhaustiveness in scope (n=all)
• Granularity
• Indexical in identification
• Relational
• Flexible in fields and scalable in size
Big data and economic statistics so far?
Data sources: Online search queries/web scraping
Substantive areas: Housing market, labour market, prices
Methodologies/results: Correlations and predictive modelling
$$V_{t \text{ or } t+1} = f\left(\mathrm{Proxy}_{t-1,\,t-2,\,\ldots},\ \mathrm{Value}_{t-1,\,t-2,\,\ldots}\right)$$
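A minimal sketch of the predictive form above, assuming f is linear and is fitted by ordinary least squares. The function names, the single lag, and the synthetic series are illustrative assumptions, not taken from the studies discussed here.

```python
import numpy as np


def build_lagged_design(value, proxy, n_lags=1):
    """Build (X, y): each row of X holds an intercept, the proxy at
    t-1..t-n_lags and the official value at t-1..t-n_lags; y is the value at t."""
    value = np.asarray(value, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    if value.shape != proxy.shape:
        raise ValueError("value and proxy must have the same length")
    if len(value) <= n_lags:
        raise ValueError("series too short for the requested number of lags")
    rows, targets = [], []
    for t in range(n_lags, len(value)):
        proxy_lags = proxy[t - n_lags:t][::-1]  # proxy at t-1, t-2, ...
        value_lags = value[t - n_lags:t][::-1]  # value at t-1, t-2, ...
        rows.append(np.concatenate(([1.0], proxy_lags, value_lags)))
        targets.append(value[t])
    return np.array(rows), np.array(targets)


def fit_lagged_model(value, proxy, n_lags=1):
    """Least-squares estimate of a linear f (an assumption of this sketch)."""
    X, y = build_lagged_design(value, proxy, n_lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_next(value, proxy, coef, n_lags=1):
    """One-step-ahead prediction V_{t+1} from the latest observed lags."""
    value = np.asarray(value, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    x = np.concatenate(([1.0], proxy[-n_lags:][::-1], value[-n_lags:][::-1]))
    return float(x @ coef)


# Synthetic, noise-free illustration (hypothetical data, not real statistics).
rng = np.random.default_rng(0)
proxy = rng.normal(size=60)            # stand-in for a search-volume index
value = np.zeros(60)                   # stand-in for an official series
value[0] = 1.0
for t in range(1, 60):
    value[t] = 0.5 + 0.8 * proxy[t - 1] + 0.3 * value[t - 1]

coef = fit_lagged_model(value, proxy)  # intercept, proxy lag, value lag
forecast = predict_next(value, proxy, coef)
```

With real data the fit would not be exact, and the number of lags would need to be chosen for the domain in question.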
Use of some big data sources for
economic statistics
1. Housing market (Google Trends)
– Bank of England: McLaren and Shanbhogue (2011)
– Wu and Brynjolfsson (2009)
2. Labour/employment market (Google Trends and Word Tracker)
– Bank of England: McLaren and Shanbhogue (2011)
– D’Amuri and Marcucci (2009)
– Askitas (2009)
– Ettredge et al. (2005) (Word Tracker)
3. Prices (Scraping and non-traditional enumeration)
– Billion Prices
– Premise (hybrid)
Common points of these studies
• Compare aggregate trends of online search data
against official/administrative statistics
• Emphasize correlation rather than causality
• Find that online search data can predict
observed trends within an appropriate lead
time (which depends on the individuals and the
area of economic statistics)
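One way to look for an "appropriate lead time" is to correlate the official series with the search-based proxy shifted back by 0, 1, 2, ... periods. The sketch below assumes both series share the same frequency. The function names and the synthetic data are hypothetical, and a high correlation here says nothing about causality.

```python
import numpy as np


def lagged_correlations(official, proxy, max_lead=6):
    """Correlation between official[t] and proxy[t - k] for k = 0..max_lead."""
    official = np.asarray(official, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    if official.shape != proxy.shape:
        raise ValueError("series must have the same length")
    n = len(proxy)
    result = {}
    for k in range(max_lead + 1):
        a = official[k:]       # official series from period k onwards
        b = proxy[:n - k]      # proxy observed k periods earlier
        result[k] = float(np.corrcoef(a, b)[0, 1])
    return result


def best_lead(official, proxy, max_lead=6):
    """Lead (in periods) with the largest absolute correlation."""
    corrs = lagged_correlations(official, proxy, max_lead)
    return max(corrs, key=lambda k: abs(corrs[k]))


# Synthetic illustration: the proxy leads the official series by two periods.
rng = np.random.default_rng(1)
proxy = rng.normal(size=200)
official = np.zeros(200)
official[2:] = proxy[:-2]
official += 0.1 * rng.normal(size=200)   # small measurement noise

lead = best_lead(official, proxy, max_lead=6)
```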
What can big data do for economic statistics?
Beyond correlations and predictive modelling:
1. Enhance quality and granularity of economic
statistics?
– Increase resolution and distributional information,
e.g. demographics and geographical location
2. Enhance availability of economic statistics?
– Example: Components of a household balance
sheet, e.g. consumer durables
Selecting the Main Source of Data
1. Define measurement objective based on policy question, e.g. distribution of wealth across different quintiles of households at provincial level
2. Identify approach based on statistical policy
3. Identify main data source based on FPOS and QAF (relevance, accuracy, timeliness, punctuality, accessibility, clarity, and comparability and consistency over time) + cost-efficiency
Existing dataset
Traditional Data Source (surveys, administrative records, registers)
Data requirement X
Alternative Data Source
Design new data collection
Big data set
Using big data for distributional aspect
Select dataset
Example
• Online search keywords, e.g. “insurance” and “repair/garage” for automobiles, yellow pages data for business address searches
• Test correlations with any existing official statistics/other data source, e.g. household surveys covering consumer durables
Select variable of disaggregation
Example
• Location, sex, age, etc.
• Test distribution of groups by demographic characteristics
• Population Census data and demographic distribution at the national and sub-national levels
• Household Income and Expenditure Data for the item in question, e.g. vehicle ownership and its distribution
Apply in analysis
Example
• Apply the distribution of vehicle ownership obtained through big data sources to macroeconomic aggregates
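The "apply in analysis" step can be sketched as a simple pro-rata allocation: shares derived from the big data source are compared with a benchmark distribution and then applied to a national aggregate. All names and figures below are hypothetical and purely illustrative. A real exercise would also need to adjust for coverage bias in the big data source.

```python
def to_shares(counts):
    """Convert group counts (e.g. vehicle-related searches by province) to shares."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {group: count / total for group, count in counts.items()}


def max_share_gap(shares, benchmark):
    """Largest absolute difference between two share distributions,
    e.g. big data shares versus household survey shares."""
    if set(shares) != set(benchmark):
        raise ValueError("both distributions must cover the same groups")
    return max(abs(shares[g] - benchmark[g]) for g in shares)


def allocate_aggregate(national_total, shares):
    """Distribute a macroeconomic aggregate across groups pro rata."""
    return {group: national_total * share for group, share in shares.items()}


# Hypothetical inputs: counts from a big data source and survey-based shares.
big_data_counts = {"Province A": 600, "Province B": 300, "Province C": 100}
survey_shares = {"Province A": 0.55, "Province B": 0.33, "Province C": 0.12}

shares = to_shares(big_data_counts)
gap = max_share_gap(shares, survey_shares)        # data confrontation check
allocation = allocate_aggregate(5000.0, shares)   # e.g. stock of vehicles, in millions
```

If the gap against the benchmark is large, the big data shares would need adjustment before being applied to the aggregate.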
Using big data for enhancing data availability
Select dataset
Example
• Value of vehicle owned through purchase and repair data, e.g. insurance databases
Process data
Example
• Gross up to national (if possible, sub-national) level figures
• Calculate depreciation
• Differentiate household enterprises
Apply in analysis
• In construction of balance sheets
• Memo item for national accounts
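The processing step can be sketched as follows. The sketch assumes a geometric depreciation rate and a single grossing-up factor taken from a vehicle register. The rate, the record layout, and all figures are hypothetical assumptions of this sketch, not values from any actual insurance database.

```python
def depreciated_value(purchase_value, age_years, annual_rate):
    """Current value under geometric depreciation (an assumed method)."""
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    return purchase_value * (1.0 - annual_rate) ** age_years


def gross_up(sample_total, sample_units, population_units):
    """Scale a sample total to the population using a simple ratio factor."""
    if sample_units <= 0:
        raise ValueError("sample_units must be positive")
    return sample_total * population_units / sample_units


# Hypothetical records: (purchase value, age in years, owned by household enterprise?)
records = [
    (10000.0, 0, False),
    (10000.0, 1, False),
    (20000.0, 2, False),
    (15000.0, 1, True),   # kept apart from household consumer durables
]
RATE = 0.2  # assumed annual depreciation rate

# Differentiate household enterprises from other households.
household = [r for r in records if not r[2]]
household_value = sum(depreciated_value(v, age, RATE) for v, age, _ in household)

# Gross up to the national level with a hypothetical register count of 300 vehicles.
national_value = gross_up(household_value, len(household), 300)
```

The resulting figure could then feed the consumer durables memo item or a household balance sheet, subject to validation against existing sources.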
Challenges: Big data in official statistics
• Shift from planned data collection activities
• Possible mismatch between what big data can
offer and what economic policy makers
need (comprehensiveness and comparability)
• Privacy of individuals and confidentiality of
data
• Lack of a code of conduct covering all
stakeholders (public and private)
Opportunities: Big data in official statistics
• In the current policy context, we need to
integrate different data sources
• Alternative sources of data can respond to such
needs (exhaustive, relational, flexible and
scalable)
• Maintaining TRUST of individuals is key
– “Fifty-four per cent of global consumers indicated
that they would be comfortable with the use of
information about them if they believed that the
uses would not embarrass them, damage their
interests, or otherwise harm them”
(BCG Global Consumer Sentiment Survey 2013)
Conclusions
1. Big data can complement official statistics:
a. Conduct research for innovative statistics
development;
b. Provide quality insights through data confrontation;
and
c. Enhance availability of data by closing data gaps.
2. Statistical policy and actual methodological and
data gaps need to guide big data research so that
its results are meaningful and usable
3. Big data has a potential role in bringing the
distributional and household aspects into
economic statistics
Next steps?
• Multiply the number of proposals embedded
in methodological and data needs
• Conduct studies with official and private
sources of data
Thank you. For comments/questions:
Zeynep Orhun Girard
orhun@un.org