The use of Google Search data
for macro-economic nowcasting
Per Nymand-Andersen,
European Central Bank
CCSA Special session on showcasing big data
ESCAP Headquarters, Bangkok, Thailand
Agenda
1. Reflections on “big data” for policy purposes
2. Showcasing “big data” for macro-economic purposes
3. Preliminary lessons and way forward
1. Reflections on “big data” for policy purposes
“Big data are a source of information and intelligence that have been gathered
from a recorded action or from a combination of records”
For example:
• records of supermarket purchases (Walmart tracks > 1 million transactions/hour)
• robot and sensor information in production processes
• road tolls, train, ship, aeroplane, mobile tracking devices, navigation systems
• telephone operators and satellite sensors, electronic images
• behaviour, event-driven and opinion-gathering data from search engines and social media
(Twitter, blogs, text messages, Facebook, LinkedIn)
• speech and word recognition
• credit and debit payments, trading and settlement platforms
The list seems endless as more and more information becomes public and digital
The term “big data” – a large variety of interpretation*
While some institutions may consider single sourced data, such as granular
“administrative data” (business registers) or “micro information data”
(security-by-security datasets), as “big data”, others may take a more holistic
approach of complexity of combining size, formats and sources, mainly focussed
on non public/private sources.
• Big data is not just about large data sets.
The 4 Vs (IBM) relate to Volume, Velocity, Variety and Veracity:
• Volume: scale of data
• Velocity: analysis of streaming data
• Variety: different forms of data
• Veracity: uncertainty of data
* “Big data – The hunt for timely insights and decision certainty. Central banking reflections on the use of big data for policy purposes”,
P. Nymand-Andersen, IFC publication (2015).
2. Showcasing “big data” for macro-economic purposes
• Since 2008, a new and growing field of experimental nowcasting, mainly of
consumption and selected macro-economic indicators
[Figure: Macro-economic topic and number of releases (bar chart, scale 0 to 12). Topics shown: Unemployment, Stock market, House market, Predict sales/consumption, Travel, Consumer sentiment, Inflation, State of economy, Detect influenza.]
“Predicting the euro area unemployment rate using Google data: central banks’ interest in and use of big data”,
Nymand-Andersen, P. & Koivupalo, H., forthcoming publication (2015).
Authors: area of macro-economic topic
• Hal Varian & Choi (2009, 2011, 2013): unemployment rate, retail sales, home sales, travel/tourism, car sales, consumer confidence
• Zimmermann K & Askitas N (2009): DE unemployment rate
• D’Amuri F & Marcucci J (2010, 2013): US unemployment rate
• McLaren N & Shanbhogue R (2011): UK unemployment rate & housing market trends
• Vosen & Schmidt (2011): DE private consumption
• Carriere-Swallow (2011): car purchases in Chile
• Guzmán G (2011): inflation
• Fantazzini D & Toktamysova Z (2014): German car sales
• Morgan J, et al. (2015): DE, FR, IT, ES, NL unemployment rates
• How to use Google search data to nowcast euro area unemployment
Eurostat’s euro area 13 and 19 unemployment rates
• testing using two periods: 2011–2012 & 2012–2014
• Dataset: Google search data (Google search engine)
• uses Google’s taxonomy for categorising search terms, which includes 26 main categories and
269 sub-categories (Finance and Banking)
• Google search data is an index of weekly volume changes
• The volumes are normalised to start at 1.00; each subsequent weekly value shows the relative
change in Google searches within the category (no absolute volumes)
• Data from 14 countries: Austria, Belgium, Denmark, France, Germany, Ireland, Italy, Netherlands,
Portugal, Spain, Sweden, Slovenia, United Kingdom, USA
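The index construction described above can be illustrated with a minimal sketch. It assumes that each weekly value is expressed relative to the first week (the slides do not say whether the change is measured against the first or the previous week), and that weekly values are averaged within calendar months so they can sit alongside a monthly unemployment rate. The aggregation step, the function names and all figures are hypothetical.

```python
from collections import defaultdict
from datetime import date

def normalise(volumes):
    """Rescale raw weekly volumes so the first week equals 1.00 (assumed base)."""
    base = volumes[0]
    return [v / base for v in volumes]

def monthly_average(week_dates, index):
    """Average weekly index values by the (year, month) of each week's date."""
    buckets = defaultdict(list)
    for d, value in zip(week_dates, index):
        buckets[(d.year, d.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}

# Made-up raw volumes for illustration; the published data contains only the index.
weeks = [date(2014, 1, 6), date(2014, 1, 13), date(2014, 1, 20),
         date(2014, 1, 27), date(2014, 2, 3), date(2014, 2, 10)]
raw = [200, 210, 190, 220, 230, 250]

index = normalise(raw)                   # first value is 1.0 by construction
monthly = monthly_average(weeks, index)  # keys are (year, month) tuples
print(monthly)
```

Averaging is only one possible way to move from weekly to monthly frequency; taking the last weekly value of the month would be an equally plausible choice.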
• Two autoregressive models are used to nowcast the euro area unemployment rate:
$$\log(y_t) = a + b \log(y_{t-1}) + c \log(y_{t-12}) + e_t$$
$$\log(y_t) = a + b \log(y_{t-1}) + c \log(y_{t-12}) + G + e_t$$
where $y_t$ is the unemployment rate in month $t$ and $G$ is the Google search index.
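As a rough illustration of how two such models could be estimated, the sketch below fits both by ordinary least squares using only the Python standard library. Several things here are assumptions rather than the authors' method: the second lag is read as month t-12, the Google index enters in levels with its own estimated coefficient, estimation is by OLS via the normal equations, and the data are synthetic. All function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def design(rates, google=None):
    """Rows [1, log y_{t-1}, log y_{t-12}, (G_t)] and targets log y_t for t >= 12."""
    X, y = [], []
    for t in range(12, len(rates)):
        row = [1.0, math.log(rates[t - 1]), math.log(rates[t - 12])]
        if google is not None:
            row.append(google[t])  # assumption: index in levels, own coefficient
        X.append(row)
        y.append(math.log(rates[t]))
    return X, y

def nowcast(beta, rates, google_now=None):
    """Nowcast of the rate for the month following the end of `rates`."""
    t = len(rates)
    z = beta[0] + beta[1] * math.log(rates[t - 1]) + beta[2] * math.log(rates[t - 12])
    if google_now is not None:
        z += beta[3] * google_now
    return math.exp(z)

# Synthetic monthly rates (%) and index values, for illustration only.
rates = [10.0 + 0.5 * math.sin(t / 3.0) + 0.02 * t for t in range(36)]
google = [1.0 + 0.1 * math.cos(t / 2.0) for t in range(37)]

X_base, y = design(rates)
X_goog, _ = design(rates, google)
beta_base = ols(X_base, y)
beta_goog = ols(X_goog, y)
print(nowcast(beta_base, rates), nowcast(beta_goog, rates, google[36]))
```

The appeal of the augmented model in a nowcasting setting is that the search index for the current month is available before the official unemployment figure, which is why `nowcast` takes the latest index value as a separate argument.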
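Mean absolute error (MAE) by forecast period
Unemployment rate – EA13
Forecast period | Base model | Base model with Google data | Errors reduced
Jan 2011–Dec 2012 | 1.97 | 1.61 | 18.1%
Nov 2012–Oct 2014 | 2.23 | 1.73 | 22.6%
Unemployment rate – EA18
Forecast period | Base model | Base model with Google data | Errors reduced
Jan 2011–Dec 2012 | 1.97 | 1.41 | 28.7%
Nov 2012–Oct 2014 | 2.02 | 1.57 | 22.2%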
Applying the mean absolute error (MAE)
• Preliminary results suggest that the naïve model including the Google data
performs better over both periods
• The improvement (reduction in the errors) ranges from 18.1% to 28.7%
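The comparison above rests on two simple calculations, sketched here for illustration: the MAE of a set of nowcasts, and the percentage reduction in MAE when moving from the base model to the model with Google data. The function names and the sample outturns are hypothetical. The last line assumes that the reported MAE of 1.97 for the base model pairs with 1.61 for the model with Google data.

```python
def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def error_reduction(mae_base, mae_augmented):
    """Percentage reduction in MAE relative to the base model."""
    return 100.0 * (mae_base - mae_augmented) / mae_base

# Made-up outturns and nowcasts, for illustration only.
actual = [11.8, 11.9, 12.0]
base = [11.5, 12.3, 11.6]
augmented = [11.7, 12.1, 11.8]

print(error_reduction(mae(actual, base), mae(actual, augmented)))

# Recomputed from the rounded MAEs on the slide (assumed pairing): roughly 18.3,
# close to the reported 18.1%; the gap is consistent with rounding of the MAEs.
paired = error_reduction(1.97, 1.61)
print(round(paired, 1))
```

Because the reduction is computed relative to the base model's error, it is comparable across periods and country groupings even when the absolute error levels differ.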
3. Preliminary lessons and way forward
Robustness, methodology and quality
• stability of search terms
• volatility in analytical results
• based on one search engine
• coverage, weights, normalisations
• aggregation methods
• price information
• short time series
• differ across regions
• no quality measurements
• no unit tracking
• rebasing and time lag
• home and host concept
Usability
• nowcasting of retail consumption and selected macro-economic indicators
• conjunctural analysis
• consumer behaviour
• price indexes
Availability
• public and free, easy to use
• one system for all countries
• comparability & timeliness
• large taxonomy of searches
Innovation
• trends in communications
• product loyalty
• advertisement
• social patterns in retail markets
• households & business surveys
• new ideas for statistical input are always met with a degree of scepticism
• simple, cheap and easy to put into statistics production
• creates dependencies, though always free in the start-up phase
• challenges the statistics communication function
• statisticians may need to explore private sources to meet increasing user
demand for statistics
Central banks are interested in cooperating in a structured approach
• establishing a big data road map
• identifying joint pilot projects
• sharing experience
Relevant pilot projects within the field of using
1) administrative datasets (e.g. corporate balance sheet data)
2) web search datasets (e.g. Google type search info)
3) commercial datasets (e.g. credit card operators)
4) financial market data (e.g. high frequency trading)
Outlet for statistical papers including big data
• ECB Statistics Paper Series (big data)
• “Nowcasting GDP with electronic payments data” by Galbraith J & Tkacz G.
– Electronic payment transactions can be used in nowcasting current gross domestic product growth
– Finds that debit card transactions contribute most to forecast accuracy
• “Social media sentiment and consumer confidence” by Daas P & Puts M
– Examines the relationship between changes in consumer confidence and Dutch public social media
– Could be used as an indicator, and as an early indicator, of changes in consumer confidence
• “Quantifying the effects of online bullishness on international financial markets”
by Mao H, Counts S & Bollen J.
– Develops a measure of investor sentiment based on Twitter and Google search queries
– Twitter and Google bullishness are positively correlated with investor sentiment