Just Google It: Can Internet Search Terms
Help Explain Movements in Retail Sales?
Daniel Ayoubkhani (ONS) & Matthew Swannell (ONS)
Contents
1. Introduction to Google Trends
2. Existing Literature
3. Aims of Current ONS Research
4. Data
5. Methods
6. Results
7. Conclusions
8. Considerations
1. Introduction to Google Trends
• Google provide information on search query share for a given week
• Data are available in 25 top level categories and hundreds of lower level categories
• Reported as growth in the share of search queries since the first week of January 2004
1. Introduction to Google Trends
[Figure: Google Trends series for the search query “Football Transfers”, annotated with the summer and January transfer windows and the points at which each transfer deadline is reached. Source: Google Insights for Search]
2. Existing Literature
Choi, H and Varian, H (2009) Predicting the
Present with Google Trends:
• Paper pioneered use of Google Trends (GT)
data as a nowcasting tool
• Applied log–linear “nowcast” to US retail
sales
• Performance of models increased when
Google Trends data were included
2. Existing Literature
Chamberlin, G (2010) Googling the Present,
Economic and Labour Market Review (Dec
2010):
• Modelled 11 UK Retail Sales Index (RSI) time
series
• Relatively simple benchmark models
• Alternative models included GT category data
as predictors
• GT terms significant in eight models
3. Aims of Current ONS Research
Focus of this investigation: quality assurance of
the UK RSI
1. Fit benchmark models that are representative
of current ONS practice
2. Fit alternative models that include appropriate
GT terms as predictors
3. Compare models using empirical measures
4. Draw conclusions to inform ONS strategy
4. Data – Retail Sales Index
• All Retail Sales
• Non-Specialised Food Stores
• Non-Specialised Non-Food Stores
• Textiles, Clothing and Footwear
• Furniture and Lighting
• Home Appliances
• Hardware, Paints and Glass
• Audio and Video Equipment and Recordings
• Books, Newspapers and Stationery
• Computers and Telecommunications
• Non-Store Retailing
4. Data – Retail Sales Index
All extracted RSI time series:
• represent monthly GB retail sales
• start in January 1988
• end in June 2011
• are not seasonally adjusted
• are chained volume indices
4. Data – Retail Sales Index
[Figure not reproduced. Source: ONS]
4. Data – Google Trends
• All extracted GT time series:
• represent weekly UK search activity
• start in January 2004
• end in July 2011
• Each RSI series matched with:
• at least one GT search category
• top five search queries within each category
4. Data – Google Trends
RSI Series: Furniture and Lighting

Google Trends Category           Google Trends Queries
Lighting                         lighting, light, lights, lamp, lamps
Home and Garden                  furniture, ikea, garden, b&q, homebase
Homemaking and Interior Decor    blinds, curtains, curtains curtains curtains, bedroom, ikea
Home Furnishings                 furniture, ikea, beds, lighting, table table
4. Data – Google Trends
• Raw data are weekly growth rates in query
shares
• Indices constructed by setting first full week in
January 2004 to 100 and applying growth
rates
• Monthly data formed by taking weighted
averages of weekly data
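The index construction and monthly aggregation described above can be sketched as follows. This is a minimal illustration, not the ONS implementation. It assumes each raw value is the growth in query share relative to the base week (if the rates were week-on-week they would be chained instead), and it assumes each week is weighted by the number of its days falling in the month, since the slides do not state the weights. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_index(growth_since_base, base=100.0):
    """Turn growth rates relative to the base week (0.05 means +5%)
    into an index with the base week equal to 100."""
    return [base * (1.0 + g) for g in growth_since_base]


def monthly_weighted_average(week_starts, weekly_values):
    """Average weekly values into calendar months.

    Assumed weighting: each week counts once for every one of its
    seven days that falls in the month.
    """
    totals = defaultdict(float)
    weights = defaultdict(int)
    for start, value in zip(week_starts, weekly_values):
        for offset in range(7):
            day = start + timedelta(days=offset)
            key = (day.year, day.month)
            totals[key] += value
            weights[key] += 1
    return {key: totals[key] / weights[key] for key in sorted(totals)}


# A week starting Sunday 29 February 2004 contributes one day to
# February and six days to March.
monthly = monthly_weighted_average(
    [date(2004, 2, 22), date(2004, 2, 29)], [100.0, 107.0]
)
```

Under this weighting, a week that straddles two months influences both in proportion to the days it contributes to each.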
5. Methods – Benchmark Models
• Each RSI “month” is a 4- or 5-week standard reporting period (SRP)
• Disparity between survey and Gregorian months
evolves by one or two days each year (“phase shift”)
• One-week long survey break every five or six years
• Example – September SRP:
[Figure: calendar showing where the September SRP falls between late August and early October in each year from 2000 to 2011]
5. Methods – Benchmark Models
SRPs are therefore not comparable with each other due to:
• their compositions
• moving holidays

Holiday                 Position                          SRP
Easter                  Good Friday and Easter Monday     Mar or Apr
Spring (Late May)       Last Monday in May                May or Jun
Summer (Late August)    Last Monday in August             Aug or Sep
5. Methods – Benchmark Models
• Regression models used to estimate phase
shift effects
• Example – Spring bank holiday variable:
\[
x_t =
\begin{cases}
1 & \text{in May, in years where the bank holiday is in the May SRP} \\
-0.8 & \text{in June, in years where the bank holiday is in the May SRP} \\
0 & \text{otherwise}
\end{cases}
\]
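As a rough illustration, the Spring bank holiday variable could be built like this. The sketch assumes the caller already knows in which years the holiday falls in the May SRP. The function name and the years used in the example are hypothetical and are not taken from the RSI calendar.

```python
def spring_holiday_regressor(periods, years_in_may_srp):
    """Return x_t for each (year, month) period.

    x_t is 1 in May and -0.8 in June of any year in which the Spring
    bank holiday falls in the May SRP, and 0 in every other period.
    """
    values = []
    for year, month in periods:
        if year in years_in_may_srp and month == 5:
            values.append(1.0)
        elif year in years_in_may_srp and month == 6:
            values.append(-0.8)
        else:
            values.append(0.0)
    return values


# Hypothetical example: the holiday is in the May SRP in 2010 only.
periods = [(2009, 5), (2009, 6), (2010, 5), (2010, 6), (2010, 7)]
x = spring_holiday_regressor(periods, years_in_may_srp={2010})
```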
5. Methods – Benchmark Models
\[ y_t = \sum_i \beta_i x_{it} + z_t \]

where y_t is log transformed and differenced (regular and seasonal), and the error term

\[ z_t = y_t - \sum_i \beta_i x_{it} \]

follows an ARMA process.
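The transformation applied to the series before modelling can be sketched as below. This is a minimal sketch assuming monthly data, so the seasonal difference is taken at lag 12. The function name is hypothetical, and the sketch covers only the log transform and differencing, not the estimation of the regression coefficients or the ARMA error process.

```python
import math


def log_seasonal_difference(series, seasonal_period=12):
    """Log transform a series, then apply one regular difference and
    one seasonal difference (lag 12 for monthly data)."""
    logs = [math.log(v) for v in series]
    regular = [logs[t] - logs[t - 1] for t in range(1, len(logs))]
    return [
        regular[t] - regular[t - seasonal_period]
        for t in range(seasonal_period, len(regular))
    ]
```

A series made up only of a constant growth rate and a fixed multiplicative seasonal pattern is reduced to zeros by this transformation, which is why what remains can be treated as the part left for the regression terms and the ARMA errors to explain.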
5. Methods – Alternative Models
Benchmark models extended with (log transformed,
differenced) GT variables
• Static relationships estimated for all series
• Lagged relationships modelled where identified
• Relationships identified at more than one lag
modelled both individually and together
• Multiple regression models estimated for RSI
series matched with more than one GT search
category
5. Methods – Alternative Models
Lagged relationships identified from cross-correlation plots of pre-whitened series
• ARIMA models fit to all RSI and GT series
• used the (0,1,1)(0,1,1) model for all series
• Each RSI residual series correlated with each
of its corresponding GT residual series
• series exhibit common trends and seasonality, so
correlate the shocks
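The correlation step can be sketched as follows, taking the two residual series as given. The sketch assumes the ARIMA models have already been fitted elsewhere and that a bound of 2/√n is used to flag significant lags. That bound is a common rule of thumb, not something stated in the slides, and the function names are hypothetical.

```python
import math


def cross_correlations(gt_resid, rsi_resid, max_lag):
    """Sample correlation between gt_resid[t - k] and rsi_resid[t]
    for k = 0..max_lag, so a peak at lag k suggests that searches
    lead sales by k periods."""
    n = len(rsi_resid)
    if len(gt_resid) != n:
        raise ValueError("residual series must have equal length")
    mean_x = sum(gt_resid) / n
    mean_y = sum(rsi_resid) / n
    ss_x = sum((v - mean_x) ** 2 for v in gt_resid)
    ss_y = sum((v - mean_y) ** 2 for v in rsi_resid)
    denom = math.sqrt(ss_x * ss_y)
    return {
        k: sum(
            (gt_resid[t - k] - mean_x) * (rsi_resid[t] - mean_y)
            for t in range(k, n)
        ) / denom
        for k in range(max_lag + 1)
    }


def significant_lags(ccf, n):
    """Lags whose correlation exceeds the approximate 2/sqrt(n) bound."""
    bound = 2.0 / math.sqrt(n)
    return [k for k, r in ccf.items() if abs(r) > bound]
```

Lags flagged in this way are candidates for the lagged GT terms in the alternative models.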
5. Methods – Alternative Models
• Example – Furniture and Lighting vs “garden”
No significant phase shift effects so models are:
\[ y_t = \beta x_t + z_t \]
\[ y_t = \beta x_{t-2} + z_t \]
\[ y_t = \beta x_{t-3} + z_t \]
\[ y_t = \beta_1 x_{t-2} + \beta_2 x_{t-3} + z_t \]
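To make the four specifications concrete, the sketch below lines up each observation of the RSI series with the GT values at the requested lags. It only prepares the data. Estimating the coefficients with ARMA errors would need a time series package and is not shown. The function name and the numbers are hypothetical.

```python
def lagged_design(y, x, lags):
    """Pair each y[t] with the regressors x[t - lag] for the requested
    lags. Observations without a full set of lagged values are dropped."""
    start = max(lags)
    return [
        (y[t], [x[t - lag] for lag in lags])
        for t in range(start, len(y))
    ]


# Hypothetical transformed series, for illustration only.
y = [10, 11, 12, 13, 14]
x = [0, 1, 2, 3, 4]

static_rows = lagged_design(y, x, lags=[0])    # static model
lag2_rows = lagged_design(y, x, lags=[2])      # lag 2 only
lag3_rows = lagged_design(y, x, lags=[3])      # lag 3 only
both_rows = lagged_design(y, x, lags=[2, 3])   # lags 2 and 3 together
```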
6. Results
Component of the RSI                      % Alternative models with    % GT terms significant
(and no. of alternative models fitted)    AICC lower than benchmark    at 5% level
All Retail Sales (8)                      0.0                          37.5
Non-Specialised Food Stores (6)           0.0                          0.0
Non-Specialised Non-Food Stores (6)       0.0                          83.3
Textiles, Clothing and Footwear (23)      30.4                         36.0
Furniture and Lighting (31)               90.3                         78.8
Home Appliances (7)                       14.3                         0.0
Hardware, Paints and Glass (6)            50.0                         100.0
Audio Equipment and Recordings (44)       43.2                         51.0
Books, Newspapers and Stationery (6)      16.7                         100.0
Computers and Telecoms (31)               9.7                          15.2
Non-Store Retailing (7)                   42.9                         42.9
6. Results – Furniture and Lighting
Top three alternative models in terms of AICC
GT Term in Model              Lag(s)    GT Category                    AICC
lighting                      0         Home Furnishings               412.47
curtains curtains curtains    0&1       Homemaking & Interior Decor    414.76
lights                        0         Lighting                       415.63
Benchmark                                                              432.29
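The slides do not state how AICC is calculated. The sketch below uses the standard small-sample corrected AIC and assumes that n counts the observations left after differencing and k the estimated parameters. The function name is hypothetical. Lower values indicate a better trade-off between fit and model size.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC:
    -2 * logL + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the correction term")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)
```

Adding a GT term raises k by one, so an alternative model only achieves a lower AICC than the benchmark if the gain in log-likelihood outweighs the extra penalty.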
6. Results – Furniture and Lighting
Top three alternative models in terms of MAPE
• Out-of-sample, one-step-ahead predictions
• 12 periods: July 2010 – June 2011
GT Term in Model    Lag(s)    GT Category         MAPE
lighting            0         Home Furnishings    2.38
lighting            0         Lighting            2.49
Home Furnishings    0         N/A                 2.51
Benchmark                                         3.87
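For reference, the mean absolute percentage error over a set of one-step-ahead predictions can be computed as in this minimal sketch. The function name is hypothetical and the numbers in the example are made up, not RSI values.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up values: errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```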
7. Conclusions
• Promising results for some RSI components...
• Furniture and Lighting
• Hardware, Paints and Glass
• Audio Equipment and Recordings
• ...but less so for others
• All Retail Sales
• Non-Specialised Food Stores
• Non-Specialised Non-Food Stores
Additional information is only useful when the RSI
series is not dominated by trend and seasonality
8. Considerations
• GT variable selection
• Transitory nature of search queries
• Changes to GT category taxonomy
• Future cost and accessibility of GT data?
• Wider applicability to ONS outputs?
Questions?