12 Memobust course Seasonal Adjustment

Seasonal Adjustment
Eurostat
Topics
• Motivation and theoretical background (Øyvind Langsrud)
• Seasonal adjustment step-by-step (László Sajtos)
• (A few) issues on seasonal adjustment (László Sajtos)
Presented by
• Øyvind Langsrud
• Statistics Norway
Time series with seasonal and non-seasonal variation
[Figure: Index of production: Durable consumer goods, 2004-2012; x-axis: Time]
Removing the seasonal variation
[Figure: Original (black) and seasonally adjusted (blue), 2004-2012; x-axis: Time]
Removing also the non-seasonal variation
[Figure: Original (black), seasonally adjusted (blue) and trend (red), 2004-2012; x-axis: Time]
Monthly time series example
[Figure: Original series: Retail sales volume index, 2000-2014]
• Trend and seasonality can be seen
– How to find it by computation?
Quick and dirty calculation of trend by ordinary linear regression:
y = a + b*time + e
a = -6619.731
b = 3.351223
time = 2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2000.583, 2000.667, 2000.750, 2000.833, 2000.917, 2001.000, 2001.083, …
[Figure: series with fitted regression line, 2000-2014]
Including seasonality in "the dirty model"
y = a + b*time + c_month + e
[Figure: Original (blue) and model fit (red), 2000-2014]
Parameter estimates for y = a + b*time + c_month + e:
a = -6468.505
b = 3.275956
c =
  mnd0    -9.19620250
  mnd2   -16.59062737
  mnd3    -6.79790939
  mnd4    -8.51090569
  mnd5    -1.18890200
  mnd6     6.33881598
  mnd7     1.84439111
  mnd8     4.62139480
  mnd9    -2.56494236
  mnd10   -0.04409251
  mnd11    1.53598811
  mnd12   30.55299181
• Transforming to seasonal adjustment language
y_t = T_t + S_t + I_t
a + b*time → T_t
c_month → S_t
e → I_t
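The regression with a month effect can be reproduced with ordinary least squares. The following is a minimal sketch, not the code used for the slides: the function name `fit_dirty_model`, the assumption that the series starts in January, and the sum-to-zero coding of the month effects (chosen because the twelve c values on the slide add up to zero) are all illustrative assumptions.

```python
import numpy as np

def fit_dirty_model(y, start_year=2000):
    """Fit y = a + b*time + c_month + e by ordinary least squares.

    Assumes a monthly series that starts in January of `start_year`.
    Month effects use sum-to-zero coding, so the 12 values of c add up to 0.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    time = start_year + np.arange(n) / 12.0   # 2000.000, 2000.083, ...
    month = np.arange(n) % 12                 # 0 = January, ..., 11 = December

    # One column per month 0..10; December rows are -1 in every column.
    season = np.zeros((n, 11))
    for m in range(11):
        season[month == m, m] = 1.0
    season[month == 11, :] = -1.0

    X = np.column_stack([np.ones(n), time, season])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b = coef[0], coef[1]
    c = np.append(coef[2:], -coef[2:].sum())  # December effect is derived

    trend = a + b * time                      # T_t
    seasonal = c[month]                       # S_t
    irregular = y - trend - seasonal          # I_t
    return a, b, c, trend, seasonal, irregular
```

The three returned series correspond to T_t, S_t and I_t in the additive decomposition, and `y - seasonal` is the seasonally adjusted series under this model.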
Trend from "the dirty model"
y_t = T_t + S_t + I_t
[Figure: Original (blue) and trend (red), 2000-2014]
Seasonality from "the dirty model"
y_t = T_t + S_t + I_t
[Figure: Seasonality, 2000-2014]
Seasonal adjustment by "the dirty model"
y_t = T_t + S_t + I_t
[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]
Question to the audience: What is wrong with this ordinary regression approach?
Irregular component by "the dirty model"
y_t = T_t + S_t + I_t
[Figure: Irregular component, 2000-2014]
In practice a multiplicative model is used:
y_t = T_t × S_t × I_t
• y_t is not the original series but a series that is corrected for holiday and trading day effects (calendar adjusted)
[Figure: Original (blue) and trend (red), 2000-2014]
y_t = T_t × S_t × I_t
[Figure: Seasonal factors, 2000-2015]
• Note that the seasonal factors vary slightly over time
y_t = T_t × S_t × I_t
[Figure: Irregular component, 2000-2014]
• This time the irregular component looks more like true noise
• Note that correlation between neighbouring values (autocorrelation) is allowed
y_t = T_t × S_t × I_t
[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]
• This is seasonally adjusted data as published by Statistics Norway
Multiplicative model: y_t = T_t × S_t × I_t
Additive model: y_t = T_t + S_t + I_t
How to calculate T_t, S_t, and I_t from y_t?
• This is done by filtering techniques
– One element of this methodology is how to calculate the trend from seasonally adjusted data
– This is a question of smoothing a noisy series
[Figure: Seasonally adjusted (blue) and trend (red), 2000-2012]
2000-2014
[Figure: Seasonally adjusted (blue) and trend (red), 2000-2014]
2007-2012
[Figure: Seasonally adjusted (blue) and trend (red), 2007-2012]
Smoothing by averaging
• P_t = (Y_{t-1} + Y_t + Y_{t+1})/3
[Figure: 3-term simple moving average: [1,1,1]/3, 2007-2012]
Also called filtering
• P_t = (Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2})/5
• The filter is [1,1,1,1,1]/5
[Figure: 5-term simple moving average: [1,1,1,1,1]/5, 2007-2012]
Here the filter length is 9
[Figure: 9-term simple moving average: [1,1,1,1,1,1,1,1,1]/9, 2007-2012]
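Each of these moving averages is a weighted sum with fixed weights, so it can be applied with one convolution, and applying two filters in succession amounts to convolving their weight vectors. The sketch below uses NumPy; the helper names `compose` and `apply_filter` are illustrative choices, not part of any seasonal adjustment software.

```python
import numpy as np

def compose(*filters):
    """Weights of the single filter equivalent to applying `filters` in turn."""
    out = np.array([1.0])
    for f in filters:
        out = np.convolve(out, np.asarray(f, dtype=float))
    return out

def apply_filter(y, weights):
    """Apply a symmetric filter; only points with a full window are returned."""
    return np.convolve(np.asarray(y, dtype=float), weights, mode="valid")

ma3 = np.ones(3) / 3                                 # [1,1,1]/3
ma3x3 = compose(ma3, ma3)                            # 3-term MA of a 3-term MA
ma2x12 = compose(np.ones(2) / 2, np.ones(12) / 12)   # centred 12-term MA

print(np.round(ma3x3 * 9, 6))    # weights times 9
print(np.round(ma2x12 * 12, 6))  # weights times 12
```

Note that `mode="valid"` drops the first and last points, which is the end-point problem that appears with longer filters.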
Filtering can be performed twice
• 3x3 filter
– 3-term moving average of a 3-term moving average
– The final filter is [1,2,3,2,1]/9
– P_t = (Y_{t-2} + 2Y_{t-1} + 3Y_t + 2Y_{t+1} + Y_{t+2})/9
• 2x12 filter
– [1/2,1,1,1,1,1,1,1,1,1,1,1,1/2]/12
– Also called a centred 12-term moving average
– Question to the audience: Why is this filter of special interest?
Henderson filters
• Finding filters with good properties is an
interesting topic …
• Henderson (1916) introduced the so-called Henderson filters
• X-12-ARIMA uses this type of filter to calculate
the trend
• The filter length determines the degree of
smoothing
[Figure: 5-term Henderson: [-21,84,160,84,-21]/286, 2007-2012]
[Figure: 7-term Henderson: [-42,42,210,295,210,42,-42]/715, 2007-2012]
[Figure: 13-term Henderson: [-325,-468,0,1100,2475,3600,4032,3600,2475,1100,0,-468,-325]/16796, 2007-2012]
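The Henderson weights quoted for the 5-, 7- and 13-term filters can be reproduced from the standard closed-form expression for symmetric Henderson weights. The sketch below is an illustration using exact fractions; the function name `henderson_weights` is an assumption, and X-12-ARIMA itself is not called.

```python
from fractions import Fraction

def henderson_weights(length):
    """Symmetric Henderson weights for an odd filter length (closed form)."""
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd number >= 3")
    m = (length - 1) // 2          # half-width of the filter
    n = m + 2
    denom = (8 * n * (n * n - 1) * (4 * n * n - 1)
             * (4 * n * n - 9) * (4 * n * n - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1) ** 2 - j * j) * (n * n - j * j)
               * ((n + 1) ** 2 - j * j) * (3 * n * n - 16 - 11 * j * j))
        weights.append(Fraction(num, denom))
    return weights

# Example: the 5-term filter, expressed over the common denominator 286.
print([int(w * 286) for w in henderson_weights(5)])
```

Longer filters such as the 23-term or 99-term ones are obtained the same way by changing `length`.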
[Figure: 23-term Henderson filter, 2007-2012]
Question to the audience: Why does the filtered series stop in 2009?
[Figure: 99-term Henderson filter, 2007-2012]
Non-available observations at the end: Two solutions
• Asymmetric filters
– Asymmetric variant of Henderson
▪ [-0.034,0.116,0.383,0.534,0,0,0]
▪ Can be used at the last observation
• Forecasts in place of the unobserved values
– The “starting series” for the X12-ARIMA decompositions is a calendar adjusted series which is based on reg-ARIMA modelling
– The reg-ARIMA modelling can also be used to produce forecasts
– X12-ARIMA uses these forecasts in trend calculations
Finding the seasonal component by filtering
[Figure: Series with trend removed, 2000-2014]
• From a series with the trend removed we make 12 series
– January-values, February-values, …
• Each of these series is smoothed by filtering
• Altogether these smoothed series are the seasonal component
The X12-ARIMA algorithm
• The decomposition is made by several iterative steps
– Seasonal component from series with trend removed
– Trend from series with seasonal component removed
• Initial estimate of trend using the 2x12 moving average
• One element is downweighting of observations with an extreme irregular component
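A single pass of these steps can be sketched as follows for the multiplicative model. This is a heavily simplified illustration and not the X12-ARIMA algorithm: there is no iteration, no Henderson trend, no downweighting of extremes and no end-point handling, and the per-month smoothing is replaced by a plain per-month mean. The function names and the assumption that the series starts in January are illustrative.

```python
import numpy as np

def centred_2x12(y):
    """Initial trend by the 2x12 moving average (NaN where the window is incomplete)."""
    y = np.asarray(y, dtype=float)
    w = np.r_[0.5, np.ones(11), 0.5] / 12.0
    trend = np.full(len(y), np.nan)
    trend[6:-6] = np.convolve(y, w, mode="valid")
    return trend

def seasonal_factors(y, trend):
    """Seasonal factors from the detrended series, one value per calendar month."""
    y = np.asarray(y, dtype=float)
    ratio = y / trend                          # series with trend removed
    month = np.arange(len(y)) % 12             # assumes the series starts in January
    factors = np.array([np.nanmean(ratio[month == m]) for m in range(12)])
    factors /= factors.mean()                  # normalise so the factors average 1
    return factors[month]

def simple_adjust(y):
    """Return (trend, seasonal, seasonally adjusted) from one simplified pass."""
    trend = centred_2x12(y)
    seasonal = seasonal_factors(y, trend)
    return trend, seasonal, np.asarray(y, dtype=float) / seasonal
```

In the real algorithm the seasonally adjusted series would then be smoothed again to obtain an improved trend, and the steps repeated.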
X12-ARIMA or SEATS
• Both methods can be viewed as filtering techniques
• X12-ARIMA
– A non-parametric method
– No model assumed
• SEATS
– The components are assumed to follow ARIMA models
– The filters are derived from modelling
– Possible to do inference and to make forecasts with confidence intervals
– So why the name X12-ARIMA when this method is the one that is not based on ARIMA?
▪ Answer on the next slide
Calendar adjustment by reg-ARIMA modelling
"The dirty model" mentioned earlier:
• Seasonal ARIMA model
– Correlated errors (autocorrelation)
– Differencing the series makes the model quite good without explicit parameters for trend and seasonality
– Need to decide the type of ARIMA model: ARIMA(p,d,q)(P,D,Q)
• Regression parameters in the model
– Calendar effects: Trading day, Moving holiday, …
– Outliers and level shifts
• Here y can be a log-transformed and leap-year adjusted variant of the original data
▪ This slide is “stolen” from https://www.scss.tcd.ie/Rozenn.Dahyot/ST7005/15SeasonalARIMA.pdf
▪ Here B is the backshift operator: B Y_t = Y_{t-1}
▪ ARIMA(0,1,1)(0,1,1)
▪ Most common model
▪ Airline model
Example of regression variables in reg-ARIMA modelling
• Easter
– 2000 and 2001: Easter in April
– 2008: Easter in March
– 2002: 4 of 5 Norwegian Easter days in March
• Trading day
– Six parameters needed to model seven days
– Mon: Number of Mondays minus Number of Sundays
Month Year      Easter  Mon Tue Wed Thu Fri Sat
Jan   2000   0.0000000    0  -1  -1  -1  -1   0
Feb   2000   0.0000000    0   1   0   0   0   0
Mar   2000  -0.2571429    0   0   1   1   1   0
Apr   2000   0.2571429   -1  -1  -1  -1  -1   0
May   2000   0.0000000    1   1   1   0   0   0
Jun   2000   0.0000000    0   0   0   1   1   0
Jul   2000   0.0000000    0  -1  -1  -1  -1   0
Aug   2000   0.0000000    0   1   1   1   0   0
Sep   2000   0.0000000    0   0   0   0   1   1
Oct   2000   0.0000000    0   0  -1  -1  -1  -1
Nov   2000   0.0000000    0   0   1   1   0   0
Dec   2000   0.0000000   -1  -1  -1  -1   0   0
Jan   2001   0.0000000    1   1   1   0   0   0
Feb   2001   0.0000000    0   0   0   0   0   0
Mar   2001  -0.2571429    0   0   0   1   1   1
Apr   2001   0.2571429    0  -1  -1  -1  -1  -1
May   2001   0.0000000    0   1   1   1   0   0
Jun   2001   0.0000000    0   0   0   0   1   1
Jul   2001   0.0000000    0   0  -1  -1  -1  -1
Aug   2001   0.0000000    0   0   1   1   1   0
Sep   2001   0.0000000   -1  -1  -1  -1  -1   0
Oct   2001   0.0000000    1   1   1   0   0   0
Nov   2001   0.0000000    0   0   0   1   1   0
Dec   2001   0.0000000    0  -1  -1  -1  -1   0
Jan   2002   0.0000000    0   1   1   1   0   0
Feb   2002   0.0000000    0   0   0   0   0   0
Mar   2002   0.5428571   -1  -1  -1  -1   0   0
Apr   2002  -0.5428571    1   1   0   0   0   0
May   2002   0.0000000    0   0   1   1   1   0
:
Mar   2008   0.7428571    0  -1  -1  -1  -1   0
Apr   2008  -0.7428571    0   1   1   0   0   0
May   2008   0.0000000    0   0   0   1   1   1
Jun   2008   0.0000000    0  -1  -1  -1  -1  -1
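The six trading-day columns follow directly from the calendar: each one is the number of occurrences of that weekday in the month minus the number of Sundays. A minimal sketch using Python's standard `calendar` module is given below; the function name `trading_day_regressors` is an illustrative choice, and the Easter column is not computed here.

```python
import calendar

def trading_day_regressors(year, month):
    """Return [Mon, Tue, Wed, Thu, Fri, Sat], each as weekday count minus Sunday count."""
    ndays = calendar.monthrange(year, month)[1]
    counts = [0] * 7                                   # index 0 = Monday, ..., 6 = Sunday
    for day in range(1, ndays + 1):
        counts[calendar.weekday(year, month, day)] += 1
    return [counts[d] - counts[6] for d in range(6)]

for year, month in [(2000, 1), (2000, 2), (2000, 3)]:
    print(year, month, trading_day_regressors(year, month))
```

With only six columns, the Sunday effect is not estimated separately; it is derived from the other six.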
Trading day: Separate effect of each day or common effect of all weekdays?

Regression Model
--------------------------------------------------------------
                        Parameter    Standard
Variable                 Estimate       Error    t-value
--------------------------------------------------------------
Trading Day
 Mon                      -0.0019     0.00193      -1.00
 Tue                       0.0064     0.00194       3.31
 Wed                       0.0018     0.00190       0.94
 Thu                      -0.0016     0.00195      -0.81
 Fri                       0.0138     0.00188       7.37
 Sat                       0.0034     0.00193       1.73
 *Sun (derived)           -0.0219     0.00196     -11.16

Regression Model
--------------------------------------------------------------
                        Parameter    Standard
Variable                 Estimate       Error    t-value
--------------------------------------------------------------
Trading Day
 Weekday                   0.0036     0.00053       6.87
 **Sat/Sun (derived)      -0.0090     0.00131      -6.87

• Question to the audience:
– Why exactly equal t-values?
Outliers
• An extreme observation caused by a special event can be problematic
– Can influence the modelling in a negative way
▪ Parameter estimates
▪ Forecasts
▪ Decomposition
• Solution
– Include the outlier as a dummy variable in the reg-ARIMA modelling
▪ ….0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0….
– The outlier is included in the irregular component after modelling
▪ The observation is still included in seasonally adjusted data
▪ But has no effect on the trend
▪ Question to the audience: Examples of special events?
[Figure: Data with outlier: Seasonally adjusted (blue) and trend (red), 2000-2014]
[Figure: Data with level shift: Seasonally adjusted (blue) and trend (red), 2000-2014]
• Level shift is handled similarly to outliers
– Regression variable: ….0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1….
– Level shift is included in the trend
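The two dummy variables described on the outlier and level shift slides are easy to build explicitly. The sketch below follows the 0/1 patterns shown on the slides; the function names are illustrative, and seasonal adjustment software may code a level shift differently (for example with a different sign convention).

```python
import numpy as np

def outlier_dummy(n, t0):
    """Additive outlier regressor: 1 at position t0, 0 elsewhere."""
    x = np.zeros(n)
    x[t0] = 1.0
    return x

def level_shift_dummy(n, t0):
    """Level shift regressor: 0 before position t0, 1 from t0 onwards."""
    x = np.zeros(n)
    x[t0:] = 1.0
    return x

print(outlier_dummy(8, 3))
print(level_shift_dummy(8, 3))
```

Either vector is then added as an extra column among the regression variables of the reg-ARIMA model.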
Presented by
• László Sajtos
• Hungarian Central Statistical Office
Topics
• Seasonal adjustment step-by-step
• (A few) issues on seasonal adjustment
Seasonal adjustment step-by-step
Seasonal adjustment step-by-step: structure
Input data → STEPS with check points → Preliminary results → Output data (if results are acceptable)
Not acceptable results → back to the STEPS
Time series analysis (STEP 0)
Basic conditions
• Length of time series (long enough to be seasonally adjusted?)
▪ Monthly datasets: at least 3 years long
▪ Quarterly datasets: at least 4 years long
A time series of at least 5-7 years is optimal!
Expert information
• Collecting expert data from the sections about datasets (potential outliers, methodological changes, changes in external factors (e.g. law), connections to other time series and sectors)
Graphical analysis, test for seasonality (STEP 1)
• Graphical analysis via basic and sophisticated graphs
– Plotted raw dataset
– Spectral analysis: autocorrelogram and auto-regressive spectrum
• Identifying and explaining missing observations and outliers
• Correction of data errors
• Test for seasonality
Graphical analysis, an example (2000-2013)
[Figure: Food ("Élelm. jell."), Jan2000-Jan2014; x-axis: date. Annotations: "Seasonality", "Seems additive", "Probably outliers"]
Data: Hungarian monthly retail volume index, food
Type of transformation (STEP 2)
Software tools
Automatic test
Verification
Graphical analysis
Calendar adjustment (STEP 3)
Determining factors which may affect (regressors) + national holidays
Consideration based on professional reasons
Elimination
Little significance
Significance
Non-significance or absence
Keep
Consideration based on professional reasons
Elimination
Outlier treatment (STEP 4)
Software tools
Available expert information
Verifying the results
Automatic outlier testing
STEP 1
Less significant, but professionally reasonable
Significant
Keep it
Monitoring
Stability
Not significant
Consideration based on professional reasons
Eliminate it
ARIMA model (STEP 5)
Software tools
Automatic choice recommended
Good results
Not satisfying results
Keep model
Airline model
Manual settings
Reducing the order of the model
Other low ordered models
Decomposition (STEP 6)
Software tools
Eliminating deterministic effects
Decomposition: Additive, Multiplicative, Log-additive
Quality diagnostics (STEP 7)
1. Model adequacy on residuals:
• Ljung-Box test
• Box-Pierce test
2. Seasonality: based on spectral graphics
3. Stability analysis: sliding spans
Documentation required!
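Both residual tests listed under model adequacy are based on the squared sample autocorrelations of the residuals. The sketch below computes the two test statistics with NumPy only; the function names are illustrative, and turning a statistic into a p-value (a chi-square comparison, with degrees of freedom reduced by the number of fitted ARMA parameters) is left out.

```python
import numpy as np

def _autocorrelations(residuals, max_lag):
    """Sample autocorrelations at lags 1..max_lag of the mean-centred residuals."""
    x = np.asarray(residuals, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

def box_pierce_q(residuals, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(residuals)
    rho = _autocorrelations(residuals, max_lag)
    return n * np.sum(rho ** 2)

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: n(n+2) * sum over k of rho_k^2 / (n - k)."""
    n = len(residuals)
    rho = _autocorrelations(residuals, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))
```

Large values of either statistic indicate that autocorrelation remains in the residuals, so the ARIMA model is probably not adequate.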
Manual settings (STEP 8)
In case of:
• Detailed analysis
• Quality diagnostics are not satisfactory
• Further outlier correction
• Other advanced settings (e.g. confidence intervals)
Flow: Manual settings → Quality diagnostics; if not satisfying → Manual settings
Dissemination (STEP 9)
EXAMPLE (IN DEMETRA 2.04 SOFTWARE)
HUNGARIAN INDUSTRIAL TIME SERIES
Automated module
Open the input database
The list of time series
Selection of time series output
Save of output
Diagnostic, outlier %
Adjustment without fixed models
Setting the method and trading day regressor
Setting the country specific holidays
The results
Manual settings required
Quality diagnostics
(A few) issues on seasonal adjustment
Issues in Memobust book
• Consistency issues
• Data presentation
• Revision
• Issues on chained indices
• Treatment of the crisis
• Documentation
• Communication with users
Revision
Unadjusted data
Reasons:
• Data arrival after deadline
• Erroneous data etc.
What to do: Data review
SA data
Reasons:
• New information is available
• Better estimation required
What to do: Estimating new model, new seasonal factors
Revision strategies
Goal: preserving accuracy, taking new information into consideration while avoiding large changes (reliability and stability)
Strategies:
• Extreme types: Current, Concurrent
• Alternative types: Partial concurrent, Controlled current
Horizon of revision
Question: How many months of data should be revised?
Practices:
• ESS Guideline: 3-4 years before the beginning of the revision period
• Statistics Denmark: at least 13 months back in time
Consistency issues
Linkages in the economy and among time series; expectations of users; errors; etc.
Issues
• Time consistency issue (temporal constraints), e.g. annual and infra-annual series
• Aggregation consistency issue (cross-sectional constraints), e.g. total industrial and segmental series
Time consistency issues
Problem: consistency of, for instance, sub-annual and annual series
e.g. GDP: $\sum_{i=1}^{4} \mathrm{GDP}^{\mathrm{quarterly}}_i \neq \mathrm{GDP}^{\mathrm{annual}}$
Sources of inconsistency:
• Less and more accurate data are compared;
• Sampling errors;
• Errors in evaluation
Benchmarking
Benchmark: typically annual data
Aim: providing time consistency; the techniques operate with the sum of the modified sub-annual series
Benchmarking methods:
• Pro-rating method
• Denton method
Pro-rating method
How it works: multiplies the sub-annual values by the corresponding annual proportional discrepancies
Example: Three observations ($y_0$, $y_1$, $y_2$), requirement: $y_0 = y_1 + y_2$
Corrected values:
$y_1 \to b_1 = \frac{y_1 y_0}{y_1 + y_2}$; $y_2 \to b_2 = \frac{y_2 y_0}{y_1 + y_2}$
$b_1 + b_2 = \frac{(y_1 + y_2)\, y_0}{y_1 + y_2} = y_0$
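The pro-rating correction can be written in a few lines for any number of sub-annual values. This is a minimal sketch of the formula above; the function name `prorate` and the example numbers are illustrative, not taken from real data.

```python
def prorate(sub_annual, benchmark):
    """Scale the sub-annual values so that they sum to the annual benchmark."""
    total = sum(sub_annual)
    if total == 0:
        raise ValueError("sub-annual values must not sum to zero")
    return [value * benchmark / total for value in sub_annual]

# Illustrative numbers: y1 = 30, y2 = 50, annual benchmark y0 = 100.
corrected = prorate([30.0, 50.0], 100.0)
print(corrected, sum(corrected))
```

Because every value within a year is multiplied by the same factor, the factor can jump between the last period of one year and the first period of the next.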
Denton method
How it works: Based on quadratic optimization
Advantages:
• The method can be further developed and specified
• More reliable results (smaller discontinuities compared with pro-rating)
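To illustrate the quadratic optimization idea, the sketch below implements one common variant, assumed here for illustration: the additive first-difference form, which keeps the period-to-period changes of the corrections as small as possible while forcing each year to sum to its benchmark. The slides do not say which variant is meant, and the function name and the direct solution of the optimality conditions are illustrative choices.

```python
import numpy as np

def denton_additive(sub_annual, benchmarks, per_year):
    """Minimise the sum of squared first differences of (x - y) subject to annual sums."""
    y = np.asarray(sub_annual, dtype=float)
    b = np.asarray(benchmarks, dtype=float)
    n, m = len(y), len(b)
    if n != m * per_year:
        raise ValueError("need exactly per_year sub-annual values per benchmark")

    A = np.kron(np.eye(m), np.ones(per_year))   # aggregation matrix: A @ x gives annual sums
    D = np.diff(np.eye(n), axis=0)              # first-difference operator
    Q = D.T @ D

    # Optimality (KKT) conditions of the constrained quadratic problem.
    K = np.block([[2 * Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([2 * Q @ y, b])
    solution = np.linalg.solve(K, rhs)
    return solution[:n]

quarterly = [10, 12, 11, 13, 11, 13, 12, 14]    # illustrative numbers
print(np.round(denton_additive(quarterly, [50, 54], per_year=4), 3))
```

Unlike pro-rating, the corrections change gradually across year boundaries instead of jumping.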
Aggregation consistency
Aggregate series: a time series that consists of several components (e.g. industrial series)
Goal: The aggregate series should equal the sum of its components
Problem: Non-linear seasonal adjustment process
Direct SA: (X + Y)^{SA} ≠ Indirect SA: X^{SA} + Y^{SA}
Consequences: Hard to preserve accounting relationships and meet users’ expectations
Methods to achieve aggregation consistency
• Only direct or indirect seasonal adjustment
• Pro-rating
• Denton method
• Regression based models
Thank you for your attention!
Questions?