Seasonal Adjustment
Eurostat

Topics
• Motivation and theoretical background (Øyvind Langsrud)
• Seasonal adjustment step-by-step (László Sajtos)
• (A few) issues on seasonal adjustment (László Sajtos)

Presented by
• Øyvind Langsrud
• Statistics Norway

Time series with seasonal and non-seasonal variation
[Figure: Index of production: Durable consumer goods, 2004-2012]

Removing the seasonal variation
[Figure: Original (black) and seasonally adjusted (blue), 2004-2012]

Removing also the non-seasonal variation
[Figure: Original (black), seasonally adjusted (blue) and trend (red), 2004-2012]

Monthly time series example
[Figure: Original series: Retail sales volume index, 2000-2014]
• Trend and seasonality can be seen
– How to find it by computation?

Quick and dirty calculation of trend by ordinary linear regression:
$y = a + b \cdot time + e$
[Figure: original series with the fitted regression line, 2000-2014]
a = -6619.731
b = 3.351223
time = 2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2000.583, 2000.667, 2000.750, 2000.833, 2000.917, 2001.000, 2001.083, …...
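A minimal sketch of the regression above, written in Python with NumPy (an assumption; the slides do not say which software produced the fit). The series here is synthetic, so the estimates will not reproduce a = -6619.731 and b = 3.351223; only the construction of the time variable and the least-squares fit follow the slide.

```python
import numpy as np

# Synthetic monthly series (illustrative assumption, NOT the retail sales index).
rng = np.random.default_rng(0)
n = 168                                   # 14 years of monthly observations
time = 2000 + np.arange(n) / 12           # 2000.000, 2000.083, 2000.167, ...
y = 100 + 3.0 * (time - 2000) + rng.normal(0, 2, n)

# Design matrix with an intercept column and the time variable.
X = np.column_stack([np.ones(n), time])

# Ordinary least squares: y = a + b*time + e
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)

trend = a + b * time                      # fitted straight-line trend
print(f"a = {a:.3f}, b = {b:.6f}")
```

The intercept is large and negative for the same reason as on the slide: it is the fitted value at time = 0, far outside the observed years.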
Including seasonality in "the dirty model"
$y = a + b \cdot time + c_{month} + e$
[Figure: Original (blue) and model fit (red), 2000-2014]
a = -6468.505
b = 3.275956
c =
  mnd0    -9.19620250
  mnd2   -16.59062737
  mnd3    -6.79790939
  mnd4    -8.51090569
  mnd5    -1.18890200
  mnd6     6.33881598
  mnd7     1.84439111
  mnd8     4.62139480
  mnd9    -2.56494236
  mnd10   -0.04409251
  mnd11    1.53598811
  mnd12   30.55299181
• Transforming to seasonal adjustment language
– a + b*time → $T_t$
– $c_{month}$ → $S_t$
– e → $I_t$
– $y_t = T_t + S_t + I_t$

Trend from "the dirty model"
$y_t = T_t + S_t + I_t$
[Figure: Original (blue) and trend (red), 2000-2014]

Seasonality from "the dirty model"
$y_t = T_t + S_t + I_t$
[Figure: Seasonality, 2000-2014]

Seasonal adjustment by "the dirty model"
$y_t = T_t + S_t + I_t$
[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]

Question to the audience: What is wrong with this ordinary regression approach?
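The decomposition by "the dirty model" can be sketched as follows, again in Python with NumPy and on synthetic data (both are assumptions; the monthly effects in `true_c` are hypothetical values, not the slide's estimates). The month effects are centred so that they sum to zero over a year, which is one possible parameterisation and is chosen here as an assumption.

```python
import numpy as np

# Synthetic data: linear trend + fixed monthly effects + noise (all hypothetical).
rng = np.random.default_rng(1)
n = 168
time = 2000 + np.arange(n) / 12
month = np.arange(n) % 12                 # 0 = January, ..., 11 = December
true_c = np.array([-9, -17, -7, -9, -1, 6, 2, 5, -3, 0, 2, 31], dtype=float)
y = 100 + 3.0 * (time - 2000) + true_c[month] + rng.normal(0, 2, n)

# Design matrix: intercept, time, 11 month dummies (January is the reference).
dummies = (month[:, None] == np.arange(1, 12)).astype(float)
X = np.column_stack([np.ones(n), time, dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Re-express the month effects so that they sum to zero over the year.
c_raw = np.concatenate([[0.0], coef[2:]])
shift = c_raw.mean()
a, b, c = coef[0] + shift, coef[1], c_raw - shift

trend = a + b * time                      # T_t
seasonal = c[month]                       # S_t
irregular = y - trend - seasonal          # I_t
adjusted = y - seasonal                   # seasonally adjusted series, T_t + I_t
```

Because the seasonal pattern is a fixed set of twelve numbers and the trend is a straight line, everything the model cannot express ends up in `irregular`.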
Irregular component by "the dirty model"
$y_t = T_t + S_t + I_t$
[Figure: Irregular component, 2000-2014]

In practice a multiplicative model is used: $y_t = T_t \times S_t \times I_t$
• $y_t$ is not the original series but a series that is corrected for holiday and trading day effects (calendar adjusted)
[Figure: Original (blue) and trend (red), 2000-2014]

$y_t = T_t \times S_t \times I_t$
[Figure: Seasonal factors, 2000-2015]
• Note that the seasonal factors vary slightly over time

$y_t = T_t \times S_t \times I_t$
[Figure: Irregular component, 2000-2014]
• This time the irregular component looks more like true noise
• Note that correlation between neighbouring values is allowed (autocorrelation)

$y_t = T_t \times S_t \times I_t$
[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]
• This is seasonally adjusted data as published by Statistics Norway

Multiplicative model: $y_t = T_t \times S_t \times I_t$
Additive model: $y_t = T_t + S_t + I_t$
How to calculate $T_t$, $S_t$, and $I_t$ from $y_t$?
• This is done by filtering techniques
– One element of this methodology is how to calculate the trend from seasonally adjusted data
– This is a question of smoothing a noisy series
[Figure: Seasonally adjusted (blue) and trend (red), 2000-2014]
[Figure: Seasonally adjusted (blue) and trend (red), 2007-2012]

Smoothing by averaging
• $P_t = (Y_{t-1} + Y_t + Y_{t+1})/3$
[Figure: 3-term simple moving average: [1,1,1]/3, 2007-2012]

Also called filtering
• $P_t = (Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2})/5$
• The filter is [1,1,1,1,1]/5
[Figure: 5-term simple moving average: [1,1,1,1,1]/5, 2007-2012]

Here the filter length is 9
[Figure: 9-term simple moving average: [1,1,1,1,1,1,1,1,1]/9, 2007-2012]

Filtering can be performed twice
• 3x3 filter
– 3-term moving average of a 3-term moving average
– The final filter is [1,2,3,2,1]/9
– $P_t = (Y_{t-2} + 2Y_{t-1} + 3Y_t + 2Y_{t+1} + Y_{t+2})/9$
• 2x12 filter
– [1/2,1,1,1,1,1,1,1,1,1,1,1,1/2]/12
– Also called a centred 12-term moving average
– Question to the audience: Why is this filter of special interest?

Henderson filters
• Finding filters with good properties is an interesting topic …
• Henderson (1916) introduced the so-called Henderson filters
• X-12-ARIMA uses this type of filter to calculate the trend
• The filter length determines the degree of smoothing
[Figure: 5-term Henderson: [-21,84,160,84,-21]/286, 2007-2012]
[Figure: 7-term Henderson: [-42,42,210,295,210,42,-42]/715, 2007-2012]
[Figure: 13-term Henderson: [-325,-468,0,1100,2475,3600,4032,3600,2475,1100,0,-468,-325]/16796, 2007-2012]
[Figure: 23-term Henderson filter, 2007-2012]

Question to the audience: Why does the filtered series stop in 2009?
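The filters on these slides can be reproduced with a short Python/NumPy sketch (the language and the helper names `apply_filter` and `henderson` are assumptions made for illustration). The Henderson weights are computed from the standard closed-form expression for symmetric Henderson filters, which is not given on the slides; the checks in the comments are the weights the slides do list.

```python
import numpy as np

def apply_filter(y, weights):
    """Apply a symmetric, odd-length filter to y (assumes len(y) >= len(weights)).

    Positions without a full window at either end are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    half = len(w) // 2
    out = np.full(len(y), np.nan)
    out[half:len(y) - half] = np.convolve(y, w, mode="valid")
    return out

def henderson(length):
    """Symmetric Henderson weights for an odd filter length."""
    m = length // 2
    n = m + 2
    j = np.arange(-m, m + 1, dtype=float)
    num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
           * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
    den = (8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
           * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    return num / den

# Filtering twice is the same as filtering once with the convolved weights.
ma3 = np.ones(3) / 3
ma3x3 = np.convolve(ma3, ma3)                            # [1,2,3,2,1]/9
ma2x12 = np.convolve(np.ones(12) / 12, np.ones(2) / 2)   # [1/2,1,...,1,1/2]/12

h5 = henderson(5)      # [-21,84,160,84,-21]/286
h13 = henderson(13)    # 13-term weights with denominator 16796

# A filter of length 2m+1 leaves m values undefined at each end of the series.
series = np.arange(30, dtype=float)
smoothed = apply_filter(series, h13)
print(np.isnan(smoothed).sum())
```

A filter of length 2m+1 cannot be evaluated for the first and last m observations, so longer filters lose more of the most recent data.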
[Figure: 99-term Henderson filter, 2007-2012]

Non-available observations at the end: Two solutions
• Asymmetric filters
– Asymmetric variant of Henderson: [-0.034,0.116,0.383,0.534,0,0,0]
– Can be used at the last observation
• Forecasts in place of the unobserved values
– The "starting series" for the X12-ARIMA decomposition is a calendar adjusted series which is based on reg-ARIMA modelling
– The reg-ARIMA modelling can also be used to produce forecasts
– X12-ARIMA uses these forecasts in trend calculations

Finding the seasonal component by filtering
[Figure: Series with trend removed, 2000-2014]
• From a series with the trend removed we make 12 series
– January-values, February-values, …
• Each of these series is smoothed by filtering
• Altogether these smoothed series are the seasonal component

The X12-ARIMA algorithm
• The decomposition is made by several iterative steps
– Seasonal component from series with trend removed
– Trend from series with seasonal component removed
• Initial estimate of trend using the 2x12 moving average
• One element is downweighting of observations with an extreme irregular component

X12-ARIMA or SEATS
• Both methods can be viewed as filtering techniques
• X12-ARIMA
– A non-parametric method
– No model assumed
• SEATS
– The components are assumed to follow ARIMA models
– The filters are derived from modelling
– Possible to do inference and to make forecasts with confidence intervals
– So why the name X12-ARIMA when this method is the one that is not based on ARIMA?
Answer on the next slide

Calendar adjustment by reg-ARIMA modelling
"The dirty model" mentioned earlier:
• Seasonal ARIMA model
– Correlated errors (autocorrelation)
– Differencing the series makes the model quite good without explicit parameters for trend and seasonality
– Need to decide the type of ARIMA model: ARIMA(p,d,q)(P,D,Q)
• Regression parameters in the model
– Calendar effects: Trading day, Moving holiday, …
– Outliers and level shifts
• Here y can be a log-transformed and leap-year adjusted variant of the original data

This slide is "stolen" from https://www.scss.tcd.ie/Rozenn.Dahyot/ST7005/15SeasonalARIMA.pdf
• Here B is the backshift operator: $B Y_t = Y_{t-1}$
• ARIMA(0,1,1)(0,1,1)
– Most common model
– Airline model

Example of regression variables in reg-ARIMA modelling
• Easter
– 2000 and 2001: Easter in April
– 2008: Easter in March
– 2002: 4 of 5 Norwegian Easter days in March
• Trading day
– Six parameters needed to model seven days
– Mon: Number of Mondays minus Number of Sundays

Month Year      Easter  Mon Tue Wed Thu Fri Sat
Jan   2000   0.0000000    0  -1  -1  -1  -1   0
Feb   2000   0.0000000    0   1   0   0   0   0
Mar   2000  -0.2571429    0   0   1   1   1   0
Apr   2000   0.2571429   -1  -1  -1  -1  -1   0
May   2000   0.0000000    1   1   1   0   0   0
Jun   2000   0.0000000    0   0   0   1   1   0
Jul   2000   0.0000000    0  -1  -1  -1  -1   0
Aug   2000   0.0000000    0   1   1   1   0   0
Sep   2000   0.0000000    0   0   0   0   1   1
Oct   2000   0.0000000    0   0  -1  -1  -1  -1
Nov   2000   0.0000000    0   0   1   1   0   0
Dec   2000   0.0000000   -1  -1  -1  -1   0   0
Jan   2001   0.0000000    1   1   1   0   0   0
Feb   2001   0.0000000    0   0   0   0   0   0
Mar   2001  -0.2571429    0   0   0   1   1   1
Apr   2001   0.2571429    0  -1  -1  -1  -1  -1
May   2001   0.0000000    0   1   1   1   0   0
Jun   2001   0.0000000    0   0   0   0   1   1
Jul   2001   0.0000000    0   0  -1  -1  -1  -1
Aug   2001   0.0000000    0   0   1   1   1   0
Sep   2001   0.0000000   -1  -1  -1  -1  -1   0
Oct   2001   0.0000000    1   1   1   0   0   0
Nov   2001   0.0000000    0   0   0   1   1   0
Dec   2001   0.0000000    0  -1  -1  -1  -1   0
Jan   2002   0.0000000    0   1   1   1   0   0
Feb   2002   0.0000000    0   0   0   0   0   0
Mar   2002   0.5428571   -1  -1  -1  -1   0   0
Apr   2002  -0.5428571    1   1   0   0   0   0
May   2002   0.0000000    0   0   1   1   1   0
:
Mar   2008   0.7428571    0  -1  -1  -1  -1   0
Apr   2008  -0.7428571    0   1   1   0   0   0
May   2008   0.0000000    0   0   0   1   1   1
Jun   2008   0.0000000    0  -1  -1  -1  -1  -1

Trading day: Separate effect of each day or common effect of all weekdays?

Regression Model
------------------------------------------------------
                      Parameter   Standard
Variable               Estimate      Error    t-value
------------------------------------------------------
Trading Day
  Mon                   -0.0019    0.00193      -1.00
  Tue                    0.0064    0.00194       3.31
  Wed                    0.0018    0.00190       0.94
  Thu                   -0.0016    0.00195      -0.81
  Fri                    0.0138    0.00188       7.37
  Sat                    0.0034    0.00193       1.73
  *Sun (derived)        -0.0219    0.00196     -11.16

Regression Model
------------------------------------------------------
                      Parameter   Standard
Variable               Estimate      Error    t-value
------------------------------------------------------
Trading Day
  Weekday                0.0036    0.00053       6.87
  **Sat/Sun (derived)   -0.0090    0.00131      -6.87

• Question to the audience:
– Why exactly equal t-values?

Outliers
• An extreme observation caused by a special event can be problematic
– Can influence the modelling in a negative way: parameter estimates, forecasts, decomposition
• Solution
– Include the outlier as a dummy variable in the reg-ARIMA modelling: ….0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0….
– The outlier is included in the irregular component after modelling
– The observation is still included in seasonally adjusted data, but has no effect on the trend
Question to the audience: Examples of special events?

[Figure: Data with outlier: Seasonally adjusted (blue) and trend (red), 2000-2014]
[Figure: Data with level shift: Seasonally adjusted (blue) and trend (red), 2000-2014]
• Level shift is handled similarly to outliers
– Regression variable: ….0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1….
– Level shift is included in the trend

Presented by
• László Sajtos
• Hungarian Central Statistical Office

Topics
• Seasonal adjustment step-by-step
• (A few) issues on seasonal adjustment

Seasonal adjustment step-by-step

Seasonal adjustment step-by-step: structure
[Flowchart: Input data; STEPS with check points; Preliminary results; Not acceptable results; If results are acceptable; Output data]

Time series analysis (STEP 0)
Basic conditions
• Length of time series (long enough to be seasonally adjusted?)
– Monthly datasets: at least 3 years long
– Quarterly datasets: at least 4 years long
– A time series of at least 5-7 years is optimal!
Expert information
• Collecting expert information about the datasets from the subject-matter sections (potential outliers, methodological changes, changes in external factors (e.g. law), connections to other time series and sectors)

Graphical analysis, test for seasonality (STEP 1)
• Graphical analysis via basic and sophisticated graphs
– Plotted raw dataset
– Spectral analysis: autocorrelogram and auto-regressive spectrum
• Identifying and explaining missing observations and outliers
• Correction of data faults
• Test for seasonality

Graphical analysis, an example (2000-2013)
[Figure: monthly series plotted by date, Jan2000-Jan2014, with annotations: Seasonality; Seems additive; Probably outliers]
Data: Hungarian monthly retail volume index, food

Type of transformation (STEP 2)
[Flowchart: Software tools; Automatic test; Verification; Graphical analysis]

Calendar adjustment (STEP 3)
[Flowchart: Determining factors which may affect (regressors) + national holidays; Significance; Little significance; Non-significance or absence; Keep; Consideration based on professional reasons; Elimination]

Outlier treatment (Step 4)
[Flowchart: Software tools; Available expert information; Automatic outlier testing; STEP 1; Verifying the results; Significant; Less significant, but professionally reasonable; Not significant; Keep it; Monitoring; Stability; Consideration based on professional reasons; Eliminate it]

ARIMA model (Step 5)
[Flowchart: Software tools; Automatic choice recommended; Good results; Not satisfying results; Keep model; Manual settings; Airline model; Reducing the order of the model; Other low ordered models]

Decomposition (Step 6)
[Flowchart: Software tools; Eliminating deterministic effects; Decomposition; Additive; Multiplicative; Log-additive]

Quality diagnostics (Step 7)
1. Model adequacy on residuals:
• Ljung-Box test
• Box-Pierce test
2. Seasonality: based on spectral graphics
3. Stability analysis: sliding spans
Documentation required!

Manual settings (Step 8)
In case of:
• Detailed analysis
• Quality diagnostics are not satisfactory
• Further outlier correction
• Other advanced settings (e.g.
confidence intervals)
[Flowchart: Manual settings; Quality diagnostics; satisfying; not satisfying; Manual settings]

Dissemination (STEP 9)

EXAMPLE (IN DEMETRA 2.04 SOFTWARE)
HUNGARIAN INDUSTRIAL TIME SERIES
Screenshot captions:
• Automated module
• Open the input database
• The list of time series
• Selection of time series output
• Save of output
• Diagnostic, outlier %
• Adjustment without fixed models
• Setting the method and trading day regressor
• Setting the country specific holidays
• The results
• Manual settings required
• Quality diagnostics

(A few) issues on seasonal adjustment

Issues in Memobust book
• Consistency issues
• Revision
• Treatment of the crisis
• Communication with users
• Data presentation
• Issues on chained indices
• Documentation

Revision
Unadjusted data
• Reasons:
– Data arrival after deadline
– Erroneous data etc.
• What to do: Data review
SA data
• Reasons:
– New information is available
– Better estimation required
• What to do: Estimating new model, new seasonal factors

Revision strategies
Goal: preserving accuracy and taking new information into consideration while avoiding large changes (reliability and stability)
Strategies:
• Extreme types
– Current
– Concurrent
• Alternative types
– Partial concurrent
– Controlled current

Horizon of revision
Question: How many months of data should be revised?
Practices:
• ESS Guideline: 3-4 years before the beginning of the revision period
• Statistics Denmark: at least 13 months back in time

Consistency issues
Linkages in the economy and among time series; expectations of users; errors; etc.
Issues:
• Time consistency issue
– Temporal constraints
– E.g. annual and infra-annual series
• Aggregation consistency issue
– Cross-sectional constraints
– E.g. total industrial and segmental series

Time consistency issues
Problem: consistency of, for instance, sub-annual and annual series, e.g.
$\sum_{i=1}^{4} GDP_i^{quarterly} \neq GDP^{annual}$
Sources of inconsistency:
• Less and more accurate data are compared
• Sampling errors
• Errors in evaluation

Benchmarking
Benchmark: typically annual data
Aim: providing time consistency; the techniques operate on the sum of the modified sub-annual series
[Diagram: Benchmarking; Pro-rating method; Denton method]

Pro-rating method
How it works: multiplies the sub-annual values by the corresponding annual proportional discrepancies
Example: three observations $(y_0, y_1, y_2)$, requirement: $y_0 = y_1 + y_2$
Corrected values:
$y_1 \to b_1 = y_1 \dfrac{y_0}{y_1 + y_2}$
$y_2 \to b_2 = y_2 \dfrac{y_0}{y_1 + y_2}$
$b_1 + b_2 = (y_1 + y_2) \dfrac{y_0}{y_1 + y_2} = y_0$

Denton method
How it works: based on quadratic optimization
Advantages:
• The method can be further developed and tailored
• More reliable results (smaller discontinuities compared with pro-rating)

Aggregation consistency
Aggregate series: a time series that consists of several components (e.g. industrial series)
Goal: the aggregate series should equal the sum of its components
Problem: the seasonal adjustment process is non-linear
Direct SA $(X + Y)^{SA}$ ≠ Indirect SA $X^{SA} + Y^{SA}$
Consequences: hard to preserve accounting relationships and to meet users' expectations

Methods to achieve aggregation consistency
• Only direct or indirect seasonal adjustment
• Pro-rating
• Denton method
• Regression based models

Thank you for your attention!
Questions?
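
Returning to the pro-rating slide, the correction can be sketched in Python with NumPy (an assumption; the slides name no software for this step). The function name `prorate` and the quarterly and annual figures are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def prorate(sub_annual, annual, periods_per_year=4):
    """Scale each year's sub-annual values so that they sum to the annual benchmark.

    Assumes complete years and non-zero sub-annual sums.
    """
    sub = np.asarray(sub_annual, dtype=float).reshape(-1, periods_per_year)
    factors = np.asarray(annual, dtype=float) / sub.sum(axis=1)  # y0 / (y1 + y2 + ...)
    return (sub * factors[:, None]).ravel()

# Hypothetical quarterly values for two years and their annual benchmarks.
quarterly = [10, 20, 30, 40, 20, 20, 30, 30]   # both years sum to 100
annual = [110, 90]

benchmarked = prorate(quarterly, annual)
print(benchmarked.reshape(-1, 4).sum(axis=1))   # yearly sums now match the benchmarks
```

In this example the correction factor changes from 1.1 to 0.9 between the last quarter of the first year and the first quarter of the second, which illustrates the kind of discontinuity the Denton slide refers to.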