732G29 Time series analysis
Fall semester 2009 • 7.5 ECTS credits
• Course tutor and examiner: Anders Nordgaard
• Course web: www.ida.liu.se/~732G29
• Course literature: Bowerman, O'Connell, Koehler: Forecasting, Time Series and Regression. 4th ed. Thomson Brooks/Cole, 2005. ISBN 0-534-40977-6.

Organization of this course:
• (Almost) weekly "meetings": a mixture of lectures and tutorials
• A large portion of self-study
• Weekly assignments
• Individual project at the end of the course
• Individual oral exam

Access to a computer is necessary. Computer rooms PC1-PC5 in Building E, ground floor, can be used when they are not booked for another course. For those of you who have your own PC, the software Minitab can be borrowed for installation.

Examination
The course is examined by
1. Homework exercises (assignments) and project work
2. Oral exam
Homework exercises and project work are marked Passed or Failed. If Failed, corrections must be made to obtain the mark Pass. Oral exam marks are given according to ECTS grades. To pass the oral exam, all homework exercises and the project work must have been marked Pass. The final grade is the same as the grade for the oral exam.

Communication
Contact with the course tutor is best made by e-mail: Anders.Nordgaard@liu.se. Office in Building B, Entrance 27, 2nd floor, corridor E (the small one close to Building E), room 3E:485. Telephone: 013-281974.
Working hours:
Odd-numbered weeks: Wed-Fri 8.00-16.30
Even-numbered weeks: Thu-Fri 8.00-16.30
E-mail is answered on all weekdays.
All necessary information will be communicated through the course web. Always use the English version. The first page contains the most recent information (messages). Assignments will successively be put on the course web, as well as information about the project. Solutions to assignments can be e-mailed or posted outside the office door.

Time series
[Figure: Sales figures, Jan 1998 - Dec 2001]
[Figure: Tot-P (ug/l), Råån, Helsingborg, 1980-2001]

Characteristics
• Non-independent observations (correlation structure)
• Systematic variation within a year (seasonal effects)
• Long-term increasing or decreasing level (trend)
• Irregular variation of small magnitude (noise)

Where can time series be found?
• Economic indicators: sales figures, employment statistics, stock market indices, ...
• Meteorological data: precipitation, temperature, ...
• Environmental monitoring: concentrations of nutrients and pollutants in air masses, rivers, marine basins, ...

Time series analysis
• Purpose: estimate the different parts of a time series in order to
  - understand the historical pattern
  - judge the current status
  - make forecasts of the future development
• Methodologies:
  Method                                                    This course?
  Time series regression                                    Yes
  Classical decomposition                                   Yes
  Exponential smoothing                                     Yes
  ARIMA modelling (Box-Jenkins)                             Yes
  Non-parametric and semi-parametric analysis               No
  Transfer function and intervention models                 No
  State space modelling                                     No
  Modern econometric methods: ARCH, GARCH, cointegration    No
  Spectral domain analysis                                  No

Time series regression?
Let $y_t$ = (observed) value of the time series at time point t, and assume a year is divided into L seasons.

Regression model (with linear trend):
$y_t = \beta_0 + \beta_1 t + \sum_{j=1}^{L-1} \beta_{s,j} x_{j,t} + \varepsilon_t$
where $x_{j,t} = 1$ if $y_t$ belongs to season j and 0 otherwise, j = 1, ..., L-1, and $\{\varepsilon_t\}$ are assumed to have zero mean and constant variance ($\sigma^2$).

The parameters $\beta_0, \beta_1, \beta_{s,1}, \ldots, \beta_{s,L-1}$ are estimated by the Ordinary Least Squares method:
$(b_0, b_1, b_{s,1}, \ldots, b_{s,L-1}) = \arg\min \sum_t \left( y_t - (\beta_0 + \beta_1 t + \sum_j \beta_{s,j} x_{j,t}) \right)^2$

Advantages:
• Simple and robust method
• Easily interpreted components
• Normal inference (confidence intervals, hypothesis testing) directly applicable
• Forecasting with prediction limits directly applicable

Drawbacks:
• Fixed components in the model (mathematical trend function and constant seasonal components)
• No consideration of correlation between observations

Example: Sales figures
[Figure: Sales figures, January 1998 - December 2001]

month   1998    1999    2000    2001
Jan     20.33   23.58   26.09   28.43
Feb     20.96   24.61   26.66   29.92
Mar     23.06   27.28   29.61   33.44
Apr     24.48   27.69   32.12   34.56
May     25.47   29.99   34.01   34.22
Jun     28.81   30.87   32.98   38.91
Jul     30.32   32.09   36.38   41.31
Aug     29.56   34.53   35.90   38.89
Sep     30.01   30.85   36.42   40.90
Oct     26.78   30.24   34.04   38.27
Nov     23.75   27.86   31.29   32.02
Dec     24.06   24.67   28.50   29.78

Construct seasonal indicators x1, x2, ..., x12:
January (1998-2001):  x1 = 1, x2 = 0, x3 = 0, ..., x12 = 0
February (1998-2001): x1 = 0, x2 = 1, x3 = 0, ..., x12 = 0
etc.
December (1998-2001): x1 = 0, x2 = 0, x3 = 0, ..., x12 = 1

sales   time  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
20.33    1     1  0  0  0  0  0  0  0  0  0   0   0
20.96    2     0  1  0  0  0  0  0  0  0  0   0   0
23.06    3     0  0  1  0  0  0  0  0  0  0   0   0
24.48    4     0  0  0  1  0  0  0  0  0  0   0   0
...
32.02   47     0  0  0  0  0  0  0  0  0  0   1   0
29.78   48     0  0  0  0  0  0  0  0  0  0   0   1

Use 11 indicators, e.g. x1-x11, in the regression model.
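The seasonal-dummy regression above can also be fitted outside Minitab. The following is a minimal Python/NumPy sketch, not part of the course material; the variable names and the choice of December as the reference season are assumptions made here to mirror the indicator coding x1-x11.

```python
# Minimal sketch (Python/NumPy, not the course's Minitab workflow): fit
# y_t = beta_0 + beta_1*t + sum_j beta_{s,j}*x_{j,t} + eps_t by ordinary least squares
# on the sales figures Jan 1998 - Dec 2001, with December as the reference season.
import numpy as np

sales = np.array([
    20.33, 20.96, 23.06, 24.48, 25.47, 28.81, 30.32, 29.56, 30.01, 26.78, 23.75, 24.06,
    23.58, 24.61, 27.28, 27.69, 29.99, 30.87, 32.09, 34.53, 30.85, 30.24, 27.86, 24.67,
    26.09, 26.66, 29.61, 32.12, 34.01, 32.98, 36.38, 35.90, 36.42, 34.04, 31.29, 28.50,
    28.43, 29.92, 33.44, 34.56, 34.22, 38.91, 41.31, 38.89, 40.90, 38.27, 32.02, 29.78,
])

n, L = len(sales), 12
t = np.arange(1, n + 1)                 # time index 1..48
season = (t - 1) % L                    # 0 = January, ..., 11 = December

# Design matrix: intercept, linear trend t, and 11 seasonal indicators x1..x11.
X = np.column_stack(
    [np.ones(n), t] + [(season == j).astype(float) for j in range(L - 1)]
)

b, *_ = np.linalg.lstsq(X, sales, rcond=None)
fitted = X @ b
residuals = sales - fitted

print("b0 (Constant):", round(b[0], 4))      # should be close to 18.8583 below
print("b1 (time)    :", round(b[1], 5))      # should be close to 0.26314 below
print("bs1..bs11    :", np.round(b[2:], 4))  # seasonal coefficients for x1..x11

# One-step-ahead prediction for t = 49 (January of the next year, x1 = 1):
x_new = np.r_[1.0, 49.0, 1.0, np.zeros(L - 2)]
print("prediction at t=49:", round(float(x_new @ b), 3))  # close to the Fit = 32.502 below
```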
Analysis with the software Minitab®

Regression Analysis: sales versus time, x1, ...

The regression equation is
sales = 18.9 + 0.263 time + 0.750 x1 + 1.42 x2 + 3.96 x3 + 5.07 x4 + 6.01 x5 + 7.72 x6 + 9.59 x7 + 9.02 x8 + 8.58 x9 + 6.11 x10 + 2.24 x11

Predictor   Coef      SE Coef   T      P
Constant    18.8583   0.6467    29.16  0.000
time        0.26314   0.01169   22.51  0.000
x1          0.7495    0.7791    0.96   0.343
x2          1.4164    0.7772    1.82   0.077
x3          3.9632    0.7756    5.11   0.000
x4          5.0651    0.7741    6.54   0.000
x5          6.0120    0.7728    7.78   0.000
x6          7.7188    0.7716    10.00  0.000
x7          9.5882    0.7706    12.44  0.000
x8          9.0201    0.7698    11.72  0.000
x9          8.5819    0.7692    11.16  0.000
x10         6.1063    0.7688    7.94   0.000
x11         2.2406    0.7685    2.92   0.006

S = 1.087   R-Sq = 96.6%   R-Sq(adj) = 95.5%

Analysis of Variance
Source          DF  SS        MS      F      P
Regression      12  1179.818  98.318  83.26  0.000
Residual Error  35  41.331    1.181
Total           47  1221.150

Source  DF  Seq SS
time    1   683.542
x1      1   79.515
x2      1   72.040
x3      1   16.541
x4      1   4.873
x5      1   0.204
x6      1   10.320
x7      1   63.284
x8      1   72.664
x9      1   100.570
x10     1   66.226
x11     1   10.039

Unusual Observations
Obs  time  sales   Fit     SE Fit  Residual  St Resid
12   12.0  24.060  22.016  0.583    2.044     2.23R
21   21.0  30.850  32.966  0.548   -2.116    -2.25R
R denotes an observation with a large standardized residual

Predicted Values for New Observations
New Obs  Fit     SE Fit  95.0% CI           95.0% PI
1        32.502  0.647   (31.189, 33.815)   (29.934, 35.069)

Values of Predictors for New Observations
New Obs  time  x1    x2-x11
1        49.0  1.00  all 0

[Figure: Sales figures with predicted value]

What about serial correlation in the data?
Positive serial correlation: values follow a smooth pattern.
Negative serial correlation: values show a "thorny" pattern.

How to detect it? Use the residuals:
$e_t = y_t - \hat{y}_t = y_t - \left( b_0 + b_1 t + \sum_{j=1}^{11} b_{s,j} x_{j,t} \right), \quad t = 1, \ldots, 48$

[Figure: Residual plot from the regression analysis vs. month number (from Jan 1998) - smooth or thorny?]

Durbin-Watson test on the residuals:
$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$

Rule of thumb: if d < 1 or d > 3, the conclusion is that the residuals (and the original data) are correlated. Use the shape of the plot (smooth or thorny) to decide whether the correlation is positive or negative. (More thorough rules for comparisons and decisions about positive or negative correlation exist.)

Durbin-Watson statistic = 2.05 (included in the Minitab output)
The value is > 1 and < 3: no significant serial correlation in the residuals!

What happens when the serial correlation is substantial?
Estimated parameters in a regression model get their usual variance properties from the fundamental conditions on the error terms $\{\varepsilon_t\}$:
• Zero mean
• Constant variance
• Uncorrelated
• (Normal distribution)
If any of the first three conditions is violated, the estimated variances of the estimated parameters are not correct:
• Significance tests for the parameters are not reliable
• Prediction limits cannot be trusted
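The Durbin-Watson statistic defined above is straightforward to compute directly from the residuals. A minimal sketch follows, assuming the `residuals` array from the regression sketch earlier; it is an illustration, not part of the course material.

```python
# Minimal sketch: Durbin-Watson statistic d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2.
import numpy as np

def durbin_watson(e: np.ndarray) -> float:
    """Durbin-Watson statistic of a residual series e_1, ..., e_n."""
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# d near 2 indicates little serial correlation; by the rule of thumb above,
# d < 1 suggests positive and d > 3 suggests negative serial correlation.
# print(durbin_watson(residuals))   # should be close to the 2.05 reported by Minitab
```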
How should the problem be handled?
Besides the DW test, carefully perform a graphical residual analysis. If the serial correlation is modest (DW test non-significant and the graphs look OK) it is usually acceptable to proceed. Otherwise, amendments to the model are needed, in particular by modelling the serial correlation (this will appear later in the course).

Classical decomposition
• Decompose, i.e. analyse, the observed time series into its different components:
  - Trend part (TR)
  - Seasonal part (SN)
  - Cyclical part (CL)
  - Irregular part (IR)
The cyclical part reflects the state of the market in economic time series; in environmental series it is usually treated together with TR.

• Multiplicative model: $y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$
  - Suitable for economic indicators
  - The level is present in $TR_t$ or in $TC_t = (TR \cdot CL)_t$
  - $SN_t$, $IR_t$ (and $CL_t$) work as indices
  - Seasonal variation increases with the level of $y_t$
  [Illustration: series whose seasonal variation grows with the level]

• Additive model: $y_t = TR_t + SN_t + CL_t + IR_t$
  - More suitable for environmental data
  - Requires constant seasonal variation
  - $SN_t$, $IR_t$ (and $CL_t$) vary around 0
  [Illustration: series with constant seasonal variation around the trend]

Example 1: Sales data
[Figure: Observed (blue) and deseasonalised (magenta) sales figures, Jan 98 - Dec 01]
[Figure: Observed (blue) and theoretical trend (magenta)]
[Figure: Observed (blue) with estimated trend line (black)]

Example 2: Estimation of components, working scheme (a Python sketch of this scheme follows after the list)

1. Seasonal adjustment / deseasonalisation:
• $SN_t$ usually has the largest amount of variation among the components. The time series is deseasonalised by calculating centred and weighted moving averages:
  $M_t^{(L)} = \frac{y_{t-L/2} + 2y_{t-(L/2-1)} + \cdots + 2y_t + \cdots + 2y_{t+(L/2-1)} + y_{t+L/2}}{2L}$
  where L = number of seasons within a year (L = 2 for half-year data, 4 for quarterly data and 12 for monthly data).
• $M_t$ becomes a rough estimate of $(TR \cdot CL)_t$.
• Rough seasonal components are obtained by
  - $y_t / M_t$ in a multiplicative model
  - $y_t - M_t$ in an additive model
• Mean values of the rough seasonal components are calculated for each season separately, giving L means.
• The L means are adjusted to
  - have an exact average of 1 (i.e. their sum equals L) in a multiplicative model
  - have an exact average of 0 (i.e. their sum equals 0) in an additive model
• The final estimates of the seasonal components are set to these adjusted means and are denoted $\bar{sn}_1, \ldots, \bar{sn}_L$.
• The time series is now deseasonalised by
  - $y_t^* = y_t / \bar{sn}_t$ in a multiplicative model
  - $y_t^* = y_t - \bar{sn}_t$ in an additive model
  where $\bar{sn}_t$ is one of $\bar{sn}_1, \ldots, \bar{sn}_L$ depending on which season t represents.

2. The seasonally adjusted values are used to estimate the trend component and, occasionally, the cyclical component.
If no cyclical component is present:
• Apply simple linear regression to the seasonally adjusted values. This gives estimates $tr_t$ of a linear or quadratic trend component.
• The residuals from the regression fit constitute estimates $ir_t$ of the irregular component.
If a cyclical component is present:
• Estimate the trend and cyclical components as a whole (do not split them) by
  $tc_t = \frac{y_{t-m}^* + y_{t-(m-1)}^* + \cdots + y_t^* + \cdots + y_{t+m}^*}{2m+1}$
  i.e. a non-weighted centred moving average of length 2m+1 calculated over the seasonally adjusted values.
• Common values for 2m+1: 3, 5, 7, 9, 11, 13.
• The choice of m is based on the properties of the final estimate of $IR_t$, which is calculated as
  - $ir_t = y_t^* / tc_t$ in a multiplicative model
  - $ir_t = y_t^* - tc_t$ in an additive model
• m is chosen so as to minimise the serial correlation and the variance of $ir_t$.
• 2m+1 is called the (number of) points of the moving average.
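The working scheme can be sketched in a few lines of Python/NumPy. This is a hedged illustration, not the Minitab procedure used in the course; the function names are made up here, and the series is assumed to start in season 1, have no missing values, and use an even number of seasons L (e.g. 12 for monthly data).

```python
# Minimal sketch of the working scheme above (Python/NumPy).
import numpy as np

def centred_ma(y: np.ndarray, L: int) -> np.ndarray:
    """Centred, weighted moving average
    M_t = (y_{t-L/2} + 2*y_{t-(L/2-1)} + ... + 2*y_t + ... + 2*y_{t+(L/2-1)} + y_{t+L/2}) / (2L).
    Returns NaN at the first and last L/2 positions, where the average is undefined."""
    w = np.r_[1.0, np.full(L - 1, 2.0), 1.0] / (2 * L)   # L + 1 symmetric weights
    m = np.full(len(y), np.nan)
    m[L // 2: len(y) - L // 2] = np.convolve(y, w, mode="valid")
    return m

def deseasonalise(y: np.ndarray, L: int, multiplicative: bool = True):
    """Step 1: seasonal components sn_1..sn_L and the deseasonalised series y*."""
    m = centred_ma(y, L)                                  # rough estimate of (TR*CL)_t
    rough = y / m if multiplicative else y - m            # rough seasonal components
    season = np.arange(len(y)) % L                        # 0, 1, ..., L-1, 0, 1, ...
    means = np.array([np.nanmean(rough[season == j]) for j in range(L)])
    # Adjust the L means so they average exactly 1 (multiplicative) or 0 (additive).
    sn = means * L / means.sum() if multiplicative else means - means.mean()
    y_star = y / sn[season] if multiplicative else y - sn[season]
    return sn, y_star

def tc_component(y_star: np.ndarray, m: int) -> np.ndarray:
    """Step 2 (cyclical component present): non-weighted centred (2m+1)-point
    moving average of the seasonally adjusted values."""
    w = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    tc = np.full(len(y_star), np.nan)
    tc[m: len(y_star) - m] = np.convolve(y_star, w, mode="valid")
    return tc
```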
Example, cont.: Home sales data
Minitab can be used for decomposition via Stat → Time Series → Decomposition. In the dialog there is an option to choose between the two models (additive/multiplicative).

Time Series Decomposition (additive model)
Data      Sold
Length    47.0000
NMissing  0

Trend Line Equation
Yt = 5.77613 + 4.30E-02*t

Seasonal Indices
Period  Index
1       -4.09028
2       -4.13194
3        0.909722
4       -1.09028
5        3.70139
6        0.618056
7        4.70139
8        4.70139
9       -1.96528
10       0.118056
11      -1.29861
12      -2.17361

Accuracy of Model
MAPE: 16.4122
MAD:   0.9025
MSD:   1.6902

The deseasonalised data have been stored in a column headed DESE1. Moving averages of this column can be calculated via Stat → Time Series → Moving Average, with a choice of 2m+1.

[Figure: TC component with 2m+1 = 3 (blue)]

MSD should be kept as small as possible. By saving the residuals from the moving averages we can calculate the MSD and the serial correlation for each choice of 2m+1:

2m+1   MSD     Corr(e_t, e_{t-1})
3      1.817   -0.444
5      1.577   -0.473
7      1.564   -0.424
9      1.602   -0.396
11     1.542   -0.431
13     1.612   -0.405

A 7-point or 9-point moving average seems most reasonable.

Serial correlations are easily calculated via Stat → Time Series → Lag followed by Stat → Basic Statistics → Correlation, or manually in the Session window:
MTB > lag 'RESI4' c50
MTB > corr 'RESI4' c50

Analysis with the multiplicative model:

Time Series Decomposition (multiplicative model)
Data      Sold
Length    47.0000
NMissing  0

Trend Line Equation
Yt = 5.77613 + 4.30E-02*t

Seasonal Indices
Period  Index
1       0.425997
2       0.425278
3       1.14238
4       0.856404
5       1.52471
6       1.10138
7       1.65646
8       1.65053
9       0.670985
10      1.02048
11      0.825072
12      0.700325

Accuracy of Model
MAPE: 16.8643
MAD:   0.9057
MSD:   1.6388

Classical decomposition, summary
Multiplicative model: $y_t = TR_t \cdot SN_t \cdot CL_t \cdot IR_t$
Additive model: $y_t = TR_t + SN_t + CL_t + IR_t$

Deseasonalisation
• Estimate the trend + cyclical component by a centred moving average:
  $CMA_t = \frac{y_{t-L/2} + 2y_{t-(L/2-1)} + \cdots + 2y_t + \cdots + 2y_{t+(L/2-1)} + y_{t+L/2}}{2L}$
  where L is the number of seasons (e.g. 12, 4, 2).
• Filter out the seasonal and error (irregular) components:
  - Multiplicative model: $sn_t \cdot ir_t = y_t / CMA_t$
  - Additive model: $sn_t + ir_t = y_t - CMA_t$

Calculate monthly averages for the seasons m = 1, ..., L (n_m = number of observations in season m):
  - Multiplicative model: $\bar{sn}_m = \frac{1}{n_m} \sum_{l \in \text{season } m} (sn_l \cdot ir_l)$
  - Additive model: $\bar{sn}_m = \frac{1}{n_m} \sum_{l \in \text{season } m} (sn_l + ir_l)$

Normalise the monthly means:
  - Multiplicative model: $sn_m = \bar{sn}_m \cdot \frac{L}{\sum_{l=1}^{L} \bar{sn}_l}$
  - Additive model: $sn_m = \bar{sn}_m - \frac{1}{L} \sum_{l=1}^{L} \bar{sn}_l$

Deseasonalise:
  - Multiplicative model: $d_t = y_t / sn_t$
  - Additive model: $d_t = y_t - sn_t$
  where $sn_t = sn_m$ for the current month (season) m.

Fit a trend function $tr_t = f(t)$ to the deseasonalised data and detrend:
  - Multiplicative model: $cl_t \cdot ir_t = d_t / tr_t$
  - Additive model: $cl_t + ir_t = d_t - tr_t$

Estimate the cyclical component and separate it from the error component:
  - Multiplicative model:
    $cl_t = \frac{(cl \cdot ir)_{t-k} + (cl \cdot ir)_{t-(k-1)} + \cdots + (cl \cdot ir)_t + \cdots + (cl \cdot ir)_{t+k}}{2k+1}, \qquad ir_t = (cl \cdot ir)_t / cl_t$
  - Additive model:
    $cl_t = \frac{(cl + ir)_{t-k} + (cl + ir)_{t-(k-1)} + \cdots + (cl + ir)_t + \cdots + (cl + ir)_{t+k}}{2k+1}, \qquad ir_t = (cl + ir)_t - cl_t$
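To tie the summary together, below is a minimal sketch of the full additive decomposition, reusing centred_ma, deseasonalise and tc_component from the earlier sketch. The linear trend fitted with np.polyfit and the MSD/lag-1 diagnostics are assumptions made for illustration, mirroring the criteria discussed above; this is not the course's Minitab workflow and will not reproduce Minitab's output exactly.

```python
# Minimal sketch of the additive classical decomposition summarised above.
# Assumptions: additive model, linear trend f(t) = b0 + b1*t, series starts in season 1.
import numpy as np

def classical_decomposition_additive(y: np.ndarray, L: int = 12, k: int = 3):
    t = np.arange(1, len(y) + 1)
    season = (t - 1) % L

    # Deseasonalise: seasonal components sn_t and d_t = y_t - sn_t.
    sn, d = deseasonalise(y, L, multiplicative=False)

    # Fit the trend function tr_t = f(t) to the deseasonalised data and detrend.
    b1, b0 = np.polyfit(t, d, deg=1)          # slope, intercept
    tr = b0 + b1 * t
    cl_ir = d - tr                            # cl_t + ir_t = d_t - tr_t

    # Estimate the cyclical component with a (2k+1)-point moving average
    # and separate the irregular component.
    cl = tc_component(cl_ir, k)
    ir = cl_ir - cl

    # Diagnostics analogous to the MSD / lag-1 correlation used above when choosing 2m+1.
    e = ir[~np.isnan(ir)]
    msd = float(np.mean(e ** 2))
    lag1 = float(np.corrcoef(e[1:], e[:-1])[0, 1])
    return {"sn": sn[season], "tr": tr, "cl": cl, "ir": ir,
            "msd": msd, "lag1_corr": lag1}
```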