Rob J Hyndman
Automatic algorithms for time series forecasting

Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Automatic nonlinear forecasting?
5 Time series with complex seasonality
6 Hierarchical and grouped time series
7 The future of forecasting

Motivation
1 It is common in business to have over 1000 products that need forecasting at least monthly.
2 Forecasts are often required by people who are untrained in time series analysis.

Specifications
Automatic forecasting algorithms must:
- determine an appropriate time series model;
- estimate the parameters;
- compute the forecasts with prediction intervals.

Example: Asian sheep
[Figure: Numbers of sheep in Asia, millions of sheep, 1960-2010, with automatic ETS forecasts.]

Example: Cortecosteroid sales
[Figure: Monthly cortecosteroid drug sales in Australia, total scripts (millions), 1995-2010, with forecasts from ARIMA(3,1,3)(0,1,1)[12].]
Exponential smoothing

Exponential smoothing methods

                            Seasonal Component
Trend Component             N (None)   A (Additive)   M (Multiplicative)
N (None)                    N,N        N,A            N,M
A (Additive)                A,N        A,A            A,M
Ad (Additive damped)        Ad,N       Ad,A           Ad,M
M (Multiplicative)          M,N        M,A            M,M
Md (Multiplicative damped)  Md,N       Md,A           Md,M

N,N:   Simple exponential smoothing
A,N:   Holt's linear method
Ad,N:  Additive damped trend method
M,N:   Exponential trend method
Md,N:  Multiplicative damped trend method
A,A:   Additive Holt-Winters' method
A,M:   Multiplicative Holt-Winters' method
There are 15 separate exponential smoothing methods.
Each can have an additive or multiplicative error, giving 30 separate models.
Only 19 models are numerically stable.
Multiplicative trend models give poor forecasts, leaving 15 models.
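As a concrete instance of the simplest entry in the table, here is a minimal R sketch of the N,N method (simple exponential smoothing); the fixed smoothing parameter alpha and the initial level l0 are illustrative assumptions, not part of the taxonomy:

ses_level <- function(y, alpha = 0.2, l0 = y[1]) {
  # l[t] = alpha * y[t] + (1 - alpha) * l[t-1]; l[t] is the forecast of y[t+1]
  l <- numeric(length(y))
  prev <- l0
  for (t in seq_along(y)) {
    l[t] <- alpha * y[t] + (1 - alpha) * prev
    prev <- l[t]
  }
  l
}

flat <- ses_level(rnorm(50) + 10)  # all forecasts beyond the sample equal l[n]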
General notation: ETS(Error, Trend, Seasonal), i.e. ExponenTial Smoothing.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
Innovations state space models
- All ETS models can be written in innovations state space form (IJF, 2002).
- Additive and multiplicative versions give the same point forecasts but different prediction intervals.

ETS state space model
[Diagram: each observation yt is generated from the previous state xt−1 and an innovation εt; the state xt = (level, slope, seasonal) is then updated, and the chain continues through yt+1, yt+2, ...]

Estimation: compute the likelihood L from ε1, ε2, ..., εT, and optimize L with respect to the model parameters.

Let xt = (ℓt, bt, st, st−1, ..., st−m+1) and εt ~ iid N(0, σ²).

yt = h(xt−1) + k(xt−1)εt    (observation equation, with µt = h(xt−1) and et = k(xt−1)εt)
xt = f(xt−1) + g(xt−1)εt    (state equation)

Additive errors: k(xt−1) = 1, so yt = µt + εt.
Multiplicative errors: k(xt−1) = µt, so yt = µt(1 + εt), and εt = (yt − µt)/µt is a relative error.
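A minimal R sketch simulating the local-level special case of this form (trend and seasonality omitted), to show how the same innovation drives both equations; the parameter values are made up for illustration:

simulate_local_level <- function(n, alpha = 0.3, l0 = 10, sigma = 0.05,
                                 multiplicative = TRUE) {
  l <- l0
  y <- numeric(n)
  for (t in 1:n) {
    eps <- rnorm(1, 0, sigma)
    mu <- l                              # h(x_{t-1}): one-step mean
    k  <- if (multiplicative) mu else 1  # k(x_{t-1})
    y[t] <- mu + k * eps                 # observation equation
    l <- l + alpha * k * eps             # state equation: g(x_{t-1}) = alpha * k
  }
  ts(y)
}

y_sim <- simulate_local_level(100)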
Estimation

L*(θ, x0) = n log( Σ ε²t / k²(xt−1) ) + 2 Σ log|k(xt−1)|   (sums over t = 1, ..., n)
          = −2 log(Likelihood) + constant

Minimize L* with respect to the parameters θ = (α, β, γ, φ) and the initial states x0 = (ℓ0, b0, s0, s−1, ..., s−m+1).
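A sketch of this criterion for ETS(A,N,N), where k(xt−1) = 1 so L* reduces to n log(Σ ε²t); the starting values passed to optim() are illustrative assumptions, and y_sim comes from the simulation sketch above:

ann_Lstar <- function(par, y) {
  # par = (alpha, l0); filter the one-step innovations, then
  # L* = n * log(sum(eps^2))  (the k terms vanish when k = 1)
  alpha <- par[1]
  l <- par[2]
  eps <- numeric(length(y))
  for (t in seq_along(y)) {
    eps[t] <- y[t] - l        # innovation: eps_t = y_t - mu_t
    l <- l + alpha * eps[t]   # level update
  }
  length(y) * log(sum(eps^2))
}

fit <- optim(c(alpha = 0.5, l0 = y_sim[1]), ann_Lstar, y = y_sim,
             method = "L-BFGS-B", lower = c(1e-4, -Inf), upper = c(0.9999, Inf))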
Q: How to choose between the 15 useful ETS models?

Cross-validation
[Diagram: dots marking training and test observations under each scheme.]
- Traditional evaluation: a single split into training data and test data.
- Standard cross-validation: test observations drawn from throughout the sample, with the remaining observations used for training.
- Time series cross-validation: each test observation follows its own training set, and the training set grows one observation at a time. Also known as "evaluation on a rolling forecast origin".
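A sketch of the rolling forecast origin in R, assuming the h02 series used later in the talk and one-step ses() forecasts from the forecast package:

library(forecast)
n <- length(h02)
e <- rep(NA, n)                             # one-step forecast errors
for (k in 24:(n - 1)) {
  train <- window(h02, end = time(h02)[k])  # first k observations
  fc <- ses(train, h = 1)                   # refit and forecast one step ahead
  e[k + 1] <- h02[k + 1] - fc$mean[1]
}
cv_mse <- mean(e^2, na.rm = TRUE)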
Akaike's Information Criterion

AIC = −2 log(L) + 2k

where L is the likelihood and k is the number of estimated parameters in the model. This is a penalized likelihood approach.

If L is Gaussian, then AIC ≈ c + T log MSE + 2k, where c is a constant, MSE is from one-step forecasts on the training set, and T is the length of the series.

Minimizing the Gaussian AIC is asymptotically equivalent (as T → ∞) to minimizing the MSE of one-step forecasts on the test set via time series cross-validation.

Corrected AIC
For small T, the AIC tends to over-fit.
Bias-corrected version:

AICc = AIC + 2(k+1)(k+2)/(T − k)

Bayesian Information Criterion

BIC = AIC + k[log(T) − 2]

- BIC penalizes extra terms more heavily than AIC.
- Minimizing BIC is consistent if there is a true model.

What to use?
Choice: AIC, AICc, BIC, CV-MSE.
- CV-MSE is too time-consuming for most automatic forecasting purposes, and requires large T.
- As T → ∞, BIC selects the true model if there is one. But that is never true!
- AICc focuses on forecasting performance, can be used on small samples, and is very fast to compute.
- Empirical studies in forecasting show AIC is better than BIC for forecast accuracy.

ets algorithm in R
Based on Hyndman, Koehler, Snyder & Grose (IJF 2002):
- Apply each of the 15 models that are appropriate to the data.
- Optimize parameters and initial values using MLE.
- Select the best model using AICc (see the sketch below).
- Produce forecasts using the best model.
- Obtain prediction intervals using the underlying state space model.
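These criteria are all reported by ets(); as a check against the definitions above, a sketch recomputing them from a fitted model. The parameter count k here is an assumption: ets() may count the variance term or initial states differently, so small discrepancies are possible.

library(forecast)
fit <- ets(h02)
k <- length(fit$par)              # estimated parameters (assumed count)
T <- length(fit$x)                # length of the series
aic  <- -2 * fit$loglik + 2 * k
aicc <- aic + 2 * (k + 1) * (k + 2) / (T - k)
bic  <- aic + k * (log(T) - 2)
c(aic = aic, aicc = aicc, bic = bic)  # compare with fit$aic, fit$aicc, fit$bic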
Exponential smoothing examples

Forecasts from ETS(M,A,N) (Asian sheep data):

fit <- ets(livestock)
fcast <- forecast(fit)
plot(fcast)

Forecasts from ETS(M,N,M) (cortecosteroid sales):

fit <- ets(h02)
fcast <- forecast(fit)
plot(fcast)

> fit
ETS(M,N,M)

Smoothing parameters:
  alpha = 0.4597
  gamma = 1e-04

Initial states:
  l = 0.4501
  s = 0.8628 0.8193 0.7648 0.7675 0.6946 1.2921
      1.3327 1.1833 1.1617 1.0899 1.0377 0.9937

sigma: 0.0675

       AIC        AICc        BIC
-115.69960  -113.47738  -69.24592

M3 comparisons

Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
ForecastX       17.35   13.09   1.42
Automatic ANN   17.18   13.98   1.53
B-J automatic   19.13   13.72   1.54
ETS             17.38   13.13   1.43

www.OTexts.org/fpp
ARIMA modelling

ARIMA models
[Diagram: lagged inputs yt−1, yt−2, yt−3 and lagged errors εt, εt−1, εt−2 feeding the output yt.]
- Autoregression (AR) model: yt regressed on its own lagged values.
- Autoregression moving average (ARMA) model: yt regressed on its own lagged values and on lagged errors.
- ARIMA model: an ARMA model applied to differences of the series.

Estimation: compute the likelihood L from ε1, ε2, ..., εT, and use an optimization algorithm to maximize L.

Auto ARIMA

Forecasts from ARIMA(0,1,0) with drift (Asian sheep data):

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

Forecasts from ARIMA(3,1,3)(0,1,1)[12] (cortecosteroid sales):

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

> fit
Series: h02
ARIMA(3,1,3)(0,1,1)[12]

Coefficients:
          ar1      ar2     ar3      ma1     ma2     ma3     sma1
      -0.3648  -0.0636  0.3568  -0.4850  0.0479  -0.353  -0.5931
s.e.   0.2198   0.3293  0.1268   0.2227  0.2755   0.212   0.0651

sigma^2 estimated as 0.002706: log likelihood=290.25
AIC=-564.5  AICc=-563.71  BIC=-538.48

How does auto.arima() work?

A non-seasonal ARIMA process:

φ(B)(1 − B)^d yt = c + θ(B)εt

Need to select appropriate orders p, q, d, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of differences d via the KPSS unit root test.
- Select p, q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices are driven by forecast accuracy.
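A toy version of the order-selection step, assuming the forecast package; it uses a small grid rather than the paper's stepwise search, which matters only for speed:

library(forecast)
d <- ndiffs(livestock, test = "kpss")       # KPSS-based choice of d
cands <- expand.grid(p = 0:2, q = 0:2)
aiccs <- apply(cands, 1, function(r) {
  fit <- try(Arima(livestock, order = c(r["p"], d, r["q"]),
                   include.drift = (d == 1)), silent = TRUE)
  if (inherits(fit, "try-error")) Inf else fit$aicc
})
cands[which.min(aiccs), ]                   # best (p, q) by AICc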
A seasonal ARIMA process:

Φ(B^m)φ(B)(1 − B)^d (1 − B^m)^D yt = c + Θ(B^m)θ(B)εt

Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:
- Select the number of first differences d via the KPSS unit root test.
- Select D using the OCSB unit root test.
- Select p, q, P, Q, c by minimising the AICc.
- Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

M3 comparisons

Method          MAPE    sMAPE   MASE
Theta           17.42   12.76   1.39
ForecastPro     18.00   13.06   1.47
B-J automatic   19.13   13.72   1.54
ETS             17.38   13.13   1.43
AutoARIMA       19.12   13.85   1.47

Automatic nonlinear forecasting?
- The automatic ANN in the M3 competition did poorly.
- Linear methods did best in the NN3 competition!
- Very few machine learning methods get published in the IJF because the authors cannot demonstrate that their methods give better forecasts than linear benchmark methods, even on supposedly nonlinear data.
- There is some good recent work by Kourentzes and Crone on automated ANNs for time series. Watch this space!
Time series with complex seasonality

Examples
[Figure: US finished motor gasoline products, thousands of barrels per day, weekly, 1992-2005.]
[Figure: Number of calls to a large American bank (7am-9pm), 5-minute intervals, 3 March to 12 May.]
[Figure: Turkish electricity demand (GW), daily, 2000-2008.]

TBATS model
- Trigonometric terms for seasonality
- Box-Cox transformations for heterogeneity
- ARMA errors for short-term dynamics
- Trend (possibly damped)
- Seasonal (including multiple and non-integer periods)

Automatic algorithm described in AM De Livera, RJ Hyndman and RD Snyder (2011), "Forecasting time series with complex seasonal patterns using exponential smoothing", Journal of the American Statistical Association 106(496), 1513-1527. http://pubs.amstat.org/doi/abs/10.1198/jasa.2011.tm09771

TBATS model (yt = observation at time t)

Box-Cox transformation:
yt(ω) = (yt^ω − 1)/ω  if ω ≠ 0;   log yt  if ω = 0

Measurement equation (M seasonal periods):
yt(ω) = ℓt−1 + φ bt−1 + Σi=1..M s(i)t−mi + dt

Global and local trend:
ℓt = ℓt−1 + φ bt−1 + α dt
bt = (1 − φ)b + φ bt−1 + β dt

ARMA error:
dt = Σi=1..p φi dt−i + Σj=1..q θj εt−j + εt

Fourier-like seasonal terms:
s(i)t = Σj=1..ki s(i)j,t
s(i)j,t  =  s(i)j,t−1 cos λ(i)j + s*(i)j,t−1 sin λ(i)j + γ1(i) dt
s*(i)j,t = −s(i)j,t−1 sin λ(i)j + s*(i)j,t−1 cos λ(i)j + γ2(i) dt
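To make the seasonal recursion concrete, a standalone R sketch iterating a single trigonometric seasonal block; the period m, number of harmonics k and the γ values are invented for illustration, and this is not the tbats() implementation:

trig_season <- function(d, m = 7, k = 3, gamma1 = 0.01, gamma2 = 0.01) {
  lambda <- 2 * pi * (1:k) / m   # harmonic frequencies lambda_j = 2*pi*j/m
  s  <- rnorm(k, 0, 0.1)         # s_{j,0}:  arbitrary starting values
  ss <- rnorm(k, 0, 0.1)         # s*_{j,0}
  out <- numeric(length(d))
  for (t in seq_along(d)) {
    s_new  <-  s * cos(lambda) + ss * sin(lambda) + gamma1 * d[t]
    ss_new <- -s * sin(lambda) + ss * cos(lambda) + gamma2 * d[t]
    s <- s_new
    ss <- ss_new
    out[t] <- sum(s)             # seasonal component: sum over the k harmonics
  }
  out
}

st <- trig_season(rnorm(100))    # seasonal path driven by white-noise d_t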
Examples

Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}):

fit <- tbats(gasoline)
fcast <- forecast(fit)
plot(fcast)

Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}):

fit <- tbats(callcentre)
fcast <- forecast(fit)
plot(fcast)

Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}):

fit <- tbats(turk)
fcast <- forecast(fit)
plot(fcast)

Hierarchical and grouped time series

Hierarchical time series
A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.
[Diagram: Total disaggregates into A, B, C; A into AA, AB, AC; B into BA, BB, BC; C into CA, CB, CC.]

Examples: net labour turnover; tourism by state and region.

Notation:
- Yt: observed aggregate of all series at time t.
- YX,t: observation on series X at time t.
- bt: vector of all series at the bottom level at time t.
For a single-level hierarchy with children A, B and C:

yt = [Yt, YA,t, YB,t, YC,t]'
   = [1 1 1
      1 0 0
      0 1 0
      0 0 1] [YA,t, YB,t, YC,t]'

i.e. yt = S bt, where S is the "summing" matrix.

The same construction applies to deeper hierarchies. With three children under each of A, B and C (AX, AY, AZ, BX, ..., CZ), yt stacks the total, the three middle-level series and the nine bottom-level series, and S is the corresponding 13 × 9 matrix of 0s and 1s, again with yt = S bt.
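A sketch of the summing matrix for the small hierarchy above, with made-up bottom-level values:

S <- rbind(c(1, 1, 1),  # Total = A + B + C
           diag(3))     # A, B, C pass through unchanged
bt <- c(A = 3, B = 5, C = 2)   # made-up bottom-level observations
yt <- S %*% bt                 # stacks [Y, YA, YB, YC]' = S bt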
Forecasting notation
Let ŷn(h) be the vector of initial h-step forecasts, made at time n, stacked in the same order as yt. (They may not add up.)

Reconciled forecasts are of the form

ỹn(h) = S P ŷn(h)

for some matrix P:
- P extracts and combines the base forecasts ŷn(h) to get bottom-level forecasts;
- S adds them up.

General properties

Forecast bias: assuming the base forecasts ŷn(h) are unbiased, the revised forecasts are unbiased iff SPS = S.

Forecast variance: for any given P satisfying SPS = S, the covariance matrix of the h-step-ahead reconciled forecast errors is

Var[yn+h − ỹn(h)] = S P Wh P' S'

where Wh is the covariance matrix of the h-step-ahead base forecast errors.

BLUF via trace minimization

Theorem: for any P satisfying SPS = S, the minimum of trace[S P Wh P' S'] over P is achieved at P = (S' Wh† S)^−1 S' Wh†, where Wh† is a generalized inverse of Wh.
The revised forecasts are then obtained from the base forecasts as

ỹn(h) = S (S' Wh† S)^−1 S' Wh† ŷn(h).

This is equivalent to the GLS estimate of the regression ŷn(h) = S βn(h) + εh, where εh ~ N(0, Wh).
Problem: Wh is hard to estimate.

Optimal combination forecasts

Solution 1: OLS

ỹn(h) = S (S'S)^−1 S' ŷn(h)

Solution 2: WLS
Approximate W1 by its diagonal and assume Wh = kh W1. This is easy to estimate, and places weight where we have the best one-step forecasts:

ỹn(h) = S (S'ΛS)^−1 S'Λ ŷn(h)
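Both solutions are one line of linear algebra; a sketch reusing S from the earlier hierarchy sketch, with made-up base forecasts and weights:

yhat <- c(10.5, 3.2, 4.9, 2.1)     # base forecasts: Total, A, B, C (don't add up)
ols  <- S %*% solve(t(S) %*% S, t(S) %*% yhat)                        # OLS
Lambda <- diag(1 / c(4, 1, 1, 1))  # e.g. inverse one-step error variances
wls  <- S %*% solve(t(S) %*% Lambda %*% S, t(S) %*% Lambda %*% yhat)  # WLS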
Challenges

ỹ_n(h) = S (S′ Λ S)^{−1} S′ Λ ŷ_n(h)

Computational difficulties in big hierarchies, due to the size of the S matrix and singular behavior of (S′ Λ S).
Loss of information in ignoring the covariance matrix when computing point forecasts.
We still need to estimate the covariance matrix to produce prediction intervals.
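One way to ease the first difficulty (a suggestion on my part, not a method from the talk) is to store S in sparse form, since it contains a single 1 for each aggregate that a bottom-level series belongs to. A sketch using the Matrix package, with a made-up region-to-zone and region-to-state assignment:

library(Matrix)
n <- 82                                       # bottom-level regions
zone  <- sample(1:27, n, replace = TRUE)      # toy region-to-zone map
state <- sample(1:7,  n, replace = TRUE)      # toy region-to-state map
S <- rbind(
  Matrix(1, 1, n),                            # Total row
  sparseMatrix(i = state, j = 1:n, x = 1, dims = c(7, n)),   # state rows
  sparseMatrix(i = zone,  j = 1:n, x = 1, dims = c(27, n)),  # zone rows
  Diagonal(n)                                 # one row per region
)
dim(S)                                        # 117 x 82, mostly zeros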
Australian tourism

Hierarchy: States (7), Zones (27), Regions (82)
Base forecasts: ETS (exponential smoothing) models

[Figures: base ETS forecasts of domestic visitor nights, 1998-2008, for the Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly series.]

[Figures: reconciled forecasts, 2000-2010, for the Total, state (NSW, VIC, QLD, Other) and capital-city (Sydney, Melbourne, GC and Brisbane) series, with the corresponding "Other" series at each level.]

Forecast evaluation

Select models using all observations.
Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts.
Increase the sample size one observation at a time, re-estimating the models and generating forecasts until the end of the sample.
In total this gives 24 one-step-ahead forecasts, 23 two-steps-ahead, ..., down to 17 eight-steps-ahead for forecast evaluation.
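The expanding-window scheme is straightforward to reproduce for a single series. The sketch below is my own illustration using the forecast package and a simulated stand-in series of 36 observations, which yields exactly the counts above (24 one-step-ahead errors down to 17 eight-steps-ahead):

library(forecast)
y <- ts(rnorm(36, mean = 100), frequency = 4)  # stand-in for one tourism series
h <- 8
err <- matrix(NA, nrow = 24, ncol = h)         # one row per forecast origin
for (origin in 12:35) {
  fit <- ets(window(y, end = time(y)[origin])) # re-estimate at each origin
  fc  <- forecast(fit, h = min(h, 36 - origin))
  steps <- seq_along(fc$mean)
  err[origin - 11, steps] <- y[origin + steps] - fc$mean
}
colSums(!is.na(err))                           # 24 23 22 21 20 19 18 17
colMeans(abs(err), na.rm = TRUE)               # MAE by horizon; the talk reports MAPE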
Hierarchy: states, zones, regions (MAPE)

Top level: Australia
           h=1     h=2     h=4     h=6     h=8     Average
Bottom-up  3.79    3.58    4.01    4.55    4.24    4.06
OLS        3.83    3.66    3.88    4.19    4.25    3.94
WLS        3.68    3.56    3.97    4.57    4.25    4.04

Level: States
Bottom-up  10.70   10.52   10.85   11.46   11.27   11.03
OLS        11.07   10.58   11.13   11.62   12.21   11.35
WLS        10.44   10.17   10.47   10.97   10.98   10.67

Level: Zones
Bottom-up  14.99   14.97   14.98   15.69   15.65   15.32
OLS        15.16   15.06   15.27   15.74   16.15   15.48
WLS        14.63   14.62   14.68   15.17   15.25   14.94

Bottom level: Regions
Bottom-up  33.12   32.54   32.26   33.74   33.96   33.18
OLS        35.89   33.86   34.26   36.06   37.49   35.43
WLS        31.68   31.22   31.08   32.41   32.77   31.89

hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series

Version:    4.5
Depends:    forecast (≥ 5.0), SparseM
Imports:    parallel, utils
Published:  2014-12-09
Author:     Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License:    GPL (≥ 2)
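In code, the whole pipeline of this section reduces to a couple of hts calls. The sketch below shows indicative usage on a simulated hierarchy; the nodes specification and the argument defaults follow my reading of the package documentation rather than anything shown in the talk.

library(hts)
# 8 toy bottom-level series arranged as 2 states -> 4 zones -> 8 regions
bts <- ts(matrix(abs(rnorm(36 * 8, 100, 10)), ncol = 8), frequency = 4)
y <- hts(bts, nodes = list(2, c(2, 2), c(2, 2, 2, 2)))
fc <- forecast(y, h = 8, method = "comb",      # optimal combination reconciliation
               fmethod = "ets")                # ETS base forecasts, as in the example
plot(fc)                                       # reconciled forecasts at every level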
Outline
1 Motivation
2 Exponential smoothing
3 ARIMA modelling
4 Automatic nonlinear forecasting?
5 Time series with complex seasonality
6 Hierarchical and grouped time series
7 The future of forecasting

Forecasts about forecasting
1 Automatic algorithms will become more general, handling a wide variety of time series.
2 Model selection methods will take account of multi-step forecast accuracy as well as one-step forecast accuracy.
3 Automatic forecasting algorithms for multivariate time series will be developed.
4 Automatic forecasting algorithms that include covariate information will be developed.

For further information: robjhyndman.com
Slides and references for this talk.
Links to all papers and books.
Links to R packages.
A blog about forecasting research.