Rob J Hyndman Forecasting: Principles and Practice 11. Hierarchical forecasting OTexts.com/fpp/9/4/ Forecasting: Principles and Practice 1 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Hierarchical and grouped time series 2 Introduction Total A AA AB C B AC BA BB BC CA CB CC Examples Manufacturing product hierarchies Net labour turnover Pharmaceutical sales Tourism demand by region and purpose Forecasting: Principles and Practice Hierarchical and grouped time series 3 Introduction Total A AA AB C B AC BA BB BC CA CB CC Examples Manufacturing product hierarchies Net labour turnover Pharmaceutical sales Tourism demand by region and purpose Forecasting: Principles and Practice Hierarchical and grouped time series 3 Introduction Total A AA AB C B AC BA BB BC CA CB CC Examples Manufacturing product hierarchies Net labour turnover Pharmaceutical sales Tourism demand by region and purpose Forecasting: Principles and Practice Hierarchical and grouped time series 3 Introduction Total A AA AB C B AC BA BB BC CA CB CC Examples Manufacturing product hierarchies Net labour turnover Pharmaceutical sales Tourism demand by region and purpose Forecasting: Principles and Practice Hierarchical and grouped time series 3 Introduction Total A AA AB C B AC BA BB BC CA CB CC Examples Manufacturing product hierarchies Net labour turnover Pharmaceutical sales Tourism demand by region and purpose Forecasting: Principles and Practice Hierarchical and grouped time series 3 Forecasting the PBS Forecasting: Principles and Practice Hierarchical and grouped time series 4 ATC drug classification A B C D G H J L M N P R S V Alimentary tract and metabolism Blood and blood forming organs Cardiovascular system Dermatologicals Genito-urinary system and sex hormones Systemic hormonal preparations, excluding sex hormones and insulins Anti-infectives for systemic use Antineoplastic and immunomodulating agents Musculo-skeletal system Nervous system Antiparasitic products, insecticides and repellents Respiratory system Sensory organs Various Forecasting: Principles and Practice Hierarchical and grouped time series 5 ATC drug classification 14 classes A 84 classes A10 A10B Alimentary tract and metabolism Drugs used in diabetes Blood glucose lowering drugs A10BA Biguanides A10BA02 Metformin Forecasting: Principles and Practice Hierarchical and grouped time series 6 Australian tourism Forecasting: Principles and Practice Hierarchical and grouped time series 7 Australian tourism Also split by purpose of travel: Holiday Visits to friends and relatives Business Other Forecasting: Principles and Practice Hierarchical and grouped time series 7 Hierarchical/grouped time series A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: Australian tourism demand is grouped by region and purpose of travel. Forecasting: Principles and Practice Hierarchical and grouped time series 8 Hierarchical/grouped time series A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: Australian tourism demand is grouped by region and purpose of travel. Forecasting: Principles and Practice Hierarchical and grouped time series 8 Hierarchical/grouped time series A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: Australian tourism demand is grouped by region and purpose of travel. Forecasting: Principles and Practice Hierarchical and grouped time series 8 Hierarchical/grouped time series A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System. A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways. Example: Australian tourism demand is grouped by region and purpose of travel. Forecasting: Principles and Practice Hierarchical and grouped time series 8 Hierarchical data Total A B C Forecasting: Principles and Practice Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. Hierarchical and grouped time series 9 Hierarchical data Total A B C Forecasting: Principles and Practice Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. Hierarchical and grouped time series 9 Hierarchical data Total A B C Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. 1 1 Yt = [Yt , YA,t , YB,t , YC,t ]0 = 0 0 Forecasting: Principles and Practice 1 0 1 0 1 Y A , t 0 YB , t 0 YC,t 1 Hierarchical and grouped time series 9 Hierarchical data Total A B C Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. 1 0 1 0 | {z 1 1 Yt = [Yt , YA,t , YB,t , YC,t ]0 = 0 0 Forecasting: Principles and Practice S 1 Y A , t 0 YB,t 0 YC,t 1 } Hierarchical and grouped time series 9 Hierarchical data Total A B C Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. 1 0 1 0 | {z 1 1 Yt = [Yt , YA,t , YB,t , YC,t ]0 = 0 0 Forecasting: Principles and Practice S 1 Y A , t 0 YB,t 0 YC,t 1 | {z } } Bt Hierarchical and grouped time series 9 Hierarchical data Total A B C Yt : observed aggregate of all series at time t. YX,t : observation on series X at time t. Bt : vector of all series at bottom level in time t. 1 0 1 0 | {z 1 1 Yt = [Yt , YA,t , YB,t , YC,t ]0 = 0 0 Yt = SBt Forecasting: Principles and Practice S 1 Y A , t 0 YB,t 0 YC,t 1 | {z } } Bt Hierarchical and grouped time series 9 Hierarchical data Total A AX Yt = AY Yt YA,t YB,t YC,t YAX,t YAY ,t YAZ,t YBX,t YBY ,t YBZ,t YCX,t YCY ,t YCZ,t C B AZ BX 1 1 0 0 1 0 = 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 | Forecasting: Principles and Practice 1 0 1 0 0 0 0 1 0 0 0 0 0 BY 1 0 1 0 0 0 0 0 1 0 0 0 0 {z S 1 0 1 0 0 0 0 0 0 1 0 0 0 CX BZ 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 CY CZ YAX,t YAY,t Y AZ,t YBX,t YBY,t YBZ,t Y CX,t YCY,t YCZ,t | {z } Bt } Hierarchical and grouped time series 10 Hierarchical data Total A AX Yt = AY Yt YA,t YB,t YC,t YAX,t YAY ,t YAZ,t YBX,t YBY ,t YBZ,t YCX,t YCY ,t YCZ,t C B AZ BX 1 1 0 0 1 0 = 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 | Forecasting: Principles and Practice 1 0 1 0 0 0 0 1 0 0 0 0 0 BY 1 0 1 0 0 0 0 0 1 0 0 0 0 {z S 1 0 1 0 0 0 0 0 0 1 0 0 0 CX BZ 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 CY CZ YAX,t YAY,t Y AZ,t YBX,t YBY,t YBZ,t Y CX,t YCY,t YCZ,t | {z } Bt } Hierarchical and grouped time series 10 Hierarchical data Total A AX Yt = AY Yt YA,t YB,t YC,t YAX,t YAY ,t YAZ,t YBX,t YBY ,t YBZ,t YCX,t YCY ,t YCZ,t C B AZ BX 1 1 0 0 1 0 = 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 | Forecasting: Principles and Practice 1 0 1 0 0 0 0 1 0 0 0 0 0 BY 1 0 1 0 0 0 0 0 1 0 0 0 0 {z S 1 0 1 0 0 0 0 0 0 1 0 0 0 CX BZ 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 CY CZ YAX,t YAY,t Y AZ,t YBX,t YBY,t YBZ,t Y CX,t YCY,t YCZ,t | {z } Bt Yt = SBt } Hierarchical and grouped time series 10 Grouped data Yt AX AY A BX BY B X Y Total 1 1 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 0 YA,t 1 YB,t 0 YX,t 1 Yt = YY ,t = 0 Y 1 AX,t Y 0 AY ,t YBX,t 0 YBY ,t | {z S Forecasting: Principles and Practice 1 0 1 YAX,t 0 YAY ,t 1 YBX,t 0 YBY ,t 0 | {z } Bt 0 1 } Hierarchical and grouped time series 11 Grouped data Yt AX AY A BX BY B X Y Total 1 1 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 0 YA,t 1 YB,t 0 YX,t 1 Yt = YY ,t = 0 Y 1 AX,t Y 0 AY ,t YBX,t 0 YBY ,t | {z S Forecasting: Principles and Practice 1 0 1 YAX,t 0 YAY ,t 1 YBX,t 0 YBY ,t 0 | {z } Bt 0 1 } Hierarchical and grouped time series 11 Grouped data Yt AX AY A BX BY B X Y Total 1 1 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 0 YA,t 1 YB,t 0 YX,t 1 Yt = YY ,t = 0 Y 1 AX,t Y 0 AY ,t YBX,t 0 YBY ,t | {z S Forecasting: Principles and Practice 1 0 1 YAX,t 0 YAY ,t 1 YBX,t 0 YBY ,t 0 | {z } Bt 0 1 Yt = SBt } Hierarchical and grouped time series 11 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Forecasting framework 12 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Forecasting notation Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . (They may not add up.) Hierarchical forecasting methods of the form: Ỹn (h) = SPŶn (h) for some matrix P. P extracts and combines base forecasts Ŷn (h) to get bottom-level forecasts. S adds them up Revised reconciled forecasts: Ỹn (h). Forecasting: Principles and Practice Forecasting framework 13 Bottom-up forecasts Ỹn (h) = SPŶn (h) Bottom-up forecasts are obtained using P = [0 | I] , where 0 is null matrix and I is identity matrix. P matrix extracts only bottom-level forecasts from Ŷn (h) S adds them up to give the bottom-up forecasts. Forecasting: Principles and Practice Forecasting framework 14 Bottom-up forecasts Ỹn (h) = SPŶn (h) Bottom-up forecasts are obtained using P = [0 | I] , where 0 is null matrix and I is identity matrix. P matrix extracts only bottom-level forecasts from Ŷn (h) S adds them up to give the bottom-up forecasts. Forecasting: Principles and Practice Forecasting framework 14 Bottom-up forecasts Ỹn (h) = SPŶn (h) Bottom-up forecasts are obtained using P = [0 | I] , where 0 is null matrix and I is identity matrix. P matrix extracts only bottom-level forecasts from Ŷn (h) S adds them up to give the bottom-up forecasts. Forecasting: Principles and Practice Forecasting framework 14 Top-down forecasts Ỹn (h) = SPŶn (h) Top-down forecasts are obtained using P = [p | 0] where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions that sum to one. P distributes forecasts of the aggregate to the lowest level series. Different methods of top-down forecasting lead to different proportionality vectors p. Forecasting: Principles and Practice Forecasting framework 15 Top-down forecasts Ỹn (h) = SPŶn (h) Top-down forecasts are obtained using P = [p | 0] where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions that sum to one. P distributes forecasts of the aggregate to the lowest level series. Different methods of top-down forecasting lead to different proportionality vectors p. Forecasting: Principles and Practice Forecasting framework 15 Top-down forecasts Ỹn (h) = SPŶn (h) Top-down forecasts are obtained using P = [p | 0] where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions that sum to one. P distributes forecasts of the aggregate to the lowest level series. Different methods of top-down forecasting lead to different proportionality vectors p. Forecasting: Principles and Practice Forecasting framework 15 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: bias Ỹn (h) = SPŶn (h) Assume: base forecasts Ŷn (h) are unbiased: E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ] Let B̂n (h) be bottom level base forecasts with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ]. Then E[Ŷn (h)] = Sβn (h). We want the revised forecasts to be unbiased: E[Ỹn (h)] = SPSβn (h) = Sβn (h). Result will hold provided SPS = S. True for bottom-up, but not for any top-down method or middle-out method. Forecasting: Principles and Practice Forecasting framework 16 General properties: variance Ỹn (h) = SPŶn (h) Let variance of base forecasts Ŷn (h) be given by Σh = Var[Ŷn (h)|Y1 , . . . , Yn ] Then the variance of the revised forecasts is given by Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 . This is a general result for all existing methods. Forecasting: Principles and Practice Forecasting framework 17 General properties: variance Ỹn (h) = SPŶn (h) Let variance of base forecasts Ŷn (h) be given by Σh = Var[Ŷn (h)|Y1 , . . . , Yn ] Then the variance of the revised forecasts is given by Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 . This is a general result for all existing methods. Forecasting: Principles and Practice Forecasting framework 17 General properties: variance Ỹn (h) = SPŶn (h) Let variance of base forecasts Ŷn (h) be given by Σh = Var[Ŷn (h)|Y1 , . . . , Yn ] Then the variance of the revised forecasts is given by Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 . This is a general result for all existing methods. Forecasting: Principles and Practice Forecasting framework 17 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Optimal forecasts 18 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Forecasts Key idea: forecast reconciliation å Ignore structural constraints and forecast every series of interest independently. å Adjust forecasts to impose constraints. Let Ŷn (h) be vector of initial h-step forecasts, made at time n, stacked in same order as Yt . Yt = SBt . So Ŷn (h) = Sβn (h) + εh . βn (h) = E[Bn+h | Y1 , . . . , Yn ]. εh has zero mean and covariance Σh . Estimate βn (h) using GLS? Forecasting: Principles and Practice Optimal forecasts 19 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Optimal combination forecasts † † Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Initial forecasts Σ†h is generalized inverse of Σh . † Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0 Problem: Σh hard to estimate. Forecasting: Principles and Practice Optimal forecasts 20 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Approximately optimal forecasts 21 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts † † Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h) Revised forecasts Base forecasts Solution 1: OLS † Approximate Σ1 by cI. Or assume εh ≈ SεB,h where εB,h is the forecast error at bottom level. Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ). If Moore-Penrose generalized inverse used, † † then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 . Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Forecasting: Principles and Practice Approximately optimal forecasts 22 Optimal combination forecasts Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Total A B C Forecasting: Principles and Practice Approximately optimal forecasts 23 Optimal combination forecasts Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Total A B C Weights: 0.75 0.25 0.25 0.25 0.25 0.75 −0.25 −0.25 S(S0 S)−1 S0 = 0.25 −0.25 0.75 −0.25 0.25 −0.25 −0.25 0.75 Forecasting: Principles and Practice Approximately optimal forecasts 23 Optimal combination forecasts Total A AA AB C B AC BA BB BC CA CB CC 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.73 −0.27 −0.27 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 −0.27 0.73 −0.27 Weights: S(S0 S)−1 S0 = 0.69 0.23 0.23 0.23 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.23 0.58 −0.17 −0.17 0.19 0.19 0.19 −0.06 −0.06 −0.06 −0.06 −0.06 −0.06 0.23 −0.17 0.58 −0.17 −0.06 −0.06 −0.06 0.19 0.19 0.19 −0.06 −0.06 −0.06 0.23 −0.17 −0.17 0.58 −0.06 −0.06 −0.06 −0.06 −0.06 −0.06 0.19 0.19 0.19 0.08 0.19 −0.06 −0.06 0.73 −0.27 −0.27 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.08 0.19 −0.06 −0.06 −0.27 0.73 −0.27 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 Forecasting: Principles and Practice 0.08 0.19 −0.06 −0.06 −0.27 −0.27 0.73 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 0.73 −0.27 −0.27 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 −0.27 0.73 −0.27 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 −0.27 −0.27 0.73 −0.02 −0.02 −0.02 Approximately optimal forecasts 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 −0.27 −0.27 0.73 24 Optimal combination forecasts Total A AA AB C B AC BA BB BC CA CB CC 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.73 −0.27 −0.27 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 −0.27 0.73 −0.27 Weights: S(S0 S)−1 S0 = 0.69 0.23 0.23 0.23 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.23 0.58 −0.17 −0.17 0.19 0.19 0.19 −0.06 −0.06 −0.06 −0.06 −0.06 −0.06 0.23 −0.17 0.58 −0.17 −0.06 −0.06 −0.06 0.19 0.19 0.19 −0.06 −0.06 −0.06 0.23 −0.17 −0.17 0.58 −0.06 −0.06 −0.06 −0.06 −0.06 −0.06 0.19 0.19 0.19 0.08 0.19 −0.06 −0.06 0.73 −0.27 −0.27 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.08 0.19 −0.06 −0.06 −0.27 0.73 −0.27 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 Forecasting: Principles and Practice 0.08 0.19 −0.06 −0.06 −0.27 −0.27 0.73 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 0.73 −0.27 −0.27 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 −0.27 0.73 −0.27 −0.02 −0.02 −0.02 0.08 −0.06 0.19 −0.06 −0.02 −0.02 −0.02 −0.27 −0.27 0.73 −0.02 −0.02 −0.02 Approximately optimal forecasts 0.08 −0.06 −0.06 0.19 −0.02 −0.02 −0.02 −0.02 −0.02 −0.02 −0.27 −0.27 0.73 24 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Features Covariates can be included in initial forecasts. Adjustments can be made to initial forecasts at any level. Very simple and flexible method. Can work with any hierarchical or grouped time series. SPS = S so reconciled forcasts are unbiased. Conceptually easy to implement: OLS on base forecasts. Weights are independent of the data and of the covariance structure of the hierarchy. Forecasting: Principles and Practice Approximately optimal forecasts 25 Challenges Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Computational difficulties in big hierarchies due to size of the S matrix and singular behavior of (S0 S). Need to estimate covariance matrix to produce prediction intervals. Ignores covariance matrix in computing point forecasts. Forecasting: Principles and Practice Approximately optimal forecasts 26 Challenges Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Computational difficulties in big hierarchies due to size of the S matrix and singular behavior of (S0 S). Need to estimate covariance matrix to produce prediction intervals. Ignores covariance matrix in computing point forecasts. Forecasting: Principles and Practice Approximately optimal forecasts 26 Challenges Ỹn (h) = S(S0 S)−1 S0 Ŷn (h) Computational difficulties in big hierarchies due to size of the S matrix and singular behavior of (S0 S). Need to estimate covariance matrix to produce prediction intervals. Ignores covariance matrix in computing point forecasts. Forecasting: Principles and Practice Approximately optimal forecasts 26 Optimal combination forecasts † † Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h) Solution 1: OLS † Approximate Σ1 by cI. Solution 2: Rescaling Suppose we approximate Σ1 by its diagonal. −1 Let Λ = diagonal Σ1 contain inverse one-step forecast variances. Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h) Forecasting: Principles and Practice Approximately optimal forecasts 27 Optimal combination forecasts † † Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h) Solution 1: OLS † Approximate Σ1 by cI. Solution 2: Rescaling Suppose we approximate Σ1 by its diagonal. −1 Let Λ = diagonal Σ1 contain inverse one-step forecast variances. Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h) Forecasting: Principles and Practice Approximately optimal forecasts 27 Optimal combination forecasts † † Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h) Solution 1: OLS † Approximate Σ1 by cI. Solution 2: Rescaling Suppose we approximate Σ1 by its diagonal. −1 Let Λ = diagonal Σ1 contain inverse one-step forecast variances. Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h) Forecasting: Principles and Practice Approximately optimal forecasts 27 Optimal combination forecasts † † Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h) Solution 1: OLS † Approximate Σ1 by cI. Solution 2: Rescaling Suppose we approximate Σ1 by its diagonal. −1 Let Λ = diagonal Σ1 contain inverse one-step forecast variances. Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h) Forecasting: Principles and Practice Approximately optimal forecasts 27 Optimal combination forecasts † † Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h) Solution 1: OLS † Approximate Σ1 by cI. Solution 2: Rescaling Suppose we approximate Σ1 by its diagonal. −1 Let Λ = diagonal Σ1 contain inverse one-step forecast variances. Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h) Forecasting: Principles and Practice Approximately optimal forecasts 27 Optimal reconciled forecasts Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h) Easy to estimate, and places weight where we have best forecasts. Ignores covariances. For large numbers of time series, we need to do calculation without explicitly forming S or (S0ΛS)−1 or S0Λ. Forecasting: Principles and Practice Approximately optimal forecasts 28 Optimal reconciled forecasts Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h) Initial forecasts Easy to estimate, and places weight where we have best forecasts. Ignores covariances. For large numbers of time series, we need to do calculation without explicitly forming S or (S0ΛS)−1 or S0Λ. Forecasting: Principles and Practice Approximately optimal forecasts 28 Optimal reconciled forecasts Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h) Revised forecasts Initial forecasts Easy to estimate, and places weight where we have best forecasts. Ignores covariances. For large numbers of time series, we need to do calculation without explicitly forming S or (S0ΛS)−1 or S0Λ. Forecasting: Principles and Practice Approximately optimal forecasts 28 Optimal reconciled forecasts Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h) Revised forecasts Initial forecasts Easy to estimate, and places weight where we have best forecasts. Ignores covariances. For large numbers of time series, we need to do calculation without explicitly forming S or (S0ΛS)−1 or S0Λ. Forecasting: Principles and Practice Approximately optimal forecasts 28 Optimal reconciled forecasts Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h) Revised forecasts Initial forecasts Easy to estimate, and places weight where we have best forecasts. Ignores covariances. For large numbers of time series, we need to do calculation without explicitly forming S or (S0ΛS)−1 or S0Λ. Forecasting: Principles and Practice Approximately optimal forecasts 28 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Application: Australian tourism 29 Australian tourism Forecasting: Principles and Practice Application: Australian tourism 30 Australian tourism Domestic visitor nights Quarterly data: 1998 – 2006. From: National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia. Forecasting: Principles and Practice Application: Australian tourism 30 Australian tourism Hierarchy: States (7) Zones (27) Regions (82) Forecasting: Principles and Practice Application: Australian tourism 30 Australian tourism Hierarchy: States (7) Zones (27) Regions (82) Base forecasts ETS (exponential smoothing) models Forecasting: Principles and Practice Application: Australian tourism 30 Base forecasts 75000 70000 65000 60000 Visitor nights 80000 85000 Domestic tourism forecasts: Total 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 26000 22000 18000 Visitor nights 30000 Domestic tourism forecasts: NSW 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 16000 14000 12000 10000 Visitor nights 18000 Domestic tourism forecasts: VIC 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 7000 6000 5000 Visitor nights 8000 9000 Domestic tourism forecasts: Nth.Coast.NSW 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 11000 9000 8000 Visitor nights 13000 Domestic tourism forecasts: Metro.QLD 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 1000 800 600 400 Visitor nights 1200 1400 Domestic tourism forecasts: Sth.WA 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 5000 4500 4000 Visitor nights 5500 6000 Domestic tourism forecasts: X201.Melbourne 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 200 100 0 Visitor nights 300 Domestic tourism forecasts: X402.Murraylands 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 Base forecasts 60 40 20 0 Visitor nights 80 100 Domestic tourism forecasts: X809.Daly 1998 2000 2002 2004 2006 2008 Year Forecasting: Principles and Practice Application: Australian tourism 31 80000 65000 Total 95000 Reconciled forecasts 2000 Forecasting: Principles and Practice 2005 2010 Application: Australian tourism 32 18000 2010 Other Forecasting: Principles and Practice 14000 VIC 2005 10000 2000 2000 2005 2010 2000 2005 2010 24000 2010 18000 24000 2005 20000 2000 14000 QLD 18000 NSW 30000 Reconciled forecasts Application: Australian tourism 32 2005 2010 Other QLD 9000 Other 22000 14000 Other NSW 2000 2010 2000 2005 2010 2000 2005 2010 2000 2005 2010 12000 2010 Other VIC 2005 6000 2000 2005 12000 2010 6000 2005 2000 5500 7500 7000 2000 6000 Melbourne 2010 20000 GC and Brisbane 2005 14000 Capital cities 2000 4000 5000 4000 Sydney Reconciled forecasts Forecasting: Principles and Practice Application: Australian tourism 32 Forecast evaluation Select models using all observations; Re-estimate models using first 12 observations and generate 1- to 8-step-ahead forecasts; Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample; In total 24 1-step-ahead, 23 2-steps-ahead, up to 17 8-steps-ahead for forecast evaluation. Forecasting: Principles and Practice Application: Australian tourism 33 Forecast evaluation Select models using all observations; Re-estimate models using first 12 observations and generate 1- to 8-step-ahead forecasts; Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample; In total 24 1-step-ahead, 23 2-steps-ahead, up to 17 8-steps-ahead for forecast evaluation. Forecasting: Principles and Practice Application: Australian tourism 33 Forecast evaluation Select models using all observations; Re-estimate models using first 12 observations and generate 1- to 8-step-ahead forecasts; Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample; In total 24 1-step-ahead, 23 2-steps-ahead, up to 17 8-steps-ahead for forecast evaluation. Forecasting: Principles and Practice Application: Australian tourism 33 Forecast evaluation Select models using all observations; Re-estimate models using first 12 observations and generate 1- to 8-step-ahead forecasts; Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample; In total 24 1-step-ahead, 23 2-steps-ahead, up to 17 8-steps-ahead for forecast evaluation. Forecasting: Principles and Practice Application: Australian tourism 33 Hierarchy: states, zones, regions MAPE h=1 Top Level: Australia Bottom-up 3.79 OLS 3.83 Scaling (st. dev.) 3.68 Level: States Bottom-up 10.70 OLS 11.07 Scaling (st. dev.) 10.44 Level: Zones Bottom-up 14.99 OLS 15.16 Scaling (st. dev.) 14.63 Bottom Level: Regions Bottom-up 33.12 OLS 35.89 Scaling (st. dev.) 31.68 Forecasting: Principles and Practice h=2 h=4 h=6 h=8 Average 3.58 3.66 3.56 4.01 3.88 3.97 4.55 4.19 4.57 4.24 4.25 4.25 4.06 3.94 4.04 10.52 10.58 10.17 10.85 11.13 10.47 11.46 11.62 10.97 11.27 12.21 10.98 11.03 11.35 10.67 14.97 15.06 14.62 14.98 15.27 14.68 15.69 15.74 15.17 15.65 16.15 15.25 15.32 15.48 14.94 32.54 33.86 31.22 32.26 34.26 31.08 33.74 36.06 32.41 33.96 37.49 32.77 33.18 35.43 31.89 Application: Australian tourism 34 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice Application: Australian labour market 35 ANZSCO Australia and New Zealand Standard Classification of Occupations 8 major groups 43 sub-major groups 97 minor groups – 359 unit groups * 1023 occupations Example: statistician 2 Professionals 22 Business, Human Resource and Marketing Professionals 224 Information and Organisation Professionals 2241 Actuaries, Mathematicians and Statisticians 224113 Statistician Forecasting: Principles and Practice Application: Australian labour market 36 ANZSCO Australia and New Zealand Standard Classification of Occupations 8 major groups 43 sub-major groups 97 minor groups – 359 unit groups * 1023 occupations Example: statistician 2 Professionals 22 Business, Human Resource and Marketing Professionals 224 Information and Organisation Professionals 2241 Actuaries, Mathematicians and Statisticians 224113 Statistician Forecasting: Principles and Practice Application: Australian labour market 36 Time Level 0 700 500 1000 Level 1 1. Managers 2. Professionals 3. Technicians and trade workers 4. Community and personal services workers 5. Clerical and administrative workers 6. Sales workers 7. Machinery operators and drivers 8. Labourers 1500 2000 2500 7000 9000 11000 Australian Labour Market data 400 700 100 200 300 Level 2 500 600 Time 400 500 100 200 300 Level 3 500 600 Time 300 200 100 Level 4 400 Time 1990 1995 2000 2005 2010 Time Forecasting: Principles and Practice Application: Australian labour market 37 Time Level 0 700 500 1000 Level 1 1. Managers 2. Professionals 3. Technicians and trade workers 4. Community and personal services workers 5. Clerical and administrative workers 6. Sales workers 7. Machinery operators and drivers 8. Labourers 1500 2000 2500 7000 9000 11000 Australian Labour Market data 400 700 100 200 300 Level 2 500 600 Time Time 400 500 100 200 300 Level 3 500 600 Lower three panels show largest sub-groups at each level. 300 200 100 Level 4 400 Time 1990 1995 2000 2005 2010 Time Forecasting: Principles and Practice Application: Australian labour market 37 11600 Level 0 11200 800 10800 Time 760 740 Level 1 720 700 500 680 700 Time 170 180 140 700 100 150 200 160 300 Level 2 180 400 190 500 200 600 Time Level 2 Base forecasts Reconciled forecasts 780 Time 1000 Level 1 1. Managers 2. Professionals 3. Technicians and trade workers 4. Community and personal services workers 5. Clerical and administrative workers 6. Sales workers 7. Machinery operators and drivers 8. Labourers 1500 2000 2500 7000 Level 0 9000 11000 12000 Australian Labour Market data 600 Time 170 Level 3 500 160 100 140 200 150 160 500 400 300 Level 3 Time 140 130 200 300 Level 4 150 Time 120 100 Level 4 400 Time 1990 1995 2000 2005 2010 Time Forecasting: Principles and Practice 2010 2011 2012 2013 2014 2015 Year Application: Australian labour market 37 11600 Level 0 11200 800 10800 760 740 Level 1 720 680 200 Time 190 500 160 170 Level 2 180 400 150 180 140 170 160 Level 3 150 140 160 600 500 400 500 100 200 300 Largest changes shown for each level Time Time 120 100 130 200 300 Level 4 150 400 Time 140 Level 2 300 200 700 100 Base forecasts from auto.arima() Time Level 3 Time 700 500 700 600 Time Level 4 Base forecasts Reconciled forecasts 780 Time 1000 Level 1 1. Managers 2. Professionals 3. Technicians and trade workers 4. Community and personal services workers 5. Clerical and administrative workers 6. Sales workers 7. Machinery operators and drivers 8. Labourers 1500 2000 2500 7000 Level 0 9000 11000 12000 Australian Labour Market data 1990 1995 2000 2005 2010 Time Forecasting: Principles and Practice 2010 2011 2012 2013 2014 2015 Year Application: Australian labour market 37 Forecast evaluation (rolling origin) h=1 h=2 h=3 h=4 h=5 h=6 h=7 h=8 Average Top level Bottom-up OLS WLS 74.71 52.20 61.77 102.02 77.77 86.32 121.70 101.50 107.26 131.17 119.03 119.33 147.08 138.27 137.01 157.12 150.75 146.88 169.60 160.04 156.71 178.93 166.38 162.38 135.29 120.74 122.21 Level 1 Bottom-up OLS WLS 21.59 21.89 20.58 27.33 28.55 26.19 30.81 32.74 29.71 32.94 35.58 31.84 35.45 38.82 34.36 37.10 41.24 35.89 39.00 43.34 37.53 40.51 45.49 38.86 33.09 35.96 31.87 Level 2 Bottom-up OLS WLS 8.78 9.02 8.58 10.72 11.19 10.48 11.79 12.34 11.54 12.42 13.04 12.15 13.13 13.92 12.88 13.61 14.56 13.36 14.14 15.17 13.87 14.65 15.77 14.36 12.40 13.13 12.15 Level 3 Bottom-up OLS WLS 5.44 5.55 5.35 6.57 6.78 6.46 7.17 7.42 7.06 7.53 7.81 7.42 7.94 8.29 7.84 8.27 8.68 8.17 8.60 9.04 8.48 8.89 9.37 8.76 7.55 7.87 7.44 Bottom Level Bottom-up 2.35 OLS 2.40 WLS 2.34 2.79 2.86 2.77 3.02 3.10 2.99 3.15 3.24 3.12 3.29 3.41 3.27 3.42 3.55 3.40 3.54 3.68 3.52 3.65 3.80 3.63 3.15 3.25 3.13 RMSE Forecasting: Principles and Practice Application: Australian labour market 38 Outline 1 Hierarchical and grouped time series 2 Forecasting framework 3 Optimal forecasts 4 Approximately optimal forecasts 5 Application: Australian tourism 6 Application: Australian labour market 7 hts package for R Forecasting: Principles and Practice hts package for R 39 hts package for R hts: Hierarchical and grouped time series Methods for analysing and forecasting hierarchical and grouped time series Version: Depends: Imports: Published: Author: Maintainer: BugReports: License: 4.3 forecast (≥ 5.0) SparseM, parallel, utils 2014-06-10 Rob J Hyndman, Earo Wang and Alan Lee Rob J Hyndman <Rob.Hyndman at monash.edu> https://github.com/robjhyndman/hts/issues GPL (≥ 2) Forecasting: Principles and Practice hts package for R 40 Example using R library(hts) # bts is a matrix containing the bottom level time series # nodes describes the hierarchical structure y <- hts(bts, nodes=list(2, c(3,2))) Forecasting: Principles and Practice hts package for R 41 Example using R library(hts) # bts is a matrix containing the bottom level time series # nodes describes the hierarchical structure y <- hts(bts, nodes=list(2, c(3,2))) Total A AX Forecasting: Principles and Practice AY B AZ hts package for R BX BY 41 Example using R library(hts) # bts is a matrix containing the bottom level time series # nodes describes the hierarchical structure y <- hts(bts, nodes=list(2, c(3,2))) # Forecast 10-step-ahead using WLS combination method # ETS used for each series by default fc <- forecast(y, h=10) Forecasting: Principles and Practice hts package for R 42 forecast.gts function Usage forecast(object, h, method = c("comb", "bu", "mo", "tdgsf", "tdgsa", "tdfp"), fmethod = c("ets", "rw", "arima"), weights = c("sd", "none", "nseries"), positive = FALSE, parallel = FALSE, num.cores = 2, ...) Arguments object h method fmethod positive weights parallel num.cores Hierarchical time series object of class gts. Forecast horizon Method for distributing forecasts within the hierarchy. Forecasting method to use If TRUE, forecasts are forced to be strictly positive Weights used for optimal combination method. When weights = sd, it takes account of the standard deviation of forecasts. If TRUE, allow parallel processing If parallel = TRUE, specify how many cores are going to be used Forecasting: Principles and Practice hts package for R 43