Forecasting: Principles and Practice Rob J Hyndman 11. Hierarchical forecasting

advertisement
Rob J Hyndman
Forecasting:
Principles and Practice
11. Hierarchical forecasting
OTexts.com/fpp/9/4/
Forecasting: Principles and Practice
1
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Hierarchical and grouped time series
2
Introduction
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
Examples
Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose
Forecasting: Principles and Practice
Hierarchical and grouped time series
3
Introduction
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
Examples
Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose
Forecasting: Principles and Practice
Hierarchical and grouped time series
3
Introduction
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
Examples
Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose
Forecasting: Principles and Practice
Hierarchical and grouped time series
3
Introduction
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
Examples
Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose
Forecasting: Principles and Practice
Hierarchical and grouped time series
3
Introduction
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
Examples
Manufacturing product hierarchies
Net labour turnover
Pharmaceutical sales
Tourism demand by region and purpose
Forecasting: Principles and Practice
Hierarchical and grouped time series
3
Forecasting the PBS
Forecasting: Principles and Practice
Hierarchical and grouped time series
4
ATC drug classification
A
B
C
D
G
H
J
L
M
N
P
R
S
V
Alimentary tract and metabolism
Blood and blood forming organs
Cardiovascular system
Dermatologicals
Genito-urinary system and sex hormones
Systemic hormonal preparations, excluding sex hormones and insulins
Anti-infectives for systemic use
Antineoplastic and immunomodulating agents
Musculo-skeletal system
Nervous system
Antiparasitic products, insecticides and repellents
Respiratory system
Sensory organs
Various
Forecasting: Principles and Practice
Hierarchical and grouped time series
5
ATC drug classification
14 classes
A
84 classes
A10
A10B
Alimentary tract and metabolism
Drugs used in diabetes
Blood glucose lowering drugs
A10BA
Biguanides
A10BA02
Metformin
Forecasting: Principles and Practice
Hierarchical and grouped time series
6
Australian tourism
Forecasting: Principles and Practice
Hierarchical and grouped time series
7
Australian tourism
Also split by purpose of travel:
Holiday
Visits to friends and relatives
Business
Other
Forecasting: Principles and Practice
Hierarchical and grouped time series
7
Hierarchical/grouped time series
A hierarchical time series is a collection of
several time series that are linked together in a
hierarchical structure.
Example: Pharmaceutical products are organized in
a hierarchy under the Anatomical Therapeutic
Chemical (ATC) Classification System.
A grouped time series is a collection of time
series that are aggregated in a number of
non-hierarchical ways.
Example: Australian tourism demand is grouped by
region and purpose of travel.
Forecasting: Principles and Practice
Hierarchical and grouped time series
8
Hierarchical/grouped time series
A hierarchical time series is a collection of
several time series that are linked together in a
hierarchical structure.
Example: Pharmaceutical products are organized in
a hierarchy under the Anatomical Therapeutic
Chemical (ATC) Classification System.
A grouped time series is a collection of time
series that are aggregated in a number of
non-hierarchical ways.
Example: Australian tourism demand is grouped by
region and purpose of travel.
Forecasting: Principles and Practice
Hierarchical and grouped time series
8
Hierarchical/grouped time series
A hierarchical time series is a collection of
several time series that are linked together in a
hierarchical structure.
Example: Pharmaceutical products are organized in
a hierarchy under the Anatomical Therapeutic
Chemical (ATC) Classification System.
A grouped time series is a collection of time
series that are aggregated in a number of
non-hierarchical ways.
Example: Australian tourism demand is grouped by
region and purpose of travel.
Forecasting: Principles and Practice
Hierarchical and grouped time series
8
Hierarchical/grouped time series
A hierarchical time series is a collection of
several time series that are linked together in a
hierarchical structure.
Example: Pharmaceutical products are organized in
a hierarchy under the Anatomical Therapeutic
Chemical (ATC) Classification System.
A grouped time series is a collection of time
series that are aggregated in a number of
non-hierarchical ways.
Example: Australian tourism demand is grouped by
region and purpose of travel.
Forecasting: Principles and Practice
Hierarchical and grouped time series
8
Hierarchical data
Total
A
B
C
Forecasting: Principles and Practice
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.
Hierarchical and grouped time series
9
Hierarchical data
Total
A
B
C
Forecasting: Principles and Practice
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.
Hierarchical and grouped time series
9
Hierarchical data
Total
A
B
C
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.

1
1
Yt = [Yt , YA,t , YB,t , YC,t ]0 = 
0
0
Forecasting: Principles and Practice
1
0
1
0


1 
Y
A
,
t
0
  YB , t 
0
YC,t
1
Hierarchical and grouped time series
9
Hierarchical data
Total
A
B
C
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.

1
0
1
0
|
{z
1
1
Yt = [Yt , YA,t , YB,t , YC,t ]0 = 
0
0
Forecasting: Principles and Practice
S


1 
Y
A
,
t
0
 YB,t 
0
YC,t
1
}
Hierarchical and grouped time series
9
Hierarchical data
Total
A
B
C
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.

1
0
1
0
|
{z
1
1
Yt = [Yt , YA,t , YB,t , YC,t ]0 = 
0
0
Forecasting: Principles and Practice
S


1 
Y
A
,
t
0
 YB,t 
0
YC,t
1 | {z }
}
Bt
Hierarchical and grouped time series
9
Hierarchical data
Total
A
B
C
Yt : observed aggregate of all
series at time t.
YX,t : observation on series X at
time t.
Bt : vector of all series at
bottom level in time t.

1
0
1
0
|
{z
1
1
Yt = [Yt , YA,t , YB,t , YC,t ]0 = 
0
0
Yt = SBt
Forecasting: Principles and Practice
S


1 
Y
A
,
t
0
 YB,t 
0
YC,t
1 | {z }
}
Bt
Hierarchical and grouped time series
9
Hierarchical data
Total
A
AX






Yt = 





AY
Yt
YA,t
YB,t
YC,t
YAX,t
YAY ,t
YAZ,t
YBX,t
YBY ,t
YBZ,t
YCX,t
YCY ,t
YCZ,t

C
B
AZ
BX
1
1
 0
 0
 1
 
 0
 = 0
 0
 0
 
 0
 0
0
0
1
1
0
0
0
1
0
0
0
0
0
0
0
1
1
0
0
0
0
1
0
0
0
0
0
0
|
Forecasting: Principles and Practice
1
0
1
0
0
0
0
1
0
0
0
0
0
BY
1
0
1
0
0
0
0
0
1
0
0
0
0
{z
S
1
0
1
0
0
0
0
0
0
1
0
0
0
CX
BZ
1
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
1
0
0
0
0
0
0
0
1
0
1
0
0
1
0
0
0
0
0
0
0
0
1
CY
CZ

YAX,t 
 YAY,t
Y 
 AZ,t 
YBX,t 
YBY,t 
YBZ,t 
Y 
 CX,t
 YCY,t
 YCZ,t
| {z }
Bt
}
Hierarchical and grouped time series
10
Hierarchical data
Total
A
AX






Yt = 





AY
Yt
YA,t
YB,t
YC,t
YAX,t
YAY ,t
YAZ,t
YBX,t
YBY ,t
YBZ,t
YCX,t
YCY ,t
YCZ,t

C
B
AZ
BX
1
1
 0
 0
 1
 
 0
 = 0
 0
 0
 
 0
 0
0
0
1
1
0
0
0
1
0
0
0
0
0
0
0
1
1
0
0
0
0
1
0
0
0
0
0
0
|
Forecasting: Principles and Practice
1
0
1
0
0
0
0
1
0
0
0
0
0
BY
1
0
1
0
0
0
0
0
1
0
0
0
0
{z
S
1
0
1
0
0
0
0
0
0
1
0
0
0
CX
BZ
1
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
1
0
0
0
0
0
0
0
1
0
1
0
0
1
0
0
0
0
0
0
0
0
1
CY
CZ

YAX,t 
 YAY,t
Y 
 AZ,t 
YBX,t 
YBY,t 
YBZ,t 
Y 
 CX,t
 YCY,t
 YCZ,t
| {z }
Bt
}
Hierarchical and grouped time series
10
Hierarchical data
Total
A
AX






Yt = 





AY
Yt
YA,t
YB,t
YC,t
YAX,t
YAY ,t
YAZ,t
YBX,t
YBY ,t
YBZ,t
YCX,t
YCY ,t
YCZ,t

C
B
AZ
BX
1
1
 0
 0
 1
 
 0
 = 0
 0
 0
 
 0
 0
0
0
1
1
0
0
0
1
0
0
0
0
0
0
0
1
1
0
0
0
0
1
0
0
0
0
0
0
|
Forecasting: Principles and Practice
1
0
1
0
0
0
0
1
0
0
0
0
0
BY
1
0
1
0
0
0
0
0
1
0
0
0
0
{z
S
1
0
1
0
0
0
0
0
0
1
0
0
0
CX
BZ
1
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
1
0
0
0
0
0
0
0
1
0
1
0
0
1
0
0
0
0
0
0
0
0
1
CY
CZ

YAX,t 
 YAY,t
Y 
 AZ,t 
YBX,t 
YBY,t 
YBZ,t 
Y 
 CX,t
 YCY,t
 YCZ,t
| {z }
Bt
Yt = SBt
}
Hierarchical and grouped time series
10
Grouped data

Yt

AX
AY
A
BX
BY
B
X
Y
Total

1 1
1
0
0
1
0
1
0
0 0
1
0
1
1
0
0
0
1
0
 YA,t  1

 
 YB,t  0

 
 YX,t  1

 
Yt =  YY ,t  = 0
Y  1
 AX,t  
Y  0
 AY ,t  
YBX,t  0
YBY ,t
|
{z
S
Forecasting: Principles and Practice

1
0


1 
 YAX,t
0
 YAY ,t 
1 

 YBX,t
0
YBY ,t
0
 | {z }
Bt
0
1
}
Hierarchical and grouped time series
11
Grouped data

Yt

AX
AY
A
BX
BY
B
X
Y
Total

1 1
1
0
0
1
0
1
0
0 0
1
0
1
1
0
0
0
1
0
 YA,t  1

 
 YB,t  0

 
 YX,t  1

 
Yt =  YY ,t  = 0
Y  1
 AX,t  
Y  0
 AY ,t  
YBX,t  0
YBY ,t
|
{z
S
Forecasting: Principles and Practice

1
0


1 
 YAX,t
0
 YAY ,t 
1 

 YBX,t
0
YBY ,t
0
 | {z }
Bt
0
1
}
Hierarchical and grouped time series
11
Grouped data

Yt

AX
AY
A
BX
BY
B
X
Y
Total

1 1
1
0
0
1
0
1
0
0 0
1
0
1
1
0
0
0
1
0
 YA,t  1

 
 YB,t  0

 
 YX,t  1

 
Yt =  YY ,t  = 0
Y  1
 AX,t  
Y  0
 AY ,t  
YBX,t  0
YBY ,t
|
{z
S
Forecasting: Principles and Practice

1
0


1 
 YAX,t
0
 YAY ,t 
1 

 YBX,t
0
YBY ,t
0
 | {z }
Bt
0
1
Yt = SBt
}
Hierarchical and grouped time series
11
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Forecasting framework
12
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Forecasting notation
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt . (They may
not add up.)
Hierarchical forecasting methods of the form:
Ỹn (h) = SPŶn (h)
for some matrix P.
P extracts and combines base forecasts Ŷn (h)
to get bottom-level forecasts.
S adds them up
Revised reconciled forecasts: Ỹn (h).
Forecasting: Principles and Practice
Forecasting framework
13
Bottom-up forecasts
Ỹn (h) = SPŶn (h)
Bottom-up forecasts are obtained using
P = [0 | I] ,
where 0 is null matrix and I is identity matrix.
P matrix extracts only bottom-level forecasts
from Ŷn (h)
S adds them up to give the bottom-up
forecasts.
Forecasting: Principles and Practice
Forecasting framework
14
Bottom-up forecasts
Ỹn (h) = SPŶn (h)
Bottom-up forecasts are obtained using
P = [0 | I] ,
where 0 is null matrix and I is identity matrix.
P matrix extracts only bottom-level forecasts
from Ŷn (h)
S adds them up to give the bottom-up
forecasts.
Forecasting: Principles and Practice
Forecasting framework
14
Bottom-up forecasts
Ỹn (h) = SPŶn (h)
Bottom-up forecasts are obtained using
P = [0 | I] ,
where 0 is null matrix and I is identity matrix.
P matrix extracts only bottom-level forecasts
from Ŷn (h)
S adds them up to give the bottom-up
forecasts.
Forecasting: Principles and Practice
Forecasting framework
14
Top-down forecasts
Ỹn (h) = SPŶn (h)
Top-down forecasts are obtained using
P = [p | 0]
where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions
that sum to one.
P distributes forecasts of the aggregate to the
lowest level series.
Different methods of top-down forecasting lead
to different proportionality vectors p.
Forecasting: Principles and Practice
Forecasting framework
15
Top-down forecasts
Ỹn (h) = SPŶn (h)
Top-down forecasts are obtained using
P = [p | 0]
where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions
that sum to one.
P distributes forecasts of the aggregate to the
lowest level series.
Different methods of top-down forecasting lead
to different proportionality vectors p.
Forecasting: Principles and Practice
Forecasting framework
15
Top-down forecasts
Ỹn (h) = SPŶn (h)
Top-down forecasts are obtained using
P = [p | 0]
where p = [p1 , p2 , . . . , pmK ]0 is a vector of proportions
that sum to one.
P distributes forecasts of the aggregate to the
lowest level series.
Different methods of top-down forecasting lead
to different proportionality vectors p.
Forecasting: Principles and Practice
Forecasting framework
15
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: bias
Ỹn (h) = SPŶn (h)
Assume: base forecasts Ŷn (h) are unbiased:
E[Ŷn (h)|Y1 , . . . , Yn ] = E[Yn+h |Y1 , . . . , Yn ]
Let B̂n (h) be bottom level base forecasts
with βn (h) = E[B̂n (h)|Y1 , . . . , Yn ].
Then E[Ŷn (h)] = Sβn (h).
We want the revised forecasts to be unbiased:
E[Ỹn (h)] = SPSβn (h) = Sβn (h).
Result will hold provided SPS = S.
True for bottom-up, but not for any top-down
method or middle-out method.
Forecasting: Principles and Practice
Forecasting framework
16
General properties: variance
Ỹn (h) = SPŶn (h)
Let variance of base forecasts Ŷn (h) be given by
Σh = Var[Ŷn (h)|Y1 , . . . , Yn ]
Then the variance of the revised forecasts is given
by
Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 .
This is a general result for all existing methods.
Forecasting: Principles and Practice
Forecasting framework
17
General properties: variance
Ỹn (h) = SPŶn (h)
Let variance of base forecasts Ŷn (h) be given by
Σh = Var[Ŷn (h)|Y1 , . . . , Yn ]
Then the variance of the revised forecasts is given
by
Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 .
This is a general result for all existing methods.
Forecasting: Principles and Practice
Forecasting framework
17
General properties: variance
Ỹn (h) = SPŶn (h)
Let variance of base forecasts Ŷn (h) be given by
Σh = Var[Ŷn (h)|Y1 , . . . , Yn ]
Then the variance of the revised forecasts is given
by
Var[Ỹn (h)|Y1 , . . . , Yn ] = SPΣh P0 S0 .
This is a general result for all existing methods.
Forecasting: Principles and Practice
Forecasting framework
17
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Optimal forecasts
18
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Forecasts
Key idea: forecast reconciliation
å Ignore structural constraints and forecast every
series of interest independently.
å Adjust forecasts to impose constraints.
Let Ŷn (h) be vector of initial h-step forecasts, made
at time n, stacked in same order as Yt .
Yt = SBt .
So Ŷn (h) = Sβn (h) + εh .
βn (h) = E[Bn+h | Y1 , . . . , Yn ].
εh has zero mean and covariance Σh .
Estimate βn (h) using GLS?
Forecasting: Principles and Practice
Optimal forecasts
19
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Optimal combination forecasts
†
†
Ỹn (h) = Sβ̂n (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Initial forecasts
Σ†h is generalized inverse of Σh .
†
Var[Ỹn (h)|Y1 , . . . , Yn ] = S(S0 Σh S)−1 S0
Problem: Σh hard to estimate.
Forecasting: Principles and Practice
Optimal forecasts
20
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Approximately optimal forecasts
21
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σh S)−1 S0 Σh Ŷn (h)
Revised forecasts
Base forecasts
Solution 1: OLS
†
Approximate Σ1 by cI.
Or assume εh ≈ SεB,h where εB,h is the forecast
error at bottom level.
Then Σh ≈ SΩh S0 where Ωh = Var(εB,h ).
If Moore-Penrose generalized inverse used,
†
†
then (S0 Σh S)−1 S0 Σh = (S0 S)−1 S0 .
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
22
Optimal combination forecasts
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Total
A
B
C
Forecasting: Principles and Practice
Approximately optimal forecasts
23
Optimal combination forecasts
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Total
A
B
C
Weights:


0.75
0.25
0.25
0.25
 0.25 0.75 −0.25 −0.25 

S(S0 S)−1 S0 = 
 0.25 −0.25 0.75 −0.25 
0.25 −0.25 −0.25
0.75
Forecasting: Principles and Practice
Approximately optimal forecasts
23
Optimal combination forecasts
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
0.08
−0.06
−0.06
0.19
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.73
−0.27
−0.27
0.08
−0.06
−0.06
0.19
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
−0.27
0.73
−0.27
Weights: S(S0 S)−1 S0 =
0.69
 0.23

 0.23

 0.23

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08
0.08

0.23
0.58
−0.17
−0.17
0.19
0.19
0.19
−0.06
−0.06
−0.06
−0.06
−0.06
−0.06
0.23
−0.17
0.58
−0.17
−0.06
−0.06
−0.06
0.19
0.19
0.19
−0.06
−0.06
−0.06
0.23
−0.17
−0.17
0.58
−0.06
−0.06
−0.06
−0.06
−0.06
−0.06
0.19
0.19
0.19
0.08
0.19
−0.06
−0.06
0.73
−0.27
−0.27
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.08
0.19
−0.06
−0.06
−0.27
0.73
−0.27
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
Forecasting: Principles and Practice
0.08
0.19
−0.06
−0.06
−0.27
−0.27
0.73
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
0.73
−0.27
−0.27
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
−0.27
0.73
−0.27
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
−0.27
−0.27
0.73
−0.02
−0.02
−0.02
Approximately optimal forecasts

0.08
−0.06 

−0.06 

0.19 

−0.02 

−0.02 

−0.02 

−0.02 

−0.02 

−0.02 

−0.27 

−0.27 
0.73
24
Optimal combination forecasts
Total
A
AA
AB
C
B
AC
BA
BB
BC
CA
CB
CC
0.08
−0.06
−0.06
0.19
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.73
−0.27
−0.27
0.08
−0.06
−0.06
0.19
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
−0.27
0.73
−0.27
Weights: S(S0 S)−1 S0 =
0.69
 0.23

 0.23

 0.23

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08

 0.08
0.08

0.23
0.58
−0.17
−0.17
0.19
0.19
0.19
−0.06
−0.06
−0.06
−0.06
−0.06
−0.06
0.23
−0.17
0.58
−0.17
−0.06
−0.06
−0.06
0.19
0.19
0.19
−0.06
−0.06
−0.06
0.23
−0.17
−0.17
0.58
−0.06
−0.06
−0.06
−0.06
−0.06
−0.06
0.19
0.19
0.19
0.08
0.19
−0.06
−0.06
0.73
−0.27
−0.27
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.08
0.19
−0.06
−0.06
−0.27
0.73
−0.27
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
Forecasting: Principles and Practice
0.08
0.19
−0.06
−0.06
−0.27
−0.27
0.73
−0.02
−0.02
−0.02
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
0.73
−0.27
−0.27
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
−0.27
0.73
−0.27
−0.02
−0.02
−0.02
0.08
−0.06
0.19
−0.06
−0.02
−0.02
−0.02
−0.27
−0.27
0.73
−0.02
−0.02
−0.02
Approximately optimal forecasts

0.08
−0.06 

−0.06 

0.19 

−0.02 

−0.02 

−0.02 

−0.02 

−0.02 

−0.02 

−0.27 

−0.27 
0.73
24
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Features
Covariates can be included in initial forecasts.
Adjustments can be made to initial forecasts
at any level.
Very simple and flexible method. Can work
with any hierarchical or grouped time series.
SPS = S so reconciled forcasts are unbiased.
Conceptually easy to implement: OLS on
base forecasts.
Weights are independent of the data and of
the covariance structure of the hierarchy.
Forecasting: Principles and Practice
Approximately optimal forecasts
25
Challenges
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Computational difficulties in big hierarchies due
to size of the S matrix and singular behavior of
(S0 S).
Need to estimate covariance matrix to produce
prediction intervals.
Ignores covariance matrix in computing point
forecasts.
Forecasting: Principles and Practice
Approximately optimal forecasts
26
Challenges
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Computational difficulties in big hierarchies due
to size of the S matrix and singular behavior of
(S0 S).
Need to estimate covariance matrix to produce
prediction intervals.
Ignores covariance matrix in computing point
forecasts.
Forecasting: Principles and Practice
Approximately optimal forecasts
26
Challenges
Ỹn (h) = S(S0 S)−1 S0 Ŷn (h)
Computational difficulties in big hierarchies due
to size of the S matrix and singular behavior of
(S0 S).
Need to estimate covariance matrix to produce
prediction intervals.
Ignores covariance matrix in computing point
forecasts.
Forecasting: Principles and Practice
Approximately optimal forecasts
26
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h)
Solution 1: OLS
†
Approximate Σ1 by cI.
Solution 2: Rescaling
Suppose we approximate Σ1 by its diagonal.
−1
Let Λ = diagonal Σ1
contain inverse
one-step forecast variances.
Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
27
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h)
Solution 1: OLS
†
Approximate Σ1 by cI.
Solution 2: Rescaling
Suppose we approximate Σ1 by its diagonal.
−1
Let Λ = diagonal Σ1
contain inverse
one-step forecast variances.
Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
27
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h)
Solution 1: OLS
†
Approximate Σ1 by cI.
Solution 2: Rescaling
Suppose we approximate Σ1 by its diagonal.
−1
Let Λ = diagonal Σ1
contain inverse
one-step forecast variances.
Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
27
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h)
Solution 1: OLS
†
Approximate Σ1 by cI.
Solution 2: Rescaling
Suppose we approximate Σ1 by its diagonal.
−1
Let Λ = diagonal Σ1
contain inverse
one-step forecast variances.
Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
27
Optimal combination forecasts
†
†
Ỹn (h) = S(S0 Σ1 S)−1 S0 Σ1 Ŷn (h)
Solution 1: OLS
†
Approximate Σ1 by cI.
Solution 2: Rescaling
Suppose we approximate Σ1 by its diagonal.
−1
Let Λ = diagonal Σ1
contain inverse
one-step forecast variances.
Ỹn (h) = S(S0ΛS)−1 S0ΛŶn (h)
Forecasting: Principles and Practice
Approximately optimal forecasts
27
Optimal reconciled forecasts
Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h)
Easy to estimate, and places weight where
we have best forecasts.
Ignores covariances.
For large numbers of time series, we need
to do calculation without explicitly forming
S or (S0ΛS)−1 or S0Λ.
Forecasting: Principles and Practice
Approximately optimal forecasts
28
Optimal reconciled forecasts
Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h)
Initial forecasts
Easy to estimate, and places weight where
we have best forecasts.
Ignores covariances.
For large numbers of time series, we need
to do calculation without explicitly forming
S or (S0ΛS)−1 or S0Λ.
Forecasting: Principles and Practice
Approximately optimal forecasts
28
Optimal reconciled forecasts
Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h)
Revised forecasts
Initial forecasts
Easy to estimate, and places weight where
we have best forecasts.
Ignores covariances.
For large numbers of time series, we need
to do calculation without explicitly forming
S or (S0ΛS)−1 or S0Λ.
Forecasting: Principles and Practice
Approximately optimal forecasts
28
Optimal reconciled forecasts
Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h)
Revised forecasts
Initial forecasts
Easy to estimate, and places weight where
we have best forecasts.
Ignores covariances.
For large numbers of time series, we need
to do calculation without explicitly forming
S or (S0ΛS)−1 or S0Λ.
Forecasting: Principles and Practice
Approximately optimal forecasts
28
Optimal reconciled forecasts
Ỹn (h) = Sβ̂n (h) = S(S0 ΛS)−1 S0 ΛŶn (h)
Revised forecasts
Initial forecasts
Easy to estimate, and places weight where
we have best forecasts.
Ignores covariances.
For large numbers of time series, we need
to do calculation without explicitly forming
S or (S0ΛS)−1 or S0Λ.
Forecasting: Principles and Practice
Approximately optimal forecasts
28
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Application: Australian tourism
29
Australian tourism
Forecasting: Principles and Practice
Application: Australian tourism
30
Australian tourism
Domestic visitor nights
Quarterly data: 1998 – 2006.
From: National Visitor Survey, based on
annual interviews of 120,000 Australians
aged 15+, collected by Tourism Research
Australia.
Forecasting: Principles and Practice
Application: Australian tourism
30
Australian tourism
Hierarchy:
States (7)
Zones (27)
Regions (82)
Forecasting: Principles and Practice
Application: Australian tourism
30
Australian tourism
Hierarchy:
States (7)
Zones (27)
Regions (82)
Base forecasts
ETS (exponential smoothing)
models
Forecasting: Principles and Practice
Application: Australian tourism
30
Base forecasts
75000
70000
65000
60000
Visitor nights
80000
85000
Domestic tourism forecasts: Total
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
26000
22000
18000
Visitor nights
30000
Domestic tourism forecasts: NSW
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
16000
14000
12000
10000
Visitor nights
18000
Domestic tourism forecasts: VIC
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
7000
6000
5000
Visitor nights
8000
9000
Domestic tourism forecasts: Nth.Coast.NSW
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
11000
9000
8000
Visitor nights
13000
Domestic tourism forecasts: Metro.QLD
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
1000
800
600
400
Visitor nights
1200
1400
Domestic tourism forecasts: Sth.WA
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
5000
4500
4000
Visitor nights
5500
6000
Domestic tourism forecasts: X201.Melbourne
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
200
100
0
Visitor nights
300
Domestic tourism forecasts: X402.Murraylands
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
Base forecasts
60
40
20
0
Visitor nights
80
100
Domestic tourism forecasts: X809.Daly
1998
2000
2002
2004
2006
2008
Year
Forecasting: Principles and Practice
Application: Australian tourism
31
80000
65000
Total
95000
Reconciled forecasts
2000
Forecasting: Principles and Practice
2005
2010
Application: Australian tourism
32
18000
2010
Other
Forecasting: Principles and Practice
14000
VIC
2005
10000
2000
2000
2005
2010
2000
2005
2010
24000
2010
18000
24000
2005
20000
2000
14000
QLD
18000
NSW
30000
Reconciled forecasts
Application: Australian tourism
32
2005
2010
Other QLD
9000
Other
22000
14000
Other NSW
2000
2010
2000
2005
2010
2000
2005
2010
2000
2005
2010
12000
2010
Other VIC
2005
6000
2000
2005
12000
2010
6000
2005
2000
5500
7500
7000
2000
6000
Melbourne
2010
20000
GC and Brisbane
2005
14000
Capital cities
2000
4000 5000
4000
Sydney
Reconciled forecasts
Forecasting: Principles and Practice
Application: Australian tourism
32
Forecast evaluation
Select models using all observations;
Re-estimate models using first 12 observations
and generate 1- to 8-step-ahead forecasts;
Increase sample size one observation at a time,
re-estimate models, generate forecasts until
the end of the sample;
In total 24 1-step-ahead, 23 2-steps-ahead, up
to 17 8-steps-ahead for forecast evaluation.
Forecasting: Principles and Practice
Application: Australian tourism
33
Forecast evaluation
Select models using all observations;
Re-estimate models using first 12 observations
and generate 1- to 8-step-ahead forecasts;
Increase sample size one observation at a time,
re-estimate models, generate forecasts until
the end of the sample;
In total 24 1-step-ahead, 23 2-steps-ahead, up
to 17 8-steps-ahead for forecast evaluation.
Forecasting: Principles and Practice
Application: Australian tourism
33
Forecast evaluation
Select models using all observations;
Re-estimate models using first 12 observations
and generate 1- to 8-step-ahead forecasts;
Increase sample size one observation at a time,
re-estimate models, generate forecasts until
the end of the sample;
In total 24 1-step-ahead, 23 2-steps-ahead, up
to 17 8-steps-ahead for forecast evaluation.
Forecasting: Principles and Practice
Application: Australian tourism
33
Forecast evaluation
Select models using all observations;
Re-estimate models using first 12 observations
and generate 1- to 8-step-ahead forecasts;
Increase sample size one observation at a time,
re-estimate models, generate forecasts until
the end of the sample;
In total 24 1-step-ahead, 23 2-steps-ahead, up
to 17 8-steps-ahead for forecast evaluation.
Forecasting: Principles and Practice
Application: Australian tourism
33
Hierarchy: states, zones, regions
MAPE
h=1
Top Level: Australia
Bottom-up
3.79
OLS
3.83
Scaling (st. dev.)
3.68
Level: States
Bottom-up
10.70
OLS
11.07
Scaling (st. dev.) 10.44
Level: Zones
Bottom-up
14.99
OLS
15.16
Scaling (st. dev.) 14.63
Bottom Level: Regions
Bottom-up
33.12
OLS
35.89
Scaling (st. dev.) 31.68
Forecasting: Principles and Practice
h=2
h=4
h=6
h=8
Average
3.58
3.66
3.56
4.01
3.88
3.97
4.55
4.19
4.57
4.24
4.25
4.25
4.06
3.94
4.04
10.52
10.58
10.17
10.85
11.13
10.47
11.46
11.62
10.97
11.27
12.21
10.98
11.03
11.35
10.67
14.97
15.06
14.62
14.98
15.27
14.68
15.69
15.74
15.17
15.65
16.15
15.25
15.32
15.48
14.94
32.54
33.86
31.22
32.26
34.26
31.08
33.74
36.06
32.41
33.96
37.49
32.77
33.18
35.43
31.89
Application: Australian tourism
34
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
Application: Australian labour market
35
ANZSCO
Australia and New Zealand Standard
Classification of Occupations
8 major groups
43 sub-major groups
97 minor groups
– 359 unit groups
* 1023 occupations
Example: statistician
2 Professionals
22 Business, Human Resource and Marketing
Professionals
224 Information and Organisation Professionals
2241 Actuaries, Mathematicians and Statisticians
224113 Statistician
Forecasting: Principles and Practice
Application: Australian labour market
36
ANZSCO
Australia and New Zealand Standard
Classification of Occupations
8 major groups
43 sub-major groups
97 minor groups
– 359 unit groups
* 1023 occupations
Example: statistician
2 Professionals
22 Business, Human Resource and Marketing
Professionals
224 Information and Organisation Professionals
2241 Actuaries, Mathematicians and Statisticians
224113 Statistician
Forecasting: Principles and Practice
Application: Australian labour market
36
Time
Level 0
700
500
1000
Level 1
1. Managers
2. Professionals
3. Technicians and trade workers
4. Community and personal services workers
5. Clerical and administrative workers
6. Sales workers
7. Machinery operators and drivers
8. Labourers
1500
2000
2500
7000
9000
11000
Australian Labour Market data
400
700 100
200
300
Level 2
500
600
Time
400
500
100
200
300
Level 3
500
600
Time
300
200
100
Level 4
400
Time
1990
1995
2000
2005
2010
Time
Forecasting: Principles and Practice
Application: Australian labour market
37
Time
Level 0
700
500
1000
Level 1
1. Managers
2. Professionals
3. Technicians and trade workers
4. Community and personal services workers
5. Clerical and administrative workers
6. Sales workers
7. Machinery operators and drivers
8. Labourers
1500
2000
2500
7000
9000
11000
Australian Labour Market data
400
700 100
200
300
Level 2
500
600
Time
Time
400
500
100
200
300
Level 3
500
600
Lower three panels
show largest
sub-groups at each
level.
300
200
100
Level 4
400
Time
1990
1995
2000
2005
2010
Time
Forecasting: Principles and Practice
Application: Australian labour market
37
11600
Level 0
11200
800 10800
Time
760
740
Level 1
720
700
500
680
700
Time
170
180 140
700 100
150
200
160
300
Level 2
180
400
190
500
200
600
Time
Level 2
Base forecasts
Reconciled forecasts
780
Time
1000
Level 1
1. Managers
2. Professionals
3. Technicians and trade workers
4. Community and personal services workers
5. Clerical and administrative workers
6. Sales workers
7. Machinery operators and drivers
8. Labourers
1500
2000
2500
7000
Level 0
9000
11000
12000
Australian Labour Market data
600
Time
170
Level 3
500
160
100
140
200
150
160
500
400
300
Level 3
Time
140
130
200
300
Level 4
150
Time
120
100
Level 4
400
Time
1990
1995
2000
2005
2010
Time
Forecasting: Principles and Practice
2010
2011
2012
2013
2014
2015
Year
Application: Australian labour market
37
11600
Level 0
11200
800 10800
760
740
Level 1
720
680
200
Time
190
500
160
170
Level 2
180
400
150
180 140
170
160
Level 3
150
140
160
600
500
400
500
100
200
300
Largest changes
shown for each level
Time
Time
120
100
130
200
300
Level 4
150
400
Time
140
Level 2
300
200
700 100
Base forecasts from
auto.arima()
Time
Level 3
Time
700
500
700
600
Time
Level 4
Base forecasts
Reconciled forecasts
780
Time
1000
Level 1
1. Managers
2. Professionals
3. Technicians and trade workers
4. Community and personal services workers
5. Clerical and administrative workers
6. Sales workers
7. Machinery operators and drivers
8. Labourers
1500
2000
2500
7000
Level 0
9000
11000
12000
Australian Labour Market data
1990
1995
2000
2005
2010
Time
Forecasting: Principles and Practice
2010
2011
2012
2013
2014
2015
Year
Application: Australian labour market
37
Forecast evaluation (rolling origin)
h=1
h=2
h=3
h=4
h=5
h=6
h=7
h=8
Average
Top level
Bottom-up
OLS
WLS
74.71
52.20
61.77
102.02
77.77
86.32
121.70
101.50
107.26
131.17
119.03
119.33
147.08
138.27
137.01
157.12
150.75
146.88
169.60
160.04
156.71
178.93
166.38
162.38
135.29
120.74
122.21
Level 1
Bottom-up
OLS
WLS
21.59
21.89
20.58
27.33
28.55
26.19
30.81
32.74
29.71
32.94
35.58
31.84
35.45
38.82
34.36
37.10
41.24
35.89
39.00
43.34
37.53
40.51
45.49
38.86
33.09
35.96
31.87
Level 2
Bottom-up
OLS
WLS
8.78
9.02
8.58
10.72
11.19
10.48
11.79
12.34
11.54
12.42
13.04
12.15
13.13
13.92
12.88
13.61
14.56
13.36
14.14
15.17
13.87
14.65
15.77
14.36
12.40
13.13
12.15
Level 3
Bottom-up
OLS
WLS
5.44
5.55
5.35
6.57
6.78
6.46
7.17
7.42
7.06
7.53
7.81
7.42
7.94
8.29
7.84
8.27
8.68
8.17
8.60
9.04
8.48
8.89
9.37
8.76
7.55
7.87
7.44
Bottom Level
Bottom-up
2.35
OLS
2.40
WLS
2.34
2.79
2.86
2.77
3.02
3.10
2.99
3.15
3.24
3.12
3.29
3.41
3.27
3.42
3.55
3.40
3.54
3.68
3.52
3.65
3.80
3.63
3.15
3.25
3.13
RMSE
Forecasting: Principles and Practice
Application: Australian labour market
38
Outline
1 Hierarchical and grouped time series
2 Forecasting framework
3 Optimal forecasts
4 Approximately optimal forecasts
5 Application: Australian tourism
6 Application: Australian labour market
7 hts package for R
Forecasting: Principles and Practice
hts package for R
39
hts package for R
hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped
time series
Version:
Depends:
Imports:
Published:
Author:
Maintainer:
BugReports:
License:
4.3
forecast (≥ 5.0)
SparseM, parallel, utils
2014-06-10
Rob J Hyndman, Earo Wang and Alan Lee
Rob J Hyndman <Rob.Hyndman at monash.edu>
https://github.com/robjhyndman/hts/issues
GPL (≥ 2)
Forecasting: Principles and Practice
hts package for R
40
Example using R
library(hts)
# bts is a matrix containing the bottom level time series
# nodes describes the hierarchical structure
y <- hts(bts, nodes=list(2, c(3,2)))
Forecasting: Principles and Practice
hts package for R
41
Example using R
library(hts)
# bts is a matrix containing the bottom level time series
# nodes describes the hierarchical structure
y <- hts(bts, nodes=list(2, c(3,2)))
Total
A
AX
Forecasting: Principles and Practice
AY
B
AZ
hts package for R
BX
BY
41
Example using R
library(hts)
# bts is a matrix containing the bottom level time series
# nodes describes the hierarchical structure
y <- hts(bts, nodes=list(2, c(3,2)))
# Forecast 10-step-ahead using WLS combination method
# ETS used for each series by default
fc <- forecast(y, h=10)
Forecasting: Principles and Practice
hts package for R
42
forecast.gts function
Usage
forecast(object, h,
method = c("comb", "bu", "mo", "tdgsf", "tdgsa", "tdfp"),
fmethod = c("ets", "rw", "arima"),
weights = c("sd", "none", "nseries"),
positive = FALSE,
parallel = FALSE, num.cores = 2, ...)
Arguments
object
h
method
fmethod
positive
weights
parallel
num.cores
Hierarchical time series object of class gts.
Forecast horizon
Method for distributing forecasts within the hierarchy.
Forecasting method to use
If TRUE, forecasts are forced to be strictly positive
Weights used for optimal combination method.
When
weights = sd, it takes account of the standard deviation of
forecasts.
If TRUE, allow parallel processing
If parallel = TRUE, specify how many cores are going to be
used
Forecasting: Principles and Practice
hts package for R
43
Download