Forecasting: Principles and Practice Rob J Hyndman 2. The forecaster’s toolbox

Rob J Hyndman
Forecasting:
Principles and Practice
2. The forecaster’s toolbox
OTexts.com/fpp/2/
Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy
Time series graphics
plot(melsyd[,"Economy.Class"])
[Figure: Economy class passengers: Melbourne–Sydney — thousands of passengers, 1988–1993]
Time series graphics
> plot(a10)

[Figure: Antidiabetic drug sales — $ million vs Year, 1992–2008]
Time series graphics
[Figure: Seasonal plot: antidiabetic drug sales — $ million by month (Jan–Dec), one line per year, 1991–2008]
Seasonal plots
Data plotted against the individual “seasons” in
which the data were observed. (In this case a
“season” is a month.)
Something like a time plot except that the data
from each season are overlapped.
Enables the underlying seasonal pattern to be
seen more clearly, and also allows any
substantial departures from the seasonal
pattern to be easily identified.
In R: seasonplot
Seasonal subseries plots
> monthplot(a10)

[Figure: Seasonal subseries plot: antidiabetic drug sales — $ million by month (Jan–Dec)]
Seasonal subseries plots
Data for each season collected together in time
plot as separate time series.
Enables the underlying seasonal pattern to be
seen clearly, and changes in seasonality over
time to be visualized.
In R: monthplot
Quarterly Australian Beer Production
beer <- window(ausbeer,start=1992)
plot(beer)
seasonplot(beer,year.labels=TRUE)
monthplot(beer)
Time series graphics
[Figure: Australian quarterly beer production — megalitres, 1992–2008]
Time series graphics
[Figure: Seasonal plot: quarterly beer production — megalitres by quarter (Q1–Q4), one line per year, 1992–2008]
Time series graphics
[Figure: Seasonal subseries plot: quarterly beer production — megalitres by quarter]
Time series graphics
Time plots — R command: plot or plot.ts
Seasonal plots — R command: seasonplot
Seasonal subseries plots — R command: monthplot
Lag plots — R command: lag.plot
ACF plots — R command: Acf
Time series patterns
Trend pattern exists when there is a long-term
increase or decrease in the data.
Seasonal pattern exists when a series is
influenced by seasonal factors (e.g., the
quarter of the year, the month, or day of
the week).
Cyclic pattern exists when data exhibit rises and
falls that are not of fixed period (duration
usually of at least 2 years).
Time series patterns
[Figure: Australian electricity production — GWh, 1980–1995]
Time series patterns
[Figure: Australian clay brick production — million units, 1960–1990]
Time series patterns
[Figure: Sales of new one-family houses, USA — total sales, 1975–1995]
Time series patterns
[Figure: US Treasury bill contracts — price, days 0–100]
Time series patterns
[Figure: Annual Canadian Lynx trappings — number trapped, 1820–1920]
Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
A seasonal pattern has constant length; a cyclic pattern has variable length.
The average length of a cycle is longer than the length of the seasonal pattern.
The magnitude of a cycle is more variable than the magnitude of the seasonal pattern.
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
Autocorrelation
Covariance and correlation: measure the extent of a linear relationship between two variables (y and x).
Autocovariance and autocorrelation: measure the linear relationship between lagged values of a time series y.
We measure the relationship between y_t and y_{t−1}, y_t and y_{t−2}, y_t and y_{t−3}, etc.
Example: Beer production
> lag.plot(beer,lags=9)
[Figure: lag.plot of the beer data — nine scatterplots of beer against lags 1–9, points labelled by observation number]
Example: Beer production
> lag.plot(beer,lags=9,do.lines=FALSE)
[Figure: lag.plot of the beer data with do.lines=FALSE — nine scatterplots of beer against lags 1–9]
Lagged scatterplots
Each graph shows y_t plotted against y_{t−k} for different values of k.
The autocorrelations are the correlations associated with these scatterplots.
Autocorrelation
We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define

    c_k = (1/T) * sum_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ)

and

    r_k = c_k / c_0

r_1 indicates how successive values of y relate to each other.
r_2 indicates how y values two periods apart relate to each other.
r_k is almost the same as the sample correlation between y_t and y_{t−k}.
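As a minimal sketch (not from the slides), the definitions above can be checked against R's built-in acf(); the series y here is simulated purely for illustration.

```r
# Compute c_k and r_k by hand and compare with acf().
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))
T <- length(y)
ybar <- mean(y)

# sample autocovariance: (1/T) * sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar)
ck <- function(k) sum((y[(k + 1):T] - ybar) * (y[1:(T - k)] - ybar)) / T

rk   <- sapply(1:5, ck) / ck(0)                          # r_k = c_k / c_0
racf <- as.numeric(acf(y, lag.max = 5, plot = FALSE)$acf)[-1]
all.equal(rk, racf)   # acf() uses the same definition
```

Note that acf() divides by T (not T − k), matching the formula above.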
Autocorrelation
Results for first 9 lags for beer data:

  r1       r2       r3       r4      r5       r6       r7      r8      r9
−0.126   −0.650   −0.094   0.863   −0.099   −0.642   −0.098  0.834  −0.116
[Figure: ACF (correlogram) of the beer data — r_k for lags 1–17]
Autocorrelation
r_4 is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
r_2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
Together, the autocorrelations at lags 1, 2, . . . make up the autocorrelation function or ACF.
The plot is known as a correlogram.
Acf(beer)

[Figure: ACF of the beer data — lags 1–17]
Recognizing seasonality in a time series
If there is seasonality, the ACF at the seasonal lag
(e.g., 12 for monthly data) will be large and
positive.
For seasonal monthly data, a large ACF value
will be seen at lag 12 and possibly also at lags
24, 36, . . .
For seasonal quarterly data, a large ACF value
will be seen at lag 4 and possibly also at lags 8,
12, . . .
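As a sketch (not from the slides), the point can be illustrated with the base-R AirPassengers monthly series: after differencing away the trend, the ACF is large and positive at the seasonal lags 12, 24, 36.

```r
# ACF of the differenced airline-passenger series: seasonal lags stand out.
r <- as.numeric(acf(diff(AirPassengers), lag.max = 36, plot = FALSE)$acf)[-1]
round(r[c(12, 24, 36)], 2)   # large positive values at the seasonal lags
```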
Australian monthly electricity production
[Figure: Australian electricity production — GWh, 1980–1995]
Australian monthly electricity production
[Figure: ACF of the series — lags 0–40]
Australian monthly electricity production
The time plot shows clear trend and seasonality.
The same features are reflected in the ACF.
The slowly decaying ACF indicates trend.
The ACF peaks at lags 12, 24, 36, . . . indicate seasonality of length 12.
Which is which?
[Figure: four time plots — 1. Daily morning temperature of a cow, 2. Accidental deaths in USA (monthly), 3. International airline passengers, 4. Annual mink trappings (Canada) — and four ACF plots labelled A–D, to be matched with the series]
Forecasting residuals
Residuals in forecasting: the difference between an observed value and its forecast based on all previous observations: e_t = y_t − ŷ_{t|t−1}.

Assumptions
1. {e_t} uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
2. {e_t} have mean zero. If they don't, then the forecasts are biased.

Useful properties (for prediction intervals)
3. {e_t} have constant variance.
4. {e_t} are normally distributed.
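As a sketch (not from the slides), the zero-mean assumption can be checked directly on the one-step naive residuals e_t = y_t − y_{t−1}; the series here is a simulated random walk, for which the naive method is appropriate.

```r
# Naive residuals of a simulated random walk: mean should be near zero.
set.seed(4)
y <- cumsum(rnorm(300))   # random walk
e <- diff(y)              # naive residuals e_t = y_t - y_{t-1}
mean(e)                   # close to 0: forecasts approximately unbiased
sd(e)                     # roughly the innovation standard deviation
```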
Forecasting Dow-Jones index
[Figure: Dow–Jones index, days 0–300]
Forecasting Dow-Jones index
Naïve forecast:
ŷ_{t|t−1} = y_{t−1}
e_t = y_t − y_{t−1}
Note: e_t are one-step-forecast residuals.
Forecasting Dow-Jones index
[Figure: Dow–Jones index, days 0–300]
Forecasting Dow-Jones index
[Figure: Change in Dow–Jones index (naïve residuals), days 0–300]
Forecasting Dow-Jones index
[Figure: Histogram of residuals — change in Dow–Jones index; normal?]
Forecasting Dow-Jones index
[Figure: ACF of the residuals — lags 1–22]
Forecasting Dow-Jones index
fc <- rwf(dj)
res <- residuals(fc)
plot(res)
hist(res,breaks="FD")
Acf(res,main="")
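The same residual checks can be run without the forecast package; as a sketch (not from the slides), here the naive residuals of a simulated random walk are examined with base R, including a portmanteau test via Box.test().

```r
# Residual diagnostics on simulated data using only base R.
set.seed(5)
y   <- cumsum(rnorm(250))     # simulated random walk (stands in for dj)
res <- diff(y)                # naive residuals
r1  <- acf(res, plot = FALSE)$acf[2]                   # r_1, near zero
pv  <- Box.test(res, lag = 10, type = "Box-Pierce")$p.value
pv   # a small p-value would suggest remaining autocorrelation
```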
Example: White noise
[Figure: simulated white noise series x, time 0–50]
White noise data is uncorrelated across time with zero mean and constant variance. (Technically, we require independence as well.)
Think of white noise as completely uninteresting with no predictable patterns.
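A white noise figure like the one above can be reproduced with something like the following sketch (the seed and length are assumptions, not taken from the slides):

```r
# Simulate and plot a white noise series of 50 observations.
set.seed(30)
x <- ts(rnorm(50))
plot(x, main = "White noise")
```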
Example: White noise

 r1  =  0.013
 r2  = −0.163
 r3  =  0.163
 r4  = −0.259
 r5  = −0.198
 r6  =  0.064
 r7  = −0.139
 r8  = −0.032
 r9  =  0.199
 r10 = −0.240

[Figure: ACF of the white noise series — lags 1–15]
Sample autocorrelations for white noise series.
For uncorrelated data, we would expect each
autocorrelation to be close to zero.
Sampling distribution of autocorrelations
The sampling distribution of r_k for white noise data is asymptotically N(0, 1/T).
95% of all r_k for white noise must lie within ±1.96/√T.
If this is not the case, the series is probably not WN.
Common to plot lines at ±1.96/√T when plotting the ACF. These are the critical values.
Autocorrelation

Example: T = 50, and so critical values at ±1.96/√50 = ±0.28. All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)

[Figure: ACF of the white noise series with ±0.28 critical-value lines — lags 1–15]
Example: Pigs slaughtered
[Figure: Number of pigs slaughtered in Victoria — thousands, monthly, 1990–1995]
Example: Pigs slaughtered
[Figure: ACF of the pigs series — lags 0–40]
Example: Pigs slaughtered
Monthly total number of pigs slaughtered in the
state of Victoria, Australia, from January 1990
through August 1995. (Source: Australian Bureau of
Statistics.)
Difficult to detect pattern in time plot.
ACF shows some significant autocorrelation at
lags 1, 2, and 3.
r12 relatively large although not significant.
This may indicate some slight seasonality.
These show the series is not a white noise series.
ACF of residuals
We assume that the residuals are white noise
(uncorrelated, mean zero, constant variance). If
they aren’t, then there is information left in the
residuals that should be used in computing
forecasts.
So a standard residual diagnostic is to check
the ACF of the residuals of a forecasting
method.
We expect these to look like white noise.
Dow-Jones naïve forecasts revisited:
ŷ_{t|t−1} = y_{t−1}
e_t = y_t − y_{t−1}
Forecasting Dow-Jones index
[Figure: time plot of the daily change in the Dow-Jones index over about 300 days.]
Forecasting Dow-Jones index
[Figure: ACF of the daily Dow-Jones changes, lags 1 to 23.]
Example: Dow-Jones residuals
[Figure: ACF of the naive-forecast residuals, lags 1 to 23.]
These look like white noise.
But inspecting each r_k separately is a multiple testing problem: with many lags, a few "significant" spikes are expected by chance alone.
Portmanteau tests
Consider a whole set of r_k values together, and develop a single test of whether the set is significantly different from a zero set.
Box-Pierce test
Q = T ∑_{k=1}^{h} r_k²
where h is the maximum lag being considered and T is the number of observations.
My preferences: h = 10 for non-seasonal data, h = 2m for seasonal data (m = seasonal period).
If each r_k is close to zero, Q will be small.
If some r_k values are large (positive or negative), Q will be large.
Ljung-Box test
Q* = T(T + 2) ∑_{k=1}^{h} (T − k)⁻¹ r_k²
where h is the maximum lag being considered and T is the number of observations.
My preferences: h = 10 for non-seasonal data, h = 2m for seasonal data.
Better performance than Box-Pierce, especially in small samples.
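The two statistics translate directly into code. A Python sketch (hypothetical helper names; in practice R's `Box.test` computes these), given the autocorrelations r_1, …, r_h from T observations:

```python
# Box-Pierce and Ljung-Box statistics from a list of autocorrelations.
# Hypothetical Python helpers, not the book's R code.
def box_pierce(r, T):
    # Q = T * sum_{k=1}^h r_k^2
    return T * sum(rk ** 2 for rk in r)

def ljung_box(r, T):
    # Q* = T(T+2) * sum_{k=1}^h r_k^2 / (T - k)
    return T * (T + 2) * sum(rk ** 2 / (T - k)
                             for k, rk in enumerate(r, start=1))

r = [0.10, 0.05, -0.08]  # toy autocorrelations, h = 3
print(box_pierce(r, T=100), ljung_box(r, T=100))
```

Note Q* is always slightly larger than Q, since (T + 2)/(T − k) > 1 for every lag k ≥ 1.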
Portmanteau tests
If the data are WN, Q* has a χ² distribution with (h − K) degrees of freedom, where K = number of parameters in the model.
When applied to raw data, set K = 0.
For the Dow-Jones example:

res <- residuals(naive(dj))
# lag=h and fitdf=K
> Box.test(res, lag=10, fitdf=0)
        Box-Pierce test
X-squared = 14.0451, df = 10, p-value = 0.1709
> Box.test(res, lag=10, fitdf=0, type="Lj")
        Box-Ljung test
X-squared = 14.4615, df = 10, p-value = 0.153
Exercise
1. Calculate the residuals from a seasonal naive forecast applied to the quarterly Australian beer production data from 1992.
2. Test if the residuals are white noise.

beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
res <- residuals(fc)
Acf(res)
Box.test(res, lag=8, fitdf=0, type="Lj")
Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy
Measures of forecast accuracy
Let y_t denote the tth observation and ŷ_{t|t−1} denote its forecast based on all previous data, where t = 1, …, T. Then the following measures are useful.

MAE = T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}|
MSE = T⁻¹ ∑_{t=1}^{T} (y_t − ŷ_{t|t−1})²
RMSE = √( T⁻¹ ∑_{t=1}^{T} (y_t − ŷ_{t|t−1})² )
MAPE = 100 T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}| / |y_t|

MAE, MSE and RMSE are all scale dependent.
MAPE is scale independent, but is only sensible if y_t ≫ 0 for all t, and y has a natural zero.
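The four definitions can be computed in a few lines. A Python sketch (the function name is ours), where y and f are equal-length lists of observations and their one-step forecasts:

```python
import math

# MAE, MSE, RMSE and MAPE as defined above, for paired observations y
# and one-step forecasts f. Hypothetical helper, not the book's R code.
def accuracy_measures(y, f):
    T = len(y)
    e = [yt - ft for yt, ft in zip(y, f)]       # forecast errors
    mae = sum(abs(et) for et in e) / T
    mse = sum(et ** 2 for et in e) / T
    rmse = math.sqrt(mse)
    mape = 100 / T * sum(abs(et) / abs(yt) for et, yt in zip(e, y))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

print(accuracy_measures([100.0, 200.0], [110.0, 190.0]))
```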
Measures of forecast accuracy
Mean Absolute Scaled Error
MASE = T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}| / Q
where Q is a stable measure of the scale of the time series {y_t}.
Proposed by Hyndman and Koehler (IJF, 2006).
For non-seasonal time series,
Q = (T − 1)⁻¹ ∑_{t=2}^{T} |y_t − y_{t−1}|
works well. Then MASE is equivalent to MAE relative to a naive method.
For seasonal time series,
Q = (T − m)⁻¹ ∑_{t=m+1}^{T} |y_t − y_{t−m}|
works well. Then MASE is equivalent to MAE relative to a seasonal naive method.
Measures of forecast accuracy
[Figure: quarterly beer production with forecasts from the mean method, naive method and seasonal naive method.]
Measures of forecast accuracy

Method                    RMSE      MAE     MAPE    MASE
Mean method            38.0145  33.7776   8.1700  2.2990
Naïve method           70.9065  63.9091  15.8765  4.3498
Seasonal naïve method  12.9685  11.2727   2.7298  0.7673
Measures of forecast accuracy
[Figure: Dow Jones Index (daily, ending 15 Jul 94) with forecasts from the mean method, naive method and drift model.]
Measures of forecast accuracy

Method            RMSE       MAE    MAPE    MASE
Mean method   148.2357  142.4185  3.6630  8.6981
Naïve method   62.0285   54.4405  1.3979  3.3249
Drift model    53.6977   45.7274  1.1758  2.7928
Training and test sets
Available data = training set (e.g., first 80%) + test set (e.g., last 20%).
The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.
Training and test sets

beer3 <- window(ausbeer, start=1992, end=2005.99)  # training set
beer4 <- window(ausbeer, start=2006)               # test set
fit1 <- meanf(beer3, h=20)     # mean method
fit2 <- rwf(beer3, h=20)       # naive method (random walk forecast)
accuracy(fit1, beer4)          # test-set accuracy
accuracy(fit2, beer4)

In-sample accuracy (one-step forecasts):

accuracy(fit1)
accuracy(fit2)
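The same discipline can be sketched in Python with toy data (hypothetical function, mean method used for concreteness): estimate on the training portion only, then score the forecasts on the held-out test portion.

```python
# Train/test split for time series: fit on the earlier observations only,
# evaluate forecast accuracy on the later, held-out observations.
# Hypothetical Python helper with assumed toy data, not the book's R code.
def mean_method_test_mae(series, n_test):
    train, test = series[:-n_test], series[-n_test:]
    forecast = sum(train) / len(train)          # mean method: flat forecast
    # test-set MAE: the test values played no part in fitting
    return sum(abs(y - forecast) for y in test) / n_test

print(mean_method_test_mae([1.0, 2.0, 3.0, 10.0, 10.0], n_test=2))
```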
Beware of over-fitting
A model which fits the data well does not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters. (Compare R².)
Over-fitting a model to data is as bad as failing to identify the systematic pattern in the data.
These problems can be overcome by measuring true out-of-sample forecast accuracy: divide the data into a training set and a test set, estimate parameters on the training set, make forecasts for the test set, and compute accuracy measures on the test-set errors only.
Poll: true or false?
1. Good forecast methods should have normally distributed residuals.
2. A model with small residuals will give good forecasts.
3. The best measure of forecast accuracy is MAPE.
4. If your model doesn't forecast well, you should make it more complicated.
5. Always choose the model with the best forecast accuracy as measured on the test set.