MADSAD | FORECASTING METHODS AND TIME SERIES
PROJECT: NUMBER OF DEATHS IN SCOTLAND
25 JUNE 2021
Susana Ribeiro – up202000971@fep.up.pt
TABLE OF CONTENTS
1 INTRODUCTION
2 THE TIME SERIES
3 TIME SERIES MODELLING
3.1 DECOMPOSITION METHODS
3.1.1 Classical Decomposition
3.1.2 STL Decomposition
3.2 SMOOTHING METHODS
3.3 STATISTICAL METHODS
4 FORECASTING
5 FINAL CONSIDERATIONS
6 REFERENCES
ACRONYMS
FEP – Faculdade de Economia da Universidade do Porto
MADSAD – Mestrado em Análise de Dados e Sistemas de Apoio à Decisão
FMTS | Project – Number of Deaths in Scotland
Page 2
1 INTRODUCTION
As part of the Forecasting Methods and Time Series course, students were asked to produce forecasts for a
time series using smoothing, decomposition, and statistical models, explaining the analysis performed and
the reasons why each model was chosen.
Choosing the data was quite difficult: among the extensive list of interesting themes that I would have liked
to analyse, either the data were not available or they were only available on a yearly basis. I therefore decided
to study monthly deaths in Scotland ¹, from January 2013 to March 2021; the source also reports the cause of
death, which is not used here. I have no special connection with Scotland, but it is a very interesting
country, and I am confident about the analysis that this data makes possible.
Section 2 covers the exploratory data analysis, Section 3 covers modelling and forecasting with different
approaches, Section 4 compares the forecasts, and Section 5 presents the final considerations.
The analysis was carried out in R, using the forecast, astsa, fpp2 and tseries packages. The corresponding
R file and the Excel file with the data are attached to this report.
¹ https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/vital-events/generalpublications/weekly-and-monthly-data-on-births-and-deaths/monthly-data-on-births-and-deaths-registered-inscotland
2 TIME SERIES: EXPLORATORY DATA ANALYSIS
The time series used in this report contains 99 observations.
Because the final goal of this project is forecasting, the last 12 months of data (April 2020 to March 2021)
were excluded from most of the analysis. The forecasts were compared with a test set built from the last 12
months of the data that were kept, April 2019 to March 2020. Figure 2 makes clear the atypical behaviour of
the data at the beginning of 2021.
The time series and its graphical representation are shown in Figure 1.
Figure 1 – Monthly deaths in Scotland between Jan 2013 and Mar 2021
The monthly behaviour of deaths and the respective monthly averages are shown in Figure 2.
Figure 2 – Seasonal plot & averages per month of Deaths in Scotland
3 TIME SERIES: MODELLING AND FORECASTING
Three families of methods were used to model the time series: Decomposition, Smoothing and Statistical. All of
them are presented in this section, and for each method both the modelling and the forecasting are explained.
Accuracy measures presented in the report are: Mean Error (ME), Root Mean Squared Error (RMSE), Mean
Absolute Error (MAE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), Mean Absolute
Scaled Error (MASE), Autocorrelation of errors at lag 1 (ACF1) and Theil's U Index of Inequality (Theil's U).
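As a rough illustration of how the simpler of these measures are defined, the sketch below computes ME, RMSE, MAE, MPE and MAPE from paired actual and forecast values. It is written in Python rather than the R used for this report, the function name accuracy_measures is hypothetical, and the sign convention (error = actual minus forecast) is an assumption; MASE, ACF1 and Theil's U are left out.

```python
import math


def accuracy_measures(actual, forecast):
    """Return ME, RMSE, MAE, MPE and MAPE for paired actual/forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    # Assumed sign convention: error = actual - forecast.
    errors = [a - f for a, f in zip(actual, forecast)]
    # Percentage errors are relative to the actual value, which must be non-zero.
    pct_errors = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
    }
```

Under these definitions, ME equals MAE and MPE equals MAPE whenever every error has the same sign, that is, when the forecasts are all below (or all above) the observed values.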
3.1 DECOMPOSITION METHODS
Time series data can exhibit a variety of patterns, and it is often helpful to split a time series into several
components, each representing an underlying pattern category.
3.1.1 Classical Decomposition
Classical decomposition is a relatively simple procedure and forms the starting point for most other methods
of time series decomposition. There are two forms of classical decomposition: additive and multiplicative.
The additive model was chosen because there is no evidence that the variance increases with time; its
components are presented in Figure 3.
Figure 3 – Decomposition of time series in trend, seasonal and random components
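The textbook additive procedure behind a decomposition of this kind can be sketched in a few lines. The Python function below is an illustration only, not the R code used for this report: it assumes an even seasonal period, estimates the trend with a centred moving average, averages the detrended values per position in the cycle, and centres those seasonal indices so that they sum to zero. The name classical_additive_decomposition is hypothetical.

```python
def classical_additive_decomposition(x, period=12):
    """Split x into trend, seasonal and remainder lists (None where the trend is undefined)."""
    n = len(x)
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("sketch assumes an even period and at least two full cycles")
    half = period // 2
    # Centred 2 x period moving average: half weight on the two end points.
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += x[t] - trend[t]
            counts[t % period] += 1
    indices = [s / c for s, c in zip(sums, counts)]
    # Centre the seasonal indices so that they sum to zero over one cycle.
    mean_index = sum(indices) / period
    indices = [s - mean_index for s in indices]
    seasonal = [indices[t % period] for t in range(n)]
    remainder = [
        None if trend[t] is None else x[t] - trend[t] - seasonal[t] for t in range(n)
    ]
    return trend, seasonal, remainder
```

The trend is undefined for the first and last half-period of observations, which is why a decomposition of this type cannot describe the most recent months directly.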
Figure 4 shows the time series together with the sum of the trend and seasonal components (in blue), and the
graphical representation of the residuals.
Figure 4 – Time series and Trend + Seasonal components representation & Residuals representation
Plotting the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the residuals is
one way to check whether the model fits the data. Apparently something is missing from the model, because
some lags still show high correlation. The same interpretation can be drawn from the seasonally adjusted time
series (obtained by subtracting the seasonal component). Both representations are shown in Figure 5.
Figure 5 – ACF and PACF for Residuals & Seasonal Adjusted time series
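For readers who want to see what an ACF plot is built from, the following Python sketch (not the R code used for this report) computes sample autocorrelations and the approximate 95% bound of plus or minus 1.96 divided by the square root of n that is commonly drawn on such plots. The function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the sequence x."""
    n = len(x)
    if n < 2 or not 0 <= max_lag < n:
        raise ValueError("need at least two values and 0 <= max_lag < len(x)")
    mean = sum(x) / n
    deviations = [value - mean for value in x]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("the series is constant, so the ACF is undefined")
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, n)) / denominator
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% bound used to flag autocorrelations that look too large."""
    return 1.96 / math.sqrt(n)
```

A residual series whose autocorrelations at lags 1, 2, ... mostly stay inside the bound is compatible with white noise; several values outside it suggest structure that the model has not captured.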
For the time series under study, Classical Decomposition can be expressed as:
$\mathit{Deaths}[t] = \mathit{Trend}[t] + \mathit{Seasonal}[t] + \mathit{Residuals}[t]$
Figure 6 presents the test set and the forecast obtained with Classical Decomposition (April 2019 to March 2020).
Figure 6 – Classical Decomposition model’s forecast
Accuracy measures can be obtained with the R function accuracy(); the resulting values are shown in
Table 1.
Table 1 – Accuracy measures for Classical Decomposition
ME        RMSE      MAE       MPE       MAPE      ACF1      Theil's U
593.9924  653.5914  593.9924  11.93733  11.93733  -0.01609  1.23781
3.1.2 STL Decomposition
STL is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend
decomposition using Loess”, where Loess is a method for estimating nonlinear relationships.
As in Classical Decomposition, the additive model was chosen. The decomposition, obtained with the R function
stl(), is shown in Figure 7.
Figure 7 – Decomposition of time series in trend, seasonal and remainder components
Figure 8 shows the seasonally adjusted time series and the trend.
Figure 8 – Seasonal Adjusted time series & Time series and Trend representation
For the time series under study, STL Decomposition can be expressed as:
$\mathit{Deaths}[t] = \mathit{Trend}[t] + \mathit{Seasonal}[t] + \mathit{Residuals}[t]$
Figure 9 presents the forecast obtained with STL Decomposition (blue line) from April 2019 to March 2020 and
the corresponding confidence intervals (80% in light grey and 95% in dark grey).
Figure 9 – STL’s model forecast
From the analysis of the residuals (Figure 10), the model appears to fit the data well: the residuals have a
mean around zero and are approximately normally distributed, except in the tails.
Figure 10 – Residuals analysis
Accuracy measures can be observed on Table 2.
Table 2 – Accuracy measures for STL Decomposition
ME        RMSE      MAE      MPE       MAPE      MASE      ACF1      Theil's U
270.3753  351.1855  297.624  5.356052  5.952544  0.865665  -0.29226  0.678304
3.2 SMOOTHING METHODS
Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the
weights decaying exponentially as the observations get older. In other words, the more recent the observation
the higher the associated weight. This framework generates reliable forecasts quickly and for a wide range of
time series, which is a great advantage and of major importance to applications in industry. Holt (1957) and
Winters (1960) extended Holt’s method to capture seasonality. The Holt-Winters seasonal method comprises
the forecast equation and three smoothing equations. Exponential smoothing models are based on a description
of the trend and seasonality in the data.
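The exponentially decaying weights can be made concrete with simple exponential smoothing, the most basic member of this family (it has no trend or seasonal term, unlike the Holt-Winters models used in this report). The Python sketch below is an illustration with hypothetical function names, not the R code used for the analysis.

```python
def exponential_weights(alpha, n_terms):
    """Weight alpha * (1 - alpha)**j given to the observation j steps back."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return [alpha * (1.0 - alpha) ** j for j in range(n_terms)]


def simple_exponential_smoothing(x, alpha, initial_level):
    """Return the smoothed level after processing every observation in x."""
    level = initial_level
    for value in x:
        # New level = weighted average of the new observation and the old level.
        level = alpha * value + (1.0 - alpha) * level
    return level
```

With alpha = 0.5 the weights are 0.5, 0.25, 0.125, and so on: each step back in time halves the influence of an observation. A smaller alpha spreads the weight over a longer history.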
In this section we tested additive and multiplicative models, and the results are quite similar. Figure 11
presents the decomposition into level, trend and seasonal components.
Figure 11 – Decomposition of time series in level, trend and seasonal components
Figure 12 shows the Holt-Winters models, their forecasts (blue line), the confidence intervals (80% in light
grey and 95% in dark grey) and the real data (green line). The confidence intervals are the main difference
between the two models: the intervals from the additive model are wider than those from the
multiplicative model.
Figure 12 – Holt-Winters model’s representation and forecast (Additive & Multiplicative)
Table 3 presents the parameters of both models, and Table 4 presents the forecast models with those parameter values substituted.
Table 3 – Holt-Winters model’s parameters (Additive & Multiplicative)
Type of parameter        Parameter  Value (Additive)  Value (Multiplicative)
Smoothing parameters     alpha      0.227543          0.246653
                         beta       0                 0
                         gamma      0.304287          0.296167
Trend parameters         a          4569.546          4578.79
                         b          -2.54706          -2.54706
Seasonality parameters   s1         -168.531          0.967324
                         s2         -146.343          0.970012
                         s3         -400.655          0.916205
                         s4         -336.248          0.930257
                         s5         -337.081          0.925756
                         s6         -512.053          0.889331
                         s7         83.68617          1.017862
                         s8         47.38295          1.007476
                         s9         442.7161          1.094513
                         s10        1271.5            1.263167
                         s11        146.5858          1.031218
                         s12        188.0309          1.039653
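The smoothing parameters alpha, beta and gamma in Table 3 drive recursions that update the level, the trend and the seasonal indices one observation at a time. The Python sketch below illustrates the additive version only; it is not the R code used for this report, the function names are hypothetical, and it assumes that the first observation falls on cycle position 0 and that initial values for the level, trend and seasonal indices are supplied by the caller.

```python
def holt_winters_additive(x, alpha, beta, gamma, level, trend, seasonals):
    """Run the additive Holt-Winters recursions over x and return the final state."""
    period = len(seasonals)
    seasonals = list(seasonals)  # seasonals[i] is the index for cycle position i
    for t, value in enumerate(x):
        i = t % period
        prev_level, prev_trend = level, trend
        # Level: deseasonalised observation versus the previous one-step projection.
        level = alpha * (value - seasonals[i]) + (1 - alpha) * (prev_level + prev_trend)
        # Trend: latest change in level versus the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal index: latest deviation from the level versus last year's index.
        seasonals[i] = gamma * (value - level) + (1 - gamma) * seasonals[i]
    return level, trend, seasonals


def forecast_additive(level, trend, seasonals, next_position, horizon):
    """Forecast level + m * trend + seasonal index for m = 1..horizon."""
    period = len(seasonals)
    return [
        level + m * trend + seasonals[(next_position + m - 1) % period]
        for m in range(1, horizon + 1)
    ]
```

With beta = 0, as estimated for both models in Table 3, the trend update leaves the slope unchanged, so the slope stays at its initial value throughout the series.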
Table 4 – Holt-Winters’ forecast model (updated with the parameter values obtained)
Additive Model
$F_{t+m} = l_t + m b_t + s_{t+m-12}$
$l_t = 0.2275 (x_t - s_{t-12}) + (1 - 0.2275)(l_{t-1} + b_{t-1})$
$b_t = 0 (l_t - l_{t-1}) + (1 - 0) b_{t-1}$
$s_t = 0.3043 (x_t - l_t) + (1 - 0.3043) s_{t-12}$
$SSE = 8758486$
Multiplicative Model
$F_{t+m} = (l_t + m b_t) s_{t+m-12}$
$l_t = 0.2467 \frac{x_t}{s_{t-12}} + (1 - 0.2467)(l_{t-1} + b_{t-1})$
$b_t = 0 (l_t - l_{t-1}) + (1 - 0) b_{t-1}$
$s_t = 0.2962 \frac{x_t}{l_{t-1} + b_{t-1}} + (1 - 0.2962) s_{t-12}$
$SSE = 8427099$
Table 5 lists the forecast values and the 95% confidence intervals for both models.
Table 5 – Holt-Winters model’s forecast and 95% CI (Additive & Multiplicative)
          Additive Model                    Multiplicative Model
Date      Forecast  Lower 95%  Upper 95%    Forecast  Lower 95%  Upper 95%
Apr 2019  4398      3663       5134         4398      4088       4766
May 2019  4418      3664       5173         4418      4054       4820
Jun 2019  4161      3388       4934         4161      3774       4602
Jul 2019  4223      3432       5014         4223      3796       4704
Aug 2019  4220      3411       5028         4220      3741       4713
Sep 2019  4042      3217       4868         4042      3552       4565
Oct 2019  4635      3793       5478         4635      4058       5227
Nov 2019  4597      3738       5455         4597      3986       5199
Dec 2019  4989      4114       5864         4989      4315       5658
Jan 2020  5816      4925       6706         5816      4974       6529
Feb 2020  4688      3782       5594         4688      4013       5373
Mar 2020  4727      3805       5649         4727      4093       5365
Accuracy measures can be observed on Table 6.
Table 6 – Accuracy measures for Smoothing Methods
           Additive Model  Multiplicative Model
ME         320.6776        315.1462
RMSE       404.0315        395.2558
MAE        335.7734        321.0981
MPE        6.456431        6.311381
MAPE       6.720113        6.425155
MASE       0.976626        0.933942
ACF1       -0.23698        -0.26105
Theil's U  0.772326        0.756834
3.3 STATISTICAL METHODS
ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models
are the two most widely used approaches to time series forecasting and provide complementary approaches to
the problem. While exponential smoothing models are based on a description of the trend and seasonality in
the data, ARIMA models aim to describe the autocorrelations in the data.
We now need to decide which differences are needed to make the series stationary. This is equivalent to finding
how many unit roots (non-seasonal and seasonal) should be considered in the model. Figure 13 presents the
differenced series that were tested, together with their ACF and PACF. diff(diff(datatr,12),1) apparently
gives the best results.
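To show what the nested call diff(diff(datatr,12),1) does to a series, here is a minimal Python sketch of lag differencing (an illustration, not the R code used for this report; the function name difference is hypothetical). Applying it with lag 12 removes a stable yearly pattern, and applying it again with lag 1 removes a remaining linear drift.

```python
def difference(x, lag=1):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must be between 1 and len(x) - 1")
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def seasonal_then_first_difference(x, period=12):
    """Seasonal difference followed by a first difference, as in diff(diff(x, 12), 1)."""
    return difference(difference(x, period), 1)
```

Each differencing step shortens the series: with a period of 12, the doubly differenced series has 13 fewer observations than the original.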
Figure 13 – Differenced data representation, ACF and PACF
Table 7 describes the ARIMA models tested: configuration, significance of the models' parameters,
description of the residuals and AIC.
Table 7 – seasonal ARIMA Models description
#  Model                    Statistically significant parameters?  AIC
1  SARIMA(1,0,0)x(1,0,0)12  Yes                                    15.04
2  SARIMA(2,1,0)x(1,0,0)12  No                                     15.14
3  SARIMA(2,1,0)x(2,1,0)12  Yes                                    12.76
4  SARIMA(1,0,1)x(0,1,1)12  No                                     12.77
Residuals, models 1 to 3: Residuals resemble white noise; no significant autocorrelations in ACF; no
significant p-values in Ljung-Box stat; distribution seems to converge to normal distribution.
Residuals, model 4: Residuals resemble white noise; no significant autocorrelations in ACF; significant
p-values in Ljung-Box stat; distribution seems to converge to normal distribution.
Models #1 and #3 were chosen because their parameters are statistically significant. The residual analysis
can be found in Figure 14. The ACF and PACF of the SARIMA models and their forecasts, with the respective
confidence intervals (80% in light grey and 95% in dark grey) and the real data (green line), are presented in Figure 15.
Figure 14 – SARIMA(1,0,0)(1,0,0)12 & SARIMA(2,1,0)(2,1,0)12 model’s residuals analysis
Figure 15 – SARIMA’s models ACF and PACF and forecast
The forecast models chosen above are presented in Table 8, with the estimated parameter values substituted.
Table 8 – ARIMA forecast models (updated with the parameter values obtained)
SARIMA(1,0,0)x(1,0,0)12
$(1 - 0.3835B)(1 - 0.5936B^{12}) X_t = e_t$
$\sigma^2 = 166676$
SARIMA(2,1,0)x(2,1,0)12
$(1 - B)(1 - B^{12})(1 - 0.7147B - 0.4587B^2)(1 - 0.4134B^{12} - 0.4598B^{24}) X_t = e_t$
$\sigma^2 = 147987$
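The backshift notation in Table 8 can be turned into an explicit forecasting equation by multiplying out the polynomials in B. The Python sketch below does this for the SARIMA(1,0,0)x(1,0,0)12 model, using the two coefficients printed in the table. It is an illustration with hypothetical function names, not the R code used for this report, and it assumes that X_t denotes the mean-adjusted series, since the equation in the table shows no constant term.

```python
def multiply_polynomials(p, q):
    """Multiply two polynomials in B given as {power: coefficient} dicts."""
    product = {}
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] = product.get(i + j, 0.0) + a * b
    return product


# Coefficients copied from the SARIMA(1,0,0)x(1,0,0)12 equation in Table 8.
non_seasonal_ar = {0: 1.0, 1: -0.3835}
seasonal_ar = {0: 1.0, 12: -0.5936}
ar_polynomial = multiply_polynomials(non_seasonal_ar, seasonal_ar)


def one_step_forecast(history, polynomial):
    """Forecast the next value of a mean-adjusted series from the expanded AR polynomial."""
    # Moving every lagged term to the right-hand side flips its sign.
    # history[-power] is the value observed `power` steps before the forecast date.
    return sum(-coef * history[-power] for power, coef in polynomial.items() if power > 0)
```

Expanding the product gives terms at lags 1, 12 and 13, so the one-step forecast is 0.3835 times the previous month, plus 0.5936 times the same month one year earlier, minus about 0.2276 times the value 13 months back.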
Accuracy measures can be observed on Table 9.
Table 9 – Accuracy measures for ARIMA
           SARIMA(1,0,0)x(1,0,0)12  SARIMA(2,1,0)x(2,1,0)12
ME         231.7259                 697.3549
RMSE       367.4495                 742.6301
MAE        282.7771                 697.3549
MPE        4.318207                 14.11073
MAPE       5.497984                 14.11073
MASE       0.822482                 2.028318
ACF1       -0.16666                 -0.38235
Theil's U  0.689664                 1.406165
4 FORECASTING COMPARISON
This section compares the forecasts obtained, with the main purpose of assessing the predictive power and
the fit of each model. Mean Absolute Percentage Error (MAPE) was the accuracy measure chosen to compare the
models. As Table 10 shows, all values are well above the 1% level mentioned in class, and
SARIMA(1,0,0)x(1,0,0)12 is the model with the lowest MAPE.
Table 10 – Accuracy measures per model
Model                          MAPE
STL Decomposition              5.952544
Holt-Winters (Multiplicative)  6.425155
SARIMA(1,0,0)x(1,0,0)12        5.497984
SARIMA(2,1,0)x(2,1,0)12        14.11073
Another possible comparison looks at whether the real values fall inside or outside the confidence intervals
and whether the percentage errors vary over time. The plots in Figure 16 allow us to check this. The
confidence intervals shown are 80% in light grey and 95% in dark grey. We note that the forecasts are inside
the confidence intervals for all models.
Figure 16 – Comparison of forecasts (in blue) and real data (in red) for methods tested
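The visual check in Figure 16 can also be done numerically. The Python sketch below (an illustration with hypothetical function names, not the R code used for this report) counts how many observed values fall inside a set of interval bounds and computes the percentage error of each forecast, so that one can see whether the errors grow over the forecast horizon.

```python
def interval_coverage(actual, lower, upper):
    """Return (share of actual values inside [lower, upper], indices of values outside)."""
    if not (len(actual) == len(lower) == len(upper)) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    outside = [
        i
        for i, (a, lo, hi) in enumerate(zip(actual, lower, upper))
        if not lo <= a <= hi
    ]
    return 1.0 - len(outside) / len(actual), outside


def percentage_errors(actual, forecast):
    """Percentage error of each forecast, relative to the actual value."""
    return [100.0 * (a - f) / a for a, f in zip(actual, forecast)]
```

The lower and upper bounds would come from the interval columns of a forecast table such as Table 5, and the actual values from the test set, which is not listed in this report.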
5 FINAL CONSIDERATIONS
This project was as challenging as it was important: it allowed us to apply, in a very practical and
focused way, the knowledge recently acquired in the Forecasting Methods and Time Series course,
with the very positive aspect of working with real-world data.
Many of the analyses and approaches could have been carried out differently, but from my point of view
the analysis presented shows a simple approach to analysing a time series.
About 2018’s winter: «Last winter's death total was the largest number since 23,379 deaths were
recorded in 1999/2000. Around 80% of additional deaths (3,860) last winter were among people
aged 75 and older. The NRS said the seasonal increase was larger than in most of the previous 66
winters and exceeded the level seen in 19 of the previous 20 winters.» ²
A possible extension is to transform the data into the ratio between the number of deaths and the total population…
² https://www.bbc.com/news/uk-scotland-45876204
6 REFERENCES
Shumway, R., Stoffer, D. (2011). Time Series Analysis and its Applications, 3rd ed, Springer.
Hyndman, R., Athanasopoulos, G. (2018). Forecasting: principles and practice, 2nd ed, OTexts: Melbourne,
Australia.
Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. (1998), Forecasting: methods and applications, 3rd ed,
John Wiley & Sons, New York.