Forecasting epidemic: Time series modelling

Dr Cho-Min-Naing
Medical Officer (Malaria/DHF)
The National Vector Borne Diseases Control
Project, Yangon, Insein PO, Myanmar
Learning objectives:
At the end of this session,
1. the participant should
understand forecasting methods.
2. the participant should know
concepts behind forecasting models.
Performance objectives:
1. the participant should be able
to develop a time series model for
forecasting epidemics.
I. Background:
Unaided, subjective judgements to warn
of forthcoming events and changes are not as
accurate and effective as systematic, explicit
approaches to forecasting.
This does not mean that forecasts are error
free. It does mean that explicit, systematic
forecasting approaches can provide
substantial benefits when used properly, as
forecasting techniques of all types and forms
can be applied to the existing data.
II. Forecasting methods:
There are three major categories
as stated below.
1. Judgmental method
2. Quantitative method
3. Technological method
1. Judgmental method:
Forecasts are made as
individual judgements or by
committee agreement or
decisions.
2. Quantitative method:
To know what will happen, but not why
something happens.
There are three subcategories of this method:
2.1 Time series methods: Seek to identify
historical patterns (using time as a reference)
and then forecast using a time-based
extrapolation of those patterns.
2.2 Explanatory methods: Seek to identify the
relationships that lead to observed outcomes in
the past and then forecast by applying those
relationships to the future.
2.3 Monitoring methods: Seek to identify
changes in patterns and relationships.
3. Technological method:
Address long-term issues of a
technological, societal, political or
economic nature.
3.1 Extrapolative methods: using historical
patterns and relationships as a basis for forecasts
3.2 Analogy-based methods: using historical
and other analogies to make forecasts
3.3 Expert-based methods:
3.4 Normative-based methods: using
objectives, goals, and desired outcomes as a
basis for forecasting, thereby influencing future
events.
III. Selection points for the appropriate
forecasting techniques.
When applying forecasting to decision making, we need to
reiterate the importance of selecting the appropriate
forecasting technique. In this context, six points play
an important role in determining the requirements for an
appropriate technique.
1. Time horizon: Generally, the time horizon can be divided
into immediate term (less than 1 month), short term (1 to 3
months), medium term (3 months to 2 years), and long term
(2 years or more). The exact lengths of time used to define
these four categories vary by organization and situation.
2. Level of aggregate detail: In general, the greater the
level of detail (and frequency) that is required, the greater
the need for an automated forecasting procedure, and vice
versa.
3. Number of items: The larger the number of items
involved (all other things being equal), the more accurate
the forecasts.
4. Control versus planning: In control, management by
exception is the general procedure. Thus, a forecasting method
in such a situation should be able to recognize changes in
basic patterns or relationships at an early stage. On the
planning side, it is generally assumed that the existing
patterns will continue in the future; the major emphasis is on
identifying those patterns and extrapolating them into the future.
5. Constancy: Forecasting a situation that is constant
over time is very different from forecasting one that is in a
state of flux. In a stable situation, a quantitative
forecasting method can be adopted and checked
periodically to reconfirm its appropriateness. In changing
circumstances, what is needed is a method that can adapt
continually to reflect the most recent results and the latest
information.
6. Existing planning procedure: The greater the
competition (all other things being equal), the more difficult
it is to forecast. There is built-in resistance to change in
any organization, so changes based on the outcomes of
forecasting models can be made in a stepwise manner, rather
than all at once.
IV. Concepts behind
time series analysis:
In man, the conflict is what is desired
and what should be desired. In the
animal, the conflict is what is and
what is desired.
[Rabindranath Tagore, Personality: What is art?]
Time series forecasting treats the system as a
black box and makes no attempt to discover
the factors affecting its behavior. It explains
only what will happen, but not why it
happens. The general formula for the time
series model is
Actual = pattern + randomness
The common goal in the application of
forecasting techniques is to minimize these
deviations or errors in the forecast. The errors
are defined as the differences between the
actual value and what was predicted.
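The error definition above can be sketched in a few lines of Python. This is a minimal illustration, not the MINITAB implementation: the function names are hypothetical, the three summary measures use the usual textbook definitions of MAPE, MAD and MSD, and skipping periods with zero actual cases when computing MAPE is an assumption made here to avoid division by zero.

```python
def forecast_errors(actual, forecast):
    """Per-period errors, defined as actual value minus predicted value."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [a - f for a, f in zip(actual, forecast)]


def accuracy_measures(actual, forecast):
    """Summarise forecast errors with MAPE, MAD and MSD."""
    errors = forecast_errors(actual, forecast)
    n = len(errors)
    # MAD: mean absolute deviation, in the units of the data (cases)
    mad = sum(abs(e) for e in errors) / n
    # MSD: mean squared deviation, penalises large errors more heavily
    msd = sum(e * e for e in errors) / n
    # MAPE: mean absolute percentage error.
    # Assumption: periods with zero actual cases are skipped.
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"MAPE": mape, "MAD": mad, "MSD": msd}


# Hypothetical numbers, for illustration only (not the case-study data)
print(accuracy_measures([100, 200, 400], [110, 180, 400]))
```

Smaller values of all three measures indicate a closer fit, so the measures can be used to compare competing models fitted to the same series.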
V. The decomposition method
We will selectively present the decomposition method,
assuming that the data can be broken down into the
various components and a forecast obtained for each
component.
Advantages: 1. The simplicity of the procedures
2. Ease of computation
3. The minimal start-up time
4. Accuracy, especially for short-term forecasting
Disadvantage: There is no sound statistical theory
behind the method
Time series models can basically be
classified into two types: the additive model and
the multiplicative model.
The forecast for Y in period t is
generally written as
$\hat{Y}_t = f(Tr_t, Sn_t, Cl_t, \varepsilon_t)$
$\hat{Y}$ = forecast of Y
f = function
Tr = trend
Sn = seasonal variation
Cl = cycle
$\varepsilon$ = error
t = the time period being
examined (t = 1, 2, ..., i).
Additive model
1. We assume that the data is the
sum of the time series
components.
$Y_t = Tr_t + Sn_t + Cl_t + \varepsilon_t$
2. If the data do not contain
one of the components
(e.g., cycle) the value for
that missing component is
zero. Suppose there is no
cycle, then
$Y_t = Tr_t + Sn_t + \varepsilon_t$
3. The seasonal component is
independent of trend, and thus
magnitude of the seasonal swing
is constant over time.
Multiplicative model
1. We assume that the data is
the product of the various
components.
$Y_t = Tr_t \times Sn_t \times Cl_t \times \varepsilon_t$
2. If trend, seasonal variation, or
cycle is missing, then the value
is assumed to be 1.
Suppose there is no cycle, then
$Y_t = Tr_t \times Sn_t \times \varepsilon_t$
3. The seasonal factor of
multiplicative model is a
proportion (ratio) to the trends,
and thus the magnitude of the
seasonal swing increases or
decreases according to the
behaviour of trend.
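The difference between the two models can be illustrated with a short Python sketch. The function names and all numbers are hypothetical and chosen only for illustration. A missing component defaults to 0 in the additive model and to 1 in the multiplicative model, as described above.

```python
def additive(trend, seasonal, cycle=0.0, error=0.0):
    """Additive model: the components are summed (missing component = 0)."""
    return trend + seasonal + cycle + error


def multiplicative(trend, seasonal, cycle=1.0, error=1.0):
    """Multiplicative model: the components are multiplied (missing = 1)."""
    return trend * seasonal * cycle * error


# Hypothetical trend levels for the same season in two different years
for trend in (100.0, 200.0):
    add_value = additive(trend, seasonal=50.0)       # seasonal swing in cases
    mul_value = multiplicative(trend, seasonal=1.5)  # seasonal swing as a ratio
    print(f"trend={trend:5.0f}  additive swing={add_value - trend:5.1f}  "
          f"multiplicative swing={mul_value - trend:5.1f}")
```

When the trend doubles, the additive swing stays at 50 cases while the multiplicative swing doubles from 50 to 100 cases, which is the behaviour described in point 3 of each model.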
VI. Case study:
Quarterly malaria cases of a township in
Myanmar between 1984 and 1992 are shown in
Table 1. Using the multiplicative
decomposition method,
a) calculate the centered moving average for
the time series data.
b) find the equation to model the linear trend.
c) estimate the seasonal factors.
d) calculate the final forecast values over the
estimation period.
e) discuss the model.
Objectives of modelling :
1) to monitor the malaria situation
in the study area and forecast with
modelling;
2) to detect seasonal transmission
patterns in the distribution of
malaria in the study area
Methods:
1. This is a documentary study using time series
data covering 1984 to 1992.
2. The dependent variable was the incidence of
malaria occurring during a given time including both
out-patient and in-patient malaria cases.
3. For a starting point, we demonstrated a simple,
two-variable regression model using the
independent variable, time factor.
4. For the centred moving average, we computed a
four-period centred moving average.
5. The data were processed using MINITAB release
11.12.
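The steps of the multiplicative decomposition (centred moving average, seasonal factors, linear trend, fitted values) can be sketched in plain Python as follows. This is a textbook ratio-to-moving-average sketch under stated assumptions, not a reproduction of the MINITAB procedure: seasonal ratios are averaged with the mean, the trend line is fitted to the deseasonalised series, all function names are hypothetical, and the demonstration series is synthetic rather than the Table 1 data.

```python
def centred_moving_average(y, period=4):
    """Centred moving average for an even period (end points get half weight).
    Positions where the average is undefined are returned as None."""
    half = period // 2
    cma = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half: t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        cma[t] = total / period
    return cma


def seasonal_indices(y, cma, period=4):
    """Average the ratios y / CMA by season, then scale them to sum to `period`.
    Needs at least one defined, non-zero CMA value for every season."""
    ratios = [[] for _ in range(period)]
    for t, (obs, avg) in enumerate(zip(y, cma)):
        if avg:  # skips None and zero averages
            ratios[t % period].append(obs / avg)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [s * scale for s in raw]


def linear_trend(values):
    """Least-squares line through (t, value) with t = 1, 2, ..., n."""
    n = len(values)
    ts = list(range(1, n + 1))
    t_mean = sum(ts) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope


def decompose_and_fit(y, period=4):
    """Return seasonal indices, (intercept, slope) and fitted values."""
    cma = centred_moving_average(y, period)
    sn = seasonal_indices(y, cma, period)
    deseasonalised = [obs / sn[t % period] for t, obs in enumerate(y)]
    intercept, slope = linear_trend(deseasonalised)
    fitted = [(intercept + slope * (t + 1)) * sn[t % period]
              for t in range(len(y))]
    return sn, (intercept, slope), fitted


# Synthetic quarterly series (hypothetical): linear trend times seasonal factors
true_factors = [1.2, 0.8, 0.9, 1.1]
series = [(100 + 10 * (t + 1)) * true_factors[t % 4] for t in range(12)]
indices, trend_line, fitted_values = decompose_and_fit(series)
print([round(s, 3) for s in indices], [round(c, 2) for c in trend_line])
```

Forecasts beyond the estimation period follow the same pattern: extend the trend line to the new period and multiply by the seasonal index for that quarter.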
Results:
The output of the MINITAB program showing the
seasonal indices and the accuracy of the model.
Time series (multiplicative decomposition
method)
Seasonal Indices
Period   Index
1        1.18483
2        0.309150
3        0.738706
4        1.76732
Accuracy of Model
MAPE:    494
MAD:     234
MSD:     101789
Fig 1. The multiplicative decomposition
method: Actual versus forecast values for
malaria cases, 1984-92
[Figure: cases (0 to 1500) against quarterly time periods (0 to 40);
series shown: Actual, Predicted, Forecast.
MAPE: 249.4, MAD: 206.1, MSD: 89899.9]
Fig 2. Moving average model
[Figure: cases (0 to 1500) against quarterly time periods (0 to 30);
series shown: Actual, Predicted.
Moving average length: 4, MAPE: 499, MAD: 256, MSD: 132728]
Linear regression model (simple,
two-variable model) in MINITAB
The regression equation is
Y = 21 + 10.9 X
Predictor   Coef     StDev    T      P
Constant    21.1     110.5    0.19   0.850
X           10.932   5.209    2.10   0.043
S = 324.7   R-Sq = 11.5%   R-Sq(adj) = 8.9%
Dependent variable: quarterly malaria cases
Independent variable: time
Fig 3. The linear regression model for malaria cases, 1984-92
Y = 21.0603 + 10.9322X
R-Sq = 0.115
[Figure: cases (-500 to 1500) against quarterly time periods (0 to 40);
fitted regression line with 95% CI and 95% PI bands]
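The quantities reported in the MINITAB regression output (Coef, StDev, T, S, R-Sq) can be computed for a simple, two-variable model as sketched below. This is a minimal ordinary least-squares illustration in plain Python; the function name and the small data set are hypothetical. The last lines only check that the reported T for the slope equals the reported Coef divided by the reported StDev.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)           # unexplained variation
    sst = sum((yi - y_mean) ** 2 for yi in y)     # total variation
    s = math.sqrt(sse / (n - 2))                  # standard error of regression
    se_slope = s / math.sqrt(sxx)                 # standard error of the slope
    t_slope = slope / se_slope if se_slope > 0 else float("inf")
    return {
        "intercept": intercept,
        "slope": slope,
        "s": s,
        "se_slope": se_slope,
        "t_slope": t_slope,
        "r_sq": 1.0 - sse / sst,
        "residuals": residuals,
    }


# Hypothetical data, for illustration only (not the case-study series)
result = simple_regression([1, 2, 3, 4], [2, 4, 5, 8])
print(round(result["slope"], 3), round(result["r_sq"], 3))

# Check on the reported output: T = Coef / StDev for the slope
reported_t = 10.932 / 5.209
print(round(reported_t, 2))
```

An R-Sq of 11.5% means that time alone accounts for only a small part of the variation in quarterly cases, which is consistent with the strong seasonal indices reported for the decomposition model.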
Evaluating the model:
Before completing the analysis, diagnostic tests
that check for statistical pathology are to be
carried out.
Graphing the actual values against the predicted
values: how close are they?
[see Figure 1]
Graphing the residuals: check for a normal
distribution.
t-test for slope, F test, and Durbin-Watson: Are
they greater than their tabled values (alpha =
0.05)? Further details are available in standard
texts (see references). For an example, see
Cho-Min-Naing et al (2000).
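As an illustration of one of these diagnostics, the Durbin-Watson statistic can be computed from the residuals (actual minus predicted) of a fitted model. The sketch below uses the standard formula; the function name and the residual values are hypothetical. The statistic lies between 0 and 4, and values near 2 suggest no first-order autocorrelation in the residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Hypothetical residual patterns, for illustration only
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals
```

Alternating residuals give a value above 2 (negative autocorrelation), while residuals that persist at the same level give a value near 0 (positive autocorrelation). The computed value is then compared with the tabled bounds for the chosen alpha.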
Discussion:
1. Epidemic of malaria: What?
1.1 A periodic, rapid and great increase in
malaria morbidity and perhaps mortality,
reaching levels above the local average endemicity.
1.2 A rapid increase in malaria morbidity and
mortality in a given population (independent of
seasonal variations) which clearly surpasses
the usual levels, or the appearance of the
infection in an area where it was not known
before.
1.3 A sharp increase in the incidence of malaria
among a population in which the disease was
unknown.
Points to ponder:
1. Among diverse factors, the selection of
independent variables should be judiciously
based on theoretical considerations.
2. It is worth emphasizing that the simple,
two-variable regression model is limited in
information.
3. The preferred approach is to fit a
regression model with more than one
independent variable, that is, multiple
regression analysis. It allows the investigator
to assess the separate effects of several
unconfounded independent variables on a
single dependent variable.
A cautionary note:
For real progress, the
mathematical modeller as well as
the epidemiologist must have
mud on his boots. [Bradley,1982].
References:
1. Armitage P, Berry G. Automatic selection
procedures and collinearity. In: Statistical Methods
in Medical Research. 3rd ed. Blackwell Scientific
Publications, Oxford. 1994; 321-323.
2. Centre for Health Economics, Chulalongkorn
University, Bangkok, Thailand. Lecture Notes on
Econometrics. (1995/96) (unpublished)
3. Doti JL, Adibi E. Identifying and correcting
econometric problems. In: Econometric Analysis:
An Application Approach. Prentice-Hall, Inc, New
Jersey. 1987; 203-265.
4. Foster DP, Stine RA, Waterman RP. Summary
regression case. In: Business Analysis Using
Regression: A Case Book. Springer-Verlag New York,
Inc. 1998; 227.
5. Gujarati DN. Test of specification errors. In: Basic
Econometrics. 3rd ed. McGraw-Hill, Inc. Singapore.
1995; 461.
6. Kleinbaum DG, Kupper LL, Muller KE. Regression
diagnostics. In: Applied Regression Analysis and Other
Multivariable Methods. 2nd ed. PWS-KENT Publishing
Company, Boston. 1988; 181-225.
7. Roll Back Malaria. Malaria Early Warning Systems.
WHO/CDS/RBM/2001.32. 2001.