CHAPTER 2 – ECON 325
Basic Forecasting Tools
Time series data
• To reprise this concept, often our historical data will
consist of a sequence of observations over time. We call
such a sequence a time series.
• For example: monthly sales figures, daily stock prices, weekly
interest rates, yearly profits, daily maximum temperatures, annual
crop production, and electrocardiograph measurements are all time
series.
Time series data
• In forecasting, we are trying to estimate how the sequence of observations will continue into the future. To make things simple, we will assume that the times of observation are equally spaced.
• This is generally not a great restriction because most business
series are measured daily, monthly, quarterly, or yearly and so will
be equally spaced.
• However, when might non-equally-spaced data matter?
Graphical summaries
• The basic features of the data, including patterns and unusual observations, are most easily seen through graphs.
• Sometimes graphs also suggest possible explanations for
some of the variation in the data.
• For example: industrial disputes will often affect time series of
production; changes in government will affect economic time
series; changes in definitions may result in identifiable changes in
time series patterns.
Graphical summaries
• The type of data will determine which type of graph is
most appropriate.
• Three common plots are time plots, seasonal plots, and
scatter plots.
Time plots
• Time plots and time series patterns: For time series, the
most obvious graphical form is a time plot in which the
data are plotted over time.
• A time plot immediately reveals any trends over time, any
regular seasonal behavior, and other systematic features
of the data.
Time plots
• An important step in selecting an appropriate forecasting
method is to consider the types of data patterns, so that
the methods most appropriate to those patterns can be
utilized.
• Four types of time series data patterns can be distinguished: horizontal, seasonal, cyclical, and trend.
Time plots
• A horizontal (H) pattern exists when the data values
fluctuate around a constant mean.
• (Such a series is called “stationary” in its mean.)
• A product whose sales do not increase or decrease over time
would be of this type.
• A quality control situation involving sampling from a continuous
production process that theoretically does not change would also
show a horizontal pattern.
Time plots
• A seasonal (S) pattern exists when a series is influenced
by seasonal factors (e.g., the quarter of the year, the
month, or day of the week).
• Sales of products such as soft drinks, ice creams, and household
electricity consumption all exhibit this type of pattern.
• Seasonal series are sometimes also called “periodic” although they
do not exactly repeat themselves over each period.
Time plots
• A cyclical (C) pattern exists when the data exhibit rises and
falls that are not of a fixed period.
• For economic series, these are usually due to economic fluctuations
such as those associated with the business cycle.
• The sales of products such as automobiles, steel, and major
appliances exhibit this type of pattern.
• The major distinction between a seasonal and a cyclical pattern is that
the former is of a constant length and recurs on a regular periodic
basis, while the latter varies in length.
• Moreover, the average length of a cycle is usually longer than that of
seasonality and the magnitude of a cycle is usually more variable than
that of seasonality.
Time plots
• A trend (T) pattern exists when there is a long-term
increase or decrease in the data.
• The sales of many companies, the gross national product (GNP),
and many other business or economic indicators follow a trend
pattern in their movement over time.
Seasonal plots
• Seasonal plots: for time series data that are seasonal, it is
often useful to also produce a seasonal plot.
• This graph consists of the data plotted against the
individual “seasons” in which the data were observed.
This is something like a time plot except that the data from
each season are overlapped.
Seasonal plots
• A seasonal plot enables the underlying seasonal pattern
to be seen more clearly, and also allows any substantial
departures from the seasonal pattern to be easily
identified.
• Seasonal subseries plots are an alternative plot where the
data for each season are collected together in separate
mini time plots.
Scatter plots
• Some forecast data are not time series, and therefore
time or seasonal plots are inappropriate. However,
these data are well suited to a scatterplot, in which the
variable we wish to forecast is plotted against one of the
explanatory variables.
• In some circumstances, categorical (qualitative) variables can also be incorporated into a scatter plot; points in the plot would be categorized by color or icon shape/size in these cases.
Scatter plots
• When there are several potential predictor variables, it is
useful to plot each variable against each other variable.
These plots can be arranged in a scatterplot matrix.
• The value of the scatterplot matrix is that it enables a
quick view of the relationships between all pairs of
variables. Outliers can also be seen.
Numerical summaries
In addition to graphics, it is also helpful to provide
numerical summaries. A summary number for a data set is
called a statistic.
• For a single data set (univariate data) or a single time series, the
most common descriptive statistics are the mean, the standard
deviation, and the variance.
• In addition, in forecasting we also frequently make use of the
median (or other percentile value), as well as related concepts such
as the inter-quartile range (IQR).
Numerical summaries
library(fpp)                       # provides the fuel data set (and the forecast package)
fuel2 <- fuel[fuel$Litres < 2, ]   # keep only the smaller-engine cars
summary(fuel2[, "Carbon"])         # five-number summary plus the mean
sd(fuel2[, "Carbon"])              # standard deviation
Numerical summaries
For a pair of random variables (bivariate data) it is of
interest to describe how the two data sets relate to each
other. The most widely used summary numbers (statistics)
for this purpose are the covariance and the correlation.
Numerical summaries
Correlation (r) measures the strength of the linear
relationship between two variables, x and y. It is possible
for data to have a strong non-linear relationship, but low
correlation, so you should always plot the data you’re
analyzing.
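To illustrate this warning, here is a minimal R sketch (using simulated data, not a data set from the text) in which an almost perfectly quadratic relationship produces a correlation near zero:

set.seed(1)
x <- seq(-3, 3, length.out = 100)
y <- x^2 + rnorm(100, sd = 0.2)   # y is almost entirely determined by x
cor(x, y)                         # near 0: r only captures linear association
plot(y ~ x)                       # the plot reveals the relationship at once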
Correlation
[Figure: a set of scatterplots, each with a correlation coefficient of 0.82.]
Correlation
The values for r always lie between -1 and 1, with values closer to -1
indicating a stronger negative relationship, and values closer to 1
indicating a stronger positive relationship.
Numerical summaries
• The correlation equation is:

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}

• Note that the equation can also be written as r = \mathrm{Cov}_{xy}/(S_x S_y), where

\mathrm{Cov}_{xy} = \frac{1}{n-1} \sum_i (x_i - \bar{x})(y_i - \bar{y})
Numerical summaries
• For a single time series, it is very useful to compare the
observation at one time period with the observation at
another time period.
• The two most common statistics here are the autocovariance and
the autocorrelation.
Autocorrelation
• Autocorrelation measures the linear relationship between
lagged values of a time series.
• There are several autocorrelation coefficients, depending on the length of the lag selected. For example, r_1 measures the relationship between y_t and y_{t-1}.
Numerical summaries
• Autocorrelation equation:

r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

• Autocovariance:

c_k = \frac{1}{T} \sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})
Numerical summaries
beer2 <- window(ausbeer, start=1992, end=2006-.1)  # quarterly beer production, 1992–2005
lag.plot(beer2, lags=9, do.lines=FALSE)            # scatterplots of the series against its first 9 lags
Numerical summaries
• The autocorrelation coefficients are typically plotted in a
correlogram or autocorrelation function (ACF).
• The R code for this example is simply Acf(beer2).
White Noise
• Time series that show no autocorrelation are called “white
noise”.
set.seed(30)
x <- ts(rnorm(50))          # 50 independent standard normal observations
plot(x, main="White noise")
Acf(x)                      # no significant autocorrelation at any lag
• The dashed lines shown on the correlogram are at \pm 2/\sqrt{T}; for white noise, one would expect 95% of all of the autocorrelations to lie within this band (checked in the sketch below).
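For the simulated series above (T = 50), the band can be computed directly; a minimal check:

T <- length(x)                                     # x is the white-noise series above
2/sqrt(T)                                          # half-width of the band, about 0.28
mean(abs(acf(x, plot=FALSE)$acf[-1]) < 2/sqrt(T))  # share of sample autocorrelations inside it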
Exercise #3.1
Consider the data in “running”, showing running times and
maximal aerobic capacity for 14 female runners.
• Calculate the mean, median, interquartile range, and standard deviation
for each variable.
• Which of these statistics give a measure of the center of data and which
give a measure of the spread of data?
• Calculate the correlation of the two variables and produce a scatterplot.
• Why is it inappropriate to calculate the autocorrelation of these data?
Exercise #3.1
summary(running)
sd(running[,"capacity"])
sd(running[, "times”])   # This WON'T work in R! The closing quote is a curly “smart” quote
sd(running[, "times"])   # corrected version with plain straight quotes
cor(running)
plot(times ~ capacity, data = running, pch = 19, col = 2)
Simple forecasting methods
There are a few very simple, and yet often quite effective,
forecasting methods.
1. Average method
2. Naïve method
3. Seasonal naïve
4. Drift method
Simple forecasting methods
Average (mean) method: the forecasts of all future values
are equal to the mean of the historical data.
If we let historical data be denoted by y1,…,yT, then
yˆT h  y   y1  ...  yT  T
meanf(y, h)
# y contains the time series
# h is the forecast horizon
Simple forecasting methods
Naïve method: appropriate only for time series; all
forecasts are simply set to the value of the most recent
observation.
That is, the forecasts for all future values are set to be yT,
where yT is the last observed value.
naive(y, h)
rwf(y, h) # Alternative
Simple forecasting methods
Seasonal naïve: A similar method is useful for highly
seasonal data. In this case, we set each forecast to be
equal to the last observed value from the same season of
the year (e.g., the same month of the previous year).
snaive(y, h)
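Formally (a standard textbook formulation; m denotes the seasonal period, e.g. m = 4 for quarterly data):

\hat{y}_{T+h|T} = y_{T+h-m(k+1)},

where k = \lfloor (h-1)/m \rfloor is the number of complete seasonal cycles in the forecast period.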
Simple forecasting methods
Drift method: a variation on the naïve method is to allow
forecasts to increase or decrease over time, where the
amount of change over time (called the drift) is set to be the
average change seen in historical data.
This is equivalent to drawing a line between the first and
last observation, and extrapolating it into the future.
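In symbols, the drift forecast is:

\hat{y}_{T+h|T} = y_T + h \left( \frac{y_T - y_1}{T - 1} \right)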
rwf(y, h, drift = TRUE)
Transformations and adjustments
• Adjusting the historical data can often lead to a simpler
forecasting model.
• The purpose of transformations and adjustments is to
simplify the patterns in the historical data by removing
known sources of variation or by making the pattern more
consistent across the whole data set.
• Simpler patterns usually lead to more accurate forecasts.
Transformations and adjustments
• Mathematical transformations: If the data show variation
that increases or decreases with the level of the series,
then a transformation can be useful.
• For example, a logarithmic transformation is often useful.
If we denote the original observations as y1,…,yT and the
transformed observations as w1,…,wT, then wt=log(yt).
Transformations and adjustments
• Logarithms are useful because they are interpretable:
changes in a log value are relative (or percentage)
changes on the original scale.
• So if log base 10 is used, then an increase of 1 on the log scale
corresponds to a multiplication of 10 on the original scale.
• Another useful feature of log transformations is that they
constrain the forecasts to stay positive on the original
scale.
Transformations and adjustments
Sometimes other transformations are also used (although
they are not so interpretable). For example, square roots
and cube roots can be used. These are called power
transformations because they can be written in the form
w_t = y_t^p
Transformations and adjustments
A useful family of transformations that includes logarithms
and power transformations is the family of "Box-Cox
transformations", which depend on the parameter λ and are
defined as follows:
w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0; \\ (y_t^\lambda - 1)/\lambda & \text{otherwise.} \end{cases}
The logarithm in a Box-Cox transformation is always a
natural logarithm (i.e., to base e). So if λ=0, natural
logarithms are used, but if λ≠0, a power transformation is
used followed by some simple scaling.
Transformations and adjustments
plot(log(elec), ylab="Transformed electricity demand",
xlab="Year", main="Transformed monthly electricity demand")
title(main="Log", line=-1)
A good value of λ is one which makes the size of the
seasonal variation about the same across the whole series,
as that makes the forecasting model simpler. In this case,
λ=0.30 works quite well, although any value of λ between 0
and 0.5 would give similar results.
Transformations and adjustments
# The BoxCox.lambda() function will choose a value of lambda for you.
lambda <- BoxCox.lambda(elec)   # here lambda = 0.27
plot(BoxCox(elec, lambda))
Transformations and adjustments
Having chosen a transformation, we need to forecast the
transformed data. Then, we need to reverse the
transformation (or back-transform) to obtain forecasts on
the original scale. The reverse Box-Cox transformation is
given by:
y_t = \begin{cases} \exp(w_t) & \text{if } \lambda = 0; \\ (\lambda w_t + 1)^{1/\lambda} & \text{otherwise.} \end{cases}
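A minimal sketch of the round trip in R (inv_boxcox() below is a hand-rolled version of the formula above; the forecast package also supplies InvBoxCox() for the same purpose):

library(fpp)                       # provides elec and the forecast package
inv_boxcox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda*w + 1)^(1/lambda)
}
w <- BoxCox(elec, lambda = 0.27)   # transform
head(inv_boxcox(w, 0.27))          # back-transform by hand...
head(InvBoxCox(w, lambda = 0.27))  # ...or with the built-in; both recover elec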
Transformations and adjustments
• For many series, transformation does not often have a
major effect on forecast accuracy.
• This is because most forecast methods place more weight on the
most recent data. Therefore, earlier, typically smaller, variations
are unlikely to influence the forecast very much.
• Only when the series is rapidly changing in variation will
mathematical transformations make a larger difference to the
forecast.
• However, some of the measures of forecast accuracy give
equal weight to all data and so prediction intervals will be
affected by transformations.
Transformations and adjustments
• Calendar transformations: some variation seen in seasonal data may be due to simple calendar effects. In such cases, it is usually much easier to remove the variation before fitting a forecasting model.
• For example, if you are studying monthly milk production
on a farm, then there will be variation between the months
simply because of the different numbers of days in each
month in addition to seasonal variation across the year.
Transformations and adjustments
# Days in each month across the 14 years of data
monthdays <- rep(c(31,28,31,30,31,30,31,31,30,31,30,31), 14)
monthdays[26 + (4*12)*(0:2)] <- 29   # set the three leap-year Februaries to 29 days
par(mfrow=c(2,1))
plot(milk, main="Monthly milk production per cow",
     ylab="Pounds", xlab="Years")
plot(milk/monthdays, main="Average milk production per cow per day",
     ylab="Pounds", xlab="Years")
Transformations and adjustments
• A similar adjustment can be done for sales data when the
number of trading days in each month will vary. In this
case, the sales per trading day can be modelled instead
of the total sales for each month.
• Population adjustments: any data that are affected by population changes can be adjusted to give per-capita data.
• Inflation adjustments: data that are affected by the value of money are best adjusted before modelling. Financial time series are usually adjusted so all values are stated in dollar values from a particular year. Both adjustments are sketched in the code below.
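A minimal sketch of both adjustments (the series here are hypothetical, invented purely for illustration):

sales      <- ts(c(500, 530, 560, 600), start = 2010)      # hypothetical total dollar sales
population <- ts(c(10.0, 10.2, 10.4, 10.7), start = 2010)  # hypothetical population, millions
cpi        <- ts(c(100, 103, 106, 110), start = 2010)      # hypothetical price index, base = 100
sales / population   # population adjustment: per-capita sales
sales / cpi * 100    # inflation adjustment: sales in base-year dollars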
Measuring Forecast Accuracy
• How do we measure the suitability of a particular forecasting method for a given data set?
• In most forecasting situations, accuracy is treated as the
overriding criterion for selecting a forecasting method.
• In many instances, the word “accuracy” refers to the “goodness of
fit,” which in turn refers to how well the forecasting model is able to
reproduce the data that are already known.
• To the consumer of forecasts, it is the accuracy of the future
forecast that is most important.
Measuring Forecast Accuracy
Consider the Australian quarterly beer production figures.
We’re going to produce forecasts based upon data through
the end of 2005.
beer2 <- window(ausbeer, start=1992, end=2006-.1)  # training data: 1992 Q1 through 2005 Q4
beerfit1 <- meanf(beer2, h=11)    # mean method, 11 quarters ahead
beerfit2 <- rwf(beer2, h=11)      # naive method
beerfit3 <- snaive(beer2, h=11)   # seasonal naive method
plot(beerfit1, plot.conf=FALSE,
     main="Forecasts for quarterly beer production")
lines(beerfit2$mean, col=2)
lines(beerfit3$mean, col=3)
lines(ausbeer)
legend("topright", lty=1, col=c(4,2,3),
       legend=c("Mean method","Naive method","Seasonal naive method"))
Measuring Forecast Accuracy
• Let y_i denote the ith observation and let \hat{y}_i denote a forecast of y_i.
• Scale-dependent errors
• The forecast error is simply e_i = y_i - \hat{y}_i, which is on the same scale as the data.
• Accuracy measures that are based on e_i are therefore scale-dependent and cannot be used to make comparisons between series that are on different scales.
Measuring Forecast Accuracy
• The two most commonly used scale-dependent measures are based on the absolute errors or the squared errors; each is computed in the sketch below.
• Mean Error (ME) = mean of the errors
• Mean Absolute Error (MAE) = mean of the absolute values of the errors
• Root Mean Square Error (RMSE) = square root of the mean of the squared errors
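A minimal sketch of the three measures, given a hypothetical vector of forecast errors:

e <- c(2, -1, 3, -2, 0.5)   # hypothetical forecast errors e_i = y_i - yhat_i
mean(e)                     # ME: signed errors can cancel, so ME mainly reveals bias
mean(abs(e))                # MAE
sqrt(mean(e^2))             # RMSE: penalizes large errors more heavily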
Measuring Forecast Accuracy
• Percentage errors
• The percentage error is given by p_i = 100 e_i / y_i. Percentage errors have the advantage of being scale-independent, and so are frequently used to compare forecast performance between different data sets.
• The most commonly used measure is the mean absolute percentage error (MAPE), which is the mean of the absolute values of the percentage errors (computed in the sketch below).
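Continuing the sketch above, with hypothetical actual values y alongside the errors e:

y <- c(20, 25, 30, 28, 22)   # hypothetical actual values
p <- 100*e/y                 # percentage errors
mean(abs(p))                 # MAPE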
Measuring Forecast Accuracy
• Scaled errors
• As an alternative to percentage errors, errors can be scaled by the training-set MAE of a simple benchmark forecasting method (usually the naïve forecast for a time series), as formalized below.
• A scaled error is less than one if it arises from a better forecast than the average one-step naïve forecast computed on the training data.
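In symbols, the non-seasonal scaled error (this is the quantity underlying the mean absolute scaled error, MASE, of Hyndman and Koehler) is:

q_j = \frac{e_j}{\frac{1}{T-1} \sum_{t=2}^{T} \lvert y_t - y_{t-1} \rvert}

so MASE is the mean of |q_j|; the denominator is the in-sample MAE of the naïve method.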
Measuring Forecast Accuracy
beer3 <- window(ausbeer, start=2006)   # test/holdout data: the quarters after the training set
accuracy(beerfit1, beer3)              # mean method
accuracy(beerfit2, beer3)              # naive method
accuracy(beerfit3, beer3)              # seasonal naive method
Out-of-sample Accuracy Measurement
• The summary statistics described thus far measure the
goodness of fit of the model to historical data. Such fitting
does not necessarily imply good forecasting.
• As in the beer example above, it is prudent to divide the total data into an initialization/training set and a test/holdout set.
• The initialization set is used to estimate any parameters and to initialize the method.
• Forecasts are made for the test set, and accuracy measures are computed for the errors in the test set only.
Residual Diagnostics
• A residual is the difference between an observed value
and its forecast based on other observations.
• For time series, a residual is based on one-step-ahead forecasts.
That is, the forecast of yt is based on yt-1,…,y1.
• For cross-sectional forecasts, the residual is calculated based on
forecasts using all observations other than the one being
examined.
Residual Diagnostics
A good forecasting model will yield residuals with the
following properties:
1. The residuals are uncorrelated. If there are correlations
between residuals, then there is information left in the
residuals which should be used in computing forecasts.
2. The residuals have zero mean. If the residuals have a
mean other than zero, then the forecasts are biased.
Residual Diagnostics
In addition to these essential properties, it is useful (but not
necessary) for the residuals to also have the following two
properties.
1. The residuals have constant variance.
2. The residuals are normally distributed.
These two properties make the calculation of prediction
intervals easier.
Residual Diagnostics
If either of the previous two properties is not satisfied, then
the forecasting method can be modified to give better
forecasts.
• Adjusting for bias is easy: if the residuals have mean m, then
simply add m to all forecasts and the bias problem is solved.
• Fixing the correlation problem is harder.
Residual Diagnostics
Example – Forecasting the DJIA: When forecasting equity
indices, the best forecast is often the naïve one. In this
case, the residual is simply the difference between
consecutive observations.
dj2 <- window(dj, end=250)   # first 250 trading days
plot(dj2, main="Dow Jones Index (daily ending 15 Jul 94)",
     ylab="", xlab="Day")
res <- residuals(naive(dj2))   # naive residuals: differences between consecutive observations
plot(res, main="Residuals from naive method", ylab="", xlab="Day")
Acf(res, main="ACF of residuals")                       # check: residuals should look like white noise
hist(res, nclass="FD", main="Histogram of residuals")   # check: roughly normal?
qqnorm(res)
qqline(res)
Residual Diagnostics
• Portmanteau tests for autocorrelation: these test whether a group of autocorrelations of the residuals is significantly different from zero as a whole, i.e., whether the residuals are distinguishable from white noise. This differs from a correlogram (Acf), where each autocorrelation coefficient is examined separately.
• Box-Pierce test and Ljung-Box test are two examples.
Box.test(res, lag=10, fitdf=0)             # Box-Pierce test
Box.test(res, lag=10, fitdf=0, type="Lj")  # Ljung-Box test; fitdf = number of model parameters (0 for naive)