CHAPTER 2 – ECON 325 Basic Forecasting Tools

Time series data
• To reprise this concept, often our historical data will consist of a sequence of observations over time. We call such a sequence a time series.
• For example: monthly sales figures, daily stock prices, weekly interest rates, yearly profits, daily maximum temperatures, annual crop production, and electrocardiograph measurements are all time series.

Time series data
• In forecasting, we are trying to estimate how the sequence of observations will continue into the future. To keep things simple, we will assume that the times of observation are equally spaced.
• This is generally not a severe restriction, because most business series are measured daily, monthly, quarterly, or yearly and so will be equally spaced.
• However, when might non-equally-spaced data matter?

Graphical summaries
• The basic features of the data, including patterns and unusual observations, are most easily seen through graphs.
• Sometimes graphs also suggest possible explanations for some of the variation in the data.
• For example: industrial disputes will often affect time series of production; changes in government will affect economic time series; changes in definitions may result in identifiable changes in time series patterns.

Graphical summaries
• The type of data will determine which type of graph is most appropriate.
• Three common plots are time plots, seasonal plots, and scatter plots.

Time plots
• Time plots and time series patterns: for time series, the most obvious graphical form is a time plot, in which the data are plotted over time.
• A time plot immediately reveals any trends over time, any regular seasonal behavior, and other systematic features of the data.

Time plots
• An important step in selecting an appropriate forecasting method is to consider the types of data patterns, so that the methods most appropriate to those patterns can be utilized.
• Four types of time series data patterns can be distinguished: horizontal, seasonal, cyclical, and trend.

Time plots
• A horizontal (H) pattern exists when the data values fluctuate around a constant mean. (Such a series is called “stationary” in its mean.)
• A product whose sales do not increase or decrease over time would be of this type.
• A quality control situation involving sampling from a continuous production process that theoretically does not change would also show a horizontal pattern.

Time plots
• A seasonal (S) pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
• Sales of products such as soft drinks and ice cream, and household electricity consumption, all exhibit this type of pattern.
• Seasonal series are sometimes also called “periodic”, although they do not exactly repeat themselves over each period.

Time plots
• A cyclical (C) pattern exists when the data exhibit rises and falls that are not of a fixed period.
• For economic series, these are usually due to economic fluctuations such as those associated with the business cycle.
• The sales of products such as automobiles, steel, and major appliances exhibit this type of pattern.
• The major distinction between a seasonal and a cyclical pattern is that the former is of a constant length and recurs on a regular periodic basis, while the latter varies in length.
• Moreover, the average length of a cycle is usually longer than that of seasonality, and the magnitude of a cycle is usually more variable than that of seasonality.

Time plots
• A trend (T) pattern exists when there is a long-term increase or decrease in the data.
• The sales of many companies, the gross national product (GNP), and many other business or economic indicators follow a trend pattern in their movement over time.
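Time plots
These patterns are easiest to see in a time plot. A minimal R sketch (a sketch only, assuming the fpp package used throughout these slides is installed; ausbeer is quarterly Australian beer production):

library(fpp)  # loads the forecast package plus the example data sets
plot(ausbeer, main="Quarterly Australian beer production",
     ylab="Megalitres", xlab="Year")  # trend and seasonal behavior are both visible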
Seasonal plots
• Seasonal plots: for time series data that are seasonal, it is often useful to also produce a seasonal plot.
• This graph consists of the data plotted against the individual “seasons” in which the data were observed. It is something like a time plot, except that the data from each season are overlapped.

Seasonal plots
• A seasonal plot enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
• Seasonal subseries plots are an alternative plot in which the data for each season are collected together in separate mini time plots.

Scatter plots
• Some forecast data are not time series, and therefore time or seasonal plots are inappropriate. However, these data are well suited to a scatterplot, in which the variable we wish to forecast is plotted against one of the explanatory variables.
• In some circumstances, categorical (qualitative) variables can also be incorporated into a scatterplot; points in the plot would be distinguished by color or by icon shape/size in these cases.

Scatter plots
• When there are several potential predictor variables, it is useful to plot each variable against each other variable. These plots can be arranged in a scatterplot matrix.
• The value of the scatterplot matrix is that it enables a quick view of the relationships between all pairs of variables. Outliers can also be seen.

Numerical summaries
• In addition to graphics, it is also helpful to provide numerical summaries. A summary number for a data set is called a statistic.
• For a single data set (univariate data) or a single time series, the most common descriptive statistics are the mean, the standard deviation, and the variance.
• In addition, in forecasting we also frequently make use of the median (or another percentile value), as well as related concepts such as the inter-quartile range (IQR).

Numerical summaries
fuel2 <- fuel[fuel$Litres < 2, ]
summary(fuel2[, "Carbon"])
sd(fuel2[, "Carbon"])

Numerical summaries
• For a pair of random variables (bivariate data), it is of interest to describe how the two data sets relate to each other. The most widely used summary numbers (statistics) for this purpose are the covariance and the correlation.

Numerical summaries
• Correlation (r) measures the strength of the linear relationship between two variables, x and y.
• It is possible for data to have a strong non-linear relationship but low correlation, so you should always plot the data you’re analyzing.

Correlation
• [Figure: a set of scatterplots, each with a correlation coefficient of 0.82, despite showing very different relationships.]

Correlation
• The values for r always lie between -1 and 1, with values closer to -1 indicating a stronger negative relationship, and values closer to 1 indicating a stronger positive relationship.

Numerical summaries
• The correlation equation is:

  $r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$

• Note that the equation can also be written as $r = \mathrm{Cov}_{xy}/(S_x S_y)$, where

  $\mathrm{Cov}_{xy} = \dfrac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})$
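Numerical summaries
A minimal sketch of these summaries in R (the vectors x and y here are hypothetical, purely for illustration):

x <- c(1.2, 2.4, 3.1, 4.8, 5.0)    # hypothetical explanatory variable
y <- c(2.0, 4.1, 6.3, 8.2, 9.9)    # hypothetical variable to forecast
mean(x); median(x)                 # measures of the center of the data
sd(x); var(x); IQR(x)              # measures of the spread of the data
cov(x, y)                          # covariance
cor(x, y)                          # correlation; equals cov(x, y) / (sd(x) * sd(y))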
Numerical summaries
• For a single time series, it is very useful to compare the observation at one time period with the observation at another time period.
• The two most common statistics here are the autocovariance and the autocorrelation.

Autocorrelation
• Autocorrelation measures the linear relationship between lagged values of a time series.
• There are several autocorrelation coefficients, depending on the length of the lag selected. For example, $r_1$ measures the relationship between $y_{t-1}$ and $y_t$.

Numerical summaries
• The autocorrelation equation is:

  $r_k = \dfrac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$

• The autocovariance is:

  $c_k = \dfrac{1}{T}\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})$

Numerical summaries
beer2 <- window(ausbeer, start=1992, end=2006-.1)
lag.plot(beer2, lags=9, do.lines=FALSE)

Numerical summaries
• The autocorrelation coefficients are typically plotted in a correlogram, or autocorrelation function (ACF).
• The R code for this example is simply Acf(beer2).

White Noise
• Time series that show no autocorrelation are called “white noise”.
set.seed(30)
x <- ts(rnorm(50))
plot(x, main="White noise")
Acf(x)
• The dashed lines shown on the correlogram are equal to $\pm 2/\sqrt{T}$; for white noise, one would expect 95% of all of the autocorrelations to lie within this band.

Exercise #3.1
Consider the data in “running”, showing running times and maximal aerobic capacity for 14 female runners.
• Calculate the mean, median, interquartile range, and standard deviation for each variable.
• Which of these statistics give a measure of the center of the data, and which give a measure of the spread of the data?
• Calculate the correlation of the two variables and produce a scatterplot.
• Why is it inappropriate to calculate the autocorrelation of these data?

Exercise #3.1
summary(running)
sd(running[, "capacity"])
sd(running[, "times”])   # This WON'T work in R! (the closing quote is a curly quote)
sd(running[, "times"])   # Corrected: straight quotes
cor(running)
plot(times ~ capacity, data = running, pch = 19, col = 2)

Simple forecasting methods
There are a few very simple, and yet often quite effective, forecasting methods:
1. Average method
2. Naïve method
3. Seasonal naïve method
4. Drift method

Simple forecasting methods
• Average (mean) method: the forecasts of all future values are equal to the mean of the historical data. If we let the historical data be denoted by $y_1, \ldots, y_T$, then

  $\hat{y}_{T+h|T} = \bar{y} = (y_1 + \cdots + y_T)/T$

meanf(y, h)  # y contains the time series; h is the forecast horizon

Simple forecasting methods
• Naïve method: only appropriate for time series; all forecasts are simply set to the value of the previous observation. That is, the forecasts of all future values are set to $y_T$, where $y_T$ is the last observed value.
naive(y, h)
rwf(y, h)  # Alternative

Simple forecasting methods
• Seasonal naïve method: a similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year).
snaive(y, h)

Simple forecasting methods
• Drift method: a variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
rwf(y, h, drift = TRUE)
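Simple forecasting methods
As a quick side-by-side demonstration of these four methods (a sketch reusing the beer2 series defined earlier; all four functions come from the forecast package loaded via fpp):

beer2 <- window(ausbeer, start=1992, end=2006-.1)
fc1 <- meanf(beer2, h=11)             # average method
fc2 <- naive(beer2, h=11)             # naïve method
fc3 <- snaive(beer2, h=11)            # seasonal naïve method
fc4 <- rwf(beer2, h=11, drift=TRUE)   # drift method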
Transformations and adjustments
• Adjusting the historical data can often lead to a simpler forecasting model.
• The purpose of transformations and adjustments is to simplify the patterns in the historical data by removing known sources of variation or by making the pattern more consistent across the whole data set.
• Simpler patterns usually lead to more accurate forecasts.

Transformations and adjustments
• Mathematical transformations: if the data show variation that increases or decreases with the level of the series, then a transformation can be useful.
• For example, a logarithmic transformation is often useful. If we denote the original observations as $y_1, \ldots, y_T$ and the transformed observations as $w_1, \ldots, w_T$, then $w_t = \log(y_t)$.

Transformations and adjustments
• Logarithms are useful because they are interpretable: changes in a log value are relative (or percentage) changes on the original scale.
• So if log base 10 is used, then an increase of 1 on the log scale corresponds to a multiplication by 10 on the original scale.
• Another useful feature of log transformations is that they constrain the forecasts to stay positive on the original scale.

Transformations and adjustments
• Sometimes other transformations are also used (although they are not so interpretable). For example, square roots and cube roots can be used. These are called power transformations because they can be written in the form $w_t = y_t^p$.

Transformations and adjustments
• A useful family of transformations that includes logarithms and power transformations is the family of "Box-Cox transformations", which depend on the parameter λ and are defined as follows:

  $w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise.} \end{cases}$

• The logarithm in a Box-Cox transformation is always a natural logarithm (i.e., to base e). So if λ = 0, natural logarithms are used, but if λ ≠ 0, a power transformation is used, followed by some simple scaling.

Transformations and adjustments
plot(log(elec), ylab="Transformed electricity demand",
     xlab="Year", main="Transformed monthly electricity demand")
title(main="Log", line=-1)
• A good value of λ is one which makes the size of the seasonal variation about the same across the whole series, as that makes the forecasting model simpler. In this case, λ = 0.30 works quite well, although any value of λ between 0 and 0.5 would give similar results.

Transformations and adjustments
# The BoxCox.lambda() function will choose a value of lambda for you.
lambda <- BoxCox.lambda(elec)  # = 0.27
plot(BoxCox(elec, lambda))

Transformations and adjustments
Having chosen a transformation, we need to forecast the transformed data. Then, we need to reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformation is given by:

  $y_t = \begin{cases} \exp(w_t) & \text{if } \lambda = 0 \\ (\lambda w_t + 1)^{1/\lambda} & \text{otherwise.} \end{cases}$
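Transformations and adjustments
In practice the back-transformation can be handled for you: forecasting functions in the forecast package accept a lambda argument and return forecasts on the original scale. A minimal sketch (BoxCox() and InvBoxCox() are the package's transformation helpers):

lambda <- BoxCox.lambda(elec)             # choose lambda from the data
w <- BoxCox(elec, lambda)                 # transformed series
y <- InvBoxCox(w, lambda)                 # back-transform; recovers the original series
fc <- snaive(elec, h=24, lambda=lambda)   # forecasts are back-transformed automatically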
Transformations and adjustments
• For many series, transformation often does not have a major effect on forecast accuracy.
• This is because most forecast methods place more weight on the most recent data. Therefore earlier, typically smaller, variations are unlikely to influence the forecast very much.
• Only when the series is rapidly changing in variation will mathematical transformations make a larger difference to the forecast.
• However, some of the measures of forecast accuracy give equal weight to all data, and so prediction intervals will be affected by transformations.

Transformations and adjustments
• Calendar transformations: some variation seen in seasonal data may be due to simple calendar effects. In such cases, it is usually much easier to remove the variation before fitting a forecasting model.
• For example, if you are studying monthly milk production on a farm, then there will be variation between the months simply because of the different numbers of days in each month, in addition to seasonal variation across the year.

Transformations and adjustments
monthdays <- rep(c(31,28,31,30,31,30,31,31,30,31,30,31), 14)  # 14 years of month lengths
monthdays[26 + (4*12)*(0:2)] <- 29  # set the leap-year Februaries to 29 days
par(mfrow=c(2,1))
plot(milk, main="Monthly milk production per cow",
     ylab="Pounds", xlab="Years")
plot(milk/monthdays, main="Average milk production per cow per day",
     ylab="Pounds", xlab="Years")

Transformations and adjustments
• A similar adjustment can be done for sales data when the number of trading days in each month varies. In this case, the sales per trading day can be modelled instead of the total sales for each month.
• Population adjustments: any data that are affected by population changes can be adjusted to give per-capita data.
• Inflation adjustments: data that are affected by the value of money are best adjusted before modelling. Financial time series are usually adjusted so that all values are stated in dollar values from a particular year.

Measuring Forecast Accuracy
• How do we measure the suitability of a particular forecasting method for a given data set?
• In most forecasting situations, accuracy is treated as the overriding criterion for selecting a forecasting method.
• In many instances, the word “accuracy” refers to the “goodness of fit,” which in turn refers to how well the forecasting model is able to reproduce the data that are already known.
• To the consumer of forecasts, however, it is the accuracy of the future forecast that is most important.

Measuring Forecast Accuracy
Consider the Australian quarterly beer production figures. We’re going to produce forecasts based upon data through the end of 2005.
beer2 <- window(ausbeer, start=1992, end=2006-.1)
beerfit1 <- meanf(beer2, h=11)
beerfit2 <- rwf(beer2, h=11)
beerfit3 <- snaive(beer2, h=11)
plot(beerfit1, plot.conf=FALSE, main="Forecasts for quarterly beer production")
lines(beerfit2$mean, col=2)
lines(beerfit3$mean, col=3)
lines(ausbeer)
legend("topright", lty=1, col=c(4,2,3),
       legend=c("Mean method","Naive method","Seasonal naive method"))

Measuring Forecast Accuracy
• Let $y_i$ denote the ith observation and let $\hat{y}_i$ denote a forecast of $y_i$.
• Scale-dependent errors: the forecast error is simply $e_i = y_i - \hat{y}_i$, which is on the same scale as the data.
• Accuracy measures that are based on $e_i$ are therefore scale-dependent and cannot be used to make comparisons between series that are on different scales.

Measuring Forecast Accuracy
• The most commonly used scale-dependent measures are based on the absolute errors or squared errors:
• Mean Error (ME) = mean of the errors
• Mean Absolute Error (MAE) = mean of the absolute values of the errors
• Root Mean Square Error (RMSE) = square root of the mean of the squared errors

Measuring Forecast Accuracy
• Percentage errors: the percentage error is given by $p_i = 100 e_i / y_i$. Percentage errors have the advantage of being scale-independent, and so are frequently used to compare forecast performance between different data sets.
• The most commonly used measure is the mean absolute percentage error (MAPE), which is the mean of the absolute values of the percentage errors.

Measuring Forecast Accuracy
• Scaled errors: as an alternative to percentage errors, errors can be scaled by the training MAE of a simple forecasting method (usually a naïve forecast for a time series).
• A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data.
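Measuring Forecast Accuracy
For a non-seasonal time series, the standard definition of a scaled error (following Hyndman and Koehler's mean absolute scaled error, MASE) uses the in-sample naïve forecast as the benchmark:

  $q_j = \dfrac{e_j}{\frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|}$

The MASE is then simply the mean of the $|q_j|$ values.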
Measuring Forecast Accuracy
beer3 <- window(ausbeer, start=2006)
accuracy(beerfit1, beer3)
accuracy(beerfit2, beer3)
accuracy(beerfit3, beer3)

Out-of-sample Accuracy Measurement
• The summary statistics described thus far measure the goodness of fit of the model to historical data. Such fitting does not necessarily imply good forecasting.
• As in the beer example above, it is prudent to divide the total data into an initialization/training set and a test/holdout set.
• The initialization set is used to estimate any parameters and to initialize the method.
• Forecasts are made for the test set, and accuracy measures are computed for the errors in the test set only.

Residual Diagnostics
• A residual is the difference between an observed value and its forecast based on other observations.
• For time series, a residual is based on one-step-ahead forecasts. That is, the forecast of $y_t$ is based on $y_{t-1}, \ldots, y_1$.
• For cross-sectional forecasts, the residual is calculated based on forecasts using all observations other than the one being examined.

Residual Diagnostics
A good forecasting model will yield residuals with the following properties:
1. The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts.
2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.

Residual Diagnostics
In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties:
1. The residuals have constant variance.
2. The residuals are normally distributed.
These two properties make the calculation of prediction intervals easier.

Residual Diagnostics
If either of the previous two properties is not satisfied, then the forecasting method can be modified to give better forecasts.
• Adjusting for bias is easy: if the residuals have mean m, then simply add m to all forecasts and the bias problem is solved.
• Fixing the correlation problem is harder.

Residual Diagnostics
Example – Forecasting the DJIA: when forecasting equity indices, the best forecast is often the naïve one. In this case, the residual is simply the difference between consecutive observations.
dj2 <- window(dj, end=250)
plot(dj2, main="Dow Jones Index (daily ending 15 Jul 94)", ylab="", xlab="Day")
res <- residuals(naive(dj2))
plot(res, main="Residuals from naive method", ylab="", xlab="Day")
Acf(res, main="ACF of residuals")
hist(res, nclass="FD", main="Histogram of residuals")
qqnorm(res)
qqline(res)

Residual Diagnostics
• Portmanteau tests for autocorrelation: these test whether a group of autocorrelations of the residuals is statistically significantly different from zero, i.e., from white noise. This differs from the Acf/correlogram approach, where each autocorrelation coefficient is examined separately.
• The Box-Pierce test and the Ljung-Box test are two examples.
Box.test(res, lag=10, fitdf=0)
Box.test(res, lag=10, fitdf=0, type="Lj")
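Residual Diagnostics
For reference, the two test statistics (standard definitions) are:

  Box-Pierce: $Q = T \sum_{k=1}^{h} r_k^2$

  Ljung-Box: $Q^{*} = T(T+2) \sum_{k=1}^{h} \dfrac{r_k^2}{T-k}$

where h is the number of lags tested and T is the number of observations. Large values of Q or Q* (small p-values from Box.test) suggest the residuals are not white noise. The fitdf argument is set to the number of parameters estimated by the model; the naïve method estimates none, hence fitdf=0 above.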