Chapter 1: Introduction to Business Forecasting
1. Introduction
  a. A forecast is essentially a prediction.
  b. “We might think of forecasting as a set of tools that helps decision makers make the best possible judgments about future events.”
  c. Forecasting methods can be broken up into two different categories.
    i. Quantitative forecasting methods, such as time series methods (which use time series data) and regression modeling (which uses cross-sectional data). 1. The time series methods we will use will forecast a particular variable using past observations of the same variable. 2. Regression modeling will forecast a particular variable using observations on other variables. a. Regression modeling relies on time series data to make forecasts of explanatory variables. 3. Quantitative forecasts are strongly tied to historical data. Quantitative forecasts will involve, at some point in the forecast process, making a forecast of a certain variable based on past observations of the same variable.
    ii. “Subjective or qualitative forecasting methods.” 1. Although not called quantitative, subjective/qualitative forecasting methods do typically involve quantitative calculations. However, these calculations are less tied to historical data.
  d. “This text and its accompanying computer software (ForecastX) have been carefully designed to provide you with an understanding of the conceptual basis for many modern quantitative forecasting models …”
2. Where is forecasting used?
  a. Business
    i. “Forecasting in today’s business world is becoming increasingly important as firms focus on increasing customer satisfaction while reducing the cost of providing products and services… The term “lean” has come to represent an approach to removing waste from business systems while providing the same, or higher, levels of quality and output to customers (business customers as well as end users). One major business cost involves inventory, both of inputs and of final products.”
    ii. Inventory management: 1. Prevent excessive inventory: “One major business cost involves inventory, both of inputs and of final products. Through better forecasting, inventory costs can be reduced and wasteful inventory eliminated.” a. The cost of excessive inventory is either the opportunity cost of resources tied up in the inventory (a holding cost) or the depreciated value of the inventory. Even when excessive inventory can be held without depreciation, there will still be holding costs: either the foregone interest that could have been earned on the money tied up in the inventory, or the interest that must be paid on money that must be borrowed (since available resources are tied up as inventory) to finance continuing operations. 2. Reduce the likelihood of lost sales due to too small an inventory.
    iii. Virtually every functional area of business makes use of some type of forecast. Other examples of business forecasting needs: 1. Accounting a. “Accountants rely on forecasts of costs and revenues in tax planning.” 2. Personnel a. “The personnel department depends on forecasts as it plans recruitment of new employees and other changes in the workforce.” 3. Finance a. “Financial experts must forecast cash flows to maintain solvency.” 4. Production management a. “Production managers rely on forecasts to determine raw-material needs and the desired inventory of finished products.” 5. Marketing
    a. “Marketing managers use a sales forecast to establish promotional budgets.”
    iv. Note that “the sales forecast is often the root forecast from which others … are derived.”
  b. Public and not-for-profit sectors.
    i. Examples: 1. Forecasting the demand for police patrol services. a. W = 5.66 + 1.84 POP + 1.70 ARR – 0.93 AFF + 0.61 VAC + 0.13 DEN. i. W: call-for-service workload. ii. POP: a population factor. iii. ARR: an arrest factor. iv. AFF: an affluence factor. v. VAC: a vacancy factor. vi. DEN: a density factor. 2. Forecasting state budgets. 3. Forecasting hospital nursing staff requirements.
3. Forecasting and supply chain management
  a. “We can think of the supply chain as encompassing all of the various flows between suppliers, producers, distributors (wholesalers, retailers, etc.), and consumers. Throughout this chain each participant, prior to the final consumer, must manage supplies, inventories, production, and shipping in one form or another… Each one of these suppliers has its own suppliers back one more step in the supply chain. With all of these businesses trying to reduce inventory costs … reliability and cooperation across the supply chain become essential.”
4. Collaborative forecasting
  a. “The recognition that improving functions throughout the supply chain can be aided by appropriate use of forecasting tools has led to increased cooperation among supply chain partners.”
  b. “This cooperative effort … has become known as Collaborative Planning Forecasting and Replenishment (CPFR). CPFR involves coordination, communication, and cooperation among participants in the supply chain.”
  c. “In its simplest form the process is as follows: i. A manufacturer that produces a consumer good computes its forecast. ii. That forecast is then shared with retailers that sell that product to end-use consumers. iii. Those retailers respond with any specific knowledge that they have regarding their future intentions related to purchases based on known promotions, programs, shutdown, or other proprietary information about which the manufacturer may not have had prior knowledge. iv. The manufacturer then updates the forecast including the shared information.”
  d. Benefits of collaborative forecasting include: i. Lower inventory and capacity buffers. ii. Fewer unplanned shipments or production runs. 1. “These unplanned shipments usually carry a premium price.” iii. Reduced stockouts. 1. Stockouts “will always have a negative impact on the seller due to lost [current] sales and lower customer satisfaction” and hence lower future sales. iv. Increased customer satisfaction and repeat business. v. Better preparation for sales promotions. 1. “No one wants to promote products that cannot be supplied.” vi. Better preparation for new product introductions. 1. “New product launches can be very tricky … Meeting the needs of new product launches can [optimize] launch timing and increase speed to market.” vii. The ability to respond dynamically to market changes. 1. “Sometimes markets change based on external factors (popular culture, government controls, etc.). Being able to respond to these special cases without overstocking or understocking is critical.”
  e. Potential costs of collaborative forecasting
    i. “In a collaborative environment there is a lot of information that flows between the two parties. Most of the time, information resides in public forums (computer servers) with only a software security system protecting it from outsiders. Collaborative forecasting does run the risk of loss of confidentiality to … competitors.”
5. Computer use and quantitative forecasting
  a. “The widespread availability of computers has contributed to the use of quantitative forecasting techniques. Most of the methods described in this text fall into the realm of quantitative forecasting techniques, many of which would not be practical to carry out by hand.”
  b. Charles W. Chase, Jr., formerly director of forecasting at Johnson & Johnson Consumer Products, Inc., on the relationship between quantitative methods and judgment:
    i. “Forecasting is a blend of science and art. Like most things in business, the rule of 80/20 applies to forecasting. By and large, forecasts are driven 80 percent mathematically and 20 percent judgmentally.”
6. Qualitative or subjective forecasting methods
  a. “Quantitative techniques using the power of the computer have come to dominate the forecasting landscape. However, there is a rich history of forecasting based on subjective and judgmental methods, some of which remain useful even today. These methods are probably most appropriately used when the forecaster is faced with a severe shortage of historical data and/or when quantitative expertise is not available… Very long-range forecasting is an example of such a situation.”
  b. Examples
    i. Sales-force composites. 1. “Members of the sales force are asked to estimate sales for each product they handle.”
    ii. Surveys of customers and the general population. 1. “In some situations it may be practical to survey customers for advanced information about their buying intentions.”
    iii. Jury of executive opinion. 1. “a forecast is developed by combining the subjective opinions of the managers and executives who are most likely to have the best insights about the firm’s business”
    iv. The Delphi method. 1. “Similar to the jury of executive opinion ... [with] the additional advantage … of anonymity among the participants. The experts, perhaps five to seven in number, never meet to discuss their views; none of them even knows who else is on the panel.”
  c. Advantages
    i. Can be used when there is a shortage of quantitative expertise. 1. “They do not require any particular mathematical background of the individuals involved. As future business professionals, like yourself, become better trained in quantitative forms of analysis, this advantage will become less important.”
    ii. Can be used when there is a shortage of historical data: for example, salesperson and consumer surveys are useful when forecasting new product sales.
    iii. Wide acceptance. 1. “Historically, another advantage of subjective methods has been their wide acceptance by users.”
    iv. Flexible due to subjectivity. 1. “The underlying models are, by definition, subjective [i.e., determined by the person doing the analysis]. This subjectivity is nonetheless the most important advantage of this class of methods. There are often forces at work that cannot be captured by quantitative methods. They can, however, be sensed by experienced business professionals and can make an important contribution to improved forecasts.”
    v. Complement quantitative techniques. 1. “Quantitative methods reduced errors to about 60 percent of those that resulted from the subjective method that had been in use. When the less accurate subjective method was combined with quantitative methods, errors were further reduced to about 40 percent of the level when the subjective method was used alone.”
    vi. Compared to quantitative analysis, the analysis itself is less costly.
    vii. Compared to quantitative analysis, avoids data collection costs.
  d. Disadvantages
    i. No straightforward algorithmic reference for how the forecast was made. 1. “Users are increasingly concerned with how the forecast was developed, and with most subjective methods it is difficult to be specific in this regard.”
    ii. Subject to bias, because they rely on opinions often based on subjective personal experience.
    iii. Easily subject to manipulation. 1. This can cause inconsistency over time.
    iv. May rely on the experience of the forecaster to be done well. 1. “It takes years of experience for someone to learn how to convert intuitive judgment into good forecasts.”
    v. Compared to quantitative methods, generally less accurate.
    vi. Compared to quantitative methods, more difficult to combine more than one forecasting method. 1. Combining forecasts potentially increases the informational content embodied in a forecast.
    vii. Compared to quantitative methods, cannot be adjusted to improve fit by applying specialized methods, such as those for seasonality.
7. Example – New Product Forecasting
  a. Introduction
    i. “Often judgmental methods are better suited to forecasting new-product sales because there are many uncertainties and few known relationships.”
    ii. “However, there are ways to make reasonable forecasts for new products. These typically include both qualitative judgments and quantitative tools of one type or another.”
  b. The Product Life Cycle Concept
    i. Figure 1.1.
    ii. “This notion of a product life cycle can be applied to a product class (such as personal passenger vehicles), to a product form (such as sport utility vehicles), or to a brand (such as Jeep Cherokee …).”
    iii. “The real forecasting problems occur in the introductory stage (or in the preintroductory product development stage). Here the forecaster finds traditional quantitative methods of limited usefulness and must often turn to marketing research techniques and/or qualitative forecasting techniques.”
    iv. “Once the mid-to-late growth stage is reached, there is probably sufficient historical data to consider a wide array of quantitative methods.”
  c. Analog Forecasts
    i. “The basic idea behind the analog method is that the forecast of the new product is related to information that you have about the introduction of other similar products in the past.”
    ii. For example, using prior sales of a similar product, adjusted for the relative estimated percentage of households that will purchase the product, to estimate future sales of the new product.
  d. Test Marketing
    i. “Test marketing involves introducing a product to a small part of the total market before doing a full product rollout.”
  e. Product Clinics
    i. “Potential customers are invited to a specific location and are shown a product mockup or prototype, which in some situations is essentially the final product… Afterwards they are asked to evaluate the product during an in-depth personal interview and/or by filling out a product evaluation survey.”
8. Simple or “Naive” Forecasting Models
  a. First “Naïve” Forecasting Model
    i. Ft = At-1. 1. Ft is the forecast for period t. 2. At-1 is the actual observation at period t-1.
  b. Modified “Naïve” Forecasting Models
    i. Ft = At-4. 1. Ft is the forecast for period t. 2. At-4 is the actual observation in the same quarter of the prior year. 3. A one-year (four-quarter) lag is used when there is seasonality in quarterly data.
    ii. Ft = At-12. 1. Ft is the forecast for period t. 2. At-12 is the actual observation in the same month of the prior year. 3. A one-year (twelve-month) lag is used when there is seasonality in monthly data.
  c. Second “Naïve” Forecasting Model
    i. Ft = At-1 + P(At-1 – At-2). 1. “P is the proportion of the change between periods t-2 and t-1 that we choose to include in the forecast.”
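These naive models are simple enough to compute directly. The following is a minimal sketch in Python (not from the text or ForecastX); the quarterly sales figures and the choice P = 0.5 are invented for illustration, and the RMSE comparison anticipates the evaluation criteria discussed in the next section.

```python
# Sketch of the three naive models above, compared by RMSE.
import math

sales = [120, 95, 130, 210, 128, 101, 139, 225, 135, 108, 148, 240]  # invented

def naive1(a, t):            # F_t = A_{t-1}
    return a[t - 1]

def naive_seasonal(a, t):    # F_t = A_{t-4} (quarterly data)
    return a[t - 4]

def naive2(a, t, p=0.5):     # F_t = A_{t-1} + P*(A_{t-1} - A_{t-2})
    return a[t - 1] + p * (a[t - 1] - a[t - 2])

def rmse(actuals, forecasts):
    errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(errors) / len(errors))

start = 4  # first period for which every model has enough history
for name, model in [("naive 1", naive1),
                    ("seasonal naive", naive_seasonal),
                    ("naive 2 (P=0.5)", naive2)]:
    f = [model(sales, t) for t in range(start, len(sales))]
    print(name, "RMSE:", round(rmse(sales[start:], f), 2))
```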
9. Evaluating Forecasts
  a. “We need some way to evaluate the accuracy of forecasting models over a number of periods so that we can identify the model that generally works the best.”
  b. Evaluation criteria: also known as loss functions. These are quantitative measures of the accuracy of a forecast method.
    i. See pages 34-35 of the text. 1. Each evaluation criterion has a major advantage or disadvantage.
10. Example – Forecasting Consumer Sentiment
11. Example – Forecasting Total Houses Sold
12. Example – Forecasting Gap Sales
13. Using Multiple Forecasts
  a. “We know it is unlikely that one model will always provide the most accurate forecast for any series. Thus, it makes sense to “hedge one’s bet,” in a sense, by using two or more forecasts. This may involve making a “most optimistic,” a “most pessimistic,” and a “most likely” forecast.”
  b. One simple way to get a “most likely” forecast is to take an average of different forecasts.
14. Sources of Data
  a. Internal records: “The most obvious sources of data are the internal records of the organization itself. Such data include unit product sales histories, employment and production records, total revenue, shipments, orders received, inventory records, and so forth.”
  b. Trade associations: “For many types of forecasts the necessary data come from outside the firm. Various trade associations are a valuable source of such data.”
  c. Government: “But the richest sources of external data are various governmental and syndicated services.”
15. Introduction to ForecastX
  a. See online “software notes.”
  b. See p. 49 – 52.
16. Excel Review
  a. See online “software notes.”
17. Homework
  a. Case Questions: 1, 2, 3 (do with pencil and paper only)
  b. Exercises: 1, 2, 3, 4 (use pencil and paper only for these first 4 questions), 7, 8 (use Excel only for these final 2 questions)

Chapter 2: The Forecast Process, Data Considerations, and Model Selection
1. The Forecast Process
  a. Specify objectives. i. “Objectives and applications of the forecast should be discussed between the individual(s) involved in preparing the forecast and those who will utilize the results.”
  b. Determine what to forecast.
  c. Identify time dimensions: length and periodicity. i. “Is the forecast needed on an annual, a quarterly, a monthly, a weekly, or a daily basis?”
  d. Data considerations: where will the data come from?
  e. Model selection. i. Depends on the following: 1. The pattern exhibited by the data (the most important criterion). 2. The quantity of historical data available. 3. The length of the forecast horizon.
  f. Model evaluation. i. “This is often done by evaluating how each model works in retrospect.” ii. Measures such as root mean squared error (RMSE) are used. iii. Fit: “how well the model works retrospectively.” iv. Accuracy: “relates to how well the model works in the forecast horizon (i.e., outside the period used to develop the model).” v. Holdout period: “When we have sufficient data, we often use a “holdout” period to evaluate forecast accuracy.”
  g. Forecast preparation. i. “When two, or more, methods that have different information bases are used, their combination will frequently provide better forecasts than would either method alone.”
  h. Forecast presentation. i. “In both written and oral presentations, the use of objective visual presentations of the results is very important.”
  i. Tracking results.
    i. “Over time, even the best of models are likely to deteriorate in terms of accuracy and need to be respecified, or replaced with an alternative method.”
2. Trend, Seasonal, and Cyclical Data Patterns
  a. “A time series is likely to contain some, or all, of the following components:”
  b. Trend: a long-term, consistent change in the level of the data. i. Linear trends: relatively constant increases over time. ii. Nonlinear trends: trends whose rate of change is increasing (accelerating) or decreasing over time. iii. Stationary: “Data are considered stationary when there is neither a positive nor a negative trend (i.e., the series is essentially flat in the long term).”
  c. Seasonal: regular variation in the level of the data that occurs at the same time each year.
  d. Cyclical: cyclical fluctuations are usually “represented by wavelike upward and downward movements of the data around the long-term trend.” i. “Cyclical fluctuations are of longer duration and are less regular than are seasonal fluctuations.” ii. “The causes of cyclical fluctuations are less readily apparent as well. They are usually attributed to the ups and downs in the general level of business activity that are frequently referred to as the business cycle.” iii. Another definition of cyclical is the following: “when the data exhibit rises and falls that are not of a fixed period… The major distinction between a seasonal and a cyclical pattern is that the former is of a constant length and recurs on a regular periodic basis, while the latter varies in length.” (Makridakis, Wheelwright, and Hyndman, 1998, p. 25) iv. Cyclical fluctuations can be more easily seen once the data have been “deseasonalized” or “seasonally adjusted.”
  e. Irregular: “fluctuations that are not part of the other three components. These are often called random fluctuations. As such they are the most difficult to capture in a forecasting model.”
  f. See Figure 2.2. i. “The third line, which moves above and below the long-term trend but is smoother than the plot of THS, is what the THS series looks like after the seasonality has been removed. Such a series is said to be “deseasonalized,” or “seasonally adjusted” (SA).” ii. “By comparing the deseasonalized series with the trend, the cyclical nature of houses sold becomes clearer.” iii. A trend can be seen. iv. Seasonality can be seen. v. Cyclicality can be seen.
  g. See Figure 2.3. i. A trend can be seen. 1. The trend is nonlinear: “the quadratic (nonlinear) trend in the lower graph provides a better basis for forecasting” than the linear trend in the upper graph. ii. Seasonality is not apparent. iii. Cyclicality is not apparent.
3. Statistics
  a. Descriptive Statistics
    i. Measures of central tendency: 1. Mode. 2. Median. 3. Mean.
    ii. Measures of dispersion: 1. Range. 2. Variance (for a population and a sample). a. Population: σ² = Σ(Xi – μ)² / N. b. Sample: s² = Σ(Xi – Xbar)² / (n – 1). 3. Standard deviation. a. Population: σ = √[Σ(Xi – μ)² / N]. b. Sample: s = √[Σ(Xi – Xbar)² / (n – 1)].
  b. Normal Distribution
    i. Symmetric.
    ii. The 68, 95, 99.7 rule: 1. μ ± 1σ includes about 68% of the area. 2. μ ± 2σ includes about 95% of the area. 3. μ ± 3σ includes about 99.7% of the area.
    iii. There are an infinite number of normal distributions. The shape of each normal distribution is described by its mean μ and standard deviation σ.
    iv. Standard normal distribution. 1. Z = (X – μ) / σ. 2. “every other normal distribution can be transformed easily into a standard normal distribution called the Z-distribution.” 3. “The Z-value measures the number of standard deviations by which X differs from the mean. If the calculated Z-value is positive, then X lies to the right of the mean (X is larger than μ). If the calculated Z-value is negative, then X lies to the left of the mean (X is smaller than μ).” 4. See Table 2.4, p. 74. This table is used to calculate the following: a. The percentage of data that fall below a particular level. b. The percentage of data that fall above a particular level. c. The percentage of data that fall between two points.
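As a quick illustration of the Z-transformation and the table lookups just described, here is a hypothetical sketch using scipy.stats in Python; the mean and standard deviation are invented, and norm.cdf plays the role of Table 2.4.

```python
# Sketch: Z-scores and areas under the normal curve.
from scipy.stats import norm

mu, sigma = 500, 50      # invented population mean and standard deviation
x = 575

z = (x - mu) / sigma                  # Z = (X - mu) / sigma
below = norm.cdf(z)                   # share of data below X
above = 1 - norm.cdf(z)               # share of data above X
between = norm.cdf(1) - norm.cdf(-1)  # area within mu +/- 1 sigma (~0.68)

print(f"Z = {z:.2f}, P(X < {x}) = {below:.4f}, P(X > {x}) = {above:.4f}")
print(f"Area within one sigma: {between:.4f}")
```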
  c. Student’s t Distribution
    i. Useful when working with sample data (as we typically do in a business context).
    ii. “When the population standard deviation is not known, or when the sample size is small, the Student’s t-distribution should be used rather than the normal distribution.”
    iii. “The Student’s t-distribution resembles the normal distribution but is somewhat more spread out for small sample sizes.”
  d. Statistical Inference – Confidence Intervals
    i. “A sample statistic is our best point estimate of the corresponding population parameter. While it is best, it is also likely to be wrong. Thus, in making an inference about a population it is usually desirable to make an interval estimate.”
    ii. The confidence interval for the mean (μ) of a population is the following: 1. μ = Xbar ± t(s / √n). 2. See Table 2.5, p. 73. a. To find the value of t in the above equation, look up the value in the column corresponding with ½ of (1 – confidence level). (For example, for a 95% confidence interval, look up the value in the t0.025 column.)
  e. Statistical Inference – Hypothesis Testing
    i. “Frequently we have a theory or hypothesis that we would like to evaluate statistically.”
    ii. “The process begins by setting up two hypotheses, the null hypothesis (designated H0:) and the alternative hypothesis (designated H1:). These two hypotheses should be structured so that they are mutually exclusive.” 1. Set up your null and alternative hypotheses. a. See Case I on page 76. b. Use Case I when testing whether a parameter is or is not equal to a certain value. i. H0: μ = μ0. ii. Ha: μ ≠ μ0. 2. Look up the critical value (tT) in the appropriate table. a. Two-sided test with significance level = 5%: p. 75. i. If degrees of freedom > 100, just use the row corresponding with n = 100. b. Two-sided test for a wide range of significance levels: Table 2.5, p. 73. i. For a two-sided test, look up the critical value in the column corresponding with ½ of your significance level. (For example, for a 5% significance level, look up the critical value in the t0.025 column.) c. Note: sometimes the test is specified in terms of the “confidence level.” The significance level is equal to 1 minus the confidence level of the test. 3. Calculate your t statistic. a. t = (Xbar – μ0) / (s / √n). 4. Compare t to the critical value (tT) and accept or reject the null. a. If |t| > tT, then reject the null.
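A minimal sketch of the interval estimate and the Case I test above, in Python; the sample values, μ0 = 21, and the 5% significance level are invented for illustration, and scipy's t distribution stands in for Table 2.5.

```python
# Sketch: confidence interval for the mean and a two-sided t-test.
import math
from statistics import mean, stdev
from scipy.stats import t as t_dist

sample = [23.1, 19.8, 25.4, 22.0, 24.7, 21.5, 20.9, 23.8]  # invented
n = len(sample)
xbar, s = mean(sample), stdev(sample)

# 95% confidence interval: Xbar +/- t * s / sqrt(n), with df = n - 1
t_crit = t_dist.ppf(1 - 0.025, df=n - 1)   # the "t_0.025" column
half_width = t_crit * s / math.sqrt(n)
print(f"95% CI for mu: {xbar - half_width:.2f} to {xbar + half_width:.2f}")

# Case I test of H0: mu = 21 vs. Ha: mu != 21 at the 5% level
mu0 = 21.0
t_stat = (xbar - mu0) / (s / math.sqrt(n))
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0",
      f"(t = {t_stat:.2f}, critical value = {t_crit:.2f})")
```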
  f. Correlation
    i. Definition. 1. Correlation is a measure of how linearly associated two variables are. Correlation gives information about the strength and direction of the linear association. 2. The stronger the linear association, the closer the absolute value of the correlation coefficient is to 1. 3. The direction of the linear association of the variables is determined by the sign of the correlation coefficient. A positive sign reveals a positive association, and a negative sign reveals a negative association.
    ii. Figure 2.6.
    iii. Equation: p. 81.
    iv. Hypothesis testing on the correlation coefficient. 1. The null hypothesis is that ρ = 0 (the population correlation is zero). 2. See the equation for the t statistic on p. 83. 3. Reject the null if the t that you calculate using the equation on p. 83 is greater (in absolute value) than the critical value you look up in the t table. Use n – 2 degrees of freedom when looking up the critical value.
  g. Autocorrelation & lagged data
    i. With autocorrelation, we can calculate the correlation between observations in a data series and the k-period lagged values of the same series.
    ii. k-period lag: the number of periods prior to a given period.
    iii. Autocorrelation is similar to correlation. With correlation we essentially multiply/compare the value of one series (relative to its mean) with the corresponding value of another series (relative to its mean). Now, however, the second series is just a lagged version of the original series.
    iv. k-period lagged data series: a data series where the value for any given period is equal to the value of the original (non-lagged) series k periods before. 1. When your data are ordered in a column with the oldest data on top (period 1 on top), yt-k is created by simply shifting your original time series data (yt) down by k rows. You have now created a new time series with a k-period lag. 2. Here the t – k subscript indicates a series that is lagged by k periods. 3. Now you can think of two separate variables, your original time series (yt) and your lagged time series (yt-k). To calculate rk, just multiply data from the same row in each column (the corresponding values from each series), where the two columns represent yt and yt-k.
    v. Autocorrelation equation for a k-period lag, rk: p. 84. 1. There are some errors in the book’s equation: a. The first term after the summation sign in the numerator should have the subscript “t + k” rather than “t – k”. b. The summation in the denominator should begin at “t = 1” rather than “t – 1”.
    vi. So autocorrelation tells us about both trends and seasonality. This is very useful information for forecasters for two reasons. 1. First, some forecasting methods are more useful for data without trends, while other forecasting methods are more useful for data with trends. 2. Second, some forecasting methods are more useful for data without seasonality, while others are more useful with seasonality.
    vii. Hypothesis testing. 1. It is useful to run a hypothesis test for the autocorrelation ρk = 0, for various k, in order to get a good idea as to whether the time series exhibits a trend or seasonality. a. t = (rk – 0) / [1 / √(n – k)]. i. (n – k) is the degrees of freedom. b. Evidence of a trend: see below. c. Evidence of seasonality: see below. 2. Rule of thumb: the following is the rule of thumb for rejecting the null hypothesis that the autocorrelation ρk = 0 at the 95% confidence level. a. |rk| > 2 / √(n – k), or, approximately, b. |rk| > 2 / √n, when n is large relative to k. c. The rule of thumb being satisfied corresponds with a t statistic greater than approximately 2 in absolute value.
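The rk calculation and the rule of thumb can be sketched in a few lines of Python. The quarterly series below is invented, with a built-in period-4 pattern, so the seasonal lag k = 4 should stand out.

```python
# Sketch: k-period autocorrelations with the rule-of-thumb cutoff.
import math

y = [10, 14, 9, 12, 11, 15, 10, 13, 12, 16, 11, 14, 13, 17, 12, 15]  # invented
n = len(y)
ybar = sum(y) / n

def autocorr(series, k):
    # r_k: each observation compared with the one k periods earlier,
    # both measured relative to the overall series mean.
    num = sum((series[t] - ybar) * (series[t - k] - ybar) for t in range(k, n))
    den = sum((v - ybar) ** 2 for v in series)
    return num / den

for k in range(1, 9):
    rk = autocorr(y, k)
    cutoff = 2 / math.sqrt(n - k)   # rule-of-thumb critical value
    flag = "significant" if abs(rk) > cutoff else ""
    print(f"k={k}: r_k={rk:+.2f} (cutoff {cutoff:.2f}) {flag}")
```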
  h. Correlograms (aka the Autocorrelation Function, ACF)
    i. Definition. 1. “A k-period plot of autocorrelations is called an autocorrelation function (ACF), or a correlogram.”
    ii. Evidence of stationarity vs. a trend. 1. “If the time series is stationary, the value of rk should diminish rapidly toward zero as k increases. If, on the other hand, there is a trend, rk will decline toward zero slowly.” 2. Rejecting the null that ρk = 0 for most k is evidence of a trend. Rejecting the null for only one or two lags, however, is evidence against a trend. 3. Note that trends are difficult to differentiate from cycles using correlograms. (One can interpret cycles as alternating trends.) When the data appear not to be stationary, one can assume the existence of either a trend, a cycle, or both. (However, with cycles, the correlation will often eventually turn negative.)
    iii. Evidence of seasonality. 1. For stationary data, rejecting the null that ρk = 0 primarily (and/or most convincingly) for lags which are divisible by the number of periods in a year is evidence of seasonality. 2. For example, “if a seasonal pattern exists, the value of rk may be significantly different from zero at k = 4 for quarterly data, or k = 12 for monthly data. (For quarterly data, rk for k = 8, 12, 16, … may also be large. For monthly data, a large rk may also be found for k = 24, 36, etc.)” 3. Seasonality can be difficult to see in a correlogram when the data exhibit either a cycle or a trend. However, first differencing the data, which often makes the data stationary, can make seasonality much more evident in a correlogram. (A description of how to take first differences is below.)
    iv. Hypothesis testing for zero correlation. 1. “To determine whether the autocorrelation at lag k is significantly different from zero,” use the rule of thumb on p. 84 in the text. 2. A bar beyond either of the horizontal lines in our software printouts, and in Figures 2.8 & 2.10, implies that the null hypothesis of zero autocorrelation should be rejected.
    v. Differencing to remove trends. 1. “If we want to try a forecasting method … that requires stationary data, we must first transform the … data to a stationary series. Often this can be done by using first differences.” 2. Types of differencing: a. first differencing; b. second differencing. 3. When differencing results in a stationary time series, it allows us to utilize forecasting methods that are better suited to data without a trend. We then forecast a difference (or several differences). From the difference(s) we can then “back out” a forecast for the actual level of the time series. a. Similarly, we can deseasonalize data, use a forecast method better suited to data without seasonality, and then reseasonalize our forecast.
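A small sketch of first (and second) differencing, with an invented trending series; it also shows how a forecast of the difference can be "backed out" into a forecast of the level.

```python
# Sketch: differencing a trending series toward stationarity.
y = [100, 104, 109, 115, 122, 130, 139, 149, 160, 172]  # invented

first_diff = [y[t] - y[t - 1] for t in range(1, len(y))]
second_diff = [first_diff[t] - first_diff[t - 1]
               for t in range(1, len(first_diff))]

print("first differences: ", first_diff)    # still drifting upward
print("second differences:", second_diff)   # roughly stationary

# "Backing out" a level forecast: if we forecast the next first
# difference to be, say, 13, the level forecast is the last level plus it.
next_diff_forecast = 13
print("level forecast:", y[-1] + next_diff_forecast)
```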
4. Review (Relevance of information in this chapter)
  i. The reasons discussed above.
5. ForecastX
  a. See p. 49 – 52 from the last chapter for a quick review.
  b. See p. 93 – 95.
  c. For making graphs, see the online “software notes.”
6. Homework
  a. Case Questions: 2, 3.
  b. 8 – 11.

Chapter 3: Moving Averages and Exponential Smoothing
1. The relationship of this chapter to the prior chapter.
  a. In this chapter we will discuss four different forecasting models. These models are appropriate under different underlying conditions in the data: stationarity, trend, and/or seasonality. Information acquired from the procedures discussed in the last chapter gives us insight into which of the models in this chapter is the most appropriate choice for constructing a forecast.
2. Smoothing:
  a. “a form of weighted average of past observations to smooth up-and-down movements, that is, some statistical method of suppressing short-term fluctuations”
  b. “The assumption underlying these methods is that the fluctuations in past values represent random departures from some smooth curve that, once identified, can plausibly be extrapolated into the future to produce a forecast or series of forecasts.”
  c. We will discuss several smoothing techniques in this chapter.
  d. “all [smoothing techniques] are based on the concept that there is some underlying pattern to the data … cycles or fluctuations that tend to occur.”
3. Moving Averages
  a. “The simple statistical method of moving averages may mimic some data better than a complicated mathematical function.”
  b. Calculating moving averages: Figure 3.1 and Table 3.1.
  c. “The choice of the interval for the moving average [should be consistent with] the length of the underlying cycle or pattern in the original data.”
  d. The first naïve forecasting model presented in Chapter 1 was essentially a one-period moving average.
  e. At the bottom of Table 3.1 the RMSE is used to evaluate the different interval MAs (3-quarter versus 5-quarter).
  f. Figures 3.2 & 3.3: the “failure of the moving averages to predict peaks and troughs is one of the shortcomings of moving-average models.”
  g. The MA method of forecasting can lead forecasters into incorrectly identifying cycles that don’t exist. The reason is that an MA creates serial correlation (autocorrelation) in the forecast data, since successive MA values are functions of overlapping data. Serial correlation (autocorrelation) in turn can produce the appearance of cycles in the data. i. “Since any moving average is serially correlated … any sequence of random numbers could appear to exhibit cyclical fluctuation.”
  h. Like the first naïve model, moving averages are effective in forecasting stationary time series. Why? They do not, however, handle trends or seasonality well.
4. Simple Exponential Smoothing
  a. Like moving averages, exponential smoothing is properly used when there is no trend.
  b. “With exponential smoothing, the forecast value at any time is a weighted average of all the available previous values...”
  c. “Moving-average forecasting gives equal weights to the past values included in each average; exponential smoothing gives more weight to the recent observations and less to the older observations.”
  d. The weight of the most recent observation is α, the next most recent observation (1 – α)α, the next observation (1 – α)²α, and so on.
  e. The simple exponential smoothing model can be written as in 3.1. i. α is between 0 and 1.
  f. An alternative interpretation of the exponential smoothing model in equation 3.1 is seen in 3.2. i. “From this form we can see that the exponential smoothing model “learns” from past errors. The forecast value at period t + 1 is increased if the actual value for period t is greater than it was forecast to be, and it is decreased if Xt is less than Ft.”
  g. Although forecasting the value for the next period requires us only to know last period’s forecast and actual value, all past observations are embodied in the forecast.
  h. This leads us to a third interpretation of the exponential smoothing model in 3.1. i. See 3.3. ii. Note that exponential smoothing allows the weights to sum to one regardless of when your data start.
  i. Understanding the weights on past observations of X. i. Note the nature of the weights on past observations: α, (1 – α)α, (1 – α)²α, (1 – α)³α, … ii. Remember, the value of α is between 0 and 1. iii. α values close to 1 imply recent data are weighted much more heavily than past data. iv. α values close to 0 imply recent data are weighted only slightly more heavily than past data. v. See the table on p. 108. vi. Note that regardless of the level of α, the weights will eventually sum to 1.
  j. Tips on selecting α. i. With a great deal of random variation in the time series, choose an α closer to 0. ii. If you want your forecast to depend strongly on recent changes in the time series, choose an α closer to 1. iii. The RMSE is often used as a criterion to determine the best level of α. iv. Generally, small values of α work best when exponential smoothing is the appropriate model.
  k. Note that in order to utilize this model we must make an initial estimate for F. “This process of choosing an initial value for the smoothed series is called initializing the model, or warming up the model.” i. “R. G. Brown first suggested using the mean of the data for the starting value, and this suggestion has been quite popular in actual practice.”
  l. If a value for α is not selected, ForecastX will select one to minimize the RMSE.
  m. Advantages of the simple exponential smoothing model. i. “it requires a limited quantity of data” ii. “it is simpler than most other forecasting methods”
  n. Disadvantages of the simple exponential smoothing model. i. “its forecasts lag behind the actual data” ii. “it has no ability to adjust for a trend or seasonality in the data”
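A minimal sketch of equation 3.1 in Python (ForecastX would do this, and pick α, for you). The data and α = 0.3 are invented; the model is initialized with the mean of the series, per Brown's suggestion above.

```python
# Sketch: simple exponential smoothing, F_{t+1} = alpha*X_t + (1-alpha)*F_t.
data = [52, 48, 55, 50, 53, 49, 54, 51]  # invented stationary series
alpha = 0.3

f = sum(data) / len(data)            # initialize ("warm up") with the mean
for x in data:
    print(f"forecast {f:.2f}, actual {x}, error {x - f:+.2f}")
    f = alpha * x + (1 - alpha) * f  # the model "learns" from the latest value

print(f"forecast for the next period: {f:.2f}")
```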
5. Holt’s Exponential Smoothing
  a. Holt’s model is utilized when there is a trend in the data.
  b. See 3.4 – 3.6.
  c. Intuition: in order to get a forecast for a period, we take the smoothed value for the prior period and then add to it an estimate of the trend.
  d. Equation 3.4 estimates the smoothed value in period t + 1. i. Note, 3.4 does not represent the forecast. F now represents the smoothed value, rather than the forecast; H now represents the forecast. ii. Note, the smoothed value is a linear combination of the actual data from the period and the sum of the smoothed value for the last period and an estimate of the trend.
  e. Equation 3.5 estimates the trend from t to t + 1. i. The estimate for the trend is a linear combination of the change in the smoothed data and the previous trend estimate. ii. Equation 3.5 reflects the fact that when we estimate a trend, we use both the most recent trend in the smoothed data (Ft+1 – Ft) and the previous trend estimate (Tt). iii. From our prior discussion of 3.1 (and 3.3), it is apparent that the trend estimate is a weighted average of all the prior one-period changes in the smoothed values.
  f. “Equation 3.6 is used to forecast m periods into the future by adding the product of the trend component, Tt+1, and the number of periods to forecast, m, to the current value of the smoothed data Ft+1.”
  g. “Two starting values are needed: one for the first smoothed value [F] and another for the first trend value [T].” i. “The initial smoothed value is often a recent actual value available;” ii. “the initial trend value is often 0.00 if no past data are available.” iii. ForecastX will choose these values.
  h. The ForecastX software will choose the smoothing constants (α and γ) if you do not select them.
  i. “Holt’s form of exponential smoothing is then best used when the data show some linear trend but little or no seasonality. A descriptive name for Holt’s smoothing might be linear-trend smoothing.”
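A sketch of Holt's method as described in equations 3.4 – 3.6, with invented data and smoothing constants; in practice ForecastX would choose α and γ to minimize the RMSE.

```python
# Sketch: Holt's linear-trend smoothing (smoothed level F, trend T,
# forecast H = F + m*T).
data = [100, 104, 109, 113, 118, 124, 129, 133]  # invented trending series
alpha, gamma = 0.5, 0.3                          # invented constants

f = data[0]   # initial smoothed value: a recent actual value
t = 0.0       # initial trend value: often 0.0 with no past data

for x in data[1:]:
    f_new = alpha * x + (1 - alpha) * (f + t)    # eq. 3.4: smoothed value
    t = gamma * (f_new - f) + (1 - gamma) * t    # eq. 3.5: trend estimate
    f = f_new

for m in (1, 2, 3):                              # eq. 3.6: H = F + m*T
    print(f"{m}-period-ahead forecast: {f + m * t:.1f}")
```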
6. Winters’ Exponential Smoothing
  a. Winters’ model is utilized when there is both a trend and seasonality.
  b. See 3.7 – 3.10.
  c. Intuition: in order to get a forecast for a period, we take the smoothed (deseasonalized) value for the prior period, add to it an estimate of the trend, and then reseasonalize it.
  d. Equation 3.7 estimates the smoothed value in period t + 1. i. Note, 3.7 does not represent the forecast. Again, F represents the smoothed value, rather than the forecast; W now represents the forecast. ii. Note that this is identical to our Holt’s smoothed-value equation, except for the fact that the Xt term is divided by St-p, a seasonality estimate. iii. Note, the smoothed value is a linear combination of the deseasonalized actual data from the period and the sum of the smoothed value for the last period and an estimate of the trend.
  e. Equation 3.8 estimates the seasonality. i. The seasonality estimate is a linear combination of the ratio of the actual data to the smoothed value and the prior seasonality estimate.
  f. Equation 3.9 estimates the trend. i. The estimate for the trend is a linear combination of the change in the smoothed data and the previous trend estimate. ii. Note that this is identical to our Holt’s trend-estimate equation.
  g. Equation 3.10 “is used to compute the forecast for m periods into the future;”
  h. Seven values must be used to initialize or warm up the model: one for the first smoothed value, one for the first trend value, and one for each of the first seasonality values. i. These initial values are chosen by the software.
  i. The ForecastX software will choose the smoothing constants (α, β, and γ) if you do not select them.
  j. Winters’ form of exponential smoothing is best used when the data show a linear trend and seasonality.
  k. An alternative to using Winters’ model is to deseasonalize the data and then use Holt’s model to get a forecast. The forecast would then be reseasonalized.
  l. Seasonal indices. i. “As part of the calculation with an adjustment for seasonality, seasonal indices are calculated and displayed in most forecasting software.”
7. Seasonal Indices
  a. Seasonal indices essentially measure how high (or low) observations for a particular series are during a given period. For example, for quarterly data, a seasonal index would tell how high (or low) data tended to be in the 1st, 2nd, 3rd, and 4th quarters. Seasonal indices are meant to capture predictable seasonal variation in a series. A seasonal index of 1.3 for the fourth quarter implies that fourth-quarter data tend to be higher than average (since 1.3 is higher than 1).
  b. Deseasonalizing: dividing data by the seasonal index for the respective period.
  c. Reseasonalizing: multiplying data by the seasonal index for the respective period.
  d. Note, when data exhibit seasonality but no trend, an alternative to using the Winters model is the following: deseasonalize the data, construct a forecast using the simple exponential smoothing model, and then reseasonalize the forecast (see the sketch below).
  e. Note, when data exhibit seasonality and a trend, an alternative to the Winters model is the following: deseasonalize the data, construct a forecast using Holt’s model, and then reseasonalize the forecast.
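A minimal sketch of the deseasonalize / forecast / reseasonalize alternative in notes d and e, with invented quarterly seasonal indices; a naive last-value forecast stands in for the simple-exponential-smoothing or Holt's step.

```python
# Sketch: forecast via deseasonalizing and reseasonalizing.
seasonal_index = {1: 0.90, 2: 0.95, 3: 0.85, 4: 1.30}  # invented; sums to 4

sales = [(1, 90), (2, 98), (3, 88), (4, 140),   # (quarter, actual), invented
         (1, 95), (2, 102), (3, 93), (4, 148)]

# Deseasonalize: divide each observation by its quarter's index.
deseasonalized = [y / seasonal_index[q] for q, y in sales]

# Forecast the deseasonalized series with any no-seasonality method;
# here a naive "last deseasonalized value" stands in for that step.
deseason_forecast = deseasonalized[-1]

# Reseasonalize: multiply back by the index for the quarter being forecast.
next_quarter = 1
print("forecast:", round(deseason_forecast * seasonal_index[next_quarter], 1))
```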
8. Note
  a. “A chief flaw with smoothing models is their inability to predict cyclical reversals in the data, since forecasts depend solely on the past.”
9. See Software Tips
10. Homework
  a. Case Questions: 1 (only answer the first two sub-questions, and replace "2004" by "2008"), 2.
  b. Exercises: 1, 3, 4, 7, 8, 9, 11, 12

Chapter 4: Introduction to Forecasting Regression Methods
1. The Bivariate Regression Model
  a. Y = β0 + β1X + ε. i. This is the assumed true underlying relationship between X (the independent variable) and Y (the dependent variable).
  b. Y-hat = b0 + b1X. i. This is the equation we estimate. It is the equation for the line which comes as close as possible to matching the data.
  c. e = Y – Y-hat. i. This is the error term. It is how far our actual level of Y is from our predicted level of Y (Y-hat).
  d. Min Σe² = Σ(Y – b0 – b1X)². i. Ordinary Least Squares (OLS) is an algorithm for choosing b0 and b1 such that the sum of the squared errors (the above equation) is minimized.
2. Regression – What We All Try To Do
  a. Regression is most commonly used to estimate the causal impact of one variable on another variable. Ultimately, in this class, we will use it to forecast a variable of interest.
  b. One of your main take-homes from this class can be casually estimating causal impacts: draw the scatterplot, draw the line; the slope is an estimate of the causal impact of the X variable on the Y variable. i. With more data your estimate is better. ii. If there is a confounding lurking variable, then your estimate is biased.
  c. The progression from thinking casually about the causal effect of one variable on another variable, to understanding regression: i. Think of two observations on some causal effect you are trying to understand/estimate. 1. For example, the causal effect of Hours Studied (X) on GPA (Y). The two observations would be two pairs: the Hours Studied and GPA for one individual (HS1, GPA1), and the Hours Studied and GPA for a second individual (HS2, GPA2). ii. From these two observations, you can estimate (somewhat crudely) whether the effect is positive or negative. iii. You can also estimate the size of the effect by taking the ratio ΔY / ΔX. 1. This is an estimate of the causal impact of a one-unit change in X on Y. iv. If you plotted these two data points on a scatterplot, with the X variable on the horizontal axis and the Y variable on the vertical axis, then the slope of the line through the two data points is exactly the same ΔY / ΔX from the prior step. v. So plotting the scatterplot, drawing a line through the data, and calculating the slope is essentially a way of estimating the causal effect of one variable (the one plotted on the X axis) on another variable (the one plotted on the Y axis). vi. Ultimately, we will plot many observations on the scatterplot and use the slope of the line that most closely matches the data as our estimate of the causal impact of the X variable on the Y variable. 1. Under the appropriate conditions, this can be a very reliable estimate. 2. Regardless, hopefully you can understand how regression analysis basically is an extension of a process of thinking about causality which you may already do (steps i through iii).
3. Bivariate Regression in a Nutshell
  a. Begin with a data set containing two variables, one which we want to forecast (the dependent variable), and another which we will use to make the forecast (the independent variable). i. Think of a data table including the following columns: the period, Variable 1 (the Y variable or forecast variable), Variable 2 (the X variable or predicting variable). ii. Later we will add the following variables to the data table: Time, Y-hat, e.
  b. Assume the following linear relationship between the forecast variable and the predicting variable:
    i. Y = β0 + β1X + ε.
1. This is the bivariate regression model. 2. Y is the variable we want to forecast. 3. X is the variable we will use to make the forecast. 4. This is a linear relationship because it has the general form of a line, Y = mX + b: β0 above corresponds with b, and β1X above corresponds with mX.
    ii. Y-hat = b0 + b1X. 1. This is the equation we use to forecast Y. Y-hat is the forecast (or prediction) of Y, for a given level of X. 2. The software we use will give us the b0 and b1 in the above equation. 3. This is the equation for the STRAIGHT line which comes as close as possible to matching the data.
    iii. Note that other variables besides the predicting variable impact Y through ε. Therefore, the assumed linear relationship does not imply that other variables do not impact Y. 1. More on this below.
    iv. b1 is our estimate of the impact of a one-unit change in X on Y.
  c. Estimate β0 and β1 using software. i. The estimate for β0 will be referred to as b0. ii. The estimate for β1 will be referred to as b1. iii. β0 and β1 are the true values, while b0 and b1 are our estimates of those values. 1. The β’s are analogous to μ, while the b’s are analogous to Xbar. Here, the first set of variables (β and μ) are the true (population) parameters we are trying to estimate, while the second set (b and Xbar) are the estimates.
  d. b0 and b1 are the Y intercept and slope, respectively, of the “regression line” which comes as close as possible to matching the data (the “best fit” line) when we plot the independent variable on the X axis and the dependent variable on the Y axis. i. Plotting the data is simply recording, with a series of points on a graph, the combinations of the X variable (the independent variable) and the Y variable (the dependent variable) for each period for which we have data. ii. There may be other variables that affect Y besides X. Excluding them from the analysis does not bias our estimate of β1 (unless the excluded variable is a confounding lurking variable).
  e. The quality of our estimates of β0 and β1 (that is, the quality of b0 and b1) is determined largely by (1) the sample size, (2) the lack of omitted variable bias (confounding lurking variables), and (3) the assumed linearity of the relationship between X and Y. i. While we will estimate the slope of the line which most closely matches the scatterplot of the data, ultimately that estimate will only be useful if the visual relationship between X and Y is linear (is a straight line). ii. If the scatterplot of the data has a curved shape, then the slope will not accurately reflect the causal effect of X on Y.
  f. The following equation gives us the predicted value of Y (we’ll call it Y-hat) for each level of X. It is the equation of the regression line. i. Y-hat = b0 + b1X. ii. This is just the equation for the line which most closely matches the data. iii. We can add the Y-hat variable to the above data table, calculate Y-hat for each period, and then record it in the Y-hat column in the data table.
  g. The error, e, will be defined as the difference between the actual level of Y for a given level of X, and the predicted level of Y (Y-hat) for a given level of X. i. e = Y – Y-hat. ii. On the graph, this is just the vertical distance between the predicted level of Y (Y-hat) for a given level of X (the regression line), and the actual level of Y for that same level of X. iii. The size of the e’s (that is, the scatter of the points around the regression line) is not a measure of the quality of the estimate of β1; rather, it is an indication of the degree to which other variables (captured by ε) also influence the dependent variable. 1. Generally, the greater the spread of the data around the regression line, the greater the impact of all other variables on the forecast variable (Y). The impact of all other variables on the forecast variable is represented by ε in the assumed linear relationship equation: Y = β0 + β1X + ε. iv. We can add the e variable to the above data table, calculate e for each period (that is, each X-Y combination for each period), and then record it in the e column in the data table.
  h. Note that when our software finds the equation of the line which most closely matches the data, it finds the line which minimizes the sum of squared errors: i. Min Σe² = Σ(Y – b0 – b1X)².
  i. Note, this procedure is useful because it enables us to forecast the dependent variable (Y) using the independent variable (X). i. However, we must first acquire an estimate of the independent variable for the period for which we wish to forecast the dependent variable.
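A minimal sketch of the whole procedure with numpy: fit b0 and b1 by least squares, compute Y-hat and e, and confirm the sum of squared errors. The (X, Y) pairs are invented (think hours studied and GPA from the earlier example).

```python
# Sketch: bivariate OLS fit, predictions, and errors.
import numpy as np

x = np.array([5, 10, 15, 20, 25, 30])          # invented X (e.g., hours studied)
y = np.array([2.1, 2.4, 2.9, 3.1, 3.5, 3.6])   # invented Y (e.g., GPA)

b1, b0 = np.polyfit(x, y, deg=1)   # slope and intercept minimizing sum(e^2)
y_hat = b0 + b1 * x                # predicted Y for each X
e = y - y_hat                      # errors: actual minus predicted

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("sum of squared errors:", round(float(np.sum(e ** 2)), 4))
```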
4. Data Considerations
  a. Time series data: data covering multiple periods.
  b. Cross-sectional data: data on a variety of variables covering only one period.
  c. Panel data: data on a variety of variables covering multiple periods.
5. The Bottom Line
  a. We will use the software to find the b0 and b1 for the line which most closely matches the plot of our data (Y-hat = b0 + b1X). i. Graphically: find the line which “best fits” the data.
  b. We will use the b0 and the b1, in combination with a forecast for the independent variable (X), to produce a forecast (Y-hat) for the dependent variable.
6. Visualization of Data
  a. Observing data in graphical form can give insight into the regression process.
  b. See Table 4.1 and Figure 4.1.
  c. “For all four of the data sets in Table 4.1, the calculated regression results show an OLS equation of Y-hat = 3 + 0.5X. It might also be noted that the mean of the X’s is 9.0 and the mean of the Y’s is 7.5 in all four cases. The standard deviation is 3.32 for all of the X variables and 2.03 for all of the Y variables. Similarly, the correlation for each pair of X and Y variable is 0.82.”
  d. “Visualization of these data allows us to see stark differences that would not be apparent from the descriptive statistics we have reviewed.” i. “The regression line is most clearly inappropriate for the data in the lower right plot.” ii. “The upper right plot of data suggests that a nonlinear model would fit the data better than a linear function.”
  e. The bottom line: looking at the scatterplot of the data can often give you insight into your data analysis.
7. A Process For Regression Forecasting
  a. Data considerations: “We should utilize graphic techniques to inspect the data, looking especially for trend, seasonal, and cyclical components, as well as for outliers. This will help in determining what type of regression model may be most appropriate (e.g., linear versus nonlinear, or trend versus causal).”
  b. Forecast the independent variable: “Each potential independent variable should be forecast using a method that is appropriate to that particular series, taking into account the model-selection guidelines discussed in Chapter 2 and summarized in Table 2.1.”
  c. Specify & evaluate the model: “… we mean the statistical process of estimating the regression coefficients … In doing so we recommend a holdout period for evaluation… you can then test the model in [the holdout] period to get a truer feel for how well the model meets your needs.”
8. Forecasting With A Simple Linear Trend
  a. DPI-hat = b0 + b1(T). i. Here, T is time. You would construct this variable yourself by listing in the “Time” column 1, 2, …, n, where n is the number of periods in your data set. ii. Another way of representing the same bivariate regression model is the following: DPI = β0 + β1T + ε.
  b. For the final forecast, plug the appropriate time period (T) for the period being forecast, and the values of b0 and b1 from the regression, into the above equation.
  c. Example: i. Table 4.2: table of DPI data. ii. Figure 4.2: graph of DPI data. iii. P. 168-9: results of the regression. iv. Table 4.3: error calculations.
9. Using a Causal Regression Model to Forecast: A Jewelry Sales Forecast Based on Disposable Personal Income
  a. JS-hat = b0 + b1(DPI). i. This is the model which we will estimate. ii. Regression 1: using actual jewelry sales. iii. Regression 2: using deseasonalized jewelry sales. 1. Note that it is necessary to reseasonalize the estimate to achieve the final forecast.
  b. Forecasts for JS are made by plugging the following into the regression model equation, JS-hat = b0 + b1(DPI): i. (1) estimates for b0 and b1 from the regression software, and ii. (2) forecasts for DPI from, for example, Holt’s forecasting model.
  c. Example: i. Table 4.4: table of JS and DPI data. ii. Figure 4.4: graph of JS and deseasonalized JS data. iii. Figure 4.5: graphs of JS and DPI data showing a possible relationship between the data. iv. P. 175-6: results of regression 1. v. P. 177-8: results of regression 2.
10. Statistical Evaluation of Regression Models
  a. “Does the sign of the slope term make sense?”
  b. “Is the slope term significantly positive or negative?” (That is, is the slope term significantly different from zero?) i. A lower p value in the statistical software output indicates stronger evidence that the true slope β1 differs from zero. ii. If p is below 0.01, we say that our estimate is significant at the 1% level. iii. If p is below 0.1, we say that our estimate is significant at the 10% level. iv. Etcetera. v. If p is above 0.1, we say that our estimate is insignificant.
  c. R squared. i. It tells the proportion of the variation in the dependent variable explained by the predicting variable. ii. It gives us an indication of the quality of our forecast.
11. Using the Standard Error of the Estimate (SEE) to Make Interval Forecasts
  a. “The approximate 95 percent confidence interval can be calculated as follows:” i. Point estimate ± 2 (standard error of the estimate). ii. Table 4.5: SEE and other regression results. iii. P. 185: confidence interval calculations.
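A sketch combining the simple linear trend model of section 8 with the approximate interval above: fit Y on T = 1, …, n, compute the SEE, and form point estimate ± 2 SEE. The series is invented; the text's DPI example works the same way.

```python
# Sketch: linear trend forecast with an approximate 95% interval.
import numpy as np

y = np.array([210, 214, 219, 225, 228, 234, 239, 244])  # invented series
t = np.arange(1, len(y) + 1)                             # T = 1, 2, ..., n

b1, b0 = np.polyfit(t, y, deg=1)
y_hat = b0 + b1 * t

# Standard error of the estimate: residual standard deviation, n - 2 df.
see = np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 2))

t_next = len(y) + 1
point = b0 + b1 * t_next
print(f"point forecast: {point:.1f}")
print(f"approximate 95% interval: {point - 2*see:.1f} to {point + 2*see:.1f}")
```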
12. Forecasting Total Houses Sold With Two Bivariate Regression Models
  a. Read for review of the basic principles already covered in the above lecture and prior examples.
13. Homework
  a. Exercises: 1, 2, 3, 4, 5, 6, 7, 8, 10

Chapter 5: Forecasting with Multiple Regression
1. The Multiple Regression Model
  a. Y = β0 + β1X1 + β2X2 + … + βkXk + ε. i. This is the assumed relationship between the X’s (the independent variables) and Y (the dependent variable).
  b. Y-hat = b0 + b1X1 + … + bkXk. i. This is the equation we estimate. We find the b’s which result in the Y-hat’s (for all the different values of the X’s that we have) that are closest to the actual Y’s.
  c. e = Y – Y-hat. i. This is the error term. It is how far our actual level of Y is from our predicted level of Y (Y-hat).
  d. Min Σe² = Σ(Y – b0 – b1X1 – … – bkXk)². i. Ordinary Least Squares (OLS) is an algorithm for choosing the b’s such that the sum of the squared errors (the above equation) is minimized.
2. Selecting Independent Variables
  a. Try to choose independent variables that are not too highly correlated with one another.
  b. Consider using proxies for variables for which data are not available: i. “Sometimes it is difficult, or even impossible, to find a variable that measures exactly what we want to have in our model… However, a more readily available series … may be a reasonable proxy for what we want to measure.”
3. Forecasting with a Multiple-Regression Model
  a. Example beginning on p. 227. i. Note the signs on the coefficients. ii. Forecasting using the regression results. 1. First, the independent variables included in the regression must be forecast. In this example, the forecasts for the independent variables will be made using Holt’s exponential smoothing model.
4. Statistical Evaluation of Multiple-Regression Models: Three Quick Steps
  a. First, “see whether the signs on the coefficients make sense.”
  b. Second, “consider whether these results are statistically significant at our desired level of confidence.” i. Significance level = 1 – “confidence level.” ii. t = bi / se(bi). 1. Is the calculated t ratio greater (in absolute value) than the critical value? If so, the estimated coefficient is significant. a. The critical value can be found in the table on page 73. b. Use n – (K + 1) degrees of freedom. i. n is the number of observations. ii. K is the number of independent (right-hand-side) variables. iii. p value: calculated by software. 1. Is the p value smaller than the significance level? If so, the estimated coefficient is significant. iv. See Table 5.2, p. 236.
  c. Third, evaluate the adjusted R squared. The adjusted R squared tells us the proportion of the variation in the dependent variable explained by variation in the independent variables. i. The reason we look at the adjusted R squared is because “adding another independent variable will always increase R-squared even if the variable has no meaningful relation to the dependent variable.”
5. Accounting for Seasonality in a Multiple-Regression Model
  a. Dummy variables: a variable that gets a one if the criterion for the dummy variable is met, and zero otherwise.
  b. For example, we could create a “first quarter” dummy variable, where the value of the variable is one if it is the 1st quarter, and zero otherwise.
  c. We could also have a dummy variable for the second quarter and the third quarter, but not the fourth quarter. You would not have a dummy variable for all of the possible values of the underlying variable; rather, for all but one of the possible values. i. Note that the value of the coefficient on each dummy variable tells you the difference between the effect for the value/range of the underlying variable represented by the dummy variable, and the effect for the value/range represented by the case not represented by a dummy variable. Thus, the case not represented by a dummy variable is the benchmark against which all other dummy variable coefficients are measured. ii. Note that this makes a dummy variable regression different from a seasonal index, as seasonal indices measure relative differences (ratios), while dummy variables measure absolute differences. (See the sketch below.)
  d. See Table 5.6 on p. 255. i. Note the movement of the R squared (adjusted R squared) as we progress through each of the NCS regressions.
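A minimal sketch of a seasonal dummy variable regression with numpy's least squares. The quarterly data are invented; the fourth quarter is the omitted benchmark, so the Q1, Q2, and Q3 coefficients measure absolute differences from Q4, as note ii describes.

```python
# Sketch: multiple regression with a trend and quarterly dummies.
import numpy as np

quarters = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
y = np.array([90, 98, 88, 140, 95, 102, 93, 148, 99, 107, 97, 155])  # invented
t = np.arange(1, len(y) + 1)

# Columns: intercept, trend, Q1 dummy, Q2 dummy, Q3 dummy (no Q4 dummy).
X = np.column_stack([np.ones_like(t), t,
                     (quarters == 1).astype(float),
                     (quarters == 2).astype(float),
                     (quarters == 3).astype(float)])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, coef in zip(["intercept", "trend", "Q1", "Q2", "Q3"], b):
    print(f"{name}: {coef:+.2f}")   # Q1-Q3 effects measured relative to Q4
```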
6. Extensions of the multiple-regression model
   a. Sometimes adding a squared version of one of the variables to the regression can improve the fit of the regression.
   b. Implications of the coefficient signs on time (T) and time squared (T²):
      i. Coefficient on T:
         1. Positive: the effect over time is initially positive.
         2. Negative: the effect over time is initially negative.
      ii. Coefficient on T²:
         1. Positive: the effect over time bends toward the positive:
            a. more positive (see the graph for a positive coefficient on T), or
            b. less negative (see the graph for a negative coefficient on T).
         2. Negative: the effect over time bends away from the positive:
            a. less positive (see the graph for a positive coefficient on T), or
            b. more negative (see the graph for a negative coefficient on T).
   c. See Table 5.7 on p. 261.
      i. Note the movement of the adjusted R-squared as we progress one more step in our series of NCS regressions.
7. Advice on using multiple regression in forecasting
   a. KIS: Keep it simple. “The more complex the model becomes, the more difficult it is to use. As more causal variables are used, the cost of maintaining the needed database increases in terms of both time and money. Further, complex models are more difficult to communicate to others who may be the actual users of the forecast. They are less likely to trust a model that they do not understand than a simpler model that they do understand.”
8. Homework
   a. Exercises: 1, 2, 3, 4b, 5, 11
Chapter 6: Time Series Decomposition
1. Introduction
   a. Time series decomposition allows us to decompose a time series into its constituent components.
   b. In this chapter, we decompose the initial time series into four constituent components (four subseries): three predictable subseries and one random subseries. Why is this useful? Once the decomposition is complete, the three non-random subseries can each be forecast and then recombined to produce a forecast of the original series.
2. The Basic Model
   a. Y = T x S x C x I
      i. T: the long-term trend component
      ii. S: the seasonal component (the seasonal adjustment factor)
      iii. C: the cyclical component (the cyclical adjustment factor)
      iv. I: the irregular or random component (irregular or random variations)
3. Deseasonalizing the Data & Finding Seasonal Indices
   a. Conceptual description
      i. Calculate a moving average; this essentially deseasonalizes the data.
         1. 4-period MA for quarterly data.
         2. 12-period MA for monthly data.
         3. 52-period MA for weekly data.
      ii. For each period, determine the ratio of the actual data to the MA.
      iii. Calculate an average ratio for each period of the year, going back as far as your data allow.
      iv. When you are done you will have an average ratio for each period, and this will be your seasonal index (seasonal adjustment factor).
   b. MAt (Moving Average)
      i. Here t denotes the period for which the MA is calculated, not the number of periods included in the MA.
      ii. Note that a moving average calculated over the same number of periods as there are in a year will remove the seasonality.
         1. The number of periods included in the MA must correspond to the number of periods in the year.
      iii. The MA here is defined differently than in Chapter 3: here the MA takes in both future and past data.
         1. When an even number of periods is included in the MA, the window cannot be split evenly between past and future; one side contributes one additional data point. We will always take the additional data point from the past.
         2. See p. 302.
      iv. Note that we would prefer a centered moving average. Unfortunately, with an even number of periods, this simple moving average cannot be centered. However, if we average two adjacent moving averages, we obtain a centered moving average.
   c. CMAt (Centered Moving Average)
      i. Creating a centered moving average allows you to weight past and future data equally.
      ii. Averaging two MAs to create a CMA is reasonable because it simply averages one moving average that weights the past one data point more with the adjacent one that weights the future one data point more. (A minimal computational sketch follows below.)
      iii. See p. 302.
      iv. CMAt is a deseasonalized version of Yt. That is, the centered moving average is one way of deseasonalizing the data.
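Here is a minimal computational sketch of the MA and CMA definitions in items 3.b and 3.c, using a hypothetical quarterly series (the text’s data would be substituted here); the 4-period window and the extra-point-from-the-past convention follow the notes above.

```python
import numpy as np

# Hypothetical quarterly series (three years of data).
y = np.array([12.1, 16.4, 13.2, 10.8, 13.0, 17.5, 14.1, 11.6,
              13.8, 18.6, 15.0, 12.3])

# Four-period moving average with the extra point taken from the past,
# as in item 3.b: MA_t averages y[t-2], y[t-1], y[t], y[t+1].
def moving_average(y, t):
    return y[t - 2 : t + 2].mean()

# Centered moving average (item 3.c): the mean of two adjacent MAs, one
# weighting the past one point more and one weighting the future one
# point more, so period t ends up weighted symmetrically.
def centered_moving_average(y, t):
    return 0.5 * (moving_average(y, t) + moving_average(y, t + 1))

# The CMA exists only where full past and future windows are available.
valid_t = range(2, len(y) - 2)
cma = np.array([centered_moving_average(y, t) for t in valid_t])
print(cma.round(3))
```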
   d. SFt (Seasonal Factor)
      i. SFt = Yt / CMAt
      ii. See p. 304.
   e. SIp (Seasonal Index)
      i. Take the average SFt for each period of the year (e.g., for each month if you have monthly data). This produces an average factor SFp for each period of the year. Then normalize the SFp series so that its sum equals the periodicity (e.g., equals 12 for monthly data).
         1. SIp = SFp / (∑SFp / number of periods in a year)
      ii. p denotes a particular period of the year. For example, p equals 1, 2, 3, or 4 for quarterly data and 1, 2, …, 12 for monthly data.
      iii. SI is the “S” in the basic model above.
      iv. The variation in SIp allows us to determine the extent of the seasonality.
      v. Deseasonalized data = raw data / seasonal index
         1. That is, the seasonal index can also be used to deseasonalize the data.
4. Finding the long-term trend
   a. To estimate the trend component of the original series, find the line that most closely tracks the CMA series. That is, find the a and the b in the equation below that produce the line most closely resembling the CMA series.
      i. CMATt = a + b·time
         1. “time” refers to the time period: the first period has time = 1, the second time = 2, and so on.
         2. Note that with each additional period, CMAT will increase by b.
      ii. CMATt is the centered moving average trend.
      iii. CMATt is the “T” or trend component in the time series decomposition equation above.
      iv. The CMATt equation can be used to forecast the trend component as far forward as one wants: just plug a value for “time” into the equation and it returns a CMAT value.
   b. ForecastX uses an algorithm to find the line (the “a” and the “b”) that minimizes the sum of the squared distances between the line and the CMA series.
      i. That is, ForecastX estimates the following equation:
      ii. CMAt = a + b·time + error
      iii. CMA-hat_t = a + b·time
         1. CMA-hat_t is referred to as CMAT (the centered moving average trend).
      iv. CMAT is the “T” in the basic model above.
5. Measuring the Cyclical Component
   a. CFt = CMAt / CMATt
   b. CF is used to estimate the cyclical component of the original time series.
   c. CFt tells us how high the deseasonalized data is relative to its trend value.
      i. A value above one tells us that the deseasonalized data is high relative to the trend value.
      ii. A value below one tells us that the deseasonalized data is low relative to the trend value.
   d. CF is the “C” in the basic model above.
   e. “A cycle factor greater than 1 indicates that the deseasonalized value for that period is above the long-term trend of the data. If CF is less than 1, the reverse is true.” (A sketch computing SF, SI, CMAT, and CF appears below.)
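Continuing the same hypothetical series, the sketch below carries the decomposition through the seasonal factor, seasonal index, trend, and cycle factor steps of sections 3 through 5. The plain least-squares fit stands in for the ForecastX trend routine described in item 4.b; none of the numbers come from the text.

```python
import numpy as np

# Hypothetical quarterly series, as in the previous sketch.
y = np.array([12.1, 16.4, 13.2, 10.8, 13.0, 17.5, 14.1, 11.6,
              13.8, 18.6, 15.0, 12.3])
ma = {t: y[t - 2 : t + 2].mean() for t in range(2, len(y) - 1)}
t = np.arange(2, len(y) - 2)                 # indices where the CMA exists
cma = np.array([0.5 * (ma[k] + ma[k + 1]) for k in t])

# Seasonal factors SF_t = Y_t / CMA_t (item 3.d).
sf = y[t] / cma

# Seasonal indices (item 3.e): average SF by quarter, then normalize so
# the four indices sum to the periodicity (4 for quarterly data).
quarter = t % 4                              # 0-based index mod 4; 0 is Q1
sf_bar = np.array([sf[quarter == q].mean() for q in range(4)])
si = sf_bar / (sf_bar.sum() / 4.0)

# Long-term trend (section 4): fit CMAT = a + b*time to the CMA series,
# where time = 1 in the first period, so time = t + 1 for 0-based t.
b, a = np.polyfit(t + 1, cma, deg=1)
cmat = a + b * (t + 1)

# Cycle factor (section 5): CF_t = CMA_t / CMAT_t; values above 1 mean
# the deseasonalized series is running above its long-term trend.
cf = cma / cmat
print("SI:", si.round(3))
print("CF:", cf.round(3))
```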
   f. “The cycle factor is the most difficult component of a time series to analyze and to project into the forecast period. If analyzed carefully, however, it may also be the component that has the most to offer in terms of understanding where the industry may be headed. Looking at the length and amplitude of previous cycles may enable us to anticipate the next turning point in the current cycle. This is a major advantage of the time-series decomposition technique. An individual familiar with an industry can often explain cyclic movements around the trend line in terms of variables or events that, in retrospect, can be seen to have had some import. By looking at those variables or events in the present, we can sometimes get some hint of the likely future direction of the cycle component.”
   g. Overview of business cycles
      i. Expansion phase
      ii. Recession or contraction phase
      iii. “If business cycles were true cycles, they would have a constant amplitude. That is, the vertical distance from trough to peak and peak to trough would always be the same. In addition, a true cycle would also have a constant periodicity. That would mean that the length of time between successive peaks (or troughs) would always be the same.”
   h. Business cycle indicators
      i. See the Leamer paper.
      ii. The index of leading economic indicators
         1. See Table 6.4 on p. 311.
         2. See Figure 6.5 on p. 312.
      iii. The index of coincident economic indicators
         1. See Table 6.4 on p. 311.
         2. See Figure 6.5 on p. 312.
      iv. The index of lagging economic indicators
         1. See Table 6.4 on p. 311.
         2. See Figure 6.5 on p. 312.
      v. “It is possible that one of these indices, or one of the series that make up an index, may be useful in predicting the cycle factor in a time-series decomposition. This could be done in a regression analysis with the cycle factor (CF) as the dependent variable.”
   i. The cycle factor for private housing starts
      i. See Figure 6.6 on p. 313.
         1. Note: the cycle factors shown in bold are estimated values.
         2. “You see that the cyclical component for private housing starts does not have a constant amplitude or periodicity.”
6. Forecasting the Cycle Factor
   a. Subjective methods: “Perhaps most frequently the cycle factor forecast is made on a largely judgmental basis by looking carefully at the historical values, especially historical turning points and the rates of descent or rise in the historical series.”
      i. This can be done by “focusing on prior peaks and troughs, with particular attention to their amplitude and periodicity.” “You might look at the peak-to-peak, trough-to-trough, and trough-to-peak distances by dating each turning point, such as we show in Figure 6.6. Then … you could calculate the average distance between troughs (or peaks) to get a feeling for when another such point is likely.”
         1. Amplitude: the peak-to-trough distance of the cycle.
         2. Periodicity: the time it takes for a complete cycle to occur; for example, the peak-to-peak or trough-to-trough distance.
         3. “The dates for peaks and troughs are shown in Figure 6.6, along with the values of the cycle factor at those points. Identification of these dates and values is often helpful in considering when the cycle factor may next turn around.”
      ii. “You can also analyze the rates of increase and/or decrease in the cycle factor as a basis on which to judge the expected slope of the forecast of the cycle factor.”
   b. Quantitative methods: “Another approach would be to use another forecasting method to forecast values for CF. Holt’s exponential smoothing may sometimes be a good candidate for this task, but we must remember that such a model will not pick up a turning point until after it has occurred. Thus, the forecaster would never predict that the current rise or fall in the cycle would end.” (A minimal Holt’s smoothing sketch follows below.)
      i. “If we have recently observed a turning point and have several quarters of data since the turning point, and if we believe another turning point is unlikely during the forecast horizon, then Holt’s exponential smoothing may be useful.”
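To illustrate the quantitative approach in item 6.b, here is a bare-bones Holt’s (two-parameter) exponential smoothing routine applied to a hypothetical cycle-factor series. The smoothing constants and data are illustrative choices, not values from the text, and the hand-rolled loop stands in for whatever smoothing routine your software provides.

```python
import numpy as np

# Minimal Holt's linear exponential smoothing: one equation updates the
# level, a second updates the trend, and the h-step forecast extends the
# last level by h trend steps.
def holt_forecast(y, alpha=0.3, beta=0.1, horizon=4):
    level, trend = y[0], y[1] - y[0]            # simple initialization
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return np.array([level + h * trend for h in range(1, horizon + 1)])

# Hypothetical cycle-factor history that has already turned downward.
cf = np.array([1.02, 1.04, 1.05, 1.04, 1.02, 0.99, 0.97, 0.96])
print(holt_forecast(cf).round(3))
# The forecast simply extends the recent downward slope, which is exactly
# why, as the notes warn, Holt's model never predicts the next turning point.
```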
7. The Time-Series Decomposition Forecast
   a. Y = T x S x C x I
   b. FY = CMAT x SI x CF x I
      i. FY = CMAT x (Y / CMA) x (CMA / CMAT) x 1
      ii. I is “the irregular component. This is assumed equal to 1 unless the forecaster has reason to believe a shock may take place, in which case I could be different from 1 for all or part of the forecast period.”
   c. “You will note that this method takes the trend [CMAT] and makes two adjustments to it: the first adjusts it for seasonality (with SI), and the second adjusts it for cycle variations (with CF).” (An end-to-end sketch recombining the components appears after the homework list.)
   d. See Table 6.5 on pp. 317-18.
      i. The CMAT series can be extended indefinitely.
      ii. The SI series can also be extended indefinitely.
      iii. The CF series must be estimated, either with a forecasting method such as Holt’s exponential smoothing or by subjective evaluation.
         1. Note that “the cycle factors [CF series] starting in July 2006 are estimated values rather than actual ratios of PHS-CMA to PHS-CMAT.”
   e. “Because time-series decomposition models do not involve a lot of mathematics or statistics, they are relatively easy to explain to the end user. This is a major advantage, because if the end user has an appreciation of how the forecast was developed, he or she may have more confidence in its use for decision making.”
8. Homework
   a. Exercises: 1, 2, 3, 4, 6 a-f, 13
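To tie Chapter 6 together, here is a short end-to-end sketch of the recombination FY = CMAT x SI x CF x I from section 7. Every number below is an illustrative placeholder (not the PHS values from Table 6.5): an assumed fitted trend, assumed seasonal indices, a judgmental cycle-factor forecast, and I set to 1.

```python
import numpy as np

# Recombine the components for four future quarters: FY = CMAT * SI * CF * I.
a, b = 12.0, 0.15                         # assumed fitted trend: CMAT = a + b*time
si = np.array([0.92, 1.25, 1.02, 0.81])   # assumed seasonal indices, Q1..Q4 (sum = 4)
cf = np.array([0.98, 0.97, 0.97, 0.98])   # assumed judgmental cycle-factor forecast
i_factor = 1.0                            # irregular component, assumed equal to 1

time = np.arange(21, 25)                  # the next four periods to forecast
cmat = a + b * time                       # trend extended into the forecast period
quarter = (time - 1) % 4                  # maps each period to 0..3 = Q1..Q4
fy = cmat * si[quarter] * cf * i_factor   # two adjustments to the trend: SI, then CF
print(fy.round(2))
```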