Chapter 1: Introduction to Business Forecasting
1. Introduction
a. A forecast is essentially a prediction.
b. “We might think of forecasting as a set of tools that helps decision makers make the best
possible judgments about future events.”
c. Forecasting methods can be divided into two categories.
i. Quantitative forecasting methods, such as time series methods (which use time series
data) and regression modeling (which uses cross-sectional data).
1. The time series methods we will use will forecast a particular variable using past
observations of the same variable.
2. Regression modeling will forecast a particular variable using observations on
other variables.
a. Regression modeling relies on time series data to make forecasts of
explanatory variables.
3. Quantitative forecasts are strongly tied to historical data. Quantitative forecasts
will involve, at some point in the forecast process, making a forecast of a certain
variable based on past observations of the same variable.
ii. Subjective or qualitative forecasting methods
1. Although not called quantitative, subjective/qualitative forecasting methods do
typically involve quantitative calculations. However, these calculations are less
tied to historical data.
d. This text and its accompanying computer software (ForecastX) have been carefully
designed to provide you with an understanding of the conceptual basis for many modern
quantitative forecasting models …”
2. Where is forecasting used?
a. Business
i. “Forecasting in today’s business world is becoming increasingly important as firms
focus on increasing customer satisfaction while reducing the cost of providing
products and services… The term “lean” has come to represent an approach to
removing waste from business systems while providing the same, or higher, levels of
quality and output to customers (business customers as well as end users). One major
business cost involves inventory, both of inputs and of final products.”
ii. Inventory management:
1. Prevent excessive inventory: “One major business cost involves inventory, both
of inputs and of final products. Through better forecasting, inventory costs can be
reduced and wasteful inventory eliminated.”
a. The cost of excessive inventory is either the opportunity cost of resources tied
up in the inventory (this is a holding cost) or the depreciated value of the
inventory. Even when excessive inventory can be held without depreciation,
there still will be holding costs to the inventory. The holding costs will either
entail foregone interest that could have been earned on the money tied up in
the inventory, or the interest that must be paid on money that must be
borrowed (since available resources are tied up as inventory) to finance
continuing operations.
2. Reduce likelihood of lost sales due to too small an inventory.
iii. Virtually every functional area of business makes use of some type of forecast. Other
examples of business forecasting needs:
1. Accounting
a. “Accountants rely on forecasts of costs and revenues in tax planning.”
2. Personnel
a. “The personnel department depends on forecasts as it plans recruitment of
new employees and other changes in the workforce.”
3. Finance
a. “Financial experts must forecast cash flows to maintain solvency.”
4. Production management
a. “Production managers rely on forecasts to determine raw-material needs and
the desired inventory of finished products.”
5. Marketing
a. “Marketing managers use a sales forecast to establish promotional budgets.”
iv Note that “the sales forecast is often the root forecast from which others … are
derived.”
b. Public and not for profit sectors.
i. Examples:
1. Forecasting the demand for police patrol services.
a. W = 5.66 + 1.84 POP + 1.70 ARR – 0.93 AFF + 0.61 VAC + 0.13 DEN
i. W: call-for-service workload
ii. POP: a population factor
iii. ARR: an arrest factor
iv. AFF: an affluence factor
v. VAC: a vacancy factor
vi. DEN: a density factor
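As an illustration, the fitted workload equation can be evaluated directly once values for the five factors are known. In the sketch below, only the coefficients come from the equation above; the factor values plugged in are hypothetical, chosen just to show the mechanics.

```python
# Coefficients from the fitted workload model quoted above.
INTERCEPT = 5.66
COEFFS = {"POP": 1.84, "ARR": 1.70, "AFF": -0.93, "VAC": 0.61, "DEN": 0.13}

def workload(pop, arr, aff, vac, den):
    """Forecast call-for-service workload W from the five factors."""
    return (INTERCEPT + COEFFS["POP"] * pop + COEFFS["ARR"] * arr
            + COEFFS["AFF"] * aff + COEFFS["VAC"] * vac + COEFFS["DEN"] * den)

# Hypothetical factor values for a single district (illustrative only).
w = workload(pop=2.0, arr=1.5, aff=1.0, vac=0.5, den=3.0)
```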
2. Forecasting state budgets.
3. Forecasting hospital nursing staff requirements.
3. Forecasting and supply chain management
a. “We can think of the supply chain as encompassing all of the various flows between
suppliers, producers, distributors (wholesalers, retailers, etc.), and consumers.
Throughout this chain each participant, prior to the final consumer, must manage
supplies, inventories, production, and shipping in one form or another… Each one of
these suppliers has its own suppliers back one more step in the supply chain. With all of
these businesses trying to reduce inventory costs … reliability and cooperation across the
supply chain become essential.”
4. Collaborative forecasting
a. “The recognition that improving functions throughout the supply chain can be aided by
appropriate use of forecasting tools has led to increased cooperation among supply chain
partners.”
b. “This cooperative effort … has become known as Collaborative Planning Forecasting and
Replenishment (CPFR). CPFR involves coordination, communication, and cooperation
among participants in the supply chain.”
c. “In its simplest form the process is as follows:
i. A manufacturer that produces a consumer good computes its forecast.
ii. That forecast is then shared with retailers that sell that product to end-use consumers.
iii. Those retailers respond with any specific knowledge that they have regarding their
future intentions related to purchases based on known promotions, programs,
shutdowns, or other proprietary information about which the manufacturer may not
have had prior knowledge.
iv. The manufacturer then updates the forecast including the shared information.”
d. Benefits of collaborative forecasting include:
i. Lower inventory and capacity buffers.
ii. Fewer unplanned shipments or production runs.
1. “These unplanned shipments usually carry a premium price.”
iii. Reduced stockouts.
1. Stockouts “will always have a negative impact on the seller due to lost [current]
sales and lower customer satisfaction,” and hence lower future sales.
iv. Increased customer satisfaction and repeat business.
v. Better preparation for sales promotions.
1. “No one wants to promote products that cannot be supplied.”
vi. Better preparation for new product introductions.
1. “New product launches can be very tricky … Meeting the needs of new product
launches can [optimize] launch timing and increase speed to market.”
vii. Ability to respond dynamically to market changes.
1. “Sometimes markets change based on external factors (popular culture,
government controls, etc.). Being able to respond to these special cases without
overstocking or understocking is critical.”
e. Potential costs of collaborative forecasting
i. “In a collaborative environment there is a lot of information that flows between the
two parties. Most of the time, information resides in public forums (computer
servers) with only a software security system protecting it from outsiders.
Collaborative forecasting does run the risk of loss of confidentiality to …
competitors.”
5. Computer use and quantitative forecasting
a. “The widespread availability of computers has contributed to the use of quantitative
forecasting techniques. Most of the methods described in this text fall into the realm of
quantitative forecasting techniques, many of which would not be practical to carry out by
hand.”
b. Charles W. Chase, Jr., formerly director of forecasting at Johnson & Johnson Consumer
Products, Inc., on the relationship between quantitative methods and judgment:
i. “Forecasting is a blend of science and art. Like most things in business, the rule of
80/20 applies to forecasting. By and large, forecasts are driven 80 percent
mathematically and 20 percent judgmentally.”
6. Qualitative or subjective forecasting methods
a. “Quantitative techniques using the power of the computer have come to dominate the
forecasting landscape. However, there is a rich history of forecasting based on subjective
and judgmental methods, some of which remain useful even today. These methods are
probably most appropriately used when the forecaster is faced with a severe shortage of
historical data and/or when quantitative expertise is not available… Very long-range
forecasting is an example of such a situation.”
b. Examples
i. Sales-force composites
1. “Members of the sales force are asked to estimate sales for each product they
handle.”
ii. Surveys of customers and the general population
1. “In some situations it may be practical to survey customers for advanced
information about their buying intentions.”
1. “In some situations it may be practical to survey customers for advanced
information about their buying intentions.”
iii Jury of executive opinion
1. “a forecast is developed by combining the subjective opinions of the managers
and executives who are most likely to have the best insights about the firm’s
business”
iv The Delphi method
1. “Similar to the jury of executive opinion ... [with] the additional advantage … of
anonymity among the participants. The experts, perhaps five to seven in number,
never meet to discuss their views; none of them even knows who else is on the
panel.”
c. Advantages
i. Can be used when there is a shortage of quantitative expertise
1. “They do not require any particular mathematical background of the individuals
involved. As future business professionals, like yourself, become better trained in
quantitative forms of analysis, this advantage will become less important.”
ii. Can be used when there is a shortage of historical data: for example, salesperson and
consumer surveys are useful when forecasting new-product sales.
iii Wide acceptance
1. “Historically, another advantage of subjective methods has been their wide
acceptance by users.”
iv Flexible due to subjectivity
1. “The underlying models are, by definition, subjective [i.e., determined by the
person doing the analysis]. This subjectivity is nonetheless the most important
advantage of this class of methods. There are often forces at work that cannot be
captured by quantitative methods. They can, however, be sensed by experienced
business professionals and can make an important contribution to improved
forecasts.”
v. Complement quantitative techniques
1. “Quantitative methods reduced errors to about 60 percent of those that resulted
from the subjective method that had been in use. When the less accurate
subjective method was combined with quantitative methods, errors were further
reduced to about 40 percent of the level when the subjective method was used
alone.”
vi. Compared to quantitative analysis, the analysis itself is less costly.
vii. Compared to quantitative analysis, avoids data collection costs.
d. Disadvantages
i. No straightforward algorithmic account of how the forecast was made
1. “Users are increasingly concerned with how the forecast was developed, and with
most subjective methods it is difficult to be specific in this regard.”
ii. Subject to bias, because opinions are often based on subjective personal experiences
iii Easily subject to manipulation
1. This can cause inconsistency over time
iv May rely on experience of forecaster to be done well
1. “It takes years of experience for someone to learn how to convert intuitive
judgment into good forecasts.”
v. Compared to quantitative methods, generally less accurate.
vi. Compared to quantitative methods, more difficult to combine more than one
forecasting method.
1. Combining forecasts potentially increases the informational content embodied in
a forecast.
vii. Compared to quantitative methods, cannot be adjusted to improve fit by applying
specialized methods, such as those for seasonality.
7. Example – New Product Forecasting
a. Introduction
i. “Often judgmental methods are better suited to forecasting new-product sales because
there are many uncertainties and few known relationships.”
ii. “However, there are ways to make reasonable forecasts for new products. These
typically include both qualitative judgments and quantitative tools of one type or
another.”
b. The Product Life Cycle Concept
i. Figure 1.1.
ii. “This notion of a product life cycle can be applied to a product class (such as personal
passenger vehicles), to a product form (such as sport utility vehicles), or to a brand
(such as Jeep Cherokee …).”
iii. “The real forecasting problems occur in the introductory stage (or in the
pre-introductory product development stage). Here the forecaster finds traditional
quantitative methods of limited usefulness and must often turn to marketing research
techniques and/or qualitative forecasting techniques.”
iv “Once the mid-to-late growth stage is reached, there is probably sufficient historical
data to consider a wide array of quantitative methods.”
c. Analog Forecasts
i. “The basic idea behind the analog method is that the forecast of the new product is
related to information that you have about the introduction of other similar products
in the past.”
ii. For example, using prior sales of a similar product, adjusted for the estimated
percentage of households expected to purchase the new product, to estimate its
future sales.
d. Test Marketing
i. “Test marketing involves introducing a product to a small part of the total market
before doing a full product rollout.”
e. Product Clinics
i. “Potential customers are invited to a specific location and are shown a product
mockup or prototype, which in some situations is essentially the final product…
Afterwards they are asked to evaluate the product during an in-depth personal
interview and/or by filling out a product evaluation survey.”
8. Simple or “Naive” Forecasting Models
a. First “Naïve” Forecasting Model
i. Ft = At-1
1. Ft is the forecast for period t.
2. At-1 is the actual observation at period t-1.
b. Modified “Naïve” Forecasting Models
i. Ft = At-4
1. Ft is the forecast for period t.
2. At-4 is the actual observation in the same quarter of the prior year.
3. This four-quarter lag is used when quarterly data exhibit seasonality.
ii. Ft = At-12
1. Ft is the forecast for period t.
2. At-12 is the actual observation in the same month of the prior year.
3. This twelve-month lag is used when monthly data exhibit seasonality.
c. Second “Naïve” Forecasting Model
i. Ft = At-1 + P(At-1 – At-2)
1. “P is the proportion of the change between periods t-2 and t-1 that we choose to
include in the forecast.”
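The naïve models above can be sketched in a few lines of Python. This is a minimal illustration; the sales series is made-up data.

```python
def naive1(series, t):
    """First naive model: F_t = A_{t-1}."""
    return series[t - 1]

def naive_seasonal(series, t, lag):
    """Modified naive model: F_t = A_{t-lag} (lag=4 for quarterly, 12 for monthly)."""
    return series[t - lag]

def naive2(series, t, p):
    """Second naive model: F_t = A_{t-1} + P * (A_{t-1} - A_{t-2})."""
    return series[t - 1] + p * (series[t - 1] - series[t - 2])

# Made-up quarterly sales; forecast period t = 4 (0-indexed).
sales = [100, 110, 105, 120, 115]
f = naive2(sales, 4, 0.5)   # 120 + 0.5 * (120 - 105) = 127.5
```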
9. Evaluating Forecasts
a. “We need some way to evaluate the accuracy of forecasting models over a number of
periods so that we can identify the model that generally works the best.”
b. Evaluation criteria: also known as loss functions, these are quantitative measures
of the accuracy of a forecast method.
i. See pages 34-35 of the text.
1. Each evaluation criterion has its own advantages and disadvantages.
10. Example – Forecasting Consumer Sentiment
11. Example – Forecasting Total Houses Sold
12. Example – Forecasting Gap Sales
13. Using Multiple Forecasts
a. “We know it is unlikely that one model will always provide the most accurate forecast for
any series. Thus, it makes sense to “hedge one’s bet,” in a sense, by using two or more
forecasts. This may involve making a “most optimistic,” a “most pessimistic,” and a
“most likely” forecast.”
b. One simple way to get a “most likely” forecast is to take an average of different
forecasts.
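A minimal sketch of combining forecasts by simple averaging; the three forecast values below are hypothetical.

```python
def combine(forecasts):
    """'Most likely' forecast as the simple average of several forecasts."""
    return sum(forecasts) / len(forecasts)

# Hypothetical optimistic, pessimistic, and model-based forecasts:
most_likely = combine([130.0, 110.0, 118.0])
```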
14. Sources of Data
a. Internal records: “The most obvious sources of data are the internal records of the
organization itself. Such data include unit product sales histories, employment and
production records, total revenue, shipments, orders received, inventory records, and so
forth.”
b. Trade associations: “For many types of forecasts the necessary data come from outside
the firm. Various trade associations are a valuable source of such data.”
c. Government: “But the richest sources of external data are various governmental and
syndicated services.”
15. Introduction to ForecastX
a. See online “software notes.”
b. See p. 49 – 52.
16. Excel Review
a. See online “software notes.”
17. Homework
a. Case Questions: 1, 2, 3 (do with pencil and paper only)
b. Exercises: 1, 2, 3, 4 (use pencil and paper only for these first 4 questions), 7, 8 (use Excel
only for these final 2 questions)
Chapter 2: The Forecast Process, Data Considerations, and Model Selection
1. The Forecast Process
a. Specify objectives
i. “Objectives and applications of the forecast should be discussed between
the individual(s) involved in preparing the forecast and those who will
utilize the results.”
b. Determine what to forecast
c. Identify time dimensions: length and periodicity
i. “Is the forecast needed on an annual, a quarterly, a monthly, a weekly, or a
daily basis?”
d. Data considerations: from where will the data come.
e. Model selection
i. Depends on the following:
1. Pattern exhibited by the data (the most important criterion)
2. Quantity of historic data available
3. Length of the forecast horizon
f. Model evaluation
i. “This is often done by evaluating how each model works in retrospect.”
ii. Measures such as root mean square error (RMSE) are used.
iii. Fit: “how well the model works retrospectively.”
iv. Accuracy: “relates to how well the model works in the forecast horizon
(i.e., outside the period used to develop the model).”
v. Holdout period: “When we have sufficient data, we often use a “holdout”
period to evaluate forecast accuracy.”
g. Forecast preparation
i. “When two, or more, methods that have different information bases are
used, their combination will frequently provide better forecasts than would
either method alone.”
h. Forecast presentation
i. “In both written and oral presentations, the use of objective visual
presentations of the results is very important.”
i. Tracking results
i. “Over time, even the best of models are likely to deteriorate in terms of
accuracy and need to be respecified, or replaced with an alternative
method.”
2. Trend, Seasonal, and Cyclical Data Patterns
a. “A time series is likely to contain some, or all, of the following components:”
b. Trend: a long term consistent change in the level of the data.
i. Linear trends: relatively constant increases over time.
ii. Nonlinear trends: trends that increase at an increasing (accelerating) or
decreasing rate over time.
iii. Stationary: “Data are considered stationary when there is neither a positive
nor a negative trend (i.e., the series is essentially flat in the long term).”
c. Seasonal: regular variation in the level of the data that occurs at the same time
each year.
d. Cyclical: Cyclical fluctuations are usually “represented by wavelike upward and
downward movements of the data around the long-term trend.”
i. “Cyclical fluctuations are of longer duration and are less regular than are
seasonal fluctuations.”
ii. “The causes of cyclical fluctuations are less readily apparent as well.
They are usually attributed to the ups and downs in the general level of
business activity that are frequently referred to as the business cycle.”
iii. Another definition of cyclical is the following: “when the data exhibit
rises and falls that are not of a fixed period… The major distinction
between a seasonal and a cyclical pattern is that the former is of a constant
length and recurs on a regular periodic basis, while the latter varies in
length.” (Makridakis, Wheelwright, and Hyndman, 1998, p. 25)
iv. Cyclical fluctuations can be more easily seen once the data has been
“deseasonalized” or “seasonally adjusted.”
e. Irregular: “fluctuations that are not part of the other three components. These
are often called random fluctuations. As such they are the most difficult to
capture in a forecasting model.”
f. See Figure 2.2
i. “The third line, which moves above and below the long-term trend but is
smoother than the plot of THS, is what the THS series looks like after the
seasonality has been removed. Such a series is said to be
“deseasonalized,” or “seasonally adjusted” (SA).”
ii. “By comparing the deseasonalized series with the trend, the cyclical
nature of houses sold becomes clearer.”
iii. A trend can be seen
iv. Seasonality can be seen
v. Cyclicality can be seen
g. See Figure 2.3
i. A trend can be seen.
1. The trend is nonlinear: “the quadratic (nonlinear) trend in the lower
graph provides a better basis for forecasting” than the linear trend
in the upper graph.
ii. Seasonality is not apparent
iii. Cyclicality is not apparent
3. Statistics
a. Descriptive Statistics
i. Measures of central tendency
1. Mode
2. Median
3. Mean
ii. Measures of dispersion
1. Range
2. Variance (for a population and for a sample)
a. Population: σ² = ∑ (Xi – μ)² / N
b. Sample: s² = ∑ (Xi – X̄)² / (n – 1)
3. Standard deviation
a. Population: σ = √[∑ (Xi – μ)² / N]
b. Sample: s = √[∑ (Xi – X̄)² / (n – 1)]
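As a quick check, the sample formulas can be computed directly; the data below are made up.

```python
import math

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """s = sqrt(s^2)"""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample, mean = 5
s2 = sample_variance(data)        # 32 / 7
```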
b. Normal Distribution
i. Symmetric
ii. The 68–95–99.7 rule:
1. μ ± 1σ includes about 68% of the area
2. μ ± 2σ includes about 95% of the area
3. μ ± 3σ includes about 99.7% of the area
iii. There are an infinite number of normal distributions. The shape of each
normal distribution is described by mean μ and standard deviation σ
iv. Standard normal distribution
1. Z = (X – μ) / σ
2. “every other normal distribution can be transformed easily into a
standard normal distribution called the Z-distribution.”
3. “The Z-value measures the number of standard deviations by which
X differs from the mean. If the calculated Z-value is positive, then
X lies to the right of the mean (X is larger than μ). If the
calculated Z-value is negative, then X lies to the left of the mean
(X is smaller than μ).”
4. See table 2.4, p. 74. This table is used to calculate the following
a. The percentage of data that fall below a particular level.
b. The percentage of data that fall above a particular level.
c. The percentage of data that fall between two points.
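The three table look-ups can be reproduced with the standard normal CDF, which the Python standard library can express via the error function. The μ and σ below are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma

def normal_cdf(z):
    """P(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 100, 15   # hypothetical population parameters
below = normal_cdf(z_score(115, mu, sigma))       # share of data below 115
above = 1 - normal_cdf(z_score(130, mu, sigma))   # share of data above 130
between = (normal_cdf(z_score(115, mu, sigma))
           - normal_cdf(z_score(85, mu, sigma)))  # share between 85 and 115
```

Note that `below` and `between` reproduce the 68–95–99.7 rule: 115 and 85 lie one σ from the mean.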
c. Student’s t Distribution
i. Useful when working with sample data (as we typically do in a business
context).
ii. “When the population standard deviation is not known, or when the
sample size is small, the Student’s t-distribution should be used rather than
the normal distribution.”
iii. “The Student’s t-distribution resembles the normal distribution but is
somewhat more spread out for small sample sizes.”
d. Statistical Inference – Confidence Intervals
i. “A sample statistic is our best point estimate of the corresponding
population parameter. While it is best, it is also likely to be wrong. Thus,
in making an inference about a population it is usually desirable to make
an interval estimate.”
ii. The confidence interval for the mean (μ) of a population is the following:
1. μ = X̄ ± t (s / √n)
2. See Table 2.5, p. 73.
a. To find the value of t in the above equation, look up the
value in the column corresponding to ½ of (1 –
confidence level). (For example, for a 95% confidence
interval, look up the value in the t0.025 column.)
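A minimal sketch of the interval estimate. The sample statistics are made up, and the critical t value is hard-coded as it would be read from a t table (roughly 2.262 for 95% confidence with 9 degrees of freedom).

```python
import math

def confidence_interval(xbar, s, n, t_crit):
    """Interval estimate for mu: xbar +/- t * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# n = 10 gives 9 degrees of freedom; t_{0.025, 9} ~ 2.262 from a t table.
lo, hi = confidence_interval(xbar=50.0, s=4.0, n=10, t_crit=2.262)
```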
e. Statistical Inference – Hypothesis Testing
i. “Frequently we have a theory or hypothesis that we would like to evaluate
statistically.”
ii. “The process begins by setting up two hypotheses, the null hypothesis
(designated H0:) and the alternative hypothesis (designated H1:). These
two hypotheses should be structured so that they are mutually exclusive.”
1. Set up your null and alternative hypothesis.
a. See Case I on page 76.
b. Use Case I when testing whether a parameter is or is not
equal to a certain value.
i. H0: μ = μ0
ii. Ha: μ ≠ μ0
2. Look up the critical value (tT) in the appropriate table.
a. Two sided test with significance level = 5%: p.75.
i. If degrees of freedom > 100 just use row
corresponding with n = 100.
b. Two sided test for a wide range of significance levels:
Table 2.5, p. 73.
i. For a two-sided test, look up the critical value in the
column corresponding to ½ of your significance
level. (For example, for a 5% significance level,
look up the critical value in the t0.025 column.)
c. Note: Sometimes the test is specified in terms of the
“confidence level.” The significance level is equal to 1
minus the confidence level of the test.
3. Calculate your t statistic
a. t = (X̄ – μ0) / (s / √n)
4. Compare t to the critical value (tT) and accept or reject the null.
a. If |t| > tT then reject the null.
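The four steps above can be sketched directly. The sample statistics are hypothetical, and the critical value is hard-coded as it would be read from a t table (about 2.064 for a two-sided 5% test with 24 degrees of freedom).

```python
import math

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n))"""
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical two-sided test of H0: mu = 100 with n = 25;
# t table gives ~2.064 at the 5% significance level (24 df).
t = t_statistic(xbar=104.0, mu0=100.0, s=8.0, n=25)
reject_null = abs(t) > 2.064
```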
f. Correlation
i. Definition
1. Correlation is a measure of how linearly associated two variables
are. Correlation gives information about the strength and direction
of the linear association.
2. The stronger the linear association, the closer the absolute value
of the correlation coefficient is to 1.
3. The direction of the linear association of the variables is
determined by the sign of the correlation coefficient. A positive
sign reveals a positive association, and a negative sign reveals a
negative association.
ii. Figure 2.6
iii. Equation: p. 81
iv. Hypothesis testing on Correlation Coefficient
1. The null hypothesis is that ρ = 0.
2. See equation for t statistic on p. 83.
3. Reject the null if the t that you calculate using the equation on p.
83 is greater than the t value you look up in the t table. Use n-2
degrees of freedom when looking up the t value in the t table.
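The statistic on p. 83 is presumably the standard t statistic for a correlation coefficient; a sketch with a made-up r and n:

```python
import math

def corr_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Example: r = 0.6 from a sample of 27 pairs; compare |t| with the
# t-table value at 27 - 2 = 25 degrees of freedom.
t = corr_t(r=0.6, n=27)   # 0.6 * 5 / 0.8 = 3.75
```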
g. Autocorrelation & lagged data
i. With autocorrelation, we can calculate the correlation between
observations in a data series and the k period lagged values of the data
series.
ii. k-period lag: the number of periods prior to a given period.
iii. Autocorrelation is similar to correlation. With correlation we essentially
multiply/compare the value of one series (relative to its mean) with the
corresponding value of another series (relative to its mean). Now,
however, the second series is just a lagged version of the original series.
iv. K period lagged data series: A data series where the value for any given
period is equal to value for the original (non-lagged) series k periods
before.
1. When your data are ordered in a column with the oldest data on the
top (period 1 on top), yt-k is created by simply shifting your
original time series data (yt) down by k rows. You have now
created a new time series with a k period lag.
2. Here the t – k indicates a series that is lagged by k periods.
3. Now you can think of two separate variables: your original time
series (yt) and your lagged time series (yt-k). To calculate rk,
multiply the corresponding values from each series (data from the
same row in each column, where the two columns represent yt and
yt-k).
v. Autocorrelation Equation for a k period lag, rk: p. 84
1. There are some errors in book’s equation:
a. First term after summation sign in numerator should have a
subscript “t + k” rather than “t – k”
b. Summation in denominator should begin at “t = 1” rather
than “t – 1”
vi. So, autocorrelation tells us about both trends and seasonality. This
is very useful information for forecasters for two reasons.
1. First, some forecasting methods are more useful for data without
trends, while other forecasting methods are more useful for data
with trends.
2. Second, some forecasting methods are more useful for data without
seasonality, while other forecasting methods are more useful with
seasonality.
vii. Hypothesis testing
1. It is useful to run a hypothesis test for autocorrelation ρk = 0, for
various k, in order to get a good idea as to whether the time series
exhibits a trend or seasonality.
a. t = (rk – 0) / [1 / √(n – k)]
i. (n – k) is the degrees of freedom
b. Evidence of a trend: see below
c. Evidence of seasonality: see below
2. Rule of thumb: The following is the rule of thumb for rejecting the
null hypothesis that autocorrelation ρk = 0 at the 95% confidence
level.
a. |rk| > 2 / √(n – k), or, approximately,
b. |rk| > 2 / √n, when n is large relative to k.
c. Satisfying the rule of thumb corresponds to a t
statistic greater than approximately 2.
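A sketch of the autocorrelation calculation (using the corrected form of the book's equation: the numerator runs over t = 1..n-k, the denominator over the full series) together with the rule of thumb. The trending series is made-up data.

```python
import math

def autocorr(xs, k):
    """r_k: correlation between the series and its k-period lag
    (numerator over t = 1..n-k, denominator over the full series)."""
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((xs[t + k] - xbar) * (xs[t] - xbar) for t in range(n - k))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

def significant(rk, n, k):
    """Rule of thumb: reject rho_k = 0 when |r_k| > 2 / sqrt(n - k)."""
    return abs(rk) > 2 / math.sqrt(n - k)

trend = list(range(1, 21))             # a strongly trending made-up series
r1 = autocorr(trend, 1)
trending = significant(r1, len(trend), 1)   # large r_1 flags the trend
```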
h. Correlograms (aka, Autocorrelation Function - ACF)
i. Definition
1. “A k-period plot of autocorrelations is called an autocorrelation
function (ACF), or a correlogram.”
ii. Evidence of Stationarity vs. a Trend
1. “If the time series is stationary, the value of rk should diminish
rapidly toward zero as k increases. If, on the other hand, there is a
trend, rk will decline toward zero slowly.”
2. Rejecting the null that ρk = 0 for most k is evidence of a trend.
Rejecting the null for only one or two lags, however, is evidence
against a trend.
3. Note that trends are difficult to differentiate from cycles using
correlograms. (One can interpret cycles as alternating trends.)
When the data appear not to be stationary, one can assume the
existence of a trend, a cycle, or both. (With cycles, however, the
autocorrelation will often eventually turn negative.)
iii. Evidence of Seasonality
1. For stationary data, rejecting the null that ρk = 0 primarily (and
most convincingly) for lags that are divisible by the number of
periods in a year is evidence of seasonality.
2. For example, “if a seasonal pattern exists, the value of rk may be
significantly different from zero at k = 4 for quarterly data, or k =
12 for monthly data. (For quarterly data, rk for k = 8, 12, 16, …
may also be large. For monthly data, a large rk may also be found
for k = 24, 36, etc.)”
3. Seasonality can be difficult to see in a correlogram when the data
exhibit either a cycle or a trend. However, first differencing the
data, which often makes the data stationary, can make seasonality
much more evident in a correlogram. (A description of how to
first difference is below.)
iv. Hypothesis testing for zero correlation
1. “To determine whether the autocorrelation at lag k is significantly
different from zero,” use the rule of thumb on p. 84 in the text.
2. A bar extending beyond either of the horizontal lines in our software
printouts, and in Figures 2.8 and 2.10, implies that the null hypothesis
of zero autocorrelation should be rejected.
v. Differencing to remove trends.
1. “If we want to try a forecasting method … that requires stationary
data, we must first transform the … data to a stationary series.
Often this can be done by using first differences.”
2. Types of differencing
a. First differencing.
b. Second differencing.
3. When differencing results in a stationary time series it allows us to
utilize forecasting methods that are better suited to data without a
trend. We then will forecast a difference (or several differences).
From the difference(s) we can then “back out” a forecast for the
actual level of the time series.
a. Similarly, we can de-seasonalize data, use a forecast
method better suited to data without seasonality, and then
re-seasonalize our forecast.
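Differencing and "backing out" level forecasts can be sketched as follows; the series and the forecast differences are made-up numbers.

```python
def first_difference(xs):
    """d_t = x_t - x_{t-1}: often turns a trending series into a stationary one."""
    return [xs[t] - xs[t - 1] for t in range(1, len(xs))]

def undifference(last_level, diff_forecasts):
    """'Back out' level forecasts from forecasts of the differences."""
    levels, level = [], last_level
    for d in diff_forecasts:
        level += d
        levels.append(level)
    return levels

series = [100, 103, 107, 112, 118]         # made-up trending series
diffs = first_difference(series)           # [3, 4, 5, 6] -- roughly stationary
levels = undifference(series[-1], [7, 8])  # level forecasts from forecast diffs
```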
4. Review (Relevance of information in this chapter)
i. The reasons discussed above.
5. Forecast X
a. See p. 49 – 52 from last chapter for a quick review.
b. See p. 93 – 95.
c. For making graphs, see online “software notes”.
6. Homework
a. Case Questions: 2, 3.
b. 8 – 11.
Chapter 3: Moving Averages and Exponential Smoothing
1. The relationship of this chapter to the prior chapter.
a. In this chapter we will discuss four different forecasting models. These models
are appropriate under different underlying conditions in the data: stationarity,
trend, and or seasonality. It is using information acquired from the procedures
discussed in the last chapter that gives us insight into which of the models in this
chapter would be the most appropriate choice to construct a forecast.
2. Smoothing:
a. “a form of weighted average of past observations to smooth up-and-down
movements, that is, some statistical method of suppressing short-term
fluctuations”
b. “The assumption underlying these methods is that the fluctuations in past values
represent random departures from some smooth curve that, once identified, can
plausibly be extrapolated into the future to produce a forecast or series of
forecasts.”
c. We will discuss several smoothing techniques in this chapter.
d. “all [smoothing techniques] are based on the concept that there is some
underlying pattern to the data … cycles or fluctuations that tend to occur.”
3. Moving Averages
a. “The simple statistical method of moving averages may mimic some data better
than a complicated mathematical function.”
b. Calculating moving averages: Figure 3.1 and table 3.1
c. “The choice of the interval for the moving average [should be consistent with] the
length of the underlying cycle or pattern in the original data.”
d. The first naïve forecasting model presented in Chapter 1 was essentially a one
period moving average.
e. On the bottom of table 3.1 the RMSE is used to evaluate the different interval
MAs (3 quarter versus 5 quarter).
f. Figure 3.2 & 3.3: The “failure of the moving averages to predict peaks and
troughs is one of the shortcomings of moving-average models.”
g. The MA method of forecasting can lead forecasters to incorrectly identify
cycles that don't exist. The reason is that an MA introduces serial correlation
(autocorrelation) into the forecasted data, since successive MA values are
computed from overlapping observations. Serial correlation (autocorrelation)
can in turn produce apparent cycles in the data.
i. “Since any moving average is serially correlated … any sequence of
random numbers could appear to exhibit cyclical fluctuation.”
h. Like the first naïve forecasting model, moving averages are effective in
forecasting stationary time series. Why? However, they do not handle trends
or seasonality well.
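The moving-average forecast and the RMSE comparison at the bottom of Table 3.1 can be sketched in Python (function names are mine):

```python
def ma_forecast(series, k):
    """Forecast each period as the average of the k preceding observations.
    (A one-period MA reproduces the first naive model from Chapter 1.)"""
    return [sum(series[t - k:t]) / k for t in range(k, len(series))]

def rmse(actual, forecast):
    """Root-mean-squared error of a forecast against the actual data."""
    sq = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return (sum(sq) / len(sq)) ** 0.5
```

Comparing `rmse(series[3:], ma_forecast(series, 3))` against `rmse(series[5:], ma_forecast(series, 5))` mirrors the 3-quarter versus 5-quarter evaluation in Table 3.1.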
4. Simple Exponential Smoothing
a. Like moving averages, exponential smoothing is properly used when there is no
trend.
b. “With exponential smoothing, the forecast value at any time is a weighted average
of all the available previous values...”
c. “Moving-average forecasting gives equal weights to the past values included in
each average; exponential smoothing gives more weight to the recent observations
and less to the older observations.”
d. The weight on the most recent observation is α, on the next most recent
observation (1-α)α, on the next (1-α)²α, and so on.
e. The simple exponential smoothing model can be written as in 3.1.
i. α is between 0 and 1.
f. An alternative interpretation of the exponential smoothing model in equation 3.1
is seen in 3.2.
i. “From this form we can see that the exponential smoothing model “learns”
from past errors. The forecast value at period t + 1 is increased if the
actual value for period t is greater than it was forecast to be, and it is
decreased if Xt is less than Ft.”
g. Although forecasting the value for the next period requires us only to know last
period’s forecast and actual value, all past observations are embodied in the
forecast.
h. This leads us to a third interpretation of the exponential smoothing model in 3.1.
i. See 3.3
ii. Note that exponential smoothing allows the weights to sum to one
regardless of when your data starts.
i. Understanding the weights on past observations of x.
i. Note the nature of the weights on past observations: α, (1-α)α, (1-α)²α, (1-α)³α, …
ii. Remember, the value of α is between 0 and 1.
iii. α values close to 1 imply recent data is weighted much higher than past
data.
iv. α values close to 0 imply recent data is weighted only slightly higher than
past data.
v. See table on p. 108.
vi. Note that regardless of the level of α, the weights will eventually sum to 1.
j. Tips on selecting α.
i. With a great deal of random variation in the time series, choose an α closer
to 0.
ii. If you want your forecast to depend strongly on recent changes in the time
series, choose an α closer to 1.
iii. The RMSE is often used as a criterion to determine the best level of α.
iv. Generally, small values of α work best when exponential smoothing is the
appropriate model.
k. Note that in order to utilize this model we must make an initial estimate for F.
“This process of choosing an initial value for the smoothed series is called
initializing the model, or warming up the model.”
i. “R. G. Brown first suggested using the mean of the data for the starting
value, and this suggestion has been quite popular in actual practice.”
l. If a value for α is not selected, ForecastX will select one to minimize the RMSE.
m. Advantages of the simple exponential smoothing model.
i. “it requires a limited quantity of data”
ii. “it is simpler than most other forecasting methods”
n. Disadvantages of the simple exponential smoothing model.
i. “its forecasts lag behind the actual data”
ii. “it has no ability to adjust for a trend or seasonality in the data”
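Equation 3.1 and the warm-up discussion above can be sketched as follows (a minimal Python illustration; the function name and defaults are mine):

```python
def simple_exp_smoothing(series, alpha, f0=None):
    """Equation 3.1: F[t+1] = alpha * X[t] + (1 - alpha) * F[t].
    Warm-up: use the series mean as the starting value (R. G. Brown's
    suggestion) unless an initial forecast f0 is supplied."""
    f = f0 if f0 is not None else sum(series) / len(series)
    forecasts = [f]
    for x in series:
        f = alpha * x + (1 - alpha) * f
        forecasts.append(f)
    return forecasts  # the last entry is the forecast for the next period
```

With α = 0.5 and an initial forecast of 10, the series [10, 20] produces forecasts [10.0, 10.0, 15.0]: the final forecast moves halfway toward the most recent observation, illustrating how the model "learns" from its last error.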
5. Holt’s Exponential Smoothing
a. Holt’s model is utilized when there is a trend in the data.
b. See 3.4 – 3.6.
c. Intuition: In order to get a forecast for a period, we take the smoothed value for
the prior period and then add to it an estimate of the trend.
d. Equation 3.4 estimates the smoothed value in period t+1.
i. Note, 3.4 does not represent the forecast. F now represents the smoothed
value, rather than the forecast. H now represents the forecast.
ii. Note, the smoothed value is a linear combination of actual data from the
period and the sum of the smoothed value for the last period and an
estimate of the trend.
e. Equation 3.5 estimates the trend from t to t+1.
i. The estimate for the trend is a linear combination of the change in the
smoothed data and the previous trend estimate.
ii. Equation 3.5 reflects the fact that when we estimate a trend, we use both
the most recent trend in the smoothed data (Ft+1 – Ft), and the previous
trend estimate (Tt).
iii. From our prior discussion of 3.1 (and 3.3), it is apparent that the trend
estimate is a weighted average of all the prior one-period changes in the
smoothed values.
f. “Equation 3.6 is used to forecast m periods into the future by adding the product
of the trend component, Tt+1, and the number of periods to forecast, m, to the
current value of the smoothed data Ft+1.”
g. “Two starting values are needed: one for the first smoothed value [F] and another
for the first trend value [T].”
i. “The initial smoothed value is often a recent actual value available;”
ii. “the initial trend value is often 0.00 if no past data are available.”
iii. ForecastX will choose these values
h. ForecastX software will choose the smoothing constants (α and γ) if you do not
select them.
i. “Holt’s form of exponential smoothing is then best used when the data show some
linear trend but little or no seasonality. A descriptive name for Holt’s smoothing
might be linear-trend smoothing.”
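Equations 3.4 through 3.6 can be sketched in Python as follows (warm-up choices here are mine; the text notes ForecastX picks its own):

```python
def holt(series, alpha, gamma, m=1):
    """Holt's linear-trend smoothing, equations 3.4-3.6.
    Warm-up: the first actual value for the smoothed series F,
    and 0.0 for the initial trend T."""
    f, t = series[0], 0.0
    for x in series[1:]:
        f_new = alpha * x + (1 - alpha) * (f + t)   # eq. 3.4: smoothed value
        t = gamma * (f_new - f) + (1 - gamma) * t   # eq. 3.5: trend estimate
        f = f_new
    return f + m * t                                # eq. 3.6: forecast m ahead
```

On the perfectly linear series [10, 12, 14, 16] with α = γ = 1, the trend estimate settles at 2 and the one-period-ahead forecast is 18.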
6. Winter’s Exponential Smoothing
a. Winter’s model is utilized when there is both a trend and seasonality.
b. See 3.7 – 3.10.
c. Intuition: In order to get a forecast for a period, we take the smoothed
(deseasonalized) value for the prior period, add to it an estimate of the trend, and
then reseasonalize it.
d. Equation 3.7 estimates the smoothed value in period t + 1.
i. Note, 3.7 does not represent the forecast. Again, F represents the
smoothed value, rather than the forecast. W now represents the forecast.
ii. Note that this is identical to our Holt's smoothed value equation, except
for the fact that the Xt term is divided by St-p, a seasonality estimate.
iii. Note, the smoothed value is a linear combination of deseasonalized actual
data from the period and the sum of the smoothed value for the last period
and an estimate of the trend.
e. Equation 3.8 estimates the seasonality.
i. The seasonality estimate is a linear combination of the ratio of actual data
to the smoothed value and the prior seasonality estimate.
f. Equation 3.9 estimates the trend.
i. The estimate for the trend is a linear combination of the change in the
smoothed data and the previous trend estimate.
ii. Note that this is identical to our Holt's trend estimate equation.
g. Equation 3.10 “is used to compute the forecast for m periods into the future;”
h. Seven values must be used to initialize or warm up the model: one for the first
smoothed value, one for the first trend value, and one for each of the first
seasonality values.
i. These initial values are chosen by the software.
i. ForecastX software will choose the smoothing constants (α, β, and γ) if you do
not select them.
j. Winter's form of exponential smoothing is best used when the data show a
linear trend and seasonality.
k. An alternative to using Winter’s model is to deseasonalize the data and then use
Holt’s model to get a forecast. The forecast would then be reseasonalized.
l. Seasonal Indices
i. “As part of the calculation with an adjustment for seasonality, seasonal
indices are calculated and displayed in most forecasting software.”
m. Note, when data exhibit seasonality, but no trend, an alternative to using the
Winters model is the following: deseasonalize the data, construct a forecast using
the simple exponential model, and then reseasonalize the data.
n. Note, when data exhibit seasonality and a trend, an alternative to the Winter’s
model is the following: deseasonalize the data, construct a forecast using the
Holt’s model, and then reseasonalize the data.
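Equations 3.7 through 3.10 can be sketched as follows. This is only an illustration: the warm-up values (first-year mean for F, 0 for T, first-year ratios for the seasonal indices) and the indexing convention are my choices, whereas the text lets ForecastX choose the warm-up values and smoothing constants.

```python
def winters(series, alpha, beta, gamma, p, m=1):
    """Winters' exponential smoothing (a sketch of equations 3.7-3.10).
    p is the number of periods per year."""
    first_year_mean = sum(series[:p]) / p
    s = [x / first_year_mean for x in series[:p]]  # initial seasonal indices
    f, t = first_year_mean, 0.0                    # initial smoothed value, trend
    for i in range(p, len(series)):
        x = series[i]
        f_new = alpha * x / s[i % p] + (1 - alpha) * (f + t)  # eq. 3.7: smoothed value
        s[i % p] = beta * x / f_new + (1 - beta) * s[i % p]   # eq. 3.8: seasonality
        t = gamma * (f_new - f) + (1 - gamma) * t             # eq. 3.9: trend
        f = f_new
    return (f + m * t) * s[(len(series) + m - 1) % p]         # eq. 3.10: forecast
```

On a purely seasonal series such as [10, 20, 10, 20] (p = 2, no trend), the forecast for the next period reproduces the seasonal low of about 10.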
7. Seasonal Indices
a. Seasonal indices essentially measure how high (or low) observations for a
particular series are during a given period. For example, for quarterly data, a
season index would tell how high (or low) data tended to be for the 1st, 2nd, 3rd,
and 4th quarter. Seasonal indices are meant to capture predictable seasonal
variation in a series. A seasonal index of 1.3 for the fourth quarter implies that
fourth-quarter data tend to be about 30 percent higher than average (since 1.3 exceeds 1).
b. Deseasonalizing: dividing data by the seasonal index for the respective period.
c. Reseasonalizing: multiplying data by the seasonal index for the respective period.
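The deseasonalizing and reseasonalizing operations in (b) and (c) can be sketched directly (Python; function names are mine):

```python
def deseasonalize(series, indices):
    """Divide each observation by the seasonal index for its period."""
    p = len(indices)
    return [x / indices[i % p] for i, x in enumerate(series)]

def reseasonalize(forecasts, indices, start_period):
    """Multiply each forecast by the seasonal index for its period,
    where start_period is the (0-based) index of the first forecast period."""
    p = len(indices)
    return [f * indices[(start_period + i) % p] for i, f in enumerate(forecasts)]
```

With seasonal indices [0.5, 1.5], the series [5, 15, 5, 15] deseasonalizes to a flat [10, 10, 10, 10]; a flat forecast of 10 for the next two periods reseasonalizes back to [5, 15].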
8. Note
a. “A chief flaw with smoothing models is their inability to predict cyclical reversals
in the data, since forecasts depend solely on the past.”
9. See Software Tips
10. Homework
a. Case Questions: 1 (only answer the first two sub-questions, and replace "2004" by
"2008"), 2.
b. Exercises: 1, 3, 4, 7, 8, 9, 11, 12
Chapter 4: Introduction to Forecasting Regression Methods
1. The Bivariate Regression Model
a. Y = β0 + β1X + ε
i. This is the assumed true underlying relationship between X (the
independent variable) and Y (the dependent variable).
b. Y-hat = b0 + b1X
i. This is the equation we estimate. It is the equation for the line which
comes as close as possible to matching the data.
c. e = Y – Y-hat
i. This is the error term. It is how far our actual level of Y is from our
predicted level of Y (Y-hat).
d. Min ∑e² = ∑(Y − (b0 + b1X))²
i. Ordinary Least Squares (OLS) is an algorithm for choosing b0 and b1 such
that the sum of the squared errors (the above equation) is minimized.
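The minimization has a closed-form solution, which can be sketched in Python (a hand-rolled illustration of what regression software computes):

```python
def ols_bivariate(x, y):
    """Least-squares estimates: the b0 and b1 that minimize
    the sum of squared errors sum((Y - (b0 + b1*X))**2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1
```

For data lying exactly on the line Y = 2X (e.g., x = [1, 2, 3], y = [2, 4, 6]), the procedure recovers b0 = 0 and b1 = 2.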
2. Regression – What We All Try To Do
a. Regression is most commonly used to estimate the causal impact of one variable
on another variable. Ultimately, in this class, we will use it to forecast a variable
of interest.
b. One of the main takeaways from this class is a casual way of estimating causal
impacts: draw the scatter plot, draw the line; the slope is an estimate of the
causal impact of the X variable on the Y variable.
i. With more data your estimate is better.
ii. If there is a confounding lurking variable, then your estimate is biased.
c. The progression from thinking casually about the causal effect of one variable on
another variable, to understanding regression:
i. Think of two observations on some causal effect you are trying to
understand/estimate.
1. For example, the causal effect of Hours Studied (X) on GPA (Y).
The two observations would be two pairs: the Hours Studied and
GPA for one individual (HS1, GPA1), and the Hours Studied and
GPA for a second individual (HS2, GPA2).
ii. From these two observations, you can estimate (somewhat crudely)
whether the effect is positive or negative.
iii. You can also estimate the size of the effect by taking the ratio ΔY / ΔX.
1. This is an estimate for the causal impact of a one unit change in X
on Y.
iv. If you plotted these two data points on a scatterplot, with the X variable on
the horizontal axis, and the Y variable on the vertical axis, then the slope
of the line through the two data points is exactly the same ΔY / ΔX from
the prior step.
v. So plotting the scatterplot, drawing a line through the data, and calculating
the slope is essentially a way of estimating the causal effect of one
variable (the one plotted on the X axis) on another variable (the one
plotted on the Y axis).
vi. Ultimately, we will plot many observations on the scatterplot and use the
slope of the line that most closely matches the data as our estimate of the
causal impact of the X variable on the Y variable.
1. Under the appropriate conditions, this can be a very reliable
estimate.
2. Regardless, hopefully you can understand how regression analysis
basically is an extension of a process of thinking about causality
which you may already do (step i through iii).
3. Bivariate Regression in a Nutshell
a. Begin with a data set containing two variables, one which we want to forecast (the
dependent variable), and another which we will use to make the forecast (the
independent variable).
i. Think of a data table including the following columns: The period,
Variable 1 (the Y variable or forecast variable), Variable 2 (the X variable
or predicting variable).
ii. Later we will add the following variables to the data table: Time, Y-hat, e.
b. Assume the following linear relationship between the forecast variable and the
predicting variable.
i. Y = β0 + β1X + ε
1. This is the bivariate regression model.
2. Y is the variable we want to forecast.
3. X is the variable we will use to make the forecast.
4. This is a linear relationship because it has the general form of a
line: Y = mX + b. β0 above corresponds with b and β1X above
corresponds with mX.
ii. Y-hat = b0 + b1X
1. This is the equation we use to forecast Y. Y-hat is the forecast (or
prediction) of Y, for a given level of X.
2. The software we use will give us the b0 and b1 in the above
equation.
3. This is the equation for the STRAIGHT line which comes as close
as possible to matching the data.
iii. Note that other variables besides the predicting variable impact Y through
ε. Therefore, the assumed linear relationship does not imply that other
variables do not impact Y.
1. More on this below.
iv. β1 is an estimate of the impact of a one unit change in X on Y.
c. Estimate β0 and β1 using software.
i. The estimate for β0 will be referred to as b0.
ii. The estimate for β1 will be referred to as b1.
iii. β0 and β1 are the true values, while b0 and b1 are our estimates of those
values.
1. The β’s are analogous to µ, while the b’s are analogous to Xbar.
Here, the first set of variables (β and µ) are the true (population)
parameter we are trying to estimate, while the second set of
variables (b and Xbar are the estimates).
d. b0 and b1 are the Y intercept and slope, respectively, of the “regression line”
which comes as close as possible to matching the data (the “best fit” line) when
we plot the independent variable on the X axis and the dependent variable on the
Y axis.
i. Plotting the data is simply recording, with a series of points on a graph, the
combinations of the X variable (the independent variable) and Y variable
(the dependent variable) for each period for which we have data.
ii. There may be other variables that affect Y besides X. Excluding them
from the analysis does not bias our estimate of β1. (Unless the excluded
variable is a confounding lurking variable.)
e. The quality of our estimates of β0 and β1 (that is, the quality of b0 and b1) are
determined largely by (1) the sample size, (2) the lack of omitted variable bias
(confounding lurking variables), (3) the assumed linearity of the relationship
between X and Y.
i. While we will estimate the slope of the line which most closely matches
the scatterplot of the data, ultimately, that estimate will only be useful if
the visual relationship between X and Y is linear (is a straight line).
ii. If the scatterplot of the data has a curved shape, then the slope will not
accurately reflect the causal effect of X on Y.
f. The following equation gives us the predicted value of Y, we'll call it Y-hat, for
each level of X. It is the equation of the regression line.
i. Y-hat = b0 + b1X
ii. This is just the equation for the line which most closely matches the data.
iii. We can add the Y-hat variable to the above data table, calculate Y-hat for
each period, and then record it in the Y-hat column in the data table.
g. The error, e, will be defined as the difference between the actual level of Y for a
given level of X, and the predicted level of Y (Y-hat) for a given level of X.
i. e = Y – Y-hat
ii. On the graph, this is just the vertical distance between the predicted level
of Y (Y-hat) for a given level of X (the regression line), and the actual
level of Y for that same level of X.
iii. The size of the e’s (that is, the scatter of the points around the regression
line) is not a measure of the quality of the estimate of β1, rather it is an
indication of the degree to which other variables (captured by ε) also
influence the dependent variable.
1. Generally, the greater is the spread of the data around the
regression line, the greater is the impact of all other variables on
the forecast variable (Y). The impact of all other variables on the
forecast variable is represented by ε in the assumed linear
relationship equation: Y = β0 + β1X + ε.
iv. We can add the e variable to the above data table, calculate e for each
period (that is, each X Y combination for each period), and then record it
in the e column in the data table.
h. Note that when our software finds the equation of the line which most closely
matches the data, it finds the line which minimizes the sum of squared errors:
i. Min ∑e² = ∑(Y − (b0 + b1X))²
i. Note, this procedure is useful because it enables us to forecast the dependent
variable (Y) using the independent variable (X).
i. However, we must first acquire an estimate of the independent variable
for the period we wish to forecast.
4. Data Considerations
a. Time series data: data covering multiple periods.
b. Cross sectional data: data on a variety of variables covering only one period.
c. Panel data: data on a variety of variables covering multiple periods.
5. The Bottom Line
a. We will use the software to find the b0 and b1 for the line which most closely
matches the plot of our data (Y-hat = b0 + b1X).
i. Graphically: find the line which “best fits” the data.
b. We will use the b0 and the b1, in combination with a forecast for the independent
variable (X), to produce a forecast (Y-hat) for the dependent variable.
6. Visualization of Data
a. Observing data in graphical form can give insight into the regression process.
b. See table 4.1 and Figure 4.1.
c. “For all four of the data sets in Table 4.1, the calculated regression results show
an OLS equation of Y-hat = 3 + 0.5X. It might also be noted that the mean of the
X’s is 9.0 and the mean of the Y’s is 7.5 in all four cases. The standard deviation
is 3.32 for all of the X variables and 2.03 for all of the Y variables. Similarly, the
correlation for each pair of X and Y variable is 0.82.”
d. “Visualization of these data allows us to see stark differences that would not be
apparent from the descriptive statistics we have reviewed.”
i. “The regression line is most clearly inappropriate for the data in the lower
right plot.”
ii. “The upper right plot of data suggests that a nonlinear model would fit the
data better than a linear function.”
e. The bottom line: looking at the scatterplot of the data can often give you
insight into your data analysis.
7. A Process For Regression Forecasting
a. Data Considerations: “We should utilize graphic techniques to inspect the data,
looking especially for trend, seasonal, and cyclical components, as well as for
outliers. This will help in determining what type of regression model may be
most appropriate (e.g., linear versus nonlinear, or trend versus causal).”
b. Forecast the independent variable: “Each potential independent variable should be
forecast using a method that is appropriate to that particular series, taking into
account the model-selection guidelines discussed in Chapter 2 and summarized in
Table 2.1.”
c. Specify & evaluate the model: “… we mean the statistical process of estimating
the regression coefficients … In doing so we recommend a holdout period for
evaluation… you can then test the model in [the holdout] period to get a truer feel
for how well the model meets your needs.”
8. Forecasting With A Simple Linear Trend
a. DPI-hat = b0 + b1(T)
i. Here, T is time. You would construct this data yourself by listing in the
“Time” variable column a 1, 2, … n, where n is the number of periods in
your data set.
ii. Another way of representing the same bivariate regression model is
the following: DPI = β0 + β1T + ε
b. For the final forecast plug in the appropriate time period for the period being
forecasted (T), and values for b0 and b1 from the regression, into the above
equation.
c. Example:
i. Table 4.2: table of DPI data
ii. Figure 4.2: graph of DPI data
iii. P. 168-9: results of regression
iv. Table 4.3: error calculations
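The trend model above can be sketched as a small Python routine (function name is mine; this is what the regression software does with T = 1, 2, …, n):

```python
def trend_forecast(series, periods_ahead):
    """Fit Y-hat = b0 + b1*T by least squares, with T = 1, 2, ..., n,
    then plug in T = n + periods_ahead for the forecast."""
    n = len(series)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(series) / n
    b1 = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, series))
          / sum((ti - tbar) ** 2 for ti in t))
    b0 = ybar - b1 * tbar
    return b0 + b1 * (n + periods_ahead)
```

For the exactly linear series [3, 5, 7, 9] (b0 = 1, b1 = 2), the one-period-ahead forecast plugs in T = 5 and returns 11.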
9. Using a Causal Regression Model to Forecast: A Jewelry Sales Forecast Based on
Disposable Personal Income.
a. JS-hat = b0 + b1(DPI)
i. This is the model which we will estimate.
ii. Regression 1: using actual jewelry sales.
iii. Regression 2: using deseasonalized jewelry sales.
1. Note that it is necessary to reseasonalize the estimate to achieve
the final forecast.
b. Forecasts for JS are made by plugging the following into the regression model
equation: JS-hat = b0 + b1(DPI)
i. (1) estimates for b0 and b1 from the regression software, and
ii. (2) forecasts for DPI from, for example, Holt’s forecasting model.
c. Example:
i. Table 4.4: table of JS and DPI data.
ii. Figure 4.4: graph of JS and deseasonalized JS data.
iii. Figure 4.5: graphs of JS and DPI data showing possible relationship
between data.
iv. P. 175-6: results of regression 1.
v. P. 177-8: results of regression 2.
10. Statistical Evaluation of Regression Models
a. “Does the sign of the slope term make sense?”
b. “Is the slope term significantly positive or negative?” (That is, is the slope term
significantly different than zero?)
i. A lower p value in the statistical software output indicates stronger
evidence that β1 is different from zero, i.e., a more statistically
significant estimate of β1.
ii. If p is below 0.01, we say that our estimate is significant at the 1% level.
iii. If p is below 0.1, we say that our estimate is significant at the 10% level.
iv. Etcetera.
v. If p is above 0.1, we say that our estimate is insignificant.
c. R Squared
i. It tells the proportion of variation explained by the predicting variable.
ii. It gives us an indication of the quality of our forecast.
11. Using the Standard Error of the Estimate (SEE) to make interval forecasts
a. “The approximate 95 percent confidence interval can be calculated as follows:”
i. Point estimate ± 2 × (standard error of the estimate)
ii. Table 4.5: SEE and other regression results.
iii. P. 185: confidence interval calculations.
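The interval calculation is simple enough to sketch directly (a Python illustration; the function name is mine):

```python
def approx_95_interval(point_estimate, see):
    """Approximate 95 percent confidence interval for a forecast:
    point estimate plus or minus 2 standard errors of the estimate."""
    return point_estimate - 2 * see, point_estimate + 2 * see
```

For example, a point forecast of 100 with SEE = 5 gives the interval (90, 110).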
12. Forecasting Total Houses Sold With Two Bivariate Regression Models
a. Read for review of the basic principles already covered in the above lecture and
prior examples.
13. Homework
a. Exercises: 1, 2, 3, 4, 5, 6, 7, 8, 10
Chapter 5: Forecasting with Multiple Regression
1. The Multiple Regression Model
a. Y = β0 + β1X1 + β2X2 + … + βkXk + ε
i. This is the assumed relationship between the X's (the independent
variables) and Y (the dependent variable).
b. Y-hat = b0 + b1X1 + … + bkXk
i. This is the equation we estimate. We find the b's which result in the
Y-hats (for all the different values of the X's that we have) that are
closest to the actual Y's.
c. e = Y – Y-hat
i. This is the error term. It is how far our actual level of Y is from our
predicted level of Y (Y-hat).
d. Min ∑e² = ∑(Y − (b0 + b1X1 + … + bkXk))²
i. Ordinary Least Squares (OLS) is an algorithm for choosing the b's such
that the sum of the squared errors (the above equation) is minimized.
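A sketch of multiple-regression OLS via the normal equations (pure Python for illustration; real statistical software uses more numerically stable methods):

```python
def ols_multiple(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for rows X of
    independent variables and dependent values y. Prepends a column of
    ones for the intercept, forms the normal equations M b = v with
    M = A'A and v = A'y, and solves by Gaussian elimination."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    v = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b
```

For data generated exactly by Y = 1 + 2X1 + 3X2, the routine recovers the coefficients [1, 2, 3].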
2. Selecting Independent Variables
a. Try to choose independent variables that are not too highly correlated:
b. Consider using proxies for variables for which data is not available:
i. “Sometimes it is difficult, or even impossible, to find a variable that
measures exactly what we want to have in our model… However, a more
readily available series … may be a reasonable proxy for what we want to
measure.”
3. Forecasting with a multiple-regression model
a. Example beginning on p. 227.
i. Note the signs on the coefficients.
ii. Forecasting using the regression results.
1. First, the independent variables included in the regression must be
forecasted. In this example, the forecast for the independent
variables will be made using Holt’s exponential smoothing model.
4. Statistical evaluation of multiple-regression models: Three quick steps
a. First, “see whether the signs on the coefficients make sense.”
b. Second, “consider whether these results are statistically significant at our desired
level of confidence.”
i. Significance level = 1 – “confidence level”
ii. t = bi / se(bi)
1. Is the calculated t ratio greater than the critical value? If so, the
estimated coefficient is significant.
a. Critical value can be found by looking it up in the table on
page 73.
b. Use n – (K + 1) degrees of freedom.
i. n is number of observations
ii. K is number of independent (right hand side)
variables.
iii. p value: calculated by software
1. Is the p value smaller than the significance level? If so, the
estimated coefficient is significant.
iv. See table 5.2, p. 236.
c. Third, evaluate the adjusted R squared. The adjusted R squared tells us the
proportion of the variation in the dependent variable explained by variation in the
independent variables.
i. The reason we look at the adjusted R squared is because “adding another
independent variable will always increase R-squared even if the variable
has no meaningful relation to the dependent variable.”
5. Accounting for seasonality in a multiple-regression model
a. Dummy variables: a variable that gets a one if the criterion for the dummy
variable is met, and zero otherwise.
b. For example, we could create a “first quarter” dummy variable, where the value
of the variable is one if it is the 1st quarter, and zero otherwise.
c. We could also have a dummy variable for the second quarter and third quarter,
but not the fourth quarter. You would not have a dummy variable for every
possible value of the underlying variable, but rather for all but one of the
possible values.
i. Note that the value of the coefficient on each dummy variable tells you the
difference between the effect for the value/range of the underlying
variable represented by the dummy variable, and the effect for the
value/range represented by the case not represented by a dummy variable.
Thus, the case not represented by a dummy variable is the benchmark
against which all other dummy variable coefficients are measured.
ii. Note that this makes a dummy variable regression different than a
seasonal index as seasonal indices measure relative differences (ratios),
while dummy variables measure absolute differences.
d. See table 5.6 on p. 255.
i. Note the movement of the R squared (Adjusted R squared) as we
progress through each of the NCS regressions.
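Building quarterly dummy variables can be sketched as follows (a hypothetical Python helper; the column layout is my choice, with the fourth quarter omitted as the benchmark):

```python
def quarter_dummies(quarters):
    """One dummy column each for Q1, Q2, and Q3; Q4 is the omitted
    benchmark, so each dummy's coefficient measures the absolute
    difference from the fourth quarter."""
    return [[1 if q == 1 else 0,
             1 if q == 2 else 0,
             1 if q == 3 else 0] for q in quarters]
```

A Q4 observation gets the row [0, 0, 0], so its predicted seasonal effect is absorbed entirely by the intercept, which is exactly why the omitted case serves as the benchmark.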
6. Extensions of the multiple-regression model
a. Sometimes adding a squared version of one of the variable to the regression can
improve the fit of the regression.
b. Implications of coefficient signs on time (T) and time squared (T2)
i. Coefficient on T:
1. Positive: effect over time is initially positive.
2. Negative: effect over time is initially negative.
ii. Coefficient on T2:
1. Positive: effect over time is more towards positive
a. More positive (see graph displaying positive coefficient on
T) or
b. Less negative (see graph displaying negative coefficient on
T).
2. Negative: effect over time is less towards positive
a. Less positive (see graph displaying positive coefficient on
T) or
b. More negative (see graph displaying negative coefficient
on T).
c. See table 5.7 on p. 261.
i. Note the movement of the Adjusted R squared as we progress one more
step in our series of NCS regressions.
7. Advice on using multiple regression in forecasting
a. KIS: Keep it simple. “The more complex the model becomes, the more difficult it
is to use. As more causal variables are used, the cost of maintaining the needed
database increases in terms of both time and money. Further, complex models are
more difficult to communicate to others who may be the actual users of the
forecast. They are less likely to trust a model that they do not understand than a
simpler model that they do understand.”
8. Homework:
a. Exercises: 1, 2, 3, 4b, 5, 11
Chapter 6: Time Series Decomposition
1. Introduction
a. Time series decomposition allows us to decompose a time series into its
constituent components.
b. In this chapter, we will decompose the initial time series into 4 constituent
components (4 subseries) – three predictable subseries and one random subseries.
Why is this useful? Once the decomposition is completed, the 3 non-random
subseries can be forecasted. This is useful because the sub-series can then be
recombined to make a forecast for the initial series.
2. The Basic Model
a. Y = T x S x C x I
i. T: the long-term component
ii. S: the seasonal component (the seasonal adjustment factor)
iii. C: the cyclical component (the cyclical adjustment factor)
iv. I: the irregular or random component (irregular or random variations)
3. Deseasonalizing the Data & Finding Seasonal Indices
a. Conceptual Description
i. Calculate a moving average to essentially deseasonalize the data.
1. 4 period MA for quarterly data.
2. 12 period MA for monthly data.
3. 52 period MA for weekly data.
ii. For each period, determine the ratio of actual data to the MA.
iii. Calculate an average ratio for each period, going back as far as your data
allows.
iv. When you are done you will have an average ratio for each period, and
this will be your seasonal index (seasonal adjustment factor).
b. MAt (Moving Average)
i. Here t represents the period for which the MA is calculated, not the
number of periods included in the MA.
ii. Note that a moving average, calculated using the same number of periods
as the number of periods in a year for your data, will remove the
seasonality.
1. The number of periods included in the MA must correspond with
the number of periods in the year.
iii. The MA here is defined differently compared to chapter 3. Here the MA
takes into account future and past data.
1. When there is an even number of periods included in the MA, you
must take an odd number of data points from the past/future.
Therefore, there will be one additional data point from one of
either the past or the future. We will always take one additional
data point from the past.
2. See p. 302.
iv. Note that we would prefer to have a centered moving average.
Unfortunately, with an even number of periods, our simple method of
calculating a moving average cannot produce a centered moving average.
However, if we average two of our moving averages, we can produce a
centered moving average.
c. CMAt (Centered Moving Average)
i. Creating a centered moving average allows you to equally weight past and
future data.
ii. Averaging two MAs to create a CMA is reasonable because it simply
averages two moving averages, one that weights the past one data point
more, and one that weights the future one data point more.
iii. p. 302
iv. CMAt is a deseasonalized version of Yt. That is, the centered moving
average is one way of deseasonalizing the data.
d. SFt (Seasonal Factor)
i. Yt / CMAt
ii. See p. 304.
e. SIp (Seasonal Index)
i. Take the average SFt for each period in a year (e.g., for each month if you
have monthly data). This will produce a SFp for each period of the year
(e.g., for each month if you have monthly data). Then normalize the SFp
series so that the sum equals the periodicity (e.g., equals 12 for monthly
data).
1. SIp = SFp / (∑SFp / # p in a year)
ii. p denotes a particular period in the year. For example, it equals 1, 2, 3, or
4 for quarterly data and 1, 2, …, 12 for monthly data.
iii. SI is the “S” in the above equation.
iv. The variation in SIp allows us to determine the extent of the seasonality.
v. Deseasonalized data = Raw data / Seasonal Index
1. That is, the seasonal Index can also be used to deseasonalize the
data.
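Putting the SF and SI definitions together, a minimal sketch (with made-up Y and CMA values; in practice each period's SF would be averaged over several years) might look like:

```python
# Sketch of SF_t = Y_t / CMA_t and the normalized seasonal index SI_p.
# Both series below are hypothetical; the CMA is assumed precomputed.
y = [120, 150, 180, 110, 130, 160, 190, 115]
cma = [None, None, 140.0, 141.0, 143.0, 146.0, None, None]
P = 4

sf = {t: y[t] / cma[t] for t in range(len(y)) if cma[t] is not None}

# Average SF for each period of the year (only one year is available
# here, so each average uses a single value) ...
sf_p = []
for p in range(P):
    vals = [sf[t] for t in sf if t % P == p]
    sf_p.append(sum(vals) / len(vals))

# ... then normalize so the indices sum to the periodicity (4 here).
si_p = [s * P / sum(sf_p) for s in sf_p]

# Deseasonalized data = raw data / seasonal index.
deseasonalized = [y[t] / si_p[t % P] for t in range(len(y))]
```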
4. Finding the long term trend
a. In order to estimate the trend component of the original series, find the line
which most closely resembles the CMA series; that is, find the a and the b in
the below equation that produce that line.
i. CMATt = a + b time
1. “time” refers to the time period. The 1st period would be time=1,
the second period would be time=2, etc.
2. Note, with each additional period, CMAT will increase by “b”.
ii. CMATt is the centered moving average trend.
iii. CMATt is the “T” or trend component in the above time series
decomposition equation.
iv. The CMATt equation can be used to forecast the trend component going
as far forward as one wants. Just plug in a value for “time” in the above
equation and you will get a CMATt value.
b. ForecastX utilizes an algorithm to find the line (the “a” and the “b”) that minimizes
the sum of the squared distances between the line and the CMA series.
i. That is, ForecastX estimates the following equation.
ii. CMAt = a + b time + error
iii. CMA-hatt = a + b time
1. CMA-hatt is referred to as CMAT (centered moving average
trend).
iv. CMAT is the “T” in the above equation.
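The least-squares fit that ForecastX performs can be sketched by hand with the closed-form OLS formulas; the CMA values here are hypothetical:

```python
# A rough sketch of the fit: ordinary least squares for
# CMAT_t = a + b*time. The CMA values below are made up.
cma = [140.0, 141.5, 143.0, 144.8, 146.0, 147.4]
time = list(range(1, len(cma) + 1))  # first period is time = 1

n = len(cma)
mean_t = sum(time) / n
mean_c = sum(cma) / n

# Closed-form OLS slope and intercept.
b = (sum((t - mean_t) * (c - mean_c) for t, c in zip(time, cma))
     / sum((t - mean_t) ** 2 for t in time))
a = mean_c - b * mean_t

def cmat(t):
    """Trend value for period t; can be extended as far forward as needed."""
    return a + b * t
```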
5. Measuring the Cyclical Component
a. CFt = CMAt / CMATt
b. CF is used to estimate the cyclical component in the original time series.
c. CFt tells us how high the deseasonalized data is relative to its trend value.
i. A value above one tells us that the deseasonalized data is high relative to
the trend value.
ii. A value below one tells us that the deseasonalized data is low relative to
the trend value.
d. CF is the “C” in the above equation.
e. “A cycle factor greater than 1 indicates that the deseasonalized value for that period
is above the long-term trend of the data. If CF is less than 1, the reverse is true.”
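A tiny sketch of the cycle-factor calculation, with hypothetical CMA and CMAT values:

```python
# CF_t = CMA_t / CMAT_t, with made-up CMA (deseasonalized data) and
# CMAT (trend) values for four periods.
cma = [140.0, 148.0, 150.0, 144.0]
cmat = [142.0, 144.0, 146.0, 148.0]

cf = [c / t for c, t in zip(cma, cmat)]
# cf > 1: deseasonalized data is above its trend value;
# cf < 1: deseasonalized data is below its trend value.
```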
f. “The cycle factor is the most difficult component of a time series to analyze and
to project into the forecast period. If analyzed carefully, however, it may also be
the component that has the most to offer in terms of understanding where the
industry may be headed. Looking at the length and amplitude of previous cycles
may enable us to anticipate the next turning point in the current cycle. This is a
major advantage of the time-series decomposition technique. An individual
familiar with an industry can often explain cyclic movements around the trend
line in terms of variables or events that, in retrospect, can be seen to have had
some import. By looking at those variables or events in the present, we can
sometimes get some hint of the likely future direction of the cycle component.”
g. Overview of Business Cycles
i. Expansion phase
ii. Recession or Contraction phase
iii. “If business cycles were true cycles, they would have a constant
amplitude. That is, the vertical distance from trough to peak and peak to
trough would always be the same. In addition, a true cycle would also
have a constant periodicity. That would mean that the length of time
between successive peaks (or troughs) would always be the same.”
h. Business Cycle Indicators
i. See Leamer paper.
ii. The index of leading economic indicators
1. See Table 6.4 on p. 311.
2. See Figure 6.5 on p. 312.
iii. The index of coincident economic indicators
1. See Table 6.4 on p. 311.
2. See Figure 6.5 on p. 312.
iv. The index of lagging economic indicators
1. See Table 6.4 on p. 311.
2. See Figure 6.5 on p. 312.
v. “It is possible that one of these indices, or one of the series that make up
an index, may be useful in predicting the cycle factor in a time-series
decomposition. This could be done in a regression analysis with the cycle
factor (CF) as the dependent variable.”
i. The Cycle Factor for Private Housing Starts
i. See Figure 6.6 on p. 313.
1. Note, the cycle factors in bold are estimated values.
2. “You see that the cyclical component for private housing starts
does not have a constant amplitude or periodicity.”
6. Forecasting the Cycle Factor
a. Subjective methods: “Perhaps most frequently the cycle factor forecast is made
on a largely judgmental basis by looking carefully at the historical values,
especially historical turning points and the rates of descent or rise in the historical
series.”
i. This can be done by “focusing on prior peaks and troughs, with particular
attention to their amplitude and periodicity.” “You might look at the
peak-to-peak, trough-to-trough, and trough-to-peak distances by dating each
turning point, such as we show in Figure 6.6. Then … you could calculate
the average distance between troughs (or peaks) to get a feeling for when
another such point is likely.”
1. Amplitude: peak to trough distance of the cycle.
2. Periodicity: time duration for a complete cycle to occur. For
example, the peak to peak or trough to trough distance.
3. “The dates for peaks and troughs are shown in Figure 6.6, along
with the values of the cycle factor at those points. Identification of
these dates and values is often helpful in considering when the
cycle factor may next turn around.”
ii. “You can also analyze the rates of increase and/or decrease in the cycle
factor as a basis on which to judge the expected slope of the forecast of the
cycle factor.”
b. Quantitative Methods: “Another approach would be to use another forecasting
method to forecast values for CF. Holt’s exponential smoothing may sometimes
be a good candidate for this task, but we must remember that such a model will
not pick up a turning point until after it has occurred. Thus, the forecaster would
never predict that the current rise or fall in the cycle would end.”
i. “If we have recently observed a turning point and have several quarters of
data since the turning point, and if we believe another turning point is
unlikely during the forecast horizon, then Holt’s exponential smoothing
may be useful.”
7. The Time-Series Decomposition Forecast
a. Y = T x S x C x I
b. FY = CMAT x SI x CF x I
i. FY = CMAT x (Y / CMA) x (CMA / CMAT) x 1
ii. I is “the irregular component. This is assumed equal to 1 unless the
forecaster has reason to believe a shock may take place, in which case I
could be different from 1 for all or part of the forecast period.”
c. “You will note that this method takes the trend [CMAT] and makes two
adjustments to it: the first adjusts it for seasonality (with SI), and the second
adjusts it for cycle variations (with CF).”
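The recomposition above can be illustrated for a single forecast period; all component values here are hypothetical:

```python
# FY = CMAT x SI x CF x I for one forecast period; the numbers
# below are made up for illustration.
cmat = 150.0  # trend value from the fitted line
si = 1.10     # seasonal index for this period
cf = 0.95     # estimated cycle factor
irr = 1.0     # irregular component, assumed 1 unless a shock is expected

fy = cmat * si * cf * irr  # 150 * 1.10 * 0.95 = 156.75
```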
d. See Table 6.5 on p. 317-18.
i. The CMAT series can be extended indefinitely.
ii. The SI series can also be extended indefinitely.
iii. The CF series must be estimated, either by a forecasting method such as
Holt’s exponential smoothing, or by subjective evaluation.
1. Note that “the cycle factors [CF series] starting in July 2006 are
estimated values rather than actual ratios of PHS-CMA to PHS-CMAT.”
e. “Because time-series decomposition models do not involve a lot of mathematics
or statistics, they are relatively easy to explain to the end user. This is a major
advantage, because if the end user has an appreciation of how the forecast was
developed, he or she may have more confidence in its use for decision making.”
8. Homework
a. Exercises: 1, 2, 3, 4, 6 a-f, 13