DECISION MODELING WITH
MICROSOFT EXCEL
Chapter 13
Part 1
Copyright 2001
Prentice Hall
Many important decisions made by individuals
and organizations crucially depend on an
assessment of the future.
There are a few “wise” sayings that illustrate the
promise and frustration of forecasting:
“It is difficult to forecast, especially with
regard to the future.”
“It isn’t difficult to forecast, just to forecast
correctly.”
“Numbers, if tortured enough, will confess to
just about anything.”
Economic forecasts influence
Government policies and business decisions
Insurance companies’ investment decisions
in mortgages and bonds
Service industries’ (such as airlines, hotels,
rental cars, cruise lines, etc.) forecasts of
demand as input for revenue management
Forecasting is playing an increasingly important
role in the modern firm.
There is clearly a steady increase in the use of
quantitative forecasting models at many levels in
industry and government.
The many types of forecasting models can be
divided into two major categories:
Quantitative and Qualitative
Quantitative forecasting models possess two
important and attractive features:
1. They are expressed in mathematical
notation. Thus, they establish an
unambiguous record of how the forecast is
made.
2. With the use of spreadsheets and
computers, quantitative models can be
based on an amazing quantity of data.
Two types of quantitative forecasting models that
will be discussed in the next two sections are:
Causal models and Time-Series models
In a causal forecasting model, the forecast for the
quantity of interest “rides piggyback” on another
quantity or set of quantities.
In other words, our knowledge of the value of
one variable (or perhaps several variables)
enables us to forecast the value of another
variable.
In this model, let
y denote the true value of some variable of
interest and
ŷ denote a predicted or forecast value for
that variable.
Then, in a causal model,
ŷ = f(x1, x2, …, xn)
where
f is a forecasting rule, or function, and
x1, x2, …, xn is a set of variables
In this representation, the x variables are often
called independent variables, whereas ŷ is the
dependent or response variable.
We either know the independent variables in
advance or can forecast them more easily than ŷ.
Then the independent variables will be used in the
forecasting model to forecast the dependent
variable.
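As a minimal sketch (not from the text), a causal forecasting rule is simply a function from known independent variables to a forecast; the variable names and coefficients below are invented for illustration:

```python
# Minimal sketch of a causal forecasting rule (illustrative only).
# The variable names and coefficients are invented, not from the chapter.

def forecast_sales(gdp: float, ad_spend: float) -> float:
    """Forecast y-hat from independent variables x1 = GDP, x2 = ad spend."""
    a, b1, b2 = 10.0, 0.5, 2.0  # assumed parameters of the rule f
    return a + b1 * gdp + b2 * ad_spend

print(forecast_sales(gdp=100.0, ad_spend=8.0))  # y-hat = 76.0
```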
Companies often find by looking at past
performance that their monthly sales are directly
related to the monthly GDP, and thus figure that
a good forecast could be made using next
month’s GDP figure.
The only problem is that this quantity is not
known, or it may just be a forecast and thus not a
truly independent variable.
Using a causal forecasting model requires two
conditions:
1. There must be a relationship between
values of the independent and dependent
variables such that the former provides
information about the latter.
2. The values for the independent variables
must be known and available to the
forecaster at the time the forecast is made.
Simply because there is a mathematical
relationship does not guarantee that there is
really cause and effect.
One commonly used approach in creating a causal
forecasting model is called curve fitting.
CURVE FITTING:
AN OIL COMPANY EXPANSION
Consider an oil company that is planning to
expand its network of modern self-service
gasoline stations.
The company plans to use traffic flow (measured
in the average number of cars per hour) to
forecast sales (measured in average dollar sales
per hour).
The firm has had five stations in operation for
more than a year and has used historical data to
calculate average traffic flow and sales for each
station.
The averages are plotted in a scatter diagram.
[Figure: scatter diagram of Sales/hour ($) versus Cars/hour for the five stations]
Now, these data will be used to construct a
function that will be used to forecast sales at any
proposed location by measuring the traffic flow at
that location and plugging its value into the
constructed function.
Least Squares Fits
The method of least squares is a formal procedure
for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a
straight line or quadratic curve).
2. Within the set of functions specified in step
1, choose the specific function that
minimizes the sum of the squared
deviations between the data points and the
function values.
To demonstrate the process, consider the sales-traffic flow example.
1. Assume a straight line; that is, functions of
the form y = a + bx.
2. Draw the line in the scatter diagram and
indicate the deviations between observed
points and the function as di .
For example,
d1 = y1 – [a + bx1] = 220 – [a + 150b]
where
y1 = actual sales/hr at location 1
x1 = actual traffic flow at location 1
a = y-axis intercept for the function
b = slope for the function
[Figure: scatter diagram with the line y = a + bx and the deviations d1 through d5 marked]
The value d1² is one measure of how close the
value of the function [a + bx1] is to the observed
value, y1; that is, it indicates how well the
function fits at this one point.
One measure of how well the function fits overall
is the sum of the squared deviations:
Σi=1..5 di²
Consider a general model with n as opposed to
five observations. Since each di = yi – (a + bxi),
the sum of the squared deviations can be written
as:
Σi=1..n (yi – [a + bxi])²
Using the method of least squares, select a and b
so as to minimize the sum in the equation above.
Now, take the partial derivative of the sum with
respect to a and set the resulting expression
equal to zero.
Σi=1..n -2(yi – [a + bxi]) = 0
A second equation is derived by following the
same procedure with b.
Σi=1..n -2xi(yi – [a + bxi]) = 0
Recall that the values for xi and yi are the
observations, and our goal is to find the values of
a and b that satisfy these two equations.
The solution is:
b = [Σi=1..n xiyi – (1/n)(Σi=1..n xi)(Σi=1..n yi)] / [Σi=1..n xi² – (1/n)(Σi=1..n xi)²]
a = (1/n)Σi=1..n yi – b·(1/n)Σi=1..n xi
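These formulas translate directly into a few lines of Python. Here is a sketch; the five (x, y) pairs are placeholders, since the slides do not reproduce the station data:

```python
import numpy as np

# Closed-form least squares slope (b) and intercept (a), per the equations above.
# The five (x, y) pairs are placeholders; the slides do not reproduce the data.
x = np.array([150.0, 55.0, 220.0, 130.0, 300.0])   # cars/hour (hypothetical)
y = np.array([220.0, 110.0, 250.0, 155.0, 260.0])  # sales/hour (hypothetical)

n = len(x)
b = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x ** 2) - np.sum(x) ** 2 / n)
a = np.mean(y) - b * np.mean(x)
print(a, b)  # intercept and slope
```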
The next step is to determine the values for:
Σi=1..n xi,  Σi=1..n xi²,  Σi=1..n yi,  and  Σi=1..n xiyi
Note that these quantities depend only on
observed data and can be found with simple
arithmetic operations or automatically using
Excel’s predefined functions.
Using Excel, click on Tools – Data Analysis …
and, in the resulting dialog, choose Regression.
In the Regression dialog, enter the Y-range and
X-range.
Choose to place the output in a new worksheet
called Results.
Select Residual Plots and Normal Probability Plots
to be created along with the output.
Click OK to produce the following results:
Note that a (Intercept) and b (X Variable 1) are
reported as 57.104 and 0.92997, respectively.
To add the resulting least squares line, first click
on the worksheet Chart 1 which contains the
original scatter plot.
Next, click on the data series so that they are
highlighted and then choose Add Trendline …
from the Chart pull-down menu.
Choose Linear Trend in the resulting dialog and
click OK.
A linear trend is fit to the data:
[Figure: scatter diagram of Sales/hour ($) versus Cars/hour with the fitted linear trendline, Linear (Series1)]
One of the other summary output values that is
given in Excel is: R Square = 69.4%
This is a “goodness of fit” measure which
represents the R2 statistic discussed in
introductory statistics classes.
R2 ranges in value from 0 to 1 and gives an
indication of how much of the total variation in Y
from its mean is explained by the new trend line.
In fact, there are three different sums of squares:
TSS (Total Sum of Squares)
ESS (Error Sum of Squares)
RSS (Regression Sum of Squares)
The basic relationship between them is:
TSS = ESS + RSS
They are defined as follows:
TSS = Σi=1..n (Yi – Ȳ)²
ESS = Σi=1..n (Yi – Ŷi)²
RSS = Σi=1..n (Ŷi – Ȳ)²
Essentially, the ESS is the amount of variation
that can’t be explained by the regression.
The RSS quantity is effectively the amount of the
original, total variation (TSS) that could be
removed using the regression line.
R² is defined as:
R² = RSS / TSS
If the regression line fits perfectly, then ESS = 0
and RSS = TSS, resulting in R2 = 1.
In this example, R2 = .694 which means that
approximately 70% of the variation in the Y
values is explained by the one explanatory
variable (X), cars per hour.
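As a sketch, the relationship TSS = ESS + RSS makes R² easy to compute from data and fitted values (Python, illustrative only):

```python
import numpy as np

# R-squared from the sums of squares defined above; relies on TSS = ESS + RSS.
def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    tss = np.sum((y - y.mean()) ** 2)  # total variation about the mean
    ess = np.sum((y - y_hat) ** 2)     # variation left unexplained
    rss = tss - ess                    # variation removed by the regression
    return rss / tss
```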
Now, returning to the original question: Should
we build a station at Buffalo Grove where traffic
is 183 cars/hour?
The best guess at what the corresponding sales
volume would be is found by placing this X value
into the new regression equation:
ŷ = a + b·x
Sales/hour = 57.104 + 0.92997 × (183 cars/hour)
= $227.29
However, it would be nice to be able to state a
95% confidence interval around this best
guess.
We can get the information to do this from Excel’s
Summary Output.
Excel reports that the standard error (Se) is 44.18.
This quantity represents the amount of scatter in
the actual data around the regression line.
The formula for Se is:
Se = √[ Σi=1..n (Yi – Ŷi)² / (n – k – 1) ]
where n is the number of data points (here, 5)
and k is the number of independent variables
(here, 1).
This equation is equivalent to:
Se = √[ ESS / (n – k – 1) ]
Once we know Se, then, based on the normal
distribution, we can state that
• We have 68% confidence that the actual
value of sales/hour is within ± 1 Se of the
predicted value ($227.29).
• We have 95% confidence that the actual
value of sales/hour is within ± 2 Se of the
predicted value ($227.29).
The 95% confidence interval is:
[227.29 – 2(44.18); 227.29 + 2(44.18)]
[$138.93; $315.65]
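As a quick check of this arithmetic (a sketch using only the values reported above):

```python
# Quick check of the forecast and interval, using only the values reported above.
a, b, se = 57.104, 0.92997, 44.18  # intercept, slope, standard error from Excel
x = 183                            # cars/hour at the proposed Buffalo Grove site

y_hat = a + b * x
low, high = y_hat - 2 * se, y_hat + 2 * se
print(f"forecast = {y_hat:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
# forecast = 227.29, 95% interval = [138.93, 315.65]
```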
Another value of interest in the Summary report
is the t-statistic for the X variable and its
associated values.
The t-statistic is 2.61 and the P-value is 0.0798.
A P-value less than 0.05 indicates that we have
at least 95% confidence that the slope parameter
(b) is statistically significantly different from 0
(zero). Here, the P-value of 0.0798 exceeds 0.05,
so that threshold is not met.
A slope of 0 results in a flat trend line and
indicates no relationship between Y and X.
The 95% confidence limit for b is [-0.205; 2.064]
Thus, we can’t exclude the possibility that the
true value of b might be 0.
Also given in the Summary report is the
F-significance. Since there is only one
independent variable, the F-significance is
identical to the P-value for the t-statistic.
In the case of more than one X variable, the
F-significance tests the hypothesis that all the X
variable parameters as a group are statistically
significantly different from zero.
Concerning multiple regression models, as you
add other X variables, the R2 statistic will always
increase, meaning the RSS has increased.
In this case, the Adjusted R² statistic is a more
reliable indicator of the true goodness of fit
because it compensates for the automatic
reduction in the ESS that comes from adding more
independent variables.
Thus, the Adjusted R² may decrease even though
R² has increased, unless the improvement in RSS
more than compensates for the degrees of freedom
lost by adding the new independent variables.
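The slides do not state the formula; for reference, the standard definition is:
Adjusted R² = 1 – (1 – R²)(n – 1) / (n – k – 1)
where n is the number of observations and k is the number of independent variables.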
Fitting a Quadratic Function
The method of least squares can be used with any
number of independent variables and with any
functional form (not just linear).
Suppose that we wish to fit a quadratic function
of the form
y = a0 + a1x + a2x²
to the previous data with the method of least
squares.
The goal is to select a0, a1, and a2 in order to
minimize the sum of squared deviations, which is
now
Σi=1..5 (yi – [a0 + a1xi + a2xi²])²
Proceed by setting the partial derivatives with
respect to a0 , a1 , and a2 equal to zero. This gives
the equations
5a0 + (Σxi)a1 + (Σxi²)a2 = Σyi
(Σxi)a0 + (Σxi²)a1 + (Σxi³)a2 = Σxiyi
(Σxi²)a0 + (Σxi³)a1 + (Σxi⁴)a2 = Σxi²yi
This is a simple set of three linear equations in
three unknowns.
Thus, the general name for this least squares
curve fitting is “Linear Regression.”
The term linear comes from the fact that
simultaneous linear equations are being solved.
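Before turning to Solver, here is a sketch of solving these three normal equations directly (Python with NumPy; the data values are placeholders, since the slides do not reproduce the station data):

```python
import numpy as np

# Sketch: build and solve the three normal equations above for a0, a1, a2.
# The data values are placeholders; the slides do not reproduce the station data.
x = np.array([150.0, 55.0, 220.0, 130.0, 300.0])
y = np.array([220.0, 110.0, 250.0, 155.0, 260.0])

A = np.array([
    [len(x),         x.sum(),        (x ** 2).sum()],
    [x.sum(),        (x ** 2).sum(), (x ** 3).sum()],
    [(x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum()],
])
rhs = np.array([y.sum(), (x * y).sum(), (x ** 2 * y).sum()])
a0, a1, a2 = np.linalg.solve(A, rhs)  # coefficients of a0 + a1*x + a2*x^2
print(a0, a1, a2)
```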
Solver will be used to find the coefficients in
Excel. Consider the following worksheet:
=$B$2+$B$3*B7+$B$4*B7^2   (predicted value)
=C7 – D7   (deviation: actual – predicted)
=E7^2   (squared deviation)
=SUM(F7:F11)   (sum of squared deviations)
Now, to find the optimal values for the
parameters (a0 , a1 , and a2) using Solver, first
click on Tools – Solver.
In the resulting Solver Parameter dialog, specify
the following settings:
Click Solve to solve the unconstrained, nonlinear
optimization model.
In this model, the objective function is to
minimize the sum of squared errors.
Here are the Solver results.
The parameter values are:
=$B$2+$B$3*B7+$B$4*B7^2
=C7 – D7
=E7^2
=SUM(F7:F11)
=SUMXMY2(C7:C11,D7:D11)
This formula calculates the sum of squared errors
directly.
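For readers outside Excel, here is a sketch of the same unconstrained minimization that Solver performs, using scipy.optimize.minimize; the data values are again placeholders:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of what Solver is doing here: unconstrained minimization of the sum
# of squared errors over (a0, a1, a2). Data values are placeholders.
x = np.array([150.0, 55.0, 220.0, 130.0, 300.0])
y = np.array([220.0, 110.0, 250.0, 155.0, 260.0])

def sse(params):
    a0, a1, a2 = params
    pred = a0 + a1 * x + a2 * x ** 2  # same role as the worksheet's predictions
    return np.sum((y - pred) ** 2)    # same quantity SUMXMY2 computes

result = minimize(sse, x0=np.zeros(3))  # start from a0 = a1 = a2 = 0
print(result.x, result.fun)             # fitted parameters and minimal SSE
```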
Use Excel’s Chart Wizard to plot the original data
as a scatter diagram: first, highlight the original
range of data, then click on the Chart Wizard
button. Then specify a quadratic function via the
Chart – Add Trendline … option.
[Figure: scatter diagram of Sales/hour versus Cars/hour with the fitted quadratic trendline, Poly. (Series1)]
Comparing the Linear and Quadratic Fits
In the method of least squares, the sum of the
squared deviations was selected as the measure of
“goodness of fit.”
Thus, the linear and quadratic fits can be
compared with this criterion.
In order to make this comparison, go back to the
linear regression “Results” spreadsheet and make
the corresponding calculation in the original
“Data” spreadsheet.
= C2 – D2   (deviation: actual – predicted)
= E2^2   (squared deviation)
= SUM(F2:F6)   (sum of squared deviations)
Note that the sum of the squared deviations for
the quadratic function is indeed smaller than that
for the linear function (i.e., 4954 < 5854.7).
Indeed, the quadratic gives roughly a 15%
decrease in the sum of squared deviations.
A linear function is a special type of quadratic
function in which a2 = 0.
It follows then: the best quadratic function must
be at least as good as the best linear function.
WHICH CURVE TO FIT?
If a quadratic function is at least as good as a
linear function, why not choose a more general
form, thereby getting an even better fit?
In practice, functions of the form (with only a
single independent variable for illustrative
purposes) are often suggested:
y = a0 + a1x + a2x² + … + anxⁿ
Such a function is called a polynomial of degree n,
and it represents a broad and flexible class of
functions.
n = 2: quadratic
n = 3: cubic
n = 4: quartic
…
One must proceed with caution when fitting data
with a polynomial function.
For example, it is possible to find a (k – 1)-degree
polynomial that will perfectly fit k data points.
To be more specific, suppose we have seven
historical observations, denoted
(xi , yi), i = 1, 2, …, 7
It is possible to find a sixth-degree polynomial
y = a0 + a1x + a2x² + … + a6x⁶
that exactly passes through each of these seven
data points.
A perfect fit gives zero for the sum of squared
deviations.
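A short sketch illustrates this (Python; the seven observations are invented):

```python
import numpy as np

# Sketch: a degree-6 polynomial passes exactly through 7 points.
# The seven observations are invented for illustration.
x = np.arange(1.0, 8.0)
y = np.array([3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0])

coeffs = np.polyfit(x, y, deg=6)      # 7 coefficients, 7 points: an exact fit
residuals = y - np.polyval(coeffs, x)
print(np.sum(residuals ** 2))         # essentially 0 (up to round-off)
```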
However, this is deceptive, for it does not imply
much about the predictive value of the model for
use in future forecasting.
Despite the perfect fit of the polynomial function,
the forecast is very inaccurate. The linear fit
might provide more realistic forecasts.
Also, note that the polynomial fit has hazardous
extrapolation properties (i.e., the polynomial
“blows up” at its extremes).
One way of finding which fit is truly “better” is to
use a different standard of comparison, the
“mean squared error” or MSE.
MSE = (sum of squared errors) / (# of points – # of parameters)
For the linear fit, the number of parameters
estimated is 2 (a, b):
MSE = 5854 / (5 – 2) = 1951.3
For the quadratic fit, the number of parameters
estimated is 3 (a0, a1, a2):
MSE = 4954 / (5 – 3) = 2477.0
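The same comparison in a couple of lines (using the sums of squared deviations from the slides):

```python
# The same comparison, using the sums of squared deviations from the slides.
sse_linear, sse_quadratic = 5854.0, 4954.0
mse_linear = sse_linear / (5 - 2)        # 2 parameters estimated: a, b
mse_quadratic = sse_quadratic / (5 - 3)  # 3 parameters estimated: a0, a1, a2
print(mse_linear, mse_quadratic)         # about 1951.3 and 2477.0
```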
So, the MSE gets worse in this case even though
the sum of squared deviations will always be less
than or the same for a higher-order fit.
When there is a perfect fit, both the sum of
squared deviations and the MSE will be 0.00.
Because of this, most forecasting programs will
fit only up through a cubic polynomial, since
higher degrees don’t reflect the general trend of
actual data.
What is a Good Fit?
A good historical fit may have poor predictive
power. So what is a good fit?
It depends on whether one has some idea about
the underlying real-world process that relates the
y’s and x’s.
To be an effective forecasting device, the
forecasting function must to some extent capture
important features of that process.
The more one knows, the better one can do.
However, knowledge of the underlying process is
typically phrased in statistical language.
For example, linear curve fitting, in the statistical
context, is called linear regression.
If the statistical assumptions about the linear
regression model are precisely satisfied (e.g.,
errors are normally distributed around the
regression line), then, in a precise and
well-defined sense, statisticians can prove that
the linear fit is the “best possible fit.”
In the real world, one can never be completely
certain about the underlying process.
The question then becomes: How much
confidence can we have that the underlying
process is one that satisfies a particular set of
statistical assumptions?
Fortunately, statistical analysis can reveal how
well the historical data do indeed satisfy those
assumptions.
And if they do not satisfy the assumptions, then
try a different model.
Remember, there is an underlying real-world
problem and the model is a selective
representation of that problem.
How good is that model? Ideally, to test the
goodness of a model, one would like to have
considerable experience with its use.
If, in repeated use, it is observed that the model
performs well, then our confidence is high.
However, what confidence can we have at the
outset, without experience?
Validating Models
One benchmark is to ask the question: Suppose
the model had been used to make past decisions;
how well would the firm have fared?
This approach “creates” experience by simulating
the past.
This is often referred to as validation of the
model.
Typically, one uses only a portion of the historical
data to create the model – for example, to fit a
polynomial of a specified degree.
One can then use the remaining data to see how
well the model would have performed.
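A minimal sketch of this validation idea (Python; the data and the 60/40 split are assumptions for illustration):

```python
import numpy as np

# Sketch of the validation idea: fit on part of the history, score the rest.
# The data and the 60/40 split are assumptions for illustration.
x = np.array([150.0, 55.0, 220.0, 130.0, 300.0, 90.0, 260.0, 180.0])
y = np.array([220.0, 110.0, 250.0, 155.0, 260.0, 130.0, 240.0, 210.0])

split = int(0.6 * len(x))                        # fit on the first 60% of the data
b, a = np.polyfit(x[:split], y[:split], deg=1)   # polyfit returns [slope, intercept]
holdout_errors = y[split:] - (a + b * x[split:])
print(np.mean(holdout_errors ** 2))              # out-of-sample mean squared error
```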
End of Part 1
Please continue to Part 2