FORECASTING
Quantitative Models: Causal Models

Forecasting Models
Forecasting
  Quantitative: Causal Model; Time Series (Stationary, Trend, Trend + Seasonality)
  Qualitative: Expert Judgment; Delphi Method; Grassroots; Market Research; Jury of Executive Opinion

Quantitative Forecasts

Causal Forecasting Models
In a causal forecasting model, the forecast for the quantity of interest "rides piggyback" on another quantity or set of quantities. In other words, our knowledge of the value of one variable (or perhaps several variables) enables us to forecast the value of another variable. In this model, let y denote the true value of some variable of interest and let ŷ (or y′) denote a predicted or forecast value for that variable.

One commonly used approach in creating a causal forecasting model is called curve fitting.

Curve Fitting Example: AN OIL COMPANY EXPANSION
Consider an oil company that is planning to expand its network of modern self-service gasoline stations. The company plans to use traffic flow (measured in the average number of cars per hour) to forecast sales (measured in average dollar sales per hour). The firm has had five stations in operation for more than a year and has used historical data to calculate the following averages:

[Figure: scatter plot of Sales/hour ($) against Cars/hour for the five stations]

Now, these data will be used to construct a function that forecasts sales at any proposed location: measure the traffic flow at that location and plug its value into the constructed function.

Least Squares Method
Least Squares Fits
The method of least squares is a formal procedure for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a straight line or quadratic curve).
2. Within the set of functions specified in step 1, choose the specific function that minimizes the sum of the squared deviations between the data points and the function values.

To demonstrate the process, consider the sales-traffic flow example.
1. Assume a straight line; that is, functions of the form y = a + bx.
2. Draw the line in the scatter diagram and indicate the deviations between observed points and the function as di. For example,
   d1 = y1 − [a + bx1] = 220 − [a + 150b]
   where
   y1 = actual sales/hr at location 1
   x1 = actual traffic flow at location 1
   a = y-axis intercept for the function
   b = slope for the function

[Figure: scatter plot of Sales/hour ($) against Cars/hour showing the line y = a + bx and the vertical deviations d1 through d5]

The value d1² is one measure of how close the value of the function [a + bx1] is to the observed value, y1; that is, it indicates how well the function fits at this one point.

One measure of how well the function fits overall is the sum of the squared deviations:
\[ \sum_{i=1}^{5} d_i^2 \]
Consider a general model with n as opposed to five observations. Since each di = yi − (a + bxi), the sum of the squared deviations can be written as:
\[ \sum_{i=1}^{n} \left( y_i - [a + b x_i] \right)^2 \]
Using the method of least squares, select a and b so as to minimize the sum in the equation above.

Now, take the partial derivative of the sum with respect to a and set the resulting expression equal to zero:
\[ \sum_{i=1}^{n} -2 \left( y_i - [a + b x_i] \right) = 0 \]
A second equation is derived by following the same procedure with b:
\[ \sum_{i=1}^{n} -2 x_i \left( y_i - [a + b x_i] \right) = 0 \]
Recall that the values for xi and yi are the observations, and our goal is to find the values of a and b that satisfy these two equations.
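The two derivative conditions can be checked numerically. The following is a minimal Python sketch, not part of the course's Excel workflow, and it rests on an assumption: the original data table did not survive, so the five (cars/hour, sales/hour) pairs below are a reconstruction. Only the first pair, (150, 220), appears in the text; the other four were chosen because together they reproduce the coefficients Excel reports later in this section (a = 57.104, b = 0.92997). The helper names deviations, sum_sq_dev and first_order_conditions are hypothetical. At the reported coefficients both conditions hold up to rounding, and nudging a or b in any direction increases the sum of squared deviations.

```python
# Minimal sketch of the least squares criterion and its two first-order conditions.
# ASSUMPTION: reconstructed data. Only (150, 220) is quoted in the text; the other
# four pairs were chosen because they reproduce the reported a and b.
xs = [150, 55, 220, 130, 95]    # x_i: cars/hour at each station
ys = [220, 75, 250, 145, 200]   # y_i: sales/hour ($) at each station


def deviations(a, b):
    """Return d_i = y_i - (a + b * x_i) for every observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_sq_dev(a, b):
    """Least squares objective: the sum of the squared deviations d_i."""
    return sum(d * d for d in deviations(a, b))


def first_order_conditions(a, b):
    """Both derivative equations divided by -2: returns (sum d_i, sum x_i * d_i).
    Each is zero at the least squares solution."""
    d = deviations(a, b)
    return sum(d), sum(x * di for x, di in zip(xs, d))


a, b = 57.104, 0.92997               # coefficients reported by Excel in this section
print(first_order_conditions(a, b))  # both near 0; nonzero only because a, b are rounded
print(sum_sq_dev(a, b))

# Moving away from (a, b) in any direction makes the fit worse.
for da, db in [(5, 0), (-5, 0), (0, 0.05), (0, -0.05)]:
    assert sum_sq_dev(a + da, b + db) > sum_sq_dev(a, b)
```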
The solution is:
\[ b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}, \qquad a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \, \frac{1}{n} \sum_{i=1}^{n} x_i \]
The next step is to determine the values for:
\[ \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i^2, \qquad \sum_{i=1}^{n} y_i, \qquad \sum_{i=1}^{n} x_i y_i \]
Note that these quantities depend only on observed data and can be found with simple arithmetic operations or automatically using Excel's predefined functions.

Using Excel, click on Tools – Data Analysis … In the resulting dialog, choose Regression.

In the Regression dialog, enter the Y-range and X-range. Choose to place the output in a new worksheet called Results. Select Residual Plots and Normal Probability Plots to be created along with the output.

Click OK to produce the regression output. Note that a (Intercept) and b (X Variable 1) are reported as 57.104 and 0.92997, respectively.

To add the resulting least squares line, first click on the worksheet Chart 1, which contains the original scatter plot. Next, click on the data series so that it is highlighted, and then choose Add Trendline … from the Chart pull-down menu.

Choose Linear Trend in the resulting dialog and click OK.

A linear trend is fit to the data:
[Figure: scatter plot of Sales/hour ($) against Cars/hour with the data series (Series1) and its fitted line (Linear (Series1))]

One of the other summary output values that Excel reports is:
R Square = 69.4%
This is a "goodness of fit" measure, the R² statistic discussed in introductory statistics classes. R² ranges in value from 0 to 1 and indicates how much of the total variation of Y around its mean is explained by the new trend line.
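As a check on the closed-form formulas for a and b, the minimal Python sketch below computes the four sums and then applies the formulas directly, which is what Excel's Regression tool does for you. It assumes reconstructed data: only the first station, (150 cars/hour, $220/hour), is quoted in the text, and the other four pairs are hypothetical values chosen because they reproduce the reported Intercept and X Variable 1.

```python
# Closed-form least squares coefficients computed from the four sums.
# ASSUMPTION: reconstructed five-station data; only (150, 220) appears in the text.
xs = [150, 55, 220, 130, 95]    # cars/hour
ys = [220, 75, 250, 145, 200]   # sales/hour ($)

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# b = [sum(x*y) - (1/n) sum(x) sum(y)] / [sum(x^2) - (1/n) (sum x)^2]
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# a = (1/n) sum(y) - b (1/n) sum(x)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_x2, sum_y, sum_xy)   # 650 99850 890 129975
print(f"a = {a:.3f}, b = {b:.5f}")    # a = 57.104, b = 0.92997
```

Only these four sums (and n) enter the formulas, which is why the raw observations can be discarded once the sums are known.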
In fact, there are three different sums of squares:
• TSS (Total Sum of Squares)
• ESS (Error Sum of Squares)
• RSS (Regression Sum of Squares)

The basic relationship between them is:
TSS = ESS + RSS
They are defined as follows:
\[ \mathrm{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad \mathrm{ESS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad \mathrm{RSS} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
Essentially, the ESS is the amount of variation that can't be explained by the regression. The RSS is effectively the amount of the original, total variation (TSS) that could be removed using the regression line.

R² is defined as:
\[ R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} \]
If the regression line fits perfectly, then ESS = 0 and RSS = TSS, resulting in R² = 1. In this example, R² = 0.694, which means that approximately 70% of the variation in the Y values is explained by the one explanatory variable (X), cars per hour.

Now, returning to the original question: Should we build a station at Buffalo Grove, where traffic is 183 cars/hour? The best guess at the corresponding sales volume is found by placing this X value into the new regression equation:
ŷ = a + b · x
Sales/hour = 57.104 + 0.92997 × (183 cars/hour) = $227.29
However, it would be nice to be able to state a 95% confidence interval around this best guess.

We can get the information to do this from Excel's Summary Output. Excel reports that the standard error (Se) is 44.18. This quantity represents the amount of scatter in the actual data around the regression line. The formula for Se is:
\[ S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}} \]
where n is the number of data points (here, 5) and k is the number of independent variables (here, 1).

This equation is equivalent to:
\[ S_e = \sqrt{\frac{\mathrm{ESS}}{n - k - 1}} \]
Once we know Se, and based on the normal distribution, we can state approximately that
• We have 68% confidence that the actual value of sales/hour is within ±1 Se of the predicted value ($227.29).
• We have 95% confidence that the actual value of sales/hour is within ±2 Se of the predicted value ($227.29).
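The sums of squares, R², Se and the Buffalo Grove forecast can all be reproduced from the raw observations. The minimal Python sketch below does so under one assumption: the five data pairs are a reconstruction (only (150, 220) is quoted in the text), chosen because they match the reported a, b, R² and Se. It follows the ±2 Se rule stated above rather than a t-based prediction interval.

```python
import math

# ASSUMPTION: reconstructed five-station data; only (150, 220) appears in the text.
xs = [150, 55, 220, 130, 95]      # X: cars/hour
ys = [220, 75, 250, 145, 200]     # Y: sales/hour ($)
n, k = len(xs), 1                 # 5 observations, 1 independent variable

# Least squares line in deviation form (equivalent to the closed-form solution).
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]  # Y-hat_i

tss = sum((y - y_bar) ** 2 for y in ys)                # total variation
ess = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))  # unexplained (error)
rss = sum((yh - y_bar) ** 2 for yh in fitted)          # explained by the regression
r2 = rss / tss
se = math.sqrt(ess / (n - k - 1))                      # standard error of the estimate

forecast = a + b * 183            # Buffalo Grove: 183 cars/hour
low, high = forecast - 2 * se, forecast + 2 * se       # approx. 95% band (normal rule)

print(f"TSS={tss:.1f} ESS={ess:.1f} RSS={rss:.1f} R2={r2:.3f}")
print(f"Se={se:.2f} forecast=${forecast:.2f} interval=[${low:.2f}, ${high:.2f}]")
```

The text rounds the forecast to $227.29 and Se to 44.18 before doubling, which gives [$138.93; $315.65]; carrying full precision moves each endpoint by about a cent.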
The 95% confidence interval is:
[227.29 − 2(44.18); 227.29 + 2(44.18)] = [$138.93; $315.65]

Another value of interest in the Summary report is the t-statistic for the X variable and its associated values. The t-statistic is 2.61 and the P-value is 0.0798. A P-value less than 0.05 indicates, at the 95% confidence level, that the slope parameter (b) is statistically significantly different from 0 (zero). A slope of 0 results in a flat trend line and indicates no relationship between Y and X. Here the P-value exceeds 0.05, and the 95% confidence limits for b are [−0.205; 2.064]. Since this interval contains 0, we can't exclude the possibility that the true value of b might be 0.

Also given in the Summary report is the F-significance. Since there is only one independent variable, the F-significance is identical to the P-value for the t-statistic. In the case of more than one X variable, the F-significance tests the hypothesis that all the X-variable coefficients are zero; a small value indicates that, as a group, they are statistically significantly different from zero.

Concerning multiple regression models: as you add other X variables, the R² statistic never decreases (it almost always increases), because the RSS cannot decrease. For this reason, the Adjusted R² statistic is a more reliable indicator of the true goodness of fit: it compensates for the reduction in the ESS that comes simply from adding more independent variables. Thus, the Adjusted R² may decrease even though R² has increased, unless the improvement in RSS more than compensates for the addition of the new independent variables.

WHICH CURVE TO FIT?
If, for example, a quadratic function fits better than a linear function, why not choose a more general form, thereby getting an even better fit? In practice, functions of the following form (with only a single independent variable, for illustrative purposes) are often suggested:
\[ y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n \]
Such a function is called a polynomial of degree n, and it represents a broad and flexible class of functions.
n = 2: quadratic
n = 3: cubic
n = 4: quartic
…

One must proceed with caution when fitting data with a polynomial function. For example, it is possible to find a (k − 1)-degree polynomial that will perfectly fit k data points (provided the xi values are distinct). To be more specific, suppose we have seven historical observations, denoted (xi, yi), i = 1, 2, …, 7. It is possible to find a sixth-degree polynomial
\[ y = a_0 + a_1 x + a_2 x^2 + \dots + a_6 x^6 \]
that passes exactly through each of these seven data points.

A perfect fit gives zero for the sum of squared deviations. However, this is deceptive, for it says little about the predictive value of the model for future forecasting.

Despite the perfect fit of the polynomial function, the forecast is very inaccurate. The linear fit might provide more realistic forecasts. Also, note that the polynomial fit has hazardous extrapolation properties (i.e., the polynomial "blows up" at its extremes).

Reliability and Validity
• Does the model make intuitive sense? Is the model easy to understand and interpret?
• Are the coefficients statistically significant (p-values less than 0.05)?
• Are the signs associated with the coefficients as expected?
• Does the model predict values that are reasonably close to the actual values?
• Is the model sufficiently sound (high R², low standard error, etc.)?

Correlation Coefficient and Coefficient of Determination
\[ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right] \left[ n \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}} \]
Correlation coefficient = r.
Coefficient of determination = r².
where:
Yi = dependent variable
Xi = independent variable
n = number of observations

Summary: Causal Forecasting Models
The goal of a causal forecasting model is to develop the best statistical relationship between a dependent variable and one or more independent variables. The most common modeling approach used in practice is regression analysis. Only linear regression models are examined in this course.
In causal forecasting models, when one tries to predict a dependent variable using a single independent variable, it is called a simple regression model. When one uses more than one independent variable to forecast the dependent variable, it is called a multiple regression model.
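For a simple regression model such as the oil company example, the correlation coefficient formula given earlier ties back to the Excel regression output: r² equals R², and the slope's t-statistic can be written as t = r√(n − 2) / √(1 − r²), a standard identity for simple linear regression. The minimal Python sketch below checks both. It assumes reconstructed data: only the station with 150 cars/hour and $220/hour is quoted in the text; the other four pairs are hypothetical values chosen to be consistent with the reported a, b and R².

```python
import math

# ASSUMPTION: reconstructed five-station data; only (150, 220) appears in the text.
xs = [150, 55, 220, 130, 95]    # X_i: cars/hour (independent variable)
ys = [220, 75, 250, 145, 200]   # Y_i: sales/hour in $ (dependent variable)
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# r = [n*sum(XY) - sum(X)*sum(Y)] / sqrt([n*sum(X^2) - (sum X)^2][n*sum(Y^2) - (sum Y)^2])
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r2 = r ** 2   # coefficient of determination; equals R^2 in a simple regression

# Simple-regression identity: the slope's t-statistic expressed through r.
t_slope = r * math.sqrt(n - 2) / math.sqrt(1 - r2)

print(f"r = {r:.4f}, r^2 = {r2:.3f}, t = {t_slope:.2f}")  # r = 0.8330, r^2 = 0.694, t = 2.61
```

With these data r² matches the 69.4% R Square and t matches the 2.61 reported in the Summary Output. In a multiple regression model, r between Y and any single X no longer equals √R², so this shortcut applies only to the simple case.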