Demand Forecasting II Causal Analysis Chris Caplice ESD.260/15.770/1.260 Logistics Systems Sept 2006 Agenda Forecasting Evaluation Use of Causal Models in Forecasting Approach and Methods Ordinary Least Squares (OLS) Regression Other Approaches Closing Comments on Forecasting MIT Center for Transportation & Logistics – ESD.260 2 © Chris Caplice, MIT Forecast Evaluation How do we determine what is a good forecast? Accuracy - Closeness to actual observations Bias - Persistent tendency to over or under predict Fit versus Forecast – Tradeoff between accuracy to past forecast to usefulness of predictability Forecast Optimality – Error is equal to the random noise distribution Combination of art and science Statistically – find a valid model Art – find a model that makes sense MIT Center for Transportation & Logistics – ESD.260 3 © Chris Caplice, MIT Accuracy and Bias Measures 1. Forecast Error: et = xt - ̂xt n MD = 2. Mean Deviation: ∑e t t =1 n n 3. Mean Absolute Deviation MAD = ∑e t t =1 n n 4. Mean Squared Error: MSE = n 5. Root Mean Squared Error: t =1 n n 2 t n 2 t 6. Mean Percent Error: MIT Center for Transportation & Logistics – ESD.260 t =1 ∑e RMSE = 7. Mean Absolute Percent Error: MAPE = ∑e n et ∑ D MPE = t =1 t n et ∑D t =1 t n 4 © Chris Caplice, MIT Moving Average Forecasts 116.00 115.00 114.00 113.00 MD MAD MSE RMSE MAPE 112.00 111.00 110.00 MA3 0.05 0.56 0.47 0.68 0.50% MA10 0.21 1.07 1.67 1.29 0.96% MA20 0.35 1.41 2.71 1.65 1.27% 109.00 108.00 - 20.00 MA3 40.00 MA10 60.00 MA20 MIT Center for Transportation & Logistics – ESD.260 80.00 100.00 120.00 ActDemand 5 © Chris Caplice, MIT Analysis of the Forecast 25 Are the forecast errors ~N(0,Var(e))? Frequency 10 What is the expected value of the 5 errors? What is the variance of the errors? .1 9) 0. 12 0. 42 0. 72 1. 03 1. 33 M or e (0 .4 9) .1 0) .7 9) (0 (1 .4 0) 0 (0 For Moving Averages: 15 (1 20 From actual observations, error Are the observed errors ~N(0,Var(e))? For the MA3 data Errors 2.00 μe = 0.05 σe= 0.69 σD= 1.478 1.50 1.00 0.50 Testing for Normalcy – Chi-Square, Kolmogorov-Smirnov, or other tests (0.50) - 20 40 60 80 100 120 (1.00) (1.50) (2.00) MIT Center for Transportation & Logistics – ESD.260 6 © Chris Caplice, MIT Corrective Actions to Forecasts Measures of Bias Cumulative Sum of Errors (Ct) Normalize by dividing by RMSE (Ut) Ut should ~0 if unbiased Smoothed Error Tracking Signal (Tt) Tt=zt/MADt Where zt= ωet + (1-ω)zt-1 (smoothing constant) Autocorrelation of forecast Errors Correlation between successive observations Corrective Actions Adaptive Forecasting Methods where the smoothing coefficients change over time Found (generally) to be no better than standard methods Human Intervention Overrule the model’s output – look for reason Rules of thumb: |Tt|>f or |Ct|>k(RMSE) (f~0.4 and k~4) Lower values (of k or f) lead to more intervention MIT Center for Transportation & Logistics – ESD.260 7 © Chris Caplice, MIT Causal Forecasting Models Assumes that demand is highly correlated with some environmental factors Model is built to relate the independent exogenous factors to the demand Examples: Diapers ~ f(birth rates lagged by 1 year) NFL Jerseys ~ f(team and individual performance) New products ~ f(product lifecycle) Promotional Items ~ f(marketing promotions & ads) Regional Sales ~ f(household demographics in area) Umbrellas / Fuel ~ f(weather, temperature, rain, etc.) Form of Dependent Variable dictates the method used Continuous – takes any value Discrete – takes only integer values Binary – is equal to 0 or 1 MIT Center for Transportation & Logistics – ESD.260 8 © Chris Caplice, MIT OLS Linear Regression The relationship is described in terms of linear model The data (xi,yi) are the observed pairs from which we try to estimate the β coefficients to find the ‘best fit’ The error term, ε, is the ‘unaccounted’ or ‘unexplained’ portion The error terms are assumed to be iid ~N(0,σ) and catch all of the factors ignored or neglected in the model E (Y | x) = β 0 + β1 x yi = β 0 + β1 xi Yi = β 0 + β1 xi + ε i Observed for i = 1, 2,...n StdDev(Y | x) = σ Unknown MIT Center for Transportation & Logistics – ESD.260 9 © Chris Caplice, MIT OLS Linear Regression Residuals Predicted or estimated values are found by using the regression coefficients, b. Residuals, ei, are the difference of actual – predicted values Find the b’s that “minimize the residuals” yˆi = b0 + b1 xi for i = 1, 2,...n ei = yi − yˆi = yi − b0 + b1 xi for i = 1, 2,...n How should I measure the residuals? Min sum of errors - shows bias, but not accurate Min sum of absolute error - accurate & shows bias, but intractable Min sum of squares of error – shows bias & is accurate 2 e ( ∑ i =1 i ) = ∑ i =1 ( yi − yˆi ) = ∑ i=1 ( yi − b0 − b1 xi ) n n 2 n 2 The best model minimizes the residual sum of squares MIT Center for Transportation & Logistics – ESD.260 10 © Chris Caplice, MIT OLS Linear Regression We can find the optimal values of b0 and b1 by taking first order conditions of the SSE: ∑ ( e ) = ∑ ( y − yˆ ) = ∑ ( y − b n i =1 2 n 2 i =1 i i i n i =1 i 0 − b1 xi ) 2 This gives us the following coefficients: b0 = y − b1 x b1 ∑ = n i =1 ( xi − x )( yi − y ) 2 ( x − x ) ∑ i =1 i MIT Center for Transportation & Logistics – ESD.260 n 11 © Chris Caplice, MIT OLS Linear Regression Expansion to multiple variables is straightforward So, for k variables we need to find k regression coefficients Yi = β 0 + β1 x1i + ... + β k xki + ε i for i = 1, 2,...n E (Y | x1 , x2 ,..., xk ) = β 0 + β1 x1 + β 2 x2 + ... + β k xk StdDev(Y | x1 , x2 ,..., xk ) = σ ∑ ( e ) = ∑ ( y − yˆ ) = ∑ ( y − b n i =1 2 i 2 n i =1 i MIT Center for Transportation & Logistics – ESD.260 n i =1 i 12 i 0 − b1 x1i − ... − bk xki ) 2 © Chris Caplice, MIT OLS Example 4,500 4,000 3,500 3,000 2,500 MIT Center for Transportation & Logistics – ESD.260 Ju l ay M ar M Ja n No v Se p Ju l ay M ar 2,000 M Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Demand 3,025 3,047 3,079 3,136 3,454 3,661 3,554 3,692 3,407 3,410 3,499 3,598 3,596 3,721 3,745 3,650 4,157 4,221 4,238 4,008 Ja n Month What do you see? 13 © Chris Caplice, MIT OLS Example Month Establish relationship Fi = f(X1i, X2i, …Xni) =β0+β1X1i+β2X2i+…+βnXni Fi = Level + Trend + Season = β0 + β1 X1i + β2 X2i Where X2i = 1 if a summer month, = 0 o.w. Points to consider: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Demand Period 3,025 1 3,047 2 3,079 3 3,136 4 3,454 5 3,661 6 3,554 7 3,692 8 3,407 9 3,410 10 3,499 11 3,598 12 3,596 13 3,721 14 3,745 15 3,650 16 4,157 17 4,221 18 4,238 19 4,008 20 Summer 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 What if the trend is not linear? How do I handle seasonality if it impacts the trend? How does OLS treat old versus new data? How much information do I need to keep on hand? MIT Center for Transportation & Logistics – ESD.260 14 © Chris Caplice, MIT OLS Example (Excel) 4,300 4,100 3,900 3,700 Actual Predicted 3,500 SUMMARY OUTPUT 3,300 0.979 0.958 0.953 79.21 20 3,100 Aug Jul Jun May Apr Mar Feb Jan Dec Nov Oct Sep Jul Aug Jun May Apr Mar Feb 2,900 Jan Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations ANOVA df Regression Residual Total Intercept Period Summer 2 17 19 Coefficients 2,969.14 48.03 303.51 SS MS F Significance F 2442766.966 1221383.483 194.6730408 1.91955E-12 106658.4214 6274.024786 2549425.387 Standard Error 37.21 3.20 37.70 t Stat 79.79 15.00 8.05 P-value 0.0000 0.0000 0.0000 Lower 95% Upper 95% Lower 95.0% Upper 95.0% 2,890.62 3,047.65 2,890.62 3,047.65 41.27 54.79 41.27 54.79 223.97 383.04 223.97 383.04 Fi = 2969 + 48 (Period) + 304 (Summer_Flag) MIT Center for Transportation & Logistics – ESD.260 15 © Chris Caplice, MIT OLS Example (Excel) Coefficient of Determination R2= 1-ESS/TSS=RSS/TSS SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations Standard Error (estimate of σ around the regression line) 0.979 0.958 0.953 79.21 20 Sum of the Squares Regression (RSS) =∑(ŷ-̅y)2 Error (ESS) =∑(y-ŷ)2 Total (TSS) =∑(y-̅y)2 ANOVA df Regression Residual Total Intercept Period Summer Regression Coefficients 2 17 19 SS MS F Significance F 2442766.966 1221383.483 194.6730408 1.91955E-12 106658.4214 6274.024786 2549425.387 Coefficients Standard Error 2,969.14 37.21 48.03 3.20 303.51 37.70 Degrees of Freedom = n-k-1 MIT Center for Transportation & Logistics – ESD.260 t Stat 79.79 15.00 8.05 P-value 0.0000 0.0000 0.0000 Lower 95% Upper 95% 2,890.62 3,047.65 41.27 54.79 223.97 383.04 Std Error of Regression Coeff (sbm) 16 95% Confidence Intervals Lower 95.0% Upper 95.0% 2,890.62 41.27 223.97 3,047.65 54.79 383.04 t-Statistic (bm/sbm) Is bm different from 0? P-value tells you % conf. © Chris Caplice, MIT Coefficient of Determination (R2) Measures Goodness of Fit of the model Captures the amount of variation that the model ‘explains’ R2=1-ESS/TSS = RSS/TSS TSS = ESS + RSS Variation of observed around mean = Variation of observed around estimated – Variation of estimated around the mean Generally, a higher R2 is better, but . . . Model needs to make sense High R2 does not indicate causality It really depends on how the model is being used as to what is ‘good enough’ The individual coefficients need to be tested MIT Center for Transportation & Logistics – ESD.260 17 © Chris Caplice, MIT Discrete Choice Models What if you are predicting demand for one product over another? Model Selections (Blue vs. Red Cars) Mode Forecasting (pick one of many) OLS => y = b0 +b1 X Y 1 Yi = 1 + e− β X i Y 1 1 0 0 X MIT Center for Transportation & Logistics – ESD.260 18 X © Chris Caplice, MIT Sales Forecasting Methods Sales Forecast Errors (MAPE) by forecast horizons in years Expert Opinions 44.8% 37.3% 14.9% Sales Force Executives Industry Surveys Level Statistical Models 30.6% 20.9% 11.2% 6.0% 3.7% Naïve Model Moving Average Exp. Smoothing Regression Box-Jenkins <.25 yrs ≤2 yrs >2 yrs Industry 8 11 15 Corporate 7 11 18 Product Group 10 15 20 Product Line 11 16 20 Product 16 21 26 Source: Mentzer & Cox (1984) Source: Dalrymple (1987) Survey 134 companies MIT Center for Transportation & Logistics – ESD.260 19 © Chris Caplice, MIT Misc. Forecasting Issues Data Issues Sales data is not demand data Transactions can aggregate and skew actual demand Ordering quantities can dictate sourcing Historical data might not exist Demand visibility can be skewed by level of echelon Bullwhip effect Collaborative Planning, Forecasting, and Replenishment (CPFR) Forecasting vs. Inventory Management Statistical Validity vs. Use and Cost of Model Demand is not always exogenous MIT Center for Transportation & Logistics – ESD.260 20 © Chris Caplice, MIT Questions, Comments, Suggestions?