Finance 30210: Managerial Economics
Demand Estimation and Forecasting

What are the odds that a fair coin flip results in a head? What are the odds that the toss of a fair die results in a 5? What are the odds that tomorrow's temperature is 95 degrees? The answers to all of these questions come from a probability distribution.

A fair coin puts probability 1/2 on heads and 1/2 on tails; a fair die puts probability 1/6 on each of the faces 1 through 6. A probability distribution is a collection of probabilities describing the odds of any particular event.

The distribution for temperature in South Bend is a bit more complicated because there are so many possible outcomes, but the concept is the same. We generally assume a normal distribution, which can be characterized by a mean (average) and a standard deviation (a measure of dispersion).

Without some math we can't find the probability of a specific outcome, but we can easily divide up the distribution: about 34% of the probability lies within one standard deviation on each side of the mean, 13.5% lies between one and two standard deviations out, and 2.5% lies beyond two standard deviations in each tail.

Annual temperature in South Bend has a mean of 59 degrees and a standard deviation of 18 degrees. A temperature of 95 degrees is two standard deviations to the right (59 + 2x18 = 95), so there is a 2.5% chance the temperature is 95 or greater (and a 97.5% chance it is cooler than 95).

Can't we do a little better than this? Conditional distributions give us probabilities conditional on some observable information. The temperature in South Bend conditional on the month being July has a mean of 84 with a standard deviation of 7. Now 95 degrees falls a little more than one standard deviation above the mean, so there is roughly a 16% chance (using the one-standard-deviation rule of thumb) that the temperature is 95 or greater. Conditioning on the month gives us more accurate probabilities!
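The rule-of-thumb tail areas above can also be computed exactly; here is a minimal sketch using only the standard library (the means and standard deviations are the ones quoted in the text):

```python
import math

def prob_at_least(x, mean, sd):
    """P(X >= x) when X is normal with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Unconditional: annual South Bend temperature ~ N(59, 18^2)
print(round(prob_at_least(95, 59, 18), 3))   # 0.023, i.e. about a 2.5% chance

# Conditional on July: temperature ~ N(84, 7^2)
print(round(prob_at_least(95, 84, 7), 3))    # 0.058; the one-SD rule of thumb (~16%) is rougher here
```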
We know that there should be a "true" probability distribution that governs the outcome of a coin toss (assuming a fair coin): Pr(Heads) = Pr(Tails) = .5. Suppose that we were to flip a coin over and over again and, after each flip, calculate the percentage of heads and tails. As the number of flips grows, the sample statistic (# of heads / total flips) converges to the true probability, .5. That is, if we collect "enough" data, we can eventually learn the truth!

We can follow the same process for the temperature in South Bend: Temperature ~ N(mu, sigma^2). We could find this distribution by collecting temperature data for South Bend and computing

  Sample mean (average):  x-bar = (1/N) * sum of x_i
  Sample variance:        s^2 = (1/(N-1)) * sum of (x_i - x-bar)^2

Note: the standard deviation is the square root of the variance.

Some useful properties of probability distributions:

Probability distributions are scalable. If x ~ N(mu, sigma^2) and y = kx, then y ~ N(k*mu, k^2*sigma^2). For example, if y = 3x and x has mean 1 and variance 4 (standard deviation 2), then y has mean 3 and variance 36 (= 3*3*4), so its standard deviation is 6.

Probability distributions are additive. If x ~ N(mu_x, sigma_x^2) and y ~ N(mu_y, sigma_y^2), then x + y ~ N(mu_x + mu_y, sigma_x^2 + sigma_y^2 + 2cov(x,y)). For example, if x has mean 1 and variance 1, y has mean 2 and variance 9, and cov(x,y) = 2, then x + y has mean 3 and variance 14 (= 1 + 9 + 2*2), so its standard deviation is about 3.7.

Suppose we know that the value of a car is determined by its age: Value = $20,000 - $1,000 * Age. If Age has mean 8 and variance 4 (standard deviation 2), then Value has mean $12,000 and variance 4,000,000 (= 1,000^2 * 4), i.e. a standard deviation of $2,000.

We could also use this to forecast. How much should a six-year-old car be worth? Value = $20,000 - $1,000 * 6 = $14,000. Note: there is NO uncertainty in this prediction, because the relationship is known exactly.

Searching for the truth... You believe that there is a relationship between age and value, but you don't know what it is.
1. Collect data on values and ages.
2. Estimate the relationship between them.
Note that while the true distribution of age is N(8, 4), our collected sample will not be exactly N(8, 4). This sampling error will create errors in our estimates!
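A quick simulation sketch of the scaling and additivity properties (the means, variances, and covariance mirror the examples above; the random-sample construction itself is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Scaling: if x ~ N(1, 4) and y = 3x, then y ~ N(3, 36)
x = rng.normal(1, 2, n)     # mean 1, std. dev. 2 (variance 4)
y = 3 * x
print(round(y.mean(), 2), round(y.var(), 1))    # close to 3 and 36

# Additivity: var(x + y) = var(x) + var(y) + 2*cov(x, y)
a = rng.normal(1, 1, n)                       # mean 1, variance 1
b = 2 * a + rng.normal(0, np.sqrt(5), n)      # mean 2, variance 4 + 5 = 9, cov(a, b) = 2
s = a + b
print(round(s.mean(), 2), round(s.var(), 1))  # close to 3 and 14 (= 1 + 9 + 2*2)
```

With a million draws the sample statistics land within a few hundredths of the theoretical values.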
(Scatter plot of value, from $0 to $18,000, against age, from 0 to 14 years, with the fitted line: intercept a, slope b.)

  Value = a + b * Age + error,   error ~ N(0, sigma^2)

We want to choose 'a' and 'b' to minimize the error!

Regression Results
Variable    Coefficient   Standard Error   t Stat
Intercept   12,354        653              18.9
Age         -854          80               -10.60

We have our estimate of "the truth": Value = $12,354 - $854 * Age + error. The intercept (a) has mean $12,354 and standard deviation $653; the age coefficient (b) has mean -$854 and standard deviation $80. T-stats bigger than 2 in absolute value are considered statistically significant!

Regression Statistics
R Squared        0.36    (percentage of the variance in value explained by age)
Standard Error   2,250   (the error term has mean 0 and standard deviation $2,250)

We can now forecast the value of a six-year-old car:

  Value = $12,354 - $854 * 6 = $7,230

The forecast standard deviation comes from

  StdDev = sqrt( Var(a) + X^2 * Var(b) + 2X * Cov(a,b) + Var(error) ),   with Cov(a,b) = -Xbar * Var(b)

(Recall, the average car age in the sample is Xbar = 8 years.) Plugging in X = 6:

  StdDev = sqrt( 653^2 + 6^2 * 80^2 + 2*6*(-8 * 80^2) + 2250^2 ) = $2,259

Around the forecast we can draw +95% and -95% forecast interval bands. Note that your forecast error will always be smallest at the sample mean (age = 8)! Also, your forecast gets worse at an increasing rate as you depart from the mean.

What are the odds that Pat Buchanan received 3,407 votes from Palm Beach County in 2000? The strategy: estimate a relationship for Pat Buchanan's votes using every county EXCEPT Palm Beach,

  B = F(D)   (Pat Buchanan's votes "are a function of" observable demographics)

and then, using Palm Beach data, forecast Pat Buchanan's vote total for Palm Beach: B_PB = F(D_PB).

The data: demographic data by county.

County    Black (%)   Age 65 (%)   Hispanic (%)   College (%)   Income (000s)   Buchanan Votes   Total Votes
Alachua   21.8        9.4          4.7            34.6          26.5            262              84,966
Baker     16.8        7.7          1.5            5.7           27.6            73               8,128

What variables do you think should affect Pat Buchanan's vote total?
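To make the estimation step concrete, here is a sketch that fits Value = a + b*Age by least squares. The sample itself is simulated (the car-age draws are made up); only the "true" relationship and the rough error spread come from the example above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample: ages and noise are invented; only the "true"
# relationship (Value = 20,000 - 1,000*Age) and the error spread
# (std. dev. around $2,250) come from the lecture example.
age = rng.uniform(0, 16, 200)
value = 20_000 - 1_000 * age + rng.normal(0, 2_250, 200)

# Estimate Value = a + b*Age by least squares
b, a = np.polyfit(age, value, 1)
print(round(a), round(b))   # close to 20,000 and -1,000, but not exact:
                            # sampling error creates errors in our estimates

# Forecast the value of a six-year-old car
print(round(a + b * 6))     # roughly 14,000
```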
Start with a linear specification:

  V = a + b*C

where V is the # of Buchanan votes, C is the % of the county that is college educated, and b is the # of votes gained or lost for each percentage-point increase in the college-educated population.

Results
Parameter   Value    Standard Error   T-Statistic
a           5.35     58.5             .09
b           14.95    3.84             3.89

R-Square = .19: 19% of the variation in Buchanan's votes across counties is explained by college education. The distribution for 'b' has a mean of 15 and a standard deviation of 4, so there is a 95% chance that the value of 'b' lies between 7 and 23. Each percentage-point increase in the college-educated share (i.e., from 10% to 11%) raises Buchanan's vote total by about 15 votes.

V = 5.35 + 14.95C. Plug in values for College % to get vote predictions:

County    College (%)   Predicted Votes   Actual Votes   Error
Alachua   34.6          522               262            260
Baker     5.7           90                73             17

Let's try something a little different... take logs of the vote totals:

County    College (%)   Buchanan Votes   Log of Buchanan Votes
Alachua   34.6          262              5.57
Baker     5.7           73               4.29

  LN(V) = a + b*C

Now b measures the percentage increase or decrease in votes for each percentage-point increase in the college-educated population.

Results
Parameter   Value   Standard Error   T-Statistic
a           3.45    .27              12.6
b           .09     .02              5.4

R-Square = .31: 31% of the variation in Buchanan's votes across counties is explained by college education. The distribution for 'b' has a mean of .09 and a standard deviation of .02, so there is a 95% chance that 'b' lies between .05 and .13. Each percentage-point increase in the college-educated share (i.e., from 10% to 11%) raises Buchanan's vote total by about 9% (the coefficient .09 is in log points, i.e. decimal form).

LN(V) = 3.45 + .09C, and V = e^LN(V). Plug in values for College % to get vote predictions:

County    College (%)   Predicted Votes   Actual Votes   Error
Alachua   34.6          902               262            640
Baker     5.7           55                73             -18

How about this... put the log on the college share instead:

County    College (%)   Buchanan Votes   Log of College (%)
Alachua   34.6          262              3.54
Baker     5.7           73               1.74

  V = a + b*LN(C)

Now b measures the gain or loss in votes for each percentage increase in the college-educated population.

Results
Parameter   Value   Standard Error   T-Statistic
a           -424    139              -3.05
b           252     54               4.6

R-Square = .25: 25% of the variation in Buchanan's votes across counties is explained by college education. The distribution for 'b' has a mean of 252 and a standard deviation of 54, so there is a 95% chance that 'b' lies between 144 and 360. Each 1% increase in the college-educated share (i.e., from 30% to 30.3%) raises Buchanan's vote total by about 2.5 votes (252 * .01).

V = -424 + 252*LN(C). Plug in values for College % to get vote predictions:

County    College (%)   Predicted Votes   Actual Votes   Error
Alachua   34.6          469               262            207
Baker     5.7           15                73             -58

One more... logs on both sides:

County    College (%)   Buchanan Votes   Log of College (%)   Log of Buchanan Votes
Alachua   34.6          262              3.54                 5.57
Baker     5.7           73               1.74                 4.29

  LN(V) = a + b*LN(C)

Now b is an elasticity: the percentage gain or loss in votes for each percentage increase in the college-educated population.

Results
Parameter   Value   Standard Error   T-Statistic
a           .71     .63              1.13
b           1.61    .24              6.53

R-Square = .40: 40% of the variation in Buchanan's votes across counties is explained by college education. The distribution for 'b' has a mean of 1.61 and a standard deviation of .24, so there is a 95% chance that 'b' lies between 1.13 and 2.09. Each 1% increase in the college-educated share (i.e., from 30% to 30.3%) raises Buchanan's vote total by about 1.61%.

LN(V) = .71 + 1.61*LN(C), and V = e^LN(V). Plug in values for College % to get vote predictions:

County    College (%)   Predicted Votes   Actual Votes   Error
Alachua   34.6          624               262            362
Baker     5.7           34                73             -39

It turns out the regression with the best fit looks like this:

  LN(P) = a1 + a2*B + a3*A65 + a4*H + a5*C + a6*I + error

where P = (Buchanan votes / total votes) * 100 and a1 through a6 are the parameters to be estimated.

The Results:

Variable        Coefficient   Standard Error   t-statistic
Intercept       2.146         .396             5.48
Black (%)       -.0132        .0057            -2.88
Age 65 (%)      -.0415        .0057            -5.93
Hispanic (%)    -.0349        .0050            -6.08
College (%)     -.0193        .0068            -1.99
Income (000s)   -.0658        .00113           -4.58

R Squared = .73

  LN(P) = 2.146 - .0132B - .0415A65 - .0349H - .0193C - .0658I

Now we can make a forecast!

County    Predicted Votes   Actual Votes   Error
Alachua   520               262            258
Baker     55                73             -18

For Palm Beach:

County       Black (%)   Age 65 (%)   Hispanic (%)   College (%)   Income (000s)   Buchanan Votes   Total Votes
Palm Beach   21.8        23.6         9.8            22.1          33.5            3,407            431,621

  LN(P) = 2.146 - .0132(21.8) - .0415(23.6) - .0349(9.8) - .0193(22.1) - .0658(33.5) = -2.004

so P = e^(-2.004), about .134%, and .00134 * 431,621 gives roughly 578. This would be our prediction for Pat Buchanan's vote total!
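The last conversion step (log of the share, to share, to votes) is easy to script. A minimal sketch, taking the fitted value of -2.004 as given; note that carrying full precision through the exponentiation gives about 582 votes, while the 578 above comes from first rounding the share to .00134:

```python
import math

ln_p = -2.004            # fitted log of Buchanan's Palm Beach vote share, in percent
total_votes = 431_621    # total votes cast in Palm Beach County

share_pct = math.exp(ln_p)                # vote share, in percent
votes = share_pct / 100 * total_votes     # predicted Buchanan votes
print(round(share_pct, 3), round(votes))  # 0.135 582
```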
We know that the log of Buchanan's vote percentage is distributed normally with a mean of -2.004 and a standard deviation of .2556. A 95% range in logs is

  -2.004 - 2(.2556) = -2.5152   to   -2.004 + 2(.2556) = -1.4928

There is a 95% chance that the log of Buchanan's vote percentage lies in this range. Next, let's convert the logs to vote percentages:

  e^(-2.5152) = .08%   and   e^(-1.4928) = .22%

There is a 95% chance that Buchanan's vote percentage lies in this range. Finally, we can convert (the unrounded) percentages to actual votes:

  .000808 * 431,621 = 348   and   .002247 * 431,621 = 970

There is a 95% chance that Buchanan's total vote lies in this range. The actual total of 3,407 votes turns out to be about 7 standard deviations away from our forecast!!!

Now, on to demand. We know that the quantity of some good or service demanded should be related to some basic variables:

  Q_D = D(P, I, ...)   (quantity demanded "is a function of" price, income, and other demand shifters)

Cross-sectional estimation holds the time period constant and estimates the variation in demand resulting from variation in the demand factors. For example: can we estimate demand for Pepsi in South Bend by looking at selected statistics for South Bend? Suppose that we have the following data for sales in 200 different Indiana cities:

City        Price   Average Income (000s)   Competitor's Price   Advertising (000s)   Total Sales
Granger     1.02    21.934                  1.48                 2.367                9,809
Mishawaka   2.56    35.796                  2.53                 26.922               130,835

Let's begin by estimating a basic demand curve, with quantity demanded a linear function of price:

  Q = a0 + a1*P

where a1 is the change in quantity demanded per $1 change in price (to be estimated).

Regression Results
Variable    Coefficient   Standard Error   t Stat
Intercept   155,042       18,133           8.55
Price       -46,087       7,214            -6.39

Regression Statistics
R Squared        .17
Standard Error   48,074

That is, we have estimated Q = 155,042 - 46,087P: every dollar increase in price lowers sales by 46,087 units.
Values for South Bend: the price of Pepsi is $1.37, so

  Q = 155,042 - 46,087(1.37) = 91,903

The implied price elasticity at this point is a1 * P/Q = -46,087 * (1.37/91,903), about -.68.

As we did earlier, we can experiment with different functional forms by using logs. Adding logs changes the interpretation of the coefficients. First, put the log on price:

  Q = a0 + a1*LN(P)

where a1 governs the change in quantity demanded per percentage change in price (to be estimated).

Regression Results
Variable       Coefficient   Standard Error   t Stat
Intercept      133,133       14,892           8.93
Log of Price   -103,973      16,407           -6.33

Regression Statistics
R Squared        .17
Standard Error   48,140

That is, Q = 133,133 - 103,973*LN(P): every 1% increase in price lowers sales by about 1,040 units (103,973 * .01).

Values for South Bend: the price of Pepsi is $1.37, so the log of price is .31 and

  Q = 133,133 - 103,973(.31) = 100,402

The implied price elasticity is a1/Q = -103,973/100,402, about -1.04.

Next, put the log on quantity instead:

  LN(Q) = a0 + a1*P

where a1 governs the percentage change in quantity demanded per $ change in price (to be estimated).

Regression Results
Variable    Coefficient   Standard Error   t Stat
Intercept   13            .34              38.1
Price       -1.22         .13              -8.98

Regression Statistics
R Squared        .28
Standard Error   .90

That is, LN(Q) = 13 - 1.22P: every $.01 increase in price lowers sales by about 1.22% (the coefficient -1.22 is a percentage change per dollar, in decimal form).
Values for South Bend: the price of Pepsi is $1.37, so

  LN(Q) = 13 - 1.22(1.37) = 11.33,   Q = e^11.33 = 83,283

We can now use this estimated demand curve along with price in South Bend to estimate demand in South Bend. The implied price elasticity is a1 * P = -1.22 * 1.37, about -1.67.

Finally, put logs on both sides:

  LN(Q) = a0 + a1*LN(P)

where a1 is the percentage change in quantity demanded per percentage change in price (to be estimated).

Regression Results
Variable       Coefficient   Standard Error   t Stat
Intercept      12.3          .28              42.9
Log of Price   -2.60         .31              -8.21

Regression Statistics
R Squared        .25
Standard Error   .93

That is, LN(Q) = 12 - 2.6*LN(P) (intercept rounded): every 1% increase in price lowers sales by 2.6%. With the price of Pepsi at $1.37 (log of price .31),

  LN(Q) = 12 - 2.6(.31) = 11.19,   Q = e^11.19 = 72,402

In the log-log form the price elasticity is simply the coefficient itself: -2.6 at every price.

We can add as many variables as we want in whatever combination; the goal is to look for the best fit.

  LN(Q) = a0 + a1*P + a2*LN(I) + a3*LN(Pc)

a1: % change in sales per $ change in price; a2: % change in sales per % change in income; a3: % change in sales per % change in competitor's price.

Regression Results
Variable                    Coefficient   Standard Error   t Stat
Intercept                   5.98          1.29             4.63
Price                       -1.29         .12              -10.79
Log of Income               1.46          .34              4.29
Log of Competitor's Price   2.00          .34              5.80

R Squared: .46

Values for South Bend: price of Pepsi $1.37, log of income 3.81, log of competitor's price .80. Now we can make a prediction and calculate elasticities:

  LN(Q) = 5.98 - 1.29(1.37) + 1.46(3.81) + 2.00(.80) = 11.36,   Q = e^11.36 = 87,142

Price elasticity: -1.29 * 1.37 = -1.76. Income elasticity: 1.46. Cross-price elasticity: 2.00.

We could use a cross-sectional regression to forecast quantity demanded out into the future, but it would take a lot of information about the demand factors!
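The four fitted demand curves and their implied price elasticities at P = $1.37 can be collected in one sketch (coefficients as reported above; small differences from the slides' figures are rounding):

```python
import math

p = 1.37  # Pepsi price in South Bend

# Linear: Q = 155,042 - 46,087*P          -> elasticity = b * P/Q
q1 = 155_042 - 46_087 * p
e1 = -46_087 * p / q1

# Semi-log: Q = 133,133 - 103,973*ln(P)   -> elasticity = b / Q
q2 = 133_133 - 103_973 * math.log(p)
e2 = -103_973 / q2

# Log-linear: ln(Q) = 13 - 1.22*P         -> elasticity = b * P
q3 = math.exp(13 - 1.22 * p)
e3 = -1.22 * p

# Log-log: ln(Q) = 12.3 - 2.60*ln(P)      -> elasticity = b (constant)
e4 = -2.60

print(round(q1), round(e1, 2), round(e2, 2), round(e3, 2), e4)
```

Note how the elasticity formula changes with the functional form: it depends on Q in the first two cases, on P in the third, and is constant in the log-log case.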
The cross-sectional approach, in other words: estimate a demand curve using data at some point in time t, then use the estimated demand curve and forecasts of the demand factors to forecast quantity demanded at t+1.

Time-series estimation, by contrast, ignores the demand factors and estimates the variation in demand over time. For example: can we predict demand for Pepsi in South Bend next year by looking at how demand varies across time? Essentially, we want to separate demand changes into various frequencies:

Trend: long-term movements in demand (i.e., demand for movie tickets grows by an average of 6% per year).

Business cycle: movements in demand related to the state of the economy (i.e., demand for movie tickets grows by more than 6% during economic expansions and less than 6% during recessions).

Seasonal: movements in demand related to the time of year (i.e., demand for movie tickets is highest in the summer and around Christmas).

Suppose that you work for a local power company. You have been asked to forecast energy demand for the upcoming year, and you have data over the previous 4 years:

Time Period   Quantity (millions of kilowatt hours)
2003:1        11
2003:2        15
2003:3        12
2003:4        14
2004:1        12
2004:2        17
2004:3        13
2004:4        16
2005:1        14
2005:2        18
2005:3        15
2005:4        17
2006:1        15
2006:2        20
2006:3        16
2006:4        19

First, let's plot the data... what do you see?
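A quick least-squares trend fit to the table above (a sketch; t = 1 is taken to be 2003:1, which matches the fitted values used below):

```python
import numpy as np

# Quarterly electricity demand (millions of kWh), 2003:1 through 2006:4
y = np.array([11, 15, 12, 14, 12, 17, 13, 16,
              14, 18, 15, 17, 15, 20, 16, 19], dtype=float)
t = np.arange(1, 17)              # t = 1 is 2003:1

b, a = np.polyfit(t, y, 1)        # fit y = a + b*t
print(round(a, 1), round(b, 3))   # 11.9 0.394

print(round(a + b * 8, 2))        # forecast at the mean time period t = 8: 15.05
print(round(a + b * 76, 2))       # far-out forecast at t = 76: 41.85
```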
This data seems to have a linear trend. A linear trend takes the following form:

  x_t = x0 + b*t

where x0 is the estimated value at time zero, b is the estimated quarterly growth (in millions of kilowatt hours), and x_t is the forecasted value at time t (time periods are quarters, with t = 1 at 2003:1).

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    11.9          .953             12.5
Time Trend   .394          .099             4.00

Regression Statistics
R Squared        .53
Standard Error   1.82
Observations     16

  x_t = 11.9 + .394t

Let's forecast electricity usage at the mean time period (t = 8):

  x_8 = 11.9 + .394(8) = 15.05,   Var(x_8) = 3.50

A plot of the regression line with its error bands again shows that the forecast error is lowest at the mean time period. We can use this linear trend model to predict as far out as we want, but note that the error involved gets worse! For example, at t = 76:

  x_76 = 11.9 + .394(76) = 41.85,   Var(x_76) = 47.7

Let's take another look at the data... there seems to be a regular pattern, with a spike every second quarter. There appears to be a seasonal cycle.

One seasonal adjustment process is to adjust each quarter by the average ratio of actual to predicted values. For each observation, calculate the ratio of actual to predicted; average the ratios by quarter; then use the average ratio for that quarter to adjust each predicted value. The average ratios here are Q1 = .87, Q2 = 1.16, Q3 = .91, Q4 = 1.04.

Time Period   Actual   Predicted   Ratio   Adjusted
2003:1        11       12.29       .89     12.29(.87) = 10.90
2003:2        15       12.68       1.18    12.68(1.16) = 14.77
2003:3        12       13.08       .91     13.08(.91) = 11.86
2003:4        14       13.47       1.03    13.47(1.04) = 14.04
2004:1        12       13.87       .87     13.87(.87) = 12.30
2004:2        17       14.26       1.19    14.26(1.16) = 16.61
2004:3        13       14.66       .88     14.66(.91) = 13.29
2004:4        16       15.05       1.06    15.05(1.04) = 15.68
2005:1        14       15.44       .91     15.44(.87) = 13.70
2005:2        18       15.84       1.14    15.84(1.16) = 18.45
2005:3        15       16.23       .92     16.23(.91) = 14.72
2005:4        17       16.63       1.02    16.63(1.04) = 17.33
2006:1        15       17.02       .88     17.02(.87) = 15.10
2006:2        20       17.41       1.14    17.41(1.16) = 20.28
2006:3        16       17.81       .89     17.81(.91) = 16.15
2006:4        19       18.20       1.04    18.20(1.04) = 18.96

With the seasonal adjustment, we don't have any statistics to judge goodness of fit. One method of evaluating a forecast is to calculate the root mean squared error:

  RMSE = sqrt( sum of (A_t - F_t)^2 / n )

where A_t is the actual value, F_t the forecast, and n the number of observations.

Time Period   Actual   Adjusted   Error
2003:1        11       10.90      -0.10
2003:2        15       14.77      -0.23
2003:3        12       11.86      -0.14
2003:4        14       14.04      0.04
2004:1        12       12.30      0.30
2004:2        17       16.61      -0.39
2004:3        13       13.29      0.29
2004:4        16       15.68      -0.32
2005:1        14       13.70      -0.30
2005:2        18       18.45      0.45
2005:3        15       14.72      -0.28
2005:4        17       17.33      0.33
2006:1        15       15.10      0.10
2006:2        20       20.28      0.28
2006:3        16       16.15      0.15
2006:4        19       18.96      -0.04

RMSE = .26. Looks pretty good!

Recall our prediction for period 76 (year 2022, Q4). Applying the Q4 seasonal ratio:

  x_76 = [11.9 + .394(76)] * 1.04 = 41.85 * 1.04 = 43.52

We could also account for seasonal variation by using dummy variables:

  x_t = x0 + b0*t + b1*D1 + b2*D2 + b3*D3,   where Di = 1 if the observation is from quarter i, else 0

Note: we only need three quarter dummies. If the observation is from quarter 4, then D1 = D2 = D3 = 0 and the equation reduces to x_t = x0 + b0*t.

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    12.75         .226             56.38
Time Trend   .375          .0168            22.2
D1           -2.375        .219             -10.83
D2           1.75          .215             8.1
D3           -2.125        .213             -9.93

Regression Statistics
R Squared        .99
Standard Error   .30
Observations     16

Note the much better fit!!
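The dummy-variable regression can be reproduced with an ordinary least-squares solve (a sketch; quarter 4 is the omitted base category, as in the text):

```python
import numpy as np

# Quarterly electricity demand, 2003:1 through 2006:4 (t = 1..16)
y = np.array([11, 15, 12, 14, 12, 17, 13, 16,
              14, 18, 15, 17, 15, 20, 16, 19], dtype=float)
t = np.arange(1, 17)
q = (t - 1) % 4 + 1               # quarter of each observation

# Design matrix: intercept, trend, and dummies for quarters 1-3 (Q4 is the base)
X = np.column_stack([np.ones(16), t, q == 1, q == 2, q == 3]).astype(float)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))   # [12.75, 0.375, -2.375, 1.75, -2.125]
```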
  x_t = 12.75 + .375t - 2.375D1 + 1.75D2 - 2.125D3

Time Period   Actual   Ratio Method   Dummy Variables
2003:1        11       10.90          10.75
2003:2        15       14.77          15.25
2003:3        12       11.86          11.75
2003:4        14       14.04          14.25
2004:1        12       12.30          12.25
2004:2        17       16.61          16.75
2004:3        13       13.29          13.25
2004:4        16       15.68          15.75
2005:1        14       13.70          13.75
2005:2        18       18.45          18.25
2005:3        15       14.72          14.75
2005:4        17       17.33          17.25
2006:1        15       15.10          15.25
2006:2        20       20.28          19.75
2006:3        16       16.15          16.25
2006:4        19       18.96          18.75

Ratio method RMSE = .26; dummy variables RMSE = .25. A plot confirms the similarity of the two methods. Recall our prediction for period 76 (year 2022, Q4; D1 = D2 = D3 = 0):

  x_76 = 12.75 + .375(76) = 41.25

Recall, our trend line took the form x_t = x0 + b*t, where b measures the quarterly change in electricity demand in millions of kilowatt hours. Often it is more realistic to assume that demand grows by a constant percentage rather than a constant quantity. For example, if we knew that electricity demand grew by G% per quarter, then our forecasting equation would take the form

  x_t = x0 * (1 + G/100)^t = x0 * (1 + g)^t   (note: the growth rate g is in decimal form)

If we wish to estimate this equation, we have a little work to do. If we convert our data to natural logs, we get the following linear relationship that can be estimated:

  ln(x_t) = ln(x0) + t * ln(1 + g)

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    2.49          .063             39.6
Time Trend   .026          .006             4.06

Regression Statistics
R Squared        .54
Standard Error   .1197
Observations     16

  ln(x_t) = 2.49 + .026t

Let's forecast electricity usage at the mean time period (t = 8):

  ln(x_8) = 2.49 + .026(8) = 2.698,   Var(x_8) = .0152

BE CAREFUL... THESE NUMBERS ARE LOGS!!! The natural log of forecasted demand is 2.698.
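The constant-growth trend can be fit and converted back to levels in a few lines (a sketch on the same 16 quarterly observations, again with t = 1 at 2003:1):

```python
import numpy as np

# Same quarterly electricity data, now fit with a constant-growth trend:
# ln(x_t) = ln(x0) + t*ln(1 + g)
y = np.array([11, 15, 12, 14, 12, 17, 13, 16,
              14, 18, 15, 17, 15, 20, 16, 19], dtype=float)
t = np.arange(1, 17)

slope, intercept = np.polyfit(t, np.log(y), 1)
print(round(intercept, 2), round(slope, 3))     # about 2.49 and 0.026

g = np.exp(slope) - 1                           # implied quarterly growth rate
print(round(g, 3))                              # about 0.027, i.e. 2.7% per quarter

# The forecast at t = 8 comes out in logs; exponentiate to get kilowatt hours
print(round(np.exp(intercept + slope * 8), 2))  # about 14.85
```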
Therefore, to get the actual demand forecast, use the exponential function:

  e^2.698 = 14.85

Likewise with the error bands: a 95% confidence interval is +/- 2 standard deviations in logs,

  2.698 +/- 2*sqrt(.0152)  ->  (2.451, 2.945)  ->  (e^2.451, e^2.945) = (11.60, 19.00)

A plot of these forecasts with their error bands gives RMSE = 1.70. Errors in growth rates compound quickly!! Far out of sample (around period 76), the point forecast is e^4.49 = 89.22, with a +/- 2 SD interval of (35.8, 221.8).

Let's try one. Suppose that we are interested in forecasting gasoline prices, with monthly historical data from April 1993 through June 2010. Does a linear trend (constant cents-per-gallon growth) look reasonable?

Let's suppose we assume a linear trend. Then we are estimating the following linear regression:

  p_t = p0 + b*t

where p_t is the price at time t, p0 is the price at April 1993, t is the number of months from April 1993, and b is the monthly growth in dollars per gallon.

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    .67           .05              12.19
Time Trend   .010          .0004            23.19

R Squared = .72

We can check for the presence of a seasonal cycle by adding seasonal dummy variables, where bi is the dollars-per-gallon impact of quarter i relative to quarter 4:

  p_t = p0 + b0*t + b1*D1 + b2*D2 + b3*D3,   Di = 1 if quarter i, else 0

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    .58           .07              8.28
Time Trend   .01           .0004            23.7
D1           -.03          .075             -.43
D2           .15           .074             2.06
D3           .16           .075             2.20

R Squared = .74

If we wanted to remove the seasonal component, we could do so by subtracting the seasonal dummy coefficient off each gas price:

Date      Price   Quarter       Regression Coefficient   Seasonalized Data
1993-04   1.05    2nd quarter   .15                      .90
1993-07   1.06    3rd quarter   .16                      .90
1993-10   1.06    4th quarter   0                        1.06
1994-01   .98     1st quarter   -.03                     1.01
1994-04   1.00    2nd quarter   .15                      .85

Note: once the seasonal component has been removed, all that should be left is trend, cycle, and noise.
We could check this. Regressing the seasonalized price series on a trend alone,

  p~_t = p0 + b*t

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    .587          .05              11.06
Time Trend   .010          .0004            23.92

and regressing it on the trend plus the quarter dummies,

  p~_t = p0 + b0*t + b1*D1 + b2*D2 + b3*D3

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    .587          .07              8.28
Time Trend   .010          .0004            23.7
D1           0             .075             0
D2           0             .074             0
D3           0             .075             0

the dummies come out at zero: the seasonal component really is gone.

The regression we have in place gives us the trend plus the seasonal component of the data:

  predicted p_t = .58 + .01t - .03D1 + .15D2 + .16D3   (predicted = trend + seasonal)

If we subtract our predicted price (from the regression) from the actual price, we will have isolated the business cycle and noise:

Date      Actual Price   Predicted Price (from regression)   Business Cycle Component
1993-04   1.050          .752                                .297
1993-05   1.071          .763                                .308
1993-06   1.075          .773                                .301
1993-07   1.064          .797                                .267
1993-08   1.048          .807                                .240

We can plot this business cycle component and compare it with official business cycle dates. Breaking the data down:

Date      Actual Price   Trend   Seasonal   Business Cycle
1993-04   1.050          .58     .15        .320
1993-05   1.071          .59     .15        .331
1993-06   1.075          .60     .15        .325
1993-07   1.064          .61     .16        .294
1993-08   1.048          .62     .16        .268

Perhaps an exponential trend would work better. An exponential trend would indicate constant percentage growth rather than constant cents-per-gallon growth.
We already know that there is a seasonal component, so we can start with dummy variables. In logs, b0 is the monthly growth rate and bi is the percentage price impact of quarter i relative to quarter 4:

  ln(p_t) = p0 + b0*t + b1*D1 + b2*D2 + b3*D3,   Di = 1 if quarter i, else 0

Regression Results
Variable     Coefficient   Standard Error   t Stat
Intercept    -.14          .03              -4.64
Time Trend   .005          .0001            29.9
D1           -.02          .032             -.59
D2           .06           .032             2.07
D3           .07           .032             2.19

R Squared = .81

If we wanted to remove the seasonal component, we could again subtract the seasonal dummy coefficient off each gas price, but now the price is in logs:

Date      Price   Log of Price   Quarter       Regression Coefficient   Log of Seasonalized Data   Seasonalized Price
1993-04   1.05    .049           2nd quarter   .06                      -.019                      .98
1993-07   1.06    .062           3rd quarter   .07                      -.010                      .99
1993-10   1.06    .062           4th quarter   0                        .062                       1.06
1994-01   .98     -.013          1st quarter   -.02                     .006                       1.00
1994-04   1.00    .005           2nd quarter   .06                      -.062                      .94

Example: e^(-.019) = .98.

The regression we have in place gives us the trend plus the seasonal component of the data:

  predicted ln(p_t) = -.14 + .005t - .02D1 + .06D2 + .07D3   (predicted log of price = trend + seasonal)

If we subtract our predicted price (from the regression) from the actual price, we will have isolated the business cycle and noise. For example, e^(-.069) = .93:

Date      Actual Price   Predicted Log Price (from regression)   Predicted Price   Business Cycle Component
1993-04   1.050          -.069                                   .93               .12
1993-05   1.071          -.063                                   .94               .13
1993-06   1.075          -.057                                   .94               .13
1993-07   1.064          -.047                                   .95               .11
1993-08   1.048          -.041                                   .96               .09

As you can see, the results are very similar to the linear version. In either case, we could make a forecast for gasoline prices next year; let's say April 2011 (time period t = 217, quarter 2):

  p_t = .58 + .01(217) - .03(0) + .15(1) + .16(0) = 2.90

or

  ln(p_t) = -.14 + .005(217) - .02(0) + .06(1) + .07(0) = 1.005,   e^1.005 = 2.73

By the way, the actual price in April 2011 was $3.80.

Consider a new forecasting problem. A company's quarterly market share over the past 12 quarters has been:

Quarter   Market Share
1         20
2         22
3         23
4         24
5         18
6         23
7         19
8         17
9         22
10        23
11        18
12        23
You are asked to forecast the company's market share for the 13th quarter. Plotting the data, there doesn't seem to be any discernible trend here.

Smoothing techniques are often used when data exhibit no trend or seasonal/cyclical component. They are used to filter out short-term noise in the data. A moving average of length N is equal to the average value over the previous N periods:

  MA_N(t) = (1/N) * sum of A_tau for tau = t-N, ..., t-1

Quarter   Market Share   MA(3)   MA(5)
1         20
2         22
3         23
4         24             21.67
5         18             23
6         23             21.67   21.4
7         19             21.67   22
8         17             20      21.4
9         22             19.67   20.2
10        23             19.33   19.8
11        18             20.67   20.8
12        23             21      19.8

The longer the moving average, the smoother the forecasts are. Calculating forecasts for quarter 13 is straightforward:

  MA(3) = (23 + 18 + 23)/3 = 21.33
  MA(5) = (17 + 22 + 23 + 18 + 23)/5 = 20.6

So, how do we choose N?? Compare squared forecast errors:

Quarter   Market Share   MA(3)   Squared Error   MA(5)   Squared Error
1         20
2         22
3         23
4         24             21.67   5.4289
5         18             23      25
6         23             21.67   1.7689          21.4    2.56
7         19             21.67   7.1289          22      9
8         17             20      9               21.4    19.36
9         22             19.67   5.4289          20.2    3.24
10        23             19.33   13.4689         19.8    10.24
11        18             20.67   7.1289          20.8    7.84
12        23             21      4               19.8    10.24

MA(3): total = 78.3534, RMSE = sqrt(78.3534/9) = 2.95
MA(5): total = 62.48,   RMSE = sqrt(62.48/7) = 2.99

Exponential smoothing involves a forecast equation that takes the following form:

  F(t+1) = w*A(t) + (1 - w)*F(t),   w in [0, 1]

where F(t) is the forecast for time t, A(t) is the actual value at time t, and w is the smoothing parameter. Note: when w = 1, your forecast is equal to the previous value. When w = 0, your forecast is a constant.
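The moving-average forecasts and the RMSE comparison above can be sketched in a few lines:

```python
import math

# Quarterly market share from the table above
share = [20, 22, 23, 24, 18, 23, 19, 17, 22, 23, 18, 23]

def ma_forecasts(data, n):
    # One-step-ahead forecasts: average of the previous n observations
    return [sum(data[t - n:t]) / n for t in range(n, len(data))]

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

for n in (3, 5):
    f = ma_forecasts(share, n)
    next_q = sum(share[-n:]) / n          # forecast for quarter 13
    print(n, round(next_q, 2), round(rmse(share[n:], f), 2))
# 3 21.33 2.95
# 5 20.6 2.99
```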
For exponential smoothing, we need to choose a value for the weighting parameter as well as an initial forecast. Usually, the initial forecast is chosen to equal the sample average (here, 21.0).

Quarter   Market Share   w = .3   w = .5
1         20             21.0     21.0
2         22             20.7     20.5
3         23             21.1     21.3
4         24             21.7     22.2
5         18             22.4     23.1
6         23             21.1     20.6
7         19             21.7     21.8
8         17             20.9     20.4
9         22             19.7     18.7
10        23             20.4     20.4
11        18             21.2     21.7
12        23             20.2     19.9

As was mentioned earlier, the smaller w will produce a smoother forecast. Calculating forecasts for quarter 13 is straightforward:

  w = .3:  F(13) = .3(23) + .7(20.2) = 21.04
  w = .5:  F(13) = .5(23) + .5(19.9) = 21.45

So, how do we choose w?? Compare squared forecast errors:

Quarter   Market Share   w = .3   Squared Error   w = .5   Squared Error
1         20             21.0     1               21.0     1
2         22             20.7     1.69            20.5     2.25
3         23             21.1     3.61            21.3     2.89
4         24             21.7     5.29            22.2     3.24
5         18             22.4     19.36           23.1     26.01
6         23             21.1     3.61            20.6     5.76
7         19             21.7     7.29            21.8     7.84
8         17             20.9     15.21           20.4     11.56
9         22             19.7     5.29            18.7     10.89
10        23             20.4     6.76            20.4     6.76
11        18             21.2     10.24           21.7     13.69
12        23             20.2     7.84            19.9     9.61

w = .3: total = 87.19,  RMSE = sqrt(87.19/12) = 2.70
w = .5: total = 101.5,  RMSE = sqrt(101.5/12) = 2.91
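The smoothing recursion and the RMSE comparison can be sketched as follows (forecasts here carry full precision, so the figures differ from the rounded table by a few hundredths):

```python
import math

share = [20, 22, 23, 24, 18, 23, 19, 17, 22, 23, 18, 23]

def smooth(data, w, f0):
    """One-step-ahead exponential smoothing: F(t+1) = w*A(t) + (1-w)*F(t)."""
    forecasts = [f0]
    for a in data:
        forecasts.append(w * a + (1 - w) * forecasts[-1])
    return forecasts        # forecasts[t] predicts data[t]; the last entry is next quarter

f0 = sum(share) / len(share)     # initialize at the sample average (21.0)
for w in (0.3, 0.5):
    f = smooth(share, w, f0)
    sq_errs = [(a - p) ** 2 for a, p in zip(share, f)]
    print(w, round(f[-1], 2), round(math.sqrt(sum(sq_errs) / len(sq_errs)), 2))
```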