1 Econ 240 C Lecture 14 2 Project II I. Work in Groups II. You will be graded based on a PowerPoint presentation and a written report. III. Your report should have an executive summary of one to one and a half pages that summarizes your findings in words for a nontechnical reader. It should explain the problem being examined from an economic perspective, i.e. it should motivate interest in the issue on the part of the reader. Your report should explain how you are investigating the issue, in simple language. It should explain why you are approaching the problem in this particular fashion. Your executive report should explain the economic importance of your findings. The technical details of your findings you can attach as an appendix Technical Appendix 1. Table of Contents 2. Spreadsheet of data used and sources or, if extensive, a subsample of the data 3. Describe the analytical time series techniques you are using 4. Show descriptive statistics and histograms for the variables in the study 5. Use time series data for your project; show a plot of each variable against time 4 5 Group A Group B Group C Tara Copello Zhimin Zhou Andrea Cardani Jonathan Hester Evan Nakano Eric Laschinger Yana Ten Pungdalis Suos Micah Witt Charles Rabkin Will Hippen Thomas Bruister Arnaud Piechaud Kyu-Sang Park Calvin Yeung Andrew Cahill Ashley Hedberg Jesse Smith Darren Doi Sarab Khalsa Jong Duk Woo Group D Group E Carl-Einar Thorner Robert Connor Gleason Antung Anthony Liu Hamid Ghofrani Joonho Shin Ufook Sahilliohlu Jeffrey Ahlvin Russell Ludwick Aren Megerdichian Carrie Koen Anthony Kasza Matthew Stevens 6 Outline Exponential Smoothing Back of the envelope formula: geometric distributed lag: L(t) = a*y(t-1) + (1-a)*L(t-1); F(t) = L(t) ARIMA (p,d,q) = (0,1,1); ∆y(t) = e(t) –(1-a)e(t-1) Error correction: L(t) =L(t-1) + a*e(t) Intervention Analysis 7 Part I: Exponential Smoothing Exponential smoothing is a technique that is useful for forecasting short time series where there may not be enough observations to estimate a Box-Jenkins model Exponential smoothing can be understood from many perspectives; one perspective is a formula that could be calculated by hand 8 Santa Barbara South Coast Median House Price in Nominal Thousands 1000 HSEPRC 800 600 400 200 0 1970 1980 1990 2000 2010 Three Rates of Growth 9 7 LNHSEPRC 6 5 4 3 1970 1980 1990 Y EAR 2000 2010 10 Santa Barbara South Coast House Price, 000 04 $ 1000 HSEPRC04 800 600 400 200 0 1970 1980 1990 YEAR 2000 2010 11 Simple exponential smoothing Simple exponential smoothing, also known as single exponential smoothing, is most appropriate for a time series that is a random walk with first order moving average error structure The levels term, L(t), is a weighted average of the observation lagged one, y(t-1) plus the previous levels, L(t-1): L(t) = a*y(t-1) + (1-a)*L(t-1) 12 Single exponential smoothing The parameter a is chosen to minimize the sum of squared errors where the error is the difference between the observation and the levels term: e(t) = y(t) – L(t) The forecast for period t+1 is given by the formula: L(t+1) = a*y(t) + (1-a)*L(t) Example from John Heinke and Arthur Reitsch, Business Forecasting, 6th Ed. observations Sales 1 500 2 350 3 250 4 400 5 450 6 350 7 200 8 300 9 350 10 200 11 150 12 400 13 550 14 350 15 250 16 550 17 550 18 400 19 350 20 600 21 750 22 500 23 400 24 650 13 14 Single exponential smoothing For observation #1, set L(1) = Sales(1) = 500, as an initial condition As a trial value use a = 0.1 So L(2) = 0.1*Sales(1) + 0.9*Level(1) L(2) = 0.1*500 + 0.9*500 = 500 And L(3) = 0.1*Sales(2) + 0.9*Level(2) L(3) = 0.1*350 + 0.9*500 = 485 observations Sales 1 500 2 350 3 250 4 400 5 450 6 350 7 200 8 300 9 350 10 200 11 150 12 400 13 550 14 350 15 250 16 550 17 550 18 400 19 350 20 600 21 750 22 500 23 400 24 650 Level 500 15 observations Sales 16 Level 1 500 500 2 350 500 3 250 485 4 400 5 450 6 350 7 200 8 300 9 350 10 200 11 150 12 400 13 550 14 350 15 250 16 550 17 550 18 400 19 350 20 600 21 750 22 500 23 400 24 650 a = 0.1 17 Single exponential smoothing So the formula can be used to calculate the rest of the levels values, observation #4-#24 This can be set up on a spread-sheet observations Sales 18 Level 1 500 500 2 350 500 3 250 485 4 400 461.5 5 450 455.4 6 350 454.8 7 200 444.3 8 300 419.9 9 350 407.9 10 200 402.1 11 150 381.9 12 400 358.7 13 550 362.8 14 350 381.6 15 250 378.4 16 550 365.6 17 550 384.0 18 400 400.6 19 350 400.5 20 600 395.5 21 750 415.9 22 500 449.3 23 400 454.4 24 650 449.0 a = 0.1 19 Single exponential smoothing The forecast for observation #25 is: L(25) = 0.1*sales(24)+0.9*(24) Forecast(25)=Levels(25)=0.1*650+0.9*449 Forecast(25) = 469.1 Single Exponential Sm oothing 800 700 600 Value 500 Sales 400 Levels 300 200 100 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Observation 15 16 17 18 19 20 21 22 23 24 21 Single exponential distribution The errors can now be calculated: e(t) = sales(t) – levels(t) observations Sales Level 22 error 1 500 500 0 2 350 500 -150 3 250 485 -235 4 400 461.5 -61.5 5 450 455.4 -5.35 6 350 454.8 -104.8 7 200 444.3 -244.3 8 300 419.9 -119.9 9 350 407.9 -57.9 10 200 402.1 -202.1 11 150 381.9 -231.9 12 400 358.7 41.3 13 550 362.8 187.2 14 350 381.6 -31.6 15 250 378.4 -128.4 16 550 365.6 184.4 17 550 384.0 166.0 18 400 400.6 -0.6 19 350 400.5 -50.5 20 600 395.5 204.5 21 750 415.9 334.1 22 500 449.3 50.7 23 400 454.4 -54.4 24 650 449.0 201.0 a = 0.1 observations Sales Level 23 error squared error 1 500 500 0 0 2 350 500 -150 22500 3 250 485 -235 55225 4 400 461.5 -61.5 3782.25 5 450 455.4 -5.35 28.62 6 350 454.8 -104.8 10986.18 7 200 444.3 -244.3 59698.86 8 300 419.9 -119.9 14376.05 9 350 407.9 -57.9 3353.58 10 200 402.1 -202.1 40852.14 11 150 381.9 -231.9 53780.95 12 400 358.7 41.3 1704.33 13 550 362.8 187.2 35027.05 14 350 381.6 -31.6 996.06 15 250 378.4 -128.4 16487.67 16 550 365.6 184.4 34016.68 17 550 384.0 166.0 27553.51 18 400 400.6 -0.6 0.37 19 350 400.5 -50.5 2554.91 20 600 395.5 204.5 41823.74 21 750 415.9 334.1 111594.53 22 500 449.3 50.7 2565.62 23 400 454.4 -54.4 2960.80 24 650 449.0 201.0 40412.28 a = 0.1 observations Sales Level 24 error squared error 1 500 500 0 0 2 350 500 -150 22500 3 250 485 -235 55225 4 400 461.5 -61.5 3782.25 5 450 455.4 -5.35 28.62 6 350 454.8 -104.8 10986.18 7 200 444.3 -244.3 59698.86 8 300 419.9 -119.9 14376.05 9 350 407.9 -57.9 3353.58 10 200 402.1 -202.1 40852.14 11 150 381.9 -231.9 53780.95 12 400 358.7 41.3 1704.33 13 550 362.8 187.2 35027.05 14 350 381.6 -31.6 996.06 15 250 378.4 -128.4 16487.67 16 550 365.6 184.4 34016.68 17 550 384.0 166.0 27553.51 18 400 400.6 -0.6 0.37 19 350 400.5 -50.5 2554.91 20 600 395.5 204.5 41823.74 21 750 415.9 334.1 111594.53 22 500 449.3 50.7 2565.62 23 400 454.4 -54.4 2960.80 24 650 449.0 201.0 40412.28 a = 0.1 sum sq res 582281.2 25 Single exponential smoothing For a = 0.1, the sum of squared errors is: S = (errors)2 = 582,281.2 A grid search can be conducted for the parameter value a, to find the value between 0 and 1 that minimizes the sum of squared errors The calculations of levels, L(t), and errors, e(t) = sales(t) – L(t) for a =0.6 observa tions 26 Sales Levels 1 500 500 2 350 500 3 250 410 4 400 314 5 450 365.6 6 350 416.2 7 200 376.5 8 300 270.6 9 350 288.2 10 200 325.3 11 150 250.1 12 400 190.0 13 550 316.0 14 350 456.4 15 250 392.6 16 550 307.0 17 550 452.8 18 400 511.1 19 350 444.4 20 600 387.8 21 750 515.1 22 500 656.0 23 400 562.4 24 650 465.0 a = 0.6 27 Single exponential smoothing Forecast(25) = Levels(25) = 0.6*sales(24) + 0.4*levels(24) = 0.6*650 + 0.4*465 = 776 observa tions Sales Levels 28 error square error 1 500 500 0 0 2 350 500 -150 22500 3 250 410 -160 25600 4 400 314 86 7396 5 450 365.6 84.4 7123.36 6 350 416.2 -66.2 4387.74 7 200 376.5 -176.5 31150.84 8 300 270.6 29.4 864.45 9 350 288.2 61.8 3814.38 10 200 325.3 -125.3 15699.02 11 150 250.1 -100.1 10023.67 12 400 190.0 210.0 44080.13 13 550 316.0 234.0 54747.14 14 350 456.4 -106.4 11322.57 15 250 392.6 -142.6 20324.22 16 550 307.0 243.0 59036.75 17 550 452.8 97.2 9445.88 18 400 511.1 -111.1 12348.55 19 350 444.4 -94.4 8920.73 20 600 387.8 212.2 45037.39 21 750 515.1 234.9 55172.40 22 500 656.0 -156.0 24349.97 23 400 562.4 -162.4 26379.58 24 650 465.0 185.0 34237.15 a = 0.6 Sum of Sq Res 533961.9 29 Single exponential smoothing Grid search plot Grid Search for Sm oothing Param eter 590000 580000 Sum of Squared Residuals 570000 560000 550000 540000 530000 520000 510000 500000 490000 0 0.2 0.4 0.6 Sm oothing Param eter 0.8 1 1.2 31 Single Exponential Smoothing EVIEWS: Algorithmic search for the smoothing parameter a In EVIEWS, select time series sales(t), and open In the sales window, go to the PROCS menu and select exponential smoothing Select single the best parameter a = 0.26 with sum of squared errors = 472982.1 and root mean square error = 140.4 = (472982.1/24)1/2 The forecast, or end of period levels mean = 532.4 32 33 Forecast = L(25) = 0.26*Sales(24) + 0.74L(24) = 532.4 =0.26*650 + 0.74*491.07 =532.4 34 35 36 Part II. Three Perspectives on Single Exponential Smoothing The formula perspective L(t) = a*y(t-1) + (1 - a)*L(t-1) e(t) = y(t) - L(t) The Box-Jenkins Perspective The Updating Forecasts Perspective Box Jenkins Perspective 37 Use the error equation to substitute for L(t) in the formula, L(t) = a*y(t-1) + (1 - a)*L(t-1) L(t) = y(t) - e(t) y(t) - e(t) = a*y(t-1) + (1 - a)*[y(t-1) - e(t-1)] y(t) = e(t) + y(t-1) - (1-a)*e(t-1) or Dy(t) = y(t) - y(t-1) = e(t) - (1-a) e(t-1) So y(t) is a random walk plus MAONE noise, i.e y(t) is a (0,1,1) process where (p,d,q) are the orders of AR, differencing, and MA. 38 Box-Jenkins Perspective In Lab Eight, we will apply simple exponential smoothing to retail sales, a process you used for forecasting trend in Lab 3, and which can be modeled as (0,1,1). 39 40 41 Retail Sales: Simple Exponential Smoothing 170000 160000 150000 140000 130000 90 91 92 RETAIL 93 94 95 RETAILSM 96 43 44 Box-Jenkins Perspective If the smoothing parameter approaches one, then y(t) is a random walk: Dy(t) = y(t) - y(t-1) = e(t) - (1-a) e(t-1) if a = 1, then Dy(t) = y(t) - y(t-1) = e(t) In Lab Eight, we will use the price of gold to make this point 45 Weekly Closing Price of Gold , Nov. 14, 2003-April 29, 2005 460 440 420 400 380 360 10 20 30 40 GOLD 50 60 70 46 47 48 Box-Jenkins Perspective 49 The levels or forecast, L(t), is a geometric distributed lag of past observations of the series, y(t), hence the name “exponential” smoothing L(t) = a*y(t-1) + (1 - a)*L(t-1) L(t) = a*y(t-1) + (1 - a)*ZL(t) L(t) - (1 - a)*ZL(t) = a*y(t-1) [1 - (1-a)Z] L(t) = a*y(t-1) L(t) = {1/ [1 - (1-a)Z]} a*y(t-1) L(t) = [1 +(1-a)Z + (1-a)2 Z2 + …] a*y(t-1) L(t) = a*y(t-1) + (1-a)*a*y(t-2) + (1-a)2a*y(t-3) + …. 50 51 The Updating Forecasts Perspective Use the error equation to substitute for y(t) in the formula, L(t) = a*y(t-1) + (1 - a)*L(t-1) y(t) = L(t) + e(t) L(t) = a*[L(t-1) + e(t-1)] + (1 - a)*L(t-1) So L(t) = L(t-1) + a*e(t-1), i.e. the forecast for period t is equal to the forecast for period t-1 plus a fraction a of the forecast error from period t-1. 52 Part III. Double Exponential Smoothing With double exponential smoothing, one estimates a “trend” term, R(t), as well as a levels term, L(t), so it is possible to forecast, f(t), out more than one period k 1 f(t+k) = L(t) + k*R(t), k>=1 L(t) = a*y(t) + (1-a)*[L(t-1) + R(t-1)] R(t) = b*[L(t) - L(t-1)] + (1-b)*R(t-1) k 1 so the trend, R(t), is a geometric distributed lag of the change in levels, DL(t) Part III. Double Exponential 53 Smoothing If the smoothing parameters a = b, then we have double exponential smoothing If the smoothing parameters are different, then it is the simplest version of HoltWinters smoothing Part III. Double Exponential Smoothing 54 Holt- Winters can also be used to forecast seasonal time series, e.g. monthly f(t+k) = L(t) + k*R(t) + S(t+k-12) k>=1 L(t) = a*[y(t)-S(t-12)]+ (1-a)*[L(t-1) + R(t-1)] R(t) = b*[L(t) - L(t-1)] + (1-b)*R(t-1) S(t) = c*[y(t) - L(t)] + (1-c)*S(t-12) 55 Part V. Intervention Analysis 56 Intervention Analysis The approach to intervention analysis parallels Box-Jenkins in that the actual estimation is conducted after prewhitening, to the extent that nonstationarity such as trend and seasonality are removed Example: preview of Lab 8 57 Telephone Directory Assistance A telephone company was receiving increased demand for free directory assistance, i.e. subscribers asking operators to look up numbers. This was increasing costs and the company changed policy, providing a number of free assisted calls to subscribers per month, but charging a price per call after that number. 58 Telephone Directory Assistance This policy change occurred at a known time, March 1974 The time series is for calls with directory assistance per month Did the policy change make a difference? 59 60 The simple-minded approach D=549 - 162 D=387 61 62 63 64 Principle The event may cause a change, and affect time series characteristics Consequently, consider the pre-event period, January 1962 through February 1974, the event March 1974, and the post-event period, April 1974 through December 1976 First difference and then seasonally difference the entire series 65 Analysis: Entire Differenced Series 66 67 68 69 70 Analysis: Pre-Event Differences 71 72 73 74 So Seasonal Nonstationarity It was masked in the entire sample by the variance caused by the difference from the event The seasonality was revealed in the preevent differenced series 75 76 Pre-Event Analysis Seasonally differenced, differenced series 77 78 79 80 81 Pre-Event Box-Jenkins Model [1-Z12 ][1 –Z]Assist(t) = WN(t) – a*WN(t-12) 82 83 84 85 Modeling the Event Step function 86 Entire Series Assist and Step Dassist and Dstep Sddast sddstep 87 88 89 90 Model of Series and Event Pre-Event Model: [1-Z12 ][1 –Z]Assist(t) = WN(t) – a*WN(t-12) In Levels Plus Event: Assist(t)=[WN(t) – a*WN(t-12)]/[1-Z]*[1-Z12] + (-b)*step Estimate: [1-Z12 ][1 –Z]Assist(t) = WN(t) – a*WN(t-12) + (-b)* [1-Z12 ][1 –Z]*step 91 92 93 Policy Change Effect Simple: decrease of 387 (thousand) calls per month Intervention model: decrease of 397 with a standard error of 22 94 95 Stochastic Trends: Random Walks with Drift We have discussed earlier in the course how to model the Total Return to the Standard and Poor’s 500 Index One possibility is this time series could be a random walk around a deterministic trend” Sp500(t) = exp{a + d*t +WN(t)/[1-Z]} And taking logarithms, 96 Stochastic Trends: Random Walks with Drift Lnsp500(t) = a + d*t + WN(t)/[1-Z] Lnsp500(t) –a –d*t = WN(t)/[1-Z] Multiplying through by the difference operator, D = [1-Z] [1-Z][Lnsp500(t) –a –d*t] = WN(t-1) [LnSp500(t) – a –d*t] - [LnSp500(t-1) – a –d*(t1)] = WN(t) D Lnsp500(t) = d + WN(t) 97 So the fractional change in the total return to the S&P 500 is drift, d, plus white noise More generally, y(t) = a + d*t + {1/[1-Z]}*WN(t) [y(t) –a –d*t] = {1/[1-Z]}*WN(t) [y(t) –a –d*t]- [y(t-1) –a –d*(t-1)] = WN(t) [y(t) –a –d*t]= [y(t-1) –a –d*(t-1)] + WN(t) Versus the possibility of an ARONE: 98 [y(t) –a –d*t]=b*[y(t-1)–a–d*(t-1)]+WN(t) Y(t) = a + d*t + b*[y(t-1)–a–d*(t-1)]+WN(t) Or y(t) = [a*(1-b)+b*d]+[d*(1-b)]*t+b*y(t-1) +wn(t) Subtracting y(t-1) from both sides’ D y(t) = [a*(1-b)+b*d] + [d*(1-b)]*t + (b-1)*y(t-1) +wn(t) So the coefficient on y(t-1) is once again interpreted as b-1, and we can test the null that this is zero against the alternative it is significantly negative. Note that we specify the equation with both a constant, [a*(1-b)+b*d] and a trend [d*(1-b)]*t 99 Part IV. Dickey Fuller Tests: Trend 100 Example Lnsp500(t) from Lab 2 101 102 103 104