Economics 323 Business Conditions Analysis Spring 2016 Dr. H. H. Stokes

Economics 323 (Version 9 November 2015)
Business Conditions Analysis

Dr. Houston H. Stokes
722 UH
hhstokes@uic.edu
Web Page www.uic.edu/~hhstokes
TA:

Required Text

Business Forecasting, John Hanke, Dean Wichern, 9th ed., Pearson / Prentice Hall, 2009.

Optional References

Data Analysis Using Stata by Ulrich Kohler and Frauke Kreuter, 3rd edition, Stata Press, 2012.

Introduction to Time Series Using Stata by Sean Becketti, Stata Press, 2013.

Introduction to Econometrics by Christopher Dougherty, 4th edition, Oxford University Press, 2011.

Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge, 5th edition, South Western Cengage Learning, 2013. This book is a bit expensive but is available from the Library.

"Notes on the Basics of Econometric Modeling geared to Dougherty (2011)" by Houston H. Stokes is available on-line from the course web page and will be discussed in class. See file Preliminary_Notes.docx

"Econometric Notes" by Houston H. Stokes contains sections on important topics and also provides an introduction to statistics. See especially sections 1-3 and 5, which we will discuss first. This document is available from the course web page. See file Econometric_Notes.docx

Purpose of Course

The purpose of the course is to extend the student's knowledge of Business Conditions Analysis by teaching statistical methods of business forecasting. This includes forecasting micro and macro data. Students will run the computer in class and get experience with a number of software systems. Students will be introduced to OLS and GLS analysis, nonlinear estimation of GLS models, recursive residual analysis options to test the stability of the estimated coefficients, and simple time series ARIMA models. Students will be asked to apply their knowledge in a number of computer exercises, which will be graded, and in a take-home final exam.
The grading will be 2/3 computer exercises and 1/3 open book final or an approved final project. Students can work in teams of two people but must turn in an individual paper with their name on it. If working in a team, be sure to indicate your team member on the cover sheet of your homework. Once a team is formed, it must stay together for the term. Students must have selected their teams by the third week or operate as a "lone wolf" by choice. Students are free to use software of their choosing. Stata and B34S will be discussed in the course, as well as some Excel. The goal is both to teach how to do applied economic work and to allow students to list software they can use on their resumes. Students wanting free B34S software systems for their home machines will be allowed to obtain them.

Objectives

The main objective of the course is to give students knowledge that will allow them to apply the economic theory they have learned in other courses in a manner that is useful and that helps them obtain a job. Economic knowledge that cannot be applied is of substantially less value and will soon be forgotten. Since forecasting provides a competitive edge, a great deal of emphasis is placed on developing systematic forecasting skills. Systematic forecasting skills are those that can be replicated by others. Many students taking this class in prior years have used their projects in job interviews to illustrate their skills and to differentiate themselves from other job seekers.

Jobs

Many of you will want to obtain jobs that use your economics training. The best way to make a good impression at the job interview is to differentiate your skills from the skills of the other applicants.
It has been found that students whose resumes stress that they have completed a successful econometric research project in the area of the job may have a "leg up" on applicants who appear clueless at the interview stage. To achieve this end, students can mention on their resumes the types of problems they have analyzed and the software they have used. The project alternative is in place of the take home final and would be done by each individual student.

Computer Software

For many of you this will be your first time using the computer for serious work apart from the basics you have learned in the required stat courses. It is important that you do not shrink from this aspect of the course. Computer work is exacting and is necessary for research. The problem sets can be run on your own machine or from the labs, where it is possible to open the class web pages and cut and paste commands into control files. Many students find it helpful to work together on their computer projects, since they will be able to explain to each other what they are trying to do and learn to express themselves. What is important is to discuss results clearly and with understanding. Attaching masses of computer output is NOT a substitute for writing in a clear fashion. In business, stress is laid on being clear and specific about what you have found.

While many of the problems can be solved using Excel, many cannot be solved using this limited software. For this reason we will be using Stata for much of our work. The Stata examples in this document provide a quick introduction to the software and will be discussed in class first to get you up and running. Since this document is on-line, you will be able to cut and paste the control files and run models very quickly. Kohler & Kreuter (2012) provides more Stata detail, if that is needed.
A student version of Stata is available from the UIC computer center for a nominal charge or can be accessed from various UIC labs. Stata is available in SCE408, SELE 2249F, SELE 2249, and BSB 4133. Students can obtain a copy of Stata that expires on 6/30/2015 for $90.00 https://webstore.illinois.edu/Shop/search.aspx?keyword=Stata

Advanced references are available for more detail on forecasting as needed.

1. Chapter 2 of Specifying and Diagnostically Testing Econometric Models, Houston H. Stokes, 2nd ed, Quorum Books 1997. Updated version available on line from web page.

2. Chapter 7 of Specifying and Diagnostically Testing Econometric Models, Houston H. Stokes, 2nd ed, Quorum Books 1997. Updated version available on line from web page.

3. Chapter 9 of Specifying and Diagnostically Testing Econometric Models, Houston H. Stokes, 2nd ed, Quorum Books 1997. Updated version available on line from web page.

4. Stokes (200x) The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on Topics Relevant to Financial Analysis, Chapter 2 "Time series modeling objectives", available on line from web page.

5. Stokes (200x) The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on Topics Relevant to Financial Analysis, Chapter 4 "Stationary Time Series Models", available on line from web page.

6. Stokes (200x) The Essentials of Time Series Modeling: An Applied Treatment with Emphasis on Topics Relevant to Financial Analysis, Chapter 5 "Estimation of AR(p), MA(q) and ARMA(p,q) Models", available on line from web page.

Quick Start:

To get going, assume you have a file of data on the age of 6 cars and their value:

    age         1      3      6     10      5      2
    value    1995    875    695    345    595   1795

Your goal is to estimate a model of the form

    $\text{Value} = \alpha + \beta\,\text{Age}$

Your results should show (t-statistics in parentheses)

    $\text{Value} = \underset{(6.45)}{1852.9} - \underset{(-3.35)}{178.41}\,\text{Age}, \qquad \bar{R}^2 = .672$

which has been discussed in the overview notes on statistics.
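The Quick Start estimates can also be checked by hand. The following is a minimal sketch in Python (not one of the packages used in the course; the function name `simple_ols` and the dictionary keys are made up for illustration) that applies the textbook one-regressor OLS formulas to the six car observations. It only shows where the intercept, slope, t-statistics and adjusted R-squared come from; it is not a substitute for the Stata run.

```python
import math

# Quick Start data: age of six cars and their value
age = [1, 3, 6, 10, 5, 2]
value = [1995, 875, 695, 345, 595, 1795]


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares (one regressor plus constant)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # variation in x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # co-variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                         # total sum of squares
    b = sxy / sxx                                                    # slope
    a = y_bar - b * x_bar                                            # intercept
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))        # residual sum of squares
    s2 = sse / (n - 2)                                               # residual variance
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return {
        "intercept": a,
        "slope": b,
        "t_intercept": a / se_a,
        "t_slope": b / se_b,
        "r2": r2,
        "adj_r2": adj_r2,
    }


fit = simple_ols(age, value)
for name, number in fit.items():
    print(f"{name:12s} {number:12.4f}")
```

The printed values can be compared line by line with the `regress` output in the course listing; the forecast for any age is then `fit["intercept"] + fit["slope"] * age_value`.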
A simple Stata setup for this problem is

    input x y
    1 1995
    3 875
    6 695
    10 345
    5 595
    2 1795
    end
    list
    summarize
    regress y x

which, if saved with the name cars.do, can be run in batch with the command

    stata /e do cars.do

which produces in cars.log

      ___  ____  ____  ____  ____ (R)
     /__    /   ____/   /   ____/
    ___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
      Statistics/Data Analysis            StataCorp
                                          4905 Lakeway Drive
                                          College Station, Texas 77845 USA
                                          800-STATA-PC        http://www.stata.com
                                          979-696-4600        stata@stata.com
                                          979-696-4601 (fax)

    Single-user Stata perpetual license:
           Serial number:  3012042652
             Licensed to:  Houston H. Stokes
                           U of Illinois

    Notes:
          1.  Stata running in batch mode

    . do cars.do

    . input x y

                 x          y
      1. 1 1995
      2. 3 875
      3. 6 695
      4. 10 345
      5. 5 595
      6. 2 1795
      7. end

    . list

         +-----------+
         |  x      y |
         |-----------|
      1. |  1   1995 |
      2. |  3    875 |
      3. |  6    695 |
      4. | 10    345 |
      5. |  5    595 |
         |-----------|
      6. |  2   1795 |
         +-----------+

    . summarize

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
               x |         6         4.5    3.271085          1         10
               y |         6        1050    679.5219        345       1995

    . regress y x

          Source |       SS       df       MS              Number of obs =       6
    -------------+------------------------------           F(  1,     4) =   11.24
           Model |  1702935.05     1  1702935.05           Prob > F      =  0.0285
        Residual |  605814.953     4  151453.738           R-squared     =  0.7376
    -------------+------------------------------           Adj R-squared =  0.6720
           Total |     2308750     5      461750           Root MSE      =  389.17

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |  -178.4112   53.20631    -3.35   0.028    -326.1356   -30.68683
           _cons |    1852.85   287.3469     6.45   0.003     1055.048    2650.653
    ------------------------------------------------------------------------------

    .
    end of do-file

An alternative is to place this file in the Stata do file editor and execute it!

Data is saved on-line in the Excel directory. These files can be accessed by Stata or Excel from your PC.

    APPL       Appliance Shipments Monthly 1983-1993
    BEER       Monthly Beer Production 1983-1993
    BOATS      Demand for Boats
    CALDATA    Monthly Sales of Electronic Calculators
    CAMPERS    Camp Development Decision
    CAR        Monthly Deliveries of Dorf Company
    CASE10_1   Weekly Restaurant Sales
    CASE10_4   Lydia Pinkham Annual Data 1907-1960
    CASE2_2    Mr. Tux Rental Data
    CASE4_1    Monthly Sales Data 1983-1995
    CASE4_3    Consumer Credit Counseling Clients
    CASE5_1    Solar Alternative Company Sales 93-94
    CASE6_1    Tiger Transport Company - Weight & MPG
    CASE6_2    Butcher Products Production Data
    CASE6_3    Ace Personnel Company
    CASE6_5    Credit clients
    CASE7_1    Bond Market Dataset
    CASE7_3    Mr. Tux with Dummy Variables
    CASE7_4    Consumer Credit Counseling Extended Data
    CASE8_1    Small Engine Doctor Data
    CASE8_2    Case Study 8_2 Tux Data
    CASE8_4    AAA Emergency Road Call Data 1988 - 1993
    CASE9_5    Full AAA Emergency Data Jun 87 - Jul 93
    CENEX      Cenex Chemical Process Data
    COMPANY    Not Found
    DAY90      Quarterly 90-day Treasury Bills
    DISINC     Disposable Income 1955 - 1985
    EDPRICE    Price of Higher Education 70-93
    EGGS       Production of Eggs 1961-1992
    EMPLOYEE   Employee Study
    EX10_3     Daily Transportation Index Close
    EX10_4     Readings from Atron Process
    EX10_5     Errors of Atron Quality Control
    EX10_6     Errors for Ed Jones Quality Control
    EX10_7     Closing Stocks of ISC
    EX10_8     Keytron Sales
    EX4_1      VCR's Sold
    EX4_3      40 Random Numbers
    EX4_4      Sears Sales
    EX4_5      Outboard Marine 1984-1996
    EX5_1      Acme Tool Company
    EX5_4      Weekly Movie Video Rentals
    EX6_1      Milk Sales
    EX6_11     Hardware Advertising and Sales
    EX7_1      Milk vs Price vs Advertising
    EX7_12     Washington Power Usage
    EX7_5      Job Performance
    EX7_7      Food Expenditure
    EX7_8      Zurenko Pharmaceutical Company
    EX8_10     Quarterly Sales Outboard Marine
    EX8_2      New Passenger Cars in the United States 60-92
    EX8_8      Monthly Registration of Cars 1986-1992
    EX9_1      Reynolds Metals 1976-1996
    EX9_3      Novak Corp Sales 1980-1996
    EX9_4      Yearly Sears Sales 1976-1966
    FARMS      Number of US Farms 1975-1993
    FORREST    Forest Products Car Loadings
    FURNACE    Shipments of Furnaces 1982-1990
    GNP        GNP 1950 - 1991
    HANKEINFO  Lists of Hanke Datasets
    HOMES      Demand for Motor Homes
    HOUSEHOLD  Household vs populations
    IMPORTS    National Imports for the years 1967 to 1986
    INVEST     Non residential Investment 1950-1989
    KINSTON    Monthly Sales of Kinston
    MARRIAGE   Number of Marriages 1965-1989
    MEDIAN     Population Dataset
    MOODY      Electric Utility Stocks Annual Ave
    MOTEL      Monthly Occupancy For Motel 9 1987-1996
    PAPER      Monthly Demand for Paper Products
    PE         Quarterly Industrial P/E Ratio
    PERFUME    Monthly Demand for Perfume
    PR10_10    80 obs of data
    PR10_11    96 obs
    PR10_12    IBM Stock Quotes Daily
    PR10_13    DEF Corporation stock
    PR10_14    Weekly Auto Accidents 1984-1985
    PR10_15    Corn Price in Spokane Washington
    PR10_7     Chips Bakery
    PR10_8     126 obs of test data
    PR10_9     80 Obs of test data
    PR2_1      Customer Transactions
    PR2_10     Books Sold vs Shelf Space
    PR2_13     Random Stock Vs Temp Data
    PR2_2      Housing Prices
    PR2_7      Family Sizes
    PR2_9      Maintenance of Buses
    PR4_13     Marriages in US 85 - 92
    PR4_17     Quarterly Loans Dominion Bank
    PR4_20     Earning Per Share Price Company
    PR5_11     Demand for Hughes Supply
    PR5_12     Asset Value General American Investors
    PR5_13     Revenues from Southdown Inc
    PR5_14     Triton Sales per share
    PR5_15     Revenues for the Consolidated Edison Company
    PR5_6      Apex Mutual Fund Price
    PR5_9      Davenport Bond Yield
    PR6_11     Building Permits vs Interest Rate
    PR6_12     Print Cost Data
    PR6_13     Defective parts vs batch size
    PR6_3      Sales vs Advertising
    PR6_4      Checkout time vs Value of Purchase
    PR6_5      Maintenance cost vs age
    PR6_6      Books Sold vs Shelf Space
    PR6_7      Orders vs Catalogues
    PR6_8      Yearly Investment vs Interest Rate
    PR6_9      Forecasting a Competitor bid
    PR7_10     Sales = f(# outlets, # cars)
    PR7_11     # Registered autos = f( )
    PR7_13     What Makes a winning baseball team
    PR7_15     Presto Sales
    PR7_8      Checkout time vs Purchase & # items
    PR8_11     Capital Spending 1977 - 1993
    PR8_13     Spending on TV Ads
    PR8_19     Goodyear Tires Sales 1985-1996
    PR8_20     Retail Sales Data
    PR9_10     Gas Consumption in the US
    PR9_11     Resort Study
    PR9_12     Revenue Data
    PR9_13     Shareholder Data
    PR9_14     Passengers who flew on Thompson Airline planes
    PR9_15     Thomas Furniture Company sales
    PR9_16     Dicksen vs Industry Sales
    PR9_17     Savings in period 1935 - 1954
    PRES       New Prescriptions 1990-1996
    PRIME1     Quarterly Prime Rate 1985-1991
    PRIME2     Monthly average prime 1945-1995
    RAILROAD   Railroad Labor Annual Average Cents Per Hour
    REFILL     Monthly refill data 1983-1990
    RIDERSHIP  Daily Bus Ridership
    SALARY     Age vs Salary Data
    SEARINC    Sears Sales 1955 - 1985
    STATIONS   Number of TV stations changing hands in 1991
    STOCK      S & P Monthly 1945-1995
    TRAVEL     US Citizen departures 1961-1991
    WASH       Boats - Motor Homes - Income
    WELLHEAD   Average Wellhead Price, Natural gas 72-93
    WOOD       Monthly Wood Production 1992-1996

Problem Sets, Final/Project:

There are 5 problem sets.
These will be due on the 4th, 6th, 9th, 11th and 13th week for problem sets 1 - 5 respectively. Late work will not be accepted unless there is prior written approval. Problem set answers should be typed and carefully laid out. Since they will be 2/3 of the grade in the course, care should be taken in their preparation. If past classes are any indication, an impressive layout can be leveraged in a subsequent job interview to illustrate your capability to solve "real world" problems using modern methods of analysis.

The remaining 1/3 of the grade is either the take home final OR the student project, which involves obtaining data from the web or other sources and estimating a regression and an ARIMA model. The objective of the project is to give students a paper that illustrates their capabilities and that they can use in job interviews. If you have a job goal in one specific industry, it is a good idea to select a paper topic that is related to this area. Further information on the project is contained below.

Assignments

I. Introduction to Data Analysis
   Hanke-Wichern Chapters 1-2
   Stokes [5], [6]

II. Regression Analysis
   Hanke-Wichern Chapters 6, 7
   Stokes Econometric_Notes
   Stokes (1997) Chapter 2 listed as [2]
   Stokes (2004) Chapter 2 listed as [9]

III. Recursive Residual Analysis
   Stokes (1997) Chapter 9 listed as [4]

IV. ARIMA Model Building - Identification and estimation
   Stokes (1997) Chapter 7 listed as [3]
   Stokes (2004) Chapter 4 "Stationary Time Series Models" listed as [10]
   Stokes (2004) Chapter 5 "Estimation of AR(p), MA(q) and ARMA(p,q) Models" listed as [11]
   Hanke-Wichern Chapters 8 and 9

Problem Sets

1. Introductory Statistical Analysis. Due 4th week.
2. Estimation and Testing of Regression Models. Due 6th week.
3. Applied Econometric Analysis. Due 9th week.
4. Identification of ARIMA Models using real and generated data. Due 12th week.
5. Estimation of ARIMA Models. Due 14th week.

Terms and concepts to understand

OLS Model - define, state assumptions and discuss the effect on the estimates or standard errors if the assumptions of OLS are not met
Multicollinearity
Simultaneity
Exogenous, Endogenous
Differences-in-Differences model - define and discuss interpretation
Regression discontinuity
Proxy variable
Probit, logit and tobit model - define and show how used.
Panel Data - define and show the advantages and disadvantages of the fixed effects model and the random effects model.
Instrumental variable - define and show how used. Be able to discuss and give examples.
Effects of serial correlation, heteroskedasticity and model specification on estimated coefficients, estimated standard errors and the ability of a model to accurately draw inferences.
Population
Sample
Sample selection bias.

Datasets:

Datasets for the problem sets are on-line under the course web page.

Datasets for SAS are in the class FTP location in ftp.uic.edu/pub/depts/econ/hhstokes/e323/sas_files/

Datasets for Excel / Stata are in the class FTP location in ftp.uic.edu/pub/depts/econ/hhstokes/e323/excel_files/

Problem Set # 1 Introductory Statistical Analysis

Goals: Introduction to Computer Use
       Data Sampling

1. Hanke-Wichern (2009) page 50 # 12 lists data on the number of books sold (BSOLD) and the feet of shelf space (SHELFS). You are asked to calculate the correlation between BSOLD and SHELFS. In addition, run a regression of BSOLD = f(constant, SHELFS). Draw a scatter diagram. Use your model to predict how many books will be sold if the shelf space is 8.23. For software you can use any program that you would like. The means obtained should be:

    Variable   #  Label                Cases  Mean     Std. Dev.  Variance  Maximum  Minimum
    WEEK       1  Week Data Collected  11     6.00000  3.31662    11.0000   11.0000  1.00000
    BSOLD      2  Books Sold           11     210.182  54.7153    2993.76   295.000  125.000
    SHELFS     3  Shelf Space          11     4.88182  1.42816    2.03964   7.70000  3.10000
    CONSTANT   4                       11     1.00000  0.00000    0.00000   1.00000  1.00000

Stata will run the problem with statements:

    * Data from Hanke - Wichern Edition 9 page 50 # 12
    * Data from Hanke - Wichern Edition 8 page 48
    * Data from Edition 7 page 44
    input week bsold shelfs
    1 275 6.8
    2 142 3.3
    3 168 4.1
    4 197 4.2
    5 215 4.8
    6 188 3.9
    7 241 4.9
    8 295 7.7
    9 125 3.1
    10 266 5.9
    11 200 5.0
    end
    label variable week "Week Data Collected"
    label variable bsold "Books Sold"
    label variable shelfs "Shelf Space"
    summarize
    describe
    set graphics on
    graph twoway scatter bsold shelfs, saving(graphp1_1)
    corr
    regress bsold shelfs

2. Hanke-Wichern (2009) page 50 problem 13 shows a dataset of 200 weekly observations of temperature in Spokane (Temp) and the number of shares of stock traded for Sunshine Mining (Shares). The dataset is Pr2-9.xls. You are asked to calculate the correlation between Shares and Temp and run a model Shares = f(constant, Temp). Code to load the data from a file is shown below. You can either run this problem with a modified script in "batch mode" or load the data file directly into Stata from the web and give the appropriate commands.

    import excel using "c:\master\master1\class\e323\hanke\excel\ch2\pr2-9.xls",firstrow
    summ
    *
    list
    corr
    regress Shares Temp
    *
    * Alternative ways to do a bootstrap
    *
    regress Shares Temp, vce(bootstrap, reps(400) seed(10101))
    *
    * Here we see what the coef look like using resampling techniques.
    bootstrap _b _se, reps(400) seed(10101):regress Shares Temp
    matrix list e(b_bs)
    *
    regress Shares Temp, robust

When you load the data, the means should be:

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              C1 |       200       100.5    57.87918          1        200
          Shares |       200       48.86    29.28461          0         99
            Temp |       200       47.75      28.176          1         99

1. Discuss in detail what you find in terms of coefficients and SE's.

2. The next part of the problem is to sample the data 400 times (with replacement) and see what happens to the SE and the coef.

3. Using the OLS results and the bootstrap coef, forecast Shares for a Temp value of 63. Show your work.

Problem Set # 2. Regression Analysis

Goals: OLS Model Specification
       Introduction to GLS

Pindyck & Rubinfeld [1981] pages 458 - 466, which can be consulted for more detail, contain data on a number of economic series. This data is on line in member penrub.dct and can be called from inside Stata. The command to load this data on a PC is

    infile using "c:\master\master1\class\e323\penrub.dct", clear

Means and variable names and descriptions are:

    Variable   #  Label
    time       1
    qt         2  QUARTER
    c          3  PERSONNAL CONSUMPTION EXP. 58 DOLLARS
    g          4  GOV. EXP. GOODS & SERVICES 58 DOLLARS
    gnp        5  GROSS NATIONAL PRODUCT IN 58 DOLLARS
    gnpp       6  POTENTIAL GNP IN 58 DOLLARS
    iin        7  INVENTORY INVESTMENT IN 58 DOLLARS
    inv        8  LEVEL OF BUSINESS INV. IN 58 DOLLARS
    inr        9  FIXED NON RES. INVEST IN 58 DOLLARS
    ir        10  FIXED RES INV NON FARM STRUCT 58 DOLLARS
    m         11  M/P IN 58 DOLLARS
    p         12  IMPLICIT GNP DEFLATOR
    rl        13  LONG TERM INTEREST RATE
    rs        14  SHORT TERM INTEREST RATE
    trans     15  FED GOVERNMENT TRANSFER PAY. 58 DOLLARS
    ur        16  UNEMPLOYMENT RATE
    w         17  NOMINAL WAGE IN DOLLARS PER HOUR
    wlth      18  INDEX OF REAL HOUSEHOLD WEALTH
    yd        19  DISPOSIBLE INCOME IN 58 DOLLARS
    constant  20

    Variable  Mean     Std. Dev.  Variance  Maximum  Minimum
    time      1966.50  6.38065    40.7126   1977.00  1956.00
    qt        2.50000  1.12444    1.26437   4.00000  1.00000
    c         415.268  98.1198    9627.50   607.500  279.100
    g         124.123  22.2271    494.042   155.500  84.6000
    gnp       640.481  141.650    20064.6   901.200  434.200
    gnpp      696.262  208.121    43314.5   1089.40  413.400
    iin       5.15966  4.85589    23.5796   17.5600  -13.1600
    inv       162.439  37.1501    1380.13   223.200  111.300
    inr       67.0768  17.2897    298.934   94.5500  40.4400
    ir        28.8623  6.27482    39.3734   44.2900  19.4100
    m         149.040  8.11765    65.8962   165.400  136.800
    p         132.515  35.3421    1249.07   218.400  93.9000
    rl        5.05501  1.31016    1.71653   7.27000  2.88700
    rs        4.38733  1.66629    2.77652   8.32300  0.957000
    trans     36.4505  15.7636    248.492   67.3300  16.1100
    ur        5.39624  1.35257    1.82945   8.86700  3.40000
    w         3.79868  1.55299    2.41178   7.41300  1.90400
    wlth      1.96199  0.372728   0.138926  3.01800  1.33500
    yd        564.101  124.761    15565.2   793.800  382.400
    constant  1.00000  0.00000    0.00000   1.00000  1.00000

    Data file contains 88 observations on 20 variables.
    Current missing value code is 0.1000000000000000E+32
    Data begins on (D:M:Y) 1: 1:1956 and ends on 1:10:1977.
    Frequency is 4.

Assignment: Be sure that you have read and understand the material in Stokes [1997] Chapter 2 and Hanke-Wichern [2009] Chapter 7.

1. Define and discuss the following econometric problems:
   a. Heteroskedasticity
   b. Serial Correlation
   c. Simultaneity

2. Using Stata, estimate the following models.
   a. $c_t = \alpha + \beta_1 yd_t$
   b. $c_t = \alpha + \beta_1 yd_t + \beta_2 c_{t-1}$
   c. $c_t = \alpha + \beta_1 yd_t + \beta_2 c_{t-1} + \beta_3 yd_{t-1}$
   d. $c_t = \alpha + \beta_1 [yd_t - trans_t] + \beta_2 trans_t + \beta_3 [wlth_t - wlth_{t-1}] + \beta_4 [rs_t + rs_{t-1} + rs_{t-2} + rs_{t-3}] + \beta_5 c_{t-1}$

You will have to build the variables $[yd_t - trans_t]$ and $[rs_t + rs_{t-1} + rs_{t-2} + rs_{t-3}]$. The following code will do this task if the main dataset has been loaded.

    infile using "c:\master\master1\class\e323\penrub.dct", clear
    * infile using "g:\e323\penrub.dct", clear
    * set a time variable
    gen trend = _n
    tsset trend
    summ
    describe
    * Model a
    regress c yd
    estat dwatson
    * GLS for models with no lagged dependent variable on right
    prais c yd
    * what happens if you try the next command?
    *
    prais c yd, corc
    *
    * Does the rho make sense?
    prais c yd, ssesearch
    *
    newey c yd, lag(1)
    *
    * build data
    gen lag_c = L.c
    gen lag_yd = L.yd
    gen yd_m_trans = yd-trans
    gen dif_wlth = wlth - L.wlth
    gen sum_rs = rs + L.rs +L2.rs +L3.rs
    * Model b
    regress c yd lag_c
    * Model c
    regress c yd lag_c lag_yd
    * Model d (lag_c added to match equation d above)
    regress c yd_m_trans trans dif_wlth sum_rs lag_c

What models can be estimated using GLS? Discuss what you have found.

Problem set # 3 - Applied Econometric Analysis

1. This problem is based on a modification of problem 6 on page 314 of Hanke. Explain each of the following concepts and how it might be used.
   a. Correlation matrix
   b. R2
   c. Multicollinearity
   d. Residual
   e. Dummy variable
   f. Stepwise regression

2. Solve Hanke problem 12 page 317. Data load means are:

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           Sales |        11    29.60909    13.75947        3.5       52.3
         Outlets |        11    1554.364    843.9054        125       2850
            Auto |        11    12.42727    6.858585        4.1       24.6
          Income |        11    60.25455     27.1665       19.7       98.5

The following partial program shows you how to fit a tentative model and obtain the fit and the error.

    import excel using "c:\master\master1\class\e323\hanke\excel\ch7\Pr7-13.xls",firstrow
    summ
    describe
    list
    corr
    *
    regress Sales Income
    predict fit
    predict error, resid
    list
    *
    drop fit
    drop error

   a. Analyze the correlation matrix.
   b. How much error is involved in the prediction for region 1?
   c. Forecast the annual sales in region 12, given 2,500 retail outlets and 20.2 million automobiles registered.
   d. Are the partial regression coefficients sensible?

3. This problem is based on problem 16 on page 320 of Hanke.
The correct commands to load the data and get means and correlation are:

    import excel using "c:\master\master1\class\e323\hanke\excel\ch7\Pr7-15.xls",firstrow
    summ
    list
    corr

    ERA  = Earned run average
    SO   = Strike outs
    BA   = Batting average
    RUNS = Runs
    HR   = Home Runs
    SB   = Stolen Bases

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            WINS |        26    80.92308    9.711532         57         98
             ERA |        26    3.904615    .3906429       3.06       4.59
              SO |        26    938.0769    73.26168        739       1033
              BA |        26    .2556538    .0096992       .241        .28
            RUNS |        26    697.1923    70.10679        576        829
    -------------+--------------------------------------------------------
              HR |        26    130.1154    32.00791         68        209
              SB |        26         120    37.89776         50        221

    . corr
    (obs=26)

                 |     WINS      ERA       SO       BA     RUNS       HR       SB
    -------------+---------------------------------------------------------------
            WINS |   1.0000
             ERA |  -0.4937   1.0000
              SO |   0.0488  -0.3932   1.0000
              BA |   0.4460   0.0152  -0.0067   1.0000
            RUNS |   0.6267   0.2788  -0.2091   0.6449   1.0000
              HR |   0.2088   0.4896  -0.2150   0.1536   0.6636   1.0000
              SB |   0.1904  -0.4039  -0.0617  -0.2070  -0.1623  -0.3053   1.0000

Assume you have been retained to determine what is important for developing a winning team.

   a. Discuss the importance of each variable using correlation and regression analysis.
   b. What is the best equation to use to forecast wins? Give detailed reasons why you selected this equation. The stepwise command might be useful:

        stepwise, pr(.2) : regress y x

   c. Prepare a report to submit to the team manager.

Problem set # 4 - Identification of ARIMA Models using real and generated data

Assignment - Review Stokes [1997] Chapter 6 and Hanke-Wichern [2005] chapter 9

1. Define and discuss the use of:
   a. Autocorrelation Function (ACF)
   b. Partial Autocorrelation Function (PACF)

2. Discuss what you would look for to identify:
   a. An AR(1) model
   b. An MA(1) model

3. Estimate the ACF and PACF for the variables C, RS, and M in dataset PENRUB.
It is recommended that you investigate both the original series and the first differenced series. From your work, what is the correct amount of differencing? Why? What do these ACFs tell us about the series?

4. The files test_arima_1500.dct and test_arima_15000.dct generate series with the following characteristics:

   a. $(1 - .7B)X_t = e_t$
   b. $X_t = (1 - .7B)e_t$
   c. $(1 + .7B)X_t = e_t$
   d. $X_t = (1 + .7B)e_t$
   e. $(1 - .65B)X_t = (1 - .4B - .7B^4)e_t$

for 1500 and 15000 observations respectively. Means for the 1500 data are

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           ar1_a |      1500    -.000633     1.45299  -5.480957   4.640228
           ma1_b |      1500   -.0003761    1.245439  -4.938618     4.3658
           ar1_c |      1500   -5.77e-06    1.417546   -5.18021    5.23172
           ma1_d |      1500    .0000899    1.241997  -4.590242   3.586889
          arma_e |      1500    .0008087     1.34893   -5.08477   4.419892
    -------------+--------------------------------------------------------
            norm |      1500   -.0001431    1.018634  -4.000103   3.422285
           trend |      1500       750.5     433.157          1       1500

   a. Estimate the ACF and PACF for the first five series and discuss.
   b. Next estimate the correct models and see how close you get. Contrast results for the 1500 observation series with the 15000 observation series.

Computer help:

    * infile using "c:\master\master1\class\e323\test_arima_1500.dct",clear
    infile using "c:\master\master1\class\e323\test_arima_15000.dct",clear
    * set a time variable
    gen trend = _n
    tsset trend
    summ
    describe
    corrgram ar1_a, lags(24)
    corrgram ma1_b, lags(24)
    corrgram ar1_c, lags(24)
    corrgram ma1_d, lags(24)
    corrgram arma_e, lags(24)
    corrgram norm, lags(24)
    arima ar1_a, arima(1,0,0)
    arima ma1_b, arima(0,0,1)
    arima ar1_c, arima(1,0,0)
    arima ma1_d, arima(0,0,1)
    arima arma_e, ar(1) ma(1,4)

Problem set # 5 - Estimation of ARIMA Models

Goal: Be able to forecast

1. Discuss the advantages and disadvantages of ARIMA models in comparison to large scale models. Under what conditions would an ARIMA modeling procedure be appropriate, and under what conditions would a large scale econometric modeling procedure be appropriate?

2. Using the ACF and PACF that you estimated in case study # 2, estimate the ARIMA models for C, RS, M. Be sure to have the correct amount of differencing. Try your models using the predict option. Discuss your models. A sample job is shown. Two ARIMA models are shown. One appears to work better. Why?

    infile using "c:\master\master1\class\e323\penrub.dct", clear
    * infile using "g:\e323\penrub.dct", clear
    * set a time variable
    gen trend = _n
    tsset trend
    summ
    describe
    *
    arima c, arima(1,1,1)
    arima c, ar(1,2,3,4,5)
    predict arxb_c
    predict ardy_c, dynamic(80)
    list c D.c ar*

Help on autobj.

AUTOBJ - Automatic Estimation of Box-Jenkins Model

The ARMA command estimates univariate BJ models using ML and method of moments. Since only one AR and MA factor is allowed, this command can be used to select relatively simple models from inside a user selected framework. If many series are to be filtered quickly, this command should be considered. Models with very many terms can be estimated.

The more complex command AUTOBJ will automatically identify models with AR, MA, SAR and SMA factors without the user having to specify the model. This use of time series AI makes it possible to filter a large number of quite different series. A limit of 10 terms can be in the model but up to 6 factors can be estimated. These limits are due to the Box-Jenkins philosophy, which suggests that parsimonious models be used.

The AUTOBJ command is based on the BJIDEN and BJEST routines available as B34S commands. The underlying code is based on the Peck Box Jenkins program that was developed under the supervision of George Box at UW starting in the late 60's.
In addition to automatic model selection using the :autobuild option, the AR and MA parameters can be specified in "manual" mode of operation.. call autobj(x :options); x series to filter. If the user wants to impose differencing, this should be done outside the command or inside the command with the command :rdif or :sdif. Other wise using automatic model building, differencing will be selected if the AR parameter is above the :roottol value which defaults to .8. :autobuild - Automatically selects the arima model starting from a "generic" arima(1,1) model on appropriately differenced data. :rawacfpacf - Give Raw ACF and PACF prior to model being fit.. :difrawacf - Gives difference as well as raw acf and pacf if :rawacfpacf set. :assumptions - Lists assumptions. Not usually used. :seasonal n - Sets the seasonal period. If this is not present seasonal differencing will not be attempted. :seasonal2 n - Sets the second seasonal period. If seasonal2 is set, seasonal must be set. Used with hourly and weekly data. :longar n - Sets initial default AR order. Default=1. Range 0-2. This is not allowed if seasonal2 is set. :longma n - Sets initial default MA order. Default=1. Range 0-2. This option is not allowed if seasonal2 is set. :nodif - Suppress automatic differencing selection. 20 Economics 323 Business Conditions Analysis Spring 2016 Dr. H. H. Stokes :rdif - Forces Regular Differencing. :sdif - Forces Seasonal Differencing. :trend - Estimate a trend if there is differencing. :noest - No estimation will be performed. This option requires that the model has been saved. :cleanmod - On the last step, the model will be cleaned of parameters that have |t| values LT droptol. This option makes a very parsimonious model. :forcedstart - Forces a default starting value of .1 to be set. This is usually not needed. :nosearch - Turns off spike hunting. :spikelimit i - Sets limit to look for spikes. Default = max(12,2*seasonal) :spiketol - Sets t for spike inclusion. 
Default = droptol. If this is set too low the program will cycle, since a term will be added that is then found not significant because its |t| does not meet the droptol.
:arlimit r r - Sets a value to check for |t| of adjacent ACF terms. If r is set smaller, it is more likely AR terms will be added. Change this value with caution. Default = 1.3.
:startvalue r - Sets default parameter start value for automatic model building. Default = .1
:print - Print results.
:printres - Print residuals.
:printit - Print iterations.
:printsteps - Prints model selection steps for automatic model building.
:backforecast - Use backforecasting. This option allows residuals to be calculated for all data points. It can result in unstable estimation. This option should be used with care.
:maxtry n - Maximum tries at auto model selection. Default = 4.
:roottol r - Sets auto model differencing tolerance. Default = .8
:droptol r - Sets drop tolerance. Default = 1.7
:eps1 r - Sets max change in relative sum of squares before iteration stops. Default = 0.0 => this criterion not used.
:eps2 r - Sets relative max change in each parameter. Default = .004
:maxit i - Sets maximum number of iterations allowed. Default = 20
:nac i - Sets number of autocorrelations printed. Max = 999.
:npac i - Sets number of partial autocorrelations printed.

Options to override auto selection of the model. Note: Specify AR and MA in this order if present.

:ar ivec - Sets AR orders. Can specify up to three factors. For example: :ar index(1 2 3) index(12)
:ma ivec - Sets MA orders. Can specify up to three factors. For example: :ma index(1 2 3) index(12)
:arparm rarray - Initial AR values. Usually not needed.
:maparm - Initial MA values. Usually not needed.
:forecast index(i1 i2) - Sets forecast number and origin. Limit for number = 100
:smodeln - Sets model save name. If :noest is in effect, this sets the name of the model to be used to make forecasts.
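Two of the tolerances in the option list are simple threshold rules: differencing is selected when the AR parameter is above :roottol, and :cleanmod drops parameters whose |t| is below :droptol. A minimal Python sketch of just these two rules follows. It is not the AUTOBJ algorithm; the function names, the coefficient values and the choice to compare the raw (not absolute) AR parameter with roottol are assumptions made for illustration.

```python
ROOTTOL = 0.8  # default of :roottol quoted in the help text
DROPTOL = 1.7  # default of :droptol quoted in the help text


def needs_differencing(ar_parameter, roottol=ROOTTOL):
    """Assumed rule: difference when the AR parameter is above roottol."""
    return ar_parameter > roottol


def clean_model(terms, droptol=DROPTOL):
    """Keep only terms whose |t| is at least droptol (the :cleanmod idea).

    terms maps a coefficient name to a (coefficient, t score) pair.
    """
    return {name: (coef, t) for name, (coef, t) in terms.items()
            if abs(t) >= droptol}


# Hypothetical estimation results (illustrative numbers only).
terms = {
    "AR - 1": (0.55, 4.2),
    "MA - 1": (0.12, 1.1),    # |t| below 1.7, so this term is dropped
    "MA - 12": (-0.40, -3.0),
}

cleaned = clean_model(terms)
print(needs_differencing(0.95), needs_differencing(0.50))
print(list(cleaned))
```

Lowering droptol keeps more terms and raising it gives a more parsimonious model, which is why the help text warns against setting :spiketol below droptol.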
Variables created if options selected:

%numar - Number of AR factors
%numma - Number MA factors
%numdif - Number difference factors
**********************************************
Defined if %numar > 0
%arparms - AR parameters
%arse - SE of AR parameters
%arord - AR orders
%narfact - Number of parameters in each factor
**********************************************
Defined if %numma > 0
%maparms - MA parameters
%mase - SE of MA parameters
%maord - MA orders
%nmafact - Number of MA parameters in each factor
**********************************************
Defined if %numdif > 0
%diford - Dif Orders (6 element array)
**********************************************
%coef - constant, ar parameters, ma parameters
%se - Coefficient Standard Errors
%t - Coefficient t scores
%cname - Coefficient names
         123456
         AR - 1
         AR - 2
         MA - 1
         MA - 2   give info on the factor
%corder - Coefficient order
Defined if Forecasting
*********************************************
%fcast - Vector of forecasts
%foreobs - Vector of Forecast obs
%fse - Forecast standard error
%fpsi - Forecast psi weights
%nres - nob -(max(arorder, maorder)+2)
%res - Residual vector of length %nres.
%resobs - Observation # of residual
%y - Y vector lined up same as %res.
%yhat - Estimated y
%yvar - Y variable name
%rss - Residual sum of squares
%sumabs - Sum of |e(t)|
%maxabs - Maximum |e(t)|

Notes: If :ar or :ma is found, auto identification will not be performed. If auto identification is used, the beginning values will often be close to the final values because of the "hidden" identification estimation runs. The switch :printsteps will show these estimations although usually this is not needed.
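The residual summaries listed among the saved variables above (%nres, %rss, %sumabs, %maxabs) are straightforward to reproduce once the aligned y and estimated y vectors are available. The Python sketch below shows the arithmetic only. The function names and the numbers are hypothetical, and it assumes the residual is defined as y minus estimated y.

```python
def n_residuals(nob, arorder, maorder):
    """Residual count as given in the help text: nob - (max(ar, ma) + 2)."""
    return nob - (max(arorder, maorder) + 2)


def residual_summary(y, yhat):
    """Return rss, sumabs and maxabs for residuals e(t) = y(t) - yhat(t)."""
    res = [actual - fitted for actual, fitted in zip(y, yhat)]
    return {
        "rss": sum(e * e for e in res),        # residual sum of squares
        "sumabs": sum(abs(e) for e in res),    # sum of |e(t)|
        "maxabs": max(abs(e) for e in res),    # maximum |e(t)|
    }


# Hypothetical aligned vectors (illustrative numbers only).
y = [10.0, 12.0, 11.0, 13.0]
yhat = [9.0, 12.5, 11.0, 15.0]

summary = residual_summary(y, yhat)
print(summary)
print(n_residuals(100, 2, 1))
```

Comparing the residual sum of squares across candidate models fitted to the same differenced series is one simple way to discuss which of two ARIMA models "works better".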
The following statement will detect if the program ran:

if(kind(%res).eq.-99)then;
call print('AUTOBJ failed');
endif;

Example # 1 Identify the Gas model:

b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(rtest);
/$
/$ This roottol setting forces no differencing
/$
/$ call autobj(gasout :print :nac 24 :npac 24
/$             :roottol .99 :autobuild );
/$ This turns off differencing
call autobj(gasout :print :nac 24 :npac 24 :nodif
            :autobuild );
call rtest(%res,gasout,48);
/$ Default let program decide
call autobj(gasout :print :nac 24 :npac 24
/$          :printsteps :spiketol 2.0
            :autobuild );
call rtest(%res,gasout,48);
b34srun;

Example # 2 Identify Retail Data

b34sexec options ginclude('b34sdata.mac') member(retail); b34srun;
b34sexec matrix;
call loaddata;
call load(rtest);
call autobj(applance :autobuild :seasonal 12 :nac 36
            :print :assumptions
/$
/$ maxtry limits model
/$
            :printsteps :maxtry 2
/$          :forecast index(20,norows(applance))
            );
call names(all);
call tabulate(%cname,%corder,%coef,%se,%t);
call print(%yvar,%numar,%numma,%numdif);
if(%numdif.ne.0)call print(%diford);
if(%numar.ne.0) call print(%narfact,%arord,%arparms,%arse);
if(%numma.ne.0) call print(%nmafact,%maord,%maparms,%mase);
b34srun;

SAS help follows

* arima(0,1,2) Model: one regular difference, MA terms at lags 1 and 2;
proc arima;
identify var=c(1) noprint;
estimate q=2;
forecast lead=10;
run;

* arima(2,1,0) Model: one regular difference, AR terms at lags 1 and 2;
proc arima;
identify var=c(1) noprint;
estimate p=2;
forecast lead=10;
run;

* arima(1,0,2) Model: no differencing, AR(1) with MA terms at lags 1 and 2;
proc arima;
identify var=c noprint;
estimate p=1 q=2;
forecast lead=10;
run;

Project

For those selecting this option, the project will consist of the individual student selecting 2 to 3 series and fitting ARIMA models to the series. These series must be related to some interesting topic that is relevant.
After fitting ARIMA models the student should next relate these series using regression methods. In your write-up of the project, briefly outline the economic theory you are basing your results on, the techniques that you are using, and the results obtained. The objective of this project is to test how well you can apply the theory that you have learned to data you have selected. A major objective is for you to be able to show a completed econometric study at a job interview. The paper should be 15-20 pages typed. In the past many students have used corrected copies of this paper in the job interview process with great success. Being able to complete a research project in econometrics will really set you apart from other job seekers.

Students wishing to do this option must submit a 1/2 page proposal by the end of the 10th week. Project papers are due at the end of the 15th week. With special approval, the paper can be a team project, but in this case a longer and more extensive project on the order of 40 pages is required.