Nov. 6, 2002 LAB #6 ECON 240A L. Phillips
Exploratory Data Analysis, Scatterplots, Regression and ANOVA

I. This first example uses the Anscombe data set, four data files of eleven observations each on a dependent and an explanatory variable. Open the data file in EViews and select the four variables x1, x2, x3, x4, along with y1, y2, y3, and y4. Go to the view menu and choose open selected, one window, one group. You should see the eight variables by observation in the spreadsheet view.

A common practice is to rush ahead and run a regression. This can often be fatal to understanding the relationship between the variables. For example, go to the quick menu and select estimate equation. In the equation specification box type y1 c x1, and hit the OK button. Note that the estimated intercept is 3.0, the estimated slope is 0.5, and the coefficient of determination is 0.666. For diagnostics, in the equation window, go to the view menu and select actual, fitted, residual: graph. Now repeat this procedure for each of the other three data sets. Are you enlightened yet?

As an alternative exploratory procedure, return to the workfile window. In the main EViews menu, select quick: graph, type x1 y1 in the window, and hit the OK button. For graph type, choose scatter diagram, hit the option button, select the regression line box, and hit OK. Repeat this procedure for the remaining three data sets. Sometimes a picture is worth a thousand words. This is one of the points of using visual techniques in exploratory data analysis before wheeling up the heavy artillery.

II. The Returns Generating Process

The second exercise uses a data file from Chapter 18 of the text, Xr18-34, problem 18.34, p. 629. The problem states that the monthly data begin in January 1993 and end in December 1996. The authors do not use net returns, i.e., returns net of the risk-free rate.

a. Show that this affects the interpretation of the intercept but not of the slope or the coefficient of determination.
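Part (a) can be checked numerically before it is argued algebraically. The sketch below is only an illustration under stated assumptions: it uses synthetic returns (not the Xr18-34 data), a hypothetical risk-free rate `RF` that is constant over the sample, and a hand-written helper `ols` that is not part of EViews. It fits the regression once on gross returns and once on returns net of `RF`, then compares the three quantities named in the question.

```python
import random


def ols(x, y):
    """Return (intercept, slope, r_squared) for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Synthetic data (NOT the Xr18-34 file): 48 hypothetical monthly returns.
random.seed(0)
RF = 0.004  # assumed constant monthly risk-free rate (hypothetical value)
market = [random.gauss(0.01, 0.03) for _ in range(48)]
stock = [0.002 + 1.2 * m + random.gauss(0.0, 0.02) for m in market]

# Regression on gross returns, as in the text's data file.
a_gross, b_gross, r2_gross = ols(market, stock)

# Regression on returns net of the risk-free rate.
net_market = [m - RF for m in market]
net_stock = [s - RF for s in stock]
a_net, b_net, r2_net = ols(net_market, net_stock)

print(f"gross: intercept={a_gross:.5f} slope={b_gross:.5f} R2={r2_gross:.5f}")
print(f"net:   intercept={a_net:.5f} slope={b_net:.5f} R2={r2_net:.5f}")

# With a constant RF, the two intercepts differ by (1 - slope) * RF.
print(f"intercept gap={a_gross - a_net:.6f}  (1 - slope) * RF={(1 - b_gross) * RF:.6f}")
```

If the risk-free rate varied from month to month, the slope and the coefficient of determination would in general change as well, so the constant-rate assumption matters here.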
In EViews, go to the File menu, select new, and workfile. In the box, click monthly, enter 93.01 and 96.12 for the start and end dates, and click OK. In the workfile window, select procs, import, read text-Lotus-Excel. Select the Xr18-34 file in the Lab Six folder, inside the Econ240a folder in the classes folder, and hit the open button. The data begin in cell A2. Type in 2 for the number of series, and hit the OK button.

In the workfile window, select GE and s_p_index01. In the view menu, select open selected, one window, one group. You should be in the spreadsheet view. In the view menu, go to multiple graphs: line and you will see plots of each series against time. In the view menu, choose descriptive stats: common sample (since each series has 48 observations).

Close the group window and select GE. Go to the view menu, open selected, one window. In the view menu, choose descriptive statistics: histogram-stats. The coefficient of skewness, zero for a normal distribution, is not significantly different from zero, and the coefficient of kurtosis, three for a normal distribution, is not significantly different from three, as reflected by the Jarque-Bera statistic with probability 0.545. Thus the 48 monthly returns for the GE stock are not significantly different from normal. Select the stock index, Standard and Poor's Composite, and repeat this procedure. It also looks normal.

Go to the quick menu, graph, type s_p_index01 GE in the window, and hit the OK button. For graph type, choose scatter diagram, hit the option button, select the regression line box, and hit OK. Go to the quick menu, select estimate equation, and type in ge c s_p_index01.

b. Is the slope significantly different from one? What does this finding mean?

c. How much of the variation in the monthly returns to GE stock is attributable to the market?

Go to the view menu and select actual, fitted, residuals: graph. Does the equation look OK?
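The Jarque-Bera statistic reported in the histogram-stats view combines the skewness S and kurtosis K of a series of n observations as JB = (n/6)(S^2 + (K - 3)^2 / 4), and is compared with a chi-square distribution with two degrees of freedom. The following is a minimal sketch of that calculation, assuming moments computed with divisor n. The function name `jarque_bera` and the five-point sample are illustrative only, and EViews may apply a slightly different scaling when the series is a set of regression residuals.

```python
import math


def jarque_bera(sample):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments with divisor n (an assumption of this sketch).
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2), so no statistics library is needed.
    p_value = math.exp(-jb / 2.0)
    return skewness, kurtosis, jb, p_value


# Tiny illustrative sample (hypothetical, not the GE returns).
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewness, kurtosis, jb, p_value = jarque_bera(sample)
print(f"skewness={skewness:.4f} kurtosis={kurtosis:.4f} JB={jb:.4f} p={p_value:.4f}")
```

A large p-value, such as the 0.545 reported for the GE returns, means the hypothesis of normality is not rejected.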
Go to the view menu, residual tests: histogram-normality test.

d. Are the residuals normal?

III. House Price and Multiple Regression

The third exercise is from the text and is a preview of coming attractions in Econ 240B. This is the data file XM19-02, example 19.2, p. 676. There are 100 observations on homes with price, number of bedrooms, house size in square feet, and lot size in square feet. This data set was imported into EViews.

Select bedrooms, lot_size01, house_size01 and price. In the view menu, select open selected, one window, one group. You should be in the spreadsheet view. In the view menu, go to multiple graphs: scatter: matrix of all pairs. In the last row, you will see the scatter plots of price against the other three variables. It looks like price is positively associated with all three variables.

The text regresses price against an intercept, number of bedrooms, house size and lot size. However, from the scatter plots it is apparent that house size and lot size are highly correlated. Try a scatter plot of just these two variables by selecting them, going to the quick menu, graph, and selecting scatter for type with a regression line as an option. Alternatively, select these two variables, go to the view menu, and select open selected, one window, one group. You should be in the spreadsheet view. In the view menu, select correlations. The correlation coefficient is 0.994.

These two explanatory variables are highly correlated and do not provide separate variation for explaining house price. This is called multicollinearity between the explanatory variables. It causes large standard errors for the slope coefficients on these explanatory variables, and hence low t-statistics, even though the coefficient of determination is high. One remedy is to regress price against a constant, bedrooms and house size.

a.
Interpret the estimated regression coefficient on house size.

b. Interpret the estimated coefficient on bedrooms.

IV. Exercises

#1. An alternative to the regression of price on a constant, number of bedrooms and house size would be to estimate a separate intercept for two-bedroom houses, three-bedroom houses, etc., similar to the approach in Lab Five where we estimated separate intercepts for each industry in the regression of lnassets on lnsales. Select bedrooms, go to the view menu, open selected, one window. In the view menu, choose descriptive statistics: histogram-stats. The number of bedrooms ranges from two to five. Create the dummy variables and run the regression. Are there any anomalies?

#2. Percent of Household Income Spent on Lotteries

This exercise uses XR19-56. This data set was imported into EViews. Look at problem 19.56 on p. 692, and work through the questions. There is a subtle problem with the variable percent of household income spent on lotteries if it is used as the dependent variable in a regression. Use exploratory data analysis to examine the variables, especially the dependent one, and look for special features. We will discuss this example in class.

Also look at the variable for personal income and note its distribution (histogram). This skewness is typical of income data. Take the natural logarithm of income and look at its histogram.
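The effect of the log transformation on a right-skewed variable can be previewed with a short sketch. The data here are synthetic lognormal draws standing in for income, not the XR19-56 file. The parameters 10.5 and 0.8 and the helper name `skewness` are assumptions chosen only for illustration.

```python
import math
import random


def skewness(sample):
    """Coefficient of skewness using central moments with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    return m3 / m2 ** 1.5


# Synthetic "income" data (hypothetical): lognormal draws have a long right tail.
random.seed(1)
income = [random.lognormvariate(10.5, 0.8) for _ in range(1000)]

# Natural logarithm of each observation, as the exercise suggests.
log_income = [math.log(v) for v in income]

skew_raw = skewness(income)
skew_log = skewness(log_income)
print(f"skewness of income:      {skew_raw:.3f}")
print(f"skewness of log(income): {skew_log:.3f}")
```

For data of this shape the skewness of the logged series should be much closer to zero than that of the raw series, which is what the histogram of log income in EViews is expected to show visually.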