Lab 6

Nov. 6, 2002 LAB #6
ECON 240A-1
L. Phillips
Exploratory Data Analysis, Scatterplots, Regression and ANOVA
I. This first example uses the Anscombe data set, four data files of eleven observations
each on the dependent and explanatory variable. Open the data file in Eviews and select
the four variables x1, x2, x3, x4, along with y1, y2, y3, and y4. Go to the View menu,
open selected, one window, one group. You should see the eight variables by
observation in the spreadsheet view or table.
A common practice is to rush and run a regression. This can often be fatal to
understanding the relationship between the variables. For example, go to the quick menu
and select estimate equation. In the equation specification box type y1 c x1, and hit the
OK button. Note that the estimated intercept is 3.0 and the estimated slope is 0.5, and the
coefficient of determination is 0.666. For diagnostics, in the equation window, go to the
view menu and select actual, fitted, residual: graph.
Now repeat this procedure for each of the other three data sets. Are you
enlightened yet?
As an alternative exploratory procedure, return to the workfile window. In the
main Eviews menu, select quick: graph and in the window type in x1 y1 and hit the OK
button. For graph type, choose scatter diagram, and hit the option button and select the
regression line box and hit OK. Repeat this procedure for the remaining three data sets.
Sometimes a picture is worth a thousand words. This is one of the points of using visual
techniques in exploratory data analysis before wheeling up the heavy artillery.
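
For readers working outside EViews, the four regressions can be reproduced with a short Python sketch. The data values are the published Anscombe (1973) quartet; the function name ols and the dictionary layout are choices made here for illustration, not part of the lab.

```python
# Anscombe's quartet (Anscombe, 1973). x1, x2 and x3 share the same values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ols(x, y):
    """Return (intercept, slope, r_squared) for y regressed on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


results = {name: ols(x, y) for name, (x, y) in quartet.items()}
for name, (a, b, r2) in results.items():
    print(f"{name}: intercept={a:.2f}  slope={b:.2f}  R^2={r2:.2f}")
```

The four fits are numerically almost identical, which is why the scatter diagrams, not the regression output, reveal how different the data sets are.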
II. The Returns Generating Process
The second exercise uses a data file from Chapter 18 of the text, Xr18-34,
problem 18.34, p. 629. The problem states that this monthly data begins in January 1993
and ends with December 1996. The authors do not use net returns, i.e. net of the risk free
rate.
a. Show that this affects the interpretation of the intercept but not of the slope or
the coefficient of determination.
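
One way to check the claim in part a numerically is sketched below in Python. It assumes the risk-free rate is constant over the sample, and it uses synthetic returns, not the Xr18-34 data; the rate rf, the seed, and the coefficients used to generate the data are all hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope, r_squared) for y regressed on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    return ybar - slope * xbar, slope, sxy ** 2 / (sxx * syy)


rng = random.Random(0)
rf = 0.003  # hypothetical constant monthly risk-free rate (an assumption)

# Synthetic monthly returns for 48 months; not the textbook data.
market = [rng.gauss(0.01, 0.03) for _ in range(48)]
stock = [0.002 + 1.2 * m + rng.gauss(0.0, 0.02) for m in market]

gross = ols(market, stock)
net = ols([m - rf for m in market], [s - rf for s in stock])

print("gross returns: intercept=%.5f slope=%.5f R^2=%.5f" % gross)
print("net returns:   intercept=%.5f slope=%.5f R^2=%.5f" % net)
```

Subtracting the same constant from both series shifts the intercept by rf times (1 - slope) and leaves the slope and the coefficient of determination unchanged. If the risk-free rate varied from month to month, this exact invariance would no longer hold.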
In Eviews, go to the File menu, select new, and workfile. In the box, click
monthly, enter 93.01 and 96.12 for the dates, and click OK. In the workfile window,
select procs, import, read text-Lotus-Excel. Select the Xr18-34 file in the Lab Six
folder from Econ240a folder in the classes folder, and hit the open button. The data
begins in cell A2. Type in 2 for the number of series, and hit the OK button. In the
workfile window, select GE and s_p_index01. In the view menu, select open selected,
one window, one group. You should be in the spreadsheet view. In the view menu, go to
multiple graphs:line and you will see plots of each series against time. In the view
menu, choose descriptive stats: common sample (since each series has 48 observations).
Close the group window and select GE. Go to the view menu, open selected, one
window. In the view menu, choose descriptive statistics: histogram-stats. The
coefficient of skewness, zero for the normal distribution, is not significantly different from zero, and the
coefficient of kurtosis, three for a normal distribution, is not significantly different from three either, as reflected
by the Jarque-Bera statistic with probability 0.545. Thus the 48 monthly returns for the
GE stock are not significantly different from normal. Select the stock index, Standard
and Poor’s Composite, and repeat this procedure. It also looks normal. Go to the quick
menu, graph and in the window type in s_p_index01 GE and hit the OK button. For
graph type, choose scatter diagram, and hit the option button and select the regression
line box and hit OK. Go to the quick menu, select estimate equation, and type in ge c
s_p_index01.
b. Is the slope significantly different from one? What does this finding mean?
c. How much of the variation in the monthly returns to GE stock is attributable
to the market?
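
Questions b and c can be approached with the t-statistic for the null hypothesis that the slope equals one, together with the coefficient of determination. The Python sketch below shows the arithmetic on synthetic data; the function name slope_test, the seed, and the generated returns are assumptions for illustration, not the lab's data or EViews output.

```python
import math
import random


def slope_test(x, y, null_slope=1.0):
    """Return (slope, std_error, t_stat, r_squared) for y regressed on a constant and x.

    t_stat tests H0: slope == null_slope.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    sse = syy - slope * sxy                    # residual sum of squares
    std_error = math.sqrt(sse / (n - 2) / sxx)
    t_stat = (slope - null_slope) / std_error
    r_squared = 1.0 - sse / syy
    return slope, std_error, t_stat, r_squared


rng = random.Random(1)
market = [rng.gauss(0.01, 0.03) for _ in range(48)]               # synthetic
stock = [0.002 + 1.2 * m + rng.gauss(0.0, 0.02) for m in market]  # synthetic

slope, se, t, r2 = slope_test(market, stock, null_slope=1.0)
# With 48 observations there are 46 degrees of freedom; the two-sided 5%
# critical value is roughly 2.01.
print(f"slope={slope:.3f}  se={se:.3f}  t(H0: slope=1)={t:.2f}  R^2={r2:.3f}")
```

Note that the t-statistic EViews prints next to the slope tests a null of zero, not one, so the test against one has to be computed separately from the reported coefficient and standard error.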
Go to the view menu and select actual, fitted, residuals: graph. Does the equation look
OK? Go to the view menu, residual tests: histogram-normality test.
d. Are the residuals normal?
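
The Jarque-Bera test used above for the return series and for the residuals combines the skewness and kurtosis coefficients into one statistic. The Python sketch below uses the textbook form of the statistic and the chi-square distribution with two degrees of freedom for the probability; the function name and the synthetic sample are assumptions, and EViews may apply a degrees-of-freedom adjustment for regression residuals that is not reproduced here.

```python
import math
import random


def jarque_bera(values):
    """Return (skewness, kurtosis, jb_statistic, p_value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For a chi-square variable with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return skewness, kurtosis, jb, p_value


rng = random.Random(2)
sample = [rng.gauss(0.015, 0.05) for _ in range(48)]  # synthetic "returns"
s, k, jb, p = jarque_bera(sample)
print(f"skewness={s:.3f}  kurtosis={k:.3f}  JB={jb:.3f}  probability={p:.3f}")
```

A large probability, as with the 0.545 reported for GE, means normality is not rejected.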
III. House Price and Multiple Regression
The third exercise is from the text and is a preview of coming attractions in Econ
240B. This is the data file XM19-02, example 19.2, p. 676. There are 100 observations
on homes with price, number of bedrooms, house size in square feet, and lot size in
square feet. This data set was imported into EViews. Select bedrooms, lot_size01,
house_size01 and price. In the view menu, select open selected, one window, one
group. You should be in the spreadsheet view. In the view menu, go to multiple
graphs:scatter: matrix of all pairs. In the last row, you will see the scatter plots of price
against the other three variables. It looks like price is positively associated with all three
variables.
The text regresses price against an intercept, number of bedrooms, house size and
lot size. However, from the scatter plots, it is apparent that house size and lot size are
highly correlated. Try a scatter plot of just these two variables, by selecting these two,
going to the quick menu, graph, and selecting scatter for type with a regression line as
an option. Also, you can select these two variables, go to the view menu, select open
selected, one window, one group. You should be in the spreadsheet view. In the view
menu, select correlations. The correlation coefficient is 0.994. These two explanatory
variables are highly correlated and are not providing separate variation explaining house
price. This is called multicollinearity between the explanatory variables, and causes large
standard errors for the slope coefficients for these explanatory variables, and hence low
t-statistics, even though the coefficient of determination is high. One remedy is to regress
price against a constant, bedrooms and house size.
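
To get a feel for how much a correlation of 0.994 inflates the standard errors, one standard diagnostic (not used in this lab, introduced here only as an illustration) is the variance inflation factor. The Python sketch below computes it from the pairwise correlation alone, which treats house size and lot size as the only two regressors; with bedrooms also in the equation the true factor would be at least this large.

```python
import math


def vif_two_regressors(r):
    """Variance inflation factor when two regressors have correlation r."""
    return 1.0 / (1.0 - r ** 2)


r = 0.994  # correlation between house size and lot size reported above
vif = vif_two_regressors(r)
print(f"VIF = {vif:.1f}; standard errors inflated by a factor of about {math.sqrt(vif):.1f}")
```

The square root of the factor is the multiple by which each slope's standard error exceeds what it would be if the two regressors were uncorrelated.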
a. Interpret the estimated regression coefficient on house size.
b. Interpret the estimated coefficient on bedrooms.
IV. Exercises
#1. An alternative to the regression of price on a constant, number of bedrooms
and house size would be to estimate a separate intercept for two bedroom houses, three
bedroom houses, etc., similar to the approach in Lab Five where we estimated separate
intercepts for each industry in the regression of lnassets on lnsales. Select bedrooms, go
to the view menu, open selected, one window. In the view menu, choose descriptive
statistics: histogram-stats. The number of bedrooms ranges from two to five. Create the
dummy variables and run the regression. Are there any anomalies?
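
The construction of the dummy variables can be sketched in Python as follows. The bedrooms and price values are made-up numbers for illustration, not the XM19-02 data. The sketch relies on a known property: when price is regressed on a full set of dummies with no constant, each estimated intercept equals the mean price of that group.

```python
# Hypothetical data for illustration only (price in thousands of dollars).
bedrooms = [2, 3, 3, 4, 5, 2, 4, 3]
price = [100.0, 150.0, 170.0, 210.0, 260.0, 120.0, 190.0, 160.0]

# One 0/1 dummy variable per bedroom count.
levels = sorted(set(bedrooms))
dummies = {k: [1 if b == k else 0 for b in bedrooms] for k in levels}

# With all dummies and no constant, the least squares coefficient on each
# dummy is the average price within that bedroom group.
intercepts = {
    k: sum(p * d for p, d in zip(price, dummies[k])) / sum(dummies[k])
    for k in levels
}

for k in levels:
    print(f"{k} bedrooms: dummy={dummies[k]}  intercept={intercepts[k]:.1f}")
```

Including a constant together with all four dummies would make the regressors perfectly collinear, so either the constant or one dummy has to be left out.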
#2. Percent of Household Income Spent on Lotteries
This exercise uses XR19-56. This data was imported into Eviews. Look at
problem 19.56 on p. 692, and work through the questions. There is a subtle problem with
this data for the variable, percent of household income spent on lotteries, if it is used as
the dependent variable in a regression. Use exploratory data analysis to examine the
variables, especially the dependent one, and look for special features. We will discuss this
example in class. Also look at the variable for personal income and note its distribution
(histogram). This skewness is typical of income data. Take the natural logarithm of
income and look at its histogram.
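
The effect of the log transformation on a right-skewed variable can be illustrated with the Python sketch below. It draws synthetic lognormal "incomes" (the parameters and seed are arbitrary assumptions, and this is not the XR19-56 data) and compares the skewness coefficient before and after taking logs.

```python
import math
import random


def skewness(values):
    """Sample skewness coefficient: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
# Synthetic incomes, lognormal by construction (an assumption for illustration).
income = [rng.lognormvariate(10.5, 0.8) for _ in range(1000)]
log_income = [math.log(v) for v in income]

skew_raw = skewness(income)
skew_log = skewness(log_income)
print(f"skewness of income:     {skew_raw:.2f}")
print(f"skewness of ln(income): {skew_log:.2f}")
```

Real income data need not be exactly lognormal, so the histogram of the logged variable should still be inspected rather than assumed symmetric.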