DEPARTMENT OF ECONOMICS

Unit ECON 12122 Introduction to Econometrics

Notes 2: Interpreting Regression Results

These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also consult the reading given in the unit outline and the lectures.

2.1 Introduction

The most important skill that you should learn from the econometric part of this unit is how to interpret regression results. It is something to which we will return later, but it is useful to start doing it straight away. Suppose we have a linear regression model and a sample of data on the variables involved. We then wish to estimate the parameters of the model. How we estimate these parameters will be discussed in the next section of this unit. For the moment, we will simply assume that good estimates have been obtained. What are we going to do with them? What do they mean? These questions will be answered by means of two examples.

2.2 Consumption Function

The first example is the econometric model of the consumption function (see Notes 1):

C = a + bY + u

If we have a sample of observations on C and Y over time (this is called a time series), we would now write this as

C_t = a + b Y_t + u_t,   t = 1, 2, ..., n   (1)

If the data sample is UK annual data for 1948-1995, the estimated regression is

C_t = 443.79 + 0.815 Y_t + e_t,   t = 1, 2, ..., n   (2)
     (470.29)  (0.002)

Standard errors in brackets, n = 48.

There are a number of points to be made about (2).

(i) Equation (2) says that the point estimate of a (often called â) is 443.79 and the point estimate of b is 0.815. The interesting thing about the point estimate of b is that it is positive and lies between zero and one. This is clearly promising in terms of whether the mpc lies between zero and one.

(ii) Note that whereas in (1) the random variable is called u_t, in (2) it is called e_t. In (1) the random term is an unobserved disturbance; we never (or very rarely) have data on u_t. e_t is a random residual.
We can calculate the n values of e_t easily once we have estimates of a and b (given that we also have data on C_t and Y_t). It will often be useful to investigate the values of e_t, but it is important to bear in mind that they are NOT the same as u_t.

(iii) The results say that "standard errors are in brackets". Nearly always when regression results are given, there will be figures in brackets under the estimates. Unfortunately there is no agreement as to what these figures should be (although they are always intended to provide some measure of the precision of the estimates). This is why it says "standard errors in brackets". It means that the estimated standard deviation of the point estimate of the intercept (443.79) is 470.29, and the estimated standard deviation of the point estimate of the slope (0.815) is 0.002.

2.3 Interpreting the intercept and the slope of a simple regression

Equation (2) gives the results of a simple regression. It is called simple because there is only one explanatory variable.

The estimate of the intercept is usually of little interest. If the intercept were not in the model, the regression line would be forced to pass through the origin, which might not be appropriate. Sometimes the intercept is interpreted as the value of the dependent variable when the explanatory variable is zero; often this counterfactual does not make much sense. Slightly unusually, in this model the intercept does have an interpretation in relation to the average propensity to consume (see Exercise 5).

The estimate of the slope is more interesting. Precisely what it means depends on the model. The slope parameter in this model is the marginal propensity to consume (mpc). An mpc of 0.815 implies that for every extra pound of disposable income, consumption will go up by £0.815. This appears to be sensible.

Suppose we wanted to know what these estimates indicate would happen if disposable income (Y_t) increased by 10 per cent, i.e.
we wanted the elasticity of consumption with respect to disposable income. The formula for this elasticity (at a point) is

η = (∂C_t/∂Y_t) × (Y_t/C_t)

This depends on ∂C_t/∂Y_t, which in this model is b (a constant) and which is estimated as 0.815. The ratio Y_t/C_t varies over the sample, so the elasticity will also vary over the sample. We can estimate the elasticity at the point of sample means or at a particular date. The sample mean of C_t is £105,780 million and that of Y_t is £129,320 million. Thus the estimate of the elasticity at the point of the sample means is 0.996. Alternatively, the values of C_t and Y_t in 1995 were £406,375 million and £502,433 million respectively. These figures give an estimate of the elasticity in 1995 of 1.008. In terms of the consumption function, it is of interest that both these values are close to one.

2.4 What do the standard errors tell us?

As mentioned above, the standard errors of the estimated parameters are estimates of the standard deviations of those estimates and are thus an indicator of their precision. The square of a standard error gives an estimate of the variance of the estimated parameter. It is important to remember that estimates in statistics are realisations of random variables. Random variables generally have a certain distribution, and any good estimation method provides estimates with a known distribution. In the case of these estimates, provided the model satisfies certain assumptions (these will be discussed later in the unit), the estimates will have a t distribution with (n - 2) degrees of freedom, where n is the number of observations: 48 in this case. So in this example both the estimate of the intercept and the estimate of the slope have a t distribution with 46 degrees of freedom. This is a very useful fact because we can now construct a confidence interval for both estimates.
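The interval construction that follows is mechanical once the estimates and standard errors are known. A minimal Python sketch, using only the figures reported in equation (2) and a hard-coded table critical value of 2.014 (approximately the 0.975 point of the t distribution with 46 degrees of freedom); no original data is assumed:

```python
# 95% confidence interval: estimate +/- critical value * standard error.
# Estimates and standard errors are taken from equation (2) in the text;
# 2.014 is (approximately) the 0.975 point of t with 46 degrees of freedom.
T_CRIT = 2.014

def conf_interval(estimate, std_err, crit=T_CRIT):
    """Return the (lower, upper) bounds of a 95% confidence interval."""
    half_width = crit * std_err
    return estimate - half_width, estimate + half_width

a_low, a_high = conf_interval(443.79, 470.29)  # intercept
b_low, b_high = conf_interval(0.815, 0.002)    # slope (the mpc)

print(f"a: [{a_low:.2f}, {a_high:.2f}]")  # roughly [-503.37, 1390.95]
print(f"b: [{b_low:.3f}, {b_high:.3f}]")  # roughly [0.811, 0.819]
```

In practice a statistics library would supply the exact critical value for any degrees of freedom; the tables (and hence the hard-coded 2.014) are what these notes assume.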
Using the tables of the t distribution and taking a 95% confidence interval, we find that the confidence intervals are

443.79 ± 470.29 × 2.014 for a, or -503.37 ≤ a ≤ 1,390.95
0.815 ± 0.002 × 2.014 for b, or 0.811 ≤ b ≤ 0.819

(2.014 is in fact the 0.975 point of the t distribution with 45 degrees of freedom, the nearest entry in the tables; the value for 46 degrees of freedom is almost identical.)

Thus we can see that the intercept has a comparatively large 95% confidence interval, reflecting the fact that its standard error is large, whereas the estimate of the mpc has a much smaller 95% confidence interval, reflecting the greater precision with which this parameter has been estimated. We are now in a position to answer the question with which we started this example: does the mpc lie between zero and one? These estimates suggest that, for this data sample at least, we can say with a high degree of confidence that it does.

2.5 Are these regression results any good?

It appears that these regression results entirely confirm Keynes' theory that consumption is a linear function of disposable income and that the slope parameter (the mpc) lies between zero and one. So far so good. But are these regression results reliable? Are they really telling us something about the UK economy and the behaviour of British consumers in this sample period? If they are, this is important information for policy makers and other agents in the economy. If not, they are likely to be misleading. It is crucial to ask these questions when considering regression results, and a large part of econometrics is devoted to answering these kinds of questions. At the moment, two ways of tackling this problem are suggested, both of them graphical.

(i) The model ASSUMES that the relationship between consumption and disposable income is linear. Is this true? A scatter diagram of C_t against Y_t provides one way of answering this question (see lecture).

(ii) Although we do not observe the random disturbances u_t, we can observe the regression residuals e_t. These can be graphed over time.
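One simple numerical complement to graphing the residuals is their first-order sample autocorrelation: a value far from zero suggests the residuals are not independent. A sketch with made-up residual series (the actual residuals are not reproduced in these notes):

```python
# First-order sample autocorrelation of a residual series:
# the correlation between e_t and e_{t-1}, computed about the sample mean.
def autocorr1(e):
    n = len(e)
    mean = sum(e) / n
    num = sum((e[t] - mean) * (e[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in e)
    return num / den

# A smoothly drifting (hence strongly positively autocorrelated) series ...
smooth = [1.0, 1.4, 1.9, 2.1, 1.8, 1.2, 0.5, -0.3, -0.9, -1.2]
# ... versus an alternating (negatively autocorrelated) one.
jagged = [1.0, -1.1, 0.9, -1.0, 1.2, -0.8, 1.1, -0.9, 1.0, -1.2]

print(autocorr1(smooth) > 0.5)   # True: strong positive dependence
print(autocorr1(jagged) < -0.5)  # True: strong negative dependence
```

Residuals from a regression like (2) that drift slowly above and below zero, as in the first series, are the typical symptom of the dependence discussed next.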
In the linear regression model, the disturbances u_t are often assumed to be independent of each other. Are the regression residuals e_t also independent? We will discuss the problem of "non-independent" residuals later in the unit. For the moment we simply note that the observed pattern of residuals suggests that they are probably not independent, and that this raises questions about whether these regression results are really quite as good as they might seem.

2.6 Production Function Example

In lecture 1 we discussed an example of an econometric model of production using the Cobb-Douglas production function, in order to test the hypothesis of constant returns to scale. The regression model was

y = A + α1 k + α2 l + u

where y = log(Y), A = log(α0), k = log(K), l = log(L), and the random disturbance is now called u, with E(u | k, l) = 0. If we estimate the parameters of this model on annual UK data for manufacturing, 1972-1995, we obtain the following results:

y_t = 2.776 + 0.284 k_t + 0.007 l_t + e_t,   t = 1, 2, ..., n   (3)
     (5.030)  (0.499)    (0.251)

standard errors in brackets, n = 24, e_t is the regression residual.

Equation (3) is an example of some estimates of a multiple regression. It is called multiple because there is more than one explanatory variable. As before, we will not pay much attention to the estimate of the intercept; it is the estimates of the slopes which are much more important. Are these estimates the sort of estimates which we would expect? To answer this question we have to consider the model which we are estimating. Recall that this is the Cobb-Douglas production function. For this production function, if the industry (or industries) concerned are competitive and pay both capital and labour inputs their marginal products, then the slope parameters (α1, α2) have the interpretation that they are the shares of total output paid to each factor. Thus we expect the slopes to be positive and less than one.
Both the point estimates satisfy these requirements. The point estimate of the capital parameter (0.284) appears to be reasonable; however, the point estimate of the labour parameter (0.007) is rather low. The sum of the two point estimates (0.291) is not very close to one, suggesting that the hypothesis of constant returns to scale may not be true for UK manufacturing in this sample period.

Turning to the standard errors, they are both fairly large in relation to the estimated parameters. In multiple regression the estimated parameters have a t distribution with (n - k) degrees of freedom, where n is the number of observations and k is the number of parameters in the model (or alternatively the number of explanatory variables plus one for the intercept). Thus in this case both point estimates have a t distribution with 21 degrees of freedom. Using these facts we can construct 95% confidence intervals for both parameters. They are

0.284 ± 0.499 × 2.08, or -0.754 ≤ α1 ≤ 1.322
0.007 ± 0.251 × 2.08, or -0.515 ≤ α2 ≤ 0.529

Both these confidence intervals include zero and some negative numbers. The estimates are sufficiently imprecise to include values which we know cannot be correct. The confidence interval for α1 also includes one, so the question of whether α1 + α2 = 1 is, at the moment, unresolved. Testing this hypothesis requires techniques which will be introduced later in the course, so we will leave this question to one side for the moment. The point of this discussion is that the comparatively wide confidence intervals suggest that these regression results may also be flawed in some way. For instance, on the basis of these estimates we cannot reject the hypothesis that α1 = 0, or the hypothesis that α2 = 0. This is not necessarily a problem, but it does raise a question about whether these regression results are very useful. Again the best method of investigating what is going on is to look at graphs of the variables and the residuals.
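The same interval arithmetic applies in the multiple regression case. A sketch using only the estimates and standard errors from equation (3), with the table value 2.08 for the 0.975 point of t with 21 degrees of freedom:

```python
# 95% confidence interval for a slope: estimate +/- crit * standard error.
# Estimates and standard errors are from equation (3); 2.08 is the 0.975
# point of the t distribution with n - k = 24 - 3 = 21 degrees of freedom.
T_CRIT = 2.08

def conf_interval(estimate, std_err, crit=T_CRIT):
    half_width = crit * std_err
    return estimate - half_width, estimate + half_width

for name, est, se in [("alpha1 (capital)", 0.284, 0.499),
                      ("alpha2 (labour)", 0.007, 0.251)]:
    low, high = conf_interval(est, se)
    # An interval containing zero means that parameter cannot be
    # distinguished from zero at the 5% level.
    print(f"{name}: [{low:.3f}, {high:.3f}], contains 0: {low < 0 < high}")
```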
In models where there is more than one explanatory variable, scatter diagrams are not always informative. However, we can examine how these three variables and the residuals behave over the sample period. (For a discussion of these graphs and their implications see the lectures.)

2.7 Confidence Intervals and Hypothesis Tests

In both the examples given above, we began with point estimates and then used the standard errors of the estimated parameters and knowledge of their distribution to construct confidence intervals. We can also use the same information to test hypotheses about the parameters of interest.

In the consumption function case we might wish to test the hypothesis that b = 1. Thus

H0: b = 1
H1: b < 1

This is a one-sided test. The test statistic is

(1 - 0.815) / 0.002 = 92.5

This is distributed as t with 46 degrees of freedom. The 95% critical value for a one-sided test is 1.68, and thus the null hypothesis (H0) is rejected.

In the production function example, if we wished to test the hypothesis that α1 = 1, we could do so by testing the null against a two-sided alternative:

H0: α1 = 1
H1: α1 ≠ 1

The test statistic is

(1 - 0.284) / 0.499 = 1.435

This has a t distribution with 21 degrees of freedom. The 95% critical value is 2.08. We cannot reject H0.

From this it can be seen that confidence intervals and two-sided hypothesis tests are very similar. A (100 - α) per cent confidence interval gives the range of two-sided null hypotheses that will not be rejected at the α per cent significance level.
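The two test statistics above can be checked in a few lines. A sketch assuming only the estimates, standard errors, and table critical values quoted in the text (and following the text's sign convention of computing (hypothesised value - estimate) / standard error):

```python
# t statistic for testing H0: parameter = value0, given a point
# estimate and its standard error, as in sections 2.4 and 2.7.
def t_stat(estimate, std_err, value0):
    return (value0 - estimate) / std_err

# Consumption function: H0: b = 1 against H1: b < 1 (one-sided),
# critical value 1.68 (t with 46 degrees of freedom, from tables).
t_b = t_stat(0.815, 0.002, 1.0)
print(round(t_b, 1), t_b > 1.68)      # 92.5 True  -> reject H0

# Production function: H0: alpha1 = 1 against a two-sided alternative,
# critical value 2.08 (t with 21 degrees of freedom, from tables).
t_a1 = t_stat(0.284, 0.499, 1.0)
print(round(t_a1, 3), abs(t_a1) > 2.08)  # 1.435 False -> cannot reject H0
```

Note how the second conclusion matches the confidence interval in section 2.6: one lies inside the 95% interval for α1, so the two-sided test at the 5% level cannot reject H0: α1 = 1.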