Undergraduate Econometrics
The following section contains the paragraphs:
(1.1) Introduction
(1.2) The random disturbances
(1.3) The choice of regressors
(1.4) A brief review of central concepts in statistics
(1.1) Introduction
Before we start it is natural to ask the question: “What is econometrics?” Strange as it may seem, there is no generally accepted answer to this query. Responses vary from the silly “Econometrics is what econometricians do” to the more sturdy “Econometrics is the study of the application of statistical methods to the analysis of economic phenomena.” This confusion stems from the fact that econometricians wear many different hats. First and foremost they are economists, capable of utilizing economic theory to improve their empirical analyses of the problems they address. At times they are applied mathematicians, formulating relevant economic theory in ways that make it appropriate for statistical analysis. At times they are accountants, concerned with finding and collecting economic data and relating theoretical economic variables to observable variables. At times they are applied statisticians, spending hours with the computer trying to estimate economic relationships or to predict economic events. However, econometricians are not merely applied statisticians. While econometricians have their base in economics or economic theory, applied statisticians have their base in statistical theory.
The early econometricians held the guiding idea that economic theories or hypotheses had limited value if they could not be confronted with empirical data. Therefore, they set out to bridge the gap between theoretical and empirical economics. Using statistical methods as the principal tool, they wanted to ascertain the validity as well as the strength of the various theories. Nowadays, econometrics has spread into every branch of economics. Almost every field abounds with empirical analyses.
For example, in macroeconomics we can be interested in the relation between consumption, private incomes, the interest rate, the unemployment rate, etc. In labour market economics we might be interested in explaining how the wage rate depends upon the workers’ education and skill, the firms’ technical equipment, etc. In microeconomics we might be interested in explaining how the demand for a certain good depends on its price, the prices of alternative goods and the consumers’ incomes. We realize that the list of possible applications can be made indefinitely long.
Reading such a list might give us the impression that performing econometric analyses is easy work. But this is not true. In the first place, it is impossible or at least very difficult to carry out controlled experiments in economics. Hence, we cannot reproduce an experiment under identical conditions if we want to check the results obtained in a specific experiment. In controlled experiments the design of the experiment determines which variable is the dependent variable and which variables are independent or causal variables. In econometrics we have to use data generated by agents’ market behaviour. People’s market behaviour is often influenced by a multitude of factors, and the clear stimulus-response pattern characteristic of experiments in the natural sciences (experimental data) will usually be absent in data generated by people’s market behaviour (we often call such data non-experimental data). Indeed, with market data it is often not easy to decide which is the dependent (endogenous) variable and which are the independent (exogenous) variables. Although the distinction between cause and effect variables is no longer obvious from ‘the design of the experiment’, it can often be justified by economic theory or by economic reasoning in general. But, of course, sometimes we have to take courageous decisions.
Broadly speaking, econometric modeling aims at achieving two goals:
(i) To estimate or predict reliably one endogenous variable, given one or more exogenous variables.
(ii) To obtain a causal explanation of an endogenous variable as a function of one or more exogenous variables.
This means that we strive to obtain permanent or stable relationships. That is, we wish to find something more than a more or less coincidental co-variation between a set of variables. We all know that a high correlation between two variables does not imply a causal relation between the variables. We have all heard about the Danish study that found a positive correlation between the number of stork nests and the number of baby births in Copenhagen, but nobody would hold that there is a causal relation between these two variables. Quite often a high correlation between two variables is generated by a latent (unobservable) variable. In that case the correlation we observe is superficial or spurious.
Formally, we can have the following situation: $X$ and $Y$ are the observable variables, $Z$ is the unobservable latent variable, and $\varepsilon_1$ and $\varepsilon_2$ are the random disturbances. Suppose we have the simple structure:
(1.1.1) $X = \alpha_0 + \alpha_1 Z + \varepsilon_1$
(1.1.2) $Y = \beta_0 + \beta_1 Z + \varepsilon_2$
The causal relations are between $Z$ and $X$ and between $Z$ and $Y$. To keep things simple we suppose that the latent variable $Z$ and the random disturbances $\varepsilon_1$ and $\varepsilon_2$ are stochastically independent with means 0 and variances $\sigma_Z^2$, $\sigma_1^2$ and $\sigma_2^2$. The correlation coefficient $\rho$ between $X$ and $Y$ is defined by:
(1.1.3) $\rho = \operatorname{cov}(X, Y) / \sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}$
Using the relations (1.1.1) and (1.1.2) and the stochastic independence, we have $\operatorname{cov}(X, Y) = \alpha_1 \beta_1 \sigma_Z^2$, $\operatorname{var}(X) = \alpha_1^2 \sigma_Z^2 + \sigma_1^2$ and $\operatorname{var}(Y) = \beta_1^2 \sigma_Z^2 + \sigma_2^2$, so that:
(1.1.4) $\rho = \dfrac{\alpha_1 \beta_1 \sigma_Z^2}{\sqrt{\alpha_1^2 \sigma_Z^2 + \sigma_1^2}\,\sqrt{\beta_1^2 \sigma_Z^2 + \sigma_2^2}}$
From (1.1.4) we observe that if the variances of the disturbances $\varepsilon_1$ and $\varepsilon_2$ are small ($\sigma_1^2$ and $\sigma_2^2$ are small), the latent variable $Z$ will generate a correlation coefficient between $X$ and $Y$ that is close to one in absolute value, although neither variable causes the other.
(1.2) The random disturbances.
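The part the disturbances play in the latent-variable structure (1.1.1)-(1.1.2) can be checked numerically before we discuss them further. The following is only a minimal sketch in Python with NumPy: the slope names alpha1 and beta1, the intercepts 1.0 and 2.0, the normal distributions and the sample size are all assumptions made for illustration and are not taken from the text. It compares the correlation implied by (1.1.4) with the correlation in simulated data as the disturbance variances shrink.

```python
import numpy as np


def latent_correlation(alpha1, beta1, var_z, var_e1, var_e2):
    """Correlation between X and Y implied by formula (1.1.4)."""
    numerator = alpha1 * beta1 * var_z
    denominator = np.sqrt(alpha1**2 * var_z + var_e1) * np.sqrt(beta1**2 * var_z + var_e2)
    return numerator / denominator


def simulated_correlation(alpha1, beta1, var_z, var_e1, var_e2, n=200_000, seed=0):
    """Sample correlation of X and Y generated from (1.1.1)-(1.1.2).

    Normal draws and the intercepts 1.0 and 2.0 are illustrative assumptions;
    the intercepts do not affect the correlation.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(var_z), n)    # latent variable Z
    e1 = rng.normal(0.0, np.sqrt(var_e1), n)  # disturbance in the X equation
    e2 = rng.normal(0.0, np.sqrt(var_e2), n)  # disturbance in the Y equation
    x = 1.0 + alpha1 * z + e1
    y = 2.0 + beta1 * z + e2
    return np.corrcoef(x, y)[0, 1]


if __name__ == "__main__":
    # Shrinking disturbance variances push the correlation towards one,
    # even though X never enters the equation for Y and vice versa.
    for var_e in (4.0, 1.0, 0.01):
        theory = latent_correlation(1.0, 1.0, 1.0, var_e, var_e)
        sample = simulated_correlation(1.0, 1.0, 1.0, var_e, var_e)
        print(f"disturbance variance {var_e:5.2f}: (1.1.4) gives {theory:.3f}, simulation gives {sample:.3f}")
```

With unit slopes and a unit variance for Z, formula (1.1.4) gives 0.2, 0.5 and about 0.990 for disturbance variances 4, 1 and 0.01; the simulated values should lie close to these figures.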
A major distinction between economists and econometricians is the latter’s concern with the random disturbance terms. An economist will specify, for example, that consumption is a function of income, and write $C = f(Y)$, where $C$ is consumption and $Y$ is income. An econometrician will claim that this relation must also include a disturbance term, and may alter the equation to read $C = f(Y) + \varepsilon$, where $\varepsilon$ is the disturbance term. Without the disturbance term the relation is said to be exact or deterministic; with the disturbance term it is said to be stochastic (random). The inclusion of the disturbance term is justified along the following lines.
(i) It summarizes the influence (impact) on the endogenous (response) variable of innumerable random factors. Although income might be the major determinant of the level of consumption, it is certainly not the only determinant. Other variables, such as the interest rate and the consumer’s wealth, may also have a systematic impact on consumption. Their omission challenges our specification and our interpretation of the regression coefficients. In addition to these systematic factors, however, the level of consumption is also influenced by innumerable purely random events, such as weather variations, taste changes, etc. The influence of these latter variables is assumed to be highly irregular or random, so the disturbance term is included to represent the net impact of a large number of such small independent shocks.
(ii) Measurement errors. It may be the case that the variable being explained cannot be measured accurately, either because of data collection difficulties or because it is inherently immeasurable and a proxy variable must be used instead. The disturbance term will in these cases also represent measurement errors.
However, measurement errors in the dependent variable do not create serious problems, although they will increase the variance of $\varepsilon$. Measurement errors in the exogenous variables, on the other hand, raise serious problems in identifying the impact of the explanatory variables.
(iii) Human indecisiveness. Some people believe that human behaviour is such that actions taken under identical circumstances will differ in a random way. The disturbance term can be viewed as representing this randomness in human behaviour.
Generally, in regression analysis one aims at obtaining disturbances that are small and irregular. Small, because in the specified regression the disturbance terms play the role of a remainder that is left unexplained by the analysis. Irregular, because any trace of regularity in the disturbance terms may in principle be regarded as a systematic tendency that has been left unexplained. Any systematic tendency should be explicitly specified in the regression equation.
(1.3) The choice of regressors (independent variables).
In econometric modeling econometricians always face the problem: “Which variables should be taken into account as regressors or independent variables?” This is a major problem in any econometric modeling, since good behaviour of the disturbance terms presupposes a satisfactory answer to this question. At this stage of the modeling process we have to mobilize all relevant economic theory as well as our general experience relating to the task at hand. That said, it must be admitted that any guiding principles in this respect will be vague. As an illustration, suppose we wish to explain the demand for a certain consumer good ($Y$). From demand analysis we know that relevant regressors will be the price of the good, the prices of other goods entering the consumers’ budgets, the consumers’ incomes, the size of the households, etc.
In this way we can make a list of $K$ regressors $(X_1, X_2, \ldots, X_K)$. Thereafter we have to decide how a household with a given vector of explanatory variables $(X_1, X_2, \ldots, X_K)$ determines its demand for the good ($Y$). A common assertion in economics is that the household determines $Y$ in a joint maximization process. Hence, the demand $Y$ depends upon $(X_1, X_2, \ldots, X_K)$ but also on the household’s preferences (‘utility function’). As a result of this optimization process the household’s demand is given by:
(1.3.1) $\hat{Y} = f(X_1, X_2, \ldots, X_K)$
where $f$ depends on the household’s preferences. Even with this approach we will observe that a household with identical vectors of regressors $(X_1, X_2, \ldots, X_K)$ at two different points in time will, in general, demand different quantities of the good. We realize that regardless of how much we try to improve our modeling, there will always be a discrepancy between the demand we observe and the demand we can explain by our model. The deviation is, of course, the disturbance term which we explained above. Hence, for observation $i$,
(1.3.2) $\varepsilon_i = Y_i - \hat{Y}_i$ = observed demand $-$ predicted demand
In the literature $\hat{Y}$ is also called the expected demand given $(X_1, X_2, \ldots, X_K)$, or
(1.3.3) $\hat{Y} = E(Y \mid X_1, X_2, \ldots, X_K) = f(X_1, X_2, \ldots, X_K)$
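The relations (1.3.1)-(1.3.3) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the linear form of f, its coefficients, the two regressors (price and income), the normal disturbances and every function name below are hypothetical choices made for the example and are not part of the text. It shows a household with an identical regressor vector demanding different quantities on different occasions, while the disturbances average out around the expected demand.

```python
import numpy as np


def expected_demand(price, income):
    """Hypothetical f(X1, X2) from (1.3.1); the coefficients are illustrative only."""
    return 10.0 - 2.0 * price + 0.5 * income


def observed_demand(price, income, n, seed=1, sd=1.0):
    """Draw n observed demands Y = f(X) + disturbance for one fixed regressor vector.

    Normal disturbances with mean 0 and standard deviation sd are an assumption.
    """
    rng = np.random.default_rng(seed)
    return expected_demand(price, income) + rng.normal(0.0, sd, size=n)


if __name__ == "__main__":
    price, income = 3.0, 20.0                # identical regressors on every occasion
    y_hat = expected_demand(price, income)   # predicted (expected) demand

    # Two occasions with the same regressors give two different observed demands.
    y_two = observed_demand(price, income, n=2)
    print("predicted demand:", y_hat)
    print("observed demands:", y_two)
    print("disturbances (1.3.2):", y_two - y_hat)

    # Over many occasions the disturbances average out to roughly zero, which is
    # what reading the predicted demand as a conditional expectation (1.3.3) requires.
    y_many = observed_demand(price, income, n=100_000)
    print("mean disturbance over many occasions:", np.mean(y_many - y_hat))
```

In this sketch the predicted demand is the same number on every occasion because the regressors do not change; all variation in observed demand comes from the disturbance term.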