Undergraduate Econometrics

This section contains the following paragraphs:
(1.1) Introduction
(1.2) The random disturbances
(1.3) The choice of regressors
(1.4) A brief review of central concepts in statistics
1.1 Introduction
Before we start, it is natural to ask: “What is econometrics?” Strange as it may
seem, there is no generally accepted answer to this question. Responses vary
from the silly “Econometrics is what econometricians do” to the more sober
“Econometrics is the study of the application of statistical methods to the analysis of
economic phenomena.”
This confusion stems from the fact that econometricians wear many different hats.
First and foremost they are economists, capable of utilizing economic theory to
improve their empirical analyses of the problems they address. At times they are
applied mathematicians, formulating relevant economic theory in ways that make it
appropriate for statistical analysis. At times they are accountants, concerned with
finding and collecting economic data and relating theoretical economic variables to
observable variables. At times they act as applied statisticians, spending hours at the
computer trying to estimate economic relationships or to predict economic events.
However, econometricians are not simply applied statisticians: while econometricians
have their base in economics and economic theory, applied statisticians have their
base in statistical theory.
The early econometricians held that economic theories or hypotheses had limited
value if they could not be confronted with empirical data. Therefore, they set out to
bridge the gap between theoretical and empirical economics. Using statistical methods
as the principal tool, they wanted to ascertain the validity as well as the strength of
the various theories.
Nowadays, econometrics has spread into every branch of economics, and almost every
field abounds with empirical analyses. For example, in macroeconomics we can be
interested in the relation between consumption, private incomes, the interest rate, the
unemployment rate, etc. In labour market economics we might be interested in
explaining how the wage rate depends upon the workers’ education and skill, the firm’s
technical equipment, etc. In microeconomics we might be interested in explaining the
dependence of the demand for a certain good on its price, the prices of alternative
goods and the consumers’ incomes. The list of possible applications can be made
indefinitely long. Reading such a list might give us the impression that
performing econometric analyses is easy work. But this is not true.
In the first place, it is very difficult, and often impossible, to carry out controlled
experiments in economics. Hence, we cannot reproduce an experiment under identical
conditions when we want to check the results obtained in a specific experiment. In
controlled experiments the design of the experiment determines which variable is the
dependent variable and which variables are independent or causal variables. In
econometrics we have to use data generated by agents’ market behaviour. People’s
market behaviour is often influenced by a multitude of factors, and the clear
stimulus-response pattern characteristic of experiments in the natural sciences
(experimental data) will usually be absent from data generated by people’s market
behaviour (we often call such data non-experimental data). Indeed, with market data it
is often not easy to decide which is the dependent (endogenous) variable and which
are the independent (exogenous) variables. Although the distinction between cause and
effect variables is no longer obvious from ‘the design of the experiment’, it can often be
justified by economic theory or by economic reasoning in general. But, of course,
sometimes we have to take courageous decisions.
Broadly speaking, with econometric modeling we aim at achieving two goals:
(i) To estimate or predict reliably one endogenous variable, given one or more
exogenous variables.
(ii) To obtain a causal explanation of an endogenous variable as a function of
one or more exogenous variables.
This means that we strive to obtain permanent or stable relationships. That is, we
wish to find something more than a more or less accidental co-variation between a set
of variables. We all know that a high correlation between two variables does not imply
a causal relation between the variables. We have all heard about the Danish study that
found a positive correlation between the number of stork nests and the number of baby
births in Copenhagen, but nobody would hold that there is a causal relation between
these two variables.
Quite often a high correlation between two variables is generated by a latent
(unobservable) variable. In that case the correlation we observe is superficial or spurious.
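Formally, we can have the following situation: X and Y are the observable variables,
Z is the unobservable latent variable, and $\varepsilon_1$ and $\varepsilon_2$ are the
random disturbances. Suppose we have the simple structure:
(1.1.1)  $X = \alpha_0 + \alpha_1 Z + \varepsilon_1$
(1.1.2)  $Y = \beta_0 + \beta_1 Z + \varepsilon_2$
The causal relations run from Z to X and from Z to Y. To keep things simple, we
suppose that the latent variable Z and the random disturbances $\varepsilon_1$ and
$\varepsilon_2$ are stochastically independent with means 0 and variances
$\sigma_Z^2$, $\sigma_1^2$ and $\sigma_2^2$. We know that the correlation coefficient
$\rho$ between X and Y is defined by:
(1.1.3)  $\rho = \dfrac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$
Using the relations (1.1.1)-(1.1.2) and the stochastic independence, we find
$\operatorname{cov}(X, Y) = \alpha_1 \beta_1 \sigma_Z^2$,
$\operatorname{var}(X) = \alpha_1^2 \sigma_Z^2 + \sigma_1^2$ and
$\operatorname{var}(Y) = \beta_1^2 \sigma_Z^2 + \sigma_2^2$, so that:
(1.1.4)  $\rho = \dfrac{\alpha_1 \beta_1 \sigma_Z^2}{\sqrt{\alpha_1^2 \sigma_Z^2 + \sigma_1^2}\,\sqrt{\beta_1^2 \sigma_Z^2 + \sigma_2^2}}$
From (1.1.4) we observe that if the variances of the disturbances $\varepsilon_1$ and
$\varepsilon_2$ are small ($\sigma_1^2$ and $\sigma_2^2$ are small), the latent
variable Z will generate a high correlation (in absolute value) between X and Y, even
though neither X nor Y has any causal effect on the other.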
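The claim that a latent variable alone can produce a high correlation between X and Y can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes normally distributed Z and disturbances (the text only requires independence and zero means), and the coefficient values and the names a0, a1, b0, b1 for the intercepts and slopes on Z are purely illustrative.

```python
import math
import random


def theoretical_corr(a1, b1, var_z, var_e1, var_e2):
    """Correlation between X and Y implied by formula (1.1.4)."""
    numerator = a1 * b1 * var_z
    denominator = math.sqrt(a1 ** 2 * var_z + var_e1) * math.sqrt(b1 ** 2 * var_z + var_e2)
    return numerator / denominator


def sample_corr(xs, ys):
    """Sample correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_corr(a0, a1, b0, b1, var_z, var_e1, var_e2, n=50_000, seed=0):
    """Simulate X and Y driven by a common latent Z (normality is an assumption)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, math.sqrt(var_z))
        e1 = rng.gauss(0.0, math.sqrt(var_e1))
        e2 = rng.gauss(0.0, math.sqrt(var_e2))
        xs.append(a0 + a1 * z + e1)  # X depends on Z only, never on Y
        ys.append(b0 + b1 * z + e2)  # Y depends on Z only, never on X
    return sample_corr(xs, ys)


# Small disturbance variances: strong (here negative) spurious correlation.
rho_theory = theoretical_corr(2.0, -1.5, 1.0, 0.1, 0.1)
rho_sim = simulated_corr(1.0, 2.0, 3.0, -1.5, 1.0, 0.1, 0.1)
# Large disturbance variances: the correlation almost disappears.
rho_noisy = theoretical_corr(1.0, 1.0, 1.0, 100.0, 100.0)
print(rho_theory, rho_sim, rho_noisy)
```

With these illustrative values the simulated correlation should lie close to the theoretical one, and its sign follows the sign of the product of the two slopes on Z.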
1.2 The random disturbances
A major distinction between economists and econometricians is the latter’s
concern with the random disturbance terms. An economist will specify, for example,
that consumption is a function of income, and write $C = f(Y)$, where C is
consumption and Y is income. An econometrician will claim that this relation must
also include a disturbance term, and may alter the equation to read
$C = f(Y) + \varepsilon$, where $\varepsilon$ is the disturbance term. Without the
disturbance term the relation is said to be exact or deterministic; with the
disturbance term it is said to be stochastic (random).
The inclusion of the disturbance term is justified along the following lines.
(i) Omitted random factors. The disturbance term summarizes the influence (impact)
on the endogenous (response) variable of innumerable random factors.
Although income might be the major determinant of the level of
consumption, it is certainly not the only determinant. Other variables, such as the
interest rate and the consumer’s wealth, may also have a systematic impact on
consumption. Their omission challenges our specification and interpretation of the
regression coefficients. In addition to these systematic factors, however, the level
of consumption is also influenced by innumerable purely random events, such as
weather variations, taste changes, etc. The influence of these latter variables is
assumed to be highly irregular or random, so the disturbance term is included to
represent the net impact of a large number of such small independent shocks.
(ii) Measurement errors. It may be the case that the variable being explained
cannot be measured accurately, either because of data collection difficulties or
because it is inherently immeasurable and a proxy variable must be used instead.
The disturbance term will in these cases also represent measurement errors.
Measurement errors in the dependent variable do not create serious
problems, although they will increase the variance of $\varepsilon$. Measurement
errors in the exogenous variables, on the other hand, raise serious problems of
identifying the impact of the explanatory variables.
(iii) Human indecisiveness. Some people believe that human behaviour is such
that actions taken under identical circumstances will differ in a random way. The
disturbance term can be viewed as representing this randomness in human
behaviour.
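The asymmetry noted in point (ii), between measurement errors in the dependent variable and in an exogenous variable, can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not the notes' own example: it assumes a linear relation with a hypothetical true slope of 2, normally distributed variables, and classical (independent, zero-mean) measurement errors. Under these assumptions the slope estimated from a noisily measured regressor is pulled towards zero by the factor var(x) / (var(x) + var(error)).

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def measurement_error_demo(beta=2.0, n=50_000, seed=1):
    """Compare slope estimates with clean data and with noisy y or noisy x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # true regressor, variance 1
    y = [1.0 + beta * xi + rng.gauss(0.0, 0.5) for xi in x]     # true relation plus disturbance
    y_noisy = [yi + rng.gauss(0.0, 1.0) for yi in y]            # error in the dependent variable
    x_noisy = [xi + rng.gauss(0.0, 1.0) for xi in x]            # error in the regressor, variance 1
    return {
        "clean": ols_slope(x, y),
        "error_in_y": ols_slope(x, y_noisy),
        "error_in_x": ols_slope(x_noisy, y),
    }


slopes = measurement_error_demo()
print(slopes)
```

In this setup the estimates from clean data and from a noisy dependent variable should both be close to the true slope, while the estimate from the noisy regressor should be close to half of it, because the regressor and its measurement error have equal variances.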
Generally, in regression analysis one aims at obtaining disturbances that are
small and irregular. Small, for in the regression specified the disturbance terms
play the role of a remainder that is left unexplained by the analysis. Irregular, for
any trace of regularity in the disturbance terms may in principle be regarded as a
systematic tendency which is accordingly left unexplained. Any systematic
tendency should be explicitly specified in the regression equation.
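The requirement that disturbances be irregular can be made concrete with a small simulation. The sketch below rests on assumptions that are not in the notes: the data are generated from a hypothetical relation with a quadratic term, a linear model is fitted first, and the residuals are then checked for a leftover systematic pattern. The quadratic term is added by regressing the first-stage residuals on the part of x squared that x cannot explain (a partialling-out shortcut that gives the same residuals as the full regression).

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def ols_residuals(xs, ys):
    """Residuals from a simple OLS regression of ys on a constant and xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [y - intercept - slope * x for x, y in zip(xs, ys)]


def corr(u, v):
    """Sample correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


rng = random.Random(7)
n = 5_000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
# Hypothetical data-generating process with a systematic quadratic term.
y = [1.0 + 2.0 * xi + 1.5 * xi ** 2 + rng.gauss(0.0, 0.3) for xi in x]

# Misspecified model: y explained by a constant and x only.
res_linear = ols_residuals(x, y)
# The part of x^2 that is not linearly explained by x.
x2_partial = ols_residuals(x, [xi ** 2 for xi in x])
# A clearly non-zero correlation signals a systematic tendency left in the residuals.
corr_before = corr(res_linear, x2_partial)

# Adding x^2 to the model: regress the old residuals on the partialled-out x^2.
res_full = ols_residuals(x2_partial, res_linear)
corr_after = corr(res_full, x2_partial)

var_before = mean([r ** 2 for r in res_linear])
var_after = mean([r ** 2 for r in res_full])
print(corr_before, corr_after, var_before, var_after)
```

Once the quadratic term is specified explicitly, the residuals should be both smaller (lower variance) and free of the pattern, which is the "small and irregular" behaviour the paragraph above asks for.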
1.3 The choice of regressors (independent variables)
In econometric modeling, econometricians always face the question: “Which
variables should be taken into account as regressors or independent variables?” This is
a major problem in any econometric modeling, since good or preferred behaviour of
the disturbance terms presupposes a satisfactory answer to this question. At this stage
of the modeling process we have to mobilize all relevant economic theory as well as
our general experience relating to the task at hand. That said, it must be
admitted that any guiding principles in this respect will be vague.
As an illustration, suppose we wish to explain the demand for a certain consumer
good (Y). From demand analysis we know that relevant regressors will be the price of
the good, the prices of other goods entering the consumers’ budgets, the consumers’
incomes, the size of the households, etc. In this way we can make a list of K
regressors $(X_1, X_2, \ldots, X_K)$. Thereafter we have to decide how a household with
a given vector of explanatory variables $(X_1, X_2, \ldots, X_K)$ determines its demand
for the good (Y). A standard assertion in economics is that the household will
determine Y in a joint maximization process. Hence, the demand Y depends upon
$(X_1, X_2, \ldots, X_K)$ but also on the household’s preferences (‘utility function’).
As a result of this optimization process the household’s demand is given by:
(1.3.1)  $\hat{Y} = f(X_1, X_2, \ldots, X_K)$
where f depends on the household’s preferences.
Following this approach, we will observe that a household with identical vectors of
regressors $(X_1, X_2, \ldots, X_K)$ at two different points in time will typically
demand different quantities of the good. Regardless of how much we try to improve our
modeling, there will always be a discrepancy between the demand we observe and the
demand we can explain by our model. This deviation is, of course, the disturbance
term explained above. Hence, for observation i,
(1.3.2)  $\varepsilon_i = Y_i - \hat{Y}_i$ = observed demand - predicted demand
In the literature $\hat{Y}$ is also called the expected demand given
$(X_1, X_2, \ldots, X_K)$, or
(1.3.3)  $\hat{Y} = E[Y \mid X_1, X_2, \ldots, X_K] = f(X_1, X_2, \ldots, X_K)$
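The relation between observed demand, predicted demand and the disturbance can be illustrated with a short simulation. This is a sketch under assumptions that do not come from the notes: the function f is given a hypothetical linear form in two regressors (price and income), the coefficients are invented for illustration, and the disturbance is assumed normal with mean zero.

```python
import random


def f(price, income):
    """Hypothetical systematic demand; form and coefficients are illustrative only."""
    return 10.0 - 2.0 * price + 0.5 * income


def observed_demand(price, income, rng, sd=1.0):
    """Observed demand = systematic part plus a random disturbance (assumed normal)."""
    return f(price, income) + rng.gauss(0.0, sd)


rng = random.Random(42)
price, income = 3.0, 20.0  # the same regressor vector at every observation

# The same household characteristics observed many times give different demands.
draws = [observed_demand(price, income, rng) for _ in range(20_000)]

y_hat = f(price, income)                   # predicted (expected) demand
disturbances = [y - y_hat for y in draws]  # observed minus predicted demand

mean_y = sum(draws) / len(draws)
mean_disturbance = sum(disturbances) / len(disturbances)
print(draws[:3], y_hat, mean_y, mean_disturbance)
```

Individual observations differ although the regressors are identical, while their average should settle near the predicted demand, which is what the interpretation of the predicted value as a conditional expectation amounts to.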