Are the disturbances in the regression equations normally distributed?

We have already seen that, when deriving the distributions of the test statistics and the bounds
of the confidence intervals for the regression coefficients, we relied heavily on the assumption
that the random errors in the regression equation are normally distributed.
Although much of the discussion of choosing appropriate model specifications was motivated
by finding sensible economic models, it aimed at the same time at finding
specifications in which we could reasonably assume that the random disturbances were
normally and independently distributed.
We thus understand that it is important to have methods to ascertain if the disturbances
entering our regressions are normally distributed. The proper way to settle this issue is, of
course, to design a test capable of disclosing the true properties of the random errors. As
always with statistical tests we have to start with constructing a suitable test statistic.
Since we want to conclude on the basis of this test whether we should reject or not reject the
null hypothesis that the random errors are normally distributed, our test statistic must in some
way reflect essential characteristic properties of a normally distributed variable.
Assuming that the disturbances $\varepsilon_i$ are normal with mean zero and variance $\sigma^2$, we know at
once that the density of $\varepsilon_i$ is symmetric around zero. For a symmetric distribution we know
that every moment of odd order about the mean is equal to zero. In the present application this
means that:
\[ E(\varepsilon^3) = E(\varepsilon^5) = \dots = E(\varepsilon^{2j+1}) = 0, \qquad j = 1, 2, 3, \dots \tag{1.1} \]
Therefore, it is customary to use the third central moment as a measure of the skewness of a
distribution. Hence, if we know that $E(\varepsilon^3) \neq 0$, the errors of the regression equation cannot
be normally distributed. In order to eliminate the impact of different measurement units in the
variables, one usually measures the skewness of a distribution by the characteristic:
\[ \gamma_1 = \frac{\mu_3}{\sigma^3} \tag{1.2} \]
where $\mu_3 = E(\varepsilon^3)$ and $\sigma$ denotes the standard deviation of the variable.
A sensible test statistic in this case should depend on the skewness $\gamma_1$ in some way. We note
again that if the disturbances $\varepsilon_i$ are normally distributed, then $\gamma_1 = 0$.
The second characteristic we want to include in the test statistic is a parameter that
describes the peak of the density. The point here is that a normal density has a rather
“flat” peak as illustrated in the figure below.
As a measure of the degree of flattening of a density we usually use the characteristic named
kurtosis, defined by:
\[ \gamma_2 = \frac{\mu_4}{\sigma^4} - 3 \tag{1.3} \]
where $\mu_4 = E(\varepsilon^4)$ and $\sigma$ is again the standard deviation.
For a normal distribution $\gamma_2 = 0$. Note that Hill et al. use $\mu_4 / \sigma^4$ as their definition of
kurtosis, but the standard definition is that given by (1.3) (see e.g. K. Sydsæter,
“Matematisk formelsamling for økonomer”, p. 181).
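To see why the skewness alone is not enough, it may help to work through a distribution that is symmetric but not normal. The example below is an illustration added here, not taken from the text: it assumes a variable $U$ that is uniformly distributed on $[-a, a]$ and evaluates the skewness $\mu_3/\sigma^3$ and the kurtosis $\mu_4/\sigma^4 - 3$ of definition (1.3).

```latex
\begin{align*}
\sigma^2 &= E(U^2) = \frac{1}{2a}\int_{-a}^{a} u^2 \, du = \frac{a^2}{3}, \\
\mu_3 &= E(U^3) = \frac{1}{2a}\int_{-a}^{a} u^3 \, du = 0, \\
\mu_4 &= E(U^4) = \frac{1}{2a}\int_{-a}^{a} u^4 \, du = \frac{a^4}{5}, \\
\frac{\mu_3}{\sigma^3} &= 0, \qquad
\frac{\mu_4}{\sigma^4} - 3 = \frac{a^4/5}{a^4/9} - 3 = \frac{9}{5} - 3 = -\frac{6}{5}.
\end{align*}
```

The skewness is zero, exactly as for a normal variable, so a statistic built on the skewness alone could not detect this departure from normality. The kurtosis differs from the normal value of zero, which is why both characteristics enter the test statistic.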
The Jarque-Bera statistic
The Jarque-Bera statistic is based on the sample values of the two characteristics (1.2) and
(1.3). Let us consider the regression:
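\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i, \qquad i = 1, 2, \dots, n \tag{2.1} \]
OLS regression provides us with estimators $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$. Now, the residuals:
\[ e_i = Y_i - \hat{Y}_i, \qquad i = 1, 2, \dots, n \tag{2.2} \]
are estimates of $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$.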
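Using (2.2), an estimator of the skewness (1.2) is given by:
\[ \hat{\gamma}_1 = \frac{\hat{\mu}_3}{\hat{\sigma}^3} = \frac{\frac{1}{n}\sum_{i=1}^{n} e_i^3}{\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right)^{3/2}} \tag{2.3} \]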
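and similarly an estimator of the kurtosis (1.3) is given by:
\[ \hat{\gamma}_2 = \frac{\hat{\mu}_4}{\hat{\sigma}^4} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n} e_i^4}{\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right)^{2}} - 3 = k - 3 \tag{2.4} \]
where $k = \hat{\mu}_4 / \hat{\sigma}^4$. (Here $k$ denotes this sample ratio, not the number of regressors in the regression equation.)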
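The two estimators can be computed directly from the OLS residuals. The following is a minimal sketch in plain Python, under several assumptions that are not in the text: a regression with a single regressor plus an intercept, simulated data with normal errors, and the hypothetical function names `ols_residuals` and `residual_skewness_kurtosis`. The moments are taken about zero because OLS residuals from a regression with an intercept sum to zero.

```python
import random


def ols_residuals(x, y):
    """Residuals e_i = Y_i - Yhat_i from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # OLS slope estimate
    b0 = y_bar - b1 * x_bar    # OLS intercept estimate
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def residual_skewness_kurtosis(e):
    """Return (skewness estimate, kurtosis estimate minus 3) from residuals e."""
    n = len(e)
    m2 = sum(ei ** 2 for ei in e) / n   # estimate of sigma^2
    m3 = sum(ei ** 3 for ei in e) / n   # estimate of mu_3
    m4 = sum(ei ** 4 for ei in e) / n   # estimate of mu_4
    skewness = m3 / m2 ** 1.5
    k = m4 / m2 ** 2
    return skewness, k - 3.0


# Simulated data (an assumption for illustration): Y = 1 + 2 X + normal error.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

e = ols_residuals(x, y)
skewness_hat, kurtosis_hat = residual_skewness_kurtosis(e)
print("skewness estimate:", skewness_hat)
print("kurtosis estimate (minus 3):", kurtosis_hat)
```

With normal errors both estimates should be close to zero in a sample of this size, although they will not be exactly zero.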
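In order to test the null hypothesis:
\[ H_0: \varepsilon_i \text{ is } N(0, \sigma^2) \text{ and identically, independently distributed for all } i, \]
Jarque and Bera defined the test statistic:
\[ JB = \frac{n}{6}\left( S^2 + \frac{(k - 3)^2}{4} \right), \qquad \text{where } S^2 = \hat{\gamma}_1^{\,2} \tag{2.5} \]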
When $H_0$ is true the JB statistic (2.5) has a $\chi^2$ distribution (asymptotically) with 2 degrees
of freedom. Thus, if the estimated value $\widehat{JB}$ of the test statistic is “small” we will not reject
$H_0$. We will only reject the null hypothesis when the observed value of $\widehat{JB}$ is sufficiently
large. What decision should be taken in a particular case depends on the chosen level of
significance $\alpha$ and the implied threshold value which we find from tables of the
$\chi^2$ distribution with 2 degrees of freedom.
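The decision rule can be sketched in a few lines of Python. The sketch assumes the usual form of the statistic, JB = (n/6)(S^2 + (k - 3)^2/4), and uses the fact that a chi-square variable with 2 degrees of freedom has upper tail probability exp(-x/2), so the threshold at significance level alpha is -2 ln(alpha) and no table is needed. The function names and the residual values are hypothetical and chosen only for illustration.

```python
import math


def jarque_bera(e):
    """Return (JB statistic, asymptotic p-value) for a list of residuals e."""
    n = len(e)
    m2 = sum(ei ** 2 for ei in e) / n
    m3 = sum(ei ** 3 for ei in e) / n
    m4 = sum(ei ** 4 for ei in e) / n
    s = m3 / m2 ** 1.5    # skewness estimate
    k = m4 / m2 ** 2      # kurtosis ratio (equals 3 under normality)
    jb = n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)   # upper tail of chi-square with 2 d.f.
    return jb, p_value


def chi2_2df_critical_value(alpha):
    """Threshold c such that P(chi-square with 2 d.f. > c) = alpha."""
    return -2.0 * math.log(alpha)


# Hypothetical residuals: symmetric, but far too flat to look normal.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0] * 20

jb, p_value = jarque_bera(residuals)
critical = chi2_2df_critical_value(0.05)   # approximately 5.99
reject = jb > critical
print("JB:", jb, "critical value:", critical, "reject H0:", reject)
```

For these residuals the skewness estimate is zero, so the whole statistic comes from the kurtosis term, and the null hypothesis of normality is rejected at the 5 percent level.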