Asia Pacific Journal of Research
Vol: I Issue XVI, August 2014
ISSN: 2320-5504, E-ISSN-2347-4793
LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE CENSORED
REGRESSION MODEL
B. Sanjith, Department of Statistics
Manonmaniam Sundaranar University, Tirunelveli – 627 012
R. Elangovan, Department of Statistics, Annamalai University
Annamalai Nagar – 608 002.
ABSTRACT
One of the most extensively and exhaustively discussed methods among the statistical tools available for the analysis of data is "regression." In the classical approach to the regression problem, the objective is to minimize the sum of squared deviations between the observed and the predicted values of the dependent variable; this method is known as the least squares method, and it uses classical optimization methods and generalized inverses. Another method minimizes the mean absolute deviation between the predicted and observed values of the dependent variable. This problem is known as L1-norm minimization or the Least Absolute Deviations (LAD) method in the literature. A third method considered in the literature is the Chebyshev criterion of minimizing the maximum of the absolute deviations between the observed and the predicted values of the dependent variable. The methods of minimizing the sums of absolute and squared deviations from hypothesized linear models have vied for statistical favour for more than two and a half centuries. While least squares enjoys certain well-known optimality properties within strictly Gaussian parametric models, the least absolute error (LAE) estimator is widely recognized as a superior robust method, especially well suited to longer-tailed error distributions. In this paper, least absolute deviations estimation for the censored regression model is discussed. The analytical results are substantiated with numerical illustrations.
1. Introduction
One of the most extensively and exhaustively discussed methods among the statistical tools available for the analysis of data is "regression." In the classical approach to the regression problem, the objective is to minimize the sum of squared deviations between the observed and the predicted values of the dependent variable; this method is known as the least squares method, and it uses classical optimization methods and generalized inverses. Another method minimizes the mean absolute deviation between the predicted and observed values of the dependent variable. This problem is known as L1-norm minimization or the Least Absolute Deviations (LAD) method in the literature. A third method considered in the literature is the Chebyshev criterion of minimizing the maximum of the absolute deviations between the observed and the predicted values of the dependent variable. The methods of minimizing the sums of absolute and squared deviations from hypothesized linear models have vied for statistical favour for more than two and a half centuries. While least squares enjoys certain well-known optimality properties within strictly Gaussian parametric models, the least absolute error (LAE) estimator is widely recognized as a superior robust method, especially well suited to longer-tailed error distributions. The problem of estimating linear relationships when all variates are subject to error, although of frequent occurrence in science, is rarely discussed in statistical courses or texts because most statisticians regard it as insoluble. This attitude arises from considering only a mathematical formulation which gives rise to an insoluble problem. Least squares regression has dominated the statistical literature for a long time. A number of estimation procedures which are more robust to departures from the usual least squares assumptions have been discussed in the recent statistical literature. The Minimum Sum of Absolute Errors (MSAE) regression is considered a robust alternative to least squares regression by a number of authors.
Many of the important recent advances in econometric methods pertain to limited dependent variable
models – that is, regression models for which the range of the dependent variable is restricted to some
subset of the real line. Such prior restrictions quite commonly arise in cross-section studies of economic
behaviour; often, for some fraction of individuals in a sample, implicit non-negativity or other inequality
constraints are binding for the variable of interest. In a regression model, an inequality constraint for the
dependent variable results in a corresponding bound on the unobservable error terms, this bound being
systematically related to the value of the regression function. Hence, the mean of the restricted error term is
not zero, and the usual conditions for consistency of least squares estimation will not apply.
The censored regression model can be written in the form yt = max{0, x't β0 + ut}, t = 1, ..., T, where the dependent variable yt and the regressor vector xt are observed for each t, while the parameter vector β0 and the error term ut are unobserved. It will be presumed throughout that estimation of β0 is the primary object of the statistical analysis. The LAD estimator for the censored regression model minimizes the sum of absolute deviations of yt from max{0, x't β} over all β in the parameter space B (say). Algebraically, the censored LAD estimator β̂T minimizes

S_T(β) = (1/T) Σ_{t=1}^{T} |yt − max{0, x't β}|,  β ∈ B.

As for the standard regression model, LAD estimation may be computationally burdensome for the censored regression model, because the function to be minimized is not continuously differentiable; nevertheless, it provides a consistent alternative to likelihood-based procedures when prior information on the parametric form of the error density is unavailable. In this paper it is proposed to study LAD regression for the censored regression model. The analytical results are substantiated with numerical illustrations.
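As an illustration, the following minimal sketch computes the censored LAD estimate on simulated data. The design, the t(3) errors, and the use of a derivative-free Nelder-Mead search (in place of the specialized iterative algorithms used in the literature, since S_T(β) is not continuously differentiable) are our own illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T = 500
x = np.column_stack([np.ones(T), rng.uniform(1, 10, T)])  # intercept + one regressor
beta_true = np.array([-2.0, 0.8])
u = rng.standard_t(df=3, size=T)                # longer-tailed errors
y = np.maximum(0.0, x @ beta_true + u)          # dependent variable censored at zero

def S_T(beta):
    """Mean absolute deviation of y from the censored regression function."""
    return np.mean(np.abs(y - np.maximum(0.0, x @ beta)))

res = minimize(S_T, x0=np.zeros(2), method="Nelder-Mead")
print("censored LAD estimate:", res.x)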
2. Recent Developments in LAD Regression Theory
Strong convergence of LAD estimates in a censored regression model has been discussed by
Fang and Zhao (2005). Change-point estimation for censored regression model has been discussed by
Wang and Zhao (2007). Reducing bias in parameter estimates from stepwise regression in proportional
hazards regression with right-censored data has been discussed by Soh and Harrington (2008).
Approximation by randomly weighting method in censored regression model has been discussed by Wang
et al. (2009). LS versus LAD estimation in regression model has been discussed by Eakambaram and
Elangovan (2009). On the least absolute error estimation of linear regression model with auto-correlated
errors has been discussed by Eakambaram and Elangovan (2010). Asymptotic analysis of high-dimensional LAD regression with LASSO has been discussed by Gao and Huang (2010). A comparison of different methods for LAD regression is discussed by Chen et al. (2010). Estimation of censored panel-data models with slope heterogeneity has been discussed by Abrevaya and Shen (2014). Penalized LAD regression for single-index models has been discussed by Yang et al. (2014). The adaptive L1-penalized LAD regression for partially linear single-index models has been discussed by Yang and Yang (2014). Estimation of panel data regression models with two-sided censoring or truncation has been discussed by Alan et al. (2014).
3. Regression Methods
3.1. Simple Linear Regression
The functional relationship of Y and X is of the following form

Y = β0 + β1X + ε,   ... (1)

which is known as the simple linear regression of Y on X. β0 and β1 are called parameters and are to be estimated. Equ. (1) means that for a given Xi, a corresponding Yi consists of β0 + β1Xi and an εi by which an observation may fall off the true regression line. On the basis of the information available from the observations we would like to estimate β0 and β1. The term ε is a random variable and is called the "error term". From equ. (1) one can write Yi − β0 − β1Xi = εi. Finding β0 and β1 from (Xi, Yi), i = 1, 2, ..., n is called estimation of the parameters. There are different methods of obtaining such estimates. In the following sections we will consider methods minimizing (i) the sum of squared deviations, (ii) the mean absolute deviation, and (iii) the maximum of absolute deviations.
3.2. Minimizing Sum of Squared Deviations (Least Squares Regression)
This method is based on choosing β0 and β1 so as to minimize the sum of squares of the vertical deviations of the data points from the fitted line. The sum of squared deviations (SSD) from the line is

SSD = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} (Yi − β0 − β1Xi)²   ... (2)

Then we would choose the estimates of β0 and β1 so that the sum of squared deviations in equ. (2) is minimum. Differentiating equ. (2) with respect to β0 and β1 and setting the resultant partial derivatives to zero, we have

∂SSD/∂β0 = −2 Σ_{i=1}^{n} (Yi − β0 − β1Xi) and ∂SSD/∂β1 = −2 Σ_{i=1}^{n} Xi (Yi − β0 − β1Xi),

and hence

Σ_{i=1}^{n} (Yi − β0 − β1Xi) = 0   ... (3)
Σ_{i=1}^{n} Xi (Yi − β0 − β1Xi) = 0   ... (4)

From equ. (3) and equ. (4) we have

β0 n + β1 Σ_{i=1}^{n} Xi = Σ_{i=1}^{n} Yi and β0 Σ_{i=1}^{n} Xi + β1 Σ_{i=1}^{n} Xi² = Σ_{i=1}^{n} XiYi   ... (5)

Equ. (5) are called the normal equations. From equ. (5) we have

β̂1 = (Σ XiYi − Σ Xi Σ Yi / n) / (Σ Xi² − (Σ Xi)² / n)

and β̂0 = Ȳ − β̂1 X̄, where Ȳ = Σ_{i=1}^{n} Yi / n and X̄ = Σ_{i=1}^{n} Xi / n. The β̂0 and β̂1 obtained in this fashion are called the least-squares estimates of β0 and β1, respectively. Thus we can write the estimated regression equation as Ŷ = β̂0 + β̂1X, which is called the prediction equation.
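A short numerical sketch of these closed-form estimates follows; the small data set is hypothetical.

import numpy as np

def least_squares_line(X, Y):
    """Closed-form least-squares estimates from the normal equations (5)."""
    n = len(X)
    b1 = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / n) / \
         (np.sum(X**2) - np.sum(X)**2 / n)
    b0 = np.mean(Y) - b1 * np.mean(X)
    return b0, b1

X = np.array([1., 2., 3., 4., 5.])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = least_squares_line(X, Y)
print(f"Yhat = {b0:.3f} + {b1:.3f} X")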
3.3. Minimizing Mean Absolute Deviations (MINMAD) Regression
For the simple linear regression model, namely

Y = β0 + β1X + ε,

we have observed data on X and Y given by (Xi, Yi), i = 1, 2, ..., n. We are interested in finding the coefficients β0 and β1 such that

(1/n) Σ_{i=1}^{n} |Yi − β0 − β1Xi|   ... (6)

is minimized. The expression in equ. (6) is known as the mean absolute deviation between the observed and the predicted values of the dependent variable. It can easily be seen that minimizing equ. (6) is the same as minimizing

Σ_{i=1}^{n} |Yi − β0 − β1Xi|   ... (7)

which is the sum of the absolute deviations. First we shall consider the problem with an additional restriction on β0 and β1, namely minimizing the sum of the absolute deviations subject to Y0 = β0 + β1X0 for a given pair (X0, Y0).
3.4. Minimizing Maximum of Absolute Deviations (MINMAXAD) Regression
Let us consider the case of estimating the parameters β0, β1 in equ. (1) by minimizing the maximum of absolute deviations. Under this criterion, the objective is to find β0 and β1 such that (β0, β1) is a solution to

Minimize_{β0, β1} max_{1 ≤ i ≤ n} |Yi − β0 − Xiβ1|.

In the literature there is considerable work on such models with and without intercept terms. A linear-programming sketch of this criterion is given below.
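The following sketch casts the minimax criterion as a linear program in the variables (β0, β1, t), minimizing t subject to |Yi − β0 − β1Xi| ≤ t; the use of scipy's linprog is our own illustrative choice, not an algorithm prescribed by the paper.

import numpy as np
from scipy.optimize import linprog

def minmaxad_line(X, Y):
    """Chebyshev (minimax) fit: minimize t subject to |Y_i - b0 - b1*X_i| <= t."""
    n = len(X)
    c = np.array([0.0, 0.0, 1.0])                         # variables: [b0, b1, t]
    A_ub = np.vstack([
        np.column_stack([-np.ones(n), -X, -np.ones(n)]),  #  Y_i - b0 - b1 X_i <= t
        np.column_stack([ np.ones(n),  X, -np.ones(n)]),  # -(Y_i - b0 - b1 X_i) <= t
    ])
    b_ub = np.concatenate([-Y, Y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None)])
    return res.x[0], res.x[1]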
3.5. Other Estimators
The least squares regression criterion is to minimize Σ_{i=1}^{n} di², where di is the deviation between the observed and predicted values of the dependent variable corresponding to the ith observation. Now notice that Σ_{i=1}^{n} di² can be thought of as the variance of the deviations, namely (1/n) Σ (di − d̄)², where d̄ = (1/n) Σ_{i=1}^{n} di = (1/n) Σ_{i=1}^{n} (Yi − Ŷi). But d̄ = 0 for the least squares regression line, as it passes through the point (X̄, Ȳ). Thus Σ (di − d̄)² can be equivalently written as (1/n) Σ_{i<j} (di − dj)², and minimizing Σ_{i<j} (di − dj)² is a possible criterion for finding a regression line. In the case d̄ = 0 this is equivalent to the least-squares regression criterion. By replacing (di − dj)² by |di − dj| in the above we obtain another criterion, namely the sum of the absolute differences between deviations.
3.6. Minimizing Sum of Absolute Differences between Deviations (MINSADBED) Regression
Consider the estimation of the parameters β0, β1 in equ. (1) by minimizing the sum of absolute differences between deviations; that is,

Minimize Σ_{i<j} |di − dj|.

From the expressions for di and dj we have

Σ_{i<j} |di − dj| = Σ_{i<j} |(Yi − β0 − β1Xi) − (Yj − β0 − β1Xj)| = Σ_{i<j} |(Yi − Yj) − β1(Xi − Xj)|.

Let us denote Yij = Yi − Yj and Xij = Xi − Xj, i < j. Then we have Σ_{i<j} |di − dj| = Σ_{i<j} |Yij − β1Xij|. The objective function is then similar to that of MINMAD regression, with β1 alone to be estimated, as the β0 term cannot be estimated using this method. So we can apply the procedure developed for MINMAD with the n(n − 1)/2 data points (Yij, Xij) to resolve this problem. One way of obtaining an estimate of β0 is to force the line to pass through (X̄, Ȳ) and take the corresponding constant term as an estimate of β0, that is, β̂0 = Ȳ − X̄β̂1; there are other ways of estimating β0. One such estimate commonly used in the literature is β̂0 = median_{i<j} (Yi + Yj)/2.
3.7. Minimizing Sum of Absolute Differences between Absolute Deviations (MINSADBAD) Regression
Let us consider the estimation of the parameters β0 and β1 in the simple regression model by minimizing the sum of absolute differences between absolute deviations; that is,

Minimize_{β0, β1} Σ_{i<j} | |di| − |dj| |,

where di = Yi − (β0 + β1Xi).
3.8. Multiple Linear Regression
Consider the multiple regression model

Y = β0 + β1X1 + ... + βp−1Xp−1 + ε   ... (8)

where X1, X2, ..., Xp−1 are known, the βj's are unknown parameters to be estimated and ε is the error term. If the Xj's are varied and n values of Y are observed, denoted by Y' = (Y1, Y2, ..., Yn), then we have

Y = Xβ + ε   ... (9)

where X = (X1, X2, ..., Xn)', X'i = (1, Xi1, ..., Xip−1) corresponds to the ith choice of the variables X1, ..., Xp−1, β' = (β0, ..., βp−1) and ε' = (ε1, ..., εn).

The least squares method of estimating β consists of minimizing Σi εi² with respect to β; that is, minimize ε'ε = ||Y − Xβ||² with respect to β. Now,

ε'ε = (Y − Xβ)'(Y − Xβ) = Y'Y − 2β'X'Y + β'X'Xβ.

Differentiating ε'ε with respect to β and equating ∂(ε'ε)/∂β to zero, we get

−2X'Y + 2X'Xβ = 0, or X'Xβ = X'Y.   ... (10)

Equ. (10) is called the normal equation(s). If X is of rank p then X'X is positive definite, and so nonsingular. Hence we have a unique solution to (10). Thus

β̂ = (X'X)⁻¹X'Y.   ... (11)

Then for any β,

(Y − Xβ)'(Y − Xβ) = [Y − Xβ̂ + X(β̂ − β)]'[Y − Xβ̂ + X(β̂ − β)]
= (Y − Xβ̂)'(Y − Xβ̂) + (β̂ − β)'X'X(β̂ − β)
≥ (Y − Xβ̂)'(Y − Xβ̂),

which shows that the minimum of (Y − Xβ)'(Y − Xβ) is (Y − Xβ̂)'(Y − Xβ̂) and is attained at β = β̂; this solution thus minimizes ε'ε.
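A minimal numerical sketch of the normal equations (10)-(11) follows; the simulated design and coefficients are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
beta = np.array([6.0, 3.0, -2.0])
Y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'X beta = X'Y; np.linalg.lstsq is numerically
# safer in practice, but this mirrors equ. (11) directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)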
3.9. Minimizing Mean Absolute Deviations
Consider the problem of minimizing Σ|di| with respect to β, where di is the deviation between the observed and predicted values of Yi for the ith observation. The problem is the same as minimizing the mean absolute deviation; it is alternatively known as the L1-norm minimization problem. The problem can be stated as follows:

Minimize Σ|di|   ... (12)
subject to Xβ + d = Y; d, β unrestricted in sign.

Writing di = d1i − d2i with d1i, d2i nonnegative, so that |di| = d1i + d2i, we can reformulate the problem as the linear program

Minimize Σ (d1i + d2i)
subject to Xβ + d1 − d2 = Y,
β unrestricted in sign, d1, d2 ≥ 0.   ... (13)
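The linear program (13) can be passed directly to an LP solver; the following sketch, using scipy's linprog, is one illustrative implementation rather than the simplex routine referenced later in the paper.

import numpy as np
from scipy.optimize import linprog

def lad_fit(X, Y):
    """LAD regression via the LP in (13): min 1'(d1 + d2) s.t. X b + d1 - d2 = Y."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])   # variables: [beta, d1, d2]
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=bounds)
    return res.x[:p]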
3.10. Least Absolute Deviation (LAD) Regression for the Censored Regression Model
As discussed in Section 1, LAD regression for censored data has been discussed by many researchers in the recent past. A more detailed discussion of LAD regression for the censored regression model and its properties can be found in Powell (1984).
4. The Model
The functional relationship of Y and X is of the following form

Y = β0 + β1X + ε,   ... (14)

which is known as the simple linear regression of Y on X; β0 and β1 are called parameters and are to be estimated. Equ. (14) means that for a given Xi, a corresponding Yi consists of β0 + β1Xi and an εi by which an observation may fall off the true regression line. On the basis of the information available from the observations we would like to estimate β0 and β1. The term εi is a random variable and is called the "error term". By equ. (14), we can write

Yi − β0 − β1Xi = εi.   ... (15)

Finding β0 and β1 from (Xi, Yi), i = 1, 2, ..., n is called estimation of the parameters. There are different methods of obtaining such estimates. In the sections that follow we will consider methods that estimate the parameters by minimizing the sum of squared deviations, the mean absolute deviation, and the maximum of absolute deviations. A more recent history of regression analysis and its applications can be seen in Draper and Smith (1981).
5. LS Approach
Consider the model

Y = Xβ + e, e ~ N(0, σ²I)   ... (16)

where Y is an n × 1 vector of observed values of the dependent variable, X is an n × k matrix of values of the independent variables, and e is a vector of random variables assumed to have a N(0, σ²I) distribution. Under LS, β̂ is the estimate which minimizes

(Y − Xβ)'(Y − Xβ)   ... (17)

and β̂H is the estimate which minimizes

(Y − Xβ)'(Y − Xβ) under H.   ... (18)

Let B = (Y − Xβ̂)'(Y − Xβ̂) and A = (Y − Xβ̂H)'(Y − Xβ̂H).   ... (19)

Then

C (A − B) / B   ... (20)

has Snedecor's F distribution under the hypothesis β = 0, where C is a proper constant.
6. LAD Approach
If the assumption that the errors are normally distributed appears to be violated, and if some prior knowledge about the error distribution is available, the maximum likelihood argument could be used to obtain estimates based on a criterion other than LS, namely minimization of the sum of absolute errors Σ_{i=1}^{n} |ei|, where the ei's are the components of e = Y − Xβ. Let β̃ denote the estimate obtained through the LAD method by minimizing Σ|ei|, and let B = Σ|ẽi|, where the ẽi's are the components of Y − Xβ̃. Let β̃H be the LAD estimate obtained by minimizing Σ|ei| subject to H, and let A = Σ|ẽiH|, where the ẽiH's are the components of Y − Xβ̃H. Rao et al. (1990) have shown that

C (A − B) / A   ... (21)

has an asymptotic χ² distribution, where C is a proper constant. For a detailed computational algorithm, refer to Arthanari and Dodge (1981); the results are given in Table 1.
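The following sketch computes both statistics of (20) and (21) for H0: β = 0 on a given (X, Y). Since the paper leaves the "proper constants" C unspecified, the degrees-of-freedom corrections below are our own assumptions, and a general-purpose optimizer stands in for the simplex algorithm of Arthanari and Dodge (1981).

import numpy as np
from scipy.optimize import minimize

def test_stats(X, Y):
    n, k = X.shape
    # LS: F-type statistic C(A - B)/B, with B the unrestricted SSE and A the
    # SSE under H0: beta = 0 (fitted values zero).
    b_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
    B_ls, A_ls = np.sum((Y - X @ b_ls) ** 2), np.sum(Y ** 2)
    F = ((A_ls - B_ls) / k) / (B_ls / (n - k))
    # LAD: chi-square-type statistic C(A - B)/A on the sums of absolute errors.
    b_lad = minimize(lambda b: np.sum(np.abs(Y - X @ b)),
                     np.zeros(k), method="Nelder-Mead").x
    B_lad, A_lad = np.sum(np.abs(Y - X @ b_lad)), np.sum(np.abs(Y))
    chi2 = 2 * n * (A_lad - B_lad) / A_lad      # C = 2n is an assumed choice
    return F, chi2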
7. Example
The following examples illustrate the empirical behaviour of classical least squares and LAD for the simple regression model, assuming non-normal error distributions, without using the permutation test. A simplex Linear Programming (LP) algorithm given by Arthanari and Dodge (1981) was used to compute the LAD regression estimates. For generating random numbers, the algorithm suggested by Rubinstein (1981) was used.
7.1. Weibull Distribution
The probability density function of a Weibull random variable is

f(x; λ, k) = (k/λ)(x/λ)^(k−1) e^(−(x/λ)^k) for x ≥ 0, and f(x; λ, k) = 0 for x < 0,

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution. Its complementary cumulative distribution function is a stretched exponential function. The Weibull distribution is related to a number of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2).
Algorithm (a numerical sketch follows the list):
(i) Construct the log-likelihood function L.
(ii) Obtain the gradient (first-derivative) vector with respect to λ and k.
(iii) Compute the Hessian (second-derivative) matrix with respect to λ and k.
(iv) Take initial values k(0) = 1 and λ(0) = 1.
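A sketch of steps (i)-(iv) follows; for brevity a derivative-free optimizer stands in for the explicit gradient/Hessian (Newton-Raphson) iteration described above, and the simulated sample is illustrative.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.weibull(5.0, size=1000)               # sample with k = 5, lambda = 1

def negloglik(theta):
    """Negative Weibull log-likelihood in (lambda, k)."""
    lam, k = theta
    if lam <= 0 or k <= 0:
        return np.inf
    z = x / lam
    return -(len(x) * np.log(k / lam) + (k - 1) * np.sum(np.log(z)) - np.sum(z ** k))

res = minimize(negloglik, x0=[1.0, 1.0], method="Nelder-Mead")  # start at (1, 1)
print("lambda, k =", res.x)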
7.2. Log-Gamma Distribution
The notation X ~ log-gamma(α, β) indicates that the random variable X has the log-gamma distribution with positive scale parameter α and positive shape parameter β. A log-gamma random variable X with parameters α and β has probability density function

f(x) = e^(βx) e^(−e^x / α) / (α^β Γ(β)), −∞ < x < ∞.

The probability density function with three different parameter combinations is given in Fig. 2. The cumulative distribution function, survivor function, hazard function, cumulative hazard function, inverse distribution function, moment generating function, and characteristic function on the support of X are mathematically intractable, as are the population mean, variance, skewness, and kurtosis of X.
Algorithm (a simulation sketch follows the list):
(i) Construct the log-likelihood function L.
(ii) Obtain the gradient (first-derivative) vector with respect to α and β.
(iii) Compute the Hessian (second-derivative) matrix with respect to α and β.
(iv) Obtain initial values of α and β from the closed-form method-of-moments estimates.
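For simulation, the density above implies that if G has a gamma distribution with shape β and scale α, then log G has the log-gamma(α, β) density, so sampling reduces to taking logarithms of gamma draws; the following sketch relies on that change-of-variables relation.

import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 5.0, 1.0
g = rng.gamma(shape=beta, scale=alpha, size=1000)   # G ~ Gamma(beta, alpha)
x = np.log(g)                                       # X ~ log-gamma(alpha, beta)
print(x.mean(), x.std())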
Fig.1: Weibull Distribution
Fig.2: Log-Gamma Distribution
For simulating the data and carrying out the tests to obtain the LS and LAD results, the algorithms suggested by Lawless (1982), D'Agostino and Stephens (1986), Kotz and Dorp (2004), Kroese et al. (2011), Arthanari and Dodge (1981) and Rubinstein (1981) were used; the results are shown in Table 2.
Table 1: Testing H0: β = 0 for n = 500 in simple LAD and LS regression (Y = βX + e); log-gamma errors with scale parameter α = 5, 7 and shape parameter β = 1, 2.

Errors distribution    β     LS Prob.  LS F    LAD Prob.  LAD χ²   MAD Prob.  MAD χ²
log-Gamma α=5, β=1     0.5   0.028     2.321   0.187      15.320   0.197      19.591
                       0.7   0.046     2.840   0.191      19.459   0.206      20.010
                       1.0   0.197     3.120   0.174       9.152   0.186      15.311
log-Gamma α=5, β=2     0.5   0.056     2.521   0.209      20.125   0.211      20.259
                       0.7   0.248     2.941   0.229      21.281   0.232      23.175
                       1.0   0.281     3.120   0.203      19.974   0.211      20.259
log-Gamma α=7, β=1     0.5   0.048     2.887   0.197      19.591   0.209      20.125
                       0.7   0.257     2.991   0.201      19.902   0.208      20.119
                       1.0   0.286     3.123   0.199      19.691   0.212      20.306
log-Gamma α=7, β=2     0.5   0.048     2.887   0.216      20.864   0.223      21.157
                       0.7   0.275     3.093   0.231      23.165   0.237      24.081
                       1.0   0.293     3.125   0.219      20.921   0.226      21.198
Table 2: Testing H0: β = 0 for β1 = 0.5, 0.7 and 1.0 and n = 500, 1000 in simple LAD and LS regression (Y = β0 + β1X1 + ε), where X = 1, 2, ..., 10; Weibull errors with k = 5 and λ = 1; log-gamma errors with α = 5 and β = 1.

n = 500:
Errors distribution    β1    LS 0.01  LS 0.10  LAD 0.01  LAD 0.10  MAD 0.01  MAD 0.10
Weibull k=5, λ=1       0.5   0.523    0.597    0.542     0.623     0.594     0.651
                       0.7   0.603    0.685    0.623     0.705     0.649     0.746
                       1.0   0.997    0.997    0.997     0.997     0.998     0.998
log-Gamma α=5, β=1     0.5   0.601    0.622    0.636     0.641     0.654     0.659
                       0.7   0.619    0.649    0.645     0.702     0.689     0.713
                       1.0   0.989    0.990    0.990     0.997     0.998     0.998

n = 1000:
Errors distribution    β1    LS 0.01  LS 0.10  LAD 0.01  LAD 0.10  MAD 0.01  MAD 0.10
Weibull k=5, λ=1       0.5   0.529    0.600    0.550     0.625     0.594     0.653
                       0.7   0.611    0.689    0.628     0.709     0.530     0.711
                       1.0   0.998    0.998    0.998     0.999     0.999     0.999
log-Gamma α=5, β=1     0.5   0.605    0.623    0.639     0.642     0.659     0.662
                       0.7   0.621    0.652    0.651     0.710     0.691     0.720
                       1.0   0.998    0.999    0.999     0.999     0.999     0.999
Table 3: Testing H0: β1 = β2 = β3 = 0 and H0: β3 = 0 given β1, β2, for β3 = 0.5, 0.7 and 1.0 (with β1 = β2 = 0), in multiple LAD and LS regression (Y = β0 + β1X1 + β2X2 + β3X3 + ε), for n = 500 and X uniformly distributed on (1, 10); Weibull errors with k = 5 and λ = 1; log-gamma errors with α = 5 and β = 1.

H0: β1 = β2 = β3 = 0:
Errors distribution    β3    LS 0.01  LS 0.10  LAD 0.01  LAD 0.10  MAD 0.01  MAD 0.10
Weibull k=5, λ=1       0.5   0.596    0.612    0.608     0.642     0.612     0.661
                       0.7   0.678    0.689    0.691     0.745     0.725     0.776
                       1.0   0.998    0.998    0.998     0.998     0.998     0.999
log-Gamma α=5, β=1     0.5   0.609    0.612    0.619     0.619     0.631     0.635
                       0.7   0.712    0.781    0.784     0.798     0.803     0.812
                       1.0   0.999    0.999    0.999     0.999     0.999     0.999

H0: β3 = 0 given β1, β2:
Errors distribution    β3    LS 0.01  LS 0.10  LAD 0.01  LAD 0.10  MAD 0.01  MAD 0.10
Weibull k=5, λ=1       0.5   0.599    0.619    0.612     0.649     0.620     0.665
                       0.7   0.681    0.692    0.699     0.750     0.730     0.779
                       1.0   0.999    0.999    0.999     0.999     0.999     0.999
log-Gamma α=5, β=1     0.5   0.615    0.618    0.621     0.623     0.634     0.636
                       0.7   0.720    0.785    0.789     0.801     0.809     0.815
                       1.0   0.999    0.999    0.999     0.999     0.999     0.999
For the data generated, both the F statistic and the chi-square statistic given in equ. (20) and equ. (21) were computed and their probabilities under the null hypothesis were calculated. For the latter statistic, the asymptotic distribution was assumed to hold even in small samples; in other words, the χ² test adopted is not an exact test but an asymptotic one. For computing the χ² statistic, the LAD and MAD criterion values have to be computed after computing the estimates of the parameters. For simulating the data and carrying out the tests to obtain the LAD and MAD results, Arthanari's algorithm was used. Table 1 reveals that Rao's statistic is quite sensitive when the values of the difference in population means and standard deviations are very close to each other, and also when the value of the standard deviation is greater than the maximum difference in means. Table 2 and Table 3 reveal that the LAD approach shows better results than the LS approach at the given level α.
8. Comparing LS and LAD
For a number of reasons, LS regression analysis is perhaps the single most widely used statistical technique among practitioners in industry, government and academia. Despite the high regard in which LS estimation is held in statistical theory, there are situations wherein other criteria are more appropriate for parameter estimation in simple linear models. An important problem involves the identification and handling of observations that are "outliers." Whatever the sources of the anomalies, it is usually desirable that outliers be identified and that they not have an unduly large influence on model parameter estimates. Least squares estimation falls short on both counts. A number of estimation procedures which are more robust to departures from the usual least squares assumptions have been discussed in the statistical literature: Adichi (1967), Andrews (1974), Davies (1976), Huber (1973), Wagner (1959). One method, LAD estimation, is perhaps the most promising for applied work due to a combination of robustness properties and computational ease. Some interesting results are also found in Weiss (1988).
9. LAD Estimation
The model under consideration will be of the form

Y = Xβ + ε,

where Y is an n × 1 vector of observations on the dependent variable, X is an n × p matrix of values of the p regressors, β is a p × 1 vector of parameters and ε is an n × 1 vector of random disturbances. Residuals are defined as

e = (e1, e2, ..., en)' = Y − Xβ̂,

where β̂ is the estimator of β. The Lh estimator of β is the β̂ that minimizes Σ_{i=1}^{n} |ei|^h. It is to be noted that LAD (h = 1) and OLS (h = 2) are special cases of Lh estimation.
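A minimal sketch of the Lh family follows; the heavy-tailed Cauchy errors and the use of Nelder-Mead (the h = 1 objective is not differentiable) are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def lh_fit(X, Y, h):
    """L_h estimator: beta minimizing sum |e_i|^h (h=1 is LAD, h=2 is OLS)."""
    obj = lambda b: np.sum(np.abs(Y - X @ b) ** h)
    return minimize(obj, x0=np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.uniform(1, 10, 50)])
Y = X @ np.array([6.0, 3.0]) + rng.standard_cauchy(50)    # heavy-tailed errors
print("LAD:", lh_fit(X, Y, 1), " OLS:", lh_fit(X, Y, 2))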
10. Results
The following simulation results constitute a more systematic investigation of LS versus LAD and MAD in regression models. Consider a model of the form

Yi = β1 + β2Xi2 + β3Xi3 + εi   ... (22)

in which β1 = 6.0, β2 = 3.0, β3 = −2.0. This will be referred to as the three-variable model; each of the two non-constant regressors, Xi2 and Xi3, took integer values from −3 to 3. From the disturbances, values of the dependent variable were calculated according to equ. (22), and the parameters β1, β2 and β3 were estimated by each of the four methods MAD, LAD, OLS and Generalized Least Squares (GLS). For the simulation, the algorithms suggested by D'Agostino and Stephens (1986), Kotz and Dorp (2004) and Kroese et al. (2011) have been used. Denote the sample standard deviations of β̂j when estimated by OLS, LAD and MAD as σ̂OLS(β̂j), σ̂LAD(β̂j) and σ̂MAD(β̂j), respectively. These sample standard deviations are calculated according to the usual formula

σ̂(β̂j) = [ Σ_{k=1}^{N} (β̂jk − β̄j)² / (N − 1) ]^(1/2),

where β̂jk is the kth sample estimate of βj, N is the number of sample estimates and β̄j = (1/N) Σ_{k=1}^{N} β̂jk. An obvious alternative would be to base the estimate on deviations from the true population means βj rather than from the sample means; the differences were small enough that applying this alternative method of measuring deviations always produced standard deviation estimates within 0.1 percent of σ̂(β̂j).
For each of the p parameters and each estimation method, the sample standard deviation is calculated from 1000 parameter estimates β̂j. The sample standard deviation for Ŷ is calculated as

σ̂(Ŷ) = [ (1/(nN − 1)) Σ_{i=1}^{n} Σ_{k=1}^{N} (Ŷik − μi − (Ȳ − μ̄))² ]^(1/2),

where Ŷik is the kth sample value of Ŷi, μi = Σ_{j=1}^{p} βj Xij, μ̄ = (1/n) Σ_{i=1}^{n} μi and Ȳ = (1/nN) Σ_{i=1}^{n} Σ_{k=1}^{N} Yik. Thus the standard deviation of Ŷ for each estimation method is based on 100n (5000 in this case) estimates of Yi. The relative efficiency of the LAD and MAD estimators of βj relative to LS is then given by σLS(β̂j)/σLAD(β̂j) and σLS(β̂j)/σMAD(β̂j), and similarly for Ŷ. Since both estimation methods give unbiased estimators, relative efficiency offers a good basis for comparing the methods: values greater than 1.0 indicate that LAD and MAD are outperforming OLS, at least in this relative variance sense. The results are shown in Table 4, Table 5 and Table 6, and the comparison of the estimation methods based on the standard deviation of Ŷ is shown in Fig. 3 and Fig. 4.
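The following sketch illustrates the relative-efficiency computation for the three-variable model on the 7 × 7 integer design; the t(3) disturbances, the number of replications and the use of Nelder-Mead for the LAD fits are our own illustrative assumptions rather than the exact simulation set-up of the paper.

import numpy as np
from scipy.optimize import minimize

def lad(X, Y):
    obj = lambda b: np.sum(np.abs(Y - X @ b))
    return minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(5)
beta = np.array([6.0, 3.0, -2.0])
grid = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]     # X2, X3 in -3..3
X = np.column_stack([np.ones(len(grid)), np.array(grid, float)])

N, ols_hat, lad_hat = 200, [], []
for _ in range(N):
    Y = X @ beta + rng.standard_t(df=3, size=len(X))            # long-tailed errors
    ols_hat.append(np.linalg.lstsq(X, Y, rcond=None)[0])
    lad_hat.append(lad(X, Y))

sd_ols = np.std(ols_hat, axis=0, ddof=1)
sd_lad = np.std(lad_hat, axis=0, ddof=1)
print("relative efficiencies sigma_LS/sigma_LAD:", sd_ols / sd_lad)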
Table 4: Relative efficiencies of the LAD/MAD estimators for the three-variable model in equ. (22), with Weibull (k = 5, λ = 1) and log-gamma (α = 5, β = 1) disturbances.

No. of          Weibull distribution           Log-gamma distribution
outliers   β̂1    β̂2    β̂3    Ŷ         β̂1    β̂2    β̂3    Ŷ
0          0.99   0.98   0.99   0.99      1.23   1.24   1.24   1.24
1          1.89   1.93   1.94   1.92      1.92   1.93   1.94   1.93
2          2.41   2.45   2.47   2.44      2.46   2.49   2.51   2.49
3          2.78   2.84   2.85   2.82      2.98   2.99   3.02   3.00
4          2.91   2.98   2.99   2.96      3.21   3.26   3.26   3.24
5          3.00   3.01   3.06   3.02      3.65   3.69   3.70   3.68
6          3.26   3.29   3.32   3.29      3.95   4.01   4.02   3.99
7          2.99   3.34   3.20   3.18      3.51   4.21   4.22   3.98
8          2.73   2.94   3.16   2.94      3.40   3.98   3.98   3.79
9          2.71   2.86   3.01   2.86      3.12   3.65   3.66   3.48
10         2.61   2.71   2.95   2.76      3.01   3.34   3.36   3.24
11         2.58   2.57   2.63   2.59      2.93   3.16   3.19   3.09
12         2.47   2.52   2.57   2.52      2.81   3.06   3.06   2.98
13         2.20   2.23   1.98   2.14      2.53   2.91   2.93   2.79
14         2.10   2.03   1.78   1.97      2.21   2.67   2.67   2.52
15         1.80   1.86   1.73   1.80      2.01   2.38   2.38   2.26
16         1.56   1.54   1.57   1.56      1.81   2.29   2.30   2.13
17         1.32   1.38   1.43   1.38      1.52   2.08   2.10   1.90
18         1.29   1.23   1.19   1.24      1.31   1.89   1.90   1.70
19         1.21   1.19   1.08   1.16      1.21   1.53   1.65   1.46
20         1.15   1.04   1.03   1.07      1.15   1.30   1.32   1.26
21         0.93   0.96   0.96   0.95      0.93   1.06   1.11   1.03
22         0.89   0.87   0.89   0.88      0.89   0.99   1.02   0.97
23         0.84   0.86   0.83   0.84      0.84   0.86   0.86   0.85
24         0.79   0.84   0.81   0.81      0.79   0.84   0.85   0.83
25         0.73   0.80   0.79   0.77      0.73   0.80   0.79   0.77
Fig. 3. σ̂(Ŷ) for the four estimation methods (OLS, GLS, LAD, MAD) against the number of outliers, Weibull distribution.
Table 5: Sample standard deviation of Ŷ for the four estimation methods for the model in equ. (22), with Weibull (k = 5, λ = 1) and log-gamma (α = 5, β = 1) disturbances.

No. of          Weibull distribution           Log-gamma distribution
outliers   OLS    GLS    LAD    MAD       OLS    GLS    LAD    MAD
0          0.49   0.41   0.48   0.50      0.55   0.47   0.55   0.56
1          0.90   0.43   0.56   0.57      0.97   0.50   0.62   0.63
2          1.09   0.49   0.67   0.69      1.16   0.55   0.74   0.76
3          1.32   0.51   0.72   0.74      1.39   0.59   0.79   0.81
4          1.37   0.52   0.78   0.81      1.44   0.60   0.85   0.88
5          1.59   0.54   0.79   0.81      1.67   0.61   0.86   0.89
6          1.86   0.56   0.91   0.92      1.94   0.63   0.98   1.00
7          1.98   0.57   0.99   1.01      2.05   0.64   1.05   1.08
8          2.03   0.59   1.03   1.04      2.10   0.66   1.11   1.12
9          2.09   0.61   1.09   1.12      2.17   0.68   1.16   1.19
10         2.21   0.63   1.21   1.23      2.28   0.69   1.28   1.30
11         2.31   0.71   1.32   1.35      2.38   0.78   1.40   1.43
12         2.35   0.72   1.45   1.47      2.43   0.78   1.52   1.54
13         2.69   0.74   1.49   1.51      2.76   0.80   1.57   1.58
14         2.83   0.76   1.54   1.56      2.90   0.82   1.61   1.63
15         2.98   0.79   1.75   1.77      3.04   0.86   1.81   1.85
16         3.13   0.81   1.95   1.98      3.20   0.88   2.01   2.06
17         3.28   0.86   2.13   2.15      3.35   0.92   2.21   2.22
18         3.42   0.91   2.65   2.68      3.49   0.98   2.72   2.74
19         3.57   0.93   3.09   3.10      3.64   1.00   3.17   3.16
20         3.72   1.03   3.26   3.28      3.80   1.10   3.33   3.33
21         3.86   1.56   3.49   3.51      3.94   1.63   3.56   3.58
22         4.01   1.98   3.95   3.96      4.07   2.05   4.02   4.03
23         4.16   2.51   4.06   4.09      4.22   2.57   4.13   4.16
24         4.30   3.07   4.57   4.60      4.38   3.15   4.64   4.66
25         4.45   4.23   5.12   5.15      4.51   4.29   5.20   5.23
Fig. 4. σ̂(Ŷ) for the four estimation methods (OLS, GLS, LAD, MAD) against the number of outliers, log-gamma distribution.
Table 6: Relative efficiencies of LAD estimators when the OLS assumptions are satisfied; log-gamma α = 5 and β = 1; Weibull k = 5 and λ = 1.

               Log-gamma α=5, β=1                 Weibull k=5, λ=1
p   n       β̂1     β̂2     β̂3     Ŷ         β̂1     β̂2     β̂3     Ŷ
1   500     0.996                  0.960      0.998                  0.966
1   1000    0.991                  0.960      0.995                  0.972
2   500     0.984   0.923          0.954      0.991   0.935          0.963
2   1000    0.975   0.916          0.917      0.985   0.924          0.926
3   500     0.923   0.863   0.794  0.860      0.931   0.873   0.815  0.873
3   1000    0.895   0.829   0.768  0.831      0.912   0.856   0.807  0.859
Table 4 shows that LAD estimation is clearly superior to OLS as the number of outliers ranges from 1 to 19. When the homoscedasticity assumption is satisfied, the efficiency of the LAD and MAD estimates is about 97 percent. Table 5 gives the actual values of the sample standard deviations of Ŷ for the four estimation methods; the data listed in Table 5 are also depicted graphically in Fig. 3 and Fig. 4. It is observed that when the variances of the disturbances are known, GLS gives the best estimates. When the variances of the disturbances are not known, LAD estimation looks to be a good choice. Table 6 shows that, with a moderate number of outliers as compared to Table 4, the advantage of the LAD and MAD estimators over the LS estimators increases as n increases. Table 6 is based on at least 500 or 1000 replications; a few values were estimated from as many as 50000 replications in order to establish with high confidence that not all the efficiencies are equal.
11. Conclusion
From the example in Section 7, it is observed that Rao's statistic is quite sensitive when the values of the difference in population means and standard deviations are close to each other, and also when the value of the standard deviation is greater than the maximum difference in means. The LAD and MAD estimators are not markedly inefficient relative to the LS estimator when the LS assumptions are satisfied, but are dramatically more efficient in many situations where large disturbances are present. It was found that, regardless of the number of regressors, the number of observations and the standard deviation of the disturbances, when the LS assumptions including normality were satisfied, the efficiency of the LAD and MAD estimators relative to the OLS estimators was about 97 percent; the lowest efficiency for any LAD or MAD estimator was about 76 percent. Hence it is worthwhile to compare the exact probabilities of the test statistics before coming to a final judgment. It was also observed that the LAD and MAD procedures, on the whole, provide the most attractive preliminary estimators for robust regression when compared to the LS estimator, making them particularly attractive relative to LS when the regression error distribution is thought to be particularly long-tailed. Least absolute deviations estimation for the censored regression model performs well for asymmetric as well as some symmetric error distributions. The results obtained using the LAD as well as MAD regression methods are not only robust but also consistent.
References
1. Abrevaya, J. and Shen, S. (2014). Estimation of censored panel-data models with slope
heterogeneity. Journal of Applied Econometrics, Volume 29, Issue 4, pages 523–548.
2. Adichi, J. N. (1967). Estimates of Regression Parameters Based on Rank Tests. Annals of
Mathematical Statistics, 38, pp.894-904.
3. Alan, S., Honoré, B. E., Hu, L., Petersen, S. L. (2014). Estimation of Panel Data Regression Models
with Two-Sided Censoring or Truncation. Journal of Econometric Methods, Volume 3, Issue 1,
Pages 1–20.
4. Andrews, D.F. (1974). A Robust Method of Multiple Linear Regression. Technometrics, Vol. 16,
pp 523-531.
5. Arthanari, T.S. and Dodge, Y. (1981). Mathematical Programming in Statistics, John Wiley and
Sons, New York.
6. Buckley, J. and James, I. (1979). Linear regression with censored data, Biometrika, 66, pp. 449-464.
7. Chen, H.J., Liu, X.L., and Liu, L.H. (2010). A Comparison of Different Methods for LAD Regression, Advanced Materials Research, 143-144, 1328.
8. D'Agostino, R. and Stephens, M. (1986). Goodness-of-Fit Techniques. Marcel Dekker, New York.
9. Davies, M. (1976). Linear Approximation Using the Criterion of Least Total Deviations. Journal
of Royal Statistical Society, B29(1). pp. 101-109.
10. Draper, N.R and Smith. H. (1981). Applied Regression Analysis. John Wiley and Sons, New York.
11. Eakambaram, S. and Elangovan, R. (2009). Least Squares Versus Least Absolute Deviations Estimation in Regression Models, International Journal of Agricultural and Statistical Sciences, Vol. 5, No. 2, pp. 355-372.
12. Eakambaram, S. and Elangovan, R. (2010). On the Least Absolute Error Estimation of Linear Regression Models with Auto-Correlated Errors, International Journal of Physical Sciences, Vol. 22(1), pp. 213-220.
13. Fang, Y., Man, J. and Zhao, L. (2005). Strong Convergence of LAD estimates in a censored
regression model. Science in China Ser. A Mathematics, 48(2). pp. 155-168.
14. Gao, X. and Huang, J. (2010). Asymptotic analysis of high-dimensional LAD regression with LASSO. Statistica Sinica, 20, 1485-1506.
15. Huber, P.J. (1973). Robust Regression: Asymptotics, Conjectures and Monte Carlo. Ann. Statist. 1,
pp. 799-821.
16. Kalbfleisch, J.D and Prentice. R.L. (1980). The Statistical Analysis of Failure Time Data, Wiley.
17. Kaplan, E.L. and Meier, P. (1958). Nonparametric Estimation from Incomplete Observations, Journal of the American Statistical Association, 53, pp. 457-481.
18. Kotz, S. and Van Dorp, J.R. (2004). Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific Press, Singapore.
19. Kroese, D.P., Taimre, T. and Botev, Z.I. (2011). Handbook of Monte Carlo Methods. John Wiley & Sons, Hoboken, New Jersey.
20. Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data, John Wiley and Sons, New
York.
21. Powell, J. L. (1984). The asymptotic normality of two stage least deviations estimators,
Econometrica, 51, pp. 1569-1575.
22. Powell, J.L. (1984). Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, pp. 303-325.
23. Powell, J.L. (1986). Symmetrically trimmed least squares estimation for tobit models,
Econometrica, 54, pp. 1435 – 1460.
24. Rao, Y., Jan, L.Y., Jan, Y.N. (1990). Similarity of the product of the Drosophila neurogenic gene
big brain to transmembrane channel proteins. Nature 345(6271): 163-167.
25. Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method, John Wiley and Sons, New
York.
26. Soh, C., Harrington, D.P., and Zaslavsky, A.M. (2008). Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data, Lifetime Data Analysis, 14, 65-85.
27. Wagner, H. M. (1959). Linear Programming Techniques for Regression Analysis. Journal of the
American Statistical Association, 54, pp.206-212.
28. Wang, Z., Wu, Y. and Zhao, L. (2009). Approximation by randomly weighting method in censored
regression model. Science in China Series A : Mathematics, 52(3). pp. 561-576.
29. Wang, Z., Wu, Y.H. and Zhao, L.C. (2007). Change Point Estimation for Censored Regression
Model. Science in China Series A : Mathematics, 50(1). pp. 63-72.
30. Weiss, A. (1988). A Comparison of Ordinary Least Squares and Least Absolute Error Estimation. Econometric Theory, 14(3), pp. 517-527.
31. Yang, H. and Yang, J. (2014). The adaptive L1-penalized LAD regression for partially linear single-index models, Journal of Statistical Planning and Inference, Volumes 151-152, Pages 73-89.