Asia Pacific Journal of Research, Vol. I, Issue XVI, August 2014. ISSN: 2320-5504, E-ISSN: 2347-4793

LEAST ABSOLUTE DEVIATIONS ESTIMATION FOR THE CENSORED REGRESSION MODEL

B. Sanjith, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli - 627 012
R. Elangovan, Department of Statistics, Annamalai University, Annamalai Nagar - 608 002

ABSTRACT

One of the most extensively and exhaustively discussed methods among the statistical tools available for the analysis of data is regression. In the classical approach to the regression problem, the objective is to minimize the sum of squared deviations between the observed and predicted values of the dependent variable; this is the least squares method, and it uses classical optimization methods and generalized inverses. Another method minimizes the mean absolute deviation between the predicted and observed values of the dependent variable; this problem is known in the literature as L1-norm minimization or the Least Absolute Deviations (LAD) method. A third method considered in the literature is the Chebyshev criterion of minimizing the maximum of the absolute deviations between the observed and predicted values of the dependent variable. The methods of minimizing the sum of absolute deviations and the sum of squared deviations from hypothesized linear models have vied for statistical favour for more than 25 decades. While least squares enjoys certain well-known optimality properties within strictly Gaussian parametric models, the least absolute error (LAE) estimator is widely recognized as a robust method especially well suited to longer-tailed error distributions. In this paper, least absolute deviations estimation for the censored regression model is discussed. The analytical results are substantiated with numerical illustrations.

1. Introduction

One of the most extensively and exhaustively discussed methods among the statistical tools available for the analysis of data is regression. In the classical approach to the regression problem, the objective is to minimize the sum of squared deviations between the observed and predicted values of the dependent variable; this is the least squares method, and it uses classical optimization methods and generalized inverses. Another method minimizes the mean absolute deviation between the predicted and observed values of the dependent variable; this problem is known in the literature as L1-norm minimization or the Least Absolute Deviations (LAD) method. A third method considered in the literature is the Chebyshev criterion of minimizing the maximum of the absolute deviations between the observed and predicted values of the dependent variable. The methods of minimizing the sum of absolute deviations and the sum of squared deviations from hypothesized linear models have vied for statistical favour for more than 25 decades. While least squares enjoys certain well-known optimality properties within strictly Gaussian parametric models, the least absolute error (LAE) estimator is widely recognized as a robust method especially well suited to longer-tailed error distributions. The problem of estimating linear relationships when all variates are subject to error, although of frequent occurrence in science, is rarely discussed in statistical courses or texts because most statisticians regard it as insoluble.
This attitude arises from considering only a mathematical formulation which gives rise to an insoluble problem. Least squares regression has dominated the statistical literature for a long time. A number of estimation procedures which are more robust to departures from the usual least squares assumptions have been discussed in the recent statistical literature, and the Minimum Sum of Absolute Errors (MSAE) regression is considered a robust alternative to least squares regression by a number of authors. Many of the important recent advances in econometric methods pertain to limited dependent variable models, that is, regression models for which the range of the dependent variable is restricted to some subset of the real line. Such prior restrictions quite commonly arise in cross-section studies of economic behaviour; often, for some fraction of individuals in a sample, implicit non-negativity or other inequality constraints are binding for the variable of interest. In a regression model, an inequality constraint on the dependent variable results in a corresponding bound on the unobservable error terms, this bound being systematically related to the value of the regression function. Hence the mean of the restricted error term is not zero, and the usual conditions for consistency of least squares estimation do not apply. The censored regression model can be written in the form

yt = max{0, xt'β0 + ut}, t = 1, ..., T,

where the dependent variable yt and the regression vector xt are observed for each t, while the parameter vector β0 and the error term ut are unobserved. It will be presumed throughout that estimation of β0 is the primary object of the statistical analysis. The LAD estimator for the censored regression model minimizes the sum of absolute deviations of yt from max{0, xt'β} over all β in the parameter space B (say). Algebraically, the censored LAD estimator β̂T minimizes

ST(β) = (1/T) Σ_{t=1}^{T} |yt − max{0, xt'β}|, β ∈ B.

As for the standard regression model, LAD estimation may be computationally burdensome for the censored regression model because the function to be minimized is not continuously differentiable; nevertheless, it provides a consistent alternative to likelihood-based procedures when prior information on the parametric form of the error density is unavailable. In this paper it is proposed to study LAD regression for the censored regression model. The analytical results are substantiated with numerical illustrations.
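To make the censored LAD criterion above concrete, the following Python sketch (not part of the paper; the function name censored_lad, the simulated Tobit-type data and all parameter values are illustrative assumptions) minimizes ST(β) with a derivative-free search, since the objective is not continuously differentiable:

import numpy as np
from scipy.optimize import minimize

def censored_lad(y, X, beta_init):
    """Minimize S_T(beta) = (1/T) * sum |y_t - max(0, x_t' beta)|."""
    def objective(beta):
        fitted = np.maximum(0.0, X @ beta)      # max{0, x_t' beta}
        return np.mean(np.abs(y - fitted))      # mean absolute deviation
    # Nelder-Mead handles the non-differentiable objective
    res = minimize(objective, beta_init, method="Nelder-Mead")
    return res.x

# Illustrative use: censored (Tobit-type) data with an intercept and one regressor
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(-2, 2, T)])
beta_true = np.array([0.5, 1.0])
y = np.maximum(0.0, X @ beta_true + rng.standard_normal(T))
print(censored_lad(y, X, beta_init=np.zeros(2)))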
2. Recent Developments in LAD Regression Theory

Strong convergence of LAD estimates in a censored regression model has been discussed by Fang and Zhao (2005). Change-point estimation for the censored regression model has been discussed by Wang and Zhao (2007). Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data has been discussed by Soh and Harrington (2008). Approximation by the randomly weighting method in the censored regression model has been discussed by Wang et al. (2009). LS versus LAD estimation in regression models has been discussed by Eakambaram and Elangovan (2009). The least absolute error estimation of linear regression models with auto-correlated errors has been discussed by Eakambaram and Elangovan (2010). Asymptotic analysis of high-dimensional LAD regression with LASSO has been discussed by Gao and Huang (2010). A comparison of different methods for LAD regression is given by Chen et al. (2010). Estimation of censored panel-data models with slope heterogeneity has been discussed by Abrevaya and Shen (2014). Penalized LAD regression for single-index models has been discussed by Yang et al. (2014). The adaptive L1-penalized LAD regression for partially linear single-index models has been discussed by Yang and Yang (2014). Estimation of panel data regression models with two-sided censoring or truncation has been discussed by Alan et al. (2014).

3. Regression Methods

3.1. Simple Linear Regression

The functional relationship between Y and X is of the form

Y = β0 + β1X + ε,   ... (1)

which is known as the simple linear regression of Y on X. β0 and β1 are called parameters and are to be estimated. Equ. (1) means that, for a given Xi, the corresponding Yi consists of β0 + β1Xi plus an amount εi by which the observation may fall off the true regression line. On the basis of the information available from the observations we would like to estimate β0 and β1. The term ε is a random variable and is called the "error term". From equ. (1) one can write Yi − β0 − β1Xi = εi. Finding β0 and β1 from (Xi, Yi), i = 1, 2, ..., n, is called estimation of the parameters. There are different methods of obtaining such estimates. In the following sections we consider methods that minimize (i) the sum of squared deviations, (ii) the mean absolute deviation, and (iii) the maximum of the absolute deviations.

3.2. Minimizing the Sum of Squared Deviations (Least Squares Regression)

This method is based on choosing β0 and β1 so as to minimize the sum of squares of the vertical deviations of the data points from the fitted line. The sum of squared deviations (SSD) from the line is

SSD = Σi εi² = Σi (Yi − β0 − β1Xi)².   ... (2)

We choose the estimates of β0 and β1 so that the sum of squared deviations in equ. (2) is a minimum. Differentiating equ. (2) with respect to β0 and β1 and setting the resulting partial derivatives to zero, we have

∂SSD/∂β0 = −2 Σi (Yi − β0 − β1Xi) and ∂SSD/∂β1 = −2 Σi Xi(Yi − β0 − β1Xi),

and hence

Σi (Yi − β0 − β1Xi) = 0   ... (3)
Σi Xi(Yi − β0 − β1Xi) = 0.   ... (4)

From equ. (3) and (4) we have

nβ0 + β1 Σi Xi = Σi Yi and β0 Σi Xi + β1 Σi Xi² = Σi XiYi.   ... (5)

Equ. (5) are called the normal equations. From equ. (5) we obtain

β̂1 = [Σi XiYi − (Σi Xi)(Σi Yi)/n] / [Σi Xi² − (Σi Xi)²/n]

and β̂0 = Ȳ − β̂1X̄, where Ȳ = Σi Yi/n and X̄ = Σi Xi/n. The β̂0 and β̂1 obtained in this fashion are called the least-squares estimates of β0 and β1 respectively. Thus we can write the estimated regression equation as Ŷ = β̂0 + β̂1X, which is called the prediction equation.

3.3. Minimizing Mean Absolute Deviations (MINMAD) Regression

For the simple linear regression model Y = β0 + β1X + ε we have observed data on X and Y given by (Xi, Yi), i = 1, 2, ..., n. We are interested in finding the coefficients β0 and β1 such that

(1/n) Σi |Yi − β0 − β1Xi|   ... (6)

is minimized. The expression in equ. (6) is known as the mean absolute deviation between the observed and predicted values of the dependent variable. It can easily be seen that minimizing equ. (6) is the same as minimizing

Σi |Yi − β0 − β1Xi|,   ... (7)

which is the sum of the absolute deviations. First we shall consider the problem with an additional restriction on β0 and β1, namely Y0 = β0 + β1X0 for a given pair (X0, Y0), as well as minimizing the sum of the absolute deviations.
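As a small illustration, not taken from the paper, the following sketch computes the closed-form least-squares estimates from the normal equations (5) and a MINMAD fit of equ. (7) by a derivative-free search; the function names, the simulated data and the injected outlier are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

def ls_simple(x, y):
    """Closed-form least-squares estimates from the normal equations (5)."""
    b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / len(x)) / \
         (np.sum(x ** 2) - np.sum(x) ** 2 / len(x))
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

def minmad_simple(x, y):
    """MINMAD estimates: minimize sum |Y_i - b0 - b1 X_i|, as in equ. (7)."""
    obj = lambda b: np.sum(np.abs(y - b[0] - b[1] * x))
    res = minimize(obj, x0=np.array(ls_simple(x, y)), method="Nelder-Mead")
    return res.x

# Illustrative data with one gross outlier
rng = np.random.default_rng(1)
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2, 10)
y[7] += 5.0          # the outlier pulls the LS line, much less so the MINMAD line
print(ls_simple(x, y), minmad_simple(x, y))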
3.4. Minimizing the Maximum of Absolute Deviations (MINMAXAD) Regression

Let us consider the estimation of the parameters β0, β1 in equ. (1) by minimizing the maximum of the absolute deviations. Under this criterion, the objective is to find β0 and β1 such that (β0, β1) is a solution to

Minimize over β0, β1 of max(1 ≤ i ≤ n) |Yi − β0 − β1Xi|.

In the literature there is considerable work on models with and without intercept terms.

3.5. Other Estimators

The least squares regression criterion minimizes Σi di², where di is the deviation between the observed and predicted values of the dependent variable for the ith observation. Notice that (1/n) Σi di² can be thought of as the variance of the deviations, namely (1/n) Σi (di − d̄)², where d̄ = (1/n) Σi di = (1/n) Σi (Yi − Ŷi). But d̄ = 0 for the least squares regression line, as it passes through the point (X̄, Ȳ). Thus minimizing Σi (di − d̄)², which can equivalently be written as minimizing Σ_{i<j} (di − dj)² (the two objectives differ only by the constant factor n), is a possible criterion for finding a regression line; when d̄ = 0 this is equivalent to the least-squares regression criterion. By replacing (di − dj)² by |di − dj| we obtain another criterion, namely the sum of the absolute differences between deviations.

3.6. Minimizing the Sum of Absolute Differences Between Deviations (MINSADBED) Regression

Consider the estimation of the parameters β0, β1 in equ. (1) by minimizing the sum of absolute differences between deviations, that is,

Minimize Σ_{i<j} |di − dj|.

From the expressions for di and dj we have

Σ_{i<j} |di − dj| = Σ_{i<j} |(Yi − β0 − β1Xi) − (Yj − β0 − β1Xj)| = Σ_{i<j} |(Yi − Yj) − β1(Xi − Xj)|.

Let Yij = Yi − Yj and Xij = Xi − Xj, i < j. Then

Σ_{i<j} |di − dj| = Σ_{i<j} |Yij − β1Xij|.

The objective function is therefore similar to that of MINMAD regression, with β1 alone to be estimated, since the β0 term cancels and cannot be estimated by this method. We can thus apply the procedure developed for MINMAD to the n(n − 1)/2 data points (Yij, Xij), as illustrated in the sketch following Section 3.7. One way of obtaining an estimate of β0 is to force the line to pass through (X̄, Ȳ) and take the corresponding constant term as the estimate, that is, β̂0 = Ȳ − β̂1X̄; there are other ways of estimating β0. One estimate commonly used in the literature is β̂0 = median over i < j of (Yi + Yj)/2.

3.7. Minimizing the Sum of Absolute Differences Between Absolute Deviations (MINSADBAD) Regression

Let us consider the estimation of the parameters β0 and β1 in the simple regression model by minimizing the sum of absolute differences between absolute deviations, that is,

Minimize over β0, β1 of Σ_{i<j} | |di| − |dj| |,

where di = Yi − (β0 + β1Xi).
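The following sketch, not from the paper, illustrates the MINSADBED reduction of Section 3.6: it forms the pairwise differences (Yij, Xij), solves the resulting one-parameter MINMAD problem numerically, and recovers the intercept by forcing the line through (X̄, Ȳ); the function name and the simulated data are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize_scalar

def minsadbed(x, y):
    """MINSADBED sketch: minimize sum_{i<j} |Y_ij - b1 * X_ij|, then recover b0."""
    i, j = np.triu_indices(len(x), k=1)          # all pairs i < j
    x_ij, y_ij = x[i] - x[j], y[i] - y[j]        # pairwise differences
    obj = lambda b1: np.sum(np.abs(y_ij - b1 * x_ij))
    b1 = minimize_scalar(obj).x                  # one-dimensional MINMAD problem
    b0 = y.mean() - b1 * x.mean()                # force the line through (X-bar, Y-bar)
    return b0, b1

rng = np.random.default_rng(2)
x = np.linspace(0.0, 9.0, 30)
y = 1.0 - 2.0 * x + rng.normal(0, 0.3, 30)
print(minsadbed(x, y))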
3.8. Multiple Linear Regression

Consider the multiple regression model

Y = β0 + β1X1 + ... + βp−1Xp−1 + ε,   ... (8)

where X1, X2, ..., Xp−1 are known, the βj are unknown parameters to be estimated, and ε is the error term. If the Xj are varied and n values of Y are observed, denoted by Y' = (Y1, Y2, ..., Yn), then we have

Y = Xβ + ε,   ... (9)

where X = (X1, X2, ..., Xn)', Xi' = (1, Xi1, ..., Xi,p−1) corresponds to the ith choice of the variables X1, ..., Xp−1, β' = (β0, ..., βp−1) and ε' = (ε1, ..., εn). The least squares method of estimating β consists of minimizing Σi εi² with respect to β; that is, minimize ε'ε = ||Y − Xβ||² with respect to β. Now

ε'ε = (Y − Xβ)'(Y − Xβ) = Y'Y − 2β'X'Y + β'X'Xβ.

Differentiating ε'ε with respect to β and equating ∂(ε'ε)/∂β to zero, we get −2X'Y + 2X'Xβ = 0, or

X'Xβ = X'Y.   ... (10)

Equ. (10) is called the normal equation(s). If X is of rank p, then X'X is positive definite and hence nonsingular, so we have a unique solution to (10),

β̂ = (X'X)⁻¹ X'Y.   ... (11)

Then for any β,

(Y − Xβ)'(Y − Xβ) = [Y − Xβ̂ + X(β̂ − β)]'[Y − Xβ̂ + X(β̂ − β)] = (Y − Xβ̂)'(Y − Xβ̂) + (β̂ − β)'X'X(β̂ − β) ≥ (Y − Xβ̂)'(Y − Xβ̂),

which shows that the minimum of (Y − Xβ)'(Y − Xβ) is (Y − Xβ̂)'(Y − Xβ̂) and is attained at β = β̂; this solution therefore minimizes ε'ε.

3.9. Minimizing Mean Absolute Deviations

Consider the problem of minimizing Σi |di| with respect to β, where di is the deviation between the observed and predicted values of Yi for the ith observation. The problem is the same as minimizing the mean absolute deviation, and it is alternatively known as the L1-norm minimization problem. The problem can be stated as follows:

Minimize Σi |di|   ... (12)
subject to Xβ + d = Y; d, β unrestricted in sign.

Noting that |di| = d1i + d2i, where d1i and d2i are nonnegative and di = d1i − d2i, we can reformulate the problem as:

Minimize Σi (d1i + d2i)
subject to Xβ + d1 − d2 = Y; β unrestricted in sign; d1, d2 ≥ 0.   ... (13)

A computational sketch of this formulation is given following Section 5.

3.10. Least Absolute Deviations (LAD) Regression for the Censored Regression Model

As discussed in Section 1, LAD regression for censored data has been discussed by many researchers in the recent past. A more detailed discussion of LAD regression for the censored regression model and its properties can be seen in Powell (1984).

4. The Model

The functional relationship between Y and X is of the form

Y = β0 + β1X + ε,   ... (14)

which is known as the simple linear regression of Y on X; β0 and β1 are called parameters and are to be estimated. Equ. (14) means that, for a given Xi, the corresponding Yi consists of β0 + β1Xi plus an amount εi by which the observation may fall off the true regression line. On the basis of the information available from the observations we would like to estimate β0 and β1. The term εi is a random variable and is called the "error term". By equ. (14) we can write

Yi − β0 − β1Xi = εi.   ... (15)

Finding β0 and β1 from (Xi, Yi), i = 1, 2, ..., n, is called estimation of the parameters. There are different methods of obtaining such estimates. In the sections that follow we consider methods that estimate the parameters by minimizing the sum of squared deviations, the mean absolute deviations, and the maximum of the absolute deviations. A more recent history of regression analysis and its applications can be seen in Draper and Smith (1981).

5. LS Approach

Consider the model

Y = Xβ + e, e ~ N(0, σ²I),   ... (16)

where Y is an n × 1 vector of observed values of the dependent variable, X is an n × k matrix of values of the independent variables, and e is a vector of random errors assumed to have a N(0, σ²I) distribution. Under LS,

β̂ is the estimate which minimizes (Y − Xβ)'(Y − Xβ),   ... (17)
β̂H is the estimate which minimizes (Y − Xβ)'(Y − Xβ) under the hypothesis H.   ... (18)

Let

B = (Y − Xβ̂)'(Y − Xβ̂) and A = (Y − Xβ̂H)'(Y − Xβ̂H).   ... (19)

Then

[(A − B)/B] C   ... (20)

has a Snedecor's F distribution under the hypothesis β = 0, where C is a proper constant.
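Before turning to the LAD approach, the following sketch shows one way to solve the L1 reformulation in equ. (13) as a linear programme. It is not the paper's implementation (the paper uses a simplex algorithm from Arthanari and Dodge (1981)); here scipy's general-purpose LP solver is used, and the function name and simulated data are illustrative assumptions:

import numpy as np
from scipy.optimize import linprog

def lad_lp(X, y):
    """LAD by linear programming, as in the reformulation (13):
       minimize 1'(d1 + d2)  subject to  X b + d1 - d2 = y,  d1, d2 >= 0."""
    n, p = X.shape
    # decision vector: [b (p, free), d1 (n, >= 0), d2 (n, >= 0)]
    c = np.concatenate([np.zeros(p), np.ones(n), np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:p]

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([1.0, 0.7]) + rng.standard_normal(n)
print(lad_lp(X, y))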
6. LAD Approach

If the assumption that the errors are normally distributed appears to be violated, and if prior knowledge is available about the error distribution, the maximum likelihood argument can be used to obtain estimates based on a criterion other than LS, namely minimization of the sum of absolute errors Σi |ei|, where the ei are the components of e = Y − Xβ. Let β̃ denote the estimates obtained through the LAD method by minimizing Σi |ei|. Let B = Σi |ẽi|, where the ẽi are the components of Y − Xβ̃. Let β̃H be the LAD estimates obtained by minimizing Σi |ei| subject to the hypothesis H, and let A = Σi |ẽiH|, where the ẽiH are the components of Y − Xβ̃H. Rao et al. (1990) have shown that

[(A − B)/A] C   ... (21)

has an asymptotic χ² distribution, where C is a proper constant. For a detailed computational algorithm, refer to Arthanari and Dodge (1981); the results are given in Table 1.

7. Example

The following examples provide an empirical comparison of the classical least squares and the LAD estimators for the simple regression model, assuming a log-normal error distribution, without using the permutation test. A simplex Linear Programming (LP) algorithm given by Arthanari and Dodge (1981) was used to compute the LAD regression estimates. For generating random numbers, the algorithm suggested by Rubinstein (1981) was used.

7.1. Weibull Distribution

The probability density function of a Weibull random variable is

f(x; λ, k) = (k/λ)(x/λ)^(k−1) e^{−(x/λ)^k} for x ≥ 0, and f(x; λ, k) = 0 for x < 0,

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution. Its complementary cumulative distribution function is a stretched exponential function. The Weibull distribution is related to a number of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2).

Algorithm:
(i) Construct the log likelihood function L.
(ii) Obtain the gradient (first derivative) vector with respect to λ and k.
(iii) Compute the Hessian (second derivative) matrix with respect to λ and k.
(iv) The initial values of λ and k are given by k(0) = 1, λ(0) = 1.

7.2. Log-Gamma Distribution

The notation X ~ log-gamma(α, β) is used to indicate that the random variable X has the log-gamma distribution with positive scale parameter α and positive shape parameter β. A log-gamma random variable X with parameters α and β has probability density function

f(x) = e^{βx} e^{−e^x/α} / (α^β Γ(β)), −∞ < x < ∞.

The probability density function for three different parameter combinations is shown in Fig. 2. The cumulative distribution function, survivor function, hazard function, cumulative hazard function, inverse distribution function, moment generating function and characteristic function on the support of X are mathematically intractable. The population mean, variance, skewness and kurtosis of X are also mathematically intractable.

Algorithm:
(i) Construct the log likelihood function L.
(ii) Obtain the gradient (first derivative) vector with respect to α and β.
(iii) Compute the Hessian (second derivative) matrix with respect to α and β.
(iv) The initial values of α and β can be obtained from the closed-form estimates given by the method of moments.

Fig. 1: Weibull distribution (probability density function).
Fig. 2: Log-gamma distribution (probability density function).

For simulating data and testing, in order to get the LS and LAD results, the algorithms suggested by Lawless (1982), D'Agostino and Stephens (1986), Kotz and Dorp (2004), Kroese et al. (2011), Arthanari and Dodge (1981) and Rubinstein (1981) were used, and the results are shown in Table 2.
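The following sketch, not from the paper (which generates random numbers with the algorithms cited above), shows one way to draw Weibull and log-gamma errors with the parameter values used in the tables; it relies on numpy's built-in generators and on the fact that exp(X) is gamma-distributed when X has the log-gamma density of Section 7.2, and the function names are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(4)

def weibull_errors(k, lam, size):
    """Weibull(k, lambda) draws; numpy's weibull() has scale 1, so multiply by lambda."""
    return lam * rng.weibull(k, size)

def log_gamma_errors(alpha, beta, size):
    """Log-gamma(alpha, beta) draws: if exp(X) ~ Gamma(shape=beta, scale=alpha),
       then X has the density given in Section 7.2."""
    return np.log(rng.gamma(beta, alpha, size))

# errors used for the simulated regressions (parameter values as in Tables 1-3)
e_weib = weibull_errors(k=5, lam=1, size=500)
e_lgam = log_gamma_errors(alpha=5, beta=1, size=500)
print(e_weib.mean(), e_lgam.mean())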
Table 1: For testing H0: β = 0, with n = 500, in simple LAD and LS regression (Y = βX + e); log-gamma errors with scale parameter α = 5, 7 and shape parameter β = 1, 2.

Error distribution     β     LS Prob.   LS F     LAD Prob.  LAD χ²    MAD Prob.  MAD χ²
log-Gamma α=5, β=1     0.5   0.028      2.321    0.187      15.320    0.197      19.591
log-Gamma α=5, β=1     0.7   0.046      2.840    0.191      19.459    0.206      20.010
log-Gamma α=5, β=1     1.0   0.197      3.120    0.174       9.152    0.186      15.311
log-Gamma α=5, β=2     0.5   0.056      2.521    0.209      20.125    0.211      20.259
log-Gamma α=5, β=2     0.7   0.248      2.941    0.229      21.281    0.232      23.175
log-Gamma α=5, β=2     1.0   0.281      3.120    0.203      19.974    0.211      20.259
log-Gamma α=7, β=1     0.5   0.048      2.887    0.197      19.591    0.209      20.125
log-Gamma α=7, β=1     0.7   0.257      2.991    0.201      19.902    0.208      20.119
log-Gamma α=7, β=1     1.0   0.286      3.123    0.199      19.691    0.212      20.306
log-Gamma α=7, β=2     0.5   0.048      2.887    0.216      20.864    0.223      21.157
log-Gamma α=7, β=2     0.7   0.275      3.093    0.231      23.165    0.237      24.081
log-Gamma α=7, β=2     1.0   0.293      3.125    0.219      20.921    0.226      21.198

Table 2: For testing H0: β1 = 0, with β1 = 0.5, 0.7 and 1.0, for n = 500 and n = 1000, in simple LAD and LS regression (Y = β0 + β1X1 + ε), where X = 1, 2, ..., 10; Weibull errors with k = 5 and λ = 1; log-gamma errors with α = 5 and β = 1. Entries are given at levels 0.01 and 0.10.

n = 500:
Error distribution    β1    LS 0.01   LS 0.10   LAD 0.01   LAD 0.10   MAD 0.01   MAD 0.10
Weibull k=5, λ=1      0.5   0.523     0.597     0.542      0.623      0.594      0.651
Weibull k=5, λ=1      0.7   0.603     0.685     0.623      0.705      0.649      0.746
Weibull k=5, λ=1      1.0   0.997     0.997     0.997      0.997      0.998      0.998
log-Gamma α=5, β=1    0.5   0.601     0.622     0.636      0.641      0.654      0.659
log-Gamma α=5, β=1    0.7   0.619     0.649     0.645      0.702      0.689      0.713
log-Gamma α=5, β=1    1.0   0.989     0.990     0.990      0.997      0.998      0.998

n = 1000:
Error distribution    β1    LS 0.01   LS 0.10   LAD 0.01   LAD 0.10   MAD 0.01   MAD 0.10
Weibull k=5, λ=1      0.5   0.529     0.600     0.550      0.625      0.594      0.653
Weibull k=5, λ=1      0.7   0.611     0.689     0.628      0.709      0.530      0.711
Weibull k=5, λ=1      1.0   0.998     0.998     0.998      0.999      0.999      0.999
log-Gamma α=5, β=1    0.5   0.605     0.623     0.639      0.642      0.659      0.662
log-Gamma α=5, β=1    0.7   0.621     0.652     0.651      0.710      0.691      0.720
log-Gamma α=5, β=1    1.0   0.998     0.999     0.999      0.999      0.999      0.999

Table 3: For testing H0: β1 = β2 = β3 = 0 and H0: β3 = 0 given β1, β2, with β3 = 0.5, 0.7 and 1.0, in multiple LAD and LS regression (Y = β0 + β1X1 + β2X2 + β3X3 + ε), for n = 500 and X uniformly distributed on (1, 10); Weibull errors with k = 5 and λ = 1; log-gamma errors with α = 5 and β = 1.

H0: β1 = β2 = β3 = 0:
Error distribution    β3    LS 0.01   LS 0.10   LAD 0.01   LAD 0.10   MAD 0.01   MAD 0.10
Weibull k=5, λ=1      0.5   0.596     0.612     0.608      0.642      0.612      0.661
Weibull k=5, λ=1      0.7   0.678     0.689     0.691      0.745      0.725      0.776
Weibull k=5, λ=1      1.0   0.998     0.998     0.998      0.998      0.998      0.999
log-Gamma α=5, β=1    0.5   0.609     0.612     0.619      0.619      0.631      0.635
log-Gamma α=5, β=1    0.7   0.712     0.781     0.784      0.798      0.803      0.812
log-Gamma α=5, β=1    1.0   0.999     0.999     0.999      0.999      0.999      0.999

H0: β3 = 0 given β1, β2:
Error distribution    β3    LS 0.01   LS 0.10   LAD 0.01   LAD 0.10   MAD 0.01   MAD 0.10
Weibull k=5, λ=1      0.5   0.599     0.619     0.612      0.649      0.620      0.665
Weibull k=5, λ=1      0.7   0.681     0.692     0.699      0.750      0.730      0.779
Weibull k=5, λ=1      1.0   0.999     0.999     0.999      0.999      0.999      0.999
log-Gamma α=5, β=1    0.5   0.615     0.618     0.621      0.623      0.634      0.636
log-Gamma α=5, β=1    0.7   0.720     0.785     0.789      0.801      0.809      0.815
log-Gamma α=5, β=1    1.0   0.999     0.999     0.999      0.999      0.999      0.999

For the data generated, both the F statistic and the chi-square statistic given in equ. (20) and equ. (21) were computed and their probabilities under the null hypothesis were calculated. For the latter statistic, the asymptotic distribution was assumed to hold even in small samples; in other words, the χ² test adopted is not an exact test but an asymptotic one. For computing the χ² statistic, the LAD and MAD fits have to be obtained after computing the estimates of the parameters. For simulating data and testing, in order to get the LAD and MAD results, Arthanari's algorithm was used. Table 1 reveals that Rao's statistic is quite sensitive when the values of the difference in population means and standard deviations are very close to each other, and also when the value of the standard deviation is greater than the maximum difference in means. Tables 2 and 3 reveal that the LAD approach gives better results than the LS approach for the given level α.
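As a rough illustration, not taken from the paper, of how the comparison statistics in equ. (20) and equ. (21) can be computed from unrestricted and restricted (β = 0) fits, consider the sketch below; the paper does not specify the proper constant C, so it is left as a parameter, and the helper names and the simulated log-gamma data are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

def lad_fit(X, y):
    """Unrestricted LAD fit (derivative-free search on the sum of absolute residuals)."""
    obj = lambda b: np.sum(np.abs(y - X @ b))
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]      # LS start values
    return minimize(obj, b0, method="Nelder-Mead").x

def ls_f_statistic(X, y, C=1.0):
    """[(A - B)/B] * C as in equ. (20); B and A are the unrestricted and restricted
       (beta = 0) sums of squared residuals.  The proper constant C is a parameter here."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    B = np.sum((y - X @ b) ** 2)                   # unrestricted
    A = np.sum(y ** 2)                             # restricted: beta = 0
    return (A - B) / B * C

def lad_chi2_statistic(X, y, C=1.0):
    """[(A - B)/A] * C as in equ. (21); B and A are the unrestricted and restricted
       (beta = 0) sums of absolute residuals."""
    B = np.sum(np.abs(y - X @ lad_fit(X, y)))      # unrestricted
    A = np.sum(np.abs(y))                          # restricted: beta = 0
    return (A - B) / A * C

rng = np.random.default_rng(5)
X = rng.uniform(1, 10, (500, 1))
y = 0.5 * X[:, 0] + np.log(rng.gamma(1.0, 5.0, 500))   # log-gamma errors as in Table 1
print(ls_f_statistic(X, y), lad_chi2_statistic(X, y))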
8. Comparing LS and LAD

For a number of reasons, LS regression analysis is perhaps the single most widely used statistical technique among practitioners in industry, government and academia. Despite the high regard in which LS estimation is held in statistical theory, there are situations in which other criteria are more appropriate for parameter estimation in simple linear models. An important problem involves the identification and handling of observations that are "outliers." Whatever the sources of the anomalies, it is usually desirable that outliers be identified and that they not have an unduly large influence on the model parameter estimates. Least squares estimation falls short on both counts. A number of estimation procedures which are more robust to departures from the usual least squares assumptions have been discussed in the statistical literature: Adichi (1967), Andrews (1974), Davies (1976), Huber (1973), Wagner (1959). One method, LAD estimation, is among the most promising for applied work, owing to a combination of robustness properties and computational ease. Some interesting results are also found in Weiss (1988).

9. LAD Estimation

The model under consideration is of the form Y = Xβ + ε, where Y is an n × 1 vector of observations on the dependent variable, X is an n × p matrix of values of the p regressors, β is a p × 1 vector of parameters, and ε is an n × 1 vector of random disturbances. Residuals are defined as e = (e1, e2, ..., en)' = Y − Xβ̂, where β̂ is an estimator of β. The Lh estimator of β is the β̂ that minimizes Σi |ei|^h. It is to be noted that LAD (h = 1) and OLS (h = 2) are special cases of Lh estimation.
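A minimal sketch, not from the paper, of the Lh family just defined, with LAD and OLS recovered as the special cases h = 1 and h = 2; the function name, start values and simulated data are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

def lh_estimate(X, y, h):
    """L_h estimator: the beta-hat minimizing sum_i |e_i|**h with e = Y - X beta.
       h = 1 gives LAD and h = 2 gives OLS as special cases."""
    obj = lambda b: np.sum(np.abs(y - X @ b) ** h)
    start = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start values
    return minimize(obj, start, method="Nelder-Mead").x

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-3, 3, n)])
y = X @ np.array([6.0, 3.0]) + rng.standard_normal(n)
print(lh_estimate(X, y, h=1))   # LAD
print(lh_estimate(X, y, h=2))   # essentially the OLS solution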
10. Results

The following simulation results constitute a more systematic investigation of LS versus LAD and MAD estimation in regression models. Consider a model of the form

Yi = β1 + β2Xi2 + β3Xi3 + εi,   ... (22)

in which β1 = 6.0, β2 = 3.0 and β3 = −2.0. This will be referred to as the three-variable model. Each of the two non-constant regressors, Xi2 and Xi3, took integer values from −3 to 3. From the disturbances, values of the dependent variable were calculated according to equ. (22), and the parameters β1, β2 and β3 were estimated by each of the four methods MAD, LAD, OLS and Generalized Least Squares (GLS). For the simulation, the algorithms suggested by D'Agostino and Stephens (1986), Kotz and Dorp (2004) and Kroese et al. (2011) were used. Denote the sample standard deviations of β̂j when estimated by OLS, LAD and MAD as σ_OLS(β̂j), σ_LAD(β̂j) and σ_MAD(β̂j) respectively. These sample standard deviations are calculated according to the usual formula

σ̂(β̂j) = [ Σ_{k=1}^{N} (β̂jk − β̄j)² / (N − 1) ]^{1/2},

where β̂jk is the kth sample estimate of βj, N is the number of sample estimates and β̄j = (1/N) Σ_{k=1}^{N} β̂jk. An obvious alternative would be to base the estimate on deviations from the true population means βj rather than from the sample means. The differences βj − β̄j were so small that applying the alternative method of measuring deviations always produced standard deviation estimates within 0.1 percent of σ̂(β̂j). For each of the p parameters and each estimation method, the sample standard deviation is calculated from 1,000 parameter estimates β̂j. The sample standard deviation for Ŷ is calculated as

σ̂(Ŷ) = [ (1/(nN − 1)) Σ_{i=1}^{n} Σ_{k=1}^{N} (Ŷik − μi)² ]^{1/2},

where Ŷik is the kth sample value of Ŷi, μi = Σ_{j=1}^{p} βjXij, μ̄ = (1/n) Σ_{i=1}^{n} μi and Ȳ = (1/(nN)) Σ_{i=1}^{n} Σ_{k=1}^{N} Ŷik. Thus the standard deviation of Ŷ for each estimation method is based on 100n (5,000 in this case) estimates of Ŷi. The relative efficiency of the LAD and MAD estimators of βj relative to LS is then given by σ_LS(β̂j)/σ_LAD(β̂j) and σ_LS(β̂j)/σ_MAD(β̂j), and similarly for Ŷ. Since both estimation methods give unbiased estimators, relative efficiency offers a good basis for comparing the methods: values greater than 1.0 indicate that LAD and MAD are outperforming OLS, at least in this relative-variance sense. The results are shown in Table 4, Table 5 and Table 6, and the comparison of the estimation methods based on the standard deviation of Ŷ is shown in Fig. 3 and Fig. 4.

Table 4: Relative efficiencies of the LAD/MAD estimators for the three-variable model in equ. (22), with Weibull errors (k = 5, λ = 1) and log-gamma errors (α = 5, β = 1).

No. of     Weibull distribution               Log-gamma distribution
outliers   β1     β2     β3     Ŷ            β1     β2     β3     Ŷ
0          0.99   0.98   0.99   0.99         1.23   1.24   1.24   1.24
1          1.89   1.93   1.94   1.92         1.92   1.93   1.94   1.93
2          2.41   2.45   2.47   2.44         2.46   2.49   2.51   2.49
3          2.78   2.84   2.85   2.82         2.98   2.99   3.02   3.00
4          2.91   2.98   2.99   2.96         3.21   3.26   3.26   3.24
5          3.00   3.01   3.06   3.02         3.65   3.69   3.70   3.68
6          3.26   3.29   3.32   3.29         3.95   4.01   4.02   3.99
7          2.99   3.34   3.20   3.18         3.51   4.21   4.22   3.98
8          2.73   2.94   3.16   2.94         3.40   3.98   3.98   3.79
9          2.71   2.86   3.01   2.86         3.12   3.65   3.66   3.48
10         2.61   2.71   2.95   2.76         3.01   3.34   3.36   3.24
11         2.58   2.57   2.63   2.59         2.93   3.16   3.19   3.09
12         2.47   2.52   2.57   2.52         2.81   3.06   3.06   2.98
13         2.20   2.23   1.98   2.14         2.53   2.91   2.93   2.79
14         2.10   2.03   1.78   1.97         2.21   2.67   2.67   2.52
15         1.80   1.86   1.73   1.80         2.01   2.38   2.38   2.26
16         1.56   1.54   1.57   1.56         1.81   2.29   2.30   2.13
17         1.32   1.38   1.43   1.38         1.52   2.08   2.10   1.90
18         1.29   1.23   1.19   1.24         1.31   1.89   1.90   1.70
19         1.21   1.19   1.08   1.16         1.21   1.53   1.65   1.46
20         1.15   1.04   1.03   1.07         1.15   1.30   1.32   1.26
21         0.93   0.96   0.96   0.95         0.93   1.06   1.11   1.03
22         0.89   0.87   0.89   0.88         0.89   0.99   1.02   0.97
23         0.84   0.86   0.83   0.84         0.84   0.86   0.86   0.85
24         0.79   0.84   0.81   0.81         0.79   0.84   0.85   0.83
25         0.73   0.80   0.79   0.77         0.73   0.80   0.79   0.77

Fig. 3: σ̂(Ŷ) for the four estimation methods (OLS, GLS, LAD, MAD) plotted against the number of outliers, Weibull distribution.
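The following sketch indicates how relative efficiencies of the kind reported in Tables 4-6 could be computed by simulation. It is not the paper's code: it uses normal rather than Weibull or log-gamma disturbances, an arbitrary contamination scheme that adds 10 to a chosen number of observations, and illustrative function names and replication counts:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

def fit(X, y, h):
    """h = 2: OLS; h = 1: LAD (minimized by a derivative-free search)."""
    if h == 2:
        return np.linalg.lstsq(X, y, rcond=None)[0]
    obj = lambda b: np.sum(np.abs(y - X @ b))
    return minimize(obj, np.linalg.lstsq(X, y, rcond=None)[0],
                    method="Nelder-Mead").x

def relative_efficiency(n_outliers=5, N=200):
    """sigma_LS(beta_j) / sigma_LAD(beta_j) over N replications of model (22),
       with a chosen number of contaminated observations per replication."""
    grid = np.array([(1.0, x2, x3) for x2 in range(-3, 4) for x3 in range(-3, 4)])
    beta_true = np.array([6.0, 3.0, -2.0])
    est = {1: [], 2: []}
    for _ in range(N):
        e = rng.standard_normal(len(grid))
        idx = rng.choice(len(grid), n_outliers, replace=False)
        e[idx] += 10.0                          # gross outliers (illustrative)
        y = grid @ beta_true + e
        for h in (1, 2):
            est[h].append(fit(grid, y, h))
    sd = {h: np.std(np.array(est[h]), axis=0, ddof=1) for h in (1, 2)}
    return sd[2] / sd[1]                        # values > 1 favour LAD

print(relative_efficiency())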
Table 5: Sample standard deviation of Ŷ for the four estimation methods, for the model in equ. (22), with Weibull errors (k = 5, λ = 1) and log-gamma errors (α = 5, β = 1).

No. of     Weibull distribution                 Log-gamma distribution
outliers   OLS    GLS    LAD    MAD            OLS    GLS    LAD    MAD
0          0.49   0.41   0.48   0.50           0.55   0.47   0.55   0.56
1          0.90   0.43   0.56   0.57           0.97   0.50   0.62   0.63
2          1.09   0.49   0.67   0.69           1.16   0.55   0.74   0.76
3          1.32   0.51   0.72   0.74           1.39   0.59   0.79   0.81
4          1.37   0.52   0.78   0.81           1.44   0.60   0.85   0.88
5          1.59   0.54   0.79   0.81           1.67   0.61   0.86   0.89
6          1.86   0.56   0.91   0.92           1.94   0.63   0.98   1.00
7          1.98   0.57   0.99   1.01           2.05   0.64   1.05   1.08
8          2.03   0.59   1.03   1.04           2.10   0.66   1.11   1.12
9          2.09   0.61   1.09   1.12           2.17   0.68   1.16   1.19
10         2.21   0.63   1.21   1.23           2.28   0.69   1.28   1.30
11         2.31   0.71   1.32   1.35           2.38   0.78   1.40   1.43
12         2.35   0.72   1.45   1.47           2.43   0.78   1.52   1.54
13         2.69   0.74   1.49   1.51           2.76   0.80   1.57   1.58
14         2.83   0.76   1.54   1.56           2.90   0.82   1.61   1.63
15         2.98   0.79   1.75   1.77           3.04   0.86   1.81   1.85
16         3.13   0.81   1.95   1.98           3.20   0.88   2.01   2.06
17         3.28   0.86   2.13   2.15           3.35   0.92   2.21   2.22
18         3.42   0.91   2.65   2.68           3.49   0.98   2.72   2.74
19         3.57   0.93   3.09   3.10           3.64   1.00   3.17   3.16
20         3.72   1.03   3.26   3.28           3.80   1.10   3.33   3.33
21         3.86   1.56   3.49   3.51           3.94   1.63   3.56   3.58
22         4.01   1.98   3.95   3.96           4.07   2.05   4.02   4.03
23         4.16   2.51   4.06   4.09           4.22   2.57   4.13   4.16
24         4.30   3.07   4.57   4.60           4.38   3.15   4.64   4.66
25         4.45   4.23   5.12   5.15           4.51   4.29   5.20   5.23

Fig. 4: σ̂(Ŷ) for the four estimation methods (OLS, GLS, LAD, MAD) plotted against the number of outliers, log-gamma distribution.

Table 6: Relative efficiencies of the LAD estimators when the OLS assumptions are satisfied; log-gamma errors (α = 5, β = 1) and Weibull errors (k = 5, λ = 1).

           Log-gamma (α = 5, β = 1)             Weibull (k = 5, λ = 1)
p   n      β1      β2      β3      Ŷ           β1      β2      β3      Ŷ
1   500    0.996   -       -       0.960       0.998   -       -       0.966
1   1000   0.991   -       -       0.960       0.995   -       -       0.972
2   500    0.984   0.923   -       0.954       0.991   0.935   -       0.963
2   1000   0.975   0.916   -       0.917       0.985   0.924   -       0.926
3   500    0.923   0.863   0.794   0.860       0.931   0.873   0.815   0.873
3   1000   0.895   0.829   0.768   0.831       0.912   0.856   0.807   0.859

From Table 4 it is seen that LAD estimation is clearly superior to OLS when the number of outliers ranges between 1 and 19. When the homoscedasticity assumption is satisfied, the efficiency of the LAD and MAD estimates is about 97 percent. Table 5 gives the actual values of the sample standard deviations of Ŷ for the four estimation methods; the data listed in Table 5 are also depicted graphically in Fig. 3 and Fig. 4. It is observed that when the variances of the disturbances are known, GLS gives the best estimates; when the variances of the disturbances are not known, LAD estimation looks to be a good choice. Table 6 shows that, with a moderate number of outliers as compared to Table 4, the advantage of the LAD and MAD estimators over the LS estimators increases as n increases. Table 6 is based on at least 500 to 1,000 replications; a few values were estimated from as many as 50,000 replications in order to establish with high confidence that all the efficiencies are not equal.
It was found that regardless of the number of regressors, the number of observations, the standard deviation of the disturbances, when the LS assumptions including normality assumptions were satisfied, the efficiency of LAD, MAD estimators relative to OLS estimators varies only 97 percent. The lowest efficiency for any LAD, MAD estimator was about 76 percent. Hence it is worthwhile to compare the exact probabilities of the test statistic and to a final judgment. It was also absorbed that LAD, MAD procedures, on the whole, provide the most attractive preliminary estimator for robust regression when compared to LS estimator. LAD, MAD estimators, making it particularly attractive relative to LS when the regression process is though to be particularly long-tailed. Least Absolute Deviations Estimation for the Censored Regression Model will be better for asymptotic as well as some symmetric distributions. The results obtained using LAD as well as MAD regression methods are not only robust but give consistent results. References 1. Abrevaya, J. and Shen, S. (2014). Estimation of censored panel-data models with slope heterogeneity. Journal of Applied Econometrics, Volume 29, Issue 4, pages 523–548. 2. Adichi, J. N. (1967). Estimates of Regression Parameters Based on Rank Tests. Annals of Mathematical Statistics, 38, pp.894-904. 3. Alan, S., Honoré, B. E., Hu, L., Petersen, S. L. (2014). Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation. Journal of Econometric Methods, Volume 3, Issue 1, Pages 1–20. 4. Andrews, D.F. (1974). A Robust Method of Multiple Linear Regression. Technometrics, Vol. 16, pp 523-531. 5. Arthanari, T.S. and Dodge, Y. (1981). Mathematical Programming in Statistics, John Wiley and Sons, New York. 6. Buckley, J. and James, I. (1979). Linear regression with censored data, Biometrica, 66, pp. 449 – 464. 7. Chen, H.J., Liu, X.L., and Liu, L.H. (2010).A Comparison of Different Methods for LAD Regression, Advanced Materials Research, 143-144, 1328. 8. D’Agostino, R., and M. Stephens. 1986. Goodness-of-Fit Techniques. New York: Marcel Dekker. 9. Davies, M. (1976). Linear Approximation Using the Criterion of Least Total Deviations. Journal of Royal Statistical Society, B29(1). pp. 101-109. 10. Draper, N.R and Smith. H. (1981). Applied Regression Analysis. John Wiley and Sons, New York. 11. Eakambaram. S, and Elangovan, R (2009). Least Square Versus Least Absolute Deviations Estiomation in Regression Models, International Journal of Agricultural and Statistical Sciences, Vol. %, No.2, pp. 355-372. 12. Eakambaram. S, and Elangovan, R (2010). On the Least Absolute Error Estimation of Linear Regression Models with Auto-Correlated Errors, International Journal of Physical Sciences, Vol. 22(1)M, pp.213-220. 13. Fang, Y., Man, J. and Zhao, L. (2005). Strong Convergence of LAD estimates in a censored regression model. Science in China Ser. A Mathematics, 48(2). pp. 155-168. 14. Gao, X. and Jian, H. (2010). Asymptotic analysis of high-dimensional lad regression with lasso. Statistica Sinica 20, 1485-1506. 120 Asia Pacific Journal of Research Vol: I Issue XVI, August 2014 ISSN: 2320-5504, E-ISSN-2347-4793 15. Huber, P.J. (1973). Robust Regression: Asymptotics, Conjectures and Monte Carlo. Ann. Statist. 1, pp. 799-821. 16. Kalbfleisch, J.D and Prentice. R.L. (1980). The Statistical Analysis of Failure Time Data, Wiley. 17. Kaplan, E.L and Meier. P. (1958). Non Parametric Estimation from Incomplete Observations, Journal of the American Statistical Association, 53, pp. 
18. Kotz, S. and van Dorp, J. R. (2004). Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific Press, Singapore.
19. Kroese, D. P., Taimre, T. and Botev, Z. I. (2011). Handbook of Monte Carlo Methods. John Wiley & Sons, Hoboken, New Jersey.
20. Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley and Sons, New York.
21. Powell, J. L. (1984). The asymptotic normality of two-stage least absolute deviations estimators. Econometrica, 51, pp. 1569-1575.
22. Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, pp. 303-325.
23. Powell, J. L. (1986). Symmetrically trimmed least squares estimation for Tobit models. Econometrica, 54, pp. 1435-1460.
24. Rao, Y., Jan, L. Y. and Jan, Y. N. (1990). Similarity of the product of the Drosophila neurogenic gene big brain to transmembrane channel proteins. Nature, 345(6271), pp. 163-167.
25. Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method. John Wiley and Sons, New York.
26. Soh, C., Harrington, D. P. and Zaslavsky, A. M. (2008). Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data. Lifetime Data Analysis, 14, pp. 65-85.
27. Wagner, H. M. (1959). Linear programming techniques for regression analysis. Journal of the American Statistical Association, 54, pp. 206-212.
28. Wang, Z., Wu, Y. and Zhao, L. (2009). Approximation by randomly weighting method in censored regression model. Science in China Series A: Mathematics, 52(3), pp. 561-576.
29. Wang, Z., Wu, Y. H. and Zhao, L. C. (2007). Change point estimation for censored regression model. Science in China Series A: Mathematics, 50(1), pp. 63-72.
30. Weiss, A. (1988). A comparison of ordinary least squares and least absolute error estimation. Econometric Theory, 14(3), pp. 517-527.
31. Yang, H. and Yang, J. (2014). The adaptive L1-penalized LAD regression for partially linear single-index models. Journal of Statistical Planning and Inference, 151-152, pp. 73-89.