Discussion of “Least Angle Regression” by Weisberg



Mike Salwan

November 2, 2006

Stat 882

Introduction

 “Notorious” problem of automatic model building algorithms for linear regression

 Implicit Assumption

 Replacing Y by something without loss of info

 Selecting variables

 Summary

Implicit Assumption

 We have an n × m matrix of predictors X and an n-vector of responses Y

 P is the projection onto the column space of (1, X)

 LARS assumes we can replace Y with Ŷ = PY; at least in large samples, this amounts to assuming F(y|x) = F(y|x′β)

 We estimate the residual variance by σ̂² = ||(I − P)Y||² / (n − m − 1) (see the sketch below)

 If this assumption does not hold, then LARS is unlikely to produce useful results
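
Below is a minimal R sketch, using simulated data rather than anything from the paper, of the quantities on this slide: the projection P onto the column space of (1, X), the fit Ŷ = PY, and the residual variance estimate σ̂². All names are illustrative.

```r
# Simulated data; names and sizes are illustrative, not from the discussion.
set.seed(1)
n <- 100; m <- 10
X <- matrix(rnorm(n * m), n, m)               # n x m predictor matrix
Y <- drop(X %*% rnorm(m) + rnorm(n))          # n-vector response

X1 <- cbind(1, X)                             # design matrix with intercept
P  <- X1 %*% solve(crossprod(X1), t(X1))      # projection onto the column space of (1, X)
Yhat <- drop(P %*% Y)                         # fitted values PY
sigma2.hat <- sum(((diag(n) - P) %*% Y)^2) / (n - m - 1)   # residual variance estimate
sigma2.hat
```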

Implicit Assumption (cont)

 Alternative: let F(y|x) = F(y|x′B), where B is an m × d matrix of rank d. The smallest such d is called the structural dimension of the regression problem

 The R package dr can be used to estimate d, using methods such as sliced inverse regression (see the sketch below)

 One can then fit a smooth function of the resulting d projections x′B

 In the paper, the predictor set was expanded from 10 to 65 variables so that F(y|x) = F(y|x′β) would hold at least approximately
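
A sketch of how the structural dimension d might be estimated by sliced inverse regression with the dr package; the data are simulated (y depends on x only through one linear combination, so d = 1), and the setup is mine, not the discussion's.

```r
# install.packages("dr")                       # Weisberg's dimension-reduction package
library(dr)
set.seed(2)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- exp(0.5 * (x1 + x2)) + 0.2 * rnorm(n)    # single-index model: structural dimension d = 1
fit <- dr(y ~ x1 + x2 + x3, method = "sir")    # sliced inverse regression
summary(fit)                                   # dimension tests and estimated directions
```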

Implicit Assumption (cont)

 LARS relies too much on correlations

 Correlation measures degree of linear association (obviously)

 Requires linearity in the conditional distributions of y and of a′x given b′x, for all a and b; otherwise bizarre results can occur

 Any method replacing Y by PY cannot be sensitive to nonlinearity

Implicit Assumption (cont)

 Methods based on PY alone can be strongly influenced by outliers and high-leverage cases

 Consider

Cp(Ŷ) = ||Y − Ŷ||² / σ̂² − n + (2/σ̂²) Σ_{i=1}^n cov(ŷ_i, y_i)

 Estimate σ² by σ̂² = ||(I − P)Y||² / (n − m − 1)

 Thus the ith term is given by:

Cpi(Ŷ) = (y_i − ŷ_i)² / σ̂² − 1 + 2 cov(ŷ_i, y_i) / σ̂²

 If ŷ_i is the ith element of PY, then cov(ŷ_i, y_i) = σ² h_i, where h_i, the ith leverage, is a diagonal element of P (see the sketch below)
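
A small numerical check of the per-case term when Ŷ = PY, so that cov(ŷ_i, y_i) = σ² h_i; the data are simulated and illustrative only. For the full projection fit these terms sum to m + 1, the number of parameters.

```r
set.seed(1)
n <- 100; m <- 10
X <- matrix(rnorm(n * m), n, m)
Y <- drop(X %*% rnorm(m) + rnorm(n))
X1 <- cbind(1, X)
P  <- X1 %*% solve(crossprod(X1), t(X1))      # full-model projection
Yhat <- drop(P %*% Y)                         # PY
h    <- diag(P)                               # leverages h_i
sigma2.hat <- sum((Y - Yhat)^2) / (n - m - 1)
Cpi <- (Y - Yhat)^2 / sigma2.hat - 1 + 2 * h  # ith term with cov(yhat_i, y_i) = sigma^2 * h_i
sum(Cpi)                                      # equals m + 1 for the full fit
```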

Implicit Assumption (cont)

 From the simulations in the article, cov(ŷ_i, y_i) ≈ σ̂² u_i, where u_i is the ith diagonal of the projection matrix on the columns of (1, X) at the current step of the algorithm

 Thus,

Cpi(Ŷ) ≈ (ỹ_i − ŷ_i)² / σ̂² + u_i − (h_i − u_i)

where ỹ_i is the ith element of PY

 This is the same formula as in another paper by Weisberg, except that here ŷ_i is computed from LARS instead of from a projection

Implicit Assumption (cont)

 The value of Cpi thus depends on the agreement between ỹ_i and ŷ_i, the leverage in the subset model, and the difference in leverage between the full and subset models

 Neither of the latter two terms has much to do with the problem of interest (the study of the conditional distribution of y given x); they are determined by the predictors only (see the sketch below)
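
A sketch of the decomposition above, with an ordinary least squares fit on a hypothetical active subset (columns 1–4 of a simulated X) standing in for the LARS fit at the current step. Only the first term involves Y; u_i and (h_i − u_i) are functions of the predictors alone.

```r
set.seed(1)
n <- 100; m <- 10; p <- 4                      # p = size of the (hypothetical) active subset
X <- matrix(rnorm(n * m), n, m)
Y <- drop(X %*% rnorm(m) + rnorm(n))

proj <- function(Z) Z %*% solve(crossprod(Z), t(Z))
P <- proj(cbind(1, X))                         # full-model projection, leverages h_i
U <- proj(cbind(1, X[, 1:p]))                  # projection on (1, X_A) for the subset
h <- diag(P); u <- diag(U)
sigma2.hat <- sum(((diag(n) - P) %*% Y)^2) / (n - m - 1)

ytilde <- drop(P %*% Y)                        # full-model fit
yhat   <- drop(U %*% Y)                        # subset fit, standing in for the LARS fit
Cpi <- (ytilde - yhat)^2 / sigma2.hat + u - (h - u)
c(sum(Cpi), sum((Y - yhat)^2) / sigma2.hat - n + 2 * (p + 1))   # both equal Cp for this fit
```

For a projection subset fit the two totals agree exactly; with a LARS fit the per-case form is an approximation.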

Selecting Variables

 We want to decompose x into two parts, x_u and x_a, where x_a represents the active predictors

 We want the smallest x_a such that F(y|x) = F(y|x_a), often using some criterion

 Standard methods are too greedy

 LARS permits highly correlated predictors to be used

Selecting Variables (cont)

 Example exposing a weakness of LARS

 Added nine new variables by multiplying original variables by 2.2, then rounding to the nearest integer

 LARS method applied to both sets

 LARS selects two of the rounded variables, and it includes both one original variable (BP) and its rounded copy (see the sketch below)
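
A sketch of the rounding experiment, using the diabetes data that ships with the lars package. Those predictors are stored in standardized form, so here they are put on unit-SD scale first so that "multiply by 2.2 and round" is not degenerate, and the binary sex variable is left out of the rounded copies; the exact construction in the discussion may differ.

```r
library(lars)
data(diabetes)
Xs <- scale(diabetes$x)                               # 442 x 10 predictors, unit SD
Xr <- round(2.2 * Xs[, colnames(Xs) != "sex"])        # nine rounded copies
colnames(Xr) <- paste0(colnames(Xr), ".round")
fit <- lars(cbind(Xs, Xr), diabetes$y, type = "lar")  # LARS on original + rounded variables
summary(fit)                                          # Df, RSS and Cp at each step
plot(fit)                                             # coefficient paths: note which rounded copies enter
```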

Selecting Variables (cont)

 Inclusion or exclusion depends on the marginal distribution of x as much as on the conditional distribution of y|x

 Ex: Two variables have a high correlation.

 LARS selects one for its active set

 Modify the other to make it now uncorrelated

 Doesn’t change y|x, changes marginal of x

 Could change the set of active predictors selected by LARS, or by any method that uses correlation (see the simulation sketch below)
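
A simulation sketch of this thought experiment (the variable names and data-generating rule are mine): y depends on x1 and x3 only, so the conditional law y | x is unchanged when x2 is swapped from a near-copy of x1 to an independent variable, yet the order in which LARS activates the variables can change.

```r
library(lars)
set.seed(3)
n <- 200
x1 <- rnorm(n); x3 <- rnorm(n)
x2.corr  <- x1 + 0.1 * rnorm(n)               # x2 highly correlated with x1
x2.indep <- rnorm(n)                          # x2 made uncorrelated; only the marginal of x changes
y <- x1 + 0.5 * x3 + 0.5 * rnorm(n)           # y | x does not involve x2 at all

fit1 <- lars(cbind(x1 = x1, x2 = x2.corr,  x3 = x3), y, type = "lar")
fit2 <- lars(cbind(x1 = x1, x2 = x2.indep, x3 = x3), y, type = "lar")
first.step <- function(f) apply(coef(f) != 0, 2, function(z) which(z)[1] - 1)
rbind(correlated = first.step(fit1), uncorrelated = first.step(fit2))  # step at which each variable enters
```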

Selecting Variables (cont)

 LARS results are invariant under rescaling, but not under reparameterization of related predictors

 Scaling the predictors first and then adding all cross-products and quadratics gives a different model than adding those terms first and then scaling

 This can be solved by considering the related terms simultaneously, but doing so is self-defeating for subset selection (see the sketch below)
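
A sketch of the order-of-operations point (simulated data, my own construction): centering and scaling before forming squares and cross-products gives a different parameterization of the same quadratic column space than forming the terms from the raw predictors, and LARS can activate different terms in the two cases.

```r
library(lars)
set.seed(4)
n <- 200
X <- matrix(rnorm(n * 3, mean = 2), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- drop(X %*% c(1, -1, 0) + 0.5 * X[, 1] * X[, 2] + rnorm(n))

expand <- function(Z) {                        # append squares and pairwise products
  sq <- Z^2; colnames(sq) <- paste0(colnames(Z), ".sq")
  cbind(Z, sq, x1x2 = Z[, 1] * Z[, 2], x1x3 = Z[, 1] * Z[, 3], x2x3 = Z[, 2] * Z[, 3])
}
A <- expand(scale(X))                          # scale first, then expand
B <- expand(X)                                 # expand the raw predictors instead
fitA <- lars(A, y, type = "lar")
fitB <- lars(B, y, type = "lar")
first.step <- function(f) apply(coef(f) != 0, 2, function(z) which(z)[1] - 1)
first.step(fitA); first.step(fitB)             # entry orders generally differ
```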

Summary

 Problems gain notoriety because their solution is elusive but of wide interest

 Neither LARS nor any other automatic model-selection method considers the context of the problem

 There seems to be no foreseeable solution to this problem
