Inverse Regression Methods: A Review, Applications and Prospects

Inverse Regression Methods
Prasad Naik
7th Triennial Choice Symposium
Wharton, June 16, 2007
Outline

- Motivation
- Principal Components (PCR)
- Sliced Inverse Regression (SIR)
- Application
- Constrained Inverse Regression (CIR)
- Partial Inverse Regression (PIR)
  - p > N problem
  - simulation results
Motivation

- Estimate the high-dimensional model

    y = g(x1, x2, ..., xp)

  where the link function g(.) is unknown
- Small p (≤ 6 variables):
  - apply multivariate local (linear) polynomial regression
- Large p (> 10 variables):
  - Curse of dimensionality => Empty space phenomenon
Principal Components (PCR, Massy 1965, JASA)

- High-dimensional data X → covariance matrix Σx
- Eigenvalue decomposition (sketched in code below)

    Σx e = λ e

  yielding the eigenpairs (λ1, e1), (λ2, e2), ..., (λp, ep)
- Retain K components, (e1, e2, ..., eK), where K < p
- Low-dimensional data, Z = (z1, z2, ..., zK), where zi = X ei are the "new" variables (or factors)
- Low-dimensional subspace, K = ??
- Not the most predictive variables
  - Because y information is ignored
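A minimal numpy sketch of the PCR construction above, assuming a data matrix `X` (n × p) and a chosen number of components `K`; the function name and return values are illustrative, not from the paper.

```python
import numpy as np

def pcr_factors(X, K):
    """Minimal PCR: eigendecompose the sample covariance of X and
    keep the top-K eigenvectors; y plays no role in the construction."""
    Xc = X - X.mean(axis=0)                 # center the predictors
    Sigma_x = np.cov(Xc, rowvar=False)      # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)
    order = np.argsort(eigvals)[::-1]       # largest eigenvalues first
    E_K = eigvecs[:, order[:K]]             # (e1, ..., eK), K < p
    Z = Xc @ E_K                            # new variables z_i = X e_i
    return Z, E_K, eigvals[order]
```

Because y never enters the construction, the leading components need not be the most predictive directions, which is exactly the limitation noted in the last bullets.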
Sliced Inverse Regression (SIR, Li 1991, JASA)

- Similar idea: X (n × p) → Z (n × K)
- Generalized eigen-decomposition (sketched in code below)

    Γ e = λ Σx e

  where Γ = Cov(E[X|y])
- Retain K* components, (e1, ..., eK*)
- Create new variables Z = (z1, ..., zK*), where zi = X ei
- K* is the smallest integer q (= 0, 1, 2, ...) such that

    n (p − q) λ̄_(p−q) ≤ χ²_((p−q)(H−q−1))

  i.e., the average of the p − q smallest eigenvalues, scaled by n(p − q), falls below the χ² critical value with (p − q)(H − q − 1) degrees of freedom, where H is the number of slices
- Most predictive variables across
  - any set of unit-norm vectors e's and
  - any transformation T(y)
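A sketch of the SIR recipe above, assuming `H` equal-count slices of y and the slice-mean estimate of Γ = Cov(E[X|y]); the function name, the default `H=10`, and the use of `scipy.linalg.eigh` for the generalized eigenproblem are my choices, and the dimension test for K* is not implemented here.

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, H=10, K=1):
    """Sketch of SIR: estimate Gamma = Cov(E[X|y]) by slicing y, then
    solve the generalized eigenproblem Gamma e = lambda Sigma_x e."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = np.cov(Xc, rowvar=False)

    # Slice y into H roughly equal-count slices; average centered X per slice
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), H):
        m_h = Xc[idx].mean(axis=0)
        Gamma += (len(idx) / n) * np.outer(m_h, m_h)   # weighted slice means

    # Generalized eigen-decomposition: Gamma e = lambda Sigma_x e
    eigvals, eigvecs = eigh(Gamma, Sigma_x)            # ascending eigenvalues
    keep = np.argsort(eigvals)[::-1][:K]               # retain K* directions
    E = eigvecs[:, keep]
    return Xc @ E, E, eigvals[::-1]                    # Z, directions, eigenvalues
```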
SIR Applications (Naik, Hagerty, Tsai 2000, JMR)

- Model

    y = g(Xβ1, Xβ2, ..., XβK, ε)

  i.e., p variables reduced to K factors
- New Product Development context
  - 28 variables → 1 factor
- Direct Marketing context
  - 73 variables → 2 factors
Constrained Inverse Regression (CIR, Naik and Tsai 2005, JASA)

- Can we extract meaningful factors? Yes:
  - First capture the prior information in a set of constraints
  - Then apply our proposed method, CIR
Example 4.1 from Naik and Tsai (2005, JASA)

- Consider the 2-factor model (simulated in the sketch below)

    y = Xβ1 + exp(Xβ2) + ε

- p = 5 variables
- Factor 1 includes variables (4, 5)
- Factor 2 includes variables (1, 2, 3)
- Constraint sets: A1 β1 = 0 and A2 β2 = 0, where

    A1 = [ 1 0 0 0 0
           0 1 0 0 0
           0 0 1 0 0 ]

    A2 = [ 0 0 0 1 0
           0 0 0 0 1 ]
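To make Example 4.1 concrete, here is a hedged simulation sketch; the sample size, coefficient values, and noise scale are invented for illustration, while the zero patterns in β1, β2 and the matrices A1, A2 match the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5

# Illustrative coefficients: factor 1 loads only on variables (4, 5),
# factor 2 only on variables (1, 2, 3); the values themselves are made up.
beta1 = np.array([0.0, 0.0, 0.0, 1.0, -1.0])
beta2 = np.array([0.5, 0.5, 0.5, 0.0, 0.0])

X = rng.normal(size=(n, p))
y = X @ beta1 + np.exp(X @ beta2) + 0.1 * rng.normal(size=n)

# Constraint matrices from the slide: A1 beta1 = 0 and A2 beta2 = 0
A1 = np.array([[1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0],
               [0, 0, 1, 0, 0]], dtype=float)
A2 = np.array([[0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1]], dtype=float)
assert np.allclose(A1 @ beta1, 0) and np.allclose(A2 @ beta2, 0)
```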
CIR (contd.)

- CIR approach
  - Solve the eigenvalue decomposition (sketched in code below):

      (I − Pc) Γ e = λ Σx e

    where the projection matrix Pc = Ac'(Ac Ac')⁻ Ac
  - When Pc = 0, we get SIR (i.e., nested)
- Shrinkage (e.g., Lasso)
  - set insignificant effects to zero by formulating an appropriate constraint
  - improves t-values for the other effects (i.e., efficiency)
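A sketch of the CIR eigenproblem as written on this slide, which can reuse the slice-based Γ and Σx from the SIR sketch above; the restored transposes in Pc, the pseudoinverse, and the general (non-symmetric) eigensolver are my reading of the formula, not the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import eig

def cir_direction(Gamma, Sigma_x, A_c):
    """Solve (I - Pc) Gamma e = lambda Sigma_x e, where Pc projects onto
    the row space of the constraint matrix A_c (m x p)."""
    p = Sigma_x.shape[0]
    # One reading of the slide's Pc = Ac'(Ac Ac')- Ac, via a pseudoinverse
    Pc = A_c.T @ np.linalg.pinv(A_c @ A_c.T) @ A_c
    M = (np.eye(p) - Pc) @ Gamma          # not symmetric in general
    eigvals, eigvecs = eig(M, Sigma_x)    # general (complex-capable) solver
    lead = np.argmax(np.abs(eigvals.real))
    e = eigvecs[:, lead].real
    return e / np.linalg.norm(e)

# With no constraints (A_c all zeros), Pc = 0 and the problem reduces to SIR.
```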
p > N Problem

- OLS, MLE, SIR, CIR break down when p > N
- Partial Inverse Regression (Li, Cook, Tsai, Biometrika, forthcoming)
  - Combines ideas from PLS and SIR
  - Works well even when
    - p > 3N
    - variables are highly correlated
- Single-index model

    y = g(Xβ) + ε

  with g(.) unknown
p > N Solution

- To estimate β, first construct the matrix R as follows (sketched in code below):

    R = (e1, Σx e1, Σx^2 e1, ..., Σx^(q−1) e1)

  where e1 is the principal eigenvector of Γ = Cov(E[X|y])
- Then

    β̂_PIR = R (R' Σx R)⁻¹ R' e1
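A sketch of the PIR estimator as displayed above, again estimating Γ by slicing; the defaults `H=10` and `q=3` and the pseudoinverse in place of the inverse are my choices, and the full treatment is in the cited Li, Cook, and Tsai paper.

```python
import numpy as np

def pir_estimate(X, y, H=10, q=3):
    """Sketch of partial inverse regression:
    R = (e1, Sigma_x e1, ..., Sigma_x^(q-1) e1), with e1 the principal
    eigenvector of Gamma = Cov(E[X|y]); beta_hat = R (R' Sigma_x R)^- R' e1."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = np.cov(Xc, rowvar=False)     # may be singular when p > N

    # Estimate Gamma = Cov(E[X|y]) by slicing y, as in SIR
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), H):
        m_h = Xc[idx].mean(axis=0)
        Gamma += (len(idx) / n) * np.outer(m_h, m_h)

    # Principal eigenvector e1 of Gamma
    w, V = np.linalg.eigh(Gamma)
    e1 = V[:, np.argmax(w)]

    # Krylov-type matrix R = (e1, Sigma_x e1, ..., Sigma_x^(q-1) e1)
    cols, v = [], e1
    for _ in range(q):
        cols.append(v)
        v = Sigma_x @ v
    R = np.column_stack(cols)

    # beta_hat = R (R' Sigma_x R)^- R' e1; only a q x q matrix is inverted,
    # which is what keeps the estimator usable when p > N
    return R @ np.linalg.pinv(R.T @ Sigma_x @ R) @ (R.T @ e1)
```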
Conclusions

- Inverse Regression Methods offer estimators that are applicable for
  - a remarkably broad class of models
  - high-dimensional data, including p > N (which is conceptually the limiting case)
- Estimators are closed-form, so
  - Easy to code (just a few lines)
  - Computationally inexpensive
    - No iterations or re-sampling or draws (hence no do or for loops)
    - Guaranteed convergence
- Standard errors for inference are derived in the cited papers