Inverse Regression Methods
Prasad Naik
7th Triennial Choice Symposium
Wharton, June 16, 2007

Outline
- Motivation
- Principal Components (PCR)
- Sliced Inverse Regression (SIR)
  - Application
- Constrained Inverse Regression (CIR)
- Partial Inverse Regression (PIR)
  - p > N problem
  - Simulation results

Motivation
- Estimate the high-dimensional model y = g(x1, x2, ..., xp), where the link function g(.) is unknown
- Small p (≤ 6 variables): apply multivariate local (linear) polynomial regression
- Large p (> 10 variables): curse of dimensionality => empty-space phenomenon

Principal Components (PCR, Massy 1965, JASA)
- High-dimensional data X (n x p)
- Eigenvalue decomposition: Σx e = λ e, giving (λ1, e1), (λ2, e2), ..., (λp, ep)
- Retain K components (e1, e2, ..., eK), where K < p
- Low-dimensional data Z = (z1, z2, ..., zK), where zi = X ei are the "new" variables (or factors)
- Low-dimensional subspace, but K = ??
- Not the most predictive variables, because the y information is ignored

Sliced Inverse Regression (SIR, Li 1991, JASA)
- Similar idea: X (n x p) -> Z (n x K)
- Generalized eigen-decomposition: Ω e = λ Σx e, where Ω = Cov(E[X|y])
- Retain K* components (e1, ..., eK*)
- Create new variables Z = (z1, ..., zK*), where zi = X ei
- K* is the smallest integer q (= 0, 1, 2, ...) such that n(p − q) λ̄(p−q) ≤ χ²(p−q)(H−q−1), where λ̄(p−q) is the average of the p − q smallest eigenvalues and H is the number of slices
- Most predictive variables across any set of unit-norm vectors e and any transformation T(y)

SIR Applications (Naik, Hagerty, Tsai 2000, JMR)
- Model: y = g(Xβ1, Xβ2, ..., XβK, ε)
- p variables reduced to K factors
- New Product Development context: 28 variables -> 1 factor
- Direct Marketing context: 73 variables -> 2 factors

Constrained Inverse Regression (CIR, Naik and Tsai 2005, JASA)
- Can we extract meaningful factors? Yes
- First capture this prior information in a set of constraints
- Then apply our proposed method, CIR

Example 4.1 from Naik and Tsai (2005, JASA)
- Consider the 2-factor model y = Xβ1 + exp(Xβ2) ε, with p = 5 variables
- Factor 1 includes variables (4, 5); Factor 2 includes variables (1, 2, 3)
- Constraint sets: A1 β1 = 0 and A2 β2 = 0, where
  A1 = [1 0 0 0 0; 0 1 0 0 0; 0 0 1 0 0]
  A2 = [0 0 0 1 0; 0 0 0 0 1]

CIR (contd.)
- CIR approach: solve the eigenvalue decomposition (I − Pc) Ω e = λ Σx e, where Pc = Ac′(Ac Ac′)⁻¹ Ac is the projection matrix onto the constraint space
- When Pc = 0, we get SIR (i.e., SIR is nested within CIR)
- Shrinkage (e.g., Lasso): set insignificant effects to zero by formulating an appropriate constraint; this improves t-values for the other effects (i.e., efficiency)

p > N Problem
- OLS, MLE, SIR, and CIR break down when p > N

Partial Inverse Regression (PIR; Li, Cook, Tsai, Biometrika, forthcoming)
- Combines ideas from PLS and SIR
- Works well even when p > 3N and the variables are highly correlated
- Single-index model: y = g(Xβ, ε), with g(.) unknown

p > N Solution
- To estimate β, first construct the matrix R = (e1, Σx e1, Σx² e1, ..., Σx^(q−1) e1), where e1 is the principal eigenvector of Ω = Cov(E[X|y])
- Then β̂_PIR = R (R′ Σx R)⁻¹ R′ e1

Conclusions
- Inverse Regression Methods offer estimators applicable to a remarkably broad class of models and to high-dimensional data, including p > N (which is conceptually the limiting case)
- Estimators are closed-form, so they are:
  - Easy to code (just a few lines; minimal sketches follow below)
  - Computationally inexpensive
  - No iterations, re-sampling, or draws (hence no do or for loops)
  - Guaranteed convergence
- Standard errors for inference are derived in the cited papers
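A minimal sketch of the PCR step from the Principal Components slide, assuming NumPy and a generic numeric data matrix X; the function name and the choice to return both the factors and the retained eigenvectors are illustrative, not from the original talk.

```python
import numpy as np

def pcr_factors(X, K):
    """Principal components: eigendecompose Cov(X) and keep the top K directions."""
    Xc = X - X.mean(axis=0)                      # center the predictors
    Sigma_x = np.cov(Xc, rowvar=False)           # p x p covariance of X
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)   # solves Sigma_x e = lambda e
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    E_K = eigvecs[:, order[:K]]                  # retain K components (e1, ..., eK)
    Z = Xc @ E_K                                 # new variables z_i = X e_i (the factors)
    return Z, E_K
```

Because the decomposition uses only X, the retained components need not be the directions most predictive of y, which is the motivation for SIR.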
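For the SIR slide, a minimal sketch of the sliced estimate of Ω = Cov(E[X|y]) and the generalized eigen-decomposition Ω e = λ Σx e, again assuming NumPy; the equal-count slicing scheme, the default of H = 10 slices, and the function name are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def sir_directions(X, y, H=10, K=1):
    """Sliced Inverse Regression: leading eigenvectors of Cov(E[X|y]) relative to Cov(X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                           # center the predictors
    Sigma_x = np.cov(Xc, rowvar=False)                # p x p covariance of X
    # Slice the observations by the rank of y and average X within each slice
    ranks = np.argsort(np.argsort(y))
    slice_id = np.floor(ranks * H / n).astype(int)
    slice_means = np.array([Xc[slice_id == h].mean(axis=0) for h in range(H)])
    weights = np.array([(slice_id == h).mean() for h in range(H)])
    Omega = (slice_means.T * weights) @ slice_means   # estimate of Omega = Cov(E[X|y])
    # Generalized eigenproblem: Omega e = lambda Sigma_x e
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sigma_x, Omega))
    order = np.argsort(eigvals.real)[::-1]
    E = eigvecs[:, order[:K]].real                    # retain the K most predictive directions
    Z = Xc @ E                                        # new variables z_i = X e_i
    return Z, E, eigvals.real[order]
```

Unlike PCR, the y information enters through the slice means, so the retained directions are the most predictive ones.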
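The choice of K* on the SIR slide is a sequential chi-square test. Below is a sketch under the usual Li (1991) setup, where the eigenvalues are those returned (in decreasing order) by the SIR sketch above and H is the number of slices; SciPy is assumed for the chi-square quantile, and the function name is illustrative.

```python
from scipy.stats import chi2

def sir_dimension(eigvals, n, p, H, alpha=0.05):
    """Pick K*: the smallest q with n(p - q) * lambda_bar <= chi2_{(p-q)(H-q-1)}."""
    # eigvals: SIR eigenvalues sorted in decreasing order
    for q in range(min(p, H - 1)):
        lam_bar = eigvals[q:].mean()                  # average of the p - q smallest eigenvalues
        stat = n * (p - q) * lam_bar
        if stat <= chi2.ppf(1 - alpha, (p - q) * (H - q - 1)):
            return q                                  # fail to reject: dimension q is adequate
    return min(p, H - 1)
```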
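For the CIR slides, a sketch of the constrained step, taking Σx and Ω as computed in the SIR sketch. The orientation of the constraint matrix (rows of Ac are the constraint vectors, so Pc = Ac′(Ac Ac′)⁻¹ Ac) is an assumption made to match the Example 4.1 matrices, and the function name is illustrative.

```python
import numpy as np

def cir_directions(Sigma_x, Omega, A_c, K=1):
    """Constrained Inverse Regression: solve (I - P_c) Omega e = lambda Sigma_x e."""
    p = Sigma_x.shape[0]
    # Projection onto the span of the constraint rows; P_c = 0 recovers SIR
    P_c = A_c.T @ np.linalg.pinv(A_c @ A_c.T) @ A_c
    M = (np.eye(p) - P_c) @ Omega
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sigma_x, M))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:K]].real                 # constrained directions

# Example 4.1 constraint sets (p = 5): A_1 zeroes variables 1-3 in factor 1,
# A_2 zeroes variables 4-5 in factor 2
A_1 = np.array([[1., 0., 0., 0., 0.],
                [0., 1., 0., 0., 0.],
                [0., 0., 1., 0., 0.]])
A_2 = np.array([[0., 0., 0., 1., 0.],
                [0., 0., 0., 0., 1.]])
```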
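For the p > N solution, a sketch of the PIR construction: e1 is estimated by slicing as in the SIR sketch, R collects powers of Σx applied to e1, and β̂ = R(R′ Σx R)⁻¹ R′ e1. NumPy is assumed, and the number of terms q and the slicing scheme are user choices, not values fixed by the slides.

```python
import numpy as np

def pir_estimate(X, y, q=3, H=10):
    """Partial Inverse Regression for a single-index model; usable even when p > n."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = (Xc.T @ Xc) / (n - 1)                   # sample covariance (may be singular if p > n)
    # Principal eigenvector e1 of Omega = Cov(E[X|y]), estimated by slicing on y
    ranks = np.argsort(np.argsort(y))
    slice_id = np.floor(ranks * H / n).astype(int)
    slice_means = np.array([Xc[slice_id == h].mean(axis=0) for h in range(H)])
    weights = np.array([(slice_id == h).mean() for h in range(H)])
    Omega = (slice_means.T * weights) @ slice_means
    e1 = np.linalg.eigh(Omega)[1][:, -1]              # eigenvector of the largest eigenvalue
    # R = (e1, Sigma_x e1, Sigma_x^2 e1, ..., Sigma_x^(q-1) e1)
    cols = [e1]
    for _ in range(q - 1):
        cols.append(Sigma_x @ cols[-1])
    R = np.column_stack(cols)                         # p x q
    # beta_hat = R (R' Sigma_x R)^{-1} R' e1 -- only a q x q system is solved
    return R @ np.linalg.solve(R.T @ Sigma_x @ R, R.T @ e1)
```

No inverse of the p x p matrix Σx is ever taken, which is why the estimator remains usable when p > N.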