Partial Least Squares

advertisement
Partial Least Squares:
Consider a multiple regression of a column Ynx1 on a matrix Xnxp where Y’Y=1 and X’X=Ipxp. If Y is a matrix
then I center and scale so that Y’Y=I as well.
I want to find a vector Vpx1 with V’V=1, such that the nx1 column vector XV is as highly correlated with Y
as possible. If the correlation is high enough, then I can regress Y on vector XV instead of on matrix X
getting a single scalar coefficient c. There would then be a px1 vector b of “regression coefficients”
given by b=cV.
If Ynxm is a matrix, I want to find Wmx1 with W’W=1 such that the column vectors (linear combinations of
the columns of X and the columns of Y), XV and YW are as highly correlated as possible. This differs
from principal component regression which does the singular value decomposition only on X. Because of
the centering and scaling above, this means I want to maximize (XV)’(YW), i.e. V’X’YW. By definition,
then, I want to find the first singular value and the left and right singular vectors of X’Y.
Now back to the case in which Y is a column vector and Y’Y=1, then the singular value decomposition
gives X’Y = LDR’ where, because Y is just a column vector, R’ is the scalar 1 and D is the largest (and only)
singular value. Things are a little different if Y is a matrix. In that case D is a diagonal matrix. With Y a
vector with the centering and scaling, X’Y is a vector of correlations between matrix X and vector Y. The
length of vector X’Y is D = the square root of Y’XX’Y, so to normalize, we divide by this length getting
L=X’Y/sqrt(Y’XX’Y) and clearly X’Y=(X’Y/sqrt(Y’XX’Y))*sqrt(Y’XX’Y)*1 which is our desired L*D*R’. This
means that L=X’Y/sqrt(Y’XX’Y) is the left singular vector of X’Y and so XL is the linear combination XV we
sought at the start. Regressing Y on XV=XL we get coefficient c and thus the predicted Y is XLc=(X)(Lc)=Xb
so that multiplying L by c gives the vector b of coefficients relating Y to X-> predicted Y = X b = X(cL).
In this SAS program, I modify the example in the PROC PLS documentation to initially center and scale,
then I show with IML the calculations described above to verify that PROC PLS uses the results above to
compute the solutions.
ods html close; ods listing; ods listing gpath="%sysfunc(pathname(work))";
data data;
input x1 x2 y;
x1 = (x1-0.65627)/2.12473;
x2=(x2-0.21468)/1.78008;
y =(y-0.31225)/0.76742;
datalines;
3.37651 2.30716
0.75615
0.74193 -0.88845
1.15285
4.18747 2.17373
1.42392
0.96097 0.57301
0.27433
-1.11161 -0.75225
-0.25410
-1.38029 -1.31343
-0.04728
1.28153 -0.13751
1.00341
-1.39242 -2.03615
0.45518
0.63741 0.06183
0.40699
-2.52533 -1.23726
-0.91080
2.44277 3.61077
-0.82590
;
proc corr cov;
run;
proc reg; model Y = X1 X2;
run;
proc pls data=data nfac=1 method=rrr;
model y = x1 x2/solution;
run;
proc pls data=data nfac=1 method=pcr;
model y = x1 x2/solution;
run;
proc pls data=data nfac=1 method=pls;
model y = x1 x2/solution;
run;
proc iml; use data;
read all var{X1 X2} into X;
read all var{Y} into Y;
print X Y;
XPY=X`*Y; XPX=X`*X; YPY=Y`*Y;
print XPX XPY YPY;
call svd(L, D, R, XPY); print L D R;
guess = sqrt(Y`*X*X`*Y); print guess;
delta = inv((X*L)`*(X*L))*(X*L)`*Y;
print delta;
Ldelta=L*delta; print Ldelta;
** compare to IML below;
** check centering and scaling, compute XPY ;
** get left eigenvector L;
** check that D is sqrt(YPX times XPY);
** regress Y on XL coefficient delta;
** convert back to betas;
Download