Introduction to Multiway Analysis
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
1. Introduction
The aim of the course is to give an introduction to the use of multiway models
such as PARAFAC, Tucker3 and N-PLS. The course will consist of both
lectures and practical exercises using MATLAB. If you have any questions
about the course, you can ask Steve Gurden (Sala H-209,
spgurden@iqm.unicamp.br).
Some prior knowledge of PCA (principal components analysis), PCR
(principal components regression), PLS (partial least squares regression) and
the MATLAB software is assumed. Students who are not familiar with these
may like to
(a) borrow a chemometrics book from the chemistry library for an
introduction to PCA, PCR and PLS (there are also some introductions
to chemometrics on the internet, e.g. at www.spectroscopynow.com),
and/or
(b) follow the ‘Getting Started’ guide found in the MATLAB ‘Help’ menu for
an introduction to MATLAB (Version 6).
2. Suggested reading
Although many multiway analysis techniques were developed in the area of
psychometrics (i.e. the analysis of data within psychology), it is generally more
useful to read the chemistry and chemometrics literature. In particular, both
Rasmus Bro and Age Smilde have written some nice introductions:
1. Andersson, C.A., Bro, R., “The N-way Toolbox for MATLAB”,
Chemometrics and Intelligent Laboratory Systems, 52 (2000), 1-4.
2. Bro, R., “PARAFAC – Tutorial and applications”, Chemometrics and
Intelligent Laboratory Systems, 38 (1997), 149-171.
3. Bro, R., “Multi-way calibration. Multi-linear PLS”, Journal of
Chemometrics, 10 (1996), 47-63.
4. Bro, R., “Multi-way analysis in the food industry”, PhD Thesis, available
for download from the internet at
http://www.models.kvl.dk/users/rasmus/thesis/thesis.html
5. Gurden, S.P., Westerhuis, J.A., Bro, R., Smilde, A.K., “A comparison of
multiway regression and scaling methods”, Chemometrics and
Intelligent Laboratory Systems, 59 (2001), 121-136.
6. Smilde, A.K., “3-way analyses: problems and prospects”,
Chemometrics and Intelligent Laboratory Systems, 15 (1992), 143-157.
There is no good book about multiway analysis in chemistry at the moment,
although I believe Smilde and Bro are writing one.
3. Notation
Scalars are written in lower-case italics, e.g. $x$, $i$. If the scalar is an
element of a vector, matrix etc., then subscript indices may be used, e.g. $x_{ij}$.
Vectors are written in lower-case bold, e.g. $\mathbf{y}$. Sometimes, the
dimensions may also be given, e.g. $\mathbf{y}$ ($N \times 1$). An element of this
vector is a scalar, given by $y_n$.
The mean of vector $\mathbf{y}$ is a scalar given by $\bar{y}$.
Matrices are written in upper-case bold, e.g. $\mathbf{X}$. Sometimes, the
dimensions may also be given, e.g. $\mathbf{X}$ ($I \times J$). This matrix has
$I$ rows and $J$ columns. An element of this matrix is a scalar, given by $x_{ij}$.
A column-vector of this matrix is given by $\mathbf{x}_j$ ($I \times 1$).
The transpose of $\mathbf{y}$ ($N \times 1$) is given by $\mathbf{y}^T$ ($1 \times N$).
The transpose of $\mathbf{X}$ ($I \times J$) is given by $\mathbf{X}^T$ ($J \times I$).
Three-way and higher order arrays are written in upper-case, underlined bold,
e.g. $\underline{\mathbf{X}}$. Sometimes, the dimensions may also be given, e.g.
$\underline{\mathbf{X}}$ ($I \times J \times K$). An element of this array is a
scalar, given by $x_{ijk}$.
To summarize:

Symbol                      Type                       Dimensions
$x$                         scalar                     $1 \times 1$
$\mathbf{x}$                vector                     $I \times 1$
$\mathbf{X}$                matrix                     $I \times J$
$\underline{\mathbf{X}}$    3-way (or higher) array    $I \times J \times K$
Multiway notation
There is some current debate about the best notation to use for multiway
analysis. The main problem is that using classical tensor notation is quite
complicated. Researchers such as H. Kiers, B. Alsberg and R. Harshman have all proposed
different schemes. In this course, the following notation (similar to that used
by Kiers and Bro) will be used.
When three-way arrays are matricized (or ‘unfolded’), they are written as
$\mathbf{X}^{(I \times JK)}$ ($I \times JK$), where the superscript
‘$(I \times JK)$’ indicates that, in this case, the array has been unfolded
along the third mode. In MATLAB, this is equivalent to
X=reshape(X,I,J*K).
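The following sketch may help to see what this reshape command does. Because MATLAB stores arrays column by column, the result is the K frontal slices (each I x J) placed side by side, so that element x_ijk ends up in row i and column j + (k-1)J. The array sizes and the names X3, Xunf and Xcat are assumptions made for this example.

```matlab
% Hypothetical small array so that the result is easy to inspect
I = 2; J = 3; K = 4;
X3 = reshape(1:I*J*K, I, J, K);    % three-way array, I x J x K

% Unfold to an I x JK matrix, as in the text
Xunf = reshape(X3, I, J*K);

% The same matrix built explicitly from the K frontal slices
Xcat = zeros(I, J*K);
for k = 1:K
    Xcat(:, (k-1)*J + (1:J)) = X3(:, :, k);
end
assert(isequal(Xunf, Xcat));

% Element x_ijk is found in row i, column j + (k-1)*J
i = 2; j = 3; k = 4;
assert(Xunf(i, j + (k-1)*J) == X3(i, j, k));

% Folding back recovers the original three-way array
Xfold = reshape(Xunf, I, J, K);
assert(isequal(Xfold, X3));
```

Unfolding so that a different mode forms the rows would first need the modes to be reordered (e.g. with permute) before calling reshape.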
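Two matrix products which are sometimes used are the Kronecker product,
$\otimes$, and the Khatri-Rao product, $\odot$. These are defined as follows:

Kronecker product:

$$
\mathbf{X} \otimes \mathbf{Y} =
\begin{bmatrix}
x_{11}\mathbf{Y} & \cdots & x_{1R}\mathbf{Y} \\
\vdots & \ddots & \vdots \\
x_{I1}\mathbf{Y} & \cdots & x_{IR}\mathbf{Y}
\end{bmatrix}
\tag{1}
$$

where $(\mathbf{X} \otimes \mathbf{Y})^T = \mathbf{X}^T \otimes \mathbf{Y}^T$.

Khatri-Rao product:

$$
\mathbf{X} \odot \mathbf{Y} =
\begin{bmatrix}
\mathbf{x}_1 \otimes \mathbf{y}_1 & \mathbf{x}_2 \otimes \mathbf{y}_2 & \cdots & \mathbf{x}_R \otimes \mathbf{y}_R
\end{bmatrix}
\tag{2}
$$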
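The two products can be tried out numerically with a short sketch. MATLAB's built-in kron function gives the Kronecker product. The Khatri-Rao product is computed here with an explicit loop over columns, which is an assumption of this example and not a particular toolbox routine. The matrices and the names XkY and XoY are made up for illustration.

```matlab
% Hypothetical matrices with the same number of columns R
X = [1 2; 3 4; 5 6];          % I x R, here 3 x 2
Y = [1 0; 0 1; 2 2; 1 3];     % J x R, here 4 x 2
R = size(X, 2);

% Kronecker product, (I*J) x (R*R)
XkY = kron(X, Y);

% Khatri-Rao product: Kronecker product taken column by column, (I*J) x R
XoY = zeros(size(X, 1) * size(Y, 1), R);
for r = 1:R
    XoY(:, r) = kron(X(:, r), Y(:, r));
end

% Transposition rule for the Kronecker product
assert(isequal(XkY', kron(X', Y')));

% The Khatri-Rao product is a subset of the columns of the Kronecker
% product: columns 1, R+2, 2R+3, ... (those pairing x_r with y_r)
assert(isequal(XoY, XkY(:, 1:R+1:R*R)));
```

The last check shows why the Khatri-Rao product is sometimes described as a column-wise Kronecker product.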
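These matrix products allow the Tucker3 model to be written as

$$
\mathbf{X}^{(I \times JK)} = \mathbf{A}\,\mathbf{G}^{(R_1 \times R_2 R_3)}\,(\mathbf{C} \otimes \mathbf{B})^T + \mathbf{E}^{(I \times JK)}
\tag{3}
$$

and the PARAFAC model as

$$
\begin{aligned}
\mathbf{X}^{(I \times JK)} &= \mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}^{(I \times JK)} \\
\text{or} \quad \mathbf{X}^{(I \times JK)} &= \mathbf{A}\,\mathbf{I}^{(R \times RR)}\,(\mathbf{C} \otimes \mathbf{B})^T + \mathbf{E}^{(I \times JK)}
\end{aligned}
\tag{4}
$$

where $\mathbf{I}^{(R \times RR)}$ is an unfolded superidentity matrix.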
It is important to point out that the Tucker3 and PARAFAC models are not
performed on unfolded data. Unfolding is only used as a convenient way to
write the models as a formula.