Introduction to Multiway Analysis
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP

1. Introduction

The aim of the course is to give an introduction to the use of multiway models such as PARAFAC, Tucker3 and N-PLS. The course will consist of both lectures and practical exercises using MATLAB. If you have any questions about the course, you can ask Steve Gurden (Sala H-209, spgurden@iqm.unicamp.br).

Some prior knowledge of PCA (principal components analysis), PCR (principal components regression), PLS (partial least squares regression) and the MATLAB software is assumed. Students who are not familiar with these may like to:

(a) borrow a chemometrics book from the chemistry library for an introduction to PCA, PCR and PLS (there are also some introductions to chemometrics on the internet, e.g. at www.spectroscopynow.com), and/or

(b) follow the ‘Getting Started’ guide found in the ‘Help’ menu for an introduction to MATLAB (Version 6).

2. Suggested reading

Although many multiway analysis techniques were developed in the area of psychometrics (i.e. the analysis of data within psychology), it is generally more useful to read the chemistry and chemometrics literature. In particular, both Rasmus Bro and Age Smilde have written some nice introductions:

1. Andersson, C.A., Bro, R., “The N-way Toolbox for MATLAB”, Chemometrics and Intelligent Laboratory Systems, 52 (2000), 1-4.
2. Bro, R., “PARAFAC – Tutorial and applications”, Chemometrics and Intelligent Laboratory Systems, 38 (1997), 149-171.
3. Bro, R., “Multi-way calibration. Multi-linear PLS.”, Journal of Chemometrics, 10 (1996), 47-63.
4. Bro, R., “Multi-way analysis in the food industry”, PhD Thesis, available for download from the internet at http://www.models.kvl.dk/users/rasmus/thesis/thesis.html
5. Gurden, S.P., Westerhuis, J.A., Bro, R., Smilde, A.K., “A comparison of multiway regression and scaling methods”, Chemometrics and Intelligent Laboratory Systems, 59 (2001), 121-136.
6.
Smilde, A.K., “3-way analyses: problems and prospects”, Chemometrics and Intelligent Laboratory Systems, 15 (1992), 143-157.

There is no good book about multiway analysis in chemistry at the moment, although I believe Smilde and Bro are writing one.

3. Notation

Scalars are written in lower-case italics, e.g. $x$, $i$. If the scalar is an element of a vector, matrix etc. then subscript indices may be used, e.g. $x_{ij}$.

Vectors are written in lower-case bold, e.g. $\mathbf{y}$. Sometimes, the dimensions may also be given, e.g. $\mathbf{y}$ ($N \times 1$). An element of this vector is a scalar, given by $y_n$. The mean of vector $\mathbf{y}$ is a scalar given by $\bar{y}$.

Matrices are written in upper-case bold, e.g. $\mathbf{X}$. Sometimes, the dimensions may also be given, e.g. $\mathbf{X}$ ($I \times J$). This matrix has $I$ rows and $J$ columns. An element of this matrix is a scalar, given by $x_{ij}$. A column-vector of this matrix is given by $\mathbf{x}_j$ ($I \times 1$).

The transpose of $\mathbf{y}$ ($N \times 1$) is given by $\mathbf{y}^T$ ($1 \times N$). The transpose of $\mathbf{X}$ ($I \times J$) is given by $\mathbf{X}^T$ ($J \times I$).

Three-way and higher order arrays are written in upper-case, underlined bold, e.g. $\underline{\mathbf{X}}$. Sometimes, the dimensions may also be given, e.g. $\underline{\mathbf{X}}$ ($I \times J \times K$). An element of this array is a scalar, given by $x_{ijk}$.

To summarize:

Symbol                      Type                       Dimensions
$x$                         scalar                     $1 \times 1$
$\mathbf{x}$                vector                     $I \times 1$
$\mathbf{X}$                matrix                     $I \times J$
$\underline{\mathbf{X}}$    3-way (or higher) array    $I \times J \times K$

Multiway notation

There is some current debate about the best notation to use for multiway analysis. The main problem is that classical tensor notation is quite complicated. Researchers such as H. Kiers, B. Alsberg and R. Harshman have all proposed different schemes. In this course, the following notation (similar to that used by Kiers and Bro) will be used.

When three-way arrays are matricized (or ‘unfolded’), they are written as $\mathbf{X}^{(I \times JK)}$ ($I \times JK$), where the superscript ‘$I \times JK$’ indicates that, in this case, the array has been unfolded along the third mode, i.e. the $K$ frontal slices (each $I \times J$) are placed side by side. In MATLAB, this is equivalent to X=reshape(X,I,J*K).

Two matrix products which are sometimes used are the Kronecker product, $\otimes$, and the Khatri-Rao product, $\odot$.
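These are defined as follows:

Kronecker product:

\[
\mathbf{X} \otimes \mathbf{Y} =
\begin{bmatrix}
x_{11}\mathbf{Y} & \cdots & x_{1R}\mathbf{Y} \\
\vdots & \ddots & \vdots \\
x_{I1}\mathbf{Y} & \cdots & x_{IR}\mathbf{Y}
\end{bmatrix}
\tag{1}
\]

where $(\mathbf{X} \otimes \mathbf{Y})^T = \mathbf{X}^T \otimes \mathbf{Y}^T$.

Khatri-Rao product:

\[
\mathbf{X} \odot \mathbf{Y} =
\begin{bmatrix}
\mathbf{x}_1 \otimes \mathbf{y}_1 & \mathbf{x}_2 \otimes \mathbf{y}_2 & \cdots & \mathbf{x}_R \otimes \mathbf{y}_R
\end{bmatrix}
\tag{2}
\]

These matrix products allow the Tucker3 model to be written as

\[
\mathbf{X}^{(I \times JK)} = \mathbf{A}\,\mathbf{G}^{(R_1 \times R_2 R_3)}\,(\mathbf{C} \otimes \mathbf{B})^T + \mathbf{E}^{(I \times JK)}
\tag{3}
\]

and the PARAFAC model as

\[
\mathbf{X}^{(I \times JK)} = \mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}^{(I \times JK)}
\quad \text{or} \quad
\mathbf{X}^{(I \times JK)} = \mathbf{A}\,\mathbf{I}^{(R \times RR)}\,(\mathbf{C} \otimes \mathbf{B})^T + \mathbf{E}^{(I \times JK)}
\tag{4}
\]

where $\mathbf{I}^{(R \times RR)}$ is an unfolded superidentity matrix.

It is important to point out that the Tucker3 and PARAFAC models are not performed on unfolded data. Unfolding is only used as a convenient way to write the models as a formula.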
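
The unfolded PARAFAC formulas can be checked numerically in MATLAB. The script below is only a sketch: the sizes, the variable names (Xunf, CB, Isuper, Iunf) and the random loadings are assumptions made for illustration, and the data are noise-free so that the residual term is zero. It builds a three-way array element by element from the trilinear model, unfolds it with reshape, and compares the result with both forms of the PARAFAC model, using the built-in kron function and a simple loop for the Khatri-Rao product.

```matlab
% Sketch: numerical check of the unfolded PARAFAC formulas.
% Sizes, names and random data are illustrative assumptions.
I = 4; J = 3; K = 2; R = 2;
A = rand(I,R); B = rand(J,R); C = rand(K,R);

% Build the three-way array element by element:
% x_ijk = sum over r of a_ir * b_jr * c_kr (noise-free, so E = 0).
X = zeros(I,J,K);
for i = 1:I
    for j = 1:J
        for k = 1:K
            X(i,j,k) = sum(A(i,:) .* B(j,:) .* C(k,:));
        end
    end
end

% Unfold to I x JK: the K frontal slices are placed side by side.
Xunf = reshape(X, I, J*K);

% Khatri-Rao product of C and B: column-wise Kronecker product.
CB = zeros(K*J, R);
for r = 1:R
    CB(:,r) = kron(C(:,r), B(:,r));
end

% PARAFAC written with the Khatri-Rao product.
X1 = A * CB';

% Superidentity array (R x R x R): ones on the superdiagonal,
% zeros elsewhere, then unfolded to R x RR.
Isuper = zeros(R,R,R);
for r = 1:R
    Isuper(r,r,r) = 1;
end
Iunf = reshape(Isuper, R, R*R);

% PARAFAC written as a Tucker3 model with the superidentity as core.
X2 = A * Iunf * kron(C,B)';

% Both differences should be at the level of rounding error.
err1 = norm(Xunf - X1, 'fro');
err2 = norm(Xunf - X2, 'fro');
fprintf('Khatri-Rao form error:    %g\n', err1);
fprintf('Superidentity form error: %g\n', err2);
```

Note that C appears before B in both products because reshape lets the second-mode index j run fastest within each frontal slice k. Swapping the order of C and B would correspond to a different unfolding of the array.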