Fitting the PARAFAC model Giorgio Tomasi Chemometrics group, LMT,MLI, KVL Frederiksberg. Denmark E-mail: gt@kvl.dk PARAFAC model • PARAFAC (PARallel FACtor analysis) Fitting an n-linear model to an n-way array. For a three way array: F xijk aif b jf ckf rijk f 1 The associated loss function is L A, B, C xijk aif b jf ckf i 1 j 1 k 1 f 1 I Where J K aif A, b jf B, ckf C Giorgio Tomasi, Chemometrics Group, KVL, Denmark F 2 The algorithms • Direct methods: – DTLD/GRAM (Direct TriLinear Decomposition / Generalised Rank Annihilation Method) • Alternating methods – ALS (Alternating Least Squares) – ASD (Alternating Slice-wise Diagonalisation) – SWATLD (Self-Weighted Alternating Trilinear Decomposition) • Derivative based – Levenberg – Marquadt – PMF3 (Positive Matrix Factorisation for 3 way arrays) Giorgio Tomasi, Chemometrics Group, KVL, Denmark Direct method DTLD-GRAM (Sanchez & Kowalsky 1986) Based on a generalised Eigenvalue Problem • Originally applicable only to arrays having only two slabs in one of the modes (GRAM) • Generalised by means of a Tucker “compression” (DTLD) • Advantage: quick • Shortcomings: – The algorithm does not provide the solution in terms of least squares – Sensitivity to noise Giorgio Tomasi, Chemometrics Group, KVL, Denmark Alternating methods - 1 The loss function is alternatively minimised with respect to one of the set of parameters involved • PARAFAC – ALS (Harshman 1970, Carrol & Chang 1970) – Well established algorithm – Several improvements have been added (compression, line search, variable separation) – The solution is found in the least squares sense – Shortcomings: • Slow convergence rate • Sensitivity to over- (and under-) factoring Giorgio Tomasi, Chemometrics Group, KVL, Denmark Alternating methods - 2 • SWATLD (Chen ZP et al, 2000) Alternates in the minimisation of three different loss functions (one each for A, B and C) 2 2 T T 1 T 1 L C C A X ..k diag ck B DB + X ..k B A diag ck D A F F k 1 K The solution for each step is found as: ck = 0.5 diag B X ..Tk A DA2 diag A X ..k B DB2 k 1, ,K Not expressed in terms of least squares. General property and mechanisms have not been studied, yet. Giorgio Tomasi, Chemometrics Group, KVL, Denmark Alternating methods - 3 • ASD (Jiang JH et al., 2000) – Based on a modified loss function employing five sets of parameters for a trilinear model K L SD A , B, C, P, Q P T X ..k Q diag ck F P T A I F k 1 2 2 T B Q IF F – The solution is not expressed in terms of least squares PT X..k Q is minimised and not the residuals – It includes compression based on SVD – Unknown properties Giorgio Tomasi, Chemometrics Group, KVL, Denmark 2 F Derivative-based methods - 1 Based on the linearisation of the loss function with respect to the parameters of the model. • All the parameters are unified i a single vector T p vec A vec B vec C T T T • Vectorisation of the 3-way array M M L p x m ym p r m 1 2 m 1 2 m p F yijk aif b jf ckf f 1 m JK i 1 K j 1 k Giorgio Tomasi, Chemometrics Group, KVL, Denmark Derivative-based methods - 2 • Levenberg-Marquadt (Paatero 1997, Bijlsma 1998) – The update for vector p is found as a solution to the system: J T J s I I J K F ps J T r – The parameter makes the right hand side positive definite and non-singular. – The solution is found in the least squares sense provided that becomes small enough Giorgio Tomasi, Chemometrics Group, KVL, Denmark Derivative-based methods - 3 Sparsity pattern for the Jacobian Sparsity pattern for J'J A B C A Giorgio Tomasi, Chemometrics Group, KVL, Denmark B nz = 144 C Derivative-based methods - 4 • PMF3 (Paatero, 1997) – The loss function includes a penalty term pn 2 n The system of normal equations is modified accordingly J J T s s I I J K F p s J T r s p s – A non-linear update is calculated and used if provides a better solution. The right hand side is modified into J p 0.5 p r p 0.5 p s p 0.5 p T – Line search is applied whenever the algorithm diverges Giorgio Tomasi, Chemometrics Group, KVL, Denmark Compression • A Tucker3 model with F 2 F 2 F 2 components is fitted X I JK TG F F F V U R T I JK • A PARAFAC model is fitted on the Tucker3’s core • PARAFAC is “expanded” to the original dimensions by means of the Tucker3’s loadings • The expanded matrices provide the starting values for more expensive computations on the original space (here always by means of PARAFAC-ALS As to be able to compare the its effect on the computational expenses ALS, LM and PMF3 algorithms were employed both with and without compression Giorgio Tomasi, Chemometrics Group, KVL, Denmark PARAFAC indeterminacies • Permutational indeterminacy (trivial) • Scaling indeterminacy: F F f 1 f 1 ˆ r a s c t b a c b X f f f f f f f f f The two models are equivalent so long as r f s f t f 1 The consequence is the rank deficiency of J Giorgio Tomasi, Chemometrics Group, KVL, Denmark Tests • Montecarlo simulations – 720 data sets of dimension 20 x 20 x 20 Four features were varied: - Rank (3 and 5) - Homoscedastic and heteroscedastic noise (3 levels each) - Collinearity between the components (cosine = .5 or .9) - On each data set were fitted to models F and F+1 - Two real data sets: fluorescence spectra - Data set 1: 6 replicates, 15 x 66 x 15, rank 4 - Data set 2: 3 replicates, 22 x 87 x 13, rank 4 Measured on solution of four compounds which concentrations were then calculated based on the PARAFAC model Giorgio Tomasi, Chemometrics Group, KVL, Denmark Initialisation and convergence • All the algorithms but DTLD were initialised using matrices of random numbers: 10 sets of loading matrices were generated with random numbers On each of them were run10 iterations with PARAFAC-ALS The best fitting has been used has initial value • Convergence criteria: – – – – – 6 Relative decrease in fit: 10 8 10 Relative change of the parameters (only LM and PMF3): 9 10 Gradient norm (only LM and PMF3): Consecutive “almost singular” left hand side: 5 Maximum number of iteration: 10000/500 respectively for alternating algorithms and derivative based Giorgio Tomasi, Chemometrics Group, KVL, Denmark Evaluation parameters % Full recoveries: number of cases when the algorithm retrieved the correct factors. Recognition is provided numerically using congruence: cos a f b f c f , aˆ g bˆ g cˆ g A threshold of 0.99 was set to establish the correct retrieval of a factor Solution quality: - Root Median Mean Squared Error ˆ , P, S MSE A , A A ˆ A APS 2 F IF - Loss function value Computational efficiency: n. of iterations, time, flops For the real data sets Root Mean Squared Error in Calibration RMSEC Giorgio Tomasi, Chemometrics Group, KVL, Denmark I y yˆ i 1 i i I 100.00 80.00 60.00 Rank 3, Cong. 0.5 Rank 3, Cong. 0.9 40.00 20.00 PM w. F3 Co m dG pr es N sio w. n Co PM m pr F3 es w. sio Co n m pr es sio n AL S dG N AS D SW DT LD / AT LD 0.00 GR PA AM RA FA CAL S % Full recovery (Th. 0.99) % Full Recoveries for correctly estimated rank Giorgio Tomasi, Chemometrics Group, KVL, Denmark % Full recovery (Th. 0.99) Quality of the solution 60 50 40 F 30 F+1 20 10 w. PM Co dG F3 m N pr w. es PM si Co on m F3 p re w. ss Co io m n pr es si on AL S dG N AT LD D AS SW DT LD /G PA RA RA M FA CAL S 0 MSE 0.7 0.6 0.5 0.4 0.3 0.2 F F+1 w. PM Co dG F3 m N pr w. es PM si Co on m F3 p re w. ss Co io m n pr es si on AL S dG N AT LD SW D AS DT LD /G PA RA RA M FA CAL S 0.1 0 Giorgio Tomasi, Chemometrics Group, KVL, Denmark • ALS, both with compression and without is very much affected by overfactoring • SWATLD is very resistant to it an has a better chance to retrieve the correct factors • ASD seems rather nice but the components tend to be extremely noisy Time consumption 60.00 50.00 40.00 30.00 20.00 10.00 0.00 124 3 Extracted Factors 4 Extracted Factors DT PA LD RA /GR FA AM CAL S A SW SD AT LD AL S d G dG w. C N N om PM PM w F p . 3 F3 Co res w. mp sio Co res n m s io pr es n si on Time (s) Time consumption, rank 3 130.00 110.00 90.00 70.00 50.00 30.00 10.00 -10.00 235 DT PA LD RA /GR FA AM CAL S A SW SD AT LD AL S d G dG w. C N N om PM PM w F p . 3 F3 Co res w. mp sio Co res n m s io pr es n si on Time (s) Time consumption, rank 5 Giorgio Tomasi, Chemometrics Group, KVL, Denmark 5 Extracted Factors 6 Extracted Factors • dGN and PMF3 are the most expensive in terms of computational time • The filling og the Jacobian takes up to 50% of the time • Compression significantly helps • Need for more efficient routines to calculate J T J and J T r Iterations N. iterations Iterations 250 200 150 3 Factors AL S w. PM Co dG F3 m N pr w. es PM Co si on m F3 p re w. ss Co io m n pr es si on dG N D AS SW AT LD 4 Factors PA RA F AC -A LS 100 50 0 Iteration's cost in terms of FLOPs 1.0E+07 1.0E+06 Rank 3 1.0E+05 Rank 5 1.0E+04 AL S PM w. F3 Co m dG pr N es w. sio Co n PM m pr F3 es w. sio Co n m pr es sio n N dG LD SW AT AS D AC -A LS 1.0E+03 PA RA F FLOPs/Iteration 1.0E+08 Giorgio Tomasi, Chemometrics Group, KVL, Denmark • Compressed methods require more iterations for fitting and many less for refining • Compressed methods are more affected by overfactoring for as n. of iterations • Derivative-based methods are more efficient but more expensive. • Compression allows similar cost per iteration for derivative based RMSEP for 1st data set, 4 factors Hydroquinone Phenylalanine 1.2000 322.0000 Rep. 2 316.0000 F Co 3 m Pa Co r m LM Co m PM F3 SW DT LD G RA Tryptophan DOPA 0.6000 5 0.5000 Rep. 2 2 Giorgio Tomasi, Chemometrics Group, KVL, Denmark Pa r Co m LM Co m PM F3 F3 Co m LM PM D M Rep. 3 DT LD G RA Pa r Co m LM Co m PM F3 Co m F3 PM LM AT LD SW AS D 0 AL S 0.0000 M 0.1000 1 AT LD Rep. 3 0.2000 Rep. 1 3 SW Rep. 2 AS 0.3000 AL S Rep. 1 RMSEP 4 0.4000 DT LD G RA RMSEP LM Rep. 3 M Pa r Co m LM Co m PM F3 F3 Co m PM LM AS SW DT LD G RA D 312.0000 AT LD 0.0000 AL S 0.2000 314.0000 PM Rep. 3 0.4000 Rep. 1 318.0000 D AT LD Rep. 2 AS 0.6000 AL S Rep. 1 RMSEP 320.0000 0.8000 M RMSEP 1.0000 RMSEP for 1st data set, 5 factors Hydroquinone Phenylalanine Tryptophan Giorgio Tomasi, Chemometrics Group, KVL, Denmark Pa Co r m LM Co m PM F3 F3 Pa r Co m LM Co m PM F3 F3 Co m PM LM AT LD D Pa r Co m LM Co m PM F3 F3 Co m PM LM AT LD D AS SW AL S M 0.0000 Rep 3 SW 0.1000 Rep 2 AS Rep 3 0.2000 Rep 1 AL S Rep 2 M Rep 1 0.4000 0.3000 RMSEP 0.5000 4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 DT LD G RA 0.6000 DT LD G RA Co m DOPA 0.7000 RMSEP PM DT LD G RA M Pa r Co m LM Co m PM F3 F3 Co m LM PM D AT LD AS SW AL S DT LD G RA M 0.0000 Rep 3 LM Rep 3 0.2000 Rep 2 SW 0.4000 Rep 1 D AT LD Rep 2 AS Rep 1 0.6000 RMSEP RMSEP 0.8000 900.0000 800.0000 700.0000 600.0000 500.0000 400.0000 300.0000 200.0000 100.0000 0.0000 AL S 1.0000 RMSEP for 2nd data set Average RMSEP, 4 factors Average RMSEP, 5 factors 0.5 0.5 0.4 0.3 Hydro 0.2 Trypto Tyro 0.1 DOPA 0.3 Hydro 0.2 Trypto Tyro 0.1 Giorgio Tomasi, Chemometrics Group, KVL, Denmark Pa r Co m LM Co m PM F3 F3 Co m PM LM D AT LD AS SW AL S M DT LD G RA Pa r Co m LM Co m PM F3 F3 Co m PM LM D AT LD AS SW AL S 0 M 0 RMSEP DOPA DT LD G RA RMSEP 0.4 Conclusions • PARAFAC-ALS is more sensitive than the other methods to over-factoring • SWATLD appears as the most efficient method when it comes to retrieval of the underlying factors (on simulated data). Conversely it is not as efficient on real data and hardly ever provides the least squares solution. It is likely a good method for initialisation. • Derivative based methods require compression in order to be feasible for large scale problems • Compression does not seem to affect the recovery capability of the algorithms it is combined with. Giorgio Tomasi, Chemometrics Group, KVL, Denmark Future aspects • PARAFAC growing number of applications in spectrometry implies dealing with larger data sets: – Need for more efficient routines for the derivative based methods – Development of more refined methods exploiting the sparsity of the Jacobian and the multilinearity. (f.i use of 2nd derivatives, variable separation,…) – Alternative algorithms providing the least squares solution (e.g. simulated annealing) Giorgio Tomasi, Chemometrics Group, KVL, Denmark