Fitting the PARAFAC model

Giorgio Tomasi
Chemometrics group, LMT, MLI, KVL
Frederiksberg, Denmark
E-mail: gt@kvl.dk
PARAFAC model
• PARAFAC (PARallel FACtor analysis)
Fitting an n-linear model to an n-way array.
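For a three-way array:
$$x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + r_{ijk}$$
The associated loss function is
$$L(\mathbf{A}, \mathbf{B}, \mathbf{C}) = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \left( x_{ijk} - \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} \right)^2$$
where $a_{if} \in \mathbf{A}$, $b_{jf} \in \mathbf{B}$, $c_{kf} \in \mathbf{C}$
Giorgio Tomasi, Chemometrics
Group, KVL, Denmark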
The algorithms
• Direct methods:
– DTLD/GRAM (Direct TriLinear Decomposition /
Generalised Rank Annihilation Method)
• Alternating methods
– ALS (Alternating Least Squares)
– ASD (Alternating Slice-wise Diagonalisation)
– SWATLD (Self-Weighted Alternating Trilinear Decomposition)
• Derivative based
– Levenberg-Marquardt
– PMF3 (Positive Matrix Factorisation for 3 way arrays)
Direct method
DTLD-GRAM (Sanchez & Kowalski 1986)
Based on a generalised eigenvalue problem
• Originally applicable only to arrays having only two
slabs in one of the modes (GRAM)
• Generalised by means of a Tucker “compression”
(DTLD)
• Advantage: quick
• Shortcomings:
– The algorithm does not provide a least squares solution
– Sensitivity to noise
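The generalised eigenvalue idea behind GRAM can be sketched for the two-slab, noise-free case. This is a minimal NumPy illustration, not the DTLD implementation compared in these slides: the function name `gram`, the compression through an SVD of the summed slabs and the use of a plain eigendecomposition of G1 G2^{-1} are assumptions made for the example.

```python
import numpy as np

def gram(X1, X2, F):
    """Hypothetical GRAM sketch for X1 = A diag(c1) B^T and X2 = A diag(c2) B^T."""
    # Compress both slabs to F x F using the dominant subspaces of their sum.
    U, _, Vt = np.linalg.svd(X1 + X2, full_matrices=False)
    U, V = U[:, :F], Vt[:F].T
    G1, G2 = U.T @ X1 @ V, U.T @ X2 @ V
    # G1 G2^{-1} = P diag(c1 / c2) P^{-1}, with P = U^T A up to column scaling.
    evals, P = np.linalg.eig(G1 @ np.linalg.inv(G2))
    evals, P = np.real(evals), np.real(P)
    A = U @ P                              # columns proportional to those of A
    B = V @ G1.T @ np.linalg.inv(P).T      # columns proportional to those of B
    return A, B, evals
```

The eigenvalues estimate the ratios c1/c2, so the sketch requires these ratios to be distinct and both compressed slabs to be invertible. With noisy data the eigenvalues may become complex, which is one way the sensitivity to noise listed above shows up.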
Alternating methods - 1
The loss function is minimised alternately with respect to one
set of parameters at a time, the others being held fixed
• PARAFAC-ALS (Harshman 1970, Carroll & Chang
1970)
– Well established algorithm
– Several improvements have been added (compression, line
search, variable separation)
– The solution is found in the least squares sense
– Shortcomings:
• Slow convergence rate
• Sensitivity to over- (and under-) factoring
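The alternating scheme can be sketched in a few lines of NumPy. This is a bare illustration without the improvements listed above (no compression, no line search); the names `khatri_rao` and `parafac_als`, the random start and the default tolerance are assumptions for the example.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product; row p * Q.shape[0] + q holds P[p, f] * Q[q, f]."""
    return np.einsum("pf,qf->pqf", P, Q).reshape(-1, P.shape[1])

def parafac_als(X, F, n_iter=2000, tol=1e-6, seed=0):
    """Hypothetical minimal PARAFAC-ALS; returns the loadings and the loss history."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, F)) for n in (I, J, K))
    # Unfoldings with one mode along the rows.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    history = []
    for _ in range(n_iter):
        # Each update is the least squares solution for one loading matrix.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        history.append(np.sum((X1 - A @ khatri_rao(B, C).T) ** 2))
        if len(history) > 1:
            prev, curr = history[-2], history[-1]
            if prev - curr <= tol * prev:   # relative decrease in fit
                break
    return A, B, C, history
```

Because every sub-step solves a least squares problem exactly, the loss cannot increase from one sweep to the next, which is why the history is a convenient convergence diagnostic.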
Alternating methods - 2
• SWATLD (Chen ZP et al, 2000)
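Alternates between the minimisation of three different loss functions
(one each for A, B and C). For C:
$$L_C(\mathbf{C}) = \sum_{k=1}^{K} \left( \left\| \left( \mathbf{A}^{+} \mathbf{X}_{..k} - \mathrm{diag}(\mathbf{c}_k) \mathbf{B}^T \right)^T \mathbf{D}_B^{-1} \right\|_F^2 + \left\| \left( \mathbf{X}_{..k} (\mathbf{B}^{+})^T - \mathbf{A}\, \mathrm{diag}(\mathbf{c}_k) \right) \mathbf{D}_A^{-1} \right\|_F^2 \right)$$
The solution for each step is found as:
$$\mathbf{c}_k = 0.5 \left[ \mathrm{diag}\left( \mathbf{B}^{+} \mathbf{X}_{..k}^T \mathbf{A} \right) \mathbf{D}_A^{-2} + \mathrm{diag}\left( \mathbf{A}^{+} \mathbf{X}_{..k} \mathbf{B} \right) \mathbf{D}_B^{-2} \right], \quad k = 1, \ldots, K$$
The solution is not a least squares solution.
General properties and mechanisms have not been studied yet.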
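The SWATLD update for C can be written directly in NumPy. The sketch below assumes that D_A and D_B are diagonal matrices holding the column norms of A and B (they are not defined on the slide) and that `+` denotes the Moore-Penrose pseudoinverse; the function name is hypothetical.

```python
import numpy as np

def swatld_update_C(X, A, B):
    """Hypothetical sketch of the SWATLD step for C, given A and B.

    X is I x J x K and X[:, :, k] is the k-th slab.
    """
    A_pinv = np.linalg.pinv(A)            # F x I
    B_pinv = np.linalg.pinv(B)            # F x J
    dA2 = np.sum(A ** 2, axis=0)          # diagonal of D_A^2 (assumed: squared column norms)
    dB2 = np.sum(B ** 2, axis=0)          # diagonal of D_B^2 (assumed: squared column norms)
    K, F = X.shape[2], A.shape[1]
    C = np.empty((K, F))
    for k in range(K):
        Xk = X[:, :, k]
        term_a = np.diag(B_pinv @ Xk.T @ A) / dA2
        term_b = np.diag(A_pinv @ Xk @ B) / dB2
        C[k] = 0.5 * (term_a + term_b)
    return C
```

On noise-free trilinear data with A and B of full column rank, both terms equal the true c_k, so the update returns C exactly; with noise the two terms differ and the step is no longer a least squares estimate.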
Alternating methods - 3
• ASD (Jiang JH et al., 2000)
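– Based on a modified loss function employing five sets of
parameters for a trilinear model
$$L_{ASD}(\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{P}, \mathbf{Q}) = \sum_{k=1}^{K} \left\| \mathbf{P}^T \mathbf{X}_{..k} \mathbf{Q} - \mathrm{diag}(\mathbf{c}_k) \right\|_F^2 + \lambda \left( \left\| \mathbf{P}^T \mathbf{A} - \mathbf{I}_F \right\|_F^2 + \left\| \mathbf{B}^T \mathbf{Q} - \mathbf{I}_F \right\|_F^2 \right)$$
– The solution is not a least squares solution:
$\mathbf{P}^T \mathbf{X}_{..k} \mathbf{Q} - \mathrm{diag}(\mathbf{c}_k)$ is minimised, not the residuals of the model
– It includes compression based on SVD
– Unknown properties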
Derivative-based methods - 1
Based on the linearisation of the loss function with
respect to the parameters of the model.
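• All the parameters are collected in a single vector
$$\mathbf{p} = \left[ \mathrm{vec}(\mathbf{A})^T \;\; \mathrm{vec}(\mathbf{B})^T \;\; \mathrm{vec}(\mathbf{C})^T \right]^T$$
• Vectorisation of the 3-way array
$$L(\mathbf{p}) = \sum_{m=1}^{M} \left( x_m - y_m(\mathbf{p}) \right)^2 = \sum_{m=1}^{M} r_m(\mathbf{p})^2$$
$$y_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf}, \qquad m = JK(i-1) + K(j-1) + k$$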
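The index map m = JK(i-1) + K(j-1) + k is the 1-based counterpart of NumPy's default row-major flattening, in which k runs fastest. A small sketch, with hypothetical helper names, makes the correspondence explicit:

```python
import numpy as np

def linear_index(i, j, k, J, K):
    """1-based position m of element (i, j, k) in the vectorised array."""
    return J * K * (i - 1) + K * (j - 1) + k

def residual_vector(X, A, B, C):
    """Residuals r_m = x_m - y_m(p), ordered as in the index map above."""
    Y = np.einsum("if,jf,kf->ijk", A, B, C)
    return (X - Y).reshape(-1)      # row-major order: k fastest, then j, then i
```

With this ordering the vector has M = IJK elements, and entry m - 1 of `X.reshape(-1)` is x_ijk.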
Derivative-based methods - 2
• Levenberg-Marquardt (Paatero 1997, Bijlsma
1998)
– The update for vector p is found as a solution to the
system:
$$\left( \mathbf{J}^T \mathbf{J} + \lambda^{(s)} \mathbf{I}_{(I+J+K)F} \right) \Delta\mathbf{p}^{(s)} = -\mathbf{J}^T \mathbf{r}$$
– The damping parameter $\lambda^{(s)}$ makes the matrix on the left-hand side positive
definite and hence non-singular.
– The solution is found in the least squares sense
provided that $\lambda^{(s)}$ becomes small enough
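A single damped step can be sketched with a dense Jacobian. The code below is an illustration only: the function names are hypothetical, vec() is taken as column-wise stacking, the residuals are defined as r = x - y(p) so that J = dr/dp = -dy/dp, and no rule for updating the damping parameter is included.

```python
import numpy as np

def unpack(p, I, J, K, F):
    """Split p = [vec(A); vec(B); vec(C)] (column-wise vec) into loading matrices."""
    A = p[:I * F].reshape((I, F), order="F")
    B = p[I * F:(I + J) * F].reshape((J, F), order="F")
    C = p[(I + J) * F:].reshape((K, F), order="F")
    return A, B, C

def residuals(p, X, F):
    A, B, C = unpack(p, *X.shape, F)
    return (X - np.einsum("if,jf,kf->ijk", A, B, C)).reshape(-1)

def jacobian(p, shape, F):
    """Dense Jacobian of the residuals; each row has at most 3F non-zero entries."""
    I, J, K = shape
    A, B, C = unpack(p, I, J, K, F)
    JA = np.einsum("ip,jf,kf->ijkfp", np.eye(I), B, C).reshape(I * J * K, F * I)
    JB = np.einsum("if,jq,kf->ijkfq", A, np.eye(J), C).reshape(I * J * K, F * J)
    JC = np.einsum("if,jf,kr->ijkfr", A, B, np.eye(K)).reshape(I * J * K, F * K)
    return -np.hstack([JA, JB, JC])     # minus sign because r = x - y(p)

def lm_step(p, X, F, lam):
    """Solve (J^T J + lam I) dp = -J^T r and return the updated parameters."""
    r = residuals(p, X, F)
    Jm = jacobian(p, X.shape, F)
    lhs = Jm.T @ Jm + lam * np.eye(p.size)
    return p + np.linalg.solve(lhs, -Jm.T @ r)
```

A practical implementation would decrease `lam` after a successful step and increase it otherwise, and would exploit the sparsity of the Jacobian rather than forming it densely.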
Derivative-based methods - 3
[Figures: sparsity pattern for the Jacobian and sparsity pattern for J'J, with blocks labelled A, B and C; one panel is annotated nz = 144]
Derivative-based methods - 4
• PMF3 (Paatero, 1997)
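– The loss function includes a penalty term $\gamma \sum_n p_n^2$
The system of normal equations is modified accordingly
$$\left( \mathbf{J}^T \mathbf{J} + \left( \gamma^{(s)} + \lambda^{(s)} \right) \mathbf{I}_{(I+J+K)F} \right) \Delta\mathbf{p}^{(s)} = -\left( \mathbf{J}^T \mathbf{r} + \gamma^{(s)} \mathbf{p}^{(s)} \right)$$
– A non-linear update is calculated and used if it provides a better
solution. The right-hand side is modified into
$$-\left( \mathbf{J}\left( \mathbf{p} + 0.5\,\Delta\mathbf{p} \right)^T \mathbf{r}\left( \mathbf{p} + 0.5\,\Delta\mathbf{p} \right) + \gamma^{(s)} \left( \mathbf{p} + 0.5\,\Delta\mathbf{p} \right) \right)$$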
T
– Line search is applied whenever the algorithm diverges
Compression
• A Tucker3 model with $(F+2) \times (F+2) \times (F+2)$ components is fitted
$$\mathbf{X}_{(I \times JK)} = \mathbf{T}\, \mathbf{G}_{((F+2) \times (F+2)(F+2))} \left( \mathbf{V} \otimes \mathbf{U} \right)^T + \mathbf{R}_{(I \times JK)}$$
• A PARAFAC model is fitted to the Tucker3 core
• The PARAFAC solution is “expanded” to the original dimensions by means
of the Tucker3 loadings
• The expanded matrices provide the starting values for more
expensive computations in the original space (here always by
means of PARAFAC-ALS)
To compare its effect on the computational cost, the ALS, LM and PMF3
algorithms were run both with and without compression
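The compress, fit and expand sequence can be sketched as follows. As a loudly stated assumption, the bases come from a truncated higher-order SVD, which stands in for a properly fitted Tucker3 model; the function names are hypothetical, and the PARAFAC fit on the core is left to whichever algorithm is being tested.

```python
import numpy as np

def hosvd_bases(X, R):
    """Orthonormal bases (R columns each) for the three modes of X (HOSVD stand-in)."""
    I, J, K = X.shape
    U = np.linalg.svd(X.reshape(I, J * K), full_matrices=False)[0][:, :R]
    V = np.linalg.svd(X.transpose(1, 0, 2).reshape(J, I * K), full_matrices=False)[0][:, :R]
    W = np.linalg.svd(X.transpose(2, 0, 1).reshape(K, I * J), full_matrices=False)[0][:, :R]
    return U, V, W

def compress(X, U, V, W):
    """Core array G (R x R x R): X projected onto the three bases."""
    return np.einsum("ijk,ip,jq,kr->pqr", X, U, V, W)

def expand(Ac, Bc, Cc, U, V, W):
    """Map loadings fitted on the core back to the original dimensions."""
    return U @ Ac, V @ Bc, W @ Cc
```

A typical use would be `U, V, W = hosvd_bases(X, F + 2)`, a PARAFAC fit on `compress(X, U, V, W)`, and `expand(...)` to obtain starting values for the refinement on the original array.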
PARAFAC indeterminacies
• Permutational indeterminacy (trivial)
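• Scaling indeterminacy:
$$\hat{\mathbf{X}} = \sum_{f=1}^{F} r_f \mathbf{a}_f \otimes s_f \mathbf{c}_f \otimes t_f \mathbf{b}_f = \sum_{f=1}^{F} \mathbf{a}_f \otimes \mathbf{c}_f \otimes \mathbf{b}_f$$
The two models are equivalent so long as $r_f s_f t_f = 1$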
The consequence is the rank deficiency of J
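Both statements can be checked numerically on a small random example. The sketch below uses hypothetical helper names and a dense Jacobian of the model with respect to [vec(A); vec(B); vec(C)]; for generic loadings its rank is expected to fall short of the number of parameters by 2F, the number of free scaling factors.

```python
import numpy as np

def model(A, B, C):
    return np.einsum("if,jf,kf->ijk", A, B, C)

def model_jacobian(A, B, C):
    """Derivatives of the vectorised model with respect to all loadings."""
    (I, F), J, K = A.shape, B.shape[0], C.shape[0]
    JA = np.einsum("ip,jf,kf->ijkfp", np.eye(I), B, C).reshape(I * J * K, F * I)
    JB = np.einsum("if,jq,kf->ijkfq", A, np.eye(J), C).reshape(I * J * K, F * J)
    JC = np.einsum("if,jf,kr->ijkfr", A, B, np.eye(K)).reshape(I * J * K, F * K)
    return np.hstack([JA, JB, JC])

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 2)) for n in (4, 3, 5))

# Rescale the columns so that r_f * s_f * t_f = 1 for every factor.
r = np.array([2.0, -0.5])
s = np.array([0.25, 4.0])
t = 1.0 / (r * s)
same_model = np.allclose(model(A * r, B * s, C * t), model(A, B, C))

n_params = sum(M.size for M in (A, B, C))
rank = np.linalg.matrix_rank(model_jacobian(A, B, C))
```

The missing 2F directions are the reason J'J is singular without damping, which is where the damping and penalty terms of the derivative-based methods come in.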
Tests
• Monte Carlo simulations
– 720 data sets of dimension 20 x 20 x 20
Four features were varied:
- Rank (3 and 5)
- Homoscedastic and heteroscedastic noise (3 levels each)
- Collinearity between the components (cosine = 0.5 or 0.9)
- Number of fitted components: models with F and F+1 components were fitted to each data set
• Two real data sets: fluorescence spectra
- Data set 1: 6 replicates, 15 x 66 x 15, rank 4
- Data set 2: 3 replicates, 22 x 87 x 13, rank 4
Measured on solutions of four compounds, whose
concentrations were then estimated from the PARAFAC
model
Initialisation and convergence
• All the algorithms but DTLD were initialised using matrices of
random numbers:
10 sets of loading matrices were generated with random numbers
10 iterations of PARAFAC-ALS were run on each of them
The best-fitting set was used as the initial value
• Convergence criteria:
– Relative decrease in fit: $10^{-6}$
– Relative change of the parameters (only LM and PMF3): $10^{-8}$
– Gradient norm (only LM and PMF3): $10^{-9}$
– Consecutive “almost singular” left-hand sides: 5
– Maximum number of iterations: 10000 for alternating algorithms and 500 for
derivative-based ones
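The initialisation procedure can be sketched as a best-of-ten selection. The helper names, the uniform random numbers and the fixed seed are assumptions for the example; only the counts (10 starts, 10 ALS iterations) come from the description above.

```python
import numpy as np

def kr(P, Q):
    """Column-wise Kronecker (Khatri-Rao) product."""
    return np.einsum("pf,qf->pqf", P, Q).reshape(-1, P.shape[1])

def als_sweeps(X, A, B, C, n_sweeps):
    """Run a fixed number of ALS sweeps; return the loadings and the final loss."""
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_sweeps):
        A = X1 @ np.linalg.pinv(kr(B, C)).T
        B = X2 @ np.linalg.pinv(kr(A, C)).T
        C = X3 @ np.linalg.pinv(kr(A, B)).T
    loss = np.sum((X1 - A @ kr(B, C).T) ** 2)
    return A, B, C, loss

def best_random_start(X, F, n_starts=10, n_sweeps=10, seed=0):
    """Hypothetical helper: keep the best of several short ALS runs."""
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_starts):
        init = [rng.random((n, F)) for n in X.shape]
        candidates.append(als_sweeps(X, *init, n_sweeps))
    return min(candidates, key=lambda c: c[3])
```

The returned loadings would then be handed to the algorithm under test, which runs until one of the convergence criteria is met.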
Evaluation parameters
% Full recoveries: percentage of cases in which the algorithm retrieved the correct factors.
Recognition is done numerically using the congruence:
$$\cos\left( \mathbf{a}_f \otimes \mathbf{b}_f \otimes \mathbf{c}_f,\; \hat{\mathbf{a}}_g \otimes \hat{\mathbf{b}}_g \otimes \hat{\mathbf{c}}_g \right)$$
A threshold of 0.99 was set to establish the correct retrieval of a factor
Solution quality:
- Root Median Mean Squared Error
$$\mathrm{MSE}(\mathbf{A}, \hat{\mathbf{A}}, \mathbf{P}, \mathbf{S}) = \frac{\left\| \mathbf{A} - \hat{\mathbf{A}} \mathbf{P} \mathbf{S} \right\|_F^2}{IF}$$
- Loss function value
Computational efficiency: n. of iterations, time, flops
For the real data sets: Root Mean Squared Error in Calibration
$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{I} \left( y_i - \hat{y}_i \right)^2}{I}}$$
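The congruence and RMSEC can be computed as sketched below. The cosine between two Kronecker products equals the product of the mode-wise cosines, so the long vectors never need to be formed. The function names are hypothetical, and the recovery check is a simplification: it only asks that every true factor has some estimated factor above the threshold, without enforcing a one-to-one match.

```python
import numpy as np

def congruence(true, est):
    """F x G matrix of cosines between true and estimated triads.

    `true` and `est` are lists [A, B, C] of loading matrices.
    """
    out = 1.0
    for M, M_hat in zip(true, est):
        M = M / np.linalg.norm(M, axis=0)
        M_hat = M_hat / np.linalg.norm(M_hat, axis=0)
        out = out * (M.T @ M_hat)     # product of the mode-wise cosines
    return out

def full_recovery(true, est, threshold=0.99):
    """Simplified check: each true factor is matched by some estimated factor."""
    return bool(np.all(congruence(true, est).max(axis=1) >= threshold))

def rmsec(y, y_hat):
    """Root mean squared error between reference and estimated values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))
```

Because the three cosines are multiplied, sign flips in two modes of the same factor cancel, so the measure is insensitive to the permutation and scaling indeterminacies of the model.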
% Full Recoveries for correctly estimated rank
[Figure: bar chart of % full recovery (threshold 0.99) per algorithm (DTLD/GRAM, PARAFAC-ALS, SWATLD, ASD, dGN, PMF3, and ALS, dGN and PMF3 with compression); series: Rank 3, Cong. 0.5 and Rank 3, Cong. 0.9]
Quality of the solution
[Figure: % full recovery (threshold 0.99) per algorithm for models with F and F+1 components]
[Figure: MSE per algorithm for models with F and F+1 components]
• ALS, both with and without
compression, is strongly
affected by over-factoring
• SWATLD is very
resistant to it and has a
better chance of retrieving
the correct factors
• ASD appears to perform well,
but its components tend
to be extremely noisy
Time consumption
[Figure: Time consumption, rank 3. Time (s) per algorithm for 3 and 4 extracted factors; one annotation reads 124]
[Figure: Time consumption, rank 5. Time (s) per algorithm for 5 and 6 extracted factors; one annotation reads 235]
• dGN and PMF3 are the
most expensive in terms
of computational time
• Filling the
Jacobian takes up to 50%
of the time
• Compression
helps significantly
• More efficient routines are
needed to calculate $\mathbf{J}^T \mathbf{J}$
and $\mathbf{J}^T \mathbf{r}$
Iterations
[Figure: Iterations. N. iterations per algorithm for 3 and 4 factors]
[Figure: Iteration's cost in terms of FLOPs. FLOPs/iteration per algorithm on a logarithmic scale, for rank 3 and rank 5]
• Compressed methods require
more iterations for fitting and
many fewer for refining
• Compressed methods are
more affected by over-factoring as far as the number of
iterations is concerned
• Derivative-based methods are
more efficient in terms of iterations, but each
iteration is more expensive.
• Compression allows a similar
cost per iteration for
derivative-based methods
RMSEP for 1st data set, 4 factors
[Figure: RMSEP per algorithm, one panel each for Hydroquinone, Phenylalanine, Tryptophan and DOPA, replicates 1 to 3]
RMSEP for 1st data set, 5 factors
[Figure: RMSEP per algorithm, one panel each for Hydroquinone, Phenylalanine, Tryptophan and DOPA, replicates 1 to 3]
RMSEP for 2nd data set
[Figure: Average RMSEP, 4 factors. RMSEP per algorithm for Hydro, Trypto, Tyro and DOPA]
[Figure: Average RMSEP, 5 factors. RMSEP per algorithm for Hydro, Trypto, Tyro and DOPA]
Conclusions
• PARAFAC-ALS is more sensitive than the other
methods to over-factoring
• SWATLD appears to be the most effective method at
retrieving the underlying factors (on
simulated data). Conversely, it is less effective on real
data and hardly ever provides the least squares solution.
It is likely a good method for initialisation.
• Derivative-based methods require compression in order
to be feasible for large-scale problems
• Compression does not seem to affect the recovery
capability of the algorithms it is combined with.
Future aspects
• The growing number of PARAFAC applications in
spectrometry implies dealing with larger data
sets:
– Need for more efficient routines for the derivative-based
methods
– Development of more refined methods exploiting
the sparsity of the Jacobian and the multilinearity (e.g.
use of 2nd derivatives, variable separation, …)
– Alternative algorithms providing the least squares
solution (e.g. simulated annealing)