CONSENSUS MULTIVARIATE
CALIBRATION OR MAINTENANCE
WITHOUT REFERENCE SAMPLES
USING TIKHONOV TYPE
REGULARIZATION APPROACHES
John Kalivas, Josh Ottaway, Jeremy Farrell, Parviz Shahbazikah
Department of Chemistry
Idaho State University
Pocatello, Idaho 83209 USA
Outline
• Multivariate calibration
• Tikhonov regularization (TR)
• TR calibration maintenance with reference samples to
form full wavelength or sparse models
– Selecting “a” model
– Selecting a collection of models
– Comparison to PLS
• TR calibration or maintenance without reference
samples
– Examples with comparison to PLS
• Summary TR variant equations
2
Spectral Multivariate Calibration
• y = Xb
y = m x 1 vector of analyte reference values for m calibration
samples
X = m x n matrix of spectra for n wavelengths
b = n x 1 regression (model) vector
•
ˆ   XT X 
b
1
T
X y
MLR solution; requires m ≥ p (wavelength selection)
• bˆ  X  y
Biased regression solutions such as TR, RR (a TR variant),
PLS, and PCR
• Requires meta-parameter (tuning parameter) selection
T
• yˆ u n k  x u n k bˆ
3
Quantitation by Tikhonov Regularization (TR)
• General TR in 2-norm
m in
 Xb  y
2
2
η
2
2
Lb
2
y X 
 
b
 0   ηL 
t
2 t
bˆ   X X  η L L 
1
= Euclidian vector 2-norm
(vector magnitude or
length)

2
Depending on the
calibration goal, L can
have different forms
t
X y
• Ridge regression (RR) when L = I
m in

Xb  y
2
2
η
2
b
2
2
y X 
    b
 0   ηI 
t
2
bˆ   X X  η I 
1
X y
t

RR is regularized by using I
and selecting η to minimize
prediction errors (low bias)
simultaneously shrinking the
model vector (low variance)
4
Selecting η
• Cross-validation
• L-curve graphic (can use with RMSEC)
• Bias/Variance can be assessed
• Useful for putting RR, PLS, etc. on one plot for objective
comparison
– C.L. Lawson, et.al.,
Solving Least-Squares
Problems. Prentice-Hall,
(1974)
– P. C. Hansen,
L bˆ
Rank-Deficient and
Discrete Ill-Posed
Problems: Numerical
Aspects of Linear
Inversion, SIAM Press
(1998)
overfitting
best model
2
underfitting
yˆ  y
2
5
Calibration Maintenance
• Need primary model to function over time
and/or under new secondary conditions
1. Prepare calibration samples to span all potential
spectral variances
•
Not possible with a seasonal or geographical effects in
some data sets
2. Preprocess primary and secondary data to be
robust to new conditions
3. Adjust spectra measured under new conditions to
fit the primary model
4. Update the primary model to predict in the new
conditions
6
Calibration Maintenance with TR2
• Model updating a RR model requires a new penalty term
– Minimize prediction errors for a few samples from new secondary
conditions

m in X b  y
2
2
η b
2
2
2
 λ Mb  yM
2
2
2

M = spectra from secondary conditions
yM = analyte reference values
• Avoid measuring many samples by tuning with λ
• Local centering
– Respectively mean center X, y, M, and yM
– Validation spectra centered to M
7
Pharmaceutical Example
• M. Dyrby, et. al., Appl. Spectrosc. 56 (2002) 579-585
– http://www.models.life.ku.dk/datasets; Dept. of Food Sciences, Univ. of
Copenhagen
• 310 Escitolopram tablets measured in NIR from 7,400-10,507 cm-1 at
resolution 6 cm-1 for 404 wavelengths
• Four tablet types based on nominal weight: type 1, type 2, type 3, and
type 4
• Three tablet batches (production scale): laboratory, pilot, and full
• 30 tablets for each batch tablet type combination
0.15
 Lab, type1
 Lab, type 2
 Lab, type 3
 Lab, type 4
 Full, type 1
 Full, type 2
 Full, type 3
 Full, type 4
0.1
PC2
0.05
0
-0.05
-0.1
-0.1
0
PC1
0.1
8
Objective
• Using laboratory produced tablets as the primary
calibration set
– Determine active pharmaceutical ingredient (API)
concentration in new tablets produced in full production
(secondary condition)
Primary Calibration Space: 30 random lab batch
samples with 15 from types 1 and 2 each
Secondary Calibration Space: 30 random full batch
samples with 15 from types 1 and 2 each
Standardization Set M: 4 random full batch samples with
2 from types 1 and 2 each
Validation Space: Remaining 30 full batch types 1 and 2
• Other batch type combinations studied
9
Example Model Merit Landscapes

m in X b  y
2
2
η b
2
2
2
 λ M b  yM
2
2
2

η
RMSEC
η
λ
λ
η
RMSEM
η
λ
λ
10
Model Merit Landscapes

m in X b  y
2
η b
2
2
2
 λ M b  yM
2
2
2
η
η
RMSEC
2
λ
RMSEM
λ
Convergence at small λ
A tradeoff region
Prediction of primary
degrades while the
prediction of
secondary improves

• Secondary conditions are not
included in new model
• Amounts to using primary RR with
local centering where secondary
validation samples are centered to
the mean of M
Best local
centered models
11
Model Merit Landscapes

m in X b  y
η b
2
2
2
2
 λ M b  yM
2
2
2

η
η
RMSEC
2
bˆ
λ
λ
bˆ
η
A tradeoff region
2
RMSEM
2
Prediction of primary
degrades while the
prediction of
secondary improves
too large
Further tradeoffs
• Tradeoff region between bˆ
and RMSEC and RMSEM
• Can use an L-curve at a
fixed λ value
λ
12
2
Model Merit Evaluations
• Multiple merits can be used to assess tradeoff
– Respective RMSEC and RMSEM landscapes for R2, slope,
and intercept
– L-curves at selected η and λ values
2
ˆb
– H i   RM SEM i RM SEM m ax  2  bˆ i
m ax

2
2

1
H
0.8
λ = 54.29
η
RMSEV
0.6
0.4
0.2
0
0
0.004 0.257 16.037 1000
η
λ
λ = 54.29
13
Model Updating Results
Updating Primary Lab Batch Types 1 and 2 to Predict
Secondary Full Batch Types 1 and 2
bˆ
Method
2
TR2
2.10
RR local
centering
RR no
update
η
λ
RMSEC
RMSEM
RMSEV
R2
0.731
0.014
0.264
0.966
2.70
0.468
0.245
0.487
0.935
16.40
0.096
-
0.653
0.925
1.701
54.29
0.588
0
0.0359
-
Lab and Full Batches Types 1 and 2 Self
Predicting Using RR
Batch
bˆ
Lab
Full
RMSEC
RMSEV
R2
η
16.40
0.096
0.276
0.972
0.0359
1.34
0.205
0.239
0.968
0.215
2
• Updated primary
models predicts
equivalently to the
secondary model
predicting the
secondary
validation samples
14
Model Vectors
RR Lab Batch
3
2
bˆ i
TR2
1
0
-1
-2
0.2
8000
9000
Wavelength,
10000
cm-1
RR Full Batch
0
-0.2
0.1
bˆ i
bˆ i
-0.4
0
8000
9000
10000
Wavelength, cm-1
bˆ
-0.1
2
-0.2
8000
9000
10000
Wavelength, cm-1
15
Using PLS
• PLS (and other methods) can also be used
• With PLS, the PLS latent vectors (PLS factors) replace
the η values
TR2
 y

0

 λy
M

  X 
 

 ηI b
 

  λM 
 

1
t
2
2
t
t
ˆ
b  X X  η I  λ M M  X y
PLS
 y   X 


b
 λyM   λM 
y '  X 'b

bˆ   X '  y '
16
PLS Model Merit Landscapes
RMSEM
RMSEC
1
0.8
0.5
5
5
Factors
1
0.6
0.4
15
0.2
20
0
0.4
10
10
0.002
λ
1.0
0.3
15
0.2
20
0
37
0.1
0.002
bˆ
1.0
37
RMSEV
2
1
1
10
60
10
15
40
15
20
20
20
Factors
5
80
0
λ
0.002
λ
1.0
37
0.6
5
0
• Similar
landscape trends
• The discrete
factor aspect of
PLS can make it
difficult to
capture the
underlying
continuity of the
landscapes
0.5
0.4
0.3
0.002
λ
1.0
37
17
PLS and TR2 Model Updating Results
Updating Primary Lab Batch Types 1 and 2 to
Predict Secondary Full Batch Types 1 and 2
Method
bˆ
2
RMSEC RMSEM RMSEV
R2
TR2
2.10
0.731
0.014
0.264
0.966
PLS
3.29
0.658
0.024
0.266
0.964
η
λ
1.70
54.29
3
factors
19.31
• PLS prediction equivalent to TR2
• The discrete factor aspect of PLS can make it difficult to
capture the underlying continuity of the landscapes
18
Sparse TR Calibration Maintenance
• TR2:
m in
 Xb  y
2
2
η
2
b
2
2
 λ M b  yM
2
2
2

• TR2b (sparse model):

m in
Xb  y
2
2
η
2
Lb
2
2
2
 λ M b  yM
2
2

• L = diagonal matrix with lii = 1/│bi│
Gorodnitsky IF, Rao BD. IEEE Transactions on Signal Processing 1997; 45:
600-616.
• TR2-1 (sparse model):
m in
 Xb  y
2
2
η b
 λ M b  yM
2
1
2
2

19
TR2-1 Sparse Model Merit Landscapes

m in X b  y
Models with increasing η
RMSEC
bˆ
2
λ
2
2
η b
 λ M b  yM
2
1
RMSEM
RMSEV
λ
• Similar landscape
trends
• For small λ values,
the η values are the
same across λ
• At greater λ values,
the η values vary
across λ
20
2
2

TR2 and TR2-1 Model Updating Results
Updating Primary Lab Batch Types 1 and 2 to Predict
Secondary Full Batch Types 1 and 2
Method
bˆ
R2
RMSEC RMSEM RMSEV
2
TR2-1
10.65
0.666
0.029
0.227
0.970
TR2
2.10
0.731
0.014
0.264
0.966
PLS
3.29
0.658
0.024
0.266
0.964
η
λ
7650
2442
1.70
54.29
3
factors
19.31
TR2-1 prediction
results improve
over TR2 and PLS
0.5
TR2-1
5
bˆ i
TR2
0.2
0
PLS
0
0
-0.2
-5
-0.5
-0.4
8000
9000
cm-1
10000
8000
9000
cm-1
10000
8000
9000
cm-1
10000
21
Other TR Sparse Maintenance Methods
• TR2-1b (sparse models when L = I or L ≠ I):
m in

2
Xb  y
2
 η Lb
2
 λ M b  yM
2
1
2

• TR1-2b (full wavelength when L = I):
m in

Xb  y
2
2
η
2
Lb
2
2
 λ M b  yM
1

• TR1 (sparse models when L = I or L ≠ I):
m in

Xb  y
2
2
 η Lb
1
 λ M b  yM
1

22
Other TR Applications
• Updating a primary model:
– for extra virgin olive oil adulterant quantitation to a
new geographical region (applicable to new
seasons)
– to a new temperature
– formed on one instrument to work on another
23
Summary
• Only a few samples needed for M with appropriate
weighting
• Same samples measured in primary and
secondary conditions are not needed
– Avoids long term stability issue
• PLS and other methods can be used
– Discrete nature (PLS factors) can limit landscapes
• Need to select a pair of tuning parameters for “a”
model
• Requires reference values for yM
24
Consensus (Ensemble) Modeling
• Samples predicted with a collection of models
– Composite (fused) prediction is formed
– Simple mean prediction used here
• Typically form models by random sampling
across calibration samples and/or variables
• From collection, filter for model quality
• Ideal models:
1. High degree of prediction accuracy
2. Small but noteworthy difference between selected
models (model diversity)
25
Consensus TR and PLS Modeling
• Models formed from varying tuning
parameter values
• Plot predicted values against reference
values for X,y and M,yM
• Use respective R2, slope, and intercept
model merit values
• Natural target values:
–
→1
– Slope → 1
– Intercept → 0
R2
Merit
Min
Max
0.85
0.95
Slope X,y 0.85
0.95
R2 X,y
Intercept
0.30
1.00
0.98
0.99
0.01
0.20
0.95
0.99
X,y
R2 M,yM
Intercept
M,yM
Slope
M,yM
26
TR and PLS Consensus Models (RMSEV)
1 PLS Model
348 TR2 Models
1
0.6
0.5
Factors
η
5
10
0.4
15
20
λ
Model with increasi8ng η
628 TR2-1 Models
λ
0
0.3
0.002
1.0
37
• Fewer PLS models selected
due to sharpness of
landscapes from the discrete
factor nature of PLS
• Number of “good” models can
be made to increase by
reducing the increment sizes
of η and λ
27
Consensus Mean Model Updating Results
Updating Primary Lab Batch Types 1 and 2 to Predict
Secondary Full Batch Types 1 and 2
No.
Method
Models
bˆ
2
RMSEC RMSEM RMSEV
R2
TR2
348
4.32
0.552
0.016
0.284
0.955
TR2-1
628
16.32
0.580
0.007
0.274
0.958
PLS
1
3.29
0.658
0.024
0.266
0.964
η
Λ
0.591
207
0.379
1619
3 factors
19.31
• The one PLS model predicts best
• PLS limited to discrete factors where TR
allows 0 ≤ η < ∞ to more fully resolve the
landscape
28
Consensus Models and Correlations
2.5
TR2
348 models
1.5
i
0.5
0
4
2
Frequency
bˆ 1
x 10
1.5
1
0.5
-0.5
8000
9000
0
10000
0.6
x 10
TR2-1
628 models
bˆ i
10
0
-10
8000
9000
cm-1
10000
0.9
1
4
6
Frequency
20
0.7
0.8
Correlation
4
2
0
0
0.5
Correlation
1
29
Summary
• Only a few samples needed for M with appropriate
weighting
• Same samples measured in primary and secondary
conditions are not needed
– Avoids long term stability issue
• Can select “a” model or a collection of models
– Natural target values (thresholds) with model merits R2, slope, and
intercept for primary and secondary standardization sets
– Work in progress
• Requires reference values for yM
30
Without Reference Samples
Beer’s law: x = yaka + yiki + m +
+n
ka = pure component (PC) analyte spectrum
ki = PC spectrum of ith interferent
(drift, background, etc.)
m = rest of the sample matrix
n = spectral noise
T
yˆ  x bˆ





T
T
T
T
yˆ  y a k a bˆ  y i k i bˆ  m b  ...  n bˆ
• Ideal situation:
WHEN: 1 .  k Ta bˆ   1


T
T
2. k i bˆ an d m bˆ  0
3. bˆ
T
su ch th at n bˆ  0
2
THEN: 4. yˆ  y a
• Cannot simultaneously satisfy 1, 2, and 3 to obtain 4
31
Compromise PCTR2 Model

m in k b  1
T
a
2
2
η b
2
2
2
 λ Nb  0
2
2
2

N = spectra without analyte, e.g., ki
• Minimizing the sum requires a tradeoff between
the three conditions
– The closer the three conditions are met, the more
likely yˆ  y a
• Updating the non-matrix effected PC ka to predict
in current conditions (spanned by N)
32
Sources of N
• PC interferent spectra
– Reference values are 0
• Matrix effected samples without the analyte
– Reference values are 0
• Constant analyte samples
– Reference values are 0 after spectra are mean centered
• Estimate using samples with reference values
T

yy 
N  I T X
y y

Goicoechea et al. Chemom. Intell. Lab. Syst.
56 (2001) 73-81
• Samples for N need to be measured at current
conditions
33
Extra Virgin Olive Oil Adulteration
• EVOO samples: Crete, Peloponnese, and Zakynthos
–
–
–
–
RR calibration: 56 samples spiked 5, 10, and 15% (wt/wt) sunflower oil
Primary: PC sunflower oil, 1 sample
Secondary: EVOO, 25 samples
Validation: 22 spiked samples
• Synchronous fluorescence
spectra 270 to 340 nm
at Δλ=20 nm
3
x 10
6
Zakynthos
Intensity
Sunflower
EVOO
2
1
0
280
300
320
Excitation Wavelength (nm)
34
Model Merit Landscapes
RMSEN
RMSEPC
λ
η
bˆ
η
λ
RMSEV
2
η
λ
η
λ
35
H Values
PCTR2 at η = 9.1e3
RR with PCTR2 cal samples
1
1
0.8
0.8
H
0.6
0.6
0.4
0.4
0.2
0.2
1
3.6e-6
3.6e-3
λ
3.6
1.8e3
0
95
9.7e4
1.0e8
η
RR full cal
1
H
0.8
H i   RM SEN i RM SEN m ax  
2
0.6

H i   R M SEC i R M SEC m ax  
2
0.4
bˆ i

bˆ m ax
2
bˆ i
2
2
bˆ m ax

2
2

2
0.2
95
9.7e4
η
1.0e8
36
Model Updating From PC Sunflower
Method
(No.
Samples)
bˆ
PCTR2 (26)
RMSEV
R2
2.6e-7
0.031
0.882
RR (56)
4.0e-7
0.028
0.649
RR with
PCTR2
samples (26)
2.3e-7
0.077
0.787
0.2
0.15
yˆ i
0.1
2
RR
PCTR
LS line
LS line
Equality
η
λ
9.1e3
0.0036
1.9e5
-
• Updated PC predicts better
than a full calibration
1.6e6
-
yi = 0.422xi + 0.048
yi = 0.807xi - 0.0074
bˆ
2
0.05
0
0.05
0.1
yi
0.15
37
Temperature Data Set
• Wülfert, et al., Anal. Chem. 70 (1998) 1761-1767
• hhttp://www.models.life.ku.dk/datasets; Dept. of Food Sciences,
Univ. of Copenhagen
•
•
•
•
•
•
Water, 2-propanol, ethanol (analyte)
850 to 1049 nm at 1 nm intervals at 30, 40, 50, 60, and 70°C
Calibration: 13 mixtures from 0% to 67% at 30°C
Validation: 6 mixtures from 16% to 66% at 70°C
Primary: PC ethanol at 30°C
Non-analyte matrix (standardization set) N at 70°C
PC interferents water and 2-propoanol (2 samples)
Blanks (3 samples)
Constant analyte (CA, 5 samples)
38
PCTR2 Model Merit Landscapes
RMSEPC
RMSEN
100
100
0.8
0.37
η
1
0.37
0.8
0.6
0.6
1e-3
1e-3
0.4
2.73e-6
0
0.2
8.7e-7 9.5e-5 1.1e-2
1.2
0.4
2.73e-6
0.2
0
100
bˆ
8.7e-7 9.5e-5 1.1e-2
1.2
100
RMSEV
2
100
14
100
0.6
12
0.37
10
η
8
1e-3
0.5
0.37
0.4
1e-3
0.3
6
4
2.73e-6
0.2
2.73e-6
2
0
8.7e-7 9.5e-5 1.1e-2
λ
1.2
100
0.1
0
8.7e-7 9.5e-5 1.1e-2
λ
1.2
100
39
Model Updating From PC 30°C to 70°C
Method
(No. Samples)
N
PCTR2 (6)
Blanks
Int PC
PCTR2 (9)
bˆ
RMSEV
R2
13.41
0.037
0.97
Blanks
CA
17.23
0.050
0.99
PCTR2 (8)
CA
Int PC
15.42
0.093
0.98
PCTR2 (11)
Blanks
CA
Int PC
23.32
RR at 30°C
no update (13)
-
RR at 70°C (13)
-
2
4.52
4.93
0.069
0.99
0.258
0.66
0.115
0.86
Updated PC predicts as well as
secondary model predicting the
secondary validation samples
η
λ
5.2e-5
0.021
5.2e-5
0.054
5.2e-5
0.013
5.2e-5
0.054
0.043
0.034
-
40
PCTR2 and PLS Modeling Temperature
Updating analyte PC at 30°C to 70°C using interferent PC and blanks
Method
No.
Models
PCTR2
1454
PLS
84
RMSEPC
RMSEN
RMSEV
R2
13.70
0.0003
0.004
0.036
0.973
12.33
0.0003
0.013
0.054
0.963
η
Λ
0.00044
4.94
5 factors
0.17
PLS and PCTR2 predict similarly
PCTR2 RMSEV
PLS RMSEV
0.37
η
0.6
1
0.6
0.5
2
0.5
3
0.4
4
0.3
0.2
5
0.2
0.1
6
0.1
0.4
1e-3
0.3
bˆ
2.73e-6
2
0
8.7e-7 9.5e-5 1.1e-2
λ
1.2
100
Factors
100
0
2.8e-6
1.0e-3
λ
9.1e-1
100
41
PCTR2 Consensus Modeling Temperature
• On-going work
1. Cannot use R2, slope, and intercept for respective
predicted values of the PC and N
– Set thresholds for RMSEN, RMSPC, and bˆ based on
2
preliminary inspection of landscapes
• Tradeoff needed between
bˆ
, RMSEN and RMSEPC
2
– Can further filter based on predicted values
• Majority vote
• Remove outliers
2. Combine predicted value of analyte pure component
sample with predicated non-analyte samples to obtain
R2, slope, andˆ intercept
b
2
42
PCTR Variants (Calibration or Maintenance)
1.No reference values

2
m in k b  1
T
a
2
η b
2
2
2
 λ Nb  0
2
2
2

2.With current condition sample reference values

2
m in k b  1  η b
T
a
2
2
2
2
 λ M b  yM
2
2
2

3.A combination of N and M
4.Replace b 2 with b 1 or Lb 2to obtain sparse models
43
Summary
• PCTR2 calibrates (updates) to current conditions without
reference samples
• Only a few new samples needed
• Can predict better than a full calibration
– More focused to orthogonalize to the sample matrix
T
T
T
T
yˆ  y k bˆ  y k bˆ  m b  ...  n bˆ
a

a

i

i

bias
variance
• Requires PC analyte spectrum
– Does not have to be matrix effected
• Requires non-analyte samples
– Can be estimated with reference samples
44
Other TR Variants
Expression
m in

m in
 Xb  y
m in
 Xb  y
m in



2
2
2
2
Xb  y
2
2
T
Xb  y
 Xb  y
m in  X b  y
m in
2
2
m in k a b  1
m in
2
Xb  y
2
2
2
2
2
2
2
η
η
2
2
η
η
2
2
2
2
2
Lb
 η Lb
η
2
Lb
2

RR when L = I; includes variable selection
when L = diag and approximates an 1-norm
2
2
1
2
2
2
Lb
2
 λ M b  yM
 λ Nb
2
L b  b * 
η b
1
2
2


2
2

2
2
2
1

Model updating; includes variable selection

Model updating with variable selection;
approximates 0-norm when L = diag

Model updating with robustness to the
standardization set M
Calibration or updating without reference
samples
Calibration to target model b*
Adaptive LASSO and LASSO when L = I
λ b
2
1
2
 λ M b  yM
 λ M b  yM
Lb
 η Lb
Comments
2
2

Claerbout JF, Muir F. Geophysics 1973; 38: 826-844
Elastic net
Other On-Going Consensus Modeling
• In addition to combining a set of models, can combine
TR2, PLS, PCTR2, … sets of model predictions
Consensus TR2 models
Consensus PLS models
Consensus PCTR2 models
Final prediction
Consensus TR2-1 models
46