The application of piecewise direct standardisation with variable selection

PAPER
www.rsc.org/jaas | Journal of Analytical Atomic Spectrometry
The application of piecewise direct standardisation with variable selection
to the correction of drift in inductively coupled atomic emission
spectrometry
Michael L. Griffiths,* Daniel Svozil, Paul Worsfold and E. Hywel Evans
Received 31st March 2006, Accepted 17th July 2006
First published as an Advance Article on the web 7th August 2006
DOI: 10.1039/b604728a
When an instrument response is measured over a period of time, changes in the signal intensity
can occur. If this happens following the calibration of an instrument, in this instance using a
multivariate model, subsequent use of the calibration model will most probably produce
erroneous results, posing severe restrictions on the successful application of such models.
However, it is possible to solve this problem by applying multivariate techniques that attempt to
find a transformation function that acts on the measured response from a drifted instrument
transforming it to that which would be obtained on the same instrument at the time of
calibration. One multivariate technique capable of this, piecewise direct standardisation (PDS),
allows the full spectrum to be utilised without restriction; however, it is shown here that it is
possible to reduce the number of wavelengths and combine drift correction with variable selection
using a previously published method. The synthetic calibration and test solutions used contained
varying concentrations of the analytes Pt, Pd, Rh and the matrix elements Al, Mg, Ce, Zr, Ba, In,
Sc and Y. Piecewise direct standardisation was then applied to a dataset comprising a set of
variables selected on the basis of the importance of their respective partial least squares (PLS)
regression coefficients. For Pt, Pd, and Rh, respectively, the relative root mean square percentage
error (RRMSE%) after the application of PDS, was 4.14, 3.03 and 1.88%, compared to 73.04,
44.39 and 28.06% without correction; it was evident that there was a clear bias in the uncorrected
concentrations for all three analytes. Confidence intervals for the analytes also showed a
significant improvement with the application of PDS, which was most noticeable for Rh and Pt.
Introduction
The effort in building a multivariate model is often considerable in terms of both time and money; it would be beneficial,
therefore, if it were possible to use the models over a period of
time. However, when the instrument response is measured
over such a period (hours or days) changes in the signal
intensity are likely to occur due to temperature fluctuations,
electronic drift, wavelength or detector instability etc. If this
happens following the calibration of an instrument, subsequent use of the calibration model will produce erroneous
results, the severity of which is dependent upon the magnitude
of the drift in the response signal. This poses severe restrictions
on the successful application of multivariate calibration models which generally require relatively large numbers of calibration samples in order to estimate the necessary model
coefficients.
Department of Environmental Sciences, Plymouth Environmental Research Centre, University of Plymouth, Drake Circus, Plymouth, Devon, UK PL4 8AA
This journal is © The Royal Society of Chemistry 2006
There are many publications detailing the development of calibration drift correction methods for broad spectrum techniques.1–9 Work on univariate calibration by Robinson eliminated the sample-to-sample difference, using either internal standards or the Zeeman effect.10 A considerable amount of
research into calibration transfer has been published in the
area of near infra-red (NIR) analysis. Three publications are
notable. Osborne and Fearn11 investigated the effects of
transferring single-wavelength calibration models between
nine different instruments for the prediction of protein and
moisture in wheat flour using NIR spectroscopy. Single wavelength bias correction terms for the two calibration equations
on each instrument were determined and the long-term stability of the calibration was studied. Later, Shenk et al.12
published results from a study where a large number of
candidate calibration equations were developed on a single
instrument and then transferred to six other instruments. The
‘‘best’’ equation was adjusted for bias, offset, and wavelength
selection on the other instruments and the standard error of
prediction (SEP) was compared between the original and the
other instruments for a set of 60 samples. Mark et al.13
published work describing the selection of wavelengths for
NIR calibration based on their robustness toward wavelength
shifts between instruments. These methods all involve calibration utilising a single, or sometimes a limited number of
wavelengths, and are not generally applicable to multivariate
calibration based on full spectral responses, or where variable
selection has been applied to the full spectrum.
The work of Marcos and Hill deals specifically with the
instrumental drift of an ICP-AES system. This work employs
J. Anal. At. Spectrom., 2006, 21, 1045–1052 | 1045
a correction factor based on the drift pattern of an intrinsic
plasma line (Ar 404.597 nm) relative to the analyte line of
interest, but performs a principal component analysis in order
to obtain the loading values on the first principal component.
In doing this, considerable amounts of noise and extraneous
information not directly related to the lines of interest are
removed, yielding correction factors containing only relevant
information. Alternatively, it is possible to solve the calibration transfer problem by applying multivariate techniques
which attempt to find a transformation function that makes
the measured response obtained from one instrument equal to
that obtained on the same instrument at a later point in time.
This method is commonly termed piecewise direct standardisation (PDS). There are two methods of correction: the first
method transforms the calibration model itself, while the
second transforms the response from the instrument at t =
2 (time at which samples were analysed) to match that which
would have been obtained if the sample had been measured at
t = 1 (time at which calibration samples were analysed). It is
the latter of these two methods that has been used in this
study. Although both methods allow the full response of the
instrument to be utilised without restriction to the number of
wavelengths that can be included in the calibration model, it
has been recommended by such workers as Spiegelman et al.
that uninformative variables should be removed prior to any
predictive modelling14 in order to remove predictive bias. This
study combines drift correction with a previously published
method of variable selection15 in an attempt to remove such
uninformative regions of the spectrum, a combination of
methods that, to the authors' knowledge, has not been
attempted in the literature.
Piecewise direct standardisation
Background (direct standardisation)
There are many texts explaining the mathematical details of
piecewise direct standardisation (PDS);15–18 however, a brief
explanation of the technique is warranted (readers who are
interested in the more mathematical aspects of the routine are
directed to the work of Wang and Veltkamp9 or Wang et al.;19
for a more general introduction to the method readers are
directed to the work of Behrens20 or Herrero and Ortiz17).
Assume that the response matrix (dimensioned calibration samples by wavelengths) for the full calibration set, $R_1$, has been measured, and that $\bar{R}_1$ is a small subset of $R_1$. The response matrix of this subset is measured on the same instrument at a later time, and is denoted by $\bar{R}_2$. Through standardisation, it is hoped that the calibration information contained in $R_1$ could be transferred without measuring the response matrix ($R_2$) of the full calibration set at a later time.
Direct standardisation (DS), the forerunner of PDS, measures the spectra (future samples) at time, t = 2, and makes corrections to match the spectra (the calibration model), built at time, t = 1, hence the calibration model remains unchanged. In direct standardisation, the response matrices of the calibration subset at both times are related to each other by a transformation matrix F, i.e.
$$\bar{R}_1 = \bar{R}_2 F \qquad (1)$$
where F is a square matrix whose dimensions are wavelength by wavelength.
From eqn (1), the transformation matrix F is calculated as:
$$F = \bar{R}_2^{+} \bar{R}_1 \qquad (2)$$
where $\bar{R}_2^{+}$ is the generalised inverse of $\bar{R}_2$. The response vector of an unknown sample measured at time, t = 2, $r_{2,un}$, is standardised to the response vector $\hat{r}_{1,un}$, expected from the instrument at time, t = 1, according to:
$$\hat{r}_{1,un}^{T} = r_{2,un}^{T} F \qquad (3)$$
where the T superscript simply denotes the transpose of the respective column vector. If there are many wavelengths present, this is simply performed for each wavelength. Using $\hat{r}_{1,un}^{T}$ (the transformed unknown sample response) and the regression vector $b_1$, constructed using response matrix $R_1$ and the known calibration concentration vector c (from time, t = 1), eqn (4) can be used for the prediction of unknown concentrations analysed at time, t = 2.
$$c_{1,un} = \hat{r}_{1,un}^{T} b_1 \qquad (4)$$
Note that where there are ns calibration samples, 1 component
and nl wavelengths (inverse calibration p-matrix approach) c
is a scalar quantity.
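The direct standardisation steps of eqns (1)–(4) can be illustrated with a minimal NumPy sketch. It is not the stdgen routine used in this study: the function names, the toy data, the per-wavelength gain drift and the regression vector are all hypothetical, and the generalised inverse is taken to be the Moore–Penrose pseudo-inverse.

```python
import numpy as np

def ds_transform(R1_subset, R2_subset):
    """Estimate F such that R1_subset ~= R2_subset @ F (eqns (1) and (2))."""
    return np.linalg.pinv(R2_subset) @ R1_subset

def predict_concentration(r2_unknown, F, b1):
    """Standardise a t = 2 response (eqn (3)), then apply the t = 1 model (eqn (4))."""
    r1_hat = r2_unknown @ F
    return float(r1_hat @ b1)

# Toy data (hypothetical): 6 subset samples, 4 wavelengths, pure gain drift.
rng = np.random.default_rng(0)
R1_subset = rng.uniform(1.0, 10.0, size=(6, 4))
gain = np.array([0.8, 0.9, 1.1, 0.7])      # assumed drift factor per wavelength
R2_subset = R1_subset * gain               # the same subset re-measured at t = 2
F = ds_transform(R1_subset, R2_subset)

b1 = np.array([0.5, -0.2, 0.1, 0.3])       # hypothetical t = 1 regression vector
r1_true = rng.uniform(1.0, 10.0, size=4)   # response the sample would give at t = 1
c_corrected = predict_concentration(r1_true * gain, F, b1)
c_reference = float(r1_true @ b1)
```

In this idealised case the drift is exactly linear and the subset has full column rank, so the corrected prediction coincides with the reference value; real spectra with noise and fewer subset samples than wavelengths would not behave so cleanly.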
Piecewise direct standardisation (PDS)
In the previous section, the number of subset samples must be
at least equal to the rank of R1, to ensure that the inverse of R1
is stable, and hence an adequate standardisation is achieved.
In real applications this could lead to large numbers of subset
samples being needed. Also, it is noticed that in DS, the whole
spectrum at time, t = 2 is used to fit each spectral point at
time, t = 1. For real spectroscopic data, however, spectral
variations are often limited to much smaller regions. Therefore, each spectral point at time, t = 1, would more likely be
related to the spectral measurements at nearby wavelengths
than the full spectrum at time, t = 2. On the basis of these
considerations, a piecewise standardisation method has been
developed to reconstruct each spectral point at time, t = 1
from several measurements in a small spectral window at time,
t = 2. For subset measurements $r_{1,i}$, at wavelength index i at time t = 1, subset measurements at time t = 2, $r_{2,i-j}, r_{2,i-j+1}, \ldots, r_{2,i+k-1}$ and $r_{2,i+k}$, at nearby wavelengths from index point i − j to i + k are chosen and put into a matrix
$$X_i = [r_{2,i-j}, r_{2,i-j+1}, \ldots, r_{2,i+k-1}, r_{2,i+k}] \qquad (5)$$
It is then possible to establish a local multivariate regression
model in the form of
$$r_{1,i} = X_i b_i \qquad (6)$$
Each regression vector bi can be calculated by means of
principal components regression (PCR) or PLS regression.
These regression vectors are arranged along the main diagonal
of the transformation matrix F while the rest of the elements
are zero (this is true only where there is a linear intensity
change and no wavelength shift), which results in the banded
diagonal matrix given by
$$F = \mathrm{diag}(b_1^{T}, b_2^{T}, b_3^{T}, \ldots, b_i^{T}) \qquad (7)$$
When compared to the direct standardisation method, piecewise standardisation is in fact a calculation of the transformation function F by setting most of the off-diagonal elements to
zero. The transformation matrix is subsequently used to
transfer rT2,un piece by piece into the spectrum as if it were
measured at time, t = 1 (eqn (3)). Predictions can then be
made using the original calibration model constructed at time,
t = 1 using eqn (4). The use of the moving window to establish
the relationship given in eqn (5), instead of using all available
spectral information, avoids the potential for ill-conditioning,
where the number of variables (predictors, i.e. spectral measurements in this case) is larger than the number of calibration
subset samples.
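The moving-window construction of eqns (5)–(7) can be sketched as follows. This is an illustration only: it assumes a symmetric window (j = k = half_window), and it fits each local model by ordinary least squares rather than by the PCR or PLS regression named above. The function name and toy data are hypothetical.

```python
import numpy as np

def pds_transform(R1_subset, R2_subset, half_window):
    """Banded F: column i is fitted from t = 2 channels i-half_window .. i+half_window."""
    n_wavelengths = R1_subset.shape[1]
    F = np.zeros((n_wavelengths, n_wavelengths))
    for i in range(n_wavelengths):
        lo = max(0, i - half_window)
        hi = min(n_wavelengths, i + half_window + 1)
        X_i = R2_subset[:, lo:hi]                     # local window, eqn (5)
        # Least-squares estimate of b_i in r_{1,i} = X_i b_i (eqn (6)).
        b_i, *_ = np.linalg.lstsq(X_i, R1_subset[:, i], rcond=None)
        F[lo:hi, i] = b_i                             # banded placement, eqn (7)
    return F

# Toy data (hypothetical): 5 subset samples, 8 wavelengths, gain drift only.
rng = np.random.default_rng(1)
R1_subset = rng.uniform(1.0, 10.0, size=(5, 8))
gain = np.linspace(0.7, 1.2, 8)
R2_subset = R1_subset * gain
F = pds_transform(R1_subset, R2_subset, half_window=1)

r1_unknown = rng.uniform(1.0, 10.0, size=8)
r1_hat = (r1_unknown * gain) @ F    # standardised t = 2 response, eqn (3)
```

Each local fit here uses three wavelengths and five subset samples, so the problem stays overdetermined even though the full spectrum has more wavelengths than subset samples, which is the conditioning argument made in the text.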
Selection of subset for standardisation
The sample subset (analysed at t = 2 in addition to the
unknown test samples) used in the standardisation must
contain enough information to describe the difference between
the spectra at time, t = 1 and t = 2. A stepwise procedure is
employed here to select the sample with the highest leverage
(maximum hi) according to eqn (8) (assuming that any outliers
have been detected and deleted from the calibration set).
$$H = R_1 R_1^{+} \qquad (8)$$
The information contained in the i-th selected sample is then
removed from the set of samples by a linear transformation so
that they are all orthogonal to this selected sample. This
procedure continues until the desired number of samples have
been included in the subset. In this study, the m-function stdgen (a mathematical routine), implemented in the PLS Toolbox 2.0 (Mathworks Inc.) for Matlab (high performance numeric computation and visualisation software, release 12.1, Mathworks Inc.) for spectroscopic instrument standardisation,17,21
has been used to obtain the calibration transfer matrix, F. The
procedure is as follows:
(i) Calculate the hat matrix: $H = R_1 (R_1^{T} R_1)^{-1} R_1^{T}$.
(ii) The calibration standard with the highest leverage
(maximum hi) is selected: aj.
(iii) The correlation between aj and the remaining standards
is removed.
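Steps (i)–(iii) can be sketched in NumPy as below. This is a simplified illustration and not the stdgen code: the function name and data are hypothetical, the pseudo-inverse stands in for $(R_1^{T} R_1)^{-1} R_1^{T}$, and the toy matrix has more samples than variables so that the leverages are not all equal to one.

```python
import numpy as np

def select_subset(R, n_select):
    """Pick samples by highest leverage, orthogonalising the rest after each pick."""
    R_work = np.asarray(R, dtype=float).copy()
    chosen = []
    for _ in range(n_select):
        # Step (i): leverages are the diagonal of the hat matrix H = R R^+.
        H = R_work @ np.linalg.pinv(R_work, rcond=1e-10)
        leverage = np.diag(H).copy()
        if chosen:
            leverage[chosen] = -np.inf   # never pick the same sample twice
        # Step (ii): select the sample with the highest leverage.
        j = int(np.argmax(leverage))
        chosen.append(j)
        # Step (iii): remove the component along the selected sample from every row.
        a = R_work[j].copy()
        R_work = R_work - np.outer(R_work @ a, a) / (a @ a)
    return chosen

# Toy data (hypothetical): 10 calibration samples, 3 variables.
rng = np.random.default_rng(2)
R = rng.normal(size=(10, 3))
subset = select_subset(R, n_select=3)
```

The number of samples requested should not exceed the rank of the data, otherwise the residual rows are zero and the projection in step (iii) is undefined.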
Experimental (instrumentation and reagents)
All data were collected using a simultaneous echelle inductively coupled plasma atomic emission spectrometer (Perkin–
Elmer Optima 3000 ICP, Norwalk, USA) equipped with a
segmented-charge-coupled array detection system.22,23 Instrumental operating conditions were optimised using simplex
optimisation and are given in Table 1.
Multielement solutions were prepared by serial dilution of
ultra-pure stock standards (10 000 and 1000 µg ml−1, Johnson
Matthey plc, Royston, Hertfordshire). Water was deionised,
double distilled (18 MΩ quality), and acids were of Aristar
grade (Merck-BDH, Poole, Dorset). All glassware was acid
washed in 10% v/v nitric acid for 24 hours then rinsed
thoroughly with 18 MΩ water. All plasticware was metal free
high-density polypropylene (Anachem, Luton, Bedfordshire).
Table 1 Optimised instrumental parameters
Carrier gas flow/l min−1                  0.93
Auxiliary gas flow/l min−1                0.5
Plasma gas flow/l min−1                   16
Viewing height above the load coil/mm     9
Power/W                                   1286
Spray chamber                             Ryton, double-pass
Nebuliser                                 Seaspray, glass concentric
Resolution                                High
(Read time/integration time)/s            3/0.2
Calibration and test solutions containing varying concentrations of Pt, Pd, Rh, Al, Mg, Ce, Zr and Ba, plus the internal standards In, Sc and Y, were prepared from the stock solutions and stored in high-density polypropylene tubes. Synthetic calibration and independent test solutions (at
concentrations other than those specified by the experimental
design) were prepared using a Taguchi orthogonal array24 in
order to cover the required factor space with the minimum
number of experiments. The orthogonal array contained 49
experiments, each containing 8 factors (in this case 8 elements)
at 7 concentration levels, represented as OA49(7^8)25 (the
experimental design parameters are given in Table 2). These
matrix elements would be expected to produce quite severe
spectroscopic and non-spectroscopic matrix effects, and were
chosen in order to provide a stern test for the effectiveness of
the calibration standardisation method. Data collected from
the instrument comprised: (i) 49 × 5684 calibration data
points made up of 49 spectra (for the calibration data set)
each containing emission values at 5684 wavelengths spanning
the atomic emission spectrum (the spectrum consisted of those
areas covered by the instrument's charge-coupled-device detector, thereby forming a segmented spectrum) and (ii) 49 × 166
data points comprising 166 of the most prominent emission
lines of all elements considered.
Procedures
The application of calibration transfer in this study is coupled
with the selection of informative variables from the 5684 data
points (informative plus uninformative variables, i.e. noise)
collected for each of the 49 calibration samples. The approach
used is explained in detail elsewhere by Griffiths et al.;15
however, a description of both routines is warranted here.
Table 2 Concentration levels (µg ml−1) and factors in the orthogonal array design, OA49(7^8)
          Level
Factor    1     2     3     4     5     6     7
Pt        0     5     10    20    30    40    50
Pd        0     5     10    20    30    40    50
Rh        0     1     2     4     6     8     10
Ba        0     1     5     10    50    100   200
Ce        0     1     10    50    100   300   500
Zr        0     1     10    50    100   300   500
Mg        0     1     10    50    100   300   500
Al        0     1     10    100   200   500   1000
The full segmented spectrum data set (49 × 5684) was
subjected to the UVE-PLS routine as follows:
(i) The original data array extracted from the ICP-AES was a 49 × 5684 matrix made up of 49 spectra (for the calibration data set) each containing 5684 data points. The data were autoscaled and one calibration spectrum was initially removed to leave a 48 × 5684 matrix.
(ii) The PLS routine was applied and the bij regression
coefficients were extracted for the optimum number of latent
variables (LVs). Steps (i) and (ii) were then repeated 49 times.
(iii) Jackknife estimated standard errors were calculated for
the regression coefficients (bij), and a two-sided t-test was
performed, at each wavelength, in order to determine which
were equal to zero at the 95% level of confidence.
(iv) Those variables corresponding to bij = 0 were rejected
from the original 49 × 5684 data matrix before progression
onto the next step of the process (IVD-PLS).
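The rejection test in steps (iii) and (iv) can be sketched as follows, taking the leave-one-out regression coefficients as given (the PLS fitting itself is omitted). The delete-one jackknife formula for the standard error, the use of n − 1 degrees of freedom and the resulting critical value of about 2.011 for 49 replicates are assumptions made for illustration; the function name and toy coefficients are hypothetical.

```python
import numpy as np

def jackknife_keep_mask(B, t_crit=2.011):
    """B holds one row of regression coefficients per leave-one-out model.

    Returns a boolean mask of variables whose mean coefficient differs
    from zero according to a two-sided t-test against t_crit.
    """
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    b_mean = B.mean(axis=0)
    # Delete-one jackknife standard error (assumed form).
    se = np.sqrt((n - 1) / n * ((B - b_mean) ** 2).sum(axis=0))
    t_stat = np.abs(b_mean) / np.maximum(se, np.finfo(float).eps)
    return t_stat > t_crit, b_mean, se

# Toy coefficients (hypothetical): 49 leave-one-out models, 2 variables.
k = np.arange(49)
B = np.column_stack([1.0 + 0.01 * np.sin(k),   # stable, clearly non-zero
                     0.1 * np.cos(k)])          # scattered around zero
keep, b_mean, se = jackknife_keep_mask(B)
```

Here the first variable is retained and the second is rejected as uninformative, which mirrors the decision made wavelength by wavelength in step (iv).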
It is essential that the regression coefficients used in the final
predictive PLS model are both relatively large and have low
standard errors. Informative variable degradation by PLS
(IVD-PLS) was applied as follows:
(i) The reduced 49 × (5684 − n) data matrix resulting from the application of UVE-PLS was autoscaled and one calibration spectrum was initially removed.
(ii) The PLS routine was applied and the regression coefficients (bij) extracted for the optimum number of LVs. Steps (i) and (ii) were then repeated 49 times.
(iii) Jackknife estimated standard errors, s(bij), and the mean bij, were calculated.
(iv) The $\mathrm{var}_j^{\mathrm{ivd}} = b_j/s(b_j)$ ratios were calculated and ranked in descending order.
(v) The variables in the data matrix were ranked in accordance with the $\mathrm{var}_j^{\mathrm{ivd}} = b_j/s(b_j)$ ratios and the cumulative sum calculated.
(vi) A PLS calibration model was then built using mean-centred data for the variables that contributed to the first 30% of the cumulative $\mathrm{var}_j^{\mathrm{ivd}} = b_j/s(b_j)$ ratio, and the RMSECV (root mean square error of cross validation; an error measure similar to RRMSE except that each calibration sample is omitted in turn and the error calculated as the sum of the errors) was calculated. The process was repeated using the first 40%, 50% and so on at 10% intervals of the cumsum data and the model with the lowest RMSECV value was chosen as optimal.
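Steps (iv)–(vi) amount to ranking variables by their coefficient-to-error ratio and forming candidate subsets from the cumulative sum. A minimal sketch is given below; it takes the absolute value of the ratio so that large negative coefficients also rank highly, which is an assumption not stated in the procedure, and it leaves out the PLS model building and RMSECV comparison. The function name and toy values are hypothetical.

```python
import numpy as np

def ivd_candidate_subsets(b_mean, b_se, fractions):
    """Rank variables by |b_j| / s(b_j) and return, for each cumulative
    fraction, the indices of the smallest top-ranked set reaching it."""
    ratio = np.abs(np.asarray(b_mean, float)) / np.asarray(b_se, float)
    order = np.argsort(ratio)[::-1]                  # descending ratio
    cumulative = np.cumsum(ratio[order]) / ratio.sum()
    subsets = {}
    for f in fractions:
        n_keep = int(np.searchsorted(cumulative, f, side="left")) + 1
        subsets[f] = order[:min(n_keep, order.size)]
    return subsets

# Toy values (hypothetical): ratios are 1, 4, 0.2 and 2 for variables 0..3.
b_mean = [0.5, -2.0, 0.1, 1.0]
b_se = [0.5, 0.5, 0.5, 0.5]
subsets = ivd_candidate_subsets(b_mean, b_se, fractions=(0.3, 0.6, 0.9))
```

Each candidate subset would then be used to build a PLS model, and the subset giving the lowest RMSECV would be retained.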
Fig. 1 illustrates the procedure for the application of spectra standardisation using PDS with variable selection. Prior to PDS standardisation, outliers in the original data set are removed by inspection of concentration residual plots (outliers tend to produce high leverage values). Because the subset response matrices measured at t = 1 and t = 2 must contain the same spectral regions for the calculation of the PDS transformation matrix, F, the variable reduction routines (UVE-PLS and IVD-PLS) must be performed on the original and the drifted data sets taken at time, t = 1 and t = 2, respectively, prior to the application of the PDS routine. To accomplish this, alterations were made to both the UVE-PLS and IVD-PLS routines as described by Griffiths et al.;15 this enabled the simultaneous selection of informative wavelengths, based upon data set 1, from data set 2. Thus, not only does the method standardise the spectra in terms of differences in signal intensity, but in doing so it also standardises the spectra from t = 2 in terms of variable importance, making those variables which were informative at t = 1 also informative at t = 2.
Fig. 1 Flow-diagram of the process of standardisation with variable selection (49 × 5684 denotes the size of the calibration data set; 49 × n denotes the size of the calibration data set after the application of the variable reduction routines; 49 × (48 × 5684) refers to the size of the calibration data set during the application of the variable reduction routines).
To perform the UVE-PLS and IVD-PLS routines the number of principal components (PC's) must be chosen. Because the RRMSE% values did not differ greatly with the number of PC's chosen (Table 3), 6 PC's were used in the UVE-PLS routine for Pt, Pd and Rh. The number of PC's for the IVD-PLS routine was then determined by using a cross-validation procedure. After the variable subsets had been obtained, the PDS routine was applied and synthetic sample predictions were then made. This consists of four separate steps:
(i) After obtaining the data using the variable reduction and selection algorithms, the calibration transfer subset samples are determined using stdgen. The function stdgen simply identifies those calibration experiment(s) with the highest leverage. These specific (by concentration) samples are then made up once again at t = 2 and are here referred to as the sample subset.
(ii) The transform matrix is calculated using the sample subset obtained in step (i). These data are not pre-processed in any way.
(iii) The PLS calibration model is obtained using the original (t = 1) mean centred calibration data, and the synthetic test samples from time t = 2 scaled appropriately.
(iv) Predictions are then made for the standardised synthetic samples from step (iii) using PLS.
Table 3 Synthetic independent test RRMSE% values for Pt, Pd and Rh using PLS1 with variable selection (6, 8 and 10 PC's), full spectra modelling and the data set containing 166 gross analyte and matrix lines
                                      Pt                              Pd                              Rh
Variable selection [mc]
  RRMSE%                              5.27(6)    5.18(8)   5.58(10)   2.51(6)   2.33(8)   2.56(10)   1.66(6)   1.52(8)   1.62(10)
  Remaining variables after UVE-PLS   375        906       1000       933       1186      1174       168       501       537
  Variables selected                  108        73        119        99b       103b      245b       35        165       140
Full spectrum (5684 wavelength points) [as]
  RRMSE%                              12.64(8)   —         —          8.31(8)   —         —          27.15     —         —
Individual wavelengths (166 analyte and matrix lines) [as]
  RRMSE%                              —          8.38(5)   —          —         7.06(5)   —          —         3.18(7)   —
a [mc] Data were mean centred. b Variables were selected using the IVD plot. c [as] Data were autoscaled.
The m-function, stdgen, implemented in the PLS Toolbox for spectroscopic instrument standardisation (referenced previously) has been used to obtain the calibration transfer matrix in this instance. The RRMSE% values (calculated according to eqn (9)) obtained for the independent test samples (PDS applied) using a series of different stdgen parameters (window and sample subset size, see explanation below) are shown in Tables 4–6 for Pt, Pd and Rh, respectively.
$$\mathrm{RRMSE}(\%) = 100 \times \frac{1}{\mathrm{mean}(y)} \sqrt{\frac{\sum_i (\hat{y}_i - y_i)^2}{N}} \qquad (9)$$
where $y_i$ is the actual concentration of sample i, $\hat{y}_i$ the predicted value of the i-th sample and N the number of samples. It should be noted that the m-function, stdgen, implemented in the PLS Toolbox, requires the number of PC's to be allocated for the construction of the transformation matrix F. For this study the number of PC's allowed was set equal to the number of samples in the subset. In the event that unsatisfactory RRMSE% values were obtained with any particular combination of window size (the number of spectral points taken into consideration when constructing the transformation matrix, F; if set to zero, then direct standardisation is used) and sample subset size, the number of PC's could be optimised.
Table 4 Independent test sample RRMSE% values for Pt (PDS applied) for a range of calibration transfer functions (stdgen m-function). Calibration transfer subset sample size, window size and principal component number (set to sample size) parameters of stdgen m-function
Pt test sample (RRMSE%)
              Window size
Sample size   1      3      5      7      9      11     Maximum number of PC's
3             8.15   6.56   5.8    4.6    4.41   4.14   3
5             5.48   5.28   5.49   4.86   4.81   4.77   5
7             5.72   5.83   5.64   6.6    6.58   6.64   7
9             7.96   8.46   8.12   9.06   9.12   9.02   9
Table 5 Independent test sample RRMSE% values for Pd (PDS applied) for a range of calibration transfer functions (stdgen m-function). Calibration transfer subset sample size, window size and principal component number (set to sample size) parameters of stdgen m-function
Pd test sample (RRMSE%)
              Window size
Sample size   1      3      5      7      9      11     Maximum number of PC's
3             3.27   3.07   3.05   3.03   3.53   3.41   3
5             3.55   3.57   3.59   3.58   3.61   3.63   5
7             3.33   3.38   3.35   3.38   3.31   3.38   7
9             3.21   3.28   3.24   3.23   3.15   3.23   9
Table 6 Independent test sample RRMSE% values for Rh (PDS applied) for a range of calibration transfer functions (stdgen m-function). Calibration transfer subset sample size, window size and principal component number (set to sample size) parameters of stdgen m-function
Rh test sample (RRMSE%)
              Window size
Sample size   1      3      5      7      9      11     Maximum number of PC's
3             4.17   5.7    7.24   8.64   8.34   13.02  3
5             3.65   2.14   1.88   1.97   2.06   2.08   5
7             3.83   2.15   2.04   2.2    2      2.08   7
9             3.89   3.08   2.12   2.14   1.99   2.23   9
The data sets used in this study were collected 10 days apart (data set t = 1 and t = 2 refers to data collected on 16/7/99 and 26/7/99, respectively) thereby giving sufficient time for the instrument to drift. In this instance both the calibration and independent test sample data were collected on both occasions. Evaluating the effects of varying both the window and sample subset size on the PDS corrected RRMSE% values was therefore relatively straightforward. In most practical situations such an optimisation of the stdgen settings may not be possible.
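The RRMSE% measure of eqn (9), used to compare the stdgen settings, can be written as a short function. The function name and the example concentrations below are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def rrmse_percent(y_true, y_pred):
    """Relative root mean square error as a percentage of the mean actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return 100.0 * rmse / np.mean(y_true)

# Hypothetical actual and predicted concentrations for three test samples.
example = rrmse_percent([10.0, 20.0, 30.0], [11.0, 19.0, 31.0])
```

Every prediction in this example is off by one unit against a mean actual concentration of 20, so the result is 5%.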
Results and discussion
Instrumental drift
In order to illustrate the amount of drift the instrument
experienced between analysing the two data-sets, a solution
containing the middle concentration of all elements was
analysed at every 10th sample for each data-set (Fig. 2). The
line chosen was In 325.609 nm as this experienced the least
spectral interference of any of the lines. It is evident from Fig.
2 that the drift for this line was considerable, indicating that
the prediction of data-set 2 samples using a calibration derived
from data-set 1 would give erroneous results.
Fig. 2 Central point In 325.609 nm concentration (µg ml−1) over time for data-sets 1 and 2 (16/7/99 and 26/7/99, respectively) using gross intensity.
Calibration subset optimisation, RRMSE% values and variable selection
The effect of the variable reduction algorithms is clearly shown in Table 3, where for Pt, Pd and Rh the RRMSE% values for the independent test samples fell dramatically using the variable selection method compared to either use of the full spectrum or the most prominent emission lines. The largest reduction in RRMSE% was for Pd, from 7.06%, using individual prominent wavelengths, to 2.33% after applying the variable reduction technique. The effect of calibration subset sample number and spectral window size on RRMSE% values for Pt, Pd and Rh is shown in Fig. 3. For Pt and Pd the lowest RRMSE% values were 4.14% and 3.03% (Tables 4 and 5), respectively, using only 3 calibration subset samples. For Rh the number of samples increases to 5, giving an RRMSE% value of 1.88% (Table 6). Generally, for all three analytes the RRMSE% value does not decrease with increasing numbers of calibration subset samples, which may indicate a lack of intrinsic modelling ability (lack of fit). The effect of varying the window size was also very clear, with a distinct minimum RRMSE% value for all three analytes. This was to be expected: too few wavelengths would not give enough information for standardisation and too many would result in the noise component predominating. The large number of wavelengths required for Pt (11 compared to 7 and 5 for Pd and Rh, respectively) may indicate a large non-linear response for this analyte at the wavelength regions modelled, a feature seen by Wang et al.9
Because the RRMSE% values (4.14, 3.03 and 1.88% for Pt, Pd and Rh, respectively (Tables 4–6)) obtained were acceptable, PC optimisation within the stdgen procedure was not carried out and the number of PC's was simply set equal to the maximum number of calibration subset samples.
Fig. 3 Lowest independent test sample RRMSE% values for Pt, Pd and Rh (PDS applied).
Multivariate calibration and quantitative prediction for synthetic samples
The results of PDS correction to the calibration and independent test samples analysed 10 days apart (i.e. using t = 1 calibration data with standardised t = 2 test samples) are shown in Fig. 4, 5 and 6 for Pt, Pd and Rh, respectively, using a linear least squares regression of PLS predicted concentrations versus actual for the test samples. It is quite evident that the accuracy of the test sample concentrations was much improved when correction was applied. For Pt, Pd and Rh, respectively, RRMSE% values with correction were 4.14, 3.03 and 1.88%, compared to 73.04, 44.39 and 28.06% without correction. It is evident that a clear bias exists for the uncorrected concentrations for all three analytes (Fig. 4–6). With PLS prediction performed using a calibration model constructed from data analysed at the same time as the independent samples (i.e. at time, t = 2), the RRMSE% values for Pt, Pd and Rh were 6.51, 3.54 and 2.85%, respectively, indicating that a satisfactory transformation of the t = 2 samples had taken place. Prediction intervals (95%), calculated using the jackknife method outlined by Griffiths et al.,15 for the three analytes also showed a significant improvement with the application of PDS correction. This is most noticeable for Rh; Pt shows a moderate improvement, with Pd showing no real difference in the confidence intervals (Fig. 4–6). The relatively large confidence intervals shown by Rh (Fig. 6) without correction may be correlated to the much lower concentration range of the Rh test samples (0–10 µg ml−1), compared to both Pt and Pd, which varied between 0 and 40 µg ml−1.
Fig. 4 Actual vs. predicted concentrations for Pt independent test samples using t = 1 original calibration data and t = 2 PDS corrected independent test samples (error bars constitute 95% prediction intervals).
Fig. 5 Actual vs. predicted concentrations for Pd independent test samples using t = 1 original calibration data and t = 2 PDS corrected independent test samples (error bars constitute 95% prediction intervals).
Fig. 6 Actual vs. predicted concentrations for Rh independent test samples using t = 1 original calibration data and t = 2 PDS corrected independent test samples (error bars constitute 95% prediction intervals).
Conclusions
It has been shown that relatively few subset samples (3, 3 and 5,
respectively, for Pt, Pd and Rh) are required in order
to construct a satisfactory transformation function (F) for
samples analysed subsequent to the calibration data set; a
promising alternative when recalibration using the entire
calibration data set is not desired, as is often the case with
multivariate calibration models. It has been clearly demonstrated that partial least squares regression can be successfully applied to the prediction of transformed data, collected at
time t = 2, using calibration data collected previously at time,
t = 1. Not only were the predictions successful, and the
transfer function able to correct for significant time-drift
effects, but it was also possible to standardise the spectra from
t = 2, in terms of variable importance, making those variables
which were informative at t = 1 informative at t = 2 also;
however, the degree to which this is so has yet to be investigated.
The application of PDS, in conjunction with variable elimination, has shown not only that it is possible to use the same
calibration model over an extended period of time, but also that the
elimination of uninformative (or redundant) information in
the ICP-AES emission spectrum both removes the need for
line selection and increases prediction accuracy and precision.
References
1 L. G. Thygesen and S. O. Lundqvist, J. Near Infrared Spectrosc.,
2000, 8, 191.
2 P. Geladi, H. Barring, E. Dabakk, J. Trygg, H. Antti, S. Wold and
B. Karlberg, J. Near Infrared Spectrosc., 1999, 7, 251.
3 C. E. Anderson and J. H. Kalivas, J. Appl. Spectrosc., 1999, 53,
1268.
4 J. Sjoblom, O. Svensson, M. Josefson, H. Kullberg and S. Wold,
Chemom. Intell. Lab. Syst., 1998, 44, 229.
5 F. Despagne, B. Walczak and D. L. Massart, J. Appl. Spectrosc.,
1998, 52, 732.
6 E. Bouveresse, C. Casolino and D. L. Massart, J. Appl. Spectrosc.,
1998, 52, 604.
7 H. Swierenga, W. G. Haanstra, A. P. de Weijer and L. M. C.
Buydens, J. Appl. Spectrosc., 1998, 52, 7.
8 Z. Y. Wang, T. Dean and B. R. Kowalski, J. Appl. Spectrosc.,
1995, 67, 2379.
9 Y. D. Wang, D. J. Veltkamp and B. R. Kowalski, Anal. Chem.,
1991, 63, 2750.
10 J. W. Robinson, Atomic Spectroscopy, Marcel Dekker, Inc., New
York & Basel, 1990.
11 B. G. Osborne and T. J. Fearn, J. Food Technol., 1983, 18, 453.
12 J. S. Shenk, M. O. Westerhaus and W. C. Templeton, Crop Sci.,
1984, 25, 159.
13 H. Mark (Jr) and J. Workman, J. Spectrosc., 1988, 3, 28.
14 C. H. Spiegelman, M. J. McShane, M. J. Goetz and M. Motamedi,
Anal. Chem., 1998, 70, 35–44.
15 M. L. Griffiths, D. Svozil, P. Worsfold, S. Denham and E. H.
Evans, J. Anal. At. Spectrom., 2002, 17, 800–12.
16 L. Zhang, G. W. Small and M. A. Arnold, Anal. Chem., 2003, 75,
5905–15.
17 A. Herrero and M. C. Ortiz, Anal. Chim. Acta, 1997, 348, 51–59.
18 H. W. Tan and S. D. Brown, J. Chemom., 2001, 15, 647–64.
19 X. Wang, T. Dean and B. R. Kowalski, Anal. Chem., 1995, 67,
2379.
20 A. Behrens, Spectrochim. Acta, Part B, 1997, 52, 445–58.
21 B. M. Wise, PLS Toolbox for use with Matlab, 2006, available from
the author.
22 T. W. Barnard, M. I. Crockett, J. C. Ivaldi and P. L. Lundberg,
Anal. Chem., 1993, 65, 1225.
23 T. W. Barnard, M. I. Crockett, J. C. Ivaldi, P. L. Lundberg, D. A.
Yates, P. A. Levine and D. J. Sauer, Anal. Chem., 1993, 65, 1231.
24 T. Y. Chen and C. Y. Lin, Finite Elements Anal. Des., 2000, 36,
1–16.
25 N. J. A. Sloane, http://www.research.att.com/~njas/oadir.