Variable reduction algorithm for atomic emission spectra: application to multivariate calibration and quantitative analysis of industrial samples

Michael L. Griffiths, Daniel Svozil, Paul Worsfold, Sue Denham and E. Hywel Evans*

Department of Environmental Sciences, Plymouth Environmental Research Centre, University of Plymouth, Drake Circus, Plymouth, Devon, UK PL4 8AA.

E-mail: hevans@plymouth.ac.uk; Tel: +44 (0)1752 233040

Received 2nd April 2002, Accepted 29th May 2002

First published as an Advance Article on the web 11th July 2002

A variable selection procedure has been developed and used to reduce the number of wavelength data points necessary to formulate a predictive multivariate model for Pt, Pd and Rh using full atomic emission spectra (5684 wavelength data points per spectrum) obtained using a Segmented-Array Charge-Coupled Device Detector (SCCD) for inductively coupled plasma atomic emission spectrometry (ICP-AES). The first stage was the application of an Uninformative Variable Elimination Partial Least Squares (UVE-PLS) algorithm which identified the PLS regression coefficients that are equal to zero at a specified significance level, and removed the associated variables. The second stage was the application of an Informative Variable Degradation PLS (IVD) algorithm, which ranked variables using the ratio of their PLS mean regression coefficients and regression coefficient estimated standard errors (estimated using the jackknife). Variable selection was then achieved by examination of the cumulative sum of these ratios. The algorithms were applied to emission spectra for the determination of Pt, Pd and Rh in a synthetic matrix (Al, Mg, Ce, Zr, Ba, In, Sc and Y at various concentrations). The variable selection routines reduced 5684 wavelength data points to 47, 110 and 334 and gave relative root mean square error (RRMSE) values of 4.60, 3.20 and 1.65% for Pt, Pd and Rh, respectively, compared with 12.6, 8.32 and 27.2% obtained using all 5684 data points; and 5.19, 7.06 and 3.18% obtained when using 166 pre-selected, integrated atomic emission lines. There was no requirement for wavelength or background correction point selection or for wavelength alignment because the entire available segmented spectrum was used in the calibration model. This approach was extended by using real industrial fusion samples to build a calibration model in order to predict the concentration of Au, Ag and Pd in similar matrices. RRMSEs of 23, 29 and 9.1% were obtained for the respective elements in test samples.

Introduction

Partial Least Squares is intended as a full spectrum calibration method, but this often necessitates pre-treatment of the data to reduce the number of non-informative variables to an acceptable level prior to bilinear modelling. To this end, several authors have investigated the application of variable reduction in spectroscopy, in particular for broad-spectrum techniques such as Near Infrared 1–3 and Nuclear Magnetic Resonance. 4 The necessity for variable reduction in atomic spectrometry is even more acute because the spectra can contain a much higher ratio of noise to useful data than molecular spectra. To the authors’ knowledge very little work in this area has been published in relation to atomic spectroscopy. Factor analysis has been employed by Wirsz and Blades 5 for qualitative and quantitative multielement analysis in inductively coupled plasma atomic emission spectrometry (ICP-AES) to identify sample components and to minimise problems arising from spectral overlap. More recently, Morales et al. 6 and van Veen et al. 7 used multicomponent analysis to address the problem of using the full ICP-AES spectrum; however, the method relied upon the use of pure component spectra (i.e. it assumed that linear additivity was true under all circumstances). When chemical and/or physical interelement effects (matrix effects) occur this is not the case, and subsequent models will therefore yield inaccurate regression coefficients. The work of Soudier and Mermet using normalised relative line intensities 8 has the drawback that intelligent background correction must be conducted irrespective of the spectral interferences in order to obtain the net line intensity. This is difficult, if not impossible, with line rich spectra and varying sample matrices.

J. Anal. At. Spectrom., 2002, 17, 800–812

Variable selection is a crucial step in data analysis because it influences the quality of the data during the calibration procedure. The goal of variable selection is to identify a subset of spectral frequencies that produces the smallest possible predictive model error. Noisy variables can confound PLS models, and removal of the redundant data has been mathematically demonstrated to improve calibration. 9 Recently, considerable effort has been directed towards developing procedures that objectively identify those variables that contribute useful information and/or eliminate noise. 10–24,42

Several elaborate search-based strategies such as genetic algorithms (GA’s), 17,20,25,26 simulated annealing (SA), 18 and artificial neural networks (ANN) 27 have been developed. These algorithms are intended to be global optimisers that are capable of locating the best set of parameters for a given large-scale optimisation problem. Essentially, the algorithms are stochastic search methods that tolerate temporary decreases of quality during optimisation. This important feature enables the algorithms to escape local optima without supervision. However, although successfully applied to variable selection problems, particularly in NIR, these methods have significant drawbacks.

First, they are extremely slow in execution and, second, they represent a tremendous configuration challenge to the user because of the numerous adjustable factors that significantly affect model prediction. In addition, the higher the ratio between the number of variables and number of samples, the higher the risk of chance correlations. In a recent study by R. Leardi et al. 28 it was found that a variable/sample ratio of 5 was the critical point beyond which the probability of chance correlation became too high. While they can be extremely useful for applications with low variable to sample ratios, the speed, configuration complexity, and level of expertise required limit the circumstances under which they can be applied successfully. For ICP-AES, where the number of variables can be several thousand, the probability of chance association prohibits the use of such methods unless the number of variables can be reduced significantly. Hence, there is a need to develop methods that are computationally rapid and can be applied to systems of a complex nature. One approach is the use of a stepwise procedure.

DOI: 10.1039/b203239m

This journal is © The Royal Society of Chemistry 2002

Several stepwise selection schemes have been proposed to select or eliminate variables from data. 11–13,24,29–31 In addition to these methods, uninformative variables have been deleted using a number of other procedures. Martens and Naes 32 suggested replacing small loading values with zeros. A similar method, called intermediate least squares (ILS), has been described by Frank 33 and modified by Lindgren et al. 34,35 In interactive variable selection (IVS) the uninformative variables are detected for each PLS dimension by applying a threshold procedure to the vector of PLS weights. A related approach, which has often been used for the selection of wavelengths, is based on the conclusion that important wavelengths are those with the greatest loading values in early loading vectors from the PLS algorithm regardless of the sign. However, it has been established, through the use of cyclic subspace regression, that variable selection using this method can be misleading. 36 Centner et al. 1 have also shown that deleting variables with small PLS regression coefficients in a model obtained with autoscaled data can prove useful. A common weakness of all these methods, however, is that no explicit rule exists for identifying the optimal number of variables that should be used in the final model.

A popular criterion used to identify informative variables in spectra is the degree of correlation between wavelength and concentration. The correlation of the $i$th wavelength ($\mathrm{cor}_i$) with $\mathbf{c}$ is given by

$$\mathrm{cor}_i = \frac{\mathrm{cov}(\mathbf{r}_i, \mathbf{c})}{s_{r_i}\, s_c} \qquad (1)$$

where $\mathbf{c}$ represents the $m \times 1$ calibration concentration vector for the analyte of interest; $\mathbf{r}_i$ represents the $m \times 1$ vector of responses measured at the $i$th wavelength for $\mathbf{c}$; $s_{r_i}$ and $s_c$ represent the standard deviations of the $\mathbf{r}_i$ and $\mathbf{c}$ vectors, respectively; and cov denotes the covariance function. The standard error of calibration for the $i$th wavelength ($\mathrm{SEC}_i$) has also been used to select wavelengths, but it can be shown that $\mathrm{SEC}_i$ is equivalent to $\mathrm{cor}_i$. 37 The wavelength selection procedure based on the ratio of the regression coefficient for the $i$th wavelength and the residual variance at the $i$th wavelength has been shown to be equivalent to eqn. (1). 16 However, wavelengths with low correlation may still be useful in that they explain effects other than direct dependence upon concentration, e.g. matrix effects.
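As an illustration of eqn. (1), the following minimal sketch computes the correlation of every wavelength column with the concentration vector. The function name `wavelength_correlations` and the toy data are hypothetical and are not taken from the original work; sample (m − 1) statistics are assumed.

```python
import math

def wavelength_correlations(R, c):
    """Return cor_i = cov(r_i, c) / (s_ri * s_c) for every column of R.

    R is a list of m spectra (rows), each a list of p intensities;
    c holds the m analyte concentrations.
    """
    m = len(c)
    c_mean = sum(c) / m
    s_c = math.sqrt(sum((v - c_mean) ** 2 for v in c) / (m - 1))
    cors = []
    for i in range(len(R[0])):
        r = [row[i] for row in R]
        r_mean = sum(r) / m
        s_r = math.sqrt(sum((v - r_mean) ** 2 for v in r) / (m - 1))
        cov = sum((a - r_mean) * (b - c_mean) for a, b in zip(r, c)) / (m - 1)
        # A constant column carries no information; report zero correlation.
        cors.append(cov / (s_r * s_c) if s_r > 0 and s_c > 0 else 0.0)
    return cors

# Hypothetical toy data: column 0 rises with concentration, column 1 is
# flat, column 2 falls as concentration rises.
R = [[1.0, 5.0, 9.0], [2.0, 5.0, 7.0], [3.0, 5.0, 5.0], [4.0, 5.0, 3.0]]
c = [10.0, 20.0, 30.0, 40.0]
cors = wavelength_correlations(R, c)
```

A selection rule based only on the magnitude of these values would discard the flat column, and would also discard any wavelength that mainly tracks a matrix effect, which is the limitation noted above.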

Faber et al. 38 reported an error propagation study for principal component analysis. The study described how the uncertainties are carried over from the data to the estimated parameters and how measurement error and number of variables influence the bias of the eigenvalues, the model complexity, and the variance of the eigenvalues. The bias on the $a$th eigenvalue ($b_{\lambda_a}$) has been defined as

$$b_{\lambda_a} = \hat{\lambda}_a - \lambda_a = (n + p - A)\,\sigma_M^2 \qquad (2)$$

where $\hat{\lambda}_a$ is the biased estimate of the $a$th eigenvalue $\lambda_a$, $n$ and $p$ are the numbers of objects and variables respectively, $A$ is the correct dimensionality (i.e. pseudorank), and $\sigma_M^2$ is the measurement error. It is evident that a large measurement error and a large number ($p$) of uninformative variables will increase the bias in the eigenvalues and also the model bias.
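To give a feel for the size of the effect in eqn. (2), the short sketch below evaluates the bias for the full spectrum and for a reduced variable set. The values n = 49, p = 5684 and p = 47 appear in this paper; the pseudorank A = 8 and the unit measurement error are hypothetical placeholders chosen only for illustration.

```python
def eigenvalue_bias(n, p, A, sigma2_m):
    """Bias of an eigenvalue according to eqn. (2): (n + p - A) * sigma_M^2."""
    return (n + p - A) * sigma2_m

# A = 8 and sigma2_m = 1.0 are assumed values, not results from the paper.
full = eigenvalue_bias(49, 5684, 8, 1.0)     # all wavelength data points
reduced = eigenvalue_bias(49, 47, 8, 1.0)    # after variable reduction
```

Under these assumptions the bias scales almost linearly with p, so removing uninformative variables lowers it by roughly the same factor as the reduction in the number of variables.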

The variable selection method described in this paper is twofold. The first part consists of elimination of those variables with regression coefficients close to zero (Uninformative Variable Elimination, UVE-PLS). The second part takes the remaining variables and ranks them using the ratio of their PLS regression coefficient to their respective regression coefficient standard error (Informative Variable Degradation, IVD). The reduced data is then used to build a multivariate calibration model for correction of spectroscopic and non-spectroscopic interferences in ICP-AES.

Theory

Uninformative variable elimination by PLS (UVE-PLS)

The method proposed here is based upon that of Centner et al. 1 and modified by both Westad et al. 3 and Faber. 39 In the original paper by Centner, a reliability criterion, eqn. (3), was used in conjunction with the addition of random noise to the original data.

$$c_j = \frac{b_j}{s(b_j)} \qquad (3)$$

The reliability criterion $c_j$ is based on an analogy with stepwise multiple linear regression (MLR). The estimated standard error, $s(b_j)$, cannot be computed directly for PLS. Therefore Centner et al. proposed to estimate the multivariate regression coefficient ($b_j$) as a mean, and its estimated standard error, $s(b_j)$, for the $j$th variable using a leave-one-out strategy.

The problem of determining a subjective cut-off level was addressed using eqn. (4)

$$\mathrm{In}_{x_j}(c_j) \geq \left|\max(c_{\mathrm{noise}})\right| \qquad (4)$$

where $\mathrm{In}_{x_j}(c_j)$ is the reliability criterion ($c_j$) for the $j$th informative variable in the instrumental data ($\mathbf{X}$) and $|\max(c_{\mathrm{noise}})|$ is the absolute value of the maximum value for $c_j$ from the added random noise. However, this method was dependent on a satisfactory estimation of the amount of random noise. The modification proposed by both Faber and Westad 3,39 gave an objective cut-off value for the reliabilities in the form of the jackknife estimated standard error, which was used to determine whether $b_j \neq 0$ (i.e. an informative variable) as opposed to $b_j = 0$ (classified as an uninformative variable), using Student’s t-test. It is this modified method which is used here.
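The t-test filter described above can be sketched as follows, assuming the jackknife mean and standard error of each coefficient have already been computed. The function name `uve_select`, the example numbers, and the handling of a zero standard error are assumptions made for illustration; the critical value 2.01 is only an approximate two-sided 5% value of Student's t for about 48 degrees of freedom.

```python
def uve_select(b_mean, b_se, t_crit):
    """Return indices of variables whose coefficient differs from zero.

    b_mean[j] and b_se[j] are the jackknife mean and standard error of
    the j-th PLS regression coefficient; t_crit is the two-sided critical
    value of Student's t at the chosen significance level.
    """
    keep = []
    for j, (b, s) in enumerate(zip(b_mean, b_se)):
        if s == 0:
            continue  # assumed policy: no spread estimate, so drop the variable
        if abs(b / s) > t_crit:  # reliability c_j = b_j / s(b_j), eqn. (3)
            keep.append(j)
    return keep

# Hypothetical coefficients for four wavelengths.
b_mean = [0.80, 0.02, -0.50, 0.10]
b_se = [0.10, 0.05, 0.10, 0.20]
kept = uve_select(b_mean, b_se, t_crit=2.01)
```

Only the first and third variables pass the test here; the sign of the coefficient does not matter because the test is two-sided.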

Informative variable degradation by PLS (IVD-PLS)

UVE-PLS can be thought of in terms of a filter which allows through only those b coefficients that have a high probability of being not equal to zero. Those that remain are useful, but some are more useful than others. So, in order to rank these variables efficiently, a suitable criterion must be found. Centner et al. 29 proposed using a genetic algorithm (GA); however, one of the dangers of using such algorithms is that of random correlations. Even after the application of the UVE-PLS routine to the full spectrum data used in this study the number of samples required to give a satisfactory variable to sample ratio would be in the order of 800–900, which is clearly impractical.

In spectroscopic applications, the b coefficients cannot be used directly to choose which wavelengths are the most important for modelling. Indeed, a large coefficient may indicate a significant variable, but also a variable with large variability with little or no correlation to the analyte of interest. This problem can be avoided by autoscaling the data, so that a large absolute b coefficient always indicates an important variable. 40

It is essential that the regression coefficients used in the final predictive PLS model (eqn. (5)) are relatively large and have a low standard error.

$$y = b_0 + b_1 x_1 + \dots + b_n x_n + e \qquad (5)$$

In eqn. (5), $b$ is a partial regression coefficient and $e$ is the model error term. Hence, a ranking scheme is proposed whereby the importance of the mean regression coefficient is assessed by taking its ratio to its standard error, with larger ratios being assessed as more important (eqn. (6)).

$$\mathrm{var\,ivd}_j = \frac{\bar{b}_j}{s(b_j)} \qquad (6)$$

where $\mathrm{var\,ivd}_j$ is the IVD ratio for the $j$th variable, $\bar{b}_j$ is the mean value for the $j$th b coefficient from autoscaled $\mathbf{X}$ data, and $s(b_j)$ is its estimated standard error.

Informative variables will have large $b_j$ coefficient estimates and small estimated standard errors, so ranking on the basis of a decreasing IVD ratio should enable the most important variables, in terms of high correlation and low regression coefficient variance, to be used in the PLS algorithm. In order to obtain the correct number of ranked variables, the cumulative sum of $\mathrm{var\,ivd}_j$ is obtained and, at specific percentage intervals (e.g. 30 to 100% with 5% stepping), the corresponding root mean square error of cross validation (RMSECV) is obtained for that set of autoscaled wavelength data points. This results in a minimum RMSECV value which indicates the correct number of ranked variables to use. The stepping value can be altered to suit the degree of accuracy required; in this study a value of 10% was found to be adequate. The final PLS model is built with the selected number of wavelength data points, using either mean centering or autoscaling of the data.
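The ranking and cumulative-sum step can be sketched as below. This is a minimal illustration, not the authors' Matlab routine: the function name `ivd_subsets` and the example ratios are hypothetical, and the use of the absolute value of the ratio (so that negative coefficients rank by magnitude) is an assumption.

```python
def ivd_subsets(ratios, percents):
    """Map each percentage to the ranked variables filling that share
    of the cumulative sum of IVD ratio magnitudes."""
    magnitudes = [abs(r) for r in ratios]
    order = sorted(range(len(ratios)), key=lambda j: magnitudes[j], reverse=True)
    total = sum(magnitudes)
    subsets = {}
    for pct in percents:
        target = total * pct / 100.0
        running, chosen = 0.0, []
        for j in order:
            if running >= target:
                break
            chosen.append(j)
            running += magnitudes[j]
        subsets[pct] = chosen
    return subsets

ratios = [8.0, -5.0, 4.0, 2.0, 1.0]   # hypothetical IVD ratios
subsets = ivd_subsets(ratios, [30, 50, 100])
```

Each candidate subset would then be passed to a cross-validated PLS model, and the subset giving the lowest RMSECV retained.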

Correct estimation of all parameters in eqn. (6), and in the UVE-PLS routine, is dependent upon the optimum number of latent variables (LVs) being used. The calculation of these parameters is performed prior to the removal of any variables, so the estimation of the correct number of LVs, using root mean square error of cross validation (RMSECV) values or any other method, will not give a unique solution. In this work it was not possible to objectively calculate the optimum number of LVs, so a range was used in an attempt to reduce predictive errors in the final models. This problem is discussed further in the Results section.

The jackknife estimated standard error of regression coefficients

The jackknife technique, based upon the work of Quenouille, 41 is a nonparametric resampling technique which can be used to estimate the standard error of an estimator, in this case b, the PLS regression coefficient. 41 The formula for the jackknife corrected estimated standard error of b in both the UVE-PLS and IVD-PLS routines is shown in eqn. (7).

$$s(b_j) = \left( \frac{n-1}{n} \sum_{i=1}^{n} \left( b_{ij} - \bar{b}_j \right)^2 \right)^{1/2} \qquad (7)$$

where $n$ is the number of calibration samples, $b_{ij}$ is the regression coefficient for the $j$th variable with the $i$th calibration sample removed, and $\bar{b}_j$ is the mean value of the $b_{ij}$ for the $j$th variable. It is easy to verify that eqn. (7) reduces to the traditional estimate of the standard error of the mean when $b = \bar{x}$. The beauty of the jackknife technique is that it automatically produces a standard error estimate for even the most complicated estimator, such as the PLS regression coefficient, even when no simple algebraic form exists. 43
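The reduction of eqn. (7) to the standard error of the mean can be checked numerically. The sketch below is a generic leave-one-out jackknife in which the estimator is passed in as a function; the name `jackknife_se` and the data are hypothetical, and in the paper's setting the estimator would be a PLS regression coefficient rather than the mean.

```python
import math
import statistics

def jackknife_se(values, estimator):
    """Jackknife standard error of estimator(values), following eqn. (7)."""
    n = len(values)
    # Recompute the estimator n times, each time leaving one sample out.
    loo = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((b - loo_mean) ** 2 for b in loo))

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical measurements
se_jack = jackknife_se(x, statistics.mean)
se_classic = statistics.stdev(x) / math.sqrt(len(x))
```

The two values agree to rounding error, which is the property stated in the text; for an estimator with no closed-form standard error only the first computation is available.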

Experimental

Analytical procedure

Instrumentation.

All data were collected using a simultaneous echelle inductively coupled plasma optical emission spectrometer (Perkin-Elmer Optima 3000 ICP, Norwalk, USA) equipped with a segmented-array charge-coupled device detector system (SCCD). The spectrometer consisted of an echelle grating and separate cross-dispersers for the UV and visible channels. The 201 subarrays covered approximately 6% of the UV/VIS spectrum (160–800 nm), but were located in order to coincide with the most sensitive known spectral lines of the elements. All data were acquired under the highest resolution setting. The remaining instrumental parameters were optimised using simplex optimisation, the criterion of merit for the optimisation being the limit of detection for the Pt 214.423 nm line. All instrumental settings are shown in Table 1. A computer program was written to extract the emission data for all 201 subarray signals and write them to a text file which consisted of 5684 wavelength data points per spectrum.

Table 1 Optimised instrumental parameters used for the collection of all data

| Parameter | Setting |
|---|---|
| Viewing height above the load coil/mm | 9 |
| Auxiliary gas flow/l min−1 | 0.5 |
| Plasma gas flow/l min−1 | 16 |
| Carrier gas flow/l min−1 | 0.93 |
| Power/W | 1286 |
| Spray chamber | Ryton, double-pass |
| Nebuliser | Seaspray, glass concentric |
| Injector diameter/mm | 2.5 |
| Resolution mode | High (0.01 nm) |
| Read time/integration time/s | 3/0.2 |

Analysis of synthetic solutions.

Single and multielement solutions were prepared by serial dilution of ultra-pure stock standards (10,000 and 1000 µg ml−1, Johnson Matthey plc, Royston, Hertfordshire). Water was deionised, double distilled (18 MΩ quality) and acids were of Aristar grade (Merck-BDH, Poole, Dorset). All glassware was acid washed in 10% v/v nitric acid for 24 h then rinsed thoroughly with 18 MΩ water. All plasticware was metal free high-density polypropylene (Anachem, Luton, Bedfordshire). Calibration and independent test solutions containing varying concentrations of Pt, Pd, Rh, Al, Mg, Ce, Zr and Ba, plus the internal standards In, Sc and Y, were prepared from the stock solutions and stored in high-density polypropylene tubes.

Synthetic calibration and test solutions were prepared using a Taguchi orthogonal array 44 in order to cover the required factor space with the minimum number of experiments. The concentration ranges of the elements were determined from historical data on the composition of autocatalyst digest samples. The orthogonal arrays contained 8 factors at seven levels with a total of 49 experiments, represented as $\mathrm{OA}_{49}(7^{8})$. 45 The levels and factors for the design are shown in Table 2. The elements Pt, Pd and Rh were set at concentrations typical of analyte elements, whereas Ba, Ce, Zr, Mg and Al were at concentrations typical of matrix components. These matrix elements would be expected to produce quite severe spectroscopic and non-spectroscopic matrix effects, hence, they were chosen in order to provide a stern test for the effectiveness of the variable reduction and subsequent multivariate calibration.

Independent test solutions were made ensuring that element concentration ratios were different from any calibration sample.

Table 2 Concentration levels (µg ml−1) and factors in the experimental design

| Level | Pt | Pd | Rh | Ba | Ce | Zr | Mg | Al |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 1 | 5 | 5 | 1 |
| 3 | 10 | 10 | 2 | 5 | 10 | 10 | 10 | 10 |
| 4 | 20 | 20 | 4 | 10 | 50 | 50 | 50 | 100 |
| 5 | 30 | 30 | 6 | 50 | 100 | 100 | 100 | 200 |
| 6 | 40 | 40 | 8 | 100 | 300 | 300 | 300 | 500 |
| 7 | 50 | 50 | 10 | 200 | 500 | 500 | 500 | 1000 |
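An orthogonal array with the stated dimensions (49 runs, 8 factors, 7 levels) can be built in several ways. The sketch below uses one standard construction based on arithmetic modulo 7 and verifies the defining property that every pair of columns contains each of the 49 level combinations exactly once. It is not necessarily the array used by the authors; the function names are hypothetical, and only the Pt levels are mapped, using the values from Table 2.

```python
from itertools import combinations

def orthogonal_array_49():
    """Build a 49-run, 8-factor, 7-level array from arithmetic modulo 7."""
    rows = []
    for a in range(7):
        for b in range(7):
            # First column is b; the rest are a + k*b (mod 7) for k = 0..6.
            rows.append([b] + [(a + k * b) % 7 for k in range(7)])
    return rows

def is_orthogonal(rows):
    """True if every pair of columns shows all 49 level pairs exactly once."""
    for i, j in combinations(range(len(rows[0])), 2):
        if len({(r[i], r[j]) for r in rows}) != 49:
            return False
    return True

design = orthogonal_array_49()
# Map level indices 0..6 to the Pt concentrations (µg/ml) listed in Table 2.
pt_levels = [0, 1, 10, 20, 30, 40, 50]
pt_column = [pt_levels[row[0]] for row in design]
```

Each level of each factor appears in exactly seven of the 49 runs, so the 8-factor space is covered evenly with far fewer experiments than the 7^8 runs of a full factorial design.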

Analysis of real industrial samples.

Real industrial samples containing combinations of over 40 elements, including Pt, Pd, Rh, Al, Mg, Ce, Zr, Fe, Si, Au, Ag, Ir, As, Cu, Ca, B and Li were also analysed. The elements Fe, Mg and Al have extremely line-rich spectra in the 200–450 nm wavelength region thereby posing problems for the determination of elements with emission lines in this region of the spectrum. Due to the complex nature of the samples it was not possible to use an experimental design to produce a calibration data set covering all the factor space, so an alternative method was required. In order to acquire the data to build the calibration models the samples were analysed in two ways as follows.

NiS fire assay.

Samples were subjected to a NiS fire assay procedure, to separate the matrix from the precious metals, then the precious metal concentrations were determined by inductively coupled plasma atomic emission spectrometry (ICP-AES) using the method of univariate calibration. This analysis is the traditional method of assaying precious metals which is regularly performed in industry, and is considered to be accurate to within ±1%. The analyses were also performed by a number of independent laboratories to give consensus values for the precious metal concentrations, which were then treated as in-house reference values for the subsequent multivariate calibration.

Peroxide fusion.

The same samples that were analysed using the NiS fire assay were subjected to a peroxide fusion digestion, which did not separate the matrix from the analytes, then the precious and base metals were determined by ICP-AES using the method of univariate calibration and interelement correction. For this analysis a segmented array simultaneous ICP-AES instrument was used (Perkin-Elmer Optima 3000, Norwalk, CT), and 38% of the segmented spectrum was acquired, consisting of 76 subarrays out of a possible 201, resulting in individual discontinuous atomic emission spectra each containing 2268 wavelength data points.

Hence, the spectral data obtained by the peroxide fusion analysis could be used to build the multivariate calibration model, and the consensus values for the precious metals obtained using the NiS fire assay method could be used as standard concentration values. The analytes available to this approach were gold (Au), silver (Ag) and palladium (Pd).

Variable reduction and multivariate calibration

Synthetic solutions.

Data was autoscaled prior to application of UVE-PLS and IVD-PLS in order to prevent variables with large and random variance from dominating the PLS model at the expense of variables with a small variance. This is important for atomic emission spectra in which signal intensities can range from a few hundred units up to several hundred thousand or more.
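Autoscaling, as used here, centres each wavelength column on zero and scales it to unit standard deviation. A minimal sketch follows; the function name `autoscale`, the treatment of constant columns, and the intensities are assumptions for illustration.

```python
import math

def autoscale(X):
    """Centre each column on zero and scale it to unit standard deviation."""
    m, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (m - 1))
            for j in range(p)]
    # Constant columns are set to zero rather than divided by zero.
    return [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(p)] for row in X]

# Hypothetical intensities: a weak line (hundreds of counts) beside a
# strong line (hundreds of thousands of counts).
X = [[300.0, 250000.0], [500.0, 310000.0], [700.0, 370000.0]]
Z = autoscale(X)
```

After scaling, both columns span the same range, so the strong line no longer dominates the model simply because of its magnitude.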

Uninformative variable elimination PLS (UVE-PLS).

A flow-chart outlining the UVE-PLS procedure is shown in Fig. 1. The software used was Matlab Software Version 5.0, and the PLS_Toolbox 2.0 (Mathworks Inc.). The full segmented spectrum data set was subjected to the UVE-PLS algorithm as follows:

(i) The original data array extracted from the ICP-AES instrument was a 49 × 5684 matrix made up of 49 spectra (for the calibration data set) each containing 5684 data points. The data was autoscaled and one calibration spectrum was initially removed to leave a 48 × 5684 matrix.

(ii) The PLS algorithm was applied and the $b_{ij}$ regression coefficients were extracted for the optimum number of LVs (Aopt PC).

(iii) Jackknife estimated standard errors were calculated for the $b_{ij}$ regression coefficients, and a two-sided t-test was performed, at each wavelength, in order to determine which were equal to zero at the 95% confidence level.

(iv) Those X variables corresponding to $b_j = 0$ were rejected from the original 49 × 5684 data matrix before progression onto the next step.

Informative variable degradation by PLS (IVD-PLS).

A flow-chart outlining the IVD-PLS algorithm is shown in Fig. 2. The algorithm was applied as follows:

(i) The reduced 49 × (5684 − n) data matrix resulting from application of the UVE-PLS algorithm was autoscaled and one calibration spectrum was initially removed.

(ii) The PLS algorithm was applied and the $b_{ij}$ regression coefficients extracted for the optimum number of LVs (Aopt PC).

(iii) Jackknife estimated standard errors, $s(b_j)$, and the mean values, $\bar{b}_j$, were calculated for the regression coefficients.

(iv) The $\mathrm{var\,ivd}_j = \bar{b}_j / s(b_j)$ ratios were calculated and ranked in descending order.

(v) The X variables in the data matrix were ranked in accordance with the $\mathrm{var\,ivd}_j$ ratios and the cumulative sum calculated.

(vi) A multivariate PLS calibration model was built using mean-centred X data contributing to the first 30% of the cumulative sum of the $\mathrm{var\,ivd}_j$ ratios, and the RMSECV was calculated. The process was repeated using the first 40%, 50% and so on at 10% intervals of the cumulative sum data and the model with the lowest RMSECV value was chosen as optimal.

Multivariate calibration.

In order to ascertain the effectiveness of the variable reduction procedure, multivariate calibration models were prepared using three different data sets and compared. The first data set comprised the reduced variables only; the second, the unreduced spectrum (i.e. all 5684 wavelengths); and the third was prepared using the more traditional method of choosing 166 individual spectral lines representing the most intense analyte and matrix lines in the spectrum from which gross integrated intensities were then calculated. Data pretreatment methods of mean centering and autoscaling were also compared.

Real industrial samples.

Selection of samples using principal components analysis.

In order to perform any calibration experiment suitable calibration samples must be used. In a situation where structured experimental design can be used this is a simple matter of deciding which design to choose (e.g. factorial, partial factorial or orthogonal array etc.). If the samples under examination are very complex (in this case > 40 elements), or if the elemental content of future samples is known to vary considerably, the remaining alternative is to use ‘historical data’ in a similar manner to that used for process analysis. However, for this to give satisfactory predictions, three criteria must be met: there must be sufficient samples from which to select a suitable calibration subset; the elemental composition of this subset must be similar; and the analyte concentrations used in the calibration should be accurate. It would be preferable to use certified concentration values, but because these are seldom available in an industrial context an alternative is to use concentration values that have been determined by an independent method. In this case the samples were independently analysed by a number of referee laboratories using ICP-AES after a NiS fire assay sample preparation procedure.

Fig. 1 Flowchart outlining the UVE-PLS algorithm.

The emission data from the peroxide fusion samples were first autoscaled to ensure that each variable was considered equally important. The samples were then divided into 3 subsets based upon their analyte (Au, Ag or Pd) content and subjected to PCA and the Hotelling $T^2$ test. The degree of confidence was chosen to be 95% (i.e. $\alpha = 0.05$; a one-sided t-test was used). After each PCA model was built the $T^2$ statistic was calculated and the actual number of outlying samples was then compared to the ‘expected’ number according to $\alpha \times m$ (where $m$ is the number of calibration samples). If the number of outlying samples exceeded $\alpha \times m$ all outlying samples were deleted and the remaining samples were used as a calibration set. If the number of outlying samples was $< \alpha \times m$ then these were deleted and a further PCA model was built using this new data set and the Hotelling $T^2$ test repeated until the number of outlying samples was statistically acceptable. This method of identifying ‘suspect’ samples can result in the loss of potentially informative samples; however, when faced with samples in their thousands, a non-statistical method of selection is impractical. To select the calibration and validation samples, all samples were ranked in ascending order of analyte concentration. Validation samples were then selected at regular intervals, the size of the interval depending upon the total number of samples. This ensured that both the calibration and validation data set contained samples that were representative of the analyte concentrations available.
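The ranking-based split into calibration and validation samples can be sketched as follows. The function name `split_by_rank`, the interval of 3 and the concentrations are hypothetical; the paper states only that the interval depended on the total number of samples.

```python
def split_by_rank(concentrations, interval):
    """Return (calibration, validation) index lists.

    Samples are ranked by ascending analyte concentration and every
    `interval`-th ranked sample is assigned to the validation set.
    """
    ranked = sorted(range(len(concentrations)), key=lambda i: concentrations[i])
    validation = ranked[interval - 1::interval]
    held_out = set(validation)
    calibration = [i for i in ranked if i not in held_out]
    return calibration, validation

conc = [12.0, 3.0, 45.0, 7.0, 30.0, 18.0, 1.0, 25.0, 9.0]  # hypothetical
cal, val = split_by_rank(conc, 3)
```

Because the validation samples are taken at regular positions in the ranked list, both sets span the available concentration range rather than clustering at one end.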

The UVE-PLS and IVD-PLS routines were applied and multivariate calibration models built in the same way as for the synthetic solutions.

Estimation of errors.

Fig. 2 Flowchart outlining the IVD-PLS algorithm.

The overall prediction efficacy, and assessment of the model's capability to accommodate the calibration data itself, were compared using the relative root mean square error (RRMSE), defined in eqn. (8), which gives a general estimate of the error of prediction for concentrations of an element in the range of samples used:

$$\mathrm{RRMSE}(\%) = 100 \times \frac{1}{\mathrm{mean}(y)} \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \qquad (8)$$

where $y_i$ is the known concentration, $\hat{y}_i$ is the predicted concentration, and $n$ is the number of experiments. The ultimate assessment of future prediction is the application of the calibration models to independent test data.
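Eqn. (8) translates directly into a few lines of code. The sketch below assumes that mean(y) refers to the mean of the known concentrations; the function name `rrmse_percent` and the data are hypothetical.

```python
import math

def rrmse_percent(y_true, y_pred):
    """Relative root mean square error in percent, following eqn. (8)."""
    n = len(y_true)
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / (sum(y_true) / n)

y_true = [10.0, 20.0, 30.0, 40.0]   # hypothetical known concentrations
y_pred = [11.0, 19.0, 31.0, 39.0]   # hypothetical predictions
value = rrmse_percent(y_true, y_pred)
```

Every prediction here is off by one unit against a mean concentration of 25, giving an RRMSE of 4%. Dividing by the mean makes the figure comparable between analytes with different concentration ranges.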

Test sample confidence intervals.

Central to the variable reduction routine is the estimation of b coefficient uncertainty. This uncertainty can be projected onto the test samples in order to define symmetric confidence intervals. This was done using a leave-one-out cross-validation approach, so that the variation of the calibration models was used to estimate the variation in predictions for an independent sample. This was done $n$ times ($n$ = number of calibration samples) and the usual jackknife formula used to estimate the standard error of the prediction. Test sample confidence intervals were then based on TSP ± ($t_{(\alpha = 0.05/2,\ \mathrm{DF} = n)}$ × se(TSP)) (TSP = test sample prediction), which corresponded to a confidence interval of 95% (i.e. $y$ ± 2 standard deviations). This process is illustrated by the flow diagram in Fig. 3.
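A minimal sketch of the interval calculation is given below. It assumes that the n leave-one-out models have each produced a prediction for the same test sample, and that the test sample prediction is taken as the mean of those predictions; both the function name `jackknife_interval` and that choice of centre are assumptions, and the critical t value is supplied by the caller.

```python
import math

def jackknife_interval(loo_predictions, t_crit):
    """Return (lower, prediction, upper) for one test sample.

    loo_predictions holds the predictions from the n leave-one-out
    calibration models; the standard error uses the jackknife formula.
    """
    n = len(loo_predictions)
    tsp = sum(loo_predictions) / n
    se = math.sqrt((n - 1) / n * sum((p - tsp) ** 2 for p in loo_predictions))
    half_width = t_crit * se
    return tsp - half_width, tsp, tsp + half_width

preds = [9.8, 10.0, 10.2, 10.0, 10.0]   # hypothetical predictions
lower, centre, upper = jackknife_interval(preds, t_crit=2.0)
```

The interval is symmetric about the prediction, and its width grows with the spread of the leave-one-out predictions.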

Results and discussion

Synthetic samples

The UVE and IVD algorithms were applied to the full raw spectral data matrix obtained from the ICP-AES instrument and multivariate calibration models built for the prediction of Pt, Pd and Rh in synthetic test solutions.

Fig. 3 Flowchart outlining the jackknife confidence interval algorithm.

J. Anal. At. Spectrom., 2002, 17, 800–812

Application of UVE-PLS and IVD-PLS algorithms.

Application of the variable reduction algorithms resulted in the deletion of the majority of the original 5684 spectral data points (Table 3). It is evident that the UVE-PLS algorithm had the largest effect, eliminating most of the spectral data. The IVD-PLS algorithm then reduced the remaining data further, depending on the number of LVs used. The effectiveness of the IVD-PLS algorithm is illustrated in Fig. 4, which shows the cumulative sum plot of the $\mathrm{var}_j^{\mathrm{ivd}}$ ratios and the RMSECV values. It is evident from Fig. 4 that a minimum RMSECV value of 0.521 was obtained when 933 wavelength data points were included in the model (using 6 LVs). However, an RMSECV value of approximately 0.525 was obtained by using only 99 variables, so it is possible to inspect these plots rather than simply use the global minimum given by IVD-PLS.

Table 3 Effect of applying the UVE-PLS and UVE-IVD-PLS algorithms to the original 5684 variables in the data matrix

No. of variables remaining after application of algorithms

              UVE-PLS               UVE-IVD-PLS
No. of LVs    Pt     Pd     Rh      Pt     Pd     Rh
6             375    933    168     108    99     35
8             643    982    334     47     110    99
10            754    984    376     123    138    55
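The trade-off noted for Pd with 6 LVs, between the global RMSECV minimum and a much smaller subset with almost the same error, can be written as a simple rule. The sketch below is an illustration only: the tolerance rule and the name `smallest_subset_within` are assumptions and are not part of the IVD-PLS algorithm, and the third data point is hypothetical.

```python
def smallest_subset_within(curve, rel_tol=0.01):
    """Return the (n_variables, rmsecv) pair with the fewest variables whose
    RMSECV lies within rel_tol (fractional) of the global minimum."""
    best = min(rmsecv for _, rmsecv in curve)
    limit = best * (1.0 + rel_tol)
    eligible = [(n_vars, rmsecv) for n_vars, rmsecv in curve if rmsecv <= limit]
    # Tuples compare on their first element, so min() picks the fewest variables
    return min(eligible)


curve = [
    (933, 0.521),  # global minimum quoted in the text
    (99, 0.525),   # much smaller subset quoted in the text
    (35, 0.560),   # hypothetical point, for illustration only
]
print(smallest_subset_within(curve))
```

A 1% tolerance is arbitrary; the point is only that a small, explicit tolerance makes the inspection of the plot reproducible.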


Multiple t -tests and type I errors.

By performing multiple t-tests, rejection of the null hypothesis ($b = 0$) when it is true will occur $\alpha n$ times, where $\alpha$ is the significance level and $n$ is the number of tests performed. However, where the number of variables remaining after UVE-PLS $> \alpha n$, subsequent modelling of the variable subset will result in "Type I error variables", which are dependent upon the orientation of all $b_j / s(b_j)$ values. This is due to the re-orientation of the LVs when the $b = 0$ variables are removed. Where the number of variables remaining after UVE-PLS $\le \alpha n$, it would appear that all variables are the result of Type I error, and are therefore non-informative. However, this fails to take into account the removal of the $b = 0$ variables, which again alters the LV orientation, and hence the $\bar{b}_j / s(b_j)$ value. This gives a new set of variables where the maximum number of Type I errors can only be the product of $\alpha$ and the number of variables remaining after UVE-PLS. If $n_U$ represents the number of variables remaining after UVE-PLS, then where $\alpha n \ll n_U$ subsequent modelling will not be dominated by Type I error variables. This is obviously dependent upon $\alpha$, hence smaller $\alpha$ values may be more beneficial.

Fig. 4 Plots of: (a) the cumulative sum of the ranked $\mathrm{var}_j^{\mathrm{ivd}}$ ratios versus individual spectral data points; and (b) RMSECV for Pd for each variable subset, using 6 principal components.

Additionally, because $\alpha$ refers to the average number of Type I errors, it is knowable only in terms of $\alpha \pm se$, where $se$ is its standard error. If the number of variables remaining after UVE-PLS is less than $\alpha n$ then the use of UVE-PLS followed by IVD-PLS may still result in low predictive errors, depending on the standard error associated with $\alpha$, which is dependent upon the distribution of $b$ at each wavelength.
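As a worked illustration of this comparison, the sketch below sets the expected number of Type I errors against the number of variables remaining after UVE-PLS with 6 LVs (Table 3). The significance level of 0.05 is an assumption made for the example, and the function name is illustrative.

```python
N_TESTS = 5684   # one t-test per wavelength data point
ALPHA = 0.05     # assumed significance level, for illustration only

# Variables remaining after UVE-PLS with 6 LVs (Table 3)
n_u = {"Pt": 375, "Pd": 933, "Rh": 168}

expected_type1 = ALPHA * N_TESTS  # average number of false rejections


def exceeds_expected_type1(n_remaining, alpha=ALPHA, n_tests=N_TESTS):
    """True when more variables survive UVE-PLS than chance alone would give."""
    return n_remaining > alpha * n_tests


print(f"expected Type I errors: {expected_type1:.1f}")
for element, remaining in n_u.items():
    # After removal, at most alpha * n_U Type I errors are expected in the subset
    bound_after_removal = ALPHA * remaining
    print(element, remaining, exceeds_expected_type1(remaining),
          round(bound_after_removal, 1))
```

Under this assumed significance level, the Pt and Pd subsets exceed the expected number of Type I errors, whereas the Rh subset falls below it, which is the situation addressed in the final paragraph above.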

Multivariate calibration for quantitative prediction.

In order to evaluate the usefulness of the variable selection algorithm for multivariate calibration and quantitative analysis, a series of independent test solutions, prepared with randomly chosen concentrations of the elements present in the calibration solutions, were analysed and the concentrations of Pt, Pd and Rh predicted using PLS calibration models. Results are summarised in Table 4. The effectiveness of the variable reduction algorithm in improving the quality of the PLS calibration models built using the data is illustrated by the RRMSE values obtained for the prediction of Pt, Pd and Rh concentration in the independent test solutions (Table 4), using 8 LVs and different methods of data scaling. In order to assess the effectiveness of the variable reduction routine, a PLS model was also built using the entire spectrum available with no variable selection. As can be seen from Table 4, there was a significant increase in the Pd prediction accuracy following variable reduction compared to using the unreduced spectral data, though in the case of using autoscaled data this improvement was not always great. The external RRMSE values obtained using the reduced spectrum and autoscaled data were 4.60, 3.20 and 1.65% for Pt, Pd and Rh respectively, compared to 12.6, 8.32 and 27.2% using the unreduced spectral data. The higher RRMSE values obtained when using the unreduced spectral data were probably due to the inclusion of too many uninformative variables (i.e. noise) in the model, which were deleted using the UVE-PLS and IVD-PLS algorithms. In contrast to infra-red and UV spectroscopy, where absorption bands are generally quite broad, atomic emission spectra are composed of many narrow emission lines, of the order of 0.01 nm width, and can be extremely complex if even a few matrix elements with line-rich spectra are present. Part of the emission spectrum containing several intense lines for Pd, made up of superimposed spectra of the calibration solutions, is shown in Fig. 5. Also shown are the regions of this spectrum (i.e. the wavelength data points) which were selected by the variable reduction algorithms for the prediction of Pd concentration. It is evident from Fig. 5 that the selected parts of the spectrum were often coincident with analyte lines for Pd (e.g. Pd 324.470 and Pd 340.458) and also with lines of known Pd interferents such as Ce, Zr and Pt. The PLS algorithm builds a model using linear combinations of variables for which variability is correlated to the analyte of interest, so it is to be expected that analyte interferents are also selected, as was the case here. On close examination of other parts of the spectrum, it was not obvious why particular spectral regions were selected by the algorithm (e.g. continuum background); however, it is quite possible that parts of the spectrum are correlated with non-spectroscopic matrix effects, such as suppression or enhancement of the emission signal, which are difficult to explain in any case. This highlights an extremely desirable aspect of this method of selecting variables from the raw spectral data, namely that it is an objective rather than a subjective method of selecting the most informative variables, so that prejudgements about the usefulness or otherwise of parts of the spectrum are not necessary. Bearing this in mind, it is interesting to compare the RRMSE values obtained when the calibration model was constructed using 166 individual spectral lines representing the most intense analyte and matrix lines in the spectrum (ref. 46). The variable reduction method resulted in an improvement compared to this approach, presumably because useful regions of the spectrum were not subjectively omitted from inclusion in the model, as they inevitably must be if preselected lines are used.

Table 4 Comparison of relative root mean square errors (%) obtained for PLS calibration models for Pt, Pd and Rh built using different data sets and data scaling options: ns, no scaling; as, autoscaling; mc, mean centering; n.d., not determined

                Pt                     Pd                     Rh
Data set        ns     as     mc      ns     as     mc      ns     as     mc
Full spectrum   99.4   12.6   102     87.1   8.32   80.2    101    27.2   101
UVE-PLS         12.3   7.43   15.1    1.76   4.05   1.99    1.59   2.33   1.10
UVE-IVD-PLS     5.46   4.60   5.54    2.11   3.20   1.83    1.61   1.65   1.38
166 lines       n.d.   5.19   n.d.    n.d.   7.06   n.d.    n.d.   3.18   n.d.

Fig. 5 A portion of the segmented emission spectrum obtained by superimposing the spectra of all the synthetic calibration samples. The vertical bars along the top indicate some of the wavelength data points selected by UVE-IVD-PLS for the prediction of Pd. Some major lines corresponding with the selected points are also indicated.

In order to avoid model under- or over-fitting, the number of LVs was heuristically selected on the basis of the known number of elements in the data set. It was known that 8 elements were present in the samples, so it was assumed that between 6 and 10 LVs would best model the system. It has been suggested that one should select the optimum number of LVs before variable selection has been carried out (ref. 3). However, this method can lead to the selection of a large number of LVs (in excess of 40), resulting in a grossly overfitted model, especially when line-rich spectra containing a large amount of noise are used. Selection of the optimum number of LVs is not always an easy task, especially if the method is to be applied to systems for which the exact chemical composition is unknown. In this case, the number of LVs was selected by comparing RMSECV values for models built with either 6, 8 or 10 LVs and choosing the lowest. A more objective method for selecting the number of LVs prior to variable elimination is currently being investigated, but as yet no satisfactory solution has been found.

A full set of results for the test solutions using autoscaled data is given in Table 5, and the predictions are compared to the actual concentrations in Fig. 6. The 95% confidence limits of prediction, obtained using the method described in the theory section, are also given in Table 5. The most striking feature of these results is that the prediction accuracy was well within acceptable limits down to a concentration of 1 µg ml⁻¹, but a positive bias in the prediction of zero concentration was evident. Further work is being undertaken to improve the prediction accuracy at low concentrations.
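The RRMSE quoted for the test solutions can be checked against the tabulated predictions. The sketch below assumes that the RRMSE is the root mean square error of prediction expressed as a percentage of the mean known concentration; this normalisation is an assumption made for the example, and the function name `rrmse_percent` is illustrative. The data are the actual and predicted Pt concentrations from Table 5.

```python
import math


def rrmse_percent(actual, predicted):
    """RMSE of prediction as a percentage of the mean known concentration
    (assumed definition, see the lead-in)."""
    n = len(actual)
    mse = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n
    return 100.0 * math.sqrt(mse) / (sum(actual) / n)


# Pt test solutions, Table 5 (autoscaled data, UVE-IVD-PLS)
pt_actual = [12, 16, 20, 12, 18, 6, 2, 40, 30, 0]
pt_predicted = [11.9, 17.4, 19.8, 12.2, 17.5, 6.7, 2.0, 39.0, 28.8, 0.0]

print(round(rrmse_percent(pt_actual, pt_predicted), 2))
```

With the rounded values from Table 5 this gives roughly 4.6%, close to the 4.60% quoted for Pt; normalising by the mean also keeps the measure defined when a test solution has zero concentration.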

Table 5 Concentrations and confidence intervals (C.I.) for the test solutions predicted using UVE-IVD-PLS with autoscaled data and the jackknife estimator

Pt                             Pd                             Rh
Actual   Predicted   C.I.      Actual   Predicted   C.I.      Actual   Predicted   C.I.
12       11.9        0.70      20       20.1        0.58      3        2.90        0.14
16       17.4        0.83      12       12.1        0.77      5        5.05        0.29
20       19.8        0.76      18       17.6        1.16      2        2.03        0.24
12       12.2        0.86      14       13.7        0.84      4        4.05        0.20
18       17.5        0.93      10       9.9         0.44      3        3.01        0.16
6        6.7         1.30      30       29.5        0.83      1        1.08        0.18
2        2.0         0.54      6        6.4         0.82      2        1.94        0.095
40       39.0        2.61      2        2.1         0.64      8        7.96        0.26
30       28.8        0.94      40       38.8        0.56      10       9.93        0.22
0        0.0         0.76      0        0.3         0.95      0        0.0691      0.12

Fig. 6 Relative errors (%) in the predicted concentrations of the independent test solutions as a function of actual concentration predicted using UVE-IVD-PLS of raw spectral data and data pretreatment: (a) Pt; (b) Pd; (c) Rh. Data pretreatment prior to building the PLS model are compared: ns = no scaling; as = autoscaling; mc = mean centering.

Real industrial samples

Selection of samples using PCA.

Plots of the first 2 principal components (PCs) obtained after the application of PCA on the data subsets are shown in Fig. 7 for Au, Ag and Pd. Also shown is the Hotelling $T^2$ confidence ellipse at the 95% confidence interval. The amount of variance explained by the first 2 PCs, in the first PCA model, for Au, Ag and Pd was ~37%, ~42% and 46% respectively. Despite the majority of the samples being contained within the confidence ellipse, a large spread was evident within the data for all analytes, indicating that there were large differences in the chemical composition of the samples. The final number of samples retained for Au, Ag and Pd was 59 (59%), 48 (59%) and 36 (60%) respectively, with the remainder of the samples being classified as outliers. In order to test the method, a number of samples were removed from these data sets to act as independent test samples, resulting in 47, 38 and 28 for training, and 12, 10 and 8 for testing for Au, Ag and Pd respectively.
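Screening samples against a Hotelling $T^2$ ellipse on the first 2 PCs can be sketched as below. This is a minimal illustration under stated assumptions and not the authors' procedure: the scores are hypothetical, the limit uses one common form, $T^2_{lim} = A(n-1)/(n-A) \cdot F_{\alpha}(A, n-A)$ with $A = 2$, and the F critical value is obtained from a closed form that holds only for 2 numerator degrees of freedom. All function names are illustrative.

```python
def f_crit_two_numerator_df(df2, alpha=0.05):
    """Upper-alpha critical value of F(2, df2); closed form valid only for
    2 numerator degrees of freedom."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def hotelling_t2_outliers(scores, alpha=0.05):
    """Flag samples whose T^2 on two PC scores exceeds the confidence limit.

    scores: list of (pc1, pc2) pairs. PC scores are uncorrelated, so T^2 is
    the sum of squared scores scaled by each component's variance.
    """
    n = len(scores)
    n_pcs = 2
    means = [sum(s[k] for s in scores) / n for k in range(n_pcs)]
    variances = [sum((s[k] - means[k]) ** 2 for s in scores) / (n - 1)
                 for k in range(n_pcs)]
    limit = n_pcs * (n - 1) / (n - n_pcs) * f_crit_two_numerator_df(n - n_pcs, alpha)
    t2 = [sum((s[k] - means[k]) ** 2 / variances[k] for k in range(n_pcs))
          for s in scores]
    outliers = [i for i, value in enumerate(t2) if value > limit]
    return outliers, limit


# Hypothetical scores: 20 tightly grouped samples plus one distant sample
scores = [(1, 1), (-1, -1), (1, -1), (-1, 1)] * 5 + [(6, 0)]
outliers, limit = hotelling_t2_outliers(scores)
print(outliers, round(limit, 2))
```

Because the flagged samples inflate the score variances, such a screen is often repeated after removing outliers and refitting the PCA model; whether and how that was done here is not specified beyond the mention of a first PCA model.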

Fig. 7 Scores plot (first 2 PCs) for: (a) Au; (b) Ag; (c) Pd (95% confidence ellipse) showing a statistically acceptable number of outliers.

Application of UVE-PLS and IVD-PLS algorithms.

The UVE and IVD algorithms were applied to the available segmented spectrum obtained from the ICP-AES instrument and multivariate calibration models built for the prediction of Au, Ag and Pd in complex industrial fusion samples. The general efficacy of the variable reduction algorithms is discussed, followed by a comparison of the results obtained for the multivariate calibration and prediction of Au, Ag and Pd.

Before the UVE-PLS algorithm can be applied, one must enter the optimum number of LVs. Because the fusion digest data contained far less 'noise' than the available spectrum used previously with the synthetic solutions, the optimum number of LVs was found by using the RMSECV value. This was plotted for increasing numbers of LVs, and 8 LVs were chosen for Au, Ag and Pd, resulting in RMSECV values of ~0.4. Application of the variable reduction algorithms resulted in the deletion of the majority of the original 2268 spectral data points (Table 6). The UVE-PLS algorithm had the largest effect, eliminating 92–99% of the spectral data. The IVD-PLS algorithm then reduced the remaining data by 48–72%, depending on the analyte.

Table 6 Effect of applying the UVE-PLS and UVE-IVD-PLS algorithms to the original 2268 variables in the data matrix for Au, Ag and Pd calibration

No. of variables remaining after application of algorithms

              UVE-PLS              UVE-IVD-PLS
No. of LVs    Au     Ag     Pd     Au     Ag     Pd
8             189    109    25     52     40     13

The emission spectrum for all superimposed calibration solutions for Ag is shown in Fig. 8. Also shown are the regions of this spectrum (i.e. the wavelength data points) which were selected by the variable reduction algorithms for the prediction of Ag. Because of the relatively few lines chosen (Fig. 8) it was not obvious why particular spectral regions were selected by the algorithm. PLS looks for linear combinations of variables for which variability is correlated to the analyte of interest, so it is to be expected that interfering elements are also selected, e.g. Ce, Fe and Mn. The minimum RMSECV value of 0.025 was obtained when only 30 wavelength data points were included in the model, and a similar cumulative sum plot of the $\mathrm{var}_j^{\mathrm{ivd}}$ ratios and the RMSECV values to that given in Fig. 4 was obtained.

Fig. 8 Segmented emission spectrum obtained by superimposing emission spectra of all the fusion calibration samples used for the prediction of Ag. The vertical bars along the top indicate the wavelength data points selected by the UVE-IVD-PLS algorithm. Some major lines corresponding with the selected points are also indicated.

Multivariate calibration and quantitative prediction of test samples.

Summary data for the prediction of the concentrations of Au, Ag and Pd in the fusion samples are shown in Table 7. In order to assess the effectiveness of the variable reduction routine, a PLS model was also built using the entire spectrum available with no variable selection and autoscaled or mean centred data (Table 7). For all analytes there was a significant increase in the accuracy of the predictions following variable selection. Only 52 variables out of 2268 were required to obtain an RRMSE value of 23% for Au in the test set, compared to 38% for the full spectrum. For Ag and Pd test data the RRMSE values were 29 and 9.1% respectively, compared to 82 and 195% using the full spectrum.

Table 7 RRMSE values for prediction of Au, Ag and Pd in the test samples using both the reduced spectrum and the full spectrum and PLS (n.d., not determined)

                Au                    Ag                    Pd
Data set        ns     as     mc     ns     as     mc     ns     as     mc
Full spectrum   n.d.   38     142    n.d.   82     59     n.d.   195    140
UVE-PLS         12     22     22     10     66     13     9.2    13     9.3
UVE-IVD-PLS     52     23     120    26     29     21     9.4    9.1    9.9

Comparison of methods.

In the following discussion three sets of data, obtained for samples treated in three different ways, are compared as follows:

1. Samples were prepared using a NiS fire assay method and analysed using ICP-AES by univariate calibration, designated FA-UC. These were the consensus values, which were used as in-house standards.

2. Samples were prepared using the peroxide fusion digestion method and analysed by ICP-AES using the variable reduction routine and multivariate calibration by PLS, designated F-VR-PLS.

3. Samples were prepared using the peroxide fusion digestion method and analysed by ICP-AES using univariate calibration and interelement correction, designated F-UC-IEC.

Plots of concentration obtained using the FA-UC method (consensus values) versus concentration predicted using the UVE-IVD-PLS method are shown in Fig. 9 for Au, Ag and Pd in the test samples. Also shown are the 95% confidence intervals.

Predicted concentrations in the independent test samples obtained using the three methods, and relative percentage errors for the F-VR-PLS and F-UC-IEC methods compared to the FA-UC method (consensus values), are shown in Tables 8–10 for Au, Ag and Pd, respectively.

Fig. 9 Concentration of precious metals in fusion samples obtained using the FA-UC method (consensus concentration), versus concentration predicted using UVE-IVD-PLS of autoscaled spectra. Error bars represent the 95% confidence intervals calculated using the jackknife estimator. The dotted line indicates a gradient of 1 passing through the origin.

Table 8 Comparison of methods for the determination of Au: FA-UC (fire assay and univariate calibration); F-VR-PLS (fusion and UVE-IVD-PLS with autoscaled data); F-UC-IEC (fusion and univariate calibration with interelement correction)

Au concentration/µg ml⁻¹              % error relative to FA-UC method
FA-UC     F-VR-PLS    F-UC-IEC        F-VR-PLS    F-UC-IEC
0.88      −0.67       0.95            −176.7      8.57
1.66      8.71        1.92            424.5       15.66
2.55      −3.98       2.65            −256.2      4.13
6.05      8.48        6.49            40.3        7.36
14.10     21.35       14.29           51.4        1.35
19.89     14.23       21.02           −28.5       5.68
27.84     17.63       28.16           −36.7       1.15
33.80     31.14       33.16           −7.9        −1.89
39.45     39.92       40.48           1.2         2.62
47.84     41.08       49.13           −14.1       2.71
87.06     72.65       0.09            −16.5       −99.90
129.52    114.90      0.02            −11.3       −99.98

Table 9 Comparison of methods for the determination of Ag: FA-UC (fire assay and univariate calibration); F-VR-PLS (fusion and UVE-IVD-PLS with autoscaled data); F-UC-IEC (fusion and univariate calibration with interelement correction)

Ag concentration/µg ml⁻¹              % error relative to FA-UC method
FA-UC     F-VR-PLS    F-UC-IEC        F-VR-PLS    F-UC-IEC
0.01      −5.169      0.047           −51790.0    370.00
2.43      5.334       1.99            119.5       −18.11
4.33      5.42        4.44            25.2        2.54
10.79     −7.824      11.59           −172.5      7.41
20.44     28.437      20.39           39.1        −0.24
22.27     25.232      24.32           13.3        9.21
28.43     27.51       27.27           −3.2        −4.08
35.73     41.514      38.23           16.2        7.00
41.17     39.921      44.44           −3.0        7.94
57.51     55.081      61.12           −4.2        6.28

Table 10 Comparison of methods for the determination of Pd

Pd concentration/µg ml⁻¹     % error relative to FA-UC method
F-VR-PLS    F-UC-IEC         F-UC-IEC
0.059       0.0126           −16.00
0.071       0.0944           4.89
0.657       0.507            0.40
0.845       1.11             2.78
1.594       1.63             4.49
1.628       2.19             4.29
3.745       3.32             −3.49
7.844       8.15             6.12

Taking Au first, the RRMSE was 132% using F-UC-IEC compared to 23% using F-VR-PLS, indicating a better overall level of accuracy for the latter. As can be seen from Table 8, in some cases the former method yielded better accuracy, and in some cases the latter; however, it should be noted that in both cases the results are heavily influenced by individual samples with large errors. In the case of Ag, RRMSEs were 8.5% and 29% for the F-UC-IEC and F-VR-PLS methods respectively. It should be noted that, in the latter case, a large error was obtained for the lowest concentration sample (Table 9), for which the consensus value obtained using the FA-UC method was only 0.01 µg ml⁻¹. For Pd the RRMSE was 8.6% using F-UC-IEC compared to 9.1% using F-VR-PLS, with the latter method again being heavily influenced by the lowest concentration sample (Table 10).

The prediction errors were more pronounced at the lowest concentrations when using F-VR-PLS, indicating that the method was capable of correcting for interferences but that there were insufficient calibration samples, particularly at low concentration. Overall, a comparable level of accuracy was obtained using the UVE-IVD-PLS method compared to a traditional univariate calibration with interelement correction. This is extremely encouraging, bearing in mind that these were quite complex industrial samples and that only 28–47 samples were used for calibration. It is envisaged that considerable improvements could be realised if the number of calibration samples was increased to several hundred, thereby covering more of the factor space. Such a database could be built up over time to cover an ever widening range of sample matrices and should at least allow semiquantitative analysis without the need to select individual analytical lines, background correction procedures, or interferences.
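The percentage errors relative to the FA-UC consensus values, which show how single samples dominate the overall RRMSE, follow from a one-line calculation. The sketch below applies it to three rows taken from Table 8 (Au); the function name is illustrative, and small differences from the tabulated errors are to be expected because the tabulated concentrations are rounded.

```python
def percent_error(predicted, reference):
    """Signed error of a prediction as a percentage of the reference value."""
    return 100.0 * (predicted - reference) / reference


# (FA-UC, F-VR-PLS, F-UC-IEC) Au concentrations in ug/ml, from Table 8
rows = [
    (0.88, -0.67, 0.95),
    (47.84, 41.08, 49.13),
    (87.06, 72.65, 0.09),
]

for fa_uc, f_vr_pls, f_uc_iec in rows:
    print(fa_uc,
          round(percent_error(f_vr_pls, fa_uc), 1),
          round(percent_error(f_uc_iec, fa_uc), 2))
```

The first row shows the large relative error of F-VR-PLS at the lowest concentration, and the last row shows the near-total failure of F-UC-IEC for one high-concentration sample.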

Conclusions

The application of variable elimination and selection algorithms has shown that it is possible to use the complete available ICP-AES emission spectrum for multivariate modelling without having to resort to line selection or the need to assign background correction points in order to obtain the net analyte signal. One of the benefits of this approach is that it results in selection of parts of the spectrum, such as continuum background, which would be regarded as uninformative by subjective human analysis, but which can be highly informative to a bilinear modelling technique such as PLS. The method is computationally simple, includes significance tests of model parameters and allows the calculation of confidence intervals for test data. Multivariate calibration models built using the reduced spectrum exhibited great improvement compared to those built using all 5684 wavelengths, and also exhibited improvement compared with models built using 166 integrated line intensities. Hence, it was not necessary to pre-select individual atomic emission lines, so that expert knowledge of the chemical system and the emission spectrum was no longer required. The method is currently being used to build multivariate calibration models using data obtained for the analysis of real complex samples, and further results will be presented demonstrating its application in this area.

Spectral data obtained for the analysis of fusion digests have been used to build multivariate calibration models using PLS to predict the concentration of Au, Ag and Pd in test samples. In order to achieve this, variable elimination and selection algorithms were used to select the informative parts of the ICP-AES emission spectra without having to resort to line selection or the need to assign background correction points in order to obtain the net integrated line intensities. The model errors for both the calibration and independent test data have shown considerable improvement compared to the errors achieved when using all 2268 wavelengths, thus reinforcing the fact that PLS benefits from selective variable reduction. The variable selection method and PLS multivariate calibration gave results comparable to those obtained using a more traditional univariate calibration approach with interelement correction. Calibration models were built using 47, 38 and 28 samples for Au, Ag and Pd respectively; hence, it is envisaged that an improvement in the accuracy of prediction would be obtained if more samples were used to build the model.

Acknowledgement

The authors are grateful to Johnson Matthey and the Department of Trade and Industry for funding this project and for the valuable discussions and help with the experimental section; to the University of Plymouth for the PhD studentship enabling Mike Griffiths to carry out this work; and to John Kalivas at Idaho State University for help in developing the variable reduction algorithms.

References

1 V. Centner and D.-L. Massart, Anal. Chem., 1996, 68, 3851–3858.
2 S. D. Osborne, R. B. Jordan and R. Kunnemeyer, Analyst, 1997, 122, 1531–1537.
3 F. Westad and H. Martens, J. Near Infrared Spectrosc., 2000, 8, 117–124.
4 A. D. Shaw, A. D. Camillo, G. Vlahov, A. Jones, G. Bianchi, J. Rowland and D. B. Kell, Anal. Chim. Acta, 1997, 348, 357.
5 D. F. Wirsz and M. W. Blades, J. Anal. At. Spectrom., 1988, 3, 363–373.
6 J. A. Morales, E. H. v. Veen and M. T. C. de Loos-Vollebregt, Spectrochim. Acta, 1998, 53B, 683–697.
7 E. H. v. Veen, S. Bosch and M. T. C. de Loos-Vollebregt, Spectrochim. Acta, 1997, 52B, 321–337.
8 L. Soudier and J. M. Mermet, J. Appl. Spectrosc., 1995, 49, 1478–1484.
9 C. H. Spiegelman, M. McShane, M. J. Goetz, M. Motamedi, Q. L. Yue and G. L. Cote, Anal. Chem., 1998, 70, 35–44.
10 A. S. Bangalore, R. E. Schaffer, G. W. Small and M. A. Arnold, Anal. Chem., 1996, 68, 4200.
11 P. J. Brown, C. H. Spiegelman and M. C. Denham, Philos. Trans. R. Soc. London, 1991, 337, 311.
12 P. J. Brown, J. Chemom., 1992, 6, 151.
13 P. J. Brown, J. Chemom., 1993, 7, 225.
14 M. J. Goetz, C. Spiegelman, G. Cote and M. Motamedi, Technical Report No. 226, Dept. of Statistics, Texas A & M University, College Station, Texas, 1995.
15 U. Horchner and J. H. Kalivas, Anal. Chim. Acta, 1995, 311, 1.
16 D. Jouan-Rimbaud, B. Walczak, D. L. Massart, I. R. Last and K. A. Prebble, Anal. Chim. Acta, 1995, 304, 285.
17 D. Jouan-Rimbaud, D. L. Massart, R. Leardi and O. E. DeNoord, Anal. Chem., 1995, 67, 4925.
18 J. H. Kalivas, N. Roberts and J. M. Sutter, Anal. Chem., 1989, 68, 2024.
19 Y. Liang, Y. Xie and R. Yu, Anal. Chim. Acta, 1989, 222, 347.
20 C. B. Lucasius and G. Kateman, Trends Anal. Chem., 1991, 10, 254.
21 C. B. Lucasius, M. L. M. Beckers and G. Kateman, Anal. Chim. Acta, 1994, 286, 135.
22 F. Navarro-Villoslada, L. V. Perez-Arribas, M. E. Leon-Gonzalez and L. M. Polo-Diez, Anal. Chim. Acta, 1995, 313, 93.
23 K. Sasaki, S. Kawata and S. Minami, J. Appl. Spectrosc., 1986, 40, 185.
24 P. C. Thijssen, L. J. P. Vogels, H. C. Smit and G. Kateman, Fresenius' J. Anal. Chem., 1985, 320, 531.
25 D. Broadhurst, R. Goodacre, A. Jones, J. J. Rowland and D. B. Kell, Anal. Chim. Acta, 1997, 348, 71–86.
26 E. V. Thomas and D. M. Haaland, Anal. Chem., 1990, 62, 1091–1099.
27 F. R. Burden, R. G. Brereton and P. T. Walsh, Analyst, 1997, 122, 1015–1022.
28 R. Leardi and A. L. Gonzalez, Chemom. Intell. Lab. Syst., 1998, 41, 195–207.
29 V. Centner, D. L. Massart, O. E. D. Noord, S. D. Jong, B. M. Vandeginste and C. Sterna, Anal. Chem., 1996, 68, 3851.
30 M. J. McShane, G. L. Cote and C. Spiegelman, J. Appl. Spectrosc., 1997, 51, 1559.
31 C. H. Spiegelman, M. J. McShane, M. J. Goetz, M. Motamedi, Q. L. Yue and G. L. Cote, Anal. Chem., 1998, 70, 35.
32 H. Martens and T. Naes, Multivariate Calibration, Wiley, Chichester, 1989.
33 I. Frank, Chemom. Intell. Lab. Syst., 1987, 1, 233–242.
34 F. Lindgren, P. Geladi, S. Rannar and S. Wold, J. Chemom., 1994, 8, 349–363.
35 F. Lindgren, P. Geladi, S. Rannar and S. Wold, J. Chemom., 1995, 9, 331–342.
36 G. A. Bakken, T. P. Houghton and J. H. Kalivas, Chemom. Intell. Lab. Syst., 1999, 45, 225–239.
37 J. Neter, W. Wasserman and M. H. Kutner, Applied Linear Statistical Models, Irwin, Homewood, Illinois, 3rd edn., 1990.
38 N. M. Faber, M. J. Meinders, P. Geladi, M. Sjostrom and L. M. C. Buydens, Anal. Chim. Acta, 1995, 304, 257–271.
39 N. M. Faber, Anal. Chem., 2000, 72, 4675.
40 A. G. Frenich, D. Jouan-Rimbaud, D. L. Massart, S. Kuttatharmmakul, M. M. Galera and J. L. M. Vidal, Analyst, 1995, 120, 2787–2792.
41 M. Quenouille, J. R. Statistical Soc. B, 1949, 11, 18–84.
42 H. Martens, M. Hoy, F. Westad, D. Folkenberg and M. Martens, Chemom. Intell. Lab. Syst., 2001, 58, 151–170.
43 J. Shao and D. Tu, The Jackknife and Bootstrap, Springer, Berlin, 1995.
44 T. Y. Chen and C. Y. Lin, Finite Elements in Analysis and Design, 2000, 36, 1–16.
45 N. J. A. Sloane, http://www.research.att.com/~njas/oadir
46 M. Griffiths, D. Svozil, P. Worsfold, S. Denham and E. H. Evans, J. Anal. At. Spectrom., 2000, 15, 967–972.