and “c”

CALIBRATION
Prof. Dr. Cevdet Demir, cevdet@uludag.edu.tr

LINKING TWO SETS OF DATA TOGETHER
• Peak height to concentration
• Spectra to concentrations
• Taste to chemical constituents
• Biological activity to structure
• Biological classification to chromatographic peak areas

NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER, e.g. concentration or biological classification.
WE TAKE SOME MEASUREMENTS, e.g. spectra or chromatograms.
WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER.

UNIVARIATE CALIBRATION
One measurement, e.g. a peak height.

MULTIVARIATE CALIBRATION
Several measurements, e.g. spectra.

NOTATION
• The “x” block is the measured data, e.g. spectra, chromatograms, GCMS of a biological extract, structural parameters.
• The “c” block is what we are trying to predict, e.g. concentration, species, acceptability of a product, taste.

[Figure: notation compared. Experimental design: X is the independent variable (e.g. concentration) and Y is the response (e.g. spectroscopic). Calibration: X is the measurement (e.g. spectroscopic) and C is the predicted parameter (e.g. concentration).]

MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY
• Single component. Example: concentration of chlorophyll a by UV/vis spectra.
• Mixture of components, all compounds known. Example: mixture of pharmaceuticals, all pure compounds known.
• Mixture of components, only some compounds known. Example: coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known.
• Statistical parameters. Example: protein in wheat by NIR spectroscopy.

UNIVARIATE CALIBRATION
The “x” and “c” blocks consist of single measurements. Traditional analytical chemistry.

CLASSICAL CALIBRATION
$x \approx c \cdot s$
Unknown: $s$
$s \approx c^{+} \cdot x$, where $c^{+}$ is the pseudo-inverse of $c$.
[Figure: block diagram of $x = c \cdot s$.]

TREATMENT OF ERRORS IN CLASSICAL CALIBRATION
[Figure: x plotted against c.]

PROBLEMS
1. Modern lab: dilution and sample preparation errors (in “c”) are probably bigger than spectroscopic errors (in “x”). Spectra are more reproducible. This differs from classical statistics.
2. We want to predict concentration from spectra etc.,
not vice versa. Most classical textbooks in analytical chemistry and most spreadsheets incorrectly recommend classical calibration.

INVERSE CALIBRATION
$c \approx x \cdot b$
Unknown: $b$
$b \approx x^{+} \cdot c$
[Figure: block diagram of $c = x \cdot b$.]

COMPARING FORWARD AND INVERSE CALIBRATION
[Figure: classical and inverse calibration lines drawn on the same plot.]

INCLUDING THE INTERCEPT: the first column of X is a column of 1s
$c \approx b_0 + b_1 x$
$c \approx X \cdot b$
$b \approx X^{+} \cdot c$
[Figure: block diagram of $c = X \cdot b$.]

HOW WELL IS THE MODEL PREDICTED?
Huge number of approaches.
• Root mean square error. Divide by the degrees of freedom $d$: the number of samples minus 1 or 2, according to the number of parameters in the model.
  $E = \sqrt{\sum_{i=1}^{I} (x_i - \hat{x}_i)^2 / d}$
  Often expressed as a percentage of either the mean measurement or the standard deviation of the measurements.
• Correlation coefficient of predicted versus true: has problems if the number of samples is small.
• ANOVA and replicate analysis using lack-of-fit error, as discussed in the experimental design lectures.
• Leaving samples out and predicting them: cross-validation and testing will be discussed later.

PROBLEMS
• Outliers can be a major difficulty. Graphical ways of looking for outliers are a big area.
• Outliers have undue influence on least squares models.

MULTIWAVELENGTH
Example: four compounds, four wavelengths.

MULTIPLE LINEAR REGRESSION (MLR)
$C \approx X \cdot B$
Know
• X: a series of spectra
• C: concentrations

WAYS OF PERFORMING THE CALIBRATION
1. Producing a series of mixture spectra of known concentrations by weighing different amounts and adding them together.
2. Taking a series of spectra and calibrating against an independent method, e.g. HPLC.

[Figure: spectra plotted over an axis running from 220 to 400.]

EXAMPLE: UV/VIS OF PAHs AT 4 WAVELENGTHS, NO WAVELENGTH IS UNIQUE
$B = X^{+} \cdot C$
estimated [pyrene] = $-3.870\,A_{330} + 8.609\,A_{335} - 5.098\,A_{340} + 1.848\,A_{345}$

Can also use classical methods:
$\hat{C} = X \cdot S^{+}$
This can be done from knowledge of the pure spectra. It differs from calibration, where a series of mixtures is recorded.

MULTIPLE LINEAR REGRESSION
• Why use only 4 wavelengths?
• Why not 10 or 100 wavelengths?
More wavelengths give more information, and the choice of wavelengths is no longer arbitrary.
• The number of wavelengths can be greater than the number of compounds.

[Figure: block diagram of $C = X \cdot B$.]

Example
• 25 spectra
• 10 compounds
• 100 wavelengths

$B = X^{+} \cdot C$

In this case
• B is a matrix of coefficients, 100 × 10
• X is a spectral matrix, 25 × 100
• C is a concentration matrix, 25 × 10

There are some technical problems using inverse calibration in this case, and often it does not work: with 100 wavelengths and only 25 spectra there are more coefficients per compound than samples, so the least squares solution is not unique.

Better approach
1. First predict the spectra S.
   • Either they are known from the calibration of the pure standards,
   • or they can be predicted from the mixture spectra: $S \approx C^{+} \cdot X$
2. Then use these predictions in a model (e.g. of unknowns): $C \approx X \cdot S^{+}$

MLR effectively models a spectrum as a sum of the spectra of the components, e.g. for a 3 component model:
Observed spectrum = conc A × spectrum A + conc B × spectrum B + conc C × spectrum C

ENHANCEMENTS
• Selecting only certain variables, not all the wavelengths.
• Weighting of variables.

ERROR ANALYSIS
This now becomes more sophisticated. In addition to errors in the “c” block (concentration errors), there are now also errors in the “x” block (reconstruction of spectra). Discussed later.

LIMITATIONS AND PROBLEMS WITH MLR
• The number of experiments and the number of wavelengths must never be less than the number of compounds.
• All significant compounds must be known. If there are still unknowns, these are mixed up with the knowns. There are problems if there are no pure standards and no reliable reference method. THIS IS THE BIGGEST LIMITATION.
• Sometimes extra wavelengths can be bad ones, e.g. noise or background.
• Assumes that concentrations are perfectly known, with errors in only one variable, when using the classical approach.
However, if information on all the significant compounds is known, then MLR is a simple and effective method.

PRINCIPAL COMPONENTS REGRESSION (PCR)
Do not need to know all components in advance, simply "how many components", and the compounds of interest. Overcomes a major limitation of MLR.

[Figure: PCR schematic. PCA decomposes X (samples × detector channels, e.g. wavelengths) into scores T and loadings P; regression then relates T to the concentration vector c through r.]
The first step is to perform PCA. Obtain a scores matrix T, retaining A components. The value of A may be a guess of the number of compounds in the mixture.

Then
$r = T^{+} \cdot c$

Can extend to more than one concentration:
$C \approx T \cdot R$
[Figure: block diagram of $C = T \cdot R$.]

Example
25 spectra taken at 100 wavelengths.
We know about and want to predict 4 compounds.
We think there are around 10 compounds in the mixture; 6 are unknown.
• T is a matrix of dimensions 25 × 10
• C is a matrix of dimensions 25 × 4
• R is a matrix of dimensions 10 × 4

Example of the calculation of the concentration of pyrene in a set of 25 UV/vis spectra containing 10 different PAHs.

How many PCA components to use? The prediction gets better as the number of components increases.

ERRORS: “x” block
Simply as in PCA, look at the eigenvalues as more principal components are calculated.
[Figure: “x” block error against number of principal components (1 to 15), logarithmic vertical axis.]

ERRORS: “c” block
Look at the errors in the calculation of concentrations; these often show different behaviour.
[Figure: “c” block error against number of principal components (1 to 15), logarithmic vertical axis.]

[Figure: predictions for pyrene concentration using 1, 5 and 10 principal components. Three panels, each plotting predicted concentration against observed concentration from 0 to 0.8.]

Why not use a large number of PCA components? Then one can get perfect prediction?
FALLACY: the idea is to predict unknowns, after the knowns have been modelled. Later PCs often model noise.
Choose the number of PCs equal to the number of compounds in the mixture? Methods for determining the number of PCs when this is unknown are described later.

Advantage over MLR: only partial knowledge is necessary.
Disadvantage: the assumption that all errors are in the "x" block.

Practical situation.
• Modern instruments are very reproducible.
• Volumetrics, measuring cylinders and syringes are inaccurate.

PARTIAL LEAST SQUARES (PLS)
This technique assumes that errors in both the “x” and “c” blocks are equally significant.

[Figure: block diagrams of the two decompositions, X = T.P + E and c = T.q + f.]

What does this mean?
$X = T \cdot P + E$
$c = T \cdot q + f$

THERE IS A COMMON SCORES MATRIX FOR BOTH “x” AND “c” BLOCKS.

In PCR we calculate the scores just for the “x” block and then use a separate step for regression.

A big difference between PCR and PLS is that in PCR there is only one scores matrix, whereas in PLS (using 1 column of “c”) there is a different scores matrix for each compound.

The vector q is analogous to loadings.

PLS components have some analogies to PCA components.
In PCA, each component consists of a
• scores vector
• loadings vector
• eigenvalue.
In PLS, each component consists of a
• scores vector
• “x” loadings vector (p)
• “c” loading (q), a single number
• magnitude.

FOR THE TECHNICALLY MINDED.
• Unlike eigenvalues, the magnitudes of successive PLS components do not necessarily decrease in size, although they do model the overall datasets.
• Unlike loadings for PCA, loadings in PLS are not orthogonal.
• In most cases PLS loadings are not normalised.
• There are many algorithms for PLS and it can be confusing.

ERROR ANALYSIS: similar principles to PCR, but with different curves for different compounds. Sometimes different numbers of PLS components are used to model different compounds in one mixture.

[Figure: “c” errors and “x” errors against number of PLS components (1 to 20).]

For a dataset consisting of 25 spectra observed at 27 wavelengths, for which 8 PLS components are calculated, there will be
• a T matrix of dimensions 25 × 8,
• a P matrix of dimensions 8 × 27,
• an E matrix of dimensions 25 × 27,
• a q vector of dimensions 8 × 1 and
• an f vector of dimensions 25 × 1.

PLS2: WHEN THERE IS MORE THAN ONE “c” VARIABLE
$X = T \cdot P + E$
$C = T \cdot Q + F$

Differences to PLS1
• C is now a matrix
• Q is also a matrix
• F is also a matrix
• Single scores for all compounds in the mixture.
• Theoretically PLS2 should perform better than PLS1, but in practice it often performs worse.
• Computationally faster, which was important 10 years ago.
• Useful for non-linear problems such as QSAR where there are interactions, but not so useful in analytical chemistry, which is very linear.

SUMMARY OF MAIN METHODS
• Univariate calibration
  • Classical
  • Inverse
• Multiple linear regression
• Principal components regression
• Partial least squares
  • PLS1
  • PLS2
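
The difference between classical and inverse univariate calibration, and the root mean square error with degrees of freedom, can be tried out with a few lines of code. What follows is only a minimal sketch: the data values are made up for illustration, the function names are hypothetical, no intercept is fitted, and the error is computed on the “c” block (predicted versus known concentration) rather than on “x”.

```python
def classical_fit(x, c):
    """Classical calibration x ~ c.s: least squares s = c+ . x (no intercept)."""
    return sum(ci * xi for ci, xi in zip(c, x)) / sum(ci * ci for ci in c)


def inverse_fit(x, c):
    """Inverse calibration c ~ x.b: least squares b = x+ . c (no intercept)."""
    return sum(xi * ci for xi, ci in zip(x, c)) / sum(xi * xi for xi in x)


def rms_error(observed, predicted, n_params):
    """Root mean square error, dividing by d = number of samples - n_params."""
    d = len(observed) - n_params
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / d) ** 0.5


# Made-up example: five standards (c) and their peak heights (x).
c = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1]

s = classical_fit(x, c)   # slope of x on c
b = inverse_fit(x, c)     # slope of c on x

# Predict concentration from each model.
c_classical = [xi / s for xi in x]   # classical: invert x = c.s
c_inverse = [xi * b for xi in x]     # inverse: use c = x.b directly

print("s =", s, " 1/s =", 1 / s, " b =", b)
print("RMS error, classical:", rms_error(c, c_classical, 1))
print("RMS error, inverse:  ", rms_error(c, c_inverse, 1))
```

With noise-free data the two models coincide (b equals 1/s). With noisy data they differ, and the inverse model gives the smaller error in “c” on the calibration samples because that is the quantity it minimises.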
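
The two steps of PCR (PCA on the “x” block, then regression of “c” on the scores) can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure used for the pyrene example: the data are simulated and noise-free, both blocks are mean-centred (the slides do not say whether centring is used), the scores are found by a simple NIPALS-style power iteration with a fixed number of steps, and all function names are hypothetical. The deflation step projects each score vector out of the residual so that the scores stay orthogonal, which lets $r = T^{+} \cdot c$ be computed one component at a time.

```python
import random


def centre(X):
    """Subtract the column means from a matrix stored as a list of rows."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def pca_scores(X, n_components, n_iter=200):
    """Return n_components mutually orthogonal score vectors of centred X.

    n_components must not exceed the rank of the centred data.
    """
    E = centre(X)
    n, m = len(E), len(E[0])
    scores = []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _step in range(n_iter):
            p = [sum(E[i][j] * t[i] for i in range(n)) for j in range(m)]
            norm = sum(pj * pj for pj in p) ** 0.5
            p = [pj / norm for pj in p]                      # unit-length loadings
            t = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
        # Deflate: remove the part of every column that lies along t.
        tt = sum(ti * ti for ti in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        E = [[E[i][j] - t[i] * load[j] for j in range(m)] for i in range(n)]
        scores.append(t)
    return scores


def pcr_fit_predict(X, c, n_components):
    """Regress centred c on the scores (r = T+ . c) and return the fitted c."""
    c_mean = sum(c) / len(c)
    f = [ci - c_mean for ci in c]
    fitted = [c_mean] * len(c)
    for t in pca_scores(X, n_components):
        # Orthogonal scores: each element of r is a separate projection.
        r_a = sum(ti * fi for ti, fi in zip(t, f)) / sum(ti * ti for ti in t)
        fitted = [ci + r_a * ti for ci, ti in zip(fitted, t)]
    return fitted


def rms(a, b):
    """Plain root mean square difference (no degrees-of-freedom correction)."""
    return (sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)) ** 0.5


# Simulated mixture data: 10 samples, 3 compounds, 6 wavelengths, no noise.
random.seed(1)
C_true = [[random.random() for _ in range(3)] for _ in range(10)]
S_true = [[random.random() for _ in range(6)] for _ in range(3)]
X = [[sum(C_true[i][k] * S_true[k][j] for k in range(3)) for j in range(6)]
     for i in range(10)]
c = [row[0] for row in C_true]   # only the first compound is "known"

err_1 = rms(c, pcr_fit_predict(X, c, 1))
err_3 = rms(c, pcr_fit_predict(X, c, 3))
print("RMS error with 1 component: ", err_1)
print("RMS error with 3 components:", err_3)
```

Only the concentration of one compound is supplied, yet once the number of components matches the number of compounds in the simulated mixture the fit becomes essentially exact. With real, noisy data, adding further components would start to model noise, which is the fallacy described in the PCR section.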
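
The PLS1 decomposition $X = T \cdot P + E$, $c = T \cdot q + f$ and the matrix dimensions quoted for 25 spectra, 27 wavelengths and 8 components can be checked with a short sketch. The slides note that there are many PLS algorithms; the one below is a common NIPALS-style variant chosen for illustration, not necessarily the one used in the lectures. The data are random numbers, both blocks are mean-centred (an assumption), and the function name `pls1` is hypothetical.

```python
import random


def pls1(X, c, n_components):
    """NIPALS-style PLS1 on mean-centred data; returns T, P, q, E, f."""
    n, m = len(X), len(X[0])
    # Centre each column of X and the c vector (an assumption of this sketch).
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    c_mean = sum(c) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [ci - c_mean for ci in c]
    T_cols, P, q = [], [], []
    for _component in range(n_components):
        # Weights: direction in "x" space that covaries most with the residual c.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(wj * wj for wj in w) ** 0.5
        w = [wj / norm for wj in w]
        # Scores for this component.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        # "x" loadings (one row of P) and "c" loading (a single number).
        p = [sum(t[i] * E[i][j] for i in range(n)) / tt for j in range(m)]
        q_a = sum(t[i] * f[i] for i in range(n)) / tt
        # Remove the modelled part from both blocks.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - t[i] * q_a for i in range(n)]
        T_cols.append(t)
        P.append(p)
        q.append(q_a)
    # Store T with one row per sample and one column per component.
    T = [[T_cols[a][i] for a in range(n_components)] for i in range(n)]
    return T, P, q, E, f


# Random stand-in data: 25 "spectra" at 27 "wavelengths".
random.seed(0)
X = [[random.random() for _ in range(27)] for _ in range(25)]
c = [0.5 * row[3] + row[10] + 0.01 * random.random() for row in X]

T, P, q, E, f = pls1(X, c, 8)
print("T:", len(T), "x", len(T[0]))
print("P:", len(P), "x", len(P[0]))
print("E:", len(E), "x", len(E[0]))
print("q:", len(q), "x 1")
print("f:", len(f), "x 1")
```

Because the weights are recomputed from the residual of “c” at every step, a different “c” vector gives different scores, which is why PLS1 has a separate scores matrix for each compound.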
