Coupling near infrared spectroscopy and chemometrics for food and drug authentication
Federico Marini
Or better…
Marini - WSC8

Outline
• Nutritional quality of cereals
• Food contamination by mycotoxins
• Traceability of foodstuff
• Quantification of nutrients in baby powdered milk
• Determination of ee in drugs

Nutritional quality of cereals
• Oat (Avena sativa) is considered one of the most important grain cereals for human consumption.
• Oat products are important sources of dietary fiber, β-glucan, proteins of good nutritional value, vitamins and other components that have been shown to be beneficial for human health.
• To assess the potential nutritive value of new naked oat genotypes during breeding work, this study examines whether a rapid, accurate and precise alternative method can be developed for the simultaneous quantification of β-glucan and protein content in naked oat samples.

The data set
• The whole data set comprises 168 naked oat samples from 12 varieties, originating from Italy and other European countries and representative of a wide genetic range.
• 166 samples analyzed as flour by NIR spectroscopy
• 54 samples analyzed as flour by NIT spectroscopy
• 168 samples analyzed as whole grain by NIT spectroscopy
• Robust calibration models built by Partial Robust M-regression (PRM)

Data set split
[Figure: data set split]

NIR on flour
[Figure: β-glucan and protein content]

Did we need robust methods?
• The significant number of weights < 1, for both horizontal and vertical outlyingness, supports the choice of robust calibration.

Results
• Reflectance (NIR) measurements on flour seem to be the best experimental setup.
• However, a different test set was used in each of the three experiments.
• Model building was therefore repeated on a common data set.
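As a rough illustration of the weights discussed under "Did we need robust methods?", the sketch below computes case weights for vertical outlyingness from calibration residuals. It assumes the "Fair" weight function with tuning constant c = 4 and a MAD-based residual scale; these choices, the function names and the residual values are assumptions made for illustration and are not taken from the slides. Horizontal (leverage) weights would be built in a similar way from distances in the score space.

```python
from statistics import median


def fair_weight(z, c=4.0):
    """Fair weight function (assumed here): 1 at z = 0, decaying smoothly towards 0."""
    return 1.0 / (1.0 + abs(z / c)) ** 2


def residual_weights(residuals, c=4.0):
    """Case weights for vertical outlyingness, from residuals scaled by a robust spread."""
    med = median(residuals)
    # MAD, multiplied by 1.4826 so that it estimates the standard deviation for normal data
    scale = 1.4826 * median(abs(r - med) for r in residuals)
    if scale == 0:
        # All residuals identical: nothing to downweight
        return [1.0] * len(residuals)
    return [fair_weight(r / scale, c) for r in residuals]


# Hypothetical residuals: five ordinary samples and one vertical outlier
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0]
weights = residual_weights(residuals)
for r, w in zip(residuals, weights):
    print(f"residual {r:6.2f} -> weight {w:.3f}")
```

Samples whose weight falls well below 1 are the ones a robust fit downweights; if almost all weights stayed close to 1, an ordinary PLS fit would probably give much the same model.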
Comparison of the three spectroscopic setups
• The analysis was carried out only on the 54 samples measured with all three spectroscopic approaches.
• Clearer evidence that NIR on flour is the better setup.

Mycotoxins
• Mycotoxins are products of the secondary metabolism of pathogenic fungi.
• They are among the highest-impact contaminants of cereal crops.
• Attention was focused on deoxynivalenol (DON), a mycotoxin produced by fungi of the genus Fusarium.
• EU limit: wheat containing more than 1750 ppb is considered contaminated.
• NIR-based approach for quantification and/or assessment of the contamination status.

Data set
• More than 150 samples, each analyzed in replicate by NIR and NIT.
• 45 samples set aside as an independent test set.

Calibration
• Best pretreatment: MSC + 2nd derivative
• RMSEC = 17.63; RMSEP = 17.82

Classification
• Best pretreatment: MSC + 2nd derivative
• Classification accuracies:
– 96.4% contaminated; 92.5% non-contaminated (calibration)
– 80% contaminated; 100% non-contaminated (test)

Traceability: introduction
• Labelling issues are of increasing concern.
• Growth and promotion of “added value” regional foods, such as those produced under “Organic” and “Designated Origin” labels.
• Many labelling claims relating to perceived added value are rarely supported by analytical data, leaving regulators to rely solely on paper auditing procedures to monitor compliance.
• Need for analytical specifications for labelling issues relating to food origin:
– geographical origin
– production origin
– species origin
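MSC (multiplicative scatter correction) appears in the best pretreatment reported for both DON models above and in many of the models that follow. The sketch below shows the usual form of the correction: each spectrum is regressed on a reference spectrum (here the mean spectrum, which is an assumption) and the fitted offset and slope are removed. The function name and the toy spectra are hypothetical, and the derivative step is not shown.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is modelled as x = intercept + slope * reference and
    replaced by (x - intercept) / slope.
    """
    reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        s_mean = mean(spectrum)
        # Ordinary least-squares slope and intercept of spectrum versus reference
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, spectrum)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spectrum])
    return corrected


# Toy spectra: the same underlying signal with different offsets and scalings
base = [1.0, 2.0, 4.0, 3.0]
spectra = [
    base,
    [2.0 * x + 1.0 for x in base],
    [0.5 * x - 0.2 for x in base],
]
corrected = msc(spectra)
```

Because the three toy spectra differ only by an offset and a scale factor, they coincide after correction; real spectra would keep their chemical differences and lose only the additive and multiplicative scatter effects.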
Tracing the origin of foodstuff
• Assessing the typicalness of a product and its traceability calls for an analytical method to determine the origin of the sample.
• Unfortunately, although many instrumental analytical techniques are under investigation, none of them gives results that can be directly related to the origin of the samples.
• An alternative is to use mathematical and statistical methods (chemometrics) to process the results of a set of determinations performed on the samples and obtain the desired classification.

An example: olive oil
• Authentication of the origin of olive oil samples
• 57 extra virgin olive oil samples
– 20 from Sabina, Lazio (13 harvested in 2009, 7 harvested in 2010)
– 37 samples of different origin (22 from 2009, 15 from 2010)
• MIR and NIR spectra recorded on each sample

Training/test set selection
• Duplex algorithm repeated class-wise on each pretreatment separately (split ratio: 2/1)
• Samples selected for the test set more than 10 times (out of 15) were placed in the test set

PLS-DA on MIR data
% correct classification:

Pretreatment               LV   Calibration        Cross-validation
                                Sabina   Other     Sabina   Other
Linear baseline            6    100.0    100.0     92.3     86.4
Quadratic baseline         6    100.0    100.0     92.3     86.4
1st derivative (SG)        7    100.0    100.0     84.6     86.4
2nd derivative (SG)        3    84.6     86.4      84.6     72.7
MSC                        3    100.0    95.5      84.6     95.5
MSC + quadratic baseline   4    100.0    95.5      92.3     95.5
MSC + 1st derivative       6    100.0    100.0     84.6     86.4
MSC + 2nd derivative       3    84.6     86.4      84.6     68.2

• Best results with MSC + quadratic baseline.
• % correct classification on test set: 85.7% (Sabina); 86.7% (other origins)

PLS-DA on NIR data
% correct classification:

Pretreatment               LV   Calibration        Cross-validation
                                Sabina   Other     Sabina   Other
MSC                        3    100.0    95.5      100.0    95.5
Detrending                 4    100.0    95.5      100.0    95.5
1st derivative (SG)        5    100.0    95.5      100.0    95.5
2nd derivative (SG)        3    92.3     81.8      76.9     86.4
MSC + detrending           4    100.0    95.5      100.0    95.5
MSC + 1st derivative       4    92.3     95.5      92.3     90.9
MSC + 2nd derivative       4    84.6     90.9      84.6     86.4

• Best results in CV shared by 4 pretreatments.
• % correct classification on test set (1st derivative): 100% (Sabina); 100% (other origins)
• % correct classification on test set (other 3): 100% (Sabina); 93.3% (other origins)

Spectral interpretation
• All the spectral regions identified as relevant correspond to significant NIR features:
– the bands at around 4450-5000 cm⁻¹, which may be attributed to combination bands of the C=C and C-H stretching vibrations of cis-unsaturated fatty acids,
– the bands between 5650 and 6000 cm⁻¹, due to the combination bands and first overtone of the C-H of methylene in the aliphatic groups of the oil,
– and those between 7074 and 7180 cm⁻¹, corresponding to the C-H combination band of methylene.

SIMCA on MIR data

Pretreatment               PC   Calibration           Cross-validation
                                Sens. %   Spec. %     Sens. %   Spec. %
Linear baseline            3    100.00    72.73       76.92     81.82
Quadratic baseline         4    100.00    81.82       61.54     95.45
1st derivative (SG)        4    100.00    72.73       84.62     81.82
2nd derivative (SG)        4    100.00    63.64       69.23     68.18
MSC                        4    100.00    77.27       69.23     86.36
MSC + quadratic baseline   4    100.00    81.82       69.23     86.36
MSC + 1st derivative       3    100.00    63.64       92.31     63.64
MSC + 2nd derivative       4    100.00    63.64       76.92     68.18

• Sensitivity decreases in CV.
• Best model with 1st derivative (based on the geometric mean of sensitivity and specificity)
• Test set: 71.43% (sensitivity); 73.33% (specificity)

SIMCA on NIR data

Pretreatment               PC   Calibration           Cross-validation
                                Sens. %   Spec. %     Sens. %   Spec. %
MSC                        4    100.00    100.00      61.54     100.00
Detrending                 4    100.00    95.45       76.92     95.45
1st derivative (SG)        3    100.00    90.91       69.23     90.91
2nd derivative (SG)        5    100.00    100.00      61.54     100.00
MSC + detrending           3    100.00    90.91       76.92     90.91
MSC + 1st derivative       3    100.00    95.45       76.92     95.45
MSC + 2nd derivative       4    100.00    100.00      69.23     100.00

• Best models with detrending and MSC + 1st derivative (based on the geometric mean of sensitivity and specificity)
• Test set (detrending): 100% (sensitivity); 93.33% (specificity)
• Test set (MSC + 1st derivative): 71.43% (sensitivity); 86.67% (specificity)

Effect of year (SIMCA NIR)

Pretreatment               PC   Calibration           Cross-validation
                                Sens. %   Spec. %     Sens. %   Spec. %
MSC                        3    100.00    68.18       76.92     81.82
Detrending                 3    100.00    100.00      84.62     100.00
1st derivative (SG)        3    100.00    100.00      84.62     100.00
2nd derivative (SG)        4    100.00    86.36       61.54     90.91
MSC + detrending           3    100.00    100.00      84.62     100.00
MSC + 1st derivative       2    100.00    95.45       84.62     95.45
MSC + 2nd derivative       3    100.00    86.36       84.62     90.91

• Best models with detrending, 1st derivative and MSC + detrending
• Models highly sensitive and specific in calibration and CV
• Test set (2010 samples): high specificity, no sensitivity

Effect of year (Coomans)
• If discriminant classification is sought, good classification ability can still be obtained, notwithstanding the diversity between harvest years.

A second example: pistachio nuts
• The major pistachio-producing countries are, in order, Iran, the USA (California), Turkey and Syria; other countries, including Italy and India, cultivate pistachios to a lesser extent.
• In Italy, only one variety (Bianca) is cultivated, mainly in Bronte.
• Italian production is very low compared with that of Asia and the USA, but this is offset by its very high quality.
• Tariff rates and national laws on commodities vary dramatically between producing countries. Variation in quality, food safety (e.g., contamination by aflatoxins), import/export fees, legal implications and financial concerns therefore makes determining the country of origin of pistachios important to protect consumers against potential fraud, so analytical methods to determine their geographical origin are needed.

Samples
• 483 pistachio samples from the 4 main producing countries plus Italy and India were analyzed (NIR spectra recorded on both halves of each nut and averaged)

Country          Nr. of samples
Italy (Bronte)   41
Iran             121
India            41
Syria            40
Turkey           120
USA              120

Training/test splitting

Country          Nr. of samples (training + test)
Italy (Bronte)   22 + 19
Iran             83 + 38
India            23 + 18
Syria            25 + 15
Turkey           86 + 34
USA              81 + 39

PLS-DA modeling
• Best model: MSC + detrending

              Bronte   India    Iran     Syria    Turkey   USA
Calibration   97.48%   96.82%   90.54%   96.47%   95.17%   99.79%
CV            97.32%   95.30%   89.48%   93.97%   93.15%   99.17%
Prediction    95.14%   91.29%   83.59%   93.63%   91.71%   99.19%

[Figure legend: Bronte red; India blue; Iran black; Syria green; Turkey cyan; USA purple. Training set: empty symbols; test set: filled symbols]

Predictions
[Figure: predictions]

VIP
[Figure: VIP]

SIMCA
• Optimal complexity chosen as the one giving the best geometric mean of sensitivity and specificity in CV
• Best model: MSC + detrending

Class    Sens. (Cal)   Spec. (Cal)   Sens. (CV)   Spec. (CV)   Sens. (Pred)   Spec. (Pred)
Bronte   95.45%        95.64%        72.73%       95.30%       89.47%         96.53%
India    100.00%       90.57%        65.22%       93.60%       83.33%         98.62%
Iran     98.80%        68.35%        93.98%       67.93%       92.11%         76.80%
Syria    88.00%        88.81%        88.00%       87.46%       73.33%         82.43%
Turkey   95.35%        79.06%        84.88%       80.77%       73.53%         80.62%
USA      93.83%        100.00%       85.19%       100.00%      87.18%         99.19%

SIMCA models
[Figure: SIMCA models]

Bronte in detail
[Figure: Bronte in detail]

Quantification of nutrients in powdered milk for babies
• Baby powdered milk is a product based on the milk of cows or other animals and/or other ingredients that have been proven suitable for infant feeding.
• Its nutritional safety and adequacy should be scientifically demonstrated to support the normal growth and development of infants.
• In addition to the compositional requirements, other ingredients may be added to ensure that the formulation is suitable as the sole source of nutrition for the infant, or to provide other benefits similar to the outcomes observed in populations of breastfed babies.
• The suitability for the particular nutritional uses of infants, and the safety of additional compounds at the chosen levels, shall be scientifically demonstrated.

Samples
• Preliminary results only on lipid content.

BiPLS
• The spectral region is divided into intervals, and PLS models are computed removing one interval at a time.
• The interval whose deletion results in the lowest RMSECV is removed.
• The procedure is iterated until the desired number of variables is retained.
[Figure: 8 intervals (1600 variables) → 7 intervals (1400 variables) → 6 intervals (1200 variables)]

GA
• Population P(t) of n chromosomes, each with m genes
• Evaluation of the fitness of the chromosomes
• Ranking of the chromosomes according to fitness
• Cross-over
• Mutation
• New generation P(t+1), repeated until the stop criterion is met

BiPLS-GA

Method         RMSECV
No selection   1.41
BiPLS          0.96
BiPLS-GA       0.90

Motivation
• Quite often only one enantiomeric form of an IPA is pharmacologically active.
• After cases such as that of thalidomide, the FDA recommends checking the enantiomeric purity of pharmaceutical formulations.
• Present methods:
– Polarimetry
– HPLC on chiral columns
– NMR
• This study examines the possibility of using NIR + chemometrics to predict the enantiomeric excess (ee) of an IPA.

Ibuprofen: phase diagram
• The phase diagram confirms that ibuprofen crystallizes as a racemic compound.
• This suggests that R and S have the same spectrum, but that both differ from the spectrum of the racemate.

Ibuprofen: NIR spectra
[Figure: NIR spectra]

Ibuprofen: calibration & testing
• Best pretreatment: MSC + 2nd derivative
• 4 LVs
• RMSEC = 2.11
• RMSECV = 2.43
• RMSEP = 1.71

Interpretation - VIP
• 4000-4800 cm⁻¹: combination bands of methylene C-H and overtones of the bending modes
• 6000 cm⁻¹: first overtone of the aromatic C-H stretching and of the asymmetric stretching of methyl groups
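For reference, the quantity calibrated in the ibuprofen study and the kind of error figure quoted for it can be sketched as follows. Enantiomeric excess is taken here in its usual form, |R - S| / (R + S) × 100, and the error as a plain root mean square error with no degrees-of-freedom correction; whether the slides' RMSEC, RMSECV and RMSEP were computed exactly this way is an assumption. The mixture compositions and predicted values are invented for illustration.

```python
from math import sqrt


def enantiomeric_excess(r_amount, s_amount):
    """Enantiomeric excess in percent: 0 for a racemate, 100 for a pure enantiomer."""
    total = r_amount + s_amount
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * abs(r_amount - s_amount) / total


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    return sqrt(squared / len(reference))


# Hypothetical mixtures given as (amount of R, amount of S)
mixtures = [(50.0, 50.0), (25.0, 75.0), (0.0, 100.0)]
ee_reference = [enantiomeric_excess(r, s) for r, s in mixtures]

# Hypothetical model predictions for the same three mixtures
ee_predicted = [2.0, 49.0, 98.0]
error = rmse(ee_reference, ee_predicted)
print(ee_reference, round(error, 3))
```

The same `rmse` function applies to calibration, cross-validation or test samples; only the set of samples and the way the predictions were obtained change.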