
Multivariate Additive PLS Spline Boosting in Agro-Chemistry studies
Rosaria Lombardo*1, Jean-François Durand2, Antonio P. Leone3
1 Economics Faculty, Second University of Naples, Capua (CE) 81043, Italy, rosaria.lombardo@unina2.it
2 Montpellier II University, cedex 5 Montpellier 34095, France, jf.durand@orange.fr
3 Italian National Research Council (CNR), Institute for Mediterranean Agriculture and Forest Systems, Ercolano (NA) 80056, Italy, antonio.leone@isafom-cnr.it
*Corresponding author: Rosaria Lombardo, via Gran Priorato di Malta, Economics Faculty, Department of Business Strategy and Quantitative Methods, Second University of Naples, 81043 Capua (CE), Italy. Tel: +390823274032 / +390810601382; Fax: +390823622984
Abstract. The multi-response Partial Least-Squares (PLS) method is routinely used in regression and classification problems, showing good performance in many applied studies. In this paper, we present PLS via spline functions, focusing on supervised classification studies and showing how PLS historically belongs to the L2 boosting family. The theory of the PLS boost models is presented and used in classification studies. As a natural enrichment of the linear PLS boost, we present its multi-response non-linear version, in which univariate and bivariate spline functions transform the predictors. Three case studies of different complexity concerning soil and its products are discussed, showing the diagnostic gain provided by the non-linear additive PLS boost discriminant analysis compared to the linear one.
Keywords: Partial Least-Squares regression, L2 boost, B-splines, Supervised Classification Analysis, Generalized Cross-Validation, Agro-chemical data.
1. Introduction
In recent years, reduction methods have quickly become established as valuable tools for chemical data analysis, and much effort has been devoted to exploiting reduction methods such as principal component analysis, linear discriminant analysis and partial least squares. In particular, Partial Least Squares (PLS) [1-5] has become a popular method for the linear prediction and modeling of high-dimensional data, especially in the field of chemometrics [6-7]. When dimension reduction is needed and classification is sought, Partial Least Squares Discriminant Analysis (PLS-DA) is a suitable technique. A linear discriminant approach can be carried out via a PLS regression analysis on an indicator matrix reflecting the classes of the training-set observations. Beyond this, many applied scientists have presented generalizations of PLS for supervised classification or discrimination purposes [8-14].
PLS and its non-linear variants form a family of related multivariate modeling methods, derived from the concepts of iterative fitting developed by Wold and others [1, 2, 15]. These methods offer pragmatic solutions to some of the problems encountered when conventional ordinary least-squares methods are applied to large, inter-correlated data sets with mixed and non-linear data.
To face non-linearities, a large number of non-linear PLS approaches have been proposed in the literature, for example by incorporating second-order polynomials or logistic functions [3, 16-18], by neural networks [19], by the Box-Tidwell transformation [20], or by transformation functions applied to the components [21, 22] or to the predictors [23]. Among these generalizations, in this paper we stress the transformation of the predictors by spline functions [23-26].
Recently, Durand [27] discussed linear and non-linear PLS models in the framework of the powerful family of L2
boosting regression [28] whose philosophy is based on the explicit refitting of residuals.
In this paper, the PLS boosting is first presented in its historical linear framework, where the components or latent variables are the base learners. Then, in the non-linear version, the base learners are expressed by univariate B-spline basis functions and by bivariate B-spline basis functions to catch relevant interactions. Non-linear PLS boosting is not only a prediction tool but also an estimation method for models with a specific structure such as linearity or additivity. According to whether interactions are present or not, the obtained non-linear additive models are respectively called Multivariate Additive PLS Splines, MAPLSS [25], or PLS Splines, PLSS [23]. The website www.jf-durandpls.com is dedicated to the PLS boosting; the associated source codes written in R (R Development Core Team, 2006), data sets, articles, courses and conferences are available for download.
Our aim is also to present the PLS boost in the supervised classification context. When the learning responses indicate class membership, the multi-response PLS boost has the ability to classify newly measured objects. The non-linear PLS boost classifier is proposed to improve, where possible, the performance of PLS-DA when complex (mixed and non-linear variables) data sets are involved.
Three case studies of different interest are discussed. The first concerns the use of the MAPLSS boost and vis-NIR reflectance spectroscopy to predict the organic carbon content of soils from Southern Italy, working with a database kindly provided by the Institute for Mediterranean Agriculture and Forest Systems of the Italian National Research Council. The second case study concerns the problem of discriminating the level of nitrate-nitrogen (NO3-N) from other chemical and physical characteristics of a small sample of soils from West Virginia (NRCS, 2010) using MAPLSS-DA. The third example illustrates the capability of MAPLSS-DA to classify olive oil varieties from some of their chemical constituents [29].
The paper is structured as follows. In the second section, the usual linear PLS regression is presented as an L2 boosting method whose base learners are the PLS components, also called the latent variables. With a view to capturing non-linear relationships and relevant interactions, Section 3 shows how B-spline basis functions are used to enhance the potential of the PLS boost, while Section 4 focuses on supervised classification, that is, on linear and non-linear PLS discriminant analysis. In Section 5, applications to soil data sets illustrate the capabilities of the PLS boost classifier and the gain in performance when passing from linearity to non-linearity.
2. The Ordinary PLS Regression: an L2 Boosting Method
Boosting is one of the most powerful learning ideas introduced in the last ten years, originally designed for both
classification and regression problems. It can be seen as an estimation method for models with a specific structure such
as linearity or additivity [30, 31], where the word “additive” does not imply a model fit which is additive in the
explanatory variables, but refers to the fact that boosting is an additive combination of function estimators [32].
Indeed boosting can represent an interesting regularization (or smoothness) scheme for estimating a model. Boosting
methods have been originally proposed as ensemble methods which rely on the principle of generating multiple
predictions from re-weighted data. Boosting iteratively fits a model to a data set with weights that depend on the
accuracy of the earlier iterated models and uses a linear combination of these models for producing the final prediction.
In literature [28, 31], the boosting algorithms can also be seen as functional gradient descent techniques.
When boosting methods are implemented as optimization techniques using the squared-error loss function, they are called L2 Boosting [28]. Moreover, L2 Boosting also characterizes the PLS regression and its generalizations [27], as will be discussed in the following section. In the regression framework, L2 Boost is based on the explicit refitting of residuals; as a matter of fact, simple variants of boosting with the L2 loss have been proposed in the literature [28, 32].
Like the linear and non-linear PLS algorithms, L2 Boost is an additive combination of simple function estimators and iteratively uses a pre-chosen fitting method called the learner. The improvement in performance that boosting brings to a fitting method (i.e. the learner) has been impressive: it combines the outputs of many "weak" models to produce a powerful model "committee". As proven in [33], L2 Boosting for linear models yields consistent estimates in the very high-dimensional context.
The practicability of boosting procedures comes with the availability of large data sets, part of which can be set aside as a test set. To fix the number of boosting iterations, cross-validation based on random splits, or the corrected Akaike information criterion (AIC) as a computationally efficient alternative, is usually applied.
2.1 Boosting in Brief
To explain boosting as a functional gradient descent technique, consider the centered training sample $\{y_i, X_i\}_{i=1}^{n}$, where the response $y \in \mathbb{R}$ may be continuous or categorical according to regression or classification problems, and the predictor is $X = (x^1, \ldots, x^p) \in \mathbb{R}^p$. The task is to estimate an additive parametric function
$\hat{F}(X; \{\alpha_m, \theta_m\}_{m=1}^{M}) = \sum_{m=1}^{M} \alpha_m h(X, \theta_m)$   (1)
where $\theta_m$ is a finite- or infinite-dimensional parameter, $\alpha_m$ is the weight coefficient and the parametric function $h(\cdot)$ is the base learner, which aims at capturing non-linearities and interactions. The most efficient base learners are decision trees, splines and wavelets. The specification of the data re-weighting mechanism and of the linear combination coefficients $\{\alpha_m\}_{m=1}^{M}$ is crucial, and different choices characterize different ensemble schemes [32].
The model (1) is built in M steps; each step takes into account the part of the model not well predicted by the preceding one. M is the dimension of the additive model, usually chosen, for example by cross-validation, to minimize the expected cost $E\{C[y, F(X)]\}$, on which the data re-weighting mechanism is based.
In order to ensure that the gradient method works well, the cost function is assumed to be smooth and convex in its second argument. The most important cost functions in the literature are
$C(y, F) = \exp(-yF)$, with $y \in \{-1, 1\}$: the AdaBoost cost function;
$C(y, F) = (y - F)^2/2$, with $y \in \mathbb{R}$: the L2 boost cost function;
$C(y, F) = \log_2[1 + \exp(-2yF)]$, with $y \in \{-1, 1\}$: the LogitBoost cost function.
When the cost function is the squared loss, the solutions of the expected cost problem, obtained by a steepest descent
gradient algorithm with respect to F, are of the form F(x)=E[y | X = x].
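This population solution can be checked pointwise; the following short derivation is the standard argument for the squared loss (a textbook step, not specific to this paper):

% Minimize E[(y - F)^2 / 2 | X = x] over the scalar value F = F(x)
\frac{\partial}{\partial F}\, E\!\left[\tfrac{1}{2}(y - F)^2 \,\middle|\, X = x\right]
  = -\,E\!\left[\, y - F \,\middle|\, X = x \,\right] = 0
\quad\Longrightarrow\quad F(x) = E[y \mid X = x].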
2.2 Linear PLS as L2 boosting model
In linear PLS we set the base learner equal to a latent variable or PLS component
$t = h(X, \theta) = \langle X, \theta \rangle$   (2)
with $\theta$ defined in $\mathbb{R}^p$, where the symbol $\langle \cdot, \cdot \rangle$ represents the usual inner product. In non-linear PLS boosted by splines, the base learners are of the same type, but $\theta$ is defined in $\mathbb{R}^r$, where $r \geq p$.
The cost to be paid to capture non-linearities and interactions is well supported by PLS, which robustly handles problems with dimension $r \gg n$. In the framework of L2 boosting methods, we can say that linear PLS components are linear compromises of pseudo-predictors built from centered design matrices, maximizing the covariance with the response variables at each component.
Following Friedman [30], L2 boosting can be presented as a steepest gradient descent algorithm with respect to F.
Algorithm 1 (L2 boosting):
1. $F_0 = 0$
2. For m = 1 to M do
3. $\tilde{y}_i = -\left[ \partial C(y, F) / \partial F \right]_{F = F_{m-1}(X_i)} = y_i - F_{m-1}(X_i)$
4. $(\alpha_m, \theta_m) = \arg\min_{\alpha, \theta} \sum_{i=1}^{n} \left[ \tilde{y}_i - \alpha\, h(X_i, \theta) \right]^2$
5. $F_m(X) = F_{m-1}(X) + \alpha_m h(X, \theta_m)$
6. end For
Step 3 of Algorithm 1 defines the pseudo-response as the negative gradient of the cost function at the current fit, which for the squared loss is the current residual. Historically, the first L2 boosting algorithm was proposed by Tukey [34] under the name "twicing", which consists of fitting the residuals of the ordinary least-squares regression to build an additive model. In step 4, the parameters of the base learner are estimated by least squares. PLS is slightly different: a component t is estimated by a criterion based on maximizing the empirical covariance between the pseudo-response and a linear compromise of the predictors. This is the reason why a light extension of the L2 boosting algorithm is needed, in which Step 4 is split into two sub-steps to estimate θ_m and α_m separately.
Algorithm 2 (extension of L2 Boosting): Substitute step 4 of Algorithm 1 by
- step 4.1: a criterion of proximity between X and y to compute $\theta_m$ from the data;
- step 4.2: $\alpha_m = \arg\min_{\alpha} \sum_{i=1}^{n} \left[ \tilde{y}_i - \alpha\, h(X_i, \theta_m) \right]^2$.
Algorithm 2 reduces to Algorithm 1 when, in sub-step 4.1, least squares is used to estimate $\theta_m$ [28].
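To fix ideas, here is a minimal Python sketch of Algorithms 1-2 for the squared loss; the componentwise single-predictor learner at the end and all names are our own illustrative assumptions, not the authors' R implementation.

import numpy as np

def l2_boost(X, y, M, fit_learner, predict_learner):
    # Minimal sketch of Algorithms 1-2 with the squared loss.
    # fit_learner(X, r) returns theta (step 4.1, any proximity criterion);
    # predict_learner(X, theta) evaluates the base learner h(X, theta).
    F = np.zeros(len(y))                    # step 1: F_0 = 0
    model = []
    for m in range(M):                      # step 2
        r = y - F                           # step 3: pseudo-response (residual)
        theta = fit_learner(X, r)           # step 4.1
        h = predict_learner(X, theta)
        alpha = (r @ h) / (h @ h)           # step 4.2: least-squares weight
        F = F + alpha * h                   # step 5: additive update
        model.append((alpha, theta))
    return model, F

# Example base learner (illustrative only): the single centered predictor
# most correlated with the residual, a componentwise variant of step 4.1.
def fit_learner(X, r):
    return int(np.argmax(np.abs(r @ X) / np.linalg.norm(X, axis=0)))

def predict_learner(X, j):
    return X[:, j]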
Actually, the PLS base learner is first built from a pseudo-predictor $\tilde{X}$ by a covariance-based criterion that gives a component $t = \langle \tilde{X}, w_m \rangle$. Pseudo-predictors belong to the linear space span(X); see Tenenhaus [2] for the relationship between $w_m$ and $\theta_m$, the latter also called $w_m^*$.
Algorithm 3 (PLS as an extension of L2 boosting):
1. $\tilde{X} = X$
2. $\tilde{y} = y$
3. For m = 1 to M do
4. Compute $\theta_m$ and $\alpha_m$
  4.1 Compute t and update $\tilde{X}$ from the data
    4.1.1 $w_m = \arg\max_{\|w\|=1} cov(\langle \tilde{X}, w \rangle, \tilde{y})$ and $t = \langle \tilde{X}, w_m \rangle = \langle X, \theta_m \rangle$
    4.1.2 $\tilde{X} = \tilde{X} - \mathrm{OLS}(\tilde{X}, t)$
  4.2 Partial regression and update of the residual
    4.2.1 $\alpha_m = \arg\min_{\alpha} \sum_{i=1}^{n} (\tilde{y}_i - \alpha t_i)^2$
    4.2.2 $\tilde{y} = \tilde{y} - \alpha_m \langle X, \theta_m \rangle$
5. end For
Once the base learner is computed, sub-step 4.1.2 ends by updating the p pseudo-predictors as the residuals of the p ordinary least-squares (OLS) regressions of the previous pseudo-predictors onto the current base learner t. In step 4.2.1, the regression of the pseudo-response on the component is called the partial least-squares regression. Notice that the boosting idea appears naturally in PLS when adding, one by one, the M equations of 4.2.2:
$y = \alpha_1 \langle X, \theta_1 \rangle + \cdots + \alpha_M \langle X, \theta_M \rangle + \tilde{y}_M = \hat{F}_M(X) + \tilde{y}_M.$   (3)
Having in view the multi-response case, the PLS model (3) holds for each of the q responses $(y^1, \ldots, y^q)$ simultaneously predicted. Denoting by $\mathbf{Y}$ the n by q sampled response matrix, by $\{\mathbf{t}_m \in \mathbb{R}^n\}_{m=1,\ldots,M}$ the column vectors of the sampled components, and by $\{\boldsymbol{\alpha}_m \in \mathbb{R}^q\}_{m=1,\ldots,M}$ the set of associated coefficients, the expression (3) becomes
$\mathbf{Y} = \mathbf{t}_1 \boldsymbol{\alpha}_1' + \cdots + \mathbf{t}_M \boldsymbol{\alpha}_M' + \tilde{\mathbf{Y}}_M = \hat{\mathbf{Y}}(M) + \tilde{\mathbf{Y}}_M.$   (4)
As a consequence of the same operation (sub-step 4.1.2) on the M equations, there exist $\{\mathbf{p}_m \in \mathbb{R}^p\}_{m=1,\ldots,M}$ such that the original n by p design matrix $\mathbf{X}$ is modeled by
$\mathbf{X} = \mathbf{t}_1 \mathbf{p}_1' + \cdots + \mathbf{t}_M \mathbf{p}_M' + \tilde{\mathbf{X}}_M = \hat{\mathbf{X}}(M) + \tilde{\mathbf{X}}_M.$   (5)
The PLS components are mutually uncorrelated, $cov(t^{m_1}, t^{m_2}) = 0$ for $m_1 \neq m_2$. The components $\{t^1, \ldots, t^M\}$ thus constitute a basis of a subspace of span($\mathbf{X}$). When the rank of the matrix $\mathbf{X}$ is equal to M, the PLS regression becomes exactly the ordinary least-squares regression. In short,
if $M = \mathrm{rank}(\mathbf{X})$, then $\mathrm{PLS}(\mathbf{X}, \mathbf{Y}) \equiv \{\mathrm{OLS}(\mathbf{X}, \mathbf{y}^j)\}_{j=1,\ldots,q}$.   (6)
The upper bound for 𝑀 is given by the rank of 𝐗, but usually 𝑀 is less than the rank of 𝐗. The choice of the best 𝑀 can
be done by cross-validation and its PRESS criterion or by the GCV criterion, which will be described as regards PLSS
in the next section.
Furthermore, notice that when $\mathbf{X} \equiv \mathbf{Y}$, the components are of maximum variance and the self-PLS regression of $\mathbf{X}$ onto itself coincides with the Principal Component Analysis of $\mathbf{X}$:
$\mathrm{PLS}(\mathbf{X}, \mathbf{Y} = \mathbf{X}) \equiv \mathrm{PCA}(\mathbf{X}).$   (7)
As a consequence, PLS can also be viewed as a unifying tool for both regression and exploratory data analysis, and it therefore shares with PCA the property of displaying plots of the observations explained by the variables.
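As a concrete single-response illustration of Algorithm 3, the following numpy sketch writes PLS1 as residual refitting; it is a didactic sketch under the convention that X and y are column-centered, and is not the authors' code.

import numpy as np

def pls_boost(X, y, M):
    # Sketch of Algorithm 3: single-response PLS as extended L2 boosting.
    # X (n x p) and y (n,) are assumed column-centered.
    Xt, yt = X.copy(), y.copy()         # pseudo-predictors, pseudo-response
    T, alphas = [], []
    for m in range(M):
        # 4.1.1 direction of maximal covariance with the pseudo-response
        w = Xt.T @ yt
        w /= np.linalg.norm(w)
        t = Xt @ w                      # component (base learner)
        # 4.1.2 deflation: pseudo-predictors become their OLS residuals on t
        Xt = Xt - np.outer(t, (Xt.T @ t) / (t @ t))
        # 4.2.1 partial least-squares regression of the residual on t
        alpha = (yt @ t) / (t @ t)
        # 4.2.2 boosting update: refit the residual
        yt = yt - alpha * t
        T.append(t); alphas.append(alpha)
    fitted = y - yt                     # F_M(X) = sum_m alpha_m t_m
    return np.column_stack(T), np.array(alphas), fitted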
2.3 Choosing the Model Dimension M
Let us first recall that, due to (2) and (3), the PLS model is linear not only with respect to the base learners t but also with respect to the natural predictors X. The multi-response fit can be expressed as
$\hat{y}^j(M) = \langle X, \hat{\beta}^j(M) \rangle, \quad j = 1, \ldots, q$   (8)
where the $\hat{\beta}$ coefficients are recursively computed from the θ values, see [27].
The success of PLS depends on the low number M of latent variables, uncorrelated with each other and easily interpretable thanks to the original independent variables X. This economy in determining M makes the PLS model robust in the case of small sample sizes and in the presence of multi-collinearity between the independent variables. In the absence of a test sample, the choice of M is usually done by cross-validation on the learning sample: for example, each individual i = 1, ..., n is in turn left out of the sample and predicted from the remaining ones.
The mean-squared error of prediction on each response, PRESS (PRedictive Error Sum of Squares), is the established criterion to measure the model quality:
$\mathrm{PRESS}(m) = \frac{1}{n} \sum_{j=1}^{q} \sum_{i=1}^{n} \left( y_i^j - \langle X_i, \hat{\beta}_{-i}^j(m) \rangle \right)^2, \quad m = 1, \ldots, \mathrm{rank}(X).$
It allows one to choose the best dimension M of the model, in general the value of m giving the lowest value of PRESS. The computational cost of PRESS is sometimes very high; an alternative criterion proposed in the literature [25] is GCV, generalized cross-validation [35], a penalized mean-squared error:
$\mathrm{GCV}(c, m) = \frac{\frac{1}{n} \sum_{j=1}^{q} \sum_{i=1}^{n} \left( y_i^j - \langle X_i, \hat{\beta}^j(m) \rangle \right)^2}{\left( 1 - \frac{cm}{n} \right)^2}$
where c is a coefficient of penalty and the product cm is a measure of the effective number of parameters, i.e., the effective dimension of the model. By default the coefficient of penalty is set equal to 1, because the dimension of the linear PLS model is m; this changes when the base learner is enriched by B-spline functions.
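The GCV formula transcribes directly into code; a minimal sketch (names ours):

import numpy as np

def gcv(Y, Y_hat, m, c=1.0):
    # Generalized cross-validation for an m-dimensional PLS boost model.
    # Y, Y_hat: n x q arrays of observed and fitted responses;
    # c * m approximates the effective number of parameters.
    n = Y.shape[0]
    rss = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))  # (1/n) sum_j sum_i (.)^2
    return rss / (1.0 - c * m / n) ** 2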
In synthesis, the linear PLS model belongs to the family of L2 boosting methods, with the particularity of using pseudo-predictors, which allow us to explore the data via scatter plots of the observations in the same way as PCA does. More generally, a base learner h(X, θ) is usually a parametric function of X whose aim is to capture non-linearities and interactions. As we have seen, in PLS the base learner is a linear compromise of the predictors, namely the latent variable or component t. One natural way of extending PLS to non-linearity is to construct the base learner as a linear combination of B-spline functions.
3. PLS boosted by Splines
Observing that the structure of linear PLS regression is similar to that of L2 boost regression, the linearity of the PLS learner provides a natural framework for regression splines, putting PLS back into the historical objective of boosting, that is, to catch possible non-linearities and relevant interactions [27]. This generalization takes two forms according to whether interactions are captured: PLSS, for Partial Least-Squares Splines [23], and MAPLSS, for Multivariate Additive PLS Splines [24, 25], which catches relevant bivariate interactions. Both models were inspired by Gifi [36], who, in PCA and other related methods of exploratory data analysis, replaced the design matrix $\mathbf{X}$ by the matrix $\mathbf{B}$ of the variables transformed by splines in order to explore non-linearities.
3.1 Regression Splines in short
Each variable $x^i$ is associated with a set of functions $\{B_1^i, \ldots, B_{r^i}^i\}$ called B-splines or basis splines [37], which enjoy several properties. A spline function is a linear combination of basis splines whose coefficients θ are estimated in a regression context:
$s^i(x) = \sum_{k=1}^{r^i} \theta_k^i B_k^i(x).$
In brief, a spline $s^i$, defined on the range $[x_-^i, x_+^i]$ of $x^i$, is formed of piecewise polynomials of the same degree $d^i$ that join end to end at $K^i$ points inside the interval $]x_-^i, x_+^i[$, called the knots. The smoothness of the junction between two adjacent polynomials is managed by choosing simple or multiple knots at that junction point. The B-splines constitute a basis of the linear space of splines, whose dimension is $r^i = d^i + 1 + K^i$. Observe that $K^i = 0$ produces the set of polynomials of order $d^i + 1$ on the whole interval of the data. Among several appealing properties, the B-spline family is a set of fuzzy coding functions:
$0 \leq B_k^i(x) \leq 1 \quad \text{and} \quad \sum_{k=1}^{r^i} B_k^i(x) = 1.$
Each B-spline function has local support: $B_k^i(x)$ is non-zero inside an interval, called the support, defined by two particular distinct knots, and null outside. Notice that the θ coefficients cannot be easily interpreted except in the case of splines of degree 0, which are piecewise constant functions; there, each coefficient is the weight of the corresponding local support, expressing the level of the predictor, for example weak, mean or high in the case of two knots. Moreover, a judicious choice of knots allows the user to build models that are robust against extreme values of the predictors, see [23].
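As an illustration of these properties, the following sketch builds the n by $r^i$ design block of one predictor with scipy, assuming scipy >= 1.8 for BSpline.design_matrix, with clamped end knots and interior knots at quantiles (all choices ours):

import numpy as np
from scipy.interpolate import BSpline    # design_matrix needs scipy >= 1.8

def bspline_block(x, degree=1, interior_quantiles=(0.5,)):
    # n x r design block for one predictor, r = degree + 1 + number of knots;
    # end knots are clamped, interior knots placed at quantiles of x.
    knots = np.quantile(x, interior_quantiles)
    t = np.r_[[x.min()] * (degree + 1), knots, [x.max()] * (degree + 1)]
    B = BSpline.design_matrix(x, t, degree).toarray()
    return B                             # rows sum to 1 (fuzzy coding)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
B = bspline_block(x)                     # shape (50, 3): degree 1, one knot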
The ordinary least-squares regression is a natural way of estimating θ, thus leading to the Least-Squares Splines, LSS [38]. Denoting by $\mathbf{B} = [\mathbf{B}^1 | \ldots | \mathbf{B}^p]$ the n by $r = \sum_{i=1,\ldots,p} r^i$ matrix of the transformed observations on the predictors, LSS may be written in short:
LSS($\mathbf{X}, \mathbf{y}$) = OLS($\mathbf{B}, \mathbf{y}$).
However, this way of capturing non-linearities by expanding the column dimension r of the new design matrix $\mathbf{B}$ is subject to the "curse of dimensionality" when $\mathbf{B}$ is not of full column rank. For this reason, cubic smoothing splines have been shown to be attractive learners in L2 boosting by Bühlmann and Yu [29] in the regression and classification context. In very high dimensions, these authors recommend learners that permit some sort of variable selection, such as regression trees. This remark reinforces the need for some reduction of the model dimension, which PLS naturally performs by generally requiring a small number M of its structural base learners, namely the components or latent variables which, even if artificial, actually play the role of relevant predictors.
Using regression splines, the choice of knots is more crucial than that of the degree, which is most often 0, 1 or 2. Finding the optimal number and locations of the knots is a difficult optimization problem, even in simple least-squares regression, because implemented algorithms are often trapped in local optima (see among others [39, 40]). One can reasonably think that optimality is not essential for satisfying results. In the simple regression and density estimation literature, an attractive approach is that of Eilers and Marx [41] with their penalized B-splines (P-splines): it consists in using a large number of equidistant knots and a difference penalty on the coefficients of adjacent B-splines. They show that only the penalty parameter then has to be chosen, through the Akaike information criterion (AIC).
In the multiple-predictor and/or multi-response context, like that of multi-class classification problems, our strategy is inspired by a data mining paradigm (described in [23, 27]) that consists in going back and forth between the choice of the model and its validation through the PRESS or GCV criterion. The aim is to find a balance between the parsimony of the model (M and r) and the goodness of prediction (PRESS or GCV). Our ascending strategy consists in progressively increasing the degree and the number of knots. It starts with a low degree, 0 in the case of a categorical variable or 1 in the continuous case, and few knots (no knot means exploring polynomial models). Then, more flexibility is achieved by introducing knots, knowing that each knot increases the local flexibility and thus the freedom to adjust the data in that area. This empirical approach may be laborious in the case of many predictors, as for example in spectroscopic data sets; then a reasonable starting point can be equally spaced knots or knots at quantiles. However, in many cases this strategy allows the user to take into account the knowledge of the phenomena that take part in the elaboration of the data.
3.2 PLS, PLSS and MAPLSS boost models
The PLSS base learner is given by
$t = h(X, \theta) = \sum_{i=1}^{p} \sum_{k=1}^{r^i} \theta_k^i B_k^i(x^i) = \sum_{i=1}^{p} h^i(x^i).$   (9)
Each PLSS component can be represented as a sum of coordinate spline functions, and the graphics of the coordinate curves $\{(x^i, h^i(x^i))\}_{i=1,\ldots,p}$ are used to interpret the influence of the explanatory variables on the latent variable t. The dimension of the learner passes from p in the linear model to $r = \sum_{i=1}^{p} r^i$ in the additive spline model. The PLSS algorithm is based on replacing, in PLS, the $\mathbf{X}$ design matrix by the column-centered matrix $\mathbf{B} = [\mathbf{B}^1 | \ldots | \mathbf{B}^p]$. In short,
PLSS($\mathbf{X}, \mathbf{Y}$) = PLS($\mathbf{B}, \mathbf{Y}$).
In (8), replacing X by $\left( B_1^1(x^1), \ldots, B_{r^1}^1(x^1), \ldots, B_1^p(x^p), \ldots, B_{r^p}^p(x^p) \right)$ and regrouping the terms lead to the fit of the jth response
$\hat{y}^j(M) = \sum_{i=1}^{p} \sum_{k=1}^{r^i} \beta_k^{j,i}(M)\, B_k^i(x^i) = \sum_{i=1}^{p} s_M^{j,i}(x^i), \quad j = 1, \ldots, q$   (10)
thus producing the fit of a purely additive multi-response spline model. Recall that the β coefficients cannot be directly interpreted except in the case of a spline of degree 0, whose basis functions are piecewise constant and 0-1 valued, leading to a dummy matrix $\mathbf{B}^i$; in that case, a coefficient β is the weight of the corresponding local support. In the same way as for the components, the non-linear influence of the predictors on the jth response is measured by the most significant coordinate function plots $\{(x^i, s_M^{j,i}(x^i))\}_{i=1,\ldots,p}$. After standardizing the predictors, the range of the values of $s_M^{j,i}(x^i)$ allows one to sort the coordinate splines, and therefore the predictors, in descending order of their influence on $\hat{y}^j(M)$.
Properties (6) and (7) of the PLS boost become:
if M = rank($\mathbf{B}$), then PLSS($\mathbf{X}, \mathbf{Y}$) ≡ {LSS($\mathbf{X}, \mathbf{y}^j$)}$_{j=1,\ldots,q}$.   (11)
It means that choosing the largest possible value of M leads PLSS to produce q distinct regressions by least-squares splines (LSS), provided the inverse of the matrix $\mathbf{B}'\mathbf{B}$ exists.
PLSS($\mathbf{X}, \mathbf{Y} = \mathbf{B}$) = PLS($\mathbf{B}, \mathbf{Y} = \mathbf{B}$) ≡ PCA($\mathbf{B}$) ≡ NL-PCA($\mathbf{X}$).   (12)
Here, the non-linear PCA (NL-PCA) in the sense of Gifi (1990) is the self-PLS regression of $\mathbf{B}$ onto itself.
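In code, the identity PLSS(X, Y) = PLS(B, Y) means that the spline expansion is pure preprocessing; a hedged sketch reusing the bspline_block and pls_boost helpers from the earlier sketches (single-response, names ours):

import numpy as np

def plss(X, y, M, degree=1, interior_quantiles=(0.5,)):
    # PLSS(X, y) = PLS(B, y): expand each predictor into its B-spline
    # block, center the columns, then run the PLS boost of Algorithm 3.
    blocks = [bspline_block(X[:, i], degree, interior_quantiles)
              for i in range(X.shape[1])]
    B = np.hstack(blocks)               # n x r, with r = sum_i r^i
    B = B - B.mean(axis=0)              # column centering
    return pls_boost(B, y - y.mean(), M)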
By capturing the interactions, one is faced with the problem of the curse of dimensionality, although the PLS algorithm is to some extent robust against that problem. A phase of selecting the relevant interactions is the main characteristic of the MAPLSS boost, whose base learner incorporates tensor products of two-by-two B-spline families. Let $K_2$ be the set of the accepted couples of bivariate interactions,
$K_2 = \{\{i, i'\} \mid \text{the interaction between } x^i \text{ and } x^{i'} \text{ is accepted}\}.$
The number of accepted interactions is between 0 and $p(p-1)/2$, the latter when all possible interactions are accepted. The MAPLSS base learner becomes
$t = h(X, \theta) = \sum_{i=1}^{p} \sum_{k=1}^{r^i} \theta_k^i B_k^i(x^i) + \sum_{\{i,i'\} \in K_2} \left\{ \sum_{k=1}^{r^i} \sum_{l=1}^{r^{i'}} \theta_{k,l}^{i,i'} B_k^i(x^i) B_l^{i'}(x^{i'}) \right\} = \sum_{i=1}^{p} h^i(x^i) + \sum_{\{i,i'\} \in K_2} h^{i,i'}(x^i, x^{i'})$
where $h^{i,i'}(x^i, x^{i'})$ is the bivariate spline function of the interaction between $x^i$ and $x^{i'}$. Therefore, when $K_2 \neq \emptyset$, the MAPLSS base learner is an ANOVA-type decomposition of main effects and bivariate interactions captured by univariate and bivariate spline functions; it reduces to the PLSS base learner when $K_2 = \emptyset$. The price to be paid for capturing interactions is the extension of the dimension r, which becomes
$r = \sum_{i=1}^{p} r^i + \sum_{\{i,i'\} \in K_2} r^i r^{i'}.$
We underline the importance of the selection of interactions: for example, when p = 50 and all the variables are transformed by B-splines with the same dimension $r^i = 10$, if all the possible interactions are incorporated into the design (card $K_2 = 1225$), then the dimension of the base learner becomes r = 123 000.
In fact, the centered design matrix $\mathbf{B} = [\mathbf{B}^1 | \ldots | \mathbf{B}^p | \ldots | \mathbf{B}^{i,i'} | \ldots]$, used in the PLS algorithm instead of $\mathbf{X}$, incorporates blocks $\mathbf{B}^{i,i'}$ (of dimension n by $r^i r^{i'}$) whose columns are the two-by-two products of columns from $\mathbf{B}^i$ and $\mathbf{B}^{i'}$ respectively. In short,
MAPLSS($\mathbf{X}, \mathbf{Y}$) = PLS($\mathbf{B}, \mathbf{Y}$).
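One such interaction block can be formed as a plain column-by-column product of two univariate blocks; a minimal sketch (names ours):

import numpy as np

def interaction_block(Bi, Bj):
    # n x (r_i * r_j) tensor-product block B^{i,i'}: every column of B^i
    # multiplied elementwise by every column of B^{i'}.
    n, ri = Bi.shape
    rj = Bj.shape[1]
    return (Bi[:, :, None] * Bj[:, None, :]).reshape(n, ri * rj)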
In (8), replacing $\mathbf{X}$ by $\left( B_1^1(x^1), \ldots, B_{r^1}^1(x^1), \ldots, B_1^p(x^p), \ldots, B_{r^p}^p(x^p), \ldots, B_1^i(x^i) B_1^{i'}(x^{i'}), \ldots, B_{r^i}^i(x^i) B_{r^{i'}}^{i'}(x^{i'}), \ldots \right)$ and regrouping the terms lead to the fit of the jth response
$\hat{y}^j(M) = \sum_{i=1}^{p} s_M^{j,i}(x^i) + \sum_{\{i,i'\} \in K_2} s_M^{j;i,i'}(x^i, x^{i'}), \quad j = 1, \ldots, q.$   (13)
Similarly to PLSS, as the predictors are standardized, one can measure the influence of each term in (13) by the range of the values of the predictors transformed by the spline functions. Using this range criterion, one can order, in terms of influence, the 2-D and 3-D plots of the curves and surfaces of the ANOVA-type decomposition (13). In the one-response case, more parsimonious models can sometimes be obtained by pruning (13), removing the smallest terms. Notice that in multi-response problems, like multi-class supervised classification, one can only prune the smallest terms common to the q equations.
The model-building stage of MAPLSS, whose aim is mainly the construction of the $K_2$ set, can be summed up by the following steps:
Inputs: threshold ε = 20%, $M_{max}$ = maximum number of dimensions to explore.
Step 0 - The main-effects model PLSS (mandatory), indexed by m. Set the spline parameters (degrees and knots) and c, the GCV penalty parameter. Denote by $GCV_m(M_m)$ and $R^2_m(M_m)$ the GCV and $R^2$ of the main-effects PLS boost.
Step 1 - Individual evaluation of interactions: each interaction, denoted i, is separately added to the main-effects model and scored by
$crit(M_i) = \max_{M \in \{1, \ldots, M_{max}\}} \left[ \frac{R^2_{m+i}(M) - R^2_m(M_m)}{R^2_m(M_m)} + \frac{GCV_m(M_m) - GCV_{m+i}(M)}{GCV_m(M_m)} \right].$
Eliminate the interactions with a negative criterion; order the accepted candidates for entering $K_2$ in decreasing order. Sometimes, in this interaction evaluation criterion, we do not consider the first term, related to $R^2_m(M_m)$: indeed, when the dimension of the design matrix is very high, $R^2_m(M_m)$ is close to one, and in this case it suffices to compute the interaction evaluation criterion simply from the relative gain in GCV(M).
Step 2 - Forward phase: add interactions successively to the main effects.
Set $GCV_0 = GCV_m(M_m)$ and i = 0.
Repeat
  i = i + 1; add interaction i and index the new model with i
  $GCV_i = \min_{M \in \{1, \ldots, M_{max}\}} GCV_i(M)$
UNTIL $GCV_i < GCV_{i+1} - \varepsilon\, GCV_{i-1}$
Step 3 - Backward phase: prune the ANOVA terms of lowest influence. Criterion: the range of the values of the ANOVA terms, and the PRESS to assess the quality of prediction and the dimension M of the model.
In the inputs, the threshold for accepting or rejecting an interaction, ε = 0.2, used in Step 2, is the result of simulations presented in [25]. Step 0 is the mandatory construction of the main-effects PLS boost, in which we set the spline parameters (degrees and knots) as well as c, the penalty coefficient in GCV, to be used in the next steps when interactions are eventually incorporated. Step 1 is the evaluation and ranking of interactions, each interaction being separately added to the main effects. The criterion is the sum of two terms, respectively based on the gain in R² and in GCV; it is used to eliminate interactions with a globally negative gain and to rank (in descending order) the candidates for entering $K_2$. The forward phase, Step 2, is based on the GCV criterion to incorporate the relevant interactions into $K_2$ as well as into the PLS boost (13). The last backward pruning phase, Step 3, is based on the ranking of the terms of the ANOVA decomposition and on the PRESS, to evaluate the quality of prediction of the retained PLS boost model.
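As a purely schematic transcription of the Step 1 screening loop, the following Python sketch may help fix ideas; fit_maplss is a hypothetical helper standing for a full MAPLSS boost fit returning its R² and GCV, and all names are ours rather than part of the authors' R code.

from itertools import combinations

def screen_interactions(p, M_max, fit_maplss):
    # fit_maplss(K2, M) -> (R2, GCV) is a hypothetical helper fitting a
    # MAPLSS boost with interaction set K2 and M components.
    R2_m, GCV_m = fit_maplss([], M_max)        # main-effects reference values
    ranked = {}
    for pair in combinations(range(p), 2):     # all p(p-1)/2 candidate couples
        crits = []
        for M in range(1, M_max + 1):
            R2, GCV = fit_maplss([pair], M)
            crits.append((R2 - R2_m) / R2_m + (GCV_m - GCV) / GCV_m)
        crit = max(crits)                      # crit(M_i) of Step 1
        if crit > 0:                           # eliminate negative global gains
            ranked[pair] = crit
    # accepted candidates, in decreasing order of the criterion
    return sorted(ranked, key=ranked.get, reverse=True)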
4. PLS boost for Classification
Situations in which the response (output) variable takes categorical values constitute an important class of problems. In the particular case of PLS-DA, the n by q learning response matrix $\mathbf{Y}$ is a 0-1 matrix indicating the membership of each observation to one of the q groups. In order to predict the group memberships of observations from the results of PLS-DA, a common approach is simply to assign each observation to the group with the largest predicted indicator variable but, as stated in [11, 41], this strategy is not the best. A suitable geometrical alternative, used in PLS boosting, consists in performing a classical linear discriminant analysis (LDA) using the matrix of components $\mathbf{T}$ as the set of latent discriminant variables. This leads to the use of the Mahalanobis distance [9] on the PLS-DA scores. For $g = 1, \ldots, q$ we define the centroids of the groups by the row vectors $\mu_g = n_g^{-1} \sum_{i \in G} \mathbf{T}_i$, where G is the index set of the observations of group g and $n_g$ is the cardinality of G. A new observation is first projected onto the component space, leading to a component row vector $t_x$. A distance measure, which we denote by $d(t_x, \mu_g)$, can then be computed, and the new observation is assigned to the group for which the distance takes the smallest value. The use of the Mahalanobis distance is a classical choice:
$d(t_x, \mu_g) = (t_x - \mu_g)(\mathbf{T}'\mathbf{T})^{-1}(t_x - \mu_g)'.$
In the following, three examples illustrate and compare the performances of the PLS boost in both the linear and the
non-linear versions.
5. Applications in Agro-Chemistry
To show the potential of the MAPLSS boost regression and of the MAPLSS-DA boost, we present three different real data sets pertaining to soil and its products, of different complexities. In order to decide the optimal model dimension, the predictors to retain and the interactions to include, we consider the GCV criterion in the first example and the PRESS criterion for the remaining two.
In the first example our aim is to predict soil organic carbon (OC) content on the basis of visible-near infrared (vis-NIR,
230-2500 nm) spectral reflectance.
In the second example, we are interested in classifying the level of Nitrate-nitrogen (NO3-N) in 25 soils (three classes of
NO3-N), given the knowledge of 8 chemical and physical characteristics of soils.
Finally, the third example faces the problem of classifying the olive oil variety (three varieties) knowing 13 chemical variables of 57 olive oil samples.
5.1 Case study I: Spectroscopic Prediction of Organic Carbon
In the last decade, PLS has been used extensively by chemometricians to tackle spectroscopic applications with
considerable success [6, 29, 42].
Organic carbon (OC) is the main component of soil organic matter; on average, carbon comprises about half of the mass of soil organic matter. The latter is one of the most important soil components, given its profound effects on many soil functions and properties, especially in the surface horizons [44]. Soil organic matter provides much of the soil cation exchange capacity and water-holding capacity, and it is largely responsible for the formation and stabilisation of soil aggregates. Organic matter in the world's soils contains two to three times as much carbon as is found in all the world's vegetation. Soil organic matter therefore plays a critical role in the global carbon balance, which is thought to be the major factor affecting global warming and the greenhouse effect [45].
Given its relevance, knowledge of the soil organic matter content and of its spatial (mapping) and temporal (monitoring) variability is an essential pre-requisite for sustainable soil management and protection of the environment [46]. Usually, organic carbon is determined in the laboratory and the obtained values make it possible to compute the organic matter content (OM = 1.724 * OC).
Typically, a large number of samples must be collected and analysed in order to assess the spatial and temporal variability of organic carbon. However, conventional methods are expensive and require large amounts of labour and chemical analysis [7]. Therefore, there is a need to develop alternatives to conventional laboratory analysis to measure soil OC. In recent years, visible and near-infrared (vis-NIR) reflectance spectroscopy has proved to be a promising technique for the investigation of soil organic carbon and various other soil properties. Compared to conventional analytical methods, vis-NIR spectroscopy is faster, cheaper and non-destructive, requires less sample preparation and few or no chemical reagents, and is highly adaptable to automated and in situ measurements [6, 47-49].
Reflectance spectroscopy refers to the measurement of spectral reflectance [50], i.e., the ratio of the electromagnetic radiation reflected by a soil surface to that impinging on it [51]. Since the characteristics of the radiation reflected from a material are a function of the material's properties, observations of soil reflectance can provide information on soil properties (e.g. OC content; [52]). In the field, during soil survey, OC is frequently estimated by the visual assessment of soil colour, compared with standard colour charts [43]; colour is a psychophysical perception of a visible reflectance.
Considering that organic carbon affects the whole vis-NIR reflectance spectrum (although its effect on the visible region is more relevant; [53]), and that the accuracy and precision of the visual determination of the Munsell colour of a soil depend on many factors (including the incident light, the condition of the sample and the skill of the person making the match; [54, 55]), it would be useful to predict soil OC from vis-NIR reflectance spectra measurements (Figure 1.1) by the PLS boost regression family.
The objective of the present example is to evaluate the usefulness of vis-NIR reflectance spectroscopy and MAPLSS
boost in predicting soil OC. For this purpose a sample set of 324 soils representative of the pedo-variability of
important agro-ecosystems of southern Italy is used [43].
Figure 1.1: Examples of reflectance spectra (selected from the available dataset) of soils with different OC (and correspondent OM)
content. As the OC content increases, the reflectance decreases in the whole vis-NIR range, while the concavity in the visible
increases.
Soil organic carbon content was determined in the laboratory, using the Walkley-Black method.
Spectral reflectance of the soil samples was measured in the laboratory, under artificial light, using a FieldSpec Pro spectroradiometer (ASD, Boulder, Colorado). This instrument combines three spectrometers to cover the portion of the spectrum between 350 and 2500 nm. The instrument has a spectral sampling distance of ≤ 1.5 nm for the 350-1000 nm region and 2 nm for the 1000-2500 nm region. The spectrometer optic was mounted at a height of 10 cm, so that the measurement spot size was 4.5 cm in diameter. Each soil sample was placed inside a circular black capsule of 5 cm diameter and 0.5 cm depth and levelled with the edge of a spatula to obtain a smooth surface. Soil samples were illuminated with two 100 W halogen lamps, positioned in the same plane, at a 30 degree illumination angle. Measurements were made at the nadir. Radiances were converted to spectral reflectance by dividing the radiance reflected by the target (soil) by that of a standard white reference plate ('Spectralon') measured under the same illumination conditions. To reduce instrumental noise, four measurements for each sample were averaged.
Furthermore, to reduce computational costs prior to the statistical analysis, the reflectance spectra were resampled every 50 nm. In the end, 44 reflectance bands (predictors), labeled from R350 to R2500, were retained. Moreover, the 324 observations were randomly split into two parts, a training set (216) and a test set (108).
In a preliminary step, all measured variables were standardized. Given the high dimension of the model data, we consider the "quick" GCV measure as the accuracy criterion in the PLS and MAPLSS models. To use this criterion, we need to calibrate the constant c following a heuristic strategy: considering that GCV is nothing but a PRESS surrogate, we look for the c value whose GCV best approximates the PRESS value (see Table 1.1). The choice of the B-spline parameters has also been made heuristically, starting with a small degree and a small number of knots and progressively increasing the model complexity.
We have performed a campaign of different PLS boosts, starting from the usual linear PLS, followed by the quadratic and cubic models (splines of degree 2 and 3 with no knots), and finally by more flexible local polynomials introducing one knot in the range of the predictors. To assess the quality of the prediction according to the dimension of the models, three criteria have been used, see Table 1.1. The first is the GCV and the second the R², both internal to the training set. The third criterion is external, based on the test set: the Mean-Squared Error (MSE) of the standardized response OC, which is a prediction criterion. We note first that the gain in prediction supplied by the non-linear models is not spectacular, the best results being those of the quadratic model, with fourteen components according to the MSE prediction criterion.
Criterion        Linear            Quadratic         Cubic             Degree 1, 1 knot at the median
Opt. dim, GCV    GCV(1.6,16)=0.3   GCV(2.4,13)=0.3   GCV(2.4,16)=0.3   GCV(2.4,17)=0.4
Opt. dim, MSE    (15, 0.22)        (14, 0.18)        (10, 0.28)        (16, 0.20)
R2               76.5%             77.0%             74.2%             76.6%

Table 1.1: Quality of the prediction of the soil organic carbon obtained by polynomial PLS models (linear, quadratic, cubic) and by PLSS of degree 1 with one knot at the median. Two criteria are used to find the optimal dimension and to assess the quality of the prediction of OC: the GCV computed on the training set and the MSE on the test set.
In Figure 1.2, we represent the evolution of the linear PLS boost regression coefficients. This plot shows the beta coefficients at the optimal dimension (M = 15). As usual in a linear model, the beta coefficients can take positive and negative values; the most relevant beta coefficients are related to the following reflectance bands: R450, R500, R1400, R1850 and R2200.
Figure 1.2: Evolution of the linear PLS Beta coefficients related to the predictor reflectance bands according to the optimal model
dimension (M=15).
Passing from the linear PLS boost model to the non-linear one, the optimal dimension is lower, M = 14; the accuracy of the model measured by GCV does not change, while the prediction criterion MSE improves and the goodness of fit, R², slightly increases too (see Table 1.1). This result tends to validate the non-linear transformation functions of the spectral data used in the MAPLSS boost. However, among the 946 possible bivariate interactions none was accepted as relevant, so in substance we use the PLSS regression algorithm.
To assess which spectral predictors are the most influential in predicting OC, we look at the evolution of the betas of the transformed predictors (Figure 1.3). We observe that the most important spectral variables are R650, R750 and, more markedly, R1900, R2200 and R2300, that is, two visible-red bands and three near-infrared bands.
Figure 1.3: Influence of the reflectance bands (the predictors) on the soil Organic-Carbon, revealed in the non-linear PLSS boost
(optimal model dimension M=14) by plotting the range of the coordinate spline functions that are pure polynomials (no knots) of
degree 2.
The preliminary investigation presented herein highlights the potential of the proposed statistical method and the higher performance of the non-linear PLS boost compared to the usual PLS regression. However, for the practical prediction of organic carbon content, it needs further investigation, extending the study to a larger number of soil samples, more representative of the pedo-environmental variability of the agro-systems of southern Italy. To this end, research activities are in progress, and the results will be reported elsewhere.
5.2 Case Study II: Nitrate-N concentration in soils.
This second data set arises from an investigation by the Natural Resources Conservation Service (NRCS) of the United States Department of Agriculture regarding the application of soil survey to assess the effects of land management practices on soil and water quality (NRCS, Soil Survey Investigations Report No. 52, 2010) [56].
We focus on nitrate-nitrogen (NO3-N) and other soil properties of 25 soil samples (representative of specific map units) under forest, pasture and cropland in the Cullers Run and Upper Cove Run watersheds, Hardy County, West Virginia. Specifically, the application pursues the objective of discriminating NO3-N classes, using the MAPLSS-DA boost applied to soil properties such as clay, organic matter (OM), cation exchange capacity (CEC), pH, bulk density (Bd) and liquid limit (LLimit), in addition to runoff and leaching. To this purpose, the NO3-N concentration was classified as follows: low (Nlow: < 78 mg kg-1 NO3-N), medium (Nmedium: 78.1-130 mg kg-1 NO3-N) and high (Nhigh: > 130 mg kg-1 NO3-N).
It must be pointed out that this case study, unlike the other two presented herein, has been considered exclusively to assess the performance of the statistical approach under discussion, rather than to provide a practical tool for discriminating NO3-N classes.
Given the small number of samples, to detect the appropriate model dimension we consider only the PRESS criterion together with the number of misclassified samples. In Figures 2.1 and 2.2, we show the PRESS prediction plots obtained by the PLS-DA and MAPLSS-DA models, respectively.
In linear PLS-DA (Figure 2.1) the optimal dimension is 1. Since the model is multi-response, the left side of Figure 2.1 shows the total PRESS value (the maximum dimension to explore has been set to 10) and the right side the partial PRESS values concerning the three NO3-N classes, the different symbols indicating the three responses.
Notice that one can reject this linear approach, since the total PRESS is larger than 3, the number of classes; in that case the prediction is no better than that obtained by the response mean. Indeed, the percentage of misclassified individuals produced by PLS-DA is rather high (68%), with a very low R² = 3.7%.
Figure 2.1: Leave-one-out PRESS criterion in PLS-DA, suggesting rejection of this approach since the optimal PRESS, obtained with 1 dimension, is larger than 3, the number of responses. On the right side are the partial PRESS values concerning the three NO3-N classes.
Using the MAPLSS boost, we observe that among the 45 possible bivariate interactions none was assessed as relevant, thus leading to the main-effects PLSS model. As Table 2.1 shows, the percentage of misclassified soils significantly decreases to 16% (and the goodness of fit significantly increases, R² = 39.3%).
            predicted 1   predicted 2   predicted 3
actual 1         7             0             0
actual 2         1             6             2
actual 3         0             1             8

Table 2.1: Confusion matrix of the three NO3-N classes; percentage of misclassified individuals: 16%.
Looking at Figure 2.2, we learn that the optimal dimension is 2 and the total PRESS value is 2.37, which allows us to accept this model. On the left side, Figure 2.2 displays the total PRESS criterion with respect to the model dimension (as above, the maximum dimension to explore has been set to 10); on the right side it shows the partial PRESS values for each response.
Figure 2.2: Leave-one-out PRESS criterion in PLSS-DA, indicating acceptance of the model with two dimensions, PRESS(0.05,2) = 2.37 < 3. On the right side, the partial PRESS plots concerning the three NO3-N class responses.
Figure 2.3: Circle of correlations in MAPLSS-DA, NO3-N in 3 classes.
Figure 2.4: Observation plot in MAPLSS-DA, NO3-N in 3 classes; circles mark the misclassified observations.
In Figures 2.3 and 2.4, we see the circle of correlations and the observation plot of the final PLSS-DA model. In the circle of correlations, we notice that the responses Nmiddle and Nhigh are highly correlated; this remark suggested aggregating these classes to see how much the diagnostic improves with only two classes of NO3-N concentration, Nlow and Nmiddle-high. Actually, performing this analysis leads to a main effects plus one interaction term MAPLSS-DA additive model. In Figure 2.5, we summarize Step 1 of the model-building phase, which evaluates and ranks each interaction separately added to the pure main-effects model. The interaction between LLimit and Leaching, first in the ranking, is the only candidate (allowing more than a 15% relative gain in PRESS) accepted to enter the model as a new predictor.
Figure 2.5: Ranking of the 45 candidate interactions in MAPLSS-DA; the vertical red line indicates the threshold beyond which there is no gain in the criterion.
In this final model, after introducing the bivariate interaction, the number of misclassified samples is equal to zero (Figure 2.7), the leave-one-out PRESS criterion is equal to 0.72 with 5 components and the goodness of fit, R², increases to 90%.
Figure 2.6: Circle of correlations in MAPLSS-DA; the response variable NO3-N is in 2 classes.
Figure 2.7: Scatterplot of the observations in MAPLSS-DA; the response variable NO3-N is in 2 classes.
In Figures 2.6 and 2.7, we report the correlation and observation plots when considering only two NO3-N classes. The five MAPLSS-DA components allow the two groups of soil samples to be perfectly distinguished. In particular, it is sufficient to look at the first component, which separates the two groups in a satisfactory manner: Nlow for negative values of $t^1$ and Nmiddle-high for mean and positive values of $t^1$.
In order to understand which predictor variables characterize the two NO3-N classes (Nlow and Nmid-high) detected by $t^1$, and how, we have to look at the scatterplot of the observations (Figure 2.7) as well as at the ANOVA plots of the predictors on the first component (Figure 2.8).
Figure 2.8 (panels a-i): ANOVA function plots of the additive decomposition of $t^1$ in the two-group MAPLSS-DA of the NO3-N variable. The predictors are ordered according to the vertical ranges, from left to right and top to bottom. The dashed lines mark the knot positions of the transformed predictors.
In Table 2.2, we summarize how the main discriminant variables characterize and differentiate the two groups.

NO3-N class      Main latent discriminant variable (Figure 2.7)     Main natural discriminant variables (Figure 2.8)
Nlow             t1 low (negative) values                           PH, OM and CLAY extreme (low and high) values
Nmiddle-high     t1 mean (near 0) and high (positive) values        PH, OM and CLAY mean values

Table 2.2: Characterization of the two NO3-N classes by the main discriminant variables.
To interpret the predictor effects on this component, we focus our attention on the predictor functions, which are piecewise linear. Figure 2.8 shows the contributions of the ANOVA terms of each predictor: the three most important predictors are PH, OM and Clay, followed, in order of importance, by BD, Runoff, Leaching, LLimit, the interaction LLimit*Leaching and, in the last position, the variable CEC. In particular, in Figures 2.8a, 2.8b and 2.8c, we notice that mean values of PH, OM, Clay and BD greatly characterize the group Nmiddle-high, while the extreme values of these four variables contribute to the negative values of $t^1$ and hence characterize the group Nlow. Furthermore, in Figure 2.8e we see that high values of Runoff affect the first group (low content of NO3-N), while in Figure 2.8f low values of Leaching influence the group Nlow. The three least influential predictors are LLimit, the interaction LLimit*Leaching and CEC (Figures 2.8g, 2.8h and 2.8i, respectively).
5.3 Case Study III: Classification of olive oil varieties
Olive oil is one of the most important Italian food products and the most widely used edible oil in the whole Mediterranean area [57-59]. Its market has recently expanded due to its highly appreciated organoleptic attributes, significant nutritional value and beneficial effects [60-61]. Furthermore, during the last years the market of olive oils from pure genetic varieties (mono-varietal oils) has been increasing, due to their distinctive and peculiarly intense taste [60, 62].
Olive oils are complex mixtures containing a wide variety of substances whose concentrations may vary depending on the fruit variety; other factors, such as latitude, climatic conditions, irrigation regime, fruit ripening, and harvesting and extraction technologies, also affect the distribution of oil components [60]. Major and/or minor components such as saturated and unsaturated fatty acids, diacylglycerols and triacylglycerols, sterols, phenolic compounds, hydrocarbons, pigments and volatile components have been employed for the characterization and classification of oils from different varieties, using different multivariate statistical methods [57, 63].
The present case study focuses on the application of MAPLSS-DA to the olive oil composition for the characterization
and classification of three important olive oil varieties, namely Ortice (Var1), Ortolana (Var2) and Racioppella (Var3),
typical of the Benevento province area (Campania region, Southern Italy). All these varieties are used for the production
of extra-virgin mono-varietal oils. They represent the most relevant varieties of that province and, as such, they have been considered for the recognition of the Sannio DOP (Protected Denomination of Origin).
For the purposes of the present study, the analytical data of 57 olive oil samples (i.e., 19 samples for each of the three considered varieties) were used. The data are the results of an investigation [47] carried out by the Institute for Mediterranean Agriculture and Forest Systems of the Italian National Research Council, with the financial support of the MIPAF (Ministry of Agricultural and Forestry Policies, Italy). The following 13 oil characteristics are considered: free acidity (FAc), fatty acid composition, total polyphenols (Poly) and ortho-polyphenols (Ort-Poly). Specifically, the fatty acids analysed were Palmitic acid C16:0 (Pal), Palmitoleic acid C16:1 (PalOl), Heptadecanoic acid C17:0 (Hept1), Heptadecenoic acid C17:1 (Hept2), Stearic acid C18:0 (Ste), Oleic acid C18:1 (Ol), Linoleic acid C18:2 (Linol), Linolenic acid C18:3 (Linol2), Arachidic acid C20:0 (Arac) and Eicosenoic acid C20:1 (Eico).
Similarly to the second case study, given the reduced number of samples, only the PRESS criterion has been used to assess the appropriate dimension of the linear and non-linear models. We point out that the PRESS criterion, together with the number of misclassified elements, reflects the accuracy of the models. Following the heuristic strategy to fix the MAPLSS-DA tuning parameters, we set the spline degree equal to 3 and the number of knots equal to one.
In Figures 3.1 and 3.2, we show the PRESS prediction plots obtained respectively by PLS-DA and MAPLSS-DA
models.
Figure 3.1: PRESS criterion to decide the optimal model dimension in PLS-DA: the minimum PRESS value, PRESS(0.05,5) = 1.34, is reached for 5 dimensions, following a leave-one-out cross-validation procedure. On the left side is the total PRESS value (the maximum dimension to explore has been set to 13); on the right side are the partial PRESS values concerning the three olive oil varieties, the different symbols indicating the three varieties (Var1 = Ortice, Var2 = Ortolana, Var3 = Racioppella).
Figure 3.2: PRESS criterion to decide the optimal model dimension in MAPLSS-DA: the minimum PRESS value, PRESS(0.05,4) = 1.07, is reached for 4 dimensions, following a leave-one-out cross-validation procedure. Left and right panels as in Figure 3.1.
In Figure 3.1, we observe that in the linear PLS-DA model the optimal dimension is M=5, with a total PRESS value of
1.34, while in the MAPLSS-DA model (Figure 3.2) the PRESS improves (decreasing to 1.07) and the model dimension is
M=4. On the left side of Figures 3.1 and 3.2, the total PRESS criterion is plotted against the model dimension (the
maximum dimension explored has been set to 13); on the right side, the partial PRESS values are plotted for each
response, the different symbols indicating the three olive oil varieties.
In the PLS-DA model, the percentage of misclassified samples equals 5.4%, with a good goodness-of-fit value,
R2=68.2% (R2 is usually low when the response is a dummy variable). In MAPLSS-DA, the percentage of
misclassified samples decreases markedly to 1.7%, and we also obtain a very satisfactory goodness-of-fit value,
R2=81.7%. Among the 78 possible bivariate interactions, none was accepted as relevant, because the relative PRESS
gain was markedly less than 20%. In the PLS-DA model, three individuals were misclassified, according to
PRESS(0.05,5)=1.34.
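The two accuracy measures used throughout this section can be stated compactly. The sketch below follows our own conventions (all names are hypothetical): each sample is assigned to the class with the largest predicted dummy response, and a candidate bivariate interaction is kept only if it lowers the PRESS by at least 20%.

```python
import numpy as np

def misclassified_pct(Y_pred, y_true):
    # Y_pred: (n x 3) predicted dummy responses; y_true: integer labels.
    # Each sample is assigned to the class with the largest prediction.
    return 100.0 * np.mean(Y_pred.argmax(axis=1) != y_true)

def keep_interaction(press_main, press_with_inter, threshold=0.20):
    # Relative PRESS gain from adding one bivariate interaction to the
    # main-effects model; the interaction is kept only if the gain
    # reaches the threshold.
    return (press_main - press_with_inter) / press_main >= threshold
```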
Notice that, when running the MAPLSS-DA algorithm, we finally prune and improve the model by deleting the variable
Arac, which does not contribute significantly to the fit of any of the three varieties of olive plants. The PRESS criterion
then decreases to 1.03 with 4 components, the percentage of misclassified samples drops to zero, and the goodness of fit
improves to about 91%.
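The pruning step can be read as a backward elimination driven by PRESS. A hypothetical sketch follows, in which press_of is assumed to refit MAPLSS-DA on a subset of predictors and return its leave-one-out PRESS; deleting Arac above corresponds to one pass of this loop.

```python
def prune(predictors, press_of):
    # press_of(subset) -> LOO PRESS of the model refitted on that subset
    # (an assumed callable, standing in for a full MAPLSS-DA refit).
    best = press_of(predictors)
    improved = True
    while improved and len(predictors) > 1:
        improved = False
        for v in list(predictors):
            subset = [p for p in predictors if p != v]
            p_sub = press_of(subset)
            if p_sub < best:                 # dropping v helps: keep the drop
                best, predictors, improved = p_sub, subset, True
                break
    return predictors, best
```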
In Figures 3.3 and 3.4, we show the circle of correlations and the observation plot of the final MAPLSS-DA model,
respectively. In the circle of correlations, we note that the varieties Ortolana and Racioppella (Var2 and Var3) are both
correlated with the first component, but with opposite signs, while the variety Ortice (Var1) is highly correlated
with the second component. All three varieties are well discriminated/explained: Figure 3.4 shows a good
separation among the three groups of olive oil samples, given the chemical characteristics of the olive oils.
Figure 3.3: Circle of correlations in MAPLSS-DA; the response variable is the olive oil variety in three classes.
Figure 3.4: Observation plot in MAPLSS-DA; the response variable is the olive oil variety in three classes.
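A circle of correlations such as Figure 3.3 can be reproduced, under our assumptions, by correlating each displayed column (transformed predictors and dummy responses) with the first two latent components. The sketch below uses matplotlib and hypothetical inputs: T is the (n x 2) matrix of component scores, Z holds the columns to display.

```python
import numpy as np
import matplotlib.pyplot as plt

def correlation_circle(T, Z, names):
    # Correlation of every column of Z with the two components in T.
    r = np.array([[np.corrcoef(z, t)[0, 1] for t in T.T] for z in Z.T])
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.add_patch(plt.Circle((0.0, 0.0), 1.0, fill=False))  # unit circle
    for (cx, cy), name in zip(r, names):
        ax.annotate(name, (cx, cy), ha="center")
    ax.set(xlim=(-1.1, 1.1), ylim=(-1.1, 1.1),
           xlabel="component d1", ylabel="component d2")
    plt.show()
```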
The first component separates variety 2 (Ortolana, correlated negatively with the first component) from variety 3
(Racioppella, correlated positively with the first component), while the second component singles out group 1 (Ortice).
The interpretation of the observation plot is facilitated by the ANOVA plots of the influence of the predictors
(Figures 3.5 and 3.6) on the latent variables (base learners) 𝑑1 and 𝑑2. To understand how the chemical variables
affect the components 𝑑1 and 𝑑2, we focus on the coordinate function plots in Figures 3.5a to 3.5n and Figures 3.6a to
3.6n, respectively (a sketch of the underlying spline expansion is given after Figure 3.6). As usual, the influential
predictors are ordered by decreasing range of their transformed values. In detail, in Figure 3.5a we see that low values
of Poly influence the first component positively (that is, towards Var3, Racioppella), while high values of Ol influence
the first component negatively (that is, towards Var2, Ortolana). Likewise, in Figure 3.5c we notice that low values of
Pal influence 𝑑1 negatively (that is, Var2), while high values of Pal influence 𝑑1 positively (that is, Var3), and so on.
As regards the second component 𝑑2 (Figure 3.6), which mainly characterizes Var1, the most influential predictors are
Poly, Ste and Ol. High values of Poly characterize the second component positively (that is, Var1), as do high values of
Stearic acid (Ste) and middle values of Ol.
[Figure 3.5, panels a-n]
Figure 3.5: ANOVA plots of the first ten retained predictors that most affect the latent variable 𝑑1 in MAPLSS-DA. The predictors
are ordered according to their vertical ranges, from left to right and top to bottom. The dashed lines mark the knot positions of the
transformed predictors (the response variable, olive oil variety, is in three classes).
[Figure 3.6, panels a-n]
Figure 3.6: ANOVA plots of the ten retained predictors that most affect the latent variable 𝑑2 in MAPLSS-DA. The predictors are
ordered according to their vertical ranges, from left to right and top to bottom. The dashed lines mark the knot positions of the
transformed predictors (the response variable, olive oil variety, is in three classes).
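The transformed-predictor curves in Figures 3.5 and 3.6 are, in essence, B-spline expansions of each predictor weighted by the model coefficients for one component. The sketch below builds the degree-3, one-interior-knot basis used here; the knot placement at the median and the use of SciPy (version 1.8 or later, for BSpline.design_matrix) are our assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def spline_basis(x, degree=3):
    # Degree-3 B-spline basis with one interior knot placed, by assumption,
    # at the median of x: degree + 2 = 5 basis functions per predictor.
    knot = np.median(x)
    t = np.r_[[x.min()] * (degree + 1), knot, [x.max()] * (degree + 1)]
    grid = np.linspace(x.min(), x.max(), 200)     # evaluation grid for plots
    B = BSpline.design_matrix(grid, t, degree).toarray()
    return grid, B                                # B has shape (200, 5)

# The curve plotted for predictor j on a component is then grid versus
# B @ w_j, where w_j holds the fitted spline coefficients (hypothetical).
```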
To conclude the interpretation of Figures 3.4, 3.5 and 3.6, in Table 3.1 we describe how the main latent and natural
discriminant variables contribute to characterize and differentiate the three olive oil varieties.
Olive Classes | Latent discriminant variables | Main natural discriminant variables
Ortice (Var1) | 𝑑1 mean (near zero), 𝑑2 high (positive) | Poly high, Ol mean, Pal mean, Linol mean, Ste high.
Ortolana (Var2) | 𝑑1 low (negative), 𝑑2 low (negative) | Poly mean, Ol high, Pal low, Linol low, PalOl low.
Racioppella (Var3) | 𝑑1 high (positive), 𝑑2 low (negative) | Poly low, Ol low, Pal high, Linol high, PalOl high, Ort-Poly high.
Table 3.1: Characterization of the olive oil classes by the discriminant variables.
6. Conclusion
In this paper we have presented the PLS boost model in its multi-response version for classification and regression
studies. Three case studies of different complexities have been discussed, showing the gain in diagnostics provided by
the non-linear additive PLS boost compared to the linear one.
Focusing on the characteristics of noisy, high-dimensional chemical data, we have proposed a statistical method for
non-linear multivariate discriminant analysis, MAPLSS-DA, in the framework of boosting classification techniques. An
application of MAPLSS regression has also been provided for special data such as spectroscopic measurements.
The MAPLSS-DA boost offers advantages over the linear PLS-DA method, especially in situations where the predictors
are very noisy, interaction effects are present, the ratio of sample size to number of variables is low, and outliers may
occur. The performance of the proposed method has been compared to a well-known linear discriminant model,
PLS-DA, in terms of accuracy via two criteria: the number of misclassified elements and the PRESS criterion.
The multi-response model, MAPLSS-DA, extends the PLSS-DA algorithm with an automatic selection of interactions.
Similarly to PLS and PLSS, it avoids the problems of both multicollinearity and ill-conditioning. As shown by the
three case studies, MAPLSS-DA is an efficient discriminant tool in complex real-life contexts.
In the first case study, we analyzed a large number of bands of vis-NIR reflectance spectra to predict OC in soils
from representative agro-ecosystems of southern Italy [29]. Given the huge number of bivariate interactions to evaluate
in order to assess the accuracy of the model, the quick GCV criterion in MAPLSS boost regression proved useful; in the
end, no interaction was added to the model, as there was no relevant gain in accuracy or goodness of fit.
These results demonstrate the potential of combining spectroscopy with non-linear PLS regression.
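To recall why GCV is quick compared to PRESS: in its classical form (due to Craven and Wahba), it approximates the leave-one-out error from a single fit,

\[
\mathrm{GCV}(m) \;=\; \frac{\tfrac{1}{n}\,\lVert \mathbf{Y}-\widehat{\mathbf{Y}}_{m}\rVert^{2}}{\bigl(1-\operatorname{tr}(\mathbf{H}_{m})/n\bigr)^{2}},
\]

where \(\mathbf{H}_m\) is the hat matrix of the \(m\)-dimensional fit and \(\operatorname{tr}(\mathbf{H}_m)\) plays the role of effective degrees of freedom. One fit per candidate model suffices, whereas leave-one-out PRESS requires n refits; the GCV used in MAPLSS [25] adapts this classical form.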
In the second example, we studied the nitrate-nitrogen (NO3-N) concentration in soil samples under forest, pasture,
and cropland in the Cullers Run and Upper Cove Run watersheds, Hardy County, West Virginia [43]. The objective was
to discriminate nitrate-N classes, using the MAPLSS-DA boost applied to other soil properties, such as clay, organic
matter (OM), cation exchange capacity (CEC), pH, bulk density (Bd) and liquid limit (LLimit), besides runoff and
leaching. For this small data set, the model accuracy was assessed through the classical PRESS criterion and
the table of misclassified elements. After a simplification of the data (considering only two classes of nitrate-N), one
candidate bivariate interaction was added to the main effects, which improved the goodness of the model fit.
By successively pruning the model, we obtained optimal results with MAPLSS-DA, as the misclassification
percentage dropped to zero.
Finally, in the third case study, instead of analysing soil characteristics as in the two previous cases, we focused
attention on one of the most important Italian food products, olive oil. The aim was to discriminate
three important olive oil varieties, namely Ortice (Var1), Ortolana (Var2) and Racioppella (Var3), typical of the
Benevento province area (Campania region, Southern Italy). The low number of olive oil samples made it possible to
use the classical PRESS criterion to choose the model dimension and detect the bivariate interactions. None of the
interactions was included in the final model, as the relative gain in terms of PRESS decrease was less than 20%.
After the pruning phase, the results were very encouraging: with a relatively small number of olive oil chemical
characteristics, the percentage of misclassified units was zero and the goodness of fit was very
satisfactory.
In all three case studies, the MAPLSS boost has improved the performance of the linear PLS techniques: regression
splines have allowed us to capture non-linearities and relevant interactions [27], setting PLS within the historical
objective of boosting techniques.
References
[1] Wold, H. Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis
II; P.R. Krishnaiah, Ed.; New York: Academic Press, 1966, pp. 391-420.
[2] Tenenhaus, M. La Régression PLS: Théorie et Pratique. Éditions Technip: Paris, 1998.
[3] Wold, S., Kettaneh-Wold, N., Skagerberg, B. Non Linear Partial Least Squares Modeling. Chemometrics and
Intelligent Laboratory Systems, 1989, 7, 53-65.
[4] Wold, S., Martens, H., Wold, H. The multivariate calibration problem in chemistry solved by PLS method. In:
Lecture Notes in Mathematics, Proceedings of the Conference on Matrix Pencils; Ruhe A.; Kagstrom B., Eds.;
Springer-Verlag, Heidelberg, 1983, pp. 286-293.
[5] Bastien, P., Esposito Vinzi, V. and Tenenhaus, M. PLS Generalised Regression. Computational Statistics and Data
Analysis, 2005, 48(1), 17-46.
[6] Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J. and Skjemstad, J.O. Visible, near infrared,
mid-infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties.
Geoderma, 2006, 131, 59-75.
[7] Viscarra Rossel, R.A. and McBratney, A.B. Laboratory evaluation of a proximal sensing technique for simultaneous
measurement of soil clay and water content. Geoderma, 1998, 85, 19–39.
[8] Sjöström, M., Wold, S., Söderström, B. PLS Discrimination Plots. In: Pattern Recognition in Practice II; Gelsema,
E.S., Kanal, L.N., Eds.; Elsevier: Amsterdam, 1986.
[9] Sabatier, R., Vivien, M. and Amenta, P. Two Approaches for Discriminant Partial Least Squares. In: Between Data
Science and Applied Data Analysis; Schader, M., Gaul, W. and Vichi, M., Eds.; Springer, 2003, pp. 100-108.
[10] Barker, M. and Rayens W. Partial Least Squares for Discrimination. Journal of Chemometrics, 2003, 17, 163-177.
[11] Chevallier, S., Bertrand, D., Kohler, A. and Courcoux, P. Application of PLS-DA in multivariate image analysis.
Journal of Chemometrics, 2006, 20, 221–229.
[12] Qu, H.-N., Li, G.-Z. and Xu, W.-S. An asymmetric classifier based on partial least squares. Pattern
Recognition, 2010, 43, 3448-3457.
[13] Chiang, L. H., Russell, E. L., Braatz, R. D. Fault diagnosis in chemical processes using Fisher discriminant
analysis, discriminant partial least squares, and principal component analysis. Chemometrics and Intelligent
Laboratory Systems, 2000, 50, 243–252.
[14] Durante, C., Bro, R. and Cocchi, M. A classification tool for N-way array based on SIMCA methodology.
Chemometrics and Intelligent Laboratory Systems, 2011, 106, 73–85.
[15] Durand, J.F. and Sabatier, R. Additive Splines for Partial Least Squares Regression. Journal of the American
Statistical Association, 1997, 92(440).
[16] Berglund, A. and Wold, S. INLR, Implicit Non-Linear Latent Variable Regression. Journal of Chemometrics,
1997, 11, 141-156.
[17] Taavitsainen, V.M. and Korhonen, P. Nonlinear data analysis with latent variables. Chemometrics and Intelligent
Laboratory Systems, 1992, 14, 185-194.
[18] Baffi, G., Martin, E.B. and Morris, A.J. Non-linear projection to latent structures revisited: the quadratic PLS
algorithm. Computers and Chemical Engineering, 1999a, 23, 395-411.
[19] Baffi, G., Martin, E.B. and Morris, A.J. Non-linear projection to latent structures revisited: the neural network
PLS algorithm. Computers and Chemical Engineering, 1999b, 23, 1293-1307.
[20] Li, B., Martin, E.B. and Morris, A.J. Box-Tidwell transformation based partial least squares regression. Computers
and Chemical Engineering, 2001, 25, 1219-1233.
[21] Wold, S. Non Linear Partial Least Squares Modeling II. Spline Inner Relation. Chemometrics and Intelligent
Laboratory Systems, 1992, 14, 71-84.
[22] Frank, I. E. A Nonlinear PLS Model. Chemometrics and Intelligent Laboratory Systems, 1990, 8, 109-119.
[23] Durand, J.F. Local Polynomial additive regression through PLS and splines: PLSS. Chemometrics and Intelligent
Laboratory Systems, 2001, 58, 235-246.
[24] Durand, J.F. and Lombardo, R. Interaction terms in non-linear PLS via additive spline transformation. In: Studies in
Classification, Data Analysis and Knowledge Organization; Schader, M., Gaul, W. and Vichi, M., Eds.; Springer,
2003, pp. 22-30.
[25] Lombardo, R., Durand, J.F. and De Veaux, R. Model building in multivariate additive partial least squares splines
via the GCV criterion. Journal of Chemometrics, 2009, 23(12), 605-617.
[26] Lombardo, R., Durand, J.F. and Faraj, A. Iterative design of experiments by boosted PLS models. A case study: the
reservoir simulator data to forecast oil production. Journal of Classification, 2011, 28, DOI: 10.1007/s00357-011.
[27] Durand, J.F. La régression Partial Least-Squares boostée. MODULAD, 2008, 38, 63-86.
[28] Bühlmann, P. and Yu, B. Boosting with the L2 loss: regression and classification. Journal of the American
Statistical Association, 2003, 98, 324–339.
[29] Leone, A.P., Calabrò, G., Coppola, E., Maffei, C., Menenti, M., Tosca, M., Vella, M. and Buondonno, A.
Prediction of soil properties with VIS-NIR-SWIR reflectance spectroscopy and artificial neural networks. A case
study. Advances in GeoEcology, 2008, 39, 689-702.
[30] Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning, Data Mining, Inference and
Prediction, Springer: New York, 2001.
[31] Friedman, J.H. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 2001, 29,
1189–1232.
[32] Bühlmann, P. and Hothorn, T. Boosting algorithms: regularization, prediction and model fitting. Statistical
Science, 2007, 22(4), 477–505.
[33] Bühlmann, P. Boosting for high-dimensional linear models. The Annals of Statistics, 2006, 34(2), 559–583.
[34] Tukey, J.W. Exploratory Data Analysis. Addison-Wesley: Massachusetts, 1977.
[35] Gifi, A. Non-linear Multivariate Analysis, Wiley: Chichester, 1990.
[36] De Boor, C. A Practical Guide to Splines, Springer-Verlag: Berlin, 1978.
[37] Stone, C.J. Additive regression and other nonparametric models. The Annals of Statistics, 1985, 13, 689-705.
[38] Ramsay, J.O. Monotone regression splines in action (with discussion). Statistical Science, 1988, 3, 425-461.
[39] Molinari, N., Durand, J.F. and Sabatier, R. Bounded optimal knots for regression splines. Computational Statistics
and Data Analysis, 2004, 45(2), 159–178.
[40] Eilers, P.H.C. and Marx, B.D. Flexible smoothing with B-splines and penalties. Statistical Science, 1996, 11,
89-121.
[41] Indahl, U.G., Sahni, N.S., Kirkhus, B. and Naes, T. Multivariate strategies for classification based on NIR-spectra,
with application to mayonnaise. Chemometrics and Intelligent Laboratory Systems, 1999, 49, 19-31.
[42] Craven, P. and Wahba, G. Smoothing noisy data with spline functions. Numerische Mathematik, 1979, 31, 377-403.
[43] Leone, A.P. and Escadafal, R. Statistical analysis of soil color and spectroradiometric data for hyperspectral remote
sensing of soil properties (example in a southern Italy Mediterranean ecosystem). Int. J. Remote Sensing, 2001,
22(12), 2311-2328.
[44] Brady, N.C. and Weil, R.R. The Nature and Properties of Soils. Pearson-Prentice Hall: Upper Saddle River, NJ,
2008.
[45] von Lützow, M., Kögel-Knabner, I., Ekschmitt, K., Matzner, E., Guggenberger, G., Marschner, B. and Flessa, H.
Stabilization of organic-matter in temperate soils: Mechanisms and their relevance under different soil conditions.
Eur. J. Soil Sci., 2006, 57, 426–445.
[46] Volkan Bilgili, A., van Es, H.M., Akbas, F., Durak, A. and Hively, W.D. Visible-near infrared reflectance
spectroscopy for assessment of soil properties in a semi-arid area of Turkey. Journal of Arid Environments, 2010,
74, 229–238.
[47] McCarty, G. W., Reeves, J. B., Reeves, V. B., Follett, R. F. and Kimble, J. M. Mid-Infrared and Near-Infrared
Diffuse Reflectance Spectroscopy for Soil Carbon Measurement. Soil Sci. Soc. Am. J., 2002, 66, 640–646.
[48] Leone, A.P., d'Andria, R., Patumi, M. and Scamosci, M. Relationships between the quality of olive oil and
attributes of rural landscapes: a case study in the Telesina Valley, Southern Italy. In: Biotechnology and quality of olive
tree products around the Mediterranean Basin, Proceedings of Olivebioteq 2006, Second International Seminar,
November 5–10 Mazara del Vallo, Marsala, Italy, 2006; Caruso, T.; Motisi, A., Eds.; 2006, 2, pp. 5-11.
[49] Vasques, G.M., Grunwald, S. and Sickman, J.O. Comparison of multivariate methods for inferential modeling of
soil carbon using visible/near-infrared spectra. Geoderma, 2008, 146, 14-25.
[50] Milton, E.J. Principles of field spectroscopy. Int. J. Remote Sens., 1987, 8(12), 1807-1827.
[51] Drury, S.A. Image Interpretation in Geology. Chapman & Hall: London, 1993.
[52] Irons, J.R., Weismiller, R.A. and Petersen, G.W. Soil reflectance. In: Theory and Applications of optical remote
sensing; Asrar G., Ed.; Wiley: New York, 1989, pp. 66-106.
[53] Baumgardner, M. F. and Stoner, E. R. Soil mineralogical studies by remote sensing. In: Transaction, XII
International Congress of Soil Science; February 8-16, New Delhi, India, 1982; Krishan, K., Eds.; 1982, pp. 419-441.
[54] Fernandez, R.N. and Schulze, D.G. Calculation of soil color from reflectance spectra. Soil Sci. Soc. Am. J., 1987,
51, 1277-1282.
[55] Leone, A.P., Ajmar, A., Escadafal, R. and Sommer, S. Relazioni tra colore e risposta spettrale del suolo.
Applicazioni ad un’area di studio dell’Appennino meridionale. Boll. Soc. Ital. Scienza Suolo, 1996, 8, 135-158.
[56] Elrashidi, M.A., West, L.T., Seybold, C.A., Wysocki, D.A., Benham, E., Ferguson, R. and Peaslee, S.D. Application
of soil survey to assess the effects of land management practices on soil and water quality. In: Soil Survey
Investigations Report; United States Department of Agriculture, Natural Resources Conservation Service, National
Soil Survey Center; Ed.; 2010, 52, pp. 21-35.
[57] Agozzino, P., Avallone, G., Bongiorno, D., Ceraulo, L., Indelicato, S. and Vékey, K. Determination of the cultivar
and aging of Sicilian olive oils using HPLC-MS and linear discriminant analysis. Journal of Mass Spectrometry,
2010, 45, 989-995.
[58] Berenguer, M.J., Vossen, P.M., Grattan, S.R., Connell, J.H. and Polito, V.S. Three irrigation levels for optimum
chemical and sensory properties of olive oil. HortScience, 2006, 41, 427-432.
[59] Morales-Sillero, A., Jiménez, R., Fernández, J.E., Troncoso, A. and Beltrán, G. Influence of fertigation in
Manzanilla de Sevilla olive oil quality. HortScience, 2007, 42(5), 1157-1162.
[60] Lerma-García, M.J., Herrero-Martínez, J.M., Ramis-Ramos, G. and Simó-Alfonso, E.F. Prediction of the genetic
variety of Spanish extra virgin olive oils using fatty acid and phenolic compound profiles established by direct
infusion mass spectrometry. Food Chemistry, 2008, 108, 1142-1148.
[61] Gurdeniz, G., Ozen, B. and Tokatli, F. Classification of Turkish olive oils with respect to cultivar, geographic
origin and harvest year, using fatty acid profile and mid-IR spectroscopy. European Food Research and Technology,
2008, 227(4), 1275-1281.
[62] Peres, A.M., Baptista, P., Malheiro, R., Dias, L.G., Bento, A. and Pereira, J.A. Chemometric classification of
several olive cultivars from Trás-os-Montes region (northeast of Portugal) using artificial neural networks.
Chemometrics and Intelligent Laboratory Systems, 2011, 105, 65-73.
[63] Diraman, H., Saygi, H. and Hisil, Y. Relationship between geographical origin and fatty acid composition of
Turkish virgin olive oils for two harvest years. Journal of the American Oil Chemists' Society, 2010, 87(7), 781-789.