
OPEN ACCESS DOCUMENT
Journal in which this paper is published:
Elsevier, Chemometrics and Intelligent Laboratory Systems, 2010, 103 (2), pp. 96-107.

DOI: dx.doi.org/10.1016/j.chemolab.2010.05.020
Title: MCR-BANDS: A user friendly MATLAB program for the evaluation of
rotation ambiguities in Multivariate Curve Resolution
Authors: Joaquim Jaumot (1) and Romà Tauler (2,*)
(1) Department of Analytical Chemistry, University of Barcelona, Diagonal 647,
Barcelona 08028
(2) Department of Environmental Chemistry, IDAEA-CSIC, Jordi Girona 16-20,
Barcelona 08034
(*) e-mail: Roma.Tauler@idaea.csic.es
Abstract:
A new user-friendly graphical interface and a command line MATLAB computer
program for the evaluation of the extent of rotation ambiguities associated with
Multivariate Curve Resolution solutions are presented. Different application
examples are shown, including the simultaneous analysis of multiple data sets and the
implementation of local rank and trilinearity constraints, basic tools to reduce and
eliminate rotation ambiguities. The programs allow an easy check of the extent of
rotation ambiguity remaining in Multivariate Curve Resolution solutions in the
investigation of a particular system, and also a check of the effect of the
applied constraints. In this way, the conditions and limitations for achieving optimal solutions
in Multivariate Curve Resolution are easily assessed.
1. Introduction
Multivariate Curve Resolution (MCR) methods [1] are reaching a mature state in
Chemometrics and they have evolved into a powerful tool for the investigation of many
types of chemical systems. The minimal and basic assumption of all MCR methods is the
fulfillment of a bilinear model describing how the experimental data set arranged in a
data matrix is decomposed into the product of two factor matrices of reduced sizes, one
related to the rows and the other to the columns of the original data matrix. For
instance, in the particular case of a data matrix obtained by liquid chromatography with
diode array detection (LC-DAD), the spectra at different elution times are collected in
the rows of the data matrix, or similarly, the elution profiles at the different wavelengths
are arranged in the columns of the same data matrix. The row factor matrix obtained in
the decomposition of this data matrix will describe the elution profiles of the (co-)eluted
components and the column factor matrix obtained in the same matrix decomposition
will describe the pure spectra of these components. Bilinear models and their
corresponding matrix factor decompositions provide very powerful tools to investigate
and describe many chemical processes and systems when they are measured by modern
multivariate analytical instrumental methods. Among the existing bilinear model based
methods, MCR looks for those matrix decompositions where the two factor matrices
describe as closely as possible the true sources of data variance (experimental
measurements), without knowing them in advance. In this context, MCR methods in
Chemistry have goals similar to those of other Factor Analysis methods [2, 3].
MCR methods differ from other bilinear model based methods like Principal
Component Analysis (PCA, [4]) or Singular Value Decomposition (SVD, [5]) methods
in the way the matrix decomposition is performed. Whereas in PCA or SVD, the factor
matrices are orthogonal and in the directions of maximum explained variances (apart
from factor normalization or scaling), in MCR, constraints are softer but with more
physical meaning, like non-negativity, unimodality, closure or local rank and selectivity
[6, 7]. They can be called ‘natural’ constraints since they are frequently fulfilled by
natural systems. For instance, absorption spectra and concentrations of chemicals can
only be non-negative. Therefore solutions obtained by MCR methods will be more
easily interpretable and closer to the true sources of data variation than those obtained
by methods like PCA or SVD.
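The difference between orthogonal and non-negative factors can be illustrated on a small synthetic case. The following sketch assumes a hypothetical two-component system (the values in C and St are illustrative and are not data from this work) and checks that the second SVD factor mixes positive and negative entries, so that it cannot be read directly as an elution profile or as a spectrum.

```matlab
% Hypothetical two-component system (illustrative values only).
C  = [1 0.2; 2 1; 1 2; 0.2 1];      % 4 x 2 non-negative elution profiles
St = [1 2 1 0.3; 0.3 1 2 1];        % 2 x 4 non-negative pure spectra
D  = C * St;                        % noise-free bilinear data matrix

[U, S, V] = svd(D, 'econ');         % orthogonal factor decomposition

% The first singular vectors of a positive matrix have a single sign.
% The second ones are orthogonal to them, so they must change sign.
u2 = U(:, 2);
v2 = V(:, 2);
hasMixedSignsU = any(u2 > 0) && any(u2 < 0);
hasMixedSignsV = any(v2 > 0) && any(v2 < 0);
fprintf('Second SVD factor has mixed signs: %d (rows), %d (columns)\n', ...
        hasMixedSignsU, hasMixedSignsV);
```

The original non-negative profiles in C and St reproduce the same matrix D, which is why constraints with physical meaning are preferred in MCR.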
However, although MCR solutions have more physical meaning and an easier
interpretation than those obtained by PCA or SVD, they are not unique in the general
case, and they have an unknown amount of ambiguity. Two types of ambiguities are
distinguished in MCR methods: intensity (or scale) ambiguities and rotation ambiguities
[8]. Intensity ambiguities are present in any factor analysis decomposition unless the
scale of one of the two factor matrices is fixed in a particular way. In SVD and PCA
decompositions, loading factors are usually scaled to have the same length, with norm
equal to one (the sum of the squared vector elements is equal to 1). In some cases,
the scale of one of the two factor matrices is previously known, as in chemical
reaction systems where the total mass of constituents is known to be constant (closed
reaction systems). In these cases, the application of closure constraints will eliminate
the scale ambiguity. However, the more critical type of ambiguity, and the more difficult
to avoid, is the so-called rotation ambiguity. In this case, a set of different solutions (and linear
combinations of them) will fit the experimental data equally well, i.e. they will be
equivalent from a mathematical point of view although they will be completely different
from a physical point of view. An obvious way to reduce this type of ambiguity is
the application of more constraints to the solutions, but this should be
done while keeping the physical plausibility of the solutions. Therefore, the use of
orthogonality constraints to provide unique solutions, as in PCA or SVD, is excluded in
MCR methods, since this type of constraint cannot be applied if non-negative
solutions are required.
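The intensity ambiguity can be illustrated with a single component. The following minimal sketch uses a hypothetical elution profile c1, spectrum s1 and scale factor k (all assumed values) and shows that rescaling the two profiles in opposite directions leaves their product unchanged, and that fixing the length of the spectrum to one removes this freedom.

```matlab
c1 = [1; 2; 1; 0.2];          % hypothetical elution profile (column vector)
s1 = [1 2 1 0.3];             % hypothetical pure spectrum (row vector)
k  = 5;                       % arbitrary non-zero scale factor

D1  = c1 * s1;                % rank-one signal contribution of the component
D1k = (c1 * k) * (s1 / k);    % same signal from rescaled profiles
maxDiff = max(abs(D1(:) - D1k(:)));   % zero up to rounding errors

% Fixing the scale: spectrum normalized to unit length, as done for the
% loadings in PCA/SVD; the elution profile absorbs the scale.
s1n = s1 / norm(s1);
c1n = c1 * norm(s1);
fprintf('max difference = %.2e, norm of s1n = %.4f\n', maxDiff, norm(s1n));
```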
Apart from natural constraints, the most powerful strategies to avoid rotation
ambiguities in MCR methods are the use of local rank and selectivity constraints [7, 9],
the extension to the simultaneous analysis of multiple data sets [10] and the use of hard
(deterministic) modeling [11]. When these strategies are applied appropriately, unique
solutions are achieved, as shown in previous works [12, 13]. Among all these
possibilities, the use of the trilinearity constraint [10, 14] in the simultaneous analysis of
multiple data sets is the best choice for the analysis of data sets that fulfill this type of
model, since the same solutions are then obtained as when PARAFAC [15] is used.
Once it is recognized that MCR solutions can have a certain degree of ambiguity, the
question is how to evaluate this degree of ambiguity. Different methods have
already been proposed in the literature for such an evaluation, including
methods for the calculation of the boundaries of the so-called feasible bands, within
which all the feasible solutions should remain. In recent years, different authors
[16, 17] have proposed the calculation of these boundaries using the optimization of a
function describing the relative contribution of each component to the whole measured
system signal. This approach has already been used in the investigation of different
problems [18] and also revised and discussed in other works [19, 20]. There is a need,
however, to extend this approach to general use and to provide freely
available software and tutorials. This is the main goal of this paper: the proposal of a new
user-friendly set of MATLAB programs for the evaluation of rotational ambiguity in
Multivariate Curve Resolution, which will be called MCR-BANDS. Two new
MATLAB programs are presented: the command line mcrbands program (improved
from our previous version, given in ref [17]) and the new mcrbandsg graphical user
interface program, first introduced in this paper. The results obtained by these two
programs are equivalent and the only differences are in the way the results are presented
and in the way the input data are given to the program. In this work, the use of these
two programs and the results of their application are shown for the solution of different
types of problems.
This paper is organized in the following way. First, the basis of the MCR-BANDS
method will be summarized; then the MATLAB programs (command line and GUI) will
be briefly explained and their freely available web addresses given. Finally, different
data examples will be given and resolved using the previously described programs.
The use of these programs will be shown first for a single data set under different
constraint options, describing the different outputs given by the program. Special
attention is paid to the use of selectivity and local rank constraints, and to the reduction
or elimination of rotation ambiguities achieved by their application. The second and third
examples deal with the simultaneous analysis of multiple data sets. In
this case, the implementation of the trilinear constraint for systems fulfilling this type of
data structure is shown in detail, since it eliminates the rotation ambiguities completely.
2. Data Examples
Figure 1 near here
In Figure 1, the sets of elution (concentration) and spectra profiles used for the data
examples of this paper are given. They all refer to the analysis of liquid
chromatography with diode array detection (LC-DAD) data.
In the upper part of Figure 1, a single chromatographic LC-DAD run with a peak
region containing four different components is used as first data example (Example 1).
The spectra of these four components are given in the lower subplot. The inputs used in
the program are the C matrix of elution profiles (size 51 x 4) and the ST matrix of pure
spectra (size 4 x 96) of the 4 components present in the system. The overlap among
elution and spectra profiles is moderate and there are no conditions for the total
resolution of the system using only non-negativity constraints. In particular some of the
components (blue ones) have their profiles totally embedded inside the profiles of the
others, making their resolution without ambiguities especially difficult. However, using
appropriate local rank constraints, resolution conditions can be achieved and
ambiguities are practically eliminated.
Example 2 illustrates the power of the simultaneous analysis of multiple LC-DAD runs
in contrast to their individual analysis. The results show the improvement in the
resolution of the whole system achieved by the simultaneous analysis of the four
chromatographic runs in the second row of plots of Figure 1, each of them containing
the same 4 components coeluted in a different way. Observe that the 4th
chromatographic run included in the analysis is the same as the one examined in detail
in Example 1. The inputs used in the program are the C matrix of elution profiles in the
4 different runs (size 204 x 4) and the ST matrix of pure spectra (size 4 x 96) of the 4
components present in the system. It is also interesting to point out that the order and
shape of the elution profiles of every component were different in the different
chromatographic runs. This was done on purpose to illustrate systems that do not fulfill the
conditions for trilinear models, since in this case it is not possible to define a unique
elution profile (first mode profiles or loadings) for each component in the different
simultaneously analyzed matrices. In fact, the ranks of the two augmented matrices,
row-wise and column-wise, differ considerably, as shown in Table 1. This indicates a strong
departure from the trilinear condition. Therefore, in this case the application of
trilinearity is precluded and the conditions for unique solutions are not assured.
However, as is also seen in Figure 1, some of the runs have better local rank
resolution conditions than others according to the Resolution theorems [9], and therefore
the whole system can benefit from the application of selectivity and local rank
constraints. See for instance that in the fourth run, previously analyzed alone in Example
1, there are concentration selective regions for the elution profiles of components 1 and 4,
and consequently, resolution conditions can also be achieved for components 2 and 3
[9].
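The rank comparison between the two augmented matrices mentioned above can be reproduced on a toy system. The following sketch assumes two runs and two components with small hypothetical profiles (not the data of Example 2): when the elution profiles only change in amount between runs, both augmentations keep the rank equal to the number of components, whereas shifted profiles raise the rank of the row-wise augmented matrix.

```matlab
St = [1 0 1; 0 1 1];                 % 2 x 3 pure spectra (hypothetical)
C1 = [1 0; 1 1; 0 1; 0 0];           % elution profiles in run 1 (hypothetical)

C2tri = C1 * diag([3 1]);            % run 2, trilinear: same shapes, new amounts
C2non = [0 0; 1 0; 1 1; 0 1];        % run 2, non-trilinear: shifted profiles

D1 = C1 * St;

% Column-wise augmentation: runs stacked on top of each other.
% Row-wise augmentation: runs placed side by side.
rankColTri = rank([D1; C2tri * St]);
rankRowTri = rank([D1, C2tri * St]);
rankColNon = rank([D1; C2non * St]);
rankRowNon = rank([D1, C2non * St]);

fprintf('trilinear:     column-wise %d, row-wise %d\n', rankColTri, rankRowTri);
fprintf('non-trilinear: column-wise %d, row-wise %d\n', rankColNon, rankRowNon);
```

In this toy case the two ranks only disagree for the shifted profiles, which is the kind of departure from trilinearity described for Example 2.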
Example 3 is similar to the previous one: the simultaneous analysis of 4 chromatographic
runs with 4 coeluted components in each run (see the third row of plots in Figure 1). In
this case, the inputs used in the program are similar to those described for the previous
Example 2. But now the position and shape of the concentration profiles of every
component are the same in the different runs, and they only differ in their relative
amounts (peak areas). This is the ideal situation for chromatographic separations and for
the achievement of unique resolution conditions without ambiguities. Even if the
resolution theorems [9] are not fulfilled because of the extremely high overlap among
elution profiles and spectra profiles, the use of trilinearity constraints already provides
unique solutions [10, 14].
3. Theory
3.1. MCR methods: fundamentals and ambiguities
All MCR methods are based on the bilinear decomposition of the measured data matrix
according to equation:
\[ \mathbf{D} = \mathbf{C}\mathbf{S}^T + \mathbf{E} = \sum_{n=1}^{N} \mathbf{c}_n \mathbf{s}_n^T + \mathbf{E} = \mathbf{D}^* + \mathbf{E} \]
Equation 1
In this equation, D(I,J) is the measured data matrix with I rows and J columns. For
instance, in the case of LC-DAD UV spectrophotometric data, the rows are the
experimental DAD spectra measured at each retention time (i = 1, …, I) and the columns
of this data matrix are the elution profiles at each wavelength (j = 1, …, J). C(I,N) has the
elution or concentration profiles of the N eluted components and matrix S^T(N,J) has the
UV pure spectra of these N components. The contribution of each component to the
whole measured signal is the rank one matrix obtained by the vector product of its
elution profile by its pure spectrum, i.e. for component n, c_n s_n^T. The E matrix in Equation 1
gives the part of data matrix D which is not described by the matrix product C S^T, like
experimental errors and uncertainties. The main goal of all MCR methods is to obtain
solutions with physical meaning, describing the true profiles of the different
components in the mixture, measured according to the two measurement modes (rows and
columns), and to leave in matrix E only the noise and experimental error. However, in
the general case, and in the absence of any constraint, Equation 1 has an infinite number of
solutions, since there are an infinite number of matrices C and S^T which, when
multiplied, produce the same result, the data matrix D* (apart from noise E).
This indeterminacy is described mathematically by the following equation:
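\[ \mathbf{D}^* = \mathbf{C}_{old}\mathbf{S}_{old}^T = (\mathbf{C}_{old}\mathbf{T}^{-1})(\mathbf{T}\mathbf{S}_{old}^T) = \mathbf{C}_{new}\mathbf{S}_{new}^T \]
Equation 2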
According to Equation 2, any invertible matrix T(N,N) gives a new set of equivalent
solutions of the MCR model. In other words, any linear combination of the C and
S^T solutions will also produce new solutions of the bilinear model. In the Factor Analysis
literature, this indeterminacy is usually referred to as rotation ambiguity. The goal of MCR
methods is to limit rotation ambiguities, trying to eliminate them by the application of
appropriate constraints. Many examples of successful application of constraints in MCR
methods have been described [6, 7, 10-14]. A constraint can be defined as any feature that
the true profiles in the C and S^T matrices are expected to have. The most obvious one
is non-negativity. In the previous example on LC-DAD, elution profiles and pure
spectra of the different eluted components should have positive or null values, but not
negative values. Other constraints like unimodality, closure (mass balance), selectivity,
local rank and hard modeling constraints have been proposed and used successfully [10-14].
When the MCR model of Equation 1 is extended to the simultaneous analysis of
multiple data matrices, additional constraints like the correspondence among
components, their absence or presence in some data matrices, and especially
trilinearity and other multilinearity constraints can be implemented. In many of these
cases, MCR under such constraints provides solutions that can
be considered unique in practical terms, and therefore the obtained results can be used
with confidence. The large experience accumulated in this field is reflected in the large
number of successful applications of MCR in recent years [1, 12, 13].
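The indeterminacy of Equation 2 can be checked numerically. The following sketch is only an illustration under assumed values: the matrices C and St and the transformation T are hypothetical, and T was chosen so that the transformed profiles remain non-negative. It shows that both sets of profiles reproduce the same D* although the profiles themselves differ.

```matlab
% Hypothetical two-component system (illustrative values only).
C  = [1 0.2; 2 1; 1 2; 0.2 1];      % C_old, 4 x 2
St = [1 2 1 0.3; 0.3 1 2 1];        % S_old^T, 2 x 4
T  = [1 -0.1; 0.05 1];              % arbitrary invertible matrix (assumed)

Cnew  = C / T;                      % C_old * inv(T)
Stnew = T * St;                     % T * S_old^T

fitDiff     = norm(C * St - Cnew * Stnew, 'fro');   % ~0: same D*
profileDiff = norm(C - Cnew, 'fro');                % > 0: different profiles
isFeasible  = all(Cnew(:) >= 0) && all(Stnew(:) >= 0);

fprintf('difference in D*: %.2e, difference in C: %.3f, non-negative: %d\n', ...
        fitDiff, profileDiff, isFeasible);
```

Since the new profiles also fulfill non-negativity, this constraint alone does not exclude them, which is the situation that the additional constraints discussed above are meant to resolve.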
3.2. The MCR-BANDS method
The MCR-BANDS method has been described in previous works [17] and already
applied in some examples [18]. Basically, the method is based on a preliminary idea
proposed by Paul Gemperline [16] of calculating the relative contribution of every
component in a mixture, which in our case is evaluated using the
equation:
\[ f_n = \frac{\lVert \mathbf{c}_n \mathbf{s}_n^T \rVert}{\lVert \mathbf{C}\mathbf{S}^T \rVert} \]
Equation 3
where f_n is a scalar value which gives the relative signal contribution of a particular
component n to the whole signal of the mixture of N components
(n = 1, …, N). This relative signal contribution is measured by the quotient of two norms
(Frobenius norm): that of the signal of the considered component n, ||c_n s_n^T||, and
that of the whole signal considering all components of the system together,
||C S^T||.
As shown in Equation 2, the product C S^T produces the same result for any invertible
matrix T. Therefore the value of ||C S^T|| in Equation 3 is constant for any T.
On the contrary, for every component n = 1, …, N, every T matrix will give a
different set of c_n and s_n^T profiles with different shapes, and their product c_n s_n^T will
also be different, as well as its norm ||c_n s_n^T|| and its relative signal contribution f_n
defined by Equation 3. Therefore, the scalar value of f_n will depend on the considered T
matrix. The application of constraints makes only some of these T matrices possible. It
is then possible to look for those T matrices which give the maximum and minimum values
of the relative contribution function for each component of the system, f_n, n = 1, …, N,
called respectively f_n^max and f_n^min. This is the strategy that was implemented in the
MCR-BANDS method [17-19], where for a particular system defined by a set of
profiles in C and S^T, the T_n^max and T_n^min matrices which give the maximum, f_n^max, and
minimum, f_n^min, values of f_n are calculated for each of the components of the system
under a particular set of constraints.
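The behaviour of Equation 3 under a change of T can be verified with a few lines of MATLAB. The following sketch uses a hypothetical two-component system and an arbitrary T (assumed values, not taken from the examples of this work): the norm of the whole signal is the same before and after the transformation, whereas the relative contributions f_n change.

```matlab
% Hypothetical two-component system (illustrative values only).
C  = [1 0.2; 2 1; 1 2; 0.2 1];
St = [1 2 1 0.3; 0.3 1 2 1];
T  = [1 -0.1; 0.05 1];              % arbitrary invertible matrix (assumed)
N  = size(C, 2);

Cnew  = C / T;                      % C * inv(T)
Stnew = T * St;

normD    = norm(C * St, 'fro');          % denominator of Equation 3
normDnew = norm(Cnew * Stnew, 'fro');    % same value for any invertible T

f    = zeros(1, N);
fnew = zeros(1, N);
for n = 1:N
    f(n)    = norm(C(:, n) * St(n, :), 'fro') / normD;
    fnew(n) = norm(Cnew(:, n) * Stnew(n, :), 'fro') / normDnew;
end

disp([f; fnew]);                    % first row: initial f_n, second row: after T
```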
Once a system has been investigated by means of an MCR method, using for instance
MCR-ALS [6, 7] under a set of constraints, and a particular set of solutions C and S^T has been
obtained, the T_n^max and T_n^min matrices corresponding to the f_n^max and f_n^min values of
Equation 3 are calculated. To perform this task, a non-linear optimization with
non-linear constraints from the MATLAB Optimization Toolbox is applied. The sets of
c_n and s_n^T profiles corresponding to each component n = 1, …, N, giving the maximum
and minimum values f_n^max and f_n^min, denoted by c_n^max, s_n^T,max and c_n^min, s_n^T,min respectively,
are then plotted for visual inspection together with the initially proposed profiles, c_n
and s_n^T. In the cases where rotation ambiguities have been practically eliminated by the
application of constraints, the profiles c_n and s_n^T do not change during the optimization
and remain practically constant. This constitutes an easy and adequate way to show
that the obtained solutions are practically unique, without ambiguities. On the other hand,
when f_n^max and f_n^min differ appreciably from f_n, and c_n^max, s_n^T,max and c_n^min, s_n^T,min from the
initial c_n and s_n^T, the presence of rotation ambiguities is confirmed. The extent of
rotation ambiguity can then be evaluated by the difference between the f_n^max and f_n^min values and
displayed graphically by the representation of c_n^max, s_n^T,max and c_n^min, s_n^T,min. It should be
noted that this evaluation should be performed for each component independently and
that the achieved values of f_n and the profiles c_n and s_n^T for every component will depend
on the profiles of the other components. This is actually taken into account during the
optimization of the function f_n, which is performed for every component separately,
allowing the profiles of all other components of the system to change. Therefore, the
number of optimizations is equal to the number of components multiplied by 2 (one to
find the maximum and another one for the minimum of each component). In a
recent work [20], we have shown that for the case of two components, the c_n^max, s_n^T,max
and c_n^min, s_n^T,min profiles obtained by the optimization of the f_n function coincide
with the boundaries of the band of feasible solutions obtained by a systematic grid
search of all the possible solutions. The extension of this to more than two components
has not been possible until now because, for the case of 3 or more components, it is not
possible to define a single pair of maximum and minimum graphical boundaries of
the band of feasible solutions. The concept of maximum and minimum band boundaries
and their graphical representation in a 2D plot is only strictly possible for a system of
two components [20, 21]. However, the optimization of the scalar function
defined in Equation 3 and proposed in this work is always possible, whatever the
number of components present in the system, and the results obtained using this strategy
can be used to measure the amount of rotation ambiguity remaining in the system. It is
in this direction that we have progressed, taking the concept
developed in previous works [17-19] and implementing it in a new graphical user interface
program in this paper to facilitate and popularize its use and its possible generalization
to all types of MCR studies.
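For two components, the idea of a systematic grid search can be sketched in a few lines. The code below is only a rough illustration and not the procedure of ref [20] nor the MCR-BANDS optimization: it assumes a hypothetical system, restricts T to matrices with unit diagonal (f_n does not depend on the scale of the profiles), uses a coarse grid for the two off-diagonal elements and applies non-negativity as the only constraint.

```matlab
% Hypothetical two-component system (illustrative values only).
C  = [1 0.2; 2 1; 1 2; 0.2 1];
St = [1 2 1 0.3; 0.3 1 2 1];
normD  = norm(C * St, 'fro');
f1init = norm(C(:, 1) * St(1, :), 'fro') / normD;   % f_1 for T = identity

steps = -1:0.05:1;        % coarse grid for the off-diagonal elements (assumed)
tol   = 1e-12;            % tolerance for the non-negativity check
f1min = f1init;
f1max = f1init;
for a = steps
    for b = steps
        T = [1 a; b 1];
        if abs(det(T)) < 1e-8
            continue;                 % skip (nearly) singular matrices
        end
        Cn = C / T;                   % C * inv(T)
        Sn = T * St;
        if all(Cn(:) >= -tol) && all(Sn(:) >= -tol)   % feasible solution
            f1 = norm(Cn(:, 1) * Sn(1, :), 'fro') / normD;
            f1min = min(f1min, f1);
            f1max = max(f1max, f1);
        end
    end
end
fprintf('f1: initial %.4f, minimum %.4f, maximum %.4f\n', f1init, f1min, f1max);
```

The difference between the maximum and minimum values found on the grid gives a coarse estimate of the rotation ambiguity of the first component under non-negativity only. Such a search scales poorly with the number of components, which is one reason to prefer the optimization of Equation 3.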
4. Computer programs
4.1. MCR-BANDS command line program
A command line MCR-BANDS program has been implemented in a small set of
MATLAB (The Mathworks Inc, Natick, MA) m functions consisting of: a) the main m
function mcrbands for input/output of parameters and for calling the optimization
routines; b) the m function fmaxmin, which calculates the optimization function
according to Equation 3; c) the m function mycons, defining the equality and inequality
constraints (normalization/closure, non-negativity, local rank/selectivity and
multiset/trilinearity); d) the m function unimodg, which is specific for the definition of
unimodality constraints; and e) the optimization m function fmincon from the MATLAB
Optimization Toolbox (The Mathworks Inc, Natick, MA), which contains the core
functions for the non-linear optimization with non-linear constraints needed to solve the
problem under investigation. Further details about the program structure can be
obtained upon request from the authors. Those interested in the details of the
optimization routines and procedure can consult the MATLAB Optimization Toolbox
tutorial and reference textbook, especially in relation to the optimization of the
non-linear function (Equation 3) under non-linear constraints using the general fmincon
optimization function in this toolbox. In the following, only the details concerning
input and output parameters, as well as further aspects specific to multivariate curve
resolution problems, are given. The main program is mcrbands.m with the following
input and output arguments:
[sband,cband,normband,tband,foptim] = mcrbands(c,s,t0,cknown,sknown);
Mandatory inputs are c, the initial concentration profiles, and s, the initial spectra profiles,
while t0, cknown and sknown are optional inputs. t0 is an initial T matrix different from
the default one (the identity matrix). cknown has the same size as the c
matrix and contains previously known concentration values, to be used either as
equality constraints or as inequality constraints (<= values). For instance, in the case of
local rank and selectivity constraints, those values in cknown belonging to
concentration windows where a component is known not to be present are set to a very
low value (e.g. 10^-7) and the rest of the values (unknown) are left as NaN (Not a Number).
The same applies to sknown (of the same dimensions as the input spectra matrix s)
for known spectral values, if needed as a constraint. These two
parameters are only given as inputs when the corresponding constraint should be applied.
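As an illustration of how such an input could be prepared, the following sketch builds a cknown matrix for a system of the size of Example 1 (51 elution times, 4 components). The window limits and the component index are hypothetical and chosen only for the example; the call to mcrbands is shown as a comment because it requires the program and the c and s matrices in the workspace.

```matlab
nTimes = 51;                         % rows of the C matrix in Example 1
nComp  = 4;                          % number of components in Example 1

cknown = NaN(nTimes, nComp);         % NaN: value unknown, no constraint applied

absentRows = 1:10;                   % hypothetical window where the component is absent
absentComp = 3;                      % hypothetical component index
cknown(absentRows, absentComp) = 1e-7;   % very low value, as suggested in the text

t0 = eye(nComp);                     % default initial T (identity matrix)

% With c (51 x 4) and s (4 x 96) available, the call would follow the
% signature given above:
% [sband, cband, normband, tband, foptim] = mcrbands(c, s, t0, cknown);
```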
All the outputs of the program are optional: sband, cband, normband,
tband, foptim. The first two outputs contain the spectra and concentration band profiles
corresponding to the f_n^max and f_n^min values obtained by the optimization of the function
of Equation 3. normband has the norm of each component's contribution at these
function boundaries. tband has the T rotation matrices of Equation 2, also corresponding
to the function boundaries, relative to the initial estimates provided in the input as the c_inic
and s_inic matrices, which give the f_n^inic values. The last output parameter, foptim, contains the
f_n^max and f_n^min values finally achieved for each component n = 1, …, N in Equations 1, 2
and 3. For comparison and to facilitate their graphical representation afterwards, the
initial input values c_inic and s_inic are also given in the output.
During the execution of the command line program, the user is asked
which types of constraints should be applied. These questions have been
implemented as simple on-line queries at the beginning of the execution of the program.
At present, the following types of constraints have been implemented: normalization
(or closure), non-negativity, unimodality, local rank-selectivity and multiset-multilinear
data analysis. Different options are available in each case, which should be filled in
depending on the nature of the problem under study and on the previous knowledge
about it. For instance, when local rank-selectivity constraints (equality or inequality
constraints) are chosen, the matrices cknown or sknown are given as inputs in the command
line call to the program.
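The way in which constraints of this kind can be passed to a general non-linear optimizer is sketched below. This is not the code of fmaxmin or mycons, whose details are not given here, but a minimal illustration written under several assumptions: a hypothetical two-component system, non-negativity of both profiles as inequality constraints, and a normalization that keeps the length of each spectrum fixed as equality constraints. It requires the fmincon function of the MATLAB Optimization Toolbox.

```matlab
% Requires the MATLAB Optimization Toolbox (fmincon, optimset).
C  = [1 0.2; 2 1; 1 2; 0.2 1];      % hypothetical initial elution profiles
St = [1 2 1 0.3; 0.3 1 2 1];        % hypothetical initial spectra
N  = size(C, 2);
n  = 1;                             % component whose band is evaluated
e  = zeros(N, 1);  e(n) = 1;        % selects column n of C and row n of St
normD    = norm(C * St, 'fro');
rowNorm0 = sqrt(sum(St.^2, 2));     % spectra lengths kept fixed (assumed normalization)

toT = @(t) reshape(t, N, N);        % vector of unknowns -> T matrix
% Relative signal contribution of component n (Equation 3) as a function of T.
fn = @(t) norm(((C / toT(t)) * e) * (e' * (toT(t) * St)), 'fro') / normD;
% Inequalities (<= 0): non-negative profiles. Equalities (= 0): fixed lengths.
cineq   = @(t) -[reshape(C / toT(t), [], 1); reshape(toT(t) * St, [], 1)];
ceq     = @(t) sqrt(sum((toT(t) * St).^2, 2)) - rowNorm0;
nonlcon = @(t) deal(cineq(t), ceq(t));

opts = optimset('Display', 'off', 'TolCon', 1e-8, 'TolFun', 1e-10);
t0   = reshape(eye(N), [], 1);      % identity matrix as initial T
tmax = fmincon(@(t) -fn(t), t0, [], [], [], [], [], [], nonlcon, opts);
tmin = fmincon(@(t)  fn(t), t0, [], [], [], [], [], [], nonlcon, opts);
fprintf('f%d: initial %.4f, minimum %.4f, maximum %.4f\n', ...
        n, fn(t0), fn(tmin), fn(tmax));
```

Two optimizations are run per component, one for the maximum (minimizing the negative of the function) and one for the minimum. Other constraints, such as the equality or inequality values given in cknown, would enter as additional rows of the two constraint vectors.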
During the execution of the program, different graphical windows are opened with the
partial results obtained during the optimization of each component's contribution.
Moreover, partial results of the optimization are also given in the MATLAB command
window, with details of its evolution. Cases where optimal solutions are not
found (unfeasible solutions) can then be easily detected. In some circumstances
the optimization becomes trapped inside an unfeasible region and cannot
proceed towards a feasible region. This situation can be handled in different
ways. One possibility is to change the default minimum incremental value used in
the calculation of the numerical derivatives, which are required to find the
directions where the maximum and minimum of the function defined in Equation 3 are located.
The default minimum value is 10^-5, but it can be reduced to a smaller value
(e.g. 10^-10). In many circumstances an easier way to solve this problem is to change the
initial estimate of matrix t0 in the input. This is especially critical for those cases where
the system is so constrained that the only possible values around the initial solution are
the ones given in the input (unique solutions). In these cases, it is not recommended to
start the optimization with the identity matrix as the initial T value in t0, since it may be
difficult to find any direction in which the optimization can progress (constraints
fulfillment). However, if t0 is different from the identity matrix, the chances of
progressing to the optimum are much higher. The plots provided as outputs display the
profiles corresponding to the f_n^max and f_n^min values as well as f_n^inic for each component. In
the results section some examples are shown and described. The numerical values
of f_n^max, f_n^min and f_n^inic are also given for each component at the end of the optimization.
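A simple way to obtain an initial t0 different from the identity matrix is to add small off-diagonal terms to it. The sketch below is only one possible choice: the size of the perturbation (delta) is a hypothetical value, and the only property checked is that the resulting matrix remains invertible, as required by Equation 2.

```matlab
N     = 4;                                    % number of components (as in Example 1)
delta = 0.01;                                 % hypothetical size of the perturbation

t0 = eye(N) + delta * (ones(N) - eye(N));     % identity plus small off-diagonal terms
isInvertible = (rank(t0) == N);               % T must be invertible (Equation 2)

disp(t0);
% t0 can then be given as the third input argument of the command line call:
% [sband, cband, normband, tband, foptim] = mcrbands(c, s, t0);
```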
4.2. MCR-BANDS graphical user interface program
To facilitate the use of the MCR-BANDS program previously described, a new
Graphical User Interface (GUI) in the MATLAB environment is presented in this work.
This new GUI program uses most of the functions described before for the command
line version, such as fmincon, fmaxmin or mycons. The new interface is launched by the
mcrbandsg function and then the main window shows up (Figure 2). In this window,
three different frames can be clearly distinguished. In the first one (Obligatory
Parameters), the user has to select the initial concentration and spectra profiles, c_inic and
s_inic, in the corresponding pop-up menus. They should already be present in the
MATLAB workspace before the program is launched. In the second frame
(Constraints), the user has to select which of the available constraints should be applied
during the optimization. For every constraint (normalization, non-negativity,
unimodality, trilinearity and equality for concentration and/or spectra), the user can also
select from a new pop-up menu which kind of implementation of the selected constraint
is desired (for instance, non-negativity only for the concentration profiles). Finally, the
third frame includes the Optional Parameters of the optimization. Here, in the
corresponding pop-up menu, the initial default value of the T matrix can be modified by
selecting a different variable from the MATLAB workspace, as described
for the command line program. In this frame, other optimization parameters
such as the tolerance on constraint violation (TolCon), the termination tolerance on the
function value (TolFun), the maximum number of iterations, or the maximum and minimum
incremental values in the calculation of numerical derivatives can be changed. However, in
general, it is recommended not to modify these values and to check first whether the
optimizations are successful using the default values. If problems are encountered, and with some
experience, these values can be tuned to deal with especially difficult cases. Also, in this
frame, three additional items are activated when the user selects certain constraints. For
instance, if the user changes the default value in the trilinearity constraint pop-up menu,
the “Number of experiments” edit box is activated and the user is asked to confirm
(or to modify) the number of analyzed experiments. The same happens when
equality or inequality constraints are applied. In this case the Known Concentrations and
Known Spectra pop-up menus are activated. These pop-up menus allow the
user to select the corresponding variable from the MATLAB workspace. Finally, there are
two additional parameters that only affect the presentation of the optimization
results. First, there is a check-box that allows the user to decide whether the program will
show plots during the optimization. Second, there is an option to save the results in a
structure array named by the user, in addition to the structure variable mcr_bands.results
given as the default output.
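For readers who want to relate these optional parameters to the options of fmincon, the following sketch builds an options structure with optimset. TolCon and TolFun are the names quoted above; MaxIter, DiffMinChange and DiffMaxChange are the standard fmincon option names for the maximum number of iterations and for the incremental values of the numerical derivatives, and their correspondence with the GUI fields is an assumption. All numerical values are illustrative, and the Optimization Toolbox is required.

```matlab
% Requires the MATLAB Optimization Toolbox for the fmincon-specific options.
opts = optimset('TolCon', 1e-6, ...          % tolerance on constraint violation
                'TolFun', 1e-6, ...          % termination tolerance on the function value
                'MaxIter', 400, ...          % maximum number of iterations
                'DiffMinChange', 1e-10, ...  % minimum increment in numerical derivatives
                'DiffMaxChange', 1e-1);      % maximum increment in numerical derivatives

disp(opts.TolCon);
disp(opts.DiffMinChange);
```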
Figures 2, 3, 4 and 5 near here
When the “Optimize” button is clicked, the optimization is carried out and, at the end of
it, the “Results of the optimization” window is shown (Figure 3). In the upper part of
this window, the initial, maximum and minimum boundaries of the concentration and
spectra profiles are given. In both cases, the initial profiles are plotted as dotted lines
and the calculated band boundaries are plotted as solid lines of the same color for every
considered component. Additional information about the optimization is also given
in this window. First, it reports how the optimization procedure
terminated. There are three different options: convergence, divergence or maximum
number of iterations exceeded. If the user wants more detailed information, clicking on
the “Numerical Value” button shows the final termination values for the optimization of
each component, indicating which situation was reached in the
optimization of each profile. Second, the program reports the total number of
constraints active at the end of the optimization. Again, the user can obtain more
detailed information by clicking on the “Constraints used” button. In the newly opened
window (Figure 4), the active constraints are indicated. For instance, in the example of
Figure 4 it can be seen that normalization constraints are active. Finally, the
final values of the optimization function in every optimization for
each profile are available from the “See information” button. When this is clicked, a
new window is launched (Figure 5). In this window, for each component, the function
values for the initial profiles, for the maximum and minimum
optimized functions, and for the difference between them are given. This allows an
easy check of the extent of the rotation ambiguity remaining in the input
initial solution. A difference close to zero means that there is practically no
remaining rotation ambiguity. In addition, information about the optimized T rotation
matrices for both the MCR and SVD solutions is given.
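The quantities reported in this window can be illustrated with a small numerical sketch. It assumes that the optimized function is the relative signal contribution of component n, that is, the Frobenius norm of c_n s_n^T divided by the Frobenius norm of C S^T (as in reference [17]). The two-component profiles, the one-parameter rotation matrix T and all function names are hypothetical and are not part of MCR-BANDS; a coarse grid scan stands in for the non-linear constrained optimization performed by the program.

```python
from fractions import Fraction
from math import sqrt


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def frob(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sqrt(sum(float(x) ** 2 for row in A for x in row))


def contribution(C, St, n):
    """Assumed optimization function: ||c_n s_n^T|| / ||C S^T||."""
    c_n = [[row[n]] for row in C]  # n-th concentration profile (column of C)
    s_n = [St[n]]                  # n-th spectrum (row of S^T)
    return frob(matmul(c_n, s_n)) / frob(matmul(C, St))


def rotate(C, St, t):
    """Return C*T and inv(T)*S^T for the illustrative T = [[1, -t], [0, 1]]."""
    return matmul(C, [[1, -t], [0, 1]]), matmul([[1, t], [0, 1]], St)


def non_negative(A):
    """True when no element of the matrix is negative."""
    return all(x >= 0 for row in A for x in row)


# Hypothetical initial solution: 4 elution times x 2 components, 2 x 3 wavelengths.
C_inic = [[1, 1], [2, 2], [1, 3], [0, 1]]
St_inic = [[1, 2, 0], [0, 1, 3]]
D = matmul(C_inic, St_inic)

# Scan t and keep only rotations that satisfy non-negativity in both modes.
feasible = []
for k in range(-10, 21):
    t = Fraction(k, 10)
    C_t, St_t = rotate(C_inic, St_inic, t)
    assert matmul(C_t, St_t) == D  # every rotated pair reproduces D exactly
    if non_negative(C_t) and non_negative(St_t):
        feasible.append((t, [contribution(C_t, St_t, n) for n in range(2)]))

f_inic = [contribution(C_inic, St_inic, n) for n in range(2)]
f_min = [min(f[n] for _, f in feasible) for n in range(2)]
f_max = [max(f[n] for _, f in feasible) for n in range(2)]
for n in range(2):
    print(f"component {n + 1}: f_inic={f_inic[n]:.3f} "
          f"f_min={f_min[n]:.3f} f_max={f_max[n]:.3f} "
          f"difference={f_max[n] - f_min[n]:.3f}")
```

In this toy case every rotated pair of profiles reproduces the same data matrix, yet the contribution of each component changes with t, so the difference between the maximum and minimum values is clearly larger than zero. Non-negativity alone limits the feasible rotations to a finite interval of t, which is what bounds the band.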
4.2.1. Software details and implementation
The MCR-BANDS GUI program has been implemented with MATLAB 7.6 and tested
in different MATLAB versions (from 7.0 to 7.9) and operating systems. In addition to
the MCR-BANDS program, the fmincon.m function from the MATLAB Optimization Toolbox
is needed to perform the non-linear optimization with non-linear constraints. The results
obtained by the MCR-BANDS program are stored in a MATLAB data structure. This
data structure contains different substructures in which all the information
regarding the optimization process and the presentation of results is organized. For
instance, they include the ‘Data’ structure in which the initial concentration and spectra
profiles are saved, the ‘Constraints’ structure in which the information about which
constraints are applied and how is stored, and the ‘Results’ structure in which all the
results of the optimization (band boundary profiles, termination of the optimization,
number of iterations, …) are stored.
Both the MCR-BANDS command line and graphical user interface programs can be
freely downloaded from the MCR-ALS webpage (www.mcrals.info) together with some
example data sets for testing. Once the program is downloaded, it is only necessary
to copy the MATLAB m and fig files to the selected folder, which should be added to
the MATLAB path.
5. Results
Table 1 near here
In Table 1, the results of the rank analysis by Singular Value Decomposition of the
different data sets used as examples in this work are given. For the purposes of this
study, only the component profiles are given. The reason is that the MCR-BANDS
procedure described here was used to evaluate the extent of rotation ambiguity
associated to a particular MCR solution (a particular set of profiles defining one specific
chemical system), once it has been obtained using for instance MCR-ALS. Therefore,
this procedure is applied to the data matrix described by these profiles in absence of
noise, i.e. to matrix D* in Equation 1 above. Four components were present and, since no
noise was present, four non-zero singular values were obtained in every case except for
the row-wise augmented data matrix of Example 2, whose four matrices do
not follow a trilinear model. In this case, a larger number of singular values was
needed to explain the data variance of the row-wise augmented data matrix, which could not
be described by only four elution profiles. This is simply a consequence of the fact that
the shapes and positions of the elution profiles of the four coeluted components are
different in the four runs.
This example was used here to illustrate the case of a three-way data set without
a trilinear data structure. On the contrary, data Example 3 gave the same number of
components for the row-wise and column-wise augmented data matrices (equal to four),
illustrating that in this case the data fulfill the trilinear model.
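The rank behaviour described above can be reproduced on a deliberately small, noise-free example. The sketch below is only an illustration under stated assumptions: it uses two components and two runs instead of four, made-up integer profiles, and exact rank computation by Gaussian elimination in place of the singular values of Table 1. The function names are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(A):
    """Exact rank by Gaussian elimination on rational numbers."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def col_aug(A, B):
    """Column-wise augmentation, [A;B] in MATLAB notation (rows stacked)."""
    return A + B


def row_aug(A, B):
    """Row-wise augmentation, [A,B] in MATLAB notation (columns appended)."""
    return [ra + rb for ra, rb in zip(A, B)]


St = [[1, 2, 0], [0, 1, 3]]                # two hypothetical pure spectra
C1 = [[1, 0], [2, 1], [1, 2], [0, 1]]      # elution profiles in run 1
C2_shift = [[0, 0], [1, 0], [2, 1], [1, 2]]  # run 2: same peaks, delayed one point
C2_tri = [[2 * a, 3 * b] for a, b in C1]   # run 2: same shapes, other amounts

X1 = matmul(C1, St)
X2n = matmul(C2_shift, St)   # non-trilinear partner of X1
X2t = matmul(C2_tri, St)     # trilinear partner of X1

print("non-trilinear:", rank(col_aug(X1, X2n)), rank(row_aug(X1, X2n)))
print("trilinear:    ", rank(col_aug(X1, X2t)), rank(row_aug(X1, X2t)))
```

With shifted elution profiles the column-wise augmented matrix keeps a rank equal to the number of components, whereas the row-wise augmented matrix needs an extra factor. When the runs differ only in the relative amounts, both augmentations give the same rank.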
Table 2 near here
In Table 2 a summary of the results obtained in the evaluation of rotation ambiguity for
the three different data examples is given. In this Table the values of fnmin, fninic, fnmax
and of the difference between fnmax and fnmin are given for every component for
the different analyzed data sets considering different types of constraints.
Figures 6 and 7 near here
In the analysis of the individual data matrix corresponding to the fourth single LC-DAD
run with 4 coeluted components, results of three optimizations are shown. The first
optimization is for the case where only non-negativity and normalization constraints are
applied. The results showed that in this case rotation ambiguities were rather large,
since for the four components n=1,2,3,4, the differences between the maximum and
minimum optimization function values, fnmax - fnmin, were also rather large (0.214, 0.481,
0.269, 0.298 in Table 2, respectively for the four components). In Figure 6, the profiles
giving the fnmax and fnmin values (continuous blue lines) together with the theoretical
profile (broken red line) are shown for the case of non-negativity and normalization
constraints. Some of these profiles (Figure 6) are unreasonable
from a chromatographic point of view. For instance, the second and fourth concentration
elution profiles showed a very broad profile for fnmax with a double peak. Adding
the unimodality constraint improved the results and the difference between fnmax and
fnmin became smaller, but was still large (0.188, 0.321, 0.255, 0.120 in Table 2). When
local rank/selectivity constraints were also applied to this data set, the difference
between the values of fnmin and fnmax became very small (only 0.019, 0.019, 0.014,
0.024 in Table 2), indicating that the rotation ambiguities were practically
eliminated. The profiles corresponding to the extreme values of fn, fnmax and fnmin, were
practically equal to those corresponding to fninic. In Figure 7, the effect of constraint
application is shown in more detail for the elution profile corresponding to the fourth
component of this fourth chromatographic run. Application of only the non-negativity
constraint leaves a wide range of possible elution profiles (continuous blue lines),
including those with unreasonable double peaks. The spectra profiles can also be very
different if only this non-negativity constraint is applied (continuous blue lines).
When unimodality was applied, this range of possible solutions was narrowed
considerably (dotted lines), and when local rank constraints were applied (dashed lines),
the corresponding fnmax and fnmin component profiles were so close to the
theoretical one (dashed blue and broken red lines) that they can hardly be distinguished.
This proves again the importance of using selectivity/local rank constraints to get
optimal solutions in MCR analysis, in agreement also with results found using MCR
methods like Window Factor Analysis [22], HELP [23] and Gentle [24]. In
practical cases, for the analysis of experimental systems, the information necessary for
the application of the local rank/selectivity constraints can be obtained from preliminary
Evolving Factor Analysis [25] of the data sets or from previous knowledge of the system
concerning the selectivity in the profiles of every component in the two modes of
measurement. See references [6, 7, 22-25] for more details about the application of this
important constraint.
Figure 8 near here
In the case of four LC-DAD runs, each one with 4 coeluted components and a
non-trilinear structure, two different optimizations were performed. In the first optimization,
the extent of rotation ambiguities was evaluated using only non-negativity and spectra
normalization constraints. The differences between fnmin and fnmax were always relatively
high, 0.280, 0.407, 0.130 and 0.311, respectively for the four components, showing
again that the extent of rotation ambiguities was also large (see also Figure 8). If the
local rank information about the fourth chromatographic run is introduced (2nd
optimization performed) as an additional constraint, the differences between fnmin and
fnmax were drastically reduced (roughly tenfold or more) to only 0.027, 0.028, 0.014, 0.021.
This proves again [6, 7] the importance of this type of constraint. In Figure 8, the fnmax
and fnmin profiles (continuous blue lines) for the four concentration profiles in the four
chromatographic runs and their corresponding spectra profiles are given for the case
when only non-negativity constraints were applied. The difference between the
continuous blue lines of each component and its theoretical profile (broken red
line) is large. The fnmax and fnmin profiles for the case when selectivity or local rank
constraints were applied were practically equal to the
theoretical profile, fninic (broken red line in Figure 8).
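The size of this reduction can be checked directly from the differences quoted above for Example 2. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Differences fnmax - fnmin quoted in the text for Example 2 (components 1 to 4).
width_nonneg = [0.280, 0.407, 0.130, 0.311]      # non-negativity + normalization
width_local_rank = [0.027, 0.028, 0.014, 0.021]  # local rank constraint added

# Factor by which the band of each component narrows when local rank is used.
factors = [w0 / w1 for w0, w1 in zip(width_nonneg, width_local_rank)]
for n, factor in enumerate(factors, start=1):
    print(f"component {n}: band width reduced {factor:.1f} times")
```

For three of the four components the band becomes more than ten times narrower, and for the third component the factor is slightly above nine.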
Figure 9 near here
In the case of the four LC-DAD runs with a trilinear structure, two different
optimizations were also performed. In the first one only non-negativity and normalization
constraints were applied and in the second optimization the trilinearity constraint was
also applied (in addition to non-negativity and spectra normalization). This trilinearity
constraint completely eliminates the rotation ambiguity, as shown by the fnmin and fnmax
values. Their differences were 0.738, 0.457, 0.084, 0.468 for the case without
application of the trilinearity constraint and 0, 0, 0, 0 for the case when the trilinearity
constraint was applied. Therefore, in this last case, rotation ambiguity was totally
eliminated, as expected for a data set fulfilling the trilinear model. In Figure
9, the fnmin and fnmax profiles (continuous blue lines) and fninic (broken red line) are
shown for the case where only non-negativity and spectra normalization constraints
were applied. Again, a broad separation between the fnmin and fnmax continuous blue profile
lines is observed for this case. When trilinearity constraints were applied, the fnmin and
fnmax profiles coincided totally with the red broken line of the initial
profiles and they are not shown. In this optimization, because it was difficult
to move from the initial values, find feasible solutions and obtain
fnmin and fnmax profiles different from fninic, the default initial rotation matrix T
(the identity matrix) was changed to a different, arbitrary non-identity T rotation
matrix. This allowed the optimization to proceed adequately and to confirm that the
initial solution was unique because the system fulfilled the trilinear condition.
6. Reviewer assessments
M. de Luca
Department of Pharmaceutical Sciences, University of Calabria, Via P. Bucci, 87036
Rende, Cosenza, Italy
“The Authors have presented an important innovation in multivariate curve
resolution analysis dealing with a graphical user interface able to evaluate the
rotation ambiguities in MCR solutions. The software, in both command line and
GUI versions, has been implemented in MATLAB environment, which is one of
the most used tools in chemometric field.
The arrangement of the data matrices in the input step is well built and the
expression of the results, through tables and plots, is clear and complete.
As described in the text, the size of the band boundaries depends on the
constraints that are imposed, by considering the data origin and the studied
chemical system. GUI permits an easy selection of the different constraint
combinations and it is besides possible to follow the change of the band
boundaries varying the set of constraints. In this context, I would like to suggest
an issue for a future development of GUI. Since the matrices of the initial
concentrations and spectra profiles often come from MCR-ALS analysis, it
could be an interesting solution to have unique software able to perform
simultaneously MCR-ALS analysis and evaluate the respective band boundaries.
The examples reported in the text and the guide contained in the web site seems
exhaustive. The multivariate curve resolution website is an excellent way to
share the chemometric methods with the scientific community.”
H. Abdollahi
Faculty of Chemistry, Institute for Advanced Studies in Basic Sciences, 45195-159,
Zanjan, Iran
Evaluation of the extent of ambiguity in MCR solutions is a serious challenging
problem in chemometrics. There is a demand for easily available software
relevant to evaluate the rotation ambiguity accompany with MCR solutions for
general applications. This time Tauler group has developed a Matlab program
for estimating the rotational ambiguity in MCR methods. Now the users of MCR
methods can easily check the quality and the validity of their solutions.
Assessment of rotation ambiguity in this program is based on signal contribution
function, a scalar function which has the ability to define maximum and
minimum band boundaries for profiles of each component even in a multicomponent system. The software provides a clear graphical user interface, which
allows an easy interaction with the program, and represents a useful tool for
evaluating the rotational ambiguity under different options for imposing the
constraints. The software runs according to the features described in the
documentation correctly. No doubt, it represents a significant contribution as a
tool for chemometric analyses.
7. Conclusions
The use of the proposed MCR-BANDS program in its two implementations, the
command line implementation and the graphical user interface implementation, allows
for an easy check of the extent of rotation ambiguities associated to a particular MCR
solution under a set of constraints. This opens a systematic way to validate the quality
of MCR solutions and to assess their reliability. These two programs can also be used as
a test of possible scenarios for data systems having different structures and resolution
conditions. Most of the knowledge accumulated during the recent years about
Multivariate Curve Resolution methods, their advantages and limitations, can be easily
checked using the proposed MCR-BANDS program.
The MCR-BANDS command line and MCR-BANDS GUI programs are complementary tools to
be used to assess the quality of MCR results. The MCR-BANDS program can be used
together with the MCR-ALS GUI for data analysis and quality
assessment of the obtained results in a single step. Finally, since one of the possible
drawbacks for the widespread use of the MCR-BANDS program in the future can be its
dependence on the MATLAB Optimization Toolbox, an executable GUI version of the
same program is also being prepared at present. This stand-alone software will be an
adaptation of the program described in this work and its operation will be practically the
same.
8. Figure Captions
Figure 1. Component profiles used to prepare the different data examples. In the upper
part of the Figure, chromatographic elution profiles of the four pure components in the
chromatographic runs are given. Different components are identified by different line
colors. In the first row of the Figure, elution profiles of the pure components in a single
chromatographic run are given. In the second row of the Figure, elution profiles of the pure
components in four chromatographic runs are given (Run 4 is the same
chromatographic run used in Example 1). Observe that in this case these profiles do not
match a trilinear model (the positions and shapes of the elution profiles of every
component change depending on which chromatographic run is considered). In the third
row of the Figure, elution profiles of the pure components in four chromatographic runs
are again given. Observe that in this case these profiles match the trilinear model
(the positions and shapes of the elution profiles of every component are the same
whatever the chromatographic run, and they differ only in their relative amounts or
intensities). In the fourth row of the Figure, the pure spectra of the four coeluted
components are given and identified by their different colors. These spectra have been
used in the three examples described in the text.
Figure 2. GUI windows of the MCR-BANDS program. Main window for the data
input and parameter selection.
Figure 3. GUI windows of the MCR-BANDS program. Summary of Results window.
Figure 4. GUI windows of the MCR-BANDS program. Applied constraints window.
Figure 5. GUI windows of the MCR-BANDS program. Details about optimization
results window.
Figure 6. MCR-BANDS results when applied to data Example 1 and only non-negativity
and spectra normalization constraints were applied. On the left, each subplot
gives the results obtained for the elution profile of every component. On the right, each
subplot gives the results obtained for the spectrum profile of every component. Blue
lines are the profiles corresponding to the maximum and minimum component relative
contribution optimization function, fnmin and fnmax. Red lines are initial postulated
profiles.
Figure 7. Detail of MCR-BANDS results for the elution (7a) and spectra (7b) profiles
of the fourth component of data Example 1 when different constraints were applied: a)
non-negativity and spectra normalization with continuous blue lines; b) non-negativity,
spectra normalization and unimodality with dotted blue lines; c) non-negativity, spectra
normalization and selectivity/local rank with dashed blue lines. Initial elution profile
with red line.
Figure 8. MCR-BANDS results when applied to data Example 2 (simultaneous
analysis of the four chromatographic runs whose elution profiles are given in second
row of plots of Figure 1, non-trilinear case) and only non-negativity and spectra
normalization constraints were applied. On the left, each subplot gives the results
obtained for the elution profiles of every component in the four different
chromatographic runs (from left to right). On the right, each subplot gives the results
obtained for the spectrum profile of every component. Blue lines are the profiles
corresponding to the maximum and minimum component relative contribution
optimization function, fnmin and fnmax. Red lines are the initial postulated profiles.
Figure 9. MCR-BANDS results when applied to data Example 3 (simultaneous analysis
of the four chromatographic runs whose elution profiles are given in the third row of
plots of Figure 1, trilinear case) and only non-negativity and spectra normalization
constraints were applied. On the left, each subplot gives the results obtained for the
elution profiles of every component in the four different chromatographic runs (from
left to right). On the right, each subplot gives the results obtained for the spectrum
profile of every component. Blue lines are the profiles corresponding to the maximum
and minimum component relative contribution optimization function, fnmin and fnmax.
Red lines are the initial postulated profiles. When trilinearity constraint was applied
MCR-BANDS results were coincident with the initial profiles red line (no rotation
ambiguities).
9. References
[1] S.D. Brown, R. Tauler, B. Walczak, Comprehensive Chemometrics: Chemical and
Biochemical Data Analysis, Elsevier Science, Amsterdam, The Netherlands, 2009,
pp. 2.15-12.24.
[2] W.H. Lawton, E.A. Sylvestre, Self Modeling Curve Resolution, Technometrics, 13
(1971) 617-633.
[3] E.R. Malinowski, D.G. Howery, Factor Analysis in Chemistry, 3rd ed., Wiley,
New York, US, 2002.
[4] I.T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York, US,
2002.
[5] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins
University Press, Baltimore, US, 1996.
[6] R. Tauler, A. Smilde, B. Kowalski, Selectivity, Local Rank, 3-Way Data-Analysis
and Ambiguity in Multivariate Curve Resolution, Journal of Chemometrics, 9 (1995)
31-58.
[7] J. Jaumot, R. Gargallo, A. de Juan, R. Tauler, A graphical user-friendly interface for
MCR-ALS: a new tool for multivariate curve resolution in MATLAB, Chemometrics
and Intelligent Laboratory Systems, 76 (2005) 101-110.
[8] E. Spjotvoll, H. Martens, R. Volden, Restricted least squares estimation of the
spectra and concentration of two unknown constituents available in mixtures.,
Technometrics, 24 (1982) 173-180.
[9] R. Manne, On the resolution problem in hyphenated chromatography, Chemometrics
and Intelligent Laboratory Systems, 27 (1995) 89-94.
[10] R. Tauler, I. Marqués, E. Casassas, Multivariate curve resolution applied to three-way
trilinear data: Study of a spectrofluorimetric acid-base titration of salicylic acid at
three excitation wavelengths, Journal of Chemometrics, 12 (1998) 55-75.
[11] A. De Juan, M. Maeder, M. Martínez, R. Tauler, Application of a novel resolution
approach combining soft- and hard-modelling features to investigate temperature-dependent
kinetic processes, Analytica Chimica Acta, 442 (2001) 337-350.
[12] A. De Juan, R. Tauler, Chemometrics applied to unravel multicomponent processes
and mixtures: Revisiting latest trends in multivariate resolution, Analytica Chimica
Acta, 500 (2003) 195-210.
[13] A. De Juan, R. Tauler, Multivariate Curve Resolution (MCR) from 2000: Progress
in concepts and applications, Critical Reviews in Analytical Chemistry, 36 (2006) 163-176.
[14] A. De Juan, R. Tauler, Comparison of three-way resolution methods for non-trilinear
chemical data sets, Journal of Chemometrics, 15 (2001) 749-772.
[15] R. Bro, PARAFAC. Tutorial and applications, Chemometrics and Intelligent
Laboratory Systems, 38 (1997) 149-171.
[16] P.J. Gemperline, Computation of the range of feasible solutions in self-modeling
curve resolution algorithms, Analytical Chemistry, 71 (1999) 5398-5404.
[17] R. Tauler, Calculation of maximum and minimum band boundaries of feasible
solutions for species profiles obtained by multivariate curve resolution, Journal of
Chemometrics, 15 (2001) 627-646.
[18] M. Garrido, M.S. Larrechi, F.X. Rius, R. Tauler, Calculation of band boundaries of
feasible solutions obtained by Multivariate Curve Resolution-Alternating Least Squares
of multiple runs of a reaction monitored by NIR spectroscopy, Chemometrics and
Intelligent Laboratory Systems, 76 (2005) 111-120.
[19] R. Tauler, Application of non-linear optimization methods to the estimation of
multivariate curve resolution solutions and of their feasible band boundaries in the
investigation of two chemical and environmental simulated data sets, Analytica Chimica
Acta, 595 (2007) 289-298.
[20] H. Abdollahi, M. Maeder, R. Tauler, Calculation and meaning of feasible band
boundaries in multivariate curve resolution of a two-component system, Analytical
Chemistry, 81 (2009) 2115-2122.
[21] M. Vosough, C. Mason, R. Tauler, M. Jalali-Heravi, M. Maeder, On rotational
ambiguity in model-free analyses of multivariate data, Journal of Chemometrics, 20
(2006) 302-310.
[22] E.R. Malinowski, Window Factor-Analysis - Theoretical Derivation and
Application to Flow-Injection Analysis Data, Journal of Chemometrics, 6 (1992) 29-40.
[23] O.M. Kvalheim, Y.Z. Liang, Heuristic evolving latent projections: Resolving two-way
multicomponent data. 1. Selectivity, latent-projective graph, datascope, local rank,
and unique resolution, Analytical Chemistry, 64 (1992) 936-946.
[24] R. Manne, B.V. Grande, Resolution of two-way data from hyphenated
chromatography by means of elementary matrix transformations, Chemometrics and
Intelligent Laboratory Systems, 50 (2000) 35-46.
[25] M. Maeder, A.D. Zuberbuehler, The resolution of overlapping chromatographic
peaks by evolving factor analysis, Analytica Chimica Acta, 181 (1986) 287-291.
Table 1. Chemical rank (SVD) analysis of the different data sets.
SV1…SV7: first seven singular values. Xn1,…,Xn4: data matrices of the 4 chromatographic
runs obtained using the 4 sets of elution profiles of the second row of plots of Figure
1 (Example 2) and the spectra of the lower part of Figure 1. X1,…,X4: data matrices of the
4 chromatographic runs obtained using the 4 sets of elution profiles of the third row of
plots of Figure 1 (Example 3) and the spectra of the lower part of Figure 1.

Data matrices            SV1      SV2     SV3     SV4     SV5     SV6     SV7
Xn1                      11.2949  2.2683  0.3953  0.2053  ----    ----    ----
Xn2                      13.2277  1.2636  0.1311  0.0576  ----    ----    ----
Xn3                      6.4385   0.9806  0.2841  0.1195  ----    ----    ----
Xn4                      3.4353   1.1396  0.3542  0.1778  ----    ----    ----
[Xn1;Xn2;Xn3;Xn4] (1)    18.815   3.1344  1.0189  0.6358  ----    ----    ----
[Xn1,Xn2,Xn3,Xn4] (2)    18.590   3.6507  2.2066  1.0740  0.3696  0.3128  0.2077
X1                       14.0970  1.8833  0.5402  0.1167  ----    ----    ----
X2                       15.3321  2.2551  0.2992  0.1040  ----    ----    ----
X3                       1.5925   1.4910  0.6249  0.1409  ----    ----    ----
X4                       9.1131   1.1158  0.3742  0.0645  ----    ----    ----
[X1;X2;X3;X4] (1)        25.4528  3.8498  1.2177  0.3813  ----    ----    ----
[X1,X2,X3,X4] (2)        25.3673  4.3705  1.2626  0.3107  ----    ----    ----

(1) and (2) indicate, respectively, column-wise and row-wise matrix augmentation in
MATLAB notation.
Table 2. Effect of constraints on rotation ambiguity measured by the maximum and minimum
values of the component relative contribution function.

Data                                            Constraints  Component  fmax   finic  fmin   fmax-fmin
Example 1, matrix Xn4                           1,2          1          0.491  0.369  0.277  0.214
                                                             2          0.618  0.226  0.137  0.481
                                                             3          0.660  0.658  0.391  0.269
                                                             4          0.482  0.421  0.144  0.298
Example 1, matrix Xn4                           1,2,3        1          0.472  0.369  0.284  0.188
                                                             2          0.499  0.226  0.178  0.321
                                                             3          0.660  0.658  0.405  0.255
                                                             4          0.439  0.421  0.319  0.120
Example 1, matrix Xn4                           1,2,4        1          0.376  0.369  0.357  0.019
                                                             2          0.228  0.226  0.209  0.019
                                                             3          0.660  0.658  0.646  0.014
                                                             4          0.425  0.421  0.401  0.024
Example 2, augmented matrix [Xn1;Xn2;Xn3;Xn4]   1,2          1          0.652  0.495  0.372  0.280
                                                             2          0.589  0.301  0.182  0.407
                                                             3          0.297  0.281  0.167  0.130
                                                             4          0.393  0.239  0.082  0.311
Example 2, augmented matrix [Xn1;Xn2;Xn3;Xn4]   1,2,4        1          0.506  0.495  0.479  0.027
                                                             2          0.306  0.301  0.278  0.028
                                                             3          0.290  0.282  0.276  0.014
                                                             4          0.249  0.239  0.228  0.021
Example 3, augmented matrix [X1;X2;X3;X4]       1,2          1          0.772  0.558  0.034  0.738
                                                             2          0.680  0.439  0.223  0.457
                                                             3          0.144  0.130  0.060  0.084
                                                             4          0.502  0.153  0.034  0.468
Example 3, augmented matrix [X1;X2;X3;X4]       1,2,5        1          0.558  0.558  0.558  0.000
                                                             2          0.439  0.439  0.439  0.000
                                                             3          0.130  0.130  0.130  0.000
                                                             4          0.153  0.153  0.153  0.000

Constraints: 1 normalization; 2 non-negativity; 3 unimodality; 4 selectivity/local rank; 5 trilinearity