
OPEN ACCESS DOCUMENT

Information of the journal in which the present paper is published: Elsevier, Chemometrics and Intelligent Laboratory Systems, 2010, 103 (2), pp. 96-107.

DOI: dx.doi.org/10.1016/j.chemolab.2010.05.020

Title: MCR-BANDS: A user friendly MATLAB program for the evaluation of rotation ambiguities in Multivariate Curve Resolution

Authors: Joaquim Jaumot (1) and Romà Tauler (2,*)

(1) Department of Analytical Chemistry, University of Barcelona, Diagonal 647, Barcelona 08028
(2) Department of Environmental Chemistry, IDAEA-CSIC, Jordi Girona 16-20, Barcelona 08034
* e-mail: Roma.Tauler@idaea.csic.es

Abstract: This paper presents a new user-friendly graphical interface and a command line MATLAB program for evaluating the extent of rotation ambiguities associated with Multivariate Curve Resolution solutions. Several application examples are shown. These include the simultaneous analysis of multiple data sets and the implementation of local rank and trilinearity constraints, which are basic tools for reducing and eliminating rotation ambiguities. The programs make it easy to check how much rotation ambiguity remains in the Multivariate Curve Resolution solutions for a particular system, and they also show the effect of the applied constraints. In this way, the conditions and limitations for achieving optimal solutions in Multivariate Curve Resolution are easily assessed.

1. Introduction

Multivariate Curve Resolution (MCR) methods [1] are reaching a mature state in Chemometrics and have become a powerful tool for investigating many types of chemical systems. The minimal, basic assumption of all MCR methods is that the data follow a bilinear model. In this model, the experimental data set, arranged in a data matrix, is decomposed into the product of two factor matrices of reduced size: one related to the rows and the other to the columns of the original data matrix.
For instance, consider a data matrix obtained by liquid chromatography with diode array detection (LC-DAD). The spectra at different elution times are collected in the rows of the data matrix or, equivalently, the elution profiles at the different wavelengths are arranged in its columns. The row factor matrix obtained by decomposing this data matrix describes the elution profiles of the (co-)eluted components, and the column factor matrix describes the pure spectra of these components. Bilinear models and their matrix factor decompositions are very powerful tools for investigating many chemical processes and systems measured by modern multivariate instrumental methods. Among the existing methods based on bilinear models, MCR looks for decompositions in which the two factor matrices describe the true sources of data variance (the experimental measurements) as closely as possible, without knowing them in advance. In this respect, MCR methods in Chemistry share the goals of other Factor Analysis methods [2, 3]. MCR methods differ from other bilinear methods, such as Principal Component Analysis (PCA, [4]) or Singular Value Decomposition (SVD, [5]), in the way the matrix decomposition is performed. In PCA or SVD, the factor matrices are orthogonal and point in the directions of maximum explained variance (apart from factor normalization or scaling). MCR instead applies softer constraints with more physical meaning, such as non-negativity, unimodality, closure, or local rank and selectivity [6, 7]. These can be called ‘natural’ constraints, since natural systems frequently fulfill them. For instance, absorption spectra and concentrations of chemicals can only be non-negative.
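The LC-DAD arrangement can be made concrete with a small Python/NumPy sketch. The document's programs are MATLAB; Python is used here only for illustration. The sketch builds a noise-free data matrix from invented Gaussian elution profiles and spectra with the Example 1 sizes. The helper `gaussian` and all peak positions and widths are assumptions, not data from the paper.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian band used as a stand-in for an elution profile or a UV spectrum."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Hypothetical LC-DAD-like system with the Example 1 sizes
# (51 retention times x 96 wavelengths, N = 4 components); shapes are invented.
times, wavelengths = np.arange(51), np.arange(96)
C = np.column_stack([gaussian(times, m, 5.0) for m in (18, 22, 27, 32)])    # 51 x 4 elution profiles
ST = np.vstack([gaussian(wavelengths, m, 12.0) for m in (20, 40, 55, 70)])   # 4 x 96 pure spectra

D_star = C @ ST  # noise-free bilinear data matrix

# Row i is the mixture spectrum at retention time i;
# column j is the chromatogram at wavelength j.
spectrum_at_t10 = D_star[10, :]
chromatogram_at_wl40 = D_star[:, 40]

# The same matrix is the sum of one rank-one contribution per component.
D_sum = sum(np.outer(C[:, n], ST[n, :]) for n in range(C.shape[1]))
print(D_star.shape, np.allclose(D_star, D_sum), np.linalg.matrix_rank(D_star))  # (51, 96) True 4
```

Because no noise is added, the rank of the simulated matrix equals the number of components.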
Therefore, solutions obtained by MCR methods are easier to interpret and closer to the true sources of data variation than those obtained by methods such as PCA or SVD. However, despite their greater physical meaning, MCR solutions are not unique in the general case and carry an unknown amount of ambiguity. Two types of ambiguity are distinguished in MCR methods: intensity (or scale) ambiguities and rotation ambiguities [8]. Intensity ambiguities are present in any factor analysis decomposition unless the scale of one of the two factor matrices is fixed in a particular way. In SVD and PCA decompositions, loadings are usually scaled to unit length (norm equal to one, i.e. the sum of the squared vector elements equals 1). In some cases, the scale of one of the two factor matrices is known beforehand. An example is a chemical reaction system in which the total mass of the constituents is known to be constant (a closed reaction system). In such cases, closure constraints eliminate the scale ambiguity. The more critical type of ambiguity, and the harder one to avoid, is the so-called rotation ambiguity. Here, a set of different solutions (and linear combinations of them) fits the experimental data equally well. These solutions are equivalent from a mathematical point of view, although they can be completely different from a physical point of view. An obvious way to reduce this type of ambiguity is to apply more constraints to the solutions, provided the solutions remain physically plausible. The orthogonality constraints that give unique solutions in PCA or SVD are therefore excluded in MCR methods, since they cannot be applied when non-negative solutions are required.
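The intensity ambiguity, and its removal by fixing the scale of one factor matrix, can be shown with a short NumPy sketch. It assumes random non-negative profiles and uses unit-norm spectra as the normalization rule. This is only one possible choice; a closure constraint would fix the scale differently. The helper `normalize_spectra` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((30, 2))    # arbitrary non-negative concentration profiles (illustrative)
ST = rng.random((2, 40))   # arbitrary non-negative spectra (illustrative)

# Intensity (scale) ambiguity: multiply component n by k[n] in C and divide it by k[n] in S^T.
k = np.array([3.0, 0.25])
C_scaled = C * k               # scales column n of C by k[n]
ST_scaled = ST / k[:, None]    # divides row n of S^T by k[n]
print(np.allclose(C @ ST, C_scaled @ ST_scaled))  # True: same data, different factors

def normalize_spectra(C, ST):
    """Hypothetical helper: give each pure spectrum unit Euclidean norm and
    compensate in C so that the product C @ ST is unchanged."""
    norms = np.linalg.norm(ST, axis=1)
    return C * norms, ST / norms[:, None]

# Once the scale is fixed, both factorizations reduce to the same pair of profiles.
C1, S1 = normalize_spectra(C, ST)
C2, S2 = normalize_spectra(C_scaled, ST_scaled)
print(np.allclose(C1, C2) and np.allclose(S1, S2))  # True
```

Fixing the scale removes the intensity ambiguity only. A rotation by a general invertible matrix is not removed by normalization.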
Apart from natural constraints, the most powerful strategies to avoid rotation ambiguities in MCR methods are:
- local rank and selectivity constraints [7, 9];
- extension to the simultaneous analysis of multiple data sets [10];
- hard (deterministic) modeling [11].

When these strategies are used appropriately, unique solutions can be achieved, as shown in previous works [12, 13]. Among these possibilities, the trilinearity constraint [10, 14] in the simultaneous analysis of multiple data sets is the best choice for data sets that fulfill this type of model, and it yields the same solutions as PARAFAC [15].

Once it is recognized that MCR solutions can have a certain degree of ambiguity, the question is how to evaluate it. Several methods for this evaluation have been proposed in the literature. They include methods that calculate the boundaries of the so-called feasible bands, within which all feasible solutions lie. In recent years, different authors [16, 17] have proposed calculating these boundaries by optimizing a function that describes the relative contribution of each component to the whole measured signal. This approach has been used to investigate different problems [18] and has been revised and discussed in other works [19, 20]. However, this approach still needs to be extended to general use, with freely available software and tutorials. This is the main goal of this paper: to propose a new, user-friendly set of MATLAB programs, called MCR-BANDS, for evaluating rotational ambiguity in Multivariate Curve Resolution.
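ALS-type MCR algorithms commonly impose the trilinearity constraint cited above [10, 14] in the following way. Each component's augmented elution profile is folded into a matrix with one column per run, and that matrix is replaced by its rank-one SVD approximation. The NumPy sketch below assumes this approach. It is not code from MCR-BANDS, whose constraint routine mycons is not listed in this paper, and the function name `apply_trilinearity` is hypothetical.

```python
import numpy as np

def apply_trilinearity(C_aug, n_runs):
    """Force column-wise augmented elution profiles to follow a trilinear model.

    C_aug stacks the profiles of all runs (n_runs * I rows, N columns), like the
    204 x 4 matrices of Examples 2 and 3. Each component's augmented profile is
    folded into an I x n_runs matrix (one column per run) and replaced by its best
    rank-one approximation: one common elution shape times one amount per run.
    """
    n_rows, n_comp = C_aug.shape
    I = n_rows // n_runs
    C_tri = np.empty_like(C_aug)
    for n in range(n_comp):
        folded = C_aug[:, n].reshape(n_runs, I).T          # I x n_runs
        U, s, Vt = np.linalg.svd(folded, full_matrices=False)
        rank_one = s[0] * np.outer(U[:, 0], Vt[0])         # common shape x run amounts
        C_tri[:, n] = rank_one.T.reshape(-1)               # unfold back to n_runs * I rows
    return C_tri

# A component that already follows the trilinear model is left unchanged.
t = np.arange(51)
shape = np.exp(-0.5 * ((t - 25) / 5.0) ** 2)
trilinear = np.concatenate([a * shape for a in (1.0, 0.5, 2.0, 1.5)])[:, None]  # 204 x 1
print(np.allclose(apply_trilinearity(trilinear, 4), trilinear))  # True
```

If the elution shape changes from run to run, the same function replaces the profiles with a single shared shape. This is why the constraint is only appropriate when the data really are trilinear.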
Two new MATLAB programs are presented. The first is the command line program mcrbands, an improved version of the program given in ref [17]. The second is the new graphical user interface program mcrbandsg, introduced for the first time in this paper. Both programs give equivalent results; they differ only in how the results are presented and how the input data are supplied. This work shows how to use the two programs and presents the results of applying them to different types of problems.

The paper is organized as follows. First, the basis of the MCR-BANDS method is summarized. Then the MATLAB programs (command line and GUI) are briefly explained, and the web address where they are freely available is given. Finally, several data examples are resolved using these programs. The first example applies the programs to a single data set with different constraint options and describes the different outputs. Special attention is paid to selectivity and local rank constraints, and to how much they reduce or eliminate rotation ambiguities. The second and third examples deal with the simultaneous analysis of multiple data sets. For systems with a trilinear data structure, the implementation of the trilinearity constraint is shown in detail, and this constraint completely eliminates the rotation ambiguities.

2. Data Examples

Figure 1 near here

Figure 1 shows the elution (concentration) and spectral profiles used in the data examples of this paper. All of them refer to liquid chromatography with diode array detection (LC-DAD) data. The upper part of Figure 1 shows the first data example (Example 1): a single LC-DAD chromatographic run with a peak region containing four different components. The spectra of these four components are given in the lower subplot.
The inputs to the program are the C matrix of elution profiles (size 51 x 4) and the ST matrix of pure spectra (size 4 x 96) of the 4 components present in the system. The overlap among elution and spectral profiles is moderate, and the conditions for complete resolution of the system using only non-negativity constraints are not met. In particular, the profiles of some components (the blue ones) are totally embedded inside the profiles of others, which makes resolving them without ambiguities especially difficult. However, with appropriate local rank constraints, resolution conditions can be achieved and ambiguities are practically eliminated.

Example 2 illustrates the power of analyzing multiple LC-DAD runs simultaneously rather than individually. The four chromatographic runs in the second row of Figure 1 each contain the same 4 components, coeluted in a different way. The results show that analyzing these runs simultaneously improves the resolution of the whole system. Note that the 4th chromatographic run in this analysis is the same run examined in detail in Example 1. The inputs to the program are the C matrix of elution profiles in the 4 runs (size 204 x 4) and the ST matrix of pure spectra (size 4 x 96) of the 4 components. The order and shape of each component's elution profile also differ among the runs. This was done on purpose to illustrate systems that do not fulfill the conditions of trilinear models. In such systems, a unique elution profile (first mode profile or loading) cannot be defined for each component across the simultaneously analyzed matrices. In fact, the ranks of the row-wise and column-wise augmented matrices differ considerably, as shown in Table 1. This indicates a strong departure from the trilinear condition.
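The rank difference between the two augmented arrangements can be reproduced with a short NumPy sketch. The sketch assumes the usual MCR convention. A column-wise augmented matrix stacks the runs one below the other, so they share the same spectra. A row-wise augmented matrix places the runs side by side, which would require them to share elution profiles. All runs are invented Gaussian profiles with the sizes of Examples 2 and 3. `np.linalg.matrix_rank` with its default tolerance stands in for the SVD rank analysis of Table 1, so its output is an illustration, not the paper's results.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t, wl = np.arange(51), np.arange(96)
ST = np.vstack([gaussian(wl, m, 12.0) for m in (20, 40, 55, 70)])  # shared pure spectra, 4 x 96

# Example-2-like runs: elution order and position change from run to run (invented values).
centers = [(15, 20, 25, 30), (30, 15, 25, 20), (20, 30, 15, 25), (18, 22, 27, 32)]
runs_nontri = [np.column_stack([gaussian(t, c, w) for c, w in zip(cs, (4, 5, 6, 7))]) @ ST
               for cs in centers]

# Example-3-like runs: identical elution shapes; only the amounts (peak areas) change.
C0 = np.column_stack([gaussian(t, c, 5.0) for c in (18, 22, 27, 32)])
amounts = np.array([[1.0, 0.5, 2.0, 1.0], [0.3, 1.0, 1.0, 2.0],
                    [2.0, 1.5, 0.5, 1.0], [1.0, 1.0, 1.0, 1.0]])
runs_tri = [(C0 * a) @ ST for a in amounts]

def augmented_ranks(runs):
    col_wise = np.vstack(runs)  # 204 x 96: runs stacked vertically, common spectra
    row_wise = np.hstack(runs)  # 51 x 384: runs side by side, common elution profiles needed
    return np.linalg.matrix_rank(col_wise), np.linalg.matrix_rank(row_wise)

print(augmented_ranks(runs_nontri))  # column-wise rank 4, row-wise rank larger than 4
print(augmented_ranks(runs_tri))     # both ranks equal to 4, consistent with a trilinear structure
```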
Therefore, the trilinearity constraint cannot be applied in this case, and unique solutions are not assured. However, as Figure 1 also shows, some runs have better local rank resolution conditions than others according to the resolution theorems [9]. The whole system can therefore benefit from selectivity and local rank constraints. For instance, the fourth run, previously analyzed alone in Example 1, has concentration selective regions for the elution profiles of components 1 and 4. Consequently, resolution conditions can also be achieved for components 2 and 3 [9].

Example 3 is similar to Example 2: the simultaneous analysis of 4 chromatographic runs with 4 coeluted components in each run (see the third row of plots in Figure 1). The inputs to the program are similar to those described for Example 2. Now, however, the position and shape of each component's concentration profile are the same in all runs, and the runs differ only in the relative amounts (peak areas). This is the ideal situation for chromatographic separations and for unique resolution without ambiguities. Even when the resolution theorems [9] are not fulfilled because of the extremely high overlap among elution and spectral profiles, the trilinearity constraint alone provides unique solutions [10, 14].

3. Theory

3.1. MCR methods: fundamentals and ambiguities

All MCR methods are based on the bilinear decomposition of the measured data matrix according to the equation:

$$D = C S^{T} + E = \sum_{n=1}^{N} c_n s_n^{T} + E = D^{*} + E \qquad \text{(Equation 1)}$$

In this equation, D(I,J) is the measured data matrix with I rows and J columns. For instance, for LC-DAD UV spectrophotometric data, the rows are the experimental DAD spectra measured at each retention time (i = 1, …, I), and the columns are the elution profiles at each wavelength (j = 1, …, J).
C(I,N) contains the elution (concentration) profiles of the N eluted components, and ST(N,J) contains the UV pure spectra of these N components. Each component's contribution to the whole measured signal is the rank-one matrix obtained as the outer product of its elution profile and its pure spectrum, i.e. $c_n s_n^{T}$ for component n. Matrix E in Equation 1 contains the part of D that is not described by the product $C S^{T}$, such as experimental errors and uncertainties. The main goal of all MCR methods is to obtain solutions with physical meaning: the true profiles of the components in the mixture in both measurement modes (rows and columns), leaving only noise and experimental error in E. However, without any constraint, Equation 1 has an infinite number of solutions in the general case. Infinitely many pairs of matrices C and ST have a product that reproduces the same data matrix D* (apart from the noise E). This indeterminacy is described mathematically by the following equation:

$$D^{*} = C_{old} S_{old}^{T} = (C_{old} T^{-1})(T S_{old}^{T}) = C_{new} S_{new}^{T} \qquad \text{(Equation 2)}$$

According to Equation 2, any invertible matrix T(N,N) gives a new, equivalent set of solutions of the MCR model. In other words, linear combinations of the C and ST profiles that preserve their product also give valid solutions of the bilinear model. The Factor Analysis literature usually calls this indeterminacy rotation ambiguity. MCR methods try to limit rotation ambiguities, or eliminate them, by applying appropriate constraints. Many successful applications of constraints in MCR methods have been described [6, 7, 10-14]. A constraint can be defined as any property that the true profiles in C and ST are expected to have. The most obvious one is non-negativity.
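Equation 2 can be checked numerically. The sketch below uses invented two-component Gaussian profiles and an arbitrary invertible T chosen for illustration. The rotated factors reproduce D* exactly, but they contain negative values, which the non-negativity constraint would rule out.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t, wl = np.arange(51), np.arange(96)
C_old = np.column_stack([gaussian(t, c, 6.0) for c in (20, 28)])   # 51 x 2
ST_old = np.vstack([gaussian(wl, c, 15.0) for c in (35, 55)])      # 2 x 96
D_star = C_old @ ST_old

# Any invertible T gives another pair of factors with exactly the same product.
T = np.array([[1.0, 0.6],
              [-0.4, 1.0]])
C_new = C_old @ np.linalg.inv(T)
ST_new = T @ ST_old

print(np.allclose(C_new @ ST_new, D_star))  # True: identical fit to D*
print(C_new.min() < 0, ST_new.min() < 0)    # True True: physically implausible profiles
```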
In the LC-DAD example above, the elution profiles and pure spectra of the eluted components should have positive or zero values, never negative ones. Other constraints have also been proposed and used successfully [10-14]: unimodality, closure (mass balance), selectivity, local rank and hard modeling constraints. When the MCR model of Equation 1 is extended to the simultaneous analysis of multiple data matrices, further constraints can be implemented. These include the correspondence among components, the absence or presence of components in some data matrices and, especially, trilinearity and other multilinearity constraints. In many of these cases, MCR under such constraints gives solutions that can be considered unique in practical terms, so the results can be used with confidence. The large number of successful MCR applications in recent years reflects the experience accumulated in this field [1, 12, 13].

3.2. The MCR-BANDS method

The MCR-BANDS method has been described in previous work [17] and has already been applied to some examples [18]. It builds on an idea first proposed by Paul Gemperline [16]: calculating the relative contribution of every component in a mixture. In our case, this contribution is evaluated with the equation:

$$f_n = \frac{\lVert c_n s_n^{T} \rVert}{\lVert C S^{T} \rVert} \qquad \text{(Equation 3)}$$

Here $f_n$ is a scalar that gives the relative signal contribution of component n to the whole signal of the mixture of N components (n = 1, …, N). It is the quotient of two Frobenius norms. The numerator is the norm of the signal of component n, $\lVert c_n s_n^{T} \rVert$. The denominator is the norm of the whole signal of all components together, $\lVert C S^{T} \rVert$. As Equation 2 shows, the product $C S^{T}$ is the same for any invertible matrix T. Therefore, $\lVert C S^{T} \rVert$ in Equation 3 is constant for any T.
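Equation 3 is cheap to evaluate because the Frobenius norm of a rank-one matrix factorizes: $\lVert c_n s_n^{T} \rVert = \lVert c_n \rVert \, \lVert s_n \rVert$. The sketch below computes all $f_n$ at once using random illustrative profiles. The function name `relative_contributions` is hypothetical and is not the program's fmaxmin routine. The sketch also checks that an arbitrary invertible T leaves the denominator unchanged but changes the individual $f_n$.

```python
import numpy as np

def relative_contributions(C, ST):
    """f_n = ||c_n s_n^T||_F / ||C S^T||_F for every component n (Equation 3).

    Uses ||c s^T||_F = ||c|| * ||s||, so the I x J outer products are never formed.
    """
    total = np.linalg.norm(C @ ST)  # Frobenius norm (NumPy default for 2-D arrays)
    return np.linalg.norm(C, axis=0) * np.linalg.norm(ST, axis=1) / total

rng = np.random.default_rng(1)
C, ST = rng.random((20, 3)), rng.random((3, 30))
f = relative_contributions(C, ST)

# Check the shortcut against the explicit outer product for the first component.
explicit = np.linalg.norm(np.outer(C[:, 0], ST[0])) / np.linalg.norm(C @ ST)
print(np.isclose(f[0], explicit))  # True

# An invertible T leaves ||C S^T|| unchanged but changes the individual f_n.
T = np.eye(3) + 0.2 * rng.standard_normal((3, 3))
C_rot, ST_rot = C @ np.linalg.inv(T), T @ ST
f_rot = relative_contributions(C_rot, ST_rot)
print(np.isclose(np.linalg.norm(C @ ST), np.linalg.norm(C_rot @ ST_rot)))  # True
```

The $f_n$ values are not fractions of a partition: they need not sum to 1, because overlapping contributions are not orthogonal.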
In contrast, for every component n = 1, …, N, each T matrix gives a different pair of profiles $c_n$ and $s_n^{T}$ with different shapes. Their product $c_n s_n^{T}$ therefore changes too, as do its norm $\lVert c_n s_n^{T} \rVert$ and its relative signal contribution $f_n$ defined by Equation 3. Thus, the value of $f_n$ depends on the chosen T matrix. Constraints restrict the admissible T matrices to a subset of all invertible matrices. Within that subset, one can look for the T matrices that give the maximum and minimum values of the relative contribution function $f_n$ (n = 1, …, N) for each component, called $f_n^{max}$ and $f_n^{min}$, respectively. The MCR-BANDS method implements this strategy [17-19]. For a particular system defined by a set of profiles in C and ST, and under a particular set of constraints, it calculates for each component the matrices $T_n^{max}$ and $T_n^{min}$ that give $f_n^{max}$ and $f_n^{min}$. In practice, the system is first investigated with an MCR method, for instance MCR-ALS [6, 7], under a set of constraints. Once a particular set of solutions C and ST has been obtained, the corresponding $T_n^{max}$ and $T_n^{min}$ matrices are calculated. This calculation uses a non-linear optimization with non-linear constraints from the MATLAB Optimization Toolbox. The profiles that give $f_n^{max}$ and $f_n^{min}$ for each component n = 1, …, N are denoted $c_n^{max} s_n^{T,max}$ and $c_n^{min} s_n^{T,min}$, respectively. They are plotted for visual inspection together with the initially proposed profiles $c_n$ and $s_n^{T}$. If the constraints have practically eliminated rotation ambiguities, the profiles $c_n$ and $s_n^{T}$ stay practically constant during the optimization. This is a simple and adequate way to show that the obtained solutions are practically unique.
Conversely, rotation ambiguity is confirmed when $f_n^{max}$ and $f_n^{min}$ differ appreciably from $f_n$, and $c_n^{max} s_n^{T,max}$ and $c_n^{min} s_n^{T,min}$ differ from the initial $c_n$ and $s_n^{T}$. The extent of the ambiguity can then be measured by the difference between $f_n^{max}$ and $f_n^{min}$, and displayed graphically by plotting $c_n^{max} s_n^{T,max}$ and $c_n^{min} s_n^{T,min}$. This evaluation should be performed for each component independently. The values of $f_n$ and the profiles $c_n$ and $s_n^{T}$ reached for each component also depend on the profiles of the other components. The optimization accounts for this: it is performed for each component separately while the profiles of all other components are allowed to change. The number of optimizations is therefore twice the number of components (one maximization and one minimization per component).

In a recent work [20], we showed that for two components the $c_n^{max} s_n^{T,max}$ and $c_n^{min} s_n^{T,min}$ profiles obtained by optimizing $f_n$ coincide with the boundaries of the band of feasible solutions found by a systematic grid search of all possible solutions. This result has not yet been extended to more than two components. With 3 or more components, a single pair of maximum and minimum graphical boundaries of the band of feasible solutions cannot be defined. The concept of maximum and minimum band boundaries, and their representation in a 2D plot, is strictly possible only for a two-component system [20, 21].
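For two components, the band limits can be approximated by brute force, in the spirit of the grid search of ref [20] mentioned above. The sketch below is not the MCR-BANDS algorithm, which uses constrained optimization with fmincon. It rests on the following assumptions:
- non-negativity is the only constraint;
- T has a nonzero diagonal, so it can be written as [[1, a], [b, 1]] (scaling the rows of T only changes intensities, which leave $f_n$ unchanged);
- the profiles are invented Gaussians, and the grid range and step are arbitrary.

For each component, it prints the minimum, initial and maximum $f_n$ and their difference.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def f_values(C, ST, total):
    # Equation 3 via ||c_n s_n^T||_F = ||c_n|| * ||s_n||
    return np.linalg.norm(C, axis=0) * np.linalg.norm(ST, axis=1) / total

def f_band_grid(C, ST, grid):
    """Brute-force f_n^min and f_n^max for a 2-component system under non-negativity.
    The input solution (T = I) is assumed to be feasible."""
    total = np.linalg.norm(C @ ST)                 # invariant under T (Equation 2)
    f_init = f_values(C, ST, total)
    f_min, f_max = f_init.copy(), f_init.copy()
    for a in grid:
        for b in grid:
            if abs(1.0 - a * b) < 1e-9:           # skip (nearly) singular T
                continue
            T = np.array([[1.0, a], [b, 1.0]])
            C_new, ST_new = C @ np.linalg.inv(T), T @ ST
            if C_new.min() < 0 or ST_new.min() < 0:  # non-negativity constraint
                continue
            f = f_values(C_new, ST_new, total)
            f_min, f_max = np.minimum(f_min, f), np.maximum(f_max, f)
    return f_min, f_init, f_max

t, wl = np.arange(51), np.arange(96)
C = np.column_stack([gaussian(t, c, 8.0) for c in (20, 30)])   # overlapping elution profiles
ST = np.vstack([gaussian(wl, c, 15.0) for c in (40, 55)])      # overlapping spectra
f_min, f_init, f_max = f_band_grid(C, ST, np.linspace(-0.5, 0.5, 101))

print("comp   f_min  f_inic   f_max    diff")
for n in range(2):
    print(f"{n + 1:4d} {f_min[n]:7.4f} {f_init[n]:7.4f} {f_max[n]:7.4f} {f_max[n] - f_min[n]:7.4f}")
```

A grid search scales poorly as N grows, because T has N(N-1) free off-diagonal elements. This is one reason a constrained optimizer is the practical choice for the general case.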
However, optimizing the scalar function of Equation 3, as proposed in this work, is always possible whatever the number of components in the system. The results of this strategy can therefore be used to measure the amount of rotation ambiguity remaining in the system. Following this direction, this paper builds on the concept developed in previous works [17-19] and implements it in a new graphical user interface program. The aim is to make the method easier to use and more widely adopted, and possibly to generalize it to all types of MCR studies.

4. Computer programs

4.1. MCR-BANDS command line program

The command line MCR-BANDS program is implemented as a small set of MATLAB (The Mathworks Inc, Natick, MA) m functions:
a) the main m function mcrbands, which handles the input/output of parameters and calls the optimization routines;
b) the m function fmaxmin, which calculates the optimization function according to Equation 3;
c) the m function mycons, which defines the equality and inequality constraints (normalization/closure, non-negativity, local rank/selectivity and multiset/trilinearity);
d) the m function unimodg, which defines the unimodality constraints;
e) the optimization m function fmincon from the MATLAB Optimization Toolbox (The Mathworks Inc, Natick, MA), which contains the core routines for the non-linear optimization with non-linear constraints needed to solve the problem.

Further details about the program structure can be obtained from the authors upon request. Readers interested in the optimization routines and procedure can consult the MATLAB Optimization Toolbox tutorial and reference documentation. The relevant topic is the optimization of a non-linear function (Equation 3) under non-linear constraints using the general fmincon function of this toolbox.
The following paragraphs cover only the input and output parameters and other aspects specific to multivariate curve resolution problems. The main program is mcrbands.m, with the following input and output arguments:

[sband,cband,normband,tband,foptim] = mcrbands(c,s,t0,cknown,sknown);

The mandatory inputs are c, the initial concentration profiles, and s, the initial spectra profiles. The inputs t0, cknown and sknown are optional:
- t0 is an initial T matrix that replaces the default (the identity matrix).
- cknown has the same size as c and contains previously known concentration values, used either as equality constraints or as inequality constraints (<= values). For instance, for local rank and selectivity constraints, entries in concentration windows where a component is known to be absent are set to a very low value (e.g. $10^{-7}$). All other (unknown) entries are left as NaN (Not a Number).
- sknown, with the same dimensions as the input spectra s, works in the same way for known spectral values when they are needed as a constraint.

cknown and sknown are only given as input when the corresponding constraints are to be applied.

All outputs of the program are optional:
- sband and cband contain the spectra and concentration band profiles corresponding to the $f_n^{max}$ and $f_n^{min}$ values obtained by optimizing the function of Equation 3.
- normband contains the norm of each component's contribution at these function boundaries.
- tband contains the T rotation matrices of Equation 2 at these boundaries, relative to the initial estimates provided as input (cinic and sinic), which give the $f_n^{inic}$ values.
- foptim contains the $f_n^{max}$ and $f_n^{min}$ values finally reached for each component n = 1, …, N (Equations 1, 2 and 3).
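The cknown convention described above can be sketched in NumPy: NaN marks unknown entries, and a very low value marks windows where a component is known to be absent. The helpers `make_cknown` and `satisfies_cknown`, and the absence windows used below, are hypothetical. They only mimic the layout of such a matrix and how its inequality (<=) reading can be checked. They do not show how mcrbands uses the matrix internally.

```python
import numpy as np

def make_cknown(n_times, n_comp, absent_windows, low=1e-7):
    """Hypothetical helper: build a cknown-style matrix with the same size as c.

    absent_windows maps a component index to a list of (start, stop) row ranges
    where that component is known to be absent; those entries get `low`,
    and every other entry is NaN (unknown).
    """
    cknown = np.full((n_times, n_comp), np.nan)
    for n, windows in absent_windows.items():
        for start, stop in windows:
            cknown[start:stop, n] = low
    return cknown

def satisfies_cknown(c, cknown, tol=0.0):
    """Inequality reading of cknown: c <= cknown wherever cknown is not NaN."""
    known = ~np.isnan(cknown)
    return bool(np.all(c[known] <= cknown[known] + tol))

# Illustrative windows: component 1 absent in the last 10 of 51 retention times,
# component 4 absent in the first 15 (column indices 0 and 3).
cknown = make_cknown(51, 4, {0: [(41, 51)], 3: [(0, 15)]})
print(np.isnan(cknown).sum())  # 179 unknown entries (204 - 10 - 15)
```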
For comparison, and to make later plotting easier, the initial input values cinic and sinic are also returned in the output. When the command line program runs, it asks the user which types of constraints to apply, through simple on-line queries at the start of execution. The following constraint types are currently implemented: normalization (or closure), non-negativity, unimodality, local rank/selectivity and multiset/multilinear data analysis. Each type has several options, which should be chosen according to the nature of the problem and the available prior knowledge. For instance, when local rank/selectivity constraints are chosen (as equality or inequality constraints), the matrices cknown or sknown must be passed in the command line call.

While the program runs, graphical windows show the partial results of the optimization of each component's contribution. The MATLAB command window also reports partial results, with details of how the optimization evolves. Cases where no optimal solution is found (unfeasible solutions) can therefore be easily detected. Sometimes the optimization becomes trapped inside an unfeasible region and cannot move out of it towards a feasible one. This situation can be handled in different ways. One option is to change the default minimum increment used to compute the numerical derivatives; these derivatives determine the search directions towards the maximum and minimum of the function in Equation 3. The default minimum increment is $10^{-5}$, and it can be reduced to a smaller value (e.g. $10^{-10}$).
Often, an easier solution is to change the initial estimate t0 given as input. This matters most when the system is so constrained that the only possible values around the initial solution are those given in the input (unique solutions). In such cases, starting the optimization from the identity matrix as t0 is not recommended, because it may be hard to find any direction in which the optimization can progress while fulfilling the constraints. If t0 differs from the identity matrix, the optimization is much more likely to progress towards the optimum. The output plots show, for each component, the profiles corresponding to $f_n^{max}$, $f_n^{min}$ and $f_n^{inic}$. The numerical values of $f_n^{max}$, $f_n^{min}$ and $f_n^{inic}$ for each component are also reported at the end of the optimization. Some examples are shown and described in the Results section.

4.2. MCR-BANDS graphical user interface program

To make the MCR-BANDS program easier to use, this work also presents a new Graphical User Interface (GUI) in the MATLAB environment. The GUI program uses most of the functions of the command line version, such as fmincon, fmaxmin and mycons. The interface is launched with the mcrbandsg function, which opens the main window (Figure 2). This window has three distinct frames. In the first frame (Obligatory Parameters), the user selects the initial concentration and spectra profiles, cinic and sinic, from the corresponding pop-up menus. These variables must already exist in the MATLAB workspace before the program is launched. In the second frame (Constraints), the user selects which of the available constraints to apply during the optimization.
For each constraint (normalization, non-negativity, unimodality, trilinearity, and equality for concentrations and/or spectra), an additional pop-up menu lets the user choose how it is implemented (for instance, non-negativity only for the concentration profiles).

The third frame contains the Optional Parameters of the optimization. Its pop-up menu lets the user replace the default initial T matrix with another variable from the MATLAB workspace, as in the command line program. Other optimization parameters can also be changed here:
- the tolerance on constraint violation (TolCon);
- the termination tolerance on the function value (TolFun);
- the maximum number of iterations;
- the maximum and minimum increments used to compute numerical derivatives.

In general, however, users should keep the default values and first check whether the optimization succeeds with them. If problems arise, experienced users can tune these values for particularly difficult cases.

This frame also contains three items that are activated only when certain constraints are selected. For instance, if the user changes the default value in the trilinearity constraint pop-up menu, the “Number of experiments” edit box is activated, and the user is asked to confirm (or modify) the number of analyzed experiments. Similarly, when equality or inequality constraints are applied, the Known Concentrations and Known Spectra pop-up menus are activated. These menus let the user select the corresponding variables from the MATLAB workspace.

Finally, two parameters affect only the presentation of the optimization results. The first is a check-box that lets the user decide whether plots are shown during the optimization.
The second is an option to save the results in a structure array named by the user, in addition to the default output structure variable mcr_bands.results.

Figures 2, 3, 4 and 5 near here

Clicking the “Optimize” button runs the optimization. When it finishes, the “Results of the optimization” window is shown (Figure 3). The upper part of this window shows the initial profiles together with the maximum and minimum band boundaries of the concentration and spectra profiles. In both plots, the initial profiles are drawn as dotted lines and the calculated band boundaries as solid lines, in the same color for each component. The window also gives additional information about the optimization:
- Termination status. The window reports how the optimization ended: convergence, divergence, or maximum number of iterations exceeded. Clicking the “Numerical Value” button shows the final termination values of the optimization for each component, indicating which outcome was reached for each profile.
- Active constraints. The window reports the total number of constraints active at the end of the optimization. Clicking the “Constraints used” button opens a new window (Figure 4) that lists the active constraints. In the example of Figure 4, for instance, the normalization constraints are active.
- Final function values. The “See information” button gives the most important information: the final values of the optimization function for each profile. Clicking it opens a new window (Figure 5). For each component, this window gives the function value for the initial profiles, the optimized maximum and minimum values, and the difference between them.
The difference between the maximum and minimum function values allows an easy check of the extent of rotation ambiguity remaining in the input initial solution: a difference close to zero means that practically no rotation ambiguity remains. In addition, information about the optimized T rotation matrices for both the MCR and SVD solutions is given.
4.2.1. Software details and implementation
The MCR-BANDS GUI program was implemented in MATLAB 7.6 and tested with different MATLAB versions (7.0 to 7.9) and operating systems. In addition to the MCR-BANDS program, the fmincon.m function of the MATLAB Optimization Toolbox is needed to perform the non-linear optimization with non-linear constraints. The results obtained by the MCR-BANDS program are stored in a MATLAB data structure, which contains several substructures organizing all the information about the optimization process and the presentation of results. For instance, they include the ‘Data’ substructure, in which the initial concentration and spectra profiles are saved; the ‘Constraints’ substructure, which stores which constraints are applied and how; and the ‘Results’ substructure, in which all the results of the optimization (band boundaries profiles, termination of the optimization, number of iterations, …) are stored. Both the MCR-BANDS command-line and graphical user interface programs can be freely downloaded from the MCR-ALS webpage (www.mcrals.info), together with some example data sets for testing. Once the program is downloaded, it is only necessary to copy the MATLAB m and fig files to a folder and add that folder to the MATLAB path.
5. Results
Table 1 near here
Table 1 gives the rank analysis, by Singular Value Decomposition (SVD), of the different data sets used as examples in this work. For the purposes of this study, only the component profiles are used, without added noise.
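The rank pattern in Table 1 (four significant singular values, except for the row-wise augmented matrix of the non-trilinear Example 2) can be reproduced qualitatively with a small synthetic sketch. It assumes hypothetical Gaussian elution profiles and spectra rather than the actual profiles of Figure 1, builds noise-free run matrices D* = C S^T (rows = elution times, columns = wavelengths), and counts the singular values above an arbitrarily chosen relative tolerance of 1e-8 for column-wise ([X1;X2;...]) and row-wise ([X1,X2,...]) augmentation.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian-shaped profile (hypothetical shape used only for illustration)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

t = np.arange(60.0)  # elution time points
w = np.arange(40.0)  # wavelength channels
S = np.column_stack([gaussian(w, c, 6.0) for c in (8, 16, 24, 32)])  # spectra shared by all runs

def run_matrix(centers, widths, amounts):
    """Noise-free data matrix D* = C S^T of one chromatographic run."""
    C = np.column_stack([a * gaussian(t, c, s) for c, s, a in zip(centers, widths, amounts)])
    return C @ S.T

def numerical_rank(X, rel_tol=1e-8):
    """Number of singular values larger than rel_tol times the largest one."""
    sv = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0]))

base = (20, 27, 34, 41)
# Trilinear case: same shapes and positions in every run, only the amounts change
trilinear = [run_matrix(base, (4, 4, 4, 4), a)
             for a in ((1, 1, 1, 1), (2, 1, 0.5, 1), (0.5, 2, 1, 1.5), (1, 0.3, 2, 1))]
# Non-trilinear case: peak positions and widths change from run to run
non_trilinear = [run_matrix(tuple(c + k for c in base), (4 + k, 4, 5 - k, 4), (1, 1, 1, 1))
                 for k in range(4)]

ranks = {}
for name, runs in (("trilinear", trilinear), ("non-trilinear", non_trilinear)):
    column_wise = np.vstack(runs)  # MATLAB [X1;X2;X3;X4]: runs share the spectra
    row_wise = np.hstack(runs)     # MATLAB [X1,X2,X3,X4]: runs share the elution profiles
    ranks[name] = (numerical_rank(column_wise), numerical_rank(row_wise))
print(ranks)
```

Column-wise augmentation shares the spectra, so its rank stays at four in both cases. Row-wise augmentation shares the elution profiles, so its rank exceeds four as soon as the profile shapes or positions change from run to run, as observed for [Xn1,Xn2,Xn3,Xn4] in Table 1.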
Noise-free data are used because the MCR-BANDS procedure described here is intended to evaluate the extent of rotation ambiguity associated with a particular MCR solution (a particular set of profiles defining one specific chemical system) once it has been obtained, for instance, by MCR-ALS. Therefore, this procedure is applied to the data matrix described by these profiles in the absence of noise, i.e. to matrix D* in Equation 1 above. Four components were present, and four singular values (no noise was present) were obtained in every case except for the row-wise augmented data matrix of data Example 2, whose four matrices do not fulfil the trilinear model. In this case, a larger number of singular values was needed to explain the data variance of the row-wise augmented data matrix, which could not be described by only four elution profiles. This is simply a consequence of the fact that the shapes and positions of the elution profiles of the four coeluted components differ among the four runs, so that they cannot be explained by only four common elution profiles. This example was used here precisely to illustrate the case of a three-way data set without a trilinear data structure. In contrast, data Example 3 gave the same number of components (four) for the row-wise and column-wise augmented data matrices, showing that in this case the data fulfil the trilinear model.
Table 2 near here
Table 2 summarizes the results obtained in the evaluation of rotation ambiguity for the three data examples. For every component, it gives the values of fnmin, fninic and fnmax and the difference between fnmax and fnmin, for the different data sets analyzed under different types of constraints.
Figures 6 and 7 near here
In the analysis of the individual data matrix corresponding to the fourth single LC-DAD run with four coeluted components, the results of three optimizations are shown.
The first optimization applied only non-negativity and normalization constraints. In this case, rotation ambiguities were rather large, since for the four components (n = 1, 2, 3, 4) the differences between the maximum and minimum optimization function values, fnmax - fnmin, were also rather large (0.214, 0.481, 0.269 and 0.298, respectively; Table 2). Figure 6 shows the profiles giving the fnmax and fnmin values (continuous blue lines) together with the theoretical profile (broken red line) for the case of non-negativity and normalization constraints. Some of these profiles turned out to be unreasonable from a chromatographic point of view; for instance, the second and fourth elution profiles obtained for fnmax were very broad and showed a double peak. Adding the unimodality constraint improved the results, and the difference between fnmax and fnmin became smaller, but was still large (0.188, 0.321, 0.255 and 0.120; Table 2). When local rank/selectivity constraints were also applied to this data set, the difference between fnmax and fnmin became very small (only 0.019, 0.019, 0.014 and 0.024; Table 2), indicating that the rotation ambiguities were practically eliminated. The profiles corresponding to the extreme values of fn, fnmax and fnmin, were practically equal to those corresponding to fninic. Figure 7 shows the effect of the constraints in more detail for the elution profile of the fourth component of this fourth chromatographic run. Applying only the non-negativity constraint leaves a wide range of possible elution profiles (continuous blue lines), including some with unreasonable double peaks. The spectra can also be very different when only the non-negativity constraint is applied (continuous blue lines).
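Because D* = C S^T = (C T)(T^-1 S^T) for any invertible rotation matrix T, the band boundaries are the extreme fn values over all T whose rotated profiles still satisfy the constraints. MCR-BANDS finds them by constrained non-linear optimization with fmincon. The sketch below deliberately swaps in a much cruder random search around the identity matrix, on a hypothetical two-component data set, only to show how adding unimodality to non-negativity narrows the feasible range, as observed for Example 1. The data, step size and number of trials are arbitrary assumptions, and spectra normalization is omitted because it does not change fn.

```python
import numpy as np

# Hypothetical two-component system with broad, overlapping, strictly positive profiles
t = np.linspace(0.0, 1.0, 50)  # elution axis
w = np.linspace(0.0, 1.0, 30)  # spectral axis
C = np.column_stack([0.2 + np.exp(-((t - 0.35) / 0.15) ** 2),
                     0.2 + np.exp(-((t - 0.65) / 0.15) ** 2)])
S = np.column_stack([0.1 + np.exp(-((w - 0.30) / 0.15) ** 2),
                     0.1 + np.exp(-((w - 0.70) / 0.15) ** 2)])

def f_rel(C, S):
    """Relative contribution ||c_n|| ||s_n|| / ||C S^T||_F of each component."""
    return np.linalg.norm(C, axis=0) * np.linalg.norm(S, axis=0) / np.linalg.norm(C @ S.T)

def is_unimodal(y, tol=1e-10):
    """True if y does not decrease before its maximum and does not increase after it."""
    d = np.diff(y)
    peak = int(np.argmax(y))
    return bool(np.all(d[:peak] >= -tol) and np.all(d[peak:] <= tol))

def band_extremes(C, S, unimodal=False, n_trials=5000, scale=0.3, seed=0, tol=1e-10):
    """Random-search stand-in for the constrained optimization (not fmincon)."""
    rng = np.random.default_rng(seed)
    k = C.shape[1]
    f0 = f_rel(C, S)
    f_max, f_min = f0.copy(), f0.copy()
    for _ in range(n_trials):
        T = np.eye(k) + scale * rng.standard_normal((k, k))
        if abs(np.linalg.det(T)) < 1e-8:
            continue                       # skip (nearly) singular rotations
        C_new = C @ T
        S_new = np.linalg.solve(T, S.T).T  # (T^-1 S^T)^T, so C_new S_new^T = C S^T
        if C_new.min() < -tol or S_new.min() < -tol:
            continue                       # non-negativity violated
        if unimodal and not all(is_unimodal(c, tol) for c in C_new.T):
            continue                       # unimodality of elution profiles violated
        f = f_rel(C_new, S_new)
        f_max, f_min = np.maximum(f_max, f), np.minimum(f_min, f)
    return f0, f_max, f_min

f0, fmax_nn, fmin_nn = band_extremes(C, S)
_, fmax_uni, fmin_uni = band_extremes(C, S, unimodal=True)
print("non-negativity only:", np.round(fmax_nn - fmin_nn, 3))
print("plus unimodality:   ", np.round(fmax_uni - fmin_uni, 3))
```

Both calls use the same seed, so the unimodal run tests exactly the same rotations and its spread can only be equal or narrower. Being a random search, the extremes are inner estimates of the band boundaries, not the optimized values an fmincon run would return.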
When unimodality was also applied, the range of possible solutions narrowed considerably (dotted lines), and when local rank constraints were applied (dashed lines), the corresponding fnmax and fnmin component profiles were so close to the theoretical one (dashed blue and broken red lines) that they can hardly be distinguished. This again confirms the importance of using selectivity/local rank constraints to obtain optimal solutions in MCR analysis, in agreement with results found using MCR methods such as Window Factor Analysis [22], HELP [23] and GENTLE [24]. In practical cases, for the analysis of experimental systems, the information necessary to apply the local rank/selectivity constraints can be obtained from a preliminary Evolving Factor Analysis [25] of the data sets or from previous knowledge of the selectivity of every component profile in the two measurement modes. See references [6, 7, 22-25] for more details about the application of this important constraint.
Figure 8 near here
In the case of the four LC-DAD runs, each with four coeluted components and a non-trilinear structure (Example 2), two different optimizations were performed. In the first optimization, the extent of rotation ambiguity was evaluated using only non-negativity and spectra normalization constraints. The differences between fnmax and fnmin were always relatively high (0.280, 0.407, 0.130 and 0.311 for the four components, respectively), showing again that the extent of rotation ambiguity was large (see also Figure 8). When the local rank information about the fourth chromatographic run was introduced as an additional constraint (second optimization), the differences between fnmax and fnmin were drastically reduced, by about an order of magnitude, to only 0.027, 0.028, 0.014 and 0.021. This again confirms [6, 7] the importance of this type of constraint.
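Local rank information of the kind used above can be estimated by Evolving Factor Analysis [25]. The sketch below runs forward and backward EFA on a hypothetical noise-free four-component run: it counts singular values above a threshold for growing submatrices and, assuming first-in-first-out elution, converts the counts into concentration windows. The resulting boolean mask marks where each component may be non-zero; forcing profiles to zero outside their windows is one way to express a local rank/selectivity constraint. The parabolic peak shapes (chosen for their compact support), the spectra and the threshold are all assumptions.

```python
import numpy as np

def efa(D, threshold):
    """Forward/backward Evolving Factor Analysis.

    fwd[i]: number of singular values above `threshold` for rows 0..i of D.
    bwd[i]: the same count for rows i..n-1.
    """
    n = D.shape[0]
    fwd = np.zeros(n, dtype=int)
    bwd = np.zeros(n, dtype=int)
    for i in range(n):
        fwd[i] = np.sum(np.linalg.svd(D[: i + 1], compute_uv=False) > threshold)
        bwd[i] = np.sum(np.linalg.svd(D[i:], compute_uv=False) > threshold)
    return fwd, bwd

def concentration_windows(fwd, bwd, n_comp):
    """(first, last) row of each component, assuming first-in-first-out elution."""
    windows = []
    for j in range(n_comp):
        first = int(np.argmax(fwd >= j + 1))  # row where the (j+1)-th factor appears
        last = len(bwd) - 1 - int(np.argmax(bwd[::-1] >= n_comp - j))  # last row it remains
        windows.append((first, last))
    return windows

# Hypothetical noise-free run: four parabolic peaks with compact support
t = np.arange(60.0)
w = np.arange(40.0)
C = np.column_stack([np.clip(1 - ((t - c) / 10.0) ** 2, 0, None) for c in (15, 25, 35, 45)])
S = np.column_stack([np.exp(-((w - c) / 5.0) ** 2) for c in (5, 15, 25, 35)])
D = C @ S.T

threshold = 1e-6 * np.linalg.svd(D, compute_uv=False)[0]
fwd, bwd = efa(D, threshold)
windows = concentration_windows(fwd, bwd, 4)
mask = np.zeros_like(C, dtype=bool)  # True where a component may be non-zero
for j, (first, last) in enumerate(windows):
    mask[first:last + 1, j] = True
print(windows)
```

With real, noisy data the threshold has to be set above the noise level, and the first-in-first-out assumption holds for chromatographic elution but not for every type of process.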
In Figure 8, the fnmax and fnmin profiles (continuous blue lines) of the four components in the four chromatographic runs, and the corresponding spectra, are given for the case when only non-negativity constraints were applied. The differences between the continuous blue lines of each component, and between them and the theoretical profile (broken red line), are large. When selectivity or local rank constraints were applied, the fnmax and fnmin profiles were practically equal to the theoretical profile (fninic, broken red line in Figure 8).
Figure 9 near here
In the case of the four LC-DAD runs with a trilinear structure (Example 3), two different optimizations were also performed. In the first one, only non-negativity and normalization constraints were applied; in the second one, the trilinearity constraint was applied in addition to non-negativity and spectra normalization. The trilinearity constraint completely eliminates the rotation ambiguity, as shown by the fnmin and fnmax values: their differences were 0.738, 0.457, 0.084 and 0.468 without the trilinearity constraint and 0, 0, 0 and 0 when the trilinearity constraint was applied. Therefore, in this last case, rotation ambiguity was totally eliminated, as expected for a data set fulfilling the trilinear model. In Figure 9, the fnmin and fnmax profiles (continuous blue lines) and fninic (broken red line) are shown for the case where only non-negativity and spectra normalization constraints were applied. Again, a broad separation between the fnmin and fnmax profiles is observed. When trilinearity constraints were applied, the fnmin and fnmax profiles coincided totally with the broken red lines of the initial profiles and are therefore not shown.
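The trilinearity constraint applied for Example 3 requires every component to have the same elution profile shape in all runs, differing only in scale. In MCR-ALS this is usually imposed, following the approach of ref. [10], by refolding each component's augmented concentration profile into a time × run matrix and replacing it by its rank-one SVD approximation. The sketch below assumes that approach; the helper name apply_trilinearity and the test profiles are hypothetical.

```python
import numpy as np

def apply_trilinearity(C_aug, n_runs):
    """Hypothetical helper: impose a common profile shape per component across runs.

    C_aug is the column-wise augmented concentration matrix [C1; C2; ...] with
    n_runs blocks of equal length. For each component, the stacked profile is
    refolded into an (n_times, n_runs) matrix and replaced by its best rank-one
    (SVD) approximation, so all runs share one shape up to a scale factor.
    """
    n_total, n_comp = C_aug.shape
    n_times = n_total // n_runs
    C_tri = np.empty_like(C_aug, dtype=float)
    for j in range(n_comp):
        M = C_aug[:, j].reshape(n_runs, n_times).T   # one column per run
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank_one = s[0] * np.outer(U[:, 0], Vt[0])    # best rank-one approximation
        C_tri[:, j] = rank_one.T.reshape(-1)          # unfold back to [C1; C2; ...]
    return C_tri

t = np.arange(50.0)

def profiles(shift):
    """Two Gaussian elution profiles, optionally shifted along the time axis."""
    return np.column_stack([np.exp(-((t - c - shift) / 5.0) ** 2) for c in (20.0, 28.0)])

amounts = np.array([[1.0, 0.5], [2.0, 1.0], [0.5, 1.5]])    # runs x components
C_trilinear = np.vstack([profiles(0) * a for a in amounts])  # same shapes, scaled
C_shifted = np.vstack([profiles(r) for r in range(3)])      # shapes move between runs

print(np.allclose(apply_trilinearity(C_trilinear, 3), C_trilinear))
print(np.allclose(apply_trilinearity(C_shifted, 3), C_shifted))
```

Profiles that already fulfil the trilinear model pass through unchanged, while shifted profiles are forced onto a single shape per component, which is why the constraint is only appropriate for data sets with a trilinear structure.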
For the optimization with the trilinearity constraint, it was difficult to move away from the initial values, find feasible solutions and obtain fnmin and fnmax profiles different from fninic. Therefore, the default initial rotation matrix T (the identity matrix) was replaced by a different, arbitrary non-identity rotation matrix. This allowed the optimization to proceed adequately and confirmed that the initial solution was unique, because the system fulfilled the trilinear condition.
6. Reviewer assessments
M. de Luca
Department of Pharmaceutical Sciences, University of Calabria, Via P. Bucci, 87036 Rende, Cosenza, Italy
“The Authors have presented an important innovation in multivariate curve resolution analysis: a graphical user interface able to evaluate the rotation ambiguities in MCR solutions. The software, in both command-line and GUI versions, has been implemented in the MATLAB environment, which is one of the most used tools in the chemometric field. The arrangement of the data matrices in the input step is well built, and the presentation of the results, through tables and plots, is clear and complete. As described in the text, the size of the band boundaries depends on the constraints that are imposed, considering the data origin and the studied chemical system. The GUI permits an easy selection of the different constraint combinations, and it is also possible to follow the change of the band boundaries when varying the set of constraints. In this context, I would like to suggest an issue for future development of the GUI. Since the matrices of the initial concentrations and spectra profiles often come from MCR-ALS analysis, it could be interesting to have a single program able to perform the MCR-ALS analysis and evaluate the respective band boundaries simultaneously. The examples reported in the text and the guide contained in the website seem exhaustive.
The multivariate curve resolution website is an excellent way to share chemometric methods with the scientific community.”
H. Abdollahi
Faculty of Chemistry, Institute for Advanced Studies in Basic Sciences, 45195-159, Zanjan, Iran
“Evaluation of the extent of ambiguity in MCR solutions is a seriously challenging problem in chemometrics. There is a demand for easily available software to evaluate the rotation ambiguity accompanying MCR solutions in general applications. This time, the Tauler group has developed a MATLAB program for estimating the rotational ambiguity in MCR methods. Now the users of MCR methods can easily check the quality and the validity of their solutions. Assessment of rotation ambiguity in this program is based on the signal contribution function, a scalar function which has the ability to define maximum and minimum band boundaries for the profiles of each component, even in a multicomponent system. The software provides a clear graphical user interface, which allows an easy interaction with the program, and represents a useful tool for evaluating the rotational ambiguity under different options for imposing the constraints. The software runs correctly, according to the features described in the documentation. No doubt, it represents a significant contribution as a tool for chemometric analyses.”
7. Conclusions
The proposed MCR-BANDS program, in its two implementations (command line and graphical user interface), allows an easy check of the extent of rotation ambiguity associated with a particular MCR solution under a given set of constraints. This opens a systematic way to validate the quality of MCR solutions and to assess their reliability. These two programs can also be used to test possible scenarios for data systems having different structures and resolution conditions.
Most of the knowledge accumulated in recent years about Multivariate Curve Resolution methods, their advantages and limitations, can be easily checked using the proposed MCR-BANDS program. The MCR-BANDS command-line and MCR-BANDS GUI programs are complementary tools for assessing the quality of MCR results. The MCR-BANDS program can be used together with the MCR-ALS GUI to carry out the data analysis and the quality assessment of the obtained results in a single step. Finally, since one of the possible drawbacks for the widespread use of the MCR-BANDS program can be its dependence on the MATLAB Optimization Toolbox, an executable GUI version of the same program is also being prepared at present. This stand-alone software will be an adaptation of the program described in this work, and its operation will be practically the same.
8. Figure Captions
Figure 1. Component profiles used to prepare the different data examples. In the upper part of the Figure, the chromatographic elution profiles of the four pure components in the different chromatographic runs are given. Different components are identified by different line colors. In the first row of the Figure, the elution profiles of the pure components in a single chromatographic run are given. In the second row of the Figure, the elution profiles of the pure components in four chromatographic runs are given (Example 2; Run 4 is the same chromatographic run used in Example 1). Observe that in this case these profiles do not fulfil a trilinear model (the positions and shapes of the elution profiles of every component change depending on the chromatographic run considered). In the third row of the Figure, the elution profiles of the pure components in four chromatographic runs are given (Example 3). Observe that in this case these profiles fulfil the trilinear model (the positions and shapes of the elution profiles of every component are the same in every chromatographic run, and they differ only in their relative amounts or intensities).
In the fourth row of the Figure, the pure spectra of the four coeluted components are given and identified by their different colors. These spectra were used in the three examples described in the text.
Figure 2. GUI windows of the MCR-BANDS program. Main window for data input and parameter selection.
Figure 3. GUI windows of the MCR-BANDS program. Summary of Results window.
Figure 4. GUI windows of the MCR-BANDS program. Applied constraints window.
Figure 5. GUI windows of the MCR-BANDS program. Details about optimization results window.
Figure 6. MCR-BANDS results when applied to data Example 1 with only non-negativity and spectra normalization constraints. On the left, each subplot gives the results obtained for the elution profile of every component. On the right, each subplot gives the results obtained for the spectrum of every component. Blue lines are the profiles corresponding to the maximum and minimum values of the component relative contribution optimization function, fnmax and fnmin. Red lines are the initial postulated profiles.
Figure 7. Detail of MCR-BANDS results for the elution (7a) and spectra (7b) profiles of the fourth component of data Example 1 when different constraints were applied: a) non-negativity and spectra normalization, continuous blue lines; b) non-negativity, spectra normalization and unimodality, dotted blue lines; c) non-negativity, spectra normalization and selectivity/local rank, dashed blue lines. Initial elution profile with red line.
Figure 8. MCR-BANDS results when applied to data Example 2 (simultaneous analysis of the four chromatographic runs whose elution profiles are given in the second row of plots of Figure 1, non-trilinear case) with only non-negativity and spectra normalization constraints. On the left, each subplot gives the results obtained for the elution profiles of every component in the four different chromatographic runs (from left to right).
On the right, each subplot gives the results obtained for the spectrum of every component. Blue lines are the profiles corresponding to the maximum and minimum values of the component relative contribution optimization function, fnmax and fnmin. Red lines are the initial postulated profiles.
Figure 9. MCR-BANDS results when applied to data Example 3 (simultaneous analysis of the four chromatographic runs whose elution profiles are given in the third row of plots of Figure 1, trilinear case) with only non-negativity and spectra normalization constraints. On the left, each subplot gives the results obtained for the elution profiles of every component in the four different chromatographic runs (from left to right). On the right, each subplot gives the results obtained for the spectrum of every component. Blue lines are the profiles corresponding to the maximum and minimum values of the component relative contribution optimization function, fnmax and fnmin. Red lines are the initial postulated profiles. When the trilinearity constraint was applied, the MCR-BANDS results coincided with the initial profiles (red lines), i.e. no rotation ambiguity remained.
9. References
[1] S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Elsevier Science, Amsterdam, The Netherlands, 2009, pp. 2.15-12.24.
[2] W.H. Lawton, E.A. Sylvestre, Self Modeling Curve Resolution, Technometrics, 13 (1971) 617-633.
[3] E.R. Malinowski, D.G. Howery, Factor Analysis in Chemistry, 3rd ed., Wiley, New York, US, 2002.
[4] I.T. Jolliffe, Principal Component Analysis, 2nd ed., Springer, New York, US, 2002.
[5] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, US, 1996.
[6] R. Tauler, A. Smilde, B. Kowalski, Selectivity, Local Rank, 3-Way Data-Analysis and Ambiguity in Multivariate Curve Resolution, Journal of Chemometrics, 9 (1995) 31-58.
[7] J. Jaumot, R. Gargallo, A. de Juan, R. Tauler, A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curve resolution in MATLAB, Chemometrics and Intelligent Laboratory Systems, 76 (2005) 101-110.
[8] E. Spjotvoll, H. Martens, R. Volden, Restricted least squares estimation of the spectra and concentration of two unknown constituents available in mixtures, Technometrics, 24 (1982) 173-180.
[9] R. Manne, On the resolution problem in hyphenated chromatography, Chemometrics and Intelligent Laboratory Systems, 27 (1995) 89-94.
[10] R. Tauler, I. Marqués, E. Casassas, Multivariate curve resolution applied to three-way trilinear data: Study of a spectrofluorimetric acid-base titration of salicylic acid at three excitation wavelengths, Journal of Chemometrics, 12 (1998) 55-75.
[11] A. De Juan, M. Maeder, M. Martínez, R. Tauler, Application of a novel resolution approach combining soft- and hard-modelling features to investigate temperature-dependent kinetic processes, Analytica Chimica Acta, 442 (2001) 337-350.
[12] A. De Juan, R. Tauler, Chemometrics applied to unravel multicomponent processes and mixtures: Revisiting latest trends in multivariate resolution, Analytica Chimica Acta, 500 (2003) 195-210.
[13] A. De Juan, R. Tauler, Multivariate Curve Resolution (MCR) from 2000: Progress in concepts and applications, Critical Reviews in Analytical Chemistry, 36 (2006) 163-176.
[14] A. De Juan, R. Tauler, Comparison of three-way resolution methods for non-trilinear chemical data sets, Journal of Chemometrics, 15 (2001) 749-772.
[15] R. Bro, PARAFAC. Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 38 (1997) 149-171.
[16] P.J. Gemperline, Computation of the range of feasible solutions in self-modeling curve resolution algorithms, Analytical Chemistry, 71 (1999) 5398-5404.
[17] R. Tauler, Calculation of maximum and minimum band boundaries of feasible solutions for species profiles obtained by multivariate curve resolution, Journal of Chemometrics, 15 (2001) 627-646.
[18] M. Garrido, M.S. Larrechi, F.X. Rius, R. Tauler, Calculation of band boundaries of feasible solutions obtained by Multivariate Curve Resolution-Alternating Least Squares of multiple runs of a reaction monitored by NIR spectroscopy, Chemometrics and Intelligent Laboratory Systems, 76 (2005) 111-120.
[19] R. Tauler, Application of non-linear optimization methods to the estimation of multivariate curve resolution solutions and of their feasible band boundaries in the investigation of two chemical and environmental simulated data sets, Analytica Chimica Acta, 595 (2007) 289-298.
[20] H. Abdollahi, M. Maeder, R. Tauler, Calculation and meaning of feasible band boundaries in multivariate curve resolution of a two-component system, Analytical Chemistry, 81 (2009) 2115-2122.
[21] M. Vosough, C. Mason, R. Tauler, M. Jalali-Heravi, M. Maeder, On rotational ambiguity in model-free analyses of multivariate data, Journal of Chemometrics, 20 (2006) 302-310.
[22] E.R. Malinowski, Window Factor-Analysis - Theoretical Derivation and Application to Flow-Injection Analysis Data, Journal of Chemometrics, 6 (1992) 29-40.
[23] O.M. Kvalheim, Y.Z. Liang, Heuristic evolving latent projections: Resolving two-way multicomponent data. 1. Selectivity, latent-projective graph, datascope, local rank, and unique resolution, Analytical Chemistry, 64 (1992) 936-946.
[24] R. Manne, B.V. Grande, Resolution of two-way data from hyphenated chromatography by means of elementary matrix transformations, Chemometrics and Intelligent Laboratory Systems, 50 (2000) 35-46.
[25] M. Maeder, A.D. Zuberbuehler, The resolution of overlapping chromatographic peaks by evolving factor analysis, Analytica Chimica Acta, 181 (1986) 287-291.
Table 1. Chemical rank: SVD analysis of the different data sets. SV1…SV7: first seven singular values.
Xn1,…,Xn4: data matrices of the four chromatographic runs obtained using the four sets of elution profiles in the second row of plots of Figure 1 (Example 2) and the spectra in the lower part of Figure 1. X1,…,X4: data matrices of the four chromatographic runs obtained using the four sets of elution profiles in the third row of plots of Figure 1 (Example 3) and the spectra in the lower part of Figure 1.
Data matrix             SV1      SV2      SV3      SV4      SV5      SV6      SV7
Xn1                     11.2949  2.2683   0.3953   0.2053   ----     ----     ----
Xn2                     13.2277  1.2636   0.1311   0.0576   ----     ----     ----
Xn3                     6.4385   0.9806   0.2841   0.1195   ----     ----     ----
Xn4                     3.4353   1.1396   0.3542   0.1778   ----     ----     ----
[Xn1;Xn2;Xn3;Xn4] (1)   18.815   3.1344   1.0189   0.6358   ----     ----     ----
[Xn1,Xn2,Xn3,Xn4] (2)   18.590   3.6507   2.2066   1.0740   0.3696   0.3128   0.2077
X1                      14.0970  1.8833   0.5402   0.1167   ----     ----     ----
X2                      15.3321  2.2551   0.2992   0.1040   ----     ----     ----
X3                      1.5925   1.4910   0.6249   0.1409   ----     ----     ----
X4                      9.1131   1.1158   0.3742   0.0645   ----     ----     ----
[X1;X2;X3;X4] (1)       25.4528  3.8498   1.2177   0.3813   ----     ----     ----
[X1,X2,X3,X4] (2)       25.3673  4.3705   1.2626   0.3107   ----     ----     ----
(1) and (2) indicate column-wise and row-wise matrix augmentation, respectively, in MATLAB notation.
Table 2. Effect of constraints on rotation ambiguity, measured by the maximum and minimum values of the component relative contribution function.
Constraints: 1 normalization; 2 non-negativity; 3 unimodality; 4 selectivity/local rank; 5 trilinearity.
Data set (matrix)                 Constraints  Component  fmax   finic  fmin   fmax-fmin
Example 1 (Xn4)                   1,2          1          0.491  0.369  0.277  0.214
Example 1 (Xn4)                   1,2          2          0.618  0.226  0.137  0.481
Example 1 (Xn4)                   1,2          3          0.660  0.658  0.391  0.269
Example 1 (Xn4)                   1,2          4          0.482  0.421  0.144  0.298
Example 1 (Xn4)                   1,2,3        1          0.472  0.369  0.284  0.188
Example 1 (Xn4)                   1,2,3        2          0.499  0.226  0.178  0.321
Example 1 (Xn4)                   1,2,3        3          0.660  0.658  0.405  0.255
Example 1 (Xn4)                   1,2,3        4          0.439  0.421  0.319  0.120
Example 1 (Xn4)                   1,2,4        1          0.376  0.369  0.357  0.019
Example 1 (Xn4)                   1,2,4        2          0.228  0.226  0.209  0.019
Example 1 (Xn4)                   1,2,4        3          0.660  0.658  0.646  0.014
Example 1 (Xn4)                   1,2,4        4          0.425  0.421  0.401  0.024
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2          1          0.652  0.495  0.372  0.280
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2          2          0.589  0.301  0.182  0.407
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2          3          0.297  0.281  0.167  0.130
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2          4          0.393  0.239  0.082  0.311
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2,4        1          0.506  0.495  0.479  0.027
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2,4        2          0.306  0.301  0.278  0.028
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2,4        3          0.290  0.282  0.276  0.014
Example 2 ([Xn1;Xn2;Xn3;Xn4])     1,2,4        4          0.249  0.239  0.228  0.021
Example 3 ([X1;X2;X3;X4])         1,2          1          0.772  0.558  0.034  0.738
Example 3 ([X1;X2;X3;X4])         1,2          2          0.680  0.439  0.223  0.457
Example 3 ([X1;X2;X3;X4])         1,2          3          0.144  0.130  0.060  0.084
Example 3 ([X1;X2;X3;X4])         1,2          4          0.502  0.153  0.034  0.468
Example 3 ([X1;X2;X3;X4])         1,2,5        1          0.558  0.558  0.558  0.000
Example 3 ([X1;X2;X3;X4])         1,2,5        2          0.439  0.439  0.439  0.000
Example 3 ([X1;X2;X3;X4])         1,2,5        3          0.130  0.130  0.130  0.000
Example 3 ([X1;X2;X3;X4])         1,2,5        4          0.153  0.153  0.153  0.000
