Literature Review
Data Processing Methods for 2D Chromatography

Muzi Li (1022 6761)
Supervisor: dr. Gabriel Vivo-Truyols
University of Amsterdam, MSc Chemistry - Analytical Sciences

ABSTRACT

Two-dimensional liquid and gas chromatography have become increasingly popular in many application fields, such as metabolomics, petroleum and food analysis, owing to their substantial resolving power in separating complex samples. However, the additional dimension, together with the variety of detection instruments, makes the data sets more complex by increasing the order of the data generated by the instruments, although the higher order also yields more useful information, for instance through second-order advantages. High-order data require data processing methods to transform chromatograms and spectra into useful information in several steps. Data processing consists of two stages: data pre-processing and the actual data processing. Pre-processing, including baseline correction, smoothing and derivatives, peak detection, alignment and normalization, aims to reduce the variations unrelated to the chemical variations of interest, caused by interferences such as noise. This step is important because it prepares the raw data sets for the actual data processing, such as classification, identification and quantification. If data pre-processing fails, the data may remain obscured by the unrelated variations, resulting in the failure of the subsequent data processing. In the actual data processing steps, methods such as PCA, GRAM and PARAFAC are used for the classification and quantification of sample compounds. This literature review presents, though not in detail, the most popular data processing methods used for two-dimensional chromatography, and mentions their applications as well.

CONTENT

INTRODUCTION
   Order of Instrument
Data pre-processing
   Baseline correction
   Smoothing and Derivatives
   Peak detection
   Alignment
   Normalization
Data processing
   Supervised and unsupervised learning
      Unsupervised
      Supervised
Conclusion & Future work
Reference

INTRODUCTION

Two-dimensional (2D) chromatography (2D liquid chromatography or 2D gas chromatography) refers to a procedure in which part or all of the sample components to be separated are subjected to two separation steps with different separation mechanisms. In planar chromatography, 2D chromatography refers to procedures in which the components first migrate in one direction and subsequently in a direction at right angles to the first, using two different eluents. [1] Compared to 1D chromatography, 2D chromatography possesses substantial resolving power, providing high separation efficiency and selectivity. In 1D chromatography, the resolution (RS) is often used as a quantitative measure of the degree of separation between two components A and B; it is defined as the retention-time difference of two adjacent peaks divided by the average of their peak widths, RS = 2(tR,B − tR,A)/(wA + wB). [2] However, it is difficult to obtain acceptable resolution for all peaks of complex samples consisting of numerous components. Therefore, the peak capacity (PC) was introduced to measure the overall separation, particularly for complex samples analyzed by gradient elution in 2D chromatography. [3] The peak capacity of a separation is defined as the total number of peaks that can be fit into a chromatographic window when every peak is separated from its adjacent peaks with RS = 1. [3] Since the fractions from the first separation are further resolved in the second, orthogonal separation, the peak capacity of a 2D separation equals the product of the peak capacities of the individual separations. [3] For instance, if the isocratic peak capacity in 1D is PC = 100, the total peak capacity of the 2D separation would be PC = 100 × 100 = 10,000. [3] Owing to this high resolving power [5-14], the use of 2D chromatographic separations has risen substantially in the biochemical field [15-18]; a nice review on multidimensional LC in the field of proteomics [19] and a review on the application of 2D chromatography in food analysis [20] have been published. After instrumental analysis, data sets are produced that need interpretation in terms of identification, classification and quantification. The process of transforming data into useful information, such as inferring a property of interest (typically involving the search for bio-markers) or classifying a sample into one of several categories, is termed chemometrics. An example of sample classification is given in Figure 1.

Figure 1. Classification of chromatograms based on the relative abundance of all the peaks in the mixture. [21]

Although enormous information can be extracted from 2D chromatography, the complexity of the raw data generated by 2D instruments makes the work of data interpretation time-consuming, and there is often a chance of overlooking features in the data. In order to extract the most useful information from the raw data, a well-defined data processing procedure needs to be performed.
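For reference, the two separation measures used above can be written compactly as follows (standard textbook notation; the product rule for the 2D peak capacity assumes fully orthogonal dimensions and full use of the separation space, which is an idealization):

```latex
R_S \;=\; \frac{2\left(t_{R,B}-t_{R,A}\right)}{w_A + w_B},
\qquad
n_{c,\mathrm{2D}} \;=\; {}^{1}n_c \times {}^{2}n_c
\quad\text{(e.g. } 100 \times 100 = 10\,000\text{)}
```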
Daszykowski et al. [22] summarized the results listed in Table 1, obtained by searching paper titles and keywords containing chemometrics and chromatography. The results exhibit a promising scope for solving chromatographic problems. Multiple chemometric methods have been developed, applied and improved for each type of chromatographic problem over the past decades, each bearing advantages and disadvantages, and the application of chemometrics in chromatography has spread widely: from drug identification in pharmaceutics [23] and beer/wine quality control and classification [24-27], and proving economic fraud in the food field [28], to the identification of microbial species by evaluation of cell wall material [29-30], the prediction of disease states in clinical medicine [31-33], and oil exploration (oil-oil correlation, oil-source rock correlation) in petroleum science [34].

Keyword(s) | Score
Multivariate curve resolution | 44
Alternating least squares | 34
MCR-ALS | 18
Chemometrics | 403
Experimental design | 605
Multivariate analysis | 275
Pattern recognition | 280
Classification | 1029
PCA | 556
QSPR | 51
QSAR | 111
Topological indices | 38
Topological descriptors | 11
Modeling retention | 5
Fingerprints | 802
Clustering | 219
Peak shifts | 16
Deconvolution | 244
Background correction | 21
De-noising | 9
Noise reduction | 17
Signal enhancement | 43
Preprocessing | 45
Mixture analysis | 82
Alignment | 383
Warping | 11
Peak matching | 18
Peak detection | 54
Wavelets | 32
Total | 5456

Table 1. Results of a keyword search in the SCOPUS system, using a quick search ("keyword(s)" and chromatography). [22]

Order of Instrument

Before introducing data processing methods for 2D chromatography, a classification based on data type should be defined, the better to understand the methods to be applied. The classification of analytical instruments or methods is summarized, for the simplification of data processing, according to the type of data generated, using existing mathematical terminology as follows [35]: a zero-order instrument is one that generates only a single datum per sample, since a single number is a zero-order tensor. Examples of zero-order instruments are ion-selective electrodes and single-filter photometers. A first-order instrument, including all types of spectrometers, chromatographs and even arrays of zero-order sensors, is one that generates multiple measurements at one time or for one sample, where the measurements can be arranged in an ordered array, i.e. a vector of data (also termed a first-order tensor). Likewise, a second-order instrument generates a matrix of data per sample; this is mostly found in, but not restricted to, hyphenated systems such as gas chromatography - mass spectrometry (GC-MS), LC-MS/MS and GC×GC. Data of even higher order can be generated by more complex instruments, and there is no limit to the maximum order of the data. [35] The concepts of data order are depicted in Table 2.

Data order | Array (one sample) | Array (a sample set) | Calibration
Zero | Scalar | One-way | Univariate
First | Vector | Two-way | Multivariate
Second | Matrix | Three-way | Multi-way
Third | Three-way | Four-way | Multi-way
Fourth | Four-way | Five-way | Multi-way

Table 2. Different arrays that can be obtained for a single sample and for a set of samples. [36]
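The data orders of Table 2 map directly onto array shapes. A minimal illustration in Python (the shapes and instrument examples are invented for the sketch, not taken from refs. [35-36]):

```python
import numpy as np

# Zero-order: a single datum per sample (e.g. an ion-selective electrode reading)
scalar = 3.7

# First-order: a vector per sample (e.g. a 1D chromatogram with 1000 time points)
chromatogram_1d = np.zeros(1000)

# Second-order: a matrix per sample (e.g. GC×GC with 500 first-dimension
# fractions x 200 second-dimension time points)
gcxgc = np.zeros((500, 200))

# Third-order: a three-way array per sample (e.g. GC×GC-MS, 250 m/z channels)
gcxgc_ms = np.zeros((500, 200, 250))

# A set of N samples raises each order by one, as in Table 2:
n_samples = 60
sample_set = np.zeros((n_samples, 500, 200))  # three-way array of GC×GC data
```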
Calibration | Zero-order | First-order | Second-order
Required selectivity | Full | Net analyte signal | Net analyte rank
Maximum analytes | 1 | No. of sensors | min(I, J)
Minimum standards (with offset) | 1 (2) | 1 per species (1 + 1 per species present) | 1 (2)
Interferences | Cannot detect; analysis biased | Can detect; analysis biased | Can detect; analysis accurate
Signal averaging | − | ~√J | ~√(I·J)
Statistics | Simple, well defined | Complex, defined | Complex, not fully investigated
Something extra | − | − | First-order profiles

Table 3. Advantages and disadvantages of different calibration paradigms. [35]

For 2D chromatographic analysis, the data would be at least first-order (e.g. LC×LC); however, a single-wavelength detector or a hyphenated detector (LC×LC-MS) is always required, which complicates the data by raising the data order to second- or third-order, while providing detailed and precise information through second-order advantages (Table 3). The primary second-order advantage is the higher selectivity, even in the presence of unknown interferences. [35, 37] In the following, the different methods are categorized and elucidated to give an overview of current data processing methods for 2D chromatography.

Data pre-processing

In 2D data processing, pre-processing must be applied to the raw data before quantitative and/or qualitative data analysis, because the data are obscured by irrelevant chromatographic variations caused by interferences such as noise (high frequency) and background (low frequency); the component signal, in contrast, lies at intermediate frequencies. Pre-processing is crucial because it helps reduce the variations unrelated to the chemical variations of interest in chemometric analysis, and it has become the critical point that can determine success or failure in many applications. [38-40] Particularly in metabolomics, preprocessing methods have become primary steps that are both difficult and influential on the final results. [41] In most cases, variables located adjacent to each other in a data set are related and contain similar information; methods for noise filtering and background correction exploit this relationship for interference removal.

Baseline correction

The interference in the baseline consists of background (low frequency) and noise (high frequency). General baseline correction procedures are designed to reduce the low-frequency contribution, while some procedures and smoothing techniques specifically target high-frequency variations so as to improve signal-to-noise (S/N) ratios. [38] Note that "background" tends to be used in a more general sense, designating any unwanted signal including noise and chemical components, while "baseline" is associated with a smooth line reflecting a "physical" interference. [42] Figure 2 illustrates the difference between background, noise and the component signal.

Figure 2. Components of the analytical signal: (a) overall signal; (b) relevant signal; (c) background; and (d) noise. [22]

Baseline correction is typically the first step in preprocessing, owing to baseline shift and interferences contributed by solvents, impurities, etc., which assign imprecise signals to each component at a fixed concentration.
After baseline correction, the baseline noise signal is supposed to be numerically centered at zero. [38] The simplest way of baseline correction is to run a "blank" analysis and subtract its chromatogram from the sample one. However, several blank runs are needed in this case to obtain a confidence level on the blank chromatogram, since variations may occur from run to run. A second way of baseline correction is to use polynomial least squares fitting to simulate a blank chromatogram and then subtract it from the sample one. This method is effective to some extent, but it requires user intervention and is prone to variability, particularly in low-S/N environments. [43] An alternative is to use a penalized least squares algorithm with some adaptation. The penalized least squares algorithm was first published by Whittaker [44] in 1922 as a flexible smoothing method (noise reduction). Silverman et al. [45-46] later developed another smoothing method, named the roughness penalty method. The penalized least squares algorithm can be regarded as a roughness-penalty smoother using least squares, balancing fidelity to the original data against the roughness of the fitted curve. [43] The asymmetric least squares (ALS) approach was widely applied by Eilers et al. for smoothing [47], for background correction in hyphenated chromatography [48] and for finding new features in large spectral data sets [49]. In order to apply the penalized least squares algorithm to baseline correction, both Cobas et al. [50] and Zhang et al. [51] introduced a weight vector based on the original signal. However, in both cases peak detection has to be performed before baseline correction, while the existence of the baseline negatively affects the peak detection itself. The method proposed by Cobas et al. [50] does not handle complex baselines, while the method of Zhang et al. [51], though more accurate than Cobas's, is time-consuming, particularly for two-dimensional datasets.

Eilers et al. [52] proposed an alternative algorithm named asymmetric weighted least squares, based on the original asymmetric least squares (AsLS) [53]. It combines a Whittaker smoother with asymmetric weighting of deviations from the (smooth) trend to obtain an effective baseline estimator. The advantages of this method are [52]: 1) the baseline position can be adjusted by varying two parameters (p for asymmetry and λ for smoothness), and the flexibility of the baseline can be tuned with λ; 2) it is fast and effective while keeping the analytical peak signal intact; 3) no prior information about peak shape or baseline (polynomial) is needed. With this method, GC chromatograms and MS, Raman and FTIR spectra were well baseline-corrected. However, it is difficult to find the optimal value of λ, and the method does not provide a fully automatic procedure to set the optimal parameter values; user judgment and experience are therefore needed.
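A minimal sketch of this asymmetric least squares idea (a Whittaker smoother with asymmetric weights), assuming SciPy is available; the function name and parameter values are illustrative, not those of ref. [52]:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a baseline by penalized least squares with asymmetric weights.

    lam controls smoothness; p controls asymmetry: points above the current
    baseline (candidate peaks) get the small weight p, points below get 1-p.
    """
    m = len(y)
    # Second-order difference operator; lam * ||D z||^2 penalizes roughness.
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(m - 2, m))
    w = np.ones(m)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + lam * D.T @ D).tocsc(), w * y)
        w = np.where(y > z, p, 1.0 - p)  # asymmetric reweighting
    return z

# Usage: corrected = signal - asls_baseline(signal)
```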
To overcome these problems, a novel algorithm termed adaptive iteratively reweighted penalized least squares (airPLS) was proposed by Zhang et al. [43]. According to Zhang et al. [43], the adapted method is similar to weighted least squares and iteratively reweighted least squares, but it calculates the weights differently and adds a penalty term to control the smoothness of the fitted baseline. The airPLS algorithm has been proved effective in baseline correction while preserving the primary information useful for classification, and the R², Q² and RMSECV of regression models pretreated by airPLS were evidently better than those pretreated with the methods of Cobas et al. [50] and Eilers et al. [52], or left uncorrected, especially when the number of principal components in principal component analysis (PCA) is small. [43] (R²: coefficient of determination, indicating how well data points fit a line or curve; it normally ranges from 0 to 1, and the higher the value, the better the model. Q²: the cross-validated counterpart of R². RMSECV: root-mean-square error of cross-validation, a measure of a model's ability to predict new samples; the smaller the value, the better the model.)

Recently, a new method was developed by Reichenbach et al. [54] for two-dimensional GC (GC×GC) and incorporated into the GC Image software system. The algorithm is based on a statistical model of the background values, obtained by tracking the adjacency around the smallest values as a function of time and of noise. The estimated background level is then subtracted from the entire image, producing a chromatogram in which the peaks rise above a near-zero mean background. The algorithm effectively removes the background level of GC×GC images, but it does not remove some artifacts observed in them. This approach was later adapted for LC×LC in two important aspects. First, since the background signal in gradient LC can either decrease or increase as a slope due to the change in solvent composition, the baseline correction should track the "middle" value instead of the smallest. [55] Second, the variance of the background in the second dimension (2D) is significant, so the correction algorithm should model both dimensions. [55] An example of an LC chromatogram before and after baseline correction is given in Figure 3.

Some other algorithms, such as weighted least squares (WLS), have also been proposed and applied to baseline correction, mainly for spectra. A commonly used method for 2D chromatography is the linear least squares algorithm, which can be applied in multiple dimensions, while other common algorithms, such as polynomial least squares, are limited to one dimension. De Rooi et al. [56] developed a two-dimensional baseline method based on penalized regression, originally for spectroscopy but claimed to be applicable to two-dimensional chromatography data as well. Filgueira et al. [57] developed a new method called orthogonal background correction (OBGC), particularly useful for correcting complex DAD background signals in fast online LC×LC. This method builds on the two existing baseline correction methods for one-dimensional liquid chromatography (the moving-median filter and polynomial fitting) and extends either of them to correct the two-dimensional background in LC×LC.
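To illustrate the moving-median idea that OBGC builds on, the sketch below estimates a slowly varying background for a 2D chromatogram stored as a matrix. This is emphatically not the OBGC algorithm itself; the function name and window size are arbitrary choices for the example, and SciPy is assumed:

```python
import numpy as np
from scipy.ndimage import median_filter

def median_baseline_2d(X, window=51):
    """Rough background estimate for a 2D chromatogram X (rows: 2D time
    points, columns: 1D fractions). A moving median along each column,
    followed by one along each row, follows slow drifts in both dimensions
    while ignoring narrow peaks."""
    col_bg = median_filter(X, size=(window, 1), mode="nearest")
    return median_filter(col_bg, size=(1, window), mode="nearest")

# Usage: X_corrected = X - median_baseline_2d(X)
```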
Comparisons of the newly developed method with the two basic methods and with dummy blank subtraction were performed on the second-dimension (2D) chromatogram; the results are illustrated in Figure 4.

Figure 3. (a) Background values before (solid line) and after (dashed line) correction along a single row in the first dimension. A row with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values fluctuate in a small range centered very close to zero. (b) Background values before (solid line) and after (dashed line) correction along a single column in the second dimension. This secondary chromatogram with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values in the region of analysis are very close to zero. [55]

Figure 4. Comparison of estimated baselines using the different methods on a typical single 2D chromatogram. The chromatograms are intentionally offset by 7 mAU to help visualization. (a) Conventional baseline correction methods: the blue solid line is the real single 2D chromatogram; the black dashed line is the estimated baseline using the moving-median filter, and the red dot-dashed line is the estimated baseline using the polynomial fitting method. (b) The two methods applied in combination with the OBGC method; the line format is the same as in (a). [57]

Furthermore, compared to dummy blank subtraction, the reproducibility of the measured peak heights was significantly enhanced after the application of OBGC. This robust new baseline correction method has been proved effective for LC×LC and is considered applicable to any 2D technique in which the first dimension (1D) has lower-frequency baseline fluctuations than the second. However, the authors did not clearly explain the principle of the newly developed method in the article, mentioning only that its development was based on the existing methods.
No. | Name of method | Advantages (√) | Disadvantages (×) | Usage [Ref]
1 | Dummy blank subtraction | Simple | Manual, time-consuming, more errors | Baseline correction
2 | Polynomial least squares fitting | Effective | User intervention; not suitable for low S/N | Baseline correction
3 | Penalized least squares | Balances fidelity to the original data against the roughness of the fitted data | Needs peak detection | Smoothing [44]
4 | Roughness penalty method | — | — | Smoothing [45-46]
5 | Asymmetric least squares (AsLS) | Effective | Need to optimize two parameters; constant weights for the entire region | Smoothing [47], background correction [48]
6 | Weighted-vector AsLS-1 | No need for peak detection | Not for complex baselines | Baseline correction [50]
7 | Weighted-vector AsLS-2 | No need for peak detection; better accuracy | Time-consuming | Baseline correction [51]
8 | Asymmetric weighted least squares | Easy to perform by tuning two parameters; fast and effective; no prior information needed | Difficult to find the optimal value of one parameter; user judgment and experience needed | Baseline correction [52]
9 | Adaptive iteratively reweighted penalized least squares (airPLS) | Effective while preserving the primary information, particularly for a small number of principal components in classification; extremely fast for large datasets | — | Baseline correction [43]
10 | LC/GC Image incorporated | Powerful, accurate, quick | — | Baseline correction [55]
11 | Orthogonal background correction (OBGC) | Effective in 2D; highly reproducible peak heights | — | Baseline correction [57]

Table 4. Summary of methods for baseline correction and smoothing.

Smoothing and Derivatives

Smoothing is a low-pass filtering operation for the removal of high-frequency noise from samples, sometimes termed noise reduction. As mentioned above, some of the algorithms discussed can also be applied to smoothing. In smoothing, variables adjacent to each other in the data matrix share similar information, which can be averaged to reduce noise without significant loss of the signal of interest. [58] Smoothing can also be performed with a linear Kalman filter, which is mostly used as an alternative to linear least squares for estimating the concentrations of mixture components [59] and is often used in 1D chromatography. The classic smoothing method is the Savitzky-Golay smoother (SGS) [60], which fits a low-order polynomial to each data point and its neighbors and then replaces the signal at that point with the value provided by the polynomial fit. [38] In practice, however, missing values and the boundaries of the data domain complicate the computation when using SGS. The Whittaker smoother, based on penalized least squares, has several advantages over SGS: it is said to be extremely fast, adapts automatically to boundaries, and even allows fast leave-one-out cross-validation. [61]
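Both the smoother and its derivative variant (used below) are available off the shelf; a small sketch with SciPy, where the window length and polynomial order are illustrative and should be tuned, e.g. with the window-size criterion of ref. [64]:

```python
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0, 10, 1001)
rng = np.random.default_rng(0)
# Synthetic noisy Gaussian peak standing in for a chromatographic signal.
noisy = np.exp(-0.5 * ((t - 5) / 0.3) ** 2) + 0.05 * rng.normal(size=t.size)

# Savitzky-Golay: fit a low-order polynomial in a sliding window and keep
# the fitted value at the centre point.
smoothed = savgol_filter(noisy, window_length=21, polyorder=3)

# The same local fit can return the derivative directly (deriv=1), which
# smooths and differentiates in a single pass.
first_derivative = savgol_filter(noisy, window_length=21, polyorder=3,
                                 deriv=1, delta=t[1] - t[0])
```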
It is worth noting that digital filters are well suited to signal processing in terms of eliminating undesired frequencies without distorting the frequency region containing crucial information. [22] Digital filters can operate in either the time domain or the frequency domain. The windowed Fourier transform (FT) is often used to analyze the signal in both time and frequency domains, studying the signal segment by segment. [22] However, the FT has a severe disadvantage rooted in the Heisenberg uncertainty principle: precision in both time and frequency cannot be achieved simultaneously. The narrower the window, the better localized the peak signals, at the cost of less precision in frequency; and vice versa for a broader window. [22, 62] To obtain precision in both the time and frequency domains, the wavelet transform (WT) is preferable, particularly for non-stationary signals (signals whose features change with time or in space). WT exploits the intermediate cases of the uncertainty principle so as to capture precision in both domains with only a small sacrifice of precision in each. [22, 63]

In contrast to smoothing, derivatives are high-pass filters with frequency-dependent scaling. Taking derivatives is a common way to remove unimportant baseline signals from samples, by differentiating the measured responses with respect to the variable number or another relevant axis scale, such as wavelength. It is used either when lower-frequency features (baseline) are interferences or when higher-frequency features contain the signal of interest. [58] The method presupposes that the variables are strongly related to each other and that adjacent variables contain similarly correlated signal. [58] The Savitzky-Golay algorithm is often used to smooth the data while simultaneously taking the derivative, which improves the utility of the derivatized data. [19] Vivó-Truyols et al. [64] developed a method to select the optimal window size for the Savitzky-Golay algorithm in smoothing; it was successfully applied to NMR, chromatography and mass spectrometry data and shown to be robust.

Peak detection

Peak detection is also a key step in data pre-processing: it distinguishes the important information, sometimes the most important information, from the noise, particularly in the search for bio-markers. Peak detection methods are well developed for 1D chromatography with single-channel detection; they are based on detecting signal changes in the detector response and applying the condition of unimodality. [65] Peak detection methods fall mainly into two families [66]: those that make use of matched filters and those that make use of derivatives. Only a few peak detection methods for two-dimensional chromatography (performed in time, as opposed to 2D-PAGE in space) have been reported in the literature [67-68], and only two main families of methods are available [65]: those based on the extension of 1D peak detection algorithms [69-70] and those based on the watershed algorithm [71].

In general, the former follow a two-step procedure [65]: first, peaks are detected in one-dimensional form using the raw signal from the detector, a step that has the advantage of avoiding the sub-peak discontinuity of the drain algorithm (discussed below); in the second step, a collection of criteria is applied to decide on merging the one-dimensional peaks into single two-dimensional peaks. Despite slight differences in those criteria, they are all based on peak profile similarities (i.e. peaks detected in the first dimension eluting at the same time in the second dimension).
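A toy sketch of this two-step family is given below: detect 1D peaks in each second-dimension chromatogram, then merge detections from consecutive fractions whose second-dimension positions agree within a tolerance. The merging criterion here is deliberately simplistic compared to the published algorithms [69-70], and all names and thresholds are invented for the example:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_2d_peaks(X, height=10.0, tol=3):
    """X: 2D chromatogram (rows: modulations / 1D fractions, columns: 2D
    time points). Returns merged 2D peaks, each as a list of
    (fraction, 2D index) detections."""
    merged, open_peaks = [], []              # open_peaks: peaks still growing
    for i, row in enumerate(X):
        idx, _ = find_peaks(row, height=height)   # step 1: 1D detection
        still_open = []
        for peak in open_peaks:
            last_frac, last_pos = peak[-1]
            # step 2: merge if the previous fraction had a detection at
            # (almost) the same second-dimension time
            match = [j for j in idx if abs(j - last_pos) <= tol]
            if last_frac == i - 1 and match:
                peak.append((i, match[0]))
                idx = idx[np.abs(idx - match[0]) > tol]  # consume detection
                still_open.append(peak)
            else:
                merged.append(peak)                      # peak has ended
        open_peaks = still_open + [[(i, j)] for j in idx]
    return merged + open_peaks
```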
Reichenbach et al. [71-72] adapted the watershed algorithm to peak detection in GC×GC chromatography; the new method is termed the drain algorithm. The drain algorithm, which has been applied in 2D LC and 2D GC [71, 73], is an inversion of the watershed algorithm. [65, 71] When applied to two-dimensional chromatography, the chromatographic peak (a mountain) is treated as a negative peak (a basin), so the algorithm works by detecting peaks from the top down and then out to the surrounding valleys, with minimum thresholds defined by the user in two dimensions. [65, 73] Noise artifacts cause over-segmentation, i.e. detection of multiple regions that should be segmented as a single region; this, however, can be alleviated by smoothing. [71] The fatal drawback of the drain algorithm is the sub-peak discontinuity, which makes a peak "appear" and "disappear" several times during the course of elution; moreover, peak splitting occurs owing to the intolerance to retention time variation in the second dimension. [65] Since variation in the second dimension is unavoidable, Peters et al. [69] proposed a method for peak detection in two-dimensional chromatography using the algorithm (termed the C-algorithm) developed by Vivó-Truyols et al. [74], which was originally designed for one dimension. They extended its use to two-dimensional GC data, and it was shown to be able to quantify the 2D peaks. The C-algorithm was originally designed for GC×GC, but the authors claimed that it can also be used for LC×LC with minor modification. Vivó-Truyols et al. [65] built a model suitable for both LC×LC and GC×GC and compared the C-algorithm and the watershed algorithm. In their studies, the watershed algorithm had a 20% probability of failure under normal GC×GC conditions, using the C-algorithm as a reference.

Alignment

Alignment of retention times is also very important in preprocessing, since retention time variations can be caused by pressure, temperature and flow rate fluctuations as well as column bleeding. The purpose of alignment (also named warping) is to synchronize the time axes in order to construct consistent representations of the signals of n chromatograms (corresponding to n samples) for further data analysis, such as calibration and classification. [22] To acquire reproducible analyses of samples, peak position shifts should be corrected by alignment algorithms. [38] When higher-order instruments are used for analysis, the data obtained become more complicated to process. Methods particularly developed for alignment in 2D can be categorized into two groups [75]: the first group seeks the maximum correlation or the minimum distance between chromatograms on the basis of a one-dimensional benefit function, e.g. correlation optimized warping (COW) [76-77] and dynamic time warping (DTW); the second group, by contrast, focuses on second-order instruments, which generate a matrix of data per sample. Examples of such methods are rank minimization (RM), which has the remarkable advantage that interferences coeluting with the analytes of interest do not really affect the performance of the alignment [75], iterative target factor analysis coupled to COW (ITTFA-COW), and parallel factor analysis (PARAFAC). Yu et al. [75] developed a new alignment method named abstract subspace difference (ASSD), based on RM with some modifications. The performance of this new method is comparable with that of RM on both simulated and experimental data, but it is more advantageous than RM owing to its higher intelligence and its suitability for dealing with analytes coeluting with multiple interferences. Furthermore, ASSD can be used in combination with trilinear decomposition to obtain the second-order advantages.
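For the first group of methods, a bare-bones stand-in is a rigid shift chosen by maximizing the correlation with a reference chromatogram; genuine COW/DTW additionally allow local stretching and compression of the time axis, which this sketch (with invented names) does not:

```python
import numpy as np

def rigid_shift_align(chrom, reference, max_shift=50):
    """Shift `chrom` by the lag (within +/- max_shift points) that maximizes
    its correlation with `reference`: a crude one-parameter 'warp'.
    Wrap-around at the edges (np.roll) is ignored for simplicity."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_shift, max_shift + 1):
        corr = np.dot(np.roll(chrom, lag), reference)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return np.roll(chrom, best_lag), best_lag
```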
Furthermore, ASSD can be used in 363 combination with trilinear decomposition to obtain the second-order advantages. Eilers [78] 364 developed a fast and stable parametric model for warping function which consumes little memory and 365 avoids the artifacts of DTW which is time and memory consumptive. This method is very useful for 366 quality control and is easily interpolated, allowing alignment of batches of chromatograms for a 367 limited number of calibration samples. [78] 17 | P a g e 368 369 370 371 Figure 5. Aligning homogeneous TIC images. (a) Shown are the contours and peaks of TIC chromatograms for a pair of FA + AA samples, which are respectively from the first and last GCXGC/TOF-MS analyses. [79] 372 Most of the alignment methods are based on the similar procedures originally developed for 1D 373 chromatograms; however in 2D chromatograms the alignment becomes more critical due to the higher 374 relative variability of the retention time in the very short 2D time window. [80] Therefore, Castillo et 375 al. [80] proposed an alignment algorithm called score alignment utilizing two retention times in two 376 dimensions to improve the precision of the retention time alignment in two dimensions. Zhang et al. 377 [79] developed an alignment method for GC×GC-MS data termed 2D-COW which works on two 378 dimensions simultaneously. Example of application of 2D-COW is presented in Figure 5. Besides, 379 this method can work for both homogenous and heterogeneous chemical samples with a slightly 380 different process before. It was also claimed to be applicable in any 2D separation images such as 381 LC×LC data, LC×GC data, LC×CE data and CE×CE data in principle. Pierce et al. [81] has 382 developed a comprehensive 2D retention time alignment algorithm using a novel indexing scheme. 383 This new comprehensive alignment algorithm is demonstrated by correcting GC×GC data, but was 384 designed for all kinds of 2D instruments. After alignment, the classification by PCA gave 100% 385 accurate scores clustering. However, there is still future work needed to improve this algorithm for 386 combination with spectral detection in order to preserve the spectral information as alignment is 387 performed on retention time. Furthermore, the range of the shifting should also be investigated as well 388 as perturbations in pressure and temperature since the data used in their work gave far peak shifting 389 exceeding the nearest neighbor peaks. A second-order retention time algorithm proposed by Prazen et 390 al. [37] was applied to retention time shifting in second-order analysis, and it was originally designed 391 for chromatographic analysis coupled with spectrometric detection (i.e. LC-UV, GC-MS). However, 392 it is successfully applied on GC×GC data because the second GC column possesses the signal 18 | P a g e 393 precision to act like a spectrometric detector. [82-83] This method requires an estimation of the 394 number of chemical components in the time window of the sample being analyzed, however, it is not 395 a disadvantage because many literature covered the estimation or psuedorank of a bilinear data matrix 396 (i.e. Prazen et al. [37] used PCA using singular value decomposition (SVD) to estimate the 397 psuedorank). [84-86] This alignment algorithm is not objective for first-order chromatographic 398 analysis because retention time is the only qualitative information. [37] Fraga et al. [87] proposed a 399 2D alignment algorithm based on their previous alignment method developed for 1D [88]. 
This 2D alignment method can objectively correct run-to-run retention time variations in both dimensions in an independent, stepwise way, and it has been proved robust as well. The authors also claimed that this 2D alignment, combined with the generalized rank annihilation method (GRAM), was successfully extended to high-speed 2D separation conditions with a reduced data density.

Normalization

Normalization is also an important preprocessing step, owing to the bias introduced by sample preparation and by the poor injection-volume precision of injectors. The most commonly used method is the internal standard method. [89] However, the choice of internal standard is limited, since the internal standard needs to be inert and fully resolved from all native components while possessing a structure similar to the sample analytes. As with baseline correction, several algorithms exist for normalization. When no standard solution is used, however, the responses depend strongly on the detector. For example, flame ionization detector (FID) responses in GC depend largely on the carbon content of the solute; if samples of similar types are analyzed by FID, a normalization algorithm will introduce the least error into the data analysis. In general, normalization is often used for determining the components of a sample mixture, since the response of each component may vary over the course of the analysis, making comparison difficult. Other normalization methods mathematically force either the mean signal of each chromatogram to equal 1 [90] or the maximum peak volume to equal 1 [91], so that the sum of all signals per chromatogram constitutes 100% and each component takes a certain percentage of the whole.
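The two mathematical variants just mentioned are one-liners; a sketch with hypothetical array names (the helper functions are invented for illustration and correspond only loosely to refs. [90-91]):

```python
import numpy as np

def normalize_mean_to_one(chrom):
    """Force the mean signal of the chromatogram to 1 (cf. ref. [90])."""
    return chrom / chrom.mean()

def normalize_max_to_one(chrom):
    """Force the maximum peak intensity/volume to 1 (cf. ref. [91])."""
    return chrom / chrom.max()

def relative_composition(peak_volumes):
    """Express integrated peak volumes as percentages of the total,
    so that all components together sum to 100%."""
    v = np.asarray(peak_volumes, dtype=float)
    return 100.0 * v / v.sum()
```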
To summarize, the preprocessing procedures are controversial and tricky, since, together with the user's interventions, they determine how useful the raw data become. Given the advantages and limitations of each method in every preprocessing step, there is no single best method for all cases. When preprocessing methods are not used in the right way, unwanted variations can be introduced. [40] Currently, some software tools developed for data processing come with certain algorithms, chosen by the manufacturer, embedded for commercial use (e.g. GC Image), yet many researchers prefer to use in-house written routines [92]. However, there are no clear-cut guidelines for choosing the optimal methods. A nice review of preprocessing methods with critical comments is given by Engel et al. [40].

Data processing

The advantage of 2D instrumental analysis is that the data produced by the instrument provide more information, in the form of second-order, third-order or even higher-order arrays, along with unwanted interferences. This advantage, however, comes at the cost of more complicated extraction of the useful information. Once preprocessing of the data has been performed well, the next step is the real data processing, with procedures such as data reduction, data decomposition and classification.

Supervised and unsupervised learning

In data analysis, statistical learning always falls into two categories: supervised and unsupervised. Let us look at grouping as an example to understand these two categories. In chemometrics, the aim of classification is to separate a number of samples into different groups by their distinguishing characteristics - their similarities. However, the word classification is ambiguous in the field of pattern recognition. To clarify the definitions, "grouping" is used herein instead to indicate grouping in general. There are two types of grouping in pattern recognition: supervised and unsupervised. Supervised pattern recognition requires a training set with known groupings in advance, and tries to answer precisely to which group an unknown sample belongs. [93] In short, supervised pattern recognition is based on the prerequisite that the number of groups of samples is known beforehand; this kind of grouping is termed classification. Unsupervised grouping is applied to explore the data when the number of groups is not known beforehand, and the aim is to find the similarities and dissimilarities between samples. For instance, given a large number of wine chromatograms, researchers may want to separate the samples into several groups by their origin; in this case the number of origins of the wine is unknown, and this grouping exploration is termed clustering.

Returning to the two original categories: supervised learning is the process of looking for a model that fits the observations of the predictor measurements (xi) and relates them to the associated response measurements (yi). With this model, one expects to predict the response for future observations accurately (prediction), or to better understand the relationship between the response and the predictors. In contrast, unsupervised learning has to manage a more challenging situation in which there is no response associated with the observations. It is then not possible to fit a linear regression model, since there is no response variable to predict. There are many methods for both supervised and unsupervised learning; it is not the aim of this literature review to describe all of them, particularly not in detail, and some of them are not popular in 2D chromatography. For the statistical background to this part, the reader is referred to the books in [94]. In this literature review, only the most popular methods applied in 2D chromatography are explained.

Unsupervised

HCA

HCA, short for hierarchical clustering analysis, is an unsupervised method for data mining. Unlike K-means clustering, which requires a pre-specified number of clusters K in advance, HCA does not require this. HCA also has the advantage over K-means clustering that it results in a clear tree-like representation of the observations, called a dendrogram. HCA works by putting objects/variables with a small distance (high similarity) together in the same cluster. The HCA algorithm is simple and is based on the calculation of distances, mostly the Euclidean distance (the Euclidean distance between points p and q is the length of the line segment connecting them).

Suppose there are n observations, each initially treated as its own cluster. The clustering starts by calculating the Euclidean distances between all pairs of observations and fusing the two with the smallest distance. Since it is an iterative process, the calculation and clustering continue until all the distances have been processed and every observation has been merged into the tree.
The dissimilarity between two observations indicates the height in the dendrogram at which their fusion is placed. The notion of dissimilarity between clusters of observations, as distinct from that between single observations, is termed linkage. There are four types of linkage: complete, average, single and centroid.

Linkage | Description
Complete | Maximal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
Single | Minimal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the smallest of these dissimilarities. Single linkage can result in extended, trailing clusters in which single observations are fused one at a time.
Average | Mean intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
Centroid | Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.

Table 5. A summary of the four most commonly used types of linkage in hierarchical clustering. [95]

Average and complete linkage are generally preferred over single linkage owing to their tendency to yield more balanced dendrograms, while centroid linkage is often used in genomics but suffers from the drawback of inversions, where two clusters are fused at a height below either of the individual clusters. [95] Centroid linkage is often used in the chromatography field. In general, the resulting dendrograms depend strongly on the linkage used.

This clustering method is very popular in 2D chromatography applications and is said to be suited to cases where the sample set is smaller than 250. [96] Ru et al. [97] applied both HCA and principal component analysis (PCA, explained later) to peptide data sets analyzed by 2D LC-MS, for peptide feature profiling of human breast cancer and breast disease sera. Schmarr et al. [98] also applied HCA and PCA for profiling volatile compounds from fruits. Groger et al. [99] likewise applied HCA and PCA for profiling illicit drug samples.
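In practice, HCA takes only a few lines with SciPy. The sketch below clusters hypothetical chromatographic fingerprints (the data are synthetic; the linkage is one of the options from Table 5):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

# Hypothetical data: 12 samples x 300 aligned, preprocessed signal variables,
# synthesized here as two groups with different means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (6, 300)), rng.normal(2, 1, (6, 300))])

d = pdist(X, metric="euclidean")          # all pairwise distances
Z = linkage(d, method="average")          # iterative fusion; try "complete" etc.
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
tree = dendrogram(Z, no_plot=True)        # tree-like representation of the fusions
```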
PCA

Principal component analysis (PCA) is a very useful statistical method that can be applied to simplification, data reduction, modeling, outlier detection, variable selection, classification, prediction and unmixing. [101]

Basically, for a data matrix X containing N objects (rows) and K variables (columns), PCA approximates X by the product of two smaller matrices, T and P, which capture the essential data patterns of X and are used to interpret the information. [101] Generally, the objects are the samples and the variables are the measurements. The decomposition in PCA can be performed by eigenvalue decomposition or by singular value decomposition (SVD). An illustration of PCA by eigenvalue decomposition is given in Figure 6.

Figure 6. A data matrix X with its first two principal components.

The PC model in matrix form can be expressed mathematically as

X = T · Pᵀ + E    (2)

where T is the scores matrix, with the same number of rows as the original data matrix; P is the loadings matrix, with the same number of columns as the original data matrix; and E contains the residuals, the part not explained by the PC model. The vectors ti and pj denote the columns of the scores and loadings matrices, respectively.

From a geometric perspective, a data matrix X (N rows × K columns) can be represented as an ensemble of N points distributed in a K-dimensional space. This space may be termed M-space (for measurement space), multivariate space, or K-space, to indicate its dimensionality. [101] Such a space is difficult to visualize when K > 3, though mathematically this poses no problem. The number of principal components expresses the information content in terms of PCA dimensionality: when a one-component PC model is sufficient to explain the data, the model is a straight line; a two-component PC model is a plane defined by two orthogonal lines; and a three-component PC model is a three-dimensional space in which three lines are orthogonal. More generally, an A-component PC model is an A-dimensional hyperplane, a subspace of lower dimension than its ambient space. In data reduction this is useful when the original data set is large and complex, because PCA can approximate it by a moderately complex model structure.

The PCA algorithm searches for the axis onto which the data points can be projected with the minimum loss of information (variability). In other words, since PCA is a least squares model, the PC model is built such that the sum of the squared residuals (Stotal) is smallest. The first principal component (PC1) is the one capturing the largest variance, containing the most useful information of all; the axis is then rotated to find PC2, orthogonal to PC1, capturing the largest part of the remaining variance. This process carries on until all the useful variance is captured by PCs, leaving only the residuals (noise) out. The rule of the PC model is that all PCs must be orthogonal. The number of principal components is determined by the total contribution of the PCs able to explain the data matrix, which depends on the sizes of the components. After transforming the data matrix into a number of PCs, the size of each component is measured. The size of a component is termed its eigenvalue, which quantifies the variance captured by the PC: the more significant the component, the larger the eigenvalue. [101] The eigenvalue can be calculated as the sum of squares of each PC score vector (Sk) divided by the sum of squares of the total data (Stotal). A basic assumption in PCA is that the scores and loadings vectors corresponding to the largest eigenvalue contain the most useful information relating to the specific problem. [100]
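The decomposition and the eigenvalue bookkeeping described above fit in a few NumPy lines; a minimal sketch via SVD (mean-centering is included, anticipating the scaling remarks below; the function name is invented):

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of a data matrix X (N objects x K variables) via SVD:
    X_centered = U S Vt, scores T = U S, loadings P = Vt.T."""
    Xc = X - X.mean(axis=0)                       # mean-centre the columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # scores
    P = Vt[:n_components].T                       # loadings
    E = Xc - T @ P.T                              # residuals: X = T P' + E
    var = s ** 2                                  # proportional to eigenvalues
    explained = 100 * var / var.sum()             # % variance per PC
    return T, P, E, explained
```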
A simple example is presented in Table 6: PC1 explains 44.78% of the total data matrix, and the first three PCs account for 95.37% of the total; hence, in this case, three principal components are sufficient to explain the information in the data matrix.

 | PC1 | PC2 | PC3 | PC4 | PC5
Eigenvalue (total 670) | 300 | 230 | 109 | 20 | 8
% | 44.78 | 34.34 | 16.27 | 2.99 | 1.19
Cumulative % | 44.78 | 79.11 | 95.37 | 98.36 | 99.55

Table 6. Illustration of the size of eigenvalues in PCA. [101]

A common rule is to choose the number of principal components at which the cumulative value exceeds a cut-off of 95%. However, this does not work in every case, because Stotal depends on the variance of the raw data. Sometimes the data require preprocessing before PCA is applied, which can be achieved through scaling (e.g. mean-centering the data by subtracting the column averages, which corresponds to moving the coordinate system to the centre of the data). Scaling is essential because PCA is a least squares method, meaning that variables with large variances get large loadings, which stretches the scale of the corresponding coordinate axes. The common ways to avoid this bias are standardization, mean-centering or log-centering of the matrix columns so that the variance of each column becomes 1. [94] Scaling to unit variance makes all coordinate axes the same length, so that each variable has the same influence on the PC model. [94] In general, the 95% cut-off rule is not an absolute standard; the nature of the data and personal experience should also be taken into account.

PCA provides two kinds of plots: the scores (T) plot and the loadings (Pᵀ) plot, which investigate the relationships among the objects and among the variables, respectively. The value of object i (its projection) on principal component p is termed its score.

Figure 7. (A) The scores plot obtained from PCA of 18 basil, 18 peppermint, and 18 stevia GC×GC-TOFMS m/z 73 chromatograms demonstrates differentiation between species based on metabolite profiles. [102] (B) PCA plot of 2D GC-TOFMS data for serum samples. The group A samples were stored at -20 ℃ and the group B samples at -80 ℃. [80]

As mentioned at the beginning, PCA can be applied to grouping, and PCA is an unsupervised method. PCA has been widely used in chromatography, and it works efficiently in 2D chromatography. [80, 100, 102-103] With 54 chromatograms of three different species of plants (18 per species), Pierce et al. [102] used PCA to quickly and objectively discover differences between complex samples analyzed by 2D GC-MS. PCA compared the metabolite profiles of the 54 submitted chromatograms, and the 54 scores of the m/z 73 data sets were successfully clustered into three groups according to the type of plant; furthermore, the analysis yielded highly loaded variables corresponding to chemical differences between the plants, providing information complementary to the m/z 73 signal. Before this work, such an approach had not been demonstrated for 2D GC-TOFMS metabolite data.

PCA has also been used in quality control, to detect possible outliers, by Castillo et al. [80]: 60 human serum samples were analyzed by 2D GC-MS, and the total metabolite profiles were used in the evaluation. All samples separated into two clusters according to storage temperature, as can be seen in Figure 7(B), indicating no outliers in this case. An example of PCA applied to 2D GC data by Schmarr et al. [98], for profiling the volatile compounds from fruits, is given in Figure 8.
Figure 8. PCA analysis: in the first/second principal component plot (panel A), all apples (reddish and yellowish color shades) are projected into the center, except for "Cox-Orange" and, at a much lower distance, "Pinova". Pears, encoded by green color hues, appear on the upper left, with "Alexander Lucas" and "Conference" clearly distinguishable. The group of quince fruit samples appears at the largest distance from the other samples, on the upper right. [98]

Indeed, PCA has proved to be a very popular method in 2D chromatography. Still another method, called hierarchical PCA (H-PCA), was suggested by Pierce et al. [102] as conceivably applicable to this type of data. The principle of H-PCA is basically the same, but it provides more information owing to its handling of higher-dimensional data sets. It works by constructing several PCA models, each based on a subset of the entire higher-order data set (e.g. all the mass channels of 2D GC-MS); the scores from all PCA models can then be combined to form a new matrix. The same extension of PLS is termed H-PLS, and both methods are well explained by Wold et al. [104].

MPCA

Multiway principal component analysis (MPCA), an extension of PCA, has recently become a promising exploratory data analysis method for 2D GC data. [102-103, 105-107] The principle of MPCA is basically the same as that of PCA, only extended to the higher-order data generated by the instruments. In short, MPCA is an unfolding method in which the two-way data of each sample are unfolded row-wise, and PCA is performed on the unfolded data. [107] It has been applied to matrices extracted from raw 3D data arrays to determine the exact compounds that distinguish classes of samples. [108-110]
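The row-wise unfolding at the heart of MPCA is a single reshape. A sketch on a hypothetical stack of GC×GC chromatograms (all sizes and names are invented; the PCA step repeats the SVD recipe sketched earlier):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(30, 400, 150))   # 30 samples, 400 x 150 GC×GC matrices

# MPCA: unfold each sample's matrix row-wise into one long vector...
unfolded = data.reshape(30, -1)          # shape (30, 60000)

# ...then perform ordinary PCA on the unfolded two-way table (via SVD).
Xc = unfolded - unfolded.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :3] * s[:3]                       # sample scores on first 3 PCs
loading_maps = Vt[:3].reshape(3, 400, 150)      # fold loadings back to 2D maps
```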
Supervised
PLS
Partial least squares analysis (PLS) is a dimension-reduction method which first identifies a new set of M features Zi (i = 1, 2, 3, …, M) that are linear combinations of the original variables and then fits a linear model by least squares. PLS differs from principal component regression (PCR) in being supervised: as explained earlier, it takes the response into account. In other words, PLS finds the linear combination of highly loaded chromatographic signals that captures the highest covariance between the variables and the response. A comparison between PCR and PLS is presented in Figure 9.

Figure 9. Comparison between PCR and PLS; the first PLS direction (solid green line) and first PCR direction (dotted green line) are shown. [203]

Unlike PCA, PLS places the highest weight on the variables that are most strongly related to the response. From both Figure 9 and the principle of PLS, it is clear that the PLS vectors do not fit the predictors as closely as PCA does, but they explain the relationship with the response better, which is very important for quantification; furthermore, PLS can handle multivariate response variables.

Multi-way partial least squares (N-PLS) is an extension of PLS to multiple dimensions. [111-112] Interval multi-way partial least squares (iNPLS), as the name says, uses intervals of multi-way data sets to build calibration models. [113] iNPLS is an extension of interval partial least squares (iPLS), proposed by Norgaard et al. [114], which was developed for first-order data: the data set is split into a collection of user-defined intervals, a PLS model is calculated for each interval, and the model with the lowest root mean square error of cross-validation (RMSECV) is selected. There is no iPLS algorithm for second-order data, although, like many other methods, iPLS can be applied to such data by unfolding, as PCA does. Unfolding, however, raises problems when applied to GC×GC data (e.g. it introduces bias into the calibration, because untargeted peaks coeluting with the targeted ones also fall within the intervals). This led to the development of iNPLS. iNPLS is basically the same as PLS, but it splits the data matrix into intervals in both dimensions using a multi-way algorithm. Like N-PLS, iNPLS does not have the second-order advantage, but it is able to analyze an unknown sample containing interferences that are not present in the calibration. As a supervised pattern-recognition method, partial least-squares discriminant analysis (PLS-DA) has been used for modelling, classification and prediction with 2D GC-TOFMS data [115].

Fisher ratio analysis (FRA), which calculates the ratio of the between-group variance to the within-group variance as a function of an independent variable, is a robust method for classification. In chromatography, the independent variable is the retention time. The scheme for reducing 4D data to 2D for the Fisher ratio calculation has been well depicted by Pierce et al. [116] FRA has been applied to breast cancer tumor data analyzed by 2D GC-MS [117], and Guo et al. [118] applied it to 2D GC-MS data for metabolite profiling. FRA was successfully applied to a 2D GC-TOFMS data set by Pierce et al. [116] and proved better than PCA at handling biodiversity, by differentiating regions of large within-class variance from regions of large class-to-class variance. A minimal per-variable version of the calculation is sketched below.

Figure 10. Schematic of a novel indexing scheme to reduce 4D data to 2D data for the calculation of Fisher ratios. Ultimately, the entire set of collected data is submitted automatically (i.e., not manually) to Fisher ratio analysis, and the unknown chemical differences among complex samples are discovered objectively. [116]
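That minimal per-variable calculation is the classical one-way Fisher ratio, sketched here (Python/NumPy; the unfolded samples × variables layout and the function name are illustrative assumptions, not the indexing scheme of ref. [116]):

    import numpy as np

    def fisher_ratios(X, labels):
        # X: samples x variables (e.g. unfolded chromatograms); labels: class per sample
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        classes = np.unique(labels)
        grand_mean = X.mean(axis=0)
        between = np.zeros(X.shape[1])
        within = np.zeros(X.shape[1])
        for c in classes:
            Xc = X[labels == c]
            between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
            within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        between /= len(classes) - 1               # between-class variance
        within /= len(X) - len(classes)           # within-class variance
        return between / (within + 1e-12)         # guard against division by zero

Retention-time coordinates with large ratios are those that vary between classes rather than within them, which is exactly the property exploited for classification.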
Peak Deconvolution
Peak deconvolution resolves overlapping peaks in a complex mixture to enhance the selectivity of a chromatographic technique when the separation cannot be improved by optimizing the separation conditions. [22] This is necessary for the quantification of components of interest. Several chemometric approaches can achieve deconvolution with a single-channel detector, but they are not used in everyday practice because they require advanced knowledge. [22] Typical data sets obtained with a 1D technique can be presented as a two-way data table, as in Figure 11. This type of data can be decomposed into two matrices containing the concentration profiles (chromatograms) and the spectral profiles. One example is the orthogonal projection approach (OPA) [119] followed by alternating least squares (ALS) [120]. Two alternatives to OPA are window factor analysis (WFA) and evolving factor analysis (EFA). [121] Because of the limitation of EFA in predicting peak shapes, a non-iterative EFA (iEFA) was developed [122]. EFA has been successfully applied to LC-DAD and GC-MS data. A good review of mixture-analysis approaches for bilinear data matrices, with comparisons of the different methods, has been published. [123]

Figure 11. Illustration of bilinear chromatographic data. Columns of matrix X contain the concentration profiles (chromatograms) and rows contain the spectral profiles. [123]

The methods mentioned above were mostly designed for, and applied to, 1D chromatographic data sets, including those acquired with multichannel detectors (e.g. DAD), which yield a data matrix (bilinear, second-order data) per sample. In 2D chromatography, even a single-channel detector (e.g. a single-wavelength UV detector) already produces a bilinear data matrix per sample, and the order increases to a trilinear array (which can be arranged as an I × J × K cube) or higher when a multichannel detector such as an MS is coupled. Such data provide a trilinear structure, which is beneficial because signals that are not sufficiently resolved by the instrument can be resolved mathematically. Zeng et al. [124] developed a method (named simultaneous deconvolution) using non-linear least-squares curve fitting (NLLSCF). The NLLSCF analysis is based on the Levenberg-Marquardt algorithm because of its satisfactory performance on multi-parameter systems. The method allows simultaneous deconvolution and reconstruction of the peak profiles in both dimensions, which enables accurate quantification and determination of the retention parameters of target components. It was originally designed for GC×GC data sets but can also be used for LC×LC data sets.

Figure 12. Illustration of the bilinear data structure. The data matrix, in the form of a contour plot, depicts the overlapped GC×GC signals of three components, A-C. Each component's signal is bilinear when its noise-free signal can be modeled as the product of its pure column-2 chromatographic profile vector and its pure column-1 chromatographic profile vector. Here, the bilinear portion of the data matrix is modeled as the product of two matrices, each containing the pure chromatographic profile vectors of components A-C on a given column. The non-bilinear portion of the data matrix is grouped into one matrix designated as noise. Concentration information for each component is incorporated within each pure chromatographic profile vector. [124]

The generalized rank annihilation method (GRAM), introduced by Sanchez and Kowalski [66], has been successfully applied to 1D chromatographic data, for instance for the deconvolution of peaks in LC-DAD [88, 125]. It was the first deconvolution method applied to comprehensive 2D chromatography [83, 126]. The schematic bilinear data structure of 2D chromatography is depicted in Figure 12; a toy decomposition of such a matrix is sketched after the figure captions below. The extension of GRAM from 1D GC to GC×GC rests on the fact that the second column of a GC×GC system can be treated as a multichannel detector. [83] Full resolution of all analytes of interest is not necessary, since GRAM can still be applied successfully to 2D GC data. [82-83, 126] It has even been demonstrated that, under favorable conditions, GRAM can mathematically resolve overlapped GC×GC signals without any alignment preprocessing of the data sets. [82] The deconvolved peaks in GC×GC are presented in Figure 13.

Figure 13. GRAM deconvolution of the overlapped ethylbenzene and p-xylene components in the white gas sample comprehensive GC×GC chromatogram shown in Figure 9, using a white gas standard for comparison. (A) First GC column and (B) second GC column estimated pure profiles. Deconvolution is successful despite the low resolution (0.09) of the peaks on the second column, because the retention times are very precise within and between GC runs. [82]
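To make the bilinear model of Figure 12 concrete, the short sketch below (Python/NumPy; function name and layout are assumptions) decomposes a single GC×GC matrix into column-1 and column-2 profile matrices by alternating least squares with a non-negativity clip. It illustrates the bilinear structure only; it is not an implementation of GRAM:

    import numpy as np

    def bilinear_als(X, n_components, n_iter=50, seed=0):
        # X: column-1 time x column-2 time matrix of one GC×GC run (assumed layout)
        rng = np.random.default_rng(seed)
        A = rng.random((X.shape[0], n_components))     # column-1 profiles, random start
        for _ in range(n_iter):
            # given A, solve A @ B.T ~ X for B; then swap roles and re-solve for A
            B = np.clip(np.linalg.lstsq(A, X, rcond=None)[0].T, 0, None)
            A = np.clip(np.linalg.lstsq(B, X.T, rcond=None)[0].T, 0, None)
        return A, B                                    # X is approximated by A @ B.T

GRAM itself additionally exploits the standard run, solving a generalized eigenproblem over the sample and standard matrices, which is what yields concentration estimates rather than profiles alone.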
Nevertheless, there are some prerequisites for applying GRAM to 2D chromatographic data [83]: first, the detection response must be linear with concentration; secondly, the peak shapes must remain unchanged, which means there is no overloading effect on the column; thirdly, the convoluted peaks must have some resolution in both dimensions; finally, there must be no perfect covariance in the concentrations of two compounds, within the analyzed data window, between the standard and the sample. A key advantage of GRAM over other analysis methods is that the unknown sample may contain overlapped peaks that are not present in the calibration standard. [83] While GRAM can only quantify two injections at a time, one of which must be a standard, PARAFAC does not have these limitations and can be used to analyze more than two samples, for three-way as well as four-way LC×LC data.

Figure 14. Schematic overview of PARAFAC. [106]

Parallel factor analysis (PARAFAC) is a generalization of PCA that uses an iterative process to resolve trilinear signals, refining initial estimates by ALS under signal constraints. [127] It has been applied to peak deconvolution in third-order data generated by 2D GC-TOFMS. [128-129] PARAFAC results were shown to be consistent across replicate analyses even when the accuracy was not optimal. Trilinear decomposition (TLD)-initiated PARAFAC was shown to be powerful for multivariate deconvolution in 2D GC-TOFMS data analysis: partially resolved components in complex mixtures could be deconvolved and identified without requiring a standard data set, assumptions about signal shape, or any fully selective mass signals. [128] PARAFAC is also applicable to data of higher order than three-way data sets. [120] A bare-bones version of the ALS update is sketched at the end of this section.

Figure 15. (A) PARAFAC-deconvolved column-1 pure component profiles of the low-resolution isomer data set. (B) PARAFAC-deconvolved column-2 pure component profiles of the low-resolution isomer data. [128]

Compared to PARAFAC, PARAFAC2 does not require prior alignment. [250-251] PARAFAC2 allows peaks to shift between chromatograms by relaxing the bilinearity constraint on the dimension containing the chromatographic data. When analyzing 2D LC-DAD samples, PARAFAC was not capable of analyzing the entire sample at once. 2D GC-TOFMS data sets treated with PARAFAC (with alignment) and with PARAFAC2 have been compared [131]: PARAFAC was found to be more robust at lower S/N and lower concentrations, while PARAFAC2 did not need alignment. However, this study was performed on fully resolved peaks rather than overlapped ones. Both methods are based on ALS minimization of the residual matrix and yield direct estimates of the concentrations without bias. [106] However, PARAFAC2 only permits the inner-structure relationship in one direction.
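The bare-bones, unconstrained version of the PARAFAC ALS loop mentioned above is sketched here (Python/NumPy) for a three-way array, e.g. column-1 time × column-2 time × mass channel; real PARAFAC implementations add constraints, convergence tests and better initializations:

    import numpy as np

    def parafac_als(X, rank, n_iter=100, seed=0):
        # X: three-way array of shape I x J x K (assumed layout)
        rng = np.random.default_rng(seed)
        I, J, K = X.shape
        A, B, C = (rng.random((n, rank)) for n in (I, J, K))
        X1 = X.reshape(I, J * K)                       # mode-1 matricization
        X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-2 matricization
        X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-3 matricization
        def khatri_rao(U, V):                          # column-wise Kronecker product
            return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])
        for _ in range(n_iter):                        # least-squares update per mode
            A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
            B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
            C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        return A, B, C                                 # pure profiles per mode, up to scaling

Each column triplet (A[:, r], B[:, r], C[:, r]) then estimates the two chromatographic profiles and the spectrum of one resolved component.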
Van Mispelaar et al. [106] compared several calibration methods using a standard mixture; the results are shown in Figure 16. The model given by PARAFAC2 overestimated the concentrations, while PARAFAC was the most accurate method of all. With real samples, too, the results showed that PARAFAC2 overestimated the concentration in all cases. It was also shown that PARAFAC2, which was expected to deal with retention-time shifts thanks to its inner-product structure, was neither accurate nor robust. [198]

Figure 16. Accuracy of the various quantitation methods based on the analysis of a reference mixture with known analyte concentrations. [106]

Conclusion & Future work
For the data pre-processing and real data processing procedures, a variety of methods is available for two-dimensional chromatography data sets, and they all have their advantages and disadvantages. Unfortunately, the choice of method mostly depends on user experience and preference: people tend to use what they have learned. There are no clear-cut guidelines for choosing the optimal methods. A future search for such optimal methods would substantially benefit the development of chemometrics.

Tool box
Currently, most of the pre-processing algorithms and the data processing methods for classification and quantification can be applied directly through toolboxes packaged with software such as Matlab and R (e.g. the PLS algorithm from the PLS Toolbox by Eigenvector Research Inc., Wenatchee, WA, and the N-PLS algorithm from the N-Way Toolbox by Rasmus Bro, www.models.life.ku.dk/source/nwaytoolbox/). This has stimulated the development of algorithms through user experience and offers a variety of methods to choose from according to user preference.
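As a further illustration outside the Matlab and R ecosystems (and not one of the toolboxes named above), off-the-shelf chemometric building blocks also exist in Python; for example, scikit-learn ships a ready-made PLS regression, so a calibration model on unfolded chromatograms takes only a few lines (the data shapes below are hypothetical):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    X = np.random.rand(40, 500)            # 40 unfolded chromatograms x 500 variables
    y = np.random.rand(40)                 # known analyte concentrations (calibration)
    model = PLSRegression(n_components=3).fit(X, y)
    y_hat = model.predict(X)               # in practice applied to new, unknown samples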
Reference
1. Nomenclature for Chromatography (IUPAC Recommendations 1993)
2. D. Harvey, Modern Analytical Chemistry, 1st ed., p. 549
3. L.R. Snyder, Introduction to Modern Liquid Chromatography, 3rd ed., p. 76
4. K.S. Booksh, B.R. Kowalski, Anal. Chem., 66 (1994) 782-791
5. J. Blomberg, J. High Resolut. Chromatogr., 20 (1997) 539
6. R.B. Gaines, Environ. Sci. Technol., 33 (1999) 2106
7. R.B. Gaines, in: Z. Wang, S. Stout (Eds.), Spill Oil Fingerprinting and Source Identification, Academic Press, New York, 2006, p. 169
8. G.S. Frysinger, J. High Resolut. Chromatogr., 23 (2000) 197
9. G.S. Frysinger, Environ. Sci. Technol., 37 (2003) 1653
10. J. Beens, J. High Resolut. Chromatogr., 23 (2000) 182
11. C.M. Reddy, Environ. Sci. Technol., 36 (2002) 4754
12. C.M. Reddy, J. Chromatogr. A, 1148 (2007) 100
13. G.T. Ventura, PNAS, 104 (2007) 14261
14. G.T. Ventura, Org. Geochem., 39 (2008) 846
15. A. Motoyama, Anal. Chem., 80 (2008) 7187
16. J. Peng, J. Proteome Res., 2 (2003) 43
17. M. Gilar, Anal. Chem., 77 (2005) 6426
18. J. Peng, J. Proteome Res., 2 (2003) 43
19. X. Zhang, Anal. Chim. Acta, 664 (2010) 101
20. P.Q. Tranchida, J. Chromatogr. A, 1054 (2004) 3
21. InforMetrix, Chemometrics in Chromatography, 1996
22. M. Daszykowski, Trends Anal. Chem., 25 (2006) 11
23. G. Musumarra, J. Anal. Toxicol., 11 (1987) 154
24. I. Moret, J. Sci. Food Agric., 35 (1984) 100
25. I. Moret, Riv. Vitic. Enol., 38 (1985) 254
26. L.E. Stenroos, J. Am. Soc. Brew. Chem., 42 (1984) 54
27. P.C. Van Rooyen, Dev. Food Sci., 10 (1985) 359
28. B.E.H. Saxberg, Anal. Chim. Acta, 103 (1978) 201
29. H. Engman, J. Anal. Appl. Pyrolysis, 6 (1984) 137
30. W.R. Butler, J. Clin. Microbiol., 29 (1991) 2468
31. B.R. Kowalski, Anal. Chem., 47 (1975) 1152
32. R.J. Marshall, J. Chromatogr., 297 (1984) 235
33. J.A. Pino, Anal. Chem., 57 (1985) 295
34. J.E. Zumberge, Geochim. Cosmochim. Acta, 51 (1987) 1625
35. K.S. Booksh, B.R. Kowalski, Anal. Chem., 66 (1994) 782
36. M. Escandar, Anal. Chim. Acta, 806 (2014) 8
37. B.J. Prazen, Anal. Chem., 70 (1998) 218
38. K.M. Pierce, J. Chromatogr. A, 1255 (2012) 3
39. K.M. Pierce, Sep. Purif. Rev., 41 (2012) 143
40. J. Engel, Trends Anal. Chem., 50 (2013) 96
41. R.G. Brereton, Applied Chemometrics for Scientists
42. docs.google.com/viewer?url=http%3A%2F%2Fwww.chemometrics.ru%2Fmaterials%2Fpresentations%2Fwsc4%2Fbogomolov.ppt
43. Z. Zhang, Analyst, 135 (2010) 1138
44. E.T. Whittaker, Proc. Edinburgh Math. Soc., 41 (1922) 63
45. P.J. Green, B.W. Silverman, Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman & Hall/CRC, London, 1994
46. J.O. Ramsay, B.W. Silverman, Functional Data Analysis, Springer, New York, 1998
47. http://www.alglib.net/interpolation/leastsquares.php#header1
48. H.F.M. Boelens, P.H.C. Eilers, R. Dijkstra, F. Fitzpatrick, J.A. Westerhuis, J. Chromatogr. A, 1057 (2004) 21
49. H.F.M. Boelens, P.H.C. Eilers, Th. Hankemeier, Anal. Chem., 2005
50. J.C. Cobas, J. Magn. Reson., 183 (2006) 145
51. Z. Zhang, J. Raman Spectrosc., 41 (2010) 659
52. http://www.science.uva.nl~hboelenspublicationsdraftpubEilers_2005.pdf
53. W.K. Newey, J.L. Powell, Econometrica, 1987, 819
54. S.E. Reichenbach, J. Chromatogr. A, 985 (2003) 47
55. S.E. Reichenbach, J. Chromatogr. A, 1216 (2009) 3458-3466
56. J.J. de Rooi, Anal. Chim. Acta, 771 (2013) 7-13
57. M.R. Filgueira, Anal. Chem., 84 (2012) 6747
58. wiki.eigenvector.com
59. A. Felinger, Data Analysis and Signal Processing in Chromatography, Elsevier, 1998
60. P.G. Stevenson, M. Mnatsakanyan, G. Guiochon, R.A. Shalliker, Analyst, 135 (2010) 1541
61. P.H.C. Eilers, Anal. Chem., 75 (2003) 3631
62. http://en.wikipedia.org/wiki/Uncertainty_principle
63. R.X. Gao, R. Yan, Wavelets: Theory and Applications for Manufacturing
64. G. Vivo-Truyols, Anal. Chem., 78 (2006) 4598
65. G. Vivo-Truyols, J. Chromatogr. A, 1217 (2010) 1375
66. A. Felinger, Data Analysis and Signal Processing in Chromatography, Elsevier, Amsterdam, 1998 (Chapter 8)
67. I. Francois, K. Sandra, P. Sandra, Anal. Chim. Acta, 641 (2009) 14
68. L. Mondello, M. Herrero, T. Kumm, P. Dugo, H. Cortes, G. Dugo, Anal. Chem., 80 (2008) 5418
69. S. Peters, J. Chromatogr. A, 1156 (2007) 14
70. J. Beens, H. Boelens, R. Tijssen, J. Blomberg, J. High Resolut. Chromatogr., 21 (1998) 47
71. S.E. Reichenbach, Chemom. Intell. Lab. Syst., 71 (2004) 107
72. S.E. Reichenbach, J. Chromatogr. A, 1071 (2005) 263-269
73. S.E. Reichenbach, J. Sep. Sci., 33 (2010) 1365-1374
74. G. Vivo-Truyols, J. Chromatogr. A, 1096 (2005) 133-145
75. Y. Yu, Analyst, 138 (2013) 627
76. N. Vest Nielsen, J. Chromatogr. A, 805 (1998) 17
77. N.P.V. Nielsen, J. Chromatogr. A, 805 (1998) 17
78. P.H.C. Eilers, Anal. Chem., 76 (2004) 404
79. D. Zhang, Anal. Chem., 80 (2008) 2664-2671
80. S. Castillo, Anal. Chem., 83 (2011) 3058
81. K.M. Pierce, Anal. Chem., 77 (2005) 7735
82. B.J. Prazen, J. Microcolumn Sep., 107 (1999) 97
83. C.A. Bruckner, Anal. Chem., 70 (1998) 2796
84. N.M. Faber, L.M.C. Buydens, G. Kateman, Anal. Chim. Acta, 296 (1994) 1
85. N.M. Faber, L.M.C. Buydens, G. Kateman, Chemom. Intell. Lab. Syst., 1994, 203
86. E.R. Malinowski, J. Chemom., 3 (1988) 49
87. C.G. Fraga, Anal. Chem., 73 (2001) 5833
88. C.G. Fraga, Anal. Chem., 70 (1998) 218
89. E.J.C. van der Klift, J. Chromatogr. A, 1178 (2008) 43
90. J. Vial, Talanta, 83 (2011) 1295
91. R.E. Mohler, J. Chromatogr. A, 1186 (2008) 401
92. M. Kallio, J. Chromatogr. A, 1216 (2009) 2923
93. Chemometrics: Data Analysis for the Laboratory and Chemical Plant
94. D.L. Massart, Handbook of Chemometrics and Qualimetrics; J.N. Miller, Statistics and Chemometrics for Analytical Chemistry; etc.
95. G. James, An Introduction to Statistical Learning
96. M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein (1998) Cluster
97. Q. Cindy Ru, Mol. Cell. Proteomics, 5.6
98. H. Schmarr, J. Chromatogr. A, 1217 (2010) 565
99. Th. Groger, J. Chromatogr. A, 1200 (2008) 8
100. S. Wold, Chemom. Intell. Lab. Syst., 2 (1987) 37
101. R.G. Brereton, Applied Chemometrics for Scientists
102. K.M. Pierce, Talanta, 70 (2006) 797
103. V.G. van Mispelaar, J. Chromatogr. A, 1071 (2005) 229
104. S. Wold, J. Chemom., 10 (1996) 463-482
105. K.M. Pierce, J. Chromatogr. A, 1184 (2008) 341
106. V.G. van Mispelaar, J. Chromatogr. A, 1019 (2003) 15
107. G.T. Ventura, J. Chromatogr. A, 1218 (2011) 2584
108. P. Giordani, J. Chemom., 18 (2004) 253
109. A.K. Smilde, J. Chemom., 17 (2003) 323
110. S. Wold, J. Chemom., 1 (1987) 41
111. L.A.F. de Godoy, E.C. Ferreira, M.P. Pedroso, C.H.V. Fidelis, F. Augusto, R.J. Poppi, Anal. Lett., 41 (2008) 1603
112. M.P. Pedroso, L.A.F. de Godoy, E.C. Ferreira, R.J. Poppi, F. Augusto, J. Chromatogr. A, 1201 (2008) 176
113. L.A.F. de Godoy, Talanta, 83 (2011) 1302-1307
114. L. Norgaard, A. Saudland, J. Wagner, J.P. Nielsen, L. Munck, S.B. Engelsen, Appl. Spectrosc., 54 (2000) 413
115. X. Li, Anal. Chim. Acta, 633 (2009) 257
116. K.M. Pierce, Anal. Chem., 78 (2006) 5068
117. S.E. Reichenbach, Talanta, 83 (2011) 1279
118. X. Guo, Biotechnol. Bioeng., 99 (2008) 4
119. F.C. Sanchez, Anal. Chem., 68 (1996) 79
120. R. Tauler, D. Barcelo, Trends Anal. Chem., 12 (1993) 319
121. F.C. Sanchez, Chemom. Intell. Lab. Syst., 36 (1997) 153
122. M. Maeder, Anal. Chem., 59 (1987) 527
123. F.C. Sanchez, Chemom. Intell. Lab. Syst., 34 (1996) 139
124. Z. Zeng, J. Chromatogr. A, 1218 (2011) 2301-2310
125. E. Sanchez, J. Chromatogr., 385 (1987) 151
126. C.G. Fraga, B.J. Prazen, R.E. Synovec, Anal. Chem., 72 (2000) 4154
127. O. Amador-Munoz, P.J. Marriott, J. Chromatogr. A, 1184 (2008) 323
128. A.E. Sinha, J. Chromatogr. A, 1056 (2004) 145
129. A.E. Sinha, J. Chromatogr. A, 1058 (2004) 209
130. R. Bro, J. Chemom., 17 (2003) 274
131. T. Skov, J.C. Hoggard, R. Bro, R.E. Synovec, J. Chromatogr. A, 1216 (2009)