Literature Review

Data Processing Methods for 2D Chromatography

Muzi Li (1022 6761)
Supervisor: dr. Gabriel Vivo-Truyols

University of Amsterdam
MSc Chemistry, Analytical Sciences
ABSTRACT
Two-dimensional liquid and gas chromatography have become increasingly popular in many application fields, such as metabolomics, petroleum and food analysis, owing to their substantial resolving power for separating complex samples. However, the additional dimension, together with the variety of detection instruments, complicates the data sets by increasing the order of the data generated, even though higher-order data also yield more useful information through, for instance, second-order advantages. Higher-order data require data processing methods that transform chromatograms and spectra into useful information in several steps. Data processing consists of two stages: data pre-processing and the actual data processing. Pre-processing, which includes baseline correction, peak detection, smoothing and derivatives, alignment and normalization, aims to reduce the variations unrelated to the chemical variation, caused by interferences such as noise. This step is important because it prepares the raw data sets for the actual data processing, such as classification, identification and quantification. If pre-processing fails, the data may remain obscured by unrelated variations, causing the subsequent data processing to fail as well. In the data processing stage, methods such as PCA, GRAM and PARAFAC are used for the classification and quantification of the sample compounds. This literature review presents, though not in detail, the most popular data processing methods used for two-dimensional chromatography, together with their applications.
CONTENTS
INTRODUCTION .................................................................................................................................. 5
Order of Instrument............................................................................................................................. 8
Data pre-processing .............................................................................................................................. 10
Baseline correction............................................................................................................................ 10
Smoothing and Derivatives ............................................................................................................... 15
Peak detection ................................................................................................................................... 16
Alignment ......................................................................................................................................... 17
Normalization ................................................................................................................................... 19
Data processing ..................................................................................................................................... 20
Supervised and unsupervised learning .............................................................................................. 20
Unsupervised................................................................................................................................. 21
Supervised ..................................................................................................................................... 27
Conclusion & Future work.................................................................................................................... 34
Reference .............................................................................................................................................. 35
INTRODUCTION

Two-dimensional (2D) chromatography (2D liquid chromatography or 2D gas chromatography) refers to a procedure in which part or all of the sample components to be separated are subjected to two separation steps with different separation mechanisms. In planar chromatography, 2D chromatography refers to procedures in which components first migrate in one direction and subsequently in a direction at right angles to the first, using two different eluents. [1] Compared to 1D chromatography, 2D chromatography possesses substantially greater resolving power, providing high separation efficiency and selectivity. In 1D chromatography, resolution (RS) is often used to quantitatively measure the degree of separation between two components A and B. RS is defined as the retention time difference of two adjacent peaks divided by their average peak width. [2] However, it is difficult to obtain acceptable resolution of all peaks for complex samples consisting of numerous components. Therefore, peak capacity (PC) was introduced to measure the overall separation, particularly for complex samples analyzed with gradient elution in 2D chromatography. [3] The peak capacity of a separation is defined as the total number of peaks that can be fitted into a chromatographic window when every peak is separated from its neighbours with RS = 1. [3] Since the fractions from the first separation are further resolved in the second, orthogonal separation, the peak capacity of a 2D separation equals the product of the peak capacities of the individual separations. [3] For instance, if the isocratic peak capacity in 1D is PC = 100, the total peak capacity in 2D would be PC = 100 × 100 = 10,000. [3]
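For reference, the two definitions above can be written compactly in standard notation (here $t_{R,A}$ and $t_{R,B}$ are the retention times, $w_A$ and $w_B$ the baseline peak widths of peaks A and B, and ${}^{1}n$, ${}^{2}n$ the peak capacities of the two dimensions; the factor 2 follows the usual convention):

$$R_S = \frac{2\,(t_{R,B} - t_{R,A})}{w_A + w_B}, \qquad n_{2D} = {}^{1}n \times {}^{2}n$$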
Owing to this high resolving power [5-14], the use of 2D chromatographic separations has risen substantially in the biochemical field [15-18]; a review of multidimensional LC in the field of proteomics [19] and a review of the application of 2D chromatography in food analysis [20] have been published. After instrumental analysis, the data sets produced need to be interpreted in terms of identification, classification and quantification. The process of transforming data into useful information, such as inferring a property of interest (typically involving the search for bio-markers) or classifying a sample into one of several categories, is termed chemometrics. An example of sample classification is given in Figure 1.
Figure 1. Classification of chromatograms is based on the relative abundance of all the peaks in the mixture. [21]
Although an enormous amount of information can be extracted from 2D chromatography, the complexity of the raw data generated by 2D instruments makes data interpretation time-consuming, and there is often a risk of overlooking features in the data. In order to extract the most useful information from the raw data, a well-defined data processing procedure needs to be performed. Daszykowski et al. [22] summarized the results listed in Table 1 by searching paper titles and keywords containing chemometrics and chromatography; the results show a promising scope for solving chromatographic problems. Over the past decades, multiple chemometric methods have been developed, applied and improved for each class of chromatographic problem, each with its own advantages and disadvantages. The application of chemometrics in chromatography now extends from drug identification in pharmaceutics [23], beer/wine quality control and classification [24-27] and proving economic fraud in the food field [28], to the identification of microbial species by evaluation of cell-wall material [29-30] and the prediction of disease state in clinical medicine [31-33], as well as oil exploration (oil-oil correlation, oil-source rock correlation) in petroleum [34].
Keyword(s) | Number of hits
Multivariate curve resolution | 44
Alternating least squares | 34
MCR-ALS | 18
Chemometrics | 403
Experimental design | 605
Multivariate analysis | 275
Pattern recognition | 280
Classification | 1029
PCA | 556
QSPR | 51
QSAR | 111
Topological indices | 38
Topological descriptors | 11
Modeling retention | 5
Fingerprints | 802
Clustering | 219
Peak shifts | 16
Deconvolution | 244
Background correction | 21
De-noising | 9
Noise reduction | 17
Signal enhancement | 43
Preprocessing | 45
Mixture analysis | 82
Alignment | 383
Warping | 11
Peak matching | 18
Peak detection | 54
Wavelets | 32
Score | 5456

Table 1. Results of keyword search in the SCOPUS system, using a quick search ("keyword(s)" and chromatography). [22]
Order of Instrument

Before introducing data processing methods for 2D chromatography, a classification based on data type must first be defined in order to better understand the methods to be applied. The classification of analytical instruments or methods, which simplifies data processing, is based on the type of data generated and uses existing mathematical terminology, as follows [35]: a zero-order instrument is one that generates only a single datum per sample, since a single number is a zero-order tensor. Examples of zero-order instruments are ion-selective electrodes and single-filter photometers. A first-order instrument, which includes all types of spectrometers, chromatographs and even arrays of zero-order sensors, is one that generates multiple measurements at one time or for one sample, where the measurements can be put into an ordered array as a vector of data (also termed a first-order tensor). Likewise, a second-order instrument generates a matrix of data per sample; such instruments are mostly, but not exclusively, hyphenated systems such as gas chromatography-mass spectrometry (GC-MS), LC-MS/MS and GC×GC. Even more complex instruments can generate data of higher order, and there is no limit to the maximum order of the data. [35] The concepts of data order are depicted in Table 2.
Data order | Array (one sample) | Array (a sample set) | Calibration
Zero | Scalar | One-way | Univariate
First | Vector | Two-way | Multivariate
Second | Matrix | Three-way | Multi-way
Third | Three-way | Four-way | Multi-way
Fourth | Four-way | Five-way | Multi-way

Table 2. Different arrays that can be obtained for a single sample and for a set of samples. [36]
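As a minimal illustration of these data orders (a sketch using NumPy; the array sizes are arbitrary assumptions, not taken from the text):

```python
import numpy as np

# Zero-order: a single datum per sample (e.g., an ion-selective electrode)
zero_order = 0.42

# First-order: a vector per sample (e.g., a single-channel chromatogram)
first_order = np.zeros(1500)            # 1500 time points (assumed)

# Second-order: a matrix per sample (e.g., GC-MS: time x m/z channel)
second_order = np.zeros((1500, 64))     # sizes assumed

# Stacking a set of N samples always adds one "way" to the array,
# as in Table 2: a set of matrices becomes a three-way array.
sample_set = np.stack([second_order] * 10)
print(sample_set.shape)                 # (10, 1500, 64)
```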
Calibration | Required selectivity | Maximum analytes | Minimum standards (with offset) | Interferences | Signal averaging | Statistics | Something extra
Zero order | Full | 1 | 1 (2) | Cannot detect; analysis biased | None | Simple, well defined | −
First order | Net analyte signal | No. of sensors | 1 per species (1+1 per species present) | Can detect; analysis biased | ~√J | Complex, defined | −
Second order | Net analyte rank | min (I, J) | 1 (2) | Can detect; analysis accurate | ~√(I·J) | Complex, not fully investigated | First-order profiles

Table 3. Advantages and disadvantages of different calibration paradigms. [35]
For 2D chromatographic analysis, the data are at least first-order (e.g. LC×LC); however, a single-wavelength detector or a hyphenated detector (e.g. LC×LC-MS) is commonly employed, which complicates the data by raising the data order to second or even third order while providing more detailed and precise information through second-order advantages (Table 3). The primary second-order advantage is the higher selectivity obtained even in the presence of unknown interferences. [35, 37] In the following, the different methods are categorized and elucidated to give an overview of the current data processing methods for 2D chromatography.
Data pre-processing

In 2D data processing, pre-processing must be applied to the raw data before quantitative and/or qualitative data analysis, because the chemically relevant variation is obscured by irrelevant chromatographic variations caused by interferences such as noise (high frequency) and background (low frequency); the intermediate frequencies correspond to the component signals. Pre-processing is crucial because it reduces the variations unrelated to the chemical variation in a chemometric analysis, and it has become the critical point that can determine success or failure in many applications. [38-40] Particularly in metabolomics, the choice of pre-processing methods has become a primary issue that is both difficult and influential in the final results. [41] In most cases, variables located adjacent to each other in a data set are related and contain similar information; methods for noise filtering and background correction exploit this relationship for interference removal.
Baseline correction

The interferences in the baseline consist of background (low frequency) and noise (high frequency). General baseline correction procedures are designed to reduce the low-frequency background, while some procedures and smoothing techniques specifically target the high-frequency variations so as to improve the signal-to-noise (S/N) ratios. [38] Note that the term background tends to be used in a more general sense, designating any unwanted signal including noise and chemical components, while the baseline is associated with a smooth line reflecting a "physical" interference. [42] Figure 2 elucidates the difference between background, noise and the component signal.

Figure 2. Components of analytical signal: (a) overall signal; (b) relevant signal; (c) background; and, (d) noise. [22]
Baseline correction is typically the first pre-processing step, because baseline shifts and interferences contributed by solvents, impurities etc. assign imprecise signals to each component at a fixed concentration. After baseline correction, the baseline noise signal should be numerically centered at zero. [38] The simplest way to correct the baseline is to run a "blank" analysis and subtract its chromatogram from the sample one. However, several blank runs need to be performed in this case to obtain a confidence level on the blank chromatogram, since variations may occur from run to run. A second approach is to use polynomial least squares fitting to simulate a blank chromatogram and then subtract it from the sample one. This method is effective to some extent but requires user intervention and is prone to variability, particularly in low-S/N environments. [43] An alternative is to use a penalized least squares algorithm with some adaptation. The penalized least squares algorithm was first published by Whittaker [44] in 1922 as a flexible smoothing method (noise reduction). Silverman et al. [45-46] later developed another smoothing method, named the roughness penalty method. The penalized least squares algorithm can be regarded as roughness-penalty smoothing by least squares, balancing fidelity to the original data against the roughness of the fitted data. [43] The asymmetric least squares (ALS) approach was widely applied by Eilers et al. for smoothing [47], for background correction in hyphenated chromatography [48] and for finding new features in large spectral data sets [49]. In order to apply the penalized least squares algorithm to baseline correction, both Cobas et al. [50] and Zhang et al. [51] introduced a weight vector based on the original data. However, in both cases peak detection must be performed before baseline correction, while the presence of a baseline negatively affects the peak detection. The method proposed by Cobas et al. [50] does not handle complex baselines, while the method of Zhang et al. [51], though improving on Cobas's with better accuracy, is time-consuming, particularly for two-dimensional data sets.
Eilers et al. [52] proposed an alternative algorithm, termed asymmetric weighted least squares, based on the original asymmetric least squares (AsLS) [53]. It combines a Whittaker smoother with asymmetric weighting of the deviations from the (smooth) trend to obtain an effective baseline estimator. The advantages of this method are [52]: 1) the baseline position can be adjusted by varying two parameters (p for asymmetry and λ for smoothness), while the flexibility of the baseline can be tuned with p; 2) it is fast and effective while keeping the analytical peak signal intact; 3) no prior information about peak shape or baseline (polynomial) is needed. With this method, GC chromatograms and MS, Raman and FTIR spectra were successfully baseline-corrected. However, it is difficult to find the optimal value of λ, and the method cannot provide a fully automatic procedure to set the optimal parameter values, so user judgment and experience are needed.
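To make this concrete, a minimal Python sketch of the AsLS idea just described (a Whittaker smoother with asymmetric weights; the parameter values are illustrative assumptions, and this is not the authors' reference implementation):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a baseline by asymmetric least squares: points above the
    trend (peaks) get weight p, points below get weight 1-p, and lam
    penalizes the roughness (second differences) of the fit."""
    m = len(y)
    D = sparse.eye(m, format='csr')
    D = D[1:] - D[:-1]
    D = D[1:] - D[:-1]                      # second-order difference matrix
    DTD = (D.T @ D).tocsc()
    w = np.ones(m)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + lam * DTD).tocsc(), w * y)
        w = np.where(y > z, p, 1.0 - p)     # asymmetric reweighting
    return z                                # corrected signal: y - z
```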
To overcome these problems, a novel algorithm termed adaptive iteratively reweighted Penalized Least Squares (airPLS) was proposed by Zhang et al. [43]. According to Zhang et al. [43], the adapted method is similar to weighted least squares and iteratively reweighted least squares, but it uses a different way to calculate the weights and adds a penalty term to control the smoothness of the fitted baseline. The airPLS algorithm has proved effective in baseline correction while preserving the information useful for classification, and the R2, Q2 and RMSECV of regression models pre-treated with airPLS were evidently better than those obtained with the methods of Cobas et al. [50] and Eilers et al. [52] or without correction, especially when the number of principal components in principal component analysis (PCA) is small. [43] (R2: coefficient of determination, indicating how well data points fit a line or curve; it normally ranges from 0 to 1, and the higher the value, the better the model. Q2: the cross-validated counterpart of R2. RMSECV: root-mean-square error of cross-validation, a measure of a model's ability to predict new samples; the smaller the value, the better the model.) Recently, a new method was developed by Reichenbach et al. [54] for two-dimensional GC (GC×GC) and incorporated into the GC Image software system. The algorithm is based on a statistical model of the background values, obtained by tracking the adjacency around the smallest values as a function of time and of noise. The estimated background level is then subtracted from the entire image, producing a chromatogram in which the peaks rise above a near-zero mean background. The algorithm effectively removes the background level of GC×GC data, but it does not remove some artifacts observed in these images. This approach was soon adapted for LC×LC in two important aspects. Since the background signal under gradient conditions in LC can decrease or increase as a slope due to the change in solvent composition, the baseline correction should track the "middle" value instead of the smallest. [55] Another problem is that the variance of the background in the second dimension (2D) is significant, so the correction algorithm should model both dimensions. [55] An example of an LC chromatogram before and after baseline correction is given in Figure 3.
Other algorithms, such as weighted least squares (WLS), have also been proposed and applied to baseline correction, mainly for spectra. A commonly used approach for 2D chromatography is the linear least squares algorithm, which can be applied in multiple dimensions, whereas other common algorithms such as polynomial least squares are limited to one dimension. De Rooi et al. [56] developed a two-dimensional baseline method based on penalized regression, originally for spectroscopy but claimed to be applicable to two-dimensional chromatographic data as well. Filgueira et al. [57] developed a new method called orthogonal background correction (OBGC), particularly useful for correcting complex DAD background signals in fast online LC×LC. This method builds on two existing baseline correction methods for one-dimensional liquid chromatography (the moving-median filter and polynomial fitting) and extends them, in combination with either of the two, to correct the two-dimensional background in LC×LC. The newly developed method was compared with the two basic methods and with dummy blank subtraction on second-dimension (2D) chromatograms; the results are illustrated in Figure 4.
Figure 3. (a) Background values before (solid line) and after (dashed line) correction along a single row in the first dimension. A row with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values fluctuate in a small range centered very close to zero. (b) Background values before (solid line) and after (dashed line) correction along a single column in the second dimension. This secondary chromatogram with no analyte peaks was selected so that the values reflect only the baseline and noise. After correction, the values in the region of analysis are very close to zero. [55]
Figure 4. Comparison of estimated baselines using the different methods on a typical single 2D chromatogram. The chromatograms are intentionally offset by 7 mAU to aid visualization. (a) Conventional baseline correction methods: the blue solid line is the real single 2D chromatogram; the black dashed line is the baseline estimated with the moving-median filter, and the red dot-dashed line is the baseline estimated with the polynomial fitting method. (b) The two methods applied in combination with the OBGC method; the line formats are the same as in (a). [57]
Furthermore, compared to dummy blank subtraction, the reproducibility of the measured peak heights was significantly enhanced after application of OBGC. This robust new baseline correction method has proved effective for LC×LC and is considered applicable to any 2D technique in which the first dimension (1D) has lower-frequency baseline fluctuations than the 2D. However, the authors did not clearly explain the principle of the newly developed method in the article, mentioning only that its development was based on the existing methods.
No. | Name of method | Advantages (√) | Disadvantages (×) | Usage [Ref]
1 | Dummy blank subtraction | Simple | Manual, time-consuming, more errors | Baseline correction
2 | Polynomial least squares fitting | Effective | User intervention, not suitable for low S/N | Baseline correction
3 | Penalized least squares | Balances fidelity to the original data against roughness of the fitted data | Needs peak detection | Smoothing [44]
4 | Roughness penalty method | − | − | Smoothing [45-46]
5 | Asymmetric least squares (AsLS) | Effective | Need to optimize two parameters; constant weights for the entire region | Smoothing [47], background correction [48]
6 | Weighted vector AsLS-1 | No need for peak detection | Not for complex baselines | Baseline correction [50]
7 | Weighted vector AsLS-2 | No need for peak detection, better accuracy | Time-consuming | Baseline correction [51]
8 | Asymmetric weighted least squares | Easy to perform by tuning two parameters; fast and effective; no prior information needed | Difficult to find the optimal value for one parameter; user judgment and experience needed | Baseline correction [52]
9 | Adaptive iteratively reweighted penalized least squares (airPLS) | Effective while preserving the primary information, particularly for a small number of principal components in classification; extremely fast for large datasets | − | Baseline correction [43]
10 | LC/GC Image incorporated | Powerful, accurate, quick | − | Baseline correction [55]
11 | Orthogonal background correction (OBGC) | Effective in 2D; highly reproducible peak heights | − | Baseline correction [57]

Table 4. Summary of methods for baseline correction and smoothing.
Smoothing and Derivatives

Smoothing is a low-pass filter for the removal of high-frequency noise from samples and is sometimes termed noise reduction. As mentioned above, some of the algorithms discussed can also be applied to smoothing. In smoothing, adjacent variables in the data matrix, which share similar information, are averaged to reduce noise without significant loss of the signal of interest. [58] Smoothing can also be performed with a linear Kalman filter, which is mostly used as an alternative to linear least squares for estimating the concentrations of mixture components [59], often in 1D chromatography. The classic smoothing method is the Savitzky-Golay smoother (SGS) [60], which fits a low-order polynomial to each data point and its neighbours and then replaces the signal at that point with the value provided by the polynomial fit. [38] In practice, however, missing values and the boundaries of the data domain complicate the computation when using SGS. Instead, the Whittaker smoother, based on penalized least squares, has several advantages over SGS: it is reported to be extremely fast, to adapt to boundaries automatically, and even to allow fast leave-one-out cross-validation. [61]
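A minimal sketch of the Whittaker smoother just described (penalized least squares with a second-difference penalty; the parameter value is an arbitrary assumption):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, lam=1e4):
    """Smooth y by minimizing (fidelity to y) + lam * (roughness),
    where roughness is the sum of squared second differences."""
    m = len(y)
    D = sparse.eye(m, format='csr')
    D = D[1:] - D[:-1]
    D = D[1:] - D[:-1]                      # second-order differences
    A = sparse.eye(m) + lam * (D.T @ D)
    return spsolve(A.tocsc(), y)
```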
It is worth noting that digital filters are well suited to signal processing in terms of eliminating undesired frequencies without distorting the frequency region containing the crucial information. [22] Digital filters can operate in either the time domain or the frequency domain. The windowed Fourier transform (FT) is often used to analyze a signal in both the time and frequency domains by studying the signal segment by segment. [22] However, the FT has a severe disadvantage: owing to the Heisenberg uncertainty principle, precision in time and frequency cannot be achieved simultaneously. The narrower the window, the better the peaks are localized in time, at the cost of less precision in frequency, and vice versa for a broader window. [22, 62] To obtain precision in both the time and frequency domains, the wavelet transform (WT) is preferable, particularly for non-stationary signals (signals whose features change with time or in space). The WT exploits the intermediate cases of the uncertainty principle so as to capture precision in both the time and frequency domains with only a small sacrifice of precision in each. [22, 63]
In contrast to smoothing, taking derivatives acts as a high-pass filter with frequency-dependent scaling. Derivatives are a common way to remove unimportant baseline signals from samples by differentiating the measured responses with respect to the variable number or another relevant axis scale, such as wavelength. They are used either when lower-frequency features (baseline) are interferences or when higher-frequency features contain the signal of interest. [58] This method presupposes that the variables are strongly related to each other and that adjacent variables contain similarly correlated signal. [58] The Savitzky-Golay algorithm is often used to smooth the data while simultaneously taking the derivative, so as to improve the utility of the differentiated data. [19] Vivó-Truyols et al. [64] developed a method to select the optimal window size for the Savitzky-Golay algorithm in smoothing, which was successfully applied to NMR, chromatography and mass spectrometry data and shown to be robust.
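A short example of this combined smoothing-plus-derivative step using the Savitzky-Golay filter as implemented in SciPy (the simulated signal and window settings are illustrative assumptions):

```python
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0.0, 10.0, 1000)
# A Gaussian peak on a slowly drifting baseline, plus noise (simulated)
y = np.exp(-(t - 5.0)**2 / 0.05) + 0.1 * t + 0.02 * np.random.randn(t.size)

y_smooth = savgol_filter(y, window_length=21, polyorder=3)       # smoothing
dy = savgol_filter(y, window_length=21, polyorder=3, deriv=1)    # 1st derivative
# In dy the linear drift (0.1*t) becomes a constant offset, so it no
# longer distorts peak finding based on sign changes of the derivative.
```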
Peak detection

Peak detection is another key step in data pre-processing; it distinguishes the important information, sometimes the most important information, from the noise, particularly in the search for bio-markers. Peak detection methods are well developed for 1D chromatography with single-channel detection; they are based on detecting signal changes at the detector and applying the condition of unimodality. [65] These methods fall mainly into two families [66]: those that make use of matched filters and those that make use of derivatives. Only a few peak detection methods for two-dimensional chromatography (performed in time, as opposed to 2D-PAGE in space) have been reported in the literature [67-68], and only two main families of methods are available [65]: those based on the extension of 1D peak detection algorithms [69-70] and those based on the watershed algorithm [71].

In general, the former follow a two-step procedure [65]: first, peaks are detected in one-dimensional form using the raw signal from the detector, a step that avoids the sub-peak discontinuity problem of the drain algorithm; second, a collection of criteria is applied to decide on merging the one-dimensional peaks into single two-dimensional peaks. Despite slight differences in those criteria, they are all based on peak profile similarities (i.e. peaks detected in the first dimension eluting at the same time in the second dimension).
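A toy sketch of this two-step family (step 1 uses SciPy's find_peaks on each second-dimension chromatogram; step 2 merges detections from adjacent modulations with a crude retention-time similarity criterion; all thresholds are assumptions, and real implementations use more elaborate merging rules):

```python
import numpy as np
from scipy.signal import find_peaks

def detect_2d_peaks(chrom, tol=2, height=5.0):
    """chrom: 2D array, rows = second-dimension time, columns =
    first-dimension modulations. Returns clusters of (column, row)
    detections, each cluster representing one merged 2D peak."""
    clusters = []
    for col in range(chrom.shape[1]):
        rows, _ = find_peaks(chrom[:, col], height=height)   # step 1
        for row in rows:
            for c in clusters:                               # step 2
                last_col, last_row = c[-1]
                if last_col == col - 1 and abs(last_row - row) <= tol:
                    c.append((col, row))
                    break
            else:
                clusters.append([(col, row)])
    return clusters
```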
Reichenbach et al. [71-72] adapted the watershed algorithm to peak detection in GC×GC, and the resulting method is termed the drain algorithm. The drain algorithm, which has been applied to 2D LC and 2D GC [71, 73], is an inversion of the watershed algorithm. [65, 71] Applied to two-dimensional chromatography, a chromatographic peak (a mountain) is treated as a negative peak (a basin), so the algorithm works by detecting peaks from the top downwards to the surrounding valleys, in both dimensions, with minimum thresholds defined by the user. [65, 73] Noise artifacts cause over-segmentation, i.e. detection of multiple regions that should have been segmented as a single region; this, however, can be remedied by smoothing. [71] The fatal drawback of the drain algorithm is the discontinuity of sub-peaks, which makes a peak "appear" and "disappear" several times during the course of elution; moreover, peak splitting occurs because the algorithm does not tolerate retention time variation in the second dimension. [65] Since variation in the second dimension is unavoidable, Peters et al. [69] proposed a method for peak detection in two-dimensional chromatography using the algorithm (termed the C-algorithm) developed by Vivó-Truyols et al. [74], which was originally designed for one dimension. They extended its use to two-dimensional GC data and showed that it is able to quantify 2D peaks. The C-algorithm was originally designed for GC×GC, but the authors claimed that it can also be used for LC×LC with minor modifications. Vivó-Truyols et al. [65] built a model suitable for both LC×LC and GC×GC and compared the C-algorithm with the watershed algorithm. In their study, the watershed algorithm had a 20% probability of failure under normal GC×GC conditions, using the C-algorithm as a reference.
Alignment

Alignment of retention times is also very important in pre-processing, since retention times can be affected by pressure, temperature and flow rate fluctuations as well as column bleeding. The purpose of alignment (also named warping) is to synchronize the time axes in order to construct comparable representations of the signals of n chromatograms (corresponding to n samples) for further data analysis such as calibration and classification. [22] To acquire reproducible analyses of samples, peak position shifts should be corrected with alignment algorithms. [38] When a higher-order instrument is used for the analysis, the data obtained become more complicated to process. Methods developed specifically for alignment in 2D can be categorized into two groups [75]: the first group seeks the maximum correlation or the minimum distance between chromatograms on the basis of a one-dimensional benefit function, for example correlation optimized warping (COW) [76-77] and dynamic time warping (DTW); the second group, by contrast, focuses on second-order instruments, which generate a matrix of data per sample. Examples of such methods are rank minimization (RM), which has the remarkable advantage that interferences coeluting with the analytes of interest hardly affect the alignment performance [75], iterative target transformation factor analysis coupled to COW (ITTFA-COW), and parallel factor analysis (PARAFAC). Yu et al. [75] developed a new alignment method named abstract subspace difference (ASSD), based on RM with some modifications. The performance of this new method is comparable with that of RM on both simulated and experimental data, but it is more advantageous than RM owing to its higher intelligence and its suitability for dealing with analytes coeluting with multiple interferences. Furthermore, ASSD can be combined with trilinear decomposition to obtain the second-order advantages. Eilers [78] developed a fast and stable parametric model for the warping function that consumes little memory and avoids the artifacts of DTW, which is time- and memory-consuming. This method is very useful for quality control and is easily interpolated, allowing the alignment of batches of chromatograms from a limited number of calibration samples. [78]
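For concreteness, a minimal dynamic-programming sketch of plain DTW, the simplest of the one-dimensional warping methods mentioned above (not COW, and not Eilers' parametric warping):

```python
import numpy as np

def dtw_cost(a, b):
    """Accumulated-cost matrix for aligning signals a and b; the optimal
    warping path is recovered by backtracking from D[-1, -1]."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[1:, 1:]
```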
Figure 5. Aligning homogeneous TIC images. (a) Shown are the contours and peaks of TIC chromatograms for a pair of FA + AA samples, respectively from the first and last GC×GC/TOF-MS analyses. [79]
Most alignment methods are based on procedures similar to those originally developed for 1D chromatograms; in 2D chromatograms, however, alignment becomes more critical because of the higher relative variability of the retention time in the very short 2D time window. [80] Therefore, Castillo et al. [80] proposed an alignment algorithm called score alignment, which uses the two retention times in the two dimensions to improve the precision of the retention time alignment. Zhang et al. [79] developed an alignment method for GC×GC-MS data termed 2D-COW, which warps the two dimensions simultaneously; an example of its application is presented in Figure 5. Moreover, this method can handle both homogeneous and heterogeneous chemical samples, with a slightly different procedure beforehand, and it was claimed to be applicable in principle to any 2D separation images, such as LC×LC, LC×GC, LC×CE and CE×CE data. Pierce et al. [81] developed a comprehensive 2D retention time alignment algorithm using a novel indexing scheme. This comprehensive alignment algorithm was demonstrated by correcting GC×GC data, but it was designed for all kinds of 2D instruments. After alignment, classification by PCA gave 100% accurate clustering of the scores. However, future work is still needed to make this algorithm compatible with spectral detection, so that the spectral information is preserved while the alignment is performed on the retention times. Furthermore, the permissible range of shifting should be investigated, as well as perturbations in pressure and temperature, since the data used in their work showed peak shifts larger than the distance to the nearest neighbouring peaks. A second-order retention time alignment algorithm proposed by Prazen et al. [37] was applied to retention time shifts in second-order analysis; it was originally designed for chromatography coupled with spectrometric detection (e.g. LC-UV, GC-MS). Nevertheless, it was successfully applied to GC×GC data, because the second GC column possesses sufficient signal precision to act like a spectrometric detector. [82-83] This method requires an estimate of the number of chemical components in the time window being analyzed; this is not a real disadvantage, because much of the literature covers the estimation of the pseudorank of a bilinear data matrix (e.g. Prazen et al. [37] used PCA with singular value decomposition (SVD) to estimate the pseudorank). [84-86] This alignment algorithm is not objective for first-order chromatographic analysis, because there the retention time is the only qualitative information. [37] Fraga et al. [87] proposed a 2D alignment algorithm based on their previous alignment method developed for 1D [88]. This 2D alignment method can objectively correct run-to-run retention time variations in both dimensions in an independent, stepwise way, and it has also proved robust. They further claimed that this 2D alignment combined with the generalized rank annihilation method (GRAM) was successfully extended to high-speed 2D separation conditions with a reduced data density.
Normalization

Normalization is another important pre-processing step, owing to the bias introduced by sample preparation and the limited injection volume precision of injectors. The most commonly used approach is the internal standard method. [89] However, the choice of internal standard is limited, since it must be inert and fully resolved from all native components while possessing a structure similar to the sample analytes. As with baseline correction, several normalization algorithms exist. Even without a standard, normalization is possible, although the responses then depend strongly on the detector. For example, flame ionization detector (FID) responses in GC depend largely on the carbon content of the solute; if samples of similar types are analyzed by FID, a normalization algorithm will introduce the least error into the data analysis. In general, normalization is often used for determining the components of a sample mixture, because the response of each component may vary over the course of the analysis, making comparisons difficult. Other normalization methods mathematically force either the mean signal of each chromatogram to equal 1 [90] or the maximum peak volume to equal 1 [91], so that the summed signal of a chromatogram constitutes 100% and each component takes a certain percentage of the whole.
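The last two options amount to a one-line rescaling per chromatogram, as sketched below (NumPy; the input array is a placeholder for a real chromatogram):

```python
import numpy as np

def normalize_mean(chrom):
    """Force the mean signal of the chromatogram to equal 1 [90]."""
    return chrom / chrom.mean()

def normalize_max(chrom):
    """Force the maximum peak intensity/volume to equal 1 [91]."""
    return chrom / chrom.max()
```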
To summarize, the pre-processing procedures are controversial and tricky, since they, together with the judgment of the user, determine how useful the raw data become. Given the advantages and limitations of each method in every pre-processing step, there is no single best method for all cases, and when pre-processing methods are used incorrectly, unwanted variations can be introduced. [40] Currently, some software tools developed for data processing come with certain algorithms chosen by the manufacturer for commercial use (e.g. GC Image), although many researchers prefer in-house written routines [92]. However, there are no clear-cut guidelines for choosing the optimal methods. A good review of pre-processing methods, with critical comments, is given by Engel et al. [40].
Data processing

The advantage of 2D instrumental analysis is that the data produced provide more information, in the form of second-order, third-order or even higher-order arrays, even in the presence of unwanted interferences. This advantage comes at the cost of a more complicated extraction of the useful information. Once pre-processing has been performed well, the next stage is the actual data processing, comprising procedures such as data reduction, data decomposition and classification.
Supervised and unsupervised learning

In data analysis, statistical learning falls into two categories: supervised and unsupervised. Grouping serves as a good example to understand the two. In chemometrics, the aim of classification is to separate a number of samples into different groups by their distinguishing characteristics (similarities). However, the word classification is ambiguous in the field of pattern recognition; to clarify the terminology, grouping is used herein to indicate grouping in the general sense. There are two types of grouping in pattern recognition: supervised and unsupervised. Supervised pattern recognition requires a training set with known groupings in advance, and it tries to assign an unknown sample to one of these groups as precisely as possible. [93] In short, supervised pattern recognition is based on the prerequisite that the number of groups in the samples is known beforehand, and this kind of grouping is termed classification. Unsupervised grouping is applied to explore the data when the number of groups is not known beforehand and the aim is to find the similarities and dissimilarities between samples. For instance, given a large number of wine chromatograms, researchers may want to separate the samples into groups by origin; in that case the number of origins is unknown, and this exploratory grouping is termed clustering.

Returning to the two original categories: supervised learning is a process that looks for a model fitting the observations of the predictor measurements (xi) and relating them to the associated response measurements (yi). With such a model, one expects either to predict the response for future observations accurately (prediction) or to better understand the relationship between the response and the predictors. In contrast, unsupervised learning tries to manage the more challenging situation in which there is no response associated with the observations; it is then not possible to fit, for example, a linear regression model, since there is no response variable to predict. There are many methods for both supervised and unsupervised learning; it is not the aim of this literature review to describe all of them in detail, and some of them are not popular in 2D chromatography. For a full treatment of the statistical methods in this part, the reader is referred to the books cited in [94]. In this review, only the most popular methods applied in 2D chromatography are explained.
Unsupervised

HCA

HCA, short for hierarchical clustering analysis, is an unsupervised method for data mining. Unlike K-means clustering, which requires a pre-specified number of clusters K in advance, HCA does not require this. HCA also has the advantage over K-means clustering that it yields a clear tree-like representation of the observations, called a dendrogram. HCA works by putting objects/variables separated by a small distance (high similarity) together in the same cluster. The HCA algorithm is simple and is based on the calculation of distances, mostly the Euclidean distance (the Euclidean distance between points p and q is the length of the line segment connecting them).

Suppose there are n observations, each of them initially treated as its own cluster. The clustering starts by finding the pair of observations with the smallest Euclidean distance among all pairs and fusing them. The process is iterative: the distance calculation and fusing continue until all observations have been merged.

The dissimilarity between two observations indicates the height in the dendrogram at which their fusion is placed. The dissimilarity between clusters of observations, as opposed to that between single observations, is termed linkage. There are four types of linkage: complete, average, single and centroid (Table 5).
Linkage | Description
Complete | Maximal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
Single | Minimal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the smallest of these dissimilarities. Single linkage can result in extended, trailing clusters in which single observations are fused one at a time.
Average | Mean intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
Centroid | Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.

Table 5. A summary of the four most commonly used types of linkage in hierarchical clustering. [95]

Average and complete linkage are generally preferred over single linkage because they tend to yield more balanced dendrograms, while centroid linkage is often used in genomics but suffers from the drawback of inversion, where two clusters are fused at a height below either of the individual clusters. [95] Centroid linkage is often used in the chromatography field. In general, the resulting dendrograms depend strongly on the linkage used.
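A minimal sketch of HCA with SciPy (the data matrix is a random placeholder for a real peak table; the linkage method and cluster count are arbitrary assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

X = np.random.rand(12, 40)          # 12 samples x 40 variables (placeholder)

Z = linkage(X, method='average', metric='euclidean')  # build the tree
labels = fcluster(Z, t=3, criterion='maxclust')       # cut into 3 clusters
tree = dendrogram(Z, no_plot=True)                    # dendrogram structure
```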
This clustering method is very popular in 2D chromatography applications and is said to be suitable for sample sets smaller than 250. [96] Ru et al. [97] applied both HCA and principal component analysis (PCA, explained later) to peptide data sets analyzed by 2D LC-MS for peptide feature profiling of human breast cancer and breast disease sera. Schmarr et al. [98] also applied HCA and PCA to profile the volatile compounds of fruits, and Groger et al. [99] applied HCA and PCA to profile illicit drug samples.
PCA

Principal component analysis (PCA) is a very useful statistical method that can be applied to simplification, data reduction, modeling, outlier detection, variable selection, classification, prediction and unmixing. [101]

Basically, for a data matrix X containing N objects (rows) and K variables (columns), PCA approximates X by the product of two smaller matrices T and P, which capture the essential data patterns of X and serve to interpret the information. [101] Generally, the objects are the samples and the variables are the measurements. The decomposition in PCA can be performed by eigenvalue decomposition or by singular value decomposition (SVD). An illustration of PCA by eigenvalue decomposition is given in Figure 6.

Figure 6. A data matrix X with its first two principal components.

In matrix form, the PC model can be expressed mathematically as

X = T · Pᵀ + E    (2)
where T are the scores, having the same number of rows as the original data matrix; P are the loadings, having the same number of columns as the original data matrix; and E, the part not explained by the PC model, contains the residuals. The vectors of the scores and loadings matrices are denoted ti and pj, respectively.

From a geometric perspective, a data matrix X (N rows × K columns) can be represented as an ensemble of N points distributed in a K-dimensional space. This space may be termed M-space, for measurement space or multivariate space, or K-space, to indicate its dimensionality. [101] For K > 3 this space is difficult to visualize, though not to handle mathematically. The number of principal components expresses the dimensionality of the PCA description: when a one-component PC model is sufficient to explain the data, the model is a straight line; a two-component PC model is a plane spanned by two orthogonal lines; a three-component PC model is a three-dimensional space spanned by three orthogonal lines; and, in general, an A-component PC model is an A-dimensional hyperplane, a subspace of lower dimension than its ambient space. In data reduction this is useful when the original data set is large and complex, since PCA can then approximate it with a model structure of moderate complexity.

The PCA algorithm searches for the axis onto which the data points can be projected with minimum loss of information (variability). In other words, since PCA is a least squares model, the PC model is built such that the sum of the squared residuals (Stotal) is smallest. The first principal component (PC1) captures the largest variance and thus the most useful information; the axis is then rotated to search for PC2, orthogonal to PC1, which captures the second largest portion of the remaining variance. This process continues until all the useful variance is captured by the PCs and only the residuals (noise) are left out. The rule of the PC model is that all PCs must be orthogonal. The number of principal components is determined by the total contribution of the PCs to explaining the data matrix, which depends on the sizes of the components: after the data matrix has been transformed into a number of PCs, the size of each component is measured. The size of a component is termed its eigenvalue, which quantifies the variance captured by that PC: the more significant the component, the larger its size. [101] The fraction of variance explained by a PC can be calculated as the sum of squares of its score vector (Sk) divided by the total sum of squares of the data (Stotal). A basic assumption in PCA is that the scores and loadings vectors corresponding to the largest eigenvalue contain the most useful information relating to the specific problem. [100] A simple example is presented in Table 6: PC1 explains 44.78% of the total data matrix and the first three PCs account for 95.37% of the total; hence, in this case, three principal components are sufficient to explain the information in the data matrix.
 | Eigenvalue | % | Cumulative %
Total | 670 | |
PC1 | 300 | 44.78 | 44.78
PC2 | 230 | 34.34 | 79.11
PC3 | 109 | 16.27 | 95.37
PC4 | 20 | 2.99 | 98.36
PC5 | 8 | 1.19 | 99.55

Table 6. Illustration of the size of eigenvalues in PCA. [101]
A common rule is to choose the number of principal components such that the cumulative explained variance exceeds a cut-off value of 95%. However, this does not work in every case, because Stotal depends on the variance of the raw data. Sometimes the data require pre-treatment before PCA is applied, which can be achieved through scaling (e.g. mean-centering the data by subtracting the column averages, which corresponds to moving the coordinate system to the centre of the data). Scaling is essential because PCA is a least squares method, meaning that variables with large variances obtain large loadings, which stretches the corresponding coordinate axes. The common ways to avoid this bias are standardization, mean-centering or log-centering of the matrix columns so that the variance becomes 1. [94] Scaling to unit variance gives all coordinate axes the same length, so that each variable has the same influence on the PC model. [94] In general, the 95% cut-off rule is not an absolute standard; one should also take the nature of the data and personal experience into account.
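A compact sketch of PCA via SVD along the lines described above (mean-centering followed by decomposition; the function name is illustrative and not from the text):

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of X (N objects x K variables): mean-center, decompose
    Xc = U S Vt, then scores T = U*S and loadings P = Vt.T."""
    Xc = X - X.mean(axis=0)                      # mean-centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2                             # component "sizes"
    explained = 100.0 * eigvals / eigvals.sum()  # % variance per PC
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings
    return T, P, np.cumsum(explained)[:n_components]
```

With the Table 6 example in mind, the cumulative percentages returned here implement the 95% cut-off rule directly.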
PCA provides two kinds of plots, the scores (T) plot and the loadings (PT) plot, which investigate the relationships among the objects and among the variables, respectively. The value of object i (its projection) on a principal component p is termed its score.

Figure 7. (A) The scores plot obtained from PCA of 18 basil, 18 peppermint, and 18 stevia GC×GC-TOFMS m/z 73 chromatograms demonstrates differentiation between species based on metabolite profiles. Ref [176]. (B) PCA plot of 2D GC-TOFMS data for serum samples. The group A samples were stored at -20 ℃ and the group B samples were stored at -80 ℃. [80]
As mentioned at the beginning, PCA can be applied to grouping, and PCA is an unsupervised method. PCA has been used widely in chromatography and works efficiently in 2D chromatography. [80, 100, 102-103] With 54 chromatograms of three different plant species (18 per species), Pierce et al. [102] used PCA to discover, quickly and objectively, differences between complex samples analyzed by 2D GC-MS. PCA compared the metabolite profiles of the 54 submitted chromatograms, and the 54 scores of the m/z 73 data sets clustered successfully into three groups according to plant type; furthermore, the highly loaded variables corresponded to chemical differences between the plants, providing complementary information for m/z 73. Prior to that work, this approach had not been demonstrated on 2D GC-TOFMS metabolite data.

PCA has also been used in quality control, to detect possible outliers, by Castillo et al. [80]: 60 human serum samples were analyzed by 2D GC-MS, and the total metabolite profiles were used in the evaluation. All samples separated into two clusters according to storage temperature, as can be seen in Figure 7(B), indicating that there were no outliers in this case. An example of PCA applied by Schmarr et al. [98] to 2D GC data for profiling the volatile compounds of fruits is given in Figure 8.
Figure 8. PCA analysis: In the first/second principal component plot (panel A), except for "Cox-Orange" and, at much smaller distance, "Pinova", all apples (reddish and yellowish color shades) are projected into the center. Pears, which are encoded by a green color hue, appear on the upper left, while "Alexander Lucas" and "Conference" are clearly distinguishable. The group of quince fruit samples appears at the largest distance from the other samples, on the upper right. [98]
Indeed, PCA has proved to be a very popular method in 2D chromatography. Still another method, called hierarchical PCA (H-PCA), was suggested by Pierce et al. [102] as conceivably applicable to this type of data. The principle of H-PCA is basically the same, but it provides more information because it deals with higher-dimensional data sets. It works by constructing several PCA models, each based on a subset of the entire higher-order data set (e.g. all the mass channels of 2D GC-MS), after which the scores from all PCA models can be combined to form a new matrix. The same extension of PLS is termed H-PLS, and both methods are well explained by Wold et al. [104].
MPCA

Multiway principal component analysis (MPCA), an extension of PCA, has recently become a promising exploratory data analysis method for 2D GC data. [102-103, 105-107] The principle of MPCA is basically the same as that of PCA, only extended to the higher-order data generated by the instruments. In short, MPCA is an unfolding method in which the two-way data of each sample are unfolded row-wise, after which PCA is performed on the unfolded data. [107] It has been applied to matrices extracted from raw 3D data arrays to determine the exact compounds that distinguish classes of samples. [108-110]
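The row-wise unfolding at the heart of MPCA amounts to a single reshape, sketched below (the array sizes are placeholders, and the SVD step is ordinary PCA as in the earlier sketch):

```python
import numpy as np

# N samples, each a t1 x t2 two-way chromatogram (placeholder data)
data = np.random.rand(10, 50, 40)

# Row-wise unfolding: each sample's matrix becomes one long row vector,
# giving an N x (t1*t2) matrix; ordinary PCA is then applied to it.
unfolded = data.reshape(data.shape[0], -1)
Xc = unfolded - unfolded.mean(axis=0)              # mean-centering
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]                          # first two PC scores
```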
Supervised

PLS

Partial least squares analysis (PLS) is a dimension reduction method that first identifies a new set of M features Zi (i = 1, 2, 3, ..., M) that are linear combinations of the original variables and then fits a linear model by least squares. PLS differs from principal component regression (PCR) in that it is supervised: as explained earlier, it takes the response into account. In other words, PLS seeks the linear combination of highly loaded chromatographic signals that captures the highest covariance between the variables and the response. A comparison between PCR and PLS is presented in Figure 9.

Figure 9. Comparison between PCR and PLS; the first PLS direction (solid green line) and first PCR direction (dotted green line) are shown. [203]
In contrast to PCA, PLS places the highest weight on the variables that are most strongly related to the response. From Figure 9 and from the principle of PLS it is clear that the PLS vectors do not fit the predictors as closely as those of PCA, but they provide a better explanation of the relationship with the response, which is very important for quantification; furthermore, PLS can handle multivariate response variables.
Multi-way partial least squares (N-PLS) is an extension of PLS to multiple dimensions. [111-112] Interval multi-way partial least squares (iNPLS), as the name says, uses intervals of multi-way data sets to build calibration models. [113] iNPLS is an extension of interval partial least squares (iPLS), proposed by Norgaard et al. [114], which was developed for first-order data: the data set is split into a collection of user-defined intervals, a PLS model is calculated for each interval, and the interval with the lowest root mean square error of cross-validation (RMSECV) is retained. There is no such algorithm for second-order data, although iPLS, like many other methods, can be applied to it by unfolding the data, as PCA does. Unfolding, however, raises problems when applied to GC×GC data (e.g. bias is introduced into the calibration because untargeted peaks coeluting with the targeted ones are also included in the intervals). This has resulted in the development of iNPLS, which is basically the same idea but splits the data matrix into intervals in both dimensions using a multi-way algorithm. Like N-PLS, iNPLS does not have the second-order advantage, but it is able to analyze an unknown sample containing interferences that are not present in the calibration. As a supervised pattern-recognition method, partial least squares discriminant analysis (PLS-DA) has been used for modelling, classification and prediction with 2D GC-TOFMS data [115].
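A minimal sketch of the interval logic described above (first-order case) is given below; the numbers of intervals, components and cross-validation folds are illustrative assumptions, not the settings of ref. [114]:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    def ipls_best_interval(X, y, n_intervals=10, n_components=2):
        edges = np.linspace(0, X.shape[1], n_intervals + 1, dtype=int)
        best = None
        for lo, hi in zip(edges[:-1], edges[1:]):            # one PLS model per interval
            pls = PLSRegression(n_components=n_components)
            y_cv = cross_val_predict(pls, X[:, lo:hi], y, cv=5)
            rmsecv = np.sqrt(np.mean((y - y_cv.ravel()) ** 2))
            if best is None or rmsecv < best[0]:
                best = (rmsecv, (lo, hi))
        return best          # (lowest RMSECV, bounds of the winning interval)

    # usage with the synthetic X, y of the PLS sketch above:
    # rmsecv, (lo, hi) = ipls_best_interval(X, y)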
Fisher ratio analysis (FRA), which calculates the ratio of the variance between groups to the variance within groups as a function of an independent variable, is a robust method for classification. In chromatography, the independent variable is the retention time. The scheme for reducing 4D data to 2D for the Fisher ratio calculation has been well depicted by Pierce et al. [116] (Figure 10). FRA has been applied to breast cancer tumor data analyzed by 2D GC-MS [117], and Guo et al. [118] applied it to 2D GC-MS data for metabolite profiling. FRA was successfully applied to a 2D GC-TOFMS data set by Pierce et al. [116] and proved to be better than PCA at handling biodiversity, differentiating regions of large within-class variance from regions of large class-to-class variance.
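The per-variable Fisher ratio itself is straightforward to compute once the data have been reduced to a samples-by-variables matrix with a class label per sample; the sketch below assumes that reduction (the indexing scheme of ref. [116]) has already been carried out:

    import numpy as np

    def fisher_ratios(X, classes):
        # X: (samples x variables); classes: one group label per sample
        labels = np.unique(classes)
        grand = X.mean(axis=0)
        between = np.zeros(X.shape[1])
        within = np.zeros(X.shape[1])
        for c in labels:
            Xc = X[classes == c]
            between += Xc.shape[0] * (Xc.mean(axis=0) - grand) ** 2
            within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        # between-class over within-class variance, per variable
        return (between / (len(labels) - 1)) / (within / (X.shape[0] - len(labels)))

    # variables with large ratios discriminate the classes:
    # ranking = np.argsort(fisher_ratios(X, classes))[::-1]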
Figure 10. Schematic of novel indexing scheme to reduce 4D data to 2D data for calculation of Fisher ratios.
Ultimately, the entire set of data that is collected is automatically (i.e., not manually) submitted to Fisher ratio
analysis, and the unknown chemical differences among complex samples are objectively discovered. [116]
Peak Deconvolution
Peak deconvolution resolves overlapping peaks in complex mixtures to enhance the selectivity of a chromatographic technique when the separation cannot be improved by optimizing the separation conditions. [22] This is necessary for the quantification of the components of interest. Several chemometric approaches can be used to achieve deconvolution with a single-channel detector; however, they are not used in everyday practice because they require advanced knowledge. [22] A typical data set obtained by a 1D technique with a multichannel detector can be presented as a two-way data table, as in Figure 11. This type of data can be decomposed into two matrices containing the concentration profiles (chromatograms) and the spectral profiles. An example is the use of the orthogonal projection approach (OPA) [119] followed by alternating least squares (ALS) [120]. Two alternatives to OPA are window factor analysis (WFA) and evolving factor analysis (EFA). [121] Because of the limitation of EFA in predicting peak shapes, a non-iterative EFA (iEFA) was developed [122]. EFA has been successfully applied to LC-DAD and GC-MS data. A good review of mixture-analysis approaches for bilinear data matrices, comparing the different methods, has been published. [123]
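A bare-bones alternating least squares loop for the bilinear model X ≈ C S^T of Figure 11 (concentration profiles in C, spectra in S) might look as follows; the random initialization stands in here for OPA or EFA estimates, and only a non-negativity constraint is applied:

    import numpy as np

    def als_bilinear(X, n_components, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        S = rng.uniform(size=(X.shape[1], n_components))   # initial spectral estimates
        for _ in range(n_iter):
            C = np.linalg.lstsq(S, X.T, rcond=None)[0].T   # solve X ≈ C S^T for C, given S
            C = np.clip(C, 0, None)                        # non-negativity
            S = np.linalg.lstsq(C, X, rcond=None)[0].T     # solve for S, given C
            S = np.clip(S, 0, None)
        return C, S        # estimated chromatograms and spectra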
Figure 11. Illustration of the bi-linear chromatographic data. Columns of matrix X contain the concentration
profiles (chromatograms) and rows contain the spectral profiles. [123]
The methods mentioned above are mostly designed for, and applied to, 1D chromatography data sets, including those with multichannel detectors (e.g. DAD) producing second-order (trilinear) data sets. With 2D chromatography alone the data are bilinear, and the order of the data increases to trilinear (decomposable as an I × J × K array) or higher with different detectors (e.g. a single-wavelength UV detector or MS). Second-order data provide a trilinear structure, which is beneficial because signals that are not sufficiently resolved by the instrument can be resolved mathematically. Zeng et al. [124] developed a method (named simultaneous deconvolution) using non-linear least squares curve fitting (NLLSCF). The NLLSCF analysis is based on the Levenberg-Marquardt algorithm because of its satisfactory performance on multi-parameter systems. The method allows simultaneous deconvolution and reconstruction of the peak profiles in both dimensions, which enables accurate quantification and the determination of the retention parameters of the target components. It was originally designed for GC×GC data sets but can also be utilized for LC×LC data sets.
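As a strongly simplified, one-dimensional stand-in for such curve fitting, the sketch below fits two overlapped Gaussian peaks with SciPy's Levenberg-Marquardt solver on made-up data:

    import numpy as np
    from scipy.optimize import least_squares

    t = np.linspace(0, 10, 500)

    def two_peaks(p, t):                   # sum of two Gaussian peaks
        a1, m1, s1, a2, m2, s2 = p
        return (a1 * np.exp(-0.5 * ((t - m1) / s1) ** 2)
                + a2 * np.exp(-0.5 * ((t - m2) / s2) ** 2))

    true = np.array([1.0, 4.0, 0.4, 0.7, 5.0, 0.5])
    y = two_peaks(true, t) + 0.01 * np.random.default_rng(3).normal(size=t.size)

    fit = least_squares(lambda p: two_peaks(p, t) - y,
                        x0=[0.8, 3.8, 0.5, 0.8, 5.2, 0.5], method='lm')
    print(fit.x)          # recovered heights, positions and widths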
Figure 12. Illustration of bilinear data structure. The data matrix, in the form of a contour plot, depicts the
overlapped GC×GC signals of three components, A-C. Each component’s signal is bilinear when its noise-free
signal can be modeled as the product of its pure column 2 chromatographic profile vector and its pure column 1
chromatographic profile vector. Here, the bilinear portion of the data matrix is modeled as the product of two
matrices, each matrix containing the pure chromatographic profile vectors for components A-C on a given
column. The nonbilinear portion of the data matrix is grouped into one matrix designated as noise.
Concentration information for each component is incorporated within each pure chromatographic profile vector.
[124].
The generalized rank annihilation method (GRAM), introduced by Sanchez and Kowalski [66], has been successfully applied to 1D chromatography data, for instance for the deconvolution of peaks in LC-DAD [88, 125]. It was the first deconvolution method applied to comprehensive 2D chromatography [83, 126]. The schematic bilinear data structure of 2D chromatography is depicted in Figure 12. The extension of GRAM from 1D GC to 2D GC is based on the fact that the second column of a GC×GC system can be treated as a multichannel detector. [83] Full resolution of all the analytes of interest is not necessary for GRAM to be applied successfully to 2D GC data. [82-83, 126] It has even been demonstrated that, under favorable conditions, GRAM can mathematically resolve overlapped GC×GC signals without any alignment pre-processing of the data sets. [82] The deconvoluted peaks in GC×GC are presented in Figure 13.
Figure 13. GRAM deconvolution of the overlapped ethylbenzene and p-xylene components in the sample white gas comprehensive GC×GC chromatogram shown in Figure 9, using a white gas standard for comparison. (A) First GC column and (B) second GC column estimated pure profiles. Deconvolution is successful despite the low 0.09 resolution of the peaks on the second column, because retention times are very precise within and between GC runs. [82]
Nevertheless, there are some prerequisites for applying GRAM to 2D chromatographic data [83]: first, the detection response must be linear with the concentration; secondly, the peak shapes must remain unchanged, which means there is no overloading effect on the column; thirdly, the convoluted peaks must have some resolution in both dimensions; finally, there cannot be perfect covariance between the concentrations of two compounds within the data window being analyzed, from the standard to the sample. A key advantage of GRAM over other analysis methods is that the unknown sample can contain overlapped peaks that are not present in the calibration standard. [83] While GRAM can only quantify two injections at a time, one of which needs to be a standard, PARAFAC does not have these limitations and can be used to analyze more than two samples for three-way and four-way LC×LC data.
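The flavor of GRAM can be illustrated on noise-free synthetic rank-two data: a joint truncated SVD of sample plus standard supplies a common subspace, and a generalized eigenproblem returns one sample-to-standard concentration ratio per analyte. This follows the textbook formulation rather than any specific published implementation:

    import numpy as np
    from scipy.linalg import eigvals

    rng = np.random.default_rng(4)
    p1 = np.abs(rng.normal(size=(10, 2)))   # column-1 profiles of two analytes
    p2 = np.abs(rng.normal(size=(12, 2)))   # column-2 profiles of two analytes
    N = 1.0 * np.outer(p1[:, 0], p2[:, 0]) + 1.0 * np.outer(p1[:, 1], p2[:, 1])  # standard
    M = 2.0 * np.outer(p1[:, 0], p2[:, 0]) + 0.5 * np.outer(p1[:, 1], p2[:, 1])  # sample

    F = 2                                    # chemical rank
    U, s, Vt = np.linalg.svd(M + N)
    U, V = U[:, :F], Vt[:F].T                # joint truncated subspaces
    lam = np.real(eigvals(U.T @ M @ V, np.diag(s[:F])))   # generalized eigenvalues
    print(np.sort(lam / (1 - lam)))          # sample/standard ratios, ~[0.5, 2.0]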
Figure 14. Schematic overview of PARAFAC. [106]
Parallel factor analysis (PARAFAC) is a generalization of PCA that uses an iterative process to resolve trilinear signals, optimizing initial estimates by ALS under signal constraints. [127] It has been applied to peak deconvolution in third-order data generated by 2D GC-TOFMS. [128-129] It was shown that PARAFAC results can be consistent over replicate analyses even when the accuracy is not optimal. PARAFAC initiated by trilinear decomposition (TLD) was shown to be powerful for multivariate deconvolution in 2D GC-TOFMS data analysis: the partially resolved components in complex mixtures can be deconvoluted and identified without requiring a standard data set, signal-shape assumptions or any fully selective mass signals. [128] PARAFAC is also applicable to data of higher order than three-way data sets. [120]
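A toy PARAFAC-ALS for a three-way array, with random initialization standing in for the TLD initialization mentioned above and with no constraints, could look like this:

    import numpy as np

    def khatri_rao(A, B):
        # column-wise Kronecker product: one column per factor
        return np.einsum('if,jf->ijf', A, B).reshape(-1, A.shape[1])

    def parafac_als(X, F, n_iter=200, seed=0):
        I, J, K = X.shape
        rng = np.random.default_rng(seed)
        A = rng.normal(size=(I, F))
        B = rng.normal(size=(J, F))
        C = rng.normal(size=(K, F))
        for _ in range(n_iter):              # alternating least squares updates
            A = np.linalg.lstsq(khatri_rao(B, C), X.reshape(I, -1).T, rcond=None)[0].T
            B = np.linalg.lstsq(khatri_rao(A, C), X.transpose(1, 0, 2).reshape(J, -1).T, rcond=None)[0].T
            C = np.linalg.lstsq(khatri_rao(A, B), X.transpose(2, 0, 1).reshape(K, -1).T, rcond=None)[0].T
        return A, B, C   # e.g. column-1 profiles, column-2 profiles, mass spectra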
Figure 15. (A) PARAFAC deconvoluted column 1 pure component profiles of the low-resolution isomer data
set. (B) PARAFAC deconvoluted column 2 pure component profiles of the low-resolution isomer data. [128]
Compared to PARAFAC, PARAFAC2 does not need prior alignment. [250-251] PARAFAC2 allows peaks to shift between chromatograms by relaxing the bilinearity constraint on the dimension containing the chromatographic data. When analyzing 2D LC-DAD samples, PARAFAC was not capable of analyzing the entire sample at once. 2D GC-TOFMS data sets treated by PARAFAC (with alignment) and by PARAFAC2 have been compared [131]: PARAFAC was found to be more robust at lower S/N and lower concentrations, while PARAFAC2 did not need an alignment step. However, this study was performed on fully resolved peaks instead of overlapped peaks. Both methods are based on ALS minimization of the residual matrix and yield direct estimates of the concentrations without bias. [106] However, PARAFAC2 only permits the inner-structure relationship in one direction.
Figure 16. Accuracy of the various quantitation methods based on the analysis of a reference mixture with
known analyte concentrations. [106].
Van Mispelaar et al. [106] compared several methods for calibration with a standard mixture; the results are shown in Figure 16. The model given by PARAFAC2 was overestimating, while PARAFAC was the most accurate method of all. With real samples, the results also showed that PARAFAC2 overestimated the concentration in all cases. It was further shown that PARAFAC2, which was supposed to be able to deal with retention-time shifts owing to its inner-product structure, was neither accurate nor robust. [198]
Conclusion & Future work
For the data pre-processing and the actual data processing procedures, a variety of methods can be chosen for two-dimensional chromatography data sets, and they all have their advantages and disadvantages. Unfortunately, the choice of method mostly depends on user experience and preference: people tend to use what they have learned. There are no clear-cut guidelines for choosing the optimal methods. A future search for optimal methods would therefore be substantially beneficial for the development of chemometrics.
Tool box
Currently, most of the pre-processing algorithms and the data processing methods for classification and quantification can be applied directly through toolboxes packaged in software such as Matlab and R (e.g. the PLS algorithm from the PLS Toolbox by Eigenvector Research Inc., Wenatchee, WA, and the N-PLS algorithm from the N-Way Toolbox by Rasmus Bro, www.models.life.ku.dk/source/nwaytoolbox/). This has stimulated the development of algorithms through user experience and provides a variety of methods to choose from according to user preference.
Reference
1. Nomenclature for chromatography, (IUPAC Recommendations 1993)
2. D. Harvey, Modern analytical chemistry 1st ed., p549
3. L.R. Snyder, Introduction to Modern Liquid Chromatography 3rd ed., p76
4. K.S. Booksh, B.R. Kowalski, Anal. Chem., 66 (1994) 782-791
5. J. Blomberg, High Resolut. Chromatogr., 20 (1997) 539
6. R.B. Gaines, Environ. Sci. Technol., 33 (1999) 2106
7. R.B. Gaines, in: Z. Wang, S. Stout (Eds.), Spill Oil Fingerprinting and Source Identification,
Academic Press, New York, 2006, p. 169
8. G.S. Frysinger, High Resolut. Chromatogr. 23 (2000) 197
9. G.S. Frysinger, Environ. Sci. Technol., 37 (2003) 1653
10. J. Beens, J. High Resolut. Chromatogr., 23 (2000) 182
11. C.M. Reddy, Environ. Sci. Technol., 36 (2002) 4754
12. C.M. Reddy, J. Chromatogr. A, 1148 (2007) 100
13. G.T. Ventura, PNAS, 104 (2007) 14261
14. G.T. Ventura, Org. Geochem., 39 (2008) 846
15. A. Motoyama, Anal. Chem., 80 (2008) 7187
16. J. Peng, J. Proteome Res., 2 (2003) 43
17. M. Gilar, Anal. Chem., 77 (2005) 6426
18. J. Peng, J. Proteome Res., 2003, 2, 43
19. X. Zhang, Anal. Chimica. Acta., 664 (2010) 101
20. P.Q. Tranchida, J. Chromatogr. A, 1054 (2004) 3
21. InforMetrix, Chemometrics in Chromatogr., 1996
22. M. Daszykowski, Trends in Anal. Chem., 25 (2006) 11
23. G. Musumarra, J. Anal. Toxicology, 11 (1987) 154
24. I. Moret, J. Sci. Food Agric., 35 (1984) 100
25. I. Moret, Riv.Vitic. Enol., 38 (1985) 254
26. L.E. Stenroos, J. Am. Soc. Brew. Chem., 42 (1984) 54
27. P.C. Van Rooyen, Dev. Food Sci., 10 (1985) 359
28. B.E.H. Saxberg, Anal. Chim. Acta, 103 (1978) 201
29. H. Engman, J. Anal. Appl. Pyrolysis, 6 (1984) 137
30. W.R. Butler, J. Clin. Microbiol., 29 (1991) 2468
31. B.R. Kowalski, Anal. Chem., 47 (1975) 1152
32. R.J. Marshall, J. Chromatogr., 297 (1984) 235
33. J.A. Pino, Anal. Chem., 57 (1985) 295
34. J.E. Zumberge, Geochim. Cosmochim. Acta, 51 (1987) 1625
35. Karl S. Booksh, Bruce R. Kowalski, Analytical Chemistry, 66 (1994) 782
36. M. Escandar, Anal. Chimica Acta, 806 (2014) 8
37. B.J. Prazen, Anal. Chem. 1998, 70, 218
38. K.M. Pierce, J. Chromatogr. A, 1255 (2012) 3
39. K.M. Pierce, Sep. & Purif. Rev., 41 (2012) 143
40. J. Engel, Trends in Anal. Chem., 50 (2013) 96
41. R.G. Brereton, Applied Chemometrics for Scientists
42. docs.google.com/viewer?url=http%3A%2F%2Fwww.chemometrics.ru%2Fmaterials%2Fpresentations%2Fwsc4%2Fbogomolov.ppt
43. Z. Zhang, Analyst, 135 (2010) 1138
44. E. T. Whittaker, P. Edinburgh Math. Soc., 41 (1922) 63
45. P.J. Green and B.W. Silverman, Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman & Hall/CRC, London, 1994
46. J.O. Ramsay and B.W. Silverman, Functional Data Analysis, Springer, New York, 1998
47. http://www.alglib.net/interpolation/leastsquares.php#header1
48. H.F.M. Boelens, P.H.C. Eilers, R. Dijkstra, F. Fitzpatrick, J.A. Westerhuis, J. Chromatogr. A, 1057 (2004) 21
49. H.F.M. Boelens, P.H.C. Eilers, Th. Hankemeier, Anal. Chem. 2005.
50. J.C. Cobas, J. Magnetic Resonance 183 (2006) 145
51. Z. Zhang, J. Raman Spectrosc., 41 (2010) 659
52. http://www.science.uva.nl/~hboelens/publications/draftpub/Eilers_2005.pdf
53. W.K. Newey and J.L. Powell, Econometrica, 1987, 819
54. S.E. Reichenbach, J. Chromatogr. A, 985 (2003) 47
55. S.E. Reichenbach, Journal of Chromatography A, 1216 (2009) 3458–3466
56. J.J. de Rooi, Analytica Chimica Acta, 771 (2013) 7–13
57. M.R. Filgueira, Anal. Chem. 2012, 84, 6747
58. wiki.eigenvector.com
59. A. Felinger, Data Analysis and Signal Processing in Chromatography, Elsevier, 1998
60. P.G. Stevenson, M. Mnatsakanyan, G. Guiochon, R.A. Shalliker, Analyst, 135 (2010) 1541
61. P. H. C. Eilers, Anal. Chem. 2003, 75, 3631
62. http://en.wikipedia.org/wiki/Uncertainty_principle
63. R.X. Gao and R. Yan, Wavelets: Theory and Applications for Manufacturing
64. G. Vivo-Truyols, Anal. Chem. 2006, 78, 4598
65. G. Vivo-Truyols, Journal of Chromatography A, 1217 (2010) 1375
66. A. Felinger, Data Analysis and Signal Processing in Chromatography, Elsevier, Amsterdam, 1998 (Chapter 8).
67. I. Francois, K. Sandra, P. Sandra, Anal. Chim. Acta 641 (2009) 14.
68. L. Mondello, M. Herrero, T. Kumm, P. Dugo, H. Cortes, G. Dugo, Anal. Chem. 80 (2008) 5418
69. S. Peters, Journal of Chromatography A, 1156 (2007) 14
70. J. Beens, H. Boelens, R. Tijssen, J. Blomberg, J. High Resolut. Chromatogr. 21 (1998) 47.
71. S.E. Reichenbach, Chemom. Intell. Lab. Syst. 71 (2) (2004) 107.
72. S.E. Reichenbach, Journal of Chromatography A, 1071 (2005) 263–269
73. S.E. Reichenbach, J. Sep. Sci., 33 (2010) 1365–1374
74. G. Vivo-Truyols, Journal of Chromatography A, 1096 (2005) 133–145
75. Y. Yu, Analyst, 2013, 138, 627
76. N. Vest Nielsen, Journal of Chromatography A, 805 (1998) 17
77. N. P. V. Nielsen, J. Chromatogr., A, 805 (1998) 17
78. P. H. C. Eilers, Anal. Chem. 2004, 76, 404
79. D. Zhang, Anal. Chem. 2008, 80, 2664-2671
80. S. Castillo, Anal. Chem. 2011, 83, 3058
81. K.M. Pierce, Anal. Chem. 2005, 77, 7735
82. Bryan J. Prazen, J. Microcolumn Separations, 107 (1999) 97
83. Carsten A. Bruckner, Anal. Chem., 70 (1998) 2796
84. Faber, N. M.; Buydens, L. M. C.; Kateman, G. Anal. Chim. Acta, 296 (1994) 1
85. Faber, N. M.; Buydens, L. M. C.; Kateman, G. Chemom. Intell. Lab. Syst. 1994, 203
86. Malinowski, E. R. J. Chemom. 1988, 3, 49
87. Carlos G. Fraga, Anal. Chem. 2001, 73, 5833
88. Carlos G. Fraga, Anal. Chem. 1998, 70, 218
89. Elbert J.C. van der Klift, Journal of Chromatography A, 1178 (2008) 43
90. J. Vial, Talanta 83 (2011) 1295
91.R. E. Mohler, Journal of Chromatography A, 1186 (2008) 401
92. M. Kallio, Journal of Chromatography A, 1216 (2009) 2923
93. R.G. Brereton, Chemometrics: Data Analysis for the Laboratory and Chemical Plant
94. D.L. Massart, Handbook of Chemometrics and Qualimetrics; J.N. Miller, Statistics and Chemometrics for Analytical Chemistry; etc.
95. G. James, An Introduction to Statistical Learning
96. M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA, 95 (1998) 14863
97. Q. Cindy Ru, Molecular & Cellular Proteomics 5.6
98. H. Schmarr, Journal of Chromatography A, 1217 (2010) 565
99. Th. Groger, Journal of Chromatography A, 1200 (2008) 8
100. S. Wold, Chemometrics and Intelligent Laboratory Systems, 2 (1987) 37
101. R.G. Brereton, Applied Chemometrics for Scientists
102. K.M. Pierce, Talanta 70 (2006) 797
103.V.G. van Mispelaar, Journal of Chromatography A, 1071 (2005) 229
104. S. Wold, Journal of Chemometrics, 10 (1996) 463-482
105. K.M. Pierce, Journal of Chromatography A, 1184 (2008) 341
106. V.G. van Mispelaar , Journal of Chromatography A, 1019 (2003) 15
107. G.T. Ventura, Journal of Chromatography A, 1218 (2011) 2584
108. P. Giordani, J. Chemometrics 2004; 18: 253
109. A.K.Smilde, J. Chemometrics 2003; 17: 323
110. S. Wold, J. Chemom., 1 (1987) 41
111. L.A.F. de Godoy, E.C. Ferreira, M.P. Pedroso, C.H.V. Fidelis, F. Augusto, R.J. Poppi, Anal. Lett., 41 (2008) 1603
112. M.P. Pedroso, L.A.F. de Godoy, E.C. Ferreira, R.J. Poppi, F. Augusto, J. Chromatogr. A, 1201 (2008) 176
113. L.A.F. de Godoy, Talanta 83 (2011) 1302–1307
114. L. Norgaard, A. Saudland, J. Wagner, J.P. Nielsen, L. Munck, S.B. Engelsen, Appl. Spectrosc.
54 (2000) 413.
115. X. Li, Analytica Chimica Acta 633 (2009) 257
116. K.M. Pierce, Anal. Chem. 2006, 78, 5068
117. S. E. Reichenbach, Talanta 83 (2011) 1279
118. X. Guo, Biotechnology and Bioengineering, 99 (2008) 4
119. F.C. Sanchez, Anal. Chem. 1996, 68, 79
120. R. Tauler, D. Barcelo, Trends Anal. Chem., 12 (1993) 319
121. F.C. Sanchez, Chemometrics and Intelligent Laboratory Systems 36 (1997) 153
122. M. Maeder, Anal. Chem., 59 (1987) 527
123. F.C. Sanchez, Chemometrics and Intelligent Laboratory Systems 34 (1996) 139
124. Z. Zeng, Journal of Chromatography A, 1218 (2011) 2301–2310
125. E. Sanchez, J. Chromatogr. 1987, 385, 151
126. C.G. Fraga, B.J. Prazen, R.E. Synovec, Anal. Chem. 72 (2000) 4154
127. O. Amador-Munoz, P.J. Marriott, J. Chromatogr. A 1184 (2008) 323.
128. A.E. Sinha, Journal of Chromatography A, 1056 (2004) 145
129. A.E. Sinha, Journal of Chromatography A, 1058 (2004) 209
130. R. Bro, J. Chemometrics, 17 (2003) 274
131. T. Skov, J.C. Hoggard, R. Bro, R.E. Synovec, J. Chromatogr. A 1216 (2009)