18th European Symposium on Computer Aided Process Engineering – ESCAPE 18
Bertrand Braunschweig and Xavier Joulia (Editors)
© 2008 Elsevier B.V./Ltd. All rights reserved.
Process Analytical Technologies (PAT) – the
Impact for Process Systems Engineering
Zeng Ping Chen (a), David Lovett (b) and Julian Morris (c)
(a) CPACT, Pure and Applied Chemistry, Strathclyde University, Glasgow, Scotland
(b) Perceptive Engineering, Daresbury Innovation Centre, Daresbury, Cheshire, UK
(c) CPACT, Chemical Engineering and Advanced Materials, Newcastle University, UK
Abstract
With the increasing take-up of Process Analytical Technologies (PAT) by the pharma-chem,
bio-pharma, specialty chemicals and materials manufacturing industries, there is a critical
need for robust data verification, particularly as this information is being included in
real-time control or, at the least, advisory feedback applications. In particular, the
accuracy and reliability of spectral calibrations for processes that are subject to
variations in physical properties such as sample compactness and surface topology are
becoming a hot topic. The variation in optical path length arising from the physical
differences between samples may result in multiplicative light scattering that influences
the spectra in a nonlinear manner, leading to poor calibration performance. In this paper a
number of new approaches are shown to overcome the limitations of existing methods and
algorithms. Space precludes detailed descriptions of the new algorithms, which are fully
referenced; some of the main results are presented, together with the justification of why
these data validity issues need to be addressed by the Process Systems Engineering and CAPE
communities.
Keywords: Process Analytical Technologies, calibration, data validity, process control
1. Introduction
Pharma-chem and bio-pharma, alongside speciality chemicals and materials development and
production, are now being heavily influenced by the recent FDA PAT initiative (see the
footnote below), with spectroscopic instrumentation increasingly being applied to, or at the
very least explored for, on-line real-time process control applications. This drives the
urgent need to incorporate and integrate the detailed spectral information into process
performance monitoring schemes. The enhancement of spectroscopic data analysis techniques,
calibration algorithms and robust software thus becomes even more important if PAT (and
Quality by Design, QbD) is to be widely applied and accepted. Because it is generally the
chemical information inherent within the spectroscopic measurements (in most cases the
concentrations of the chemical or biological compounds), rather than the spectroscopic
measurements themselves, that is used for efficient management and optimization as well as
for quality control, calibration models are needed to transform the abundant spectroscopic
measurements into the desired concentration information. The accuracy of the calibration
models is influenced by a number of chemical and physical factors, by the chemometric
methodologies and algorithms used for advanced spectral data analysis, by the initial
modelling data and, most critically for on-line deployment, by the “real-time” data quality.
The real-time application of these calibration models provides a cornerstone of PAT and
hence is of significant importance to the process systems engineering community in the
broader application of QbD.
Footnote: FDA, PAT - A Framework for Innovative Pharmaceutical Development, Manufacturing,
and Quality Assurance, http://www.fda.gov/cder/guidance, 2004
2. Challenges for PAT in Quality by Design
Robust and transferable calibration models are essential for the full implementation of
PAT/QbD. To achieve this, two areas of complexity need to be addressed. One relates to
variations in external process variables, which can have a different impact on different
chemical species in a mixture. For example: (i) fluctuations in temperature provoke
non-linear shifts and broadening in the spectral bands of the absorption spectra of the
constituents of a mixture, and such temperature-induced non-linear spectral variations can
be detrimental to the predictive performance of a multivariate calibration model if they are
not properly taken into account when developing the model; (ii) uncontrolled variations in
optical path length, due to physical variations such as particle size and shape,
micro-organism growth, sample packing and sample surface, may cause dominant multiplicative
light scattering perturbations that mask the spectral variations related to differences in
the content of chemical compounds in the samples.
2.1. Correction of Temperature Induced Spectral Variations for In-line Monitoring and
Control (Using Loading Space Standardization)
The routine application of PAT within a process control environment requires that the
building of calibration models becomes a routine, non-expert task embedded within a process
systems engineering context. Variable selection methods typically require special expertise
and software, and non-linear effects are often not removed by filtering or resolved through
orthogonal basis transformations such as the wavelet transform. In practice a full account
of the effect of temperature on the spectra is therefore only possible through the
application of nonlinear methods. Owing to the nonlinear character of temperature effects,
neither implicit modelling through the inclusion of temperature in the calibration
experimental design nor explicit inclusion of temperature in the calibration model (such as
treating the sample temperature as an extra independent variable appended to the spectra, or
as another dependent variable) can successfully eliminate the influence of temperature on
the predictions of calibration models. A large number of methods attempt to resolve these
and related calibration issues (abbreviations are expanded in the footnote below); they are
referenced in the papers by Chen et al. Examples include calibration based on robust
variable selection; PDS and its extension CPDS, which compensate for temperature effects on
spectra; and ICS, first proposed by Chen et al. (2004) to eliminate temperature effects on
the predictive ability of calibration models for white chemical systems. Loading Space
Standardization (LSS) was developed to generalize the ideas behind ICS to correct
temperature-induced spectral variations for grey chemical systems (Chen et al., 2005). The
underlying proposition is that if process temperature variations have not been taken into
account during the data collection phase of the calibration task, the calibration model
built on spectral data measured at particular temperatures can only provide accurate
predictions for reactions operating at the same temperatures. Temperature is a continuous
variable in process analytical control applications, and it is not possible to build
calibration models for every temperature that will be encountered during manufacturing. To
apply calibration models established at the training temperatures to future on-line
measurements made under different temperature profiles, it is necessary to model the
temperature effects and then standardize the spectra of future process measurements to the
corresponding spectra as if they had been measured at the calibration training temperatures.
LSS assumes that the absorbance of each chemical species at every wavelength follows a
polynomial in temperature. These polynomials are used to predict the loading vectors at the
test temperature, which play an important role in correcting the spectral variations caused
by temperature differences between the calibration data and the measurements made during
production process control.
Footnote (abbreviations): FT: Fourier Transform; WT: Wavelet Transform; CPDS: Continuous
Piecewise Direct Standardization; PSR: Penalized Signal Regression; ICS: Individual
Contribution Standardization; DS: Direct Standardization; PDS: Piecewise Direct
Standardization; MSC: Multiplicative Signal Correction; ISC: Inverted Signal Correction;
EMSC: Extended MSC; EISC: Extended ISC; LSS: Loading Space Standardization; OPLEC: Optical
Path Length Estimation and Correction
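The polynomial assumption described above can be illustrated with a small numerical sketch.
This is not the published LSS algorithm: for simplicity it assumes a white system in which
the component spectra are available at each training temperature (closer in spirit to ICS),
whereas LSS works with loading vectors for grey systems. The function names
`fit_temperature_polynomials`, `predict_spectra` and `standardize`, the quadratic degree and
the array layout are all illustrative assumptions.

```python
import numpy as np

def fit_temperature_polynomials(temps, spectra, degree=2):
    """Fit one polynomial in temperature per (component, wavelength) pair.

    temps:   (n_temps,) training temperatures
    spectra: (n_temps, n_components, n_wavelengths) component spectra (assumed known)
    Returns coefficients of shape (degree + 1, n_components, n_wavelengths),
    highest power first (the np.polyfit convention).
    """
    n_t, n_c, n_w = spectra.shape
    coeffs = np.polyfit(temps, spectra.reshape(n_t, n_c * n_w), degree)
    return coeffs.reshape(degree + 1, n_c, n_w)

def predict_spectra(coeffs, temp):
    """Evaluate the fitted polynomials at a single temperature."""
    degree = coeffs.shape[0] - 1
    powers = float(temp) ** np.arange(degree, -1, -1)
    return np.tensordot(powers, coeffs, axes=1)

def standardize(x_test, coeffs, t_test, t_ref):
    """Map a mixture spectrum measured at t_test to its equivalent at t_ref."""
    s_test = predict_spectra(coeffs, t_test)
    s_ref = predict_spectra(coeffs, t_ref)
    # Least-squares contribution of each component at the test temperature.
    conc, *_ = np.linalg.lstsq(s_test.T, x_test, rcond=None)
    # Swap the temperature-dependent part; keep whatever the model cannot explain.
    return x_test - conc @ (s_test - s_ref)
```

In use, `fit_temperature_polynomials` would be run once on the training temperatures, and
`standardize` applied to each new spectrum before it is passed to a calibration model built
at the reference temperature.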
Fig. 1a. Raw Data (absorbance, AU, versus wavenumber, cm^-1, 1350 to 1650)
Fig. 1b. LSS pre-processed spectra (absorbance, AU, versus wavenumber, cm^-1, 1350 to 1650)
Fig. 1a shows the raw spectra of samples with different monosodium glutamate (MSG) crystal
concentrations and Fig. 1b the LSS pre-processed spectra of the samples. These results
confirm the validity of the basic assumption underlying LSS. A recent extension to LSS
further improves the linearity of spectroscopic data subject to fluctuations in external
variables such as temperature (see Chen et al., 2007).
2.2. Improving the signal-to-noise ratio through Smoothed Principal Components
(XRD for on-line monitoring of crystal morphology)
X-ray diffraction spectroscopy is one of the most widely applied methodologies for the
in-situ analysis of kinetic processes involving crystalline solids. However, owing to its
relatively high detection limit, it has limited application to crystallization from
liquids. Methods that can lower the detection limit of X-ray diffraction spectroscopy are
therefore highly desirable. Most signal processing methods utilize only the frequency
information contained in the single spectrum being processed to discriminate between signal
and noise, and as a result may fail to identify very weak but very important peaks,
especially when these weak signals are masked by severe noise. Smoothed principal component
analysis (SPCA), which takes advantage of both the frequency information and the variation
common to the spectra, is proposed as a methodology for pre-processing X-ray diffraction
spectra. SPCA was used to provide enhanced extraction of polymorphic form information from
low signal-to-noise ratio X-ray diffraction spectra in a crystallization process (Chen et
al., 2005). The resulting sensitivity was sufficiently high to enable the formation of
polymorphic structures to be detected at an early stage in the reaction. Fig. 4a (upper
plot) shows the X-ray diffraction profiles of the desired β-form in the GA slurries, and
Fig. 4b (upper plot) the corresponding (un-smoothed) PLS calibration model. Fig. 4a (lower
plot) shows the corresponding smoothed PCA diffraction profiles and Fig. 4b (lower plot) the
resulting PLS calibration model. Compared with other signal processing methods such as the
wavelet transform, SPCA achieves a lower detection limit for the β-form of GA, with
concentrations as low as 0.4% by weight being detected in GA-methanol slurries comprising
mixtures of both α and β forms (see Chen et al., 2005).
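The idea of combining frequency (smoothness) information with the variation shared across a
set of spectra can be sketched as follows. The details of SPCA are given in Chen et al.
(2005) and are not reproduced here; this sketch instead assumes a generic roughness-penalized
PCA, in which each component is penalized by its squared second differences. The function
name `smoothed_pca_denoise` and the parameters `n_components` and `lam` are hypothetical.

```python
import numpy as np

def smoothed_pca_denoise(X, n_components=2, lam=10.0):
    """Denoise spectra (rows of X) with roughness-penalized principal components.

    Components v maximize v' S v / (v' v + lam * ||D2 v||^2), where S is the
    scatter matrix of the centred spectra and D2 is the second-difference operator.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    D2 = np.diff(np.eye(p), n=2, axis=0)            # (p - 2, p) second differences
    M = np.eye(p) + lam * D2.T @ D2                 # penalty metric, positive definite
    L = np.linalg.cholesky(M)                       # M = L L'
    # Turn the generalized eigenproblem S v = mu M v into a standard one.
    Z = np.linalg.solve(L, Xc.T).T                  # rows are L^{-1} x_i
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = np.linalg.solve(L.T, Vt[:n_components].T)   # smooth loadings, (p, k)
    # Least-squares projection of each spectrum onto the smooth loadings.
    scores, *_ = np.linalg.lstsq(V, Xc.T, rcond=None)
    return mean + (V @ scores).T
```

A larger `lam` gives smoother components at the cost of broadening sharp peaks, so in
practice the penalty would need to be tuned to the peak widths of interest.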
Fig. 4a. Measured GA Profiles (upper plot); Smoothed PCA (lower plot). Axes: intensity
versus 2θ (10 to 34); β-form peaks labelled B1, B2 and B3.
Fig. 4b. PLS Calibration Models (raw data), PLS (with Smoothed PCA). Axes: peak height at
2θ = 29.8272 versus concentration (%); the two panels report R = 0.9951 and R = 0.9540.
2.3. Extracting Chemical Information from Spectral Data with Multiplicative Light
Scattering Effects (Optical Path-Length Estimation and Correction)
In spectroscopy of solid and heterogeneous samples that exhibit sample-to-sample
variability, the variation in optical path length arising from physical differences between
samples (particle size and shape, sample packing and sample surface, for example) may result
in a multiplicative light scattering effect that masks the spectral variations relating to
differences between the chemical compounds present. The effect of multiplicative light
scattering is difficult to handle with standard bilinear calibration methodologies, as these
are based on the construction of latent variables that are linear combinations of the
wavelengths. Consequently, if the spectral data are not appropriately pre-processed, the
underlying behavior of the data relating to the chemical properties will be masked by the
multiplicative light scattering. Many chemometric pre-processing methods have been proposed
to model the effect of multiplicative light scattering explicitly, e.g. multiplicative
signal correction (MSC) and its variants. A new correction method, Optical Path-Length
Estimation and Correction (OPLEC), enables improved extraction of chemical information from
spectral data affected by multiplicative light scattering; it does not place any requirement
on prior chemical knowledge and can be applied generally. Figs. 5a and 5b show the impact of
the new spectral pre-processing approach; space precludes a full discussion (see Chen et
al., 2006).
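For orientation, the classical multiplicative signal correction mentioned above can be
sketched in a few lines. This is standard MSC, not OPLEC: each spectrum is regressed on a
reference spectrum (by default the mean spectrum) and the fitted offset and slope are
removed. The function name `msc` and its return values are illustrative choices.

```python
import numpy as np

def msc(spectra, reference=None):
    """Classical multiplicative signal correction.

    Each spectrum x is modelled as x = a + b * reference + e and corrected to
    (x - a) / b, which removes additive offsets and multiplicative scaling.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    design = np.column_stack([np.ones_like(reference), reference])
    # One least-squares fit per spectrum: the rows of coef are (a_i) and (b_i).
    coef, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    offsets, slopes = coef
    corrected = (spectra - offsets[:, None]) / slopes[:, None]
    return corrected, offsets, slopes
```

Because the reference is usually the mean spectrum, this correction implicitly assumes that
chemical variation between samples is small relative to the scattering effect, which is one
reason alternative corrections continue to be developed.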
Fig. 5a. Raw Spectral data (absorbance, AU, versus wavelength, nm, 850 to 1050)
Fig. 5b. OPLEC Corrected Spectra (absorbance, AU, versus wavelength, nm, 850 to 1050)
3. Challenges for PAT in Closed Loop Systems
The management of real-time data, including pre-processing, outlier detection, outlier
isolation and a record of the uncertainty associated with the data, is vital in a validated
environment to ensure complete traceability of all actions deployed by either a closed loop
control system or an operator. This housekeeping underpins the credibility of any software
used for PAT and “real time” applications. It is an area that has been considered in depth
for safety critical systems, for instance in the nuclear industry, and one that again
emphasizes the Process Systems Engineering approach from input to final output. There are a
number of data quality monitoring approaches for strengthening the integrity and robustness
of predictive engines that are worth consideration and are likely to be a prerequisite for
real-time PAT. This section provides the results from an industrial application of a
multivariate data quality monitoring system on a biological process. The system supports
analysers by providing real-time data quality analysis to increase the availability and
robustness of, and confidence in, the process sensors and analysers prior to their use in
any subsequent computational action. These spectroscopic developments were demonstrated by
application to a commercial pilot scale batch cooling crystallization. Figs. 6a and 6b show
the industrial pilot plant, the supersaturation control system and a typical resulting
crystallization control run.
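As an illustration of the kind of outlier check described above, the following sketch
screens each new measurement vector against a PCA model of normal operating data before it
is used in any computation. It is a generic squared prediction error (SPE) check, not the
commercial system applied in this work; the function names `fit_spe_monitor` and
`check_sample`, the number of components and the empirical percentile limit are assumptions.

```python
import numpy as np

def fit_spe_monitor(X_train, n_components=2, percentile=99.0):
    """Fit a PCA model on normal operating data and set an empirical SPE limit."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings, (n_vars, k)
    resid = Xc - Xc @ P @ P.T                     # part not explained by the model
    spe_train = np.sum(resid ** 2, axis=1)
    return {"mean": mean, "P": P, "limit": np.percentile(spe_train, percentile)}

def check_sample(model, x):
    """Return (spe, is_outlier) for one new measurement vector."""
    xc = x - model["mean"]
    resid = xc - model["P"] @ (model["P"].T @ xc)
    spe = float(resid @ resid)
    return spe, spe > model["limit"]
```

A sample flagged by `check_sample` would be logged and withheld from the predictive model,
which is one way of keeping the traceable record of data quality decisions that a validated
environment requires.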
Fig. 6a. Industrial Pilot Plant Application
Fig. 6b. Crystallization Control Run (temperature (°C), concentration (g/500ml), solubility
(g/500ml), turbidity (%) and supersaturation versus time (min); annotations: Smax = 1.125,
S = 1.1, Smin = 1.075, Slimit, “5% seeds added”, “Started Supersaturation Control”)
3.1. Model-based Multivariate Statistical Process Performance Monitoring
In multivariate statistical process control (process performance monitoring) the monitoring
statistics depend upon the model residuals being independent and identically distributed
(IID). In practice, the monitoring model (e.g. a PCA or PLS model) will not be perfect and
the residuals will contain structure. Examination of the model-plant mismatch residuals
(Figs. 7b and 7c) shows that both serial correlation and non-normal behaviour are still
present. A modified model-based approach (McPherson et al., 2002) therefore incorporates an
additional residual modelling stage, as shown in Fig. 7a, to remove the remaining structure
and to obtain a set of unstructured residuals suitable for process monitoring. Finally,
Fig. 8 shows the structure of a commercial real time data quality monitoring and dynamic
model based control system designed for pharma and other PAT based applications.
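The additional residual modelling stage can be illustrated with a minimal sketch. The form
of the residual model is not specified here, so the sketch assumes a simple autoregressive
(AR) model fitted by least squares to one residual series; the function names `fit_ar`,
`whiten` and `lag1_autocorr` are hypothetical.

```python
import numpy as np

def _lagged(e, order):
    """Matrix whose column k holds e[t - 1 - k] for t = order .. len(e) - 1."""
    return np.column_stack([e[order - 1 - k: len(e) - 1 - k] for k in range(order)])

def fit_ar(residuals, order=1):
    """Least-squares fit of e[t] = sum_k phi[k] * e[t - 1 - k] + w[t]."""
    e = np.asarray(residuals, dtype=float)
    phi, *_ = np.linalg.lstsq(_lagged(e, order), e[order:], rcond=None)
    return phi

def whiten(residuals, phi):
    """Subtract the AR prediction, leaving approximately uncorrelated residuals."""
    e = np.asarray(residuals, dtype=float)
    order = len(phi)
    return e[order:] - _lagged(e, order) @ phi

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here as a quick check for serial structure."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[1:] @ x[:-1] / (x @ x))
```

The whitened series, rather than the raw plant-model mismatch, would then be passed to the
multivariate monitoring charts, since the control limits assume unstructured residuals.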
4. Conclusions
Real time chemometrics and the incorporation of PAT sensors into real time process control
need to be part of the CAPE toolkit. With PAT devices capable of 1-2 second or sub-second
measurement rates, real-time control based on a PAT measurement is a reality. The
measurements are rich: not just a single data point per sample but a vector of data per
sample. Sensor calibration models can give real time inference of product property. Real
time pre-processing is essential: no control system is going to control a spectrum of
several hundred simultaneous values, so what is important? Is there a calibration model to
infer product property? Are there particular features or segments of the spectrum of
interest? Should the scores of the PCA/PLS calibration model be controlled? On-line spectral
methods coupled with closed loop process control will be both critical and useful tools for
process transfer and comparability; process systems engineering has much to offer the PAT
initiative.
Fig. 7a. Model based MSPC (block diagram: plant inputs and outputs; first principles model
or dynamic empirical model; plant-model mismatch; dynamic empirical ‘error-whitening’ model;
plant-model residuals; multivariate monitoring charts)
Fig. 7b. Normal probability plots (percentage versus data)
Fig. 7c. Partial Correlation Plots
Panel titles recoverable from Figs. 7b and 7c: Process Data; Model Residuals (Linear PLS,
Dynamic PLS, Dynamic Non-linear PLS, Time Series Model)
Fig. 8. Real Time Data Quality Monitoring & Model-Based Control © Perceptive Eng. (block
labels: spectral data, discrete data, PLS/PCA calibration model, dynamic PCA, controller,
normal operating space, design space, 21 CFR part 11 compliant records)
Acknowledgements
EPSRC grants GR/R19366/01 (KNOW-HOW) and GR/R43853/01 (CBBII).
References
Z.P. Chen, J. Morris, E. Martin (2004), Modeling temperature-induced spectral variations in
chemical process monitoring, DYCOPS.
Z.P. Chen, J. Morris, E. Martin (2005), Correction of temperature-induced spectral
variations by loading space standardization, Anal. Chem. 77, 1376-1384.
Z.P. Chen, J. Morris, E. Martin, R.B. Hammond, X.J. Lai, C.Y. Ma, E. Purba, K.J. Roberts,
R. Bytheway (2005), Enhancing the signal to noise ratio of x-ray diffraction spectra by
smoothed principal component analysis, Anal. Chem. 77, 6563-6570.
Z.P. Chen, J. Morris, E. Martin (2006), Extracting chemical information from spectral data
with multiplicative light scattering effects by optical path-length estimation and
correction, Anal. Chem. 78, 7674-7681.
L. McPherson, E.B. Martin, A.J. Morris (2002), Super model based techniques for batch
process performance monitoring, ESCAPE 12, 523-528.
Z.P. Chen, J. Morris (2007), Improving the linearity of spectroscopic data with
fluctuations in external variables by the extended loading space standardization, under
review for Anal. Chem.