18th European Symposium on Computer Aided Process Engineering – ESCAPE 18 Bertrand Braunschweig and Xavier Joulia (Editors) © 2008 Elsevier B.V./Ltd. All rights reserved. Process Analytical Technologies (PAT) – the Impact for Process Systems Engineering Zeng Ping Chena, David Lovettb and Julian Morrisc a CPACT, Pure and Applied Chemistry Strathclde University, Glasgow, Scotland Perceptive Engineering, Daresbury Innovation Centre, Daresbury,Cheshire, UK c CPACT,Chemical Engineering and Advanced Materials, Newcastle University, UK b Abstract With the increasing take-up of Process Analytical Technologies (PAT) by the pharmachem, bio-pharma , specialty chemicals and materials manufacturing industries there is a critical need for robust data verification, particularly as this information is being included in real-time control or at least advisory feedback applications. In particular the accuracy and reliability of spectral calibrations for processes which are subject to variations in physical properties such as sample compactness, surface topology, etc are becoming a hot topic The variation in the optical path-length materializing from the physical differences between samples may result in multiplicative light scattering influencing spectra in a nonlinear manner leading to the poor calibration performance. In this paper a number of new approaches are shown to overcome the limitations of existing methods and algorithms. Space precludes detailed descriptions of the new algorithms, which are fully referenced, with some of the main results being presented and the justification why these data validity issues need to be addressed by the Process Systems Engineering and CAPE communities. Keywords: Process Analytical Technologies, calibration, data validity, process control 1. Introduction Pharma-Chem and bio-pharma, alongside speciality chemicals and materials development and production are now being heavily influenced through the recent FDA PAT2 initiative with spectroscopic instrumentation being increasingly applied for, or at the very least explored, on-line real-time process control applications. This drives the urgent need to incorporate and integrate the detailed spectral information into process performance monitoring schemes. The enhancement of spectroscopic data analysis techniques, calibration algorithms and robust software thus becomes even more important if PAT (and Quality by Design - QbD) is to be widely applied and accepted. As it is generally chemical information (in most cases the concentrations of the chemical or biological compounds) inherent within the spectroscopic measurements, rather than the spectroscopic measurements themselves, that are used for efficient management and optimization as well as for quality control; calibration models are therefore needed to transform abundant spectroscopic measurements into the desired concentration information. The accuracy of the calibration models is influenced by a number of factors, chemical and physical, the need for sophisticated chemometric methodologies and algorithms for advanced spectral data analysis,the initial modelling data and most critically for on-line deployment the “real-time” data quality. The real2 FDA, PAT - A Framework for Innovative Pharmaceutical Development, Manufacturing, and Quality Assurance, http://www.fda.gov/cder/guidance, 2004 2 Julian Morris et al time application of these calibration models provides a corner stone of PAT and hence is of significant importance to the process systems engineering community in the broader application of QbD 2. Challenges for PAT in Quality by Design Robust and transferable calibration models are essential for the full implementation of PAT/QbD. To achieve this two areas of complexity need to be addressed. One relates to the variations in external process variables which can have different impact on different chemical species in mixtures. For example, (i) fluctuations in temperature will provoke non-linear shift and broadening in spectral bands of absorptive spectra of constituents in mixtures, where such temperature-induced non-linear spectral variations can have detrimental effects on predictive performance of multivariate calibration model if not being properly taken into account when developing the model; (ii) uncontrolled variations in optical path length due to the physical variations such as particle size and shape, micro-organism growth, sample packing and sample surface which may cause dominant multiplicative light scattering perturbations which will mask the spectral variations related to the content differences of chemical compounds in samples. 2.1. Correction of Temperature Induced Spectral Variations for In-line Monitoring and Control (Using Loading Space Standardization) The routine application of PAT within a process control environment requires that the building of calibration models becomes a routine, non-expert, application embedded within a process systems engineering context. Typically variable selection models tend to require special expertise and software; non-linear effects are often not removed by filtering or resolved through orthogonal basis transformations such as a wavelet transformation, etc. In practice a full account of the effect of temperature on the spectra is therefore only possible through the application of nonlinear methods. Due to the nonlinear characteristic of temperature effects, neither implicit modelling through the inclusion of temperature into the calibration experimental design nor explicit inclusion of temperature into the calibration model (such as treating the temperature of samples as an extra independent variable appended to the spectra or as another dependent variable can successfully eliminate the temperature influence on the predictions of calibration models. There are a large number of methods attempting to resolve these and related calibration issues methodologies2 and these are referenced in the papers by Chen et al). E.g. calibration based on robust variable selection (PDS) and its extensions, CPDS for compensation of temperature effects on spectra, With a view to compensating for temperature effects on spectra, and ICS which was first proposed by Chen et al, 2004 to eliminate temperature effects on the predictive abilities of calibration models for white chemical systems, A full description of temperature influence is only possible using nonlinear methods. Loading Space Standardization (LSS) was developed to generalize the ideas behind ICS to correct temperature-induced spectral variations for grey chemical systems (Chen et al, 2005). The underlying proposition is that if process temperature variations have not been taken into account during the data collection phase of the calibration task the calibration model built on the spectral data measured at particular temperatures can only provide accurate predictions for reactions operating at 2 FT: Fourier Transform; WT: Wavelet Transform; CPDS: Continuous Piecewise Direct Standardization;: PSR: Penalized Signal Regression; ICS: Individual Contribution Standardization; DS: Direct Standardization; PDS: Piecewise Direct Standardization; MSC: Multiplicative Signal Correction; ISC: Inverted Signal Correction; EMSC: Extended MSC; EISC: Extended ISC; LSS: Loading Space Standardization; OPLEC: Optical Path Length Estimation and Correction Process Analytical Technologies (PAT) – the Impact for Process Systems Engineering 1.0 1.0 0.8 0.8 Abs. (AU) Abs. (AU) the same temperatures. Temperature is a continuous variable in process analytical control applications. It is not possible to build calibration models for every possible temperature that will be encountered during process manufacturing. In order to apply the calibration models established at training temperatures to future on-line measurements under different temperature profiles it is necessary to model the temperature effects and then standardize the spectra for future process measurements to the corresponding spectra as if they were measured under calibration training temperatures. In LSS the absorbance of each chemical species in every wavelength follows polynomials with respect to temperature which are used to predict the loading vectors at test temperature which play an important role in correcting spectral variations caused by temperature differences between calibration models and measurements made during production process control. 0.6 0.4 0.2 0.0 0.6 0.4 0.2 1350 1400 1450 1500 -1 1550 Wavenumber) (cm 1600 1650 0.0 1350 1400 1450 1500 -1 1550 ) 1600 1650 Wavenumber (cm Fig. 1a. Raw Data Fig. 1b. LSS pre-processed spectra Fig.1a shows the raw spectra of samples with different monosodium glutamate (MSG) crystal concentrations with Fig. 1b the LSS pre-processed spectra of samples. These results confirm the validity of the basic assumption underlying LSS. A recent extension to LSS addresses the enhanced improvement of the linearity of spectroscopic data subject to fluctuations in external variables such as temperature (see Chen et al 2007). 2.2. Improving the signal-to-noise ration through Smoothed Principal Components (XRD for on-line monitoring of crystal morphology) X-ray diffraction spectroscopy is one of the most widely applied methodologies for the in-situ analysis of kinetic processes involving crystalline solids. However, due to its relatively high detection limit, it has limited application in the context of crystallizations from liquids. Methods that can lower the detection limit of x-ray diffraction spectroscopy are therefore highly desirable. Most methods tend to only utilize the frequency information contained in the single spectrum being processed to discriminate between the signals and the noise and as a result may not successfully identify very weak but very important peaks especially when these weak signals are masked by severe noise. Smoothed principal component analysis (SPCA) which takes advantage of both the frequency information and spectral common variations is proposed as a methodology for the pre-processing of the x-ray diffraction spectra. SPCA was used to provide enhanced extraction of polymorphic form information from high signal to noise ratio x-ray diffraction spectra in a crystallization process (Chen et al 2005). The resulting sensitivity was sufficiently high to enable the detection of the formation of polymorphic structures at an early stage in the reaction. Figs. 4a (upper plot) shows the X-ray diffraction profiles of the desired β-form and the corresponding (un-smoothed) PLS calibration model (Fig. 4b upper plot) of the GA slurries. Fig. 4a (lower plot) shows the corresponding smoothed PCA diffraction profiles and resulting PLS calibration model Fig.4b (lower plot). Compared with other signal processing methods such as the wavelet transformation, SPCA achieves lower detection limit of the β-form 3 4 Julian Morris et al of GA with concentrations as low as 0.4% by weight being detected from GA-methanol slurries comprising mixtures of both α and β forms (see Chen et al, 2005). Intensity 800 β-form Peak height B1 B2 5000 4000 B3 3000 2000 400 200 0 -200 1000 10 12 14 16 18 20 22 24 26 28 30 32 34 -400 2θ B1 B2 5000 β-form 4000 B3 3000 2θ = 29.8272 0 2 4 6 Concentration (%) 8 10 800 2000 R = 0.9951 600 Peak height 6000 Intensity R= =0.9540 600 400 200 0 -200 1000 -400 10 12 14 16 18 20 22 24 26 28 30 32 34 2 4 6 8 10 Concentration (%) 2θ Fig. 4a. Measured GA Profiles (upper plot); Smoothed PCA (lower plot) 2θ = 29.8272 0 Fig 4b. PLS Calibration Models (raw data), PLS (with Smoothed PCA) 2.3. Extracting Chemical Information from Spectral Data with Multiplicative Light Scattering Effects (Optical Path-Length Estimation and Correction) Spectroscopy in solid and heterogeneous types of samples that exhibit sample-to-sample variability, the variation in the optical path length materializing from the physical differences between samples, due to particle size and shape, sample packing, and sample surface, for example, may result in the multiplicative light scattering effect masking the spectral variations relating to the differences between the chemical compounds present. The effect of multiplicative light scattering is difficult to handle through the application of standard bilinear calibration methodologies as these are based on the construction of latent variables that are a linear combination of the wavelengths. Consequently if the spectral data are not appropriately preprocessed, the underlying behavior of the data, relating to the chemical properties, will be masked due to the effect of multiplicative light scattering. Many chemometric pre-processing methods have been proposed to explicitly model the effect of multiplicative light scattering (MSC), e.g. multiplicative signal correction and its variants. A new correction method enabling the improved extraction of chemical information from spectral data with MSC problems (Optical Path-Length Estimation and Correction – OPLEC does not place any requirement on prior chemical knowledge and can be generally applied in unit. Figs. 5a and 5b show the impact of the new spectral pre-processing approach, space precludes a full discussion – see Chen, et al. 2006. 3.2 2.2 3.0 2.15 2.1 Abs. (AU) Abs. (AU) 2.8 2.6 2.4 2.05 2.0 2.2 1.95 2.0 1.9 1.8 850 1.85 900 950 Wavelength (nm) Fig. 5a. Raw Spectral data 1000 1050 850 900 950 1000 Wavelength (nm) Fig. 5b. OPLEC Corrected Spectra 1050 Process Analytical Technologies (PAT) – the Impact for Process Systems Engineering 5 3. Challenges for PAT in Closed Loop Systems The management of real time data, including pre-processing, outlier detection, outlier isolation and record of uncertainty associated with data is vital in a validated environment, to ensure complete traceability of all actions deployed by either a closed loop control system or operator. This management housekeeping, underpins the credibility for any software used for PAT and “real time” applications. This is an area that has been considered in depth for safety critical systems, for instance, in the Nuclear Industry, and one that again emphasizes the Process Systems Engineering approach from input to final output. There are a number of data quality monitoring approaches to strengthen the integrity and robustness of predictive engines that are worth consideration and are likely to be a pre-requisite for real time PAT. This section provides the results from an industrial application of a multivariate data quality monitoring system on a biological process that supports analysers, by providing real time data quality analysis to increase the availability, robustness and confidence in the process sensors and analyzers prior to use in any subsequent computational action. These spectroscopic developments were demonstrated by application to a commercial pilot scale batch cooling crystallization. Fig.6a and b show the industrial pilot plant, the supersaturation control system and a typical the resulting crystallization control run. 1.225 1.175 1.125 1.075 1.025 0.975 0.925 0.875 0.825 0.775 0.725 0.675 Concentration (g/500ml) 0.625 0.575 Turbidity (%) 0.525 Slimit 0.475 0.425 0.375 0.325 0.275 0.225 0.175 0.125 0.075 0.025 900 1000 1100 1200 1300 Smax = 1.125 S = 1.1 90 Smin = 1.075 Started Supersaturation Control 80 70 5% seeds added 60 50 Temperature (°C) 40 Solubility (g/500ml) Supersaturation 30 Slimit 20 10 0 0 100 200 300 400 500 600 700 800 Time (min) Fig. 6a. Industrial Pilot Plant Application Fig 6b. Crystallization Control Run 3.1. Model-based Multivariate Statistical Process Performance Monitoring In multivariate statistical process control (process performance monitoring) the monitoring statistics depend upon the model residuals being ‘IID’. In practice, the monitoring model (eg PCA or PLS model) will not be perfect and the residuals will contain structure. A modified model-based approach (McPherson et al. 2002) incorporates an additional residual modelling stage to remove structure from the residuals as shown in Fig 7a. Examination of the model-plant mismatch residuals, Figs. 7b and 7c shows that both serial correlation and non-normal behaviour is still present. A modified model-based approach is proposed where an additional residual modelling stage is incorporated, as shown in Fig 7a, to remove the remaining structure and to obtain a set of unstructured residuals suitable for process monitoring. Finally, Fig. 8 shows the structure of a commercial real time data quality monitoring and dynamic model based control system designed for pharma and other PAT based applications. 4. Conclusions Real time chemometrics and the incorporation of PAT Sensors into real time process control need to be part of the CAPE toolkit. PAT devices capable of 1-2 second or sub- Supersaturation (S = C/C*) Temperature, Concentration, Solubility & Turbidity 100 6 Julian Morris et al second measurement rates, real-time control based on a PAT measurement is a reality. Richness of measurement: not just a single data point per sample but a vector of data per sample. Sensor calibration models can give real time inference of product property. Real time pre-processing - no control system is going to control a spectrum of several hundred simultaneous values; so what is important? Is there is a calibration model to infer product property? Are there particular features/segments of the spectrum of interest? Should the scores of the PCA/PLS calibration model be controlled? Online Spectral methods coupled with closed loop process control will be both critical and useful tools for process transfer and comparability – process systems engineering has much to offer the PAT initiative. Outputs Plant Plant Process Data 99 99 Percentage 95 First FirstPrinciples Principles Model Model or or Dynamic Empirical Model Dynamic Empirical Empirical Model Model 80 60 40 20 10 5 Plant-Model Mismatch + + - 95 Percentage Inputs 5 1 80 130 180 -- 99 Percentage 80 60 40 20 10 5 Process Data 1.0 0.8 0.4 95 -3 -2 -1 0 1 Model Residuals Dynamic Non-linear PLS 80 60 40 20 10 5 1 1 -0.5 Fig. 7a. Model based MSPC 1.0 0.8 Percentage Multivariate Multivariate Monitoring Monitoring Charts Charts + + -4 Model Residuals Linear PLS 95 Dynamic Dynamic Empirical Empirical ‘Error-whitening’ ‘‘Error’ Model Model 80 60 40 20 10 1 99 -1 Plant-Model Residuals 0.0 Data 0.5 1.0 -1 0 Data 1 Fig. 7b. Normal probability plots Spectral Data Residuals Plant-Model Mismatch 0.4 0.0 0.0 -0.4 -0.4 -0.8 -1.0 PLS/PCA Calibration Model -0.8 -1.0 2 12 22 32 42 52 62 2 12 22 32 42 52 62 Dynamic PCA Controller Process Data 1.0 0.8 1.0 0.8 Model Residuals Dynamic PLS 0.4 Model Residuals Time Series Model 0.4 0.0 Discrete Data 0.0 -0.4 -0.4 -0.8 -1.0 -0.8 -1.0 2 12 22 32 42 52 62 Normal Operating Space Design Space 21 CFR part 11 Compliant Records 2 12 22 Fig. 7c. Partial Correlation Plots 32 42 52 62 Fig. 8. Real Time Data Quality Monitoring & Model-Based Control© Perceptive Eng. Acknowledgements EPSRC grant GR/R19366/01 (KNOW-HOW) and GR/R43853/01 (CBBII). References ZP. Chen, Morris, J.; Martin, E., (2004),Modeling temperature-induced spectral variations in chemical process monitoring, Dycops. ZP. Chen, Morris, J.; Martin, E., (2005), Correction of temperature-induced spectral variations by loading space standardization, Anal. Chem. 77, 1376-1384 ZP. Chen, Morris, J.; Martin, E.; Hammond, R.B.; Lai, X.J.; Ma, C.Y.; Purba, E.; Roberts, K.J.; Bytheway, R., (2005), Enhancing the signal to noise ratio of x-ray diffraction spectra by smoothed principal component analysis, Anal. Chem. 77, 6563-6570 ZP. Chen, Morris, J.; Martin, E., (2006), Extracting chemical information from spectral data with multiplicative light scattering effects by optical path-length estimation and correction, Anal. Chem., 78, 7674-7681. L. McPherson, E.B. Martin, & A. J. Morris (2002). Super Model Based Techniques for Batch Process Performance Monitoring, ESCAPE 12, 523–528. ZP. Chen, Morris, J., Improving the Linearity of Spectroscopic Data with Fluctuations in External Variables by the Extended Loading Space Standardization,(2007), Under Review for Anal. Chem. 2