Data Reconciliation of Concentration Estimates from Mid-Infrared and Dielectric Spectral Measurements for Improved On-Line Monitoring of Bioprocesses

Michal Dabros
Laboratory of Chemical and Biological Engineering, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland

Michael Amrhein and Dominique Bonvin
Automatic Control Laboratory, School of Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland

Ian W. Marison
School of Biotechnology, Dublin City University, Dublin 9, Ireland

Urs von Stockar
Laboratory of Chemical and Biological Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland

DOI 10.1021/bp.143
Published online March 30, 2009 in Wiley InterScience (www.interscience.wiley.com).

Current address of Michal Dabros: School of Biotechnology, Dublin City University, Dublin 9, Ireland.
Correspondence concerning this article should be addressed to U. von Stockar at urs.vonstockar@epfl.ch.

Real-time data reconciliation of concentration estimates of process analytes and biomass in microbial fermentations is investigated. A Fourier-transform mid-infrared spectrometer predicting the concentrations of process metabolites is used in parallel with a dielectric spectrometer predicting the biomass concentration during a batch fermentation of the yeast Saccharomyces cerevisiae. Calibration models developed off-line for both spectrometers suffer from poor predictive capability due to instrumental and process drifts unseen during calibration. To address this problem, the predicted metabolite and biomass concentrations, along with off-gas analysis and base addition measurements, are reconciled in real-time based on the closure of mass and elemental balances. A statistical test is used to confirm the integrity of the balances, and a non-negativity constraint is used to guide the data reconciliation algorithm toward positive concentrations. It is verified experimentally that the proposed approach reduces the standard error of prediction without the need for additional off-line analysis. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 25: 578–588, 2009

Keywords: in situ bioprocess monitoring, FTIR spectroscopy, dielectric spectroscopy, continuous elemental balances, on-line data reconciliation

Introduction

The field of biotechnology has witnessed over the past decade an increasing demand for real-time process automation, monitoring, control, and optimization.1–4 These process enhancement techniques require analyzers that provide reliable real-time information about the system. On-line spectrometers are such devices, which, in addition, have the advantage of small sampling times, straightforward in situ installation, low maintenance requirements, inherent sterility, noninvasiveness, and nondestructiveness.3,5–7 Near-infrared (NIR) and Fourier transform mid-infrared (FTIR) spectrometers, in particular, have been used successfully as on-line analyzers of medium metabolites. Meanwhile, capacitance (dielectric) spectrometers are gaining the status of a workhorse for in situ monitoring of biomass.

However, the spread and implementation of spectroscopic sensors is still limited in the industrial setting due to the lack of real-time and long-term reliability of calibration models. Instruments calibrated off-line often perform poorly in on-line conditions due to instrumental and process drifts unseen during calibration.8–13 Various signal and model adaptation methods exist, but most of them require off-line sample analysis to obtain the necessary reference or transfer standards.3,14 Data reconciliation is a key method for correcting concentration estimates from spectroscopic measurements without sacrificing the on-line applicability of analyzers.
Data reconciliation, sometimes called data confrontation or rectification, is a statistical technique that evaluates the consistency of measurements and attempts to reduce errors in measured variables by taking into account a defined set of physical equalities (or inequalities15) such as balances.16–21 The corresponding algorithm is typically formulated as an optimization problem, where the physical equalities are formulated as constraints and the cost function is the weighted distance between reconciled and measured values. Data reconciliation has mainly been applied for reconciling flow rate and concentration measurements in flow circuits. In contrast to flow circuits, mass balance equations cannot be easily established in bioprocess applications because of difficulties in identifying the presence of all relevant species and the lack of complete measurements. Nevertheless, data reconciliation has been applied successfully to obtain improved estimates of conversion rates and yield coefficients using macroscopic elemental and/or energy balances based on measurements from various process analyzers such as on-line gas sensors, mass-flow controllers, or reaction calorimeters.6,16,21–24

In the field of spectroscopy, Kornmann et al.6 used data reconciliation to correct concentration estimates of medium analytes from FTIR data during batch fermentations of bacteria and used the reconciled measurements to recalibrate the instrument model on-line. Two constraints were used: carbon and degree of reduction balances. It was shown experimentally that the technique can lead to a significant reduction in the number of standards required for off-line calibration though, due to the lack of an appropriate on-line analyzer, biomass could not be monitored and was omitted in the algorithm. This omission was justified by the low biomass yield of the bacterial strain used.
However, the unmodeled biomass led to a bias in the reconciled concentration estimates.

In this work, data reconciliation is extended to cases with a non-negligible biomass yield. The concentrations of medium metabolites and biomass are estimated simultaneously using, respectively, FTIR and dielectric (capacitance) spectrometers. Both instruments are calibrated off-line using calibration standards and reference data from a first batch fermentation of Saccharomyces cerevisiae. During a second fermentation, the predicted concentrations of all analytes are reconciled on-line by imposing the verification of four elemental balances (carbon, nitrogen, degree of reduction and charge) and an additional hard constraint that limits the reconciled concentrations to non-negative values. The elemental balances require, besides the predicted concentrations of medium metabolites and biomass, on-line measurements of off-gas composition and of the amount of base added for pH control. The balances are checked for gross errors using a statistical test before being used as individual or concurrent constraints by the reconciliation algorithm. The performance of the proposed algorithm is assessed by evaluating the standard error of prediction for each analyte.

The article is organized as follows: The balance equations, the data reconciliation algorithm, and the constraints are presented first. Details pertaining to the culture, the experimental setup, the spectrometers, and the reference analysis methods used in the work are then described. There follow the results of the study, a discussion and some concluding remarks.

Problem Formulation

Mass balances

The mass balances serve to calculate, based on the available measurements, the number of moles, n, of each substrate and product that has been either consumed or produced from the start of the experiment up to the observation time t.24

Assumptions and Conventions.
The following general assumptions are made in the balance equations:
• The levels of dissolved O2 and CO2 are negligible. In similar conditions, Schenk et al.25 report O2 and CO2 levels of 7 and 12 mg/L, respectively;
• Stripping of any medium species other than ethanol (the most volatile) is negligible; and
• The elemental composition of biomass is known and constant.

The balance notation adopted in this study is that the number of moles of each substance j ($n_j$) is positive if the species is produced and negative if it is consumed from the start of the experiment.

Mass Balances of O2 and CO2 Measured by Gas Analyzer. Oxygen and carbon dioxide levels in the off-gas are measured on-line by a gas analyzer. The measurements provided by the gas analyzer are in molar fraction. The following mass balance equations are used to convert the molar fraction values into the number of moles of CO2 and O2 that have evolved from the beginning of the process up to time t:

$$n_{\mathrm{CO_2}}(t) = \int_0^t \left[ y_{\mathrm{CO_2}}(t)\,G_{\mathrm{out}}(t) - y_{\mathrm{CO_2,in}}\,G_{\mathrm{in}} \right] dt \tag{1}$$

$$n_{\mathrm{O_2}}(t) = \int_0^t \left[ y_{\mathrm{O_2}}(t)\,G_{\mathrm{out}}(t) - y_{\mathrm{O_2,in}}\,G_{\mathrm{in}} \right] dt \tag{2}$$

The constants $y_{\mathrm{CO_2,in}}$ and $y_{\mathrm{O_2,in}}$ are the levels of carbon dioxide and oxygen in the inlet air, respectively. The inlet and outlet gas flow rates $G_{\mathrm{in}}$ and $G_{\mathrm{out}}$ are expressed in moles per hour. The inlet flow rate is set constant by a mass flow controller, whereas the outlet flow rate is calculated by performing a mass balance of the inert nitrogen gas:26

$$G_{\mathrm{out}}(t) = G_{\mathrm{in}}\,\frac{1 - y_{\mathrm{O_2,in}} - y_{\mathrm{CO_2,in}}}{1 - y_{\mathrm{O_2}}(t) - y_{\mathrm{CO_2}}(t) - y_w} \tag{3}$$

The constant $y_w$ is the fraction of moisture in the outlet gas determined on the basis of an oxygen balance around the reactor before inoculation:

$$y_w = \frac{y_{\mathrm{O_2,in}} - y_{\mathrm{O_2,wet}}}{y_{\mathrm{O_2,in}}} \tag{4}$$

The constant $y_{\mathrm{O_2,wet}}$ represents the oxygen level that is measured in the "wet" outlet gas that passes through the reactor filled with the reaction medium before inoculation.

Mass Balance of Base.
The mass of KOH added into the reaction for pH control is continuously monitored by a laboratory balance, and the molar flux of base up to time t is calculated using the density and the molarity of the base:

$$n_{\mathrm{KOH}}(t) = \int_0^t \frac{m_{\mathrm{KOH}}}{\rho_{\mathrm{KOH}}}\,M_{\mathrm{KOH}}\,dt \tag{5}$$

where m, ρ, and M stand for mass, density, and molarity, respectively.

Mass Balances of Medium Analytes Measured by Infrared Spectroscopy. The concentrations of the chemical components present in the medium, $C_j(t)$ (in grams per litre of medium), are estimated on-line using an FTIR spectrometer and a calibration model. Knowing the initial mass concentration of all the medium species at the beginning of the experiment, $C_j(0)$, the molar flux of species j up to time t is calculated using the following mass balance equation for batch applications:

$$n_j(t) = \frac{C_j(t)\,V_R(t) - C_j(0)\,V_R(0) + \sum_a C_j(a)\,V_a}{MW_j} \tag{6}$$

where $V_R$ (L) is the reactor volume, $V_a$ (L) is the volume of each successive sample withdrawn from the reactor, and $C_j$ (g/L) is the mass concentration of species j in that sample. The molecular weight of species j is represented by the constant $MW_j$ (g/mol). The reactor volume at time t is computed by knowing the initial volume, $V_R(0)$, and keeping a continuous inventory of all liquid volumes entering and leaving the reactor.

In the case of ethanol, an additional term is added to Eq. 6 to account for the amount of this component stripped by the gas passing through the reactor. The ethanol mole fraction in the outlet gas can be estimated using a partition coefficient determined experimentally by Duboc and von Stockar for reactions at 30 °C and aeration rates between 0.63 and 1.3 vvm:26

$$y_{\mathrm{EtOH}} = 0.532\,x_{\mathrm{EtOH}} \tag{7}$$

where $x_{\mathrm{EtOH}}$ is the liquid mole fraction of ethanol in the medium. This empirical coefficient not only includes the volatility of ethanol (Henry's law) but also the effects of mass transfer.
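The gas and liquid balances above (Eqs. 1 to 3 and Eq. 6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the trapezoidal integration rule, and all numerical values in the usage lines are hypothetical and are not taken from the study.

```python
def outlet_gas_flow(g_in, y_o2_in, y_co2_in, y_o2, y_co2, y_w):
    """Eq. 3: outlet molar flow (mol/h) from a balance on inert nitrogen."""
    return g_in * (1.0 - y_o2_in - y_co2_in) / (1.0 - y_o2 - y_co2 - y_w)


def cumulative_gas_moles(times_h, y_out, g_out, y_in, g_in):
    """Eqs. 1-2: integrate (y_out*G_out - y_in*G_in) over time.

    The trapezoidal rule is an assumption of this sketch; the source does
    not state which integration scheme was used.
    """
    rates = [y * g - y_in * g_in for y, g in zip(y_out, g_out)]
    total = 0.0
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        total += 0.5 * (rates[k] + rates[k - 1]) * dt
    return total


def molar_flux(c_t, v_t, c_0, v_0, samples, mw):
    """Eq. 6: moles of species j produced (+) or consumed (-) up to time t.

    samples: list of (concentration in g/L, volume in L) pairs for the
    samples withdrawn so far; mw: molecular weight in g/mol.
    """
    withdrawn = sum(c * v for c, v in samples)
    return (c_t * v_t - c_0 * v_0 + withdrawn) / mw


# Hypothetical usage: glucose falling from 20 to 10 g/L in about 2.6 L,
# with one 12 mL sample withdrawn at 15 g/L.
n_gluc = molar_flux(c_t=10.0, v_t=2.588, c_0=20.0, v_0=2.6,
                    samples=[(15.0, 0.012)], mw=180.16)
print(f"n_glucose = {n_gluc:.4f} mol (negative: consumed)")
```

The negative result follows the sign convention stated earlier: consumed species carry a negative molar flux.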
The amount of ethanol stripped up to measurement time t can thus be calculated by integration over time and added to Eq. 6, which takes on the following form for ethanol:

$$n_{\mathrm{EtOH}}(t) = \frac{C_{\mathrm{EtOH}}(t)\,V_R(t) + \sum_a C_{\mathrm{EtOH}}(a)\,V_a}{MW_{\mathrm{EtOH}}} + \int_0^t y_{\mathrm{EtOH}}(t)\,G_{\mathrm{out}}(t)\,dt \tag{8}$$

Note that the initial term, $C_{\mathrm{EtOH}}(0)V_R(0)$, is omitted because there was no ethanol in the medium before inoculation.

Mass Balance of the Biomass Measured by a Biomass Monitor. The concentration of biomass $C_X(t)$ is estimated on-line using a dielectric (capacitance) spectrometer, also known as the biomass monitor (BM), and a calibration model. The mass balance equation used to calculate the molar flux of biomass up to time t is very similar to Eq. 6:

$$n_X(t) = \frac{C_X(t)\,V_R(t) - C_X(\mathrm{in})\,V_{\mathrm{in}} + \sum_a C_X(a)\,V_a}{MW_X} \tag{9}$$

Here, the initial biomass concentration comes from the amount $C_X(\mathrm{in})V_{\mathrm{in}}$ used to inoculate the reactor. The constant $MW_X$ represents the molecular weight of a C-mole of biomass including the ash, i.e., the amount of dry biomass containing 12.01 g of carbon.

Elemental balances

The elemental balances describe the conservation of the four elements considered in this work: carbon, nitrogen, degree of reduction, and charge. They take on the form of $e = Xn$, where e is the balance error and X contains the specific balance fractions for each variable in n.23,24

Elemental Carbon Balance. Six species are involved in the carbon balance: glucose, ethanol, glycerol, acetic acid, biomass, and carbon dioxide. The matrix of element fractions is equal to the carbon content of 1 mole of each of these six compounds:

$$X_C = [\,6 \;\; 2 \;\; 3 \;\; 2 \;\; 1 \;\; 1\,] \tag{10}$$

Hence, the carbon balance takes on the following form:

$$e_C = 6n_{\mathrm{Gluc}} + 2n_{\mathrm{EtOH}} + 3n_{\mathrm{Glyc}} + 2n_{\mathrm{HAc}} + n_X + n_{\mathrm{CO_2}} \tag{11}$$

The term $e_C$ contains the balance error (in mol) resulting from measurement inaccuracies.

Elemental Nitrogen Balance. Only two species are present in the nitrogen balance: ammonium and biomass. The matrix of element fractions is equal to the nitrogen content of 1 mole of these species:

$$X_N = [\,1 \;\; e_{N,X}\,] \tag{12}$$

where $e_{N,X}$ represents the stoichiometric coefficient of nitrogen in biomass to be determined by elemental analysis. The nitrogen balance can be written as follows:

$$e_N = n_{\mathrm{NH_4^+}} + e_{N,X}\,n_X \tag{13}$$

Elemental Degree of Reduction Balance. The degree of reduction (γ) of a substance is defined as the number of electrons required for the oxidation of 1 mole of that substance. Thus, a degree of reduction balance is essentially an account of the available electrons in the system. It is more convenient to use this balance in microbial cultures because it removes the necessity to account for water in the system. The exact inventory of water is difficult to maintain, thus making oxygen and hydrogen balances impractical. However, because water has a degree of reduction of zero, it disappears altogether from the balance without affecting the total number of degrees of freedom. The degree of reduction for 1 mole of substance j of elemental formula $\mathrm{C}_{e_{C,j}}\mathrm{H}_{e_{H,j}}\mathrm{O}_{e_{O,j}}\mathrm{N}_{e_{N,j}}$ is calculated in the following manner:17

$$\gamma_j = 4e_{C,j} + e_{H,j} - 2e_{O,j} - 3e_{N,j} \tag{14}$$

Note that in this way the degree of reduction of H2O, CO2, and NH3 is equal to zero. Six species are considered in the degree of reduction balance: glucose, ethanol, glycerol, acetic acid, biomass, and oxygen. The matrix of element fractions is equal to the degree of reduction of each of these compounds:

$$X_\gamma = [\,24 \;\; 12 \;\; 14.01 \;\; 8 \;\; \gamma_X \;\; -4\,] \tag{15}$$

The value of $\gamma_X$ will depend on the elemental composition of biomass, which needs to be determined. The degree of reduction balance takes on the following form:

$$e_\gamma = 24n_{\mathrm{Gluc}} + 12n_{\mathrm{EtOH}} + 14.01n_{\mathrm{Glyc}} + 8n_{\mathrm{HAc}} + \gamma_X n_X - 4n_{\mathrm{O_2}} \tag{16}$$

Elemental Charge Balance. The charge balance in the system is attained through the maintenance of a constant pH throughout the culture. During growth, biomass takes up ammonia (NH3) from ammonium (NH4+), liberating a hydrogen ion (H+) for each mole of ammonium consumed.
In addition, the cells produce acetic acid that dissociates into acetate, giving (C2H3O2− + H+) per mole of acetic acid. The free hydrogen ions in the medium are neutralized by OH− ions coming from the added base, forming water and maintaining a constant pH. Thus, three species are present in the charge balance: NH4+, acetic acid, and OH−, which gives the following matrix of element fractions:

$$X_{\mathrm{Charge}} = [\,-1 \;\; 1 \;\; -1\,] \tag{17}$$

and the charge balance can be written as follows:

$$e_{\mathrm{Charge}} = -n_{\mathrm{NH_4^+}} + n_{\mathrm{HAc}} - n_{\mathrm{OH}} \tag{18}$$

Combined Balance. The combined balance includes the four elemental balances described earlier and considers all nine species arranged in the following order: glucose, ethanol, ammonium, glycerol, acetate, biomass, CO2, O2, and base. Thus, the matrix of element fractions X becomes a 4 × 9 matrix and the elemental balances can be written as:

$$e = Xn = \begin{bmatrix} 6 & 2 & 0 & 3 & 2 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & e_{N,X} & 0 & 0 & 0 \\ 24 & 12 & 0 & 14.01 & 8 & \gamma_X & 0 & -4 & 0 \\ 0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} n_{\mathrm{Gluc}} \\ n_{\mathrm{EtOH}} \\ n_{\mathrm{NH_4^+}} \\ n_{\mathrm{Glyc}} \\ n_{\mathrm{HAc}} \\ n_X \\ n_{\mathrm{CO_2}} \\ n_{\mathrm{O_2}} \\ n_{\mathrm{OH}} \end{bmatrix} \tag{19}$$

Combined balances are considerably more difficult to close and reconcile simultaneously, but they offer the potential of more robust results because the closest solution is less prone to chance.24

Statistical test

A statistical test can be applied to each of the balances to determine whether the balance errors fall inside a normally distributed range of acceptable values. A standard way of performing such a test is to calculate a statistical function h based on the measurement variance–covariance matrix, W, and to check whether h falls below an upper control limit defined by a χ²-distribution.16,21,24 The statistical function is given by the following formula:

$$h = e^T \left( X W X^T \right)^{-1} e \tag{20}$$

The computation of the upper control limit involves the number of degrees of freedom for the system under consideration.
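To make the structure of the combined balance concrete, the sketch below builds the 4 × 9 matrix of Eq. 19 and evaluates the residuals e = Xn. It rests on several assumptions that should be read as such: the signs of the oxygen and charge entries are inferred from the sign convention of the text (produced positive, consumed negative), the biomass formula CH2.15O0.49N0.18 is the one reported from elemental analysis in the Materials and Methods, and the test vector (complete oxidation of 1 mol of glucose) is a hypothetical check, not data from the study.

```python
def degree_of_reduction(c, h, o, n):
    """Eq. 14: gamma = 4*C + H - 2*O - 3*N for a formula C_c H_h O_o N_n."""
    return 4 * c + h - 2 * o - 3 * n


# Biomass CH2.15 O0.49 N0.18 (per C-mole), as reported in the paper.
E_N_X = 0.18
GAMMA_X = degree_of_reduction(1, 2.15, 0.49, 0.18)  # close to the 4.62 quoted

# Species order of Eq. 19.
SPECIES = ["glucose", "ethanol", "ammonium", "glycerol", "acetate",
           "biomass", "CO2", "O2", "base"]

# Rows: carbon, nitrogen, degree of reduction, charge.
# Signs of the O2 and charge entries are an assumption of this sketch.
X = [
    [6,  2,  0, 3,     2, 1,       1,  0,  0],
    [0,  0,  1, 0,     0, E_N_X,   0,  0,  0],
    [24, 12, 0, 14.01, 8, GAMMA_X, 0, -4,  0],
    [0,  0, -1, 0,     1, 0,       0,  0, -1],
]


def balance_residuals(x_matrix, n):
    """Residual vector e = X n, one entry per elemental balance."""
    return [sum(x * nj for x, nj in zip(row, n)) for row in x_matrix]


# Hypothetical check: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
n_check = [-1, 0, 0, 0, 0, 0, 6, -6, 0]
print(dict(zip(["C", "N", "gamma", "charge"], balance_residuals(X, n_check))))
```

For a stoichiometrically exact set of fluxes all four residuals vanish; with measured fluxes they do not, and it is these nonzero residuals that the statistical test and the reconciliation step operate on.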
The number of degrees of freedom, F, is defined as the number of unknown variables, N, minus the number of variables available either through measurements, M, or balances, K. A positive value of F indicates that some of the variables can be chosen freely to satisfy certain criteria (e.g., in optimization). In contrast, a negative value of F indicates the level of redundancy that is available to tackle the effect of measurement noise. The yeast fermentation under consideration has F = N − M − K = −4, with:
N = 9, unknowns: glucose, ethanol, ammonium, glycerol, acetic acid, biomass, CO2, O2, and base;
M = 9, measured or estimated concentration changes: glucose, ethanol, ammonium, glycerol, acetic acid, biomass, CO2, O2, and base; and
K = 4, balances: carbon, nitrogen, degree of reduction, and charge.

Using the same approach, it can be shown that the individual balances (carbon, nitrogen, degree of reduction, and charge) each have F = −1. For a confidence level of 95%, the upper control limit (UCL) is 9.49 for F = −4 (four redundant balances) and 3.84 for F = −1 (one redundant balance). The statistical test will be useful in the analysis of the results. As data reconciliation does not handle systematic gross errors, it may perform worse in periods where the statistical function h exceeds the upper control limit.

Data reconciliation algorithm

The data reconciliation algorithm used in this work is governed by the following minimization problem subject to two constraints, evaluated at time t:

$$\min_{n_r}\; \left( n_r(t) - n_m(t) \right)^T W(t)^{-1} \left( n_r(t) - n_m(t) \right) \quad \text{such that} \quad -e_{tol} \le X n_r \le e_{tol}, \quad n_r(t) + n_0 \ge 0 \tag{21}$$

The cost function describes the distance between the measured values, $n_m$, and the reconciled values, $n_r$, whereas the covariance matrix W acts as a weighting factor such that a higher penalty is imposed on adjusting those variables that are expected to be more accurate.
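A reduced version of the test of Eq. 20 and of the minimization in Eq. 21 can be written in closed form when only one balance is imposed and the non-negativity constraint is dropped. The sketch below illustrates that special case for the carbon balance; it is not the algorithm used in the study, which handles all constraints together. The function names and the measured fluxes `n_meas` are hypothetical; the maximum fluxes, error levels, and the 0.06 mol tolerance are the values quoted in the paper, and the weights assume a diagonal W built as the square of (maximum flux × relative error).

```python
def balance_residual(x_row, n):
    """Residual e = x . n of a single elemental balance (one row of X)."""
    return sum(x * nj for x, nj in zip(x_row, n))


def h_statistic(x_row, n_m, w_diag):
    """Eq. 20 for one balance and diagonal W: h = e^2 / (x W x^T)."""
    e = balance_residual(x_row, n_m)
    spread = sum(x * x * w for x, w in zip(x_row, w_diag))
    return e * e / spread


def reconcile_single_balance(x_row, n_m, w_diag, e_tol):
    """Weighted projection of n_m onto -e_tol <= x . n <= e_tol.

    Special case of Eq. 21: ONE balance, diagonal W, and NO non-negativity
    constraint (a simplification made for this sketch only).
    """
    e = balance_residual(x_row, n_m)
    target = max(-e_tol, min(e_tol, e))      # nearest admissible residual
    spread = sum(x * x * w for x, w in zip(x_row, w_diag))
    shift = (e - target) / spread            # Lagrange multiplier term
    # Variables with a large variance w (less trusted) are adjusted more.
    return [nj - w * x * shift for nj, x, w in zip(n_m, x_row, w_diag)]


# Carbon balance; order: glucose, ethanol, glycerol, acetic acid, biomass, CO2
x_carbon = [6, 2, 3, 2, 1, 1]
n_max = [0.3, 0.4, 0.03, 0.02, 0.7, 0.8]        # mol, as quoted in the paper
rel_err = [0.15, 0.15, 0.15, 0.15, 0.15, 0.03]  # spectrometers 15%, gas 3%
w_diag = [(nm * err) ** 2 for nm, err in zip(n_max, rel_err)]

n_meas = [-0.20, 0.25, 0.01, 0.005, 0.30, 0.45]  # hypothetical fluxes (mol)
UCL_ONE_BALANCE = 3.84                           # 95% limit, one balance

h = h_statistic(x_carbon, n_meas, w_diag)
n_rec = reconcile_single_balance(x_carbon, n_meas, w_diag, e_tol=0.06)
print("h below UCL:", h < UCL_ONE_BALANCE)
print("residual before:", balance_residual(x_carbon, n_meas))
print("residual after: ", balance_residual(x_carbon, n_rec))
```

When several balances and the non-negativity constraint are active at once, no such closed form applies and a general constrained quadratic programming solver would be needed instead.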
The first constraint contains the elemental balance equations whose residuals are to be kept within some tolerance levels for each measurement time t. The tolerance levels, $e_{tol}$, introduce a certain flexibility into the balances to account for potential inaccuracies in the balance equations. The values of $e_{tol}$ are chosen based on the typical errors obtained for balances performed using off-line reference measurements in earlier runs and are as follows: 0.06 mol for the carbon balance, 0.015 mol for the nitrogen balance, 0.3 mol for the degree of reduction balance, and 0.02 mol for the charge balance. The second, "non-negativity" constraint guarantees that the concentrations, computed as $n_r(t) + n_0$, of all the species are non-negative, where $n_0$ contains the initial molar concentrations of the components present in the culture medium. Figure 1 illustrates how the nearest solution satisfying both constraints can be found.

Figure 1. Solution of the data reconciliation algorithm. The point labeled $n_{r,A}$ is the closest solution to the measurement $n_m$ lying in the solution space described by $-e_{tol} \le X n_r \le e_{tol}$ in Eq. 21. However, it does not satisfy the non-negativity constraint; the corresponding feasible region is represented by the white domain of the solution space. Thus, the measurement will be reconciled to $n_{r,B}$, the nearest solution that satisfies all constraints of Eq. 21.

Assuming that the measurement errors are independent of each other, W can be considered diagonal.16 The diagonal elements of W are computed as:

$$w_j = \left( n_{j,\max}\, e_j \right)^2 \tag{22}$$

where $n_{j,\max}$ is the largest expected molar flux of species j and $e_j$ the corresponding measurement error. The measurement errors for the metabolites and biomass are set to 15%, which represents the typical prediction errors for the two spectrometers. The gas analyzer measurement error is set to 3% and the base balance error to 1%.

Materials and Methods

Organism and culture conditions

Two aerobic batch fermentation runs were performed using a wild-type strain of the Crabtree-positive bakers' yeast Saccharomyces cerevisiae. The strain (CBS 8066) was obtained from the Centraalbureau voor Schimmelcultures (Utrecht, NL). The first batch served to collect data for the off-line, in situ calibration of the biomass monitor before the study. The second batch was used for the main data reconciliation experiment. Source cells were stored at −80 °C in 1.8 mL aliquots. For each batch, the reaction inoculum was obtained by adding one aliquot into a 1-L Erlenmeyer flask containing 100 mL of a sterile complex preculture medium (10 g/L yeast extract OXOID, 10 g/L peptone BACTO, and 20 g/L glucose) and incubating it for 24 h at 30 °C and 200 rpm.

The defined culture medium was sterilized by filtration and contained, per liter: 20 g glucose, 5 g (NH4)2SO4, 3 g KH2PO4, 0.5 g MgSO4·7H2O, as well as trace elements and vitamins (adapted from Verduyn et al.27 and Cannizzaro et al.28). The medium was supplemented with 0.5 mL/L of a standard antifoam agent to prevent foaming. The initial biomass concentration, following inoculation, was 0.25 g/L.

The cultures were grown in a 3.6-L laboratory bioreactor from Bioengineering (Wald, Switzerland), with a working volume of 2.6 L, equipped with a Rushton-type agitator, baffles, temperature and pH probes and control mechanisms, gas inlet and outlet ports, a base inlet port, and a sampling port. Outlet gas passed through a condenser to minimize liquid loss by evaporation. The reactor was sterilized in situ at 121 °C for 20 min. The cultures were grown at 30 °C with an agitation speed of 800 rpm and an inlet air flow rate of 3.35 L/min (1.3 vvm). A solution of 2 M KOH (density, 1.12 g/cm³) was used to maintain the pH at 5; no acid control was necessary. A detailed account of volumes entering the reactor (i.e., base added) and leaving it (i.e., samples) was kept to have a continuous estimate of the reactor volume throughout the experiment. The maximum molar fluxes ($n_{j,\max}$) expected in this culture, based on previous experiments, were set to 0.3 mol of glucose, 0.4 mol of ethanol, 0.2 mol of ammonium, 0.03 mol of glycerol, 0.02 mol of acetic acid, 0.7 C-mol of biomass, 0.8 mol for CO2, 0.6 mol for O2, and 0.1 mol of KOH.

Analytical methods

For validation purposes, reference samples of about 12 mL were collected at intervals of 1 h using an in-house developed automated sampling robot, BioSampler 2002.7 The robot was equipped with a refrigeration system storing the samples at 4 °C before treatment. A portion of each sample was stored frozen at −20 °C before elemental analysis.

Concentrations of ethanol, glycerol, and acetic acid were quantified by HPLC (Hewlett-Packard 1100 Series, Agilent, Palo Alto, CA). Concentrations of glucose and ammonium were determined with an automated enzymatic analyzer (Cobas Mira, Roche, Basel, CH) using commercially available enzymatic kits (R-Biopharm AG, Darmstadt, D). Both analyzers were calibrated using 4–5 synthetic standards for each of the monitored analytes.

Biomass dry cell weight (DCW) was determined by putting 8 mL of the culture medium through a preweighed 0.22 μm pore filter, drying the filter, and subsequently reweighing it. Optical density measurements were performed as a backup method at 600 nm using the Spectronic Helios-Epsilon spectrophotometer from Thermo (Waltham, MA, USA).

Elemental analysis of the biomass harvested at the end of the culture and freeze-dried provided the stoichiometric coefficients of carbon, hydrogen, oxygen, and nitrogen in 1 C-mole of biomass, giving the formula CH2.15O0.49N0.18 + ashes. The molecular mass is 26.39 g/C-mol and the degree of reduction is 4.62 (Eq. 14).

Oxygen and carbon dioxide levels in the outlet gas were analyzed using an infrared gas analyzer (Dr. Marino Müller AG, Esslingen, CH).
A two-point linear calibration is performed for each gas before the experiment using the following:
– for CO2 calibration, N2 as 0% and a calibration-grade gas mixture as 5% CO2;
– for O2 calibration, N2 as 0% and pure air as 20.946%.

FTIR spectrometer

Concentration levels of glucose, ethanol, ammonium, phosphates, glycerol, and acetic acid were monitored using a single-beam ReactIR™ 4000 FTIR from Mettler Toledo (Greifensee, Switzerland). The instrument was equipped with an MCT detector and an Attenuated Total Reflection (ATR) diamond probe (DiComp™, ASI Applied Systems, Millerville, MD). Dry air was continuously supplied as purge gas into the spectrometer housing, the optical conduit, and the probe shaft. The probe was built into a thermostatically controlled 5 mL flow cell (Streamline™, Mettler Toledo), sterilized in situ together with the reactor. During the experiment, the culture medium was pumped continuously through the flow-through cell via a sterile recirculation loop with a residence time inside the loop of less than 20 s. Spectra were collected at an interval of 2 min from the averaged values of 64 scans. The spectral range spanned wavenumbers between 4,000 and 650 cm⁻¹ with an approximate resolution of 4 cm⁻¹. Spectra were saved in binary format by the instrument's custom software, ReactIR™ 2.1 (ASI Applied Systems, Millerville, MD) and subsequently imported into and analyzed in Matlab (The MathWorks, Inc., Natick, MA).

Table 1. Characteristics of the Calibration Model Used for the FTIR

| Analyte | Concentration Range (g/L) | Spectral Range (cm⁻¹) | PLS Factors | SEC (g/L) |
|---|---|---|---|---|
| Glucose | 0–25 | 1,200–950 | 7 | 0.47 |
| Ethanol | 0–10 | 1,150–950 | 10 | 0.10 |
| Ammonium | 0–2 | 1,500–1,400 & 1,200–1,000 | 4 | 0.04 |
| Phosphate | 0–4 | 1,200–1,000 | 5 | 0.11 |
| Glycerol | 0–2 | 1,200–1,000 | 8 | 0.10 |
| Acetic acid | 0–2 | 1,500–950 | 10 | 0.03 |

The instrument was calibrated off-line, before the experiment, using partial least squares (PLS).
The PLS model was developed using 49 synthetically prepared standards and a seven-level multivariate design.29 The choice of the number of PLS factors was based on predicted residual error sum of squares (PRESS) plots of leave-one-out cross-validation.30 Distinct spectral ranges were selected for each analyte to account for the different frequencies of the vibration modes of each component. Spectra of demineralized water were collected in parallel to each standard during the calibration procedure and used as background. Mean-centering was applied to the calibration sets. Table 1 summarizes the concentration range, spectral range, and the corresponding standard error of calibration (SEC) for each analyte.

During the experiment, absorbance spectra for each measurement were calculated from the ratio of the corresponding intensity spectrum to the single spectrum of demineralized water taken immediately before the run. The standard error of prediction (SEP) was calculated based on 16 reference samples collected during the experiment and analyzed by HPLC/enzymatic analyzer. The equations used for calculating the values of SEC and SEP for a particular analyte are given below:

$$\mathrm{SEC} = \sqrt{\frac{\sum_{i=1}^{n_c} \left( \hat{y}_i - y_i \right)^2}{n_c}}, \qquad \mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{n_p} \left( \hat{y}_i - y_i \right)^2}{n_p}} \tag{23}$$

where $\hat{y}_i$ and $y_i$ are the predicted/reconciled and true (reference) concentration values of the analyte in sample i, $n_c$ is the number of samples in the calibration set and $n_p$ is the number of samples in the prediction set.

Biomass monitor

Concentration levels of biomass were measured with the dielectric spectrometer Biomass Monitor 210 from Aber Instruments (Aberystwyth, UK). The BM was equipped with a 12 mm probe containing four annular electrodes.
This configuration is particularly favourable, as four-terminal probes (as opposed to two-terminal probes) reduce electrode polarization.31,32 The probe was introduced directly into the reactor and sterilized in situ. During the cultures, 25 excitation frequencies from 0.1 MHz to 20 MHz were scanned every 15 s and the capacitance as well as the conductivity of the cell suspension was registered at each frequency. A program developed in-house using LabView (National Instruments, Austin, TX) was used to collect and store the measured data.

The instrument was calibrated off-line, before the experiment, during the first batch culture. A simple linear correlation model was established between biomass dry cell weight and dual-frequency capacitance values obtained by measuring the difference in capacitance readings at 500 kHz and 10 MHz ($\Delta C = C_{500} - C_{10{,}000}$). To eliminate noise in the signal, a low-pass filter with a frequency of 60 s was applied. All data were mean-centered before modeling. The standard error of calibration was 0.36 g/L for a biomass concentration range of 0–7 g/L. The standard error of prediction (SEP) was calculated based on 17 reference samples collected during the experiment and analyzed as described earlier.

Experimental setup

The basic experimental setup is illustrated in Figure 2. The BM probe was installed directly in the reactor, whereas the FTIR probe was nested inside a flow cell. Temperature, pH, and dissolved oxygen probes are not shown. Inlets into the reactor included air (flow rate measured by a mass flow controller) and base. The flow rate of the base was measured by continuously monitoring the scale on which the base flask was placed and dividing the result by the density of the base. Outlets included off-gas (flow rate calculated according to Eq. 3) and the samples. The volume of the samples was registered manually.
The volume of the reactor was verified at the end of the experiment to confirm its agreement with the estimated final value of $V_R$. During the experiment, the prediction of metabolite and biomass concentrations, the individual and combined balances and the data reconciliation routine were performed in real-time at an interval of 2 min, corresponding to the measurement interval of the slowest instrument in the setup, the FTIR.

Figure 2. Experimental setup.

Figure 3. Predicted (line) versus reference (dots) concentration profiles obtained with the original FTIR and BM calibration models.

The general flow of tasks can be summarized as follows:

(a) Off-line tasks before the experiment:
– Calibrate the FTIR spectrometer using synthetic standards;
– Calibrate the biomass monitor using standards from the first batch fermentation run; and
– Calibrate the gas analyzer using calibration gases.

(b) On-line tasks during the experiment (second batch fermentation run):
– Predict the concentrations of the medium metabolites and biomass using measurements from the FTIR spectrometer and biomass monitor, respectively;
– Measure the off-gas composition and inlet and outlet gas flow rates;
– Measure the amount of base added;
– Check for gross errors of the individual and combined elemental balances using a statistical test;
– Apply the proposed reconciliation algorithm to reconcile the predicted concentrations of medium metabolites, biomass, off-gas composition, and amount of base added; and
– Collect reference samples for off-line validation; measure $V_R$ and $V_a$.

(c) Off-line tasks after the experiment:
– Analyze the reference samples to validate the reconciliation algorithm by calculating the errors of the original and reconciled concentration estimates of medium metabolites and biomass.
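The validation in step (c) relies on the standard error of Eq. 23, which is a root-mean-square deviation between predicted (or reconciled) and reference concentrations. A minimal sketch follows; the function name and the sample values are hypothetical and serve only to show the calculation.

```python
import math


def standard_error(predicted, reference):
    """Eq. 23: sqrt(sum((y_hat_i - y_i)^2) / n) over n paired samples.

    Gives SEC when applied to the calibration set and SEP when applied
    to the prediction set.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical glucose values (g/L): original vs. reconciled predictions
reference = [20.0, 15.0, 8.0, 1.0]
original = [22.5, 17.0, 10.5, 2.0]
reconciled = [20.2, 15.3, 7.8, 1.1]
print("SEP original:  ", round(standard_error(original, reference), 3))
print("SEP reconciled:", round(standard_error(reconciled, reference), 3))
```

Computing the same statistic for the original and the reconciled estimates against identical reference samples is what allows the two to be compared column by column.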
Results and Discussion

Results with original FTIR and BM calibration models

Figure 3 shows the concentration profiles of the species considered in this study, predicted using the original FTIR and BM calibration models and compared with their reference values. Qualitatively, the results show the major trends in the culture (note the diauxic growth of the Crabtree-positive yeast strain). However, significant prediction errors due to instrumental and process shifts can be observed. The standard errors of prediction (SEP) for all the species involved are as follows: 2.29 g/L for glucose, 0.60 g/L for ethanol, 0.23 g/L for ammonium, 0.36 g/L for glycerol, 0.54 g/L for acetic acid, and 0.62 g/L for biomass (see Column 2 in Table 2).

Data reconciliation results obtained with the various balances

Carbon Balance. The statistical test of the carbon balance is shown in Figure 4. The values of the test function h (Eq. 20) remain below the upper control limit of 3.84 for the duration of the culture. Hence, the carbon balance can be considered reliable. Using the carbon balance for reconciliation, significant improvement is achieved, in particular for glucose (Figure 5), where the SEP is reduced to 0.22 g/L. The ethanol profile is also adjusted and its SEP value reduced to 0.39 g/L. The corrections in the profiles of glycerol and acetic acid amount to a simple flattening of the negative values (see Column 4 in Table 2). However, this was expected considering that the concentration ranges of these two species are around the limit of detection of the FTIR. Surprisingly, the prediction error for biomass is reduced only slightly, to 0.60 g/L.

Table 2. Comparison of the Values of the Standard Error of Prediction (SEP, g/L) Obtained With the Original Calibration Models, the "Zeroing" Method, and Data Reconciliation Using Individual and Combined Elemental Balances

Analyte      Original  "Zeroing"  C Balance  N Balance  γ Balance  Charge Balance  Combined Balance
Glucose      2.29      0.84       0.22       –          0.47       –               0.32
Ethanol      0.60      0.39       0.39       –          0.38       –               0.35
Ammonium     0.23      0.23       –          0.12       –          0.13            0.12
Glycerol     0.36      0.33       0.33       –          0.33       –               0.33
Acetic acid  0.54      0.07       0.07       –          0.07       0.11            0.08
Biomass      0.62      0.61       0.60       0.40       0.58       –               0.39

Figure 4. Statistical test values obtained for the carbon balance. The solid line shows the upper control limit.

Figure 5. Glucose profile reconciled with the carbon balance (solid line) compared with the original FTIR profile (dotted line) and the reference measurements (dots).

Figure 6. Statistical test values obtained for the nitrogen balance. The solid line shows the upper control limit.

Figure 7. Ammonium profile reconciled with the nitrogen balance (solid line) compared with the original FTIR profile (dotted line) and the reference measurements (dots).

Figure 8. Statistical test values obtained for the degree of reduction balance. The solid line shows the upper control limit.

Figure 9. Statistical test values obtained for the charge balance. The solid line shows the upper control limit.

Nitrogen Balance. Figure 6 shows the results of the statistical test for the nitrogen balance. Because of the relatively large errors in the prediction of ammonium (see Figure 3), the upper control limit is slightly exceeded toward the end of the experiment.
The results obtained after the culture time of 14 h could, therefore, be less reliable. Overall, the nitrogen balance is fairly effective for data reconciliation, and the prediction error is reduced to 0.12 g/L for ammonium and 0.40 g/L for biomass (Column 5 in Table 2). In agreement with the statistical test, the reconciliation performance seems to worsen after the culture time of 14 h (Figure 7).

Degree of Reduction Balance. The result of the statistical test for the degree of reduction balance is shown in Figure 8. The statistical test values are very similar to those obtained with the carbon balance. The results of using the degree of reduction balance in the reconciliation algorithm are also comparable with those achieved with the carbon balance (see Columns 4 and 6 in Table 2).

Charge Balance. Figure 9 shows the outcome of the statistical test for the charge balance. The large errors in both ammonium and acetic acid at the beginning and toward the end of the culture contribute to a large statistical test value. During the first 3 h of the experiment and after the culture time of 14 h, the value of h exceeds the upper control limit. Consequently, the results obtained during these periods could be less reliable. Using the charge balance, the standard prediction error decreases to 0.13 g/L for ammonium and to 0.11 g/L for acetic acid. It should be noted that the result for ammonium obtained with the charge balance is very similar to that obtained with the nitrogen balance (compare Columns 5 and 7 in Table 2). As the concentration of acetic acid remains very low throughout the culture, the correlation between base uptake and biomass formation is quite strong.

Combined Balance. The statistical test results for the combined balance are shown in Figure 10. Because of the significant errors in the estimates of several species during the initial stages of the culture, the test value exceeds the upper control limit of 9.49.
However, after a culture time of 2.5 h, the value of h falls and remains below the upper control limit; hence, the combined balance can be considered fairly reliable. Reconciling the concentration estimates using the combined balance improves the prediction profiles of all six species (see Column 8 in Table 2 and Figure 11). In particular, the errors for ethanol, ammonium, and biomass are reduced even more than with any of the individual balances, suggesting that combining balances may in some cases help find solutions closer to the optimum. As expected with the low measurement errors assigned to the gas analyzer and base balance measurements (Data Reconciliation Algorithm Section), the average absolute percentage changes to nCO2, nO2, and nOH are low: 1.564%, 0.003%, and 0.016%, respectively. By comparison, the average absolute percentage changes for the six analytes range from 11.5% for glucose to 83.9% for ammonium.

Figure 10. Statistical test values obtained for the combined (carbon, nitrogen, degree of reduction, charge) balance. The solid line shows the upper control limit.

Figure 11. Predicted (line) versus reference (dots) concentration profiles obtained after reconciling the FTIR and BM results with the combined (carbon, nitrogen, degree of reduction, charge) balances.

Comparison of data reconciliation with "zeroing" of negative results

As an alternative to the proposed data reconciliation approach, the standard error of prediction can also be reduced by zeroing the negative concentrations, a commonly used on-line postprocessing technique. Applying this method, the SEP values are reduced significantly for some of the analytes, but not as effectively as with data reconciliation using the combined balance. In general, data reconciliation outperforms the "zeroing" method because it also corrects the positive values in the profiles.
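The contrast between "zeroing" and balance-based reconciliation can be illustrated with a minimal Python sketch. It assumes the classical weighted least-squares form for linear constraints E x = 0 with a diagonal measurement covariance, and it assumes that the test function h takes the usual quadratic form of the balance residual; the algorithm used in this work additionally enforces non-negativity, which is omitted here. The function names, the single toy balance, and all numbers are hypothetical. The limits 3.84 and 9.49 are those quoted in the text, which match the 95% chi-square quantiles for one and four degrees of freedom.

```python
import numpy as np

# Upper control limits quoted in the text, keyed by the number of balances used.
CHI2_95_LIMIT = {1: 3.84, 4: 9.49}


def reconcile(x, variances, E):
    """Weighted least-squares reconciliation under linear balances E @ x = 0.

    Returns the reconciled estimates and the test value h computed from the
    balance residual (assumed quadratic form; no non-negativity constraint).
    """
    x = np.asarray(x, dtype=float)
    E = np.atleast_2d(np.asarray(E, dtype=float))
    sigma = np.diag(np.asarray(variances, dtype=float))  # measurement covariance
    residual = E @ x                                      # balance closure error
    P = E @ sigma @ E.T                                   # covariance of the residual
    P_inv_res = np.linalg.solve(P, residual)
    h = float(residual @ P_inv_res)                       # statistical test value
    x_hat = x - sigma @ E.T @ P_inv_res                   # reconciled estimates
    return x_hat, h


def zero_negatives(x):
    """'Zeroing' post-processing: clip negative estimates to zero."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)


# Toy balance (hypothetical): amount consumed = product A + product B.
E = [[1.0, -1.0, -1.0]]
x_meas = [10.0, 7.0, 2.0]          # inconsistent estimates: 10 != 7 + 2
variances = [1.0, 0.25, 0.25]      # the first estimate is trusted least

x_hat, h = reconcile(x_meas, variances, E)
balance_ok = h <= CHI2_95_LIMIT[len(E)]
```

In this toy case all three estimates are shifted, with the least trusted one shifted most, so that the balance closes exactly. `zero_negatives` would leave the same inputs unchanged because none of them is negative, which is why it cannot correct errors in the positive part of a profile.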
Data reconciliation is also a more systematic approach, because the reconciled results are checked for consistency with all balances. Table 2 compares the results obtained with the original calibration models, the "zeroing" method, the individual elemental balances, and the combined balance.

Concluding Remarks

Signal drifts in on-line spectrometers can lead to significant inaccuracies in the prediction of low-concentration metabolites. In this work, data reconciliation based on continuous elemental balances involving spectroscopic measurements, off-gas analysis, and measurements of base addition has been presented as a method for correcting the on-line concentration estimates of both medium components and biomass without additional off-line sample analysis. One of the main advantages of the proposed approach is that it can easily be coded into the prediction routine for on-line spectrometers. The particular challenge in implementing data reconciliation for bioprocesses arises from the necessity to accurately formulate elemental balances for the system. Elemental balances are often difficult to close because species that were not seen during the modeling step may unexpectedly appear and vary in concentration at various stages of the process. On the other hand, the presence of unseen species can be easily detected using statistical tests. Therefore, data reconciliation also serves as a tool to gain insight into the process and to check the validity of the balances. This feature may prove valuable in longer fermentations, where instrument drift becomes a significant issue or where, for instance, cell clumping or swelling affects the reliability of biomass measurements. Finally, difficulties may arise in cultures grown on complex media. However, because the concentration variations of many of the ill-defined compounds in complex media tend to be very small, their influence can be neglected when monitoring by FTIR spectroscopy.
Note also that performance may be improved by eliminating gross errors, for example, through offset removal14 or on-line drift correction,33 before reconciling the estimated concentrations.

Acknowledgments

The Swiss National Science Foundation is gratefully acknowledged for financial support of this work. Special thanks to Jonas Schenk for help with the data acquisition systems and to Paman Gujral for help with the implementation of the data reconciliation algorithm.

Literature Cited

1. Junker BH, Wang HY. Bioprocess monitoring and computer control: key roots of the current PAT initiative. Biotechnol Bioeng. 2006;95:226–261.
2. Schügerl K. Progress in monitoring, modeling and control of bioprocesses during the last 20 years. J Biotechnol. 2001;85:149–173.
3. Kornmann H, Rhiel M, Cannizzaro C, Marison I, von Stockar U. Methodology for real-time, multianalyte monitoring of fermentations using an in-situ mid-infrared sensor. Biotechnol Bioeng. 2003;82:702–709.
4. Olsson L, Schulze U, Nielsen J. On-line bioprocess monitoring - an academic discipline or an industrial tool? Trac-Trends Anal Chem. 1998;17:88–95.
5. Vojinovic V, Cabral JMS, Fonseca LP. Real-time bioprocess monitoring. I. In situ sensors. Sensors Actuat B: Chem. 2006;114:1083–1091.
6. Kornmann H, Valentinotti S, Marison I, von Stockar U. Real-time update of calibration model for better monitoring of batch processes using spectroscopy. Biotechnol Bioeng. 2004;87:593–601.
7. Cannizzaro C. Spectroscopic Monitoring of Bioprocesses: A Study of Carotenoid Production by Phaffia Rhodozyma Yeast, PhD Thesis. Lausanne, Switzerland: Ecole Polytechnique Fédérale de Lausanne; 2002.
8. Yardley JE, Todd R, Nicholson DJ, Barrett J, Kell DB, Davey CL. Correction of the influence of baseline artefacts and electrode polarisation on dielectric spectra. Bioelectrochemistry. 2000;51:53–65.
9. Feudale RN, Woody NA, Tan H, Myles AJ, Brown SD, Ferre J. Transfer of multivariate calibration models: a review. Chemometrics Intelligent Lab Systems. 2002;64:181–192.
10. Wolthuis R, Tjiang GCH, Puppels GJ, Schut TCB. Estimating the influence of experimental parameters on the prediction error of PLS calibration models based on Raman spectra. J Raman Spectrosc. 2006;37(1–3):447–466.
11. Arnold SA, Gaensakoo R, Harvey LM, McNeil B. Use of at-line and in-situ near-infrared spectroscopy to monitor biomass in an industrial fed-batch Escherichia coli process. Biotechnol Bioeng. 2002;80:405–413.
12. Duponchel L, Ruckebusch C, Huvenne JP, Legrand P. Standardisation of near infrared spectrometers using artificial neural networks. J Near Infrared Spectrosc. 1999;7:155–166.
13. Zhang L, Small GW, Arnold MA. Calibration standardization algorithm for partial least-squares regression: application to the determination of physiological levels of glucose by near-infrared spectroscopy. Anal Chem. 2002;74:4097–4108.
14. Dabros M, Schenk J, Marison IW, von Stockar U. The ongoing quest for truly on-line bioprocess monitoring using spectroscopy. In: Flynne WG, editor. Biotechnology and Bioengineering. New York: Nova Science Publishers Inc; 2008:99–119.
15. Dibo MA, Maquin D, Ragot J. Data Reconciliation using Interval Analysis. Nancy, F: Institut National Polytechnique de Lorraine; 2007.
16. Wang NS, Stephanopoulos G. Application of macroscopic balances to the identification of gross measurement errors. Biotechnol Bioeng. 1983;25:2177–2208.
17. de Kok HE, Roels JA. Method for the statistical treatment of elemental and energy balances with application to steady-state continuous-culture growth of Saccharomyces cerevisiae CBS 426 in the respiratory region. Biotechnol Bioeng. 1980;22:1097–1104.
18. Narasimhan S, Jordache C. Data Reconciliation & Gross Error Detection. Houston, TX: Gulf Publishing Company; 2000:406.
19. Romagnoli JA, Sánchez MC. Data Processing and Reconciliation for Chemical Process Operations, Vol. 2. San Diego: Academic Press; 2000:270.
20. Crowe CM. Data reconciliation - progress and challenges. J Process Control. 1996;6(2–3):89–98.
21. van der Heijden RTJM, Romein B, Heijnen JJ, Hellinga C, Luyben KCAM. Linear constraint relations in biochemical reaction systems, Part 2: Diagnosis and estimation of gross errors. Biotechnol Bioeng. 1994;43:11–20.
22. Jungo C. Quantitative Characterization of a Recombinant Pichia pastoris Mut+ Strain Secreting Avidin Using Transient Continuous Cultures, PhD Thesis. Lausanne: Ecole Polytechnique Fédérale de Lausanne; 2007.
23. Herwig C, Marison I, von Stockar U. On-line stoichiometry and identification of metabolic state under dynamic process conditions. Biotechnol Bioeng. 2001;75:345–354.
24. Duboc P, von Stockar U. Energetic investigation of Saccharomyces cerevisiae during transitions, Part 1: Mass balances. Thermochim Acta. 1995;251:119–130.
25. Schenk J, Marison IW, von Stockar U. A simple method to monitor and control methanol feeding of Pichia pastoris fermentations using mid-IR spectroscopy. J Biotechnol. 2007;128:344–353.
26. Duboc P, von Stockar U. Systematic errors in data evaluation due to ethanol stripping and water vaporization. Biotechnol Bioeng. 1998;58:428–439.
27. Verduyn C, Postma E, Scheffers WA, van Dijken JP. Effect of benzoic-acid on metabolic fluxes in yeasts: a continuous-culture study on the regulation of respiration and alcoholic fermentation. Yeast. 1992;8:501–517.
28. Cannizzaro C, Valentinotti S, von Stockar U. Control of yeast fed-batch process through regulation of extracellular ethanol concentration. Bioprocess Biosyst Eng. 2004;26:377–383.
29. Brereton RG. Multilevel multifactor designs for multivariate calibration. Analyst. 1997;122:1521–1529.
30. Geladi P, Kowalski BR. Partial least-squares regression: a tutorial. Anal Chim Acta. 1986;185:1–17.
31. Davey CL, Davey HM, Kell DB. On the dielectric properties of cell suspensions at high-volume fractions. Bioelectrochem Bioenerg. 1992;28(1–2):319–340.
32. Harris CM, Todd RW, Bungard SJ, Lovitt RW, Morris JG, Kell DB. Dielectric permittivity of microbial suspensions at radio frequencies: a novel method for the real-time estimation of microbial biomass. Enzyme Microbial Technol. 1987;9:181–186.
33. Dabros M, Amrhein M, Gujral P, von Stockar U. On-line recalibration of spectral measurements using metabolite injections and dynamic orthogonal projection. Appl Spectrosc. 2007;61:507–513.

Manuscript received July 28, 2008, and revision received Oct. 23, 2008.