2007 ECSIS Symposium on Bio-inspired, Learning, and Intelligent Systems for Security

Using Genetic Algorithms in Chem-Bio Defense Applications

Sue Ellen Haupt*, Randy L. Haupt, George S. Young
Meteorology Department, The Pennsylvania State University, State College, PA 16804-0030
*haupts2@asme.org

0-7695-2919-4/07 $25.00 © 2007 IEEE. DOI 10.1109/BLISS.2007.29

Abstract

There are many problems in security and defense that require a robust optimization technique, including those that involve the release of a chemical or biological contaminant. This paper discusses using a genetic algorithm to address such problems. An example shows how a mixed integer genetic algorithm can be used in conjunction with field sensor data to invert for source information and all necessary meteorological data. A new mixed integer genetic algorithm is described that is a state-of-the-art tool capable of optimizing a wide range of objective functions. Such an algorithm is useful for optimizing atmospheric stability, wind speed, wind direction, and source location. We demonstrate that the algorithm is successful at reconstructing these source and meteorological parameters.

1. Introduction

In the case of an accidental or intentional release of a toxic contaminant, responsible agencies must decide which areas to evacuate, how to mitigate the release, and how to plan the emergency response. That process is likely to be based on forecasts of transport and dispersion of the contaminant. In a real situation, however, it is unlikely that the exact information regarding the source parameters (location, time, strength of the release) or the meteorological data (wind speed and direction, atmospheric stability) that are necessary to predict contaminant concentrations downwind would be available. In addition, monitored concentration data contain errors; there are inherent uncertainties in modeling turbulent dispersion; transport and dispersion models compute the ensemble average of many realizations of an event while the goal is to reproduce a specific single realization of the event in real time; and the wind field evolves in time and space. Although these considerations imply a rather formidable problem, careful formulation of the problem as one in optimization that combines the physical dispersion modeling capability with the ground truth from a network of field sensors enables back-calculation of the parameters necessary to predict the downwind transport and dispersion of the contaminant.

Our previous work demonstrated that coupling inverse models with transport and dispersion models using a genetic algorithm (GA) is an effective approach for attributing concentration contribution at a receptor to each of a specified number of sources [1]. This methodology was tested using a basic Gaussian plume dispersion model on synthetic data for circular source configurations and with the actual source configuration for Logan, Utah. The methodology was then validated using Monte Carlo techniques to determine the confidence intervals [2]. We also studied the robustness of the methodology by considering the effects of both additive and multiplicative white noise [2]. We found that even when the noise was of the same magnitude as the signal, the GA-coupled model could apportion the pollutant to the correct source. The next step was to replace the Gaussian plume dispersion model with an operational Second-order Closure Integrated Puff model, SCIPUFF [3]. The GA-coupled model performed as well with SCIPUFF computing the dispersion as with the Gaussian plume model. That enhanced coupled model was then tested on field test data [3]. Within the limitations of the data, the coupled model still performed admirably. The cases where performance was disappointing were traced to difficult situations during the field test that would be expected to impact data quality.
For those cases, prior comparisons of model results to the measured concentrations were also quite poor. The reformulation of the problem to additionally compute the wind speed and direction appears in Allen et al. [4].

The inverse problems in these prior studies were all solved using a genetic algorithm (GA). The parameters to be optimized by the GA are the input values for the dispersion model. Thus, for each potential solution, the results of the dispersion model with those estimated parameters are compared to the monitored concentration pattern. That series of efforts progressed from identifying the source strength through identifying all relevant parameters. This process is depicted in Figure 1.

Figure 1. Schematic of source and meteorological data optimization for Security.

The current paper describes our first efforts to compute atmospheric stability parameters in addition to wind direction and speed. The stability of the atmosphere determines the dispersion coefficients that govern the extent of the plume spread with distance and time. Because stability class is an integer while wind speed and direction are continuous, this requires a Mixed Integer Genetic Algorithm (MIGA), described in more detail below.

2. Methodology

2.1 Model Formulation – Given field sensor data, the algorithm must back-calculate source characteristics and meteorological data for subsequent transport and dispersion modeling. The technique used to solve that problem is a MIGA, which optimizes the agreement between monitored data and predicted concentrations. The cost function to be minimized by the GA is:

$$
\text{cost} = \frac{\sum_{r=1}^{TR} \left[ \ln(aC_r + \varepsilon) - \ln(aR_r + \varepsilon) \right]^2}{\sum_{r=1}^{TR} \left[ \ln(aR_r + \varepsilon) \right]^2} \tag{1}
$$

where $C_r$ is the forecast concentration predicted by the Gaussian plume equation at receptor $r$, $R_r$ is the observed concentration retrieved from receptor $r$, and $a$ and $\varepsilon$ are constants used to avoid taking the logarithm of zero ($a = 1$, $\varepsilon = 1 \times 10^{-13}$ here). $TR$ is the total number of receptors. The dispersion model used is a Gaussian plume model. The runs presented here use a 32 × 32 grid of receptors ($TR$ = 32 × 32 = 1024) with the source located in the center of the grid, as seen in Figure 2. Note that many of the receptors receive negligible impact. We demonstrate two sets of calculations. First, we back-calculate the meteorological parameters: wind direction, wind speed, and stability classification. Then we add to that the location (x,y) of the source.

Figure 2. Concentration of dispersed plume on 32 × 32 grid for stability 4.

2.2 The Mixed Integer Genetic Algorithm – A GA is an optimization technique that integrates genetic recombination with natural selection to evolve better solutions to an optimization problem. Figure 3 is a flow chart of a typical GA. A single guess of the optimum input to the cost function is placed in a row vector called a chromosome. The GA works with many guesses at once, so a matrix is formed with chromosomes as the rows. Initially, all the chromosomes in the population are random. This matrix is passed to the cost function and a column vector of costs is created. The operation of mating combines the information from the best prior parameter values to produce a new population of improved estimates. The mutation operator generates new solutions to maintain an adequate sampling of the parameter space, preventing premature convergence to a suboptimal set of parameter values [5]. The GA is robust at solving nonlinear, coupled optimization problems that are difficult for traditional techniques.

Figure 3. Flow chart of a genetic algorithm.

We develop a new MIGA approach. The new MIGA used here has several unique features, including:
• All variables are represented with values between zero and one,
• The uniform crossover mating operation is used,
• Mutations occur on an entire chromosome rather than an individual variable, and
• All scaling and mapping of the variables occurs in the cost function.

This MIGA is versatile because the same algorithm can be used for any type of variable. The MIGA used here minimizes cost functions that depend on both real-valued continuous variables and integer variables. We configure the MIGA to minimize the cost in (1). The integer variable, atmospheric stability class, is included in the search space.

3. Algorithm Details

In order to make the MIGA as flexible as possible, all variables are mapped to continuous values between 0 and 1 [6]. The term continuous, as used here, specifies values between 0 and 1. If a variable has an integer or binary value, then the cost function converts the continuous value to that integer or binary value. The benefit of this approach is that all the scaling, quantizing, and rounding happen in the cost function, so the MIGA operates independently of the variable type. There is no need for a binary GA or a real GA, because the operators work with any combination of variable types. A chromosome can have any mix of real, integer, and binary variables.

The next step in the algorithm is natural selection. Chromosomes with low costs survive, while chromosomes with high costs are discarded. This step either keeps a certain percentage of the population or discards members with costs that exceed a certain level. Surviving chromosomes are known as the mating pool. Discarded chromosomes are replaced by new chromosomes called offspring. In order to create the offspring, parents must be chosen. Here we use tournament selection. In general, two parents produce two offspring that replace two discarded chromosomes.

Mating between two selected chromosomes uses uniform crossover, which is preferable for a MIGA since uniform crossover provides a larger exploration of the cost surface than other approaches to crossover. First, a random binary mask of ones and zeros is created with the same length as the chromosome. A one in a mask column means the offspring receives the variable value in parent #1. A zero means the offspring receives the variable value in parent #2.

Mutation is performed by randomly selecting variables in the population and replacing them with uniform random values. The mutation rate determines the total number of variables that receive a mutation. This type of mutation modifies the entire chromosome rather than a single variable. It is attractive because it is not confined to exploring one variable at a time.

4. Results

The MIGA was run for two different types of inversion. The first configuration optimized all meteorological parameters used for computing dispersion: wind speed, wind direction, and Pasquill-Gifford stability class (integer). Note that the stability class determines the dispersion coefficients, i.e., the spread of the plume. First, a single run of 5000 generations was performed. A plot of the convergence properties appears in Figure 4. For this case, the best solution converges in about 1200 generations. Note that the high mutation rate forces the algorithm to continually try new solutions; thus, the mean solution does not change much.

The results of the first series of optimizations appear in Table 1. That table reports the statistics of 10 runs of 2000 generations each for optimizing the three meteorological parameters. The close agreement between the actual and mean values, together with the small standard deviations, shows that the GA is reliable for this back-calculation.

5. Discussion

The MIGA is a useful advance of technology that allows jointly optimizing integer, binary, and continuous parameters. It is particularly applicable for extending this work in security: back-calculating source and meteorological parameters for subsequent dispersion modeling. Specifically, it allows adding the computation of stability category in a way that would be difficult for more traditional techniques. This work will aid decision-makers by giving better estimates of contaminant dispersion. Future work will concentrate on examining the robustness of the results in the face of noise in the data.
In addition, we will look at more variables, such as source strength and effective source height, in the inversion process. For these additional variables, we will use an instantaneous source model, a puff dispersion model. We will study these variables for various receptor configurations and compute the amount of information necessary to complete an inversion, both without noise and in the presence of noise. Finally, we plan to examine using the MIGA to optimize the number of receptors and their locations.

Figure 4. Convergence of the MIGA for optimizing three meteorological parameters.

Table 1. Results of 10 GA optimizations of meteorological parameters.

            Wind Spd (m/s)   Wind Dir (°)   Stability Class
Actual      5.000            180            6
Mean        4.990            180.026        6
Median      4.987            180.028        6
Stand Dev   0.226            0.014          0

Table 2 reports statistics of 10 separate runs of 10,000 generations each when optimizing wind direction, stability, and the (x,y) location and strength of the source. Generally, the MIGA is successful in identifying the relevant parameters. The value of source strength (a factor that multiplies the emission rate) is not quite as close to the actual value, but still has a reasonable percent error (taken as the difference from the exact value divided by the range). Although the error in the source location appears large in absolute terms, it is on a scale ranging from -8000 to 8000 m. The final row of Table 2 lists the difference between the mean and the actual value as a percentage. Judged by that percentage error, the source location has been pinpointed rather accurately.

Table 2. Results of 10 GA optimizations of meteorological parameters plus source siting.

            Source Strength   Wind Dir (°)   Stability Class   x (m)    y (m)
Actual      1.00              180.00         4                 0.0      0.0
Mean        1.38              180.41         4                 -9.9     24.6
Median      9.20              180.30         4                 -22.0    46.7
StdDev      0.67              0.32           0                 25.6     72.3
% Error     3.8               0.01           0.00              0.00     0.15

References

[1] S. E. Haupt, “A Demonstration of Coupled Receptor/Dispersion Modeling with a Genetic Algorithm,” Atmospheric Environment, vol. 39, pp. 7181-7189, 2005.
[2] S. E. Haupt, G. S. Young, and C. T. Allen, “Validation of a Receptor/Dispersion Model Coupled with a Genetic Algorithm Using Synthetic Data,” J. Appl. Meteor., 45, 476-490, 2006.
[3] C. T. Allen, S. E. Haupt, and G. S. Young, “Source Characterization With a Genetic Algorithm-Coupled Receptor/Dispersion Model Incorporating SCIPUFF,” J. Appl. Meteor., 46, No. 3, 273-287, 2007.
[4] C. T. Allen, G. S. Young, and S. E. Haupt, “Improving Pollutant Source Characterization by Optimizing Meteorological Data with a Genetic Algorithm,” Atmospheric Environment, 41, 2283-2289, 2007.
[5] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms, 2nd edition with CD. John Wiley & Sons, New York, NY, 2004.
[6] R. L. Haupt, “Antenna Design with a Mixed Integer Genetic Algorithm,” IEEE AP-S Trans., 55, No. 3, 577-582, 2007.
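
Illustrative sketch. The cost function in (1) and the MIGA conventions of Section 3 (genes stored in [0, 1], decoding done inside the cost function, uniform crossover with a binary mask) can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the wind speed range of 0.5 to 15 m/s, the six-class stability mapping, and the choice to give the second offspring the complementary genes are all hypothetical, and no dispersion model is included.

```python
import math
import random


def log_cost(predicted, observed, a=1.0, eps=1e-13):
    """One reading of the cost in (1): squared log-differences between
    predicted and observed concentrations, normalized by the observations."""
    num = sum((math.log(a * c + eps) - math.log(a * r + eps)) ** 2
              for c, r in zip(predicted, observed))
    den = sum(math.log(a * r + eps) ** 2 for r in observed)
    return num / den


def decode(chromosome):
    """Map three normalized genes in [0, 1] to physical variables.
    All ranges here are assumptions made for illustration."""
    u_dir, u_spd, u_stab = chromosome
    wind_dir = 360.0 * u_dir                 # degrees
    wind_spd = 0.5 + 14.5 * u_spd            # m/s, hypothetical range
    stability = min(6, 1 + int(6 * u_stab))  # integer class 1..6
    return wind_dir, wind_spd, stability


def uniform_crossover(parent1, parent2, rng):
    """Build a random binary mask; a one takes the gene from parent 1,
    a zero takes it from parent 2. The second child gets the complement
    (an assumption; the text only says two offspring are produced)."""
    mask = [rng.randint(0, 1) for _ in parent1]
    child1 = [g1 if m else g2 for m, g1, g2 in zip(mask, parent1, parent2)]
    child2 = [g2 if m else g1 for m, g1, g2 in zip(mask, parent1, parent2)]
    return child1, child2


if __name__ == "__main__":
    rng = random.Random(0)
    mother = [rng.random() for _ in range(3)]
    father = [rng.random() for _ in range(3)]
    for child in uniform_crossover(mother, father, rng):
        print(decode(child))
```

Because the operators only ever see numbers in [0, 1], the same crossover routine serves continuous and integer variables alike; only `decode` needs to know that the third gene represents an integer class.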