Prediction of Solubility for Large Molecules in Solvents by Parallelized Molecular Simulation

1. Motivation

The prediction of the solubility of compounds in different solvents is an important problem in many areas of chemistry. For example, volatile organic compounds represent an important class of environmental contaminants. Knowledge of their physico-chemical properties, such as the air-water distribution coefficient, the octanol-water partition coefficient, and the solubility in water, is necessary for modeling the transport and distribution of these pollutants in the basic compartments of the environment (water, air, soil), and for adopting rational remediation measures. A related problem involves prediction of the solubilities of a range of solutes in supercritical fluids such as carbon dioxide and water. These solubilities are required in order to design equipment that extracts the solutes from other media, such as soil, by dissolution in the solvent.

A fundamental thermodynamic quantity in all these solubility problems is either the solute excess chemical potential or the solute excess chemical potential at infinite dilution, which is related to the Henry's-law constant. Although some macroscopic thermodynamic models exist for the prediction of solubilities in bulk systems, their accuracy is generally poor [1]. Other macroscopic approaches rely on empirical correlations, requiring a considerable amount of experimental data for their implementation [2]. Experimental measurement is often difficult and costly, for example in situations involving organo-metallic compounds. For these reasons, molecular-level simulation methods show considerable promise as alternative predictive strategies.

2. State-of-the-Art

The key problem in the molecular simulation of the solubility of organics in solvents such as those considered here is that of efficiently calculating the excess chemical potential of the solute, typically at very low concentrations.
The limiting case of infinite dilution allows the macroscopic Henry's-law constant to be calculated. The main method for calculating chemical potentials at the molecular level is the Widom test-particle-insertion method [3]. For a large solute molecule (as is the case here) inserted into a solvent fluid, this method is very inefficient. A number of approaches have been devised to circumvent this problem [4], although none is without difficulties in implementation and deficiencies in accuracy. In the following we outline several advanced simulation methods for calculating the chemical potential that are well suited to parallelization.

The Kirkwood coupling parameter method [5] gradually turns on the interaction between the extra particle and the system by changing the Kirkwood coupling parameter $\lambda$ from 0 (no interaction between the extra particle and the system) to 1 (full interaction between the extra particle and the system). The excess chemical potential is determined as an integral over $\lambda$ of ensemble averages of $\partial U/\partial\lambda$, where $U$ is the potential energy of the system. The integral is evaluated numerically, and it requires a series of simulations to compute the ensemble averages of $\partial U/\partial\lambda$ at different $\lambda$-values. The Kirkwood coupling parameter method is useful for solutes formed by ring molecules, e.g. naphthalene or benzene.

The single-charging integral method [6] is a variant of the Kirkwood coupling parameter method. The method relies on a number of separate simulations in which a solute is slowly mutated from one form to another. Instead of a single coupling parameter that varies from 0 to 1, the single-charging integral approach uses a vector of coupling parameters, the elements of which correspond to simple functions of each potential parameter that changes during the mutation.
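The coupling-parameter integration described above can be illustrated with a minimal sketch. It assumes that the ensemble average of the derivative of the potential energy with respect to the coupling parameter has already been obtained at each grid point, one independent simulation per point. The function names, the trapezoidal rule, and the linear toy data are assumptions made for illustration only; they are not taken from refs. [5] or [6].

```python
from typing import Callable, Sequence


def trapezoid(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching grid points")
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def excess_chemical_potential(
    lambdas: Sequence[float],
    mean_dU_dlambda: Callable[[float], float],
) -> float:
    """Integrate <dU/dlambda> over the coupling parameter from 0 to 1.

    Each call to mean_dU_dlambda stands for one independent simulation
    at a fixed coupling parameter, so the calls could be distributed
    over separate nodes and only one number per node collected.
    """
    averages = [mean_dU_dlambda(lam) for lam in lambdas]
    return trapezoid(lambdas, averages)


def toy_mean_dU_dlambda(lam: float) -> float:
    # Hypothetical stand-in for a simulation result. It is linear in
    # lambda, so the exact integral over [0, 1] is 1 - 3 = -2.
    return 2.0 * lam - 3.0


lambdas = [i / 10 for i in range(11)]
mu_ex = excess_chemical_potential(lambdas, toy_mean_dU_dlambda)
print(f"excess chemical potential (toy units): {mu_ex:.6f}")
```

In a real calculation the toy function would be replaced by a Monte Carlo or molecular dynamics run at fixed coupling parameter, and the grid would typically be refined where the integrand varies rapidly.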
The single-charging integral approach affords more flexibility in dealing with possible singularities, numerical difficulties, or unwanted phase changes that can plague the Kirkwood coupling parameter method. The single-charging integral method is useful for solutes formed by ring molecules and also for solutes formed by shorter linear or branched chain molecules.

The calculation of the excess chemical potential using the configurational-bias method [7] is an approach ideally suited for solutes formed by chain molecules. To generate samples with favourable statistics, the method introduces a configurational bias that favours low-energy conformations. The position of the first two segments of a test chain molecule in the host system and their orientation in space are chosen at random. Subsequent segments are appended to the end of the test chain, one by one, until a full test chain is grown. The bias introduced by the chain-growing scheme is subsequently taken into account in a modified Widom expression for the excess chemical potential. The efficiency of the method could be further improved by introducing additional biases for the position and orientation of the first two segments of the test chain.

The test-segment-insertion method [8] decomposes the total solute excess chemical potential into a sum of contributions from the segments of a test particle. The contributions are calculated in separate simulations. Here, a test segment is inserted onto the end of a “partially built” test particle, and its contribution to the total solute excess chemical potential is calculated using the Widom expression [3]. The test-segment-insertion method is useful for solutes formed by long chains, including polymers.

The expanded ensemble method [9] requires simulations of systems with various degrees of coupling between the solute and the solvent, involving the gradual turning on of the intermolecular potentials. The coupling ranges from a completely decoupled to a fully interacting solute.
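The test-segment-insertion decomposition described above can be sketched as follows. The sketch assumes that each segment's contribution is obtained from the Widom expression, minus kT times the logarithm of the average Boltzmann factor of the trial-insertion energy, and that the insertion energies for each segment come from a separate simulation. The function names and the sample energies are hypothetical and serve only to illustrate the bookkeeping.

```python
import math
from typing import Sequence


def widom_excess_mu(delta_u: Sequence[float], kT: float) -> float:
    """Widom estimate: mu_ex = -kT * ln < exp(-dU / kT) >.

    delta_u holds the energy changes recorded for trial insertions.
    The log-sum-exp form avoids overflow for strongly negative dU.
    """
    if not delta_u:
        raise ValueError("no insertion energies supplied")
    exponents = [-du / kT for du in delta_u]
    shift = max(exponents)
    mean_shifted = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kT * (shift + math.log(mean_shifted))


def chain_excess_mu(
    segment_delta_u: Sequence[Sequence[float]], kT: float
) -> float:
    """Sum the per-segment contributions (one simulation per segment)."""
    return sum(widom_excess_mu(du, kT) for du in segment_delta_u)


# Hypothetical insertion energies for a three-segment solute, in units
# where kT = 1. Real data would contain many insertions per segment.
samples = [
    [0.0, 0.0, 0.0],   # contributes 0
    [1.0, 1.0],        # contributes +1
    [-0.5, -0.5],      # contributes -0.5
]
total_mu_ex = chain_excess_mu(samples, kT=1.0)
print(f"total excess chemical potential (toy units): {total_mu_ex:.6f}")
```

Because each inner list is produced by its own simulation, the per-segment estimates could be computed on separate nodes and only the resulting numbers summed at the end.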
The solute excess chemical potential is evaluated using a histogram describing the probability with which each of the sub-systems is visited. The frequency of visiting the sub-systems is controlled by pre-weighting factors that are computed in the course of the simulations. The expanded ensemble method is useful for solutes formed by chain as well as ring molecules.

The advanced simulation methods outlined above are ideally suited for implementation on distributed-memory parallel computers using the Message Passing Interface library [10], since they rely either on a series of simulations (the Kirkwood coupling parameter method, the single-charging integral method, the test-segment-insertion method and the expanded ensemble method) or on sampling in various directions (the configurational-bias method). The simulations of particular sub-systems or the sampling in particular directions can be performed straightforwardly on separate computer nodes. Moreover, the methods require only a minimal exchange of information among the nodes, so the time lost to communication is negligible.

Another aspect of simulations for large molecules is the efficient execution of (rotational, translational and conformational) moves. This can be realized using the configurational-bias technique, which offers further parallelization opportunities for the simulations.

Finally, we have shown for the prediction of chemical and vapour-liquid equilibria of mixtures [11] that incorporating experimental pure-component vapour pressures into simulations leads to a significant improvement in the predictive accuracy of simulation methods. A similar approach can be used in the prediction of solubilities by simulation methods. In this case, the solvent excess chemical potential (calculated from an accurate equation of state at the given temperature and pressure) can be fixed in the course of the simulations instead of the system pressure.

3.
Methodological Approach

The proposed work can be divided into two complementary parts: the development of simulation methodology for calculating the solute excess chemical potential on parallel computers, and the application of the parallelized simulation methods to predict solubilities in real systems of practical interest. Throughout the work we plan to interact with experimental studies wherever possible, both to validate the parallelized simulation methods and to aid in understanding the phenomena and properties of the real systems of practical interest.

References:

1. M. McHugh, Supercritical Fluid Extraction (Butterworth, Boston, 1986); S. I. Sandler, Chemical and Engineering Thermodynamics (McGraw-Hill, 1999, sections 8.3-8.6).
2. K. D. Bartle, A. A. Clifford, S. A. Jafar, and G. F. Shilstone, J. Phys. Chem. Ref. Data 20, 713 (1991).
3. B. Widom, J. Chem. Phys. 39, 2808 (1963).
4. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications (Academic Press, San Diego, 1996, pp. 151-181, 315-332).
5. J. G. Kirkwood, J. Chem. Phys. 3, 300 (1935).
6. A. A. Chialvo, J. Chem. Phys. 92, 673 (1990).
7. J. J. de Pablo, M. Laso, and U. W. Suter, J. Chem. Phys. 96, 6157 (1992).
8. S. K. Kumar, I. Szleifer, and A. Z. Panagiotopoulos, Phys. Rev. Lett. 66, 2935 (1991).
9. A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov, and P. N. Vorontsov-Velyaminov, J. Chem. Phys. 96, 1776 (1992); N. B. Wilding and M. Müller, J. Chem. Phys. 101, 4324 (1994).
10. Y. Aoyama and J. Nakano, RS/6000 SP: Practical MPI Programming (IBM Corp., 1999).
11. M. Lísal, W. R. Smith, and I. Nezbeda, J. Phys. Chem. B 103, 10496 (1999); M. Lísal, W. R. Smith, and I. Nezbeda, AIChE J. 46, 866 (2000).