Journal of Hazardous Materials 303 (2016) 137–144
Contents lists available at ScienceDirect
Journal of Hazardous Materials
journal homepage: www.elsevier.com/locate/jhazmat

Filling environmental data gaps with QSPR for ionic liquids: Modeling n-octanol/water coefficient

Anna Rybinska, Anita Sosnowska, Monika Grzonkowska, Maciej Barycki, Tomasz Puzyn ∗
Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland

Highlights
• We developed a QSPR model to predict the values of log KOW for ionic liquids.
• The effect of the cation and anion structures on the modeled property was determined.
• An increase in the length of the alkyl chain in the cation causes a significant increase of log KOW.
• The majority of ILs could be transported with the water mass.

Article info
Article history: Received 21 June 2015; Received in revised form 22 September 2015; Accepted 12 October 2015; Available online 23 October 2015
Keywords: Ionic liquids; ILs; QSPR; KOW; Octanol–water partition coefficient

Abstract
Ionic liquids (ILs) form a wide group of compounds characterized by specific properties that allow ILs to be used in different fields of science and industry. Because the growing production and use of ionic liquids increase the probability of their emission to the environment, it is important to estimate the ability of these compounds to spread in the environment. One of the most important parameters for evaluating the environmental mobility of a compound is the n-octanol/water partition coefficient (KOW). Experimental measurement of KOW values for a large number of compounds can be time-consuming and costly. Computational predictions are therefore increasingly used instead.
The paper presents a new Quantitative Structure–Property Relationship (QSPR) model that allows predicting the logarithmic values of KOW for 335 ILs for which experimentally measured values were unavailable. We also estimated the bioaccumulation potential and pointed out which groups of ILs could have a negative impact on the environment.
© 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: t.puzyn@qsar.eu.org (T. Puzyn).
http://dx.doi.org/10.1016/j.jhazmat.2015.10.023
0304-3894

1. Introduction

The design of sustainable 'green' products is nowadays one of the most important challenges for the chemical industry. New chemical materials should be not only useful and inexpensive, but also safe for human health and the environment. Ionic liquids (ILs), salts consisting of a large organic cation and a small inorganic anion and having a melting point lower than 100 °C, have been considered 'green chemicals' for the last few years. This is because ILs are characterized by low flammability, low vapor pressure, stability at high temperatures and the ability to retain the liquid state over a wide range of temperatures. These properties, together with the possibility of easily modifying the structure, have led to the use of ILs in many disciplines such as chemistry, biotechnology, chemical engineering and industry [1–3]. Currently synthesized ILs belong to the third generation of these compounds; they are designed to possess certain biological activity combined with selected physical properties. Ionic liquids that have antibacterial activity and are soluble in water are examples of third-generation ILs [4]. However, recent studies confirm a negative impact of ILs on living organisms when the organisms are exposed to those novel materials.
Because the increasing production and use of ILs increase the probability of their emission to the environment, it is important to estimate the ability of these compounds to spread in the environment. The low vapor pressures that characterize ILs decrease the risk of air pollution by these substances and reduce their long-range atmospheric transport potential. However, since ILs have significant solubility in water, natural waters are the most likely media through which ILs would be transported in the environment. Moreover, the physical–chemical properties that make ILs useful from the industrial perspective (i.e., high chemical and thermal stability) may suggest potential problems with the degradation of ILs and their high persistence in the environment [5–7]. For these reasons, it is of vital importance to assess the possible transport and fate of novel ILs before they are introduced to the market.

The n-octanol/water partition coefficient (KOW) is a parameter that, given the low vapor pressure and high water solubility of ILs, is crucial for the assessment of environmental transport and fate. It describes the distribution of a substance between the organic and the aqueous phase at equilibrium. Logarithmic values of KOW are commonly used in exposure assessment to express the lipophilicity of a chemical [8]. Highly lipophilic substances (log KOW ≫ 0) could be accumulated in aquatic organisms, whereas more hydrophilic ones (log KOW ≪ 0) are preferably transported with water masses and, thus, their uptake by organisms is significantly lower [9,10]. The values of the n-octanol/water partition coefficient are also used as input data for more comprehensive environmental multimedia models that allow investigating the behavior of chemicals in the total environment [11]. Unfortunately, data on the physicochemical properties of ionic liquids are often incomplete or not available in the literature [12].
Therefore, given the large number of existing and theoretically possible ILs, it is reasonable to develop computational models that enable predicting missing data in a relatively short time, without the need to perform additional experiments. An example of such methods is the Quantitative Structure–Activity/Property Relationship (QSAR/QSPR) approach, according to which the property of interest may be predicted from the variance in chemical structures of the investigated group of compounds when experimental data are available only for a part of the group [13–19].

Since experimentally measured values of the n-octanol/water partition coefficient are currently available only for a small part of the ionic liquids, in this work we have developed a QSPR model that allows predicting the logarithmic values of KOW for a wider set of ionic liquids from their chemical structure. Furthermore, with this model we tried to determine the effect of structural variation of cations and anions on the modeled property in the context of the environmental transport and fate of ILs. We also estimated the bioaccumulation potential of all ionic liquids from the prediction set by comparing the predicted values of log KOW with the criteria of the Stockholm Convention [20].

2. Materials and methods

The process of developing QSPR models consists of several basic steps, namely: (i) collecting available experimental data and splitting them into training and validation sets, (ii) calculating molecular descriptors, (iii) selecting the optimal, physically interpretable combination of the descriptors and training a QSPR model, (iv) external validation of the model, (v) providing a physical interpretation of the model [21]. Only an appropriately developed and validated model can be further used for making valuable predictions.

2.1. Experimental data

At first, we collected the experimental values of the n-octanol/water partition coefficient from available literature sources [22–28].
We found data for only 53 ionic liquids. In addition, when evaluating the data, we were forced to discard 10 compounds because the experimental details (i.e., the information about the applied measurement method and the temperature of the measurements) were missing. As a result, we obtained a set of 43 ILs, in which the experimental values of the studied partition coefficient ranged from −3.77 to 1.73 logarithmic units. The data had been measured at 297.15 ± 2 K. We accepted this temperature range because a temperature variation of up to 10 K does not affect the measurement in a significant way [22].

In the next step, the 43 ILs were sorted according to increasing values of log KOW. Then, the compounds were split into a training set and a validation set. The data splitting procedure was performed using the so-called "Z:1 algorithm", in which every Zth compound in a group of compounds sorted according to the predicted property (here log KOW) is assigned to the validation set, whereas the remaining ones form the training set. In this case Z = 3, and we obtained a training set containing 29 ionic liquids (67%) and a validation set containing 14 compounds (Table A in the electronic Supplementary material).

2.2. Molecular descriptors

In order to obtain molecular descriptors (numerical variables that characterize the molecular structures of the compounds and are used as input variables in the QSPR model), we created molecular models of all cations and anions present in the 43 studied ILs using the ChemSketch software [29]. Subsequently, the molecular geometries of cations and anions (separately) were optimized with quantum-mechanical methods at the semi-empirical PM7 level [30] with the MOPAC 2012 software [31]. The molecular models of anionic and cationic moieties with optimized geometry were then used for calculating molecular descriptors.
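The Z:1 splitting described in Section 2.1 can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `z_to_one_split` is hypothetical, and sending the last compound of each consecutive group of Z to the validation set is an assumption (the text does not say which position is taken). With that assumption, 43 compounds and Z = 3 reproduce the 29/14 split reported above.

```python
def z_to_one_split(values, z=3):
    """Sort compounds by the modeled property and assign every z-th one
    (assumed: the last of each consecutive group of z) to validation.

    Returns two lists of indices into `values`: (training, validation).
    """
    # Indices of the compounds ordered by increasing property value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    training, validation = [], []
    for rank, idx in enumerate(order, start=1):
        if rank % z == 0:
            validation.append(idx)
        else:
            training.append(idx)
    return training, validation


# Placeholder property values for 43 compounds (not the experimental data).
placeholder_log_kow = [0.1 * i - 3.0 for i in range(43)]
train_idx, valid_idx = z_to_one_split(placeholder_log_kow, z=3)
print(len(train_idx), len(valid_idx))
```

Because the compounds are sorted first, the validation set spans the same range of log KOW as the training set, which is the stated purpose of this kind of split.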
We obtained 1025 descriptors of the cations' and 1311 descriptors of the anions' structures (Tables B and C in the electronic Supplementary material, respectively). These include:
• Constitutional and topological descriptors (1D–2D),
• Weighted Holistic Invariant Molecular descriptors (WHIM) (3D),
• Quantum-mechanical descriptors (3D).

Constitutional and topological descriptors, which take into account one- and two-dimensional features of the molecules (1D–2D), as well as three-dimensional (3D) WHIM descriptors were calculated separately with the Dragon software (version 6.0), whereas quantum-mechanical ones (e.g., frontier orbital energies, dipole moments) were extracted directly from MOPAC 2012 output files after the geometry optimization [31,32].

2.3. QSPR modeling

Next, we applied the multiple linear regression (MLR) method to find the quantitative relationship between the molecular descriptors (input variables) and log KOW (the modeled value) [33,34]. The optimal combination of molecular descriptors was selected with a genetic algorithm (GA) [35] implemented in the QSARINS software [36,37]. The following parameters were applied to control the genetic algorithm: population size, 100; mutation rate, 45%. For more details, please refer to Fig. 1S in the electronic Supplementary material. It should be noted that there are other software packages for automated QSAR/QSPR modeling, which offer various modeling techniques, such as Partial Least Squares (PLS), Support Vector Machines (SVM) or neural networks (NN) [38,39].

To assure the credibility of the model, the recommendations published by the Organization for Economic Cooperation and Development (OECD) were fulfilled [40]. Indeed, our model is based on a precisely defined endpoint (log KOW) and employs a common method of modeling (MLR combined with GA).
However, it should be highlighted here that the model should only be used to predict the endpoint values for compounds belonging to its applicability domain (AD). This is because the predictions for such compounds, being in fact the results of the model's interpolation, are considered much more precise than predictions outside the domain (extrapolation). The applicability domain of our model was inspected using the plot of the standardized residuals versus the leverage values (Williams plot). The leverage values (hi), which illustrate the similarity of particular compounds to the training set, were calculated in accordance with Eq. (1):

$$h_i = x_i^{T}\left(X^{T}X\right)^{-1}x_i \qquad (1)$$

where xi is the vector of descriptors calculated for the considered ith compound and X is the matrix of descriptors calculated for all compounds from the training set. The standardized residuals were calculated in accordance with the equation that can be found in the Supporting information (Eq. 1S). The borders of the applicability domain are determined by the critical values of the standardized residuals (a threshold of three standard deviation units, 3σ) and the threshold leverage (h*). The value of h* is calculated as h* = 3p'/n, where p' is the number of model variables plus one, and n is the number of compounds in the training set [41,42].

The goodness-of-fit of the QSPR model was measured using the determination coefficient (R2) and the root mean square error of calibration (RMSEC) (Table D in Supplementary material). The stability and robustness of the model were verified by leave-one-out cross-validation (LOO-CV) (Fig. 1) [43] and expressed by means of the leave-one-out cross-validation coefficient (Q2CV) and the root mean square error of cross-validation (RMSECV). The presence of influential points in the training set was additionally tested with the approximate F-test proposed by Toth et al. [44], where F = (1 − Q2CV)/(1 − R2).
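The leverage of Eq. (1), the threshold h* = 3p'/n and the ratio F = (1 − Q2CV)/(1 − R2) are simple enough to sketch with NumPy. The function names below are hypothetical and the sketch is not the QSARINS implementation. Two assumptions are made: the descriptor matrix X is used exactly as passed (whether it carries a leading column of ones for the intercept is not stated in this section), and a pseudo-inverse replaces the plain inverse for numerical robustness.

```python
import numpy as np


def leverages(x_train, x_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i for each row of x_query,
    with X taken from the training set (Eq. 1)."""
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Assumption: pinv instead of inv, so near-collinear descriptors do not fail.
    xtx_inv = np.linalg.pinv(x_train.T @ x_train)
    # Row-wise quadratic form x_i^T A x_i.
    return np.einsum("ij,jk,ik->i", x_query, xtx_inv, x_query)


def critical_leverage(n_descriptors, n_train):
    """Threshold h* = 3 p' / n, with p' = number of model variables + 1."""
    return 3.0 * (n_descriptors + 1) / n_train


def influence_ratio(q2_cv, r2):
    """Approximate F statistic, F = (1 - Q2_CV) / (1 - R2)."""
    return (1.0 - q2_cv) / (1.0 - r2)


print(round(critical_leverage(3, 28), 3))
print(round(influence_ratio(0.89, 0.91), 2))
```

With three descriptors and 28 training compounds (n = 41 minus k = 13, as reported with Eq. (2)), the threshold evaluates to about 0.429, and the reported Q2CV = 0.89 and R2 = 0.91 give F of about 1.22, in line with the values quoted in Section 3.1.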
Since the final model must be tested on compounds not previously used for its development, we calculated the external validation coefficient (Q2EXT) and the root mean square error of prediction (RMSEEXT) based on chemicals from the validation set, and used both measures to estimate the predictivity [42,43,45,46]. In addition, we calculated the concordance correlation coefficient (CCC) as a complementary, more prudent measure of the external predictivity of the model (Table D in Supplementary material) [45]. Finally, we proposed a mechanistic interpretation of the model.

Fig. 1. Algorithm of leave-one-out cross-validation [43].

3. Results and discussion

3.1. Predicting log KOW with GA-MLR model

Firstly, we developed a quantitative model describing the linear relationship between the molecular structure of ionic liquids and the logarithmic values of the n-octanol/water partition coefficient by using the QSARINS software. We identified two points outside the applicability domain (the standardized residuals exceeded ±3), namely 1,3-dihexyloxymethyl-imidazolium tetrafluoroborate and 1-butyl-3-methylimidazolium chloride. These points were not taken into account during the development of the final model, Eq. (2):

$$\log K_{OW} = -1.48(\pm 0.09) + 0.93(\pm 0.09)\,\mathrm{ON0V_C} + 0.55(\pm 0.10)\,\mathrm{X5Av_C} + 0.47(\pm 0.10)\,\mathrm{L3m_A} \qquad (2)$$

n = 41, k = 13, F = 1.22, $p = 5.65 \times 10^{-13}$, R2 = 0.91, RMSEC = 0.42, Q2CV = 0.89, RMSECV = 0.48, Q2EXT = 0.83, RMSEEXT = 0.54, CCC = 0.94

Fig. 2. Experimental and predicted values of log KOW (x-axis: experimental log KOW; y-axis: predicted log KOW; training and validation sets).

where n is the number of all compounds and k is the number of ionic liquids in the validation set.
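As a reading aid, Eq. (2) and the external validation measures can be written out as a short sketch. All function names are hypothetical. Three caveats apply: the text does not state here whether the descriptors enter Eq. (2) as raw Dragon values or after scaling, so the inputs are assumed to be on whatever scale was used for fitting; several definitions of Q2EXT exist, and the variant below (referenced to the training-set mean) is an assumption; the CCC is written in its standard (Lin's) form.

```python
import math

# Coefficients of Eq. (2); descriptor inputs are assumed to be on the
# same scale that was used when the model was fitted (not stated here).
INTERCEPT = -1.48
B_ON0V_C, B_X5AV_C, B_L3M_A = 0.93, 0.55, 0.47


def predict_log_kow(on0v_c, x5av_c, l3m_a):
    """Point prediction from Eq. (2), ignoring coefficient uncertainties."""
    return INTERCEPT + B_ON0V_C * on0v_c + B_X5AV_C * x5av_c + B_L3M_A * l3m_a


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def q2_ext(observed, predicted, train_mean):
    """External Q2, assumed variant: 1 - PRESS / SS around the training mean."""
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - train_mean) ** 2 for o in observed)
    return 1.0 - press / ss


def ccc(observed, predicted):
    """Concordance correlation coefficient (Lin's form)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed)
    sp = sum((p - mp) ** 2 for p in predicted)
    return 2.0 * cov / (so + sp + n * (mo - mp) ** 2)
```

Unlike Q2EXT, the CCC penalizes any systematic shift between observed and predicted values, which is why it is described above as the more prudent measure.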
The model is a combination of three uncorrelated (r < 0.48) molecular descriptors, namely: the overall modified Zagreb index of order 0 by valence vertex degrees, calculated for the cation (ON0VC); the average valence connectivity index of order 5, calculated for the cation (X5AvC); and the 3rd component size directional WHIM index weighted by atomic masses, calculated for the anion (L3mA). High (close to 1) values of R2, Q2CV, Q2EXT, and CCC as well as low and similar values of the errors (RMSEC, RMSECV, RMSEEXT) indicate that the developed model is well-fitted, robust and has satisfactory predictive capabilities. Visual analysis of a plot that illustrates the correlation between the experimental and predicted values of log KOW additionally confirms the high quality of the model (Fig. 2).

The applicability domain was verified based on the Williams plot (Fig. 3). The X-axis of the plot (the leverage value) expresses the similarity of a given compound, for which the prediction is made, to the training compounds. The calculated critical leverage value (similarity threshold) for this model is h* = 0.429. Interestingly, all compounds are situated in the range of residuals within ±3 standard deviations from zero. This means that the uncertainty of the prediction for all of them was acceptable, regardless of structural differences.

Fig. 3. Applicability domain: Williams plot (x-axis: leverages; y-axis: standardized residuals; training and validation sets; threshold h*).

After verifying that our model fulfills the OECD recommendations, we applied it to predict missing values of the n-octanol/water partition coefficient for 335 ionic liquids for which experimentally measured values of log KOW were unavailable (prediction set). The prediction set contained ILs based on various cation structures, including imidazolium, pyridinium, ammonium, phosphonium, pyrrolidinium and sulfonium. Thus, it was necessary to verify whether the predictions had been performed within the domain of applicability. Since the molecular descriptors were the only available data for the prediction set (obviously, the observed values of log KOW were lacking), the application of the Williams plot for this purpose was impossible. Instead, we employed the Insubria graph (Fig. 4), which illustrates the distribution of the ionic liquids in the space of the descriptors we used. The graph provides information about the leverage values for the prediction set and about the relationship between the predicted values for the training set and the prediction set [47]. Certainly, all the predicted values should be within the range of the log KOW values observed for the training set. If any predicted value is higher than the maximal observed KOW value from the training set or lower than the minimal observed KOW value from the training set, it should be assumed that the value is a result of extrapolation rather than of interpolation.

We found (Fig. 4) that the majority of ionic liquids from the prediction set (82.7%) were sufficiently structurally similar to the training set (leverages lower than h*; group 1 in Fig. 4). There are also compounds structurally different from the training set (leverages higher than h*; group 2 in Fig. 4), although their predicted values lie between the extreme experimental values of log KOW. We also identified ionic liquids that differ significantly in structure from the training set (leverages higher than h*; log KOW values higher than YMAX; groups 3–5 in Fig. 4). Group 3 contained ionic liquids with various types of cation. The last two groups contained mainly phosphonium ILs, except for one ammonium ionic liquid.
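The reading of the Insubria graph described above combines two checks: leverage against h*, and the predicted value against the range of responses observed for the training set. The sketch below encodes that logic under stated assumptions: the function name and the returned labels are hypothetical, the response limits are passed in as parameters rather than fixed, and a low-leverage compound whose prediction falls outside the response range is labelled as extrapolation (a case the text does not discuss separately).

```python
def prediction_status(leverage, predicted, h_star, y_min, y_max):
    """Classify one prediction by structural and response-range criteria.

    leverage  : leverage of the compound with respect to the training set
    predicted : predicted log KOW
    h_star    : critical leverage (0.429 for the model discussed here)
    y_min, y_max : extreme observed log KOW values of the training set
    """
    structurally_similar = leverage <= h_star
    within_response_range = y_min <= predicted <= y_max
    if not within_response_range:
        return "extrapolation"
    if structurally_similar:
        return "within domain"
    return "high leverage, response in range"


# Illustrative inputs only; the limits below are the range reported for the
# collected experimental data, used here merely as example parameters.
print(prediction_status(0.20, -1.0, h_star=0.429, y_min=-3.77, y_max=1.73))
print(prediction_status(0.60, 0.5, h_star=0.429, y_min=-3.77, y_max=1.73))
print(prediction_status(0.90, 6.2, h_star=0.429, y_min=-3.77, y_max=1.73))
```

Keeping the thresholds as arguments makes it easy to reuse the same check for a different model or training set.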
Interestingly, two phosphonium ILs from the last group, namely tetrabutylphosphonium bromide and tetrabutylphosphonium bis[1,2-benzenediolato(2-)]borate(1-), were characterized by extremely high leverage values. It should be highlighted here that for the compounds from groups #1 and #2 the predicted values can be considered reliable, in contrast to ILs from groups #3, #4, and #5 (with leverages higher than h* and log KOW values higher than YMAX).

We also compared additional experimental data with the values predicted by our model to further confirm its predictive ability (Table G in Supplementary information). Predictions for the majority of the additionally compared compounds are characterized by low values of the relative error. Only in the case of three ILs (namely 1-ethylpyridinium bis(trifluoromethylsulfonyl)imide, 1-octyl-3-methylimidazolium tetrafluoroborate and 1-ethyl-3-methylimidazolium tris(pentafluoroethyl)trifluorophosphate) are those differences significant (marked in red in Table G). However, it should be noted that one of the mentioned ILs, namely 1-ethyl-3-methylimidazolium tris(pentafluoroethyl)trifluorophosphate, has a leverage value higher than the threshold. Thus, the predicted value for this ionic liquid is less reliable. The rest of the ILs from Table G have leverage values lower than the threshold. Predicted values of log KOW for ionic liquids from the prediction set, together with the leverage values, are presented in Table E in the electronic Supplementary material.

Fig. 4. Insubria graph.

Table 1. Structure impact on values of X5AvC descriptor.

Fig. 5. Influence of a cation's size on the first Zagreb index value. 1-Butyl-3-methylimidazolium: M1 = 44, ON0VC = 4.9; 1-butyl-3-methylpyridinium: M1 = 48, ON0VC = 5.3.

3.2. Relationship between the structure of ionic liquids and log KOW

As mentioned, the model is a linear combination of three molecular descriptors: two descriptors related to the cation's structure (ON0VC and X5AvC) and one related to the anion's structure (L3mA). The first cationic descriptor (ON0VC) is the overall modified Zagreb index of order 0 by valence vertex degrees. The original Zagreb indices and their further modifications (i.e., overall modified Zagreb indices) are derived from graphs representing hydrogen-depleted molecules (molecular graphs). Vertices in those graphs represent particular atoms, whereas edges indicate chemical bonds. Zagreb indices are calculated based on the vertex degrees in the graphs. The vertex degree for a given atom (vertex) corresponds to the number of other atoms (vertices) connected to it. As such, the group of Zagreb indices characterizes molecular topology [48]; the indices deliver information about the overall size of a molecule and molecular branching [49]. For example, when comparing two different cations (imidazolium and pyridinium) having the same (butyl) substituents (Fig. 5), one can notice that the calculated value of the first Zagreb index (M1) is higher for the larger (pyridinium) cation. In our model, we used modified Zagreb indices (ON0V), where the indices were calculated based on the valence vertex degrees (δV) instead of the vertex degrees (δ). However, the values of ON0V follow the same tendency for the studied ionic liquids as the original Zagreb indices.

The second cationic descriptor (X5AvC) belongs to the family of valence connectivity indices. It delivers information about the presence of double and triple bonds and about the number of heteroatoms (i.e., N, S, O) in the molecule [50,51]. We point out that the value of X5AvC does not depend on the type of cation. It decreases with an increasing number of double bonds and increases with a decreasing number of heteroatoms.
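The first Zagreb index values quoted for Fig. 5 can be checked by hand or with a few lines of code, since M1 is the sum of squared vertex degrees of the hydrogen-depleted molecular graph. The sketch below is illustrative: the atom labels and bond lists are written out manually (they are not taken from the paper's files), bond orders are ignored, and the valence-based ON0V index is not reproduced because its exact Dragon definition is not given in this section.

```python
from collections import Counter


def first_zagreb_index(bonds):
    """M1 = sum of squared vertex degrees of a hydrogen-depleted graph.

    `bonds` is a list of (atom_a, atom_b) pairs; each pair counts once,
    regardless of bond order.
    """
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(d * d for d in degree.values())


# 1-Butyl-3-methylimidazolium: five-membered ring, methyl on N3, butyl on N1.
BMIM = [("N1", "C2"), ("C2", "N3"), ("N3", "C4"), ("C4", "C5"), ("C5", "N1"),
        ("N3", "Me"), ("N1", "Ca"), ("Ca", "Cb"), ("Cb", "Cc"), ("Cc", "Cd")]

# 1-Butyl-3-methylpyridinium: six-membered ring, methyl on C3, butyl on N1.
BMPY = [("N1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5"), ("C5", "C6"),
        ("C6", "N1"), ("C3", "Me"), ("N1", "Ca"), ("Ca", "Cb"), ("Cb", "Cc"),
        ("Cc", "Cd")]

print(first_zagreb_index(BMIM))  # 44, as in Fig. 5
print(first_zagreb_index(BMPY))  # 48, as in Fig. 5
```

The extra ring atom of pyridinium adds one vertex of degree 2, which accounts for the difference of 4 between the two cations.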
The influence of the presence of double bonds and heteroatoms is presented in Table 1.

| Cation name | Number of double bonds | X5AvC |
|---|---|---|
| 1-Benzyl-3-methylimidazolium | 5 | 0.03 |
| 4-(Dimethylamino)-1-hexylpyridinium | 3 | 0.04 |
| 1-Methyl-3-hexylimidazolium | 2 | 0.06 |
| Butyl-(2-hydroxyethyl)-dimethylammonium | 0 | 0.11 |

| Cation name | Number of heteroatoms | X5AvC |
|---|---|---|
| 1-(Ethoxymethyl)-3-methyl-imidazolium | 3 | 0.03 |
| 1-Methyl-3-hexylimidazolium | 2 | 0.06 |

Table 2. Predicted values of the n-octanol/water partition coefficient for chosen ILs.

| Ionic liquid | log KOW |
|---|---|
| 1-Methyl-3-nonylimidazolium tetrafluoroborate | 0.67 |
| 1-Methyl-3-octylimidazolium tetrafluoroborate | 0.31 |
| 1-Methyl-3-pentylimidazolium tetrafluoroborate | −0.89 |
| 1-Methylimidazolium tetrafluoroborate | −2.15 |

Furthermore, the type of anion also has a considerable impact on the value of the n-octanol/water partition coefficient. The L3mA anionic descriptor belongs to the group of WHIM (Weighted Holistic Invariant Molecular) descriptors. WHIM descriptors are statistical indices derived from projections of the atoms along the principal axes [50]. The algorithm consists of performing a principal components analysis [52] on the centered molecular coordinates using a weighted covariance matrix. The matrix can be obtained based on different weighting schemes for the atoms. These include weighting by: unit, atomic mass, van der Waals volume, Sanderson atomic electronegativity, atomic polarizability and the electrotopological index of Kier and Hall [53,54]. In particular, the L3m descriptor describes molecular size along the third principal direction weighted by mass. For small anions (e.g., halides) the values of L3mA are close to 0. Larger and more complex anions are characterized by higher values of the L3mA descriptor. For example, the value of L3mA is equal to 0.55 for the tetrafluoroborate anion, whereas it rises to 0.79 for the bis(trifluoromethylsulfonyl)amide anion.
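The WHIM procedure outlined above (principal components analysis of centered coordinates with a weighted covariance matrix) can be sketched with NumPy. This is a minimal illustration, not Dragon's implementation: the function name is hypothetical, centering on the weighted mean is an assumption, and the tetrafluoroborate geometry below is an idealized tetrahedron with an assumed B–F distance of 1.4 Å rather than a PM7-optimized structure.

```python
import numpy as np


def whim_size_indices(coords, weights):
    """Eigenvalues (L1 >= L2 >= L3) of the weighted covariance matrix
    of atomic coordinates centered on the weighted mean."""
    coords = np.asarray(coords, dtype=float)
    w = np.asarray(weights, dtype=float)
    center = np.average(coords, axis=0, weights=w)
    centered = coords - center
    cov = (centered * w[:, None]).T @ centered / w.sum()
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # descending order
    return np.clip(eigenvalues, 0.0, None)       # remove tiny negative noise


# A single-atom anion (e.g., a halide) has no spatial extent at all.
print(whim_size_indices([[0.0, 0.0, 0.0]], [79.904]))

# Idealized tetrahedral BF4-: B at the origin, F at distance r (assumed 1.4 A).
r = 1.4
a = r / np.sqrt(3.0)
bf4_coords = [[0, 0, 0], [a, a, a], [a, -a, -a], [-a, a, -a], [-a, -a, a]]
bf4_masses = [10.81, 18.998, 18.998, 18.998, 18.998]
print(whim_size_indices(bf4_coords, bf4_masses))
```

For the idealized tetrahedron all three mass-weighted eigenvalues are equal and close to 0.57, which is of the same order as the L3mA value of 0.55 quoted above for tetrafluoroborate; the agreement is only indicative because the geometry is assumed. A monatomic anion gives exactly zero, consistent with the statement that halides have L3mA values close to 0.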
The meaning of the descriptors employed in the model can be explained in the context of the mechanisms of solvation, i.e., interactions of a solute with the solvent involving electrostatic forces, van der Waals forces and more specific effects, such as hydrogen bond formation [55]. The presence of ILs in water changes the structure of the hydrogen bond network. Water molecules orient themselves toward the ions. This requires that many hydrogen bonds between water molecules be broken, especially in the case of relatively large, highly branched ions of ILs. Moreover, the formation of the solvation shells and the orientation of water molecules around the IL cations and anions depend on the charge of the IL. Those factors could determine the solvation of ILs [56]. Moreover, experimental data indicate that the presence of hydrophobic groups such as alkyl chains has an effect on IL solubility. It has been shown [57] that ethyl-substituted imidazolium ionic liquids are less soluble in water and more soluble in 1-octanol than the methyl-substituted compounds. This behavior is the result of van der Waals interactions between the alkyl chains of the alcohol and the ILs. Compounds in which the alkyl chain contains only a few carbon atoms (low value of ON0VC) have the lowest log KOW. Those chemicals are the most hydrophilic ionic liquids and most probably will not accumulate or concentrate in the environment. Increasing the number of carbon atoms in the alkyl chain (higher value of ON0VC) causes a notable growth of log KOW; the ILs become more lipophilic (Table 2).

Furthermore, based on the regression coefficient values in our model equation, we pointed out which ion has the major influence on the log KOW value. The high positive regression coefficient of the ON0VC descriptor suggests that the chemical structure of the cation is crucial for the value of the n-octanol/water partition coefficient. The type of anion structure has a minor but still significant impact on the log KOW value of ionic liquids.

Table 3. Comparison of described models [24,57,58].

| Models | Method | ILs in training set | R2 | Internal validation | External validation |
|---|---|---|---|---|---|
| Present study | QSPR | Various | 0.92 | 0.87 | 0.83 |
| Model I | LFER | Chloride-based | 0.98 | – | – |
| Model II | ABF-MD | Imidazolium-based | – | – | – |
| Model III | MD | Various | – | – | – |

3.3. Comparison with the existing models

In this work we present the first QSPR model having a wide applicability domain (the training set represents ionic liquids with diversified structures). We found three previously published models for predicting the n-octanol/water partition coefficient (Table 4), but none of them was developed according to the typical Hansch approach.

In 2011 Cho et al. [26] (Table 3: Model I) published a model to predict the n-octanol/water partition coefficient of chloride-based ionic liquids. They employed a modified Abraham equation (with an added experimentally determined anionic hydrophobicity parameter) and multiple linear regression as the method of modeling. The model was characterized by high goodness of fit to the experimental data (R2 = 0.98). The authors demonstrated a significant influence of two descriptors, namely the molar volume of the cation and the anion hydrophobicity, on the modeled property. Although the model was well fitted, the authors unfortunately did not perform external validation and did not provide any information about the assessment of the applicability domain.

Kamath et al. [58] (Table 3: Model II) proposed a model utilizing adaptive bias force-molecular dynamics (ABF-MD) simulations. They predicted the values of log KOW for six imidazolium-based ILs by using an equation in which log KOW depends on the free energies of hydration and solvation as well as on the temperature. The free energies of hydration and solvation were calculated using ABF-MD simulations.
Lee and Lin [59] (Table 3: Model III) used the calculation of infinite dilution activity coefficients (IDAC) in the water-rich and octanol-rich phases to predict KOW of 67 different ionic liquids. The authors used the Pitzer–Debye–Hückel (PDH) model (long-range Coulomb interactions) and the COSMO-SAC model (short-range molecular surface interactions) to calculate the activity coefficients. That approach allowed estimating the effect of different physical interactions on the value of KOW.

The above models were developed by various methods, so we could compare them in two ways: (a) based on fitting parameters and the size of the applicability domain; (b) by putting together the values of log KOW predicted for the same ILs. Model I was characterized by a high value of the determination coefficient, but the model was developed only for ionic liquids based on the chloride anion. Moreover, there was no information about the validation method and the AD. Models II and III were obtained by molecular dynamics simulation; importantly, this technique does not use fitting parameters of the kind used in a QSPR model [59]. Compared with those models, our QSPR model can be considered reliable because it fulfills the OECD recommendations. In addition, we included diverse ionic liquids (varied in terms of cation and anion) in the training set to provide a wide applicability domain (Table 4).

A comparison between experimental and predicted log KOW for the same set of ionic liquids (the 12 overlapping ILs and the estimated R2 values can be found in Table F in Supplementary material) shows that Model I gives the most accurate values. However, the values obtained by the presented model and Model III are also well-fitted, and their R2 values are only slightly lower than for Model I. Because of the small set of ionic liquids in Model II, this model was not taken into account. The determination coefficient only allows estimating how well the models are fitted to the experimental values.
To compare the predictivity of all models, internal and external validation should be performed.

Fig. 6. Comparison of predicted values of log KOW with the criterion established by the Stockholm Convention (panels: ammonium, imidazolium, phosphonium, pyridinium, pyrrolidinium, sulfonium; x-axis: ionic liquid IDs; y-axis: predicted log KOW; sections A, B and C). For interpretation of the references to colour in the text, the reader is referred to the web version of this article.

3.4. Environmental fate of ionic liquids

The developed QSPR model was used to predict the logarithmic values of KOW for 335 ILs for which experimentally measured values were unavailable. The modeled property allows estimating a compound's lipophilicity and thereby points out which compounds could be accumulated in organisms. To determine which ILs could be bioaccumulated, all predicted values of log KOW were compared with the criterion established by the Stockholm Convention [20]. In accordance with the convention, a compound for which the value of log KOW is greater than 5 could bioaccumulate in organisms. This means that ionic liquids with log KOW > 5 could have a negative impact on the environment.

Predicted values of log KOW are presented in a multi-panel plot (Fig. 6) in which each panel is dedicated to a group of ILs with the same cation. We also defined three sections according to the range of log KOW values. Ionic liquids in section A (above the red line) have values of log KOW higher than 5. Compounds in section B (between the red and green lines) have log KOW greater than 0, but less than 5. Section C (under the green line) contains ionic liquids with values of log KOW smaller than 0.
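The three sections defined above amount to a simple threshold rule on the predicted log KOW. A minimal sketch follows; the function name is hypothetical, and assigning the boundary values (exactly 0 and exactly 5) to section B is an assumption, since the text only gives strict inequalities.

```python
def kow_section(log_kow):
    """Assign a predicted log KOW to section A, B or C of Fig. 6.

    A: log KOW > 5 (bioaccumulation criterion of the Stockholm Convention)
    B: between 0 and 5 (boundary values assumed to fall here)
    C: log KOW < 0
    """
    if log_kow > 5:
        return "A"
    if log_kow < 0:
        return "C"
    return "B"


# Values taken from Table 2 plus one illustrative value above the criterion.
for value in (0.67, 0.31, -0.89, -2.15, 6.0):
    print(value, kow_section(value))
```

Applied to the Table 2 values, the nonyl and octyl imidazolium tetrafluoroborates fall in section B, while the pentyl and unsubstituted analogues fall in section C.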
ILs that belong to section A could be bioaccumulated, those from section B are partially bioaccumulated, and those from section C are transported with the water mass. Most ammonium ionic liquids (98%) belong to section B or C. ILs with a 2-hydroxyethyl substituent have the smallest log KOW values. Only one compound, N,N-dimethyl-N-dioctadecyloammonium chloride, has log KOW higher than 5. In the imidazolium and pyridinium groups, the majority of ILs belong to section C. For particular compounds, log KOW values higher than 0 are caused by the presence of a large cation structure (e.g. 1,3-didecyl-2-methyl-3H-imidazolium and 3-[[[[(decyloxy)methoxy]methyl]amino]carbonyl]-1-[(decyloxy)methyl]pyridinium). The influence of the anion on log KOW is noticeable in the pyrrolidinium group. The partially bioaccumulated ILs (above the green line) contain large anions such as bis(trifluoromethylsulfonyl)imide. Compounds with the lowest log KOW values contain mostly halide anions. When we consider one cation (1-butyl-1-methylpyrrolidinium) with various anions (trifluoridotris(pentafluoroethyl)phosphate, bis(trifluoromethylsulfonyl)imide, tetrafluoridoboranuide, bromide), we notice that the log KOW values decrease from 1.19 to −0.68. Phosphonium ionic liquids belong entirely to section A. Even though the predictions for phosphonium ILs are less reliable (leverages higher than the critical value and log KOW higher than YMAX), they are consistent with literature information about their hydrophobicity [60]. The last group in the prediction set contains only one sulfonium ionic liquid, which will not bioaccumulate. Visual analysis of the plot (Fig. 6) shows that the majority of ILs could be transported with the water mass and are thus less dangerous for the environment. Only the phosphonium ionic liquids and one ammonium ionic liquid are highly lipophilic. Lipophilicity is strongly correlated with ionic liquid toxicity. Research has shown that phosphonium ionic liquids accumulate in Escherichia coli cells, mainly in lipid membranes [61].
It has also been shown that the toxicity of phosphonium ionic liquids increases with the length of the alkyl chain and that they could be more toxic than the analogous imidazolium-based ILs [62].

4. Conclusions

In our work we developed a QSPR model to predict the values of log KOW for 335 ionic liquids for which experimental values were not available in the literature. The presented QSPR model fulfills the quality recommendations set by the OECD. It is characterized by satisfactory goodness-of-fit (R2 = 0.91, RMSEC = 0.42), robustness (Q2CV = 0.89, RMSECV = 0.48) and predictivity (Q2EXT = 0.83, RMSEEXT = 0.54). Based on the Williams plot, we observed that the model correctly predicted the modeled values despite the insufficient structural similarity of some ionic liquids. Furthermore, we determined the effect of the cation and anion structures on the modeled property. We found that the values of the n-octanol/water partition coefficient depend significantly on the cation. ILs with the shortest alkyl substituents in the cation are the most hydrophilic (they have the lowest values of log KOW). An increase in the length of the alkyl chain causes a significant increase in the log KOW values. The hydrophilicity of an ionic liquid is also affected by the structure of the anion, but this influence is minor. Ionic liquids with a heavily branched cation and anion exhibit relatively high lipophilicity. Information about the influence of structure on hydrophilicity could be very useful when designing new ionic liquids that belong to the third generation of ILs. When comparing the results obtained from our QSPR model with the results of others [26,58,59], we found that the values of log KOW predicted in the current study were more reliable, thanks to a more diversified training set (three types of cations and five types of anion groups). The interpretation of the descriptors in the presented model equation provides a clear explanation of the relationship between the structure of an ionic liquid and its KOW value.
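For readers who wish to reproduce statistics of the kind quoted in the Conclusions (R2, RMSEC, Q2CV, RMSECV, Q2EXT, RMSEEXT), the sketch below shows one common way to compute them for a multiple linear regression model. It is an illustration under stated assumptions, not the procedure used in this work: leave-one-out is assumed for cross-validation, Q2EXT is computed against the training-set mean, all function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def _design(X):
    """Prepend a column of ones (intercept) to a 2-D descriptor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, slopes...]."""
    coef, *_ = np.linalg.lstsq(_design(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, X):
    return _design(X) @ coef


def rmse(y_obs, y_pred):
    """Root mean square error (RMSEC, RMSECV or RMSEEXT, depending on input)."""
    return float(np.sqrt(np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2)))


def r2(y_obs, y_pred):
    """1 - SS_res / SS_tot; gives Q2CV when y_pred are cross-validated values."""
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def loo_predictions(X, y):
    """Leave-one-out: predict each compound from a model fitted without it."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    return preds


def q2_ext(y_train, y_ext, y_ext_pred):
    """External Q2, assumed here to be referenced to the training-set mean."""
    y_ext, y_ext_pred = np.asarray(y_ext), np.asarray(y_ext_pred)
    ss_res = np.sum((y_ext - y_ext_pred) ** 2)
    ss_tot = np.sum((y_ext - np.mean(y_train)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic data with two made-up descriptors (not the descriptors of the paper).
rng = np.random.default_rng(0)
X_all = rng.normal(size=(40, 2))
y_all = 1.0 + 2.0 * X_all[:, 0] - 0.5 * X_all[:, 1] + rng.normal(scale=0.3, size=40)
X_tr, y_tr, X_ext, y_ext = X_all[:30], y_all[:30], X_all[30:], y_all[30:]

coef = fit_mlr(X_tr, y_tr)
y_fit, y_cv = predict_mlr(coef, X_tr), loo_predictions(X_tr, y_tr)
y_ext_pred = predict_mlr(coef, X_ext)
print(f"R2 = {r2(y_tr, y_fit):.2f}, RMSEC = {rmse(y_tr, y_fit):.2f}")
print(f"Q2CV = {r2(y_tr, y_cv):.2f}, RMSECV = {rmse(y_tr, y_cv):.2f}")
print(f"Q2EXT = {q2_ext(y_tr, y_ext, y_ext_pred):.2f}, RMSEEXT = {rmse(y_ext, y_ext_pred):.2f}")
```

Several definitions of the external Q2 are in use, so values computed this way are comparable with published ones only if the same definition and the same training/validation split are applied.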
Furthermore, a clear advantage of the model presented in this study is that it fulfills all quality criteria for developing QSPRs recommended by the OECD. The developed model could be used to predict log KOW values for each type of ionic liquid without the need for additional time-consuming and expensive experiments. We also compared the predicted log KOW values with the criterion established by the Stockholm Convention. Based on those results, we were able to point out that phosphonium-based ILs could have a negative impact on the environment. Their bioaccumulation potential and potential toxicity should be considered before they are applied. In our opinion, the predicted values of the n-octanol/water partition coefficient for new ILs could be crucial in assessing the environmental risk and fate of these chemicals.

Acknowledgments

We would like to thank Prof. Paola Gramatica from the University of Insubria for giving access to QSARINS software. This material is based on research sponsored by the Polish National Science Center (grant no. UMO-2012/05/E/NZ7/01148; Project “CRAB”).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jhazmat.2015.10.023.

References

[1] S. Zhu, R. Chen, Y. Wu, Q. Chen, X. Zhang, Z. Yu, A mini-review on greenness of ionic liquids, Chem. Biochem. Eng. Q. 23 (2009) 207–211.
[2] T.P. Pham, C.W. Cho, Y.S. Yun, Environmental fate and toxicity of ionic liquids: a review, Water Res. 44 (2010) 352–372.
[3] R.N. Das, K. Roy, Advances in QSPR/QSTR models of ionic liquids for the design of greener solvents of the future, Mol. Divers. 17 (2013) 151–196.
[4] W.L. Hough, M. Smiglak, H. Rodriguez, R.P. Swatloski, S.K. Spear, D.T. Daly, J. Pernak, J.E. Grisel, R.D. Carliss, M.D. Soutullo, J.H. Davis, R.D. Rogers, The third evolution of ionic liquids: active pharmaceutical ingredients, New J. Chem. 31 (2007) 1429–1436.
[5] B. Jastorff, K. Mölter, P.
Behrend, U. Bottin-Weber, J. Filser, A. Heimers, B. Ondruschka, J. Ranke, M. Schaefer, H. Schröder, A. Stark, P. Stepnowski, F. Stock, R. Störmann, S. Stolte, U. Welz-Biermann, S. Ziegert, J. Thöming, Progress in evaluation of risk potential of ionic liquids – basis for an eco-design of sustainable products, Green Chem. 7 (2005) 362.
[6] A. Romero, A. Santos, J. Tojo, A. Rodriguez, Toxicity and biodegradability of imidazolium ionic liquids, J. Hazard. Mater. 151 (2008) 268–273.
[7] S. Studzinska, T. Kowalkowski, B. Buszewski, Study of ionic liquid cations transport in soil, J. Hazard. Mater. 168 (2009) 1542–1547.
[8] J. Sangster, Octanol–Water Partition Coefficients: Fundamentals and Physical Chemistry, John Wiley & Sons, 1997.
[9] O. Chalkiadaki, M. Dassenakis, N. Lydakis-Simantiris, Bioconcentration of Cd and Ni in various tissues of two marine bivalves living in different habitats and exposed to heavily polluted seawater, Chem. Ecol. 30 (2014) 726–742.
[10] A. Zenker, M.R. Cicero, F. Prestinaci, P. Bottoni, M. Carere, Bioaccumulation and biomagnification potential of pharmaceuticals with a focus to the aquatic environment, J. Environ. Manage. 133 (2014) 378–387.
[11] T. Puzyn, On the replacement of empirical parameters in multimedia mass balance models with QSPR data, J. Hazard. Mater. 192 (2011) 970–977.
[12] Y. Deng, P. Besse-Hoggan, P. Husson, M. Sancelme, A.M. Delort, P. Stepnowski, M. Paszkiewicz, M. Golebiowski, M.F. Costa Gomes, Relevant parameters for assessing the environmental impact of some pyridinium, ammonium and pyrrolidinium based ionic liquids, Chemosphere 89 (2012) 327–333.
[13] A. Sosnowska, M. Barycki, K. Jagiełło, M. Harańczyk, A. Gajewicz, T. Kawai, N. Suzuki, T. Puzyn, Predicting enthalpy of vaporization for Persistent Organic Pollutants with Quantitative Structure–Property Relationship (QSPR) incorporating the influence of temperature on volatility, Atmos. Environ.
87 (2014) 10–18.
[14] Y. Zhao, J. Zhao, Y. Huang, Q. Zhou, X. Zhang, S. Zhang, Toxicity of ionic liquids: database and prediction via quantitative structure–activity relationship method, J. Hazard. Mater. 278 (2014) 320–329.
[15] E.B. de Melo, A structure-activity relationship study of the toxicity of ionic liquids using an adapted Ferreira–Kiralj hydrophobicity parameter, Phys. Chem. Chem. Phys. 17 (2015) 4516–4523.
[16] M.T.D. Cronin, Quantitative structure–activity relationships (QSARs) – applications and methodology, in: T. Puzyn, J. Leszczynski, M.T.D. Cronin (Eds.), Recent Advances in QSAR Studies. Methods and Applications, Springer, Dordrecht, New York, 2010, pp. 3–11.
[17] Y. Pan, J.C. Jiang, R. Wang, H.Y. Cao, J.B. Zhao, Prediction of auto-ignition temperatures of hydrocarbons by neural network based on atom-type electrotopological-state indices, J. Hazard. Mater. 157 (2008) 510–517.
[18] M.H. Keshavarz, Prediction of heats of sublimation of nitroaromatic compounds via their molecular structure, J. Hazard. Mater. 151 (2008) 499–506.
[19] A. Lewis, N. Kazantzis, I. Fishtik, J. Wilcox, Integrating process safety with molecular modeling-based risk assessment of chemicals within the REACH regulatory framework: Benefits and future challenges, J. Hazard. Mater. 142 (2007) 592–602.
[20] UNEP 2001: Final act of the conference of plenipotentiaries on the Stockholm convention on persistent organic pollutants (2001).
[21] D. Livingstone, Building QSAR models: a practical guide, in: M.T.D. Cronin, D. Livingstone (Eds.), Predicting Chemical Toxicity and Fate, CRC Press, Boca Raton, Florida, 2004, pp. 143–162.
[22] U. Domańska, E. Bogel-Łukasik, R. Bogel-Łukasik, 1-Octanol/water partition coefficients of 1-alkyl-3-methylimidazolium chloride, Chem. Eur. J. 9 (2003) 3033–3041.
[23] U. Domańska, A.
Marciniak, Phase behaviour of 1-hexyloxymethyl-3-methyl-imidazolium and 1,3-dihexyloxymethyl-imidazolium based ionic liquids with alcohols, water, ketones and hydrocarbons: the effect of cation and anion on solubility, Fluid Phase Equilib. 260 (2007) 9–18.
[24] L. Ropel, L.S. Belveze, S.N.V.K. Aki, M.A. Stadtherr, J.F. Brennecke, Octanol–water partition coefficients of imidazolium-based ionic liquids, Green Chem. 7 (2005) 83–90.
[25] L.S. Belveze, Modeling and Measurement of Thermodynamic Properties of Ionic Liquids, University of Notre Dame, Notre Dame, Indiana, 2004.
[26] C.W. Cho, U. Preiss, C. Jungnickel, S. Stolte, J. Arning, J. Ranke, A. Klamt, I. Krossing, J. Thoming, Ionic liquids: predictions of physicochemical properties with experimental and/or DFT-calculated LFER parameters to understand molecular interactions in solution, J. Phys. Chem. B 115 (2011) 6040–6050.
[27] U. Domanska, R. Bogel-Lukasik, Physicochemical properties and solubility of alkyl-(2-hydroxyethyl)-dimethylammonium bromide, J. Phys. Chem. B 109 (2005) 12124–12132.
[28] S. Stefan, S. Reinhold (2003–2014) The UFT/Merck Ionic Liquids Biological Effects Database 31.08.2014 http://www.il-eco.uft.uni-bremen.de.
[29] A.C.D. Inc (2010) ACD/ChemSketch, 12.01, Advanced Chemistry Development, Inc., Toronto, ON, Canada, http://www.acdlabs.com.
[30] J. Hostas, J. Rezac, P. Hobza, On the performance of the semiempirical quantum mechanical PM6 and PM7 methods for noncovalent interactions, Chem. Phys. Lett. 568 (2013) 161–166.
[31] J.J.P. Stewart (2012) MOPAC2012, Stewart Computational Chemistry, Colorado Springs, http://openmopac.net/MOPAC2012.html.
[32] Talete (2014) Dragon (Software for Molecular Descriptor Calculation), 6.0, Milano, http://www.talete.mi.it/.
[33] M.A. Sharaf, D.L. Illman, B.R. Kowalski, Chemometrics, Wiley, New York, 1986.
[34] I. Stanimirova, D. Michał, B. Walczak, Metody uczenia z nadzorem – kalibracja, dyskryminacja i klasyfikacja, in: A. Parczewski, D.
Zuba (Eds.), Chemometria w Analityce, Wydawnictwo Instytutu Ekspertyz Sądowych, Kraków, 2008, pp. 137–142.
[35] C.M. Andersen, R. Bro, Variable selection in regression – a tutorial, J. Chemom. 24 (2010) 728–737.
[36] P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, QSARINS: a new software for the development, analysis, and validation of QSAR MLR models, J. Comput. Chem. 34 (2013) 2121–2132.
[37] P. Gramatica, S. Cassani, N. Chirico, QSARINS-chem: Insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS, J. Comput. Chem. 35 (2014) 1036–1044.
[38] E.L. Piparo, A. Worth, Review of QSAR Models and Software Tools for Predicting Developmental and Reproductive Toxicity, Institute for Health and Consumer Protection, Luxembourg, 2010.
[39] R. Cox, D.V.S. Green, C.N. Luscombe, N. Malcolm, S.D. Pickett, QSAR workbench: automating QSAR modeling to drive compound design, J. Comput. Aided Mol. Des. 27 (2013) 321–336.
[40] OECD principles for the validation, for regulatory purposes, of Quantitative Structure–Activity Relationship models, OECD (2004).
[41] T. Puzyn, A. Mostrąg-Szlichtyng, N. Suzuki, M. Harańczyk, Metody chemometryczne w ocenie ryzyka: Ilościowe zależności pomiędzy strukturą chemiczną a właściwościami (QSPR) dla nowych rodzajów zanieczyszczeń chemicznych, in: A. Parczewski, D. Zuba (Eds.), Chemometria w Nauce i Praktyce, Wydawnictwo Instytutu Ekspertyz Sądowych, Kraków, 2009, pp. 61–71.
[42] P. Gramatica, Principles of QSAR models validation: internal and external, QSAR Comb. Sci. 26 (2007) 694–701.
[43] R. Bro, K. Kjeldahl, A.K. Smilde, H.A. Kiers, Cross-validation of component models: a critical look at current methods, Anal. Bioanal. Chem. 390 (2008) 1241–1251.
[44] G. Toth, Z. Bodai, K. Heberger, Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart, J. Comput. Aided Mol. Des. 27 (2013) 837–844.
[45] N. Chirico, P.
Gramatica, Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient, J. Chem. Inf. Model. 51 (2011) 2320–2335.
[46] T. Puzyn, A. Gajewicz, A. Rybacka, M. Harańczyk, Global versus local QSPR models for persistent organic pollutants: balancing between predictivity and economy, Struct. Chem. 22 (2011) 873–884.
[47] P. Gramatica, S. Cassani, P.P. Roy, S. Kovarich, C.W. Yap, E. Papa, QSAR modeling is not push a button and find a correlation: a case study of toxicity of (benzo-)triazoles on algae, Mol. Inf. 31 (2012) 817–835.
[48] D. Bonchev, N. Trinajstic, Overall molecular descriptors. 3. Overall Zagreb indices, SAR QSAR Environ. Res. 12 (2001) 213–236.
[49] R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim, 2000.
[50] R. Todeschini, V. Consonni, Molecular Descriptors for Chemoinformatics, Wiley-VCH, Weinheim, 2009.
[51] O. Isayev, B. Rasulev, L. Gorb, J. Leszczynski, Structure–toxicity relationships of nitroaromatic compounds, Mol. Divers. 10 (2006) 233–245.
[52] A. Sosnowska, M. Barycki, M. Zaborowska, A. Rybinska, T. Puzyn, Towards designing environmentally safe ionic liquids: the influence of the cation structure, Green Chem. 16 (2014) 4749–4757.
[53] P. Gramatica, WHIM descriptors of shape, QSAR Comb. Sci. 25 (2006) 327–332.
[54] R. Todeschini, P. Gramatica, New 3D molecular descriptors: the WHIM theory and QSAR applications, Perspect. Drug Discov. Des. 9–11 (1998) 355–380.
[55] P. Muller, Glossary of terms used in physical organic chemistry (IUPAC Recommendations 1994), Pure Appl. Chem. 66 (1994).
[56] C. Spickermann, J. Thar, S.B. Lehmann, S. Zahn, J. Hunger, R. Buchner, P.A. Hunt, T. Welton, B. Kirchner, Why are ionic liquid ions mainly associated in water? A Car-Parrinello study of 1-ethyl-3-methyl-imidazolium chloride water mixture, J. Chem. Phys. 129 (2008) 104505.
[57] U. Domańska, A. Rękawek, A.
Marciniak, Solubility of 1-alkyl-3-ethylimidazolium-based ionic liquids in water and 1-octanol, J. Chem. Eng. Data 53 (2008) 1126–1132.
[58] G. Kamath, N. Bhatnagar, G.A. Baker, S.N. Baker, J.J. Potoff, Computational prediction of ionic liquid 1-octanol/water partition coefficients, Phys. Chem. Chem. Phys. 14 (2012) 4339–4342.
[59] B.-S. Lee, S.-T. Lin, A priori prediction of the octanol–water partition coefficient (Kow) of ionic liquids, Fluid Phase Equilib. 363 (2014) 233–238.
[60] K.J. Fraser, D.R. MacFarlane, Phosphonium-based ionic liquids: an overview, Aust. J. Chem. 62 (2009) 309.
[61] R.J. Cornmell, C.L. Winder, G.J.T. Tiddy, R. Goodacre, G. Stephens, Accumulation of ionic liquids in E. coli cells, Green Chem. 10 (2008) 836.
[62] S.P.M. Ventura, C.S. Marques, A.A. Rosatella, C.A.M. Afonso, F. Goncalves, J.A.P. Coutinho, Toxicity assessment of various ionic liquid families towards Vibrio fischeri marine bacteria, Ecotoxicol. Environ. Saf. 76 (2012) 162–168.