Project EDA-2: Quantitative Structure-Activity Relationships (QSARs) of Nanomaterials Toxicity and Physicochemical Properties

Yoram Cohen, Donatello Telesca, Robert Rallo, Kenneth Bradley, Andre Nel, Rong Liu, Cecile Low-Kam, Taimur Hassan

In-silico methods for correlating toxicity end points associated with exposure of cell lines and whole organisms to nanoparticles are essential for environmental hazard assessment. Accordingly, the primary goal of the project is to develop quantitative structure-activity relationships (QSARs) for nanoparticle toxicity (i.e., Nano-SARs) based on HTS data, initially focusing on metal and metal-oxide nanoparticles. Toxic outcomes can be quantified based on numerical or visual data (e.g., images depicting a response), and the approaches developed in Project EDA-1 (“Machine Learning Analysis and Modeling of High Throughput Screening Data for Nanoparticles”) form the core of the mathematical and computer algorithms utilized for Nano-SAR development. The development of predictive Nano-SARs requires identification and quantification, via appropriate feature selection methods, of the relevant features (i.e., parameters) that correlate the identified toxicity measures with nanoparticle properties and environmental conditions (e.g., concentration). Using our previously developed unsupervised feature selection method (KLS-FS), which can handle both linear and non-linear feature selection, a classification-based cytotoxicity Nano-SAR was developed for a set of nine metal oxide nanoparticles to which transformed bronchial epithelial cells (BEAS-2B) were exposed over a concentration range of 0.375-200 μg/ml and exposure times of up to 24 h.
The Nano-SAR was developed using cytotoxicity data for BEAS-2B cells. The best-performing model, which used only four descriptors (atomization energy of the metal oxide, period of the nanoparticle metal, nanoparticle primary size, and nanoparticle volume fraction in solution), achieved 100% classification accuracy in both internal and external validation. This classification Nano-SAR enables one to identify decision boundaries, which are crucial for hazard ranking of nanoparticles. The above Nano-SAR and the KLS-FS feature selection algorithm were implemented within the Weka data mining/machine learning package and are available for use by others. More recently, the KLS-FS method was also used to develop a Nano-SAR based on a literature HTS data set of 109 nanoparticles (for nanoparticle uptake by PaCa2 cells) that share the same core (iron oxides/NH2 core based) but have different organic chemical surface modifications. Application of the unsupervised KLS-FS feature selection, followed by a supervised genetic feature selection algorithm, to an initial pool of 150 2D molecular descriptors resulted in the selection of only ten features. The resulting Nano-SAR performed with a mean absolute error of ~6%. The KLS-FS method is now also being employed to arrive at a Nano-SAR based on a new CEIN HTS data set being developed for 24 metal oxide nanoparticles by the CEIN “Molecular, cellular and organism high-throughput screening for hazard assessment” group (Theme 2). In this collaborative Nano-(Q)SAR development effort for the above nanoparticles (based on three toxicity assays, the BEAS-2B and RAW cell lines, and nanoparticle concentrations of 0.39-200 ppm), the slope of the dose-response curve (i.e., the rate of response increase with concentration) at the EC50 was introduced as a metric for labeling “safe” nanoparticles.
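As an illustration of the slope-at-EC50 metric, the following minimal sketch assumes that the dose-response curve follows a Hill (log-logistic) model; the project summary does not state which functional form was fitted, so the model, the function names, and the parameter values are assumptions made only for this example. Under that assumption, the slope at c = EC50 has the closed form (top - bottom) * n / (4 * EC50), which the sketch checks against a central finite difference.

```python
def hill_response(conc, ec50, hill_n, bottom=0.0, top=1.0):
    """Hill dose-response model (assumed form, not taken from the project)."""
    ratio = (conc / ec50) ** hill_n
    return bottom + (top - bottom) * ratio / (1.0 + ratio)


def slope_at_ec50(ec50, hill_n, bottom=0.0, top=1.0):
    """Analytical dR/dc of the Hill model evaluated at conc = EC50."""
    return (top - bottom) * hill_n / (4.0 * ec50)


def numerical_slope(conc, ec50, hill_n, bottom=0.0, top=1.0, rel_step=1e-6):
    """Central finite-difference estimate of dR/dc at the given concentration."""
    step = conc * rel_step
    upper = hill_response(conc + step, ec50, hill_n, bottom, top)
    lower = hill_response(conc - step, ec50, hill_n, bottom, top)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration only (not project data).
    ec50, hill_n = 25.0, 2.0
    print("analytical slope:", slope_at_ec50(ec50, hill_n))
    print("numerical slope: ", numerical_slope(ec50, ec50, hill_n))
```

Deciding whether a fitted slope is significantly larger than 0 additionally requires an uncertainty estimate for the fitted parameters; the statistical test used for that purpose is not specified here and is therefore left out of the sketch.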
Based on whether the slope is significantly larger than 0, the NPs were divided into a toxic group (7 NPs: ZnO, CuO, Mn2O3, CoO, Ni2O3, Co3O4, Cr2O3) and a nontoxic group (17 NPs). Subsequently, the modeling task was formulated as a classification problem because, for this data set: (a) there is greater confidence/reliability in predicting whether an NP is toxic than in predicting its toxicity level; and (b) there are only marginal differences in the responses induced by the “nontoxic” NPs. Twelve nanoparticle descriptors were evaluated, classified as: (a) basic nanoparticle descriptors: average NP size in water (dw, nm), atomization energy (Ea, eV), conduction band energy (Ec, eV), and valence band energy (Ev, eV); and (b) derived descriptors: chemical hardness (η), chemical potential (μ), electrophilicity (ω), four formation enthalpies (ΔHs) in the Born-Haber cycle, and ΔHIE1. Accordingly, an initial Nano-SAR classification model (toxic versus nontoxic) was developed using the above collection of NP descriptors, along with an applicability domain analysis, demonstrating ~92% predictive accuracy (measured via five-fold cross-validation). This development was based on an extensive effort that included the use of self-organizing map analysis for selecting the training and validation data sets. Work is continuing in collaboration with Theme 2 to further improve the Nano-SAR accuracy. The developed Nano-SAR considers the potential utility of the bandgap energy structure, in addition to other pertinent descriptors, as a criterion for predicting the toxicity of metal oxides in cellular HTS assays. Additional correlation of those predictions to the intact animal level is further described in Projects ENM-1 (Theme 1) and HTS-6 (Theme 2). Finally, it is noted that the Nano-SARs developed in this project will be incorporated into decision analysis tools being developed in Project EDA-4 (“Environmental Impact Analysis”).
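The derived descriptors listed above (η, μ, ω) can be related to the basic band-energy descriptors Ec and Ev. The sketch below assumes the commonly used conceptual-DFT approximations μ ≈ (Ec + Ev)/2, η ≈ (Ec - Ev)/2, and ω = μ²/(2η); the summary does not state which definitions or sign conventions the project adopted, so these formulas, the function name, and the input values are assumptions for illustration only.

```python
def derived_descriptors(e_c, e_v):
    """Compute chemical potential, hardness, and electrophilicity (all in eV)
    from conduction (e_c) and valence (e_v) band energies.

    Assumed conventions (not confirmed by the project summary):
      mu    = (e_c + e_v) / 2
      eta   = (e_c - e_v) / 2
      omega = mu**2 / (2 * eta)
    """
    if e_c <= e_v:
        raise ValueError("conduction band energy must exceed valence band energy")
    mu = (e_c + e_v) / 2.0
    eta = (e_c - e_v) / 2.0
    omega = mu ** 2 / (2.0 * eta)
    return {
        "chemical_potential": mu,
        "chemical_hardness": eta,
        "electrophilicity": omega,
    }


if __name__ == "__main__":
    # Hypothetical band energies in eV, chosen for illustration (not project data).
    for name, value in derived_descriptors(e_c=-4.0, e_v=-7.0).items():
        print(f"{name}: {value:.4f} eV")
```

Because all three quantities are deterministic functions of Ec and Ev, they are correlated with the basic descriptors, which is one reason a feature selection step is useful before model building.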