
Project EDA-2: Quantitative-Structure-Activity Relationships (QSARs) of Nanomaterials
Toxicity and Physicochemical Properties
Yoram Cohen, Donatello Telesca, Robert Rallo, Kenneth Bradley, Andre Nel, Rong Liu, Cecile
Low-Kam, Taimur Hassan
In-silico methods for correlating toxicity end points associated with exposure of cell lines and whole organisms to nanoparticles
are essential for environmental hazard assessment. Accordingly, the primary goal of the project is to develop quantitative structure-activity relationships (QSARs) for nanoparticle toxicity (i.e., Nano-SARs) based on HTS data, initially focusing on metal
and metal-oxide nanoparticles. Toxic outcomes can be quantified based on numerical or visual data (e.g., images depicting a
response), and approaches developed in Project EDA-1 (“Machine Learning Analysis and Modeling of High Throughput Screening
Data for Nanoparticles”) form the core of the mathematical and computer algorithms utilized for Nano-SAR development. The
development of predictive Nano-SARs requires identification and quantification of the relevant features (i.e., parameters) that
correlate the identified toxicity measures with nanoparticle properties and environmental conditions (e.g., concentration) via
appropriate feature selection methods. Based on our previously developed unsupervised feature selection method (KLS-FS),
which can handle both linear and non-linear feature selection, a classification-based cytotoxicity nano-SAR was developed for
a set of nine metal oxide nanoparticles to which transformed bronchial epithelial cells (BEAS-2B) were exposed over a range of
concentrations of 0.375-200 μg/ml and exposure times up to 24 h. The nano-SAR was developed using cytotoxicity data for BEAS-2B cells; the best-performing model, utilizing only four descriptors (atomization energy of the metal oxide, period of the
nanoparticle metal, nanoparticle primary size, and nanoparticle volume fraction in solution), achieved 100% classification
accuracy in both internal and external validation. This classification Nano-SAR enables one to identify decision boundaries which
are crucial for use in hazard ranking of nanoparticles. The above Nano-SAR as well as the KLS-FS feature selection algorithm were
implemented within the Weka data mining/machine learning package and are available for use by others. More recently, the
previously developed KLS-FS feature selection method was also utilized to develop a Nano-SAR based on a literature HTS data set
of 109 nanoparticles (for nanoparticle uptake by PaCa2 cells) that share the same core (iron oxides/NH2 core based) but with
different organic chemical surface modifications. Application of the unsupervised KLS-FS feature selection and a subsequent
supervised genetic feature selection algorithm to an initial pool of 150 2D molecular descriptors resulted in the selection of only
ten features. The resulting Nano-SAR performed with a mean absolute error of ~6%. The above KLS-FS method is now also being
employed to arrive at a Nano-SAR based on a new CEIN HTS data set being developed for 24 metal oxide nanoparticles by the
CEIN “Molecular, cellular and organism high-throughput screening for hazard assessment” group (Theme 2). In this collaborative
effort on nano-(Q)SAR development for the above nanoparticles (based on three toxicity assays, the BEAS-2B and RAW cell lines, and
nanoparticle concentrations of 0.39-200 ppm), the slope of the dose-response curve for the nanoparticles (i.e., the rate of response
increase with concentration) at the EC50 was introduced as a metric for labeling “safe” nanoparticles. Based on whether the slope
is significantly larger than 0, NPs were divided into a toxic group (7 NPs: ZnO, CuO, Mn2O3, CoO, Ni2O3, Co3O4, Cr2O3) and a
nontoxic group (17 NPs). Subsequently, the modeling task was formulated as a classification problem, given that for this
data set: (a) there is greater confidence/reliability in predicting whether an NP is toxic than in predicting its toxicity level; and
(b) there are only marginal differences in the responses induced by the 17 “nontoxic” NPs. Twelve nanoparticle descriptors
were evaluated and classified as: (a) Basic Nanoparticle Descriptors: Average NP size in water (dw, nm), Atomization
Energy (Ea, eV), Conduction Band Energy (Ec, eV), Valence Band Energy (Ev, eV); and (b) Derived Descriptors: Chemical Hardness
(η), Chemical Potential (μ), Electrophilicity (ω)2, four formation enthalpies ΔHs in Born-Haber Cycle, and ΔHIE1. Accordingly, an
initial nano-SAR classification model (toxic versus nontoxic) was developed using the above collection of NP descriptors,
along with applicability domain analysis, demonstrating ~92% predictive accuracy (measured via five-fold cross-validation). The above development was based on an extensive effort that included the use of self-organizing map analysis for
selecting the training and validation data sets. Work is continuing in collaboration with Theme 2 to further improve the nano-SAR
accuracy. In the developed nano-SAR, the potential utility of the bandgap energy structure as a criterion for predicting the toxicity
of metal oxides in cellular HTS assays was explored, in addition to other pertinent descriptors. Additional correlation of those predictions to the
intact animal level is further described in Projects ENM-1 (Theme 1) and HTS-6 (Theme 2). Finally, it is noted that Nano-SARs
developed in this project will be incorporated into decision analysis tools being developed in Project EDA-4 (“Environmental Impact
Analysis”).
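
The labeling step described above, in which the slope of the dose-response curve at the EC50 separates toxic from nontoxic nanoparticles, can be illustrated with a short sketch. It assumes a Hill (log-logistic) dose-response model, which this summary does not specify. The function names, the fitted parameter values, and the fixed cutoff that stands in for the statistical test of whether the slope is significantly larger than 0 are all hypothetical and do not come from the project's data or code.

```python
def hill_response(conc, top, ec50, hill_n):
    """Response of an assumed Hill (log-logistic) dose-response curve."""
    return top * conc**hill_n / (ec50**hill_n + conc**hill_n)


def slope_at_ec50(top, ec50, hill_n):
    """Analytic derivative dR/dc of the Hill curve evaluated at conc = EC50."""
    return top * hill_n / (4.0 * ec50)


def numerical_slope(conc, top, ec50, hill_n, rel_step=1e-6):
    """Central finite-difference estimate of dR/dc, used as a sanity check."""
    h = conc * rel_step
    upper = hill_response(conc + h, top, ec50, hill_n)
    lower = hill_response(conc - h, top, ec50, hill_n)
    return (upper - lower) / (2.0 * h)


def label_by_slope(slope, threshold):
    """Label 'toxic' when the slope exceeds a cutoff.

    The fixed cutoff is a placeholder for a proper significance test.
    """
    return "toxic" if slope > threshold else "nontoxic"


# Hypothetical fitted parameters (illustrative placeholders, not CEIN data).
fitted = {
    "NP_A": {"top": 80.0, "ec50": 20.0, "hill_n": 2.0},
    "NP_B": {"top": 1.0, "ec50": 100.0, "hill_n": 1.0},
}
THRESHOLD = 0.1  # assumed cutoff, in response units per ppm

slopes = {name: slope_at_ec50(**params) for name, params in fitted.items()}
labels = {name: label_by_slope(s, THRESHOLD) for name, s in slopes.items()}

for name in fitted:
    print(name, slopes[name], labels[name])
```

In practice the cutoff would be replaced by a test on the fitted slope and its uncertainty, and the resulting labels would serve as the class targets for the classification nano-SAR.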