(Q)SAR models

EXAMPLES OF CURRENTLY USED (Q)SAR-MODELS IN DANISH EPA
Endpoint
Reference
Source
Bintein and Devillers, SAR and QSAR in Environmental Research, Vol. 1,
pp. 29-39 (1993):
$$\log \mathrm{BCF} = 0.910\,\log P - 1.975\,\log\left(6.8 \times 10^{-7}\,P + 1\right) - 0.786$$
r = 0.950, s = 0.347, F = 463.51
Bintein et al. 1993
Biodegradation II (DK)
Multicase model. Data from MITI report: "Biodegradation and Bioaccumulation Data of Existing Chemicals based on the CSCL Japan", edited by Chemicals Inspection & Testing Institute, Japan, Oct. 1992, Japan Chemical Industry Ecology-Tox. & Info-Center, and supplemented with extra data from Phil Howard, Syracuse, for readily biodegradable substances.
DK-EPA
Fish, LC50 (96h)
Multicase model. Data from Brooke, L.T. et al.: "Acute Toxicities of Organic Chemicals to Fathead Minnows (Pimephales promelas)", Center for Lake Superior Environmental Studies, University of Wisconsin-Superior, 1988.
DK-EPA
Daphnia, LC50 (48h)
Multicase model. Data from Aquire Database, Kuhn (Water Res. 23/1989), Joop, DK-EPA tests and O.C. Hansen, DTI Environment, "Quantitative Structure-Activity Relationships (QSAR) and Pesticides", April 1999.
DK-EPA
Algae, EC50 (growth)
Multicase model. Ongoing testing at the Danish Technical University (DTU) on Pseudokirchneriella subcapitata (Selenastrum capricornutum), plus data from various sources in the open literature.
DK-EPA
Tetrahymena pyriformis (46h, IGC50)
Multicase model. Data generated by Dr. Terry Schultz, Univ. Tennessee (semi-confidential data set).
DK-EPA
BCF (Bintein)
LD50 (Rat, acute oral)
LOAEL (Rat, Chronic)
Skin Irritation (Severe vs. mild)
Sensitization I (dermal) – strong vs. mild
Sensitization II
TOPKAT model
Commercial
TOPKAT model
Commercial
Multicase model. Data from RTECS Nov. 2000, HSDB Nov. 2000, and the 22nd Amendment of EU Annex I
Training Set
Validation
Domain*
n=154
n=745
LGO (10%) gave: Sens.=77%, Spec.=76%,
Conc.=77%
60%
n=569
LGO (10%) gave: q2 = 0.76
47%
n=641
LGO (10%) gave: q2=0.69
53%
n=476
LGO (50%) gave q2=0.77
39%
n=1860
LGO (50%) gave q2=0.73
51%
n=Approx. 4000
LOO of the 19 sub-models gave 86-100% of estimates falling within a factor of five of test results. DK-EPA external validation with 1840 substances gave q2=0.31 and 86% estimated to be within a factor of ten of test results.
n=393
LOO of the 5 sub-models gave 95% of estimates falling within a factor of 3-5 of test results.
66%
61%
DK-EPA
n=800
LGO (20%) gave: Sens.= 59.7%, Spec. = 90.5%,
Conc.= 77.4%
50%
TOPKAT model, GPMT data from open literature
Commercial
n=389
LOO gave: Sens.=85-94%, Spec.=87-96%
36%
TOPKAT model, GPMT data from open literature
Commercial
n=390
LOO gave: Sens.=88-96%, Spec.=88-98%
42%
Multicase model A33, data from GPMT or human experience (ACD
Commercial
n=1033
LGO (10%) gave: Sens.=69-89%, Spec.=89-
58%
(dermal)
Humans)
Respiratory allergy
Multicase model. Data from Graham et al., Regul. Toxicol. and Pharmacol. 26, 296-306 (1997)
DK-EPA
Reproductive toxicity
TOPKAT model, data from open literature (rat oral data)
Commercial
Teratogenicity (Human)
Multicase model A49, data from FDA-TERIS program
Commercial
Estrogen-α Receptor Binding Assay - balanced model
Multicase model. Data from Japanese METI test data (6th Meeting of the Task Force on Endocrine Disrupters Testing and Assessment (EDTA), 24-25 June 2002, Tokyo, Appendix I)
DK-EPA
Estrogen reporter gene
Multicase model. Preliminary assessment of Human estrogen-α Reporter Gene model based on Japanese METI test data (6th Meeting of the Task Force on Endocrine Disrupters Testing and Assessment (EDTA), 24-25 June 2002, Tokyo, Appendix I) after introduction of new structural notations.
DK-EPA
Antiandrogen effect
Multicase model. Data from the open literature plus approximately 200 test results funded by DK-EPA and carried out at the Danish Institute for Veterinary and Food Research in 2003/2004.
DK-EPA
Ah Receptor Binding
Multicase model. Rannug et al., Carcinogenesis vol. 12, no. 11, pp. 2007-2015, 1991.
Commercial
94%, Conc.=82-89%
n=79
LGO (10%) gave: Sens.=71.4%, Spec.=90.0%, Conc.=85.2%
n=273
LOO of the 3 sub-models gave: Sens.=86-89%, Spec.=86-97%
n=323
LGO (10%) gave: Sens.=60-100%, Spec.=75-92.9%, Conc.=70-88.9%
26%
37%
43%
n=803
LGO (50%) gave: Sens. = 77%, Spec. = 87%,
Conc. = 82%
52%
n=899
LGO (50%) gave: Sens. = 73%, Spec. = 87% and Conc. = 81.4%. External validation for specificity: 418 reporter gene assay chemicals which tested inactive, and which were excluded prior to modelling, were predicted with the following results: of 384 active or inactive predictions, 336/384 (88%) were correctly predicted as negative; of the 303 chemicals fully within the AOK model domain, 274/303 (90%) were predicted negative.
61%
n=340
LGO (50%) gave: Sens. = 64%, Spec. = 90%
and Conc. = 78%.
31%
n=148
G.I. absorption
Samuel H. Yalkowsky, “A New Absorption Parameter”, QSAR 2002 (10th International Workshop on QSARs in Environmental Science, May 2002, Ottawa, Ontario, Canada)
Conference paper
n=132
Structural alerts for DNA reactivity (Ashby)
Multicase model A2E, data from Ashby and associates
Commercial
n=784
Ames mutagenicity I
TOPKAT model, Mutagenicity (Ames)
Commercial
n=1866
Ames
Commercial
n=2034
Multicase model A2H, data from NTP (NIEHS-USA) or GENETOX (EPA)
LGO (10%) gave: Sens. = 80-100%, Spec. = 71-100% and Conc. = 75-100%.
External validation with 98 chemicals gave: 96%
of well absorbed substances correctly predicted
and 70% of poorly absorbed substances correctly
predicted. Overall concordance 89%.
LGO (10%) gave: Sens.=84.6-97.6%, Spec.=60-69.2%, Conc.=80.6-87.1%
LOO result of the 10 sub-modules: Sens. and
spec. of 75-100%. Ext. val. by DK-EPA (1998)
with 118 chemicals (A: 50 and I: 68) gave:
Sens.=76% and Spec.= 82%
LGO (10%) gave: Sens.=75-78.5%,
7%
-
66%
61%
81%
mutagenicity II
Spec.=78.2-90%, Conc.=78.3-83.2%
DK-EPA
LGO (50%) validation gave: Sens.= 72%,
Spec.=89%, Conc.=80% and external validation
with 53 CPDB chemicals (27 pos and 26 neg)
gave Sens.=74%, Spec.=96% and Conc.=85%
(Coverage for MCASE Ames or TOPKAT Ames: 90.1%)
LGO (10%) gave: Sens.= 71%, Spec.=82%,
Conc.= 78%
LGO (50%) gave: Sens.= 57%, Spec.=87%,
Conc.= 72%
LGO (50%) gave: Sens.= 71%, Spec.=77%,
Conc.= 75%
LGO (50%) gave: Sens.= 63%, Spec.=79%,
Conc.= 80%
- Direct, -S9
Multicase model. Data from Procter and Gamble
DK-EPA
n=396
- Base-pair
Multicase model. Data from Procter and Gamble
DK-EPA
n=207
- Frame-shift
Multicase model. Data from Procter and Gamble
DK-EPA
n=314
- Rev.>10*Ctrl
Multicase model. Data from Procter and Gamble
DK-EPA
n=189
Commercial
n=233
LGO (10%) gave: Sens. = 44-80%, Spec. = 50-80%, Conc. = 56-80%.
43%
DK-EPA
n=506
LGO (50%) gave: Sens. = 64%, Spec. = 83%,
Conc. = 74%, LGO (10%) gave: Sens. = 63%,
Spec. = 86%, Conc. = 76%. External validation
with 62 chemicals within domain gave: Sens. =
59%, Spec. = 82%, Conc. = 76%.
63%
Grant et al.
2001
n=555
LGO (50%) gave: Sens.= 70%, Spec. = 82% and
Conc. = 77%
54%
DK-EPA
n=360
LGO (50%) gave: Sens. = 51%, Spec. = 84%,
Conc. = 71%
61%
Chromosomal aberrations (CHO) (NTP)
Multicase model A61, data from NTP (tests in cultured CHO cells)
Chromosomal aberrations (CHL)
Multicase model. Data from: Motoi Ishidate, Jr., "Data Book of Chromosomal Aberration Test In Vitro", Biological Research Centre, National Institute of Hygienic Sciences, Japan, Elsevier Life-Science Information Center, New York 1988. **
49%
35%
52%
43%
Additional data from: Mutagenicity Database of Environmental Chemicals, as reported by Dr. Motoi Ishidate, Jr. Taken from Sofuni, T. (Ed.), "Data Book of Chromosomal Aberration Test In Vitro", LIC, Tokyo, 1998.
Mutations in Mouse Lymphoma (GTX)
Multicase model. Data from Grant et al., Mut. Res. 465 (2001) pp. 201-229.
Mouse micronucleus II
Multicase model. Data from various sources in the open literature. The primary references are: Hayashi et al., "Micronucleus Tests in Mice on 39 Food Additives and Eight Miscellaneous Chemicals," Food Chem. Toxicol. 26 (1988) 487-500; Mavournin et al., "The in vivo micronucleus assay in mammalian bone marrow and peripheral blood. A report of the U.S. Environmental Protection Agency Gene-Tox Program," Mutation Research 239 (1990) 29-80; Waters et al., "The performance of short-term tests in identifying potential germ cell mutagens: A quantitative and qualitative analysis," Mutation Research 341 (1994) 109-131; Morita et al., "Evaluation of the rodent micronucleus assay in the screening of IARC carcinogens (Groups 1, 2A and 2B): The summary report of the 6th collaborative study by CSGMT/JEMS - MMS," Mutation Research 389 (1997) 3-122.
Mouse Sister Chromatid Exchange, bone marrow (in vivo)
Multicase model. Data from Genetox, Mut. Res. 297 (1993) pp. 101-180
Rodent Dominant Lethal Test
Multicase model. Data from Genetox, Mut. Res. Vol. 154, No. 1, July 1985, plus updates
Chinese Hamster Ovary Cell HGPRT Forward Mutation Assay
Multicase model. Primary information source: Genetox (Mutat. Res. Vol. 196, No. 1, July 1988) plus updates
Drosophila melanogaster Sex-Linked Recessive Lethal (SLRL)
Multicase model. Data from Mut. Res. Vol. 123, No. 2, Oct. 1983, mainly from Genetox plus updates
Mouse COMET assay
Multicase model. Data from Sasaki et al., Critical Reviews in Toxicology Vol. 30, issue 6, 2000, pp. 629-799, plus 89 physiological chemicals entered as inactives
Unscheduled DNA synthesis in Rat hepatocytes, in vitro
Multicase model. Data from Mutation Research, 221 (1989) 263-286, Williams et al., updated with tests from CCRIS and Toxline EMIC (Jan. 03)
Syrian Hamster Embryo Cell Transformation Assay
Multicase model. Data from Isfort et al., Mutation Research 356 (1996), pp. 11-63 (169 test results), and Kerckaert et al., Mutation Research 356 (1996) pp. 65-84 (25 test results). 7 additional substances from Toxicol. Sci. 2002 Jul; 68(1) pp. 43-50, Mutation Research 392, no. 1,2 (1997) pp. 61-90 and Toxicol. Sci. 1998, 41(2) pp. 189-197. To counter the over-representation of positive tests, 39 physiological chemicals which were assumed to have a low probability of activity were added as negatives (source: Mutation Research 465 (2000) pp. 201-229).
DK-EPA
n=210
LGO (50%) gave: Sens=70% Spec. = 98% and
Conc.=91%
51%
DK-EPA
n=193
LGO (50%) gave: Sens.= 44%, Spec.= 94%,
Conc.= 76%
48%
DK-EPA
n=243
LGO (50%) gave: Sens.=76%, Spec.=87%,
Conc.=80%. External validation with 150
physiological chemicals expected to be negative
for this endpoint gave for all chemicals:
Spec.=94.8%
and gave for 77 AOK predictions: Spec. = 97.5%
41%
DK-EPA
n=368
LGO (50%) gave: Sens. = 67%, Spec. = 93%,
Conc. = 81%
56%
DK-EPA
n=288
LGO (50%) gave: Sens.=65%, Spec.=89%,
Conc.=80%.
48%
n=413
LGO (50%) gave: Sens.=56.3%, Spec.=89.1%,
Conc.=74.0%. External validation with 150
physiological chemicals expected to be negative
for this endpoint gave: Spec.=96.7%.
60%
DK-EPA
n=363
DK-EPA
LGO (50%) gave: Sens.=60%, Spec.=80%,
Conc.=72.0%. External validation with 61
physiological chemicals expected to be negative
for this endpoint gave: Spec.=58/61=95%.
External validation with 16 positive tests not used
in the model gave: Sens=11/16=69%.
55%
Cancer (male Rat, FDA), open
Reference: Multicase model AF1. Matthews, J. and J.F. Contrera, Reg. Toxicol. and Pharm. 28, 242-264 (1998)
Source: Commercial
Training set: n=1085
Validation: LGO (20%) gave: Sens.=51-95%, Spec.=54-97%, Conc.=80-81%, Chi2=35-40
Domain*: 70%

Cancer (female Rat, FDA), open
Reference: Multicase model AF1. Matthews, J. and J.F. Contrera, Reg. Toxicol. and Pharm. 28, 242-264 (1998)
Source: Commercial
Training set: n=1079
Validation: LGO (20%) gave: Sens.=64-67%, Spec.=98-99%, Conc.=88-89%, Chi2=61-62
Domain*: 72%

Cancer (male Mouse, FDA), open
Reference: Multicase model AF1. Matthews, J. and J.F. Contrera, Reg. Toxicol. and Pharm. 28, 242-264 (1998)
Source: Commercial
Training set: n=1010
Validation: LGO (20%) gave: Sens.=64-68%, Spec.=96-99%, Conc.=90-91%, Chi2=46-52
Domain*: 69%

Cancer (female Mouse, FDA), open
Reference: Multicase model AF4. Matthews, J. and J.F. Contrera, Reg. Toxicol. and Pharm. 28, 242-264 (1998)
Source: Commercial
Training set: n=1025
Validation: LGO (20%) gave: Sens.=76-78%, Spec.=96%, Conc.=91-92%, Chi2=53-55
Domain*: 70%

Cancer (male Rat, FDA), proprietary
Reference: Multicase model AF5
Source: Commercial
Training set: -
Domain*: 68%

Cancer (female Rat, FDA), proprietary
Reference: Multicase model AF6
Source: Commercial
Training set: -
Domain*: 70%

Cancer (male Mouse, FDA), proprietary
Reference: Multicase model AF7
Source: Commercial
Training set: -
Domain*: 67%

Cancer (female Mouse, FDA), proprietary
Reference: Multicase model AF8
Source: Commercial
Training set: -
Domain*: 67%

Validation of the four proprietary models (AF5-AF8): External validation by Matthews and Contrera (Reg. Toxicol. and Pharm. 28, 242-264, 1998) with 100 test results not used in the model gave, for the sum of the 4 proprietary models: Sens.=34/58=58.6%, Spec.=41/42=97.6% and Conc.=75/100=75%.

CPDB Rat TD50
Reference: Multicase model. Data from Louis & Gold, Carcinogenic Potency Data Base, 1999, Berkeley (on-line version)
Source: DK-EPA
Training set: n=868
Validation: LGO (50%) gave: Sens.=68%, Spec.=81%, Conc.=74%, and regression of log measured vs. log calculated TD50 gave q2=0.43
Domain*: 69%

CPDB Mouse TD50
Reference: Multicase model. Data from Louis & Gold, Carcinogenic Potency Data Base, 1999, Berkeley (on-line version)
Source: DK-EPA
Training set: n=720
Validation: LGO (50%) gave: Sens.=38%, Spec.=86%, Conc.=66%, and regression of log measured vs. log calculated TD50 gave q2=0.22
Domain*: 65%

Hepatocarcinogenicity in rodents (Rat or Mouse)
Reference: Multicase model. Data from Gold and Ames: Carcinogenic Potency Data Base, 1999, Berkeley (on-line version)
Source: DK-EPA
Training set: n=320
Validation: LGO (50%) gave: Sens.=33%, Spec.=91%, Conc.=74%. External evaluation with 182 negative tests gave Spec.=86.3%.
Domain*: 61%
Some of these models were externally validated; all were cross-validated. LGO (Leave Groups Out) and LOO (Leave One Out) refer to statistical validations ("cross-validations") in which one substance or a fraction of the substances (e.g. 10%, 20% or 50%) is removed at random and a new model is developed on the remaining substances. The new model is then used to predict the chemicals which were not used to make it. This is repeated a number of times. (LOO used for TOPKAT models may give too "optimistic" results in some cases.)
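The leave-groups-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a hypothetical one-descriptor linear model (for example, a response against log P) in place of the actual Multicase and TOPKAT models, and it pools the held-out squared errors into q2 using the common definition q2 = 1 - PRESS/SS. That definition is an assumption; the exact formula used for the models in the table is not stated.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; a hypothetical stand-in for a (Q)SAR model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def leave_groups_out_q2(xs, ys, fraction=0.5, repeats=10, seed=0):
    """Leave-groups-out cross-validation returning q2 = 1 - PRESS / SS.

    In each repeat, `fraction` of the chemicals is removed at random, a new
    model is fitted to the remaining chemicals, and only the removed
    chemicals are predicted. (The PRESS-based q2 is an assumed definition.)
    """
    rng = random.Random(seed)
    n = len(xs)
    n_out = max(1, round(fraction * n))
    press = 0.0  # squared prediction errors of the held-out chemicals
    ss = 0.0     # squared deviations of the held-out values from the training mean
    for _ in range(repeats):
        held_out = set(rng.sample(range(n), n_out))
        train = [i for i in range(n) if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        train_mean = mean(ys[i] for i in train)
        for i in held_out:
            press += (ys[i] - (intercept + slope * xs[i])) ** 2
            ss += (ys[i] - train_mean) ** 2
    return 1.0 - press / ss


# Hypothetical toy data: a noisy linear relationship (not data from the table).
log_p = [float(x) for x in range(10)]
response = [2.0 * x + (1.0 if x % 2 else -1.0) for x in log_p]
print(f"q2 = {leave_groups_out_q2(log_p, response):.2f}")
```

Setting `fraction=0.5` mirrors the LGO (50%) runs in the table. A true LOO run would instead hold out each chemical exactly once, which this random-sampling loop does not guarantee.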
Predictions are compared with experimental test results to determine the predictivity of the model. For continuous variables (quantitative predictions), predictivity is expressed as the cross-validated squared correlation coefficient, q2. For qualitative ("yes/no") models, Cooper statistics are used to express predictivity as sensitivity (sens.), specificity (spec.) and concordance (conc.). The validation column therefore does not show internal performance measures, but predictivity assessed by external validation and cross-validation, in which chemicals are removed entirely and new models are made to predict the chemicals that were not used to make them. Sensitivity is the fraction of the positives correctly identified by the model, specificity is the fraction of the negatives correctly identified by the model, and concordance is the overall accuracy.
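As a worked example of the Cooper statistics, the sketch below recomputes the Matthews and Contrera external validation figures quoted in the table (Sens.=34/58, Spec.=41/42, Conc.=75/100). The individual chemicals are not given in the source, so the `observed` and `predicted` lists are hypothetical reconstructions that only reproduce those counts; the function and class names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CooperStatistics:
    sensitivity: float  # fraction of positives correctly identified
    specificity: float  # fraction of negatives correctly identified
    concordance: float  # overall fraction of correct predictions


def cooper_statistics(observed, predicted):
    """Compute Cooper statistics from paired yes/no test results and predictions."""
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for obs, pred in pairs if obs and pred)
    true_neg = sum(1 for obs, pred in pairs if not obs and not pred)
    positives = sum(1 for obs, _ in pairs if obs)
    negatives = len(pairs) - positives
    return CooperStatistics(
        sensitivity=true_pos / positives,
        specificity=true_neg / negatives,
        concordance=(true_pos + true_neg) / len(pairs),
    )


# Hypothetical per-chemical lists reproducing only the published counts:
# 58 positives (34 predicted positive) and 42 negatives (41 predicted negative).
observed = [True] * 58 + [False] * 42
predicted = [True] * 34 + [False] * 24 + [False] * 41 + [True] * 1
stats = cooper_statistics(observed, predicted)
print(f"Sens.={stats.sensitivity:.1%}, Spec.={stats.specificity:.1%}, Conc.={stats.concordance:.1%}")
# prints: Sens.=58.6%, Spec.=97.6%, Conc.=75.0%
```

Concordance is not a simple average of sensitivity and specificity: it weights each by the number of positives and negatives in the test set, which is why an unbalanced set (here 58 vs. 42) can give a concordance well below the specificity.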
Regarding commercial models: for TOPKAT models see http://www.accelrys.com/products/topkat/modules.html and for Multicase models see http://www.multicase.com/.
Please note that models from EPI Suite and the EU TGD are not referenced above; cf. http://www.epa.gov and http://ecb.jrc.it/
* Domain for approximately 46000 EINECS substances. For further details on domain description: See www.mst.dk/chemi/01050000.htm.
** cf. also: ENV/JM/TG(2004)27/ANN p. 91-110, Annex 4, J. Niemelä & E. Wedebye: "A "Global" Multicase Model for in vitro Chromosomal Aberrations in Mammalian Cells"
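The Bintein et al. (1993) BCF equation listed in the table can be evaluated directly. This sketch assumes, as is conventional, that log denotes the base-10 logarithm and P the n-octanol/water partition coefficient; the function names are hypothetical. Setting the derivative of log BCF with respect to log P to zero gives 6.8×10⁻⁷·P / (6.8×10⁻⁷·P + 1) = 0.910/1.975, so the predicted log BCF peaks near log P ≈ 6.1 and declines for more hydrophobic substances.

```python
import math

# Assumes log = log10 and P = n-octanol/water partition coefficient (Kow).


def bintein_log_bcf(log_p):
    """Bintein et al. (1993) bilinear model: log BCF as a function of log P."""
    p = 10.0 ** log_p
    return 0.910 * log_p - 1.975 * math.log10(6.8e-7 * p + 1.0) - 0.786


def log_p_at_maximum():
    """log P where d(log BCF)/d(log P) = 0.

    The derivative is 0.910 - 1.975 * x / (x + 1) with x = 6.8e-7 * P,
    so the maximum lies at x = 0.910 / (1.975 - 0.910).
    """
    x = 0.910 / (1.975 - 0.910)
    return math.log10(x / 6.8e-7)


for lp in (3.0, 5.0, 6.0, 8.0):
    print(f"log P = {lp:.1f} -> log BCF = {bintein_log_bcf(lp):.2f}")
print(f"Maximum predicted log BCF near log P = {log_p_at_maximum():.1f}")
```

At low log P the second term is negligible and the model reduces to an approximately linear relationship; the bilinear term only becomes important for highly hydrophobic substances.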