Session IV Transcripts

November 21st (PM) Session IV: Predictive Toxicology
Case History: Toxicology Biomarker Development Using Expression Profiling
David Watson (Eli Lilly)
Question: In my introductory comments yesterday I put up a benchmarking study that
was reported on by Scott Fillard last year at the Profiling Workshop we did. One of the
numbers that struck me there was that 44% of the failures in preclinical and clinical
development were due to toxicological reasons. In my opinion, it's going to be difficult
to reduce that number significantly in the foreseeable future because we don’t
understand the molecular basis of a lot of toxicology. I think you've also brought up
another point, and that is that there's a reluctance to move some of these more
molecular-based assays earlier and earlier in drug discovery because you may end up
killing compounds when, in fact, they don't deserve to die, because the assays aren't
that predictive. If you tell a chemist they've got a toxic event going on, that puts a black
mark over the compound. So, I guess I'd like you, as a toxicologist, to react to those
comments. What do you see happening at Lilly in terms of pushing these back into lead optimization?
David Watson Response: All compounds will be toxic at some dose. The question is
finding the margin of safety and determining whether that is acceptable for further
development. The qualitative nature of the toxicity, the patient population, and margin of
safety are key considerations. Will compounds that are toxic continue to be brought
forward? Definitely. Do we have databases that are sufficiently robust and chemically
diverse to be able to predict such toxicities in silico? No, not in my opinion. Do
we have good in vitro assays that can model the enormous number of toxicological
endpoints? No. What kind of in vivo biomarkers of toxicity do we have? Reasonably
good ones for necrosis in many organs, but for all the other types of endpoints they're
coming slowly. So, how do you deal with that? Similar to the approach being used for
biopharmaceutic properties, I think you have to have a prioritization in terms of what are
the key variables that you absolutely have to have built in and you start with those. If
you try to do everything all at once, it's going to be too challenging. Most companies will
look at the kinds of toxicological endpoints that have really caused problems with their
developmental molecules, and they'll find liver, probably renal, cardiac and neuro.
Those are the four main systems that have caused the great majority of compound
failures. I think most companies are investing in those four organs to develop better in
vitro and in vivo assays to at least identify those toxicities earlier. Developing better in
vivo biomarkers and in vitro models, and knowing when to employ them, are keys to
progress in Lead Optimization Toxicology.
Ron Borchardt Response: So, this goes back to the question I asked before. What
are the unmet needs? I think there are huge unmet needs here in terms of ultimately
being able to bring the 44% failure rate for toxicology reasons down to a more
reasonable number. Toxicology needs to become more and more molecular, and
validated HTS assays need to be developed so that they can be utilized in lead
optimization.
David Watson Response: I do think there will be many more molecular tools available.
They will be most useful to address known in vivo toxicities caused by the compounds
of interest, and will be applied using high quality in vitro systems. I've also tried to point
out here that there are a lot of business process changes that can be implemented
immediately. Conducting a small number of in vivo toxicology studies early can
characterize the toxicological liabilities of a platform and identify the challenges that lie ahead for
that platform. This allows toxicologists to either begin work resolving them, or to make
other strategic decisions, such as deciding to bring forward an additional or a different
platform. Those are simple things that anybody can do using existing resources, and
they have the potential for a big impact.
November 21st (PM) Session IV: Predictive Toxicology
Panel Discussion
David Watson, Paul Levesque, Alastair Cribb, Constantine Kreatsoulas
Question: I'd like to ask Constantine a question because we discuss this internally all
the time. If you look at the modeling of Ames positive, or any other mutagenicity assay,
and you're thinking in terms of 2000 compounds—that's a really big compound set
relative to what we might get for bile duct epithelial hypoplasia. So, with efforts to
model really complex biological interactions, what do you see the future
looking like? Do we have to develop more in vitro systems? Can we find other ways to
model complex systems in vivo? What would you look for?
Constantine Kreatsoulas, BMS-Merck, Response: The answer is yes. I think we
need more in vitro assays and better in vivo assays for the complex systems. I think the
problem is that the mechanisms underlying the endpoints are so complex themselves
that you need to chip away at different elements and model them separately and then
try to find good correlations between them. That's sort of what we're trying to do with
local models with the QSARs and then feed them back in to the larger global model.
Question: This question is for Dr. Levesque. When we talk about the QT interval and
hERG channel activity, in my company we have just incorporated a rubidium efflux
assay. It is a functional assay and also not so difficult to handle. We have just
started this, so I'm trying to get some information on understanding it. I assume you are
familiar with this assay? How do you look at this assay, and how reliable is it?
Paul Levesque Response: The question is what do I think of the rubidium flux assay
for screening hERG. I can answer that question generally for voltage-gated potassium
channels, but not for hERG specifically since we haven't used that assay to screen
hERG. I think the rubidium flux assay can be quite useful as a primary screen for
detecting channel inhibition and for some chemotypes, it can be useful even for
generating SAR. However, for some chemotypes we’ve seen big discrepancies
between patch-clamp and rubidium flux potency, so for those the flux assay was
misleading and not useful for SAR. If you can validate the rubidium flux assay using
patch-clamp for the series you're working on and show good correlation, you could
probably use it for generating SAR with some confidence. However, you want to make
sure it's predictive for the particular chemotype you're working in and you should do at
least a little bit of electrophysiology to determine whether there are potency shifts and
whether the rank order of potency for compounds is ok. But, in general, I think it's a
reasonable assay once validated and used appropriately.
David Watson Response: There's been a lot of work on methapyrilene; I suspect that
particular endpoint is the result of a metabolite. The best evidence for that is that it's
quite species-specific. The toxicity that's seen for methapyrilene is really not shared by
mice, it's not shared by hamsters, guinea pigs; it seems to be quite specific. That's
often a good indication that you've got metabolism, not a direct effect of the parent
compound. Now, your question really was whether toxicogenomics can address that for you.
The other thing was—you identify correctly that we're talking about a thiophene—but
thiophenes can be, like a terminal thiophene, monosubstituted as was the case for
methapyrilene, or they can be embedded in a structure. And they behave very
differently in those two contexts; the same thing goes for furans and indoles. Clearly,
it matters where within the molecule the structure occurs. Back to the toxicogenomics. There
are a number of companies that are starting to generate large databases for
compounds in a specific model. I believe at least one of these companies is at a point
where you can start to ask questions about specific structural motifs and correlate them
with endpoints. But, to the point that Constantine made and others have made, you
really need a lot of compounds, you need a lot of observations, and I don't think we're at
a point yet, but certainly we're getting there.
Question: This is a general question and it pertains to in vitro/in vivo, but, I think of the
hERG assay every time this comes up. A lot of the compounds that are hERG active
are lipophilic amines. So, I immediately start thinking about cell partitioning and its
impact on cell concentration as the appropriate referencing concentration for potency,
rather than what's in solution or what's thought to be put into solution. Is there any
evidence that surface activity, or cell concentration, correlates with hERG activity? And
then I'd like to extrapolate that beyond total concentration in vitro to free concentration in
vivo. Has anybody looked at what's actually delivered in vivo in any tox reaction and
made correlations there? In fact, what does the FDA think about protein binding
adjustments to free concentration?
Paul Levesque Response: As I mentioned in the talk, a number of publications show
convincingly that free or unbound drug exposure correlates best with hERG potency
and QT interval prolongation. This is consistent with the fact that most drugs have to
enter the cell to interact with the binding site on the hERG channel. However, I
probably should have stated that although free plasma concentrations are important for
most drugs, we have identified a number of compounds where that doesn't hold up very
well at all. In some cases where we have seen in vivo QT signals at exposures much
lower than we would have predicted based on hERG activity and factoring in protein
binding, it turned out that the compounds accumulated in heart tissue. So even though
we factor in protein binding when estimating risk potential from hERG potency data,
which is determined in a protein-free assay, and anticipated efficacious exposure, it's
with the caveat that sometimes the correlation between free plasma concentrations and
in vitro hERG potency is not always good. In addition, we have done some in vitro
patch-clamp studies where we’ve added protein to assay solutions and we’ve found that
adding protein doesn't shift hERG potency to the degree you'd predict based on protein
binding data for all drugs.
Jim Stevens Response: Just to add a simple-minded example for the audience. At
Lilly we have an assay similar to the one that Dr. McKim would have presented, but with
fewer endpoints, called the ABCL assay, for acute basal cell lethality. When we piloted
the assay using some compounds that were known to be human toxicants, there was
no correlation between administered dose to an animal and the rank order toxicity in the
in vitro assay. But when you extrapolated to what was probably or believed to be an
EC50 plasma concentration that led to toxicity, then you got a much better correlation.
So, obviously, understanding the exposure, even at a crude level, such as the plasma
level which doesn't really give you what's happening in the liver or another target organ,
gave you a much better correlation with the in vitro model.
Question: Specifically, are you aware of anyone—and this goes back to something
Lipinski said years ago with trying to correlate compounds that appear to have a surface
activity with cardiotoxicity—who actually looked at measured surface activity and tried
to correlate it with an effect on channel inhibition? Because that seems to be a more
efficient way to hit a channel if you recall Leo Herbette's data for calcium channel blockers.
Jim Stevens Response: I don't know of anybody who's done those calculations or measurements.
Question: I'm not in this area, I'm more involved in efficacy models and things like that.
I'm wondering—a lot of the things that you talked about for tox testing are really quite
late and is there any way to get an earlier read on a number of your compounds?
During an efficacy study we run a wide dose range and you could do clin path or
histology or whatever just to get a read on a number of compounds as to what
potentially could be problems.
David Watson Response: There are companies doing that, definitely. Often the
efficacy models are different strains of rodent than what toxicologists customarily use,
and the exposures need to be high enough to justify the resource investment. But
yes, this is a strategy that brings important information to drug development teams at
times earlier than is typically done.
Question: To go along that theme, is it appropriate to screen all chemicals for selected
toxicities prior to having in vivo data, or should one use in vitro models or
toxicogenomic models or molecular models to respond to toxicities that are observed in
in vivo models early in lead optimization?
David Watson Response: That's a great controversial question. I'll take one side of
that. There's a proliferation of the number of available assays and species that are
being used to assess toxicity, both in vitro and in vivo. But the question is, what's the
value of those data? My own perspective is that you're better off having some type of in
vivo data from rodents that allow you to specifically address a legitimate concern.
Otherwise you will create issues that are irrelevant. If you run only three assays, and
they are each 80% predictive, compounds that are clean will come up positive, on average,
about 50% of the time in at least one of the assays. The limitations of the assay just created an
issue that shouldn't exist for that compound. So the fact that none of our in vitro assays
are 100% predictive means that you're going to get false positives, and that can be
damaging in the sense that it occupies people's time and it takes up more resources. In
contrast, if you find something significant in vivo with histology or clinical pathology, then
you've got a more meaningful observation because it relates to the margin of safety for
your compound. This is more likely to justify resource expense on in vitro assay
development and implementation, because you'll have a clear use for those data:
avoidance of an in vivo issue that you've already identified.
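[Editor's note: the back-of-the-envelope calculation behind the "about 50%" figure in the response above can be sketched as follows. This is a hypothetical illustration that assumes the three assays fail independently, an assumption not stated in the talk.]

```python
# Sketch of the false-positive arithmetic from the response above: with
# three independent assays, each 80% predictive (a clean compound tests
# negative 80% of the time in each), how often does a clean compound
# flag positive in at least one assay? Independence is assumed here
# purely for illustration.
specificity = 0.80
n_assays = 3

p_all_negative = specificity ** n_assays   # clean compound passes all three
p_any_false_positive = 1 - p_all_negative  # flagged in at least one assay

print(f"P(at least one false positive) = {p_any_false_positive:.1%}")
```

The result, roughly 49%, matches the "about 50%" quoted in the discussion.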
Jim Stevens Response: Let me add to that. There are two ways you can use these
assays. One is to say, if we're going into this chemical space and we have 100
examples of that chemical space, could we run an in silico model that would predict a
higher probability of yielding, for example, a gentox result than another chemical space
we could go into. I think you're on a much firmer footing with a much higher sample
number. But when you try to say we predict that the activity of this single molecule will
be x, y, or z in multiple in silico assays, then I believe you begin to have a problem. So,
my bias is to look at probabilities in larger chemical space as opposed to trying to make
predictions about the properties of a single molecule that you might want to bring
forward in the lab.
Question: I'm going to put Alastair on the spot here. Over the last couple of years as
I've traveled around the industry, I've often been asked about the issue of covalent
binding, and the paper from Merck about the 50 picomole equivalent per mg. Is that a
number that you're comfortable with? If you were to go back into the industry, is that a
number you would select? How would you deal with that?
Alastair Cribb Response: I kind of view that number as a starting point. In other
words, we've not really done the research to back that number up. When I was at
Merck, we didn't have a number and we used to just look at whether covalent binding
was there. We did it more from the perspective of: we had a problem, and we were
going back to evaluate covalent binding. We weren't doing it prospectively. So, I would
not be comfortable with having that [50 pmol/mg] as a go/no-go
mark or using it in a rigid fashion. But I think as a starting point—to use it as a
reference point and go forward and start accumulating more information and have
things evaluated earlier—I think it's a reasonable starting point. I'm not sure of all the
things that went into arriving at that number. What I did when I saw it, was to go back
and look at some of the compounds that were out there, that we knew caused problems,
and we had some covalent binding information on them. And it [50 pmol/mg] was
reasonable, within the context of the caveat I said. I'm a little uncomfortable with using
just one concentration of a drug and then saying that's it. Because we know from many
of the compounds that were out there that you would not have exceeded that level using
the “Merck” paradigm and yet they do cause problems in vivo. So, I don't want to use it
in a rigid way, and certainly we can give examples where we saw no covalent binding in
a rodent species and yet we do know there is covalent binding in other species and we
actually end up with a clinical problem. So, you need, I think, some in vitro work, but the
in vivo is very useful. Bottom line: I'm ok with this [50 pmol/mg] as a starting point, but I
wouldn't use it too rigidly.
Question: This is a question for Constantine. In terms of chemists using the kind of
software you described, can you give us some idea of whether this is really influencing
significantly their decisions about what to make and what not to make, or do you view it
more as an educational tool?
Constantine Kreatsoulas Response: It's interesting; I think it really depends on the
chemist. Chemists who have taken the time to sit down and kind of understand where
this approach works and where it fails—they're the ones who use it most wisely. They
can judge for themselves the quality of the model because they understand what its
limitations are. With those chemists, they actually do use it as sort of a guide—as, “The
model said this was positive, we sent it into the assay, the assay said it was positive, so
probably these are the substructures I need to look at and understand.” That being said,
chemists do not want you to limit their space. They do not want you to “whack” a
compound. So, it's a fine line between losing an entire segment of scientists and
helping the process along. So, I've seen both sides of it, and I think the understanding
that you need to look at and evaluate this takes a little bit of time and a little bit of
patience. So, the population of chemists that use it wisely or use it at all is much
smaller than the population of medicinal chemists as a whole.
Question: One last question for the whole panel. It's been interesting to see how the
hERG issue has kind of changed the industry over the last few years; I think it's
because the FDA has gotten interested in it and so everybody does hERG assays.
From your perspective, what's going to be the next hERG in the next five years? What
seems to be popping up on people's radar screens as something that's so predictive of
toxicity that eventually the FDA is going to start writing guidances around it?
Jim Stevens Response: There are a number of issues popping up and there's publicly
available a recent guidance from the FDA for all companies that are dealing with
peroxisome proliferator-activated receptor compounds so I think that's an area where
there's a lot of activity and there are already a couple of nascent efforts forming up
consortia in the industry to deal with those issues. I think as we get into more and more
complex signaling pathways my prediction is that we'll begin to see some toxicities that
we haven't seen before because we're not only entering new chemical space but we're
also entering new biological space. So you see these signaling pathways operating in
multiple physiological pathways. I can't give you specific examples, but the prediction
would be that as we start to get more and more targeted molecules that hit these
complex signaling pathways, we will begin to see toxicities that we have not dealt with
before that will be target-mediated.
Question: I was intrigued by Paul's comment that there was a consortium put
together around the hERG issue, and you just brought up consortia again. How is it that
toxicologists can do consortia when the rest of the industry doesn't seem to be able to?
Jim Stevens Response: In general, I hope the industry does not want to take the
position that we will compete on safety; rather we will compete on efficacy. That
position provides a push for toxicologists to band together externally and address these issues.
Question: The 50 picomole number when I was at Merck was very soft, and we have
to admit that. I wasn’t directly involved, but I think they had examples like
acetaminophen that at 1000 picomoles/mg there was overt toxicity. They just divided
that number by 20 as a comfort level, a safety margin. I want to take a crack at Dr.
Borchardt's question. I may be naïve about transporters, but it seems to me that for
many years at least P450 people like me were extremely arrogant, thinking that our
enzyme systems that we were working with were totally irrelevant in terms of
gluconeogenesis, intermediary metabolism, bile acid salts and all the rest of it. What I've
seen in the last five years is a slow convergence of all the fields; the transporters are
now impacting on the DMEs, drug metabolizing enzymes; we're now realizing that the
transporters are involved in the efflux of intermediary metabolic components and if
metabolites in our drugs impede those transporters that could give rise to toxicity at
relatively high doses. We're now seeing that PXR may be related to gluconeogenesis as
well through various kinase pathways; I think that's the next big step. And folks like me
have been extremely arrogant for 15 years and I think it's time we were slapped down
by biology.
Alastair Cribb Response: It’s interesting, and it does make a difference whether
you're looking at intrinsic toxins or not. One of the issues with the idiosyncratic
reactions is that the density of haptenization of proteins probably plays a role—that's
certainly been shown from immunological studies. So that's where you get into an issue,
when you look at total binding vs. binding to one protein. You know, when you look at
the data in Merck's publications and listen to some of the discussions, they are
combining both intrinsic and idiosyncratic toxins, and coming up with the 50 picomole
guideline—but you have to be careful with that. It's interesting that both the FDA and
Health Canada are grappling with guidelines for hepatotoxicity prediction and they refer
to covalent binding and reactive metabolite formation—it always shows up in the
working documents—and it's going to be interesting to see how that plays out as they
go forward—whether people do try to firm up these numbers.
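[Editor's note: the derivation of the 50 pmol/mg guideline, as recounted in the exchange above (an overt-toxicity level of roughly 1000 pmol/mg for acetaminophen, divided by a 20-fold comfort margin), is simple arithmetic; a minimal sketch:]

```python
# Sketch of the safety-margin arithmetic behind the "Merck" covalent-
# binding guideline, as described in the discussion: a level associated
# with overt toxicity (~1000 pmol/mg, e.g. acetaminophen) divided by a
# 20-fold comfort factor gives the 50 pmol/mg reference point.
overt_toxicity_level = 1000.0  # pmol bound per mg protein
comfort_factor = 20.0

guideline = overt_toxicity_level / comfort_factor
print(f"covalent-binding guideline = {guideline:.0f} pmol/mg")
```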
Question: I'm not a toxicologist, but I'm wondering if you compared outcomes of all
these computational programs to maybe the Ames test and the test you described; I guess
the Ames test is the gold standard, but how good is that as a standard? Should we
continue to look at that? Obviously we cannot do carcinogenicity studies for all the
compounds, but is the Ames test still the standard to go for?
Constantine Kreatsoulas Response: Yes, the Ames test is definitely the standard
that the FDA looks for. There are certainly nongenotoxic carcinogens, so that's a
different issue altogether. And then there are issues such as chromosomal aberrations,
which the Ames does not address; the in vitro micronucleus is an assay that's used to
evaluate whether there's a likelihood of that occurring. So, yes, and a little bit of no, is what
the answer really is.
Question: I guess what I'm aiming at is if you're having false negatives and false
positives ultimately you're not sure.
Constantine Kreatsoulas Response: In terms of these in silico models, it's just a
question of trying to fold that in early on in the discovery phase. So if you can begin
partitioning the compounds intelligently, or as intelligently as we can given what we
know right now, that implicitly increases the throughput of the assays, which can give
you a true positive, so that's why we end up trying to limit the false negatives.
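[Editor's note: the triage logic described above, using an in silico filter to partition compounds so the lower-throughput assay sees an enriched set, at the cost of some false negatives, can be illustrated with a toy calculation. All performance numbers here are invented for illustration and are not from the talk.]

```python
# Hypothetical triage sketch: an in silico model with assumed sensitivity
# and specificity decides which compounds go on to the lower-throughput
# in vitro assay. All numbers below are invented for illustration.
n_compounds = 2000
prevalence = 0.10   # assumed fraction of truly positive (e.g. Ames-positive) compounds
sensitivity = 0.85  # assumed fraction of true positives the model flags
specificity = 0.75  # assumed fraction of true negatives the model clears

true_pos = n_compounds * prevalence
true_neg = n_compounds - true_pos

sent_to_assay = sensitivity * true_pos + (1 - specificity) * true_neg
false_negatives = (1 - sensitivity) * true_pos  # positives the model wrongly clears

print(f"sent to assay:   {sent_to_assay:.0f} of {n_compounds}")
print(f"false negatives: {false_negatives:.0f}")
```

Under these assumed numbers, only about 620 of the 2000 compounds need the assay, while roughly 30 true positives are missed, which is why the emphasis in practice is on limiting false negatives.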
David Watson Response: We often see people who are trying to model mutagenesis,
in particular, and I think it's really an important test piece for in silico models. The largest
dataset in Toxicology is for compounds run through the Ames test. Second, the test is
run in bacteria, as opposed to a strain of rodent, for example. And third, it's a yes-no
output. These attributes mean that until models for Ames results and similar GenTox
data are done extremely well, there is low probability of modeling in silico any of the
complex toxicological phenomena that occur in vivo, because the data sets are much
smaller, they are less diverse structurally, they are more complex biologically, and
you're going to have to take into account margin of safety. So you can bet that we've got
to succeed in that first one, which is the Ames test.
Prabhaka Jadhav, Eli Lilly, Response: I have two comments. The first one is how
medicinal chemists react to the predictive models. From my own experience and the
experience of my team, we do use these computational models to prioritize our
compounds, but we don't use them as a show stopper. For example, if we want to test
a hypothesis by putting a certain group in a certain location to block metabolism,
and the log P is >5, we still want to go ahead and make that molecule. The second
comment I want to make is the in vitro assays for toxicity are also very useful, but they
should not be used as a show stopper. As an example, in one particular series, one
assay suggested the compounds would be active, but when we went to the next level they were
completely inactive. So the first result kind of slowed the team down in a
way. In another case we submitted a series of compounds because the toxicology
suggested that there was a substructure that could have some potential to show
toxicity. It did come in as preliminarily positive in the toxicological study. The
whole team was discouraged, but we wanted to work at the next level and get an in vivo
data point and, sure enough, this toxicity that was predicted by the in vitro was there,
but we had a greater than 10^4 margin of safety.
Jim Stevens Response: Those are excellent examples of using data wisely. At least
at Lilly we look at the lead optimization toxicologists' job as not to stop teams from going
forward but to keep them going forward with their eyes wide open, so that when you do see
things playing out in a way that predicts a safety issue you can react very quickly. So, let
me bring this to a close by coming back to a question Ron asked earlier in the day. He
asked, "How low can we drive attrition?" I think there are some areas where attrition is
due to toxicology and there'll be very little we can do about it. Things that pop up in very
late phase, and that appear only when multiple pharmaceutical companies are submitting
novel chemical structures to hit novel targets and the FDA
sees a pattern playing out: we won't be able to predict those in lead optimization. As
much as 40-50% of attrition prior to first human dose occurs after candidate selection—
that's something we can do better. But you're going to pay a price. If you're going to drive
attrition down between candidate selection and first human dose, you're going to
drive attrition up at some earlier point. So expect to see that, and be ready
to accept it. By driving attrition down post-candidate selection, you will be weeding
those compounds out earlier; you will be accepting a rate of false negatives and
false positives, and you will see higher attrition earlier. So there is going to be a yin-yang
between knowing where you want your attrition to occur and where you don't want your
attrition to occur, and, not investing in attrition where you probably can't do anything
about it.