Session IV Transcripts

November 21st (PM) Session IV: Predictive Toxicology
Case History: Toxicology Biomarker Development Using Expression Profiling
David Watson (Eli Lilly)
Question: In my introductory comments yesterday I put up a benchmarking study that
was reported on by Scott Fillard last year at the Profiling Workshop we did. One of the
numbers that struck me there was that 44% of the failures in preclinical and clinical
development were due to toxicological reasons. In my opinion, it's going to be difficult
to reduce that number significantly in the foreseeable future because we don’t
understand the molecular basis of a lot of toxicology. I think you've also brought up
another point, and that is that there's a reluctance to move some of these more
molecular-based assays earlier and earlier in drug discovery because you may end up
killing compounds when, in fact, they don't deserve to die, because the assays aren't
that predictive. If you tell a chemist they've got a toxic event going on, that puts a black
mark over the compound. So, I guess I'd like you, as a toxicologist, to react to those
comments. What do you see happening at Lilly in terms of pushing these back into lead optimization?
David Watson Response: All compounds will be toxic at some dose. The question is
finding the margin of safety and determining whether that is acceptable for further
development. The qualitative nature of the toxicity, the patient population, and margin of
safety are key considerations. Will compounds that are toxic continue to be brought
forward? Definitely. Do we have databases that are sufficiently robust and chemically
diverse to be able to predict such toxicities in silico? No, not in my opinion. Do
we have good in vitro assays that can model the enormous number of toxicological
endpoints? No. What kind of in vivo biomarkers of toxicity do we have? Reasonably
good ones for necrosis in many organs, but for all the other types of endpoints they're
coming slowly. So, how do you deal with that? Similar to the approach being used for
biopharmaceutic properties, I think you have to have a prioritization in terms of what are
the key variables that you absolutely have to have built in and you start with those. If
you try to do everything all at once, it's going to be too challenging. Most companies will
look at the kinds of toxicological endpoints that have really caused problems with their
developmental molecules, and they'll find liver, probably renal, cardiac and neuro.
Those are the four main systems that have caused the great majority of compound
failures. I think most companies are investing in those four organs to develop better in
vitro and in vivo assays to at least identify those toxicities earlier. Developing better in
vivo biomarkers and in vitro models, and knowing when to employ them, are keys to
progress in Lead Optimization Toxicology.
Ron Borchardt Response: So, this goes back to the question I asked before. What
are the unmet needs? I think there are huge unmet needs here in terms of ultimately
being able to bring the 44% failure rate for toxicology reasons down to a more
reasonable number. Toxicology needs to become more and more molecular, and
validated HTS assays need to be developed so that they can be utilized in lead
optimization.
David Watson Response: I do think there will be many more molecular tools available.
They will be most useful to address known in vivo toxicities caused by the compounds
of interest, and will be applied using high quality in vitro systems. I've also tried to point
out here that there are a lot of business process changes that can be implemented
immediately. Conducting a small number of in vivo toxicology studies early can
characterize the toxicological liabilities of a platform and identify the challenges that lie ahead for
that platform. This allows toxicologists to either begin work resolving them, or to make
other strategic decisions, such as deciding to bring forward an additional or a different
platform. Those are simple things that anybody can do using existing resources, and
they have the potential for a big impact.
November 21st (PM) Session IV: Predictive Toxicology
Panel Discussion
David Watson, Paul Levesque, Alastair Cribb, Constantine Kreatsoulas
Question: I'd like to ask Constantine a question because we discuss this internally all
the time. If you look at the modeling of Ames positive, or any other mutagenicity assay,
and you're thinking in terms of 2000 compounds—that's a really big compound set
relative to what we might get for bile duct epithelial hypoplasia. So, with efforts to
model really complex biological interactions, what do you see the future
looking like? Do we have to develop more in vitro systems? Can we find other ways to
model complex systems in vivo? What would you look for?
Constantine Kreatsoulas, BMS-Merck, Response: The answer is yes. I think we
need more in vitro assays and better in vivo assays for the complex systems. I think the
problem is that the mechanisms underlying the endpoints are so complex themselves
that you need to chip away at different elements and model them separately and then
try to find good correlations between them. That's sort of what we're trying to do with
local models with the QSARs and then feed them back in to the larger global model.
Question: This question is for Dr. Levesque. When we talk about the QT interval and
hERG channel activity, in my company we have just incorporated a rubidium efflux
assay. It is a functional assay and also not so difficult to handle. We have just
started this, so I'm trying to get some information on understanding it. I assume you are
familiar with this assay? How do you look at this assay, and how reliable is it?
Paul Levesque Response: The question is what do I think of the rubidium flux assay
for screening hERG. I can answer that question generally for voltage-gated potassium
channels, but not for hERG specifically since we haven't used that assay to screen
hERG. I think the rubidium flux assay can be quite useful as a primary screen for
detecting channel inhibition and for some chemotypes, it can be useful even for
generating SAR. However, for some chemotypes we’ve seen big discrepancies
between patch-clamp and rubidium flux potency, so for those the flux assay was
misleading and not useful for SAR. If you can validate the rubidium flux assay using
patch-clamp for the series you're working on and show good correlation, you could
probably use it for generating SAR with some confidence. However, you want to make
sure it's predictive for the particular chemotype you're working in and you should do at
least a little bit of electrophysiology to determine whether there are potency shifts and
whether the rank order of potency for compounds is ok. But, in general, I think it's a
reasonable assay once validated and used appropriately.
David Watson Response: There's been a lot of work on methapyrilene; I suspect that
particular endpoint is the result of a metabolite. The best evidence for that is that it's
quite species-specific. The toxicity that's seen for methapyrilene is really not shared by
mice, it's not shared by hamsters, guinea pigs; it seems to be quite specific. That's
often a good indication that you've got metabolism, not a direct effect of the parent
compound. Now, your question really was whether toxicogenomics can address that for you.
The other thing was—you identify correctly that we're talking about a thiophene—but
thiophenes can be, like a terminal thiophene, monosubstituted as was the case for
methapyrilene, or they can be embedded in a structure. And they behave very
differently in those two contexts; the same thing goes for furans and indoles. Clearly,
it matters where within the molecule the structure occurs. Back to the toxicogenomics. There
are a number of companies that are starting to generate large databases for
compounds in a specific model. I believe at least one of these companies is at a point
where you can start to ask questions about specific structural motifs and correlate them
with endpoints. But, to the point that Constantine made and others have made, you
really need a lot of compounds, you need a lot of observations, and I don't think we're at
a point yet, but certainly we're getting there.
Question: This is a general question and it pertains to in vitro/in vivo, but, I think of the
hERG assay every time this comes up. A lot of the compounds that are hERG active
are lipophilic amines. So, I immediately start thinking about cell partitioning and its
impact on cell concentration as the appropriate referencing concentration for potency,
rather than what's in solution or what's thought to be put into solution. Is there any
evidence that surface activity, or cell concentration, correlates with hERG activity? And
then I'd like to extrapolate that beyond total concentration in vitro to free concentration in
vivo. Has anybody looked at what's actually delivered in vivo in any tox reaction and
made correlations there? In fact, what does the FDA think about protein binding
adjustments to free concentration?
Paul Levesque Response: As I mentioned in the talk, a number of publications show
convincingly that free or unbound drug exposure correlates best with hERG potency
and QT interval prolongation. This is consistent with the fact that most drugs have to
enter the cell to interact with the binding site on the hERG channel. However, I
probably should have stated that although free plasma concentrations are important for
most drugs, we have identified a number of compounds where that doesn't hold up very
well at all. In some cases where we have seen in vivo QT signals at exposures much
lower than we would have predicted based on hERG activity and factoring in protein
binding, it turned out that the compounds accumulated in heart tissue. So even though
we factor in protein binding when estimating risk potential from hERG potency data,
which is determined in a protein-free assay, and anticipated efficacious exposure, it's
with the caveat that sometimes the correlation between free plasma concentrations and
in vitro hERG potency is not always good. In addition, we have done some in vitro
patch-clamp studies where we’ve added protein to assay solutions and we’ve found that
adding protein doesn't shift hERG potency to the degree you'd predict based on protein
binding data for all drugs.
Jim Stevens Response: Just to add a simple-minded example for the audience. At
Lilly we have an assay similar to the one that Dr. McKim would have presented, but with
fewer endpoints, called the ABCL assay, for acute basal cell lethality. When we piloted
the assay using some compounds that were known to be human toxicants, there was
no correlation between administered dose to an animal and the rank order toxicity in the
in vitro assay. But when you extrapolated to what was probably or believed to be an
EC50 plasma concentration that led to toxicity, then you got a much better correlation.
So, obviously, understanding the exposure, even at a crude level, such as the plasma
level which doesn't really give you what's happening in the liver or another target organ,
gave you a much better correlation with the in vitro model.
Question: Specifically, are you aware of anyone—and this goes back to something
Lipinski said years ago with trying to correlate compounds that appear to have a surface
activity with cardiotoxicity—who actually looked at measured surface activity and tried
to correlate it with an effect on channel inhibition? Because that seems to be a more
efficient way to hit a channel if you recall Leo Herbette's data for calcium channel blockers.
Jim Stevens Response: I don't know of anybody who's done those calculations or measurements.
Question: I'm not in this area, I'm more involved in efficacy models and things like that.
I'm wondering—a lot of the things that you talked about for tox testing are really quite
late and is there any way to get an earlier read on a number of your compounds?
During an efficacy study we run a wide dose range and you could do clin path or
histology or whatever just to get a read on a number of compounds as to what
potentially could be problems.
David Watson Response: There are companies doing that, definitely. Often the
efficacy models are different strains of rodent than what toxicologists customarily use,
and the exposures need to be high enough to justify the resource investment. But
yes, this is a strategy that brings important information to drug development teams at
times earlier than is typically done.
Question: To go along that theme, is it appropriate to screen all chemicals for selected
toxicities prior to having in vivo data, or should one use in vitro models or
toxicogenomic models or molecular models to respond to toxicities that are observed in
in vivo models early in lead optimization?
David Watson Response: That's a great controversial question. I'll take one side of
that. There's a proliferation of the number of available assays and species that are
being used to assess toxicity, both in vitro and in vivo. But the question is, what's the
value of those data? My own perspective is that you're better off having some type of in
vivo data from rodents that allow you to specifically address a legitimate concern.
Otherwise you will create issues that are irrelevant. If you run only three assays, and
they are each 80% predictive, compounds that are clean will come up positive, on average,
about 50% of the time in at least one of the assays. The limitations of the assay just created an
issue that shouldn't exist for that compound. So the fact that none of our in vitro assays
are 100% predictive means that you're going to get false positives, and that can be
damaging in the sense that it occupies people's time and it takes up more resources. In
contrast, if you find something significant in vivo with histology or clinical pathology, then
you've got a more meaningful observation because it relates to the margin of safety for
your compound. This is more likely to justify resource expense on in vitro assay
development and implementation, because you'll have a clear use for those data:
avoidance of an in vivo issue that you've already identified.
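[Editor's note: the back-of-the-envelope calculation behind the "about 50%" figure in the response above can be sketched as follows. This is a hypothetical illustration that assumes the three assays fail independently, an assumption not stated in the talk.]

```python
# Sketch of the false-positive arithmetic from the response above: with
# three independent assays, each 80% predictive (a clean compound tests
# negative 80% of the time in each), how often does a clean compound
# flag positive in at least one assay? Independence is assumed here
# purely for illustration.
specificity = 0.80
n_assays = 3

p_all_negative = specificity ** n_assays   # clean compound passes all three
p_any_false_positive = 1 - p_all_negative  # flagged in at least one assay

print(f"P(at least one false positive) = {p_any_false_positive:.1%}")
```

The result, roughly 49%, matches the "about 50%" quoted in the discussion.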
Jim Stevens Response: Let me add to that. There are two ways you can use these
assays. One is to say, if we're going into this chemical space and we have 100
examples of that chemical space, could we run an in silico model that would predict a
higher probability of yielding, for example, a gentox result than another chemical space
we could go into. I think you're on a much firmer footing with a much higher sample
number. But when you try to say we predict that the activity of this single molecule will
be x, y, or z in multiple in silico assays, then I believe you begin to have a problem. So,
my bias is to look at probabilities in larger chemical space as opposed to trying to make
predictions about the properties of a single molecule that you might want to bring
forward in the lab.
Question: I'm going to put Alastair on the spot here. Over the last couple of years as
I've traveled around the industry, I've often been asked about the issue of covalent
binding, and the paper from Merck about the 50 picomole equivalent per mg. Is that a
number that you're comfortable with? If you were to go back into the industry, is that a
number you would select? How would you deal with that?
Alastair Cribb Response: I kind of view that number as a starting point. In other
words, we've not really done the research to back that number up. When I was at
Merck, we didn't have a number and we used to just look at whether covalent binding
was there. We did it more from the perspective of: we had a problem, and we were
going back to evaluate covalent binding. We weren't doing it prospectively. So, I would
not be comfortable with having that [50 pmol/mg] as a go/no-go
mark or using it in a rigid fashion. But I think as a starting point—to use it as a
reference point and go forward and start accumulating more information and have
things evaluated earlier—I think it's a reasonable starting point. I'm not sure of all the
things that went into arriving at that number. What I did when I saw it, was to go back
and look at some of the compounds that were out there, that we knew caused problems,
and we had some covalent binding information on them. And it [50 pmol/mg] was
reasonable, within the context of the caveat I said. I'm a little uncomfortable with using
just one concentration of a drug and then saying that's it. Because we know from many
of the compounds that were out there that you would not have exceeded that level using
the “Merck” paradigm and yet they do cause problems in vivo. So, I don't want to use it
in a rigid way, and certainly we can give examples where we saw no covalent binding in
a rodent species and yet we do know there is covalent binding in other species and we
actually end up with a clinical problem. So, you need, I think, some in vitro work, but the
in vivo is very useful. Bottom line: I'm ok with this [50 pmol/mg] as a starting point, but I
wouldn't use it too rigidly.
Question: This is a question for Constantine. In terms of chemists using the kind of
software you described, can you give us some idea of whether this is really influencing
significantly their decisions about what to make and what not to make, or do you view it
more as an educational tool?
Constantine Kreatsoulas Response: It's interesting; I think it really depends on the
chemist. Chemists who have taken the time to sit down and kind of understand where
this approach works and where it fails—they're the ones who use it most wisely. They
can judge for themselves the quality of the model because they understand what its
limitations are. With those chemists, they actually do use it as sort of a guide—as, “The
model said this was positive, we sent it into the assay, the assay said it was positive, so
probably these are the substructures I need to look at and understand.” That being said,
chemists do not want you to limit their space. They do not want you to “whack” a
compound. So, it's a fine line between losing an entire segment of scientists and
helping the process along. So, I've seen both sides of it, and I think the understanding
that you need to look at and evaluate this takes a little bit of time and a little bit of
patience. So, the population of chemists that use it wisely or use it at all is much
smaller than the population of medicinal chemists as a whole.
Question: One last question for the whole panel. It's been interesting to see how the
hERG issue has kind of changed the industry over the last few years; I think it's
because the FDA has gotten interested in it and so everybody does hERG assays.
From your perspective, what's going to be the next hERG in the next five years? What
seems to be popping up on people's radar screens as something that's so predictive of
toxicity that eventually the FDA is going to start writing guidances around it?
Jim Stevens Response: There are a number of issues popping up and there's publicly
available a recent guidance from the FDA for all companies that are dealing with
peroxisome proliferator-activated receptor compounds so I think that's an area where
there's a lot of activity and there are already a couple of nascent efforts forming up
consortia in the industry to deal with those issues. I think as we get into more and more
complex signaling pathways my prediction is that we'll begin to see some toxicities that
we haven't seen before because we're not only entering new chemical space but we're
also entering new biological space. So you see these signaling pathways operating in
multiple physiological pathways. I can't give you specific examples, but the prediction
would be that as we start to get more and more targeted molecules that hit these
complex signaling pathways, we will begin to see toxicities that we have not dealt with
before that will be target-mediated.
Question: I was intrigued by Paul's comment that there was a consortium put
together around the hERG issue, and you just brought up consortia again. How is it that
toxicologists can do consortia when the rest of the industry doesn't seem to be able to?
Jim Stevens Response: In general, I hope the industry does not want to take the
position that we will compete on safety; rather we will compete on efficacy. That
position provides a push for toxicologists to band together externally and address these issues.
Question: The 50 picomole number when I was at Merck was very soft, and we have
to admit that. I wasn’t directly involved, but I think they had examples like
acetaminophen that at 1000 picomoles/mg there was overt toxicity. They just divided
that number by 20 as a comfort level, a safety margin. I want to take a crack at Dr.
Borchardt's question. I may be naïve about transporters, but it seems to me that for
many years at least P450 people like me were extremely arrogant, thinking that our
enzyme systems that we were working with were totally irrelevant in terms of
gluconeogenesis, intermediary metabolism, bile acid salts and all the rest of it. What I've
seen in the last five years is a slow convergence of all the fields; the transporters are
now impacting on the DMEs, drug metabolizing enzymes; we're now realizing that the
transporters are involved in the efflux of intermediary metabolic components and if
metabolites in our drugs impede those transporters that could give rise to toxicity at
relatively high doses. We're now seeing that PXR may be related to gluconeogenesis as
well through various kinase pathways; I think that's the next big step. And folks like me
have been extremely arrogant for 15 years and I think it's time we were slapped down
by biology.
Alastair Cribb Response: It’s interesting, and it does make a difference whether
you're looking at intrinsic toxins or not. One of the issues with the idiosyncratic
reactions is that the density of haptenization of proteins probably plays a role—that's
certainly been shown from immunological studies. So that's where you get into an issue,
when you look at total binding vs. binding to one protein. You know, when you look at
the data in Merck's publications and listen to some of the discussions, they are
combining both intrinsic and idiosyncratic toxins, and coming up with the 50 picomole
guideline—but you have to be careful with that. It's interesting that both the FDA and
Health Canada are grappling with guidelines for hepatotoxicity prediction and they refer
to covalent binding and reactive metabolite formation—it always shows up in the
working documents—and it's going to be interesting to see how that plays out as they
go forward—whether people do try to firm up these numbers.
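[Editor's note: the derivation of the 50 pmol/mg guideline, as recounted in the exchange above (an overt-toxicity level of roughly 1000 pmol/mg for acetaminophen, divided by a 20-fold comfort margin), is simple arithmetic; a minimal sketch:]

```python
# Sketch of the safety-margin arithmetic behind the "Merck" covalent-
# binding guideline, as described in the discussion: a level associated
# with overt toxicity (~1000 pmol/mg, e.g. acetaminophen) divided by a
# 20-fold comfort factor gives the 50 pmol/mg reference point.
overt_toxicity_level = 1000.0  # pmol bound per mg protein
comfort_factor = 20.0

guideline = overt_toxicity_level / comfort_factor
print(f"covalent-binding guideline = {guideline:.0f} pmol/mg")
```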
Question: I'm not a toxicologist, but I'm wondering if you compared outcomes of all
these computational programs to maybe the Ames test and the test you described; I guess
the Ames test is the gold standard, but how good is that as a standard? Should we
continue to look at that? Obviously we cannot do carcinogenicity studies for all the
compounds, but is the Ames test still the standard to go for?
Constantine Kreatsoulas Response: Yes, the Ames test is definitely the standard
that the FDA looks for. There are certainly nongenotoxic carcinogens, so that's a
different issue altogether. And then there are issues such as chromosomal aberrations,
which the Ames does not address; the in vitro micronucleus is an assay that's used to
evaluate whether there's a likelihood of that occurring. So, yes, and a little bit of no, is what
the answer really is.
Question: I guess what I'm aiming at is if you're having false negatives and false
positives ultimately you're not sure.
Constantine Kreatsoulas Response: In terms of these in silico models, it's just a
question of trying to fold that in early on in the discovery phase. So if you can begin
partitioning the compounds intelligently, or as intelligently as we can given what we
know right now, that implicitly increases the throughput of the assays, which can give
you a true positive, so that's why we end up trying to limit the false negatives.
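[Editor's note: the triage logic described above, using an in silico filter to partition compounds so the lower-throughput assay sees an enriched set, at the cost of some false negatives, can be illustrated with a toy calculation. All performance numbers here are invented for illustration and are not from the talk.]

```python
# Hypothetical triage sketch: an in silico model with assumed sensitivity
# and specificity decides which compounds go on to the lower-throughput
# in vitro assay. All numbers below are invented for illustration.
n_compounds = 2000
prevalence = 0.10   # assumed fraction of truly positive (e.g. Ames-positive) compounds
sensitivity = 0.85  # assumed fraction of true positives the model flags
specificity = 0.75  # assumed fraction of true negatives the model clears

true_pos = n_compounds * prevalence
true_neg = n_compounds - true_pos

sent_to_assay = sensitivity * true_pos + (1 - specificity) * true_neg
false_negatives = (1 - sensitivity) * true_pos  # positives the model wrongly clears

print(f"sent to assay:   {sent_to_assay:.0f} of {n_compounds}")
print(f"false negatives: {false_negatives:.0f}")
```

Under these assumed numbers, only about 620 of the 2000 compounds need the assay, while roughly 30 true positives are missed, which is why the emphasis in practice is on limiting false negatives.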
David Watson Response: We often see people who are trying to model mutagenesis,
in particular, and I think it's really an important test piece for in silico models. The largest
dataset in Toxicology is for compounds run through the Ames test. Second, the test is
run in bacteria, as opposed to a strain of rodent, for example. And third, it's a yes-no
output. These attributes mean that until models for Ames results and similar GenTox
data are done extremely well, there is low probability of modeling in silico any of the
complex toxicological phenomena that occur in vivo, because the data sets are much
smaller, they are less diverse structurally, they are more complex biologically, and
you're going to have to take into account margin of safety. So you can bet that we've got
to succeed in that first one, which is the Ames test.
Prabhaka Jadhav, Eli Lilly, Response: I have two comments. The first one is how
medicinal chemists react to the predictive models. From my own experience and the
experience of my team, we do use these computational models to prioritize our
compounds, but we don't use them as a show stopper. For example, if we want to test
a hypothesis by putting a certain group in a certain location to block metabolism,
and the log P is >5, we still want to go ahead and make that molecule. The second
comment I want to make is the in vitro assays for toxicity are also very useful, but they
should not be used as a show stopper. As an example, in one particular series, one
assay suggested the compounds would be active, but when we went to the next level they were
completely inactive. So the first result kind of slowed the team down in a
way. In another case we submitted a series of compounds because the toxicology
suggested that there was a substructure that could have some potential to show
toxicity. It did come in as preliminarily positive in the toxicological study. The
whole team was discouraged, but we wanted to work at the next level and get an in vivo
data point and, sure enough, this toxicity that was predicted by the in vitro was there,
but we had a greater than 10^4 margin of safety.
Jim Stevens Response: Those are excellent examples of using data wisely. At least
at Lilly we look at the lead optimization toxicologists' job as not to stop teams from going
forward but to keep them going forward with their eyes wide open, so that when you do see
things playing out in a way that predicts a safety issue you can react very quickly. So, let
me bring this to a close by coming back to a question Ron asked earlier in the day. He
asked, "How low can we drive attrition?" I think there are some areas where attrition is
due to toxicology and there'll be very little we can do about it. Things that pop up in very
late phase, and that appear only when multiple pharmaceutical companies are submitting
novel chemical structures to hit novel targets and the FDA
sees a pattern playing out: we won't be able to predict those in lead optimization. As
much as 40-50% of attrition prior to first human dose occurs after candidate selection—
that's something we can do better. But you're going to pay a price. If you're going to drive
attrition down between candidate selection and first human dose, you're going to
drive attrition up at some earlier point. So expect to see that, and be ready
to accept it. By driving attrition down post-candidate selection, you will be weeding
those compounds out earlier; you will be accepting a rate of false negatives and
false positives, and you will see higher attrition earlier. So there is going to be a yin-yang
between knowing where you want your attrition to occur and where you don't want your
attrition to occur, and, not investing in attrition where you probably can't do anything
about it.