>> Bob Davidson: Okay. We'll go ahead and get started. First of all, thank
you for coming for all of you here and all of you in the television audience.
Today we have Ed Addison from TeraDiscoveries coming to talk to us about using
computers to design drugs. He's been doing a lot of work with Microsoft and
Azure and they've taken their work that they did in the universities and
they're now making a company of it and moving forward. He's been 25 years
working in this industry. He's a serial entrepreneur, it sounds like, and
we've worked with him for probably three years, I think, since I first met
you here at Microsoft, roughly three years ago, when you started doing some of
this stuff.
>>: Not quite. Close. Maybe two.
>> Bob Davidson: Two years and change. My sense of time is way off apparently,
because I'm having so much fun. It's been very interesting talking
with him, the several times I've met with him before, and we're going to have a
chance to hear what he has to say about rational design.
>> Ed Addison: Thank you very much. I'm going to talk to you today about
TeraDiscoveries. We're a small company of 15 people -- some full-time, some
part-time, some contractors, some employees -- a typical startup, quasi-virtual.
We have a business incubator where we're located, on Davis Drive in Research
Triangle Park in North Carolina.
Before describing what we're doing, let me take a minute or two to state where
we came from. TeraDiscoveries was a venture that wasn't quite planned with a
business plan; it evolved a little bit. A colleague of mine by the name of
Lawrence Hughes and I founded TeraDiscoveries, after working together for quite
a number of years through various ventures. He's a chemist and intellectual
property attorney, and I'm an engineer turned entrepreneur who spent most of my
career doing business development.
And a few years back -- I'm a graduate of Virginia Tech, so I'm a Hokie. A
few years back the dean of engineering at Virginia Tech reached out to me and
others who had venture experience. They asked us to come in and take a look at
some of their technology. They were doing a tech transfer push.
And at the time -- and this is before Azure and before the cloud was
overwhelming -- Virginia Tech had a system called System X, which was a
2,200-node cluster with a high-speed network connecting a bunch of Apple G5
servers together. Some of you may have heard of that.
They won some awards, and they were the number three or number seven
supercomputer in the world one year, but they gradually slipped because they
didn't go beyond that.
When I saw that, I asked the dean what they were doing to commercialize it. He
said nothing. And I didn't really think we were going to sell anything like
that. But I suggested that the pharmaceutical industry needed to start doing
high performance computing on an outsourced basis.
And I had agreement about that from a major pharmaceutical company, with whom
we did our first engagement. What we did is we formed a consulting company that
lasted a year or two. Then along the way I was making a presentation at Duke
University, where I was approached by their tech transfer director, who said,
'I have the perfect piece of software for that big machine you've got.'
This was inverse design, a computationally intensive drug design tool which I'm
going to talk about today. And so we ported inverse design to System X. It had
worked at Duke University on a 16-node cluster; on System X it worked okay, had
a few memory problems. But very quickly the cost was dropping on the cloud, and
Amazon was where we went next. We ported to Amazon because the big cluster we
started out with was not really being maintained in a commercial way.
And the cost, even though it was an academic price, was higher than the cloud
by 50 percent or more. After we ported to Amazon, we had an internal discussion
and decided that we were going to need a lot of computing cycles -- not to do
our software as a service, but because we had a bigger vision: we wanted to
preemptively compute potential drugs for many proteins from the Protein Data
Bank as a way of shortening drug discovery. So we identified the resources of
big cloud computing -- that was Google, Amazon, and Microsoft -- and the
relationship that stuck was Microsoft.
I met Todd Needham a couple of years ago, and we began this discussion. And so
our stuff runs on Azure, and we aren't using the other clouds at this time. We
are focusing -- I guess that's a compliment for Microsoft.
So what do we do? We use proprietary software and cloud computing to design
drugs. If you're a chemist, that probably does a little violence to you: we
don't really design drugs, we design drug candidates, because we don't do the
clinical research. But it does the design quite well.
And we use a method called QM/MM -- if you're a scientist, quantum mechanics
and molecular mechanics. You do a quantum mechanical model of the small
molecule or the peptide, and you do a molecular mechanics model of the protein,
which makes it a lot more accurate than older methods, specifically DOCK, which
did all molecular mechanics. That was a cruder, less accurate approach; even
though it was faster, it didn't produce really good results.
Now, that's not all there is to it; that's just the binding calculation. The
problem is that molecular space is 10 to the 65th big. So you don't want to
enumerate QM/MM calculations, which take about eight hours using pDynamo on a
single node, or you'd be waiting several lifetimes or more for the answer.
So what we have done is we've built and designed a heuristic search algorithm
through molecular space. We constrain the space by the properties of the
protein so that we can search virtual space, including molecules that have
never been synthesized before, and only test some of them. You know what a
hill-climbing search is -- you're Microsoft. So that's essentially what it
does. To describe inverse design -- I've already talked about some of this.
We're searching and computing binding energy, but we're doing a lot more; we're
using additional filters. We model water, whereas a lot of the earlier in
silico techniques modeled the molecules and paid no attention to their
properties in water.
We look for a lock-and-key fit -- in other words, a very good binding score --
and we'll optimize that as we search through space. This is all based on a
worldwide exclusive license from Duke University and our enhancements to
inverse design.
It now runs on Azure. We're still testing a few things, but we have modeled
several proteins. We've done the calibrations. We're starting production runs
this month.
We have one patent issued on the heuristic search as applied to chemistry, and
we have more intellectual property being filed. Our partnership with Microsoft
is to use Azure. We've done runs with 500 nodes, and we're trying to up that
now to about 1,200 nodes. It could go even higher than that, but there's some
iteration in terms of the scores, where if we do too many at once we might be
less efficient with the computing. What we do is we launch, say, 1,200 QM/MM
calculations, get the 1,200 scores back, and use those scores to determine
which 1,200 to do next, based upon the heuristic hill-climbing search. We have
to do several dozen iterations, and each one is eight hours, so several dozen
of those is a few weeks for a given protein. Not running continuously for a few
weeks -- it could run continuously -- but we go from long calculations to score
update, repeat, and so forth.
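To make the shape of that loop concrete, here is a minimal runnable sketch in
Python of the launch / score / update / repeat pattern. Everything in it is a
toy stand-in: a "molecule" is just a tuple of R-group choices, and the scoring
function fakes the eight-hour QM/MM binding calculation, since the real scoring
and search heuristics are proprietary.

```python
import random

# Toy molecular space: M_SITES R-group positions on a scaffold,
# N_GROUPS choices per position (values picked for illustration only).
M_SITES, N_GROUPS = 10, 50
HIDDEN_OPTIMUM = tuple(random.randrange(N_GROUPS) for _ in range(M_SITES))

def score(mol):
    """Toy surrogate for the binding score: counts sites matching a hidden
    optimum. The real system computes a QM/MM binding energy instead."""
    return sum(a == b for a, b in zip(mol, HIDDEN_OPTIMUM))

def mutate(mol):
    """Swap the R group at one site: a single step through molecular space."""
    i = random.randrange(M_SITES)
    return mol[:i] + (random.randrange(N_GROUPS),) + mol[i + 1:]

def batch_hill_climb(batch_size=1200, iterations=40, keep=120):
    """Score a batch, keep the best, and derive the next batch from them --
    the launch / score / update / repeat loop described in the talk."""
    batch = [tuple(random.randrange(N_GROUPS) for _ in range(M_SITES))
             for _ in range(batch_size)]
    best = []
    for _ in range(iterations):
        ranked = sorted(batch, key=score, reverse=True)
        best = ranked[:keep]
        # Each score is an independent job, so in production the whole
        # batch runs in parallel on Azure nodes.
        batch = [mutate(random.choice(best)) for _ in range(batch_size)]
    return best

print(score(batch_hill_climb()[0]), "of", M_SITES, "sites optimal")
```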
And so, a little bit of technical detail. This is an illustration of the idea
that we choose the best molecule in the database, but we don't calculate the
binding score for every molecule in the database; we're looking for local
maxima in the scores.
So you might have a surface. This is a simplified illustration, because the
dimensionality is really higher than this. But with inverse design, it's an M
times N computational cost, as opposed to enumeration, which is an N to the Mth
power cost, where M is the number of sites and N is the number of groups per
site. Groups per site has to do with how big the molecular space is. A group
is -- I guess some of you in this group may be chemists -- if you put an R
group on a molecular scaffold, that's a group. You can put 10, 20, 30 of those
on a scaffold that's typical of a protein, and that defines a rather huge
space.
Then M is the number of sites, which you could think of in simple terms as the
number of proteins -- but each protein has multiple binding pockets, so it's
really the number of binding pockets.
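To put rough numbers on that M times N versus N to the Mth comparison, here is
a quick back-of-the-envelope calculation in Python. The site and group counts
are assumptions picked for illustration, not TeraDiscoveries' actual
parameters; note that M times N lands near the size of one 1,200-job batch
from the talk.

```python
M, N = 20, 50          # assumed: 20 scaffold sites, 50 groups per site
HOURS_PER_SCORE = 8    # per-node QM/MM cost quoted in the talk

enumeration = N ** M   # score every combination: N^M calculations
heuristic = M * N      # calculations per heuristic search iteration

print(f"enumeration: {enumeration:.1e} calculations "
      f"(~{enumeration * HOURS_PER_SCORE / 8766:.1e} node-years)")
print(f"heuristic:   {heuristic} calculations per iteration")
```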
So, just to compare inverse design with prior methods: you'll often hear
old-school biochemists, who are primarily wet-lab people, say, oh, that stuff
doesn't work. What they're referring to primarily is DOCK, which was, for many
years, a system of only molecular dynamics, no quantum mechanics.
It was used because QM/MM was too computationally expensive -- and it still is,
if you enumerate. But with the cloud we got one speedup, with the heuristic
search we got a substantial speedup, and with Moore's Law we got another
speedup. Five years ago, this really wasn't practical.
So docking is very fast, but its accuracy is about 20 percent. What that means
is that if I get docking scores for five molecules, one of them will be correct
and the other four may not be.
And so the attitude of the chemist is, okay, so I have to synthesize five
molecules to get one good one -- not so bad compared to screening 10,000.
Except I'll tell you in a moment why this isn't good enough, even though it is
20 percent.
The free energy calculation is where you do quantum mechanics -- solving the
Schrödinger wave equation for binding -- and you enumerate instead of using the
smart methods; the problem is that it takes years to do the calculation.
So with inverse design, using the AI search and the cloud, we're accurate
greater than 80 percent, demonstrated multiple times, in terms of binding
prediction. And we call the search novel, because we're searching molecular
space; a typical drug discovery project using robotic screening is screening
existing molecules.
So you hear stories like, with small molecules, all the good ones are taken.
You may have heard that if you've talked to chemists in the drug discovery
business. All the good ones are taken because they've only looked at the ones
they've already built or already synthesized -- the ones they have in their
refrigerators or libraries, and variations of those.
So we have greater novelty and greater accuracy. But let me give you an example
of why this accuracy is even more important than it appears to be -- in other
words, why the chemist's story of 'we only have to synthesize five molecules
and we get a good one' isn't quite where we're at.
So, a protein that's getting a lot of interest is JAK3, because of its
indications in inflammation and rheumatoid arthritis -- huge markets -- and
we're running inverse design on JAK3 now. But it's really not good enough to
just have a JAK3 inhibitor with good druggable properties, because there are
other JAKs -- signaling proteins like JAK1 and JAK2 -- which you do not want to
inhibit even though you're inhibiting JAK3.
So the problem is more complicated if you have to be selective. We were asked
by a potential customer: can you give me a JAK3 inhibitor selective against
JAK1 and JAK2? We said we believe we can, and we've set out and are doing it
now.
If you look at the accuracy scores: to come up with a JAK3 inhibitor selective
against JAK1 and JAK2, you take the .8 or .9 -- we'll use .8 to be
conservative -- to the third power, because you have to run it against three
protein models, not just one.
So that says about 51 percent -- just call it 50 percent -- of our results
should be accurate. If I do it with DOCK, 20 percent to the third power is less
than 1 percent. So you can't design selective molecules with docking; the
selectivity will kill you because of the low accuracy. Yes, if all you care
about is binding -- but binding alone doesn't make up a drug. You need binding,
you need selectivity, you need good druggable properties, and you need low tox.
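The arithmetic behind those figures is just compounding the per-model accuracy
across the target and the two anti-targets, assuming the errors are
independent. A few lines make it explicit:

```python
def pipeline_accuracy(per_model, n_models=3):
    """Chance that all n_models binding predictions (the target plus the
    anti-targets) are correct, assuming independent errors."""
    return per_model ** n_models

print(pipeline_accuracy(0.8))  # ~0.51: the "call it 50 percent" figure
print(pipeline_accuracy(0.2))  # 0.008: docking accuracy, under 1 percent
```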
And we're not a tox software company, but we use tox software filters, so our
output library eliminates molecules that have really bad tox scores. There's
plenty of room for improvement in that tox software, but it is getting better.
Yes?
>>: Docking is fast, so you could have it almost for free compared to what
you're doing? Can you combine the two, even though docking is a less accurate
score?
>> Ed Addison: There are smart ways you can combine the two, depending on what
you're trying to do. That's a good question, and I'm not sure we've fully
exploited that yet. But we're interested in selectivity, because inhibitors or
agonists alone aren't enough.
Again, to be a molecule worthy of going through preclinical research, it needs
to be an inhibitor, it's got to be selective, it's got to be low tox, it's got
to be synthesizable, and it also has to have good druggable properties. So we
need to use filters to do that. And medicinal chemists have to like it -- in
other words, it has good clearance properties; it's not going to go in there
and clear out right away. So in effect --
>>: Have you looked at the correlation between the errors of docking and the
errors of the new method? Because if they're completely correlated, then of
course there's no extra value in using docking also. If they're completely
uncorrelated, there would be good value.
>> Ed Addison: That's a good question. Personally, I don't know the answer.
Sharkine, our chief scientist, would probably know, and I will follow up and
respond to that, because that's a very good point.
So this is how inverse design is configured conceptually. The workhorse part is
the binding calculation, which we're using pDynamo for -- very good open source
software. But our value added is not the computation of the Schrödinger wave
equation in pDynamo; it's selecting which molecules to compute it on.
The way this all begins is that we have a binding affinity equation applied to
a target. You start with the target, which is an X-ray structure; we build a
computer model from the X-ray structure of the target, which is a protein.
Step one is to calibrate the inverse design algorithm based upon any published
data for any inhibitors whatsoever for that target.
This setup process is still a little bit manual. We're building automation -- a
combination of expert systems and automated algorithms -- to take the person
completely out of the loop. This part here is completely automated on Azure.
The setup process takes anywhere from a couple of days to a couple of weeks per
protein, but once it's fully automated it will be down to hours, if not less.
Library design is a process. We're not searching completely blind molecular
space, because it's 10 to the 65th big; instead you can choose a smart scaffold
that fits the binding pockets of that protein. We are in the process of writing
an automatic library designer, but today it's still based on mining the
literature.
So all this setup, again, takes a couple of days to a couple of weeks per
protein. We expect to shrink it down to hours. We've got Barry Hobbs, one of
our computer scientists, working on that right now; it's her main mission to
have that down to hours, not days, by this time next year.
Property filters are where we use third-party software to eliminate molecules
that are bad on other properties. If they've got predictably bad tox -- which
is not our core competency, but other people do that -- then why consider them?
Solubility -- or, rather, synthesizability -- can be estimated from properties,
and those kinds of things go into these filters. And then we do the iterative
runs, where you use up to X molecules at a time. We were doing 500, but we're
going up to 1,200. I'm not sure we've got our limit raised to 1,200 yet, but
we'll find out soon.
You do a run on that many simultaneously -- it's embarrassingly parallel. Each
one is running a QM/MM process or QM process and giving the scores back. Then
it iterates: we do it again and again, several dozen times, and we come back
with a new chemical entity or a small, highly focused library of several
chemical entity possibilities.
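Because each score is an independent job with no cross-talk, the batch maps
directly onto a pool of workers. Here is a minimal local sketch of that
dispatch pattern, with a toy scoring function standing in for the real QM/MM
jobs and the Azure scheduling:

```python
from concurrent.futures import ProcessPoolExecutor

def qm_job(molecule_id):
    """Stand-in for one independent QM/MM scoring job; in production each
    of these is a long pDynamo run on its own Azure worker."""
    return molecule_id, sum(map(ord, molecule_id)) % 100  # toy score

if __name__ == "__main__":
    molecules = [f"mol-{i}" for i in range(1200)]
    with ProcessPoolExecutor() as pool:
        # The jobs share no state and never talk to each other, which is
        # exactly what "embarrassingly parallel" means.
        scores = dict(pool.map(qm_job, molecules))
    print(max(scores, key=scores.get))
```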
And we're doing this primarily for small molecules today, but we expect to do
it also for peptides. Short peptides, not proteins: 10 to 20 positions. If you
get any bigger than that, the computation gets ridiculous, because there are 20
amino acids per position to consider. Whereas if you're using small molecules,
you can limit the number of R groups you vary, and the number in each group set
can be substantially less than 20.
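The 20-amino-acids-per-position point is easy to check: the peptide space is 20
raised to the number of positions, so it explodes fast. A two-line
illustration:

```python
AMINO_ACIDS = 20  # choices per peptide position

for positions in (10, 15, 20, 30):
    print(f"{positions} positions -> {AMINO_ACIDS ** positions:.1e} peptides")
```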
So this came from Duke University. It was validated with HDAC8, a protein that
was done at Duke: a new molecule was designed and synthesized, and literature
data was used to validate the binding scores. That is where we got a result of
about 80 percent.
We've also done it, since it was licensed from Duke, for JAK1, JAK2, and JAK3,
and the correlation to the literature is good.
We're getting ready to do the big production run on JAK3 -- the selective
one -- and for that we expect to hire out a chemist to synthesize the results
we get, so we can get some wet-lab correlation validating it even further than
what was validated at Duke.
So I've already pointed this out: the benefits are the accuracy and the
speed -- speed relative to the free energy calculation, not speed relative to
docking --
and novelty. And novelty, we think, is really important for the reason I
mentioned earlier. You still hear people in the industry saying, well, all the
good small molecules have been taken.
Well, if the space is 10 to the 65th big, how can that be true? It's just that
they use the same ones all the time.
So some of the business propositions that we are experimenting with in the
market now are as follows. We're an early-stage company that has primarily been
in development; most of our revenue today -- in fact, all of our revenue to
date -- has been consulting and services.
Option one -- think of this as a customer's option -- is a single-target
discovery project, which may have a total price tag of 50 K or higher depending
on the complexity. And we would ask for a royalty if it ever goes to market, or
milestone payments, but ones that are much smaller than what a biotech
typically asks for developing a single molecule.
Option two is to license molecular libraries. In our project with Microsoft, we
are doing 25 targets that we select, developing small focused libraries -- a
library being maybe six to 12 molecules big -- of our best results.
And we make those available: if someone wants to go forth with them, they can
buy them or license the rights from us, or we can collaborate with them and
raise money together for the project, to take it through the clinic on an
outsourced basis.
And this is what our agreement with Microsoft has called the speculative
business, because this is where we choose the proteins in advance, work on
them, and then look for partners for the results.
We have a partner in Philadelphia called Numota Technologies, who has a
database, and it uses SQL. The database matches molecular assets to anyone in
the world who is interested in those molecular assets, either to license, to
partner with, to research with, or from a market perspective.
So one of the ways we're going to find partners for the work we're doing
together is through their database.
The third option is an R&D partnership. If we find a target that's of interest
to a big pharma company and we achieve early results, then we will seek a
partnership where they fund preclinical development together with us, or we
pass it on to them, depending on what their preferences are.
And option four is to license inverse design for internal use. We haven't done
that yet. I think we're going to wait about 12 months, until we get more
experience with it ourselves and make it more foolproof, and then make it a
high-end Azure application that we'll train a company to use.
>>: Is that an option due to pharma's concerns about complete and utter
privacy?
>> Ed Addison: It could be. And if that happens, what they're going to want is
a private Azure. How do I say that? Azure. Okay. Now I know why I'm
schizophrenic.
So that might be something that is more in Microsoft's world: what happens if a
pharma company wants the cloud internally? How do you solve that problem? Do
you just sell them a monster machine inside their firewall? If so, then we can
port the software over there, get a contract with them, and give them a license
to do all their proteins or one of their proteins, whatever they want to do.
So we're a little bit opportunistic about the business model. I think there are
going to be some changing dynamics in this market, and we don't claim to have a
good enough crystal ball to know which of these is going to be the stable
business model or the driver. So we're going to spread our bets and be nimble.
As the market evolves and this matures, it may be that we zero in on just one
of these as our primary business model.
But for now we are going with the flow, and I don't think anybody in the market
knows what the market's going to be; as the blockbusters move toward
personalized medicine, there are going to be lots of changes. So far for
marketing, we've been to Bio-IT World and also to the Boston biotech CEO
meeting. We want to market inverse design to large pharma companies.
And these are not customers; I just put their names down as examples of the
kind of clients we would like to have. We would also like to partner with other
Microsoft partners.
I named a couple here because their software might be compatible, either for
improving the speed of what we're doing or as a high-end option for our
customers.
And we would like to market the products -- when I say "products," I'm speaking
of molecular products that come from the speculative business -- using one of
those business models that I mentioned. And that will carry forward across the
business.
We have some other bioinformatics capabilities that I would like to mention.
And I also would like to talk a little bit about some of the things we're
interested in.
PDB, the Protein Data Bank, is a database of the National Institutes of Health,
and it's also worldwide -- the European Bioinformatics Institute also hosts
PDB. It has been ported to Azure -- not to the data marketplace yet, but it
will be soon. One of our developers has ported it. It's in SQL Azure, and we
can do full SQL queries on PDB -- more powerful queries than you can do on the
public PDB site. For instance, you can find molecules that have certain kinds
of properties. Maybe you want a molecule that has three zincs close together.
You can't find that in the public PDB now, but if you have SQL you can.
And as a follow-up, we're going to provide you with some queries that we think
are unique to the SQL Azure PDB.
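As an illustration of the kind of query that's hard on the public PDB site but
natural in SQL, here is a sketch of the three-zincs example in Python over
ODBC. The table layout, column names, distance cutoff, and connection details
are all hypothetical, since the actual SQL Azure PDB schema isn't public:

```python
import pyodbc  # ODBC route into SQL Azure

# Hypothetical schema: one row per atom, with Cartesian coordinates.
THREE_ZINC_QUERY = """
SELECT DISTINCT a1.pdb_id
FROM atoms a1
JOIN atoms a2 ON a2.pdb_id = a1.pdb_id AND a2.atom_id > a1.atom_id
JOIN atoms a3 ON a3.pdb_id = a1.pdb_id AND a3.atom_id > a2.atom_id
WHERE a1.element = 'ZN' AND a2.element = 'ZN' AND a3.element = 'ZN'
  AND SQRT(POWER(a1.x - a2.x, 2) + POWER(a1.y - a2.y, 2)
           + POWER(a1.z - a2.z, 2)) < 10.0
  AND SQRT(POWER(a1.x - a3.x, 2) + POWER(a1.y - a3.y, 2)
           + POWER(a1.z - a3.z, 2)) < 10.0
"""

conn = pyodbc.connect(
    "Driver={SQL Server};Server=example.database.windows.net;"
    "Database=pdb;Uid=demo;Pwd=demo")  # placeholder connection details
for (pdb_id,) in conn.execute(THREE_ZINC_QUERY):
    print(pdb_id)
```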
>>: [inaudible].
>> Ed Addison: About 100,000 proteins and about 300 gigabytes, in that range.
Our staff has both computational chemists and bioinformatics people as well as
software folks. And one reason for being interested in PDB is that we can draw
the X-ray structures from it to feed inverse design.
But along those lines -- and I had several discussions today here -- we're
interested in finding a natural language search capability, a semantic search
capability, to complement the platform. Given adequate capital, we would like
to compute all of the targets in PDB in advance of anyone doing drug discovery
on them, just so that we shorten the drug discovery cycle, for small molecules
and peptides at least. But not all proteins in PDB are suitable as targets.
Many of them would never be a target, because they're either not human or
they're not part of a pathway of physiologic relevance. So by having a
literature extractor -- natural language processing focused on biomedical
literature -- we can identify which 10 percent of the PDB are possible targets,
because they were found in pathways that molecular biologists have identified
in their research. That's one literature extraction problem. We have another
semantic search problem, and that is that as we produce these molecules, we
need to do patent searches. And patent searches are not as simple as text
searches, because we need the semantics to model the properties of chemistry in
those queries,
so that we can find whether there are molecules or structures that we might be
in violation of if we try to sell these molecules, and eliminate those.
Now, that's not as good as an IP attorney doing it, but we're doing two-step
IP: the first step is the automated filter, and the second is, when we have an
interested customer, we have a real IP attorney look at it. That way we're not
paying attorney's fees for every molecule, but only when there's a customer and
the molecule has been filtered in advance.
So we have a need for semantic searching and literature extraction in the
biomedical literature, and we've already started some discussions along these
lines. We're exploring a search engine, and one of your staff is also looking
at what you're all doing in natural language to see if there's a fit there.
And so let me summarize where our status is. The SQL Azure PDB is ported but
not yet released on the data marketplace. We have some productization work to
do, such as a user's manual, a privacy policy, and a license, and that stuff is
being worked on. And we need to do a little testing. We'll probably have it out
there for free for a little while -- maybe by Labor Day.
Inverse design is ported and debugged, although we found some new things that
we had to do this week. It does the heavy lifting. We need to do more front-end
automation before it's released as a piece of software that we can license
without hand-holding, but that's a goal.
And the first six -- actually, the first seven -- targets have been identified
and calibrated, and the JAK3 production runs are ready to go. We are, again, 15
full- and part-time folks, quasi-virtual. We have incubator office space, but
half of our people are not in North Carolina, so obviously we're not all there
every day, all day.
But we use the incubator as a place to meet customers and to have group
meetings when needed, if we're not doing it online.
We're also raising a round of funding. We have an interested investor, and
we're looking to add to that. We're expecting that it will come to a conclusion
in the next couple of months.
So I have a chart that I call the holy grail, and I've already alluded to this.
The holy grail is that we would like to reduce drug discovery to a simple SQL
lookup or search lookup. Now, that's a long way off, but there are significant
steps toward it that we can take. So we precompute inhibitors for all
promising-looking targets in PDB; that's the immediate goal.
That's an expensive computational proposition. We are doing 25 right now, and
we're shrinking the time and trying to do as many smart things as we can. We'll
look at your suggestion about combining in docking, to see if we can get any
savings there.
But ultimately, take the 100,000 proteins in PDB: we want to choose maybe five
to ten thousand of them to precompute this for. And we said in an earlier slide
it's a 50 K engagement -- but that includes profit and markup and people we're
cutting out.
So it may really only be $10,000 worth of cloud time per protein, and that will
come down as costs come down. However, it's still a 15 or $20 million
proposition, so we have to raise money to do it.
And our intent is to raise money for some of that, get customers to pay for
some of it, and maybe get some of it from government sponsorship -- and over
time roll up enough money to do this first step toward the holy grail. We'll
take the X-ray structure from PDB, or from wherever it comes if it's a
proprietary protein.
We're working on what we call the automatic scaffold designer. That has to be
done before we do this in volume; it's the part that's people-intensive that
we're automating, and the methods to automate it have been identified.
So it doesn't require a scientific breakthrough; it just requires more work.
And what we want to do is compute this big inhibitor library, so that one of
the things you'll do in drug discovery is look up and see what inhibitors are
already available. If a customer wants them, we sell them and take a
royalty -- whatever business model it leads to as part of our business.
So thank you for your time.
And we can do some questions if there are any.
>>: A couple of points I don't think you spoke to -- this isn't just for drug
discovery but for material science as well?
>> Ed Addison: I neglected to mention that. This inverse design technique
originally came from material science in the chemistry department at Duke, and
then Duke got grants to do it for drug discovery. Sharkine, our chief
scientific officer, was at Duke at the time as a post-doc; she did the original
design for the drug discovery. But with some changes we can use it to design
materials. We haven't done that yet. I think what we would do initially is
offer it as a service, if we find an interesting project or a customer who is
interested in doing that, so that we can go back and optimize the materials to
certain properties.
What inverse design does is maximize the score on a property. In the case of
drug design, that property is binding; in the case of materials, there are
other properties that people are interested in, and you have to change the
property equation. So there's some testing, and some changing of the scope and
size of the problem, that would have to be done. But we would be interested in
branching out into material science as well, because it has a different risk
profile, in a business sense, than drugs do. Drugs are low probability of
success but big money when they succeed, whereas material science would
probably be a little bit more stable. So the two businesses might complement
each other well. But we're small and focused at the moment. Yes, we would be
interested in that.
So if you know of others in your community interested in that problem, we would
certainly like to have conversations.
Any others?
>>: [inaudible] so in the transition from your proprietary cluster to Amazon to
Azure now, has it been easy? Is it --
>> Ed Addison: We weren't really all that far along with Amazon. We had only
done a couple of runs when I first came in here, and then we were given some
Azure time.
It was a little bit challenging for our folks at first, because they're mostly
Linux-type C programmers who didn't know the Microsoft platform all that well.
But we recruited a guy from Florida -- the one who did the PDB work -- a very
strong database guy and good software engineer who had experience with Azure.
He basically coached our staff through some of it, and we got some good tips
from Microsoft people, too. But they had to go through periods of not knowing a
bunch of things; it was a lack of familiarity with the Microsoft platform as
developers. And so we brought Eric in, who did the PDB work, and he's helped
the others -- Terry and Bill and Sharkine -- overcome the 'we don't know the
Microsoft platform' problem. I think we're mostly past that hurdle.
>>: So rather than just a learning curve that you're coming up on, do you think
the capabilities, the things you're going to try to accomplish -- you said
embarrassingly parallel. I tend to use the term pleasingly parallel, because
I'm not embarrassed about parallel at all.
>> Ed Addison: I'm not either, but that's the term that technical folks like to
use. I'd rather have --
>>: Yeah.
>> Ed Addison: Then it cuts some costs down.
>>: But are you finding that you've got access to everything you need, or are
there issues you still have with the platform?
>> Ed Addison: It's taken me a while to learn who to call at Microsoft, but
I've gotten more comfortable with that now. Different people at Microsoft have
different ways they respond to messages: some will respond to e-mail, some
might need a calendar appointment, and somebody might need a text. So you have
to kind of figure that out -- okay. That was one challenge, a nontechnical
challenge. Getting the people trained was a challenge.
We've run into a couple of technical barriers on Azure that your staff was very
helpful with also. And we're working to find the best data centers for these
big jobs. We ran a job on a Friday night once, and it took a long time to queue
up -- not that long; since this is being recorded, I won't say the numbers.
But I think it's gotten better since.
>>: Okay, thank you.
>> Ed Addison: And we're going to follow up by having some of our technical
folks, especially on the PDB, talk to your SQL Azure folks.
Any others?
>>: [inaudible] how much manual effort does it take to do one of these
problems?
>> Ed Addison: The part that's manual is, once we choose a protein, you have to
go get the X-ray structure and set up a computing file. That doesn't take too
long, but it's not fully automated. The parts that are a little more
challenging -- there are two of them. One is calibration, where we have to go
to the literature, grab any data that's available, pull it out, and run a
calibration run for that protein.
And if we could do that with the literature extractor, we could largely take
the person out of the loop on that.
And the other one is scaffold design. The scaffold is largely based upon the
binding pockets of the protein. We don't want to just do blind molecular
space -- it's too big. So we have to ask: are we doing a peptide or a small
molecule? There's no scaffold in the peptide; we just have to decide how long
it is. For the small molecule, you may have some parts that are going to be
fixed and many parts that are going to be varied, and you have to decide how
big you want that molecule to be.
And the literature can give pointers to that, along with chemist's intuition.
This is the hardest one to automate that we're working on; it's going to be
part expert system, part extraction, and part assembly. However, those are not
why we're using Azure -- the reason we're using Azure was for the QM part;
that's the real number crunching, where we need a lot of parallelism. This
automation is more along the lines of: if I want to queue up a lot of proteins,
I want to get people out of the way so I can streamline them, rather than
having a two-week delay for each one, or having multiple people in parallel for
each one. You've got to get that labor down.
And also, to release this as an application for customers, I think we really
have to make that setup much simpler than it is now. It's not terrible; it's
just not as automated as the rest of it. We focused on the heavy-lifting part
first.
So that's just where we are. There are some hard problems in there, but not
impossible problems.
>>: [inaudible] archive?
>> Ed Addison: Well, what literature do we search when we set it up? Usually
PubMed. We want to know what biochemists and molecular biologists have found
when they've either done binding experiments or done pathway analysis,
depending on which problem we're looking at.
The pathway analysis is really more to determine: is this protein potentially a
target? If we're doing preemptive calculations, we don't wait for a validated
target; if we're doing a service for a customer, they'll bring us a validated
target. So there are two different models there.
And the project we're doing with Microsoft is the preemptive stuff. The ones
we're handpicking initially are validated targets coveted by the industry, but
when we get to looking at the bigger piece of the database, we'll have to have
smarter methods of selecting.
>>: Could you tell me a little bit about the calibration process that you do?
>> Ed Addison: It takes a set of binding data for anything which has been bound
to that protein, and uses it to tune the parameters in the algorithm. So
it's --
>>: An operation --
>> Ed Addison: No, but --
>>: -- a .6 gets mapped to a .75?
>> Ed Addison: No, but there's a problem that does it. You just have to find
the inputs that you want to use. And if you want the detailed science of that,
I'd have to set up a call between you and Sharkine, so she can share what's in
the algorithm. But the algorithm exists; it's a matter of finding the data that
we're going to put in it.
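The detailed calibration method is proprietary, but the shape of the step is
familiar: fit a map from computed scores onto published experimental
affinities, then report new scores through that map. A minimal sketch under
that assumption, with made-up numbers standing in for literature data:

```python
import numpy as np

# Hypothetical published data for one target: raw computed binding scores
# paired with measured affinities for known inhibitors.
computed = np.array([-42.1, -38.7, -51.3, -45.0])  # raw model scores
measured = np.array([-6.9, -6.1, -8.8, -7.5])      # experimental values

# One plausible calibration: a least-squares linear map from the computed
# scale onto the experimental scale.
slope, intercept = np.polyfit(computed, measured, deg=1)

def calibrated(score):
    """Report a new computed score on the experimental scale."""
    return slope * score + intercept

print(calibrated(-48.0))
```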
And the same would be true if we did material sciences: we'd have to seed it
with anything that was known. Now, it's possible to do this without
calibrating, but it takes a two-step process. In other words, you find a
homologous protein and get some of that data, run with that, and take the
results. Then you have to synthesize, get the binding data, and run it
again -- unless you get really lucky and it's good enough, and then you don't
have to do it again.
>>: [inaudible] does calibration affect the sort that you would do
afterwards -- what binds strongest and what binds weakest? Or is it only when
you want something that matches this and doesn't match these that you really
need calibration, because you have to be able to combine --
>> Ed Addison: No, that's selectivity. The selectivity is done by running the
result set against other protein models. All the proteins have to be
calibrated.
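In other words, once every protein model is calibrated onto a common scale, the
selectivity check reduces to scoring each candidate against the target model
and the anti-target models and filtering. A minimal sketch of that filter --
the score scale and cutoffs here are invented for illustration:

```python
def is_selective(scores, target="JAK3", anti=("JAK1", "JAK2"),
                 bind_cutoff=-8.0, spare_cutoff=-6.0):
    """Keep a molecule only if it binds the target strongly (at or below
    bind_cutoff) while binding the anti-targets weakly (at or above
    spare_cutoff). Cutoff values are illustrative only."""
    return (scores[target] <= bind_cutoff and
            all(scores[a] >= spare_cutoff for a in anti))

result_set = [
    {"JAK3": -9.1, "JAK1": -4.2, "JAK2": -5.0},  # selective hit
    {"JAK3": -8.8, "JAK1": -8.5, "JAK2": -4.9},  # inhibits JAK1 too
]
print([is_selective(m) for m in result_set])     # [True, False]
```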
Others?
>> Bob Davidson: I want to thank you very much.
>> Ed Addison: Thank you for the opportunity.
[applause]
>> Bob Davidson: Appreciate it.