>> Bob Davidson: Okay. We'll go ahead and get started. First of all, thank
you for coming for all of you here and all of you in the television audience.
Today we have Ed Addison from TeraDiscoveries coming to talk to us about using
computers to design drugs. He's been doing a lot of work with Microsoft and
Azure and they've taken their work that they did in the universities and
they're now making a company of it and moving forward. He's been 25 years
working in this industry. He's a serial entrepreneur, it sounds like, and
we've worked with him for probably three years, I think, since I first met
you here at Microsoft, roughly three years ago, when you started doing some of
this stuff.
>>: Not quite. Close. Maybe two.
>> Bob Davidson: Two years and change. My sense of time is way off apparently,
because I'm having so much fun. It's been very interesting talking
with him, the several times I've met with him before, and we're going to have a
chance to hear what he has to say about rational design.
>> Ed Addison: Thank you very much. I'm going to talk to you today about
TeraDiscoveries. We're a small company of 15 people -- some full-time, some
part-time, some contractors, some employees -- a typical startup, quasi-virtual.
We have a business incubator where we're located, on Davis Drive in Research
Triangle Park in North Carolina.
Before describing what we're doing, let me take a minute or two to state where
we came from. TeraDiscoveries was a venture that wasn't quite planned with a
business plan; it evolved a little bit. A colleague of mine by the name of
Lawrence Hughes and I founded TeraDiscoveries, after working together for quite
a number of years through various ventures. He's a chemist and intellectual
property attorney, and I'm an engineer turned entrepreneur who spent most of my
career doing business development.
And a few years back -- I'm a graduate of Virginia Tech, so I'm a Hokie. A
few years back the dean of engineering at Virginia Tech reached out to me and
others who had venture experience. They asked us to come in and take a look at
some of their technology. They were doing a tech transfer push.
And at the time -- and this is before Azure and before the cloud was
overwhelming -- Virginia Tech had a system called System X, which was a
2,200-node cluster with a high-speed network connecting a bunch of Apple G5
servers together. Some of you may have heard of that.
They won some awards, and they were the number three or number seven
supercomputer in the world one year, but they gradually slipped because they
didn't go beyond that.
When I saw that, I asked the dean what they were doing to commercialize it. He
said nothing. And I didn't really think we were going to sell anything like
that. But I suggested that the pharmaceutical industry needed to start doing
high performance computing on an outsourced basis.
And I had agreement about that from a major pharmaceutical company, with whom
we did our first engagement. What we did is we formed a consulting company that
lasted a year or two. Then along the way I was making a presentation at Duke
University, where I was approached by their tech transfer director, who said,
'I have the perfect piece of software for that big machine you've got.'
This was inverse design, a computationally intensive drug design tool which I'm
going to talk about today. And so we ported inverse design to System X. It had
worked at Duke University on a 16-node cluster; on System X it worked okay, had
a few memory problems. But very quickly the cost was dropping on the cloud, and
Amazon was where we went next. We ported to Amazon because the big cluster we
started out with was not really being maintained in a commercial way.
And the cost, even though it was an academic price, was higher than the cloud
by 50 percent or more. After we ported to Amazon, we had an internal discussion
and decided that we were going to need a lot of computing cycles -- not to do
our software as a service, but because we had a bigger vision: we wanted to
preemptively compute potential drugs for many proteins from the Protein Data
Bank as a way of shortening drug discovery. So we identified the resources of
big cloud computing -- that was Google, Amazon, and Microsoft -- and the
relationship that stuck was Microsoft.
I met Todd Needham a couple of years ago, and we began this discussion. And so
our stuff runs on Azure, and we aren't using the other clouds at this time. We
are focusing -- I guess that's a compliment for Microsoft.
So what do we do? We use proprietary software and cloud computing to design
drugs. If you're a chemist, that probably does a little violence to you: we
don't really design drugs, we design drug candidates, because we don't do the
clinical research. But it does the design quite well.
And we use a method called QM/MM -- if you're a scientist, quantum mechanics
and molecular mechanics. You do a quantum mechanical model of the small
molecule or the peptide, and you do a molecular mechanics model of the protein,
which makes it a lot more accurate than older methods, specifically DOCK, which
did all molecular mechanics. That was a cruder, less accurate approach; even
though it was faster, it didn't produce really good results.
Now, that's not all there is to it; that's just the binding calculation. The
problem is that molecular space is 10 to the 65th big. So you don't want to
enumerate QM/MM calculations, which take about eight hours using pDynamo on a
single node, or you'd be waiting several lifetimes or more for the answer.
So what we have done is we've built and designed a heuristic search algorithm
through molecular space. We constrain the space by the properties of the
protein so that we can search virtual space, including molecules that have
never been synthesized before, and only test some of them. You know what a
hill-climbing search is -- you're Microsoft. So that's essentially what it
does. To describe inverse design -- I've already talked about some of this.
We're searching and computing binding energy, but we're doing a lot more; we're
using additional filters. We model water, whereas a lot of the earlier in
silico techniques modeled the molecules and paid no attention to their
properties in water.
We look for a lock-and-key fit -- in other words, a very good binding score --
and we'll optimize that as we search through space. This is all based on a
worldwide exclusive license from Duke University and our enhancements to
inverse design.
It now runs on Azure. We're still testing a few things, but we have modeled
several proteins. We've done the calibrations. We're starting production runs
this month.
We have one patent issued on the heuristic search as applied to chemistry, and
we have more intellectual property being filed. Our partnership with Microsoft
is to use Azure. We've done runs with 500 nodes, and we're trying to up that
now to about 1,200 nodes. It could go even higher than that, but there's some
iteration in terms of the scores, where if we do too many at once we might be
less efficient with the computing. What we do is we launch, say, 1,200 QM/MM
calculations, get the 1,200 scores back, and use those scores to determine
which 1,200 to do next, based upon the heuristic hill-climbing search. We have
to do several dozen iterations, and each one is eight hours, so several dozen
of those is a few weeks for a given protein. Not running continuously for a few
weeks -- it could run continuously -- but we go from long calculations to score
update, repeat, and so forth.
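To make the shape of that loop concrete, here is a minimal runnable sketch in
Python of the launch / score / update / repeat pattern. Everything in it is a
toy stand-in: a "molecule" is just a tuple of R-group choices, and the scoring
function fakes the eight-hour QM/MM binding calculation, since the real scoring
and search heuristics are proprietary.

```python
import random

# Toy molecular space: M_SITES R-group positions on a scaffold,
# N_GROUPS choices per position (values picked for illustration only).
M_SITES, N_GROUPS = 10, 50
HIDDEN_OPTIMUM = tuple(random.randrange(N_GROUPS) for _ in range(M_SITES))

def score(mol):
    """Toy surrogate for the binding score: counts sites matching a hidden
    optimum. The real system computes a QM/MM binding energy instead."""
    return sum(a == b for a, b in zip(mol, HIDDEN_OPTIMUM))

def mutate(mol):
    """Swap the R group at one site: a single step through molecular space."""
    i = random.randrange(M_SITES)
    return mol[:i] + (random.randrange(N_GROUPS),) + mol[i + 1:]

def batch_hill_climb(batch_size=1200, iterations=40, keep=120):
    """Score a batch, keep the best, and derive the next batch from them --
    the launch / score / update / repeat loop described in the talk."""
    batch = [tuple(random.randrange(N_GROUPS) for _ in range(M_SITES))
             for _ in range(batch_size)]
    best = []
    for _ in range(iterations):
        ranked = sorted(batch, key=score, reverse=True)
        best = ranked[:keep]
        # Each score is an independent job, so in production the whole
        # batch runs in parallel on Azure nodes.
        batch = [mutate(random.choice(best)) for _ in range(batch_size)]
    return best

print(score(batch_hill_climb()[0]), "of", M_SITES, "sites optimal")
```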
And so, a little bit of technical detail. This is an illustration of the idea
that we choose the best molecule in the database, but we don't calculate the
binding score for every molecule in the database; we're looking for local
maxima in the scores.
So you might have a surface. This is a simplified illustration, because the
dimensionality is really higher than this. But with inverse design, it's an M
times N computational cost, as opposed to enumeration, which is an N to the Mth
power cost, where M is the number of sites and N is the number of groups per
site. Groups per site has to do with how big the molecular space is. A group
is -- I guess some of you in this group may be chemists -- if you put an R
group on a molecular scaffold, that's a group. You can put 10, 20, 30 of those
on a scaffold that's typical of a protein, and that defines a rather huge
space.
Then M is the number of sites, which you could think of in simple terms as the
number of proteins -- but each protein has multiple binding pockets, so it's
really the number of binding pockets.
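To put rough numbers on that M times N versus N to the Mth comparison, here is
a quick back-of-the-envelope calculation in Python. The site and group counts
are assumptions picked for illustration, not TeraDiscoveries' actual
parameters; note that M times N lands near the size of one 1,200-job batch
from the talk.

```python
M, N = 20, 50          # assumed: 20 scaffold sites, 50 groups per site
HOURS_PER_SCORE = 8    # per-node QM/MM cost quoted in the talk

enumeration = N ** M   # score every combination: N^M calculations
heuristic = M * N      # calculations per heuristic search iteration

print(f"enumeration: {enumeration:.1e} calculations "
      f"(~{enumeration * HOURS_PER_SCORE / 8766:.1e} node-years)")
print(f"heuristic:   {heuristic} calculations per iteration")
```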
So, just to compare inverse design with prior methods: you'll often hear
old-school biochemists, who are primarily wet-lab people, say, oh, that stuff
doesn't work. What they're referring to primarily is DOCK, which was, for many
years, a system of only molecular dynamics, no quantum mechanics.
It was used because QM/MM was too computationally expensive -- and it still is,
if you enumerate. But with the cloud we got one speedup, with the heuristic
search we got a substantial speedup, and with Moore's Law we got another
speedup. Five years ago, this really wasn't practical.
So docking is very fast, but its accuracy is about 20 percent. What that means
is that if I get docking scores for five molecules, one of them will be correct
and the other four may not be.
And so the attitude of the chemist is, okay, so I have to synthesize five
molecules to get one good one -- not so bad compared to screening 10,000.
Except I'll tell you in a moment why this isn't good enough, even though it is
20 percent.
The free energy calculation is where you do quantum mechanics -- solving the
Schrödinger wave equation for binding -- and you enumerate instead of using the
smart methods; the problem is that it takes years to do the calculation.
So with inverse design, using the AI search and the cloud, we're accurate
greater than 80 percent, demonstrated multiple times, in terms of binding
prediction. And we call the search novel, because we're searching molecular
space; a typical drug discovery project using robotic screening is screening
existing molecules.
So you hear stories like, with small molecules, all the good ones are taken.
You may have heard that if you've talked to chemists in the drug discovery
business. All the good ones are taken because they've only looked at the ones
they've already built or already synthesized -- the ones they have in their
refrigerators or libraries, and variations of those.
So we have greater novelty and greater accuracy. But let me give you an example
of why this accuracy is even more important than it appears to be -- in other
words, why the chemist's story of 'we only have to synthesize five molecules
and we get a good one' isn't quite where we're at.
So, a protein that's getting a lot of interest is JAK3, because of its
indications in inflammation and rheumatoid arthritis -- huge markets -- and
we're running inverse design on JAK3 now. But it's really not good enough to
just have a JAK3 inhibitor with good druggable properties, because there are
other JAKs -- signaling proteins like JAK1 and JAK2 -- which you do not want to
inhibit even though you're inhibiting JAK3.
So the problem is more complicated if you have to be selective. We were asked
by a potential customer: can you give me a JAK3 inhibitor selective against
JAK1 and JAK2? We said we believe we can, and we've set out and are doing it
now.
If you look at the accuracy scores: to come up with a JAK3 inhibitor selective
against JAK1 and JAK2, you take the .8 or .9 -- we'll use .8 to be
conservative -- to the third power, because you have to run it against three
protein models, not just one.
So that says about 51 percent -- just call it 50 percent -- of our results
should be accurate. If I do it with DOCK, 20 percent to the third power is less
than 1 percent. So you can't design selective molecules with docking; the
selectivity will kill you because of the low accuracy. Yes, if all you care
about is binding -- but binding alone doesn't make up a drug. You need binding,
you need selectivity, you need good druggable properties, and you need low tox.
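The arithmetic behind those figures is just compounding the per-model accuracy
across the target and the two anti-targets, assuming the errors are
independent. A few lines make it explicit:

```python
def pipeline_accuracy(per_model, n_models=3):
    """Chance that all n_models binding predictions (the target plus the
    anti-targets) are correct, assuming independent errors."""
    return per_model ** n_models

print(pipeline_accuracy(0.8))  # ~0.51: the "call it 50 percent" figure
print(pipeline_accuracy(0.2))  # 0.008: docking accuracy, under 1 percent
```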
And we're not a tox software company, but we use tox software filters, so our
output library eliminates molecules that have really bad tox scores. There's
plenty of room for improvement in that tox software, but it is getting better.
Yes?
>>: Docking is fast, so you could have it almost for free compared to what
you're doing? Can you combine the two, even though docking is a less accurate
score?
>> Ed Addison: There are smart ways you can combine the two, depending on what
you're trying to do. That's a good question, and I'm not sure we've fully
exploited that yet. But we're interested in selectivity, because inhibitors or
agonists alone aren't enough.
Again, to be a molecule worthy of going through preclinical research, it needs
to be an inhibitor, it's got to be selective, it's got to be low tox, it's got
to be synthesizable, and it also has to have good druggable properties. So we
need to use filters to do that. And medicinal chemists have to like it -- in
other words, it has good clearance properties; it's not going to go in there
and clear out right away. So in effect --
>>: Have you looked at the correlation between the errors of docking and the
errors of the new method? Because if they're completely correlated, then of
course there's no extra value in using docking also. If they're completely
uncorrelated, there would be good value.
>> Ed Addison: That's a good question. Personally, I don't know the answer.
Sharkine, our chief scientist, would probably know, and I will follow up and
respond to that, because that's a very good point.
So this is how inverse design is configured conceptually. The workhorse part is
the binding calculation, which we're using pDynamo for -- very good open source
software. But our value added is not the computation of the Schrödinger wave
equation in pDynamo; it's selecting which molecules to compute it on.
The way this all begins is that we have a binding affinity equation applied to
a target. You start with the target, which is an X-ray structure; we build a
computer model from the X-ray structure of the target, which is a protein.
Step one is to calibrate the inverse design algorithm based upon any published
data for any inhibitors whatsoever for that target.
This setup process is still a little bit manual. We're building automation -- a
combination of expert systems and automated algorithms -- to take the person
completely out of the loop. This part here is completely automated on Azure.
The setup process takes anywhere from a couple of days to a couple of weeks per
protein, but once it's fully automated it will be down to hours, if not less.
Library design is a process. We're not searching completely blind molecular
space, because it's 10 to the 65th big; instead you can choose a smart scaffold
that fits the binding pockets of that protein. We are in the process of writing
an automatic library designer, but today it's still based on mining the
literature.
So all this setup, again, takes a couple of days to a couple of weeks per
protein. We expect to shrink it down to hours. We've got Barry Hobbs, one of
our computer scientists, working on that right now; it's her main mission to
have that down to hours, not days, by this time next year.
Property filters are where we use third-party software to eliminate molecules
that are bad on other properties. If they've got predictably bad tox -- which
is not our core competency, but other people do that -- then why consider them?
Solubility -- or, rather, synthesizability -- can be estimated from properties,
and those kinds of things go into these filters. And then we do the iterative
runs, where you use up to X molecules at a time. We were doing 500, but we're
going up to 1,200. I'm not sure we've got our limit raised to 1,200 yet, but
we'll find out soon.
You do a run on that many simultaneously -- it's embarrassingly parallel. Each
one is running a QM/MM process or QM process and giving the scores back. Then
it iterates: we do it again and again, several dozen times, and we come back
with a new chemical entity or a small, highly focused library of several
chemical entity possibilities.
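Because each score is an independent job with no cross-talk, the batch maps
directly onto a pool of workers. Here is a minimal local sketch of that
dispatch pattern, with a toy scoring function standing in for the real QM/MM
jobs and the Azure scheduling:

```python
from concurrent.futures import ProcessPoolExecutor

def qm_job(molecule_id):
    """Stand-in for one independent QM/MM scoring job; in production each
    of these is a long pDynamo run on its own Azure worker."""
    return molecule_id, sum(map(ord, molecule_id)) % 100  # toy score

if __name__ == "__main__":
    molecules = [f"mol-{i}" for i in range(1200)]
    with ProcessPoolExecutor() as pool:
        # The jobs share no state and never talk to each other, which is
        # exactly what "embarrassingly parallel" means.
        scores = dict(pool.map(qm_job, molecules))
    print(max(scores, key=scores.get))
```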
And we're doing this primarily for small molecules today, but we expect to do
it also for peptides. Short peptides, not proteins: 10 to 20 positions. If you
get any bigger than that, the computation gets ridiculous, because there are 20
amino acids per position to consider. Whereas if you're using small molecules,
you can limit the number of R groups you vary, and the number in each group set
can be substantially less than 20.
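The 20-amino-acids-per-position point is easy to check: the peptide space is 20
raised to the number of positions, so it explodes fast. A two-line
illustration:

```python
AMINO_ACIDS = 20  # choices per peptide position

for positions in (10, 15, 20, 30):
    print(f"{positions} positions -> {AMINO_ACIDS ** positions:.1e} peptides")
```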
So this came from Duke University. It was validated with HDAC8, a protein that
was done at Duke: a new molecule was designed and synthesized, and literature
data was used to validate the binding scores. That is where we got a result of
about 80 percent.
We've also done it, since it was licensed from Duke, for JAK1, JAK2, and JAK3,
and the correlation to the literature is good.
We're getting ready to do the big production run on JAK3 -- the selective
one -- and for that we expect to hire out a chemist to synthesize the results
we get, so we can get some wet-lab correlation validating it even further than
what was validated at Duke.
So I've already pointed this out: the benefits are the accuracy and the
speed -- speed relative to the free energy calculation, not speed relative to
docking --
and novelty. And novelty, we think, is really important for the reason I
mentioned earlier. You still hear people in the industry saying, well, all the
good small molecules have been taken.
Well, if the space is 10 to the 65th big, how can that be true? It's just that
they use the same ones all the time.
So some of the business propositions that we are experimenting with in the
market now are as follows. We're an early-stage company that has primarily been
in development; most of our revenue today -- in fact, all of our revenue to
date -- has been consulting and services.
Option one -- think of this as a customer's option -- is a single-target
discovery project, which may have a total price tag of 50 K or higher depending
on the complexity. And we would ask for a royalty if it ever goes to market, or
milestone payments, but ones that are much smaller than what a biotech
typically asks for developing a single molecule.
Option two is to license molecular libraries. In our project with Microsoft, we
are doing 25 targets that we select, developing small focused libraries -- a
library being maybe six to 12 molecules big -- of our best results.
And we make those available: if someone wants to go forth with them, they can
buy them or license the rights from us, or we can collaborate with them and
raise money together for the project, to take it through the clinic on an
outsourced basis.
And this is what our agreement with Microsoft has called the speculative
business, because this is where we choose the proteins in advance, work on
them, and then look for partners for the results.
We have a partner in Philadelphia called Numota Technologies, who has a
database, and it uses SQL. The database matches molecular assets to anyone in
the world who is interested in those molecular assets, either to license, to
partner with, to research with, or from a market perspective.
So one of the ways we're going to find partners for the work we're doing
together is through their database.
The third option is an R&D partnership. If we find a target that's of interest
to a big pharma company and we achieve early results, then we will seek a
partnership where they fund preclinical development together with us, or we
pass it on to them, depending on what their preferences are.
And option four is to license inverse design for internal use. We haven't done
that yet. I think we're going to wait about 12 months, until we get more
experience with it ourselves and make it more foolproof, and then make it a
high-end Azure application that we'll train a company to use.
>>: Is that an option due to pharma's concerns about complete and utter
privacy?
>> Ed Addison: It could be. And if that happens, what they're going to want is
a private Azure. How do I say that? Azure. Okay. Now I know why I'm
schizophrenic.
So that might be something that is more in Microsoft's world: what happens if a
pharma company wants the cloud internally? How do you solve that problem? Do
you just sell them a monster machine inside their firewall? If so, then we can
port the software over there, get a contract with them, and give them a license
to do all their proteins or one of their proteins, whatever they want to do.
So we're a little bit opportunistic about the business model. I think there are
going to be some changing dynamics in this market, and we don't claim to have a
good enough crystal ball to know which of these is going to be the stable
business model or the driver. So we're going to spread our bets and be nimble.
As the market evolves and this matures, it may be that we zero in on just one
of these as our primary business model.
But for now we are going with the flow, and I don't think anybody in the market
knows what the market's going to be; as the blockbusters move toward
personalized medicine, there are going to be lots of changes. So far for
marketing, we've been to Bio-IT World and also to the Boston biotech CEO
meeting. We want to market inverse design to large pharma companies.
And these are not customers; I just put their names down as examples of the
kind of clients we would like to have. We would also like to partner with other
Microsoft partners.
I named a couple here because their software might be compatible, either for
improving the speed of what we're doing or as a high-end option for our
customers.
And we would like to market the products -- when I say "products," I'm speaking
of molecular products that come from the speculative business -- using one of
those business models that I mentioned. And that will carry forward across the
business.
We have some other bioinformatics capabilities that I would like to mention.
And I also would like to talk a little bit about some of the things we're
interested in.
PDB, the Protein Data Bank, is a database of the National Institutes of Health,
and it's also worldwide -- the European Bioinformatics Institute also hosts
PDB. It has been ported to Azure -- not to the data marketplace yet, but it
will be soon. One of our developers has ported it. It's in SQL Azure, and we
can do full SQL queries on PDB -- more powerful queries than you can do on the
public PDB site. For instance, you can find molecules that have certain kinds
of properties. Maybe you want a molecule that has three zincs close together.
You can't find that in the public PDB now, but if you have SQL you can.
And as a follow-up, we're going to provide you with some queries that we think
are unique to the SQL Azure PDB.
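As an illustration of the kind of query that's hard on the public PDB site but
natural in SQL, here is a sketch of the three-zincs example in Python over
ODBC. The table layout, column names, distance cutoff, and connection details
are all hypothetical, since the actual SQL Azure PDB schema isn't public:

```python
import pyodbc  # ODBC route into SQL Azure

# Hypothetical schema: one row per atom, with Cartesian coordinates.
THREE_ZINC_QUERY = """
SELECT DISTINCT a1.pdb_id
FROM atoms a1
JOIN atoms a2 ON a2.pdb_id = a1.pdb_id AND a2.atom_id > a1.atom_id
JOIN atoms a3 ON a3.pdb_id = a1.pdb_id AND a3.atom_id > a2.atom_id
WHERE a1.element = 'ZN' AND a2.element = 'ZN' AND a3.element = 'ZN'
  AND SQRT(POWER(a1.x - a2.x, 2) + POWER(a1.y - a2.y, 2)
           + POWER(a1.z - a2.z, 2)) < 10.0
  AND SQRT(POWER(a1.x - a3.x, 2) + POWER(a1.y - a3.y, 2)
           + POWER(a1.z - a3.z, 2)) < 10.0
"""

conn = pyodbc.connect(
    "Driver={SQL Server};Server=example.database.windows.net;"
    "Database=pdb;Uid=demo;Pwd=demo")  # placeholder connection details
for (pdb_id,) in conn.execute(THREE_ZINC_QUERY):
    print(pdb_id)
```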
>>: [inaudible].
>> Ed Addison: About 100,000 proteins and about 300 gigabytes, in that range.
Our staff has both computational chemists and bioinformatics people as well as
software folks. And one reason for being interested in PDB is that we can draw
the X-ray structures from it to feed inverse design.
But along those lines -- and I had several discussions today here -- we're
interested in finding a natural language search capability, a semantic search
capability, to complement the platform. Given adequate capital, we would like
to compute all of the targets in PDB in advance of anyone doing drug discovery
on them, just so that we shorten the drug discovery cycle, for small molecules
and peptides at least. But not all proteins in PDB are suitable as targets.
Many of them would never be a target, because they're either not human or
they're not part of a pathway of physiologic relevance. So by having a
literature extractor -- natural language processing focused on biomedical
literature -- we can identify which 10 percent of the PDB are possible targets,
because they were found in pathways that molecular biologists have identified
in their research. That's one literature extraction problem. We have another
semantic search problem, and that is that as we produce these molecules, we
need to do patent searches. And patent searches are not as simple as text
searches, because we need the semantics to model the properties of chemistry in
those queries,
so that we can find whether there are molecules or structures that we might be
in violation of if we try to sell these molecules, and eliminate those.
Now, that's not as good as an IP attorney doing it, but we're doing two-step
IP: the first step is the automated filter, and the second is, when we have an
interested customer, we have a real IP attorney look at it. That way we're not
paying attorney's fees for every molecule, but only when there's a customer and
the molecule has been filtered in advance.
So we have a need for semantic searching and literature extraction in the
biomedical literature, and we've already started some discussions along these
lines. We're exploring a search engine, and one of your staff is also looking
at what you're all doing in natural language to see if there's a fit there.
And so let me summarize where our status is. The SQL Azure PDB is ported but
not yet released on the data marketplace. We have some productization work to
do, such as a user's manual, a privacy policy, and a license, and that stuff is
being worked on. And we need to do a little testing. We'll probably have it out
there for free for a little while -- maybe by Labor Day.
Inverse design is ported and debugged, although we found some new things that
we had to do this week. It does the heavy lifting. We need to do more front-end
automation before it's released as a piece of software that we can license
without hand-holding, but that's a goal.
And the first six -- actually, the first seven -- targets have been identified
and calibrated, and the JAK3 production runs are ready to go. We are, again, 15
full- and part-time folks, quasi-virtual. We have incubator office space, but
half of our people are not in North Carolina, so obviously we're not all there
every day, all day.
But we use the incubator as a place to meet customers and to have group
meetings when needed, if we're not doing it online.
We're also raising a round of funding. We have an interested investor, and
we're looking to add to that. We're expecting that it will come to a conclusion
in the next couple of months.
So I have a chart that I call the holy grail, and I've already alluded to this.
The holy grail is that we would like to reduce drug discovery to a simple SQL
lookup or search lookup. Now, that's a long way off, but there are significant
steps toward it that we can take. So we precompute inhibitors for all
promising-looking targets in PDB; that's the immediate goal.
That's an expensive computational proposition. We are doing 25 right now, and
we're shrinking the time and trying to do as many smart things as we can. We'll
look at your suggestion about combining in docking, to see if we can get any
savings there.
But ultimately, take the 100,000 proteins in PDB: we want to choose maybe five
to ten thousand of them to precompute this for. And we said in an earlier slide
it's a 50 K engagement -- but that includes profit and markup and people we're
cutting out.
So it may really only be $10,000 worth of cloud time per protein, and that will
come down as costs come down. However, it's still a 15 or $20 million
proposition, so we have to raise money to do it.
And our intent is to raise money for some of that, get customers to pay for
some of it, and maybe get some of it from government sponsorship -- and over
time roll up enough money to do this first step toward the holy grail. We'll
take the X-ray structure from PDB, or from wherever it comes if it's a
proprietary protein.
We're working on what we call the automatic scaffold designer. That has to be
done before we do this in volume; it's the part that's people-intensive that
we're automating, and the methods to automate it have been identified.
So it doesn't require a scientific breakthrough; it just requires more work.
And what we want to do is compute this big inhibitor library, so that one of
the things you'll do in drug discovery is look up and see what inhibitors are
already available. If a customer wants them, we sell them and take a
royalty -- whatever business model it leads to as part of our business.
So thank you for your time.
And we can do some questions if there are any.
>>: A couple of points I don't think you spoke to -- this isn't just for drug
discovery but for material science as well?
>> Ed Addison: I neglected to mention that. This inverse design technique
originally came from material science in the chemistry department at Duke, and
then Duke got grants to do it for drug discovery. Sharkine, our chief
scientific officer, was at Duke at the time as a post-doc; she did the original
design for the drug discovery. But with some changes we can use it to design
materials. We haven't done that yet. I think what we would do initially is
offer it as a service, if we find an interesting project or a customer who is
interested in doing that, so that we can go back and optimize the materials to
certain properties.
What inverse design does is maximize the score on a property. In the case of
drug design, that property is binding; in the case of materials, there are
other properties that people are interested in, and you have to change the
property equation. So there's some testing, and some changing of the scope and
size of the problem, that would have to be done. But we would be interested in
branching out into material science as well, because it has a different risk
profile, in a business sense, than drugs do. Drugs are low probability of
success but big money when they succeed, whereas material science would
probably be a little bit more stable. So the two businesses might complement
each other well. But we're small and focused at the moment. Yes, we would be
interested in that.
So if you know of others in your community interested in that problem, we would
certainly like to have conversations.
Any others?
>>: [inaudible] so in the transition from your proprietary cluster to Amazon to
Azure now, has it been easy? Is it --
>> Ed Addison: We weren't really all that far along with Amazon. We had only
done a couple of runs when I first came in here, and then we were given some
Azure time.
It was a little bit challenging for our folks at first, because they're mostly
Linux-type C programmers who didn't know the Microsoft platform all that well.
But we recruited a guy from Florida -- the one who did the PDB work -- a very
strong database guy and good software engineer who had experience with Azure.
He basically coached our staff through some of it, and we got some good tips
from Microsoft people, too. But they had to go through periods of not knowing a
bunch of things; it was a lack of familiarity with the Microsoft platform as
developers. And so we brought Eric in, who did the PDB work, and he's helped
the others -- Terry and Bill and Sharkine -- overcome the 'we don't know the
Microsoft platform' problem. I think we're mostly past that hurdle.
>>: So rather than just a learning curve that you're coming up on, do you think
the capabilities, the things you're going to try to accomplish -- you said
embarrassingly parallel. I tend to use the term pleasingly parallel, because
I'm not embarrassed about parallel at all.
>> Ed Addison: I'm not either, but that's the term that technical folks like to
use. I'd rather have --
>>: Yeah.
>> Ed Addison: Then it cuts some costs down.
>>: But are you finding that you've got access to everything you need, or are
there issues you still have with the platform?
>> Ed Addison: It's taken me a while to learn who to call at Microsoft, but
I've gotten more comfortable with that now. Different people at Microsoft have
different ways they respond to messages: some will respond to e-mail, some
might need a calendar appointment, and somebody might need a text. So you have
to kind of figure that out -- okay. That was one challenge, a nontechnical
challenge. Getting the people trained was a challenge.
We've run into a couple of technical barriers on Azure that your staff was very
helpful with also. And we're working to find the best data centers for these
big jobs. We ran a job on a Friday night once, and it took a long time to queue
up -- not that long; since this is being recorded, I won't say the numbers.
But I think it's gotten better since.
>>: Okay, thank you.
>> Ed Addison: And we're going to follow up by having some of our technical
folks, especially on the PDB, talk to your SQL Azure folks.
Any others?
>>: [inaudible] how much manual effort does it take to do one of these
problems?
>> Ed Addison: The part that's manual is, once we choose a protein, you have to
go get the X-ray structure and set up a computing file. That doesn't take too
long, but it's not fully automated. The parts that are a little more
challenging -- there are two of them. One is calibration, where we have to go
to the literature, grab any data that's available, pull it out, and run a
calibration run for that protein.
And if we could do that with the literature extractor, we could largely take
the person out of the loop on that.
And the other one is scaffold design. The scaffold is largely based upon the
binding pockets of the protein. We don't want to just do blind molecular
space -- it's too big. So we have to ask: are we doing a peptide or a small
molecule? There's no scaffold in the peptide; we just have to decide how long
it is. For the small molecule, you may have some parts that are going to be
fixed and many parts that are going to be varied, and you have to decide how
big you want that molecule to be.
And the literature can give pointers to that, along with chemist's intuition.
This is the hardest one to automate that we're working on; it's going to be
part expert system, part extraction, and part assembly. However, those are not
why we're using Azure -- the reason we're using Azure was for the QM part;
that's the real number crunching, where we need a lot of parallelism. This
automation is more along the lines of: if I want to queue up a lot of proteins,
I want to get people out of the way so I can streamline them, rather than
having a two-week delay for each one, or having multiple people in parallel for
each one. You've got to get that labor down.
And also, to release this as an application for customers, I think we really
have to make that setup much simpler than it is now. It's not terrible; it's
just not as automated as the rest of it. We focused on the heavy-lifting part
first.
So that's just where we are. There are some hard problems in there, but not
impossible problems.
>>: [inaudible] archive?
>> Ed Addison: Well, what literature do we search when we set it up? Usually
PubMed. We want to know what biochemists and molecular biologists have found
when they've either done binding experiments or done pathway analysis,
depending on which problem we're looking at.
The pathway analysis is really more to determine: is this protein potentially a
target? If we're doing preemptive calculations, we don't wait for a validated
target; if we're doing a service for a customer, they'll bring us a validated
target. So there are two different models there.
And the project we're doing with Microsoft is the preemptive stuff. The ones
we're handpicking initially are validated targets coveted by the industry, but
when we get to looking at the bigger piece of the database, we'll have to have
smarter methods of selecting.
>>: Could you tell me a little bit about the calibration process that you do?
>> Ed Addison: It takes a set of binding data for anything which has been bound
to that protein, and uses it to tune the parameters in the algorithm. So
it's --
>>: An operation --
>> Ed Addison: No, but --
>>: -- a .6 gets mapped to a .75?
>> Ed Addison: No, but there's a problem that does it. You just have to find
the inputs that you want to use. And if you want the detailed science of that,
I'd have to set up a call between you and Sharkine, so she can share what's in
the algorithm. But the algorithm exists; it's a matter of finding the data that
we're going to put in it.
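The detailed calibration method is proprietary, but the shape of the step is
familiar: fit a map from computed scores onto published experimental
affinities, then report new scores through that map. A minimal sketch under
that assumption, with made-up numbers standing in for literature data:

```python
import numpy as np

# Hypothetical published data for one target: raw computed binding scores
# paired with measured affinities for known inhibitors.
computed = np.array([-42.1, -38.7, -51.3, -45.0])  # raw model scores
measured = np.array([-6.9, -6.1, -8.8, -7.5])      # experimental values

# One plausible calibration: a least-squares linear map from the computed
# scale onto the experimental scale.
slope, intercept = np.polyfit(computed, measured, deg=1)

def calibrated(score):
    """Report a new computed score on the experimental scale."""
    return slope * score + intercept

print(calibrated(-48.0))
```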
And the same would be true if we did material sciences: we'd have to seed it
with anything that was known. Now, it's possible to do this without
calibrating, but it takes a two-step process. In other words, you find a
homologous protein and get some of that data, run with that, and take the
results. Then you have to synthesize, get the binding data, and run it
again -- unless you get really lucky and it's good enough, and then you don't
have to do it again.
>>: [inaudible] does calibration affect the sort that you would do
afterwards -- what binds strongest and what binds weakest? Or is it only when
you want something that matches this and doesn't match these that you really
need calibration, because you have to be able to combine --
>> Ed Addison: No, that's selectivity. The selectivity is done by running the
result set against other protein models. All the proteins have to be
calibrated.
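In other words, once every protein model is calibrated onto a common scale, the
selectivity check reduces to scoring each candidate against the target model
and the anti-target models and filtering. A minimal sketch of that filter --
the score scale and cutoffs here are invented for illustration:

```python
def is_selective(scores, target="JAK3", anti=("JAK1", "JAK2"),
                 bind_cutoff=-8.0, spare_cutoff=-6.0):
    """Keep a molecule only if it binds the target strongly (at or below
    bind_cutoff) while binding the anti-targets weakly (at or above
    spare_cutoff). Cutoff values are illustrative only."""
    return (scores[target] <= bind_cutoff and
            all(scores[a] >= spare_cutoff for a in anti))

result_set = [
    {"JAK3": -9.1, "JAK1": -4.2, "JAK2": -5.0},  # selective hit
    {"JAK3": -8.8, "JAK1": -8.5, "JAK2": -4.9},  # inhibits JAK1 too
]
print([is_selective(m) for m in result_set])     # [True, False]
```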
Others?
>> Bob Davidson: I want to thank you very much.
>> Ed Addison: Thank you for the opportunity.
[applause]
>> Bob Davidson: Appreciate it.