>> Lee Dirks: Well, thank you for coming back from lunch. I suppose you guys
did have a chance to chat with colleagues. But let's get back to business. And
we have a great afternoon lineup of more speakers and we'll just go ahead and
hand it over to Heather Joseph, who is the executive director of SPARC and is
here to talk to us about "Is Open Access the New Normal?" Heather.
>> Heather Joseph: Thanks very much. My very inauspicious beginning to this
talk is that the technical folks had to remind me that today was actually February
20th, not the 21st, so I had to go in and edit slides to correct the date. So I don't
know, open access may be the new normal, but I'm not sure where my mindset is
today.
I'd like to start just by thanking Science Commons and Microsoft for inviting me to
be part of this program. It's always wonderful to come into a program where
you're more excited to listen to the other people who are speaking than you are
about the possibility of getting your own message across, which is an unusual
position for us to be in. So thanks so much for the invitation to be here.
The topic that I'm going to talk about today is open access, and specifically, when
SPARC concentrates on open access, we talk about open access to journal
literature. We talk about it not in the larger open data picture but as sort of the tip
of the iceberg, a subset of the results of research in digital form that scholars
and researchers produce.
And SPARC, for those of you who may not be aware, is actually a library
membership organization of research and academic libraries. And we were
founded to be a catalyst for action within the scholarly community to do a very
specific task; to try to expand the dissemination of the results of research that
scholars produce and to do it in a very specific way, to do it taking advantage of
opportunities presented by networked digital technology. So to try to take an old
tradition and marry it with new technology and try to move things forward.
And because we're a membership organization, we have sort of a third part of
that task. The results that we achieve in expanding dissemination through
technology are designed also to help reduce the financial pressures that have
built up on academic and research libraries over time, which is why we
concentrate on journals. Because this is the subset of materials in the academic
world where a pricing issue has really put pressure on our ability to be able to do
our jobs, which is to deliver access to people on our campuses, in our research
institutes to the materials that they need to do their work.
When prices go up and our budgets don't go up at the same rate, we end up
spending the same amount of money, or worst-case scenario more and more
year in and year out, but providing people access to less and less material.
So SPARC really has focused on the idea of open access as a way to move the
game forward to achieve the results that we're tasked with doing. Inspired by
Cameron this morning, I wanted to just also note that I always thought that I was
uniquely positioned to be the executive director of SPARC because I had been a
journal publisher for 15 years, I have an MBA, and I've worked in disciplines like
neuroscience, cell biology and astronomy, which have really operated with an
understanding that, you know, open is better.
But I found out yesterday, reading the Atlantic, that actually it was dropping out of
college and going on Dead tour that probably best qualified me to understand the
business models that are supporting ventures like open access; they're, you know,
something that apparently Jerry Garcia was a pioneer of. There's a great quote in
this article about how the best way to build demand for your product, the Dead
realized, was to give it away. And I think we really see that in the open access
arena.
So I was thrilled to know that what my parents called the lost years were actually
spent working toward this very important goal.
So we work on open access, you know, again, for the reasons outlined before.
We have a great opportunity. And a lot of times, I think, when we talk about open
access, especially when it's coming from the library community, people expect it
to be, well, books cost too much.
Well, yes, that's a symptom of the system not being able to deliver optimally what
the system should be able to deliver. But really we get excited about it because
we have a chance to bring this information to a whole bunch more people at
relatively little cost.
And the nature of research, as you all very well know, is that it's cumulative.
You only gain in value in your research investment when somebody can get to
the stuff you've produced, use it, and build on it. And yet despite the fact that
we've been living in this digital environment for quite some time now, we only
have access to just a fraction of the research articles that we should be able to
have access to.
And that's led to a pretty large-scale call for not just a little bit of change but a
whole new framework that allows us to really think about getting to this stuff and
using it in a whole new and fuller way.
We work on open access and the idea of open access. I'm going to go to the
definition of open access and then I'll go back to that page. We use the
Budapest open access initiative definition. And I think it's really important that we
put this up here.
By open access, we mean yes, the ability to get to it immediately on the public
Internet. And it's free. It's barrier free. There's no cost associated with it. But
also to be able to use it: to read it, to download it, distribute it, print it, crawl it,
parse it as data, use it in any kind of way that the electronic world allows us to do.
And we try to think about what happens if we're able to do this, right? We look at
this from the very high levels down to when this information gets to individual
scientists on our campuses, and it strengthens our collective ability to leverage
our investment in science, to build the value that we want to build.
I think we want to do this for a variety of reasons. Cameron did a nice job
this morning, I think, with a very nice set of visuals talking about, well, why do we
really want to do this? We want people to be able to get to this material so that
they can arrive at new discoveries and innovation.
The example that Cameron gave was cures. Certainly in biomedical disciplines
there's an easy case -- not an easy case but a relatively straightforward case to
make for bench to bedside, right, to being able to speed up when you do your
experiment to when you can actually see something that will have an effect in the
community.
But that really holds true in just about any discipline: in environmental sciences,
in energy research, in sustainable agricultural techniques. You can
begin to see that the more people have access to these results and the faster
things can build the faster that benefits get returned to the public good.
So we work on advancing open access in a variety of different ways. And
I'm going to talk about the four different strategies that we use to
advance open access, and as I do that I'll try to answer the question: are we
getting to the point where open access is the new normal in any or all of these
arenas?
And the first strategy that we use to kind of get to open access is kind of under
the category of infrastructure, right? So if you want people to be able to publish
journal articles and have them available openly, it stands to reason that you need
to make sure that they have open access journals to publish in.
When I started talking about this about five and a half years ago and did this talk,
I would talk very proudly about the fact that the Directory of Open Access
Journals listed a couple dozen journals that were options, that were alternatives
to the traditional subscription journals for people to publish in.
This is a snapshot from the DOAJ this morning. There are 4,755 open access
journals now listed and indexed in the Directory of Open Access Journals. And
that's fantastic, right? It's fantastic as a benchmark because there's this whole
suite of options and a variety of disciplines that researchers can now choose.
And we're beginning to see that we are moving from a few dozen to several
thousand. It is beginning to move us more into the mainstream. And every time I
give this talk audiences on campus will say yeah, but they're not all good ones,
right, they're -- you know, there's a bunch of crap in there with the high quality
ones. And I say yeah, that's true. But what a luxury for us to be in the position
where we now have a situation with open access journals that mirrors the situation
with subscription access journals. Not every subscription access journal is
perfect either. It's caveat emptor equally in both worlds. And the twain are
marrying, and I think that that's a very positive sign.
Also, when I started giving talks and encouraging people to think about open
access, this is all I had, right, volume, volume, volume, year in and year out.
Month after month I would put a new slide up and the number would go up. And
the numbers have continued to go up tremendously. But I think what leads me
down the path of thinking open access is becoming more of the new normal is
that volume isn't all we have anymore; we also have functionality. And journals
are appearing in the open access marketplace that do things -- Pete
Binfield must be here, and he's cheering for himself, I want you to know. PLoS
One is a great example of innovation in open access publishing, of things that can
be done in the open access environment that aren't being done, and in many
cases cannot be done, in a closed access environment. And I think that's hugely
important that we can now talk about the kinds of things that scientists are
actually doing in this arena. And I'm not going to talk about it, because Pete is
here, and he is going to talk about it this afternoon.
But I am going to give one other example of why I think things are moving in the
direction of more normalcy. It's interesting because I was having a conversation
with a couple of colleagues here in the audience and I mentioned that one of the
other milestones in open access journal publishing was that last year the Open
Access Scholarly Publishers Association was established. And it's essentially a
trade association for open access publishers. And the reaction of one of my
colleagues was, "Yahoo, a trade association!" And I thought, you know, I get that
reaction on the one hand, but on the other hand it's amazing that at this point in
time there are so many open access publishers and it's such an accepted part of
the environment that they feel they have issues and that they're at a level where
they need an association to be represented in policy discussions and also in
norm setting and in best practice discussions.
It's not catch-as-catch-can. This is beginning to be something that's assimilated
into the culture of the academic community. And I think that's critically important.
Because one of the biggest barriers to acceptance of open access has
been, as we've always known, culture change. There's an element of
that in the research community and also in the academy as a whole. And I think
this is one big milestone that shows that we're beginning to have not only tiny
little pockets of journals that are out there; the numbers are growing, the utility is
growing and the fact that they're coalescing as a community is very important.
We see this mirrored in the second arena, the second strategy that we work on to
encourage open access to take root. And that's the arena of open access
repositories. This is a mash-up of Google Earth and the Directory of Open
Access Repositories, OpenDOAR. And it shows, I think, 1,422 open access
digital repositories that have been established all around the world. And they're
all different colored dots. And those represent the different sort of flavors of
mostly open source software platforms that these repositories are being
supported on.
And again, when I first started talking, there were a handful of these scattered
around the world. You see the global nature of this movement, you see the growth in again
volume, volume, volume, and you see the different kinds of applications that
have been built, the options that are out there. So if you're a scholar and, you
know, none of the 4,700 open access journals is the perfect one to publish
your article in, you're not out in the cold; you can still pick one of these 1,400
repositories and put your article in it and feel fairly confident that the world will be
able to get to your material.
Like the journal world, it isn't just numbers anymore. We're also seeing the
activities of these repositories moving in a direction that I think is very important.
And we heard this morning over and over again that interoperability is the key to
really being able to leverage the effectiveness and the value of the research
that's contained in this universe. We're seeing projects now where individual
institutions, governments, consortia are establishing repositories, and they're
recognizing that standing alone they do something. Networking them together
does a whole other thing. So, for example, in Europe, we saw a sustained
project, the DRIVER project, that talked about networking repositories and the
economies of scale that can be achieved and the kinds of things that can be
done when the repository community begins to act as a network rather than as a
series of silos. I think that was a huge milestone in moving us forward and
moving us down the path of open access really kind of coalescing.
Much like the open access journal world, we're now seeing groups and
organizations saying this is part of a landscape, part of the culture. And
in the academy, I guess, we like to be a group. We like to get
together. And the repository community is really beginning to coalesce in much
the same way. And we saw, just in October, the establishment of the
Confederation of Open Access Repositories. It's still a fairly nascent group, but it's
come out of the DRIVER project, and it's come out of the recognition that there is a
need and there really is a call for these repositories to think collectively about
leveraging their strength.
So again, I think from the infrastructure perspective, from the strategies that we
pursue at SPARC, we're really pleased to see us move from a world where
there were a few one-offs and some progress, to this real sense of community and
establishment within the academy.
This is a slide that I think is just important as a touchstone. For repositories, a lot
of times we still worry about the question of content and attracting content: if they
build it, will they come, and if they do come and put their content in, will
anybody use it? Sort of mirroring the idea in the data world, this is a collection
of all the NIH databases, and PubMed Central is included in it. And PubMed
Central is the repository that NIH has for full-text journal articles.
The deposit rates and the use rates look very similar in the NIH world, right;
they've been going up at an exponential rate. What's significant about this set of
statistics, and what leapt out at me -- I actually watched Dr. Zerhouni, who used to
be the head of NIH, give a talk, and he used this slide, and I immediately
requested permission to snag it, upon which he reminded me that it was
produced using government money, so it was already open access. Yeah. And
we were testifying on open access in front of Congress. That was embarrassing.
The thing that leapt out at me on this slide, and which continues to sort of let me
know that we're on the right path when we encourage people to put things in
repositories, is that of all these databases, they broke out the use of
PubMed Central.
And at NIH they found that at this point in time they were getting 400,000
unique users a day for the full text of journal articles.
And it's now up to something like 450,000; it's growing at a considerably fast rate.
The number leapt out at me because I don't think that there are 400,000 practicing
researchers hitting this database. It says something about the demand
for the full text of journal articles that extends far beyond what the users that we
might think in the academy are the core users. And that for me was very
important because as we get into the next two strategies that we work on to
expand open access, particularly the policy arena that was a very compelling
statistic. Because we so often hear that science is for the scientists.
We hear this from the publishing community, we don't hear this from the
scientists, but we hear it in arguments against open access. Science is for the
scientists, this is for scholars, the information in journals will not be of interest or
of use to anybody beyond sort of the ivory tower in the academy.
And besides being very patronizing and pedantic and, you know, annoying in
many ways, I think that's untrue. When you look at these kinds of numbers, you can
begin to build a case for the demand, and the kinds of things that we might be
able to see coming out of this sort of usage are, I think, very encouraging.
So the third strategy that we used to get to open access was the first project that
we ever worked on with Science Commons together. And that's the idea of if you
can't find a journal that's the perfect outlet for you and you are not really sure
about using an institutional repository, you can still be sure that your article gets
the widest possible downstream use and the broadest possible distribution and,
you know, the best life it possibly can have in an open access arena if you do
one simple thing: making sure that you, as the author, as the creator, reserve
the rights that you own from the outset, before you choose to sign them away, to
allow this article to be read and used in the ways that you might envision as an
individual author.
So whether you pick a journal, pick a repository or not, you can still negotiate a
transfer of copyright that allows your article to be able to live and breathe in the
world in the ways you want it to live and breathe. And we do that through the
Author Addendum. I'm sure most of you are familiar with the project that Science
Commons really helped us to spearhead.
The Author Addendum is a very, very simple legal document that Mike Carroll from
Creative Commons and John Wilbanks helped us work on. You can attach it to
your copyright transfer agreement. And you reserve the pieces of the bundle of
copyright that you need to say: I just want to put it on my website and let people
get to it when they want to get to it. Or: I want people to be able to use it in the
classroom in the ways that they want to use it.
It's a way of saying you can take control right from the get-go and ensure that if you
want your work to be available with all the benefits that accrue under open
access you can do it from the outset by reserving those rights.
That's proven to be hugely important as we've gone forward into the final strategy
that we've been pursuing to ensure open access. I don't know why the computer
did not want Stuart to be up there. Jumped right past him.
And that is in the policy arena. And that's where I'll spend the rest of my time
sort of talking about the latest updates. And this is the place where I think we're
really seeing it, just a sea change in what's happening to move open access
much more towards the mainstream.
The idea of the retention of rights and of exercising copyright to the fullest
was behind the Harvard Arts and Sciences faculty when they voted to make open
access the default on their campus, right. The way Harvard's faculty went about
ensuring that the results of research produced by their faculty would be made
openly accessible was to say that the institution reserves a non-exclusive,
worldwide, irrevocable distribution license. It basically did one of the things that
we suggest authors think about doing when examining their copyright transfer
forms, which says: you can publish my article in your journal, but I reserve the
ability to make sure that it's seen by the people that I want it to be seen by. And
then the authors at Harvard agree to place a copy of that article in Harvard's
institutional repository.
So we see these deposit permission mandates that we've been talking about on
campuses, and that some funders have been toying with -- I shouldn't say toying
with, have been working on bringing to fruition. Harvard was the first campus.
MIT quickly followed suit, and the Stanford School of Education rapidly
thereafter -- and I should preface that by saying in the United States. Because
there are -- I see Peter saying no, this is happening in Europe and worldwide.
That again, sorry, is the SPARC perspective, the US perspective. Thanks for that
head nod. In the US those campuses quickly followed suit.
We're now seeing public universities like the University of Kansas and liberal arts
colleges like Oberlin and Trinity. It's faculty votes, it's the faculties recognizing
that by exercising their ability to retain copyrights that they can make sure that
their work gets seen and used in the open access environment in a very full way.
And I think that this year, after these policies were established, we will see a
snowball effect. And one of the things that we're doing at SPARC: this photo is,
for those of you who don't know, the lovely Stuart Shieber, who was the faculty
advocate at Harvard who led that charge. Stuart and Hal Abelson at MIT, John
Willinsky at Stanford, Lorraine Haricombe at Kansas, Ray English at Oberlin, and
Diane Graves at Trinity all agreed to form what they're calling a little open access
faculty SWAT team. And they're now available through SPARC. And if you're
having a conversation on your campus or if you as a researcher are interested in
trying to advance an open access policy on
as a researcher are interested in trying to advance an open access policy on
your campus, you can come to SPARC; we have resources, everything from
white papers to checklists of the considerations that you might want to
think through on campus. And when you get down the pike far enough, one of these
folks, from Stuart to Lorraine to Ray, is available to work with you and your
campus on a one-to-one basis to help you go through that process and give you
the benefit of their expertise. And they do it for free.
I mean, this is really something: we have a network of people who
are very committed to helping these kinds of policies go forward. So I'm very
optimistic about what we'll be seeing over the next year.
Okay. The final area that we work on is policies that extend beyond the
campus environment. When SPARC first started, we worked on advocacy, and
we worked on advocacy very much from the let's-see-if-we-can-get-the-librarian-
to-talk-to-the-provost angle about, you know, these kinds of issues, and tried to get
attention in that arena. I think we learned very early on, as we were trying all
kinds of different strategies for effecting change, that if we were going to play in
the advocacy arena, that wasn't going to be enough. We needed the bottom-up
approach for policy locally, but we also needed seats at the tables where the
larger commercial interests in the publishing world were sitting, because that was
really where the barriers were; for even our local campus policies to succeed, we
needed to take those barriers down in order for any change really to be made.
So we work on open access policies on a local, national, and international basis.
This statement from the Organisation for Economic Co-operation and Development,
the OECD, which came out in 2005, is something that really helped us coalesce the
way we thought about doing advocacy for open access on a much larger basis
than -- see, it still sounds funny to me to say it, a much larger basis than even on
our campuses like Harvard. I mean, it's amazing to me that we need to work on
these levels simultaneously.
And you should know SPARC is, you know, seriously two girls and a dog. We
are three full-time staff people. Raym Crow, who does a lot of our research, is the
dog. I just want to go on the record as saying that. We're a very small
organization. So we had to find ways to work on these issues and to find ways to
effect change in open access by really drawing in the community and finding
messaging that worked for all of us to kind of gather around.
And this statement was one of the things that really helped us coalesce. And it's
that idea that governments would boost their innovation and get a better return
on their investment in publically funded research by doing one very simple thing,
by making the findings more widely available.
So we worked on creating a much broader organization than just SPARC. And
we convene a group called the Alliance for Taxpayer Access. So the library
community is part of it. But I will tell you that when you try to do policy advocacy
from the library community, when a representative of the library community
goes up to the Hill and wants to talk to a senator or representative about the
price of academic journals, you get about two and a half seconds before their
eyes glaze over and they just don't have any interest.
But if you go up and you talk as a representative of the Alliance for Taxpayer
Access -- Americans have funded this research with their tax dollars and we
deserve access to it -- suddenly the door opens, because they're the custodians
trying to steward the results of our investment in a responsible way. So we convene
the Alliance for Taxpayer Access, and over the years it's grown to be a group of
almost 100 different organizations that run the gamut from libraries to full
universities to patient advocacy groups to consumer groups to economic groups
to student groups. The student groups alone represent over six million students
in our coalition. So it's a hugely active
organization, and it's been tremendously effective in helping us to bring the
message and educate policy makers about the potential benefits of open access.
So rather than, you know, journals cost too much, it's here are the opportunities
that we have available to us to make our investment in scientific research do
more for all of us collectively.
I think it was Cameron again this morning -- this thing has a mind of its own -- who talked about coalescing around principles. And Peter certainly underscored
that with the declaration that came out yesterday that it's really important for us to
be able to articulate clearly what it is the alliance is doing. And for us it was a
very simple statement of four principles, to be able to put out there and then to be
able to use the power of numbers of this coalition to advocate for policies that fit
into the categories of, you know, taxpayers are entitled to see the results of the
research that they invest in and that this is part of our collective investment in
science. And here are the good things that will come out of it, that this
information should be shared so it will stimulate discovery, innovation, advance
the translation of this knowledge into public benefits.
And this has been a very helpful platform for us to advocate successfully for
things like the NIH public access policy. We call it a public access policy because
it's not an open access policy; it has an embargo period in it which keeps it from
being pure open access, right. It says that if you take money from the NIH and
you do research, you agree as a condition of that grant that you'll make any
article resulting from that research openly accessible after 12 months in the NIH's
digital repository.
Now, while the 12-month embargo period was a really bitter pill for me to swallow
initially, one of the things I have come to recognize -- after crying, I readily
admit that -- is that it's a tremendously important proof-of-concept policy to have
out there. The embargo period allowed the policy to go forward. As a pure open
access policy it was considered, I mean, radical, a non-starter; there was no way
it was ever going to go through.
With the 12-month embargo period, what we've done -- and I actually had to have
somebody remind me of this -- was to take the availability of these research
articles from a perpetual, exclusive distribution license that we were always going
to have to pay to see, down to a window of 12 months. And in doing
that, it also gave us a demonstration that the sky did not fall: even with
that tremendous reduction, the sky did not fall on the publishing industry's ability
to manage peer review or to provide the kinds of services that the publishing
industry was certain would vanish into vapor immediately if these kinds of
policies went forward.
So while not a perfect policy, certainly a tremendously important milestone in
allowing us to be poised to make even greater strides in the policy arena. And
the last two things that I'll talk about before taking questions are what's
happening right now.
So right now there's a piece of legislation that's been introduced into the Senate
that would essentially extend the NIH policy to 11 other agencies that fund big
science -- any agency that has a hundred-million-dollar external research budget
or more. It was introduced in June. And we actually anticipate that within the
next two weeks there'll be a House companion bill introduced in the House.
Now, bills get introduced all the time. Why am I even excited about this? Well,
I'm excited about it because the environment into which this House companion
will be introduced, to draw attention to that Senate bill, has changed
radically in the last year.
When the Obama administration came in, they sent a signal that they were very
interested in the idea of openness and transparency in all of the endeavors of the
federal government. And in fact, during the transition process they signaled that
they were particularly interested in the results of federally funded research.
They asked for background information from the community, informally at first,
about the idea: hmm, this NIH policy looks pretty interesting; should we be
considering something similar, more aggressive perhaps, who knows, for all of
the results of research that we fund? And in December they issued a formal request
for information to the public that asked for exactly that information. It's really
important that that request for information did not ask, is this a good idea or a bad
idea? It did not say, should the American public have access to this information or
shouldn't they? It was a series of nine multi-part, detailed questions, plus three
community blog discussions over a 45-day period, that asked how should we do it.
And they broke it up into implementation: should there be an embargo period, and
if so, how long? Should each agency establish a repository? Should we look to
partner with the university community? How should we do this?
They asked technical questions: are there standards for interoperability among
repositories, or standards for what we're asking people to deposit, that we
should be considering? And finally they asked very detailed questions about
management: how much will this cost, how can we sustain it, and how can we
make sure that it's working the way we need it to work? I get so choked up
talking about this. [laughter].
It seriously happened over Christmas, right, from December 9th through January
21st. And they did a series of three blog discussions, which several of
you in the audience actually were kind enough to give up your time over your
holidays to take part in. And they got more than 500 comments through the
Federal Register process and a very robust discussion on the blogs.
And what's happened is that because they issued this request for information as part of the larger Open Government Directive the administration is engaged in, they have signaled that it's a core issue for this administration. They have a war in Afghanistan, a war in Iraq, they have a
recession that they're dealing with, they're dealing with healthcare debate on the
hill. And within the first 10 months of being in office they issued an RFI of
tremendous detail to try to figure out how to move forward on this issue.
So policy makers on the hill are fully aware of what that means. And I think this
is the best chance that we've had and that we probably will get to advance some
sort of national, cohesive US policy that will move us in the direction of open
access within a very short period of time.
The last thing that I'll say a little bit about before closing is that while many of the signs I've pointed to say yes, we are moving towards open access coalescing as the new norm, one of the other very strong signals that we're making progress is that the nature of the opposition has really changed. As we step up our game, folks who oppose it step up their game. Over the last 18 months, I'd say, we've seen a whole different intensity and set of arguments put out there that have us, not scrambling, but having to create, you know, good, strong answers in very different arenas.
So, far from saying that the finances of individual journals are threatened, or that we're not sure about, you know, the effect this will have on commercial publishers or not-for-profit journals, the opposition to open access, and to open access policies in particular, is coalescing into three different categories.
The first is probably the most pervasive, and I understand that's true not only here in the US but internationally: although these policies have nothing to do with implicating US copyright law or changing it in any way, shape or form, the layer of arguments we're seeing right now says don't touch these policies, because even though we're not really sure how they come into play with copyright, we can't afford, as a country, any perception that they weaken in any way, shape or form this administration's will to go out with strong IP enforcement in any arena.
So we actually had a comment submitted to OSTP, to the White House, during the RFI process from the Copyright Alliance that opposed moving forward at all on an open access policy. And the Copyright Alliance letter was signed by the Association of American Publishers -- yes, no surprise -- the recording industry, the software industry, the motion picture industry, NASCAR, the NFL, Major League Baseball: all of these people who have a very strong interest in making sure that you don't get access to scientific articles on a timely basis. [laughter].
So it's stepped up in a very interesting way. There have also been new arguments that have come out fairly recently about unfair competition. The publishing industry does not like institutional repositories. It's not that they don't like open access journals, but institutional repositories are now being perceived as a much bigger threat, because they have the potential to provide an entry point to a robust corpus of information that anybody has fair game to build a value-added service on. I think that actually encourages competition, but the argument is no, if we don't have a lock on this gravy train, it's bad.
And the final thing -- and I think this is something we really have to watch out for -- is the argument that open access poses a threat to academic freedom; in particular, that open access policies can unduly influence scholars to feel that they have to publish in a journal that is open access friendly, and that that limits a scholar's choice of publishing outlets that are acceptable in their field.
And I understand where that comes from. I also know that the policies on our campuses in particular have been very careful to include automatic opt-out clauses, so that if you feel in any way, shape or form that you don't want to comply, Harvard says no questions asked, you can opt out.
But I can't think of a system less conducive to academic freedom than the one we have right now, which says publish in this journal in this discipline with this impact factor or you don't get tenure, or you don't get funding. So I think it's a very hard argument for folks to make that open access is somehow moving in the wrong direction in terms of academic freedom rather than the right direction.
One last thing to say. The idea that open access is becoming the new normal to me means that we provide the infrastructure, we've got options, we've got acceptance of these options beginning to grow, we have greater education in terms of what it is that I need to do to make my stuff open access, and copyright education. We have policies proliferating and we're on the brink of national policies that will be very supportive of this. But the thing that makes me the
most optimistic actually came over my cell phone while I was sitting in the
audience here. Colleagues from SPARC are at the AAAS meeting in San Diego today. We have a booth, and we're educating people on open access. We've gone several times, but we hadn't gone in the last couple of years, because sometimes we can be a little bit of a lightning rod and, you know, we wanted to make sure that the conversations are positive and productive.
And I got this picture from a colleague. These are not my colleagues; these are attendees at today's AAAS meeting. And they're all wearing open access, Got Rights?, Public Library of Science and SPARC T-shirts at the AAAS meeting. And that lets me feel really good about the fact that our researchers are getting our message, and they're wearing it at the meeting today.
So I do think that, all in all, we are really moving in a good way towards open access becoming the new normal. So thanks so much for letting me speak.
[applause].
>> Lee Dirks: Any questions?
>>: I have a librarian question.
>> Heather Joseph: Yeah? I'm not a librarian.
>>: That's okay. That's good. [laughter]. [inaudible] has been around for 44 years, and it's the most highly structured and sophisticated of the periodical databases. And then you go to PubMed Central, and it's like searching Google. I mean, the indexing structure just isn't there.
>> Heather Joseph: Yeah. I think that there's a lot of work to do on this. Are we
even in first generation? What have I done? I'm not even connected. There we
go. Sorry. A little technical glitch.
We're not even, I think, really -- we're in the first generation with these databases. But I have to say that of all the places that I have confidence in to work this out and to come up with better ways to get through this material, NCBI at NIH is one of the places. They have phenomenal people who are there. I think they're
also extraordinarily responsive to feedback and whenever we have had people
who have come and said we've changed this user interface or I can't do X or I
can't do Y, they take that feedback very seriously and they work on it.
And I would encourage any kind of feedback that you have, give it to them, because I'd rather the criticism came from within the community; we work on it and strengthen it, and then we can continue to point to that as an exemplar for other agencies to consider following.
>> Heather Joseph: Yes?
>>: I'm in a research library, and we've definitely found that our scientists,
researchers are completely open to open access. They're not the ones we have
to convince. It's general counsel, it's tech transfer. Can you tell me if SPARC is
kind of looking towards creating tools for the library community to help ease
those folks into this conversation?
>> Heather Joseph: Sure. That's interesting, because that experience doesn't tend to be the same on every campus. In fact, I think the tech transfer people are oftentimes a little more comfortable, because they recognize this is already published material. We're not breaking any ground here; we're talking about making stuff accessible that's already, you know, appeared in the journal. And the general counsels, you know, they've been extraordinarily helpful to us on copyright issues and in terms of crafting policy. So we don't have any specific tools that are targeted there, but we have had a lot of conversations with different offices, so we're happy to, you know, work with you on it. Where are you?
>>: [inaudible].
>> Heather Joseph: I don't know if you guys are a member. [laughter]. But
we're still happy to talk with anybody any time. Thanks so much.
>> Lee Dirks: Thank you.
[applause].
>> Lee Dirks: What I'd like to do now is introduce Stephen Friend. And he's going to be telling us about some breakthrough work they've got going on at Sage. I'll turn it over to you.
>> Stephen Friend: Thanks very much. So to orient you, you've been hearing about what I'd say is some remarkable progress on how chemical structures can be interrogated, and how some day machines can go and find those structures, and the complexities of dealing with that. And I think you've just heard an example of the ability to search articles.
What I'm going to talk about is something that lies in the land in between, and is something that I think is going to be critical if we're going to make progress in carrying out someone's phrase, which was getting from bits to patients. This has to do with the new technologies that are becoming available to interrogate what actually makes up diseases. And the nature of that data doesn't fit into the simple world of recordable compounds; it doesn't fit into the world of I just put it out on some site as an appendix, as some material to query.
The other thing to say as a disclaimer is that this is an effort which is [inaudible]. I first met John Wilbanks of Science Commons about a year ago to the day, and this is a seven-month-old effort that is moving along, but in terms of the five-year, ten-year descriptions I dream of, it is where I was in 2005 or a decade ago. So it's a nascent effort.
But there are three themes I hope I'm going to get across to you. One is there is
a fundamental shift in how diseases are being looked at that is being driven by
new tools. I'll go through that.
Secondly, there is a fundamentally different way that clinician/biologists are going to be working. Remember that I'm not talking about scientists in general, because I'm going to argue that physicists actually made this migration 30 or 40 years ago. But for clinician/scientists, it's a fundamentally different way of working that they're going to be looking at coming into the future.
Then the other, which I think is very important, is the role of patient advocates
and how patient advocates can actually help with regard to access to data.
Because without it being driven that way, this is unlikely to go forward.
In terms of models of disease, our ability to treat is guided by the way we think about diseases. And going into the 21st century, we're still being guided by the Germanic, organ-based pathologic ways of looking at disease, and the molecular at the level of proteins, at the level of constituents that you would get from a serum tube when you go and get seen by your physician.
But similar to what has happened in other worlds -- and I think the best example here is what happened 400 years ago, when you could actually look through a telescope at the heavens and see that there were complex bodies, that there were craters on the moon, which basically completely shifted how one was thinking about what was going on in the world -- I'm going to argue that the tools that are now becoming available to query the genome, to query the proteome, et cetera, are actually giving insight that will make us look at who has what disease in ways that are as different as those happening in astronomy, for example, hundreds of years ago.
There has been more hype than is easy to strip away around the impact of the sequencing of the human genome -- the genome project. But like most transitions, most tipping points in science, you go through hype, then you go and look at what's really going on, and then here comes a wave where people go, oh, there really was something there. And we're about to undergo that second phase, I think, in terms of what is there.
Because the tools that did the first analysis -- being able to look at the DNA level, which took, as I think everyone in this room knows, hundreds of millions of dollars, maybe a billion dollars -- are actually getting down to where it's $3,000 to sequence the entire genome, and it's going to go under a thousand dollars. It will be cheaper to have every nucleotide across the genome read than it is to get your CT scan read. Okay? It's going to be cheaper to sequence your entire genome. It's going to fundamentally change how we think of who has what disease, what's going on. And it's not just at the DNA level. This applies also at the level of RNA and the level of proteins.
And when that happens, as that happens, there's an opportunity to begin to think
of disease at an individual level that has not been possible.
Right now what's possible to do is to go and take a cell apart, and instead of looking at protein structures that you might be able to characterize and put into a structured database, there are signatures which are terribly loose and floppy and can be shifted around in ways where it's very hard to say this is a defined signature, the only signature to look at. We're going to have to get used to a fuzziness, a floppiness, in how you monitor signatures of what's going through cells. Here are four examples of different types of these response signatures.
But for those who know the medical area, this technology is one that our group used, now a decade ago, to figure out which young women were at risk to go on and get metastases. So that work went on; it was in Nature, it was in the New England Journal. What was, sorry, a decade ago has turned into MammaPrint and Oncotype. Those are FDA-approved drugs -- sorry, FDA-approved diagnostics -- that are in use for sorting outcomes in patients.
Right now this nascent field is cluttered by almost religious sects that believe particular ways of looking at the data are the best way to look at the data. There are a number of such zealots who feel you've got to use proteomics, and anything else is a terrible waste of time. Other people feel just as adamant that looking at the DNA level is all you need to do, or looking at the RNA level. What I'm going to argue is that if you're going to be able to look at the interplay between the environment and the genome, if you're going to be able to make predictive causal models of what's going on, it's not going to be possible to take any one of these slices. So the theme of the models we're going to talk about is integrative genomic models that actually allow you to integrate what's going on at all of those levels to get insight into what's going to go on clinically. And as we do this, what we notice is that a fundamental paradigm that drives most of your pictures of disease is going to go away.
Most people think of targets and they think of pathways. I'm going to argue that pathways are not the way to think of diseases; that in fact the pathways, whether they're metabolic pathways or whether they're signaling pathways, are actually looking at rate-limiting steps and not looking at the redundancy in systems, which actually determines where the disease is going to occur or not occur.
So for any simple pathway, the reason why it's hard to make a drug to target a particular pathway is that chances are, two-thirds of the time when you think you have a rate-limiting step, it will not do what you want in a patient, because the redundancy that's sitting there through evolution to make sure that does not happen in fact prevents the effect you thought you were going to get.
And at the same time as this is going on, the acceleration in the data that's coming out -- which makes Moore's law look like a tamed beast -- is absolutely exponential. We passed the petabyte level of data, and this is biologic, genomic data, in 2008. And now the third-generation sequencing machines are able to get close to a petabyte of data out in a period of one day. So what was a year is now a day, okay?
When that starts happening, the problem is going to be how do you connect this
data together? The bits of information are not what we need to have access to.
Actually, it has to be how are we going to structure that into something that's able
to give insight?
And in every other scientific discipline, at a different phase depending upon where that development was, those models have been what led people out of chaos -- whether it was alchemy into chemistry, or whether it was physics and the understanding of ring structures and what's going on with subatomic particles. And I think that what you're going to see in biology, in the clinical world, is a structuring of models that are different from the ones you have been used to thinking about and that are going to be critical in order to think about what's going on.
An experiment that frames this as a possibility is one that took five years. It was done at Rosetta, which is a company that I built up that got sold to Merck. And that five-year experiment, which cost about $150 to $200 million, asked a very simple question: if I could take as much information as I wanted at the DNA level, the RNA level, the protein level and the trait level, could I make insights, and could I build a predictive model of disease from a top-down structure, not bottom-up? This is the most important point I'm going to make.
I do not think systems biology in the classic sense -- gears fitting together like a watchmaker's, equations trying to do that -- will in the next 20 or 30 years get insights that are capable of giving predictions about human diseases. This is top-down, very similar to that blue button you push on your computer when Microsoft collects what went wrong. You do it by understanding when things go wrong, collecting that data and sifting through it in order to figure out what's really going on in the system.
Because what you want to know in disease biology is actually what is the weak spot sitting in the system, not everything that's going on. So this strategy is actually one that takes a fundamentally different approach. And it's actually a unique one, driven by Eric Schadt and his insights, which is: let us assume that every variation that sits in each of us, across our entire three billion base pairs, is an experiment. Let's go in and act as if each one of those nucleotide changes that each of us have is one that you could read out, and then run the experiment in terms of what happened to that patient in that disease, reading a middle layer -- RNA or protein -- between those perturbations and the traits.
This takes serious compute power to do. But when you do, you can start making Bayesian and other network structures of what actually is the weak spot that determines the behavior of other genes, what genes are driving the output. So whether it's co-expression networks, Bayesian networks or causality methods, these methods have come up with a way of highlighting where the targets are, where the causal features in various diseases are, where the biomarkers are that would be able to give you insights.
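As a rough illustration of the causality idea described here -- that a variant's association with a trait should vanish once you condition on the molecular "middle layer" that mediates it -- here is a toy partial-correlation sketch on synthetic data. Everything in it (the cohort size, the single variant and transcript, the effect sizes) is invented for the example; real network methods are far more involved than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical cohort size

# Synthetic causal chain: DNA variant -> RNA level -> clinical trait.
g = rng.integers(0, 3, size=n).astype(float)  # 0/1/2 copies of a variant
r = 0.9 * g + rng.normal(0.0, 1.0, n)         # mediating RNA "middle layer"
t = 0.8 * r + rng.normal(0.0, 1.0, n)         # trait driven only through r

def partial_corr(x, y, z):
    """Correlation of x and y after regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

raw = np.corrcoef(g, t)[0, 1]     # the variant clearly associates with the trait...
mediated = partial_corr(g, t, r)  # ...but the signal vanishes given the RNA layer

print(f"corr(g, t)       = {raw:+.2f}")       # noticeably nonzero
print(f"corr(g, t | RNA) = {mediated:+.2f}")  # near zero: r mediates g's effect
```

If the trait, rather than the RNA, were the mediator, the roles would reverse, and it is this kind of asymmetry that causal-network methods exploit to orient their models.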
And just two examples. One is a paper published five years ago which said: let's go into obesity, let's build models in humans, let's build them in mice, and let's ask, can we make a rank-ordered list of the most important targets that, if you perturbed or modified them, would be likely to shift from lean to obese? That list, published in Nature Genetics, was followed by another paper that came out last autumn where they went in and asked whether you could validate those targets. So if you're used to association studies, GWAS studies, these usually don't end up coming up with the right answer. Here, nine out of 10 of the top list shifted from lean to obese, or obese to lean, as predicted. That began to give us insight: you can build probabilistic causal models of disease that will be helpful in guiding outcomes.
In some other work, if you look at the macrophage to get an idea about atherosclerosis, about cardiovascular disease, about diabetes, could you similarly do an analysis of these networks? The answer was yes. And the cross-talk between tissues ended up being important to figure out how those linkages occur.
Similar work is happening in cancer. Some of the most interesting work allowed us to go and look at disease states and develop compounds that were shifting disease states without going after targets at all. They were just looking at disease states and asking, could you do SAR off of those signatures? So there are about 60 publications in PLoS, Nature and Nature Genetics that came out of this group over the last five years, and this is the seminal work that's now driving this concept of a top-down approach and what would happen if you were able to do that work.
And work has been done here, too. This is a beautiful example of work done by Andrea Califano at Columbia building similar structures. And there are about a dozen to maybe two dozen labs across the world that have taken this approach. There are another hundred that have experiments going on in this area. This area is about to take off: how do I take clinical data from patients, how do I take intermediate data, how do I take DNA variation, and begin to make top-down probabilistic causal models?
We think that those representations of disease -- which are incredibly floppy, flexible, not very well defined; it's not like the structures, Tony, that you showed, where you could go in and ask, is this the right isomer, is this the right that; it's just not of that nature -- are going to be very difficult to sit there and say, I'm going to be using this model, have you used that one, is that the same as this? The complexity of trying to share these models, share this data, is an entirely different thing from figuring out which drawer things fit in, because these are ones which are actually quite fluid, and they're undoubtedly going to evolve over time. There is not one model that you can say is absolutely correct.
So the question that comes up is, sort of, where should this go? Just to summarize, these models take advantage of the fact that in certain parts of the cell there are weak spots that determine Alzheimer's, that determine diabetes, et cetera. And the beauty of it is you do not have to understand the entire systems biology in order to be able to build those models.
It does, however, take massive compute structure; it takes putting things up in the cloud to be able to work on this data; it takes people being able to share data and models; and it will obviously be complemented by Semantic Web approaches.
So we feel that we're going to have to look now -- this is the second part of the talk -- at fundamentally changing how clinicians and biologists work together. Think of the current way people work together -- clinicians and biologic scientists -- as hunter-gatherers. They get a big funding grant, a Framingham study. They get a lot of money, and for the two decades they get to take that data, the person who generates it is the person who's allowed to do the analysis, almost as if they feel entitled, as if they own the data and no one else can look at it because they were given the right.
Where did the money come from? It didn't come from them. So this concept of clinical trials, and who's working with what data, and whether the person who generates the data should be the only person who actually analyzes it, we think is fundamentally flawed.
We think clinicians being archivists -- sitting and saying congratulations, I published a paper, and citing that paper is what's important -- is going to go away. We think, fundamentally, that papers will not be what is primarily cited in biology and in the clinical world, any more than they are in the world of physicists; citing what was the insight, what was the model, who did that -- and thinking about the level at which you get your credit, your recognition, your promotion -- we think is going to fundamentally change for these clinician/scientists.
To do that, though, I do not want to underestimate the complexity of actually making that transition. That's an interesting world to talk about, but almost all of the reward structures, the culture, the way the data is generated, the way it's published, do not take that into account.
You cannot go in and take someone's data now and assume that you could plug it into someone else's model. We think you fundamentally have to get to a place where you could actually do that N-plus-1. And if we can get to that N-plus-1 way of doing biologic data, it would be very powerful. Logically, no one institution or group is in a position to do that alone. That's one of the reasons I left Merck, where I had headed up oncology for eight years: to see whether we could begin to put structures together, and whether we could begin to enable and leverage contributor networks that would actually be building up these structures and putting this together.
And so I think it was an acknowledgement that these molecular models that are coming are going to be ones that require significant resources, and there is no end product. That, combined with the fact that we were fortunate that the single place where this was done inside a pharmaceutical company was willing to donate 80 percent of the data that had been generated -- all of the hardware and software know-how -- to a non-profit entity called Sage Bionetworks, where that could then be the seed, similar to the 1911 Encyclopedia Britannica for Wikipedia: seed data that could be built up with absolutely no strings back to Merck. There's nothing Merck gets that others wouldn't get. It had to be that way. It has to be non-profit; it can't be a for-profit entity.
And so as I had just mentioned, the vision is to get this information connectivity,
the network models and distributed innovation.
And this entity, Sage Bionetworks, based at the Fred Hutchinson Cancer Research Center, now seven months old, has taken three strategies to get there. One is we think we need more research examples showing that this works. In this audience, I would bet there are fewer than 10 people who had known about these networks -- maybe there were 10 -- but it's a small fraction of this room who had understood that probabilistic causal models are likely to be a mechanism by which clinical data is shared. We need to have that enablement. And to do that, what we've done is work with groups who actually have the funds, who want to get to that information, and put a stipulation in: we will help you, we'll take your data, we'll build a model for it -- whether it's a foundation or a biotech or a pharma company -- but at the end of the time that we've worked together there will be a cliff, and all the data and all of the models that you gave us and that we made go out into the public domain.
So no pullbacks. Go and use this as a way to build models; put that out there. And training. Physicists and mathematicians and biologists can't talk to each other well, and those are the groups that become network biologists and systems biologists. There are only maybe a couple hundred -- not many more than that -- in the whole world who I would say are trained network biologists and systems biologists, who need to work together. So fortunately we're getting funded by a number of groups to actually train people to become network biologists and systems biologists. The hardest part, though, is building the platform: what is the infrastructure that's going to allow this data to be identified and models to be built to work with it?
I'm not going to go through the active partners that we have. But we've been
fortunate in having many groups sharing data, helping us with IT aspects and
advocate groups.
In the entire world, there are fewer than 100 of these coherent global data sets that I've referred to, where you have a thousand patients with clinical outcomes, you have a full genome-wide set of information at an intermediate level, and you have DNA variations across the genome.
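To make the shape of such a "coherent" data set concrete, it can be thought of as three matched layers over one cohort. This is only an illustrative sketch -- the class name, field names and array shapes are invented for the example, not Sage's actual schema:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class CoherentDataSet:
    """Three matched layers over one patient cohort."""
    genotypes: np.ndarray   # (n_patients, n_variants): DNA variation
    expression: np.ndarray  # (n_patients, n_genes): genome-wide intermediate layer
    outcomes: np.ndarray    # (n_patients,): clinical outcomes / traits

    def __post_init__(self):
        # "Coherent" means every layer covers the same patients.
        n = self.genotypes.shape[0]
        if not (self.expression.shape[0] == n == self.outcomes.shape[0]):
            raise ValueError("all three layers must cover the same patients")

# A toy cohort at the thousand-patient scale mentioned in the talk.
rng = np.random.default_rng(1)
ds = CoherentDataSet(
    genotypes=rng.integers(0, 3, size=(1000, 50)),
    expression=rng.normal(size=(1000, 200)),
    outcomes=rng.normal(size=(1000,)),
)
print(ds.genotypes.shape, ds.expression.shape, ds.outcomes.shape)
```

The point of the matched-cohort invariant is that the top-down modeling described earlier is only possible when all three layers line up patient by patient.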
I think that in the next five years there will be five to ten thousand of these. So there are fewer than 100 now; this is what every group is putting its money into, and these data sets are getting built up. We have to have a place where you can put those and share those and not have them siloed away. To do that, we recognize that no one entity -- even the 20 people that we currently have -- can actually do that, and we recognize that it shouldn't be done by that group. It should be a contributor community that's actually building what it is that's going to be used.
So that's the Sage repository and commons. You can see the great help that we've gotten from Science Commons in thinking about it. We're assuming that what needs to be done is to build the data repository where the data can be stored. The robust data -- another topic to go into -- is that probably in the commons some data ends up being found to be the most valuable, the most robust. You need to have a platform architecture system. You need to have a place where those models are shared, and rules and governance.
And so what we assumed was that you need to have a distributed social network of scientists who are willing to actually put this together and use it. So instead of saying we're going to hire 100 people, what we've said is let's get the community to work on the individual pieces that are needed to build it, complemented by work that could be done by industrial or IT customers.
Along those lines, we're now eight weeks and six days away from the congress that's going to be held in San Francisco. I know Cameron and a number of other people in this room are going to be there. The purpose of this is to gather together the model builders with the people who build contributor networks, the libraries and publishers who have said they're going to be present, various institutions, the various parts of the government, pharma and biotech, and put together: what do we really need to do? Our goal from that congress is to get the standards and ontologies for that integration analysis.
So even though I am naive, I'm not naive enough not to ask what is the EBI doing, what is NCI doing, et cetera. And those groups are there. This is a different effort than trying to actually solve it in a general sense; this is to solve it specifically for these coherent data sets.
We're very excited that Liz Lyon and several people from Nature are working with us, and PLoS, on how you get citations to these network models. What do you have to have for workflow? How would you do it so you don't cite the paper, you actually cite the model? And that whole structure of putting that together -- we aren't sure how to do it. We're hoping people will help us.
And then finally, there's a whole set of rules and ways to share models and things like that that we're hoping to put out as proposals to the community -- to have the such-and-such San Francisco principles that will go out there and drive where this is going.
Last thing I want to say, for those who are going, wait a second, what about
privacy, wait a second, what about IP, et cetera: there are real issues, some of
them most acute with regard to institutions, tech transfer offices and places like
that, about how this is going to work. So this is about how that data is put out
there and shared.
And I want to give you one example of a project that we're doing that I think will
give you a sense of what you can do with the technology. It will also, I think,
show the importance of going to patients.
So, cancer drugs. I was trained as a pediatric oncologist. As far as cancer
drugs are concerned, fewer than one in four drugs given as an approved drug
actually works in patients, in general. That includes the ones that cost $50,000
to $100,000. And it has gotten to be okay; the question is not being asked,
morally: what about the three-quarters of patients who get those drugs? Is there
a way to actually find who shouldn't get those drugs?
It's most serious not because of the cost but because those patients do not get
the drugs that they should have gotten as alternates. So we have standards of
care that get given in lung cancer, et cetera, where we know only one in four
patients is going to be able to respond. There are no trials today that have
looked into how to find the patients who are not responding. So we're going to
use this technology to go in and find, for those particular tumor types up here,
and I think breast cancer as of yesterday may actually be added as a fifth. But
the point is, we think we have the technology, if we can gather the equivalent of
a thousand patients and their samples to do the [inaudible] interrogations, to
have evolving signatures that will actually say which approved drugs should not
be used in which patients.
And the Lance Armstrong Foundation, 23andMe, a number of groups are helping
us. But I bring it up because this is going to be done, for the first time, by going
directly to patients. We're not going to physicians, we're not going to MD
Anderson, we're not going to Sloan Kettering; we're going directly to patients and
saying, can you give us permission for your data? Do you want your data used
in this way, with the clinical information and outcomes, to generate an evolving
system which over decades will actually add on to where the FDA says this is an
approved drug: there will come a but, this is maybe where it should be used.
And I think this is the type of advocacy project which will help with some of the
logical questions that you might otherwise ask in terms of how we're going to
work with data.
You could say, but do they want to do this? Because actually the biggest fear is
not a technical fear, although that's pretty scary. It's actually, do the scientists
who are doing their experiments today want to do this in a different way, and are
biotech and pharma companies willing to let biology become a pre-competitive
area? What's driving this is we think disease biology should not be a
competitive area. It should be something that is open, where everyone is
sharing data in biology.
And the reason why I think there's a possibility is similar to what happened to
the semiconductor industry in the '70s and '80s. When it was recognized that
they had dropped out of first place, all of a sudden they came together as
Sematech and put together structures to figure out heat and density, to figure
out what should be done. I think scientists and industry are ready to say, we're
going into a world we don't know how to get into, and we're going to have to
come up with fundamentally different ways, and be willing to put serious money
into how we work with data, how we put models together.
And so with that, I'll just go back to the three themes. I hope I've been able to
suggest that there is an emergent way of looking at disease models, not looking
at them as targets or as pathways but as networks; that to get to that, enormous
amounts of information have to be stored, handled, and models built, and to do
that we're going to have to work as scientists in different ways. And thirdly, if we
do that but do not involve patients as drivers, it won't happen. Thanks very
much.
[applause].
>> Lee Dirks: Questions for Stephen?
>>: So this is the first time I've heard this, and I'm very interested. Now, if I
understand what you're doing, you're building essentially machine learning
models from pharmaceutical measurements, right, which have predictive power.
Is that correct?
>> Stephen Friend: I'm going to make one slight change, because I think it's
around the semantics of machine learning. So what is done is to take the state
of the cell across the entire genome and say, you know, position one is this
way, that way, two, three, four. So the input is raw clinical data, or raw genomic
data. Then to take another layer through the cell, looking at what's happened to
all the RNA or all the proteins. And then to have a box of outcomes.
Now, as to how machine learning is used: the models that are built are the ones
that are done by asking, every time I see this happen here, do I see this, and do
I get to that? So I don't know whether I would call it classic machine learning as
much as iterations.
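The iterative question Friend describes, every time I see this feature, do I see that outcome, can be sketched in a few lines. This is an illustrative toy only, not Sage Bionetworks' actual pipeline; the cohort records, field names, and the `association_strength` helper are all invented for the example.

```python
# Toy sketch of the co-occurrence test described above: for each genomic
# feature, what fraction of patients carrying it showed the outcome?
# All data below is synthetic and for illustration only.

def association_strength(samples, feature, outcome="responded"):
    """Fraction of samples carrying `feature` that show `outcome`.

    Returns None if no sample in the cohort carries the feature.
    """
    with_feature = [s for s in samples if s["genome"].get(feature)]
    if not with_feature:
        return None
    hits = sum(1 for s in with_feature if s[outcome])
    return hits / len(with_feature)

# Synthetic cohort: per-patient binary genomic state plus a clinical outcome.
cohort = [
    {"genome": {"mutA": 1, "mutB": 0}, "responded": True},
    {"genome": {"mutA": 1, "mutB": 1}, "responded": True},
    {"genome": {"mutA": 0, "mutB": 1}, "responded": False},
    {"genome": {"mutA": 0, "mutB": 0}, "responded": False},
]

print(association_strength(cohort, "mutA"))  # 1.0: all mutA carriers responded
print(association_strength(cohort, "mutB"))  # 0.5: half of mutB carriers responded
```

Iterating such tests over thousands of features and patients, and re-running them as new samples arrive, is one way to read the "evolving signatures" idea: the associations are recomputed rather than fixed once, which is why Friend calls it iterations rather than classic machine learning.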
>>: [inaudible] like a neural network with, you know, some form of observed
reality --
>> Stephen Friend: Correct. Let's call it that. I agree. That's a good way to
state it.
>>: So my problem, if you like, or my concern, is the following. First of all,
without a clear understanding in terms of scientific domains that people
understand, they have to take the models purely on their predictive power; that
is, the value of a model is that it either works or it doesn't.
>> Stephen Friend: Correct.
>>: Now, we've been through that in [inaudible] for about 30 years, and I am
fairly outspoken in saying I think most of the stuff that's been published in the last
30 years is junk.
>> Stephen Friend: I agree.
>>: Right?
>> Stephen Friend: I agree.
>>: And the problem is that in the journals you do not have to deposit your data
set, you do not have to deposit your model, you do not have to deposit your
algorithm, you do not have to say how you filtered the data. Basically, when you
submit it, you just say, well, we're trusting people.
>> Stephen Friend: Exactly.
>>: [inaudible] known in the community. And I mean, Tony may disagree with
me, but Tony and I are on the board of this new open access journal,
Cheminformatics, and I am concerned that there's still a feeling that we take
people on trust rather than building something totally reproducible. So one of
the issues that came up was whether a pharma company should be required to
put their data in, and the general feeling, in part from me, was, well, it's a bit
tough on them if they actually have to make their data available, because it's
commercially sensitive. But you can run up against the same sort of thing here
in [inaudible]. And how are you going to get around the problem that everything
has to be, in principle, reproducible by somebody who actually wishes to
challenge the model?
>> Stephen Friend: I first want to acknowledge that the two themes you brought
up are, I agree, the drivers, and then make a comment. One of them was the
current practice where there's not a layer of accountability and responsibility for
how you went from the input data to how you got to the model; I think that is
wrong, and this has to not take that approach.
And the second has to do with the fact that you shouldn't begin to think that
pharma is ready to make available data that has to do with the compounds
they're bringing through their clinical trials. And I want to say that on both of
those I think you can move and come through the middle in the following way: I
think there has to be a way for the annotations on the input data to be ones that
people can understand and use to combine. The way data is deposited in the
public today is absurd. It's in the public domain? It isn't. You cannot get to it.
Okay? And I don't care what anyone says about it all being in the public
domain. Biologic clinical data: good luck if you're going to try to do something
with it afterwards, even though the token is, yes, I put it out there.
So we're saying you actually have to structure this, and that's what we're saying
one would need to do in this biologic space to build the model.
The other is that there's a very important line between having an understanding
of disease biology and having that biology interrogated by proprietary
compounds. Where there's going to be an inability to get through has to do with
the structures and the compounds that are proprietary. And what we are hoping
is that there's enough of a sense of a burning platform, that existing business
models won't survive and scientists are going to say, I need to make my
models, that the sharing can actually happen on the disease biology itself. So
on the disease of Alzheimer's, not on the drugs, there could be enough of a
sense that we're not going to be able to do what we want to do unless there's
sharing there, and then separate that from what's going on with regard to the
compounds.
>> Lee Dirks: I think I need to cut us off so we can take a break. I would like to
suggest a 15 minute break and reconvene at 3:35. So, please, thank you very
much, Steve.
[applause]