>> : Okay, so thanks for coming back. So we've swapped the order of
Kirk Borne and Alex Gray. So now it's my great pleasure to introduce Alex
Gray who's going to tell us about scalable data mining.
>> Alexander Gray: All right. Thanks, George, and the organizers for
having me here. So my affiliation is a little different now. It's the
usual suspects plus some others. For those of you who know me, I'm
still at Georgia Tech but spending most of my time now at Skytree
which, I think, is a new company. Not that new in a sense; it's been
developing product for three years, which is scalable machine learning
software. So all of what I'll talk about, and will ever talk about, is
implemented in Skytree or will be. That's what I've decided is the
final delivery form for all the fast algorithms from my research lab,
and the best fast algorithms and machine learning methods that I
cherry-pick from the other labs; they will be in Skytree.
So why is that interesting to you? Of course I have a special soft
spot for astronomy. It's how I grew up; it was the domain area that I
worked in. I'm not an astronomer; I'm maybe the one guy in here who's
not an astronomer. Who else would describe him or herself as a
non-astronomer? Okay. All right. But I'm a computer scientist/machine
learning person or statistician, so I'm an oddball in the sense that,
over the last 20 years, I'm probably the person who's been hanging
around astronomers the longest that I keep seeing today.
And then within machine learning, I'm an oddball because I'm the guy
who's been worrying for a long time about big data. And of course now
it's a hot topic. In the last two years a lot of other people have
started to think maybe this is important. But for whatever reason the
software we have today to do this stuff is the same. It's the same
crap that we've had for 20 years since I got started; it's the same
general framework. You have a library written in some language. You do
need many different tools because there's no one tool that does
everything. But you have some thin command line interface and you can
do some plots. Okay? That's MATLAB, R, Weka, SAS, SPSS, both commercial
and open source stuff. There's really nothing still. There are some
attempts that are still pretty weak -- if you ask me, really weak,
like not-usable weak. In the open source world it's great, but --. So
what we've decided to do is take the best in the research world that
we know of in algorithms and machine learning methodology and cook it
into real, professional, enterprise-grade software that both a big
company with a lot of money and a science institute with almost no
money can use for their high-value, high-complexity problems.
So, anyway, that's all I'll say about that. But if you're interested
or intrigued: we need smart people to work for us, including
astronomers, and we're also making it basically as free as we can make
it for astronomers. Okay? So if your institute or group is interested,
let me know.
So as for your statistics problems, here are some common ones that
I'll talk about. So the first four basically boil down to -- Well, I
realize that a lot of what people end up being interested in after my
usual talks are the steps that I skip over, which are: what's the best
way to translate my problem, first my science problem, to a machine
learning problem? And once I've done that, which type of machine
learning problem? And once I've done that, which method should I use?
So I'm going to talk a little bit about that in a bit more of a
tutorial way. Many of you here are already experts in that, but for
those of you who might get something out of it I'll talk a little bit
about that.
And then at the end I'll come to these last three. So the first four
really boil down to a usually straightforward translation from the
problem to a machine learning formulation, where the remaining issue
is computational; that's what I'll say a little bit about, in a
slightly different way than I usually do. And then the last three are
basically statistical issues where the standard machine learning
methods out of the usual software or textbooks don't quite do the
trick. You need something fancier.
So, well, what are the different kinds of tasks? Basic queries,
estimating a density, classification, regression, dimension reduction,
clustering, and two-sample testing and matching -- comparing two data
sets. Okay, I think most of you know what these are. And this is the
usual starting point for my talk, which is: if you look at these, and
you look at the best ones, they are often N-squared or N-cubed. And I
will come back to that. That's a killer. N is the number of rows in
your data table, so N-squared is a crushing growth in computation that
means you can't do millions, let alone billions, of objects.
And then of course that's just for one run of a method with one
setting of a parameter. You actually want to try many, many, many
settings of parameters to get the best model and to get error bars,
which we never get to because of computational reasons. We would like
to do the whole thing ten thousand times to do a bootstrap or
jackknife. Okay, so we never even get that far because we are crushed
just trying to do one run of a sophisticated method.
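To make the arithmetic concrete, here is a minimal sketch in plain NumPy, with made-up stand-in names (`fit_and_score` is hypothetical, not any method from the talk), of how a parameter grid times bootstrap resamples multiplies the cost of one run:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(10_000)   # stand-in for one column of a data table

def fit_and_score(sample, bandwidth):
    # Stand-in for one run of a method with one parameter setting.
    return np.mean(np.abs(sample) < bandwidth)

bandwidths = np.logspace(-2, 1, 30)  # 30 parameter settings to try
n_boot = 1000                        # bootstrap replicates for error bars

scores = np.empty((len(bandwidths), n_boot))
for i, bw in enumerate(bandwidths):
    for b in range(n_boot):
        resample = rng.choice(data, size=data.size, replace=True)
        scores[i, b] = fit_and_score(resample, bw)

# 30 settings x 1000 resamples = 30,000 runs of the base method --
# crushing if a single run is already N-squared.
print(scores.mean(axis=1)[:3], scores.std(axis=1)[:3])
```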
And these are just textbook methods and they're already difficult.
Okay, and these are non-textbook methods. These are more
state-of-the-art methods; these are just a few examples. But there are
many, many of these; the fancy stuff is the stuff you make up to deal
with specific problems, like measurement errors or whatever. These are
typically even more expensive. Okay? You want higher sophistication --
you see even an N-to-the-fourth here -- because higher statistical
sophistication means higher computational cost. There's almost no way
around it. So that's the computational problem, which I will come back
to.
But for now I want to -- This is probably the most common kind of
question that I get, which is: well, say I translated my problem to a
density estimation problem. Density estimation is basically like -- A
histogram is a density estimator. It's non-parametric. It can fit any
shape of distribution. It's one dimensional and it's kind of crude
because it's choppy. What's the fancier version of that? Well, it's
kernel density estimation. Many of you know what that is. It's
smoother. It works in general dimension. It just fits the shape of
your distribution, and then you can probe that shape and ask, "Where
is it low? Is this a low density point, an outlier? Is it a high
density point, a common point?" and so on. So of course this is
fundamental; you do this everywhere in astronomy.
So what's the best way? There's kernel density estimation. There's
also a mixture of Gaussians. Those are probably the two that I would
suggest for this problem. What are the tradeoffs? Kernel density
estimation is your most accurate overall method. Why is that? Because
it's free of strong distributional assumptions. It doesn't have a lot
of parameters. It has one parameter that controls the scale of the
little kernel function that you throw on each point. But here's the
killer: the reason it isn't used everywhere is that it's expensive
unless you have a fast algorithm, which I'll come to of course; I work
a lot on that. Okay? And the real fast algorithm, the best one, is
unfortunately pretty complicated. So you really want it packaged up in
some reliable software.
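For reference, a minimal kernel density estimation sketch using scikit-learn's tree-backed implementation, on synthetic data (this is illustrative, not the fast algorithm or software discussed in the talk):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.5, 500),
                    rng.normal(3, 1.0, 500)])[:, None]

# One parameter: the bandwidth, the scale of the little kernel
# function thrown on each point.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)

# Probe the fitted shape: a low density value flags an outlier-ish point.
query = np.array([[-2.0], [0.5], [3.0]])
print(np.exp(kde.score_samples(query)))   # density estimates p(x)
```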
Mixture of Gaussians: tried and true, and it's pretty efficient. You
use something called the EM algorithm. It's pretty efficient once you
have a fixed K, where K is the number of Gaussians that you fit. Of
course the problem then is: how many Gaussians do I fit? I kind of
want to make it more and more non-parametric, able to fit arbitrary
shapes that aren't necessarily just big Gaussians. So I can throw in
more and more Gaussians, but then how many, and how do I search that
space of Gaussian parameters and number of Gaussians? That's what
makes it fiddly, unfortunately. Meaning, you have to try a bunch of
stuff. And the other thing making it fiddly is that you get stuck in
local minima in the optimization. It's not a global optimizer. Nobody
knows how to do a global optimizer for it that just works; it's a
non-convex problem.
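A small sketch of that fiddliness, using scikit-learn's EM-based `GaussianMixture` on synthetic data and scoring each K with BIC (one common, but not the only, way to search over the number of Gaussians):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 2)),
               rng.normal(5, 2, (300, 2))])

# The fiddly part: K is unknown and EM only finds a local optimum,
# so try several K (with restarts) and score each fit.
best_k, best_bic = None, np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic = gmm.bic(X)                  # penalized-likelihood model score
    if bic < best_bic:
        best_k, best_bic = k, bic
print("chosen number of Gaussians:", best_k)
```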
But it has another advantage: it offers a way to impute. Imputation
means filling in, guessing at, missing values. So if your data table
has a whole bunch of question marks in it -- well, this measurement
was not taken; we don't know the, whatever, ellipticity of this
object; no one recorded that; question mark -- if you have a whole
bunch of holes in your data table, then the usual algorithm for this,
the EM algorithm, can fill those in; it can guess them for you.
Whether that's good or bad, it's better than nothing, because if you
don't fill them in you can't use any machine learning method. Machine
learning methods generally assume that everything is filled in. But
it's still kind of an open problem, basically, because you are putting
in data that isn't really there. So ideally your learning method would
learn the model while simply ignoring those question marks, so they
don't affect it; it would only use the non-question-marks to obtain
the model.
Okay? That is possible in certain methods, in certain situations, and
that's preferable in my opinion. So I don't think missing value
imputation is the end of the road; it's not totally satisfying.
Nonetheless, if you have a problem right now and you have missing
values, this is a good thing. It's decent. Okay? So those are some
tradeoffs there.
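For illustration only, here is the crudest form of imputation in scikit-learn (column means; an EM-fitted mixture imputes more intelligently from the joint distribution, but the same caveat in the talk applies to both):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# A data table whose "question marks" are encoded as NaN.
X = np.array([[1.0,    2.0],
              [np.nan, 3.0],
              [4.0,    np.nan],
              [5.0,    6.0]])

# Crudest imputation: fill each hole with its column mean.  Either
# way, you are inserting data that was never observed.
print(SimpleImputer(strategy="mean").fit_transform(X))
```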
Classification is the other one I'll spend time on. For lack of time I
won't touch on all the problems, regression and all, unless you want
to ask me about them later. But there's a whole zoo of methods,
especially for classification. In a way it's the hallmark machine
learning problem. It's the one that the most effort has been put into.
It's the one where it's easiest to understand what it's doing. It
applies all over the place, and it gives good results. It's the basis
for most of the big application results that have come out of machine
learning, you could say. Google uses machine learning to show you ads
-- which ads you might like based on your query terms. The handwriting
recognition in the U.S. Postal System: 97% of all mail is sorted by
machine learning. All these big things really are classification under
the hood.
Okay, but that's why there are so many of them; there are so many
different ways to sit around and dream up a new kind of classifier. So
let me navigate that. Naïve Bayes is probably your simplest. You could
teach this to a kindergartener as long as they know what a Gaussian
is. It's simple and, therefore, it's instantaneous as far as speed as
well. It's also instantaneous to implement. Those are both good
things. But it's unlikely to ever give you the highest accuracy; it's
a very simple model. Logistic regression is one level up in
non-simplicity but still pretty simple, pretty fast. It's a linear
classifier. Perceptron is a different way to do a linear classifier --
you may have heard of that from the eighties -- also unlikely to give
you the highest accuracy because it's a linear model. Your decision
boundary that defines the boundary between the cloud of Class A and
the cloud of Class B has to be perfectly linear for this to be a great
model.
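A minimal sketch of those two simple baselines on synthetic data (scikit-learn, illustrative settings only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Both train almost instantly, and both draw (essentially) linear
# boundaries, so neither will top the accuracy charts on a genuinely
# nonlinear problem.
for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```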
So, decision trees. Now we're starting to get into some powerful
methods. This is non-parametric, which means you can prove -- actually
only in a shaky way, but roughly it's still true -- that you can fit
any distribution, any decision boundary, with this kind of thing.
Unfortunately it's also a little crude. It's kind of choppy: in the
sense of a histogram, it chops things up into hyper-rectangles. But it
has a lot of great properties. It gives you medium-level accuracy, a
lot better than those other two generally. And you can interpret the
output as rules. One of the earliest machine learning things in
astronomy, which I was semi-involved with, was at JPL doing a
star-galaxy classifier with decision trees. And back then they were
looking at the rules to interpret what the properties of objects are
that make them more star-like or galaxy-like.
Okay, and it's somewhat fast. It's not instantaneous but kind of fast:
N log N to learn, where N is the number of rows. You can easily do
mixed discrete and continuous attributes; in other words, some of them
are numerical, some of them are A-or-B-type variables. It has a nice
way to do missing values that doesn't impute them. It doesn't guess at
the missing values; it actually just ignores them. It only uses the
non-missing values to make its model. And automatically, at the same
time as learning the model, it decides what features it can ignore
that actually weren't useful -- which is the other big thing you
always want to know. Okay, you have a model. What were the features
that actually predict whether something's a star or a galaxy, or a
quasar or non-quasar?
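A small illustration of those two interpretability payoffs, rules and feature relevance, with scikit-learn's decision tree on a stand-in dataset (not the JPL star-galaxy data):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Interpret the model as rules...
print(export_text(tree, max_depth=2))

# ...and read off which features actually mattered for the prediction.
top = np.argsort(tree.feature_importances_)[::-1][:3]
print("most predictive features:", top)
```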
So that's a lot of nice properties, but still not the highest accuracy
generally. We do a lot of these bake-offs in machine learning; it's
part of the culture, which I like, empirical bake-offs. And it never
really is the winner, but it has these nice other properties that are
very practical. Okay? Then there are random forests. People who are
champions of decision trees didn't like that other methods were
pounding decision trees in accuracy, and so they found ways to boost
the accuracy of decision trees in a way that mostly keeps the
properties that are good. But you lose at least one. What you do is
you basically take a whole bunch of decision trees and you average
them together. So you have to learn hundreds, ideally thousands, of
decision trees, all a little different -- there are a couple of ways
you make sure that they are a little different from each other -- and
then you average them.
So then the model is essentially a linear combination of many decision
trees, generally with equal weights. But it's a huge model of course,
so you can't interpret it any more. It retains the other good aspects
of decision trees, though, and the accuracy now is in that set of
methods that can give you the highest accuracy. Okay? I'll just say I
won't pick a winner as far as accuracy, because there's never one
winner across all problems. It is problem-dependent. There's even a
theorem about that, the no-free-lunch theorem, which says there's no
winner across all problems. It depends on your data. But although
there's no single winner, the methods in the winning set are all
non-parametric, meaning you can prove that they can fit any decision
function. Okay? And random forests is one of them.
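A minimal random-forest sketch in scikit-learn on the same kind of stand-in data (illustrative, not a benchmark):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Hundreds of deliberately decorrelated trees (bootstrap rows plus
# random feature subsets), averaged with equal weights.  The single
# tree's interpretability is gone; the accuracy moves into the top tier.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```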
Neural networks: There was a time when all of machine learning was
neural networks; it was synonymous with neural networks for a couple
of decades. Now it's been reborn as something called deep learning,
but it's really the same thing. It has always been poo-poo'd. It got
poo-poo'd big time in the mid-nineties because something displaced it
in popularity called support vector machines, which I'll come to next.
Because support vector machines, basically, have a convex objective
function, so you have a good optimizer for them. Whereas the problem
with neural nets is that they're fiddly. It's yet another one of these
non-convex objective functions, and so you're always fighting local
minima. It has thousands of parameters usually. And there are all
sorts of things, so you end up kind of -- But if you work at it, you
can get state of the art. In fact for a couple of problems neural nets
are the best approach.
And that's always the case with any method: if you're dedicated enough
and dogmatic enough about that method, you can make it the best method
for your problem, because you've put more energy into it than anybody
else. And there are some important problems where neural nets, deep
learning, are the biggest thing. Relevant to this community: for some
image classification problems, deep learning is the best thing right
now.
Okay? Speed: slow-ish to train but fast at prediction time, so still
medium, kind of like decision trees on the [inaudible] front. Nearest
neighbor: a very simple method, but among that set that gives you the
highest accuracies. And my favorite for certain astronomy problems is,
you could say, an elaboration of nearest neighbor: kernel discriminant
analysis. I like it because it's in that set that can give you the
highest accuracies. It has just a few parameters. It's expensive
unless you have a fast algorithm; if you do, it's okay. And it has
interpretable probabilistic semantics, which people -- Bayesians and
others who want to think about probability distributions -- like and
feel comfortable with in terms of interpretation. And it gives you an
accurate estimate of the probability of being in class one or two,
which many of the other methods don't focus on, and so they don't give
you good estimates.
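Here is a minimal sketch of kernel discriminant analysis in its simplest form, built from per-class kernel density estimates combined through Bayes' rule (a naive synthetic-data version; the fast tree-based algorithms mentioned in the talk are what make this practical at scale):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, (500, 2))
X1 = rng.normal(2, 1, (500, 2))
X = np.vstack([X0, X1])
y = np.r_[np.zeros(500), np.ones(500)]

# One kernel density estimate per class, plus class priors.
kdes = [KernelDensity(bandwidth=0.5).fit(X[y == c]) for c in (0, 1)]
priors = np.array([np.mean(y == c) for c in (0, 1)])

def posterior(queries):
    like = np.exp([kde.score_samples(queries) for kde in kdes]).T  # p(x|c)
    joint = like * priors
    return joint / joint.sum(axis=1, keepdims=True)                # p(c|x)

# A calibrated class probability, not just a hard label.
print(posterior(np.array([[1.0, 1.0]])))
```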
Okay, and then the support vector machine. Anecdotally, across all
problems, if you had to pick one winner averaged over many, many
problems, the most common winner is the non-linear support vector
machine. Usually by just a little bit, but if you had to pick one
winner this would be it. And unfortunately this is the biggest open
problem of my lab. My lab aims to scale up all machine learning
methods; this is the one that I can't really do yet. This is the thorn
in my side. Basically, no one can make this scalable yet. If you
could, it would be the one to use in terms of pure accuracy. Okay?
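For completeness, the kernelized SVM itself is a few lines in scikit-learn; the code is the easy part, the scaling is the hard part:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The anecdotal winner: an RBF-kernel SVM.  The catch is training
# time, which grows roughly between N-squared and N-cubed in the
# number of rows -- exactly the scaling wall described above.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
print(cross_val_score(svm, X, y, cv=5).mean())
```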
So I just wanted to say: if you look at this list and you focus on the
non-parametric ones, the high accuracy ones, the ones in red -- well,
why are they expensive? Why are they N-squared and N-cubed? Many of
you know this if you know my research: it's because they involve
all-pairs computations. They're comparing each point with each other
point, computing a distance or similarity or kernel between them.
Okay? Recently there was this National Academies report on massive
data. It's finally become a topic of national interest, and so I wrote
the chapter there looking at the deep computational bottlenecks in
analysis of massive data. And these problems tend to be of this second
kind, a category I made up called "Generalized N-body Problems,"
things involving pairwise comparisons.
For those of you who don't know this -- and I usually don't say it
this way, but this is my provocative claim -- anything to do with
pairwise comparisons, I believe we know how to take from N-squared to
order N. We have done this, I think, for all of the common things that
you see in statistics. Okay? For example: for every point, find its K
nearest neighbors. That's N-squared naively; we do it in order N. This
is all provable now. It took me many years. We have now proved this in
the computer science sense of worst-case complexity, which you may not
care about, but computer scientists always told me, "Ah, it's just a
conjecture. Prove it." So we finally proved it. And it is a little
non-intuitive -- which is why it took me a while -- why it is that you
can do this in order N time.
Okay? But my claim is: if it looks like you're looking at something
with pairwise comparisons and you have to do them all, you probably
don't need to. And this includes things like friends-of-friends. It
includes n-point correlations, where it's even worse: n-point
correlations are pairs in the two-point case, or triples or quadruples
in the higher-order cases. Okay? And so we did the first fast
algorithm for general n-point that's exact a long time ago, in 2000.
But now smarter people than me, like my student Bill, who I'm going to
promote in this talk, have done the largest-scale three-point
correlation to date published, and there are even faster ones now. And
actually, in practice, when you're doing n-point calculations using a
random set, you're doing many, many re-samples, or you're doing a
jackknife, and you're doing it for many, many scales -- if it's
three-point, it's many different triangle settings. It turns out you
can do all of them at the same time, in less time than doing all of
those things individually. And you can get many orders of magnitude
speedup by doing that. That's another paper that Bill did with Andy
Connolly.
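A minimal illustration of the tree-based idea, using scikit-learn's KDTree (this is the standard exact all-k-nearest-neighbors query, not the generalized N-body framework or the n-point code from the talk):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((100_000, 3))      # 10^5 points in three dimensions

# All-nearest-neighbors without the naive N-squared double loop: the
# space-partitioning tree prunes whole blocks of point pairs at once.
tree = KDTree(X)
dist, idx = tree.query(X, k=6)    # 5 neighbors + the point itself
print(idx[0])                      # neighbor indices of point 0
```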
Okay. So, real quick, statistical issues that come up: measurement
errors are a big issue, because some objects are close and some are
far away, so they have different measurement errors. So we're working
on how to do some of these fancy non-parametric things that are based
on kernel density estimation in a way that accounts for measurement
errors. We have results that are about to come out.
Two more slides, three more slides. Everything is now time domain --
all the next generation surveys -- and so you would like to be able to
do all of this stuff not just on static objects but where each object
is actually a time series. The problem with that is that time series
are variable in length and so on; they're not of the same shape. But
there are ways to do machine learning -- whose standard paradigm is
that everything is the same fixed-length vector shape -- on funny
objects that are not of the same shape at all, like graphs, sentences
and time series. So we have some ways to do that. And then finally,
even though we can run certain sensors on a large scale all over the
sky, there are still other, more expensive ones that we can't run all
over the sky on everything.
So the question arises: what are the best objects to measure? This is
sometimes called active learning. If I only have a fixed budget of
objects to measure, to obtain some underlying function, then what
should they be? That, it turns out, is a natural concept to want to
formalize. It's taken a long time, and only very recently have people
done this in a rigorous way, where you can prove that by not choosing
everything, and not choosing randomly, but choosing in a deliberate
way, you still get some guarantee on the error. Right? Because if you
only choose some of them in a funny way, you could just learn the
wrong function. But there's a way to do it, and we developed one way
to do that.
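As a loose illustration only: the simplest heuristic flavor of active learning, uncertainty sampling, looks like the sketch below (synthetic data; this is not the rigorous, guarantee-bearing method referred to in the talk):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, random_state=0)
labeled = list(range(20))            # a tiny initial "measured" set

# With a fixed follow-up budget, repeatedly measure the object the
# current model is least sure about.
clf = LogisticRegression(max_iter=1000)
for _ in range(30):                  # budget: 30 follow-up measurements
    clf.fit(X[labeled], y[labeled])
    p = clf.predict_proba(X)[:, 1]
    uncertainty = -np.abs(p - 0.5)   # closest to 0.5 = least sure
    uncertainty[labeled] = -np.inf   # never re-measure an object
    labeled.append(int(np.argmax(uncertainty)))

print("objects measured in total:", len(labeled))
```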
Okay, so I'll just mention that the software implements all of these
things. How do you find out more? That's the last slide. If you want
to find out more about how to do machine learning in astronomy, there
are some other good books coming out, and we have one with these
co-authors where we'll have a longer exposition of these kinds of tips
and tricks -- "what sort of methods should I use for my problem?" --
as well as explanations of them. If you want scalable software to do
this stuff, where do you get it? It's not scalable, but there's Python
code in our book that makes machine learning easy. There is scalable
serial software -- one-machine software -- that's free, open source,
from my lab. And then of course if you want the fastest and most
powerful, that's for pay, but we'll make you a deal, you and any
scientist basically, to support science at cost. So just talk to me
directly if you're interested in that.
And if you're interested in both, the upcoming expert is my student
Bill, who works in astronomy doing scalable astrostatistics. And if
you're interested in machine learning as a career, talk to me -- we
need smart people at Skytree. All right. Thanks.
[ Audience applause ]
>> : Okay. Questions? Yeah.
>> Alexander Gray: Yeah?
>> : Can I have the microphone and a soapbox? I have to disagree with
your characterization of kernel density estimation.
>> Alexander Gray: Okay.
>> : Can you make it louder?
>> : Oh, I'm attacking the characterization of kernel density
estimation as, you said, the most accurate. It actually throws away
information. Any smoothing technique, of course, discards information
at the finer scales. In a sense the tradeoff is that you want pretty
pictures of your density rather than accuracy, in some sense. I didn't
have time to talk about it, but the Bayesian blocks algorithm I
described in one dimension can work in higher dimensions. And I refer
to a paper with [inaudible], the senior author, where we apply that to
the Sloan Digital Sky Survey and basically get a density estimation
that represents essentially all the information that's present,
without any bias at scale, and doesn't do any smoothing.
>> Alexander Gray: Okay. Well I'm certainly open to there being many
methods that are good. But I'm not sure how you can do any kind of
density estimate without smoothing because if you're not...
>> : [Inaudible] tessellation does it in order N in a simple way.
[Inaudible]...
>> Alexander Gray: But that's a discretization, which is a form of
smoothing. In fact it's a chop.
>> : Well, that -- Yes. In this context the data is discrete anyway so
you're just representing the density information in the data.
>> Alexander Gray: Okay. So if it's discrete data then you can use a
discrete method and not smooth. But if it's continuous data you have to
do some smoothing. I think that's --. But, anyway, not that there
aren't other methods, but these are kind of -- I just listed, you know,
standard off the shelf textbook methods. That doesn't mean there aren't
methods from the research literature that should be in the textbooks.
>> : Okay. [Inaudible] question.
>> Alexander Gray: Oh. Maybe David's question.
>> : Yeah.
>> : So one of the things you alluded to in your talk but didn't make
explicit is the difference between generative models and
non-generative models. Some of these techniques -- just specializing
to classification -- are effectively a generative model for the data,
and then use that generative model. And I think from the point of view
of astronomy there's a lot of advantage to generative models over
non-generative models, because with generative models you're better
able to deal with missing data, which I consider to be an absolutely
essential [inaudible] for astronomy. You can deal with the fact that
you have [inaudible]. But more important to this community, I think,
in this room, is this issue of utilities. Whenever we're classifying,
we're always classifying for objectives. We have objectives. And those
objectives -- we don't necessarily want the most [inaudible] results.
We want the results that produce the most value for us going forward,
[inaudible] select things for follow-up or whatever. And I think
somehow that's missing in this thing about classification. It's not
clear at all where you can insert your utilities.
>> Alexander Gray: Yeah, so you hit a number of things which I'll try
to remember and address. So, you know, generative versus
discriminative -- yes. Typically you get a couple of extra things out
of a generative method that you don't get out of a discriminative
method, usually at a cost: the cost of not being the very most
accurate. The discriminative ones do less work in a sense; they are
not trying to get you class probability estimates. And if those are
really important to you and you want them to be accurate, that's why I
like things like kernel discriminant analysis, as I said. But, you
know, you do pay a little cost for that, and that's why a kernelized
support vector machine, which is discriminative, is slightly more
accurate in general: because it only worries about the decision
boundary.
But another advantage, you could say, of probabilistic or generative
methods is the missing value thing. But that's only one way of doing
missing values, the one I cautioned about, which is imputation. It's
not the only way. It has some pluses and minuses, so I wouldn't
necessarily tie them together in a one-to-one fashion: "Oh, you have
missing values? You have to do a probabilistic method." Not
necessarily. But, yes, you often do want a confidence on -- you know,
I called this thing type A instead of type B; what's the confidence of
the model? That's pretty common. You can get those out of a
discriminative method, but it's usually a hackier way of getting
probabilities out. So it just depends on the relative importance of
all of those things in your application.
And then utility: there is a formal way to put utilities on top of
machine learning. It's very simple. If you know the basics well
enough, you can eventually write any machine learning method as a
Bayes classification task, in terms of Bayes' formula. It doesn't mean
you have to be a Bayesian, but you just write it in terms of Bayes'
rule. And then you can throw utilities into that formula. So you can
say, "Well, it's more important to me to avoid errors where things
that are really type A get called type B, versus truly type B things
getting called type A." That's one form of utility that's not equal
weights on the errors. That's easy to adjust for by putting utilities
around the overall loop. And there are other ways that you can put
utilities into things. So I don't know if that addresses everything,
but --.
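A minimal sketch of exactly that adjustment: take posterior class probabilities from any probabilistic classifier, put an unequal cost matrix on the errors, and pick the decision with the lowest expected cost (all numbers made up for illustration):

```python
import numpy as np

# Posterior class probabilities p(class | x) for two objects, as they
# might come out of any probabilistic classifier (values made up).
posterior = np.array([[0.7, 0.3],
                      [0.4, 0.6]])

# cost[truth, decision]: calling a true type A "B" costs 5, the
# reverse mistake costs 1 -- unequal error weights, as in the example.
cost = np.array([[0.0, 5.0],
                 [1.0, 0.0]])

expected = posterior @ cost          # expected cost of each possible call
decision = expected.argmin(axis=1)   # pick the cheaper label
print(decision)  # both come out 0 (type A): mislabeling a true A is costly
```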
>> : For instance in the kernelized SVM is there a straightforward way
to include utility?
>> Alexander Gray: Straightforward, no. But that's why I said if you
know the basics well enough you can essentially re-derive all of the
methods in a way that you put utilities into the parts that you care
about.
>> : That would be a valuable contribution for science.
>> Alexander Gray: Well, let me know if there's a particularly common
instance of that in astronomy. That's something we could do.
>> : Okay. Last question, Eric.
>> : I'm not sure if I'm asking the same question as David in a more
naïve way. But David very quickly mentioned [inaudible] measurement
errors. So in a table, a [inaudible] table, every cell has its own
personal measurement error. My question is, do any methods [inaudible]
-- do any classification methods or other methods allow these to be
taken into account?
>> Alexander Gray: Off the shelf, no. And this is of course a huge
issue in...
>> : Is that the same question you were talking about...?
>> Alexander Gray: Well, no. But I do have a...
>> : [Inaudible].
>> Alexander Gray: I do have a similar answer which is...
>> : Okay.
>> Alexander Gray: ...you know...
>> : This is a grand-challenge-level need. Because astronomers spend a
vast amount of effort on measurements and measurement errors, and then
they go to the statistician and there's nothing but [inaudible] chi
squared. [Inaudible] capabilities. And so there's a huge disconnect
between the practice of observational astronomers and the availability
of modern methodology to incorporate it [inaudible]. So this is not a
trivial issue when you ask about measurement errors. And I'm not sure
how closely this is related to your point.
>> : I consider it completely related because for me missing data is
just an extended case of measurement error.
>> : Yes.
>> : Your missing data is just [inaudible]...
>> : Missing data, which is sometimes a truncation and sometimes just
missing. And we have censored data and we have measurement errors, and
in astronomy the truncation, the censoring and the measurement errors
are all from the same source. And we have no mathematician who's
willing to tackle this thing. I've tried to get you guys to work on it.
>> Alexander Gray: Nah, we're working. Let me just give you...
>> : We'll -- Sorry. [Inaudible] very quickly then we'll start our
second talk.
>> Alexander Gray: So the rough rundown of the situation in the
literature is that this was worked out for linear regression, and
that's basically it in the statistical literature. This problem is
called "errors in variables" there. So it's even got...
>> : There's two books on it.
>> Alexander Gray: ...funky names and some of them make it hard to
track down the...
>> : But it's useless.
>> Alexander Gray: It's kind of useless. And then there are two cases
of it: homogeneous errors and heterogeneous errors. Of course we're
interested in different errors on different points; that has even less
work on it. But we want to do this now for fancier -- for all machine
learning methods. Unfortunately, you have to re-derive each one from
scratch, basically. It takes a new method, a custom method. And so
we're trying to do that for a certain class of methods. David's done
that for a certain class of methods. So they exist; there are a few
things coming out. But it basically takes custom development of new
methods. It hasn't really been done in the stats literature or the
machine learning literature.
>> : Okay. I'm afraid we should call this discussion stuff [inaudible].
I'm sorry. We really have got to move on.
[ Audience applause ]
>> : Alex, none of your links from Fast Lab work.
>> Alexander Gray: Oh, yeah, I keep hearing that.
>> : Okay. Thank you, Alex. So right into the panel discussion with
George Djorgovski, Yan Xu and Mike Kurtz. A talk about [inaudible]
models and [inaudible] publishing.
[ Background noise continues ]
>> George Djorgovski: Well, I may as well start. So for years we've
all been kvetching about the need to reinvent scientific scholarly
publishing enabled by information technology. And there are two
aspects to this. The first one is that the variety of scientific
product has increased dramatically. Whereas it used to be more or less
just research papers in journals, now we have all manner of other
stuff that's very much intellectual product: data archives,
algorithms, workflows, just simple ideas, people debating interesting
things in blogs, etcetera, etcetera. And I think first of all we have
to develop a system for acknowledging and collecting all that
worthwhile product in a meaningful fashion, and also figure out a new
model for quality control. Because we have a peer review system which
is effectively broken, because there are too many papers and not
enough time, and essentially it's used more by people to undermine
their competitors than to advise colleagues to do something better, or
nothing at all. So we need to look into that through forms of maybe
crowd-sourcing or whatever.
The second aspect of all this is that we're simply not using
technology to deliver whatever content we want. Most of the stuff that
we do mimics printed paper, which is crazy. We need to think of native
electronic publishing that has nothing to do with ink on paper and can
link to different varieties of digital products -- media, movies,
simulations and whatever -- in novel fashion. It doesn't have to be
structured as traditional papers. And then we have to convince the
community to embrace all this.
>> Yan Xu: Right. So speaking of that, George and I -- I think I was
reading this book that we put together several years ago regarding
computational education for scientists. In 2007 that was the first
conference I [inaudible] at Microsoft Research, and there were like
three astronomers in the room, among other professionals who were keen
on promoting computational education for scientists. And we also
talked about a new way of publishing ideas. We've been talking about
it since then, for the past several years, and a lot of things have
changed. First of all, two years ago, when I was doing the same thing
-- you know, putting together a book for the conference -- the first
criticism I got at the opening of the workshop was, "Oh, we're going
green. We shouldn't print these anymore." So I'm totally for doing an
online electronic version, which is good not only for publishing the
papers, results and data, but mostly for promoting ideas.
And now the question is: if the community is not into it, then we may
have, you know, a few [inaudible] show up, and then it's not
sustainable without a group of people dedicated to maintaining the
quality, like George was saying, and to keep having new content from
different disciplines. So I'm here to listen to you for ideas, and I
can go ahead and perhaps implement some mechanism that gets this
started, for Microsoft Research to support the community. But I need
people -- some of you at least, diehard people who are going to devote
themselves to it and keep this going. So, again, I can have a sign-up
sheet for whoever wants to be the victim of the first group.
>> Michael Kurtz: Okay. First I think I'd like to say that I'm very
unhappy that Lee Dirks isn't here to be on this panel.
>> George Djorgovski: Yes.
>> Michael Kurtz: I think that's all I want to say. I miss Lee Dirks.
Astronomy for the past 25 years has been building publication
mechanisms and libraries completely independent of the journals and
the old structure. I'm going to list several large entities that are
all 25 years old or so, and you'll see how that works: ADS, CDS
Simbad, NED, the Archive, CADC, the main archives that are home to
Einstein and Chandra, or ST MAST, or the ISO Archive -- it's
interesting that all three of those were founded when Riccardo
Giacconi ran those organizations -- the HEASARC, the IPAC, the
European Space Agency has several, the Super OR Archive, etcetera.
That's where we're spending our money. The amount of money going to
those dwarfs the money going to the journals -- or to our journals
plus all the libraries combined. We've already decided to move away;
we're already doing it. Maybe it's time to formalize that better, but
we're kind of leaders in that field among many of the sciences. So for
something new to publish, I'll just say one thing that seems to me
obvious, and that's very short forms: things that don't really belong
in journals the way journals are structured but need to get out
quickly, like a graph or a table. Very often a result is one graph,
one data table, and all the paper around it is just, you know, getting
that out. The people who understand it will understand it just by
looking at the picture. And why wait six months to write? Why not put
up the picture? We don't have a mechanism for that, and that's the
mechanism I would most like to see. So I think that's enough to
introduce.
>> George Djorgovski: Yeah, over coffee we've been [inaudible] about
how maybe we should start a completely novel journal for
astroinformatics along the lines I talked about before: fully digital,
different media, experimenting with all kinds of things including peer
review. And I assure you the exact same discussion happens in
e-science conferences in general. So there are two problems here: some
people need to devote serious time to actually start and run such a
thing -- in our copious free time, right? -- and some institution
needs to sign up to it. And I mean a respectable institution, not like
those mushroom journals that are popping up all over the Internet and
are essentially vanity publishing. That stuff's irrelevant. And, okay,
you can say, "Well, there is the new journal Astronomy and Computing."
And I appreciate the good intentions, but I'm sorry, I think that was
exactly the wrong thing to do, because (a) it's an old-style paper
journal and (b) it's going with a commercial publisher. And I think
that commercial science publishing is already dead; they just don't
know it yet, just like newspapers, just like the old music industry.
>> : Can I make a comment on that? One of the problems is that some of
us need to continue to build our careers. In order to do that we have
to publish papers and get funding and then we...
>> Yan Xu: Did you say career? Build up a career? Yeah.
>> : And so we're graded on the way we publish to a large degree. And
so, you know, we can start up our own little journal and that's
wonderful. But no one's going to publish anything because...
>> Yan Xu: Exactly.
>> : ...you want to get that job, you want to get that promotion, you
want to encourage your students to publish in top-level journals and
so on. What may be -- it may not be possible, but a better avenue is
to talk one of the existing, well-respected journals into having a
branch where they try some experimental thing, so you can piggyback
off the reputation of that journal but still try these new things. So,
you know, should we be getting people from those journals along with
these kinds of meetings, to get them excited and get them thinking
about how this might happen?
[ Multiple audience comments ensue and continue in the background ]
>> George Djorgovski: That's a really good idea.
>> Yan Xu: Yeah, yeah.
>> George Djorgovski: That's a really good idea.
>> Yan Xu: Uh-huh.
[ Audience conversations continue ]
>> Yan Xu: So that's --. That's the answer to my question I just posted
on Facebook. If you wouldn't mind, go comment. Yeah, I just posted a
question and you provided the answer. At least one way of --.
>> George Djorgovski: What was your question?
>> Yan Xu: My question was, how can we come up with a mechanism, you
know, that provides incentives and encourages that? You can't just ask
people [inaudible] and then create a career problem for them. They
will have an identity problem when they graduate, if a student is
doing that. Where do they go, right? As an editor? As a physicist? As
a computational scientist?
>> : It's like these online classes that Harvard and Princeton are
putting up, you know: different ways to get an education that
piggyback off the reputation of an existing infrastructure. And maybe,
I think, there'd be an avenue that way.
>> Michael Kurtz: Yeah, consider Physical Review X.
>> : Exactly. Yeah.
>> : Okay. I hear that's what I [inaudible] for that journal. I agree;
I have long agreed with most of the doubts that George has, and I've
expressed exactly the same things as a number of you. The thing is,
the journal has one very key property which is missing [inaudible]
that George mentioned, and that is the property of actual existence.
It actually...
>> George Djorgovski: What?
>> : ...exists.
>> George Djorgovski: Oh.
>> Yan Xu: Actual.
>> : Actually exists. It [inaudible]. In every other respect it's a
very boring journal. It looks very much like all journals, so it takes
a lot of the [inaudible]. And that was very deliberate. We weren't
trying to be experimental under any heading other than getting this
community a journal. And the question of trying to piggyback on other
journals: we thought of that as well. Before we started this we were
in touch with the editors of [inaudible] and, well, the big three
astronomy journals and other journals. We sent them some [inaudible]
abstracts and said, "What would you do with these? Would you drop them
on the floor or would you at least send them [inaudible]?" And the
best of them said, "Drop them on the floor." Even A&A, which has a
section on...
>> George Djorgovski: Experimental [inaudible].
>> : ...[inaudible], said, "Well that wasn't very [inaudible]." And
basically they said, "We're not interested unless you can talk about
the astronomical results coming out of this technology. Just technology
is not interesting." So the journals [inaudible] were journals that
were to [inaudible] looked a little bit too [inaudible] to think
anything.
>> : And yet those journals would probably publish survey design papers.
[ Multiple audience comments ensue and continue in the background ]
>> George Djorgovski: Or instrumentation papers. What's the difference
between a hardware instrument and a software instrument?
>> : Well, and I agree. I think it would make perfect sense for these
journals to publish these papers. But they said no. They didn't say no
maybe.
>> George Djorgovski: I think this is an aspect of the great cultural
shift problem that we've been talking about off and on through the
meeting, in this case applied to journal boards or editors. But maybe
the better approach is to go not through astronomy but through the
e-science community, which is much bigger than astroinformatics, has
the exact same issues, and probably does have a critical mass -- which
I'm not sure about for astroinformatics -- to actually do something:
start a respectable journal within a respectable society that will
deal with these issues, especially the ones of universal importance,
like "this is how we deal with such-and-such data analysis problem or
database problem," and so on. Maybe that is a more viable way to
approach it. Then maybe some day there will be an [inaudible]
e-science supplement or astroinformatics supplement or something like
that.
>> Yan Xu: So I'll just comment a little bit and then switch to you,
because we talked about this last night. My comment was from a
skeptical perspective, asking, "What value do you provide to the rest
of the community by bringing it to e-science? Why would [inaudible]
informatics read your journal?" So that was quite [inaudible]. And
this morning I thought about it. I guess we can position it the other
way: if we wanted to join the e-science community as a branch of
astroinformatics, maybe that's a way to push us to think more broadly.
And then when we publish, we really want to focus on the general
aspects of computational...
>> George Djorgovski: Well, you know...
>> Yan Xu: ...challenges..
>> George Djorgovski: People who build instruments...
>> Yan Xu: ...provides value to the rest of the community.
>> George Djorgovski: Yeah, people who build instruments might read
optics journals. But a general astronomer will not read an optics
journal or...
>> Yan Xu: It's a different metric...
>> George Djorgovski: ...something like that.
>> Yan Xu: ...to measure this by. I'm sorry.
>> : I was just somewhat following up on this comment of PRX or an ApJ
astroinformatics, recognizing the serious issue of careers.
>> Yan Xu: Right.
>> : I think what should get some really serious thought is whether
there is a way to change something like ApJ Letters into an ApJ
Experimental, or some more experimental aspect of ApJ. I've certainly
written to the ApJ Letters editor and said, "I don't think this paper
should be published in ApJ Letters, because you might as well just put
it in the main journal and then [inaudible] as soon as it's accepted.
That's way faster than ApJ Letters." And so the letters -- I'm picking
on ApJ, but I think at least a couple of the other big four have a
letters section. Figure out a way to do away with the letters section,
because that's outdated, and move it to this more experimental...
>> : Well, once upon a time -- they may still have this statement --
ApJ Letters had in their, you know, instructions to authors that they
were accepting papers that had no lasting value. I was like, "Wait a
second."
>> Yan Xu: Right.
>> : Since we are talking about science fiction.
>> : Speak up.
>> : We are talking science fiction, about dreams, dreams which need
to be implemented from scratch. Now, a very stupid question: we have
something already in place where all of us send our papers, which is
Astro-Ph. It is not refereed. Wouldn't it be easier to transform
Astro-Ph into Astro-Ph Version 2? So basically you submit your draft,
then Astro-Ph takes care of the refereeing process. Then, when the
paper has been refereed, you get the flag, which is what we want:
"this paper is refereed." And then you also have an automatic way of
measuring the impact of the paper on the community -- not on the
journal -- by looking at the number of downloads and eventually the
amount of feedback. So this is an idea, I must be honest, which we
[inaudible] already many years ago. I mean, how to get feedback from
the community which is the equivalent of the Facebook "Like." I mean,
[inaudible] I haven't entered a library in many years, and I love ADS
because [inaudible] everything is there. But usually what I do is go
to ADS, find the new papers in which I am interested, and try to
download them. They are protected by the copyright of the journal. So
I go to Astro-Ph and download them from Astro-Ph. So basically all our
reading of astronomy journals is through Astro-Ph. Now, what is the
reason for the proliferation of these journals? We all want to reach
the astronomers. We want to reach computer science; there is a
computer science section also inside the [inaudible] archive, so it
should not be difficult to do these things. Because to be on the
scientific [inaudible] of a journal, that's prestige. I don't know; I
don't understand the proliferation of journals. I have been a member
of several scientific boards, well, [inaudible] didn't gain
[inaudible]. We don't need to have the support of a big community like
[inaudible] or, you know, the American Astronomical -- I was thinking
about this last week [inaudible]: we already have our board which
creates consensus. This is the whole community, which cannot live
anymore without Astro-Ph.
>> Michael Kurtz: The way high energy physics normally works now is:
you submit to Astro-Ph, you let it stay up for about two weeks, people
complain and write you back, you change the paper, and then you submit
it to the Physical Review, which allows you to submit by sending an
Astro-Ph number; they download it from, you know, the archive
themselves. So high energy physics already has a rather different
model in that respect. The editor of Physical Review D told me once
that he reads papers -- when they're already up in the journal, he
still reads the archive version instead of the Phys Rev version,
because it's one click rather than two on Spires. That's the editor of
the journal. So everybody else does too. There are plenty of
statistics that show that.
>> : Question over there.
>> Yan Xu: [Inaudible].
>> George Djorgovski: Give him a mic.
>> : For professional journals [inaudible] --. For journals, I suggest
maybe we should do more surveys of users' requirements. If we meet the
users' requirements we can get better performance and more success.
For example, for the Chinese community, for scientific journals we
have some basic requirements. A journal has to be indexed by the major
index systems, for example ISI and EI. Yeah, open access journals are
very popular in some countries, but for Chinese users they are not
popular. The main reason is that most open access journals are not
indexed by the ISI or EI systems. For us, or maybe for graduate
students in China, even if you publish ten papers or even more in the
newly created computing journal, it's not usable at all for you to
graduate or to get a promotion. Another example: the proceedings of
ADASS and the proceedings of SPIE are quite different, because the
ADASS proceedings are very usable for our software developers and the
[inaudible] developers. But we are not eager to publish or submit
papers to ADASS, because it has no use for us for promotion. But the
SPIE proceedings are indexed by EI, so we are happy to submit papers
to the SPIE meeting. That's also why, for the IAU symposia, because
they are indexed by ISI, so many people, maybe from China and maybe
from other countries, hope to submit one or even more papers to an IAU
symposium. So that's my opinion.
>> : So that is essentially Darin's comment, I think: that you need
some formalism like being indexed by ISI -- something like refereeing,
or maybe replacing refereeing. It doesn't have to be refereeing, but
you've got to have some process which installs a minimum standard...
>> George Djorgovski: Those are two different problems. There is the
respectability problem and there is...
>> Yan Xu: The recognition...
>> George Djorgovski: ...the [inaudible] peer review problem. And I
think what [inaudible] was alluding to is that some form of
crowdsourcing should effectively be the peer...
>> : [Inaudible] it was too fast. I was just saying Astro-Ph should
run the refereeing mechanism.
>> George Djorgovski: Oh.
>> : Submit your paper and people can begin to read it. Meanwhile it
is refereed. Then, if it is accepted, there will be a flag or a green
light on the paper saying it has been refereed [inaudible].
>> George Djorgovski: Perfect.
>> : Yeah.
>> George Djorgovski: Astro-Ph will never change, because Paul
Ginsparg never wants to change anything.
>> : That's right.
>> George Djorgovski: And they don't have the resources. And,
moreover, they're just rapidly disseminating old-style papers. Right?
The word "paper" tells you the problem. We're so used to ink and
paper.
>> : Yeah, I am the referee for instrumentation and methods for all of
Astro-Ph. I read the titles and look at the authors, and normally
that's enough. Somebody does look at every paper before it goes up;
it's an undergraduate assistant. And if it doesn't meet the form of a
paper -- has references and all that -- it's flagged and I have to
look at it. If it's a poster or something, it doesn't go up. They're
not willing to put things like that up, as George just pointed out.
>> : If I can just make one comment on the refereeing? Some other
disciplines, particularly [inaudible], have [inaudible] the formal
refereeing because they couldn't get people to referee papers, and
they effectively do have crowdsourcing of things: they post the paper,
they invite comments from other people, and then some editor at some
stage reviews the comments and gives it a "Go," or "No go," or "The
paper needs to be revised." So there are other models already out
there. Eric?
>> : I've been an ApJ editor for seven years, one of nineteen
scientific editors, and to a considerable degree I serve this
community, in the sense that astrostatistics and astroinformatics
papers often are assigned to me. So I actually play the role of the
traditionalist in the room, and now I'm going to give you my opinions,
which are not official opinions of the Astrophysical Journal; they're
mine. First you should know that the [inaudible] and the AJ actually
do accept astroinformatics papers. There's an informal sort of feeling
that we don't like code manuals; we don't see how the code, you know,
patterns [inaudible]. But if you describe the methods and then have an
appendix on the details of the code, and apply it to some example and
show that it's useful, then we will accept it. Our criterion is new
and significant research in astronomy and astrophysics. And there's no
statement that it requires a telescope or physics. It can be pure
instrumentation and pure informatics.
>> : So if...
>> : However, despite the fact that this is true, for your community,
this community here, it doesn't seem to work, in the sense that very
few submissions are made on codes or informatics. So few that you
didn't even know what I just said, because it almost never occurs. So
sociologically it has failed. But by policy [inaudible]. And that's my
first comment. My second comment...
>> : Sorry. Can I just ask you -- So, clarification: So would you
accept somebody who writes a paper about a new algorithm...
>> : Yes.
>> : ...without actually applying it to any data?
>> : No.
>> Michael Kurtz: See, that's the problem.
>> : Yeah, but it's not a big problem, because we don't have very high
standards about the application part. So you can apply it to a -- a
junk problem, you know? It doesn't have to be an innovative result. It
just has to show that it applies to some star or galaxy or some image
or some time series or whatever you want. So we have low standards on
the applications, but we require [inaudible] application. This is
informal; it's not a formal requirement, and maybe you could convince
us to change our style on this. It's informal.
>> Yan Xu: Right. That's what I was going to say. Do you regularly
revisit the metrics or the rules that you use to...?
>> : We should. [Inaudible response continues]...
>> Yan Xu: For example, you may learn from this community. Yeah.
>> : ...but it's informal. So it'd be useful for us to formally state
this in the [inaudible] journal online. Would you then use the journal
more?
>> : Yes.
>> : Because almost nobody is submitting anything.
>> Yan Xu: Now you catch up. We match.
>> George Djorgovski: See that's the problem. This discussion is going
in the wrong direction. We're not in need of a new journal. We're in
need of a different...
>> Yan Xu: New way.
>> George Djorgovski: ...way of publishing that can contain...
>> : [Inaudible]. That's my second point. I'm talking about the
traditional journal, traditional paper articles that are now online,
okay, [inaudible] PDF. My second point is about ApJ Letters. And,
again, this is a personal interpretation of discussions that I've
heard. In my opinion and in the opinion of perhaps some others, ApJ
Letters has lost its original purpose. The original purpose was to be
fast. And the reason it lost it is because ApJ, which used to take ten
months, is now ten weeks. ApJ is as fast as ApJ Letters used to be. Okay? And
that's all technology. It has nothing to do with anything. So all the
delays in ApJ are not due to the editorial process, okay, it's due to
the authors. It...
[ Inaudible background conversation starts and stops ]
>> : ...[inaudible]. Admittedly ApJ Letters is making seven weeks
instead of ten weeks. There's some small difference but it's very
small. So there's actually discussion that ApJ Letters, having lost its
purpose, is purposeless. And I had actually hoped that there would be a
change of purpose in the last change of editor, which just occurred,
but due to lack of courage and frankly lack of consensus and lack of
good ideas, no change was made. You should know that the [inaudible]
pub board and the editorial boards don't like the situation of ApJ
Letters and sort of want a new purpose, moving to something
new. So if there's anyone in the room who has actual new ideas within
the context of ApJ Letters -- we're not going to become a blog, okay, I
mean it's just not going to be totally different. But in the context of
a traditional journal, if you see ways of innovating, I think you should
send your ideas to Ethan T. Vishniac, the Editor-in-Chief, because I
think he is actually open to ideas. I don't know that, but I think he is.
[Inaudible]...
>> : Before [inaudible] comments can I ask the panel...
>> Michael Kurtz: All right. Yeah, first I showed a whole bunch of
plots yesterday with the exception of the one where the fish pond fire
pot which was in SPIE, all the rest were in ApJ, AJ, PASP and ApJ
Letters. It's quite easy to publish these things if they're part of
some sort of scientific thing, astronomical thing. Most of them were
just software. The eigenvector paper is just pure software. There are a
couple of different directions that we can talk about. One is making the
current publications computable, which they're not. And the other is
publishing things that are outside the realm of a ten-page paper.
George is addressing the second. I was thinking of the first, but they're
both ways of pushing nontraditional things. The journals have not
been very good at either of those things, at leading. It's one of the
reasons why there are 20 new libraries and publications over the last
25 years, none of which are the journals and none of which are the
traditional libraries. To be blunt, I don't think that modifying
ApJ Letters is really what George thinks is important, and I don't
either....
>> : No, but other people do.
>> Michael Kurtz: I do think that making the papers computable is
important and that....
>> : What do you mean by [inaudible]?
>> Michael Kurtz: I mean, making it so that computers can understand
what the papers are. And that requires work in publishing. It makes it
so that papers can't be free. It sort of goes in the other direction
from open access. Dense semantic tagging is one of the catchphrases for
that. Simbad of course does that already. The librarians at...
>> : [Inaudible]...
>> Michael Kurtz: ...SV do that already.
>> : You have to talk to the technical people about this. I don't know
enough.
>> Michael Kurtz: But, yeah, we'd do that. It's one of the things that
ADS is looking at with librarians. But that goes in a different
direction than making publication faster and toward what people are
actually doing, and I think that's what George is talking
[inaudible]....
>> George Djorgovski: One of the things but maybe...
>> : Well, can we hear from Yan next?
>> Yan Xu: I want to say that other than the [inaudible] and the
scientific outcomes for a journal, there are other things that you have to
worry about: even the political impact and social impact and the
administration and logistics of running a journal. So we should not
invent another journal; that's not what we're here for. We're talking
about new ways of publishing. If there is already an existing vehicle
that we can use, we should take that vehicle down a different path that
matches what we want for today's publishing. So that should be the
direction of this discussion, rather than, you know, creating another
vehicle.
>> : George, do you want to add?
>> George Djorgovski: Well, that's exactly right. We are so brainwashed
into thinking in old format style papers in a journal that this
discussion veered in that direction, "Let's improve ApJ Letters." No,
that's not the problem. The problem is exactly what Yan said: to be
able to publish scholarly output of different kinds -- that you can
publish a data set or an archive without having to write a bogus paper
around it, right? Or I ought to be able to publish a one-paragraph idea
saying, "This is a good idea. Somebody should do this. I don't have
time," and somebody might do it. And maybe they'll cite my little
digital thingy somewhere. All right? I want to be able to publish a
numerical simulation or an algorithm or a workflow: "Given the Sloan
Survey and this survey and that survey, this is how we discover
[inaudible] quasars." So there is no result. All right? You know, it's
not even code. Right? So that's what I think we need. The only reason
why I said a new journal is because the old journals are, I think,
hopelessly wedded to the paper paradigm.
>> : Okay. We got a zillion [inaudible]. So we're going to -- We'll
start in the front and move backwards. So Alex, Matthew, [Inaudible].
>> Alexander Gray: I'll just throw out a couple things. So in a field
like mine which is about methods, we have an issue that everyone feels
uncomfortable about which is...
>> : Oh, sorry. [Inaudible].
>> Alexander Gray: ...you can read a paper about a fancy method or
algorithm, but then when you try to implement it all the details aren't
there or you don't know if it's -- or the experiments that they showed
aren't really reproducible. You don't have the data set. You don't have
the code. Which gets to the issue of publishing code and data sets and
ideas all at the same time. In fact, it should be a requirement that you
can't publish an algorithm unless the code is there so it can be
verified. So we don't have reproducibility in methods, basically, the
methods part of the world. And the same would be here if we opened up
astronomical publishing to methods more. So that's one thing.
The other thing is [inaudible] idea about Astro-Ph. So what if there
wasn't -- instead of having a green-light, yes-or-no acceptance model
like we have in publishing today, we simply had a continuous, you
know, like you said, a Digg-style thing where you put something up
there, it happens to be in my area, I read it? The one problem
we have in computer science is getting enough of the experts' time as
reviewers. You get some reviewers, but it's almost always the case that
you don't have the right people because they're busy or whatever. But
you can get their attention on the topics that they're really
interested in. So, you know, publish something. I go, "Oh, I like that
topic. I want to comment on that," and I'll rate it in the four areas
or whatever: novelty, depth of experimental results and whatever. And
then it has my name on it. It's not anonymous. And so if I'm well known
then that has more weight. If I'm not as well known, it has maybe less
weight. And so that creates a kind of formal point system for, you
know, [inaudible] it just becomes a...
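A minimal sketch, in Python, of the kind of reputation-weighted point system being described here; the rating areas, reputation weights, and scoring scale are hypothetical illustrations, not part of any existing system:

# Sketch of a reputation-weighted, non-anonymous review score.
# All area names, weights, and scales are hypothetical.

RATING_AREAS = ["novelty", "depth", "experimental_results", "clarity"]

def review_score(reviews):
    """Combine signed reviews into one score.

    `reviews` holds (reviewer_reputation, ratings) pairs: reputation is a
    positive weight (a better-known reviewer counts for more), and ratings
    maps each area to a score from 0 to 10.
    """
    total_weight = sum(rep for rep, _ in reviews)
    if total_weight == 0:
        return 0.0
    weighted = sum(rep * sum(r[a] for a in RATING_AREAS) / len(RATING_AREAS)
                   for rep, r in reviews)
    return weighted / total_weight

# Example: one well-known reviewer (weight 3.0) and one newcomer (weight 1.0).
reviews = [
    (3.0, {"novelty": 8, "depth": 7, "experimental_results": 6, "clarity": 9}),
    (1.0, {"novelty": 5, "depth": 6, "experimental_results": 5, "clarity": 7}),
]
print(f"combined score = {review_score(reviews):.2f}")  # 7.06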
>> : [Inaudible]...
>> : Sorry. Can we -- Let's move up the -- We'll get to you in a
second.
>> : So I would like to suggest that some of this is actually already
being solved by a broader scientific community. There's a little
journal called Science that some of you may have heard of, and there
are -- I can think of four papers that I've downloaded from Science in
the last year which are specifically about methodologies, new
methodologies for doing informatics-type analysis. And you have the
paper, which is the summary of the method, and then there's a thing
called the SOM, the "supporting online material," which has more in-depth
descriptions of the algorithms, greater details, attached data
sets, attached results; sometimes there's attached code as well. And
it strikes me that a large chunk of what's being suggested has already
been solved by a major journal out there, and that's the sort of thing
you want to be looking at.
>> : I think that -- I'm sorry.
>> : [Inaudible] reaction but that's still different from what George
says, where you're going to publish fragments not...
>> : No, I agree, but part of it has already been done, for when
you are saying, "I don't have the -- I can't reproduce your
results," or, "I want to know more about this," or, "Where is the code
attached to it?"
>> : That's optional, right? So very few people do.
>> : The four papers that I've looked at in depth have got it, and it
was very useful to have the information. But you make it a stipulation
that if you're publishing that's what you're going to be doing.
>> : Yeah, I agree with this part, that we should start doing that more:
add code to it, add a data set to it and make it reproducible so other
people don't have to come back to us and say, "How did you do this? I
am not getting the same [inaudible] with the same code and the same data
set." They should be able to do that. And so that I [inaudible]. As to
what George said about being able to publish a paragraph and so on. So
what's wrong with doing something like that on the existing things like
Astro-Ph for instance? Who might need something? And some people do
that.
>> : [Inaudible] does it.
>> : No, but there's no connection to formal points basically. You can
post whatever you want, but you know I think if we just establish a way
that it just shows you some points...
>> : No, but then are we asking about policing it? And then who would
do it?
>> : You got several different issues in there.
>> George Djorgovski: Yeah.
>> Yan Xu: Right.
>> : But, okay, Pepe, Norman, Nick, Eric.
>> : Absolutely, I agree with what [inaudible] said. The problem is
that -- Eric, I mean, you are just going in the wrong direction in my
opinion. I know I'll start a fight. But I think the ApJ [inaudible] are
exactly the wrong way to go. It's the wrong way to go because basically
they cost a lot of money. For reasons which are no longer
understandable, it seems most of the subscriptions are to the electronic
journal and, therefore, there is no longer the huge cost of paper that
there was in the past. They are just a system of [inaudible] power, when
nowadays we see a world which has moved toward consensus platforms like
Facebook [inaudible]. So it is just a matter of doing things
intelligently, because to assign a referee -- I've done it a few times in
my life, assigning a referee to a paper. You basically look at the
citations. You find the paper which is cited first or most times
inside the paper. [Inaudible], "Oh, yeah, their group." And you send
the paper to that guy. I mean, it's basically a system of [inaudible] power
which [inaudible] itself when acknowledging [inaudible] a different
solution. I would like -- I don't see why there must be Monthly
Notices, the Astrophysical Journal or Astronomy and Computing [inaudible] when
basically you have a system of keywords which electronically allows you
to pick up the articles in which you are interested. If you wanted to
have a more automatic system of reviewing, your choice of referee done
with a simple algorithm, you know, by selected members of the community,
[inaudible] today you can get the traditional refereeing or you
can get the refereeing altogether done by the community like the one
which [inaudible] mentioned and so on.
I think it's just obsolete. An approach like this on Astro-Ph --
obviously, since the guy is paranoid about these things, it will not be
Astro-Ph, it will be something different, something ResearchGate-like,
which is something I like very much -- then you can publish programs.
Not like it happens very often now, where you say the program is
downloadable, and then for two years you cannot download it for some reason.
The program must be downloadable, and then, you know, you can download the data. I
think it's that simple. I don't see where the problem is, besides the
political problem of solving the mafia of the various journals.
>> : [Inaudible].
>> : I'm very much enjoying this discussion because over the last year
and over this afternoon it's made me more and more convinced that
getting involved with this journal was the right thing to do. So I have
[inaudible] points here. The question about why can't
people publish small additions, like a table, like a paragraph? They
can already. They can do that now, and that model would work if
people could get professional credit for that; if people could list
these in annual reviews, in CVs, that would work. Now there's
absolutely nothing stopping that except politics. It's true that
journals are dead. They were dead ten years ago. That was obvious. But
they have manifested nonstop twitching. And I see no reason to expect
they'll suddenly stop twitching in the next decade, because it's
[inaudible] they've lasted this long. So it's [inaudible] to claim
that they'll suddenly stop. And all your discussions about Archive
overlay journals, about all your new models for publishing -- I've
heard all of these for the last ten years in bars, cafes, lunch breaks
at conferences, and nothing has happened.
>> : Our generation must die.
>> : Our generation must die, fine.
[ Audience comments and laughter ensue and continue in the background ]
>> : And this is a social problem. It's a social problem.
[ Multiple audience comments continue in the background ]
>> : [Inaudible]. It's Eric next but first we'll try, does the panel
want to respond to anything?
>> George Djorgovski: Well, now I think the discussion is moving in the
right direction. I think maybe we can clarify our thinking by parsing
the great problem into several smaller pieces. One of the reasons we
have journals at all is the archival nature of it, that somebody has
signed to perpetuate this stuff forever. And usually it's some
professional society or a really successful commercial house like
Nature, right? And so that's a whole other thing. Somebody needs to
really make that commitment and pay for the upkeep. Now as far as
we're, contributors are concerned there is an issue of, well, broad
variety of types of contributions which we really haven't addressed
properly. The issue of quality control or peer review which we may be
able to crowdsource or each paper can have a little Wiki associated and
so all of the reviews minus the obscene ones, to be removed, would be
there forever. And then there is a separate issue of giving people
credit for this new variety of publications, and so there has to be
some agreed upon standard way. You know, it could be [inaudible],
something that people can put in their publication list that they can
be cited as, you know, "As such-and-such suggested in this electronic
thingy," you know, or we use the data or the program or the work flow
from, you know, again giving the electronic reference. And that those
count. So then that really means if enough professional societies
decide this is the way to go then Institute of Scientific Information
will have to just learn how to count those. It can count downloads. It
can make, you know, linear combination of a number of things, you know,
number of downloads, positive reviews, negative reviews, whatever. But
that's a different story.
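The kind of linear combination just described can be sketched in a few lines of Python; the weights and signal names here are made up purely for illustration, not any agreed standard:

# Sketch of a composite credit score for a non-traditional publication,
# computed as a linear combination of usage signals. Weights are hypothetical.

WEIGHTS = {
    "downloads": 0.01,       # each download counts a little
    "positive_reviews": 2.0,
    "negative_reviews": -1.0,
    "citations": 3.0,        # formal citations count the most
}

def credit_score(signals):
    """Weighted sum of whatever signals the community agrees to count."""
    return sum(w * signals.get(name, 0) for name, w in WEIGHTS.items())

# Example: a published workflow with modest uptake.
workflow = {"downloads": 850, "positive_reviews": 12,
            "negative_reviews": 2, "citations": 4}
print(f"credit = {credit_score(workflow):.1f}")  # 8.5 + 24 - 2 + 12 = 42.5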
>> : Sorry. We've got a whole queue of people. Do either of you guys want
to add anything to that?
>> Yan Xu: I just want to share how I feel. Yeah, I agree that we have
these kinds of discussions over and over again. I was showing the book
that we did in 2007. I was thinking of when I was ten years old: I was
talking to a friend of mine, and she had started collecting stamps when she
was five. So I remember she had that stamp book, a beautiful stamp book
with dragon stamps and stamps from the UK and from other countries. I was
thinking, "Oh, I'm already five years later than she is, so probably not
a good idea for me to start." If I had started, you know, back then, I'd
already have, I don't know how many, beautiful collections of stamps.
So I don't care what kind of things we're talking about as long as we
can kick off something. You know, [inaudible] that we can collect ideas
and then get it going and then revise it; that'd be a great outcome of
this conference.
>> : Mike?
>> Michael Kurtz: All right, to the point of money. It costs about five
dollars for something to be up on Archive, and it costs about two
thousand for things to come into one of the reputable journals. That
money isn't wasted. So you have to decide what parts of it you want. I
won't go any further than that except that the money isn't wasted. The
whole refereeing and formatting and long-term saving of it is part of
what that money goes for. The journals traditionally have not been the
ones to archive the journals. It's been libraries. Libraries have
basically collapsed. So the long-term archival keeping of the journals
is only newly and temporarily in the hands of the journals, or handed
off to some place like Portico. It's not clear that there's a long-term
solution for electronic journals, but there is no other long-term
solution. Paper is not the way anymore, clearly. I guess that's enough.
>> : Okay. [Inaudible]...
>> George Djorgovski: Just a footnote to this: I mean, I've been
talking a lot to librarians, and they are a very astute community,
paying a lot of attention to these issues. So there are a lot of clever
people who are really thinking about how to do that.
>> Michael Kurtz: That's a good discussion [inaudible].
>> : Eric, Nick, Joe, Alex.
>> : I just want to -- This is very light. It's just an anecdote, and I
thought, Pepe, you would enjoy it. Hearing about the mafia of the
journals from you from [inaudible] is very interesting to me. So now I
want to tell you who the mafia are in some cases. So I'll try to
disguise it a little bit. There's a senior editor who's been trying to
make an innovation that, by the way, I think Science and Nature do: to
publish the numbers underneath graphs, you know, associated with
graphs. Okay? And this is an obvious innovation. It's quite easy for us
to do technically. And it's been unsuccessful because it's been stymied
by the higher-level bosses, called the AAS Publication Board, who are
elected and tend to be very ignorant of the fantastically interesting
issues that the people in this room are actually quite knowledgeable
about in many ways. So I just want you to know that it's not always the
public face that you think of who's the enemy. It's sometimes someone
else.
[ Various inaudible audience comments ]
>> : Pogo said, "We have met the enemy and he is us." Yes.
>> : Just a couple of quick things. So it's not perfect but you could
certainly imagine on your job application replacing "List of
Publications" with "List of DOI's" -- some of which are publications
but it encompasses all the other stuff like code and whatever -- as a
way to compromise between being able to show everything you did and
actually having something you can get credit for. The other thing is
more of a question is there's an initiative by Peter Coles in England
called "The Open Journal of Astrophysics," and the motivation of that
was to basically let astronomers make their own journal. I don't know
much about it but does anybody have any comments on that?
>> : It won't go anywhere.
>> : [Inaudible] having it be free and not having it be open. It should
be called "Free Journal of Astrophysics."
>> : Doesn't go anywhere.
>> : Are you saying to [inaudible] or --?
>> : No one is using it [inaudible].
>> : Okay.
>> : Well, it only started a couple of months ago.
>> : Yeah, but no, no, no. [Inaudible].
>> Michael Kurtz: Yeah, how is it different from Archive?
>> : Well, it's a journal.
>> Michael Kurtz: Ah, right.
>> Yan Xu: Right.
>> Michael Kurtz: All right, journals produce articles which people
look up in ADS just like Archive. It's how people really use it. So --.
>> George Djorgovski: I cite Archive papers all the time.
>> Michael Kurtz: Yeah.
>> George Djorgovski: And, you know, why not? They're there.
>> Yan Xu: ADS will go search all that out.
>> Michael Kurtz: Yeah, ADS makes the match anyway. So --.
>> : [Inaudible]...
>> : [Inaudible].
>> : When you're applying for a grant, at least in Australia, you
actually have to list citations and impact factors, and unfortunately
Astro-Ph is no good for that. Archive's no good for that.
>> : Joe, you're up next.
>> Michael Kurtz: Could ADS help you by creating an impact factor for
Archive? It's easily...
>> : Yeah.
>> Michael Kurtz: ...calculated.
[ Various audience comments ensue and continue through ]
>> Michael Kurtz: I'll do that. I can probably do that, though.
>> George Djorgovski: This may be the best outcome of this conference.
[ Various audience comments ensue ]
>> : [Inaudible] enormous impact factor if you did that.
>> Yan Xu: Yeah.
>> : [Inaudible].
>> Yan Xu: And then we will have [inaudible]...
>> : We'd have all the money.
>> Yan Xu: ...so Lee Dirks economic search?
>> : Yeah.
>> Yan Xu: [Inaudible] as well.
>> : Oh, okay.
>> : I may be repeating what others have said, but I sort of think 90%
of this problem has probably been solved, in that we already have large
tables in journals that are not printed. They're -- you know, you get
the first ten lines or whatever. And if you really are interested in a
table of data, you go someplace else and you grab it. It's not obvious
to me why we can't have appendices that are not published: there's a
one-line description saying that if you'd really like to see the code, go here. Or
if you'd really like to see this movie or set of movies or extracts of
simulations -- okay, it's not going to be in the paper part; it's got
to be somehow stored elsewhere -- but I think that the technology and
the capabilities are sort of almost there with some of our major
journals. And to provide the sort of curmudgeonly viewpoint here as
well, George did hit on this archival notion. There is an important
aspect that we shouldn't forget: that at times you do want to be able
to go back to read the paper from 1970 or 1950, or, at some point,
people are going to want to look to see what we were doing in 2012. So
there is an important archival aspect to this as well that is not free
and is going to be very challenging.
>> George Djorgovski: While that mic is moving, let me point out that
what you just described says that all of this new content is a minor
subsidiary to the traditional paper. What I'm pointing out is that
there are new types of content that are good in their own right. If
somebody wants to publish code, they should be able to publish just
that code without the paper that it goes with. You know? That kind of
thing.
>> : Okay. Two quick comments here, then [inaudible].
>> : I think what you're saying is not very complete, because he said
exactly: one paper on Archive costs five dollars, one paper in a journal
two thousand [inaudible]. Look, these journals do not live only off
subscriptions. These journals live from contributions from many
organizations. Astronomy and Astrophysics in Europe is largely paid for
by the government. They don't live off subscriptions. And I think the
same is true for Monthly Notices [inaudible]. So if you just moved to
[inaudible] publishing, a small fraction of what is currently paid by the
government to maintain the paper journals would cover all the expenses
for the running operation and for the archives.
>> Michael Kurtz: But that two thousand...
>> : [Inaudible]...
>> Michael Kurtz: ...has nothing to do with paper. The paper is almost
free. All the rest is the cost.
>> : A very quick, almost point of information: A&C will, for example,
support adding supplementary material to an article. And I think that's
not the only journal that would do that. And...
>> : But first, after you set the science [inaudible]...
>> : Yeah, exactly. And so the only thing that's stopping the vision --
one of the only things that's stopping it -- is the willingness of
editorial boards to do something different with what they believe is a
good article, an adequate article, an adequate contribution. And some
editorial boards would be more conservative. Some editorial boards,
e.g. this one, are still at the phase of saying, "Okay, what should
this article look like?" That was part of the motivation for this. It
wasn't just to make a journal; it's to ask the question: what does an article in
this area look like? It doesn't have to be sections one to four. It can
be anything.
>> Michael Kurtz: I think, just briefly: if you have your software in the
Astronomy Software Library -- the poster is over there -- it is indexed
in ADS.
>> : Right.
>> : Yeah. Yeah.
>> : On George's point of why not just publish a code, you know, the
counterargument to that is, of course, the code is not good if you're
not telling me why you're publishing it.
>> : Yeah.
>> : I mean, seriously, think about it: could you go through a code, strip
out all the comments, and publish that? And would that have any value
to anybody?
>> George Djorgovski: That's a bogus example.
>> : I don't think George [inaudible]....
>> George Djorgovski: That's not what I'm saying.
>> : Exactly. So you need some kind of description of it, even if
it's only a paragraph: "This code is going to do X, Y, Z."
>> George Djorgovski: Sure. That's not...
>> : That's not the point. The point is do you want to write a full
scientific paper...
>> George Djorgovski: Yeah, exactly.
>> : ...to describe what the manual...
>> Yan Xu: It would be reproducible.
>> : ...what the manual described.
>> Alexander Gray: I just want to throw out one data point from another
field which is a success story of how we upended the paradigm. So in
machine learning roughly ten years ago, the dominant journal was called
the Machine Learning Journal. But they didn't let you own the -- they
owned the copyright and, therefore, you couldn't put the paper on your
webpage. They were actually sending people nasty messages saying, "You
can't have that on your webpage." So the machine learning community --
you know, this gets to that difficult issue: you're trying to do
something new, making something free, moving to a different paradigm, and there's the
reputation issue. That was the only place to publish if you wanted to
have a reputable paper. So what happened, the way we resolved that, was
that the leader of our field, basically the most famous guy, started a new
journal. And so you need -- basically, once the famous people are all
there -- And now this new journal is the main journal in our field,
JMLR. So you need the well-known, famous people to all be behind it, and
then overnight it can actually have the reputation. So it's not
hopeless.
>> : Okay. We're coming towards the end so -- Anybody else [inaudible]
then I'll hand it back to the panel for some [inaudible]. Okay, one,
two, three, four and then the panel.
>> : So regarding publishing of code, there already are some
[inaudible] archives: R has CRAN, the Comprehensive R Archive Network, as
well as CPAN, which is the same thing for Perl. So these are
typically individual programs, well packaged, with documentation, with
manual pages and everything. So would there be any sense for the
astronomical ones to have their corresponding [inaudible] apply to that?
Also, like the poster that is there, they have a good set of rules: what
you can put in, how you can start, about the code [inaudible]. That is
also pending. When we are talking about a new journal, what is it
exactly that we are saying?
>> Yan Xu: A set of rules, yeah.
>> : How are these different from that? So can we somehow leverage
these into what we are calling as [inaudible] journal and so on?
>> : Okay. Pass the mic.
>> : This is a journal comment and everything, but just coming from the
point of view of an earlier career stage, I would like to kind of
remind you that almost everyone, I think, who's been discussing this so far
is someone in a permanent position.
>> : Oh no.
>> : Hmm? Okay.
>> : [Inaudible].
>> : Okay. I think that many people in this room are discussing this
from the point of view of permanent positions....
>> : [Inaudible]...
>> : Okay. I think that meant...
>> : ...[inaudible].
>> : Okay. So because it hasn't been explicitly said: although I'm
philosophically very much in favor of everything we've discussed, I'm
somewhat nervous about all this as well, because I worry that when going
for a permanent job -- I know that there are many people who are not as
broadminded as the people in this discussion who, even if it
works better than a traditional model, will slap down a ruler and say,
"You only have this many ApJ papers. We're
not going to consider this." And for younger people, I think that's an
enormous consideration that we have to -- even if it's a better method,
there is pressure to put it in an ApJ article or [inaudible].
>> : Let them [inaudible]. If you had an established journal which was
recognized -- and not just [inaudible] -- it's got to be recognized.
>> Yan Xu: But on the other hand, if you join it now, then by the time you
become established, this journal could be one of your credits. Like my virtual
stamp -- I mean, my dream of a virtual stamp book. Right? If I had started ten
years ago -- I mean, when I was ten years old -- I would have this
[inaudible].
>> : But if we don't publish now in the old, good journals, we're not
going to get tenure. That's the point. If we try to publish in that
journal I would like to publish in...
>> Yan Xu: No, maybe not now. Yeah.
>> : Most astronomers would look at that and say, "What is this?"
>> : And that's okay.
>> : "What are you talking about?"
>> : But that's not a reason for not starting it, though.
>> George Djorgovski: It's a separate problem, as I said. There is the
problem of delivery. There is the problem of quality control. There is the
problem of giving credit. All of those are big problems.
>> : Yeah, I think it's very complex, and it's probably worth looking
at the sociology of all this, because in a certain way it's a
reshuffling of power. I mean, you have new methods, but the power of
decision making changes much more slowly. So basically it's
absolutely right to hear that students will be skeptical about these
new methods unless the power structure is sensitive to these changes. So the
point is trying to look at these new systems and somehow create the
right incentives for people, so that there is recognition and it moves
science in a better way. I mean, I think in a certain way that's
already happening with information. I mean, we Google, and some
algorithm is making the decision about what we look into. So it is
beginning to happen that this is affecting the power of decision making and
the way that information flows. I think it's a very complex
sociological problem that we should give very deep thought. I don't
think we have the answers, but we do have the right problem, which should
be addressed very seriously.
>> : [Inaudible].
>> : [Inaudible] students are never skeptical about the new
[inaudible]; they are skeptical about [inaudible]. And that's a
completely different thing because -- So [inaudible] because there is
no other way. I mean, we cannot think that we can continue [inaudible]
shelves of libraries which, you know, cannot even allocate any more
space for these things. This thing is going to happen. Now the problem
is to understand how. I just wanted to comment on the [inaudible]
publishing of programs. At the [inaudible] meeting organized by Eduardo, there
was a fantastic talk by a guy from a [inaudible] processing community
who solved this problem many years ago. We just need to look at what
others are doing. They have published programs for many years, with the
[inaudible], with the comparisons, with the documentation. These are
standards for publishing programs which just need to be adopted, and not,
as we always want to do, reinventing the wheel and this type of
thing. Many communities have already done it. Can I make a suggestion?
I think that this discussion is very interesting and very useful. If we
can keep all the [inaudible] of the discussion -- we have a Facebook
page, why don't we use it to put...
>> : Because Facebook [inaudible].
>> : Sorry?
[ Audience laughing ]
>> : [Inaudible]. Really, we have this Facebook page, where every one of
the people who are going to be doing the discussion -- okay, not the
contributions -- so that we can try to reconstruct it. Because at the end,
if I'm not wrong, everything comes down to building a proper business model,
because I'm sure that if we point out that using electronics rather
than paper can save a strangled, choked university billions
of dollars because it's -- like my university spends 1.9 million euros
for the [inaudible] in physics; it's all in subscriptions. And a model
based on electronic publishing would save precious research money
[inaudible].
>> : [Inaudible]...
>> : So fix...
>> : It won't save any money.
[ Various audience comments ensue and continue through ]
>> Yan Xu: The thing is...
>> : It certainly won't save any money.
>> Yan Xu: Yeah, I want to say that...
>> : I think so.
>> Yan Xu: ...Facebook or e-mail or any mechanism -- I want to throw
this question to the audience, what would be the minimum we can do to
avoid having the same discussion next year?
>> George Djorgovski: It seems to me that we need to write a
requirements document for the new model of scientific publishing which
will outline all of the problems that we have mentioned here. And then
that can guide discussion about possible solutions. Some things may
have been solved already; others might require a completely new
approach. Others may be unsolvable, or solvable just by old people dying
off. Right? But I think a useful product of this discussion, which doesn't
have to be done in one day, would be that people who really care
about this get together in some virtual forum and write this
requirements document, if you will, for, you know, a future model of
scientific publishing. And it may not come to anything, but at least we
can have the problem defined in a way that's approachable.
>> : Yeah. Yeah, [inaudible]...
>> Michael Kurtz: There's a thing called Force 11 that Lee Dirks was
part of that actually looks at that for all sciences.
>> : Can I cut off your discussion right now? We're going to have one
last comment from the floor and then you three can sum up or whatever
you want to.
>> : I just wanted to say this has been a problem across a lot of the
disciplines and make the comment that there are people who have been
working on issues of credit and metrics and impact factors for
journals, sort of alternative methods for accounting for that. And one of
those groups is altmetrics.org, and there's another group that just
launched called Total Impact. And they come out of the gene-sequencing
world. But if you're concerned about that, you may want to look into
those two groups.
>> : Okay. Thanks. So it's up to the panel to sum up all proposals
[inaudible].
>> Michael Kurtz: Me?
>> Yan Xu: Yeah.
>> Michael Kurtz: All right, first, the metrics are being worked on a
lot. How to do nano-publishing and get credit for it is a typical
topic of conversation at a general meeting on the future of scientific
publishing. I was at three of them last year; they happen all the time.
Force 11 is a buzzword. Type it in and find the web page. They'll have
another conference some time next year. I guess the metrics of use and
citation are the most useful ones. I have a new citation metric on
Astro-Ph this morning, so I have to say the citations really are still
very useful.
We're developing new publication methods all the time. That's what
HEASARC is. That's what the Space Telescope Archive is. It's what Archive
is. It's basically what CDS is. It's what ADS is. It's what the
Astronomy Software Library is. The Astronomy Software Library is linked
to by ADS just the same way that the space telescope measurements are
linked to through ADS, through papers, through paragraph descriptions.
People think in words, so describing things in words is probably the
least common denominator to get everything all working together.
Otherwise, you don't really need to invent that much stuff. You have to
use a lot of the stuff that's there, up to nano-publishing, which is not a
solved problem anywhere. There's no real reason why astronomy would
solve it.
>> Yan Xu: Right. I just wanted to repeat the question that I just
threw out: I want to find out what would be the minimum we can do.
And I volunteer to, you know, be your assistant to make that happen,
because the success of my job is not defined by the number of
publications with my name as the first author in a physics journal.
That would be a failure at Microsoft. I'm supposed to assist you to do
better science. So let me know, and take advantage of what we can do
for you from here. And give me suggestions.
>> George Djorgovski: It seems to me that we're fairly ignorant of the
serious thought that has gone into this in other fields. And maybe what
we need is a one-day workshop where we can get mutually informed with
people from other areas and then construct a requirements document and
start thinking, "What's already in hand?" because some things may
require only minor tweaks, as Michael just described. And I suspect it
will boil down to the sociological change of getting used to giving
people credit for things for which they're not getting credit now.
>> Michael Kurtz: That is the major issue. It's tenure decisions. It's
old farts who don't view having software that's used in ten different
scientific papers as being as important as writing one of those scientific
papers. It's the same problem with instrumentalists, too, who do things
that other people use rather than use themselves. That's a really
serious problem for pretty much everybody in this room.
>> Yan Xu: So we should find a venue where we can reconvene;
perhaps E-Science in Chicago would be a --?
>> George Djorgovski: It's too soon.
>> Yan Xu: Too soon?
>> George Djorgovski: We have to, you know, organize this and think
about it.
>> Yan Xu: I don't mean to be too pushy but we have to make it happen.
>> : Can I suggest actually...
>> George Djorgovski: Be pushy.
>> : ...we do continue this conversation online somehow, perhaps just a
Wiki rather than Facebook but whatever suits you best.
>> Yan Xu: Absolutely.
>> George Djorgovski: Yeah.
>> Michael Kurtz: Things you put on Facebook are broadcast to other
places.
>> : Yeah, I think...
>> Michael Kurtz: [Inaudible]...
>> : ...a Wiki format is [inaudible].
>> : [Inaudible]...
>> : Let's discuss it on the Wiki. Okay, let's thank the panel. It's
been a great discussion. Thank you...
>> Yan Xu: Thank you, Ray.
>> : ...panel.
>> George Djorgovski: Thank you, Ray.
[ Audience applause ]