>> Matotasho Ishi: Let’s get started for day two. This morning, okay, my name is Matotasho Ishi and I am going to chair the morning session. In the first half we have two keynote talks. The first speaker is James Crutchfield, “The Macroscope: The Engine for Pattern Discovery.” James –
>> James Crutchfield: Thank you. Well you’re not my usual audience but then in talking to people
yesterday listening to the discussions I realize maybe we’re not our usual audience. Kind of a motley
crew trying to figure out what you’re doing and who you are and if there’s something here.
So, with that in mind what I’d like to do is paint a certain kind of picture doing this a bit in historical
perspective of my own interest and study of very complex large scale systems. One of the themes is to
try to emphasize or maybe plead or beg for the role of theory as an important component in doing
computing and gathering large data sets. I’ve actually become a little bit concerned about the role of
theory. I do have a positive message I hope that I’ll be able to convey, though of course in such a short talk there are many things I’ll be sort of skipping over.
Since it is a short talk let me just give you the executive summary. So, the talk is gonna oscillate
between being provocative and obvious, between the sublime theory and mundane gadgets. I like sort
of oscillating between all of those extremes.
So, for me big data is not enough nor is building larger and larger machines. I have spent my entire
career building instruments to help me understand how nature organizes itself. How randomness is
produced in the natural world. I’m gonna try to give a little historical summary of why that’s sort of
necessary and the fundamental role of computing. I’m gonna come back and talk a little bit about in
that context the role of theory and maybe try to flatter the astrophysicists here by pointing out the long,
long history that astronomy has had in pushing some even more abstract ideas I’m gonna talk about.
The optimistic message, the kind of conclusion that I’ve come to although like they say it’s sort of a
vision thing is the possibility of actually automating science. I mean that in a very particular context
where you have very complex large scale systems. How is it that we discover something new? So in the
talk title I use the phrase “pattern discovery” and I want to make the distinction between what’s
probably more familiar, pattern recognition: when you call up American Airlines and you say Chicago you’re talking to a machine that some poor speech engineer has spent literally months, I have a friend that does this, tweaking hidden Markov models to recognize the particular frequency response in small time windows.
Pattern recognition in that sense is a successful engineering technology that requires a built in
vocabulary, a set of templates against which our little digital signal processors compare and the one
that’s closest becomes Chicago to the machine. That’s not what I want to talk about.
We do this all the time. It’s very successful. When I say pattern discovery I want to know where the vocabulary came from in the first place. What is the set of templates? The positive message is that over the last century we have developed a number of techniques that let us actually think about this issue and frame it in, I think, a precise way.
So, if you’re interested in some of the theoretical aspects, just this last January, this year, I have a review in Nature Physics, so please go take a look at that. It gives some of the background going back to the origins of the theory of computation, statistical mechanics, dynamical systems, nonlinear physics, and tries to weave these things together to make this case more plausible.
Okay, so this will be the sublime. I’m gonna end with the things that entertain me when I don’t want to take another derivative, namely gadgets. I want to talk about tools that, I argue, will accelerate discovering new patterns.
Okay, a little bit of history, this is maybe revealing myself, kind of dating things here. How many people
know what this is? Any guess?
>>: Analog computer.
>> James Crutchfield: Congratulations. In particular it is a Systron Donner SD 10/20. This machine is
near and dear to my heart. What do analog computers do? They solve differential equations. So why
would I care about an analog computer? Well, in the 70’s I was very interested in various kinds of very nonlinear equations for which you could actually show, with some of the mathematics I’ve alluded to, that you can’t write down a closed-form solution; therefore you have to simulate, hence the beginning of this long sort of focus on tool building.
In the service, for me, of building theory and trying to understand how systems are organized through computing and simulation. This particular one was actually built in the late 60’s; you can’t really buy these things anymore, which makes me greatly sad.
You have different modules here, operational amplifiers, multipliers, and we’re showing it in the unprogrammed state, ha-ha, and you patched all these things together. It is a parallel computer. All the modules operate simultaneously and it runs extremely fast. In a machine of this size you could probably do, oh, say a set of 20 first-order differential equations, running anywhere from as slow as you like, you put in different capacitors, from a few cycles per second in natural frequency up to maybe 10 or 20 kilohertz.
At the time when I was using this in the late 70’s that obviously beat the pants off the PDP-11 I was using, by a long way. In fact I would still kind of argue for it, not so much for the speed but because of a very important interface decision the engineers made: knobs. It seems like a trivial point but there’s something
about looking at the solutions as they change as you twiddle a knob. Because there is an interactive
feedback loop that is very short and intuitive.
But I’m gonna come back to this. In some sense I’ve been trying, through this long digital distraction since the 70’s, to get back to a system that is as natural for discovery as an analog computer is. Of course in a digital setting I can do more sophisticated things, larger systems and so on, but it’s a little bit of an irony in terms of, shall we say, GUIs. Is that a GUI? It’s an instrument panel, dammit. Of course people doing music synthesis and VJs and so on have, you know, interactive interfaces like this, sliders and so on, trying to fake the naturalness of this sort of thing.
Okay, so let’s just, I’m kind of grounding myself here. You’ve got to know a little bit of where I’m coming
from this old analog world and my heart still pines every once in a while for this especially when my
[indiscernible].
So I want to argue now, in terms of fundamentals, why we must compute. Maybe yesterday, in the context of the workshop, it was a little bit: systems are big. We have big data sets, we have big machines, big systems of equations you want to solve. I actually want to argue that even for small but nonlinear systems we have to compute. This is based on a good deal of mathematics developed in the late 19th century but mostly through the 20th century.
First point, nature spontaneously organizes. There are emergent structures in the world. What do I mean by that? Well, the word emergence is now getting heavily abused, but it means something very, very immediate and practical: that the structures and patterns produced by the system are not directly specified, not directly derivable from the equations of motion.
You write down a differential equation, sure, you can set the left-hand side to zero and find the fixed points, or maybe it could be computationally difficult, but you’ve reduced it to computational quadrature. I’m talking about systems where over long spatial scales and long times structures emerge that, again, are not given by the equations of motion.
There are just way too many examples to go through. Fluid flows: here it’s the von Kármán vortex street. Bénard convection: a thin layer of fluid heated from below. Same thing here, convective cells and this hexagonal pattern; these little donuts, these are two immiscible fluids under high pressure. The Belousov-Zhabotinsky reaction: you’re looking down on a Petri dish with the reagents. Every little fluid volume actually spontaneously starts to oscillate, and then those oscillations get synchronized and we recognize these as traveling waves of reactions.
Unpublished for about 10 years, because at the time, the late 40’s and early 50’s, when Belousov and Zhabotinsky did the experiments this was thought, A, to be impossible, so that’s the first round of rejections, and B, uninteresting because it’s transient, and finally they got it published. This became one of the paradigms for understanding how order spontaneously arises in nature, the Belousov-Zhabotinsky reaction, through reaction-diffusion partial differential equations.
But pattern formation occurs in the natural world. This is a Petri dish with bacteria; the evil biologist put some [indiscernible] here and the bacteria actually get together and kind of race away, together, in these patterns. Zebra striping, animal striping, is also described by a very similar sort of reaction-diffusion system, butterfly wing patterns, and it goes on. This looks like a micrograph of the vascular system of a leaf but in fact it is what happens when you take a piece of plastic and you stress it and it cracks.
The point is nature spontaneously organizes. One of the goals of science is to understand what these
patterns are. It’s almost like different fields are sort of defined by what we decide to attend to, right.
So, hidden in this is actually the mathematics that’s developed maybe even in this case kind of the last
half of the 20th century.
We actually understand a lot about the mechanisms in the system’s state space that lead to these different classes of pattern. So, in some sense there is a kind of predictive theory behind many of these systems. Sometimes the analysis is quite hard. I’m not claiming we can do every one of these things, although BZ has been done pretty well, the vortex street has been done pretty well, and so on.
Not only natural systems, right; as the intelligent beings that we are, we seem to be hell-bent on building gigantic socio-technical systems now. Just like before, there are consequences of building these large socio-technical systems that we don’t understand.
Just one example: Internet route flapping. I don’t know if anyone remembers this from the early 90’s. So if you went to the catalog from Cisco, well, actually it wasn’t a very big catalog back in the early 90’s, it was a small company. You paid your sixty-five thousand dollars for a router and you were guaranteed a hundred thousand packets per second throughput. As soon as you deployed these things and put them out on the network, what happened? Suddenly, in fact this was noticed across the entire network in the early 90’s, throughput would start oscillating on time scales of seconds to several minutes. Huh, you know, you made a bad purchase, right. You looked at the spec sheet and you saw a hundred thousand packets per second. You weren’t getting it; sometimes you were getting zero. What the hell was going on? Well, it turned out that the engineers designed the protocols, the control protocols, in isolation. In fact the control protocols, to simplify a rather complicated story, tried to be smart, they tried to add smarts: oh, I have a map of my connections, oh, there’s a lot of traffic over here, I the router am gonna switch to putting all my packets over this way. Well, what happens if everybody decides that? You know, low traffic, high traffic, low traffic, high traffic, and then this propagates. That’s just one example.
Obviously this is maybe of more immediate concern to us now. We build regulatory systems and have notions, philosophical pictures, of how an economy is supposed to work and what good business practices are. Nonetheless, the systems can fail right in front of us, even systems with the best intentions, forget fraud and so on.
So, power grid failures, India, right? What was it, a third of the country goes down, no power for hours and hours and hours. Hmm, I wonder what happened?
So this notion of emergent structure is important, not only as kind of a natural scientist looking out at the world and realizing that when you write down the equations of motion of a model that doesn’t tell you everything, but also for the systems we build; so I see it as a very broad set of problems.
Okay, but the consequence, at least for my argument here, is that because of this these emergent structures are not directly determined by the equations. Basically every system needs its own sort of functional basis, its own explanatory basis.
Then the issue for me for pattern discovery, the why, why the way I frame pattern discovery even comes
up is that we don’t know this ahead of time. There’s some process that obviously we seem to be pretty
good at, as say natural scientists, of sort of figuring out how the systems are organized. I’m interested in studying that process of discovery well enough that I could teach a machine to do it. So that’s a tall
order. Mathematics gives me some sort of optimism.
So, why do we use computing in this setting? Just to explore what’s possible. There’s a lot of exploratory phenomenology in looking at different kinds of nonlinear equations, seeing how they can behave.
A large part of this area dynamical systems theory is based on coming up with classifications of
nonlinear behaviors that actually don’t require solving any differential equations, using geometry and
topology and so on.
Okay, so, nature forms patterns. First reason we want to compute is to explore just what’s possible.
Just to give us some intuitive sense of what’s possible out in the world or in the designed world.
Another reason to compute: and so here let me go back to the early, early days. So this is Laplace in the late 1700’s; these quotes here are from the preface to his book on probability theory.
So first he starts off of course setting the context of trying to explain why you need probability. Well, so
the first thing is to set up the straw man determinism. We all know this quote right, “…if we conceive of
an intelligence which at a given instant comprehends all the relations of the entities of this universe, it
could state the respective positions, motions, and general affects of all these entities at any time in the
past or future.” Nice clean statement of determinism, reductionism there.
Just to remind you of the role that astronomy has played. He goes on, “Physical astronomy, the branch
of knowledge which does the greatest honor to the human mind, gives us an idea, albeit imperfect, of
what such an intelligence would be.” So, on that question you go back and the French analysts in 1670,
amazing, amazing, I’ve been studying some of the history. Just amazing calculational apparatus
developed to deal with planetary motion.
Okay, punch line: he’s writing a book on probability theory, so here’s his theory of why you need notions of chance. It’s due to ignorance: “So it is that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability.” Because we can’t be this omniscient, all-seeing, all-measuring supercomputer intelligence. Because we fall short of that, the world will appear to us to be, to differing degrees, probabilistic or random, that’s it. It is just ignorance. But if we could get to this ideal intelligence then everything would be certain for us.
Well, maybe some of you know the punch line to the story. Poincaré comes along, and I point out again the role of astronomy: in his book, New Methods of Celestial Mechanics, he’s studying the three-body problem. A book written in a chagrined state, because in fact he had proved the stability of the three-body problem and won a prize given to him by the King of Sweden; six months later the manuscript was going to press to be published and the typesetter sent him a note saying, oh, I think there’s a sign error here. He realized the entire proof was wrong and then produced a 1700 page book over the next year that shows that this deterministic chaos is absolutely fundamental. So, it may happen that small differences in the initial conditions produce very great ones in the final phenomenon. A small error in the former would produce an enormous error in the latter; prediction becomes impossible.
What he introduces in this, he didn’t have computers back then, poor guy, just hand calculation, coordinate transformation after coordinate transformation. I don’t recommend reading the entire book; however, pick it up sometime and just look at the last chapter, where this comes from. He just kind of throws up his hands: I can’t even draw, I’m not even gonna draw for you, the sort of delicate interweaving of these solution sets. A picture of that has persisted to this day, and actually many of the insights from this work laid the foundations for what would be called qualitative dynamics: a structural analysis of how the state spaces of these chaotic systems are organized. It’s absolutely fundamental, even in mechanical systems as simple as a sun, an earth, and a massless moon restricted to the orbital plane; this mechanism is fundamentally chaotic.
Now there are practical consequences for this that we are concerned about. So in the 20th century we
came to understand that if we have some kind of time series what we would like to do is make some measurements at some early time, trying to characterize what the current state of the system is. Then we
want to predict the state of the system later on. The time series shown here are the three different
coordinates x, y, z of the Lorenz differential equations, but sort of more generally we know now when
the system is chaotic the accuracy in our prediction decays exponentially fast. Hence we have to
measure again because this current measurement gives certain information about the state and we lose
that exponentially fast.
To turn this around: the amount of data we need to accumulate, the accuracy that we have to employ in getting the state, how often we have to measure, all of that grows exponentially with the prediction horizon. Also the compute time: you have to use increasingly accurate arithmetic in your calculations, so even assuming you can handle those two other things the compute time also explodes exponentially. This is absolutely fundamental; there’s no way around it.
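A minimal sketch of the exponential error growth he is describing, not from the talk: integrate the Lorenz equations for two initial conditions that differ by one part in a hundred million and print how fast their separation grows. The parameter values, step size, and Runge-Kutta integrator are standard textbook choices, assumed here purely for illustration.

```python
# Minimal sketch (not from the talk): integrate the Lorenz system for two
# nearby initial conditions and watch their separation grow exponentially.
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    # Classic fourth-order Runge-Kutta step.
    k1 = lorenz(state)
    k2 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-8)   # perturb one coordinate by one part in 1e8
dt = 0.01
for step in range(2501):
    if step % 500 == 0:
        print(f"t = {step * dt:5.1f}   separation = {math.dist(a, b):.3e}")
    a, b = rk4_step(a, dt), rk4_step(b, dt)
```

The separation climbs from 1e-8 to the size of the attractor itself within a few dozen time units, which is exactly why the required measurement accuracy grows exponentially with the prediction horizon.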
So, consequences of this: there are no shortcuts. As I mentioned before, you can actually prove this if you want to get sort of mathematical about it. There are no closed-form solutions that give you, from the initial state, the state of a chaotic system at any arbitrary time in the future. You can prove you can’t write down the solution, an interesting kind of proof, ha-ha, and there are no computational speed-ups. Basically, if you want to know what the state is you’ve got to invest all those exponentially growing resources to actually calculate that trajectory. If you don’t do that, if there’s a little bit of error, the error is gonna grow so large that after a finite time your error is gonna be as big as the entire set of solutions. You won’t know where it is.
So, okay, so this is kind of, this is what theory has told us about these large scale, complex, nonlinear
systems. I see computing as a response: why did I go to the analog computer and not the PDP-11?
Because I just couldn’t simulate the things fast enough. It turned out there was a level of hidden
patterns in looking at the time series which had to do with the topological structure and the geometric
structure of the solution sets. I wanted to see those shapes and patterns. To do that I needed to study
these systems interactively and see thousands and thousands and thousands of solution sets. I like
doing that by turning knobs because I could see it on my oscilloscope screen.
So emergent organization, unpredictability, and even we have too limited a view of the world around us,
the wrong theories and so on. So these fundamental problems for me beg the question of how do I
discover something new? I can’t just do incrementally better, because of the exponential scaling you’ll quickly reach, ha-ha: you double the time horizon you need and you need, you know, four times the resources. So, you have
to do something else. It means you have to understand fundamentally how these systems are
organized.
That, I think, is, or maybe I was reading it into some of the comments yesterday about what the goal is in providing these astroinformatics tools: what are you trying to do? We didn’t so much talk about it yesterday, but once you have all this running, let’s just imagine your wildest dreams, you’re successful, you can store as much data as you want, you can compute as much as you want. It’s redundant, copied all over the world; at the end of the day you’re still sitting in your office. All the stuff is laid out for you; what are you gonna do with it? There’s that next stage of discovery. A little bit of understanding of how that process works should tell you how to design the tools too, right.
Okay, so, on the upside I just gave you a proof of guaranteed employment for those of us who compute. So, you can rest quietly satisfied for a little while anyway. Now, and maybe I’m sort of hinting at this, you know, the idea that we can make progress alone just by building bigger and bigger machines or collecting bigger data sets I think is just flawed. I just gave you a sketch of a few radical reasons for that.
So, in particular, ha-ha, it’s a funny irony, the more powerful these machines are, the more the simulations they can do approach the sophistication of the natural world. Right, you have some companion star dropping mass into the accretion disk and you’re doing it rock by rock, and great. What did you understand? You may have questions you want to experiment with in that simulation, but that doesn’t lead to understanding.
So in this, for me, I want to separate out sort of understanding and the role of theory from just building
better tools. I think there’s something else and I think there’s a way to appeal to some of the theory I
alluded to that shows how to do this.
So, so we need sort of new tools for understanding. Yes, big data, yes big computing but also big theory.
So, same comments on just building bigger data sets.
Complex systems, these unpredictable systems require you to take huge amounts of data. It was a little
bit odd in the early days of all this to have to argue with your program manager that I needed databases
to study low dimensional nonlinear systems because I wanted to store all the data they produced, the
diversity of data. To do this qualitative topological geometric analysis we did I needed vast amounts of
data.
Okay, so moving away a little bit from sort of the provocative critical part of this I want to propose very
quickly some kind of solutions and we’ll end up with some, hopefully bailing things out here with some
toys.
So the very presumptuous goal is to come up with a theory of theory building. What is that discovery process? Right, I want to study how we build models of the order, the kind of geometric structure, somewhere between Laplacian determinism focused on particular solutions and the other extreme of just throwing up our hands and calling it all random. Right, all those pattern-forming systems I just showed you are neither of those two extremes. There’s something about how they’re organized and we recognize it. I used those two-dimensional visual patterns because we have this evolutionary inheritance of visual recognition.
So the claim is that we should be focusing on direct computational support for theory building. I really
don’t see much alternative to this. Eventually, right, after the genome was sequenced finally the
biologists came around and said oh there’s a thing called systems biology which has to do with how
genes interact. Huh, right, it’s okay.
At some point you’re back sitting at your desk, you’ve got all the tools that current technology will let you
put together and you still have to sit there and think. So what set of tools do we need to help us
understand that and see the patterns?
Quick example of this: here’s a little universe. It has one dimension, 287 binary sites, the ends are neighbors, and I start off in some arbitrary bit pattern. There’s a local deterministic rule: each site looks at its two neighbors and its own bit value and makes a decision about what its value should be at the next time, and you get this complicated pattern. I may not go through this in detail; it’s called a cellular automaton, initial state down to the state at time 99. Hidden in this are actually these particle structures. There are regions that have a kind of homogeneous texture, there are boundaries between these things, and there are two different kinds of them. So in that midst, and this is exactly the same data I just showed you, a little demon here that knows about the background texture has been run over that same data and it pulls out these particles. So how was that discovered, this sort of hidden particle physics? Well, there was actually a huge computer-assisted proof system that we used. We’d go through, and I won’t go into detail here, doing a kind of exhaustive search of different candidate background textures, pruning these off by counterexamples and so on, trying to decide whether the candidates were an expanding set in the state space, a contracting set, or the goal, these space-time shift-invariant domains. The product was a table like this of basically about a million theorems. This isn’t numerical computing; this is returning algebraic results. It’s completely clean, no round-off here, and actually showing that Rule 22 had these domains.
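For readers who have not played with these, here is a minimal sketch of the kind of "little universe" just described, purely illustrative and not the computer-assisted proof system itself: an elementary cellular automaton such as Rule 22 on a ring of 287 binary sites, run from an arbitrary bit pattern down to time 99. The random seed and the text rendering are assumptions made only for the example.

```python
# Minimal sketch: an elementary cellular automaton (e.g. Rule 22) on a ring
# of 287 binary sites, as in the "little universe" described in the talk.
import random

def step(row, rule):
    n = len(row)
    out = []
    for i in range(n):
        # Each site looks at its left neighbor, itself, and its right
        # neighbor; the ends are neighbors, so the lattice is a ring.
        left, centre, right = row[i - 1], row[i], row[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right
        out.append((rule >> neighbourhood) & 1)   # Wolfram rule encoding
    return out

random.seed(0)
row = [random.randint(0, 1) for _ in range(287)]   # arbitrary initial bit pattern
for t in range(100):                               # initial state down to time 99
    print("".join(".#"[b] for b in row))
    row = step(row, rule=22)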
So to say this most prosaically: you say to the system, I want to study Rule [indiscernible] 110, a particular set of interactions, or Rule 22. It returns the domains, these are like vacuum states; it returns the particles, these are the excitations above the vacuum states; and it returns the particle interactions. It basically gives you the full, I don’t know what you want to call it, artificial particle physics in a completely automated way. It discovers the emergent structures. So this particular case, granted this is a binary world, at least for me provides a proof of concept that you can have a completely automated discovery system.
So, what about these tools? So let me go through this quickly and mostly just to show you two quick
films. So, nature’s complex, our minds are limited but how can we extend them to build new kinds of
tools? It’s a kind of cognitive ergonomics for science.
Not only do I want to understand how the world can make organization and patterns I also have to
understand what the human observers limitations are and that’s where the engineering comes in to
help sort of match these things. There are people that do visual and acoustic psychophysics. These people should be talked to. It’s sort of like a grand GUI design, ha-ha, if you like, but much more
informed about the complex nature of the world.
So, one of the ways I’ve been looking at this, and some of you have seen these things since you have been around supercomputer centers, is using a CAVE environment. So I have three walls and a floor; each surface is a stereoscopic projection that you look at through LCD shutter glasses, and then you have your silly sort of [indiscernible] analog, the so-called wand.
Actually the most important part of this beyond what you’d see at your 3-D theater is this little bar up
here. The system tracks the location of your head and where you’re looking. This, actually, this point I
want to make is impossible to make. You can’t understand the importance of controlling software by
just moving around through it until you do it. It’s one of those things.
So here’s just a simple example. This is [indiscernible], the chief software engineer for the system, the software system. As we follow this he’s actually designing a protein here interactively. There are different views he’s got. He’s grabbed onto a little beta sheet and is manipulating that around. There are different views of the same structure, the sort of atomic composition rather than the abstraction of alpha helices and beta sheets, and so on.
I’m using this to study and teach mathematics of these chaotic systems. Of course I torture my students
endlessly at the blackboard making them sort of work through the abstractions and then at the end of
the quarter we go into the cave and within minutes they get these things. I’m trying to, how do these
different surfaces interact here?
So what you’re looking at is a tool that is, think of the analog computer finally recapitulated, a differential equation solver. What Jordan, one of the programmers, is doing is actually changing in real time the parameters, the coefficients in the equation, and looking at the solution sets. There are different ways we can look at this to see how these complicated solution sets, in this case the Lorenz attractor sheets, sort of suture together. There are many other ways of probing these differential equations.
Here we’re showing you one solution set in gold, that’s one particular solution, and a hundred and fifty thousand other initial conditions evolving. As the system is solving this you get to walk through it and stick your head inside of it. Within seconds you can figure out the geometry, which is the goal: that is, how these solutions are structured together. That’s what I needed to understand way back when, when I was studying these things on the analog computer.
A little criticism here: the control interfaces here, these things, they’re menus. This is like importing all of the debates we had back at Xerox PARC about how people should interface through GUIs. That’s terrible. One of the most interesting things about this domain, having a cave like this, this immersive environment, is the new kinds of interface that are possible. We haven’t even scratched the surface, and to import all these Motif-like widgets, it’s a disaster, it’s so completely unnatural. There have to be other ways. Just like at Xerox PARC, the debates we had in the 70’s and 80’s about what fiction we should present to the user so that bitmap displays and the hard disk, the state of these things, made sense. Now we have the desktop and we all take it for granted.
We’re in exactly the same position now with these 3-D devices. So the cave is a half-million-dollar single-user device. Kind of prohibitive, but we’ve made some progress; we’ve now lowered the price point down to $5K, and here it is. So this is the current instantiation of the Macroscope: your favorite Samsung or Sony 3-D TV plugged into your Unix box, and we have 2 GPUs in there, one for the graphics display running OpenGL and the other for doing the HPC simulations. In some ways almost more importantly we have this thing. This is a game controller that you can move in three dimensions, called the Razer Hydra, a hundred bucks, and it gives you x, y, and z, and you have a knob, a joystick, and then buttons on the end of it to interact with it. You can’t quite tell, but there’s a little control knob here and I’m actually pulling out this sphere of a hundred and fifty thousand initial conditions to let that evolve around.
So, and there are new tools. How do we access tools going up and selecting a menu like this? Well, it
gets tiring in 3 dimensions that’s one thing you’re not sitting at your desk kind of relaxed and moving
your mouse around. When you’re in a cave and you’re interactive like this it’s very physical. So you
don’t want to have this older paradigm of interface. So we come up with this kind of rotator tool that
you select things.
But the punch line is this: a five hundred thousand dollar single-user system down to a five thousand dollar single-user system. This is all open source, it’s DIY; we’re putting together a website so you can do it yourself, with all the underlying software we have written completely available to do this. It runs on these guys all the way up to GeoWalls and others. But this is the near future.
So in terms of input and control, I don’t know if some of you might have seen the video: this little cigarette-box-sized thing called the Leap Motion, a $70 USB device. You stick it down in front of your keyboard down here, and there’s a series of infrared transceivers in here. It picks up and gives you this point cloud, so now forget that stupid wand thing or the Razer Hydra. It’s gonna be a natural interaction; you just get to touch your data. Oculus Rift: this replaces all of those heavy giant displays, all the digital-theater-quality projectors in the cave, forget it, $500. It’s called the Oculus Rift and they just closed their Kickstarter. I hope some people here were supportive of this. These are gonna be available in December. The SDKs will be available.
So, what does that mean? That means I will actually be able to demonstrate to you rather than say it’s an impossible point to make. I could just plug this stuff into this guy and have exactly the same immersive environment that I’ve been using in the cave, or the Macroscope.
So, this is where it’s going. The point here is in some ways not so much the different hardware
instantiations it’s almost a mistake to focus on the hardware. The question is what kind of user
interface, what sort of tools should we be developing to help us understand all the data, all the
computing that we’re generating? That to me is sort of a wide-open thing. I take a little bit of encouragement from the fact that there is some of this, through nonlinear theory, that’s telling us what these emergent patterns are and how to look at chaotic systems and measure that.
So together, to me, that’s the vision of this Macroscope, and it’s coming to a desktop near you. Thanks.
[applause]
>> Matotasho Ishi: Okay, thank you very much, questions, yes.
>>: The last three or four slides were very interesting but I was distracted by the rest of the talk in the
first part of the –
>> James Crutchfield: You’re a gadget guy.
>>: [inaudible] no, no, no –
>> James Crutchfield: Oh yeah –
>>: My, my questions are about the first part.
>> James Crutchfield: Right.
>>: One of your slides you say is you want to build [indiscernible] theory –
>> James Crutchfield: Right.
>>: [indiscernible] about building –
>> James Crutchfield: Exactly good, yeah.
>>: That’s positive, a [indiscernible] program.
>> James Crutchfield: Yeah, right, right.
>>: Are you saying that we are going to be able to succeed in building a simple algorithm that will allow us to discover anything in many kinds of data sets?
>> James Crutchfield: I’d say through my study of these nonlinear chaotic systems [indiscernible]
pattern forming systems those are domains in which I know how complicated the problem is. Using
some of the mathematics that goes back to [indiscernible] and other things developed in the 20th
century I’m very optimistic about doing it in those domains. I mean, I’m very cognizant of how complicated this is.
The idea that it is a simple algorithm I do not subscribe to that but there is a way of thinking about how
we build approximate models and we use sort of a comparison across different models as they become
more accurate to look at how smaller less accurate models become larger more accurate models and
thinking about that. I appeal to the vocabulary of, you know, the renormalization group, to look at how you go from imperfect assumptions, the world has finite correlations, to actually building up a model that describes infinite systems with infinite correlations.
There is a way of doing that systematically. So I called it hierarchical machine reconstruction. So it’s not just trying to build a model, but to be honest about the fact that maybe your initial assumptions are wrong and to look at how those assumptions are violated in a systematic way that can actually lead to ways of building more sophisticated models. I go from the cellular automaton zero-one configurations to asking about these invariant sets, these domains; oh, I find those, but then I factor those out of the system; oh, there are particles, oh, there are red and yellow particles, and then I try to describe those. Maybe there’s some regularity to those things. People try to compute with these cellular automata. So there’s a way of actually taking those particles and embedding a universal machine at another level.
So the point here, if I’m saying this clearly enough, the point here is that the process of discovery is one of changing your assumptions, of kind of moving up levels of abstraction. There must be some people who have studied computation theory here. It’s very much like an operator that moves you up and down the Chomsky hierarchy, from [indiscernible] machines to pushdown automata to [indiscernible] machines, although it’s much richer the way we’re looking at it, with many other intermediate steps. But that’s the kind of idea; so it’s not a simple algorithm, it’s kind of this hierarchical [indiscernible].
>>: [inaudible] last one.
>>: [inaudible] question about the automated atom [indiscernible] discovery. What about things like
symbolic regression where, you know, there are well tested [indiscernible] –
>> James Crutchfield: Uh hum –
>>: Or looking for [indiscernible] equations –
>> James Crutchfield: Yeah –
>>: Solutions to –
>> James Crutchfield: Right.
>>: I’m looking for constant [indiscernible] corrections –
>> James Crutchfield: Right.
>>: So I was wondering whether you could just say something about that.
>> James Crutchfield: Yes, thank you. I wish you were a plant. So, some time ago: you can take these notions from nonlinear [indiscernible], and there’s a way of going from a single time series, you may not have all the probes into the system, but you could show how to go from a single time series to reconstructing a state space, either by taking successive time derivatives or time delays, and that gives you an effective state space of the system. Then once you have the state space you can think of the states being mapped in time into each other, and it then becomes actually a function-fitting problem to get the differential equations out. So you can do this directly, and it’s kind of been re-popularized recently.
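A bare-bones sketch of the time-delay reconstruction being described, not code from the speaker: build delay vectors from a single scalar time series to get an effective state space, then pair successive states to set up the function-fitting problem. The delay, embedding dimension, and synthetic signal below are illustrative assumptions; in practice they are chosen with criteria such as mutual information and false nearest neighbors.

```python
# Minimal sketch of delay-coordinate state-space reconstruction from a
# single scalar time series; tau and the embedding dimension are illustrative.
import math

def delay_embed(series, dim=3, tau=8):
    """Return delay vectors (s[t], s[t - tau], ..., s[t - (dim-1)*tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]

# Example: a scalar observable from some (here made-up) dynamical system.
series = [math.sin(0.05 * t) + 0.5 * math.sin(0.11 * t) for t in range(2000)]
states = delay_embed(series, dim=3, tau=8)

# Pairs (state at t, state at t+1) define the map to fit -- the
# "function fitting problem" mentioned in the answer above.
pairs = list(zip(states[:-1], states[1:]))
print(len(states), "reconstructed states;", len(pairs), "state-transition pairs")
```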
>>: [inaudible] there’s a technique that [indiscernible] –
>> James Crutchfield: I know I’m –
>>: You recode [indiscernible].
>> James Crutchfield: Ha-ha, yes –
>>: [inaudible] –
>> James Crutchfield: It came out in Science, popularized in a Wired magazine piece, but this is actually the original paper, and they actually got something wrong, so, ha-ha.
>> Matotasho Ishi: Yeah, I guess too many other people may want to [indiscernible] questions but our
time [indiscernible] so we just want to thank James again and please continue –
[applause]
>> Matotasho Ishi: Next keynote talk is “ECR: A Framework for Interactive Visualization” by Curtis Wong.
>> Curtis Wong: Okay this is gonna be a much simpler talk than Jim’s talk. It’s almost kind of a here’s
what I did on my summer vacation talk. I thought I would talk a little bit about, since the topic is general
about data visualization to recap a little bit about what we’ve been doing with WorldWide Telescope.
So, Alex Szalay yesterday talked about sort of 10 years of this work that he’s been doing. About 10 years
ago sort of when I got involved in the space Jim Gray was giving this talk about databases meet
astronomy and this last slide was, you know if you’re a vis-person we need you and we know it. So I
offered to help and then, you know, we kind of got busy and built the first DR2 website.
Then Alex suggested the following year that I go to this workshop in Chicago, basically with, about the
visualization of astrophysical data and I was kind of intimidated because it’s a room full of
astrophysicists not sort of unlike what we have here. But he said go ahead, you know, tell them, you
know, give this little talk about what I’d pitched to him which was, what I had said to Jim I said if, you
know, we bring together all this astronomy data into one place I think it’s a terrific opportunity to create
a learning environment for lots of people. Basically not just having access to the images but building a
virtual environment and then when you have this virtual environment you can then connect those
objects to other information on the web and perhaps that in that environment build a virtual camera so
that you can create guided tours in this space, he said, great go build it.
So this was sort of my original presentation deck that I gave there in Chicago which was a very simple
idea of bringing guided tours together into this virtual sky, connected to SkyServer. Then about a year later I hired Jonathan Fay, back there, and he quickly put together some work with the Sloan imagery using a tiled image browser that he had been working on for a few years. We pulled that
together and got sort of the green light to go build this crazy idea of the biggest telescope on the
Internet.
So, we launched about 18 months later at the TED Conference with Roy talking about it. But since that
time I’ve been really wanting to sort of figure out how to connect that work with some more immediate
needs of the growing data and a lot of the public wanting to figure out how to visualize data.
So that’s what we started doing with the WorldWide Telescope. About 2 years ago, how many people
have seen the earthquake demo on WorldWide Telescope? Just a handful, okay, so I’ll show you this. This is a tour where we bring in about forty thousand – is it behind this thing? It might be, yeah, I think it’s there. So you might want to turn the lights down here too.
Okay, so when you’re looking at the ring of fire it’s not particularly obvious, sort of what’s going on
here? You know why is this stuff happening in the middle of the ocean? We’re used to thinking about
earthquakes happening as a result of faults but when you take the same data and you can look at it with
perhaps a different underlying base layer such as here [slide]. You start to see dark oceans and then
light oceans, which I guess means shallower water and deeper water, and from the depth information you start seeing a pattern here, in terms of, oh, I get it, the earthquakes are all forming along these subduction zones. That same pattern is visible in lots of other places –
>>: [inaudible].
>> Curtis Wong: Ha-ha, yeah there’s not as much going on here in Washington as you can see some
things are somewhat below the surface. But if you go down in California you can literally see the San
Andreas. [indiscernible], okay there’s the Bay area. As we move you can sort of see the motion parallax
really defining the path of where the San Andreas goes. This is the Blue Marble earth image database, which is why it’s sort of [indiscernible]. When you come down here to the Salton Sea there’s a lot of activity
here down by Baja.
The other thing about this is that, you know, by being able to visualize this stuff in 3-D you get the ability to look at data over time, which I think is something that’s not really easy to do today with commercial or even free tools, especially for the average person on the street.
So we brought in some of these faults, you can see the Haiti earthquake there but we’re gonna look at
these earthquakes at about a million times real time. You can really start to see large duration temporal
structures which are kind of interesting, as well as sort of exploring within these individual quakes
themselves.
We’re gonna go back to my talk here. So another example is - we’re actually running kind of short on time. This is half a million data points, cumulative rainfall, precipitation, in the United States. So these are constructed very simply, very much like a PowerPoint. If you’re not familiar with how these tours are done, they’re done like PowerPoint where you’re basically setting starting and ending points. Then the camera just goes and tweens between your different slides. You’re just telling it how long each slide is.
The nice thing about these things even though they look like video is that they’re just paths in the
environment. So at any point you could sort of dive in and start to look at individual data points.
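A small sketch of the "slides plus tweening" idea just described; the coordinate layout, keyframe values, and durations are invented for illustration and are not the actual WorldWide Telescope tour format.

```python
# Minimal sketch (illustrative only, not the WorldWide Telescope tour format):
# a "tour" is just a list of camera keyframes; the camera tweens between them.
def lerp(a, b, u):
    return a + (b - a) * u

def camera_at(slides, t):
    """Linearly interpolate a (lat, lon, zoom) camera along the slide list at time t."""
    elapsed = 0.0
    for (start, end, duration) in slides:
        if t <= elapsed + duration:
            u = (t - elapsed) / duration
            return tuple(lerp(s, e, u) for s, e in zip(start, end))
        elapsed += duration
    return slides[-1][1]          # past the end: hold the final keyframe

# Each slide: (starting camera, ending camera, duration in seconds).
slides = [((37.8, -122.4, 2.0), (33.4, -115.8, 6.0), 10.0),   # e.g. Bay Area to Salton Sea
          ((33.4, -115.8, 6.0), (19.0, -72.7, 3.0), 8.0)]     # then on toward Haiti
for t in (0.0, 5.0, 10.0, 14.0, 18.0):
    print(t, camera_at(slides, t))
```

Because the tour is only a path through the environment, pausing at any t and handing control back to the user is exactly the "dive in at any point" behavior described above.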
Okay, let me put that one away. So this is - we started to get a little bit more grounded here, if you will. There’s been a lot of interest in the products we ship at Microsoft in terms of visualization of business intelligence data.
I took this public data from the City of Chicago which is on narcotic arrests and this is actually kind of
interesting. We’ll take a look here; this is about 18 months of narcotics arrests. You can sort of pick - anybody here from Chicago? Okay, well, we’re gonna go into this a little bit deeper, but you can sort of see the temporal pattern here of where the action is. I’m gonna –
>>: [inaudible]?
[laughter]
>> Curtis Wong: Well, let me go in, okay, so I’m gonna take this data and we’ll look at it, say, okay, so we can see all of it at once and then start to look at this in terms of a count by domain [indiscernible]. So, if
you’re looking here, let me sort this so you can see this a little bit better. So there, of the forty thousand
arrests it seems like cannabis is the drug of choice. You can sort of see that about 52.3% of the arrests
are for that. If you’re looking at like, I think that’s heroin, heroin is kind of concentrated down in this
area here, crack somewhat similar, cocaine as you’d expect maybe is a different audience.
In fact if we go into this and look at say time. Let’s say date and hour of the day, so these, you see sort
of a 9 to 5 here and also sort of a peak in the afternoon which is kind of interesting.
If we go and look at, go back to a different one, we were looking at heroin, let’s look at cocaine and do
the same thing. Must be people working because they’re kind of busy during the day and all the action
happens after hours. But you know the other things we can do is we can start to look at some of these
by any other columns that we have in the database, either looking at them by precinct or ward or any
other sort of metadata that exists in there.
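A tiny sketch of the faceted counting being demonstrated here: group arrest records by one column at a time (drug type, hour of day, ward). The records and field names below are made up for illustration; the real Chicago dataset has its own schema.

```python
# Minimal sketch of faceted browsing: count records along one column at a
# time (drug type, hour of day, ward, ...). The records below are made up.
from collections import Counter

records = [
    {"drug": "cannabis", "hour": 15, "ward": 28},
    {"drug": "heroin",   "hour": 11, "ward": 24},
    {"drug": "cannabis", "hour": 22, "ward": 28},
    {"drug": "cocaine",  "hour": 23, "ward": 42},
]

def facet(rows, column):
    """Counts of each distinct value in `column`, largest first."""
    return Counter(r[column] for r in rows).most_common()

print(facet(records, "drug"))   # e.g. [('cannabis', 2), ('heroin', 1), ('cocaine', 1)]
print(facet(records, "hour"))   # the "hour of the day" view
print(facet(records, "ward"))   # or any other metadata column in the database
```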
What we’re using WorldWide Telescope for here is just to start to look at some of this faceted browsing
in terms of the existing data that’s there. Other things that you can sort of do, let me put this away
here, and this is not a final UI, this is just Jonathan and me sort of exploring what you can do here. Let
me skip and go back here.
So I want to sort of bring this back to the ECR reference that I made. When I first started with WorldWide Telescope the information architecture was designed, from my perspective, around education. How do you get somebody interested in a topic that they may not already know something about? To me that was to start with stories, so we wanted the ability to quickly engage somebody with a story, and that meant being able to create a virtual camera, to be able to put in narration and music and other kinds of things, animation, hyperlinks, that would all create an experience that was as good as a regular kind of documentary. But because it was just a path in this environment it was fully interactive at any time, which allowed for individual self-discovery, and then connecting those objects to information on the web.
So I think that same model also applies for how you might want to leverage data visualization, in that it’s a way to share insight about outliers and other things in your data cloud: to be able to annotate them, to create this path of perspective about a particular insight, and to share those, especially when we’re talking about bigger data and not being able to move the data around but instead moving the analytics to the data itself. This might also be another methodology from which you could start to communicate and share insight.
So I know there are already starting to be some places that are including GPUs in the cloud, initially for gaming, but the idea of doing something like this, with a camera in this space and, I think Alex referenced this yesterday, sending out an HD stream if you will: since it’s rendered in real time, like we do with our tours, there might be an interesting way to interactively explore some of these large data sets and then of course link any of those particular insights to other data sets that you can bring in as well.
So some of the challenges that we’re finding in making something like this usable by anybody are of course the initial challenges with dirty data and unstructured data; I think those are the biggest problems that we’re facing. It’s sort of a universal problem for anybody that has data, particularly in spreadsheets.
Then the next stages that we want to move to are analytics in that space, applying machine learning, moving from just understanding what happened to what really could happen, and, you know, that was a really important insight for me here.
We want to bring in things like N-dimensional slicers and cubes, to be able to segment data not just along the existing metadata but to be able to do the kinds of things across dimensions that don’t already exist. Spatial queries as well, in combination with the things that we’ve shown. Complexity is always a problem with visualization, and we’re trying to get to, I think, as we called it working with the Office team, the shortest path to wow. That’s something we’re spending a lot of time to try and make happen.
Anyway, with that, the other work that we’re doing in this space with another team related to us, which is the VXT: we’re building a very large power wall; right now it’s 35 megapixels but we’re gonna make it much bigger. The team is building a very different graphics rendering structure. We’re trying to move computation closer to the pixels, so [indiscernible] functions and not pixels over the wire. We’re bringing it together with some of the Kinect technology, some speech and touch and multimodal and other such devices as well. That’s a project called vX that’s happening here.
Sort of getting close to closing this whole idea of integrated versus connected visualization applications
and the WorldWide Telescope was kind of an integrated application and has its own challenges because
each one of these of course only use some of what you really need. Most of you doing astronomy stuff
are probably using SAMP with a combination of other different components to be able to do the kind of
visualization analysis that you’re doing. Of course those require programming at various stages and
that’s something we want to try and avoid if we’re gonna make this available to generally everybody.
Lastly I want to sort of finish with this paper that Alyssa Goodman did. I was looking at it and I thought I would share it. I have a copy of it on a thumb drive. Basically it’s a summary of the state of where visualization is, if you’re not up on it: sort of the past, the present, and where she thinks the future is in terms of bringing different visualization technologies to astronomy. So I would highly recommend this paper as a summary, because yesterday someone had asked, is there sort of an equivalent of a Tufte book? Well, this isn’t a Tufte book, but certainly, here’s a snapshot of where things are, and I think it was a good paper.
So with that I think we’re back on schedule. Thank you.
[applause]
>>: Your talk was so short.
>> Curtis Wong: Well, I skipped a bunch of stuff deliberately, so, ha-ha.
>>: You have plenty of time for questions.
>>: One of your bullets was dirty data, I’m not sure exactly what you meant by that but it seems to me
there’s an issue for all of these approaches to visualization and data [indiscernible] by saying –
>> Curtis Wong: Uh hum.
>>: If there’s observational noise in data and you have this powerful tool for viewing it in a whole bunch
of different ways –
>> Curtis Wong: Theoretically you can pull it out –
>>: [inaudible] great but also gives you a whole bunch of statistical fluctuations. Okay so in a way
you’re building a tool to generate false statistical fluctuations so –
>> Curtis Wong: [inaudible] –
>>: Statistics [indiscernible] in terms of countermeasures, but I think that’s something that isn’t often considered in building the tools. So do you have any tools that specifically are countermeasures to statistical fluctuations? That’s my question.
>> Curtis Wong: Well I don’t think we have even gotten to doing as much statistics in this space yet, so,
not yet. I guess it’s a simple answer but we hope to, we hope to. Jeff –
>> Jeff: In that context, one thing: when you read the old books on surveying there’s a difference between error and blunder, okay. An error is something where you can understand what the structure of it is and there are ways to beat it down by having lots of data. Blunders you have to detect. You know, if you look at the history of shipboard observations of cloud cover, a significant number of them are on land, and it’s because somebody wrote the latitude and longitude wrong –
>> Curtis Wong: Yeah.
>> Jeff: And so that means that some of them on the ocean are also wrong because, but not
[indiscernible] obvious.
>> Curtis Wong: Right.
>> Jeff: So, you know, so like one way to detect that is to track an individual ship and then to see if it
jumps a thousand kilometers –
>> Curtis Wong: Right.
>> Jeff: And so I think there’s this idea of using something like this to be able to detect blunders, things where the instrument has just gone haywire, you know, or a horse has stood on top of the snow gauge; it’s very useful.
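A minimal sketch of the blunder check Jeff describes, not tied to any specific dataset: walk along one ship's reported positions and flag any consecutive pair separated by an implausible distance. The haversine formula and the 1000 km threshold are assumptions chosen for illustration.

```python
# Minimal sketch of the blunder check described above: flag a ship position
# whenever it jumps an implausible distance from the previous report.
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def flag_jumps(track, max_km=1000.0):
    """Indices of reports lying more than max_km from the previous report."""
    return [i for i in range(1, len(track))
            if haversine_km(track[i - 1], track[i]) > max_km]

# One ship's successive (lat, lon) reports; the third looks like a sign blunder.
track = [(47.6, -130.2), (46.9, -131.0), (46.9, 131.0), (46.1, -132.1)]
print(flag_jumps(track))   # -> [2, 3]
```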
>> Curtis Wong: [inaudible]
>> Jeff: You know, for the people who make real measurements out in the environment, we know all the things that go wrong; no, we don’t know all of the things that go wrong, ha-ha, but the data has a lot of glitches in it.
>> Curtis Wong: [indiscernible] for you the bad data that you’ve encountered is a very, very small part, partly because over time you’ve come to recognize what’s what?
>>: [inaudible] that means there was kind of, I know I found lots of bad measurements, I also know I
haven’t found them all, right.
>>: So [indiscernible] some years ago a graduate student of mine [indiscernible] worked on a way to represent this. We used transparency as a way of showing how large the errors on the data are, basically, [indiscernible] point of view of the viewer, so if you had lots of poor-quality data in one place [indiscernible]. So that’s an approach maybe you could apply to that.
>> Curtis Wong: I’ve also been interested in possibly using sound, you know as a way to let you know of
things that may not be visibly obvious but are perhaps outside of some boundary. It’s amazing sort of
how we can detect subtleties both in periodicity and amplitude and things like that.
>>: [indiscernible] in ways that you’re very confident of [indiscernible] very firm [indiscernible], ha-ha.
>> Curtis Wong: Ha-ha.
>>: That’s quite nice –
>> Curtis Wong: [indiscernible] –
>>: [indiscernible] come out with the strongest, the strong is to conservative, ha-ha.
>> Curtis Wong: Yes.
>>: Are you considering putting a more abstract bottom layer than [indiscernible] or the sky into the
system?
>> Curtis Wong: Jonathan and I have been talking about that for a while. That’s what I sort of meant by
the N-dimensional data slicers that we could use to do things with that then would be represented there
or not. People are always saying hey can you do the genome that has its own sort of challenges but at
least it’s sort of a linear sort of spatial model. Jonathan you want to say anything about this?
>> Jonathan Fay: Well, kind of. So as it sits right now you can create some, you know, empty reference frames that are just, you know, multi-dimensional space, and populate it and just use it without any concern as to whether it’s anchored to a reference system or anything like that, or any reality. Ralph [indiscernible] does a lot of demos where he has, like, chemical compositions of water coming out of a river, and he just creates a 3-dimensional grid of this right in the middle of the ocean where the river comes out. So it’s sort of spatially located because it’s associated with that river, but it’s just a [indiscernible] grid made with time, you know, set out at, you know, just some random spot, to be able to use that as a 3-dimensional canvas to do a chart on. But you could do it against a background reference or without any other reference if you wanted, and it’s supported in the –
>> Curtis Wong: We just don’t have that in the black cube with nothing else in there yet, with a grid.
>> Jonathan Fay: [inaudible] just turn off all the background reference.
>> Curtis Wong: And then you have it.
>>: So have you ported this to an immersive environment?
>> Curtis Wong: Ha-ha that would be cool.
>>: Yeah.
>> Curtis Wong: We haven’t done it yet it would be fun. We have –
>> Jonathan Fay: Sorry, sorry what do you mean we haven’t done it yet?
>> Curtis Wong: I mean we have the big walls but we –
>> Jonathan Fay: But we have it running in –
>> Curtis Wong: We have to –
>> Jonathan Fay: [indiscernible] stereo walls, full-dome planetariums, and tracked [indiscernible] planetariums. We have not done a full-dome stereo ourselves, but there’s actually a guy in Germany who’s using it for underwater, you know, [indiscernible] tree and other things like that. It has stereo projected onto a dome, so you kind of sit over this, it’s an inverted dome, and you can fly around in this. He’s trying to connect this up so that they can have two of them in different universities and collaborate with each other interactively on it.
Another guy who’s actually sold a power wall to Microsoft and asked me if I knew the person it might be
you that he was talking about. The power, one of the caves, [indiscernible] caves, yeah.
>>: Yeah.
>> Jonathan Fay: But they’re also experimenting with setting up WorldWide Telescope environments in that as well. So it supports that functionality. We don’t currently have an immersive studio at Microsoft with it installed, but we have a lot of people in immersive environments like full domes, [indiscernible] systems, that are using it that way.
Right, but the immersive means tracking and it adds like I said it’s impossible to make the point –
>> Curtis Wong: You’re not experiencing it.
>> Jonathan Fay: You’re not experiencing it at that extra level. One of our [indiscernible] demos is actually flying around the earth and looking at the plate boundaries and where the epicenters are. You can just bring it right up and have the data right in front of your face and kind of look around like that and [indiscernible], and in 30 seconds you know what tectonics is all about.
>> Curtis Wong: Yeah.
>> Jonathan Fay: Cause you’ve seen it –
>> Curtis Wong: Yeah.
>> Jonathan Fay: Right here when you’re showing it to us where using the motion –
>> Curtis Wong: Yeah. [indiscernible] there.
>> Jonathan Fay: This way it’s all just complete and the fact that you can, basically you’re controlling a
program by your bodily position there is something about that extra level of feedback that makes it so
present.
>> Curtis Wong: I was looking at –
>> Jonathan Fay: [indiscernible]
>> Curtis Wong: I was in one at ExxonMobil, which does a lot of their exploration stuff, but I found with their stuff the latency was still an issue for me. Has it gotten better?
>>: It’s still a little bit tricky.
>>: Yeah but there’s some examples where it’s really smooth even very complicated things.
>> Curtis Wong: [inaudible]
>>: Actually I have two questions. The first one is about, have you, did you think about other kinds of tools, [indiscernible] tools for example, [indiscernible] to interact with the visualized data? The second question is, as it is accessible now, did you think to make it collaborative?
>> Curtis Wong: Two good questions. The Kinect, we’ve, Jonathan has shown that, I think, two years ago, so it’s done with a gesture part and I think it’s been improved a little bit. Speech is getting there; it’s more a matter of us focusing to spend more time on it. Then I think with regard to, what was the second question? It was about –
>> Jonathan Fay: Collaborative.
>> Curtis Wong: Collaborative, we haven’t done the true collaborative, where people can remotely control or communicate through it to give remote demos or anything, yet, no.
>>: [inaudible] show you something –
>> Curtis Wong: Oh, okay.
>> Jonathan Fay: In our [indiscernible], sorry, our eclipse release in December, we should have a collaborative environment that will allow planetariums to connect with each other and drive along, based on a multi-vendor protocol that we’re gonna be collaborating on. It will work from WWT to WWT but it will also work with Sky-Skan and [indiscernible] some other things like that. We’re in the process; because of the fact that there are several vendors involved and some people trying to drive that, you know, it’s not as fast as just getting it going, but that’s the intent. As for the [indiscernible] that we already have, people have already built, like I mentioned, this underwater viewer; you can actually build your own communication mechanism to synchronize across the web. The fundamentals are already inside WWT; it’s just that we don’t have a way of acting sort of like a messenger in the cloud to connect people together, and that’s sort of what’s necessary to close the gap, because the network protocols for it are in place. It’s just that there’s no sort of game lobby like they have in multiplayer games, and that’s sort of necessary in order to make it easy for people to connect up.
>> Curtis Wong: Is that the first public announcement on that Jonathan?
>> Jonathan Fay: What’s that?
>> Curtis Wong: I’m talking about the collaboration mode.
>> Jonathan Fay: The collaboration?
>> Curtis Wong: Did you mention it at IPS?
>> Jonathan Fay: That’s where it came to light.
>> Curtis Wong: I’ve been working in office for the past year and a half. I’ve been a little bit removed
from this. But anyway, thank you.
>> Matotasho Ishi: Okay, let’s [indiscernible] to preface again and –
[applause]