>> Danyel Fisher: Good morning, my name’s Danyel Fisher. I’m a Researcher here at Microsoft Research in Information and Data Visualization. It’s a rare opportunity that we have today to be greeted and spoken to by Al Inselberg. Al, I should... by the way, for those of you watching virtually, I need to give you a warning. Al tends to reward really good questions with chocolate. I will be online, so if you want to ask any questions online I can relay them. The chocolate will be more difficult to send through campus mail, so if you’re sitting up there in your office on the third or fourth floor and contemplating whether to come down, I’m just saying. [laughter] Al Inselberg has a long and distinguished history. Virtually any university that you can contemplate, he has spent some time at, although his Ph.D. is from the University of Illinois, right, in Mathematics and Physics. He was at IBM Research for a number of years before he moved to UCLA, the University of Southern California, and eventually to Technion and Ben Gurion University. He’s now concluding a one-month tour of the United States by joining us here at Microsoft Research to talk about some of his seminal work in Parallel Coordinates. With that I’d like to welcome the founder, the creator of Parallel Coordinates, and a fantastic raconteur, Al Inselberg. >> Alfred Inselberg: Ha, ha, ha, wow. [applause] >> Alfred Inselberg: Thank you very, very much, Danyel. What can I say, I appreciate it. Take what he said with a grain of salt, or maybe even with rock salt, but it’s nice. I wasn’t aware that so much of the audience is virtual. I shall do my level best to also entertain them. Since we will be talking about visualization of some sort, let us start with this semi-unserious icon. You might notice that you have an opportunity to do some extrapolation in years and use your imagination and all that. But I want to point out that in the demos I will be using real data.
As an example, this magnificent specimen here turns out to be real. Those are the panties of Queen Victoria, which were found in a museum, and you can read it in German. It also says that the girth was 140 centimeters, it does say that, and some people even suggested that this may be the real Victoria’s Secret, but... [laughter] We will not contemplate them. Okay, so as Danyel very, very graciously mentioned, we will be talking about parallel coordinates and some of the things that we can do with them. A bit of history: I was not, in the past, interested in data at all. Rather, when I was young, and that was a long time ago, I fell in love with geometry. I particularly enjoyed the fact that in geometry we would be given a problem and I could draw the problem out. The people, they heard about the chocolates, yes. Hello. You could draw, make drawings about the problem. This is Tanya; she was my student at Hebrew University, also a long time ago, fantastic. If I have a chance I want to show some of the very nice work that she did in those… >> Danyel Fisher: Did she attend alma mater? >> Alfred Inselberg: Your alma mater? So now we see that Alma really does matter. >>: Yes. [laughter] >> Alfred Inselberg: Okay, so as I said, I loved geometry because I could make pictures about problems, interact with whatever is in between with a picture, get inside, redraw the picture, redo it. In those days we didn’t have computer graphics. I felt that I really understood what was going on the moment I learned how to make good pictures of that problem. So as my studies proceeded I learned about multidimensional geometry. The learning process was something like this: the professor would put an equation on the board and say this is a hyper-plane, another equation, this is a hyper-sphere, yet some more, this is a hyper-surface, all these hypers. I couldn’t see a blasted thing other than equations. So here was this man talking geometry but playing around with equations and no pictures.
I felt instinctively that it should be illegal to try to do geometry without pictures. So I had a burning desire to come up with a way to make multidimensional pictures of some sort, and I noticed that when I discussed this idea with my friends they would distance themselves from me. So I decided not to, anyway, so in the course of time I looked at a number of ideas that people had come up with in statistical graphics, and that didn’t work out. So I went back to the fundamental idea that René Descartes gave us, namely the coordinate system. By means of a coordinate system we can actually transform an equation into a picture, not lose information, learn a great deal from the picture, and go back and forth. I said, wow, if we could only do that for functions and relations that have many variables. Then, going back to geometry, the idea came up that in geometry the fundamental concept is parallelism and not orthogonality. We have Euclid’s fifth axiom, which speaks about the existence of parallel lines, but we don’t have anything in the axioms about angles. So to speak about angles, first of all a new concept has to be introduced, and then specific choices have to be made and what have you. In short, parallelism is really the fundamental notion and not orthogonality, and they’re not equivalent. So I said, well, let’s try to construct a coordinate system with parallel coordinates. Here’s how it works. If we want to do something in five-space, or if we are interested in displaying data with, say, five variables, we take a copy of the plane, take five copies of the real line, place them equidistant and parallel to each other, label them with whatever labels are appropriate for the problem, and also superimpose a Cartesian coordinate system, something which people often forget. It’s a pity, because it is useful. Okay, and we take the first component of, say, a point in five-space and measure it on the first axis, the second component on the second axis, the third on the third axis, and so forth.
To indicate that all this is to be considered as one entity, we join the corresponding values with straight lines. What we have done is construct a one-to-one mapping between, say, ordered quintuples and polygonal lines whose vertices are on the parallel axes. By polygonal lines I really mean to take the full line, and you see that we need to... why is that useful? Okay, well, and of course we haven’t lost any information, because from the picture we can recover the numbers, and we can keep on adding axes, wow. I cannot tell you how much I regret that this definition is so simple. The reason I regret it is because... where’s the copy of the book, some place, thank you. The reason I regret it is because people tend to read the papers, they read the definition, they say I got it, and they do not read the rest, and then proceed to try to solve problems which have already been solved. You see, it’s quite a bit. It’s a pity, so this is what it is, and simple as it is, we can add a lot of additional ideas, some of them very nice ideas that Tanya contributed in her project in my class there, and do some very interesting things. But even with the really simple idea we can still do some useful stuff, which is why, believe it or not, lots of people use parallel coordinates. Let me illustrate. So here I’m showing you a map. It’s a portion of Slovenia, a very nice country in Europe. I was there working on some data with a group in economics. Close to us was a group of people looking at data that were collected by satellite. Don’t ask me about the physics, because I don’t know it. But depending on the ground there are different emissions that the earth gives off, and they can be distinguished and measured. So the satellite in this particular case was measuring seven different types of, shall we say, radiation from the ground at each particular point.
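A minimal sketch of the construction just described, in Python. It assumes, for illustration only, that axis i sits at horizontal position i times a fixed spacing and that values are plotted unscaled; real tools usually rescale each axis to its own range. The helper names `to_polyline` and `from_polyline` are hypothetical. The round trip shows the point made above: the picture loses no information.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float]  # (horizontal position, height)

def to_polyline(point: Sequence[float], spacing: float = 1.0) -> List[Vertex]:
    """Map an N-dimensional point to the vertices of its parallel-coordinates polyline.

    Axis i is the vertical line x = i * spacing; component i is plotted at
    height point[i] on that axis. Consecutive vertices are joined by segments.
    """
    return [(i * spacing, float(v)) for i, v in enumerate(point)]

def from_polyline(vertices: Sequence[Vertex]) -> List[float]:
    """Recover the original point: the vertex heights, read off in axis order."""
    return [y for _, y in sorted(vertices)]

# A point in five-space becomes a polyline with one vertex per axis.
vertices = to_polyline([2, -1, 0.5, 3, 1])
# vertices == [(0.0, 2.0), (1.0, -1.0), (2.0, 0.5), (3.0, 3.0), (4.0, 1.0)]
```

Adding a variable just adds one more axis and one more vertex; nothing else in the construction changes.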
The purpose of the exercise is, by looking at the data, to be able to tell what’s on the ground, whether it’s water, forest, chickens, rockets, whatever. So they said, okay, would you mind looking at our data, and we’ll see what we shall see. So the first thing I ask you when you see data displayed in parallel coordinates: do not let the picture intimidate you, because it does look intimidating, especially if you haven’t seen too many before. Okay, so when we put in all the data that we have for this problem, this is what we get. So let’s try to understand it. We have a rectangle, the region in the map. This is Windows 7, so you guys make it work in strange ways. [laughter] I put one window, okay, anyway, so I’m just going to choose one data point and look at it. Okay, so remember a data point in parallel coordinates looks like a polygonal line, something like this. So we have seven measurements and a position. So here now is what the polygonal line tells us. It gives us the X coordinate of the point where the measurements are made, and also the Y coordinate. So the first two axes specify the point, and the remaining seven axes give us the seven numbers that were measured at that point. So from our point of view we could think of it in the abstract: we have this rectangle and seven numbers that aliens sent associated with each point. So we would like to know what, if anything, we can learn from it. Okay, please look at the lake, this strange-looking lake with a finger, and remember the name of the game in visualization is pattern recognition. So I would like to draw the lake for you, but from the picture. In fact it might be better if I do it over there. So let me do it from the data. So I see a pattern here, and here’s the lake from the data. Can’t get this thing positioned right. Okay, so I chose this part of the data and I get the lake. So people were quite amazed and said, wow, what’s going on, blah, blah, blah.
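Picking out the lake is what is usually called brushing: keeping only the polylines that pass through chosen intervals on some axes. A minimal sketch, assuming each row is laid out as in the satellite data (X and Y position first, then the measurements); the function name `brush` and the sample values are hypothetical, not taken from the talk.

```python
from typing import Dict, List, Sequence, Tuple

def brush(rows: Sequence[Sequence[float]],
          ranges: Dict[int, Tuple[float, float]]) -> List[int]:
    """Indices of rows whose value on every brushed axis lies in [lo, hi].

    `ranges` maps an axis index to its selected interval; axes not in
    `ranges` are unconstrained, so an empty dict selects every row.
    """
    return [
        idx for idx, row in enumerate(rows)
        if all(lo <= row[axis] <= hi for axis, (lo, hi) in ranges.items())
    ]

# Hypothetical rows: x, y, then two of the band measurements.
rows = [[0, 0, 10, 80], [1, 0, 12, 85], [2, 1, 55, 30], [3, 1, 60, 25]]
low_band = brush(rows, {2: (0, 20)})            # the "pattern" on axis 2
positions = [rows[i][:2] for i in low_band]     # where those points sit on the map
```

Reading the first two columns of the selected rows back onto the map is what turns a band of polylines into the shape of the lake.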
While they were looking at each other in awe, I saw another pattern, which enables us to take the water out of the lake and return it. So this is almost in real time, and they really didn’t [indiscernible] there; if you work harder you can come up with rules to distinguish the green regions, the built-up areas, and what have you. But this is nice and quick, which is why I do it. So they liked it very much and they said, okay, fine, all these things, patterns. Is there a way that you could do this automatically? Of course what they meant is, can you come up with a classifier, and there are lots of classifiers, algorithms that do classification, which may be able to do this automatically. But I thought it would be fun to try to do it based on some geometric ideas, using parallel coordinates internally. So I worked on such a thing, and I was visiting a group at Yale that I work with, a very good group, mathematicians, computer scientists. They asked me, what are you doing? I said, you know, I’m working on a classifier. They said that’s great, because people from the medical school gave us a dataset and they want classification, and we have not been able to do it. We sent it to some of our friends and they were also not able to do it. The doctors are making fun of us, which is an unbearable insult. [laughter] So I said okay, we’ll try it. So they brought the doctors over with the data, explaining what it is about. Here’s the data; it involves thirty-three variables, which is starting to be respectable. What they did is they took a poor monkey and they stuck electrodes on its brain. I instinctively made a face and one of the doctors assured me that it doesn’t hurt. >> [inaudible] >> Alfred Inselberg: So I said, if it doesn’t hurt, why didn’t you do it on yourself? [laughter] So I was asked by my friends not to make such remarks and to concentrate on the problem. So here’s the data.
They took this poor monkey and they claimed to have placed electrodes on two and only two different types of neurons. So the first axis is categorical: neurons of type one, neurons of type two. Then they gave the monkey some pictures to look at and the neurons started firing, and the good doctors claimed to be able to distinguish thirty-two different spikes, pulses, and measure them. Here is a sample, a data point, and it says this is a neuron of type two, and the pulse of type X one has this amount, X two this amount, and what have you, all the way to thirty-two. So there are thirty-three variables altogether, and one of them is the category. Okay, so what the doctors wanted to be able to do was to look at the firing neurons and tell with precision whether it’s a neuron of type one or type two. My friends told me they just hadn’t been able to do it. So we select here neurons of type one, and these are the blue guys. So what we want is a rule to distinguish the blue guys from the white guys. Okay, so that is going to be the input to our classifier. Here’s the answer. It gives us the answer in terms of conditions on the variables, which is very nice, unlike neural networks. But more important, we can see the answer, and that is really fun. >>: I’m sorry, is that saying within the ranges between those and those? >> Alfred Inselberg: Correct, you have to be within these ranges. Notice it’s Q two minus Q three: you take that out, so it’s a complement. It’s called Nested Cavities, and you will see why in a moment. You grab what you want; you have everything you want, but it’s too much. Then you take things out and put things back, hence the name of the classifier, Nested Cavities, which absolutely enthused my dentist. [laughter] How would a poor dentist deal with nested cavities? Anyway, let us not get carried away. So the first result that we have from the classifier is that it only needs nine out of thirty-two variables.
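The "take things out, put things back" structure of the rule can be written as S1 minus (S2 minus (S3 minus ...)). The sketch below only evaluates such a rule for a given point; it does not construct one. Purely for illustration, each region is assumed to be a box of per-variable ranges (the talk does not say what shape the classifier's regions have), and the names `in_box` and `in_nested` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical region shape: variable index -> (lo, hi) range.
Box = Dict[int, Tuple[float, float]]

def in_box(x: Sequence[float], box: Box) -> bool:
    """True if x satisfies every range condition of the box."""
    return all(lo <= x[i] <= hi for i, (lo, hi) in box.items())

def in_nested(x: Sequence[float], regions: List[Box]) -> bool:
    """Membership in S1 minus (S2 minus (S3 minus ...)).

    Evaluated from the innermost region outward: at each level, x is inside
    if it lies in that region and NOT inside everything nested below it.
    """
    inside = False
    for box in reversed(regions):
        inside = in_box(x, box) and not inside
    return inside

# One variable, three nested intervals: grab [0, 10], remove [4, 6], put back [4.5, 5.5].
S1, S2, S3 = {0: (0.0, 10.0)}, {0: (4.0, 6.0)}, {0: (4.5, 5.5)}
rule = [S1, S2, S3]
```

With this rule, 1.0 and 5.0 are in the class, 4.2 falls in the removed cavity, and 11.0 is outside the first region altogether.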
If we are in three-space and we throw down points, it can happen that they all fall on a single plane, okay. So it can very well be that even though we are in three dimensions, the specific subset that we’re looking at is actually two-dimensional; that is its true dimension. So it finds the true dimensionality of the problem. Now in many dimensions, thirty-two, thirty-three, there’s lots of space. So it’s quite likely that if you throw lots of points they may not be dispersed all over the space, and you may be able to concentrate them in a subspace like that. It found that it’s nine, and it finds it in a very efficient, nice way. So it finds nine and chooses the best nine, by a criterion we will see in a moment, and orders them according to their power to separate the two classes. First of all, let me show you the first two variables that were given in the dataset, X one and X two, the first adjacent variables. We see that the two classes are quite mixed, even in two dimensions. So we have a real problem. Now let’s see what the classifier is able to do. These are... don’t do that. This is what the classifier found to be the best two variables, which is remarkable. I mean, we looked at that and we understood why the traditional classifiers hadn’t worked. Typically people try to separate clusters by putting planes between them, and that isn’t going to do it here. Also, nearest-neighbor ideas will not work, because the two classes are very, very close together. And of course the doctors saw that and said, wow, we didn’t realize our dataset had holes. I thought this looks a little bit like a banana, so I said that’s probably what the poor monkey was fantasizing about while they were torturing it, anyway. So here we are actually looking at the rule. Here’s the rule explicitly, but we are seeing what it does. Notice X, X eleven, X fourteen, and it gives us a road map, going down this way and seeing further best sections in some sense.
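The plane example can be checked numerically. A standard linear test (not necessarily what the classifier does internally) is to center the points and count the singular values that are not negligibly small; that count is the dimension of the smallest flat containing them. The function name and the relative tolerance of 1e-8 are assumptions, and the sketch assumes NumPy.

```python
import numpy as np

def intrinsic_dimension(points: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Dimension of the smallest affine subspace (flat) containing the rows of `points`.

    Rows are points, columns are variables. Singular values below
    rel_tol * (largest singular value) are treated as zero.
    """
    centered = points - points.mean(axis=0)          # shift the flat through the origin
    s = np.linalg.svd(centered, compute_uv=False)    # singular values, largest first
    if s.size == 0 or s[0] == 0.0:
        return 0                                     # all points coincide
    return int(np.count_nonzero(s > rel_tol * s[0]))

# Points thrown onto a plane in three-space: the true dimension is 2, not 3.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200))
plane = np.outer(a, [1.0, 0.0, 2.0]) + np.outer(b, [0.0, 1.0, -1.0]) + np.array([3.0, 4.0, 5.0])
```

This only detects linear (flat) structure; curved sets such as the coiled "pretzels" described below need more than a rank count.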
We see that as we proceed this way the separation between the classes becomes less pronounced, and this is the criterion for choosing the order. So what has happened here is that, in a very efficient way, the classifier went into the thirty-two-dimensional set of data points, chose a nine-dimensional subspace, and found these two very strange pretzels winding in nine dimensions, tightly coiling around each other. So it’s nice, and we give the doctors a very good idea about the structure of their data. We have the rule explicitly. What is left is to get a measure of how accurate the rule is. Now let’s do that quickly with a training and test split. Typically I let the software choose two-thirds of the data at random (you can choose any proportion you want), construct the rule based on those two-thirds, and then test that rule on the remaining one-third. Here’s the answer: you can see false positives about five percent, false negatives about three percent, some kind of average about four percent, very accurate, very fast. You get the rule explicitly, and you can see it, which is fun. Now the parallel coordinates are internal to this. I wanted to go a little bit into the very nice geometry that enables us to do that. A good way to get started is to look at how we could describe the plane, the two-dimensional plane itself, in terms of parallel coordinates. Of course we can describe it with orthogonal coordinates. Since it is two dimensions, we only need two axes. Take a point, and always remember we have the XY coordinate system. We do the same thing: first coordinate on the first axis, second coordinate on the second axis, and join them. So we see that a point is represented by a line, and it’s fair then to ask how a line can be represented. Here’s the answer, and it is both nice and surprising. Here’s the line; we choose some points on the line, each one of these points in parallel coordinates is a line, and so on.
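The evaluation protocol above, sketched in Python: a random two-thirds/one-third split and the two error rates. The talk does not say what the percentages are relative to; here, as an assumption, the false-positive rate is taken over the actual negatives and the false-negative rate over the actual positives. The function names are hypothetical, and the classifier itself is left out.

```python
import random
from typing import List, Sequence, Tuple

def split_indices(n: int, train_fraction: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition range(n) into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def error_rates(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """(false-positive rate, false-negative rate) on the test set.

    FP rate = wrongly accepted / actual negatives;
    FN rate = wrongly rejected / actual positives.
    """
    neg = [p for a, p in zip(actual, predicted) if not a]
    pos = [p for a, p in zip(actual, predicted) if a]
    fp_rate = sum(neg) / len(neg) if neg else 0.0
    fn_rate = sum(not p for p in pos) / len(pos) if pos else 0.0
    return fp_rate, fn_rate
```

The rule is built only from the rows in `train` and scored only on the rows in `test`, so the rates estimate performance on data the rule has not seen.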
Over here we have a bunch of points, here we have a bunch of lines, but remarkably the lines intersect, which is very important. It’s not accidental; it happens all the time. The proof is very quick and fun. Take a line in two dimensions, given, say, by two parameters, in this case slope and intercept. Take one point, transfer it over here, and see it as a line; take another point and see it as a line. Now since we do have the XY coordinate system, we find the equations of these two lines and we find the point of intersection. We see that the point of intersection is independent of these two points and depends only on the two parameters which specify the line, M and B, plus D, which is whatever distance we choose between the axes and which we can take as one. So that’s very nice, because it tells us that we have duality, a very nice transformation in mathematics: points going to lines and lines going to points. That is perfectly okay, because to specify a line in 2D we just need two numbers, two parameters. To specify a point we also need two numbers, so we should be able to make such a correspondence. There is an important detail, which we see here in the division: when the slope M equals one the denominator vanishes, and it’s known in mathematics that in order to do duality properly and completely you have to do it in the projective plane. It’s all in the book. You’re welcome to look at it; it’s a fun thing to do. So we do have the first very nice property of parallel coordinates. Again, because I suspect most of you are interested in data exploration rather than a course in multidimensional geometry, which I would have much preferred to do, [laughter] let’s illustrate that and let us see how duality can help us make money, which is something that lots of people talk about. So here is a dataset, a financial dataset that was given to me by four experts working on Wall Street and making over a million dollars a year, which I suspect is a bit more than some of you are making here. They said, okay, here’s the dataset, we will explain it, we know everything.
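Working the proof through: put axis X1 at x = 0 and axis X2 at x = d. The point (p1, p2) becomes the segment y = p1 + (p2 - p1)x/d. If the point lies on the line x2 = m*x1 + b, substituting p2 = m*p1 + b gives y = p1(1 + (m - 1)x/d) + b*x/d, which no longer depends on p1 exactly when x = d/(1 - m). So every such segment passes through the point (d/(1 - m), b/(1 - m)), and m = 1 is the case sent to infinity. The sketch below checks this numerically; the helper names are hypothetical.

```python
from typing import Tuple

def dual_point(m: float, b: float, d: float = 1.0) -> Tuple[float, float]:
    """Parallel-coordinates point representing the line x2 = m*x1 + b.

    Assumes axis X1 is the vertical line x = 0 and axis X2 is x = d.
    """
    if m == 1:
        # The segments are parallel; they meet at an ideal point of the projective plane.
        raise ValueError("slope 1 maps to a point at infinity")
    return d / (1 - m), b / (1 - m)

def segment_height(p1: float, p2: float, x: float, d: float = 1.0) -> float:
    """Height at horizontal position x of the segment from (0, p1) to (d, p2)."""
    return p1 + (p2 - p1) * x / d

# Every point on x2 = -2*x1 + 3 yields a segment through the same dual point (1/3, 1).
m, b = -2.0, 3.0
px, py = dual_point(m, b)
for p1 in (-1.0, 0.0, 0.5, 2.0):
    p2 = m * p1 + b
    assert abs(segment_height(p1, p2, px) - py) < 1e-12
```

The dual point's horizontal position depends only on the slope, which is the fact used a little later to read correlations off the picture.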
We’d like to see what, if anything, you can discover, kind of testing the methodology, which is fine. So the dataset is actually data given for every Monday of the year, so some years have fifty-three Mondays, and years still have twelve months. This was data from the years nineteen eighty-five to ninety-three. We were given the price of sterling in dollars; there was no euro then, so this is the deutsche mark in dollars, the yen in dollars, two interest rates, short-term and long-term (three-month and thirty-year Treasury rates; these are just the interest rates), the price of gold, and the S and P five hundred. Okay, and I knew absolutely nothing about this. So we started exploring, and you can do lots of things. Again, don’t let the picture intimidate you. It looks like a bunch of scribbles. There’s really a lot of information here, and interactivity is essential in getting it out, in my opinion. I’m sorry, it’s not a matter of opinion: I really do not believe in static visualization for exploration. A static display is no longer exploration, that’s presentation, which is legitimate. But to explore you need to be able to interact, because you have to remove different parts. When people using parallel coordinates speak about overplotting, they’re really not thinking of the power of interactivity. You can have as many things as you want, as long as you can choose parts of them, see patterns, recognize the patterns, and remove them from the picture; then you can work with the display. Okay, so let’s take one year, nineteen eighty-six. Now, okay, we see that the stock market was uniformly low. We have this somewhat strange gap in the gold, which we will explore. We have a lot of volatility in the interest rates, but still roughly in the mid range. Here we have something which was unusual for those times: the yen was the most volatile of the three currencies. That is not true in general; the yen was quite stable, as we will see in a moment. Let’s take another year so that we can compare.
Let’s take ninety-two, and immediately we see the difference. All the information is there: high stock market, low gold, low interest rates. Here the yen is stable, the sterling is all over the place, and those of you who follow these things might remember that that was the year George Soros speculated on the sterling. You can see the nature of the speculation. Okay, so we promised to say something about making money. So let’s do it, okay, so back to nineteen eighty-six. I noticed this gap. Again, remember we’re looking for patterns, and there are a number of philosophical discussions, some not so philosophical, about what a pattern is. As far as I’m concerned, a pattern is anything that attracts your eye. So the gap attracted my eye, and I don’t argue with myself about what a pattern is. Okay, so I was curious about the gap in the gold, and I select the low gold. Notice that gold was low from the first week in January of that year to the first week in August. Something happened between the first and the second week, because gold jumped and stayed. Now what is really interesting is that low gold goes beautifully with a low deutsche mark, not sterling or yen, just the deutsche mark, and with high short-term interest rates, not long-term interest rates. As I was saying that, the four people from Wall Street took out their little notebooks and started writing in them. So it was satisfying to see that maybe they didn’t know everything. Okay, so there’s much more here, which I don’t want to spend too much time on. But another pattern that we see is this intersection, which we isolate. We see that without really doing any work we have found a cluster with a very, very strong negative correlation between the yen and the short-term interest rates, and you can play with it, again, interactivity.
Now of course, the reason it is negative: you may have noticed from the duality that the X coordinate of the point that represents a line depends only on the slope, and lines with negative slopes, if you look at it a bit, map to points that fall between the axes. So this is the intersection between the axes, and therefore we get a negative correlation, yeah. So we found that quite easily; there are lots of these, and that happens a lot. Some more exploration, which I will spare you the pain of, but it’s also well known: you can play with the rates between the currencies and see that when the rates vary, gold moves a lot, and when the rates are fairly stable, gold does not move. So based on that insight I said, let us suppose that we bought gold, and now of course we would like to sell it when it’s high. So I picked a high gold and I started looking for a pattern, and there is one. Okay, so here we have high gold, and remembering what I just told you, what do you think we will see if we plot the sterling versus the deutsche mark for the period of high gold, which is a year plus? Something remarkable: a perfect straight line. Now, those things don’t usually happen with data, but it happened here. I will show you that it only happens when gold is high. Now what does that say? It says that the rate of exchange between these two currencies is fixed, because that is the slope. I thought it was a bit peculiar that it was fixed only when the gold was high, and when that disappeared it was dancing again. I thought, well, that would be a good way: if gold is high and these two banks have a lot of gold they would like to sell, they make a deal until they get rid of the gold. Then after that they just let their currencies float. So I suggested at one of the conferences, when I saw that, that maybe there is a bit of, shall we say, manipulation, but it was speculation; I didn’t accuse anybody of anything.
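This follows from the dual-point formula: the point representing x2 = m*x1 + b sits at horizontal position d/(1 - m), which lies strictly between the axes (0 < x < d) exactly when m < 0, to the right of the second axis when 0 < m < 1, and to the left of the first axis when m > 1. Because a least-squares slope has the same sign as the correlation, a strongly negatively correlated cluster shows up as lines crossing between the axes. A minimal sketch, assuming NumPy and hypothetical helper names; a positive rescaling of either axis changes m but not its sign, so the between-axes test still holds after normalization.

```python
import numpy as np

def dual_x_of_fit(u, v, d: float = 1.0):
    """Least-squares fit v ~ m*u + b, then the x-position d/(1 - m) of its dual point.

    Returns (m, x). For m == 1 the dual point is at infinity, reported as inf.
    """
    m, _b = np.polyfit(np.asarray(u, dtype=float), np.asarray(v, dtype=float), 1)
    if np.isclose(m, 1.0):
        return float(m), float("inf")
    return float(m), d / (1.0 - m)

def lies_between_axes(x: float, d: float = 1.0) -> bool:
    """True if the dual point falls strictly between the two axes."""
    return 0.0 < x < d

# Hypothetical negatively related pair: the dual point lands at x = 2/3, between the axes.
u = np.linspace(0.0, 1.0, 50)
m, x = dual_x_of_fit(u, -0.5 * u + 2.0)
```

The same fit applied to the sterling versus deutsche mark points would return, as m, the fixed exchange rate the talk reads off the straight line.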
However, less than a month after that I received an invitation from Her Majesty’s Treasury. They said they would be happy if I would go there and make a presentation, and, and, and. I was met at the door of this nice building by the river by a very proper Englishman wearing a dark suit and white shirt, speaking with a very difficult accent. He led me into a room with another fifteen such people, very hard to understand. I said, my god, you know, if these people were any more English they wouldn’t be able to speak at all. [laughter] Anyway, so we did all kinds of things, and I finally showed them the straight line. I said, would you gentlemen know anything about that? And they developed an intense interest in the cracks in the ceiling, they were going like that, but you could really feel the tension. So this is an example of using visualization to answer questions that we did not know how to ask. I had no idea that this was there. I can discuss and show you many other such cases. So I wanted to make that point. I understand that we have an hour and a half, which we will break up so that you will not suffer unduly. Let us proceed a little bit. Oh yes, now, I can also show you how to make money on the yen, but only if you promise that I would get something like ten percent of the… [laughter] I think that’s reasonable, but there’s one of the really… >>: [inaudible] >> Alfred Inselberg: I’m sorry, yes? >>: The rules on the yen only changed in the mid nineties. >> Alfred Inselberg: Ah, thank you. >>: [inaudible] >> Alfred Inselberg: Your name, sir? >>: My name’s Lindsey Hughes. >> Alfred Inselberg: Excellent, I have… >>: It’s chocolate. >> Alfred Inselberg: Brought chocolate, Microsoft chocolate, to reward good remarks and questions. You’re welcome to take some and circulate it. >>: Sure. >> Alfred Inselberg: Yes, and we do have some other ones. So thank you for that remark. That only says that one must update their data and their conclusions, thank you.
>>: Yeah. >> Alfred Inselberg: Okay, there are lots of really fun things to do in geometry. But let me, I mean, I just want to emphasize the pattern idea and show you another one and another way that it came up. Okay, so if I show you this pattern, we can immediately see that it’s something, and it’s happening in three dimensions. So we have a bunch of polygonal lines, and therefore a corresponding set of points in three dimensions. But they’re not random, in the sense that small bunches of them intersect, and when we join the intersection points we get these two vertical lines. We are looking, believe it or not, at coplanar points, coplanar meaning on the same plane. Even in three dimensions, if you have a plane, choose some points on it, and then take the plane away, unless you look at them the right way you could miss it. But here it doesn’t depend on the viewpoint. It works in any dimension N, with N minus one vertical lines. Let me just illustrate. Oh, by the way, here’s the duality: point to line, another point, another line. Whatever I do here happens over there. Now let’s go on to planes. Okay, so the reason we get that, oops, ah, I killed it, apologies folks, just a minute, nope. >> Danyel Fisher: You didn’t want to press that. >> Alfred Inselberg: Oh, no, I didn’t want to press that. It was just misplaced enthusiasm. Yes, coming, coming, coming, okay. So we have coplanar points, and I was showing you this pattern. Now let’s see where it comes from. So if we have a plane in 3D, as here, we can build a new coordinate system using, say, the lines where the plane intersects the X two, X three coordinate plane and the X one, X two coordinate plane. So we can build a new coordinate system basically by taking lines parallel to this line, taking lines parallel to that one, and then looking at the grid points. The grid points show up here, and it is exactly the grid points that give us this property. You can play with it, and it works for any plane.
So if we have grid points on a plane, we can recognize coplanarity immediately. It works, as I said, in any dimension, and armed with this very important information I will show you an application, which is not here, sorry about that. Okay, as Danyel mentioned, I used to work for IBM, otherwise known as the International Brotherhood of Magicians; we were magicians in those days, but perhaps not any more. While I was there doing this work, one of the industrial divisions heard that we were doing such work. They said, okay, we are measuring some very important, varied multidimensional data; we’d like to see if you guys can spot something. So they sent us the data. I did not really expect anything, and one of the permutations came up like that. I happened to notice this thing; remember, we’re looking for patterns. It reminded me of this, and I said, okay, let’s isolate it and plot just this, and sure enough it’s here. It really is a straight line. So I found this remarkable: why would you get this sort of pattern in industrial data? There’s some sort of coplanarity, but for coplanarity we need at least two lines, and I could only find one. What does it mean to have half a plane? Well, it was fun. If you plot this variable versus that one, which is the equivalent view, we see that we have a bunch of parallel lines. So we can tell that this and that are linearly related, together with another parameter that is not being measured. So we were able to tell these people that there is a variable, apparently important, with a linear relationship here, that is not showing up in their measurements. They went and they found it, and they were so happy they sent us a check. This doesn’t happen very often. By the way, it’s very nice when you’re able to help the people who take the measurements improve what they’re doing by adding variables that were missed. Okay, this was one example of that.
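One way to read that picture: if the measured quantities satisfy y = m*x + c*z for some unmeasured z that takes only a few values, the Cartesian plot of y against x is a family of parallel lines with slope m, and the intercepts y - m*x recover c*z. A minimal sketch, assuming NumPy and assuming the common slope has already been read off one of the lines; the function name, the rounding tolerance, and the synthetic data are hypothetical.

```python
import numpy as np

def intercept_levels(x, y, slope: float, decimals: int = 6) -> np.ndarray:
    """Distinct intercepts y - slope*x, rounded to `decimals` places.

    One level means a single line; several levels mean a family of parallel
    lines, i.e. an unmeasured quantity shifting y by a constant per level.
    """
    intercepts = np.asarray(y, dtype=float) - slope * np.asarray(x, dtype=float)
    return np.unique(np.round(intercepts, decimals))

# Synthetic example: z is never measured, but it shows up as three parallel lines.
x = np.arange(12.0)
z = np.array([0, 1, 2] * 4)
y = 2.0 * x + 5.0 * z
levels = intercept_levels(x, y, slope=2.0)   # array([ 0.,  5., 10.])
```

The spacing between the levels is what hints at the missing variable; finding what it physically is still takes the people who own the measurements.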
Another example is also connected with income. We will be taking a break in a moment, but I want to show you this example. I was invited by a company in Belgium that makes chocolates (in Belgium they make great chocolates) to go over and help them with some data analysis. So I went there, and they explained to me that they have to buy cocoa, and the cocoa commodity market is pretty crazy, in the sense that there are very sharp changes in the price overnight, ten, fifteen percent one way or another. So you can win or lose a lot. So they had been trying to build a model of the cocoa commodity market for some time. They weren’t too happy with it, so they went into a meeting, but before that they gave me some data and said, look at it, and when we come out we’ll discuss it. So, what do we have here? They gave me data for one market year that had two hundred and fifty-one active market days. This is volume, the amount in dollars or some currency that was traded. Again, I don’t know that much about the variables; they often do not want you to know too much about the variables. This is another kind of volume. This is information on contracts that are open. That means they’re still being negotiated and haven’t closed, and they monitor that. It turns out it’s a very smart thing to do. This is profit and loss, which they measure. Of course, if profit is negative then it’s a loss. And these are futures prices: the price in three months, six months, and so on. So they gave me that and went happily to their meeting. I was looking at it and immediately saw again this pattern that I showed you. I said, wow, that’s very interesting, let’s look at it. This is what it looks like. So I said, you know, the cocoa commodity market is not so crazy. I mean, it seems to like a kind of linearity, and then something causes it to break. So I said, okay, let me look at some of these things and see if we can learn something.
So, using point-line duality again, I pick one of the straight-line intervals, but I will choose it here from its corresponding point. So here’s, say, the first one. I choose the first one, and immediately I see that there’s an outlier in the profit here. Please look at the last point. I choose the outlier, and it was the last trade. I said, wait, this is interesting; could that be true of some of the other ones? So I go and look for some more. Now let’s pick this one. Again we see an outlier; look at this one, the last trade. So I did it, and in something like sixty-seven percent of the cases it was like that. I said, wow, I can give them twenty-four hours’ notice, because the outlier is the last trade, and after that the pattern breaks. I couldn’t tell yet whether it would go up or down, but that is still very nice information. I showed them. They were thrilled, and how do I know they were thrilled? Because they gave me the contract to do the big job, which I’ll show you, with a very nice result. But then I happened to notice that I had trouble in this time period, because there wasn’t so much regularity. One of the people working on the model remembered that, first of all, this is summer, and that year there was a rebellion in the Ivory Coast, where they produce a lot of cocoa. So we can use parallel coordinates as a rebellion detector. [laughter] Anyway, let me show you the big job, and that turned out to be really fun. Now they gave me data on ten years; here it is, from eighty-two to ninety-two. Again I found this phenomenon, and this time I was able to get a very nice indicator of whether it’s going to go up or down, really valuable stuff. But there was something especially pleasing to me as a mathematician. Here’s a graph of profit and loss (again, loss is negative and profit is positive) versus price difference. This is the opening minus the closing price.
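The observation that the outlier was the last trade before the pattern broke can be phrased as a simple check. This is a minimal sketch. It assumes a window of days over which two variables follow a straight line. It fits the line by least squares on every point except the last, then asks whether the last point's residual is large relative to the others. The three-standard-deviation threshold and the function name are assumptions for illustration. The talk does not say how the outliers were judged, only that they were spotted visually.

```python
import numpy as np


def last_point_is_outlier(x, y, z_thresh: float = 3.0) -> bool:
    """Flag the final observation of a window if it departs from the linear trend.

    Assumption: the window is linear apart from possibly its last point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares line through all but the last point.
    slope, intercept = np.polyfit(x[:-1], y[:-1], 1)
    residuals = y[:-1] - (slope * x[:-1] + intercept)
    # Guard against a perfect fit, where the residual spread would be zero.
    scale = max(residuals.std(ddof=1), 1e-12)
    last_residual = y[-1] - (slope * x[-1] + intercept)
    return bool(abs(last_residual) > z_thresh * scale)


days = np.arange(10)
noise = 0.1 * (-1.0) ** days          # small, deterministic wiggle
calm = 2.0 * days + 1.0 + noise       # last point stays on the line
spiked = calm.copy()
spiked[-1] += 5.0                     # last point jumps off the line
```

With these synthetic series, `last_point_is_outlier(days, calm)` is false and `last_point_is_outlier(days, spiked)` is true.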
You get this very, very beautiful butterfly pattern, but for the losses. I was intrigued by that, and I said, let’s explore it. Let’s get rid of this. I want to take just the butterfly. Again, you guys have got to do something about Windows; this is very... okay. I just want the butterfly; okay, got the butterfly. Now I’m going to choose a range of a variable, and I can’t tell you what the variable is. Okay, here’s the variable; please notice something unusual. Here we get two perfect straight lines. Folks, this is cocoa commodity data; it’s not something that comes from physics. So I said, wow, that’s nice. Now we’re going to make the butterfly fly. Look at the straight lines opening beautifully. It was really spectacular. We found the formula, nonlinear and all that. They told me they were able to use this for a risk analysis model, because this is about the losses. I was so thrilled (this is a pretty result) that I said, guys, we really have to publish it; it’s a shame not to. They said, no, you will not publish it, the competitors, blah, blah, blah. So I was pushing my luck, and I said, I will publish it. One of the ladies on the team said, we will sue you. I said, for how much? She said, at least fifty million dollars. So I said, now I know how much my work is worth. [laughter] I could not persuade them to give me a small percentage of that. Anyway, this was really fun, and there are lots of adventures like that. Again, this is an instance of visualization answering questions we did not know how to ask. It is very, very important to keep this in mind: the standard tools, the OLAPs and the other things that take sections, answer pre-canned questions. They are perfectly legitimate, but they will miss this, because here we are doing exploration without any bias. Okay, I do not want to tire you. I want to show you what I think is a very nice way of making decisions, both in general and for specific applications. Here is an example of that.
Okay, we can really do geometry with parallel coordinates, multidimensional geometry: do it and see it, and do some things that in some sense we’re not able to do using orthogonal coordinates. We can show multidimensional lines, planes, surfaces, and hypersurfaces. This is a hypersurface in eight-space. It is built from data on the economy of a Latin American country, where I worked with the central bank. They gave me the output for many years of the agricultural sector, fishing, mining, manufacturing, construction, government (meaning the budget), miscellaneous, and GNP. Using standard statistical tools, least squares, we found a nice equation for the model, and once we have the equation we can represent it in parallel coordinates. Again, it’s all in the book. So we have a visual, geometrical model of the economy of this country, or, for that matter, of any particular relation. Now, if we have a new point in these variables and it falls inside this creature, it satisfies the relation; if it doesn’t fall inside, it doesn’t satisfy the relation. So, in this sense, satisfying the relation has a complete geometrical equivalent. Let’s use this simplistic model to construct feasible economic policies for that country, which is certainly a non-trivial matter. We use an interior point algorithm, which you see here, and it teaches us a lot. Here I’m playing around with values of the agricultural output. Let’s decide that we want a lot of agricultural output for that year. We see that once we make a decision about one variable, it affects everything else, because of their relationship.
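The point that fixing one variable constrains all the others can be made concrete for the simplest case. This is a minimal sketch. It assumes the model is a single linear relation a*x = c (the kind of equation a least-squares fit gives) and that each variable has a plain lower and upper bound. Once some variables are fixed, each free variable can only range over the values for which the rest can still make up the total. This is interval reasoning for one equation, not the interior point algorithm shown in the talk. The function name, coefficients, and bounds below are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def feasible_ranges(a: List[float], c: float, bounds: List[Interval],
                    fixed: Dict[int, float]) -> List[Interval]:
    """Range each variable can still take under sum(a[i] * x[i]) == c.

    Assumptions: a single linear relation, every a[i] != 0, and box bounds
    bounds[i] = (lo, hi). Variables listed in `fixed` are pinned to a value.
    An interval with lo > hi means the choices made so far are infeasible.
    """
    remaining = c - sum(a[i] * v for i, v in fixed.items())
    ranges: List[Interval] = []
    for j in range(len(a)):
        if j in fixed:
            ranges.append((fixed[j], fixed[j]))
            continue
        # Smallest and largest total the other free variables can contribute.
        low_sum = high_sum = 0.0
        for k, (lo_k, hi_k) in enumerate(bounds):
            if k == j or k in fixed:
                continue
            low_sum += min(a[k] * lo_k, a[k] * hi_k)
            high_sum += max(a[k] * lo_k, a[k] * hi_k)
        # a[j] * x[j] must equal what remains after the others contribute.
        v1 = (remaining - high_sum) / a[j]
        v2 = (remaining - low_sum) / a[j]
        lo_j = max(min(v1, v2), bounds[j][0])
        hi_j = min(max(v1, v2), bounds[j][1])
        ranges.append((lo_j, hi_j))
    return ranges


# Three hypothetical sectors sharing a total of 10, each allowed 0..6.
a = [1.0, 1.0, 1.0]
bounds = [(0.0, 6.0)] * 3
before = feasible_ranges(a, 10.0, bounds, {})
after = feasible_ranges(a, 10.0, bounds, {0: 6.0})   # "a lot" for sector 0
```

Before anything is fixed, every sector can range over 0..6. Pinning the first sector at its maximum shrinks the other two to 0..4, which is the shared-budget effect described in the talk.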
This is something that decision makers, and especially politicians, tend to ignore. In this particular case, when you have a budget for the year and you use up a certain amount for agriculture, there is going to be less for everybody else. And not just less, but less in specific places, because of certain relationships and constraints within. Okay, so we have this. Then we go to fishing, and a very peculiar thing happened: high fishing, low mining, and almost the reverse. I happily proceeded, but one of my friends there, whose name was also Alfredo, was intrigued by that. Not so long after, I got an enthusiastic email, the way Latin Americans know how to write one. It says, Alfredo, you will not believe what we found. It turns out there is a large group of migrant workers there. When the fishing is good, they come down from the mountains to work on the fishing boats, and there are not enough of them left to work in the mines, and vice versa. That’s why we couldn’t construct a policy that was good for both. Unknowingly, these two sectors were competing for pretty much the same labor force. So that was very satisfying. As I said, it’s a model for decision support. I’ve been trying to convince doctors who work in intensive care units to use it. There they measure many, many variables of the people who come in in very sad shape. You see an Excel list of thirty or forty measured variables, with the ranges they’re supposed to be in. I don’t know anybody who can look at thirty or forty variables that are supposed to be interrelated, get an idea of their interaction, and determine the state of the patient. This, plus the idiotic idea that things should be in a box, each in a fixed interval (which we can discuss philosophically afterwards), makes for a very, very poor model. So I said, look, you shouldn’t be doing it like that. Let’s just say that you have eight variables. Let me show you what can happen.
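A toy example shows the difference between checking each measurement against its own interval and checking the relation among the measurements. This is a minimal sketch. It assumes a hypothetical linear relation among three measurements; it is not a clinical model. The point it illustrates is that a reading can sit inside every individual range and still violate the relation that ties the variables together.

```python
from typing import Sequence, Tuple


def in_box(x: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> bool:
    """The 'fixed interval' check: every value inside its own range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))


def satisfies_relation(x: Sequence[float], a: Sequence[float], c: float,
                       tol: float) -> bool:
    """Joint check: sum(a[i] * x[i]) must be within tol of c."""
    return abs(sum(ai * xi for ai, xi in zip(a, x)) - c) <= tol


# Hypothetical ranges and relation, chosen only to illustrate the point.
bounds = [(2.0, 5.0)] * 3
a, c, tol = [1.0, 1.0, 1.0], 10.0, 0.5

consistent = [3.0, 3.0, 4.0]   # in range and consistent with the relation
suspicious = [5.0, 5.0, 5.0]   # in range, yet the relation is badly violated
```

Both readings pass `in_box`, but only `consistent` passes `satisfies_relation`. The box check alone cannot tell them apart.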
Let’s say the patient’s measurements are pretty much at the center of the ranges, with slight deviations, which is about as ideal as you can get. By the time you get to the end, look how much you have left: it’s not this, but this. One of the doctors who looked at that said, wow, this might explain some mysterious deaths. [laughter] We leave our lives in the hands of people who don’t have the right tool. So I said, you should be using this. When you get a new patient in here, if you know he’s diabetic, you can put in that constraint already and get a much better feel for what the actual ranges should be. Okay, so this is an idea that you may want to contemplate. But the most important idea is this. So, thank you for listening. We have time for questions, with lots of rewards for good questions, remarks, and jokes. Yes. [applause] >>: I forgot it’s chocolates, not the [indiscernible]. >> Alfred Inselberg: No, wow, well, you give it to somebody else. >>: Oh. [laughter] >> Alfred Inselberg: Okay. >>: How successful have you been in… >> Alfred Inselberg: Who would you like to give it to? By the way, I think I deserve it. [laughter] >>: How successful were you in training laymen to use this? Because you understand geometry, but most people don’t. You tell them duality and they don’t know what you’re talking about. >> Alfred Inselberg: I was successful in training not only laymen but also laywomen. This is a fantastic example; Tanya did an amazing piece of work. It takes... it’s a very nice question, by the way. You deserve a second reward. >>: Thank you. >> Alfred Inselberg: Remarkably enough, people liken this to Excel. It is visual Excel, no more, no less. It’s a visual spreadsheet. I take the Excel table, space separated, and that’s how it [inaudible]. It takes about two hours, high school, right? They are not experts, but they’re productive after… >>: [inaudible] intersecting lines into hyper-planes or... >> Alfred Inselberg: Well.
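The "visual spreadsheet" idea reduces to a simple transformation: each row of the table becomes a polyline whose vertices sit on equally spaced vertical axes. This is a minimal sketch. It assumes a whitespace-separated numeric table and per-column min-max scaling; real tools may scale or orient the axes differently. The helper names are hypothetical.

```python
import io
from typing import List, Tuple

import numpy as np


def parse_table(text: str) -> np.ndarray:
    """Read a whitespace-separated numeric table, one record per line."""
    return np.loadtxt(io.StringIO(text), ndmin=2)


def to_polylines(data: np.ndarray, spacing: float = 1.0) -> List[List[Tuple[float, float]]]:
    """Vertices (axis position, scaled value) per row; values scaled to [0, 1] per column."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant columns map to 0
    scaled = (data - lo) / span
    axis_x = np.arange(data.shape[1]) * spacing
    return [list(zip(axis_x.tolist(), row.tolist())) for row in scaled]


table = parse_table("""1 10 5
2 20 5
3 30 5""")
lines = to_polylines(table)
```

Each polyline in `lines` can be passed to any plotting library's line-drawing call. Nothing beyond the table itself is needed, which is why the spreadsheet analogy fits.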
>>: Linear relationships. >> Alfred Inselberg: Even nonlinear ones; those are much more interesting. They don’t do it like that. I tell them, look at this thing, at anything that grabs your eye, anything that grabs your eye. We have here three queries, called atomic queries, which I worked very hard to come up with. From them I composed much more complex queries with Boolean operators. So they see this crossing; you don’t need anything for this, your eyes see it. So I said, grab it. They grab it, and I said, play with it. They can do whatever they like. I said, grab an interval on one of them and play. Ah, looks like they found a correlation, yeah. >> Danyel Fisher: [inaudible] from there to… >> Alfred Inselberg: I’m sorry? >> Danyel Fisher: What’s the leap from there to interpretation? That is, I grab some random crossing of yen and three-month yield. I just got fed chocolate. >> Alfred Inselberg: You could say yes, but I mean you got... you need the green one. [laughter] Yes, then. >> Danyel Fisher: Right, I just decided that I’m interested in this chunk of intersection between yen and three-month yield. You showed me that once I had selected that, there was an interesting relationship between yen and a bunch of other stuff. But what was that subset that I grabbed, and what made it an interesting subset? >> Alfred Inselberg: What made it interesting is your feel, but what you used is the fact that they intersect. So something happens, something that your eye identifies. It happens that just about any pattern I came up with corresponded to a relationship in the dataset. So by selecting the pattern and playing with it, I find relationships in the dataset. But in a way your question is a very good and fundamental one, as I would expect from Danyel. I met Danyel as a student in Professor Hertz’s class at Berkeley in earlier days. So we have very basic tools. You see this thing? This is a zebra.
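Composing small queries with Boolean operators maps naturally onto boolean masks over the rows. This is a minimal sketch. It assumes two stand-in "atomic" queries. The first is an interval on one axis. The second is a range of slopes for the segments between two adjacent axes, which picks out lines taking part in a crossing like the one described. These are illustrations, not the three atomic queries of Inselberg's software, whose exact definitions are not given here.

```python
import numpy as np


def scale01(data: np.ndarray) -> np.ndarray:
    """Min-max scale each column to [0, 1], as the axes of the display are."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / np.where(hi > lo, hi - lo, 1.0)


def interval_query(data: np.ndarray, col: int, lo: float, hi: float) -> np.ndarray:
    """Rows whose raw value on axis `col` lies in [lo, hi]."""
    return (data[:, col] >= lo) & (data[:, col] <= hi)


def slope_query(data: np.ndarray, col: int, lo: float, hi: float) -> np.ndarray:
    """Rows whose segment between axes col and col+1 has a scaled rise in [lo, hi].

    Negative rises are high-to-low lines, the kind that form an X crossing.
    """
    s = scale01(data)
    rise = s[:, col + 1] - s[:, col]
    return (rise >= lo) & (rise <= hi)


data = np.array([[1.0, 4.0, 2.0],
                 [2.0, 3.0, 2.0],
                 [3.0, 2.0, 1.0],
                 [4.0, 1.0, 1.0]])

# Compose atomic queries with Boolean operators: & (AND), | (OR), ~ (NOT).
selected = interval_query(data, 0, 2.0, 4.0) & slope_query(data, 0, -1.0, 0.0)
rest = ~selected
```

Here `selected` keeps the last two rows, and `rest` is its complement.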
I don’t know anything about the dataset, and statisticians tell me that is a plus, because you come without a bias. You have no ideas and no beliefs, and many times beliefs get people into trouble. That’s what the statisticians tell me. By the way, this is not instead of statistics; it is in addition to statistics. You have a dataset you don’t know a thing about: start exploring, find some properties, and then you can start making intelligent hypotheses. Okay, so I start to play. I said, well, everybody says the S&P 500 is important in the stock market. Let’s explore it. So I choose it and do a zebra on it. Let’s take four intervals, just for the fun of it. Look, I broke it up into four intervals and colored them, and without doing a blasted thing I see: low stock market, low currencies; high stock market, high yen, low gold, lower interest rates. These are things I had heard about from reading the newspaper, and I start learning. >>: What’s the deal with the first thing in each of those week things being different from the others? >> Alfred Inselberg: It’s because of the color. >>: I had told it to. [laughter] >> Alfred Inselberg: No, no, but this. I won’t tell anybody, sorry. I’ll say that I came with a whole bunch of goodies from Israel, and people liked them so much that I used them all up. Last night I was looking for the one for Microsoft and said, my god, they took it away. So I brought Microsoft chocolate. Look, this particular thing is the coloring. So… >>: [inaudible] purple best. >> Alfred Inselberg: I think it corresponds to these values of the stock market, which, by the way, is very weird, like it’s the last… >>: Also overprinting purple best. >> Alfred Inselberg: Yeah, that could be gold. That could be an overprint. >>: See, purple is printed over the pale blues [inaudible]. >>: Okay. >> Alfred Inselberg: You can play with it. By the way, very good remark: you need interactivity.
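The "zebra" step breaks one variable's range into a few intervals and colors every record by the interval it falls in. This is a minimal sketch. It assumes equal-width intervals; quantile bands would be an equally reasonable choice. It also includes a hypothetical helper that summarizes another variable per band, which is roughly what the eye does when reading the colored picture. Both variables below are synthetic stand-ins.

```python
import numpy as np


def zebra_bands(values, k: int = 4) -> np.ndarray:
    """Band index 0..k-1 for each record, using k equal-width intervals."""
    v = np.asarray(values, dtype=float)
    edges = np.linspace(v.min(), v.max(), k + 1)
    # Compare against interior edges only, so the maximum lands in band k-1.
    return np.digitize(v, edges[1:-1])


def band_means(bands: np.ndarray, other, k: int = 4):
    """Mean of another variable within each band (nan for an empty band)."""
    other = np.asarray(other, dtype=float)
    return [float(other[bands == b].mean()) if np.any(bands == b) else float("nan")
            for b in range(k)]


index_level = np.arange(9.0)          # stand-in for the coloring variable
gold = 100.0 - 5.0 * index_level      # stand-in variable that falls as it rises
bands = zebra_bands(index_level)
```

The band indices can index a list of four colors in any plotting library. `band_means(bands, gold)` falls steadily from band to band, the kind of "high here, low there" reading described in the talk.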
So I start with two, and then you get a feeling. There’s no reason to be afraid to go back and forth and get a dynamic feel for what is happening. So I start exploring with tools like that. Another very important consideration, which surprisingly enough you have missed, is the ordering. Here I see this intersection because these two axes happen to be adjacent. If they weren’t adjacent, the information would still be there, but I would not see it. So what do we do? There’s a very, very pretty answer, and it also leads me to an open question which I would like to pose here. Let me explain: the ordering is a significant concern, so let’s see how we handle it. Sorry, okay. Ideally we would like to have all possible permutations; then we could see everything. But if we have N variables, that’s N factorial, and we’re not going to do that. If we’re talking about all possible pairings, that’s of the order of N squared, which is quadratic. However, we can beat that, and this is a very nice thing to remember. Let’s say that we have six variables. We make a graph whose vertices are the variables. So here are the vertices: variable one, two, three, four, five, and six. We put an edge between two variables if they are adjacent in the permutation we are looking at. Look at this strange Hamiltonian path that is chosen first. It’s a Hamiltonian path, so it goes through every one of the vertices, and it corresponds to a particular permutation. We rotate it and get another permutation. We rotate it again, and these three Hamiltonian paths together give us the complete graph. In other words, we have all possible edges and therefore all possible adjacencies. This shows that for N variables we need on the order of N over two well-chosen permutations to see all possible adjacencies. The software constructs them. So here we go: this is the permutation editor.
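The rotation argument can be checked in code. This is a minimal sketch. It assumes the standard zigzag construction of a Hamiltonian path (k, k+1, k-1, k+2, k-2, and so on, taken modulo the number of axes) and rotates it. For an even number of axes, the rotations split the complete graph into N/2 paths. For odd N, one extra dummy axis is added and then dropped. Whether the software uses exactly this construction is an assumption. The sketch only confirms that N/2, rounded up, orders suffice for every pair of axes to be adjacent somewhere.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def covering_permutations(n: int) -> List[List[int]]:
    """Axis orders of 0..n-1 in which every pair of axes is adjacent at least once."""
    m = n + (n % 2)                      # pad to an even count with one dummy axis
    orders = []
    for k in range(m // 2):              # rotate the zigzag path k times
        path = []
        for i in range(m):
            step = (i + 1) // 2
            sign = 1 if i % 2 else -1
            path.append((k + sign * step) % m)   # k, k+1, k-1, k+2, k-2, ...
        # Dropping the dummy axis only makes its two neighbours adjacent.
        orders.append([v for v in path if v < n])
    return orders


def adjacent_pairs(order: List[int]) -> Set[FrozenSet[int]]:
    return {frozenset(p) for p in zip(order, order[1:])}


def covers_all_pairs(n: int, orders: List[List[int]]) -> bool:
    seen: Set[FrozenSet[int]] = set()
    for order in orders:
        seen |= adjacent_pairs(order)
    return all(frozenset(p) in seen for p in combinations(range(n), 2))


six = covering_permutations(6)   # 3 orders, as in the six-variable example
ten = covering_permutations(10)  # 5 orders, as for ten variables
```

With 0-based labels, the first order for six variables is `[0, 1, 5, 2, 4, 3]`. The other two are its rotations.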
Here there are ten variables, so it constructs these five permutations. Let’s say that, for whatever reason (say we’re speaking about the yen), we want the year and the yen adjacent. So we pick the permutation where they are adjacent, select it, and apply it, and sure enough it does look different. So you start with data that you don’t know a thing about. Again, my statistician friends tell me the less you know, the better off you are, because you don’t start with a bias. I’m not sure I buy that, but, of course, there are some famous books about statisticians. One of them is a very good book, because I know I had it… >>: [inaudible] >> Alfred Inselberg: And I lent it… [laughter] I’m sorry? >>: Careful. [laughter] >> Alfred Inselberg: No, no, it’s not, I’m only quoting. [laughter] But there’s this very famous book, and I know it’s a very good book because I lent it and never got it back. [laughter] It’s called How to Tell the Liars from the Statisticians. Anyway, here is another permutation. So here is what I do with a dataset that I don’t know anything about. I put it up there and run through these permutations. If I see a pattern I like, for example here or here, I write it down on the side. (By the way, this curve tells me there are some convexities here; it’s full of patterns.) When I’m done, I go here and make my own custom-made permutation, which in my opinion is the best permutation for that dataset and contains all the parts I want. By the way, I can repeat the axes as many times as I like, which is one thing you could not do in orthogonal coordinates, and then proceed. I don’t know if that fully answers your question, Danyel. I’m not even sure that one can, because this is a game which is akin to being a detective. >> Danyel Fisher: I was asking something that was actually much simpler. >> Alfred Inselberg: Oh, well. >> Danyel Fisher: Sorry, so… >> Alfred Inselberg: You ruined it.
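The last step, building one custom axis order from the pairs noted while browsing, is easy to sketch, and it shows why allowing an axis to appear more than once is useful. This is a minimal sketch with a hypothetical helper and made-up variable names. Each noted pair is chained onto the current order when it shares the last axis; otherwise it is appended as a new pair. As a result, a variable such as gold may appear twice.

```python
from typing import List, Sequence, Tuple


def custom_order(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Chain the noted axis pairs into one order; axes may repeat."""
    order: List[str] = []
    for a, b in pairs:
        if order and order[-1] == a:
            order.append(b)          # extend the chain a -> b
        elif order and order[-1] == b:
            order.append(a)          # extend the chain b -> a
        else:
            order.extend([a, b])     # start a new adjacent pair
    return order


noted = [("year", "yen"), ("yen", "gold"), ("S&P 500", "30-year"), ("gold", "30-year")]
order = custom_order(noted)
# ['year', 'yen', 'gold', 'S&P 500', '30-year', 'gold']
```

The greedy chaining guarantees that every noted pair is adjacent, but it does not try to find the shortest such order.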
[laughter] >> Danyel Fisher: No, no, I think that was a great answer, just not to the question I asked. >>: Or to a deep question. >> Danyel Fisher: To a deeper, more profound question than the one I was asking. What I’m asking is this: I look at this, say, and I see a little knot at the top between the thirty-year yield and the S&P 500. From what you’ve told me… >> Alfred Inselberg: That’s of the year… >> Danyel Fisher: Over towards the right side, with the thirty-year long-term yield… >> Alfred Inselberg: Ah, okay, yeah. >> Danyel Fisher: It’s just a knot at the top, and from what you’ve told me, I should look for those knots; they’re very interesting. >> Alfred Inselberg: This thing? >> Danyel Fisher: I meant certainly this big wad there. >> Alfred Inselberg: Ah, okay, yeah, yeah, I know, okay. That is… >> Danyel Fisher: But I didn’t know... so you’re going to select that, and you’ll then discover that this shows some correlation between something. But there’s also a bunch of stuff that we’ve now eliminated, right? >> Alfred Inselberg: Yes. >> Danyel Fisher: And what’s the thing that I just eliminated? How do I know what’s interesting about this cluster against the rest of the world? >> Alfred Inselberg: Oh, very good, there’s even a query for that. See, you’ve anticipated it. One way to find out is to do the query, then go here and use this Boolean operator, which is the complement. Redo the query, and you will see what you have left, in contrast. By the way, this here is turning out to be nicer than I thought. Here we’re looking at something roughly elliptical. You see ellipses going to roughly hyperbolas. So without even realizing it, you’re discovering new, useful things about, I know. >>: How, in something like this, like what you’ve just highlighted there, the classic movement back and forth between bonds and stocks:
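Danyel's question, what distinguishes the selected cluster from everything that was dropped, is what the complement query answers. This is a minimal sketch. It assumes the selection is a boolean mask over the records, as a query would produce, and that comparing per-variable means is a useful first summary. The helper and the numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def contrast(data, mask, names: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Mean of each variable inside the selection and inside its complement."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    inside, outside = data[mask], data[~mask]       # ~mask is the complement query
    if len(inside) == 0 or len(outside) == 0:
        raise ValueError("selection and complement must both be non-empty")
    return {name: (float(inside[:, j].mean()), float(outside[:, j].mean()))
            for j, name in enumerate(names)}


records = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
summary = contrast(records, [True, True, False, False], ["yield", "index"])
# {'yield': (1.5, 3.5), 'index': (15.0, 35.0)}
```

Means are only a starting point. The same split can feed any other summary, or be drawn in two colors.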
How do you look for causality? >> Alfred Inselberg: Oh, I’m very careful about that. It’s not that I wouldn’t like to have causality. >>: Yeah. [laughter] >> Alfred Inselberg: I’m not sure. Well, first of all, these are fields that I don’t know anything about, so it would be very foolish for me to go and lecture that I found the cause of this. But if we’re talking about specific aspects, maybe I can put in the date and see by the date what is happening. At least I can give a time description of how something evolves. If you’re looking for causality, and you have a bit more time, I want to show you a charming piece of work which I did for the army of a small country in the Middle East whose name I cannot mention. I was invited, and they brought me into a room with no windows. They showed me this and said, we took nine hundred and twelve trucks and divided them into eleven categories (which ones, I didn’t know, and they wouldn’t tell me). We ran them over the same track, put in sound sensors, measured the noise in seven different frequency ranges, and recorded it. Then for each frequency range we applied something called wavelets and got a single number, so that for each particular truck we have something like this. We know the category, we have the information in some kind of file, and this thing here is, shall we say, the noise signature. So they tell me, okay, the top five classes are Russian-made trucks; we want you to find a rule so that we can identify them at a distance from the noise they make. So I looked for the nearest escape hatch; I thought these people were mad. [laughter] There were two burly soldiers at the door, so this was not an option. But I was approached by a soldier, an unusual soldier who happened to have a Ph.D. in Computer Science and a Ph.D. in Physics. Mind you, this was the Talpiot program, a program in Israel, and he says, Professor, I think we can do it.
All of a sudden it was "we," not just me, and he said, just think of it as sonar of the ground. So he had already given me a paradigm pointing a little bit in the right direction. So I said, okay, let’s start looking. I start looking, and remember, it’s visualization; we’re looking for patterns. I go up and down, and I see that whatever V4 is, there seem to be two populations: those that are dense here and those that are more sparse. In classification, if one can identify different populations, it’s better to classify them separately and then take the union of the rules. That gives you a more accurate rule, and maybe a better understanding. So I choose these, and this special soldier looked them up in the file and says, that’s amazing, because all these trucks have automatic gears, and only these trucks. So I said, oh, that’s interesting, and the sparse ones, somewhere in there, have manual gears. I remember the days I worked on a farm driving tractors. Tractors have manual gears, and it doesn’t take very long before, from a distance, you can tell the blue tractor from the red tractor and the green one. They all sounded different; manual gears, you might say, have more of a personality. So I said, okay, wow, let’s go back and try to classify these. I’ll say it: if it doesn’t work, it doesn’t work. So this is the input to the classifier, and here’s the rule; we have it explicitly, so let’s look at it. It already tells us that these are the four most important variables, in that order. If you want to do real-time identification, the fewer variables you have to measure, the faster you can do it. It also knows how to concentrate information. This is what we want, and it suggests there might even be three populations, which is already valuable. So we have a rule. Let’s see if it is an accurate rule: again we train and test, with a two-thirds split.
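The train-and-test check mentioned here is easy to sketch, though the classifier below is a deliberately naive stand-in and not the one used in the talk. This is a minimal sketch. It assumes the rule for a target class is the per-variable [min, max] box of that class in the training data, with two thirds of the records used for training. The synthetic two-variable data, the seed, and the helper names are hypothetical. The `predict_union` helper reflects the idea of fitting separate rules to separate populations and taking their union.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[np.ndarray, np.ndarray]


def train_test_split(n: int, train_frac: float = 2 / 3, seed: int = 0):
    """Random split of record indices; train_frac of them go to training."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def fit_box_rule(X: np.ndarray, y: np.ndarray, target: int) -> Box:
    """Naive rule: the bounding box of the target class in the training data."""
    members = X[y == target]
    return members.min(axis=0), members.max(axis=0)


def predict_box_rule(X: np.ndarray, box: Box) -> np.ndarray:
    lo, hi = box
    return np.all((X >= lo) & (X <= hi), axis=1)


def predict_union(X: np.ndarray, boxes: List[Box]) -> np.ndarray:
    """Union of rules fitted to separate populations of the same class."""
    return np.any([predict_box_rule(X, b) for b in boxes], axis=0)


def error_rate(X: np.ndarray, y: np.ndarray, target: int, box: Box) -> float:
    return float(np.mean(predict_box_rule(X, box) != (y == target)))


# Synthetic data: class 1 on a grid in [0, 1]^2, class 0 shifted to [2, 3]^2.
grid = np.array([[u, v] for u in np.linspace(0, 1, 6) for v in np.linspace(0, 1, 6)])
X = np.vstack([grid, grid + 2.0])
y = np.array([1] * len(grid) + [0] * len(grid))

train, test = train_test_split(len(X))
box = fit_box_rule(X[train], y[train], target=1)
test_error = error_rate(X[test], y[test], 1, box)
```

On well-separated synthetic data like this, the box makes no errors on the training set. Real signatures such as the truck noise data would need a far better classifier.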
It really is quite an accurate rule, with about four percent error. However, this is tricky business. If you’re going to start shooting (I didn’t say that, I said if) [laughter] you should... I kept some information. So the question came up: is there any kind of information we can get on the misclassified ones, the ones the rule doesn’t get? It isn’t always possible to do that, but here it turns out that it is. So I go back and use a rougher classifier; you will see why in a moment. Here is the rule that we get. And where is the previous one? The previous rule is this, no, sorry, where is it? >> Danyel Fisher: I think you lost it. >> Alfred Inselberg: Well, we lost it. It was more complicated; it doesn’t matter. Anyway, this is a simpler rule. But now we go over to the picture and include all those that we want: the top five, and some elements from category two, and only category two. So the misclassified ones come from here. My friend looked them up and absolutely cracked up. He says these are American-made trucks that the Russians used to copy, hence the similarity. So this was one of those rare cases where you could go deeper and make some very interesting findings, but it doesn’t always happen. By the way, I wish I could do more about that; then we could discuss it. Because people ask you: okay, you found that; now can you tell me what happened in the past? Can you make a forecast, and how good is it? And all kinds of interesting, very important questions, which are not just data analysis questions; there’s an element of philosophy here. Do you really believe that what happened before is most likely to happen again, and so on? Ask the people who lost a lot of money in two thousand eight, when this country went through a catastrophe, call it what you like. I know that this is not a satisfactory answer, but I tried. >> Danyel Fisher: No, no, it’s great. >> Alfred Inselberg: It is?
>> Danyel Fisher: Yeah. >> Alfred Inselberg: Chocolate. [laughter] >>: The [inaudible]; again, I work in market research on [inaudible] rather than… >> Alfred Inselberg: Ah, yes, yes. >>: But I get that sort of question all the time. So I say, well, it’s inherent: you can pick what your causal variables are and look at their relationship to the other variables in the dataset. But again, the data doesn’t tell you that. >> Alfred Inselberg: Yes, you said something which I should have said, and you said it more eloquently. For me, a good data exploration is supposed to craft a story. You go back and say, this is what I found, and this is what I think it means. Now, it may not be true, but you’re doing your level best to give it structure. We have four more minutes. One of the other interesting stories, and I have several, is from when I was looking at some data from a large oil company. There were different prices of crude (I had no idea how many crudes there were), products, alternative energies, amounts in storage, distillery utilization. The fascinating thing is that you get into fields which, as a mathematician, you never dreamt of. The man I was dealing with, a very nice man, said, believe it or not, we make much more money trading oil than looking for oil. Which says something. So they wanted me to find some heuristics for the stock market. It was a very interesting question, and I was working with an intelligent person who was helpful. It’s not surprising that the data was seasonal, given the seasonal demand for oil. For all those years, something like ten years of data, the maximum amount of products in storage, typically unleaded gasoline, came at the end of May. Americans were getting ready for their summer driving and what have you. This was true every year except one: in nineteen ninety, the maximum for the year came in February, and it was bigger than the usual maximum.
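Spotting the odd year amounts to finding when storage peaks in each year and flagging years whose peak month differs from the usual one. This is a minimal sketch. It assumes monthly storage figures keyed by year; the talk's data arrived weekly. The numbers below are synthetic, built only to mimic the pattern described, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def peak_months(storage: Dict[int, List[float]]) -> Dict[int, int]:
    """Month (1-12) in which storage peaks, for each year."""
    return {year: max(range(12), key=lambda i: vals[i]) + 1
            for year, vals in storage.items()}


def anomalous_years(storage: Dict[int, List[float]]) -> Tuple[int, List[int]]:
    """Most common peak month, and the years that peak in a different month."""
    peaks = peak_months(storage)
    usual, _ = Counter(peaks.values()).most_common(1)[0]
    return usual, sorted(y for y, m in peaks.items() if m != usual)


# Synthetic seasonal profile peaking in May (month 5).
profile = [50, 60, 70, 80, 100, 90, 80, 70, 60, 55, 50, 45]
storage = {year: list(profile) for year in range(1985, 1992)}
storage[1990][1] = 120   # an unusual February build-up
```

Here `anomalous_years(storage)` returns the usual peak month, May, together with the single year that breaks the pattern. Explaining why that year is odd is the part the data cannot do on its own.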
So of course I called them up and said, there’s a mistake in the data. They said, no, no, it’s all right like that. I was going crazy: what the hell is going on? I checked it; it involved billions of dollars, so it was not some small purchase. Speaking about stories: I live close to the desert, sometimes even in the desert, and I started thinking. I said, wait a minute, this great philanthropist Saddam Hussein went into Kuwait in August of nineteen ninety. Now, you don’t just walk in; you have to prepare. So you send material in trucks and what have you. If you know the desert, you know that when you drive a truck there, it takes about three days for the dust to settle. There’s a lot of information just from looking around. And of course in the desert there are a lot of Bedouins. So I surmised that these guys told the Bedouins, if you see anything unusual, tell me, and here’s some money. Clearly the Bedouins saw a lot of stuff moving south toward Kuwait. It was perhaps not such a big leap to think that something was cooking. So these companies bought oil like crazy at about fifteen dollars a barrel and sold it at about seventy-five dollars a barrel in November, after he had gone in. They really knew what they were doing. Just to reinforce my speculation, I was able to trace it, because I was getting the data every week or so. I could see that an enormous amount of jet fuel was being produced, so they were preparing for war. I gave them the results, which they liked, and I also said, well, there seems to be this peculiarity about nineteen ninety. There were some very upset faces. So I left with the remark that I hope the CIA and some other agencies are as well informed as you guys. So that’s something about causality; I was trying my level best to connect. Okay, sorry. >>: [inaudible] >> Alfred Inselberg: No, no, yeah. >>: So that seems very good, like the causality. Could you find linear correlations?
>> Alfred Inselberg: No, no, no, why do you say linear? How could you possibly? >>: [inaudible] like sections of lines? >> Alfred Inselberg: No, no, no, because the thing I showed you about the cocoa commodity market was highly nonlinear. Okay, go ahead, I interrupted. >>: So say I have an explanation of [indiscernible] between the lines. I just want to see how distinctive it is from the other stuff. How can I spot it, you know? >> Alfred Inselberg: Yeah. >>: [indiscernible] >> Alfred Inselberg: You look for patterns. Look, look at this; you cannot spot this? If you cannot spot this, I would suggest you visit your optometrist. >>: [inaudible]. I recently had to choose these two particular [indiscernible]. >>: [inaudible] >> Alfred Inselberg: No, no, okay. >> Danyel Fisher: Folks, sorry to cut you off. >> Alfred Inselberg: Okay, we’ll continue… >> Danyel Fisher: Let’s thank Professor Inselberg once more. [applause] >> Alfred Inselberg: Thanks. >> Danyel Fisher: I encourage this conversation…
