>> Feng Zhao: So I'm Feng Zhao. It is my pleasure to introduce Professor Rich Baraniuk from Rice University. Quite a number of us have known Rich for quite some time, and Rich has visited MS before, a couple of years back. So this is your third visit. Rich has been working in the areas of signal processing and in particular more recently on compressive sensing, and he actually organized and edited a special issue of the Signal Processing Magazine on the topic of compressed sensing, which I think is a great resource. It was March '08. And today Rich is going to give a talk on compressed sensing, various computational experiments, and the test beds that he has been building. So with that, Rich. >> Richard Baraniuk: Okay. Thanks. Thanks very much for the invite. It's great to be here. So my job is to talk a little bit about this field of compressive sensing, compressed sensing, compressive sampling. It goes by a lot of different names. And what I'm going to do is talk about the motivation for why we're doing compressive sensing, talk a little bit about some of the mathematics behind it, and then talk about some of the real-world experiments people are doing that show that this brand new theory really looks like it has some real potential for applications. So I hope that's okay with everybody. And I'd also like to keep it interactive. So if you have questions, just interrupt me as I go along. And this of course is work that is done by a number of different people around the world. I'm going to talk about not just work from our group but also work from a number of different groups around the world. But none of this would have been possible from our group without the contribution of all these current students and alums. Okay. So sensing. 
So it's hard to imagine doing anything in science and engineering without somehow turning the physical world into data that we can process, that we can analyze, that we can try to feed back into that physical world. And in case you haven't noticed, there's been a digitizing revolution in sensors: we're pushing the analog-to-digital interface closer and closer to the front end of a whole host of different types of sensing systems. That started with, for example, digital audio -- things like analog-to-digital converters for audio systems, for cell phones -- moved into the digital still camera realm, where we sample light reflected from objects, and moved into digital video and into an increasing array of digital imaging modalities. And the reasons for doing this are clear, right, because of the tremendous advantages of digital data processing: reprogrammability, easy storage. There are just tremendously strong arguments for why we want to do digital data acquisition. But as a digital data acquisition and processing person, it makes me a little bit nervous, right, and I think Rico will agree, this puts us under a lot of pressure. Because digital sensing and processing has been so successful, everybody wants to do it, and we're pushing that interface between the analog world and the digital world closer and closer to the front end of increasingly wideband sensing systems. So just because we build an A-to-D converter we're not happy, right? We're constantly interested in pushing toward denser and denser sampling. Why? Because we'd like to have higher resolution in our sensors. We're also increasingly not interested in just having a single sensor; we want to network the sensors -- for example, sensor nets that cover a broad geographical area, or distributed camera networks that are not just reconstructing pictures of an environment but actual 3D reconstructions of an environment. 
And we're not just interested anymore in just audio signals or just visible wavelengths of light; we're interested in sensing extremely broad bandwidths of electromagnetic radiation, for example all the way from the infrared and terahertz up into the gamma rays, even in the imaging world. And when you multiply all this together, right, all these growing demands, it's not hard to see that you end up with an absolute deluge of data that really presents two big challenges -- or a number of challenges, but two really big ones. The first is: how are we going to acquire all of this digitized information? That creates a real hardware challenge of building new analog-to-digital interfaces. The second, though, is: even if you could acquire all of this data, how are you going to process or store it efficiently? So it becomes a very important data fusion and processing problem. And I think the important thing that's going to come out of this talk is that this area of compressive sensing is really trying to address both of these problems simultaneously -- how to acquire big data volumes and then how to process them really efficiently. And I'm going to try to bring up both of these during this talk. So just to see how we got ourselves into this mess, right, let's do a really brief, brief review. And let me get a show of hands. Nyquist sampling theorem, Shannon sampling, show of hands, people familiar with that? Okay. So this is paradigm -- this is dogma, right, in the signal processing world? The dogma is right here. If you sample densely enough, namely at twice the highest frequency in your analog signal, then you can hope to perfectly reconstruct the original analog data, right? That's the theorem; the Shannon/Nyquist theorem goes back even further than that. 
And this is the reason why in your cell phone we sample your voice signal uniformly at eight kilohertz, for example, or in a CD digital audio system we might sample uniformly at 48 kilohertz or 44.1 kilohertz. This is why in your digital camera, when the light comes through the lens it's focused on a CCD array or a CMOS imaging array that again samples uniformly, where the sample spacing is very, very closely related to the Nyquist bandwidth of your signal. Everybody with me? Like I said, feel free to interrupt me. So let's just look at how this actually plays out in an actual application and why this might be an issue, and let's think of a digital camera example, because everybody has a digital camera. Does everybody have a digital camera? Okay. So let's think in particular about a cell phone camera, because that is a great example. So I want to take a picture of Nicolaus, so here is Nicolaus, and what do I do? I bounce some light off Nicolaus and I focus that light through a lens onto a sampling array -- this is actually a color Bayer sampling grid -- and we acquire N samples of the light. This is going to be the notation for the entire talk: N samples. So how big is N? Megapixels, right, 10 megapixels. Kodak just released a 50 megapixel camera. Within weeks another company announced a 60.1 megapixel camera. So there's this arms race of who can have the most megapixels. So millions and millions and millions. So step one is: get a lot of data. Step two is to realize there's too much data. And so what we typically do is compress, right? No one stores TIFF images on their cell phone camera or tries to transmit them over the wireless network, right? We always compress with an algorithm like, for example, JPEG or JPEG 2000, and what we do is we take these N pieces of data and we crunch them down in some way. 
We're going to talk about that a lot. We crunch them down to just K pieces of data, just K numbers, where K is much, much, much smaller than N. And when K is small enough for the application of interest, we can either store the picture on our memory chip -- and store lots of pictures -- or we can transmit it over the ether, right, so that our friends can look at it. So this is really, to use the term, the paradigm of a lot of data acquisition: you sample and then you compress. So what is behind the idea of transform-domain compression? There's been a lot of work done here at MSR about this. But the fundamental idea is two closely related concepts called sparsity and compressibility, which we're going to talk about a lot during this talk. So here's a picture of Nicolaus. It's N pixels. Remember, millions of pixels. The key idea is that I can re-represent the information in this picture using a transform -- Fourier transform, wavelet transform, DCT. Show of hands, people? Comfortable with this? Okay. So I can re-represent, and it's an orthogonal transform, so it's information preserving. I have N numbers over here and I have N numbers over here. So there's no information lost going from this domain to, in this case, the wavelet domain. But what we've done in this re-representation is we've chosen just the right basis so that the coefficients in this basis are what's called sparse. What do I mean by sparse? Here I've chosen a color map where blue means a coefficient is zero, and you can see what we've done. We've taken this picture of Nicolaus and re-represented it in terms of this wavelet representation where in fact most of the wavelet coefficients are either zero or very small. There are just a very few -- in particular, K coefficients -- that are large. And in a wavelet transform, it turns out that they're the coefficients that trace out the edges in the image. 
You can see, for example, the eyebrows and the nose of Nicolaus here. Right? So we've taken a representation -- it's just a rotation in N-dimensional space, totally information preserving -- but we have this new property that most of the coefficients are small. And really what goes on in JPEG or JPEG 2000 is simply re-representing in terms of a basis where we know our data is sparse, encoding only the positions and values of the large coefficients, and throwing away all the small coefficients. That's really how JPEG and JPEG 2000 work. And you might think that this is something very specific to image data, but in fact I would argue that this idea of sparsity or compressibility is really a generic property of a whole host of man-made and natural signals. So here's an example of a sonar waveform that's actually yelled out by a large brown bat. Okay? It's an echolocation chirp, a sonar chirp. And this signal has a very large time-bandwidth product. What I mean by that is it's spread out all over in time, and if you look at its Fourier transform it's also spread out over a very broad Fourier bandwidth. So you need to sample at a very, very high frequency in order not to alias the signal. But on the other hand, if instead of looking at this signal in the time domain or the frequency domain you look at it in the time-frequency domain, using either a Gabor transform or a wavelet transform, some kind of a filter bank, a lapped transform, then you see there's again this kind of sparse structure -- in fact this signal has a very simple explanation, that it's just harmonic: at any given point in time there are just three frequencies that are active in this particular signal. And you could think of this representation much like a musical score; this signal just sounds like (inaudible), right? So very, very sparse again. And if you wanted to compress this, why would you do it in the time domain or the frequency domain? 
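To make the keep-the-K-largest idea concrete, here is a minimal sketch in Python (NumPy/SciPy). It is not from the talk: the toy signal is built from two DCT basis vectors so its transform is exactly 2-sparse, whereas real images are only approximately sparse, and all sizes are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

N = 256
t = np.arange(N)

# Toy signal: a combination of two orthonormal DCT-II basis vectors,
# so its DCT is exactly 2-sparse (a stand-in for wavelet sparsity).
k1, k2 = 3, 7
x = (np.cos(np.pi * (2 * t + 1) * k1 / (2 * N))
     + 0.5 * np.cos(np.pi * (2 * t + 1) * k2 / (2 * N)))

# Orthonormal DCT: an information-preserving rotation of R^N.
c = dct(x, norm='ortho')

# Transform-domain compression: keep only the K largest-magnitude
# coefficients, zero the rest.
K = 2
keep = np.argsort(np.abs(c))[::-1][:K]
c_k = np.zeros(N)
c_k[keep] = c[keep]

# Reconstruct from just K of the N numbers: essentially perfect here.
x_hat = idct(c_k, norm='ortho')
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

For a real image the kept-coefficient error would be small but nonzero; that gap is exactly the "compressible rather than sparse" distinction discussed later in the talk.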
Clearly you'd do it in this time-frequency domain. So really this is what goes on: we sample our data and then we compress using a representation like wavelets or DCT, et cetera. So you might ask yourself, is there anything wrong with this system? Because this is really what's inside a huge, huge number of technologies out there. And I would argue that there are really two issues here. The first is an inefficiency argument. Why do we go to all the work to acquire N pieces of data when we're going to throw out all but K pieces of data? Right? So there's this megapixel push to have hundreds and hundreds of megapixels, and we're going to compress down to much, much, much smaller files. That seems inefficient. Why? It puts a lot of pressure on building this sampling array. The second is a bit more subtle, a bit more philosophical, and it's a theoretical mismatch between the world of sampling, which is a completely linear world -- let's just go through that really quickly. The set of signals that can be sampled without aliasing is a band-limited subspace, a linear subspace. The actual sampling process is linear, and sinc interpolation reconstruction is linear. Everything to do with sampling is really linear. Whereas, other than the transform -- the re-representation in terms of, say, wavelets -- everything to do with compression is nonlinear: keeping the big coefficients, throwing away the small ones. That's a highly, highly nonlinear process. And not only that, as we will see, the signals that are compressible form extraordinarily nonlinear sets that are absolutely not like band-limited subspaces. Okay? So really I would say there's a theoretical mismatch, and why are we jamming together sampling and compression like this? 
So really this idea of compressive sensing is to try to bring these together into one block that goes directly from analog information over here and creates measurements over here -- they will be linear measurements, and we're going to take M of them. So instead of sampling to N and compressing to K, we're going to try to go directly to M measurements, and we're going to try to push M as close as possible to K, where we take K as -- think of it like the intrinsic information rate of our source, right; you can't go below K. We're going to try to push as close as we can to K. And that's really the idea. And then there's going to be a reconstruction process where we're going to actually be able to tease the image, or data, out of our measurements. So that's really the idea that I'm going to talk about. Any questions at this point? That's a long intro, but that's the basic idea. So what I'd like to do -- and I've cleared with Feng to go to 11:30, right, but I'm not supposed to go past 11:30 -- let's talk for 10 or 15 minutes about some of the mathematics behind this, and let me just ask, a show of hands, who has seen some of the math behind compressed sensing? Anybody out there? Maybe a few of you? So hopefully this will be like a review for people who have seen it, and for people who haven't, hopefully this will make some sense as we go along. Okay. So let's talk about sampling first, standard stuff. And let's think about sampling a signal that is itself sparse. Okay? What do I mean by that? We're going to say that our signal X is K-sparse, which means -- think of X now like a vector; well, it is a vector, of length N, so it's very, very long -- and it's sparse, meaning all of its entries are zero except K entries that are nonzero. So in this case K would be 3, because there are only 3 colored blocks. I apologize that I reversed the color map. Remember before blue was zero? Sorry. 
Now white is zero. But I hope everybody gets the idea. So this would be a signal that is all zero except for three spikes, right? If this were a picture that I had vectorized, it would be a picture of a starry sky at night where everything is black and I see three stars. So these are very, very, very simple signals. Everything I'm going to talk about for the next ten minutes is with these kinds of signals, and then I'll explain how this extends very easily to signals that are compressible in DCT or wavelets. It's very, very straightforward. Okay. So this is our signal model, and very roughly speaking, oversimplifying, you can think of -- oh, I lost a slide. I can't believe this. You can think of sampling as essentially multiplying this vector by an identity matrix, okay, where you basically apply an N-dimensional operator to the signal to acquire samples -- which, if the operator is an identity matrix, will be exactly the entries of the vector X here -- and then you can think of sinc interpolation as the trivial undoing of the identity operation. Okay? I don't have to go into this in very much detail, so I think I'll just skip over it. But we're not going to do that. Okay. What we're going to do instead is a dimensionality reduction. Okay? So we're not going to talk about an N-dimensional operator operating on our vector X; we're going to operate on X with a set of linear measurements -- basically a matrix operation with M rows and N columns. Okay? So we're going to talk about sensing by dimensionality reduction, right? Instead of going from N dimensions to N dimensions, we're going to go from N dimensions down to just M dimensions. Remember, M is my number of measurements. Okay. So let me just step back and go through this very, very carefully. I had X. X is of length N. It's very sparse. 
I'm going to take linear measurements of that sparse X, and my measurements will just be equivalent to a matrix multiplication, because they're linear measurements, where each measurement y_i is just the inner product of the ith row of phi with the entire vector X. So it's a linear measurement system. But the number of measurements M will be much, much less than the dimension N of the vector X. That's a dimensionality reduction. Is everybody with me? And the game of course is going to be to try to push the number of measurements here as close to K as possible. And what we'll see is that you can get very, very close. Is everybody with me? Okay. So let's just talk about this. This is problematic, I would say, from a mathematical point of view, because this matrix -- everybody put on their linear algebra hat -- is a non-full-rank matrix, so it loses information. What do I mean by that? I mean that for any given Y there are infinitely many Xs that, when multiplied by phi, give the exact same Y. That's the whole point of a null space, right? So obviously I could have a picture of Feng and a picture of Rico, and I could sense both of them with the same sensing system and get the same measurements. How am I ever going to hope to tease those apart and figure out whether it was a picture of Feng or a picture of Rico? Is everybody with me on that? Yeah. So that's a problem. Okay. But the really, really cool thing is we're not interested in any old X, okay; we're not interested in arbitrary pictures. The picture of Feng is compressible in DCT. The picture of Rico is compressible in DCT. So now let's think of what happens if we have a sparse vector X. Well, in fact this matrix structure is actually much, much, much nicer, because think of what happens when you multiply this matrix by a vector that has only three nonzero entries. Basically this multiplication is just going to select three columns from this phi matrix. 
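The measurement model just described -- each y_i an inner product of a row of phi with x -- can be sketched in a few lines of NumPy. The sizes N, K, M and the seed here are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 1000, 3, 40   # ambient dimension, sparsity, number of measurements

# K-sparse x: all zeros except K nonzero entries ("three stars").
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

# Random Gaussian measurement matrix phi, M rows by N columns.
phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Each y_i is the inner product of the i-th row of phi with x; together
# this is a dimensionality reduction from N = 1000 numbers to M = 40.
y = phi @ x

# y only "sees" the K columns of phi on the support of x.
assert np.allclose(y, phi[:, support] @ x[support])
print(y.shape)
```

The final assertion is exactly the "selecting three columns" picture: multiplying by a 3-sparse vector combines just three columns of phi.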
Does everybody see that? Y is just a combination of three columns. So if I have sparse data, in fact my matrix doesn't look wide and short at all; my matrix essentially looks very tall and skinny, which is a very nice matrix. Right? So we have what looks like a very bad matrix, but when our data is sparse, it's actually a very nice matrix, as long as M is bigger than K. So there's hope, right? There is hope. Okay? So what can we do? Here's going to be our design goal. So now we're actually going to talk about designing a sensing matrix. Our design goal is going to be to design a phi so that if I select K columns arbitrarily from this matrix, the resulting matrix has full rank. Because that is a matrix that's going to preserve information in sparse vectors. I'll give people a second to see this. Does that make sense to everybody? Okay. Hopefully. In particular, though, we're going to ask for even a little bit more, okay, just to add to the complexity here. Let's think of two sparse signals: picture of Feng, picture of Rico. The thing I don't want to happen is for the measurements from my system to map to the same Y, right? So if Feng is X1 and Rico is X2 and they're a certain distance apart, I want to make sure that when I measure them, their measurements stay far apart. Does that make sense? Okay? Well, now let me think about what the distance between Rico and Feng is. If they're both K-sparse, then what is X1 minus X2? It's just 2K-sparse in general. Right? So in fact -- hopefully this makes sense -- if instead of asking about K columns I can ask for a matrix phi such that if I draw 2K columns out arbitrarily I get a nice full-rank matrix, then I can in fact preserve the distances between all K-sparse signals. 
Which means if you are all K-sparse pictures, right, and I take the same measurement matrix phi times all of you, and you all have distances between each of you, all those distances will be preserved -- which means we can preserve the structure of the set of sparse signals. Okay? Yes, question. >>: (Inaudible) thinking about matrices that you (inaudible) two elements that bring those (inaudible) very likely to be -- >> Richard Baraniuk: So in fact, you were coming to the punch line of this whole buildup, right -- so hold that thought for about one minute, okay. Let me just ask if there are any questions at this point. Okay. So this is called the restricted isometry property of order 2K, which just means that if you restrict the matrix it looks like an isometry. And this looks like something that can be done. The unfortunate thing is that this is actually an NP-complete design problem. To design such a matrix phi is extraordinarily hard, basically combinatorial. So we were just really excited for a minute, right -- I sensed it in some of you -- and now we're sort of depressed, right, because this is a problem. And what Rico offered, because he knows the Russian mathematical literature, was that there was a result due to Kashin and Gluskin (phonetic) from the 1980s that he's aware of: while it's an NP-complete design problem to design such a matrix phi, if you just throw down a matrix at random -- for example a random Gaussian matrix or a random plus-minus-one matrix -- you can actually show that with high probability it will have exactly the property I just talked about; namely, if you select 2K columns out of this matrix arbitrarily, it will basically have eigenvalue control. It will have full rank. Okay. 
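The claim that a random Gaussian matrix has this 2K-column property can be checked empirically. This sketch (sizes and seed are illustrative, and the printed deviation is just what these random trials happen to produce, not a theorem) draws random subsets of 2K columns and checks that each submatrix is full rank, with singular values clustered near 1 -- the "eigenvalue control" just mentioned:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 400, 5, 60
phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Empirical check: submatrices of 2K randomly chosen columns are full
# rank, with all singular values near 1 (an approximate isometry on
# differences of K-sparse signals).
worst = 0.0
for _ in range(200):
    cols = rng.choice(N, size=2 * K, replace=False)
    s = np.linalg.svd(phi[:, cols], compute_uv=False)
    assert s.min() > 0.0                     # full rank
    worst = max(worst, abs(s.max() - 1.0), abs(1.0 - s.min()))
print(f"worst singular-value deviation from 1: {worst:.2f}")
```

Note this samples 200 random subsets rather than all N-choose-2K of them; checking every subset is exactly the combinatorial explosion the talk is pointing at.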
Which is bizarre to a lot of people who are used to sensing systems that are predictable and deterministic, because what this is actually suggesting we do is randomize data acquisition, right? This is suggesting that rather than sampling signals uniformly and compressing them using something like JPEG or JPEG 2000, we're proposing to just sense, where the entries of our measurement vector are just random linear combinations of the vector X that we're interested in. Which is kind of bizarre. But as we'll see, quite, quite useful. Any -- yeah? >>: You're saying that, you know, for a large enough dimension space, if I choose K much less than N directions at random, with high probability (inaudible). >> Richard Baraniuk: Yeah. It's nothing other than that. There's nothing miraculous at this point. >>: (Inaudible) tell us this -- >> Richard Baraniuk: No. Well, in fact there's a whole -- and actually this is a good point to talk about some of the links. First of all, if you think of this from, for example, a frame theory approach or just a Gaussian random vector approach, this should be kind of obvious. That's one. Two, if you're from the Johnson-Lindenstrauss world of pattern recognition and machine learning, this is also Johnson-Lindenstrauss type theory. And third, there are people who have looked at a large class of different kinds of distributions, not just Gaussian vectors, and shown in the random matrix literature that these kinds of matrices will also have this property. So I'm not trying to say this is anything new. This is known, and this is just step one of the whole process. Yeah? But good point. >>: (Inaudible). >> Richard Baraniuk: Well, I'll leave it to you to figure that one out. And -- yeah? >>: So far you're kind of (inaudible) that you don't know which ones are -- it's sparse but you don't know which -- >> Richard Baraniuk: Absolutely. You know it's a sparse vector. You don't know where the big values are. 
If you knew where the big values were, this would be super easy: K measurements, you're done, right? But the important thing here is that what we're doing somehow in these measurements Y is encoding both the positions and the values of those entries. Both the positions and the values. Yeah? Other questions? Yeah. >>: One more issue seems to be that these measurements are real numbers. >> Richard Baraniuk: Yes. >>: Then you're going to end up quantizing. >> Richard Baraniuk: Absolutely. Absolutely. Everything I'm going to talk about through this entire talk is real-valued theory. But this is an area of a lot of research right now: first of all, how robust is this to quantization and why; and then, even better, can we design systems specifically knowing that we're going to be quantizing Y. But this is an area of active research right now. It's still very open. Other questions? Okay. So this is just part one: information preservation. And the term for people in the machine learning literature is that this is an embedding, right; we're just embedding the structure of these high, N-dimensional sparse signals into (inaudible). So now let's talk about recovery, which is why these dudes got famous, right? They didn't get famous for the forward problem; they got famous for the recovery problem. Because what we're really interested in is taking X, sensing it randomly, and then taking our measurements and getting X back. And of course this is again a difficult problem, because it's an ill-posed inverse problem. The matrix phi is not full rank, so again there are infinitely many Xs that, when multiplied by phi, give Y. So we're going to solve this ill-posed inverse problem by exploiting the geometry, the high-dimensional geometry, of sparse signals, because in general this is a very, very ill-posed problem. 
But we're going to exploit the geometrical structure of sparse signals in order to solve this inverse problem. Okay. So let's talk a little bit, very briefly, about the geometry of sparse signals. What is a sparse signal? A sparse signal is a signal where only K out of N coordinates are nonzero. So now I want everybody to put on their N-dimensional thinking cap, right, go to N-dimensional space -- close your eyes if it helps -- and here we are in N-dimensional space. And we might ask, what does a sparse X look like? Well, I can tell you that the vector X has all of its coordinates zero except for just K of them that are nonzero. So does anybody have a sense of what the set of all K-sparse signals in N-dimensional space looks like? Anybody have a sense? Any feeling? It's a very, very bizarre set, and I'll just draw you a picture. It's actually a union of hyperplanes. This is in three dimensions: signals that are K = 2 sparse in 3D. Basically, all the signals live along a hyperplane of dimension K that is aligned with the coordinate axes. So it's a very strange, strange set. Is this a band-limited subspace? Does this look anything like a subspace? It's a union of subspaces, right? N choose K of them. Many, many, many subspaces. Okay, so it's a very, very nonlinear set. Okay. So this is what sparse signals are. You can think of these signals as follows: if I take my vector X and sort its coefficients from biggest to smallest, there will be K large ones and then at index K they go to zero. Compressible is the generalization of sparse where the coefficients don't actually ever go to zero; they just decay very rapidly, for example like a power law. And really, when you look at pictures like the picture of Nicolaus, there are actually no truly zero wavelet coefficients in a real image. They're just getting very, very, very, very small, okay, with a power law. Very, very quickly. 
So now you can ask, what does the set of all power-law-decaying signals look like in N-dimensional space? Okay? And if you think about that, it's basically the set of signals that live in an Lp ball. This is just like the L2 energy ball, except we take the coordinates x_i to the pth power and we ask whether that sum is less than 1, for example; that's the unit Lp ball. But the key thing is we're interested in powers p less than 1: non-convex Lp balls. These pincushiony things that you all studied in class back in grad school -- but you really focused on L1 and L2 and L-infinity and just talked about the fact that these exist; they're not convex balls, right? These are really the set of compressible signals. Again, does this look anything like a band-limited subspace? It's an incredibly sea-urchiny, pointy kind of signal set. So I've kind of gone into too much detail here. But what is the one property that you can pull out of this intuition, the critical property about the geometry of compressible or sparse signals? The key property is that sparse or compressible Xs live close to the coordinate axes. You see that? See that? Because you're on a hyperplane aligned with the coordinate axes here, you have to live somehow close to the axes. Because you're on this pincushion here, you have to live close to the coordinate axes. You don't live out away from the axes. So now let's see how we can use this to solve this recovery problem. And now I'll speed up a little bit; this is actually a bit easier. So what are we interested in doing? We have an ill-posed inverse problem: we have a signal X that's sparse, we've multiplied it by a random matrix, and we've got our measurements Y. Now our job, given Y, is to get X back. So what do we do? What's the geometry of the setup? Well, phi is a matrix that has a null space. So you can look at all of the Xs, the infinitely many Xs, that when multiplied by phi give Y. 
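A compressible signal in the sense just described can be sketched numerically: a coefficient vector whose sorted magnitudes decay like a power law, where keeping the K largest coefficients gives a rapidly shrinking error. The exponent p = 1.5 and the sizes here are arbitrary illustrative choices:

```python
import numpy as np

# Compressible (not exactly sparse) coefficient vector: sorted
# magnitudes decay like a power law, |c|_(i) ~ i^(-p).
N, p = 1000, 1.5
c = np.arange(1, N + 1) ** (-p)

# Best K-term approximation keeps the K largest coefficients; its
# relative L2 error falls off quickly as K grows.
for K in (10, 50, 200):
    err = np.linalg.norm(c[K:]) / np.linalg.norm(c)
    print(K, f"{err:.4f}")
```

This is the quantitative content of "no truly zero wavelet coefficients, just very small ones": the tail never vanishes, but its energy becomes negligible well before K reaches N.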
They just form an N minus M dimensional hyperplane in this N-dimensional space, translated over to the true sparse solution X. And what's the critical point? This is a random matrix, so the hyperplane is at a random angle. There's this randomly angled N minus M dimensional hyperplane in high dimensions. Our job in solving the inverse problem is to find the right X out of these infinitely many possible Xs. So what is the right one? Right? You have to have some criterion. And the standard one is least squares. That's what we use all the time. So we might solve the least squares problem: we're given Y equals phi X, I want to find X, so I search over all X such that phi X equals Y, and I look for the one with the minimum energy. That's what least squares would do. Okay? And this is nice. It's fast, it's closed form: you just apply the pseudo-inverse. Sounds good. But it's actually the worst thing you can do. Okay? In fact, it provably, with high probability, always gets the wrong answer. So here's an example: a sparse X, multiplied by a Gaussian random matrix; solve the least squares problem by pseudo-inverse, and this is the result. Okay? Terrible. Okay? Why? It's all Gibbs phenomena, basically, right. But when you really ask, what is the L2 energy here and here? This has much smaller energy than this signal. So the problem is not with least squares; the problem is that least squares says nothing about the sparsity of our data vector, it only talks about the energy in our data vector. Okay? And you can actually look at the geometry of why this happens. Basically, when you solve a least squares problem, you're just blowing up an L2 ball, which is just a big hypersphere, until it touches this hyperplane. Think of the randomly oriented hyperplane; blow up an L2 ball. Where is it going to touch? Is it ever going to touch along a coordinate axis? No. Right? 
With high probability it's going to touch somewhere out away from the coordinate axes because of the shape of the L2 ball. So that's why L2 is bad. Yes? >>: (Inaudible) but that drawing, which is pretty good, shows that if you try to ask a different question. >> Richard Baraniuk: Right. >>: Which point in this whole hyperplane is very close to the axis, right? >> Richard Baraniuk: Yes. >>: You're going to find (inaudible), right? Because the hyperplane will cross the axis in several points. >> Richard Baraniuk: Yes. So let's hold that thought for a minute, okay, because great point. So this is a problem. This is not good. So what you might ask is the question that Rico just suggested, which is: replace the L2 norm, search through the null space and look for the sparsest X. That's clear, because we want a sparse X, not a small-energy X. And you can formalize this by saying it's like looking for the signal with the smallest L0 norm. This is called L0 optimization. And in fact, this will work. And in fact, counter to what Rico said -- basically this is like looking where that set of K dimensional coordinate hyperplanes intersects with that big hyperplane, and you might think that there are many, many possible solutions -- in fact you can prove that as long as you have M equals 2K measurements you will find a unique optimizer, okay, a unique optimizer. And not only that, there is no sparse solution close to that solution. So it's a great answer. Okay? Really, really great answer. So this is exciting. This is really good. The problem is it's an NP-complete problem to solve this optimization, because you truly have to search all N choose K hyperplanes, see where they cut through that big green hyperplane, and that's an NP-complete problem. So again, we're just getting excited and we're let down. So now the answer to Cormack's (phonetic) question. Why did these dudes get famous? 
These dudes got famous because they realized that you could convexify this problem: you could look for a norm in between this L0 norm and the L2 norm -- basically use Lp optimization, lowering P until you get to the first P that's convex. Okay? Which is L1 optimization, which lives between L2 and L0. And in fact, this is a linear program to solve. You search through the entire null space for the X with the smallest L1 norm, and you can actually show that, unlike this case where you need just 2K measurements, as long as you have K log N measurements -- so you're still very close to K, just a log factor of N away -- the dimensionality is right so that you will have an exact solution to this optimization problem, which is exactly the right signal that you started with. And not only that, you will be far away from any other sparse solution. Okay? So this is interesting because this is now a problem that can be solved in polynomial time, right, using just a simple linear programming optimizer. And there are just tons of people now working on linear programs to make this really, really fast. Okay. So just to summarize, why does this work? Well, this works because of the geometry of the L1 ball. So people, remember the L1 ball in high dimensional space is a diamond. It's like a hyperdiamond. Well, it's kind of pointy, right? Unlike the L2 ball that's kind of bulbous, the L1 ball is kind of pointy. And in fact as you go to higher and higher dimensions it gets pointier and pointier. And so it has these points that align with the coordinate axes, and as you blow up that L1 ball, where does it pierce? It will pierce that hyperplane at the sparse solution. Okay. 
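The geometry just described is easy to check numerically. Below is a minimal sketch -- my own illustration, not the talk's code; the sizes, random seed, and the use of scipy's LP solver are all assumptions -- that builds a K-sparse signal, takes M random Gaussian measurements, and compares the minimum-energy pseudo-inverse answer with the L1 answer posed as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M, K = 64, 30, 3                    # ambient dim, measurements ~ K log N, sparsity

# K-sparse signal and random Gaussian measurement matrix phi
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x_true

# Least squares: minimum-L2-norm solution via the pseudo-inverse.
# It matches y exactly but is dense -- the "wrong answer" from the talk.
x_l2 = np.linalg.pinv(Phi) @ y

# L1: min ||x||_1 s.t. Phi x = y, as a linear program over z = [x; t]
# using the standard split -t <= x <= t and minimizing sum(t).
c = np.concatenate([np.zeros(N), np.ones(N)])
A_ub = np.block([[np.eye(N), -np.eye(N)],      #  x - t <= 0
                 [-np.eye(N), -np.eye(N)]])    # -x - t <= 0
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N),
              A_eq=np.hstack([Phi, np.zeros((M, N))]), b_eq=y,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_l1 = res.x[:N]

print(np.linalg.norm(x_l2 - x_true))   # large: L2 picks a dense, wrong x
print(np.linalg.norm(x_l1 - x_true))   # tiny when the L1 ball pierces at the sparse x
```

With these (comfortable) dimensions the L1 program typically recovers the signal to solver precision, while the pseudo-inverse answer spreads energy across all N coordinates.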
So this is the thing that really broke everything open about four years ago, with people realizing that they could do randomized data acquisition and they could use L1 optimization to recover perfectly these K sparse signals or, if the signal is compressible, get a very good approximation. Okay. So hopefully this long, actually too long, foray into the math behind this sort of gives you a sense of why this is, first of all, interesting and second, very different from the usual band limited subspace, uniform sampling, sinc interpolation, right? It's a highly nonlinear recovery process. Okay. So the last thing we want to say as far as theory is: what if my signal is not sparse and it's not a picture of stars but it's a picture of Rico or Feng, right? These signals will be sparse in the Fourier domain or the wavelet domain or the DCT domain. So what do we do in that case? Well, in that case, it's very simple, because we say my data X is not sparse, but it's a sparse combination of some elements of some other basis, like a DCT basis. Which means there's an N by N orthogonal DCT matrix that when I multiply it by a sparse vector alpha I get my X. So it's just a combination of, in this case, three columns. So now let's ask what happens when I sense X? Well, I'm going to multiply X by phi. Here's my X down here and here's my phi. And now all you have to do is remember: if this is an IID Gaussian matrix, what happens when you multiply by an orthogonal matrix? Another IID Gaussian matrix. Okay? Beautiful, right? So the key is there's a universality property: it doesn't matter what basis you're sparse in, whether it's DCT, wavelets, Fenglets, Cormacklets, right, the same random measurements will be sufficient to preserve the information in the signal. The only time you need to know which basis it was sparse in is in the reconstruction process. Which I think is very useful for SensorNet type problems. Okay. So hopefully that makes sense. 
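That rotation argument can be sanity-checked in a few lines. This is an illustrative sketch of my own, not from the talk: build an orthonormal DCT matrix Psi, multiply an i.i.d. Gaussian Phi by it, and observe that the product is just a rotation of Phi, so for example its Frobenius norm is preserved exactly:

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(2)
N, M = 128, 40

Phi = rng.standard_normal((M, N))            # i.i.d. Gaussian measurement matrix
Psi = dct(np.eye(N), axis=0, norm='ortho')   # orthonormal DCT-II basis, Psi^T Psi = I

# If x = Psi @ alpha with alpha sparse, then y = Phi @ x = (Phi @ Psi) @ alpha.
# The effective matrix A is an orthonormal rotation of Phi, hence still
# i.i.d. Gaussian in distribution: the same measurements work for any basis.
A = Phi @ Psi

print(np.allclose(Psi.T @ Psi, np.eye(N)))                 # True: Psi is orthonormal
print(np.isclose(np.linalg.norm(A), np.linalg.norm(Phi)))  # True: rotation preserves norms
```

Only the reconstruction step needs to know which Psi to use; the acquisition side never does.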
Let's just summarize and then go into some of the applications. So the idea now is instead of uniform sampling and then compression, we're actually going to take the light coming into our camera and we're going to take inner products with random basis functions. In the image case, each row of that phi matrix is just going to be a different random picture. And we're going to take inner products between the light coming in and these random pictures, we're going to do that M times where M is K log N, and this is going to be enough information to reconstruct our picture of Nicolaus, using for example linear programming. So it's a very different approach to data acquisition. So why is this potentially interesting? Well, one, it's interesting from a theoretical perspective because it just lets us think about data acquisition in a very different way. So this is clear from everything I'm talking about, right? There are all kinds of interesting insights that you gain into data acquisition by thinking this way. The second is there are some nice practical properties that I'll just talk about really briefly. First, I won't have time to talk about this very much, but this is a stable process, just in case people have questions about this. If I quantize the measurements reasonably finely, I will still be able to reconstruct a very good quality approximation to my original data. If there is noise added to my measurements, I will still be able to reconstruct a reasonable quality representation of my data without the reconstruction process going wild, right, and unstable. So there's numerical stability. Second, I already talked about this universality: when you take your random measurements it doesn't matter what basis you're sparse in; you only need to know after the fact which basis you are sparse in. Thirdly, and this is very useful for some, especially SensorNet, problems. 
This is a very asymmetrical process. You can think of the standard process for data acquisition as asymmetrical, where you do most of the work at the encoder. Think of MPEG, your digital video camera. There's a lot of work going on in your encoder -- you know, the camera and the encoder. The actual decoder is simple. You just put everything back together. CS by comparison is exactly the opposite. The encoder is very dumb; you're just taking random measurements. It's now the decoder that's smart. For a lot of applications that's a really bad idea, but for some applications where you really need to preserve battery life at your encoder, or you only have very, very limited computational resources, this might make sense. Almost lastly, there's a democratic property. And what I mean by democratic is that because each measurement is a random linear combination of the entries of a sparse vector, essentially each one of these measurements carries the same amount of information. Unlike a JPEG compressed packetized data stream, in a compressive measurement packet stream each packet contains the same amount of information. And what that means is there's kind of a digital fountain property that allows you to just stream packets from a source, and if a whole bunch of them are lost, it doesn't matter as long as you get enough packets. It doesn't matter which ones; you can reconstruct. It is very handy. And then finally there's a weak encryption property. Because you encode your data in terms of random measurements, if Feng and I use a random seed to generate our random matrix and Rico doesn't know it, it would be very hard for him, even if he intercepts my packets, to reconstruct the picture that we were both looking at. Okay. So I've taken too long. So let's jump right into some applications. Quick questions? >>: You did pay a price, right? >> Richard Baraniuk: What is my price? >>: M is at least two times K. >> Richard Baraniuk: Yes. 
And/or K log N, but let's even say it's at least K log N. >>: (Inaudible). >> Richard Baraniuk: True. True. Absolutely. But at least in the back of your mind, just kind of think, roughly speaking: if you were to encode an N point DCT in terms of bits, and you're going to spend some number of bits per coefficient, how many bits do you need to actually encode the location of each big coefficient? You need log N bits, I mean, in a naive coder. So in a sense, it's similar to what you would get from a naive coder. It would be K log N. >>: (Inaudible). >> Richard Baraniuk: But a big open issue, and this is what we're actually thinking very hard about in my group, is how to really exploit what's done in real world coders to do this reconstruction. Yeah. So at least naively it's actually pretty close. But there's, you know, 50 years of compression technology and like four years of work here. Okay. So let's talk about some examples unless I have some other questions. Yeah? >>: (Inaudible)? >> Richard Baraniuk: Yes. A K log N measurement camera. Yeah. And I really should have set that up better. You're going to build a camera where you don't take N pixels and then compress; you just somehow boil directly down to M numbers. And that could be M pixels or it could be M time multiplexed measurements. And I'll talk about that in like a minute. Good, people pushing the discussion forward. Okay. So the first application is in art. So Gerhard Richter is a painter, and this is one of his paintings. It's called 4096 Colors. And this is actually a valid compressed sensing matrix. We've been able to show that it actually has the restricted isometry property, it has the Johnson-Lindenstrauss embedding property, and as a result it was sold for 3.7 million dollars. So there's big money in compressed sensing art. That's the first lesson you should take away. You might not get famous, but you might make big money. 
And Richter is actually working now on a set of stained-glass windows for the Cologne Cathedral that also have the block restricted isometry property. Okay. Another example that I think will illustrate all the ideas is we built a camera to do this. And this is a camera that takes just K log N measurements rather than N pixels. So imagine taking the light from the lens as it comes through a digital camera, and instead of focusing it on a CCD array, focus it on a micromirror array, the same kind of micromirror array that's inside this projector, where we have N of these micromirrors, each the size of a bacterium, and they can tilt 10 degrees this way or 10 degrees that way. So what we do is we put a random pattern on the mirror array with roughly half of them facing this way, half of them facing that way. The light from the mirrors facing this way is just thrown away. The light from the mirrors facing that way is focused via a second lens onto a single photodiode. So what is this? It's an optical computer to compute the inner product between the light from the scene and one row of the phi matrix. We do this M times. It's a time multiplexed camera; it's not an M pixel camera, but it takes M time multiplexed measurements. We get our random measurements Y, and then we reconstruct with a linear program. So you could actually do this. And here are the first pictures we ever took. This is at 50 times sub-Nyquist measurement. So this is just two percent of -- well, actually I should go over here. This is what you should see. This is using just a portion of the array that is 256 by 256 pixels. So it's a 65,000 pixel image. This is what you would see with a regular digital camera with this kind of imagery. And this is what you see with our camera taking just 1300 measurements. So that's basically 50 times sub-Nyquist measuring. Okay. So you can certainly tell it's an R. It's pretty crappy. 
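The optical computation just described is easy to mimic in software. Here is a toy sketch -- illustrative sizes and variable names of my own, not the actual hardware pipeline -- where each mirror pattern is a random 0/1 row of phi and each photodiode reading is one inner product with the scene:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 16                                  # toy 16x16 scene, so N = 256 "mirrors"
N = n * n
M = 64                                  # M time-multiplexed measurements, M << N

scene = np.zeros((n, n))
scene[4:6, 3:13] = 1.0                  # a simple, sparse bright bar
x = scene.ravel()

# One random mirror pattern per measurement: roughly half the mirrors
# steer light toward the photodiode (1), the rest throw it away (0).
patterns = (rng.random((M, N)) < 0.5).astype(float)

# The photodiode integrates the reflected light: y[i] = <pattern_i, scene>
y = patterns @ x

print(y.shape)                          # (M,): M numbers recorded instead of N pixels
```

Reconstruction from y would then use an L1 or TV-regularized solver, exactly as in the earlier recovery discussion; the camera itself only ever records the M inner products.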
But the other nice thing about these compressive measurements is they have a progressive property, just like progressive JPEG. If you're willing to take 16 percent of the number of pixels, you can actually get a representation that's essentially as good as the original data. So you also have a very scalable measurement system. Yes? >>: What were you using for your sparsity model -- >> Richard Baraniuk: Oh, yeah. In this case it's actually Haar wavelets and TV -- well, in this case actually it was just TV regularized reconstruction. But it's very close to using L1 optimization with a Haar basis. >>: (Inaudible) the same measurements that you already have, reconstruct them differently, potentially doing better, right? I mean, like in the future if you had a -- >> Richard Baraniuk: Oh, absolutely. Absolutely. >>: Images, you could get better images. >> Richard Baraniuk: Exactly. Exactly. And that's such a cool thing that I want to bring that up. So let's imagine we're sending a space probe to Pluto. How long does it take to get to Pluto? Like 12 years or more. A long time to get to Pluto. If you were going to build a camera on that space probe, you would put in today's technology, like JPEG 2000 probably, or some fancy wavelet coder. But then you're stuck. It's fixed. Let's assume you can't reprogram this sensor. On the other hand, you could put in a randomized measurement camera like this camera, and it might be based today on wavelets, right? But what's going to happen in the 12 or 20 years that the camera's travelling through space? What do we know? Rico's going to have a better basis, Cormack's going to have a better basis, curvelets will be even more improved. So in 20 years we'll have a basis that can make images even sparser than the bases of today. 
What this is basically saying is if you take those random measurements and you reconstruct today with wavelets, and you reconstruct in 20 years with Cormacklets that can sparsify that data even better, you will get a better picture out. So think in terms of all kinds of data archiving, data warehousing, or sort of long term sensing applications. This is what I call the future-proofness property. >>: (Inaudible) that's a very interesting generalization. A lot of the advantage seems to come when you're really dealing with something that's a union of subspaces, right? >> Richard Baraniuk: Right. >>: I don't know why you talk about images so much, because I can't think of a worse example of a signal set, you know. Because the band limited assumption, we all know it's wrong, but it's better than anything else we've got. >> Richard Baraniuk: Right. >>: If you're dealing with a, you know, a two-tone image like this, maybe a union of subspaces makes sense. >> Richard Baraniuk: Right. Right. >>: But for pretty much any other image class you want to deal with, union of subspaces just doesn't really -- >> Richard Baraniuk: Well, I would actually argue that on a first level of approximation, okay, there are multiple levels of approximation, but on a level of first approximation I would say that an Lp compressible model, which says that for example the wavelet coefficients decay with a power law, is, to a first approximation, a pretty accurate model for a large class of images. Which is really the kind of model that we're exploiting here. We're not saying that the images are necessarily unions of subspaces. >>: (Inaudible) already better ways of exploiting that model, right? You know, to harness the power of what you're presenting here. >> Richard Baraniuk: Right. >>: It seems to me you really need something like multi-band signals or things that are supported over pretty distinguished regions. >> Richard Baraniuk: Right. 
>>: In high dimensional space. You know, images just don't seem to -- >> Richard Baraniuk: Yeah, well, there are a whole bunch of things here. One is I think this is a first order approximation to good models, and I think that just as people have developed better and better models for transform domain compression, the dream is to have those models integrated in -- they're basically priors on, for example, the correlations in how the coefficients of those representations behave. Those can be built -- that's the dream -- into reconstruction algorithms. That's really my answer to that question. The second is this is just the first of the applications that I want to talk about, and I'm not going to say that cameras are the -- this is not intended to replace the digital camera that you carry around, right, this is just a first example, okay. And there are all kinds of interesting examples with multi-band signals, A to D conversion over very broad bandwidths, where union of subspaces models are very compelling. Yeah? >>: That (inaudible). >> Richard Baraniuk: Yeah, absolutely. Yeah. >>: (Inaudible). Quick argument. There's actually a very (inaudible) that the wavelet model doesn't explain too well, and you can compress them 2X or 4X better than (inaudible), which actually is very important when you're trying (inaudible), for example, to take pictures of streets. >> Richard Baraniuk: Street view thing. >>: So that kind of stuff could actually help in that sense. >> Richard Baraniuk: Yeah, yeah, very interesting. Absolutely. Okay. So why do we find this interesting for cameras? One of the reasons is the fact that this is based on a single photodetector, a single photodiode. And just very briefly, think for one second about why you can buy a one megapixel digital camera these days for about 10 bucks. Anybody really thought about why digital cameras are so cheap? 
Anybody ever thought? Pardon? >>: (Inaudible). >> Richard Baraniuk: That's exactly right. There's an incredible coincidence that the wavelengths that our eyes are sensitive to just happen to be the wavelengths that silicon is sensitive to. Which means that digital camera technology can essentially ride on Moore's law and processor technology and have tremendous economies of scale, using processes and ways to build the chips that are very similar to computer chips. As soon as you move outside of the visible wavelengths, your $10, your $100 camera becomes a $100,000 camera. You move into the far infrared, gamma ray, terahertz, other kinds of regimes. So the thing that's very, very interesting is that this compressive approach can actually scale into those regimes. So let's just take an example. Say you want to build a really low light camera, where you can't see with regular CMOS imaging chips. What you have to use is what's called a photomultiplier tube. Right? People know about photomultipliers? Basically it's for when you just have hundreds or thousands of photons coming into a camera. Think of a very dark night doing astronomy, okay, or a particle physics experiment. Photomultiplier tubes: one, they cost $1,000. Two, they're physically as large as my fist. So now try to build a one megapixel photomultiplier tube camera. There's one organization in the world that can do this, and it's (inaudible), right, for particle physics experiments, and they cost millions of dollars to build these cameras. Instead what you can do is just take the camera that we built, swap out the photo (inaudible), put in a photomultiplier tube, and build a thousand dollar or eleven hundred dollar very low light imaging camera that can work not only in visible wavelengths but even into the infrared. Okay? 
So our feeling is that the biggest impact from this compressive technology, at least in the camera world, is going to be outside of where silicon has a stranglehold, because there it's very, very natural. Yeah? >>: (Inaudible). >> Richard Baraniuk: Good point. Good point. What about video, say. Right? Okay. I could talk to people offline after the talk, but the cool thing is if you think in terms of video as a space time cube and think of having sparsity in some sort of space time sense, you can actually do video reconstruction using space time type processing, and we actually have some examples that I could show. Okay. So let me ask briefly, because I realize I'm at 11:30: should I take an extra five minutes and talk about some pattern recognition applications, or should I just shut down and go to the summary and then talk with people offline about that? Because I really don't want to overstay my welcome. Five minutes is fine? Okay. So let's just jump straight to data processing. So what we've talked about up to now is data reconstruction, the idea of reconstructing pictures. But the thing that's very, very useful about compressive measurements is that they are also very useful for solving inference problems, like pattern recognition type problems. Is this a picture of Rico, of Cormack, or of Feng, okay? And the good news is that these kinds of randomized measurements support the kinds of learning and machine recognition problems, inference type problems -- you can basically do all of those kinds of tasks on the compressive measurements. And to make a long story short, you can think of these random projections very roughly like sufficient statistics for sparse and compressible data. And here we start to get very, very related to the work that goes on in random projections for machine learning. And I could talk with people about that offline. 
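The "sufficient statistics" intuition rests on random projections approximately preserving geometry. Here is a hedged sketch of my own -- sizes and the point cloud are illustrative stand-ins, not the talk's data -- checking the Johnson-Lindenstrauss-style property that pairwise distances survive a Gaussian projection from N down to M dimensions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 4096, 128                        # illustrative ambient and projected dimensions

# A cloud of high-dimensional points standing in for signals or templates
points = rng.standard_normal((20, N))
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
proj = points @ Phi.T                   # every point mapped down to M dimensions

# JL-style check: distances from point 0 to the others are preserved
# up to a modest multiplicative distortion after the projection.
d_hi = np.linalg.norm(points[0] - points[1:], axis=1)
d_lo = np.linalg.norm(proj[0] - proj[1:], axis=1)
ratio = d_lo / d_hi
print(ratio.min(), ratio.max())         # typically both close to 1
```

Because distances (and hence nearest-neighbor decisions, inner products, and so on) are nearly preserved, many inference tasks can run on the M-dimensional measurements directly.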
So let's talk about a very, very simple pattern recognition problem that's being sponsored by the Environmental Protection Agency, where we're interested in flying UAVs, unmanned aerial vehicles, over American cities and basically finding gas guzzling SUVs and blowing them up, okay. Because we really need to get away from dependence on oil. But at the same time we want to protect the future of our country and our important national infrastructure, right? So we want to find SUVs, but we want to differentiate them from buses and from tanks. And so the question becomes how to detect and classify what kind of target I'm looking at. And the problem always in these kinds of problems is there are nuisance parameters. Namely, I don't have a registered image with the bus or tank exactly in the center of my image, scaled correctly, at the right rotation. I always have some sort of articulation parameters. And so you know what we do? We do what's called a matched filter, which means I'll have a database of pictures, right, at different rotations and translations, and then I'll just do a comparison and find the best fit, okay. So this is what I do essentially. I have some data, a picture of a bus. I want to know, is this a picture of an SUV? Well, I'll have a database of pictures of SUVs at all different translations, for example, and rotations. And now what I have is a pattern recognition task in high dimensional space. These are N dimensional points. Each picture is just a point in R^N. And now I just need to ask the question, what is the distance from my data to the best fitting SUV, best fitting bus and best fitting tank? Closest wins. Everybody got that? It's standard. This is a matched filter. You can do closest wins either by L2 distance or by inner product. All right. It's the same thing. So that's what we do. And now let's think, though, for a second. How does compressive measurement come in here? 
We can think for a second: what is the structure of this database of pictures of SUVs? What is the structure of a database of pictures of the same object with different rotations and different translations, horizontal and vertical? How many parameters do I have? I have three parameters, right? Horizontal translation, vertical translation, and rotation. Well, in fact you can show that the set of all pictures with these three articulations forms a low dimensional manifold in this high dimensional space. In fact, it forms a three dimensional manifold, a three dimensional sheet, in high dimensional space. So really when you think of matched filters and you think of closest, nearest neighbor classification, in this case it's really nearest manifold classification. I have a data point. Am I closest to the tank manifold, the bus manifold or the SUV manifold? And that's really what the matched filter does. So where does compressed sensing come into this? Well, the really nice thing is that you can actually show that if you have a smooth manifold that is K dimensional -- so it's not sparsity now, K is just the number of articulation parameters -- and you do a random projection of that manifold to a low dimensional space, and you take K log N measurements, you can actually show that you stably embed that manifold, meaning all the geometry and topology of that manifold here in high dimensional space is preserved under a random projection to a low dimensional space. So what does that mean? That means we can actually do compressive inference directly in the low dimensional space without ever having to reconstruct back to the high dimensional space. So here's just to make it clear. 
Instead of reconstructing from our camera, maybe this compressive camera, a picture of each of these, we can actually do the nearest manifold classification directly on the randomized measurements from each of these three targets. Which saves us from ever having to solve a linear program to reconstruct, lets us work in a much lower dimensional space, and lets us store templates that are much lower dimensional than traditionally done. Okay? And here's just an example looking at classification rate with differing amounts of noise. With moderate amounts of noise, even with just 50, 60, or 100 random measurements -- these are 65,000 pixel images -- with just on the order of 100 measurements you can actually get very, very high classification accuracy and basically single pixel resolution of the translation of the object. So this shows that you can do a lot of pattern recognition directly on these randomized measurements. So hopefully that makes sense. So I think I will end just by saying you can actually push this into the network sensing realm. In particular, if you have networked sensing resources, be they cameras, SensorNet nodes, et cetera, then unlike a standard SensorNet system, where the amount of communication out of the network will scale with the number of sensors and the resolution, okay, you can actually show that if you take randomized measurements of the data within your network, you can have communication and computation bandwidth that scale only linearly in the information rate of your sources and logarithmically in both the number of sensors and the resolution of those sensors. Which is what we feel is really, really critical to being able to deploy very, very wide networks of high dimensional imaging or data acquisition sensors. 
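The "classify on the measurements" step can be sketched directly. This toy example is my own -- random vectors stand in for the SUV/bus/tank template databases, and the sizes are illustrative -- showing that nearest-template classification gives the same decision in the compressed domain as in the full pixel domain, with no reconstruction and no linear program:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 4096, 100                        # the talk uses ~65,000-pixel images; smaller here

# Stand-ins for the template database: 30 templates, one per articulation
templates = rng.standard_normal((30, N))
query = templates[7] + 0.05 * rng.standard_normal(N)   # noisy view of target 7

Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Nearest-template decision in the full N-dimensional space...
full = int(np.argmin(np.linalg.norm(templates - query, axis=1)))
# ...versus directly on the M compressive measurements of query and templates
comp = int(np.argmin(np.linalg.norm(templates @ Phi.T - Phi @ query, axis=1)))

print(full, comp)                       # the two decisions agree
```

Because the random projection nearly preserves the distances that drive the matched filter, the M-dimensional decision matches the N-dimensional one while the stored templates shrink by a factor of N/M.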
You really have to move towards systems where you can have logarithmic growth of computation and communication rather than linear growth, right, to avoid bandwidth collapse. Okay. So I think I will end there and just jump straight to the conclusions, because I think I overstayed my five minutes and made it ten minutes, and just ask if anybody has any questions. And I think I will just put up this website that's the community resource for the compressive sensing community. We passed about 200 papers recently, and they're nicely grouped into tutorials and applications and network sensing and other areas. So I just invite you to go here and grab any information that you'd like, and thanks very much for all your questions. (Applause). >> Richard Baraniuk: Any additional questions? Yes? >>: (Inaudible) I guess your contribution is more to (inaudible) that random sensing is applicable for a large sort of (inaudible) and the compressibility (inaudible). How would you compare how much compression you get from, you know, that -- you know, (inaudible). >> Richard Baraniuk: Right. Right. Okay. So, yeah, that's a very good question. That's a very good open research problem. And let me kind of restate it. Basically you're right on, that the contribution of this work is to show that we can acquire data using randomized data acquisition and then be able to do things with that randomized data: either process it directly, or fuse it together, or reconstruct. However, all of the theory that I've talked about is based on very naive models, right. In fact, I can actually go right to a picture here. Based on a very naive compressibility model. And this is to Cormack's point: that my data either is K sparse, meaning that at any one given point in time there are only three active Gabor coefficients, or I have a model that the coefficients of my wavelet transform decay like a power law, which is an independent coefficient model. 
So I would argue that this whole field of compressive sensing is predicated on the absolutely most naive possible compression models. Right? Which would be equivalent to a JPEG model or a JPEG 2000 model where you just compute a wavelet transform and then just look for the biggest coefficients, encode their values and their locations naively, and then throw away all the rest. We know that this isn't what's done, right? There are all kinds of elegant structures that have been found underlying wavelet transforms, underlying Gabor transforms, and I think that it's a very important open problem to try to tease those out and basically utilize them to do even better with randomized (inaudible). So here's an example. So we know what Cormack would say: in a wavelet compression algorithm I know there's a tree structure, and if I have a big wavelet coefficient, I know that probably one of its four child wavelet coefficients is big, and I know probably the parent is large, and I can exploit that to encode location information very efficiently. That's not exploited in this naive model. I know in this Gabor case that not only are three Gabor coefficients active at any given point in time, but they evolve smoothly with time. So in fact, what we've been doing over the last few months is exploiting these, trying to fuse in the theory of graphical models, which can very effectively capture these secondary and tertiary structures, and then use those to do better at, for example, reconstruction. And this is just one example. So remember I talked about the log N dependence? This is an example of reconstructing a simple target but needing a number of measurements that essentially grows linearly with K rather than K times log N. And these are the state of the art techniques, okay, that agree with K log N. And the difference is these are based on this naive coefficient model, and this is based on a graphical model, a Markov random field type model. 
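The tree structure just described is simple to write down. This is an illustrative sketch of my own -- the indexing convention is an assumption, not the talk's code -- of the parent/child relation in a 2-D dyadic wavelet decomposition that a graphical-model prior can propagate along:

```python
# In a 2-D wavelet quadtree, a coefficient at scale s and position (r, c)
# has one parent at scale s - 1 and four children at scale s + 1, so a
# "large parent implies likely-large children" prior walks the tree.

def wavelet_parent(scale, r, c):
    """Parent of coefficient (r, c) at `scale`; the root scale 0 has none."""
    if scale == 0:
        return None
    return (scale - 1, r // 2, c // 2)

def wavelet_children(scale, r, c):
    """The four children of coefficient (r, c), one scale finer."""
    return [(scale + 1, 2 * r + dr, 2 * c + dc)
            for dr in (0, 1) for dc in (0, 1)]

print(wavelet_parent(3, 5, 2))      # (2, 2, 1)
print(wavelet_children(2, 2, 1))    # the four coefficients whose parent is (2, 2, 1)
```

A structured-sparsity recovery algorithm restricts its search to supports that respect this relation, which is how the measurement count can drop from K log N toward order K.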
>>: (Inaudible). >> Richard Baraniuk: What these are based on, these are state of the art compressive sensing results which are based on lossy compression results of 15 or 20 years ago, right. So this is moving closer to compressive sensing results based on lossy compression results from five years ago. And I think the goal is to really try to bring it up to date. Hopefully that answers your question. Yeah? >>: The examples you showed a second ago with these graphical models, I mean, can this still be in the framework of sparsity, that you just haven't captured the full sparsity? >> Richard Baraniuk: Exactly. I would argue that, going all the way back to Cormack's (phonetic) question, to a first approximation both of these transforms are absolutely sparse. That's a good first approximation. Then to a second order approximation they have this structure, that the large coefficients have correlations in their magnitudes, as I indicated. And so you get to K log N by this initial sort of sparsity argument, and then you can get, you know, to an even lower number of measurements by using this secondary graphical model type structure. And you can actually make this rigorous and figure out exactly how many measurements you do need. Yes? >>: There's a -- I mean, if you look at (inaudible) sparse representations before, right, (inaudible). >> Richard Baraniuk: Absolutely. >>: And there's a long -- I mean, there have been a number of cases, like say if you look at the zero crossings, what (inaudible). >> Richard Baraniuk: Yes. >>: If you look at the fractal stuff, if you look at a lot of the kind of stuff that happened in the '90s. A common trend among several of those really imaginative, really creative, really exciting representations. >> Richard Baraniuk: Right. >>: And I think each of them kind of came to grief on, you know, there were very, you know, unstable (inaudible).
Once you start quantizing, you know, you can prove very nice reconstruction results, but, you know, quantization was like (inaudible). I think the conclusion was that, you know, yeah, uniform sampling is exciting but as soon as you deviate from it, things can go to hell in a handbasket quite quickly. >> Richard Baraniuk: I think. >>: (Inaudible). >> Richard Baraniuk: So I think that's a very good question. The question was -- well, I think everybody can hear. Having to do with stability. So I would just say two things. The first is, for mid to high signal to noise ratio problems -- well, first of all, this theory is numerically stable, unlike the zero crossing work, which was not numerically stable. You can actually prove results here saying that if the signal X that's coming into my sensing system has a certain SNR, I actually have a bound on how badly the SNR could be degraded after sensing randomly and then reconstructing. So unlike those theories, there actually are stability results. And I think that's one of the things that's most exciting about this theory, that it can actually be applied in a lot of different types of practical problems. However, I will agree with you that high degrees of quantization and high amounts of noise are still a challenge, and basically what you start to get into there is less and less sparse approximation problems and more and more finding-needles-in-a-haystack type problems, which become very, very difficult. So the way I would characterize it is there's a graceful degradation of uniform sampling with SNR down to really quite low SNRs, whereas here there will be a threshold effect. You'll have a very nice decay, and then once the noise becomes too large you'll basically lose your ability to sense.
But there has been some progress on this, and Petros Boufounos (phonetic) at Rice actually has been able to do some really nice work on reconstruction using just single-bit measurements. So taking that Y vector, right, the randomized measurements, and actually just taking the sign of those, so single-bit measurements, and then reconstructing, not using a linear program but by other kinds of means. And this is an example of the target he wants to get -- this is a 4096-pixel image, 8 bits per pixel, that's what you'd like to see, and this is essentially one bit per pixel measurements, which is pretty good, and this is actually one-eighth bit per pixel measurements. So what we're hoping we can do is to push this, you know, just be able to follow the trend, and as better and better compression algorithms are developed, try to be able to sense or reconstruct data that's at least close to those kind of (inaudible). >> Feng Zhao: Thank you. >> Richard Baraniuk: Okay. Thank you. (Applause)
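The 1-bit measurement idea, keeping only the sign of each randomized measurement, can be sketched with a deliberately crude estimator. This is not the reconstruction algorithm mentioned in the talk (that work uses consistency-enforcing iterations); it is my own one-step sketch: back-project the signs and keep the K largest entries. Since sign measurements destroy amplitude information, the signal is normalized to unit norm and only its direction is recovered:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 256, 8, 2000

# unit-norm K-sparse signal (1-bit measurements cannot recover amplitude)
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x /= np.linalg.norm(x)

Phi = rng.standard_normal((M, N))
y = np.sign(Phi @ x)                 # keep only one bit per measurement

# crude estimate: back-project the signs, keep the K largest entries, renormalize
proxy = Phi.T @ y / M
idx = np.argsort(np.abs(proxy))[-K:]
x_hat = np.zeros(N)
x_hat[idx] = proxy[idx]
x_hat /= np.linalg.norm(x_hat)

print(float(x @ x_hat))              # inner product near 1: the direction is recovered
```

Even this naive estimator recovers the signal's direction well from pure sign data, which gives some intuition for why the one-bit-per-pixel reconstructions shown in the talk are possible.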