>> Feng Zhao: So I'm Feng Zhao. It is my pleasure to introduce Professor Rich
Baraniuk from Rice University. Quite a number of us have known Rich for
quite some time, and Rich has visited MS before, a couple of years back. So this is
your third visit.
Rich has been working in the areas of signal processing and in particular more
recently on compressive sensing, and he actually organized, or edited, a
special issue of the Signal Processing Magazine on the topic of compressed
sensing, which I think is a great resource. It was March '08.
And today Rich is going to give a talk on compressed sensing, various
computational experiments, and the test beds that he has been building. So
with that, Rich.
>> Richard Baraniuk: Okay. Thanks. Thanks very much for the invite. It's great
to be here. So my job is to talk a little bit about this field of compressive sensing,
compressed sensing, compressive sampling. It goes by a lot of different names.
And what I'm going to do is talk about motivation of why we're worrying about
why we're doing compressive sensing, talk a little bit about some of the
mathematics behind it and then talk about some of the real world experiments
that people are doing that show that this brand new theory really looks like it has
some real potential for applications. So I hope that's okay with everybody.
And I'd also like to keep it interactive. So if you have questions, just interrupt
me as I go along. And this of course is work that is done by a number of different
people around the world. I'm going to talk about not just work from our group but
also work from a number of different groups around the world.
But none of this would have been possible from our group without the
contribution of all these current students and alums.
Okay. So sensing. So it's hard to imagine doing anything in this -- in science
and in engineering without somehow turning the physical world into data that we
can process, that we can analyze, that we can try to feedback into that physical
world.
And in case you haven't noticed, there's been a revolution, a digitizing revolution
in sensors that we're starting to push the analog to digital interface closer and
closer to the front end of a whole host of different types of sensing systems that
started with, for example, digital audio, things like analog to digital converters for
audio systems, for cell phones, moved into the digital still camera realm where
we sample light, right, reflected from objects. And it moved into digital video and
into an increasing array of different types of digital imaging modalities.
And the reasons for doing this are clear, right, because of the tremendous
advantages of digital data processing: reprogrammability, easy storage. There are
just tremendously strong arguments for why we want to do digital data acquisition.
But as a digital data acquisition and processing person it makes me nervous a
little bit, right, and I think Rico will agree, this puts us under a lot of pressure,
right. Because digital sensing and processing has been so successful,
everybody wants to do it, and we're pushing that interface between the analog
world and the digital world closer and closer to the front end of increasingly
wideband types of sensing systems.
So just because we build an A to D converter we're not happy, right? We're
constantly interested in pushing toward denser and denser sampling. Why? Because
we'd like to have higher resolution in our sensors.
We're also increasingly not interested in just having a single sensor, we want to
network the sensors: for example sensor nets covering a broad
geographical area, or distributed camera networks that are producing not just
pictures of an environment but actual 3D reconstructions of an environment.
And we're not just interested anymore in just audio signals or just visible
wavelengths of light, we're interested in sensing extremely broad bandwidths of
electromagnetic radiation, for example, all the way from infrared and terahertz up
into the gamma rays, even in the camera world.
And when you multiply all this together, right, all these growing demands, it's not
hard to see that you end up with an absolute deluge of data that really presents
two big challenges, or a number of challenges but two really big ones. The first
is how are we going to acquire all of this digitized information? That
creates really a hardware challenge of building new A to D interfaces, analog to
digital interfaces.
The second, though, is even if you could acquire all of this data, how are you
going to process or store this data efficiently? So it becomes a very important
data fusion and processing problem.
And I think that the important thing that's going to come out of this talk is this area
of compressive sensing is really trying to address both of these problems
simultaneously, how to acquire big data volumes and then how to process them
really efficiently. And I'm going to try to bring up both of these during this talk.
So just to see how we got ourselves in this mess, right, let's do a really brief, brief
review. And let me get a show of hands. Nyquist sampling theorem, Shannon
sampling, show of hands, people familiar with that? Okay.
So this is paradigm -- this is dogma, right, in the signal processing world? The
dogma is right here. If you sample densely enough, namely at twice the highest
frequency in your signal, analog signal, then you can hope to perfectly
reconstruct the original analog data, right? That's the theorem; the
Shannon/Nyquist theorem goes back even further than that.
And this is the reason why in your cell phone we sample your voice signal
uniformly at eight kilohertz, for example, or in a CD digital audio processing
system we might sample at 48 kilohertz or 44.1 kilohertz uniformly.
This is why in your digital camera, when the light comes through the lens, it's
focused on a CCD array or a CMOS imaging array that again samples uniformly,
where the sample spacing is very, very closely related to the
Nyquist bandwidth of your signal.
Everybody with me? Like I said, feel free to interrupt me.
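As a concrete illustration of the Nyquist rule, here is a tiny sketch; the frequencies and rate below are my own toy choices, not numbers from the talk. A 5 Hz sine sampled at only 8 Hz, below its 10 Hz Nyquist rate, produces exactly the same samples as a sign-flipped 3 Hz sine, so the two signals are indistinguishable once sampled:

```python
import numpy as np

# Toy aliasing demo: a 5 Hz sine sampled at fs = 8 Hz, below 2 * 5 Hz.
fs = 8.0
n = np.arange(32)                       # sample indices
s5 = np.sin(2 * np.pi * 5 * n / fs)     # undersampled 5 Hz sine
s3 = np.sin(2 * np.pi * 3 * n / fs)     # 3 Hz sine at the same rate

# sin(2*pi*5n/8) = sin(2*pi*n - 2*pi*3n/8) = -sin(2*pi*3n/8): the samples
# agree up to sign, so the sensor cannot tell the two signals apart.
print(np.allclose(s5, -s3))             # True
```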
So let's just look at how this actually plays out in an actual application and why
this might be an issue, and let's think of a digital camera example, because
everybody has a digital camera. Does everybody have a digital camera? Okay.
So let's think in particular about a cell phone camera, because that is a great
example. So I want to take a picture of Nicolaus, so here is Nicolaus, and what
do I do? I bounce some light off Nicolaus and I focus that light through a lens on
to a sampling array.
So the first thing we do in digitized sampling is we focus onto an array. This is
actually a color Bayer sampling grid. And we acquire N samples of the light.
This is going to be the notation for the entire length of the talk: N samples. So N is
how big? Megapixels, right, 10 megapixels. Kodak just released a 50 megapixel
camera. Within weeks another company announced a 60.1 megapixel camera.
So there's this arms race of who can have the most megapixels. So millions and
millions and millions.
So step one is get a lot of data. Step two is to realize there's too much data.
And so what we do typically is we compress, right? No one stores TIFF images on
their cell phone camera or tries to transmit them over the wireless network, right?
We always compress with an algorithm, for example JPEG or JPEG 2000,
and what we do is we take these N pieces of data and we crunch them down
in some way. We're going to talk about that a lot. We crunch them down to
just K numbers, where K is much, much, much smaller than N. And where K is
small enough for the application of interest we can either store it on our memory
chip and store lots of pictures or we can transmit these over the ether, right, so
that our friends can look at the picture.
So this is really, to use the term, the paradigm of a lot of data acquisition: you
sample and then you compress.
So what is behind the idea of transform domain compression, right? There's
been a lot of work done here at MSR about this. But the fundamental ideas are
two closely related concepts called sparsity and compressibility, which we're
going to talk about a lot during this talk.
So here's a picture of Nicolaus. It's N pixels. Remember, millions of pixels. The
key idea is that I could rerepresent the information in this picture using a
transform: a Fourier transform, a wavelet transform, a DCT. Show of hands,
people? Comfortable with this?
Okay. So I can rerepresent and it's an orthogonal transform, so information
preserving. So I have N numbers over here and I have N numbers over here.
So there's no information lost going from this domain to in this case the wavelet
domain.
But what we've done in this rerepresentation is we've chosen just the right basis
in this case so that the coefficients in this basis are what's called sparse. What
do I mean by sparse? Here I've chosen a color map where blue means a
coefficient is zero and you can see what we've done. We've taken this picture of
Nicolaus and we've rerepresented in a way in terms of this wavelet
representation where in fact most of the wavelet coefficients are either zero or
very small. There are just a very few, in particular K coefficients, that are large. And
in a wavelet transform, it turns out that they're the coefficients that trace out the
edges, right, in the image. You can see, for example like the eyebrows and the
nose of Nicolaus here. Right?
So we've taken a representation, it's just a rotation in N dimensional space,
totally information preserving, but we have this new property that most of the
coefficients are small. And really what goes on in JPEG or JPEG 2000 is simply
rerepresenting in terms of a basis where we know our data is sparse, encoding
only the positions and values of the large coefficients,
and throwing away all the small coefficients. That's really how JPEG and JPEG
2000 work. And you might think that this is something very specific to image
data, but in fact I would argue that this idea of sparsity or compressibility is really
a generic property of a whole host of man-made and natural signals.
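The keep-the-K-largest-coefficients idea can be sketched in a few lines. This is a toy stand-in for JPEG-style coding, not the actual JPEG pipeline; the test signal and the choice K = 32 are illustrative assumptions:

```python
import numpy as np
from scipy.fft import dct, idct

N, K = 1024, 32                        # signal length, coefficients kept
t = np.arange(N) / N
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)

c = dct(x, norm='ortho')               # orthogonal transform: no info lost
idx = np.argsort(np.abs(c))[::-1][:K]  # positions of the K largest coeffs
c_sparse = np.zeros(N)
c_sparse[idx] = c[idx]                 # throw away all the small coefficients
x_hat = idct(c_sparse, norm='ortho')   # reconstruct from just K numbers

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(rel_err)                         # small: most energy sits in K coeffs
```

Keeping only K of the N coefficients (plus their positions) is the "crunch N down to K" step described above.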
So here's an example of a sonar waveform that's actually yelled out by a large
brown bat. Okay? It's an echolocation chirp, it's a sonar chirp. And this signal
has a very large time-bandwidth product. What I mean by that is it's spread out
all over in time, and if you look at its Fourier transform it's also
spread out over a very broad Fourier bandwidth. So you need to sample at a very,
very high frequency in order to not alias the signal.
But on the other hand, if instead of looking at this signal in the time domain
or the frequency domain, you look at it in the time-frequency domain, using either
a Gabor transform or a wavelet transform, some kind of a filter bank lapped
transform, then you see there's again this kind of sparse structure: in fact this
signal has a very simple explanation, that it's just harmonic; at any given point in
time there are just three frequencies that are active in this particular signal.
And this signal, you could think of this representation much like a musical score;
this signal just sounds like (inaudible), right? So very, very sparse again. And if
you wanted to compress this, why would you do it in the time domain or the
frequency domain? Clearly you'd do it in this time-frequency domain.
So really this is what goes on. So we sample our data and then we compress
using a representation like wavelets or DCT, et cetera.
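To make the time-frequency sparsity concrete, here is a small sketch with a synthetic chirp standing in for the bat call; all the parameters are my own assumptions. The signal sweeps a wide bandwidth, yet only a thin ridge of its STFT coefficients is large:

```python
import numpy as np
from scipy.signal import stft, chirp

fs, T = 8000, 1.0
t = np.arange(int(fs * T)) / fs
x = chirp(t, f0=200, f1=3000, t1=T)    # dense in time AND in frequency

f, tt, Z = stft(x, fs=fs, nperseg=256) # Gabor-style time-frequency map
mag = np.abs(Z)
frac = np.mean(mag > 0.1 * mag.max())  # fraction of "large" TF coefficients
print(frac)                            # tiny: only the chirp ridge is active
```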
So you might ask yourself, is there anything wrong with this system? Because this
is really what's inside, you know, a huge, huge number of technologies out there.
And I would argue that there are really two issues here.
The first is an inefficiency argument. Why do we go to all the work to acquire N
pieces of data when we're going to throw out all but K pieces of data? Right? So
there's this megapixel push to have hundreds and hundreds of megapixels, and
we're going to compress down to much, much, much smaller files.
So that seems inefficient. Why? It puts a lot of pressure on building this sampling
array. The second is a bit more subtle, a bit more philosophical: it's a
theoretical mismatch between the world of sampling, which is a completely linear
world, right? Let's just go through that really quickly. The set of signals that can
be sampled without aliasing is a band limited subspace; it's a linear subspace.
The actual sampling process is linear, and sinc interpolation reconstruction is
linear. Everything to do with sampling is really linear. Whereas, other than the
transform, the rerepresentation in terms of say wavelets, everything to do with
compression is non-linear: keeping the big coefficients, throwing away the small
coefficients. That's a highly, highly non-linear process.
And not only that, as we will see, the signals that are compressible form
extraordinarily non-linear sets that are absolutely not like band limited
subspaces. Okay? So really I would say there's a theoretical mismatch, so why
are we jamming together sampling and compression like this?
So really this idea of compressive sensing is to try to bring these together
into one block that goes directly from analog information over here and creates
measurements over here. They will be linear measurements, and we're going to
take M of them. So instead of sampling to N and compressing to K, we're going to
try to go directly to M measurements, and we're going to
try to push M as close as possible to K, where you can think of K as kind of
the intrinsic information content of our source, right; you can't go below K. We're
going to try to push as close as we can to K.
And that's really the idea. And then there's going to be a reconstruction process
where we're going to actually be able to tease the image, or data, out of our
measurements. So that's really the idea that I'm going to talk about. Any
questions at this point? That's a long intro, but that's the basic idea.
So what I'd like to do, and I've cleared with Feng to go to 11:30, right, but I'm not
supposed to go past 11:30. Let's talk for 10 minutes or 15 minutes about some
of the mathematics behind this and let me just ask a show of hands, who has
seen some of the math behind compressed sensing? Anybody out there?
Maybe a few of you?
So I'm going to -- hopefully this will be like a review for people who have seen it,
and for people who haven't, hopefully this will make some sense as we go
along.
Okay. So let's talk about sampling first, standard stuff. And let's think about
sampling a signal that is itself sparse. Okay? What do I mean by that? We're
going to say that our signal X is K sparse, which means, think of X
now like a vector, well it is a vector, it's of length N, so it's very, very long, and it's
sparse, meaning all of its entries are zero except K entries that are nonzero. So in
this case, K would be 3, because there's only 3 colored blocks.
I apologize that I reversed the color map. Remember before blue was zero?
Sorry. Now white is zero. But I hope everybody gets the idea.
So this would be a signal that would be all zero except for three spikes, right? If
this was a picture that I had vectorized, it would be a picture of a starry sky at
night where everything is black and I see three stars. So it's very, very, very
simple signals. Everything I'm going to talk about for the next ten minutes is with
these kinds of signals, and then I'll explain how this extends very easily to signals
that are compressible in DCT or wavelets. It's very, very straightforward.
Okay. So this is our signal model, and very roughly speaking, oversimplifying, you
can think of, oh, I lost a slide. I can't believe this. You can think of sampling as
essentially multiplying this vector by an identity matrix, okay, where you
basically apply an N dimensional operator to the signal to acquire samples, which,
if the operator is an identity matrix, will be exactly the
entries of the vector X here, and then you can think of sinc interpolation as this
trivial undoing of the identity operation. Okay? I think, I don't have to go into
this in very much detail, so I think I'll just skip over it.
But we're not going to do that. Okay. What we're going to do instead is
a dimensionality reduction. Okay? So we're not going to talk about
an N dimensional operator operating on our vector X, we're going to talk about
operating on X by a set of M linear measurements, which is basically a matrix
operation with M rows and N columns. Okay? So we're going to talk about
sensing by dimensionality reduction, right? So instead of going from N
dimensions to N dimensions we're going to go from N dimensions down to just M
dimensions. Remember, M is my number of measurements.
Okay. So let me just step back and go through this very, very carefully. I had X.
X is of length N. It's very sparse. I'm going to take linear measurements of that
X that is sparse and my measurements will just be equivalent to a matrix
multiplication because they're linear measurements. Where each measurement
YI is just going to be the inner product of the Ith row of phi with the entire vector
X. So it's a linear measurement system.
But the number of measurements M will be much, much less than the dimension
of the vector X. That's a dimensionality reduction. Is everybody with me? And
the game of course is going to be try to push the number of measurements here
as close to K as possible. And what we'll see is you can get very, very close. Is
everybody with me?
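A minimal numerical sketch of this measurement setup, with sizes that are my own illustrative choices rather than anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 1000, 100, 5                 # length N signal, M measurements

x = np.zeros(N)                        # K-sparse signal: K nonzero entries
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # M x N measurement matrix
y = Phi @ x                            # each y_i = <i-th row of Phi, x>

print(y.shape)                         # (100,): N numbers reduced to M
```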
Okay. So let's just talk about this. So this is problematic, I would say, right? From
a mathematical point of view this is problematic because this matrix, everybody
put on their linear algebra hat, is a non full-rank matrix, so it loses
information. What do I mean by that?
I mean that for any given Y there are infinitely many Xs that when multiplied by
phi give the exact same Y. That's the whole point of a null space, right? So
obviously I could have a picture of Feng and a picture of Rico and I could
sense both of them with the same sensing system and get the same
measurements. How am I ever going to hope to untangle those and figure out
whether it was a picture of Feng or a picture of Rico? Is everybody with me on
that? Yeah. So that's a problem. Okay.
But the really, really cool thing is we're not interested in any old X, okay, we're
not interested in arbitrary pictures. The picture of Feng is compressible in DCT.
The picture of Rico is compressible in DCT. So now let's think of what happens if
we have a sparse vector X. Well, in fact this matrix structure is actually much,
much, much nicer, because think of what happens when you multiply this matrix
by a vector that only has three nonzero entries. Well, basically this multiplication
is just going to select three columns from this phi matrix. Does everybody see
that? Y is just a combination of three columns.
So if I have sparse data, in fact my matrix doesn't look wide and short at all; in
fact my matrix essentially looks very tall and skinny, which is a very nice matrix.
Right? So we have what looks like a very bad matrix, but when our data's
sparse, it's actually a very nice matrix, as long as M is bigger than K.
So there's hope, right? There is hope. Okay? So what can we do? Here's
going to be our design goal. So now we're going to actually talk about designing
a sensing matrix. Our design goal is going to be designing a phi so that if I select
K columns arbitrarily from this matrix, the resulting matrix
has full rank. Because that is a matrix that's going to preserve information in
sparse vectors. I'll give you a second to see this. Does that make sense to
everybody? Okay. Hopefully.
In particular though, we're going to ask for even a little bit more, okay, just to add to
the complexity here. Let's think of two sparse signals: picture of Feng, picture of
Rico. The thing I don't want to have happen is for the measurements from my
system to map to the same Y, right? So what I want is: if Feng is X1 and
Rico is X2 and they're a certain distance apart, I want to make sure that when I
measure them, their measurements stay far apart. Does that make sense?
Okay? Well, now let me think about what is the distance between Rico and
Feng? Well, if they're both K sparse, then what is X1 minus X2? It's just 2K
sparse in general. Right?
So in fact, if this makes sense, hopefully, if instead of asking about K
columns I can ask for a matrix phi such that if I draw 2K columns out arbitrarily
I get a nice full rank matrix, then I can in fact preserve the distances between all K
sparse signals.
Which means if you are all K sparse pictures, right, and I apply the same
measurement matrix phi to all of you, and you all have pairwise distances
between each of you, all those distances will be preserved. Which means it
preserves the structure of the sparse signal set. Okay? Yes, question.
>>: (Inaudible) thinking about matrices that you (inaudible) two elements that
bring those (inaudible) very likely to be --
>> Richard Baraniuk: So in fact, you were coming to the punch line of this whole
buildup, right, this whole -- so hold that thought for about one minute, okay. Let
me just ask if there are any questions at this point.
Okay. So this is called the restricted isometry property of order 2K, which just
means that if you restrict the matrix, it looks like an isometry. And this looks
like something that can be done. The unfortunate thing is that this is
actually an NP complete design problem. So to design such a matrix phi is
extraordinarily hard, basically combinatorial. And so we were just really excited for a
minute, right? I sensed it in some of you. Now we're sort of depressed, right,
because this is a problem.
And what Rico offered, because he knows the Russian mathematical literature,
was that there was a result due to Kashin and Gluskin (phonetic) from the 1980s
that he's aware of: in fact it's an NP hard
design problem to design a matrix phi, but if you just throw down a
matrix at random, for example a random Gaussian matrix or a random plus/
minus one matrix, you can actually show with high probability it will have exactly
this property that I just talked about; namely that if you select 2K columns out of
this matrix arbitrarily, it will basically have good eigenvalue
control. It will have full rank.
Okay. Which is bizarre to a lot of people who are used to sensing systems that
are predictable and deterministic, because what this is actually
suggesting we do is randomize data acquisition, right? This is suggesting that
rather than sampling signals uniformly and compressing them using something like
JPEG or JPEG 2000, we're proposing to just sense, where the entries of our
measurement vector are just random linear combinations of our vector X that
we're interested in. Which is kind of bizarre. But as we'll see, quite, quite useful.
Any -- yeah?
>>: You're saying that for a large enough dimension space, if I choose K much less
than N directions at random, with high probability (inaudible).
>> Richard Baraniuk: Yeah. It's nothing other than that. There's nothing
miraculous at this point.
>>: (Inaudible) tell us this --
>> Richard Baraniuk: No. Well, in fact there's a whole -- and actually this is a
good point to talk about some of the links. First of all, if you think of this from,
for example, a frame theory approach or just a Gaussian random vector approach,
this should be kind of obvious. That's one. Two, if you're from the
Johnson-Lindenstrauss world of pattern recognition and machine learning, this is also
Johnson-Lindenstrauss type theory.
And then third, there are people who have looked at a large class of different
kinds of distributions, not just Gaussian vectors, and shown that these
kinds of matrices will also have this property, in the random matrix literature.
So I'm not trying to say this is anything new. This is known, and this is just step
one of the whole process. Yeah? But good point.
>>: (Inaudible).
>> Richard Baraniuk: Well, I'll leave it to you to figure that one out. And --
yeah?
>>: So far you're kind of (inaudible) that you don't know which ones are -- it's
sparse but you don't know which --
>> Richard Baraniuk: Absolutely. You don't know which -- you know it's a
sparse vector. You don't know where the big values are.
If you knew where the big values were, this would be super easy. K
measurements, you're done, right? But the important thing here is that what we're
doing somehow in these measurements Y is encoding both the positions and the
values of those entries. Both the positions and the values. Yeah? Other
questions? Yeah.
>>: One more issue seems to be that these measurements are really real numbers.
>> Richard Baraniuk: Yes.
>>: Then you're going to end up quantizing.
>> Richard Baraniuk: Absolutely. Absolutely.
So everything -- in fact, everything I'm going to talk about through this entire
talk is real valued theory. But this is an area of a lot of research right now. A ripe
area for research right now is, first of all, how robust is this to quantization and why,
and then, even better, can we design systems specifically knowing that we're
going to be quantizing Y? But this is an area of active research right now. It's still
very open.
Other questions? Okay. So this is just part one: information preservation. And
the term for people in the machine learning literature is that this is an
embedding, right? We're just embedding the structure of these N
dimensional sparse signals into (inaudible).
So now let's talk about recovery, which is why these dudes got famous, right?
They didn't get famous for the forward problem; they got famous for the recovery
problem. Because what we're really interested in is taking X, sensing it randomly,
and then taking our measurements and getting X back. And of course this is again
a difficult problem, because this is an ill posed inverse problem. The matrix phi is
not full rank, so there are infinitely many Xs again that when multiplied by phi give
Y.
So what we're going to do is exploit, right, we're going to solve this
ill posed inverse problem by exploiting the geometry, the high dimensional geometry,
of sparse signals, because in general this is a very, very ill posed problem. But
we're going to exploit the geometrical structure of sparse signals in order to solve
this inverse problem.
Okay. So let's talk a little bit, very briefly, about the geometry of sparse signals.
So what is a sparse signal? A sparse signal is a signal where only K out of N
coordinates are nonzero. So now I want everybody to put on their N dimensional
thinking cap, right, go to N dimensional space, close your eyes if it helps, and
here we are in N dimensional space. And we might ask what does a sparse
X look like?
Well, I can tell you that the vector X has all of its coordinates
zero except for just K of them that are nonzero. So does anybody have a sense of
what the set of all K sparse signals in N dimensional space looks like? Anybody
have a sense? Any feeling? So it's a very, very bizarre set, and I'll just draw you
a picture. It's actually a union of hyperplanes. This is in three dimensions: two
sparse, signals that are K equals 2 sparse, in 3D.
Basically all the signals live along a hyperplane of dimension K that is aligned
with the coordinate axes. So it's a very strange, strange set. Is this a band
limited subspace? Does this look anything like a subspace? It's a union of
subspaces, right? N choose K of them. Many, many, many subspaces. Okay,
so it's a very, very non-linear set.
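The "N choose K of them" remark is worth making concrete: even at modest sizes the number of coordinate subspaces is astronomical. The numbers below are my own illustrative choices:

```python
from math import comb

# K-sparse signals in R^N live on a union of C(N, K) coordinate
# hyperplanes of dimension K -- one subspace per choice of support.
N, K = 1000, 10
print(comb(N, K))   # on the order of 1e23 subspaces: nothing like a single one
```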
Okay. So this is what sparse signals are. You can think of these signals like this:
if I take my vector X and I sort its coefficients from biggest to smallest, there will
be K of them and then at index K they will go to zero.
Compressible is the generalization of sparse where the coefficients don't actually
ever go to zero, they just decay very rapidly, for example like a power law. And
really when you look at pictures like the picture of Nicolaus, there are actually no
truly zero wavelet coefficients, right, in a real image. They're just
getting very, very, very, very small, okay, with a power law, very, very quickly.
So now you can ask, what does the set of all power-law decaying signals look
like in N dimensional space? Okay? And if you think about that, it's basically the
set of signals that live in an LP ball, right? This is just like the L2 energy, where
we take the coordinates XI to the Pth power and we ask is that sum less than
1, for example; that's the P unit ball. But the key thing is we're interested in
powers P less than 1. So non-convex LP balls. These pin-cushiony things
that you all studied in class back in grad school, but you really focused on L1 and
L2 and L infinity and you just talked about the fact that these exist; they're not
really -- they're not convex balls, right?
These are really the set of compressible signals. Again, does this look anything
like a band limited subspace? It's an incredibly sea-urchiny, pointy kind of signal
set.
So I've kind of gone into too much detail here. But what is the one property that
you can sort of pull out of this intuition, the critical property about the geometry of
compressible or sparse signals? The key property is that sparse or compressible
signals live close to the coordinate axes. You see that? See that? Because you're
on a hyperplane aligned with the coordinate axes here, you have to live
somehow close to the axes. Because you're on this pin cushion here, you have
to live close to the coordinate axes. You don't live out away from the axes.
So now let's see how we could use this to solve this recovery problem. And now
I'll speed up a little bit. This is actually a bit easier.
So what are we interested in doing? We have an ill posed inverse problem, we
have a signal X that's sparse, we've multiplied by a random matrix and we've got
our measurements Y.
Now our job is given Y to get X back. So what do we do? What's the geometry
of the setup? Well, phi is a matrix that has a null space. So you can look at all of
the Xs, the infinitely many Xs that when multiplied by phi give Y. They just form
an N minus M dimensional hyperplane in this N dimensional space. And it's
translated over to the true sparse solution X.
And what's the critical point? This is a random matrix, so the hyperplane is at a
random angle. There's this randomly angled N minus M dimensional hyperplane
in high dimensions.
Our job in solving an inverse problem is to find the right X out of these infinitely
many possible Xs, find the right one. So what is the right one? Right? You have
to have some criterion. And the standard one is least squares. That's what we use
all the time.
So we might solve the least squares problem. So we're given Y equals phi X, and I
want to find X. So I search over all X such that phi X equals Y, and I look for the
one with the minimum energy. That's what least squares would do. Okay?
And it's nice. It's nice, it's fast, it's closed form. You just apply the pseudo
inverse. Sounds good. But it's actually the worst thing you can do. Okay? In
fact, provably, with high probability it always gets the wrong answer. So here's
an example of a sparse X: multiply by a Gaussian random matrix, solve the least
squares problem by pseudo-inverse, and this is the result.
Okay? Terrible. Okay? Why is it -- it's all Gibbs phenomena basically, right. But
when you really ask, what is the L2 energy here and here? This has much smaller
energy than this signal. So the problem is not with least squares, the problem is
that least squares says nothing about sparsity of our data vector, it only talks
about energy in our data vector. Okay?
And you can actually look at the geometry of why this happens. Basically when
you talk about solving a least squares problem, you're basically just talking about
blowing up an L2 ball, which is just a big hypersphere, until it touches this
hyperplane. Think of the randomly oriented hyperplane; blow up an L2 ball.
Where is it going to touch? Is it ever going to touch along a coordinate axis?
No. Right? With high probability it's going to touch somewhere out away from
the coordinate axes because of the shape of the L2 ball. So that's why L2 is bad.
Yes?
>>: (Inaudible) but that drawing which is pretty good shows that if you try to ask
a different question.
>> Richard Baraniuk: Right.
>>: Which point in this whole hyperplane is very close to the axis, right?
>> Richard Baraniuk: Yes.
>>: You're going to find (inaudible), right? Because the hyperplane will cross the
axis in several points.
>> Richard Baraniuk: Yes. So let's hold that thought for a minute, okay,
because great point. So this is a problem. This is not good.
So what you might ask is the question that Rico just suggested, which is: replace
the L2 norm, search through the null space, and look for the
sparsest X. That's clear. Because we want a sparse X, not a small energy X.
And you can formalize this by saying it's like looking for the signal with the
smallest L0 norm. This is called L0 optimization. And in fact, this will work.
And in fact, counter to what Rico said, you might think that there are -- basically this is
like blowing up that set of hyperplanes, or looking where that set of K
dimensional hyperplanes intersects with that green hyperplane. And you might think that
there are many, many, many possible solutions, but in fact you can prove that as
long as you have M equals 2K measurements you will find a
unique optimizer, okay, a unique optimizer. And not only that, there is no sparse
solution close to that solution. So it's a great answer. Okay? Really, really great
answer.
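For tiny sizes you can actually check that uniqueness by brute force: enumerate every size-K support and see which ones are consistent with Y. A toy sketch (assuming NumPy; only feasible because N is tiny here):

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n, k = 12, 2
m = 2 * k                      # M = 2K measurements

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
phi = rng.standard_normal((m, n))
y = phi @ x

# brute-force L0 search: try every support of size K (N choose K of them)
solutions = []
for support in itertools.combinations(range(n), k):
    sub = phi[:, support]                       # m x k submatrix
    coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
    if np.linalg.norm(sub @ coef - y) < 1e-8:   # consistent with y?
        cand = np.zeros(n)
        cand[list(support)] = coef
        solutions.append(cand)

print(len(solutions))   # with a generic Gaussian phi: exactly one
```

With probability one for a Gaussian phi, the only consistent support is the true one. But the loop visits N choose K supports, which is exactly the combinatorial blowup that makes this NP-hard at real sizes.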
So this is exciting. This is really good. The problem is it's an NP-complete problem
to solve this optimization, because you truly have to search all N choose K
hyperplanes, see where they cut through the big green hyperplane, and that's an
NP-complete problem. So again, we're just getting excited and we're let down.
So now the answer to Cormack's (phonetic) question. Why did these dudes get
famous? These dudes got famous, these dudes, because they realized that you
could convexify this problem and you could look for a norm in between this L0
norm and the L2 norm -- basically, instead of L0 optimization, use LP
optimization and take the smallest P that's still convex.
Okay? Which is L1 optimization, which lives between L2 and L0.
And in fact, this can be solved as a linear program. You search through the entire null
space for the X with the smallest L1 norm, and you can actually show that, unlike this
case where you need just 2K measurements, as long
as you have K LogN measurements it works. You're still very close to K; it's just a log
factor of N away.
You can actually prove that the dimensionality is right so that you will have an
exact solution to this optimization problem, which is the exactly right signal that
you started with. And not only that, that you were far away from any other sparse
solution. Okay?
So this is interesting because this is now a problem that can be solved in
polynomial time, right, using just a simple linear programming optimizer. And
there are just tons of people now working on linear programs to make this really,
really, really fast. Okay.
So just to summarize this why does this work, well this works because of the
geometry of the L1 ball. So people, remember the L1 ball in high dimensional
space is a diamond. It's like a hyper diamond. Well, it's kind of pointy, right?
Unlike the L2 ball that's kind of bulbous, the L1 ball is kind of pointy. And in fact
as you go to higher and higher dimensions it gets pointier and pointier. And so it
has these points that align with the coordinate axes and as you blow up that L1
ball where does it pierce? It will pierce that hyperplane at the sparse solution.
Okay.
So this is the thing that really broke everything open about four years ago, with people
realizing that they could do randomized data acquisition and they could use L1
optimization to recover perfectly these K sparse signals, or if the signal
is compressible, get a very good approximation. Okay.
So hopefully this long, actually too long foray into the math behind this sort of
gives you a sense of why this is, first of all, interesting and second, very different
from the usual band-limited subspace, uniform sampling, sinc interpolation, right?
It's a highly nonlinear recovery process.
Okay. So the last thing we want to say as far as theory is: what if my signal is not
sparse, and it's not a picture of stars but a picture of Rico or Feng, right?
These signals will be sparse in the Fourier domain or the wavelet domain or the
DCT domain.
So what do we do in that case? Well, in that case, it's very simple, because we
say my data X is not sparse, but it's a sparse combination of some elements of
some other basis, like a DCT basis. Which means there's an N by N orthogonal
DCT matrix that when I multiply it by a sparse vector alpha I get my X. So it's
just a combination of, in this case, three columns.
So now let's ask what happens when I sense X? Well, I'm going to multiply X by
phi. Here's my X down here and here's my phi. And now all you have to do is
remember: if this is an IID Gaussian matrix, what happens when you multiply it by
an orthogonal matrix? You get another IID Gaussian matrix. Okay?
Beautiful, right? So the key is there's a universality property that it doesn't matter
what basis you're sparse in here, whether it's DCT, wavelets, Fenglets,
Cormacklets, right, you will be able to take the same random measurements will
be sufficient to preserve the information in the signal.
The only time you need to know which basis it was sparse in is in the
reconstruction process. Which I think is very useful for SensorNet type
problems.
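Here's a small check of that universality argument (assuming NumPy and SciPy's DCT; purely illustrative): multiplying an IID Gaussian phi by an orthonormal DCT matrix yields a matrix that is statistically indistinguishable from another IID Gaussian phi.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(2)
n, m = 256, 64
phi = rng.standard_normal((m, n))   # IID Gaussian sensing matrix

# orthonormal DCT basis: each column is the DCT of a unit impulse
psi = dct(np.eye(n), norm="ortho", axis=0)

a = phi @ psi   # effective sensing matrix for the DCT coefficients alpha

# rotating an IID Gaussian matrix by an orthogonal matrix leaves its
# distribution unchanged, so a looks statistically just like phi
print(abs(a.mean()), a.std())   # mean near 0, standard deviation near 1
```

So the same random measurements work whatever orthogonal basis the signal is sparse in; the basis only enters at reconstruction.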
Okay. So hopefully that makes sense. Let's just summarize and then go into
some of the applications. So the idea now is instead of uniform sampling and
then compression, we're actually going to take the light coming into our camera
and we're going to take inner products with random basis functions, and in the
image case each row of phi, each row of that phi matrix is just going to be a
different random picture.
And we're going to take inner products between the light coming in with these
random pictures, we're going to do that M times where M is K LogN, and this is
going to be enough information to reconstruct our picture of Nicolaus, using for
example linear programming.
So it's a very different approach of trying to do data acquisition. So why is this
potentially interesting? Well, one, it's interesting from a theoretical perspective
because it just lets us think about data acquisition in a very different way. So this
is clear from everything I'm talking about, right? There are all kinds of interesting
insights into data acquisition that you gain by thinking this way.
The second is there are some nice practical properties that I'll just talk about really,
really briefly. First, I won't have time to talk about this very much, but this is a
stable process, just in case people have questions about this. So if I quantize
the measurements reasonably finely, I will still be able to reconstruct a very good
quality approximation to my original data. If there is noise added to my
measurements, I will still be able to reconstruct a reasonable quality
representation of my data without the reconstruction process going wild, right,
going unstable. So there's numerical stability.
Second, I already talked about this universality that when you take your random
measurements it doesn't matter what basis you're sparse in, you only need to
know after the fact which basis you are sparse in.
Thirdly, and this is very useful for some problems, especially SensorNet problems:
this is a very asymmetrical process. You can think of the standard process for data
acquisition as asymmetrical, where you do most of the work at the encoder.
Think of MPEG, your digital video camera. There's a lot of work going on in
the encoder -- you know, the camera and the encoder.
The actual decoder is simple. You just put everything back together. CS by
comparison is exactly the opposite. The encoder is very dumb, you're just taking
random measurements. It's now the decoder that's smart.
For a lot of applications that's a really bad idea, but for some applications where
you really need to preserve battery life at your encoder or you only have very,
very limited computational resources this might make sense.
Almost lastly, there's a democratic property. And what I mean by democratic is
that because each measurement is a random linear combination of the entries of
a sparse vector, essentially each one of these measurements carries the same
amount of information. Unlike a JPEG compressed packetized data stream, in a
compressive measurement packet stream each packet contains the same
amount of information.
And what that means is there's kind of a digital fountain property that allows you to
just stream packets from a source, and if a whole bunch of them are lost, it doesn't
matter as long as you get enough packets. It doesn't matter which ones; you can
still reconstruct. It is very handy.
And then finally there's a weak encryption property because you encode your
data in terms of random measurements if Feng and I use a random seed to
generate our random matrix and Rico doesn't know it, it would be very hard for
him even if he intercepts my packets to be able to reconstruct the picture that we
were both looking at.
Okay. So I've taken too long. So let's jump right into some applications. Quick
questions?
>>: You did pay a price, right?
>> Richard Baraniuk: What is my price?
>>: M is at least two times K.
>> Richard Baraniuk: Yes. And/or K LogN, but let's even say it's at least K LogN.
>>: (Inaudible).
>> Richard Baraniuk: True. True. Absolutely. But at least in the back of your
mind, just kind of think of it roughly: if you were to encode an N point DCT in terms
of bits, and you're going to throw some number of bits per
coefficient, how many bits do you need to
actually encode the location of each big coefficient? You need LogN bits, I mean,
in a naive coder.
So in a sense, it's similar to what you would get from a naive coder. It would be
K LogN.
>>: (Inaudible).
>> Richard Baraniuk: But a big open issue, and this is what we're actually
thinking very hard about in my group, is, yeah, how to really exploit what's done in
real world coders to do this reconstruction. Yeah. So at least naively it's actually
pretty close.
But there's, you know, 50 years of compression technology and like four years of
work here.
Okay. So let's talk about some examples unless I have some other questions.
Yeah?
>>: (Inaudible)?
>> Richard Baraniuk: Yes. A K LogN measurement camera. Yeah. And I really
should have sort of set that up better. You're going to build a camera where you
don't take N pixels and then compress; you just somehow boil directly down to M
numbers. And that could be M pixels or it could be M time-multiplexed
measurements. And I'll talk about that in like a minute. Good, people pushing the
discussion forward.
Okay. So the first application is in art. So Gerhard Richter is a painter, and this is
one of his paintings. It's called 4096 Colors. And this is actually a valid
compressed sensing matrix. We've been able to show that it actually has the
restricted isometry property, it has the Johnson-Lindenstrauss embedding
property, and as a result it was sold for 3.7 million dollars.
So there's big money in compressed sensing art. That's the first lesson you
should take away. You might not get famous, but you might make big money.
And Richter is actually working now on a set of stained-glass windows for the
Cologne Cathedral that also have the restricted isom -- the block restricted isometry
property.
Okay. Another example that I think will illustrate all the ideas is that we built a
camera to do this. And this is a camera that takes just K LogN measurements
rather than N pixels. So imagine taking the light from the lens as it comes through
a digital camera, and instead of focusing it on a CCD array, focus it on a
micromirror array, the same kind of micromirror array that's inside this projector,
where we have N of these micromirrors, and they are each the size of a tiny -- a bacterium, and they can tilt 10 degrees this way or 10 degrees that way.
So what we do is we put a random pattern on the mirror array with roughly half of
them facing this way, half of them facing that way. The light from the mirrors
facing this way is just thrown away. The light from the mirrors facing that way is
focused via a second lens onto a single photodiode. So what is this? It's an optical
computer to compute the inner product between the light from the scene and one
row of the phi matrix.
We do this M times. It's a time-multiplexed camera; it's not an M pixel camera, but
it's M time-multiplexed measurements. We get our random measurements Y, and
then we reconstruct with a linear program. So this is -- you could actually do this.
And here's our first pictures we ever took.
This is at 50 times sub-Nyquist measurement. So this is just two percent of -- well, actually I should go over here. This is what you should see. This is using
just a portion of the array that is 256 by 256 pixels. So it's a 65,000 pixel image.
This is what you would see with a regular digital camera with this kind of imagery.
And this is what you see with our camera taking just 1300 measurements. So
that's basically 50 times sub-Nyquist measuring.
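The measurement side of the camera is easy to simulate (a hypothetical sketch, assuming NumPy; the real device computes these inner products optically, one mirror pattern at a time):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64 * 64            # "pixels" on the micromirror array
m = 1300               # time-multiplexed measurements, m << n

scene = np.zeros(n)    # a sparse test scene, e.g. a few bright points
scene[rng.choice(n, 20, replace=False)] = rng.uniform(0.5, 1.0, 20)

y = np.empty(m)
for t in range(m):
    # flip roughly half the mirrors toward the photodiode:
    # one random 0/1 row of the phi matrix
    pattern = rng.integers(0, 2, n)
    # the photodiode sums the reflected light: one inner product per pattern
    y[t] = pattern @ scene

print(y.shape)   # (1300,): m numbers instead of n pixels
```

Reconstruction from y would then proceed exactly as in the linear-programming sketch earlier, with phi being the stack of mirror patterns.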
Okay. So you can certainly tell -- it's pretty crappy. But the other nice
thing about these compressive measurements is they have a progressive
property, just like progressive JPEG. If you're willing to take 16 percent of the
number of pixels, you can actually get a representation that's essentially as good
as the original data. So you also have a very scalable measurement system.
Yes?
>>: What were you using for your sparsity model?
>> Richard Baraniuk: Oh, yeah. In this case it's actually Haar wavelets and
TV -- well, in this case actually it was just TV-regularized reconstruction.
But you could -- but it's very close to using L1 optimization with a Haar basis.
>>: (Inaudible) the same measurements that you already have, reconstruct it
differently, potentially doing better, right? I mean, like in the future if you had a --
>> Richard Baraniuk: Oh, absolutely. Absolutely.
>>: Images, you could get better images.
>> Richard Baraniuk: Exactly. Exactly. And that's such a cool thing that I want
to bring that up. So let's imagine we're sending a space probe to Pluto. How
long does it take to get to Pluto? Like 12 years or more. Long time to get to
Pluto. If you were going to build a camera on that space probe, you would put in
today's technology, like JPEG 2000, probably or some fancy wavelet coder. But
then you're stuck. It's fixed. Let's assume you can't reprogram this sensor.
On the other hand, you could put in a randomized measurement camera like this
camera, and it might be based today on wavelets, right? But what's going to
happen in the 12 or 20 years that the camera is travelling through
space? What do we know? Rico's going to have a better basis, Cormack's going
to have a better basis, curvelets will be even improved. So in 20 years we'll have
a basis that can make images even sparser than the bases of today.
What this is basically saying is if you take those random measurements and you
reconstruct today with wavelets and you reconstruct in 20 years with
Cormacklets that can even better sparsify that data, you will get a better picture
out.
So think in terms of all kinds of data archiving, data warehousing, or sort of long
term sensing applications. This is what I call the future-proofness property.
>>: (Inaudible) that's a very interesting generalization. A lot of the advantage
seems to come when you're really dealing with something that's a union of
subspaces, right?
>> Richard Baraniuk: Right.
>>: I don't know why you talk about images so much, because I can't think of a
worse example of a signal set, you know. Because the band-limited
assumption, we all know it's wrong, but it's better than anything else we've got.
>> Richard Baraniuk: Right.
>>: If you're dealing with a, you know, a two-tone image like this, maybe a union
of subspaces makes sense.
>> Richard Baraniuk: Right. Right.
>>: But for pretty much any other image class you want to deal with, union of
subspaces just doesn't really --
>> Richard Baraniuk: Well, I would actually argue that on a first level of
approximation, okay, there are multiple levels of approximation, but on a level of
first approximation I would say that an Lp compressible model, which says for
example that the wavelet coefficients decay with a power law, is a pretty -- to a first
approximation, pretty accurate model for a large class of images.
It basically says that my -- this is really the kind of model that we're exploiting
here. We're not saying that the images are necessarily unions of subspaces.
>>: (Inaudible) already better ways of exploiting that model, right? You know, to
harness the power of what you're presenting here.
>> Richard Baraniuk: Right.
>>: It seems to me you really need something like multi-band signals or things
that are supported over pretty distinguished regions.
>> Richard Baraniuk: Right.
>>: In some high dimensional space. You know, images just don't seem to --
>> Richard Baraniuk: Yeah, well, I'm -- there's a whole bunch of things here.
One is I think it's also very -- this is a first order approximation to good models,
and I think that just as people have developed better and better models for
transform domain compression, the dream is to have those models integrated
in -- they're basically priors on, for example, the correlations of how the
coefficients of those representations behave. Those can be built into, that's the
dream, into reconstruction algorithms. That's my answer, really, to that question.
The second is this is just the first of the applications that I want to talk about, and
I'm not going to say that cameras are the -- this is not intended to replace the
digital camera that you carry around, right, this is just a first example, okay. And
there are all kinds of interesting examples in multi-band signals, A-to-D
conversion over very broad bandwidths, where union-of-subspaces
models are very compelling. Yeah?
>>: That (inaudible).
>> Richard Baraniuk: Yeah, absolutely. Yeah.
>>: (Inaudible). Quick argument. There's actually a very (inaudible) we have
the wavelet model doesn't explain too well, and you compress them 2X or 4X
better than (inaudible) which is building (inaudible) which actually it's very
important when you're trying (inaudible) for example take pictures of streets.
>> Richard Baraniuk: Street view thing.
>>: So that kind of stuff could actually help in that sense.
>> Richard Baraniuk: Yeah, yeah, very interesting. Yeah, absolutely.
Absolutely.
Okay. So but why do we find this interesting for cameras? One of the reasons
we find this interesting is the fact that this is based on a single
photodetector. And just very briefly, think for one second about why you can buy
a one megapixel digital camera these days for about 10 bucks. Anybody really
thought about why digital cameras are so cheap? Anybody ever thought?
Pardon?
>>: (Inaudible).
>> Richard Baraniuk: That's exactly right. Right. There's an incredible
coincidence that the wavelengths that our eyes are sensitive to just happen to
be the wavelengths that silicon is sensitive to. Which means that digital camera
technology can essentially ride on Moore's law and processor technology and
have tremendous economies of scale, using processes and ways to build the
chips that are very similar to computer chips.
As soon as you move outside of the visible wavelengths, your $10, your $100
camera becomes a $100,000 camera. You move into the far infrared, gamma
ray, terahertz, other kinds of regimes.
So the thing that's very, very interesting is that this compressive approach can
actually scale into those regimes. So let's just take an example. Say you want to
build a really low light camera, where you can't see with regular CMOS imaging chips.
What you have to use is what's called a photomultiplier tube. Right? People
know about photomultipliers? Basically it's for extremely -- when you just have
hundreds or thousands of photons coming into a camera. Think of a very dark
night doing astronomy, okay, or a particle physics experiment, okay.
Photomultiplier tubes: one, they cost $1,000; two, they're physically as large
as my fist. So now try to build a one megapixel photomultiplier tube camera.
There's one organization in the world that can do this, and it's (inaudible), right,
for particle physics experiments, and they cost millions of dollars to build these
cameras.
Instead what you can do is just take the camera that we built, swap out the photo
(inaudible), put in a photomultiplier tube, and build a thousand dollar or
eleven-hundred dollar very low light imaging camera that can work not
only in visible wavelengths but even into the infrared. Okay? So our feeling is
that the biggest impacts from this compressive technology, at least in the camera
world, are going to be outside of where silicon really has a stranglehold,
because it's very, very natural. Yeah?
>>: (Inaudible).
>> Richard Baraniuk: Good point. Good point. What about video, say. Right?
Okay. I can talk to people offline after the talk, but the cool thing is if you think in
terms of video as a space time cube and think of having sparsity in some
sort of space time sense, you can actually do video reconstruction using space
time type processing, and we actually have some examples that I could show.
Okay. So let me ask briefly because I realize I'm at 11:30. Should I take an
extra five minutes and talk about some pattern recognition applications or should
I just shut down and go to the summary and then talk with people offline about
that? Because I really don't want to overstay my welcome. Five minutes is fine?
Okay.
So let's just jump straight to data processing. So what we've talked about up to
now is data reconstruction, the idea of reconstructing pictures. But the thing
that's very, very useful about compressive measurements is that they are also
very, very useful for solving inference problems, like pattern recognition type
problems. Is this a picture of Rico, of Cormack, or of Feng, okay?
And the good news is that these kinds of randomized measurements support the
kinds of learning and machine recognition type problems, inference type problems --
you can basically do all of those kinds of tasks on the compressive
measurements. And to make a long story short, you could think of these random
projections very roughly like sufficient statistics for sparse and compressible
data. And here we start to get very, very related to the work that goes on in
random projections for machine learning. And I could talk with people about that
offline.
So let's talk about a very, very simple pattern recognition problem that's being
sponsored by the Environmental Protection Agency, where we're interested in
flying UAVs, unmanned aerial vehicles, over American cities and basically finding
gas guzzling SUVs and blowing them up, okay. Because we really need to get
away from dependence on oil.
But at the same time we want to protect the future of our country and our
important national infrastructure, right? So we want to find SUVs, but we want to
differentiate them from buses and from tanks.
And so the question becomes how to detect and classify what kind of target I'm
looking at. And the problem always in these kinds of problems is there are
nuisance parameters. Namely, I don't have a registered image with the
bus or tank exactly in the center of my image, scaled correctly, at the right rotation.
I always have some sort of articulation parameters. And so you know what we
do? We do what's called a matched filter, which means I'll have a database of
pictures, right, of different rotations, translations, and then I'll just do a
comparison and find the best fit, okay. So this is what I do essentially. I have
some data, a picture of a bus. I want to know, is this a picture of an SUV? Well,
I'll have a database of pictures of SUVs at all different translations, for example,
and rotations.
And now what I have is a pattern recognition task in high dimensional space.
These are N dimensional points. Each picture is just a point in R^N. And now I
just need to ask the question: what is the distance from my data to the best fitting
SUV, best fitting bus, and best fitting tank? Closest wins. Everybody got that? It's
standard. This is the matched filter. You can do closest wins either by L2 distance or
by inner product. All right. It's the same thing.
So that's what we do. And now let's think, though for a second. So how does
compressive measurement come in here? We can think for a second what is the
structure of this database of pictures of SUVs? What is the structure of a
database of pictures of the same object with different rotations and different
translations, horizontal, vertical. How many parameters do I have? I have three
parameters, right?
Translation, up, down, and rotation. Well, in fact you can show that the set of all
pictures with these three articulations forms a low dimensional manifold in this
high dimensional space. In fact, it forms a three dimensional manifold or three
dimensional sheet in high dimensional space.
So really when you think of matched filters and you think of closest, nearest
neighbor classification, in this case it's really nearest manifold classification. I
have a data point. Am I closest to the tank manifold, the bus manifold, or the
SUV manifold? And that's really what the matched filter does. So where does
compressed sensing come into this?
Well, the really nice thing is that you can actually show that if you have a smooth
manifold that is K dimensional now (so it's not sparsity, this is just K, the number of
articulation parameters), and you do a random projection of that manifold to a low
dimensional space, and you take K LogN measurements, you can actually show
that you stably embed that manifold, meaning all the geometry and topology of
that manifold here in high dimensional space is preserved under a random
projection to a low dimensional space.
So what does that mean? That means we can actually do compressive inference
directly in the low dimensional space without ever having to reconstruct back to
the high dimensional space. So here's just to make it clear. Instead of
reconstructing from our camera, maybe this compressive camera, a picture of
each of these, we can actually do the nearest manifold classification directly on
the randomized measurements from each of these three targets. Which saves
us from ever having to solve a linear program to reconstruct, lets us work in a
much lower dimensional space, and lets us store templates that are
much lower dimensional than traditionally done. Okay?
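Here's a toy sketch of that compressive matched filter idea (assuming NumPy; the "manifolds" here are just random template sets standing in for sampled articulations, not real imagery): classification happens entirely on the M measurements.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4096, 100        # image dimension, compressive measurements

# hypothetical templates: 50 stored "articulations" per target class,
# standing in for points sampled from each target's manifold
templates = {c: rng.standard_normal((50, n)) for c in ("suv", "bus", "tank")}

phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random projection

# test image: a noisy view of one stored SUV articulation
x = templates["suv"][7] + 0.1 * rng.standard_normal(n)
y = phi @ x              # all we ever see: m measurements, no pixels

# nearest-manifold classification directly on the measurements:
# project every template with the same phi, closest wins
def classify(y):
    dists = {c: np.min(np.linalg.norm(t @ phi.T - y, axis=1))
             for c, t in templates.items()}
    return min(dists, key=dists.get)

print(classify(y))   # "suv", with high probability
```

Because the random projection approximately preserves pairwise distances, the nearest template in measurement space matches the nearest template in pixel space, without ever reconstructing.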
And here's just an example showing classification rate with differing amounts of
noise. In moderate amounts of noise, even with just 50, 60, or 100 random
measurements (these are 65,000 pixel images), so with just on the order of 100
measurements, you can actually get very, very high classification accuracy and
basically single pixel resolution of the translation of the object.
So showing that you can do a lot of pattern recognition directly on these
randomized measurements. So hopefully that makes sense.
So I think I will end just by saying you can actually push this into the network
sensing realm and in particular what you can show is that if you have network
sensing resources, be they cameras, SensorNet nodes, et cetera, you can
actually show that unlike a standard SensorNet system where the amount of
communication out of the network will scale with the number of sensors and the
resolution, okay, you can actually show that if you take randomized
measurements of the data within your network, you can have communication
and computation bandwidth that scale only linearly in the information rate of
your sources and logarithmically in both the number of sensors and the
resolution of those sensors. Which is what we feel is
really, really critical to being able to deploy very, very wide networks of high
dimensional imaging, you know, imaging or data acquisition sensors. You really
have to move towards systems where you can have logarithmic growth of
computation and communication rather than linear growth, right, to avoid
bandwidth collapse.
Okay. So I think I will end there and just jump straight to the conclusions,
because I think I overstayed my five minutes and made it ten minutes. And just
ask if anybody has any questions? And I think I will just put up this website that's
the community resource for the compressive sensing community. We passed
about 200 papers recently, but they're nicely grouped into tutorials and
applications and network sensing in other areas. So I just invite you to go here
and grab any information that you'd like, and thanks very much for all your
questions.
(Applause).
>> Richard Baraniuk: Any additional questions? Yes?
>>: (Inaudible) I guess you're -- the contribution is more to (inaudible) that
random sensing is applicable for a large sort of (inaudible) and the
compressibility (inaudible) how would you compare how much compression you
get from, you now, that -- you know, (inaudible).
>> Richard Baraniuk: Right. Right. Okay. So that -- and that -- yeah. So that's
a very good question. That's a very good open research problem. And let me
kind of restate it. Basically you're right on, that the contribution of this work is to
show that we can acquire data using randomized data acquisition and then be
able to do things with that randomized data, either process it directly or fuse it
together or reconstruct. However, all of the theory that I've talked about is based
on very naive models, right. In fact, I can actually go right to a picture here.
Based on a very naive compressibility model. And this is to Cormack's point, that
my data either is K sparse, right, meaning that at any one given point in time
there are only three active Gabor coefficients, or I have a model that my
wavelet coefficients decay like a power law, which is an independent
coefficient model.
So I would argue that this whole field of compressive sensing is predicated on
the absolutely most naive possible compression models. Right? Which would
be equivalent to a JPEG model or a JPEG 2000 model where you just compute a
wavelet transform and then just look for the biggest coefficients, encode their
values and their locations naively and then throw away all the rest.
We know that this isn't what's done, right? There are all kinds of elegant
structures that have been found underlying wavelet transforms, underlying Gabor
transforms and I think that it's a very important open problem to try to tease those
out of these -- basically utilize those to do even better with randomized
(inaudible).
So here's an example. So we know what Cormack (phonetic) would say is -- you
know, in a wavelet compression algorithm I know there's a tree structure, and
if I have a big wavelet coefficient, I know it probably has one of four big child
wavelet coefficients, and I know the parent is probably large, and I can exploit that to
encode location information very efficiently. That's not exploited in this other
model.
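The parent-child tree structure just described can be sketched as follows. The `(level, index)` addressing, the coefficient values, and the threshold are all hypothetical, purely to illustrate how significant wavelet coefficients cluster along connected parent-child chains, which is what lets a coder describe their locations cheaply:

```python
# Hypothetical wavelet-tree addressing: coefficient (level, i) has children
# (level + 1, 2*i) and (level + 1, 2*i + 1). Large coefficients tend to form
# a connected subtree, so a significance map can be coded as a pruned tree
# instead of naively listing every large coefficient's location.

def children(level, i):
    return [(level + 1, 2 * i), (level + 1, 2 * i + 1)]

def significant_tree(coeffs, threshold, level=0, i=0):
    """Return the connected subtree of significant coefficients rooted at (level, i)."""
    if (level, i) not in coeffs or abs(coeffs[(level, i)]) < threshold:
        return []
    nodes = [(level, i)]
    for child in children(level, i):
        nodes += significant_tree(coeffs, threshold, *child)
    return nodes

coeffs = {(0, 0): 9.0, (1, 0): 4.0, (1, 1): 0.1, (2, 0): 3.0, (2, 1): 0.2}
print(significant_tree(coeffs, 1.0))   # [(0, 0), (1, 0), (2, 0)]
```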
I know in this Gabor case that not only are three Gabor coefficients active at
any given point in time, but they evolve smoothly with time. So in fact, what we've
been doing over the last few months is actually exploiting these, trying to fuse in the
theory of graphical models, which can very effectively capture these secondary and
tertiary structures, and then use those to do better at, for example, reconstruction
or other tasks.
And this is just one example. So remember I talked about the log N
dependence? So this is an example of reconstructing a simple target but needing
a number of measurements that essentially grows linearly with K rather than
with K log N. And these are the state of the art techniques, okay, that scale with
K log N. And the difference is these are based on this naive coefficient model,
and this is based on a graphical model, a Markov random field type model.
>>: (Inaudible).
>> Richard Baraniuk: What these are based on -- these are state of the art
compressive sensing results, which are based on lossy compression results of
15 years ago, 20 years ago, right. So this is moving closer to compressive
sensing results based on lossy compression results from five years ago. And I
think the goal is to really try to bring it up to date. Hopefully that answers your
question. Yeah?
>>: The examples you showed a second ago with these graphical models -- I mean,
can this still be in the framework of sparsity, where you just haven't captured the full
sparsity?
>> Richard Baraniuk: Exactly. I would argue that, going all the way back to
Cormack's (phonetic) question, to a first approximation both of these transforms
are absolutely sparse. That's a good first approximation. Then to a second
order approximation they have this structure that the large coefficients have
correlations in their magnitudes, as I indicated. And so you get a nice big -- you
know, you get to K log N by this initial sort of sparsity argument, and then
you can get, you know, an even lower number of measurements by using this
secondary graphical model type structure.
And you can actually rigorize this and figure out exactly how many
measurements you do need. Yes?
>>: There's a -- I mean, if you look at (inaudible) sparse representations before,
right, (inaudible).
>> Richard Baraniuk: Absolutely.
>>: And there's a long -- I mean, there have been a number of cases, like, say if
you look at the zero crossings, what (inaudible).
>> Richard Baraniuk: Yes.
>>: If you look at the fractal stuff, if you look at a lot of the kind of stuff that
happened in the '90s. There was a common trend among several of those really
imaginative, really creative, really exciting representations.
>> Richard Baraniuk: Right.
>>: And I think each of them kind of came to grief on, you know, being very,
you know, unstable (inaudible). Once you start quantizing, you know -- you can
prove very nice reconstruction results, but, you know, quantization was like
(inaudible). I think the conclusion was that, you know, yeah, deviating from
uniform sampling is exciting, but as soon as you do, things can go to hell in a
handbasket quite quickly.
>> Richard Baraniuk: I think.
>>: (Inaudible).
>> Richard Baraniuk: So I think that's a very good question. The question was --
well, I think everybody can hear -- having to do with stability. So I would just
say two things. The first is for mid to high signal to noise ratio
problems -- well, first of all I'd just say this theory is numerically stable, unlike the
zero crossing work, which was not numerically stable. You can actually prove
results here saying that if the input, the signal X that's coming into my sensing
system, has a certain SNR, I actually have a bound on how badly
the SNR could be degraded after sensing randomly and then reconstructing.
So unlike those theories, there actually are stability results. So I think
that that's one of the things that's most exciting about this theory: it can
actually be applied in a lot of different types of practical problems. However, I
will agree with you that high degrees of quantization and high amounts of noise
are still a challenge, and basically what you start to get into there is less and
less sparse approximation problems and more and more finding-needles-in-a-haystack
type problems, which become very, very difficult. So the way I would characterize
it is there's a graceful degradation of uniform sampling with SNR down to really
quite low SNRs, whereas here there will be a threshold effect. You'll have a very
nice decay, and then once the noise becomes too large you'll basically lose your
ability to sense.
But there has been some progress on this, and Petros Boufounos at Rice
actually has been able to do some really nice work on reconstruction using just
single-bit measurements. So taking that Y vector, right, the randomized
measurements, and actually just taking the sign of those, so single-bit
measurements, and then reconstructing, not using a linear program but by other
kinds of means. And this is an example of the target he wants to get -- this is a
4096-pixel image, 8 bits per pixel, that's what you'd like to see, and this
is actually essentially one bit per pixel measurements, which is pretty good, and
this is actually one-eighth bit per pixel measurements.
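The single-bit measurement idea mentioned here can be sketched minimally. This assumes a generic Gaussian sensing matrix and a made-up sparse signal; the actual reconstruction algorithms from that work are not shown. A crude back-projection just illustrates that the signs alone retain the signal's direction (amplitude is lost, so recovery is only up to scale):

```python
import numpy as np

# Sketch of 1-bit compressive measurements in the spirit of the work
# mentioned above: keep only the sign of each random projection.
# Amplitude information is destroyed, so reconstruction can recover the
# signal only up to scale (typically constrained to the unit sphere).

rng = np.random.default_rng(1)
N, M = 128, 512

x = np.zeros(N)
x[:3] = [1.0, -0.5, 0.8]            # a 3-sparse toy signal

Phi = rng.standard_normal((M, N))
y = np.sign(Phi @ x)                # single-bit measurements: +1 / -1

# Crude check: back-projecting the signs still points toward x.
x_hat = Phi.T @ y
corr = x_hat @ x / (np.linalg.norm(x_hat) * np.linalg.norm(x))
print(round(corr, 2))
```

With many more sign measurements than unknowns (M = 512 versus N = 128 here), the normalized correlation between the back-projection and the true signal is high, even though each measurement carries only one bit.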
So what we're hoping we can do is to push this, you know, just be able to follow
the trend, and as better and better compression algorithms are developed, try to
be able to sense or reconstruct data at rates at least close to those kinds of
(inaudible).
>> Feng Zhao: Thank you.
>> Richard Baraniuk: Okay. Thank you.
(Applause)