Sing Bing Kang: It's a pleasure for me to introduce Mohit Gupta. Mohit graduated with a Ph.D. in robotics from CMU. He's currently at Columbia, where he's a research scientist, and he will be an assistant professor at the University of Wisconsin-Madison starting January of next year. Mohit is an expert in computational imaging, more specifically in computational cameras, where he's interested in coming up with systems that are robust under very challenging situations. Mohit just told me that one of his works in structured light has been licensed by a Japanese company. So that's great. So, Mohit. >> Mohit Gupta: Thanks. Thanks, Sing Bing, for the introduction, and thanks a lot for inviting me. I'm going to talk about 3D cameras. Now, 3D cameras have a long history. This is perhaps the first known 3D imaging system. It's called photo-sculpture and was invented by a French sculptor named François Willème. It consisted of a large room, something like this, and the subject who needed to be scanned would sit in the center of the room. There were a lot of cameras on the walls of the room. Each of these cameras would take an image, and the images were then manually put together to create a physical replica of the subject. Now, you can imagine this is not a very portable system. It is not very fast. And it did not produce very high resolution 3D shape. We have come a long way since then. Current 3D cameras can capture very high resolution 3D shape in near realtime. 3D imaging techniques, at least most of them, can be broadly classified into two categories. First are the triangulation-based systems like stereo or structured light. This is a typical structured light system. It consists of a camera and a projector. The projector projects a known structured illumination pattern on the scene. The camera captures an image. And because we know the pattern, we can find the correspondences between camera pixels and projector pixels, which we can then use to triangulate and get the scene depths. For those of you who know stereo, this is basically a stereo system, but instead of two cameras, we have one camera and a light source. The second category is time of flight. Again, a time of flight system consists of a light source and a camera. Now, in the simplest form, the light source emits a very short light pulse into the scene. The pulse then travels to the scene and then comes back to the sensor. There's a high-speed stopwatch that measures the time delay between when the pulse was emitted and when it was received. Now, because we know the speed of light, we can compute the depth or the distance by this very simple equation here. There's a factor of two here because light travels the distance twice, back and forth. Now, this is one classification, but another way to classify 3D cameras is active versus passive. For the purpose of this talk, we're going to focus on active cameras, which use a coded light source to illuminate the scene. So both structured light and time of flight are active imaging techniques, and they currently form the basis of a large fraction of 3D cameras, especially in the consumer domain. You guys all know about the Kinect. Soon we will have these devices in our cell phones and in our laptops. Because of their low cost and small size, these cameras really have the potential to be the driving force behind a lot of new imaging applications. So think about augmented reality, robotic surgery, autonomous cars, industrial automation.
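As an illustrative aside (not part of the talk), the pulsed time-of-flight depth equation mentioned above can be written out in a few lines. This is a minimal sketch of my own; the function name and numbers are assumptions for illustration only.

```python
# Pulsed time-of-flight: depth = c * t / 2, where t is the measured
# round-trip delay of the light pulse. The factor of two is because
# light travels to the scene and back.

SPEED_OF_LIGHT = 3.0e8  # meters per second (approximate)

def pulsed_tof_depth(round_trip_delay_s: float) -> float:
    """Convert a measured round-trip pulse delay (seconds) into a depth (meters)."""
    return SPEED_OF_LIGHT * round_trip_delay_s / 2.0

# Example: a 20-nanosecond delay corresponds to a point about 3 meters away.
print(pulsed_tof_depth(20e-9))  # ~3.0
```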
Again, so there's a lot of potential. But in all these scenarios, these cameras will have to really step out of the laboratory, out of their comfort zone, into the real world, where they face a lot of challenges. So here is a 3D camera. It has its source and its sensor. Now, it is okay if this light source is the only one that is illuminating the scene. But in outdoor scenarios, especially for autonomous cars, this light source has to compete with sunlight, which is much, much brighter. So here's an example. This is an example image and the 3D image captured by a time of flight camera. But this is just around dawn, around 6:00 a.m., so the sunlight is not very strong. Now, I created a time-lapse video by capturing an image every ten minutes. So see what happens as the sun rises. Wherever there is strong sunlight, the camera cannot measure 3D shape. Another major challenge is if there is some kind of scattering media present between the camera and the scene. Again, this scenario arises a lot in autonomous cars, or even in medical robotics, when there is stuff like fog, smoke, or medical -- or body tissue. So here's an example. This is the video of a self-driving car. First, this is on a clear day. Now, the car approaches a turn, and it's nice and clear, so it makes the turn successfully. Now, the same car, the very next day, when there is fog and rain. So watch what happens as the car approaches the turn. It cannot make it. It goes into the curb. So this is clearly not acceptable. We would like to avoid such a thing. Now, these challenges are due to environmental factors. These are things which are external to the scene. But even if we are in a controlled environment, let's say indoors, maybe on a factory floor, there are challenges due to the scene itself. For example, many of these cameras assume that light bounces only once between when it leaves the source and gets to the sensor. Now, this is fine if the scene is perfectly convex. But in general, light will bounce multiple times. So think about this room, or think about a robot which is exploring an underground cave. Light is going to bounce multiple times. And finally, if the scene is not made of nice, well-behaved diffuse materials but something more challenging like metal, glass, or something translucent like skin, again, these cameras cannot measure shape reliably. So if you look at the evolution of 3D cameras over the last 150 years, we have made a lot of progress in terms of speed, size, and cost, but I believe that the next generation of 3D cameras, if they have to make a really meaningful impact in our lives, will have to perform reliably in the wild. And by in the wild, I mean in every environment and for every scene. So now, I'm going to talk about some of our work on dealing with these core challenges. And I'll start with ambient illumination, or sunlight. Now, the first idea, the very first thing that will come to your mind for dealing with ambient illumination, is why don't we capture two images? One where the light source is on, so that the total light is the sum of I-source and I-sun. And then we capture another image where the light source is turned off or blocked, so that the camera captures only the sunlight. And then by simple subtraction, we can remove the sunlight. Now, it would be nice if this simple approach worked, but unfortunately, it does not. The previous example that I showed was after the subtraction.
And the reason why this approach does not work is that there's a more fundamental effect at play here, because of the particle nature of light. Now, when a camera measures light intensity, it does not measure it as a continuous wave but as discrete particles called photons. And the arrival of photons is not uniform. It's a discrete random process, which means that if the mean rate of arrival is, let's say, three photons per unit time, the camera may measure three photons, it may measure six photons, or it may measure zero photons. So there's an inherent uncertainty associated with any light measurement. We express this uncertainty by using a random variable called photon noise. So the actual measured value is the sum of the true value and this random variable, photon noise. Now, this is a discrete process, so it follows a Poisson distribution. But the important thing that I want you to remember is that the standard deviation of this photon noise is equal to the square root of the mean value. So the more light that is captured, the more the uncertainty. Yes? >> This is more of a problem if your [indiscernible] time is very short, right? >> Mohit Gupta: This is more of a problem if your [indiscernible] time is too short, but it's also a problem if the light that you're sending out from your light source, which is carrying useful information about the scene, is much smaller compared to the ambient light, because the ambient light is also going to contribute photon noise, but it's not carrying any useful information. >> But photon noise is an unavoidable limit, but then you also have implementation issues like, you know, the amount of charge that a bucket can hold, so the dynamic range, readout noise, there's a whole bunch of things. >> Mohit Gupta: That's exactly true. Now, the reason I'm focusing on photon noise is because, as you said, this is a very fundamental property of light, and no matter how good your camera is, even the human eye, you cannot avoid photon noise. So let's see how photon noise affects 3D imaging. So now, you have this camera, and the scene is also illuminated by sunlight. Now, the total measured light has three components. One is the source light, then there is the sunlight, and then there is the photon noise. Now, we call the source light the signal component, because it is the only one which is carrying useful information about scene depths. Now, as we said before, the standard deviation of the photon noise is equal to the square root of I-source plus I-sun. Now, we are looking at scenarios where sunlight is much stronger, so we can approximate it by the square root of I-sun. Now, the quality of 3D shape that this camera measures is given by the ratio of the signal component and the noise. This is the signal-to-noise ratio, and in this case, it is I-source divided by the square root of I-sun. Now, immediately here, we can see the problem. If sunlight is much stronger than the source light, then you're not going to be able to measure 3D shape reliably. So how does this -- >> Are you going to say something about wavelength effects? You know, using narrow band filters? >> Mohit Gupta: So we can increase this ratio by using wavelength filters. But as I'm going to show, that increases this term by a factor of about one order of magnitude. Now, in practice, we are looking at a gap of about 3 to 5 orders of magnitude.
So even if we do this filtering, and in all our experiments we do, we still need something more. But that's a very good question. So yeah. Let's look at what this ratio looks like in practice. So here is a plot of sunlight strength measured over a typical day. The X axis is the time of day and the Y axis is the ambient illumination strength. This is a log scale. And these are images of the sky at different times of the day. And this is the range of strengths of typical artificial light sources. If you have a very bright spotlight, you're somewhere around here. But if you have a weak source, maybe a pocket projector, you're somewhere around here. So in all these scenarios, the artificial source is about 2 to 5 orders of magnitude weaker. And that's what makes the problem so challenging. So how do we deal with this? So now that we have established that the problem is photon noise, which is a random variable with a high standard deviation, the first idea is to just capture a lot of images and compute the average. Now, we know from statistics that the standard deviation of the mean of a bunch of random variables is lower than the individual standard deviations. In particular, if you capture N images and take the average, the standard deviation of the average is going to be lower by a factor of square root of N, which means the SNR is going to be higher by a factor of square root of N. So if you plot the SNR as a function of time, there is growth, but it's a very slow, square-root growth. Meaning, if we want to increase the SNR by a factor of ten, we have to capture a hundred times more images. And that's not really practical, especially in outdoor scenarios where we have to make decisions in realtime. Now, the problem with this approach is that you're starting with a light source that has a small amount of power, and then you're diluting it even further by spreading it out over the scene. And then you have to compensate for that by doing this averaging in post processing. So what if we could increase the signal strength before the images are captured? So instead of spreading the light source power over the whole scene, what if we could divide the scene into N parts and focus all the available light into one part at a time? We are still capturing N images, the same as the averaging approach. But now the SNR increases by a factor of N in the same total time. So in the same total time, there's a much faster, linear growth. So the next question is, if focusing light is good, then why don't we concentrate all the available light into a single scene point at a time? And that is the approach taken by many commercial LIDAR systems. It works, in that the signal strength is very high, but it's a very slow process. You have to scan over the scene one point at a time. Now, the other extreme is of course the averaging approach, which we talked about. That's also very slow. Now, our main observation, our main contribution here, was to identify that these are just two extremes in the space of light spreads. There is this whole unexplored space in the middle here, and the optimal light spread lies somewhere in between. Intuitively, the spread should be small enough that we don't have to do too much averaging, and it should be large enough that we don't have to do too much scanning. So we've derived an expression for the optimal light spread. It depends on the light source and sensor characteristics.
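As an illustrative aside (not part of the talk), the photon-noise argument and the two SNR scaling behaviors just described can be checked with a small simulation. This is a sketch of my own, assuming the Poisson model from the talk; the photon counts, variable names, and the simple factor-of-N bookkeeping are assumptions for illustration, not the speaker's code.

```python
# Photon-noise sketch: ambient subtraction removes the sunlight's mean but not
# its noise, averaging N images only buys sqrt(N) in SNR, and concentrating the
# source power into 1/N of the scene at a time boosts the signal itself by N.
import numpy as np

rng = np.random.default_rng(0)

I_SOURCE = 10.0     # photons from the active light source (illustrative)
I_SUN = 10000.0     # photons from sunlight (orders of magnitude brighter)
TRIALS = 100000

# Ambient subtraction: one image with the source on, one with it off.
with_source = rng.poisson(I_SOURCE + I_SUN, TRIALS)
ambient_only = rng.poisson(I_SUN, TRIALS)
difference = with_source - ambient_only
print("mean after subtraction:", difference.mean())   # ~I_SOURCE, the bias is gone
print("std after subtraction:", difference.std())     # ~sqrt(2 * I_SUN), the noise is not

def snr_averaging(n_images):
    # Single-image SNR is I_SOURCE / sqrt(I_SOURCE + I_SUN);
    # averaging n_images shrinks the noise by sqrt(n_images).
    return np.sqrt(n_images) * I_SOURCE / np.sqrt(I_SOURCE + I_SUN)

def snr_concentration(n_parts):
    # All the source power is focused on 1/n_parts of the scene at a time,
    # so the scene points being measured receive n_parts * I_SOURCE photons.
    boosted = n_parts * I_SOURCE
    return boosted / np.sqrt(boosted + I_SUN)

for n in (1, 10, 100):
    print(n, round(snr_averaging(n), 3), round(snr_concentration(n), 3))
# Averaging grows like sqrt(N); concentration grows roughly linearly in N,
# until the boosted source term starts to dominate the sunlight term.
```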
But the most interesting thing here is that the optimal spread actually depends on the amount of sunlight. Which means that in different scenarios, let's say a sunny day, a cloudy day, or night, the optimal light spread is going to be different: decreasing sunlight, increasing light spread. Now, this is an interesting design principle. We are saying that the optimal camera for an outdoor scenario is one that really adapts to the amount of sunlight. Now, this is fine [indiscernible], but then a practitioner might ask, how do you implement such a light source? Remember that we cannot do this by blocking light. We don't want to lose light. So we really have to change the spread of light adaptively to the environment. Now, if you look at this laser pointer, it is focusing all the light into a single point. If you look at this projector here, it is spreading out all the light. So these are fixed light spreads. How do you implement a light source with variable light spread? We solved this optical problem by making this mechanical apparatus. There's a polygonal mirror, there's a laser diode, and there's a cylindrical lens. And as the mirror rotates, this laser sheet is swept across the scene. This apparatus is typically used in laser scanners, and typically the mirror rotates very fast, so that in the time that it takes for a camera to capture an image, the sheet has swept across the whole scene. Now, our main idea here was that if we could somehow control the rotation speed of the mirror, then we could emulate a light source with different amounts of light spread without losing any light. So for example, if the mirror rotates slowly, the sheet sweeps through a smaller area, and each point receives a lot of light. If the mirror rotates fast, the same sheet sweeps a larger area, but each point receives a small amount of light. And we're not losing any light here. So these are some example images captured by our system. Here the scene was just a single flat white wall. From left to right, the rotation speed decreases. And if we plot the intensities along this horizontal scan line, we can see that from left to right, the spread decreases but the peak intensity increases. The area under the curve is the same for all of these; that's the total amount of light going into the scene. Now I want to show some real experimental results with the setup. So here's an example object placed outdoors. Very strong sunlight, around 75,000 lux. In comparison, the total brightness of the light source was close to 50 lux. So again, maybe three to four orders of magnitude difference. And Rick, to answer your question, we used a spectral filter to block out some of the sunlight, but we still need to do more. This is the result that we get with the frame averaging approach. We see a lot of holes. It's incomplete. This is the result that we get with the point scanning approach. Now, I want to explain this result, because we used a fixed time budget for all the methods. Because we're capturing a small number of images, we have to subsample the scene with the point scanning approach, and that's why we see this blocky result. All the surface details are lost. With the same number of images, and with the same power, this is the result that we get with our method. >> What happens if the scene contains like [indiscernible] not very dark? >> Mohit Gupta: Yeah. That's something which we're thinking about right now.
Right now we assume that the entire scene is lit with the same amount of ambient illumination. And if there is a very strong shadow edge, let's say, then ideally you would like to do something different for different parts of the scene. We don't have that right now, but that's a very good future direction. >> You're using a rotating mirror. But could you use a MEMS device to deflect things? >> Mohit Gupta: You could do that too. >> And then the cylindrical part which spreads vertically, could that be a deflector as well? >> Mohit Gupta: You could have that be a deflector as well, like [indiscernible]. And that will probably give you more flexibility. This was kind of the simplest implementation, and this device that we built actually looks like an exotic device, but it's available off the shelf. The only thing that we had to do was change the speed. >> So [indiscernible]. >> Mohit Gupta: In this case, I believe we used about 30 images in total. So the acquisition time was close to one second. Yeah. So now that we have this device, we went around the Columbia campus scanning all these structures. So this is a marble statue around noon. Again, very strong sunlight. We can still recover very fine details here. This is a church on the Columbia campus. Again, strong sunlight, and we can still recover high levels of detail. Now, getting this kind of fine detail can be useful for [indiscernible] navigation, but also in virtual reality and digital tourism applications, where we want to acquire highly detailed 3D structures of large-scale buildings and cities. So just to summarize this part: we looked at the effect of photon noise on 3D imaging, and we showed that the optimal thing to do is to adapt the spread of light according to the environment, and that this should be done before the images are captured. So next, I want to talk about the problems of scattering and dealing with difficult geometry. The underlying physical processes are very similar, so I'm going to talk about them together. I'll start with geometry, actually. So suppose you're an architect and you want to capture the 3D shape of this room here. And suppose you are using a time of flight camera. Now, you would be okay if there were only this direct reflection. But because this is an enclosed scene, there will be all these indirect light paths, which are called interreflections. Now, these indirect paths are longer than the direct path. So because of that, the camera will overestimate the scene depths. So here's a simulation. This is the camera view. This is the ground truth shape along this horizontal scan line. It's a square. This is the shape recovered using a conventional time of flight camera. There are two things to notice here. One, the errors are quite large. The scale of the scene is about three meters, but the error is about one meter, which is not acceptable in many settings, in robot navigation, in modeling, et cetera. The second thing to notice is that this is not a random, noise-like error. It's a very structured error which cannot be removed in post processing by simple filtering. So again, you have to do something before the images are captured. So how do we deal with this problem? First, let's look at the image formation model in a bit more detail. Now, this is a time of flight camera, but we are going to use the continuous-wave model of time of flight, where the light source does not emit a short light pulse but emits light continuously. And the intensity of the source is modulated over time.
So for example, it could be a sinusoid over time. So this is the light source intensity on the Y axis; the X axis is time. Now, this light ray travels from the source to the scene and then comes back to the sensor. The light received at the sensor is also a sinusoid over time, but it's shifted. The amount of this phase shift is proportional to the travel distance. And we can measure the phase shift, and hence we can measure the travel distance. That's how continuous-wave time of flight cameras work. Now, suppose the scene is a single flat plane. Each camera pixel receives a single light path, and there will be a sinusoid corresponding to that. But if you make the scene a bit more interesting, a bit more realistic, this camera pixel starts receiving light along this indirect path as well. Now, this path has a different length, so the sinusoid corresponding to it will have a different phase. There's another light path whose sinusoid has yet another phase. Now, eventually the camera integrates all these rays, which means all these sinusoids are going to get added together. We will get another sinusoid which now has a different phase than what it should be. And this phase error is what results in the depth errors. Now, interreflections are an important problem in vision. They have been looked at in many different contexts over the last 30, 40 years, in photometric stereo and structured light. They've received a lot of attention recently in the time of flight community as well. But most existing approaches assume that the indirect light coming to the camera arrives along two or three, a small discrete number of, indirect light paths. But if you think about it, in general, there can be an infinite number, a continuum, of light paths. And each of these paths is contributing a sinusoid here. >> The time graph is a little bit misleading because this assumes like a perfectly matte reflector, and even then, it's fudging stuff, right? I mean, reflectors with a lot of gloss have a fairly narrow lobe. >> Mohit Gupta: That's true. That's true, if you have perfect mirrors in the scene, then the sparse approaches would be better. But here we're assuming that the scene has broad reflectance lobes. Yeah. So that's a good one. Yeah. So now, the challenge here is to somehow separate this infinite number of sinusoids from the direct component. And without any additional information, that seems very challenging. Now, the key intuition that we have is that while we cannot prevent these indirect paths from happening, if we can somehow ensure that the sum of all these indirect sinusoids becomes a constant, a DC component -- and I'll show how -- then the total light, which is the sum of the direct and indirect light, will just be a DC-shifted version of the direct light. Which means it will have a different offset, but the phase will be the same. It will be okay. So how do we achieve that? Now, before we look at the solution, I want to first talk a little bit about how we represent these light rays. So far we've been talking about sinusoids, which are kind of a clumsy representation. Can we do something better? Consider two light paths, one direct and one indirect. They have different sinusoids. They may have different phases. They may have different amplitudes. But the important observation is that the period, or the frequency, is the same, which is the same as that of the emitted light. Now, this is not very surprising.
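As a brief illustrative aside (not the speaker's code), here is a small simulation sketch of the continuous-wave depth recovery and the interreflection bias just described. The modulation frequency, amplitudes, and path lengths below are made-up values, and the phase estimation by correlation is only one simple way to do it.

```python
# Continuous-wave time of flight: the received signal is a sum of sinusoids,
# one per light path; the recovered phase gives depth, and an extra, longer
# indirect path biases the phase toward larger depths.
import numpy as np

C = 3.0e8           # speed of light, m/s
FREQ = 30e6         # modulation frequency, Hz (an illustrative CW-ToF value)

def received_signal(t, paths):
    """Sum of sinusoids, one per light path given as (amplitude, path_length_m)."""
    total = np.zeros_like(t)
    for amp, length in paths:
        phase = 2 * np.pi * FREQ * (length / C)   # phase shift from travel time
        total += amp * np.cos(2 * np.pi * FREQ * t - phase)
    return total

def estimate_depth(t, signal):
    """Recover the phase by correlating with cos/sin references, then convert to depth."""
    ref_cos = np.cos(2 * np.pi * FREQ * t)
    ref_sin = np.sin(2 * np.pi * FREQ * t)
    phase = np.arctan2(np.mean(signal * ref_sin), np.mean(signal * ref_cos))
    phase = phase % (2 * np.pi)
    round_trip = phase / (2 * np.pi * FREQ) * C
    return round_trip / 2.0     # divide by 2: light travels out and back

t = np.linspace(0, 1e-6, 100000)
direct_only = [(1.0, 2 * 2.0)]                  # one direct path to a point 2 m away
with_indirect = direct_only + [(0.5, 2 * 3.5)]  # plus a longer indirect bounce

print(estimate_depth(t, received_signal(t, direct_only)))    # ~2.0 m
print(estimate_depth(t, received_signal(t, with_indirect)))  # overestimated depth
```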
All that we are saying is that if we have a light source which is sending out light into the scene at a particular temporal frequency, the light may bounce around multiple times and get partially absorbed, but the frequency will not change. So if the frequency remains constant, we can simply factor it out. The only two parameters that remain are the amplitude and the phase. So we can represent each light ray by a complex number, or a phasor. It's a more compact representation. So now that we have this phasor representation, we can take a computational view of light transport. We can think about the scene as a black box which takes as input a phasor that is emitted by the light source and gives as output another phasor which goes into the camera. And the transformation between the input and the output is a simple linear one, where this constant here is called the light transport coefficient. This is what encodes the scene properties. It's a very simple linear representation. So now we can use this representation to start analyzing different light transport events, the different events that light undergoes in a scene. So suppose we have this light source; there is some initial amplitude of the emitted light and some initial phase. Suppose light travels through free space. Now, we know that light does not lose intensity; only the phase changes. So the corresponding phasor transformation would be a simple rotation. Now suppose light gets reflected. At the moment of reflection, light only loses amplitude; the phase does not change. So there's a reduction in amplitude here. Now, suppose light goes through some kind of scattering medium where light is absorbed as well. So as light travels, both the phase and the amplitude change. So the transformation here would be both a rotation and a scaling. And finally, when a lot of light rays get added together, each of them has a corresponding phasor here, and the total light after superposition would be just the complex resultant of these phasors. So far, we have looked at a single frequency. Now let's look at what happens as we change the modulation frequency. We're going to look at the simplest case of free-space propagation. So again, we have this initial phase and amplitude. After propagation, the phase changes. Now, the amount of phase change is proportional to the travel distance. But it is also proportional to the modulation frequency. Which means that for the same travel distance, as we start increasing the modulation frequency, the amount of phase change increases. Now, this is an important property. I want you to remember it; I'm going to use it very soon. So now that we have this new kind of phasor representation, a new kind of phasor algebra to analyze light transport, we can use it to analyze interreflections. So here is a very simple case. There's one single indirect light path, and there's a corresponding phasor here. Next, we look at another light path in a close neighborhood. Now, because light transport is locally smooth, the amplitude of this light path will be very similar to the previous one. But the path length is different, so the phase is slightly different. So now, we look at a small continuum of light paths starting in this small cone here. All of them have approximately the same amplitude but slightly different phases. So these phasors will trace out this sector in the phasor space.
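As an illustrative aside, the phasor bookkeeping just described fits in a few lines of complex arithmetic. This is a tiny sketch of my own; the function names, modulation frequency, distances, and albedos are assumptions for illustration, not values from the talk.

```python
# Phasor view of light transport: each light path is a complex number whose
# magnitude is the amplitude and whose angle is the phase. Free-space travel
# rotates it, reflection scales it, and superposition is complex addition.
import cmath

C = 3.0e8  # speed of light, m/s

def propagate(phasor: complex, distance_m: float, freq_hz: float) -> complex:
    """Free-space travel: a pure rotation, proportional to distance and frequency."""
    return phasor * cmath.exp(-1j * 2 * cmath.pi * freq_hz * distance_m / C)

def reflect(phasor: complex, albedo: float) -> complex:
    """Reflection: the amplitude is attenuated, the phase is unchanged."""
    return phasor * albedo

# Example: one direct bounce and one longer indirect bounce are summed at the
# sensor; the result is the complex resultant of the two phasors.
emitted = 1.0 + 0j
direct = propagate(reflect(propagate(emitted, 2.0, 30e6), 0.8), 2.0, 30e6)
indirect = propagate(reflect(propagate(emitted, 3.5, 30e6), 0.3), 3.5, 30e6)
total = direct + indirect
print(abs(total), cmath.phase(total))   # amplitude and phase seen at the sensor
```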
Now, this is where we use the fact that the angular spread of these phasors is proportional to the modulation frequency, which means that as we start increasing the frequency, the spread increases and these phasors start cancelling each other out, until eventually the sum becomes zero. So one more time: we increase the frequency until eventually we reach a threshold where the indirect component becomes zero. So that's the main idea. That's the main contribution: to show that if you use a high enough temporal frequency, then we can deal with the problem of interreflections even in the general case, when there can be an infinite number of light bounces. So yeah, this is what we started out to achieve. The interreflection component becomes a DC component. The direct component is still oscillating. And the total component is just a DC-shifted version, so only the offset is different but the phase is the same. So we are nearly done here. We have almost solved the problem, but there's one small issue remaining. And that is, if you use a high temporal frequency, we get a lot of depth ambiguities. That's because if we have two different scene points, for example, this one here, this is the phase for this scene point; if we look at another scene point which is further away, the phase wraps around after two pi and we get the same phase. Now, fortunately, this is a very well studied problem in many different fields, including [indiscernible], in acoustics, even in [indiscernible], where the idea is that if you use two high frequencies that are very close to each other, we can emulate a low frequency. This is kind of similar to a beat frequency. So based on these ideas, we can use two high frequencies that are very close to each other, estimate the phase for both of them, and use that to resolve the ambiguity. So based on this, we have developed a method called micro time of flight imaging, where we use two high frequencies that are very close to each other. We call it micro because both frequencies are high and the periods are small, or micro. Now, if you compare with conventional time of flight, it uses only three measurements: a single low frequency and three measurements. We need three because there are three unknowns: offset, amplitude, and phase. Micro time of flight uses one extra measurement but provides significant robustness against interreflections. So next I want to show some results, some simulations. This is a Cornell box. >> How do you decide how different these frequencies are? >> Mohit Gupta: That's an engineering decision. Ideally, theoretically, you want them to be as close as possible. But there are practical limits imposed by the light source; the light source has a finite frequency resolution. And if we know something about the scene, like maybe an approximate range of scene depths, then based on that, we can make this kind of tradeoff. So this is a Cornell box. Again, it would be okay if there were only direct reflections, but there are all these indirect bounces as well. This is the ground truth shape. This is the shape using conventional time of flight. And with one extra image, this is the shape recovered using micro time of flight. It's about a two order of magnitude improvement. So now, based on these ideas, we developed an experimental setup. This is the light source; it's a bank of laser diodes. And the camera that we use is a PMD CamBoard nano.
It's made by [indiscernible] in Germany, a company which makes these customizable time of flight cameras. And we needed it to be customizable because we wanted to change the frequencies. This is the experimental scene here. It's a canonical scene for studying interreflections: a corner scene. There are two faces, and light bounces between these two walls. And we kept one wall moveable so that we can change the apex angle, to change the amount of interreflections. This is the actual setup here: the fixed wall, the moveable wall, and the camera here. These are three images of the setup for different apex angles: 45, 60, and 90 degrees. This is the shape recovered using micro time of flight for the 45-degree wedge. Now, perhaps a comparison is more instructive. This is the ground truth shape. This is the shape using conventional time of flight; the mean error is about 85 millimeters, and the depths are all overestimated. And this is the shape using micro time of flight. Again, the errors are about one to two orders of magnitude lower. >> [Indiscernible] continuous time of flight, right? So pulsed time of flight wouldn't have this issue? >> Mohit Gupta: That's right. >> And so what is the benefit of using continuous time of flight over pulsed time of flight? >> Mohit Gupta: It's mostly a cost argument. The pulsed time of flight systems, the ones you mentioned, are extremely expensive. >> [Indiscernible]. >> Mohit Gupta: Mostly LIDAR approaches, like maybe the one that is used in the Google [indiscernible]. Now, that system by itself is three times more expensive than the current [indiscernible]. These cameras can be as cheap as a hundred dollars. >> Your straw man here is a single frequency, right? But the, you know, the biggest consumer continuous-wave time of flight camera, which is the Kinect one, uses multiple frequencies, right? So the issue, I worked with that team, the issue is often one about range ambiguity. So if you use the high frequencies -- well, anyway, but in other words, this idea of using more than one frequency isn't particularly novel, right? It's maybe the fact that you're using two very high, very close frequencies. Right? >> Mohit Gupta: The main novelty here is the use of high frequencies. Yeah. So people have used multiple low frequencies as well, but each of those frequencies will be susceptible to interreflections. So the multiple-frequency part is not novel, but the idea of using high frequencies, that's the novel part here. >> So if you keep increasing the frequency, do you still get the DC? >> Mohit Gupta: You do. And the reason for that is that the patch size, or the scene patch that I used for this derivation, is a mathematical construct, not a physical construct. So I guess your question is, if your frequency goes beyond that threshold frequency, would you still get the cancellation? You do, because you can still divide up your scene into smaller patches where, in each patch, you get the cancellation effect. So a couple more comparisons. These are the 60-degree and 90-degree wedges. This is ground truth, this is conventional time of flight, and this is micro time of flight. So just to summarize this part: we analyzed the effect of interreflections on time of flight imaging, and we showed that by using high frequencies that are close to each other, we can deal with the problem of interreflections.
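Before moving on to structured light, here is a small illustrative sketch of the two-frequency disambiguation step described above. This is the standard beat-frequency phase-unwrapping idea in its simplest noiseless form, written by me for illustration; the two modulation frequencies and the depth are made-up values, not the ones used in micro time of flight.

```python
# Two nearby high frequencies: each phase alone wraps every c / (2 f) meters,
# but the phase *difference* behaves like a phase at the much lower beat
# frequency (f2 - f1), which wraps only over a much larger depth range.
import numpy as np

C = 3.0e8
F1, F2 = 100e6, 105e6          # two close, high modulation frequencies (illustrative)

def wrapped_phase(depth_m, freq_hz):
    """Wrapped phase measured at one modulation frequency for a given depth."""
    return (2 * np.pi * freq_hz * (2 * depth_m) / C) % (2 * np.pi)

def unwrap_depth(phi1, phi2):
    # The phase at the beat frequency F2 - F1 gives a coarse, unambiguous depth...
    beat_phase = (phi2 - phi1) % (2 * np.pi)
    coarse = beat_phase / (2 * np.pi) * C / (2 * (F2 - F1))
    # ...which selects the correct wrap count k for the fine, high-frequency phase.
    period1 = C / (2 * F1)
    k = np.round((coarse - phi1 / (2 * np.pi) * period1) / period1)
    return (phi1 / (2 * np.pi) + k) * period1

true_depth = 7.3   # meters, well beyond the 1.5 m ambiguity range of F1 alone
phi1, phi2 = wrapped_phase(true_depth, F1), wrapped_phase(true_depth, F2)
print(unwrap_depth(phi1, phi2))   # ~7.3
```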
Now, it turns out that the same kinds of tools and techniques can be used for the other 3D imaging technique which we talked about, structured light. So just to remind you, in structured light we project spatially coded intensity patterns on the scene. And one of the most popular structured light methods is phase shifting, where you project sinusoidally coded intensity patterns on the scene. Now, in conventional phase shifting, these patterns have very low spatial frequencies, and that's what makes them susceptible to the problem of interreflections. So if we do the same analysis that we did for time of flight, but now in the spatial domain instead of the temporal domain, we can show that by using high spatial frequency patterns, we can deal with the problem of interreflections. So based on these ideas, we've developed a method called micro phase shifting, where we use only high spatial frequency patterns. Now, this is only a very broad intuition; I'm not going into details here. I just want to show some results for this method. So this is the scene. It's a concave ceramic bowl. Now, this is actually not very diffuse; it has very narrow specular lobes as well. Now, if you look at a scene point here, it receives light directly from the projector. But it also receives indirect light due to interreflections. Now, this is the shape comparison: using conventional phase shifting, you see this incorrect shape here. And with the same number of images, in this case seven images, this is the shape recovered using micro phase shifting. Now, this is another example. This is one of my favorite examples. Here we have a shower curtain, and there's an opaque background behind it. The goal here is to recover the shape of the curtain itself, which is nearly transparent. Now, this kind of scenario arises a lot in medical imaging, where there are tissues which are nearly translucent, and there's an opaque surface behind them. Now, you would be okay here if there were only this direct reflection from the curtain. But there is all this light which permeates the curtain, goes back, and then comes back out here. Because of that, this is the shape using conventional phase shifting: there are a lot of errors, big holes. Now, with the same number of images, this is the shape using micro phase shifting. There are no holes here, and we can also recover fine details like the ripples in the curtain. >> [Indiscernible]. >> Mohit Gupta: You can do that as well. It's a good question somebody asked me recently. I haven't thought much about it. But yeah. >> [Indiscernible] really know the shape, can you actually make use of that information? >> Mohit Gupta: You could do that. Yeah, you could have an iterative approach where you first reconstruct the foreground. Once you know that, you can factor it into the reconstruction algorithm. So that's a good thing to do in the future, yeah. Next, I want to very briefly talk about the problem of scattering because, again, the underlying physical process is very similar. As I mentioned before, it's an important problem, especially for outdoor 3D imaging. So suppose we have a single convex scene. The camera receives light only along a single direct path, a single sinusoid. But now there is some kind of medium between the scene and the camera. So now there are all these indirect light paths, which we call backscatter, which never even reach the scene.
Now, as with interreflections, these light paths have different lengths than the direct path, so their sinusoids have different phases. And because of that, we get an incorrect phase. Now, the main observation here is that this looks very similar to the case of interreflections: there is one direct component and there is a whole bunch of indirect components. So maybe we can use the same tools that we used for interreflections. So here is a simple simulation example. The scene was a hemisphere. This is the ground truth shape. This is the shape using conventional time of flight. Now, the important difference here is that in scattering, the indirect paths are actually shorter than the direct paths, so the depths are underestimated. And this is the shape using micro time of flight. Now, I want to emphasize that this is only a very preliminary, simulation-only result, and scattering is a very, very challenging problem. But perhaps this is something which tells us where to proceed. It's maybe a good starting step. Next, I want to very briefly talk about dealing with difficult materials. Now, suppose the scene is made of something well-behaved, something diffuse like wood. These kinds of materials scatter incident light almost equally in all directions, which means that irrespective of where the camera is, it receives almost an equal amount of light. But now suppose the scene is made of something more challenging, like metal. These materials have a very narrow specular spike, as Rick mentioned. So now, depending on where the camera is, it may either receive no light at all or a lot of light, and both are a problem. Now, ideally we would like the surface to reflect light almost equally in all directions, but we cannot change the material properties. One thing that we can do, though, is to illuminate the surface not from a single direction but from multiple directions. One way to do that is to place a diffuser in front of the light source so that it acts as an area light source, which illuminates the scene from multiple directions, and now we get reflections in multiple directions. So the light received by the camera is now not coming from a single direction but from multiple directions. Based on these ideas, we have developed a method called diffuse structured light, where we place a diffuser between the projector and the object. The important thing to notice here is that the diffuser cannot be just an arbitrary diffuser, because that would destroy the projected pattern. So there's a trade-off here. We need to somehow emulate an area light source without blurring out or destroying the pattern. So the key idea was to use a diffuser which is linear, meaning it scatters light only along one direction. And if you use a structured light pattern which is also linear, and you align the diffuser along that pattern, we get both advantages: we don't lose the structured light pattern, but we also get an area light source. This is a coin, metal. This is the shape recovered using conventional structured light; this is a profile view. You get a lot of errors here. And with the same number of images, this is the shape using diffuse structured light. These are some more reconstructions using diffuse structured light. And by the way, all the components are off the shelf: a regular [indiscernible] camera, a pocket projector, and a cheap hundred dollar diffuser.
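As an illustrative aside, here is a minimal sketch of the sinusoidal phase-shifting decoding that underlies the phase shifting methods discussed above. This is my own simplified, conventional version, not the micro phase shifting algorithm itself (which restricts the patterns to high spatial frequencies and adds its own unwrapping); the pattern width, spatial period, and function names are assumptions.

```python
# Standard sinusoidal phase shifting: project shifted sinusoid patterns,
# recover the per-pixel phase, and use it to establish projector-camera
# correspondence. The talk's point is that these sinusoids should all have
# high spatial frequencies so that indirect light averages out to a constant.
import numpy as np

def make_patterns(width, period_px, num_shifts=3):
    """Sinusoidal column patterns with the given spatial period (in pixels)."""
    x = np.arange(width)
    shifts = 2 * np.pi * np.arange(num_shifts) / num_shifts
    return [0.5 + 0.5 * np.cos(2 * np.pi * x / period_px - s) for s in shifts]

def decode_phase(images):
    """Per-pixel phase from three (or more) equally shifted sinusoid images."""
    n = len(images)
    shifts = 2 * np.pi * np.arange(n) / n
    num = sum(img * np.sin(s) for img, s in zip(images, shifts))
    den = sum(img * np.cos(s) for img, s in zip(images, shifts))
    return np.arctan2(num, den) % (2 * np.pi)

# Toy check: "capture" the patterns directly and recover the projector column.
period = 16                               # high spatial frequency (small period)
patterns = make_patterns(256, period)
phase = decode_phase(patterns)
column_within_period = phase / (2 * np.pi) * period
print(column_within_period[:8])           # ~0, 1, 2, ... (wraps every 16 pixels)
```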
This is a lemon. It's organic, so it's translucent. Meaning that a scene point gets direct light, but also light which permeates beneath the surface and comes back out. This is very similar to interreflections, so we can maybe use micro phase shifting to recover the shape. So this is the shape using conventional phase shifting; there are errors due to scattering. And this is the shape using micro phase shifting. Now, two of these methods, micro phase shifting and diffuse structured light, were recently licensed by a big company in the space of industrial automation. And they're soon going to release products, hopefully in the next 12 months or so, for robotic assembly of machine parts, mainly automotive parts, and also inspection of electronics, printed circuit boards. Now, this is a huge multibillion dollar industry, and it's very satisfying to see our research make an impact here. So do we have time? Five more minutes? >> Sure. >> Mohit Gupta: So in the next five minutes or so, I want to quickly talk about the lessons that we've learned from all this work and what some future research directions are. Now, a central theme of all the work I've talked about is to develop computational models of light transport, or how light interacts with the physical world. We have developed models for many different processes like interreflections, scattering, specular reflection, et cetera. For example, we talked about using the phasor representation to develop a light transport model for time of flight imaging. There, we used a phasor to model the temporal variation of intensity. But we know that light is an electromagnetic wave, meaning that even if the intensity is held constant, there's an underlying electric field which oscillates over time. And the oscillation of the electric field is also sinusoidal, like the intensity that we talked about. So one interesting direction is to build on the tools that we used for time of flight to model the temporal variation of electric fields. I call this coherent light transport, because now we are really trying to model the coherent, or wave, nature of light. Once we do that, it really opens up the entire electromagnetic spectrum, from terahertz down to x-rays. Right now, vision algorithms are mostly limited to visible or maybe [indiscernible] UV. Now, each of these modalities, each of these bands, interacts differently with the physical world. For example, we know that visible light cannot penetrate surfaces, but terahertz and x-rays can. So each of these bands can tell us something different about the scene. For example, suppose there is an object here. If you use visible light, we can learn about the surface appearance. But if we want to learn about the underlying material properties, what this object is made of, we want to use terahertz or x-rays, which actually permeate beneath the surface and then come back out. Now, the phase of the coherent light at each sensor pixel is a function of the path length, which in turn is a function of the material properties. And fortunately, now there are cameras which can measure not just intensity but also phase. So it would be nice to develop algorithms which can combine these intensity and phase images to start making inferences about the material properties, as well as maybe some other higher-order properties.
With these kinds of algorithms and the right cameras, we may be able to take vision beyond the visible spectrum into these different kinds of modalities. >> In this particular example, with that internal scattering, there must be millions of such paths. >> Mohit Gupta: That's right. >> So wouldn't the phase pretty much get totally destroyed by the superposition of all these random paths? The coherence just disappears? >> Mohit Gupta: It depends on the wavelengths that you use. If you use something like x-rays, which have a very short wavelength, then the variation in the path lengths is going to be larger than the wavelength itself, so you will lose all the coherence. But if you use something like terahertz, where the wavelength is [indiscernible] millimeters or even centimeters, then you have some hope of keeping the coherence. So there are many applications for something like this. For example, you can start thinking about tools for personalized health monitoring, because you can start recovering properties of skin. This could potentially be used for early detection of skin cancer, maybe. This can be very useful for robotic grasp planning. For these applications, we want to know the material properties of the object that the robot is going to hold: is it soft versus hard, rough versus smooth, et cetera. Another interesting application is robotic agriculture, where we want to figure out whether a crop is ripe for picking. This can be very useful for preventing a lot of food wastage. Now, another application: [indiscernible] has the property that it amplifies very small motions. So if we can model that, we can start building tools for mechanical vibration analysis as well, which can be very useful in industry. Now, another direction, and this is very speculative, is that we can start building models for different exotic physical effects. Here's one example: birefringence. Many materials like glass, under mechanical stress, develop two different refractive indices. And this is what results in this [indiscernible] effect, which we see a lot in our windscreens. We have all used polarized glasses at some point or another to reduce glare while driving. There's another effect, fluorescence. Now, a large fraction of the materials known to us are fluorescent, which means that even though the visible light image that we capture has some information, we can recover much more hidden information if we look at the fluorescent image. And finally, there are some processes which are not even natural, which are only present in [indiscernible] materials, like this negative refraction. Now, it would be nice to build models for these processes so that we can design vision systems that can capture information that was not possible with existing systems. Now, building these models will not be sufficient. We also need the right cameras to capture these effects. As Heisenberg, the physicist, not the chemistry teacher, if you watch [indiscernible], said: what we observe is not nature itself, but nature exposed to our method of questioning. Now, imaging and vision systems interact with nature by asking these visual questions, by capturing images. So we need to design systems which ask the right kind of questions. Now, in vision so far, we have used the pinhole model of cameras, where light rays are mapped from the scene to a flat detector through this pinhole.
Now, this is fine for capturing images for human consumption, but it can be restrictive for computer vision systems. Going forward, we really need to expand the notion of a camera to a general light-recording device, which may map the light from the scene to the detector in other ways, using different new kinds of optics. The detector itself may be curved; it may even be flexible. The camera may measure light across a wide range of bands. But perhaps most importantly, these cameras will not just passively observe the scene. They will have a programmable light source that will actively influence the image formation process. These light sources will act as probes that tease out [indiscernible] information from the scene. Now, this field of active illumination, or active vision systems, is a very rich field in itself. There is a lot of research that has gone into it, not just in vision but in many different fields. Right now, Shree and I are in the process of putting together a book which gives a comprehensive introduction to all these active methods. We are learning a lot while writing this book, and we hope that it will excite readers about this field of active vision and vision [indiscernible]. So thanks a lot for listening, and I'll be happy to take any questions. [Applause] >> Mohit Gupta: Yes. >> In the [indiscernible] light experiments earlier, you mentioned that you're changing the speed of your mirror, right? So you mentioned that you might use slow, medium, or fast speeds, but how are you actually calculating what that speed should be? >> Mohit Gupta: So we have a system where the camera first captures two extra images, one with the light source on and one with the light source off. And that gives you an indication of the ratio of the projected light versus the ambient light. And based on that, it's a feedback loop kind of a thing; we change the rotation speed. >> Okay, I see. >> In the [indiscernible] examples which you showed for the first part of the talk, the results of using your system in ambient [indiscernible], the examples you showed for Columbia, the different slides -- how big were they? Were they cropped, or were the images what your camera is actually capturing? >> Mohit Gupta: Right now, that's what the camera is capturing. The scene distances that we used were about one to two meters. And the light source that we have is actually very small; it's like a pocket light source. Now, it will be possible to scan entire scenes if you use a slightly larger source. >> [Indiscernible] you were talking about this tradeoff between pointing the light source at a small part of the scene versus the whole field of view, so in this case, field of view is an important variable in this thing, and so do you think that you can actually figure out a way to also build omnidirectional -- not omnidirectional, but large field of view cameras? Because a lot of the technology that's there has this issue of [indiscernible] relatively small field of view. >> Mohit Gupta: That's true. I think that the field of view of the camera is actually limited by the field of view of the light source. So right now, we have to use a relatively small light source field of view because you're limited by light source power. If you can come up with methods -- and this may be an example -- whereby we can increase the field of view of the light source, then we can increase the field of view of the camera as well, and perhaps even go to an omnidirectional sensor.
>> A problem that in practice we [indiscernible] distance, [indiscernible] distance, the depth of field. So [indiscernible]. >> Mohit Gupta: So are you talking about depth of field issues? >> Yeah. So for example, with the same camera, I would like to capture [indiscernible], so be able to capture near distances and also far distances. >> Mohit Gupta: So if you have a large depth variation in the scene, there can be two problems. One is the limited depth of field of the sensor, right? So what you're saying is that maybe if you focus your camera behind the scene or towards that part of the scene, then the front part of the scene will be out of focus. Now, that problem you can deal with by just using a smaller aperture, or you can use many of these coded aperture imaging methods that have been developed in computational imaging. So that's the blurring problem, right? Now, another problem that you may have is a dynamic range problem, because you're using an active light source, and if the scene is very close to the camera, it will receive a lot of light, but if the scene is far from the camera, it will not receive a lot of light. So the question will be what part of the scene you optimize for. >> [Indiscernible]. You always want all of them. That's practical requirements. >> Mohit Gupta: Right. That's an engineering decision. You can either optimize for one small depth range, or if you know [indiscernible] that you're going to have a large depth range, then you can optimize your parameters accordingly. Like the light spread that I talked about: right now, we assume that the range of scene depths is small, but if you knew [indiscernible], then you could do something smarter. >> [Indiscernible] sounds like the same kind of problems that offshore seismics have to solve. In offshore seismics, you have these [indiscernible] that generate mechanical waves [indiscernible] through the ocean to find the rock bottom, and the [indiscernible] coming back. >> Mohit Gupta: That's right. >> So [indiscernible], are you absorbing this knowledge from seismics into active illumination? >> Mohit Gupta: So seismics is one example, but perhaps a closer example is radar. >> [Indiscernible]. >> Mohit Gupta: I mean, in seismics, you are using ultrasound, which is sound waves, and that's a slightly different model as compared to transverse light waves, right? So as I said, the closer model is radar, and a lot of these problems are being looked at in the radar community as well, the problem of interreflections, et cetera. Now, there, the solutions are slightly different because of the difference in the wavelengths involved. Many of these problems are not that severe if you are using radar, because the wavelengths are large and the kind of scene structure that you are observing is at a scale smaller than the wavelength that you are using. Sing Bing Kang: Well, let's thank the speaker once more. [Applause]