>> Sing Bing Kang: Good morning everyone. It's my pleasure to introduce Oliver Cossairt. He graduated with a PhD from Columbia University and now he is with Northwestern, where he founded the Computational Photography Lab. Oliver is an expert in computational imaging and he is going to be talking about some of his projects today. >> Oliver Cossairt: Great. Thank you, Sing Bing. First of all, it's great to be here. Thanks, Sing Bing, for inviting me. Basically, what I have in mind for this talk today is to give you a bit of a flavor for the different types of projects that we work on in my group. I've been at Northwestern now for about three years and I'm just starting to build up my group. We are interested in basically anything where we get to play around with actual optical instrumentation, so building cameras, building imaging systems, and also getting our hands dirty in terms of developing both the algorithms and the instrument itself. That means actually playing around with the optical side of things and the sensing hardware, or any combination of these things. Broadly, we are interested in applications that span anything in the field of optical imaging, anything in the field of computer vision related applications, or overlaps between that and image processing and computer graphics. That's the high-level idea of the interests and motivation of my research group. What I'm going to talk about today is essentially three different projects that give a flavor of the kinds of things that we do and the kinds of things that we are interested in. They will also give you a sense of the diversity of the type of work that we do and want to do, at various levels of technical novelty. The first one I'll talk about is a motion contrast 3-D scanner that we have developed, which we think is a very light-efficient, power-efficient way to do active 3-D scanning. I'll tell you about that with some technical details, and then I'm going to talk for a while about surface shape studies of some very famous artworks that are housed at the Art Institute of Chicago. We do some collaborations in cultural heritage imaging with places like the Art Institute and a couple of other museums that we are working with as well, such as the Georgia O'Keeffe Museum. What we are interested in here is also 3-D imaging. The technical novelty here is not as great, but the value to the conservation community is significant because they have never used techniques like this in any of the analysis that they do. The last thing I'll talk about is a phase retrieval problem that we just started to get interested in. This is part of a broader effort in my group to look at a specific image processing problem that has had a long history, mainly in the optics world, and has now gained some interest on the more theoretical, mathematical foundations and image processing side. It has some really interesting applications in biological imaging that have just emerged in the last couple of decades. I'm going to give you a high-level overview of all of these projects, with some technical details on the first and the last one. First I'll talk about MC3D, or motion contrast 3-D scanning. All of the technical work for this project was done by my PhD student Nathan Matsuda in collaboration with Mohit Gupta, who I think some of you saw give a talk here about active 3-D scanning not too long ago.
This is really the high-level idea that we were going after. The idea is to take a point scanning system like this. We have a laser dot that is moving across our scene, and our goal is to capture information about the position of that dot as it scans across the scene. The traditional way to do this would be to capture a sequence of photographs, which would be essentially equivalent to the information you would need to capture for a point scanning 3-D laser scanner. The issue there is really the amount of data you need to capture. If you have an image sensor that is M by N pixels and you're scanning your point across each pixel in the scene, that's going to require M times N images. The point we're going after here is that the amount of information in the scene, as we're projecting this dot and moving it across the scene, is actually really minimal. What we want to do is figure out some way to extract from our sensor just the pertinent information about the location of the spot in the scene and then essentially stack that up as a queue, a list of numbers as a function of time as the point is scanned across the scene. The goal is that if you have some way to do that, then you can reduce it from M squared times N squared measurements to just M times N pieces of data that your sensor has to capture and send through the system pipeline to the computer for the 3-D processing. That's the high-level idea. Again, the goal is to do 3-D capture, and the way I think about 3-D scanning is this: broadly, we can categorize 3-D capture techniques starting with the most basic ones, the passive ones, which are the stereo methods. Two cameras look at different views of the scene with different amounts of parallax. When you look at the shift of different types of texture features, you can compute the depth based on the baseline between the two cameras. Closely related are depth from focus or depth from defocus techniques, where you use a depth dependent blur to try to figure out the distance to the scene. These passive techniques can work well for scenes that have strong amounts of texture, but they really don't work if you have no high-frequency spatial information in the scene. So if you are trying to capture the depth of a wall, you are capturing two images of the wall from different perspectives and the images look the same. If there's no texture then you can't calculate any parallax information. That's why we looked at active scanning systems. Active scanning systems essentially project texture in one form or another onto the scene, which allows us to establish correspondences between, in this case, a projector projecting light and a camera capturing light. Laser scanners are one form of active 3-D acquisition system, but they're part of a broad class of structured light systems. All of these techniques are based on triangulation: the camera is displaced by some baseline from a projector, and the projector is projecting light into the scene. For a laser scanner, the light pattern it projects is just a dot that gets captured in sequence. For a structured light system like the Kinect 1, a single pattern is projected.
The scene is captured by the camera and then the deformations of this pattern are used to determine parallax information, essentially trading off some spatial resolution or assumptions in order to get a 3-D image in a single snapshot. The triangulation-based structured light methods work on a different principle than the time of flight principle. A time of flight system would be like the latest Kinect 2 type of sensor. The way that works is you temporally modulate the illumination from the source, and then you have some sort of demodulation that happens on the sensor. You use that to figure out the time of travel from the illumination source to the scene and back to the sensor, which you can then use to back-calculate the depth of the scene. For the rest of the talk I'm not really going to talk about time of flight systems, because I'm going to focus on triangulation-based systems. A high-level comparison you can keep in mind is that for the temporal modulation systems you need high modulation frequencies to get high depth resolution, and there are limitations, literally on the sensing side, in terms of the frequencies you can use on your sensor and the depth resolution you can get. For the triangulation-based systems, the depth resolution depends on the baseline and the distance to the scene. In general, for closer scenes you can get much higher depth resolution with a triangulation-based structured light system, whereas with a time of flight system you can get very high depth resolution even at really large distances. That's the way to compare these two technologies. Here's an example of using a triangulation-based structured light system, in this case a line stripe laser scanner; I think they got sub-millimeter accuracy in this scan. This is the Digital Michelangelo Project, where they scanned Michelangelo's David. You can get very, very high quality 3-D scans with laser scanning. That is essentially where we started out: we want to push towards this type of very high quality 3-D scan as much as possible. Here is a taxonomy of structured light systems, my classification of the continuum between the different types. On the top left we have a laser scanning system, a point or a line being projected from a laser that is swept through the scene. We capture a sequence of photographs from our camera and then we figure out the disparity, essentially the difference between the column that is projected from the projector and the column where it is received on the camera. That gives us depth information that is scanned over time. If we are doing point scanning and our resolution is M by N, we need to capture M times N images. If we are doing column scanning then we need to capture M images, which could still be significant, as many as 1000 or so. To make this faster, remember that the key here is just establishing correspondence between the projector and the camera. We are essentially trying to find this triangle: sending a ray out from your projector and then finding where that ray projects to in the camera. That's the whole goal. That correspondence is the key to calculating the depth, because once you know the geometric relationship between the camera and the projector, you can use it to calculate the depth.
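To make the triangulation relation concrete, here is a minimal sketch, not the speaker's code, of how a single projector-camera correspondence turns into depth. It assumes an idealized rectified setup; the function name and the baseline and focal length values are illustrative assumptions.

```python
def depth_from_disparity(cam_col, proj_col, baseline_m, focal_px):
    """Depth from one projector-camera correspondence, assuming a rectified
    setup where projector and camera differ only by a horizontal baseline,
    so Z = f * b / disparity (f in pixels, b in meters, disparity in pixels)."""
    disparity = cam_col - proj_col
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    return focal_px * baseline_m / disparity

# Toy usage with made-up calibration numbers: 10 cm baseline, 600 px focal length.
z = depth_from_disparity(cam_col=412, proj_col=380, baseline_m=0.10, focal_px=600.0)
print(f"estimated depth: {z:.2f} m")   # 600 * 0.10 / 32 = 1.88 m
```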
There are other ways to get this correspondence information faster. One of them is to use a binary coding scheme. The idea is that you project a series of binary patterns with different spatial frequencies, and what they do is uniquely code each of the columns in the projector's frame of reference, so when you measure a bit pattern on your camera, on off on off, whatever it is, you can uniquely figure out which column in the projector that code belongs to, and that establishes the correspondence. Then you can calculate the depth. The advantage is that you can use a divide and conquer approach, essentially doing in logarithmic time what would otherwise take linear time in the number of columns. You can add intensity modulation to the mix, so now you basically assign a unique intensity identifier to each projected pattern. One example is the phase shifting type of pattern where you project sinusoids. Another example is the phase ramp, where you just project a gradient in intensity. For instance, with the phase shifting pattern you can use in principle just three images, and with the phase ramp you can do it with just two images. Basically, what we are seeing is that as we move in this direction we gain the ability to capture the images faster. Here I have the example of the first generation Kinect, where we are just projecting a single pattern and using smoothness constraints to recover the depth. This is the space of different types of structured light systems. In the left-hand corner, the laser scanning system is essentially a very slow system. However, for that system the light is highly concentrated, because all of your available light is being focused onto one part of the scene. For the rest of these systems you are essentially flooding the scene with illumination, and the advantage is that it gives you a faster acquisition system. This is a trade-off. >>: About 10 years ago somebody proposed a system using a diffraction grating to generate a sweep of wavelengths, each wavelength being one plane, and then you could simply look at the color of each pixel in the scene and, because it's monochromatic light at any one point, determine which plane you were looking at. Does anybody use that technique today? >> Oliver Cossairt: What you are describing sounds like OCT. Does it use coherence gating? >>: No. It's just a diffraction grating; it sends out a rainbow, basically, but if you look at any one plane in that rainbow the light is monochromatic. So you can look at the ratio of the red, green and blue responses that you get in the pixel that's being illuminated, and you can immediately determine which monochromatic wavelength gave rise to that. >> Oliver Cossairt: Yeah, that's basically related to phase shifting. You're using intensity coding, right, so the intensity in the captured image is going to be a unique identifier for the column, which allows you to establish correspondence. In that case it's just using color in addition to intensity. That would fall in this category over here, so definitely, people do that. Commercially, I don't know of any system out there that does that, and I'm not sure exactly why that specifically doesn't exist in a commercial product. But it could, certainly.
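As an illustration of the binary coding idea described above, here is a minimal sketch, not from the talk, that generates Gray code stripe patterns for a given number of projector columns and decodes the bit sequence seen at one camera pixel back into a column index. It assumes perfect binarization and ignores the ambient-light issues discussed next.

```python
import numpy as np

def gray_code_patterns(num_columns):
    """Binary stripe patterns (one per bit) that label each projector column with
    its Gray code, so about log2(N) images replace N column-scan images."""
    n_bits = int(np.ceil(np.log2(num_columns)))
    cols = np.arange(num_columns)
    gray = cols ^ (cols >> 1)                            # binary-reflected Gray code
    # patterns[k, c] is the k-th bit (MSB first) projected at column c; 1 = bright stripe.
    return np.array([(gray >> k) & 1 for k in range(n_bits - 1, -1, -1)])

def decode_column(bits):
    """Recover the projector column from the on/off sequence observed at one camera
    pixel (most significant bit first), assuming every bit was thresholded correctly."""
    gray = 0
    for b in bits:
        gray = (gray << 1) | int(b)
    col, shift = gray, 1                                 # Gray-to-binary: XOR of shifted copies
    while (gray >> shift) > 0:
        col ^= gray >> shift
        shift += 1
    return col

patterns = gray_code_patterns(1024)                      # 10 patterns cover 1024 columns
observed = patterns[:, 700]                              # bits a pixel would see for column 700
print(decode_column(observed))                           # -> 700
```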
The point that I was trying to get at here is this trade-off between how fast you can capture the data and how efficiently you are using your light, and the point is that this has a really strong implication for some particularly challenging 3-D scanning environments. For instance, here's a comparison between laser scanning, line scanning in this case, and gray codes. This is the ideal scenario, where you project light at the scene and the only light that exists in the universe is coming from the projector. In practice, however, there's always going to be some ambient light in the scene, coming from overhead lights and bouncing off the walls. If you're trying to go outdoors, that means you have to compete with light from the sun. Your model is that your only illumination source is the projector; in reality, that's not the case. However, for a laser scanning system, we are taking all of our available power and concentrating it into a small line. >>: [indiscernible] >> Oliver Cossairt: Yes. Actually, that would be great. Is that better? Better contrast. Okay. The goal here is to establish the correspondence: you know which column in the projector projected this line, and you want to figure out where it maps to on the sensor, so you use some sort of binary threshold or an equivalent algorithm, figure out the location of this line, and now you have established a correspondence. That works fine in the absence of ambient illumination. If you add ambient illumination to the mix, then provided you have sufficient optical power, and remember in this case you are taking all of your optical power and concentrating it into a narrow line, you can just adjust your threshold and still get a good binarization. The issue is when you switch over to a gray coding type of scheme, which, remember, was motivated by making the scanning faster, so you flood the scene with illumination with this binary coding pattern. You're conserving optical power, so the power that you used to project onto this line has now been spread over a larger area. As a result, in the ideal case you can still get a good binarization, but when you add ambient light into the mix, the ambient light is now much stronger in certain areas of your scene, and even if you adjust your thresholding, binarization errors still result and propagate to depth errors in the scene. Yeah? >>: There is an obvious fix for this, right? Are you going to tell us what that is? I mean you just take subsequent images and ask when it is above or below the average, right? >> Oliver Cossairt: Yeah. There's a lot of work on trying to make binary and related coding patterns robust to ambient light. Yes, there is definitely an impact, and I think Mohit talked about several of these techniques when he was here a couple of weeks ago. I'm actually not going to get too much into detail about that, but you're right, that's an important point of comparison. Today, we do not have exhaustive comparisons with those techniques. Hopefully, you can suspend your questions about that and maybe we can talk about it offline. The first issue was ambient light. The second, closely related one is global illumination. The idea here is that we are looking at an object, in this case a book, that is shaped like a wedge.
Our model is that light goes from our source, bounces off the object, and goes to the sensor. This is the picture of what that would look like if that were true, and this is a picture of what we actually see. Light goes and hits the object, bounces around different places in the object, and then comes to the sensor. This is just a reality of the way light propagation works for conventional, incoherent imaging systems. We run into the same sorts of problems here, which, again, can be mitigated, and there are actually some very specific cases you can illustrate where the technique will work for our laser scanning system. For the gray coding system you have the same sort of problem: the light that bounces around the scene eventually gets low pass filtered and just looks like a constant offset, which is closely related to the problem you get with ambient illumination. Again, this is a scene dependent problem. The problem exists for concave shapes, where light has the potential to bounce around multiple times before it gets to the sensor. If it were a convex object, it wouldn't be a problem. This is the space that we were trying to chart out in looking at 3-D scanning systems. On one axis we have acquisition speed: we would like to acquire 3-D scans fast. We would like to do them at high resolution. And we would also like our systems to be light efficient, because the more light efficient they are, the more robustly we can do our 3-D scanning relative to light pollution from the scene. A laser scanning system we put over here on the upper left-hand side. It's very light efficient; it takes all the light power and concentrates it into a small area. It's slow, but it's high resolution. Then we can plot out other systems. We have gray coding and phase shifting; those are higher speed. We've essentially traded off light efficiency for speed: instead of concentrating the light into a small area we are now spreading it across the scene. Then you can throw single shot methods like the Kinect out there, and we can also look in the literature and see systems that offer trade-offs between these different extremes. Mohit had a paper where he was looking at a hybridization between a gray coding scheme and a laser scanning scheme, with a scene dependent trade-off parameter that allows you to adjust between the two systems, dynamically trading off light efficiency against speed. There are also methods that are hybridizations between single shot and gray coding, essentially looking at the motion in the scene and changing the patterns that are being projected depending on whether there is motion or not. This is our taxonomy of the space, and where we are trying to get is the upper right-hand corner. This is the idea. The idea is to take this new principle, which is not actually a new principle at all but a very old, biologically inspired one: instead of sensing optical intensity, directly sense motion contrast. This is a comparison, the very high-level idea. We have some source with time varying intensity, a photon flux that is changing as a function of time. For a traditional photo sensor that would get converted to a current or a time varying voltage. And then this is all happening internally.
This is essentially the photo sensor in your pixel, and that analog, time varying photocurrent would then get sampled periodically at different points in time using an analog-to-digital converter. For the motion contrast sensor, the idea is to take that same time varying photon flux, convert it to a time varying voltage, and then, in analog form, electronically perform a temporal differentiation, and then have a thresholding mechanism that waits for this temporal derivative to exceed some threshold; when it does, you spit out some information. This system is by nature an asynchronous sensing mechanism. The conventional one is polled for information even when the intensity isn't changing; this one only spits out information when the value exceeds some threshold. It's a biologically inspired idea. A constant stimulus to the photoreceptors in your retina produces no output, so the way that the photoreceptors in your retina send impulses to your visual cortex is by sensing changes in intensity over time. This is enforced, at least partially, by microsaccades that are always causing the eye to scan the scene, so a lot of the time what you are seeing is edge information that gets mapped to temporal information, which then gets differentiated by the internal circuitry of the photoreceptor. It's a compression mechanism, a way to take the hundred million photoreceptors in the retina and essentially compress them down to the million or so signals that get sent down the optic nerve. It's the same idea here. People have built sensors based on… Yeah? >>: Do you get a better signal-to-noise ratio if you have a suddenly varying input and then you had a matched temporal filter as opposed to a differentiator? >> Oliver Cossairt: That sounds plausible, to get a better signal-to-noise ratio. That's a good question. You have one signal and you are sensing that signal only; that sounds like the ideal way to sense that signal. Yeah, I'll have to think about that some more. That's a good question. What you are getting at, which is a good point, is that we are playing around with time domain stuff. It ends up being very closely related to the temporal modulation in time of flight type techniques, although there's going to be no time of flight information in this. This is going to be a triangulation based technique, but we are incorporating temporal information implicitly into the way that the sensor is designed. Basically, there's a sensing team at ETH Zürich that has played around with sensing architectures based on this motion contrast principle. This is a high-level block diagram of the circuit. You have a logarithmic photoreceptor, so the time varying photon flux coming in gets converted to a time varying voltage that is proportional to the logarithm of the incoming flux. Then you have a temporal differentiator circuit which takes this time varying voltage and differentiates it, and then you have a threshold mechanism that looks for when the derivative of the time varying voltage exceeds some threshold. That gets read out asynchronously and gets sent out as events through the readout circuitry of the sensor. This is one pixel, basically; there would be a whole bunch of these replicated in an array. And this is what the output of the sensor looks like.
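Here is a toy sketch of the per-pixel behavior just described: logarithmic response, temporal differencing, and thresholding. It is a simplified model meant only to illustrate the principle, not the actual DVS 128 circuit, and the threshold and signal values are made up. Note that a constant ambient level produces no events at all, which is also why static ambient light is largely ignored by this kind of pixel.

```python
import numpy as np

def motion_contrast_events(flux, threshold=0.15):
    """Simulate one motion-contrast pixel: log photoreceptor, temporal
    differencing, and thresholding. Emits (sample_index, polarity) events
    only when the log-intensity change since the last event exceeds the
    threshold -- a simplified model, not the real asynchronous circuit."""
    log_v = np.log(flux)
    events = []
    ref = log_v[0]                       # last log level that triggered an event
    for t, v in enumerate(log_v[1:], start=1):
        if abs(v - ref) > threshold:
            events.append((t, +1 if v > ref else -1))
            ref = v                      # reset the reference after firing
    return events

# Constant ambient light plus a brief laser pulse sweeping past the pixel.
ambient = 100.0 * np.ones(50)
laser = np.zeros(50); laser[20:23] = 40.0       # laser spot crosses at t = 20..22
print(motion_contrast_events(ambient))          # [] : static scene, no events
print(motion_contrast_events(ambient + laser))  # events only when the spot arrives and leaves
```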
The idea here is that this is intended to be an apples to apples comparison in terms of bandwidth, so the same number of bits per second are coming out of these two imaging systems. On the left we have a conventional video camera; on the right we have a motion contrast camera. The point is that on the left we're using a lot of bandwidth to measure pixels whose intensity is not changing very much. On the right, any pixels that are not changing in intensity are not being sensed, and instead we're using that extra bandwidth to sense the motion at higher temporal resolution. That's the motivation for using this sensor, and it's been used for things like high-speed tracking. There's actually some surveillance type of stuff that they use it for, and then there are a lot of people in field biology who are interested in this because they can put a sensor out there to watch, say, when a rabbit comes out of a hole, and it is not consuming any power until there is movement in the scene. That's the main advantage of it. This is our idea, and it is really simple: take a laser scanning system, where we are scanning sequentially through the columns in the projector's frame of reference, and pair it with a motion contrast camera. That's it. The point is that we're going to scan this laser fast, at a 60 Hz repetition rate, 60 frames per second. At that speed we're going to basically assume that the scene is static, and as a result, the only intensity changes in the scene are caused by the laser spot sweeping across each point and momentarily increasing the brightness reflected from it. The light coming back to the sensor essentially produces a spike in the time varying voltage produced by a pixel. That gets differentiated and thresholded and produces a single event that tells us the location of the laser spot at that specific point in time. That gets sent as a single event, and the events get sequenced over time as the laser spot moves position. Each of these events carries two pieces of information: one is the column it maps to on the sensor, and the other is the time of arrival of that event. Based on a calibration between the timing of the projector and the timing of the camera, you can produce a conventional disparity map and then use the intrinsics, the pose, and the baseline of the camera and projector to compute a 3-D model. That's the idea, and the real advantage in comparison to a conventional laser scanning system is basically bandwidth utilization. If we have a line scanner, the amount of time it takes to scan every line in the full scene is equivalent to the amount of time it would take, given a sensor with equivalent bandwidth, the same number of pixels per second being read out, to capture N different frames, where each frame captures only one column of essentially depth information. Most of the pixels in those frames correspond to wasted bandwidth, pixel measurements that don't really correspond to what you're trying to sense in the scene at all. We can do the equivalent of laser scanning in real time. That's the idea. This is the prototype. This is the DVS 128, the motion contrast focal plane array I was talking about before. It's a commercial product, but it's sold in small quantities right now. There's essentially a startup company that is looking for applications for this.
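Here is a rough sketch of the event-to-depth pipeline just described, under simplifying assumptions: a perfectly linear column sweep over each scan period and a rectified camera-projector geometry. The function and parameter names and the calibration numbers are illustrative, not taken from the actual MC3D implementation.

```python
import numpy as np

def events_to_depth(events, scan_start_s, scan_period_s, num_proj_cols,
                    baseline_m, focal_px, width, height):
    """Turn motion-contrast events into a sparse depth map.

    Each event is (x, y, t). Assuming the laser sweeps the projector columns
    linearly over one scan period (a simplification of the real MEMS/galvo
    timing), the timestamp gives the projector column, and disparity against
    the camera column gives depth by triangulation."""
    depth = np.full((height, width), np.nan)
    for x, y, t in events:
        phase = ((t - scan_start_s) % scan_period_s) / scan_period_s
        proj_col = phase * num_proj_cols          # column the laser was lighting at time t
        disparity = x - proj_col
        if disparity > 0:
            depth[y, x] = focal_px * baseline_m / disparity
    return depth

# Toy usage: one 60 Hz scan, a 128x128 sensor, and made-up calibration numbers.
events = [(40, 64, 0.0039), (90, 64, 0.0098)]     # (x, y, timestamp in seconds)
d = events_to_depth(events, scan_start_s=0.0, scan_period_s=1/60,
                    num_proj_cols=128, baseline_m=0.08, focal_px=300.0,
                    width=128, height=128)
print(d[64, 40], d[64, 90])                        # roughly 2.4 m and 1.6 m here
```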
We approached them with this idea of building a 3-D scanner based on it and built this prototype. The projector was a ShowWX projector from MicroVision. It has a 2-D MEMS scanner and a laser: the laser projects onto the scanner, which projects it out to a point in space, and then you intensity modulate the laser as the MEMS scanner sweeps, producing an image using just laser temporal intensity modulation synchronized to the 2-D MEMS scanner. It's a commercial product and it was produced relatively inexpensively; it's like a $300 pico projector. For one reason or another the product is not available anymore, and I'm not sure where the technology for MicroVision sits at this time. I've been told that there's a new company that's going to be commercializing it. We used that system for our prototype and have since switched to our own custom-built laser scanning system for various reasons. Here's a comparison. Here we are comparing against the Kinect 1, and the main thing is that we are getting the same scan speed as the Kinect 1. They're both triangulation based systems, which is one of the reasons we're doing the comparison between them. The question is what sort of performance you can get out of a triangulation-based system, how you maximize that performance. That was the comparison we were trying to make here. The point is that we can get laser scan quality, all things being equal in terms of resolution. This is relatively low resolution for this prototype, 128 x 128, but we can certainly do better than the Kinect 1 given the same amount of time to produce a scan. Again, it's close to laser scan quality even though we get an orders of magnitude decrease in the amount of time it takes to produce a scan. Here's a real-time scan result. It's just a person spinning a coin. This is 128 x 128 pixels at 30 frames per second, and the only processing done here is some 2-D median filtering to correct for dropped pixels. Yeah? >>: [indiscernible] by rotating the figure or something? How did you, because it's in motion, right? >> Oliver Cossairt: No. It's based on motion in the sense that there's a change in intensity over time. The way that we induce the change in intensity over time is by projecting a laser spot out into the scene and moving it really fast. That induces a change in reflected brightness as a function of time that is sensed by the pixel, which is the information that we take advantage of. We assume the object to be stationary over the duration of the scan. >>: [indiscernible] like when you project something you might have multiple bounces and it might exceed the threshold and you get multiple points. >> Oliver Cossairt: You can show that as long as the portion of the light that is reflected back in the direction of the camera is greater than the amount of light reflected in some other direction, meaning that it's basically not a perfect mirror or not too specular, then you are guaranteed to be able to set a threshold that will work with the system. I'll show you some examples of highly reflective objects, or one example. >>: On the one that you showed us, why is the table cut out? Why was there only the hand and the coin? >> Oliver Cossairt: I don't know. I would have to ask Nathan about that. I'm not sure. Here is an example of a pinwheel in comparison to the Kinect 1, at identical frame rates.
The real advantage that we are pushing for with this is performance in ambient light, and the hoped-for application is outdoors, doing robust outdoor active triangulation-based 3-D scanning. Here's an initial experiment, again a comparison with the Kinect 1, starting out at 150 lux, which would be like this room, and then increasing the illumination up to 5000 lux, which would be a cloudy day. We can obviously do better than the Kinect 1, and the take-home message was that the ambient light on the surface was about 10 times greater than the laser, so you basically couldn't see the laser point in these images because the laser is so much dimmer than the ambient light, but we were still able to get a good 3-D scan despite that. This was our first generation. It was actually using visible frequencies, red light, without any color filtering, so even though we're projecting narrow bandwidth illumination, we're not filtering for that narrow bandwidth on the sensor, which we could use to block a lot of the ambient light from getting to the sensor. That was our first generation. This is our second generation. In this system we have our own custom-built laser scanner using galvanometers instead of that really cheap, efficient implementation from MicroVision which, unfortunately, is now difficult to get hardware for. At this point we have switched over to infrared wavelengths, where ambient light sources tend to have lower intensity, and we have also added a color filter to block all wavelengths except for the narrow band that we're projecting out. This is probably about as good as you can do; it would be similar to the tricks that are being played inside the Kinect 1 and 2 systems to block out ambient light. In this case we're showing you this light bulb, which was on. On the surface of the light bulb it is 50,000 lux, which is quite bright; it's equivalent to about the typical brightness of a human face in direct sunlight at about noon, about as bright as you can get outdoors. And in real time we are actually getting a 3-D scan of this light bulb while the light is on. We're projecting light, very dim, onto the surface of this light bulb, and based on the light that we're projecting onto the bulb we are able to compute this geometry despite how much light is coming from the bulb. >>: That's a fluorescent light, though, so it's not putting out a lot of IR at all. >> Oliver Cossairt: This example here is visible, in this case. The next example I'll show you is infrared outdoors. You're right, that's a detail I glossed over. We started out doing this in visible. The only difference here, and this is still visible, is that this is 633 nanometers but we have a narrow bandwidth filter in front of it. The next one will be outdoors, switching over to infrared. Now we're in infrared. This is the system here. This is outdoors, noon on a sunny day. The illumination on this surface is 80 kilolux. We're about 4 meters away here and we're going to scan a person. Here's a comparison. There's a field of view mismatch here between our 3-D scanner and, in this case, the Kinect 2, because the Kinect 2 has much better ambient light rejection than the Kinect 1. The field of view mismatch is really just a practical issue between what focal length we had to use for the sensor and the projector, given the scan angles of the specific scanning optics that we're using.
The point is that we're getting a high-quality depth map here even in this 80 kilolux illumination, and we can see that the Kinect 2 is clearly failing. We can definitely do better than the Kinect 2. We can engineer this even further, but this is where the state-of-the-art is for us right now. >>: In the one in the middle, when you watch the video, when the person stops moving the hand and his hand is closed, you can see how sharp the silhouette of the hand is, whereas when the hand is moving you can see that it's noisy. This [indiscernible] >> Oliver Cossairt: I think you are seeing basically motion blur. It looks weird because the sensing mechanism is totally different than a conventional sensor. It's the effect of motion during essentially single scan periods. Here, I think, this one was a lower frame rate. I can't remember the technical reasons why we had to go with the lower frame rate here, but I think it was 10 Hz. >>: The galvanometer is probably [indiscernible] >> Oliver Cossairt: Yeah, I think that's right. The scan speed limitation with the galvanometers over the field of view that we want is just a trade-off that we are stuck with for this prototype. Here's a first stab at a comparison on a book, that V-groove type of object where the light you project is going to bounce around multiple times before it gets back to the sensor. That produces serious errors for the gray coding system, whereas we're able to at least more faithfully reconstruct that V shape. Here's another example of a highly reflective object. It's a metallic sphere, and we are able to get the sphere pretty faithfully whereas the gray coding method fails. >>: You mentioned earlier that if the reflectance is not a perfect specular mirror you can choose a threshold that would give you the correct depth. Do you have an algorithm for choosing that dynamically, or do you tweak it by hand? >> Oliver Cossairt: Right now it's all by hand. Basically, right now whenever we set up an experiment we tune the parameters specifically for that experiment, so that's definitely the next step: flexibility in the sensor operation and in the algorithmic processing. None of this has to be hardcoded; like you're saying, it could operate on the fly and adapt to different conditions. That's a good point. I spent a lot of time on that. I'm supposed to end at 11:30, right? I'm going to give you a really brief overview of these other two projects. I think they are cool projects and they give you a sense that we are really trying to work on a lot of different stuff. The point here is that there are all these prints at the Art Institute of Chicago and they don't know how they were made. The artist is Paul Gauguin; some of his paintings are among the most expensive paintings in the world at this point. We're looking at his prints, not his paintings. He has this whole other body of work, and the process that he used is really not understood. The Art Institute basically picked some prints that they wanted to study carefully. This is an example of one that we looked at closely, this Nativity. This is the front of the print and the back of the print. This was what was hypothesized as the printing process that was used. It's a standard monotype process; I'm not going to go over the details. But that hypothesis had some weaknesses.
The main reason is that when you look closely at these prints, you can see what appear to be broken lines. What they were interested in figuring out is whether these broken lines could originate from indentations in the print caused by him essentially tracing something on top of it, with the surface pressure transferring from the top to the bottom, the way you would transfer a secret message from the top Post-it in a stack to the one below it, that sort of thing. Blind incisions is what they called them. They thought that might be the origin of these lines, but they were not sure. I'm going to keep going here. They asked us to make some 3-D surface measurements of the prints to try to verify whether this was the case. These are museum conservators, so they don't have access to anything high-tech at all. They have access to cameras, lighting equipment, and standard photographic equipment. The approach we chose was photometric stereo, because that's something they can actually implement in their lab. What that means is we're capturing a sequence of photographs of the scene from a fixed camera viewpoint while changing the illumination. This is the data that we are capturing. Here's the print; we are moving the light source around, and from this reflective ball here you can figure out the angle of the incoming light. Then, based on the angle of the incoming light and the brightness that you measure for each of those directions, given some basic assumptions about the material properties in the scene, you can compute 3-D surface information and use it to essentially separate the surface color from the underlying 3-D surface shape. That's what they're interested in here and that's what we were helping them out with. There were two main questions that we were able to answer for them in this process. The first was the origin of these lines: as these lines are pulled away, you can see very clear surface perturbations, hills coming out of the page. The height of these perturbations is on the order of 100 microns. That was strong enough evidence for them to conclude that the transfer process was such that the print was placed face down, pressure was applied to the back, which caused indentations on the front, and that was the way the ink was transferred from an ink support surface to the print. That's the first piece of evidence we were able to give them. The second piece of evidence was that at all of the locations where it looked like you had these broken lines, we were able to verify that, despite the fact that we know we can measure accurate 3-D surface shape for these prints, there are no surface features that correspond to any of these broken lines. That's what we were able to do for them. What it allowed them to come away with is the idea that the way he made these prints was by using ink surfaces that were tarnished to begin with, that is, already partially depleted of ink.
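For reference, here is a minimal sketch of classic Lambertian, Woodham-style photometric stereo of the kind just described: fixed viewpoint, known light directions, and a per-pixel least-squares solve for albedo-scaled normals. This is an illustrative stand-in under a simple Lambertian assumption, not the pipeline actually used on the prints, and all names and numbers are made up.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Classic Lambertian photometric stereo.

    images:     (K, H, W) grayscale intensities under K known, distant lights
    light_dirs: (K, 3) unit vectors pointing toward each light source
    Returns per-pixel unit normals (H, W, 3) and albedo (H, W).
    Assumes a Lambertian surface and no shadows or interreflections."""
    K, H, W = images.shape
    I = images.reshape(K, -1)                                 # (K, H*W)
    # Solve L @ g = I in the least-squares sense; g = albedo * normal per pixel.
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)        # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    normals = np.where(albedo > 1e-8, g / np.maximum(albedo, 1e-8), 0.0)
    return normals.T.reshape(H, W, 3), albedo.reshape(H, W)

# Toy usage: 4 synthetic lights over a 2x2 patch of a flat, upward-facing surface.
L = np.array([[0.3, 0.0, 0.95], [-0.3, 0.0, 0.95],
              [0.0, 0.3, 0.95], [0.0, -0.3, 0.95]])
L /= np.linalg.norm(L, axis=1, keepdims=True)
true_n = np.array([0.0, 0.0, 1.0])
imgs = np.stack([np.full((2, 2), L_k @ true_n) for L_k in L])  # albedo = 1
n, rho = photometric_stereo(imgs, L)
print(n[0, 0], rho[0, 0])     # approximately [0, 0, 1] and 1.0
```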
This is basically the process we came up with for how he made these prints. He starts off making a print, pushing on the back; ink gets applied from the ink surface to the front of the piece of paper; then you pull up the print and move it away. Now comes the beginning of the next print, which is the one we were analyzing. You put a piece of paper down and draw on the back. Ink gets transferred from the ink surface to the front. However, because the ink surface was tarnished, with lines of ink already removed, broken lines appear in the transferred ink, and then the process continues for more ink layers. This is the mock-up comparison between the front and the back, our reconstruction versus the ground truth, and they were very happy with this. They thought it was a pretty good explanation for the process that Paul Gauguin probably used. The last project I'll just do quickly. This is a totally different flavor; it's actually about x-ray imaging, although it's related to a lot of other problems. Specifically, we are interested in very high resolution x-ray imaging. The idea is that you are using x-rays, which have a very small wavelength, and as a result, when your imaging is set up appropriately, you can resolve features on the order of the wavelength, which can be as small as tens of nanometers or even single nanometer resolution. What has been a really hot topic in the last couple of years is biological samples. Looking at biological samples at the sub-cellular level is a new phenomenon that has occurred because of the very high power x-ray sources that have come out of the national labs, which are repurposing now defunct physics experiments, particle accelerators, as essentially high-power light sources that anyone can apply to use for their experiments. A lot of biology experiments are now flocking to these national instruments to use these high-powered x-ray sources, and biological imaging is one of the great applications. The issue is that, in the way these experiments work, you can't bend x-rays around the way you can optical frequencies. It's hard to build a lens. It's expensive to build a lens. It is possible, but it can't really be refractive, because you can't make materials that refract x-rays well, and building reflective optics for x-rays is expensive. The easiest way to do it, by far, is to just shine light onto your object. That light, remember, has a very small wavelength, so it diffracts, and the diffraction pattern goes directly to a sensor. It's completely lensless: send light to the object, it scatters, you measure the intensity. The problem with these systems is that most of the light doesn't get scattered. You have this large concentration of light at the center of the beam and you get a dynamic range problem. This is because these are essentially natural images, and natural images, we know, are very sharply peaked in the low-frequency region, so they have a very, very large dynamic range. We're sensing the amplitude pattern of the Fourier transform of this image here; that's what we're trying to do. It's problematic and it produces all sorts of issues with the sensor. What's typically done, to avoid blooming problems with the sensor, is to just block that light out: they literally put a physical absorber in the center, in front of the sensor, and don't sense that light at all. One solution is to do HDR, as we all know: capture multiple images with different exposures. This is done, but it's slow, and there are certainly applications where they would like to do this faster and not have to wait around to capture multiple exposures. So, single shot imaging: you put an x-ray beam on your object, it scatters, and you essentially measure the amplitude of the Fourier transform.
It has a very large dynamic range, so you throw away the center. That was the problem we were looking at here. I'll just skip ahead to the results and find the most important point in 3 minutes. >>: You can take a little bit more time. >> Oliver Cossairt: Okay. I'll walk through it briefly; it will just be slightly more than 3 minutes, and I'll explain it properly. This is the model. Our sensing model is that we take the image of our specimen, x, and we scatter it. If the scattering is observed far enough away, it essentially boils down to taking a Fourier transform, and of that Fourier transform we only measure the modulus. Given some special tricks we can play, we could just take the inverse Fourier transform and get the object back. We would need the phase of the Fourier transform, which we are not measuring, but there are tricks you can play to deal with that. The difference here is that we effectively have a high pass filter: we are blocking all of the DC and low frequencies, so if we do just naive inverse Fourier transform processing, it's equivalent to taking our image and applying a high pass filter to it. All we are doing here is applying some regularization to this problem. It's a very simple solution, obvious to anybody who has done some image processing or played around with computational photography tricks; it just happens to be something that hasn't been looked at in this x-ray community, which is mostly physics people. We're using a total variation regularization to solve this problem. This is the experimental setup. You have your object here, represented by Lena's eye, and then you have a couple of dots out here, and the idea is that you measure the intensity in the Fourier plane, actually the squared magnitude. That is equivalent, if you take an inverse Fourier transform, to taking this image and measuring its autocorrelation. The autocorrelation looks like this, and the point is that because you had a couple of dots out here, the autocorrelation gives you copies of the image that you care about out here. That's the basic idea for how to solve the problem of only measuring intensity: even though you don't have the phase, you're able to get the equivalent of the phase back just using a simple Fourier transform, inverse Fourier transform, that sort of thing. But it doesn't work when you have missing data. It just totally fails: when you essentially apply a high pass filter to that image that we captured, you don't get a good reconstruction. However, if you regularize, then you do get a good reconstruction for natural images. This part in the center is essentially the autocorrelation of the image with itself. You can't reconstruct that well, because the algorithm doesn't know the statistics of natural images, but you don't care; all you care about is getting these images out here, and it seems to work very well, at least for the initial experiments that we've done. This is a real experiment in visible light comparing a conventional reconstruction and the regularized reconstruction, with just very basic images at this point. We're now working with some physicists at Argonne National Laboratory to get real biological data to test this on.
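Here is a toy sketch of the reconstruction idea just described: Fourier data with the low-frequency center blocked (standing in for the physical beam stop), a naive inverse transform that behaves like a high pass filter, and a total-variation regularized alternating-projection loop that fills the missing center back in. It only illustrates the regularization idea, using scikit-image's TV denoiser and a synthetic piecewise-constant object, and is not the actual algorithm or data from the talk.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

# Piecewise-constant test object (a TV-friendly stand-in for a real specimen).
x = np.zeros((64, 64))
x[16:48, 16:48] = 1.0
x[24:40, 24:40] = 0.5

# Simulated measurement: Fourier data with an 8x8 low-frequency block removed,
# mimicking the absorber placed in front of the detector center.
F = np.fft.fftshift(np.fft.fft2(x))
mask = np.ones_like(x, dtype=bool)
c = 32
mask[c - 4:c + 4, c - 4:c + 4] = False
measured = F * mask

# Naive reconstruction: inverse transform of the masked data (a high pass filter).
naive = np.real(np.fft.ifft2(np.fft.ifftshift(measured)))

# Regularized reconstruction: alternate a small TV denoising step and a
# nonnegativity clip (both in image space) with re-imposing every measured
# Fourier sample, so only the missing low frequencies get filled in.
est = naive.copy()
for _ in range(50):
    est = denoise_tv_chambolle(est, weight=0.05)
    est = np.clip(est, 0, None)
    E = np.fft.fftshift(np.fft.fft2(est))
    E[mask] = measured[mask]
    est = np.real(np.fft.ifft2(np.fft.ifftshift(E)))

print("naive error:      ", np.linalg.norm(naive - x) / np.linalg.norm(x))
print("regularized error:", np.linalg.norm(est - x) / np.linalg.norm(x))
```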
There are some comparisons you can make on different types of biological specimens and on a resolution target. This would be a typical result that they would get out of their algorithms, and this is where we are able to push the results using a simple twist on the algorithmic processing. That's it. That's the flavor of the projects that we work on: the MC3D scanning system, doing high-quality active triangulation-based 3-D scanning outdoors, which is where we are trying to push it; a lot of collaborations with cultural heritage institutions and universities around the world, where we are trying to take computational imaging and 3-D imaging into interesting places, specifically with inexpensive implementations, which is what we are interested in in that area; and then these emerging imaging applications in different scientific fields. [applause]. >> Sing Bing Kang: Any more questions for our speaker? >>: For the last thing, where you had the two dots on the side, on either corner there, was that like making something like a hologram? >> Oliver Cossairt: It's exactly a hologram. It's called Fourier transform holography. >>: You had essentially two reference sources, which is interesting. >> Oliver Cossairt: Yeah. It doesn't have to be that way. People have played around with all sorts of different types of reference objects, even using multiple points together. One interesting consideration is what basically boils down to the contrast of the interference between the light diffracted from the point source and the light diffracted from the object. You want more light, and you can't put a beam splitter in the setup to force more light through the pinhole, so ideally you would not attenuate as much light and would just use a larger aperture, say a circular or square one. The square aperture has been done before, because if you apply two derivatives, horizontal and then vertical, you end up with four delta functions at the corners, so you can get higher contrast fringes and then get back the same image you would have had with dots, that sort of thing. What I didn't really get a chance to explain is that in practice this is just one of the techniques they use, the holographic one. What they would really like to do is have no interference at all: just take the diffraction pattern from the object and measure the amplitude of its Fourier transform, and there is a family of so-called iterative, nondeterministic phase retrieval techniques that are used for that. The approach we use also works in that case. It's called coherent diffraction imaging, and that's the more common way to actually solve this problem. The alternative is that you actually have to fabricate a piece of hardware that has those little dots in it, and then you need to place your object in a known position relative to those little dots, and this is at the nanometer scale, so it can be really challenging. Taking the pinhole out of the equation essentially makes for simpler, more convenient experimental setups, and that's the regime where it is more often used. >>: Is that hard to scale, two beams [indiscernible] for example? >> Oliver Cossairt: Yeah. It's a pain in the butt. I think any type of optical component for x-rays is really a pain to work with. That's my understanding of the situation. Beam splitters, focusing optics, all of these things do exist and people do use them, but they seriously add to the complication of the experimental setup. So if they can avoid using them, they do. >> Sing Bing Kang: Any more questions? I just have one.
How do you know that the results you show here are reasonably accurate, given that, A, the surface is not [indiscernible] and, B, there is a lot of [indiscernible]? >> Oliver Cossairt: That's basically what we are looking at now. The goal right now is to take ground truth measurements using high precision 3-D surface measurement equipment. We're going to take a white light interferometer and measure, at micron scale, the height maps of a set of representative surfaces that have been produced by the conservators, with reflectances similar to the materials in the actual artworks that we care about. We'll make ground truth measurements and do comparisons between different photometric stereo reconstruction algorithms, different assumptions about using libraries of materials, and that sort of thing. That's essentially what we're looking at now. For this initial work all we did was assume the simplest version, and the results were informative to them even with that simple assumption: straight photometric stereo from the 1980 Woodham paper, that's all we're doing, but that was useful to them. Now we're starting to look at it in a little more detail to see how important these choices are and what accuracy is required out of these measurements. >> Sing Bing Kang: Let's thank the speaker once more. [applause]