>> Sing Bing Kang: Welcome. Good morning. Thanks for being here. It's my pleasure to introduce Albrecht Lindner. He's a Ph.D. student at EPFL under the supervision of Sabine Süsstrunk. He's actually defending next month, so he'll be finishing his Ph.D. very soon. His interest is in statistical methods for scene understanding and scene enhancement, and the talk he's about to give is based on his thesis.

>> Albrecht Lindner: Great. Thank you for the introduction, and thanks for being here despite the upcoming CVPR deadline. I'm glad you made some time. Before I start my talk, let me give a short introduction to my academic background. I started with a masters in engineering in Stuttgart in Germany, and after that there was the possibility to do a double masters program with a university in Paris, where I focused on image and signal processing. Right now I'm doing a Ph.D. at EPFL in Switzerland, and as Sing Bing just said, I will defend soon. My professor is Sabine Süsstrunk and the industrial sponsor is Océ (Canon); these are the ones that enabled me to do this work. Before the academic part, I also want to make a quick note: I have been running a kind of one-person business for a bit more than ten years now. I'm doing this a bit in parallel, a bit less than in the past, but still: I build props for magicians. These things are maybe a bit unknown to you, so I'll give a quick intro. It starts with the [inaudible] design. The magician comes to you and tells you what kind of effect he wants to realize, and then you work out how you could make it possible that the audience gets the illusion of that effect. Then I build these devices. Of course you use wood and metal for the physical structure, but depending on the magician's needs you maybe also have to build printed circuit boards, program microcontrollers, or use power electronics in order to control different flashes or motors to make something happen on stage. And this is not really a toy; we're a bit serious about that. With the gadgets that I build we have won different international prizes. I won't go deep into that. Of course I would love to share all these secrets of the magicians, but in the interest of time I will now go to the academic part.

The outline of my talk: I will present two different things. The first one is semantic image enhancement, which was presented last week at ACM Multimedia. The other one is automatic color naming, which will be presented at CIC in Los Angeles. Both of them use the same statistical framework underneath; they are just two different applications of the same mathematical concepts.

So let's start with semantic image enhancement. The first thing I'd like to pose is a question, and the question is: which image is better? Do you have an opinion? Who prefers the one on the left? Just one? Okay. And the others prefer the one on the right, I guess. The thing is, the question was a bit unfair because I didn't give you all the information. Additional information could have been: which image do you prefer for the concept dark, and which image do you prefer for the concept snow? In that case, if I ask you for an image for the concept dark, you would have said the left image, and for the concept snow you would have voted for the right image. The same is true here; again, it is the very same image, these are just two different versions of it. The one has been enhanced for a sandy beach and the other for sunset.
So the point I want to make here is --

>>: The one on the right actually looks unnatural, regardless of what you say. It looks unnatural because you don't naturally see the sun appear like that when everything is so bright.

>> Albrecht Lindner: That's true. I made the processing quite strong here in order to show the difference between the two images. You can of course make the processing a bit less strong and do something in between; this is just to demonstrate the difference between the two concepts. So the main point here is that you can't decide which image is better if you have only the pixel values and not the semantic context. And this is the reason why any kind of auto-adjust of contrast and colors can do a decent job, but will never get to a position where it can actually enhance an image for a semantic concept. The only way to do this is manual editing: you use Photoshop or some other tool, and if you give such an image to an artist and say make it look like a sunset, he will do something like the one on the right, for instance. But of course, while manual editing is good, it would be nice to have something automatic. So the goal is to have an automatic enhancement with semantics.

Now let's look at the solutions that are provided today. One possibility is to have modes. You have cameras that have a portrait or nature or fireworks mode, and depending on the mode you set, your camera will apply a specific processing. Or you have printers; they have modes like draft and presentation, and again each mode invokes a different processing that is adapted to the context. Then there are other methods that do a classification plus an enhancement step. There are different publications that try to classify skin or sky regions or some other classes, and then for each region of the image belonging to a class you apply a specific processing: for skin regions you make sure the skin tones are correct and not too reddish, or something like that. The problem of all these methods is that they are difficult to scale to large vocabularies. You can imagine having 10 to 20 modes, or maybe 20 or 30 classes, and implementing a processing for each class. But what if you want to do a thousand or 10,000? This is just not practical -- you cannot implement 1,000 different processings for 1,000 different classes. So you need some other means to take semantic context information and process the image accordingly.

One point I want to make clear is that we are not doing a classification task. A classification task would be to go from an image to the keywords. In our case we have a keyword and we want to apply this keyword to the image, so we go the other way around from a classification task. What we need is something that tells us a keyword's significance for an image characteristic. And we need an image characteristic that makes sense to change in an image; we can't just use descriptors or bags of words or those kinds of descriptors. These won't work because we need something that we can actually change in an image, such as lightness, color, depth of field or frequency distributions. The way we measure the significance of a keyword is the following: we start with a big database. We have one million images plus keywords; it's a public database, readily available, from Flickr.
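A minimal sketch, in Python, of the data preparation this setup implies: a normalized gray-level histogram per image and a split of the collection into images with and without a given keyword. The folder layout, the `tags` mapping and the choice of ten bins are assumptions for illustration only, not details from the talk.

```python
# Sketch: per-image normalized gray-level histograms, split by keyword
# (hypothetical tag loader; only the histogramming and the split are shown).
import numpy as np
from PIL import Image

N_BINS = 10  # number of gray-level bins; the exact binning in the talk may differ

def gray_histogram(path):
    """Fraction of an image's pixels falling into each gray-level bin."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    hist, _ = np.histogram(gray, bins=N_BINS, range=(0, 255))
    return hist / hist.sum()

def split_by_keyword(tags, keyword):
    """tags: dict mapping image path -> set of keywords (loader not shown)."""
    with_kw = [p for p, kws in tags.items() if keyword in kws]
    without_kw = [p for p, kws in tags.items() if keyword not in kws]
    return with_kw, without_kw

# e.g. paths_night, paths_other = split_by_keyword(tags, "night")
#      hists_night = [gray_histogram(p) for p in paths_night]
```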
So, for example, let's look at the keyword night, and for the characteristic we just look at some gray levels. What you see here on the X axis is a histogram with different gray levels that go from bright to dark, and on the Y axis you see what percentage of the pixels in an image fall into each bin. You see that for night, for instance, on average a bit more than 20 percent of the pixels actually fall into the darkest bin, and this is quite a bit more than for images that are not annotated with night. On the other side, night images have fewer pixels that fall into the bright bins, whereas the images not annotated with night are higher there. So the question now is: can we assess the difference between these two distributions? Each bin is a distribution -- this is the distribution here, with the median and the 25th and 75th percentiles -- and can we assess the difference between these two distributions for every single bin? The thing is, we don't really know the distribution a priori, and we want this to be very versatile, so we don't make any assumptions about what the distribution might look like. That's why we're using nonparametric significance tests: we don't want to assume it is a Gaussian or whatever distribution, we just want it to be general.

There are a couple of significance tests that do that, that compare two distributions no matter what the actual distribution is. Three of the well-known ones are the Wilcoxon rank-sum, the Kolmogorov-Smirnov, and the Chi-square test. I have here three example inputs to demonstrate the difference between these methods. The Wilcoxon rank-sum test just measures a difference in median: you see here two probability distributions that are just shifted, and your test statistic will be this part here, the difference in the medians. However, if you have two distributions that have the same median but a different shape, you will not measure any difference, so zero, and the same for the bottom case. The Kolmogorov-Smirnov test measures the maximum difference between the two cumulative distribution functions along the X axis, and that's why you measure the difference in shape, but you don't measure anything here. Lastly, the Chi-square method: because you take the difference in each single bin, you will actually also measure a difference for the distribution at the bottom. The thing is, in our case, when we want to do image enhancement, we want to increase or decrease an image characteristic. We don't care about the shape, we just want to increase or decrease. So in our case we use the Wilcoxon rank-sum method, because we don't want sensitivity to shape changes. Of course, if you have some other application in mind, you might want to use a different test with different properties.

>>: Isn't this shape of distribution very typical of contrast? When you reduce your contrast it gets sort of gray and muddy, and if you really push the contrast things move out to the edges.

>> Albrecht Lindner: Sure. You can still do that; I can show this on the next slide. So it's the same plot. You see again the distributions from the images tagged with night and the ones not tagged with night. And then you compute the Z value -- this is just the test statistic minus its expected mean, divided by its standard deviation -- which gives you this Z value. You actually see that it is positive here in the dark bin, which means night images have significantly more pixels with dark gray values.
And here it's negative, which means night images have significantly fewer pixels with these bright gray values. Now, for instance, if you want to enhance the image accordingly, you would shift pixels according to this: you would say I need more pixels here, so you shift pixels to this side, and you need fewer pixels here, so you move them out of this region. If you do contrast stretching, you'd have a similar distribution that tells you you need more here, more here, and less here, and you just move pixels out of the middle toward the sides.

This example was for a gray-level characteristic, but you can have other characteristics. Here you see CIELAB space, here's the keyword Ferrari, and you see a three-dimensional histogram and the Z values in it with these three heat maps. You see that the maximum here, the crossing, actually is in the red region, as expected for Ferrari. And you can do more. For instance, here we have Z values for red, green, blue and flower, and here at the top you see the hue angle. You actually see that red has significantly more pixels in the red bins and green more in the green bins. Flower is a bit less significant but also has some peaks in the red regions. If you look at local binary patterns, you'll see that red, green and blue don't have very significant patterns; however, flower does. This just means that flowers have fewer angles that are very pointy and more angles that are flat or curved, which means that flowers have round shapes. You see all these statistics, and you see these patterns as expected in the data. Or you can do spatial layouts. Here's the keyword light, and you see that in the center of an image -- so this is just a spatial layout; we just take an eight-by-eight decomposition of the image, no matter what the aspect ratio or size is -- you have slightly positive values in the middle. That means an image of light tends to be a bit light in the middle, but it's definitely a lot darker in the surroundings. And here's a typical example that shows you the reason for that. Or here we did a spatial chroma layout; the keyword is barn. You see that in the center of the image, where you have a barn that is made of wood, a low-chroma object, you have lower-chroma pixels, and at the bottom, where you have grass or nature, you have high chroma, and that's why the values are positive at the bottom. Or you have fireworks here. These are Gabor filter layouts that tell you how much structure there is. You see at the top a blob that comes from the fireworks, and the bottom is mainly an illuminated city.

>>: [inaudible] they make that they have the firework in the image?

>> Albrecht Lindner: No, you could -- but that would be automatic image classification. That would be to go from the image to the keywords.

>>: Okay.

>> Albrecht Lindner: What I'm doing here is the other way around. What I'm doing here tells me: if you have an image for fireworks, there probably is a lot of structure at the top in the middle and here at the bottom. You could potentially also use it for automatic image orientation, but that is not the goal of this work.

>>: I see.

>> Albrecht Lindner: There's one more point: this is very efficient. For this Wilcoxon rank-sum test, what you need to do is sort all these values into a sorted list and then compute the ranks of all the elements that belong to the one set. I won't go into details.
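A minimal sketch of the per-bin significance test just described, reusing the per-image histograms from the earlier sketch. `scipy.stats.ranksums` returns the normal-approximation z-statistic of the Wilcoxon rank-sum test, which plays the role of the Z values on the slides; the exact normalization and sign convention in the thesis may differ.

```python
# Per-bin Wilcoxon rank-sum Z values for one keyword, computed from the
# per-image histograms of images with and without the keyword.
import numpy as np
from scipy.stats import ranksums

def keyword_z_values(hists_with, hists_without):
    """Z value per bin: positive means images tagged with the keyword tend to
    have more pixels in that bin than the other images, negative means fewer."""
    hists_with = np.asarray(hists_with)        # shape (n_with, n_bins)
    hists_without = np.asarray(hists_without)  # shape (n_without, n_bins)
    z = np.empty(hists_with.shape[1])
    for b in range(hists_with.shape[1]):
        z[b], _ = ranksums(hists_with[:, b], hists_without[:, b])
    return z

# For the night example one would expect z to be positive in the darkest bins
# and negative in the brightest ones.
```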
But the thing is, you only have to sort this list once. For every additional keyword whose significance you want to measure, it's just a simple sum, so you can do this in a fraction of a second. You can easily compute it for thousands of keywords; in my case I just stopped at around 3,000. We can easily compute it for a large set of keywords. You can compute the significance values at almost no cost, because once the sorting is done, the keywords all go one after the other, very fast.

Okay. So now I'm going to stop this part and go on to how we used it. The first application is semantic image enhancement. We implemented three different methods: we did tone mapping, we did color enhancement, and we did depth of field adaptation. For this last one I have to add that the input images and the method to estimate the defocus map come from these authors; however, the algorithm and the output images I'm going to show you are our own. To show you the principle, I'll focus on one of the enhancements; the others are similar. The enhancement pipeline we propose is the following: we have two independent inputs, an image and a keyword. Then you extract characteristics; one part is image dependent, the other depends on the semantics of the keyword. Then you fuse them together to get an output, and in this case the output image looks a bit more golden than the input.

Let me first go to the semantic component. What we're doing here is we take the significance values; in this case we just do it for the red, green and blue channels. It's like the gray levels that we saw before, but here it's just for the red, green and blue channels. You see here for the keyword gold, you have positive Z values here for the blue curve; that means you need more pixels that have a low blue content. And you have high, positive Z values here, which means you need more pixels that have a high red content, and green is something in the middle. Basically this processing adds a bit more gold to your image.

>>: Computed based on many images?

>> Albrecht Lindner: Yes, we took this database of online images.

>>: This is how you got it back.

>> Albrecht Lindner: This is how I got this one, exactly. And from that we can now immediately derive a tone mapping operator. Here's the equation. What we want is something that accumulates the blue channel at the low values, so for blue we need a tone mapping function that starts at the beginning and goes up steeply. For red we need the opposite: something that pushes the pixels away from the low red values and accumulates them at the high red values, so approximately we have a processing curve that goes up like this.

>>: So can I ask -- so for any image that, say, is gold and I want you to enhance it, you also apply that --

>> Albrecht Lindner: For gold, yes.

>>: So even within gold there might be some differences in the way people might enhance it, right? I mean, you're basically saying anything that's gold, I just apply a single type of transformation.

>> Albrecht Lindner: There's a second component that adapts to the image, but the semantic component will always be the same. So if some people have a different imagination of the concept of gold, then I need a database from these people to learn it. But then I can do it.
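A hedged sketch of how such a tone mapping curve could be built from the per-channel Z values with a single strength parameter (the S mentioned a moment later): histogram specification toward an exp(s·z)-reweighted target reproduces the behavior described here, identity at s = 0 and pixel mass pushed toward bins with positive Z, but it is not necessarily the exact operator from the paper.

```python
# One plausible construction of a semantic tone curve from per-bin Z values of
# a single channel, with strength parameter s. s = 0 gives the identity; larger
# s pushes pixel mass toward bins with positive Z and away from negative ones.
import numpy as np

def semantic_tone_curve(z, s, n_points=256):
    z = np.asarray(z, dtype=np.float64)
    target = np.exp(s * z)                            # desired relative mass per bin
    target /= target.sum()
    cdf = np.concatenate(([0.0], np.cumsum(target)))  # target CDF at bin edges
    edges = np.linspace(0.0, 1.0, len(z) + 1)         # bin edges in [0, 1]
    t = np.linspace(0.0, 1.0, n_points)               # normalized input values
    return np.interp(t, cdf, edges)                   # inverse target CDF

# Applying it to a blue channel scaled to [0, 1]:
# curve = semantic_tone_curve(z_blue, s=1.0)
# blue_out = np.interp(blue_in, np.linspace(0.0, 1.0, len(curve)), curve)
```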
So I just need data to learn what people mean by the concept of gold, and if their taste doesn't match, in this case, the Flickr dataset, then of course it won't work.

>>: Also, if I say gold and sunset, what will it do?

>> Albrecht Lindner: I'm going to talk about that later.

>>: [inaudible].

>> Albrecht Lindner: And the only thing we have is a single parameter that says how extreme the processing is. If S is zero, we just have the identity transform, and the larger S is, the more extreme the processing will be.

Okay. So we now have the semantic component; let's go to the image component. Because if you just apply this tone mapping curve to the entire image, you will affect everything, also the sky, and that won't look good. So we need something that adapts to the image. For this input image and gold, we build a weight map that says how likely it is that each pixel is part of the concept of gold. In this case we use a simple method: we just take the color value at the pixel and ask how high its significance is for this semantic concept. If it's high, you get a bright pixel here; if it's low, you get a dark pixel on this side.

>>: And how do you do that?

>> Albrecht Lindner: Okay. We just take the significance values.

>>: How do you compute it? Is that out of the million -- out of the subset of images that's been tagged gold?

>> Albrecht Lindner: Yes, exactly. So we have these significance values for gold that we computed from this one-million-image database.

>>: Okay.

>> Albrecht Lindner: And this tells me how much each single color is related to gold. There is a high value in the yellowish region and a negative value somewhere in the red and green regions.

>>: There must be a technique to allow you to do that kind of -- what to use.

>> Albrecht Lindner: This is not hidden. The thing is, I know the keyword anyway. I have two inputs: an image and a keyword. So I know the tag of this image.

>>: I understand. But you're saying the probability that it is actually relevant to gold.

>> Albrecht Lindner: Exactly.

>>: But the question is how do you learn that?

>> Albrecht Lindner: Well, I learned it from the tagged database of one million images that I had at the beginning.

>>: Oh, you're looking at the colors across images that are tagged gold.

>> Albrecht Lindner: Yes. The thing is, I have these significance distributions that I showed you before for gold and barn and Ferrari and any concept. This is precomputed, and you can just store it. Now if you have the concept of gold, you just look up the significance distribution for gold; you take it out of your database, it's precomputed, and you can immediately take every pixel here and look up how significant it is for gold. You just look it up in the distributions. You could of course use other methods to find the regions relevant to gold: you could do a segmentation and then some more sophisticated computer vision techniques to find the regions, and there's a lot of work on that, of course. In our case we just do this, but of course you can do something more sophisticated.

>>: Yeah, because I would imagine the distribution is so varied it's not clear to me.

>> Albrecht Lindner: The thing is, if the distributions vary, the significance values will be low, and you can use a threshold: if I don't find a dominant concept, I won't touch the image.
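A minimal sketch of this image component and of the fusion step described next: each pixel's CIELAB color is looked up in the precomputed Z-value histogram to form a weight map, and the output is a linear blend of the original and the globally processed image. The rescaling of Z to [0, 1] and the bin handling are simple illustrative choices, not necessarily the paper's.

```python
# Sketch: per-pixel weights from a precomputed 3-D Z-value histogram over
# CIELAB, then a linear blend between original and globally tone-mapped image.
import numpy as np

def semantic_weight_map(lab_image, z_lab, bin_edges):
    """lab_image: HxWx3 CIELAB image; z_lab: 3-D array of Z values;
    bin_edges: list of three 1-D arrays of bin edges (L, a, b)."""
    idx = [np.clip(np.digitize(lab_image[..., c], bin_edges[c]) - 1,
                   0, z_lab.shape[c] - 1) for c in range(3)]
    z = z_lab[idx[0], idx[1], idx[2]]   # per-pixel significance lookup
    w = np.clip(z, 0.0, None)           # keep only positive significance
    return w / w.max() if w.max() > 0 else w

def fuse(original, processed, weight):
    """Untouched where the weight is 0, fully processed where it is 1."""
    w = weight[..., None]               # broadcast over the color channels
    return (1.0 - w) * original + w * processed
```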
For gold you definitely have a high significance blob in the yellow region that is very dominant, and if you detect that you can say, okay, in this case I really know what to do, and you go further.

Okay. So now we have our image and semantic components, and we're going to fuse them in a semantic processing step. What we're doing is we globally process the image with this tone mapping curve and do a reweighting with the map here. We take the input image where the weight is zero, and we take this intermediate image where the weight is one, and in between we just interpolate linearly. That means we only enhance the characteristic in the relevant region: in this case we only touch the bottom part and increase the goldness here, and we don't touch this guy.

Now I'm going to show you a couple of example images that we processed. For instance -- I will just dim the lights here so you can see this a bit better -- here's for sand, here this is for snow. Here is an input image for dark and now this is the output image. This is the silhouette. There's sunset. There's grass. Autumn. There's strawberry. So you can see you can do any kind of keyword; you're not limited to a restricted vocabulary. Here is an enhancement for sky. There's banana; it makes the banana more yellow. And this now is for depth of field; the ones before were colors. Here the keyword macro indicates that the artist's intent was to have a depth of field effect. If you plug the keyword macro into our framework, you'll get this as an output, where the background is blurred out but the foreground remains in focus. Here you have, for flower, also spatial frequency processing, or here is this little boy for the keyword macro, which also blurs out the background.

Intuitively we found it looks quite good, but of course we wanted to measure the performance of the system. So we did some experiments. The experiment was that we showed two images to an observer, the original and our proposed image, and we also showed them the keyword. We just asked them: which image do you prefer for this context? In this case the context is sand. Of course, you don't even really see there's a beach. In this case people would probably pick that image.

>>: Can you compare against other techniques like this?

>> Albrecht Lindner: So we did eight different keywords, 30 images each, and different values for the strength parameter. In total there were almost 30,000 image comparisons. We did it on Amazon Mechanical Turk. The result you see here: at the bottom you see the strength parameter, and here's the approval rate. Anything above 50 percent means that they preferred our image. You see that the approval rate is above 50 percent for almost all the keywords except light, and I'm going to come back to that later.

We did a second experiment where we checked what we call reciprocal keywords. These are images that you can enhance for two different concepts. For instance, this image here: you can enhance it for snow and for dark, and our proposed images are this one for snow and this one for dark. We also tested other methods like histogram equalization and Photoshop auto contrast. Of course, the thing is that none of these methods is actually able to take a semantic concept as an input; that's why for Photoshop and histogram equalization you have the same image here at the bottom. So what we did is show the image and the keyword together and ask which they prefer.
We took a group of observers and showed them 29 of these pairs, with all four images in a row together with the keyword, and we asked them to pick one out of the four. The result is the following. You see here the 25 percent line, because if you picked one of the four images purely at random, that would be 25 percent. And you see that our method does indeed outperform the others. The main reason is that these are reciprocal keywords, and we showed them images you can enhance for the one or the other concept, and we didn't find any method that was actually able to enhance an image for sand or anything else. That's why on this dataset we significantly outperform the others: there is no other method that can actually enhance an image for an arbitrary semantic concept.

I'm going to conclude this section with limitations and future work. There are a couple of keywords that don't really have a significant characteristic, and these are mainly abstract keywords like friendship or boredom. At the moment we don't really know what you would do to an image to increase its friendshipness; it's very difficult, these are very high-level concepts, so we don't really know what to do with those. The thing is also that the significance values for these keywords are rather low, which just means we don't have any significant characteristics for these keywords.

Then some keywords have conflicting meanings, and this is the reason why we underperform for the keyword light. What our algorithm does: if you have this input image and apply the keyword light to it, you get this as an output. What our algorithm learned is that light images actually are rather dark, and the reason is that having a light source in an image only makes sense if you have a dark surround. Consequently our algorithm darkens the image, and actually the light source here is more salient; it pops out more in the image. However, the observers on Amazon Mechanical Turk thought about light as: I just want to have a bright image. So they went for the one on the left. In our current implementation we're not able to tell whether the keyword light means this more artistic interpretation of light or just something bright.

Then another problem that we haven't tackled yet -- this is exactly your question -- is multiple keywords or machine-generated keywords. It would be good, if you have multiple keywords, to be able to weigh how important each is for the image. They could also be machine-generated keywords, or keywords that are just wrong, so it would be good to have some mechanism to detect these and discard them or remove their influence on the processing. There was actually some work from the group here, presented last week at ACM Multimedia, that deals with this problem of imperfect tagging; that could be relevant to solving this problem. And the last thing, which we talked about before: this is a publication from Sing Bing here that also does context-based automatic image enhancement, where the context in this case is images that are similar in terms of image features. It would be interesting to see what that framework does if you query for images with similar keywords; maybe you get the same output images or similar effects.

To summarize, I presented a framework that links image characteristics and semantics, and I presented two image enhancement methods.
You can download more images and also the code from the website here. Do we have more questions on that? Otherwise I will move on to the color naming. Okay.

This is going to be a bit shorter, because it uses the same framework underneath. Color naming is actually a task that is a bit tedious. The traditional way to do it is that you show a color name to an observer and ask him to adjust some color sliders to find the right color patch. Or you go the other way around: you show a random color patch and ask what this color is, and the user has to type in the name. Of course this is a very time-consuming task. Our approach is to just use this statistical framework: if you have an input keyword green, you can directly compute the significance distribution in color space from a large database and find the maximum in the green region of color space.

Because we can go large scale with this framework, we actually wanted to do that. We took a database of 950 English color names plus color values, the so-called XKCD color survey. This survey was done with exactly these kinds of experiments: people were asked to type in the names for different color patches. And to go even more large scale, we asked native speakers to translate this list of color names into nine other languages. So we did Chinese, French, and some other European and Asian languages. Then we did our statistical analysis: we went on Google Image Search and downloaded 100 images per color name, converted them all to CIELAB color space assuming they are sRGB, ran the test for all color names, and picked the color bin with the maximum significance value. To account for quantization errors from the histogram, we did a bilinear interpolation in a local neighborhood around this maximum bin.

I'm going to move on to results now. You see here, for the ten different languages that we covered and 50 example color names, the color estimations. For instance, pale pink in English was estimated to be this color; the Chinese translation was estimated to be this color.

>>: Were they looking at an English word or the Chinese equivalent? When the Chinese people were asked to pick a pale pink, did they see --

>> Albrecht Lindner: They weren't asked that -- we asked them to translate these 900 color names to Chinese. We asked one native Chinese speaker to translate all the color names into his native language.

>>: That's just one person?

>> Albrecht Lindner: One person did the translation of the color names.

>>: Okay.

>> Albrecht Lindner: And then we used the automatic approach to estimate all these color names, all these color values.

>>: Yeah. Okay.

>>: But for the translation, there could be some bias in the particular word a person picks, since it was just one person translating.

>> Albrecht Lindner: That is definitely true. There are some translation problems; I'm going to get to that next. So this is the accuracy. It looks quite good, but is it actually accurate? It turns out language is just one of the problems you're dealing with. Let's look at the color maroon as an example. These are our estimations for maroon. You see the bars here -- maybe I'll dim the lights so you can actually see the colors -- for the different languages, and here is the delta E distance to the ground truth from the XKCD dataset. The thing is, this XKCD dataset only gives you the English names and the estimation for the English color name.
So that's why in English the delta E distance is 10, which is reasonably good. But it turns out that there are a few languages where the error is quite low and another group of languages where the error is quite high. The reason is that the translator, for instance for Chinese, could not find a direct translation for maroon, so he picked something that is close. For Portuguese, for instance, the translation is castanha, which means chestnut; that is why this is more brown, and consequently the error is a bit higher. It turns out that for all the translators of these languages, the color name has something with chestnut in it: in German this is chestnut brown and this is just chestnut. These are all chestnutty colors, which is why they're a bit brownish, and consequently the error is a bit higher. It's not really an error, it's a different word; we can only compare to the English value, which is why it seems high, but actually the brownish color here is correct.

And to also put this in perspective, we went on the Internet and found different databases that propose their own estimations for maroon. [inaudible] is an online color database, the W3C has a definition for HTML Web pages, and there's X11 for Unix, Moroney -- different databases -- and we compared their values to the XKCD value for maroon. You see they have errors on the same scale as ours.

>>: [inaudible] oh, I see.

>> Albrecht Lindner: XKCD has to be zero because it measures against itself.

>>: Okay.

>> Albrecht Lindner: And what you see here now is the delta E distance not only for maroon but for all the color values we have, and you see that 30 seems to be a reasonable error; this is also what the different databases can agree on. What you see here are all the delta E distances for our color names, and the median is roughly in the region of 30. So you could say that half of our color value estimations are actually within the range of human disagreement. The other thing is that these language translations are also challenging, and they add error to the estimation, because we don't have ground truth for all these languages. If we had ground truth for all these languages, these errors would all be a bit lower.

Okay. I think I can show a quick demo of the color thesaurus. We put this online. You see here the color name, then the sRGB values and CIELAB values, and you can browse through color space. You can say: I like this color but I just want something that is darker. You can just click here and you go to a darker color. Or you can say: this is good, but I would like the hue angle to be a bit different. You can click through here and it will walk you through color space to different colors.

>>: Is the upper part changing as we do this, or is that just a picker?

>> Albrecht Lindner: This is a color picker. I'll show it to you now. You can pick a color here from the color wheel, and then it will just give you the closest color it can find.

>>: As you click down in the bottom --

>> Albrecht Lindner: That doesn't update; I haven't implemented that yet.

>>: What's the color that the algorithm predicted for that particular [inaudible].

>> Albrecht Lindner: Here. So this is an input color name, light blue and green, and this will give you sRGB values as an output.

>>: It predicts from the name --

>> Albrecht Lindner: Yes, you give it a name and it will automatically give you the color values.
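A hedged sketch of the name-to-color estimation behind this lookup: given the precomputed Z-value histogram in CIELAB for a color name, find the maximum bin and refine it locally. The Z-weighted average over the 3×3×3 neighborhood is a simple stand-in for the bilinear interpolation mentioned earlier, not the exact refinement from the paper.

```python
# Sketch: estimate a CIELAB value for a color name from its Z-value histogram.
import numpy as np

def estimate_color(z_lab, bin_centers):
    """z_lab: 3-D array of Z values; bin_centers: three 1-D arrays of centers.
    Returns an estimated (L, a, b) triple for the color name."""
    i, j, k = np.unravel_index(np.argmax(z_lab), z_lab.shape)
    acc, wsum = np.zeros(3), 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                ii, jj, kk = i + di, j + dj, k + dk
                if not (0 <= ii < z_lab.shape[0] and
                        0 <= jj < z_lab.shape[1] and
                        0 <= kk < z_lab.shape[2]):
                    continue
                w = max(float(z_lab[ii, jj, kk]), 0.0)  # ignore negative Z
                acc += w * np.array([bin_centers[0][ii],
                                     bin_centers[1][jj],
                                     bin_centers[2][kk]])
                wsum += w
    if wsum == 0.0:
        return np.array([bin_centers[0][i], bin_centers[1][j], bin_centers[2][k]])
    return acc / wsum
```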
And what you can also do now is translate it, not in terms of language but in terms of color. You can say: this is a nice color in English, but what actually is the closest color that a German person would use? And then it will say, okay, this color is closest to this German color name, which is [inaudible] in German. Or you can go to Chinese and then say, okay, a Chinese person would call this color spring green. You can also pose a query; for instance, you can see the different types of burgundy. And if you want, you can give me a challenge: name me a color and I'll try to look it up in my database.

>>: Ochre.

>> Albrecht Lindner: There we go.

>>: Try viridian. V-e-r-i-d-i-a-n.

>> Albrecht Lindner: Okay. Okay. I cannot --

>>: It's an artist color.

>> Albrecht Lindner: You wanted the challenge. What kind of tint is viridian?

>>: I think it's a kind of green.

>> Albrecht Lindner: A kind of green. Okay. So we have -- okay.

>>: But if you were an artist, then certain paints are named by the pigment, just like ochre. There are certain ones, like vermillion. You have vermillion. So that's a red.

>> Albrecht Lindner: That I have here.

>>: That's right.

>> Albrecht Lindner: Yeah. Okay. So the survey was not done by artists; these were just random people on the Internet. They might not know all these special terms, but I'm sure we could add them. So what was it, viridian?

>>: Viridian. I'm trying to find out if it's actually on the Web, because I can't find it. I may have just hallucinated. There's viridian color. V-i-r-i-d-i-a-n.

>> Albrecht Lindner: V-I?

>>: V-I. I was just misspelling it.

>> Albrecht Lindner: It's the top one. Viridian. I have it.

>>: Very good.

>> Albrecht Lindner: Okay. So you can play with this; it's online. And with that I come to my final conclusions. I showed you a statistical framework that is easily scalable because there's a very efficient way to implement this test. I showed you two applications, semantic image enhancement and automatic color naming, and I hope I could convince you a bit that semantic context actually is helpful for image processing in general. So thank you, and if you have questions I'll take them.

[applause]

>>: So any other questions?

>>: Actually, I have one. I think you glossed over the case where you're doing the automatic depth of field. So the training is not -- the transformation is not color anymore, right? It's the amount of blur that you --

>> Albrecht Lindner: Exactly.

>>: So how do you decide what is foreground and what is background?

>> Albrecht Lindner: Yes. Good question. So the thing is -- let me just go here.

>>: Because that seems to be a pretty difficult problem in general.

>> Albrecht Lindner: It is. For the color case, we had a semantic part that does the tone mapping and an image part that tells you where to apply it. For the depth of field, you have the same: you have a filter in the frequency domain, [inaudible] domain, that you also estimate from your significance values, and for macro it just tells you that you have to reduce your high-frequency content.

>>: So it has to be -- it has to have reasonably low frequency content to start with. That's how you figure out --

>> Albrecht Lindner: Exactly. And the map here, the weight map, is a so-called defocus estimation that tells you which regions of the image are already a bit blurred, and then you just add more blur there.

>>: Okay. Just enhancing the amount of blur.

>> Albrecht Lindner: Exactly. And so this is not our own algorithm.
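A minimal sketch of the depth-of-field adaptation just discussed, assuming a defocus map scaled to [0, 1] is already available from an external estimator (credited just below). The single Gaussian blur and linear blend are a simplification of the frequency-domain filtering described in the talk.

```python
# Sketch: add blur where the image is already defocused, keep the subject sharp.
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance_depth_of_field(image, defocus_map, strength=1.0):
    """image: HxWx3 float array; defocus_map: HxW, larger = more defocused."""
    sigma = 3.0 * strength
    blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))  # spatial blur only
    w = np.clip(defocus_map, 0.0, 1.0)[..., None]
    return (1.0 - w) * image + w * blurred  # blend extra blur into defocused regions
```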
As I said at the beginning, we use the defocus maps from Zhuo and Sim to do this task. There are different methods out there to do defocus map estimation, and we just use one of them.

>>: Right. So if you want to sharpen everything, you can still do it, do the opposite, basically figure out --

>> Albrecht Lindner: I guess we haven't tried that, but I guess you can also do that.

>>: Sharpening is hard. The blurred content isn't there and you get extra blur.

>>: Slightly sharper.

>>: The blur is a lot easier.

>> Albrecht Lindner: What you could do, if you have keywords that indicate structure -- for instance the keyword fence or architecture or grid or something -- you could try to learn, if you have a descriptor for that, that for architecture there is a significant amount of right angles in the image. And then you could try to do some enhancement based on that: you say, okay, this image is tagged architecture, so try to find something like right angles and then increase the sharpness there, or something. So I guess it's possible; we didn't do it.

>>: Another very common image is like [inaudible], right? Text. So anything that you know might be part of text, sharpen it.

>> Albrecht Lindner: Yes, sure. What you could try to do, if you have a scanner and you have mixed regions -- some regions of text and some of images -- you could, if you know that [inaudible], try to detect whether there's any text in the image and then actually do two things. If you know there's text in the image, you sharpen these regions. But you could also use OCR character recognition to actually read the text, because it might tell you something about the image that's next to the text. If it talks about a sunset or something, then you can not only sharpen the text but also make the sunset look good.

>>: So the processing from input to output, is this in a fraction of a second because everything's precomputed?

>> Albrecht Lindner: Yes. All you're doing is looking up the significance values for the keyword -- just a lookup -- and then you just have a tone mapping. It's very, very fast. I think it's also suitable for embedded systems. All you need is a database of significance values, maybe for a few thousand keywords. That's it: you look them up and apply them to the image.

>>: Think it would run on a mobile phone?

>> Albrecht Lindner: You could. You could say: I have a nice image and I want to enhance the beach-ness of my image, or something. You could have a slider that increases beach in your image. You could have that, sure.

>> Sing Bing Kang: Any more questions? If not, let's thank the speaker once more.

[applause]