>> Sing Bing Kang: Welcome. Good morning. Thanks for being here.
It's my pleasure to introduce Albrecht Lindner. He's a Ph.D. student
at EPFL under the supervision of Sabine Süsstrunk. He's actually
defending next month. So he'll be finishing his Ph.D. very soon.
His interest is in statistical methods for scene understanding and
scene enhancement. And the talk he's about to give is based on his
thesis.
>> Albrecht Lindner: Great. So thank you for the introduction and
thanks for being here despite the upcoming CVPR deadline. I'm glad you
made some time.
So before I start my talk, maybe I'll give a short introduction on my
academic background. So I started with a master's in engineering in
Stuttgart in Germany. And then after that there was a possibility to do
a double master's program with a university in Paris, where I focused
on image and signal processing.
And right now I'm doing a Ph.D. at EPFL in Switzerland and, as Sing
Bing just said, I will defend soon. My professor is Sabine Süsstrunk
and the industrial sponsor is Océ Canon. These are the ones that
enabled me to do this work.
And before I start the academic part, I also want to make a quick
note: I've been running a kind of one-person business for a bit more
than ten years now. So I'm doing this a bit in parallel, a bit less
than in the past, but still. I'm building props for magicians, and
since these things are maybe a bit unknown to you, I'll give a quick
intro.
It starts with the [inaudible] design. So the magician comes to you
and tells you what kind of effect he wants to realize. And then you
work out how you could make it possible that the audience gets the
illusion of that effect.
Then I build these devices. Of course, you use wood and metal for the
physical structure. But depending on the magician's needs you maybe
also have to build printed circuit boards, program microcontrollers or
use power electronics in order to control different flashes or motors
to make something happen on stage.
And so this is not really a toy; we're a bit serious about that. The
gadgets that I build have won different international prizes. I won't
go deep into that; of course, I would like to share all these secrets
of the magicians.
But in the interest of time I will now go to the academic part. So
the outline of my talk: I will present to you two different things.
The first one is semantic image enhancement, which was presented last
week at ACM Multimedia. And the other one is automatic color naming,
which will be presented at CIC in Los Angeles.
And both of them use the same statistical framework underneath.
They're just two different applications of the same mathematical
concepts.
So let's start with semantic image enhancement.
The first thing I'd like to pose is a question, and the question is:
which image is better? So do you have an opinion? Who prefers the one
on the left? You? Okay. And the others prefer the one on the right, I
guess.
So the thing is, the question was a bit unfair because I didn't give
you all the information. Additional information could have been: which
image do you prefer for the concept dark, and which image do you
prefer for the concept snow?
And in any case, if I asked you for an image for the concept dark, you
would have said the left image. For the concept snow you would have
voted for the right image.
The same is true here; again, it's the very same image, these are just
two different versions of it. The one has been enhanced for sandy
beach and the other for sunset.
So the point I want to make here is --
>>: That one actually looks unnatural, regardless of what you say. It
looks unnatural because you don't naturally see the sun appear like
that when everything is so bright.
>> Albrecht Lindner: That's true. So I made the processing quite
strong here in order to show the difference between the two images.
You can of course make the processing a bit less strong and do
something in between. This is just to demonstrate the difference
between the two concepts.
So the main point here is that you can't decide which image is better
if you don't have the semantic context but only the pixel values.
And this is the reason why any kind of auto adjustment of contrast and
colors can do a decent job, but you will never get to a point where
you can actually enhance an image for a semantic concept.
The only way to do this is manual editing. So you use Photoshop or
some other tool, and if you give such an image to an artist and say
make it look like a sunset, he will do something like the one on the
right, for instance. But of course, while manual editing is good, it
would be nice to have something automatic.
So the goal is to have an automatic enhancement with semantics. And so
now let's look at the solutions that are provided today.
So one possibility is to have modes. So you have cameras that have a
portrait or nature or firework mode, and depending on the mode you're
setting, your camera will apply a specific processing to it.
Or you have printers; they have modes like draft and presentation, and
then, again, each mode invokes a different processing that is adapted
to the context.
Or there are other methods that do a classification plus an
enhancement step. So there are different publications that try to
classify skin or sky regions or some other classes. And then for each
region or image of a class you apply specific processing. So if you
detect skin regions you make sure the skin tones are correct and not
too reddish or something like that. The problem of all these methods
is that they're difficult to scale to large vocabularies.
So you can imagine having 10 to 20 modes or maybe 20 or 30 classes,
and then implementing processing for each class. But what if you want
to do a thousand or ten thousand?
This is just a practical issue: you cannot implement 1,000 different
processings for 1,000 different classes. You cannot do it.
So you need some other means to take semantic context information and
process the image accordingly. And one point I want to make clear is
that we're not doing a classification task. A classification task
would be to go from an image to the keywords. However, in our case we
have a keyword and we want to apply this keyword to the image. So we
go the other way around compared to a classification task.
And what we need is something that tells us a keyword's significance
for an image characteristic. And we need an image characteristic that
makes sense to change in an image; we can't just use descriptors or
bag-of-words features or these kinds of descriptors. These won't work
because we need something that we can actually change in an image,
such as lightness, color, depth of field or frequency distributions.
And the way we measure the significance for a keyword is the
following: we start from a big database. So we have one million images
plus keywords; it's a public, readily available database from Flickr.
So, for example, let's look at the keyword night and, for the
characteristic, let's just look at gray levels.
So what you see here on the X axis is a histogram with different gray
levels that go from white to dark, and on the Y axis you see how many
percent of the pixels in an image fall into each bin.
And you see here that for night, for instance, on average a bit more
than 20 percent of the pixels actually fall into the darkest bin. And
this is quite a bit more than for images that are not annotated with
night.
And you see, on the other hand, on the other side here, night images
have fewer pixels that fall in the bright bins, whereas for the images
not annotated with night they are higher.
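
A minimal Python sketch of the per-image statistic being described,
assuming 8-bit grayscale images and an illustrative bin count (both
are assumptions, not details from the talk):

    import numpy as np

    def gray_level_histogram(image, n_bins=10):
        # Fraction of pixels per gray-level bin for one image.
        # `image` is assumed to be a 2-D uint8 array; `n_bins` is arbitrary.
        hist, _ = np.histogram(image, bins=n_bins, range=(0, 256))
        return hist / image.size  # e.g. a value > 0.2 means >20% of pixels in that bin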
So the question is now: can we assess the difference between these two
distributions? So each bin is a distribution. So this is the
distribution here, the median and the 25th and 75th percentiles. And
can we assess the difference between these two distributions for every
single bin? The thing is, we don't really know the distribution a
priori. We want this framework to be very versatile, so we don't make
any assumptions about how the distribution might look.
That's why we're using nonparametric significance tests, because we
don't want to assume this is a Gaussian or whatever distribution; we
just want it to be general. So we're using a nonparametric
significance test.
And so there are a couple of significance tests that do that, that
compare two distributions no matter what the actual distribution is.
Three of the well-known ones are the Wilcoxon rank-sum, the
Kolmogorov-Smirnov, and the Chi-square test. And I have here three
example inputs to demonstrate the differences between these methods.
So the Wilcoxon rank-sum test just measures a difference in median.
You see here, where you have two probability distributions that are
just shifted, your test statistic will be this part here, the
difference in the medians.
However, if you have two distributions that have the same median but
just a different shape, you will not measure any difference, so zero,
and the same for the bottom case. The Kolmogorov-Smirnov test measures
the maximum difference between the two cumulative distribution
functions along the X axis, and that's why you measure the difference
in shape, but you don't measure anything here.
Lastly, with the Chi-square method, because you take the difference in
each single bin, you will actually also measure the difference in the
distribution at the bottom.
So the thing is, in our case, when we want to do image enhancement, we
want to increase or decrease an image characteristic. We don't care
about the shape; we just want to increase or decrease.
So in our case we use the Wilcoxon rank-sum method because we don't
want sensitivity to shape changes. Of course, if you have some other
application in mind, you might want to use a different test with
different properties.
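
As a rough illustration of this step, the per-bin comparison could be
sketched like this in Python using SciPy's rank-sum test; the array
layout and function names are assumptions for the sketch, not the
talk's actual implementation:

    import numpy as np
    from scipy.stats import ranksums

    def bin_z_values(hists_tagged, hists_untagged):
        # hists_tagged / hists_untagged: (n_images, n_bins) arrays of per-image
        # histograms, split by whether the image carries the keyword.
        # Returns one Wilcoxon rank-sum z-value per bin: positive means tagged
        # images tend to have more mass in that bin, negative means less.
        n_bins = hists_tagged.shape[1]
        return np.array([
            ranksums(hists_tagged[:, b], hists_untagged[:, b]).statistic
            for b in range(n_bins)
        ])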
>>: Isn't this shape distribution very typical of contrast when you
sort of reduce your contrast, it gets sort of gray and muddy if you
really push the contrast and things move out to the edges?
>> Albrecht Lindner: Sure. But you can still do that. I can show this
on the next slide.
So it's the same plot. You see again the distributions from the images
tagged with night and those not tagged with night. And then you
compute the Z value; this is just the test statistic minus its
expected mean divided by its standard deviation, which gives you this
Z value.
You actually see that it is positive here in the dark bin. That means
night images have significantly more pixels with dark gray values. And
here it's negative; that means night images have significantly fewer
pixels with these bright gray values.
And now, for instance, if you wanted to enhance the contrast, you
would actually shift according to this thing here: you would say I
need more pixels here, so you shift pixels to this side, and you need
fewer pixels here, so you move them out of this region.
If you did contrast stretching you'd have a similar distribution that
would tell you you need more here, more here, and less here. And you
just move pixels out of the middle and toward the sides.
So this example was for gray-level characteristics, but you can have
other characteristics. So here you see, in CIELAB space, the keyword
Ferrari, and you see here a three-dimensional histogram and the Z
values in there with these three heat maps. And you see the maximum
here at the crossing is actually in the red region, as expected for
Ferrari. And you can do more; for instance, here we have Z values for
red, green, blue and flower, and here on the top you see the hue
angle.
You actually see that red has significantly more pixels in the red
bins and green more in the green bins. Flower is a bit less
significant but also has some peaks in the red regions.
If you look at local binary patterns you'll see that red, green, blue
don't have very significant patterns; however, flower has. This just
means that flowers have fewer angles that are very pointy, and they
have more angles that are flat or curved. It means that flowers have
round shapes.
You see all these statistics and you see these patterns as expected in
the data. Or you can do spatial layouts. So here's the keyword light,
and you see that in the center of an image -- so this is just a
spatial layout, we just take an eight-by-eight decomposition of the
image no matter what the aspect ratio or the size is.
And you see here in the middle you have slightly positive values. That
means an image of light tends to be a bit lighter in the center, but
it's definitely a lot darker in the surroundings. And here's just a
typical example that shows you the reason for that.
Or here we did a spatial chroma layout. The keyword is barn. And you
see here that in the center of the image, where you have a barn that
is made of wood, a low-chroma object, you have fewer chromatic pixels,
and at the bottom, where you have grass or nature, you have high
chroma, and that's why the values are positive there. Or at the bottom
here you have fireworks. These are Gabor filter layouts that tell you
how much structure there is.
You see here the top is a blob that comes from the fireworks and the
bottom is mainly an illuminated city.
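
A small sketch of the kind of spatial layout feature being described,
an aspect-ratio-independent eight-by-eight grid of mean lightness; the
grid size and the choice of lightness as the characteristic are just
illustrative assumptions:

    import numpy as np

    def spatial_layout(lightness, grid=8):
        # `lightness` is a 2-D array (e.g. the L channel); the image is divided
        # into grid x grid cells regardless of its size or aspect ratio, and
        # the mean value per cell is returned.
        h, w = lightness.shape
        ys = np.linspace(0, h, grid + 1).astype(int)
        xs = np.linspace(0, w, grid + 1).astype(int)
        return np.array([[lightness[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                          for j in range(grid)] for i in range(grid)])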
>>: [inaudible] they make that they have the firework in the image?
>> Albrecht Lindner: No, you could -- but that would be automatic
image classification. That would be to go from the image to the
keywords.
>>: Okay.
>> Albrecht Lindner: What I'm doing here is the other way around. So
what I'm doing here tells me: if you have an image for fireworks,
there probably is a lot of structure at the top in the middle and here
at the bottom.
Okay. You could potentially use it also to do automatic image
orientation, but this is not the goal of this work here.
>>: I see.
>> Albrecht Lindner: There's one more point: this is very efficient.
For this Wilcoxon rank-sum test, what you need to do is sort all these
values into a sorted list and then compute the ranks of all the
elements that belong to the one set.
I won't go into details, but the thing is you only have to sort this
list once. So for every additional keyword for which you want to
measure the significance, it's just a simple sum. So you can do this
in a fraction of a second.
So you can easily compute it for thousands of keywords; in my case I
just stopped at around 3,000. We can easily compute it for a large set
of keywords. You can automatically compute the significance values at
almost no cost, because once the sorting is done, they all go one
after the other, very fast.
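
The sort-once trick could look roughly like this (a sketch that
ignores rank ties for brevity; the names and data layout are
assumptions, not the talk's code):

    import numpy as np

    def z_values_for_keywords(values, keyword_masks):
        # `values`: one characteristic value per image (e.g. one histogram bin).
        # `keyword_masks`: dict mapping keyword -> boolean array over images.
        # Rank all values once; each keyword then only needs a sum of ranks.
        n = len(values)
        ranks = np.empty(n)
        ranks[np.argsort(values)] = np.arange(1, n + 1)

        z = {}
        for kw, mask in keyword_masks.items():
            n1 = mask.sum()
            n2 = n - n1
            r1 = ranks[mask].sum()                     # rank sum of the tagged group
            mu = n1 * (n + 1) / 2.0                    # its expectation under H0
            sigma = np.sqrt(n1 * n2 * (n + 1) / 12.0)  # its standard deviation under H0
            z[kw] = (r1 - mu) / sigma
        return z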
Okay. So now I'm going to stop this part and move on to how we applied
it, which is the first application, semantic image enhancement. We
implemented three different methods: we did tone mapping, we did color
enhancement, and we did depth of field adaptation. For this last one I
have to add that the input images and the method to estimate the
defocus map come from these authors; however, the algorithm and the
output images I'm going to show you are our own. To show you the
principle I'll focus on the color enhancement; the others are similar.
The enhancement pipeline we're proposing is the following: we have two
independent inputs, an image and a keyword.
And then you extract characteristics, and you have, on the one hand, a
part that is image dependent, and on the other hand a part that
depends on the semantics of the keyword. And then you fuse them
together to get an output.
And in this case the output image looks a bit more golden than the
input. So let me first go to the semantic component. What we're doing
here is we take the significance values; in this case we just do it
for the red, green and blue channels. It's like the gray levels that
we saw before, but here it's just for the red, green and blue
channels.
You see here, for the keyword gold, you have positive Z values here
for the blue curve. That means you need more pixels that have a low
blue content. And you have high positive Z values here, which means
you need more pixels that have a high red content, and green is
somewhere in the middle.
Basically this processing adds a bit more gold to your image.
>>: Computed based on many images.
>> Albrecht Lindner: Yes, we took this database of online images.
>>: This is how you got it back.
>> Albrecht Lindner: This is how I got this one, exactly. And from
that, we can now immediately derive a tone mapping operator. So here's
the equation. What we want is something that accumulates the blue
channel at the low values.
So for blue we need a tone mapping function that starts at the
beginning and goes up steeply, and for red we need the opposite: we
need something that pushes pixels away from the low red values and
accumulates them where the red values are high, and for that we have a
processing curve that goes up like this.
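
Just to make the idea concrete, a toy per-channel tone curve driven by
the z-values could look like this; it is a simplified stand-in with an
assumed gamma form, not the operator actually derived in the paper:

    import numpy as np

    def toy_tone_curve(z_per_bin, strength=0.5):
        # If the z-values ask for more mass in the bright bins, use gamma < 1
        # (push pixels up); if they ask for more mass in the dark bins, use
        # gamma > 1 (push them down). `strength` plays the role of the scale
        # parameter S mentioned below.
        bins = np.linspace(0.0, 1.0, len(z_per_bin))
        direction = np.sign(np.sum(z_per_bin * (bins - 0.5)))  # > 0: favor bright bins
        gamma = np.exp(-direction * strength)
        x = np.linspace(0.0, 1.0, 256)
        return x ** gamma  # lookup table for one channel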
>>: So can I ask -- so for any image that, say, is gold and I want you
to enhance it, you also apply that --
>> Albrecht Lindner: For gold, yes.
>>: So even within gold there might be some differences in the way
people might enhance it, right? I mean, you're basically saying
anything that's gold, I just apply a single type of transformation.
>> Albrecht Lindner: There's a second component that adapts to the
image. But the semantic component will always be the same. So if some
people have a different imagination of the concept of gold, then I need
a database from these people to learn it.
But then I can do it. I just need data to learn what people mean by
the concept of gold. And if their taste doesn't match, in this case,
the Flickr dataset, then, of course, it won't work.
>>: Also if I say gold and sunset, what will it do?
>> Albrecht Lindner:
I'm going to talk about that later.
>>: [inaudible].
>> Albrecht Lindner: And so the only thing we have is a single
parameter that just says how extreme the processing is. So if S is
zero, we just have the identity transform, and the larger S is, the
more extreme the processing will be.
Okay. So we have now the semantic component. Now let's go to the image
component. Because if you just apply this tone mapping curve to the
entire image you will affect everything, also the sky, and that won't
look good. So we need something that adapts to the image.
So for this input image and gold, we build a weight map that says how
likely it is that this pixel is part of the concept gold. Okay? In
this case what we're doing is a simple method: we just take the color
value at the pixel and we ask how high the significance is for this
semantic concept, and if it's high, you will have a bright pixel here;
if it's low, you'll have a dark pixel on this side.
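
A minimal sketch of such a weight map, assuming the pixel colors have
already been quantized into the same bins as the keyword statistics
(the quantization and normalization choices here are assumptions):

    import numpy as np

    def keyword_weight_map(pixel_bin_indices, z_per_bin):
        # `pixel_bin_indices`: for every pixel, the index of its color bin.
        # `z_per_bin`: precomputed significance values of this keyword per bin.
        # Positive significance -> bright (relevant) pixel, otherwise dark.
        z = z_per_bin[pixel_bin_indices]
        z = np.clip(z, 0.0, None)
        return z / z.max() if z.max() > 0 else np.zeros_like(z)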
>>: And how do you do that?
>> Albrecht Lindner:
Okay.
We just take the significance values.
>>: How do you compute it? Is that from the million -- out of the
subset of images that's been tagged gold?
>> Albrecht Lindner: Yes, exactly. So we have these significance
values for gold that we computed from this one million images database.
>>: Okay.
>> Albrecht Lindner: And this tells me how much each single color is
related to gold. There is a high value in the yellowish region and a
negative value somewhere in the red and green regions.
>>: There must be a technique to allow you to do that kind of -- what
to use.
>> Albrecht Lindner: There's nothing hidden here. The thing is, I know
the keyword anyway. I have two inputs: I have an image and a keyword.
And so I know the tag of this image.
>>: I understand. But you're saying the probability that it is
actually relevant to gold.
>> Albrecht Lindner:
Exactly.
>>: But the question is how do you learn that?
>> Albrecht Lindner: Well, I learned it from the tagged database of
one million images that I had at the beginning.
>>: Oh, you're looking at the current -- across images that are tagged.
>> Albrecht Lindner: Yes, the thing is I have these significance
distributions that I showed you before for gold and barn and Ferrari
and any concept.
And so this is precomputed and you can just store it. And now if you
have this concept of gold, you just look up what the significance
distribution for gold is. You just take it out of your database; it's
precomputed, and you can immediately take every pixel here and look up
how significant it is for gold. You just look at all of the
distributions. Okay. You could, of course, use other methods to
actually find the regions relevant to gold. You could do a
segmentation and then some more sophisticated computer vision
techniques to actually find the regions.
And there's a lot of work done there, of course. But in our case, we
just do that. But, of course, you can do something more sophisticated
than that.
>>: Yeah, because I would imagine like it's so the distribution is so
varied it's not clear to me.
>> Albrecht Lindner: The thing is, if the distributions vary, the
significance values will be low, and you can use a threshold: if I
don't detect a dominant concept, I won't touch the image.
For gold you definitely have a highly significant blob in the yellow
region that is very dominant. And if you detect that you can say,
okay, in this case I really know what to do, and you go further.
Okay. So now we have our image and semantic components and we're going
to fuse them in a semantic processing step. What we're doing is we
globally process the image with this tone mapping curve and do a
reweighting with the map here. So we take the input image where the
weight is zero, and we take this intermediate image where the weight
is one.
And in between we just interpolate linearly, and that means we just
enhance the characteristic in the relevant region. In this case we
only touch the bottom part and increase the goldness here, and we
don't touch this guy.
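
The fusion step itself is just a per-pixel linear blend, roughly like
this (a sketch with assumed array conventions):

    import numpy as np

    def fuse(original, globally_processed, weight_map):
        # Where the weight is 0 keep the input, where it is 1 take the globally
        # processed image, and interpolate linearly in between.
        w = weight_map[..., None]  # broadcast over the color channels
        return (1.0 - w) * original + w * globally_processed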
And now I'm going to show you a couple of example images that we
processed. Here, for instance -- I will just dim the lights so you can
see this a bit better.
So here it's for sand, here this is for snow. Here is an input image
for dark and this is the output image. This is for silhouette. There's
sunset. There's grass. Autumn. There's strawberry.
So you can see, you can do any kind of keyword; you're not limited to
a restricted vocabulary. Here is an enhancement for sky. There's
banana; it makes the banana more yellow. And this now is for depth of
field; the ones before were colors. So now you see the keyword macro
indicates that the artist's intent was to have a depth of field
effect. If you plug the macro keyword into our framework you'll get
this as an output, where you blur out the background but the
foreground remains in focus.
Here you have, for flower, also spatial frequency processing, or here
is this little boy for the keyword macro, which also blurs out the
background.
So intuitively we found it looks quite good, but of course we wanted
to measure the performance of the system. So we did some experiments,
and the experiment was: we showed two images to an observer, the
original and our proposed image.
And we also showed them the keyword. And we just asked them which
image they prefer for this context. In this case the context is sand.
Of course, here you don't even really see that there's a beach. In
this case people would probably pick that image.
>>: Can you compare against both other techniques like this?
>> Albrecht Lindner: And so we did eight different keywords, 30 images
each, and different parameters for the scale variable. So over the
observers there were almost 30,000 image comparisons. We did it on
Amazon Mechanical Turk. And the result you see here: at the bottom you
see the scale parameter and here's the approval rate.
So anything above 50 percent means that they approved our image. And
you see that the approval rate is above 50 percent for almost all the
keywords except light. And I'm going to come back to that later.
And we did a second experiment where we checked what we just call
reciprocal keywords. So this is an image that you can enhance for two
different concepts.
For instance, this image here you can enhance for snow and for dark,
and our proposed images are this one for snow and this one for dark.
And we tested other methods like histogram equalization and Photoshop
auto contrast.
And of course the thing is that none of these methods is actually able
to take a semantic concept as an input.
That's why for Photoshop and the histogram equalization you have the
same image here at the bottom.
So what we did is we took four observers. We showed them 29 of these
image and keyword pairs and we showed them all four images in a row
together with the keyword. And we asked them which one they would
prefer. They had to pick one out of the four.
And the result you see is the following. You see here the 25 percent
line, because if you picked one of the four images purely at random
that would be 25 percent. And you see here that our method does indeed
outperform the others. And the main reason is that these are
reciprocal keywords, and we show them images you can enhance for the
one or the other, and we didn't find any method that was actually able
to enhance an image for sand or anything else.
And that's why on this dataset we significantly outperformed the
others, because there's no method that can actually enhance an image
for an arbitrary semantic concept.
So I'm going to conclude this section with limitations and future work.
So there are a couple of keywords that don't really have a significant
characteristic. And these are mainly abstract keywords like friendship
or boredom. And so at the moment we don't really know what you would
do to an image to increase its friendshipness.
It's very difficult. These are very high level concepts. And so we
don't really know what to do with those. And the thing is also the
significance values for these keywords are rather low. So that just
means we don't have any significant characteristics for these keywords.
Then some keywords have conflicting meanings. And this is the reason
why we underperformed for the keyword light. The thing is, what our
algorithm does: if you have this input image and apply the keyword
light to it, you get this as an output.
What our algorithm learned is that light images are actually rather
dark. And the reason is that having a light source in an image only
makes sense if you have a dark surround. And consequently our
algorithm darkens the image, and actually the light source here is
more salient; it pops out more in the image.
However, the observers on Amazon Mechanical Turk thought about light
as: I just want to have a bright image. So they went for the one on
the left. So in our current implementation we're not able to tell
whether the keyword light means this more artistic interpretation of
light or just something bright.
And then there's another problem that we haven't tackled yet, and this
is exactly your question: multiple keywords or machine-generated
keywords. So it would be good to have something so that if you had
multiple keywords you could weigh how important they are for the
image, or they could also be machine-generated keywords, or keywords
that are just wrong.
So it would be good to have some mechanism to detect these ones and
discard them or just remove their influence on the processing. There
was actually some work here from this group that was presented last
week at ACM Multimedia, where they deal with this problem of imperfect
tagging.
So that could be relevant to solve this problem. And the last thing,
we talked about it before, is this publication from Sing Bing here
that also does context-based automatic image enhancement, and the
context in this case is images that have a similar context in terms of
image features.
So it would be interesting to see what this framework does if you
query for images with similar keywords. Maybe you would get the same
output images or similar effects.
So to summarize, I presented a framework that links image
characteristics and semantics. I presented two image enhancement
methods. And you can download more images and also the code from the
website.
Do we have more questions on that? Otherwise I will move on to the
color naming. Okay. So this is going to be a bit shorter, because it
uses the same framework underneath. So color naming is actually a task
that is a bit tedious. The traditional way to do it is you show a
color name to an observer and you ask him to adjust some color sliders
to find the right color patch. Or you go the other way around: you
just show a random color patch and ask what this color is, and the
user has to type in the name.
And of course this is very time consuming. So our approach is now to
just use this statistical framework. So if you have an input keyword
green, you can directly compute, from a large database, the
significance distribution in color space and find the maximum in the
green region of color space.
Because we can go large scale with this framework, we actually wanted
to do that. So we took a database of 950 English color names plus
color values, the so-called XKCD color survey. This survey was done
online; people were asked to type in the names for different color
patches.
And to go even more large scale, we asked native speakers to translate
this list of 950 color names into nine other languages. So we did
Chinese, French, and some other European and Asian languages. And then
we did our statistical analysis. We went on Google image search and
downloaded 100 images per color name. And then we converted them all
to CIELAB color space, assuming they're sRGB, ran the test for all
color names, picked the color bin where we have the maximum
significance value, and to account for quantization errors from the
histogram we did a bilinear interpolation in a local neighborhood
around this maximum bin.
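
A condensed sketch of that last step, with a z-weighted average over
the neighborhood standing in for the interpolation mentioned in the
talk (the bin layout and names are assumptions):

    import numpy as np

    def estimate_color(z_hist, bin_centers):
        # `z_hist`: significance value per (L, a, b) bin for one color name.
        # `bin_centers`: the CIELAB value at each bin center, shape z_hist.shape + (3,).
        # Take the most significant bin and refine it with a z-weighted average
        # of its local neighborhood (assumes the maximum z is positive).
        idx = np.unravel_index(np.argmax(z_hist), z_hist.shape)
        sl = tuple(slice(max(i - 1, 0), i + 2) for i in idx)
        w = np.clip(z_hist[sl], 0.0, None)
        return (w[..., None] * bin_centers[sl]).sum(axis=(0, 1, 2)) / w.sum()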
And I'm just going to move on to results now. So you see here, for the
ten different languages that we covered and 50 example color names,
the color estimations.
So, for instance, pale pink in English was estimated to be this color.
The Chinese interpretation was estimated to be this color.
>>: You said they were looking at an English word or the Chinese
equivalent? When the Chinese people were asked to pick a pale pink,
did they see --
>> Albrecht Lindner: They weren't asked -- we asked them to translate
these 900 color names to Chinese. We asked one native Chinese speaker
to translate all the color names to his native language.
>>: That's just one person?
>> Albrecht Lindner: One person did the translation of the color
names.
>>: Okay.
>> Albrecht Lindner: And then we did an automatic approach to estimate
all these color names, all these color values.
>>: Yeah. Okay.
>>: But the translation, there could be some bias in the particular
word a person chose, since it was just one person translating.
>> Albrecht Lindner: That is definitely true. There's some
translation problems. I'm going to go to that next.
So this is the accuracy question. This looks quite good, but is it
actually accurate? It turns out language is just one problem that
you're dealing with. So let's look at the color maroon as an example.
These are our estimations for maroon; you see here the bars, and maybe
I'll dim the lights so you can actually see the colors.
For the different languages, here this is the delta E distance to the
ground truth from the XKCD dataset. And the thing is, this XKCD
dataset gives you ground truth only for the English color names.
That's why in English the delta E distance is 10, which is reasonably
good. But it turns out that there's a group of languages where the
error is quite low and then another group of languages where the error
is quite high.
And the reason is that the translator, for instance for Chinese, could
not find a direct translation for maroon, so he picked something that
is close. Also for Portuguese, for instance, the translation is
castanha, which means chestnut. And that is why this one is more
brown, and consequently the error is a bit higher.
It turns out that for all of these languages the translators picked a
color name with chestnut in it.
So in German this is chestnut brown and this is just chestnut. These
are all colors that are chestnutty, and that's why they're a bit
brownish, and consequently the error is a bit higher. It's not really
an error; it's a different word, and we compare it to the English
value. That's why it seems high, but actually the brownish color here
is correct.
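
For reference, the delta E numbers here are distances in CIELAB; the
simplest variant (CIE76, plain Euclidean distance) looks like this,
though the talk does not say which variant was used:

    import numpy as np

    def delta_e_76(lab1, lab2):
        # CIE76 delta E: Euclidean distance between two CIELAB colors.
        return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))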
And to also put this in perspective, what we did is we went on the
Internet and found different databases that propose their own
estimations for maroon. So [inaudible] is an online color database.
W3C has a definition for HTML Web pages. And X11 for Unix, Maroni,
different databases, and we compared their values to the XKCD value
for maroon, and you see they have errors on the same scale as ours.
>>: [inaudible] oh, I see.
>> Albrecht Lindner: XKCD has to be zero because it measures against
itself.
>>: Okay.
>> Albrecht Lindner: So what you see here now is the delta E distance
not only for maroon but for all the color values we have. And you see
that 30 seems to be a reasonable error; this is what the different
databases can agree on.
And what you see here are all the delta E distances for our color
names, and you see the median is roughly in the region of 30. So you
could say that half of our color value estimations are actually within
the range of human disagreement.
And the other thing is that these language translations are also
challenging, and they add error to the estimation, because we don't
have ground truth for all these languages.
So if you had ground truth for all these languages, these errors would
actually all be a bit lower. Okay. So I think I can show a quick demo
of the color thesaurus. We put this online. You see here the color
name, then the sRGB values and CIELAB values, and what you can do is
browse through color space. You can say, I like this color but I just
want something that is darker; so you can just click here and you go
to a darker color.
Or you can say, this is good but I would like the hue angle to be a
bit different; you can click through here and it will walk you through
color space to different colors.
>>: Is the upper part changing as we do this, or is that just a picker?
>> Albrecht Lindner: This is a color picker. I'll show it to you now.
What you can do is pick a color here from the color wheel, and then it
will give you the closest color it can find.
>>: As you click down in the bottom --
>> Albrecht Lindner: This doesn't update. I haven't implemented that
yet.
>>: What's the color that the algorithm predicted for that particular
[inaudible]?
>> Albrecht Lindner: Here. This is the -- so this is an input color
name, light blue and green. This will give you as an output the sRGB
values.
>>: Predict the name --
>> Albrecht Lindner: Yes, you give it a name and it will automatically
give you the color values. And what it can also do now is translate
it, but not in terms of language, in terms of color.
So you can say, this is a nice color in English, but what is actually
the closest color that a German person would use, and then it will
say, okay, this color is close to this German color name, which is
[inaudible] in German. Or you can also go to Chinese and then say,
okay, a Chinese person would call this color spring green. You can
also pose a query, so you can see you have different types of
burgundy. And if you want you can give me a challenge: name a color
and I'll try to look it up in my database.
>>: Ochre.
>> Albrecht Lindner: There we go.
>>: Try viridian. V-e-r-i-d-i-a-n.
>> Albrecht Lindner: Okay. Okay. I cannot --
>>: Artist color.
>> Albrecht Lindner: You want the challenge.
What kind of tint is viridian?
>>: I think it's a kind of green.
>> Albrecht Lindner:
A kind of green.
Okay.
So we have -- okay.
>>: But if you were an artist, then certain paints are named by the
pigment, just like ochre. There are certain ones, like vermillion. You
have vermillion. So that's a red.
>> Albrecht Lindner:
That I have here.
>>: That's right.
>> Albrecht Lindner: Yeah. Okay. So the survey was not done by
artists. These were just random people on the Internet. They might
not know all these special terms.
But I'm sure like we could add it.
So what was it, viridian?
>>: Viridian. I'm trying to find if it's actually on the Web. Because
I can't find it. I may have just hallucinated. There's viridian
color. V-i-r-i-d-i-a-n.
>> Albrecht Lindner: V-I?
>>: V-I? I was just misspelling it.
>> Albrecht Lindner: Viridian. I have it. It's the top one.
>>: Very good.
>> Albrecht Lindner: Okay. So you can play with this; it's online. And
with that I come to my final conclusions. I showed you a statistical
framework that is easily scalable because there's a very efficient way
to implement this test.
I showed you two applications, semantic image enhancement and
automatic color naming, and I hope I could convince you a bit that
semantic context actually is helpful for image processing in general.
So thank you, and if you have questions I'll take them.
[applause].
>>: So any other questions?
>>: Actually I have one. I think you glossed over the case where
you're doing the automatic depth of field. So the training is not --
the transformation is not color anymore, right? It's the amount of
blur that you --
>> Albrecht Lindner: Exactly.
>>: So how do you decide what is foreground and what is background?
>> Albrecht Lindner: Yes. Good question. So the thing is -- let me
just go here.
>>: Because that seems to be a pretty difficult problem in general.
>> Albrecht Lindner: It is. So for the color case, we had a semantic
part that does the tone mapping and an image part that tells you where
to apply it.
So for the depth of field, you have the same: you have a filter in the
frequency domain, [inaudible] domain, that you also estimate from your
significance values. And for macro it just tells you you have to
reduce your high frequency content.
>>: So it has to be -- it has to have reasonably low frequency content
to start with. That's how you figure out --
>> Albrecht Lindner: Exactly. And the map here, the weight map, is a
so-called defocus estimation that tells you which regions of the image
are already a bit out of focus, and then you just add it there.
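
A toy version of that depth-of-field step, blending in a blurred copy
where the defocus map says the image is already out of focus; the blur
kernel and the map scaling are assumptions, and the defocus estimation
itself comes from prior work as noted below:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def enhance_depth_of_field(image, defocus_map, strength=2.0):
        # `image`: float array of shape (H, W, 3); `defocus_map`: (H, W) in
        # [0, 1], with 1 meaning most defocused. Blur the whole image, then
        # blend the blur in only where the scene is already out of focus.
        blurred = np.stack([gaussian_filter(image[..., c], sigma=strength)
                            for c in range(image.shape[-1])], axis=-1)
        w = defocus_map[..., None]
        return (1.0 - w) * image + w * blurred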
>>: Okay.
Just enhancing the amount of blur.
>> Albrecht Lindner: Exactly. And this is not our own algorithm.
That's what I said at the beginning: we use the defocus maps from Zhuo
and Sim to do this task. There are different methods out there to do
defocus map estimation, and we just use one of them.
>>: Right. So if you want to sharpen everything, you can still do it,
do the opposite, basically figure out --
>> Albrecht Lindner: I guess we haven't tried that. But I guess you
can also do that.
>>: Sharpening is hard. The blurred content isn't there and you get
extra blur.
>>: Slightly sharper.
>>: The blur is a lot easier.
>> Albrecht Lindner: What you could do, if you have keywords that
indicate structure, for instance the keyword fence or architecture or
grid or something, you could try to learn that; for instance, for
architecture you have a lot of right angles, if you have a descriptor
for that. You can learn that there's a significant amount of right
angles in the image.
And then you could try to do some enhancement based on that: you say,
okay, this image is annotated as architecture, so try to find
something like right angles and then increase the sharpness there or
something.
So I guess it's possible; we didn't do it.
>>: Another very common image is like [inaudible], right? Text. So
anything that you know might be part of text, sharpen it.
>> Albrecht Lindner: Yes, sure. So what you could try to do, if you
have a scanner and you have mixed regions, so you have some regions of
text and some of images, you could, if you know that [inaudible], try
to detect if there's any text in the image, and then you could
actually do two things. If you know there's text in the image you
sharpen these regions. But you could also try to use OCR character
recognition to actually read the text, because it might tell you
something about the image that's next to the text.
If it talks about a sunset or something, then you can not only sharpen
the text but you can also make the sunset look good.
>>: So the processing from input to output is a fraction of a second
because everything's precomputed?
>> Albrecht Lindner: Yes. All you're doing is look up the significance
values for the keyword; it's just a lookup. And then you just have a
tone mapping. It's very, very fast. So I think it's also suitable for
embedded systems. All you need is a database of significance values,
maybe for a few thousand keywords. That's it; you look them up and
apply them to the image.
>>: Do you think it would work on a mobile phone?
>> Albrecht Lindner: You could. You could. And you could say I had a
nice image and I want to enhance the beach-ness of my image or
something. You could have a slider that increases beach in your image.
You could have that, sure.
>> Sing Bing Kang: Any more questions? If not, let's thank the speaker
once more.
[applause]