>> Sing Bing Kang: I'm delighted to welcome back Katsushi Ikeuchi from the University of Tokyo.
I've known Katsushi for a long time. He was my advisor 20 years ago at CMU. 20 years ago, a
long time. Katsushi is one of the most influential computer vision researchers in Japan.
In addition to computer vision, Katsushi has also worked on robotics and graphics. Today he's going
to give an update on his e-Heritage project.
>> Katsushi Ikeuchi: Thank you very much, Sing Bing. This is a combination of old stuff and new
stuff. [inaudible] and I will stop. By the way, this project is called e-Heritage. And what is
e-Heritage? Basically, in e-Heritage there is a [inaudible]; we try to make a digital form of it, which shows up later, so
we can display it in a remote place and at a different time.
In order to generate such e-Heritage, there are two ways: one is 2D e-Heritage and the other
is 3D e-Heritage. As an example of 2D e-Heritage, I'll show some Pompeii examples. As you know,
Pompeii is a ruin which was buried around AD 79.
It is near Napoli; this is Napoli, and this is Rome. And this is a scene which is
digitized [inaudible] Chinese style. Japanese style.
[video]
This is a so-called [inaudible], I don't know.
Google people are using it [inaudible], while in our case it is carried by a human. But anyway, by using
this one we obtain a 360-degree view like this. And then, of course, by combining this with
GPS data and map information, we can make a data structure: once you click
around here along this street, you can show the 360-degree view.
Of course, this is a way to show the e-Heritage in a different place and at a different time. And this time we
are walking from the [inaudible] route to the [inaudible] house.
>>: Do you do any visual stabilization? Because it's very jerky.
>> Katsushi Ikeuchi: We're also working on that, but in this particular video we didn't. Of course,
2D e-Heritage is relatively easy for everyone, and the contents are useful for [inaudible],
education, and personal memory.
However, based on just 2D e-Heritage, we cannot conduct any accurate study, nor preservation
of heritage. So we definitely should go to 3D e-Heritage.
Because heritage is priceless, irreplaceable, and vanishing, it's a good idea to
safeguard it by storing such data. And also, based on that e-Heritage data, we can conduct
scientific studies and, of course, we can use it for contents, too.
And this provides ample opportunity for research topics in computer graphics, computer
vision, and robotics, too, actually. So today I'll talk about three issues: one is the modeling issue,
one is the representation issue, and one is the display issue.
Some of the stuff is old, some of the stuff is new. So let's take a look at some modeling issues.
Basically, how to model -- how to obtain 3D data from cultural heritage. In this area we
should worry about two aspects: one is geometric information and the other is photometric
information.
As an example of geometric information, I'm using relatively old work from the Bayon project.
We scanned the Bayon temple. [video].
>>: Ancient India in the tradition of the Khmer. The temple was constructed around the end of the
12th century to bring relief in the crisis of the [inaudible] era. It is well known for the appearance
of, for example, calm smiling faces on the towers and double corridors carved in beautiful bas-relief.
>> Katsushi Ikeuchi: We scanned this Bayon temple. Why? Because the central tower is
inclining and there is a possibility of collapse in the near future, so it's a good idea to obtain the 3D shape
before a possible collapse.
And this temple is a large structure, 150 by 150 meters and 30 meters high. Such a big site usually provides
challenging research topics. In geometric modeling, we usually take three steps:
data acquisition to obtain each range image, then alignment, and then merging to connect all the data.
If the object is relatively small and covered by, say, 10 images or 10 views, various
commercially available software already exists.
But due to the size of the Bayon temple, research topics appear. A university professor's job is
easy: when you encounter a difficult problem, you just give it to a grad student, the automatic problem
solver, and solutions appear.
So basically the university professor's job is to find difficult problems. And this Bayon temple
provided me lots of opportunities to find such difficult problems. One of the problems is to
design new sensors for large-scale buildings, and also to develop new [inaudible] to handle a huge
amount of data.
Some of you already know this part of the talk about the sensor design -- the balloon sensor, basically. For
data acquisition you need balloon sensors, you need ladder-climbing sensors, you need other sensors.
Let me quickly review the balloon sensor.
Basically we obtain color images and range images. As you know, a color image stores RGB
information at each pixel, while a range image stores the distance from each pixel to the object. This is a color image.
Usually we can obtain such an image by using laser scanners -- of course by [inaudible], too.
In the range sensor case, we project laser light onto the object, and by measuring the time of flight we can
determine the distance.
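As a rough aside (a generic relation, not tied to any particular sensor named in the talk), a pulsed time-of-flight range sensor converts the measured round-trip time of the laser pulse into distance as

    d = \frac{c \, \Delta t}{2}

where c is the speed of light and \Delta t is the measured flight time; the factor of two accounts for the pulse traveling to the object and back.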
Many commercially available sensors are produced. Then why do research issues exist? Because
those sensors are so-called ground based: you put the sensor on a
tripod, in the Cyrax case, wait 15 minutes, and the data appears.
It's good. But sometimes ground-based sensing has a problem. In this case we're talking about a pagoda, and of
course from the ground most of the data is obtained, but due to occlusion some portion is missing.
Of course we can build a scaffold -- for this pagoda you can build a scaffold -- but we're
talking about the Bayon temple, 150 by 150 meters and 30 meters high, so a scaffold is not a good solution.
Especially since the Bayon temple is a famous place; if you cover the temple with scaffolding, the sightseers get quite
angry with you. We should avoid such a method.
So we built the balloon sensor: we hang a range sensor under a balloon, and
you can bring the sensor to any place. This is a scene in which we are launching the balloon
sensor. This is a good idea.
But an issue exists: the data obtained from the balloon sensor looks like this, distorted. This doesn't work. Again, as I told
you, if you encounter a difficult problem, a university professor usually has an automatic problem solver:
give a grad student three years. One grad student mounted a TV camera on top of the range
sensor and used the so-called factorization method to correct the distorted data. Because I'm from -- I used to be at Carnegie Mellon -- [inaudible]'s famous method, the so-called factorization method.
Probably you know that from the images you can obtain the motion of the sensor. But unfortunately
image-based motion alone is not accurate enough to correct the range data, so somehow we have to combine them. What
we did was relate the distorted range data, the image motion, and the balloon motion, and we set up three
constraints. One is that the factorization method provides 3D data.
The balloon sensor also provides range data; these should be consistent, so we can set up one
constraint. And also, of course, there is [inaudible] adjustment. Thirdly, I worked for a long time on
[inaudible]; at that time we invented a smoothness constraint, so as a group we like to use
smoothness constraints.
Fortunately, the reason we use a balloon is that balloon motion is relatively smooth, so we can
apply a smoothness constraint. If we used a helicopter, it would have high-frequency motion, so we
could not use such a smoothness constraint. So we use the smoothness constraint, and we have this
global cost function. This is a nonlinear optimization, so you need
a good initial solution.
Fortunately, there is a way to obtain a good initial solution. [inaudible] So using this method we obtain a good
initialization, [inaudible], plug this initialization into the problem, and then run an iterative
solution.
Then this distorted data is corrected like this, and this distorted data is corrected like this. [inaudible]
This is old stuff, but it's okay.
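To make the structure of that optimization concrete, here is a minimal sketch of such a combined cost (my own notation, not necessarily the exact terms of the published method):

    E(\{m_t\}) = E_{\mathrm{image}}(\{m_t\}) + \lambda_1 \, E_{\mathrm{range\text{-}consistency}}(\{m_t\}) + \lambda_2 \sum_t \lVert m_{t+1} - m_t \rVert^2

where m_t is the balloon/sensor pose at time t, the first term is the factorization (image-motion) constraint, the second enforces consistency between the range data and the image-derived structure, and the last is the smoothness constraint on the balloon motion. It is minimized iteratively from a good initial estimate.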
We also invented various [inaudible] sensors, such as climbing sensors and [inaudible]
sensors. And here are the videos.
>>: In order to scan large architectural structures such as the Bayon temple, we have to use
different sensors depending on the location of objects on the site. To scan the large faces of
[inaudible], we used the long-range sensor named Cyrax. We measured the faces from many
positions, such as the ground, a scaffold, the roof, and a bucket lifted up by a crane. The data
from different directions were integrated, and a 3D digital model of each face was built.
To scan the narrow space between the [inaudible] and the corridor, the laser sensor [inaudible],
which moves vertically along a ladder, was used.
The Bayon temple is a huge architectural structure with a large number of high towers, and it is not
practical to scan the upper side, especially the roofs, from scaffolds.
For this task we used the balloon sensor, a laser sensor suspended under a balloon, which had
been developed for this purpose.
>> Katsushi Ikeuchi: So this is the story. We obtained a half-terabyte dataset, and now the
issue is how to align those data. Of course, if the number of images is small, we can
use standard ICP.
Alignment basically means obtaining correspondences between feature points and then
determining the rotations and translations.
If we apply ICP, we can align the observed data easily, like this. But the problem is that we are
talking about terabytes of data, which require a large amount of memory, and also if we apply standard ICP it
turns out it would probably take one year to align all the range data. I told the grad student that if
it takes one year -- hey, you cannot graduate. And of course he worked hard.
He developed good software. Basically, the approach is a quick pairwise
alignment using the GPU and also parallel simultaneous alignment on PC clusters;
I should skip the details. First, to make quick correspondences, he used the GPU: he mapped
one range image into the [inaudible] processing memory and generated this kind of
index image.
Another range image is then mapped onto this one to make correspondences, and what was originally an
N-squared operation becomes an N operation. Secondly, the resulting matrix is sparse,
so we use incomplete Cholesky factorization and the computational time becomes [inaudible].
And thirdly, considering the data dependencies, we assign the data to parallel computers. Then
we achieved roughly a thousand-times speedup -- sorry, Japanese -- a thousand times faster,
and also terabyte-scale data processing.
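A minimal sketch of the index-image idea described above (my own simplification with hypothetical names, not the authors' code): render one scan into an image whose pixels store point indices, then project the other scan into that image so correspondences are found in roughly O(N) instead of an O(N^2) nearest-neighbor search.

    import numpy as np

    def make_index_image(points, K, R, t, shape):
        """Render an index image: each pixel stores the index of the 3D point
        that projects there (or -1).  points: (N, 3) in world coordinates."""
        h, w = shape
        index = -np.ones((h, w), dtype=np.int64)
        depth = np.full((h, w), np.inf)
        cam = (R @ points.T + t[:, None]).T            # world -> camera
        uvw = (K @ cam.T).T
        uv = uvw[:, :2] / uvw[:, 2:3]
        for i, ((u, v), z) in enumerate(zip(uv, cam[:, 2])):
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < w and 0 <= vi < h and 0 < z < depth[vi, ui]:
                depth[vi, ui] = z
                index[vi, ui] = i
        return index

    def correspondences(index_img, other_points, K, R, t):
        """Project the second scan into the first scan's index image and read off
        correspondences in O(N) instead of an O(N^2) nearest-neighbour search."""
        h, w = index_img.shape
        cam = (R @ other_points.T + t[:, None]).T
        uvw = (K @ cam.T).T
        uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
        pairs = []
        for j, (u, v) in enumerate(uv):
            if 0 <= u < w and 0 <= v < h and index_img[v, u] >= 0:
                pairs.append((index_img[v, u], j))     # (index in scan A, index in scan B)
        return pairs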
And this is the result. Again, this is old stuff. The entire Bayon temple is represented at one-centimeter resolution.
>>: How many points are there?
>> Katsushi Ikeuchi: Which?
>>: [inaudible].
>> Katsushi Ikeuchi: Yes. And, of course, this is 3D data, so you can go inside it,
since we obtained the entire structure. From this 3D data we can generate floor plans. We
generated the floor plans, and it turns out that somehow the entire Bayon structure is rotated about 0.9 degrees
counterclockwise with respect to the cardinal directions.
And who cares? But you know, somebody cares. Somehow it is 0.9 degrees counterclockwise, and no
one knows why such a rotation occurs. But by obtaining this kind of 3D data you can make this kind
of new finding.
Another finding: we scanned 173 face towers, and we classified those 173 faces.
Originally people guessed that the 173 faces could be classified into three groups.
And it turned out we can indeed classify the faces into three groups. [inaudible] Moreover,
similarity groups exist: one similarity group is here, one similarity group is here, another
similarity group is here.
Basically, similar faces exist at nearby positions. And that corresponds to a
previous hypothesis that independent work teams existed and worked in a parallel manner: one
group carved in this area, one group carved there, one group carved here, and usually the central portion was
carved by the teacher, with the students watching the teacher's carving and following it.
So in this case we can find four similarity groups, so maybe we can say that three or four
independent carving groups worked independently, in a parallel manner. Did I talk about this
one?
>>: I don't remember.
>> Katsushi Ikeuchi: Okay. This is new stuff. Also, 3D [inaudible]: when the Bayon temple was
built, some of the pediments were hidden. Even today, if you visit the Bayon temple you cannot see
this kind of pediment, and there is not even a picture of these pediments.
We scanned it little by little, combined all of the scans, and generated a synthetic picture like this.
>>: [inaudible].
>> Katsushi Ikeuchi: Maybe like this. Maybe this is the real size, actually. This is not such a remarkable
pediment at first glance, but when I gave a talk at U.C. Berkeley, someone from the Buddhist studies department
got quite excited when he saw this pediment. Why? Can you guess?
>>: Converge [inaudible].
>> Katsushi Ikeuchi: Yeah, yeah. Did I talk about that one? [laughter].
>> Katsushi Ikeuchi: Yeah. According to him, apparently there was a Buddha here that was carved out, and
this is a symbol, a symbol meaning [inaudible], that is, Shiva. Shiva is a god in
Hinduism. So representing Shiva by removing the Buddha is evidence of a religious
change from Buddhism to Hinduism.
So apparently this temple was converted from Buddhism to Hinduism at a certain point. So what
we are saying is that by using such 3D structure, we can not only generate promotion videos but also
make this kind of archeological finding: the entire structure is rotated about 0.9 degrees counterclockwise
over the [inaudible]; we can classify the 173 faces into three groups [inaudible]; and also the
pediment suggests that the temple changed from Buddhism to Hinduism.
So this is the power of 3D data. Another example: [inaudible] in Pompeii. Actually, the reason why
we went to Pompeii is that we would like to make a YouTube video, and we were interested in scanning
this [inaudible] house called [inaudible]. We scanned the entire structure like this. This is the
entrance, and this is the podium -- I don't know the English name. A garden exists, and a
beautiful Pompeii bust. And, of course, by using this kind of video we can promote virtual
tourism. But after this we made cross sections and checked whether the cross sections
correspond to previous findings. Then somehow there is a discrepancy between the previously
surveyed floor plan and our floor plan.
Somehow the old one is larger than ours. I thought about it. Previously, what
happened was that when you are scanning -- not scanning, how do you say it in English, surveying --
they put a pole like this. And the back wall is located on a hillside, so they were
measuring this distance, while we scanned the entire structure and generated a cross section, meaning we
are measuring this distance.
So probably this measuring method is why such a discrepancy occurs, meaning ours is more accurate
than the previous method. So again, by using 3D data we can check the old floor plans, too.
These are the stories of how 3D shape measurement provides new findings and some
insight into measuring methods. Now, let's take a look at photometric modeling.
One issue is that when we measure the 3D data, we paste pictures -- color pictures -- over the scene.
And the Bayon temple is a large structure. So this area is taken in the morning, and when you take another picture
after coming back in the evening, a color difference occurs.
How can we handle that? Well, you can make [inaudible] whatever. But in order to preserve the real
color, it is not a good idea to simply [inaudible]. Did I talk about this one?
>>: I don't remember.
>> Katsushi Ikeuchi: Okay.
In the observed color, what's going on is basically a multiplication of the surface color with the illumination
color.
So what we need is to separate the illumination color from the surface color in order to preserve the surface color.
How can we do that? Well, to make the story simple, we narrow the assumption: basically RGB is
sampled at particular wavelengths, and then the story becomes relatively simple: the observed R is the
multiplication of the surface R and the illumination R, like this.
But still an ambiguity exists, because we only observe RGB: we can set up three equations, but there are
six unknown parameters. Maybe you think that if we add another observation we can set up six equations, but
let's see: one, two, three, four, five, six, seven, eight -- nine unknowns.
So you cannot obtain a solution.
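Written out, the counting argument is: one observation of a pixel gives, per channel c in {R, G, B},

    I_c = E_c \, S_c

that is, 3 equations in 6 unknowns (E_R, E_G, E_B, S_R, S_G, S_B). Adding a second observation of the same surface under a different illumination gives 6 equations but introduces 3 new illumination unknowns, so there are 9 unknowns in total and the system still cannot be solved without further constraints.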
Moreover, in this equation there is a scale ambiguity: [inaudible] a ten-times-brighter
illumination with one-tenth of the surface color provides the same observation. So that ambiguity exists
as well. Usually in this case we use so-called chromaticity, and in chromaticity space we
can obtain a similar equation, but the problem is still under-constrained: two observations
and four unknowns.
What can we do then? Well, when we are talking about outdoor objects, we can usually assume the
illumination is black body radiation: the sun is a black body radiator, the sky is a black body radiator, so the
combined illumination is a sum of black body radiation.
So what is good about that? Because for black body radiation, the inverse of the green and the inverse of the red lie on
one straight line. So by plugging in this inverse equation -- plugging in this illumination
constraint -- we can basically set up this equation.
If you observe RGB, then M and C are known parameters, so the unknown surface color lies along
a straight line. So when you have one illumination, one observation, you can say the
surface color lies on one particular straight line.
If we observe two images of the same point under different illumination conditions, we can obtain
two lines, and the intersection provides the surface color.
We also need [inaudible], and we introduce some [inaudible] conditions, and then
basically we are saying we can precisely obtain the surface colors: obtain two images of the same
position, and from those we can obtain the surface colors.
Now, sometimes it's clumsy to capture the same position under different illumination
conditions. Fortunately, a single image sometimes provides two different illumination conditions,
because this area is illuminated by the sky and the sun, while this area, the shadow area, is illuminated only by the sky. So
these are different illumination conditions.
So if we find some particular surface area which has the same reflectance but is
divided into a shadow area and an illuminated area,
from that area we can obtain the same reflectance under different illumination conditions. And by using these
ideas -- I'll skip the details -- we can recover the chromaticity.
So from the original image, by using that idea, we can recover the chromaticity and then the brightness.
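A rough numerical sketch of this idea (my own simplification, not the authors' algorithm): model each illuminant as a Planckian black body seen through assumed narrow-band RGB filters, and search over the two color temperatures for the pair that makes the implied surface chromaticity agree between the sunlit and shadowed observations.

    import numpy as np

    # Assumed narrow-band sensor wavelengths (metres); purely illustrative.
    WAVELENGTHS = np.array([610e-9, 540e-9, 465e-9])   # R, G, B

    def planck_rgb(T):
        """Relative black-body radiance at the three sensor wavelengths."""
        h, c, k = 6.626e-34, 3.0e8, 1.381e-23
        lam = WAVELENGTHS
        E = (2 * h * c**2 / lam**5) / (np.exp(h * c / (lam * k * T)) - 1.0)
        return E / E.sum()                               # keep only the colour, not the scale

    def estimate_surface_chromaticity(I_sun, I_shade):
        """Given RGB of the same surface point seen sunlit (sun + sky) and in shadow
        (sky only), search over two colour temperatures and return the surface
        chromaticity that best explains both observations as S * E(T)."""
        best, best_err = None, np.inf
        for T1 in np.arange(3000, 10001, 250):           # candidate illuminant temperatures
            for T2 in np.arange(3000, 20001, 250):
                S1 = I_sun / planck_rgb(T1)              # implied surface colour, observation 1
                S2 = I_shade / planck_rgb(T2)            # implied surface colour, observation 2
                c1, c2 = S1 / S1.sum(), S2 / S2.sum()    # compare chromaticities only
                err = np.sum((c1 - c2) ** 2)
                if err < best_err:
                    best_err, best = err, (c1 + c2) / 2
        return best

    # Example: a reddish surface under warm (sunlit) and cool (sky-only) light.
    surface = np.array([0.6, 0.3, 0.1])
    obs_sun = surface * planck_rgb(5500)
    obs_shade = surface * planck_rgb(12000)
    print(estimate_surface_chromaticity(obs_sun, obs_shade))   # ~ surface / surface.sum()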
So these are the stories of how we obtain real colors. But again, RGB values are rather arbitrary, because
the incoming light is a continuous spectral distribution and you are measuring RGB through filters
particular to a given camera. Depending on the camera, the filter characteristics are
different, so you are obtaining somewhat arbitrary RGB.
Moreover, different colors sometimes appear exactly the same under particular illumination
conditions if you use RGB. So spectral modeling is definitely necessary. And of course
there are methods to obtain the spectrum at each point. For example, a spectrometer
provides the spectral distribution at one point. But this equipment is a little tedious, because you can
only measure one spot, so to scan an entire wall you have to repeat the measurement many times.
So we introduced a couple of pieces of equipment. One is to place a [inaudible] in front of a black-and-white
TV camera; with a linear interference filter, the passing wavelength is different at each point,
so if you rotate or translate this kind of equipment you obtain spectral data for the entire wall.
Another piece of equipment is a so-called liquid crystal tunable filter: depending on the applied voltage, the passing
wavelength is different, so by adjusting this voltage you can again scan the spectrum over the
entire wall.
By using this, we digitized various decorated tomb caves in Kyushu. One cave is called
[inaudible]. This work has two sides. One, of course, is that the Japanese
company Toppan was interested in making video content of this [inaudible]; we digitized it and
then made the video content.
And that is now a permanent display at the [inaudible] National Museum. That is the commercial motivation.
And this is [inaudible]. This is a mound, and inside this mound there is a [inaudible], and the important point is
that you can see that [inaudible]. So displaying this kind of video content is a good idea.
The reason this [inaudible] is famous is that it has color paintings over the stones, and because of
that we made the video content. But our purpose is slightly different.
Previously -- sorry, the standard interpretation is that this painting was done under torchlight. But some
researchers wondered whether that might be wrong, because there is no residue of the smoke.
So maybe there is a possibility that it was done under sunlight.
So we measured the spectrum of this [inaudible] and then ran a simulation, applying
torch colors and sunlight colors.
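A toy version of such a relighting simulation (illustrative only; the real work uses the measured reflectance spectra and proper color science): multiply the measured reflectance spectrum by each illuminant spectrum and project onto approximate RGB sensitivities to compare the appearance.

    import numpy as np

    lam = np.arange(400, 701, 10) * 1e-9          # wavelengths, 400-700 nm

    def blackbody(T):
        h, c, k = 6.626e-34, 3.0e8, 1.381e-23
        E = (2*h*c**2/lam**5) / (np.exp(h*c/(lam*k*T)) - 1.0)
        return E / E.max()

    # Very rough Gaussian RGB sensitivities (an assumption, not real camera curves).
    def gauss(mu, sigma):
        return np.exp(-0.5*((lam*1e9 - mu)/sigma)**2)
    SENS = np.stack([gauss(600, 30), gauss(550, 30), gauss(460, 30)])

    def render_rgb(reflectance, illuminant):
        """Reflected spectrum = reflectance * illuminant, then integrate against RGB."""
        spectrum = reflectance * illuminant
        rgb = SENS @ spectrum
        return rgb / rgb.max()

    # Hypothetical measured pigment reflectance (e.g. a faint red ochre line).
    pigment = 0.2 + 0.5*np.exp(-0.5*((lam*1e9 - 620)/40)**2)

    print("under torch (~2000 K):   ", render_rgb(pigment, blackbody(2000)))
    print("under daylight (~5500 K):", render_rgb(pigment, blackbody(5500)))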
This is the result. Apparently, under sunlight more [inaudible] are visible; some of the lines are only
visible under sunlight, meaning the simulation results suggest that the painting was most likely done
under sunlight. Why is this important?
Well, if this painting was done under torchlight, as in the standard interpretation, they first completed the [inaudible]
and then brought in torches and painted. Whereas if it was done under sunlight, it means they only
completed the walls without the ceiling, then painted, and after that they put on the ceiling and made the
mound.
So this provides a different interpretation of how they completed this [inaudible].
And again this is the power of real digitization, of the e-Heritage result, actually.
A similar result: [inaudible]. This is again from the sixth century, and the current situation is like this -- you
cannot see anything. But by scanning with spectrometry, in this case the liquid crystal equipment,
and also applying nonlinear dimensionality reduction using normalized cuts and also PCA and other
methods, we segment using the spectral data, and we extract these three
[inaudible]. Also, more importantly, previous surveys said that in this area [inaudible] doesn't
exist.
But apparently, by analyzing this, we see this kind of [inaudible], meaning this [inaudible] has
[inaudible] [Japanese]. This is quite important. But anyway, by analyzing such spectral data over the
entire wall, we can find this kind of [inaudible].
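A minimal sketch of that kind of spectral segmentation pipeline (my simplification: PCA followed by plain k-means instead of the normalized-cut formulation mentioned in the talk):

    import numpy as np

    def segment_spectral_cube(cube, n_components=3, n_clusters=4, iters=20, seed=0):
        """cube: (H, W, B) spectral image with B bands per pixel.
        Reduce each pixel's spectrum with PCA, then cluster with simple k-means."""
        H, W, B = cube.shape
        X = cube.reshape(-1, B).astype(float)
        X -= X.mean(axis=0)

        # PCA via SVD of the centred data.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        Z = X @ Vt[:n_components].T                     # (H*W, n_components)

        # Plain k-means on the reduced spectra.
        rng = np.random.default_rng(seed)
        centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
        for _ in range(iters):
            d = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = d.argmin(1)
            for k in range(n_clusters):
                if np.any(labels == k):
                    centers[k] = Z[labels == k].mean(0)
        return labels.reshape(H, W)

    # Usage with a synthetic cube (real input would be the measured wall spectra):
    cube = np.random.rand(64, 64, 31)
    seg = segment_spectral_cube(cube)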
So for accurate study, again, this kind of digitization method is more powerful than traditional
RGB. I will skip this one. Now, the second issue. Once you digitize various data, how to represent
such data is another headache, because we are talking about a huge amount of data.
In my opinion, storing it in the cloud is important. But how to display it is a
headache: you represent this kind of data structure inside the cloud computer, and the user
wants to see this kind of data on [inaudible].
The network is relatively narrow, and so we have to worry about how to display such a huge structure.
We are using a combination of image-based rendering and model-based rendering, as Rick is
also doing. And what we did was -- maybe I should skip this one -- a combination of model-based
rendering and image-based rendering.
First, for the 3D model, we constructed a hierarchical structure. We
also prepare various views, similarly to a [inaudible] graph, but we sample more
densely. The images and a coarse 3D model are sent to the viewer side according to the requested viewing
direction.
Then we paste this image over the coarse 3D model, and the user can enjoy the view. I'll skip the
details. But that graph approach only samples along one particular viewing surface, while in our
case we sample densely over the space.
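A small sketch of the view-selection side of such a hybrid scheme (hypothetical names; the actual system was not detailed in the talk): the server keeps densely sampled reference views, and for a requested viewing direction the client receives the coarse mesh plus the angularly nearest stored image to paste onto it.

    import numpy as np

    def nearest_reference_view(request_dir, view_dirs):
        """Return the index of the stored reference view whose viewing direction
        is angularly closest to the requested direction (unit vectors)."""
        request_dir = request_dir / np.linalg.norm(request_dir)
        dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
        return int(np.argmax(dirs @ request_dir))      # max cosine = min angle

    # Server side (sketch): densely sampled view directions over the hemisphere.
    phi = np.random.uniform(0, 2*np.pi, 500)
    theta = np.random.uniform(0, np.pi/2, 500)
    view_dirs = np.stack([np.sin(theta)*np.cos(phi),
                          np.sin(theta)*np.sin(phi),
                          np.cos(theta)], axis=1)

    # Client side: request a view, receive the coarse mesh plus the nearest image index.
    request = np.array([0.3, 0.2, 0.93])
    idx = nearest_reference_view(request, view_dirs)
    # image[idx] would then be projected onto the coarse 3D model as a texture.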
Thanks to the huge data capacity of cloud computing, this becomes possible. And we are asking Microsoft
Research to give us time on the cloud computing for this particular element, actually. Maybe you saw that an intern came and worked
on this particular topic.
He prepared this kind of system, and one of the demos he built was like this, so you
can see a huge structure online through a simple display terminal.
Now, the third issue is display. One way to display this structure is to use this kind of
theater type. But the theater type is not so interesting. So what we are working on is a so-called mixed
reality display: you see the real scene on site, feeling the wind blowing as in ancient times,
with a fusion of the current image and the [inaudible] image.
You wear a goggle -- actually, this is the area where we are working, the so-called Asuka
village. Did I talk about this one? I'm confused about which portion I've already talked about and which
portion I didn't.
Basically, this is a [inaudible] in Japan. This is a quite famous temple called [Japanese],
and if you have enough historical knowledge you really appreciate it -- this is the [Japanese] and this
is an important site -- but if you bring school kids, they don't care -- [inaudible] -- for all I know
they just have their books here and then come back, go home. So the village is quite anxious to have
the school kids appreciate and understand that this place is quite important.
So we propose that when the school kids go there, they can see this kind of ancient temple, and then
they really appreciate it, right? So we built this kind of system, and this is called a mixed reality
display. In a mixed reality system, we have to worry about geometric consistency and
photometric consistency.
In my opinion, 99 percent of mixed reality research is working on geometric consistency.
And usually, if most people are already working on a particular area, it is not an area for my grad
students to work in -- but that's a different topic.
In our group we are working on photometric consistency. But before I explain
photometric consistency, what is geometric consistency?
Basically, this is an image of mixed reality: this is a virtual object pasted in, a mixed reality image.
In this case you have to make the coordinate system of this particular object consistent
with the background image. This is called geometric consistency, and 99 percent of mixed reality
research works in this area.
However, we decided not to work in this area, so we rely on a hardware solution. Basically,
we use a magnetic field source and a magnetic field sensor on the goggle, and we determine the
relation between the goggle and the magnetic field source.
The goggle also has a small camera that takes an image of the background, and on top of
that, since the system knows the relation between the coordinate systems, it shows the 3D image on the
screen and overlays this kind of imaginary image on the real scene.
This is good. But did I show you this one? These two objects are shown at exactly the
same position, but somehow this one looks floating while this one sits on the table.
Of course mixed reality is good. But if, through the goggle, the palace or temple appears to float
above the ground, that's not good; the palace should sit on the ground like that. So you also
need to worry about photometric consistency, so that the shadow area is similar to the
surrounding shadows.
How can we do that? Well, we have been working in this area for maybe 10 years. [inaudible]
proposed one of the solutions, to generate shadows by measuring the illumination using a fisheye lens.
Later she improved the method into a real-time system, and then another grad student
generated shadows based on the 3D geometry.
First of all, [inaudible]'s old work: basically, in order to calculate the shadow, what can we do? If
there is no virtual object, this point receives energy from all illumination directions, and you
can measure the illumination strength by using a fisheye lens. If there is a virtual object, some of the
directions are occluded. So by calculating this ratio between the no-object and with-object cases, you can darken
the surrounding area and generate this kind of shadow.
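A minimal sketch of that ratio computation (illustrative only; a real system would use the captured fisheye environment map and proper visibility tests against the virtual model):

    import numpy as np

    def shadow_factor(point, light_dirs, light_intensity, occluded_by_virtual_object):
        """Ratio of irradiance with the virtual object present to irradiance without it.
        light_dirs: (N, 3) unit directions sampled from the fisheye illumination map.
        light_intensity: (N,) measured strength per direction.
        occluded_by_virtual_object(point, dir) -> True if the virtual object blocks dir."""
        total = light_intensity.sum()
        visible = sum(I for d, I in zip(light_dirs, light_intensity)
                      if not occluded_by_virtual_object(point, d))
        return visible / total          # multiply the background pixel by this factor

    # Toy usage: a virtual sphere above the point blocks near-vertical directions.
    def sphere_blocks(point, direction, center=np.array([0, 0, 1.0]), r=0.5):
        # Does the ray point + t*direction (t > 0) hit the sphere?
        oc = point - center
        b = 2 * np.dot(direction, oc)
        c = np.dot(oc, oc) - r * r
        return (b * b - 4 * c) >= 0 and (-b) > 0

    dirs = np.random.randn(1000, 3)
    dirs[:, 2] = np.abs(dirs[:, 2])                   # upper hemisphere only
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    intensity = np.ones(len(dirs))                    # stand-in for the fisheye measurement
    print(shadow_factor(np.zeros(3), dirs, intensity, sphere_blocks))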
And she made this video, maybe 10 years ago. This is good. But in order to
generate this kind of video, she used one week of supercomputer power. In mixed
reality, if you ask the viewer to please wait one week to see a shadow,
that method doesn't work, right? We definitely need a real-time system. So she improved the
method to run in real time. What she did was this: basically, the shadow under light sources A, B, and C can be
decomposed into shadow A, shadow B, and shadow C in a linear manner. This is a characteristic of
shadows.
Since we can decompose in this way, she precalculated the basis shadow images for source
direction one, source direction two, source direction three, and so on. Of course this calculation
takes a long time, maybe a week or whatever. But who cares -- this is an offline calculation.
Once you have prepared these basis shadow images, then in real time we again measure the
illumination distribution. Then light source direction one has strength, say, 0.5 relative to the original
calculation, and the second direction is 0.-whatever relative to the original illumination
conditions.
She multiplies these coefficients with the basis shadow images, and this can be done on a GPU easily.
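In code form, the runtime step is just a weighted sum of the precomputed basis shadow images (a sketch with made-up array names):

    import numpy as np

    def composite_shadow(basis_shadows, weights):
        """basis_shadows: (K, H, W) precomputed shadow images, one per light direction.
        weights: (K,) measured relative strength of each direction in the current
        fisheye illumination map.  Returns the combined shadow (attenuation) image."""
        return np.tensordot(weights, basis_shadows, axes=1)

    # Toy usage: 3 precomputed basis shadows, current illumination weights 0.5/0.3/0.2.
    basis = np.random.rand(3, 480, 640)
    shadow = composite_shadow(basis, np.array([0.5, 0.3, 0.2]))
    # The background image would then be multiplied by this shadow image per pixel.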
And we can make this kind of virtual shadow in real time. But the problem with this method is
that since it is image based, the viewer cannot change the viewing direction.
If you ask the viewer to please not move their head, that doesn't work either; the good
point of mixed reality is that you can move your head around. So what can we do? Another grad student
proposed one of the solutions: a shadow plane surrounding the object. Over this shadow plane he
calculates the basis shadow images again, repeats the same story,
and generates this. Now, by using this, we can generate this kind of temple with
shadows.
Once we completed it, we built the entire area of Asuka village, and then -- did I show you
this one? Okay. You can [inaudible], but this part is not important. The more important point is that once
we make this kind of CG and system, when you climb this hill -- this is the real thing -- in the
goggle you can see this kind of ancient capital and even zoom in on an ancient event. No, seriously.
>>: What is the --
>> Katsushi Ikeuchi: What?
>>: What happens to the range? The range.
>> Katsushi Ikeuchi: Don't worry. Later. Later. That's a good point, actually. No, no, I have a
solution. And then, since this became popular, we brought it to Italy. This is one of the
[inaudible] areas done by the University of Tokyo [inaudible], and some of this [inaudible], and then one of the
Domamea [phonetic] heard about this talk and asked me to demonstrate. So we brought this over in
the following manner, and this is me, and this is [inaudible], and [inaudible] is quite enjoying the
site, and the [inaudible] of the department also enjoyed it.
And they asked me to install it immediately. But I said no, no, this is an experimental system. And of
course sometimes we have to -- in this case it was a rainy day, so we made a tent.
[laughter] And they're enjoying this kind of scene.
Well, it's important -- I'll explain this cloud museum. By combining everything, in my opinion
we should build this cloud museum. Basically, a cloud computer stores such data, and we develop
technology surrounding this cloud computer for motivation, exhibition, guidance, and also opinion
uploading.
Let's take a look at the motivation part. Motivation is relatively easy: basically, with the data in the cloud at the
center, you just download it to a website, and then people become more interested in visiting this site.
And then people try to visit. And while they move around the site, we can combine
a moving vehicle with this kind of goggle.
Maybe a bicycle or a car. And seriously, this year we demonstrated
this system at the historic site [inaudible], and we are using this tram; this tram
accommodated 12 passengers, and we mounted a system like this, with every
passenger wearing this one. This looks strange, I think.
And on this real scene -- not real -- the CG is superimposed. There are even some ancient
people moving around -- this is loading -- and you can even join some ceremony. Seriously.
This event ended yesterday, but we did it like this.
Now, when you go to the real site, you see more [inaudible] scenery by using better equipment like
this. We can even show transitional periods. If you build a real reconstructed palace,
you are stopping the flow of time, displaying just one particular period of the palace. But
a Japanese palace, a Chinese palace, whatever, changes its shape depending on the
period. By using this method you can show the structure in various periods. Another
important point -- I can't explain it well in English, but in Japan a very famous TV program exists, [Japanese]
maybe. This program shows some important historical events on the TV set. But if you can enjoy such a
program on site, with someone being killed or whatever right in front of you, it has more value for viewers.
And we can provide this kind of system by using this kind of Google -- goggle system. Not a
Google system. [laughter] So these are the displays, and also diachronic displays, too. Basically we're
building time machines.
And also communication. Once you visit the cloud museum, if you have some opinion or feeling,
you can upload it, and that promotes the motivation to visit again.
And also you can change the event depending on the date: when you go during one
particular period, maybe you can see Pompeii people's daily life, and sometimes when you visit Pompeii
it is the day on which the eruption occurs, whatever. So by using this you can explore space, and also you
can explore time.
So this is the story of e-Heritage, and in my opinion e-Heritage provides good research topics: computer vision
for e-Heritage sensing, cloud computing for e-Heritage representation, and computer
graphics for e-Heritage display.
So this is the summary. E-Heritage is for safeguarding heritage, for scientific research and accurate study,
and also for video contents for promoting tourism and education. And it offers excellent research topics
in computer vision, computer graphics, and computer science, too. Thank you very much.
[applause].
>> Sing Bing Kang: Any other questions for our speaker?
>> Katsushi Ikeuchi: Sorry about that. I mixed up all the old stuff and new stuff. I'm not sure
which portion is new and which is old.
>>: I'm kind of curious how difficult is it to get permission from the respective governments to
capture the data?
>> Katsushi Ikeuchi: Quite difficult. Especially -- especially a neighboring country.
>>: Which one was the most difficult? Cambodia?
>> Katsushi Ikeuchi: Neighboring country of Japan.
>>: Oh.
>> Katsushi Ikeuchi: So Microsoft Research Asia is quite important, because Microsoft
Research Asia people can work with that government's people.
>>: I was wondering about the computing power of the device you were
using, and what the resolution of the goggles was.
>> Katsushi Ikeuchi: For the goggle, I'm currently using a commercial product.
Price-wise, three years ago when I started this project I used a Canon device. The cost was
maybe two [inaudible]. Nowadays the commercial product costs less than an iPhone -- in three years. And in
terms of resolution, the Canon was better.
It was about a thousand by 700-something, and the current one is five [inaudible] or
something. But in --
>>: Are they stereoscopic?
>> Katsushi Ikeuchi: Of course, you can do stereoscopic, but sometimes people get a little bit, how
to say? Motion --
>>: Motion sickness.
>> Katsushi Ikeuchi: Motion sick. So with stereo, sometimes people get motion sick, so on our side we're
using 2D. But it's easy to convert to 3D because we already have the 3D data; by simply
changing the left and right images you can obtain 3D.
But I'm not sure whether 3D is necessary, because in my opinion human beings only perceive
stereoscopic depth up to three or four meters. Beyond that, people basically perceive depth from
motion cues or single-image cues such as shading or line drawings. So I'm not sure whether
3D is important. In movies, in order to emphasize 3D, they use particular
techniques. But I'm a little bit -- I don't know.
>>: You can always do user studies to see what the preference is.
>> Katsushi Ikeuchi: Yeah.
>>: So you showed this example where you walk through the temple. There are people, like
ancient people, that are doing the march -- [inaudible].
>> Katsushi Ikeuchi: Sound.
>>: Audio.
>> Katsushi Ikeuchi: Yeah, of course, I didn't explain that. But in this other event, one of the students
is working on sound effects, and you can hear the horses' footsteps and also the marching sound, too, because
that also increases the -- how do you say?
>>: Realism.
>> Katsushi Ikeuchi: Yes, realism yes.
>>: Another question: does that include all the actual scenery you're looking at also [inaudible]? If
there's a dog running by suddenly, do you see that dog?
>> Katsushi Ikeuchi: Yeah, yeah. We cannot model everything. But in terms of the buildings, yes, that
3D data contains everything, although you can only see part of it through the mixed reality
system. As for events, we only digitized a couple of them, and the digitizing method is basically
video shooting. We also use already-broadcast TV programs, extract the human beings by
using graph cut or whatever method, and then paste them onto the CG, actually. So again, that's a
research topic, too, actually.
>>: The goggles you used, are they see-through displays?
>> Katsushi Ikeuchi: No, video see-through. Optical see-through is better. But -- yeah, one company
called Blazer [phonetic] has a see-through display, but it displays using a laser that draws the image on the retina, and people are a
little concerned about that, since the laser is projecting onto the retina.
You know, eventually we don't care about the equipment; rather, we care about the method. Our
strategy is to use anything which is commercially available. Some people said we should use
an iPhone or iPod or whatever, but in order to increase the sense of reality, the goggle method gives a more immersive
feeling.
Also, an immersive goggle may be expensive. But as I told you, in these three years the price was
reduced to a hundredth, so eventually the goggle will be roughly a similar price to an iPhone. Then I can expect
that people usually use an iPhone, but when necessary they bring out the goggle from their pocket, along with
the iPhone, and then you can enjoy. If the price is around $100, people will actually
purchase them.
>>: Are you happy with the sensor's sensitivity in terms of getting the correct orientation? Also the
processing power, in terms of the latency of rendering?
>> Katsushi Ikeuchi: Rendering is a research issue, and that is the reason I'm talking with the
Microsoft people about using your system.
>>: Do people have trouble if there are gaps in latency or glitches when they're wearing the
goggle? When they're immersed, can they lose their balance or get --
>> Katsushi Ikeuchi: Maybe.
>>: You know, get, like you said, motion sickness from the lag between the glasses and --
>> Katsushi Ikeuchi: Currently, we have only tried two scenarios. One is standing at one particular
place, wearing the goggle and looking around. The other case is this other event: they are sitting
on the seats of the tram, and in the goggle both the background and the CG are generated
simultaneously, so there is no discrepancy.
>>: Okay.
>> Katsushi Ikeuchi: Yes.
>>: So you demonstrated places that are relatively accessible to most people, but have you
considered doing it, for example, at an underwater archeological site, and are there limitations due to
the problem of the water between you and the site?
>> Katsushi Ikeuchi: Underwater is probably difficult. But I know some people
are working on underwater scanning. We don't care what method you use; the important
point is the 3D data.
If 3D data is available, we can manipulate it. Also, an important point is that we
shouldn't worry too much about the method; we should worry about the data itself. And in my
opinion, again, we shouldn't worry too much about the goggle or the display. Whatever method it is,
basically it's a display. There is the data, and how to process such data into a usable data
format is the more important research issue.
>> Sing Bing Kang: If there are no other questions, let's thank the speaker once more. [applause]