>> Sing Bing Kang: I'm delighted to welcome back Katsushi Ikeuchi from the University of Tokyo. I've known Katsushi for a long time. He was my advisor 20 years ago at CMU. 20 years ago, a long time. Katsushi is one of the most influential computer vision researchers in Japan. In addition to computer vision, Katsushi has also worked on robotics and graphics. Today he's going to give an update on his e-Heritage project.

>> Katsushi Ikeuchi: Thank you very much, Sing Bing. This is a combination of old stuff and new stuff. [inaudible] and I will stop. By the way, this project is called e-Heritage. What is e-Heritage? Basically, with e-Heritage there is a [inaudible] that we try to digitize so that, as shows up later, we can display it in a remote place and at a different time. That is why this project is called e-Heritage. In order to generate such e-Heritage there are two ways: one is 2D e-Heritage and the other is 3D e-Heritage. As an example of 2D e-Heritage I'll show some Pompeii examples. As you know, Pompeii is a ruined city which was buried in AD 79. It is near Napoli -- this is Napoli, and this is Rome. And this is a scene which was digitized [inaudible] Chinese style. Japanese style.

[video]

>> Katsushi Ikeuchi: This is a so-called [inaudible], I don't know. Google people are using it [inaudible], while we are using a human. But anyway, by using this one we obtain a 360-degree view like this. And then, of course, by combining this with GPS data and map information, we can build a data structure: once you click around here along this street, it shows the 360-degree view. Of course, this is a way to show the e-Heritage at a different place and a different time. And this time we are working on the route from the [inaudible] to the [inaudible] house.

>>: Do you do any visual stabilization? Because it's very jerky.

>> Katsushi Ikeuchi: We're also working on that, but in this particular video we didn't. Of course, 2D e-Heritage is relatively easy for everyone, and the contents are available for [inaudible] and education, and for personal memories. However, based on just 2D e-Heritage we cannot conduct any accurate study, nor preservation of the heritage. So we definitely should go to 3D e-Heritage. Because heritage is priceless, irreplaceable, and vanishing, it is a good idea to safeguard it by storing such data. Also, based on that e-Heritage data, we can conduct scientific studies and, of course, we can use it for contents, too. And this provides ample opportunity for research topics in computer graphics, computer vision, and robotics, too, actually. So today I'll talk about three issues: one is the modeling issue, one is the representation issue, and one is the display issue. Some of the stuff is old, some of the stuff is new.

So let's take a look at some modeling issues -- basically, how to obtain 3D data from cultural heritage. In this area we should worry about two things: one is geometric information and the other is photometric information. As an example of geometric information I'm using relatively old work from the Bayon project. We scanned this [inaudible] temple.

[video]

>>: Ancient India, in the tradition of the Khmer. The temple was constructed around the end of the 12th century to bring relief in the crisis of the [inaudible] era. It is well known for, for example, the calm smiling faces on its towers and the double corridors carved with beautiful, intricate reliefs.

>> Katsushi Ikeuchi: And we scanned this Bayon temple. Why?
Because the central tower is inclining and there is a possibility of collapse in the near future, it is a good idea to obtain the 3D data before a possible collapse. And this temple is a large structure -- roughly 150 by 150 meters and 30 meters high. Such a big site usually provides challenging research topics. In geometric modeling we usually take a three-step pipeline: data acquisition to obtain the raw data, alignment, and then merging to connect all the data. If the object is relatively small and covered by, say, 10 images or 10 views, various commercially available software already exists. But due to the size of the Bayon temple, research topics appear. And a university professor has an easy job: when you encounter a difficult problem, just give it to a student -- an automatic problem solver -- and solutions appear. So basically the university professor's job is to find difficult problems, and this Bayon temple gave me lots of opportunities to pose such difficult problems. One of the problems is to design new sensors for large-scale buildings, and another is to develop new [inaudible] to handle a huge amount of data.

Some of you already know this part of the talk about designing new sensors -- the balloon sensor, basically. For data acquisition you need balloon sensors, you need climbing sensors for the corridors, you need [inaudible] sensors. Let me quickly review the balloon sensor. Basically we obtain color images and range images. As you know, a color image stores RGB information at each pixel, while a range image stores the distance from each pixel to the object. This is the color image. Usually we obtain such a range image by using a laser scanner -- of course by [inaudible] too. Range sensors project laser light onto the object, and by measuring the time of flight we can determine the distance. Many commercially available sensors are produced. So why does a research issue exist? Because these sensors are so-called ground-based: you put the sensor on a tripod -- in the Cyrax's case you wait 15 minutes -- and the data appears. That is good. But sometimes ground-based sensing has a problem. In this case we are talking about a pagoda, and from the ground most of the data is obtained, but due to occlusion some portion is missing. Of course we could build a scaffold. But we are talking about the Bayon temple, 150 by 150 meters and 30 meters high, so a scaffold is not a good solution; and especially since the Bayon temple is a famous place, if you cover the temple with scaffolding the sightseers will be quite angry with you. We should avoid such a method. So we built the balloon sensor: we hang the range sensor under the balloon, and you can bring the sensor to any place. This is a scene in which we are launching the balloon sensor. This is a good idea. Good idea. But an issue exists: the data obtained by the balloon sensor looks like this -- distorted. This doesn't work. Again, as I told you, when a university professor encounters a difficult problem, he has an automatic problem solver: give a grad student three years. One grad student mounted a TV camera on top of the range sensor and rectified this distorted data with the so-called factorization method -- because I used to be at Carnegie Mellon, [inaudible] famous method, the so-called factorization method. Probably you know it: from the images you can obtain the motion of the sensor. But unfortunately that alone is not [inaudible] enough to correct the range data, so somehow we have to do more. What we did was to combine the distorted range data, the image motion, and the balloon motion, and we set up three constraints. The factorization method provides 3D data.
The balloon sensor also provides range data, and the two should be consistent, so we can set up one constraint. Also, of course, bundle adjustment. And thirdly, I have been working on [inaudible] for a long time, and at that time we invented smoothness constraints, so as a group we like to use a smoothness constraint. Fortunately, the reason we use a balloon is that balloon motion is relatively smooth, so we can apply a smoothness constraint. If we used a helicopter, it has high-frequency motion, so we could not use such a smoothness constraint. So we use a smoothness constraint, and we have this global cost function. This is a nonlinear optimization, so you need a good initial solution. Fortunately, Tokyo provides a good solution [inaudible]; using this method we obtain a good initialization, plug this initialization into the problem, and then set up an iterative solution. Then this distorted data is corrected like this, and this distorted data is corrected like this. [inaudible] this is old stuff, but it's okay. We also invented various [inaudible] sensors, such as climbing sensors and also [inaudible] sensors. And here is the video.

>>: In order to scan large architectural structures such as the Bayon temple, we have to use different sensors depending on the location of objects on the site. To scan the faces of the [inaudible], we used the long-range sensor named Cyrax. We measured the faces from many positions, such as the ground, a scaffold on the roof, and a bucket lifted up by a crane. The data from different directions were integrated and a 3D digital model of each face was built. To scan the narrow space between the [inaudible] and the corridor, the laser sensor [inaudible], which moves vertically along a ladder, was used. The Bayon temple is a huge architectural structure with a large number of high towers, and it is not practical to scan the upper parts, especially the roofs, from scaffolds. For this task we used the balloon sensor: a laser sensor suspended under a balloon, which had been developed for this purpose.

>> Katsushi Ikeuchi: So this is the story. We obtained half-terabyte datasets, and now the issue is how to align all those data. Of course, if the number of images is small we can use standard ICP. Alignment is basically to obtain correspondences between feature points and then determine the rotation and translation. If we apply ICP we can align the observed data easily, like this. But the problem is that we are talking about terabytes of data, which requires a large amount of memory, and if we apply standard ICP it turns out it would probably take one year to align all the range data. I told the grad student: if it takes one year, you cannot graduate. And of course he worked hard, and he developed good software. Basically his approach is quick pairwise alignment using the GPU, and also parallel, simultaneous alignment on a PC cluster. I should skip the details, but first, to make quick correspondences, he used the GPU: he mapped one range image into the [inaudible] processing memory mode and generated this kind of index image, and another range image is mapped onto it to make the correspondences. So what was originally an N-squared operation becomes an N operation. Secondly, the matrix involved is sparse, so we use incomplete Cholesky factorization and the computational time becomes [inaudible]. And thirdly, considering the data dependencies, we assign the data across parallel computers.
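To give a feel for the index-image trick just described, here is a minimal sketch in Python with NumPy. It assumes a simple pinhole projection and made-up function names; the real system runs this on the GPU and inside a full ICP loop, so treat it as an illustration of why the nearest-neighbour search drops from roughly N-squared to N lookups, not as the actual implementation.

    import numpy as np

    def build_index_image(points, K, width, height):
        """Project one range scan into a 2D 'index image': each pixel stores
        the index of the 3D point that lands there (or -1 if empty).
        K is an assumed 3x3 pinhole intrinsic matrix; points are Nx3 in the
        scanner's coordinate frame."""
        index = np.full((height, width), -1, dtype=np.int64)
        uvw = (K @ points.T).T                      # perspective projection
        uv = uvw[:, :2] / uvw[:, 2:3]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        ok = (u >= 0) & (u < width) & (v >= 0) & (v < height) & (points[:, 2] > 0)
        index[v[ok], u[ok]] = np.nonzero(ok)[0]     # later points overwrite earlier ones
        return index

    def correspondences(index, points_b, K):
        """For each point of scan B (already roughly aligned into A's frame),
        look up the pixel it projects to and pair it with A's point stored
        there.  This replaces an O(N^2) nearest-neighbour search with O(N)
        constant-time lookups."""
        h, w = index.shape
        uvw = (K @ points_b.T).T
        uv = uvw[:, :2] / uvw[:, 2:3]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        pairs = []
        for i, (ui, vi) in enumerate(zip(u, v)):
            if 0 <= ui < w and 0 <= vi < h and index[vi, ui] >= 0:
                pairs.append((index[vi, ui], i))    # (index in scan A, index in scan B)
        return pairs

One ICP iteration would then estimate the rigid motion from the returned pairs and repeat; because every point is handled independently, both functions parallelize trivially on a GPU, which is where the speed-up comes from.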
And then we achieved -- sorry, Japanese -- a thousand-times-faster alignment, and also terabyte-scale data processing. And this is the result. Again, this is old stuff. The entire Bayon temple is represented at one-centimeter resolution.

>>: How many points are there?

>> Katsushi Ikeuchi: Which?

>>: [inaudible].

>> Katsushi Ikeuchi: Yes. And, of course, this is 3D data, so you can go inside the 3D data, since we obtained the entire structure. From this 3D data we can generate floor plans. We generated the floor plans, and it turns out that somehow the entire Bayon structure is rotated about 0.9 degrees counterclockwise with respect to the path direction. And who cares? But, you know, somebody cares. Somehow it is 0.9 degrees counterclockwise, and no one knows why such a rotation occurred. But by obtaining this kind of 3D data you can make this kind of new finding.

Another finding: we scanned the 173 faces, and we classified those 173 faces. Originally people guessed that the 173 faces could be classified into three groups, and it turned out that we can indeed classify the faces into three groups [inaudible]. Moreover, similarity groups exist: one similarity group is here, one similarity group is here, another similarity group is here. Basically, similar faces exist at nearby positions. And that corresponds to the previous hypothesis that independent work teams existed and worked in a parallel manner: one group carved this area, one group carved there, one group carved here, and usually the central portion was carved by the teacher, while the students looked at the teacher's carving and followed it. So in this case we found four similarity groups, so maybe we can say that three or four independent carving groups worked independently, in a parallel manner. Did I talk about this one?

>>: I don't remember.

>> Katsushi Ikeuchi: Okay. This is new stuff. Also, in 3D, [inaudible] when the Bayon temple was built, some of the pediments were hidden. Even today, if you visit the Bayon temple you cannot see this kind of pediment, and there is not even a picture of these pediments. We scanned it little by little, combined all the scans, and generated a synthetic picture like this.

>>: [inaudible].

>> Katsushi Ikeuchi: Maybe like this. Maybe this is the real size, actually. This is not such a beautiful pediment, but I gave a talk at U.C. Berkeley, and one of the people from the Buddhist studies department got quite excited when he saw this pediment. Why? Can you guess?

>>: Converted [inaudible].

>> Katsushi Ikeuchi: Yeah, yeah. Did I talk about that one? [laughter]

>> Katsushi Ikeuchi: Yeah. According to him, apparently there was a Buddha here that was carved out, and this is a symbol -- a symbol meaning [inaudible] -- that is, Shiva. Shiva is a god in Hinduism. So this represents Shiva, and removing the Buddha is evidence of a religious change from Buddhism to Hinduism. Apparently this temple was converted from Buddhism to Hinduism at a certain point. So what we are saying is that by using such 3D structure we can not only generate promotion videos but also find this kind of archeological evidence: the entire structure is rotated about 0.9 degrees counterclockwise over the [inaudible]; we can classify the 173 faces into three groups [inaudible]; and the hidden pediment suggests the change from Buddhism to Hinduism. So this is the power of 3D data.
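The talk does not spell out how the 173 faces were actually compared, so the following is only a conventional stand-in for that step: agglomerative clustering on pairwise shape distances between registered face scans. The distance function, the assumption that the faces are given as corresponding point arrays, and the number of groups are all placeholders, not details from the project.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def shape_distance(face_a, face_b):
        """Placeholder dissimilarity between two face scans already aligned to
        a common pose and resampled into corresponding points, e.g. the mean
        point-to-point deviation after ICP registration."""
        return float(np.mean(np.linalg.norm(face_a - face_b, axis=1)))

    def group_faces(faces, n_groups=3):
        """Agglomerative clustering of face scans into similarity groups."""
        n = len(faces)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = shape_distance(faces[i], faces[j])
        labels = fcluster(linkage(squareform(dist), method="average"),
                          t=n_groups, criterion="maxclust")
        return labels   # labels[k] is the similarity group of face k

Plotting each face's group label at its position on the recovered floor plan is then enough to see whether similar faces sit near each other, which is the observation behind the independent-carving-teams interpretation.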
Another example: [inaudible] in Pompeii. Actually, the reason we went to Pompeii is that we wanted to make a YouTube video, and we were interested in scanning this [inaudible] house called [inaudible]. We scanned the entire structure like this. This is the entrance, and this is the podium -- I don't know the English name -- and a garden exists, and a beautiful Pompeii bust. Of course, by using this kind of video we can generate content for virtual tourism. But after this we made cross sections and checked whether the cross sections correspond to previous findings. And somehow there is a discrepancy between the previously surveyed floor plan and our floor plan: the old one is larger than ours. I thought about it. What happened previously is that, usually, when you are -- not scanning -- how do you say it in English -- surveying, they put a pole like this. And the back wall is located on a hillside. So they were measuring this distance along the slope, while we scanned the entire structure and generated a cross section, meaning we were measuring this horizontal distance. So the discrepancy probably occurs due to this measuring method -- meaning ours is more accurate than the previous method. So by using 3D data we can also check the old floor plans. These are the stories of how 3D shape measurement provides new findings, and also some insight into the measuring methods themselves.
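As a small worked illustration of that discrepancy (the numbers here are made up, not from the actual survey): a measurement taped along sloping ground follows the hypotenuse, while a cross section cut from the 3D model gives the horizontal distance, so the surveyed length comes out longer.

    \[
      d_{\mathrm{horizontal}} \;=\; d_{\mathrm{slope}}\cos\theta,
      \qquad\text{e.g. } \theta = 10^{\circ},\; d_{\mathrm{slope}} = 30\,\mathrm{m}
      \;\Rightarrow\; d_{\mathrm{horizontal}} \approx 29.5\,\mathrm{m}.
    \]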
Now, let's take a look at photometric modeling. One issue is that when you measure the 3D data, you paste color pictures over the scene. The Bayon temple is a large structure, so this area was photographed in the morning, and when you come back in the evening and take another picture, a color difference occurs. What can we do? Well, you can do [inaudible] whatever, but in order to preserve the real color it is not a good idea to simply [inaudible]. Did I talk about this one?

>>: I don't remember.

>> Katsushi Ikeuchi: Okay. In the observed color, what is going on is basically a multiplication of the surface color with the illumination color. So what we need is to separate the illumination color from the surface color in order to preserve the surface color. How can we do that? Well, to make the story simple, let's narrow the assumptions: basically, R, G, and B are each sampled at a particular wavelength, and then the story becomes relatively simple -- the observed R is the multiplication of the surface R and the illumination R, like this. But ambiguity still exists, because we only observe RGB: we can set up three equations, but there are six unknown parameters. Maybe you think that if we add another observation we can set up six equations, but let's see -- one, two, three, four, five, six, seven, eight, nine unknowns. So you cannot obtain a solution. Moreover, in this equation there is an ambiguity: ten times brighter illumination with one tenth of the surface color provides the same observation. So that ambiguity also exists. So usually in this case we move to so-called chromaticity. In chromaticity space we can still obtain similar equations, but ambiguity still exists and the problem is still ill-posed: two observations and four unknowns. What can we do then? Well, when we are talking about outdoor objects, usually we can assume the illumination is blackbody radiation: the sun is a blackbody, the sky is a blackbody, so the combined illumination is an addition of blackbodies. Why is that good? Because for blackbody illumination, the inverse of green and the inverse of red lie on one straight line. So by plugging in this illumination constraint -- just the inverse equation -- we can basically set up this equation, where, if you observe RGB, M and C are known parameters. So the unknown surface color lies along a straight line. When you have one illumination, one observation, you can say the surface color lies on one particular straight line. If we observe two images of the same point under different illumination conditions, we obtain two lines, and their intersection provides the surface color. We also need [inaudible], and we introduce some [inaudible] conditions, and then basically we can precisely obtain the surface colors: obtain two images of the same position, and from those we can obtain the surface colors. Now, sometimes it is cumbersome to obtain the same position under different illumination conditions. Fortunately, a single image sometimes provides two different illumination conditions, because this area is illuminated by sky and sun, while this shadowed area is illuminated only by the sky -- so, different illumination conditions. So if we find a particular surface area with the same reflectance that is divided into a shadowed part and an illuminated part, from that area we can obtain the same reflectance under two different illumination conditions. By using these ideas -- I'll skip the details -- we can estimate the chromaticity everywhere. So from the original image, by using this idea, we can recover the chromaticity and then the brightness. These are the stories of how we obtain the real colors.
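Here is a toy numerical sketch of that "two observations, two lines, one intersection" reasoning. It assumes the simplified per-channel model (observed = surface x illumination), chromaticities defined as R/G and B/G, and a made-up straight-line constraint on blackbody illuminants in inverse chromaticity space; the published formulation differs in detail, so this only shows the geometry of the argument.

    import numpy as np

    # Assumed straight-line model for blackbody illuminants in inverse
    # chromaticity space: 1/b_L = M * (1/r_L) + C, where r_L = L_R/L_G and
    # b_L = L_B/L_G.  M and C would be fitted offline to the Planckian locus;
    # the values below are invented for the demonstration only.
    M, C = 0.7, 0.2

    def surface_line(r_obs, b_obs):
        """One observation (r_obs, b_obs) = (r_S*r_L, b_S*b_L) constrains the
        unknown surface chromaticity (r_S, b_S) to a straight line
        b_S = a*r_S + c; return (a, c)."""
        return M * b_obs / r_obs, C * b_obs

    def intersect(line1, line2):
        """Intersect two lines b = a*r + c; the crossing point is the
        estimated surface chromaticity."""
        (a1, c1), (a2, c2) = line1, line2
        r = (c2 - c1) / (a1 - a2)
        return r, a1 * r + c1

    # Synthetic check: one surface seen under two illuminants on the assumed locus.
    r_surf, b_surf = 1.4, 0.6                # "true" surface chromaticity (R/G, B/G)
    lines = []
    for r_L in (0.8, 1.6):                   # two illuminant red chromaticities
        b_L = 1.0 / (M / r_L + C)            # forced onto the blackbody line
        lines.append(surface_line(r_surf * r_L, b_surf * b_L))
    print(intersect(*lines))                 # ~ (1.4, 0.6)

Running it prints approximately (1.4, 0.6), the surface chromaticity that was used to synthesize the two observations; with only one observation, any point on the corresponding line would be equally consistent, which is exactly the ambiguity described above.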
But again, our RGB values are arbitrary values, because the incoming light is a continuous spectral distribution and you are measuring RGB through filters particular to this particular camera. Depending on the camera, the filter characteristics are different, so you obtain arbitrary RGB values. Moreover, sometimes different colors appear exactly the same under a particular illumination condition if you use RGB. So spectral modeling is definitely necessary. Of course there are methods to obtain the spectrum at each point; for example, a spectrometer provides the spectral distribution at one point. But that equipment is a little tedious, because you can only measure a spot, so to scan an entire wall you have to repeat the measurement many times. So we introduced a couple of pieces of equipment. One is to place an interference filter in front of a black-and-white TV camera; on the interference filter the passband is different at each point, so if you rotate or translate this equipment you obtain spectral data for the entire wall. Another piece of equipment uses a so-called liquid crystal shutter: depending on the applied voltage, the passing wavelength is different, so by adjusting this voltage you can again scan the spectrum over the entire wall. By using these, we digitized various decorated caves existing on Kyushu Island. One cave is called [inaudible]. And why did we do this work? There are two sides. Sorry -- one is, of course, that the Japanese company Toppan was interested in making video content of this [inaudible]; we digitized it and made the video content, and that is on permanent display at the [inaudible] National Museum. That is the commercial motivation. And this is [inaudible]. This is a hill, and in this hill there is a [inaudible], and the important point is that you can see that [inaudible]. So displaying this kind of video content is a good idea. And the reason this [inaudible] is famous is that it has color paintings over the stones, and because of that we made the video content. But our purpose is slightly different. The standard interpretation had been that this painting was done under torchlight, but some researchers wondered whether that might be wrong, because there is no [inaudible] of the [inaudible] -- there is no residue of the smoke. So maybe there is a possibility that it was sunlight. So we scanned and measured the spectrum of this [inaudible] and then made a simulation by applying torch colors and sunlight colors. This is the result. Apparently, under sunlight more [inaudible] exist; some of the lines are only visible under sunlight, meaning the simulation results suggest that the painting was most likely done under sunlight. Why is that important? Well, if this painting was done under torchlight, as in the standard interpretation, they completed the [inaudible], then brought in the torch and then painted. But if this was done under sunlight, it means they only completed the walls, without the ceiling, then painted, and after that they put on the ceiling and made the hill. So this kind of analysis provides a different interpretation of how they completed this [inaudible]. And again, this is the power of real digitization -- of the e-Heritage result, actually.

A similar result: [inaudible]. This is again from the sixth century, and the current situation is like this -- you cannot see anything. But by scanning spectrometry -- in this case the liquid crystal equipment -- and also applying nonlinear dimensionality reduction, using normalized cuts and also PCA and other methods, we segment the wall using the spectral data, and we extract these three [inaudible]. Also, more importantly, previous surveys said that in this area no [inaudible] exists, but apparently by analyzing this we see this kind of [inaudible], meaning this [inaudible] has [inaudible] [Japanese]. This is quite important. Anyway, by analyzing such spectral data over the entire wall, we can find this kind of [inaudible]. So for accurate study, again, this kind of digitization method is more powerful than traditional RGB. I will skip this one.

Now, the second issue. You have digitized various data; how to represent such data is another headache, because we are talking about a huge amount of data. Of course, in my opinion, storing the data in cloud computing is important, but how to display it is a headache. We represent this kind of data structure inside the cloud computer, and the user wants to see this kind of data through [inaudible]; the network is relatively narrow. So we have to worry about how to display such a huge structure. We are going with a combination of image-based rendering and model-based rendering, as Rick is also doing. What we did was -- maybe I should skip this one -- a combination of model-based rendering and image-based rendering. First, for the 3D model, we constructed a hierarchical 3D model structure. Then we also prepared various views, similar to a [inaudible] graph, but we sample more densely. The image and a coarse 3D model are sent to the viewer side according to the requested viewing direction, and then we paste this image over the coarse 3D model and the user can enjoy the view. I'll skip the details that I cannot really explain here, but the [inaudible] graph only samples along one particular viewing [inaudible], while in our case we sample densely over the space. And, thanks to the huge data capacity of cloud computing, this becomes possible. We are asking Microsoft Research to give us time on cloud computing for this particular experiment, actually. An intern came and worked on this particular topic, and he prepared this kind of system; one of the devices he built was like this.
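The talk only outlines this client-server scheme, so the following is a rough sketch of the two selection steps it implies -- picking the stored view nearest to the requested viewing direction and picking a level of the coarse mesh hierarchy -- with invented class names and thresholds; it is not the intern's actual system.

    import numpy as np

    class ViewSample:
        """One pre-rendered (or photographed) view stored server-side."""
        def __init__(self, direction, image_id):
            self.direction = np.asarray(direction, float)
            self.direction /= np.linalg.norm(self.direction)
            self.image_id = image_id      # handle to the stored image

    def pick_view(samples, requested_direction):
        """Return the stored view whose direction is angularly closest to the
        one the client asked for (larger dot product = smaller angle)."""
        d = np.asarray(requested_direction, float)
        d /= np.linalg.norm(d)
        return max(samples, key=lambda s: float(s.direction @ d))

    def pick_mesh_level(mesh_levels, distance, thresholds=(50.0, 200.0)):
        """Choose a level of the mesh hierarchy (ordered finest to coarsest)
        by viewer distance in metres; the thresholds are illustrative."""
        for level, limit in enumerate(thresholds):
            if distance < limit:
                return mesh_levels[min(level, len(mesh_levels) - 1)]
        return mesh_levels[-1]            # farthest viewers get the coarsest mesh

The client would then project the chosen image onto the coarse mesh (projective texture mapping), so only one image plus a small mesh has to cross the narrow network link each time the view changes.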
So you can see a huge structure online through a simple display terminal. Now, the third issue is display. One way to display this structure is this kind of theater type, but the theater type is not so interesting. So what we are working on is a so-called mixed reality display: you see the real scene there, feel the wind blowing as in ancient times, and see a fusion of the current image with the [inaudible] image when you wear the goggles. Actually, the area in which we are working is the so-called Asuka village. Did I talk about this one? I'm confused about which portions I've already talked about and which I haven't. Basically this is a [inaudible] in Japan, and this is a quite famous temple called [Japanese]. If people have enough historical knowledge they really appreciate it -- this is the [Japanese], this is an important site -- but if you bring school kids, they don't care; [inaudible] they just look at their books here, then come back and go home. So the village is quite eager to have the school kids appreciate and understand that this place is quite important. So we proposed: when the school kids go there, if they can see this kind of ancient temple, then they will really appreciate it, right? So we built this kind of system, and this is called a mixed reality display.

In this mixed reality system we have to worry about geometric consistency and photometric consistency. In my opinion, 99 percent of mixed reality research is working on geometric consistency, and usually, if the majority of people are working on one particular area, I have my grad students work on a different topic. So in our group we are working on photometric consistency. But before I explain photometric consistency, what is geometric consistency? Basically, this is an image of mixed reality: a virtual object is pasted into the mixed reality image. In this case you have to make the coordinate system of this particular object consistent with the background image. This is called geometric consistency, and 99 percent of mixed reality research works in this area. However, we decided not to work on this area, so we rely on a hardware solution. Basically we use a magnetic field: this pole generates a magnetic field, the goggles sense the magnetic field, and we determine the relation between the goggles and the magnetic field pole. The goggles also have a small camera which takes an image of the background, and on top of that, since the system knows the relation between the coordinate systems, it shows the 3D image on the screen and overlays this kind of imaginary image on the real scene. This is good. This is good. But -- did I show you this one? These two objects are shown at exactly the same position, but somehow this one looks as if it is floating, while this one sits on the table. Of course mixed reality is good, but if you put on the goggles and the palace or the temple is floating above the ground, that's not good; the palace should sit on the ground like that. So you also need to worry about photometric consistency, so that the shadow area is consistent with the surrounding shadows. How can we do that? Well, we have been working in this area for maybe 10 years. [inaudible] proposed one of the solutions, generating shadows by measuring the illumination with a fisheye lens; later she improved the method into a real-time system; and then one of the other grad students generated shadows based on the 3D model.
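Going back for a moment to the geometric-consistency step just described -- pose from the tracker, a small camera on the goggles, CG overlaid on the live image -- here is a minimal compositing sketch. The matrices, the point-splatting, and the function names are all illustrative; a real system would rasterise textured triangles with a depth buffer rather than draw vertices.

    import numpy as np

    def project_points(points_world, R, t, K):
        """Project 3D points (Nx3, site/world frame) into the goggle camera.
        R (3x3) and t (3,) come from the tracker (world -> camera); K is the
        camera intrinsic matrix."""
        cam = (R @ points_world.T).T + t
        uvw = (K @ cam.T).T
        return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]      # pixel coords, depths

    def composite(background, points_world, R, t, K):
        """Splat the virtual model's vertices onto the background frame
        (background is an HxWx3 image).  Drawing points is enough to show
        whether the coordinate systems are aligned."""
        out = background.copy()
        h, w = out.shape[:2]
        uv, depth = project_points(points_world, R, t, K)
        for (u, v), z in zip(np.round(uv).astype(int), depth):
            if z > 0 and 0 <= u < w and 0 <= v < h:
                out[v, u] = (255, 255, 255)             # mark the virtual geometry
        return out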
First, her original work: basically, in order to calculate the shadow, what can we do? If there is no virtual object, this point receives energy from all illumination directions, and you can measure the illumination strength by using a fisheye lens. If there is a virtual object, some of the directions are occluded. So by calculating the ratio between the no-object case and the object case, you can darken the surrounding area and generate this kind of shadow. And this is the video -- maybe 10 years ago. 10 years ago. This is good. But in order to generate this kind of video, she used one week of supercomputer power. In mixed reality, if you ask the viewer, "please wait one week and then you can see the shadow," that method doesn't work, right? We definitely need a real-time system. So she improved the method to run in real time. What she did was basically this: the shadow under light sources A, B, and C can be decomposed into the shadow under A, the shadow under B, and the shadow under C, in a linear manner. This is a characteristic of shadows. Since we can decompose them, she precalculated the shadow for source direction one, source direction two, source direction three, and so on, for all the basis directions. Of course this calculation takes a long time, maybe a week or whatever, but who cares: this is an offline calculation. Once you have prepared these basis shadow images, then in real time we again measure the illumination distribution. Then light source one's strength is, say, 0.5 relative to the original calculation, the second direction is 0-point-whatever relative to the original illumination condition, and she multiplies these coefficients onto the basis shadow images -- this can be done on the GPU easily -- and we can make this kind of virtual shadow in real time. But the problem with this method is that since it is image-based, the viewer cannot change the viewing direction. If you ask the viewer, "please don't move your head," that doesn't work either; the good point of mixed reality is that you can move your head around. So what can we do? Another grad student proposed one of the solutions: he proposed shadow planes surrounding the object, and over these shadow planes he calculated the basis shadow images again, repeated the same story, and generated this. Now, by using this, we can generate this kind of temple with shadows.
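A minimal sketch of that run-time step, with made-up array shapes: the per-direction attenuation maps are assumed to have been rendered offline, and the fisheye measurement is reduced to one weight per basis direction.

    import numpy as np

    def combine_basis_shadows(basis_images, weights):
        """Real-time step of the basis-shadow idea: the expensive part (one
        attenuation/shadow map per basis light direction) is precomputed
        offline; at run time the measured strength of each direction simply
        weights and sums those maps.

        basis_images : array (K, H, W), precomputed shadow maps
        weights      : array (K,), relative source strengths taken from the
                       fisheye illumination measurement
        """
        basis_images = np.asarray(basis_images, dtype=np.float32)
        weights = np.asarray(weights, dtype=np.float32)
        weights = weights / weights.sum()
        # tensordot collapses the K basis maps into one composite shadow map;
        # on a GPU this is a single multiply-accumulate per pixel.
        return np.tensordot(weights, basis_images, axes=1)

The composite map is then multiplied into the background around the virtual object, so the darkening follows the real illumination without any per-frame ray casting.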
Once we completed that, we built the entire area of Asuka village, and then -- did I show you this one? Okay. You can [inaudible], but this part is not important. The more important point is that once we make this kind of CG and system, when you climb this hill -- this is the real thing -- in the goggles you can see the ancient capital, and even zoom in on an ancient event. No, seriously.

>>: What is the --

>> Katsushi Ikeuchi: What?

>>: What happens to the range? The range.

>> Katsushi Ikeuchi: Don't worry. Later. Later. That's a good point, actually. No, no, I have a solution. And then, since this became popular, we brought it to Italy. This is one of the [inaudible] areas worked on by Tokyo [inaudible], and some of this [inaudible], and then Domamea [phonetic] learned of this talk and asked me to demonstrate. So we brought the system over in the following manner; this is me, and this is [inaudible], and [inaudible] quite enjoyed the site, and the [inaudible] of the department also enjoyed it. They asked me to install it immediately, but I said no, no, this is an experimental system. And of course sometimes we have to -- in this case it was a rainy day, so we made a tent. [laughter] And they are enjoying this kind of scene. Well, now I'll explain the cloud museum.

So, by combining everything, in my opinion we should build this cloud museum. Basically, a cloud computer stores such data, and we develop technology surrounding this cloud computer for motivation, exhibition, guidance, and also opinion uploading. Let's take a look at motivation. Motivation is relatively easy: with cloud computing at the center, you just download from the website, and then people become more interested in visiting the site, and they try to visit. And when they move around the site, we can combine a moving vehicle with this kind of goggle -- maybe a bicycle or a car, in order to avoid [inaudible]. Seriously, this year we demonstrated this system at the historic site [inaudible], and we used this tram [phonetic]; the tram accommodated 12 passengers, we mounted the system like this, and every passenger wore one of these. It looks strange, I think. And on the real scene -- not real -- the CG is superimposed; there are even some ancient people moving around [inaudible], and you can even join some ceremony. Seriously. This event ended yesterday, but we did it like this. So you go to the real site and you see more [inaudible] scenery by using better equipment like this. We can even show transitional periods. If you build a real reconstructed palace, you are stopping the flow of time, displaying just one particular period of the palace. But a Japanese palace, a Chinese palace, whatever, changes its shape depending on the period; by using this method you can show the structure in various periods. Another important point -- I can't explain it well in English -- is that in Japan a very famous TV program exists, [Japanese] maybe; this program shows important historical events on the TV set. But if you could enjoy such a program on site, with someone being killed or whatever right in front of you, it is more valuable for the viewers. And we can provide this kind of system by using this kind of Google -- goggle system. Not Google system. [laughter] So, these displays, and also diachronic displays too -- basically we are building time machines. And also communication: once you visit the cloud museum, if you have some opinion or feeling, you can upload it, and that again promotes the motivation to visit. Also, you can change the event a couple of times depending on the date, so when you go to one particular period maybe you can see the Pompeii people's daily life, and sometimes, when you visit Pompeii on the day the eruption occurred, whatever. So by using this you can explore space, and you can also explore time.

So this is the story of e-Heritage. In my opinion, e-Heritage offers good computer vision research for e-Heritage sensing, cloud computing for e-Heritage representation, and computer graphics for e-Heritage display. So this is the summary: e-Heritage is for safeguarding heritage, for scientific research and accurate study, and also for video contents and for promoting tourism and education. And it offers excellent research topics in computer vision, computer graphics, and computer science, too. Thank you very much.

[applause]

>> Sing Bing Kang: Any other questions for our speaker?

>> Katsushi Ikeuchi: Sorry about that. I mixed up all the old stuff and new stuff; I'm not sure which portion is new and which is old.

>>: I'm kind of curious how difficult it is to get permission from the respective governments to capture the data.

>> Katsushi Ikeuchi: Quite difficult. Especially -- especially a neighboring country.

>>: Which one was the most difficult? Cambodia?
>> Katsushi Ikeuchi: A neighboring country of Japan.

>>: Oh.

>> Katsushi Ikeuchi: That is why Microsoft Research Asia is quite important: the Microsoft Research Asia people can work with that government's people.

>>: I was wondering about the computing power of the device you were using, and what the resolution of the goggles is.

>> Katsushi Ikeuchi: For the goggles, I'm currently using a commercial product. Price-wise, three years ago when I started this project I used a Canon device; the cost was maybe two [inaudible]. Nowadays the commercial product costs less than an iPhone -- in three years. And in terms of resolution, the Canon was better: it provided a thousand by 700-something. The current one is five [inaudible] or something. But in --

>>: Are they stereoscopic?

>> Katsushi Ikeuchi: Of course you can do stereoscopic, but sometimes people get a little bit -- how to say? Motion --

>>: Motion sickness.

>> Katsushi Ikeuchi: Motion sick. Stereo sometimes causes motion sickness, so to be on the safe side we are using 2D. But it is easy to convert to 3D, because we already have the 3D data; by simply changing the left and right images you can obtain 3D. But I'm not sure whether 3D is necessary, because in my opinion human beings only perceive stereoscopic depth up to three or four meters. Beyond that, people basically perceive depth from motion cues or from monocular cues such as shading or line drawings. So I'm not sure whether 3D is important. In movies, in order to emphasize 3D, they use particular effects, but I'm a little bit -- I don't know.

>>: You can always do user studies to see what the preference is.

>> Katsushi Ikeuchi: Yeah.

>>: So you showed this example where you walk through the temple, and there are people, like ancient people, that are doing the march -- [inaudible].

>> Katsushi Ikeuchi: Sound?

>>: Audio.

>> Katsushi Ikeuchi: Yeah, of course, I didn't explain, but at this other event one of the students was working on sound effects, and you can hear the horses' footsteps and also the marching sound, because that also increases the -- how do you say?

>>: Realism.

>> Katsushi Ikeuchi: Yes, realism, yes.

>>: On another point, does that include all the actual scenery you're looking at also? [inaudible] If there's a dog running suddenly, do you see that dog?

>> Katsushi Ikeuchi: Yeah, yeah. We cannot model everything. But in terms of buildings, yes, the 3D data contains all the other structures, but you can only see part of them through the mixed reality system. As for events, we only digitized a couple of them, and the digitizing method is actually video shooting. We also use already-broadcast TV programs: we extract the human figures by using graph cut or whatever method and then paste them onto the CG, actually. So again, that is a research topic, too, actually.

>>: The goggles you used -- are they see-through displays?

>> Katsushi Ikeuchi: No, video. See-through is better, but -- yeah, one company called Blazer has a see-through display; it displays using a laser, painting the image on the retina, and people are a little concerned about that, as they say, because the laser is projected onto the retina. You know, eventually we don't care about the equipment; rather, we care about the method, and our strategy is to use anything which is commercially available. Some people say we should use an iPhone or iPod or whatever, but in order to increase the reality, the goggle approach gives a more immersive feeling. Immersive goggles may be expensive, but as I told you, in these three years the price dropped to one hundredth.
So eventually the goggles will be roughly the same price as an iPhone. Then I can expect that people who usually use an iPhone will, when necessary, pull the goggles out of their pocket instead of the iPhone, and then they can enjoy it. If the price is around $100, people will actually purchase them.

>>: Are you happy with the sensor's sensitivity in terms of getting the correct orientation? Also the processing power, in terms of the latency of rendering?

>> Katsushi Ikeuchi: Rendering is a research issue, and that is the reason I'm talking with the Microsoft people about using your system.

>>: Do people have trouble if there are gaps in latency or there are glitches when they're wearing the goggles? When they're immersed, can they lose their balance or get --

>> Katsushi Ikeuchi: Maybe.

>>: You know, get, like you said, motion sickness from the lag between the glasses and --

>> Katsushi Ikeuchi: Currently we have only tried two scenarios. One is standing at one particular place, wearing the goggles and looking around. The other case is this other event: they are sitting on the seats of the tram [phonetic], and in the goggles both the background and the CG are generated simultaneously, so there is no discrepancy.

>>: Okay.

>> Katsushi Ikeuchi: Yes.

>>: So you demonstrated places that are relatively accessible to most people, but have you considered doing it, for example, at an underwater archeological site, and are there limitations due to the problem of the water between you and the site?

>> Katsushi Ikeuchi: Underwater is probably difficult. But I know some people are working on underwater scanning, and we don't care what method you use; the important point is the 3D data. If 3D data is available, we can manipulate such 3D data. The important point is that we shouldn't worry too much about the method; we should worry about the data itself. And in my opinion, again, we shouldn't worry too much about the goggles or the display: whatever it is, basically it's a display, and there is the data. How to process such data into a usable data format is the more important research issue.

>> Sing Bing Kang: If there are no other questions, let's thank the speaker once more. [applause]