>>: Okay. I think we'll try to start again. If I could ask you all to slowly stop the conversations and sort of slink towards your seats or other seats. Thank you. Excellent.
We tried to be a little disruptive this year and we didn't just say oh yeah machine learning and computer vision and follow along and see what happens. We tried to get some talks that are thought provoking and maybe out of context and nonlinear you might think it's random I like to call it nonlinear, just me. [Laughter] But it's actually very, very interesting what we're going to hear.
Taking an economic look at some of these same problems we're looking at.
Li Beibei, I apologize in advance for butchering your name. But she's with the Stern School of Business. I'm actually personally very interested to hear this. So welcome.
>> Li Beibei: Thank you. Hello, everyone. This should work. I'm very happy to be here and introduce our work on improving local search for hotels using econometric modeling and image classification.
This work has been done with Professor Anindya Ghose and Professor Panos Ipeirotis.
So before we start, oftentimes we would have this sort of question: How can I find a hotel near the Microsoft headquarters in Redmond, less than one mile away from Lake Washington, with easy highway access, and also providing a good price for what it offers?
So this is a very typical question usually we would ask before we go somewhere, and the original motivation of our work is to try to help you to solve this question.
So here's our research agenda. The problem is we want to locate hotels that satisfy specific criteria and offer the best value for the money. And the challenge here is that currently there are no established metrics that can isolate the economic value of the location-based and service-based hotel characteristics.
So the method we are using is we combine the econometric modeling with user-generated content data and image classification method to identify and measure important hotel characteristics.
So here's the overview of what I'm going to talk about today. First I'll give an introduction of the background then I will introduce how we identify these hotel characteristics and how we collect the data.
And after that I will introduce our econometric-based ranking approach and how we evaluate these ranking results and, finally, I will give the conclusion and future work.
So as we just mentioned, customers like you and me try to identify hotels that satisfy particular criteria, for example, the food quality or the service. So what we usually do is we search.
Here is a very typical travel search engine that we can find online, and as we can see here, usually they provide only rudimentary ranking facilities using a single criterion.
For example, they sort the hotels by their name or by their price per night or by the number of stars, or, more recently, they sort hotels by the customer reviews.
For example, the ratings or the popularity rank. But the disadvantage of that is it largely ignores the multi-dimensional preferences of consumers. And more importantly, it ignores the location characteristics of hotels.
When we say "location characteristics," for example, near downtown or near the beach. So these location-based characteristics, they influence the desirability of hotels, which will in turn influence the prices of those hotels.
However, very few empirical studies so far have focused on the location and hotel pricing. So what we want to do is we want to use hedonic regression models. This is a technique from the econometric field, and we want to use this model to estimate the economic values of hotel characteristics.
So the hedonic regression model basically assumes that goods can be described by vectors of objectively measured characteristics, and consumers' evaluations can be decomposed into implicit values of each characteristic. So basically a hotel product can be represented as a bundle of characteristics. So this model sounds perfect. Why don't we just directly use it? What is preventing us from using it right away?
The challenge here is we don't even know what hotel characteristics we want to put into our model. And what's more, we don't know what kind of data we need, or how to collect those data.
So first we want to identify what important hotel characteristics we want. We do this through an anonymous online survey using Amazon Mechanical Turk, or AMT, as I'll refer to it for short.
Basically we asked 100 people what characteristics they consider the most important when choosing a hotel. Then we chose the ones that are most highly valued by customers, and we believe they will contribute to the aggregate price of hotels.
And according to the resulting characteristics we divide them into two categories: first, service-based hotel characteristics, and second, location-based characteristics. The service-based category contains hotel class, which refers to the number of stars, and the total number of hotel rooms.
It also contains hotel internal amenities, which refers to, for example, free breakfast, wake-up call service, high-speed Internet, et cetera.
And it contains the customer review count and the popularity rank. The last two characteristics we identify in a slightly different way, which I will come back to on the next slide.
So after we know what service-based characteristics we want, we want to collect the corresponding data. Our technique is validated on a unique panel data set consisting of 157,414 observations based on 9,463 different hotels located in the United States over the past five months.
And the hotel prices and service-based hotel characteristics data, we crawl from the tripadvisor.com website. During the crawling procedure we noticed that the number of customer reviews and the popularity rank change over time. We thought, okay, this might also be interesting, so we added them into our model.
So this is how we acquire the service-based hotel characteristics. And for location-based, it contains seven in this category. Near the beach. Near the lake or river. Near public transportation. Near downtown. Near interstate highway and hotel external amenities. This refers to, for example, near restaurant, near shops, near bars or market or any other attractions.
And also safe neighborhood. After we identify the location-based characteristics, similarly we want to collect the corresponding data. However, not all the location-based characteristics can be easily derived in the same way.
For example, near the beach: we found that it's easy to derive this characteristic by analyzing images, because we notice a beach image usually shows a long, straight sand strip together with a large ocean area. So that provides us with highly discriminating features which are easy for machine learning algorithms to recognize.
But for a characteristic like near restaurant, it's almost impossible to discriminate using image analysis. Therefore, our approach involves a combination of both machine learning and human annotators.
Specifically, we collected the data in four different ways: first, the Microsoft Virtual Earth SDK, which all of us are very familiar with; second, image classification; third, online annotation through Mechanical Turk; and fourth, data collection from online websites.
For commercial characteristics, we compute them through local search queries using the Virtual Earth SDK. So, for example, basically we use the mapping service provided by Microsoft and we search for the total number of restaurants or shops or bars or markets around a specific hotel location. We compute the total number and save it as the hotel external amenities value. So that's how we collect the commercial characteristics.
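As a rough illustration of that amenity computation (not the actual Virtual Earth calls; the point-of-interest list, field names, and radius are hypothetical stand-ins for what local search would return), the external amenity count boils down to counting nearby points of interest:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def external_amenity_count(hotel, pois, radius_km=1.0):
    """Count restaurants, shops, bars, and markets within radius_km of the hotel."""
    kinds = {"restaurant", "shop", "bar", "market"}
    return sum(
        1
        for p in pois
        if p["type"] in kinds
        and haversine_km(hotel["lat"], hotel["lon"], p["lat"], p["lon"]) <= radius_km
    )

# Hypothetical data for illustration only.
hotel = {"lat": 47.641, "lon": -122.129}
pois = [{"type": "restaurant", "lat": 47.643, "lon": -122.131},
        {"type": "bar", "lat": 47.700, "lon": -122.200}]
print(external_amenity_count(hotel, pois))  # -> 1 (only the restaurant is within 1 km)
```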
And for geographical characteristics, for example, beach or downtown, we notice that it's very hard to acquire these characteristics from the mapping service by using a local search query. We can't search for beach or downtown, so we can't directly use local search. But we notice that the corresponding images provide rich texture information.
As I just mentioned, we can see here a long, straight sand strip, and here a big area of ocean.
And also for a downtown area, we can notice it has very dense intersections of roads and streets.
So we decided to use an image classification method with feature extraction. Basically, we first retrieve the images using Virtual Earth, and we get these 256 x 256 images. Then we divide them into 49 overlapping regions and we compute eight features for each region.
And after we conduct feature selection, we tried different classification methods on these two sets of images, because although they both have discriminating features, their feature types are different.
We found that the decision tree classification method works better on beach and the SVM classification method works better on downtown.
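As a rough sketch of that pipeline (not the speakers' actual code: the talk uses eight features per region, while this uses four simple statistics, and scikit-learn is an assumption), a 256 x 256 tile can be cut into a 7 x 7 grid of overlapping 64 x 64 regions with a stride of 32, giving 49 regions, and the concatenated per-region features fed to a decision tree or an SVM:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def region_features(tile):
    """Split a 256x256 grayscale tile into 49 overlapping 64x64 regions
    (7x7 grid, stride 32) and compute four simple statistics per region."""
    feats = []
    for y in range(0, 256 - 64 + 1, 32):          # 7 row offsets: 0, 32, ..., 192
        for x in range(0, 256 - 64 + 1, 32):      # 7 column offsets
            r = tile[y:y + 64, x:x + 64]
            feats.extend([r.mean(), r.std(),
                          np.abs(np.diff(r, axis=0)).mean(),   # vertical gradient energy
                          np.abs(np.diff(r, axis=1)).mean()])  # horizontal gradient energy
    return np.array(feats)                        # 49 regions x 4 features = 196 values

# Synthetic stand-ins for labeled aerial tiles (beach / not beach, downtown / not downtown).
rng = np.random.default_rng(0)
X = np.stack([region_features(rng.random((256, 256))) for _ in range(20)])
y = rng.integers(0, 2, size=20)

beach_clf = DecisionTreeClassifier().fit(X, y)    # decision trees worked better for beach tiles
downtown_clf = SVC(kernel="rbf").fit(X, y)        # SVMs worked better for downtown tiles
```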
So that's how we discriminate the geographical characteristics. But some other geographical characteristics, for example, near public transportation, are too hard even for image classification algorithms.
Because, for example, if we want to know whether a hotel is by a subway station or not, usually we can see only a very small sign of the subway on the image, and it's very hard for a machine learning algorithm to figure it out. But meanwhile it's very easy for human eyes to recognize.
So for this set of characteristics we classify them using on-demand human annotation through Mechanical Turk. So basically we ask people: do you think this image shows something related to public transportation or not?
And for the last characteristics related to neighborhood safety, we acquire them from FBI online statistics. This includes city annual crime rate and also city population. So here's how we acquire location-based characteristics.
And now we have identified what characteristics we want to put into our model, and we have also successfully collected the corresponding data. Remember, our goal is to locate the hotels with specific criteria and the best value for the money. So with all the data ready, now we can start our estimation. We use hedonic regression models to estimate the economic value of these hotel characteristics. We propose a linear function of the following form.
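In schematic form (a reconstruction from the description that follows, not the authors' exact slide), the hedonic price function looks roughly like:

```latex
\mathit{price}_{it} \;=\; \alpha
  \;+\; \sum_{j} \beta_{j}\, L_{ij}
  \;+\; \sum_{k} \theta_{k}\, X_{ikt}
  \;+\; \gamma_{1} C_{i} \;+\; \gamma_{2} S_{t} \;+\; \mu_{it}
```

where the L terms are the location-based characteristics of hotel i, the X terms its service-based characteristics at time t, C and S are the large-city and holiday-season dummies, and mu is the error term.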
So here, the first summation part contains the location-based variables and the second summation the service-based variables, and C and S here are our control variables: C is a dummy variable for a large city and S is a dummy variable for the holiday season.
The beta and theta parameters are what we want to estimate, and mu is the random error term. We use a fixed effects vector decomposition estimation method at the hotel level.
This is a recently proposed estimation method from the econometrics field which can capture the influence of time-invariant variables in our panel data model. After this estimation, we get two major results. The first is the economic value of each hotel characteristic, and we also get the residual value for each hotel. So what can we learn from these two major results? For the hotel characteristic values, here are our preliminary results.
We notice here there are two characteristics which bring a negative impact on hotel price. The first is highway: if a hotel is near a highway, the price tends to be lower. The second is the average crime rate: if the crime rate is high, the hotel price will be lower.
Also notice here that the popularity rank has a negative sign but actually brings a positive impact on the hotel price, because the smaller the rank index is, the more popular the hotel is.
Meanwhile, we also notice that among all the location-based characteristics, beach has the largest positive impact on the price. So people really prefer a walkable beach and a hotel by the beach can really make a difference in their price.
Here are all the characteristics which bring positive impacts on hotel price: near beach, near downtown, near lake or river, near public transportation, and also hotel class, hotel external amenities, internal amenities, total number of rooms, customer review count and popularity rank.
And here's what we can learn from the economic value of hotel characteristics. Remember we also have another major result, which is the residual value of each hotel. What can we learn from that?
We propose defining the value for money of each hotel as equal to its residual value, which is the predicted price minus the real price.
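A minimal sketch of that ranking step (the column names and numbers are made up; the real inputs are the hedonic model's predicted prices):

```python
import pandas as pd

# Hypothetical data: observed price and the price predicted by the estimated hedonic model.
hotels = pd.DataFrame({
    "hotel": ["Hotel A", "Hotel B", "Hotel C"],
    "price": [180.0, 250.0, 140.0],
    "predicted_price": [210.0, 230.0, 150.0],
})

# Value for money = residual = predicted price - real price.
hotels["value_for_money"] = hotels["predicted_price"] - hotels["price"]

# Best value for the money first; reversing the sort gives the "worst value" list.
ranking = hotels.sort_values("value_for_money", ascending=False)
print(ranking[["hotel", "value_for_money"]])
```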
And then we rank all the hotels according to their value for the money, and we got something interesting. Here are the top 10 hotels we ranked with the best value for the money in New York City during the 2007 holiday season. As we notice here, these are the names of the hotels and that is the corresponding number of stars of each hotel.
We will say three stars, three star, four star, five star, four star, and it pretty much covers a good variety of all kinds of hotels. And then, after we get this list, we want to know how good is our ranking list? So we conduct user study to evaluate our ranking. So basically we -- first we generate seven other different rankings by using the current techniques, including price, high to low, and price low to high. And maximum review count. Number of stars, number of internal amenities. Total number of rooms and popularity rank.
We generated these seven other rankings and put them together with our ranking. Then we provided a short title for each ranking, and we asked customers to select their favorite one.
The results show that regardless of the presentation order, more than half of the customers prefer our ranking. This sounds pretty good. But then we wanted to know how reliable their evaluation is, how robust this result is. So we kept the titles intact and swapped the underlying hotel ranking lists.
For example, we keep the title best value for the money but we use the ranking list actually generated by hotel class. And then we realize that customers overwhelmingly vote for the ranking with the title best value for the money. So this on one hand it shows people really prefer hotels with best value for the money. But on the other hand, it also implies that their choices, their decision got largely influenced by the title.
So because of that, we decided to conduct a pure blind test. We hid all the titles and conducted pairwise comparisons of our ranking list against each of the other competing alternatives.
And then we are very happy to see that more than 80 percent customers prefer our ranking. So this would give us very good confidence that our ranking is better than the current techniques according to our user study.
And we asked the users their reasons, and here are some words from the people. They think our ranking provides better diversity, including 30 percent five-star hotels, 40 percent four-star hotels and another 30 percent of three stars and below.
Also, those hotels are priced competitively, and our ranking result provides a logical way to present information, which starts out with lower-class hotels and generally increases to five-star hotels. This helps them make a decision much more easily.
So here's an explanation from the users. And here is a further explanation from us: based on the qualitative opinions of the users, diversity is indeed an important factor that will improve the satisfaction of consumers.
And our econometric-based ranking approach introduces diversity more naturally. So here is what we got from the best value for the money. And what we want to mention here is that we also noticed something that looks even more interesting to us: we also ranked the top 10 hotels with the worst value for the money for New York City. And guess what? We notice that the top three worst hotels, four stars, five stars, four stars, are all from the luxury end. Compared to them, the top three hotels from the best value for the money list are mostly from the middle-level three-star hotels.
And we think this is very interesting, and to make sure whether this is really the case, we also tried some other big cities in the United States, for example, Washington D.C. and Philadelphia on the east coast and LA on the west coast.
And also Miami, and Austin and Dallas in the south. And it really shows the same trend. So we can't help asking this question: Are the luxury hotels there the worst? This might be something very interesting to discuss later when we have more time. But we would really like to ask everybody here: what do you think, do you agree that the luxury hotels contribute the most to the worst value for the money?
And according to your own experience, maybe you have some different idea about this, or maybe you can explain it more reasonably.
Finally, just for fun, we also ranked the best value for the money for the greater Seattle area, and guess which is the best? It's my hotel in Kirkland. We believe this is the best one, and we really like it.
So here's our conclusion: we empirically estimate the economic value of hotel characteristics using econometric modeling, user-generated content and image classification, and by incorporating the economic value of hotel characteristics into a local search ranking function, we can locate hotels that satisfy specific criteria and provide the best value for the money.
Here's our future work direction. First we want to look into a longer data set, and we also want to look into demand data in addition to price data, which can give us a more reliable analysis from an economic point of view. We also want to conduct text analysis on customers' review content. We're also planning to conduct personalized ranking instead of just a general ranking, and we're thinking about using street-level images for better inference.
So thank you.
(Applause)
>>: We have time for one long or two short questions.
>> Question: The set of characteristics you described, was this derived somehow from users opinions or is this a set that you kind of made up in advance?
>> Li Beibei: You mean the location-based characteristics?
>> Question: All those criteria that you evaluated, how did you come up with that list?
>> Li Beibei: You mean how we identified those characteristics?
>> Question: Yes.
>> Li Beibei: Yeah, we conducted it through an online survey. The technique we used is provided by Amazon Mechanical Turk. Basically it's an online tool: we can distribute small tasks and ask a large number of people to finish them. We pay them a very small amount of compensation. So it's all online.
>> Question: I see. But did you ask those people to indicate which of the listed criteria are more important for them or did you ask them what criteria they would consider when selecting a hotel?
>>: We just asked the users what things they would consider important when choosing a hotel, and we left it open-ended. And then we took the most frequently mentioned ones out of those. Out of the hundred responses, we picked the most frequent.
>>: Other questions?
>> Question: I'm close to the microphone, so I'll just ask the question. You mentioned that, based on your experiment, more than 80 percent of users were satisfied with your ranking. Can you elaborate a bit more on how you actually did this verification that 80 percent of the users were satisfied?
>> Li Beibei: You mean especially for the blind test, is it? The last one? So we hid the titles and we just provided them with our ranking, the top 10 hotels, and also one of the other competing alternatives from the seven other ranking lists. Then we asked people which one makes more sense to them, which one they think is better.
>> Question: Had those users stayed in those hotels before, or have they personally ranked those hotels based on their experience?
>> Li Beibei: Those persons are random customers.
>> Question: So they actually have not been staying in those hotels, just based on your criteria they just --
>> Li Beibei: Some of them mentioned they stayed there before. And some of them haven't stayed, but they know something about that. So it's like all kinds of customers, not special kind.
>> Question: So those customers were just looking at your, those different parameters and very --
>>: We were just looking at two lists of hotels saying if you were searching for hotels in New York
City, which of the two lists you would prefer to see as a return of a hotel search on a travel website.
>>: Let's take one more question. Don't worry, there's a chance we can meet tonight. There's no questions. Please.
>> Question: You could argue that in an efficient hotel market, all the people who set the prices on hotels would want to drive the residual value to zero, or even negative. And so I wonder if maybe the people who set the prices on hotels know more about the value of the hotel than the people that you ask. So I wonder if maybe they would be a better source of what criteria to use to measure the hotels by, since they're setting the prices.
>> Li Beibei: Yeah, that's true. But usually hotel owners prefer to set hotel price much higher than their real value. That's what they actually usually do.
>> Question: (Inaudible) so you're not going to ask for an unrealistic price, because if you just ask for an unrealistic price, who is going to pay it?
>>: You'll have a chance in a second, but I think next year we'll invite a bunch of operations research people and then we'll get an answer for that.
Thank you.
(Applause)
Next speaker needs no introduction. The inventor of the quad-tree himself, Samet Hanan, and I can't wait to see the next talk. Thank you.
>> Samet Hanan: So we can talk later, except I'm on your talk. And it says ranking evaluations.
So how do I get to mine? I'm not supposed to talk on this. I don't want to touch anything.
I'm on the top here. I see. Sorry. All right. My talk is titled A New View on News, by myself and really my students Michael Lieberman, Daniele Panozzo, Jagan Sankaranarayanan and Benjamin Teitler, and Jagan and Mike are both here. They're sitting right there. And we'll do a demo tomorrow as well.
So let's get started. We're looking at news reading. And news articles often have a spatial focus.
So the key question, at least to us, where are the current top stories, how do we find them, and also what's happening around the world?
We can give examples like in Asia, the India-Pakistan border, or my neighborhood. We don't want to restrict it; we want to enable it to go anywhere. We developed a system we call newsSTAND, which stands for Spatio-Textual Aggregation of News and Display.
What we do is we crawl the web for news articles. Currently we're indexing 3,000 news sources.
We aggregate news articles by topic based on content similarity. So articles about the same
events are grouped into clusters. Although we don't tell you in advance what the events are, the system discovers them on its own.
Now, we rank the clusters based on importance, which is determined by the number of, let's say, articles in the cluster, number of unique newspapers in the cluster. For example, an event in
Redmond Washington is important if it's in multiple papers, especially if some papers are geographically far from Seattle. In other words, the scope or the distribution.
And also we look into the story's rate of propagation. Now, the idea is that important stories will be picked up by multiple papers within a short time period. If it takes a long time period it's not as important as in the short.
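A toy version of that ranking rule, just to make the ingredients concrete (the weights, field names, and propagation measure are invented for illustration, not newsSTAND's actual formula):

```python
def cluster_importance(cluster, w_articles=1.0, w_papers=2.0,
                       w_spread=1.5, w_speed=1.0):
    """Score a news cluster by its size, the number of distinct newspapers,
    the geographic spread of those papers, and its rate of propagation."""
    n_articles = len(cluster["articles"])
    n_papers = len({a["source"] for a in cluster["articles"]})
    spread_km = cluster["max_source_distance_km"]        # papers far from the event => wide scope
    hours = max(cluster["hours_to_five_sources"], 1.0)   # quick pickup => more important
    return (w_articles * n_articles
            + w_papers * n_papers
            + w_spread * spread_km / 1000.0
            + w_speed * 24.0 / hours)

# Hypothetical clusters for illustration.
clusters = [
    {"articles": [{"source": "Seattle Times"}, {"source": "NY Times"}],
     "max_source_distance_km": 3900, "hours_to_five_sources": 6},
    {"articles": [{"source": "Kirkland Herald"}],
     "max_source_distance_km": 0, "hours_to_five_sources": 48},
]
clusters.sort(key=cluster_importance, reverse=True)  # most important cluster first
```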
Now, we also leverage work that we've done in a system called STEWARD, which is just a document retrieval system with geographic location. And what it does is it associates clusters with their geographic focus or foci.
And what we do is we display each cluster at the positions of its geographic foci. Now, the motivation really started with our work in spatial databases, where you always have this distinction in queries on spatial locational data, which is between a location-based query and a feature-based query. Location-based queries are of the form: given the location, what is there? And we're talking about a lat/long specification. Feature-based queries are really the same as spatial data mining.
What you're saying is: given a feature, where is it? Now, what we've done is we've extended the distinction to textually specified locations. So you'd say, find me all topics or articles that mention a given location or region. Alternatively, for the feature-based query, you would say, find me all regions mentioned by articles about a certain topic.
Now, the topic is not necessarily known a priori and the topics are ranked by importance which could be defined by the number of articles that form them.
Now, we do one step further. We further extended the distinction to location specified visually and by the direct manipulation actions of pan and zoom. So you see how we sort of run the gamut of the specification.
Now, our goal is really to change the way news is read. Now, what we're doing is we're choosing a place of interest and refining topics, articles relevant to it. Now, again, these topics and articles are determined by location and level of zoom. So if you really zoomed in you have a different focus.
Now, there's no predetermined boundaries on the sources of the articles. So some applications could be monitoring of hot spots. Hot spots, for example, investors, national security, disease monitoring.
And what we're doing is really fundamentally changing the way news is read and presented. So one action is summarize, what are the top stories happening, exploring what's happening in
Darfur or discovering patterns, how are the Olympics in China and Darfur related. You might not think they are, but if you look at some newspapers you'll see there's a big relationship.
Now, the ultimate goal is to map, make the map the medium for presenting information that has spatial relevance. Examples are news, search results, photos, videos, things of that nature.
Now, let's look at existing news readers. Look at Microsoft Live. I'm sorry, I'm a rude guest, but it's rather primitive and the top stories are presented linearly. There's really little or no classification by topic.
The Google News Reader classifies articles by topic, if you know, and also has a little feature called Local News Search. What it really does it aggregates articles by zip code or city state specification. For example, you could say give me articles mentioning College Park, Maryland.
Provides a limited number of articles. Let's say nine at the moment. And seems to be based on the host of the articles. In fact, a lot of people attribute geographic location by the little tag line.
Look at the AP stories they say Washington D.C. or something like that.
So we say the Los Angeles Times provides local articles for Los Angeles. We look at Los
Angeles, not a bad idea. Locate the LA Times.
It seems to use Google Search with location names as search keys. For example, when we look for articles for zip code 20742, where the University of Maryland is, it mentions College Park, Maryland, or the University of Maryland. It doesn't seem to have a notion of story importance in the grand scheme.
If you look at international versions, it really means international news sources. So, as I said, it's really closely related to STEWARD, which stands for Spatio-Textual Extraction on the Web Aiding Retrieval of Documents. It's a mouthful, but you've got to have a name that means something and people like to -- they'll never remember what it stands for. But STEWARD.
Actually, STEWARD is a great name. You know why? Because STEWARD acts like a guardian.
Why we did this is for government documents and for the Department of Housing and Urban Development. They viewed this. A steward is a caretaker. So it's not a bad name. Okay?
And again, these acronyms, what do you do? All right. So STEWARD is a document search engine, and an example of a spatio-textual query would be, say, you're looking for a rock concert. The location is, say, near Bellevue, Washington. What we want are resulting documents that are relevant to both keyword and location. For the keyword it's pretty easy: a mention of rock concert. But you also want a spatial focus near Bellevue, Washington.
So why do we want to use it? Try it on a conventional search engine. What we want is a preference for documents containing both rock concert and Bellevue. Now, the question, is it the intended Bellevue? What about rock concerts in Redmond or Kirkland, you'd like to get them as well.
So the deeper issues are you want to take spatial synonyms into account. Most search engines don't understand the various forms of specifying geographic content. So we really want more than just postal addresses.
And the results on most of the search engines are really based on other measures: link structure, things like this. In other words, if people link to you, they'll find you. If nobody links to you, you can have the world's greatest stuff but no one will find you.
It's more of a popularity contest. And what I call the democratization of search. The reason why I call it democratization of search is because if you give someone crap, everybody gets the same crap. In other words, if you search for something you don't find it, the other people also don't find it. So how can they complain? Okay? Do you see the distinction there I'm making?
And maybe I said too much here, but that's what link structure does for you. So we're saying that STEWARD is not really Google Local, in the sense that Google Local really uses geocodes and postal addresses and points on a map.
And this is not so hard because address strings are generally well formatted and most results can be drawn, let's say, from online yellow pages, things like that. What we're really doing with
STEWARD is really working on unstructured text documents, where the document is really a bag of words and the goal is more than searching for addresses in documents, which is really easier.
So the goals as I said before is really to identify all geographic locations mentioned in the document, find the geographic focus of the document, what is it really about, but just in a geographic sense.
And retrieve documents by spatio-textual proximity. So having said that, how does newsSTAND differ from STEWARD? Well, newsSTAND focuses on finding clusters of articles on a single topic and associating them with the geographic locations they're about and, to a lesser extent, the locations they mention.
Now, STEWARD focuses on determining the geographic focus or foci of single documents. See, now we're working with multiple documents.
The beauty of working with multiple documents is that the noise doesn't stand out. In other words, if I've got like 100 articles, well, if one article mentions Redmond and no other does, Redmond is not going to be the focus. So what happens is the clustering somehow helps you in finding the geographic things, because the clustering is based on a lot of terms. It's like a huge feature vector; we use the TF-IDF business. So the locations play a small role.
But the terms that are shared play a big role. So that enables us to actually find the foci by having a lot of documents. So what I say here is that newsSTAND can choose to ignore some locations as being irrelevant to the central topic of the article. And the common topic of the cluster is used to improve the geographic foci determination process.
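As a toy illustration of that clustering idea (a simple single-pass grouping on TF-IDF cosine similarity; the threshold and articles are made up, scikit-learn is an assumption, and this is not newsSTAND's actual online clustering algorithm):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = [
    "Earthquake shakes Illinois, felt in Chicago and Des Moines",
    "Midwest earthquake rattles Chicago suburbs",
    "Chinese ship carrying arms for Zimbabwe turned away in South Africa",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles)

clusters = []          # each cluster is a list of article indices
for i in range(len(articles)):
    for cluster in clusters:
        # join the cluster whose leader (first member) is similar enough
        if cosine_similarity(tfidf[i], tfidf[cluster[0]])[0, 0] > 0.2:
            cluster.append(i)
            break
    else:
        clusters.append([i])

print(clusters)  # e.g. [[0, 1], [2]] - the two earthquake stories group together
```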
In STEWARD, the user selects the key words that determine the documents. Could be news articles. But we also do it for Pub Med, Pro Med and things like that as well. In newsSTAND, the topics are more general than key words and are determined by the clustering process.
And that's independent of the user. And also the newsSTAND can use the functionality of
STEWARD to enhance the process of reading particular articles in the cluster. For example, you could search the cluster for key words. Browse the geographical foci of the elements of the clusters as well.
So mapping the news. Well, we ignore the source geography, the newspaper location. When I say ignore, not completely, because we sometimes use it for hints in certain cases.
We use the geo tagger to identify references to locations. Here's an interesting thing. One of the things we're able to do is to take advantage of zooming in. Zooming in is a very important feature, but it's sort of a problem. What it means is that the cluster populations will be smaller as fewer articles refer to the viewing window.
If I give you the whole world, of course, you can cluster the most important stories, because there are millions and millions of stories, and some will be more important than others.
So it's a self-selection. In other words, the self-selection gets rid of the less important articles. But as you're zooming in, you can go into Bangalore, into streets in Bangalore; there won't be a lot of stories, and the point is those articles are not going to be that specific -- they'll be more prone to errors.
So what we're saying is that location plays a larger role in the clustering algorithm, but geo tagging errors are less likely to be filtered out when you're zooming in.
Now, we also have something called cluster rank versus cluster spread. When we show the stories, we don't want to have empty areas on the map with no articles.
So this implies that in some regions less important articles are displayed than in others, and some important articles are obviously not displayed unless you zoom in.
So as you zoom in and pan, you want to make sure that once an article is displayed, it persists until its location is no longer in the viewing window. Otherwise people get disoriented.
Now, zoom and pan are really expensive, as there's much redrawing. And what you'd really like to do, and it will come up again, is use an inset overview window to control the zoom and pan, with little symbolic information that needs to be redrawn.
So what are some general geo tagging issues? For example, you've got to identify the geographical references in the text. Does Jefferson refer to a person or to a geographic location?
How about this geographic reference. Does London mean London UK, or London Ontario?
There's 25 - 70 different Londons in our corpus. Also, you want to determine the spatial focus of a document.
Is Singapore relevant to a news article about Hurricane Katrina? Not necessarily, even if the article appeared in the Singapore Straits Times. You have to worry about these things.
New geo specific tagging issues. The name of the news source. Like I said before, you identify geographic focus for the news source for the containers of the articles in the source and you use this to resolve geo tagging ambiguities.
I don't know if you understood what I'm saying here. What I'm saying is that if you have a -- one of the things we do we cluster the newspaper themselves. In other words, what you'd like to know about the newspaper, is it a local throw-away? Is it a national newspaper like the New York
Times. If it is then the stories will have much bigger scope. Whereas, if the newspaper is like the
Kirkland Daily Herald, whatever it is, the stories are going to be about Kirkland, therefore, when you see stories in that particular venue and you see a name like London, it will most likely be a
London that's local instead of something else, because it's -- this is a very hard problem what we're looking at. I mean to get it right or to get things done.
Now, so another thing is you could perform some preliminary clustering by focusing on the headline. The headline often says a lot. It says little but it says a lot, too.
In the little that it says it's conveying a lot to you. You have to understand this sort of play on words on this. But there is some importance there.
Now, the other one is you have a multiple versus a single interpretation as a geographic location.
Multiple means that if I have a lot -- if I have a lot of interpretations for a name, it probably means it's probably a geographic location. Okay. Whereas if you just have a single interpretation, it may very well be an error. So there you could verify by checking the population, the presence of containers.
In other words, if I see something like Manches-- a name that can also be used as a person's name, like Obama, for example. There's a place called Obama. If you see Obama, am I going to say, because Obama appears in the gazetteer, I'm going to place it there -- where was Obama? Japan. There's an Obama in Japan. We don't want to report Obama in Japan. We'd look pretty silly if we did that.
So these clues are right here and we're looking for presence of containers. What's a container?
It's like a hierarchy. And people have used containers in the spatial textual world, but what they do is they use containers for the lowest common denominator. Saying I have a story about
Kirkland, Redmond, and other places, therefore it's Seattle. Because what they have in common, they're in Seattle.
But we're looking at containers not to give you an identification, but really to sort out problem things. And also the presence of proximate locations. An example like this: if I have London and I have Toronto in a document, and a few other places, then there's much more evidence for London, Ontario than London, England.
And the point is so we don't -- and the natural way of doing this is to look at it probably say well if you've got London it's got to be London, England because that's the biggest population, the biggest thing. Therefore it's got to be that.
We're saying no you can't make such decisions. We're trying to get it right or at least use more evidence. So there's a lot of learning aspects here as well.
So the architecture of news stand. You have these RSS news feeds and you can see right here sort of the structure, the document acquisition. There's the geo tagger. I mentioned that before.
You have feature vector extraction. So if we go here, so this breaks up the document into entities. The feature record assignment associates each entity with a list of potential geographic locations. And we use a gazetteer. This is another problem. Find a decent gazetteer. We've been through a whole bunch and none of them really have all the information you need.
Somebody could really compile one -- it's really worth something -- like a service that gives you the information. It's just not there. Disambiguation versus semantic analysis: this is how we determine the correct geographic reference for an entity. Then we have the geographic focus determination. This is like saying: pick the geographic references most relevant to the document's content.
So here I mentioned the news cluster detection uses TF-IDF type work. I'm not going to repeat that here; I don't have that much time. Feature vector extraction methods, you know, we said before: is Austin a location or a person?
And they have these problems here. We try to use natural language-based techniques. So we want to find noun phrases or potential locations. For example, we avoid names that are followed by a verb or an adverb. Rule-based type thing. Of course, none of this stuff is perfect. We use a corpus as well.
Natural language techniques: we use a corpus. In other words, part-of-speech tagging, named entity tagging, as well as rule-based approaches. And this is the feature record assignment, where we actually interpret things. And how much time do I have, Gore?
So I still have -- all right. I just don't want to be stopped before I -- so I go back here. So the feature record -- no, no -- [laughter].
So the assignment -- well, I want to give you your money's worth. The price is right. It's free.
Anyway, we search the gazetteer. You have a corpus of geographic locations -- you've got 2.06 million locations. It also contains hierarchy information, as well as alternate names.
Still, it's not enough. For each word and phrase, we obtain a list of all the possible geographic locations it could be, and then we try to associate the keyword with at most one location. Semantic analysis, as we mentioned before: how do you know which London it is?
So I mentioned that if you had Toronto in there, it might be more likely to be London, Ontario.
And we try to make use of this idea. We have something called a disambiguation algorithm which we call pair strengths where we look at pairs. If you look at London here, it could be in the UK and Canada. These are the options. And so we have London UK, London Canada, and we look at all pairs of entity location combinations. You have London UK. You have London Canada, these are the interpretations.
Suppose we have Hamilton in there. There's a Hamilton Canada and Hamilton Bermuda. We sort the pairs in decreasing pair strength in features like how far apart they are and things like that to enable us to find the strongest pair.
And, of course, we ignore assignments with weaker ones. So where did we go here? I wanted to go next. What happened? Oh, I used the wrong -- where are we? I got lost. Anyway, so here -- because I've got one and one here.
You see here, if we say distance between them. So the pair London UK Hamilton Canada is weak. Whereas London Canada, Hamilton Canada is strong to give you that interpretation. So we're trying to show you how we would interpret where London was. I know it's a lot of stuff here.
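A stripped-down sketch of that pair-strength idea (the coordinates are approximate, and inverse distance here stands in for the richer set of features the real system uses):

```python
from itertools import product
from math import radians, sin, cos, asin, sqrt

def km(a, b):
    """Great-circle distance in kilometers between (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

# Candidate interpretations from the gazetteer (coordinates approximate).
london   = {"London, UK": (51.51, -0.13), "London, Ontario": (42.98, -81.25)}
hamilton = {"Hamilton, Ontario": (43.26, -79.87), "Hamilton, Bermuda": (32.29, -64.78)}

# Pair strength here is just inverse distance: nearby interpretations reinforce each other.
pairs = sorted(
    ((1.0 / (1.0 + km(p1, p2)), n1, n2)
     for (n1, p1), (n2, p2) in product(london.items(), hamilton.items())),
    reverse=True,
)
print(pairs[0][1:])   # ('London, Ontario', 'Hamilton, Ontario') - the strongest pair wins
```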
We talked about geographic focus. We'd like to have which one is it. And we look at frequency in the document but not necessarily spatial proximity to other entities, container-based method, every entity votes for a set of container objects. If they're all there it gives greater evidence. We also use clustering-based methods like agglomerative methods and we use a combination of all of these.
Now the user interface, we use the mapping API provided by Virtual Earth. Some improvements I mentioned to Gore already today, I'd like the ability to recursively invoke the Virtual Earth. Why, because there's certain tasks you want to do again and again and you don't want to -- we don't want to open a new window. We'd like to have an inset window and the fact where your mouse is it tells you where the action is. A very powerful user device used in maps.
Insets, the whole notion of insets in cartography, is really that. So some application scenarios are the spatial textual search engine for the hidden web, which I mentioned before, and also we've done it for infectious disease monitoring. We scan newspaper articles, Pro Med alerts and we identify infectious disease mentioned using a disease ontology which we developed.
Now, instead of just having a geographic focus we also have an ontology focus for the document.
You see that the same techniques that we used to do geographic, you can do other topics. Of course, you're not getting all knowledge, but you can devise an engine or a device for a particular subject, and you can use that knowledge that you have about the subject.
So, the idea is to map disease alerts in real time. I'm going to go to the demo. I want to acknowledge our support from Microsoft Research and Virtual Earth; HUD, the Department of Housing and Urban Development, and John Sperling -- some of you have met him before. He's our collaborator on this, and he and I hatched this stuff at a meeting a few years ago, and we work closely together. Also the Digital Government Program at the National Science Foundation and Larry Brandt, and the University of Maryland, which also supported us in doing this.
I say "supported us," it means they actually gave us money, not money, but in other words they didn't just say, okay, you can do it. So I should acknowledge it.
Now, how do I get to the bottom here? Escape. But I'm not supposed to escape.
I'll give you a little demo here. Okay. So here I should mention these are news articles we froze a couple of days ago, so we don't have this problem that suddenly something doesn't work here.
But this actually works in real time and clusters things as well.
So here is a map. And you can see that we have the different colors here and sizes. The color has to do with freshness. The light green is older. The dark green is newer. And the size of the box has to do with importance. So here is an interesting thing. If I go here, you see -- let me click on this story here. You see they light up.
And what it is, this story is about arms to Zimbabwe. When you look at the tents, it shows you the locations where the ship has been going. There's a ship of arms for Zimbabwe that no one wants to take. And here are the stories.
So if you look here, this is Angola, Chinese ship carries arms cargo to Mugabe. And you see
Namibia. And it's sort of neat to look at it. South Africa, didn't go there, either.
Here, what do we have? Another South Africa. Here we have Zimbabwe. And if we go here, come on -- all right. Here you've got Mozambique and here you've got China. You can sort of follow what's going on.
Let's go here for a minute. This is Jimmy Carter's trip. You can follow Jimmy Carter trying to be a peace maker. And if you don't like it, let's see, let me -- how do I -- this is supposed to -- why isn't the mouse working?
It's supposed to -- okay. I'm sorry. So anyway you can see I zoomed in here, and you can see that these are where Jimmy Carter, Egypt, Gaza, where else, here is Syria. And here we have
Damascus and here are different stories here, but you can see the articles, you can sort of follow him around.
You see the zoom here, what else can I show you? All right. So let me zoom in on India here.
We didn't have many stories, because remember I said before it partitioned the world so everybody has approximately the same stories. But now as I zoom in here you see I'm getting closer. And if I zoom again, okay, what's happening -- this mouse doesn't react? It's supposed to -- all right. This mouse is not reacting huh?
>>: (Inaudible).
>> Samet Hanan: I did. All right. Let's try this. If I can find it. All right. This zoom in should be -- okay. All right. For some reason Virtual Earth has the controls here as well. And it should work with the mouse, but this mouse is not responding to the double click. But you see what happened here in India, we got many more stories about it.
As we zoom in. And, of course, they may not be -- here you see more. You say why are these here? They're about India so they're plotted in the center of India, but here we have stories that are more relevant to these locations.
I didn't do this on purpose. Okay. But you can see here we're getting -- you see how we're focusing in. And as you're zooming in, let's go to Sri Lanka, and we're getting stories on Sri Lanka; in the grand scheme of things you didn't get anything about it.
So now let's zoom out for a minute. So we'll get back -- it takes -- because when it's zooming out, we're working in Maryland here. So all right so here let's look at something else. Let's zoom in on here. This should be the zoom in. And we want to get into Chicago here. Just a minute.
Can you see? -- I think this is the one. Okay. Earthquake. So here if I click here I get the story.
But if I go here, I get the various stories on Chicago. So here what you can see is you get all the stories. If I hit title, it will sort the stories about that particular topic, which is the earthquake, alphabetically. I can also sort by the stories here.
These are domain things. We're going to convert them to the real things if we can. This is by date. It gives you the freshness. You can do increasing, decreasing.
Now, here let's go back by title. Let me just go back to where we -- was this the first -- no, I'm trying to think. Where is the close here? Let me go back here. Let's go to the story, and you can see it in STEWARD. You'll see its power, too. No, I can make this fill the screen here. Right here.
Let me maximize. Let me zoom in.
We're just displaying with a Google API here. So let's go in there. Here STEWARD would actually find lots of articles but here we're just applying STEWARD to the article. And if I hit
Focus here look what happened. Now I'm looking at terrain. But I can just look at map.
What you see here is different markers of the different geographic locations mentioned in that article. And the colors have to do with the confidence that we have in our classification.
So what you see right here is Illinois. So if you go here, you can get, it just shows you the textual extracts, where you find Illinois here. Number three, number six.
Well, if I go here I get other references. Well, Des Moines was number two in importance here.
So now I can see all the Des Moineses mentioned, just twice. If I click here I get the other one. If
I go here, I can get the next one, with Cincinnati.
Now, if I -- okay, one of the things you see here. Suppose I made a mistake and it was a different
Cincinnati.
Then what happens is I get options here -- I guess Cincinnati was only one. Let's see, do we have other ones? All right. Mount Carmel. So let's see, suppose this is an error. What did we say here? What should I do?
>>: Up on the list.
>> Samet Hanan: Do you see all the Mount Carmels here. These are all the options we can do to correct it. Once we do that it's corrected. So if it was an error it won't make that mistake again.
And the other thing that's supposed to be here is -- let's see. Where is the highlighted copy. It's in there. It was supposed to be. Where is it?
>>: I don't think the system cached this. You can try one of those.
>> Samet Hanan: Where is the other one? Okay. Let's see. Let's try another story. Oh, that was a mistake. Just a minute. So we'll see it in STEWARD. So this should be a story here.
>>: (Inaudible).
>> Samet Hanan: It's the same one.
>>: Because the system was not caching these stories for a short period of time.
>> Samet Hanan: For a few minutes before. So what do I do here? Nothing.
>>: Try another cluster.
>> Samet Hanan: How do I get to another cluster? By closing, go to another story, is what you're saying. Let's go to our friends in Africa. You think they'll be clustered. Let's go to Robert
Mugabe. His ship has been floating everywhere. So let's go here and hopefully one of these stories will have it. And that's sort of a neat thing to show you.
I should have gone to STEWARD. Anyway, this is hot off the press, okay? So let's try this. And hopefully this one -- it's not.
>>: Try a more recent story.
>> Samet Hanan: What do you mean by more recent? You're saying a more recent story, meaning a really dark one. A recent -- light. Anyway, I think what it's supposed to do is show you a highlighted PDF with everything in there. And we'll just try one more and, again, we had this breakdown before, and tomorrow we'll have the highlighted copies.
Here it is. Here it is. It's here. Okay. Okay. Here it is. That's what I wanted to show you.
You see the location. So you say Canada, get the next. It will get the next one. And it should get all of them highlighted and you can step through the document and see all of the locations and that's the idea.
I'm sorry to run a minute over. Any questions?
>> Question: Thank you for the talk. It's very interesting. The first question is: you mentioned you do the clustering and then you do the geo tagging. I don't quite follow why you do the clustering first and then the tagging, because there are many methods where you can do the geo tagging first. I'm just curious how accurate this method is, using the clustering first and then geo tagging.
>> Samet Hanan: No, because there are two systems going on here, okay? I explained to you the newsSTAND is a cluster, okay. And what you're doing with the clustering is you're trying to identify what are the common.
>> Question: That part I can understand because you're using the group or whatever, they're doing the similar thing, right? They group the newspaper based on the subject. Then you find what's central word.
>> Samet Hanan: But afterwards you try to find what are the geographic -- those are the geographic locations that are there. So what you're saying why don't I geo tag first?
>> Question: Right.
>> Samet Hanan: That's sort of an optimization that you could do.
>> Question: Okay. Another question is, if we have the real time updates for the newspaper, how do you update your cluster and --
>> Samet Hanan: First of all, real time, they're coming in, I can't cluster that fast right now. The clustering takes time, okay? It's not -- but I say it doesn't take days. And the clustering that we just developed, got running today a clustering algorithm that goes much, much faster than the one we had before which was sort of brute force type things. Is that fair to say?
But the point I wanted to make is about the RSS feeds and all that. The RSS feeds don't get you all that information, because most of the news is not there. If you want to get all the news, you're throttled. So you can't really get at it because nobody will let you go to their website very much.
So we have what we can get. But we have a lot of sources but it doesn't mean we can get them day and night, of course.
But there's no reason why you can't do that with our work. I'm just saying we didn't necessarily have it. So our clustering again that you're getting here, it takes hours to do right now. But the
newest ones we have are much faster and one of the things we're working on is developing clustering algorithms that are super fast. One of the things we're working on is using GPUs and other parallel. We're working on some cloud computing and things like that as well. We've got a lot of things in the -- irons in the fire.
But we're only four people or five people. And that's it.
>> Question: You did mention, and I agree, that for news search you pay less attention to the addresses we see in the document. But one of your criteria is that you pay less attention to the source of the news, which I think is a little bit debatable.
Because for local readers, the most important news is how it influences or impacts the local area. For example, Microsoft news: a lot of people would read Microsoft news in the greater Seattle area, but in that news there's probably no mention of the Microsoft location or those several locations out there.
>> Samet Hanan: What we're saying to you is we're looking at news from a geographic -- we're trying to show you news from a geographic context. What you're saying to me is, hey, Microsoft is in Redmond therefore you should infer the articles from Redmond. Sure, but I can't do that.
I'm just saying that.
Our goal has been really to try to identify geographic locations. We use this as a vehicle to do that and we found very interesting applications you can do with it. But the fact we can't do everything -- we make a ton of mistakes. I could have showed you a zillion errors, but my students asked me not to be deprecating and not to say it doesn't work perfectly. That's life.
Nothing is perfect.
And I can tell you we have a problem with Santa Catarina. Have you heard of Santa Catarina?
There's a Brazilian priest who flew off with a bunch of balloons, and no one's ever found him. He even has a chair on his balloon. And there are stories everywhere about him. And our system, the gazetteer, doesn't have Santa Catarina in it. It puts Santa Catarina in Mexico. I was furious, because I had heard of Santa Catarina, and it's not right. How can we miss it when we found Brazil and we didn't find Santa Catarina? And I told my students that Santa Catarina is a really famous place but no one had heard of it.
I'll tell you: where Moses spoke to God is in Santa Catarina. It's called Sinai, Har Sinai. That's in Santa Catarina. You didn't know that. But a lot of people do. But they wouldn't have thought of Santa Catarina in Mexico. That's what I expected, to get it in Egypt but not in Mexico. All right.
You see that's funny but not so funny.
>> Question: Very nice talk and on a very important topic, but one of the challenges you sort of alluded to in the last example, many articles refer to multiple locations.
>> Samet Hanan: Right, we do.
>> Question: So the first location often is where the reporter was based. So many stories in
Afghanistan, if you read NYT, the first thing you will see is New Delhi. And in the article it will then talk about the current location of the event.
It will give you the history, what else happened nearby. If an article has multiple locations, how do you choose which one it should be placed at?
>> Samet Hanan: What I said was, first of all, we try to avoid, not avoid, but we don't take -- just because the article was filed in New Delhi, it's not about New Delhi. What we're trying to do is find evidence by other locations in the place that would give you that.
When you say how do we know what the main location is? Well, if you look at STEWARD you don't know. But if you look at the clustering algorithm, what happens is that the topic of
Afghanistan, Taliban and all these things, they dominate.
So each one may have other locations, but the common theme will be Afghanistan. So
Afghanistan will stand out over everything else and you avoid this thing about multiple locations. However, if you just look at one document: some of our documents, the news stories, are short, but the documents in the HUD repository may be 500 pages, and we're going through all 500 pages.
There's no way we can give you one location.
We get screwed up left and right. There's a bibliography. You got all the things -- where the people are. You have mayor of El Chinitso, the El Chinitso City Council. Does that mean it's a place?
So you're sort of asking me, you're picking on the reporter, yes, and that's the point I wanted to make. Most of the systems that look for locations really just look at the dateline: hey, AP says New Delhi or something like that, so you say it's about New Delhi, because usually they write like that. But I gave you the example about the Singapore Straits Times, and I saw a real system that flagged
Hurricane Katrina stories in Singapore. That's why I said it.
>>: So thank you.
>> Samet Hanan: And my apologies.
(Applause)
>>: I absolutely encourage you to look at the demo tomorrow. Our next talk: some people here complained that big corporate America is distorting research by solving all your problems. Well, actually it's the reverse. Some people here are working really hard to make sure you have these wonderful data sets that you can then build upon.
Anyway, Wolfgang runs the left brain, or I guess it's the right brain in a way. He's this big continental force in the sky, airplanes and cars and satellites, and spends a lot of money too. So he's going to tell us a little bit about what we're doing in the virtual domain and specifically (inaudible). When you look at this, this is an impressive piece of work. I don't say it just because I work on it, but I think the work he's doing on connected technologies is important, of course. But look at it through the lens of what we can do with the data set: some of the problems have been solved, and when this is available for us, what else can we do? So, Wolfgang, please.
>> Wolfgang Walcher: Do I need to introduce myself, because I missed the morning session? By the way, I'm feeling bad now because it looks like I'm the only thing between you and the great dinner. So I promise I'll keep it as short as I can. How does that work?
And you promise me not to fall asleep. So let's have a deal here. So, as Ger said, I manage the imagery data acquisition program for Virtual Earth. And as such I'm dealing a lot with reality: with business reality, with facts, with weather, with climate and political issues and all that.
So my talk is not going to be very scientific. It's supposed to be provocative to some degree.
Nonlinear thinking, right.
So I titled it The Geo Spatial Data Challenge for the 3-D Web, and I need to explain every word in that title.
Let me start out with what is 3-D web. And there's this famous guy here that made this statement at his 50th birthday, almost two and a half years ago, in Oxford in a speech when he said: You will be walking around in downtown London, London, UK, is that clear now, London, UK, and be able to see the shops, the stores, see what the traffic is like, walk in the shop, navigate the merchandise, not in a flat 2-D environment like today but a virtual reality walk through 3-D environment.
So that's the grand vision. If you look at what Virtual Earth and Virtual Earth 3-D is doing we're working towards that goal in the hopefully not so distant future.
But there's a lot of things that are needed to get there. For one thing, we need to have an actual geo database, a 3-D representation of that real world, of those alleyways in downtown London and the shops and the stores there, and potentially the interiors one day.
There's a whole lot of questions about compute infrastructure, the servers, the user interface and not least of all that, the business model that is, I would say, still developing and not at all clear at this point.
The 3-D databases, that's the stuff that I'm dealing with. So at an absolute minimum what we need in there is the terrain and its textures for all the land masses on the planet, plants, manmade objects, buildings, bridges, everything.
And we could add in moving things like cars and ships and airplanes. But at a minimum we need meta data to describe all those things, where they are and how they correlate with each other and interrelate.
Are we there yet? No. But we're making progress. I think we demonstrated the capabilities with Virtual Earth 3-D. We're now going into what we call version 2, where we added a lot of realism, more content, more detail. I'm not going to go into any of that because I don't want to steal the thunder from Jason and Steve Stansel, who are coming to give a presentation. So I will just touch on the surface of that.
And I drew a red line here. I would say this is about where we are today. Next things will be -- well, there needs to be more functionality. Now, that's a very ambiguous statement. Let's put it that way. Today we're mostly looking at the things, the maps, the 3-D world, and the functionality that we see in applications mostly comes from overlaying it with other data sources. The mash up is the typical scenario.
The other thing we haven't done yet is really scaling it globally. Yes, we have some 250 3-D cities online today, but if you look at the percentage of the population centers in the world, that's not a whole lot. That's a very small fraction of it. There are really 5,000 really big cities in the world we would need to cover one day and keep up to date.
So that real big scale global approach hasn't happened yet. So I drew a red line for two reasons.
Because that's where we are. But also it's a natural barrier. We need a few things to happen before we can go further. We need -- for this to scale really on a global scale, we need a lot more automation. We're proud of what we can do today.
But it's fairly limited to creating the initial product. What's not taken care of very well are two other things. Automated QA/QC, quality assurance and quality control. Automation is great: if we now have lots of computers creating millions of buildings, we don't need an army of clickety-click people somewhere in an offshore factory working along.
However, the thing falls apart if for quality control we still need people to look at every one of those millions of buildings to make sure, yeah, it's a good one. So there needs to be a lot more technology developed or implemented that allows us to do things like automated QC and automated change detection: figuring out where to update, when to update, why we want to update at all.
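As a rough illustration of the automated change detection he describes, here is a minimal Python sketch. It assumes imagery is available as co-registered raster tiles; the tile format, thresholds, and the flagging step are illustrative guesses, not Virtual Earth internals.

# Minimal sketch: flag tiles whose imagery changed enough to warrant re-modeling.
# Assumes co-registered grayscale tiles as NumPy arrays; thresholds are arbitrary.
import numpy as np

def changed_fraction(old_tile: np.ndarray, new_tile: np.ndarray,
                     pixel_threshold: float = 25.0) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    diff = np.abs(old_tile.astype(float) - new_tile.astype(float))
    return float(np.mean(diff > pixel_threshold))

def tiles_to_update(tile_pairs, change_threshold: float = 0.15):
    """Yield tile ids where the changed fraction exceeds change_threshold."""
    for tile_id, old_tile, new_tile in tile_pairs:
        if changed_fraction(old_tile, new_tile) > change_threshold:
            yield tile_id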
So number two and number three of that list are still a big challenge. So why do I say that? Because there are a lot of smart people here. You all presented a paper here; you're kind of done now with that. What do I do next? Rephrase it and try to give it somewhere else? Or maybe here's a new topic to work on. So that's my main motivation for talking today.
So, this colorful thing here: before I talk further, I want to go back to the basics. What are the requirements, what are the reasons why we're doing all those things, and what are the implications for the algorithms and methods we implement?
I'm starting at the back end. So if you say we want to create data to support web-based applications, those applications in a very basic sense could be characterized, initially at least, as view, tour and explore (so, what is it that I'm looking at?), and analyze.
The analytical functions get more complicated: get me all the buildings with 13 stories and higher in that area, because I'm looking to buy some real estate, I'm a big investment firm.
Viewing and exploring are what we can do today very easily with all the tools that we have at our disposal. But the analytical functions get a little hard, because for those we would need to better understand the semantics, the content: what is it we're actually looking at?
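To make the "analyze" example concrete, here is a minimal Python sketch of that kind of query over semantically attributed objects. The record fields (object_type, stories, coordinates) and the bounding-box filter are assumptions for illustration, not an actual Virtual Earth schema or API.

# Minimal sketch: "all buildings with at least 13 stories inside a bounding box".
# The GeoObject fields are hypothetical; the point is that the query needs
# semantic attributes (type, stories), not just geometry.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GeoObject:
    object_id: str
    object_type: str            # e.g. "building", "bridge", "tree"
    stories: Optional[int]      # None when unknown
    lat: float
    lon: float

def tall_buildings_in_box(objects, min_stories, south, west, north, east):
    """Return buildings with at least min_stories inside the bounding box."""
    return [
        o for o in objects
        if o.object_type == "building"
        and o.stories is not None and o.stories >= min_stories
        and south <= o.lat <= north and west <= o.lon <= east
    ]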
Now, all of those models we need are supposed to go into web-based applications. And with that come a lot of implicit requirements. These are just a few of them here. A small data footprint, to enable reasonable download speed and performance in a networked environment, is a basic requirement. Simple structure means simple geometry, again for fast rendering on not-so-fast devices, making it available on mobile phones. Think of the web not just as the big desktop with the beefy graphics engine and all that stuff.
Now with web-based comes the other thing, that it's totally open. It's the whole world. So you have a lot of non-Ph.D. users who want to have things they understand.
For example, when we looked at the 360 panoramas earlier today, they look impressive for people like us who deal with that every day, but looking at one flat picture that shows back and front at the same time is very confusing to most users.
The international aspect is very important, for a couple of reasons. What we find okay may not be deemed okay in other places, and vice versa. Remember last year, in France, when there was still the old office in Microsoft Excel cards (phonetic), there's this elementary school kind of behind it, and on the street that goes over there is one of those big poster advertising things.
And there was a poster on there, an advertisement, I think, for socks or something like that, with a very naked man on it. He had black socks on and that was it. And that was in front of an elementary school; what is deemed totally normal in some places of the world would not fly in this country.
What I'm saying is what's acceptable in one cultural setting and one legal system is not entirely acceptable in other places.
Privacy, security, we have heard some talk about street level images showing pictures of faces in cars. People going into, what was it, porn shops and all the things that Google got famous for when they first released the Street View Project a few months ago.
We need to be careful when we create those databases that they take into account all the security and privacy aspects as well. So any grand system to drive through a street and create great 3-D models that doesn't also have functionality to censor, to mask out faces or registration numbers of cars and so forth, is not going to work in our environment.
We need those functions from the get-go.
Ger, how much time do we have given that I started late?
So geo data, it's not just 3-D models. It's imagery, raster, vectors. It can be text and multi-media, as long as it's geo referenced put into a geo spatial location. I wanted to put that on for verification.
How do we go about creating all these things? The simplest thing would be there's a warehouse of data that's already there, we just need to geo code, put it in the right location in the world.
Actually, that's what we very early on started out with in Virtual Earth 3-D. We grabbed the landmarks from the flight simulator and put them in the right place and off we went. Hey, here's a
3-D world.
Aggregate UGC, user-generated content. That's a funny one. So, for instance, at that company not to be named, for 3-D content generation there's the SketchUp tool. There are thousands, maybe tens of thousands or more, of people contributing content.
Now, that creates a lot of issues because it doesn't give us control over completeness, quality.
You might get 10 people doing the same landmark, and no one doing the 10 houses around that one landmark.
And the alternative is we create it from scratch from other sources. But then there's a lot of issues about which sensors do we use and which sources do we use, how do we acquire those
from the ground, from space. Do we do it ourselves? Do we license it? Do we outsource it? As simple as it seems, when you start digging into those details, it can get very quickly very hairy.
How much realism, fidelity do we want to have to meet all the requirements for our applications?
And how do we actually, eventually, scale that to the level that this worldwide audience and the
World Wide Web would expect us to cover with reasonable databases?
And, of course, all that while there might be good solutions for this and that -- we just need to buy this $10 million sensor and fly it on a spacecraft and all that stuff -- by the time you build a 30-satellite constellation, this is totally not financeable. Is that the right word?
So even Bill Gates will not pay for it, put it that way.
>>: (Inaudible).
>> Wolfgang Walcher: Yes. So let's look at a couple of problems we have. So assume we build some content and we have users attribute it or add to it, generate their own version of that landmark, and later maybe we update the same city or place from other sources.
We immediately get into a lot of issues with how do we combine different data showing the same thing, different objects that are coming at different times from different sources with different quality criteria and so forth.
And it has to happen automatically, again, without having an army of people looking at that stuff. So what we need is a means to compare objects in the database using criteria like correctness, realism, how old it is, what the level of detail is, and so forth. But for any of these, if you want to implement it in software, we quickly understand that the software needs to better understand what it is looking at. So it's no longer good enough to have just points and vertices and faces and textures on them. We need to know: this is a bridge, this is a building, this is a residential building, this is a parking structure, this is a wooden bridge, and so forth.
It is vitally important to introduce some semantics, some image understanding, modeling understanding, understanding of the world we're creating here so the algorithms can actually work.
More realism. If you look at the newest computer games or Hollywood movies, there is enough technology out there to make things really look real. We don't need more realism in that aspect.
We don't need better water representation. What we need is we need to automatically find all the waters so we can make it look like that.
And we need to find out where all those trees are and what type they are so we can really render them that way. There are other issues like time-of-day and seasonal rendering, which would require us to remove shadows and to understand what surfaces we have here. What's a window? Where will a light be turned on and shine through in the middle of the night? Again, are you sensing a pattern here? We need to have a good understanding of what we are looking at, what the software is dealing with, what the tools are applied to.
Optimizing for our online web-based applications means, as I said earlier, a lower footprint and better transmission or download speed, which again requires better compression: if you want to get higher and higher fidelity textures and other features through to the client, we need to do a better and better job compressing data.
Using level of detail, being smart about how we size it and reformat it so it goes through that pipe quicker. Again, there are these issues of sensitive content and offensive content, especially when dealing with user-contributed content. We had to eliminate imagery with the most obnoxious content from our databases. People had the idea that spray-painting big, whatever, obscene things on the grass, like high school pranks, would make them local celebrities, and of course that shows up in the early imagery. We had to take it out. Typically we didn't find it; the press finds it, or the bloggers.
And the not-so-nice thing happened with the propeller of the nuclear submarine that was absolutely not supposed to be in the imagery.
So detecting such things that are not supposed to be in images -- again, that requires understanding what it is that I'm actually looking at.
Language support is another thing. We showed good stuff at TechFest about finding letters and writing in different languages in imagery. Wouldn't it be nice to replace it with the local language that the user has in his settings when he signed in?
Again, understanding what we're looking at. This is a common theme here. So on the application support side, wouldn't it be nice to enable functionality like pointing at a random object in a virtual world and asking: what is it? Who lives there? What is in there? What is it made of? Since when has it existed? Boy, did I like that time slider thing today. That's very interesting.
What's playing in the theater? Can I have a preview? There's a lot of things we can conceptually think as really cool applications but again all of those would require that we have some built-in knowledge that tells us more about what we're looking at, not just the shape of it and how big it is and where it is.
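A minimal sketch of the "point at an object and ask what it is" idea: given a clicked location, return the nearest object's semantic metadata. The object records, the sample coordinates, and the flat-earth distance are illustrative assumptions, not an actual Virtual Earth API.

# Minimal sketch: answer "what is it?" for a clicked location by returning the
# semantic record of the nearest object. Distance is a crude planar approximation.
import math

def nearest_object(objects, lat, lon):
    """Return the object record closest to the clicked point, or None."""
    def dist(o):
        return math.hypot(o["lat"] - lat, o["lon"] - lon)
    return min(objects, key=dist) if objects else None

objects = [
    {"id": "b1", "basic_type": "building", "specific_tag": "theater",
     "lat": 47.620, "lon": -122.349},
    {"id": "b2", "basic_type": "building", "specific_tag": "hotel",
     "lat": 47.612, "lon": -122.334},
]
hit = nearest_object(objects, 47.6201, -122.3490)
print(hit["specific_tag"])  # "theater" -- the hook for "what's playing tonight?"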
So, in a nutshell, what I'm saying is, remember that one slide where I said this is where we are? If we don't start getting more semantics, a more thorough understanding of the databases we're dealing with, into our algorithms and into the ways we're going about creating those virtual worlds, we will have a hard time jumping across this line here without a huge cost in manpower and manual labor and so forth.
So that is a real challenge for us, everyone out there who wants to build something at this scale with this functionality.
And I encourage everyone here to do some creative thinking, because, boy, we need it. So what I'm saying, what I'm coining here, is we need smart models. At a minimum, for each object in those 3-D models, in these databases, we need to know what it is we're looking at. At least a basic type: it's a building, it's a tree, it's water, or something like that.
And it would be even nicer to have more specificity, a specific tag for it. Like this is actually a gas station.
This is a restaurant. This is a hotel. This is a furer (phonetic), whatever. Surface material, wouldn't that be nice: if we know this is glass, we can make it reflective. If it's brick, we can use a bump map and render it differently.
And for one thing we could actually start replicating, right? Because for a brick building all you need is one brick and you replicate it. That would be a nice compression.
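A minimal sketch of the "smart model" attribution just listed: each object carries a basic type, an optional specific tag, a surface material, and per-attribute confidences. All field names and enumerations are assumptions for illustration, not an actual Virtual Earth schema.

# Minimal sketch of a semantically tagged object, plus a rendering hint derived
# from the surface material (brick -> one tiled brick texture, glass -> reflective).
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SmartModelObject:
    object_id: str
    basic_type: str                          # "building", "tree", "water", ...
    specific_tag: Optional[str] = None       # "gas_station", "restaurant", "hotel", ...
    surface_material: Optional[str] = None   # "glass", "brick", "concrete", ...
    confidence: Dict[str, float] = field(default_factory=dict)  # e.g. {"basic_type": 0.95}

    def renderer_hint(self) -> str:
        """Very rough mapping from material to rendering strategy."""
        if self.surface_material == "glass":
            return "reflective"
        if self.surface_material == "brick":
            return "tiled_bump_map"          # one brick, replicated: cheap compression
        return "default"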
I'm sure by now you have read all that; that saves me reading it to you. But at the very end I have two things here that come in from the practical world again: owner and copyright holder, and use restrictions, especially if you start combining data that come from different sources, maybe from a catalog or warehouse, or user contributed. Some people might restrict it and say, hey, I don't want you to use that commercially, or outside this country, or I don't know what.
So that's also a very important attribution for us that we need to take care of. The other example is like: do not use it in broadcast; it was done by one movie studio, and we don't want to have another movie studio use it. So, conclusions?
We have solved a lot of the theoretical issues, but for the practical implementation of doing these things at very large scale, very cost efficiently -- because that's the only way we can do it at such a large scale -- we need to focus on a couple of those practical implications. In the real world we're facing copyright issues and obscenities and language barriers and all these types of things, and we need to take care of them in our tools, and we need to do some creative thinking getting there. So adding semantics to the 3-D models is going to be the next big challenge.
And without that, I'm stating, the 3-D web is going to remain this mirage on the horizon: we can get closer, but we will never reach it.
And we can be stuck in that world that's nice to look at. We can mash up but it pretty much gets boring once you have looked at everything because you can't really do a whole lot with it without enabling these functions I talked about.
I guess that's it.
>>: Yes, it is.
(Applause).
>>: Questions.
>> Question: Are you going to be looking at automating meta data collection? Because if you take (inaudible), for instance, they've got 500 people in India looking at time-lapse photography, creating meta data: things like speed limits, signs, (inaudible), where the litter bins are in the roads, all that. Are you looking at automating that collection?
>> Wolfgang Walcher: I would actually turn this question to the room and say if someone wants to work on it. I think actually some people are going to talk about this hopefully tomorrow. But that's a huge problem. Take OCR and just try to apply it to the entire world. It's not only text. It's everything else.
>>: OCR is a good example. Yes, clearly the answer is yes: whatever can be automated, we're going to try very hard to automate. And where there are practical limits, we need to look for different options. But, yes.
>> Question: My question is will there be a demo?
>> Wolfgang Walcher: Yes.
>> Question: You had a long list of things to do. Do you have a prioritization, which ones you think are more important?
>> Wolfgang Walcher: No, actually, this is really a laundry list of, hey, there are a lot of unsolved problems out there. There might be solutions, but it's the type of solution that worked once, right, and someone quickly wrote a paper and here it is, under lab conditions.
But apply it to real-world scenarios and it might break down. So some of those problems at first glance look like they're solved. We've done that. But what I'm saying is, when you put it in the context of what we're trying to do here, at the scale at which we're trying to do it, with the sources available and the resources available, you're hitting limits very rapidly.
So there's a lot of optimization still possible, and every little bit helps. So I don't want to prioritize here too much. Jason can talk about that when he talks about his stuff.
>>: That is an interesting question. If you solve A, then B suddenly becomes much easier, so understanding the dependencies here, and what to fuse and leverage later, is interesting. That's actually a research question in itself: how to guess what's going on, and in which sequence to tackle it.
Yes, please.
>> Question: Wolfgang, in one of your slides you mentioned the quality of an object. Can you please talk more about it? Is there any current algorithm to assess the quality of the objects, how to select them?
>> Wolfgang Walcher: It's very basic. You can look at it statistically and say, I'm looking at a building and it's 7 kilometers tall, something is wrong here. So you can use some basic heuristics about what I am supposed to see here, what's the shape and form and size and color, and what's its spatial distribution amongst its peers, right?
If I have a city I typically have one to three airports. If I suddenly have 72, I have a problem, right? So what I mean is just our intrinsic knowledge about how things should look and where they should be and how they should be correlated to each other helps a little bit.
But to measure -- the whole question is about what's the quality, right? We have endless discussions about what's shippable quality. At which point can we release it to the public, what are the measures? You say I've got 10 examples of really good buildings, and then you get building No. 11 and it just doesn't look like any of the 10.
So, back to that: if I had these measures, like do I know its basic type, do I know its specific type, do I know its surface characteristics, and what's the confidence level with which I have determined all these characteristics, that would help a lot to then compare two renditions of the same thing, or of similar things. I can always form some vector, weight it, and then make a comparison.
But, again, it's an open field for discussion and research.
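A minimal sketch combining the two ideas in that answer: simple sanity heuristics (a 7-kilometer-tall building is clearly wrong) and a weighted score over "do I know its basic type, specific type, surface material, and with what confidence", used to pick between two renditions of the same object. The weights, the height cap, and the record format are arbitrary illustrative choices.

# Minimal sketch: sanity-check an object, score it from per-attribute confidences,
# and keep the better of two renditions. All thresholds and weights are arbitrary.
MAX_PLAUSIBLE_BUILDING_HEIGHT_M = 500.0

def passes_sanity_checks(obj: dict) -> bool:
    """Reject obviously impossible objects before any finer comparison."""
    if obj.get("basic_type") == "building":
        return 0.0 < obj.get("height_m", 0.0) <= MAX_PLAUSIBLE_BUILDING_HEIGHT_M
    return True

WEIGHTS = {"basic_type": 0.4, "specific_tag": 0.2, "surface_material": 0.2, "recency": 0.2}

def quality_score(obj: dict) -> float:
    """Weighted sum of per-attribute confidences in [0, 1]; unknowns score 0."""
    if not passes_sanity_checks(obj):
        return 0.0
    conf = obj.get("confidence", {})
    return sum(w * conf.get(attr, 0.0) for attr, w in WEIGHTS.items())

def better_rendition(a: dict, b: dict) -> dict:
    """Keep whichever rendition of the same real-world object scores higher."""
    return a if quality_score(a) >= quality_score(b) else b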
>>: To add to that: definitely, using Virtual Earth, all buildings in cities tend to look alike, and this is sort of the outline of a solution. The question I mentioned on gazetteers before: it's interesting whether it's possible to develop a generalized gazetteer of building objects, or building parameters, or building primitives, architecture primitives. And yes, this has the actual constraints that make it behave like a building, yet you'll have outliers, which are more complex buildings, but probably very few of them, the more organic ones.
>> Wolfgang Walcher: What you said earlier, one thing might make the other thing better, right?
So we find all the streets and the street names automatically, and we know that most buildings have some street access, right? So now you can start matching the roads with the buildings and say, okay, they need to have street access. And, well, we count the buildings, and we have a database that has all the known addresses, and we say, okay, this piece of street should have so many addresses; do we have that many buildings?
There's a lot of -- you need to look at the problem at a very global level and see how we can piece it all together from many different sources.
That's most likely much more promising than trying to focus on one data set and one data source and trying to solve all the ambiguities within that one data set.
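A minimal sketch of the cross-source consistency check he describes: a street segment with N known postal addresses should have roughly N buildings matched to it, and large gaps get flagged for review. The input structures, the sample data, and the tolerance are assumptions for illustration.

# Minimal sketch: flag street segments where the modeled building count disagrees
# with the known address count by more than a tolerance.
def flag_inconsistent_segments(segment_ids, address_counts, building_counts,
                               tolerance: float = 0.3):
    """Return segment ids whose building count deviates from the address count."""
    flagged = []
    for seg_id in segment_ids:
        addresses = address_counts.get(seg_id, 0)
        buildings = building_counts.get(seg_id, 0)
        if addresses == 0:
            continue  # nothing to compare against
        if abs(buildings - addresses) / addresses > tolerance:
            flagged.append(seg_id)
    return flagged

# Hypothetical example: 12 known addresses but only 5 modeled buildings -> flagged.
print(flag_inconsistent_segments(["elm_st_100"],
                                 {"elm_st_100": 12},
                                 {"elm_st_100": 5}))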
>>: Any other questions? We'll have a chance, of course, to talk later. If not, thank you.
>> Wolfgang Walcher: Thanks.
(Applause).