>> Philip A. Chou: So I am very pleased to introduce Yung-Hsiang Lu. We have been working together as technical co-chairs of ICME, the multimedia conference in Seattle next year, but most of the time he is an associate professor in the School of Electrical and Computer Engineering at Purdue. Today he will share with us his passion for cameras everywhere. >> Yung-Hsiang Lu: Okay, thank you very much. Thank you for giving me the chance to come here and talk. I have been here before: I was in the room next door last year as a speaker in the eScience in the Cloud workshop, and that was me in the middle. I remember it was another room similar to this one, the talk was recorded, and it really was very nice. At that time I told some people, "I will be back," and here I am. Today I want to share with you some of the progress we have made in the last 1.5 years. I also want to discuss some ideas. It is always good to hear a few different views, and we are actually building a system around a few of those views. So you are welcome to be users and also collaborators. Actually, Jerry and I have already discussed some ideas. Please feel free to interrupt if you want to say anything. So I started the project almost 4 years ago. How many of you are gold members of any airline, or have been gold members? It looks like Microsoft doesn't send you to too many places. >>: I am a diamond. >> Yung-Hsiang Lu: Oh, that's much better. As we go to conferences, sometimes I feel that it's great to go to different places and learn about the culture, the people and so on, but sometimes we encounter problems: lost baggage, delayed flights, cancelled flights, overcrowded trains and so on. So I asked the question, "Can I see the world from my desk without going anywhere?" Of course I don't get the experience of the food or the people, but if I just want to take a look at famous architecture, maybe that's good enough. That was one of the motivations of the project. Now I know some of you will ask, "Doesn't 'street view' solve the problem?" Let me give you a story about myself. I took a sabbatical in Singapore and after that went back to Singapore twice for conferences. Some of you may have been to Singapore; you know it's a very expensive place and you can easily spend a few hundred dollars for a decent hotel. So I was looking for a hotel and I found one that looked reasonably nice. It's a 4-star hotel, and that's the street view. It looked like a good deal, so I booked that hotel, but when I arrived the taxi driver couldn't even find it because the road was closed for MRT construction. If you come from the United States to Singapore, you know you always land between 11:50 pm and 1:00 am because of the time difference. So it was very late and it was hard to find the hotel. I finally found it, and the next day I tried to catch a little bit of sleep, but the construction started at 7:00 am. So I was thinking, "Okay, that doesn't seem like a very good decision." I looked at the street view and made a decision based on that information, but in reality the information was obsolete. This is a picture I took just a few days ago. You probably recognize this intersection, right? It should be just outside here, and it actually tells you the image was captured more than a year ago. This is another example; it's outside my office at Purdue University, on Northwestern Ave, and there has been a building there for 3 years. The construction started in 2013.
The building has been [indiscernible] and it tells us the image was captured 6 years ago. So the problem here is that obsolete data may lead us to make wrong decisions. Suppose I tell my friend, "If you see the parking garage, my building is right next to it." Then my friend would say, "Well, I looked at the street view and the building doesn't exist. I see the garage, but the building doesn't exist," because of obsolete data. So as a researcher I ask the following question: what is the problem? The problem is that obsolete data makes us make wrong decisions. So I want real-time data, and I see opportunities for research there. The question is this: millions of images and video clips, many of them continuous videos, are online. What can we do with the data, and what difference does real-time versus obsolete data make? I already gave you a few examples of how obsolete data can cause problems. Yes, please? >>: To that point, how often are the street view images updated, if at all? >> Yung-Hsiang Lu: So the question is: how often is the street data updated? It depends on the source. In some places, such as California, the highway data is streaming video. I didn't calculate the frame rate, but it is many frames per second. I will show you a few examples of the data in New York City; that's about one frame per second. And for Colorado, I am talking about traffic cameras right now, it is about once every two minutes. So it varies. >>: No, no, but the street view thing that you said. >> Yung-Hsiang Lu: The street view, okay. So I will talk about street view; sorry, I misunderstood the question. Different companies have different update rates. This is from Google. They actually have a map showing you how old the data is. West Lafayette is not a very big city, so I guess it's updated very rarely. It was actually updated at least once, because when I started the project the data was from before 2009, but then for some reason they updated it in 2009. Some bigger cities are updated more often, but I don't have a very precise answer on how often they update. Does that answer your question? >>: Yes. >> Yung-Hsiang Lu: Okay. So we started with the idea of using network cameras. The idea is the following. There are many network cameras; this has been studied by a company called [indiscernible]. They estimate about 20 million network cameras are installed per year, increasing at about 20 percent each year. These network cameras can stream data 24 hours a day. In my project we deal only with public data, meaning there is no password, for obvious legal reasons. We don't want to deal with anything involving private data, and this public data may reveal a lot of information about the world. You can also think about mobile data, such as from phones or dash cams; if they are connected to a network we can also consider them network cameras. Today you can buy a dash camera with WiFi capability for about $200. Now let me show you a few examples of what a network camera can show you. Yes? >>: Can I ask, when you analyze in real time, who is doing the analysis and what sort of analysis? >> Yung-Hsiang Lu: [indiscernible] has written analyses, but we also make our system available to other people to do analysis. What kind of analysis? For example, you can count people, you can count cars, you can do these types of things. >>: So when you say "you" it means the customer of your [inaudible]. >> Yung-Hsiang Lu: Our system, yes. >>: So they have to write their own code or something to do these things?
>> Yung-Hsiang Lu: Yes, we give you some samples and then you modify the sample programs to do what you want to do. Let me show you a few examples of some of the newer data. This was captured, as you can see, about 1.5 months ago, on October 7, 2015. If you search for Panda Cam, you will find one of the examples here. This is the National Zoo in Washington DC, and they have 4 panda cameras. I don't know who is doing the tracking; I don't know if it is done by a computer or a person, but there is another room where you can also see the panda. So imagine that you can do a study of animal behavior without going to a zoo. That opens up a new paradigm of doing research. >>: I'm sorry, I just wanted to hear; you said you don't know who is doing the tracking. >> Yung-Hsiang Lu: I do not know who is doing the tracking. >>: But something is happening there, that when he moves to a different room the camera switches to that? >> Yung-Hsiang Lu: Yeah. >>: The camera is moving. >> Yung-Hsiang Lu: The camera is moving. >>: No, but when he goes to a different room –. >> Yung-Hsiang Lu: When he goes to a different room it also changes. I don't know if it's tracked by computers or if somebody is controlling it, but you can see it. If you have a computer right now, search for "Panda Cam national zoo" and you will be able to see it. So this is one example. Let me show you another example. This example has a lower refresh rate, so you see people kind of jumping, but you can use this example to study –. This is actually from Romania; you can see that from the URL. You can use this to study human behavior. Actually, I am working with several professors in psychology, and they want to observe human behavior non-intrusively and across different cultural environments. You can imagine sitting here in the United States and observing people's behavior in Romania. For example, do they come in groups? When do they come? How many of them are there at a particular time? Do they bring their children? Can you tell their age group? That has a significant impact on how you design your marketing strategy, because maybe in the morning the age group will be different than in the evening. We have actually talked to a few people near Purdue, some store owners, and they say they do see the demographics change and they want to use that information to improve their marketing strategy. This is another example; as I mentioned earlier, I want to see the world. This is Yellowstone. If you have been to Yellowstone, how many of you know where this is? Do you know the name of it? >>: [inaudible]. >> Yung-Hsiang Lu: That's Old Faithful. This was captured in October, and now it may be covered by snow. I watched it in February and saw it covered by snow, but I still saw people sitting there waiting for Old Faithful, and I was really impressed. As you can see, these are all examples where you can potentially use the data without going there. Now let me talk about the psychology study again. Several professors want to use this data to do a worldwide study. The hypothesis is that people in different cultures behave differently. You can do this study without sending graduate students to 10 different countries. Of course they want to go, but we don't have the funding. This is in Washington State, and these were captured a few days ago, as you can see, on November 28. If you want to see whether a particular highway is congested, or whether there is a long line somewhere, you can see that here.
Just this morning a professor sent me an e-mail, I haven't read the details yet, saying that a few days ago it was announced that there are 30,000 traffic cameras [indiscernible] now, and I already mentioned some of this. So as you can see, this is an example of the same shopping mall. You can look at the time, it is a bit small, but you can look at the time and study how the customers change. This is a volcano eruption in Hawaii. It was captured by a national park camera. Let me show it here. It is at 15-minute intervals. This was captured on January 24, 2014. So you see the volcano eruption. Of course that is a very active volcano, so it happens very often, but nobody was there doing the recording, and you can still watch it. So I think I have given you enough examples of those events. Also, last year we did a study; this used the cameras in New York City. The left side of the figure shows the parade route, it is the Thanksgiving Parade, and the right side shows the locations of the traffic cameras. We captured the parade on that day by selecting specific cameras. I would claim that using network cameras, first, you can see the parade without going there, and second, it is even better than going there, because if you have been to any parade like this, either you need to go very early or the only thing you see is somebody's head if that person is taller than you. And even if you are tall enough and you stand there, you can only see one place, right? You cannot physically be in multiple places at the same time. But here we have 4 different angles on the parade. There are actually quite a few cameras along the whole route, so you can see them. I hope that gives you enough ideas about the possibility of using network cameras to do all kinds of things. This is air quality in Washington DC. I was at [indiscernible] in Australia a few weeks ago and I noticed that there taxis also have cameras on the side. I am not sure what the purpose is, but I noticed that. This is another study, by a research fellow at the Australian National University. They have a camera looking at a forest to see how the trees change, it uses solar power, and he is happy to share the data. So now you can imagine doing a study of a forest without going to Australia. I am going to skip that. We are not the first project doing this. Some people in St. Louis have been doing this for quite a few years, but there are quite a few differences. First, they give you data; second, the data has a very [indiscernible]. Over almost 10 years, from 2006 to now, they have collected 800 million images, and from a field study I can tell you we can get 800 million images in a day because of [indiscernible]. So that is the quantity I can give you. So now let me introduce the project we are working on. It is called CAM2: Continuous Analysis of Many Cameras. Yes? >>: So just from the examples that you gave, how did you get the list of the cameras that are publicly available? >> Yung-Hsiang Lu: Research imaging. >>: Okay. >> Yung-Hsiang Lu: So the question is: how do we find the cameras? We do a web search. We went through different strategies. First, our project started by scanning the IP addresses on the Purdue network, with the permission of Purdue's network security. They said, "Okay, you know how to do it; go ahead and do it." We found a few dozen cameras that way. Different brands of cameras have specific signatures. We send a query, and if a device responds in a particular format, we know we have found a camera. So we started by doing that.
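The signature-based scan just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe paths and expected responses below are hypothetical examples rather than the project's actual signature list, and scanning of this kind should only be done with the network owner's permission, as in the Purdue case above.

```python
# A minimal sketch of signature-based camera discovery. The paths and the expected
# content type below are hypothetical examples, not the project's actual signatures.
import requests

# Hypothetical brand-specific snapshot paths and the response type that identifies them.
CAMERA_SIGNATURES = {
    "/axis-cgi/jpg/image.cgi": "image/jpeg",   # example of a brand-specific path
    "/snapshot.cgi": "image/jpeg",             # example of a generic snapshot path
}

def probe_for_camera(ip_address, timeout=2.0):
    """Return the first path that responds like a camera, or None."""
    for path, expected_type in CAMERA_SIGNATURES.items():
        url = "http://{}{}".format(ip_address, path)
        try:
            response = requests.get(url, timeout=timeout, stream=True)
        except requests.RequestException:
            continue  # host unreachable or no HTTP service at this path
        if response.status_code == 200 and expected_type in response.headers.get("Content-Type", ""):
            return path  # the device answered in a camera-like format
    return None

if __name__ == "__main__":
    # Probe a few addresses from a documentation-only range, purely for illustration.
    for last_octet in range(1, 5):
        ip = "192.0.2.{}".format(last_octet)
        hit = probe_for_camera(ip)
        if hit:
            print("{} looks like a network camera (responded on {})".format(ip, hit))
```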
Then we also scanned a few universities' IP addresses. We found a few hundred cameras that way. One university complained; they noticed we were scanning their IP addresses. But that approach was too slow, because the [indiscernible] was very low. Then we started working on traffic cameras. Why do we use traffic cameras? Because you go to the state, you figure out the data format, and you get a few hundred or a few thousand cameras at once. So that's another strategy. A third strategy is that we actually have agreements with a few companies; they have data online and we get permission to use the data. >>: So what fraction of these are, sort of, 30 frames per second cameras versus once a minute? >> Yung-Hsiang Lu: So the question is: what's the ratio of high frame rate versus low frame rate? I actually don't have a precise number. I would say maybe 5 to 10 percent give us 30 frames per second, and I will show you in a later slide that it depends on where you are talking about. If you have a camera in the United States and you try to get data using a machine in the United States, you may get 30 frames per second. If you have exactly the same camera in Europe and you try to get data using a machine in the US, you don't get 30 frames per second. I will show you that a few slides later. Does that answer your question? Okay, so the majority of the cameras that we have are updated once every several seconds to once every several minutes, and we don't have control over that. So this is our project. You are welcome to register as a user. When I received [indiscernible] from the science foundation in July, the program manager really pushed me hard. I won't give you the number, but he gave me a number of how many users we want to reach before the grant ends. So I hope everybody signs up so we can get our number up. Okay, go to that website and sign up as a user. Our system is not an archive of data. In fact we don't retrieve data regularly; we retrieve data only when you ask us to. Our system is a computing system for doing image analysis at a large scale. When I say large, I will show you an example here. We used 17 Amazon instances; we have also used Microsoft Azure, but in this particular case we used Amazon. Over 24 hours we grabbed 1 image every 5 seconds from 16 thousand cameras worldwide and we got about 7TB of data. We are working on getting 1 billion images over 24 hours. We encountered some limitations in our program because we need to span more than 1 zone in Amazon, but we think we will get there very soon. We are able to grab 200 million images and do some relatively simple analysis called background subtraction. The key here is not the image analysis itself; the key is being able to do image analysis at this scale. Yes? >>: You mentioned something about zones. Were these 17 Amazon instances all inside one zone? >> Yung-Hsiang Lu: In one zone, yes. >>: Why do the zones matter? >> Yung-Hsiang Lu: Because the account we got from Amazon is restricted to 20 instances per zone. >>: Oh, I see. >> Yung-Hsiang Lu: So basically we just need to modify the program so we can grab from multiple zones. We got a research account from Amazon. We also got a research account from Azure; some of the data you will see later comes from Azure. So this is a background subtraction program. As you can see it is relatively simple, but you can replace that simple program with many other things. The purpose is not the image processing itself; the purpose is to run data processing at a very large scale.
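The kind of worker just described, grab one snapshot every few seconds and run background subtraction on it, can be sketched as below. This is a minimal illustration using OpenCV, not the project's actual code; the snapshot URL and interval are placeholders.

```python
# A minimal sketch of a background-subtraction worker: fetch one JPEG snapshot every
# few seconds and count how many pixels differ from the learned background.
import time
import cv2
import numpy as np
import requests

SNAPSHOT_URL = "http://example.com/camera/snapshot.jpg"  # placeholder camera endpoint
INTERVAL_SECONDS = 5

def fetch_frame(url):
    """Download one JPEG snapshot and decode it into an OpenCV image (or None)."""
    data = requests.get(url, timeout=5).content
    return cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)

subtractor = cv2.createBackgroundSubtractorMOG2()  # standard OpenCV background subtractor

while True:
    frame = fetch_frame(SNAPSHOT_URL)
    if frame is not None:
        foreground_mask = subtractor.apply(frame)        # pixels that changed vs. background
        moving_pixels = cv2.countNonZero(foreground_mask)
        print("{}: {} foreground pixels".format(time.strftime("%H:%M:%S"), moving_pixels))
    time.sleep(INTERVAL_SECONDS)
```

Scaling this up means running many such loops in parallel, one per camera, across cloud instances; the analysis inside the loop is the part a user would replace.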
We can also do things like moving object detection or human detection. So let me explain the architecture of our system. Our system, CAM2, Continuous Analysis of Many Cameras, basically has the following components. First, it has a web portal, which you saw earlier, that interfaces with users. Then it has a user database; we need to know who the users are, what they have done and so on, including the programs they write. Then we have a camera database, and as I mentioned there are quite a few cameras; we have about 70,000 cameras in our system right now, and the data goes directly from the cameras to the cloud. It doesn't go through our system, because our system would be a bottleneck that way. A lot of the study here is about the resource manager, because when I started the project the purpose was to build a realistic environment to study resource management in the cloud. That was the original purpose, but I wanted to build a real system so that the workload was real, and that's why we started the project in the first place. So later on you will see quite a few slides that talk about resource management. This shows the distribution of the cameras. The majority are in the United States, simply because it is easier to do web searches, and we also go through the departments of transportation systematically. Purdue University has signed agreements with about 20 states to get the data with their approval. The data is publicly available, but we don't want to just grab it, because some states prefer that we not grab the data from their website. They want us to grab the data from their other servers so that we don't [indiscernible] website too much. Most of the rest are in Western Europe, again because it is easier to do a web search. Some of the students are working on [indiscernible] and so on. So that shows the distribution of the cameras. Let me give a demonstration of one possibility of using those cameras, something we call real-time image-based navigation. This was done by a student and received an award last year. This is a very short video, because in the competition each group only had 7 minutes to present, so he spoke really fast. [Video] >>: You are looking at our mobile application, and as you can see it zooms and operates much the same way as our website application. We will take a look at a few cameras in North America here before jumping over to Europe. This is a brief glimpse at a camera in Lawson. Over in Europe you can see the clustering functionality works much the same way as in the website application, and we will take a look at a camera in the Mediterranean here before we get to the really interesting part of the mobile application. Here at the top you can see me typing in 2 addresses, both in New York, a few blocks apart. What the application is going to do is not only calculate the route between these two addresses, but also show me any cameras in our system between the two addresses. What this means is I can access the video feed from these cameras and take a live look at locations along my route, or I can look at my destination. This might be interesting to me if I would like to see whether there is traffic, or what the day looks like outside, whether there are people walking around. In this look you can see that there is not much traffic and it's a sunny day, so we would expect the same for the other cameras. I could also see, for instance, if there was a crowd outside of a restaurant I might want to visit, or any other sort of interesting data.
The important thing is that I can take a live look at a destination or a place important to me with this application. [End Video] >> Yung-Hsiang Lu: We also noticed that the frame rate there is about 1 frame per second. Any questions so far? Yes, please. >>: Is CAM2 grabbing the images from the cameras directly, or are they sending stuff to their own web server somewhere and you are grabbing them from the web server? >> Yung-Hsiang Lu: Okay, it depends on the source. For some of them we grab the data from the camera directly, and for some of them we grab from their server. CAM2 hides this behind the camera database, and as a user you don't need to know, but if you want to help us add more cameras you do need to know. Does that answer your question? >>: Yes. >> Yung-Hsiang Lu: Okay, yes? >>: If you want to add new cameras, then for my camera to be compatible with the CAM2 system do I need any additional software? >> Yung-Hsiang Lu: So the question is: if you want to add your camera to our system, do you need additional software? The answer is no, because we have a layer to handle the heterogeneity of the cameras, unless your camera is something very strange, and I don't expect that. Our system already handles many different types of cameras: some that have high frame rates, some that have low frame rates; different brands have different requests and there are different ways to grab data. [indiscernible]. When you grab data your query has a special brand-specific path, and we handle that. Does that answer your question? So unless you have a brand new camera that we don't know, if it's a commodity camera we can handle it. We can also grab data from a web server or FTP server, as you asked. For example, Texas told us not to grab from their web server. They wanted us to grab from their FTP server, because I guess their FTP server has more bandwidth. >>: So when the user requests a camera, what is allowed? Basically, do you reroute the data to the user, or do you actually [inaudible]? >> Yung-Hsiang Lu: Okay, so for the mobile demo the data goes to the mobile device directly. It doesn't go through us. >>: [indiscernible]. >> Yung-Hsiang Lu: Yes, so the mobile app communicates with the camera directly, and if you want to do analysis, as I mentioned earlier, the analysis goes to the Amazon instances, and those go to the cameras directly. It doesn't go through our server, because we are very afraid our server would become a performance bottleneck. In fact we are pretty sure our server would not be able to handle this kind of load, because we don't have 17 machines, and those are high-performance machines. >>: You are not storing anything, right? >> Yung-Hsiang Lu: We are not storing unless you ask us to. >>: Unless what? >> Yung-Hsiang Lu: You ask us to store it. >>: So what's the [indiscernible]? >> Yung-Hsiang Lu: In this case we don't store it. We grab the data, we process it and we throw it away, in this particular case, but you can store it. So let me go through a few more slides that may answer some of the questions. As I mentioned, the system was built originally to study resource management for the cloud, so I put a lot of emphasis on cloud resource management. What is the resource management problem? You select a group of cameras; let's say you want to study cameras in New York, and those cameras have specific resolutions. Then you also give us an analysis program.
You may say, "I want to count the number of people," or "I want to detect a particular car." You give us a program and you tell us the frame rate you want. Some of the cameras have a very high frame rate; some cameras have a low frame rate. Of course, for a low frame rate camera we cannot give you more than what the camera gives us, but some cameras have very high frame rates. You can say, "I don't want a high frame rate; even though the camera can give me 30 frames per second, I only want 1 frame per minute." You can get that number. Then we need to determine the cloud instances. What types, how many cores, how much memory? Currently we don't use any special hardware such as GPUs, but we are working on that. Then, where should the cloud instances be, and how many of them? These are the Microsoft Azure locations, and they are not equal. They are not equal in many different ways. One of the reasons is the price; for Microsoft Azure this was updated only yesterday. The United States has the lowest cost. This is D14; it has 16 cores and 112GB of memory. Per hour you spend between 1.5 dollars and 1.9 dollars, so the difference is about 25 percent. For Amazon the difference is much higher, up to almost 50 percent. So if you have a lot of data to analyze, that 50 percent makes a difference. So you may say, "Well, okay, it looks like it's cheaper to do data analysis in the United States. Should we move all the data to the United States?" The answer is no, because, as I mentioned, while location matters, it also depends on your desired frame rate. If the round-trip time between a camera and the cloud instance is long, then your frame rate will drop. In this case we use MJPEG. This is measured, and we really appreciate that Microsoft gave us Azure to use; this data was measured using Azure. All we do is select [indiscernible] cameras whose locations we know, launch instances in different parts of the world, measure the round-trip time, and then measure the frame rate we can achieve. This figure shows 2 types of data: the dots with the yellow [indiscernible] are measured, and the black squares are emulated by injecting delays using an emulator. What we observe here is that as the round-trip time increases, the achievable frame rate drops for MJPEG. What is Motion JPEG? It encodes the video as a sequence of independent JPEG frames. Why do they do that? Because it's easier; it doesn't need to do motion estimation between frames. It is also more robust: if one frame gets corrupted, the damage is only 1 frame. But the disadvantage is that you need more bits for the data stream. Newer cameras support MJPEG and also H.264. With H.264 we observe that the frame rate doesn't drop as the round-trip time increases; however, there are more repeated frames. If your round-trip time gets too long, H.264 still cannot keep up, so it repeats frames. On the surface the frame rate doesn't drop, but in reality it still drops. So I think that answers your question about location. Sorry it took so long to get here. You have to be careful about where you launch your cloud instances, because that can affect the achievable frame rate. >>: So why is it that round-trip time affects it? Is it not bandwidth? >> Yung-Hsiang Lu: Because Motion JPEG is TCP based. Once your round-trip time is long enough, your TCP outstanding window will –. >>: [inaudible]. >> Yung-Hsiang Lu: Yeah, it is waiting for acknowledgment. >>: [inaudible].
>> Yung-Hsiang Lu: H.264, but we still see it, and actually I haven't done enough studies to see why. It is not fully synchronous; it does not send a frame and wait for an acknowledgment. But with TCP, once you reach saturation –. >>: [indiscernible], but H.264 is just streaming –. >> Yung-Hsiang Lu: Yes, but we still see that. >>: [inaudible]. >> Yung-Hsiang Lu: Basically it adjusts, but we observe, and we haven't done enough measurements yet, that in several cases the number of repeated frames increased as the round-trip time increased. >>: Repeated? >> Yung-Hsiang Lu: Repeated frames, right. So it will tell you that you get 30 frames per second, but it will also tell you that these 2 frames are exactly the same, so you actually see jitter in the motion. >>: But do you lose frames or do you just see duplicates? >> Yung-Hsiang Lu: Okay, the question is: do we lose frames or see duplicates? I think we see duplicates. We don't know whether frames are lost, because we are only on the receiving end; we have no control over the cameras. >>: Is there any place where you are getting 30 frames per second? >> Yung-Hsiang Lu: We have a lot, but the specific intention here is to see –. The question here is: should we move all the data to the United States? >>: [indiscernible]. >> Yung-Hsiang Lu: Yeah. >>: [indiscernible]. In the end you are processing the data and that's taking time as well. So if you process it faster you can tolerate some delay and then still [inaudible]. >> Yung-Hsiang Lu: No. So the question is: can we do [indiscernible] of processing and delay? The answer is no, because we simply do not get the data. >>: That's different though. You might get the data, but if you don't process it in time [inaudible]. >> Yung-Hsiang Lu: No, this is measured without processing. This figure is –. >>: Right, but what I'm saying is that [indiscernible]. >> Yung-Hsiang Lu: So this is an upper bound. >>: If your consumption rate is slowing down because you are not processing it, that's equivalent to not getting the data. >> Yung-Hsiang Lu: Right, well, yes, but in this example we grab the data and throw it away; we don't do anything. So this data is the upper bound. This figure is the highest you can achieve; of course, if you do processing [indiscernible]. Does that answer your question? Does that make sense? Okay. So this can be formulated as an optimization problem: you have different cameras. In this case the view has 3 cameras and –. Yes? >>: Yeah, I have a thought. So you said that depending on what analysis somebody wants to do –. Well, this is a question. Do you actually know or understand that this analysis is going to need this kind of instance, to be able to start up a VM based on that up front, or is it something that the consumer or customer who is doing the analysis has to provide? >> Yung-Hsiang Lu: So the question is: when you run a program and we need to determine how many virtual machines to launch, who knows that? It is somewhere in between. We don't know in advance, but we can launch your program, get a few data points, and then extrapolate. That's my paper tomorrow; we will be presenting it in Vancouver tomorrow. You give us a program –. Let me go back to this slide. You give us a program and we run your program. Let's say you want to analyze the data from 100 cameras. We will launch your program and run it on 10 cameras. We will see whether we can meet the frame rate you want. If we can meet the frame rate you want, then we measure the utilization.
Let's say the utilization is 80 percent; then we will say, "It looks like 1 cloud instance is good enough for 10 streams," and we will launch 10 instances. Does that make sense? Of course that measurement can change, because your program's behavior may change and the content may change. Then we observe and adapt. Maybe later on we find you need 12, so we launch more and reduce [indiscernible]. Then sometime later maybe you need only 8, so we consolidate. That's exactly my talk for tomorrow in Vancouver. Okay, I am going to skip this slide. Then, what frame rate do you need? It depends on the study you want to do. If you want to study motion, you probably need a high frame rate. If you want to study, let's say, weather, you probably don't need a very high frame rate. So let me walk you through a few screenshots of our system to give you an idea of how it can be useful. This is an example from Los Angeles, because a few weeks ago I gave a talk at UCLA, so I used Los Angeles as an example. As I said, we don't continuously save the data; we grab the data when you ask us to. So after you log in you can go to our system and select by location. You go to Los Angeles and then you select some cameras. As you can see, there are several hundred cameras here. After you select some cameras, you can also select cameras by recent snapshot. We grab one frame from each camera every 24 hours; we just have a program that rotates through all the cameras. We made it very slow; we don't want to jam the network. So we grab approximately 1 frame, actually I believe it's 1 week, not 1 day, so we grab one frame per camera every so many minutes. You can also select by snapshot. Then you can see here how many cameras you have selected and how long you want to run your program. You may want to run your program for only a few seconds, or maybe a few hours, a few days, or maybe longer. You can also ask, "What's the interval between frames?" If you are a regular user we currently allow you to get 1 frame per second; that's the highest frame rate you can get, but if you are working with us you have a backdoor and can get a higher frame rate. Why once per second? Because we haven't figured out the right answer; we simply set that limit. You can also specify how many frames you want to keep. Suppose you are doing some kind of motion detection; you may want to say, "I want 10 frames." Then we will give you a running window: if we get more than 10 frames we will drop the oldest, the 11th frame, and keep only the latest 10 frames. So you can specify that number. Then you can upload your program. Sorry it is taking so long up here. You can upload a program you write, or you can use a program we give you; we have written more than a dozen examples. These programs are written in Python right now. We are planning to extend to other languages, but we have not yet; we need more students, as always. You can use our existing programs as samples and modify them. Basically, your program will most likely be something like a loop: you grab one frame, do some analysis, grab another frame, and do another analysis. The only thing you need to change is to turn that while loop or for loop into our event-driven functions. We have a paper published this year in [indiscernible] that actually gives an example, but once you log in as a user you will see all the examples that we give you.
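The event-driven structure just mentioned can be sketched as follows. The class and method names here (Analyzer, on_new_frame, and so on) are hypothetical stand-ins rather than the actual CAM2 API, which is shown in the sample programs you see after logging in; the point is only that the system, not the user's own loop, delivers the frames.

```python
# A sketch of an event-driven analysis module: instead of writing a while/for loop that
# grabs frames, the user overrides callbacks that the system invokes for each frame.
# The base-class name and method names are hypothetical stand-ins for the real API.
import cv2

class Analyzer:
    """Hypothetical base class; the real system provides something similar."""
    def initialize(self):
        pass
    def on_new_frame(self, frame):
        pass
    def finalize(self):
        pass

class MotionCounter(Analyzer):
    """Counts foreground pixels in every frame the system delivers."""

    def initialize(self):
        # Called once per camera stream, before any frames arrive.
        self.subtractor = cv2.createBackgroundSubtractorMOG2()
        self.counts = []

    def on_new_frame(self, frame):
        # Called by the system for each new frame; this replaces the user's grab loop.
        mask = self.subtractor.apply(frame)
        self.counts.append(cv2.countNonZero(mask))

    def finalize(self):
        # Called once when the requested duration ends.
        if self.counts:
            print("average foreground pixels per frame:", sum(self.counts) / len(self.counts))
```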
Then we decide which cloud instances to launch, and that's a problem we haven't fully solved yet. We have made some progress, but we still have a lot of problems to solve. So which cloud instance do we need to launch? How many cores? Some programs are more computation intensive; some programs need more memory. In fact, this example shows that this is a very complex relationship, and it's not a linear relationship. Now let me explain what this figure shows. We show the utilization of the processor and the memory at two different frame rates. On the left side we have 1 frame every 5 seconds; on the right side we have 10 frames per second. You can imagine that with a very high frame rate, on the right side, it is processor intensive. On the left side, even at a very low frame rate, some programs are processor intensive, for example human detection, and some other programs, such as motion estimation, are bounded by memory. There are 4 different programs here. Image archiving simply grabs the image and saves it; it does not do any processing. Motion estimation takes 2 adjacent frames and asks how much change has happened. Moving object detection does motion estimation and then does a [indiscernible], so it's a little bit more complex. Then human detection is the most complex, because it uses a histogram of oriented gradients to do the detection. Comparing the left side and the right side, you can see that for human detection on this particular instance, an m3.xlarge, we cannot do human detection at 10 frames per second. So that answers your question: if your program is too complex, computation is the bottleneck. On the left side, if your frame rate is low enough, we can do human detection. This is per-stream utilization. So if you want to do human detection using this particular instance, you can handle approximately 30 streams, maybe 40 streams. The processor utilization is about 2.6 percent per stream, so you can process about 30-something streams before your utilization reaches 100 percent. Does that answer your question from earlier? >>: Yes. >> Yung-Hsiang Lu: Okay, and I'm going to skip this. This simply says that there is no simple answer. It is a very complex problem, and we are trying to build an empirical model so we don't have to do this for every type of instance. This slide also shows that if you make a wrong decision and choose the wrong instance, you may pay more than twice as much. This data shows the price per million images. If the bars are fairly uniform, that means you pay the same price on different types of instances, but on the right side, at 10 frames per second, the tallest bar is almost twice as high as the shortest bar. That means you pay almost twice the cost. We want to do that optimization so we choose the right instance. It depends on whether your program is computation intensive or memory intensive, and some other factors. Earlier I mentioned our project is called Continuous Analysis of Many Cameras. This is an example where we count people for 24 hours, and over 24 hours we actually see the number of people go up and down at different times. It shows very clearly that at night nobody is waiting at the bus stop or even passing through. We do this at 1 frame every 10 seconds and then take a running average every minute; that's how we get a number. So if you want to do this kind of analysis over 24 hours, you can do it very easily by going to our system. As you can see, in the middle it specifies the duration; if you specify a whole day, you can do that.
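The extrapolation logic described a few slides back can be written as a small back-of-envelope calculation using the figures mentioned in this talk (80 percent utilization measured on 10 streams, and roughly 2.6 percent of a processor per stream for human detection). The headroom factor is an illustrative assumption, not a number from the project.

```python
# A back-of-envelope sketch of the "measure on a few cameras, then extrapolate" idea.
# The measurement itself is replaced by constants taken from the talk; the 0.9 headroom
# factor is an assumption added here so an instance is not planned at exactly 100 percent.

def instances_needed(total_streams, sample_streams, sample_utilization, headroom=0.9):
    """Estimate how many identical cloud instances are needed for all streams."""
    per_stream = sample_utilization / sample_streams          # e.g. 0.80 / 10 = 0.08 per stream
    streams_per_instance = int(headroom / per_stream)          # leave some headroom per instance
    return -(-total_streams // streams_per_instance)           # ceiling division

# 80 percent utilization on a 10-camera sample, 100 cameras requested in total.
print(instances_needed(total_streams=100, sample_streams=10, sample_utilization=0.80))  # prints 10

# Human detection at about 2.6 percent per stream: one instance handles roughly 30-40 streams.
print(int(0.9 / 0.026))  # about 34 streams per instance
```

The same calculation is rerun as the measured utilization drifts, which is the observe-and-adapt step mentioned earlier: launch more instances when the estimate goes up, consolidate when it goes down.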
>>: [inaudible]. >> Yung-Hsiang Lu: Who is paying for the computation? Right now Microsoft and Amazon, because we get research credits. Sometime in the future maybe somebody else needs to pay. Also, our system is modular enough that we can switch the back end to your machines. Actually, when we ran out of the research credit, it got renewed, but when we ran out we simply switched the back end to our lab. We cannot do very large-scale processing there, but we can do that. A related problem is: what happens to the data that you do not save? That's a very big question and I don't have a very good answer yet. At a cloud computing and big data meeting a few weeks ago I presented some very initial ideas about the problem, but I think there are still a lot of problems to solve, so I won't take time on it here. So far what I have talked about is real-time data: you get the data and you process the data. In a lot of cases you have the data [indiscernible], so maybe you save it because you want to process it again and again. So you save the data and you want to process it offline, not in real time. We also studied how to do this using spot instances. You can think about paying for a computer in several different ways. On demand, you pay by the hour; it's like a hotel room. You can sign a long-term contract; it's like renting an apartment. Or you can use spot instances by bidding a price; it's like priceline.com. You give a price and you may get a hotel room or you may not. However, a spot instance is different from Priceline, because after you check into the hotel room they may still kick you out if the market price goes above your bidding price. And when that happens, whatever you were doing is lost; your intermediate results are lost. So what we do is create periodic checkpoints, so that when we get kicked off the spot instance we can later resume from the checkpoints. And we found you can vary your bidding strategy. If your bidding price is about half the on-demand price, meaning that if the hotel room costs 200 dollars your bidding price is 100 dollars, you have a very good chance of getting it. We found we have only about 5 percent performance degradation, which covers the cases where we don't get an instance or we get kicked out; that delay is only about 5 percent. But we can save about 85 percent of the cost. The reason is that the bidding price is the highest price you are willing to pay; what the cloud actually charges you is no more than your bidding price. So you may say, "I bid 100 dollars," but you may get it at 70 dollars. That's the reason you can save up to 85 percent of the cost with only 5 percent performance degradation. That can save a lot; particularly if you have a huge amount of data, the savings are significant.
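The periodic-checkpoint strategy just described can be sketched as below. The checkpoint file name, the interval, and the placeholder "analysis" are illustrative assumptions rather than the project's actual code; the point is only that intermediate results are saved often enough that losing a spot instance costs very little work.

```python
# A minimal sketch of checkpointed offline processing on a spot instance: progress is
# saved periodically so the job can resume from the last checkpoint after the instance
# is reclaimed. The file name, interval, and the "analysis" step are placeholders.
import json
import os

CHECKPOINT_FILE = "checkpoint.json"   # placeholder path (would live on durable storage)
CHECKPOINT_EVERY = 1000               # save after every 1000 processed images

def load_checkpoint():
    """Resume from the last checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"next_index": 0, "partial_result": 0}

def save_checkpoint(state):
    # Write to a temporary file and rename, so a reclaimed instance never leaves a
    # half-written checkpoint behind.
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)

def process_image(index):
    """Placeholder for the real offline analysis of one stored image."""
    return index % 2

state = load_checkpoint()
total_images = 100000   # illustrative; in practice this would be a very large batch
for i in range(state["next_index"], total_images):
    state["partial_result"] += process_image(i)
    state["next_index"] = i + 1
    if state["next_index"] % CHECKPOINT_EVERY == 0:
        save_checkpoint(state)   # cheap insurance against spot-instance termination
print("result:", state["partial_result"])
```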
I think I have told you about quite a few interesting problems. I will say that what we have done is only a very small portion of what can be done; I call it the tip of the iceberg. The more studies we do and the more papers we write, the more problems we discover. And the more people we talk to, the more people we want to talk to. As I mentioned earlier, I have talked to people doing psychology, forest research, and transportation research, but there are still many problems. One problem is the metadata; currently there is no easy way to get metadata. We don't know where the cameras are. We don't know the frame rate. There is no easy way to get it. We don't know what [indiscernible]. Is it indoor or outdoor? Can you see people? There are still a lot of things we need to work on. We are working on some automatic methods to generate metadata. Another problem is that there is simply no standard way of getting the data. Different brands of cameras have different methods, and different states have different ways of giving you the data. It all shows up on websites for humans to see, but it's not designed for machines to grab. In the end, what we want to do is use the system to understand the world. So obviously there is a lot of opportunity for vision research. One colleague works on machine learning and asked me to give him some data. It was as simple as asking, "How much do you want?" He said, "How much can you give me?" So I gave him 100,000 images from New York City at 1 frame per second from 12 traffic cameras over 4 hours. He said, "Well, that's good enough for a while." He was trying to see how he could detect the same car moving across several different intersections. I have talked about resource allocation; that is the original problem behind building the system. And privacy always comes up. When I was in Australia I found this very nice sign, so I took a picture of it; it tells you that you are being recorded. Our project uses only publicly available data and we have been cleared by the Purdue legal department, so we don't have any privacy concerns, but privacy is always a question. I have done some studies about the legal status of these cameras; my understanding is that this still needs to be worked out. This morning, when I was at the airport waiting for my flight, I saw that several cities decided to give each police officer a camera because of some recent incidents. So what's the privacy issue there? If the police have body cameras, who owns the data? I think that still needs to be worked out, and I am not an expert on that. >>: [indiscernible] because you started by saying you are just tapping into public feeds, right? >> Yung-Hsiang Lu: Yes. >>: So basically you bypass those issues. >> Yung-Hsiang Lu: We only use public data. >>: You are not the ones deploying cameras? >> Yung-Hsiang Lu: We are not the ones deploying cameras. >>: Someone else is deploying them, someone else is grabbing the data, and they are publicly available, so there is not really that much of an issue there. >> Yung-Hsiang Lu: Correct, correct, but when we were looking for cameras we also found some cameras that were publicly available but that we don't want to use in our system. For example, we found some cameras that appear to be looking into somebody's living room or something. Why are they public? I don't know, but we found them and we don't want them, because we don't want to deal with that. This is a big project and it is also a very good experience for a lot of students, many of them honors students. This is, I think, for probably all of them, the biggest project they have worked on. It is a very big team for them to have that experience, and I am also very fortunate to have [indiscernible], and of course Microsoft gave us the research credit to use as well; that's very helpful for our studies. To conclude, I hope I have given you the idea that network cameras can be very useful for a lot of studies. On Wednesday, at the cloud [indiscernible] in Vancouver, there will be a panel discussion about network cameras and cloud computing; I am one of the panelists. I think there are still many, many challenges.
From the many questions here I can see there is a lot of interest, and we have answered only a very small fraction of the questions, but I hope I have given you the idea that we are building something useful. It is an open system for people to use. You can register as a user, and if you want to get the source code we can also work out an agreement. We have already signed agreements with a few universities about sharing the source code, and our legal department will be able to work out the details. As I mentioned, our system is not an archive of data. It is a computing platform for you to run programs, and we do everything for you, in particular allocating cloud resources. With this, I want to thank you for your attention. [Applause] >>: So these programs that you can write to analyze, do they have to be in some format? Do you have some APIs? >> Yung-Hsiang Lu: We have an API. Basically, your program needs to include a few modules, and you have to create a subclass of a specific class we provide and then [indiscernible] object. We handle the majority of the work in the base class, but you have to override a few methods for your [indiscernible]. >>: But the quantum is just 1 frame? >> Yung-Hsiang Lu: Currently we do 1 frame. >>: [inaudible]. >> Yung-Hsiang Lu: Yes, but you can get the past frames. So you can say, "I want the past 10 frames," and then you can get, say, the last 3 frames; you decide what you want to do. And we use a running window. Let's say you want to keep 10 frames: we just keep the 10 latest frames and the oldest one gets dropped. But you can specify the number, 10, or 12, or whatever. >>: [indiscernible]. >> Yung-Hsiang Lu: What we want to do is this: first, we want to use this system to do some interesting studies. I mentioned psychology, civil infrastructure, transportation, forests and so on. And we want people to use this system for machine learning, because we have a huge amount of data for them to use. Let me talk about machine learning a bit more, because it seems to be a hot topic these days. For example, we know the precise locations of some cameras, for example in New York City; they give us the coordinates of each intersection, and we can map them onto, for example, a bus route. So you know a bus will go from one particular intersection to another particular intersection to another, and then you can use that meta-information to train your bus detection program. And remember, the bus may not be in the same lane, and even in the same lane the camera angle may be different. So this becomes a very rich resource for doing all kinds of studies, and we want to provide that infrastructure. And I mentioned the psychology study; we want people to use this data to do non-intrusive observation of people worldwide. Does that answer your question? >>: Yes. >> Yung-Hsiang Lu: I am also the organizer of a low-power image recognition competition, so this can also be used as a source of data. I have not combined that with this project yet; so far the two projects are still separate, but this can be a source of data to use. As many of you may know, good data is often the foundation of interesting research, and we are building a system to provide good data. Does that make sense? >>: Sure. >> Yung-Hsiang Lu: Okay. >>: Sure, it makes sense. >> Yung-Hsiang Lu: Other questions? >> Philip A. Chou: All right. Thanks very much. >> Yung-Hsiang Lu: Thank you very much. [Applause]