>> Philip A. Chou: So I am very pleased to introduce Yung-Hsiang Lu. We have been working together as technical co-chairs of the ICME and Multimedia conferences next year in Seattle, but most of his time he spends as an associate professor in the School of Electrical and Computer Engineering at Purdue. Today he will share with us his passion for cameras everywhere.
>> Yung-Hsiang Lu: Okay, thank you very much. Thank you for giving me the chance to come here and talk. I was here before: I was in the room next door last year as a speaker in the eScience in the Cloud workshop, and that was me in the middle. I remember another room similar to this where the talk was captured, and it really was very nice. At that time I told some people, “I will be back,” and here I am.
Today I want to share with you some of the progress we have made in the last 1.5 years. I also want to discuss some ideas. It is always good to hear a few different views, and we are actually building a system around some of them. So you are welcome to be users and also collaborators. Actually, Jerry and I have already discussed some ideas. Please feel free to interrupt if you want to say anything.
So I started the project almost 4 years ago. How many of you are gold members of any airline or
have been gold members? It looks like Microsoft doesn’t send you to too many places.
>>: I am a diamond.
>> Yung-Hsiang Lu: Oh, that’s much better. So as we go to conferences, sometimes I feel that it’s great to go to different places and learn about the culture, the people and so on, but sometimes we encounter problems: lost baggage, delayed flights, cancelled flights, overcrowded trains and so on. So I asked the question, “Can I see the world at my desk without going anywhere?” Of course I don’t get the experience of the food or the people, but if I just want to take a look at the famous architecture, maybe that’s good enough. So that was one of the motivations of the project.
Now I know some of you will ask the question, “Doesn’t ‘street view’ solve the problem?” Let me give you a story about myself. I took a sabbatical in Singapore and after that went to Singapore again twice for conferences. Some of you may have been to Singapore; you know it’s a very expensive place and you can easily spend a few hundred dollars for a decent hotel. So I was looking for a hotel and I found one that looked reasonably nice. It’s a 4-star hotel and that’s the street view. So it looked like a good deal.
So I booked that hotel, but when I arrived the taxi driver couldn’t even find it, because the road was closed for MRT construction. If you come from the United States to Singapore, you know you always land between 11:50 pm and 1:00 am, because of the time difference. So it was very late and it was hard to find the hotel. I finally found it, and the next day I tried to catch a little bit of sleep, and the construction started at 7:00 am. So I was thinking, “Okay, that doesn’t seem like a very good decision.” I looked at the street view and I made a decision based on that information, but in reality the information was obsolete.
So this is a picture I took just a few days ago. You probably recognize this intersection, right? It should be just outside here, and it actually tells you it was captured more than a year ago. This is another example, outside my office at Purdue University. This is Northwestern Ave, and there has been a building there for 3 years; the construction started in 2013. The building has been [indiscernible] and it tells us the image was captured 6 years ago. So the problem here is that obsolete data may make us make wrong decisions. Suppose I tell my friend, “Okay, if you see the parking garage, my building is next to the parking garage.” Then my friend would say, “Well, I looked at the street view and the building doesn’t exist. I see the garage, but the building doesn’t exist,” because of the obsolete data.
So as a researcher I ask the following question: “Is there a problem?” The problem is that obsolete data leads us to make wrong decisions. So I want to have real-time data, and I see there are opportunities for research. So the question is: millions of images and video clips, many of them continuous videos, are online. What can we do with the data, and what difference does real-time data versus obsolete data make? I already gave you a few examples of how obsolete data can cause problems. Yes, please?
>>: To that point, how often are the street view images updated, if at all?
>> Yung-Hsiang Lu: So the question is: How often is the street data updated? It depends on the source. In some places, such as California, the highway data is streaming video. I didn’t calculate the frame rate, but it is many frames per second. I will show you a few examples of the data in New York City; that’s about one frame per second. And for Colorado, I am talking about traffic cameras right now, it is about once every two minutes. So it varies.
>>: No, no, but the street view thing that you said.
>> Yung-Hsiang Lu: The street view, okay. So I will talk about street view. Sorry, I misunderstood the question. Actually, different companies have different update rates. This is from Google. They actually have a map showing you how old the data is. West Lafayette is not a very big city, so I guess it’s updated very rarely. Actually, this was updated at least once, because when I started the project the data was from before 2009, but then for some reason they updated it in 2009. Some bigger cities are updated more often, but I don’t have a very precise definition of how often they update. Does that answer your question?
>>: Yes.
>> Yung-Hsiang Lu: Okay. So we started the idea of using network cameras. The idea is the following. The network camera market has been studied by a company called [indiscernible]. They estimate about 20 million network cameras are installed per year, with about a 20 percent rate of increase each year. These network cameras can stream data 24 hours a day. In my project we deal only with public data, meaning there is no password, for obvious legal reasons; we don’t want to deal with anything involving private data. And this may reveal a lot of information about the world. You can also think about mobile data, such as from phones or dash cams; if they are connected to a network we can also call them network cameras. Today you can buy a dash camera with WiFi capability for about $200. Now let me show you a few examples of what a network camera can show you. Yes?
>>: Can I ask, when you analyze in real-time, who is doing the analysis and what sort of analysis?
>> Yung-Hsiang Lu: [indiscernible] has written analyses, but we also make our system available to other people to do analysis. What kind of analysis? For example, you can count people, you can count cars, you can do these types of things.
>>: So when you say “you,” it means the customer of your [inaudible].
>> Yung-Hsiang Lu: Our system, yes.
>>: So they have to write their own code or something to do these things?
>> Yung-Hsiang Lu: Yes, we give you some samples and then you modify the sample programs to do what you want to do.
Let me show you a few examples of some of the newer data. This was captured, as you can see, about 1.5 months ago, October 7, 2015. If you search for Panda Cam you will find one of the examples here. This is one at the national zoo in Washington DC, and they have 4 panda cameras. I don’t know who is doing the tracking, whether it is by computer or a person, but there is another room where you can actually see the panda. So imagine that you can do a study of animal behavior without going to a zoo. That opens up a new paradigm of doing research.
>>: I’m sorry I just wanted to hear; you said you don’t know who is doing the tracking.
>> Yung-Hsiang Lu: I do not know who is doing the tracking.
>>: But something is happening there that when he moves to a different room the camera
switches to that?
>> Yung-Hsiang Lu: Yeah.
>>: The camera is moving.
>> Yung-Hsiang Lu: The camera is moving.
>>: No, but when he goes to a different room –.
>> Yung-Hsiang Lu: When he goes to a different room it also changes. I don’t know if it’s tracked by computers or if somebody is controlling it, but you can see it. If you have a computer right now, search for “Panda Cam national zoo” and you will be able to see it.
So this is one example. Let me show you another example. This example has a lower refresh rate, so you see people kind of jumping, but you can use this example to study –. This is actually from Romania; you can see that from the URL. You can use this to study human behavior. Actually, I am working with several professors in psychology, and they want to observe human behavior non-intrusively and also across different cultural environments. You can imagine you can sit here in the United States and observe people’s behavior in Romania.
For example, do they come as groups? When do they come? How many of them come at a particular time? Do they bring their children? Can you tell their age group? That has significant impact on how you design your marketing strategy, because maybe in the morning the age group will be different than in the evening. We actually have talked to a few people near Purdue, some store owners, and they say they do see changes in demographics and they want to use that information to improve their marketing strategy.
This is another example, and as I mentioned earlier, I want to see the world. This is Yellowstone. If you have been to Yellowstone, how many of you know where this is? You know the name of that?
>>: [inaudible].
>> Yung-Hsiang Lu: That’s Old Faithful. This was captured in October, and now it may be covered by snow. I watched it in February and saw it covered by snow, but I still saw people sitting there, waiting for Old Faithful, and I was really impressed. As you can see, these are all examples where you can potentially use the data without going there.
Now let me talk about the psychology study again. So, several professors want to use this data to
do a worldwide study. The hypothesis is that people with different cultures behave differently.
You can do this study without sending graduate students to 10 different countries. Of course
they want to go, but we don’t have the funding.
This is at Washington State, and these were captured a few days ago, as you can see, on November 28. If you want to see whether a particular highway is congested or whether there is a long line, you can. Just this morning a professor sent me an e-mail, I haven’t read the details yet, saying that a few days ago they announced they have 30,000 traffic cameras [indiscernible] now, and I already mentioned some of it. So as you can see, this is an example of the same shopping mall. You can look at the time, it is a bit small, but you can look at the time and study how the customers may change.
This is a volcano eruption in Hawaii. It was captured by the national park camera. Let me show it here. It is at 15-minute intervals. This was captured on January 24, 2014. So you see the volcano eruption. Of course, that is a very active volcano, so it happens very often. Nobody was there to do the recording, but you can still watch it.
So I think I gave you enough examples of those events. Also, last year we did a study; this used the traffic cameras in New York City. The left side of the figure shows the parade route, the Thanksgiving Parade, and the right side shows the locations of the traffic cameras. We captured the parade on that day by selecting specific cameras. I would claim that using network cameras, first, you can see the parade without going there, and second, it is even better than going there. If you have been to any parade like this, either you need to go very early or the only thing you see is somebody’s head if that person is taller. And even if you are tall enough and you stand there, you can only see one place, right? You cannot physically be in multiple places at the same time. But here we have 4 different angles of seeing the parade. There are actually quite a few cameras on the whole route, so you can see them.
I hope that gives you enough ideas about the possibility of using network cameras to do all kinds of things. This shows air quality in Washington DC. I was at [indiscernible] in Australia a few weeks ago and I noticed the taxis there also have cameras on the side. I am not sure what the purpose is, but I noticed that. This is another study by a research fellow at the Australian National University. They have a camera looking at the forest to see how the trees change; it is using solar power and he is happy to share the data. So now you can imagine you can do a study of a forest without going to Australia. I am going to skip that.
We are not the first project doing this. Actually, some people in St. Louis have been doing this for quite a few years, but there are quite a few differences. First they give you data, second the data has a very [indiscernible]. Over almost 10 years, from 2006 to now, they got 800 million images, and in a field study I can tell you we can get 800 million images in a day because of [indiscernible]. So that is the quantity I can give you.
So now let me introduce the project we are working on. It is called CAM2: Continuous Analysis
of Many Cameras. Yes?
>>: So just from the examples that you gave, how did you get the list of the cameras that are publicly available?
>> Yung-Hsiang Lu: By searching the web.
>>: Okay.
>> Yung-Hsiang Lu: So the question is: How do we find the cameras? We do a web search. We went through different strategies. First, our project started by scanning the IP addresses of the Purdue network, with the permission of Purdue’s network security. They said, “Okay, you know how to do it; go ahead and do it.” We found a few dozen cameras that way. Different brands of cameras have a specific signature, so we send a query, and if a device responds in a particular format we record it as a camera. So we started by doing that.
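To make that signature-probing idea concrete, here is a minimal sketch in Python. The probe paths and the address are hypothetical examples (one resembles an Axis-style CGI path), not the actual CAM2 scanner; the multipart content type is the kind of MJPEG signature such a probe could look for, and of course this should only ever run with permission, as described above.

```python
import requests

# Hypothetical brand-specific probe paths, for illustration only.
CANDIDATE_PATHS = ["/video.mjpg", "/axis-cgi/mjpg/video.cgi"]

def looks_like_camera(ip):
    """Return True if any probe path answers with an MJPEG stream."""
    for path in CANDIDATE_PATHS:
        try:
            r = requests.get("http://%s%s" % (ip, path),
                             timeout=2, stream=True)
        except requests.RequestException:
            continue
        content_type = r.headers.get("Content-Type", "")
        r.close()
        # MJPEG streams announce themselves with this multipart type.
        if "multipart/x-mixed-replace" in content_type:
            return True
    return False

print(looks_like_camera("192.0.2.1"))  # documentation-only test address
```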
Then we also scanned a few universities’ IP addresses. We found a few hundred cameras that way. One university complained; they noticed we were scanning their IP addresses. But that was too slow, because the [indiscernible] was very low. Then we started working on traffic cameras. Why do we use traffic cameras? Because you go to the state, you figure out the data format, and you get a few hundred or a few thousand at once. So that’s another strategy. A third strategy is that we actually have agreements with a few companies; they have data online and we get permission to get the data.
>>: So what fraction of these are sort of 30 frames per second cameras versus once a minute?
>> Yung-Hsiang Lu: So the question is: What’s the ratio of high frame rate versus low frame rate? I actually don’t have a precise number. I would say maybe 5 to 10 percent give us 30 frames per second, and I will show you in a later slide that it depends on which location you are talking about. If you have a camera in the United States and you try to get data using a machine in the United States, you may get 30 frames per second. If you have exactly the same camera in Europe and you try to get data using the machine in the US, you don’t get 30 frames per second. I will show you that a few slides later. Does that answer your question?
Okay, so the majority of cameras that we have are updated once every several seconds to once every several minutes, and we don’t have control over that. So this is our project. You are welcome to register as a user. When I received the [indiscernible] Science Foundation grant in July, the program manager really pushed me hard. I won’t give you the number, but he gave me a number of how many users we want to target before the grant ends. So I hope everybody signs up so we can get our number up. Okay, go to that website and sign up as a user.
Our system is not an archive of data. In fact, we don’t retrieve data regularly. We retrieve data only when you ask us to retrieve data. Our system is a computing system for doing image analysis at a large scale. When I say large, I will show you an example here. We used 17 Amazon instances; we have also used Microsoft Azure, but in this case we used Amazon. Over 24 hours we grabbed 1 image every 5 seconds from 16 thousand cameras worldwide, and we got about 7TB of data. We are working on getting 1 billion images over 24 hours. We encountered some limitations in our program because we need to span more than 1 zone in Amazon, but we think we will get there very soon. We are able to grab 200 million images and do some relatively simple analysis we call background subtraction. The key here is not the image analysis itself. The key here is being able to do image analysis at this scale. Yes?
>>: You mentioned something about zones. Were these 17 Amazon instances all inside one
zone?
>> Yung-Hsiang Lu: In one zone, yes.
>>: Why do the zones matter?
>> Yung-Hsiang Lu: Because the account we got from Amazon is restricted to 20 instances per zone.
>>: Oh I see.
>> Yung-Hsiang Lu: So basically we just need to modify the program so we can get instances from multiple zones. We got a research account from Amazon. We also got a research account from Azure. Some of the data you see later will come from Azure.
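Coming back to the scale of that 24-hour experiment, a quick back-of-envelope check, using only the numbers quoted in the talk, shows how the pieces fit; the per-image size is derived here, not a measured system figure.

```python
cameras = 16000
frames_per_camera = 24 * 60 * 60 // 5   # one image every 5 seconds
requests_per_day = cameras * frames_per_camera
print(requests_per_day)                 # 276,480,000 image requests in theory

data_bytes = 7e12                       # about 7 TB collected
# About 200 million images were actually retrieved, per the talk,
# so the average image is on the order of tens of kilobytes.
print(data_bytes / 200e6)               # ~35,000 bytes per image
```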
So this is a background subtraction program. As you can see, it is relatively simple, but you can replace that simple program with many other things. So the purpose is not the image processing itself; the purpose is to run data processing at a very large scale. We can also do things like moving object detection or human detection.
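As a rough illustration of the kind of per-frame analysis a user program might run, here is a minimal background subtraction sketch in Python with OpenCV. The stream URL is a placeholder and the whole sketch is an illustration, not the actual CAM2 sample program; in CAM2, frames would arrive through the system’s API rather than by opening the stream yourself.

```python
import cv2

# Hypothetical MJPEG stream URL, for illustration only.
stream = cv2.VideoCapture("http://example.com/camera.mjpg")
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = stream.read()
    if not ok:
        break
    # Pixels that differ from the learned background show up in the mask.
    mask = subtractor.apply(frame)
    print("moving pixels:", cv2.countNonZero(mask))
```

So let me explain the architecture of our system.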
Our system, called CAM2, Continuous Analysis of Many Cameras, basically has the following components. First, it has the web portal that you saw earlier, which interfaces with users. Then it has a user database; we need to know who the users are, what they have done and so on, including the programs they write. Then we have a camera database, and as I mentioned there are quite a few cameras; we have about 70,000 cameras right now in our system. The data goes directly from the camera to the cloud. It doesn’t go through our system, because our system would be a bottleneck that way.
A lot of the study here is about resource management, because when I started the project the purpose was to build a realistic environment to study resource management in the cloud. That was the original purpose. But I wanted to build a real system so that the workload was real, and that’s why we started the project in the first place. So later on you will see quite a few slides that talk about resource management.
This shows the distribution of the cameras. The majority are in the United States, simply because it is easier to do a web search, and we also went through the departments of transportation systematically. Purdue University has signed agreements with about 20 states to get the data with their approval. The data is public, but we don’t want to just grab it, because some states prefer us not to grab the data from their websites. They want us to grab the data from their other servers so that we don’t [indiscernible] the website too much. Then the rest are mostly in western Europe, again because it is easier to do a web search. Some of the students are working on [indiscernible] and so on. So that shows the distribution of the cameras.
Let me give a demonstration of one possibility of using those cameras to do something we call real-time image-based navigation. This was done by a student and received an award last year. This is a very short video because in our competition each group only has 7 minutes to present. So he spoke really fast.
[Video]
>>: You are looking at our mobile application, and as you can see it zooms and operates much the same way as our website application. We will take a look at a few cameras in North America here before jumping over to Europe. This is a brief glimpse at a camera in Lawson. Over in Europe you can see the clustering functionality works much the same way as in the website application, and we will take a look at a camera in the Mediterranean here before we get to the really interesting part of the mobile application.
Here at the top you can see me typing in 2 addresses, both in New York, a few blocks apart.
What the application is going to do is not only calculate the route between these two addresses,
but it is also going to show me any cameras in our system between the two addresses. So what
this means is I can access the video feed from these cameras and take a live look at locations
along my route or I can look at my destination.
So this might be interesting to me if I would like to see if there is traffic perhaps or what the day
looks like outside, if there are people walking around. In this look you can see that there is not
much traffic, it’s a sunny day. So we would expect the same for the other cameras. I could also
see for instance if there was a crowd outside of a restaurant I might want to visit or any other sort
of interesting data. The important thing is that I can take a live look at a destination or place
important to me with this application.
[End Video]
>> Yung-Hsiang Lu: We also notice that the frame rate there is about 1 frame a second. Any
questions so far? Yes, please.
>>: Is CAM2 grabbing the images from the cameras directly or they are sending stuff into their
own web server somewhere and you are grabbing them from the web server?
>> Yung-Hsiang Lu: Okay, it depends on the source. Some of them we grab the data from the camera directly, and some of them we grab from their server. CAM2 hides this behind the camera database, so as a user you don’t need to know, but if you want to help us add more cameras you do need to know. Does that answer your question?
>>: Yes.
>> Yung-Hsiang Lu: Okay, yes?
>>: If I have to add new cameras, then for them to be compatible with the CAM2 system, do I need any additional software?
>> Yung-Hsiang Lu: So the question is: If you want to add your camera to our system, do you need additional software? The answer is no, because we have a layer to handle the heterogeneity of the cameras, unless your camera is something very strange, and I don’t expect that. Our system already handles many different types of cameras: some that have high frame rates, some that have low frame rates; different brands have different request formats. There are different ways to grab data. [indiscernible]. When you grab data, your query has a special brand-specific path and we can handle that. Does that answer your question? So unless you have a brand-new camera that we don’t know, if it’s a commodity camera we can handle it. We can also grab data from a web server or FTP server, as you asked. For example, Texas told us not to grab from their web server. They wanted us to grab from their FTP server, because I guess their FTP server has more bandwidth.
>>: So when the user requests a camera, what is allowed? Basically, do you reroute the data to the user or do you actually [inaudible]?
>> Yung-Hsiang Lu: Okay, so for the mobile demo the data goes to the mobile device directly.
It doesn’t go through us.
>>: [indiscernible].
>> Yung-Hsiang Lu: Yes, the mobile app communicates with the camera directly, and if you want to do analysis, as I mentioned earlier, the analysis goes to the Amazon instances, and those go to the cameras directly. It doesn’t go through our server, because we are very afraid our server would become a performance bottleneck. In fact, we are pretty sure our server would not be able to handle this kind of load, because we don’t have 17 machines, and these are the highest performance machines.
>>: You are not storing anything, right?
>> Yung-Hsiang Lu: We are not storing unless you ask us to.
>>: Unless what?
>> Yung-Hsiang Lu: You ask us to store it.
>>: So what’s the [indiscernible]?
>> Yung-Hsiang Lu: In this case we don’t store it. We grab the data, we process it and we throw
it away, in this particular case, but you can store it. So let me go through a few more slides that
may answer some of the questions.
So as I mentioned, the system was built originally to study resource management for the cloud, so I put a lot of emphasis on cloud resource management. What’s the resource management problem? You select a group of cameras; let’s say you want to study cameras in New York, and those cameras have specific resolutions. Then you also give us an analysis program. You may say, “I want to count the number of people, or I want to detect a particular car.” You give us a program and you tell us the frame rate you want.
So, some of the cameras have a very high frame rate. Some cameras have a low frame rate. Of course, for a low frame rate we cannot give you more than what the camera gives us, but some cameras have very high frame rates. You can say, “I don’t want a high frame rate; even though the camera can give me 30 frames per second, I only want 1 frame per minute.” You can get that number. Then we need to determine the cloud instances. What types, how many cores, how much memory? Currently we don’t use any special hardware such as GPUs, but we are working on that. Then, where the cloud instances should be and how many of them.
These are the Microsoft Azure locations, and they are not equal. They are not equal in many different ways. One of the reasons is the price; for Microsoft Azure this was updated only yesterday. The United States has the lowest cost. This is D14; it has 16 cores plus 112GB of memory. Per hour you spend between 1.5 dollars and 1.9 dollars, so the difference is about 25 percent. For Amazon the difference is much higher, up to almost 50 percent. So if you have a lot of data to analyze, that 50 percent makes a difference.
So you may say, “Well, okay, it looks like it’s cheaper to do data analysis in the United States. Should we move all the data to the United States?” The answer is no, because as I mentioned, location matters; it depends on your desired frame rate. If the round-trip time between a camera and the cloud instance is long, then your frame rate will drop. In this case we use MJPEG. This is measured, and we really appreciate that Microsoft Azure gave us the resources; this data was measured using Azure. All we do is select [indiscernible] cameras, knowing their locations; we launch the instances in different parts of the world, we measure the round-trip time, and then we measure the frame rate we can achieve.
This figure also shows 2 types of data: the dots with the yellow [indiscernible] are measured, and the black squares are emulated by injecting delays using an emulator. What we observe here is that when the round-trip time increases, the achievable frame rate drops for MJPEG. What’s Motion JPEG? It encodes the video as a sequence of independent JPEG frames. Why do they do that? Because it’s easier; it doesn’t need to do motion estimation between frames. It is also more robust: if one frame gets corrupted, the damage is only 1 frame. But the disadvantage is you need more bits for the data streams.
Newer cameras support MJPEG and also H.264. With H.264, we observe that the frame rate doesn’t drop as the round-trip time increases. However, there are more repeated frames. If your round-trip time gets too long, then H.264 still cannot keep up, so what it does is repeat frames. On the surface the frame rate doesn’t drop, but in reality it still drops. So I think that answers your question about location. Sorry it took so long to get here.
You have to be careful about where you launch your cloud instances, because that can affect the achievable frame rate.
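A rough model of why this happens with MJPEG over TCP, anticipating the next question: TCP keeps at most one window of data in flight per round trip, so throughput is bounded by the window size divided by the round-trip time. The window and frame sizes below are assumptions for illustration, not measured values from the talk.

```python
def max_mjpeg_fps(rtt_seconds, window_bytes=64 * 1024, frame_bytes=25 * 1024):
    # TCP delivers at most one window of data per round trip, so
    # throughput <= window / RTT, and fps <= throughput / frame size.
    return (window_bytes / rtt_seconds) / frame_bytes

for rtt_ms in (20, 100, 250):
    print(rtt_ms, "ms RTT ->", round(max_mjpeg_fps(rtt_ms / 1000.0), 1), "fps")
# 20 ms easily sustains 30 fps; 250 ms caps the stream near 10 fps.
```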
>>: So why is it that round-trip time affects it? Is it not bandwidth?
>> Yung-Hsiang Lu: Because Motion JPEG is TCP based. Once your round-trip time is long enough, your TCP outstanding window will –.
>>: [inaudible].
>> Yung-Hsiang Lu: Yeah, it is waiting for acknowledgment.
>>: [inaudible].
>> Yung-Hsiang Lu: H.264, but we still see that, and actually I haven’t done enough studies to see why, but it is not fully synchronous. It does not send a frame and wait for acknowledgment. But TCP, once you reach the saturation –.
>>: [indiscernible], but H.264 is just streaming –.
>> Yung-Hsiang Lu: Yes, but we still see that.
>>: [inaudible].
>> Yung-Hsiang Lu: Basically it adjusts, but we observe –. We haven’t done enough measurements yet, but we observe in several cases that the number of repeated frames increased as the round-trip time increased.
>>: Repeated?
>> Yung-Hsiang Lu: Repeated frames, right. So it will tell you that you get 30 frames per second, but these 2 frames are exactly the same. So you actually see the jitter of motion.
>>: But do you lose frames or do you just see duplicates?
>> Yung-Hsiang Lu: Okay, the question is: Do we lose frames or see duplicates? I think we see duplicates. We don’t know whether frames are lost, because we are only on the receiving end. We have no control of the cameras.
>>: Is there any place you are getting 30 frames per second?
>> Yung-Hsiang Lu: We have a lot, but the specific intention here is to see –. The question here is: Should we move all the data to the United States?
>>: [indiscernible].
>> Yung-Hsiang Lu: Yeah.
>>: [indiscernible]. In the end you are processing the data and that’s taking time as well. So if
you process it faster you can get some delay and then still [inaudible].
>> Yung-Hsiang Lu: No. So the question is: Can we do [indiscernible] of processing delay? The answer is no, because we simply do not get the data.
>>: That’s different though. You might get data, but if you don’t process it in time [inaudible].
>> Yung-Hsiang Lu: No, this is measured without processing. This figure is –.
>>: Right, but what I’m saying is that [indiscernible].
>> Yung-Hsiang Lu: So this is the upper bound.
>>: If your consumption rate is slowing down because you are not processing it, that’s equivalent to not getting the data.
>> Yung-Hsiang Lu: Right, well, yes, but in this example we grab the data and we throw it away. We don’t do anything. So this data is the upper bound. This figure is the highest you can achieve; of course, if you do processing, [indiscernible]. Does that answer your question? Does that make sense? Okay.
So this can be formulated as an optimization problem: you have different cameras. In this case the view has 3 cameras and –. Yes?
>>: Yeah, I have a thought. So you said that depending on what analysis somebody wants to do –. Well, this is a question. Do you actually know or understand that this analysis is going to need this kind of an instance, to be able to start up a VM based on that up front, or is it something that the consumer or customer who is doing the analysis has to provide?
>> Yung-Hsiang Lu: So the question is: if you run the program and we need to determine how many virtual machines to launch, who knows that? It is somewhere in between. We don’t know in advance, but we can launch your program, get a few data points and then extrapolate. That’s my paper tomorrow; we will be presenting it in Vancouver tomorrow. You give us a program –. So let me go back to this slide. You give us a program and we run it. Let’s say you want to analyze the data from 100 cameras. We will launch your program and run it on 10 cameras. We will see whether we can meet the frame rate you want.
If we can meet the frame rate you want, then we measure the utilization. Let’s say the utilization is 80 percent; then we will say, “It looks like 1 cloud instance is good enough for 10 streams. We will launch 10 instances.” Does that make sense? Then of course that measurement can change, because your program behavior may change and the content may change. So we will observe and adapt. Maybe later on we find you need 12; then we launch more and reduce [indiscernible]. Then sometime later maybe you need only 8; then we will consolidate. That’s exactly my talk for tomorrow in Vancouver.
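A sketch of that measure-then-extrapolate step is below. The function name, the fixed 80 percent target, and the numbers are illustrative assumptions, not the actual CAM2 scheduler.

```python
import math

def estimate_instances(total_streams, sample_streams, measured_utilization,
                       target_utilization=0.80):
    # Utilization attributable to a single stream on one instance.
    per_stream = measured_utilization / sample_streams
    # How many streams fit before hitting the target utilization.
    streams_per_instance = int(target_utilization / per_stream + 1e-9)
    return math.ceil(total_streams / streams_per_instance)

# Run the user's program on 10 of 100 cameras and measure 80% utilization:
# one instance handles 10 streams, so launch 10 instances in total.
print(estimate_instances(100, 10, 0.80))   # -> 10
# Re-measuring later and calling this again is the observe-and-adapt step.
```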
Okay, I am going to skip this slide. Then, what’s the frame rate you need? It depends on the study you want to do. If you want to study motion, you probably need a high frame rate. If you want to study, let’s say, weather, you probably don’t need a very high frame rate. So let me walk you through a few screenshots of our system to give you an idea of how it can be useful. This is an example of Los Angeles, because a few weeks ago I gave a talk at UCLA, so I used Los Angeles as an example. As I said, we don’t continuously save the data. We grab the data when you ask us to grab the data.
So after you log in you can go to our system and then you select by location. You go to Los Angeles and then you select some cameras. As you can see, there are several hundred cameras here. After you select some cameras, you can also select a camera by its recent snapshot. We grab one frame from each camera every 24 hours — actually, I believe it’s once a week, not once a day. We just have a program that rotates through all the cameras. We make it very slow; we don’t want to jam the network. So we grab one frame per camera every so often, and you can also select by snapshot.
Then you can see here how many cameras you have selected and how long you want to run your program. You may want to run your program for only a few seconds, or maybe a few hours, a few days or longer. You can also specify the interval between frames. If you are a user, we currently allow you to get 1 frame per second. That’s the highest frame rate you can get, but if you are working with us you can have a backdoor and get a higher frame rate.
Why do we do once per second? Because we haven’t figured out what the right answer is; we simply set that limit. You can also specify how many frames you want to keep. Suppose you are doing some kind of motion detection; you may want to say, “I want 10 frames.” Then we will give you a running window: if we get more than 10 frames we will drop the oldest when the 11th frame arrives. We just give you the latest 10 frames. So you can specify that number.
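That running-window behavior is easy to picture with Python’s collections.deque; this toy sketch shows the behavior only, not the actual CAM2 API.

```python
from collections import deque

window = deque(maxlen=10)          # keep at most the 10 latest frames

for frame_number in range(1, 15):  # 14 toy "frames" arrive one by one
    window.append(frame_number)    # the 11th arrival pushes out the 1st

print(list(window))                # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```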
Then you can upload your program — sorry it takes so long to get here. You can upload a program you write, or you can use a program we give you. We have written more than a dozen examples. These programs are written in Python right now. We are planning to extend to other languages, but we have not; we need more students, as always. But you can use our existing programs as samples and modify them. Basically, you can imagine your program will most likely be something like a loop: you grab one frame, do some analysis, grab another frame and do another analysis. The only thing you need to change is to turn that while loop or for loop into our event-driven function. We have a paper published this year in [indiscernible] that actually gives you the example, but once you log in as a user you will see all the examples that we give you.
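To show the shape of that loop-to-event-driven conversion, here is a hypothetical sketch in Python. The class and method names are stand-ins, not the actual CAM2 API; the real names come from the sample programs you see after logging in and from the paper mentioned above.

```python
class Analyzer(object):
    """Stand-in for the CAM2 base class that drives user code."""
    def on_new_frame(self, frame):
        raise NotImplementedError   # users override this method

class ChangeCounter(Analyzer):
    """Instead of 'while True: frame = grab(); analyze(frame)',
    the framework calls on_new_frame() once per retrieved frame."""
    def __init__(self):
        self.previous = None

    def on_new_frame(self, frame):
        if self.previous is not None:
            changed = sum(1 for a, b in zip(self.previous, frame) if a != b)
            print("pixels changed since last frame:", changed)
        self.previous = frame

# The framework would feed frames like this (toy 3-pixel "frames"):
counter = ChangeCounter()
for frame in ([0, 0, 1], [0, 1, 1], [1, 1, 1]):
    counter.on_new_frame(frame)   # prints 1, then 1
```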
Then we will decide which cloud instance to launch, and that’s a problem we haven’t solved yet. We have some progress, but we still have a lot of problems to solve. So which cloud instance do we need to launch? How many cores? Some programs are more computation intensive. Some programs need more memory. In fact, this example shows you that this is a very complex relationship, and it’s not a linear relationship. Now let me explain what this figure shows. We show the utilization of the processor and the memory at two different frame rates. On the left side we have 1 frame every 5 seconds. On the right side we have 10 frames per second. You can imagine that with a very high frame rate, on the right side, the programs are processor intensive. On the left side, even at a very low frame rate, some programs are processor intensive, for example human detection, while some other programs, such as motion estimation, are bounded by memory.
There are 4 different programs here. The image archiver simply grabs the image and saves it; it does not do any processing. Motion estimation takes 2 adjacent frames and measures how much change has happened. Moving object detection does motion estimation and then does a [indiscernible], so it’s a little bit more complex. Then human detection is the most complex, because it uses a histogram of oriented gradients to do the human detection. Comparing the left side and the right side, you can see that for human detection, in this particular instance using m3.xlarge, we cannot do human detection at 10 frames per second. So that answers your question: if your program is too complex, computation is the bottleneck. On the left side, if your frame rate is low enough, we can do human detection.
This is per-frame utilization. So if you want to do human detection using this particular instance, you can do approximately 30 streams, maybe 40 streams. The processor utilization is about 2.6 percent per stream, so you can process about 30-something streams before your utilization becomes 100 percent. Does that answer your question from earlier?
>>: Yes.
>> Yung-Hsiang Lu: Okay, and I’m going to skip this. This simply says that there is no simple answer. It is a very complex problem, and we are trying to build an empirical model so we don’t have to do this for every type of instance. This slide also shows that if you make a wrong decision and choose a wrong instance, you may pay more than twice as much. This data shows the price per million images. If the bars are fairly uniform, that means you pay the same price for different types of instances, but on the right side, if you have 10 frames per second, the tallest bar is almost twice as high as the shortest bar. That means you pay almost twice the cost. We want to do that optimization so we choose the right instance. It depends on whether your program is computation intensive or memory intensive, and some other factors.
Earlier I mentioned our project is called Continuous Analysis of Many Cameras. This is an example showing we count people for 24 hours. Over 24 hours we actually see the number of people going up and down at different times. Actually, it shows very clearly that at night nobody is waiting for a bus at the bus stop or even passing through. We do this at 1 frame every 10 seconds and then we take a running average every minute; that’s how we get the numbers. So if you want to do this kind of analysis over 24 hours, you can do that very easily by going to our system. As you can see, in the middle it says the duration. If you specify a whole day you can do that.
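A toy sketch of that aggregation, with a hypothetical detect_people() standing in for the real detector: at 1 frame every 10 seconds, each one-minute average covers 6 samples. The numbers and structure are illustrative only.

```python
def detect_people(frame):
    # Placeholder for a real detector (e.g. HOG-based person detection).
    return frame.count("person")

def minute_averages(frames):
    counts = [detect_people(f) for f in frames]
    # Six 10-second samples go into each one-minute average.
    return [sum(counts[i:i + 6]) / 6.0
            for i in range(0, len(counts) - 5, 6)]

# Toy frames: each is just a list of detected labels.
toy = [["person"], [], ["person", "person"]] * 4   # 12 frames = 2 minutes
print(minute_averages(toy))                        # [1.0, 1.0]
```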
>>: [inaudible].
>> Yung-Hsiang Lu: Who’s paying for the computation? Right now Microsoft and Amazon, because we get research credits. Sometime in the future maybe somebody else needs to pay. Also, our system is modular enough that we can switch the back end to our own machines. Actually, when we ran out of the research credit — we got it renewed, but when we ran out we simply switched the back end to our lab. We cannot do very large scale processing that way, but we can do it.
Also, a related problem is: What happens to the data that you do not save? That’s a very big question, and I don’t have a very good answer yet. At a cloud computing big data event a few weeks ago I presented some very initial ideas about the problem, but I think there are still a lot of problems to solve, so I won’t take time here.
So far what I have talked about is real-time data: you get the data and you process the data. In a lot of cases you have the data [indiscernible], so maybe you save it, or you want to process it again and again. So you save the data and you want to process it offline, not in real time. We also studied how to do this using spot instances. You can think about using a computer in several different ways. On demand you pay by the hour; it’s like a hotel room. You can sign a long-term contract; it’s like renting an apartment. Or you can use a spot instance by bidding a price; it’s like priceline.com. You give a price and you may get a hotel room or you may not.
However, the spot instance is different from Priceline, because after you check into the hotel room they may still kick you out if the market price goes above your bidding price. And when that happens, whatever you are doing is lost; your intermediate results are lost. So what we do is create periodic checkpoints so that when we get kicked out from the spot instance, we can later resume from the checkpoints. And we find you can vary your bidding strategy. If your bidding price is about half the on-demand price, meaning that if the hotel room costs 200 dollars and your bidding price is 100 dollars, you have a very good chance to get it.
We find we have only about 5 percent performance degradation, which comes from the times we don’t get the instance or we get kicked out; that delay is only about 5 percent. But we can save about 85 percent of the cost. The reason is that the bidding price is the highest price you are willing to pay; what the cloud charges you is no more than your bidding price. So you may say, “I bid 100 dollars,” but you may get it at 70 dollars. That’s the reason you can save up to 85 percent of the cost while having only 5 percent performance degradation. So that can save a lot; particularly if you have a huge amount of data, that savings is significant.
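Here is a minimal sketch of that checkpoint-and-resume idea. The file name, state layout, and analyze() stub are illustrative assumptions; a real job would write checkpoints to durable storage such as S3, since the spot instance’s local disk disappears with it.

```python
import os
import pickle

CHECKPOINT = "progress.pkl"   # illustrative; use durable storage in practice

def analyze(image):
    return len(image)         # placeholder for the real per-image analysis

def process_all(images):
    # Resume from the last checkpoint if the previous instance was reclaimed.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"next": 0, "results": []}

    for i in range(state["next"], len(images)):
        state["results"].append(analyze(images[i]))
        state["next"] = i + 1
        if state["next"] % 1000 == 0:          # checkpoint periodically
            with open(CHECKPOINT, "wb") as f:
                pickle.dump(state, f)
    return state["results"]
```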
I think I have shown you quite a few interesting problems. I will say what we have done is only a very small portion of what can be done; I call it the tip of the iceberg. The more studies we do and the more papers we write, the more problems we discover. And the more people we talk to, the more people we want to talk to. As I mentioned earlier, I have talked to people doing psychology, forest research and transportation research, but there are still many problems. One problem is the metadata; currently there is no easy way to get metadata. We don’t know where the cameras are. We don’t know the frame rate. There is no easy way to get it. We don’t know what [indiscernible]. Is it indoor or outdoor? Can you see people? There are still a lot of things we need to work on. We are working on some automatic methods to generate metadata.
Another problem is that there is simply no standard way of getting data. Different brands of cameras have different methods, and different states have different ways of giving you the data. It all shows up on a website for humans to see, but it’s not designed for machines to grab the data. In the end, what we want to do is use the system to understand the world. So obviously there is a lot of opportunity for vision research. One colleague was doing machine learning, so he asked me to give him some data, as simple as asking, “How much do you want?” He said, “How much can you give me?” So I gave him 100,000 images from New York City, at 1 frame per second from 12 traffic cameras over 4 hours. He said, “Well, that’s good enough for a while.” He was trying to see how he could detect the same car moving across several different intersections.
I have talked about resource allocation; that was the original reason for building the system. And privacy always comes up. When I was in Australia I found this very nice location, so I took a picture of it, and it tells you that you are being recorded. Our project uses only publicly available data, and we are cleared by the Purdue legal department, so we don’t have any privacy concerns, but privacy is always a question. I have done some studies about the legal status of our cameras, and my understanding is this still needs to be worked out. This morning when I was at the airport waiting for a flight, I saw that several cities have decided to give each police officer a camera because of some recent incidents. So what’s the privacy issue there? If the police have a body camera, who owns the data? I think that still needs to be worked out, and I am not an expert on that.
>>: [indiscernible] because you started by saying you are just tapping into public feeds, right?
>> Yung-Hsiang Lu: Yes.
>>: So basically you bypass those issues.
>> Yung-Hsiang Lu: We only use public data.
>>: You are not the ones deploying cameras?
>> Yung-Hsiang Lu: We are not the ones deploying cameras.
>>: Someone else is deploying them, someone else is grabbing them, and they are publicly available, so it’s not really that much of an issue there.
>> Yung-Hsiang Lu: Correct, correct, but when we were looking for cameras we also found some cameras that were publicly available that we don’t want to use in our system. For example, we found some cameras that we believe are looking into somebody’s living room or something. Why are they public? I don’t know, but we found them and we don’t want them, because we don’t want to deal with that.
This is a big project, and it is also a very good experience for a lot of students; many of them are honors undergraduate students. This is, I think for probably all of them, the biggest project they have worked on. It is a very big team for them to get that experience from, and I am also very fortunate to have [indiscernible], and of course Microsoft gave us the research credit to use as well. That’s very helpful for us to do these studies.
To conclude, I hope I have given you the idea that network cameras can be very useful for a lot of studies. On Wednesday at cloud [indiscernible] in Vancouver there will be a panel discussion about network cameras and cloud computing; I am one of the panelists. I think there are still many, many challenges. From the many questions here I can see there are a lot of questions, and we have answered only a very small fraction of them, but I hope I have given you the idea that we are building something useful. It is an open system for people to use. You can register as a user, and if you want to get the source code we can also work out an agreement. We have already signed agreements with a few universities about sharing the source code, and our legal department will be able to work out the details. As I mentioned, our system is not an archive of data. It is a computing platform for you to run programs, and we do everything for you, in particular allocating cloud resources. With this I want to thank you for your attention.
[Applause]
>>: So these programs that you can write to analyze, do they have to be in some format? Do you have some APIs?
>> Yung-Hsiang Lu: We have an API. Basically, your program needs to include a few modules, and you have to create the right subclass of a specific class we provide and then [indiscernible] object. We handle the majority of the work in the base class, but you have to override a few methods for your [indiscernible].
>>: But the quantum is just 1 frame?
>> Yung-Hsiang Lu: Currently we do 1 frame.
>>: [inaudible].
>> Yung-Hsiang Lu: Yes, but you can get the past frames. So you can say, “I want the past 10 frames,” and then you can get, say, the last 3 frames. You decide what you want to do. And we run it as a window: let’s say you want to keep 10 frames, we just keep the 10 latest frames and the old ones get dropped. But you can specify the number: 10, or 12, or whatever.
>>: [indiscernible].
>> Yung-Hsiang Lu: What we want to do is this: first, we want to use this system to do some interesting studies. I mentioned psychology, civil infrastructure, transportation, forests and so on, and to let people use this system to do machine learning, because we have a huge amount of data for them to use. Let me talk about machine learning a bit more, because it seems to be a hot topic these days.
For example, we know the precise locations of some cameras, for example in New York City. They give us the coordinates of each intersection, and we can map that onto, for example, a bus route. So you know a bus will come from one particular intersection to another particular intersection and then another, and you can use that meta-information to train your bus detection program. And remember, this bus may not be in the same lane, and even in the same lane the camera angle may be different. So this becomes a very rich resource for doing all kinds of studies. We want to provide that infrastructure. And I mentioned the psychology study; we want to use this data for people to do non-intrusive observation of people worldwide. Does that answer your question?
>>: Yes.
>> Yung-Hsiang Lu: I am also the organizer of a low-power image recognition competition, so this can also be used as a source of data. I have not combined it with this project yet; so far the two projects are still separate, but this can be the source of data to use. As many of you may know, good data is often the foundation of much interesting research, and we are building a system to provide good data. Does that make sense?
>>: Sure.
>> Yung-Hsiang Lu: Okay.
>>: Sure it makes sense.
>> Yung-Hsiang Lu: Other questions?
>> Philip A. Chou: All right. Thanks very much.
>> Yung-Hsiang Lu: Thank you very much.
[Applause]