>> Aman Kansal: It's my pleasure to welcome Moo-Ryong Ra from University of
Southern California. He'll be talking about cloud-enabled mobile sensing systems.
A large part of this talk is, in fact, a preview of what the rest of the world will only get to
hear at NSDI. So this is faster for you guys.
With that, welcome, Moo-Ryong. And take it from here.
>> Moo-Ryong Ra: Thanks for the introduction, Aman. First of all, thanks for inviting
me today as a candidate. I must say that it's my honor to give a talk here today.
My name is Moo-Ryong Ra from USC. And today I'm going to talk about cloud-enabled
mobile sensing systems. And especially how to enable efficient processing and secure
sharing of sensor data using the cloud.
So nowadays almost everyone has one or more smart mobile devices. These mobile devices are not just [inaudible] but are deeply changing many aspects of our lives. The changes range from how we interact with each other to where we eat, where we meet, how we experience a visit to the doctor, how we pay bills, et cetera.
With this many mobile devices, numerous apps have been developed, and they have become indispensable in our everyday lives. Compared to a desktop PC or laptop, one of the distinctive features of smart mobile devices is the presence of sensors. Most smart mobile devices already have a variety of sensors such as camera, GPS, microphone, compass, and motion sensors like the accelerometer and gyroscope, and many others.
These sensors on smart mobile devices provide rich contextual information to enable novel mobile features.
Based on those sensors, many useful mobile sensing applications have been developed. Video sharing, intelligent personal digital assistants, photo sharing, and location-based services are widely used. And some of them have huge user bases: Facebook has over one billion monthly active users and Google Maps for Mobile has more than a hundred million monthly active users.
And note that all these applications are enabled by the cloud. These cloud-enabled mobile sensing systems and applications are the focus of this talk. All these applications use the cloud mainly because mobile devices are resource constrained.
Their computing power is lower than that of a desktop PC or laptop, and they have smaller storage space. And although they are rapidly evolving, wireless networks are less reliable than wired networks and not always available. And the battery problem is a well-known problem in the mobile community.
For example, since the mid 1990s, battery energy density has improved only about two times every ten years. So the cloud certainly provides a great opportunity for mobile devices because of its high availability, nearly infinite storage space, and millions of compute cores.
So there already exist many useful cloud-enabled mobile sensing systems and applications. The problem is that people always desire faster applications with more features, and there are growing concerns about security and privacy.
My thesis focuses on the system support to realize these growing demands and to resolve existing concerns about cloud-enabled sensing applications.
In this context we have several challenges to overcome in sensor data sharing and processing. First, we often face performance problems when we deal with compute-intensive, data-intensive mobile workloads.
Second, whenever we share large volumes of sensor data with others using the cloud, there is a tension between efficient sensor data sharing and privacy protection. This is not an easy problem to solve.
Third, when we collect large volumes of data from a corpus of smartphone users, it is often very challenging to efficiently deal with the labor-intensive tasks involved. So the question is what kind of programming abstraction we need to address this problem.
And lastly, whenever we share large volumes of data using mobile devices, the energy problem is always there.
So given the challenges my research goal is to enable efficient processing and secure
sharing of sensor data using the cloud.
As I described, my thesis work is tightly connected to sensor data sharing and processing, and I made an effort to overcome the several challenges that I described in the previous slide.
Odessa is about enabling mobile perception applications, with a focus on performance, and P3 is about how to protect users' privacy against providers when sharing photos. Medusa is a high-level programming framework for crowd sensing. And SALSA is about how to trade off energy and delay when sharing large volumes of data.
Before we get into the details, I would like to briefly cover some technical aspects of my thesis research projects. In the Odessa project I used a data-flow programming model and built an adaptive runtime based on workload characterization.
In P3 I developed an encryption and decryption algorithm based on signal- and image-processing techniques. And I built a system that uses a software interposition architecture and re-engineers the photo upload and download protocols of existing photo sharing service providers.
In the Medusa project I designed a domain-specific language and built a partitioned runtime across mobile devices and the cloud. For those who are interested, the implementation is released on Google Code as an open-source project.
And in SALSA I exploited application delay tolerance to design a network interface selection algorithm using Lyapunov analysis. The underlying system has been deployed at Los Angeles International Airport and at other universities and companies for more than three years.
In addition, when I was an intern here at MSR with the [inaudible] group, I focused on continuous sensing applications and characterized the workload using a simulator and actual measurements on two very different types of processors.
So in today's talk, I'm going to cover the first two projects in depth, and I'm going to
summarize the other two pieces of work at the end of the talk.
So here is the outline of the talk. I already introduced my problem space and my research. Now I am moving to the first part: how should we offload computation to the cloud to enable demanding applications, and why are existing approaches not directly applicable?
As I already mentioned, smartphones have sensors, and these sensors enable a set of sensing applications such as activity recognition, health and traffic monitoring, location-based services, et cetera.
Recent advances in the computation, sensing, and communication capabilities of smart mobile devices create a new class of applications that we call interactive perception applications. Unlike the other sensing applications in this slide, mobile perception applications make use of high-data-rate sensors like cameras.
Here are some examples of such applications. We built three prototype interactive perception applications. The first one is a face recognition application: at a conference, for example, you could point the camera at people's faces to immediately recognize who is in the room. The second application is an object and pose recognition application, which can enable augmented reality. The third application is a gesture recognition application, to control a tablet device using simple hand gestures.
These emerging applications have the following characteristics. They are interactive, typically requiring crisp response times on the order of 10 to 200 milliseconds. They are high data rate, because they process realtime video data. And they are compute intensive, because computer vision-based algorithms are typically used.
When we run these applications on mobile devices, we have a significant performance problem. To understand this, two measures of goodness characterize the requirements of interactive perception applications. Throughput is how many frames the system can process per second, often denoted as FPS, frames per second. The second is the end-to-end latency of the compute pipeline for a single frame, which is basically the response time of a given recognition task.
In general, we desire high throughput and low makespan. To show how severe the performance problem is, here is some experimental data on the throughput of these applications, where each application runs locally on the mobile device.
As you can see from the video on the right side, this is too slow. Note that the number for the object and pose recognition application in the table is actually 10 times slower than the video playing on the right side.
How do we solve this performance problem? Fortunately, these applications are naturally represented as a data flow graph, as in this slide. We have the mobile device on the bottom and the cloud infrastructure on the top, and they are connected through the network.
The first technique that we can use is offloading, which moves demanding stages from the mobile device to the cloud to reduce the execution time.
The second technique that we can use is parallelism. By increasing the number of workers for demanding stages we can further reduce the execution time significantly. And additionally we can process multiple frames simultaneously using pipeline parallelism.
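To make this concrete, here is a minimal sketch of how such a pipeline might be written down as a data flow graph with per-stage placement and worker counts. This is illustrative only; the class and field names are my own, not Odessa's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    name: str
    placement: str = "mobile"   # "mobile" or "cloud"
    workers: int = 1            # degree of data parallelism for this stage

@dataclass
class DataFlowGraph:
    stages: dict = field(default_factory=dict)   # name -> Stage
    edges: list = field(default_factory=list)    # (src, dst) pairs

    def add_stage(self, name):
        self.stages[name] = Stage(name)

    def connect(self, src, dst):
        self.edges.append((src, dst))

# A toy object-and-pose-recognition-style pipeline: capture and display stay on
# the device, while the compute-heavy stages can be offloaded or parallelized.
g = DataFlowGraph()
names = ["capture", "feature_extraction", "matching", "clustering", "display"]
for n in names:
    g.add_stage(n)
for src, dst in zip(names, names[1:]):
    g.connect(src, dst)
```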
So given these techniques, our focus in Odessa is, in the context of compute- and data-intensive mobile applications which can be structured as data flow graphs: how do we design -- sure?
>>: [inaudible] it seems like the mobile devices are suddenly getting fast. Like people don't know what to do with, like, four cores on the phone. At the same time, why won't just that aspect solve the problem, rather than taking everything to the cloud?
>> Moo-Ryong Ra: So can you rephrase the question? I'm not sure I --
>>: [inaudible] performance of the device, right. I'm saying devices are getting faster. And you wait a couple years and then you can run the application on the device itself rather than worrying about this new structure [inaudible].
>> Moo-Ryong Ra: Oh, okay.
>>: Which actually has some fundamental issues around how far away the cloud is.
>> Moo-Ryong Ra: Okay. I think these three perception applications may be easily enabled by mobile devices three to four years in the future. But I think people will create more demanding applications with higher accuracy that always exceed the capability of mobile devices. And there is always the energy problem.
So I think there will be a need to use the cloud, however well the device performs. That's my opinion.
>>: Okay.
>> Moo-Ryong Ra: Is that the answer for your question?
>>: It's your opinion you rendered, so I don't know what to say [laughter].
>> Moo-Ryong Ra: I see. Okay. So, yeah, given the application structure, the data flow structure, how do we design the underlying system that uses offloading and parallelism techniques together to enable such applications? This is the high-level focus of this work.
To achieve these goals, three fundamental questions need to be answered. First, what factors impact offloading and parallelism between mobile devices and the cloud? Second, how do we improve throughput and makespan simultaneously, again by using offloading and parallelism techniques together? And third, how much benefit can we get compared to other strategies?
To understand the problem space, we measured the workload and identified that there is a lot of variability in the system. The graph on the right side shows the result for the object and pose recognition application. The X axis is the frame number, and the Y axis is the number of SIFT features detected on the bottom, and the makespan values on the top.
>>: I missed it. Can you tell me again what makespan is?
>> Moo-Ryong Ra: Makespan is the end-to-end compute latency over the entire pipeline for a single frame.
The upper graph shows the makespan values. There is huge variability in the complexity of the input, which causes significant fluctuation in the makespan values.
For example, if we look at frame number 200, the makespan value is relatively low because the scene complexity is relatively moderate. But if we look at frame number 300, it has a much longer makespan because there are more SIFT features detected in the image. Yes?
>>: So to interpret that graph you're seeing about 12 seconds of latency per frame?
>> Moo-Ryong Ra: Yeah. In this application everything runs locally, so it is very, very slow. I will show later how much my system improves the performance.
>>: [inaudible] average, right? I mean, it's like some of them are 10, some of them are
30?
>>: Right. But looking at that red line, just that would take 12 seconds to get your frame back.
>> Moo-Ryong Ra: And this is fairly [inaudible].
>>: I'm sorry.
>>: [inaudible].
>>: What did you just say about it's something mobile?
>> Moo-Ryong Ra: Every stage runs locally.
>>: On the phone?
>> Moo-Ryong Ra: On the phone.
>>: Okay. Got it.
>> Moo-Ryong Ra: So why this huge makespan.
>>: What's the capability of the phone? Which phone did you use for experiment?
>> Moo-Ryong Ra: It was a netbook at that time, 1.4 gigahertz single core. The Galaxy S III has a quad-core 1.4 gigahertz CPU now, but at that time we didn't have that. Questions?
From this we learned that the system should adapt to input variability at runtime, because of the huge variability in the input.
In addition to input variability, we also explored other dimensions which can affect the performance. These additional dimensions include different mobile devices, network conditions, and different choices of parallelism.
These additional dimensions introduce more variability into the system, so we conclude that offloading and parallelism decisions should be adaptive to input and platform variability.
Based on the lessons from the measurement study, we designed the Odessa runtime system. Let me give you a high-level description of Odessa. Odessa is a runtime on top of Sprout. Sprout is a distributed and parallel runtime engine developed at Intel, and Odessa uses the mechanisms provided by Sprout.
The Odessa runtime is mainly comprised of two components: the application profiler and the decision engine. The application profiler delivers per-stage statistics to the decision engine using a lightweight piggybacking mechanism, and thereby the decision engine can adapt offloading and parallelism decisions to improve throughput and makespan simultaneously.
The decision engine runs on the mobile device. This means some part of the data flow graph will be placed on the cloud if necessary, or a compute stage might be offloaded from the mobile device to the cloud, or moved from the cloud back to the mobile device. It can also spawn more workers for demanding stages.
Then how are the decisions made? Let's look at how Odessa makes these decisions. When the application starts, the entire pipeline is on the smartphone.
>>: [inaudible].
>> Moo-Ryong Ra: Sure.
>>: [inaudible] being written for this framework, or are you doing all of this
automatically?
>> Moo-Ryong Ra: I would say written for this framework: the application developer should provide the data flow structure to the runtime.
Again, the smartphone is on the bottom and the cloud infrastructure is on the top, and they are connected through the network. Based on the profiler data, the decision engine knows that stage A is a bottleneck; it then estimates the migration cost and the expected execution time on the cloud, and offloads the stage only if the remote execution cost is less than the local execution cost.
After that, the decision engine again identifies stage B as a bottleneck and offloads it. After that, it spawns one more worker for stage B, since stage B is still the bottleneck. And at some point a network edge could be the bottleneck; then the system must estimate the offloading possibilities on both ends and take the relevant action.
In this particular example, the system migrates the bottleneck stage to the cloud. Overall, the decisions are incremental, so Odessa adapts quickly to input and platform variability.
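To make the decision loop concrete, here is a minimal sketch of the kind of incremental, greedy adaptation step described above. It is illustrative only: the cost model, the data layout, and the example numbers are my assumptions, not Odessa's actual implementation (which also reasons about network edges and pipeline parallelism).

```python
def adapt_once(stages, times, migration, remote_times):
    """One greedy adaptation step on the current bottleneck stage.

    stages: name -> {"placement": "mobile"|"cloud", "workers": int}
    times: name -> observed per-frame execution time (from the profiler)
    migration: name -> estimated one-time cost of moving the stage
    remote_times: name -> estimated per-frame time if run on the cloud
    """
    # 1. Find the current bottleneck stage.
    bottleneck = max(times, key=times.get)
    info = stages[bottleneck]

    # 2. If it is local, offload it only when the estimated remote execution
    #    plus migration cost beats continuing to run it locally.
    if info["placement"] == "mobile":
        if remote_times[bottleneck] + migration[bottleneck] < times[bottleneck]:
            info["placement"] = "cloud"
            return f"offload {bottleneck}"

    # 3. Otherwise, increase data parallelism for the demanding stage.
    info["workers"] += 1
    return f"add worker to {bottleneck}"

# Example: feature extraction dominates, so it is offloaded first.
stages = {"capture": {"placement": "mobile", "workers": 1},
          "feature_extraction": {"placement": "mobile", "workers": 1},
          "display": {"placement": "mobile", "workers": 1}}
times = {"capture": 0.03, "feature_extraction": 2.5, "display": 0.02}
print(adapt_once(stages, times,
                 migration={"capture": 0.1, "feature_extraction": 0.4, "display": 0.1},
                 remote_times={"capture": 0.03, "feature_extraction": 0.3, "display": 0.02}))
```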
Before talking about the performance results, here are the actual data flow graphs for our three prototype applications. These applications run on top of Odessa. Yes?
>>: You said that quickly. Can you just give me a [inaudible].
>> Moo-Ryong Ra: So the engine makes the decision in only about two -- less than two milliseconds. So this is quite quick, so we can adapt at the frame level --
>>: In the example you started with, where you moved those three modules up into the cloud, how many frames would it take for that to happen?
>> Moo-Ryong Ra: It depends on the parameter setting. We actually profile the recognition time of every frame using our profiler. We set our window size as 10, so we see the statistics for the last 10 frames and make the decision.
>>: I have a question. So when you say adaptive, I'm just trying to understand. Does it mean that if right now I'm on WiFi and I walk around the building and I can't find WiFi any more, I'm on 3G --
>> Moo-Ryong Ra: Uh-huh. That --
>>: It would immediately switch the phone -- it would immediately switch to local processing?
>> Moo-Ryong Ra: Yeah.
>>: Okay.
>> Moo-Ryong Ra: It might bring the compute stage back to the device.
>>: And how long does that take? Because you have to figure out that latencies have
increased.
>> Moo-Ryong Ra: Yeah. We look at the latest 10-frame statistics. It depends, but usually, as long as we don't lose the connectivity, within five to eight frames we can adapt.
>>: [inaudible].
>> Moo-Ryong Ra: As we can see, all these applications have varying structure and different numbers of computationally demanding stages.
>>: How difficult would it be -- so this is a lot of complication. How difficult would it be for me, as a programmer, to go in and add an annotation to say: if you're on 3G, run this locally, if not, run this on the cloud, or vice versa, for a handful of stages?
I guess the question I'm getting at is how complicated are these applications and do they actually require your adaptive mechanism?
>> Moo-Ryong Ra: The application developer does not need to know these dynamics. They just provide the fine-grained data flow structure, and the runtime takes care of the rest automatically.
>>: Yeah. No, I understand that. But my question was how complicated are each of
these pipeline stages? Could I have -- if I were an expert developer, maybe I don't want
to use your framework, could I come along and just make -- provide those annotations
to use this underlying Sprout framework to do this statically?
>> Moo-Ryong Ra: I compare our performance with domain experts later, but, yeah, the domain experts cannot make every decision correctly. Actually, this slide shows that.
So our main problem was performance, so let's compare Odessa's performance with other strategies. We compare Odessa with three other competitors, as well as one idealized strategy that is optimized offline. Local runs every stage locally on the mobile device. Offload-all runs the stage that captures frames, as well as the stage that displays the result, on the mobile device, and all other stages run on the cloud.
Domain-specific uses the partition chosen by the domain experts, which is the application developer in our case. And the last competitor is the offline optimizer. It basically exhaustively searches every possible partition and picks the one that gives the best result. Since this computation is too expensive and requires statistics on all possible partitions, it cannot be done online but has to be done offline.
To remind you, for throughput higher is better; for makespan lower is better. Yes?
>>: This question is about what this data represents. Are these means or medians?
What is the benchmark? What is the fraction of time spent in different network
conditions?
>> Moo-Ryong Ra: These results are for the object and pose recognition application. We use the best quality network, a hundred megabits per second, for this experiment, and we use the 1.4 gigahertz CPU netbook as the client. For this experiment we run the object and pose recognition on the mobile device at the beginning, wait until the partition is saturated, and then report the average frame rate -- average throughput -- and average makespan over a hundred frames.
>>: If it's not [inaudible] bandwidth why [inaudible] give you the highest frames per
second?
>> Moo-Ryong Ra: Why offload-all does not give the highest throughput? That seems to be your question?
>>: And will -- yeah. If you've got effectively a zero latency or very low latency network with unlimited resources and you've got a hundred megabit network, why would that -- why would the frames per second be lower than in Odessa, for example?
>> Moo-Ryong Ra: Okay. Why Odessa performs better than offload-all or the domain-specific strategy -- that's the abstract question.
The reason is, I think, two things. One is the parallelism choice, and the other is the way of partitioning the application across the available resources.
For parallelism, the domain-specific strategy makes a wrong decision. In terms of pipeline parallelism, with the domain-specific strategy the application developer doesn't know the right number of tokens that should exist in the pipeline -- that is, how many frames the system should process simultaneously.
That decision depends a lot on the device capability, so it should be based on the actual profile rather than a fixed number chosen at the beginning. That's one very crucial reason for this performance difference.
And for offload-all, the amount of data parallelism was also important. It usually uses just a single worker for the detection and recognition stages, whereas multiple workers are needed for such demanding stages.
Is that the answer to your question?
>>: I guess I don't fully understand what these stages do. So I -- you're saying there's
a lack of parallelism in offload all?
>> Moo-Ryong Ra: Basically, yes.
>>: So would you say it's fair to say that offload all was implemented poorly or --
>> Moo-Ryong Ra: Offload-all chooses the wrong degree of data parallelism and pipeline parallelism. For example, in the object and pose recognition application, there are three demanding stages: SIFT feature extraction, feature matching, and clustering. But domain-specific and offload-all use just one instance of SIFT feature extraction -- one worker thread -- for feature extraction, matching, and clustering. That causes the huge performance difference.
And the other is pipeline parallelism. When you run a pipeline, you may have 10 end-to-end compute stages, and deciding the right number of frames in the pipeline at a given time is not a trivial decision. It should be based on actual profile data and so on.
>>: [inaudible] asking is if all the trick -- all the changes you're talking about in Odessa
don't sound like adaptive, they sound kind of like better programming in a way or better
use of the data. And if those same things could happen in offload all, would offload all
be as fast as Odessa? Or is there something going on sort of almost in realtime in
adaptive, which seem to be the special thing about Odessa?
>> Moo-Ryong Ra: Right.
>>: That was --
>> Moo-Ryong Ra: [inaudible].
>>: Is making it different than offload all.
>> Moo-Ryong Ra: So what's your --
>>: I guess [inaudible] doesn't seem to me like -- it doesn't seem like an optimal parallel -- like if I figured the cloud is free and I just want to burn as many resources as possible, you would presume I would go for the maximum level of parallelism and just burn the heck out of the CPU units in the cloud. That would seem like one of the naive strategies I want to compare to. It might not be terribly efficient in terms of use of CPU resources if I'm throwing out work that would be useful in a less parallel situation.
But that would certainly -- I mean, maxing out parallelism would seem to be the best way to get the maximum frames per second.
>> Moo-Ryong Ra: So one problem is they don't know what the right number of
maximum parallelism is for a given environment. That's one thing. So let me show how
I decide the pipeline parallelism.
>>: [inaudible] offload, onload, is that something that you implemented, and were you
using some other system that just, you know, was offloaded this?
>> Moo-Ryong Ra: I use mechanisms provided by Sprout. They provide the basic offloading structures. So let me show -- I think the right level of parallelism depends on the application, since it is adaptive to a given environment, so the degree of pipeline parallelism should change accordingly, right? But the domain experts and offload-all cannot know this right degree of pipeline parallelism. So in my experiment offload-all doesn't use the maximum degree of parallelism. That's why its performance is so low.
>>: So it was kind of set at a fixed low level whereas your system will vary it [inaudible]
and it will go higher than the level that [inaudible].
>>: What if you just picked a higher level of -- would that suffer badly ever?
>> Moo-Ryong Ra: Higher level --
>>: Higher level of parallelism [inaudible].
>> Moo-Ryong Ra: Then the makespan would suffer, because all the frames will wait before the bottleneck stage. So we need to be careful about choosing the right degree of pipeline parallelism.
>>: Odessa is doing three different things you described so far in terms of improving
this. One is it's deciding which pipeline stages to upload -- to upload to the cloud.
>> Moo-Ryong Ra: Okay.
>>: Secondly is deciding what degree of parallelism to get at each new stage.
>> Moo-Ryong Ra: Right.
>>: And thirdly, it's making adaptive decisions about those as conditions change.
>> Moo-Ryong Ra: Right.
>>: Can you give us a breakdown for this example you've been showing us, the one on
the previous slide before you jumped here. Would those things matter? It seems like in
particular the decision about which stages to upload is not relevant. It's not the -- the
reason that Odessa is beating offload all is not because there's some stage that's really
important to do on the client. I'm guessing. I may be wrong.
Can you give us a breakdown of which of those things matters in this example?
>> Moo-Ryong Ra: On this example --
>>: The example you had before, that chart.
>> Moo-Ryong Ra: Well, this is an example. So let me show the resulting partitions. This is the result for the object and pose recognition application; the resulting partitions are something like this. There are three demanding stages, but Odessa offloads only two of them, increases the data parallelism like this, and controls the pipeline parallelism as well.
The notable difference is that it executes the clustering stage locally, so Odessa can use more resources on the cloud for the other stages. Compare that to offload-all or the domain-specific strategy, which may use the maximum data parallelism -- whatever we set, which may be wrong. Odessa uses the necessary computation resources locally and utilizes the cloud resources in the right way.
>>: Are you saying that this is actually optimal here, doing four network round trips rather than taking that middle stage and pushing it out to the cloud -- that the performance would be worse if that middle blue rectangle there were pushed up onto the cloud?
>> Moo-Ryong Ra: That depends on the amount of data that will be transmitted between the stages. So in this --
>>: How could that possibly be worse than that?
>> Moo-Ryong Ra: So actually my Odessa algorithm works based on the bottleneck stage. I measure every stage's execution time -- the execution time of the blue rectangle, the delay of every network edge, and the execution time of the red rectangle also -- and try to reduce the execution time of the bottleneck link or stage. So that makes sure this is better than the other partitions. Right?
>>: So you're skeptical of that result? [laughter].
>> Moo-Ryong Ra: I'm not skeptical about this result.
>>: I am not as skeptical of this result.
>> Moo-Ryong Ra: This number is --
>>: If you pushed that rectangle -- if you pushed that blue rectangle up there, you're going to save two network round trips -- excuse me, you're going to save one network round trip, two network hops, and you're going to be able to use a higher performance core up in the cloud than you have down on your client machine. It may be a small win but it's got to be a win in terms of performance.
>> Moo-Ryong Ra: But that depends on the congestion on the cloud side, right? So if you offload the middle stage to the cloud, maybe the performance is a little better. But the throughput is governed by the bottleneck stage, therefore it will not increase. Right?
I'm trying to optimize both the makespan and throughput simultaneously; whatever the cost is, offloading that single stage to the cloud may not increase the throughput at all, because the bottleneck execution time --
>>: [inaudible] right? It certainly will not hurt it. It would not be any worse.
>> Moo-Ryong Ra: Yeah, maybe. But I'm not saying my resulting partition is globally optimal; I just improve both metrics simultaneously.
>>: So your assumption is that the cloud itself could be [inaudible] at some point?
>> Moo-Ryong Ra: Yes.
>>: [inaudible].
>> Moo-Ryong Ra: Right. Any other questions? Okay. Then I will get back to the results. Odessa performs well against the three other competitors, and even compared to the offline optimizer it gets comparable throughput.
There is a considerable amount of related work in this space. The first set of approaches uses integer linear programming. The second set of approaches is based on graph-based partitioning methods that optimize a custom utility function. A third approach uses a static partitioning scheme, where the application partitions are determined at compile time. And a fourth set switches between pre-specified partitions, provided either by the application developers or domain experts.
These do not provide the relevant solution for us because the objectives are different, and because of the huge variability, static or fixed partitioning schemes will not work. And none of these considers the parallelization of demanding stages on mobile devices.
So Odessa achieves our goal using incremental decisions in a dynamic runtime, which adapts to input and platform variability at runtime.
To summarize Odessa: some emerging applications are too heavy to run on mobile devices, and Odessa enables these interactive perception applications by dynamically adapting to the input and platform variability.
So I'm moving on to the second piece of my work. We just enabled mobile perception applications; but when you want to share the resulting data using the cloud, you may have a privacy problem. This work is about how to protect our privacy when sharing photos.
Cloud-based photo sharing services, PSPs, are becoming very popular nowadays. People use various mobile devices to share photos and upload them to PSPs over wireless networks. But here we have serious privacy concerns.
Here is an example. Suppose Alice has a secret picture of a nice guy and wants to share it with friends using a PSP. The first possible privacy concern in this situation is the unexpected exposure of the photo, which could happen either through accidental bugs or through careless system design by the PSP.
The second problem is that we don't have any mechanism to prevent the PSP's data abuse. In this particular example, the PSP may run its best possible inference algorithm on the photo and may conclude the following. This is obviously not a desirable scenario for Alice, but currently there is no way to prevent this scenario; we need to completely trust PSPs in order to share our photos.
And I am not making up artificial threats; these are real ones. Here are a few recent news stories. The Photobucket system unexpectedly exposed users' photos because of its naive system design: the problem was that the photo URLs were too easy to guess.
And Facebook had a face recognition API in its web-based API specification, but because of privacy issues, partially described in this slide, they eventually shut down the API.
And not long ago, Instagram tried to change its terms of service, saying that it could sell users' photos without compensating them. It caused a big ruckus, and the company reverted back to the original terms.
So these privacy concerns are real to many users.
On the other hand, PSPs provide useful processing for mobile devices. Again, suppose Alice has a brand-new smart camera, takes a high resolution photo, and uploads it to a PSP. However, Alice's friends may have mobile devices with different screen sizes.
In order to provide a desirable user experience, the PSP will scale the image appropriately and send it to the different mobile devices. These types of processing, the so-called image scalability service, are very useful for users to reduce network latency and [inaudible].
Also, it is possible that the PSPs can perform other kinds of processing, for example, operations to enhance image quality. So the cloud is already doing useful processing for mobile devices, and people get tremendous benefits from that.
The problem is that we want both privacy protection and cloud-side processing. Solving this problem, especially under practical constraints, is quite tricky. We might immediately think of a potential solution: why not just encrypt everything? But as a result, for example, the mobile devices would have to download full resolution images, regardless of their screen size and storage limitations. This is undesirable.
So if we use full encryption, we lose the image scalability service as well as the other benefits provided by the providers. Before describing our approach, I will describe our goals, threat model, and the assumptions that we made.
Again, our goal is to protect users' privacy while keeping the cloud-side processing. Our threat model covers two categories of threats: one is unauthorized access, and the other is the application of automatic recognition technology to users' photos. And our trust boundary is between the mobile devices and the cloud, which means that we completely trust the mobile devices' hardware and software, including sensors, operating system code, apps, et cetera. And we don't trust outsiders, including eavesdroppers on the network, and we don't trust the PSPs.
For PSPs we assume that they are honest but curious, which again means they will not change what they are doing with photos no matter what, but they will try to infer users' private information using their best possible methods.
Let me describe our approach at a high level. Again, suppose Alice wants to share a photo with Bob. From the photo we first extract a part that is small but carries very important visual information, which we call the secret part. One can think of it as the most significant bits of the entire image; I will describe exactly how we construct the secret part after this slide.
After removing the secret part, what remains is large in volume but carries little visual information, which we call the public part in this talk. Again, one can think of it as the least significant bits of the entire image. The public part is a standard JPEG image, so the PSPs can accept it without changing their systems.
The secret part will be encrypted and, ideally, embedded inside the public part, and then the image will be uploaded to a PSP. This way PSPs can perform any useful processing on the public part. In this particular example, they scale down the image for serving the mobile device. When Bob wants to see the photo, he downloads both the public and the secret part and combines the two to reconstruct the image.
To enable this capability we have several important requirements. Our algorithm has to ensure privacy on the public part, and the storage overhead should be minimized. Our encryption and decryption processes should be lightweight. Our public part should maintain standards compliance, in our case as a JPEG image, and the cloud should be able to process the public part appropriately.
And the resulting system should work transparently with the existing PSPs. Overall, our algorithms and system, collectively called P3, achieve these goals and requirements, and I will describe how our system and algorithms work in the later slides. Yes?
>>: So [inaudible] you assume what they can do to your data?
>> Moo-Ryong Ra: Yes. I will describe it later.
Before describing the actual encryption and decryption algorithms, I want to share the intuition behind the P3 algorithm. How do we extract a small but important piece of information from a given image? In this work we focused on the widely used image format, the JPEG image compression standard.
In JPEG, when compressing an image, the image is divided into many small patches; the size of one patch is 8 by 8 pixels. On these patches, JPEG performs the DCT, the discrete cosine transform. Each location in this grid then corresponds to a different frequency value. If we draw the histograms of the coefficients from all the patches in the image, they will look like this.
In these histograms the center position holds the zero values, denoted by the blue line here. The first fact that we can exploit is that the DCT coefficients of real images are sparse. In general, more energy is concentrated in the top left corner, which has the low frequency values. In particular the zero-frequency value is called the DC coefficient, or DC component, and it carries significant visual information.
The second thing that we can use is that the signs of the coefficients are evenly distributed, because the histograms are mostly symmetric. So if we take out those signs, it is very hard for attackers to correctly recover the values. And, third, large magnitudes of the coefficients carry significant information. P3 exploits these facts and basically takes all three components out, to degrade the public part as much as possible.
Now I'm ready to describe how P3 encryption works. From the given image we get the quantized DCT coefficients. First we take out the DC components, which carry significant visual information. For the remaining AC coefficients, we cap their magnitudes using a fixed threshold T, and the excess above the threshold is extracted separately, together with the signs. The in-threshold part of the coefficients forms the public part, which becomes another JPEG file that can be stored and processed by the PSP's system. The extracted part of the coefficients is recombined with the DC components to form the secret part, which is small in size but carries the important and significant visual information.
The secret part is encrypted before it leaves the mobile device.
So note that we successfully eliminate from the public part the three important components that I discussed in the previous slide: the DC components, the large magnitudes by thresholding, and the signs, which are taken into the secret part.
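Here is a minimal, simplified sketch of that split for a single 8-by-8 block of quantized DCT coefficients. It is illustrative only: the exact handling of the signs and the packaging of the two parts are my assumptions, and entropy coding and the actual encryption of the secret part are omitted entirely.

```python
import numpy as np

def p3_split_block(coeffs, T):
    """coeffs: 8x8 integer array of quantized DCT coefficients; T: threshold."""
    public = coeffs.copy()
    secret = np.zeros_like(coeffs)

    # The DC component (position 0,0) goes entirely to the secret part.
    secret[0, 0], public[0, 0] = coeffs[0, 0], 0

    # AC coefficients whose magnitude exceeds T: the public part keeps only the
    # (unsigned) threshold value, while the secret part keeps the sign and the
    # excess magnitude, so neither the sign nor the true magnitude leaks.
    over = np.abs(coeffs) > T
    over[0, 0] = False
    public[over] = T
    secret[over] = coeffs[over] - np.sign(coeffs[over]) * T
    return public, secret

def p3_merge_block(public, secret):
    """Receiver-side recombination when the public part was NOT processed."""
    out = public.copy()
    out[0, 0] = secret[0, 0]
    over = secret != 0
    over[0, 0] = False
    # Restore sign and full magnitude for the thresholded coefficients.
    out[over] = secret[over] + np.sign(secret[over]) * public[over]
    return out
```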
The next question is: now that we have this P3 encryption algorithm, how well does this algorithm work in practice? We implemented this algorithm, and the first result shows the threshold versus storage trade-off. And this -- yes?
>>: [inaudible] previous slide. The way you cut at the threshold: if a coefficient is higher than the threshold, do you take the whole thing and store it, or only the delta above the threshold?
>> Moo-Ryong Ra: The delta above the threshold, and the remainder -- the threshold -- stays in the public part.
In this graph we applied our algorithm to the INRIA data set, which has 1,491 different images. The X axis of this graph is the P3 threshold used, and the Y axis is the normalized file size compared to the original.
Naturally the original size is one, in blue. The secret part is in red, the public part is in green, and the sum of the public and secret parts is in black. The results are very encouraging: even in the worst case, the total file size increases only by about 20 percent. For the individual file sizes, the sizes of the public and secret parts are almost even at threshold 1; after that, the volume moves to the public part as we increase the threshold.
Based on this result and the privacy evaluation on the public part, we set P3's operating range as thresholds 1 to 20. Then the next question is how much of my information will be exposed in the public part within this operating range.
I used one example image from the USC-SIPI data set, which has some canonical images. When we set the threshold to 20, which is the weakest privacy setting in our scheme, the image looks like this. If you are familiar with the data set, you may recognize some structure here, but depending on who you are, you may have a hard time recognizing what is in the image.
If I decrease the threshold, the image becomes more secure. This is 15, 10, 5, and 1. If we set the threshold to 1, it is almost impossible to visually recognize anything. For your reference, here is the original image; it looks like this. And I will present the secret part as I increase the threshold: this is the secret part with threshold 1, 5, 10, 15, 20. As I increase the threshold, naturally less information remains in the secret part and more volume goes to the public part.
So we have seen how P3 encrypts the image and its basic trade-offs. Yes?
>>: [inaudible] adversary who is trying to extract this information, or are you just using the standard --
>> Moo-Ryong Ra: I just use standard [inaudible].
>>: So it might be possible to [inaudible] more of the image if you try to do so?
>> Moo-Ryong Ra: We don't have [inaudible] of our secret [inaudible]. In the evaluation I will show our evaluation method; we use automatic recognition technology and so on.
>>: [inaudible] there is work on, for example, looking at statistics -- you know, higher order statistics of images -- to detect things like modification [inaudible], and all that work works because there is pretty strong structure in natural images. And I'm wondering whether -- my intuition is that one can apply similar techniques here, using priors about the relationships between nearby pixels in natural images, and recover a lot of the natural image from not very many bits, the LSB bits that you're [inaudible]. I'm thinking the adversary model probably matters a lot, because you made such strong assumptions about the set of images that can make it through your --
>> Moo-Ryong Ra: Right. [inaudible]. We haven't tried such image forensic techniques against our scheme yet.
So we have shown how P3 encrypts the image and its trade-offs. Then what about decryption? For decrypting the image, we face one very interesting challenge because of the cloud-side processing. Suppose, again, that Alice wants to share a photo with Bob. Since the public part is stored and processed by the PSP's system, the receiver will get the unprocessed secret part together with the processed version of the public part.
Then the challenge is: can we reconstruct the processed version of the original image using the information available on the receiver side?
If we could express the original image as a linear combination of the secret and public parts, this problem would be straightforward. But that is not the case in our setting, because our P3 encryption algorithm hides this information from the public part.
Then how do we solve this problem? As I mentioned, the original image is not just a linear combination of the secret and public parts. It turns out that the correct reconstruction of the processed original image must include a compensation term C, and our analysis shows that this compensation term can be derived from the secret part, which we already have on the receiver side. Therefore, P3 can handle any linear processing.
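Schematically -- this is my shorthand for the relationship described above, not the exact expression from the analysis -- if the PSP applies a linear operation $f$ to the public part, the receiver computes

$$
f(I) \;=\; f(P) \;+\; f(S) \;+\; C(S),
$$

where $I$ is the original image, $P$ and $S$ are the public and secret parts, and $C(S)$ is the compensation term computable from the secret part alone. If the split were exactly additive, $I = P + S$, linearity of $f$ would give $f(I) = f(P) + f(S)$ with no correction; the compensation term accounts for the thresholding in the split not being purely additive.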
And for photos, this linear processing covers many useful functions: scaling, cropping, sharpening, blending, and smoothing. Is that the answer to your question?
>>: [inaudible].
>> Moo-Ryong Ra: Based on this P3 encryption and decryption algorithm, we designed a P3 system that can work transparently with existing PSPs. P3 takes an interposition architecture: it requires a trusted proxy on the mobile device and an additional cloud-side storage space.
It would be ideal if we could store the secret part together with the public part on the PSP's system -- the JPEG standard does allow embedding application-specific information into the binary -- but in reality, most PSPs strip this application-specific information when they receive the photos from the users.
So we take this approach based on an external storage space. The on-device proxy performs P3 encryption and decryption when it uploads or downloads the photos, the cloud-side storage space stores the encrypted secret part, and the public part is stored and processed by the PSP.
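A minimal sketch of that proxy flow is below. It is illustrative only: the storage clients, the photo-ID scheme, and every function name here (put/get, aes_encrypt, p3_encrypt, ...) are hypothetical stand-ins, not the actual P3 implementation or any real PSP API.

```python
def upload_photo(jpeg_bytes, key, psp, store, threshold=1):
    # Split into a standard-JPEG public part and a small secret part.
    public, secret = p3_encrypt(jpeg_bytes, threshold)
    photo_id = psp.put(public)                      # PSP stores and processes the public JPEG
    store.put(photo_id, aes_encrypt(secret, key))   # external storage keeps the encrypted secret part
    return photo_id

def download_photo(photo_id, key, psp, store, width=None):
    processed_public = psp.get(photo_id, width=width)   # e.g. rescaled for this screen
    secret = aes_decrypt(store.get(photo_id), key)
    # Recombine, applying the same linear processing plus the compensation term
    # to the secret part so the result matches the processed original.
    return p3_decrypt(processed_public, secret, width=width)
```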
So the P3 architecture is very easy to implement with existing PSPs, and we don't require any change to the PSPs' infrastructure. So we -- sure.
>>: So if there are two copies of the same image, and then if I use the same threshold
and then the public part should be the same, right?
>> Moo-Ryong Ra: Right.
>>: So -- and I assume that the public part, in the sense that it's sparse, is kind of -- you can use the public part as a signature of an image.
>> Moo-Ryong Ra: Okay. [inaudible] scenario or --
>>: Right. So an attacker or whatnot, right? So let's say I have an image of Justin
Bieber and then you know the public part of the image, right? So let's say even if there
are some, I don't know, hundreds of millions of images out there, if the public part
encodes kind of unique -- unique bits corresponding to an image, then just ignoring the
public part you can identify what's the original image, right?
>> Moo-Ryong Ra: If you [inaudible] to secret part, that means you are my friend.
>>: [inaudible] having the public part, you can almost guarantee that you can identify -you can map to an original image.
>> Moo-Ryong Ra: That's --
>>: If the original image --
>>: If you had --
>>: -- available out on the Internet somewhere?
>>: Uh-huh. Yeah.
[brief talking over].
>>: [inaudible] software and hardware on the mobile device, [inaudible] mobile device
[inaudible] user is the only one that is [inaudible].
[brief talking over].
>>: Other scenarios, right. Like, so I want to find out who has posted the picture of Justin Bieber, right. For instance, I have a picture of Justin Bieber, I create a public part, then I can just scan the entire set of publicly available images and find out which public part of an image is the public part of the Justin Bieber picture --
>>: You're saying, like, if they're identical under this -- the hash of the public part is a hash of the image?
>>: Right. Right. And then this public part, since it's very sparse, looks like, you know, it could be a one-to-one mapping rather than a many-to-one mapping from the original image to that.
>> Moo-Ryong Ra: [inaudible] doesn't solve your problem. I assume this kind of image is not publicly available all the time. But I think one way to address that problem is, when we upload the public part to the PSP, we may inject random noise or add random overlapping images, depending on the user. That may hide --
>>: [inaudible] capability to reconstruct the original image with the secret part if you inject something --
>> Moo-Ryong Ra: So the receiver -- if the receiver knows the --
>>: [inaudible].
>> Moo-Ryong Ra: Yeah. But -- yeah. Valid point, yes.
>>: [inaudible] instead of a constant threshold you could have some randomized threshold and encode that into the private part [inaudible].
>> Moo-Ryong Ra: Another question?
>>: [inaudible] this isn't really encryption as much as reversible obfuscation. Because
encryption requires -- encryption implies that you have a key, and if you're not the
person who has a key, you can't, under most definitions of security for encryption, it
means that you -- without the key, you can't tell if the encrypted message is the
encryption of a given plaintext. But in this case, if you have a -- if you have a plaintext
you can tell whether this is the encryption of that plaintext.
>> Moo-Ryong Ra: So let me move on to the next step. We actually implemented the necessary components on the device and with the Facebook system. This prototype runs on top of one of the latest smartphones, the Samsung Galaxy S III. Here is a screenshot on the device, and the delay numbers on the device also. A receiver without the relevant password or key basically sees the gray image on the right side; depending on the threshold it may change, rather than showing the original on the left side. And the delay numbers are also moderate. So P3 is practical and can be implemented with a real system like Facebook. Sure?
>>: [inaudible] and you posted this image on the Facebook but you [inaudible] in that
case, because I should be able to see the picture too, right?
>> Moo-Ryong Ra: Right.
>>: So what do I need to -- so you need access to the [inaudible] on Facebook
[inaudible].
>> Moo-Ryong Ra: Right. Right.
>>: And you need this information to get the secret part, and somehow there needs to be some layer which combines these.
>> Moo-Ryong Ra: Right. Right. So in our system, in the previous slide, this P3 trusted proxy -- yeah, that part will do the encryption and decryption.
We evaluated P3's privacy aspects using PSNR and a representative set of computer vision-based algorithms. Essentially all the results say that the PSNR is low and all these recognition technologies become useless on the public part, so P3 preserves privacy.
In this talk I'm going to show two of the results, edge detection and face recognition. The first result is on edge detection: we applied Canny edge detection to the public parts. These images are from the USC-SIPI data set, the three canonical images. If we use a threshold of 1 and apply Canny edge detection, it looks like this; it's almost impossible to recognize anything. If we increase the threshold to 10, it looks like this.
Again, if you are familiar with the data set, you may recognize something, especially in the middle image, but it is still hard to recognize anything on the right side.
If we increase the threshold to 20, which is the weakest setting, it looks like this. In the next slide I present the original versions of these three images, together with the Canny edge detection results on the originals. They look like this. Okay.
The second result is on face recognition. We use the EigenFace algorithm with the color FERET database for the evaluation, and we use Colorado State University's face recognition evaluation system, which is basically designed for comparing different face recognition algorithms.
We examine the recognition performance under various settings: different probe sets from the database, different distance metrics, different P3 thresholds, and using the public parts as a new training set to mimic the attacker.
Here is the result of -- yes?
>>: [inaudible] canonical face did your [inaudible] system through the similar algorithm
you had [inaudible] or you keep the original face as the -- as the recognition algorithm?
>> Moo-Ryong Ra: I tried both. I'm going to present the result for the worst case for us, using the public part, so the faces are also trained using the public part of the training set. And, yeah.
Here is the result for the worst case. The X axis is the recognition rank and the Y axis is the cumulative recognition rate. I am following the methodology provided by the FERET database community.
The upper line uses the normal training and probe sets. The lower two lines use the public part for the training and probe sets, each with a different threshold; the green line uses a higher threshold and the red line uses threshold 1.
First, consider the point which gives the best recognition rate from the perspective of an attacker, at rank 50 and about a 40 percent recognition rate. Intuitively, what it means is that for an unknown face the right answer is among the top 51 candidate faces with 40 percent probability. All right?
If we consider just the top-rank recognition rate, which is these two points, the green line has about a 15 percent recognition rate and the red line has a 2 percent recognition rate. Note that even if an attacker gets the 15 percent recognition rate, she only has the public part, so it would be very hard to verify whether the result is right or not.
The other results, using different thresholds and the normal training set, show worse recognition rates than the green line here. So overall, the face recognition is broken.
So those results are not useful for the attacker's purpose. But there is a considerable amount of related work in this space again. Fully homomorphic encryption enables arbitrary processing on encrypted data, but it is too expensive to be used on high-dimensional data like photos, and it requires changing the PSPs' infrastructure.
The second set of work is in the privacy literature on video surveillance. They do masking, blurring, pixelation, scrambling of coefficients, et cetera, but they are either fragile against recognition techniques or they increase the file size too much.
And third, there is a considerable amount of related work in the selective encryption literature. They do useful things, but all of these works were done in the late 1990s, and at that time they just focused on reducing the amount of computation on the device.
So none of these can handle the full requirements of the P3 algorithm. For example, none of these existing algorithms can handle the reconstruction challenge due to the cloud-side processing. So P3 is a kind of selective encryption algorithm, but a unique one tailored to these novel requirements.
Summarizing P3: the cloud service providers are already providing useful processing for mobile devices, and P3 protects our privacy against the providers while maintaining the cloud-side processing. Yes?
>>: So do you have a definition for useful on the cloud-side processing [inaudible]?
>> Moo-Ryong Ra: As I described, it is the linear processing that we can handle.
>>: So let's say I'm [inaudible] and you have this technology where he needs to spend
a couple million dollar [inaudible] so that he gets [inaudible] 95 percent of the users are
[inaudible]. What would the argument to any of this [inaudible] providers be for
providing any of this stuff?
>> Moo-Ryong Ra: So what is the benefit for the [inaudible] or --
>>: No, what is the incentive for any cloud provider to do this?
>> Moo-Ryong Ra: So, right. The argument, for Facebook, for example, is that they may want more users. There are users with privacy concerns who are very reluctant to use this kind of sharing environment.
Then the providers could devise paid services for that kind of user to increase their user base. Right? That's the kind of argument that I have right now.
>>: [inaudible].
>> Moo-Ryong Ra: We have -- we don't -- I don't have the concrete numbers. But
there are -- you know, in this space, there are many startups nowadays. So actually
some -- yeah. Some people are interested in this direction.
So we have examined two examples of how we enable efficient processing and secure sharing of sensor data using the cloud. Now I will give an overview of the other two pieces of my work and conclude the talk. I also explored other interesting domains. The first is the emerging demand for large-scale sensor data collection and processing from a corpus of smartphone users: crowd-sensing is another capability that combines the power of the cloud with the sensors on smart mobile devices.
The key observation here is that there is a lack of support to automate these labor-intensive tasks. So I built a high-level programming framework for crowd-sensing applications. Now the users can just give a high-level description and the runtime takes care of the rest automatically.
And second, whenever we share large volumes of sensor data using the cloud, we have energy concerns. The observation here is that, given delay-tolerant mobile applications, the existence of multiple wireless network interfaces, and time-varying wireless network conditions, it may make sense to defer a transmission rather than sending the data immediately. So I designed an online algorithm that governs these transmission decisions, and the algorithm, called SALSA, can effectively trade off energy and delay by intelligently deferring transmission opportunities.
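As a rough illustration of the kind of rule such an algorithm could use, here is a minimal Lyapunov-style (drift-plus-penalty) defer-or-send sketch. This is my own generic illustration, not necessarily SALSA's exact decision rule, and the rate and power numbers are made up.

```python
def choose_action(queue_mb, links, V):
    """queue_mb: backlog of delay-tolerant data, in megabytes.
    links: list of (name, rate_mb_per_s, power_watts) currently available.
    V: knob trading energy savings (large V) against delay (small V)."""
    best_name, best_score = "defer", 0.0   # deferring costs nothing and serves nothing
    for name, rate, power in links:
        # Drift-plus-penalty score: backlog served per second (reduces delay)
        # minus V times the power cost of using this interface now.
        score = queue_mb * rate - V * power
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# A large backlog justifies sending even over 3G, while a tiny backlog seen
# only by 3G is deferred until WiFi appears or the backlog grows.
print(choose_action(50.0, [("wifi", 2.0, 0.8), ("3g", 0.2, 1.5)], V=1.0))  # -> wifi
print(choose_action(0.01, [("3g", 0.2, 1.5)], V=1.0))                      # -> defer
```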
So let me summarize my entire body of work at a high level. We have Odessa to enable mobile perception applications, which are data- and compute-intensive workloads.
With P3 we can protect users' privacy while maintaining the cloud-side processing.
With Medusa we enable large-scale sensor data collection and processing from smartphone crowds.
And with SALSA we can effectively trade off energy and delay when using delay-tolerant mobile applications.
At this point I want to thank my collaborators. Without their support I would not be here as a candidate today.
So finally, future work. In the future I want to broaden my research horizon and make our personal computing environment more efficient and secure. I categorize my future work into two directions. First, I'm interested in building infrastructure support for mobile devices, which includes common services for mobile devices like location and notification services, and also making mobile systems scalable and privacy preserving.
And second, I'm also very interested in making multimedia data sharing and processing secure and efficient in our personal computing environment. Examples include privacy-preserving video sharing and making heavy processing on video data efficient and secure in our personal computing environment.
Thank you. I will conclude my talk at this point, and I will be happy to take any more questions.
[applause]
>> Moo-Ryong Ra: Okay?
>>: Can you tell us a little bit about how you disseminate the secret data? I'm a little
curious about that. So you have -- you have this public and -- yeah, public and secret
part of this. Can you tell us a little bit about how you disseminate the secret?
>> Moo-Ryong Ra: Sure. The secret part of the image goes to the cloud-side storage. When you upload a photo, on the device you divide the image into two parts, and the public part will go to Facebook, for example, and the secret part will go to Dropbox, for example. Then when the receiver wants to see the photo, he downloads the public part from Facebook, which also gives him the unique photo ID, and with the photo ID he retrieves the secret part from Dropbox. All right? Then that way you can reconstruct the image from the secret and public parts together on the device side. And, yeah, that's how it works.
And for the key, we assume that the key is distributed offline. [inaudible]. Does that answer your question?
>>: Yes.
>> Aman Kansal: Okay. If there are no more questions, let's thank the speaker once again.
[applause]