>> Aman Kansal: It's my pleasure to welcome Moo-Ryong Ra from the University of Southern California. He'll be talking about cloud-enabled mobile sensing systems. A large part of this talk is, in fact, a preview of what the rest of the world will only get to hear at NSDI, so you get to hear it first. With that, welcome, Moo-Ryong. Take it from here. >> Moo-Ryong Ra: Thanks for the introduction, Aman. First of all, thanks for inviting me today as a candidate. I must say that it's my honor to give a talk here today. My name is Moo-Ryong Ra from USC, and today I'm going to talk about cloud-enabled mobile sensing systems, especially how to enable efficient processing and secure sharing of sensor data using the cloud. Nowadays almost everyone has one or more smart mobile devices. These mobile devices are not just [inaudible] but are deeply changing many aspects of our lives, from how we interact with each other to where we eat, where we meet, how we experience a visit to the doctor, how we pay bills, et cetera. With this many mobile devices, numerous applications have been developed, and these have become indispensable in our everyday lives. Compared to a desktop PC or laptop, one of the distinctive features of smart mobile devices is the presence of sensors. Most smart mobile devices already have a variety of sensors, such as camera, GPS, microphone, compass, and motion sensors like the accelerometer and gyroscope, and many others. These sensors on smart mobile devices provide rich contextual information to enable mobile applications. Based on those sensors, many useful mobile sensing applications have been developed. Video sharing, [inaudible] intelligent personal digital assistants, photo sharing, and location-based services are widely used, and some of them have humongous user bases. Facebook has over one billion monthly active users, and Google Maps for Mobile has more than a hundred million monthly active users. And note that all these applications are enabled by the cloud. These cloud-enabled mobile sensing systems and applications are the focus of this talk. All these applications use the cloud mainly because mobile devices are resource constrained. Their computing power is lower than a desktop PC or laptop, and they have smaller storage space. And although they are rapidly evolving, wireless networks are less reliable than wired networks and not always available. The battery problem is a well-known problem in the mobile community; for example, since the mid 1990s, battery energy density has improved only about two times every 10 years. So the cloud certainly gives a great opportunity for mobile devices because of its high availability, nearly infinite storage space, and millions of compute cores. There already exist many useful cloud-enabled mobile sensing systems and applications. The problem is that people always desire faster applications with more features, and there are growing concerns about security and privacy. So my thesis focuses on the system support to realize these growing demands and to resolve existing concerns about cloud-enabled sensing applications. In this context we have several challenges to overcome in sensor data sharing and processing. First, we often face performance problems when we deal with compute-intensive, data-intensive mobile workloads. And second, whenever we share large volumes of sensor data with others using the cloud, there is a tension between efficient sensor data sharing and privacy protection.
This is not an easy problem to solve immediately. And third, when we collect large volumes of data from a corpus of smartphone users, it is often very challenging to efficiently deal with these labor-intensive tasks. So the question is what kind of programming abstraction we need to address this problem. And lastly, whenever we share large volumes of data using mobile devices, the energy problems are always there. Given these challenges, my research goal is to enable efficient processing and secure sharing of sensor data using the cloud. As I described, my thesis work is tightly connected to sensor data sharing and processing, and I made an effort to overcome the challenges that I described in the previous slide. Odessa is about enabling mobile perception applications, with a focus on performance. P3 is about how to protect users' privacy against providers when sharing photos. Medusa is a high-level programming framework for crowd sensing. And SALSA is about how to trade off energy and delay when sharing large volumes of data. Before we go into the details I would like to briefly cover some technical aspects of my thesis research projects. In the Odessa project I used a data-flow programming model and built a runtime based on workload characterization. In P3 I developed an encryption and decryption algorithm based on signal and image processing techniques, and I built a system that uses a software interposition architecture to re-engineer the photo upload and download protocols of existing photo sharing service providers. In the Medusa project I designed a domain-specific language and built a partitioned runtime across mobile devices and the cloud. For those who are interested, the implementation is available on Google Code. And in SALSA I exploited application delay tolerance to design a network interface selection algorithm using Lyapunov analysis. The underlying system has been deployed at Los Angeles International Airport and at other universities and companies for more than three years. In addition, when I was an intern here at MSR, I focused on continuous sensing applications and characterized the workload based on a simulator and actual measurements on two very different types of processors. In today's talk, I'm going to cover the first two projects in depth, and I'm going to summarize the other two pieces of work at the end of the talk. So here is the outline of the talk. I already introduced my problem space and my research. I am moving to the first part: how should we offload computation to the cloud to enable demanding applications, and why are the existing approaches not directly applicable? As I already mentioned, smartphones have sensors, and these sensors have enabled a set of sensing applications such as activity recognition, health monitoring, traffic monitoring, location-based services, et cetera. Recent advances in the computation, sensing, and communication capabilities of smart mobile devices create a new class of applications that we call interactive perception applications. Unlike the other sensing applications in this slide, mobile perception applications make use of high-data-rate sensors like cameras. Here are some examples of such applications. We built three prototype interactive perception applications. The first one is a face recognition application.
At a conference, for example, [inaudible] people's faces to immediately recognize the faces in the room. The second application is an object and pose recognition application, which will enable [inaudible]. And the third application is a gesture recognition application to control a tablet device using simple hand gestures. These emerging applications have the following characteristics. They are interactive, typically requiring crisp response times on the order of 10 to 200 milliseconds. They are high data-rate because of the video data, which is real-time video data. And they are compute intensive because computer vision-based algorithms are typically used. So when we run these applications on mobile devices, we have significant performance problems. To understand this, two measures of goodness characterize the requirements of interactive perception applications. Throughput is how many frames the system can process per second, often denoted as FPS, frames per second. The next is the end-to-end latency of the compute pipeline for a single frame, the makespan, which is basically the response time of a given recognition task. In general, we want high throughput and low makespan. To show how severe the performance problem is, here is some experimental data on the throughput of the applications. Each application runs locally on a mobile device. As you can see from the video on the right side, this is too slow. Note that the number for the object and pose recognition application in the table is actually 10 times slower than the video playing on the right side. How do we solve this performance problem? Fortunately, these applications are naturally expressed as a data flow graph, as in this slide. We have the mobile device on the bottom and the cloud infrastructure on the top, and they are connected through the network. The first technique that we can use is offloading, which moves demanding stages from the mobile device to the cloud to reduce the execution time. The second technique that we can use is parallelism. By increasing the number of workers for demanding stages we can further reduce the execution time significantly, and additionally we can process multiple frames simultaneously using pipeline parallelism. So given these techniques, our focus in Odessa is, in the context of compute- and data-intensive mobile applications which can be structured as data flow graphs, how do we design -- sure? >>: [inaudible] it seems like the mobile devices are suddenly getting fast. Like people don't know what to do with, like, four cores on the phone. So why won't just that aspect solve the problem, rather than taking everything to the cloud? >> Moo-Ryong Ra: Can you rephrase the question? I'm not sure I -- >>: [inaudible] performance of the device, right. I'm saying devices are getting faster. You wait a couple of years and then you can run the application on the device itself rather than worrying about this new structure [inaudible]. >> Moo-Ryong Ra: Oh, okay. >>: Which actually has some fundamental limits around how far away the cloud is. >> Moo-Ryong Ra: Okay. I think these three perception applications may be easily enabled by mobile devices three to four years in the future. But I think people will always create more demanding applications with higher accuracy that exceed the capability of mobile devices. And there is always the energy problem.
So I think there is a need -- there will be a need to use the cloud, however the device performs. That's my opinion. >>: Okay. >> Moo-Ryong Ra: Does that answer your question? >>: It's your opinion you rendered, so I don't know what to say [laughter]. >> Moo-Ryong Ra: I see. Okay. So, given the application structure, the data flow structure, how do we design the underlying system that uses offloading and parallelism techniques together to enable such applications? This is the high-level focus of this work. To achieve the goal, three fundamental questions need to be answered. First, what factors impact offloading and parallelism between mobile devices and the cloud? Second, how do we improve the throughput and makespan simultaneously, again by using offloading and parallelism techniques together? And third, how much benefit can we get compared to other strategies? To understand the problem space we measured the workload and identified that there is a lot of variability in the system. The graph on the right side shows the result for the object and pose recognition application. The X axis is the frame number, and the Y axis shows the number of SIFT features detected on the bottom and the makespan values on the top. >>: I missed it. Can you tell me again what makespan is? >> Moo-Ryong Ra: Makespan is the end-to-end compute latency over the entire pipeline for a single frame. So the upper graph shows makespan values. There was huge variability in input complexity, which caused significant fluctuation in the makespan values. For example, if we look at frame number 200, the makespan value is relatively low because the scene complexity is relatively moderate. But if we look at frame number 300, it has a much longer makespan because there are more SIFT features detected in the image. Yes? >>: So to interpret that graph, you're seeing about 12 seconds of latency per frame? >> Moo-Ryong Ra: Yeah. In this application everything runs locally, very, very slowly. So I will show how my system improves the performance by a huge margin. >>: [inaudible] average, right? I mean, it's like some of them are 10, some of them are 30? >>: Right. But looking at that red line, it would take 12 seconds to get your frame back. >> Moo-Ryong Ra: And this is fairly [inaudible]. >>: I'm sorry. >>: [inaudible]. >>: What did you just say about it running on mobile? >> Moo-Ryong Ra: Every stage runs locally. >>: On the phone? >> Moo-Ryong Ra: On the phone. >>: Okay. Got it. >> Moo-Ryong Ra: So why this huge makespan? >>: What's the capability of the phone? Which phone did you use for the experiment? >> Moo-Ryong Ra: It was a netbook at that time, 1.4 gigahertz, single core. The Galaxy S III has a quad-core 1.4 gigahertz CPU now, but at that time we didn't have that. Questions? So from this we learned that the system should adapt to input variability at runtime, because of the huge variability in the input. In addition to input variability we also explored other dimensions which can affect the performance. These include different mobile devices, network conditions, and different choices of parallelism. These additional dimensions introduce more variability into the system, so we conclude that offloading and parallelism decisions should be adaptive to input and platform variability. Based on the lessons from the measurement, we designed the Odessa runtime system. Let me give you a high-level description of Odessa. Odessa is a runtime on top of Sprout.
Sprout is a distributed and parallel runtime engine developed at [inaudible]. Odessa uses the mechanisms provided by Sprout. The Odessa runtime is mainly comprised of two components: the application profiler and the decision engine. The application profiler delivers the necessary statistics to the decision engine using a lightweight piggybacking mechanism, and thereby the decision engine can adapt offloading and parallelism decisions to improve throughput and makespan simultaneously. The decision engine runs on the mobile device. This means some part of the data flow graph will be placed on the cloud if necessary, or a compute stage might be offloaded from the mobile device to the cloud, or moved from the cloud back to the mobile device. It can also spawn more workers for demanding stages. Then how are the decisions made? Let's look at how Odessa makes these decisions. When the application starts, the entire pipeline is on the smartphone. >>: [inaudible]. >> Moo-Ryong Ra: Sure. >>: [inaudible] being written for this framework, or are you doing all of this automatically? >> Moo-Ryong Ra: I would say written for this framework; the application developer should provide the data flow structure to the runtime. Again, the smartphone is on the bottom and the cloud infrastructure is on the top, and they are connected through the network. Based on the profiler data, the decision engine knows stage A is a bottleneck, and then it estimates the migration cost and the expected execution time on the cloud, and it offloads the stage only if the remote execution cost is less than the local execution cost. After that, the decision engine again identifies stage B as a bottleneck and offloads it. After that, it spawns one more worker for stage B, since stage B is still the bottleneck here. And at some point [inaudible] could be a bottleneck. Then the system must estimate the offloading possibilities on both ends and take the relevant action. In this particular example, the system migrates a [inaudible] stage to the cloud. So overall the decisions are incremental, and so it adapts quickly to input and platform variability. Before talking about the performance results, here are the actual data flow graphs for our three prototype applications. These applications run on top of Odessa. Yes? >>: You said that quickly. Can you just give me a [inaudible]. >> Moo-Ryong Ra: The decision making takes only -- less than two milliseconds. So it is quite quick, so we can make frame-level -- >>: In the example you started with where you moved those three modules up into the cloud, how many frames would it take for that to happen? >> Moo-Ryong Ra: It depends on the parameter setting. We actually profile the execution time of every frame using our profiler. We set our window size to 10, so we see the statistics for the last 10 frames and make the decision. >>: I have a question. So when you say adaptive, I'm just trying to understand. Does it mean that right now I'm on WiFi, I walk around the building and I don't find WiFi any more, I'm on 3G -- >> Moo-Ryong Ra: Uh-huh. That -- >>: It would immediately switch -- it would immediately switch to local processing? >> Moo-Ryong Ra: Yeah. >>: Okay. >> Moo-Ryong Ra: It might bring the compute stage back to the mobile device. >>: And how long does that take? Because you have to figure out that latencies have increased. >> Moo-Ryong Ra: Yeah. We will see the latest 10-frame statistics.
So usually -- it depends -- as long as we don't lose the connectivity, within five to eight frames we can [inaudible]. >>: [inaudible]. >> Moo-Ryong Ra: As we can see, all these applications have varying structures and different numbers of computationally demanding stages. >>: How difficult would it be -- so this is a lot of complication. How difficult would it be for me, as a programmer, to go in and add an annotation to say, if the latency -- if you're on 3G run this locally, if not run this on the cloud, or vice versa, for a handful of stages? I guess the question I'm getting at is how complicated are these applications, and do they actually require your adaptive mechanism? >> Moo-Ryong Ra: The application developer does not need to know these dynamics. They just provide the fine-grained data flow structure, and the runtime takes care of the rest automatically. >>: Yeah. No, I understand that. But my question was how complicated are each of these pipeline stages? Could I have -- if I were an expert developer, maybe I don't want to use your framework, could I come along and just provide those annotations to use the underlying Sprout framework to do this statically? >> Moo-Ryong Ra: I compare our performance with domain experts later, but the domain experts cannot make every decision correctly. Actually, this slide shows the [inaudible]. So our main problem was performance. Let's compare Odessa's performance with other strategies. We compare Odessa with three other competitors as well as one idealized strategy that is optimized by an offline method. Local runs every stage locally on the mobile device. Offload-all runs the stage that captures frames, as well as the stage that displays the result, on the mobile device, and all other stages run on the cloud. Domain-specific uses the partition chosen by the domain experts, which are the application developers in our case. And the last competitor is the offline optimizer. It basically searches every possible partition and picks the one that gives the best result. Since the computation required is too expensive and it requires statistics on all possible partitions, it cannot be done online but has to be done offline. To remind you, for throughput higher is better; for makespan lower is better. Yes? >>: This question is about what this data represents. Are these means or medians? What is the benchmark? What is the fraction of time spent in different network conditions? >> Moo-Ryong Ra: These results are for the object and pose recognition application, and we used the best quality network, a hundred megabits per second, for this experiment, with the 1.4 gigahertz CPU netbook as the client. For this experiment we ran object and pose recognition on the mobile device at the beginning, waited until the partition was saturated, and then took the average frame rate, the average throughput, and the average makespan after a hundred frames. >>: If it's not [inaudible] bandwidth, why wouldn't [inaudible] give you the highest frames per second? >> Moo-Ryong Ra: Why does offload-all not give the highest throughput? That seems to be your question? >>: Yeah. If you've got effectively a zero latency or very low latency network with unlimited resources, and you've got a hundred megabit network, why would that -- why would it have lower frames per second than Odessa, for example? >> Moo-Ryong Ra: Okay.
Why Odessa performs better than offload-all or the domain-specific strategy, that's the question. The reason is, I think, two things. One is the parallelism choice, and the other is the way the application is partitioned across the available resources. For parallelism, the domain-specific strategy makes a wrong decision. In terms of pipeline parallelism, the domain expert, the application developer, doesn't know the right number of tokens in the pipeline, that is, how many frames the system should process simultaneously. That decision depends a lot on the device capability, so it should be based on the actual profile rather than a number fixed at the beginning. That's one very crucial reason for this performance difference. The amount of data parallelism was also important. Offload-all uses just a single instance of the detection and recognition stages, whereas Odessa spawns multiple workers for such demanding stages. Is that the answer to your question? >>: I guess I don't fully understand what these stages do. So you're saying there's a lack of parallelism in offload-all? >> Moo-Ryong Ra: Basically, yes. >>: So would you say it's fair to say that offload-all was implemented poorly, or -- >> Moo-Ryong Ra: Offload-all chooses the wrong degree of data parallelism and pipeline parallelism. For example, in the object and pose recognition application there are three demanding stages: SIFT feature extraction, model matching, and clustering. But domain-specific and offload-all use just one instance, one worker thread, for SIFT feature extraction, clustering, and matching. That caused the huge performance difference. And the other is pipeline parallelism. When you run a pipeline, you may have 10 end-to-end compute stages, and deciding the right number of frames in the pipeline at a given time is not a trivial decision. It should be based on actual profile data and so on. >>: [inaudible] asking is that all the changes you're talking about in Odessa don't sound adaptive, they sound kind of like better programming in a way, or better use of the data. And if those same things could happen in offload-all, would offload-all be as fast as Odessa? Or is there something going on sort of almost in realtime, adaptive, which seems to be the special thing about Odessa? >> Moo-Ryong Ra: Right. >>: That was -- >> Moo-Ryong Ra: [inaudible]. >>: Is making it different than offload-all. >> Moo-Ryong Ra: So what's your -- >>: I guess [inaudible] doesn't seem to me like an optimal parallel setup. If I figured the cloud is free and I just want to burn as many resources as possible, you would presume I would go for the maximum level of parallelism and just burn the heck out of the CPU units in the cloud. That would seem like one of the naive strategies I'd want to compare to. It might not be terribly efficient in terms of use of CPU resources if I'm throwing away work that would be useful in a less parallel situation. But that would certainly -- I mean, maxing out parallelism would seem to be the best way to get the maximum frames per second. >> Moo-Ryong Ra: One problem is they don't know what the right maximum degree of parallelism is for a given environment. That's one thing. So let me show how I decide the pipeline parallelism.
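To make the profile-driven adaptation discussed over the last few slides concrete, here is a minimal sketch, in Python, of the kind of greedy decision loop being described: identify the bottleneck stage from the latest profiling window, offload or pull back a stage only if the estimated cost is lower, and otherwise add a worker. The structures and names (`Stage`, `adapt_once`, the cost fields) are hypothetical illustrations, not the actual Sprout/Odessa API, and the pipeline-parallelism control just mentioned (how many frames are in flight) is omitted.

```python
# Hypothetical sketch of an Odessa-style greedy adaptation step (not the real Sprout/Odessa API).
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    on_cloud: bool = False
    workers: int = 1
    local_time: float = 0.0      # profiled per-frame execution time on the phone (ms)
    cloud_time: float = 0.0      # estimated per-frame execution time on the cloud (ms)
    transfer_time: float = 0.0   # time to ship this stage's input over the network (ms)

    def current_time(self) -> float:
        # Per-frame cost of this stage where it currently runs, shared among its workers.
        base = self.cloud_time if self.on_cloud else self.local_time
        return base / self.workers

def adapt_once(stages: list[Stage]) -> None:
    """One decision step: find the bottleneck stage over the last profiling window
    and either offload it, pull it back, or give it one more worker."""
    bottleneck = max(stages, key=lambda s: s.current_time())

    if not bottleneck.on_cloud:
        # Offload only if the estimated remote cost (compute plus data transfer) beats local cost.
        remote_cost = bottleneck.cloud_time / bottleneck.workers + bottleneck.transfer_time
        if remote_cost < bottleneck.current_time():
            bottleneck.on_cloud = True
            return
    else:
        # Symmetric check: bring the stage back if local execution would now be cheaper.
        local_cost = bottleneck.local_time / bottleneck.workers
        if local_cost < bottleneck.current_time() + bottleneck.transfer_time:
            bottleneck.on_cloud = False
            return

    # Otherwise, increase data parallelism for the bottleneck stage.
    bottleneck.workers += 1

# Example: re-run the decision once per 10-frame profiling window.
pipeline = [Stage("sift", local_time=900, cloud_time=120, transfer_time=40),
            Stage("match", local_time=600, cloud_time=80, transfer_time=10),
            Stage("display", local_time=5, cloud_time=5, transfer_time=30)]
for _ in range(5):
    adapt_once(pipeline)
print([(s.name, s.on_cloud, s.workers) for s in pipeline])
```

The point of the sketch is that every decision is incremental and driven by the latest window of measurements, so when input complexity or network quality shifts the bottleneck, the loop reacts on the next window.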
>>: [inaudible] offload, onload, is that something that you implemented, or were you using some other system that just, you know, did the offloading? >> Moo-Ryong Ra: I use the mechanisms provided by Sprout. They provide the basic offloading structures. So let me show -- I think the right setting depends on the application, since it is adaptive to a given environment, so the degree of pipeline parallelism should change accordingly, right? But the domain experts and offload-all cannot know this right degree of pipeline parallelism. So in my experiment offload-all doesn't use the maximum degree of parallelism. That's why the performance is so low. >>: So it was kind of set at a fixed low level, whereas your system will vary it [inaudible] and it will go higher than the level that [inaudible]. >>: What if you just picked a higher level of -- would that suffer badly ever? >> Moo-Ryong Ra: A higher level -- >>: A higher level of parallelism [inaudible]. >> Moo-Ryong Ra: Then the makespan would suffer, because all the frames will wait before the bottleneck stage. So we need to be careful about choosing the right degree of pipeline parallelism. >>: Odessa is doing three different things you described so far in terms of improving this. One is deciding which pipeline stages to offload to the cloud. >> Moo-Ryong Ra: Okay. >>: Secondly, it's deciding what degree of parallelism to use at each stage. >> Moo-Ryong Ra: Right. >>: And thirdly, it's making adaptive decisions about those as conditions change. >> Moo-Ryong Ra: Right. >>: Can you give us a breakdown for this example you've been showing us, the one on the previous slide before you jumped here? Would those things matter? It seems like in particular the decision about which stages to offload is not relevant. The reason that Odessa is beating offload-all is not because there's some stage that's really important to do on the client, I'm guessing. I may be wrong. Can you give us a breakdown of which of those things matters in this example? >> Moo-Ryong Ra: In this example -- >>: The example you had before, that chart. >> Moo-Ryong Ra: Well, this is an example. Let me show the resulting partitions. This is the result for the object and pose recognition application. The resulting partitions are something like this. There are three demanding stages, but Odessa offloads only two of them, increases the data parallelism like this, and controls the pipeline parallelism as well. The notable difference is that it executes the clustering stage locally, so Odessa can use more resources on the cloud for the other stages. Compared to offload-all and the domain-specific strategy, which may use whatever data parallelism we set, which may be wrong, Odessa uses the necessary computation resources locally and utilizes the cloud resources in the right way. >>: Are you saying that this is actually optimal here, doing four network round trips rather than taking that middle stage and pushing it out to the cloud -- that the performance would be worse if that middle blue rectangle were pushed up onto the cloud? >> Moo-Ryong Ra: That depends on the amount of data that will be transmitted between the stages. So in this -- >>: How could that possibly be worse than that? >> Moo-Ryong Ra: Actually, the Odessa algorithm works based on the bottleneck stage. I measure every stage's execution time.
The execution time of the blue rectangle, the delay of every network edge, and the execution time of the red rectangle as well, and I try to reduce the execution time of the bottleneck link or stage. So I am fairly sure this is better than the other partitions. Right? >>: So you're skeptical of that result? [laughter]. >> Moo-Ryong Ra: I'm not skeptical about this result. >>: I am not as skeptical of this result. >> Moo-Ryong Ra: This number is -- >>: If you pushed that rectangle -- if you pushed that blue rectangle up there, you're going to save two network round trips -- excuse me, you're going to save one network round trip, two network hops, and you're going to be able to use a higher performance core up in the cloud than you are down on your client machine. It may be a small win, but it's got to be a win in terms of performance. >> Moo-Ryong Ra: But that depends on the congestion on the cloud side, right? If you offload the middle stage to the cloud, maybe the performance is a little better. But the throughput is governed by the bottleneck stage and therefore will not increase. Right? I'm trying to optimize both the makespan and the throughput simultaneously; whatever the cost is, offloading that single stage to the cloud may not increase the throughput at all, because the bottleneck execution time -- >>: [inaudible] right? It certainly will not hurt it. It would not be any worse. >> Moo-Ryong Ra: Yeah, maybe. But I'm not saying my resulting partition is globally optimal; I just improve both metrics simultaneously. >>: So your assumption is that the cloud itself could be [inaudible] at some point? >> Moo-Ryong Ra: Yes. >>: [inaudible]. >> Moo-Ryong Ra: Right. Any other questions? Okay. Then I will get back to the results. Odessa performs well against the three other competitors, and even compared to the offline optimizer it gets comparable throughput. There is a considerable amount of related work in this space. The first set of approaches uses integer linear programming. The second set is based on graph-based partitioning methods that optimize a custom utility function. The third approach uses static partitioning schemes, where the application partitions are determined at compile time. And the fourth switches between pre-specified partitions, provided either by the application developers or by domain experts. These do not provide the relevant solution for our problem because the objectives are different, and because of the huge variability, static or fixed partitioning schemes will not work. And none of these considers the parallelization of demanding stages on mobile devices. So Odessa achieves our goal using a dynamic runtime which adapts to input and platform variability at runtime. To summarize Odessa: some emerging applications are too heavy to run on mobile devices, and Odessa enables interactive perception applications by dynamically adapting to input and platform variability. Now I'm moving to the second piece of my work. We just enabled the mobile perception applications, but when you want to share the [inaudible] results using the cloud, you may have a privacy problem. This work is about how to protect our privacy when sharing photos. Cloud-based photo sharing service providers, PSPs, are becoming very popular nowadays. People use various mobile devices to take photos and upload them to PSPs over wireless networks. But here we have serious privacy concerns. Here is an example.
Suppose Alice has a secret picture of a nice guy and wants to share it with friends using a PSP. The first possible privacy concern in this situation is the unexpected exposure of the photo, which could happen either through accidental bugs or careless system design by the PSP. The second problem is that we don't have any mechanism to prevent a PSP's data abuse. In this particular example, the PSP may run its best possible inference algorithms on the photo and may conclude the following. This is obviously not a desirable scenario for Alice, but currently there is no way to prevent it. We need to completely trust PSPs in order to share our photos. And I'm not making up artificial threats; these are real ones. Here are four recent news stories. The Photobucket system unexpectedly exposed users' private photos because of a naive system design. The problem was that their photo URLs were too easy to guess, so all an attacker needed to know was a user's ID. Facebook had a face recognition API in their web-based API specification, but because of privacy issues, partially described in this slide, they eventually shut down the API. And not long ago, Instagram tried to change their terms of service, saying that they could sell users' photos without their permission. It caused a big ruckus, so the company reverted back to the original terms. So these privacy concerns are real to many users. On the other hand, PSPs provide useful processing for mobile devices. Again, suppose Alice has a brand-new smart camera, takes a high resolution photo, and uploads it to a PSP. Alice's friends may have mobile devices with different screen sizes. In order to provide a desirable user experience, the PSP will scale the image appropriately and send it to the different mobile devices. These types of processing, the so-called image scalability service, are very useful for users to reduce network latency and [inaudible]. Also, it is possible for PSPs to perform other kinds of processing, for example, [inaudible] operations to enhance image quality. So the cloud is already doing useful processing for mobile devices, and people get tremendous benefits from that. And the problem is how we can have both privacy protection and cloud-side processing. Solving this problem, especially under these particular constraints, is quite tricky. As a potential solution, we might immediately think: why not just encrypt everything? But as a result, for example, mobile devices would have to download full resolution images, regardless of their screen size and storage limitations. This is [inaudible]. So if we use full encryption, we lose the image scalability service as well as the other benefits provided by the providers. Before describing our approach, I will describe our goals, threat model, and assumptions. Again, our goal is to protect users' privacy while keeping the cloud-side processing. Our threat model covers two categories of threats: one is unauthorized access, and the other is the application of automatic recognition technologies on users' photos. Our trust boundary is between the mobile device and the cloud, which means that we completely trust the mobile device's hardware and software, including sensors, operating system code, apps, et cetera. And we don't trust the other parties, including eavesdroppers on the network and the PSPs.
For PSPs, we assume that they are honest but curious, which means they will not change what they are doing with photos no matter what, but they will try to infer users' information using their best possible methods. Now I will describe our approach at a high level. Again, suppose Alice wants to share a photo with Bob. From the photo we first extract a small part that carries very important visual information, which we call the secret part. One can think of it as the most significant bits of the entire image; I will describe exactly how we construct the secret part after this slide. After removing the secret part, what remains is large in volume but carries much less visual information; we call it the public part in this talk. Again, one can think of it as the least significant bits of the entire image. The public part is a standard JPEG image, so PSPs can accept it without changing their systems. The secret part is encrypted and, ideally, embedded inside the public part, and then the image is uploaded to a PSP. This way PSPs can perform useful processing on the public part; in this particular example, they scale down the image to serve a mobile device. When Bob wants to see the photo, he downloads both the public and secret parts and combines the two to reconstruct the image. To enable this capability we have several important requirements. Our algorithm has to ensure privacy on the public part, and the storage overhead should be minimized. Our encryption and decryption processes should be lightweight. The public part should maintain standard compliance, in our case JPEG. The cloud should be able to process the public part appropriately. And the resulting system should transparently work with existing PSPs. Overall, our algorithm and system, collectively called P3, achieve these goals and requirements, and I will describe why our system and algorithm work in later slides. Yes? >>: So [inaudible] you assume what they can do to your data? >> Moo-Ryong Ra: Yes, I will describe it later. So before describing the actual encryption and decryption algorithm, I want to share the intuition behind the P3 algorithm. How do we extract small but important information from a given image? In this work we focused on the widely used image format, the JPEG image compression standard. In JPEG, when compressing an image, the image is divided into many small patches; the size of one patch is 8 by 8. On these patches, JPEG performs the DCT, the discrete cosine transform. Each location in this grid then corresponds to a different frequency value. If we draw the histograms of all the coefficients from all the patches in the image, they will look like this. In the histogram, the center position holds the zero values, denoted by the blue line here. The first fact that we can exploit is that the DCT coefficients of actual images are sparse. In general, more energy is concentrated in the top left corner, which has the low frequency values. In particular, the zero-frequency value is called the DC coefficient, or DC component, and it carries significant visual information. The second thing that we can use is that the signs of the coefficients are evenly distributed, because the histograms are mostly symmetric. So if we take those values out, it is very hard for attackers to correctly recover them. And third, certainly the magnitudes of the coefficients carry some information.
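These three observations motivate the coefficient split described in the next part of the talk. As a concrete illustration, here is a small Python sketch of one plausible reading of such a split for a single 8x8 block of quantized DCT coefficients. It is hypothetical code, not the actual P3 implementation, which works inside the JPEG codec and whose exact handling of the DC term, signs, and threshold may differ.

```python
import numpy as np

def split_block(coeffs: np.ndarray, T: int):
    """Split one 8x8 block of quantized DCT coefficients into a public and a secret
    part, in the spirit of the scheme described in the talk (a sketch, not the exact
    P3 rules): the DC component, the signs, and any magnitude above the threshold T
    go to the secret part; the public part keeps only AC magnitudes clamped at T."""
    secret_dc = coeffs[0, 0]                 # DC component: significant visual information
    ac = coeffs.copy()
    ac[0, 0] = 0

    magnitude = np.abs(ac)
    public = np.minimum(magnitude, T)        # clamped, unsigned AC magnitudes (public)
    secret_residual = magnitude - public     # the "delta" above the threshold (secret)
    secret_signs = np.sign(ac)               # signs are hidden from the public part

    return public, (secret_dc, secret_residual, secret_signs)

def merge_block(public: np.ndarray, secret) -> np.ndarray:
    """Receiver-side reconstruction from the two parts."""
    secret_dc, secret_residual, secret_signs = secret
    block = secret_signs * (public + secret_residual)
    block[0, 0] = secret_dc
    return block

# Tiny round-trip check on a random block of quantized coefficients.
rng = np.random.default_rng(0)
block = rng.integers(-50, 50, size=(8, 8))
pub, sec = split_block(block, T=10)
assert np.array_equal(merge_block(pub, sec), block)
```

The round-trip check only verifies that the split is lossless for the legitimate receiver; the privacy argument in the talk is that the public block, with its DC value, signs, and above-threshold magnitudes removed, reveals little on its own.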
To exploit these facts, P3 basically takes all three components out, to degrade the public part as much as possible. Now I'm ready to describe how P3 encryption works. From the given image we get the quantized DCT coefficients. First we take out the DC components, which carry significant visual information. For the remaining AC coefficients, we cut their magnitudes using a fixed threshold T, and the two resulting pieces are treated separately, together with their signs. The within-threshold part of the coefficients forms the public part, which becomes another JPEG file; it is stored and processed by the PSP system. The above-threshold part of the coefficients is recombined with the DC components to form the secret part, which is small in size but has important and significant visual information. The secret part is encrypted before it leaves the mobile device. Note that we have successfully eliminated the three important components that I discussed in the previous slide: the DC components and the magnitudes by thresholding, and the signs taken into the secret part. The next question is, now that we have this P3 encryption algorithm, how well does it work in practice? We implemented this algorithm, and the first result shows the threshold-versus-storage trade-off. And this -- yes? >>: [inaudible] previous slide. The way you cut at the threshold, if a coefficient is higher than the threshold do you take the whole thing and store it, or only the delta part above the threshold? >> Moo-Ryong Ra: The delta part above the threshold; the coefficient remains clamped at the threshold in the public part. So in this graph we applied our algorithm to the INRIA data set, which has 1491 different images. The X axis of this graph is the P3 threshold used, and the Y axis is the normalized file size compared to the original. Naturally the original size is one, shown in blue. The secret part is in red, the public part is in green, and the sum of the public and secret parts is in black. The results are very encouraging: even in the worst case, the total file size increases only by 20 percent. For the individual file sizes, the sizes of the public and secret parts are almost even at threshold 1; after that, the volume moves to the public part as we increase the threshold. Based on this result and the privacy evaluation on the public part, we set P3's operating range as 1 to 20. Then the next question is how much information might be exposed in the public part within our operating range. I use one example image from the USC-SIPI data set, which has some canonical images. If you set the threshold to 20, which is the weakest privacy setting in our scheme, the image looks like this. If you are familiar with the data set, you may recognize some structure here, but depending on who you are, one may have a hard time recognizing what is in the image. If I decrease the threshold, the image becomes more secure. This is 15, 10, 5, and 1. If you set the threshold to 1, it is visually almost impossible to recognize anything. For your reference, the original image looks like this. And I will present the secret part as I increase the threshold: this is the secret part with threshold 1, 5, 10, 15, 20. As I increase the threshold, naturally less information remains in the secret part and more volume goes to the public part. So we have seen how P3 encrypts the image and its basic trade-off. Yes?
>>: [inaudible] an adversary who is trying to recover this information, or are you just using the standard -- >> Moo-Ryong Ra: I just use the standard [inaudible]. >>: So it might be possible to recover more of the image if you try to do so? >> Moo-Ryong Ra: We don't have [inaudible] of our secret [inaudible]. In the evaluation I will show our evaluation method; we use automatic recognition technology and so on. >>: [inaudible] a particular encryption, there is a [inaudible] body of work looking at statistics, you know, higher order statistics of images, detecting things like modification [inaudible], and all that work works because there is pretty strong structure in natural images. And I'm wondering whether -- my intuition is that one can apply similar techniques here, with priors about the relationships between nearby pixels in natural images, and recover a lot of the image from not very much, from the LSB bits that you're [inaudible]. I'm thinking the adversary model probably matters a lot, because you made such strong assumptions about the set of images that can make it through your -- >> Moo-Ryong Ra: Right. [inaudible]. We haven't tried such image forensic techniques against our scheme yet. So we have shown how P3 encrypts the image and its trade-off. Then what about decryption? For decrypting the image, we face one very interesting challenge because of the cloud-side processing. Suppose, again, that Alice wants to share a photo with Bob. Since the public part is stored and processed by the PSP system, the receiver will get the unprocessed secret part together with the processed version of the public part. Then the challenge is: can we reconstruct a processed version of the original image using the given information on the receiver side? If we could express the original image as a linear combination of the secret and public parts, this problem would be straightforward. But that is not the case in our setting, because our P3 encryption algorithm hides this information from the public part. Then how do we solve this problem? As I mentioned, the original image is not just a linear combination of the secret and public parts. It turns out that the correct reconstruction of the processed image must include a compensation term C, and our analysis shows that this compensation term C can be derived from the secret part, which we already have on the receiver side. Therefore, P3 can handle any linear processing, and for photos linear processing covers many useful functions: scaling, cropping, sharpening, blending, smoothing. Is that the answer to your question? >>: [inaudible]. >> Moo-Ryong Ra: Based on this P3 encryption and decryption algorithm, we designed a P3 system that can transparently work with existing PSPs. P3 takes an interposition architecture that only requires a trusted proxy on the device and an external cloud-side storage space. It would be ideal if we could store the secret part together with the public part on the PSP system, and the JPEG standard does allow embedding application-specific information into the binary. But in reality, most PSPs will eliminate this application-specific information when they receive photos from users. So we take this approach based on the external storage space.
The on-device proxy performs P3 encryption and decryption when it uploads or downloads photos. The cloud-side storage space stores the encrypted secret part, and the public part is stored and processed by the PSP. So the P3 architecture is very easy to implement with existing PSPs, and we don't require any change to the PSPs' infrastructure. So we -- sure. >>: So if there are two copies of the same image, and I use the same threshold, then the public part should be the same, right? >> Moo-Ryong Ra: Right. >>: So -- and I assume that since the public part is sparse, you can use the public part as a signature of an image. >> Moo-Ryong Ra: Okay. [inaudible] scenario or -- >>: Right. So an attacker or whatnot, right? So let's say I have an image of Justin Bieber and you know the public part of the image, right? So even if there are some, I don't know, hundreds of millions of images out there, if the public part encodes unique bits corresponding to an image, then just from the public part you can identify the original image, right? >> Moo-Ryong Ra: If you have access to the secret part, that means you are my friend. >>: [inaudible] just having the public part, you can almost guarantee that you can identify -- you can map it to an original image. >> Moo-Ryong Ra: That's -- >>: If the original image -- >>: If you had -- >>: -- available out on the Internet somewhere? >>: Uh-huh. Yeah. [brief talking over]. >>: [inaudible] software and hardware on the mobile device, [inaudible] mobile device [inaudible] user is the only one that is [inaudible]. [brief talking over]. >>: Other scenarios, right. Like, I want to find out who has posted the picture of Justin Bieber. For instance, I have a picture of Justin Bieber, I create a public part, then I can just scan the entire set of publicly available images and find out which one is the public part of the Justin Bieber picture. >>: You're saying, like, if they're identical under this -- the public part is a hash of the image? >>: Right. Right. And this public part, since it's very sparse, looks like it could be a one-to-one mapping, or a many-to-one mapping, from the original image to it. >> Moo-Ryong Ra: [inaudible] doesn't solve your problem. I assume this kind of image is not publicly available all the time. But I think one way to address that problem is, whenever we upload the public part to the PSP, we may inject random overlapping images, depending on the user. That may hide -- >>: [inaudible] capability to reconstruct the original image with the secret part if you inject something -- >> Moo-Ryong Ra: So the receiver -- if the receiver knows the -- >>: [inaudible]. >> Moo-Ryong Ra: Yeah. But -- yeah. Valid point, yes. >>: [inaudible] instead of a constant threshold you could have some randomized threshold and encode that into the private part [inaudible]. >> Moo-Ryong Ra: Another question? >>: [inaudible] this isn't really encryption as much as reversible obfuscation. Because encryption requires -- encryption implies that you have a key, and if you're not the person who has the key, under most definitions of security for encryption it means that, without the key, you can't tell if the encrypted message is the encryption of a given plaintext.
But in this case, if you have a plaintext, you can tell whether this is the encryption of that plaintext. >> Moo-Ryong Ra: So let me move on to the next part. We actually implemented the necessary components on the device, working with the Facebook system. The prototype runs on top of one of the latest smartphones, the Samsung Galaxy S III. Here is a screenshot on the device, along with the delay numbers on the device. A receiver without the relevant key basically sees the gray image on the right side; depending on the threshold it may vary, compared with the original on the left side. And the delay numbers are moderate. So P3 is practical and can be implemented with a real system like Facebook. Sure? >>: [inaudible] and you posted this image on Facebook but you [inaudible] in that case, because I should be able to see the picture too, right? >> Moo-Ryong Ra: Right. >>: So what do I need -- you need access to the [inaudible] on Facebook [inaudible]. >> Moo-Ryong Ra: Right. Right. >>: And you need this information to make the secret part, and somehow there needs to be some layer which combines them. >> Moo-Ryong Ra: Right. Right. So in our system, in the previous slide, this P3 trusted proxy -- that part does the encryption and decryption. And we evaluated P3's privacy aspects using PSNR and a representative set of computer vision-based algorithms. Essentially all the results say that the PSNR is low and all this recognition technology becomes useless on the public part, so P3 preserves privacy. In this talk I'm going to show two results: edge detection and face recognition. The first result is on edge detection. We applied Canny edge detection on the public parts. These images are from the USC-SIPI data set, the three canonical images. If you use a threshold of 1 and apply the Canny edge detection technique, it looks like this; it's almost impossible to recognize anything. If you increase the threshold to 10, it looks like this. Again, if you are familiar with the data set you may recognize something, especially in the middle image, but it is still hard to recognize the one on the right side. If you increase the threshold to 20, which is the weakest setting, it looks like this. Okay. In the next slide I present the original versions of these three images, together with the Canny edge detection results on the originals. They look like this. Okay. The second result to show is on face recognition. We used the EigenFace algorithm with the color FERET database for the evaluation, and we used Colorado State University's face recognition evaluation system, which is basically designed for comparing different face recognition algorithms. We examined the recognition performance under various settings: different probe sets from the database, different distance metrics, different P3 thresholds, and using the public parts as a new training set, for mimicking the attacker. Here is the result of -- yes? >>: [inaudible] canonical faces, did your [inaudible] system put them through the similar algorithm you had [inaudible], or did you keep the original faces for the recognition algorithm? >> Moo-Ryong Ra: I tried both. I'm going to present the result for the worst case, the public part; so the faces are also trained using the public part of the training set. And, yeah.
So here is the result for the worst case. The X axis is the recognition rank, and the Y axis is the cumulative recognition rate. I am following the methodology provided by the FERET database community. The upper line uses the normal training and probe sets, and the lower two lines use the public part as the training and probe sets. Each line uses a different threshold: the green line uses a larger threshold and the red line uses threshold 1. First, consider this point, which gives the best recognition rate from the perspective of an attacker: rank 50 and about a 40 percent recognition rate. Intuitively, what this means is that for an unknown face, the right answer is among the top 50 candidate faces with 40 percent probability. All right? If we just consider the top-one recognition rate, which is these two points, the green line has about a 15 percent recognition rate and the red line has a two percent recognition rate. Note that even if an attacker gets the 15 percent recognition rate, she only has this public part, so it would be very hard to verify whether the result is right or not. The other results, using different thresholds and the normal training set, show worse recognition rates than the green line here. So overall, face recognition is broken; these algorithms are not useful on the public part. There is a considerable amount of related work in this space as well. Fully homomorphic encryption enables arbitrary processing on encrypted data, but it is too expensive to be used on high-dimensional data like images, and it would require changing the PSPs' infrastructure. The second set of work is on privacy in the video surveillance literature. They do masking, blurring, pixellation, scrambling of coefficients, et cetera, but they are either fragile to recognition techniques or they increase the file size too much. And third, there is a considerable amount of related work in the selective encryption literature. They do useful things, but all of this work was done in the late 1990s; at that time they just focused on reducing the amount of computation on the device. So none of these can handle the full requirements of the P3 algorithm. For example, none of the existing algorithms can handle the reconstruction challenge due to the cloud-side processing. So P3 is a kind of selective encryption algorithm, but a unique one tailored to these novel requirements. Summarizing P3: cloud service providers already provide useful processing for mobile devices, and P3 protects our privacy against the providers while maintaining that cloud-side processing. Yes? >>: So do you have a definition for useful in the cloud-side processing [inaudible]? >> Moo-Ryong Ra: As I described, any linear processing we can handle. >>: So let's say I'm [inaudible] and you have this technology where he needs to spend a couple million dollars [inaudible] so that he gets [inaudible] 95 percent of the users are [inaudible]. What would the argument to any of these [inaudible] providers be for providing any of this stuff? >> Moo-Ryong Ra: So what is the benefit for the [inaudible], or -- >>: No, what is the incentive for any cloud provider to do this? >> Moo-Ryong Ra: Right. The argument, for Facebook, for example, is that they may want more users. There are privacy-concerned users who are very reluctant to use this kind of sharing environment.
Then they might devise paid services for those kinds of users to increase their user base, right? That's the argument that I have right now. >>: [inaudible]. >> Moo-Ryong Ra: I don't have concrete numbers, but in this space there are many startups nowadays, so some people are interested in this direction. So we have examined two examples of how we enable efficient processing and secure sharing of sensor data using the cloud. Now I will give an overview of the other two pieces of my work and conclude the talk. I also explored other interesting domains. The first is the emerging demand for large-scale sensor data collection and processing from a corpus of smartphone users. Crowd-sensing is another capability that combines the power of the cloud with the sensors on smart mobile devices. The key observation here is that there is a lack of support to automate these labor-intensive tasks. So I built a high-level programming framework for crowd-sensing applications; now users can just give a high-level description and the runtime takes care of the rest automatically. And second, whenever we share large volumes of sensor data using the cloud, we have energy concerns. The key observation here is that, given delay-tolerant mobile applications, the existence of multiple wireless network interfaces, and time-varying wireless networks, it may make sense to defer a transmission rather than sending the data immediately. So I designed an algorithm, using Lyapunov analysis, that governs these transmission decisions. The algorithm, called SALSA, can effectively trade off energy and delay by intelligently deferring transmission opportunities. Summarizing my entire work at a high level: we have Odessa to enable mobile perception applications, which are data- and compute-intensive workloads. With P3 we can protect users' privacy while maintaining the cloud-side processing. With Medusa we enable large-scale sensor data collection and processing from smartphone crowds. And with SALSA we can effectively trade off energy and delay for delay-tolerant mobile applications. At this time I want to thank my collaborators; without their support I might not be here as a candidate today. Finally, future work. In the future I want to broaden my research horizon and make our personal computing environment more efficient and secure. I categorize my future work into two parts. First, I'm interested in building infrastructure support for mobile devices in the future, which includes [inaudible] for mobile devices, like location and notification services, and also making mobile systems scalable and privacy preserving. Second, I'm also very interested in making multimedia data sharing and processing secure and efficient in our personal computing environment. Examples include privacy-preserving video sharing and making heavy processing on video data efficient and secure in our personal computing environment. Thank you. I will conclude my talk at this point, and I will be happy to take any more questions. [applause] >>: Can you tell us a little bit about how you disseminate the secret data? I'm a little curious about that. So you have this public and secret part. Can you tell us a little bit about how you disseminate the secret?
>> Moo-Ryong Ra: Sure. The secret part of the image goes to the cloud-side storage. When you upload a photo, on the device we divide the image into two parts. The public part goes to Facebook, for example, and the secret part goes to Dropbox, for example. Then when the receiver wants to see the photo, he downloads the public part from Facebook, which also gives him the unique photo ID, and with the photo ID he retrieves the secret part from Dropbox. That way you can reconstruct the image from the secret and public parts together on the device side. And for the key, we assume that the key is distributed offline. [inaudible]. Does that answer your question? >>: Yes. >> Aman Kansal: Okay. There are no more questions. Let's thank the speaker once again. [applause]
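The share-and-retrieve flow described in that last answer can be illustrated with a small, self-contained sketch. Every name here (psp, store, p3_split, p3_merge, xor_crypt) is a stand-in rather than a real API: in-memory dictionaries play the roles that Facebook and Dropbox play in the answer, a toy nibble split stands in for the real DCT-based P3 split, and a toy XOR stands in for real encryption with the offline-distributed key.

```python
# Minimal, self-contained sketch of the P3 sharing flow described above (all names hypothetical).
import itertools
import uuid

psp, store = {}, {}  # stand-ins for the PSP (e.g., Facebook) and external storage (e.g., Dropbox)

def p3_split(data: bytes):
    # Toy split: low nibbles are "public", high nibbles are "secret" (not the real DCT-based split).
    public = bytes(b & 0x0F for b in data)
    secret = bytes(b & 0xF0 for b in data)
    return public, secret

def p3_merge(public: bytes, secret: bytes) -> bytes:
    return bytes(p | s for p, s in zip(public, secret))

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for real encryption of the secret part.
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

def share_photo(image: bytes, key: bytes) -> str:
    public, secret = p3_split(image)             # split on the trusted device
    photo_id = str(uuid.uuid4())
    psp[photo_id] = public                       # public part goes to the PSP
    store[photo_id] = xor_crypt(secret, key)     # encrypted secret part goes to external storage
    return photo_id

def view_photo(photo_id: str, key: bytes) -> bytes:
    public = psp[photo_id]                       # downloaded from the PSP
    secret = xor_crypt(store[photo_id], key)     # retrieved by photo ID; key was distributed offline
    return p3_merge(public, secret)              # reconstructed on the receiver's device

original = b"example image bytes"
pid = share_photo(original, key=b"offline-key")
assert view_photo(pid, key=b"offline-key") == original
```

In the described system, the photo ID returned by the PSP is what links the two halves, and the key, as noted in the answer, is assumed to be distributed offline.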