>> Matthai Philipose: It’s my pleasure to introduce Seungyeop Han, who’s a PhD candidate right now at
the University of Washington. He’s a grad student working with [indiscernible] and Tom Anderson
over there. Before that he was an undergraduate at KAIST, and in between he did three years
at [indiscernible], one of the big internet companies in Korea, as a software developer. He can speak code.
He’s interested broadly in mobile networking and distributed systems, and he’s built several such systems during his
thesis. If you read his references, one of the things that becomes apparent is that he has done three
theses’ worth of work, and built three corresponding systems, during his stay.
One of them is AppFence and related mobile privacy and security work with [indiscernible]
and David Wetherall, who…
>> Seungyeop Han: And Stewart.
>> Matthai Philipose: And Stewart actually, yes, sorry.
[laughter]
I’d written it down here but, before David ran off to Google. Wetherall, that is. Then he worked with
many of us here at Microsoft Research in the Mobile Networking Group, on the [indiscernible] and
MCDNN speech and mobile recognition systems, and the distributed runtimes for some of these things. But
the thesis turns out to be about a set of wide-area distributed services that he’s
going to talk about today: MetaSync, and that sort of work.
Without further ado I’ll let him talk about that. That’s what he’s focusing on.
>> Seungyeop Han: Okay, thank you Matthai. Hello everyone. Can you hear me well in the back? As
introduced, I’m Seungyeop from the University of Washington, across the lake. Not that far away.
I’m from the Networking Lab, but as mentioned I’ve been interested in many different areas of computer
science. I’ve done some distributed systems work, security and privacy work, and even some computer
vision or machine learning related work. But today I will talk mostly about systems research for
untrusted or unreliable environments.
In recent years, how people use computing devices has repeatedly evolved. We have diverse
devices, from traditional computers like laptops or tablets, to smartphones, watches,
and glasses. Now the Internet of Things asks that all the different devices around you, or on you,
have some sort of computing power and be connected to the internet.
Along with the new applications, the amount and types of personal information have been increasing
as well. It includes a user’s location or contact information, health information, or something like your
search queries, etcetera. It has become very easy for users to access their information through the
applications and devices.
Unfortunately, not only the users but also other parties can easily access your personal
information. When we are using the internet, there are so many entities that are not trusted or not reliable.
These are raising security and privacy risks.
At the same time, users may not understand what’s happening and do not have much control over it.
We can’t actually change the first two things. Ultimately, we want to change the last point: we want to
give users control over how their information is exposed and used by remote services.
Let’s look at a simple diagram showing the parties involved in network communication. You may
run some application on a smartphone or computer. It’s connected to the internet, and it’s talking to a
remote service. Any of the parties in this diagram can be untrusted or unreliable, except for yourself.
Throughout my PhD I have studied various issues around each environment. Let’s talk first about
the application. As Matthai briefly mentioned, I worked quite a bit on smartphone security and
privacy. When I was starting my PhD in two thousand ten, Android phones were starting to become quite
popular around the world. Android is also open source.
One of the great things about smartphones, not only Android but also iPhones and Windows
Phone, is that you may download different applications from the application market. The good thing is
we can enjoy the functionality of third-party applications. But at the time, and probably even now,
we don’t understand much about what’s going on when we are using an application. What that means is the
application can access many different types of personal information, and we don’t understand whether
they are sending it to other places or just using it there.
I participated in the TaintDroid project, which is an information-flow tracking system. It reveals
which information is going where. Your location going to Bing Maps may be fine, but it may also be
sent to an advertising server as well.
Based on that project, with my colleagues I proposed a protection mechanism called AppFence.
We used two different mechanisms: one selectively gives false information to the
application, and the other blocks the information from being sent from the device to the remote
services. I also worked on studying third-party tracking, like advertising and
analytics, and on testing security vulnerabilities in smartphones by combining static and
dynamic analyses.
Moving on to the next part, the network can be untrusted or unreliable. Fundamentally, the current
internet was not designed with security or privacy considerations. We were looking into how we
could redesign the whole internet with the principle that all the network elements use the very minimal
information needed to allow internet communication.
One of the key ideas, I mean key design decisions, was to use Tor onion routing as an
addressing scheme. Instead of an IP address, a host uses Tor as its network protocol, through
an application proxy. We could demonstrate that with a proper design we can communicate
without many kinds of privacy and security issues.
Finally, there can be more obvious adversaries at the remote services, which is what I will talk about
in the rest of the talk today. By the way, if you have any questions during the talk, feel free to ask.
In this talk I will discuss how we should design systems when the remote services are not reliable or
not trusted, with two different example systems. First, I will talk about MetaSync, which is a file
synchronization service across multiple untrusted storage providers, including Dropbox, OneDrive,
and Google Drive.
Then I will talk about a pseudonym abstraction, where each application or user can use many pseudonymous
identities to control information exposure. Then I will briefly introduce some other work I did and
future directions.
Let’s first look into MetaSync. There are many different cloud services. Among them, I was looking
into file sync services. One of the most popular uses is backing up your folders
into the cloud; also, many different types of applications rely on those services.
Services like Dropbox or OneDrive make it easy for users to access their files from many
different devices, and also prevent them from losing their file contents. We also use those services for
sharing files with other users, like your friends or collaborators.
With this convenience they are getting very popular. Last year Dropbox announced that the number of
their users reached four hundred million. There are also many similar services provided by various
companies, including Microsoft with OneDrive, and Google, and Box. Recently, Chinese companies
like Baidu and [indiscernible] started to provide users with two terabytes of space, for free.
This sounds great; we can use those services. But the question is: can we rely on a single service for
keeping all of our files or for sharing files with others? We expect them to work well, but
fundamentally there is no reason for us to trust those service providers. Some services may come
from small companies and may become unavailable from time to time. Others may be provided in other
countries, so you may not trust them. We have seen many incidents where data stored in those
sync services was at risk, even with relatively trusted companies like Apple or Dropbox.
How can we protect our files? You may think of encrypting your files before storing them in
the cloud service; there are effectively services doing that. But it does not provide any better availability.
We could also build a whole new system from scratch that uses those cloud services with
minimal trust.
We’ve seen several such systems, especially in the distributed systems community. But here I want to tackle
the problem from a slightly different direction: we are building a file sync service by exploiting existing
services. Each service has its own unique features to differentiate it from the others, but the core
functionality is to allow users to put their files into the cloud service and to access them from
different devices.
We are building a sync service over the APIs provided by those services. By combining them we can
provide higher availability, greater capacity, and higher performance, with stronger confidentiality and
integrity. Yeah?
>>: How is the availability today?
>> Seungyeop Han: Sorry?
>>: How, what is the availability today? What is the percentage availability for any of these services?
>> Seungyeop Han: We don’t have a number; I mean, I don’t have a number off the top of my head. They are
mostly available. But if you look into it, there are availability-checking services, and it’s
not rare to see that one of those services is sometimes unavailable. We’ve seen, for example,
a service that permanently shut down after all.
>>: Can you show us some numbers comparing how much higher availability your results give?
>> Seungyeop Han: Right, so we are not showing it as a number. But by design we have
better availability, meaning even if some of the services are failing, the users can still access their
files.
>>: Okay, but I mean that’s a very good question, right? Suppose the availability was finite, let’s
pretend, okay. Let’s say you added one extra nine on top of it; is it worth it?
>> Seungyeop Han: That’s a good point.
>>: Why do you care? Yeah, so unless you have the current numbers, you’re asking why build MetaSync,
the [indiscernible] system that we will see.
>> Seungyeop Han: Right, so…
>>: If it’s just availability, unless you show numbers it’s not clear why you need MetaSync for
availability.
>> Seungyeop Han: That’s I think fair.
>>: You can’t show numbers for rare catastrophic events, right?
>>: But even, no…
>>: I mean, I’m sure MetaSync has figured out the potential number for that, that’s fine, right?
>> Seungyeop Han: Yeah, I think that’s fair. In terms of reliability, I don’t have a number,
unfortunately. But then I…
>>: That’s fine but…
>> Seungyeop Han: Then I can show that there are incidents where the services are not
available, and this kind of design could help in the case of those unavailable services.
>>: Another quick comment I want to make is about the greater capacity: every single service at
[indiscernible] has infinite capacity right now if you just want to pay.
>> Seungyeop Han: Right, it just started from a very naive idea about whether we can use more free
space.
>>: [indiscernible] availability…
>> Seungyeop Han: It’s weird claiming greater capacity when someone can just pay and get more free
space…
>>: [indiscernible]. You can use everybody’s free space…
[laughter]
>> Seungyeop Han: Okay, those are good questions. In addition, we have two more goals in MetaSync.
First, we don’t expect there to be communication between the service providers, because they are run
by different companies. We also don’t want to rely on communication among the clients. Lastly, we
don’t want to introduce additional servers for using the system; everything runs in the clients.
>>: Seungyeop, can you tell us a little bit more about the justification for these two goals? Like, you
don’t want to have some service running in the cloud that mediates between the different
services, and you don’t want me to have multiple clients on my computer, with some client
in there that mediates between them?
>> Seungyeop Han: Right, so the thing is, for the communication between clients: for a file
synchronization service, you could access those services from your computer or your smartphone.
Especially if you are sharing a folder with other colleagues, several different devices or different
users can access those folders at the same time, right? That’s the assumption we have. We are
connecting to the multiple service providers and communicating through them.
For the justification, I think the first one, between service and service, is obvious. But for
communication between the clients, we might need some mediation server, or something like a DHT,
to figure out which clients are online, etcetera, which incurs some system complexity as well.
That’s one of the reasons we didn’t take the peer-to-peer approach for [indiscernible]. The reason for no
additional server was partly a trust issue. We are claiming that there is little
fundamental reason for you to trust Microsoft, so it doesn’t make much sense for you to
instead trust us because we are running a better service.
Also, the clients are open source, so anyone can see the code and audit what’s happening in the
clients. For the server side, another reason is that we can’t argue that we can run the service forever.
That’s the justification for why we are taking these as design considerations.
>>: That also [indiscernible] services as a potential failure, correct, since he is not running any service,
right?
>> Seungyeop Han: Yeah.
>>: That’s good.
>> Seungyeop Han: From those goals, there are three key challenges for the system. We need to maintain
a globally consistent view of the files across multiple clients and over multiple services. Furthermore,
we want to build the system using only the service providers’ unmodified APIs; we can’t force them
to make new APIs for us. Finally, this should work even if some services are failing.
Let’s see how we designed this. This is the overview of the, yeah?
>>: Just for my understanding, do the APIs provided by the different services have any inherent
differences between them?
>> Seungyeop Han: Right, some of the functionalities vary a bit across the services. I will explain
a little more about how we handle different services with different APIs. But the core
functionality is putting files and getting files; we are using the service providers
more like blob storage rather than using their synchronization features.
>>: I’m just curious about the security more than anything. Do they provide any APIs to increase
security?
>> Seungyeop Han: What do you mean by APIs?
>>: I don’t know, encryption if you could [indiscernible] encrypt API for [indiscernible]…
>> Seungyeop Han: No, there isn’t. They definitely [indiscernible] encrypt files in their
cloud with some keys they maintain, but there aren’t any exposed APIs to
control that.
>>: How do you support sharing between two different users? A particular example is
anonymous sharing, right? A convenience on OneDrive is that I create a link and send the link to anyone;
they can use that link to access pictures I want to share instead of dealing with their own OneDrive account. I
have to manage the permissions, right?
>> Seungyeop Han: Yeah.
>>: I think for any encrypted file system like this, it’s easy to deal with a single user. The
challenge is multiple users.
>> Seungyeop Han: Right, that’s a good question. For the encryption, we provide a fairly
minimal encryption layer, which encrypts files based on a password. Also, it’s not your whole
space that is shared; we are using a repository model similar to version control systems.
You can designate a folder to be synchronized across multiple providers, and to share it
you share the password with the users for accessing it together.
Again, there are several…
>>: [indiscernible] password to your own system?
>>: Yeah…
>> Seungyeop Han: The password for encrypt…
>>: The passwords are embedded in this MetaSync layer, right?
>> Seungyeop Han: Encrypting…
>>: How does this MetaSync layer share the…
>> Seungyeop Han: Sharing is we are using each client…
>>: Yeah.
>> Seungyeop Han: Each service has sharing features, I mean sharing APIs, to give each of the users
access to those repositories. The second question is, if it’s encrypted with some key, how can we
manage sharing of the key?
>>: Yes…
>> Seungyeop Han: We are using password-based encryption, which is a pretty simple
mechanism. Because there could be different orthogonal [indiscernible] and contributions,
for encryption we are taking the minimalistic, I mean the simplest, way to do it. For different
users to share a folder, the sharing is done, again, by API calls. Then sharing the key is basically
telling the password, I mean the encryption key, to the other person.
>>: There are certain unique features that some of these storage providers give you if they
understand…
>> Seungyeop Han: Right.
>>: The content that’s in the file. For example, for certain open formats Google will give you some
rich semantics around version control. For images, OneDrive will look for particularly problematic
images and alert the authorities if that’s the case.
>> Seungyeop Han: Right.
>>: Do you lose those semantics?
>> Seungyeop Han: Yes, we lose those semantics, because we focus, as I mentioned, on the core
functionalities: putting files into the cloud and being able to access them. We are not supporting
those other functionalities for now; some of them may or may not be implementable on top of this.
>>: Okay, and then does your system also support web clients?
>> Seungyeop Han: Currently not, but it could be.
>>: It could be done?
>> Seungyeop Han: Yeah.
>>: Just a thought, I want to understand the sharing model a little better. Let’s say I store a
picture that is maybe…
>>: [indiscernible].
>>: Yeah, no go ahead, go ahead, we’ll speak later.
>>: Just to clarify, the stuff above this line, is that on the client side or is that in the cloud?
>> Seungyeop Han: This is a client side…
>>: Client side, okay.
>> Seungyeop Han: This is the…
>>: That wasn’t a joke, that wasn’t a joke question?
>>: No, no, it just wasn’t clear to me that that was the client side.
>> Seungyeop Han: Yeah.
>>: They were following up on the line [indiscernible].
>>: Oh.
>> Seungyeop Han: Okay, so…
>>: [indiscernible]…
[laughter]
>>: [indiscernible] questions.
>> Seungyeop Han: Where was I? Okay, so there are three subcomponents. One is the object
store, managing the files; one is for synchronization; and one is for replication. As mentioned, there
is a common abstraction over each of the services, and in between we also do encryption and
integrity checks.
Let’s look into the first part, the object store. The object store holds copies of the files. Those copies will
later be [indiscernible] and synchronized to the backend services. It has a data structure pretty similar
to any other version control system, like git: it uses content-based addressing and a hash tree to store
the objects.
Because we are using content-based addressing, the names of the objects are determined as the hash of their
content. The integrity check is pretty simple, because you can just check whether the name
matches the hash of the content. It also automatically de-duplicates, because if the contents are the
same they will be mapped to the same object. Finally, because the object name is unique, each client can
independently modify and upload or download the file objects.
Looking inside, it would look like this. The directory maintains pointers to the files. Files are
chunked if the file size is too big, or the files in a directory can be merged into one single object if the
file sizes are too small. Basically, it’s an optimization to make the object sizes
relatively uniform.
From this hash tree, the hash of the root directory of the repository uniquely identifies
the current snapshot. When some of the files are modified, for example here the large
[indiscernible] is being modified, the object store will create a new blob for it
and update its parent pointers recursively. Basically, each blob is considered an immutable object.
This is the logical view of the object store. We have files in the local file system, and we synchronize those
files to the backend. After those changes we have the hash value of the new
snapshot.
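The object store just described (content-based names, directory blobs forming a hash tree, free de-duplication, trivial integrity checks) can be sketched as a small toy in Python. This is an illustration in the spirit of git, not MetaSync’s actual code; the class and method names are invented for the example.

```python
import hashlib

# Toy content-addressed object store (illustrative names, not MetaSync's API).

class ObjectStore:
    def __init__(self):
        self.objects = {}          # content hash -> raw bytes

    def put(self, data):
        """Store a blob under the hash of its content; identical content
        always yields the same name, which de-duplicates for free."""
        name = hashlib.sha1(data).hexdigest()
        self.objects[name] = data
        return name

    def get(self, name):
        data = self.objects[name]
        # Integrity check is trivial: the name must equal the content hash.
        assert hashlib.sha1(data).hexdigest() == name
        return data

    def put_dir(self, entries):
        """A directory is itself a blob listing (hash, filename) lines,
        so the root directory's hash identifies the whole snapshot."""
        listing = "\n".join(f"{h} {n}" for n, h in sorted(entries.items()))
        return self.put(listing.encode())

store = ObjectStore()
a = store.put(b"hello")
root1 = store.put_dir({"a.txt": a})        # snapshot 1

# Modifying a file creates a new blob and new parents up to the root;
# the old root hash still identifies the old, immutable snapshot.
a2 = store.put(b"hello, world")
root2 = store.put_dir({"a.txt": a2})       # snapshot 2, root1 still valid
```

Because every object is immutable and named by its content, clients can upload or download objects independently and in any order, as noted in the talk.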
Yeah?
>>: I have a question regarding freshness guarantees. If we’ve shared a tree and we
open the same file, I presume there’s some metadata that tells me that the file has been
updated?
>> Seungyeop Han: It’s a kind of a…
>>: By the way if you’re getting to it in the latter part of the talk…
>> Seungyeop Han: Yeah, I will get into that, but let me say a little bit about it before actually getting
there. Again, this model follows version control systems like git. It allows the other clients, or the
sharing users, to access or modify the files, but conflicts may later need to be
resolved manually.
Simply put, we replicate objects redundantly across R storage providers, where R is a configuration
number. For example, here with replication factor two, each blob will be replicated over some
two of the services, like this. There are several requirements for this replication. First, we need to
minimize the shared information among the services and clients, because there could be many different
objects, and it doesn’t make sense to store all the different mappings of whether this
object goes to Google Drive and Dropbox while the other objects go to some
other set of the services.
We also need to support variation in storage sizes; some services provide two
gigabytes, others fifteen gigabytes or even two terabytes.
Finally, we wanted to minimize realignment upon configuration changes. If you are adding a
new service, removing a service, or changing the space allocation, we wanted to minimize the
realignment.
I will not go into all the details, but basically we are using a deterministic mapping function that can be
computed from small shared information. Each client can independently calculate the result of the function
to say, okay, this object hashes to something, so we can put this object on OneDrive and Google Drive.
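One way to realize such a deterministic, capacity-aware mapping is a consistent-hashing-style scheme with capacity-weighted virtual slots. The sketch below is an assumption: it shows the general idea of computing the same object-to-services assignment on every client from a small shared configuration, and it is not MetaSync’s exact function; all names and constants are invented.

```python
import hashlib

# Hedged sketch: deterministic replica placement via a hash ring with
# capacity-weighted virtual slots. Every client, given only the shared
# config (service names, capacities, R), computes the same assignment.

def _point(s):
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

def build_ring(capacities_gb, slots_per_gb=4):
    ring = []
    for svc, gb in capacities_gb.items():
        for i in range(gb * slots_per_gb):   # more capacity -> more slots
            ring.append((_point(f"{svc}:{i}"), svc))
    ring.sort()
    return ring

def replicas(ring, obj_hash, r):
    """Walk clockwise from the object's point, taking R distinct services."""
    start = _point(obj_hash)
    idx = next((k for k, (p, _) in enumerate(ring) if p >= start), 0)
    chosen = []
    for step in range(len(ring)):
        svc = ring[(idx + step) % len(ring)][1]
        if svc not in chosen:
            chosen.append(svc)
            if len(chosen) == r:
                break
    return chosen

ring = build_ring({"dropbox": 2, "gdrive": 15, "onedrive": 15})
where = replicas(ring, "9f86d081deadbeef", r=2)   # same answer on every client
```

A scheme of this shape also keeps realignment small: removing a service only remaps the objects whose slots belonged to it, which matches the minimal-realignment requirement stated above.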
>>: What if one of the services goes bankrupt or becomes unavailable? Your deterministic mapping cannot
handle that.
>> Seungyeop Han: The function can be changed. Again, it would take too
much time to explain exactly how it’s built, but that’s correct: we can remove one service,
and then it remaps the objects to the remaining services. Because one of the requirements was
minimizing that cost, even if we are removing a service or adding a new one, it doesn’t
cost as much as other mapping algorithms. I can talk to you a little bit more afterwards.
>>: Okay.
>> Seungyeop Han: That was replication. Now I will talk a little bit more about the synchronization
steps. There are two different things we need to share or agree upon between the clients. One, as
mentioned, is where the files are stored. The other is, if each client is modifying files
or folders, how can we apply the changes to each other? How can we know what the most recent
version of the folder is?
As mentioned, each object can be independently updated and independently uploaded or
downloaded. But a problem can happen if multiple clients are modifying the folder
and each insists that its changes should be applied before the others’.
Let’s see how it works. Imagine there are two clients, and at the beginning they are
synchronized to the same point. As mentioned, if client one is updating some files and is the
only client updating, then it can easily update the whole global view, and the other client can catch up. But
a problem happens if both clients modify files and each tries to claim that the next version
should be its own.
In this case we need some mechanism to order the changes, so that the other clients can merge onto
them. I’ll explain this synchronization mechanism now. You might realize that this is a
traditional distributed systems problem: determining order in the system. We might use Paxos or two-phase
commit as the algorithm; they reach consensus among the clients on how to apply the changes
and in which order. Paxos is a multi-round non-blocking consensus algorithm; it is safe regardless
of failures and makes progress if a majority of the acceptors is alive.
However, we don’t have a Paxos API or two-phase commit API from the service providers. As
mentioned on the goals slide, we don’t have communication channels between the servers or between
the clients. The challenge here is that we need to handle concurrent updates and potentially unavailable
services while relying only on the existing APIs.
What we do is simulate Paxos. There are no Paxos APIs, but we could
devise a way to simulate it with the given APIs. In particular, we found that for each service we could
use its APIs to build an append-only log abstraction.
With the append-only log, clients send the normal Paxos messages to the backend services, and when
the messages arrive, the service just appends each message to a list. Clients can later fetch the list
of messages to figure out which proposal was accepted.
This abstraction can be built in various ways over the services. We built the append-only list
abstraction with comments on a file for Google Drive, OneDrive, and Box. For Dropbox, if you
overwrite a file it creates a new revision; we could use the revision list as an append-only
list. If those are not available, we could use sequence numbers in the file names in a directory to make
a list of files that has an order.
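The passive-acceptor idea can be illustrated with a toy in-memory version of the append-only log. All names here are invented, and this is a deliberate simplification: the real protocol, as noted later in the talk, runs the two Paxos rounds (prepare/promise, propose/accept), while this one-round toy only shows the logging mechanism and the client-side learning rule.

```python
# Toy sketch of the passive-acceptor idea: each provider offers ONLY
# append-to-list and fetch-list (e.g. via file comments or revision
# histories). Clients append proposals and then read the log back,
# applying the same deterministic rule to learn the outcome.

class AppendOnlyLog:
    """Stands in for one provider's per-file comment or revision list."""
    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)

    def fetch(self):
        return list(self._entries)

def propose(logs, client_id, new_root):
    """Send the proposal to every provider's log."""
    for log in logs:
        log.append((client_id, new_root))

def learn(logs):
    """On each service, the first proposal in its log wins; a root is
    chosen once a majority of services agree on the same winner."""
    votes = {}
    for log in logs:
        entries = log.fetch()
        if entries:
            _, root = entries[0]
            votes[root] = votes.get(root, 0) + 1
    for root, count in votes.items():
        if count > len(logs) // 2:
            return root
    return None

logs = [AppendOnlyLog() for _ in range(3)]   # three providers
propose(logs, "client2", "root-v2")          # client2's appends land first
propose(logs, "client1", "root-v1")
chosen = learn(logs)                         # "root-v2" wins by majority
```

Note that the acceptors never run any code of their own; every decision is recomputed by clients from the fetched logs, which is exactly why the services can remain unmodified.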
>>: What part of the documentation for Google Drive and OneDrive makes you think that the comments
on a file provide strong consistency globally?
>> Seungyeop Han: Right, that’s a good question. We were looking into that, and we don’t
have a real guarantee that they provide strong consistency. We checked empirically
to see whether it works as we expected. It’s fair to say we don’t have an
actual guarantee of strong consistency, but we model them as linearizable and we build
on top of that assumption.
Now the backend services work as passive acceptors; they log each message from the clients. We call this
passive because the acceptors just store the messages, and the accept decisions are made in the clients as if
they were made in the acceptors. In this diagram, each client may propose a new root: client two says the new
root is one, and client one says something different. Then, after reading the log, if the majority wins, a client
can learn that the new root should be the one from client two.
This is a simplified diagram. The actual algorithm needs to be done in two rounds, like
prepare/promise and propose/accept in the real Paxos algorithm. But let me know if you have
some other questions related to this.
After devising this, we realized that Lamport had proposed something similar,
called Disk Paxos. After aligning them, we can think of our passive Paxos algorithm as a
form of optimization of the Disk Paxos algorithm. In particular, in Disk Paxos a
proposer needs to access a per-client block, so the number of messages is on the order of the number of
clients times the number of acceptors. Because of the append-only list, we could reduce that to the
order of the number of acceptors.
>>: Question?
>> Seungyeop Han: Yep.
>>: Is this different from the Google system called Spanner? They use Paxos and support
various [indiscernible] that provide consistency…
>> Seungyeop Han: Right, overall it’s not different from any Paxos-based system or Paxos algorithm.
But we are building a new way to run Paxos; you can say it’s a new client-based
Paxos algorithm. That’s the difference from any other Paxos algorithm.
>>: If you have your [indiscernible] mapping, which assumes the services are always there, otherwise
the mapping is troublesome, then you could even have a [indiscernible]
master and slave, right? For anything that’s replicated across services…
>> Seungyeop Han: Right, so that…
>>: That makes this thing simpler?
>> Seungyeop Han: That could be simpler, but it cannot make progress when the master is
failing.
>>: But even masters can fail. If anything fails, the [indiscernible] mapping has to handle it dynamically,
so then it’s not a [indiscernible]?
>> Seungyeop Han: No, no, no, we are replicating over R services, right? There are two different
things again: one is replication, and one is synchronization, to determine
what the most recent version is. For replication, if there is some failing service,
you may put the object on fewer services at the moment.
But it is still accessible, because you have a copy on one of those R services. Even if Google is failing,
for example, you can get that object from Dropbox or OneDrive. That’s
one thing. The other is synchronization, which simulates Paxos, and Paxos can keep working even if
some services are failing.
>>: But if you want to separate these two things, then let’s assume there’s
no replication. If there’s no replication, the clients are [indiscernible] to a single service,
and then that problem could be solved with some other function instead of Paxos, right? It’s
only because you are trying to get multiple things: you’re trying to not just use your replication as a
backup, you’re also using the replication for load-balancing purposes. Otherwise you don’t
need Paxos, right? There’s something that’s not clear to me.
>> Seungyeop Han: We can talk a little more later, but again, we’re solving
two different problems here. Even for the synchronization problem we need Paxos, because two-phase
commit or [indiscernible] approaches cannot guarantee progress if there’s a single
failing service. That’s why people use Paxos, as in ZooKeeper or Spanner, as the bottom layer.
We implemented the MetaSync system prototype in Python, because many service providers have Python
APIs. It currently supports five different backend services: Baidu, Box.net, Dropbox, Google
Drive, and OneDrive. We have two different types of clients: one is a command-line client, and the other
is similar to the native clients, with a dedicated folder that is synchronized to the
cloud periodically.
As one of the evaluations we checked end-to-end performance. For this we synchronized a folder
between two computers, using the sync services’ own clients and also MetaSync. I present results for
two different workloads. One is the Linux kernel source code, which has many small files and
directories; the other is fifty photos.
You can see that we are outperforming them, but it’s maybe not a really fair comparison. For one thing,
we get some performance gain by design, through parallel upload and download with multiple
providers. Also, we combine small files into a blob. And it’s unclear whether the providers really
wanted to optimize the synchronization performance of their native clients. But I wanted to show that
it’s a working prototype, and that users can have another option for…
>>: I’m confused by that statement. In the first row you’re uploading fifteen thousand files.
>> Seungyeop Han: Yep.
>>: You’re saying that MetaSync is faster because each of those fifteen thousand files is broken up into
smaller chunks?
>>: Other way.
>> Seungyeop Han: No, no, no, the other one.
>>: Combined at the…
>> Seungyeop Han: They are combined…
>>: I see.
>> Seungyeop Han: Maybe per directory; it’s not combining everything into one larger blob. But we
have a policy: if there are files in a directory smaller than some threshold, we merge them into a single
blob.
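The per-directory merging policy just described could look roughly like this. The threshold value, the length-prefixed framing, and the function name are illustrative assumptions, not MetaSync's actual code.

```python
import os

# Illustrative threshold: files smaller than this get packed together.
BLOB_THRESHOLD = 16 * 1024  # 16 KB; the real system's value may differ


def pack_directory(path):
    """Group a directory's small files into one blob; keep large files separate.

    Returns (blobs, singles): blobs is a list of packed byte strings,
    singles is a list of (filename, size) pairs uploaded individually.
    """
    small, singles = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        size = os.path.getsize(full)
        if size < BLOB_THRESHOLD:
            with open(full, "rb") as f:
                data = f.read()
            # Length-prefixed framing so the blob can be split apart again.
            small.append(name.encode() + b"\x00"
                         + len(data).to_bytes(8, "big") + data)
        else:
            singles.append((name, size))
    blobs = [b"".join(small)] if small else []
    return blobs, singles
```

The point of the policy is to reduce per-object request overhead: one PUT for a blob of many tiny files instead of thousands of PUTs.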
>>: But if you upload, for example to Dropbox or Google, if you upload a directory, presumably they
could do the same thing that you’re doing…
>> Seungyeop Han: They could, but they are not doing it.
>>: Not only that…
>> Seungyeop Han: That’s what I mentioned…
>>: That I know is not true, at least for OneDrive. In OneDrive, uploading a directory and uploading a
file are two different options. OneDrive does do something interesting if you’re uploading multiple files
from the same directory.
>> Seungyeop Han: We…
>>: I could be wrong but…
>> Seungyeop Han: Yeah, I have…
>>: But there’s no…
>>: [indiscernible]
>> Seungyeop Han: I have numbers for OneDrive but they’re not here. What was similar was Dropbox. I
mean, it’s faster than the other services, but…
>>: But what does this…
>>: [indiscernible]…
>>: Upload, download doing?
>>: They’re separate.
>>: Oh.
>>: There’s the file blobbing, which you’re saying any of these guys could do in principle.
>> Seungyeop Han: We are uploading multiple files at once. When there are many different files, we’re
uploading them concurrently, and also uploading concurrently to the multiple backends, and
downloading as well.
>>: [indiscernible]
>> Seungyeop Han: One thing is each service has some [indiscernible] limit, like per-user bandwidth.
We might overcome that somewhat. It may not be the biggest factor, but it’s one of the factors in
parallel upload and download.
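The concurrent-transfer idea just discussed can be sketched with a thread pool, where each (backend, object) pair is an independent task so a slow or rate-limited service does not stall the others. The `put(name, data)` client interface is a hypothetical stand-in, not any provider's real API.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_upload(backends, objects, workers=8):
    """Upload every object to every backend concurrently.

    backends: list of clients exposing put(name, data) (illustrative API).
    objects:  dict mapping object name -> bytes.
    Returns the list of (backend index, object name) pairs completed.
    """
    def task(idx, backend, name, data):
        backend.put(name, data)  # per-user bandwidth caps apply per service
        return (idx, name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, i, b, n, d)
                   for i, b in enumerate(backends)
                   for n, d in objects.items()]
        return [f.result() for f in futures]
```

Because each service enforces its own per-user cap, the aggregate throughput of this pattern can exceed what any single backend allows, which is the gain claimed in the answer above.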
>>: Have you compared this with a more limited approach where you just zip the files at one client and
unzip at another? How does that compare?
>> Seungyeop Han: You mean uploading through Dropbox or OneDrive and then downloading? We
haven’t compared that. Again, on performance, we wanted to show that this is potentially one of the
better options; we’re not claiming that they are doing something very wrong. More that we wanted to
show the design works and…
>>: What does the [indiscernible] experiment that we can do? Just zip it and…
>> Seungyeop Han: Yeah, I think that’s kind of interesting thing to do.
>>: A clarification on the Dropbox column and the Google column. Is that the performance of the
Dropbox client and the Google client, or is that the performance of your Python code that’s only talking
to Dropbox?
>> Seungyeop Han: Their client.
>>: Their native client or their REST client?
>> Seungyeop Han: Their native client. We ran their clients on the computer and put the folder into the
dedicated…
>>: The message I’m getting here is there are some optimizations that they should be doing.
>>: This is not…
>>: They’re not doing.
>>: Yeah…
>>: But the overall high-level bit is that the whole synchronization protocol is not adding a huge amount
of extra time on top of everything else.
>> Seungyeop Han: Right, yeah.
>>: There’s no huge overhead as that API…
>> Seungyeop Han: Sure.
>>: We need that to do [indiscernible]…
>> Seungyeop Han: Yeah, we can talk a little more after the talk. But let me move forward. In summary,
I presented MetaSync, which combines multiple file synchronization services to build a better file
synchronization service. We achieve consistent updates with a new client-based Paxos algorithm. I also
presented how we could minimize redistribution through a stable deterministic mapping. If you’re
interested, the source code is available; please visit the website.
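The "stable deterministic mapping" in the summary is in the spirit of consistent hashing: every client computes the same object-to-provider assignment from hashes alone, and removing one provider only remaps the objects it held. Here is a generic sketch of that idea, not MetaSync's exact algorithm (which also weights providers by free capacity); the provider names and parameters are illustrative.

```python
import hashlib


def _h(key):
    """Deterministic 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(providers, vnodes=100):
    """Deterministic ring: each provider contributes `vnodes` hashed points,
    so every client reconstructs the identical ring from the provider list."""
    return sorted((_h(f"{p}:{i}"), p) for p in providers for i in range(vnodes))


def map_object(ring, obj_key, replicas=2):
    """Walk the ring clockwise from the object's hash, collecting the first
    `replicas` distinct providers to hold copies of the object."""
    point = _h(obj_key)
    ordered = ([p for h, p in ring if h >= point]
               + [p for h, p in ring if h < point])
    chosen = []
    for p in ordered:
        if p not in chosen:
            chosen.append(p)
        if len(chosen) == replicas:
            break
    return chosen
```

The stability property is what matters for the talk's claim: because the mapping is a pure function of hashes, no master is needed to agree on placement, and failure of one provider leaves the placement of all other objects unchanged.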
>>: If the client is disconnected are they allowed to edit objects?
>> Seungyeop Han: They are allowed to edit the object but the object will be synchronized later.
>>: Does synchronized later mean rejecting changes completely or merging them?
>> Seungyeop Han: No, they need to merge them. [indiscernible]?
>>: On the previous slide, how many Paxos operations are run? Is it one per file or one for the whole
update?
>> Seungyeop Han: Well, it’s actually one for the whole update. This isn’t really evaluating the Paxos
algorithm itself. I have a graph if you’re interested.
>>: Is it done at the end after all the data is up there?
>> Seungyeop Han: Yeah.
>>: If there were a conflict eighteen minutes and fifty-four seconds in it would then have to start
entirely from scratch?
>> Seungyeop Han: No. Well, the merging itself can be considered a somewhat separate problem. For
example, when merging conflicts in git, there is some way to resolve parts of a file automatically, but
some files may need to be handled by marking that this part has to be merged by user intervention.
We’re not really handling much; we just mark some of the conflicting files as conflicted. But in general
you don’t need to start from scratch; there’s a way to figure out which files are conflicted. Because each
file has a hash, it’s easy to check whether files are the same or not.
>>: What is the state of the system? Let’s say there were two uploads that conflicted, this being the
second one, and the conflict happened eighteen minutes and fifty-four seconds in. What state are we
left in after both of those attempts have happened?
>> Seungyeop Han: Could you clarify the question? Are there two empty repositories, with both of them
uploading different sets of files? The eighteen or nineteen minutes is basically the time to synchronize
the whole Linux kernel from one computer to the other; that by itself doesn’t create a conflict. A
conflict usually comes from modifying some of the files, which may involve some number of files but
would not take as long as nineteen or twenty minutes, right. I’m not sure I answered that.
>>: We’ll check it out for you.
>> Seungyeop Han: Okay.
>>: I noticed that Amazon’s consumer file service is not on your list.
>> Seungyeop Han: Right.
>>: Qualitatively you know how did it differ? Did you purposely not consider them?
>> Seungyeop Han: Right, we’re missing it because we focused on end-user file services like Dropbox or
OneDrive. It’s fair to say it’s something we could include in the system.
>>: When you were doing this, did you observe any odd failures? The common failure mode I would
imagine would be a service being up or down. But did you see failure modes where the blob of bytes
that you got back from the file storage provider was not the blob of bytes that you wrote?
>> Seungyeop Han: [indiscernible], so.
>>: Okay.
>> Matthai Philipose: Can we listen to rest of the talk guys?
>> Seungyeop Han: Sorry?
>>: [indiscernible] contained.
>> Matthai Philipose: This is half the talk.
[laughter]
>> Seungyeop Han: Yeah, how many minutes do I have?
>> Matthai Philipose: [indiscernible]
>> Seungyeop Han: Three more, sure.
>>: He was the last question…
>> Matthai Philipose: I know but there has to be a limit right.
>>: Thirty minutes.
>>: Yeah, keep going.
>> Seungyeop Han: Yeah, okay. With MetaSync I’ve talked about a file sync service with unreliable or
untrusted backends. It can also prevent the remote services from leaking or corrupting files. Now I’ll
change gears a little, to how we can prevent services from linking user activities.
I think you might have experienced, after visiting Zappos to look at some shoes, ads for shoes
following you to CNN, Fox, or whatever news sites you go to. There are many different types of
personal information, and for some of them the privacy risk is not really obvious. One example is
what I just mentioned: tracking user behavior. When we use today’s internet, we should assume
that most of our activities are tracked on the websites we visit. That’s done through tracking, as in
the example I just showed. According to previous work, among the top five hundred internet web
sites, and this is from two thousand twelve so there may be more now, ninety-one percent embed
at least one tracker, and eighty-eight percent embed third-party trackers, like advertising or analytics.
Web services are very well incentivized to build large user profiles, because their revenue model is
strongly tied to how well they understand users’ interests. The information collected by those remote
services ranges from demographic information or location to more sensitive items like political
opinions, medical information, or sexual orientation.
Let’s look at how the tracking is done a little more deeply, at a lower level. In this example, Alice is
sending a set of queries, like a Microsoft route to her home address, and Bob is sending another set of
queries. The tracker may know that the first set of queries is coming from one user and the other set is
coming from someone else. They are linking those queries or activities on the remote side.
What does it mean to us? The information collected by trackers creates a very detailed picture of
you. Because of this we usually think tracking is bad, that tracking is harmful to us. But on the other
hand, we also get some benefit from tracking, or being tracked. Tracking is a tool for creating
relationships between users and services. It enables personalization, like recommendation services.
It may sometimes be used for better user security, as in banking systems. In some sense we can even
say that we are paying for the service by being tracked.
In terms of tracking, the threat model here is not just being tracked, but that services use information
in packets to correlate even unwanted traffic together. So what could a better scenario look like?
I want tracking to look like this diagram. Let’s keep [indiscernible] her address from being correlated
with her other queries. Similarly, Bob may want to separate activities related to his [indiscernible]
depression from his other activities. Even though those two sets of queries are coming from one
[indiscernible], trackers cannot know whether they are coming from one host or two different end
hosts.
Again, I’m not arguing that we need to get rid of tracking ability entirely. Instead, we want to provide
users with more control over what can be tracked, or what can be linked together. By giving users
control over what is tracked, we could mitigate or even remove the privacy risk, and at the same time
we believe we can still maintain the positive side of tracking.
Before jumping into our approach, I want to explain how tracking works a little more. It’s pretty simple
again: services track users by linking their requests. At a very low level, multiple requests from a single
host or single application can be linked on the remote side because they share some identifier. The
identifier is most commonly a cookie, but IP addresses also give a lot of information to the remote side,
especially when combined with fingerprinting information, like fonts and other properties exposed by
the operating system, etcetera.
Therefore, users should have control through an abstraction covering all the different identifying
features, not just the cookie or the IP address. We call this a pseudonym. We want each host to manage
a large number of unlinkable pseudonyms. Users or applications can choose which ones are used for
which operations, so that the remote service has limited ability to correlate those operations or
activities.
For example, there could be a [indiscernible] pseudonym used one time per trace. From the
previous example, Alice may use one pseudonym for medical information and another one for her
home address, to separate them.
Now, this is an overview; let’s see how we want pseudonyms to be used. When Alice is querying
medical information she uses one pseudonym. Then when she needs another one, the application finds
which pseudonym to use through the policy engine. It communicates with the operating system to
allocate more IP addresses. In turn, the operating system needs to talk to the network to get more IP
addresses, say from DHCP, and to arrange how packets from the remote service will be routed back to
my computer. Finally, the application can use another pseudonym for the location-related query.
From this picture, I will first explain the application-layer design. Then I’ll describe our network-layer
design to support the pseudonym abstraction.
>>: I have a clarification question. Are you talking about first-party tracking or are you talking about
the eighty-eight percent of third-party tracking?
>> Seungyeop Han: It covers both of them. That’s more related to how the policy [inaudible] is
designed…
>>: But not really, right, because unless you also manipulate how browser handles cookies…
>> Seungyeop Han: We many…
>>: If I’m Google and you log into me, I put a cookie, and I also embed my widget into all the websites
you go to. As a third party you load it and give me your cookie, and then I know which websites you go
to, right?
>> Seungyeop Han: Right…
>>: It doesn’t matter which IP you come from.
>> Seungyeop Han: This is a cross-layer design. We also modify the browser to change how it manages
cookies, as well.
>>: Oh, are you going to talk about it?
>> Seungyeop Han: Little bit, yes.
>>: Okay, alright.
>> Seungyeop Han: It’s not that much about that but we will talk a little bit more about that at all, sure.
>>: So that I understand this a little bit: today if you start a browser in [indiscernible], I guess the IP
address won’t change.
>> Seungyeop Han: Right.
>>: But you can login as a separate user there and do searches there.
>> Seungyeop Han: Right.
>>: Cookies would change.
>> Seungyeop Han: Right.
>>: Is IP the unique bit here that you’re supporting different IP addresses?
>> Seungyeop Han: There are two different things. One is that, although we are mostly talking about IP
addresses and cookies here, the identifying abstraction should consider many different types of
potential identifiers, in, let’s say, the web browsing context: IP address, cookie, things related to
JavaScript, or other web-browser-related information. All of those need to be considered; that’s one
argument here. The other is that, yes, the IP address is the somewhat new bit, a little different from
prior work.
>>: I…
>>: [indiscernible] system maybe you get, I mean if it’s a [indiscernible] system the IP addresses are
going to change, right. What do you do about that?
>> Seungyeop Han: Sorry?
>>: If it’s a [indiscernible] system, right, [indiscernible] of a meaning.
>> Seungyeop Han: That’s a good question. For example, we could think about using NAT or a proxy,
hiding many different hosts behind a shared IP address. On one hand, we want to give users control;
we’re not trying to block tracking ability entirely. But is your question about whether we can change
the IP address even behind a NAT?
>>: We’ll take it offline.
>> Seungyeop Han: Yeah, sure. As mentioned, I’ll talk about the application-layer design first. Let’s
assume that we have pseudonyms available. Then the application needs a way to determine how to
use them, and that depends on the user and the application. Sometimes people might want every
packet to be different. Or they might change the pseudonym per account, per tab, per domain name,
etcetera.
As system designers we don’t actually know which pseudonym should be used. Instead, we are trying
to build a flexible way for each application to define its own policy. For example, in web browsing, a
policy can be defined as a function of the request information and the state of the browser, which may
include something like a unique ID for each window or tab.
>>: Is the mapping between activity and pseudonyms one to one?
>> Seungyeop Han: It doesn’t need to be. Activity to pseudonym is n-to-one, and if activities are
mapped to the same pseudonym they can be correlated on the remote side. It depends on at what
level you want to allow the service to see your activities as correlated.
>>: It’s many to one. Can I have two pseudonyms for an activity?
>> Seungyeop Han: No, not exactly. I won’t say it’s impossible; it may be possible in a slightly more
[indiscernible]. But an activity here is more like per request.
>>: [indiscernible]
>>: [indiscernible] quality design: I use multiple identities in the Chrome browser, and sometimes even
I don’t know which identity I should be using.
>> Seungyeop Han: That’s fair.
>>: While doing a certain task?
>> Seungyeop Han: Yeah.
>>: I don’t know how computer will decide that.
>> Seungyeop Han: That’s fair. That’s kind of a separate question to answer, actually.
>>: Then that’s it, that’s the question I had in mind…
>> Seungyeop Han: Yeah.
>>: If you cannot do it, then how can other people? They probably cannot do it, as well.
>>: What you’re saying is, for example, if you have a medical application you might use a distinct
pseudonym for that, and if you have some media-related stuff you might use another, so you make very
coarse policies that allow you to do that. Is that the level at which…
>> Seungyeop Han: Yeah. People have looked into how to, for example, separate the cookies, so that
when you access facebook.com you have one pseudonym and when you access bing.com you have
another. There are many different types of policies in the literature. We are trying to make it possible
to implement those policies, but we haven’t had a chance to look in detail at which kinds of policies
would be most effective.
Here’s an example of what kinds of policies are possible. By default, every request uses the same
pseudonym. In this example, facebook.com can know that the user is reading a specific article on some
news site, because it can correlate the cookie from the user’s previous Facebook login with the request
coming from the Like button to facebook.com.
On the other hand, we can think about a more extreme case where every request has a different
pseudonym. But there are many pseudonym policies in the middle. For example, the browser can
change the pseudonym according to the domain name of the page the user connected to, which means
that when a user visits news.com, all the images and scripts use the same pseudonym, but a different
pseudonym is used when the user visits facebook.com. In this case, as in the previous extreme case,
Facebook cannot correlate the user’s visit to Facebook with their reading an article on some separate
website.
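The per-first-party-domain policy just described can be sketched as a tiny policy function: the pseudonym depends only on the page the user navigated to, never on the third party embedded in it. The class and its interface are illustrative, not the actual browser-extension code.

```python
import itertools


class PerDomainPolicy:
    """One pseudonym per first-party domain: all requests issued while the
    user is on news.com (images, scripts, embedded Like buttons) share a
    pseudonym, while requests issued on facebook.com use a different one,
    so Facebook cannot link the two visits."""

    def __init__(self):
        self._ids = itertools.count()
        self._table = {}  # first-party domain -> pseudonym ID

    def pseudonym(self, first_party_domain, request_domain=None):
        # The third-party request_domain is deliberately ignored:
        # the choice depends only on the page the user navigated to.
        if first_party_domain not in self._table:
            self._table[first_party_domain] = next(self._ids)
        return self._table[first_party_domain]
```

In the extension described later in the talk, a function like this would run in the policy engine and tag each outgoing request with the chosen pseudonym ID.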
>>: This is a tradeoff between privacy and convenience, right? Because if [indiscernible] P two is
different from P three, that means I won’t be able to click Like…
>> Seungyeop Han: No, the thing is, well, this is from previous work, not by me. The problem with the
Like button is that even if you don’t click Like, it’s giving information to Facebook. It would be possible,
when you do click Like, to connect to the social [indiscernible] site at that time. Again, sure, there will
be some tradeoffs. But we’re not selling this specific policy as effective or efficient; we’re claiming that
there are multiple policies we can build.
Again, I briefly explained how policies can limit servers’ tracking ability. Let’s move on to how we get
support from the network layer, especially how we can assign many addresses to a single host.
To support the pseudonym abstraction, we need to take several considerations into account. First,
since we assign an IP address per pseudonym, a host needs many IP addresses. Second, those many
addresses should be properly mixed: if they are clustered together, trackers can easily figure out that
those clusters come from a single host. On the other hand, if we just randomly assign addresses within
the network, so that each host gets random addresses from the network’s space, the problem is that
the routing table could be [indiscernible]. We need a design that resolves these issues.
Let’s first talk about how we can provide many IP addresses. It’s pretty simple: we are moving toward
IPv6, since we have potentially run out of IPv4 addresses. In IPv6, even a small network gets a /64
block, which is much larger, even much larger, than the whole IPv4 address space.
With IPv6 we’ll have an environment where each host can get many IP addresses rather than just
one. If we look at the IP address in a packet, the first part is used to route the packet to the
network, and the second part is used to route the packet within the network. We realized that, as long
as the network can deliver the packet to the end host, the address can encode a lot of other
information. Also, with a long, one-hundred-twenty-eight-bit IP address, there is much more flexibility
to encode information into an address.
What we did is devise a very simple technique to assign seemingly random addresses to one host,
while still routing packets efficiently. We divide the second part of the address into three sub-parts:
the first is the subnet ID, the second is the host ID, and the third is a pseudonym ID, which is assigned
pretty much randomly. The first two are similar to what we currently have in the internet: we do
longest-prefix matching to route the packet. Then we encrypt these three sub-parts together into an
encrypted ID using symmetric encryption.
End hosts know only the encrypted IP addresses, so whenever a host needs a new IP address it sends a
request to the network. The network knows the subnet and host IDs, assigns a new pseudonym ID,
encrypts them, and gives the result to the end host. Routers use the base addresses to forward
packets: they can decrypt the address part to see the original subnet ID and host ID and route the
packet.
As mentioned, these two parts and the routing mechanism haven’t changed from the current internet,
so we can still have the same size of routing table and the same efficiency in the routing protocol.
Let me show this with an example. When the destination server sends a packet, the address has a
prefix and an encrypted ID. Using the prefix, routers can use something like BGP to deliver the packet
to the network. After it arrives at the network, routers can decrypt the encrypted ID part to see the
next hop [indiscernible], in the same way as we do currently. This repeats until the packet arrives at
the end host. So again, we keep an efficient routing algorithm while many IP addresses can be assigned
to a single host.
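A toy sketch of the address scheme just described: a /64 prefix routes to the network as usual, while the 64-bit suffix is an encryption of subnet ID, host ID, and a random pseudonym ID. The field widths, the Feistel construction, and the key handling here are all illustrative assumptions; a real deployment would use a vetted block cipher.

```python
import hashlib
import os

KEY = b"network-secret"  # shared only among the network's routers


def _round_fn(half, round_key):
    """Pseudorandom round function built from SHA-256 (toy construction)."""
    digest = hashlib.sha256(round_key + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def encrypt64(value, key=KEY, rounds=4):
    """Toy 64-bit Feistel cipher standing in for the symmetric encryption."""
    left, right = value >> 32, value & 0xFFFFFFFF
    for i in range(rounds):
        left, right = right, left ^ _round_fn(right, key + bytes([i]))
    return (left << 32) | right


def decrypt64(value, key=KEY, rounds=4):
    left, right = value >> 32, value & 0xFFFFFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round_fn(left, key + bytes([i])), left
    return (left << 32) | right


def new_address(prefix, subnet_id, host_id):
    """Pack subnet (16b) | host (16b) | random pseudonym (32b), encrypt the
    64-bit result, and append it to the routable /64 prefix."""
    pseudonym_id = int.from_bytes(os.urandom(4), "big")
    plain = (subnet_id << 48) | (host_id << 32) | pseudonym_id
    return (prefix << 64) | encrypt64(plain)


def route_fields(address):
    """What an in-network router recovers to forward the packet."""
    plain = decrypt64(address & 0xFFFFFFFFFFFFFFFF)
    return plain >> 48, (plain >> 32) & 0xFFFF  # (subnet_id, host_id)
```

Two addresses minted for the same host look unrelated to an outside observer, yet any router holding the key decrypts both back to the same subnet and host IDs, which is exactly the property the design needs.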
As a proof of concept we implemented a prototype that approximates our system design. We didn’t
have an IPv6 [indiscernible] to control, so, let me step back: we didn’t have an IPv6 [indiscernible], but
there was an IPv6 Tunnel Broker we could use to build this.
Let me explain how we approximated the design. For the policy engine, we built a browser extension;
at the time we were using Chrome. The browser extension has policy functions in JavaScript that say
which pseudonym should be used for which activity.
As mentioned, we were using an IPv6 Tunnel Broker connected to our gateway server. The gateway
server maintains IP addresses from that network and works as a web proxy. When Chrome, I mean,
when our extension sends a request, it tags the request with a pseudonym ID, and the gateway assigns
the IP address matching that pseudonym ID to the outgoing socket.
Any questions? No? Based on this prototype we looked at part of the evaluation. First, we wanted to
examine whether we could build various policies into our design. To do this we looked at the
protection mechanisms out there; as mentioned, several papers discuss how to protect things like
cookies, or third-party tracker blocking.
We could implement various protection mechanisms from related work in a cross-layer manner,
meaning we could control IP and cookie at the same time in the protection mechanism. Most of them
are pretty straightforward, and they include the very simple, trivial or extreme cases mentioned in the
previous slides, or ones depending on a little more information, like the per-first-party-domain policy I
showed previously.
Then we looked at the tradeoffs: how much activity is exposed to third parties. Here we are looking at
third-party blocking, varying the number of pseudonyms. If you change the pseudonym on every
request, it limits tracking quite a bit; the third party cannot track any of the activity, but it requires a
very large number of pseudonyms.
We collected traces for about three days from end users, a pretty small trace. As you can see, the red
line shows the average number of activities observed by third parties. The middle policies can
effectively reduce the number of observed activities while not requiring too many pseudonyms.
In summary, I introduced a new abstraction called a pseudonym, which allows flexible user control over
unlinkable identities. To enable it, we provide a new network addressing and routing mechanism that
exploits the [indiscernible] IPv6 address space. Our system enables various policies with an expressive
policy framework.
Before finishing up, I will briefly introduce some of our other work and future directions. In addition
to the security-related work I just presented, I have explored many different research directions:
[indiscernible] systems, but with connections to other fields like [indiscernible] computing or machine
learning. I’ve actually had lots of collaboration with MS [indiscernible] folks.
For example, I have worked on voice and vision interactions with mobile devices: how we could enable
natural language interfaces in smartphone applications, and continuous mobile perception with a DNN
execution engine for mobile devices. I also did some HCI and machine learning work, and a little more
security-acceleration stuff, as well.
Looking forward, I plan to continue research in systems and security or privacy. One example I’m
looking into is a new scalable micropayment system. We know that Bitcoin, or the blockchain, has the
potential to be used practically, but it has several limitations, including scalability. I’m looking into how
we can build a scalable blockchain mechanism, especially for cases like incentivizing Tor users to pay
the [indiscernible]; that is, micropayments in peer-to-peer-like systems.
The second one is computer-vision related, but more from the systems side. Many researchers have
looked at how to speed up training, because it takes so much time. But in the future, if many
applications use DNNs for their services, we’ll apparently have lots of requests coming to the cloud,
and the cloud service needs to serve those requests efficiently. It’s a very interesting question from the
systems point of view how to design the scheduling and resource allocation for handling those large
numbers of DNN requests.
Finally, the last one is related to both DNNs and privacy. We’re going to see more wearable devices
like Google Glass, and AR devices like HoloLens, etcetera. In that kind of scenario, continuous vision
will pose a lot of privacy challenges as well, because there will be lots of input coming from the devices,
and the question is how applications can handle that input without violating privacy too much. As
shown, I have very broad interests across many different areas in computer science, and I hope to
collaborate not only in systems but in other fields, as well.
This is the final slide. Today I presented how we can give users more control in untrusted or unreliable
environments, with two different systems as examples. Thank you very much, and I’ll be happy to take
any further remaining questions.
[applause]
>>: What [indiscernible] when you look at the future work that you’re proposing?
>> Seungyeop Han: Yeah.
>>: It seems like a [indiscernible] broad [indiscernible] of things. For example, there’s some privacy
and wearable stuff. What’s the connection? How do we make sense of your interests; what’s the
connection among these things, basically?
>> Seungyeop Han: Well, sure, there are connections between pairs of items. I can’t say there’s an
overall single theme I want to work on. But I try to figure out what kinds of interesting problems are
coming, given new emerging systems: micropayments might be one; another is computer vision
through wearables. There are multiple different kinds of problems there.
>>: [indiscernible] see a connection between micropayments and the MetaSync work and your
[indiscernible] network and your [indiscernible] scalable work. But for privacy in wearables, what’s
the angle, or what’s the connection there?
>> Seungyeop Han: It’s hard to say that it’s connected to the other two. It’s connected to some of my
other privacy-related work. It’s also somewhat natural to be interested in this problem, because I’ve
been working with wearable devices, I mean mobile devices, to see how we can process computer-vision
workloads.
>>: Actually so there’s some mobile, okay.
>>: Pseudonym…
>>: But he’s done some work on mobile…
>>: I had a question.
>>: [indiscernible].
>>: This is maybe more of a nitty-gritty question, rather than the vision question that
[indiscernible] was alluding to. But in MetaSync, [indiscernible] asked about trying to use a storage
service that might provide a block-storage interface, as opposed to, you know, Google Drive, which
provides [indiscernible] stored files, right? You went with the latter, which is more user-facing.
>> Seungyeop Han: Right.
>>: But one of the applications that you use pretty aggressively in that setting is, you know, shared
document editing, right? Did you consider that as a workload? Because I would imagine, if you’re editing
the same file, I wasn’t able to get an idea of what entry goes into the log. You showed us the
Paxos log, the log of operations.
>> Seungyeop Han: Right.
>>: But it’s very possible that multiple users are editing the same file. That would result in a lot
of round trips just to reach agreement on which edit should go in next.
>> Seungyeop Han: Right, so that’s a good question. That’s somewhat similar to writing a paper
together with colleagues through git. I’m not sure how many conflicts you’ve had; I’ve had quite a few.
>>: I think that is a different setting, where you have multiple files and can divvy up your files
ahead of time. But just consider Google Docs, for example.
>> Seungyeop Han: Right, right. It’s not targeting Google Docs-style collaborative editing; how can I
say it, that’s kind of a special application for this setting. It would potentially be possible to
build such an application on top of these APIs. But currently, what we have is a folder, and the
Paxos log determines what the next version of the current folder is.
>>: So not at the file level, but at the folder level, okay.
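The folder-level agreement just described can be sketched roughly as follows. This is a minimal
illustration under stated assumptions, not MetaSync’s actual code: the class, method names, and the
compare-and-append acceptance rule are all hypothetical stand-ins for the real Paxos-based protocol.

```python
import hashlib


class FolderLog:
    """Append-only log in which each accepted entry fixes the next
    version of the whole folder, not of any individual file."""

    def __init__(self):
        self.entries = []  # list of (version, folder_digest) tuples

    def current_version(self):
        return len(self.entries)

    def propose(self, expected_version, files):
        """Accept the proposal only if no other client has advanced
        the log in the meantime; this stands in for the agreement
        step of a Paxos-like round."""
        if expected_version != self.current_version():
            return False  # lost the race; caller must merge and retry
        digest = hashlib.sha256(
            "".join(f"{name}:{h}" for name, h in sorted(files.items())).encode()
        ).hexdigest()
        self.entries.append((self.current_version() + 1, digest))
        return True


log = FolderLog()
ok = log.propose(0, {"paper.tex": "abc123"})        # first writer wins
conflict = log.propose(0, {"paper.tex": "def456"})  # stale version: rejected
```

In this sketch two clients editing the same file both propose version 1; only one proposal is
accepted, and the loser must merge against the new folder state and retry, which is where the extra
round trips raised in the question would come from.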
>> Matthai Philipose: Thank you.
>> Seungyeop Han: Thanks.
>>: [indiscernible]
>> Seungyeop Han: Sure.
>>: Who’s after me?