>> Ganesh Ananthanarayanan: We're happy to host Rachit Agarwal today. Rachit got his Ph.D. from UIUC and did a postdoc with Ion at Berkeley in the AMPLab. Rachit is one of the kind that truly, you know, lives up to his systems-plus-theory billing, in that he's actually published at both SIGCOMM and NSDI as well as SODA. And yeah, today he will be talking about work that he has been doing in his postdoc on Succinct, which has been getting a lot of attention in the media, in the Spark open source community, as well as at a lot of companies that are beginning to experiment with it.
>> Rachit Agarwal: Okay. Great. Thanks, Ganesh. Very happy to be here. Thank you all for coming. So, as Ganesh said, I am a postdoc, and I've been at Berkeley for two years. Mostly I've been working on the system Succinct; I'll tell you towards the end some of the things that I've been thinking about surrounding Succinct. And I did my Ph.D. more on graphs, graph queries; I'm going to talk a little less about that. But the common theme has always been interactive queries. What do I mean by that? Some user sitting right in front of the system, or services, or somebody interacting with a system, that wants to do queries on large data sets. And the challenges there are generally twofold. One is you want to get low latency, [indiscernible] latency of less than a few hundred milliseconds, and you also want to get high throughput. So that's what I'm going to talk about: how to design interactive systems, and why we started rethinking interactive systems. So the first thing is that I think achieving query interactivity is becoming increasingly harder today, and there are three main reasons for this. The first one is scale. In the last few years, or at least in the last decade, we have seen all these new social media systems that have given rise to massive amounts of data. Today, standing here, at least I can say hundreds of terabytes of data has just [indiscernible]. Right. More than that, the data growth has been reported to be very, very fast on these user-facing systems. At least on the conservative side, people have shown that a 70 percent growth rate in data size is just normal. And while the data sizes have increased significantly, people still want to do interesting queries on this data. For example, just three months ago, Twitter released their search system, so now they have indexed all their tweets [indiscernible] everybody. Now you can search for simple things like, okay, tell me all the tweets that mention a certain person. Okay. You can do more interesting things. For example, there are these new log analytics companies that allow you to do interesting queries. Here I have a simple query, a regular expression query, which says: find me all the logs that have error 404 or error 505. Right? So we would want to do these queries. At a high level, you can translate these queries into so-called search queries or regular expression queries.
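As a concrete sketch of the kind of query just described (the pattern and the log lines below are illustrative, not from any real product):

```python
import re

# A regular-expression query over log lines: find entries with error
# 404 or error 505. Pattern and data are made up for illustration.
pattern = re.compile(r"error (404|505)")
logs = ["GET /a error 404", "GET /b ok 200", "GET /c error 505"]
print([line for line in logs if pattern.search(line)])
# ['GET /a error 404', 'GET /c error 505']
```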
And we would want to do very interesting queries in graph systems like Facebook. Every time you go to a Facebook page, there are range queries happening in the backend; I'll talk a little bit about graph queries towards the end, but people want to do very interesting graph queries on these massive amounts of data as well. So: scale has increased, and people want to do complex queries. What has really not changed is the definition of interactivity. If anything, it has only gotten more and more stringent, because users don't want to wait too much. So the latency and throughput constraints have not changed. You want to do queries on massive, ever larger amounts of data, you want to do complex queries, and you still want to do that within milliseconds. Okay.
So what is it that makes this problem challenging? If you look at these interactive big data problems, let me run a very simple experiment and then show you some results. Okay. I'm going to take a massive number of records from a company called Conviva. Think of these records as a collection of attribute-value pairs, say a row, or a key value pair with multiple attributes in the value. And I'm going to run certain search queries. I'm going to use a single Amazon EC2 server with a certain amount of RAM, but show you results for a single core. Okay. One of the state-of-the-art systems for search queries is called Elasticsearch. I'm going to [indiscernible]. So let's see what happens. On the X axis, I'm going to increase the amount of data that I am doing the queries on. Okay. On the Y axis, I have the throughput, which is the number of queries you can answer per second. Let's start with Elasticsearch. So this is the throughput you see. Okay? Until roughly 16 gigabytes of data, you see really good performance; you can do roughly 200 queries per second per core with Elasticsearch. As soon as you go from 16 to 32, the throughput drops down to just, you know, one or two queries per second. And this is where the problem lies. And this is with 60 gigabytes of RAM. I'll tell you why this is the case. But it's not just Elasticsearch: you see similar results for MongoDB and similar results for Cassandra. Okay. So the question is, why is it that we see such a huge performance drop just beyond certain data sizes? Any guesses?
>> RAM?
>> Rachit Agarwal: RAM. Right? Like most of you guessed, at some point, you cannot execute queries in memory. Okay. So you have to go to secondary storage. And the problem is that secondary storage today is still 100X slower than main memory.
>> So you don't index in any way?
>> Rachit Agarwal: Yes. So this was -- I'll talk about indexes later on. This was indexed data. That's why you drop out at 16 gigabytes: the rest of the memory is taken up by index data. I'll show you exactly. But yes, we're using indexes here; Elasticsearch uses secondary indexes. Okay. So secondary storage is still 100X slower. Which means, if you do a simple calculation, you will see that even if ten percent of your queries go out of memory, right, throughput reduces by an order of magnitude. In fact, if just one percent of your queries go out of memory, your system throughput, the number of queries you can answer out of your system, reduces by 2X. Okay. And this is one of the problems that has been known in the systems [indiscernible] for a long while, which is increasing the cache hit rate. Here, it just becomes more prominent. Okay.
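A back-of-the-envelope check of those two claims, assuming in-memory queries run 100X faster than queries that spill to secondary storage (the speaker's figure; the 200 queries/second baseline is from the earlier Elasticsearch example):

```python
# Effective throughput when some fraction of queries miss memory and hit
# storage that is 100X slower. Numbers are illustrative.
def effective_throughput(mem_qps: float, miss_rate: float, slowdown: float = 100.0) -> float:
    mem_latency = 1.0 / mem_qps                 # seconds per in-memory query
    storage_latency = mem_latency * slowdown    # seconds per spilled query
    avg = (1 - miss_rate) * mem_latency + miss_rate * storage_latency
    return 1.0 / avg

print(effective_throughput(200, 0.00))   # 200.0 qps, everything in memory
print(effective_throughput(200, 0.01))   # ~100.5 qps: 1% misses, ~2X drop
print(effective_throughput(200, 0.10))   # ~18.3 qps: 10% misses, ~11X drop
```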
So this is the problem we want to understand. But it's not just today's data sizes that create the problem. If you look at what has happened over the last ten years: in 2006, the amount of memory we had was large enough to keep most data sets in memory, right? Over the last ten years, if you look at Moore's law, we have seen that memory capacity has been growing much, much more slowly than Moore's law. On the other hand, all these systems have given rise to data sets which are larger. Right? And this gap has been increasing. So it's not just that today's data you can scale out and do in memory; it's that this gap has been increasing very rapidly, and this is where the problem becomes much more challenging: how do we sustain this over time? Right? So at scale, basically, all existing systems today expose a very hard choice to users with interactive queries. Either you do very simple point lookups, where you do reads and writes of data, and you get interactivity, where you can do millions of queries per second; there are systems, both from industry and from academia, where you can do those queries very fast. On the other hand, if you want to do even slightly more complicated queries, like simple search, you really have to lose interactivity, because you go out of main memory. Right? And given the data growth rates, this problem is only getting worse.
>> Just one question. On the previous graph, is that data size all of the data or just hot data?
>> Rachit Agarwal: The hot. So this was for active data. The number that I have here, you mean?
>> Right, yeah.
>> Rachit Agarwal: Yeah. Yeah. So this number is from this paper by Partha from HP Labs. He had this survey done where they showed that for some applications, the active data, they call it active data, which is the hot, user-facing data, is increasing at this rate.
>> And how did they define hot data?
>> Rachit Agarwal: So this was one of the -- that's a good question. Okay. I don't remember the exact time range, but they just said, okay, data used within exactly this time range was called the hot data. Yes.
>> [Indiscernible] through to the next slide? So I don't know what your notion of a powerful [indiscernible] is, but Google and Bing both have interactive bounds, and the data is way bigger than [indiscernible] we're talking about. So why is that? I mean --
>> Rachit Agarwal: Right.
>> -- it seems like [indiscernible] is giving interactive performance.
>> Rachit Agarwal: Absolutely. Yes. So Google and Bing and some other companies, right? Yes. That's one of the points I was going to stress later on as well. If you could scale out today, right, and if you could continue to increase your scale-out at the 70 percent data growth rate every year, you'd be able to sustain this. The problem is that if your data continues to grow every year, you have to scale out by that particular rate every year. So if your data is growing at 50 percent every year, you have to keep adding 50 percent extra servers every year just to stay at that performance. Again, because of the same problem: your queries going out of memory. Now, the rate --
>> That's [indiscernible], right? Because it depends upon what you want -- again, it depends on what you [indiscernible]. If you're willing to live with [indiscernible], then no. Right? For example, web data. If I don't want the latest and the greatest [indiscernible], and I'm willing to live with data that is a little bit older, I can put older data in cold storage or something like that and not query it.
>> Rachit Agarwal: Exactly. So yes, I was ignoring that space of queries. For search and regular expressions, it's slightly unclear how we would define approximation there. Right? The examples that I showed you so far were for search and regular expression queries. Right? For those queries, people have defined notions of approximation, but it's slightly tricky to define approximation in the context of Google and Bing. Whereas there, there are other ways; Google has, like -- you know, Google has worked on ways where they actually optimize their systems to be able to do these search queries really, really quickly in memory. Right? And their main solution is scale-out plus optimization per server. But if you look at these more recent startups, like the log analytics companies and companies that are using Elasticsearch, they really don't want to scale out at that rate every year. They don't want to have tens of thousands of servers. And they don't have the optimizations that Google has internally, which are not open sourced. So the question is whether you can get that performance without doing Google-[indiscernible] scale-out. [Indiscernible]? Okay. Any questions? Okay. Great.
So, what is my research about? My research focuses on bridging this gap between these large data sizes and memory capacity. Okay. What I want to achieve is the functionality of so-called NoSQL stores, where you can do these powerful queries on the values, and I want to be able to do in-memory query execution for much larger data sizes than is possible today. The question is whether it's possible to achieve this new line that is going to come up on the [indiscernible]. Right? Is it really possible, you know, with 60 gigabytes of RAM, to do queries on 64, 128 gigabytes of data without losing performance? At some point the performance will drop. If it's possible, you gain in two respects. Right? The first thing is you can scale up your systems much, much larger, so you can execute more queries in memory. On the other hand, if you look at scale, you can get much better performance than today's systems. Yes?
>> Based on the trends you're describing, it seems like you're going to buy only two years. Right? Like, why -- so that doesn't seem like it should be your idea. It seems like the idea has to be something else, something that has better graceful [indiscernible] properties than what you say you're shooting for, right?
>> Rachit Agarwal: Yeah. That's a good question. So I'm going to show you [indiscernible]. If you can keep that question for a while, I'm going to show you that it is not just this point; you can actually extend this point much, much further by doing clever things. But this is not just two years. If you can get 10X more data in memory, right, and if the data is growing at a rate of 70 percent, you get roughly six years. Right? So you get roughly six years.
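A quick check of that arithmetic (the 50 and 70 percent yearly growth rates are the ones quoted earlier in the talk):

```python
import math

# Years until data growth eats a 10X gain in in-memory capacity:
# solve (1 + growth)^years = 10 for years.
for growth in (0.50, 0.70):
    years = math.log(10) / math.log(1 + growth)
    print(f"{growth:.0%} yearly growth: 10X lasts about {years:.1f} years")
# Roughly 5.7 years at 50% and 4.3 years at 70%, on the order of the
# "six years" figure above.
```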
>> [Indiscernible].
>> Rachit Agarwal: Absolutely.
>> [Indiscernible].
>> Rachit Agarwal: Absolutely. See, that's --
>> Unless you invent a compression algorithm that can do better than constant.
[Laughter]
>> And then it gets better and better with the years we put into making it optimized.
>> Why don't we do that?
>> Rachit Agarwal: I promise you that's hard. I promise you that's hard.
[Laughter]
>> Rachit Agarwal: But here's another thing. I think new memory technologies are going to arrive. Right? For example, Intel recently announced this 3D XPoint technology, right, which is going to present a very different tradeoff than what we have [indiscernible]. Right? We're going to have much larger capacities, but much lower latencies than SSDs. And for them --
>> Is that related to what you [indiscernible]?
>> If memory, if hardware is going to solve that issue, then we don't need any --
>> Rachit Agarwal: Oh, they're still going to be, you know, 5X slower than [indiscernible], but they're going to have more capacity. So for that point, you want to design solutions that can adapt to slightly higher latency but more capacity as well. Okay. So why don't I answer your question towards the end of the talk? But yes, the calculation says that you do get six years here, but not more than that. At some point, you are going to run out of memory when data sizes become large. Okay.
So to resolve this problem, I've attacked it from two perspectives. The idea is to take the problems that come out of this data, and the constraints that come out of systems, and design scalable [indiscernible] and techniques while taking these constraints and data sizes into account. And then, once you have [indiscernible] and techniques that tend to solve this problem, you want to build systems that implement these techniques. What I have focused on during the last few years of my career is the idea of having these two combined views to attack the problems. And I really think it's important, because if you just focus on designing scalable systems, then you're going to fail to leverage the structure in these interactive problems. On the other hand, if you just focus on designing scalable algorithms, then you're going to ignore the advances in scalable design. So I'm going to focus both on the algorithm side and a little bit on the system side in this talk. But interestingly, what I want to show you is that once you start looking at a problem from these two perspectives, sometimes you're able to look ahead a few years, look at how these new evolving technologies are going to change the problem space, and design solutions that will work when these technologies evolve. So I'm going to focus a little bit on that too. Yes?
>> Can you just explain what you mean by scalable systems -- what aspects of the systems other than the algorithms and techniques they're using? Those two things look very similar.
>> Rachit Agarwal: So, for example, you know, if you look at -- so, okay, let me understand your question correctly. Are you asking how scalable systems are different from scalable algorithms?
>> Scalable systems are both using scalable algorithms and techniques [indiscernible], so what are these two boxes? If I'm the only one confused by the two boxes, then I will [indiscernible].
>> [Indiscernible].
>> [Indiscernible] that's supposed to mean?
>> Rachit Agarwal: Yes, you are right. By and large, scalable systems just employ scalable techniques or algorithms running on top of them. But there's also the lower layer, where you don't want to have systems that crash if you are going to transfer 50 megabytes of data between two queries, for example. Right? So you want to have a scalable RPC layer where, for example, you can touch multiple, tens of thousands of servers and yet not have very high overheads from aggregating the data across them, and this might be out of your algorithmic space, where you [indiscernible] solve problems.
>> [Indiscernible].
>> Hardware versus software.
>> Rachit Agarwal: Okay. So okay. So this is the focus of my research. What I want to start with is understanding why it is that systems perform badly at scale. Okay. So let me give you an example of what we started with. Here I have a file, which I've just color-coded. Each of these blocks in the file could be terms or characters, and what I want to find is all the green blocks in the file. Okay? Suppose I want to find this green block. The first technique that people use in the literature is so-called data scans. Okay. The nice thing is you store your input file in memory, right? Every time a query comes in, you start scanning the file. Okay. And believe me, it's not just the animation; data scans are actually that slow. Okay? But they have something nice, right: you don't have to store anything in addition to your input data, so you just store the input data itself. So the storage overhead is not so high. The problem is that since every query has to scan the [indiscernible] data set, you have very high latency and very low throughput. At the other end of the spectrum, people have designed these absolutely great indexing techniques. You store the input file for random access, but then, in addition, you preprocess the input file to generate additional data structures, what the literature calls various kinds of secondary indexes. So what's the nice thing about indexes? Indexing works in a way that when your query comes in, you just do a binary search on your index and you get the query response. So it's super-duper fast. On the other hand, since you have to store additional data structures on top of your input file, your storage goes up. Okay. So you have high storage, but low latency and high throughput.
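A minimal sketch of the two extremes just described (the layout and names are mine, not Succinct's): a scan stores nothing extra but touches every record per query, while a secondary (inverted) index answers in near-constant time at extra storage cost.

```python
from collections import defaultdict

records = ["error 404", "ok 200", "error 505", "ok 200"]

def scan_search(term: str) -> list[int]:
    # O(data size) work per query, zero extra storage.
    return [i for i, rec in enumerate(records) if term in rec]

# Preprocessing for the index: map each token to the records containing it.
index: dict[str, list[int]] = defaultdict(list)
for i, rec in enumerate(records):
    for token in rec.split():
        index[token].append(i)

def indexed_search(term: str) -> list[int]:
    # Fast lookup per query, but the index is storage on top of the input.
    return index.get(term, [])

assert scan_search("error") == indexed_search("error") == [0, 2]
```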
So if I look at these two techniques at scale: if I increase the data size, and I plot one of the state-of-the-art systems with scans, what does the performance look like? This is what it looks like, and it's exactly what you would expect: as the data size increases, your scan latency increases roughly linearly. So if you're doing a scan in faster storage, you have a certain number of queries, not good enough, and with larger data sizes, scans [indiscernible]. And now comes the point that you asked earlier, whether these systems are indexed. Yes. So when you index these systems, Elasticsearch, right, there you have 16 gigabytes of data, and the index overhead that I showed in the last slide makes the [indiscernible] execution [indiscernible] memory. Yes.
>> Sorry, just so I can understand: [indiscernible], or is this something [indiscernible]?
>> Rachit Agarwal: So for this one, we had the Conviva data set, from the first plot that I showed earlier. In the Conviva data set, each row contains 98 different attributes: user IDs, what time they logged in on Conviva, what they watched, and so on. And for each of the queries, you search along one particular attribute.
>> [Indiscernible]?
>> Rachit Agarwal: Yes. Yes.
>> So sorry, going back to [indiscernible], I guess it's the same question. Is there some notion here that there is a key [indiscernible] Conviva column? Is that the setup? Do you find keys? Because I don't understand why you need [indiscernible] to be in memory. You can just have a place in memory and kick out the blocks from whatever [indiscernible] you want to pick up; you will probably get something in the middle, right? Low storage, because you're only storing the index, and middling throughput, because you go out and access [indiscernible] storage.
>> Rachit Agarwal: So you have to store the data in memory. I'll give you an example of why people do that. Mostly because you also want to do random access on that data. Here's an example: Twitter. There's a Redis table there; something is running on top of Redis. The way they did this, they have to store the tweets in memory. Right? At least the last few hours' worth of tweets, or the last few days' worth. But not only that: when people do a search, it's not user IDs that they're returning. They're returning the tweets as the search results.
>> [Indiscernible] the same as [indiscernible].
>> Rachit Agarwal: Compactly.
>> It's not interactive.
>> Rachit Agarwal: Yes. Does that make sense?
>> Yeah.
>> Rachit Agarwal: So you have to store the data and -- yes.
>> So for the data scans, the fact [indiscernible] today basically scan only the columns that are relevant to the query, so they can actually get much higher throughput than what you are indicating, because you are only accessing the data that's relevant. So how would you [indiscernible] that?
>> Rachit Agarwal: The result I showed you was actually for a column store. Okay. Yes, you're right. But if you have a one terabyte data set, that's a small data set, and you have ten columns, each column is 100 gigabytes. You still have to scan a hundred gigabytes of data. Even with today's memory speeds, that will take you one and a half, two seconds, and we are talking about hundreds of queries per second here.
>> And the other [indiscernible] in terms of the data scans was indexes, right? So the other part of the problem is [indiscernible] handling freshness of data: how quickly can you ingest new data and make it available for query? Indexing can be a more expensive operation than just scanning the data in [indiscernible] fashion, so you can [indiscernible]. So will you be talking about this problem?
>> Rachit Agarwal: Yeah. You're absolutely right, and no, I will not be talking about it. Data freshness is a very important problem, and personally I think it's going to become more and more important. All these techniques that preprocess the data have this problem, that you have to preprocess the data. What today's systems do, and what Succinct does as well, is have an append-only data model: when new data comes in, you have a log store where all the new data lands, and you have to find really fast ways to update this data as well as execute queries on it. Right? If you want in-place updates, I personally think this is a problem that both the systems community and the algorithms community have not resolved yet. Right? Updating indexes, we all know, is a very complex problem, and it's not that we have a fundamental reason to know why it's a complex problem. Everybody just says it's complex, but nobody has resolved that problem. So I do think that there's space there to solve a problem. But in terms of freshness of data, I think in most systems the technique to handle this is to have a separate write store and a separate read store, and that's what Succinct does as well. Any other questions? Okay. Good. So okay.
So this is the cost you're paying by executing queries off slower storage. So what does Succinct do? Okay. Here is a two-slide description of Succinct. On the first slide, I'm going to tell you what Succinct does, and on the second slide I'm going to [indiscernible] answer what powerful queries mean and what Succinct can do. Okay. So Succinct takes your input file, okay, and preprocesses the file to compress it, to store a suite of data structures. The interesting thing here, which I would like you to note, is that Succinct does not have to store the input data. All it stores is this compressed representation of the input data. Right. Okay. Succinct takes the input file, generates this compressed representation, and now you can execute a very wide range of queries directly on this compressed representation. Okay. So what I want to convince you of over the next few minutes is that you can get low storage and high throughput for a larger range of input sizes than what is possible today. Okay. So why is this interesting? It's interesting because Succinct doesn't have to store any additional indexes, right? But more than that, even though it's not storing indexes, it avoids data scans, and hence it's providing you the functionality of indexes. The interesting thing is that this compressed representation of the data actually contains within itself the bits, the information bits, required to get the functionality of indexes. Okay. So you don't store indexes, but you get the functionality of indexes. You avoid data scans. And unless you want to access the data itself, you don't have to decompress; in Succinct, all the queries are executed directly on the compressed representation. Yes?
>> So you just said indexes are hard to update, right? How do you update your compressed representations?
>> Rachit Agarwal: Oh, like I said, when I say they're hard to update, we mean in terms of in-place updates. When you're doing appends, most people use a write-optimized store and then a read-optimized store. The read-optimized store is going to contain the compressed representations in Succinct. The write-optimized store is a log store where you keep uncompressed data, and then you periodically transfer the uncompressed data into the compressed representation. Okay. Now, people have done a lot of work in systems on designing very efficient log stores. But they did not have to solve the problem of executing search queries on them, and we did not want the log store to become a latency or throughput bottleneck, so you can't scan the data in the log store. So what we did was use some very simple techniques for speeding up the queries so we can avoid scanning the entire file. These techniques have been worked on in the database community; we just use simple n-gram indexes. And those are very fast to update. They're just two hash table lookups.
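A hedged sketch of that write/read split (the structure below is my reading of the description; the zlib blobs stand in for Succinct's queryable compressed shards, which need no decompression to search):

```python
import zlib

class Store:
    def __init__(self):
        self.read_store: list[bytes] = []   # compressed, read-optimized shards
        self.log_store = bytearray()        # uncompressed, write-optimized log

    def append(self, data: bytes) -> None:
        self.log_store += data              # cheap append, no re-indexing

    def compact(self) -> None:
        # Periodically fold the log into a compressed, read-optimized shard.
        if self.log_store:
            self.read_store.append(zlib.compress(bytes(self.log_store)))
            self.log_store.clear()

    def search(self, q: bytes) -> bool:
        # The log is small, so scanning it (or a light n-gram index over
        # it) keeps it from becoming the bottleneck.
        if q in self.log_store:
            return True
        # Stand-in only: decompress and scan. Succinct instead executes
        # the query directly on the compressed representation.
        return any(q in zlib.decompress(s) for s in self.read_store)

s = Store()
s.append(b"error 404 ")
s.compact()
s.append(b"error 505 ")
assert s.search(b"404") and s.search(b"505")
```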
>> It seems like there's also the question of whether every compressed representation can be incrementally updated.
>> Rachit Agarwal: Even Succinct --
>> Speaking of compressed representations [indiscernible].
>> Rachit Agarwal: Absolutely. Even Succinct. So this is one of the problems. Yes. So Anurag, the student who has been working on this project, is now working on some techniques to be able to update not just the compressed representations; there's also this new BlowFish project that I'll talk about later on, where we show that you can vary the compression factor, so you get the performance of indexes as well. And there we are really trying to understand why, for the last five years, ten years, everybody has been saying that it's really hard to update indexes, because there's no easy reason to understand it. There's no lower bound on updating indexes in the theory community. The best lower bound is still, I think, 25 years old, which is log N update and log N query time, which you can do in tens of microseconds today. Yes?
>> [Indiscernible].
>> Rachit Agarwal: It's the third point, that you won't have to do data scans. The advantage Succinct gets you by combining the first two points is that we are not storing indexes, but we are still providing the functionality of indexes. So what you get is [indiscernible]. Yes.
>> It's very interesting that you don't have any [indiscernible]. It looks at having very, very exact index representations. Right? And having a log store, or things like [indiscernible] and [indiscernible]. They might not be looking at search, but I'm assuming once you have an index, you can do something interesting, like, you know, combining that with a column store. So this is just sort of the comparison with systems like Elasticsearch, [indiscernible], which is what's popular in industry today; but later on, will you show us some comparison with state-of-the-art research systems?
>> Rachit Agarwal: So I do think that Elasticsearch is state of the art, at least for search queries. MICA is a simple key value store; it does not support any query beyond simple reads and writes, so it does not have any notion of indexes. Same thing with RAMCloud as well. SILT is a slightly different system. SILT was supposed to get memory efficiency, just like MICA did, but again, for simple key value pair lookups. Right? They did not have any functionality beyond simple key value pair lookups. Now, there can be tradeoffs. The tradeoff is that they can achieve much better performance than Succinct, for random access queries only. Right? Because now you can push much more data into memory; you don't have to worry about search, and your data structure is not optimized for that. But that comparison did not make sense, because they have much weaker functionality. Okay. But I can tell you numbers: there are going to be 10X, 10X more random access queries that you can do with MICA, with the specialized hardware they have.
Okay. So coming back to this plot: Succinct. For Succinct to work exactly like this, this part is not fundamental. Okay. I think the Succinct system does not have the overheads that these very [indiscernible] systems have. This part is not fundamental. The fundamental part is this flat line there: you can maintain your performance for a much larger range of input sizes. So this is essentially what you get in terms of bridging the gap; you can handle roughly 10X larger data sizes today and get interactivity at scale. But once I show you the [indiscernible], I should also tell you -- oh, okay. I'm going to tell you what Succinct can do and then show you what Succinct cannot do. So what is Succinct's data model, and what kind of queries can Succinct do today? When we started designing the system, we decided to go with queries on flat, unstructured files. Okay. And this may sound as boring as the file itself, unstructured files, but what I'm going to show you later on is that we as system designers should really be thinking about flat files much more than we do. I'm going to show you that using this simple interface, you can implement many, many powerful models on top of flat files, including key value stores, document stores, tables, and even graphs. So here's the original input. What Succinct does is store this compressed representation, and now you can execute a lot of interesting queries. What do I mean by interesting queries? The first one is that you can execute search. Right? The search query will return either the keys, if you're thinking about key value pairs, or, if you're thinking about flat files, the offsets where the results lie in the original input, but it will execute on this compressed representation. Right? You can do random access starting at any arbitrary offset; you can extract as much data as you want. The third one, which we use for many, many optimizations, is that you can do very, very fast counts in Succinct. If you want to count occurrences of a certain string, you can execute that really, really fast. You can append new data. And now the interesting things: you can do range queries. The interesting thing in Succinct is that range queries have the same complexity as the original search query, so you can get really fast range queries in Succinct. And finally, a project that we recently finished, and which I'm very excited about, is that now you can execute these very powerful regular expression queries directly on compressed data. So really, we have nailed the space of search: you can execute as powerful search queries as one can, directly on this compressed data. Okay.
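To make that flat-file interface concrete, here is a toy, uncompressed stand-in (the class and method names are mine, not the released API); Succinct supports the same operations, but on the compressed representation and without storing the raw bytes:

```python
class FlatFile:
    """Toy reference for the flat-file query interface described above."""

    def __init__(self, data: bytes):
        self.data = data                    # Succinct would not store this

    def search(self, q: bytes) -> list[int]:
        hits, i = [], self.data.find(q)
        while i != -1:
            hits.append(i)                  # offsets into the flat file
            i = self.data.find(q, i + 1)
        return hits

    def count(self, q: bytes) -> int:
        return len(self.search(q))          # Succinct counts much faster

    def extract(self, offset: int, length: int) -> bytes:
        return self.data[offset:offset + length]   # random access

    def append(self, more: bytes) -> None:
        self.data += more                   # Succinct appends to a log store

f = FlatFile(b"happy puppy")
assert f.search(b"ppy") == [2, 8]
assert f.count(b"p") == 5
assert f.extract(6, 5) == b"puppy"
```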
>> Does the search on this give you back an ordered list?
>> Rachit Agarwal: An ordered list? No. No. It doesn't have to be a sorted, ordered list. But if you mean that -- okay, search is taking here a couple of milliseconds. Right? So sorting, even if you have tens of thousands of results, is not going to be the latency bottleneck. Somebody else had a question? Yes?
>> Range query within the file?
>> Rachit Agarwal: Yes. Range queries within a file, or within a column if you think about columnar stores, or within a value, you know, a particular attribute, if you're thinking key value pairs.
>> [Indiscernible]?
>> Rachit Agarwal: Yes. Yes. Yes?
>> I'm a little bit struggling. The answer is a single attribute answer, right? As in, if you are asking for "show me column Y of all of the green logs," your compressed representation is not going to get you the column Y that is associated with the green logs.
>> Rachit Agarwal: It is, actually. So, okay, right now we're talking about flat files. If you're thinking about key value stores or key value pairs or tables, let me tell you what this would mean. If your query is along a column, the results you get would be the primary keys in your table. Okay? Once you have the primary key, you can also do random access in Succinct, so you can extract any of the other columns that you want.
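A small sketch of that two-step pattern (the table layout here is mine): a search along one column yields primary keys, and random access then extracts any other column for those keys.

```python
# Primary key -> row; imagine the rows serialized into one flat file.
table = {
    "k1": {"status": "error 404", "user": "a"},
    "k2": {"status": "ok 200",    "user": "b"},
    "k3": {"status": "error 505", "user": "c"},
}

# Step 1: search along one column; the hits are primary keys.
keys = [k for k, row in table.items() if "error" in row["status"]]
# Step 2: random access pulls any other column for those keys.
users = [table[k]["user"] for k in keys]
assert keys == ["k1", "k3"] and users == ["a", "c"]
```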
>> But that seems to require a [indiscernible] data structure.
>> Rachit Agarwal: No, no, no. All these queries execute on a single data structure that Succinct stores: the Succinct compressed representation. Search, random access, regular expressions: just using this single compressed representation, you can execute all the queries that are listed here. Okay? So we are not going to change the data structures based on the queries. It's the same data structure that allows you to execute all these queries.
>> So we can take this offline, and just to be [indiscernible], among those queries, are multi-attribute --
>> Rachit Agarwal: Yes, this is for flat files only. I'm going to show you later on how I generalize these to tables or columns. Okay. Yes? I'm going to show you how we implement key value stores and tables on top of that. Sorry. Yeah. This was only the API for flat files. Okay? So okay.
We can do queries on compressed data, and we don't need additional indexes. We don't do data scans, and we don't do data decompression. When I gave this talk at Google the first time, they told me, okay, you must be joking. And in fact, it became a very interesting question for us to understand, you know, what tradeoffs we're making in this system. Right? And we do make very strange tradeoffs. Strong tradeoffs. The first one is, like somebody asked here, we spend time preprocessing all the data. Right? So if you have a system where you want to execute queries only, you know, thousands of times, then this is probably not very interesting. This is interesting where you want to do millions of queries. The second thing is, if you want to access the data, compared to systems like MICA we have to decompress; we have to extract a certain number of bytes, and that takes extra CPU cycles. Okay? So throughput will be lower if you just focus on random access itself. The third one is that Succinct has focused on point queries. We do not really care about sequential scan throughput, and hence it's not very useful for systems like Hadoop or MapReduce, right, where you really care about how much data you can read per second. And finally, we do not support in-place updates very efficiently. The way they are supported in Succinct right now is deletes followed by appends, you know, delete plus append [indiscernible]. Okay. So how much time do I have?
I'm running very late. But okay. I do want to give you some idea about Succinct, because I really think the data structures are simple. Okay. So I can convince you that this is not a complicated technique; it was a simple idea, in hindsight. So I thought I'd give you some idea about the data structures, okay? It builds upon a lot of theory work which was done in the late '90s and early 2000s. And there are two main ideas, in terms of search, that people use when querying [indiscernible]: something called the Burrows-Wheeler Transform, BWT, which was viewed as an [indiscernible] today, and then something else, suffix arrays, which are much less appreciated than they should be. Succinct builds upon the latter, suffix arrays. And we have some new data structures which make it very efficient. In particular, these new data structures impose some new structure on the data; by exploiting this structure, we can execute queries much more quickly than we could earlier. So I want to tell you about these data structures, and about how these queries execute. But let me start with the suffix arrays, so all of us have the same background. So what do suffix arrays do? Suppose I have this file, okay. The file is just "happy puppy," and the numbers at the top are just the indexes into the file. Okay. I don't have to rush, right? Okay. So the numbers over here are just the indexes into the file. The way suffix arrays work is you first construct all the suffixes of your input file. Since these are suffixes, the entire file becomes the first suffix. You remove the first character, and the remaining string becomes the second suffix. And so on.
>> [Indiscernible].
>> Rachit Agarwal: Right. And the suffix array stores these suffixes in sorted order. Okay. So I have all the suffixes sorted lexicographically. Now, for each suffix, you store its location in the input file. So the first suffix, which starts with A, starts here, at location one. Okay. The second one at location zero, and so on. Okay. So this integer array is what is called a suffix array. Okay. And then we have the suffixes in sorted order. So what is the problem? Actually, the nice thing is that if you want to do a substring search, this is just plain simple binary search. Once you have done the binary search, the corresponding numbers here give you the search results. Right? Okay.
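A minimal version of that substring search over a suffix array, using the talk's "happy puppy" example (this naive construction is for illustration only; real systems build the array far more efficiently, and the bisect key argument needs Python 3.10+):

```python
from bisect import bisect_left, bisect_right

text = "happy puppy"
# Suffix array: start positions of all suffixes, in lexicographic order.
sa = sorted(range(len(text)), key=lambda i: text[i:])

def sa_search(pattern: str) -> list[int]:
    # Plain binary search over the sorted suffixes: find the block of
    # suffixes whose prefix equals the pattern.
    m = len(pattern)
    lo = bisect_left(sa, pattern, key=lambda i: text[i:i + m])
    hi = bisect_right(sa, pattern, key=lambda i: text[i:i + m])
    return sorted(sa[lo:hi])        # start offsets of every match

assert sa_search("pp") == [2, 8]
assert sa_search("happy") == [0]
```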
The problem is, if you have a file with N characters, the top array has roughly N squared characters [indiscernible], right? The sum of one to N. The second one stores a pointer into the input file, and each pointer requires log N bits, so it has size N log N. Right? Okay. And if you look at the input file, if you think about ASCII files, you have only 8N bits in the input file. So these two [indiscernible] are much, much larger than the input file. Okay. I want to reduce that space.
So here's the idea. Let's focus on the first two entries in the suffix array. Okay. This slide is a slightly complicated one, but I'll try to make it simple. What I want to do is first see whether there is some structure in these first two entries. Right? What is the structure? I got the second suffix by removing the first character from the first suffix. Right? Which means if I store this pointer, which says where my next suffix is stored, right, then I can forget about this entire second suffix and reconstruct it by following the pointer. Okay? Stop me if this is confusing. But more importantly, this array is storing locations in the input file. So if I remove the first character, my value is only going to increase by one, because the next suffix starts at the next location. Which means this pointer also tells me where the next larger value is stored in this array. Okay? Good. So I have removed this entire suffix, and I could now remove this value too and compute these values on the fly, if I could store these pointers. Make sense? Okay. Then I just [indiscernible] over the entire array, do the same thing over and over again, and I have a collection of pointers that allow me to compute the unsampled values, and I did not even store the entire array. But since this array was sorted, I don't even have to store one character per entry. I just have to store the first occurrence. Right? So I have taken these two massive arrays and reduced them down to one sampled array plus a few bytes. This is what Succinct stores as the first step. Okay.
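A sketch of that sampling idea (variable names are mine; the pointer array is the one the talk describes, often written as Psi in the literature): keep only suffix-array values that are multiples of a sampling rate, and recover the rest by walking the pointers.

```python
text = "happy puppy$"                   # "$" sentinel marks the end
n = len(text)
sa = sorted(range(n), key=lambda i: text[i:])      # full array, setup only
rank = {pos: idx for idx, pos in enumerate(sa)}    # inverse permutation
# psi[idx]: where the one-character-shorter suffix sits in sorted order.
psi = [rank[(sa[idx] + 1) % n] for idx in range(n)]

alpha = 4                                          # sampling rate
sampled = {idx: sa[idx] for idx in range(n) if sa[idx] % alpha == 0}

def lookup(idx: int) -> int:
    # Walk pointers until a sampled entry; each hop raises the value by 1,
    # so the answer is the sampled value minus the number of hops.
    steps = 0
    while idx not in sampled:
        idx, steps = psi[idx], steps + 1
    return (sampled[idx] - steps) % n   # modulo handles the sentinel wrap

assert all(lookup(i) == sa[i] for i in range(n))
```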
Now, what were these pointers storing? This was a pointer into this array, right? Which means this array is N long, so each pointer is also going to take log N bits. So that's a problem. But yes, like I said, you can compute the unsampled values by following these pointers: once you hit a sampled value, the sampled value minus the number of pointers looked up gives you the desired result. Right? So what do I do about this array? Because it has log N bits per entry. So what I did was I took these two large arrays and stored a set of pointers that allow me to reconstruct this large array. Right? But this array will also [indiscernible] log N bits per entry. But can anybody see structure there? If you look at all the values that start with the same character, they form an increasing sequence of integers. So although I did not have interesting structure in the suffix arrays, these pointers, which allow me to reconstruct the suffixes on the fly, have a very interesting structure. Right? And we know how to compress these increasing integer sequences very efficiently, using delta encoding, for example. Okay.
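A sketch of why those pointer runs compress well (the exact encoding Succinct uses may differ): an increasing run stores compactly as a start value plus small deltas, and a fully contiguous run needs only a start and a length.

```python
def delta_encode(seq: list[int]) -> tuple[int, list[int]]:
    # Store the first value, then successive differences.
    return seq[0], [b - a for a, b in zip(seq, seq[1:])]

run = [17, 18, 19, 20, 23, 24]      # pointers for one starting character
start, deltas = delta_encode(run)
assert (start, deltas) == (17, [1, 1, 1, 3, 1])
# A contiguous run (all deltas equal to 1) needs only (start, length).
```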
So I can store a very compressed representation of this data structure. What Succinct does is something more: it takes this data structure, the third one, and transforms it into a representation where each row is not just an increasing sequence of integers but actually a contiguous sequence of integers. Okay. And since contiguous sequences of integers can be compressed heavily, you get a lot of benefits there. More than that, it also allows us to do some queries very efficiently: rather than doing a binary search over the entire array, which is not very cache efficient, we can reduce the binary search to a very small part of the array. And then Succinct [indiscernible] finally this part of the data structures, which allows you to do random access queries; these were mainly for search, and then random access queries. Okay. So that's one way to think about it. Yes?
>> I'm just kind of wondering: do you ever worry about the number of memory accesses as you are compressing data [indiscernible], like, the times you were talking about memory access [indiscernible]?
>> Rachit Agarwal: So I missed the last part.
>> So I'm guessing, with all this compression that you're doing here, it's like every query will take a lot of memory accesses for you to reconstruct and follow pointers and whatnot. So do you not worry about --
>> Rachit Agarwal: We do, actually. We do worry about it.
So what I did not talk about is that Succinct gives you the following guarantee: if you're doing random access for B bits, then you have to do B plus log N pointer lookups. Okay? Which means if you're extracting 1,000 bytes, you have to do just 16 extra pointer lookups. Okay? So you have 1,000 plus 16 pointer lookups. Now, when you're doing search, you have to do exactly log N pointer lookups; actually, 2 log N in the current implementation. Okay. Now, if I take a 100 gigabyte file, this boils down to, roughly speaking, 38, 39 pointer lookups on a single core, at 62 nanoseconds each. You can still do queries, you know, much, much faster than going to SSDs. Right? So unless you are going to do thousands or even hundreds of pointer lookups, it's not a problem. But yes, you have some overheads from doing queries on compressed data. Right? As long as those overheads are less than the overheads of going to secondary storage, you win.
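A rough comparison using only the numbers quoted in the talk (the 25-microsecond SSD figure comes up again a little later):

```python
lookups = 39        # speaker's pointer-lookup count for a 100 GB file
dram_ns = 62        # nanoseconds per pointer lookup, as quoted
ssd_ns = 25_000     # one 25-microsecond SSD random lookup
print(lookups * dram_ns, "ns of pointer chasing vs", ssd_ns, "ns per SSD read")
# ~2.4 microseconds of in-memory pointer chasing still wins by ~10X
# over a single SSD access.
```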
>> Secondary storage, you said, was [indiscernible]?
>> Rachit Agarwal: Yes. SSDs, for example. Today if you look at SS --
>> [Indiscernible] the number of memory accesses is fewer than whatever that amounts to.
>> Rachit Agarwal: But when you're doing an index lookup, unless you have very smart data structures like Google, if you're doing an index lookup, you have to do binary search over SSD-stored data. So even for search, you have to go to SSDs multiple times. There are techniques to avoid that. I'm saying there are techniques to avoid that, but then that means multiple indexes, three indexes, and that just makes it very complicated. Yes?
>> Just to follow up on what you just said: you said the comparison point was going to secondary storage. But isn't there sort of an intermediate point where you keep everything compressed in memory but you just continually scan it? You don't worry about doing an index. Depending on your query workload, that could be much faster than secondary storage, and depending on how many pointer lookups you have to trade off and the latency you're willing to tolerate, that can also be a bit of a point.
>> Rachit Agarwal: So if I understand your question correctly, you're asking why we don't just do a scan.
>> Well, I'm just saying that -- no, I understand why you don't want to do scans. But I'm saying that maybe that should be your fallback position rather than secondary storage. Saying we can tolerate so many pointer lookups; well, actually, at some point you're better off just doing scans.
>> Rachit Agarwal: Actually, no. I'll tell you why. I think scans are really, really slow. Here's the reason. Right? Ten gigabytes of data still takes a hundred milliseconds to scan, even if you're not doing any kind of computation today. Right? Now, if you --
>> But you're doing it from all the cores in a --
>> Rachit Agarwal: No, single core. I'm only talking about single core performance. Yes, you can parallelize it. So if you have 100 terabytes of data, and one single core gets, say, ten gigabytes of data, you still have to spend 100 milliseconds, right?
>> And you are -- how many milliseconds does that take?
>> Rachit Agarwal: Say 100 milliseconds, even if you are [indiscernible] hundred megabytes per second memory --
>> [Indiscernible].
>> Rachit Agarwal: Sorry?
>> Seems way too fast. Is that good for you?
>> Rachit Agarwal: Yeah. I'm saying that I'm doing a conservative analysis, right? I'm doing a conservative analysis that even ten --
>> [Indiscernible].
>> You can compress it. [Indiscernible].
>> Rachit Agarwal: Believe me, I think [indiscernible] people have worked on that problem. But what I'm saying is that if you are spending 100 milliseconds scanning the data, going to SSD today you can get state-of-the-art [indiscernible] 25-microsecond-latency random lookups.
>> [Indiscernible].
>> Rachit Agarwal: Right.
>> Are you talking about 4000X extra time? I'll shut up.
>> Rachit Agarwal: Right? Yes?
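The arithmetic behind that exchange, using the speaker's numbers:

```python
scan_ms = 100             # scanning 10 GB, per the conservative estimate
ssd_lookup_ms = 0.025     # one 25-microsecond SSD random lookup
print(scan_ms / ssd_lookup_ms)   # 4000.0, the "4000X" in the question
```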
>> So [indiscernible] the whole file as a huge stream, right?
>> Rachit Agarwal: I'm going to get to that in the next slide. You mean the key value pairs and stuff, right?
>> No, no, no, the flat file. You are treating the whole flat file as a --
>> Rachit Agarwal: As a stream right now. In this example that I showed, yes.
>> Oh, so I'm wondering: if we want to append [indiscernible], just append to that sort of file, right? So how do you update the structure? Do you need to recompute the suffix array again? I'm just wondering -- and also, as [indiscernible] keeps increasing, do we need to increase the [indiscernible] for every item, every integer [indiscernible]?
>> Rachit Agarwal: So we use a conventional solution, at least in the current version, right: you have a write-optimized log store where you append the new data, and, since the data is sharded, a read-optimized store where the Succinct data structures are not updated unless you're deleting something. Okay? Now, when new data arrives, you collect the data for a while, right, and on this data we have some optimizations for search queries, so it does not become a bottleneck. And then you periodically transfer this into the compressed representation. Okay? So this is the standard log store approach. In fact, the SILT paper that somebody mentioned does this multi-store approach. Yes. Yes. Okay.
So, actually: indexes and scans. On this curve, we've now got one new point, which is Succinct. Okay? Most of this I've talked about earlier: the Conviva data set, 1.5 kilobyte records, 98 attributes, and systems like Elasticsearch, MongoDB, and Cassandra. I want to show you some quick numbers on storage. So this is the system's amount of memory. Okay? I'm going to plot just the storage for [indiscernible], which means I have a certain data size, the raw input data size, and what is the storage footprint for these systems? Okay? If you look at Cassandra, it is roughly around eight, somewhere between 8 and 16, and it runs out of memory. Okay. I don't think this is fundamentally wrong with Cassandra; it's just that that system was not optimized for memory. But a lot of it is coming out of index overhead. If you look at MongoDB, it has similar performance. Elasticsearch is actually much better. And oh, one thing I should say is that even after days of experiments, we couldn't get beyond where the curves stop. That's the last point each system can work at with 60 gigabytes of RAM.
>> What is the data here?
>> Rachit Agarwal: The Conviva data set. It's a collection of records with 98 columns or attributes. Right? And it's 1.5 KB per record. This is one of the companies that collects video data and has video logs.
>> So it's video data?
>> Rachit Agarwal: No, these are log files, user-generated log files: when did you start watching a video, when did you stop, what video were you watching, what was the length of the video, where did the video come from, and all those things. So different --
>> [Indiscernible] the Y axis.
>> Rachit Agarwal: The Y axis, oh, yeah. This is the system storage footprint, which means, for a given raw data size, right, what is the amount of data that these systems hold after creating indexes and everything? Okay.
>> Real memory that they use?
>> Rachit Agarwal: Yes. So essentially, thinking about it: how much memory would you need if you wanted to put all this data in memory? Right? And this is the line that I drew, which is the system memory. And here's what --
>> Are you using -- all right.
>> Rachit Agarwal: Yes, please.
>> To interfere with the bottom line here: are you using huge pages in all these systems?
>> Rachit Agarwal: No. So I think Cassandra does not support huge pages. In Succinct, we disabled huge pages for fair comparison. But I'll show you some numbers towards the end where we actually get a lot of gains by using huge pages, and then I'm going to compare against different systems there. Yes?
>> Some questions here. I mean, the [indiscernible] transform was developed quite a long time ago, and the capability to execute search efficiently on that data structure is also known [indiscernible]. Can you comment on what your exact contributions are?
>> Rachit Agarwal: Yes.
>> To view the [indiscernible] previous [indiscernible]? And you're describing this performance; the efficiency of the [indiscernible] depends on the data characteristics. But, I mean, I assume you worked a lot on the Conviva data set. That might be more compressible. If you are searching over other things, like [indiscernible] web pages, the compression [indiscernible] may be different. Can you comment on those?
>> Rachit Agarwal: Okay. So, two questions. The first one is: what were our contributions, given that BWTs have been known for a while? So there are two directions on BWTs. One is the transform itself. Right? That does not have -- and you can do very efficient search queries on that, and Google actually does that. The problem is that it does not provide you compression. People have worked on reducing the storage overhead of BWTs, okay, using some structure there. Those kinds of techniques are also used in bzip2, which is a standard compression technique. But then you lose your ability to do queries on the compressed representations; you have to decompress the data, which has its own latency overheads.
>> [Indiscernible].
>> Rachit Agarwal: Exactly. Less compression.
>> [Indiscernible].
>> Rachit Agarwal: Yes, for compression. Okay? Now, your second question was that these numbers, especially the storage numbers, depend on the data itself.
>> So I asked the question: are you transforming the whole data set? Meaning, are you trying to [indiscernible] for the entire data set, or are you trying to segment the data set [indiscernible]?
>> Rachit Agarwal: Yeah. Succinct allows you to do flexible data sharding, which means you can shard your data along columns, if you know that your queries are only going to be along columns, or you can shard basically like Elasticsearch and MongoDB do, which is row sharding. Right? And then for each of the shards, you construct these data structures. Right? Does that answer your question?
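An illustrative-only sketch of the two sharding layouts just mentioned (data and shapes are made up); each shard would then get its own compressed representation:

```python
rows = [
    {"user": "a", "video": "x"},
    {"user": "b", "video": "y"},
    {"user": "c", "video": "z"},
]

row_shards = [rows[0::2], rows[1::2]]                     # row sharding
col_shards = {c: [r[c] for r in rows] for c in rows[0]}   # column sharding
print(row_shards)
print(col_shards)
```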
Sorry, I still have not answered one of the questions, which was the dependency of these numbers on the data itself. So yes, you are right, and any compression [indiscernible] would have that property. Now, there are two things. One is that even if your data is completely incompressible, right, and this is slightly non-intuitive, and I didn't want to say it earlier: Succinct still allows you to do search on it. Maybe you won't get data compression, which means your compression factor would be one, but you will still get the functionality of indexes. Right?
>> Okay. But I'm not sure [indiscernible] index efficiently [indiscernible].
>> Rachit Agarwal: Absolutely. Yes. So, performance-wise, let me settle the storage numbers first. What we did was experiment with many, many different data sets, I think 20 data sets, and we have some numbers in our paper. What we showed was the following: if you took your data set and compressed it using Gzip, right, and then you took it and compressed it with Succinct, for most, actually for all, of the data sets, what we have seen is that the numbers lie between 1.4X and 1.7X of Gzip.
>> The size is one point --
>> Rachit Agarwal: 1.4 to 1.7X of Gzip, which means you're paying 40 to 70 percent extra compared to Gzip. But you get all these functionalities. Okay?
>> Are you going to show us experiments where you vary how much you -- the granularity of compression?
>> Rachit Agarwal: I have not told you yet that Succinct can change its compression factor, but I'm going to come to that towards the end. Right now, this is a fixed compression factor that Succinct allows you. I'm going to show you some numbers, I think towards the end, but not with a varying compression factor.
>> I meant the sharding of the data for the compressed indexes. Because then you can search two indexes at once, so you have more memory accesses, but it's in parallel and --
>> Rachit Agarwal: Yeah, but then you get --
>> -- all on the same machine, right?
>> Rachit Agarwal: See, in terms of throughput, it's easier if you just parallelize; you're going to get that many improvements. Linear improvements, right?
>> Right. But if you're up against the interactivity deadline, there's the user at the end, and a hundred memory references might not work for them, but 50 might. So being able to configure that appropriately to what's happening is important.
>> Rachit Agarwal: Yes. If you give me not more than five minutes, I'll show you a cool result. Okay? So, search results. Like I said, these are the three systems, and Succinct runs something like that. After Succinct [indiscernible] performance [indiscernible], and again, you know, this is when Succinct stops [indiscernible] in memory. It doesn't [indiscernible] 256 on 60 gigabytes of RAM. But see, this is something which I think is the power of Succinct. You take a machine with 60 gigabytes of RAM, right, you are putting 128 gigabytes of data on that machine, and you are still getting millisecond or sub-second queries. Okay. So you're putting in more data than the RAM itself. And then we have random access throughput, where the performance numbers look very similar. And you know, the only thing is that the degradation is much more graceful, not just flat drops in throughput. Yes?
>> So how -- what would the gains be, say you ran on dB CBS?
>> Rachit Agarwal: dB CBS?
>> 16 gigabytes.
>> Rachit Agarwal: 16 gigabytes?
>> Yeah. Without one of the things [indiscernible]. The question is about query generality or query [indiscernible], as opposed to -- I don't know what [indiscernible] you're using here.
>> Rachit Agarwal: So I showed you the set of queries that Succinct supports, right? Search, regular expressions, range queries, and then counts and random access. What kind of queries do we not support? The aggregate queries, which means if you want to do an average -- in general, [indiscernible] queries I don't yet know how to do --
>> But you had a counter.
>> Rachit Agarwal: Oh, counters -- okay, yes. One can say count is [indiscernible] query, but what I meant is, suppose you have a column that is the salary of people and then you say, okay, find me the average salary. I don't know how to do that query yet on compressed data directly.
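To make that query surface concrete, here is a minimal Python sketch of the kind of interface being described; the class and method names are hypothetical stand-ins, not Succinct's actual API:

    # Hypothetical sketch of the Succinct-style query surface from the talk.
    # All names are illustrative; Succinct's real API differs.
    class SuccinctLikeStore:
        def search(self, term: str) -> list[int]:
            """Offsets of all occurrences of term (supported)."""
        def regex(self, pattern: str) -> list[int]:
            """Regular expression search (supported)."""
        def range_query(self, lo: str, hi: str) -> list[int]:
            """All values between lo and hi (supported)."""
        def count(self, term: str) -> int:
            """Number of occurrences, cheaper than full search (supported)."""
        def extract(self, offset: int, length: int) -> bytes:
            """Random access into the compressed data (supported)."""
        # Aggregates such as AVG(salary) are *not* supported: per the talk,
        # they are not yet known how to answer on the compressed data directly.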
>> What is your number? What [indiscernible]?
>> Rachit Agarwal: So we were looking at NoSQL stores, right? So yes, I'm not answering your question, but I don't know the answer exactly. We are looking more on the NoSQL side, where people look at systems like [indiscernible] Cassandra and these, right? And I'm not sure if people run dB CBS on those systems. But I'll be happy to run some numbers and see what kind of queries. Those are more -- actually, I haven't looked at it -- those are more SQL queries, right? The problem there is that even if a SQL query has an aggregate part, right, then I'll have to say I cannot support that query.
>> [Indiscernible]?
>> Rachit Agarwal: Yes, I think -- ah, in terms of SQL queries, we have to understand the coverage, right? But our focus was not on SQL at all. Even when we released Succinct on top of Spark, we have support for Spark SQL, where you can implement the [indiscernible] fast, right, but we did not release Succinct as part of SQL because --
>> [Indiscernible].
>> Rachit Agarwal: Yeah. So filters is one thing that people use, and LIKE, which you can implement as a regular expression. So I think Succinct would be a part of your SQL execution rather than your SQL execution engine, [indiscernible] speaking, because I think a scan is just too hard for Succinct, and you have to do some scans in Spark.
>> [Indiscernible].
>> Rachit Agarwal: Okay. I want to tell you some more cool things, but I think I'm going to start skipping a few things. So now, okay, I told you about what Succinct can do, what Succinct cannot do, and how we do it. So now, how do we take this technique and build a distributed store out of it, right? And you have to think about multiple things in this context. First is: what is the data model that you're going to support? The second one -- and this problem was told to us by LinkedIn people -- is that if you have these skewed workloads, where certain data is very hot and certain data is cold, you do not want to have the same data representation for all the data. Right? So how do you handle skewed workloads, where queries are distributed non-uniformly across different shards? How do you handle [indiscernible] failures, where once a machine fails, the load on the other remaining replicas increases? How do you handle data recovery during failures and data concerns? I'm going to go through each one of them and give you some ideas of what Succinct does. So, the data model. Like I said, we have this flat file interface
which allows you to implement many, many data models right on top of it, using a simple serializer interface. This means you can run your queries on unstructured data, or on key value stores like Voldemort or Dynamo, or on document stores, all using one single interface of flat files. And I know I might be losing some people, but I want to show you this; it's a really cool, simple thing. So, this is the Succinct interface: the user submits a file. Say the user is going to give the system a key value store or a table. So here I'm going to show you a table, and this table has four columns. Okay? Now, how does Succinct execute queries along the columns while working on flat files? Right? It takes this table and assigns each column a unique delimiter. Okay? Then it takes the values in each column, appends the column's delimiter to each value, and writes them down as a flat file, which means the first value here, the green value, combined with its delimiter, is written first, then the second value, and so on. Okay? Now I have a flat file that I'm going to create Succinct data structures on. Make sense?
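A minimal Python sketch of this delimiter encoding -- and of the column-restricted search trick described next -- with hypothetical names; real Succinct queries its compressed suffix-array structures rather than scanning the flat string as done here:

    # Hypothetical sketch: flatten a table into one string using a unique
    # delimiter per column, then pin a search to one column by appending
    # that column's delimiter to the query.
    table = [
        ["green", "10", "x", "a"],
        ["blue",  "20", "green", "b"],
    ]
    delims = ["\x01", "\x02", "\x03", "\x04"]  # one unused byte per column

    # Write each value followed by its column's delimiter.
    flat = "".join(val + delims[c]
                   for row in table
                   for c, val in enumerate(row))

    def search_column(value: str, col: int) -> list[int]:
        """Find offsets of `value` appearing in column `col` only."""
        needle = value + delims[col]  # delimiter restricts matches to one column
        hits, start = [], 0
        while (pos := flat.find(needle, start)) != -1:
            hits.append(pos)
            start = pos + 1
        return hits

    # "green" in column 0 matches row 0 only; the "green" in column 2 is
    # skipped because it is followed by a different delimiter.
    print(search_column("green", 0))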
>> You can go a little bit faster.
>> Rachit Agarwal: I can go a bit faster? Okay. So when a search query comes in -- say I want to find all the green blocks in one particular column -- I execute this search query by appending that column's delimiter to the green value. Okay? And now it's easy to see that you only get the results in the first column because of the [indiscernible]. And this is happening all inside Succinct. And since I've shown you this for tables, I can now do it for documents, for key value stores, and everything. Such a simple, powerful interface. And then I was going to show you that you can actually do very powerful graph queries using the same interface. But okay. How do we handle skewed workloads? This answers part of your question, which I'm going to
jump into. What do we do today? So something that [indiscernible] a while ago, at least in MapReduce, is that if you have a query distribution, or a load distribution across shards, of this order -- so this is a [indiscernible] distribution where some shards are very lightly loaded and other shards are hot -- what you do is create additional replicas for the shards that are hot and fewer replicas for the others. This is something I'm going to call selective replication. The problem is that selective replication is kind of coarse grained. Right? If you want it to increase the throughput, then if you do 2X replicas, your throughput increases by 2X. In memory, you want to see if you could do something more efficient.
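As a rough illustration, selective replication just sizes each shard's replica count to its load -- a minimal sketch with hypothetical numbers, showing why it only moves in whole-replica steps:

    # Hypothetical sketch of selective replication: hot shards get more
    # replicas. Throughput scales only in whole-replica increments
    # (2x replicas ~ 2x throughput), which is why it is coarse grained.
    import math

    def replicas_per_shard(loads, base_capacity):
        """loads: queries/sec per shard; base_capacity: what one replica serves."""
        return [max(1, math.ceil(load / base_capacity)) for load in loads]

    loads = [50, 900, 120, 2400]           # skewed: the last shard is hot
    print(replicas_per_shard(loads, 500))  # -> [1, 2, 1, 5]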
So here's what we do. I had the Succinct techniques, right, where I had this input file and I stored these data structures. Two of these data structures, as I mentioned, have very small storage overhead. The other two were sampled data structures, so their storage depends on the sampling rate. And for sampling rate x, the tradeoff is very simple: the space becomes roughly (n log n) / x for the top one, which was the sampled suffix array, and the same for the one for random access -- so 2 (n log n) / x in total -- and every time I have to do a query, I spend roughly x extra time computing the unsampled values.
So how can I take this and build something more interesting? I'm going to take this sampled array and use one of the techniques that people have used long, long ago in coding techniques. Okay? What I'm going to do is take the sampled array and store it along multiple layers. So the top array has sampling rate two, so I'm storing every second sampled value. What I'm going to do is store it along multiple layers: 2, 4, 8, and so on. Why is that interesting? The interesting thing is that I can change the sampling rate easily now. That's what I wanted to convince you of. If I want to do a layer deletion, I can simply deallocate the space for a layer. Right? And my sampling rate suddenly changes from 2 to 4. Yes? That is nice because I have reduced my storage, though I have increased my query latency, right? Adding layers is different. Suppose I want to add a new layer. Right? The problem is that these are highly interactive systems. We don't want to dedicate extra resources to compute the unsampled values that you need to fill in this layer, to populate this layer. Right? But the sampling rate goes down, so you have higher storage but lower query latency. The interesting thing is this:
Succinct is already computing the unsampled values on the fly, right, during query execution. So we can use that to [indiscernible] fill these layers. So basically, you get very low overhead ways to fill and delete layers here. Right? To populate the layers [indiscernible]. And this is very nice for skewed workloads in particular, because once you have computed a value -- and the queries are being executed more and more on that shard -- the values [indiscernible] for skewed workloads. Okay? So you can basically add and delete layers dynamically, and by adding a few small bits here and there, I'll just show that you can add and delete layers independent of existing layers, and even do query execution independent of which layers exist and do not exist.
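Here is a minimal Python sketch of the layered idea, under my own simplifying assumptions: exclusive layers holding a plain array's sampled values, where real Succinct layers sampled suffix-array entries and fills them opportunistically during query execution:

    # Hypothetical layered sampled array. Layer k holds indices that are
    # multiples of k but not of 2k, so layers 2, 4, 8 together sample at
    # rate 2; dropping layer 2 leaves an effective rate of 4.
    class LayeredSampledArray:
        def __init__(self, values, rates=(2, 4, 8)):
            self.n = len(values)
            deepest = max(rates)
            self.layers = {}
            for k in sorted(rates):
                self.layers[k] = {i: values[i] for i in range(0, self.n, k)
                                  if i % (2 * k) != 0 or k == deepest}

        def delete_layer(self, k):
            """Coarsen: free a layer's space; its indices get slower to read."""
            self.layers.pop(k, None)

        def add_layer(self, k):
            """Refine: allocate an empty layer, filled lazily by queries."""
            self.layers.setdefault(k, {})

        def lookup(self, i, recompute):
            """Return values[i]; on a miss, recompute it on the fly (as
            Succinct does during query execution) and cache it for free."""
            for layer in self.layers.values():
                if i in layer:
                    return layer[i]
            val = recompute(i)  # the expensive pointer-chasing path
            for k, layer in sorted(self.layers.items()):
                if i % k == 0:
                    layer[i] = val  # opportunistically populate the layer
                    break
            return val

    # Example: coarsen from rate 2 to rate 4, then lazily refine back.
    arr = list(range(100))
    lsa = LayeredSampledArray(arr)
    lsa.delete_layer(2)                     # rate 2 -> 4: less storage, slower
    lsa.add_layer(2)                        # empty layer, refilled by queries
    print(lsa.lookup(6, lambda i: arr[i]))  # miss -> recompute -> cached

Deleting the rate-2 layer instantly reclaims its storage at the cost of slower lookups; re-adding it costs nothing up front because queries repopulate it.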
So what I have done is take the system, Succinct, and allow you to achieve any point on this smooth storage-performance tradeoff curve, where you can choose whatever storage you want, going all the way down to bzip-like compression and all the way up to indexes, and you can pick any operating point depending on what sampling rate you chose. Right? And not only that: by adding and deleting layers, I have given you a way to move along the tradeoff curve very, very efficiently. Right? This is something that we have never been able to do in distributed systems. Given a particular data set, you always had a fixed data size. This really allows you to achieve a very flexible way of designing distributed systems for interactive queries. Okay. So now, since I have the solution, I can tell
you how I can apply this particular solution to solve many, many systems problems. The first one is, like I said, you only have coarse-grained control for these skewed query distributions; Succinct, by using this flexible tradeoff, can give much, much finer-grained control over the throughput and latency. In particular, for shards that are heavily loaded, you can move along the tradeoff curve and get much finer control: if you want 1.5X throughput, you only increase to that level of storage.
Okay? So I'm going to mostly skip [indiscernible] failures. Again, the same thing: today, like I said, Facebook can delay replica re-creation for 15 minutes in case of [indiscernible] failures, and this creates a very high load on the remaining replicas. Suppose you have three replicas and one of those fails, but I don't want to create a new replica yet, so all the queries that were going to the failed replica are now going to the remaining replicas. So if only one [indiscernible] failure happens, your load increases by 50 percent on the remaining replicas. What Succinct allows you to do is navigate along the tradeoff curve upon load spikes, and here's one simple result. What we did was we ran the system, and at time equal to 30, we increased the load on the system by 3X, okay? Once the load on the system increases, at this point Succinct says, okay, I have to create another layer of samples. And it creates the layer, and within five minutes it has started to meet the new load of the system, because the values in the new layer were filled up in five minutes. Right? And by the time 15 minutes had passed, my entire system was stable again. Right? And this can be seen here, where -- this is the queue length as time elapses -- the queue starts building up as soon as I increase the load, and then you can see the queue length drop. And this is the worst case result for Succinct, because this is for uniform workloads. For a skewed workload, [indiscernible] than a minute or so for very high 3X load increases.
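As a rough sketch of that adaptation loop -- my own hypothetical policy, reusing the LayeredSampledArray above, not Succinct's actual controller: when the queue builds up, add a finer layer and let queries fill it; when load subsides, drop it to reclaim memory:

    # Hypothetical load-adaptation policy (illustrative thresholds).
    def adapt(store, queue_len, high_water=1000, low_water=50):
        """Add a finer sampling layer under load spikes; drop it when idle."""
        rates = sorted(store.layers)
        if queue_len > high_water and rates and rates[0] > 2:
            # Load spike (e.g., a failed replica's traffic): refine. The new
            # layer starts empty and is populated by queries on the fly.
            store.add_layer(rates[0] // 2)
        elif queue_len < low_water and len(rates) > 1:
            # Load subsided: coarsen to reclaim memory for other shards.
            store.delete_layer(rates[0])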
Okay. So I'm going to skip this part. One thing that I didn't say is that Succinct doesn't yet have transactional support -- we don't provide [indiscernible] -- and that is something I'm very happy to talk about later on. What I want to talk about is three quick projects in the future work that I have planned. One is more short term, the other one is medium term, and the final one is long term.
So I think in terms of future work, there's one problem that we haven't really resolved yet, which is how to do these interactive queries on graphs. An increasingly large number of services today answer your queries by exploiting your information on social networks, right? And so people have been thinking about integrating these interactive queries on graphs with these distributed stores. There's been a lot of work in graph processing, which means you want to run shard [indiscernible], which is again the same as batch processing. But interactive queries on graphs are far from efficient. There are two systems that are the best that exist, [indiscernible] and Titan, and believe me, they're not even close to good. [Indiscernible] does not have any sharding, so you cannot scale it up. Titan has poor performance. It's not that these systems are poorly built. It's that these graph queries are actually very complex; even in my thesis I showed that some of the graph queries are just impossible to do without scanning the entire graph. And here's one simple query: find the friends of Ratul who live in Berkeley, right? Now the problem is that to execute --
>> [Indiscernible]?
[Laughter]
>> Rachit Agarwal: So there are two ways to execute this query. One is you look at Ratul's friends, which is a large number of people, and then you look at people in Berkeley, which is again a large number of people, and you do a very complex join. Right? And as we all know, that is going to crash the system, right? If it were for me, maybe fewer friends. Another way to do this is you look at Ratul's friends, and for each of the friends, you go and random access that person's location, right? And then filter out the results while you are doing this random access.
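A minimal sketch of those two plans in Python, with hypothetical in-memory maps standing in for the graph store:

    # Hypothetical sketch of the two execution plans for
    # "friends of Ratul who live in Berkeley".
    friends = {"Ratul": ["A", "B", "C"]}            # adjacency lists
    city = {"A": "Berkeley", "B": "Seattle", "C": "Berkeley"}
    people_in = {"Berkeley": ["A", "C", "D"]}       # reverse index on city

    # Plan 1: intersect two potentially huge sets (the "very complex join").
    plan1 = set(friends["Ratul"]) & set(people_in["Berkeley"])

    # Plan 2: enumerate the friend list and random-access each friend's
    # location, filtering as you go -- this is where hot/cold data matters,
    # because a hot node's friends may themselves be cold.
    plan2 = {f for f in friends["Ratul"] if city[f] == "Berkeley"}

    assert plan1 == plan2 == {"A", "C"}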
What is the problem? The problem is that Ratul is hot data, right? Everybody wants to look up Ratul. And then some of his friends might be cold data -- they never access their Facebook, so essentially they're cold data. But since I'm executing this query on a hot node, I may be touching cold nodes because of the relationships between people. Which means caching becomes much, much more important than in standard systems. Right? Because I might not be querying cold data directly, but the queries might be going to the cold data indirectly, and I think compression should help there, right, if you can cache much more data. So I think this is one thing which I'm super excited about these days. We have done some very preliminary evaluation of existing systems, and I think they're far, far from what we can achieve today. So building a distributed graph
store is a short-term problem which I think will integrate very well with
today's [indiscernible] services. The second problem, which, again, I'm very excited about and where we have made some progress, is all this work about achieving data confidentiality. Okay? I don't want you to access my data; please don't access my data. And then I tell you, okay, take my data, and allow me to query this data in an interesting manner, right? So people want to achieve some kind of data confidentiality, and they use encryption today. On the other hand, what I have shown you today is that you can get a lot of performance benefits using compression. Right? The question is whether it's possible to get the benefits of compression while getting data confidentiality. And it's a very challenging problem, because compression is all about removing redundancy, while encryption, if you think about it, is about adding redundancy. Right? And these two things seem very, very much against each other. And we have realized, at least over the last six months, that we need fundamentally new techniques. Ion and I got interested in this problem, and then we started thinking about why it is that we should be able to solve this problem -- because I think we can focus on some limited functionality. But more than that, what can we do today? So we built this very simple system called MiniCrypt, which can do queries on compressed, encrypted data, but no search, no regular expressions, nothing -- just plain, simple key value store lookups. I don't want to say much here, but it still turns out to be a very non-trivial problem, and you can get a lot there compared to existing systems. Okay?
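A minimal sketch of the pack-based idea such a system might use: compress a group of key-value pairs together, then encrypt the pack, so a lookup fetches and decrypts one pack rather than everything. This is my illustrative reconstruction, not necessarily MiniCrypt's actual design; it uses zlib plus the third-party cryptography package's Fernet as a stand-in cipher:

    # Hypothetical pack-based encrypted key-value lookup (illustrative only).
    import json, zlib
    from cryptography.fernet import Fernet  # pip install cryptography

    key = Fernet.generate_key()
    f = Fernet(key)

    def build_packs(items, pack_size=2):
        """Group items into packs, compress each pack, then encrypt it.
        Compressing before encrypting matters: ciphertext won't compress."""
        kvs = sorted(items.items())
        packs, index = [], {}
        for i in range(0, len(kvs), pack_size):
            pack = dict(kvs[i:i + pack_size])
            for k in pack:
                index[k] = len(packs)  # key -> pack id
            packs.append(f.encrypt(zlib.compress(json.dumps(pack).encode())))
        return packs, index

    def get(packs, index, k):
        """Fetch, decrypt, and decompress only the one pack holding k."""
        blob = zlib.decompress(f.decrypt(packs[index[k]]))
        return json.loads(blob)[k]

    packs, index = build_packs({"a": 1, "b": 2, "c": 3, "d": 4})
    print(get(packs, index, "c"))  # -> 3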
And Ganesh is telling me that I have only a few seconds left, so I want to say the third thing, which I think is something that is happening more and more as well: this whole direction of resource disaggregation, which some people call rack scale computing. So if
you look at Facebook, Google, all these big companies, they have recently moved to this resource [indiscernible] datacenter architecture. Today, each server has some amount of RAM, some number of CPUs, some amount of disk, and they're tightly integrated. Right? And because of the capacity scaling challenges, people [indiscernible] like Intel have realized that this model is no longer sustainable. So what they say is, okay, we are going to disaggregate all of these resources. We're going to build CPU blades separate from memory blades, separate from IO blades and disk blades. Okay? And all these pools of resources will be connected by a network fabric. So this is called rack scale computing, or some people call it [indiscernible] datacenters. And by the way, these rack scale computers are already being used in clusters and datacenters; Facebook actually uses them in one of their Oregon clusters. In terms of systems, I believe these new architectures are going to fundamentally change the way we design systems, because the systems we built over the last ten or 20 years were built on some fundamental assumptions -- like having very high CPU-memory bandwidth -- which no longer hold in this setting, right? The failure models are going to change, how we exploit data locality is going to change, and even how CPUs interact with disks is going to change. And I think there are going to be numerous new challenges in how we design and build systems. So that's all I have to say, but I'm going to take
one extra minute. What I talked to you about today was mostly queries on compressed data. I want to give you a very quick picture of some of the other projects that I didn't get to talk about. I did my Ph.D. on queries on graphs, and as Ganesh said initially, I did some of the theory work there; what I was really trying to establish is fundamentally why queries on graphs are a hard problem. Earlier in my career, I did some work on coding and information theory, which I have recently gotten interested in again because of the problem of queries on compressed, encrypted data, and there we had some very interesting results. More recently, I have had work on [indiscernible] disaggregation, and we recently had some successes understanding the network challenges for resource disaggregation. I did some work on network debugging a while ago, and more recently I started thinking about some interesting network debugging problems again. And finally, a lot of people still think of me as a routing guy. And believe me, the only routing I do today is driving to my office from home. But a lot of people still think of me as the scalable routing guy. So I think, you know, having been exposed to these different areas or subareas has given me the right platform to solve the problems that I want to solve. And none of this would have been possible without all the awesome collaborators I have, and all the people that listen to my talks. Thanks all for coming, and I can take more questions now.
[Applause]
>> The question I have is: because you're using a really complex compression algorithm, does that make your data more vulnerable to corruption? Because now every [indiscernible], every entry, is actually related to all of the original data, right? So, for example, if you corrupt one part of raw data, just that one part is gone. But now, if you corrupt one part of your compressed data, then all the data could be gone, right? You cannot recover it. So that would be a bigger problem. And also, you had mentioned you were using [indiscernible]. I'm wondering whether you are thinking about basically adding another layer on top of that batch layer instead of replacing that batch layer.
>> Rachit Agarwal: So, the first question, on data corruption: it's an interesting question. Well, I haven't thought about it, to be frank. But let me see. When you said data corruption, do you mean when data fails? Or are you talking about low-level data corruption, where --
>> Yes.
>> Rachit Agarwal: Okay.
>> [Indiscernible].
>> Rachit Agarwal: Yes, definitely. Memory, I'm thinking, because we are
doing in memory computations, right? So I'm thinking -- I'm trying to put
data corruption in place. But let's see.
>> With all the alternate memory technologies, they're more likely to fail,
so you might actually have failures.
>> Rachit Agarwal: Right. So the persistence is easy, right? The way we persist data is just by keeping it in secondary storage [indiscernible]. But an interesting thing might be -- again, I'm speculating, since I already told you I haven't thought about this problem -- since we are computing these unsampled values on the fly, if we have some data corruption, then it might be interesting to understand whether we can identify data corruption, because there will be an inconsistency between the sampled values, the unsampled values, and the number of pointer lookups that we do.
>> But if you identify that, then how do you [indiscernible]? Because if you cannot recover it, the damage [indiscernible] -- how much damage that will cause.
>> Rachit Agarwal: Absolutely. So the way we do it today is we persist the data. Right? We do data replication on secondary storage with every write. Right? So once the data is persisted and replicated, then if something fails, we can check where it has failed.
>> What I'm saying is, if you cannot recover that, you might use the replication. Then it matters how much damage that will cause, right? [Indiscernible] flat file, the damage will just be the corrupted data. But now you are compressing the data, so if it's corrupted, then it is very possible that the failure will be amplified because of the compression.
>> You still store the data in persistent storage. You don't store the compressed form in the persistent storage.
>> So [indiscernible] you need to --
>> Well, you're doing in-memory computation on the compressed data, and then you're storing the real data on persistent storage.
>> Okay. So the -- that's --
>> So if you need more, you replicate.
>> Okay.
>> You need more resilience, you replicate.
>> [Indiscernible].
>> Rachit Agarwal: No, no, no. It's not replenishing the batch layer with in -- just like you do any --
>> Ganesh Ananthanarayanan: I think we're out of time, but [indiscernible].
>> Rachit Agarwal: Thanks all, for coming.
[Applause]