>> Chris Hawblitzel: All right. So welcome to this afternoon edition of Microsoft Research talks. It's my pleasure to welcome Baris Kasikci, who has just gotten his Ph.D. from EPFL, advised by George Candea. He's published on a variety of topics in a variety of places, including SOSP, ASPLOS, TOPLAS, HADDARAS. And today he'll be talking about one of his most recent projects on stamping out concurrency bugs.
>> Baris Kasikci: Thanks a lot, Chris. Well, thanks for coming, everyone. I'm really happy to be here again, seeing familiar faces. Okay. So again, just like he said, this talk is going to be about getting rid of concurrency bugs, and I guess it's no surprise to this crowd that bugs are a big part of software development. And just to give you an idea of how big a part they are, I'll provide you some numbers. Roughly 35 percent of all IT expenditures this past year went into quality assurance. This is activities like testing, debugging, and fixing of software. Right? And according to some projections, this number is expected to rise all the way up to 50 percent by 2018, potentially. And -- what's that?
>> Why should it go up?
>> Baris Kasikci: Why should it go up?
>> Because I guess more bugs are [indiscernible] creeping in for the [indiscernible].
>> Baris Kasikci: Creeping into the software, yeah. Many systems are becoming more complicated, so the bugs are also more complicated.
>> [Indiscernible] ask a question on the early slides.
>> Baris Kasikci: It's okay. You can ask them.
>> Okay. I have a question about the previous slide. [Laughter]
>> Baris Kasikci: And so because quality assurance is so costly, developers basically end up having less time and resources for essentially building cool new features, which is a major problem. Right? Now, one particularly challenging and notorious type of bug is concurrency bugs. Concurrent programs are ones in which multiple threads of execution communicate through shared resources such as memory, and it's great if you can write correct concurrent software, because then your code can actually run faster on parallel hardware and you can take advantage of this. But it's subtle to manage these interactions through shared resources, and people end up, you know, making mistakes such as data races, atomicity violations, and deadlocks. And these are concurrency bugs. Right? These bugs have caused losses of human lives, material losses, security vulnerabilities. Not just that, compared to other types of bugs, they're known to be more time-consuming. They're hard to reproduce, and it may take days or even up to weeks to properly craft a fix for such bugs. And all these extra challenges, I believe, make concurrency bugs a scientifically compelling problem to tackle. And to better grasp the ordeal that these bugs have become, especially over the past ten years, I'll give a quick historical recap of how this happened. This familiar graph shows the transistor count trends over the past 40 years, otherwise known as Moore's law: we're able to pack twice the number of transistors into processors roughly every 18 months, right? And this trend went on all the way until the early 2000s, all the while processors had a single core in them.
And then, after the early 2000s, due to various physical constraints, processor designers had to increase core counts in order to keep reaping the benefits of Moore's law. And this was great. It worked. People called it the multicore revolution, right? My phone has two cores. I'm sure you have better phones out there with four cores or eight cores. And this was all great, but the problem was that this shift happened rather rapidly. Before, there were only a few people writing code for such parallel architectures -- people writing code for multiprocessor systems or people writing code for supercomputers -- but now all of us, a lot of developers, were writing code for such parallel architectures. And they weren't necessarily well trained to write correct concurrent software that would leverage this parallel hardware. And ultimately, unwittingly, they introduced concurrency bugs into their programs. Now, I want to come back to one particular aspect of the previous graph, namely the core count trends, and focus on this rapid rise in core counts after the early 2000s. So what I did is I went ahead and asked Google Scholar how the number of papers that mention concurrency bugs in their titles, at least, varied over the years, and superimposed this information onto the core counts graph. [Laughter]
>> Baris Kasikci: And perhaps not surprisingly, after the early 2000s, after this shift to multicore architectures, there's been a clear surge in the number of techniques that at least mention these bugs, right? Now, this is in some sense a fortunate situation in that -- yeah, go ahead.
>> Are there space limitations with all cores as well as the number of concurrency papers? [Laughter]
>> Baris Kasikci: Space limitations? What do you mean?
>> More [indiscernible].
>> Baris Kasikci: Anyway, go on. I mean, basically --
>> [Indiscernible].
>> Baris Kasikci: Basically, what I mean, I think what this shows is that people, I mean, academia, have taken a great interest in this growing real-world problem, as evidenced just by the paper count. And so just to give you some examples of how concurrency bugs can compromise system security, I listed a couple of exploits that actually used concurrency bugs in order to craft attacks on popular software such as the Linux kernel and the Apache web server. And these attacks can allow an attacker to execute code, leak data, you know, gain arbitrary privileges over the program. But of course other types of bugs cause security vulnerabilities too, right? It's not just concurrency bugs. But it turns out that existing defenses against attacks may actually fail if attackers use concurrency bugs in order to craft their exploits, which is again another example of why I believe it is scientifically compelling to tackle these bugs. Now, having identified this increasingly growing problem of concurrency bugs, in my dissertation I built techniques to identify and fix concurrency bugs. And the approach that I took is one in which I studied real systems, I identified the issues that developers face, and I then designed techniques that actually solve these issues. And the key theme that recurs when I design these techniques is this hybrid static-dynamic program analysis approach.
Now, the reason why this hybrid approach is powerful is because it is generally possible to build static analyses that don't have any runtime performance overhead, right, because they run offline, so they don't impose any slowdown on the program. But static analyses are generally inaccurate because they don't have access to the actual execution context of an executing program. And with dynamic analyses, it's the contrary, right? Because they have access to actual execution context, they can be accurate about the results that they provide. But because they are monitoring real execution events, they can slow down the program. And it turns out that a carefully crafted mix of static and dynamic approaches can be both efficient and accurate. And I'll give you a detailed example of how this can be done during the talk. Now, having designed these techniques, I then built actual systems that solve these problems in the real world. And while I'm building these systems, I follow three guiding principles. The first guiding principle is that of striving for low overhead. As we'll see in a moment, most of the techniques I designed are geared towards usage in production, so in deployed systems. And therefore, efficiency requirements are a prime design constraint. The second goal that I strive for is high accuracy, so providing correct results to developers. We don't want to build inaccurate tools, because developers will then just go on a wild goose chase, lose time, and ultimately end up not using these tools. And finally, I try to make the tools that I build rely on commodity hardware so that developers can quickly pick them up and start using them right away. So it makes things more practical for them.
>> You're designing a bug [indiscernible]. It's not immediately obvious why you would need to deploy it in production.
>> Baris Kasikci: Yeah. So you don't need to deploy it in production. All the things I'll present can be used in house, but it turns out that dealing with in-production bugs makes the problem more interesting and challenging, and that's particularly why I targeted that class of bugs.
>> It certainly makes it more challenging. [Laughter]
>> I guess the interesting question is could you have done a better job of bug finding if you sacrificed overhead and production in exchange for the developer trying to --
>> Baris Kasikci: Yes. What you can do is you can sacrifice this requirement and bring all these tools in house and do more analysis. And these are strictly complementary to each other. Could we have done a better job? It's hard to say what we mean by better. Maybe you can find more bugs, but it's unclear whether those bugs are really corner cases, deep down, hard-to-find bugs versus, you know, really shallow things that you could have avoided by, let's say, proper programming practices.
>> Proper programming practices [indiscernible] all bugs if you take that suggestion.
>> Baris Kasikci: Sure, sure.
>> But in the real world, aside from like Chris, that doesn't happen.
>> Baris Kasikci: Yeah. I mean, it really depends, right? So typically bugs that occur once in a blue moon in a production deployment are just hard to track, and these techniques -- once you actually modify these parameters, you can make them applicable in house pretty easily actually. Yes?
>> Just one comment.
When you say corner bugs, when you talk about security, those really are corner case bugs, because security [indiscernible].
>> Baris Kasikci: Yeah.
>> So there's no difference whether [indiscernible].
>> Baris Kasikci: That is true in that case. That's correct, yeah. Yeah?
>> [Indiscernible] in accuracy?
>> Baris Kasikci: Oh, how much we cover? In accuracy, we mainly target false positives.
>> It becomes rare in both cases. So it's not a goal?
>> Baris Kasikci: No. It's not a goal. Yeah. It's not about detecting more bugs, but it's about being practical and finding bugs that actually hit developers or users. All right. So yeah, again, these guiding principles -- I'll connect back to them when I'm talking in detail about various aspects of my dissertation work. So, using the aforementioned approach, in my dissertation I built techniques for detection, root cause diagnosis, and classification of primarily concurrency bugs. And these three steps are actually essential to finding and fixing bugs in practice. Detection: you need to detect bugs to be able to fix them, generally. But really, more often than not, as a developer you need to identify the precise conditions that led to a failure associated with a bug. That's what we do in debugging, and that's what root cause diagnosis is about. And sometimes there are just a lot of bugs. You cannot deal with all of them, so you need to classify them according to their severity or potential severity to address the really pressing ones first. Now, again, as I just said a while ago, you can use these techniques in house, but I primarily target a setup where we can use most of these techniques in production, and when you target in-production systems, this increases complexity, because you now need to design efficient techniques so as not to hurt user experience. And to do so, I also built infrastructure for efficient runtime monitoring, which enables some of these techniques to be usable in production. All right. So at this point, I'd like to give brief overviews of the various components of my dissertation work. What I'll do is I'll explain the use cases that these techniques target and also highlight key results that we obtained using the associated systems that we've built for these techniques. For that, I'll start with detection, and in the case of detection, we targeted one of the nastiest types of bugs, namely data races, which are unsynchronized accesses to shared variables from multiple threads. Now, data races are forbidden by certain language standards and certain system standards as well, and not just that, but they're also known to cause other types of concurrency bugs like atomicity violations or even deadlocks. Now, the input to this detection system is the program itself, and the use case is one in which users run their programs in production, all the while this detection system is detecting data races that actually impact them. And the key result for this system is that it has a very low performance overhead of only around 2.5 percent on average, which is orders of magnitude lower than what prior work achieved. Now, detecting bugs is useful, but as I mentioned a while ago, developers typically want to identify the inputs, the thread schedules, the control flow that led to the failure associated with a bug. And traditionally this is done by reproducing the bug in a debugger.
And identifying the root cause of the failure. And this may actually not always be possible, especially for bugs that occur in production. Our work on root cause diagnosis automates this detective work that the developer essentially needs to go through. And just like the previous detection scheme, this technique is also geared towards in-production usage. So the input to this technique is the failure report and the source code of the program, and the output is a representation that conveys the root cause of the failure to developers. The key result for this technique is that it is fully accurate. And what I mean by accuracy in this case is that it allows developers to seamlessly diagnose root causes of real bugs in real-world programs. And I'll talk more about this in detail. Now, it's good to perform detection and root cause diagnosis, but sometimes the sheer number of bugs is overwhelming. Right? So developers need to prioritize them for fixing. And this technique in particular classifies data races according to their consequences. However, it targets a different use case in that it is geared towards in-house usage, because it relies on heavy program analysis that is not well suited for an in-production deployment. Now, the input to this system is a list of data races. Those could be data races that were previously detected using the in-production data race detection system or a third-party data race detector. And the output is again a list of data races, ranked according to their potential severity. So you can think of one data race as having a tag that says this data race is going to potentially cause a hang, and another data race has a tag that says this data race is going to potentially cause a crash. Now, the key result, again, for this technique is that it achieves almost perfect accuracy. Yes?
>> [Indiscernible]?
>> Baris Kasikci: So we do have some sources of false negatives, because the technique we use is a mix of static and dynamic analysis again, and when --
>> The coverage problem you're talking about, or --
>> Baris Kasikci: That would --
>> [Indiscernible] because the [indiscernible] does not have all the [indiscernible]?
>> Baris Kasikci: Both. So we have two sources of false negatives: first, because we make some assumptions in static analysis, and second, you know, you don't have all the coverage needed, right? And so you can complement it with in-house testing actually to increase the coverage, basically.
>> Is a lot of the motivation finding how attackers can exploit these bugs [indiscernible]?
>> Baris Kasikci: So for security, actually -- I'm going to [indiscernible] future work -- we're extending our work on diagnosis, and security is a particular problem because you don't necessarily have an oracle that will tell you that you have, like, a security breach or anything like that. So you need to take a different approach. In the line of work that I'm going to talk about today, we're mostly targeting things that have some observable behavior through which you can tell that, you know, something went wrong. So that's a distinction.
>> [Indiscernible].
>> Baris Kasikci: But yeah, I'll get to that towards the end of the talk. What's that?
>> What language?
>> Baris Kasikci: So language doesn't matter, in the sense that anything we can compile down to LLVM IR works, because most of the static analysis we use actually operates on LLVM IR. So basically, that's the answer.
So languages that you can compile to LLVM IR would be amenable to these techniques. But in practice, we looked at C and C++. You could do other things as well. Cool. All right. So again, finally, I'd like to remind you that the detection and root cause diagnosis techniques that I mentioned are intended for software running in production. So to enable them to be performed efficiently, I built some dynamic instrumentation techniques that track execution information in a low-overhead manner. And a key result for this technique is that, for a broad range of real-world programs, it can do runtime tracking with fairly low overhead. Yes?
>> You said it's up to six percent for the dynamic tracking?
>> Baris Kasikci: Yes.
>> Is it 2.5 percent for the detection?
>> Baris Kasikci: For detection, yes. For tracking, in that particular case -- I'm actually going to talk on just the next slide about this particular thing. This is actually work I did at Microsoft Research when I was interning here, and in this particular case, we were seeing this for managed code. This outlier here [indiscernible]. Okay. So basically, at this point, I want to emphasize some practicality aspects of my work. In particular, over this past summer while I was interning at Intel, I integrated the root cause diagnosis technique that I'm going to talk about in detail into their internal tool chain. This integration is still being maintained, and we're currently working with Intel to release that integration as open source. So hopefully that will happen soon. And back in 2013 when I was working at MSR, I used this efficient runtime monitoring infrastructure I designed to build a code coverage tool for Windows. And this code coverage tool was quite efficient in that, in our tests with all Windows 8 system binaries, about 700 binaries, we've seen overheads between 1 and 6 percent. And thanks to these advantages, at least after I left, I know that this project was still being maintained within the Tools for Software Engineering group.
>> [Indiscernible].
>> Baris Kasikci: Yes, yeah. It's the LCC.
>> [Indiscernible].
>> Baris Kasikci: Yes, yeah.
>> [Indiscernible] processor trace?
>> Baris Kasikci: Yes, yes. Exactly. All right. Again, coming back to this overview of my dissertation work, I'm going to delve into the details of my root cause diagnosis work and primarily focus on concurrency bugs. Although this approach is more general, in that it can target other types of bugs and sequential programs, in the context of today's talk I'm going to focus on concurrency bugs. This technique is called Gist because it conveys the gist of the failure to developers, which is the root cause. When describing Gist, I'll first give you background and an overview of Gist. I'll then give details of the design of Gist, and I'll finally present the evaluation results that we obtained when applying Gist to real-world systems. All right. But before I delve into describing Gist, I would like to say that root cause diagnosis of software failures in general, and in particular for failures that occur due to concurrency bugs, is a scientifically hard problem for several key reasons. In particular, root cause diagnosis requires gathering significant execution information from the program, which harms efficiency. So there's an efficiency challenge. The second challenge is that of accuracy. We don't want to provide developers false positives or false negatives.
So in the case of root cause diagnosis, false positives would mean pointing developers to wrong root causes, and false negatives would mean altogether missing the root causes of certain failures. And finally, targeting in-production bugs aggravates the efficiency challenge because of the stringent performance requirements for in-production software, but another challenge is that it may actually not be possible to reproduce the bugs that occur in production in a testing setup. So this is an added challenge that comes with targeting in-production bugs. Now, there's a significant body of related work that dealt with various aspects of root cause diagnosis of software failures, ranging all the way from collaborative approaches to approaches that use test cases to reproduce failures in order to isolate their root causes. There are other approaches that use record/replay and runtime checkpointing to perform root cause diagnosis. There are approaches that rely on hardware support to do root cause diagnosis. And we really do build upon all this prior work, although it's worth mentioning the assumptions that prior work makes for performing root cause diagnosis. Now, some prior work makes the assumption that there is access to non-commodity hardware or some sort of state checkpointing infrastructure for performing root cause diagnosis, which may not necessarily be the case. More critically, some of the prior work actually relies on the premise that there is a means to reproduce failures in order to perform this root cause diagnosis task, which may also not be the case. Right? Now, pretty much all prior work makes the assumption that there is an observable means to detect failures, right, in the form of, let's say, a crash or a hang that is reported by a user. Now, in our work, we revise these assumptions to target our use case of root cause diagnosis of in-production failures, and in particular, what Gist does is it makes the assumption that there's an observable means for it to detect failures. When I talk about future work -- and that will actually relate back to the security question you came up with -- I will come back to this assumption and suggest ways in which we can deal with the limitations that arise from it. Now, the key component of the design of Gist is a hybrid static-dynamic program analysis approach. Essentially, a heavyweight in-house static analysis is an enabler for the subsequent dynamic analysis, and it's really the synergy between static and dynamic analysis that allows Gist to perform efficient and accurate root cause diagnosis. Now, to give an overview of Gist, I'll first talk about the software usage model today. Basically, what happens is developers develop some program and users run this program either on their laptops or on their mobile devices or in the cloud in a datacenter. At the end of the day, we can consider these as endpoints on which users run these programs. I'm sure some of you are familiar with this occasional error message in Windows systems. Other systems also have similar error reporting infrastructures, of course. Mac OS X has its own, iOS has one, and Linux variants also have similar reporting infrastructures.
And if you click on some error report after a failure, the systems on which this software is running will actually ship certain information back to developers, like a memory dump, in some cases logs, and developers can then go through this information to understand the problem, to debug it, to fix it, and improve the quality of their software essentially. And as mentioned previously, debugging is typically done in an iterative way. Right? So developers reproduce failures to find their root causes and fix them. And just to tell you, in the context of this talk, when I'm talking about a root cause, I'm talking about a statistical notion of root cause. So events that are primarily correlated with the occurrences of failures are root causes for our purposes. Now, this practical definition, it turns out, is useful in practice for real-world programs and real-world bugs. But we essentially rely on correlation to define causation, and, you know, that's a long discussion we can have offline if you want. But it's an interesting one. I'd just like to point that out up front.
>> Across users?
>> Baris Kasikci: What's that?
>> This is across users?
>> Baris Kasikci: Across users. So [indiscernible] multiple users to have some statistical inference, exactly.
>> [Indiscernible] beginning of your talk, we could conclude that the increase in the output of papers [indiscernible] caused more cores to happen, so you do have to be careful with --
>> Baris Kasikci: So it is not just a straightforward correlation. There are more information retrieval techniques that we use in some sense. It's not a straightforward "this happened, so this" -- it's more like computing precision and recall, ranking events, so on, so forth. So it's not that simplistic in that regard. Yeah?
>> I just want to understand your assumption. In the previous slide you said failures can be detected.
>> Baris Kasikci: Yes.
>> Failure here is a program crash?
>> Baris Kasikci: Yes.
>> Okay. Not [indiscernible].
>> Baris Kasikci: Well, if the specification violation is encoded, let's say, as an assertion or something like that, or a post condition that you're checking, yes, that would also work -- the system needs to be able to automatically tell you, basically. You need a trigger that will tell you there's a failure. All right. So basically, again, finding the root causes of failures may actually be impossible if you're not able to reproduce them. And Gist precisely attacks this problem. It is a technique that automates this difficult debugging process by creating what we call failure sketches. Now, informally, failure sketches are representations that convey the root cause to developers, and I'll get into a more precise definition in a little bit. But essentially, you can think of failure sketches as something you just stare at and hopefully it will point you to the root cause of the failure. And I guess without further ado, let me show you what a failure sketch looks like. Aside from formatting, this is the output that you would get from Gist. This is a failure sketch for a real bug. Now, in this representation, time flows downward and the steps in an execution are enumerated along that flow of time. The failure sketch shows statements from two different threads that affect the failure and their order of execution with respect to the enumerated steps. And perhaps you can look at this representation and tell me what the bug is in this program.
>> [Indiscernible].
>> Baris Kasikci: Yeah, I mean, even if it's not obvious, that's fine. The failure sketch will actually give you more information. Basically, what these boxes and this arrow say is that primarily in failing executions, the free statement on the mutex is executed before the mutex unlock statement. And maybe now we can look at it again and you can tell me how you would actually fix this bug. There are multiple ways.
>> [Indiscernible] on the rest of what you're doing.
>> Baris Kasikci: Yeah. Depends on semantics. One idea, like one straightforward idea -- maybe you can give me one. Yeah. I guess I know what --
>> [Indiscernible].
>> Baris Kasikci: Yeah, that is one thing. So you need to enforce some ordering, right? Basically. I guess hopefully it is clear, but basically these differences that are shown on the failure sketch, according to our evaluation, point to the root causes of failures. We actually were able to validate this by looking at the patches that developers came up with. In this particular example, what the developers did was they actually waited for thread one to join before releasing the mutex. Right? That's one way of fixing things. But it took them four months to fix this bug. Now, I'm not saying that they just sat in front of their computer and debugged this for four months and then at the end of the four months they came up with a fix for this particular bug. I'm just saying that if they had access to this representation, it would have taken them much less time to get rid of this bug. That's what I'm saying.
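As a concrete illustration of the kind of ordering bug the sketch describes and the "wait for the thread before releasing the mutex" fix, here is a minimal C++/pthreads sketch. It is only an assumed producer/consumer shape with made-up names (fifo_mut, consumer), not the actual code of the program whose failure sketch was shown.

    #include <pthread.h>
    #include <cstdlib>

    static pthread_mutex_t *fifo_mut;

    void *consumer(void *) {
        pthread_mutex_lock(fifo_mut);
        // ... drain the work queue ...
        pthread_mutex_unlock(fifo_mut);   // in failing runs this executes after the free below
        return nullptr;
    }

    int main() {
        fifo_mut = static_cast<pthread_mutex_t *>(std::malloc(sizeof(pthread_mutex_t)));
        pthread_mutex_init(fifo_mut, nullptr);

        pthread_t t;
        pthread_create(&t, nullptr, consumer, nullptr);

        // BUG (order violation): nothing forces the consumer's unlock to happen
        // before the free. The developers' fix was essentially to wait for the
        // thread before releasing the mutex, e.g.:
        //     pthread_join(t, nullptr);  // enforce the ordering, then destroy/free
        pthread_mutex_destroy(fifo_mut);
        std::free(fifo_mut);              // use-after-free if the unlock has not run yet

        pthread_join(t, nullptr);
        return 0;
    }

The two boxes and the arrow in the failure sketch correspond to the free and the unlock in this sketch: the root cause is the ordering between them, and the fix enforces the safe order.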
>> [Indiscernible] four months because they had three months and 30 days' worth of other bugs that they fixed first.
>> Baris Kasikci: That is also possible. But I guess if they had better insight into what the root cause of a bug was, then they could use that in prioritizing their effort as well. Right? Because if they knew how long fixing something would take -- it's like answering e-mail. If you know it will take you ten seconds to answer the e-mail, you would --
>> [Indiscernible] both by effect and by cost.
>> Baris Kasikci: Yes.
>> If the crash actually happens in the mutex unlock, then I would say [indiscernible] by developers is not that hard. I know that [indiscernible] is wrong [indiscernible] --
>> Baris Kasikci: Somebody wants to change it. Yes, yes.
>> [Indiscernible].
>> Baris Kasikci: Yeah, yeah, you're right.
>> I mean, the hard one is [indiscernible] was a chance to corrupt some memory that then crashes somewhere else.
>> Baris Kasikci: Yes.
>> [Indiscernible].
>> You called free, so what happened is the thing got put back in the free list and then maybe somebody allocated --
>> Baris Kasikci: Somebody else allocated --
>> [Indiscernible].
>> [Indiscernible], but here, you say the crash actually happened.
>> Baris Kasikci: Sure, sure. This is just an example, an illustration. We target more complicated types of --
>> Yeah. This is [indiscernible] in four months, [indiscernible].
>> Baris Kasikci: Yeah, yeah. Even this took four months. Typically, the way you would debug this, you would see the crash. Normally, what I would do, I would put a hardware watchpoint at that address, try to figure out, put a display somewhere: who else is changing this in the code? If I know the code base, I can just go and fix it.
>> The challenges are [indiscernible] and to make sure my change actually fixed the bug.
>> Baris Kasikci: Yes.
>> [Indiscernible].
>> Baris Kasikci: Yes. So that's the thing. You may not necessarily be able to repro this, right, and so that's the good thing. You can try to repro it on your own setup. It may not work. So these are strictly complementary efforts. You do this in your own testing setup. We try to repro. If it works, then you fix it. Good for you. If it doesn't work, you have this representation that incrementally makes itself better so that you can use it in your debugging effort. But that's a good point. Yeah.
>> Quick question.
>> Baris Kasikci: Yes.
>> Just to clarify, so you will catch bugs that cause these maladies like crashes, but what if the program does not crash but produces an incorrect result?
>> Baris Kasikci: Yes. Yes. That's a good point. So we don't target this in our prototype. But what you can do, and what is conceivable, is that you can define custom failure modes, basically. You need the means to actually tell the system that there is, let's say, an inconsistency in the memory layout, or you do a check on your files every now and then to say that something has gone wrong. This is the assumption that we have. We rely on this. So if there's just a corruption and there's no way of checking for it, we don't handle that case. That's a limitation that arises from the assumption that I made initially. All right. So I'll begin by describing the high-level architecture of Gist. Basically, Gist has a client-server architecture: the server side does static analysis and the client side does dynamic analysis. The server side takes as input the failure report -- this could be the core dump, stack trace, the instruction pointer of the failing instruction -- and it also takes the program source code. It then feeds these inputs to a static analyzer, which computes a static slice based off of these inputs. Now, I'll explain in a little more detail what the static slice is actually composed of; you can think of it as having statements that are related to the failing statement. What the client side of Gist then does is it uses a runtime that gathers more control flow and data value information from programs that are executing out there in production, by instrumenting programs that are running in production and, when doing this instrumentation, by taking into account the static slice that the static analyzer computes. Gist then uses these runtime traces that it gathers to refine the static slices, and in particular, in the context of our work, what refinement means is that it will remove certain statements from the slice that don't get executed in production runs, and it adds to the slice information such as access orderings. And this refinement step needs to be done, of course, because of the imprecise nature of static analysis, which lacks this information in the first place. Right? You don't have this ordering information. I mean, you could potentially have it, but at least the analysis that we use doesn't give you this information. Then, finally, another server component uses the refined slices from both failing and successful executions to determine the salient differences between them and highlight them as root causes on the failure sketch. Now, this is the high-level design of Gist, and I would like to get into the details of the design of Gist, and for that, I'll start with the static analyzer in Gist.
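To make the client/server data flow just described concrete, a rough C++ sketch follows. All type and field names here are illustrative assumptions, not Gist's actual interfaces, and the function bodies are placeholders for the real analyses.

    #include <cstdint>
    #include <string>
    #include <utility>
    #include <vector>

    struct FailureReport {                    // sent from the production (client) side
        std::string program;                  // which binary / source version failed
        std::uint64_t failing_ip = 0;         // instruction pointer of the failing statement
        std::vector<std::uint64_t> stack;     // stack trace / core dump summary
    };

    struct StaticSlice {                      // server side: static analyzer output
        std::vector<std::uint64_t> statements;  // statements the failing statement depends on
    };

    struct RefinedSlice {                     // client side: after runtime tracking
        std::vector<std::uint64_t> executed;                         // statements that actually ran
        std::vector<std::pair<std::uint64_t, std::uint64_t>> order;  // observed access orderings
    };

    // Server: failure report + source -> backward slice from the failing statement.
    StaticSlice compute_static_slice(const FailureReport &report) {
        StaticSlice s;
        // ... interprocedural backward slicing over LLVM IR would go here ...
        (void)report;
        return s;
    }

    // Client runtime: instrument only what is in the slice, drop what never runs,
    // and add the control flow and data value information gathered in production.
    RefinedSlice refine_in_production(const StaticSlice &slice) {
        RefinedSlice r;
        (void)slice;
        return r;
    }

A final server component, not sketched here, would compare refined slices from failing and successful runs and emit the failure sketch.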
Now, the static analysis of Gist builds static backward slices, with the primary goal of reducing the subsequent dynamic tracking overhead. The static backward slice starts from a certain statement of interest -- in this case, the statement where the failure occurs -- and it includes statements that the failing statement depends on. When we're talking about dependencies, we're talking about both read and write dependencies in this case. Now, because this excludes all the other statements that the failing statement doesn't depend on, it allows the subsequent runtime tracking to be more efficient. And finally, the analysis that we use in Gist is interprocedural. This is because failure sketches can actually span function boundaries, and so Gist needs to account for this fact by looking at multiple functions and the function calls among them. Now, to better understand how static slicing works, I'll walk you through a simple example, and I'll continue using this example to explain the rest of the operation of Gist. In this simple example, let's consider that there's a cleanup function that prints some debug messages and then deletes the memory allocated to the state object S, and there's a display size function that again prints some debug messages and then displays the size of the state object S. And of course there's other code in this program, in particular code that calls these functions, but it's irrelevant for the purpose of understanding this example. Now, it turns out that some users observe this program to crash when the load statement in the display size function accesses the size field of S, and I'll show you how Gist will help us determine the root cause of this failure step by step. Now, in the first step, when Gist computes the static backward slice starting from the failure point, in our example it removes the log function calls upon entry to the functions display size and cleanup. And this is because the failing operation, namely the load of the size field of S, is not influenced by these statements. Now, the key takeaway from the static analysis of Gist is that it helps the following dynamic analysis monitor fewer events than it would otherwise require monitoring. And in conjunction with another adaptive scheme that I'll describe in just a few slides, this static analysis essentially enables the cost of control flow tracking to be reduced by a factor of 20. Now, Gist will remove more -- yes, please?
>> So [indiscernible]. Of course it depends on how you model pointers --
>> Baris Kasikci: Sure.
>> So it seems, you know, like a huge difference.
>> Baris Kasikci: That's a good point. So basically, there are, I'd say, almost three things. You're right that where your failure occurs matters a lot. So the way you model pointers -- the analysis that we rely on is called data structure analysis, a somewhat advanced form of type-based alias analysis that is basically built into LLVM, and in cases where you have a specific type, let's say, in the failing statement, it allows you to prune quite a bit of the code statically. So that's one thing. But that doesn't always happen, of course. The second thing is knowing where to start monitoring. So essentially, statically, you target a portion of the program. You compute a static slice, and as your program enters this slice, you start monitoring, and as it exits from that slice, you stop monitoring.
So knowing where to start makes a big difference, because otherwise you would have to monitor many more events dynamically. And the third thing is again something that I didn't talk about: it's this adaptive approach that actually incrementally monitors portions of the slice. So it's really that combination that brings in the 20x reduction in overhead. Yes?
>> The example you're giving doesn't cover issues of [indiscernible], take into account like issues that [indiscernible] alias [indiscernible] on memory that supposedly --
>> Baris Kasikci: Yes.
>> -- is unrelated to S overwrites S.
>> Baris Kasikci: That's a good point. So we don't do that. So that's an assumption we make. You're saying that if there's a memory error in a function, it could potentially overwrite everything else in the program, and you can alias basically --
>> There's no type-based way to --
>> Baris Kasikci: Yeah. No. We don't do that. That's a really good point. All right. So now that I explained how Gist performs static slicing --
>> Sorry. Can we go back to the -- I have a question. I saw you have more slicing [indiscernible].
>> Baris Kasikci: Please. Yes.
>> [Indiscernible]. So how do you know the S [indiscernible]?
>> Baris Kasikci: Oh, yeah. So this data structure analysis -- we don't have a custom alias analysis, but this data structure analysis basically gives you this global aliasing information. That's how we know.
>> Data structure analysis.
>> Baris Kasikci: DSA. Yes. This is Chris Lattner's thesis, basically, that was left in the LLVM repository, so we had to port it all the way to recent versions.
>> [Indiscernible].
>> Baris Kasikci: Yeah.
>> That one has [indiscernible].
>> Baris Kasikci: It's not precise enough. For something like that, we have dynamic tracking that will fix it. But for something like Apache, for instance, it's not uncommon to have a call site with potentially 300 targets whereas in reality you'll only have one or two. So it's not very accurate.
>> [Indiscernible] it's complete but not accurate.
>> Baris Kasikci: So, it is complete, again modulo certain things like inline assembly and so on, as far as I know. I don't know the details of the internals of DSA, which is a massive thing. But all right. So we talked about static slicing. Now I'll talk about how the dynamic control flow tracking and data value tracking that Gist performs using instrumentation help with the refinement of slices. Control flow tracking allows Gist to perform slice refinement by essentially identifying statements that get executed during actual program executions and removing from the slice the statements that don't get executed. Now, let's assume that this control flow graph here represents the control flow in a static slice. Nodes are basic blocks and edges are branches in the slice. What Gist does is it tracks the control flow and refines the static slice using a technology from Intel called Processor Trace. In this particular example, the path highlighted in blue happens to be the path that the program actually executed during an actual run. Now, Intel PT has about 40 percent tracking overhead -- this is with Broadwell processors. If you're surprised, maybe Skylake is better.
So I haven't tried with Skylake, and this is, you know, full program tracking overhead for a broad range of desktop and server applications. Intel promised that this value will get better. Maybe it's already better. But this is the result that we obtained.
>> [Indiscernible] much better.
>> Baris Kasikci: It's much better?
>> It also depends on how you [indiscernible].
>> Baris Kasikci: Well, if it's ring zero and ring three, I mean --
>> [Indiscernible] ring three and we're only [indiscernible].
>> Baris Kasikci: Okay. Okay. But like was it for -- okay. Maybe I can ask later. But things like --
>> [Indiscernible].
>> Baris Kasikci: Yeah. Okay. Yeah, yeah. All right. I'm curious to know more. All right. So again, what I was going to say is, okay, 40 percent may actually be acceptable for some applications, but it's generally not acceptable to have this type of overhead in production. What Gist does, again as I mentioned, is this combination of adaptive monitoring and static analysis that essentially brings this overhead down. Now, coming back to the example for which we previously computed the static slice: when Gist performs the control flow tracking, it determines that the log line that prints the [indiscernible] of the pointer S is never executed. And this is, I guess, kind of expected, because when programs are deployed in production, verbose debugging messages are typically disabled to not incur overhead. Now, at this point, I'd like to emphasize the synergy between static analysis and dynamic control flow tracking. Static analysis removes from the program the things that are not related to the failing statement, and dynamic control flow tracking removes from the slice the statements that don't get executed during actual executions. So they're really working hand in hand. All right. For data value tracking, Gist tracks the values that are read and written by the statements in the slice, and for this it relies on another hardware feature, namely watchpoints. Watchpoints can observe a certain address in a program and cause the CPU to trap if there is a read or write access to the variable at that address, with low overhead. I mean, it's configurable. It can be just reads, just writes, or both. And watchpoints also allow Gist to track the total order of statements accessing memory and augment failure sketches with this information, because the way we handle watchpoints is that we handle them atomically. Now, this ordering information in turn increases the failure sketch accuracy to help developers better reason about bugs for which understanding the ordering makes a big difference, such as concurrency bugs. Now, coming back to our example, Gist will go ahead and place a hardware watchpoint at the address of the state object S, and it will monitor the values of S as well as the order of accesses to S across multiple threads, and it will determine that in successful executions, the log line that prints the size of the state object S executes before the delete statement, and these two are in different threads. Now we have a notion of threads, right? Because static analysis only had this flat structure of the program. At this point we have -- just a sec. At this point we have threads, and if the alternate order happens, where the deletion happens followed by the dereference, it turns out that the program fails. Yes?
>> [Indiscernible]?
>> Baris Kasikci: So yes, that's a good point.
For the sake of this example, we can assume that somebody else allocated that page, or you have a compiler that will actually pad the values in a way that when you dereference it, you will actually crash. In reality, this is abstracted from a more complicated example where you have a deletion function that actually deletes the value -- it's like a destructor function that deletes the value and sets it to null, basically. Yeah. That's a good point. Yes?
>> [Indiscernible]?
>> Baris Kasikci: I must be missing something. What's that?
>> S is a pointer [indiscernible]. The contents of S don't necessarily change at all when you do [indiscernible].
>> Baris Kasikci: Yeah, yeah, yeah.
>> That's what I was saying.
>> Baris Kasikci: Yeah. [Indiscernible]. Yeah, yeah. By just deleting, you can actually still dereference. That's why I was saying this is an abstraction. There is actually a destructor function. Yeah. That's a good point. Yeah.
>> [Indiscernible]. Somehow it's the same S everywhere. It's not copies of it.
>> Baris Kasikci: What's that? Yeah. Yes.
>> [Indiscernible] technique such as this, such as what you're describing?
>> Baris Kasikci: How -- I mean, you mean, if you -- yeah. If there's a pool and you can actually dereference the thing and it doesn't lead to a crash because things are allocated from a pool, yes, it could defeat it, yes. Yeah. All right. Now, Gist tracks control flow and data values to refine the slice, but the slice can grow to be quite large. It depends on where the failing instruction is. And refining the entire slice at once can actually impose high overheads, even relying on efficient hardware support. To solve this problem, Gist performs refinement in an adaptive way, monitoring increasingly larger portions of the slice, and combined with static analysis, this adaptive approach is the key to the low performance overhead in Gist. Now, I'll explain how Gist tracks the control flow adaptively, but tracking of data values is also done in a similar adaptive manner. All right. So, let's assume that the control flow graph shown on the slide is again the control flow in a static slice. We show the basic block where the failure occurs on the slide as well. What Gist does is it starts tracking a small number of statements from the slice, based on a common observation from prior work that in most cases root causes of failures are close to the failures themselves. Gist then builds a failure sketch using this refined slice and continues refining increasingly larger portions of the slice until it can provide developers a failure sketch that contains the root cause of the failure. Now, this technique is effective because, as I mentioned, this empirical result shows that in most cases, root causes are close to the failures. It is still useful if this is not the case, but it may not be as effective, because Gist will potentially require monitoring larger portions of the slice for cases where the root cause is far away from the failure, and potentially incur more runtime performance overhead. So it's really a tradeoff, basically.
>> The consideration is going around redeploying it [indiscernible].
>> Baris Kasikci: Yes. Yes, yes. All right. So we saw how Gist performs control flow and data value tracking and refines the static slices.
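As a hedged sketch of the adaptive refinement just described (my own pseudocode-style C++, not Gist's actual code), the idea is to track a small window of slice statements near the failure and grow it only when the resulting sketch does not yet expose a root cause. The helper functions here are hypothetical stand-ins for redeploying instrumentation and checking the sketch.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Stmt { int id; };   // a statement in the static slice

    std::vector<Stmt> track_in_production(const std::vector<Stmt> &window) {
        // Stand-in: in the real system this deploys instrumentation (Intel PT
        // ranges, watchpoints) for just these statements and collects traces
        // from user endpoints.
        return window;
    }

    bool sketch_contains_root_cause(const std::vector<Stmt> &refined) {
        // Stand-in: in the real system this checks whether the sketch built from
        // the refined slice shows a difference between failing and successful runs.
        return !refined.empty();
    }

    // slice is ordered so that the failure point is last; root causes are usually
    // close to the failure, so start with the tail and widen only when needed.
    void adaptive_refine(const std::vector<Stmt> &slice) {
        std::size_t window = std::min<std::size_t>(8, slice.size());  // illustrative start
        while (window > 0 && window <= slice.size()) {
            std::vector<Stmt> tail(slice.end() - window, slice.end());
            if (sketch_contains_root_cause(track_in_production(tail)))
                return;                               // ship the sketch to developers
            if (window == slice.size())
                break;                                // whole slice tracked; nothing more to widen
            window = std::min(window * 2, slice.size());  // track a larger portion next round
        }
    }

The tradeoff mentioned above shows up directly here: a root cause far from the failure means more iterations, a larger tracked window, and therefore more runtime overhead.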
So let's look at how Gist monitors multiple user executions to determine the key differences among them and isolate the root causes of failures. Now, our running example had this pattern, right: the deletion of S followed by its dereference. You can think of a more abstract example where we have a write to S followed by a read, from two different threads, in some particular order that leads to a failure. This particular failure pattern is a common failure pattern in multithreaded programs; it's known as an order violation. What Gist does is it seeks, in the refined slices, this order violation pattern and also other patterns of common concurrency bugs, such as atomicity violations -- single variable atomicity violations -- and data races. Now, remember that Gist seeks these patterns in failing and successful executions. It statistically determines the patterns that are primarily correlated with failing executions and highlights them as root causes on failure sketches. In our example, Gist will go ahead and monitor multiple executions, and it will determine that indeed failing executions exhibit this failing ordering whereas successful executions do not. At this point, Gist can statistically compute that the pattern where the deletion of the pointer S is followed by its dereference is the best predictor of failure, and it can show this as the root cause on the failure sketch, which essentially looks like this. Yes?
>> [Indiscernible].
>> Baris Kasikci: Yes.
>> [Indiscernible].
>> Baris Kasikci: That's one thing. Another thing is [indiscernible].
>> It's not obvious which one you should track when you start the program executing.
>> Baris Kasikci: Yes. Basically, you can have a different address every time, literally.
>> [Indiscernible]?
>> Baris Kasikci: Yes. So currently, the way we deal with this is we would distribute the tracking task, basically. We would enumerate and distribute it across multiple executions. But I mean, I wish we had better data tracing capabilities --
>> [Indiscernible].
>> Baris Kasikci: Yeah.
>> Okay.
>> Baris Kasikci: -- so that we didn't resort to a hack, let's say.
>> Where do you get successful executions from? Your model is run it until there's a crash. [Indiscernible] Microsoft, there's no crash, there's no bug report.
>> Baris Kasikci: Yes, the runtime will monitor that, you know, we have passed the program counter that actually normally crashes, basically. That would count as a non-failing execution.
>> And [indiscernible].
>> Baris Kasikci: Yes, yes.
>> [Indiscernible].
>> Baris Kasikci: Yeah. Yes?
>> [Indiscernible] fundamental issue of why these things have a hard time finding deployment is privacy, right?
>> Baris Kasikci: I'm going to --
>> [Indiscernible] is tough, even for [indiscernible].
>> Baris Kasikci: Oh, yeah. Yes, absolutely.
>> [Indiscernible].
>> Baris Kasikci: Yes. Yes. I know. So this is something -- this is part of what I was going to mention as future work, but this is an aspect we haven't looked at much. So --
>> There's not as much good news, right?
>> Baris Kasikci: Yes.
>> [Indiscernible].
>> Baris Kasikci: Sure, sure. There are certain things, certain ideas that we can discuss perhaps offline. But it's about basically treating the data as opaquely as possible while still providing useful answers -- basically computing hashes of paths and, in some sense, using them. I have some ideas, but you know. And by no means am I an expert in privacy. So it's ripe for collaboration as well. All right.
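To make the statistical step described a moment ago concrete, here is a small, hypothetical C++ sketch: score each observed event pattern (for example, "delete of S ordered before the dereference") by how well it predicts failure across many monitored executions, roughly a precision/recall-style ranking. This is my own simplification of the idea, not Gist's exact formula.

    #include <algorithm>
    #include <string>
    #include <vector>

    struct Execution {
        std::vector<std::string> patterns;   // patterns observed in this run
        bool failed;                         // did this run end in the failure?
    };

    struct Scored { std::string pattern; double score; };

    std::vector<Scored> rank_patterns(const std::vector<std::string> &candidates,
                                      const std::vector<Execution> &runs) {
        std::vector<Scored> out;
        for (const auto &p : candidates) {
            double with_p_failed = 0, with_p = 0, failed = 0;
            for (const auto &r : runs) {
                bool has = std::find(r.patterns.begin(), r.patterns.end(), p)
                           != r.patterns.end();
                if (r.failed) failed++;
                if (has) { with_p++; if (r.failed) with_p_failed++; }
            }
            double precision = with_p ? with_p_failed / with_p : 0;   // run fails when pattern seen
            double recall    = failed ? with_p_failed / failed : 0;   // pattern seen when run fails
            double f1 = (precision + recall)
                            ? 2 * precision * recall / (precision + recall) : 0;
            out.push_back({p, f1});
        }
        std::sort(out.begin(), out.end(),
                  [](const Scored &a, const Scored &b) { return a.score > b.score; });
        return out;                          // best failure predictor first
    }

In the running example, the "delete before dereference" ordering would come out on top of such a ranking and get highlighted on the failure sketch as the root cause.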
So, we discussed the design of Gist, and I'll talk a little bit about how we evaluated it. And on the time -- do we have another half an hour? Is that it? No?
>> [Indiscernible] you're on schedule.
>> Baris Kasikci: Yeah.
>> We have the room for longer.
>> Baris Kasikci: For longer?
>> Some of us planned for an hour, yeah. You planned for an hour. [Indiscernible].
>> Baris Kasikci: Oh, really? Okay. Okay. Because I thought we had like 70 minutes for the talk, so we have ten more minutes. That's how I was planning, but okay. Anyhow. So basically I'll briefly talk about the evaluation. We looked at system software such as Apache, SQLite, and Memcached. [Indiscernible] tell what these do. So I'll basically answer the questions of whether Gist is effective and whether it is efficient. To answer the question of whether Gist is effective, what we did is we manually analyzed the failure sketches that Gist automatically built for 11 failures. This was for the paper. And basically what we found is that Gist correctly identified the root causes of the failures, which the developers ended up removing from their programs. Another interesting result is that Gist reduced the average number of source statements that a developer needs to look at before finding the root cause to seven, which is orders of magnitude smaller than the sizes of the programs that we looked at. So Gist was useful in that regard. I'll quickly pass over the efficiency. Basically, the idea here is that the larger the slice that Gist monitors, the more the overhead increases, monotonically. And this is what this graph shows. Now, the good news is -- well, okay, I guess it depends who you ask -- that the overhead is always below five percent, and these numbers have very little variance. This is across all executions that Gist monitored. So it is conceivable that Gist is practically deployable in production. And we heard good news that new hardware generations have even better performance behaviors, so this is going to get better, hopefully. Well, to recap, Gist uses a combination of static analysis and dynamic analysis to build failure sketches that help developers do root cause diagnosis. There's more info on this web page, such as related source code. And we're working with Intel to release our integration into GDB essentially as open source. And it turns out this is the GDB logo. [Laughter]
>> Baris Kasikci: I didn't know about it. All right. Let me say a few words about future work and how that ties to my past work. So you mentioned privacy, right? Some of these techniques that I built rely on gathering execution information from users, so I'd like to work on privacy-preserving techniques that still achieve good results to improve the quality of software while respecting the privacy of users. I'm also extending my work on root cause diagnosis to encompass security vulnerabilities. So what you can think of is, I think that we'll be able to gather these path profiles, for instance, and build a model of good versus bad behavior, right, using some machine learning. And bad behavior in this case would mean something like a control flow hijack attack, for instance -- so these are deviations from good behavior. And using this ML approach, essentially we can go back and revisit the assumption that we made in the beginning about requiring a means to detect failures in order to do root cause diagnosis, and potentially make Gist stronger, more powerful.
Of course, concurrency is present not just at the single-node level but also in distributed system settings. So I'd like to take some ideas from my work and apply them in that setting. And as a general theme, developers face challenges due to emerging technology trends -- not just this transition from single-core to multicore architectures and the related software development challenges, but also all sorts of emerging trends in [indiscernible] computing devices and IoT, and their programmability and security challenges. And I'm very excited to look deeper into those challenges as well. I'd like to thank a lot of people who helped me with all this work. There were a lot of Microsoft folks, faculty, fellow grad students, and, you know, thanks a lot for all their help. And that brings me to the end of my talk. I gave an overview of my research, delved deep into root cause diagnosis and gave you some details, and showed that these techniques are efficient and effective. I'd like to mention as a closing thought that I believe complexity is just going to be ever increasing, so we'll need better tools to understand and tame this complexity. And I think my results show that I managed to do that well for better reasoning about concurrent programs. And this brings me to the end of my talk. I'm happy to take more questions or discuss offline. Thanks for coming. [Applause]
>> Chris Hawblitzel: Are there any other questions?
>> So, I guess, one question that comes to mind is, you know, is this strictly for concurrency analysis?
>> Baris Kasikci: Yeah. That's a good point. So the root cause diagnosis part -- I didn't talk about it, but it's not just for concurrency bugs, because we're looking at other types of differences, other types of patterns, like the control flow of the program and data values. We are not looking at invariants on data values, but we could potentially do that. So it is applicable to other types of bugs as well. And we did evaluate it for other types of bugs, but, you know, I wanted to stick to concurrency bugs in this talk, basically. Yeah. Yes?
>> So one of the pitfalls in this bug finding [indiscernible]. But it's not, right? And not only that, it also has the potential to create issues down the line in every single domain: privacy, security, performance, what have you. So any thoughts on making the potential fix a more certain deal [indiscernible]?
>> Baris Kasikci: So that's something we debated a lot. So you identify this root cause statistically. Potentially, it's conceivable that you can actually automatically craft, like, a synchronization that will eliminate the offending thread schedules and fix the problem, right? You can do that. It's fairly easy to do that, but we want to be cautious, and we want to just present that as a hint to the developer rather than an actual fix that will be applied. Because it becomes hard to reason about other properties in your program when you introduce synchronization. If you introduce some ordering constraints, what happens to the rest of the program? Are you maybe introducing a deadlock into your program?
>> [Indiscernible] monitor that somehow. Right?
>> Baris Kasikci: Yes. So then actually that's a really good point. You could deploy additional monitoring to see whether you're running into cycles, and, yeah, that's a good point. But, yeah, I think we haven't gone into the automated fixing line of work that much; we've considered these things more as hints to the developer.
But, yeah. That's a really good point.
>> As long as you can prove --
>> Baris Kasikci: Yeah. If you can prove it, yes. If you can prove it, that would be awesome.
>> Even if you can't prove it, if you've got the data that shows that your fix is crashing less often than it was [indiscernible], why not?
>> Baris Kasikci: I guess that's true. Yes.
>> [Indiscernible].
>> Baris Kasikci: Yeah. I was just saying, yeah, that's --
>> [Indiscernible].
>> Baris Kasikci: So I mean, technologically speaking anyway, we don't have the infrastructure to capture that information, right? So we're relying on a single-node monitoring infrastructure to do that. In a distributed system setting where you have, like, asynchronous calls, things become more tricky. Now you have to actually --
>> [Indiscernible].
>> Baris Kasikci: Yes, yes. I guess if you can track, like, the provenance of the request and the ordering information, [indiscernible] that could just feed back into the whole system that does diagnosis. So in that case, it would work. But I'm not sure how easy it is to infer that type of information, let's say, from a --
>> [Indiscernible].
>> Baris Kasikci: Then you need to infer that basically dynamically. Right? Maybe you need timestamps to give you ordering information -- rather than relying on hardware support, you rely on logs. You mine data from logs and use that. I mean, these are all interesting things. The thing is, in distributed systems, these things are trickier and less explored, I think. So I think it's a very exciting thing to look at. Yes?
>> So [indiscernible] is a pretty old technique. And there's been a lot of work on [indiscernible] analysis since then. And as far as I know, there really wasn't a thread of research [indiscernible]. It was sort of like Lattner did that in '91.
>> Baris Kasikci: Okay.
>> So I'm curious about it. Is it really the best, you know, state of the art? And if it isn't, is there something about it that made you want to use it instead of the [indiscernible]?
>> Baris Kasikci: So to be honest, I don't know whether there is a better one or not. I don't know of a technique within LLVM that gives me interprocedural aliasing information and that does better than this --
>> Standard package other than [indiscernible]?
>> Baris Kasikci: [Indiscernible] LLVM.
>> For interprocedural analysis. I bet I could find one.
>> Baris Kasikci: There may be. There may be, but --
>> The fact that you had to dust it off, someone suggested that --
>> Baris Kasikci: Yeah, yeah. [Indiscernible] talking to John Criswell, who was one of the maintainers as well. He's now in Rochester. He actually suggested I look at that, so that's --
>> [Indiscernible] did a lot of work recently on pointer analysis.
>> Baris Kasikci: [Indiscernible] LLVM?
>> [Indiscernible] my assumption.
>> [Indiscernible].
>> Chris Hawblitzel: Want to go ahead and thank the speaker again. Thank you. [Applause]