>> Mark Marron: So let me go ahead and I’m very happy to introduce Brian Burg today. He’s visiting us from the University of Washington where he’s just finishing up his PhD. He’s been doing some really nifty work on deterministic record and replay in the browser. He actually… I believe a lot of that has been mainlined into WebKit? Okay, mainlined into WebKit, so that’s a great accomplishment. And he’s also doing some nice work, I think, building on top of that core infrastructure to build some useful developer tools. So I’ll let him take it away and tell us all about that. >> Brian Burg: Alright, thanks, Mark. So this talk is a summary of the stuff I’ve been doing over the last five years or so in my PhD work there, and a lot of it’s focused on understanding dynamic behavior and, in particular, retroactively, meaning we can go back in time and look at past things. So I’ll be talking about three different projects and sort of the motivation that connects these together. So I know lots of developers, I live in Seattle, and a lot of times I’m like—you know—“I’m a software engineering researcher. I’m trying to—you know—make your life better. What do you hate about your job?” And the first thing is—you know—they complain about whatever language they use; it doesn’t matter what language, they complain about that. So I try to be more specific. Like, “Why are you stuck at work later than you need to be?” And it’s never like, “Well I just wanted to write one more test,” or “I needed lambdas in my language,” it’s usually like, “I got stuck debugging and I just couldn’t figure this thing out.” So a few years ago Robert O’Callahan said to a room full of academics—you know—“we talk a lot about finding these bugs, but, you know, our bottleneck is fixing them.” We’ve got plenty of bugs and we don’t have enough throughput to fix them all. So my work is figuring out how we can improve the throughput of fixing bugs, and it’s been observed over and over that when you’re fixing a bug, writing the fix is not the hard part; it’s usually understanding what the bug is—you know—what the sequence of events was that led to something going wrong. So in that sense, like, I’m not really interested in tools that automatically write patches, but things that sort of help you understand the program and what happened. So if you’re gonna go try and debug a bug, there’s lots of different things you could try and do to improve your understanding of what happened. You could add logging, you could capture a trace or a profile or a core dump. A lot of these things you can only use in certain situations depending on how you can interact with the program. And—you know—if you have limited access to the program and you can’t rerun it, then you may only have a few of these available to you. So in a sort of offline investigation, you have full access to the source code and the program and could run it, but instead you use a static analysis tool to learn facts about the program no matter which way it executes. So an example of state-of-the-art work for debugging that uses static analysis is something called Reacher, and Reacher is a reachability tool. So a user can ask reachability questions, like, “Does this method potentially ever call this other method?” using the interface on the right here. And what Reacher will do is some reachability analysis over the call graph, and on the left side it will show the relevant source code, and on the bottom it will show an interactive control flow graph.
So these symbols mean things like, there was a guard in front of this branch or in front of a call site, and this thing can call this other thing. So this is an interactive tool but it’s completely static. So that’s great, you can see everything that could happen, but it’s not relevant to any particular—you know—interaction. And you need to know what code you should be looking at in the first place; you can’t just demonstrate something and then see what code could be called in an interaction. So another way of going about this debugging is post-mortem investigation, and in this setting you don’t necessarily have source code, but what you have is stuff that came out of one execution. So traditionally this is like a core dump that you get from a customer somewhere, and once you have this data you don’t have access to the original source code or the execution, you just have whatever you dumped. But this is actually really powerful because you can dump lots of things that would be useful. So the state of the art here is something called Whyline, and the idea of Whyline is that as the program executes it collects enough information about all the memory accesses and drawing commands that you can ask questions about the output and it’ll be able to automatically answer the question. So in this case, the user might ask, you know, “Why was this color black even though it should be blue?” And there’s an interface where you right-click on the output and ask a question, and what Whyline will do is look through its huge trace of everything that happened, do some analysis—dynamic and static—to figure out why this could have happened, and explain it in terms of a slice. So on the bottom, you can see this happened and that happened, and you can ask follow-up questions about what happened. But the thing with this is you need to record everything that ever happens, so it doesn’t really scale beyond Paint-style applications. You know, if you had to do this in Word or something, it wouldn’t work—you know—it would just create terabytes of stuff. So the most common way of sort of going about this debugging is iterative investigation, where you have the binary and you can rerun it over and over to get different sorts of information as you think it’s useful. But usually this information isn’t collated together: you run once and you hit breakpoints, you run once and you get logging; this stuff just sort of comes and goes. So the state of the art here, in terms of investigating dynamic behavior, is a project called Senseo, and the idea of this project is to show dynamic runtime information as it happens in the IDE. So in the center pane here you can see which methods are really hot because they have more bars in the gutter. And if you hover on a specific call site or method it can show runtime statistics from the VM, like, for this call site, which methods actually got called the most, how many times did this branch get taken, and so forth. So this is great because you can get really detailed information without making your own logging or whatever, but you still need to know what code to look at and you need to rerun over and over. And if you’re debugging something nondeterministic, well, this could really be a problem. So in this talk I’ll be explaining a new way of investigation called retroactive investigation. And the idea here is that you have one execution and from that you can use tools to go back and look at different aspects of it without reproducing the behavior over and over.
So if you need logging or you need coverage or you need a profile, you can click a button and that data will be collected for you without you having to—you know—re-interact with the program or anything like that. And if you change your mind and need some other information, well, you just use a different tool. So what this gives you is a more live, more interactive system, where you can quickly narrow down what’s going on. So first I’ll talk about Dolos, which is a deterministic replay infrastructure for web programs. So this was presented two years ago at UIST, and in this part of the talk I’ll talk a little bit about the infrastructure for record and replay. So some key points here as background for deterministic replay: the main idea is that one execution is determined by a program and some nondeterminism that was used by the execution. So if you keep the program and the nondeterminism fixed, then you will get the same execution back. To do this we assume that the program is fixed anyway, so we don’t need to control for that; we just need to find all the nondeterminism that’s used by the program. So in the browser, these are things like event loop work and nondeterministic APIs, and later I’ll sort of go through the full scope of what needs to be handled here. And there’s lots of other strategies for replay, and these tend to be grouped into different levels. So there’s tools that replay at the VM level, which sort of work like—you know—they record all the register state and make sure x86 executes the same way. They don’t know anything higher level. There are systems that replay at the operating system level, so they replay POSIX calls deterministically; it’s still pretty low level. And there’s also the application or managed runtime level, so some of Mark’s work and Mugshot were done here. Those operate at the level of, like, the JavaScript engine or the Java virtual machine, and that has lots of benefits, like there’s just less to record. And the main nugget in this work is that you can modify the browser engine itself and use virtual machine replay techniques, and you can capture and replay with really low overhead that way. So to sort of figure out what the main design goals are here, we’ll look at video games. For replay, video games are sort of like your worst-case scenario. Video games exercise lots of different parts of the browser: you download them over the network, they’re parsed and evaluated, they handle user input, they’re really timing sensitive (if you slow down then the game becomes unplayable and you can’t reproduce a bug), they use all the animation stuff, which could be implemented using canvas APIs or the DOM or both, and a lot of times these games are implemented with frameworks in JavaScript which make the code hard to understand. So I see this as sort of like—you know—we need to design for video games, and for replay purposes you can view Gmail or other apps as video games where you don’t shoot bullets. So, going back to design goals, the first is that there should be really low overhead when you’re capturing. If you miss a few frames when you’re playing a video game, that’s noticeable and it may cause different behavior. So we want to make sure that when we’re capturing a specific execution it doesn’t really slow down anything. And the second is that we need to exactly replay the game, otherwise—you know—it could have a different ending. So we need to make sure that we’re handling all sources of nondeterminism as accurately as possible. Did you have a question? Okay.
The third is we want this to be transparent and composable. And what I mean by this is that replay is not like the one true approach to debugging; it needs to integrate with logging and breakpoints and profiling and so forth, so if turning on replay disables all those other things it’s not gonna be very useful for debugging. So this needs to be implemented in a way that doesn’t mess up everything else. And the last thing is zero configuration. Lots of prior work assumes that you can install some tool as root or that you can install a reverse proxy or something like that, and most people either don’t have root or don’t want to install a proxy, and for a lot of websites that won’t work either because of SSL. So this needs to be something that just works when you open up your developer tools. So next I’ll talk a little bit about what nondeterminism exists in the browser. So there’s sort of a few main categories. The most obvious one is going to be user input. That’s clicking, mouse movement, keyboard input; it could be things like geolocation or motion data, or even navigation. So—right—you click on a link, that’s a kind of user input. You’re saying, “I want to navigate to this page,” and you want to be able to replay that action. Another thing is network traffic. So anything that comes over the wire could change at any time, so you need to capture all network traffic and the headers, and so forth. The scheduler’s also pretty important. So JavaScript is single-threaded, but it’s highly asynchronous, so you can enqueue new work to happen at some later time, and you need to make sure that those things happen in the right order. The fourth thing is the environment. So the computer that you use your web app on can change the behavior a lot: it could change how big things are rendered, it could change the layout, the page could be parsing the user agent and giving you a completely different program, so all those sorts of things need to be captured. And another thing is persistent state. Browsers can use simple things like cookies, which is pretty much like a big string that every page on the same domain can look at, or you can have more complicated things like a SQL database per website. And—you know—if the SQL database has different state and you try and replay, it may do the wrong thing, so you need to make sure that that sort of thing is addressed. So where does this nondeterminism show up during execution? There’s two main ways that you’ll see this. The first is as specific pieces of work in the event loop. So an event from the user or from the networking stack or from asynchronous work, these are sort of the things that drive execution and are also nondeterministic. The second category is nondeterministic APIs, so this is sort of like syscalls in a sense, where—you know—you can get the current time or a random number or the size of the window or whatever it is. So those are the two ways that this nondeterminism gets surfaced. And lastly, we need some strategies for handling these things, both to capture them and also to replay them accurately. So for things that are sort of creating work, like user input and the network, what you do is capture that whole piece of work in the event loop, so you capture the event and what it’s dispatched to, or the network callback, or what timer fired, and on replay you just—you know—inject that work again.
The second thing is we can memoize the results, so for these syscall-type things, like the current time or the cookie, we can save the value on capture, and on replay we don’t actually call the built-in function, we use the thing we saved. And third, for things like persistent state or the browsing history, at the beginning of replay we can just sneak in the saved state that we saw at the beginning of capturing. And this way we don’t need to memoize every call—you know—after that point, because we assume that everything else is deterministic. So for example, you can save the random seed and restore it once, or you can memoize every single call to the random number generator. You know, one of these is gonna be a lot less space, but in general this is sort of a case-by-case determination, so you could—you know—save and restore all the cookies for a page or you could just replace one use of document.cookie. And—you know—you can do both of those and see which one uses less space. There’s no sort of golden rule there. So lastly, I’m gonna walk through sort of one interaction with a game and how that might be handled by the replay engine. So here on the top we see things in the event loop. On the bottom we see script to execute. On the right side is sort of what’s in the recording. So first we load a game; that will fire like a resource-received callback with some network data. This gets parsed into script and run, and this calls some functions and eventually it’ll look for document.cookie. You know, it wants to restore the high scores or whatever. So in the recording what we’re gonna save is this event loop item, resource received, with all the network data and headers, and then later we’ll also save the memoized value of the cookie. So the next thing that might happen in the game is we press a key to fire a bullet or whatever. This can call a bunch of JavaScript code. Eventually it will ask for the current time, because this is an animated game, so we need to know when the bullet should be drawn again. So we look at the current time and then set a timeout based on that. So in the recording we have the up arrow and we have the current time. And similarly, when it’s actually time to redraw, a timer will fire, so we save that as event loop work. And then inside all the drawing routines it’s going to look at the current time and the size of the canvas or the page to determine where to render the bullet. So this will produce a recording like this. Sort of the major index here is event loop work, and the things underneath each of those are nondeterministic memoized values. There might also be things at the very beginning, like restoring the initial state of cookies or history. So the short performance story is that this is really fast and low overhead, because we’re not really capturing a whole lot. On the left side there’s a graph, kind of hard to read, but we picked a number of small real benchmarks and we measured the speed with and without capture, and then capture versus replay, both playing back at one x speed, trying to simulate the original timings, and replaying as fast as possible. So in general, there’s—you know—maybe like a few percent overhead for capturing, depending on—you know—how image-intensive the page is; you have to copy the buffers over. But for playing as fast as possible, it’s really going to be much faster, because most web pages are not bound by CPU or network, they’re bound by the user deciding to do something. So you can elide all those wait times and replay—you know—a few times faster.
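To make that recording structure a bit more concrete, here is a minimal sketch in TypeScript (the real implementation lives inside the C++ engine); the type names and shapes are hypothetical, not Dolos’s actual format:

    // Hypothetical shapes for a recording: the major index is event-loop work,
    // and underneath each work item are the memoized nondeterministic values
    // (current time, document.cookie, random numbers, ...) observed while it ran.
    type EventLoopWork =
      | { kind: "user"; event: string; target: string }                              // e.g. the up-arrow keypress
      | { kind: "network"; url: string; headers: Record<string, string>; body: string }
      | { kind: "timer"; timerId: number };

    interface WorkItem {
      work: EventLoopWork;
      memoizedValues: Array<number | string>;  // consumed in order on replay instead of calling the real APIs
    }

    interface Recording {
      initialState: { cookies: string };       // restored once at the start of replay
      items: WorkItem[];
    }

    // Replay sketch: restore the initial state, then re-inject each piece of work
    // in its original order; nondeterministic APIs read from memoizedValues
    // instead of asking the system.
    function replay(rec: Recording,
                    restore: (state: Recording["initialState"]) => void,
                    dispatch: (item: WorkItem) => void): void {
      restore(rec.initialState);
      for (const item of rec.items) dispatch(item);
    }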
But in cases where you are CPU bound, like in Raytracer, you know, we can’t really replay any faster. We’re not doing any application-specific memoization of rays or something like that. So… do you have a question? >>: Oh, sorry. It’s time, not speed. Okay. >> Brian Burg: Oh yeah. This is relative to one x. And also these recordings are quite small, so what this is showing is… orange is the size of all the images on the page and other content. And the other bars are showing different representations of the other stuff we recorded in the recording, so where the user clicked and what timer fired and so forth. So if you see a recording as this tiny red bar plus the orange bar—you know—the size of the recording is just dominated by the stuff on the page. So—you know—sending a recording around is not much worse than just sending the page saved as HTML, or whatever archive format you have. So that’s sort of the replay framework, and now I’m going to move on to Timelapse, which is a marketing term for several different interfaces for navigating through a recording. It’s great that we can record and replay, but unless you can move through it, well, you just see the game play again, and that probably won’t help you too much. So, moving on, we built this replay framework and we were like, “Well, lots of people said replay is useful for debugging, so, like, it should be really obvious what the UI’s gonna be, right?” So our first thought was, “Well, let’s just visualize the recording and show what’s in it and show our place in it over time.” So now I’ll show a video that has the initial UI for this and sort of explains—you know—what we did for the first cut of the UI. >> Video: [Timelapse] can precisely capture and reproduce entire program executions. To record, a user opens the Timelapse drawer and starts and stops recording by clicking a button. Instead of writing reproduction steps, users can record an execution while they demonstrate interactive behaviors of interest. Recordings can be saved to or loaded from a file and shared by email or bug tracking software. Recordings can be replayed on demand using Timelapse’s navigation affordances. Timelapse visualizes program inputs over time using several linked views that provide different levels of detail. An overview shows the entire recording and its inputs using a stacked line graph. Each input category is visualized with a heat map timeline. Users can adjust the displayed interval by adjusting zoom settings with linked sliders or by scrolling horizontally and vertically. Each circle can be inspected to reveal the inputs it represents. Lower-level recording details are also available. The current replay position can be changed by dragging the red slider in any view or by double-clicking an input circle. In this game, the player should only have one bullet on the screen at a time, but a glitch allows multiple bullets. This is hard to debug without Timelapse, because the developer must play the game while setting breakpoints and reasoning about the program’s behavior. With Timelapse the developer reproduces the behavior once and can then seek through the recording to isolate the buggy behavior to specific inputs. Breakpoints and other debugging tools are usable while replaying and recording. A developer can experiment with breakpoint placement without having to reproduce behavior or worry about losing execution state. Breakpoints can be set when execution is paused. When it continues, the debugger will pause normally.
Once the developer has isolated the bug using Timelapse, they can use existing tools to… >> Brian Burg: … to fix the bug. Got clipped a little early. So this is sort of version one of the UI. It shows the recording and it shows inputs over time. These inputs are the event loop inputs like user input, network, and timers. And the user can see a huge list of all of them at the top, or they can look at the visualization and double-click to move around through it. So this is V1, and we were wondering—you know—does this work for debugging tasks? It seems kind of useful. You don’t have to manually interact with the program, but it doesn’t dump you directly at specific causes of the bug, say. So we did some user studies to figure out—you know—how does this work out in practice? So the studies were two rounds of twelve each, and these were researchers and web developers and people with enough experience to ship a web page or application. Despite that—you know—we interviewed people here and at other big companies, and there was a high variation in prior debugging experience. So some people fix bugs all day, some people write new features, some people do design, and we wanted to sort of see how these experience levels would impact how they could pick up a tool like this and use it. So for the tasks in the studies we had the multiple-bullet bug and also a bug in a Tetris-style game. And the order of these was randomized and split up between two groups to see what the differences would be with the tool and without the tool. So they had Timelapse for one game, and for the other game they just had the stock tools. And we were really interested in how replay affects, one, sort of the high-level thing, like, are they more productive? But more specifically, do they spend more or less time reproducing behaviors? And how does it interact with other tools? So sort of the high-level results are that regardless of their group, people spent ten to fifteen percent of their time in the debugging tasks just getting back to specific states in the program, whether this was manually interacting with the program or using Timelapse to zip around by different inputs. With Timelapse they didn’t have to manually do this, they could just zip around, so they did it even more ‘cause it was easier. But it didn’t necessarily make them more productive in fixing the bug because—you know—the bug was still further down in the execution, so they still had to revert to logging and breakpoints to—you know—find more specific program states that had to do with the bug. When they were replaying, the users found it really hard to use breakpoints at the same time, because we could halt the replay by just not feeding any more inputs in, or the debugger could actually be paused, and that’s a separate system. And so we tried really hard to make it clear what was going on, but users got really confused as to what was being paused and why, and why do I need to care about the difference? Also, many users were just really uncomfortable using breakpoints even without Timelapse. Most people use logging as far as they can go, until it’s no longer feasible to use it to fix their bug. There are a set of power users that will go straight to breakpoints, but in our study the vast majority of people we had to really, like, coax into using breakpoints at all.
So our sort of insights from this round of studies were that people are really looking at the outputs, not the inputs to the program, so they’re really focused on, “What does the game look like at this time, at that time?” and they used that to sort of dig into more interesting states. And they were trying to use logging as a way to index into this—you know—vast database of runtime information that we have captured in the execution. And the debugger is sort of like the needle on the record player; like, that’s what they want to be at the right place. They don’t care about being at the right place with respect to time, it’s more about, is the debugger at the statement that’s buggy or that will get me some useful information? So with these in mind, we thought about ways to come up with better interfaces for replay. So in particular we were looking at how we can navigate to past program states. And for this multiple-bullet bug, it’d be really great if we could—you know—add logging or go to the statements where the bullets are created or destroyed, ‘cause in this example we’re creating more bullets than we’re supposed to, so there must be some—you know—missing statement or wrong guard or something like that. So what if we could just add logging there without rerunning or anything? So we tried to figure out how we could implement this ability to retroactively go and add logging into this execution. And this is what we called Probes, and the basic idea is that it’s like a breakpoint: you can set it on any statement and add some expression to be evaluated when you get there, and then you can press a button and the replay infrastructure will go and collect all the values that would be produced at this line. So it’ll go replay and regenerate all the logging that you need for that probe you added, and for any of the samples that you got from the probe you can go back in time to the statement that generated it. And the last thing is that this is all integrated into the developer tools, so you could log a bunch of values, you could go back in time to one of them, and then continue with the debugger. So this is sort of V2 of the UI. We have less of a visualization over time at the top and instead we have actual runtime values over here. So on the left side this is—you know—your window with breakpoints and probes. On the right side are the values that we collected from each of the probes. So in this case we have a probe at line 107 at the bottom, and we have three probe expressions installed at that point: curX, curY, and the event. So then we can see over time—this is like a Tetris game—we can see the coordinates of the piece as it moves around. And we can use that as an index into the execution to get to the point when the piece was at the bottom or all the way to the left or whatever it is. And we can also get screenshots, so you can correlate this runtime state with—you know—something that’s visual. So in this case you can clearly see—you know—what the coordinates mean in terms of the output. Oh, and lastly, the sort of timeline at the top and the actual samples here are synchronized together, so things that are faded out happen in the future, meaning that—you know—they’re not live on the heap yet because that code hasn’t been executed, but we can see a preview of it. So if you want to—you know—say, if this was a more complicated thing like an object, you would have to replay up to here to have that thing actually be created and inspectable.
But the developer tools often make, like, a preview, and you can see the preview from the last run. So what’s going on here under the hood is that to replay back to any of these things that we logged, we need to be able to uniquely identify any statement that executed. And this takes advantage of the fact that the recording is split up into a bunch of event loop work items, and then stuff that happens underneath each of those. So back in this diagram, if we scatter a bunch of statements throughout the execution, we can index to see—you know—per event loop item, how many times each of those statements was executed. So say we want to get to the very last purple one at the bottom; well, we don’t even need to install breakpoints or anything until we get to that third event loop item. So we’ll run up to there with replay. At that point we’ll say, “Ah, this statement at Tetris line 107, we need to install a counter there.” So we can do that with breakpoints or by rewriting the program’s bytecode, and once the counter gets to two—right—we can just—you know—stop in the debugger, and we’ve replayed to that executed statement. So this was a way of—you know—having a time point for any executed statement. In the prototype, this was done with breakpoints just because it’s easier to implement, but in the future it’d be great to do this with bytecode rewriting, because going into the debugger is really expensive. In WebKit, if you turn on the debugger and have any breakpoints, it has to rewrite all the code blocks and not go into any of the JITs, and it’s just like ten X slower, so if possible, you want to avoid that when you’re navigating through a recording. So with Probes we have this retroactive workflow. We can add some logging to some interesting statements. We can get a bunch of states. We can find some weird discrepancy, like here, the bullet dies more times than it’s created, so that seems sort of sketchy. Maybe we want to replay up to that point and investigate further. And then we’re there and we see that either the logging is wrong or there’s really something interesting going on. So we can add even more logging, rescan the values, and then keep going from there. So there are also some other uses for these time points of specific statements that we didn’t look at too much. The first is that you could theoretically replay up to any output-producing statement. So we just did this for Probes, but for anything output on the console you could replay to the statement that made that output. You could generalize this to replaying to specific lines in a profile that you got, so if some code is really hot you want to know, like, why did we get here, or why aren’t we busting out of a loop? You can replay up to some point in the profile and start debugging that way. The second is that since this can pretty accurately represent any point in the execution, you could add annotations like, “This seems like an interesting part of the recording,” and you could send that to someone else and they could get back to the exact same point. And the last thing is, sort of extending that, you could have a journal of your investigation of a specific bug and you could keep that on the bug tracker instead of in your brain or in your personal notebook, so that someone else could pick up your debugging session and just continue, or look back and see what you’ve already tried. So lastly I’ll move on to a tool called Scry.
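To make the time-point idea concrete before moving on, here is a minimal sketch, again in TypeScript with hypothetical names rather than the actual WebKit machinery:

    // A "time point" uniquely identifies one executed statement: which event-loop
    // work item it ran in, which source location, and how many times that
    // statement had already executed within that work item.
    interface TimePoint {
      eventLoopIndex: number;  // index into the recording's event-loop work items
      url: string;             // e.g. "tetris.js"
      line: number;            // e.g. line 107
      hitCount: number;        // pause on the Nth hit within that work item
    }

    // Seek sketch: replay at full speed (no debugger overhead) up to the target
    // work item, only then install a counter at the statement, and pause once it
    // reaches hitCount. The two callbacks stand in for the replay and debugger
    // machinery.
    function seekTo(tp: TimePoint,
                    replayUntil: (eventLoopIndex: number) => void,
                    installCounter: (url: string, line: number,
                                     shouldPause: (hits: number) => boolean) => void): void {
      replayUntil(tp.eventLoopIndex);
      installCounter(tp.url, tp.line, (hits) => hits === tp.hitCount);
    }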
So Probes is a way to navigate through an execution that’s still pretty statement- and code-centric, and—you know—you can navigate to specific logged statements, but it still—you know—requires you to know where to look in the code. So we were thinking deeper, like, how would we get a tool that lets you go from output back to the relevant pieces of code? So Scry is a feature location tool for visual changes on a web page. So I’m gonna play a short video which shows the motivation and how the prototype works. >> Video: Developers often reuse existing designs and interactive behaviors by inspecting the code of third-party websites. To reuse an interactive behavior a developer needs to understand what it does, what DOM and CSS it uses, and how interactivity is programmed through JavaScript. Feature location, the task of working backwards from outputs to internal states to source code, is time consuming and difficult with existing tools. First, current tools do not capture an interface element’s output history as it changes. Users can only inspect the current state of the web page. Second, isolating an element’s internal states is difficult because these states, DOM elements and CSS styles, have hidden nonlocal interactions. Third, existing tools provide no link between changes to the DOM and CSS and the JavaScript code responsible for these changes. We present Scry, a feature location tool that directly addresses these challenges. Scry introduces a novel feature location workflow. First, a user selects output examples from an output history timeline. Then Scry visualizes differences between the DOM and CSS that were used to render each example. Finally, the user selects a single difference to see the operations and source code responsible for the change. We illustrate these features and design contributions by using Scry to inspect several examples. This article web page has a search box that expands and contracts when clicked. However, it’s difficult to see exactly how the transition is implemented because the interaction is brief and intermediate states are lost. To inspect this element using Scry, the user first finds the DOM element using the web inspector’s inspect function. Then the user tells Scry to start tracking the element. As the user demonstrates the search bar interaction, Scry populates a timeline with the output history of the search bar. As a tracked element’s visual appearance changes, Scry captures output screenshots, internal states, and a log of state changes. Using this data, Scry can show the rendering engine’s inputs and outputs at any intermediate state. This uses three panes, showing the screenshot, DOM subtree, and CSS styles for the selected element. When a user selects two screenshots, Scry visualizes the differences between them using familiar inline diff annotations. Here we see that the root element’s width is animated, and the inner form element’s opacity property is also animated. The Tetris game’s user interface is implemented entirely with DOM elements and CSS styles. Let’s use Scry to find out how the game board works. Pieces can be translated, rotated, or dropped. If we compare two screenshots where a piece was translated, we can see that the same DOM elements are used but are repositioned using CSS. If we compare two screenshots where a piece was rotated, we see that instead the DOM elements for the piece have been removed and re-added. Added elements and attributes have green highlights, while removed elements and attributes have red highlights.
By clicking on a diff annotation we can see the operations that caused the change. Each operation is linked to JavaScript source code, which can be previewed in the right pane or displayed in the main content browser. The functions that rotate the piece and rebuild the piece seem like good starting places for investigating the relevant game logic. We have demonstrated Scry, a tool for locating the code that implements interactive behavior. Scry supports a new workflow for feature location based on selecting output examples, comparing their internal states, and jumping to the JavaScript code responsible for state and output changes. >> Brian Burg: So that’s sort of a demo of the UI. So in the rest of the talk I’d like to explain what’s going on under the hood so that these changes can be tracked and linked back to source code. So the first thing to start with is, well, on a web page, what determines the appearance of some element, right? And by some element we’re talking about something you click on in the UI, but you could also include its subtree in the DOM. So the first thing is, well—you know—what’s in the DOM—you know—is it a heading? Is it a form element? What text is underneath it in the tree? That’s gonna influence how it looks. The second thing is CSS properties, so things like the color, whether it should be floated or displayed like a table, should it be underlined—you know—are the children table rows, or are they columns, that sort of thing? So that can influence how something is rendered. So these things get fed into the rendering engine and that will spit out—you know—some rendering of it. It’s a bitmap. So that’s how we get one visual state, but how can we go between visual states? So we can change either the style properties or the DOM tree. So on the left side there are some ways to mutate the DOM tree: there’s—you know—structural operations, like I added or removed a child or I changed my ordinal position among my siblings, and that could change which CSS rules apply. And on the right side there’s lots of different ways that properties for a specific element could be changed. So style properties can come from an inline style, which is sort of a legacy way to say this node has exactly these properties attached to it. They could also come from style rules, which are declarative ways to say, “Match all these elements in the page and apply these properties to them.” So you could have a CSS rule like .active and that will match anything with the active class and maybe—you know—make the element look more active. And—you know—if a rule starts matching or stops matching, that could change what properties get applied to the element. And then the last thing is animations. The browser is able to animate specific style properties, like the opacity of an element. So you can say, take opacity, vary it from zero to one and take one second, and underneath the hood what the browser will do is figure out how many frames to draw, interpolate opacity for each of those, and apply the property to the element before it gets rendered. So all these things could influence—you know—what an element looks like on the page. So if we had to track all these things, it would be really expensive and invasive to record them and figure out which of them actually have any effect. And—you know—the browser is like a million lines of code, so we don’t want to have to reason specifically about what causes changes.
So instead we can take a more output-based approach, and so what Scry does is it has a snapshot of the element as a bitmap, and it instruments notifications from the rendering system about what rectangles are painted on the screen. So we have a snapshot and we have a bunch of rectangle notifications, and the first thing we do is see if they intersect. If we repainted and it doesn’t intersect with the target element, then the element couldn’t have changed. And if it does intersect, then—you know—we could have possibly repainted some part of it in a different way. So we use this as, like, the first pass. If these things do intersect then we’ll go and take an actual bitmap snapshot of the element again and then we’ll do a fuzzy diff of the two images to see if they’re pretty similar. And right now Scry uses one percent mean pixel difference, so what it does is it looks pixel by pixel and it computes how different they are, and if it’s greater than one percent over the whole image then we say, “It’s too different, this is a new visual state.” And this is really designed to get around gotchas like subpixel rendering or font smoothing or video encoding. There are actually lots of different sources of nondeterminism in the rendering pipeline itself, so in our experience one percent is a pretty good level at which to—you know—accept or reject changes. So once that passes, we can capture a snapshot of this new visual state. And inside this visual state we want to capture what it looks like, so a bitmap of the element and its subtree. We also want to capture the subtree—you know—what was in the tree and its attributes and so forth, and for each of those elements we also want to capture what CSS properties were applied to it and also, like, what are the sources of the CSS properties, right? It could come from a rule or it could come from an inline style, and we want to know the source of that so we can link it back to specific places in the source code. And lastly, we record the mutation operations, mainly on the DOM tree, so append and remove child and changing attributes and so forth. And these are used later to figure out—you know—what actually caused the change. So in the UI you saw that—you know—the main thing is you can select snapshots and compare them. So there’s lots of ways to compare trees and we picked probably the simplest thing you could do, which is look at each node on either side and compare them node by node, instead of doing, like, tree differencing or edit distance or anything like that. And for the small snapshots we’re taking, that works fine. You walk over—you know—a hundred nodes, you look at them pairwise, and—you know—you can decide—you know—does it have fewer children? Does it have more children? Have attributes changed? Have the style properties applied changed? Have their sources changed? So you compute a change summary of changes to the DOM structure and the properties for each node, and then you visualize that using green and red diff markers. And this crucially relies on the fact that we can instrument the browser and we have a stable identity for DOM elements across runs. So we know for sure that this element in the heap is still the same element in this later snapshot. So we can use that to—you know—just index into both of these snapshots and then, like, exactly compare them instead of trying to—you know—do some fuzzy matching to figure out which node corresponds to which node. And then the last thing we need to do is figure out what code actually caused the change or changes.
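As a rough illustration of the fuzzy image comparison described a moment ago, a minimal sketch might look like this; it assumes two same-sized RGBA pixel buffers and uses the one-percent mean-difference threshold mentioned in the talk:

    // Compare two same-sized RGBA bitmaps (ImageData-style byte arrays) and treat
    // anything over ~1% mean per-channel difference as a new visual state. The
    // threshold is meant to absorb noise like subpixel rendering and font smoothing.
    function isNewVisualState(before: Uint8ClampedArray, after: Uint8ClampedArray,
                              threshold: number = 0.01): boolean {
      let total = 0;
      for (let i = 0; i < before.length; i++) {
        total += Math.abs(before[i] - after[i]) / 255;  // normalized per-channel difference
      }
      const meanDifference = total / before.length;
      return meanDifference > threshold;                // too different => capture a new snapshot
    }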
So you see back here we have a list of these mutation operations, and what we need to do is figure out, for some specific change that we’ve computed, what operations are responsible for it. So this is called slicing. And this is a pretty straightforward slicing approach. First you instrument all these operations and you build a dependency graph. So in the example we had this subtree that was added and removed, so you would have individual events for we added this element, we added its child, we set the attribute. Setting the attribute on some child that was added since the last snapshot is gonna depend on the node existing. So there’s a pretty straightforward DOM-tree-shaped dependency graph here, and based on this change summary, we look at a bunch of candidates for what sort of operation could have caused this change. You know, if we see that the class was changed to this string, we look for that string being set in an operation, and from there we work backwards in the dependency graph to find what other operations were required for this thing to happen. And then in the UI that’s presented as just a subset of the operations we captured, and each of those has a call stack, so we can see what code context called this. So, it sounds nice, but there were a lot of fun, interesting challenges along the way which I’ll briefly cover here. So the first thing is that rendering is sort of weird on the web in that you can have some target element, like that search box, and if we ask the browser to just render the search box, it’s not going to render the black background, because that’s from the root node, the body. So—you know—according to the browser—you know—that’s what this node looks like. So if someone else goes and changes the background color, the user’s gonna think that the search box has changed, but—you know—if you just look at the subtree, it hasn’t. So you have to be careful about how you’re defining, like, what the element is and—you know—what the visual aspects are. The second thing is software versus hardware rendering. We depend on these paint notifications and those only happen if you’re rendering in software. You know, if you’re rendering on the GPU and compositing and all this stuff, it’s happening somewhere out there on the GPU and you’re never gonna figure out what exactly happened unless—you know—you can reconstruct that from your instructions to it. So in some cases we have to turn off hardware acceleration for animations and so forth, so that we can get accurate paint notifications and then capture the intermediate states of the animation. And some animations you can’t actually accelerate on the GPU, like changing font size or something, so in that case we’re fine, but in other ones, like transforming something—you know—you really have to do this. The next thing is, this relies on stable DOM element identities, both in the backend, where we know that something on the heap here is also the same element at some later time, but also in the application. So in that Tetris game things were pretty stable from snapshot to snapshot—you know—the red piece is still the red piece, it has the same elements, but in more complex applications a lot of times DOM nodes are just used as these fungible building blocks that you need to—you know—construct the UI. So React is a popular library these days, and you build up a virtual DOM using only JavaScript, and under the hood React has this, like, pool of DOM elements it just, like, plugs in here and there to create the proper visual effects.
So this really confuses Scry because it’s expecting these nodes to have some sort of identity in the user’s space. And another thing is that a lot of times people use jQuery or other libraries to do all the UI work for them, and this can make call stacks pretty useless or hard to interpret because there’s, like, five layers of dollar signs and gobbledygook. So it’d be great if you could filter out some of that, but that’s sort of an orthogonal problem. The last thing is that sometimes a lot of style properties change. They may be changed intentionally but they still have no effect on rendering. So for example, sometimes people try to move text to be left or right or centered and the property that they’re using doesn’t actually apply to the layout context of the element they’re applying it to. So it’s really a no-op, and it changes based on classes, but really it has no effect on what you see on the screen. So it would be straightforward to—you know—go one by one and prune the styles that don’t really have any effect and just not show them in the diff, because—you know—the user may look at that and think it’s significant, but really nothing changed because of it. So the last challenge here is that to use Scry in a replay context, where you’re at some later state and you want to go back in time to see the history of an element, you need to be able to identify that element on successive replays of the execution. So we can’t just rely on the heap address, we need to do something a little more intelligent, like assign deterministic IDs or something like that. So you’ll notice this didn’t use replay at all, because for this we just prototyped the rendering stuff, but an earlier version was integrated with replay so we could use this to go back. But for time I—you know—just sort of stopped working on that part, but we know, like, how to hook it together. So this sort of illustrates—you know—several different ways of moving through a recording once you’ve captured it using Dolos. For future work, there’s lots of different ways you could take these ways of capturing and navigating a recording and apply them to more—you know—specific domains. So the one that I get asked about the most is, “Can I use this for bug reports?” And the answer is, “Yes, you could, once it’s finished and serialization works.” Right now it doesn’t quite work ‘cause—you know—it wasn’t important for the video, but—you know—we can serialize the whole recording to JSON and load it back up and that works fine. But—you know—there’s probably still some determinism bugs where we’re not saving enough state. The next thing is that you can imagine more complicated analyses that actually do some sort of dynamic analysis or on-the-fly slicing or something somewhat heavyweight, right? You don’t want to necessarily do that while the user’s staring at the program executing, but if you’re able to take this recording and then replay it on a beefy computer or in the cloud, then you could—you know—press a button, send a request to the cloud, it goes and gets all this data, and it shows it to you on your own computer. So this would be great because—you know—then we don’t care so much about the overhead of dynamic analysis; as long as it’s—you know—gonna finish sometime, we can just go do this in the cloud, and we can’t really do this in a lot of contexts now. Something from a more testing angle is using these recordings as a way to sort of author test cases.
So if you develop an interactive application, you’ve probably been through the pain of trying to write a test that simulates the user, and—you know—you have to get the coordinates and, like, enter them in and so forth, and it’s kind of a drag. So what if you could just take a recording and then take a subset of that and spin it off as a test? So I haven’t looked at this, but other people I’m working with have been trying to do this, and also minimizing—you know—the stuff in the recording to just the stuff that’s necessary for the test case. Another interesting avenue is using these recordings as performance benchmarks for interactive web pages. So if you go to the Edge website or Chrome’s or whatever, there’s lots of benchmarks and most of them are just JavaScript, and then there’s a few interactive ones, but really it’s like, how fast can we load this page and play around on the Facebook landing page, right? You’re not logging into Facebook and then, like, messing around with it and—you know—sharing photos, and for our users that’s sort of the more important thing, not how fast can I run—you know—my Mandreel benchmarks. So you could use replay as a way to get deeper into those applications, to—you know—exactly replay those interactive behaviors, or you could synthesize a benchmark that simulates it. And lastly, like, there’s just so many ways that you could plug in existing tools to be retroactive, in the sense that you can use them as ways to index into the past execution. So I mentioned profilers as one, but there’s also lots of other opportunities. So, that’s all I got right now. I’m happy to take questions. [applause] >>: So how do the replay capabilities work if you have, like, a web page with multiple iframes of different origins communicating over PostMessage, where they may be on multiple threads or something like that? >> Brian Burg: Multiple threads. So the unit of replay, right now, is pretty much a tab on the web page. So if the pages are gonna communicate anyway, they—you know—they’ll be recorded together. So we record the main frame and all the iframes underneath it, and PostMessage would be recorded as sort of like an event loop piece of work. On replay, for things like workers, we are not actually executing the workers, we just save the messages they send back and we replay that on the main thread. >>: Okay. >> Brian Burg: And, like, right now, since you can’t see workers or what they’re doing in a lot of developer tools—you know—it doesn’t matter so much, but you could also—you know—make a subrecording for those if you wanted to step into it. >>: Can you talk a bit more about the browser support this required, in terms of how much of it was using well-established extension points or how much of it was just changing browser code? >> Brian Burg: It’s all changing browser code. >>: Okay. >> Brian Burg: So there’s been prior work that tries to rewrite JavaScript to capture this stuff, like Mugshot done by James Mickens here, but the problem with that is it doesn’t work well with developer tools. Like, you look at this rewritten code and you try and step through it and you’re like, “What the heck?” So we were just like, “Well, we’re gonna take off all the—you know—niceness and just, like, change the actual browser.” And I used to think it was possible through extension points, but now, like, there’s just so many weird internal sources of nondeterminism in the browser itself that it doesn’t really seem possible anymore.
In particular, the browser queues up asynchronous work that can trigger JavaScript but wasn’t really caused by the user or the program itself. >>: So you mentioned this future work of perhaps taking replays and rewriting them into tests. I know that Selenium has that capability, but it’s kind of like a starting point for your test and then you are supposed to refactor using a page object model, something like that. Would you try to… given that, I guess, you can make this more deterministic, would you think this is an easier way to write tests? >> Brian Burg: So one option is you can just replay the recording itself as a test, which is fairly [indiscernible], or the other is, like, synthesizing a Selenium test from it. And—you know—we haven’t really looked at what’s required there, but—you know—we have all the event data so it’s pretty much one to one. We’d be able to automatically go write the Selenium commands to do that. Whether it’s easier or not, I don’t know. It sort of depends on how much effort’s applied to it. There are also issues like, do we package the resources or do we want them to be live? So a lot of it seems to depend on, like, what the use case is and how well that would work. >>: It would be somewhat difficult to… as the code evolves, which is the purpose of having a test, right, it would be difficult to keep the nondeterministic playback in sync, right? ‘Cause if the code makes a different number of calls to a nondeterministic API during a loop cycle, right? You won’t have memoized values for some of them. >> Brian Burg: Right, right. So for testing, where you want different executions to happen, you do need to be able to have the replay go, like, off the track, so to speak. And right now, like, we haven’t really investigated what you need to do there, but some people at Mozilla have been working on this tool called rr, which is essentially replay of POSIX calls, so you can replay the entire browser together, and they have this ability to make a diversion session, which is essentially you go and make a different ending and then you can go back. So we’ve been looking at ways to support that. But the bigger point is that, like, yeah, you can make different network requests or different code could run in response to a click, so you need to be very explicit about what you want to be the same every time. Is it the user? Is it the network? Is it timers? That just seems like a—you know—case-by-case decision about what you want. >> Mark Marron: Sounds good. Any other questions or… yeah, I guess we’re good. Thanks again. [applause]