>> Tom Zimmerman: So I am very excited to introduce Christopher Parnin from the Georgia Institute of Technology. Chris does research on software engineering in which he focuses on different perspectives, like an empirical perspective, an HCI perspective and a cognitive neuroscience perspective. He has worked at Microsoft before as an intern. A couple of years ago [indiscernible]. He has also done research with ABB Research, so he is very well grounded in [indiscernible], has lots of Windows experience, lots of development experience. His research has been recognized by many awards. I think, I hope I got everyone. You have got an ACM SIGSOFT Distinguished Paper Award, a CHI Honorable Mention, an ICSE Best Paper Award and also my favorite one, the IBM HVC, Haifa Verification Conference, Most Influential Paper Award. And not only did Chris get many awards, his research has also been featured in Game Developer Magazine and Hacker Monthly, and it’s frequently discussed on social networks like Slashdot and Reddit. So Chris, go ahead. >> Christopher Parnin: All right, thank you Tom. Is this okay volume-wise? Okay, so I would like to just get started with a little bit about me. So my grandparents came from Cuba to this country and they didn’t have anything to start with. One of the things that they did was, they had enough money to buy a small house, and they turned one of the rooms into a refrigerator so they could sell eggs to the neighborhood, because there were no grocery stores in that area in Miami. And that’s how they started bootstrapping themselves, and I have kind of taken that work ethic myself and tried to bootstrap myself throughout my career. One of the interesting things that has happened to me lately, though, is having a 17-month-old son come into my life, and that has really changed my perspective on how my work as a developer is affected. Just as an example, last month my son had a fever, I had to pick him up from daycare and stay home with him that week, and then the next week we got snowed in, in Atlanta, and we couldn’t go anywhere for that week. So it took two weeks before I could get back to work again. To me that’s the real type of interruption that I face these days. And just to give you a little bit of background, I am at Georgia Tech, but I have been a developer for over 10 years, and that’s mainly because that’s how I paid for college. I paid my way through college, but that’s actually been a blessing because it has exposed me to the real challenges that developers have and it’s helped ground my research approaches. And during this course of experience I have had the chance to work at several different companies as well as do studies and consult with many different types of companies. So I have had a nice exposure to the different challenges developers have. But, if you want to take a broad perspective on what my research approach is, I like to look at systematic problems that developers experience, looking at whole ecosystems, looking at very broad pain points that developers experience, and hoping that if you can tackle and address them you will subsequently have broad impact. So just to give you some examples, we have looked at things like: if you introduce a new language feature, what is the impact on the community? How does it affect code quality? Do people actually take up that new feature? What are some of the barriers blocking its adoption, and do people actually migrate to that new feature or not?
We look at how developers actually use tools in programming environments, and one of the things that we found was that developers actually like to refactor a lot, but they don’t use tools for it, because there are some basic usability issues with them that we were able to identify, and we showed that if you remove those barriers you could actually have better use of refactoring tools. Another area that we looked at was research in statistical debugging, to see how it would actually help developers in practice. And the thing is, this is a research effort that’s been going on for over 30 years, and they had actually never done any user studies on it. So when we did a user study on this approach, we actually found that for most developers, even if you optimize the algorithm to its optimal levels, it actually wasn’t any faster than developers debugging manually themselves. So they basically were spending all this research effort in a direction that wasn’t going to produce any impact in the end. And this paper has actually helped people refocus how they are building statistical debugging techniques and better apply them to development tools. And I am going to talk a little bit about crowd documentation at the end of the talk, as future work. But, what I am mainly going to focus on in this talk is my work in supporting interrupted programming tasks. The reason why I like to focus on this is because it shows a good span of my empirical work, cognitive work and some of the user studies and tool building that I have done. And so why study interruption of programmers? There are some very simple reasons. One, I found that programmers were often put in situations where they didn’t have any data to support the position they were in when they pushed back against management. So you have people in situations where they’re told that they can’t wear headphones while they are working, or they were asked to be in a time study where they were called every 30 minutes for 30 days to tell the person what they were doing, and they couldn’t explain to management how that would be completely unproductive for the whole team. You know, managers also have to think about how to create policy: how frequently should we have meetings, should we have an open office, is remote working a good idea or not? You know, there are some CEOs that basically position the office as the place where work doesn’t get done. And we know that programmers, when they are interrupted, have poor recovery support, and so they try just avoiding interruptions by working late at night, going to coffee shops or putting themselves in these uncomfortable situations. For example, there are even reports of people taking nootropics because they can’t focus at work anymore. So they are taking drugs to kind of focus better at work. And finally, if you look at the research on programmer cognitive theory, most of this was done in the 80s and was built on the theories of memory that we had available from the 60s. And it was built around a time when programs were very small and weren’t built the same way that we build them today. So there’s a good chance to refresh these theories. So what do I mean by interruption? I actually have a very simple definition: I just mean a break in the continuity of work. So whether that’s a coffee break or someone coming to ask you a question, at some point you have to stop your focus on work. And you can think of this in terms of multitasking.
So there is this kind of split: there’s concurrent multitasking, like if I am driving and programming, and sequential multitasking, where I am cooking and then programming. From my perspective I am not really looking at concurrent multitasking. I am looking at these longer-term switches in work. And we know that office workers in general have to work in very interrupted environments. Most tasks are interrupted, and when they are interrupted they contain twice as many errors and take twice as long to perform. And in software development in particular, many developers face long-term interruptions where they aren’t just asked a question and interrupted from their work, but are often blocked from being able to perform their work and have to shelve that task for several weeks. One example my wife came across, because she is also a software developer: she was working on some mapping software and they wanted to change the representation of the overlays that they had while she was in the middle of doing her task, and they told her, “Well, this is probably going to take two weeks, so just put down the work you were doing and wait.” So these sorts of dependencies in work cause people to have to constantly switch between tasks and shelve their tasks all the time. So in this talk, the central argument I want to make goes as follows: I view interruption as a stressor on memory. We know that if you interrupt someone at a point of high mental load, that incurs the longest recovery time, but I really wanted to look at it from the perspective of the memory cost and the resources that developers need for working on their programming tasks, and not things like anxiety or trying to focus on something for a long period of time. I was looking more at the memory cost of interruption. And from what we know about how developers work, they try recovering from these memory failures with very ad-hoc strategies and very little tool support. They are doing very silly things, basically. And I took the perspective that we can use cognitive neuroscience to model different types of memory needs that programmers have. [Buzzing]. Sorry, my wife was calling. And we can build tools that address those needs. And finally, there are exciting opportunities to use brain imaging to better understand memory support for developers. So this was my research plan in studying interruptions. I basically used a mixed method of different techniques, with the first stage being an exploratory analysis where I used surveys, histories of developers’ work actions and various cognitive measures to see what this landscape looked like. Then I performed an explanatory analysis, where I wanted to be able to really explain why developers were having the problems that they were having and be able to build tools that address them, and finally a series of evaluations that looked at how interventions and different tools supported developers or not. So for the first stage of my research, we wanted to collect data on how often developers were being interrupted, what the typical recovery cost was, and what they were doing to recover. To get this information we used IDE usage data, the things they were doing in their programming environments: every time they made an edit, used a command in the IDE, when they debugged, refactored, navigated, used source control, what files they were viewing, etc.
And we did this for 86 programmers over several months of programming history, and we additionally had over 400 programmers whom we asked various questions about how they deal with interruptions. So for the first analysis that we did with the usage data, we wanted to create sessions of work history, so we looked for continuous groups of events and we defined an interruption as a break in these events. And to look at cost, we looked at how long it took them to edit again. In the psychology literature they use this idea of resumption lag to see how long people take to make an action when they first come back to a task. For us, since we knew they were programming, we wanted to look at something a little more concrete, so we looked at edits specifically. And then finally we wanted to know what they were doing before that edit. What was it they were trying to recover? So to help show what this data looks like, I have this session visualization. One thing about this visualization is I have this metric of focus, and basically it just shows how continuous the events are or not. So as things are more continuous it goes up toward 100; as there are breaks in work it goes down and decays, right. So you can see here the various sessions that occurred. There are three sessions here. And then we can start doing things like overlaying edits on these sessions. So this is one of the typical things that we would see: developers typically had this warm-up session in the beginning of the day where they were kind of looking around their code but never took any actions. And then they resumed work several hours later. This was actually seen in other types of interruption studies, where people would kind of try starting work and then just let that background loading process happen again. But, eventually we see this developer getting into focus. They start making edits in their code and they have a pretty good session going. But, at some point they experience an interruption. They last left off at method A, and when they came back we can see that they visited 7 different files, viewed source code history, and it took in total 20 minutes before they decided to edit again in that method. Yeah? >>: What is an edit? >> Christopher Parnin: This is a change to a line of code. >>: I got it. >> Christopher Parnin: Yeah, yeah. >>: Why doesn’t focus drop sharply? Why does it drop slowly? >> Christopher Parnin: It’s just a decay factor. You are not seeing the individual points, but as the spread gets larger the decay goes down. And so then what we did when we looked at these measures was we looked at what they were doing before making that edit again: whether they were looking at source control, whether they were looking at compile errors, whether they ran the program again, whether they were navigating around, etc. So you can get an idea of what a great day might look like. This is someone coding for 5 hours continuously with lots of edits, a lot more work accomplished in this day than on many other days. But, the sad thing is most developers only get a day that looks like this about once a month. What was more typical was something like this, where it’s just a lot of stop and go: someone is working a little bit, having to take a long break and then coming back again. And this induced a lot of overhead where they had to keep rebuilding context, make a few changes and then get back to work again.
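To make the session mechanics concrete, here is a minimal sketch, not the study's actual code, of how a stream of IDE events might be grouped into sessions, scored with a decaying focus metric and measured for edit lag. The event shape, the 5-minute gap threshold, the per-event bump and the decay half-life are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float  # seconds since start of day
    kind: str    # "edit", "navigate", "debug", "scm", ...

GAP = 5 * 60  # assumed: gaps longer than this count as interruptions

def sessions(events):
    """Group time-ordered events into sessions, split at long gaps."""
    out, current = [], []
    for e in events:
        if current and e.time - current[-1].time > GAP:
            out.append(current)
            current = []
        current.append(e)
    if current:
        out.append(current)
    return out

def edit_lag(session):
    """Resumption cost: time from session start to the first edit."""
    for e in session:
        if e.kind == "edit":
            return e.time - session[0].time
    return None  # a session with no edits at all

def focus(events, now, half_life=120.0):
    """Each event bumps the score toward 100; gaps between events
    let it decay exponentially (half_life is a made-up constant)."""
    score, last = 0.0, None
    for e in events:
        if last is not None:
            score *= 0.5 ** ((e.time - last) / half_life)
        score = min(100.0, score + 10.0)
        last = e.time
    if last is not None:
        score *= 0.5 ** ((now - last) / half_life)
    return score
```

With something along these lines, the recovery numbers that follow fall out of running edit_lag over every session in the history.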
So we did this for over 10,000 programming sessions over several months of history for programmers. And we found some interesting things. We found that it took 10 to 15 minutes for programmers to start editing again when they got back to work. And this was consistent with self-reported survey data on how long they said it would take them to recover. And we found some interesting cases where, if you looked at someone editing a method and then coming back to resume editing that same method, it almost never took just a minute, right. In most cases it took them 10 to 15 minutes to get back into working on that same code. And one of the things that we would see is they would often have to revisit other files before coming back to that same place. So they had to keep rebuilding context again in those situations. And those programmers were, again, working in this very non-continuous working state. They were lucky to get a 2 hour block of continuous work in a day. And there was a typical 20 to 30 percent overhead cost if you looked at the resumption lags, or edit lags, relative to the total of all the session lengths. Yeah? >>: So I know you are stating these in a very factual way, which is good. >> Christopher Parnin: Yeah. >>: But, in a way it sounds almost like a greedy thing for a programmer to want to spend 5 hours programming without interruptions, because if they work on a team you might be blocking 5 other people who actually need to talk to you and get something done. So, overall productivity might go down if you are uninterrupted. >> Christopher Parnin: Yeah, so there are these global optimizations that you can make across the team. We were just looking at individual developers. Later on I did some field studies looking at interruption of developers where I could see some of those dynamics in play, but I think it’s not too much to ask for a programmer to at least have one day to do their work, not necessarily 5 days. Yeah? >>: So developers do a lot of things other than writing code. When you were asking them about interruptions or gathering your data, were you looking at interruptions when they were coding, or also interruptions when they were, say, in a meeting, handling an urgent e-mail or viewing specs, doing builds or other types of activities other than just writing code? >> Christopher Parnin: Yeah, we were primarily looking at the IDE, but there is other data that I have now that also includes data from outside the IDE. >>: Okay. >> Christopher Parnin: So a little bit about analyzing the survey results. We had over 4,000 text entries to read through and code. We had about 17 questions that we asked developers on various things, like their note-taking habits, some of their preferences on how they recovered, etc. And in summary, we were able to come up with a general sense of what strategies they used. And they come in two flavors: the suspension strategies and the resumption strategies. The suspension strategies basically involve what they did before the interruption. We know that they like to do things like pruning context, so closing out tabs that they don’t need, to minimize the distractions for when they resume, or just highlighting a block of text, rehearsing in their mind and using notes. One funny thing is they would insert an intentional compile error so that they would have to see what they were working on and be able to actually remember it.
But, the interesting thing about this is there is other research that shows that developers often take as much as 7 minutes to reach a good stopping point, and so if they only had a minute to stop, that is often not enough time to find a good stopping point. And later on, in some of the studies, we showed that developers are actually bad at serializing their state and make a lot of mistakes at this point. But, the most common thing people did was just try rebuilding their context, even if it did take a while. So they would navigate around trying to read through all the code that they had visited again. They would look at their notes and the task descriptions that they had, run the code, and as a last resort they viewed source code differences. They saw this as incredibly cumbersome, a very terrible interface for getting a view of what they changed, but something they had to do when they completely lost all context. Yeah? >>: Were you able to distinguish strategies between self-interruptions versus outside interruptions? >> Christopher Parnin: No, in this data we couldn’t tell the difference between those. This was just giving us a good sense of what the landscape looked like; it wasn’t a detailed study of one particular thing. But, the interesting thing that we saw was that the self-reported strategy usage was strongly correlated with the observed data. So if they said they used source code differences this much of the time, we actually saw that in the data as well. The ordering of strategies was the same in both data sources. So once we had a sense of what this landscape looked like, we did want to look at a smaller example of this, and this is where cognitive measures came in. We wanted to use this idea of being able to see which parts of programming tasks involve more mental load or not. So, one of the things that we looked at was sub-vocalization. If you look back at when we first learned to read, we actually didn’t know how to read silently. It was only later that we learned to suppress this desire to read aloud. The thing is, we don’t completely suppress it. We still send muscle signals when we are processing verbal information. So basically what’s happening is the premotor cortex still primes those muscles, and you can observe the muscle signals that are being sent during these activities. So you can use EMG to observe when the larynx, lips or vocal cords are engaged during certain types of tasks. And we did this while developers were programming. So essentially what we were able to do is find points of very high mental load, as indicated by sub-vocalization, and the types of activities they were doing at those points. And we know from other research that if you have high mental load, that’s a very bad time to interrupt someone. So the moments of highest mental load were during an edit, especially with concurrent edits at multiple locations, when people were doing extensive navigation and searching activities, and when they encountered a particularly difficult block of code. So we can distinguish between someone scanning code and someone really trying to dig into complex logic in code. This gave us a good sense, at a smaller scale, of how the data we were observing at the larger scale could be interpreted. But, there are kind of limitations to this approach, right.
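As a rough sketch of how such a measure might be operationalized: rectify the raw EMG trace, smooth it into an envelope, and flag intervals where it rises well above baseline. The rectify-smooth-threshold shape is standard EMG practice, but the sampling rate, window and z-score threshold here are placeholder assumptions, not the study's parameters.

```python
import numpy as np

def high_load_intervals(emg, fs=1000, window_s=0.5, z_thresh=2.0):
    """Return (start_s, end_s) spans where the smoothed EMG envelope
    rises z_thresh standard deviations above its own mean."""
    rectified = np.abs(emg - emg.mean())              # remove DC offset, rectify
    win = max(1, int(window_s * fs))
    envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
    z = (envelope - envelope.mean()) / envelope.std()
    hot = z > z_thresh
    spans, start = [], None                           # merge consecutive hot samples
    for i, flag in enumerate(hot):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start / fs, i / fs))
            start = None
    if start is not None:
        spans.append((start / fs, len(hot) / fs))
    return spans
```

Intervals flagged this way could then be lined up against the IDE event log to see which activities, edits, navigation, difficult code, coincide with high load.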
So we know that programmers are constantly interrupted, and we have a good sense that they have these strategies for recovering from interruption, but they don’t have much tool support. The thing is, interruption is typically studied on the scale of seconds and minutes in most of the literature. We don’t have a good sense of how to deal with a long-term interruption. That situation I described of being blocked from work for 2 weeks, how would you support someone like that? What is the effect of interruption in the long term and with these large contexts? And finally, if developers have to work in these very small blocks of work units or time segments, how can we better externalize their memory resources? I mean, can we help them program without having to get into the zone all the time? So this motivated me to start looking at some of the neuroscience literature to be able to explain several different facets of this problem. We wanted to be able to explain why developers had certain information-seeking behaviors, what different memory failures they were experiencing, and what the memory needs might be for a particular programming task. So this led to the effort of building a conceptual framework for interruption of programmers, and this basically involved looking at hundreds of neuroscience papers, taking several classes, getting a minor at Tech in neuroscience and just participating in some lab work to get familiar with some of the things that they do in this area. And so this was a 2 year journey to be able to organize all of the crazy things that I was reading about and be able to translate it back to, say, a practicing software engineering researcher and be able to say, “Hey, look, I found all this cool research about these particular effects, but how can I find a simpler way to explain it?” My goal here isn’t to explain this mapping, but just to illustrate that there was this mapping effort. What I will dive deeper into right now is how to apply this framework. And I want to do it in a couple of different ways. I want to demonstrate how these different memory types come into play in everyday programming tasks, describe the current memory support that programmers have in the context of this framework, reason about short-term and long-term effects of interruption, and reason about potential improvements we can make through programming tools. And to help do this I am going to explain how making a program is like making a pie. The pie is the everyday task, and the programming task is my main focus here. So the first thing that you do when you want to make a pie is you have to go to the grocery store. You have to buy your ingredients for your pie. So one of the things that I do is I keep in my mind that when I go to the vegetable aisle I have to get apples. So I have this kind of visual, mental cue in mind. The thing is, as I enter the store my wife calls and tells me, “Hey, you really need to pick up milk for the baby.” Okay, so I had better not forget that. So I dash over in the other direction and get the milk. And as I am walking out the door, I never went down that path and noticed my cue. So I failed to trigger the prospective memory that I formed, but even worse, I failed to track the fact that I had this goal. So I went home without buying apples. And this is the sort of thing that developers experience all the time.
So if you look at what they try doing to maintain prospective memory, they basically have “to do” messages and compile errors. Compile errors, as I mentioned before, are very useful for demanding attention, so that developers won’t miss the fact that they have a pending goal, but they are terrible in that they break the ability to switch to other tasks and to get feedback by running the program. It’s a very heavy-handed, ad-hoc solution. With “to do” messages there is no impetus for me to actually see that reminder message. I may just walk down another aisle and never see that I left the reminder. So what we find is that prospective memory is easily disrupted by goal and environment changes for developers. So let me just give you a very simple example of what you can do to fix this. Here is just a little video, it takes a while to load up, where, if you see a developer writing a “to do” message, you have the ability to either automatically or on demand attach a reminder message to the viewport, so that if I switch files or scroll around it’s always there. This is a very simple tweak that shows how you can make something more recognizable. It’s the equivalent of leaving a pill box on the kitchen counter where you are sure to see it, versus keeping it behind the cabinet door that you always have to remember to open, right. All right, attentive memory. So I finally remembered I had to go back to the store and pick up apples. As I am going to the apple aisle I see this kid starting to lick the apples, and I am thinking that’s really gross and I don’t want to pick those apples. So I start tracking which ones he licks, but if he licks too many I have exceeded my ability to track the apples. If I go down a different aisle I may forget which ones he licked. So it’s very easy to lose track of this sort of information. Developers have to deal with this sort of stuff all the time. The main support they have for tracking where they have viewed is the tabs in their programming environment and the tree view that shows all the locations they could be at. The problem with the tree view is it can be very long. This is me scrolling as fast as possible, and it still takes 13 seconds to go through all the locations I could be at in this project. And there are very simple things that we can do. As I am doing certain tasks that involve visiting many locations, you could have better aids that track which places I have made changes at and which places I have visited. The thing about attentive memory is we know that it expires very quickly, within minutes and hours. So we have this ability to demand focus on these sorts of items, but if you have to switch your attention to another task it’s very likely you could lose track of this. Yes? >>: Could any IDEs use these tricks, not tricks, but two very simple changes to actually improve [inaudible]? Are these changes implemented in any of the IDEs today? >> Christopher Parnin: I think Smalltalk had a similar view, but I don’t think there is one with tracking support like that, but it’s one of the tools that I’ve built. >>: Very nice. >> Christopher Parnin: Okay, so episodic memory. So I finally have all the ingredients for making a pie. I found this very nice recipe online and I have it on my iPad. I have looked at the recipe once so I kind of know how it works, but as I get started I run out of power on my iPad and I can’t look at the recipe anymore.
So I try thinking of where I got that recipe from, because it’s been about a week since I looked it up. I don’t remember how I found it. So I experience a source failure. I forgot the context of where I got this information from. So I just go ahead and say, “Well, I know how to do this, I have done it before, let me recall the steps.” And as I am doing this pie task I forget to knead the dough, and this results in a terrible crust. This is something that developers experience all the time. The thing they have to rely on is the source control view. And source control is very terrible at supporting the ability to retrieve information about the sequence of events and the context of those events. For example, say I did a task two weeks ago and I want to review it. If I go into the source control history I have to look at all the commits that I made on that task, as well as all my teammates’ commits, pick one, then view the file that I changed and view the changes for that file, and so on. So I have to manually piece together all this information from this very uncoordinated set of information. Basically, source control isn’t meant to give you a good sense of what you did recently. And what we know from a cognitive point of view is that episodic memory has to be held in a temporary state. The frontal lobe holds information about the ordering and context of events, and it isn’t until you sleep, or have the chance to consolidate them, that they are more strongly bound. So it’s very likely that if you are interrupted in this stage you are going to forget these sorts of things. All right, associative memory. Associative memory is very interesting because it’s basically how we remember everything in the first place. If you don’t have associative memory, if you don’t have a hippocampus, you don’t remember anything; you can’t retain new memories. The neat thing about the way associative memory works is that there is a whole pipeline that the information you experience runs through, and you automatically remember it if it meets certain thresholds: if it’s novel, if it’s frequent enough. And there is kind of a weak hashing phase and a strong hashing phase. So you have this experience where something is familiar, that face is familiar, but you can’t recall the name. You went through the familiarity stage, you were able to recognize the face as familiar, but you never formed the stronger association of the face with a name. The other interesting thing about associative memory is we have a good sense of how long it lasts. When information flows through this pipeline initially, unless there are other external effects, the effect is only hours long. The mechanism we use to retain this information requires extra boosting for the signal to be retained for longer and longer periods of time. So if we look at associative memory support in programming environments, there are really silly things. For example, developers use the scroll bar position to know where code is. They know that it’s about 70 percent down in this file, so they just scroll there, or the fact that they recently edited something is a cue to look for that, but there isn’t really good associative memory support. You know, a really funny example that you can use here is tabs in a browser. If you look at the tabs on top there are no icons.
This makes it very hard to keep track of lots of locations, in code or, I mean, on the web, but as soon as you add icons it allows you to use associative memory to pull things in and out of your attention with larger sets of information. So it’s one of the simplest things that you can do to a programming environment today. The neat thing about associative memory is we have this kind of false confidence that we are going to remember, and then the next day it’s gone. That’s because it has that initial lifetime of about a couple of hours. One of the best ways to experience this is to reopen your browser tabs the next morning. Most of the time they are meaningless to you; you basically have to close them all down because you have lost all the associations you had with the task and the locations you were working with. So now that we have a sense of the way we can reason about programming environments, you can do a very similar thing real quickly with programming tasks. So with attentive memory, we know that developers often have to make systematic changes. For example, if they have to convert a 32-bit system to 64-bit, there may be many locations in the code that they have to track that are related to making that change, but they often run into different problems. For example, if I am editing a piece of code that is in a thread and the threading library doesn’t have a 64-bit equivalent, that’s also a trouble spot that I have to track. So as I am making these changes I notice more and more trouble spots, and I have to track these sorts of changes. And this sort of thing happens all the time when we refactor. One of the main things we found with refactoring tools was that, because they use this very dialogue-heavy interface, in fact only 5 percent of the time do people change the default options in refactoring dialogues, people are forced to make decisions in the dialogue about where they want to make changes in the code, like “I want to move all these files into a new class”, without being able to assess the potential impact. So developers instead like this concept of touch points. That’s why we built the tool that I showed before. But, what I mainly want to focus on is episodic tool support. We know that developers have to review code changes all the time. One of the classic things developers do is, when they have to write a commit message, they have to recall what changes they made recently, and we actually have evidence that they often forget common things that they have done. They are very likely to forget and omit details of their programming task. And this also happens when they have to explain changes to a teammate. So if developers need to vet a change, like, “Hey, I am going to be making these changes to the system; let me walk you through them and make sure they are okay,” they have to go back through and reason about all the places that they have changed. And this happens in other contexts, for example if I have to do a code review. It may be a month later when I am asked to actually walk through that code review. So if I have to revisit code changes on something that I did a month ago, that’s going to require an immense reconstructive effort. Developers also often need to hand off tasks, or they want to create a blog post that describes how they did a task.
And one of the things we found is that developers spend a lot of time, especially when creating blog posts, reasoning about the steps they took in performing the task, finding the code snippets that were relevant to it and formatting and publishing it. And this is often a 4 hour effort for even a simple coding task. And we had this one interesting example of an older programmer who had memory problems and who kept a running Word doc of all the places they would visit in code and all the actions they were taking. So they basically manually maintained an episodic log, where they would take a screenshot of what they were doing and paste it in the Word doc, make a change in the code and then paste that into the Word doc. They were doing this manually because they couldn’t remember it otherwise. Yeah? >>: So if you are doing it like that, that detailed, that kind of way, couldn’t we just look at a diff of what you had changed since your last commit rather than keeping a record of what you changed in the doc? >> Christopher Parnin: Right, so yeah, looking at diffs, and this is one of the things that we explore later on, is very cumbersome. As the diffs get larger there is no ordering information, and only being able to view one thing at a time, and trying to keep that in mind as I have to view another file, is a very cumbersome sort of interface. And so developers actually can be a little clumsy in how they use diffs. I mean, diffs are very useful for getting very detailed information, but they are very slow and cumbersome. So this marks the point where, now that I had this inspiration for how we could better model how programmers are doing their work, I asked how we could start applying it to tools. Basically, we wanted to be able to reason about the different memory needs that developers might have, explain some of the failures that we saw, and conceptualize some of the programming tasks that developers are doing. Those were the outcomes of this modeling effort. So the next goal was to build different tools and see how they helped developers deal with these particular tasks. What we did was build a tool for each memory area, and we designed each based on the information needs for doing a particular task. For example, if I am doing a systematic change: being able to form a query across code, track locations related to that query and track my progress in making changes. So we had this concept of touch points and being able to support that type of programming task. But the one I want to focus on for the rest of this talk is episodic review. We really saw that source control was this cumbersome process, and since we had this infrastructure for collecting events in the IDE, we could not only look at source control but could start looking at other rich environmental events, capture and replay them, and then show them to developers. So what we basically had the ability to do was, every time a developer made changes, commit that change locally and redisplay an ordered change log of what they had worked on, and then even allow them to annotate and publish these sorts of changes. And we also started annotating this log with other events, for example if they visited Stack Overflow posts or if they copied code from a blog post.
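A minimal sketch of what such an episodic log might look like, with a hypothetical API rather than the actual tool's: every edit is recorded with a timestamp and a snapshot, web visits are interleaved, and replay yields the ordered narrative.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Episode:
    when: float
    kind: str           # "edit", "web", "exception", ...
    summary: str
    snapshot: str = ""  # code snapshot, kept for edits

@dataclass
class EpisodicLog:
    episodes: list = field(default_factory=list)

    def record_edit(self, path, new_text, note=""):
        self.episodes.append(
            Episode(time.time(), "edit", f"edited {path} {note}", new_text))

    def record_web_visit(self, url):
        self.episodes.append(Episode(time.time(), "web", f"visited {url}"))

    def replay(self):
        """Ordered change log of the task, oldest event first."""
        for e in sorted(self.episodes, key=lambda ep: ep.when):
            stamp = time.strftime("%H:%M", time.localtime(e.when))
            yield f"{stamp}  {e.summary}"

# Hypothetical usage: a Stack Overflow visit followed by the edit it inspired.
log = EpisodicLog()
log.record_web_visit("https://stackoverflow.com/questions/...")
log.record_edit("Overlay.cs", "<snapshot>", "(applied approach from SO)")
print("\n".join(log.replay()))
```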
So this allowed people to recover from source failures, like when I forgot where I got the recipe from. And just a little bit about this infrastructure: because we had all these different events, like people searching for code, people navigating code, people making changes, exceptions occurring, we basically had the ability to take snapshots of the code and associate them with those events. And we had the ability to reason over lots of changes in code. So basically this idea of history slicing, where you can look back and reason about whether a change ever survived into future versions or not. That allows you to do things like “generate all the changes I made, but throw away all the churn I did”, or, if I actually want that detail, include it. So to evaluate this idea of being able to look at this episodic review of your coding history, we did a couple of experiments. The first one was an interruption study. This was actually done here at Microsoft, where we recruited 14 developers from the participant pool, and our main goal was to look at how developers recovered using these tools versus using notes. We wanted to compare whether developers had better performance in task completion, whether they made fewer errors or not, and to examine their behavior. Why does note taking not work sometimes? What would be their tool behavior and preference? The tasks that we had developers do were three tasks making changes to different games. These were tasks that were long enough to impose some cognitive load, but not something that was going to take 3 hours. So these were things that would take about 30 minutes each to do. And what we did was provide them with two different tools to recover from interruption. The first one was more of an attentive tool support, where it showed them where they had visited recently by bolding, in the tree view, the methods and files they had visited. The second was a timeline view of the places they had clicked on, navigated and edited in code. Our method was basically to allow developers to perform the task, but shortly after making an edit have them move on to the next task. In doing so they had a minute to prepare for the interruption, and they could always take notes in this preparation, but what differed was whether, when they resumed the task, they had the condition of using one of these tools to help them recover. And the findings were that if you had one of the resumption aids available, you were able to complete more tasks, and do so twice as fast, compared with notes. And this largely was happening because fewer errors occurred when you had a tool available. And one of the reasons we found developers were making errors was that their notes were often incomplete. What was happening was they would often just write about their most immediate task and the action they were doing, and not what they were doing before, where they had visited before, or where they wanted to go next. So developers would be in the situation where they would come back, complete the edit they were making, then realize they had to find the place they had visited before to make a cascading change, and be unable to locate where that previous location was.
And so there were several instances where developers resumed their task, completed the edit they were making and just spent the rest of the time trying to look for old code that they had previously encountered, but they couldn’t remember how to find it again. The interesting thing that we observed was that developers used these recovery aids in different ways. Developers used the attentive interface just kind of implicitly. They knew, “Well, I have been there, so let me just check it,” but they had no association with what was there. So with the bolding of the file name, they couldn’t recall what they had done at that location, or what was relevant about it, just that they had been there. So it helped them guide their attention, but not recall. The episodic tool support allowed them a more restorative recall. They had a better sense of what they were doing and what their goals were, and they kind of used it to walk back to a sufficient state of mind where they felt comfortable working again. Yeah? >>: [inaudible]. >> Christopher Parnin: Um, I think you can use both of them in the same interface, because they are complementary, so they wouldn’t necessarily interfere with each other, but they have different lifetimes. So attentive memory is not going to help you remember what you were doing a week later. There are a lot of problems with attention being very dynamic. Workers will focus on different methods at different times, so a lot of that data gets very stale. So it worked in this small case, but when you try applying it in real programming tasks, for example in Eclipse where a tool like this is available, most developers actually don’t use that feature. And I believe it’s mainly because it isn’t really addressing the longer-term memory needs. Whereas episodic memory allows you to reconstruct your goal state and ask specific questions about what you were doing, which isn’t available just from tracking where you visited. But they all come from the same work history, so I think there are multiple ways to combine them. Actually, I think the way that I would combine them is to use the idea of touch points with prospective memory, so being able to touch many locations that you want to deal with at the line level, not necessarily at the file level. So now that we had the ability to observe how these tools would impact people in a controlled situation, we wanted to see how they would work on real developer tasks. What we did was a field study with developers spread across three different teams. We had them adopt this tool, and the way we watched them use it was with a tag-along method: we would naturally accompany interruptions they were experiencing and ask them to impose an intervention where they used the tool to help them recover from that interruption. For example, say they had to go on a lunch break, and after that they attended a meeting, and finally they came back to their desk and started working on the code they had been dealing with in the morning.
We would just follow them to their desk and ask them, “Hey, do you want to use this tool to collect some information?”, or if someone came along and asked them a question, we would just tag along afterwards and say, “Hey, can we use this tool to help you look at what you were looking at before that interruption?” And the data collection method was narrative inquiry; basically it’s a way of collecting stories about the task they are doing and coding information about the events that they recalled. And the main goal was to compare free recall, plus the use of source control if they wanted it, versus episodic review. And at a high level, what we found right away was that when they had the episodic review tool they were able to recall twice as many events, and it wasn’t just the number of events, but the events were recalled in much richer detail. What was basically happening was that with free recall they were regurgitating, at a high level, what they had to do, the task descriptions they were working from, and not the type of detail you would need if you were vetting a change with someone or recalling the type of changes you made for a commit. We did observe a longer effort to recall in some cases with the episodic review tool. This was mainly due to the implementation: if the developer had to refactor, say they renamed a class and that impacted 30 different files, the tool wasn’t smart enough to consolidate that, so they had to spend more time looking over those changes. But, otherwise most things could still be done in under 5 minutes with both tools. And with future effort I think you can make the tool a little bit smarter in how it consolidates some of those systematic changes. And so we have more ongoing experiments. We are looking at the impact of the smart reminders on people’s ability to follow through on TODO notes, and we are looking at episodic review’s impact on commit messages and code reviews. But, meanwhile, after all this time of collecting this data on interruptions, reasoning about the different memory support developers need and building some of these tools, I wrote a blog post about it. And this actually ended up really taking off, far more than I expected. It has been featured in multiple magazines and numerous news articles. My blog post alone has over 300,000 visitors. When you think of all the developers in the US, there are only about 3 million developers in the US. So this is actually quite a large impact. And the stories I get back are very interesting. I get stories like developers saying, “I wasn’t allowed to wear headphones and now the CTO bought us all headphones,” or that a company is now instituting offline hours where, say on Thursdays, they have no meetings or IMs. And that has really helped their productivity. I even got this story of someone instituting a post-agile methodology whose founding principle is fewer interruptions. And they used my work as an inspiration. You know, I have built different tools before and never really had that many people use them, but without making that effort I am getting thousands of downloads for these tools, and that makes me feel that I am striking the right chord here with something that developers need.
For the field study that I did, developers are now adopting a tool to help them do code reviews, because they were experiencing problems where it was taking developers a lot of time to remember what they were doing a month ago, and now that they have this review log it is very easy for them to get back up to speed on their task. And researchers are starting to use this framework for reasoning about experiment design and different code editors. So there is this whole space of different types of code editors, Code Bubbles, Code Canvas, and there is some new work at the University of Memphis on this kind of flat tile paradigm. And they are using this framework to help them predict what sort of performance you will get over different time frames. And a recent development is that there is a startup commercializing some of the approaches from my research. They are really taking the kind of personal analytics, quantified-self dimension to a new level. They can basically tell you how often you get interrupted in your programming environment, calculate your focus level at work, show how it compares to other types of workers, and really just give you a good sense of the type of activities you are doing in the programming environment. So, one of the goals that we have here is to validate their measure of focus. They built it based on the priors that I had from the data I collected on the time to recover from interruption, but it would be nice to verify this. Yeah? >>: [inaudible]. >> Christopher Parnin: It depends on when they did it. Um, it’s a feedback loop, right. So you could look at it every minute and it would be terrible, or you could look at it at the end of the day. The other interesting thing is that they have data from lots more programmers than me. They had data from thousands of programmers and were able to replicate, with their data, a lot of the results I saw with over 100 developers. And they are finding that developers are in focus less than 2 hours a day; that’s their finding. And so I think this is actually one of the points that Andy also brought up, this idea of interacting with other developers. I think there was a really interesting opportunity where developers were telling me stories about how they weren’t sure whether they should interrupt the senior developer because it’s going to ruin their productivity, but they didn’t know whether them struggling for 2 hours was helping the team or not. So one of the things I would really like to be able to do is assist automated transfer of this sort of knowledge from an experienced developer: have them perform a task, and when someone is going to be doing a similar task, say, “Hey, I have already done a task very similar to this. Let me show you how I did it and you can see the steps I took to do it.” The neat thing is, because you have the web search history, this actually works perfectly for explaining coding behavior, because you search for things like how to make a web service request, you go to Stack Overflow, you implement that little bit of logic and then you move on to your next task. And for a lot of these there are these kinds of building blocks. So you see a breakdown of the information-seeking request you had and the code for it, then the next information-seeking request and the code for it, and so on.
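A sketch of that building-blocks idea, under the assumption that the combined history is just a time-ordered list of web/search events and edit events: pair each information-seeking event with the code changes that follow it until the next query.

```python
def building_blocks(events):
    """events: time-ordered (kind, payload) tuples, kind in
    {"web", "search", "edit"}. Returns [(query, [edits...]), ...]."""
    blocks, current = [], None
    for kind, payload in events:
        if kind in ("web", "search"):
            if current:
                blocks.append(current)
            current = (payload, [])       # start a new block at each query
        elif kind == "edit" and current:
            current[1].append(payload)    # attribute the edit to that query
    if current:
        blocks.append(current)
    return blocks

# Hypothetical history, for illustration only.
history = [
    ("search", "how to make a web service request"),
    ("edit", "Request.cs: added HttpClient call"),
    ("search", "parse json response"),
    ("edit", "Request.cs: added deserialization"),
]
for query, edits in building_blocks(history):
    print(query, "->", edits)
```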
And it’s also a promising technique for detecting problematic code changes, because from this history we are also able to see when people have a lot of churn on particular parts of code. And this may tell us that for these particular API calls there may be some problems with the design, given that someone was spending two hours fiddling with the parameters before getting something working. So if you look back at the past 5 years of how I worked on this problem: I started off with this idea that interruption is a stressor on memory, and we wanted to think about how to support developers in recovering from memory failures, because developers were using ad-hoc strategies for recovering from interruption and didn’t really have much tool support. The thing is, I didn’t really know how to design these tools, and it wasn’t until I started modeling the different memory needs that developers have that I had a crystal clear design path for building different tools for them. And so we showed that we can start building tools that address these needs and actually help developers recall and recover from interruption. And I think there are more exciting opportunities to do a better job at this as we get a better understanding of developers’ cognitive needs. So as we look into the future, what’s the next step here? If you want to look at what my theme would be, I really believe that developers are the future of Microsoft’s platforms, devices and ecosystems. Adoption of a platform depends on its developers: if developers don’t build apps for it, then its adoption greatly suffers. So I really believe that understanding what developers’ pain points are with a particular ecosystem will help the big picture of Microsoft’s efforts. And I am looking at this from two different approaches: one is more of an applied research angle looking at developer ecosystems, and the other is a fundamental research approach on developer cognition. And they have different goals. The first goal is to really try to identify the value, adoption barriers and potential of various Microsoft platforms and services. The other is targeted at improving the general training, education, language and tool design support for developers and the products that are created here for them. So I just want to give one example of the ecosystem research that I have done. One of the things that Microsoft spends a lot of time and effort on is creating documentation for all the services it has. MSDN creates millions of topics on all these platforms. For instance, for Windows Phone it has to describe how you build an application for it, how you get things into the store and just all the general services that are available, usually connecting with other services like cloud services. One of the things that we examined when we looked at the Android ecosystem was that developers were finding that documentation largely deficient. So if you look at a poll of developers’ takes on the different platforms, they actually view Android as a very terrible platform to develop for. And strangely enough, iOS and Windows Phone do much better in terms of the understandability of the documentation. So when we looked at this problem, we found that if you looked at how many examples were available in the documentation, there was actually very low coverage for many of the classes in Android.
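The coverage measure implied here can be sketched simply, assuming the examples have already been extracted and counted per class; nothing below scrapes real documentation or Stack Overflow, and the counts are made up.

```python
def coverage(api_classes, examples_by_source):
    """examples_by_source: {source: {class_name: example_count}}.
    Returns the fraction of classes with at least one example, per source."""
    report = {}
    for source, counts in examples_by_source.items():
        covered = sum(1 for c in api_classes if counts.get(c, 0) > 0)
        report[source] = covered / len(api_classes)
    return report

# Hypothetical counts, for illustration only.
classes = ["Activity", "Intent", "Fragment", "Loader"]
print(coverage(classes, {
    "official_docs":  {"Activity": 3, "Intent": 1},
    "stack_overflow": {"Activity": 40, "Intent": 25, "Fragment": 12},
}))  # e.g. {'official_docs': 0.5, 'stack_overflow': 0.75}
```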
And it turns out, when we looked at the browser history of developers, most developers were just going to Stack Overflow anyway, instead of visiting the official documentation sources, to get their information. And when you examine Stack Overflow itself, it had more examples and more discussion points on the topics in Android than the official documentation itself. And so this was something that was of interest to both Google and Microsoft. So when I wrote about this research I actually visited both, to help them get a better understanding of this ecosystem, because one of the things that was happening, for example in Android, was that they were losing their authority to the crowd. The crowd was suggesting, “Hey, you should do things this way; even though it’s unofficial or private, you can just use reflection to get at it, it’s okay,” and Android didn’t know that people were doing this, so when they made that obsolete in the next version they just broke their ecosystem. They didn’t have a way to tell developers “don’t do this” or to see that they were doing it. So that’s one of the things they were interested in: getting a sense of how they could assess the pain points of the community and its activities. And looking at some future questions in this area, John Carmack recently said that Stack Overflow has created billions of dollars in savings for programmers, but we have no sense of how to measure this, right. We don’t have a good sense of what all the effort involved is. We also don’t know what it looks like to maintain this. So one of the things we found is that if you waited for the crowd to cover all the topics in Android, it would take at least 2 years before you could get 80 percent of the topics covered. And that’s not something that is going to work out if you are just letting the crowd take care of it, even though that’s what a lot of companies are doing now. A lot of companies just say on their support page, “questions are tagged with our API, go there,” and they are not going to provide any documentation. So I mean there is a real weakness in these crowd efforts. The other thing that you find is that when developers achieve a badge on Stack Overflow, they stop doing that action as soon as they achieve it, and that 60 percent of the answers are provided by 5 percent of the developers. So it turns out that Microsoft and Google are paying the top 100 people to do this, so that’s part of the incentive. There is other gaming going on, for example multiple people using the same account so that they can sell their consulting services and the books that they have on these services. And so, as I mentioned, all this is something that has been of interest, especially in developer relations and MSDN, on how to have this sort of tool support. So it would be my intention to support those divisions by bringing these sorts of research findings to them and the tools that they need for acting on them. What’s something a little bit bolder, though? So I was inspired to write my view of software engineering in the next 50 years, and Ian Sommerville is the guy that wrote the software engineering textbook, right. He is like the guy that defined software engineering in its early age. And the interesting thing is that he pushed back on the idea that we will ever be able to find a fundamental scientific basis for software engineering. And so, I mean, there is something that bothered me about that.
So for me, I believe that we actually can find a fundamental scientific basis for software engineering if we look at developer cognition. So one of my visions is this idea of the neuroscience of programming: using insights from neuroscience to obtain a systematic understanding of how programmers work. I have a Huffington Post article about this. And we have actually done an experiment to start this line of work: the first fMRI study of programmers. This is work with Janet Siegmund in [indiscernible], and what we basically did was run a controlled experiment where we had 17 programmers come into an fMRI scanner so that we could compare different types of programming tasks in the scanner and reason about them. And we have a grant in progress for continuing this, which may not mean so much here, but it will help pay for it at least. To give you an idea of how this process works: when you do an fMRI study, your main goal is to find the location of brain activity. And the way you do that is by contrasting different types of tasks, in a way kind of like a diff. The goal is to find a task that is very similar to another task so that you can factor out the common things you are not interested in. For example, if I am looking at someone reading code, I don’t really care about eye fixations or very low-level visual processing; I want to filter that out. And so we did this by choosing a syntax error task, which basically involved them looking for syntax errors in code, versus a program comprehension task, where the goal was to examine a particular variable and reason about what its evaluation would be. And the way this paradigm of research usually works is you start off by trying to find where these things are happening in the brain, then you start building a model of what’s happening and relate that to other sorts of tasks. So I think a really interesting example was in Kentucky, where there was this effort, and it’s now a law, to basically say programming is a foreign language. What they did was make it so that if you take a programming class you get a foreign language credit. Naturally, programmers were in an uproar about this. They were like, “This is silly, this is stupid,” and okay, the way they went about it was a little bit silly, but the question it raises is very interesting: how related is programming to language comprehension? The programmers’ take on it, at least for several of them, was that programming is nothing like language; it’s mathematics, right? The interesting thing we found in this study was that for the programming task we examined, most areas of the brain that were activated were largely language processing parts of the brain, and when you compare other studies done on mathematical reasoning, deductive reasoning and those sorts of tasks, those areas of the brain weren’t activated. So developers were actually able to use the language processing regions of their brain to comprehend and extract the information that they needed, and in fact they didn’t have to use the parts of the brain that are more dedicated to abstract representations and manipulation. They were able to get the information they needed in earlier stages of the information processing pipeline without having to really reason about it.
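To illustrate the “diff” intuition, here is a minimal sketch of a voxel-wise subtraction contrast on simulated data; the contrast_map helper is hypothetical, and the real study used standard fMRI analysis tooling, so treat this purely as an illustration of the logic:

    # Minimal sketch of the subtraction idea with simulated data.
    import numpy as np
    from scipy import stats

    def contrast_map(comprehension, syntax, alpha=0.001):
        # Each input has shape (n_trials, n_voxels). A paired t-test per voxel:
        # positive t means "more active during comprehension than syntax search",
        # and anything common to both tasks (visual processing, eye movements)
        # cancels out of the contrast.
        t, p = stats.ttest_rel(comprehension, syntax, axis=0)
        return t, p < alpha  # t-values plus a mask of reliably different voxels

    # Toy usage: pretend 100 of 5000 voxels respond more during comprehension.
    rng = np.random.default_rng(0)
    comp = rng.normal(size=(40, 5000))
    synt = rng.normal(size=(40, 5000))
    comp[:, :100] += 0.8
    t_values, significant = contrast_map(comp, synt)
    print(f"{significant.sum()} voxels survive the contrast")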
So that showed that we can start looking at what programmers are doing, but there are actually other techniques that don’t require fMRI machines. One interesting thing you can do is use TMS. TMS basically allows you to knock out a part of the brain so that it can’t be used for a particular task. This is actually similar to how a lot of brain research starts: you look for someone with a brain lesion, where they have damage to one part of the brain and can’t do some task anymore, and then you use those subjects to reason about what that part does. TMS allows you to create the same situation deliberately. So you can knock out the part related to language production, or the part related to mathematical reasoning, and confirm things: our studies showed that programmers weren’t using this part of the brain, so if you knock it out, can they still do the task? If we confirm it, this allows us to narrow down the parts of the brain that are actually relevant to the task. So we can again knock out parts related to language processing and show that developers can’t do certain things, right? And this is something that you can just do at the desk. This is basically a big magnet that you put over someone’s head, and it induces a current in their brain. And you can use it to suppress that part of the brain for a while, maybe 30 minutes. Yeah? >>: [inaudible]. I am trying to visualize the experiment, like how the programmer can have an MRI taken at the same time as reasoning through the code. Like, what’s the setup? >> Christopher Parnin: Yeah, so the setup is basically --. There are a couple of things that you could do. You can actually have goggles that are screens, you know, monitor screens, but what we used in this one was a mirror. So they look at a mirror that reflects a computer screen that’s outside of the machine. So they are lying in this tube reading the code, but they are not making changes, right? The task is just to comprehend the code. There are other things that you could do to allow movement in the fMRI machine, but as you start having motion artifacts, that just becomes more things you have to factor out. Some of the simplest devices are just kind of chorded keypads, so you could select 1, 2 or 3. The way it was done here was we asked them to make the decision and then we did a post-test to see if they had the correct answer or not. And this was consistent with the pilot studies in terms of recall and performance. >>: I think this is great, but it’s just that the way you program and being in an MRI machine are very different experiences. [inaudible]. >> Christopher Parnin: Yeah, being in the fMRI is quite an experience. It’s almost like being in the middle of a jet engine. In fact they give you these headphones that you have on --. >>: [inaudible]. >> Christopher Parnin: So it does block out some of the noise. But on the other hand, when you really break down what a programmer is doing, at some point there is a limited amount of information you can process, right? And you are processing that information in kind of little buckets of time. And if we can look at it at that fundamental level, then we can extrapolate up. And with these techniques we can do a lot of things offline in the desktop environment.
So, fNIRS is very promising because you can see it as kind of like the desktop version of fMRI. When I was at MIT last week, Erin S. and I were talking about some work she is doing on measuring workload while people are driving, and you should be able to do similar things for programming tasks. Amy B. has done work with skin conductance sensors and EEG to do similar things. So I think there is a rich area in combining all these different techniques. I did want to leave one example as kind of a final thing for this area. One thing I didn’t talk about was conceptual memory, although I mentioned it in my framework. There is no better way to see that in action than when you compare a novice to an expert. If you look at what a novice golfer does in thinking about how to make a golf putt, they have to activate all sorts of parts of their brain. They are really kind of clumsily thinking about all the aspects of the task, versus an expert who has encoded this action into a single unit of action. In the motor part of their brain they have conceptualized this to the point where it requires no effort at all to perform the action. In fact, that’s the way that evolution works: the brain doesn’t want to spend a lot of energy doing tasks. So what an expert is essentially doing is finding the minimal way to do the same action with the least amount of power. What would this look like for a programmer? I think this would be very interesting to see. There are all sorts of studies that look at taxi drivers and video game players, and they actually find that parts of the brain exhibit growth and mass differences in an expert versus just a common person. What would we find for a programmer? I think that would be very interesting. So, just some future questions in this area: can we finally find a neurological basis for flow? Programmers say, “I have to be in the zone,” and what does that really mean, and can we instrument that and find other metrics that correspond to it? Is it worth teaching programming at an early age? There are all these efforts, Hour of Code and other educational efforts, and everyone has to learn how to code. Well, when should we be teaching them? We know for foreign languages that if you learn a language at a certain age, there is a critical period where it actually fundamentally shapes how the language regions process this information. Is there a critical period like this for computational thinking? You know, we have all these debates on object-oriented programming, design patterns, functional programming, etc. If someone is taught and invested in a particular way of thinking, does that actually reflect itself in activating different pathways? Does a functional person actually fundamentally think differently than an object-oriented person? Are there different pathways? There are all sorts of claims that code visualization actually helps developers, or that visual languages are better than text. Can we actually see that this is true when we measure it? And one thing that I experienced as a developer was that when I was learning XAML and WPF, I thought that I had reached the point of no return; that’s it, I couldn’t learn anything new. But I worked through it, and two years later I finally got it. The thing is, the technology had expired by the time I learned it. Is there a point where we can anticipate that the amount of effort in learning an API exceeds the ability of the brain to conceptualize it? Because there is a timeline.
We know it usually takes one to three years for you to conceptualize a lot of memories. And if the technology is only going to be around for 5 years, is that tradeoff worth it? So here are some related publications on just the interruptions work. There is other work on crowd systems and language adoption that’s not included here. But for now I would just like to open it up to questions. Yeah? >>: So kind of a stupid question: [inaudible]? >> Christopher Parnin: Yeah, so there has been --. >>: [inaudible]. >> Christopher Parnin: Yeah, TMS stimulation of the cerebellum. I think the body is very resilient to being turned off by these things. So usually what happens is this invokes repetition suppression, where as something keeps getting stimulated over and over, that signal gets depressed. I think there are certain parts of the brain where, if that happened, you would die, so naturally the brain isn’t susceptible to that sort of thing. And the fact that IRBs don’t really have problems approving these sorts of studies shows this is a safe technique. >>: [inaudible]. >> Christopher Parnin: Yeah, yeah. Run with it as long as you can. >>: Have you thought about other cognitive structures you could use to measure [inaudible]? >> Christopher Parnin: Yeah, so this was very memory focused, and that allowed me to dig into one area, but there are all sorts of learning networks and problem representation networks that you can examine, and I think those are interesting things to look at in the future. [Clapping]