>> Mary Czerwinski: Okay, everybody. It's my pleasure today to introduce to you Margaret Burnett from Oregon State University where she's a full professor. She visits us often and we have many interactions with her and her students, funding them in their research, sometimes even hosting them when they do their dissertation studies and whatnot. Margaret has been super active in the area that bridges programming and HCI and she has focused for a long time now on gender and most of her research and I'm sure what she'll be talking about today has been done in spreadsheets and understanding how to work in spreadsheets. It's fascinating stuff and she's also been the papers chair for Kai. She's been on the steering committee for the Visual Languages and Human Centered Computing Conference forever. So she's been super active and has published a lot in this area. I'm sure she'll have lots to talk to us about today. So welcome, Margaret. >> Margaret Burnett: Thank you, Mary. So for those of you here and people remote, I'm very happy to have you all here. Today I want to talk about gender HCI, what about the software. So the idea is that people have been very interested in recent years about gender differences that interact with the workplace in software development, with education issues in software development. But what we've been interested in is what about the software itself? What happens when somebody sits down and faces a keyboard if they're male versus being a female. I want to -- so that is what the talk is about. I want to also give credit to my colleagues in the EUSES Consortium, that's End User Shape Effective Software. So that's a group of about 15 of us across eight or nine different institutions, depending on the day you count, who are working on anything relating to end user programming, but a lot of us have a particular focus on gender. So -- and by the way, those institutions are Oregon State, where I am, Carnegie Mellon, Drexler, Penn State, Nebraska, Cambridge, IBM and University of Washington. I didn't hear Microsoft in that list just right now, but of course that could change. So, okay, here we go. Now why am I talking to you today? Actually I'm trying to get all of you involved. So today I want to describe what we've done so far in trying to understand this problem, then I'll be here starting in late March for three months on a sabbatical visit and I'll be collaborating with Mary, who introduced me, and Chow Ziekbal, and Cory Quinn and Gina Vinolia(phonetic). And so what we want to do together and look at gender phenomena and how people use Microsoft products. Our research goal is to try to generalize what we've done so far just pretty much in the world of spreadsheets and see just exactly how pervasive that is across software product usage, especially software products that are intended to have people solve problems. So problem solving is especially what we're interested in. So that's our research goal. But then a Microsoft goal would be to produce better products for males and females. So what you could be doing, starting today or tomorrow or next week through June, is gather some data as part of what you're already doing about your team's products. And if you pick up gender 2 and start thinking a little bit about some of the issues that I'll be raising today, then your product could be helping -- could be getting some benefit out of what we'll be doing together and could be helping us, too, in understanding how far our results really generalize. 
So I hope you'll get involved. There are these pieces of paper that are going to be going around, in which you -- it's not a commitment, but I'll get your name and e-mail address and your team or product that you're thinking about perhaps contributing data to and what kind of data you might imagine contributing if it's okay with your team and your time permits and all that kind of stuff. So I hope you'll give me that so that I'll have a way to get back to you when you finally get here, or if you can't actually write it down today, then, you know, you can send e-mail to any of us who are working together to Mary or to Shamzi, or to Cory or to me and that would be great. Okay. So that's my involvement. You can contribute data about actually three things existing features that you already have in your product. Or if you're one of the people who's been following this research for a while, you may already have started to have some ideas about what to change and if you actually institute some sort of different feature then we could get old product versus new product, which would be wonderful. The other thing we're working on is strategies. So if you could get data about strategies, then that would be great, too. So I'm going to be going through all three of those today. And you could be giving us surveys, videotaped interviews, user studies, stats, whatever it is you've got. You could analyze it first or not. So we just want to connect with you in whatever way would work. Okay. So why do we care? There are two reasons why we should really care about gender and software. The first is the pipeline story. So of course there are all of these issues about women and Computer Science, but we think that software is one of those issues that we should think about because just imagine, you know, supposing you come -- you're a young girl, you sit down in front of a computer for the first time to do something serious, something of a problem-solving nature. And it just isn't a good fit for you. You know, what are you going to do? Are you going to leave and say, boy, I want to be a Computer Scientist when I grow up. No. Okay. So there is a whole pipeline reason. But there is another reason that's perhaps even more important. Regardless of whether those women would have gone into computer science if their experience were better, you know, regardless of whether they were going to be graphic designers or business analysts or accountants or whatever, if the software is limiting their success, there's this whole class ceiling thing that's going on and we don't want that. And in fact I'm going to tell you about somebody named Ashley, who had exactly that thing happen. Ashley in high school had a career plan to become a graphic designer and went to college and majored in graphic design. But, you know, the thing is in all these classes then you had to use Flash and I don't know if any of you have used Flash in recent years, but although it started out being kind of whizzy wig, it sort of turned into this Java-like thing, mostly for software developer areas, and then web programming came up. And so at that point, the major changed. No longer in graphic design, but instead in art. Now Ashley was very bright. In fact, before graduating, Ashley got one of these really top awards at that university for being one of the top academic students in the whole university, so this is not a matter of a stupid person. So why is this? 
Well, perhaps the software was just not a good match to Ashley's problem-solving style, learning style, confidence, information processing style and so on. And that's what we think. So what we've been doing to try to study this had is we've been starting by reading the literature about theories that relate to gender differences in things like problem-solving style, information processing style, self-efficacy, all those kinds of things. And from those we derive specific hypotheses, which we go to the lab and refine tests, try to understand empirically. We use those results to redesign prototypes and evaluate them and that gives us new ideas about theories, refinements of the theories, new variance for our whole set of hypotheses and we go around the circle again and again and again. So there are three things that I want to focus on today as I eluded to earlier. One of them is we want to know is there something broken? Are there gender differences in feature usage? And I'll tell you about that. Then we want to know, well, can we fix it at the feature level? Can we make feature changes that might help to reduce the gender gap? And then the third one is looking deeper. Whether there are things that we can't fix at the feature level, unless we go deeper and really start thinking about the problem-solving strategies, the males and females might be trying to use. So first, feature usage. Is there something broken? Now this is the most mature of the things we've been working on so we've been at it for quite a while. I'm going to give you a whirlwind tour of five studies. Obviously we won't be going into a lot of depth on all five of these. The first one was qualitative. We were looking at feature interests. We then took that to a statistical lab study to see if we could find some statistical differences in feature usage and self-efficacy. Then we did a qualitative follow-on. Then we did another statistical study in which we looked at the ties of what we'd found before with feature usage to tinkering and finally we did another statistical study that was actually hosted here at Microsoft in which we looked to see how our previous studies, which were in our own prototype, spreadsheet system, generalized to itself a commercial system. Okay. Study number one. Feature interest. So when we began, we wondered from reading the theoretical literature whether we might see some differences in the amount males and females were actually using various features in spreadsheets. And we had a lot of fun. We had a bunch of old data. We spread it out all over the floor. We had a profile that looked like these for every single user. Actually the profiles were more complicated than this. They had lots of different colors in them and so on and so forth, but this one's particularly telling and what we would do is we sorted them out and then we'd look at the back of the piece of paper to see if it was a male or a female. And sometimes we said, oh, my gosh, look at this, everybody in this pile is a male. And so one of the things we found out is there was this tremendous difference in the amount the males were using these features. Up here is count and down here is time. Okay. Versus the amount the females were using the features. As I said, these are just two users, but these are fairly representative. So anyway, from this old data, we formed some hypotheses and went to the lab for a statistical study. 
One of the things we also got out of the literature was that there might be a potential tie between this feature usage and self-efficacy. Self-efficacy is a specific form of self-confidence. It's regarding a specific task. So your confidence in your ability to perform a specific thing like debug this spreadsheet. It's a general theory. It's been found to be very productive of willingness to try, perseverance and so on and so forth. And in the past literature we learned the females traditionally had lower computer self-efficacy than males, so we thought this might be an important factor. And we went to the lab with this research prototype of a spreadsheet system which I'll explain to you in a minute. And the task that we set forth for our users was to find and fix errors in spreadsheet formulas and the reason we used our prototype in this one, instead of Excel, is that we have some specific features that we had previously designed to help with exactly that task. So here are the features. We divided them into three categories. First there was the familiar type. Everybody knows how to edit formulas or at least everybody in our study did because we made sure they had prior spreadsheet experience. In our prototype you do it with a text box instead of the way you do it in Excel, but it's still a very familiar way of doing things. Then also we taught them two features in a little tutorial at the beginning. We knew that none of them had any experience with these features because they're unique in our prototype. And one of them is a checkmark. So if you happen to notice that a value is correct you can check it off. And if you do that under the hood, the system is making calculations about how thoroughly tested your spreadsheet is using a formal test adequacy criterion and it does that behind the scenes and then what it does in front of the scenes is it takes the cell borders and colors them along a continuum from red to blue, blue being more tested, more testing coverage. So that checkmark actually turned this cell all the way blue. And this one, it tested partially according to our criteria, so that made it purple. So that was one thing we taught them. We didn't tell them about test adequacy criteria, we just sort of gave them kind of the naive version of that and then also there are these data flow arrows, which you can pop up and those data flow arrows do what you would expect, but in addition they also have the testedness coloring here so that you can see that the interaction between these two is also fully tested because it's blue. So we taught them that. And then there was one other feature that we did not teach them, the X. So instead of noticing that a value is right, you might notice that it is wrong, in which case you can X it out. And if you do that the system reasons about which formula may have actually been at fault for that because it might not be the one here. And colorizes these, highlights them in darker and darker shades of yellow depending on how implicated it thinks a cell is. So those were the features in these three categories, familiar, taught and untaught. Oh, and there was a little tool tip thingy, too, so they could explore any feature they wanted to, taught or untaught, with this tool tip stuff. Okay. So what'd we find out? First, notice this lovely 45-degree line. We found this in study after study after study. This is feature -- effective feature count and this is self-efficacy. Okay. We always get this. 
The lower the self-efficacy for females, the higher -- the lower the use of more advanced features, the effective use of more advanced features. As their self-efficacy goes up so does their feature usage. Look at the guys, they're all over the place. Look at the P value. Flip a coin. Okay. I mean, there are low self-efficacy men and there are high self-efficacy men, but it has no relationship to whether or not they use features. Yes? >> Question: When you say men are very (inaudible) undergraduate students or do you mean all people throughout the population? >> Margaret Burnett: So this study was undergraduate students, not computer science majors. We're not allowed to have very much computer science background, either, and many of our studies have been business students only. I can't remember if this particular one was. In a couple of our other studies we went beyond those age groups and in fact the one that was sponsored by Microsoft in Excel was not students, but we haven't gotten to study number five yet. But we always find this. This graph, every single study, we find this. Yes? >> Question: The feature that wasn't bought -- that wasn't taught, was it cased advertised to the users? (Inaudible) ->> Margaret Burnett: Marginally. We said, and, you know, and you can see there is also this X mark. So we just sort of tossed it out there, but we didn't say you should be using this, you know, we just tossed it out there. And of course they knew that they had tool tips available over everything because that came up in the tutorial. So the bottom line here is that self-efficacy mattered, but it's not just about self-efficacy. It impacted females differently than males. Okay. So what about trying new features, just touching them? Well, so this is the time they first touched something. Look at the females. They were much faster to start with the familiar feature, namely editing formulas. The guys were later. Type taught. The females are much later at even touching the types of features we taught them. Look at the untaught feature, okay. I mean, okay. So bottom line, the female has ventured to try out new features much later than the males. What about genuine engagement? We had a way of measuring whether people were really following up on things that they were trying out. So for the males, type taught, they were much more engaged. For the females, type familiar, they were much more engaged. For the untaught significantly more males used the untaught features than the females. Bottom line, the females engaged in the new features less than the males. Okay. So let's get to the chase here. The goal was to fix bugs in that study. So how'd they do there? With the sea of bugs that were fixed, there was no difference. However, the females were significantly more likely to introduce new bugs that had not been there. Okay. So let's think about this a minute. Maybe they're just stupid, right? They just enter it. Okay. Is that what happened? Well, there's only one way to introduce a new bug. How do you do it mechanically? Any ideas? >> Question: Write your own formula. >> Margaret Burnett: You can edit a formula, that's right. What are the females spending all their time doing? Editing the formulas. Okay. The males were using all of these other features to help them problem-solve, too, so they were just spending less time in this way to introduce bugs. Furthermore, there was this kind of self-fulfilling prophesy going on. So in our study the females had significantly lower self-efficacy than the males. 
And this was, as you've seen, tied to their feature usage, not true of the males. So we asked -- one of the things that we had on our post-session questionnaire was if you didn't use this feature or that feature, why not? And the females were significantly more likely to say because I thought it would take me too long to learn them. But, in fact, we also had a comprehension test at the end on how these various features worked, what would happen if you did this or that or the other. There was no difference in their comprehension of the features. Even though throughout the course of the task the men were getting a lot more practice with them. There was still no difference in comprehension. Furthermore, we also know that using these features helps. We know in this study and in several other ones that the use of these features does help you to find and fix bugs. Yes, Andy? >> Question: Can you go back one slide? I just want to ask you a question ->> Margaret Burnett: Sure. >> Question: -- about the new bugs introduced. >> Margaret Burnett: Yeah. >> Question: Is that number normalized by how much actual time the female spent editing ->> Margaret Burnett: No. >> Question: -- formulas? >> Margaret Burnett: No. This is just raw new bugs introduced. >> Question: Because like when you're code -- when you're coding, the more lines you turn, the more likely you're going to introduce bugs. >> Margaret Burnett: Exactly. >> Question: So if you gave -- the more time you spent editing formulas, the more likely it is you're going to introduce bug into a formula. >> Margaret Burnett: That's right. >> Question: So if you normalize that then you can see if females are more likely to introduce bugs than males independent of the time that they spent editing formulas. >> Margaret Burnett: Well, I would argue ->> Question: You're not done. >> Margaret Burnett: I would argue that I'd rather know it this way than your way because what we really want to know is the collection of features available to them in the way they're using that more likely to lead them down this bad path? And so that is what this set of numbers tells us. Yeah. Okay. Let's see here. I think I polished this one off. All right. So basically not using the features was pretty much tying one hand behind their backs. Okay. Then we did another study qualitative in which we had users talk aloud as they found and fixed areas in the spreadsheet and here is one low self-efficacy user telling us how she regards these wonderful features of ours. What's this little arrow doing everywhere? So I need to take this -- oh, my goodness, now what's happening? Too much happening. Okay? So this user was not entranced with our features. Here's another one. This is a different feature which you haven't seen, something called Guards. Guards are kind of sort of related to Excel's data validation thing so you can provide a range that expresses the values within which a cell should fall. So this female's using them, she's very production oriented here. She says, so, 0 to 100 is the Guard I'm entering. Okay. Okay. Hmmm. It doesn't like the minus 5, they can get a 0. That gets rid of the red circle. So you can see that she's business-like, production-oriented. Her motivation is to use these exactly the way we designers intended her to use them to solve these bugs. >> Question: (Inaudible) -- circle. >> Margaret Burnett: Yeah, there's this circle that's ->> Question: This is emotional. >> Margaret Burnett: There is this circle, yeah, but she's making it angry. Yeah. 
All right. Now here's what the male says. He starts down the same path she does. The first thing I'm going to do is go through and check the Guards for everything, just to make sure none of the entered values are above or below any of the ranges specified. So homework one, actually I'm going to put Guards on everything because I feel like it. I don't even know if this is necessary, but it's fun. Okay. It's fun. But then look what happens. He gets into it for the fun of it and then it starts doing him some good. So okay it doesn't like my Guard apparently. Okay. Ah-hah. The reason I couldn't get the Guard for the sum to be correct is because the sum formula is wrong. Okay. So they both got benefit, but the male thought it was fun. So this caused us to start thinking about tinkering. We said, okay, so the guys, you know they kind of like to play around with this stuff. So we did a study very similar to the previous one. First let's think about -- let's look at what happened with the females. For the females, increased tinkering was good. Okay. If they did it, it led to more effective use of these features, which we call percent testedness measured that way and that in turn was predictive of bugs fixed. Okay. So for females tinkering was good for them. For males, testing effectiveness, the features were good for bugs fixed, but the tinkering was not tied to more use, effective use of the features. So this is kind of strange. The male's tinkering was maybe not the world's hugest advantage here. So in fact ultimately increased tinkering was inversely predictive of the bugs fixed because of this bit right here. Okay. Why? Well, there were two things that went on. The first was pausing. The males tended to not pause. The females did. When they tinkered, they tinkered pausefully. Okay. Now the education literature says that pauses improve critical thinking. And our results showed that in fact the pauses mid-tinker, tink-er, like that instead of tinker...tink-er, tink-er, tink-er. Okay. Tink-er was predictive of understanding and effective use of the problem-solving features. Why number two? Well, we had two environments in that study. The low-cost environment that you've seen, I've shown this to you in previous slides so you click here and you get it here. You click here and it goes away, very easy, very low cost to tinker. For the other environment that we had, I'll tell you a little bit more about this, but sufficed to say, right now it was a little more complex and tinkering was not so easy. Look at the females. Doesn't matter which environment they're in, the amount of tinkering they do is about the same either way. Maybe it's because of the pauseful way they did it. For the males, however, look at this huge difference. Okay. Now the difference between these features and that feature is not really that huge and yet look at this huge difference in the males. Here they're obsessive in their tinkering. Here their level of tinkering is at the same level that turned out to be good for the females. Okay. So we've got this sort of tinkering obsession thing going on with the males. All right. So one more quick study and then on we go to the next topic. We then tried to replicate this in Excel. We were explicitly interested in replicating that second study, the one that looked at self-efficacy and feature usage and gender. We had different software, namely Excel, a different population. This was a wide span of adult ages and occupations, all of whom had spreadsheet experience. 
They were in the Seattle area and once again, they could not be system developers, they couldn't have degrees in computer science, all that kind of stuff. Instead of debugging, we had them do maintenance, which basically means create your own bugs and then fix them. And we had a more complex spreadsheet. We focused on the audit toolbar and we taught some of those features and didn't teach other features and so it was pretty much the same experiment design I've explained to you before. Participants could use any Excel feature they wanted, they weren't limited in what they could use. And just to overview the results, look at all these X's. Okay. The pink are the females and the blue are the males. Self-efficacy is a predictor of success. Here again, we had significant differences. Okay. Yes for the females, not significant for the males, although you can see an upward trend it's not as pronounced as for the females. Self-efficacy's tied to familiar feature usage. All three of these tests were about that. Look at these X's. Once again for the females these trends were significant. For the males they were not. Finally, self-efficacy's tied to the usage of the untaught features. This one we didn't get anything significant, although you can see that the trend seemed to be more pronounced for the females, but we didn't get significance on that one. So in summary on what we've found out with -- is there a gender difference in feature usage? The answer is yes. We found it in feature usage. We found it in self-efficacies tied to feature usage. We found it in propensity to tinker. We found it in tinkering's ties to self-efficacy, and we found it in tinkering's ties to effectiveness. We found it in five different studies and a whole bunch of different populations. It's real. Okay. Now this, again, is something that we've done only in spreadsheets. So if you all can help us gather data on other products, we'll know how real it is there, too. But now can we fix it? Yes? >> Question: Do you actually do some work on educational technique or educational backgrounds if you look at it from that perspective on the sources? >> Margaret Burnett: We've tried to control for that. So in some cases we in the earlier studies we had all business students with a particular minimum amount of spreadsheet stuff they had to have done and no computer science and all that kind of thing. In general whenever we do any of our studies we collect their GPA and their majors and the number of years of programming experience they have, if any, because nowadays in high school a lot of times you get it. And you know, number of years of spreadsheet experience and we've never explained our results in those ways, but we always look for it. Okay. So what can we do to fix it with features? Now this is much less mature than the other work so I don't have nearly as much to tell you about it, but I do have a couple of things to tell you. So in the original prototype, of course we had these features that I've shown you and in fact sure enough they did encourage tinkering. We then added -- oops. We then added -- where did I put that? Oh, here we are. We then added these new things. So we added a more expanded version of Help. So in addition to the tool tip, which even in the original version it explains what people see and it also explains a little bit about why they might care. Down here it was sort of strategy tips. What can you really do about it, a little bit more in-depth explanation. 
So, this was version one and so we did that and the other thing we did -- well, we did three things. Another thing we did is we added maybe marks. So this it was this means the value's right. This means the value's wrong. These two kind of grayed out ones in the middle -- well, not grayed out, but lighter, this one means "seems wrong maybe." This one means "seemed right maybe," and those all had tool tips. Yeah? >> Question: Compared to the tool tips, you know, it was introduced during a tutorial and is that tutorial consistent throughout the experiment? >> Margaret Burnett: Was the tu -- we did a tutorial at the beginning, if that's what you're asking. And yes, both groups got the tutorial. I think I'm answering what you asked. >> Question: I'm just wondering how discoverable the tool tips were ->> Margaret Burnett: Very. Yeah. They were too discoverable, actually. Yeah. They were very discoverable. Any time your mouse spent any amount of time over anything they came up and this little thing here was one of these pull-down things, but we taught them how to do that in the tutorial, as well. >> Question: Okay. >> Margaret Burnett: Uh-huh. Okay. So let's see here. So the reason we introduced these maybe marks is these were intended to be a communication to those who felt they might not be sort of qualified to make the right-wrong judgment. Maybe I'm just not sure enough of myself to say it's right or it's wrong. And so these -- "seems right maybe," "seems wrong maybe," we put them there thinking that low self-efficacy users might be encouraged by those and use them. And what happens is the border colors then are just a little bit more faded out if they make use of them. So that's what we did. And we also might care. Down here it was sort of strategy tips. What can you really do about it, a little bit more in-depth explanation. So, this was version one and so we did that and the other thing we did -- well, we did three things. Another thing we did is we added maybe marks. So this it was this means the value's right. This means the value's wrong. These two kind of grayed out ones in the middle -- well, not grayed out, but lighter, this one means "seems wrong maybe." This one means "seemed right maybe," and those all had tool tips. Yeah? >> Question: Compared to the tool tips, you know, it was introduced during a tutorial and is that tutorial consistent throughout the experiment? >> Margaret Burnett: Was the tu -- we did a tutorial at the beginning, if that's what you're asking. And yes, both groups got the tutorial. I think I'm answering what you asked. >> Question: I'm just wondering how discoverable the tool tips were ->> Margaret Burnett: Very. Yeah. They were too discoverable, actually. Yeah. They were very discoverable. Any time your mouse spent any amount of time over anything they came up and this little thing here was one of these pull-down things, but we taught them how to do that in the tutorial, as well. >> Question: Okay. >> Margaret Burnett: Uh-huh. Okay. So let's see here. So the reason we introduced these maybe marks is these were intended to be a communication to those who felt they might not be sort of qualified to make the right-wrong judgment. Maybe I'm just not sure enough of myself to say it's right or it's wrong. And so these -- "seems right maybe," "seems wrong maybe," we put them there thinking that low self-efficacy users might be encouraged by those and use them. And what happens is the border colors then are just a little bit more faded out if they make use of them. 
So that's what we did. And we also had something else, testing scaffolding, which I won't talk about today. Now, the down side of this interface is that it was a little bit more complex and making use of the features, now you've got different intensities of border colors to sort out and you have four instead of two to choose from and so it was just a little more complex and there was more visual feedback. But there were some good points. One of them was and this was just a set of preliminary trends at first. So the low confidence marks over time we began to notice that the gender gap in male and female usage seemed smaller than it had been before. And we also began to notice that males were using them, too, although the females seemed to be using them more. Then we changed the interface and instead of that pull-down tips thing which we never were able to implement in any very nice way, we added little video snippets instead and also hyper text as an alternative that explained strategy. And these were -- well, we were aiming for a minute or less. Sometimes we didn't make it, but that is what we did and in a qualitative study we found out that these were liked by females and the females all commented it improved their confidence in their ability to solve spreadsheet bugs. So then we did a third variant, it had the same nuance judgments and even better strategy explanations and we presented that with a statistical study at (Inaudible) CCO8. And in that one we found some very good things. Keep in mind, these feature changes I'm telling you about here, these are not big changes. Okay. We're talking about taking the two tuple and is turning it into a four tuple and we're talking about instead of just the normal tool tips also having a hyper text and little video snippets feature. These are not huge. Now one of the things we found is that the females -- is that these were very effective together at helping to close the gender gap. So here we are with tinkering with the X mark. Here it was fairly, both of them were pretty small. You'll notice here in fact the females were experimenting with those more than the males in that particular group. Here tinkering with the check marks, look at this. Here we are again with these control males doing huge amounts of tinkering. Then we add, you know, for the group that had the new interface, it's all nice and even, almost flat. Furthermore, we can see that females on both sorts went up when we had the new interface. And the males, here it's almost the same. They went up a little bit, but thankfully they didn't exceed the females. And here they went down. Remember, we like that. Remember the males and their obsessive tinkering? Okay. It's not so easy to do in the other interface. You have to kind of go click, click, click. I don't know, it's just not quite worth it as much. Okay. And so this brought them down to the level we want them to be at, which is about the same level as the females. So the females are going up, which we want. And the males are going down, which we want. >> Question: I thought there was a difference between the low cost and the high cost tinkering. >> Margaret Burnett: Okay. So, sorry, I didn't really explain this. The C, that's control. So control males control field -- control females. So this first half, this is the old environment and then the second half, the treatment, those are the new environment. Does that -- is that ->> Question: When you say that men tinker less with high cost ->> Margaret Burnett: Uh-huh. 
>> Question: -- features, but more of low cost features and you're not turning the (inaudible) here. >> Margaret Burnett: I am. So here. Let's just look at this graph because it's the most pronounced with the checkmark tinkering. So in the low-cost features, the males are tinkering more. And then in the higher cost, the new interface with the four tuple and the strategy explanations, it goes down. Uh, no, for the males, the other way was up. Yeah, now it's down. But I've switched -- I switched left to right. Maybe I switched -- no, no, no, I haven't. I don't know, it did the right thing. Okay. Let's see here, I can show you again later if you want. Let's see here. Okay. What about the difference in self-efficacy? Because you may remember that the females had lower self-efficacy than the males, too, in most of our studies. So we measure self-efficacy at the beginning of studies and at the end. And it always goes down. They always, no matter male or female, you know, they say yeah, I'm pretty good at debugging spreadsheets and then you give them one to do and at the end they think, maybe I'm not quite so good at that after all because...Yeah. Anyway, but for the difference in the treatment females versus the control females their self-efficacy went down less. So this is good. And furthermore, so you may say, well, gee, why don't we just tell them they're wonderful and then we could really just make their self-efficacy be great. But the really good thing is they were better judges of how their performance had really been. Okay. So when we compared their post self-efficacy judgments against bugs fixed, the treatment females were much more aware of how well they had really done, which is good. You care because when do you decide to ship a product or rely on a spreadsheet? It's when you think the bugs are done. So if you're not correct about when the bugs are done you're either fiddling around with it way too long or else fiddling around with it way too little. And these were -- we got these out of our questionnaire data, but these were triangulated against post-session questionnaire answers, as well. So let's look at attitudes. Here one of the things we can see is that their overall attitudes with the control version, look at the difference between the males and the females. The females did not like that original version. But here it's almost the same. Okay. And just focusing on information, which was kind of focusing on those video snippets and the various kinds of help we gave, at the beginning nobody really liked them much. And at the end, people liked them more. But the females especially. And furthermore, we can see that for both of these measures the females went up and look at this, the males went up, too. This is very cool. Okay. We're trying to fix the gender gap, right? We're trying to do things to enable females to be able to use this software more effectively. And what we're finding out here is it's helping everybody. Okay. Very cool. What's the scale? Let's see. What were those? Oh, these were scores. I think these things were liker things and we just added them all up and so these were just liker score sums. So yeah, yeah, that is what they were. Yeah, you just, you know, take 0 to 5, and, you know, six questions. Let's see, how many subjects do we have here? This one, yeah, let's see here. Hmmm. I may not be telling you everything, but we did get a lot. Yeah, I'm going to have to go back and look at that for real, but I think it's better than it looks there. 
>> Question: (Inaudible) -- for a number of measures. >> Margaret Burnett: Is it the mean? >> Question: I think it's a mean. >> Margaret Burnett: It's probably a mean. >> Question: For access to information. >> Margaret Burnett: It's probably a mean. I'll have to double -check that. Yeah. Yeah. And I think possibly -- aha, I know, I know. I know what it is. It is either mean or sum, I don't remember which, but I know why we went up to 30 here. It is because we went up to 30 here, but you don't actually have 30 points just available for talking about information. So the question is what was the max and I can't remember. But the females were quite positive. It was quite startling. And your question about how many subjects. I think that that's the study -- let's see here. We had close to 60. I can't remember for sure. Yeah, something like that. All right. So we've seen that there is some gaining that you can make just by small changes to features. And this is good. But one of the things that occurred to us is okay in those new features we had the maybe marks, the nuance judgments and we also had the strategy videos and what those strategies were about was what -- incorporated what our features were about, which was testing. So we began to wonder, well, if females maybe want to go about things some other way, then really promoting testing might not be going deep enough. So we should be thinking about what their problem-solving styles and preferences and success stories are really like from a strategy perspective. So that's what this third category is about and we're still fairly new at this, too. So I don't have a ton to tell you about, but I can tell you about some things. So this is about a study we did and published at Kai '08. And so we were interested in end-user strategy as they problem solve with spreadsheets, the same as these other studies. Now the thing about strategies is they're in the head. Okay. So how are you going to get that? You can't get it from the world. So what we did is we used a lot of triangulation. So we used participants' words about their strategies and combined that with electronically logged behaviors about what they did and did playbacks of that so that we could understand what they did in some detail and then we also did an independent data mining study in parallel, which would also be about what they did. So we used all of those things and that data mining study was on different data. So we did all of those things to try to understand what we had in terms of strategies. So we asked people what kind of strategies they used and we translated those into one-word codes and they told us about eight strategies. One was data flow and that was a male strategy, as you'll see. One was testing and that was a male strategy, as you'll see. One was code inspection looking at the formulas and that was a female strategy, as you'll see. One was specification checking, which was a female strategy. One was color following. Now if there are people in here who are kind of sort of software engineering people, you know, don't freak out. Say, color following, that doesn't count. Well, it does because if -- when we asked them their strategies, they said following the colors, then that counted. Whatever they said was a strategy, was a strategy. To do listing was a female strategy. Fixing formulas, that was pretty much how it was described as the strategy, that was a female strategy. And finally spatial, oh, good, we don't have a gender on this one. Okay. 
One out of eight, we didn't find gender differences. So let's look at these counts. The orange diagonal tells you the amount for each particular strategy. Testing was the biggest. We had 61 participants in this so this is roughly three quarters of the people mentioned something oriented toward testing. Talking about fiddling around with the values to see how they turned out or various other things they could have done that were testing oriented. Almost three quarters also talked about code inspection or talked about code inspection and of those about three quarters of them talked about the other. So testing people often also mentioned code inspection, code inspection people also often mentioned testing. Okay. The next most popular one was specification checking B. Half of the people mentioned that. And of those about two-thirds of them tended to also use testing and/or code inspection. So those are the biggest ones that were mentioned. I don't have time to talk to you about all of them today, but I can tell you about those three. I have time to talk about those. So data flow. In the words one male put it this way. Systematically go from the formulas that rely wholly on data sales progressing to formulas that rely on other formula-driven cells and so on. Basically he's following the references of varied data flow oriented way of going about things. Now so everybody who we classified as being -- oops, sorry. Everybody who we classified as being successful, we talked about data flow, was male. Okay. We then looked at their behavior and what we found is that for males the number of data flow instances that we could count, the actual moving back through a data flow chain, was positively correlated with bugs fixed for males. Not for females, look, it is just all over the map. Okay. Now there's a theoretical explanation that we could apply to this. There's been some very interesting work on information processing styles from another domain, namely the domain of marketing. And there is a hypothesis that has been born out quite repeatedly in other domains called the Selectivity hypothesis. And it comes down to females statistically are more likely to want comprehensive information before they start taking action. Whereas males are more likely to do something that you might think of as sort of a breath first -- sorry, a depth first thing. So they take the first promising cue and sort of follow it up. And if it doesn't pan out, back up and try something else. Okay. So there's been a lot of literature about this and we tend to see it a lot and that could be the explanation for this phenomenon here. Keep in mind these differences are statistical, right? I mean, there's no completely typical male or female, so these are all just statistical. Now there's one feature that is very data flow oriented and that's those data flow arrows. And we found that for the males usage of arrows correlated in a positive direction very significantly to their talking about data flow. For the females, not so. And we also found that the males use those arrows almost twice as often as females. Okay. And we found this in several other studies, as well. Okay. What about testing, another male strategy. So here's the way one male expressed it. After correcting everything, I press Help Me Test, so that's a testing oriented thing and double checked the values, so there's a testing oriented thing. To make sure they work, next I plugged in values. Okay. So very interested in testing. 
Now in terms of statistics, we didn't get gender differences in who talked about testing. But remember three quarters of the people did so it could have just been a ceiling effect, but in behaviors we found testing correlated to bugs fixed. Not so for females. In fact, boy, oh, boy, did it correlate for the males. So if we looked at successful males versus unsuccessful males, what we found was that value edits, successful males did more of it. Value edits, plus the Help Me Test scaffolding successful males did more of it. Percent testedness, which is, you know, how effectively they're using the features, males did more of it. But the females, we got nothing, absolutely nothing. So then, if you compare only the successful males versus the successful females, we have once again successful males more, successful males more, successful males more. Okay. Testing was definitely a male oriented strategy. So let's look at code inspection. This of course was examining the formulas themselves. One female said, I then looked at the formulas to see if I could find mistakes. In our environment, just short of showing four versions of the same cell for some reason, but anyway, you can post up the formula, have it sitting there and you can leave it there and have others visible at the same time. So it's possible for you to see as many formulas as you want to, plus the values all at once. So that was the facility they used for code inspection. So for the females, when we looked at the number of instances of people leaving up multiple formulas at once, the successful females did that significantly more than the successful males. And also when we looked at the number of formulas they had lies around in all of the instances we got the same results, successful females significantly more than successful males. So okay so those are the findings that I have to tell you about. Now in addition we've started branching out a little bit in a few other environments that we've had time to look at. So we did a study in power shell, actually my student Valentina did that on an internship with the Power Shell Team and found consistent results with the previous ones that I've just told you, plus three additional strategies because Power Shell is control-flow oriented and they had a few fewer restrictions, as well, on what they could do. And also found suggestive results on exactly how and when the strategies actually get used. And I guess I didn't mention this here, but we have a paper coming out about that for the EED paper -- or conference. Our hypothesis at the moment is that when -- for example, when you look at code inspection, that that's tied to female success in both finding and actually fixing bugs. So that's what we're trying to do now is really break it down and find out, what are they using these strategies for and how can we better imagine sporting those? We also have some emerging studies coming out focusing on strategies in both pop fly and going back to Excel with a sense-making perspective. So those were sort of analyzing and conducting right now. So that's -- so the strategy results overview is we've found eight -- actually it's up to 11 now, end users, debugging strategies, but the study I told you about, it was eight. Seven had gender differences. The males, the biggest telling points were with data flow and testing, which were both positive uses of those strategies for success. 
For females, code inspection, to-do listing and specification checking and they also did something called fixing formulas, which was actually a bad idea. Now just kind of thinking of Excel because a lot of this is done in spreadsheets, let's see here. Data flowy. Okay. We know how to do that in Excel. We have those arrows. You've got that sort of color thing, you know, that you get when you click in a formula. Don't have much in the way of testing, but at least we can fiddle around with values pretty easily. So, you know, it's something. How do you do code inspection in Excel? Well, you memorize a formula and then you can go look at some different formula and memorize that one and then you can look at some other formula and memorize that one or you can switch to another view and get all the formulas, but now you lose the values. Okay. So -- okay. So I think there's some room here. Okay. All right. So let's get back to Ashley. Remember Ashley from the very beginning, that person who -- who's major changed from graphic design to art? Okay. Well so what we wonder is whether maybe these were because of self-efficacy issues, maybe because of motivational issues, maybe because of problem-solving style issues, maybe because of information processing style issues, all of which have been shown in many domains to have gender differences. However, remember that those are all statistical. Right? And there is no male that is exactly male and there's no female that's exactly female. And in fact Ashley is a male. Okay. And yet encountered all of these stakes because of his particular problem-solving styles. So designing for both genders is about taking down barriers. Okay. And if we do that, it benefits everyone. So that's that. I have a whole bunch of papers and so if you either go to the Users Consortium site and click on Gender, just go to my home page and click on Gender, you'll get there, but I have a call to arms, so I'm really hoping that you all can help to contribute to this work. Some of you came in late, but I have these pieces of paper that hopefully are going around and maybe we can start another one around. And so what I'm really hoping is that we can get data on a bunch of Microsoft products, things that you can do as part of your regular day jobs, just be sure to collect gender 2 on whatever studies are going on. Perhaps either with existing features or if some of you already have some ideas about things you might want to try, an old feature versus new feature thing would be great and also we are very, very interested in anything we can get on strategies, problem-solving strategies. And what you could contribute would be interviews, surveys, user studies, statistics, whatever you've got. You can analyze it or not. And our hope for outcomes, well, we hope there will be a technical report of some sort that will be valuable to Microsoft and presentations, of course, inside Microsoft. We're hoping to get a paper out of it, too, and hopefully product improvements. And so for those of you who came in late, this is what I'll be focusing on for my three-month stay here that starts toward the end of March and I'll be working with Shamziek Baugh and Mary Czerwinski and Cory Quinn and Gina Vinolia on this. 
So if you don't want to scribble anything down on that little piece of paper and would rather just send e-mail to one of us letting us know what you can contribute, that would be great, because we'd really like for some data to actually be here pretty early in our stay so that we can really start tearing into it. So, yeah, hoping you can contribute. (Applause) >> Mary Czerwinski: Are there questions? >> Question: Yes. Because I do a lot with K12 and product service center, how much -what it is your tie in terms of the research with computer science teachers of America? We're trying to create -- there's a problem with this self-efficacy among girls is hard ->> Margaret Burnett: Uh-huh. >> Question: -- it's happening in games, etcetera. So CSTA is looking at (inaudible) computer science curriculum. What is the sort of modeling, problem solving, algorithmic thinking that you need to start like in first grade or 5th grade to create the pipeline? And have you been involved, are you seeing any good stuff happening? Because by the time -- you are studying college students and you're starting to go back with that. >> Margaret Burnett: Right. Okay. So I have a yes and a no for you to answer that question. The no part is I'm not really directed in that. But the yes part is I'm very aware of that going on and I can tell you if you don't already know about it, about a great resource to connect with on kind of the state of the art in that area and that's NCWT. Okay. So those of you who haven't heard of that, National Center for Women and Technology is this wonderful organization whose mission is to collect what everybody around the country is trying to do and to try to turn it into a science instead of people just all kind of hacking away independently coming up with solutions that maybe don't work or maybe do. So NCWT is great in that area. We have been studying adults, college students and post-college students because -- well, I guess for more than one reason, but one of them is we think if we don't document and understand the effect the way it is among today's adults we're going to have to wait a really long time before we start removing this glass ceiling. So there are lots of people age, you know, 18 to 70 who are facing this already. So that's a piece of the pie we've tried to break off. And then of course we're focusing on software itself. But I do think that there are big implications for the way software that K-12 students are using should change, too. But we haven't focused on it. Yes? >> Question: So are there general rule -- rules of thumb for software design that, you know, avoid this style of approach because it tends to be male centric or -- I mean it sounds like you have very kind of specific information about the scenarios ->> Margaret Burnett: Uh-huh. >> Question: -- you are going to test on, but are ->> Margaret Burnett: Yeah. >> Question: -- can I generalize that through my product? >> Margaret Burnett: Well, here is another yes and no for you. So we haven't progressed to the point of actual guidelines because our research is too early and that is one of the reasons why we want to generalize it and get data from you all because we think we might be able to get there. But I do have two hypotheses based on the prototype work we've done so far. And one of them is nuanced interfaces. So if you're asking somebody to take a stand, it's right, it's wrong, okay. Maybe that is not the right thing. 
And so this nuanced interface thing, not only did it help to close the gender gap in our study, but also it turned -- it just -- it's just better. I mean, the men were using it, too. And you think about it, there's more information there. Now instead of saying it's right or it's wrong, if there were some that you weren't quite so sure of and then, you know, you make changes and the spreadsheet still isn't right, you know which ones to come back to. So by looking at the barriers for one group, what we've done here is come up with something that's better. So what I might suggest is that if there's a feature in the product you have that asks for some sort of judgment or perhaps dogmatic stand and you give some nuances to it, that would be worth trying out, trying some before-after comparisons. The other one, I have really less information about and so the other thing we've tried is this strategy thing which is explanations that are not about the features, but about the approach. And most software help doesn't have that. Some sort of online documentation help does, but now it's this big thing you have to read as opposed to little snippety low-cost things that you can get your hands on right away. So we do have, you know, a mounting body of evidence that seems to help. But -- those two things that we've tried are still fairly early in the game, so I would hesitate to actually call them guidelines at this point. But they're things that you can try. Yes? >> Question: So to follow-up on that question, then, aligning with the Selectivity hypothesis ->> Margaret Burnett: Uh-huh. >> Question: -- does that not then apply to -- is it because you've entered into the experience, you've then sought help ->> Margaret Burnett: Uh-huh. >> Question: -- and been presented with this volume of support, the Selectivity hypothesis not apply to that because that's providing a full-in picture rather than an incidental or nuanced here, you know, as needed to setup? >> Margaret Burnett: So the way we implemented our strategy approach was also as needed. And so for example instead of saying everybody's got to read this stuff, the way we implemented it is in the tool tip there's a button you can click that says either show me or tell me. And then that causes that other thing to come up. So I don't recall our statistically measuring whether the females actually asked for it more. We've seen that qualitatively. But ->> Question: It's instances of help, show me and tell me that's demonstrative versus narrative? >> Margaret Burnett: Right. Yes. And some people learn better one way and some the other. But one reason we really thought it was important to have a video version besides the learning differences is that self-efficacy theory says that one way to boost your self-efficacy is if you see someone with whom you identify succeeding at the same task. And so in those little video snippets, I don't know if you noticed, but we have a male and a female working together on it. And in one of our qualitative studies it was I think allowed and we were videotaping and we went back and looked at it and at that time the female actress was Nira Jaw(phonetic), who a few of you may know. And every time she said something, the subject smiled. So she was definitely identifying with her. >> Question: So is that quality either -- the I do, we do bobble with the way the videos were constructed or was it just simply ->> Margaret Burnett: I wish I could say that it did, but it didn't. (Laughter) Yeah. So yeah. 
And we're still iterating on those two, so it's quite possible that would have improved it even more. But I think what you were originally asking about was how that related to the selectivity hypothesis, and intuitively we thought that the females would probably ask for more information, but, as I said, I'm not sure we've measured that statistically. We have seen it qualitatively, but I'm not sure we've actually tried to see if that's true in the numbers. But it seems to satisfy the ones who seem to want that more information. >> Question: I also wanted to ask another follow-up question with regard to the judgment. >> Margaret Burnett: Yeah. >> Question: So my role at Microsoft is product manager for a user experience (inaudible) web, and we're implementing the ability for people to basically rate the knowledge that's being consumed. >> Margaret Burnett: Uh-huh. Uh-huh. >> Question: And so we're having a discussion right now about whether to use a graduated scale of one to five stars, like what is on (inaudible) -- >> Margaret Burnett: Uh-huh. >> Question: -- or a dichotomous one, which is like what Digg and some of these other sites use -- >> Margaret Burnett: Uh-huh. >> Question: -- to let us know thumbs up, thumbs down. >> Margaret Burnett: Uh-huh. >> Question: So you'd be an advocate of providing ratings on a graduated scale? >> Margaret Burnett: Uh-huh. Uh-huh. Yeah. And we haven't personally looked into this one, but probably there's literature that says that females are more likely to be polite about these things, too. So if you have a thumbs up, thumbs down, you know, more people are going to do the thumbs up, probably, if they are females. This is my guess. >> Question: Uh-huh. (Inaudible) -- >> Margaret Burnett: Uh-huh. >> Question: -- pretend to think it's them and not the product. >> Margaret Burnett: Uh-huh. Right. Exactly.
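(As a concrete illustration of that graduated-versus-dichotomous trade-off, here is a small sketch. The ratings are invented for illustration, not data from any study; it shows how collapsing a graduated scale into thumbs hides exactly the mild, polite-leaning responses the discussion worries about.)

```typescript
// Invented ratings for illustration only; not data from any study.

type Stars = 1 | 2 | 3 | 4 | 5;
type Thumb = "up" | "down";

// How a dichotomous UI would force each graduated rating into a thumb.
function toThumb(stars: Stars): Thumb {
  return stars >= 3 ? "up" : "down";
}

const ratings: Stars[] = [3, 3, 4, 2, 3, 5, 3];

// Graduated scale: a mean of about 3.3 reads as lukewarm.
const mean = ratings.reduce((sum, r) => sum + r, 0) / ratings.length;

// Dichotomous scale: the same users read as 6 up / 1 down, much rosier,
// and a politeness bias toward "up" would only widen the gap.
const ups = ratings.map(toThumb).filter(t => t === "up").length;

console.log(mean.toFixed(1), `${ups} up / ${ratings.length - ups} down`);
```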
Yes? >> Question: Thank you. >> Question: A couple of other things -- first of all, kind of like what you guys were just mentioning: when we do (inaudible) research, we usually go into (inaudible) work tools. One of my regular, routine questions that I ask people (inaudible) was to ask them to mention a software tool or any product that they had used that made them feel stupid. The product that was by far most commonly mentioned was Excel. So I found it ironic that that was the vehicle you chose for this particular study, because by far Excel was the product mentioned most commonly. >> Margaret Burnett: Right. And I think it's not ironic, because the thing is, maybe that just turned out to be a fantastic example of a male-oriented tool that's out there, and so maybe it gave us the magnifying glass we needed to really see the phenomenon. We'll see how widespread it turns out to be. >> Question: (Inaudible) I didn't see a split in that tallying with (inaudible) -- >> Margaret Burnett: Uh-huh. >> Question: It was routine (inaudible). >> Margaret Burnett: Uh-huh. Uh-huh. >> Question: Across the range of people. >> Margaret Burnett: Uh-huh. Yeah. >> Question: The other point being that in the psychological research world there's quite a lot of existing information about the way that women approach problems -- the way that women approach the world in general from a more environmental position, thinking about, you know, the entire environment around a specific thing that they're doing -- >> Margaret Burnett: Uh-huh. >> Question: -- whereas men are more likely to be focusing on a specific "how do I achieve this goal" -- >> Margaret Burnett: Uh-huh. >> Question: -- in a much more (inaudible) way rather than thinking in a broader context, and -- >> Margaret Burnett: Uh-huh. >> Question: -- do you think that relates to the trends you are seeing here, or is that -- >> Margaret Burnett: I think it could well, and we've seen some literature that sort of kind of alludes to that, but I think we maybe haven't read all the right papers that we should, so if you actually have some references that you could contribute, that would be super. >> Question: Yeah. It's a really big area. >> Margaret Burnett: Yeah, I know, we won't be able to read all of it, but -- (laughter). You know, already we've been trying to keep our finger on something like five different domains as we try to gather the theories that are really relevant here, but we're always looking for really important things we've missed, and it sounds like we could do some reading there, so that would be great. >> Question: And another quick thing. While I was doing (inaudible) research (inaudible), one thing that we included in our general thinking about this, that you guys might want to think about as well, is the use of social media tools -- (inaudible) sites, for example -- >> Margaret Burnett: Uh-huh. >> Question: -- where you're maybe asking for an expert kind of opinion -- >> Margaret Burnett: Uh-huh. >> Question: -- and things of that nature, thinking about gender differences in the use of those sorts of socially -- >> Margaret Burnett: Yes. >> Question: -- supported tools, as well. >> Margaret Burnett: Yeah. >> Question: So I would expect that there would be pretty strong gender differences in (inaudible). >> Margaret Burnett: I would think so, too. And we haven't -- you know, we tried to focus on just one sort of style, because, you know, that way we could really try to get some data and compare it. But, you know, if there are people working on that sort of tool, that would be so great, because we do strongly suspect that could make a big difference. So, yeah. Did I see another hand somewhere? Maybe not. >> Mary Czerwinski: All right. Well, thank you again and... (applause)