>> Mary Czerwinski: Okay, everybody. It's my pleasure today to introduce to you
Margaret Burnett from Oregon State University where she's a full professor. She visits us
often and we have many interactions with her and her students, funding them in their
research, sometimes even hosting them when they do their dissertation studies and
whatnot.
Margaret has been super active in the area that bridges programming and HCI and she
has focused for a long time now on gender and most of her research and I'm sure what
she'll be talking about today has been done in spreadsheets and understanding how to
work in spreadsheets. It's fascinating stuff and she's also been the papers chair for CHI.
She's been on the steering committee for the Visual Languages and Human Centered
Computing Conference forever. So she's been super active and has published a lot in
this area. I'm sure she'll have lots to talk to us about today.
So welcome, Margaret.
>> Margaret Burnett: Thank you, Mary. So for those of you here and people remote,
I'm very happy to have you all here. Today I want to talk about gender HCI, what about
the software. So the idea is that people have been very interested in recent years about
gender differences that interact with the workplace in software development, with
education issues in software development. But what we've been interested in is what
about the software itself?
What happens when somebody sits down and faces a keyboard if they're male versus
being a female. I want to -- so that is what the talk is about. I want to also give credit to
my colleagues in the EUSES Consortium, that's End Users Shaping Effective Software. So
that's a group of about 15 of us across eight or nine different institutions, depending on
the day you count, who are working on anything relating to end user programming, but a
lot of us have a particular focus on gender.
So -- and by the way, those institutions are Oregon State, where I am, Carnegie Mellon,
Drexel, Penn State, Nebraska, Cambridge, IBM and University of Washington. I didn't
hear Microsoft in that list just right now, but of course that could change.
So, okay, here we go. Now why am I talking to you today? Actually I'm trying to get all of
you involved. So today I want to describe what we've done so far in trying to understand
this problem, then I'll be here starting in late March for three months on a sabbatical visit
and I'll be collaborating with Mary, who introduced me, and Shamsi Iqbal, and Cory
Quinn and Gina Venolia. And so what we want to do together is look at
gender phenomena and how people use Microsoft products.
Our research goal is to try to generalize what we've done so far just pretty much in the
world of spreadsheets and see just exactly how pervasive that is across software product
usage, especially software products that are intended to have people solve problems. So
problem solving is especially what we're interested in. So that's our research goal. But
then a Microsoft goal would be to produce better products for males and females. So
what you could be doing, starting today or tomorrow or next week through June, is gather
some data as part of what you're already doing about your team's products. And if you
pick up gender, too, and start thinking a little bit about some of the issues that I'll be raising
today, then your product could be helping -- could be getting some benefit out of what
we'll be doing together and could be helping us, too, in understanding how far our results
really generalize. So I hope you'll get involved.
There are these pieces of paper that are going to be going around, in which you -- it's not
a commitment, but I'll get your name and e-mail address and your team or product that
you're thinking about perhaps contributing data to and what kind of data you might
imagine contributing if it's okay with your team and your time permits and all that kind of
stuff. So I hope you'll give me that so that I'll have a way to get back to you when you
finally get here, or if you can't actually write it down today, then, you know, you can send
e-mail to any of us who are working together, to Mary or to Shamsi, or to Cory or to me
and that would be great.
Okay. So that's my involvement. You can contribute data about actually three things:
existing features that you already have in your product. Or if you're one of the people
who's been following this research for a while, you may already have started to have
some ideas about what to change and if you actually institute some sort of different
feature then we could get old product versus new product, which would be wonderful.
The other thing we're working on is strategies. So if you could get data about strategies,
then that would be great, too. So I'm going to be going through all three of those today.
And you could be giving us surveys, videotaped interviews, user studies, stats, whatever
it is you've got. You could analyze it first or not. So we just want to connect with you in
whatever way would work.
Okay. So why do we care? There are two reasons why we should really care about
gender and software. The first is the pipeline story. So of course there are all of these
issues about women and Computer Science, but we think that software is one of those
issues that we should think about because just imagine, you know, supposing you
come -- you're a young girl, you sit down in front of a computer for the first time to do
something serious, something of a problem-solving nature. And it just isn't a good fit for
you. You know, what are you going to do? Are you going to leave and say, boy, I want
to be a Computer Scientist when I grow up. No. Okay. So there is a whole pipeline
reason.
But there is another reason that's perhaps even more important. Regardless of whether
those women would have gone into computer science if their experience were better, you
know, regardless of whether they were going to be graphic designers or business
analysts or accountants or whatever, if the software is limiting their success, there's this
whole glass ceiling thing that's going on and we don't want that.
And in fact I'm going to tell you about somebody named Ashley, who had exactly that
thing happen. Ashley in high school had a career plan to become a graphic designer and
went to college and majored in graphic design. But, you know, the thing is in all these
classes then you had to use Flash and I don't know if any of you have used Flash in
recent years, but although it started out being kind of WYSIWYG, it sort of turned into this
Java-like thing, mostly for software developer areas, and then web programming came
up. And so at that point, the major changed. No longer in graphic design, but instead in
art.
Now Ashley was very bright. In fact, before graduating, Ashley got one of these really top
awards at that university for being one of the top academic students in the whole
university, so this is not a matter of a stupid person.
So why is this? Well, perhaps the software was just not a good match to Ashley's
problem-solving style, learning style, confidence, information processing style and so on.
And that's what we think.
So what we've been doing to try to study this is we've been starting by reading the
literature about theories that relate to gender differences in things like problem-solving
style, information processing style, self-efficacy, all those kinds of things. And from those
we derive specific hypotheses, which we go to the lab and refine tests, try to understand
empirically. We use those results to redesign prototypes and evaluate them and that
gives us new ideas about theories, refinements of the theories, new variants for our
whole set of hypotheses and we go around the circle again and again and again.
So there are three things that I want to focus on today as I alluded to earlier. One of them
is we want to know is there something broken? Are there gender differences in feature
usage? And I'll tell you about that. Then we want to know, well, can we fix it at the
feature level? Can we make feature changes that might help to reduce the gender gap?
And then the third one is looking deeper. Whether there are things that we can't fix at the
feature level, unless we go deeper and really start thinking about the problem-solving
strategies, the males and females might be trying to use.
So first, feature usage. Is there something broken? Now this is the most mature of the
things we've been working on so we've been at it for quite a while. I'm going to give you
a whirlwind tour of five studies. Obviously we won't be going into a lot of depth on all five
of these. The first one was qualitative. We were looking at feature interests. We then
took that to a statistical lab study to see if we could find some statistical differences in
feature usage and self-efficacy. Then we did a qualitative follow-on. Then we did
another statistical study in which we looked at the ties of what we'd found before with
feature usage to tinkering and finally we did another statistical study that was actually
hosted here at Microsoft in which we looked to see how our previous studies, which were
in our own prototype spreadsheet system, generalized to a commercial system.
Okay. Study number one. Feature interest. So when we began, we wondered from
reading the theoretical literature whether we might see some differences in the amount
males and females were actually using various features in spreadsheets. And we had a
lot of fun. We had a bunch of old data. We spread it out all over the floor. We had a
profile that looked like these for every single user. Actually the profiles were more
complicated than this. They had lots of different colors in them and so on and so forth,
but this one's particularly telling and what we would do is we sorted them out and then
we'd look at the back of the piece of paper to see if it was a male or a female. And
sometimes we said, oh, my gosh, look at this, everybody in this pile is a male.
And so one of the things we found out is there was this tremendous difference in the
amount the males were using these features. Up here is count and down here is time.
Okay. Versus the amount the females were using the features. As I said, these are just
two users, but these are fairly representative. So anyway, from this old data, we formed
some hypotheses and went to the lab for a statistical study. One of the things we also
got out of the literature was that there might be a potential tie between this feature usage
and self-efficacy. Self-efficacy is a specific form of self-confidence. It's regarding a
specific task. So your confidence in your ability to perform a specific thing like debug this
spreadsheet. It's a general theory. It's been found to be very predictive of willingness to
try, perseverance and so on and so forth.
And in the past literature we learned the females traditionally had lower computer
self-efficacy than males, so we thought this might be an important factor. And we went to
the lab with this research prototype of a spreadsheet system which I'll explain to you in a
minute. And the task that we set forth for our users was to find and fix errors in
spreadsheet formulas and the reason we used our prototype in this one, instead of Excel,
is that we have some specific features that we had previously designed to help with
exactly that task.
So here are the features. We divided them into three categories. First there was the
familiar type. Everybody knows how to edit formulas or at least everybody in our study
did because we made sure they had prior spreadsheet experience. In our prototype you
do it with a text box instead of the way you do it in Excel, but it's still a very familiar way of
doing things. Then also we taught them two features in a little tutorial at the beginning.
We knew that none of them had any experience with these features because they're
unique in our prototype. And one of them is a checkmark.
So if you happen to notice that a value is correct you can check it off. And if you do that
under the hood, the system is making calculations about how thoroughly tested your
spreadsheet is using a formal test adequacy criterion and it does that behind the scenes
and then what it does in front of the scenes is it takes the cell borders and colors them
along a continuum from red to blue, blue being more tested, more testing coverage.
So that checkmark actually turned this cell all the way blue. And this one, it tested
partially according to our criteria, so that made it purple. So that was one thing we taught
them. We didn't tell them about test adequacy criteria, we just sort of gave them kind of
the naive version of that and then also there are these data flow arrows, which you can
pop up and those data flow arrows do what you would expect, but in addition they also
have the testedness coloring here so that you can see that the interaction between these
two is also fully tested because it's blue.
So we taught them that. And then there was one other feature that we did not teach
them, the X. So instead of noticing that a value is right, you might notice that it is wrong,
in which case you can X it out. And if you do that the system reasons about which
formula may have actually been at fault for that because it might not be the one here.
And colorizes these, highlights them in darker and darker shades of yellow depending on
how implicated it thinks a cell is. So those were the features in these three categories,
familiar, taught and untaught.
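[Editor's note: the X-mark behavior can be sketched like this. It is a toy model, not the prototype's actual fault-localization algorithm; the data structures and the simple counting scheme are assumptions. The idea is that every formula cell upstream of an X-marked value is a suspect, and cells implicated by more X marks would be shaded a darker yellow.]

```python
# Score cells by how many X marks implicate them via the dataflow graph.
def implication_scores(producers, x_marked):
    """producers: cell -> set of upstream formula cells that feed it.
    x_marked: cells the user marked as wrong.
    Returns cell -> count of X marks implicating that cell."""
    scores = {}
    for bad_cell in x_marked:
        # the marked cell itself and everything upstream are suspects
        for suspect in {bad_cell} | producers.get(bad_cell, set()):
            scores[suspect] = scores.get(suspect, 0) + 1
    return scores

deps = {"total": {"a", "b"}, "avg": {"a", "b", "total"}}
print(implication_scores(deps, {"total", "avg"}))
# "a", "b", and "total" feed both marked values, so they would be shaded darkest
```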
Oh, and there was a little tool tip thingy, too, so they could explore any feature they
wanted to, taught or untaught, with this tool tip stuff. Okay. So what'd we find out?
First, notice this lovely 45-degree line. We found this in study after study after study.
This is feature -- effective feature count and this is self-efficacy. Okay. We always get
this. The lower the self-efficacy for females, the higher -- the lower the use of more
advanced features, the effective use of more advanced features. As their self-efficacy
goes up so does their feature usage. Look at the guys, they're all over the place. Look
at the P value. Flip a coin. Okay. I mean, there are low self-efficacy men and there are
high self-efficacy men, but it has no relationship to whether or not they use features.
Yes?
>> Question: When you say men are very (inaudible) undergraduate students or do you
mean all people throughout the population?
>> Margaret Burnett: So this study was undergraduate students, not computer science
majors. They're not allowed to have very much computer science background, either, and
many of our studies have been business students only. I can't remember if this particular
one was. In a couple of our other studies we went beyond those age groups and in fact
the one that was sponsored by Microsoft in Excel was not students, but we haven't gotten
to study number five yet. But we always find this. This graph, every single study, we find
this.
Yes?
>> Question: The feature that wasn't taught, was it at least advertised to the users?
(Inaudible) --
>> Margaret Burnett: Marginally. We said, and, you know, and you can see there is
also this X mark. So we just sort of tossed it out there, but we didn't say you should be
using this, you know, we just tossed it out there. And of course they knew that they had
tool tips available over everything because that came up in the tutorial.
So the bottom line here is that self-efficacy mattered, but it's not just about self-efficacy.
It impacted females differently than males. Okay. So what about trying new features,
just touching them? Well, so this is the time they first touched something. Look at the
females. They were much faster to start with the familiar feature, namely editing
formulas. The guys were later. Type taught. The females are much later at even
touching the types of features we taught them.
Look at the untaught feature, okay. I mean, okay. So bottom line, the females
ventured to try out new features much later than the males. What about genuine
engagement? We had a way of measuring whether people were really following up on
things that they were trying out. So for the males, type taught, they were much more
engaged. For the females, type familiar, they were much more engaged. For the
untaught significantly more males used the untaught features than the females. Bottom
line, the females engaged in the new features less than the males.
Okay. So let's get to the chase here. The goal was to fix bugs in that study. So how'd
they do there? In the number of bugs that were fixed, there was no difference. However,
the females were significantly more likely to introduce new bugs that had not been there.
Okay. So let's think about this a minute. Maybe they're just stupid, right? They just
enter it. Okay. Is that what happened?
Well, there's only one way to introduce a new bug. How do you do it mechanically? Any
ideas?
>> Question: Write your own formula.
>> Margaret Burnett: You can edit a formula, that's right. What are the females
spending all their time doing? Editing the formulas. Okay. The males were using all of
these other features to help them problem-solve, too, so they were just spending less
time in this way to introduce bugs.
Furthermore, there was this kind of self-fulfilling prophesy going on. So in our study the
females had significantly lower self-efficacy than the males. And this was, as you've
seen, tied to their feature usage, not true of the males.
So we asked -- one of the things that we had on our post-session questionnaire was if
you didn't use this feature or that feature, why not? And the females were significantly
more likely to say because I thought it would take me too long to learn them. But, in fact,
we also had a comprehension test at the end on how these various features worked,
what would happen if you did this or that or the other. There was no difference in their
comprehension of the features. Even though throughout the course of the task the men
were getting a lot more practice with them. There was still no difference in
comprehension.
Furthermore, we also know that using these features helps. We know in this study and in
several other ones that the use of these features does help you to find and fix bugs. Yes,
Andy?
>> Question: Can you go back one slide? I just want to ask you a question --
>> Margaret Burnett: Sure.
>> Question: -- about the new bugs introduced.
>> Margaret Burnett: Yeah.
>> Question: Is that number normalized by how much actual time the females spent
editing --
>> Margaret Burnett: No.
>> Question: -- formulas?
>> Margaret Burnett: No. This is just raw new bugs introduced.
>> Question: Because like when you're code -- when you're coding, the more lines you
churn, the more likely you're going to introduce bugs.
>> Margaret Burnett: Exactly.
>> Question: So if you gave -- the more time you spent editing formulas, the more likely
it is you're going to introduce bug into a formula.
>> Margaret Burnett: That's right.
>> Question: So if you normalize that then you can see if females are more likely to
introduce bugs than males independent of the time that they spent editing formulas.
>> Margaret Burnett: Well, I would argue --
>> Question: You're not done.
>> Margaret Burnett: I would argue that I'd rather know it this way than your way
because what we really want to know is the collection of features available to them in the
way they're using that more likely to lead them down this bad path? And so that is what
this set of numbers tells us. Yeah.
Okay. Let's see here. I think I polished this one off. All right. So basically not using the
features was pretty much tying one hand behind their backs. Okay. Then we did another
qualitative study in which we had users talk aloud as they found and fixed errors in the
spreadsheet and here is one low self-efficacy user telling us how she regards these
wonderful features of ours. What's this little arrow doing everywhere? So I need to take
this -- oh, my goodness, now what's happening? Too much happening. Okay?
So this user was not entranced with our features. Here's another one. This is a different
feature which you haven't seen, something called Guards. Guards are kind of sort of
related to Excel's data validation thing so you can provide a range that expresses the
values within which a cell should fall.
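[Editor's note: a Guard in this sense is just a declared range check whose violation triggers the red-circle feedback mentioned below. A minimal sketch, with illustrative names only, might look like:]

```python
# Check a cell value against a guard's declared range.
def check_guard(value, low, high):
    """Return None if the value satisfies the guard, else a violation message
    (the prototype would show a red circle instead of text)."""
    if low <= value <= high:
        return None
    return f"value {value} outside guard range [{low}, {high}]"

print(check_guard(-5, 0, 100))  # violation, like the user's minus-5 example
print(check_guard(0, 0, 100))   # None: changing -5 to 0 clears the red circle
```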
So this female's using them, she's very production oriented here. She says, so, 0 to 100
is the Guard I'm entering. Okay. Okay. Hmmm. It doesn't like the minus 5, they can get
a 0. That gets rid of the red circle. So you can see that she's business-like,
production-oriented. Her motivation is to use these exactly the way we designers
intended her to use them to solve these bugs.
>> Question: (Inaudible) -- circle.
>> Margaret Burnett: Yeah, there's this circle that's --
>> Question: This is emotional.
>> Margaret Burnett: There is this circle, yeah, but she's making it angry. Yeah.
All right. Now here's what the male says. He starts down the same path she does. The
first thing I'm going to do is go through and check the Guards for everything, just to make
sure none of the entered values are above or below any of the ranges specified. So
homework one, actually I'm going to put Guards on everything because I feel like it. I
don't even know if this is necessary, but it's fun. Okay. It's fun.
But then look what happens. He gets into it for the fun of it and then it starts doing him
some good. So okay it doesn't like my Guard apparently. Okay. Ah-hah. The reason I
couldn't get the Guard for the sum to be correct is because the sum formula is wrong.
Okay. So they both got benefit, but the male thought it was fun. So this caused us to
start thinking about tinkering. We said, okay, so the guys, you know they kind of like to
play around with this stuff.
So we did a study very similar to the previous one. First let's think about -- let's look at
what happened with the females. For the females, increased tinkering was good. Okay.
If they did it, it led to more effective use of these features, which we call percent
testedness measured that way and that in turn was predictive of bugs fixed. Okay. So
for females tinkering was good for them.
For males, testing effectiveness, the features were good for bugs fixed, but the tinkering
was not tied to more use, effective use of the features. So this is kind of strange. The
male's tinkering was maybe not the world's hugest advantage here. So in fact ultimately
increased tinkering was inversely predictive of the bugs fixed because of this bit right
here.
Okay. Why? Well, there were two things that went on. The first was pausing. The
males tended to not pause. The females did. When they tinkered, they tinkered
pausefully. Okay. Now the education literature says that pauses improve critical
thinking. And our results showed that in fact the pauses mid-tinker -- tink...er, like that,
instead of tinker, tinker, tinker in a rush -- were predictive of understanding
and effective use of the problem-solving features.
Why number two? Well, we had two environments in that study. The low-cost
environment that you've seen, I've shown this to you in previous slides so you click here
and you get it here. You click here and it goes away, very easy, very low cost to tinker.
For the other environment that we had, I'll tell you a little bit more about this, but suffice it
to say right now, it was a little more complex and tinkering was not so easy. Look at the
females. Doesn't matter which environment they're in, the amount of tinkering they do is
about the same either way. Maybe it's because of the pauseful way they did it. For the
males, however, look at this huge difference. Okay.
Now the difference between these features and that feature is not really that huge and
yet look at this huge difference in the males. Here they're obsessive in their tinkering.
Here their level of tinkering is at the same level that turned out to be good for the
females. Okay.
So we've got this sort of tinkering obsession thing going on with the males. All right. So
one more quick study and then on we go to the next topic. We then tried to replicate this
in Excel. We were explicitly interested in replicating that second study, the one that
looked at self-efficacy and feature usage and gender. We had different software, namely
Excel, a different population. This was a wide span of adult ages and occupations, all of
whom had spreadsheet experience. They were in the Seattle area and once again, they
could not be system developers, they couldn't have degrees in computer science, all that
kind of stuff.
Instead of debugging, we had them do maintenance, which basically means create your
own bugs and then fix them. And we had a more complex spreadsheet. We focused on
the audit toolbar and we taught some of those features and didn't teach other features
and so it was pretty much the same experiment design I've explained to you before.
Participants could use any Excel feature they wanted, they weren't limited in what they
could use. And just to overview the results, look at all these X's. Okay. The pink are the
females and the blue are the males. Self-efficacy is a predictor of success. Here again,
we had significant differences. Okay. Yes for the females, not significant for the males,
although you can see an upward trend it's not as pronounced as for the females.
Self-efficacy's tied to familiar feature usage. All three of these tests were about that.
Look at these X's. Once again for the females these trends were significant. For the
males they were not. Finally, self-efficacy's tied to the usage of the untaught features.
This one we didn't get anything significant, although you can see that the trend seemed
to be more pronounced for the females, but we didn't get significance on that one.
So in summary on what we've found out with -- is there a gender difference in feature
usage? The answer is yes. We found it in feature usage. We found it in self-efficacies
tied to feature usage. We found it in propensity to tinker. We found it in tinkering's ties to
self-efficacy, and we found it in tinkering's ties to effectiveness. We found it in five
different studies and a whole bunch of different populations. It's real. Okay.
Now this, again, is something that we've done only in spreadsheets. So if you all can
help us gather data on other products, we'll know how real it is there, too. But now can
we fix it? Yes?
>> Question: Do you actually do some work on educational technique or educational
backgrounds if you look at it from that perspective on the sources?
>> Margaret Burnett: We've tried to control for that. So in some cases, in the earlier
studies, we had all business students with a particular minimum amount of spreadsheet
stuff they had to have done and no computer science and all that kind of thing. In
general whenever we do any of our studies we collect their GPA and their majors and the
number of years of programming experience they have, if any, because nowadays in high
school a lot of times you get it. And you know, number of years of spreadsheet
experience and we've never explained our results in those ways, but we always look for
it.
Okay. So what can we do to fix it with features? Now this is much less mature than the
other work so I don't have nearly as much to tell you about it, but I do have a couple of
things to tell you. So in the original prototype, of course we had these features that I've
shown you and in fact sure enough they did encourage tinkering. We then added --
oops. We then added -- where did I put that? Oh, here we are. We then added these
new things. So we added a more expanded version of Help. So in addition to the tool
tip, which even in the original version it explains what people see and it also explains a
little bit about why they might care. Down here it was sort of strategy tips. What can you
really do about it, a little bit more in-depth explanation.
So, this was version one and so we did that and the other thing we did -- well, we did
three things. Another thing we did is we added maybe marks. So this means
the value's right. This means the value's wrong. These two kind of grayed out ones in
the middle -- well, not grayed out, but lighter, this one means "seems wrong maybe."
This one means "seemed right maybe," and those all had tool tips. Yeah?
>> Question: Compared to the tool tips, you know, it was introduced during a tutorial
and is that tutorial consistent throughout the experiment?
>> Margaret Burnett: Was the tu -- we did a tutorial at the beginning, if that's what
you're asking. And yes, both groups got the tutorial. I think I'm answering what you
asked.
>> Question: I'm just wondering how discoverable the tool tips were --
>> Margaret Burnett: Very. Yeah. They were too discoverable, actually. Yeah. They
were very discoverable. Any time your mouse spent any amount of time over anything
they came up and this little thing here was one of these pull-down things, but we taught
them how to do that in the tutorial, as well.
>> Question: Okay.
>> Margaret Burnett: Uh-huh. Okay. So let's see here. So the reason we introduced
these maybe marks is these were intended to be a communication to those who felt they
might not be sort of qualified to make the right-wrong judgment. Maybe I'm just not sure
enough of myself to say it's right or it's wrong. And so these -- "seems right maybe,"
"seems wrong maybe," we put them there thinking that low self-efficacy users might be
encouraged by those and use them. And what happens is the border colors then are just
a little bit more faded out if they make use of them.
So that's what we did. And we also had something else, testing scaffolding, which I won't
talk about today.
Now, the down side of this interface is that it was a little bit more complex and making
use of the features, now you've got different intensities of border colors to sort out and
you have four instead of two to choose from and so it was just a little more complex and
there was more visual feedback.
But there were some good points. One of them was -- and this was just a set of preliminary
trends at first. So with the low confidence marks, over time we began to notice that the gender
gap in male and female usage seemed smaller than it had been before. And we also
began to notice that males were using them, too, although the females seemed to be
using them more.
Then we changed the interface and instead of that pull-down tips thing which we never
were able to implement in any very nice way, we added little video snippets instead and
also hypertext as an alternative that explained strategy. And these were -- well, we were
aiming for a minute or less. Sometimes we didn't make it, but that is what we did and in a
qualitative study we found out that these were liked by females and the females all
commented it improved their confidence in their ability to solve spreadsheet bugs.
So then we did a third variant, it had the same nuanced judgments and even better
strategy explanations and we presented that with a statistical study at (inaudible) '08.
And in that one we found some very good things.
Keep in mind, these feature changes I'm telling you about here, these are not big
changes. Okay. We're talking about taking the two-tuple and turning it into a four-tuple
and we're talking about instead of just the normal tool tips also having a hypertext and
little video snippets feature. These are not huge.
Now one of the things we found is that the females -- is that these were very effective
together at helping to close the gender gap. So here we are with tinkering with the X
mark. Here it was fairly, both of them were pretty small. You'll notice here in fact the
females were experimenting with those more than the males in that particular group.
Here tinkering with the check marks, look at this. Here we are again with these control
males doing huge amounts of tinkering.
Then we add, you know, for the group that had the new interface, it's all nice and even,
almost flat. Furthermore, we can see that females on both sorts went up when we had
the new interface. And the males, here it's almost the same. They went up a little bit, but
thankfully they didn't exceed the females. And here they went down. Remember, we like
that.
Remember the males and their obsessive tinkering? Okay. It's not so easy to do in the
other interface. You have to kind of go click, click, click. I don't know, it's just not quite
worth it as much. Okay. And so this brought them down to the level we want them to be
at, which is about the same level as the females. So the females are going up, which we
want. And the males are going down, which we want.
>> Question: I thought there was a difference between the low cost and the high cost
tinkering.
>> Margaret Burnett: Okay. So, sorry, I didn't really explain this. The C, that's control.
So control males, control females. So this first half, this is the old
environment and then the second half, the treatment, those are the new environment.
Does that -- is that --
>> Question: When you say that men tinker less with high cost --
>> Margaret Burnett: Uh-huh.
>> Question: -- features, but more of low cost features and you're not turning the
(inaudible) here.
>> Margaret Burnett: I am. So here. Let's just look at this graph because it's the most
pronounced with the checkmark tinkering. So in the low-cost features, the males are
tinkering more. And then in the higher cost, the new interface with the four tuple and the
strategy explanations, it goes down.
Uh, no, for the males, the other way was up. Yeah, now it's down. But I've switched -- I
switched left to right. Maybe I switched -- no, no, no, I haven't. I don't know, it did the
right thing.
Okay. Let's see here, I can show you again later if you want. Let's see here. Okay.
What about the difference in self-efficacy? Because you may remember that the females
had lower self-efficacy than the males, too, in most of our studies. So we measure
self-efficacy at the beginning of studies and at the end. And it always goes down. They
always, no matter male or female, you know, they say yeah, I'm pretty good at debugging
spreadsheets and then you give them one to do and at the end they think, maybe I'm not
quite so good at that after all because...Yeah.
Anyway, but for the difference in the treatment females versus the control females their
self-efficacy went down less. So this is good. And furthermore, so you may say, well,
gee, why don't we just tell them they're wonderful and then we could really just make their
self-efficacy be great. But the really good thing is they were better judges of how their
performance had really been. Okay.
So when we compared their post self-efficacy judgments against bugs fixed, the
treatment females were much more aware of how well they had really done, which is
good. You care because when do you decide to ship a product or rely on a spreadsheet?
It's when you think the bugs are done. So if you're not correct about when the bugs are
done you're either fiddling around with it way too long or else fiddling around with it way
too little.
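One way to operationalize the calibration she's describing -- how well post-task self-efficacy tracks actual debugging performance -- is sketched below. This is a hypothetical measure for illustration; the study's actual metric may have differed.

```python
def calibration_error(post_self_efficacy, bugs_fixed, total_bugs):
    """Gap between a user's post-task self-efficacy rating (0.0-1.0)
    and the fraction of bugs they actually fixed. Lower is better:
    a well-calibrated user knows when the bugs really are done."""
    actual = bugs_fixed / total_bugs
    return abs(post_self_efficacy - actual)

# A user who fixed 5 of 10 bugs but rates themselves 0.75 is off by 0.25:
assert calibration_error(0.75, 5, 10) == 0.25
# Perfect calibration:
assert calibration_error(1.0, 10, 10) == 0.0
```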
And these were -- we got these out of our questionnaire data, but these were triangulated
against post-session questionnaire answers, as well. So let's look at attitudes. Here one
of the things we can see is that their overall attitudes with the control version, look at the
difference between the males and the females. The females did not like that original
version. But here it's almost the same. Okay.
And just focusing on information, which was kind of focusing on those video snippets and
the various kinds of help we gave, at the beginning nobody really liked them much. And
at the end, people liked them more. But the females especially. And furthermore, we
can see that for both of these measures the females went up and look at this, the males
went up, too. This is very cool. Okay. We're trying to fix the gender gap, right? We're
trying to do things to enable females to be able to use this software more effectively. And
what we're finding out here is it's helping everybody. Okay. Very cool.
What's the scale? Let's see. What were those? Oh, these were scores. I think these
things were Likert items and we just added them all up and so these were just Likert score
sums. So yeah, yeah, that is what they were. Yeah, you just, you know, take 0 to 5, and,
you know, six questions. Let's see, how many subjects do we have here? This one,
yeah, let's see here. Hmmm. I may not be telling you everything, but we did get a lot.
Yeah, I'm going to have to go back and look at that for real, but I think it's better than it
looks there.
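The scoring she's reconstructing aloud (0 to 5 per Likert item, summed over six questions, hence a ceiling around 30) would work like this as a quick sketch; the exact item count and scale are her guesses here, so treat the numbers as illustrative only:

```python
def likert_sum(responses, scale_max=5):
    """Sum of per-item Likert responses, each on a 0..scale_max scale.
    Six items at 0-5 gives a 0-30 range, matching the chart's ceiling."""
    if any(not 0 <= r <= scale_max for r in responses):
        raise ValueError("response outside the Likert scale")
    return sum(responses)

assert likert_sum([5, 4, 5, 3, 4, 5]) == 26
assert likert_sum([5] * 6) == 30   # the 30-point maximum
```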
>> Question: (Inaudible) -- for a number of measures.
>> Margaret Burnett: Is it the mean?
>> Question: I think it's a mean.
>> Margaret Burnett: It's probably a mean.
>> Question: For access to information.
>> Margaret Burnett: It's probably a mean. I'll have to double-check that. Yeah.
Yeah. And I think possibly -- aha, I know, I know. I know what it is. It is either mean or
sum, I don't remember which, but I know why we went up to 30 here. It is because we
went up to 30 here, but you don't actually have 30 points just available for talking about
information. So the question is what was the max and I can't remember. But the females
were quite positive. It was quite startling.
And your question about how many subjects. I think that that's the study -- let's see here.
We had close to 60. I can't remember for sure. Yeah, something like that.
All right. So we've seen that there is some gaining that you can make just by small
changes to features. And this is good. But one of the things that occurred to us is okay
in those new features we had the maybe marks, the nuanced judgments and we also had
the strategy videos and what those strategies were about was what -- incorporated what
our features were about, which was testing. So we began to wonder, well, if females
maybe want to go about things some other way, then really promoting testing might not
be going deep enough. So we should be thinking about what their problem-solving styles
and preferences and success stories are really like from a strategy perspective.
So that's what this third category is about and we're still fairly new at this, too. So I don't
have a ton to tell you about, but I can tell you about some things.
So this is about a study we did and published at CHI '08. And so we were interested in
end-user strategy as they problem solve with spreadsheets, the same as these other
studies. Now the thing about strategies is they're in the head. Okay. So how are you
going to get that? You can't get it from the world.
So what we did is we used a lot of triangulation. So we used participants' words about
their strategies and combined that with electronically logged behaviors about what they
did and did playbacks of that so that we could understand what they did in some detail
and then we also did an independent data mining study in parallel, which would also be
about what they did. So we used all of those things and that data mining study was on
different data. So we did all of those things to try to understand what we had in terms of
strategies.
So we asked people what kind of strategies they used and we translated those into
one-word codes and they told us about eight strategies. One was data flow and that was
a male strategy, as you'll see. One was testing and that was a male strategy, as you'll
see. One was code inspection looking at the formulas and that was a female strategy, as
you'll see. One was specification checking, which was a female strategy. One was color
following. Now if there are people in here who are kind of sort of software engineering
people, you know, don't freak out. Say, color following, that doesn't count. Well, it does
because if -- when we asked them their strategies, they said following the colors, then
that counted. Whatever they said was a strategy, was a strategy. To do listing was a
female strategy. Fixing formulas, that was pretty much how it was described as the
strategy, that was a female strategy. And finally spatial, oh, good, we don't have a
gender on this one. Okay. One out of eight, we didn't find gender differences.
So let's look at these counts. The orange diagonal tells you the amount for each
particular strategy. Testing was the biggest. We had 61 participants in this so this is
roughly three quarters of the people mentioned something oriented toward testing.
Talking about fiddling around with the values to see how they turned out or various other
things they could have done that were testing oriented.
Almost three quarters also talked about code inspection, and of those about three
quarters of them talked about the other. So testing people often
also mentioned code inspection, code inspection people also often mentioned testing.
Okay.
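The matrix she's reading off -- diagonal counts of how many participants mentioned each strategy, off-diagonal counts of co-mentions -- could be tabulated like this. The helper and the data are hypothetical, just to show the structure:

```python
from collections import Counter
from itertools import combinations

def strategy_matrix(participants):
    """participants: one set of strategy codes per person.
    Returns (diagonal, pairs): diagonal[s] = how many people mentioned
    strategy s; pairs[{s1, s2}] = how many mentioned both."""
    diagonal, pairs = Counter(), Counter()
    for mentioned in participants:
        diagonal.update(mentioned)
        pairs.update(frozenset(p) for p in combinations(sorted(mentioned), 2))
    return diagonal, pairs

data = [  # made-up stand-ins for the 61 participants' codes
    {"testing", "code_inspection"},
    {"testing", "spec_checking"},
    {"code_inspection"},
]
diagonal, pairs = strategy_matrix(data)
assert diagonal["testing"] == 2
assert pairs[frozenset({"testing", "code_inspection"})] == 1
```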
The next most popular one was specification checking. Half of the people mentioned
that. And of those about two-thirds of them tended to also use testing and/or code
inspection. So those are the biggest ones that were mentioned. I don't have time to talk
to you about all of them today, but I can tell you about those three. I have time to talk
about those.
So data flow. In the words one male put it this way: systematically go from the formulas
that rely wholly on data cells, progressing to formulas that rely on other formula-driven
cells and so on. Basically he's following the references, a very data flow oriented way
of going about things.
Now so everybody who we classified as being -- oops, sorry. Everybody who we
classified as being successful, who talked about data flow, was male. Okay.
We then looked at their behavior and what we found is that for males the number of data
flow instances that we could count, the actual moving back through a data flow chain,
was positively correlated with bugs fixed for males. Not for females, look, it is just all over
the map. Okay.
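The per-group correlation she's describing (data flow instances vs. bugs fixed, computed separately for males and females) amounts to something like the following. The data here is fabricated purely to mirror the shape of the finding, not the study's numbers:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data shaped like the finding: a clear positive trend for
# males, "all over the map" for females.
male_r = pearson([1, 3, 5, 7], [2, 4, 5, 8])
female_r = pearson([1, 3, 5, 7], [5, 1, 6, 2])
assert male_r > 0.9
assert abs(female_r) < 0.5
```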
Now there's a theoretical explanation that we could apply to this. There's been some
very interesting work on information processing styles from another domain, namely the
domain of marketing. And there is a hypothesis that has been borne out quite repeatedly
in other domains called the Selectivity hypothesis. And it comes down to females
statistically are more likely to want comprehensive information before they start taking
action.
Whereas males are more likely to do something that you might think of as sort of a breadth-first
-- sorry, a depth-first thing. So they take the first promising cue and sort of follow it
up. And if it doesn't pan out, back up and try something else. Okay. So there's been a
lot of literature about this and we tend to see it a lot and that could be the explanation for
this phenomenon here. Keep in mind these differences are statistical, right? I mean,
there's no completely typical male or female, so these are all just statistical.
Now there's one feature that is very data flow oriented and that's those data flow arrows.
And we found that for the males usage of arrows correlated in a positive direction very
significantly to their talking about data flow. For the females, not so. And we also found
that the males use those arrows almost twice as often as females. Okay. And we found
this in several other studies, as well. Okay. What about testing, another male strategy.
So here's the way one male expressed it. After correcting everything, I press Help Me
Test, so that's a testing oriented thing and double checked the values, so there's a testing
oriented thing. To make sure they work, next I plugged in values. Okay. So very
interested in testing. Now in terms of statistics, we didn't get gender differences in who
talked about testing. But remember three quarters of the people did so it could have just
been a ceiling effect, but in behaviors we found testing correlated to bugs fixed. Not so
for females.
In fact, boy, oh, boy, did it correlate for the males. So if we looked at successful males
versus unsuccessful males, what we found was that value edits, successful males did
more of it. Value edits, plus the Help Me Test scaffolding successful males did more of it.
Percent testedness, which is, you know, how effectively they're using the features, males
did more of it. But the females, we got nothing, absolutely nothing.
So then, if you compare only the successful males versus the successful females, we
have once again successful males more, successful males more, successful males more.
Okay. Testing was definitely a male oriented strategy.
So let's look at code inspection. This of course was examining the formulas themselves.
One female said, I then looked at the formulas to see if I could find mistakes. In our
environment, just short of showing four versions of the same cell for some reason, but
anyway, you can post up the formula, have it sitting there and you can leave it there and
have others visible at the same time. So it's possible for you to see as many formulas as
you want to, plus the values all at once. So that was the facility they used for code
inspection.
So for the females, when we looked at the number of instances of people leaving up
multiple formulas at once, the successful females did that significantly more than the
successful males. And also when we looked at the number of formulas they had lying
around in all of the instances we got the same results, successful females significantly
more than successful males.
So okay so those are the findings that I have to tell you about. Now in addition we've
started branching out a little bit in a few other environments that we've had time to look
at. So we did a study in PowerShell, actually my student Valentina did that on an
internship with the PowerShell team and found consistent results with the previous ones
that I've just told you, plus three additional strategies because Power Shell is control-flow
oriented and they had a few fewer restrictions, as well, on what they could do. And also
found suggestive results on exactly how and when the strategies actually get used. And I
guess I didn't mention this here, but we have a paper coming out about that for the EED
paper -- or conference.
Our hypothesis at the moment is that when -- for example, when you look at code
inspection, that that's tied to female success in both finding and actually fixing bugs. So
that's what we're trying to do now is really break it down and find out, what are they using
these strategies for and how can we better imagine supporting those?
We also have some emerging studies coming out focusing on strategies in both Popfly
and going back to Excel with a sense-making perspective. So those we're sort of
analyzing and conducting right now.
So that's -- so the strategy results overview is we've found eight -- actually it's up to 11
now, end users, debugging strategies, but the study I told you about, it was eight. Seven
had gender differences. The males, the biggest telling points were with data flow and
testing, which were both positive uses of those strategies for success. For females, code
inspection, to-do listing and specification checking and they also did something called
fixing formulas, which was actually a bad idea.
Now just kind of thinking of Excel because a lot of this is done in spreadsheets, let's see
here. Data flowy. Okay. We know how to do that in Excel. We have those arrows.
You've got that sort of color thing, you know, that you get when you click in a formula.
Don't have much in the way of testing, but at least we can fiddle around with values pretty
easily. So, you know, it's something.
How do you do code inspection in Excel? Well, you memorize a formula and then you
can go look at some different formula and memorize that one and then you can look at
some other formula and memorize that one or you can switch to another view and get all
the formulas, but now you lose the values. Okay. So -- okay. So I think there's some
room here. Okay. All right.
So let's get back to Ashley. Remember Ashley from the very beginning, that person
whose major changed from graphic design to art? Okay. Well so what we wonder
is whether maybe these were because of self-efficacy issues, maybe because of
motivational issues, maybe because of problem-solving style issues, maybe because of
information processing style issues, all of which have been shown in many domains to
have gender differences. However, remember that those are all statistical. Right? And
there is no male that is exactly male and there's no female that's exactly female.
And in fact Ashley is a male. Okay. And yet encountered all of these obstacles because
his particular problem-solving styles. So designing for both genders is about taking down
barriers. Okay. And if we do that, it benefits everyone. So that's that.
I have a whole bunch of papers and so if you either go to the EUSES Consortium site and
click on Gender, or just go to my home page and click on Gender, you'll get there, but I
have a call to arms, so I'm really hoping that you all can help to contribute to this work.
Some of you came in late, but I have these pieces of paper that hopefully are going
around and maybe we can start another one around.
And so what I'm really hoping is that we can get data on a bunch of Microsoft products,
things that you can do as part of your regular day jobs, just be sure to collect gender, too, on
whatever studies are going on. Perhaps either with existing features or if some of you
already have some ideas about things you might want to try, an old feature versus new
feature thing would be great and also we are very, very interested in anything we can get
on strategies, problem-solving strategies.
And what you could contribute would be interviews, surveys, user studies, statistics,
whatever you've got. You can analyze it or not. And our hope for outcomes, well, we
hope there will be a technical report of some sort that will be valuable to Microsoft and
presentations, of course, inside Microsoft. We're hoping to get a paper out of it, too, and
hopefully product improvements.
And so for those of you who came in late, this is what I'll be focusing on for my
three-month stay here that starts toward the end of March and I'll be working with
Shamziek Baugh and Mary Czerwinski and Cory Quinn and Gina Venolia on this. So if
you don't want to scribble anything down on that little piece of paper and would rather just
send e-mail to one of us letting us know what you can contribute, that would be great,
because we'd really like for some data to actually be here pretty early in our stay so that
we can really start tearing into it. So, yeah, hoping you can contribute.
(Applause)
>> Mary Czerwinski: Are there questions?
>> Question: Yes. Because I do a lot with K-12 and product service center, how much --
what is your tie in terms of the research with the Computer Science Teachers Association?
We're trying to create -- there's a problem with this self-efficacy among girls, it's hard --
>> Margaret Burnett: Uh-huh.
>> Question: -- it's happening in games, etcetera. So CSTA is looking at (inaudible)
computer science curriculum. What is the sort of modeling, problem solving, algorithmic
thinking that you need to start like in first grade or 5th grade to create the pipeline? And
have you been involved, are you seeing any good stuff happening? Because by the
time -- you are studying college students and you're starting to go back with that.
>> Margaret Burnett: Right. Okay. So I have a yes and a no for you to answer that
question. The no part is I'm not really directed in that. But the yes part is I'm very aware
of that going on and I can tell you if you don't already know about it, about a great
resource to connect with on kind of the state of the art in that area and that's NCWIT.
Okay. So those of you who haven't heard of that, the National Center for Women &
Information Technology is this wonderful organization whose mission is to collect what everybody
around the country is trying to do and to try to turn it into a science instead of people just
all kind of hacking away independently coming up with solutions that maybe don't work or
maybe do.
So NCWIT is great in that area. We have been studying adults, college students and
post-college students because -- well, I guess for more than one reason, but one of them
is we think if we don't document and understand the effect the way it is among today's
adults we're going to have to wait a really long time before we start removing this glass
ceiling. So there are lots of people age, you know, 18 to 70 who are facing this already.
So that's a piece of the pie we've tried to break off. And then of course we're focusing on
software itself. But I do think that there are big implications for the way software that
K-12 students are using should change, too. But we haven't focused on it.
Yes?
>> Question: So are there general rule -- rules of thumb for software design that, you
know, avoid this style of approach because it tends to be male centric or -- I mean it
sounds like you have very kind of specific information about the scenarios --
>> Margaret Burnett: Uh-huh.
>> Question: -- you are going to test on, but are --
>> Margaret Burnett: Yeah.
>> Question: -- can I generalize that to my product?
>> Margaret Burnett: Well, here is another yes and no for you. So we haven't
progressed to the point of actual guidelines because our research is too early and that is
one of the reasons why we want to generalize it and get data from you all because we
think we might be able to get there.
But I do have two hypotheses based on the prototype work we've done so far. And one
of them is nuanced interfaces. So if you're asking somebody to take a stand, it's right, it's
wrong, okay. Maybe that is not the right thing. And so this nuanced interface thing, not
only did it help to close the gender gap in our study, but also it turned -- it just -- it's just
better. I mean, the men were using it, too. And you think about it, there's more
information there. Now instead of saying it's right or it's wrong, if there were some that
you weren't quite so sure of and then, you know, you make changes and the spreadsheet
still isn't right, you know which ones to come back to.
So by looking at the barriers for one group, what we've done here is come up with
something that's better. So what I might suggest is that if there's a feature in the product
you have that asks for some sort of judgment or perhaps dogmatic stand and you give
some nuances to it, that would be worth trying out, trying some before-after comparisons.
The other one, I have really less information about and so the other thing we've tried is
this strategy thing which is explanations that are not about the features, but about the
approach. And most software help doesn't have that. Some sort of online
documentation help does, but now it's this big thing you have to read as opposed to little
snippety low-cost things that you can get your hands on right away.
So we do have, you know, a mounting body of evidence that seems to help. But -- those
two things that we've tried are still fairly early in the game, so I would hesitate to actually
call them guidelines at this point. But they're things that you can try.
Yes?
>> Question: So to follow-up on that question, then, aligning with the Selectivity
hypothesis --
>> Margaret Burnett: Uh-huh.
>> Question: -- does that not then apply to -- is it because you've entered into the
experience, you've then sought help --
>> Margaret Burnett: Uh-huh.
>> Question: -- and been presented with this volume of support, the Selectivity
hypothesis not apply to that because that's providing a full-in picture rather than an
incidental or nuanced here, you know, as needed to setup?
>> Margaret Burnett: So the way we implemented our strategy approach was also as
needed. And so for example instead of saying everybody's got to read this stuff, the way
we implemented it is in the tool tip there's a button you can click that says either show me
or tell me. And then that causes that other thing to come up. So I don't recall our
statistically measuring whether the females actually asked for it more. We've seen that
qualitatively. But --
>> Question: It's instances of help, show me and tell me, that's demonstrative versus
narrative?
>> Margaret Burnett: Right. Yes. And some people learn better one way and some the
other. But one reason we really thought it was important to have a video version besides
the learning differences is that self-efficacy theory says that one way to boost your
self-efficacy is if you see someone with whom you identify succeeding at the same task.
And so in those little video snippets, I don't know if you noticed, but we have a male and
a female working together on it. And in one of our qualitative studies it was a
think-aloud and we were videotaping and we went back and looked at it and at that time the
female actress was Nira Jaw(phonetic), who a few of you may know. And every time she
said something, the subject smiled. So she was definitely identifying with her.
>> Question: So is that quality either -- the "I do, we do" model with the way the videos
were constructed or was it just simply --
>> Margaret Burnett: I wish I could say that it did, but it didn't. (Laughter) Yeah. So
yeah. And we're still iterating on those two, so it's quite possible that would have been an
improvement to it even more. But I think what you were originally asking about was how
that related to the Selectivity hypothesis and intuitively we thought that the females would
probably ask for more information, but as I said, I'm not sure we've measured that
statistically. We have seen it qualitatively, but I'm not sure we've actually tried to see if
that's true in the numbers. But it seems to satisfy the ones who seem to want that more
information.
>> Question: I also wanted to ask another follow-up question with regard to the
judgment.
>> Margaret Burnett: Yeah.
>> Question: So my role at Microsoft is product manager for a user experience
(inaudible) web and we're implementing the ability for people to basically rate the
knowledge that's being consumed.
>> Margaret Burnett: Uh-huh. Uh-huh.
>> Question: And so we're having discussion right now whether a graduated scale of
one to five stars, what is on a best library.
>> Margaret Burnett: Uh-huh.
>> Question: Or a dichotomous, which is like what Dig and some of these other sites
use ->> Margaret Burnett: Uh-huh.
>> Question: -- to let us know thumb's up, thumb's down.
>> Margaret Burnett: Uh-huh.
>> Question: You're an advocate of providing gender on a graduated scale?
>> Margaret Burnett: Uh-huh. Uh-huh. Yeah. And we haven't personally looked into
this one, but probably there's literature that says that females are more likely to be polite
about these things, too. So if you have a thumb's up, thumb's down, you know, more
people are going to do the thumb's up probably if they are females. This is my guess.
>> Question: Uh-huh. (Inaudible) --
>> Margaret Burnett: Uh-huh.
>> Question: -- pretend to think it's them and not the product.
>> Margaret Burnett: Uh-huh. Right. Exactly. Yes?
>> Question: Thank you.
>> Question: A couple other things, first of all, kind of like what you guys were just
mentioning. Is that when they go on a research, they usually go into (inaudible) work
tools. One of my regular routine questions that I ask people (inaudible) was to ask them
to mention a software tool or any product that they had used that made them feel
stupid.
The product that was by far most commonly mentioned was Excel. So I found it ironic
that was the control tool that you chose for this particular one, because by far
Excel was mentioned most commonly as being the product they chose.
>> Margaret Burnett: Right. And I think it's not ironic because the thing is maybe that
just turned out to be a fantastic example of a male-oriented tool that's out there and so it
maybe gave us the magnifying glass we needed to really see the phenomenon. We'll
see how widespread it turns out to be.
>> Question: (Inaudible) I didn't split in that tallying with (inaudible) --
>> Margaret Burnett: Uh-huh.
>> Question: It was routine (inaudible).
>> Margaret Burnett: Uh-huh. Uh-huh.
>> Question: In the range of people.
>> Margaret Burnett: Uh-huh. Yeah.
>> Question: The other point being that in the psychological research world there's
quite a lot of existing information about the way that women approach problems, the way
that women out there approach the world in general in more of an environmental position,
whereas men tend to be approaching (inaudible) on a more kind of (inaudible) basis. When
I'm thinking about, you know, the entire environment around a specific thing that they're
doing --
>> Margaret Burnett: Uh-huh.
>> Question: Whereas men are more likely to be focusing on specific -- I -- how do I
achieve this goal --
>> Margaret Burnett: Uh-huh.
>> Question: -- in a much more (inaudible) thinking in a broader context and --
>> Margaret Burnett: Uh-huh.
>> Question: -- you think that relates to the trends you are seeing here or is that --
>> Margaret Burnett: I think it could well and we've seen -- we've seen some literature
that sort of kind of alludes to that, but I think we maybe haven't read all the right papers
that we should, so if you actually have some references that you could contribute, that
would be super.
>> Question: Yeah. It's a really big area.
>> Margaret Burnett: Yeah, I know, we won't be able to read all of it, but -- (laughter).
You know, already we've been trying to keep our finger on something like five different
domains as we try to gather the theories that are really relevant here, but we're always
looking for really important things we've missed and it sounds like we could do some
reading there so that would be great.
>> Question: And another quick thing. While I was doing (inaudible) research
(inaudible), one thing that we included in our general thinking about this that you guys
might want to think about, as well, is use of social and media tools, some unplugged
sites, for example --
>> Margaret Burnett: Uh-huh.
>> Question: -- where they're maybe asking expert kind of opinion --
>> Margaret Burnett: Uh-huh.
>> Question: -- and things of that nature, thinking about gender differences and use of
those sorts of social --
>> Margaret Burnett: Yes.
>> Question: -- supported tools, as well.
>> Margaret Burnett: Yeah.
>> Question: So I would expect that they would be pretty strong gender differences in
(inaudible).
>> Margaret Burnett: I would think so, too. And we haven't, you know, we tried to focus
on just one sort of style because, you know, that way we could really try to get some data
and compare it. But, you know, if there are people working on that sort of tool, that would
be so great because we do strongly suspect that could make a big difference. So, yeah.
Did I see another hand somewhere? Maybe not.
>> Mary Czerwinski: All right. Well, thank you again and...
(applause)