>> Eric Horvitz: It's an honor to have Sven Seuken with us today. Sven is a repeat intern, also an MSR fellow as well as a Fulbright scholar working at Harvard University in what we all believe is his last year of his dissertation work before he goes off to become a professor somewhere in the world, hopefully not too far away from where we are, because we like collaborating with him. He's been working with David Parkes's team, and at Microsoft Research he's been working with Kamal Jain and Mary Czerwinski, Desney Tan and myself over the last couple of years.
He's been particularly interested in electronic market design, the application of mechanism design and game theory to electronic markets and systems. He's taken a particular interest in a very interesting area that tries to join the complexities of markets with potentially naive consumers who don't understand the details of how markets actually work, looking at concepts like what's the user interface experience with markets, how do people control complex markets, and what are the implications of hiding aspects of markets from people.
In his recent internship work, which he'll be talking about today, he's been exploring more aspects of how UI design and optimization meet markets. And you'll be hearing more about that right now.
So it's an honor again to have him here today, Sven, on market design meets user interface design.
>> Sven Seuken: Thanks very much for this wonderful introduction, and it's great to be back here. And I'm going to be talking today about the work I've done last summer with Eric, Kamal,
Mary, and Desney on market design meets user interface design.
And I want to start with this little example to motivate what we've done. This is just a picture of a cafeteria salad bar. And research has shown that just by rearranging the food choices in a cafeteria we can increase or decrease the consumption of certain food items by up to 25 percent.
This is a Web site I just discovered yesterday where users go online shopping, and instead of letting them browse through all the possible laptops out in the world, this Web site gives them exactly one laptop. And they can't do anything else but buy this one laptop. So we've taken all choices away from them. And many users are happy with this Web site.
This is another example, about 401(k) plan design, where researchers, economists, have shown that giving employees more choices in the plan selection actually leads to lower enrollment in the first place.
So we have these kinds of results contradicting standard economic theory, and Thaler and Sunstein have captured this contradiction quite nicely by making the distinction between humans and econs. On one side, the econs which the economists usually talk about seem to be perfectly rational, have unlimited time to make a decision, and have unbounded computational resources for deliberation.
However, we're dealing with real humans in the real world, and these humans have cognitive costs. They have bounded amount of time and in particular they have opportunity costs for doing something else than deliberating about the problem at hand, and they have bounded computational resources.
So as a consequence, while the way in which we offer choices to an econ doesn't matter, it actually matters a lot for humans. And Thaler and Sunstein's way of putting this is that the choice architecture matters.
And in this particular project what we looked at is how can we design the choice architecture for an electronic marketplace. And this is nothing else but electronic market design for real humans instead of for econs.
Here's a brief outline of my talk. Next I'm going to dive a little deeper into this intersection of user interface design and market design and motivate why I think it's an important field of study.
And I'm going to motivate one particular problem that we looked at and in which we situated all our research, which I call the 3G bandwidth allocation problem.
And then I'll show you in detail an economic experiment that we designed and ran over the summer and all the results that we have from the statistical data analysis. And if I have time, I'll give you a little bonus chapter on a hidden market with fixed prices that we've started to think about recently. And then I'll conclude.
Okay. Here I'm depicting this intersection of user interface design and market design. And I want to motivate why user interface design is really important for markets for at least four reasons. First of all, the UI is the first point of contact for any user interacting with the market, and it will decide whether the user even stays in the system or goes to another market.
But, more importantly, the UI design constrains the kinds of market designs that we can successfully employ. And the user interface defines how the users express their preferences in the market; depending on what kind of choices we give them, whether we let them enter values or move sliders, this defines how we learn about the users' preferences.
And last but not least, and probably most importantly for our talk, the user interface defines the amount of cognitive costs we put on the user, how complex is this UI and thereby the market for the user.
So in general we'll have this tradeoff between a more complex market or UI on the one side and a more expressive market on the other side. And we'll have to decide where the sweet spot lies.
So in our research last summer, we introduced this new paradigm of thinking about market user interfaces. And I think this is best defined by these two questions: namely, what information do we display to the user about the market at any point in time, and what choices, and how many choices, are we offering the user?
And the research question we were interested in is what is the optimal market user interface given that users have cognitive costs and are not perfectly rational like the econs.
And here is what I think is one of our main contributions: essentially, a methodology to optimize market user interfaces. What we start with is a particular market user interface, like a dummy interface, and then we run a user experiment on that dummy. So what we did is we designed a market game that we let the users play, and because we designed the game, we knew which choices were optimal at any point in time, and we could measure users' performance and compare the performance with the optimal play.
Based upon this experiment we could then feed the results into a learning algorithm, which gave us a user model predicting future behavior of our users in similar environments. So based on the experiments from here, we learned and then built the user model. And then we can put this user model into an optimization algorithm and essentially search over the whole design space for the user interfaces. For each design we compute the expected value based on this user model, based on the real human behaviors, and then we choose the design with the highest expected value, leading us to the optimized market user interface.
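The loop just described can be sketched in miniature. Everything in this sketch is illustrative rather than taken from the actual system: the user model is a simple softmax over Q-values (a noisy user favors, but does not always pick, the best choice), and each "design" is just a hand-made menu of Q-values.

```python
import math

def choice_probs(q_values, beta=2.0):
    """Softmax user model: higher-Q choices are likelier but not certain."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def expected_value(q_values, beta=2.0):
    """Expected value a noisy modeled user achieves on this menu of choices."""
    return sum(p * q for p, q in zip(choice_probs(q_values, beta), q_values))

# Hypothetical design space: each design is the menu of Q-values its
# choices would present to a typical user.
designs = {
    "3 choices": (0.0, 0.5, 0.9),
    "5 choices": (0.0, 0.3, 0.5, 0.8, 0.9),
    "6 choices": (-0.2, 0.0, 0.3, 0.5, 0.8, 0.9),
}
best_design = max(designs, key=lambda name: expected_value(designs[name]))
```

Under this toy model, adding choices raises the optimal achievable value but also gives the noisy user more ways to lose value, which is exactly the tradeoff the talk is about.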
Okay. So this is an abstract view. Let me make this all a little more concrete by diving into this particular application. Yes.
>>: It seems you're optimizing for the user, not for the guy who's selling to the user, which is usually how Web sites are designed.
>> Sven Seuken: That's a good question. And let me answer yes and no. In this talk we are mainly interested in overall efficiency, which is defined as the sum of all users' utilities. It is true that if you are concerned about maximizing the revenue of a business or of a Web site, you might design the choice architecture differently.
In many markets, in particular in very large markets where you have competition, usually you want to, you know, have very high efficiency in that market as well, because if you lose too much efficiency, another competitor can come in, design a more efficient market, and users will switch.
So that's why, you know, in recent years many economists have argued for efficiency being a good optimization objective.
Okay. So the particular problem we looked at as a motivating example is that of 3G or 4G bandwidth allocation for smartphones. There is currently a shortage of this bandwidth, and one sign is that AT&T has recently dropped their unlimited data plan. And there are multiple reasons. One is we have an exponentially growing number of users, but at the same time it's very expensive to improve the existing infrastructure.
And demand actually has high variance. While supply is fixed -- tomorrow there's going to be the same supply of that bandwidth as today -- depending on the time of day and the location, the demand that users have for this bandwidth is highly variable. I mean, in New York or San Francisco it's almost impossible to get online, and in Denver it's probably not a problem. And in particular the time of day plays a big role in terms of what the demand is.
The current approach for solving this shortage is simply that whenever there's more demand than supply, you slow down every user. Or, to prevent this from happening in the first place, we simply constrain the total data usage of every user so that every user just consumes at a very low level.
Obviously this introduces large economic inefficiencies, and we believe we can improve the system by going to a market-based solution.
So the assumption is that sometimes users are doing tasks of high importance, sometimes they're doing tasks of low importance, and users might be willing to accept a lower performance now
when they're doing a task of low importance, like they're just updating their Facebook status, if we can guarantee them a high performance later, perhaps when they are e-mailing an important presentation to their boss or something like this.
>>: Why did you make that assumption? Just curious.
>> Sven Seuken: We'll make that assumption essentially to make this point. Let me -- please ask again if this doesn't make it clear. So what we want to do is we want to shift the usage of some users to a different point in time. So whenever there is too much demand and if this is the maximum supply level that we have in this market, we need some users to stop consuming and consume at a later point in time when there's enough supply.
So if I can convince some of these users to not consume at all or to consume at a lower rate, then I'm fine in this market. So this is why we need this assumption that some users will be fine with sometimes consuming at a lower rate if we can guarantee them the higher rate later.
So the idea is very simple. When we have more demand than supply, we will ask users to interact, through some kind of market interface, with a bandwidth market, and then we update the prices to balance demand and supply until we come out fine.
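The price-update step can be pictured as a simple tatonnement-style adjustment. The demand curve, starting price, and step size below are made-up numbers, not part of the actual proposal:

```python
def clearing_price(demand_at, supply, price=1.0, step=0.5):
    """Raise the price until demand no longer exceeds the fixed supply."""
    while demand_at(price) > supply:
        price += step
    return price

# Illustrative linear demand curve: fewer users want bandwidth right now
# as the price in tokens rises.
def demand(price):
    return max(0.0, 100.0 - 10.0 * price)

price = clearing_price(demand, supply=40.0)   # stops once demand <= 40
```

In a real deployment the price update would of course have to react to measured demand rather than a known curve; the sketch only shows the balancing idea.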
So, by the way, this raised some interesting net neutrality questions, because we're no longer treating every packet equally in this market. However, it is actually aligned with recent agendas of researchers on the net neutrality question, because we're not giving the provider the power of deciding whom to favor; it's the user whom we put in charge -- do I want to have slow or fast Internet right now -- and that's actually in line with some --
>> Eric Horvitz: [inaudible] put the question in, it was a good one, [inaudible] there's a domain that was selected here for doing the studies as a sample domain, but the principles and the ideas apply to many different kinds of domains, whole different [inaudible].
>>: Fair. I'll wait till everything -- because this domain I'm particularly interested in.
>> Eric Horvitz: Okay.
>>: It's a very challenging domain.
>> Sven Seuken: Yeah. Thanks.
>>: If we have classes of memberships, one being like a priority membership where you get the supply whenever you like, and a medium membership where you get the data whenever available -- and with the market structure you assign different prices in different locations for these different classes, dynamically -- how would a system like this compare with that? Because in that system you are not doing the auction every time you need something, but you have a pattern and you make decisions maybe [inaudible].
>> Sven Seuken: Yeah. I think to answer that question, you would definitely have to write down a formal model and state the assumptions clearly. Obviously, by having a tiered pricing system, you can remove some of this problem. Essentially what you're doing is making the people who are not willing to pay much simply consume at a lower rate, while the people who are willing to pay a lot, we allow them to consume a lot.
And that's what some of the cell phone providers are doing right now. Tiered pricing was introduced by Verizon, for example, a month ago, and I think ideas like that are being rolled out on Verizon's 4G.
But that idea would still lead to certain economic inefficiencies; namely, let's say you chose the low-tier pricing and there's currently enough supply that we could allow you to consume at full speed, but we don't allow you because you chose the lowest tier, right? That's losing efficiency -- that's losing social welfare. And our approach would avoid that.
So here's a sample UI of what the interface for this market could look like. We didn't actually build this on a phone, but this is how I could picture it. Let's assume a user gets 50 points a month that he can spend, and he doesn't have to interact with this market at all whenever there's enough supply for everybody. But whenever there's a shortage, let's assume every three days he gets this little popup and has to decide right now: okay, do I want to stop surfing, or do I want to keep surfing but at a low speed and pay two of my 50 points, or do I really need full speed right now because I'm doing something really important, so I'm going to pay 10 points.
Okay. This is what this UI could look like. You look at this, you think about it, you click, and then you continue what you did before.
So what can we change about this market UI? Well, there's a couple things we can change. And we consider four different UI design levels. The first one is really the easiest one. How many choices should we offer the user. The second one is should these prices that we're offering the user be fixed or should they change over time.
The third one is do we have fixed or situation-dependent choices. What this means is do we also -- if we decide we're going to offer the user four choices, do we always give him the four same choices, or do we give him different choices depending on the context that we think he's currently in.
And then last but not least, this is the question of how do we optimize the UI. If we have some model of user behavior, we can optimize the UI either for perfectly rational behavior or we can optimize the UI assuming that the users are sometimes going to make mistakes. And we're going to analyze all of these design levels in our experiment.
So now I'm going to move on to describing the details of the experiment and then to our results.
So let me show you a quick demo of the game that we designed. Here we go. This is not the game; this is just my way of starting the game. And so here we go. This is essentially the application I wrote to run the experiment. It is the game where we're showing the user, similar to the screen shot you saw before, five choices, and every choice has a speed: zero, 100, 200, 400, 900 kilobytes per second. Every choice has a price in tokens, from 0 to 18 tokens. And, in contrast to the real application, in this game we also show the user the value of that choice, going from minus 1.2 all the way up to 1.7.
>>: What does that mean?
>> Sven Seuken: I'll explain that in a second. We give the user 30 tokens that he can spend over six rounds. Every decision he makes is one round. Whenever the user clicks on one of these choices, we're going to subtract the number of tokens from his total amount of tokens and we're going to add the value of that choice to his score.
So this is essentially describing how valuable getting 900 kilobytes per second is to me in that particular situation. So we're defining that value for the user. Let's assume the user clicks here. He gets to the next round, five rounds left; we subtracted eight tokens and we added 0.9 to his score. And a new distribution of values was chosen at random for this round -- the values keep changing -- so where just in the last round we had minus 1.2, now we have minus 0.7.
So now the user has to make another decision. And he has to essentially allocate his tokens, the 30 tokens, over the six rounds that he's playing. If he runs out of tokens too early, at some point he will no longer be able to make a selection from these upper choices, because they're too expensive and he can't afford them anymore, and then he will be forced to take the choices that have a negative value.
And at some point the game is over. We show the user what his total score was -- in this case, 70 cents -- and we show him his accumulated score overall.
And it's important -- this is very important -- because what we actually did is we let the users play, after an initial training period and detailed explanations, about 50 games, and we paid them the total amount of money they made in all of these games together. And I think the maximum payoff was around $40. The median was around $20.
So I explained the basic structure of the game. In the upper left corner we have this timer that's ticking down, and that is putting a time constraint on the user. And that was important for our experiment because -- remember, we're studying markets where cognitive costs matter. So we have to limit the amount of time that the user has to make a decision; otherwise, he could just solve this game optimally on paper, which we didn't want.
So we had to put him under some time pressure. And in the fixed time constraint condition later, we put him either under a seven-second or a 12-second per-round time constraint.
This task category that we're displaying here is essentially a key for the user to the distribution of values that he can expect. And before the game starts, we show him a little table. And he has time to look at this table. He doesn't have to memorize this table, but essentially he should build a model of what to expect when he's playing the game. And he sees: okay, if my task category is high importance, this is the distribution of values I will see in the game; for medium importance, this is what I will see; and for low importance, this is what I will see.
And we also show him the five different choices he gets in the game and the different prices he can expect over the time course of the game. Yes.
>>: If something's of low importance, why would it be more harmful to have a hundred kilobits per second for it than for high importance?
>> Sven Seuken: Something of low -- ah. You mean this minus 0.5? Yeah, that's a good point. The semantics of low, medium, and high importance was essentially meant to -- we could also call this blue, red, and green -- to give the user three signals about different situations he could be in, or different value distributions he could expect.
What we did is we generated a utility curve, and at some point these utility curves for the different classes of importance would cross and overlap. So I agree there's a semantic inconsistency at this little point, but I don't think it mattered too much for the experiment.
Okay. So let me get back to this. So this is the game that the users played. So here is the real time limit that they saw, seven seconds per round. When they ran out of time, they got a warning. Three seconds before the end of the round, the computer would start beeping, and we told them beforehand if they don't make a selection by the end of the seven seconds, the computer automatically selects the lowest choice for them, which is often the -- not the best choice. And so they had to avoid that by all means.
So I already told you: to study cognitive costs, we made the game challenging enough so that it was actually not easy to always find the optimal choice.
That's why we designed this multistep optimization game and we put the users under time pressure. The first time constraint that I've shown you is the exogenous time constraint, where we fixed the time limit per round -- in our experiment, at seven or 12 seconds.
But we also had a second time constraint, which I call endogenous, where we give the user four minutes overall to play an unlimited number of games. Meaning, once the user finished six rounds, the game ended and the next game started. So now the user had the challenge of deciding, okay, how much time do I want to spend on each decision, knowing that the longer I spend on each decision, the fewer games I get to play in total. So the decision time was now endogenous to the overall meta game.
And I told you, to provide the users with the proper incentives, we showed them the value of each choice and paid them based on how well they did in the overall game.
Okay. So if you're familiar with Markov decision processes, then you've probably already noticed that the game we designed is essentially an MDP. We have the state space consisting of the current budget that the user has left, the current round, the current values of the choices we're offering the user, and the current price vector.
And the action space is simply which choice does the user select. And the transitions are deterministic for the budget and the round, and they are random for the values and the prices.
And if I didn't mention this before: in every transition from one round to the next, there was a one-in-three chance of each of the three categories showing up -- the high, medium, and low importance -- and a one-in-three chance of each of the price levels showing up.
What we did is, for any of the games that we let the users play, we solved the game optimally by solving the MDP; thus, for each situation that the user could face, we computed the Q-value of each action, Q of a. And the Q-value is just the terminology for: for every action in a particular game -- meaning, if we had four choices, for each of the four choices -- we compute the optimal expected value of taking that choice now and playing optimally for the rest of the game.
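The backward induction being described could be sketched as follows. The prices and value tables here are illustrative, not the actual experiment parameters, and for brevity the prices are held fixed, whereas the real game also randomized the price level each round.

```python
import functools

# Illustrative parameters: four choices with fixed token prices, and three
# equally likely task categories giving the per-choice values each round.
PRICES = (0, 2, 6, 10)
CATEGORIES = {
    "low":    (-0.5, 0.1, 0.3, 0.4),
    "medium": (-0.8, 0.2, 0.6, 0.9),
    "high":   (-1.2, 0.3, 0.9, 1.7),
}

@functools.lru_cache(maxsize=None)
def V(budget, rounds_left):
    """Expected value of optimal play, averaged over the random category."""
    if rounds_left == 0:
        return 0.0
    return sum(max(Q(budget, rounds_left, cat, a) for a in range(len(PRICES)))
               for cat in CATEGORIES) / len(CATEGORIES)

def Q(budget, rounds_left, category, a):
    """Q-value of choice a: its immediate value plus optimal continuation."""
    if PRICES[a] > budget:
        return float("-inf")               # choice is unaffordable
    return CATEGORIES[category][a] + V(budget - PRICES[a], rounds_left - 1)
```

For example, `V(30, 6)` would be the optimal expected score of a fresh game with 30 tokens and six rounds under these made-up numbers.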
So the strongest finding from the data -- which is also very positive -- is that users understand and properly detect these Q-values. How this shows up in the data is that we have a very high correlation between the Q-value of a particular action and the decision that the user actually made.
One second. So, in particular, the users are not myopically going for the highest value of the four choices that we're showing them, but they're looking forward: okay, how expensive is that choice now, how many rounds do I have left. So they actually are properly planning forward given the uncertainty of the game. Yes.
>>: Are these users [inaudible] or are they typical members of [inaudible]?
>> Sven Seuken: We recruited 60 people from the Seattle area, and none of them were computer scientists. We also excluded people with a math, physics, economics, or statistics major, and no user-design-experienced people. So basic skills. Once in a while -- I think out of the 60 I had one finance major because I forgot to exclude finance from the set -- but, you know, most people had a very basic level. But all of them had a college degree, at least a bachelor's. So we wanted people to at least understand what it means for a choice to have a one-in-three chance or something like this.
>>: Keeping in mind that [inaudible] decision analysts are the worst decision-makers.
>> Sven Seuken: Okay. Yeah.
>>: So did the optimal Q-value have any structure in this case?
>> Sven Seuken: That is a good question. And we tried to design the game such that it doesn't.
When I started designing the game, you know, it went through a lot of iterations. And at the beginning I came up with games where it was very easy to see which choice was optimal. For example, when I had a game with three choices and three categories, there was a simple mapping: for the high-importance category you always choose the top choice; for medium, the middle choice; for low, the lowest choice.
But I kept changing the value curve and I kept introducing uncertainty, in terms of prices going up and down and values going up and down, such that there was no longer an easy mapping, to make the game challenging enough. And, you know, the results show that if there had been an easy mapping, users would have detected it. But there wasn't one.
So let's first look -- so I told you that these Q-values are highly predictive of the users' choices.
Let's look at this in a little more detail. The first thing I'm looking at is games that had a fixed time limit. And I'm doing a binary logistic regression on the variable opt choice; that is, did the users make the optimal choice among the set of choices or not. So it's a binary variable. That's why I'm doing a binary logistic regression.
And what we see here is the Q-value difference between the top two choices. And it shows up as highly significant in terms of predicting this binary variable opt choice.
So p less than 0.001. And as long as this term is larger than 1, you have a higher probability of selecting the optimal choice when this value goes up.
So the fact that it's over a hundred means it's incredibly predictive in terms of making the optimal choice.
So the larger the difference between the first-best and the second-best choice -- the larger the difference in Q-values -- the higher the probability that the users found the optimal choice.
Which -- one second -- which intuitively makes sense, because when the Q-values of the two top choices are close to each other, it's not easy to figure out which one is really the best. And the further apart they are, the easier it is to see. Yes.
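The flavor of this analysis can be reproduced on synthetic data. The sketch below is not the actual analysis (which a statistics package would handle): it generates fake trials where the probability of finding the optimal choice grows with the Q-value gap, then fits a logistic model by gradient ascent. A fitted coefficient whose exponential (the odds ratio) exceeds 1 corresponds to the "larger than 1" term mentioned above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)   # residual drives the gradient
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Fake trials mimicking the reported effect: the bigger the Q-value gap
# between the top two choices, the likelier the optimal choice is found.
random.seed(0)
q_diff = [random.uniform(0.0, 2.0) for _ in range(500)]
opt_choice = [1 if random.random() < sigmoid(-1.0 + 3.0 * x) else 0
              for x in q_diff]

b0, b1 = fit_logistic(q_diff, opt_choice)
odds_ratio = math.exp(b1)    # above 1: a bigger gap raises the odds
```

The coefficients -1.0 and 3.0 in the data-generating step are arbitrary; the point is only that the fitted slope, and hence the odds ratio, comes out positive when the effect is there.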
>>: So did you have a dummy variable for every participant?
>> Sven Seuken: I -- did I have a dummy variable? No. What I did is I took all the -- are you worried about --
>>: The fact these are not independent variables.
>> Sven Seuken: Right. What I did is -- for efficiency I demeaned all the values -- so let me think about what I did there. For efficiency I demeaned all the results and compared the variances to take care of that.
For this effect, I think you're right. I didn't do that yet. So what you're worried about is that this measure is statistically not the right one.
>>: [inaudible] participant who is brilliant and none of the other participants and because you're counting these nonindependent events it's -- you may be overstating your significance.
>> Sven Seuken: Yeah. Let me think about this. Yeah. I think you're right. I have to do that --
I have to go back to that analysis and introduce a dummy variable or do a repeat measure analysis. I think that's --
>>: Just compare the means. I mean, they should be far enough off from random that you can prove they're statistically doing better than random. I think it's the same thing you're getting.
>> Sven Seuken: Right. So when I look -- on the next couple slides, when I go to efficiency, I looked at the means. And then that effect doesn't show up, I think. Here I could do the same thing. Yeah. So it's a good point.
Let me move on to decision time. So the worry is that when we take care of the fact that we have multiple nonindependent results here, this statistical significance will drop. But the effect was so strong, and I looked at the data in such detail, that I have high confidence that even if we look at the means the same result is going to come out.
Now we look at what happens with decision time as we look at the Q-values of the different choices. Here we're doing a linear regression on the dependent variable time taken per round. So now we are no longer looking at the games with a fixed time limit per round, but at those where users had an overall time limit of four minutes.
And here what we're seeing is that the Q-value difference, again, highly predicts the amount of time taken for a decision. Essentially what this is saying is if the Q-value difference goes up
from one to two, the users take on average three and a half seconds less to make a decision. And
I think the average decision time was somewhere between three and six seconds, depending on the particular game they played. So higher Q-value difference, much easier, much faster to make a decision.
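The decision-time regression can be illustrated in the same spirit, again on synthetic data: the slope of minus 3.5 seconds per unit of Q-value difference echoes the number in the talk, but the data points themselves are simulated, not real experimental measurements.

```python
import random

def ols(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Simulated decision times: one extra unit of Q-value difference shaves
# about 3.5 seconds off the time taken, plus some Gaussian noise.
random.seed(1)
q_diff = [random.uniform(0.0, 1.5) for _ in range(200)]
times = [6.0 - 3.5 * x + random.gauss(0.0, 0.5) for x in q_diff]

intercept, slope = ols(q_diff, times)   # slope recovers roughly -3.5
```

The negative fitted slope is the regression analogue of the finding: easier decisions (bigger Q-value gaps) get made faster.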
Now that we've seen that the Q-values are the most important predictors of users' decisions, let's move on to the four user interface design levels. So this was the overall flow that we wanted to follow. And here we started with the first experiment, where we had 40 users and the first two design levels: we varied the number of choices and we varied whether we had fixed or changing prices.
Based on the data from these two experiments, we did the reoptimization, and then in the second experiment we tested, okay, what was the effect of optimizing the UI for optimal or suboptimal users, and what was the effect of having fixed or changing choices.
So we'll start with these two design levels in the first experiment. Here is a game with three choices, where I'm offering the users zero, 200 and 300 kilobytes per second. With four choices I'm offering them zero, 100, 200, 400. This is what it looks like with five choices, and this is what it looks like with six choices.
So the users played different games with more choices. And the choices were optimized: I searched the design space and found those four or five or six choices that were optimal, given that I needed to give them four or five or six choices.
So now let's compare this. The optimal value obviously keeps increasing as we give the users more choices. That's of course because the more freedom you have in making decisions, the better you can do overall.
But the error rate of our real users playing also went up, which is expected. With more choices, you can make more mistakes. So also the value loss per game keeps going up, but that is relative to the maximum amount you could possibly make.
So let's see if these error rates are statistically significant. If we again run the binary logistic regression on the optimal choice variable, we see that giving users more options made them less likely to find the optimal choice. And that is statistically significant, with the caveat that I have to do the demeaning.
So that effect is important. So what happened to overall efficiency? This is really what we're interested in. And this is the resulting graph. Blue is the optimal value of the game, which keeps going up as we increase the number of choices. And red is the curve that the users actually achieved. The value keeps going up as we increase from three to four to five choices, and then it goes down as we go to six choices.
So this is the curve that we had hypothesized would happen: that at some point, as we give users more expressiveness, the overall efficiency actually decreases.
Unfortunately, these results are only statistically significant going from three to five and from three to six; this drop here is not large enough to be statistically significant.
So that's unfortunate, but it happens. We only had 20 users in that particular experiment. I believe, you know, either we need more users to get a statistically significant effect, or we add a fifth category with seven or eight choices. At some point we're going to see this effect clearly: that adding more expressiveness for the users actually decreases efficiency.
Yes.
>>: How much [inaudible] six choices were one long row instead of two by three [inaudible]?
>> Sven Seuken: I agree, it's a long list of choices, but the users had -- we trained them for about half an hour on the game, and they had a mouse, and it's really easy enough -- you don't need five seconds to move from the top to the bottom of the screen, so I don't think that was the problem.
I actually thought about keeping the size of the game constant and changing the size of the buttons, but that would have introduced other problems of its own, so I decided to keep it this way.
So, yeah, as I said, this is the curve that we had predicted. Unfortunately this drop at the end didn't come out statistically significant, but nevertheless, this is the curve we got.
Let's look at the second design level: fixed versus changing prices. What this means is that under the fixed-prices condition, for the same choices -- zero, 100, 300, 900 -- we would always have the same prices: zero, two, six, and 18. When we had changing prices, we had three different price categories.
When we're currently in the cheap world, prices would only go up to nine tokens for 900 kilobytes per second. In the medium world they would go up to 18, and in the expensive world they would go up all the way to 27. Every round, one of these three levels is chosen at random, and users had to cope with that.
So what happened? Let's look at the error rate again, first looking at the games with the fixed time constraint. And here we see that -- this is now a binary variable for whether we had changing or fixed prices -- with changing prices, the probability of finding the optimal choice actually decreased. This value is less than one, so there was a drop in finding the optimal choice. And that drop is slightly stronger than the corresponding term for the number of choices left.
So what this means is that having changing prices instead of fixed prices has a slightly more negative effect than giving the user one additional choice, in terms of the increase in the user's error rate.
So let's look at the effect of changing prices on decision time. Now we're looking at the games with the four-minute overall time limit. On average, the users took 4.8 seconds to make a decision. And now we're again doing a linear regression, on the time per round that the users have taken.
And here we're seeing that with changing prices instead of fixed prices, the users take approximately half a second longer to make a decision. And half a second is approximately 10 percent of the average decision time. So it takes them slightly longer to reach a decision, and, as we've seen on the previous slide, they make more mistakes.
So this doesn't say anything yet about the resulting effect on efficiency. And I'm postponing this to a later discussion, because analyzing efficiency with changing versus fixed prices is actually tricky: it changes the market structure completely, and you cannot analyze efficiency in isolation, because it's not possible that all users choose the high speed at the same time, or that everybody chooses the low speed -- which would happen if everybody reacted in the same way to the changing prices.
So to analyze the efficiency, you essentially have to build a market model and consider the effect of what happens if a whole economy now is facing fixed or changing prices. So I'll get back to that later.
Now moving on to the third design level: whether we had fixed or changing choices. The problem is that we don't know users' values in the real world. When they're facing a decision, they have these four speed levels, and we don't know what a user's value is for, say, 300 kilobytes per second.
But every choice the user makes is, to some degree, a signal about the user's value, because we can learn a mapping from the current context to the users' [inaudible] value estimate.
So if every time the user's currently in a Facebook application and we present him this UI and every time he selects the lowest choice, then that -- then we can learn something from that about an estimate of the user value whenever he's on Facebook.
Whenever the user's currently in Outlook trying to send an e-mail and we present him these choices and he selects the full speed, this is another value estimate that we learn.
So if we take these learned mappings and present the users with a situation-dependent choice set, then we end up with something like this. In an ideal world, if we know, okay, currently a low-importance task, or another word for this would be Facebook, let's give the user these choices, zero, 100, 200, and 400 kilobytes per second. Medium importance task, we give him these choices up to 600. And high importance we give him the choices all the way up to a thousand. So now we're optimizing the choice sets for the situation that the user finds himself in.
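One way to picture this learning of situation-dependent choice sets is the sketch below. It is an assumption-laden illustration, not the study's algorithm: the context labels, the median-based value estimate, and the "closest k speeds" rule are all invented for the example.

```python
# Hedged sketch: treat observed selections per context as noisy signals of
# the user's value, then offer the speeds closest to the user's typical
# choice in that context.
from collections import defaultdict
import statistics

history = defaultdict(list)   # context -> list of chosen speeds (kB/s)

def record_choice(context: str, chosen_speed: int) -> None:
    history[context].append(chosen_speed)

def choice_set(context: str,
               all_speeds=(0, 100, 200, 300, 400, 600, 900, 1000),
               k=4):
    """Offer the k speeds closest to the user's typical choice in context."""
    if not history[context]:
        return list(all_speeds[:k])             # fallback: cheapest options
    typical = statistics.median(history[context])
    return sorted(sorted(all_speeds, key=lambda s: abs(s - typical))[:k])

record_choice("facebook", 0)        # low-importance context
record_choice("facebook", 100)
record_choice("outlook", 1000)      # high-importance context
print(choice_set("facebook"))       # low speeds offered
print(choice_set("outlook"))        # high speeds offered
```

The point of the sketch is only the shape of the mechanism: low-importance contexts end up with cheap choice sets, high-importance contexts with fast ones, mirroring the Facebook and Outlook examples above.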
What's the effect of this on efficiency? Doing a linear regression on efficiency, what we find is that efficiency increases slightly but statistically significantly when we have changing choices compared to fixed choices -- by almost 0.1, which is a non-negligible amount.
Let's look in a little more detail at what actually happened. When we redesigned the UI and went from fixed to changing choices, we actually changed the optimal value of the game, but only slightly: we increased the optimal value by only 0.02. However, the real-life score of our users increased by 0.08 -- four times as much.
So having these changing choices, the users did not only exploit the efficiency gain that was there because we changed the game, but it was actually easier for them to find the right choices and to make better decisions. So changing choices actually helped the users to achieve high efficiency.
And that wasn't clear a priori. I remember Eric and I had this discussion about whether presenting users with different choices every round -- every seven seconds you get different choices and have to rescan the choice set and look at what you're facing now -- might actually make things worse by confusing users or slowing them down.
But this shows it actually helped them make better decisions.
Now, the fourth design level, where we move to the UI optimization part. To remind you: based on the first two experiments, we took the data and trained a user model to have a predictive model of user behavior in other environments, and then we searched for the optimal user interface given this user model and came up with an optimized market user interface.
So now we're going to look at this part of the experiment. And here's what we did in this learning and user model building phase. We've already seen that the Q-values are the most important predictors for users' behavior.
So we took this softmax activation function to have a value-proportional error function. This probability distribution tells us the probability of choosing action a_i given the Q-value of a_i and the Q-values of all the other a_j's. And if you order the actions by their Q-values, you get something like this: the action with the highest Q-value has the highest probability of being chosen, and so on.
So in the learning and model building phase, what we did is we computed the maximum likelihood estimate of the model's parameter lambda to fit the data from the first two experiments. And then in the optimization phase we searched through the whole UI design space and selected the best user interface. And "best" meaning achieving the highest expected efficiency over all possible games.
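This fitting step can be sketched as follows. It is a hedged illustration under assumptions: the data are simulated rather than taken from the experiments, and the grid search is just one simple way to obtain the maximum-likelihood λ.

```python
# Hedged sketch of the softmax choice model: P(a_i) is proportional to
# exp(lam * Q_i), and the rationality parameter lam is fit by maximum
# likelihood on observed (Q-values, chosen action) pairs.
import numpy as np

rng = np.random.default_rng(1)
true_lam = 2.0

# Simulate observed rounds drawn from the softmax model.
games = []
for _ in range(400):
    q = rng.normal(size=4)                 # Q-values of the four actions
    p = np.exp(true_lam * q)
    p /= p.sum()
    games.append((q, rng.choice(4, p=p)))  # sampled choice index

def neg_log_lik(lam: float) -> float:
    total = 0.0
    for q, choice in games:
        z = lam * q - (lam * q).max()      # stabilized log-softmax
        total -= z[choice] - np.log(np.exp(z).sum())
    return total

# Maximum-likelihood estimate of lambda via a simple grid search.
grid = np.linspace(0.1, 5.0, 100)
lam_hat = grid[np.argmin([neg_log_lik(l) for l in grid])]
print(lam_hat)   # close to the simulating value of 2.0
```

A large fitted λ means near-optimal play (the best Q-value is chosen almost always); a small λ means noisy, error-prone play, which is what such a model is meant to capture.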
And we did that two times, once assuming optimal play that resulted in a UI optimized for optimal players, and once assuming suboptimal play based on the user model we had learned in this phase assuming suboptimal play. And then we ran an experiment with both UIs looking at, okay, how did users perform with one UI and how did they perform with the other UI.
So here are the results. Let's first look at the error rate, running the logistic regression on the optimal-choice variable. And what we see is that when we optimized the choices for suboptimal play, the probability of getting the choices right actually increased, and statistically significantly so. And by a non-negligible amount -- as long as this is larger than one, the probability increased.
So with the UI that was optimized for suboptimal play, the error rate was lower than with the UI optimized for optimal play. So that's good. But what happened to efficiency? This is what we're ultimately interested in. And here we're running a linear regression on the dependent variable efficiency -- it was actually the expected score, demeaned.
And here we're seeing this very surprising result that efficiency actually went down, by minus 0.1, and this is also a statistically significant result. So that was really surprising when we first saw it, and it seemed to indicate that our method totally didn't work: we achieved lower efficiency with the reoptimized UI.
And even though on the previous slide I showed you that users made better decisions among the choices we gave them, we still got overall lower efficiency.
So let's look at what happened. Because the problem is actually there were two counteracting effects in the reoptimization phase.
And here is a table where I'm looking at the mean values of the achieved scores. First let's look at the optimal value of the game. When we optimized the game for optimal play, the game had an optimal value of 1.02. When we reoptimized the game, the optimal value decreased to 0.78. So we took away about 20 percent of the efficiency from the user, and the user never had a chance to recoup those 20 percent -- in this game you simply can't achieve more than this efficiency.
Why did we do this? Because we thought that playing this game would be so much easier that the user would actually achieve a level here that was still better than the level he would achieve there, because that game is a more difficult game to play. Okay.
But what actually happened is that the real-life scores looked like this: for the game optimized for optimal play, the users achieved 0.42, and for the game optimized for suboptimal play, 0.32.
So what we see is that the value loss compared to the optimal value of the game was much smaller here, which is consistent with the fact that the users made fewer errors. However, unfortunately the overall efficiency was still lower than here.
So what this is saying is that these two counteracting effects, the effect that we took away some efficiency from the user in the first place was actually stronger than the effect that we made the decision easier for the user in the second place.
So overall, we took away too much efficiency in the reoptimization phase. Or, in other words, we didn't find the right sweet spot. There are many possible explanations. We only trained a user model based on 20 users from the first experiment -- that's not a lot of data. Perhaps with a richer dataset we could come up with a better user model that made better predictions of user behavior. Or perhaps we missed some important variables that are crucial for user behavior. And one of those might be loss aversion.
To illustrate what loss aversion means: when the user plays the game -- okay, this is impossible to see on the slide -- sometimes the user has to take a negative value now, to save his tokens so that he can get a higher positive value in a later round.
What loss aversion means is that users are too averse to these negative values and try to avoid them by all means; they might be willing to spend a very high number of tokens now just to avoid taking this small cut of $0.3 now. And obviously that would lead to many suboptimal decisions.
So let's look at the regression. I ran the logistic regression on the optimal-choice variable again and looked at whether the optimal choice having a positive or a negative value affected it. And indeed, users were more likely to choose the optimal choice if it had a positive value than if it had a negative value.
So the takeaway from this is that perhaps, in a next iteration, when we run a similar experiment again, we should take this into account. Whether the values were negative or positive wasn't even part of our predictive model -- the optimization algorithm was completely ignorant of that; it only looked at the differences in the Q-values.
So perhaps if we take something like positive and negative values into account and build a richer user model of behavior and then do the experiment again, we can come up with a better result.
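A minimal sketch of what such a richer feature vector might look like follows. The feature set and names are assumptions for illustration, not the study's actual model.

```python
# Hedged sketch: a feature vector for predicting whether a user picks the
# optimal choice, extended with a loss-aversion indicator that flags when
# the optimal choice carries a negative immediate value.
import numpy as np

def features(q_values: np.ndarray, immediate_values: np.ndarray,
             optimal_idx: int) -> np.ndarray:
    others = np.delete(q_values, optimal_idx)
    q_gap = q_values[optimal_idx] - others.max()             # margin of the optimum
    negative_now = float(immediate_values[optimal_idx] < 0)  # loss-aversion flag
    return np.array([1.0, q_gap, negative_now])              # intercept + features

# Optimal action 0 has the best Q-value but a negative immediate payoff,
# so the loss-aversion flag is on.
x = features(np.array([0.9, 0.5, 0.1]), np.array([-0.3, 0.2, 0.0]), 0)
print(x)
```

The idea is simply that once the sign of the immediate value enters the model, the fitted error probabilities can reflect the loss-aversion effect observed in the regression above.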
Okay. Another interesting fact that I found in the data is the role of time. It seemed that users were overconfident at the beginning of a game and started making better decisions towards the end of a game. And by game I don't mean the whole 45 minutes that we let them play; within each of the 50 games, it seemed that at the beginning they made worse choices than at the end.
So if you look at the current time step -- time step goes from one to six, so it's essentially the round within a particular game -- the larger the current time step, the more likely you are to make the optimal choice.
And this is not simply due to the fact that in later rounds of the game there were fewer choices so you automatically made better choices, but we're controlling for the number of choices that were left and we're controlling for the budget that's left and Q-value differences, so this is a separate result from that.
So in the earlier rounds users are more likely to make mistakes, and that's why I called this overconfidence at the beginning. Yes.
>>: What about the fact that the MDP, at the beginning of the game, is looking six steps ahead [inaudible] to make the optimal decision, but in the later rounds of the game the MDP approach is looking only one or two steps ahead? So this result may simply show that people look ahead fewer steps than the MDP approach does -- that their decisions agree better with an approach that looks one or two steps ahead. Which means [inaudible] is larger: people's decision-making doesn't agree that much with the MDP approach. But [inaudible] is an important issue.
>> Sven Seuken: Yes, that's actually a very good explanation for this effect. And I'm thinking about how we can tease this apart -- whether there's another analysis I could run to figure out if that was the effect or not. But that is certainly an explanation for this finding.
>>: There is some literature [inaudible] showing that people are not looking [inaudible].
>> Sven Seuken: Yeah. Of course, intuitively I believe people are more likely to only plan ahead two or three time steps rather than six. Yes.
>>: Can you repeat the difference between [inaudible] time step?
>> Sven Seuken: Yes. So "number of choices left" -- I ran this analysis on every decision for every round. So for every round, if we consider the game with four choices, there were either four choices left, or three, or two, or only one.
And we saw previously that the number of choices left is obviously very important for the --
>>: Only one choice left is the optimal.
>> Sven Seuken: Yeah, then it's optimal, right. So the more choices there are left, the less likely you are to make the right decision. If this value is less than one, you're less likely to make the optimal choice. So the higher the number of choices left, the less likely you'll make the optimal choice. But that makes sense: with a hundred choices, there are more ways to make a mistake than with only two choices left.
And I'm saying that because I'm controlling for the number of choices left, and the current time step still shows up as statistically significant. It's true that towards the end of the game, because you're running out of tokens, you always had fewer choices left in the last couple of time steps than at the beginning -- but the time-step effect is not simply picking that up, because I'm controlling for it.
>>: You're controlling for it linearly? Or how -- what -- how are you defining that variable?
>> Sven Seuken: Well, it's a binary logistic regression where all of these variables enter into the regression together. So it's a logistic function over all of these together. Does that make sense?
>>: Right. So my chance of making an error with one choice left is not half of my chance of making an error with two choices left. So if you're entering that variable linearly, what you see in current time step could just be what's left over after that error is modeled.
>> Sven Seuken: I'm -- I don't think I fully understand. Note that this is not a linear regression, right? It's a binary logistic regression on the --
>>: It's binary on the --
>> Sven Seuken: Yeah.
>>: On the dependent variable --
>> Sven Seuken: Dependent variable, yes.
>>: The question is in what form each of the explanatory variables enters the regression. Are you treating them linearly? Do you treat them as squares? If you're treating them linearly, then a one would be half of a two. But your chance of making an error when there's only one choice left is not half of your chance of making an error when you've got two choices left -- because it's zero, and there's some error greater than twice zero when you've got two choices left.
So it's a little premature to assume that, because you've factored in the number of choices left, that's not what's going on in current time step. A linear term does not perfectly model this effect.
>> Sven Seuken: Yeah. What I think I'm -- I'm only -- I mean, I understand the counter example with only one choice left, then you can't simply read that value and use that. But I'm wondering what's the analysis I would have to run to make sure that didn't happen.
>>: [inaudible] understand your comment. So he's measuring error on a single step, right, with a single step was error [inaudible] not total error [inaudible] outcome.
>>: Right. So there's different -- and each of these variables represents a different contribution to the error. Now, the assumption is because your number of choices left is going to remove any part of the error that's being made that's a result of the number of choices, it reflects how many choices the user has.
And so by having that variable in there, you're going to factor that out of one of -- I mean, at the end of the -- on the last step maybe I only have budget left, so that there's one -- maybe the reason why I'm doing so well in the last step is because there's only one choice, so there's only the optimal choice to make. So if that is the case, when you've got this linear variable number of choices left, it doesn't properly represent the possible chance of error.
>> Sven Seuken: I see. So are you suggesting, for example, that instead of letting the number of choices left enter linearly into the binary regression, I use three dummy variables for whether the number of choices left is four, three, or two --
>>: I'm not sure what the right way is to fix it. I'm just pointing out that you may still have a confounder here. You can't assume that you've --
>> Sven Seuken: That -- yeah. Now I understand, and very --
>>: [inaudible] take off line exactly --
>> Sven Seuken: Very intricate. Yep. I think you might -- yeah, I might be able to fix that with introducing dummy variables for the number of choices left. That's a good point.
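The fix discussed here could be sketched like this. It is illustrative only: the data are simulated and the column layout and names are assumptions.

```python
# Hedged sketch: encode "choices left" as dummy variables rather than a
# single linear term, so each level can have its own effect on the error
# rate when entered into the logistic regression.
import numpy as np

rng = np.random.default_rng(2)
n = 300
choices_left = rng.integers(1, 5, size=n)        # 1..4 options remaining

# Linear coding: one column containing the raw count.
X_linear = np.column_stack([np.ones(n), choices_left])

# Dummy coding: one indicator per level above the baseline (one choice
# left, where the error rate is zero by construction).
X_dummy = np.column_stack(
    [np.ones(n)] + [(choices_left == k).astype(float) for k in (2, 3, 4)]
)
print(X_linear.shape, X_dummy.shape)
```

With the dummy coding, the one-choice-left rounds carry their own (zero-error) baseline instead of forcing the two-choice effect to be exactly double the one-choice effect, which is the confound raised in the discussion.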
Okay. And so I'm -- this essentially concludes the experimental results on the study that we ran.
And now I'm going to briefly talk about an idea that we started to develop recently --
>> Eric Horvitz: [inaudible] questions on it, if any more questions on the first project.
>> Sven Seuken: Do you want to ask now or do you want to --
>> Eric Horvitz: Any more questions on the --
>> Sven Seuken: I only have two more slides and then I'm done anyway.
>> Eric Horvitz: Okay. We'll wait [inaudible].
>>: [inaudible] I have is, when you're generating the user interface for the suboptimal case, you are using all the data that you collected with [inaudible]. Some of those subjects may be [inaudible], right, making the optimal decision most of the time, and some of those subjects may be more [inaudible], making heuristic decisions. I'm just curious if there was a way to understand which people are optimal and which are not, and to generate the model based on those people's data. I'm just curious if your results [inaudible].
>> Sven Seuken: Well, of course I can look at the data and see which people performed particularly well and which didn't. For every user I know his total score and error rate, so I know how well each user played. The reason for training the model on all users instead of a subset was essentially the belief that the population we would get for the second experiment would be similar to the population we got for the first experiment. And there was no reason for me to believe that only taking, say, the bottom half of the players from the first experiment would give me the right model for the second experiment. Normally you should take all the data points you have, train a model that matches the data as well as possible, and that should be your best predictor for the second experiment.
What we definitely didn't do is take one individual user's behavior, look at how well that user can play the game, and then reoptimize the game for that user. We didn't even try to do that. It would have required letting the user play for half an hour, taking the data, reoptimizing, and then letting him play the reoptimized UI. Of course that should give much better results, because you're tailoring the UI to a particular user, not to a population of users.
>>: Sure. Individual learning would definitely help in a model like this. But at the beginning of the talk you highlighted the issue that there may be two types of players in the world. When you fit a model representing the whole population, you may just get a very noisy model that reflects neither the econs nor the noisy people.
>> Sven Seuken: Okay.
>>: That's my [inaudible].
>> Sven Seuken: So one clarification. When I talked about econs and humans, what I essentially meant is that no human is an econ. No human plays optimally with unbounded time and computational resources; every human makes mistakes, and that's what we saw in the data.
I agree that you could try to come up with a set of market UIs -- say three, one tailored for the very smart users, one for the medium, and one for the dumb users. And you could try to get other signals from users -- what do you know about this user, about his behavior on his phone, on his computer, and so on -- to present him with a particular UI --
>>: [inaudible] question might be might the current dataset you've collected be useful to answer the question about whether there is likely to be opportunity for optimizing [inaudible] or is there not enough data to do that kind of analysis at this point? Because if you've had a long summer to study this, take the current dataset [inaudible] whether we do the search for this kind of thing.
There might be very interesting nuances in psychology, might not be dumb or smart, it might just be more like nuances of how they do like serial verbal scans of text versus bar graph and so on.
>> Sven Seuken: I don't know the definite answer to whether the current data gives us -- lets us answer that question. I definitely could look at that question. I could look at the relative differences in performance between different users and look at whether these are statistically significant.
I would predict that the difference between the highest and the lowest performers was definitely there -- I know how much I paid them. Some users I paid five bucks, some I paid 42 bucks, so that 42-buck guy or woman made a lot better choices than the other one.
>>: For example [inaudible] know if the $42 person versus the $2 person with the gains or losses would have been per efficiency and per optimization of the new interface than it is the before and after.
>> Sven Seuken: So we can look at that -- yeah, it's questionable whether we'd get statistically significant results from that, because that means taking 20 users and splitting them further apart. Yes.
>>: Since you let people play as many games as they could in a fixed time period, there was a mental tradeoff in how quickly to decide. One strategy might be: just hit the topmost thing you can hit, and if it comes out any bit over zero, as long as you can do it fast enough, you should beat the other strategies. Did you see --
>> Sven Seuken: Quick answer to that: that was obviously a concern, and there are two ways we combatted it. One -- I didn't go into all the details of the four-minute game -- after every game you played, we forced the user to take a 15-second break, so you couldn't skip ahead and play an infinite number of games.
But also, and more importantly, I designed the game such that if you just clicked on the top, or the bottom, or somewhere else all the time, you would not, on average, come out with a positive expected value. The game was difficult enough that a simple clicking strategy did not suffice -- you really had to think. And that's what actually took me a lot of time and testing: designing the game so that this clicking strategy would not work. Yes.
>>: A couple questions [inaudible]. What would it take to search the design space to identify the sweet spot for design? Say we haven't found the sweet spot here; that doesn't mean there isn't [inaudible] a heuristic and algorithmic approach to finding sweet spots. Any reflections on that?
>> Sven Seuken: I'm not sure I understand the question.
>>: You said what we did was --
>> Sven Seuken: We didn't find the sweet spot. Yes.
>>: So that raises the question, might there be a methodical way to identify [inaudible] maybe not in the sweet spot space or find the sweet spot, besides just trying a bunch of things?
>> Sven Seuken: I can -- yeah. I think the first step should be a richer user model.
That's also what I discussed with David Lapeson [phonetic]: try to identify all the factors of your environment that influence the error rate, such as loss aversion, and possibly other effects that we haven't identified yet. Once you understand every factor that influences your error rate, come up with a user model that predicts errors as accurately as possible, and then that should give you better results.
>>: And then to echo Gideon's semi-humorous remark: boy, what if this gets into the hands of the wrong people -- those strange people who actually want to make a profit in the world. Who are these people anyway? [inaudible] the user, I guess, comments on that. For example, might we as designers come up with incentives such that that would not be possible --
>> Sven Seuken: So who's the designer?
>>: -- or mechanisms that actually make it in favor of the owner of the interface to do things well for the user?
>> Sven Seuken: So --
>>: [inaudible] I mean, if --
>> Sven Seuken: It's a computational [inaudible].
>>: Yeah. Because the user will switch to another provider.
>> Sven Seuken: So you're essentially asking if there's a meta design so that if we then give the methodology to somebody they cannot misuse it?
>>: [inaudible]
>> Sven Seuken: I like to think of Amazon and personalized e-commerce. They really have the biggest experimentation platform for trying out things like: what's the effect of tailoring choices to particular users' behavior? They have so much information about you and can offer you specific things. Of course they can misuse the methodology by offering you not the choices that increase efficiency, but the ones that increase revenue.
And the best answer I can give there is the competition argument: if over time you figure out that you're losing out, you go to a different platform, so Amazon has a disincentive to do that. Obviously with a big player like Amazon -- not quite a monopolist, but once you get really big -- there's a danger of competition not working anymore.
I don't see a way to build protection against misuse of the method into the design methodology itself.
>>: [inaudible] to you before, but it occurs to me that it would be really helpful for you to at least do eye tracking next time you do the richer [inaudible]. Because once you start putting in more and more choices, subjects really might not look at everything. We see this in search results all the time -- they look at the top two [inaudible]. So you might really benefit from that, because then you'll see what they're actually looking at and paying attention to. And attention [inaudible] might be influencing more than you know.
>>: [inaudible] is going to study this kind of thing of how far down in lists of various kinds
[inaudible] granularities and the idea of taking some of the search results [inaudible] Georgia's results on the heatmaps.
>>: I got a huge influence on the design.
>> Sven Seuken: Yeah, the eye tracking would be interesting. I wonder how much more that gives you in addition to already controlling for the number of choices. Because --
>>: [inaudible] things differently, I guess, is what I'm saying, based on scan patterns of the eye. We even know that people scan Web pages in a very systematic, particular way, and [inaudible] accounted for that in this design. I don't know why I never thought of it before.
>>: People might be less likely to shirk -- or, if they know they're being eye tracked, less likely to skip things.
>>: Yeah, that's true.
>>: So you do have to --
>>: If you make it tedious and onerous enough --
>> Sven Seuken: The good thing in our experiment was we paid users based on how well they do.
>>: Yeah, yeah, yeah, that's true.
>> Sven Seuken: So they have a strong incentive, whatever they're doing, however they're scanning things, to do it in such a way as to maximize their money. So I wouldn't be worried that they skipped --
>>: For the amount of money, they still might not [inaudible].
>> Sven Seuken: But it isn't necessarily stupid to skip over certain choices. If there are six choices and you don't have time to read through all the information on the screen in seven seconds, it might actually be best to only read every third choice. But I'd be concerned that if you know you're being eye tracked, that changes your behavior.
>>: [inaudible]
>> Sven Seuken: Is that possible?
>>: [inaudible] the cost of looking at all options and checking which one is the best may just be too costly [inaudible] that they may decide this value is not [inaudible] and I'll just, you know, choose one in a simpler way [inaudible] using a heuristic. And that can always be an issue in this. [inaudible] cost of the market design [inaudible] I think it is not fair to compare the
[inaudible] with Amazon just because there are contracts. So if you get into a plan, you are either there for two years or you have to pay $500 or $600 to switch to another provider. And also the number of suppliers is much smaller, just because of the infrastructure. So if it's very easy for [inaudible] whatever is providing them the best profit, it would [inaudible] the same thing, then you don't have a choice. These are two issues with regard to infrastructure.
>>: But I thought it was really [inaudible] if anybody [inaudible] to come up with a [inaudible].
>>: [inaudible] deserves thinking about it, but I think the problem of generating the good
[inaudible] will be more challenging than the regular Amazon markets. Just pointing out the difference.
>>: I very much enjoyed your talk. Thank you. When you showed the graph of the smoothly changing demand or supply, or whatever was being modeled, I have to say my head originally went to the air conditioner in my car. I don't think of it as a discrete thing; I either turn it up a little bit or I turn it down -- well, when I get in the car in the morning I turn it way up. You know, I'm just adjusting a little bit. If we think about this, it seems like a natural view would be: whatever choice I made -- let's call it C, if they're in order -- then if I haven't changed, and only the market has changed, and it's changing smoothly, the next choice for me is almost certainly either to stay at C or go to B or D. A and E are out of the question, because the market doesn't move very much.
>> Sven Seuken: Yeah.
>>: And from a UI point of view, I think that's where I would always want to look. Do you want to pay -- things are getting tight, do you want to pay a little more to keep your current level, or if things are getting loose, would you like to upgrade? What would you like do? And this discreteness I think isn't modeling -- isn't taking into account that context.
>> Sven Seuken: I think that's a good point. We got at that aspect a little bit when we studied this design level of changing choice sets where we essentially -- let me go back --
>>: So long as it's anchored and so long as you can tell me this is where you were --
>>: Well, you sort of did that in the hidden market study [inaudible].
>> Sven Seuken: Right. I'm going to get to that in a second, yeah.
>>: Yeah.
>> Sven Seuken: To the hidden market study. But your idea, I think, is partly reflected in this idea of the changing choice set. It's a slightly different angle, but I think it gets at something similar; namely, that here, if we know you're in a low-importance task, perhaps we know that on average you choose something around 200, so we're going to give you four choices that are all around the 200 range. That's what this gives you: zero, 100, 200, and 400. If you are here, we know you're in a high-importance task, and so we know that sometimes you choose the thousand.
So the analogue -- what you're suggesting essentially, if you're currently using something at 500, then your choice set should probably be 300, 400, 500, 600. So I think the motivation was similar for that design level, but we haven't directly studied what you're suggesting.
>>: You're suggesting cognitive overhead of making -- of [inaudible] changes that are
[inaudible] surfaces would be much lower and therefore do less design work for cognitive cost.
>> Sven Seuken: Let me try to --
>>: [inaudible] yes, search data more [inaudible] people are bad at making decisions; they're good at making adjustments.
>> Sven Seuken: Let me -- I want to --
>>: I guess that's the electronic cane. That looks fine just now, but we have a Wizard of Oz behind the curtain that will help us [inaudible].
>> Sven Seuken: And let me get to the last two slides of this, too, because I think that addresses partly your question, and it goes -- it takes some of the ideas a little further beyond the study of what we actually did.
So I want to remind you of this effect of changing prices. I showed you that these results were statistically significant: when we have changing instead of fixed prices, the error rate increases and the decision time increases. And both are bad for social welfare.
So what if we could avoid changing prices? To an economist that at first sounds crazy, because a standard result says that an allocation market requires the prices for each resource to keep changing up and down until supply equals demand, that is, until we reach an equilibrium. However, it's really tedious, because prices have to change whenever demand or supply changes to rebalance the market, or at least they have to change sometimes if we have this curve.
And this means that if prices change, then we also need frequent user interaction with the market upon every price change. Every interaction with the market is annoying to begin with, and we've seen that interacting with a market with changing prices is even more annoying and even more difficult.
So what if we could hide part of the market from the users to avoid all of these costs altogether?

And that's essentially the idea of a hidden market with fixed prices. So let's consider the 3G or 4G bandwidth market again. We have a fixed maximum amount of bandwidth that the market supplies, and every user has variable demand, but with a fixed limit -- I mean, you can't consume more than your cell phone can consume. There's a limit on how much of that resource you can consume, even if the price is zero. That's a strong assumption; it doesn't hold in other markets. If there's a zero price for Coke, I'm going to buy an unlimited amount of Coke. Probably. Almost.
And, furthermore, we need each user to have a fixed amount of some currency, either virtual or real. So now the idea is: we are going to keep the market price fixed over time, but we're going to expose a different set of users to the market at different points in time to balance the market.
So we will determine a market demand threshold T. And whenever aggregate demand is larger than T, we will show the market UI with fixed prices to a subset S of all users. So as long as aggregate demand stays below T, nobody has to interact with the market. When aggregate demand rises above T, some users get a popup and are asked to make a choice regarding their speed now. Each user must make a choice, and some users will choose less than the maximum choice, so aggregate demand decreases.
The challenge is to find an algorithm that chooses the right size of the subset S and the right kind of users such that we can balance the market. And of course the idea would be that if Kamal is chosen today to interact and make a selection, then tomorrow he's not chosen and another subset of users is chosen, so that we reduce the overhead on every individual user, potentially N-fold, depending on how large a subset we take and how much we want every user to decrease his demand at a particular point in time. And we can still balance the market, but we keep prices fixed, and they don't have to change over time. So it's a much easier task for the users to deal with.
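The threshold-and-subset mechanism described here can be sketched roughly as follows. This is only an illustration, not the algorithm from the talk (which is left as an open design question): the function name, the fixed sampling fraction, and the uniform random rotation are all assumptions made for the sketch.

```python
import random

def users_to_expose(users, demand, threshold, sample_frac=0.2):
    """Hidden-market sketch: prices stay fixed; when aggregate demand
    exceeds the threshold T, only a randomly rotated subset S of users
    is shown the market UI and asked to (possibly) lower their demand.
    The 20% sampling fraction is an illustrative assumption."""
    if sum(demand[u] for u in users) <= threshold:
        return []                       # market balanced: interrupt nobody
    k = max(1, int(len(users) * sample_frac))
    return random.sample(users, k)      # rotate who gets the popup
```

In a real system the subset size would presumably be chosen from a model of how much each exposed user is expected to reduce demand, rather than from a fixed fraction.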
So in ongoing work we are formalizing this design idea, and this is where I'm getting back to the efficiency question of fixed versus changing prices, which we can essentially only answer either in a simulation or in a formal model, because, you know, if you have changing prices, you can do more things than if you have fixed prices: you can move demand up and down more carefully and individually than if you are showing a fixed-price market to a subset S of users.
However, you're gaining all this efficiency because you are not exposing the users to the market in the first place, saving their cognitive costs and so on. So this is the ongoing work that we're doing to compare these two design approaches.
>>: [inaudible] overbooking of flights might be [inaudible].
>> Sven Seuken: Um-hmm. So, yeah, this is something Kamal and I had talked about before. Everybody knows this scenario at the airport: the airline overbooked the flight, there are five people among the currently booked passengers who won't be able to fly, and what should the airline do? Do we ask every user, okay, do you want to give up your seat for $500?
That means every user is prompted with this information and sees that the airline has overbooked -- perhaps that's a negative signal, or not; you can discuss this -- but then every user is making this decision now, is this worth it or not, when only five users in the end need to make that decision.
So the analogue is that if you have a model of users' values, then you might only show this decision to a subset of the users such that you know, on average, at least five users will make the selection, and then you're fine.
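As a hedged illustration of that subset idea (the acceptance probabilities and the greedy rule below are assumptions made for the sketch, not anything from the study): ask the fewest passengers whose expected number of acceptances covers the overbooked seats.

```python
def choose_offer_subset(accept_prob, seats_needed=5):
    """Toy sketch: accept_prob maps each passenger to an assumed
    probability of accepting the buyout offer. Greedily ask the most
    likely acceptors until expected acceptances cover the seats."""
    ranked = sorted(accept_prob.items(), key=lambda kv: -kv[1])
    subset, expected = [], 0.0
    for passenger, p in ranked:
        if expected >= seats_needed:
            break
        subset.append(passenger)
        expected += p
    return subset
```

A real airline would also have to handle the case where fewer passengers accept than expected, for example by widening the subset or raising the offer in a second round.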
So we believe that this approach of designing markets where we're hiding a part of the market from a set of users can actually increase welfare significantly.
>>: Do you think that will fly politically, those kinds of marketplaces?
>> Sven Seuken: Yes. I --
>>: [inaudible] I don't think it -- no, I mean, the people who travel on, let's say, a higher fare --
>>: Gold members of Alaska.
>> Sven Seuken: One second. To answer your political question: I have no worries about political issues here, because I've already seen that, depending on the browser you use to go to a credit card application Web site, you get different interest rates. That's even crazier price differentiation, based on your browser --
>>: Firefox [inaudible].
>> Sven Seuken: Yeah, Firefox browsers get the best rates. And this --
>>: Did they isolate that to the cookies that are in the browser?
>> Sven Seuken: I didn't do that study. Somebody else did, and I think they looked at that carefully, yes.
And so this is treating users much more fairly, or much more equally, than this browser-based approach, right? We're just -- randomly, sometimes you get the choice, sometimes somebody else gets the choice.
And, by the way, if you don't get a choice now, what that means is you get to use the market at full speed without having to pay for it, right? So you're actually not suffering. Okay. Did you have a question?
>>: So I guess why is this different, say, than a Vickrey auction or something like that where you make the decision ahead of time what your value is and no one really has to answer anything. As long as the clearing price is zero, I don't have to pay for anything. The moment it goes up, you're paying whatever you put the value at.
>> Sven Seuken: Ah. Okay. That would require -- users' values in this particular market that we're studying are highly situation dependent, right? It depends very much on the current thing you're doing at this point in time: the e-mail you're writing, to which person -- your boss or your intern -- how important it is, how time sensitive it is.
So your value for speed is highly situation dependent, and I'm making the assumption that you cannot pre-specify your values for all possible situations that will come up in the future such that -- and that was the example with the Vickrey auction where you specify your value once and that --
>>: I don't think it's any more ridiculous than assuming you know your value of what the next incoming e-mail is.
>> Sven Seuken: You don't know -- oh. What this approach allows you to do is simply use your application and not even think about what your value is for anything, and only if we ask you -- and we will only ask you, let's say, once every three days -- in that particular situation you now have to decide: what's my value for speed right now?
And you should know that, because you're doing something right now, and that something is either important or not important. So you only have to think about what's my value for this if we're asking you. If we're not asking you, you never have to think about that value. Does that make sense?
>>: Sure. I mean, this is kind of going on already today in that --
>>: [inaudible] formalized.
>>: I mean, the carriers are already saying this protocol is more valuable we're much more willing to interrupt your video than we are to interrupt your office VoIP or what have you. And so it's just separating the consumer one step away from -- if the service provider is making a set of decisions that conflict with your values, you may end up changing service providers.
>>: [inaudible] two years.
>>: Well, that's because they're hiding the fact that you can get out of the contract for 200 bucks but you're paying 500 bucks for the phone. So you're not really locked in for two years.
>>: Another hidden market.
>>: [inaudible] technologies has been looking at a very similar mechanism for electricity markets. I don't know if you know about it, but I think it's a versatile mechanism where, if there's excess demand and there would be a blackout because of the excess demand, they were thinking about sending messages to people: can you switch off your air conditioner, please, otherwise we'll have [inaudible]. And they were also thinking about who to send this message to.
>>: [inaudible] would be [inaudible] you have a perfect model and essentially control to lower
[inaudible] from the power plant. We're raising a little bit [inaudible].
>>: I sent you a message saying you know what you're doing right now, a minute ago it cost you a dollar an hour, now it's costing you $4 an hour, what do you want to do.
>>: [inaudible] raise the temperature.
>> Sven Seuken: Yeah. So I know about this, you know, new electricity markets and, as you know, I'm studying this idea of hidden markets more and more generally for my thesis, and one application I started thinking about is smart grids and the question of how do we design these electricity markets to involve the user at the end of the market in a proper way without overloading them with cognitive costs.
>> Eric Horvitz: So probably should summarize.
>> Sven Seuken: Yep. This is my last slide. To conclude, what we've done is introduce this paradigm of market user interface design and ways to look at it and study it. In this particular project we studied these four design levels and ran the experiment. And here we saw the effect that we predicted; however, the drop at the end was not statistically significant. For the other three we saw statistically significant effects with respect to the error rate. However, the UI optimization, while it worked for decreasing the error rate, didn't work for increasing efficiency, and I gave some possible explanations for why that happened.
Future work is a formal analysis of this hidden market idea that I just presented, and then also taking the idea a step forward essentially to this long-term AI dream of the personalized shopping agent, why can't I just let my agent do all the market interactions for me.
Well, our motivation here was always the agent never knows my full utility function for every possible situation that I can be in. That's why we don't want to take all the choices away from the user; we want to tailor the choices in an optimal way for the user.
But what if we could go a step further with the approach and essentially do an iterative or real-time UI optimization for the users while they are already using the UI. [inaudible] answers the question from before: if we actually know that this user is such-and-such good at making decisions, then we can really tailor the UI for that user.
However, note that this requires some kind of feedback signal about how good the decisions are that the users make, which we get from our experiment. The question is where you would get this from in the real world, when you're no longer running an experiment. So that's an open question.
And then, taking this in another direction: can we perhaps put a Bayesian approach on top of this? What if the UI optimizer or your computer had a more sophisticated user model, and sometimes it knows, okay, I'm really sure I got the value estimate right this time, and perhaps I'm even going to make the choice for you and not even bother you with the question, or I'm going to offer you just three very similar choices [inaudible] very fine grained around the optimal choice that I think you will make anyway. Or perhaps I have a really noisy estimate -- I've never seen this context before, I don't know what your value estimate should be -- so perhaps I have to give you many, many choices in this particular situation so that you can make the best possible choice.
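One way to make that uncertainty-driven menu concrete is sketched below; the confidence threshold, the menu of speeds, and the choice of three options are all made-up illustrations, not values from the study.

```python
def choice_set(value_estimate, std, all_choices, confident_std=50):
    """Sketch: a confident user model (low std) yields three fine-grained
    choices clustered around the estimated value; a noisy model falls
    back to the full menu. Threshold and the number 3 are illustrative."""
    if std < confident_std:
        # confident: pick the three menu entries closest to the estimate
        nearest = sorted(all_choices, key=lambda c: abs(c - value_estimate))
        return sorted(nearest[:3])
    return sorted(all_choices)  # noisy estimate: show everything
```

The same scaffold could cover the fully confident case from the talk, where the system makes the choice on the user's behalf, by returning a single option when the standard deviation is very small.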
And perhaps -- you know, take the smartphone idea again -- we can combine this with the wisdom of [inaudible]: if I have many users in many similar contexts making many decisions, then perhaps, using collaborative filtering ideas, I can improve the market UI for user A by looking at what all other users did in similar situations -- users similar to A in other ways, behavioral or otherwise.
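A minimal collaborative-filtering sketch of that last idea, with the function names, the history format, and the similarity function all assumed for illustration:

```python
def predict_choice(target_user, context, history, similarity):
    """history maps (user, context) -> the speed choice that user made.
    Predict target_user's choice as a similarity-weighted average of
    the choices similar users made in the same context. Illustrative
    only; a real system would need richer context matching."""
    weighted, norm = 0.0, 0.0
    for (user, ctx), choice in history.items():
        if ctx == context and user != target_user:
            w = similarity(target_user, user)
            weighted += w * choice
            norm += w
    return weighted / norm if norm else None
```

The `None` return marks exactly the case discussed above: a context the system has never seen, where it would fall back to offering the user a wide choice set.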
Okay. Thank you very much for your attention, and thanks again to Eric and Kamal and Mary and Desney for a great summer and for being here. Thanks.
[applause]