>> Host: My name is [indiscernible] and I run the EETALK series for
MSR. And these are a set of talks that we do regularly, pretty
regularly, twice a month or so. Just to reach out to everyone in
Microsoft, the product groups in particular, the business groups. And
it’s mostly talking about technologies, interesting trends, and
innovations coming from MSR that we think have a broader impact. This
of course has a broader impact to all of us. And it’s a very
interesting talk. So David is new to Microsoft, he came from Yahoo
Research just recently. We have a whole new lab in New York City that
is a new Microsoft research lab, all housing about 15 researchers?
>> David Rothschild: That’s correct.
>> Host: Several of whom are some economists, some behavioral
psychologists and some machine learning people. Yeah, that’s the sort
of expertise there. And they are all applied; you know they have been
working for various lengths at Yahoo. David in particular has been
there only a year at Yahoo.
>> David Rothschild: That’s correct.
>> Host: And prior to that got his PhD from the Wharton School in
applied economics. With that I will hand it over to David, take it
away, thank you.
>> David Rothschild: Excellent. Well thank you guys and thank you guys
for coming today. It has been a pleasure being at Microsoft. Of
course we have only been here for a few months but it has been working
out very nicely, so. Our entire lab is excited about it.
Today’s talk is about forecasting, in particular focusing in the end on
the 2012 election, but really on the idea of forecasting for economics
and related topics as well as politics. And I will get into that in a
second.
But when it comes to elections this is big business, upwards of 10
billion dollars are going to be spent on the 2012 election cycle. So
this is a very serious, serious industry. And not only that: the same
techniques developed for thinking about economics and about how to
gather information for forecasting politics are coming into political
economy questions, marketing-type questions, economic indicators, etc.
Pretty much anything I am talking about for the upcoming election, you
could instead be thinking about how many copies of Halo 4 are going to
be sold, or what kind of jeans are going to be popular next year, or
whether or not people are going to use the new initiatives in
Obamacare. For any of these types of questions, in which individuals
are making some sort of collective action, the same sort of
forecasting techniques can be utilized.
And the kind of scary thing about it, though, is that the forecasting
that is used to make really serious investment decisions hasn’t
changed very much since 1936.
So in 1936 George Gallup came up with the idea of randomly polling a
representative group of people and using that as a basis for forecast
of what is going to happen in an election. And if you froze him after
the 1936 election and thawed him out today he would be pretty
comfortable with what is being done to find data for a 10 billion
dollar industry.
But that’s going to change very quickly. And it’s going to change in
the next 5-10 years. And I am going to talk today about how a lot of
those innovations are going to be coming out of our lab and other
related labs in Microsoft Research, and how it’s going to impact not
just politics, but also hopefully have some really neat implications
for Microsoft as well.
So when you think about forecasting and you think about forecasting the
elections the first data point that comes to mind is going to be the
polls of voter intention. So if an election were held today, who would
you vote for? This is the standard ubiquitous data point that’s been
around since 19, well it’s been around since the 1920s and perfected by
George Gallup in the 30s.
But in addition to that there are kind of four other groups where I
have put data that you do hear about, though sometimes you don’t think
about it in the same sort of way. Fundamental data is all this stuff
that people talk about where they imply that it has some correlation
with the outcome, but a lot of times they don’t really go into how, or
the exact mechanism.
That’s past election results, incumbency, and presidential approval
ratings, even for Senate and House races and governors’ races, plus
economic indicators like the latest jobs report. Then prediction
markets: for those of you who are not familiar with them, prediction
markets are actual real-money markets where people can buy and sell
contracts which pay off if a candidate wins and don’t pay off if a
candidate loses.
And they become very strong indicators of the probability of an event
happening. So if it costs 60 cents for a contract that pays 1 dollar
if Barack Obama wins, which is essentially the case right now, that is
very closely aligned with about a 60 percent probability from the
collective people in the market. Experts, they can say whatever they
want. And social media: social media has just started to come into its
own, Twitter data, Facebook data, search data, all sorts of real time
data for which people are trying to think of how they correlate with
outcomes of different events.
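As a minimal sketch of that price-to-probability reading (the function
name is illustrative, and this is before any debiasing):

```python
# Minimal sketch: a winner-take-all contract that pays $1 maps its price
# directly to an implied probability, before any debiasing.
def price_to_probability(price_cents: float, payout_cents: float = 100.0) -> float:
    return price_cents / payout_cents

# 60 cents for a $1 contract on Obama ~ a 60 percent collective probability.
print(price_to_probability(60.0))  # 0.6
```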
And so I am going to focus first in this talk on the [indiscernible]
poll. And I am going to be talking about how stuff that we have
learned in research about prediction markets is going to help shape how
the poll is going to transform over the next few cycles.
So this kind of outlines the key differences between prediction markets
and polls. And I am going to walk you through this, but I want to keep
something in mind too: research into polling, which has been fairly
stagnant over the years, is on a totally different track than research
into prediction markets. And it’s kind of weird, because essentially
these are just two different ways to engage individuals, gather
information from them, aggregate it together, and then say something
about some upcoming event, get a data point. But no one has really
worked to bring the two together and think about how they differ and
how they share similarities.
Polling likes to think about gathering a random sample of a
representative group of something. So in elections it’s likely voters.
For marketing it’s going to be likely people in the market for Xbox or
jeans or whatever you are talking about.
Prediction markets rely on a self-selected group of people who have
superior information to join a market, bid on contracts, and provide
that information: a self-selecting group. Questions: prediction
markets focus in on asking the expectation of what’s going to happen.
They are asking people whether or not something is going to hit,
whether or not someone is going to win an election, whether or not
sales are going to cross a certain threshold. And polling focuses on
intentions.
It’s the same thing in marketing as it is in politics. Will you buy
this product? Will you go see this movie? Will you vote in this
election? Who will you vote for in this election?
Aggregation; polls focus in on simple aggregation methods. Generally
averaging or some sort of weighted average based on some prior
demographics. Prediction markets use how much money you are willing to
spend as a proxy for confidence. So if you feel very confident, you
can put more money in and bid up in a prediction market.
And finally, incentives; polling is generally not incentive controlled,
so the incentives may align for people to be telling the truth, they
may not, and there’s a lot of questions about that, versus prediction
markets, which are directly designed to have properly aligned
incentives, so that when you place a purchase order you are supposed
to bid up to about the probability you feel that something will
happen.
Now I am going to focus first on this question. Polling focuses on
intention questions; prediction markets focus on expectation
questions. Is there something here where you can go into polling and
see if maybe expectation questions make a difference?
So when polling individuals in order to forecast an upcoming election,
generally you have this, voter intention. Who would you vote for if
the election were held today? The wording is pretty much identical
across any of the major news sources that you see, from Rasmussen, to
Gallup, to Fox News polls, to ABC polls and MSNBC; it’s all the same.
Voter expectation question; voter expectation question would look
something like this, “Who do you think will win the election?
Regardless of who you are going to vote for, who do you think is going
to win the election”?
And the reason that this interests us a lot is this idea that everyone
has a pool of information, whether about an upcoming event such as an
election or whether or not it’s a marketing question. So on elections
we know a lot of things. We know who we are going to vote for. We
know who our friends and family are going to vote for. We have seen
the news; we know what other people are saying about it.
And when you ask someone their intentions you are getting a small
sliver, a very small sliver, of that data, right? You are getting just
that individual’s intention. An expectation response captures
something about their individual information, as well as information
about their social network, as well as other centralized information.
And this is the key motivating factor here.
And so in order to compare them, and this is work with a co-author of
mine, Justin Wolfers, who is now at the Wharton School at the
University of Pennsylvania, we were able to find a collective data set
of essentially 345 instances where the same people at the same time
were asked both who they were going to vote for in an election and who
they thought was going to win. Specifically, state-by-state
electoral-vote questions from 1952 until 2008.
And this is the basis of the data. In 345 races, 217 times, or 63
percent of the time, both were greater than 50 percent: both the
expectation and the intention pointed towards the winner of the
election. In 45 races, or 13 percent of the time, they both pointed to
the eventual loser. In 83 races, or 24 percent of the time, they
split: in the same poll, with the same people at the same time, more
than 50 percent of people intended to vote for one candidate while
more than 50 percent of people expected the other candidate to win.
And by the way, just to be clear, we are talking here about the major
party candidates, so we dropped people who intended to vote for a
third party candidate and/or expected a third party candidate to win;
no third party candidates have won, so it wasn’t relevant.
In those split races, three-quarters of the time the expectation
question pointed to the eventual winner and one-quarter of the time
the intention question did. So just on this kind of binary outcome
it’s pretty clear that there is a lot of information coming from the
expectation question. But to head off some very obvious questions
about this: first of all, there are many examples in here, and many
more examples in life, where there is no intention poll publicized
before the election.
So one of the first things you are thinking of is, “Oh people can do
expectations well because they probably saw the intention poll”. But a
lot of these states you are not really seeing publicized intention
polls before the election. The second thing I would mention is that,
in part of the research that I am not going to show today, we gathered
several hundred instances where this type of question was asked prior
to intention polls.
So prior to 1936 we gathered a data set from 1880 to 1932 in which
people were asked who they thought was going to win in certain
elections. And we show that it actually provided a lot of information,
even prior to the availability of any polls of voter intention.
But more importantly I think for kind of this example and thinking
about major races where there are intention polls what we are able to
show here is that when both intention polls and expectation polls are
available, there are times when intention polls are very, very poor;
say there is a very small sample size or there isn’t good sample
selection. There, expectation polls dominate and provide just tons
more information.
In the most advantageous position for intention polls, where they are
perfectly run, with large sample sizes, from major polling companies,
etc., expectation polls still carry more weight than the intention
polls. Also, we have individual-level data for these 345 races, and we
were able to show that there is an extreme amount of heterogeneity. So
if you think about it, if everyone just expected whoever they voted
for to win, then the intention polls and expectation polls would be
the same. If everyone was seeing the same thing from the media, then
everyone would expect the same thing, regardless of their intention.
What we see is a mass amount of heterogeneity, which shows there is
some middle ground. There is information coming from the center,
coming from this kind of social network data, coming from something
besides just your intention and the central signal. And finally, and I
think this is the most important for some reasons, even if there is an
extremely well publicized intention poll, which question would you
rather ask somebody? If you could only ask one person their intention,
in a close race there is about a 55 percent chance they point you
right and a 45 percent chance they point you wrong. But if you ask
them their expectation there is a much higher likelihood you are going
to be right. And this basic idea carries over through this idea of
asking this question.
So, now of course the binary outcome is nice, but that’s not really
what a lot of people care about at the end of the day, especially when
it comes to forecasting, people think about expected vote share. So
what I have done here on the X-axis is the proportion of people who
intend to vote Democratic. So this is the polls that we just talked
about on the last slide and I have dropped 2008 from here because that
becomes out of sample. And so this is 311 different elections, this is
the proportion of people who intend to vote Democratic and the Y-axis
is the actual vote share.
Now this is what you see a lot on the news. It’s a kind of naive,
implicit vote share. And what we are going to see is that there
definitely is an upward slope here. Definitely the higher the poll the
more votes they end up getting, but it’s clearly not a one for one
ratio right. So this naive vote share is not that great. And so we do
some very clean and transparent calculations where we say that the poll
essentially equals the actual vote share plus some sort of house bias,
plus some [indiscernible] party or [indiscernible] bias, plus some sort
of time bias. Most of these are actually clear. We don’t have much
time variation. They are all taken about 30 days before the election.
Plus some certain error turn and we are able to translate the raw
polling data into an expected vote share and it looks something like
this.
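The decomposition just described, poll = actual vote share + house
bias + party bias + time bias + error, can be sketched as a simple
least-squares debiasing step. This is illustrative only, on synthetic
data; the actual calculations are the transparent versions described
above.

```python
import numpy as np

# Sketch: model each historical poll as
#   poll = actual_vote_share + house_bias + incumbency_bias + error,
# estimate the bias terms on past races, then subtract them from a new
# raw poll to get an expected vote share. All data here is synthetic.
rng = np.random.default_rng(0)
n = 311
actual = rng.uniform(0.35, 0.65, n)            # true two-party vote shares
incumbent = rng.integers(0, 2, n)              # 1 if the polled party is the incumbent
poll = actual - 0.02 * incumbent + rng.normal(0, 0.03, n)  # synthetic anti-incumbent bias

# Regress the poll error on the bias covariates (intercept = house bias).
X = np.column_stack([np.ones(n), incumbent])
coef, *_ = np.linalg.lstsq(X, poll - actual, rcond=None)
house_bias, incumbency_bias = coef

def debias(raw_poll: float, is_incumbent: int) -> float:
    """Translate a raw poll number into an expected vote share."""
    return raw_poll - house_bias - incumbency_bias * is_incumbent
```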
Now the interesting thing though is when you look at expectations. So
on the X-axis is the proportion of people who expect the Democrat to
win and the Y-axis is the actual vote share. It doesn’t look that
clean. It doesn’t show very much. And this is one of the reasons why
people have not been using this poll for very much because if there is
90 percent of people who expect a candidate to win what does that mean?
I mean it could be a close race, but everyone still knows what the
outcome is going to be or it could be a very wide race.
And so implicitly it doesn’t look like expected vote share and it
doesn’t look like a probability of victory. It doesn’t look like much
at all actually. There is an upward slope to it: the higher the
expectation, the more people expect a candidate to win, the more votes
they end up getting, but clearly not one to one. But we do the same
transformation, the exact transformation we use for intentions, and
this is what we get.
And it may not look clean because you didn’t see the other one right
next to it. So if you look at the two of them side by side, being on
the 45 degree line means parity, which means the expected vote share
coming from that poll equals the outcome. And you can see just looking
at these that the expectation data translates into a much cleaner
looking forecast of the actual vote share.
For the numbers people, we can just show that when you take the
expectation data, which is the second column, and turn that into an
expected vote share, versus turning the first column, the intention
data, into an expected vote share, you have a lower root mean squared
error, a lower mean absolute error, the forecast is closer to the
answer more often, a higher correlation, and a better encompassing
regression. But the thing I like to focus on is this last number here.
If we were to estimate weights on which forecast to take, you get tons
of significance and a lot of weight on the expectation, and you get a
little bit, with no significance, on the intention. And to be clear
here, what major polling companies use is 100 percent and 0, right? So
even if you don’t totally buy it, right here we have a very, very
strong finding, which is essentially 100 percent and 0 in the other
direction.
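A sketch of what an encompassing regression like this looks like, with
illustrative variable names (not the paper’s actual code):

```python
import numpy as np

# Regress the actual vote share on both forecasts at once; the fitted
# weights say how much each forecast adds once the other is in hand.
# The finding described above is that nearly all the weight lands on
# the expectation-based forecast.
def encompassing_weights(outcome, intention_fc, expectation_fc):
    X = np.column_stack([np.ones_like(outcome), intention_fc, expectation_fc])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return {"intercept": coef[0], "intention": coef[1], "expectation": coef[2]}
```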
So we wanted to think more about what this means and how this could
help us beyond just this question. And in many ways we wanted to kind
of interpret the power of it as well. So let me just give you a quick
interpretation. Imagine if, when people were asked their expectation
of what was going to happen in an election, they went out and just
polled random voters on the street.
The expectation data is as powerful as if they went out, interviewed
10 random voters, and then included themselves. So essentially, they
gave us a binary answer, you know, Democratic or Republican, based off
a poll of 11 people: themselves, plus 10 additional people. We don’t
think they are actually doing that, obviously, but that just shows the
power of asking this question versus the other question.
So there’s this massive multiplicative effect. It’s like having a poll
of 10 times as many people. But not just 10 times as many people, it
also solves one of the hardest questions in polling which is, “What’s
the representative sample”? So when polls report a margin of error,
when you see Gallup has a margin of error of plus or minus 2, or plus
or minus 3, what they are reporting to you is the random sample error
as if they knew exactly what the likely voters were going to look
like: as if they knew the demographic breakdown on Election Day of who
was going to show up, and then they went out and polled 1000 people.
That plus or minus 3 percent, plus or minus 2 percent, is the random
sample error based off the fact that they only interviewed 1000 people
versus, you know, a million or everyone. But it misses this other
major source of error, which is that they don’t know who is going to
vote, and their likely voter model involves a lot of guesswork. And
that’s why in the 2008 election, the major polls that came out the day
before the election, from pretty much every major polling company,
ranged from Obama up by 2 to Obama up by 11. And it wasn’t just from
random sampling error; it had a lot to do with this hard problem of
thinking about likely voters.
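For concreteness, the reported margin of error is just the 95 percent
sampling half-width for a proportion, which a quick calculation
reproduces:

```python
import math

# The reported margin of error covers only the random-sampling piece:
# the 95% half-width of a proportion estimated from n respondents,
# assuming the likely-voter screen is exactly right.
def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(1000), 1))  # ~3.1 points for n = 1000
# The spread from "Obama +2" to "Obama +11" across firms is far wider
# than this, which is the likely-voter-model error the headline omits.
```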
But that’s not what actually happens, so we want to figure out more
about what does. As we said, we know that people’s intentions are
involved, we know there is something about this localized information
that they have, and we know that there is something about the central
signal.
So, one of the things that we are working on is that we actually have a
bunch of stuff in the field, actually with Gallup, which I have been
working on prior to getting here. And also we will be asking these
questions inside Microsoft’s network, with Xbox and with some other
platforms, in which we are going to be asking people not just their
expectation and intention, but who their friends and family are voting
for and who they think the media is saying is going to win.
And trying to put all this together to actually break down the
information we are getting, because the question about expectation is
the question we had to use, because it was asked before. As Donald
Rumsfeld said, you go to war with what you have, not with what you
want. What we want doesn’t involve the central signal, because I am
the leading expert in the world on what the central signal is.
So I don’t want people regurgitating to me what they think the polls
are saying because I can do that with less noise than they can. I want
to know what they are providing me about their friends and family. I
want to be able to ask a poll and get that information and think about
the social network data that’s coming from it. And that would be the
ideal and that’s what we are experimenting on this year.
Let me show you a little bit of what is in the field this year. So
like I said, I
have been working with Gallup, so Gallup has been kind enough to be
asking these questions for us. This was taken in November during the
Republican primary and what you will see is at this point Romney and
Cain, if you remember him, were in a dead heat, but when we asked them
their expectations, Romney dominated it.
Later, one month later, you had Romney getting crushed by Cain, Cain
was, oh I am sorry not Cain, Gingrich, Gingrich, sorry it just kept on
floating around so I forget who was leading at different points. There
were actually, in the intention polls, about 15 different lead
changes. It was like one leader, then Romney, then another leader,
then Romney. And this went back and forth. The expectation poll, even
at the lowest point, and this would be about the lowest point, never
wavered. So every time we asked the expectation poll they always said
Romney, even when he was being dominated in the intention poll.
So there is clearly something that people knew when they were coming
into these that was providing us meaningful information. And so what
do we have right now? We have a mix of polls up there. So if you look
at Real Clear Politics or Pollster, which collect the latest polls,
and I think this was Real Clear Politics, I pulled it down two days
ago, they had Obama up in four different intention polls, they had
Romney up in three and they had a tie in two. Not all of them had
expectation polls, but everyone we found so far in this entire cycle
has Obama up. A good example would be the latest Washington Times poll
in which they have Romney up 42.8 percent to 42, almost a tie in the
intention, but they still have Obama up big time in expectation.
So let me talk about sample selection now. This is a pretty
interesting thing, because the way polls work, and this is for
everything, we spend a massive amount of money and time thinking about
the representative sample. The basis of all of this kind of polling
has always been to gather representative samples; random groups from
representative samples. But what we are able to show with this same
sort of data is that I can take the most biased, un-representative
sample possible, just those people who said they were going to vote
Democratic, or just those people who said they were going to vote
Republican, and actually create a more accurate forecast than from the
whole group.
And why is this important? Declining land-line penetration,
unrepresentative online surveys, difficulty contacting working
families; 20 to 30 percent of some demographic groups don’t have
land-lines anymore. Pew ran a survey recently which showed that
response rates for standard random digit dialing polling, which is the
gold standard, are averaging around 9 percent now. They did a kind of
gold-plated one where they got up into the 20s by calling 35 times or
something crazy like that.
If you could take the non-representative samples that Microsoft and
other major corporations handle, millions or billions of
non-representative people in any given day who ping them for different
things, ask them simple questions, and make it work for you, then you
have got something going. And another thing to think about is that, you
know, in contrast to the polls and just going back to prediction
markets, prediction markets are dominated by generally elderly or kind
of people in their 40s or 50s, presumably wealthy, and white, and male;
this is a very un-representative group and they have done a very good
job in predicting outcomes of elections.
Just to be really clear about what we are doing here, the standard is:
you take the whole poll, the people who said they were going to vote
Democratic and the people who said Republican; that’s your voter
intention. For expectations, likewise, take the whole poll: all the
people involved, the whole sample.
What we are able to do here, by accounting for the correlation between
intention and expectations, is take roughly half the sample, either
just those people who were going to vote Democratic or just those
people who were going to vote Republican, and get a more accurate
forecast of the upcoming election.
So I am not going to go into the math because I am not going to have
time here, but this was really big because this kind of showed us that
we don’t need to do polling on fully representative samples. If we
have some historical data and we have some understanding about
debiasing we can actually make un-representative samples work for us.
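A rough sketch of that idea on synthetic data; the real method
described above also exploits the intention/expectation correlation,
which this toy version omits:

```python
import numpy as np

# Sketch: calibrate the share of *Democrats only* who expect the
# Democrat to win against historical outcomes, then apply that mapping
# to a new, equally biased sample. Everything here is synthetic.
rng = np.random.default_rng(1)
hist_expect_dem = rng.uniform(0.5, 1.0, 200)   # Dems expecting a Dem win (historical)
hist_outcome = 0.35 + 0.25 * hist_expect_dem + rng.normal(0, 0.02, 200)

# Fit the historical mapping (here a simple line) and use it to debias.
slope, intercept = np.polyfit(hist_expect_dem, hist_outcome, 1)

def forecast_from_biased_sample(expect_share: float) -> float:
    """Expected Dem vote share from a Democrats-only expectation share."""
    return intercept + slope * expect_share
```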
I am going to go over this real quickly because I will be low on time,
which I am, aggregation methods and incentives. So what are we
thinking about here? A combination of gamification and new graphical
interfaces. And so this is what we call
the balls and buckets method, which we have been working on, in which
we take laypeople, you know, non-experts, ask them a question, and
have them fill up buckets across the different answer ranges. And they
actually end up creating a probability distribution.
So we asked them in this case about the quantity of something, their
[indiscernible], and we asked them, say, “Where is the answer likely
to fall”? They pushed these up until they had distributed 100 percent
likelihood across the ranges, and we have been able to show that
people can create really amazingly good distributions: normal
distributions, uniform distributions, right-skewed distributions,
left-skewed distributions. They are really able to show their
confidence and their kind of full understanding of where the answer
may lie.
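A minimal sketch of reading one respondent’s bucket allocation as a
probability distribution; the bucket ranges and the allocation are
made up for illustration:

```python
import numpy as np

# A respondent allocates 100 "balls" across answer ranges, which we
# read as a discrete probability distribution. The standard deviation
# of that distribution is a revealed-confidence signal that can later
# weight respondents (tighter = historically more accurate, per the talk).
buckets = np.array([10, 30, 50, 70, 90])       # midpoints of the answer ranges
balls = np.array([5, 20, 50, 20, 5])           # one respondent's allocation

p = balls / balls.sum()                         # normalize to a distribution
mean = (p * buckets).sum()
std = np.sqrt((p * (buckets - mean) ** 2).sum())
print(mean, std)                                # point forecast + confidence
```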
More importantly we have been also developing other things which are
actually less cumbersome which are just kind of fun games that people
have been playing in order to provide information for us. And what
it’s doing is two things. Number one, it’s allowing people to reveal
their confidence over their answer without it being a money-based
thing, without having money as the proxy, because what we are showing
is that confidence, measured as the standard deviation of their
probability distributions, is highly correlated with how accurate they
have been.
So we have had people play these games like this. The tighter their
standard deviation, the tighter their distribution, the more accurate
they have been, so we have been able to use that to weight forecasts
based on it. And the second thing we have been able to show is that if
we make the games fun, if people have incentives beyond money, we can
actually incentivize them properly to give us meaningful and truthful
answers without having to make it a money based game.
And so in short, when we are thinking about polling and how we are
going to transform it: previously, basically everything has been based
on random representative samples. We are switching to the idea that we
can use random, or even self-selected, samples of non-representative
groups. On questions, we are trying to think of the intersection
between intentions, expectations, and people’s social network, rather
than just relying on people’s intentions. We are working on weighting
by revealed confidence. And we are working on making things incentive
compatible by gamifying them, rather than having to make sure that
everything is money based.
And so redefining polling by having polls meet prediction markets. And
where this is going to be useful moving forward too is that obviously
Microsoft has massive amounts of engaged, non-representative users.
And so in this election cycle we are going to be asking a lot of these
questions and pushing them forward with experiments on the Xbox, where
people are going to have the opportunity to answer polls, as well as
on some other platforms. The idea being that we hope to be able to
create very meaningful forecast predictions, as well as some analysis
of sentiment and interest, from this non-representative group.
Furthermore, we are also running some experimental games, which are
running outside of Microsoft right now because they are not advanced
enough, but we have been toying with them and with the idea of making
even more advanced prediction markets, which we are really excited
about and which could really be a lot of fun for people.
But I should also put a caveat here on this question of
non-representativeness, which is that our user base is not necessarily
representative of likely voters, but it is representative of Xbox
users and it is representative of Microsoft users. It is also somewhat
representative of a lot of demographics that are pored over by a lot
of people.
So the ability to also do some really cool market research, some
advertising research, and some other very useful and very direct
things is not lost on us. The fact that we don’t need perfectly
representative samples, just reasonably representative ones, hit
randomly or even self-selectively, means that once we can start
debiasing that data, we can think, “Well, I would rather just have
this many more hits and not be representative”. If we can debias it
properly we can actually answer some really awesome questions, and at
massive scale.
So let me start putting this into a little more context. So I am going
to sort of put things into three categories here. So what I just
talked about is kind of active Microsoft data. This is data that we
are going to start collecting in different realms with experimental
polling and experimental games. On the far left, this is kind of
passive outside data that is generally used in making predictions for
both kind of marketing type questions as well as election type
questions. In the middle is what I will refer to as passive Microsoft
data. This is Microsoft data that we have but don’t actively collect
for this use.
Now the really interesting thing is when you start thinking about this
in terms of election forecasting. A lot of stuff stops at this kind of
passive level, right? You hear a lot of people report the latest daily
poll, but they don’t translate that into expected vote share or
probability of victory or anything. People talk about the latest
jobless numbers and say they are important, but there are ways to
translate that into what it really means. Like, a 10 percent tick up
in the unemployment rate, what does that do?
Same thing with Twitter; everyone likes to report raw Twitter numbers,
you know: there were this many re-Tweets of something. What does that
mean? I mean, put that in context. How many Tweets are normal? And so
the idea of the greater project that this falls into is taking all of
this data, this passive outside data, this passive Microsoft data,
this active data as we are calling it, and really working it into real
time data visualizations and tables: internal dashboards that people
can utilize to learn about different things, as well as for external
market research type things.
And then external facing charts and tables that try to take all of
this data and aggregate it together into very clean predictions of
things, social media interest in things, and sentiment around things.
So those are the overall goals. This is the second stage. Before, I
was talking about the first stage; let’s talk a little bit about this
second stage.
And so when you think about this in terms of forecasts, the idea is to
gather this information and aggregate it. And I kind of think about it
in four kind of key things. One is thinking about accuracy, two is
thinking about timeliness, three is thinking about the relevancy of the
forecast, and four is about the economic efficiency.
And so what do I mean by these things? So let’s think about accuracy.
I am going to talk a lot about combining data. And what you see a lot
of is people providing you individual data streams, but I don’t
inherently care about Twitter. I care about what Twitter can
forecast for me or what it tells me about sentiment or interests. You
know I don’t inherently care about Gallup’s polls verses Rasmussen’s
polls right. I care about the forecast they are trying to project.
So it doesn’t make much sense that people don’t think about this. The
first thing to think about is within-data-type aggregation: you get a
benefit from, you know, aggregating Rasmussen’s polls and Gallup’s
polls and taking an average. It’s pretty much always better than
grabbing one versus the other. Then debiasing it: with polling we know
that there are biases; for instance, there is something called the
anti-incumbency bias, where incumbents poll lower around Labor Day, a
couple of months before the election, than they actually do in the
outcome. These are very simple things you can debias. Then combining
different data streams together, combining prediction markets and
polls: is there an advantage to it? And then debiasing the whole thing
together. I will give you more examples.
So this is data from 2008. And so essentially what you are seeing here
in the blue line is all polling for the national election, and I have
aggregated the different polls together by taking a simple linear
trend, on any given day, of the previous polls. And then I did a
simple debiasing method, which is very transparent, and created a
probability of victory. That’s the blue line. And this is for the
incumbent party,
which was the Republican Party, so this is McCain. And what you see is
that he floats around 40 percent, pretty stable, and then Lehman goes
under and his likelihood of winning the election just crashes towards
zero.
Now what if I had done the same method, but I had just taken the daily
polls on any given day? So rather than taking a trend of the previous,
all the previous polls together, I just took whatever the latest poll
that came out was? That’s what the exact same thing would look like.
And this is what you are getting basically right, when you turn on the
evening news? People are just talking about that latest poll about
Obama, or maybe there are two polls out today and they will average
them together for you. But there is really no reason to believe that
the underlying value of this election was bouncing around like that on
a daily basis. I think it’s pretty safe to say that it looks a lot
more like this blue line. And so this is what happens when you just
kind of do some basic within data aggregation.
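A sketch of the within-data aggregation just described, with made-up
poll numbers:

```python
import numpy as np

# Instead of reporting whatever poll came out today, fit a simple linear
# trend through all previous polls and read off today's value. This is
# what smooths the jumpy daily line into the stable one described above.
def trend_estimate(days, values, today):
    slope, intercept = np.polyfit(days, values, 1)
    return intercept + slope * today

days = np.array([1, 4, 7, 9, 12, 15])                    # poll release days
polls = np.array([0.41, 0.43, 0.40, 0.42, 0.39, 0.41])   # incumbent share
print(trend_estimate(days, polls, today=16))
```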
So let me talk about prediction markets. People like to report
prediction market prices, and it’s been pretty clear from the research
that prediction market prices, as far as raw data go, are really a
strong indication of probability of victory. But this was pulled about
two days ago. What you will see on the top line there, in yellow, is
Michigan: essentially a contract that pays off, let’s say at 100,
should Obama carry Michigan, and pays off at 0 if he loses. But what
you will see is that there is a pretty big difference here between the
bids. People are willing to spend 75, and when asked, people were
willing to sell it for 86. And the last purchase was at 73.
So it’s not exactly obvious what the price is and what number to pull
from this. What we generally take, in order to do this in real time,
is actually a pretty long string of if-and-but conditions, which comes
out to something roughly in the middle of the bid and the ask, unless
the spread gets too large.
But then there is the question of debiasing. You will see Maine here
sitting at, you know, somewhere around 97 or 96. That will never go
any higher, because with transaction costs and opportunity costs no
one is actually going to bid Maine all the way up to 100. We know this
and we see it systematically, so we have ways to debias what we call
the favorite-[indiscernible] bias, which is that really low contracts
tend to move up slightly higher than they should, and really high
contracts don’t go all the way to 100.
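A sketch of that price-extraction-plus-debiasing logic; the spread
threshold and shrink factor are illustrative stand-ins for the long
string of conditions described above:

```python
# Take the bid/ask midpoint unless the spread is too wide (then fall
# back to the last trade), and shrink extreme prices toward the center
# to offset the favorite-longshot-style bias described above.
def market_probability(bid, ask, last_trade, max_spread=0.10, shrink=0.05):
    if bid is not None and ask is not None and (ask - bid) <= max_spread:
        price = (bid + ask) / 2.0
    else:
        price = last_trade                      # no quotes, or spread too large
    # Debias: longshots trade too high, favorites too low.
    return shrink * 0.5 + (1 - shrink) * price

print(market_probability(0.75, 0.86, 0.73))     # the Michigan example above
```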
This is from another set of research which I have done using kind of
fundamental data. So this is created based off of past election
results. Past election results combined with economic indicators,
incumbency and other things like that. And what you see is that pretty
early before the election you can actually take all this data,
aggregate all of these different data sources together, and create a
pretty accurate forecast of what the expected vote shares could be.
So the X-axis is the forecast that I can create on June 15th of an
election year, the Y-axis is the actual outcome, and what you see is
that it’s a pretty nice line. But it does speak to this question of
when people just talk about job numbers and things like this. We spend
a lot of time on how they actually correlate. If you think about it,
you can break it down: I can see the trend in job numbers and how
important that is, or how much incumbency buys you. And that’s why you
want to create these models and think about it in this kind of fully
debiased form.
With that being said, this is taking that kind of fundamental data,
adding it to prediction market data, and polling data and this is from
the last few cycles, both presidential and senate races. This is the
kind of main forecast model I use. This is how accurate it can be 103
days before the election. 103 days before the election is obviously
today; I didn’t pick that number randomly. So this is the forecast
that my model created for the last couple of election cycles, about
202 races, the last three senatorial cycles and the last two
presidential cycles, and you can get pretty darn accurate if you start
pooling all this data together. And this will just get more and more
accurate as the election approaches, because more and more information
is added to the prediction markets, polls, and fundamentals which are
the main inputs going into this model.
So now, on the question of timeliness: this has always been something
that just bugged me, in the sense that academia and the press tend to
judge forecasts for almost anything, including economic indicators,
right before the thing comes out, when the forecast is nearly
meaningless. Whether or not you can forecast the jobless rate, you
know, three minutes before the jobless rate comes out, or generally a
day to five days beforehand: you can’t make major decisions based off
that, right?
Essentially, you need that data coming when you need it. And the
earlier you can make these predictions, the more valuable they are for
actually making investment decisions.
And so I work on making sure that all these forecasts are updated in
real time, so they are relevant when you need them. That actually adds
a massive layer of complexity that we are working with, because of
things like prediction markets breaking down. It can get confusing,
when there is no bid and ask in some prediction market, how you write
the right code to make all this stuff work.
And second of all the granular nature of creating forecasts that update
all the time makes it so that you can do some really awesome research
about the effect of different things. So if I have a forecast that
literally moves in real time and someone announces a surprise VP
candidate you can actually see how quickly and how precisely this
moves, these forecasts moves. And that allows you to study things that
you can’t study if you have a forecast that comes out once a month or
something.
And so it’s the question of relevancy too. So, generally as you know
with elections, the first thing that people talk about is the raw poll
numbers. That is raw data, right? And hopefully you can see from what
I was talking about today that it is very easy to make it into
something a lot more meaningful. But yet what people generally report
is raw data: the raw number of Tweets, the raw number on a poll, the
price on a prediction market, the jobless number, and this stuff
doesn’t help most people. This is raw data. Then you have got to
transform it; the most popular transformation is into some sort of
estimation of vote share.
But most people don’t actually care about estimates of vote share. The
reason that estimation of vote share has been the granddaddy of
forecasting for elections is that it’s the easiest forecast to make
and polls implicitly look like it. But most stakeholders don’t care by
how many points Obama wins, as long as he wins the election. And they
don’t care if he wins by a thousand. What they care about is who wins.
So probability of victory is actually what most stakeholders have
shown they care about, but what very few forecasts focus on, because
it’s a little harder to make and a little different. But trying to
think about these things is part of what we have been thinking about a
lot, trying to make whatever we create, whether it’s a sentiment index
or an interest index or predictions, more relevant.
And so finally, social media data, and I apologize if I did not talk
enough about social media data today, but the bottom line is that
there is a lot of ongoing work on it. Currently, people embarrass
themselves frequently when they talk about social media data, in the
popular press and in academia too. And the reason is that it is very
hard to calibrate social media data at this point, because social
media changes so rapidly and there are so few outcomes to correlate it
with.
And the bigger problem, though, comes with this very, very hard
problem of people taking this raw data and then confusing it with many
different types of outcomes; not being clear what they are talking
about. Are they talking about how much people are interested in
something, what they think is going to happen, or the sentiment around
it?
A good example would be Ron Paul: a lot of the news during the primary
was the fact that Ron Paul was dominating social media. And so a lot
of media people kind of made an ass of themselves by talking about
this as maybe some sort of indication that he was going to win. Well,
it was no indication of whether he was going to win; it had nothing to
do with that. It was some indication that people on Twitter, which is
a very small self-selected group of people, of which we can actually
quantify an even smaller group that you can actually pull sentiment
from, happened to like Ron Paul a lot.
But people weren’t putting that into the proper context. They also
weren’t separating the question of sentiment from the question of
interest. All these types of things are very difficult to do, and
these are the kinds of things we are working on. What we are trying to
do is pull out all this social media data, combine it with other
rapidly moving data, and put it into proper context: into kind of
three main buckets.
One is a prediction of what things are going to happen. Two is an
interest index which kind of says something about how big something is
and how long we believe that this event is going to last.
So if an issue pops up during a debate can we put that into context of
earlier debates? How big the Twitter reaction or the Facebook reaction
to it was, and how long we feel it is going to last.
And then finally putting it into a proper sentiment index, which shows
us things like the fact that pretty much every candidate has massively
negative sentiment on Twitter. So unless you put that into context,
you are not supplying anything meaningful.
So if I take those first two steps then we are going to try and bring
in the next two steps which are kind of infographics and stuff.
Thinking about how do we get laypeople to understand this in a very
clean and meaningful manner? And then supply this to outside media.
And the reason why we are interested in doing that is to kind of build
this feedback loop. And this feedback loop is that the more we can
talk about the really meaningful and interesting data that’s coming
from Microsoft, the more we can get people to supply us with data and
utilize our systems.
But also interesting is the other, inner feedback loop, which is that
the more we can figure out how people understand things, if I can get
people to understand a probability distribution, then I can get them
to supply me more interesting information. So the work we are doing to
create data visualizations that people can understand is the same work
we turn around into these graphical interfaces to get people to supply
that information.
And so let me talk about the 2012 election then. Let’s see here. I
know it’s always tricky to go live during a talk, but. So the outcome
of this, and these are currently sitting on my personal blogs but they
are going to transfer over to a Microsoft space sometime soon, is that
we are able to turn around real time, and what we believe are
extremely accurate, forecasts of the election.
So this table, along with accompanying map form, is something which is
updated every two minutes or so and it’s updated with the latest
prediction market data and other data that’s coming into real time.
And we feel, like I said, that this is very accurate, but also very
meaningful. That it comes in when you need it.
And what you are going to see from this list of the Electoral College
is that you can focus in on this middle section, these are the swing
states. So if you add up everything above that point, so that’s
everything up to Tennessee, you are going to give Romney 191 electoral
votes. And then everything Wisconsin or below is 247 electoral votes
for the president.
Now, I want to make it extremely clear that these things are well
calibrated. So when I say there is a 77 percent chance of carrying
Wisconsin, I do mean that there is a better than 1 in 5 chance, almost
a 1 in 4 chance, that the president will lose it. But, and this is
kind of tricky, and this is really interesting research we are also
doing as well, the correlation is such that if the president loses
Wisconsin, most likely it is not the swing state at that point,
because most likely if he loses Wisconsin he has lost a lot of other
states as well.
And so that’s something to start thinking about. Rather than thinking
about these independently, your best bet is to think of them almost as
a ranking, where it’s very unlikely that outcomes jump across too many
states. If the president carries Alabama, he has pretty much won the
country.
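One simple way to formalize that ranking intuition is a perfectly
rank-correlated simulation; the state probabilities below are loosely
shaped like the table shown, not the actual forecasts:

```python
import numpy as np

# Draw one shared national shock and let every state fall to the same
# side of it, so a candidate who carries a safe opposition state has
# almost surely carried everything ranked above it.
states = {"AL": 0.02, "TN": 0.10, "FL": 0.45, "VA": 0.55,
          "OH": 0.60, "WI": 0.77, "NY": 0.95}   # illustrative P(Obama carries)

def simulate_map(rng):
    u = rng.uniform()                            # one shared national draw
    return {s: p > u for s, p in states.items()}  # perfectly rank-correlated

rng = np.random.default_rng(2)
print(simulate_map(rng))  # if AL flips to Obama, so has everything above it
```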
But the swing states point to a really interesting thing, which is
that the fundamental data, as I said, the fundamental data that comes
up through mid-June or so, says Romney should be winning on all
accounts, even with incumbency against him, if you look at the
economic conditions and some of this other raw data that is not
directly related to the campaign. But Romney is getting killed in
favorability ratings compared to the president. Romney’s favorability
is still sitting around 43, where the president is closer to 50. And
the president has extremely high presidential approval for someone in
these economic conditions.
And what that translates into is an uphill battle. So if I kind of
refocus on here again, as I said, once you get up through Tennessee
that means that Romney is only up to 191 right there. He pretty much
needs to carry Florida, which he has a pretty good shot at, but he
needs to take Ohio and Virginia and then he needs to take one of the
following four states. So that means the president needs to defend
either Ohio or Virginia, or hold these other states: Iowa, New
Hampshire, Colorado and Nevada. Which translates into what we have
right now, a roughly 60 percent chance of the president winning the
election.
Now, I will say one thing, which I was just talking to Jennifer in the
audience about a few seconds ago: it’s kind of scary, because rather
than adding things to watch, it really subtracts them down to a
scarily small segment of the country.
The Senate list, though, is a little interesting as well. The
Democrats currently control 53 seats, and that includes 2 seats which
are technically controlled by independents: Bernie Sanders, who is a
socialist and so is not in danger of [indiscernible] with the
Republicans, and Lieberman, who is an independent but is retiring. And
if you take a look at this list right now, you have to keep in mind
that despite the fact that the Democrats control 53 seats right now,
they have 23 seats that are up for election and the Republicans have
just 10.
So, the Democrats need to think about it in a different way; it’s that
they are going into this election with 30 guaranteed seats and the
Republicans, despite being down, are going into this election with 37
guaranteed seats. So, 33 seats are up for grabs, with the Democrats
guaranteed 30 and the Republicans 37. And the really interesting thing
is that if you just think binary right now, so if you think the
Democrats capture Virginia through Washington and the Republicans
capture Montana through Mississippi, and I will ignore Maine for one
second, that puts the Democrats at 49 seats and the Republicans at 50
seats.
Maine right now is highly likely to go to an independent, which I have
sitting out here because there is only one of them, there you go. His
name is Augustus King, Angus King, sorry, and he refuses to say who he
is going to caucus with. Although he says that he is going to vote for
the president, so it’s likely he will caucus with the Democrats. And
then of course there is a 51st tiebreaker, which is the presidency. So
you have to slide that in there: as I said, about a 60 percent chance
that the Democrats control that tiebreaker and about a 40 percent
chance the Republicans control it.
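Folding the tiebreaker into a chamber-control probability is simple
arithmetic; the seat probabilities here are illustrative, not the
actual model outputs:

```python
# If the chamber splits 50-50, control goes to whichever party holds
# the vice presidency, here a 60% Democratic chance per the talk.
def p_dem_control(p_51_plus, p_exactly_50, p_dem_presidency=0.60):
    return p_51_plus + p_exactly_50 * p_dem_presidency

# e.g. a 30% chance of 51+ Democratic seats, 35% chance of exactly 50:
print(p_dem_control(0.30, 0.35))   # 0.51 -> roughly the 50/50 race described
```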
What this translates into is simply as tight a race to control the
Senate as pretty much possible, essentially 50/50. With the races to
watch, not that surprisingly, being Massachusetts and Virginia, the
two races people have been talking about. Massachusetts has incumbent
Scott Brown, who in an upset took Kennedy’s old seat after his death,
against Elizabeth Warren, the popular professor from Harvard who
recently ran the Consumer Financial Protection Bureau. And Virginia
has George Allen, who lost to Webb in 2006 after his racial slur
incident, running against the former governor Tim Kaine; both of these
are elections which are essentially toss-ups.
So I think that about covers the predictions. So I ran a little long,
but does anyone have any questions?
>>: So the debiasing that you are talking about requires a ton of prior
data? Okay. Otherwise it’s like a magic variable that you just --.
>> David Rothschild: That’s right. And this is something I didn’t go
into too much: a lot of this requires a massive amount of historical
data, which is one of the major issues when it comes to new data,
social media data, Twitter, Facebook, etc. With polling data, of
course, we are looking at a tremendous amount of historical
information, but on
the other hand you look at something like even prediction markets and
you only have a few cycles of them. So it is something that is
continuously updated with each new cycle of information that comes in,
which will make us more and more accurate. But it is something that
requires historical information.
Yes?
>>: [indiscernible]
>> David Rothschild: So some of it is going to be kind of straight
linear type progressions for very simple things. A lot of things
though on the probability side are looking at probits or logits, which
are fairly straightforward as well. Some of them do go a little beyond
that. A lot of them have polynomial terms or kind of factorial terms
in order to show how essentially the effects differ as you move away
from the middle.
So let me give you an example. When it comes to polling, per se, there
is this idea of regression to the mean: essentially, polls which show
10 point leads turn into something like 3 point or 2 point victories
on average. But that kind of factor only hits the very high lead
polls; it doesn’t hit the small lead ones. So there you have a square
term and some other things like that to handle it. Mainly you would
think of debiasing as looking at incumbency, party, but then thinking
about how it changes over the days before the election, which is very
key. How does it change based off of different election types? And
then how does it change, as I am saying, if something is very close or
not very close?
And so we work in all those factors, but the days before the election
is an especially interesting one, because when you are thinking about
combining, you see fundamental data is extremely strong 140 days out,
but essentially provides no additional information 10 days out. There
you are focusing only on the other types of data.
>>: Yeah, I was trying to say [indiscernible], because when it came to
historical data [indiscernible].
>> David Rothschild: Uh huh.
>>: [indiscernible].
>> David Rothschild: Right.
>>: [indiscernible] and it works very well up until now, so how do you
--?
>> David Rothschild: Right. So I think there are a couple of ways in
which we have been doing that, depending on the different types of
elections. So for something like the fundamentals or the
polling individually we do have many, many years and there we are able
to very easily drop years and do in sample and out of sample
examinations of that. And the same thing even with prediction markets
though, we have in sample and out of sample for anything that we have
done, anything that I have done.
But more importantly, being very cognizant of this out-of-sample
concern, I do end up, depending on whether I am working with someone
or not, dropping variables, or dropping terms and different
conditions, that look very strong in sample if they are not working
out of sample.
And so we have been very strong on that, making sure that we are not
overfitting. It has been a big concern. And in the basic paper that I
have on this last combined model, I talk a lot about overfitting and
things that I dropped and scaled back on because I was concerned about
it.
Yes?
>>: You made a comment about, the only thing you were interested in was
the probability of victory. And the only thing that mattered was
[indiscernible].
>> David Rothschild: Uh huh.
>>: But if I look at, if I am a working manager at Xbox, I am really not interested in victory or [indiscernible]. I am really interested in actual unit sales.
>> David Rothschild: Yeah, so let me answer that question. That's essentially for elections, and for elections we are not as concerned with levels, although everything is also run that way for historical comparison. What we are working on, and what I am doing a lot more internal research on when it comes to marketing, is actually full probability distributions, because I think that is the answer, more than just levels.

So again, it's more thinking, "What's the relevant thing for the relevant condition?" I could be thinking about forecasting the number of Xbox unit sales. Maybe a level and a standard deviation is the relevant thing there, but if the relevant thing is a full probability distribution, that is what we are actually gunning for, especially if the outcome has a skewed distribution of some sort. So we have been working towards that: just what's most relevant for the right context?
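As a sketch of the difference, here is one simple way, among many, to turn a point forecast into a full distribution: bootstrap residuals from past cases so any skew carries over. The pre-order signal and sales numbers are invented:

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical pre-order signal and realized unit sales, 50 past titles.
    signal = rng.normal(100, 20, 50)
    sales = 3 * signal + 10 * rng.lognormal(0, 0.5, 50)  # skewed noise

    coef = np.polyfit(signal, sales, 1)
    resid = sales - np.polyval(coef, signal)

    # Forecast for a new title: point prediction plus resampled residuals,
    # so the history's skew shows up in the fan of outcomes.
    draws = np.polyval(coef, 120) + rng.choice(resid, 10_000, replace=True)
    print(np.percentile(draws, [10, 50, 90]))  # a distribution, not one number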
>>: Second question. [indiscernible] intriguing piece of information about the comparison between [indiscernible] markets and polling data, where I think expectations tend to win roughly by a factor of 3 to 1, and that's for a set of about 80 elections. So is there anything common among those three-quarters of the 80 elections, the 60 or so, or something common about the others; characteristics of the elections that make one method better than the other?
>> David Rothschild: Right. Well, the thing which I am happy to say is that it doesn't --. There are varying sample sizes that occurred on different days before the election, and the result was not reliant on those kinds of basic variables. It really cut through fairly consistently. I didn't show a slide, but when we broke it up by those major factors we didn't see anything very strong.
>>: I just want to make sure I understood correctly the bias toward the higher intent, for example those much more likely to [indiscernible] Democratic, for example. Did you say that their expectation is better?
>> David Rothschild: No, so to be clear: I could take the expectations of just those people who were going to vote Democratic, or just those people who vote Republican, and debias that group based on historical data to create a more accurate forecast of expected vote share and probability of victory than I could by essentially just using the intentions of the whole group. But both Republicans and Democrats were capable of doing it.
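A toy version of that subgroup debiasing: simulated Democratic and Republican respondents whose expectations carry opposite, invented leans, with a separate intercept and slope fitted per group to soak up the lean:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 120  # hypothetical past races
    true_share = rng.normal(0.50, 0.05, n)  # Democratic two-party vote share
    # Each group tracks the truth plus its own invented optimism offset.
    expect = {"Dem": true_share + 0.20 + rng.normal(0, 0.03, n),
              "Rep": true_share - 0.20 + rng.normal(0, 0.03, n)}

    # Fresh race where the truth is 0.55; each group reports with its lean.
    report = {"Dem": 0.75, "Rep": 0.35}
    for grp, past in expect.items():
        X = np.column_stack([np.ones(n), past])
        (a, b), *_ = np.linalg.lstsq(X, true_share, rcond=None)
        print(grp, round(a + b * report[grp], 3))  # both land near 0.55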
>>: When you are talking about gamification, are you getting information from [indiscernible] from winning or losing?
>> David Rothschild: How do you know if you are winning or losing? So, for what we are doing right now, we are running samples generally using Mechanical Turk or other outside groups, in which you know whether you are winning or losing based on the bonuses you are receiving, or based on the points you are getting in a given game.
>>: So how do you know what points to give me if I am [indiscernible] information to you, right?
>> David Rothschild: Right, so generally in these types of things you are providing information either on outcomes that have already happened but that you don't know about, or on things that will be randomly drawn in the future, and then we take that data to assign the points.
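One standard way to set such points, though not necessarily the exact rule used here, is a proper scoring rule such as the quadratic (Brier-style) score, under which reporting your honest probability maximizes your expected points:

    def quadratic_score(p, outcome):
        # p: reported probability the event happens; outcome: 1 if it did.
        # Expected score is maximized by reporting your true belief.
        return 1 - (outcome - p) ** 2

    # Points for a respondent who said 70% on a question:
    print(quadratic_score(0.7, 1))  # 0.91 if it happened
    print(quadratic_score(0.7, 0))  # 0.51 if it did not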
>>: [indiscernible] in the news [indiscernible] is starting to go towards this new method, that we can kind of watch as it starts to emerge?
>> David Rothschild: Yeah, so some of them are, including --. I tend to blog on some news outlets, and the New York Times has a statistician in house who does some things in probability, although I am not a big fan --. No, I was, but he has moved away from some of those things. And some other news outlets are starting to refer to the latest aggregated forecasts, or to Intrade prices, or [indiscernible] prices, or things like that.
But I think it's slow, because essentially there was always a kind of loss leader in talking about the latest poll that they are conducting; they want to conduct these polls and they want to talk about them. So I think what you are going to see quicker and more aggressively is people talking about just raw social media data, because that's kind of the prestigious thing for people to talk about now; whereas it used to be the prestigious thing to drop 100,000 dollars on a couple of polls during the course of an election, now people need to have their Twitter guy talking about the latest Twitter reaction to a given event.

So I think people are still far away from it. People know that it is out there; they are just still reluctant to kind of switch over.
>>: So I have heard that the polls that are out these days are mostly of registered voters, and that in the future they are going to start doing likely voters.
>> David Rothschild: Yeah, so it transfers over --.
>>: [indiscernible] change the poll results?
>> David Rothschild: Essentially that transfers over sometime around now, where early in the cycle you have polling of registered voters and then you start building in the likely voter models. It actually switches over kind of organically through the end of the summer. And what you will see is going to be a mixed bag, because different companies use different likely voter models, and so it's very hard for them to predict who's going to come out.

A lot of the bias between different polling companies comes in there. So you have Rasmussen and some other companies on the right that tend to come out with polls showing Romney slightly leading, and then you have some polls which are known to have slightly less bias. It's all coming from their estimation of who's going to come out and vote, and that's the way for them to kind of cycle it through.
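A crude sketch of how such house effects can be estimated and removed, using each firm's average deviation from the per-race, cross-pollster mean; the firms and their leans here are hypothetical:

    import numpy as np

    rng = np.random.default_rng(4)
    house = {"A": 2.0, "B": -1.0, "C": 0.0}  # invented leans, in points
    n_races = 60
    true_margin = rng.normal(0, 5, n_races)

    # One poll per firm per race: truth + house lean + sampling noise.
    polls = {r: [(f, true_margin[r] + lean + rng.normal(0, 2))
                 for f, lean in house.items()] for r in range(n_races)}

    # A firm's house effect: its mean deviation from the per-race average.
    dev = {f: [] for f in house}
    for readings in polls.values():
        avg = np.mean([m for _, m in readings])
        for f, m in readings:
            dev[f].append(m - avg)
    for f in house:
        print(f, round(float(np.mean(dev[f])), 2))
    # Recovers the leans up to a common shift (about +1.7, -1.3, -0.3);
    # subtracting these before averaging puts the firms on one scale.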
>>: How do the two political camps use their statistics?
>> David Rothschild: Well, they use it increasingly meaningfully is the short of it. You know, 5 or 10 years ago the basic statistic that people worked from was primary voters, so they used a triple-D or triple-R, meaning someone had voted in the primaries in previous election cycles, as a strong indication. That was kind of the extent of the data mining people were doing. Now it has become ridiculously sophisticated in how they shift tens of millions of dollars among different states, whether based on prediction markets or polling, but more likely on the internal pollsters they are hiring, as well as teams of kids, especially on the Obama side, who are translating Twitter data and other things for whatever they claim they can do.
But my guess is, and I haven't seen inside any of the campaigns, that they probably have a long way to go to actually make it meaningful, though it is definitely impactful. Sorry, meaningful in the sense of whether or not they are correct, but impactful in the sense that a huge amount of investment decisions are being based off of what they decide from these.
>>: Wouldn't they try to hire you and say, "Quit your job and work for us"?
>> David Rothschild: Right, so, um, I think Microsoft would not want me to work for one camp or the other. But you know, anyone who wants to take our publicly available data is more than welcome to translate that into investment decisions, I guess.
>>: Are you expanding the expectations model to include the size of my social network and [indiscernible]?
>> David Rothschild: Yeah.
>>: That’s my expectation as opposed to someone who is a social hub?
>> David Rothschild: No question. That's kind of our future work, which we are really excited about: weighing people by how many people they know. Right now we have kind of been torn on it. We have asked people, "How many people do you talk to about politics?" and the numbers actually come out fairly consistently. But then, you know, some people are going to say 1,000 people and some people are going to say 1, and obviously we are not going to weigh people directly on that.

We have also been working on deducing it from how accurate people have been, and that has been a fun challenge. But that is somewhere we would definitely like to go.
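A small sketch of that accuracy-based weighting on simulated respondents; weighting by inverse historical error rather than by self-reported network size is the idea from the talk, while all the numbers are invented:

    import numpy as np

    rng = np.random.default_rng(5)
    n_people, n_past = 50, 20
    truth = rng.uniform(0.3, 0.7, n_past)      # past quantities to forecast
    skill = rng.uniform(0.02, 0.20, n_people)  # each person's noise level
    history = truth + rng.normal(0, 1, (n_people, n_past)) * skill[:, None]

    # Weight each respondent by inverse historical mean squared error,
    # not by a self-reported "how many people do you talk to" count.
    mse = np.mean((history - truth) ** 2, axis=1)
    w = (1.0 / mse) / np.sum(1.0 / mse)

    reports = 0.6 + rng.normal(0, 1, n_people) * skill  # new question
    print(np.average(reports, weights=w))  # accuracy-weighted estimate
    print(np.mean(reports))                # unweighted, for comparison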
>>: Or their Facebook friends.
>> David Rothschild: Yeah, or their Facebook friends. And you know, at the end of the day the idea is to be able to poll this information more organically. I think the key thing is that in 5 or 10 years the idea of an actual poll won't make much sense to people, because it's all going to be ongoing data collection, a combination of passive and active data in which you won't be able to tell the difference. You are not going to realize you were just polled, per se, in the same way that Google and Microsoft and Yahoo and all these companies are, you know, predicting things all the time on an individual level. Why wouldn't we be able to do that for elections and other things like that? And so polling from social network data would definitely be part of it as well.
>>: So is one way you could determine a likely voter to ask some kind of question that somebody who pays attention to the news might know the answer to? Like you could ask them about this Obama sector and how he crashed his car. And if the person knew about that, maybe they are more likely to be a voter [indiscernible].
>> David Rothschild: There is a mass of literature on breaking down political information and what it means, and it's actually a very tricky question, because it's not always clear which questions translate into which outcomes. But there are actually a lot of people doing research on those types of things: which questions can we ask, and what does it imply if people understand them or don't understand them? Thinking about people's confidence in those questions is actually where my research has tended, though. We show that people who are extremely confident in their responses to these political questions are also less likely to be influenced by certain things. And we try to figure out exactly where all of that lies and what we know.
>>: That’s good.
Thank you [indiscernible].