
Backwards reasoning fallacy
Suppose it is known that, on average, 50% of the students who start a course pass it. Is
it correct to conclude the following?
a) A course that starts with 100 students will end up, on average, with 50 passes
b) A course that ends with 50 passes will, on average, have started with 100 students
In fact, a) is normally correct but b) is normally not. This has everything to do with the
way we reason with so-called prior assumptions (reasoning of this kind lies at the heart
of the Bayesian approach to probability).
To understand this fallacy we have to work with an example. The crucial prior
knowledge in this case is the so-called ‘distribution’ of student numbers who start
courses. Let’s suppose that these are courses in a particular college where the
‘average’ number of students per course is 180. We know that some courses will have
more than 180 and some less – the distribution of student numbers looks like this:
This is a bell curve (a so called Normal distribution) whose average (mean) is 180.
Distributions like this are characterised not just by the mean, but also by the so-called
‘variance’, which is how ‘spread out’ the distribution is. The lower the variance the
closer most of the data are to the mean. In this example the variance is 1000, which
means that about 95% of the data lie within two 'standard deviations' of the mean –
roughly plus or minus 63, since the standard deviation is the square root of the variance (here about 31.6).
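As a quick check of these figures, here is a small Python sketch (my own illustration, not part of the original model) that computes the standard deviation and the approximate 95% range implied by a Normal distribution with mean 180 and variance 1000:

    import math

    prior_mean = 180.0       # average number of students starting a course
    prior_variance = 1000.0  # spread of the distribution

    prior_sd = math.sqrt(prior_variance)  # about 31.6
    low = prior_mean - 2 * prior_sd
    high = prior_mean + 2 * prior_sd

    # Roughly 95% of courses start with between about 117 and 243 students
    print(f"standard deviation = {prior_sd:.1f}")
    print(f"approximate 95% range = {low:.0f} to {high:.0f}")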
Because the number of students who pass is influenced by the number who start, we
represent this relationship as follows.
As in any model like this (it is a so-called Bayesian net or risk map) we need to define
not only the distribution for the node representing 'number of students starting course'
(which we have said is a Normal distribution with mean 180 and variance 1000) but
also the distribution for the node representing 'number of students who pass'. Since this
latter number depends on the former, what we actually need to define is the so-called conditional distribution. We know that on average the number of students who
pass is 0.5 times the number who start. But we cannot say that this is a certain
relationship. What we can reasonably say is that the mean of the distribution of the
number who pass is 0.5 times the number who start. So, it seems reasonable again to
use a Normal distribution whose mean is 0.5 times the number who start. We assume
that the variance of this distribution is 500.
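To make the structure of this two-node model concrete, here is a minimal Monte Carlo sketch in Python (the variable names are mine, and simple simulation stands in for whatever Bayesian network tool is actually used; a real model would also restrict the counts to non-negative whole numbers):

    import numpy as np

    rng = np.random.default_rng(42)
    n_courses = 1_000_000  # number of simulated courses

    # Prior: number of students who start a course ~ Normal(180, variance 1000)
    starters = rng.normal(180.0, np.sqrt(1000.0), size=n_courses)

    # Conditional distribution: passes | starters ~ Normal(0.5 * starters, variance 500)
    passes = rng.normal(0.5 * starters, np.sqrt(500.0), size=n_courses)

    print(f"mean starters = {starters.mean():.1f}")  # close to 180
    print(f"mean passes   = {passes.mean():.1f}")    # close to 90 (0.5 * 180)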
Thus, if we know that 100 students start the course then the (predicted) distribution
for the number who will pass looks like this:
As you would expect, the mean of the predicted distribution is about 50.
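Continuing the sketch above, the forward prediction for a course known to start with 100 students is just the conditional distribution with the number of starters fixed at 100:

    import math

    starters_known = 100.0
    predicted_mean = 0.5 * starters_known  # 50 passes expected
    predicted_sd = math.sqrt(500.0)        # about 22.4

    # Roughly 95% of such courses end with between about 5 and 95 passes
    print(f"predicted mean passes = {predicted_mean:.0f}")
    print(f"approximate 95% range = {predicted_mean - 2 * predicted_sd:.0f}"
          f" to {predicted_mean + 2 * predicted_sd:.0f}")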
However, suppose we do not know the number who start but we know that 50 passed
a particular course. In this case we use the model to reason backwards and it gives the
following result:
Curiously, the mean of the predicted distribution for the number who started this
course is not 100 but much higher – it is about 153. This seems wrong, but in
fact the fallacy is to assume that the mean should have been 100.
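Because both distributions are Normal and the two nodes are linearly related, the backwards calculation can be reproduced exactly with a standard precision-weighted Gaussian update. The sketch below is my reconstruction from the numbers given in the text rather than the output of any particular tool, but it recovers the figure of about 153:

    # Prior on starters S: Normal(180, variance 1000)
    # Likelihood: passes P given S ~ Normal(0.5 * S, variance 500); observed P = 50
    prior_mean, prior_var = 180.0, 1000.0
    slope, cond_var = 0.5, 500.0
    observed_passes = 50.0

    # Combine the prior precision with the precision contributed by the observation
    post_var = 1.0 / (1.0 / prior_var + slope**2 / cond_var)
    post_mean = post_var * (prior_mean / prior_var
                            + slope * observed_passes / cond_var)

    print(f"posterior mean of starters = {post_mean:.1f}")  # about 153.3
    print(f"posterior variance         = {post_var:.1f}")   # about 666.7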
What is happening here is that the model is reasoning about our uncertainty in such a
way that our prior assumptions are taken into consideration. Not only do we know
that the average number of people who start a course is 180, but we also know that it
is very unlikely that fewer than 120 people start a course (there are some such courses
but they are rare). On the other hand, while a course with, say, 150 starters will on
average result in 75 passes, roughly 13% of the time the number of passes will be
50 or lower.
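The second point follows directly from the conditional distribution defined above; the percentage is my own calculation for a Normal distribution with mean 75 and variance 500:

    import math

    mean_passes = 0.5 * 150.0     # 75 passes expected with 150 starters
    sd_passes = math.sqrt(500.0)  # about 22.4

    # P(passes <= 50) under Normal(75, variance 500), via the standard normal CDF
    z = (50.0 - mean_passes) / sd_passes
    prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    print(f"P(passes <= 50 | 150 starters) = {prob:.2f}")  # about 0.13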
Hence, if we know that there is a very low number of passes, 50, on a course, there are
two 'explanations':
a) far fewer students might have started the course than the 180 we expected
b) the pass rate for the course might have been far lower than the 50% we expected.
What the model does is place more weight on the latter than on the former, so in effect it says: "I am
prepared to believe that fewer students started this course than expected, but the
stronger explanation for the low number of passes is that the pass rate was lower than
expected".
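One way to see this trade-off is to condition the Monte Carlo sketch above on observing roughly 50 passes and inspect the simulated courses that match; this crude rejection-sampling approximation (mine, not the original tool's method) shows both explanations shifting, with the implied pass rate shifting much more:

    import numpy as np

    rng = np.random.default_rng(0)
    n_courses = 2_000_000

    starters = rng.normal(180.0, np.sqrt(1000.0), size=n_courses)
    passes = rng.normal(0.5 * starters, np.sqrt(500.0), size=n_courses)

    # Keep only the simulated courses whose number of passes is close to 50
    match = np.abs(passes - 50.0) < 1.0
    post_starters = starters[match]
    post_pass_rate = passes[match] / starters[match]

    print(f"posterior mean starters  = {post_starters.mean():.0f}")   # about 153
    print(f"posterior mean pass rate = {post_pass_rate.mean():.2f}")  # well below 0.5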
This is the Bayesian approach to reasoning. Is it rational? Imagine that, give or take a
handful of students, there were always 180 students starting a course. If you discover
that only 50 students had passed a course, you would have to conclude that a much
lower than expected pass rate was to blame. In this extreme case your prior beliefs
about the number of starting students are very strong and cannot be shifted by other
observations. If you had no such strong beliefs about the number of starting students
then everything is different. For example, starting with a so-called 'ignorant' prior
distribution, if we discover 50 students passing a course then the model does indeed
conclude that the course is most likely to have started with about 100 students:
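In terms of the precision-weighted update sketched earlier, an 'ignorant' prior corresponds to letting the prior variance become very large, at which point the posterior mean collapses to the value implied by the observation alone:

    slope, cond_var = 0.5, 500.0
    observed_passes = 50.0

    # An (almost) ignorant prior: a huge variance means the prior carries
    # essentially no information about the number of starters
    prior_mean, prior_var = 180.0, 1e12

    post_var = 1.0 / (1.0 / prior_var + slope**2 / cond_var)
    post_mean = post_var * (prior_mean / prior_var
                            + slope * observed_passes / cond_var)

    print(f"posterior mean of starters = {post_mean:.1f}")  # about 100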
This type of problem is extremely common. For example, in a real-life project I was
involved in we were tackling the problem of attrition rates for classes of military
vehicles in combat. On the one hand we needed to know the likely number of vehicles
left operational at the end of combat given certain combat scenarios. On the other
hand, given a requirement for a minimum number of vehicles to be operational at the
end of combat we needed to calculate the minimum number of vehicles to start with.
Although the model involved many variables, you can think of it in exactly the same terms as
the model above, where 'vehicles at start of combat' replaces 'students at start of course'
and 'operational vehicles at end of combat' replaces 'students who pass the course'. As
in the above example users of the model could not understand why it predicted 50
vehicles at the end of combat given 100 vehicles at the start, yet predicted over 150
vehicles at start given 50 at the end.
Since the prior distribution for vehicles had been provided by the users themselves, the
model was working correctly, even if it did not produce the results that they felt were
‘sensible’. In this case it was the strength of the prior distribution that the users had to
review for correctness.