Mathematical Ideas that Shaped the World

Bayesian Statistics

Plan for this class
Why is our intuition about probability so bad?
What is the chance that two people in this room were born a few days apart?
What is conditional probability?
If someone’s DNA is found at a crime scene, what is the chance they are guilty?
How can we spot bad statistics in the media?

An unfortunate truth
Humans have an extraordinarily bad intuition about probability.

Winning the lottery
What do you think your chances of winning the lottery are?
Say whether winning the lottery is more or less likely to happen than each of the following events…

Is winning the lottery more or less likely? (LESS or MORE)
1 in 4,096: getting 12 heads in a row when flipping a fair coin
1 in 24,000: dying in a road accident within 1 year
1 in 25 million: dying on the next flight you take
1 in 1 million: being struck by lightning
1 in 300 million: dying from a shark attack
1 in 2 million: dying in the next hour from any cause whatsoever

Conclusion
Winning the lottery has surprisingly bad odds: 1 in 13,983,816.
Yet many people are convinced that it is likely to happen to them one day.
We mix up the probability of someone winning the lottery (which is quite likely) with the probability of us winning the lottery.

The birthday problem
How many people need to be in a room together so that there is a more than 50% chance of two of them having the same birthday?
A) 300
B) 183
C) 91
D) 23

Number of people   Probability that 2 people share a birthday
10                 11.7%
20                 41.1%
23                 50.7%
30                 70.6%
50                 97%
57                 99%
100                99.99997%
200                99.9999999999999999999999999998%
366                100%

The birthday graph
[Figure]

In this room?
What is the chance that two people in this room have birthdays less than 3 days apart (ignoring the year)?
Answer: more than 50%

Monty Hall
Behind 1 door is a sheep. Behind the other 2 doors are other, non-sheepy, animals.
You choose a door. I open a different door showing a non-sheep.
Given the choice now of sticking with your choice or switching, what should you do?

Suppose you choose Door 1…

Door 1        Door 2        Door 3        Stick      Switch
Sheep!        Not a sheep   Not a sheep   Sheep!     No sheep
Not a sheep   Sheep!        Not a sheep   No sheep   Sheep!
Not a sheep   Not a sheep   Sheep!        No sheep   Sheep!

If you stick with your choice, you only win 1 time out of 3. If you switch, you win 2 times out of 3.

Conditional probability
Conditional probability is the chance of something happening given that another event has already happened.
For example: you throw two dice. What is the probability of the first die being a 6 given that the sum of the two dice is 8?
What if the sum of the two dice was 6 or 7?

How to think about conditional probability
Conditional probability is all about updating your odds in light of new evidence.
There are a priori odds: the initial probability of an event. E.g. the probability of the first die being a 6 is a priori 1 in 6.
After new evidence, you have a posteriori odds. E.g. the probability of the first die being a 6, given that the sum of the two dice is 8, is 1 in 5.

Boy or girl?
I have a friend who has 2 children. At least one of the children is a boy.
What is the chance that the other child is also a boy?
Answer: 1 in 3

Explanation
A priori, there are 4 equally likely combinations of children (oldest first):
Boy-Boy
Boy-Girl
Girl-Boy
Girl-Girl
From our new evidence, we know that Girl-Girl is not possible, leaving only 3 options.
Of these 3 options, only one is Boy-Boy.

A paradox?
If you know that the oldest child is a boy, the probability of the other child being a boy is 50%.
If you know that the youngest child is a boy, the probability of the other child being a boy is 50%.
Surely the first boy must be either the youngest or the oldest?!
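The dice and boy-or-girl answers above can be checked by listing every equally likely outcome and counting. The following is a minimal sketch of that idea, not part of the original slides; the helper name `conditional_probability` is made up for illustration, and the code assumes fair dice and that boys and girls are equally likely.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(outcomes, event, given):
    """P(event | given), counted over a list of equally likely outcomes."""
    consistent = [o for o in outcomes if given(o)]      # outcomes the evidence allows
    favourable = [o for o in consistent if event(o)]    # of those, where the event holds
    return Fraction(len(favourable), len(consistent))


# Two fair dice: 36 equally likely (first, second) pairs.
dice = list(product(range(1, 7), repeat=2))


def first_is_six(roll):
    return roll[0] == 6


# P(first die is 6 | sum is 6, 7 or 8)
p_six_given_sum = {
    total: conditional_probability(
        dice, first_is_six, lambda roll, t=total: sum(roll) == t
    )
    for total in (6, 7, 8)
}

# Two children, oldest first. Assumption: each is a boy or girl with equal chance.
children = list(product("BG", repeat=2))


def both_boys(kids):
    return kids == ("B", "B")


p_given_at_least_one_boy = conditional_probability(
    children, both_boys, lambda kids: "B" in kids
)
p_given_oldest_is_boy = conditional_probability(
    children, both_boys, lambda kids: kids[0] == "B"
)
p_given_youngest_is_boy = conditional_probability(
    children, both_boys, lambda kids: kids[1] == "B"
)

print("First die is 6, given the sum:", p_six_given_sum)
print("Both boys, given at least one boy:", p_given_at_least_one_boy)
print("Both boys, given the oldest is a boy:", p_given_oldest_is_boy)
print("Both boys, given the youngest is a boy:", p_given_youngest_is_boy)
```

Counting this way suggests where the apparent paradox comes from: "the oldest is a boy" rules out two of the four combinations, while "at least one is a boy" rules out only one, so the two pieces of evidence are not interchangeable.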
Homework
I have a friend who has two children. At least one of the children is a boy who was born on a Tuesday.
What is the chance that the other child is also a boy?

Confusion of the inverse
People have a tendency to assume that a conditional probability and its inverse are similar.
For example, both of these arguments are flawed:
If sheep enjoy eating grass, then an animal that likes grass is likely to be a sheep.
If most accidents happen within 20 miles of home, then you are safest when you are far from home.

Manipulating statistics
A. Taillandier (1828) found that 67% of prisoners were illiterate.
“What stronger proof could there be that ignorance, like idleness, is the mother of all vices?”
But what proportion of illiterate people were criminals?

Bayesian statistics
The first person we know of who looked seriously into conditional probabilities was Thomas Bayes.
He was the first person to write down a formula connecting the two inverse conditional probabilities.
Bayesian statistics is all about updating the odds of an event after receiving new evidence.

Thomas Bayes (1702 – 1761)
Son of a London Presbyterian minister.
Studied logic and theology at the University of Edinburgh.
In 1722 returned to London to assist his father before becoming a minister of his own church in Tunbridge Wells, Kent, in 1733.
During his lifetime, Bayes only published two papers.
One was on “Divine Benevolence”.
The other was a defence of “The Doctrine of Fluxions” against the attack of George Berkeley.
His most famous paper, “An Essay towards solving a problem in the Doctrine of Chances”, was published posthumously in 1764.

Bayes’ Theorem
P(A|B) = P(B|A) × P(A) / P(B)
P(A) is the prior probability of A.
P(B) is the prior probability of B.
P(A|B) is the probability of A happening, given that B has happened.
P(B|A) is the probability of B happening, given that A has happened.

Importance of Bayes’ Theorem
Bayes’ Theorem is especially useful in medicine and in law.
Most doctors get the following question wrong. Let’s see what you think!
A test for breast cancer
1% of women aged 40 will get breast cancer.
Out of the women who have breast cancer, 80% will have a positive test result.
Out of the women who don’t have breast cancer, 10% will get a positive result.
If a woman tests positive for breast cancer, what is the chance she actually has it?

Doing the numbers
Consider 10,000 women.
100 of them will have breast cancer: 80 of them test positive and 20 of them test negative.
9,900 of them don’t have breast cancer: 990 of them test positive and 8,910 of them test negative.
In total there are (80 + 990) = 1,070 positive results, of which only 80 come from women who have cancer. That’s about 7.5%.

The prosecutor’s fallacy
Suppose a prosecutor in a court case finds a piece of evidence, e.g. a DNA sample.
They argue that the probability of finding this evidence if the defendant were innocent is tiny.
Therefore the defendant is very unlikely to be innocent.
Where is the fallacy in this argument?
If the a priori chance of the defendant’s guilt is very low, then it may still be low after presentation of this evidence.
Just as in the cancer example, a false positive may be much more likely than a true positive in the absence of other evidence.

Exhibit 1: Sally Clark, 1999
Convicted of murdering both her sons.
Paediatrician Roy Meadow argued that the chance of both children dying naturally was 1 in 73 million.
He didn’t take into account that a double murder would have been even more unlikely.
Conviction overturned in 2003.

Exhibit 2: Denis Adams, 1996
Convicted of rape based on DNA found at the scene of the crime.
Probability of a match said to be 1 in 20 million.
There was no other evidence to convict: the victim did not identify Adams in a line-up and Adams had an alibi.
The defence team instructed the jury in the use of Bayes’ Theorem. The judge questioned its appropriateness.
After 2 appeals, Adams’ conviction still stands.
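The breast cancer numbers earlier in this section can also be obtained directly from Bayes’ Theorem, without counting 10,000 women. This is a minimal sketch rather than anything from the original slides: the function name `posterior` and its parameter names are made up for illustration, and P(B) is expanded over the two cases "has cancer" and "does not have cancer". The alternative priors in the loop are hypothetical values, included only to show how strongly the answer depends on the prior.

```python
from fractions import Fraction


def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is the total chance of seeing the evidence, whether A is
    true (prior) or false (1 - prior).
    """
    p_evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


# Figures from the slides: 1% have cancer, 80% true positives, 10% false positives.
p_cancer_given_positive = posterior(
    prior=Fraction(1, 100),
    p_evidence_if_true=Fraction(80, 100),
    p_evidence_if_false=Fraction(10, 100),
)
print("P(cancer | positive test) =", p_cancer_given_positive)
print(f"As a percentage: {float(p_cancer_given_positive):.1%}")

# Hypothetical priors (not from the slides): same test, different starting odds.
for hypothetical_prior in (Fraction(1, 1000), Fraction(1, 10), Fraction(1, 2)):
    result = posterior(hypothetical_prior, Fraction(80, 100), Fraction(10, 100))
    print(f"prior {float(hypothetical_prior):.1%} -> posterior {float(result):.1%}")
```

The same function applies to the courtroom setting if the prior is read as the chance of guilt before the evidence and the false positive rate as the chance of a match for an innocent person, which is why a very small prior can leave the posterior far lower than the match probability suggests.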
A rule against Bayes
In 2010 a convicted killer known as “T” appealed against his conviction.
Part of the evidence was based on the special markings on his Nike trainers.
The data on how many pairs of such trainers existed was unreliable.
It has now been ruled that Bayes’ Theorem is not allowed in court unless the underlying statistics are “firm”.

Quotes about statistics
“98% of all statistics are made up.”
“The average human has one breast and one testicle.”
“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.”
“There are three kinds of lies: lies, damned lies, and statistics.”

Misuse of statistics
We are going to look at some examples of bad statistics in the media.
What things should we look out for to spot bad maths and stats?

Strange patterns
Matt Parker, of Queen Mary University of London, looked at 800 ancient sites.
3 sites, around Birmingham, formed a perfect equilateral triangle.
Extending the base of this triangle links up 2 more sites, more than 150 miles apart, with an accuracy of 0.05%.

Ancient sites?

What to watch out for
Events assumed to be independent (e.g. ‘6 double yolks’ article)
Patterns found using large amounts of data (e.g. ‘ancient sat-nav’ article)
Other factors not taken into account (e.g. ‘perfect whist deal’ article)
Confusion of the inverse
Omission of relevant data
Misleading labelling of graphs

Lessons to take home
Don’t play the lottery.
Think very carefully when you are asked a question about probability.
Don’t confuse conditional probabilities with their inverses.
Ask questions whenever you see statistics in the media! (And write in to report bad journalism!)
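In the spirit of the last lesson, some of the figures quoted in this class can be recomputed rather than taken on trust. The sketch below is not from the original slides. It assumes 365 equally likely birthdays (leap years ignored) and assumes the lottery in question draws 6 numbers from 49, a format that happens to reproduce the quoted odds of 1 in 13,983,816; the function name `p_shared_birthday` is made up for illustration.

```python
from math import comb, prod


def p_shared_birthday(people, days=365):
    """Chance that at least two of `people` share a birthday.

    Assumes `days` equally likely birthdays. Works by computing the
    chance that everyone's birthday is different and subtracting from 1.
    """
    if people > days:
        return 1.0  # more people than days: a shared birthday is certain
    p_all_different = prod((days - i) / days for i in range(people))
    return 1 - p_all_different


# Assumption: a 6-from-49 lottery, where every combination is equally likely.
lottery_combinations = comb(49, 6)

# 12 heads in a row with a fair coin: one sequence out of 2**12.
twelve_heads_sequences = 2 ** 12

print("Lottery: 1 in", f"{lottery_combinations:,}")
print("12 heads in a row: 1 in", f"{twelve_heads_sequences:,}")
for n in (10, 20, 23, 30, 50, 57):
    print(f"{n} people: {p_shared_birthday(n):.1%}")
```

Computing "everyone is different" first is a deliberate choice: it is a simple product, whereas counting all the ways that at least two people can match directly is much messier.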