So you think you can do statistics?
Lecture 1b: Kahneman revisited, simulating extreme events, over-confidence, and coincidence

1) Discussion of ideas from the Kahneman reading and the Most Dangerous Equation.
- Where in your own discipline would these ideas be most relevant?
- Other heuristics: which ones seem most important?
- What do you think about the Gates Foundation small school program in light of all this?

2) Simulating the marble sampling game with R

    ## Creating a Population
    urn <- c(rep("red", 20), rep("clear", 20))

    ## Sampling From the Population
    # mix up the marbles
    sample(urn)
    # take 4 marbles (without replacement) and keep them for later
    four.marbles <- sample(urn, 4)
    four.marbles
    # take 4 marbles one at a time, putting each one back (with replacement)
    sample(urn, 4, replace = TRUE)

    ## Recording Something About the Sample
    extremeEvent <- rep(0, 10)
    extremeEvent
    # an "extreme event" is a sample in which every marble is the same color
    if (all(four.marbles == "clear") || all(four.marbles == "red")) {
      extremeEvent[1] <- 1
    }
    extremeEvent

    ## Sampling and Recording Many Times
    draws <- 1000
    extremeEvent <- rep(0, draws)
    sampleSize <- 4
    for (i in 1:draws) {
      samp <- sample(urn, sampleSize)
      if (all(samp == "clear") || all(samp == "red")) {
        extremeEvent[i] <- 1
      }
    }
    # proportion of draws that produced an extreme event
    sum(extremeEvent) / draws

[Figure: Ashley's experiment of experiments. Two histograms of probExtreme (x-axis: probExtreme, y-axis: Frequency), one labeled n = 4, N = 140 and one labeled n = 7, N = 103. Blue lines mark approximately what was observed in the class marble experiment.]

3) Over-confidence: From the class SurveyMonkey survey. If we were appropriately confident, then the ranges in which we say we are 95% sure the truth lies should cover the truth about 95% of the time. Ours covered it 30% of the time.

Another example: Subjects are asked to estimate some quantity whose exact value they typically do not know, such as the surface area of Lake Michigan or the number of registered cars in Sweden.
They are asked not for a single number, but for an upper bound and a lower bound together encompassing an interval with the property that the subject attaches a subjective probability of 98% to the event that the true value lies in the interval. If subjects are well-calibrated in terms of the confidence they attach to their estimates, then one would expect them to hit the true value about 98% of the time. In experiments, they do so less than 60% of the time, indicating that severe overconfidence in estimating unknown quantities is a wide-spread phenomenon.
(Source: Häggström hävdar, http://haggstrom.blogspot.com/)

Coincidence: Apophenia is the experience of perceiving patterns or connections in random or meaningless data. The term is attributed to Klaus Conrad by Peter Brugger, who defined it as the "unmotivated seeing of connections" accompanied by a "specific experience of an abnormal meaningfulness", but it has come to represent the human tendency to seek patterns in random information in general, such as with gambling and paranormal phenomena.

Exploring the Birthday problem: The headmaster of a large school notices that in more than half of the classes there are at least two children whose birthdays coincide. He knows that the average class size in his school is 30, so he reckons that all those coinciding birthdays must be a record.

Big Truths Today:
- Sample size matters
- Extreme results are more likely with smaller sample sizes, and more likely to turn up at least once when there are more trials
- We tend to weigh subjective information more heavily than base rates
- We naturally think things are self-correcting, but this doesn't apply to random variables
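
To explore the headmaster's birthday problem, here is a minimal sketch in R. It assumes 365 equally likely birthdays (no leap years, no seasonal effects) and that every class has exactly 30 children, which simplifies "average class size 30". The names pSharedBirthday, classSize, pExact, and pSim are made up for this illustration and are not from the lecture.

```r
# Probability that at least two of k people share a birthday,
# ASSUMING `days` equally likely birthdays and independent children.
pSharedBirthday <- function(k, days = 365) {
  if (k > days) return(1)
  # P(no shared birthday) = (365/365) * (364/365) * ... * ((365 - k + 1)/365)
  1 - prod((days - seq_len(k) + 1) / days)
}

classSize <- 30  # assumption: every class has exactly 30 children
pExact <- pSharedBirthday(classSize)
pExact

# Simulation check, in the same style as the marble game
set.seed(1)
draws <- 10000
shared <- rep(0, draws)
for (i in 1:draws) {
  birthdays <- sample(1:365, classSize, replace = TRUE)
  if (any(duplicated(birthdays))) {
    shared[i] <- 1
  }
}
pSim <- sum(shared) / draws
pSim
```

Under these assumptions pExact is about 0.71, so a shared birthday in more than half of the classes is roughly what chance alone would produce, not a record.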
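
The "sample size matters" point can also be checked exactly for the marble game, without simulating. The sketch below assumes the 20 red / 20 clear urn defined in the R code for the marble game and sampling without replacement. The function name pExtreme and the particular sample sizes tried are illustrative choices, not part of the lecture.

```r
# Exact probability that every marble in the sample is the same color,
# drawing without replacement from an urn of nRed red and nClear clear marbles.
# ASSUMPTION: defaults match the 20/20 urn used in the marble simulation.
pExtreme <- function(sampleSize, nRed = 20, nClear = 20) {
  total <- nRed + nClear
  # ways to draw all red, plus ways to draw all clear, over all possible samples
  (choose(nRed, sampleSize) + choose(nClear, sampleSize)) / choose(total, sampleSize)
}

# illustrative sample sizes; 4 and 7 match the panels in the histogram figure
sampleSizes <- c(2, 4, 7, 10)
data.frame(sampleSize = sampleSizes, probExtreme = pExtreme(sampleSizes))
```

The exact probability falls quickly as the sample grows, which is the pattern the simulated value of sum(extremeEvent)/draws should approach for a given sampleSize.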