# Thursday week 1

```
So you think you can do statistics?
Lecture 1b: Kahneman revisited, simulating extreme events, over-confidence,
and coincidence
1) Discussion of ideas from Kahneman reading and the Most Dangerous Equation.
- Where in your own discipline would these ideas be most relevant?
- Other heuristics – which ones seem most important?
- What do you think about the Gates Foundation small school program in light of all this?
2) Simulating the marble sampling game with R
## Creating a Population
urn <- c(rep("red", 20), rep("clear", 20))
## Sampling From the Population
# mix up the marbles
sample(urn)
# take 4 marbles (without replacement) and keep them for the check below
four.marbles <- sample(urn, 4)
four.marbles
# take 4 marbles with replacement (each marble goes back before the next draw)
sample(urn, 4, replace = TRUE)
## Recording Something About the Sample
extremeEvent <- rep(0, 10)
extremeEvent
# an "extreme" sample is one where all four marbles are the same colour
if (all(four.marbles == "clear") || all(four.marbles == "red")) {
  extremeEvent[1] <- 1
}
extremeEvent
## Sampling and Recording Many Times
draws <- 1000
extremeEvent <- rep(0, draws)
sampleSize <- 4
for (i in 1:draws) {
  # draw a fresh sample and flag it if every marble is the same colour
  samp <- sample(urn, sampleSize)
  if (all(samp == "clear") || all(samp == "red")) {
    extremeEvent[i] <- 1
  }
}
# proportion of draws that produced an extreme sample
sum(extremeEvent) / draws
[Figure: Ashley's experiment of experiments. Two panels, each titled "Histogram of probExtreme" (x-axis: probExtreme, 0.00 to 0.15; y-axis: Frequency), labelled n = 4, N = 140 and n = 7, N = 103. Blue lines are about what was observed in the in-class marble experiment.]
3) Over-confidence: From the class SurveyMonkey survey. If we were appropriately confident, then the ranges
in which we say we are 95% sure the truth lies should cover the truth about 95% of the time.
We covered it 30% of the time.
Another example: Subjects are asked to estimate some quantity whose exact value they typically do not
know, such as the surface area of Lake Michigan or the number of registered cars in Sweden.
They are asked not for a single number, but for an upper bound and a lower bound together
encompassing an interval with the property that the subject attaches a subjective probability of 98% to
the event that the true value lies in the interval.
If subjects are well-calibrated in terms of the confidence they attach to their estimates, then one would
expect them to hit the true value about 98% of the time.
In experiments, they do so less than 60% of the time, indicating that severe overconfidence in
estimating unknown quantities is a wide-spread phenomenon.
Häggström hävdar - http://haggstrom.blogspot.com/
4) Coincidence:
Apophenia is the experience of perceiving patterns or connections in random or meaningless data. The term is
attributed to Klaus Conrad[1] by Peter Brugger,[2] who defined it as the "unmotivated seeing of
connections" accompanied by a "specific experience of an abnormal meaningfulness", but it has come to
represent the human tendency to seek patterns in random information in general, such as with
gambling and paranormal phenomena.
Exploring the Birthday problem: The headmaster of a large school notices that in more than half of the
classes there are at least two children whose birthdays coincide. He knows that the average class size in
his school is 30, so he reckons that all those coinciding birthdays must be a record.
Big Truths Today:
Sample size matters
Rare (extreme) events are more likely with smaller sample sizes, and more likely to turn up somewhere as the number of trials grows
We tend to weigh subjective information more than base rates
We naturally think things are self-correcting but this doesn’t apply to random variables
```
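
Two numbers in these notes can be checked directly in R. The sketch below is not part of the lecture code. It assumes the 20 red / 20 clear urn from the notes, sampled without replacement, and for the birthday problem it assumes 365 equally likely birthdays (no leap years, no seasonal effects). The names `probExtremeExact` and `probSharedBirthday` are made up for illustration.

```r
# --- Marble game: exact chance of an "extreme" (single-colour) sample ---
# Assumption: 20 red and 20 clear marbles, drawn without replacement.
probExtremeExact <- function(sampleSize, red = 20, clear = 20) {
  # ways to draw all red plus ways to draw all clear, over all possible draws
  (choose(red, sampleSize) + choose(clear, sampleSize)) /
    choose(red + clear, sampleSize)
}
exact4 <- probExtremeExact(4)  # sample size used in the simulation loop
exact7 <- probExtremeExact(7)  # the other sample size that appears in the notes

# --- Birthday problem: chance that a class of 30 has a shared birthday ---
# Assumption: 365 equally likely birthdays.
probSharedBirthday <- function(classSize, days = 365) {
  # 1 minus the chance that all birthdays are different
  1 - prod((days - seq_len(classSize) + 1) / days)
}
exactBirthday <- probSharedBirthday(30)

# Simulation check, written in the same style as the marble loop
set.seed(1)
classes <- 10000
shared <- rep(0, classes)
for (i in 1:classes) {
  birthdays <- sample(1:365, 30, replace = TRUE)
  if (any(duplicated(birthdays))) {
    shared[i] <- 1
  }
}
simBirthday <- sum(shared) / classes

round(c(n4 = exact4, n7 = exact7,
        birthday = exactBirthday, birthdaySim = simBirthday), 3)
```

Under these assumptions the exact chance of an extreme sample is about 0.106 for n = 4 and about 0.008 for n = 7, which is the "sample size matters" point in numbers. The chance of a shared birthday in a class of 30 is about 0.706, so a match in more than half of the headmaster's classes is roughly what one would expect rather than a record.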