How frequently does randomness cause pattern? Under what circumstances are you more likely to see patterns by chance alone?
Ashley

Explain the idea behind Kahneman's librarian/farmer example. How is that related to Bayesian statistics?
[Figure: Farmers and Librarians, each with a region labeled "Introvert".]
It's all about the base rates. Base rates are important.
Martin

[Figure: the same layout for hypothesis tests. Null hypothesis true (no difference in population) and null hypothesis false (difference in population), each with a region labeled "reject".]
Base rates are important.
Martin

What is the probability you do not have AIDS if you test +?
Do not have AIDS (null hypothesis is true): 10,000 people, of whom 500 test +
Have AIDS (null hypothesis is false): 10 people, of whom all 10 test +
P(do not have AIDS | test +) = 500/510 = 0.98
even though
P(test + | do not have AIDS) = 500/10,000 = 0.05

What property of a county (that has nothing to do with health) makes it more likely to have an extreme rate of kidney cancer?
Martin

What is statistical "power"? Why does it matter?
Probability of rejecting the null hypothesis, given that the null hypothesis is FALSE.
Martin

Example of a power analysis

    sampleSizes <- c(5, 10, 20, 40, 60, 80, 100, 120, 140, 160)
    power_vals <- numeric(length(sampleSizes))
    for (j in seq_along(sampleSizes)) {
      sampSize <- sampleSizes[j]
      p_values <- numeric(10000)
      for (i in 1:10000) {
        # two groups whose population means differ by 15 (sd = 40)
        groupA <- rnorm(sampSize, 180, 40)
        groupB <- rnorm(sampSize, 180 + 15, 40)
        p_values[i] <- t.test(groupA, groupB)$p.value
      }
      # power = proportion of simulated tests with p <= 0.05
      power_vals[j] <- mean(p_values <= 0.05)
    }

[Figure: Power (0.0 to 1.0) plotted against Sample size (0 to 150).]

What are the two different kinds of mistakes scientists are worried about making when conducting a statistical test? (Explain Type I and Type II errors in English.)
Ashley

How many "significant" results (P < 0.05) are you likely to see, on average, when you conduct 60 tests and the null hypothesis is true?
0.05 × 60 = 3
Martin

Explain the difference between exploratory and confirmatory analysis.
“Exploratory analysis is w/o pre-assumption. Confirmatory analysis first assume and then seek to prove it.”

Exploratory analyses: initiated by data and vague ideas.
Confirmatory analyses: driven by one (or more) explicit, predetermined questions.

Exploratory analyses: look for patterns, connections, new ideas.
Confirmatory analyses: answer questions, test hypotheses.

Exploratory products: graphs, interesting ideas, new hypotheses, models for further evaluation.
Confirmatory products: statistical test results, answers (always subject to revision), “final” models (also subject to future revisions).

Science proceeds by a balance, often found within a given project.
All confirmatory -> end of new ideas.
All exploratory -> little true forward progress in understanding mechanisms and observed relationships.

Best practice: be explicit about what you are doing, limit inferential conclusions from exploratory analyses, and blend the two within a project as two distinct steps.

Name 4 ways of making a comparison between two populations based on the means of a sample from each population.
Ashley

What information can you derive from a mean and a confidence interval?
Ashley

What is a bootstrap sample?
[Figure: the values 5, 2, 9, 7, -3, 3, 4, 11, -1, 0, 5, 8, 2, followed on later builds by repeated resamples of those values.]
Martin
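As a rough illustration of the bootstrap question, the sketch below draws resamples in R. It assumes the 13 values shown on the slide are the original sample. The seed, the number of resamples, and the names `x`, `one_resample`, `boot_means`, and `ci` are illustrative choices, not taken from the slides.

```r
# Values shown on the slide; treating them as the original sample is an assumption.
x <- c(5, 2, 9, 7, -3, 3, 4, 11, -1, 0, 5, 8, 2)

set.seed(1)       # arbitrary seed so the sketch is reproducible
n_boot <- 10000   # number of resamples (arbitrary choice)

# One bootstrap sample: same size as x, drawn WITH replacement,
# so some values can appear several times and others not at all.
one_resample <- sample(x, size = length(x), replace = TRUE)

# Repeat many times, keeping the mean of each resample.
boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(mean(x))
print(ci)
```

The spread of `boot_means` stands in for the sampling variability of the mean. The percentile interval is one common way to turn that spread into an approximate confidence interval.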
[Figure: repeated resamples of the original values, each with its own mean: 5.3, 4.9, 5.1, 5.2.]
Martin

How is a permutation test different from a t-test?
Ashley

Name 3 reasons to use simulations in statistical thinking.
• Power analyses / sample size
• Am I getting the correct answer?
• Can “try out” your analysis / plots.
• It’s FUN.
Martin

If you know the mean (m), standard deviation (sd) and size of a sample (n), what is one way to calculate an approximate 95% confidence interval?
Ashley

You are considering the results from one study in which the researchers rejected the null hypothesis. This study comes from a group of 200 similar studies. You believe that for about half of the studies the null hypothesis is true and for the other half it is false. The researchers rejected if the p-value was less than 0.01, and the tests had a power of 0.8. Use simple frequencies to calculate the probability that the null hypothesis is false.
Martin

Null hypothesis true (no difference in population): 100 studies
Null hypothesis false (difference in population): 100 studies
Of the 100 studies with a true null hypothesis, 1 rejects (0.01 × 100).
Of the 100 studies with a false null hypothesis, 80 reject (0.8 × 100).
P(null hypothesis false | reject) = 80/81 = 0.99
Base rates are important.
Martin

Name 3 hazards that you are exposed to every day. Estimate how they influence your risk of mortality.
• Student going berserk! (LOW……I hope!)
• Bicycle crash (0.35 micromorts / day)
• Heart attack, etc. (15 micromorts / day)
Martin

What are three questions you should ask yourself when evaluating a published paper? A newspaper story?
Ashley
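The simple-frequency reasoning used earlier for the 200-study question and for the screening-test slide can be sketched in R as follows. The function name `p_null_false_given_reject` and its argument names are hypothetical, introduced only for this illustration. The counts and rates are the ones stated on the slides.

```r
# Posterior probability from simple counts (illustrative helper, not from the slides).
# n_true, n_false: number of cases where the null is true / false
# alpha: P(reject | null true); power: P(reject | null false)
p_null_false_given_reject <- function(n_true, n_false, alpha, power) {
  false_rejects <- n_true * alpha    # rejections where the null is actually true
  true_rejects  <- n_false * power   # rejections where the null is actually false
  true_rejects / (true_rejects + false_rejects)
}

# 200 studies: 100 true nulls, 100 false nulls, threshold 0.01, power 0.8.
studies <- p_null_false_given_reject(100, 100, alpha = 0.01, power = 0.8)
print(round(studies, 2))      # 80 / 81

# Screening slide: 10,000 without the disease (5% test +), 10 with it (all test +).
screen <- p_null_false_given_reject(10000, 10, alpha = 0.05, power = 1)
print(round(1 - screen, 2))   # P(no disease | test +) = 500 / 510
```

Changing `n_true` and `n_false` while holding `alpha` and `power` fixed shows how strongly the answer depends on the base rate.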