Tests and beyond

Group Presentations
• Group 1: How do weird distributions affect the p-value when the null hypothesis is true?
• Group 2: What does the p-value say about the effect size?
• Group 3: How does sample size affect the probability of rejecting (p ≤ 0.05) the null, when the null hypothesis is true?
• Group 4: How does increasing sample size change your interpretation of rejecting (p ≤ 0.05) the null?
• Group 5: The estimated effect size if you reject.

Power
How frequently would I reject if there was a difference of X in the population means?

Example of a power analysis
How often would I reject if the difference between the means was 15?

p_values <- numeric(1000)
for(i in 1:1000){
  groupA <- rnorm(20, 180, 40)
  groupB <- rnorm(20, 180 + 15, 40)
  p_values[i] <- perm.test(groupA, groupB)$p.value
}
length(p_values[p_values <= 0.05])/1000

[Figure: histograms of the simulated p-values, with the region p ≤ 0.05 shaded. Power = 0.21: the probability of rejecting given a difference of 15, standard deviation = 40, and sample size = 20.]

Where do we get the difference and standard deviation?
• Theory (e.g. expect at most a difference of XX).
• Other studies.
• A pilot study.
• Regulations (must show an increase of at least ZZ).
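The simulated power of about 0.21 can be cross-checked analytically; a minimal sketch using base R's `power.t.test`, which assumes a two-sample t-test rather than the permutation test above (the two agree closely for normal data):

```r
# Analytic power for a two-sample t-test:
# true difference of 15, sd of 40 in each group, n = 20 per group
pw <- power.t.test(n = 20, delta = 15, sd = 40, sig.level = 0.05)
round(pw$power, 2)  # close to the simulated value of 0.21
```

This is a useful sanity check on the simulation: if the simulated and analytic answers disagree badly, something is wrong with the simulation code.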
Total Cholesterol Example

Example of a power analysis

sampleSizes <- c(5, 10, 20, 40, 60, 80, 100, 120, 140, 160)
power_vals <- numeric(10)
for(j in 1:10){
  sampSize <- sampleSizes[j]
  p_values <- numeric(10000)
  for(i in 1:10000){
    groupA <- rnorm(sampSize, 180, 40)
    groupB <- rnorm(sampSize, 180 + 15, 40)
    p_values[i] <- t.test(groupA, groupB)$p.value
  }
  power_vals[j] <- length(p_values[p_values <= 0.05])/10000
}

[Figure: power vs. sample size curves for effect sizes of 15 and 10; power climbs toward 1.0 as sample size grows from 0 to 160, more slowly for the smaller effect.]

This stuff makes my head hurt. There's got to be an easier way!
• Hypothesis Testing
• Estimation

Estimation & Confidence Intervals

What is a Confidence Interval?
• Unfortunately it's not: 95% probability that the true mean lies in the CI.
• It is: 95% of the time the CI includes (covers) the true mean (difference, etc.).

[Figure: many simulated confidence intervals plotted against the true difference; most cover it, a few miss.]

Issues with Tests
• Statistical vs practical significance.
• Power of a test.
• What does not rejecting mean?
• What does rejecting mean?
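The coverage interpretation of a confidence interval can itself be demonstrated by simulation; a sketch below, with the mean, sd, and sample size chosen to match the cholesterol example (these particular settings are illustrative):

```r
# A 95% CI should cover the true mean in about 95% of repeated samples
set.seed(1)
true_mean <- 180
covered <- logical(1000)
for (i in 1:1000) {
  x <- rnorm(20, true_mean, 40)          # one simulated sample
  ci <- t.test(x)$conf.int               # its 95% confidence interval
  covered[i] <- ci[1] <= true_mean && true_mean <= ci[2]
}
mean(covered)  # approximately 0.95
```

Note that coverage is a statement about the procedure over repeated samples, not about any single interval.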
Issues with Tests
• Statistical vs practical significance.
• Power of a test.
• What does not rejecting mean?
• What does rejecting mean?

[Figure sequence: confidence intervals for the total cholesterol example, plotted around the observed difference of 15, for sample sizes N = 10, 20, 50, and 75. Smaller CIs -> higher power.]

The width of a confidence interval is about 2 × se, where

se = √((var1 + var2)/N)

For the cholesterol example (sd = 40, so var = 1600 in each group):

se = √((1600 + 1600)/N) ≈ 57 × 1/√N
Issues with Tests
• Statistical vs practical significance.
• Power of a test.
• What does not rejecting mean?
• What does rejecting mean?

[Figure sequence: confidence intervals that exclude zero, shown against the true effect size.]

Thursday
• Study Design

End

How would you do a power analysis?
How large would my samples need to be in order to reject 90% of the time?
Need:
1) A reasonable difference that you would like to be able to detect.
2) An approximate standard deviation of the data you expect to collect.

Lecture notes
• Simulation for power analysis.
• Simulation for checking your test.
• Define power: use a motivating example.
• Confidence intervals: use bootstrap to motivate, then introduce 2 × se.
• Explain the connection between confidence intervals and tests.
• Confidence intervals for the cholesterol example.
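The sample-size question can be answered by extending the simulation until the estimated power reaches 0.90, or analytically; a sketch with `power.t.test` under the same assumptions as the cholesterol example (difference of 15, sd of 40):

```r
# Per-group sample size needed for 90% power to detect a difference of 15
# when each group has standard deviation 40
n_needed <- power.t.test(delta = 15, sd = 40, power = 0.90,
                         sig.level = 0.05)$n
ceiling(n_needed)  # round up: you can't collect a fractional subject
```

Under these assumptions the answer is on the order of 150 per group, which matches where the simulated power curve approaches 0.9.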
Connection between confidence intervals and tests
• Given the true mean (difference) is X, there is a 95% probability that the CI will cover it.
• Given the true difference is zero, there is less than a 5% probability that the CI will exclude zero.

Rules of Thumb that are useful!
• Standard error.
• 2 × standard error (t-test).
• Again, don't get caught up in the minutiae. The difference between 2 and 1.96 is small compared to other things.
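The connection can be checked directly: for a two-sample t-test, the 95% CI for the difference excludes zero exactly when p < 0.05. A sketch over repeated simulated datasets:

```r
# Rejecting at the 5% level <=> the 95% CI excludes zero
set.seed(2)
agree <- logical(200)
for (i in 1:200) {
  groupA <- rnorm(20, 180, 40)
  groupB <- rnorm(20, 195, 40)          # true difference of 15
  tt <- t.test(groupA, groupB)
  excludes_zero <- tt$conf.int[1] > 0 || tt$conf.int[2] < 0
  agree[i] <- (tt$p.value < 0.05) == excludes_zero
}
all(agree)  # TRUE: the two decision rules always match
```

Both are built from the same quantity (estimate ± roughly 2 × se), which is why the equivalence holds.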