SUMMARY

Homoscedasticity
http://blog.minitab.com/blog/statistics-and-quality-data-analysis/dont-be-a-victim-of-statistical-hippopotomonstrosesquipedaliophobia

Tests for homoscedasticity
• $H_0: \sigma_1^2 = \sigma_2^2$
• F-test of equality of variances (Hartley's test)

Power of the test
• The probability that the test correctly rejects the null hypothesis (H0) when it is false.
• Equivalently, it is the probability of correctly accepting the alternative hypothesis (Ha) when it is true, that is, the ability of a test to detect an effect if the effect actually exists.

                 State of the world
Decision         H0 true                              H0 false
Reject H0        Type I error (FP), probability α     Correct decision, power = 1 − β
Retain H0        Correct decision                     Type II error (FN), probability β

What factors affect the power?
To increase the power of your test, you may do any of the following:
1. Increase the effect size (the difference between the null and alternative values) to be detected.
   The reasoning is that any test will have trouble rejecting the null hypothesis if the null hypothesis is only 'slightly' wrong. If the effect size is large, it is easier to detect and the null hypothesis will be soundly rejected.
2. Increase the sample size(s) (see power analysis).
3. Decrease the variability in the sample(s).
4. Increase the significance level (α) of the test.
   The shortcoming of setting a higher α is that Type I errors become more likely, which may not be desirable.

NEW STUFF

Effect size
• When a difference is statistically significant, it does not necessarily mean that it is big, important or helpful in decision-making. It simply means you can be confident that there is a difference.
• For example, you evaluate the effect of sun eruptions on student knowledge ($n = 2000$).
• The mean score on the pretest was 84 out of 100. The mean score on the posttest was 83.
• Although you find that the difference in scores is statistically significant (because of the large sample size), the difference is very small, suggesting that eruptions do not lead to a meaningful decrease in student knowledge.

Effect size
• To know whether an observed difference is not only statistically significant but also practically important, you have to calculate its effect size.
• The effect size in our case is $84 - 83 = 1$.
• The effect size is put on a common scale by standardizing, i.e., the difference is divided by a standard deviation.

Power analysis
• To ensure that your sample size is big enough, you need to conduct a power analysis.
• For any power calculation, you need to know:
   • What type of test you plan to use (e.g., independent t-test)
   • The alpha value (usually 0.05)
   • The expected effect size
   • The sample size you are planning to use
• Because the effect size can only be calculated after you collect data, you have to use an estimate for the power analysis.
• Cohen suggests that, for t-tests, standardized effect sizes (d) of 0.2, 0.5, and 0.8 represent small, medium, and large effects respectively.

Power analysis in R (paired t-test)
# Solve for the sample size that gives 80% power to detect d = 0.8
install.packages("pwr")
library(pwr)
pwr.t.test(d=0.8, power=0.8, sig.level=0.05, type="paired", alternative="two.sided")

     Paired t test power calculation

              n = 14.30278
              d = 0.8
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*

• Rounding up, at least 15 pairs are needed.

Check for normality – histogram

Check for normality – QQ-plot
qqnorm(rivers)   # sample quantiles against theoretical normal quantiles
qqline(rivers)   # reference line; points far from it indicate non-normality

Check for normality – tests
• The graphical methods for checking data normality still leave much to your own interpretation. If you show any of these plots to ten different statisticians, you can get ten different answers.
• H0: Data follow a normal distribution.
• Shapiro-Wilk test
> shapiro.test(rivers)

	Shapiro-Wilk normality test

data:  rivers
W = 0.6666, p-value < 2.2e-16

• rivers: p-value < 2.2e-16
• log(rivers): p-value = 3.945e-05

Nonparametric statistics
• Used for small samples from considerably non-normal distributions.
• Nonparametric tests:
   • No assumption about the shape of the distribution.
   • No assumption about the parameters of the distribution (thus they are called nonparametric).
   • Simple to do; however, their theory is extremely complicated. Of course, we won't cover it at all.
   • However, they are less powerful than their parametric counterparts when the parametric assumptions hold.
• So if your data fulfill the normality assumptions, use parametric tests (t-test, F-test).

Nonparametric tests
• If the normality assumption of the t-test is violated, its nonparametric alternative should be used.
• The nonparametric alternative to the t-test is the Wilcoxon test.
• wilcox.test()
• http://stat.ethz.ch/R-manual/R-patched/library/stats/html/wilcox.test.html

ANOVA (ANALYSIS OF VARIANCE)

A problem
• You're comparing three brands of beer.
• You buy four bottles of each brand for the following prices.

Primátor   Kocour   Matuška
15         39       65
12         45       45
14         48       32
11         60       38

• What do you think, which of these brands have significantly different prices?
   • No significant difference between any of these.
   • Primátor and Kocour
   • Primátor and Matuška
   • Kocour and Matuška

t-test
• We can do three t-tests to show whether there is a significant difference between these brands.
• How many t-tests would you need to compare four samples?
   • 6
• In general, comparing k samples pairwise takes $k(k-1)/2$ t-tests.
• To compare 10 samples, you need 45 t-tests! This is a lot. We don't want to do a million t-tests.
• In this lesson you'll learn a simpler method.
• It's called analysis of variance (ANOVA).

Multiple comparisons problem
• If you make two comparisons and both null hypotheses are true, what is the chance that neither comparison will be statistically significant ($\alpha = 0.05$)?
• $0.95 \times 0.95 = 0.9025$
• And what is the chance that one or both comparisons will result in a statistically significant conclusion just by chance?
• $1.0 - 0.9025 = 0.0975 \approx 10\%$
• For N independent comparisons, this probability is generally $1 - 0.95^N$.
• So, for example, for 13 independent tests there is about a 50:50 chance of obtaining at least one FP.

Multiple comparisons problem
Bennett et al., Journal of Serendipitous and Unexpected Results, 1, 1-5, 2010
http://www.graphpad.com/guides/prism/6/statistics/index.htm?beware_of_multiple_comparisons.htm

Correcting for multiple comparisons
• Bonferroni correction: the simplest approach is to divide the α value by the number of comparisons N. A particular comparison is then declared statistically significant when its p-value is less than $\alpha / N$.
• For example, for 100 comparisons, reject the null in each if its p-value is less than $0.05 / 100 = 0.0005$.
• However, this is a bit too conservative; other approaches exist.
> p.adjust()
• “There seems no reason to use the unmodified Bonferroni correction because it is dominated by Holm's method”

Main idea of ANOVA
• To compare three or more samples, we can use the same ideas that underlie t-tests.
• In a t-test, the general form of the t-statistic is
  $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
  where the numerator measures the variability between sample means and the denominator measures the error (variability within samples).
• Similarly, for three or more samples we assess the variability between sample means in the numerator and the error (variability within samples) in the denominator:
  $\frac{\text{variability between sample means}}{\text{variability within samples}}$

ANOVA hypothesis
• $H_0: \mu_1 = \mu_2 = \mu_3$
• $H_1$: at least one pair of samples is significantly different
• Follow-up multiple comparison steps: see which means are different from each other.

F ratio
$F = \frac{\text{between-group variability}}{\text{within-group variability}}$
• As between-group variability increases, the F-statistic increases, which leans more in favor of the alternative hypothesis that at least one pair of means is significantly different.
• As within-group variability increases, the F-statistic decreases, which leans more in favor of the null hypothesis that the means are not significantly different.

Beer brands – a boxplot
[Figure: boxplot of beer prices by brand, with the sample means $\bar{x}_P = 13$, $\bar{x}_K = 48$, $\bar{x}_M = 45$ and the grand mean $\bar{x}_G \approx 35$ marked.]

Between-group variability
• SS: sum of squares
• MS: mean square
• SSB: sum of squares between groups
• MSB: mean square between groups
• The squared distances of the sample means from the grand mean, $(\bar{x}_P - \bar{x}_G)^2$, $(\bar{x}_K - \bar{x}_G)^2$ and $(\bar{x}_M - \bar{x}_G)^2$, are weighted by the sample sizes and summed:
  $SSB = \sum_{k} n_k (\bar{x}_k - \bar{x}_G)^2$
  $df_B = k - 1$
  $MSB = \frac{SSB}{df_B} = \frac{\sum_{k} n_k (\bar{x}_k - \bar{x}_G)^2}{k - 1}$

Within-group variability
• SSW: sum of squares within groups
• MSW: mean square within groups
  $MSW = \frac{SSW}{df_W} = \frac{\sum_{k} \sum_{i} (x_i - \bar{x}_k)^2}{N - k}$

The summary of variabilities
$MSB = \frac{SSB}{df_B} = \frac{\sum_{k} n_k (\bar{x}_k - \bar{x}_G)^2}{k - 1}$
$MSW = \frac{SSW}{df_W} = \frac{\sum_{k} \sum_{i} (x_i - \bar{x}_k)^2}{N - k}$
• $\bar{x}_k$ ... sample mean
• $N$ ... total number of data points
• $x_i$ ... value of each data point
• $k$ ... number of samples
• $n_k$ ... number of data points in each sample
• $\bar{x}_G$ ... grand mean

F-ratio
$F_{df_B, df_W} = \frac{MSB}{MSW}$
$df_B = k - 1$
$df_W = N - k$

F-distribution
[Figure: F distribution curves labelled $F_{9,2}$ and $F_{2,9}$.]

Beer prices

            Primátor   Kocour   Matuška
            15         39       65
            12         45       45
            14         48       32
            11         60       38
$\bar{x}_k$ 13         48       45

$\bar{x}_G = 35.33$
$SSB = \sum_{k} n_k (\bar{x}_k - \bar{x}_G)^2 = 3011$
$df_B = k - 1 = 2$
$MSB = \frac{SSB}{df_B} = 1505.3$
$SSW = \sum_{k} \sum_{i} (x_i - \bar{x}_k)^2 = 862$
$df_W = N - k = 9$
$MSW = \frac{SSW}{df_W} = 95.78$
$F_{2,9} = \frac{MSB}{MSW} = 15.72$
$F^*_{2,9} = 4.26$
• Because $15.72 > 4.26$, the null hypothesis of equal mean prices is rejected at $\alpha = 0.05$.

Beer brands – ANOVA
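The beer price calculation can be reproduced in R. The following is a minimal sketch, not necessarily the code used in the lecture: the variable names (prices, brand, ssb, ssw, and so on) are illustrative, and the brand labels are written without diacritics to avoid encoding problems. It first computes the F ratio by hand from the formulas for SSB and SSW, then fits the same model with aov() from base R for comparison.

```r
# Beer prices from the slides, four bottles per brand
prices <- c(15, 12, 14, 11,   # Primator
            39, 45, 48, 60,   # Kocour
            65, 45, 32, 38)   # Matuska
brand <- factor(rep(c("Primator", "Kocour", "Matuska"), each = 4))

k <- nlevels(brand)       # number of samples
N <- length(prices)       # total number of data points

grand_mean  <- mean(prices)
group_means <- tapply(prices, brand, mean)     # one mean per brand
group_sizes <- tapply(prices, brand, length)   # n_k for each brand

# Between-group sum of squares: n_k * (group mean - grand mean)^2, summed
ssb <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: ave() returns each point's own group mean
ssw <- sum((prices - ave(prices, brand))^2)

msb <- ssb / (k - 1)
msw <- ssw / (N - k)
f_stat <- msb / msw

# Critical value at alpha = 0.05 and the p-value of the observed F
f_crit  <- qf(0.95, df1 = k - 1, df2 = N - k)
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(c(SSB = ssb, SSW = ssw, MSB = msb, MSW = msw, F = f_stat, F_crit = f_crit))

# The same analysis with the built-in one-way ANOVA
fit <- aov(prices ~ brand)
print(summary(fit))
```

The hand-computed values should agree with the slide (SSB ≈ 3011, SSW = 862, F ≈ 15.72), and the F value in the aov() summary should match f_stat. A significant F only says that at least one pair of means differs; finding which pairs differ is the follow-up multiple comparison step.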