SUMMARY

[Figure: critical region and the Z-critical value Z*]

Decision errors

State of the world | Decision: Reject H0 | Decision: Retain H0
H0 true            | Type I error (FP)   | correct decision
H0 false           | correct decision    | Type II error (FN)

• Type I: you reject the null, but you shouldn't (probability α).
• Type II: you do not reject the null, but you should.

Summary of t-tests
• one-sample test (Czech: jednovýběrový test)
    • you test H0: μ = μ0
• two-sample test (Czech: dvouvýběrový test)
    • you test H0: μ1 − μ2 = 0
    • dependent samples
        • paired t-test (Czech: párový test)
    • independent samples
        • equal variances, σ1 ≈ σ2
        • unequal variances, σ1 ≠ σ2

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

two-sample tests

NEW STUFF

Independent samples
• The disadvantages of dependent samples become the advantages of independent samples, and vice versa.
• We need more subjects; the study is generally more time-consuming and more expensive.
• No carry-over effects (each subject gets only one treatment) and no order dependency.
• Everything else is the same.
• H0: μ1 − μ2 = 0, H1: μ1 − μ2 ≠ 0
• $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
• For independent samples, the standard error SE depends on a further assumption:
    • assumption of equal variances, σ1 ≈ σ2
        • We get a measure called the pooled standard deviation, which combines (pools) the standard deviations of both samples.
    • assumption of unequal variances, σ1 ≠ σ2
        • In this case, the t-test is called Welch's t-test.
        • The degrees of freedom are approximated using a special equation (the Welch–Satterthwaite equation).
        • Usually, the degrees of freedom are not an integer.

• A manufacturer guarantees that its light bulbs last 1000 hours on average. To check whether the part of production made and shipped in a given period meets this claim, the quality-control department randomly selected 50 bulbs from a prepared delivery and found a mean lifetime of 950 hours with a standard deviation of 100 hours. Does the observed difference in lifetime indicate poor production quality?
• In the town of Zpiťákov, alcohol consumption was studied by randomly selecting 8 residents and recording their average monthly alcohol consumption.
  Some time later, two people in the town died of liver cirrhosis (residents other than those included in the study). To assess whether this event reduced consumption in the town, the monthly consumption of the same 8 residents was recorded again. Decide whether the two deaths reduced consumption.
• The mean weight of women aged 20-25 in the Czech Republic is 67 kg with a standard deviation of 4 kg. The mean weight of 10 randomly selected female students of VŠCHT is 65.4 kg with a standard deviation of 3.2 kg. Do long hours spent in boring lectures and the stress of exams in incomprehensible subjects lead to weight loss among the students?
• We compare the amount of organic matter in the wastewater of two paper mills. Based on several random measurements at these mills, we are to decide whether the mills differ in the amount of waste matter. At the first mill, 20 measurements gave a mean of 14.9 and a standard deviation of 4.8. 25 measurements from the second mill gave a mean of 22.0 and a standard deviation of 7.4.
• An entrepreneur has started making sewing-machine needles. He will succeed in the market only if his needles last longer than the competition's. From the trade press he learned that the lifetime of competing needles is 8.72 million stitches. As a trial he produced 395 needles, whose mean lifetime was 8.92 million stitches with a standard deviation of 1.81 million stitches. Should he launch full-scale production?

t-test in R

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95)

• For a detailed description, see the R manual at http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html

One-sample t-test in R
• An outbreak of Salmonella-related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream.
  The levels (in MPN/g) were:
    0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
  Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?
• Set up the hypotheses, please.
• H0: μ = 0.3, Ha: μ > 0.3, α = 0.05

    x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
    # one-sided test: is the true mean greater than 0.3?
    t.test(x, alternative = "greater", mu = 0.3)

        One Sample t-test

    data:  x
    t = 2.2051, df = 8, p-value = 0.02927
    alternative hypothesis: true mean is greater than 0.3
    95 percent confidence interval:
     0.3245133       Inf
    sample estimates:
    mean of x
    0.4564444

Paired t-test
• A study was performed to test whether cars get better mileage on premium gas than on regular gas. Each of 10 cars was first filled with either regular or premium gas, decided by a coin toss, and the mileage for that tank was recorded. The mileage was then recorded again for the same cars using the other kind of gasoline. Determine whether cars get significantly better mileage with premium gas.

    reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
    prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
    # paired = TRUE: each car is measured under both conditions
    t.test(prem, reg, alternative = "greater", paired = TRUE)

        Paired t-test

    data:  prem and reg
    t = 4.4721, df = 9, p-value = 0.000775
    alternative hypothesis: true difference in means is greater than 0

Two-sample t-tests, independent samples
• 6 subjects were given a drug (treatment group) and an additional 6 subjects a placebo (control group). Their reaction time to a stimulus was measured (in ms). We want to perform a two-sample t-test to compare the means of the treatment and control groups.
• Set up the hypotheses, please.
• H0: μ1 − μ2 = 0, Ha: μ1 − μ2 < 0, α = 0.05 (μ1: control group, μ2: treatment group)

Assuming equal variances

    Control <- c(91, 87, 99, 77, 88, 91)
    Treat   <- c(101, 110, 103, 93, 99, 104)
    # var.equal = TRUE: classical two-sample t-test with pooled variance
    t.test(Control, Treat, alternative = "less", var.equal = TRUE)

        Two Sample t-test

    data:  Control and Treat
    t = -3.4456, df = 10, p-value = 0.003136
    alternative hypothesis: true difference in means is less than 0

Assuming non-equal variances

    # var.equal defaults to FALSE, which gives Welch's t-test
    t.test(Control, Treat, alternative = "less")

        Welch Two Sample t-test

    data:  Control and Treat
    t = -3.4456, df = 9.48, p-value = 0.003391
    alternative hypothesis: true difference in means is less than 0

Tests for the equality of the variances
• How do we know whether our variances are equal or not?
• We use a test of homogeneity of variances (homogeneity of variances is also known as homoscedasticity).
• Several tests of homogeneity exist. I will start with the F-test in order to introduce the F-distribution (we will need it later in the lecture on analysis of variance).
• However, the F-test is used very rarely, because it strictly requires normally distributed data and is not robust to departures from normality.

F-test of equality of variances
• Also known as Hartley's test
• H0: σ1 = σ2
• The test statistic is a ratio of two sample variances. It follows an F-distribution and is called the F-ratio. The numerator and the denominator each have their own number of degrees of freedom.
• $F(n_L - 1,\, n_S - 1) = \frac{s_L^2}{s_S^2}$, where L denotes the sample with the larger variance and S the sample with the smaller variance.

[Figure; source: Wikipedia]

F-test in R

    # simulated data, so the numbers below will differ from run to run
    x <- rnorm(50, mean = 0, sd = 2)
    y <- rnorm(30, mean = 1, sd = 1)
    var.test(x, y)

        F test to compare two variances

    data:  x and y
    F = 2.5958, num df = 49, denom df = 29, p-value = 0.007395
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     1.304183 4.883757
    sample estimates:
    ratio of variances
              2.595787

Alternative tests of equality of variances
• The F-test is extremely sensitive to departures from normality.
• Alternative tests include
    • Levene's test: performed on the absolute values of the deviations from the group mean; the test statistic follows an F-distribution
    • Brown–Forsythe test: similar to Levene's test, but the deviations are taken from the median; more robust
• The choice between them is based on how they perform on non-normal data.
• Levene's test is slightly more powerful than Brown–Forsythe (BF) if the data really are normal, but not quite as robust if they aren't.

Power of the test
• The probability that the test correctly rejects the null hypothesis (H0) when it is false.
• Equivalently, it is the probability of correctly accepting the alternative hypothesis (Ha) when it is true, that is, the ability of a test to detect an effect if the effect actually exists.

State of the world | Decision: Reject H0               | Decision: Retain H0
H0 true            | Type I error (FP), probability α  | correct decision
H0 false           | correct decision, power = 1 − β   | Type II error (FN), probability β

What factors affect the power?
To increase the power of your test, you may do any of the following:
1. Increase the effect size (the difference between the null and alternative values) to be detected.
   The reasoning is that any test will have trouble rejecting the null hypothesis if the null hypothesis is only 'slightly' wrong. If the effect size is large, it is easier to detect and the null hypothesis will be soundly rejected.
2. Increase the sample size(s); the size needed is found by power analysis.
3. Decrease the variability in the sample(s).
4. Increase the significance level (α) of the test.
   The shortcoming of setting a higher α is that Type I errors become more likely. This may not be desirable.
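
The four factors above can be tried out numerically. The following is a minimal sketch using `power.t.test` from R's built-in `stats` package for a two-sample t-test. All numbers (a difference in means of 5, a standard deviation of 10, 20 subjects per group) are made-up values for illustration only, not data from the lecture.

```r
# Hypothetical baseline: two independent groups of 20 subjects, a true
# difference in means of 5 and a common standard deviation of 10.
base <- power.t.test(n = 20, delta = 5, sd = 10, sig.level = 0.05,
                     type = "two.sample", alternative = "two.sided")

# Change one factor at a time and recompute the power.
bigger_effect <- power.t.test(n = 20, delta = 10, sd = 10, sig.level = 0.05)  # factor 1
bigger_n      <- power.t.test(n = 80, delta = 5,  sd = 10, sig.level = 0.05)  # factor 2
smaller_sd    <- power.t.test(n = 20, delta = 5,  sd = 5,  sig.level = 0.05)  # factor 3
higher_alpha  <- power.t.test(n = 20, delta = 5,  sd = 10, sig.level = 0.10)  # factor 4

powers <- c(baseline      = base$power,
            bigger_effect = bigger_effect$power,
            bigger_n      = bigger_n$power,
            smaller_sd    = smaller_sd$power,
            higher_alpha  = higher_alpha$power)
print(round(powers, 3))

# Power analysis: leave n unspecified and request 80% power instead;
# power.t.test then solves for the required sample size per group.
needed <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(needed$n)
print(n_per_group)
```

Each modified scenario should report a higher power than the baseline. The last call shows the usual direction of a power analysis: fix the effect size, the variability, α and the desired power, and solve for the sample size.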