lecture

SUMMARY
[Figure: critical region beyond the Z-critical value Z*]
Decision errors
                      Decision
State of the world    Reject H0            Retain H0
H0 true               Type I error (FP)    correct decision
H0 false              correct decision     Type II error (FN)
• Type I: you reject the null hypothesis although it is true. Its probability is α.
• Type II: you do not reject the null hypothesis although it is false. Its probability is β.
Summary of t-tests
• one-sample test (Czech: jednovýběrový test)
  • you test H0 : 𝜇 = 𝜇0
• two-sample test (Czech: dvouvýběrový test)
  • you test H0 : 𝜇1 − 𝜇2 = 0
  • dependent samples
    • paired t-test (Czech: párový test)
  • independent samples
    • equal variances 𝜎1 ≈ 𝜎2
    • unequal variances 𝜎1 ≠ 𝜎2
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
two-sample tests
NEW STUFF
Independent samples
• Disadvantages of dependent samples become
advantages of independent samples and vice versa.
• We need more subjects, it's generally more time consuming and
more expensive.
• No carry-over effects (each subject only gets one treatment). No
order-dependency.
• Everything else is the same.
• 𝐻0 ∶ 𝜇1 − 𝜇2 = 0, 𝐻1 ∶ 𝜇1 − 𝜇2 ≠ 0
• $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
Independent samples
• In independent samples, the standard error 𝑆𝐸 depends on a
further assumption about the variances:
  • assumption of equal variances, 𝜎1 ≈ 𝜎2
    • We get a measure called the pooled standard deviation, which
    combines (pools) the standard deviations of both samples.
  • assumption of unequal variances, 𝜎1 ≠ 𝜎2
    • In this case, the t-test is called Welch's t-test.
    • The degrees of freedom are approximated using a special equation.
    • Usually, the number of degrees of freedom is not an integer.
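For reference, the two standard errors in their usual textbook form. The symbols $n_1, n_2$ (sample sizes), $s_1, s_2$ (sample standard deviations) and $s_p$ (pooled standard deviation) are notation assumed here, not taken from the slides.

```latex
% Equal variances: pooled standard deviation
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
df = n_1 + n_2 - 2

% Unequal variances: Welch's t-test with approximated degrees of freedom
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
                {\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}
```

The Welch approximation is a ratio of real numbers, which is why the resulting degrees of freedom are generally not an integer.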
• A manufacturer guarantees that its light bulbs have an
average lifetime of 1000 hours. To check whether the part of
production made and shipped in a given period meets this
claim, the quality-control department randomly selected
50 bulbs from a prepared shipment and found a mean
lifetime of 950 hours with a standard deviation of
100 hours. Is the observed difference in lifetime a sign of
poor production quality?
• In the town of Zpiťákov, alcohol consumption was studied by
randomly selecting 8 residents and recording their
average monthly alcohol consumption. Some time later,
two people in the town died of liver cirrhosis (other
residents, not those who had been sampled). To
assess whether this event reduced consumption in the town,
the monthly consumption of the same 8 residents was recorded again.
Decide whether the two deaths reduced consumption.
• The mean weight of women aged 20-25 in the Czech Republic is 67 kg
with a standard deviation of 4 kg. The mean weight of 10
randomly selected female students of VŠCHT is 65.4 kg with a
standard deviation of 3.2 kg. Do long hours of
sitting in boring lectures and the stress of exams in
incomprehensible subjects lead to weight loss in female students?
• We compare the amount of organic matter in the wastewater
of two paper mills. Based on several random
measurements at these mills, we are to decide whether the
mills differ in the amount of waste matter.
The first mill had 20 measurements with a mean of 14.9 and a
standard deviation of 4.8. The 25 measurements from the second mill
had a mean of 22.0 and a standard deviation of 7.4.
• An entrepreneur has started producing sewing-machine needles. He will
succeed on the market only if his needles last longer
than the competitors'. From the trade press he learned
that the lifetime of competing needles is 8.72 million
stitches. As a trial he produced 395 needles, whose
mean lifetime was 8.92 million stitches with a
standard deviation of 1.81 million stitches. Should the
entrepreneur launch full-scale production?
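Several of the exercises above give only summary statistics (sample size, mean, standard deviation) rather than raw data, so t.test() cannot be called on them directly. A minimal sketch of the one-sample statistic computed by hand, using the numbers from the first exercise (claimed mean 1000 hours, n = 50, sample mean 950, standard deviation 100). The helper name `t_from_summary` is hypothetical, and the one-sided alternative (mean lifetime below 1000 hours) is an assumed reading of the question.

```r
# Hypothetical helper (not part of base R): one-sample t statistic
# computed from summary statistics instead of raw data.
t_from_summary <- function(xbar, mu0, s, n) {
  se <- s / sqrt(n)              # standard error of the mean
  list(t = (xbar - mu0) / se,    # t statistic
       df = n - 1)               # degrees of freedom
}

bulbs <- t_from_summary(xbar = 950, mu0 = 1000, s = 100, n = 50)
bulbs$t                          # about -3.54

# One-sided p-value for the assumed alternative Ha: mu < 1000
p_lower <- pt(bulbs$t, df = bulbs$df)
p_lower
```

The same helper applies to the other one-sample exercises; the two-sample ones need a standard error built from both samples.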
t-test in R
t.test(x, y = NULL, alternative = c("two.sided", "less",
"greater"), mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)
• For the detailed description, see R manual at
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html
One-sample t-test in R
• An outbreak of Salmonella-related illness was attributed
to ice cream produced at a certain factory. Scientists
measured the level of Salmonella in 9 randomly sampled
batches of ice cream. The levels (in MPN/g) were:
0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
Is there evidence that the mean level of Salmonella in the
ice cream is greater than 0.3 MPN/g?
• Set up the hypotheses, please.
• 𝐻0 : 𝜇 = 0.3, 𝐻𝑎 : 𝜇 > 0.3, 𝛼 = 0.05
One-sample t-test in R
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)
One Sample t-test
data: x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
0.3245133 Inf
sample estimates:
mean of x
0.4564444
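As a check on the output above, the same statistic can be computed directly from the one-sample formula. This is only a sketch; the variable names `t_manual` and `p_manual` are chosen here for illustration.

```r
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
n <- length(x)

# t = (sample mean - hypothesised mean) / (s / sqrt(n))
t_manual <- (mean(x) - 0.3) / (sd(x) / sqrt(n))

# Ha: mu > 0.3, so the p-value is the upper-tail area of t with n - 1 d.f.
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

c(t = t_manual, df = n - 1, p = p_manual)
```

Since the p-value is below α = 0.05, H0 is rejected in favour of a mean Salmonella level above 0.3 MPN/g.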
Paired t-test
• A study was performed to test whether cars get better mileage on
premium gas than on regular gas. Each of 10 cars was first filled with
either regular or premium gas, decided by a coin toss, and the mileage
for that tank was recorded. The mileage was recorded again for the
same cars using the other kind of gasoline. Determine whether cars
get significantly better mileage with premium gas.
reg <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
t.test(prem,reg,alternative="greater", paired=TRUE)
Paired t-test
data: prem and reg
t = 4.4721, df = 9, p-value = 0.000775
alternative hypothesis: true difference in means is greater than 0
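A paired t-test is a one-sample t-test applied to the within-pair differences. The sketch below illustrates this on the same data; the names `d`, `t_diff` and `one_sample` are introduced here for illustration only.

```r
reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

d <- prem - reg                              # one difference per car

# One-sample t statistic of the differences against 0
t_diff <- mean(d) / (sd(d) / sqrt(length(d)))
t_diff                                       # about 4.47

# The same test via t.test() on the differences
one_sample <- t.test(d, alternative = "greater", mu = 0)
one_sample$p.value
```

Both routes give the statistic and p-value reported by the paired call, with df equal to the number of pairs minus one.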
Two-sample t-tests, independent samples
• 6 subjects were given a drug (treatment group) and an
additional 6 subjects a placebo (control group). Their
reaction time to a stimulus was measured (in ms). We
want to perform a two-sample t-test for comparing the
means of the treatment and control groups.
• Set up the hypotheses, please.
• 𝐻0 : 𝜇1 − 𝜇2 = 0, 𝐻𝑎 : 𝜇1 − 𝜇2 < 0, 𝛼 = 0.05
Assuming equal variances
Control <- c(91, 87, 99, 77, 88, 91)
Treat <- c(101, 110, 103, 93, 99, 104)
t.test(Control, Treat, alternative="less", var.equal=TRUE)
Two Sample t-test
data: Control and Treat
t = -3.4456, df = 10, p-value = 0.003136
alternative hypothesis: true difference in means is less than 0
Assuming non-equal variances
t.test(Control,Treat,alternative="less")
Welch Two Sample t-test
data: Control and Treat
t = -3.4456, df = 9.48, p-value = 0.003391
alternative hypothesis: true difference in means is less than 0
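The two outputs share the same t statistic but differ in the degrees of freedom. A sketch of both calculations by hand, using the usual pooled and Welch formulas, shows where the difference comes from; all variable names other than `Control` and `Treat` are introduced here for illustration.

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)

n1 <- length(Control); n2 <- length(Treat)
v1 <- var(Control);    v2 <- var(Treat)
diff_means <- mean(Control) - mean(Treat)

# Equal variances: pooled standard deviation, df = n1 + n2 - 2
s_pooled  <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
se_pooled <- s_pooled * sqrt(1 / n1 + 1 / n2)
t_pooled  <- diff_means / se_pooled

# Unequal variances (Welch): separate variances, approximated df
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- diff_means / se_welch
df_welch <- se_welch^4 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

c(t_pooled = t_pooled, t_welch = t_welch, df_welch = df_welch)
```

With equal group sizes the two standard errors coincide, so only the degrees of freedom (10 versus about 9.48) and hence the p-values differ.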
Tests for the equality of the variances
• How do we know whether our variances are equal?
• We use a test of homogeneity of variances (equal variances
are also called homoscedasticity).
• Several tests of homogeneity exist. I will start with the
F-test to introduce the F-distribution (we will need it later in
the lecture about analysis of variance).
• However, the F-test is rarely used in practice because it strictly
requires normally distributed data and is not robust to
departures from normality.
F-test of equality of variances
• For two groups it coincides with Hartley's F-max test (largest variance over smallest)
• 𝐻0 : 𝜎1 = 𝜎2
• The test statistic is a ratio of two sample variances. It has an
F-distribution and is called the F-ratio. The numerator and the
denominator each have their own number of d.f.
$F(n_L - 1,\; n_S - 1) = \frac{s_L^2}{s_S^2}$
source: Wikipedia
F-test in R
# simulated data: no seed is set, so the numbers differ from run to run
x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)
var.test(x, y)
F test to compare two variances
data: x and y
F = 2.5958, num df = 49, denom df = 29, p-value = 0.007395
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
1.304183 4.883757
sample estimates:
ratio of variances
2.595787
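The F-ratio reported by var.test() can be reproduced by hand. The sketch below fixes a seed (42, an arbitrary choice) so that it is reproducible; because the data are random, its numbers will not match the output shown above.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)

F_ratio <- var(x) / var(y)         # var.test() puts x in the numerator
df1 <- length(x) - 1               # numerator degrees of freedom
df2 <- length(y) - 1               # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail area of the F-distribution
p_lower     <- pf(F_ratio, df1, df2)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(x, y)
c(F_manual = F_ratio, F_var.test = unname(res$statistic))
c(p_manual = p_two_sided, p_var.test = res$p.value)
```

Note that var.test() always divides the variance of its first argument by that of its second, whichever is larger.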
Alternative tests of equality of variances
• The F-test is extremely sensitive to departures from
normality.
• Alternative tests include
  • Levene's test: performed on the absolute values of the deviations
  from the group mean; the test statistic has an F-distribution
  • Brown–Forsythe test: similar to Levene's test, but the deviations are
  taken from the group median, which makes it more robust
• The choice between them is based on
their performance on non-normal data.
• Levene's test is slightly more powerful than BF if the data
really are normal, but not quite as robust if they aren't.
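Both tests amount to a one-way ANOVA on absolute deviations from the group centre, so they can be sketched in base R without extra packages. The helper `levene_sketch` below is hypothetical and written only for illustration; the simulated data and the seed are assumptions as well. Add-on packages such as car provide a ready-made leveneTest() for real use.

```r
# Hypothetical helper: Levene-type test using base R only.
# center = mean   -> Levene's original test
# center = median -> Brown-Forsythe variant
levene_sketch <- function(values, group, center = median) {
  group <- factor(group)
  # absolute deviation of each value from the centre of its own group
  dev <- abs(values - ave(values, group, FUN = center))
  # one-way ANOVA on the deviations; its F value is the test statistic
  anova(lm(dev ~ group))
}

set.seed(42)                       # arbitrary seed, for reproducibility only
values <- c(rnorm(50, mean = 0, sd = 2), rnorm(30, mean = 1, sd = 1))
group  <- rep(c("x", "y"), times = c(50, 30))

levene <- levene_sketch(values, group, center = mean)
bf     <- levene_sketch(values, group, center = median)
levene
bf
```

The first row of each ANOVA table holds the F value and its p-value; a small p-value speaks against equal variances.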
Power of the test
• The probability that the test correctly rejects the null hypothesis
(H0) when it is false.
• Equivalently, it is the probability of correctly accepting the
alternative hypothesis (Ha) when it is true, that is, the ability of a
test to detect an effect if the effect actually exists.
                      Decision
State of the world    Reject H0                          Retain H0
H0 true               Type I error (FP), probability α   correct decision
H0 false              correct decision, power = 1 − β    Type II error (FN), probability β
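The two error probabilities can be estimated by simulation. The sketch below repeatedly runs a one-sample t-test on normal data, once with H0 true and once with H0 false. The sample size (20), the true mean under the alternative (0.8), the number of replicates and the seed are all arbitrary choices for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
alpha <- 0.05
n_sim <- 5000

# p-value of a one-sample t-test of H0: mu = 0 on 20 simulated values
p_value <- function(true_mean) {
  t.test(rnorm(20, mean = true_mean, sd = 1), mu = 0)$p.value
}

# H0 true: every rejection is a Type I error, so the rate estimates alpha
type1_rate <- mean(replicate(n_sim, p_value(0) < alpha))

# H0 false (true mean 0.8): the rejection rate estimates the power 1 - beta
power_est <- mean(replicate(n_sim, p_value(0.8) < alpha))

c(type1_rate = type1_rate, power = power_est, beta = 1 - power_est)
```

The first rate should land close to α, while the second depends on how far the true mean is from the null value.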
What factors affect the power?
To increase the power of your test, you may do any of the
following:
1. Increase the effect size (the difference between the null
and alternative values) to be detected
The reasoning is that any test will have trouble rejecting the null
hypothesis if the null hypothesis is only 'slightly' wrong. If the effect size is
large, then it is easier to detect and the null hypothesis will be soundly
rejected.
2. Increase the sample size(s) – power analysis
3. Decrease the variability in the sample(s)
4. Increase the significance level (α) of the test
The shortcoming of setting a higher α is that Type I errors will be more
likely. This may not be desirable.
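The four factors can be explored with power.t.test() from base R's stats package. In the sketch below the baseline values (n = 20 per group, difference 1, standard deviation 2, α = 0.05) are arbitrary assumptions; each following line changes exactly one of them. By default the function assumes a two-sided, two-sample test.

```r
base_power    <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)$power

bigger_effect <- power.t.test(n = 20, delta = 2, sd = 2, sig.level = 0.05)$power  # 1. effect size
bigger_n      <- power.t.test(n = 40, delta = 1, sd = 2, sig.level = 0.05)$power  # 2. sample size
smaller_sd    <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)$power  # 3. variability
bigger_alpha  <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.10)$power  # 4. significance level

round(c(base = base_power, effect = bigger_effect, n = bigger_n,
        sd = smaller_sd, alpha = bigger_alpha), 3)

# Power analysis: leave n out and ask for the per-group sample size
# needed to reach 80% power at the baseline settings
n_needed <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80)$n
ceiling(n_needed)
```

Each of the four changes raises the power above the baseline value, and the last call turns the relationship around to plan a study.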