Quantitative Introduction to Risk and Uncertainty in Business
Module 5: Hypothesis Testing
M. Vidyasagar
Cecil & Ida Green Chair
The University of Texas at Dallas
Email: M.Vidyasagar@utdallas.edu
October 13, 2012
Outline
1. Hypothesis Testing
2. Hoeffding’s Inequalities
3. K-S Tests for Goodness of Fit
   – K-S (Kolmogorov-Smirnov) Tests: Objectives
   – Kolmogorov-Smirnov Tests: Statements
4. Student t Test
5. Chi-Squared Test
Hypothesis Testing: Basic Idea

‘Null’ hypothesis: what we believe in the absence of further evidence, e.g. that a two-sided coin is ‘fair’, with heads and tails equally likely. Think: null hypothesis = default assumption.

Two kinds of testing:
– There is only the null hypothesis, and we accept or reject it.
– There is a null as well as an alternate hypothesis, and we choose one or the other.

The second kind of testing is easier: we choose whichever hypothesis is more likely given the data. The first kind is harder.
Choosing Between Alternatives: Example

We are given a coin. The null hypothesis is that the coin is ‘fair’, with equal probabilities of heads and tails. Call it H0.

The alternative hypothesis is that the coin is ‘biased’, with the probability of heads equal to 0.7. Call it H1.

Suppose we toss the coin 20 times and 12 heads result. Which hypothesis should we accept?
Choosing Between Alternatives: Example (Cont’d)
Let n = 20 (number of coin tosses), k = 12 (number of heads), p0 = 0.5 (probability of heads under hypothesis H0) and p1 = 0.7 (probability of heads under hypothesis H1).

The likelihood of the observed outcome under each hypothesis is

$$L_0 = \binom{20}{12} p_0^{12} (1 - p_0)^8 = 0.1201, \qquad L_1 = \binom{20}{12} p_1^{12} (1 - p_1)^8 = 0.1144.$$
So we accept hypothesis H0 , that the coin is fair, but only because
the alternative hypothesis is even less likely!
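
These numbers are easy to reproduce; here is a minimal sketch in Python (the variable names are ours, not from the slides):

    from math import comb

    n, k = 20, 12        # tosses and observed heads
    p0, p1 = 0.5, 0.7    # heads probability under H0 and H1

    # Binomial likelihood of exactly k heads in n tosses
    L0 = comb(n, k) * p0**k * (1 - p0)**(n - k)
    L1 = comb(n, k) * p1**k * (1 - p1)**(n - k)

    print(L0, L1)   # 0.1201..., 0.1144...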
Connection to MLE
We choose the hypothesis that the coin is fair only because the
alternate hypothesis is even more unlikely!
So what is the value of p that maximizes

$$L = \binom{20}{12} p^{12} (1 - p)^8\,?$$
Answer: pMLE = 12/20 = 0.6, the fraction of heads observed.
With MLE (maximum likelihood estimation), we need not choose
between two competing hypotheses – MLE gives the most likely
values for the parameters!
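
A brute-force check of the maximizer (a sketch; a grid search stands in for the calculus):

    from math import comb

    n, k = 20, 12
    likelihood = lambda p: comb(n, k) * p**k * (1 - p)**(n - k)

    # Evaluate the likelihood on a fine grid of p values
    grid = [i / 1000 for i in range(1, 1000)]
    p_mle = max(grid, key=likelihood)
    print(p_mle)   # 0.6, the observed fraction of heads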
Estimating Probabilities of Binary Outcomes
Suppose an event has only two outcomes, e.g. coin toss. Let p
equal the true but unknown probability of ‘success’, e.g. that the
coin comes up heads.
After n trials, suppose k successes result. Then p̂ := k/n is called
the empirical probability of success. As we have seen, it is also
the maximum likelihood estimate of p.
Question: How close is the empirical probability p̂ to the true but
unknown probability p?
Hoeffding’s inequalities answer this question.
Hoeffding’s Inequalities: Statements
Let ε > 0 be any specified accuracy. Then

Pr{p̂ − p ≥ ε} ≤ exp(−2nε²),

Pr{p̂ − p ≤ −ε} ≤ exp(−2nε²),

Pr{|p̂ − p| ≤ ε} ≥ 1 − 2 exp(−2nε²).
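
The bounds are straightforward to evaluate; a minimal helper (the function names are ours):

    from math import exp

    def one_sided_delta(n, eps):
        # Upper bound on Pr{ p_hat - p >= eps } (or on Pr{ p_hat - p <= -eps })
        return exp(-2 * n * eps**2)

    def two_sided_delta(n, eps):
        # Upper bound on Pr{ |p_hat - p| > eps }
        return 2 * exp(-2 * n * eps**2)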
Hoeffding’s Inequalities: Interpretation
Interpretations of Hoeffding’s inequalities:

With confidence 1 − 2 exp(−2nε²), we can say that the true but unknown probability p lies in the interval (p̂ − ε, p̂ + ε). As we increase ε, the term δ := 2 exp(−2nε²) decreases, and we can be more sure of our interval.

The widely used 95% confidence interval corresponds to δ = 0.05.

The one-sided inequalities have similar interpretations.
An Example of Applying Hoeffding’s Inequality
Suppose we toss a coin 1000 times and it comes up heads 552
times. How sure can we be that the coin is biased?
n = 1000, k = 552, p̂ = 0.552. If p > 0.5 then we can say that the coin is biased. So let ε = p̂ − 0.5 = 0.052. Compute

δ = exp(−2nε²) ≈ 0.0045.

So with confidence 1 − δ = 0.9955, we can say that p > 0.5. In other words, we can be 99.55% sure that the coin is biased. Using the two-sided Hoeffding inequality, we can be 99.1% sure that p ∈ (0.5, 0.604).
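
The same computation as code (a sketch):

    from math import exp

    n, k = 1000, 552
    p_hat = k / n          # 0.552
    eps = p_hat - 0.5      # 0.052, the margin above 'fair'

    delta = exp(-2 * n * eps**2)   # one-sided Hoeffding bound
    print(1 - delta)               # ~0.9955: confidence that p > 0.5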
Another Example
An opinion poll of 750 voters (ignoring ‘don’t know’s) shows that
387 will vote for candidate A and 363 will vote for candidate B.
How sure can we be that candidate A will win?
Let p denote the true but unknown fraction of voters who will vote for A, and p̂ = 387/750 = 0.5160 the empirical estimate of p. If p < 0.5 then A will lose. So the accuracy ε = p̂ − 0.5 = 0.0160, and the number of samples n = 750. The one-sided bound is

δ = exp(−2nε²) ≈ 0.6811.

So we can be only 1 − δ ≈ 32% sure that A will win. In other words, the election cannot be ‘called’ with any confidence based on such a small margin of preference.
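
Again as code, with the same one-sided bound:

    from math import exp

    n = 750
    p_hat = 387 / n        # 0.5160
    eps = p_hat - 0.5      # 0.0160

    delta = exp(-2 * n * eps**2)
    print(1 - delta)       # ~0.32: far too little confidence to call the race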
Relating Confidence, Accuracy and Number of Samples
For the two-sided Hoeffding inequality, the failure probability δ associated with n samples and accuracy ε is given by

δ = 2 exp(−2nε²).

We can turn this around and ask: given an empirical estimate p̂ based on n samples, what is the accuracy corresponding to a given confidence level?

Solving the above equation for ε in terms of δ and n gives

$$\epsilon(n, \delta) = \left( \frac{1}{2n} \log \frac{2}{\delta} \right)^{1/2}.$$

So with confidence 1 − δ we can say that the true but unknown probability p is in the interval [p̂ − ε(n, δ), p̂ + ε(n, δ)].
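
Inverting the bound numerically (a sketch; epsilon_for is our name, not standard):

    from math import log, sqrt

    def epsilon_for(n, delta):
        # Accuracy achievable with n samples at failure probability delta
        return sqrt(log(2 / delta) / (2 * n))

    # e.g. a 95% two-sided interval (delta = 0.05) from n = 1000 samples
    print(epsilon_for(1000, 0.05))   # ~0.043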
Hoeffding’s Inequalities for More Than Two Outcomes
Suppose a random experiment has more than two possible outcomes (e.g. rolling a six-sided die). Say there are k outcomes, and in n trials the i-th outcome appears n_i times (and of course $\sum_{i=1}^k n_i = n$).

We can define

$$\hat p_i = \frac{n_i}{n}, \quad i = 1, \ldots, k,$$

and as we have seen, these are the maximum likelihood estimates for each probability.
Question: How good are these estimates?
More Than Two Outcomes – 2
Fact: For any sample size n and any accuracy ε, it is the case that

$$\Pr\{\max_i |\hat p_i - p_i| > \epsilon\} \leq 2k \exp(-2n\epsilon^2).$$

So with confidence 1 − 2k exp(−2nε²), we can assert that every empirical probability p̂i is within ε of the correct value.
More Than Two Outcomes: Example
Suppose we roll a six-sided die 1,000 times and observe the outcomes 1 through 6 with the following empirical frequencies:

p̂1 = 0.169, p̂2 = 0.165, p̂3 = 0.166, p̂4 = 0.165, p̂5 = 0.167, p̂6 = 0.168.

With what confidence can we say that the die is not fair, that is, that pi ≠ 1/6 for some i?
More Than Two Outcomes: Example (Cont’d)
Suppose that indeed the true probability is pi = 1/6 for all i. Then

max_i |p̂i − pi| = |p̂1 − 1/6| ≈ 0.0023.

Take ε = 0.0023, n = 1000 and compute

δ = 6 × 2 exp(−2nε²) ≈ 11.87!

How can a ‘probability’ be greater than one?

Note: This δ is just an upper bound for Pr{max_i |p̂i − pi| > ε}; so it can be larger than one.
So we cannot rule out the possibility that the die is fair (which is
quite different from saying that it is fair).
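
Verifying the computation (a sketch, with the frequencies from the slide):

    from math import exp

    p_hat = [0.169, 0.165, 0.166, 0.165, 0.167, 0.168]
    n, k = 1000, 6

    # Largest deviation of the empirical frequencies from 1/6
    eps = max(abs(p - 1/6) for p in p_hat)   # ~0.0023

    delta = 2 * k * exp(-2 * n * eps**2)
    print(delta)   # ~11.87 -- only an upper bound, so it may exceed 1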
K-S Tests: Problem Formulations
There are two widely used tests. They should be called the
Kolmogorov test and the Smirnov test, respectively. Unfortunately
the erroneous names ‘one-sample K-S test’ and ‘two-sample K-S
test’ have become popular.
Kolmogorov Test, or One-Sample K-S Test: We have a set of
samples, and we have a candidate probability distribution.
Question: How well does the distribution fit the set of samples?
Smirnov Test, or Two-Sample K-S Test: We have two sets of
samples, say x1 , . . . , xn and y1 , . . . , ym . Question: How sure are
we that both sets of samples came from the same (but unknown)
distribution?
Empirical Distributions
Suppose X is a random variable for which we have generated n
i.i.d. samples, call them x1 , . . . , xn .
Then we define the empirical distribution of X, based on these
observations, as follows:
$$\hat\Phi(a) = \frac{1}{n} \sum_{i=1}^n I_{\{x_i \leq a\}},$$

where I denotes the indicator function: $I_{\{x_i \leq a\}} = 1$ if the condition in the subscript is satisfied and 0 otherwise.
So in this case Φ̂(a) is just the fraction of the n samples that are
≤ a. The diagram on the next slide illustrates this.
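
A direct implementation of the empirical distribution (a sketch, assuming numpy is available):

    import numpy as np

    def ecdf(samples, a):
        # Fraction of the samples that are <= a
        return np.mean(np.asarray(samples) <= a)

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    print(ecdf(x, 0.0))   # close to Phi(0) = 0.5 for standard normal data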
Empirical Distribution Depicted

[Figure: the empirical distribution function, a staircase that jumps by 1/n at each sample.]

Note: The diagram shows the samples occurring in increasing order, but they can be in any order.

Source: http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_distribution_function.htm
Glivenko-Cantelli Lemma
Theorem: As n → ∞, the empirical distribution Φ̂(·) approaches
the true distribution Φ(·).
Specifically, if we define the Kolmogorov-Smirnov distance
$$d_n = \max_u |\hat\Phi(u) - \Phi(u)|,$$
then dn → 0 as n → ∞.
At what rate does the convergence take place?
One-Sample Kolmogorov-Smirnov Statistic
Fix a ‘confidence level’ δ > 0 (usually δ is taken as 0.05 or 0.02).
Define the threshold

$$\theta(n, \delta) = \left( \frac{1}{2n} \log \frac{2}{\delta} \right)^{1/2}.$$

Then with probability 1 − δ, we can say that

$$d_n := \max_u |\hat\Phi(u) - \Phi(u)| \leq \theta(n, \delta).$$
One-Sample Kolmogorov-Smirnov Test
Given samples x1, . . . , xn, fit them with some distribution F(·) (e.g. Gaussian). Compute the K-S statistic

$$d_n = \max_u |\hat\Phi(u) - F(u)|.$$

Compare dn with the threshold θ(n, δ). If dn > θ(n, δ), we ‘reject the null hypothesis’ at level δ. In other words, if dn > θ(n, δ), then we can be 1 − δ confident that the data was not generated by the distribution F(·).
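
The whole procedure as code (a sketch using the threshold above; scipy’s kstest implements the classical version of the test, based on the Kolmogorov distribution rather than this bound):

    import numpy as np
    from scipy import stats

    def ks_statistic(samples, cdf):
        # d_n = max_u |Phi_hat(u) - F(u)|, evaluated at the sorted samples
        x = np.sort(np.asarray(samples))
        n = len(x)
        F = cdf(x)
        hi = np.arange(1, n + 1) / n   # ECDF value just after each sample
        lo = np.arange(0, n) / n       # ECDF value just before each sample
        return max(np.max(np.abs(hi - F)), np.max(np.abs(lo - F)))

    def threshold(n, delta):
        return np.sqrt(np.log(2 / delta) / (2 * n))

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    d_n = ks_statistic(x, stats.norm.cdf)
    print(d_n > threshold(len(x), 0.05))   # True would reject the N(0,1) fit
    print(stats.kstest(x, 'norm'))         # library version, for comparison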
Student t Test: Motivation
The Student t test is used to test the null hypothesis that two sets of samples have the same mean, assuming that they have the same variance. The test has broad applicability, even if the assumption of ‘same variance’ is not satisfied.

Problem: We are given two samples x1, . . . , xm1 and xm1+1, . . . , xm1+m2. Determine whether the two sets of samples arise from distributions with the same mean.
Application: Most commonly used in quality control.
Student t Test: Theory
Let x̄1, x̄2 denote the means of the two sample classes, that is,

$$\bar x_1 = \frac{1}{m_1} \sum_{i=1}^{m_1} x_i, \qquad \bar x_2 = \frac{1}{m_2} \sum_{i=1}^{m_2} x_{m_1 + i}.$$
Let S1², S2² denote the unbiased estimates of the variances of the two samples, that is,

$$S_1^2 = \frac{1}{m_1 - 1} \sum_{i=1}^{m_1} (x_i - \bar x_1)^2, \qquad S_2^2 = \frac{1}{m_2 - 1} \sum_{i=1}^{m_2} (x_{m_1 + i} - \bar x_2)^2.$$
Student t Test: Theory – 2
Now define the ‘pooled’ standard deviation S12 by

$$S_{12}^2 = \frac{(m_1 - 1) S_1^2 + (m_2 - 1) S_2^2}{m_1 + m_2 - 2}.$$

Then the quantity

$$d_t = \frac{\bar x_1 - \bar x_2}{S_{12} \sqrt{(1/m_1) + (1/m_2)}}$$

follows the t distribution with m1 + m2 − 2 ‘degrees of freedom’.
As the number of d.o.f. becomes large, the t distribution
approaches the normal distribution. The next slide shows the
density of the t distribution for various d.o.f.
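
The pooled statistic computed directly and checked against scipy’s equal-variance t test (a sketch with made-up data; assumes numpy and scipy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x1 = rng.normal(10.0, 2.0, size=30)   # first sample
    x2 = rng.normal(10.5, 2.0, size=25)   # second sample
    m1, m2 = len(x1), len(x2)

    s1_sq = np.var(x1, ddof=1)            # unbiased variance estimates
    s2_sq = np.var(x2, ddof=1)
    s12_sq = ((m1 - 1) * s1_sq + (m2 - 1) * s2_sq) / (m1 + m2 - 2)

    d_t = (x1.mean() - x2.mean()) / np.sqrt(s12_sq * (1 / m1 + 1 / m2))
    # Two-sided p-value from the t distribution with m1 + m2 - 2 d.o.f.
    p = 2 * stats.t.sf(abs(d_t), df=m1 + m2 - 2)
    print(d_t, p)

    print(stats.ttest_ind(x1, x2))        # pooled (equal_var=True) by default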
Density of the t Distribution

[Figure: densities of the t distribution for several degrees of freedom, converging to the standard normal density.]
Chi-Squared Test: Motivation
The t test determines whether two samples have the same mean. The chi-squared test determines whether two samples have the same variance.

The application is again to quality control.
Chi-Squared Test: Theory
Given two sets of samples, say x1, . . . , xm1 and xm1+1, . . . , xm1+m2 (where usually m2 ≪ m1), compute the unbiased variance estimate V1 of the larger (first) sample,

$$V_1 = \frac{1}{m_1 - 1} \sum_{i=1}^{m_1} (x_i - \bar x_1)^2,$$

and the sum of squares of the smaller (second) sample,

$$S_2 = \sum_{i=1}^{m_2} (x_{m_1 + i} - \bar x_2)^2 = (m_2 - 1) V_2.$$

Then the ratio S2/V1 follows the chi-squared (or χ²) distribution with m2 − 1 degrees of freedom.
Distribution Function of the Chi-Squared Variable

[Figure: distribution function of the chi-squared variable.]
Density Function of the Chi-Squared Variable

[Figure: density function of the chi-squared variable.]
Application of the Chi-Squared Test
Note that the χ² random variable is always nonnegative. So, given some level δ (usually δ = 0.05), we determine a confidence interval

$$x_l = \Phi^{-1}_{\chi^2, m_2 - 1}(\delta), \qquad x_u = \Phi^{-1}_{\chi^2, m_2 - 1}(1 - \delta).$$

If the test statistic S2/V1 lies in the interval [xl, xu], then we accept the null hypothesis that both samples have the same variance.
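
The whole test as code (a sketch; chi2.ppf is scipy’s inverse CDF):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x1 = rng.normal(0.0, 1.0, size=500)   # large reference sample
    x2 = rng.normal(0.0, 1.0, size=20)    # small sample to be tested
    m2 = len(x2)

    V1 = np.var(x1, ddof=1)                  # unbiased variance of the reference
    S2 = np.sum((x2 - x2.mean())**2)         # sum of squares of the small sample
    stat = S2 / V1                           # ~ chi-squared, m2 - 1 d.o.f.

    delta = 0.05
    x_l = stats.chi2.ppf(delta, df=m2 - 1)
    x_u = stats.chi2.ppf(1 - delta, df=m2 - 1)
    print(x_l <= stat <= x_u)   # True: accept 'same variance'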