Quantitative Introduction to Risk and Uncertainty in Business
Module 5: Hypothesis Testing
M. Vidyasagar, Cecil & Ida Green Chair, The University of Texas at Dallas
Email: M.Vidyasagar@utdallas.edu
October 13, 2012

Outline
1 Hypothesis Testing
2 Hoeffding’s Inequalities
3 K-S Tests for Goodness of Fit
  K-S (Kolmogorov-Smirnov) Tests: Objectives
  Kolmogorov-Smirnov Tests: Statements
4 Student t Test
5 Chi-Squared Test

Hypothesis Testing: Basic Idea

The ‘null’ hypothesis is what we believe in the absence of further evidence, e.g. that a two-sided coin is ‘fair’, with heads and tails equally likely. Think: null hypothesis = default assumption.

There are two kinds of testing:
1 There is only the null hypothesis, and we accept or reject it.
2 There is a null as well as an alternate hypothesis, and we choose one or the other.

The second kind of testing is easier: we choose whichever hypothesis is more likely given the data. The first kind of testing is harder.

Choosing Between Alternatives: Example

We are given a coin. The null hypothesis is that the coin is ‘fair’, with equal probabilities of heads and tails. Call it H0.
The alternative hypothesis is that the coin is ‘biased’, with the probability of heads equal to 0.7. Call it H1. Suppose we toss the coin 20 times and 12 heads result. Which hypothesis should we accept?

Choosing Between Alternatives: Example (Cont’d)

Let n = 20 (number of coin tosses), k = 12 (number of heads), p0 = 0.5 (probability of heads under hypothesis H0) and p1 = 0.7 (probability of heads under hypothesis H1). We compute the likelihood of the observed outcome under each hypothesis, writing C(20, 12) for the binomial coefficient ‘20 choose 12’:

L0 = C(20, 12) p0^12 (1 − p0)^8 = 0.1201,
L1 = C(20, 12) p1^12 (1 − p1)^8 = 0.1144.

So we accept hypothesis H0, that the coin is fair, but only because the alternative hypothesis is even less likely!

Connection to MLE

We choose the hypothesis that the coin is fair only because the alternate hypothesis is even more unlikely! So what is the value of p that maximizes

L = C(20, 12) p^12 (1 − p)^8 ?

Answer: pMLE = 12/20 = 0.6, the fraction of heads observed. With MLE (maximum likelihood estimation), we need not choose between two competing hypotheses: MLE gives the most likely values for the parameters!

Estimating Probabilities of Binary Outcomes

Suppose an event has only two outcomes, e.g. a coin toss.
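As a quick numerical check, the coin example above can be sketched in a few lines of Python: compare the binomial likelihood of 12 heads in 20 tosses under H0 (p = 0.5) and H1 (p = 0.7), and recover the maximum likelihood estimate p̂ = k/n.

```python
# Sketch of the coin example: likelihoods of "12 heads in 20 tosses"
# under two hypotheses, plus the MLE of p.
from math import comb

n, k = 20, 12

def likelihood(p):
    """Binomial likelihood of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

L0 = likelihood(0.5)   # about 0.1201, hypothesis H0
L1 = likelihood(0.7)   # about 0.1144, hypothesis H1
p_mle = k / n          # 0.6, the fraction of heads observed

print(round(L0, 4), round(L1, 4), p_mle)
```

Since L0 > L1, we accept H0, exactly as in the slides; and the likelihood at p_mle is at least as large as at either hypothesized value.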
Let p equal the true but unknown probability of ‘success’, e.g. that the coin comes up heads. After n trials, suppose k successes result. Then p̂ := k/n is called the empirical probability of success. As we have seen, it is also the maximum likelihood estimate of p.

Question: How close is the empirical probability p̂ to the true but unknown probability p? Hoeffding’s inequalities answer this question.

Hoeffding’s Inequalities: Statements

Let ε > 0 be any specified accuracy. Then

Pr{p̂ − p ≥ ε} ≤ exp(−2nε²),
Pr{p̂ − p ≤ −ε} ≤ exp(−2nε²),
Pr{|p̂ − p| ≤ ε} ≥ 1 − 2 exp(−2nε²).

Hoeffding’s Inequalities: Interpretation

With confidence 1 − 2 exp(−2nε²), we can say that the true but unknown probability p lies in the interval (p̂ − ε, p̂ + ε). As we increase ε, the term δ := 2 exp(−2nε²) decreases, and we can be more sure of our interval. The widely used 95% confidence interval corresponds to δ = 0.05. The one-sided inequalities have similar interpretations.

An Example of Applying Hoeffding’s Inequality

Suppose we toss a coin 1000 times and it comes up heads 552 times. How sure can we be that the coin is biased? Here n = 1000, k = 552, p̂ = 0.552. If p > 0.5 then we can say that the coin is biased. So let ε = p̂ − 0.5 = 0.052 and compute

δ = exp(−2nε²) = 0.0045.

So with confidence 1 − δ = 0.9955, we can say that p > 0.5. In other words, we can be 99.55% sure that the coin is biased. Using the two-sided Hoeffding inequality, we can be 99.1% sure that p ∈ (0.5, 0.604).
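The computation in the biased-coin example above is easy to reproduce; this Python sketch uses the same figures (552 heads in 1000 tosses) and evaluates both the one-sided and two-sided Hoeffding bounds.

```python
# Sketch of the biased-coin example: bound the probability that the
# true p is <= 0.5, given 552 heads in 1000 tosses.
from math import exp

n, k = 1000, 552
p_hat = k / n            # 0.552
eps = p_hat - 0.5        # accuracy, about 0.052

delta_one_sided = exp(-2 * n * eps**2)   # about 0.0045
delta_two_sided = 2 * delta_one_sided    # about 0.009

print(round(delta_one_sided, 4))         # one-sided failure probability
print(p_hat - eps, p_hat + eps)          # two-sided interval for p
```

The one-sided bound gives confidence 1 − δ ≈ 0.9955 that p > 0.5, and the two-sided bound gives confidence ≈ 0.991 that p lies in (0.5, 0.604), matching the slide.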
Another Example

An opinion poll of 750 voters (ignoring ‘don’t know’s) shows that 387 will vote for candidate A and 363 will vote for candidate B. How sure can we be that candidate A will win?

Let p denote the true but unknown fraction of voters who will vote for A, and p̂ = 387/750 = 0.5160 the empirical estimate of p. If p < 0.5 then A will lose. So the accuracy is ε = p̂ − 0.5 = 0.0160, and the number of samples is n = 750. The one-sided bound gives

δ = exp(−2nε²) = 0.6811.

So we can be only 1 − δ ≈ 32% sure that A will win. In other words, the election cannot be ‘called’ with any confidence based on such a small margin of preference.

Relating Confidence, Accuracy and Number of Samples

For the two-sided Hoeffding inequality, the confidence parameter δ associated with n samples and accuracy ε is given by δ = 2 exp(−2nε²). We can turn this around and ask: given an empirical estimate p̂ based on n samples, what is the accuracy corresponding to a given confidence level δ? Solving the above equation for ε in terms of δ and n gives

ε(n, δ) = [ (1/(2n)) log(2/δ) ]^{1/2}.

So with confidence 1 − δ we can say that the true but unknown probability p is in the interval [p̂ − ε(n, δ), p̂ + ε(n, δ)].

Hoeffding’s Inequalities for More Than Two Outcomes

Suppose a random experiment has more than two possible outcomes (e.g. rolling a six-sided die). Say there are k outcomes, and in n trials the i-th outcome appears ni times (and of course n1 + · · · + nk = n). We can define

p̂i = ni / n, i = 1, . . . , k,

and as we have seen, these are the maximum likelihood estimates of the probabilities.

Question: How good are these estimates?

More Than Two Outcomes – 2

Fact: For any sample size n and any accuracy ε, it is the case that

Pr{ maxi |p̂i − pi| > ε } ≤ 2k exp(−2nε²).

So with confidence 1 − 2k exp(−2nε²), we can assert that every empirical probability p̂i is within ε of the correct value.

More Than Two Outcomes: Example

Suppose we roll a six-sided die 1,000 times and the outcomes 1 through 6 occur with the following empirical frequencies:

p̂1 = 0.169, p̂2 = 0.165, p̂3 = 0.166, p̂4 = 0.165, p̂5 = 0.167, p̂6 = 0.168.

With what confidence can we say that the die is not fair, that is, that pi ≠ 1/6 for some i?

More Than Two Outcomes: Example (Cont’d)

Suppose that indeed the true probability is pi = 1/6 for all i. Then

maxi |p̂i − pi| = |p̂1 − 1/6| ≈ 0.0023.

Take ε = 0.0023, n = 1000 and compute

δ = 6 × 2 exp(−2nε²) ≈ 11.87!

How can a ‘probability’ be greater than one? Note: this δ is just an upper bound for Pr{maxi |p̂i − pi| > ε}, so it can be larger than one. So we cannot rule out the possibility that the die is fair (which is quite different from saying that it is fair).
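The die example above can be checked numerically; this sketch recomputes the maximum deviation from 1/6 and the (vacuous) bound δ = 2k exp(−2nε²) from the empirical frequencies given in the slides.

```python
# Sketch of the die example: multi-outcome Hoeffding bound on the
# event that some empirical frequency deviates from 1/6 by > eps.
from math import exp

n = 1000
p_hat = [0.169, 0.165, 0.166, 0.165, 0.167, 0.168]
k = len(p_hat)                           # 6 outcomes

eps = max(abs(p - 1/6) for p in p_hat)   # about 0.0023
delta = 2 * k * exp(-2 * n * eps**2)     # about 11.87, larger than 1

print(round(eps, 4), round(delta, 2))
```

Since δ exceeds 1, the bound tells us nothing here: with ε this small and only 1000 rolls, we cannot rule out a fair die.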
K-S Tests: Problem Formulations

There are two widely used tests. They should be called the Kolmogorov test and the Smirnov test, respectively, but unfortunately the erroneous names ‘one-sample K-S test’ and ‘two-sample K-S test’ have become popular.

Kolmogorov Test, or One-Sample K-S Test: We have a set of samples and a candidate probability distribution. Question: How well does the distribution fit the set of samples?

Smirnov Test, or Two-Sample K-S Test: We have two sets of samples, say x1, . . . , xn and y1, . . . , ym. Question: How sure are we that both sets of samples came from the same (but unknown) distribution?
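As an illustration of the two-sample (Smirnov) statistic, here is a minimal Python sketch: it builds the empirical distribution of each sample and takes the largest gap between the two. The function names and the tiny data set are hypothetical, not from the slides.

```python
# Minimal sketch of the two-sample K-S statistic: the largest gap
# between the empirical CDFs of two samples.
def ecdf(samples, a):
    """Fraction of samples that are <= a (empirical CDF at a)."""
    return sum(1 for s in samples if s <= a) / len(samples)

def ks_two_sample(x, y):
    """max over u of |F_x(u) - F_y(u)|, checked at every sample point."""
    return max(abs(ecdf(x, u) - ecdf(y, u)) for u in list(x) + list(y))

x = [0.1, 0.4, 0.5, 0.9]
y = [0.2, 0.3, 0.6, 0.8]
print(ks_two_sample(x, y))
```

Because both empirical CDFs are step functions that jump only at sample points, checking the gap at the pooled sample points is enough to find the maximum.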
Empirical Distributions

Suppose X is a random variable for which we have generated n i.i.d. samples, call them x1, . . . , xn. Then we define the empirical distribution of X, based on these observations, as

Φ̂(a) = (1/n) Σ_{i=1}^{n} I{xi ≤ a},

where I denotes the indicator function: I = 1 if the condition in braces is satisfied and I = 0 otherwise. So Φ̂(a) is just the fraction of the n samples that are ≤ a. The diagram on the next slide illustrates this.

Empirical Distribution Depicted

[Figure: staircase plot of an empirical distribution function.]

Note: The diagram shows the samples occurring in increasing order, but they can be in any order.
Source: http://www.aiaccess.net/English/Glossaries/GlosMod/e gm distribution function.htm

Glivenko-Cantelli Lemma

Theorem: As n → ∞, the empirical distribution Φ̂(·) approaches the true distribution Φ(·). Specifically, if we define the Kolmogorov-Smirnov distance

dn = max_u |Φ̂(u) − Φ(u)|,

then dn → 0 as n → ∞. At what rate does the convergence take place?

One-Sample Kolmogorov-Smirnov Statistic

Fix a ‘confidence level’ δ > 0 (usually δ is taken as 0.05 or 0.02). Define the threshold

θ(n, δ) = [ (1/(2n)) log(2/δ) ]^{1/2}.

Then with probability 1 − δ, we can say that

dn := max_u |Φ̂(u) − Φ(u)| ≤ θ(n, δ).

One-Sample Kolmogorov-Smirnov Test

Given samples x1, . . . , xn, fit them with some distribution F(·) (e.g. Gaussian). Compute the K-S statistic

dn = max_u |Φ̂(u) − F(u)|,

and compare dn with the threshold θ(n, δ). If dn > θ(n, δ), we ‘reject the null hypothesis’ at level δ. In other words, if dn > θ(n, δ), then we are 1 − δ sure that the data was not generated by the distribution F(·).

Student t Test: Motivation

The Student t test is used to test the null hypothesis that two sets of samples have the same mean, under the assumption that they have the same variance. The test has broad applicability even if the assumption of ‘same variance’ is not satisfied.

Problem: We are given two samples x1, . . . , xm1 and xm1+1, . . . , xm1+m2. Determine whether the two sets of samples arise from distributions with the same mean.

Application: Most commonly used in quality control.

Student t Test: Theory

Let x̄1, x̄2 denote the means of the two sample classes, that is,

x̄1 = (1/m1) Σ_{i=1}^{m1} xi,   x̄2 = (1/m2) Σ_{i=1}^{m2} xm1+i.

Let S1², S2² denote the unbiased estimates of the variances of the two samples, that is,

S1² = (1/(m1 − 1)) Σ_{i=1}^{m1} (xi − x̄1)²,   S2² = (1/(m2 − 1)) Σ_{i=1}^{m2} (xm1+i − x̄2)².

Student t Test: Theory – 2

Now define the ‘pooled’ standard deviation S12 by

S12² = [ (m1 − 1)S1² + (m2 − 1)S2² ] / (m1 + m2 − 2).

Then the quantity

t = (x̄1 − x̄2) / [ S12 √((1/m1) + (1/m2)) ]

satisfies the t distribution with m1 + m2 − 2 ‘degrees of freedom’. As the number of d.o.f. becomes large, the t distribution approaches the normal distribution. The next slide shows the density of the t distribution for various d.o.f.

Density of the t Distribution

[Figure: densities of the t distribution for various degrees of freedom.]

Chi-Squared Test: Motivation

The t test determines whether two samples have the same mean. The chi-squared test determines whether two samples have the same variance. The application is again to quality control.

Chi-Squared Test: Theory

Given two sets of samples, say x1, . . . , xm1 and xm1+1, . . . , xm1+m2 (where usually m2 ≪ m1), compute the unbiased variance estimate V1 of the larger (first) sample,

V1 = (1/(m1 − 1)) Σ_{i=1}^{m1} (xi − x̄1)²,

and the sum of squares of the smaller (second) sample,

S2 = Σ_{i=1}^{m2} (xm1+i − x̄2)² = (m2 − 1)V2.

Then the ratio S2/V1 satisfies the chi-squared (or χ²) distribution with m2 − 1 degrees of freedom.

Distribution Function of the Chi-Squared Variable

[Figure: distribution functions of the χ² variable for various degrees of freedom.]

Density Function of the Chi-Squared Variable

[Figure: density functions of the χ² variable for various degrees of freedom.]

Application of the Chi-Squared Test

Note that the χ² random variable is always nonnegative. So, given some confidence level δ (usually δ = 0.05), we determine a confidence interval

xl = Φ⁻¹_{χ², m2−1}(δ),   xu = Φ⁻¹_{χ², m2−1}(1 − δ),

where Φ_{χ², m2−1} is the distribution function of the χ² variable with m2 − 1 degrees of freedom. If the test statistic S2/V1 lies in the interval [xl, xu], then we accept the null hypothesis that both samples have the same variance.
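The pooled t statistic defined in the Student t Test slides can be sketched in a few lines of Python. The small data set below is made up for illustration; only the formulas come from the slides.

```python
# Sketch of the pooled two-sample t statistic from the slides:
# t = (mean1 - mean2) / (S12 * sqrt(1/m1 + 1/m2)),
# with pooled variance S12^2 on m1 + m2 - 2 degrees of freedom.
from math import sqrt

def t_statistic(a, b):
    """Pooled-variance t statistic for samples a and b."""
    m1, m2 = len(a), len(b)
    xbar1, xbar2 = sum(a) / m1, sum(b) / m2
    s1sq = sum((x - xbar1)**2 for x in a) / (m1 - 1)  # unbiased variance
    s2sq = sum((x - xbar2)**2 for x in b) / (m2 - 1)
    s12sq = ((m1 - 1)*s1sq + (m2 - 1)*s2sq) / (m1 + m2 - 2)
    return (xbar1 - xbar2) / sqrt(s12sq * (1/m1 + 1/m2))

# Hypothetical quality-control measurements from two production runs.
a = [10.1, 9.8, 10.3, 10.0, 9.9]
b = [10.6, 10.4, 10.8, 10.5]
print(round(t_statistic(a, b), 3))
```

The resulting value would be compared against the t distribution with m1 + m2 − 2 = 7 degrees of freedom to decide whether the two runs share a mean.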