Testing the Hypothesis That Data Fit a Given Probability Distribution

• Problem: We have a sample of size n. Determine whether the data fit a given probability distribution.
• Null hypothesis, H0: The data fit the distribution.
• Fact: Divide the range into k intervals. If the data fit the distribution, then the following random variable approximately follows the chi-square distribution with k-1 degrees of freedom:

$$\chi^2 = \sum_{j=1}^{k} \frac{(\text{observed number of values in the } j\text{th interval} - \text{expected number of values in the } j\text{th interval})^2}{\text{expected number of values in the } j\text{th interval}} = \sum_{j=1}^{k} \frac{(n_j - n p_j)^2}{n p_j}$$

Here n_j is the number of observations that fall in the jth interval, and p_j is the probability that the hypothesized distribution assigns to the jth interval, so n p_j is the expected number of values in that interval.

• The value of this variable computed in a hypothesis test is called the chi-square statistic.
• If the chi-square statistic is too large (far in the right tail of the chi-square distribution), the result is surprising: the evidence from the test contradicts the hypothesis that the data fit the probability distribution.

Algorithm
1. Perform a visual test first. If there is no reason to reject the hypothesis, proceed as follows.
2. Divide the range of values in the sample into k adjacent intervals.
3. Tally the number of observations in each interval.
4. Calculate the chi-square statistic.
5. Calculate the p-value of the test.
6. Decide whether the hypothesis should be rejected.

Decision Rule
• Reject the hypothesis if the p-value is less than or equal to some low significance level (e.g. 0.05). Otherwise, do not reject the hypothesis.

[Figure: chi-square density with 7 degrees of freedom, dchisq(x, 7), plotted for x from 0 to 20. The critical value (probability of exceedance 0.05) is qchisq(0.95, 7) = 14.067. The region to the left of the critical value is labeled "Do not reject H0" and the region to the right "Reject H0".]
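Steps 2 to 6 of the algorithm can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names (`tally`, `chi_square_statistic`, `chi2_sf`), the interval edges, and the observed counts are all hypothetical and chosen only for the example. The p-value is obtained from a series expansion of the incomplete gamma function, which is one of several possible ways to evaluate the chi-square tail probability. The sketch assumes that the interval probabilities p_j are fully specified in advance, so that k-1 degrees of freedom apply.

```python
import bisect
import math


def tally(sample, edges):
    """Count observations in the half-open intervals [edges[j], edges[j+1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        j = bisect.bisect_right(edges, x) - 1
        if 0 <= j < len(counts):  # values outside the range are ignored
            counts[j] += 1
    return counts


def chi_square_statistic(observed, probabilities):
    """Sum over intervals of (n_j - n*p_j)^2 / (n*p_j)."""
    n = sum(observed)
    return sum((n_j - n * p_j) ** 2 / (n * p_j)
               for n_j, p_j in zip(observed, probabilities))


def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with df
    degrees of freedom, via the series for the lower incomplete gamma
    function (adequate for moderate x; loses accuracy far in the tail)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical example: do 100 observations fit the uniform distribution on [0, 1)?
# Assumed tallies for k = 4 equal-width intervals (made-up numbers).
observed = [22, 28, 24, 26]
probabilities = [0.25, 0.25, 0.25, 0.25]

statistic = chi_square_statistic(observed, probabilities)
p_value = chi2_sf(statistic, df=len(observed) - 1)
alpha = 0.05  # significance level, as in the decision rule

print("chi-square statistic:", statistic)
print("p-value:", p_value)
print("reject H0" if p_value <= alpha else "do not reject H0")
```

With raw data instead of ready-made counts, `tally(sample, edges)` would supply `observed`. As a consistency check against the figure, `chi2_sf(14.067, 7)` should come out close to 0.05.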