P-value

Example: Problem 35

State DMV records indicate that of all vehicles undergoing emissions testing during the previous year, 70% passed on the first try. A random sample of 200 cars tested in a particular county during the current year yields 124 that passed on the initial test. Does this suggest that the true proportion for this county during the current year differs from the previous statewide proportion?

Hypotheses: $H_0: p = 0.70$ vs. $H_a: p \neq 0.70$.

Value of the test statistic, with $\hat{p} = 124/200 = 0.62$:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.62 - 0.70}{\sqrt{0.70(1 - 0.70)/200}} = -2.469$$

Significance level   Rejection region                    Conclusion
.05                  $z \geq 1.96$ or $z \leq -1.96$     Reject $H_0$
.02                  $z \geq 2.33$ or $z \leq -2.33$     Reject $H_0$
.01                  $z \geq 2.58$ or $z \leq -2.58$     Fail to reject $H_0$
.002                 $z \geq 3.08$ or $z \leq -3.08$     Fail to reject $H_0$

Definition

The P-value (or observed significance level) is the smallest level of significance at which $H_0$ would be rejected when a specified test procedure is used on a given data set.

Once the P-value has been determined, the conclusion at any particular level $\alpha$ results from comparing the P-value to $\alpha$:

1. P-value $\leq \alpha$ ⇒ reject $H_0$ at level $\alpha$.
2. P-value $> \alpha$ ⇒ fail to reject $H_0$ at level $\alpha$.

Convention: it is customary to call the data significant when $H_0$ is rejected and not significant otherwise.

An equivalent definition for P-value:

Definition

The P-value is the probability, calculated assuming $H_0$ is true, of obtaining a test statistic value at least as contradictory to $H_0$ as the value that actually resulted. The smaller the P-value, the more contradictory the data are to $H_0$.

P-value for z Tests

$$P = \begin{cases} 1 - \Phi(z) & \text{for an upper-tailed test} \\ \Phi(z) & \text{for a lower-tailed test} \\ 2[1 - \Phi(|z|)] & \text{for a two-tailed test} \end{cases}$$

where $\Phi(z)$ is the cdf of the standard normal rv. For instance,
the P-value for our first example is

$$P = 2[1 - \Phi(|z|)] = 2[1 - \Phi(|-2.469|)] = 2[1 - \Phi(2.469)] = 2[1 - 0.9932] = 0.0136$$

P-value for t Tests

$$P = \begin{cases} 1 - T_\nu(t) & \text{for an upper-tailed test} \\ T_\nu(t) & \text{for a lower-tailed test} \\ 2[1 - T_\nu(|t|)] & \text{for a two-tailed test} \end{cases}$$

where $T_\nu(t)$ is the cdf of the t distribution with $\nu$ degrees of freedom.

Table A.8 gives the upper-tail probability of the t distribution. The relation between the upper-tail probability and the cdf is simply

upper-tail probability $= 1 - T_\nu(t)$

For the lower-tail probability, recall that the t distribution is symmetric. Thus the lower-tail probability corresponding to $t \leq -c$ with $c > 0$ is the same as the upper-tail probability corresponding to $t \geq c$ with the same degrees of freedom.

Test about a Population Mean

Example: To determine whether the pipe welds in a nuclear power plant meet specifications, a random sample of 10 welds is selected, and tests are conducted on each weld in the sample. The sample data are recorded as follows:

101.9 100.4 101.2 100.9 101.7
101.5 100.9 100.1 101.6 100.8

with $\bar{X} = 101.10$ and $s = 0.585$. It is known that the weld strength is normally distributed. If the specifications state that the mean strength should be greater than 100.5 lb/in², should we conclude that the pipe welds meet the specifications?

Hypotheses: $H_0: \mu = 100.5$ vs. $H_a: \mu > 100.5$.

Value of the test statistic:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{101.10 - 100.5}{0.585/\sqrt{10}} = 3.24$$

Because the test is upper-tailed, the P-value is

$$P = 1 - T_\nu(t) = 1 - T_{10-1}(3.24) \approx 0.005$$

where 0.005 is found from Table A.8 with $t = 3.2$ and $\nu = 9$. Therefore, if the significance level $\alpha$ satisfies $\alpha \geq 0.005$, e.g. $\alpha = 0.05$, we reject $H_0$; otherwise we do not reject $H_0$.
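The two worked examples can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names (`z_test_p_value`, `t_cdf`, `t_test_p_value`) are illustrative, and the t cdf is approximated with Simpson's rule as a stand-in for the Table A.8 lookup, assuming only the standard library is available.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_p_value(z, tail):
    """P-value for a z test; tail is 'upper', 'lower', or 'two'."""
    phi = NormalDist().cdf  # standard normal cdf, Phi
    if tail == "upper":
        return 1 - phi(z)
    if tail == "lower":
        return phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=2000):
    """Approximate T_nu(t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def t_test_p_value(t, nu, tail):
    """P-value for a t test with nu degrees of freedom."""
    if tail == "upper":
        return 1 - t_cdf(t, nu)
    if tail == "lower":
        return t_cdf(t, nu)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), nu))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Emissions example: two-tailed z test of H0: p = 0.70
n, passed, p0 = 200, 124, 0.70
p_hat = passed / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_z = z_test_p_value(z, "two")

# Weld example: upper-tailed t test of H0: mu = 100.5
welds = [101.9, 100.4, 101.2, 100.9, 101.7, 101.5, 100.9, 100.1, 101.6, 100.8]
mu0 = 100.5
t = (mean(welds) - mu0) / (stdev(welds) / math.sqrt(len(welds)))
p_t = t_test_p_value(t, len(welds) - 1, "upper")

print(f"z = {z:.3f}, two-tailed P-value = {p_z:.4f}")
print(f"t = {t:.2f}, upper-tailed P-value = {p_t:.4f}")
```

The computed P-values should agree with the table-based values above up to rounding. Small differences are expected because Table A.8 is entered at $t = 3.2$ rather than at the exact statistic.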