The P-value:

P-value = Pr(a statistic value at least as extreme as the one observed | H0 is true)

It does not tell us anything about the probability of H0. The lower the P-value, the more comfortable we feel about our decision to reject the null hypothesis, but the null hypothesis does not get any more false. The null hypothesis is either true or false and we do NOT know which. The only way to fully determine its truth or falsity is to do a census, and that is not a cost-effective solution.

A large P-value means that what we have observed is not surprising, assuming that the null hypothesis is true.
A small P-value means that what we have observed is rare, assuming that the null hypothesis is true.

ALPHA LEVELS

An acceptable "rarity" of an observation can be set before taking a sample and before calculating any P-value. Standard levels are 0.10, 0.05, 0.01 and 0.001 and are denoted by the Greek letter alpha, α.

If P-value < α then we reject the null hypothesis and we say that the results are "statistically significant."

Suppose that P-value = 0.02 and α = 0.05. Since the P-value is less than alpha we say: "We reject the null hypothesis at the 5% significance level."

Some common critical values (critical z-scores):

α        1-sided   2-sided
----------------------------------------
0.05     1.645     1.96
0.01     2.33      2.576
0.001    3.09      3.29
========================================

MAKING ERRORS

                      H0 is true          H0 is false
Reject H0             Type I error        (correct decision)
Fail to reject H0     (correct decision)  Type II error

Medicine:
False Positive, Type I error: a healthy person is diagnosed with a disease
False Negative, Type II error: a diseased person is diagnosed as not having the disease

Pr(Type I error) = Pr(Reject H0 | H0 true) = α
We choose α ourselves, so we always know this probability.

Pr(Type II error) = Pr(Fail to reject H0 | H0 false) = β
This probability is not as easy to calculate.

The power of a test is the probability that it correctly rejects a false null hypothesis, 1 - β.

I do not believe we have time to adequately study this error situation.

DOUBLE TROUBLE!
Are men different than women? Are teenagers different than adults? Is drug A different than drug B?

Methodology: Samples from each group, X's and Y's.
- each group is drawn independently and randomly
- 10% condition
- np and nq ≥ 10 for both groups
- the two groups are independent of each other (paired data, e.g. husband/wife, before/after, brother/sister, are NOT ok)
- calculate p-hat, q-hat, SD, VAR for each sample
- calculate VAR and SD for X - Y
- do the usual calculations of CI, z*, P-value

The Numerics

Calculate p1-hat, q1-hat, p2-hat, q2-hat from the raw data.
Calculate VAR1 and VAR2.
Calculate VAR and then SD for X - Y. For independent groups the variances add, even though we are taking a difference:

VAR(X-Y) = VAR(X) + VAR(Y)
SD(X-Y) = SQ-RT( VAR(X) + VAR(Y) )

Example, page 559

A recent survey of 886 randomly selected teenagers (aged 12-17) found that more than half of them had online profiles. Some researchers and privacy advocates are concerned about the possible access to personal information about teens in public places on the Internet. There appear to be differences between boys and girls in their online behavior. Among teens aged 15-17, 57% of the 248 boys had posted profiles compared to 70% of the 256 girls.

What are these, for p-hat and q-hat?

boys    p1 =      q1 =      SE(p1) =
girls   p2 =      q2 =      SE(p2) =
SE(p1 - p2) =

SE(p1) = 0.0314
SE(p2) = 0.0286
SE(p1 - p2) = 0.0425

Assumptions and Conditions
- Randomization
- 10%
- Independent groups
- Success/Failure (np and nq ≥ 10)

p1 and p2 Normal ==> p1 - p2 is Normal (not obvious, but true)

Calculate the 95% confidence interval for this difference in online behavior.

CI = x ± z*∙s
   = (0.70 - 0.57) ± 1.96 ∙ 0.0425
   = 0.13 ± 0.083
   = (4.7%, 21.3%)

General: We are 95% confident that, among teens aged 15-17, the proportion of girls who post online profiles is between 4.7% and 21.3% higher than the proportion of boys who do.

Note that 0%, representing no difference, is NOT in the CI. It seems that teen girls are more likely to post profiles online than boys of the same age.

Text, page 564

Will I snore when I am 64?
The National Sleep Foundation asked a random sample of 1010 U.S. adults questions about their sleep habits. One question was about snoring. Of the 995 respondents, 37% reported that they snored at least a few nights a week during the past year. Would you expect that percentage to be the same for all age groups? Split into two age categories, 26% of the 184 people under 30 snored, compared with 39% of the 811 in the older group. Is this difference of 13% real, or due only to natural fluctuations in the sample we've chosen?

We need a hypothesis test.

The parameter of interest is the true difference between the reported snoring rates of the two age groups. We hypothesize that there is no difference in the proportions.

p1 is older group
p2 is younger group

Check:
- Independence
- Randomization
- 10% condition
- Independent groups
- Success/Failure (np and nq ≥ 10 for both groups)

H0: p1 - p2 = 0    the difference is 0
Ha: p1 - p2 ≠ 0    a two-tailed test

The Calculations:

SD(p1 - p2) ≈ SE(p1 - p2) = ( p1∙q1 / n1 + p2∙q2 / n2 ) ^ 0.5

But the null hypothesis claims that the proportions are the same, p1 = p2. Since we are assuming the null hypothesis to be true, we can pool our data and use the same values for both p's and q's.
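
The pooling step can be sketched in a few lines of Python. This is only an illustration, not the textbook's worked solution. The function name two_prop_ztest is made up for this sketch, and the success counts are reconstructed as n∙p from the rounded percentages quoted above, so the results are approximate. The pooled proportion is taken to be total successes divided by total sample size, and the two-sided P-value comes from the standard Normal model.

```python
from math import sqrt, erfc


def two_prop_ztest(p1, n1, p2, n2):
    """Two-sided pooled z-test of H0: p1 - p2 = 0.

    p1, p2 are sample proportions; n1, n2 are sample sizes.
    Assumption: successes are approximated as n * p, because only
    rounded percentages are available here.
    """
    # Pooled proportion: all successes over all observations.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Under H0 both groups share the same p and q, so use pooled values.
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    # Two-sided tail area of the standard Normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Snoring example: older group (39% of 811) vs. under 30 (26% of 184).
z, p_value = two_prop_ztest(0.39, 811, 0.26, 184)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Compare the z-score with the 2-sided critical values listed earlier: if it is beyond 1.96, we reject H0 at the 5% significance level.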