Stat 31, Section 1, Last Time
• Paired Diff’s vs. Unmatched Samples
  – Compare with example
  – Showed graphic about Paired often better
• Review of Gray Level Hypo Testing
• Inference for Proportions
  – Confidence Intervals
  – Sample Size Calculation

Reading In Textbook
Approximate Reading for Today’s Material: Pages 536-549, 555-566, 582-611
Approximate Reading for Next Class: Pages 582-611, 634-667

Midterm II
Coming on Tuesday, April 10
Think about:
• Sheet of Formulas
  – Again single 8 ½ x 11 sheet
  – New, since now more formulas
• Redoing HW…
• Asking about those not understood
• Midterm not cumulative
• Covered Material: HW 7 - 11

Midterm II
Extra Office Hours:
Monday, 4/9, 10:00 – 12:00, 12:30 – 3:00
Tuesday, 4/10, 8:30 – 10:00, 11:00 – 12:00

Hypo. Tests for Proportions
Case 3: Hypothesis Testing
General Setup: $H_0$ and $H_A$ compare $p$ with a Given Value
(e.g. $H_0: p =$ Given Value vs. $H_A: p \neq$ Given Value)

Hypo. Tests for Proportions
Assess strength of evidence by:
P-value = P{what saw or m.c. | B’dry}
        = P{observed $\hat{p}$ or m.c. | $p$ = Given Value}
Problem: sd of $\hat{p}$ = $\sqrt{p(1-p)/n}$

Hypo. Tests for Proportions
Problem: sd of $\hat{p}$ = $\sqrt{p(1-p)/n}$
Solution: (different from above “best guess” and “conservative”)
calculation is done based on: $p$ = Given Value (the boundary of $H_0$)

Hypo. Tests for Proportions
e.g. Old Text Problem 8.16
Of 500 respondents in a Christmas tree marketing survey, 44% had no children at home and 56% had at least one child at home. The corresponding figures from the most recent census are 48% with no children, and 52% with at least one. Test the null hypothesis that the telephone survey has a probability of selecting a household with no children that is equal to the value of the last census. Give a Z-statistic and P-value.

Hypo. Tests for Proportions
e.g. Old Text Problem 8.16
Let p = % with no child (worth writing down)
$H_0: p = 0.48$
$H_A: p \neq 0.48$

Hypo. Tests for Proportions
Observed $\hat{p} = 0.44$, from $n = 500$
P-value = $P\{\hat{p} = 0.44 \text{ or m.c.} \mid p = 0.48\}$
        = $P\{|\hat{p} - p| \ge 0.04 \mid p = 0.48\}$
        = $2 P\{\hat{p} \le 0.44\}$

Hypo. Tests for Proportions
P-value = $2 P\{\hat{p} \le 0.44\}$
        = 2 * NORMDIST(0.44,0.48,sqrt(0.48*(1-0.48)/500),true)
        = 0.0734
See Class Example 30, Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg30.xls
Yes-No: no strong evidence
Gray-level: somewhat strong evidence

Hypo. Tests for Proportions
Z-score version:
P-value = $P\{|\hat{p} - p| \ge 0.04\} = P\left\{ \frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} \ge \frac{0.04}{\sqrt{p(1-p)/n}} \right\}$
So Z-score is: $\frac{0.04}{\sqrt{0.48(1 - 0.48)/500}} = 1.79$

Hypo. Tests for Proportions
Note also 1-sided version ($H_A: p < 0.48$, P-value = 0.0734 / 2 = 0.0367):
Yes-no: is strong evidence
Gray Level: stronger evidence
HW: 8.22a (0.0057), 8.23, interpret from both yes-no and gray-level viewpoints

2 Sample Proportions
In text Section 8.2
• Skip this
• Ideas are only slight variation of above
• Basically mix & match of 2 sample ideas, and proportion methods
• If you need it (later), pull out text
• Covered on exams to extent it is in HW

Chapter 9: Two-Way Tables
Main idea: Divide up populations in two ways
  – E.g. 1: Age & Sex
  – E.g. 2: Education & Income
• Typical Major Question: How do divisions relate?
• Are the divisions independent?
  – Similar idea to independence in prob. theory
  – Statistical Inference?

Two-Way Tables
Class Example 31, Textbook Example 9.18
Market researchers know that background music can influence mood and purchasing behavior. A supermarket compared three treatments: no music, French accordion music and Italian string music. Under each condition, the researchers recorded the numbers of bottles of French, Italian and other wine purchased.

Two-Way Tables
Class Example 31, Textbook Example 9.18
Here is the two-way table that summarizes the data:

                     Wine
Music      French  Italian  Other
None         30      11      43
French       39       1      35
Italian      30      19      35

Are the type of wine purchased and the background music related?
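Looking back at Old Text Problem 8.16 before going on with two-way tables: the NORMDIST calculation can also be checked outside Excel. The following is only a sketch, not the class spreadsheet; the function name `two_sided_prop_test` is made up for illustration, and it assumes Python 3.8+ for `statistics.NormalDist`. As on the slides, the sd of $\hat{p}$ is computed at the boundary value p = 0.48.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_prop_test(p_hat, p0, n):
    """Z-statistic and 2-sided P-value for H0: p = p0 (hypothetical helper)."""
    sd = sqrt(p0 * (1 - p0) / n)             # sd of p-hat, using p = p0 from H0
    z = (p_hat - p0) / sd                    # signed Z-score
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value

z, p_value = two_sided_prop_test(0.44, 0.48, 500)
print(f"Z = {z:.2f}, P-value = {p_value:.4f}")  # Z = -1.79, P-value = 0.0734
```

The sign of Z is negative here because the observed 0.44 is below 0.48; the slides report the magnitude, 1.79.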
Two-Way Tables
Class Example 31: Visualization
[Figure: bar chart “Class Example 31 - Counts”; # bottles purchased (0 to 45), by music (None, French, Italian) and wine (French Wine, Italian Wine, Other Wine)]
Shows how counts are broken down by:
• music type
• wine type

Two-Way Tables
Big Question: Is there a relationship?
[Figure: same bar chart, “Class Example 31 - Counts”]
Note: tallest bars
  French Wine     French Music
  Italian Wine    Italian Music
  Other Wine      No Music
Suggests there is a relationship

Two-Way Tables
General Directions:
• Can we make this precise?
• Could it happen just by chance?
  – Really: how likely to be a chance effect?
• Or is it statistically significant?
  – I.e. music and wine purchase are related?

Two-Way Tables
Class Example 31, a look under the hood…
Excel Analysis, Part 1:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Notes:
• Read data from file
• Only appeared as column
• Had to re-arrange
• Better way to do this???
• Made graphic with chart wizard

Two-Way Tables
HW: Make 2-way bar graphs, and discuss relationships between the divisions, for the data in:
9.1 (younger people tend to be better educated)
9.9
9.11 (you try these…)

Two-Way Tables
An alternate view: Replace counts by proportions (or %-ages)
Class Example 31 (Wine & Music), Part 2
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Advantage: May be more interpretable
Drawback: No real difference (just rescaled)

Two-Way Tables
Testing for independence: What is it?
From probability theory: P{A | B} = P{A}
i.e. Chances of A, when B is known, are same as when B is unknown
Table version of this idea?

Independence in 2-Way Tables
Recall: P{A | B} = P{A}
Counts - proportions analog of these?
• Analog of P{A}?
  – Proportions of factor A, “not knowing B”
  – Called “marginal proportions”
• Analog of P{A|B}???
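To make “proportions of factor A, not knowing B” concrete, here is a small sketch using the Class Example 31 counts. It is not the Excel analysis from the class file; the variable names are chosen only for illustration. It rescales the counts to proportions and then sums over the other factor to get the marginal proportions.

```python
# Counts from Class Example 31: rows = music, columns = wine.
music = ["None", "French", "Italian"]
wine = ["French", "Italian", "Other"]
counts = [
    [30, 11, 43],   # no music
    [39, 1, 35],    # French music
    [30, 19, 35],   # Italian music
]

total = sum(sum(row) for row in counts)

# Table proportions: each count rescaled by the grand total.
props = [[c / total for c in row] for row in counts]

# Marginal proportions: wine "not knowing" music, and music "not knowing" wine.
wine_marg = [sum(row[j] for row in counts) / total for j in range(len(wine))]
music_marg = [sum(row) / total for row in counts]

for name, p in zip(wine, wine_marg):
    print(f"{name} wine: {p:.3f}")
for name, p in zip(music, music_marg):
    print(f"{name} music: {p:.3f}")
```

Each set of marginal proportions sums to 1, since it only re-groups the same 243 bottles.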
Independence in 2-Way Tables
Marginal proportions (or counts):
• Sums along rows
• Sums along columns
• Useful to write at margins of table
• Hence name marginal
• Numbers of independent interest
• Also nice to put total at bottom

Independence in 2-Way Tables
Marginal Counts:
Class Example 31 (Wine & Music), Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Marginals are of independent interest:
• Other wines sold best (French second)
• Italian music sold most wine… (84 bottles, tied with no music)
• But don’t tell whole story
  – E.g. Can’t see same music & wine is best…
  – Full table tells more than marginals

Independence in 2-Way Tables
Recall definition of independence: P{A | B} = P{A}
Counts analog of P{A|B}???
Recall: $P\{A\} = P\{A \mid B\} = \frac{P\{A \,\&\, B\}}{P\{B\}}$
So equivalent condition is: $P\{A\} P\{B\} = P\{A \,\&\, B\}$

Independence in 2-Way Tables
Counts analog of P{A|B}???
Equivalent condition for independence is: $P\{A \,\&\, B\} = P\{A\} P\{B\}$
So for counts, look for:
Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n
i.e. Entry = Product of Marginals

Independence in 2-Way Tables
Visualize Product of Marginals for:
Class Example 31 (Wine & Music), Part 4
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Shows same structure as marginals
But not match between music & wine
Good null hypothesis
[Figure: bar chart “Class Example 31 - Independent Model”; vertical axis 0 to 0.18, by music (None, French, Italian) and wine (French Wine, Italian Wine, Other Wine)]

Independence in 2-Way Tables
• Independent model appears different
• But is it really different?
• Or could difference be simply explained by natural sampling variation?
• Check for statistical significance…

Independence in 2-Way Tables
Approach:
• Measure “distance between tables”
  – Use Chi Square Statistic
  – Has known probability distribution when table is independent
• Assess significance using P-value
  – Set up as: H0: Indep.   HA: Dependent
  – P-value = P{what saw or m.c. | Indep.}

Independence in 2-Way Tables
Chi-square statistic:
• Based on:
  Observed Counts (raw data), $Obs_i$
  Expected Counts (under indep.), $Exp_i$
• $X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Notes:
  – Small for only random variation
  – Large for significant departure from indep.

Independence in 2-Way Tables
Chi-square statistic calculation:
$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Class example 31, Part 5:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
  – Calculate term by term
  – Then sum
  – Is $X^2 = 18.3$ “big” or “small”?

Independence in 2-Way Tables
H0 distribution of the $X^2$ statistic:
“Chi Squared” (another Greek letter: $\chi^2$)
Parameter: “degrees of freedom” (similar to T distribution)
Excel Computation:
  – CHIDIST (given cutoff, find area = prob.)
  – CHIINV (given prob = area, find cutoff)

Independence in 2-Way Tables
Explore the $\chi^2$ distribution:
Applet from Webster West (U. So. Carolina)
http://www.stat.sc.edu/~west/applets/chisqdemo.html
• Right Skewed Distribution
• Nearly Gaussian for more d.f.

Independence in 2-Way Tables
For test of independence, use:
degrees of freedom = (#rows – 1) x (#cols – 1)
E.g. Wine and Music: d.f. = (3 – 1) x (3 – 1) = 4

Independence in 2-Way Tables
E.g. Wine and Music:
P-value = P{Observed $X^2$ or m.c. | Indep.}
        = P{$X^2$ = 18.3 or m.c. | Indep.}
        = P{$X^2 \ge 18.3$ | d.f. = 4}
        = 0.0011
Also see Class Example 31, Part 5
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls

Independence in 2-Way Tables
E.g. Wine and Music: P-value = 0.001
Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect
Gray-Level: Also very strong evidence

Independence in 2-Way Tables
Excel shortcut: CHITEST
• Avoids the (obs-exp)^2 / exp calculat’n
• Automatically computes d.f.
• Returns P-value
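For anyone who wants to see what CHITEST is doing, the whole Wine & Music calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the class spreadsheet: the helper `chi2_upper_tail` is a made-up name, and it uses a closed-form tail area that holds only for even degrees of freedom (enough for d.f. = 4 here), standing in for Excel's CHIDIST.

```python
from math import exp

# Observed counts, Class Example 31: rows = music, columns = wine.
counts = [
    [30, 11, 43],   # no music
    [39, 1, 35],    # French music
    [30, 19, 35],   # Italian music
]

row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
total = sum(row_tot)

# Expected count under independence: row total x column total / grand total
# (the counts version of "Entry = Product of Marginals").
expected = [[r * c / total for c in col_tot] for r in row_tot]

# X^2 = sum over cells of (Obs - Exp)^2 / Exp
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(counts, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_tot) - 1) * (len(col_tot) - 1)

def chi2_upper_tail(x, df):
    """P{chi-square >= x}; hypothetical helper, valid for positive even d.f. only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only handles positive even d.f.")
    half = x / 2
    term, series = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        series += term
    return exp(-half) * series

p_value = chi2_upper_tail(x2, df)
print(f"X2 = {x2:.1f}, d.f. = {df}, P-value = {p_value:.4f}")
# X2 = 18.3, d.f. = 4, P-value = 0.0011
```

Most of the 18.3 comes from two cells: Italian wine under French music (far fewer bottles than expected) and Italian wine under Italian music (far more).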