Stat 31, Section 1, Last Time
• Paired Diff's vs. Unmatched Samples
  – Compare with example
  – Showed graphic: paired is often better
• Review of Gray Level Hypo Testing
• Inference for Proportions
  – Confidence Intervals
  – Sample Size Calculation
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 536-549, 555-566, 582-611
Approximate Reading for Next Class:
Pages 582-611, 634-667
Midterm II
Coming on Tuesday, April 10
Think about:
• Sheet of Formulas
  – Again single 8 ½ x 11 sheet
  – New, since now more formulas
• Redoing HW…
• Asking about those not understood
• Midterm not cumulative
• Covered Material: HW 7 - 11
Extra Office Hours:
• Monday, 4/9: 10:00 – 12:00 and 12:30 – 3:00
• Tuesday, 4/10: 8:30 – 10:00 and 11:00 – 12:00
Hypo. Tests for Proportions
Case 3: Hypothesis Testing
General Setup:
$H_0: p = p_0$
$H_A: p < p_0$ or $p \ne p_0$ or $p > p_0$
where $p_0$ is a given value
Assess strength of evidence by:
P-value = P{what saw or m.c. | B'dry} =
= $P\{\text{observed } \hat{p} \text{ or m.c.} \mid p = p_0\}$
Problem: sd of $\hat{p}$ is $\sqrt{p(1-p)/n}$, so which value of p should be plugged in?
Solution (different from the above "best guess" and "conservative"):
the calculation is done based on the null value $p_0$, i.e.
sd of $\hat{p}$ = $\sqrt{p_0(1-p_0)/n}$
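To see how little the three choices differ numerically, here is a minimal Python sketch (the course itself uses Excel) using the numbers from Old Text Problem 8.16 below. It assumes, as in the confidence-interval material, that "best guess" plugs in $\hat{p}$ and "conservative" plugs in p = 1/2; the helper name `sd_phat` is hypothetical.

```python
import math

def sd_phat(p, n):
    """Standard deviation of the sample proportion p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p_hat, p0, n = 0.44, 0.48, 500      # observed proportion, null value, sample size

sd_best_guess = sd_phat(p_hat, n)   # plug in the estimate p-hat
sd_conservative = sd_phat(0.5, n)   # p = 1/2 maximizes p(1 - p), so this is the largest sd
sd_null = sd_phat(p0, n)            # hypothesis test: plug in the null value p0

print(f"best guess:   {sd_best_guess:.5f}")
print(f"conservative: {sd_conservative:.5f}")
print(f"null (p0):    {sd_null:.5f}")
```

The null value is used because the P-value is computed assuming $H_0$ holds, i.e. at the boundary $p = p_0$.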
e.g. Old Text Problem 8.16
Of 500 respondents in a Christmas tree marketing survey, 44% had no children at home and 56% had at least one child at home. The corresponding figures from the most recent census are 48% with no children, and 52% with at least one. Test the null hypothesis that the telephone survey has a probability of selecting a household with no children that is equal to the value of the last census. Give a Z-statistic and P-value.
Let p = proportion of households with no children (worth writing down)
$H_0: p = 0.48$
$H_A: p \ne 0.48$
Observed $\hat{p} = 0.44$, from $n = 500$
P-value $= P\{\hat{p} = 0.44 \text{ or m.c.} \mid p = 0.48\}$
$= P\{|\hat{p} - p| \ge 0.04 \mid p = 0.48\}$
$= 2 \cdot P\{\hat{p} \le 0.44 \mid p = 0.48\}$
P-value $= 2 \cdot P\{\hat{p} \le 0.44 \mid p = 0.48\}$
= 2 * NORMDIST(0.44,0.48,sqrt(0.48*(1-0.48)/500),true)
= 0.0734
See Class Example 30, Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg30.xls
Yes-No: no strong evidence (P-value > 0.05)
Gray-level: somewhat strong evidence
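For readers without Excel, the same calculation can be sketched in Python. Here `statistics.NormalDist(mu, sigma).cdf(x)` is swapped in for `NORMDIST(x, mu, sigma, true)`; this is an illustration, not the class spreadsheet.

```python
import math
from statistics import NormalDist

p_hat, p0, n = 0.44, 0.48, 500
sd0 = math.sqrt(p0 * (1 - p0) / n)   # sd of p-hat under H0 (uses p0)

# Same as 2 * NORMDIST(0.44, 0.48, sqrt(0.48*(1-0.48)/500), true)
p_value = 2 * NormalDist(mu=p0, sigma=sd0).cdf(p_hat)
print(round(p_value, 4))   # 0.0734
```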
Z-score version:
P-value $= P\{|\hat{p} - p| \ge 0.04 \mid p = 0.48\} = P\left\{\frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} \ge \frac{0.04}{\sqrt{0.48(1-0.48)/500}}\right\}$
So Z-score is:
$Z = \frac{0.04}{\sqrt{0.48(1-0.48)/500}} = 1.79$
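A minimal Python sketch of the Z-score version (a swapped-in tool, not the course's Excel): standardize the distance 0.04 by the null sd, then add up both standard normal tails.

```python
import math
from statistics import NormalDist

p_hat, p0, n = 0.44, 0.48, 500
z = abs(p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # standardized distance from p0
p_two_sided = 2 * NormalDist().cdf(-z)               # both standard normal tails

print(f"Z = {z:.2f}, two-sided P-value = {p_two_sided:.4f}")   # Z = 1.79, two-sided P-value = 0.0734
```

It gives the same P-value as the NORMDIST version, since standardizing does not change the probability.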
Note also 1-sided version ($H_A: p < 0.48$), P-value = 0.0734 / 2 ≈ 0.037:
Yes-no: is strong evidence (P-value < 0.05)
Gray Level: stronger evidence
HW: 8.22a (0.0057), 8.23, interpret from both yes-no and gray-level viewpoints
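The 1-sided P-value can be sketched in Python as follows, assuming the alternative is $H_A: p < 0.48$ (the direction the data point in). Only the lower normal tail counts as "more conclusive".

```python
import math
from statistics import NormalDist

p_hat, p0, n = 0.44, 0.48, 500
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # signed Z-score, about -1.79

# H_A: p < 0.48, so only the lower tail is "as or more conclusive"
p_one_sided = NormalDist().cdf(z)
print(f"one-sided P-value = {p_one_sided:.4f}")   # one-sided P-value = 0.0367
```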
2 Sample Proportions
In text Section 8.2
• Skip this
• Ideas are only slight variation of above
• Basically mix & match of 2 sample ideas and proportion methods
• If you need it (later), pull out text
• Covered on exams to extent it is in HW
Chapter 9: Two-Way Tables
Main idea:
• Divide up populations in two ways
  – E.g. 1: Age & Sex
  – E.g. 2: Education & Income
• Typical Major Question: How do divisions relate?
  – Are the divisions independent?
  – Similar idea to independence in prob. theory
• Statistical Inference?
Two-Way Tables
Class Example 31, Textbook Example 9.18
Market researchers know that background music can influence mood and purchasing behavior. A supermarket compared three treatments: no music, French accordion music and Italian string music. Under each condition, the researchers recorded the numbers of bottles of French, Italian and other wine purchased.
Here is the two-way table that summarizes the data:
Music \ Wine   French   Italian   Other
None             30        11       43
French           39         1       35
Italian          30        19       35
Are the type of wine purchased and the background music related?
Class Example 31: Visualization
[Figure: "Class Example 31 - Counts", bar chart of # Bottles purchased by Music (None, French, Italian) and wine type (French Wine, Italian Wine, Other Wine)]
Shows how counts are broken down by music type and by wine type
Big Question: Is there a relationship?
[Figure: "Class Example 31 - Counts", same bar chart of # Bottles purchased by Music and wine type]
Note: tallest bar for each wine type:
French Wine with French Music
Italian Wine with Italian Music
Other Wine with No Music
Suggests there is a relationship
General Directions:
• Can we make this precise?
• Could it happen just by chance?
  – Really: how likely is it to be a chance effect?
• Or is it statistically significant?
  – I.e. are music and wine purchase related?
Class Example 31, a look under the hood…
Excel Analysis, Part 1:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Notes:
• Read data from file
• Only appeared as column
• Had to re-arrange
• Better way to do this???
• Made graphic with chart wizard
HW:
Make 2-way bar graphs, and discuss relationships between the divisions, for the data in:
• 9.1 (younger people tend to be better educated)
• 9.9, 9.11 (you try these…)
An alternate view:
Replace counts by proportions (or %-ages)
Class Example 31 (Wine & Music), Part 2
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Advantage: may be more interpretable
Drawback: no real difference (just rescaled)
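The rescaling can be sketched in Python (an illustration, not the class spreadsheet): divide every count in the wine & music table by the grand total. Rows are music type, columns are wine type, as in the table above.

```python
music = ["None", "French", "Italian"]   # rows
wine = ["French", "Italian", "Other"]   # columns
counts = [[30, 11, 43],
          [39, 1, 35],
          [30, 19, 35]]

total = sum(sum(row) for row in counts)          # 243 bottles in all
props = [[c / total for c in row] for row in counts]

for m, row in zip(music, props):
    print(f"{m:8s}", "  ".join(f"{100 * p:5.1f}%" for p in row))
```

Every cell is divided by the same number, so the bar graph keeps exactly the same shape; only the axis scale changes.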
Testing for independence:
What is it?
From probability theory:
P{A | B} = P{A}
i.e. chances of A, when B is known, are the same as when B is unknown
Table version of this idea?
Independence in 2-Way Tables
Recall: P{A | B} = P{A}
Counts - proportions analog of these?
• Analog of P{A}?
  – Proportions of factor A, "not knowing B"
  – Called "marginal proportions"
• Analog of P{A|B}???
Marginal proportions (or counts):
• Sums along rows
• Sums along columns
• Useful to write at margins of table
• Hence name marginal
• Numbers of independent interest
• Also nice to put total at bottom
Marginal Counts:
Class Example 31 (Wine & Music), Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Marginals are of independent interest:
• Other wines sold best (113 bottles), French second (99)
• Italian music sold the most wine (84 bottles), tied with no music (84); French music sold 75
• But marginals don't tell the whole story
  – E.g. can't see that French and Italian wine each sell best with the matching music…
Full table tells more than marginals
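A minimal Python sketch of the marginal counts (row and column sums) for the wine & music table; plain lists stand in for the spreadsheet.

```python
counts = [[30, 11, 43],    # No music:      French, Italian, Other wine
          [39, 1, 35],     # French music
          [30, 19, 35]]    # Italian music

row_totals = [sum(row) for row in counts]         # bottles sold under each music type
col_totals = [sum(col) for col in zip(*counts)]   # bottles sold of each wine type
grand_total = sum(row_totals)

print("music (rows):", row_totals)    # [84, 75, 84]
print("wine (cols): ", col_totals)    # [99, 31, 113]
print("total:       ", grand_total)   # 243
```

The row totals show the tie between Italian music and no music; neither list says anything about which wine goes with which music.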
Recall definition of independence:
P{A | B} = P{A}
Counts analog of P{A|B}???
Recall: under independence, $P\{A\} = P\{A \mid B\} = \frac{P\{A \,\&\, B\}}{P\{B\}}$
So equivalent condition is:
$P\{A\} \cdot P\{B\} = P\{A \,\&\, B\}$
Counts analog of P{A|B}???
Equivalent condition for independence is:
$P\{A \,\&\, B\} = P\{A\} \cdot P\{B\}$
So for counts, look for:
Table Prop'n = Row Marg'l Prop'n × Col'n Marg'l Prop'n
i.e. Entry = Product of Marginals
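The "Entry = Product of Marginals" rule can be sketched in Python (an illustration, not the class spreadsheet): build the independent-model proportions from the marginal proportions and compare one cell with the observed proportion.

```python
counts = [[30, 11, 43],
          [39, 1, 35],
          [30, 19, 35]]
n = sum(map(sum, counts))
row_props = [sum(r) / n for r in counts]            # music marginal proportions
col_props = [sum(c) / n for c in zip(*counts)]      # wine marginal proportions

# Independent model: each cell proportion = row marginal x column marginal
indep = [[r * c for c in col_props] for r in row_props]
observed = [[x / n for x in row] for row in counts]

# French music & French wine: observed vs. independent model
print(f"{observed[1][0]:.4f} vs {indep[1][0]:.4f}")   # 0.1605 vs 0.1257
```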
Visualize Product of Marginals for:
Class Example 31 (Wine & Music), Part 4
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
[Figure: "Class Example 31 - Independent Model", bar chart of products of marginal proportions by Music (None, French, Italian) and wine type (French Wine, Italian Wine, Other Wine)]
• Shows same structure as marginals
• But no match-up between music & wine
• Good null hypothesis
• Independent model appears different
• But is it really different?
• Or could difference be simply explained by natural sampling variation?
• Check for statistical significance…
Approach:
• Measure "distance between tables"
  – Use Chi Square Statistic
  – Has known probability distribution when table is independent
• Assess significance using P-value
  – Set up as: H0: Indep. vs. HA: Dependent
  – P-value = P{what saw or m.c. | Indep.}
Chi-square statistic:
• Based on:
  Observed Counts (raw data), $Obs_i$
  Expected Counts (under indep.), $Exp_i$ = (row total × column total) / grand total, i.e. grand total × product of marginal proportions
• $X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Notes:
  – Small for only random variation
  – Large for significant departure from indep.
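The formula itself is a one-line sum; a minimal Python sketch, with a hypothetical two-cell toy example just to show the arithmetic and the "small when Obs matches Exp" behavior. The function name `chi_square_stat` is illustrative.

```python
def chi_square_stat(observed, expected):
    """X^2 = sum over cells of (Obs - Exp)^2 / Exp, with cells given as flat sequences."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical toy cells, not course data
print(chi_square_stat([10, 20], [15, 15]))   # (25 + 25) / 15 = 3.333...
print(chi_square_stat([15, 15], [15, 15]))   # perfect agreement gives 0.0
```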
Chi-square statistic calculation:
$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$
Class example 31, Part 5:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
– Calculate term by term
– Then sum
– Is X² = 18.3 "big" or "small"?
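The term-by-term calculation can be sketched in Python (a stand-in for the Part 5 spreadsheet, not a copy of it). Expected counts are row total × column total / grand total.

```python
music = ["None", "French", "Italian"]
wine = ["French", "Italian", "Other"]
counts = [[30, 11, 43],
          [39, 1, 35],
          [30, 19, 35]]

n = sum(map(sum, counts))
row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]

x2 = 0.0
for i, m in enumerate(music):
    for j, w in enumerate(wine):
        exp = row_tot[i] * col_tot[j] / n       # expected count under independence
        term = (counts[i][j] - exp) ** 2 / exp   # this cell's contribution to X^2
        x2 += term
        print(f"{m:8s} music, {w:8s} wine: obs={counts[i][j]:3d}  exp={exp:6.2f}  term={term:.3f}")

print(f"X^2 = {x2:.1f}")   # X^2 = 18.3
```

Most of X² comes from the Italian-wine column: about 7.67 from French music (1 bottle observed vs. about 9.57 expected) and about 6.40 from Italian music.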
H0 distribution of the X² statistic:
"Chi Squared" (another Greek letter, $\chi^2$)
Parameter: "degrees of freedom" (similar to T distribution)
Excel Computation:
– CHIDIST (given cutoff, find right-tail area = prob.)
– CHIINV (given prob = area, find cutoff)
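A stdlib-only Python sketch of what CHIDIST and CHIINV compute, restricted to integer degrees of freedom. It uses the standard recurrence for the regularized upper incomplete gamma function, $Q(a+1, y) = Q(a, y) + y^a e^{-y}/\Gamma(a+1)$ with $y = x/2$, started from $Q(1, y) = e^{-y}$ (even d.f.) or $Q(1/2, y) = \text{erfc}(\sqrt{y})$ (odd d.f.); the inverse is found by bisection. The names `chi2_sf` and `chi2_isf` are hypothetical here; SciPy users have `scipy.stats.chi2.sf` and `scipy.stats.chi2.isf`.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P{X^2 >= x} for chi-square with integer df (what CHIDIST returns)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start at Q(1, y) = e^{-y} (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df)
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step up with Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1) until a = df / 2
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def chi2_isf(p, df):
    """Cutoff x with P{X^2 >= x} = p (what CHIINV returns), found by bisection."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:   # widen the bracket until the tail area drops below p
        hi *= 2.0
    for _ in range(100):         # chi2_sf decreases in x, so bisect
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_sf(18.3, 4), 4))    # like CHIDIST(18.3, 4): 0.0011
print(round(chi2_isf(0.05, 4), 3))   # like CHIINV(0.05, 4): 9.488
```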
Explore the $\chi^2$ distribution:
Applet from Webster West (U. So. Carolina)
http://www.stat.sc.edu/~west/applets/chisqdemo.html
• Right Skewed Distribution
• Nearly Gaussian for more d.f.
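Both bullet points can also be checked with the standard moment formulas for a chi-square with k d.f.: mean k, variance 2k, skewness $\sqrt{8/k}$. A short Python sketch (the helper name is hypothetical):

```python
import math

def chi2_skewness(k):
    """Skewness of a chi-square distribution with k degrees of freedom: sqrt(8 / k)."""
    return math.sqrt(8 / k)

# Skewness is always positive (right skewed) but shrinks toward 0 (Gaussian) as d.f. grow
for k in [1, 2, 4, 10, 50, 200]:
    print(f"d.f. = {k:3d}: mean = {k:3d}, sd = {math.sqrt(2 * k):6.2f}, skewness = {chi2_skewness(k):.3f}")
```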
For test of independence, use:
degrees of freedom = (#rows – 1) × (#cols – 1)
E.g. Wine and Music:
d.f. = (3 – 1) × (3 – 1) = 4
E.g. Wine and Music:
P-value = P{Observed X² or m.c. | Indep.} =
= P{X² = 18.3 or m.c. | Indep.} =
= P{X² ≥ 18.3 | d.f. = 4} =
= 0.0011
Also see Class Example 31, Part 5
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
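For even d.f. the chi-square tail area has a closed form, $P\{X^2 \ge x\} = e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k / k!$, so the wine & music P-value can be checked by hand. A minimal Python sketch (function names hypothetical; the closed form only applies to even d.f.):

```python
import math

def df_independence(n_rows, n_cols):
    """Degrees of freedom for the chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_sf_even_df(x, df):
    """P{X^2 >= x} for EVEN df: e^{-x/2} * sum over k < df/2 of (x/2)^k / k!."""
    if df % 2:
        raise ValueError("this closed form only holds for even degrees of freedom")
    y = x / 2
    return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))

df = df_independence(3, 3)             # wine & music: 3 x 3 table gives 4
p_value = chi2_sf_even_df(18.28, df)
print(df, round(p_value, 4))           # 4 0.0011
```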
E.g. Wine and Music:
P-value ≈ 0.001
Yes-No: Very strong evidence against independence; conclude music has a statistically significant effect
Gray-Level: Also very strong evidence
Excel shortcut:
CHITEST (takes the observed and the expected tables as inputs)
• Avoids the (obs-exp)^2 / exp calculation
• Automatically computes d.f.
• Returns P-value
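A minimal end-to-end Python sketch in the spirit of CHITEST. One difference: Excel's CHITEST takes the expected counts as a second range, while this hypothetical `chi_square_test` builds them itself from the marginals and returns X², d.f. and the P-value. It assumes a table of counts with no empty rows or columns; SciPy users would reach for `scipy.stats.chi2_contingency` instead.

```python
import math

def chi2_sf(x, df):
    """Right-tail chi-square probability for integer df (regularized upper gamma recurrence)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def chi_square_test(table):
    """Chi-square test of independence for a two-way table of counts: (X2, df, P-value)."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    x2 = 0.0
    for i, r in enumerate(row_tot):
        for j, c in enumerate(col_tot):
            exp = r * c / n                       # expected count under independence
            x2 += (table[i][j] - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, df, chi2_sf(x2, df)

wine_music = [[30, 11, 43], [39, 1, 35], [30, 19, 35]]
x2, df, p = chi_square_test(wine_music)
print(f"X^2 = {x2:.2f}, d.f. = {df}, P-value = {p:.4f}")   # X^2 = 18.28, d.f. = 4, P-value = 0.0011
```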