Chapter 11 Notes (Word)

The Chi-Square Distribution
χ² has 3 main uses:
1. Goodness of fit tests
2. Independence Tests
3. One Sample Variance Tests
The chi-square distribution is an asymmetrical (right-skewed) distribution with one parameter,
v = degrees of freedom or df.
If you square a Z (a standard Normal variable) you get a χ² with 1 degree of freedom.
A chi-square variable is never negative.
It is continuous.
Most chi-square curves look like right-skewed Normal distributions.
The mean of a chi-square = v and the standard deviation = √(2v).
There is a χ²cdf on your calculators. It takes 3 inputs (min, max, df).
The min should always be 0, which is the minimum of any χ² distribution.
That gives the left-tail probability P(X ≤ max). You can then find the right-tail
probability (usually the p-value) by subtracting from 1.
Ex. If X ~ χ² with df = 9, then P(X ≤ 10) = 0.6495 → P(X > 10) = 0.3505
Ex. If X ~ χ² with df = 8, what is the probability that X is within 1 standard deviation
of its mean?
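The two examples above can also be checked off the calculator. The sketch below is one way to do it in plain Python. The helper name chi2_cdf is made up for this illustration, and it evaluates the cdf with a standard series for the regularized incomplete gamma function, which is an assumption of this sketch and not something from the notes.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Example with df = 9: left tail at 10, then the right tail by subtracting from 1.
left = chi2_cdf(10, 9)
right = 1 - left
print(round(left, 4), round(right, 4))

# Example with df = 8: mean = 8 and sd = sqrt(2 * 8) = 4, so the range is 4 to 12.
mean = 8
sd = math.sqrt(2 * 8)
within_one_sd = chi2_cdf(mean + sd, 8) - chi2_cdf(mean - sd, 8)
print(round(within_one_sd, 4))
```

The first line printed should agree with the 0.6495 and 0.3505 quoted above. The second is the probability for the df = 8 question, roughly 0.71.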
Goodness of fit:
Ex. You pick 20 cards from a standard deck with replacement and count the
number of each suit. You count 8 spades, 6 hearts, 4 clubs and 2 diamonds. Do
you think that the selection was random from a standard deck?
What would you expect?
5 of each suit.
How different is this? Is there evidence to conclude that this would be very
unlikely to happen by chance?
Ho: The suits are uniformly distributed (each probability = ¼)
Ha: The suits are not uniformly distributed (at least one probability ≠ ¼)
No α is given, so assume that α = 0.05.
The test statistic:
$$X = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Where n is the number of categories, O is the observed frequency in each cell,
and E is the expected (if Ho is true) frequency in each cell.
If Ho is true then X ~ χ² with df = n – 1, where n is the number of categories.
The p-value = P(χ² > X).
In the above example: (Show table on board)
TS = 4, n = 4 so df = 3 and the p-value = 0.261. Do not reject Ho. There is not
sufficient evidence to support the claim that the selection was not random from
a standard deck.
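For anyone who missed the table on the board, the card example can be reproduced with a short script. This is only a sketch: chi2_cdf is a made-up helper (a series for the incomplete gamma function, assumed here in place of the calculator's χ²cdf), and the variable names are illustrative.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

observed = [8, 6, 4, 2]            # spades, hearts, clubs, diamonds
probs = [0.25, 0.25, 0.25, 0.25]   # Ho: each suit has probability 1/4

n_draws = sum(observed)
expected = [n_draws * p for p in probs]   # 5 of each suit

# Sum of (O - E)^2 / E over the categories.
ts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = 1 - chi2_cdf(ts, df)            # right-tail probability

print(expected, round(ts, 3), df, round(p_value, 3))
```

Each suit contributes (O − 5)² / 5 to the total, which is how TS = 4 arises. The p-value is above 0.05, so Ho is not rejected.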
Exercise 11.11.2
A 6-sided die is rolled 120 times. Fill in the expected frequency column.
Then, conduct a hypothesis test to determine if the die is fair. The data
below are the result of the 120 rolls.
Face Value   Frequency   Expected Frequency
1            15
2            29
3            16
4            15
5            30
6            15
Exercise 11.11.6
The City of South Lake Tahoe, CA, has an Asian population of 1419
people, out of a total population of 23,609 (Source: U.S. Census Bureau,
Census 2000). Suppose that a survey of 1419 self-reported Asians in
the Manhattan, NY, area yielded the data in the table below. Conduct a
goodness of fit test to determine if the self-reported sub-groups of Asians
in the Manhattan area fit those of the Lake Tahoe area.
Race           Lake Tahoe Frequency   Manhattan Frequency
Asian Indian   131                    174
Chinese        118                    557
Filipino       1045                   518
Japanese       80                     54
Korean         12                     29
Vietnamese     9                      21
Other          24                     66
Testing to determine if two categorical variables are Independent
You have 2 categorical variables and you want to test if they are associated
(NOT independent). Make a contingency table. You will have a row variable and
a column variable and the frequencies will go in the table.
Ho: The row variable and column variable are independent
Ha: The row variable and column variable are not independent
The test statistic:
$$X = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
O’s are your observed frequencies
E’s are your expected frequencies, assuming that Ho is true.
$$E_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{grand total}}$$
Degrees of freedom = (r-1)(c-1) where r is the number of rows and c is the
number of columns.
The p-value = P(χ² > TS)
The more dependent your variables are, the bigger the differences
between O and E and as a result the bigger your TS will be.
The chi-square approximation will be accurate when all the E’s ≥ 5.
Ex.
A study was conducted at a large university to see if age and drink preference
were associated. Some abridged results are shown below. Is there evidence at
the 5% significance level to support the claim that age and drink preference are
associated?
Observed
               Under 19   19 and older   Total
Soda           40         20             60
Coffee         15         25             40
Total          55         45             100

Expected
               Soda   Coffee   Total
Under 19       33     22       55
19 and older   27     18       45
Total          60     40       100
TS = 8.249
p-value = 0.004
Reject Ho. There is sufficient evidence at the 5% significance level to support
the claim that age and drink preference are associated.
Degrees of freedom?
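The drink-preference numbers can be rebuilt from the observed table alone. The sketch below computes the expected counts from the row and column totals, then the test statistic, degrees of freedom and p-value. The helper chi2_cdf and all variable names are made up for this illustration and stand in for the calculator's χ²-Test.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Rows: Soda, Coffee. Columns: Under 19, 19 and older.
observed = [[40, 20],
            [15, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row total) * (column total) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

ts = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = 1 - chi2_cdf(ts, df)

print(expected, round(ts, 3), df, round(p_value, 3))
```

For a 2 by 2 table the degrees of freedom come out to (2-1)(2-1) = 1, and every expected count here is at least 5, so the approximation condition holds.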
NOTE: Calculator instructions follow.
TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to
EDIT. Press 1:[A]. Press 2 ENTER 2 ENTER. Enter the table values by
row. Press ENTER after each. Press STAT and arrow over to TESTS.
Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A]
and Expected:[B]. Arrow down to Calculate. Press ENTER.
Exercise 11.11.10
Car manufacturers are interested in whether there is a relationship
between the size of car an individual drives and the number of people in
the driver’s family (that is, whether car size and family size are
independent). To test this, suppose that 800 car owners were randomly
surveyed with the following results. Conduct a test for independence.
                Family Size
                1     2     3-4   5+
Sub & Compact   20    20    20    20
Mid-size        35    50    50    30
Full-size       40    70    100   70
Van & Truck     35    80    90    70
Statistical Inference for a Population Variance
The quantity (n-1)s² / σ² has a chi-square distribution when the population from
which the sample is taken is approximately normally distributed.
The chi-square distribution is a non-symmetric continuous distribution. It has
one parameter called v, the degrees of freedom = n – 1. A χ² variable is never negative.
There are tables, but they are not very useful.
There is a test on your calculator labeled χ²-Test, but it is not the right test here
(it is the test of independence).
But you do have a χ²cdf on your calculator. This will allow you to find p-values
for HT’s.
There are 3 forms for the HT’s for σ².
Hypotheses                      p-value
Ho: σ² ≥ σo², Ha: σ² < σo²      P(χ² < TS)
Ho: σ² ≤ σo², Ha: σ² > σo²      P(χ² > TS)
Ho: σ² = σo², Ha: σ² ≠ σo²      P(χ² > TS) + P(χ² < 1/TS) if TS > 1
                                P(χ² < TS) + P(χ² > 1/TS) if TS < 1
(the p-value is the sum of the tail probabilities)
TS = (n-1)s² / σo².
To find a (1 – α) 100% CI for σ²,
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
Recall, to get the p-values on the TI 83/84, under DISTR,
go to χ²cdf, which takes 3 inputs: min, max, df.
The min will usually be 0.
Ex. P(χ² < 30.1435) = .95 for χ² with 19 df,
so P(χ² > 30.1435) = .05
Ex. Test that the population variance is different from 225 (standard deviation is
different from 15) at the 5% significance level, when you have a random sample
of 7 with mean 123 and standard deviation 9 from a normal population.
Ho: σ² = 225
Ha: σ² ≠ 225
n = 7 so v = 6
TS = 6 * 81 / 225 = 2.16
p-value = P(χ² > 2.16) + P(χ² < .4630) = .9044 + .0017 = .9061
Do not reject Ho.
There is not sufficient evidence to support the claim that the variance differs
from 225 or the standard deviation differs from 15.
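Here is a sketch of the same variance test in Python, following the two-sided rule exactly as these notes state it (the other tail is taken at 1/TS). The helper chi2_cdf and the variable names are made up for this illustration. The last line is included for comparison only: many textbooks instead double the smaller tail probability, and that is not the rule used in these notes.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

n, s, sigma0_sq = 7, 9, 225     # sample size, sample sd, variance under Ho
df = n - 1
ts = df * s ** 2 / sigma0_sq    # (n-1) s^2 / sigma_0^2

left_tail = chi2_cdf(ts, df)
right_tail = 1 - left_tail

# Two-sided p-value as written in these notes.
if ts > 1:
    p_notes = right_tail + chi2_cdf(1 / ts, df)
else:
    p_notes = left_tail + (1 - chi2_cdf(1 / ts, df))

# Comparison only (not from the notes): twice the smaller tail.
p_doubled = 2 * min(left_tail, right_tail)

print(round(ts, 2), round(p_notes, 4), round(p_doubled, 4))
```

Either way the p-value is well above 0.05 for this sample, so the decision not to reject Ho is the same.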
95% CI for σ² (6 df, α = 0.05):
χ²(α/2) = 14.449
χ²(1-α/2) = 1.237
6 * 81 / 14.449 < σ² < 6 * 81 / 1.237, or about 33.6 < σ² < 393.
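The critical values for the interval can be found without a table by inverting the cdf. The sketch below does this by bisection for the sample above (n = 7, s = 9, 95% confidence). It assumes χ²(α/2) means the value with area α/2 to its right. The names chi2_cdf and chi2_quantile are made up for this illustration.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Value x with P(X <= x) = p, found by bisection on chi2_cdf."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains x
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n, s, alpha = 7, 9, 0.05
df = n - 1

upper_crit = chi2_quantile(1 - alpha / 2, df)   # area alpha/2 to the right
lower_crit = chi2_quantile(alpha / 2, df)       # area 1 - alpha/2 to the right

ci_low = df * s ** 2 / upper_crit
ci_high = df * s ** 2 / lower_crit
print(round(upper_crit, 3), round(lower_crit, 3), round(ci_low, 1), round(ci_high, 1))
```

The larger critical value goes under the lower endpoint, which is why the order looks reversed. The hypothesized variance 225 falls inside the interval, which agrees with not rejecting Ho in the test above.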
Chapter 11 The Chi-Square Distribution Recap:
Z² = χ² with 1 df.
P(|Z| > 1.96) = P(Z < -1.96) + P(Z > +1.96) = 0.05 → P(χ² > 3.8416) = 0.05
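The link between Z and χ² with 1 df can be checked numerically. In this sketch the two-tail Normal probability comes from math.erfc in the Python standard library, and the chi-square tail comes from a made-up helper, chi2_cdf, that stands in for the calculator's χ²cdf.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

z_crit = 1.96

# P(|Z| > 1.96) for a standard Normal Z.
two_tail_normal = math.erfc(z_crit / math.sqrt(2))

# P(chi-square with 1 df > 1.96^2 = 3.8416).
chi2_tail = 1 - chi2_cdf(z_crit ** 2, 1)

print(round(two_tail_normal, 4), round(chi2_tail, 4))
```

The two probabilities agree because |Z| > 1.96 and Z² > 3.8416 describe the same event.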
1. Goodness of fit tests
Ho: Good fit
Ha: Not a good fit
The test statistic:
$$X = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Degrees of freedom = n – 1 where n is the number of categories.
The p-value = P(χ² > TS)
The chi-square approximation will be accurate when all the E’s ≥ 5.
2. Independence Tests
Ho: The row variable and column variable are independent
Ha: The row variable and column variable are not independent
The test statistic:
$$X = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Degrees of freedom = (r-1)(c-1) where r is the number of rows and c is the
number of columns.
The p-value = P(χ² > TS)
The chi-square approximation will be accurate when all the E’s ≥ 5.
3. One Sample Variance Tests
Ho: σ² = σo²
Ha: σ² ≠ σo²
The test statistic:
$$X = \frac{(n-1)s^2}{\sigma_0^2}$$
Degrees of freedom = n – 1 where n is the sample size.
The p-value = P(χ² > TS) + P(χ² < 1/TS) (if TS > 1)
The p-value = P(χ² < TS) + P(χ² > 1/TS) (if TS < 1)
The chi-square distribution applies when you have a random
sample from a Normal population.
100(1 – α)% Confidence Interval
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
Examples:
1. FDA regulations require that the standard deviation of a 16 ounce soda
can should be less than 0.1 ounce. A random sample of 10 cans is taken
from a large Normal population of 16 ounce cans. The sample has a mean
of 15.8 ounces and a standard deviation of 0.08 ounces. Does this provide
evidence that the variability is as small as desired? Justify using a
hypothesis test at the 5% significance level.
2. JAMA published a study on alcohol consumption in patients suffering
from myocardial infarction. The data are given below. Does this provide
evidence that congestive heart failure (CHF) depends on alcohol
consumption? Justify!
CHF   Alc. Cons.   Freq
Yes   Abstain      146
Yes   Less7        106
Yes   7More        29
No    Abstain      750
No    Less7        590
No    7More        292
3. An article in Chance showed the results of an Ancient Greek excavation.
There were 837 pieces of pottery found. Test whether or not one type of
piece of pottery is more likely to be found than the others.
Pot Category   Number found
Burnished      133
Monochrome     460
Painted        183
Other          61
Total          837
Chapter 11 Homework:
1, 3, 5, 9, 11, 13, 15, 17, 21, 23