OPRE504
Chapter Study Guide
Chapter 12
Compare Two Groups
I. Two-Sample t-Test
Two-Sample t-Test
We assume that two groups are independent from each other and may or may not have different
variances.
1. State Hypotheses:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0 (two-tailed)
Ha: μ1 − μ2 > 0 (one-tailed upper) or Ha: μ1 − μ2 < 0 (one-tailed lower)
2. Calculate Standard Error of Mean Difference
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where s1 = Standard Deviation of Sample 1, n1 = size of Sample 1,
s2 = Standard Deviation of Sample 2, n2 = size of Sample 2.
3. Determine Adjusted Degrees of Freedom
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
[Note: the smaller of (n1 − 1) and (n2 − 1) ≤ df ≤ n1 + n2 − 2]
4. Determine Critical Value ($t^*_{df}$) according to Degrees of Freedom and significance level
5. Calculate t-statistic
$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{SE(\bar{y}_1 - \bar{y}_2)}$ (since μ1 − μ2 = 0 under H0)
6. Decision
Reject H0 when $|t| > |t^*_{df}|$
Fail to Reject H0 when $|t| \le |t^*_{df}|$ (t falls between the two ends of critical t*)
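Steps 2, 3, 5, and 6 can be sketched in Python from summary statistics alone. This is a minimal sketch, not the handout's own tool (the handout uses DDXL). The function name `welch_t_test` and the returned dictionary keys are illustrative assumptions. The critical value `t_star` is not computed; the reader supplies it from a t-table. The inputs are the North/South summary statistics given in Q12.1.

```python
import math


def welch_t_test(mean1, s1, n1, mean2, s2, n2, t_star):
    """Two-sample t-test (variances not assumed equal) from summary statistics.

    t_star is the two-tailed critical value looked up in a t-table by the caller.
    """
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)  # step 2: standard error of the mean difference
    # Step 3: adjusted degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (mean1 - mean2) / se  # step 5, with mu1 - mu2 = 0 under H0
    reject = abs(t) > abs(t_star)  # step 6, two-tailed decision rule
    return {"se": se, "df": df, "t": t, "reject_h0": reject}


# Summary statistics from Q12.1 (group 1 = North, group 2 = South).
# Assumption: t_star = 2.009 is the two-tailed 5% t-table value for df = 50,
# used as a conservative stand-in for the (non-integer) adjusted df.
result = welch_t_test(1631.59, 138.470, 34, 1388.85, 151.114, 27, t_star=2.009)
print(result)
```

Rounding the adjusted df down to the nearest table row gives a slightly larger critical value, so the decision errs on the side of not rejecting H0.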
Chaodong Han, OPRE504 Data Analysis and Decisions Class Handout

Q12.1 [Sharpe 2011, Ch.10, E.25] In an investigation of environmental causes of diseases, data
were collected on the annual mortality rate (deaths per 100,000) for males in 61 large towns in
England and Wales. In addition, those towns are classified into two groups: North and South of
Derby. Is there a significant difference in mortality rates in the two regions at the 5%
significance level? Here are summary statistics:
Mortality   Count   Mean      Median   Standard Deviation
North       34      1631.59   1631     138.470
South       27      1388.85   1369     151.114

1. H0:
   H1:
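2. $SE(\bar{y}_N - \bar{y}_S) = \sqrt{\frac{s_N^2}{n_N} + \frac{s_S^2}{n_S}} =$
3. $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} =$
   alpha = 5%, $t^*_{df} =$          (one-tailed or two-tailed?)
4. $t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)} =$
5. Compare |t| and |t*|, decision: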
DDXL – Hypothesis Tests - 2 Var t Test:
More exercises: Chapter 12, Exercises 23, 24, and 26
II. Confidence Interval for the Difference Between Two Group Means
Two-Sample t-Interval
We assume that two groups are independent from each other and may or may not have different
variances.
Step 1: $SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where s1 = Standard Deviation of Sample 1, n1 = size of Sample 1, s2 =
Standard Deviation of Sample 2, n2 = size of Sample 2.
Step 2: Calculate adjusted degrees of freedom: $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
Step 3: Find the Critical Value $t^*_{df}$ according to the confidence level and adjusted degrees
of freedom (T-Table A-34 in Appendix C)
Step 4: $CI = (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$
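Steps 1, 2, and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `two_sample_t_interval` is illustrative, and `t_star` is read from the t-table by the reader (Step 3) rather than computed. The inputs are the store summary statistics given in Q12.2.

```python
import math


def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Two-sample t-interval for mu1 - mu2 from summary statistics.

    t_star is the critical value the caller looked up for the chosen
    confidence level and the adjusted degrees of freedom.
    """
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)  # Step 1: standard error
    # Step 2: adjusted degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_star * se  # margin of error = t* x SE
    diff = mean1 - mean2
    return df, margin, (diff - margin, diff + margin)  # Step 4


# Summary statistics from Q12.2 (Store 1 vs. Store 2).
# Assumption: t_star = 2.131 is the 95% t-table value for df = 15,
# i.e. the adjusted df rounded down to the nearest table row.
df, margin, (lower, upper) = two_sample_t_interval(
    242170, 23937, 9, 235338, 29690, 9, t_star=2.131
)
print(f"df = {df:.2f}, ME = {margin:.0f}, CI = ({lower:.0f}, {upper:.0f})")
```

An interval that contains 0 means the data do not show a difference in means at that confidence level.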
Q12.2 [Sharpe 2011, Ch.12, Ex.4, p.386] A chain that specializes in healthy and organic food
would like to compare the sales performance of two of its primary stores in the state of Maryland.
These stores are both in urban, residential areas with similar demographics. A comparison of the
weekly sales randomly sampled over two years yields the following information:
Store #   N   Mean     Standard Deviation   Min      Median   Max
1         9   242170   23937                211225   232901   292381
2         9   235338   29690                187475   232070   287838

a) Create a 95% confidence interval for the difference in the mean store weekly sales
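$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} =$
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} =$
$t^*_{df} =$
$CI = (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2) =$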
b) How do you interpret the CI in this context?
c) Can you tell whether one store sells more, on weekly average, than the other store?
d) Calculate the Margin of Error
e) Calculate a 99% confidence interval for the difference in mean store weekly sales
$t^*_{df} =$
$CI = (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2) =$
More exercises:
Credit Card Spending, Guided Example, p.365
Chapter 12, Exercises 20, 22, 23, 39, 49, 50, 51
III. Pooled Samples
Pooled t-Test
We assume that two groups are independent from each other and have the same variances, at
least when the null hypothesis is true.
1. State Hypotheses:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0 (two-tailed)
Ha: μ1 − μ2 > 0 (one-tailed upper) or Ha: μ1 − μ2 < 0 (one-tailed lower)
2. Calculate Standard Error of Mean Difference
$SE_{pooled}(\bar{y}_1 - \bar{y}_2) = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $S_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$, n1 = size of Sample 1, n2 = size of Sample 2.
3. Determine Degrees of Freedom
df = n1 + n2 − 2 (usually a slightly higher df than in the two-sample t-test without equal variances)
4. Determine Critical Value ($t^*_{df}$) according to Degrees of Freedom and significance level
5. Calculate t-statistic
$t = \frac{\bar{y}_1 - \bar{y}_2}{SE_{pooled}(\bar{y}_1 - \bar{y}_2)}$
6. Decision
Reject H0 when $|t| > |t^*_{df}|$
Fail to Reject H0 when $|t| \le |t^*_{df}|$ (t falls between the two ends of critical t*)
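The pooled steps can be sketched in Python in the same way. This is a minimal sketch under stated assumptions: the function name `pooled_t_test` and the returned keys are illustrative, and `t_star` comes from the t-table. The inputs are the Friends/Strangers summary statistics given in Q12.3.

```python
import math


def pooled_t_test(mean1, s1, n1, mean2, s2, n2, t_star):
    """Pooled t-test (equal variances assumed) from summary statistics.

    t_star is the two-tailed critical value looked up in a t-table by the caller.
    """
    df = n1 + n2 - 2  # step 3: pooled degrees of freedom
    # Step 2: pooled standard deviation, then pooled standard error
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / df)
    se_pooled = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se_pooled  # step 5, with mu1 - mu2 = 0 under H0
    reject = abs(t) > abs(t_star)  # step 6, two-tailed decision rule
    return {"df": df, "s_pooled": s_pooled, "se_pooled": se_pooled,
            "t": t, "reject_h0": reject}


# Summary statistics from Q12.3 (group 1 = Friends, group 2 = Strangers).
# Assumption: t_star = 2.160 is the two-tailed 5% t-table value for df = 13.
pooled = pooled_t_test(281.88, 18.31, 8, 211.43, 46.43, 7, t_star=2.160)
print(pooled)
```

The two sample standard deviations here differ noticeably, so it is worth checking whether the equal-variance assumption is reasonable before relying on the pooled result.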
Q12.3 We want to know whether people are more likely to offer a different amount for a used
camera when buying from a friend than when buying from a stranger. The data from an
experiment are as follows. Test your hypothesis at the 5% significance level.
            N   Mean Prices   Standard Deviation
Friends     8   $281.88       $18.31
Strangers   7   $211.43       $46.43

1. State Hypotheses:
2. $S_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}} =$
   $SE_{pooled}(\bar{y}_1 - \bar{y}_2) = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} =$
3. df =
   $t^*_{df,5\%} =$
4. $t = \frac{\bar{y}_1 - \bar{y}_2}{SE_{pooled}(\bar{y}_1 - \bar{y}_2)} =$
5. Compare t and $t^*_{df,5\%}$ and decision:
Pooled Confidence Interval
We assume that two groups are independent from each other and have the same variances, at least
when the null hypothesis is true.
1. Calculate Standard Error of Mean Difference
$SE_{pooled}(\bar{y}_1 - \bar{y}_2) = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $S_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$, n1 = size of Sample 1, n2 = size of Sample 2.
2. Determine Degrees of Freedom
df = n1 + n2 − 2 (usually a slightly higher df than in the two-sample t-interval without equal variances)
3. Determine Critical Value (t*) according to Degrees of Freedom and Confidence
Level: $t^*_{df}$ using T-Table A34
4. $CI = (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE_{pooled}(\bar{y}_1 - \bar{y}_2)$
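The four steps of the pooled interval can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `pooled_t_interval` is illustrative, and `t_star` is taken from the t-table by the reader. The inputs are the Friends/Strangers summary statistics given in Q12.4.

```python
import math


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Pooled confidence interval for mu1 - mu2 from summary statistics.

    t_star is the critical value the caller looked up for df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2  # step 2
    # Step 1: pooled standard deviation and pooled standard error
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / df)
    se_pooled = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    margin = t_star * se_pooled  # margin of error = t* x SE_pooled
    diff = mean1 - mean2
    return df, (diff - margin, diff + margin)  # step 4


# Summary statistics from Q12.4 (group 1 = Friends, group 2 = Strangers).
# Assumption: t_star = 2.160 is the 95% t-table value for df = 13.
df_pooled, (ci_lower, ci_upper) = pooled_t_interval(
    281.88, 18.31, 8, 211.43, 46.43, 7, t_star=2.160
)
print(df_pooled, round(ci_lower, 2), round(ci_upper, 2))
```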
Q12.4 We want to know whether people are more likely to offer a different amount for a used
camera when buying from a friend than when buying from a stranger. The data from an
experiment are as follows. Construct a 95% confidence interval for the difference.
            N   Mean Prices   Standard Deviation
Friends     8   $281.88       $18.31
Strangers   7   $211.43       $46.43

1. Find the Standard Error of the Difference:
   $S_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}} =$
   $SE_{pooled}(\bar{y}_1 - \bar{y}_2) = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} =$
2. df =
3. $t^*_{df,5\%} =$
4. $CI = (\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE_{pooled}(\bar{y}_1 - \bar{y}_2) =$
IV. Paired Data
Paired t-test
Paired data may be used when two groups are not independent from each other. For example, a
firm’s sales in January 2007 and January 2008, or a subject’s response before and after a
treatment in an experiment. Such a test is essentially a one-sample t-test where the
pairwise difference is treated as a single random variable.
1. State Hypotheses
H0: μd = Δ0
Ha: μd ≠ Δ0 (two-tailed test); μd > Δ0 (one-tailed upper test); or μd < Δ0 (one-tailed lower test)
2. Determine Critical Value (t*) according to DF (n − 1) and significance level
3. Calculate Standard Error of the Paired Difference
$SE(\bar{d}) = \frac{s_d}{\sqrt{n}}$, where $s_d$ is the standard deviation of the pairwise differences and n = number of pairs
4. Calculate t-statistic
$t = \frac{\bar{d} - \Delta_0}{SE(\bar{d})} = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$ (with Δ0 = 0 this is $\bar{d} / SE(\bar{d})$)
5. Decisions
Reject H0 when $|t| > |t^*_{df}|$
Fail to Reject H0 when $|t| \le |t^*_{df}|$ (t falls between the two ends of critical t*)
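To show how the paired test reduces to a one-sample test on the differences, here is a minimal Python sketch that starts from raw pairs. The function name `paired_t_test`, the `delta0` parameter name, and the five before/after values are hypothetical and chosen only for illustration; `t_star` is taken from the t-table.

```python
import math
import statistics


def paired_t_test(before, after, t_star, delta0=0.0):
    """Paired t-test: a one-sample t-test on the pairwise differences.

    t_star is the two-tailed critical value for df = n - 1, looked up by the caller.
    """
    diffs = [b - a for b, a in zip(before, after)]  # pairwise differences
    n = len(diffs)  # number of pairs
    d_bar = statistics.mean(diffs)  # mean difference
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    se = s_d / math.sqrt(n)  # step 3: SE of the mean difference
    t = (d_bar - delta0) / se  # step 4: t-statistic
    reject = abs(t) > abs(t_star)  # step 5: two-tailed decision rule
    return {"d_bar": d_bar, "s_d": s_d, "df": n - 1, "t": t, "reject_h0": reject}


# Hypothetical data, for illustration only (not from the handout).
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
# Assumption: t_star = 2.776 is the two-tailed 5% t-table value for df = 4.
paired = paired_t_test(before, after, t_star=2.776)
print(paired)
```

Note that the test uses the standard deviation of the differences, not the standard deviations of the two original columns.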
Q12.5 We want to know whether credit card spending changes, on average, from December to
January for a market segment. Our data record the credit card expenditure in December 2004 and
January 2005 made by 911 cardholders. The average pairwise difference is $788.18 (December
2004 – January 2005) and the standard deviation of the difference is $3740.22.
a) Since we generally expect spending to decrease from December to January, develop a
hypothesis test for this belief at the 5% significance level.
1. State Hypotheses:
   H0: μd = 0; Ha: μd > 0 (one-tailed upper test)
2. Critical Value: t* =
3. $SE(\bar{d}) = \frac{s_d}{\sqrt{n}} =$
4. $t = \frac{\bar{d} - 0}{SE(\bar{d})} = \frac{\bar{d} - 0}{s_d / \sqrt{n}} =$
5. Compare |t| and |t*|, decision:
b) Find a 95% confidence interval for the true mean difference in credit card charges
between those two months for all cardholders in this segment.
1. t* given df and CI at 95%:
2. $ME = t^* \times SE(\bar{d}) =$
3. $CI = \bar{d} \pm ME =$
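The three steps in part b) can be sketched in Python from the summary statistics stated in Q12.5. This is a minimal sketch under stated assumptions: the function name `paired_t_interval` is illustrative, and `t_star = 1.96` is the large-sample approximation, used because df = 910 lies beyond most t-table rows. A table or software value for df = 910 would be marginally larger.

```python
import math


def paired_t_interval(d_bar, s_d, n, t_star):
    """Confidence interval for the mean paired difference from summary statistics."""
    se = s_d / math.sqrt(n)  # standard error of the mean difference
    margin = t_star * se  # step 2: ME = t* x SE
    return se, margin, (d_bar - margin, d_bar + margin)  # step 3: d_bar +/- ME


# Summary statistics from Q12.5: mean difference, sd of differences, number of pairs.
# Assumption: t_star = 1.96 approximates the 95% critical value for df = 910.
se_d, me_d, (low_d, high_d) = paired_t_interval(788.18, 3740.22, 911, t_star=1.96)
print(f"SE = {se_d:.2f}, ME = {me_d:.2f}, CI = ({low_d:.2f}, {high_d:.2f})")
```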
More exercises on paired t-tests:
Chapter 12 Exercises 53, 55, 56, 57, 58, 63, 64, 66, 67, 68,