Two Sample Difference of Means T-Test

Inferential Statistics & Tests of Significance
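Confidence Interval (CI)

$$\bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$$

$\bar{Y}$ = sample mean
$Z_{\alpha/2}$ = Z score associated with the confidence level (1.96 for a 95% CI)
$\sigma_{\bar{Y}}$ = standard error of the mean
95% CI = sample mean ± 1.96 (or roughly 2) × standard error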
Building a CI
• Assume the following:
$\bar{Y} = 100$, $\sigma_Y = 15$, $N = 400$

$$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{15}{\sqrt{400}} = 0.750$$

$$CI = 100 \pm (1.96)(0.750)$$

Upper = 101.47
Lower = 98.53
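
The arithmetic on this slide can be checked with a short Python sketch. It assumes the standard library's `statistics.NormalDist` for the Z score. The function name `z_confidence_interval` is made up for illustration and is not part of the lecture. The critical value comes from leaving 2.5% of the normal curve in each tail, which is where 1.96 comes from.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval."""
    # Two-tailed critical value: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sd / sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Values from the slide: mean 100, standard deviation 15, N = 400.
lower, upper = z_confidence_interval(mean=100, sd=15, n=400)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Passing `confidence=0.99` widens the interval, because a larger share of the curve has to fall between the two bounds.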
Why do we use 1.96?
Source: Knoke & Bohrnstedt (1991:167)
Is there a sample that is different from the mean?
Significance Testing
• When we explain some phenomenon we
move beyond description to inferential
statistics and hypothesis testing.
• Tests of significance allow us to test
hypotheses, and when we find a
relationship between variables, reject the
null hypothesis.
Hypothesis testing
• Hypothesis testing means that we are testing
our null hypothesis (Ho) against some
competing or alternative hypothesis (H1)
• Normally we choose statements such as
Ho : μy = 100
H1: μy ≠100
Or
H1: μy > 100
Or
H1: μy < 100
Significance Testing
• Even with high-powered statistical measures,
some results will arise purely by chance.
If we were to keep running our
models a thousand times, or fewer, we would
likely see some results that do not stem from
systematic processes.
• Thus, we need to decide on the level of
significance at which we are willing to report our results.
We can never be 100% confident.
• The conventional levels of significance at which we
reject the null hypothesis are .05 or .01.
A level of .10 is considered only weakly significant.
Significance Testing
• When you erroneously reject the null
hypothesis when it is true, you make a
Type I error. This means you are
accepting a “False Positive” result.
• Think of this as a fiancé test: a Type I
error is the chance of rejecting, or saying
no to, Mr. or Miss “Right”.
Significance Testing
• A Type II error occurs when you fail to reject
the null hypothesis when it is false.
• This is a “False Negative”, as when you
say yes to Mr. or Miss “Wrong”.
• Type II errors in statistical testing can result
from too little data, omitted variable bias,
and multicollinearity.
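
A small simulation can make the two error types concrete. The sketch below is illustrative only. It assumes a two-tailed z-test of Ho: μ = 100 with a known standard deviation of 15. The alternative mean of 103, the sample sizes, the seed, and the function name `rejection_rate` are all hypothetical choices, not values from the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-tailed critical value at alpha = .05


def rejection_rate(true_mean, n, null_mean=100.0, sd=15.0, trials=2000, seed=1):
    """Share of simulated samples in which Ho: mu = null_mean is rejected."""
    rng = random.Random(seed)
    se = sd / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The mean of n normal draws is itself normal with spread sd / sqrt(n),
        # so the sample mean is drawn directly instead of drawing n values.
        sample_mean = rng.gauss(true_mean, se)
        z = (sample_mean - null_mean) / se
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / trials


type_1_rate = rejection_rate(true_mean=100, n=25)          # Ho is true
type_2_small_n = 1 - rejection_rate(true_mean=103, n=25)   # Ho false, little data
type_2_large_n = 1 - rejection_rate(true_mean=103, n=400)  # Ho false, more data
print(type_1_rate, type_2_small_n, type_2_large_n)
```

Under these assumptions the Type I rate stays near the chosen alpha whatever the sample size. The Type II rate is large with 25 cases and small with 400, which illustrates the "too little data" point.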
Other distributions
• The normal distribution assumes:
1. We know the standard deviation of the population;
often, however, we do not.
2. The t-distribution becomes the best alternative
when we do not know the population standard deviation
but do know the sample standard deviation.
3. As the sample gets bigger, the t-distribution
approaches the normal distribution.
4. There are other distributions, such as chi-square
and others, that we will discuss later.
T-Distribution & Normal Distribution
The form of the t-distribution depends on the sample size. As the sample gets
larger, the difference between the normal and the t-distribution becomes negligible.
Source: Gujarati (1992:76)
The t formula

$$t = \frac{\bar{Y} - \mu_Y}{S_Y / \sqrt{N}}$$

$$CI = \bar{Y} \pm t_{\alpha/2}\,(S_Y / \sqrt{N})$$
For α = .05 (two-tailed) and N = 30 (df = N - 1 = 29), t = 2.045
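
The critical value quoted here can be checked numerically, and the same routine shows the t-distribution closing in on 1.96 as the degrees of freedom grow. This is a rough sketch, not how statistical packages compute it. It integrates the t density with Simpson's rule and finds the critical value by bisection. The function names, the step count, and the search bound of 50 are assumptions chosen for illustration.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    """Two-tailed critical value, found by bisection on the CDF."""
    target = 1 - alpha / 2
    low, high = 0.0, 50.0  # assumed search range; enough for alpha = .05
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for df in (19, 29, 100, 1000):
    print(df, round(t_critical(df), 3))
```

In practice a statistics package or a printed t table gives the same values directly. The hand-rolled version just shows that nothing more than the density formula is involved.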
95% CI using t-test
• Mean= 20
• Sy = 5
• N= 20
20 ± 2.093 (5/√20) = 20 ± 2.34
22.34 upper
17.66 lower
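
The bounds for this example can be worked in a few lines of Python. The sketch takes the critical value 2.093 (df = 19) from the slide instead of computing it. The helper name `t_confidence_interval` is hypothetical.

```python
from math import sqrt


def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for a t-based confidence interval."""
    margin = t_crit * sd / sqrt(n)  # critical value times the standard error
    return mean - margin, mean + margin


# Slide values: mean 20, Sy = 5, N = 20, t = 2.093 for df = 19.
lower, upper = t_confidence_interval(mean=20, sd=5, n=20, t_crit=2.093)
print(f"lower = {lower:.2f}, upper = {upper:.2f}")
```

The margin is about 2.34 on each side of the mean, so the two bounds are symmetric around 20.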
Why do we care about CI?
• We use confidence intervals for hypothesis testing
• For instance, we want to know if there is a
difference of home values between El
Paso and Boston
• We want to know whether or not taking
a class at Kaplan makes a difference in our
GRE scores
• We want to know if there is a difference
between the treatment and control groups.
Mean Difference testing
[Figure: Home Values. Categories shown: Mean USA, El Paso, Las Cruces, Boston]
T-Tests of Independence
• Used to test whether there is a significant
difference between the means of two
samples.
• We are testing for independence, meaning
whether or not the two samples are related.
• This is a one-time test, not a test over time with
multiple observations.
• Example: home values in El Paso versus
Boston
T-Test of Independence
• Useful in experiments where people are
assigned to two groups between which there
should be no differences. We then introduce an
independent variable (treatment) to see if the
groups have real differences, which would
be attributable to the introduced X variable.
A real difference implies the samples are from
different populations (with different μ).
• This is the Completely Randomized Two-Group Design.
T-Test of Independence
• For example, we can take a random sample of
high school students and divide it into two
groups. One gets tutoring for the SAT and the
other does not.
Ho: μ1 = μ2
H1: μ1 ≠ μ2
• After one group gets tutoring, but not the other,
we compare the scores. We find that indeed the
group exposed to tutoring outperformed the
other group. We thus conclude that tutoring
makes a difference.
• Positive increments at a different rate
[Figure: Pre-test and Post-test scores for the Treatment and Control groups]
Two Sample Difference of Means T-Test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}}$$

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$S_p^2$ = pooled variance of the two groups; its square root is the
common standard deviation of the two groups
Two Sample Difference of Means T-Test
• The numerator of the equation captures the
difference in means, while the
denominator captures the pooled variation
within the groups.
• Important point: of interest is the difference
between the sample means, not sample
and population means. However, rejecting
the null means that the two groups under
analysis have different population means.
An example
• Test on GRE verbal test scores by gender:
Females: mean = 50.9, variance = 47.553, n=6
Males: mean=41.5, variance= 49.544, n=10
$$t = \frac{50.9 - 41.5}{\sqrt{\left[\frac{(6 - 1)47.553 + (10 - 1)49.544}{6 + 10 - 2}\right]\left[\frac{6 + 10}{6(10)}\right]}}$$

$$t = \frac{9.4}{\sqrt{48.83(.26667)}} = \frac{9.4}{\sqrt{13.02}}$$

$$t = \frac{9.4}{3.608} = 2.605$$
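
The same calculation can be written as a small function working from summary statistics. This is a sketch of the pooled-variance formula only. The function name `pooled_t` and its argument order are made up for illustration.

```python
from math import sqrt


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t from summary statistics, using the pooled variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    t = (mean1 - mean2) / standard_error
    df = n1 + n2 - 2
    return t, df, pooled_var


# GRE verbal example: females first, males second.
t, df, pooled_var = pooled_t(50.9, 47.553, 6, 41.5, 49.544, 10)
print(f"t = {t:.3f}, df = {df}, pooled variance = {pooled_var:.2f}")
```

Swapping the two groups only flips the sign of t, so which group goes first matters for interpretation but not for a two-tailed test.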
Now what do we do with this
obtained value?
Steps of Testing and Significance
1. Statement of null hypothesis: if there is
not one, then how can you be wrong?
2. Set alpha level of risk: .10, .05, .01
3. Selection of appropriate test statistic:
T-test
4. Computation of statistical value: get the
obtained value.
5. Determination of the critical
value: done for you for most methods
in most statistical packages.
Steps of Testing and Significance
6. Comparison of the obtained and
critical values.
7. If the obtained value is more extreme than
the critical value, you may reject the null
hypothesis. In other words, you have
significant results.
8. If step 7 does not hold, that is, the
obtained value is less extreme than the
critical value, then the null is not rejected.
GRE Verbal Example
Obtained Value: 2.605
Critical Value?
Degrees of Freedom: number of cases left after subtracting
1 for each sample: (6 - 1) + (10 - 1) = 14
Ho: μf = μm
H1: μf ≠ μm
Is the null hypothesis (Ho) supported?
Answer: No. Women have higher mean verbal scores, and the
difference is statistically significant. This means that the
population mean scores of the two genders are different.
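
The comparison of obtained and critical values for this example can be sketched as follows. The critical values 2.145 (α = .05) and 2.977 (α = .01) are standard two-tailed t table entries for df = 14. They are typed in by hand and not taken from the slides. The helper `reject_null` is hypothetical.

```python
def reject_null(obtained, critical):
    """Two-tailed decision rule: reject Ho when |obtained| exceeds critical."""
    return abs(obtained) > critical


obtained = 2.605            # obtained t from the GRE verbal example
df = (6 - 1) + (10 - 1)     # subtract 1 for each sample
critical_05 = 2.145         # t table, df = 14, alpha = .05, two-tailed
critical_01 = 2.977         # t table, df = 14, alpha = .01, two-tailed

print(df, reject_null(obtained, critical_05), reject_null(obtained, critical_01))
```

With these table values the null is rejected at the .05 level but not at the stricter .01 level, which is why the alpha level has to be set before looking at the result.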
Paired T-Tests
• We use Paired T-Tests, a test of
dependence, to examine a single sample of
subjects/units under two conditions,
such as a pretest-posttest experiment.
• For example, we can examine whether a
group of students improves if they retake
the GRE exam. The t-test examines whether
there is any significant difference between
the two sets of scores. If so, then possibly
something like studying more made a
difference.
Paired T-Tests
• Unlike a test for independence, this test
requires that the two groups/samples being
evaluated are dependent upon each other.
• For example, we can use a paired t-test to
examine two sets of scores across time as
long as they come from the same students.
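• This is appropriate for a pre-test/post-test
research design

$$t = \frac{\sum D}{\sqrt{\frac{n\sum D^2 - (\sum D)^2}{n - 1}}}$$

ΣD = sum of the differences between the paired scores
ΣD² = sum of the squared differences
(ΣD)² = square of the summed differences
n = number of pairs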
Comparing Test Scores
Midterm   Final
48        71.2
69        73.3
95        96
87        94.2
50        81.4
75        86.7
74        72.8
88        88
92        95
69        88
75        91.8
86        93.6
73        71.8
60        80.1
Paired Samples Statistics

                Mean      N    Std. Deviation   Std. Error Mean
Pair 1  MID     74.3571   14   14.60562         3.90352
        FINAL   84.5643   14    9.32924         2.49335
Paired Samples Correlations

                      N    Correlation   Sig.
Pair 1  MID & FINAL   14   .710          .004
Paired Samples Test

Paired Differences, Pair 1 (MID - FINAL)
Mean                                               -10.2071
Std. Deviation                                      10.34300
Std. Error Mean                                      2.76428
95% Confidence Interval of the Difference, Lower   -16.1790
95% Confidence Interval of the Difference, Upper    -4.2353
t                                                   -3.693
df                                                  13
Sig. (2-tailed)                                     .003

What can we conclude?
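
The paired t statistic in this output can be reproduced from the raw scores. The sketch assumes the scores in the Comparing Test Scores listing alternate Midterm, Final for each of the 14 students, which matches the reported means. The function name `paired_t` is made up for illustration.

```python
from math import sqrt

# Scores paired by student, in the order listed (assumed pairing).
midterm = [48, 69, 95, 87, 50, 75, 74, 88, 92, 69, 75, 86, 73, 60]
final = [71.2, 73.3, 96, 94.2, 81.4, 86.7, 72.8, 88, 95, 88, 91.8, 93.6, 71.8, 80.1]


def paired_t(first, second):
    """Paired t: sum(D) / sqrt((n * sum(D^2) - (sum(D))^2) / (n - 1))."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d_sq = sum(d * d for d in diffs)
    t = sum_d / sqrt((n * sum_d_sq - sum_d ** 2) / (n - 1))
    return t, n - 1, sum_d / n


t, df, mean_diff = paired_t(midterm, final)
print(f"t = {t:.3f}, df = {df}, mean difference = {mean_diff:.4f}")
```

The obtained t is more extreme than 2.160, the two-tailed table value for df = 13 at α = .05. That is consistent with the reported significance of .003: final scores are significantly higher than midterm scores.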