Lecture11_IndependentSamplesTest.pptx

advertisement
Hypothesis Tests:
Two Independent Samples
Cal State Northridge
320
Andrew Ainsworth PhD
2
Major Points
Psy 320 - Cal State
Northridge
• What are independent samples?
• Distribution of differences between
means
• An example
• Heterogeneity of Variance
• Effect size
• Confidence limits
3
Psy 320 - Cal State
Northridge
Independent Samples
• Samples are independent if the
participants in each sample are
not related in any way
• Samples can be considered
independent for 2 basic reasons
▫ First, samples randomly selected
from 2 separate populations
4
Independent Samples
Psy 320 - Cal State
Northridge
• Getting Independent Samples
▫ Method #1
Random
Selection
Population #2
Random
Selection
Population #1
Sample #1
Sample #2
5
Psy 320 - Cal State
Northridge
Independent Samples
• Samples can be considered
independent for 2 basic reasons
▫ Second, participants are randomly
selected from a single population
▫ Subjects are then randomly assigned
to one of 2 samples
6
Independent Samples
Psy 320 - Cal State
Northridge
• Getting Independent Samples
▫ Method #2
Population
Random Selection and
Random Assignment
Sample #1
Sample #2
7
Psy 320 - Cal State
Northridge
Sampling Distributions Again
• If we draw all possible pairs of
independent samples of size n1 and n2,
respectively, from two populations.
• Calculate a mean X 1 and X 2 in each one.
• Record the difference between these
values.
• What have we created?
• The Sampling Distribution of the
Difference Between Means
8
Psy 320 - Cal State
Northridge
Sampling Distribution of the Difference
Between Means
Population 1
Population 2
Calculate
Statistic
1
Sample 1
Sample 2
X11 X21
2
Sample 1
Sample 2
X12 X22
3
Sample 1
Sample 2
X13 X23
...
...
Sample 1
Sample 2
∞
X1 X2
• Mean of sampling
distribution X 1 is 1
• Mean of sampling
distribution X 2 is 2
• So, the mean of a
sampling
distribution X 1  X 2
is 1- 2
9
Psy 320 - Cal State
Northridge
Sampling Distribution of the Difference
Between Means
• Shape
▫ approximately normal if
 both populations are approximately normal
▫ or
 N1 and N2 are reasonably large.
10
Psy 320 - Cal State
Northridge
Sampling Distribution of the Difference
Between Means
• Variability (variance sum rule)
▫ The variance of the sum or difference of 2
independent samples is equal to the sum of their
variances

2
X1  X 2

 X X 
1
2
2
X1


2
X1
n1
2
X2


2
X2

n2
 X2
n1
1

 X2
2
n2
; Remembering that  X 1 
Think Pathagoras c 2  a 2  b 2
X
1
n1
11
Psy 320 - Cal State
Northridge
Sampling Distribution of the Difference
Between Means
• Sampling distribution of X 1  X 2
 12
n1
1 - 2

 22
n2
12
Psy 320 - Cal State
Northridge
Converting Mean Differences into ZScores
• In any normal distribution with a known
mean and standard deviation, we can
calculate z-scores to estimate
probabilities.
z
( X 1  X 2 )  ( 1  2 )
 X X
1
2
• What if we don’t know the ’s?
• You got it, we guess with s!
13
Psy 320 - Cal State
Northridge
Converting Mean Differences into
t-scores
• We can use
• If,
sX1  X 2
to estimate
 X X
1
2
( X 1  X 2 )  ( 1  2 )
t
s X1  X 2
▫ The variances of the samples are roughly equal
(homogeneity of variance)
▫ We estimate the common variance by pooling
(averaging)
14
Psy 320 - Cal State
Northridge
Homogeneity of Variance
• We assume that the variances are equal because:
▫ If they’re from the same population they should be
equal (approximately)
▫ If they’re from different populations they need to
be similar enough for us to justify comparing them
(“apples to oranges” and all that)
15
Psy 320 - Cal State
Northridge
Pooling Variances
• So far, we have taken 1 sample estimate (e.g.
mean, SD, variance) and used it as our best
estimate of the corresponding population
parameter
• Now we have 2 samples, the average of the 2
sample statistics is a better estimate of the
parameter than either alone
16
Psy 320 - Cal State
Northridge
Pooling Variances
• Also, if there are 2 groups and one groups is
larger than it should “give” more to the average
estimate (because it is a better estimate itself)
• Assuming equal variances and pooling allows us
to use an estimate of the population that is
smaller than if we did not assume they were
equal
17
Psy 320 - Cal State
Northridge
Calculating Pooled Variance
• Remembering that
SS
SS
s 

df n  1
2
sp2 is the pooled estimate: we pool SS and df to get
s 
2
p
SS pooled
df pooled
SS1  SS 2
SS1  SS 2
SS1  SS 2



df1  df 2 (n1  1)  (n2  1) n1  n2  2
18
Psy 320 - Cal State
Northridge
Calculating Pooled Variance
SS1  SS 2
s 
n1  n2  2
2
p
Method 1
SS1   ( X i1  X 1 )
Method 2
SS1  (n1  1) s
2
SS 2   ( X i 2  X 2 )
2
1
2
SS 2  (n2  1) s
2
2
19
Psy 320 - Cal State
Northridge
Calculating Pooled Variance
• When sample sizes are equal (n1 = n2) it is simply
the average of the 2
2
2
s 
2
p
s X1  s X 2
2
• When unequal n we need a weighted average
(n1  1) s  (n2  1) s
s 
n1  n2  2
2
p
2
1
2
2
20
Psy 320 - Cal State
Northridge
Calculating SE for the Differences
Between Means
s
s X1  X 2 
2
p
n1

s
2
p
n2
• Once we calculate the pooled variance we just
substitute it in for the sigmas earlier
• If n1 = n2 then
s X1  X 2 
s
2
p
s
2
p
2
1
2
2
s
s



n1 n2
n1 n2
21
Psy 320 - Cal State
Northridge
Example: Violent Videos Games
• Does playing violent video games increase
aggressive behavior
• Two independent randomly selected/assigned
groups
▫ Grand Theft Auto: San Andreas (violent: 8
subjects) VS. NBA 2K7 (non-violent: 10 subjects)
▫ We want to compare mean number of aggressive
behaviors following game play
22
Psy 320 - Cal State
Northridge
Hypotheses (Steps 1 and 2)
• 2-tailed
▫ h0: 1 - 2 = 0 or Type of video game has no impact
on aggressive behaviors
▫ h1: 1 - 2  0 or Type of video game has an impact
on aggressive behaviors
• 1-tailed
▫ h0: 1 - 2 ≤ 0 or GTA leads to the same or less
aggressive behaviors as NBA
▫ h1: 1 - 2 > 0 or GTA leads to more aggressive
behaviors than NBA
23
Psy 320 - Cal State
Northridge
Distribution, Test and Alpha
(Steps 3 and 4)
• We don’t know  for either group, so we cannot
perform a Z-test
• We have to estimate  with s so it is a t-test (t
distribution)
• There are 2 independent groups
• So, an independent samples t-test
• Assume alpha = .05
24
Psy 320 - Cal State
Northridge
Decision Rule (Step 5)
• dftotal = (n1 – 1) + (n2 – 1) = n1 + n2 – 2
▫ Group 1 has 8 subjects – df = 8 – 1= 7
▫ Group 2 has 10 subjects – df = 10 – 1 = 9
• dftotal = (n1 – 1) + (n2 – 1) = 7 + 9 = 16 OR
dftotal = n1 + n2 – 2 = 8 + 10 – 2 = 16
• 1-tailed t.05(16) = ____; If to > ____ reject
1-tailed
• 2-tailed t.05(16) = ____; If to > ____ reject
2-tailed
df
1-tailed
2-tailed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
0.1
0.2
3.078
1.886
1.683
1.533
1.476
1.44
1.415
1.397
1.383
1.372
1.363
1.356
1.35
1.345
1.341
1.337
1.333
1.33
1.328
1.325
1.323
1.321
1.319
1.318
1.316
Critical Values of Student's t
0.05 0.025
0.01
0.005 0.0005
0.1
0.05
0.02
0.01
0.001
6.314 12.706 31.821 63.657 636.619
2.92 4.303 6.965 9.925 31.598
2.353 3.182 4.5415 5.841 12.941
2.132 2.776 3.747 4.604
8.61
2.015 2.571 3.365 4.032
6.859
1.943 2.447 3.143 3.707
5.959
1.895 2.365 2.998 3.499
5.405
1.86 2.306 2.896 3.355
5.041
1.833 2.262 2.821
3.25
4.781
1.812 2.228 2.764 3.169
4.587
1.796 2.201 2.718 3.106
4.437
1.782 2.179 2.681 3.055
4.318
1.771 2.16
2.65
3.012
4.221
1.761 2.145 2.624 2.977
4.14
1.753 2.131 2.602 2.947
4.073
1.746 2.12
2.583 2.921
4.015
1.74
2.11
2.567 2.898
3.965
1.734 2.101 2.552 2.878
3.922
1.729 2.093 2.539 2.861
4.883
1.725 2.086 2.528 2.845
3.85
1.721 2.08
2.518 2.831
3.819
1.717 2.074 2.508 2.819
3.792
1.714 2.069
2.5
2.807
3.767
1.711 2.064 2.492 2.797
3.745
1.708 2.06
2.485 2.787
3.725
25
Psy 320 - Cal State
Northridge
t
Distribution
26
Psy 320 - Cal State
Northridge
Calculate to (Step 6)
Data
Subject
1
2
3
4
5
6
7
8
9
10
GTA
12
9
11
13
8
10
9
10
NBA
9
9
8
6
11
7
8
11
8
7
Mean
s
s2
10.25
1.669
2.786
8.4
1.647
2.713
27
Psy 320 - Cal State
Northridge
Calculate to (Step 6)
• We have means of 10.25 (GTA) and 8.4 (NBA), but
both of these are sample means.
• We want to test differences between sample
means.
▫ Not between a sample and a population mean
28
Psy 320 - Cal State
Northridge
Calculate to (Step 6)
• The t for 2 groups is:
( X 1  X 2 )  ( 1  2 )
t
s X1  X 2
• But under the null 1 - 2 = 0, so:
( X1  X 2 )
t
s X1  X 2
29
Psy 320 - Cal State
Northridge
Calculate to (Step 6)
• We need to calculate s X1  X 2 and to do that we
need to first calculate s 2
p
(n1  1) s   n2  1 s
__(____)  __  ____ 
s 

n1  n2  2
__  __  2
2
p
2
1
2
2
_____  _____ _____


 ____
__
__
30
Calculate to (Step 6)
Psy 320 - Cal State
Northridge
2
p
• Since s is _____ we can insert this into the
equation s X  X
1
s X1  X 2 
2
s
2
p
n1

s
2
p
n2

____ ____

__
__
 ___  ___  ____
Therefore,
X 1  X 2 ___  ___ ___
to 


 ___
s X1  X 2
___
___
31
Psy 320 - Cal State
Northridge
Make your decision
(Step 7)
• Since ____ > ____ we would reject the null
hypothesis under a 2-tailed test
• Since ____ > ____ we would reject the null
hypothesis under a 1-tailed test
• There is evidence that violent video games
increase aggressive behaviors
32
Psy 320 - Cal State
Northridge
Heterogeneous Variances
• Refers to case of unequal population variances
• We don’t pool the sample variances just add
them
• We adjust df and look t up in tables for adjusted
df
• For adjustment use Minimum
df = smaller n - 1
33
Psy 320 - Cal State
Northridge
Effect Size for Two Groups
• Extension of what we already know.
• We can often simply express the effect as the
difference between means (e.g. 1.85)
• We can scale the difference by the size of the
standard deviation.
▫ Gives d (aka Cohen’s d)
▫ Which standard deviation?
34
Psy 320 - Cal State
Northridge
Effect Size
• Use either standard deviation or their pooled
average.
▫ We will pool because neither has any claim to
priority.
▫ Pooled st. dev. = 2.745  1.657
35
Effect Size, cont.
Psy 320 - Cal State
Northridge
X viol  X nonviol 10.25  8.4
d

s pooled
1.657
1.85

 1.12
1.657
• This difference is approximately 1.12
standard deviations
• This is a very large effect
36
Psy 320 - Cal State
Northridge
Confidence Limits

 
CI.95  X 1  X 2  t.05,2tailed * s X 1 X 2

 (____  ____)   ____ ____ 
 ____  ____
 ____    ____
37
Psy 320 - Cal State
Northridge
Confidence Limits
• p = .95 that based on the sample values the interval
formed in this way includes the true value of 1 - 2
• The probability that interval includes 1 - 2 given
the value of X 1  X 2
• Does 0 fall in the interval? What does that mean?
38
Psy 320 - Cal State
Northridge
Computer Example
• SPSS reproduces the t value and the confidence
limits.
• All other statistics are the same as well.
• Note different df depending on homogeneity of
variance.
• Remember: Don’t pool if heterogeneity
 and modify degrees of freedom
39
SPSS output
Psy 320 - Cal State
Northridge
Group Statistics
AGGRESS
GAME
1.00 gta
2.00 nba
N
Mean
10.2500
8.4000
8
10
Std. Deviation
1.66905
1.64655
Std. Error
Mean
.59010
.52068
Independent Samples Test
Levene's Test for
Equality of Variances
F
AGGRESS
Equal variances
assumed
Equal variances
not assumed
.005
Sig.
.942
t-test for Equality of Means
t
df
Sig. (2-tailed)
Mean
Difference
Std. Error
Difference
95% Confidence
Interval of the
Difference
Lower
Upper
2.355
16
.032
1.8500
.78571
.18436
3.51564
2.351
15.048
.033
1.8500
.78697
.17308
3.52692
Download