Effect size


Instructor: Mr. Chu Duc Nghia
Group members:

Duong Thi Chi

Pham Thi Hoa

Pham Thi Mai

Nguyen Thi Van


Problem: Susan predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides twenty-four students into three groups of eight. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with background noise that changes volume periodically. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test on the material.

 

Significance level: α = 0.05


ANOVA summary and group data:

Source   SS      df    MS      F
Among    30.08    2    15.04   3.59
Within   87.88   21     4.18

Group                Scores (out of 10)        Sample mean
Constant sound (1)   7, 4, 6, 8, 6, 6, 2, 9    6
Random sound (2)     5, 5, 3, 4, 4, 7, 2, 2    4
No sound (3)         2, 4, 7, 1, 2, 1, 5, 5    3.375

Decision rule: reject H0 if F ≥ F(α; k−1, n−k). Here F(0.05; 2, 21) = 3.4668.

Decision: reject H0, since F = 3.59 > F(0.05; 2, 21) = 3.4668.

Conclusion: a difference exists among the average scores of the 3 groups
⇒ background sound affects studying results.
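The ANOVA numbers above can be reproduced in software. This is a minimal sketch, assuming the raw scores are those listed in the table; it uses SciPy's standard f_oneway test and F distribution.

```python
from scipy.stats import f, f_oneway

constant = [7, 4, 6, 8, 6, 6, 2, 9]   # mean 6
random_s = [5, 5, 3, 4, 4, 7, 2, 2]   # mean 4
no_sound = [2, 4, 7, 1, 2, 1, 5, 5]   # mean 3.375

result = f_oneway(constant, random_s, no_sound)
print(round(result.statistic, 2))      # ~3.59, the F score from the table
print(round(f.ppf(0.95, 2, 21), 4))    # ~3.4668, the critical value F(0.05; 2, 21)
print(result.pvalue < 0.05)            # True -> reject H0
```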


1. The difference belongs to which pairs of means?
2. How can we identify them?
3. Multiple t-tests???


Because the more means there are:
• the more t-tests we have to run, and
• the greater the Type I error (the probability of rejecting the null hypothesis when it is true); see the sketch below.
⇒ Use the TUKEY TEST.
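A minimal sketch of that inflation, assuming for simplicity that the pairwise comparisons are independent: with k means there are m = k(k−1)/2 pairwise t-tests, and the chance of at least one false rejection grows as 1 − (1 − α)^m.

```python
# Familywise Type I error for m uncorrected pairwise t-tests at level alpha,
# under the simplifying assumption that the comparisons are independent.
def familywise_error(k_means, alpha=0.05):
    m = k_means * (k_means - 1) // 2          # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 5, 10):
    print(k, round(familywise_error(k), 3))   # 3 -> 0.143, 5 -> 0.401, 10 -> 0.901
```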


Tukey test: a multiple comparison procedure and statistical test developed by John Tukey.

Characteristics:
• Compares all possible pairs of means to find which means are significantly different from one another
• Generally used in conjunction with an ANOVA
• Based on the studentized range distribution q


Identify the technique
• Problem objective: detect the difference between population means
• Data type: quantitative
• Experimental design: independent samples

Assumptions
• The observations being tested are independent
• The means are from normally distributed populations
• There is equal variation across observations (homoscedasticity)


The studentized range distribution is built upon the formula:

q = (x̄_max − x̄_min) / √(MSE / n)

It is similar to the Student-t distribution, but the q-distribution takes into account the number of means under consideration: the more means under consideration, the larger the q value (the studentized range).

How is it built? We take random samples from the independent populations of interest. Then we identify the largest and the smallest of the sample means, calculate the difference between these two means, and compute q with the formula above. After repeating the procedure many times, we get many values of q; these values form the q-distribution.
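That construction can be mimicked with a short simulation. This is an illustrative sketch (NumPy only), not part of the original slides: all k samples are drawn from the same normal population and q is computed exactly as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_q(k=3, n=8, reps=100_000):
    """Monte Carlo sketch of the studentized range distribution:
    draw k samples of size n from one normal population, then compute
    q = (largest sample mean - smallest sample mean) / sqrt(MSE / n)."""
    samples = rng.standard_normal((reps, k, n))
    means = samples.mean(axis=2)                     # k sample means per replication
    mse = samples.var(axis=2, ddof=1).mean(axis=1)   # pooled within-group variance (equal n)
    return (means.max(axis=1) - means.min(axis=1)) / np.sqrt(mse / n)

q_values = simulate_q()
# The empirical 95th percentile should be close to the tabled q(0.05, k=3, v=21) ≈ 3.56
print(round(np.percentile(q_values, 95), 2))
```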


Step 1: arrange the means from the smallest to the largest and calculate the difference between each pair of means.

Step 2: calculate the critical value ω:

ω = q(α, k, v) × √(MSE / n_g)

where
k: number of samples (groups)
v: degrees of freedom associated with MSE (v = n − k)
α: significance level
q(α, k, v): critical value of the studentized range (from the studentized range table)
n_g: number of observations per group
* equal sample sizes: n_g = n_1 = n_2 = ... = n_k
* unequal sample sizes: n_g = 2·n_1·n_2 / (n_1 + n_2)


Step 3: compare the calculated differences with ω. If a difference is larger than ω, that pair of means is significantly different.
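A minimal sketch of Steps 1–3 in code, assuming SciPy ≥ 1.7 (which ships scipy.stats.studentized_range); the values plugged in at the bottom are from the background-sound example in this deck (MSE = 4.18, v = 21, n_g = 8).

```python
from math import sqrt
from itertools import combinations
from scipy.stats import studentized_range   # available in SciPy >= 1.7

def tukey_pairs(means, mse, n_g, v, alpha=0.05):
    """Step 1: sort the means; Step 2: omega = q(alpha, k, v) * sqrt(MSE / n_g);
    Step 3: flag pairs whose difference exceeds omega."""
    k = len(means)
    omega = studentized_range.ppf(1 - alpha, k, v) * sqrt(mse / n_g)
    ordered = sorted(means.items(), key=lambda kv: kv[1])      # smallest to largest
    pairs = [(a, b, round(mb - ma, 3), (mb - ma) > omega)
             for (a, ma), (b, mb) in combinations(ordered, 2)]
    return omega, pairs

omega, pairs = tukey_pairs({"no sound": 3.375, "random sound": 4.0, "constant sound": 6.0},
                           mse=4.18, n_g=8, v=21)
print(round(omega, 4))   # ~2.5776
print(pairs)             # only constant sound vs no sound should exceed omega
```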


The Tukey confidence limits:

(x̄_larger − x̄_smaller) ± q(α, k, v) × √(MSE / n_g)

How to use the confidence interval?
- Calculate a confidence interval for each pair of means.
- If the interval contains the value 0, conclude that the difference for that pair is not significantly different from 0.
- If the interval lies entirely on the negative or positive side, then a difference exists for that pair of means (a short sketch follows).
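A short sketch of the interval check; it reuses the half-width q(α, k, v)·√(MSE/n_g) ≈ 2.578 computed in the earlier sketch and the example means from this deck, so that value is an assumed input here.

```python
from itertools import combinations

means = {"no sound": 3.375, "random sound": 4.0, "constant sound": 6.0}
half_width = 2.578   # q(0.05, 3, 21) * sqrt(4.18 / 8), from the earlier sketch

for (a, ma), (b, mb) in combinations(means.items(), 2):
    lower, upper = (mb - ma) - half_width, (mb - ma) + half_width
    contains_zero = lower <= 0 <= upper      # interval containing 0 -> not significant
    print(f"{b} - {a}: ({lower:.3f}, {upper:.3f}), significant = {not contains_zero}")
```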


Back to the example:
• Problem objective: detect the difference between population means
• Data type: quantitative
• Experimental design: independent samples
⇒ use the Tukey test, with the assumptions that:
• the means (average scores of the students in each group) come from normally distributed populations, and
• there is equal variation across observations (homoscedasticity).


Step 1: arrange the sample means from the smallest to the largest and compute the pairwise differences.

                  No sound (3)   Random S (2)   Const. S (1)
Mean              3.375          4              6
No sound (3)      -
Random S (2)      0.625          -
Const. S (1)      2.625          2              -

Step 2:

ω = q(0.05, 3, 24−3) × √(MSE / n_g) = q(0.05, 3, 21) × √(4.18 / 8) ≈ 2.5776

Step 3: the difference between the constant sound group and the no sound group is significant, because 2.625 > 2.5776; the other two differences (0.625 and 2) do not exceed ω.
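As a cross-check, the same conclusion can be reproduced with statsmodels' pairwise_tukeyhsd, assuming the raw scores are those listed in the ANOVA table earlier in this deck.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([7, 4, 6, 8, 6, 6, 2, 9,    # constant sound, mean 6
                   5, 5, 3, 4, 4, 7, 2, 2,    # random sound,   mean 4
                   2, 4, 7, 1, 2, 1, 5, 5])   # no sound,       mean 3.375
groups = np.repeat(["constant", "random", "none"], 8)

# Only the constant-vs-none pair should be rejected at alpha = 0.05
print(pairwise_tukeyhsd(scores, groups, alpha=0.05).summary())
```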


Another way to solve the example: use the Tukey confidence intervals.

The 95% confidence intervals for the 3 pairs of means are:

0.0474 ≤ x̄_1 − x̄_3 ≤ 5.2026
−0.5776 ≤ x̄_1 − x̄_2 ≤ 4.5776
−1.9526 ≤ x̄_2 − x̄_3 ≤ 3.2026

The intervals for x̄_1 & x̄_2 and for x̄_2 & x̄_3 contain zero, so those differences are not significantly different from zero. The interval for x̄_1 & x̄_3 lies entirely above zero, so the difference between the constant sound group and the no sound group is significant. This conclusion is consistent with the Tukey test above.


What if the result of Example 1 changed to: do not reject H0?

This result may be explained by:
• The background sound (of any kind) does not affect studying results (H0 is true).
• We made a wrong decision (H0 is false but we could not reject it) ⇒ we made a Type II error.

⇒ How do we know whether we made a wrong decision or not? Based on the power of the test!


According to Cohen (1988), power is “the probability of rejecting a null hypothesis when it is false — and therefore should be rejected.”

                     H0 is true                  H0 is false
Reject H0            Type I error = α            Correct decision = 1 − β = power
Do not reject H0     Correct decision = 1 − α    Type II error = β

Example: H0: beautiful girls are intelligent. Ha: beautiful girls are not intelligent.
If beautiful girls are actually intelligent, but we say they are not, we make a Type I error!
If they are actually not intelligent, but we say they are, we commit a Type II error!
If they are actually not intelligent and we say they are not, the test's power is strong!


[Figure: rejection and non-rejection regions of the test]


Role of power analysis: find the optimal sample size and compute the test's power to check the probability that the test will not make a Type II error ⇒ important!

A priori power analysis
• Before the research.
• Aim: find the optimal sample size that makes the test sufficiently powerful (power ≥ 0.8).
• Too large a sample size ⇒ waste of time, money, effort, etc.
• Too small a sample size ⇒ low test power.

A posteriori power analysis
• After the research.
• Compute the test's power (a minimal simulation sketch follows this list).
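A minimal simulation sketch of a posteriori power for the background-sound example. The "true" means are taken to be the observed ones and the scores are assumed normal with standard deviation √MSE; both are illustrative assumptions, not facts from the slides. Power is estimated as the fraction of simulated experiments in which the ANOVA rejects H0.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

def anova_power(true_means, sigma, n_per_group, alpha=0.05, reps=5000):
    """Estimated power = fraction of simulated experiments in which H0 is rejected."""
    rejected = 0
    for _ in range(reps):
        groups = [rng.normal(m, sigma, n_per_group) for m in true_means]
        if f_oneway(*groups).pvalue < alpha:
            rejected += 1
    return rejected / reps

# Assumed true means = observed group means, sigma = sqrt(MSE) = sqrt(4.18)
print(anova_power([6, 4, 3.375], sigma=4.18 ** 0.5, n_per_group=8))
```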


The power of a test is determined by: the significance level (conventionally 0.05), the effect size, the sample size, and the type of test (ANOVA, t-test, ...).


Sample size: a larger sample size ⇒ more information collected ⇒ a more powerful test. But too large a sample size wastes time, money, and other resources.

Statistical significance level (conventionally 0.05): the greater α, the smaller β, and the more powerful the test.

Effect size: the bigger the effect size, the more power the test has.


EFFECT SIZE: measures how large the difference between groups is, not merely whether it is statistically significant.

Generally, effect size is calculated by taking the difference between the two groups and dividing it by the standard deviation.

To interpret the resulting number, most social scientists use this general guide developed by Cohen:
▪ < 0.1 = trivial effect
▪ 0.1 - 0.3 = small effect
▪ 0.3 - 0.5 = moderate effect
▪ > 0.5 = large effect


Because effect size can only be calculated after the data are collected, you have to use an estimate for the power analysis. How to estimate it?
• Literature review: base it on similar tests in the same field in which the authors successfully estimated the effect size.
• Base it on your own experience, rationale, and perception.
• Neutral: use a value of 0.5, as it indicates a moderate to large difference.


EFFECT SIZE formulas:

Effect size can be used for many types of tests; each test has a specific formula.

For 2 means:

ES = (x̄_1 − x̄_2) / s

For ANOVA, with k = number of groups:

ES = √( Σ(x̄_i − x̄)² / (k × MSE) )
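The two formulas in code, as a small sketch that is not part of the original slides; the numbers plugged in are the 10-point difference with s = 30 from the next slide and the background-sound means with MSE = 4.18.

```python
import numpy as np

def es_two_means(mean_1, mean_2, s):
    """Effect size for two means: difference divided by the standard deviation."""
    return (mean_1 - mean_2) / s

def es_anova(group_means, mse):
    """ANOVA effect size per the formula above: sqrt(sum((x_i - grand mean)^2) / (k * MSE)),
    with equal group sizes so the grand mean equals the mean of the group means."""
    means = np.asarray(group_means, dtype=float)
    return np.sqrt(np.sum((means - means.mean()) ** 2) / (len(means) * mse))

print(round(es_two_means(10, 0, 30), 2))            # 0.33, the example on the next slide
print(round(es_anova([6, 4, 3.375], mse=4.18), 2))  # ~0.55 for the background-sound data
```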


Example: testing the effectiveness of two different teaching methods, A and B. Two random samples of students with the same prior studying results were taken from two classes to participate in the test. After 1 month, the results revealed that group A students had better scores than group B, measured by the mean scores of the two groups. Group A's result is 10 points higher than group B's, with s = 30.

ES = (x̄_A − x̄_B) / s = 10 / 30 = 0.33 ⇒ moderate effect.


Using the example from the Tukey test: α = 0.05, medium effect size, power = 0.8, ANOVA with 3 groups.
⇒ Looking up Cohen's sample-size table, the required sample size is 52 per group (a software sketch of the same lookup follows).
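The lookup can be reproduced in software. This sketch assumes statsmodels' FTestAnovaPower and Cohen's convention that a medium ANOVA effect size is f = 0.25; solve_power returns the total sample size, which is then divided across the 3 groups.

```python
from math import ceil
from statsmodels.stats.power import FTestAnovaPower

# Total sample size for a one-way ANOVA: alpha = 0.05, power = 0.8,
# medium effect size f = 0.25 (Cohen's convention), 3 groups.
total_n = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.8, k_groups=3)
print(ceil(total_n / 3))   # per-group size, close to the 52 from Cohen's table
```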


Power analysis software:
• G*Power (free, available at http://www.psycho.uniduesseldorf.de/abteilungen/aap/gpower3/)
• Power and Precision - Biostat (www.PowerAnalysis.com)
• One-Stop F Calculator (included in Murphy & Myors (2004))
• PASS - NCSS software (www.ncss.com/pass.html)


Tukey test:
• Helps detect which pairs of means the difference belongs to, while simultaneously controlling the Type I error α (rejecting H0 when it is true, the more serious case).
• But it is conservative: some power is lost because all pairwise differences are compared against a single critical value.

Power analysis:
• Helps to best estimate the sample sizes when conducting different kinds of tests.
• Makes the test more meaningful, as it points out the effect size of each test.
• Avoids the case where researchers cannot reject H0 and arbitrarily conclude that H0 is true.


References:
http://137.148.49.106/offices/assessment/Assessment%20Reports%202006/CoS/Psychology%203%20of%203.pdf
http://pcbfaculty.ou.edu/classfiles/MGT%206973%20Seminar%20in%20Research%20Methods/MGT%206973%20Res%20Methods%20Spr%202006/Week-5%20Research%20Design%20and%20Primary%20Data%20Collection/Cohen%201992%20PB%20A%20power%20primer.pdf
http://www.cvgs.k12.va.us/DIGSTATS/main/Guides/g_tukey.html
http://www.epa.gov/bioiweb1/statprimer/power.html
http://www.faculty.sfasu.edu/cobledean/Biostatistics/Lecture6/MultipleComparisonTests.PDF
http://web.mst.edu/~psyworld/tukeyssteps.html
http://faculty.vassar.edu/lowry/ch14pt2.html
http://people.richland.edu/james/lecture/m170/ch13-1wy.html
http://faculty.vassar.edu/lowry/vsanova.html
http://www.statsoft.com/textbook/power-analysis/
http://math.yorku.ca/SCS/Online/power/
