How to Learn Everything You Ever Wanted to Know About Statistics

Two-sample tests, χ²
Dr. Mona Hassan Ahmed
Prof. of Biostatistics
HIPH, Alexandria University
Z-test
(two independent proportions)
$$Z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$$
P1= proportion in the first group
P2= proportion in the second group
n1= first sample size
n2= second sample size
Critical z =
• 1.96 at 5% level of significance
• 2.58 at 1% level of significance
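The two critical values above can be reproduced from the standard normal distribution. This is a minimal sketch that assumes a two-sided test (the slide does not say so explicitly); the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-sided critical value of the standard normal (assumes a two-sided test)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_z(0.05), 2))  # 1.96
print(round(critical_z(0.01), 2))  # 2.58
```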
Example
Researchers wished to know if urban and rural adult
residents of a developing country differ with respect to
prevalence of a certain eye disease. A survey revealed the
following information
Residence    Eye disease: Yes    Eye disease: No    Total
Rural        24                  276                300
Urban        15                  485                500
Test, at the 5% level of significance, the difference in the
prevalence of eye disease between the 2 groups.
Answer
P1 = 24/300 = 0.08
P2 = 15/500 = 0.03
$$Z = \frac{0.08 - 0.03}{\sqrt{\dfrac{0.08(1 - 0.08)}{300} + \dfrac{0.03(1 - 0.03)}{500}}} = 2.87$$
2.87 > Z*
The difference is statistically significant
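The same calculation can be checked in a few lines of Python. This is only a sketch of the formula as given on the slide (unpooled standard error); the function name `two_proportion_z` is hypothetical.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for two independent proportions (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Rural: 24 of 300 with eye disease; urban: 15 of 500
z = two_proportion_z(24, 300, 15, 500)
print(round(z, 2))        # 2.87
print(abs(z) > 1.96)      # True: significant at the 5% level
```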
t-Test (two independent means)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_p^2}{n_1} + \dfrac{S_p^2}{n_2}}}$$
X̄1 = mean of the first group
X̄2 = mean of the second group
S²p = pooled variance
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
The critical t is read from the t table
at degrees of freedom = n1 + n2 - 2
and the chosen level of significance (1% or 5%).
Example
A sample of size 25 was selected from a healthy
population; their mean SBP was 125 mm Hg with an
SD of 10 mm Hg. Another sample of size 17 was
selected from the population of diabetics; their
mean SBP was 132 mm Hg, with an SD of 12 mm Hg.
Test whether there is a significant difference in
mean SBP between diabetics and healthy individuals at
the 1% level of significance.
Answer
n1 = 25, X̄1 = 125, S1 = 10
n2 = 17, X̄2 = 132, S2 = 12
State H0: H0: μ1 = μ2
State H1: H1: μ1 ≠ μ2
Choose α: α = 0.01
$$S_p^2 = \frac{(25 - 1)10^2 + (17 - 1)12^2}{25 + 17 - 2} = 117.6$$
Answer
$$t = \frac{125 - 132}{\sqrt{\dfrac{117.6}{25} + \dfrac{117.6}{17}}} = -2.053$$
Critical t at df = 40 and 1% level of significance = 2.704
Decision:
Since the absolute value of the computed t (2.053) is smaller
than the critical t, there is no significant difference between
the mean SBP of the healthy and diabetic samples at the 1% level.
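As a cross-check, the pooled variance and the t statistic can be recomputed from the summary figures given in the example (means 125 and 132, SDs 10 and 12, sample sizes 25 and 17). This is a minimal sketch; the function name `pooled_t` is hypothetical.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, pooled variance) for two independent means."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / sqrt(sp2 / n1 + sp2 / n2)
    return t, df, sp2

# Healthy sample first, diabetic sample second
t, df, sp2 = pooled_t(125, 10, 25, 132, 12, 17)
print(round(sp2, 1), df, round(t, 3))
```

The sign of t only reflects which group is listed first; the comparison with the critical value uses its absolute value.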
Degrees of freedom    Probability (p value)
                      0.10      0.05      0.01
1                     6.314     12.706    63.657
5                     2.015     2.571     4.032
10                    1.813     2.228     3.169
17                    1.740     2.110     2.898
20                    1.725     2.086     2.845
24                    1.711     2.064     2.797
25                    1.708     2.060     2.787
∞                     1.645     1.960     2.576
Paired t-test
(t-difference)
Uses:
To compare the means of two paired samples.
Example: mean SBP before and after intake
of a drug.
$$t = \frac{\bar{d}}{S_d / \sqrt{n}}$$
$$\bar{d} = \frac{\sum d_i}{n} = \text{mean difference}$$
$$S_d = \sqrt{\frac{\sum d_i^2 - \dfrac{(\sum d_i)^2}{n}}{n - 1}}$$
di = difference (after-before)
Sd = standard deviation of difference
n = sample size
Critical t from table at df = n-1
Example
The following data represent the
readings of SBP before and after
administration of a certain drug. Test
whether the drug has an effect on
SBP at the 1% level of significance.
Serial No.    SBP (Before)    SBP (After)
1             200             180
2             160             165
3             190             175
4             185             185
5             210             170
6             175             160
Answer
Serial No.    BP Before    BP After    di (After-Before)    di²
1             200          180         -20                  400
2             160          165         5                    25
3             190          175         -15                  225
4             185          185         0                    0
5             210          170         -40                  1600
6             175          160         -15                  225
Total                                  ∑di = -85            ∑di² = 2475
Answer
$$\bar{d} = \frac{\sum d_i}{n} = \frac{-85}{6} = -14.17$$
$$S_d = \sqrt{\frac{2475 - \dfrac{(-85)^2}{6}}{5}} = 15.942$$
Answer
$$\text{Computed } t = \frac{-14.17}{15.942 / \sqrt{6}} = -2.18$$
Critical t at df = 6 - 1 = 5 and 1% level of significance
= 4.032
Decision:
Since |t| is < critical t, there is no significant
difference between mean SBP before and after
administration of the drug at the 1% level.
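The paired calculation can be reproduced directly from the six before/after readings. This is a minimal sketch using the shortcut formula for Sd given earlier; the function name `paired_t` is hypothetical.

```python
from math import sqrt

before = [200, 160, 190, 185, 210, 175]
after = [180, 165, 175, 185, 170, 160]

def paired_t(before, after):
    """Return (mean difference, SD of differences, t, df) with d = after - before."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    mean_d = sum_d / n
    sd = sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    t = mean_d / (sd / sqrt(n))
    return mean_d, sd, t, n - 1

mean_d, sd, t, df = paired_t(before, after)
print(round(mean_d, 2), round(sd, 2), round(t, 2), df)
print(abs(t) < 4.032)  # True: not significant at the 1% level (df = 5)
```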
Chi-Square test
It tests the association between variables.
The data are qualitative.
It is performed mainly on frequencies.
It determines whether the observed
frequencies differ significantly from
expected frequencies.
$$\text{Computed } \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Where E = expected frequency
O = observed frequency
$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
Critical χ² at df = (R - 1)(C - 1)
Where R = number of rows, C = number of columns
For a 2 x 2 table (df = 1):
χ²* = 3.84 at 5% level of significance
χ²* = 6.63 at 1% level of significance
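For df = 1 a chi-square variable is the square of a standard normal variable, so the two critical values above are the squares of the two-sided critical z values (1.96 and 2.58). A small sketch of that relationship, with a hypothetical helper name `chi2_critical_df1`:

```python
from statistics import NormalDist

def chi2_critical_df1(alpha):
    """Critical chi-square for df = 1, obtained as the squared two-sided critical z."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

print(round(chi2_critical_df1(0.05), 2))  # 3.84
print(round(chi2_critical_df1(0.01), 2))  # 6.63
```

This shortcut only holds for df = 1; larger tables need a chi-square table or a statistics library.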
Example
In a study to determine the effect of heredity in a certain
disease, a sample of cases and controls was taken:
                      Disease
Family history    Cases    Control    Total
Positive          80       120        200
Negative          140      160        300
Total             220      280        500
Using 5% level of significance,
test whether family history has an effect on disease
Answer
                      Disease
Family history    Cases O (E)    Control O (E)    Total
Positive          80 (88)        120 (112)        200
Negative          140 (132)      160 (168)        300
Total             220            280              500
$$\chi^2 = \frac{(80-88)^2}{88} + \frac{(120-112)^2}{112} + \frac{(140-132)^2}{132} + \frac{(160-168)^2}{168} = 2.165 < 3.84$$
The association between the disease and family history is not
significant.
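The expected frequencies and the chi-square statistic can be recomputed from the observed counts alone. This is a minimal sketch of the formulas above (no continuity correction, as on the slide); the function name `chi_square` is hypothetical.

```python
def chi_square(observed):
    """Return (chi-square, df, expected table) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, o in enumerate(row):
            # E = row total x column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi2 += (o - e) ** 2 / e
        expected.append(expected_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Rows: family history positive / negative; columns: cases / controls
chi2, df, expected = chi_square([[80, 120], [140, 160]])
print(expected)
print(round(chi2, 3), df, chi2 < 3.84)
```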
• The odds ratio was developed to quantify
exposure-disease relations using case-control data
• Once you have selected cases and controls →
ascertain exposure
• Then, cross-tabulate data to form a 2-by-2
table of counts
2-by-2 Crosstab Notation
              Disease +    Disease -    Total
Exposed +     A            B            A+B
Exposed -     C            D            C+D
Total         A+C          B+D          A+B+C+D
• Disease status: A+C = no. of cases; B+D = no. of non-cases
• Exposure status: A+B = no. of exposed individuals; C+D = no. of non-exposed individuals
              Disease +    Disease -
Exposed +     A            B
Exposed -     C            D
$$\text{exposure odds, cases} = o_1 = \frac{A}{C}$$
$$\text{exposure odds, controls} = o_0 = \frac{B}{D}$$
$$\text{odds ratio} = OR = \frac{o_1}{o_0} = \frac{A/C}{B/D} = \frac{AD}{BC}$$
Cross-product ratio:
$$OR = \frac{AD}{BC}$$
Example
• Exposure variable = Smoking
• Disease variable = Hypertension
         D+    D-
E+       30    71
E-       1     22
Total    31    93
$$OR = \frac{AD}{BC} = \frac{(30)(22)}{(71)(1)} = 9.3$$
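The cross-product ratio is a one-line calculation. A minimal sketch using the A, B, C, D cell notation from the crosstab; the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio AD / BC (A, B = exposed row; C, D = non-exposed row)."""
    return (a * d) / (b * c)

# Smoking and hypertension example: A = 30, B = 71, C = 1, D = 22
print(round(odds_ratio(30, 71, 1, 22), 1))  # 9.3
```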
Interpretation of the Odds Ratio
• Odds ratios are relative risk
estimates
• Relative risks are risk multipliers
• The odds ratio of 9.3 implies 9.3× the
risk with exposure
OR > 1    Positive association    Higher risk
OR = 1    No association
OR < 1    Negative association    Lower risk (Protective)
• In the previous example
OR = 9.3
• 95% CI is 1.20 – 72.14
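The slide does not say how the interval was obtained. One common approach, assumed here, is the log-based (Woolf) method: take the standard error of ln(OR) as the square root of 1/A + 1/B + 1/C + 1/D and exponentiate the limits. The function name `odds_ratio_ci` is hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval; z = 1.96 gives 95%."""
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # assumed SE of ln(OR)
    lower = exp(log(or_) - z * se_log)
    upper = exp(log(or_) + z * se_log)
    return or_, lower, upper

or_, lower, upper = odds_ratio_ci(30, 71, 1, 22)
# Compare with the slide's OR = 9.3 and 95% CI 1.20 to 72.14
print(round(or_, 1), round(lower, 2), round(upper, 2))
```

The interval is very wide because one cell contains a single observation, and since it excludes 1 the association is significant at the 5% level.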
Multiple Levels of Exposure
Smoking level       Cases    Controls
Heavy smokers       213      274
Moderate smokers    61       147
Light smokers       14       82
Non-smokers         8        115
Total               296      618
Multiple Levels of Exposure
• k levels of exposure → break up data into (k - 1)
2×2 tables
• Compare each exposure level to non-exposed
• e.g., heavy smokers vs. non-smokers
                 Cases    Controls
Heavy smokers    213      274
Non-smokers      8        115
$$OR = \frac{(213)(115)}{(274)(8)} = 11.2$$
Multiple Levels of Exposure
Smoking level       Cases    Controls    OR
Heavy smokers       213      274         OR3 = (213)(115)/((274)(8)) = 11.2
Moderate smokers    61       147         OR2 = (61)(115)/((147)(8)) = 6.0
Light smokers       14       82          OR1 = (14)(115)/((82)(8)) = 2.5
Non-smokers         8        115         1.0 (reference)
Total               296      618
Notice the trend in OR
(dose-response relationship)
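The three odds ratios can be computed in one pass by comparing each exposure level with the non-smokers. A minimal sketch; the variable names are illustrative only.

```python
# (cases, controls) for each exposure level, from the table above
levels = {
    "Heavy smokers": (213, 274),
    "Moderate smokers": (61, 147),
    "Light smokers": (14, 82),
}
ref_cases, ref_controls = 8, 115  # non-smokers are the reference group

odds_ratios = {
    level: (cases * ref_controls) / (controls * ref_cases)
    for level, (cases, controls) in levels.items()
}

for level, value in odds_ratios.items():
    print(level, round(value, 1))
```

The values fall steadily from heavy to light smokers, which is the dose-response pattern noted above.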
Small Sample Size Formula For the Odds Ratio
It is recommended to add ½ to each cell before
calculating the odds ratio when some cells are
zeros.
$$OR_{\text{Small Sample}} = \frac{(A + 0.5)(D + 0.5)}{(B + 0.5)(C + 0.5)} = \frac{(31 + 0.5)(22 + 0.5)}{(71 + 0.5)(0 + 0.5)} = 19.8$$
         D+    D-
E+       31    71
E-       0     22
Total    31    93
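A sketch of the small-sample correction, showing why it is needed: with C = 0 the ordinary cross-product ratio would divide by zero. The function name `odds_ratio_small_sample` is hypothetical.

```python
def odds_ratio_small_sample(a, b, c, d):
    """Odds ratio after adding 0.5 to every cell of the 2-by-2 table."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# A = 31, B = 71, C = 0, D = 22: plain AD / BC is undefined because C = 0
print(round(odds_ratio_small_sample(31, 71, 0, 22), 1))  # 19.8
```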