Univariate Analysis of Categorical Data

Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology


Types of categorical analysis
Steps to analysis
| Dependent variable | Independent variable | Number of groups in independent variable | Parametric test | Non-parametric test |
|---|---|---|---|---|
| Numerical (one) | - | - | One sample t | Sign test |
| Numerical | Categorical | 2 groups (independent) | Independent t | Mann-Whitney |
| Numerical | Categorical | 2 groups (dependent) | Paired t | Signed rank test |
| Numerical | Categorical | > 2 groups (independent) | One-way ANOVA | Kruskal-Wallis |
| Categorical (2 groups) | Categorical | 2 groups (independent) | - | Chi-square test, Fisher exact test |
| Categorical (2 groups) | Categorical | 2 groups (dependent) | - | McNemar test |


Categorical data analysis deals with discrete data
that can be organized into categories.
The data are organized into a contingency table.
| Data | Statistical tests |
|---|---|
| One proportion | Chi-square goodness of fit |
| Two proportions, independent samples | Pearson chi-square / Fisher exact |
| Two proportions, dependent samples | McNemar test |
| Stratified sampling to control confounder | Mantel-Haenszel test |
Step 1: State the hypotheses
Step 2: Set the significance level
Step 3: Check the assumptions
Step 4: Perform the statistical analysis
Step 5: Make interpretation
Step 6: Draw conclusion





A 2 × 2 contingency table consists of two columns and two rows.
The four cells are labeled A through D.
A further row and column hold the labels and totals.
Row: independent variable / exposure / risk factors
Column: dependent variable / outcome
| | CHD present | CHD absent | Total |
|---|---|---|---|
| Smoker | 138 | 32 | 170 |
| Nonsmoker | 263 | 105 | 368 |
| Total | 401 | 137 | 538 |



The Pearson chi-square test is used to test the association between two categorical
variables.
The samples are independent.
Result of test:
- Not significant: no association
- Significant: an association


Is estrogen receptor status associated with breast
cancer status?
Data: Breast cancer.sav


H0: There is no association between estrogen
receptor and breast cancer status.
HA: There is an association between estrogen
receptor and breast cancer status.

α = 0.05
1. The two variables are independent.
2. The two variables are categorical.
3. Cells with an expected count of < 5:
   - more than 20% of cells: Fisher exact test
   - less than 20% of cells: Pearson chi-square
Expected count = (Row total × Column total) / Grand total
Observed counts:

| Variable | Breast Ca: Died | Breast Ca: Alive | Total |
|---|---|---|---|
| ER -ve | 310 | 28 | 338 |
| ER +ve | 508 | 23 | 531 |
| Total | 818 | 51 | 869 |
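Observed counts with expected counts (E):

| Variable | Breast Ca: Died | Breast Ca: Alive | Total |
|---|---|---|---|
| ER -ve | 310 (E = 318.2) | 28 (E = 19.8) | 338 |
| ER +ve | 508 (E = 499.8) | 23 (E = 31.2) | 531 |
| Total | 818 | 51 | 869 |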
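The expected counts in the table above follow directly from the row totals, column totals and grand total. A minimal Python sketch of that arithmetic is given here; the function name `expected_counts` is illustrative and not part of the original slides, which use SPSS.

```python
def expected_counts(observed):
    """Return the expected count for every cell of a contingency table.

    Expected count = (row total x column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts: rows are ER -ve / ER +ve, columns are Died / Alive.
observed = [[310, 28], [508, 23]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 1) for e in row])
# prints [318.2, 19.8] then [499.8, 31.2]
```

All four expected counts are well above 5, which is why the Pearson chi-square test is appropriate for this table.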

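Calculate the chi-square value:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$ (using the expected counts rounded to one decimal place; the unrounded expected counts give 5.841)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
From the chi-square table, the p value lies between 0.01 and 0.02.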
p value = 0.016
Since p < 0.05, reject H0 and accept HA.

There is a significant association between estrogen
receptor and breast cancer status using the Pearson
chi-square test (p = 0.016).
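The estrogen receptor example can be checked by hand outside SPSS. The sketch below uses only the Python standard library; the function names are illustrative, and the p value shortcut with `math.erfc` is valid only for 1 degree of freedom, which is the case for a 2 × 2 table. No continuity correction is applied.

```python
import math


def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom), no continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: ER -ve / ER +ve; columns: Died / Alive.
observed = [[310, 28], [508, 23]]
chi2, df = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
```

With unrounded expected counts the statistic comes out at about 5.84 rather than 5.897, and the p value still rounds to 0.016, so the conclusion is unchanged.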



Fisher's exact test is used to test the association between two categorical
variables.
The samples are independent.
Sample sizes are small.


Is gender associated with coronary heart
disease?
Data: CHD data.sav


H0: There is no association between gender and
coronary heart disease.
HA: There is an association between gender and
coronary heart disease.

α = 0.05
1. The two variables are independent.
2. The two variables are categorical.
3. Cells with an expected count of < 5:
   - more than 20% of cells: Fisher exact test
   - less than 20% of cells: Pearson chi-square
Expected count = (Row total × Column total) / Grand total
Observed counts:

| Variable | CHD: Presence | CHD: Absent | Total |
|---|---|---|---|
| Male | 15 | 5 | 20 |
| Female | 10 | 0 | 10 |
| Total | 25 | 5 | 30 |
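Observed counts with expected counts (E):

| Variable | CHD: Presence | CHD: Absent | Total |
|---|---|---|---|
| Male | 15 (E = 16.7) | 5 (E = 3.3) | 20 |
| Female | 10 (E = 8.3) | 0 (E = 1.7) | 10 |
| Total | 25 | 5 | 30 |

2 cells (50%) have an expected count < 5, so Fisher's exact test is used.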
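The 20% rule for choosing between the two tests can be expressed in a few lines. This is a sketch of the rule as stated in the assumptions, not SPSS output; the function name and the returned labels are illustrative.

```python
def share_of_small_expected(observed, threshold=5):
    """Return the fraction of cells whose expected count is below the threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    return sum(e < threshold for e in expected) / len(expected)


# Rows: Male / Female; columns: CHD presence / absent.
observed = [[15, 5], [10, 0]]
share = share_of_small_expected(observed)

# More than 20% of cells with expected count < 5 points to Fisher's exact test.
chosen_test = "Fisher exact test" if share > 0.20 else "Pearson chi-square"
print(f"{share:.0%} of cells have expected count < 5 -> {chosen_test}")
```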

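Calculate the chi-square value:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$ (using the expected counts rounded to one decimal place; the unrounded expected counts give 3.000)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
From the chi-square table, the p value lies between 0.05 and 0.1.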
p value = 0.140 (Fisher's exact test)
Since p > 0.05, do not reject H0.

There is no significant association between gender
and coronary heart disease using Fisher’s Exact
test (p = 0.140).
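The Fisher's exact p value can be reproduced from the hypergeometric distribution. The sketch below assumes the usual two-sided definition (sum the probabilities of all tables with the same margins that are no more likely than the observed one); the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Male: 15 with CHD, 5 without; Female: 10 with CHD, 0 without.
p = fisher_exact_two_sided(15, 5, 10, 0)
print(f"p = {p:.3f}")  # prints p = 0.140
```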



The McNemar test applies to categorical data from dependent samples:
- Matched sample
- Cross-over design
- Before and after (same subject)
It determines whether the row and column
marginal frequencies are equal (marginal
homogeneity).

The null hypothesis of marginal homogeneity states that the
two marginal probabilities for each outcome are
the same.
H0: $p_b = p_c$
HA: $p_b \neq p_c$
A and D = concordant pairs
B and C = discordant pairs
A discordant pair is a pair with different outcomes.



Is the type of mastectomy associated with the 5-year
survival proportion in patients with breast cancer?
The sample consisted of breast cancer patients
- matched for age (same decade of age)
- with the same clinical condition
Data: breast ca.sav


H0: There is no association between type of
mastectomy and 5-year survival proportion in
patients with breast cancer.
HA: There is an association between type of
mastectomy and 5-year survival proportion in
patients with breast cancer.

α = 0.05
1. The two variables are dependent.
2. The two variables are categorical.


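$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
Calculated $\chi^2$ > tabulated $\chi^2$ (3.841 for df = 1 at α = 0.05)
*Alternative continuity correction: $\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$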
p value = 0.008
Since p < 0.05, reject H0 and accept HA.

There is an association between type of
mastectomy and 5-year survival proportion in
patients with breast cancer using McNemar test (p
= 0.008).
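The McNemar calculation depends only on the discordant pairs (b = 0, c = 8 in this example). The sketch below computes the continuity-corrected chi-square and, for comparison, an exact binomial p value. That the reported p = 0.008 comes from the exact binomial version is an inference from the numbers, not something the slides state.

```python
import math


def mcnemar(b, c):
    """Return (chi-square with continuity correction, its p value, exact binomial p value)."""
    n = b + c  # number of discordant pairs
    chi2 = (abs(b - c) - 1) ** 2 / n
    # Upper-tail p value of a chi-square statistic with 1 degree of freedom.
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    # Exact two-sided p value: under H0 each discordant pair falls in
    # cell b or cell c with probability 0.5.
    k = min(b, c)
    p_exact = min(1.0, 2 * sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n)
    return chi2, p_chi2, p_exact


chi2, p_chi2, p_exact = mcnemar(0, 8)
print(f"chi-square = {chi2}, chi-square p = {p_chi2:.4f}, exact p = {p_exact:.4f}")
```

The chi-square p value is about 0.013 and the exact p value is about 0.008; both lead to rejecting H0 at α = 0.05.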




The Mantel-Haenszel test is a method to compare the probability of an
event among independent groups in stratified
samples.
The stratification factor can be study center,
gender, race, age group, obesity status or disease
severity.
It gives a stratified statistical analysis of the
relationship between exposure and disease, after
controlling for a confounder (the strata variable).
The data are arranged in a series of associated
2 × 2 contingency tables.


Is the type of treatment associated with
response to treatment among migraine patients
after controlling for gender?
Confounder: gender
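| Gender | | Active | Placebo |
|---|---|---|---|
| Female | No. of patients | 27 | 25 |
| Female | No. with better response | 16 | 5 |
| Male | No. of patients | 28 | 26 |
| Male | No. with better response | 12 | 7 |

| Stratum | Treatment | Better | Same | Total |
|---|---|---|---|---|
| Stratum 1: Female | Active | 16 | 11 | 27 |
| Stratum 1: Female | Placebo | 5 | 20 | 25 |
| Stratum 2: Male | Active | 12 | 16 | 28 |
| Stratum 2: Male | Placebo | 7 | 19 | 26 |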
1. Random sampling
2. Stratified sampling


H0: There is no association between type of
treatment and response to treatment among
female and male migraine patients.
HA: There is an association between type of
treatment and response to treatment among
female and male migraine patients.

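Compute the expected frequency for each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{\left(\sum a_i - \sum e_i\right)^2}{\sum v_i}$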

Compute the expected frequency for each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$
$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$

Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{(52)^2 (52 - 1)} = 3.1865$
$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{(54)^2 (54 - 1)} = 3.1325$

Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{\left(\sum a_i - \sum e_i\right)^2}{\sum v_i} = \frac{\left((16 + 12) - (10.9038 + 9.8519)\right)^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$
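The stratum-by-stratum arithmetic for the migraine example can be checked with a short script. Each stratum is written as a tuple (a, b, c, d) = (active better, active same, placebo better, placebo same); this layout and the function name are illustrative choices. The statistic is computed without a continuity correction, and the `math.erfc` shortcut for the p value is valid only for 1 degree of freedom.

```python
import math


def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (no continuity correction) over 2 x 2 strata."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # expected frequency e_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # variance v_i
    return (sum_a - sum_e) ** 2 / sum_v


# Stratum 1 (female) and stratum 2 (male).
strata = [(16, 11, 5, 20), (12, 16, 7, 19)]
chi2_mh = mantel_haenszel_chi2(strata)
p = math.erfc(math.sqrt(chi2_mh / 2))  # upper tail, 1 degree of freedom
print(f"chi-square MH = {chi2_mh:.4f}, p = {p:.3f}")
```

The result, about 8.31 with a p value of about 0.004, agrees with the hand calculation.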

Compute the odds ratio:
$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
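The pooled odds ratio is a ratio of two weighted sums across strata. A minimal sketch, using the same (a, b, c, d) layout as the formulas (a = active better, b = active same, c = placebo better, d = placebo same); the function name is illustrative.

```python
def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio over a list of 2 x 2 strata (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Stratum 1 (female) and stratum 2 (male).
strata = [(16, 11, 5, 20), (12, 16, 7, 19)]
or_mh = mantel_haenszel_odds_ratio(strata)
print(f"OR MH = {or_mh:.3f}")  # prints OR MH = 3.313
```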
Data: Migraine.sav

The calculated value (8.31) is greater than the tabulated value.
Reject H0.
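Test of homogeneity of the odds ratios (Breslow-Day; *Tarone's adjusted):
H0: OR1 = OR2 (the association is homogeneous across strata)
Test of conditional independence (Mantel-Haenszel):
H0: OR1 = 1 and OR2 = 1 (treatment and response are conditionally independent)
The large p value for the Breslow-Day test (p =
0.222) indicates no significant gender difference
in the odds ratios.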
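The Breslow-Day test itself is not reproduced here. As a rough illustration of what it compares, the sketch below computes the odds ratio within each gender stratum from the migraine data; the variable names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# (active better, active same, placebo better, placebo same)
or_female = odds_ratio(16, 11, 5, 20)
or_male = odds_ratio(12, 16, 7, 19)
print(f"OR female = {or_female:.2f}, OR male = {or_male:.2f}")
# prints OR female = 5.82, OR male = 2.04
```

The two stratum-specific odds ratios differ numerically, but with these sample sizes the reported Breslow-Day p value of 0.222 indicates that the difference is not statistically significant, which supports reporting a single pooled odds ratio.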


There is a significant association between type of
treatment and response to treatment among
female and male migraine patients (p = 0.004).
We estimate that, after controlling for gender, patients
who receive active treatment have 3.31 times the odds
of a better response in migraine symptoms
compared with patients who receive placebo.