Biostatistics course, Part 14: Analysis of binary paired data

Dr. Sc. Nicolas Padilla Raygoza
Department of Nursing and Obstetrics
Division Health Sciences and Engineering
University of Guanajuato
Campus Celaya-Salvatierra
Biosketch
• Medical Doctor, Autonomous University of Guadalajara.
• Pediatrician, certified by the Mexican Council of Certification in Pediatrics.
• Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
• Master of Science in Epidemiology, Atlantic International University.
• Doctor of Science in Epidemiology, Atlantic International University.
• Associate Professor B, School of Nursing and Obstetrics of Celaya, University of Guanajuato.
• padillawarm@gmail.com
Competencies
• The reader will know how to present paired binary data.
• He (she) will apply a hypothesis test for paired binary data: McNemar's chi-squared test.
• He (she) will calculate a confidence interval for the difference between paired proportions.
• He (she) will obtain the odds ratio and its confidence interval for matched case-control studies.
Introduction
• In Parts 12 and 13 of the biostatistics course, we learned the methods for comparing two proportions estimated from independent samples.
• If the observations in a study are not independent, we need to use different methods.
• Two types of study often give rise to observations that are not independent:
  - Repeated observations on the same individual
  - Matched case-control studies
Example
• Tuberculosis can be diagnosed by using a culture medium and checking whether Mycobacterium tuberculosis grows on it.
• In an experiment to compare two culture media for the diagnosis of tuberculosis, sputum samples from 100 patients were plated on both media.
• Half of each sample was plated on medium A and the other half on medium B.
Example
• In a study examining the relation between breast cancer and oral contraceptives, women with breast cancer were matched with women without breast cancer, selected from electoral registries.
• This is an example of a matched case-control study, in which each individual with breast cancer is matched with an individual of similar age, to control for the potential confounding effect of age.
Showing categorical paired data
• Using the Z test or the chi-squared test with paired data is a mistake, because those tests do not take into account the paired nature of the data.
Patient  A  B    Patient  A  B    Patient  A  B    Patient  A  B
   1     -  -      16     +  +      31     -  -      46     +  +
   2     -  -      17     +  +      32     +  +      47     -  -
   3     +  +      18     -  -      33     +  +      48     +  -
   4     +  -      19     -  -      34     +  -      49     -  -
   5     +  +      20     +  -      35     -  -      50     -  +
   6     -  -      21     -  -      36     +  +
   7     -  +      22     +  +      37     +  +
   8     -  -      23     +  +      38     +  -
   9     -  -      24     +  +      39     +  +
  10     -  -      25     +  -      40     +  -
  11     +  -      26     -  -      41     +  -
  12     +  +      27     +  +      42     +  +
  13     +  -      28     -  -      43     -  -
  14     +  +      29     +  +      44     +  +
  15     +  -      30     +  +      45     +  -
(A = culture medium A, B = culture medium B)
Showing categorical paired data
Patient  A  B    Patient  A  B    Patient  A  B    Patient  A  B
  51     -  -      66     +  +      81     -  -      96     +  +
  52     -  -      67     +  +      82     +  +      97     -  -
  53     +  +      68     -  -      83     +  +      98     +  -
  54     +  -      69     -  -      84     +  -      99     -  -
  55     +  +      70     +  -      85     -  -     100     -  +
  56     -  -      71     -  -      86     +  +
  57     -  +      72     +  +      87     +  +
  58     -  -      73     +  +      88     +  -
  59     -  -      74     +  +      89     +  +
  60     -  -      75     +  -      90     +  -
  61     +  -      76     -  -      91     +  -
  62     +  +      77     +  +      92     +  +
  63     +  -      78     -  -      93     -  -
  64     +  +      79     +  +      94     +  +
  65     +  -      80     +  +      95     +  -
(A = culture medium A, B = culture medium B)
Showing categorical paired data
• The experiment compared the ability of the two culture media to detect Mycobacterium tuberculosis.
• Each result was positive (+) or negative (-).
• We are interested in comparing the number of positive samples in the two culture media.
• The table summarizes the results:

Culture medium     +     -    Total
A                 64    36     100
B                 44    56     100
Showing categorical paired data
• From this table, do you think that medium A is better than medium B at detecting the tuberculosis bacilli?
• For an adequate analysis, we need to compare the results of both media within each subject.
• Four combinations of results can occur in each subject:

Combination   Medium A   Medium B   Pairs
     1           +          +         k
     2           +          -         r
     3           -          +         s
     4           -          -         m
Showing categorical paired data
• To compare the results within each subject, we need to count how many times each combination occurs.
• An easy way to show the values is to tabulate the results from one medium against the results from the other.

Medium     B+     B-     Total
A+         k      r      k+r
A-         s      m      s+m
Total      k+s    r+m    N
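The counting step can be sketched in a few lines of Python. This is only an illustration: the function name `tabulate_pairs` and the five-subject `sample` list are hypothetical and are not part of the course data; each pair holds the result on medium A followed by the result on medium B.

```python
from collections import Counter


def tabulate_pairs(pairs):
    """Count the four result combinations in a list of (A, B) results.

    Each result is the string "+" or "-". Returns (k, r, s, m):
    k = both positive, r = A positive and B negative,
    s = A negative and B positive, m = both negative.
    """
    counts = Counter(pairs)
    k = counts[("+", "+")]
    r = counts[("+", "-")]
    s = counts[("-", "+")]
    m = counts[("-", "-")]
    return k, r, s, m


# Hypothetical mini-sample of five subjects (illustration only)
sample = [("+", "+"), ("+", "-"), ("-", "-"), ("+", "-"), ("-", "+")]
k, r, s, m = tabulate_pairs(sample)

# Print the 2 x 2 table of medium A (rows) against medium B (columns)
print("        B+   B-   Total")
print(f"A+    {k:4d} {r:4d} {k + r:7d}")
print(f"A-    {s:4d} {m:4d} {s + m:7d}")
print(f"Total {k + s:4d} {r + m:4d} {k + r + s + m:7d}")
```

Applied to the full list of 100 patients, the same function would return the four cell counts of the paired table.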
Showing categorical paired data
• Pairs with the same result on both media are pairs with agreement (concordant pairs); they give no information about which medium is better at detecting bacilli.
• In the remaining pairs, the results differed between the two media:
  - 24 were positive for A and negative for B.
  - 4 were negative for A and positive for B.
• Pairs whose results differ between the two media are called discordant pairs.

Medium     B+     B-     Total
A+         40     24      64
A-          4     32      36
Total      44     56     100
Hypothesis test for binary paired data
• If there were no difference between the media, we would expect similar numbers of the two types of discordant pairs: r ≈ s.
• We can use the McNemar test to assess whether the difference between the numbers of discordant pairs is greater than would be expected by chance.
• To test the null hypothesis that there is no difference between the two proportions, we use the McNemar test:

\[ \chi^2_{\text{paired}} = \frac{(|r - s| - 1)^2}{r + s} \]

• Subtracting 1 is a continuity correction.
Hypothesis test for binary paired data
• In the study of two culture media for tuberculosis bacilli:
  - 24 samples were positive on medium A and negative on medium B
  - 4 samples were negative on medium A and positive on medium B

\[ \chi^2_{\text{paired}} = \frac{(|r - s| - 1)^2}{r + s} = \frac{(|24 - 4| - 1)^2}{24 + 4} = \frac{361}{28} = 12.89, \quad p < 0.05 \]

• We reject the null hypothesis of no difference between the media.
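The calculation can be checked with a short Python sketch that uses only the standard library. The function name `mcnemar_test` is an assumption made for this illustration. The p-value relies on the fact that, for a chi-squared variable with 1 degree of freedom, the upper tail probability equals erfc(sqrt(x / 2)).

```python
import math


def mcnemar_test(r, s):
    """McNemar chi-squared test with continuity correction.

    r and s are the two counts of discordant pairs.
    Returns (chi2, p_value), with p_value taken from the chi-squared
    distribution with 1 degree of freedom.
    """
    if r + s == 0:
        raise ValueError("the test needs at least one discordant pair")
    chi2 = (abs(r - s) - 1) ** 2 / (r + s)
    # Upper tail of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Discordant pairs from the culture media example
chi2, p_value = mcnemar_test(24, 4)
print(f"chi2 = {chi2:.2f}, p = {p_value:.5f}")
```

Only the discordant pairs enter the function; the concordant counts k and m do not affect the statistic.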
Confidence intervals for the difference of two paired proportions
• The difference between the proportions of paired data can be calculated as:

\[ p_1 - p_2 = \frac{r - s}{N} \]

Where:
• r and s are the numbers of discordant pairs
• N is the total number of pairs
• The standard error of the difference between paired proportions is:

\[ SE(p_1 - p_2) = \frac{\sqrt{r + s}}{N} \]
Confidence intervals for the difference of two paired proportions
• The general formula for a 95% confidence interval is:

Estimate ± 1.96 × SE

• From the table of sputum culture results with media A and B, we take the values of r and s and calculate the 95% confidence interval for the difference between paired proportions:

\[ \frac{r - s}{N} \pm 1.96 \, \frac{\sqrt{r + s}}{N} = \frac{24 - 4}{100} \pm 1.96 \, \frac{\sqrt{24 + 4}}{100} = 0.20 \pm 0.10 = 0.10 \text{ to } 0.30 \]

that is, 10% to 30%.
• A confidence interval from 0.10 to 0.30 means that, in the population, the percentage of cultures positive for the bacilli could be between 10 and 30 percentage points higher with medium A than with medium B.
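A minimal Python sketch of the same interval follows. The function name `paired_difference_ci` and its parameter `z` (the normal multiplier, 1.96 for 95%) are assumptions made for this illustration.

```python
import math


def paired_difference_ci(r, s, n, z=1.96):
    """Difference between paired proportions with its confidence interval.

    r, s: counts of the two types of discordant pairs
    n: total number of pairs
    Returns (difference, lower limit, upper limit).
    """
    diff = (r - s) / n
    se = math.sqrt(r + s) / n  # standard error of the paired difference
    return diff, diff - z * se, diff + z * se


# Culture media example: r = 24, s = 4, N = 100
diff, lower, upper = paired_difference_ci(24, 4, 100)
print(f"difference = {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```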
Odds Ratio for paired data
• In case-control studies, we usually want to evaluate the risk associated with exposure to a risk factor; for these studies we need a measure of effect.
• In case-control studies we use the odds ratio (OR), which is the odds of exposure in the cases divided by the odds of exposure in the controls.
• The calculation of the OR with matched data is based on the discordant pairs, as is the difference between proportions of paired data.
Odds Ratios for paired data
• Table of exposure in cases against exposure in controls:

                           Controls
                       Exposed   Non-exposed
Cases   Exposed           k           r
        Non-exposed       s           m

k = number of pairs where the case and the control were both exposed
r = number of pairs where the case was exposed and the control was not exposed
s = number of pairs where the case was not exposed and the control was exposed
m = number of pairs where neither the case nor the control was exposed
Odds Ratio for paired data
• The odds ratio is calculated as the ratio of the two groups of discordant pairs:

\[ OR = \frac{r}{s} = \frac{\text{pairs with case exposed and control not exposed}}{\text{pairs with case not exposed and control exposed}} \]
Odds Ratio for paired data
• The table shows the results of a matched case-control study designed to investigate the association between the use of oral contraceptives (OCC) and thromboembolism.

                           Controls
                       Use OCC   Do not use OCC
Cases   Use OCC           10          57
        Do not use OCC    13          95
Confidence intervals for paired OR
• Calculating the confidence interval is a little more complicated.
• It is calculated using the square root of the McNemar χ² statistic instead of a standard error.
• The 95% confidence interval for the OR from paired data is:

\[ OR^{\,1 \pm 1.96 / \chi_{\text{paired}}} \]
Odds Ratio for paired data
• OR = 57 / 13 = 4.4
• χ²paired = 26.41
• χpaired = 5.14
• Then, the 95% confidence interval is:

\[ 4.4^{\,1 - 1.96/5.14} \text{ to } 4.4^{\,1 + 1.96/5.14} = 4.4^{0.62} \text{ to } 4.4^{1.38} = 2.5 \text{ to } 7.7 \]
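The steps for the thromboembolism example can be sketched in Python as follows. The function name `paired_odds_ratio_ci` is an assumption made for this illustration, and the code keeps the unrounded OR, so its limits may differ slightly in later decimals from a hand calculation that starts from a rounded OR.

```python
import math


def paired_odds_ratio_ci(r, s, z=1.96):
    """Matched-pairs odds ratio with a test-based confidence interval.

    r: pairs with the case exposed and the control not exposed
    s: pairs with the case not exposed and the control exposed
    The limits are OR ** (1 - z / chi) and OR ** (1 + z / chi), where
    chi is the square root of the McNemar statistic with continuity
    correction.
    """
    if s == 0:
        raise ValueError("s must be positive to compute r / s")
    odds_ratio = r / s
    chi = math.sqrt((abs(r - s) - 1) ** 2 / (r + s))
    if chi == 0:
        raise ValueError("chi is zero; the interval is undefined")
    lower = odds_ratio ** (1 - z / chi)
    upper = odds_ratio ** (1 + z / chi)
    return odds_ratio, lower, upper


# Oral contraceptive example: r = 57, s = 13
odds_ratio, lower, upper = paired_odds_ratio_ci(57, 13)
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```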