6. Dr.Anil Analysis - karpagam faculty of medical sciences and

Appropriate techniques of statistical analysis
Anil C Mathew PhD
Professor of Biostatistics & General Secretary ISMS
PSG Institute of Medical Sciences and Research
Coimbatore 641 004
Types of studies
• Case study
• Case series
• Cross sectional studies
• Case control study
• Cohort study
• Randomized controlled trials
• Screening test evaluation
Data analysis-Case series
Measures of averages
• Mean, Median, Mode
• Length of stay for 5 patients: 1, 3, 2, 4, 5 days
  Mean length of stay: 3 days
  Median length of stay: 3 days
  Mode length of stay: no mode
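The three averages for the length-of-stay example can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
from statistics import mean, median, multimode

# Length of stay (days) for the five patients in the example
length_of_stay = [1, 3, 2, 4, 5]

avg = mean(length_of_stay)
mid = median(length_of_stay)

# multimode returns every most-frequent value; if all values are
# equally frequent, no single value stands out as the mode
modes = multimode(length_of_stay)
has_mode = len(modes) < len(set(length_of_stay))

print("mean:", avg)
print("median:", mid)
print("mode:", modes if has_mode else "no mode")
```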
Which is the best average
Variable   Mean   Median   Mode
DBP        81     79       76
Height     180    180      180
SAL        7.5    7.6      8.1
Data analysis-case series
• Frequency distribution
RBC            Frequency   Relative frequency
5.95-7.95      1           0.029
7.95-9.95      8           0.229
9.95-11.95     14          0.400
11.95-13.95    9           0.257
13.95-15.95    2           0.057
15.95-17.95    1           0.029
Total          35          1.000
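Each relative frequency is the class frequency divided by the total number of observations. A short sketch of that arithmetic, using the RBC class counts from the table (the variable names are illustrative):

```python
# Class intervals and frequencies from the RBC frequency distribution
rbc_classes = ["5.95-7.95", "7.95-9.95", "9.95-11.95",
               "11.95-13.95", "13.95-15.95", "15.95-17.95"]
counts = [1, 8, 14, 9, 2, 1]

total = sum(counts)
# Relative frequency = class frequency / total, rounded to 3 decimals
relative = [round(c / total, 3) for c in counts]

for label, c, r in zip(rbc_classes, counts, relative):
    print(f"{label:>12}  {c:>3}  {r:.3f}")
print(f"{'Total':>12}  {total:>3}")
```

Because each value is rounded separately, the printed relative frequencies may not sum to exactly 1.000.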
Design of Cohort Study
Direction of inquiry follows time:
Population: people without the disease
    Exposed      ->  disease / no disease
    Not Exposed  ->  disease / no disease
Is obesity associated with adverse pregnancy outcomes?
Women with a Body Mass Index > 30 delivering singletons.
Ref- University of Udine, Italy, 2006

          Preterm birth   No preterm birth   Total   % preterm
Obese     16              35                 51      31.4
Normal    46              487                533     8.6

RR = 31.4 / 8.6 ≈ 3.65
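The relative risk is the risk of preterm birth in the exposed (obese) group divided by the risk in the unexposed (normal) group. A minimal sketch with the counts from this example follows; the function name is illustrative. Working from the unrounded risks gives a value marginally below the figure obtained from the rounded percentages.

```python
def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total):
    """Return (risk in exposed, risk in unexposed, relative risk)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed

# Obese: 16 preterm births out of 51; Normal: 46 out of 533
risk_obese, risk_normal, rr = relative_risk(16, 51, 46, 533)

print(f"risk (obese):  {risk_obese:.1%}")
print(f"risk (normal): {risk_normal:.1%}")
print(f"relative risk: {rr:.2f}")
```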
Design of Case Control Study
Disease:     Exposed / Not Exposed
No Disease:  Exposed / Not Exposed
Results of a Case Control Study
                    Lung Cancer (D+)   No Lung Cancer (D-)   Totals
Exposed (E+)        80 (a)             30 (b)                a + b = 110
Non exposed (E-)    20 (c)             70 (d)                c + d = 90
Totals              100 (a + c)        100 (b + d)           200
Analysis of Case-control study
Odds ratio = (a × d) / (b × c) = (80 × 70) / (30 × 20) = 9.3
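The odds ratio is the cross-product ratio of the 2 × 2 table. A small sketch using the cell labels a, b, c, d from the lung cancer example (the function name is illustrative):

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table.

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    """
    return (a * d) / (b * c)

# Lung cancer example: a = 80, b = 30, c = 20, d = 70
or_lung = odds_ratio(80, 30, 20, 70)
print(f"odds ratio: {or_lung:.1f}")
```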
Data Analysis-Screening Test Evaluation
Could the plasma levels of BCPF (Breast Carcinoma Promoting Factor) be used to diagnose breast cancer?
Positive criterion of BCPF > 150 units vs. breast biopsy (the gold standard)

BCPF Test   D+     D-      Total
T+          570    150     720
T-          30     850     880
Total       600    1000    1600
TP = 570, FN = 30, FP = 150, TN = 850
Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 – sensitivity = 5%
False positive rate = 1 – specificity = 15%
Prevalence = P(D+) = 600/1600 = 38%
Positive predictive value = P(D+ | T+) = 570/720 = 79%
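The screening measures all follow from the four cell counts. The sketch below recomputes them for the BCPF example; the variable names are illustrative, and the negative predictive value is an extra quantity computed from the same table that does not appear on the slide.

```python
# Cell counts from the BCPF vs. biopsy table
tp, fn, fp, tn = 570, 30, 150, 850

sensitivity = tp / (tp + fn)            # P(T+ | D+)
specificity = tn / (tn + fp)            # P(T- | D-)
false_negative_rate = 1 - sensitivity
false_positive_rate = 1 - specificity
prevalence = (tp + fn) / (tp + fn + fp + tn)
ppv = tp / (tp + fp)                    # P(D+ | T+)
npv = tn / (tn + fn)                    # P(D- | T-), not on the slide

for name, value in [("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("prevalence", prevalence),
                    ("positive predictive value", ppv),
                    ("negative predictive value", npv)]:
    print(f"{name}: {value:.1%}")
```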
Tradeoffs between sensitivity and specificity
• Favour high sensitivity when the consequences of missing a case are potentially grave
• Favour high specificity when a false positive diagnosis may lead to risky treatment
Data analysis-case series
Measures of variation
Group 1   Group 2
29        25
30        30
31        35
• Range
• Standard deviation
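The two groups have the same mean (30) but very different spread, which is what the range and the standard deviation capture. A brief sketch with the standard `statistics` module, using the sample (n - 1) standard deviation; the helper name is illustrative:

```python
from statistics import mean, stdev

group1 = [29, 30, 31]
group2 = [25, 30, 35]

def spread(values):
    """Return (range, sample standard deviation) of the values."""
    return max(values) - min(values), stdev(values)

range1, sd1 = spread(group1)
range2, sd2 = spread(group2)

print("Group 1: mean", mean(group1), "range", range1, "SD", sd1)
print("Group 2: mean", mean(group2), "range", range2, "SD", sd2)
```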
Data analysis- Analytical studies
• Tests of significance
Case Study 1: Drug A and Drug B
• Aim: Compare the efficacy of two drugs in lowering serum cholesterol levels
• Method: Drug A – 50 patients; Drug B – 50 patients
• Result: At the end of 6 months, the average serum cholesterol level is lower in those receiving drug B than in those receiving drug A
What is the Conclusion?
A) Drug B is superior to Drug A in lowering cholesterol levels:
Possible/Not possible
B) Drug B is not superior to Drug A; the difference may be due to chance:
Possible/Not possible
C) The difference is not due to the drug; uncontrolled differences other than treatment between the samples of patients receiving drug A and drug B account for it:
Possible/Not possible
D) Drug A may have been selectively administered to patients whose serum cholesterol levels were more refractory to drug therapy:
Possible/Not possible
Observed difference in a study can be due to
1) Random chance
2) Biased comparison
3) Uncontrolled confounding variables
Solutions: A and B
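• Test of Significance – p value
• P<0.05 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance less than 5% of the time
• P<0.01 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance less than 1% of the time
• The P value does not tell us the magnitude of the difference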
Solutions: C and D
• Allocate treatments at random and compare the baseline characteristics of the groups
Figure 1
Table 1-Baseline Characteristics

Characteristic                        Vitamin group (n = 141)   Placebo group (n = 142)
Mean age ± SD, y                      28.9 ± 6.4                29.8 ± 5.6
Smokers, n (%)                        22 (15.6)                 14 (9.9)
Mean body mass index ± SD, kg/m2      25.3 ± 6.0                25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic                            112 ± 15                  110 ± 12
  Diastolic                           67 ± 11                   68 ± 10
Parity, n (%)
  0                                   91 (65)                   87 (61)
  1                                   39 (28)                   42 (30)
  2                                   9 (6)                     8 (6)
  >2                                  2 (1)                     5 (4)
Coexisting disease, n (%)
  Essential hypertension              10 (7)                    7 (5)
  Lupus/antiphospholipid syndrome     4 (3)                     1 (1)
  Diabetes                            2 (1)                     3 (2)
“t” Test
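Ho: There is no difference in mean birth weight of children from HSE and LSE in the population

Critical ratio:
$$ CR = t = \frac{|\bar{X}_1 - \bar{X}_2|}{SD \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

Pooled standard deviation:
$$ SD = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}} $$

$$ SD = \sqrt{\frac{14 \times 0.27^2 + 9 \times 0.22^2}{23}} = 0.25 $$

$$ t = \frac{|2.91 - 2.26|}{0.25 \sqrt{\frac{1}{15} + \frac{1}{10}}} = 6.36 $$

DF = n1 + n2 – 2 = 23
Calculated t > table value: reject Ho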
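The pooled-SD t test above can be reproduced from the summary statistics alone. The sketch below assumes group 1 is the sample with n = 15, SD = 0.27 and mean 2.91, and group 2 the sample with n = 10, SD = 0.22 and mean 2.26, as the worked figures suggest; the function name is illustrative. Because it carries the pooled SD unrounded, the t value comes out slightly below the hand calculation, which rounds the SD to 0.25 first; the conclusion is unchanged.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled standard deviation."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = abs(mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return pooled_sd, t, df

pooled_sd, t, df = pooled_t(2.91, 0.27, 15, 2.26, 0.22, 10)

# 2.069 is the standard two-sided 5% table value of t for 23 DF
T_TABLE_5PCT_DF23 = 2.069

print(f"pooled SD = {pooled_sd:.2f}, t = {t:.2f}, DF = {df}")
print("reject Ho" if t > T_TABLE_5PCT_DF23 else "do not reject Ho")
```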
GENERAL STEPS IN HYPOTHESIS TESTING
1) State the hypothesis to be tested
2) Select a sample and collect data
3) Calculate the test statistic
4) Evaluate the evidence against the null hypothesis
5) State the conclusion
Commonly used statistical tests
• T test-Compare two mean values
• Analysis of variance-Compare more than two mean values
• Chi square test-Compare two proportions
• Correlation coefficient-Relationship of two continuous variables
Data entry format
Treatment   Age   Weight   Diabetes   Painscore-b   Painscore-a   Vomiting
1           21    50       1          9             6             0
1           24    53       0          10            9             0
1           25    55       1          9             9             1
1           28    50       0          10            6             1
1           29    60       0          10            5             0
1           20    65       0          10            8             0
0           26    60       0          9             9             0
0           25    90       1          9             9             1
0           24    80       1          9             9             1
0           28    89       0          10            8             1
0           22    86       1          10            9             1
0           22    45       0          10            9             0
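Data entered in this format (one row per patient, one column per variable, categories coded as numbers) can be read directly by analysis software. The sketch below loads the same twelve rows with Python's `csv` module and summarises the drop in pain score per treatment group. It assumes Painscore-b and Painscore-a mean "before" and "after", which the slide does not state; the underscore column names are illustrative renamings.

```python
import csv
from io import StringIO
from statistics import mean

# The twelve patients from the data entry example, as CSV text
raw = """Treatment,Age,Weight,Diabetes,Painscore_b,Painscore_a,Vomiting
1,21,50,1,9,6,0
1,24,53,0,10,9,0
1,25,55,1,9,9,1
1,28,50,0,10,6,1
1,29,60,0,10,5,0
1,20,65,0,10,8,0
0,26,60,0,9,9,0
0,25,90,1,9,9,1
0,24,80,1,9,9,1
0,28,89,0,10,8,1
0,22,86,1,10,9,1
0,22,45,0,10,9,0
"""

# One dict per patient, every field converted to an integer
rows = [{k: int(v) for k, v in r.items()}
        for r in csv.DictReader(StringIO(raw))]

def mean_pain_drop(treatment):
    """Mean of (Painscore_b - Painscore_a) for one treatment code.

    Assumes b = before and a = after; this is not stated on the slide.
    """
    return mean(r["Painscore_b"] - r["Painscore_a"]
                for r in rows if r["Treatment"] == treatment)

for code in (1, 0):
    print(f"Treatment {code}: mean drop = {mean_pain_drop(code):.2f}")
```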
Example t test
Body temperature (°C)   Simple febrile seizure (N = 25)   Febrile without seizure (N = 25)   P value
Mean                    39.01                             38.64                              P<0.001
SD                      0.56                              0.45
Example-Analysis of variance
• Serum zinc level in simple febrile seizure patients, by duration of seizure

Duration (min)   n    Mean    SD     P value
<5               3    10.27   0.25   P<0.001
5 to 10          18   9.02    0.81
>10              4    6.90    0.98
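A one-way analysis of variance compares the variation between group means with the variation within groups. The F statistic can be computed from the n, mean and SD of each group, as in the sketch below for the serum zinc example; the function name is illustrative, and this is not necessarily how the original analysis was run.

```python
# (n, mean, SD) for seizure duration <5, 5 to 10 and >10 minutes
groups = [(3, 10.27, 0.25), (18, 9.02, 0.81), (4, 6.90, 0.98)]

def anova_f(groups):
    """One-way ANOVA F statistic from per-group (n, mean, SD)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group and within-group sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_stat, df_between, df_within = anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The F value is then compared with the table value of F for the same degrees of freedom to obtain the P value.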
Example Chi-square test
• Characteristics of patients in the two groups

Duration of fever (hour)   Simple febrile seizure   Febrile without seizure   P value
< 24                       16                       6                         P<0.05
More than 24               9                        19
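For a 2 × 2 table like this one, the chi-square statistic has a shortcut formula in terms of the four cell counts. The sketch below applies it without a continuity correction, which is an assumption since the slide does not say whether one was used; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    Table layout:   a  b
                    c  d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))

# Fever < 24 h: 16 with seizure, 6 without; > 24 h: 9 with, 19 without
chi2 = chi_square_2x2(16, 6, 9, 19)

# 3.84 is the standard 5% table value of chi-square for 1 DF
CHI2_TABLE_5PCT_DF1 = 3.84

print(f"chi-square = {chi2:.2f}")
print("P < 0.05" if chi2 > CHI2_TABLE_5PCT_DF1 else "P >= 0.05")
```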
Example Correlation
• We found a negative correlation between serum zinc level and simple febrile seizure event (r = -0.86, P<0.001)
Type 1 and Type 2 Errors
             Ho True                    Ho False / H1 True
Accept Ho    Correct decision           Type 2 error
                                        β = P (Type 2 error)
Reject Ho    Type 1 error               Correct decision
             α = P (Type 1 error)       Power = 1 - β
Multivariate problem
• Main outcome
• Continuous variable-Linear regression
• Dichotomous variable-Logistic regression
Bradford Hill's Questions
• Introduction- Why did you start?
• Methods- What did you do?
• Results- What did you find?
• Discussion- What does it mean?
How to begin writing?
• Data -> Tables -> Methods, Results -> Introduction, Discussion -> Abstract -> Title, Key words, References
Thank you