Appropriate techniques of statistical analysis

Anil C Mathew PhD
Professor of Biostatistics & General Secretary ISMS
PSG Institute of Medical Sciences and Research, Coimbatore 641 004

Types of studies
• Case study
• Case series
• Cross-sectional studies
• Case-control study
• Cohort study
• Randomized controlled trials
• Screening test evaluation

Data analysis: case series

Measures of averages
• Mean, median, mode
• Length of stay for 5 patients: 1, 3, 2, 4, 5 days
  Mean length of stay: 3 days
  Median length of stay: 3 days
  Mode length of stay: no mode (each value occurs only once)

Which is the best average?
Variable | Mean | Median | Mode
DBP | 81 | 79 | 76
Height | 180 | 180 | 180
SAL | 7.5 | 7.6 | 8.1

Data analysis: case series

Frequency distribution
RBC | Frequency | Relative frequency
5.95-7.95 | 1 | 0.029
7.95-9.95 | 8 | 0.229
9.95-11.95 | 14 | 0.400
11.95-13.95 | 9 | 0.257
13.95-15.95 | 2 | 0.057
15.95-17.95 | 1 | 0.029
Total | 35 | 1.000
Relative frequency = frequency / total (for example, 14/35 = 0.400).

Design of a cohort study
Direction of inquiry: forward in time. Start with people without the disease, classify them as exposed or not exposed, and follow both groups to see who develops the disease.
• Exposed: disease / no disease
• Not exposed: disease / no disease

Example: Is obesity associated with adverse pregnancy outcomes?
Women delivering singletons; the obese group had a body mass index > 30 (Ref: University of Udine, Italy, 2006).
Group | Preterm birth | No preterm birth | Total | % preterm
Obese | 16 | 35 | 51 | 31.4
Normal | 46 | 487 | 533 | 8.6
Relative risk: RR = 31.4% / 8.6% ≈ 3.65

Design of a case-control study
Direction of inquiry: backward in time. Start with people who have the disease (cases) and people who do not (controls), and look back to see whether each was exposed.
• Disease (cases): exposed / not exposed
• No disease (controls): exposed / not exposed

Results of a case-control study
Exposure | Lung cancer (D+) | No lung cancer (D-) | Total
Exposed (E+) | a = 80 | b = 30 | a + b = 110
Not exposed (E-) | c = 20 | d = 70 | c + d = 90
Total | a + c = 100 | b + d = 100 | 200

Analysis of the case-control study
Odds ratio = (a × d) / (b × c) = (80 × 70) / (30 × 20) = 5600 / 600 ≈ 9.3

Data analysis: screening test evaluation
Question: could plasma levels of breast carcinoma promoting factor (BCPF) be used to diagnose breast cancer? The test is called positive when BCPF > 150 units, and its results are compared with breast biopsy (the gold standard).
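The relative risk and odds ratio above are simple ratios of 2 × 2 cell counts. The Python sketch below reuses the cell layout of the case-control table for the cohort study: a and b are the exposed with and without the outcome, and c and d are the unexposed with and without it. That layout choice is an assumption, and the function names are illustrative rather than taken from the original study. Computed from the unrounded counts, the cohort RR is about 3.64, slightly below the 3.65 obtained from the rounded percentages.

```python
# Relative risk (cohort) and odds ratio (case-control) from 2 x 2 cell counts.
# Assumed cell layout (same as the case-control table):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Obesity cohort: obese 16 preterm / 35 not; normal 46 preterm / 487 not.
rr = relative_risk(16, 35, 46, 487)
# Lung-cancer case-control study: a = 80, b = 30, c = 20, d = 70.
or_value = odds_ratio(80, 30, 20, 70)

print(f"RR = {rr:.2f}")        # RR = 3.64
print(f"OR = {or_value:.2f}")  # OR = 9.33
```

An odds ratio is used for the case-control design because the number of cases and controls is fixed by the investigator, so the risk in each exposure group cannot be estimated directly.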
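The positive criterion turns each continuous BCPF value into a test result. Cross-tabulating those results against biopsy gives the 2 × 2 table that follows. The Python sketch below works under that reading. `TwoByTwo`, `tally` and the four sample values in the usage line are hypothetical illustrations, not data from the study. The strict `>` mirrors "BCPF > 150 units", so a value of exactly 150 counts as test negative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

THRESHOLD = 150.0  # positive criterion: BCPF > 150 units


@dataclass
class TwoByTwo:
    tp: int = 0  # test positive, biopsy positive
    fp: int = 0  # test positive, biopsy negative
    fn: int = 0  # test negative, biopsy positive
    tn: int = 0  # test negative, biopsy negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # P(T+ | D+)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # P(T- | D-)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)  # P(D+)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # P(D+ | T+)


def tally(pairs: Iterable[Tuple[float, bool]], threshold: float = THRESHOLD) -> TwoByTwo:
    """Cross-tabulate (BCPF value, biopsy positive?) pairs against the threshold."""
    table = TwoByTwo()
    for bcpf, biopsy_positive in pairs:
        test_positive = bcpf > threshold  # strictly greater than the cut-off
        if test_positive and biopsy_positive:
            table.tp += 1
        elif test_positive:
            table.fp += 1
        elif biopsy_positive:
            table.fn += 1
        else:
            table.tn += 1
    return table


# Hypothetical values, only to show the classification rule.
example = tally([(180.0, True), (150.0, True), (160.0, False), (90.0, False)])

# Counts reported for the BCPF study: TP = 570, FP = 150, FN = 30, TN = 850.
reported = TwoByTwo(tp=570, fp=150, fn=30, tn=850)
print(f"Sensitivity {reported.sensitivity():.0%}")  # Sensitivity 95%
print(f"Specificity {reported.specificity():.0%}")  # Specificity 85%
print(f"PPV {reported.ppv():.0%}")                  # PPV 79%
```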
BCPF test result vs. breast biopsy (the gold standard)
BCPF test | Biopsy D+ | Biopsy D- | Total
T+ | 570 | 150 | 720
T- | 30 | 850 | 880
Total | 600 | 1000 | 1600

TP = 570, FN = 30, FP = 150, TN = 850
Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 - sensitivity = 5%
False positive rate = 1 - specificity = 15%
Prevalence = P(D+) = 600/1600 ≈ 38%
Positive predictive value = P(D+ | T+) = 570/720 ≈ 79%

Trade-offs between sensitivity and specificity
• Favor high sensitivity when the consequences of missing a case are potentially grave.
• Favor high specificity when a false positive diagnosis may lead to risky treatment.

Data analysis: case series

Measures of variation
Group 1: 29, 30, 31
Group 2: 25, 30, 35
• Range: 2 in Group 1, 10 in Group 2
• Standard deviation (sample): 1 in Group 1, 5 in Group 2
Both groups have the same mean (30), so only a measure of variation reveals that Group 2 is more spread out.

Data analysis: analytical studies
• Tests of significance

Case study 1: Drug A and Drug B
• Aim: compare the efficacy of two drugs in lowering serum cholesterol levels
• Method: Drug A, 50 patients; Drug B, 50 patients
• Result: at the end of 6 months, the average serum cholesterol level is lower in patients receiving Drug B than in those receiving Drug A

What is the conclusion? Mark each statement as possible or not possible.
A) Drug B is superior to Drug A in lowering cholesterol levels.
B) Drug B is not superior to Drug A; instead, the difference may be due to chance.
C) The difference is not due to the drugs; uncontrolled differences other than treatment between the men receiving Drug A and those receiving Drug B account for it.
D) Drug A may have been selectively administered to patients whose serum cholesterol levels were more refractory to drug therapy.

An observed difference in a study can be due to:
1) Random chance
2) Biased comparison
3) Uncontrolled confounding variables

Solutions for A and B
• Test of significance: the p value
• P < 0.05 means that, if there were no true difference, a difference at least as large as the one observed would arise by chance less than 5% of the time.
• P < 0.01 means the same, with a probability of less than 1%.
• The p value does not tell you the magnitude of the difference.

Solutions for C and D
• Random allocation, then comparison of the baseline characteristics of the groups
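The averages from the length-of-stay example and the measures of variation for Groups 1 and 2 can be reproduced with Python's standard `statistics` module. The sketch assumes the sample standard deviation (n - 1 denominator), which the slides do not specify. The helper names `modes` and `describe` are hypothetical.

```python
import statistics


def modes(values):
    """Most frequent value(s); an empty list means 'no mode' (all values equally frequent)."""
    most_common = statistics.multimode(values)
    distinct = set(values)
    if len(distinct) > 1 and len(most_common) == len(distinct):
        return []
    return most_common


def describe(values):
    """Averages and spread for one sample; SD uses the n - 1 denominator."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": modes(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


length_of_stay = [1, 3, 2, 4, 5]  # days, from the case-series example
group_1 = [29, 30, 31]
group_2 = [25, 30, 35]

for name, data in [("Length of stay", length_of_stay), ("Group 1", group_1), ("Group 2", group_2)]:
    print(name, describe(data))
```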
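Random allocation for a two-drug comparison like Case study 1 could be sketched as below: shuffle the patient list once and split it in half, so each arm receives 50 of 100 patients. This is an illustrative assumption, a single shuffle-and-split rather than the stratified or block randomization a real trial might use. The function name `randomize` and the patient IDs are hypothetical.

```python
import random


def randomize(patient_ids, seed=None):
    """Simple 1:1 random allocation: shuffle once, then split the list in half.

    With an odd number of patients, arm "B" receives the extra patient.
    """
    rng = random.Random(seed)  # a recorded seed makes the allocation reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}


# 100 hypothetical patient IDs, 50 per drug as in Case study 1.
arms = randomize(range(1, 101), seed=2024)
print(len(arms["A"]), len(arms["B"]))  # 50 50
```

Because chance alone decides who gets which drug, known and unknown patient characteristics should be balanced on average. Comparing baseline characteristics afterwards checks that this happened in the particular sample.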
[Figure 1]

Table 1. Baseline characteristics
Characteristic | Vitamin group (n = 141) | Placebo group (n = 142)
Mean age ± SD, y | 28.9 ± 6.4 | 29.8 ± 5.6
Smokers, n (%) | 22 (15.6) | 14 (9.9)
Mean body mass index ± SD, kg/m2 | 25.3 ± 6.0 | 25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic | 112 ± 15 | 110 ± 12
  Diastolic | 67 ± 11 | 68 ± 10
Parity, n (%)
  0 | 91 (65) | 87 (61)
  1 | 39 (28) | 42 (30)
  2 | 9 (6) | 8 (6)
  >2 | 2 (1) | 5 (4)
Coexisting disease, n (%)
  Essential hypertension | 10 (7) | 7 (5)
  Lupus/antiphospholipid syndrome | 4 (3) | 1 (1)
  Diabetes | 2 (1) | 3 (2)

"t" test
H0: There is no difference in mean birth weight between children from HSE and LSE in the population.

Critical ratio:
$$ t = \frac{|\bar{X}_1 - \bar{X}_2|}{SD\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where SD is the pooled standard deviation:
$$ SD = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}} $$

With n1 = 15, n2 = 10, SD1 = 0.27, SD2 = 0.22:
$$ SD = \sqrt{\frac{14 \times 0.27^2 + 9 \times 0.22^2}{23}} \approx 0.25 $$
$$ t = \frac{|2.91 - 2.26|}{0.25\sqrt{\frac{1}{15} + \frac{1}{10}}} \approx 6.37 $$
Degrees of freedom: DF = n1 + n2 - 2 = 23.
Because the calculated t is greater than the table value for 23 DF (2.07 at the 5% level, two-sided), reject H0.

General steps in hypothesis testing
1) State the hypothesis to be tested
2) Select a sample and collect data
3) Calculate the test statistic
4) Evaluate the evidence against the null hypothesis
5) State the conclusion

Commonly used statistical tests
• t test: compare two mean values
• Analysis of variance: compare more than two mean values
• Chi-square test: compare two proportions
• Correlation coefficient: relationship between two continuous variables

Data entry format (one row per patient, one column per variable)
Treatment | Age | Weight | Diabetes | Painscore-b | Painscore-a | Vomiting
1 | 21 | 50 | 1 | 9 | 6 | 0
1 | 24 | 53 | 0 | 10 | 9 | 0
1 | 25 | 55 | 1 | 9 | 9 | 1
1 | 28 | 50 | 0 | 10 | 6 | 1
1 | 29 | 60 | 0 | 10 | 5 | 0
1 | 20 | 65 | 0 | 10 | 8 | 0
0 | 26 | 60 | 0 | 9 | 9 | 0
0 | 25 | 90 | 1 | 9 | 9 | 1
0 | 24 | 80 | 1 | 9 | 9 | 1
0 | 28 | 89 | 0 | 10 | 8 | 1
0 | 22 | 86 | 1 | 10 | 9 | 1
0 | 22 | 45 | 0 | 10 | 9 | 0

Example: t test
Body temperature (°C) | Simple febrile seizure (N = 25) | Febrile without seizure (N = 25) | P value
Mean | 39.01 | 38.64 | P < 0.001
SD | 0.56 | 0.45 |

Example: analysis of variance
• Serum zinc level in patients with simple febrile seizure, by duration of the seizure
Duration (min) | n | Mean | SD | P value
<5 | 3 | 10.27 | 0.25 | P < 0.001
5 to 10 | 18 | 9.02 | 0.81 |
>10 | 4 | 6.90 | 0.98 |

Example: chi-square test
• Characteristics of patients in the two groups
Duration of fever (hours) | Simple febrile seizure | Febrile without seizure | P value
< 24 | 16 | 6 | P < 0.05
More than 24 | 9 | 19 |

Example: correlation
• We found a negative correlation between serum zinc level and simple febrile seizure events (r = -0.86, p < 0.001).

Type 1 and Type 2 errors
Decision | H0 true | H0 false (H1 true)
Accept H0 | Correct decision | Type 2 error, β = P(Type 2 error)
Reject H0 | Type 1 error, α = P(Type 1 error) | Correct decision
Power = 1 - β

Multivariate problem
• Main outcome is a continuous variable: linear regression
• Main outcome is a dichotomous variable: logistic regression

Bradford Hill's questions
• Introduction: Why did you start?
• Methods: What did you do?
• Results: What did you find?
• Discussion: What does it mean?

How to begin writing?
1) Data tables
2) Methods, Results
3) Introduction, Discussion
4) Abstract
5) Title, key words, references
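The pooled-variance t test worked through on the HSE/LSE birth-weight slide can be checked in a few lines of Python. The sketch assumes the first group is the one with n = 15 and mean 2.91, as in the slide's arithmetic. The function name is hypothetical. If the unrounded pooled SD (about 0.2516) is carried through, t comes out near 6.33 rather than the 6.37 obtained with SD rounded to 0.25. Either way it is far above the critical value for 23 DF.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Critical ratio t = |mean1 - mean2| / (SD * sqrt(1/n1 + 1/n2)) with pooled SD."""
    df = n1 + n2 - 2
    # Pooled SD: weighted average of the two sample variances, then square root.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = abs(mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df


# Birth-weight example: group 1 (n = 15, mean 2.91) vs group 2 (n = 10, mean 2.26).
t, sd, df = pooled_t_test(2.91, 0.27, 15, 2.26, 0.22, 10)
print(f"pooled SD = {sd:.3f}, t = {t:.2f}, DF = {df}")  # pooled SD = 0.252, t = 6.33, DF = 23
```

SciPy offers the same pooled test from summary statistics, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)`, which also returns a p value.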
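The serum zinc analysis of variance can be recomputed from the group sizes, means and SDs alone. It uses the standard between-group and within-group sums of squares, assuming the reported SDs are sample SDs. This sketch, with a hypothetical function name, gives F ≈ 16.9 on (2, 22) degrees of freedom. Turning F into a p value needs an F table or a statistics library.

```python
def anova_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) per group; returns (F, df_between, df_within)."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares recovered from each group's SD: (n - 1) * SD^2.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Serum zinc by seizure duration: <5 min, 5 to 10 min, >10 min.
zinc = [(3, 10.27, 0.25), (18, 9.02, 0.81), (4, 6.90, 0.98)]
f, df_b, df_w = anova_from_summary(zinc)
print(f"F({df_b}, {df_w}) = {f:.1f}")  # F(2, 22) = 16.9
```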
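The fever-duration comparison is a 2 × 2 table, so its chi-square statistic can be computed by hand. Each expected count is row total × column total / N. The slide does not say whether Yates' continuity correction was applied, so this sketch (hypothetical function name) supports both. Without the correction the statistic is about 8.12, and with it about 6.57. Both exceed 3.84, the 5% critical value for 1 degree of freedom, which agrees with the reported P < 0.05.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (1 degree of freedom)."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff ** 2 / expected
    return chi2


# Rows: fever < 24 h, > 24 h; columns: simple febrile seizure, febrile without seizure.
plain = chi_square_2x2(16, 6, 9, 19)
corrected = chi_square_2x2(16, 6, 9, 19, yates=True)
print(f"chi-square = {plain:.2f} (Yates: {corrected:.2f}); 5% critical value, 1 df: 3.84")
```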