Boda-boda riders
MAB7203 Conditional Logistics Assignment
(Instructor: Nazarius Mbona Tumwesigye)
Robert Serunjogi (2024/HD07/22352U)
2025-04-07

Table of contents
1 Conditional Logistics
  1.1 Boxplot by Age
  1.2 Characteristics table
  1.3 Conditional Logit Model
  1.4 Model diagnostics
    1.4.1 Wald’s test
    1.4.2 Areas under the curve
    1.4.3 Residuals
  1.5 Conditional logistic table
  1.6 McNemar’s Chi-squared Test

1 Conditional Logistics

1.1 Boxplot by Age

The controls (median age = 30) were significantly older than the cases (median age = 28).

1.2 Characteristics table

Characteristic          Control, N = 289¹   Case, N = 289¹   Overall, N = 578¹   p-value²
Marital Status                                                                   <0.001
  DIVORCED/SEPARATED    4 (1.4%)            20 (6.9%)        24 (4.2%)
  MARRIED/COHABITING    235 (81.3%)         191 (66.1%)      426 (73.7%)
  SINGLE/WIDOWED        50 (17.3%)          78 (27.0%)       128 (22.1%)
Age Group                                                                        <0.001
  25-29                 99 (34.3%)          86 (29.8%)       185 (32.0%)
  18-24                 39 (13.5%)          79 (27.3%)       118 (20.4%)
  30-34                 64 (22.1%)          77 (26.6%)       141 (24.4%)
  35-39                 48 (16.6%)          29 (10.0%)       77 (13.3%)
  40+                   39 (13.5%)          18 (6.2%)        57 (9.9%)
Takes Alcohol           52 (18.0%)          86 (29.8%)       138 (23.9%)         <0.001
drugs                   2 (0.9%)            5 (1.7%)         7 (1.4%)            0.7
  (Missing)             75                  0                75
cargo                   12 (5.7%)           10 (3.5%)        22 (4.4%)           0.2
  (Missing)             77                  3                80
Passengers                                                                       0.004
  1 Passenger           135 (63.4%)         145 (50.3%)      280 (55.9%)
  2+ Passengers         78 (36.6%)          143 (49.7%)      221 (44.1%)
  (Missing)             76                  1                77
¹ n (%)
² Pearson’s Chi-squared test; Fisher’s exact test

1.3 Conditional Logit Model

library(survival)  # provides clogit() and strata()

boda_clr <- clogit(casecont ~ takealcohol + agegrp + strata(matchedcase),
                   data = bodaboda)

1.4 Model diagnostics

1.4.1 Wald’s test

Table 1: Wald Test Results

Characteristic    p-value
takealcoholYes    <0.001
agegrp18-24       <0.001
agegrp30-34       0.055
agegrp35-39       0.375
agegrp40+         0.053

1.4.2 Areas under the curve

library(pROC)  # provides roc() and auc()

# Get actual outcome variable (0/1) and predicted values
actual_outcome <- bodaboda$casecont                  # assuming this is the 0/1 outcome
predicted_scores <- predict(boda_clr, type = "lp")   # linear predictor (log odds)

# ROC curve
roc_curve_boda_clr <- roc(actual_outcome, predicted_scores)

Setting levels: control = 0, case = 1
Setting direction: controls < cases

# Plot
plot(roc_curve_boda_clr, main = "ROC Curve for Clogit Model")
text(0.6, 0.2, labels = paste("AUC =", round(auc(roc_curve_boda_clr), 3)))

# Print AUC
auc_value <- auc(roc_curve_boda_clr)
print(auc_value)

Area under the curve: 0.7092

AUC = 0.71 suggests that the model has acceptable discrimination ability.
It means there is a 71% chance that the model will assign a higher predicted score to a randomly chosen case than to a randomly chosen control.

1.4.3 Residuals

library(ggplot2)
library(tidyquant)  # provides theme_tq()

# Compute deviance residuals from the fitted model
boda_clr_residuals <- residuals(boda_clr, type = "deviance")
# Extract fitted values from the fitted model
boda_clr_values <- fitted(boda_clr)

# Create residual plot
ggplot(data = data.frame(boda_clr_values, boda_clr_residuals),
       aes(x = boda_clr_values, y = boda_clr_residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red") +
  labs(x = "Fitted Values",
       y = "Residuals",
       title = "Residual Plot for the Model") +
  theme_tq()

The model appears to fit the data reasonably well, as residuals are mostly randomly scattered around zero with no strong systematic patterns.

1.5 Conditional logistic table

library(gtsummary)  # provides tbl_regression()

tbl_boda_clr <- tbl_regression(boda_clr,
                               exponentiate = TRUE,
                               label = takealcohol ~ "Takes Alcohol")
tbl_boda_clr

Characteristic    OR      95% CI        p-value
Takes Alcohol
  No              —       —
  Yes             2.29    1.47, 3.55    <0.001
Age Group
  25-29           —       —
  18-24           2.79    1.63, 4.78    <0.001
  30-34           1.59    0.99, 2.55    0.055
  35-39           0.78    0.44, 1.36    0.4
  40+             0.53    0.28, 1.01    0.053
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

The reference group for age was 25-29.

Adjusting for age group, men who consumed alcohol had 2.29 times the odds of an accident compared with men who did not drink alcohol (OR = 2.29, 95% CI: 1.47, 3.55, p < 0.001). This result was statistically significant and indicates a strong positive association between alcohol consumption and the likelihood of accidents.
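
The p-values in Table 1 and the confidence intervals in the conditional logistic table are two views of the same Wald statistic. As a rough check, the sketch below back-calculates the log odds ratio, its standard error and the Wald p-value from the rounded OR and CI values reported above. The helper name `wald_from_ci` is hypothetical, and because the inputs are rounded to two decimals the results only approximate the model output.

```r
# Back-calculate Wald statistics from a reported odds ratio and its CI.
# wald_from_ci is a hypothetical helper, not part of any package.
wald_from_ci <- function(or, lower, upper, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)                 # about 1.96 for a 95% CI
  log_or <- log(or)                                    # coefficient on the log-odds scale
  se     <- (log(upper) - log(lower)) / (2 * z_crit)   # CI is symmetric on the log scale
  z      <- log_or / se                                # Wald z statistic
  p      <- 2 * pnorm(-abs(z))                         # two-sided p-value
  data.frame(log_or = log_or, se = se, z = z, p = p)
}

# Rounded values copied from the conditional logistic table
reported <- data.frame(
  term  = c("takealcoholYes", "agegrp18-24", "agegrp30-34",
            "agegrp35-39", "agegrp40+"),
  or    = c(2.29, 2.79, 1.59, 0.78, 0.53),
  lower = c(1.47, 1.63, 0.99, 0.44, 0.28),
  upper = c(3.55, 4.78, 2.55, 1.36, 1.01)
)

wald <- data.frame(term = reported$term,
                   wald_from_ci(reported$or, reported$lower, reported$upper))
print(wald, digits = 3)
```

The recovered p-values agree with Table 1 up to rounding, which shows that a confidence interval that only just includes 1 (for example 0.28 to 1.01 for the 40+ group) corresponds to a p-value just above 0.05.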

1.6 McNemar’s Chi-squared Test

library(dplyr)
library(tidyr)
library(exact2x2)  # provides mcnemar.exact() and exact2x2()

bb_cc <- bodaboda %>%
  select(casecont, alcohol, matchedcase)

# Reshape the dataset from long to wide format (one row per matched pair)
bb_wide_data <- bb_cc %>%
  pivot_wider(names_from = casecont,
              values_from = alcohol,
              names_prefix = "alcohol")

# Create a 2x2 contingency table (rows: cases, columns: controls)
contingency_table <- table(bb_wide_data$alcohol1, bb_wide_data$alcohol0)

# Perform McNemar's test
result1 <- mcnemar.test(contingency_table, correct = TRUE)   # with continuity correction
result2 <- mcnemar.test(contingency_table, correct = FALSE)  # without continuity correction
result3 <- mcnemar.exact(contingency_table)                  # exact test
# paired = TRUE requests the exact McNemar test; the default is Fisher's exact test
exact_mcnemar_result <- exact2x2(contingency_table, paired = TRUE)

mcc(table = table(bb_wide_data$alcohol0, bb_wide_data$alcohol1))

$data
            Cases
Controls    Unexposed  Exposed  Total
  Unexposed       168       69    237
  Exposed          35       17     52
  Total           203       86    289

$mcnemar_chi2

McNemar's Chi-squared test

data: mcc_table
McNemar's chi-squared = 11.115, df = 1, p-value = 0.0008561

$mcnemar_exact_p
Exact McNemar significance probability
                           0.001108621

$proportions
Proportion with factor
    Cases  Controls
0.8200692 0.7024221

$statistics
             estimate [95% CI]
statistic     estimate      lower      upper
  difference  0.1176471  0.04636803  0.1889261
  ratio       1.1674877  1.06580258  1.2788743
  rel. diff.  0.3953488  0.21462362  0.5760741
  odds ratio  1.9714286  1.29443899  3.0515292

McNemar’s Chi-squared Test: McNemar’s test compares paired proportions. Here it tests whether the proportion exposed to alcohol differs between cases and their matched controls, using only the 104 discordant pairs (69 pairs in which only the case was exposed and 35 in which only the control was exposed). The small p-value (0.0008561) indicates a statistically significant difference in exposure status between cases and controls: cases were significantly more likely than controls to have been exposed to alcohol. The exact McNemar test, which is more accurate for small sample sizes, gives p = 0.0011. This is also statistically significant and confirms the finding from the chi-squared test.

From the table margins, 86 of 289 cases (29.8%) were exposed compared with 52 of 289 controls (18.0%), so the proportion exposed is higher in cases by 11.76 percentage points (95% CI: 4.64, 18.89). The confidence interval shows the range where the true difference likely lies. The values printed under $proportions (0.8201 and 0.7024) equal 237/289 and 203/289, which are the unexposed margins of the table, so they should not be read as the proportions exposed. The same applies to the printed ratio of proportions, 1.1675 (95% CI: 1.0658, 1.2789), which is the ratio of those two margins. The odds of exposure among cases were 1.9714 times (95% CI: 1.2944, 3.0515) the odds of exposure among controls. This matched odds ratio is the ratio of the discordant pairs (69/35) and is a key measure of association in case-control studies.
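
For 1:1 matched pairs, the chi-squared statistic, the exact p-value and the odds ratio above depend only on the discordant pairs. The following sketch recomputes them in base R from the 2x2 table printed under $data. The object names are illustrative, and the exact binomial interval for the odds ratio is an assumption about how such an interval can be obtained, not a statement of what `mcc` does internally.

```r
# Paired 2x2 table copied from the $data output (rows: controls, columns: cases)
tab <- matrix(c(168, 35, 69, 17), nrow = 2,
              dimnames = list(Controls = c("Unexposed", "Exposed"),
                              Cases    = c("Unexposed", "Exposed")))

n_case_only    <- tab["Unexposed", "Exposed"]   # pairs where only the case was exposed
n_control_only <- tab["Exposed", "Unexposed"]   # pairs where only the control was exposed
n_discordant   <- n_case_only + n_control_only

# McNemar chi-squared statistic without continuity correction
chi2   <- (n_case_only - n_control_only)^2 / n_discordant
p_chi2 <- pchisq(chi2, df = 1, lower.tail = FALSE)

# Exact McNemar test: a binomial test on the discordant pairs
exact <- binom.test(n_case_only, n_discordant, p = 0.5)

# Matched odds ratio, with an interval derived from the exact binomial CI
or_matched <- n_case_only / n_control_only
or_ci      <- as.numeric(exact$conf.int) / (1 - as.numeric(exact$conf.int))

# Marginal proportions exposed among cases and among controls
p_cases    <- sum(tab[, "Exposed"]) / sum(tab)
p_controls <- sum(tab["Exposed", ]) / sum(tab)

round(c(chi2 = chi2, p_chi2 = p_chi2, p_exact = exact$p.value,
        or = or_matched, p_cases = p_cases, p_controls = p_controls), 4)
```

With these counts the statistic is (69 - 35)^2 / 104 = 11.115 and the matched odds ratio is 69/35 = 1.97, which matches the output above. The 17 pairs in which both riders were exposed and the 168 pairs in which neither was exposed do not contribute to either quantity.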