Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology

- Types of categorical analysis
- Steps to analysis

Choosing a statistical test

Dependent variable      Independent variable  Number of groups in independent variable  Parametric test  Non-parametric test
Numerical (one)         -                     -                                         One sample t     Sign test
Numerical (one)         Categorical           2 groups (independent)                    Independent t    Mann Whitney
Numerical (one)         Categorical           2 groups (dependent)                      Paired t         Signed rank test
Numerical (one)         Categorical           > 2 groups (independent)                  One way ANOVA    Kruskal Wallis
Categorical (2 groups)  Categorical           2 groups (independent)                    -                Chi-square test, Fisher exact test
Categorical (2 groups)  Categorical           2 groups (dependent)                      -                McNemar test

Categorical data analysis

Categorical data analysis deals with discrete data that can be organized into categories. The data are organized into a contingency table.

Data                                          Statistical test
One proportion                                Chi-square goodness of fit
Two proportions, independent samples          Pearson chi-square / Fisher exact
Two proportions, dependent samples            McNemar test
Stratified sampling to control a confounder   Mantel-Haenszel test

Steps to analysis

Step 1: State the hypotheses
Step 2: Set the significance level
Step 3: Check the assumptions
Step 4: Perform the statistical analysis
Step 5: Make interpretation
Step 6: Draw conclusion

2 x 2 contingency table

Consists of two columns and two rows. Cells are labeled A through D. Columns and rows are added for labels.
Row: independent variable / exposure / risk factors
Column: dependent variable / outcome

            CHD present  CHD absent  Total
Smoker           32          138      170
Nonsmoker       105          263      368
Total           137          401      538

Pearson chi-square test

- To test the association between two categorical variables
- Independent sample
- Result of test:
  - Not significant: no association
  - Significant: an association

Example: Is estrogen receptor status associated with breast cancer status?
Data: Breast cancer.sav

H0: There is no association between estrogen receptor and breast cancer status.
HA: There is an association between estrogen receptor and breast cancer status.

α = 0.05

Check the assumptions:
1. The two variables are independent.
2. The two variables are categorical.
3. Cells with an expected count of < 5:
   - more than 20% of cells: use the Fisher exact test
   - less than 20% of cells: use the Pearson chi-square test

$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

Observed counts:

Variable   Breast Ca: Died  Breast Ca: Alive  Total
ER -ve          310               28           338
ER +ve          508               23           531
Total           818               51           869

Observed counts with expected counts (E):

Variable   Breast Ca: Died      Breast Ca: Alive   Total
ER -ve     310 (E = 318.2)      28 (E = 19.8)       338
ER +ve     508 (E = 499.8)      23 (E = 31.2)       531
Total      818                  51                  869

No cell has an expected count below 5, so the Pearson chi-square test is used.

Calculate the chi-square value:

$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

The value 5.897 comes from the expected counts rounded to one decimal place; with unrounded expected counts the statistic is about 5.84. From the chi-square table, the p value lies between 0.01 and 0.02.

p value = 0.016 < 0.05: reject H0, accept HA.

There is a significant association between estrogen receptor and breast cancer status using the Pearson chi-square test (p = 0.016).

Fisher exact test

- To test the association between two categorical variables
- Independent sample
- Sample sizes are small

Example: Is gender associated with coronary heart disease?
Data: CHD data.sav

H0: There is no association between gender and coronary heart disease.
HA: There is an association between gender and coronary heart disease.

α = 0.05

Check the assumptions:

1. The two variables are independent.
2. The two variables are categorical.
3. Cells with an expected count of < 5:
   - more than 20% of cells: use the Fisher exact test
   - less than 20% of cells: use the Pearson chi-square test

$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

Observed counts:

Variable   CHD: Presence  CHD: Absent  Total
Male            15             5         20
Female          10             0         10
Total           25             5         30

Observed counts with expected counts (E):

Variable   CHD: Presence     CHD: Absent     Total
Male       15 (E = 16.7)     5 (E = 3.3)      20
Female     10 (E = 8.3)      0 (E = 1.7)      10
Total      25                5                30

2 cells (50%) have an expected count < 5, so the Fisher exact test is used.

Calculate the chi-square value:

$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

The value 3.0968 again uses the rounded expected counts; unrounded expected counts give exactly 3.0. From the chi-square table, the p value lies between 0.05 and 0.1.

p value = 0.140 > 0.05: do not reject H0.

There is no significant association between gender and coronary heart disease using Fisher's exact test (p = 0.140).
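The expected counts, the 20% rule, and the two tests above can be checked with a short sketch. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the slides; the function names are hypothetical, the p-value shortcut is valid only for df = 1, and the two-sided Fisher p-value is assumed to sum all tables no more probable than the observed one.

```python
from math import comb, erfc, sqrt

def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def share_of_small_cells(table, threshold=5):
    """Proportion of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)

def pearson_chi_square(table):
    """Pearson chi-square statistic and p-value for a 2 x 2 table (df = 1)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    p_value = erfc(sqrt(stat / 2))  # chi-square upper tail, df = 1 only
    return stat, p_value

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

er_table = [[310, 28], [508, 23]]   # rows: ER -ve, ER +ve; columns: died, alive
chd_table = [[15, 5], [10, 0]]      # rows: male, female; columns: presence, absent

for name, table in [("ER", er_table), ("CHD", chd_table)]:
    if share_of_small_cells(table) > 0.20:
        print(name, "Fisher exact p =", round(fisher_exact_two_sided(table), 3))
    else:
        stat, p = pearson_chi_square(table)
        print(name, "Pearson chi-square =", round(stat, 3), "p =", round(p, 3))
```

Because the sketch keeps the expected counts unrounded, the chi-square statistic for the estrogen receptor table comes out near 5.84 rather than the hand-calculated 5.897, while the p-values agree with the reported 0.016 and 0.140.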
McNemar test

- Categorical data
- Dependent sample:
  - Matched sample
  - Cross over design
  - Before & after (same subject)
- To determine whether the row and column marginal frequencies are equal (marginal homogeneity)

The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same:

$H_0: P_B = P_C$
$H_A: P_B \neq P_C$

A & D = concordant pairs
B & C = discordant pairs
A discordant pair is a pair with different outcomes.

Example: Is type of mastectomy associated with 5-year survival proportion in patients with breast cancer?

The sample were breast cancer patients:
- matched for age (same decade of age)
- same clinical condition

Data: breast ca.sav

H0: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.
HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

α = 0.05

Check the assumptions:

1. The two variables are dependent.
2. The two variables are categorical.

Calculate the chi-square value:

$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

* $\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$

Calculated $\chi^2$ > tabulated $\chi^2$ (3.841 at α = 0.05, df = 1).

p value = 0.008 < 0.05: reject H0, accept HA.

There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using the McNemar test (p = 0.008).

Mantel-Haenszel test

The Mantel-Haenszel test is a method to compare the probability of an event among independent groups in stratified samples. The stratification factor can be study center, gender, race, age group, obesity status or disease severity.

It gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (the strata variable). The data are arranged in a series of associated 2 × 2 contingency tables.

Example: Is the type of treatment associated with response to treatment among migraine patients after controlling for gender?
Confounder: gender

                                    Active  Placebo
Female  No. of patients               27      25
        No. with better response      16       5
Male    No. of patients               28      26
        No. with better response      12       7

Arranged as one 2 × 2 table per stratum:

                              Better  Same  Total
Stratum 1: Female   Active      16     11     27
                    Placebo      5     20     25
Stratum 2: Male     Active      12     16     28
                    Placebo      7     19     26

Check the assumptions:

1. Random sampling
2. Stratified sampling

H0: There is no association between type of treatment and response to treatment among female and male migraine patients.
HA: There is an association between type of treatment and response to treatment among female and male migraine patients.

Compute the expected frequency for each stratum:

$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$

$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$

Compute the variance for each stratum:

$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{52^2 (52 - 1)} = 3.1865$

$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{54^2 (54 - 1)} = 3.1325$

Compute the Mantel-Haenszel statistic:

$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$

Compute the odds ratio:

$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$

Calculated value > tabulated value (3.841 at α = 0.05, df = 1): reject H0.

Data: Migraine.sav

Hypotheses tested in the output:
- H0: OR1 = OR2 (the association is homogeneous) *Tarone's: adjusted
- H0: OR1 = 1 and H0: OR2 = 1 (conditionally independent)

The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.

There is a significant association between type of treatment and response to treatment among female and male migraine patients (p = 0.004).
We estimate that, after controlling for gender, female and male migraine patients who receive active treatment have about 3.31 times the odds of a better response compared with patients who receive placebo (Mantel-Haenszel odds ratio = 3.313).
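The McNemar and Mantel-Haenszel hand calculations above can be reproduced with a short sketch. It uses only the Python standard library and hypothetical function names, and it is not the SPSS procedure from the slides. Each stratum is passed as a tuple (a, b, c, d) in the cell order of the 2 × 2 table. The exact binomial McNemar p-value is included because it reproduces the reported p = 0.008; that the original output used this exact version is an assumption.

```python
from math import comb, erfc, sqrt

def chi2_p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar statistic from the discordant pairs b and c."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def mcnemar_exact_p(b, c):
    """Two-sided exact binomial p-value for the discordant pairs (assumed variant)."""
    n = b + c
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square, its p-value (df = 1), and the common odds ratio."""
    sum_a = sum_e = sum_v = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                                     # e_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # v_i
        numerator += a * d / n
        denominator += b * c / n
    stat = (sum_a - sum_e) ** 2 / sum_v
    return stat, chi2_p_value_df1(stat), numerator / denominator

# McNemar example: discordant pairs b = 0 and c = 8
mcnemar_stat = mcnemar_corrected(0, 8)
mcnemar_p = mcnemar_exact_p(0, 8)
print("McNemar chi-square =", mcnemar_stat, "exact p =", round(mcnemar_p, 3))

# Migraine example: (better, same) for active then placebo, per stratum
migraine = [(16, 11, 5, 20),   # stratum 1: female
            (12, 16, 7, 19)]   # stratum 2: male
mh_stat, mh_p, mh_or = mantel_haenszel(migraine)
print("MH chi-square =", round(mh_stat, 2),
      "p =", round(mh_p, 3),
      "OR =", round(mh_or, 3))
```

The sketch does not test homogeneity of the odds ratios across strata; the Breslow-Day result quoted above would need a separate calculation.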