Stat 562 Term Project

Conditional Logistic Regression for Binary Matched-Pairs Response

By Xiufang Ye

Contents
I: Motivation and theoretical background
II: Basic theory
III: Data analysis
IV: Main reference
V: Acknowledgements

Keywords: marginal model, logit link, conditional logistic regression, sufficient statistics, ML analysis, matched case-control study

1. Motivation and Theoretical Background

1.1 What is binary matched pairs?

When categorical responses are compared for two samples and each observation in one sample pairs with an observation in the other, the data are called matched-pairs data. The responses in the two samples are therefore statistically dependent. “Binary” here means that each response has only two categories.

1.2 Why “logistic regression”?

Logistic regression is the most important model for categorical response data, especially for binary data. It is widely used in biomedical studies, social science research, marketing, business, and even genetics.

1.3 Why “conditional logistic regression”?

Ordinary logistic regression assumes that the observations are independent. But what if the observations are matched? You might think that it would be possible to include dummy-coded variables to indicate the matching. For example, with 56 matched pairs you could include 55 dummy variables to account for the non-independence, along with whatever covariates you want in the model. However, logistic regression has problems when the number of degrees of freedom used by the model is close to the total degrees of freedom available. In such a situation, the conditional logistic model is recommended. In matched case-control studies, conditional logistic regression can be used to investigate the relationship between an outcome and a set of prognostic factors.

1.4 Exact Inference for Logistic Regression

Maximum likelihood estimators of model parameters work best when the sample size is large compared to the number of parameters in the model.
When the sample size is small, or when there are many parameters relative to the sample size, inference improves when it is based on the method of conditional maximum likelihood. The conditional maximum likelihood method bases inference for the primary parameters of interest on a conditional likelihood function that eliminates the other parameters. The technique uses a conditional probability distribution defined over data sets in which the values of certain “sufficient statistics” for the other parameters are fixed. This distribution is defined for potential samples that provide the same information about the other parameters as the observed sample does. The distribution and the related conditional likelihood function depend only on the parameters of interest.

For binary data, conditional likelihood methods are especially useful when a logistic regression model contains a large number of “nuisance” parameters. They are also useful for small samples. One can perform exact inference for a parameter by using the conditional likelihood function that eliminates all the other parameters. Since that conditional likelihood does not involve the other unknown parameters, one can calculate probabilities such as p-values exactly rather than rely on crude approximations.

2. Basic Theory

2.1 Marginal versus conditional models for binary matched pairs

2.1.1 Two marginal models

Let $(Y_1, Y_2)$ denote the pair of observations for a randomly selected subject, where a “1” outcome denotes category 1 (success) and a “0” outcome denotes category 2. We can fit the model

$$P(Y_t = 1) = \alpha + \delta x_t, \qquad (1)$$

where $x_1 = 0$ and $x_2 = 1$. Since $P(Y_2 = 1) = \alpha + \delta x_2 = \alpha + \delta$ and $P(Y_1 = 1) = \alpha + \delta x_1 = \alpha$,

$$\delta = P(Y_2 = 1) - P(Y_1 = 1).$$

Interpretation of the parameter: $\delta$ is the difference between the marginal probabilities.
Alternatively, the logit model can be written as

$$\operatorname{logit}[P(Y_t = 1)] = \alpha + \beta x_t. \qquad (2)$$

Then

$$\beta = \log\frac{P(Y_2 = 1)}{P(Y_2 = 0)} - \log\frac{P(Y_1 = 1)}{P(Y_1 = 0)} = \log\left[\frac{P(Y_2 = 1)/P(Y_2 = 0)}{P(Y_1 = 1)/P(Y_1 = 0)}\right].$$

Interpretation of the parameter: $\beta$ is the log odds ratio for the marginal distributions.

The two models focus on the marginal distributions of the responses for the two observations. For instance, in terms of the population-averaged table, the ML estimate of $\beta$ in (2) is the log odds ratio of the marginal sample proportions.

2.1.2 One conditional model

By contrast, the subject-specific table, which has one stratum per subject, implicitly allows the probabilities to vary by subject. Let $(Y_{i1}, Y_{i2})$ denote the $i$th pair of observations, $i = 1, 2, \ldots, n$. The model has the form

$$\operatorname{link}[P(Y_{it} = 1)] = \alpha_i + \beta x_t.$$

This is called a conditional model, since the effect $\beta$ is defined conditional on the subject. In contrast to the marginal models, its estimate describes the conditional association in the three-way table stratified by subject. The effect is subject-specific. For the marginal models (1) and (2), the effects are population-averaged, since they refer to averaging over the entire population rather than to a specific individual. The subject-specific and population-averaged effects are identical for the identity link but differ for non-linear links. For example, for the logit link,

$$\operatorname{logit}[P(Y_{it} = 1)] = \alpha_i + \beta x_t. \qquad (3)$$

Averaging $P(Y_{it} = 1)$ over the population does not give a marginal model of the form (2) with the same $\beta$.

2.2 A Logit Model with Subject-Specific Probabilities

By permitting subjects to have their own probability distributions, the conditional model (3) for $Y_{it}$, observation $t$ for subject $i$, is

$$\operatorname{logit}[P(Y_{it} = 1)] = \alpha_i + \beta x_t,$$

so that

$$P(Y_{it} = 1) = \frac{\exp(\alpha_i + \beta x_t)}{1 + \exp(\alpha_i + \beta x_t)},$$

where $x_1 = 0$ and $x_2 = 1$. Here we assume a common effect $\beta$. For subject $i$,

$$P(Y_{i1} = 1) = \frac{\exp(\alpha_i)}{1 + \exp(\alpha_i)}, \qquad P(Y_{i2} = 1) = \frac{\exp(\alpha_i + \beta)}{1 + \exp(\alpha_i + \beta)}.$$

Then

$$\frac{P(Y_{i1} = 1)}{P(Y_{i1} = 0)} = \exp(\alpha_i) \quad \text{and} \quad \frac{P(Y_{i2} = 1)}{P(Y_{i2} = 0)} = \exp(\alpha_i + \beta).$$

Interpretation of the parameter: the parameter $\beta$ compares the response distributions of the two observations.
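As a small numerical illustration of why subject-specific and population-averaged effects differ under the logit link (Section 2.1.2), the following Python sketch averages the subject-specific probabilities of model (3) over a few subjects and computes the log odds ratio of the averaged probabilities. The values of alpha_i and the choice beta = 1 are hypothetical and serve only as an illustration; they are not estimates from any data in this project.

```python
import math


def expit(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


def marginal_log_odds_ratio(alphas, beta):
    """Log odds ratio of the population-averaged success probabilities."""
    n = len(alphas)
    p1 = sum(expit(a) for a in alphas) / n          # average of P(Y_i1 = 1)
    p2 = sum(expit(a + beta) for a in alphas) / n   # average of P(Y_i2 = 1)
    return logit(p2) - logit(p1)


beta = 1.0                                    # assumed subject-specific effect
heterogeneous = [-3.0, -1.0, 0.0, 1.0, 3.0]   # hypothetical alpha_i, varying by subject
homogeneous = [0.5] * 5                       # hypothetical alpha_i, identical for all subjects

marginal_het = marginal_log_odds_ratio(heterogeneous, beta)
marginal_hom = marginal_log_odds_ratio(homogeneous, beta)

print(f"subject-specific beta:              {beta:.3f}")
print(f"marginal effect, varying alpha_i:   {marginal_het:.3f}")
print(f"marginal effect, identical alpha_i: {marginal_hom:.3f}")
```

In this example the marginal log odds ratio is smaller than beta when the alpha_i vary, and equals beta when all alpha_i are identical.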
For each subject, the odds of success for observation 2 are $\exp(\beta)$ times the odds for observation 1.

The dependence in matched pairs can be accounted for in the conditional logistic regression model. Given the parameters, with (3) we normally assume independence of responses for different subjects and for the two observations on the same subject. However, averaging over all subjects, the responses are non-negatively associated. Suppose $|\beta|$ is small compared to $|\alpha_i|$. Since $P(Y_{it} = 1)$ is an increasing function of $\alpha_i$ for each $t$, a subject with a large positive $\alpha_i$ has a high $P(Y_{it} = 1)$ for each $t$ and is likely to have a success each time, while a subject with a large negative $\alpha_i$ has a low $P(Y_{it} = 1)$ for each $t$ and is likely to have a failure each time. For any $\beta$, the greater the variability in $\{\alpha_i\}$, the greater the overall positive association between responses, with success (failure) for observation 1 tending to occur with success (failure) for observation 2. The positive association reflects the shared value of $\alpha_i$ for each observation in a pair. In particular, when all the $\alpha_i$ are identical, no association occurs.

Question: When there is a large number of parameters $\{\alpha_i\}$, the conditional model (3) causes difficulties with the fitting process and with the properties of ordinary ML estimators. The unconditional ML estimator of $\beta$ is inconsistent. This result was first shown by Anderson in 1973.

Outline of proof:

Step 1: Assuming independence of responses for different subjects and for different observations on the same subject, the likelihood equations are

$$y_{+t} = \sum_i P(Y_{it} = 1), \quad t = 1, 2, \qquad \text{and} \qquad y_{i+} = \sum_t P(Y_{it} = 1), \quad i = 1, \ldots, n.$$

Step 2: Substituting

$$\sum_t P(Y_{it} = 1) = \frac{\exp(\alpha_i)}{1 + \exp(\alpha_i)} + \frac{\exp(\alpha_i + \beta)}{1 + \exp(\alpha_i + \beta)}$$

in the second likelihood equation, we can prove that $\hat\alpha_i = -\infty$ for the $n_{22}$ subjects with $y_{i+} = 0$, $\hat\alpha_i = +\infty$ for the $n_{11}$ subjects with $y_{i+} = 2$, and $\hat\alpha_i = -\hat\beta/2$ for the $n_{21} + n_{12}$ subjects with $y_{i+} = 1$.

Step 3: By breaking $\sum_i P(Y_{it} = 1)$ into components for the sets of subjects having $y_{i+} = 0$, $y_{i+} = 2$ and $y_{i+} = 1$, we find that the first likelihood equation is, for $t = 1$,

$$y_{+1} = n_{22}(0) + n_{11}(1) + (n_{21} + n_{12})\,\frac{\exp(-\hat\beta/2)}{1 + \exp(-\hat\beta/2)}.$$

Since $y_{+1} = n_{11} + n_{12}$, solving this equation gives $\hat\beta = 2\log(n_{21}/n_{12})$. Hence $\hat\beta \xrightarrow{p} 2\beta$.

There is a remedy, called conditional ML. It treats $\{\alpha_i\}$ as nuisance parameters and maximizes the likelihood function for a conditional distribution that eliminates them.

2.3 Conditional ML inference for binary matched pairs

2.3.1 Estimate of parameters

For model (3), under the independence assumed above, the joint mass function for $\{(y_{11}, y_{12}), (y_{21}, y_{22}), \ldots, (y_{n1}, y_{n2})\}$ is

$$\prod_{i=1}^{n} \left(\frac{\exp(\alpha_i)}{1 + \exp(\alpha_i)}\right)^{y_{i1}} \left(\frac{1}{1 + \exp(\alpha_i)}\right)^{1 - y_{i1}} \left(\frac{\exp(\alpha_i + \beta)}{1 + \exp(\alpha_i + \beta)}\right)^{y_{i2}} \left(\frac{1}{1 + \exp(\alpha_i + \beta)}\right)^{1 - y_{i2}}$$

$$= \prod_{i=1}^{n} \frac{\exp(\alpha_i y_{i1})}{1 + \exp(\alpha_i)} \cdot \frac{\exp[(\alpha_i + \beta) y_{i2}]}{1 + \exp(\alpha_i + \beta)} = \frac{\exp\left(\sum_{i=1}^{n} \alpha_i (y_{i1} + y_{i2}) + \beta \sum_{i=1}^{n} y_{i2}\right)}{\prod_{i=1}^{n} [1 + \exp(\alpha_i)][1 + \exp(\alpha_i + \beta)]}.$$

So, as a function of the data, it is proportional to $\exp\left(\sum_{i=1}^{n} \alpha_i (y_{i1} + y_{i2}) + \beta \sum_{i=1}^{n} y_{i2}\right)$. To eliminate $\{\alpha_i\}$, we condition on their sufficient statistics, the pairwise success totals $\{S_i = y_{i1} + y_{i2}\}$. Then $P(Y_{i1} = Y_{i2} = 0 \mid S_i = 0) = 1$ and $P(Y_{i1} = Y_{i2} = 1 \mid S_i = 2) = 1$, while

$$P(Y_{i1} = y_{i1}, Y_{i2} = y_{i2} \mid S_i = 1) = \frac{P(Y_{i1} = y_{i1}, Y_{i2} = y_{i2})}{P(Y_{i1} = 1, Y_{i2} = 0) + P(Y_{i1} = 0, Y_{i2} = 1)} = \begin{cases} \dfrac{\exp(\beta)}{1 + \exp(\beta)} & \text{if } y_{i1} = 0,\ y_{i2} = 1, \\[2ex] \dfrac{1}{1 + \exp(\beta)} & \text{if } y_{i1} = 1,\ y_{i2} = 0. \end{cases}$$

Let $\{n_{ab}\}$ denote the counts for the four possible sequences. For the subjects having $S_i = 1$, $\sum_i y_{i1} = n_{12}$, the number of subjects having a success for observation 1 and a failure for observation 2. Similarly, for those subjects, $\sum_i y_{i2} = n_{21}$ and $\sum_i S_i = n^* = n_{12} + n_{21}$. Since $n_{21}$ is the sum of $n^*$ independent, identically distributed Bernoulli variates, its conditional distribution is binomial with index $n^*$ and parameter $\exp(\beta)/[1 + \exp(\beta)]$.

Hence, to make inference about $\beta$, or to test marginal homogeneity ($\beta = 0$), we only need the information about the pairs in which either $y_{i1} = 1, y_{i2} = 0$ or $y_{i1} = 0, y_{i2} = 1$.

The estimate of $\beta$ is obtained by maximum likelihood. Conditional on $S_i = 1$, the joint distribution of the matched pairs is

$$\prod_{S_i = 1} \left(\frac{1}{1 + \exp(\beta)}\right)^{y_{i1}} \left(\frac{\exp(\beta)}{1 + \exp(\beta)}\right)^{y_{i2}} = \frac{[\exp(\beta)]^{n_{21}}}{[1 + \exp(\beta)]^{n^*}},$$

where the product refers to all pairs having $S_i = 1$. Its logarithm is

$$\beta n_{21} - (n_{21} + n_{12}) \log(1 + \exp(\beta)).$$

Differentiating the log of this conditional likelihood, equating to 0, and solving yields the conditional ML estimator:

$$n_{21} - (n_{21} + n_{12})\,\frac{\exp(\beta)}{1 + \exp(\beta)} = 0 \quad \Longrightarrow \quad \hat\beta = \log(n_{21}/n_{12}).$$

By the delta method, applied as for 2x2 contingency tables, we obtain $SE = \sqrt{1/n_{21} + 1/n_{12}}$.

2.3.2 Consistency of the estimate of β

Referring to problem 10.23 in the textbook, we can prove that $\hat\beta = \log(n_{21}/n_{12}) \xrightarrow{p} \beta$.

Outline of the proof: For a random sample of $n$ pairs, by the definition of $n_{21}$ and the independence of responses for different observations on the same subject,

$$E(n_{21}/n) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{1 + \exp(\alpha_i)} \cdot \frac{\exp(\alpha_i + \beta)}{1 + \exp(\alpha_i + \beta)}.$$

Similarly,

$$E(n_{12}/n) = \frac{1}{n} \sum_{i=1}^{n} \frac{\exp(\alpha_i)}{1 + \exp(\alpha_i)} \cdot \frac{1}{1 + \exp(\alpha_i + \beta)}.$$

Then we apply the weak law of large numbers (WLLN) and obtain

$$\frac{n_{21}}{n} - \frac{1}{n} \sum_{i=1}^{n} \frac{1}{1 + \exp(\alpha_i)} \cdot \frac{\exp(\alpha_i + \beta)}{1 + \exp(\alpha_i + \beta)} \xrightarrow{p} 0 \quad \text{and} \quad \frac{n_{12}}{n} - \frac{1}{n} \sum_{i=1}^{n} \frac{\exp(\alpha_i)}{1 + \exp(\alpha_i)} \cdot \frac{1}{1 + \exp(\alpha_i + \beta)} \xrightarrow{p} 0.$$

Each term of the first sum is $\exp(\beta)$ times the corresponding term of the second sum. Therefore $n_{21}/n_{12} \xrightarrow{p} \exp(\beta)$.

2.4 Random effects in the binary matched-pairs model

There is an alternative remedy for handling the huge number of nuisance parameters in the logit model (3). One can treat $\{\alpha_i\}$ as random effects and regard them as an unobserved random sample from a probability distribution, usually assumed to be $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma^2$. This eliminates $\{\alpha_i\}$ by averaging with respect to their distribution, yielding a marginal distribution.
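Treating the alpha_i as draws from a normal distribution also gives a convenient way to check the results of Sections 2.2 and 2.3.2 numerically. The Python sketch below simulates matched pairs under model (3) and compares the conditional ML estimate log(n21/n12) with the unconditional ML estimate 2 log(n21/n12). All settings (beta = 1, alpha_i drawn from N(0, 2^2), 100,000 pairs, the seed, and the function name simulate_counts) are assumptions chosen for illustration only.

```python
import math
import random


def simulate_counts(n_pairs, beta, mu, sigma, seed=0):
    """Simulate matched pairs under model (3) with alpha_i ~ N(mu, sigma^2).

    Returns (n12, n21), the counts of the two kinds of discordant pairs:
    n12 = success then failure, n21 = failure then success.
    """
    rng = random.Random(seed)
    n12 = n21 = 0
    for _ in range(n_pairs):
        alpha = rng.gauss(mu, sigma)
        p1 = 1.0 / (1.0 + math.exp(-alpha))            # P(Y_i1 = 1)
        p2 = 1.0 / (1.0 + math.exp(-(alpha + beta)))   # P(Y_i2 = 1)
        y1 = rng.random() < p1
        y2 = rng.random() < p2
        if y1 and not y2:
            n12 += 1
        elif y2 and not y1:
            n21 += 1
    return n12, n21


beta_true = 1.0
n12, n21 = simulate_counts(n_pairs=100_000, beta=beta_true, mu=0.0, sigma=2.0)

beta_conditional = math.log(n21 / n12)        # conditional ML estimate
beta_unconditional = 2.0 * beta_conditional   # unconditional ML estimate (Section 2.2)
se_conditional = math.sqrt(1.0 / n21 + 1.0 / n12)

print(f"n12 = {n12}, n21 = {n21}")
print(f"conditional ML:   {beta_conditional:.3f} (SE {se_conditional:.3f})")
print(f"unconditional ML: {beta_unconditional:.3f}")
```

With a sample this large the conditional estimate should fall close to the true beta, while the unconditional estimate should fall close to twice that value.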
For matched pairs with a non-negative sample log odds ratio, the random effects approach also yields the estimate $\hat\beta = \log(n_{21}/n_{12})$.

2.5 Conditional ML for Matched Pairs with Multiple Predictors

We can extend model (3) to a general model with multiple predictors:

$$\operatorname{logit}[P(Y_{it} = 1)] = \alpha_i + \beta_1 x_{1it} + \beta_2 x_{2it} + \beta_3 x_{3it} + \cdots + \beta_p x_{pit}, \qquad (4)$$

where $x_{hit}$ denotes the value of predictor $h$ for observation $t$ in pair $i$, $t = 1, 2$. Typically, one predictor is an explanatory variable of interest, while the others are covariates being controlled, in addition to those already controlled by virtue of using them to form the matched pairs. We can again apply conditional ML to eliminate the $\alpha_i$ and estimate the $\beta_j$. Let $x_{it} = (x_{1it}, \ldots, x_{pit})'$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$. Then the conditional distributions are

$$P(Y_{i1} = 0, Y_{i2} = 1 \mid S_i = 1) = \frac{\exp(x_{i2}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)},$$

$$P(Y_{i1} = 1, Y_{i2} = 0 \mid S_i = 1) = \frac{\exp(x_{i1}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)}.$$

Dividing the numerator and denominator of the first expression by $\exp(x_{i1}'\beta)$ gives

$$P(Y_{i1} = 0, Y_{i2} = 1 \mid S_i = 1) = \frac{\exp[(x_{i2} - x_{i1})'\beta]}{1 + \exp[(x_{i2} - x_{i1})'\beta]},$$

which has the form of a logistic regression with no intercept and with predictor values $x_i^* = x_{i2} - x_{i1}$ (the difference between the two observations in the pair for each predictor). In fact, one can obtain conditional ML estimates for model (4) by fitting a logistic regression model to the pairs with $S_i = 1$ alone, using the artificial response $y_i^* = 1$ when $(y_{i1} = 0, y_{i2} = 1)$ and $y_i^* = 0$ when $(y_{i1} = 1, y_{i2} = 0)$, no intercept, and predictor values $x_i^*$. This yields the same likelihood as the conditional likelihood (see Breslow et al., 1978; Chamberlain, 1980).

Let us illustrate this with Table 10.3 in the textbook. Let $x_i^* = x_{i2} - x_{i1}$ and $y_i^* = y_{i2} - y_{i1}$. Let $t = 1$ refer to the control and $t = 2$ to the case; then $y_i^* = 1$ always. Since $x_{it} = 1$ represents “yes” for diabetes and $x_{it} = 0$ represents “no”, $(y_i^* = 1, x_i^* = 1)$ for 37 observations, $(y_i^* = 1, x_i^* = 0)$ for 9 + 82 = 91 observations, and $(y_i^* = 1, x_i^* = -1)$ for 16 observations. The logit model that forces $\hat\alpha = 0$ has $\hat\beta = 0.84$. With a single binary predictor, the estimate is identical to $\log(n_{21}/n_{12}) = \log(37/16)$.
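The equivalence with a no-intercept logistic regression on the differences can be checked directly. The sketch below, in Python rather than SAS, fits logit P(y* = 1) = beta x* by Newton-Raphson to the Table 10.3 differences, coded as in the SAS data step of Section 3.1 (x* = 1 for 37 pairs, 0 for 91 pairs, -1 for 16 pairs, with y* = 1 throughout). The function name fit_no_intercept_logit and the iteration settings are hypothetical choices for this illustration.

```python
import math


def fit_no_intercept_logit(x, y, max_iter=50, tol=1e-12):
    """Newton-Raphson for logit P(y = 1) = beta * x (one predictor, no intercept).

    Returns (beta_hat, standard_error).
    """
    def score_and_information(beta):
        score = 0.0
        information = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-beta * xi))
            score += xi * (yi - p)                  # first derivative of the log-likelihood
            information += xi * xi * p * (1.0 - p)  # minus the second derivative
        return score, information

    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = score_and_information(beta)
    return beta, 1.0 / math.sqrt(information)


# Differences (case - control) for the 144 matched pairs of Table 10.3
x_star = [1] * 37 + [0] * 91 + [-1] * 16
y_star = [1] * 144   # artificial response, always 1

beta_hat, se = fit_no_intercept_logit(x_star, y_star)
print(f"beta_hat = {beta_hat:.4f}, SE = {se:.4f}")
print(f"log(37/16) = {math.log(37 / 16):.4f}, sqrt(1/37 + 1/16) = {math.sqrt(1 / 37 + 1 / 16):.4f}")
```

The pairs with x* = 0 contribute a constant factor of 1/2 each to the likelihood, so only the 53 discordant pairs affect the estimate.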
2.6 Extensions

The discussion of marginal models in Section 10.1 of the textbook and of the conditional model in this section can be generalized to multinomial responses and to matched-set clusters. For matched-set clusters, the conditional ML approach is restricted to estimating the $\beta_j$ that are within-cluster effects, such as occur in case-control and crossover studies. An advantage of using the random effects approach instead of conditional ML with the conditional model is that it is not subject to this restriction.

3. Data Analysis

3.1 Conditional Logistic Regression for Matched Pairs Data

In matched case-control studies, conditional logistic regression is used to investigate the relationship between the outcome of being a case or a control and a set of prognostic factors. When each matched set consists of a single case and a single control, the conditional likelihood is given by

$$\prod_i \left(1 + \exp\left(-\beta'(x_{i1} - x_{i0})\right)\right)^{-1},$$

where $x_{i1}$ and $x_{i0}$ are vectors representing the prognostic factors for the case and the control, respectively, of the $i$th matched set. This likelihood is identical to the likelihood of fitting a logistic regression model to a set of data with constant response, where the model contains no intercept term and has explanatory variables given by $d_i = x_{i1} - x_{i0}$ (Breslow 1982).

Table 10.3 in the textbook illustrates a case-control study of acute myocardial infarction (MI) among Navajo Indians, which matched 144 victims of MI according to age and gender with 144 people free of heart disease. Subjects were asked whether they had ever been diagnosed as having diabetes (x = 0, no; x = 1, yes). For observation $t$ in matched pair $i$, we consider model (3). The case and the corresponding control have the same ID. The prognostic factor is diabetes (an indicator variable for whether the subject had been diagnosed with diabetes). The goal of the case-control analysis is to determine the relative risk associated with diabetes.
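To make the 1:1 conditional likelihood concrete, the following Python sketch evaluates -2 times its logarithm for the diabetes data at beta = 0 and at the conditional ML estimate log(37/16), and forms the likelihood ratio statistic. The differences d_i are coded as in the SAS data step that follows (37 pairs with d = 1, 91 with d = 0, 16 with d = -1). The function name neg2_log_cond_lik is a hypothetical helper introduced here.

```python
import math


def neg2_log_cond_lik(beta, diffs):
    """-2 log of prod_i [1 + exp(-beta * d_i)]^(-1) for 1:1 matched sets."""
    return 2.0 * sum(math.log(1.0 + math.exp(-beta * d)) for d in diffs)


# d_i = (case value) - (control value) of the diabetes indicator
diffs = [1] * 37 + [0] * 91 + [-1] * 16

beta_hat = math.log(37 / 16)   # conditional ML estimate from Section 2.3.1
null_value = neg2_log_cond_lik(0.0, diffs)
fitted_value = neg2_log_cond_lik(beta_hat, diffs)
lr_statistic = null_value - fitted_value
# Upper tail probability of a chi-square variate with 1 degree of freedom
lr_p_value = math.erfc(math.sqrt(lr_statistic / 2.0))

print(f"-2 log L at beta = 0:    {null_value:.3f}")
print(f"-2 log L at beta_hat:    {fitted_value:.3f}")
print(f"likelihood ratio chi-sq: {lr_statistic:.4f} (p = {lr_p_value:.4f})")
```

These values can be compared with the -2 Log L and Likelihood Ratio entries in the PROC LOGISTIC output of this section.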
Before PROC LOGISTIC is used for the logistic regression analysis, each matched pair is transformed into a single observation, where the variable diabetes contains the difference between the corresponding values for the case and the control (case - control). The variable outcome, which is used as the response variable in the logistic regression model, is given a constant value of 1. Note that there are 144 observations in the data set, one for each matched pair.

In the following SAS statements, PROC LOGISTIC is invoked with the NOINT option to obtain the conditional logistic model estimates. A single model is fitted, with diabetes as the only predictor variable. Because the option CLODDS=PL is specified, PROC LOGISTIC computes a 95% profile likelihood confidence interval for the odds ratio for the predictor variable.

SAS code

    data Data;
      input diabetes outcome @@;
      output;
      datalines;
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1 1 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    0 1
    -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
    -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
    ;

    proc logistic data=Data;
      model outcome=diabetes / noint CLODDS=PL;
    run;

Results from the conditional logistic analysis are shown below. Note that there is only one response level listed in the "Response Profile" table and there is no intercept term in the "Analysis of Maximum Likelihood Estimates" table.

Output of Analysis:

The SAS System

The LOGISTIC Procedure

Model Information

    Data Set                     WORK.DATA
    Response Variable            outcome
    Number of Response Levels    1
    Number of Observations       144
    Model                        binary logit
    Optimization Technique       Fisher's scoring

Response Profile

    Ordered Value    outcome    Total Frequency
    1                1          144

Probability modeled is outcome=1.
Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

    Criterion    Without Covariates    With Covariates
    AIC          199.626               193.073
    SC           199.626               196.043
    -2 Log L     199.626               191.073

Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio    8.5534        1     0.0034
    Score               8.3208        1     0.0039
    Wald                7.8501        1     0.0051

Analysis of Maximum Likelihood Estimates

    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    diabetes     1     0.8383      0.2992            7.8501             0.0051

Odds Ratio Estimates

    Effect      Point Estimate    95% Wald Confidence Limits
    diabetes    2.312             1.286    4.157

NOTE: Since there is only one response level, measures of association between the observed and predicted values were not calculated.

Profile Likelihood Confidence Interval for Adjusted Odds Ratios

    Effect      Unit      Estimate    95% Confidence Limits
    diabetes    1.0000    2.312       1.310    4.272

In this model, diabetes is the only predictor variable. The odds ratio estimate for diabetes is 2.312, which is an estimate of the relative risk associated with diabetes. Since the 95% profile likelihood confidence interval for the odds ratio for diabetes, (1.310, 4.272), does not contain unity, the prognostic factor diabetes is statistically significant.

3.2 Conditional Logistic Regression for m:n Matching

Conditional logistic regression is used to investigate the relationship between an outcome and a set of prognostic factors in matched case-control studies. The outcome is whether the subject is a case or a control. If there is only one case and one control in each matched set, the matching is 1:1. The m:n matching refers to the situation in which the number of cases and the number of controls vary across the matched sets. You can perform conditional logistic regression with the PHREG procedure by using the discrete logistic model and forming a stratum for each matched set. In addition, you need to create dummy survival times so that all the cases in a matched set have the same event time value, and the corresponding controls are censored at later times.
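To show what a stratum contributes to the conditional likelihood under m:n matching, the following Python sketch computes, for one matched set, the conditional probability that the observed cases are the cases, given the number of cases in the set. This is the standard conditional logistic form; the covariate values, the coefficient vector and the function name stratum_conditional_prob are hypothetical and are not taken from the birth-weight data below.

```python
import math
from itertools import combinations


def stratum_conditional_prob(x, case_idx, beta):
    """Conditional probability that the subjects in case_idx are the cases,
    given that the stratum contains len(case_idx) cases.

    x        : list of covariate tuples, one per subject in the matched set
    case_idx : indices of the observed cases
    beta     : tuple of regression coefficients
    """
    def linear_predictor(i):
        return sum(b * v for b, v in zip(beta, x[i]))

    m = len(case_idx)
    numerator = math.exp(sum(linear_predictor(i) for i in case_idx))
    # Sum over every way of choosing m subjects of the stratum as the cases
    denominator = sum(
        math.exp(sum(linear_predictor(i) for i in subset))
        for subset in combinations(range(len(x)), m)
    )
    return numerator / denominator


# Hypothetical matched set: 2 cases and 3 controls, two covariates each
stratum = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (2.0, 1.0)]
beta = (0.5, -0.3)   # hypothetical coefficients
prob_observed = stratum_conditional_prob(stratum, (0, 4), beta)

# The probabilities over all possible choices of 2 cases sum to 1
total = sum(stratum_conditional_prob(stratum, s, beta)
            for s in combinations(range(len(stratum)), 2))

# With one case and one control this reduces to the 1:1 form of Section 3.1
pair = [(1.0,), (0.0,)]   # case first, then control
prob_pair = stratum_conditional_prob(pair, (0,), (0.8,))

print(f"P(observed cases | 2 cases in stratum) = {prob_observed:.4f}")
print(f"sum over all case sets = {total:.4f}")
print(f"1:1 pair: {prob_pair:.4f}")
```

The full conditional likelihood is the product of such terms over the strata, which is why each matched set is declared as a stratum in the analysis.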
Consider the following set of low infant birth-weight data extracted from Appendix 1 of Hosmer and Lemeshow (1989). These data represent 189 women, of whom 59 had low birth-weight babies and 130 had normal-weight babies. Under investigation are the following risk factors: weight in pounds at the last menstrual period (LWT), presence of hypertension (HT), smoking status during pregnancy (Smoke), and presence of uterine irritability (UI). For HT, Smoke, and UI, a value of 1 indicates "yes" and a value of 0 indicates "no." The woman's age (Age) is used as the matching variable. The SAS data set LBW contains a subset of the data corresponding to women between the ages of 16 and 32.

    data LBW;
      input id Age Low LWT Smoke HT UI @@;
      Time=2-Low;
      datalines;
    25 16 1 ……
    175 32 0 170 0 0 0   207 32 0 186 0 0 0
    ;

The variable Low is used to determine whether the subject is a case (Low=1, low birth-weight baby) or a control (Low=0, normal-weight baby). The dummy time variable Time takes the value 1 for cases and 2 for controls.

The following SAS statements produce a conditional logistic regression analysis of the data. The variable Time is the response, and Low is the censoring variable. Note that the data set is created so that all the cases have the same event time, and the controls have later censored times. The matching variable Age is used in the STRATA statement so that each unique age value defines a stratum. The variables LWT, Smoke, HT, and UI are specified as explanatory variables. The TIES=DISCRETE option requests the discrete logistic model.

    proc phreg data=LBW;
      model Time*Low(0)= LWT Smoke HT UI / ties=discrete;
      strata Age;
    run;

The procedure displays a summary of the number of event and censored observations for each stratum. These are the numbers of cases and controls for each matched set, shown in Output 1. Results of the conditional logistic regression analysis are shown in Output 2.
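The Wald statistics and hazard ratios in the PHREG results can be recomputed from the parameter estimates and standard errors alone. The Python sketch below does this using the estimates and standard errors as printed in the "Analysis of Maximum Likelihood Estimates" table; the helper name wald_summary is hypothetical, and small differences in the last digit are to be expected because the printed inputs are rounded.

```python
import math

# (estimate, standard error) as printed in the PHREG output
estimates = {
    "LWT": (-0.01498, 0.00706),
    "Smoke": (0.80805, 0.36797),
    "HT": (1.75143, 0.73932),
    "UI": (0.88341, 0.48032),
}


def wald_summary(estimate, std_error):
    """Return (Wald chi-square, p-value, hazard ratio) for one parameter."""
    chi_square = (estimate / std_error) ** 2
    # Upper tail probability of a chi-square variate with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    hazard_ratio = math.exp(estimate)
    return chi_square, p_value, hazard_ratio


summary = {name: wald_summary(*values) for name, values in estimates.items()}

for name, (chi_square, p_value, hazard_ratio) in summary.items():
    print(f"{name:6s} chi-square = {chi_square:.4f}  p = {p_value:.4f}  "
          f"hazard ratio = {hazard_ratio:.3f}")
```

A hazard ratio above 1 corresponds to a positive estimate and a ratio below 1 to a negative estimate, which is how the signs in the table translate into the interpretation that follows.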
Based on the Wald tests for individual variables, the variables LWT, Smoke, and HT are statistically significant while UI is marginal.

The hazard ratios, computed by exponentiating the parameter estimates, are useful in interpreting the results of the analysis. If the hazard ratio of a prognostic factor is larger than 1, an increment in the factor increases the hazard rate. If the hazard ratio is less than 1, an increment in the factor decreases the hazard rate. The results indicate that women were more likely to have low birth-weight babies if they were underweight at the last menstrual period, were hypertensive, smoked during pregnancy, or suffered uterine irritability.

For matched case-control studies with one case per matched set (1:n matching), the likelihood function for the conditional logistic regression reduces to that of the Cox model for the continuous time scale. For this situation, you can use the default TIES=BRESLOW.

Output 1: Summary of Number of Cases and Controls

The PHREG Procedure

Model Information

    Data Set              WORK.LBW
    Dependent Variable    Time
    Censoring Variable    Low
    Censoring Value(s)    0
    Ties Handling         DISCRETE

Summary of the Number of Event and Censored Values

    Stratum    Age    Total    Event    Censored    Percent Censored
    1          16     7        1        6           85.71
    2          17     12       5        7           58.33
    3          18     10       2        8           80.00
    4          19     16       3        13          81.25
    5          20     18       8        10          55.56
    6          21     12       5        7           58.33
    7          22     13       2        11          84.62
    8          23     13       5        8           61.54
    9          24     13       5        8           61.54
    10         25     15       6        9           60.00
    11         26     8        4        4           50.00
    12         27     3        2        1           33.33
    13         28     9        2        7           77.78
    14         29     7        1        6           85.71
    15         30     7        1        6           85.71
    16         31     5        1        4           80.00
    17         32     6        1        5           83.33
    Total             174      54       120         68.97

Output 2: Conditional Logistic Regression Analysis for the Low Birth-Weight Study

The PHREG Procedure

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics

    Criterion    Without Covariates    With Covariates
    -2 LOG L     159.069               141.108
    AIC          159.069               149.108
    SBC          159.069               157.064

Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio    17.9613       4     0.0013
    Score               17.3152       4     0.0017
    Wald                15.5577       4     0.0037

Analysis of Maximum Likelihood Estimates

    Variable    DF    Parameter Estimate    Standard Error    Chi-Square    Pr > ChiSq    Hazard Ratio
    LWT         1     -0.01498              0.00706           4.5001        0.0339        0.985
    Smoke       1     0.80805               0.36797           4.8221        0.0281        2.244
    HT          1     1.75143               0.73932           5.6120        0.0178        5.763
    UI          1     0.88341               0.48032           3.3827        0.0659        2.419

4. Main Reference

Categorical Data Analysis, second edition, by Alan Agresti.

Thank you for listening.