Matched Pair Data Stat 557 Heike Hofmann

Outline
• Marginal Homogeneity - review
• Subject-specific vs Marginal Model
• Binary Response
  • conditional logistic regression
  • with covariates
• Ordinal response
• Symmetric Models

Matched Pair Data

                      2nd Rating
    1st Rating    Approve   Disapprove
    Approve         794        150
    Disapprove       86        570

Assumptions
• Diagonal heavily loaded
• Association usually strongly positive (most people don’t change their opinion)
• Distinguish between movers & stayers

Marginal Homogeneity
• $\operatorname{logit} P(Y_t = 1 \mid x_t) = \alpha + \beta x_t$
• $x_t$ is dummy variable for time points: $x_1 = 0$, $x_2 = 1$
• Then $\beta$ is log odds ratio based on overall population

RAND - American Life Panel
https://mmicdata.rand.org/alp/?page=election#electionforecast
• Panel of 3500 US citizens above 18 tracked since July
• Data isn’t published on individual basis, but from change and overall margins we can (almost) work out change pattern

Vote before 1st debate and 1 week after 1st debate (3300 respondents):

                Obama   Romney
    Obama        1585      121
    Romney        162     1432

    > mswitch <- glm(I(candidate=="Obama")~time, data=votem, family=binomial(), weight=votes)
    > summary(mswitch)

    Call:
    glm(formula = I(candidate == "Obama") ~ time, family = binomial(),
        data = votem, weights = votes)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -46.462  -22.929   -0.435   21.992   45.733

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.11771    0.03488   3.375 0.000738 ***
    timevote2   -0.04981    0.04929  -1.010 0.312299
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 9135.4  on 7  degrees of freedom
    Residual deviance: 9134.3  on 6  degrees of freedom
    AIC: 9138.3

    Number of Fisher Scoring iterations: 3

Subject-Specific Model
• $\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• $x_t$ is dummy variable for time points: $x_1 = 0$, $x_2 = 1$
• then $\alpha_i = \operatorname{link} P(Y_{i1} = 1)$ and $\beta = \operatorname{link} P(Y_{i2} = 1) - \operatorname{link} P(Y_{i1} = 1)$
• painful to fit ...
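The marginal fit for the election panel can be reproduced from the 2×2 table alone. The sketch below is a reconstruction rather than the code behind the slide. It assumes the column margins belong to the first time point (`vote1`) and the row margins to the second (`vote2`), because that orientation matches the printed coefficients. The four-row layout of `votem` is also an assumption (the slide's data frame evidently has eight rows), and `tab` and `beta_closed` are illustrative names.

```r
# Cell counts as printed on the slide (3300 respondents in total).
# ASSUMPTION: column margins = first time point ("vote1"),
#             row margins    = second time point ("vote2").
tab <- matrix(c(1585,  121,
                 162, 1432),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Obama", "Romney"), c("Obama", "Romney")))

# Long format: one row per time point and candidate, weighted by the margin.
# (Assumed layout; only the margins enter the marginal model.)
votem <- data.frame(
  time      = factor(rep(c("vote1", "vote2"), each = 2)),
  candidate = rep(c("Obama", "Romney"), times = 2),
  votes     = c(unname(colSums(tab)), unname(rowSums(tab)))
)

# Marginal logit model: logit P(Y_t = 1 | x_t) = alpha + beta * x_t
mswitch <- glm(I(candidate == "Obama") ~ time, data = votem,
               family = binomial(), weights = votes)

# beta is the difference between the two marginal logits
n <- sum(tab)
beta_closed <- qlogis(rowSums(tab)[["Obama"]] / n) -
               qlogis(colSums(tab)[["Obama"]] / n)

print(round(c(coef(mswitch), closed_form = beta_closed), 5))
```

Because the model has one parameter per time point, the fitted `timevote2` coefficient and the closed-form difference of marginal logits should agree up to numerical tolerance.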
Marginal vs Subject-Specific Model
Estimates for β
• are identical for marginal model and subject-specific model in case of identity link
• are different for logit link
  • marginal model: $\beta = \operatorname{logit} P(Y_2 = 1 \mid x_2) - \operatorname{logit} P(Y_1 = 1 \mid x_1)$
  • subject-specific, for all i: $\beta = \operatorname{logit} P(Y_{i2} = 1 \mid x_2) - \operatorname{logit} P(Y_{i1} = 1 \mid x_1)$

Subject-Specific Model
• $\operatorname{logit} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• Assumptions generally:
  • responses from different subjects independent (for all i)
  • responses for different time-points independent

Subject-Specific Model
• Violation of independence taken care of by model structure:
  • Generally, $|\alpha_i| \gg |\beta|$
  • When $|\alpha_i|$ is small, we have the most variability between responses of the same individual, i.e. least dependence. These are the records on which estimation of β is based.
  • For large $|\alpha_i|$, $P(Y_{it} = 1)$ is either close to 0 or close to 1 (largest dependence in the data)

Subject-Specific Model
• $\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• but: estimation of $\alpha_i$ becomes problematic for large numbers of subjects
• idea: condition on sufficient statistic for $\alpha_i$; leads to conditional (logistic) regression

Likelihood for $\alpha_i$

Fitting the Subject-Specific Model
• Let $S_i = y_{i1} + y_{i2}$, then $S_i \in \{0, 1, 2\}$
• $S_i$ are sufficient statistics for $\alpha_i$; only values of $S_i = 1$ contribute to the estimation of β
• $\operatorname{logit} P(Y_{it} = 1 \mid S_i = 1) = \alpha_i + \beta x_t$

Estimating β
• MLE for β is $\log(n_{21}/n_{12})$
• standard error of the estimate is then $\sqrt{1/n_{12} + 1/n_{21}}$
• Use clogit from the survival package to fit model

Navajo Indians
• 144 victims of myocardial infarction (MI cases) are matched with 144 control subjects (disease free) according to gender and age.
• All participants of the study are asked whether they were ever diagnosed with diabetes:

                      Cases
    Controls      Diabetes    no
    Diabetes          9       16
    no               37       82

    > myo.ml <- clogit(MI ~ diabetes + strata(pair), data=t103)
    > summary(myo.ml)
    Call:
    coxph(formula = Surv(rep(1, 288L), MI) ~ diabetes + strata(pair),
        data = t103, method = "exact")

      n= 288

               coef exp(coef) se(coef)     z Pr(>|z|)
    diabetes 0.8383    2.3125   0.2992 2.802  0.00508 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

             exp(coef) exp(-coef) lower .95 upper .95
    diabetes     2.312     0.4324     1.286     4.157

    Rsquare= 0.029   (max possible= 0.5 )
    Likelihood ratio test= 8.55  on 1 df,   p=0.003449
    Wald test            = 7.85  on 1 df,   p=0.005082
    Score (logrank) test = 8.32  on 1 df,   p=0.003919

Conditional Logistic Regression as GLM
• Let $X_1, ..., X_p$ be covariates with $x_{jit}$ = value for predictor $j \in \{1, ..., p\}$, individual $i \in \{1, ..., n\}$ at time $t \in \{1, 2\}$
• Model: $\operatorname{logit}(P(Y_{it} = 1)) = \alpha_i + \beta_1 x_{1it} + \beta_2 x_{2it} + ... + \beta_p x_{pit}$
• Conditioned on one success:
  $P(Y_{i1} = 1, Y_{i2} = 0 \mid S_i = 1) = \frac{1}{1 + \exp((x_{i2} - x_{i1})' \beta)}$
  $P(Y_{i1} = 0, Y_{i2} = 1 \mid S_i = 1) = \frac{\exp((x_{i2} - x_{i1})' \beta)}{1 + \exp((x_{i2} - x_{i1})' \beta)}$
• Rewrite: $y_i^* = 1$ if $Y_{i1} = 0, Y_{i2} = 1$, and $y_i^* = 0$ if $Y_{i1} = 1, Y_{i2} = 0$.
• Then $x_i^* = x_{i2} - x_{i1}$ for all $i$, and
  $\operatorname{logit}(P(y_i^* = 1)) = \beta_1 x_{1i}^* + \beta_2 x_{2i}^* + ... + \beta_p x_{pi}^*$
• Note: the above logistic regression does not have an intercept
• Extensions: longitudinal studies, i.e. more than two observations per individual, or clustered data (we’ll come back to that later)

    > table(ystar)
    ystar
      1
    144
    > table(xstar)
    xstar
     -1   0   1
     16  91  37

    glm(formula = ystar ~ xstar - 1, family = binomial(logit))

    Deviance Residuals:
       Min      1Q  Median      3Q     Max
    0.8478  0.8478  1.1774  1.1774  1.5477

    Coefficients:
          Estimate Std. Error z value Pr(>|z|)
    xstar   0.8383     0.2992   2.802  0.00508 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 199.63  on 144  degrees of freedom
    Residual deviance: 191.07  on 143  degrees of freedom
    AIC: 193.07

    Number of Fisher Scoring iterations: 4

Models for Square Contingency Tables

Matched Pairs: Ordinal
• $Y_1$ and $Y_2$ are ordinal variables with $J > 2$ categories
• POLR (proportional odds) model (marginal): $\operatorname{logit}(P(Y_t \le j)) = \alpha_j + \beta x_t$
• Cumulative odds ratios are constant for all j:
  $\log \theta_j = \log \frac{P(Y_2 \le j)/P(Y_2 > j)}{P(Y_1 \le j)/P(Y_1 > j)} = \beta (x_2 - x_1) = \beta$
  for $x_2 = 1$ and $x_1 = 0$, independent of j.
Marginal Homogeneity in Ordinal Model
• Marginal homogeneity is equivalent to zero log odds ratio:
  $\beta = 0 \iff \operatorname{logit}(P(Y_1 \le j)) = \operatorname{logit}(P(Y_2 \le j)) \ \forall j$
  $\iff P(Y_1 \le j) = P(Y_2 \le j) \ \forall j$
  $\iff \pi_{j+} = \pi_{+j} \ \forall j$
• Model fit:
  • based on marginal probabilities $\pi_{j+}, \pi_{+j}$, $j = 1, ..., J$: overall $2 (J - 1)$ degrees of freedom
  • proportional odds model has $(J - 1) + 1 = J$ parameters ($J - 1$ for the $\alpha_j$, 1 for $\beta$)
  • model fit is based on df $= J - 2$

Matched Pairs: Nominal
• For nominal Y with $J \ge 3$ categories, use J as baseline
• Baseline logistic regression: $\log \frac{P(Y_t = j)}{P(Y_t = J)} = \alpha_j + \beta_j x_t$
• Then $\beta_j = 0$ for all j is test for marginal homogeneity

Example: Migration Data

    Residence         Residence in 1985
    in 1980        NE      MW       S       W    Total
    NE          11607     100     366     124    12197
    MW             87   13677     515     302    14581
    S             172     225   17819     270    18486
    W              63     176     286   10192    10717
    Total       11929   14178   18986   10888    55981

• 95% of data is on diagonal
• marginal homogeneity seems given, is data even symmetric?

Stat 557 (Fall 2008), Matched Pair Data, November 4, 2008

Symmetry Model
• $H_0: \pi_{ab} = \pi_{ba}$ for all a, b
• as logistic regression: $\log(\pi_{ab}/\pi_{ba}) = 0$
• as loglinear model: $\log m_{ab} = \mu + \mu_a^X + \mu_b^Y + \mu_{ab}^{XY}$ with $\mu_a^X = \mu_a^Y$ and $\mu_{ab}^{XY} = \mu_{ba}^{XY}$

Migration Data
• Symmetry seems to be violated: e.g. more people move MW -> S (515) than vice versa (225)
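The symmetry hypothesis for the migration table can be checked directly from the off-diagonal counts. The sketch below is a minimal illustration, not the analysis from the lecture. It assumes the usual Pearson-type statistic for symmetry, which sums $(n_{ab} - n_{ba})^2 / (n_{ab} + n_{ba})$ over all pairs $a < b$ and is compared with a chi-square distribution on $J(J-1)/2$ degrees of freedom. The names `mig`, `n_ab`, `n_ba` and `X2` are illustrative.

```r
# Migration counts from the slide: rows = residence in 1980,
# columns = residence in 1985 (margins omitted).
regions <- c("NE", "MW", "S", "W")
mig <- matrix(c(11607,   100,   366,   124,
                   87, 13677,   515,   302,
                  172,   225, 17819,   270,
                   63,   176,   286, 10192),
              nrow = 4, byrow = TRUE,
              dimnames = list(res1980 = regions, res1985 = regions))

# Share of stayers (diagonal cells)
diag_share <- sum(diag(mig)) / sum(mig)

# Pair each cell above the diagonal with its mirror image below it
up   <- upper.tri(mig)
n_ab <- mig[up]        # moves a -> b, a < b
n_ba <- t(mig)[up]     # moves b -> a

# Symmetry statistic under H0: pi_ab = pi_ba (assumed Pearson-type form)
X2      <- sum((n_ab - n_ba)^2 / (n_ab + n_ba))
df      <- length(n_ab)                  # J (J - 1) / 2 pairs
p_value <- pchisq(X2, df = df, lower.tail = FALSE)

print(round(c(diag_share = diag_share, X2 = X2, df = df), 3))
print(p_value)
```

Only the off-diagonal cells enter the statistic, so the heavily loaded diagonal has no influence on the test. The MW/S pair (515 vs 225) is one of the larger contributions.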
