Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins

Andy Bogart, MS
Jack Goldberg, PhD


Multiple Informant Data: Military Service in Vietnam

    id   pairid   PTSD   self report   military rec.
     1        1     45   yes           no
     2        1     17   yes           yes
     3        2     66   no            yes
     4        2     58   yes           yes


Vietnam service by self report and military record

                             Self Report
    Military Record        No       Yes     Total
    No                  6,322       450     6,772
    Yes                   221     3,706     3,927
    Total               6,543     4,156    10,699

    Kappa = 0.87


Table 1: Participant Characteristics

    Characteristic                              All Veterans   Veterans serving in Vietnam
                                                n=10,809       n=4,377
    Age in 2007, mean (sd)                      58 (3.1)       59 (2.5)
    Zygosity, %
      monozygotic                               52             51
      dizygotic                                 45             47
      indeterminate                             2              2
    Military branch, %
      Army                                      51             49
      Navy                                      23             27
      Air Force                                 18             15
      Marines                                   7              9
      Coast Guard                               0.5            0.1
    Post traumatic stress disorder score, %
      15 - 17                                   22             16
      18 - 23                                   27             23
      24 - 31                                   26             25
      32 - 75                                   25             36
    Vietnam Service, %
      self report                               39             95
      military record                           37             90


Command

    regress ptsd sr, robust

    Linear regression                               Number of obs =   10796
                                                    F(  1, 10794) =  639.43
                                                    Prob > F      =  0.0000
                                                    R-squared     =  0.0599
                                                    Root MSE      =  .34613

    ------------------------------------------------------------------------------
                 |               Robust
            ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              sr |   .1793066   .0070909    25.29   0.000     .1654071     .193206
           _cons |   3.130085   .0039722   788.00   0.000     3.122299    3.137871
    ------------------------------------------------------------------------------


Command

    regress ptsd mr, robust

    Linear regression                               Number of obs =   10712
                                                    F(  1, 10710) =  440.68
                                                    Prob > F      =  0.0000
                                                    R-squared     =  0.0423
                                                    Root MSE      =  .34992

    ------------------------------------------------------------------------------
                 |               Robust
            ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              mr |    .152672   .0072727    20.99   0.000      .138416    .1669279
           _cons |   3.144166   .0040245   781.26   0.000     3.136277    3.152054
    ------------------------------------------------------------------------------

    Self Report        sr | .1793066   .0070909
    Military Record    mr |  .152672   .0072727


Model 1: The General Multiple Source Model

\[
g\left(E\left[Y^{*}_{ij}\right]\right) = \alpha_0 + \sum_{m=1}^{k-1} \alpha_m\, s^{m}_{ij} + \sum_{m=1}^{k} \beta_m\, z^{m}_{ij}
\]

    left-hand side:   expected outcome
    \alpha_0:         intercept
    s^{m}_{ij}:       source indicators
    z^{m}_{ij}:       source by exposure interaction terms

Generates the same estimates as the k marginal source-specific models.

Allows testing for a difference in sources:

\[
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k
\]


Multiple Informant Data: restructuring for Model 1

Starting from the four-twin example above (yes = 1, no = 0), each command below is followed by the data it produces.

Command

    expand 2
    * expand appends the copies at the end of the data, so sort before using by id:
    sort id

    id   pairid   PTSD   sr   mr
     1        1     45    1    0
     1        1     45    1    0
     2        1     17    1    1
     2        1     17    1    1
     3        2     66    0    1
     3        2     66    0    1
     4        2     58    1    1
     4        2     58    1    1

Command

    generate service=0
    by id: replace service = sr if _n==1
    by id: replace service = mr if _n==2

    id   pairid   PTSD   sr   mr   service
     1        1     45    1    0         1
     1        1     45    1    0         0
     2        1     17    1    1         1
     2        1     17    1    1         1
     3        2     66    0    1         0
     3        2     66    0    1         1
     4        2     58    1    1         1
     4        2     58    1    1         1

Command

    generate s1 = 0
    generate s2 = 0
    by id: replace s1 = 1 if _n==1
    by id: replace s2 = 1 if _n==2

    id   pairid   PTSD   service   s1   s2
     1        1     45         1    1    0
     1        1     45         0    0    1
     2        1     17         1    1    0
     2        1     17         1    0    1
     3        2     66         0    1    0
     3        2     66         1    0    1
     4        2     58         1    1    0
     4        2     58         1    0    1

Command

    generate z1 = service * s1
    generate z2 = service * s2

    id   pairid   PTSD   service   s1   s2   z1   z2
     1        1     45         1    1    0    1    0
     1        1     45         0    0    1    0    0
     2        1     17         1    1    0    1    0
     2        1     17         1    0    1    0    1
     3        2     66         0    1    0    0    0
     3        2     66         1    0    1    0    1
     4        2     58         1    1    0    1    0
     4        2     58         1    0    1    0    1


Command

    xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

    Iteration 1: tolerance = 7.894e-14

    GEE population-averaged model                   Number of obs      =     21508
    Group variable:                        pin      Number of groups   =     10809
    Link:                             identity      Obs per group: min =         1
    Family:                           Gaussian                     avg =       2.0
    Correlation:                   independent                     max =         2
                                                    Wald chi2(3)       =    640.25
    Scale parameter:                  .1210952      Prob > chi2        =    0.0000

    Pearson chi2(21508):               2604.52      Deviance           =   2604.52
    Dispersion (Pearson):             .1210952      Dispersion         =  .1210952

                                       (Std. Err. adjusted for clustering on pin)
    ------------------------------------------------------------------------------
                 |             Semi-robust
            ptsd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0140807   .0016444    -8.56   0.000    -.0173037   -.0108576
              z1 |   .1793066   .0070906    25.29   0.000     .1654093    .1932038
              z2 |    .152672   .0072724    20.99   0.000     .1384183    .1669256
           _cons |   3.144166   .0040243   781.30   0.000     3.136278    3.152053
    ------------------------------------------------------------------------------

The z1 and z2 coefficients reproduce the sr and mr coefficients from the two separate regressions.

But wait . . . these guys are twins! Data within twin pairs might be correlated . . .
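The stacking steps for Model 1 can be condensed into one short script. This is a minimal sketch on the four-twin toy data from the slides, not the authors' own do-file: the helper variable source and the lower-case name ptsd for the toy outcome are assumptions introduced here for illustration.

```stata
* Toy data from the slides: four twins in two pairs (yes = 1, no = 0)
clear
input id pairid ptsd sr mr
1 1 45 1 0
2 1 17 1 1
3 2 66 0 1
4 2 58 1 1
end

* One row per twin per source
expand 2
* expand appends the copies at the end, so sort before using by id:
sort id
* source is a hypothetical helper: 1 = self report row, 2 = military record row
by id: generate byte source = _n

* Exposure as reported by the source that owns the row
generate byte service = cond(source == 1, sr, mr)

* Source indicators and source-by-exposure interaction terms
generate byte s1 = source == 1
generate byte s2 = source == 2
generate byte z1 = service * s1
generate byte z2 = service * s2

* Spot checks against the tables on the slides
assert _N == 8
assert z1 == 1 & z2 == 0 if id == 1 & source == 1
assert service == 0 & z2 == 0 if id == 1 & source == 2
assert z1 == 0 & z2 == 1 if id == 3 & source == 2

list id pairid ptsd service s1 s2 z1 z2, sepby(id) noobs
```

Deriving service with cond() gives the same rows as the generate/replace sequence on the slides; the explicit source variable just makes it visible which report each row carries.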
Summary of pair types for analysis

    Vietnam Service Analysis Pair Types                      Total Pairs = 6207

    Contributing pairs
      Complete pairs                                         4567
      Half pairs
        Two twins present, but only one eligible              172
        Only one eligible twin present in data               1433
    Non-contributing pairs
      Neither twin eligible                                    35


Command

    svyset id [pweight = sampweight], strata(pairid)

          pweight: sampweight
              VCE: linearized
         Strata 1: pairid
             SU 1: id
            FPC 1: <zero>


Command

    svy: regress ptsd s1 z1 z2

    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   3,   6170)    =    230.51
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0511

    ------------------------------------------------------------------------------
                 |             Linearized
        logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0140807    .001642    -8.58   0.000    -.0172995   -.0108619
              z1 |   .1793066    .006818    26.30   0.000     .1659408    .1926723
              z2 |    .152672   .0069024    22.12   0.000     .1391409     .166203
           _cons |   3.144166   .0035541   884.66   0.000     3.137198    3.151133
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members

    Self Report        sr | .1793066    .006818
    Military Record    mr |  .152672   .0069024


Command

    test z1 = z2

     ( 1)  z1 - z2 = 0

The slide overlays two versions of this test output, with test statistics of 44.89 and 45.66; the p-value is 0.0000 in both.

Moral of the story:

The two sources contain different information. We should not combine them.

Or, should we??
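As a small worked step that is not on the slides, the quantity being tested by test z1 = z2 is the difference between the two overall coefficients in the survey regression output:

```latex
\[
\hat\beta_{\mathrm{SR}} - \hat\beta_{\mathrm{MR}}
  = 0.1793066 - 0.1526720
  = 0.0266346,
\qquad
\frac{0.0266346}{0.1526720} \approx 0.17
\]
```

So the self-report coefficient is roughly 17% larger than the military-record coefficient. In Stata, running lincom z1 - z2 after the regression would be one way to obtain this difference together with a standard error and confidence interval; that command is suggested here as an option and does not appear in the original analysis.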
Model 2: Multiple Source Model of Within- and Between-pair exposure effects

\[
g\left(E\left[Y^{*}_{ij}\right]\right) = \alpha_0 + \sum_{m=1}^{k-1} \alpha_m\, s^{m}_{ij}
  + \sum_{m=1}^{k} \beta_{W,m}\left(z^{m}_{ij} - \bar{z}^{m}_{i}\right)
  + \sum_{m=1}^{k} \beta_{B,m}\, \bar{z}^{m}_{i}
\]

    \alpha_0:                         intercept
    s^{m}_{ij}:                       source indicators
    z^{m}_{ij} - \bar{z}^{m}_{i}:     source by within-pair effect interaction terms
    \bar{z}^{m}_{i}:                  source by between-pair effect interaction terms

Here \bar{z}^{m}_{i} is the mean of source m's report within twin pair i.

Same estimates as k separate marginal within & between models.

Allows testing for a difference in reports of within-pair effects and between-pair effects:

\[
H_0^{\text{within pair}}:\ \beta_{W,1} = \beta_{W,2} = \dots = \beta_{W,k}
\qquad
H_0^{\text{between pair}}:\ \beta_{B,1} = \beta_{B,2} = \dots = \beta_{B,k}
\]


Command

    bysort pairid: egen z1bar = mean(z1) if s1==1

    id   pairid   s1   z1   z1bar
     1        1    1    1       1
     1        1    0    0       .
     2        1    1    1       1
     2        1    0    0       .

Command

    bysort pairid: replace z1bar=0 if s1==0

    id   pairid   s1   z1   z1bar
     1        1    1    1       1
     1        1    0    0       0
     2        1    1    1       1
     2        1    0    0       0
     3        2    1    0     0.5
     3        2    0    0       0
     4        2    1    1     0.5
     4        2    0    0       0

Command

    generate z1diff = z1 - z1bar

    id   pairid   s1   z1   z1bar   z1diff
     1        1    1    1       1        0
     1        1    0    0       0        0
     2        1    1    1       1        0
     2        1    0    0       0        0
     3        2    1    0     0.5     -0.5
     3        2    0    0       0        0
     4        2    1    1     0.5      0.5
     4        2    0    0       0        0

(Repeat that procedure to make z2bar and z2diff)


Command

    svy: regress ptsd s1 z1diff z1bar z2diff z2bar

    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   5,   6168)    =    154.41
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0512

    ------------------------------------------------------------------------------
                 |             Linearized
        logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0182144   .0016726   -10.89   0.000    -.0214933   -.0149355
          z1diff |   .1669005   .0134838    12.38   0.000     .1404675    .1933335
           z1bar |   .1857651   .0074393    24.97   0.000     .1711816    .2003487
          z2diff |   .1618065   .0138901    11.65   0.000      .134577     .189036
           z2bar |   .1482027   .0074941    19.78   0.000     .1335116    .1628937
           _cons |   3.145802   .0037693   834.58   0.000     3.138413    3.153191
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members


Command

    test z1diff = z2diff

    Adjusted Wald test

     ( 1)  z1diff - z2diff = 0

           F(  1,  6172) =    0.36
                Prob > F =    0.5509

    Within-pair estimates don't differ much

Command

    test z1bar = z2bar

    Adjusted Wald test

     ( 1)  z1bar - z2bar = 0

           F(  1,  6172) =   83.66
                Prob > F =    0.0000

    Between-pair estimates do!!

Moral of the story:

1. Combine the within-pair info.
2. Keep between-pair info separate.


Model 3: Multiple Source Model with a Combined within-pair effect

\[
g\left(E\left[Y^{*}_{ij}\right]\right) = \alpha_0 + \sum_{m=1}^{k-1} \alpha_m\, s^{m}_{ij}
  + \beta_{W}\left(\tilde{z}_{ij} - \bar{\tilde{z}}_{i}\right)
  + \sum_{m=1}^{k} \beta_{B,m}\, \bar{z}^{m}_{i}
\]

    \alpha_0:                               intercept
    s^{m}_{ij}:                             source indicators
    \tilde{z}_{ij} - \bar{\tilde{z}}_{i}:   combined source within-pair effect
    \bar{z}^{m}_{i}:                        source by between-pair effect interaction terms

In the stacked data the combined within-pair term equals \sum_{m=1}^{k} (z^{m}_{ij} - \bar{z}^{m}_{i}), which is the variable wservice built below.

Assumes the within-pair effect to be common to all k sources.

Often yields a more precise estimate of the within-pair effect.


Command

    generate wservice = z1diff + z2diff

    id   pairid   z1diff   z2diff   wservice
     1        1        0        0          0
     1        1        0     -0.5       -0.5
     2        1        0        0          0
     2        1        0      0.5        0.5
     3        2     -0.5        0       -0.5
     3        2        0        0          0
     4        2      0.5        0        0.5
     4        2        0        0          0


Command

    svy: regress ptsd s1 wservice z1bar z2bar

    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   4,   6169)    =    192.48
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0512

    ------------------------------------------------------------------------------
                 |             Linearized
        logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0182138   .0016722   -10.89   0.000    -.0214919   -.0149358
        wservice |   .1644434   .0129988    12.65   0.000     .1389611    .1899256
           z1bar |   .1857654   .0074392    24.97   0.000     .1711819    .2003489
           z2bar |   .1482022   .0074941    19.78   0.000     .1335111    .1628933
           _cons |   3.145802   .0037693   834.59   0.000     3.138412    3.153191
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members


Survey regression results: PTSD regressed on multiple Vietnam service reports

                                        Model 1            Model 2            Model 3
    Estimates                         β̂      se(β̂)       β̂      se(β̂)       β̂      se(β̂)
    Overall effects
      Self-report                   0.1793   0.0068
      Military record               0.1527   0.0069
    Within-pair effects
      Self-report                                       0.1669   0.0135
      Military record                                   0.1618   0.0139
      Combined report                                                      0.1644   0.0130
    Between-pair effects
      Self-report                                       0.1858   0.0074    0.1858   0.0074
      Military record                                   0.1482   0.0075    0.1482   0.0075

    Hypothesis Tests                                    p-value
      H0: βSR(overall) = βMR(overall)                   <0.0001
      H0: βSR(within) = βMR(within)                     0.55
      H0: βSR(between) = βMR(between)                   <0.0001


Conclusions from VET Registry analysis

Sources differed in Model 1, so we did not combine them overall.

Within-pair estimates in Model 2 did not differ much by source, so Model 3 combined the within-pair estimates.

    Within-pair estimate:
      Combined report    0.16 (0.14, 0.19)

7 to 14% gain in efficiency over the individual sources. This range is consistent with the squared ratios of the Model 2 within-pair standard errors to the Model 3 combined standard error: (.0134838/.0129988)^2 is about 1.076 for self report and (.0138901/.0129988)^2 is about 1.142 for military record.

Between-pair estimates in Model 2 differed significantly, so Model 3 estimates separate between-pair effects for each source.

    Source-specific between-pair estimates:
      Self Report        0.19 (0.17, 0.20)
      Military Record    0.15 (0.13, 0.16)


Future Directions

Accommodate covariate adjustment
Compare pooled estimators to “AND” and “OR” type derived exposure variables
Address zygosity within regression models


Acknowledgements & References

Jack Goldberg at UW

Margaret Pepe at UW
1. Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:163-173.

Nicholas Horton at Harvard
2. Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:2911-2933.

Thank you for listening
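Backup: the within-pair and between-pair variables used in Models 2 and 3 can be rebuilt on the stacked four-twin toy data with the short sketch below. It is an illustration under stated assumptions rather than the authors' code: the slides only say to "repeat that procedure" for the military record, so the z2bar lines (which condition on s2) are one plausible reading of that instruction.

```stata
* Stacked toy data from the slides: two rows per twin, one per source
clear
input id pairid s1 s2 z1 z2
1 1 1 0 1 0
1 1 0 1 0 0
2 1 1 0 1 0
2 1 0 1 0 1
3 2 1 0 0 0
3 2 0 1 0 1
4 2 1 0 1 0
4 2 0 1 0 1
end

* Pair mean of the self report, defined on self-report rows, 0 elsewhere
bysort pairid: egen z1bar = mean(z1) if s1 == 1
replace z1bar = 0 if s1 == 0

* Assumed analogue for the military record (the "repeat that procedure" step)
bysort pairid: egen z2bar = mean(z2) if s2 == 1
replace z2bar = 0 if s2 == 0

* Within-pair deviations, and the combined within-pair term for Model 3
generate z1diff = z1 - z1bar
generate z2diff = z2 - z2bar
generate wservice = z1diff + z2diff

* Spot checks against the slide tables
assert z1bar == 0.5 & z1diff == -0.5 if id == 3 & s1 == 1
assert z2diff == -0.5 & wservice == -0.5 if id == 1 & s2 == 1
assert wservice == 0.5 if id == 4 & s1 == 1

sort id s2
list id pairid s1 z1bar z1diff z2bar z2diff wservice, sepby(id) noobs
```

Because each row carries only one source's report, at most one of z1diff and z2diff is nonzero on any row, which is why their sum works as a single combined within-pair regressor.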