SUPPLEMENT: PHENOTYPE RELIABILITY FOR COMPLEX TRAITS

Supplementary Methods

1. Reliability of the HADS-D

Responses to each item on the HADS-D were modeled using a threshold model. In this model, a standard normal variable is assumed to underlie the response; applying thresholds to this variable yields the observed ordinal response. The advantage of this approach is that the underlying normal variables provide a mathematically tractable and readily interpretable way to model the associations between the items.

Observed responses to the HADS-D were determined not only by the severity of depression, but also by aspects unique to a given question and by measurement error. In applying a single-factor model to the HADS-D, we treated each response as the sum of these three contributions. In what follows, we derive the reliability of the HADS-D using the factor model. We show that items with considerable unique variance decrease the reliability of the sum score of a scale.

Denote the seven HADS-D items by $Y_1, \ldots, Y_7$, and denote the normal variables underlying them by $Y_1^*, \ldots, Y_7^*$. The variance of the $j$th underlying normal item is

$$V(Y_j^*) = \lambda_j^2 V(T) + V(u_j) + V(e) = 1, \tag{1}$$

where $\lambda_j$ is the factor loading of item $j$, $T$ is the latent depression construct, $u_j$ is the unique component associated with item $j$, and $e$ is measurement error. The reliability of item $j$ (with the factor scaled so that $V(T) = 1$) is

$$\mathrm{cor}(Y_j^*, T)^2 = \lambda_j^2, \tag{2}$$

which decreases as $V(u_j)$ or $V(e)$ increases. The reliability of the sum score of the HADS-D items is

$$\mathrm{cor}\Big(\sum_j Y_j^*, T\Big)^2 = \frac{\big(V(T) \sum_j \lambda_j\big)^2}{V(T)\, V\big(\sum_j Y_j^*\big)}, \tag{3}$$

where

$$V\Big(\sum_j Y_j^*\Big) = \sum_j \big(\lambda_j^2 V(T) + V(u_j) + V(e)\big) + \sum_{i \neq j} \big(\lambda_i \lambda_j V(T) + \mathrm{cov}(u_i, u_j) + V(e)\big). \tag{4}$$

The reliability of the sum score depends on the items through the term $\big(\sum_j \lambda_j\big)^2 / V\big(\sum_j Y_j^*\big)$. This shows that large variances of unique factors can lead to a total score that is less reliable as an indicator of $T$ than a single item or than a sum of only a few items [Bollen and Lennox 1991].
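As a numerical illustration of Equations (3) and (4), the following minimal Python sketch computes the sum-score reliability from a set of factor loadings. The function names, argument names, and all input values are hypothetical and chosen only for illustration; they are not estimates from this study. The sketch assumes uncorrelated unique factors unless a covariance matrix is supplied, and it includes V(e) in the cross terms exactly as Equation (4) is written.

```python
from typing import Optional, Sequence


def sum_score_reliability(
    loadings: Sequence[float],
    var_t: float,
    var_u: Sequence[float],
    var_e: float,
    cov_u: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Squared correlation between the sum of underlying items and T (Eq. 3)."""
    k = len(loadings)
    if len(var_u) != k:
        raise ValueError("var_u must have one entry per item")
    # Diagonal part of Eq. (4): the item variances.
    total_var = sum(l * l * var_t + vu + var_e for l, vu in zip(loadings, var_u))
    # Off-diagonal part of Eq. (4): sum over all ordered pairs i != j.
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            c_ij = cov_u[i][j] if cov_u is not None else 0.0
            total_var += loadings[i] * loadings[j] * var_t + c_ij + var_e
    numerator = (var_t * sum(loadings)) ** 2
    return numerator / (var_t * total_var)


def unit_variance_unique(loadings, var_t=1.0, var_e=0.0):
    """Unique variances implied by Eq. (1): V(u_j) = 1 - lambda_j^2 V(T) - V(e)."""
    return [1.0 - l * l * var_t - var_e for l in loadings]


# Hypothetical loadings: one strong item alone, then the same item plus two weak ones.
strong = [0.8]
mixed = [0.8, 0.1, 0.1]
rel_strong = sum_score_reliability(strong, 1.0, unit_variance_unique(strong), 0.0)
rel_mixed = sum_score_reliability(mixed, 1.0, unit_variance_unique(mixed), 0.0)
print(rel_strong, rel_mixed)
```

With these illustrative inputs, the three-item sum is a less reliable indicator of T than the single strong item, which is the pattern the derivation describes.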
Importantly, when adding items to form a sum score, the unique factors effectively contribute to the measurement error of the sum score, because by definition they measure item-specific content and not the trait of interest.

3. Item Factor Analysis and Item Selection

The approach to reliability described above is only applicable to single-factor models. We determined that the best fit to the HADS-D in this sample was a single-factor model by computing the eigenvalues of the sample correlation matrix of HADS-D items and applying Kaiser’s criterion (the number of factors to include is given by the number of eigenvalues that are greater than 1) [Kaiser 1991]. The eigenvalues are listed in Table S3. One eigenvalue exceeded 1, suggesting a single major factor underlying the data.

The observed eigenvalues did not rule out the possibility that additional factors, correlated with the first, determined the observed correlations. If the HADS-D items showed interpretable loadings on these factors, including them in the model of item responses would be useful. Consequently, we fit 2- and 3-factor models to the male and female data sets, applying oblique geomin rotation. The resulting patterns of factor loadings are listed in Tables S4 (males) and S5 (females). These factor loadings do not have a simple structure, with the sixth item tending to load significantly onto the additional factors [Browne 2001]. This means that the factors uncovered do not have a straightforward interpretation.

In our confirmatory analyses, we tested the single-factor model for measurement invariance between males and females, and found that sex-specific models of item responses did not yield improved fit when compared to using the same model for everyone.

We performed item selection using the same subset of the data used for confirmatory analyses.
The most important criterion in item selection was the size of the squared (standardized) factor loadings in the single-factor, no-gender-differences model. In a single-factor model, this is equivalent to selecting items based on the $R^2$ of the item on the factor or on the contribution of the item to the test information statistic [McDonald 1999, Chapter 13]. Items that had much less than 40% of their variance attributable to the common factor were excluded. Under this criterion, the sixth HADS-D item is not significantly worse than the fourth item. However, its tendency to load on multiple factors in the earlier analyses suggested that its low $R^2$ in the single-factor model was due mainly to unique variance, raising concerns about whether the result would generalize to other HADS translations [Maters et al. 2013]. It was therefore excluded.

4. Biometrical Factor Models (“Twin Models”) using Mplus 7

The phenotypes used in twin and family modeling of HADS scores, denoted $\mathbf{y}$, were 6-element column vectors consisting of twins’ scores, scores from the twins’ oldest two siblings, and scores from the twins’ parents:

$$\mathbf{y} = (y_{s1}, y_{s2}, y_f, y_m, y_{t1}, y_{t2})^{\mathsf{T}}$$

There were 6 observed phenotypic variances and 15 observed covariances. Mplus 7 software was used, with MODEL CONSTRAINT: statements used to specify constraints and KNOWNCLASS mixture modeling with the EM algorithm used to handle missing family member data [Kim et al. 2014].

Twins were considered in two groups, MZ and DZ twins, which were assumed to have normally distributed HADS scores:

$$\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma_M) \text{ for MZ twins}, \qquad \mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma_D) \text{ for DZ twins}$$

The means and variances of HADS scores were assumed to be equal for all individuals in MZ and in DZ families; hence, $\mathrm{diag}(\Sigma_M) = \mathrm{diag}(\Sigma_D)$.
This constraint is based on the assumptions of Hardy-Weinberg equilibrium, of equilibrium between mutation and selection, and that genetic and environmental effects have the same association with observed phenotypes in each generation.

$$\mathrm{diag}(\Sigma_M) = \mathrm{diag}(\Sigma_D) = v I_{6 \times 6}, \qquad v = a^2 + d^2 + c^2 + 2acg + u^2$$

Here, $a$ is the additive genetic effect, $d$ is the non-additive genetic effect, $c$ is the common-environmental effect, $g$ the gene-environment covariance, and $u$ the unique environmental effect. In Mplus 7 code, this was written:

    pvar = ADD**2 + DOM**2 + COM**2 + 2*ADD*COM*GEC + UNQ**2;

Monozygotic twins were assumed to have phenotypes that differed only due to the effects of non-shared environment. Hence, for MZs,

$$\sigma_{t1,t2} = a^2 + d^2 + c^2 + 2acg$$

In Mplus 7 code, this was written:

    pvar = ADD**2 + DOM**2 + COM**2 + 2*ADD*COM*GEC;

Several constraints were common to the MZ and DZ covariance matrices:

$$\sigma_{t1,f} = \sigma_{t2,f} = \sigma_{s1,f} = \sigma_{s2,f} = \sigma_{t1,m} = \sigma_{t2,m} = \sigma_{s1,m} = \sigma_{s2,m} = \sigma_{po}$$

All covariances between parents and offspring were set to be equal.

$$\sigma_{t1,s1} = \sigma_{t2,s1} = \sigma_{t1,s2} = \sigma_{t2,s2} = \sigma_{s1,s2} = \sigma_{sib}$$

All covariances between non-MZ-twin siblings were set equal.

Because a univariate phenotype was considered, the spousal assortment copath was equal to the spousal correlation:

$$\sigma_{m,f} = v\mu$$

or, in Mplus 7:

    covsp = pvar*COP;

This means that the observed parent-offspring correlation was multiplied by $(1+\mu)$ to reflect the correlation between spouses’ additive genetic effects induced by assortment.

Environmental transmission (parental phenotype covariance with sibling common environment) was also modeled, and denoted $\tau$. Following the model in [Fulker 1988], the gene-environment covariance was constrained in terms of $\tau$:

$$g = \tau (1+\mu)(a + cg)$$

In Mplus 7, this was written:

    GEC = CULT*(1 + COP)*(ADD + COM*GEC);
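The constraint on g is implicit, since g appears on both sides. Rearranging gives g = τ(1+μ)a / (1 − τ(1+μ)c), provided the denominator is nonzero. The following Python sketch solves for g in this way and evaluates the variance, MZ covariance, and spousal covariance expressions stated so far. The function names and dictionary keys are hypothetical, the argument names merely mirror the Mplus parameter labels, and the parameter values are illustrative only, not estimates from this study.

```python
def solve_gec(add: float, com: float, cop: float, cult: float) -> float:
    """Solve g = cult*(1 + cop)*(add + com*g) for g in closed form."""
    k = cult * (1.0 + cop)
    denom = 1.0 - k * com
    if abs(denom) < 1e-12:
        raise ValueError("no finite solution: cult*(1 + cop)*com equals 1")
    return k * add / denom


def expected_moments(add, dom, com, unq, cop, cult):
    """Expected variance and covariances under the constraints stated above."""
    gec = solve_gec(add, com, cop, cult)
    shared = add**2 + dom**2 + com**2 + 2.0 * add * com * gec
    pvar = shared + unq**2  # phenotypic variance v
    return {
        "gec": gec,
        "pvar": pvar,
        "cov_mz": shared,      # MZ twins differ only by unique environment
        "cov_sp": pvar * cop,  # spousal covariance, v * mu
    }


# Illustrative values only (not estimates from the study).
m = expected_moments(add=0.5, dom=0.3, com=0.2, unq=0.8, cop=0.2, cult=0.1)
print(m)
```

Solving for g in closed form avoids iterating on the self-referential constraint, and the guard makes explicit that no finite solution exists when τ(1+μ)c equals 1.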
These constraints lead to an expected parent-offspring correlation of

$$\sigma_{po} = \big(0.5(a^2 + acg) + c\tau\big)(1+\mu)$$

In Mplus 7, this constraint was given by:

    covpo = (0.5*(ADD**2 + ADD*COM*GEC) + COM*CULT)*(1 + COP);

The expected sibling correlation was assumed to be the same as the DZ twin correlation. It is:

$$\sigma_{sib} = 0.5a^2\big(1 + (a + cg)^2 \mu\big) + 0.25d^2 + c^2 + 2acg$$

In Mplus 7 code, this constraint was:

    covsib = 0.5*(ADD**2)*(1+COP*(ADD + COM*GEC)**2) + 0.25*DOM**2 + COM**2 + 2*ADD*COM*GEC;

The result is that there were 21 observed variances and covariances per group. These were constrained to one variance and four covariances, which are functions of five unique parameters: $a$, $c$, $d$, $\mu$, $\tau$.

Supplementary Tables

Subject Characteristics

Table S1 Proportion of families having HADS-D data from pairs of family members

        Twin 1  Twin 2  Sib 1   Sib 2   Father  Mother
Twin 1  .562    .365    .098    .029    .163    .219
Twin 2          .56     .101    .028    .167    .218
Sib 1                   .171    .022    .069    .093
Sib 2                           .049    .016    .021
Father                                  .298    .217
Mother                                          .446

Note: Numbers on the diagonal represent the proportion of families with non-missing HADS-D scores from the person in that role: e.g., about 60% of families have HADS-D scores from a twin and about 30% from a father.

HADS-D Items

Table S2 Item numbers, text, and responses for the HADS-D Scale.
Item  Item text, followed by the 0-, 1-, 2-, and 3-responses
1     I still enjoy the things I used to enjoy
      0: definitely as much; 1: not quite as much; 2: only a little; 3: hardly at all
2     I can laugh and see the funny side of things
      0: as much as I used to; 1: not as much now; 2: not at all as much now; 3: not at all
3     I feel cheerful
      0: not at all; 1: not often; 2: sometimes; 3: most of the time
4     I feel as if everything takes more effort
      0: almost always; 1: very often; 2: sometimes; 3: not at all
5     I have lost interest in my appearance
      0: definitely; 1: I do not take as much care as I should; 2: probably not as much now; 3: just as interested as ever
6     I look forward to things
      0: as much as I ever did; 1: somewhat less than I used to; 2: much less than I used to; 3: hardly at all
7     I can enjoy a good book or radio or TV program
      0: often; 1: sometimes; 2: not often; 3: very rarely

Note: Items 1-4 on this scale constitute the HADS-4.

Supplementary Results

Table S3 Eigenvalues of sample correlation matrices for males and females: most favorable to a single-factor model

Males    3.20  0.87  0.81  0.66  0.59  0.47  0.41
Females  3.46  0.88  0.76  0.66  0.49  0.42  0.33

Table S4 Factor loadings and their standard errors from EFA of HADS-D items in males (two- and three-factor models).

Item  2.1 Loading   2.2 Loading   3.1 Loading   3.2 Loading   3.3 Loading
1     .78(.39)      -.08(.46)     .74(.04)      .05(.01)      -.08(.06)
2     .79(.08)      .003(.02)     .79(.03)      .004(.02)     .01(.02)
3     .55(.11)      .12(.11)      .58(.08)      -.09(.04)     .20(.08)
4     .30(.29)      .39(.30)      .20(.15)      .01(.01)      .56(.13)
5     .001(.01)     .55(.13)      -.001(.01)    .17(.05)      .38(.05)
6     .46(.17)      .25(.27)      .01(.003)     .99(.003)     .005(.003)
7     .29(.11)      .12(.15)      .23(.08)      .13(.04)      .06(.08)

Note: Loadings significantly different from 0 at $p < 0.05$ are in bold face. The two-factor model factors correlate at .31; in the three-factor model factors 1 and 2 correlate at .61, 1 and 3 at .65, and 2 and 3 at .45.
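Kaiser's criterion, as applied to the eigenvalues in Table S3, amounts to counting the eigenvalues greater than 1. A minimal Python sketch follows; the function name is hypothetical, and the eigenvalues are copied from Table S3.

```python
from typing import Sequence


def kaiser_factor_count(eigenvalues: Sequence[float]) -> int:
    """Number of factors retained by Kaiser's criterion (eigenvalues > 1)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


# Eigenvalues of the item correlation matrices, copied from Table S3.
males = [3.20, 0.87, 0.81, 0.66, 0.59, 0.47, 0.41]
females = [3.46, 0.88, 0.76, 0.66, 0.49, 0.42, 0.33]

for label, eigs in (("males", males), ("females", females)):
    # Share of the total variance carried by the first eigenvalue.
    share = eigs[0] / sum(eigs)
    print(label, kaiser_factor_count(eigs), round(share, 2))
```

For both sexes the count is one, matching the single major factor reported in the Supplementary Methods.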
Table S5 Factor loadings and their standard errors from EFA of HADS-D items in females (two- and three-factor models).

Item  2.1 Loading   2.2 Loading   3.1 Loading   3.2 Loading   3.3 Loading
1     .46(.12)      .31(.12)      .78(.07)      .01(.01)      -.05(.09)
2     .66(.07)      .22(.08)      .53(.13)      .35(.12)      .02(.01)
3     .79(.02)      .001(.06)     .21(.12)      .63(.11)      -.01(.02)
4     .75(.05)      -.06(.05)     .02(.04)      .72(.04)      .002(.03)
5     .29(.04)      .22(.04)      -.03(.04)     .42(.07)      .24(.05)
6     .0001(.001)   .79(.07)      .45(.20)      -.001(.003)   .52(.14)
7     .10(.07)      .35(.07)      .05(.15)      .22(.11)      .30(.09)

Note: Loadings significantly different from 0 at $p < 0.05$ are in bold face. The two-factor model factors correlate at .65; in the three-factor model factors 1 and 2 correlate at .75, 1 and 3 at .39, and 2 and 3 at .21.

Table S6 Standardized factor loadings and standard errors from single-factor CFA of HADS-D items in both sexes

Item  Loading   R2
1     .72(.01)  .52(.02)
2     .80(.01)  .65(.02)
3     .68(.01)  .46(.02)
4     .63(.01)  .39(.01)
5     .41(.01)  .17(.01)
6     .62(.01)  .38(.02)
7     .39(.02)  .15(.01)

Note: Model comparison criteria suggested that the restricted all-parameters-equal-between-sexes model was a better fit to the data than a model in which factor loadings, variances, and means were allowed to vary. The sample-size adjusted BIC was 104016 for the restricted model and 104036 for the unrestricted model.

Table S7 Observed correlation structure of HADS-4 Scores in MZ and DZ families

DZ Fams Twin 1  Twin 2  Sib 1   Sib 2   Father  Mother
Twin 1  .       .099*   .073    .277*   .070    .140*
Twin 2  .287*   .       .094    .140    .069    .128*
Sib 1   .121*   .009    .       -.098   .143*   .130*
Sib 2   .255*   .118    .029    .       .133    .186
Father  .121*   .051    .073    -.092   .       .235*
Mother  .148*   .057    -.039   -.102   .207*   .
Note: MZ families are below the diagonal and DZ families are above it. * more than 2 SE from 0. Note that 90% of families lack sibling data and 80% lack paternal data, so standard errors are quite large for cells not involving twins or their mothers. The table contains Spearman correlation coefficients, chosen to address the skewness in the data; the data take on 18 distinct values, making computation of polychoric correlation coefficients impractical.

Table S8 Variance components of HADS-D score calculated using nuclear families of twins

Model   A         D         C         h2        μ         E         Free params  ssaBIC
AE      .27(.02)  .         .         .27(.02)  .         .73(.02)  15           78238
ACE     .25(.03)  .         .02(.02)  .25(.03)  .         .73(.02)  16           78242
ADE     .20(.03)  .13(.04)  .         .33(.03)  .         .67(.03)  16           78230
AE μ    .23(.02)  .         .         .23(.02)  .23(.03)  .77(.02)  16           78157
ACE μ   .22(.03)  .         .02(.03)  .22(.03)  .23(.03)  .76(.02)  17           78161
ADE μ   .17(.02)  .15(.04)  .         .32(.03)  .25(.03)  .68(.03)  17           78140

Note: A: additive genetic; D: non-additive genetic; C: family environment; h2: broad-sense heritability; μ: estimated correlation between spouses’ additive effects due to assortative mating; E: error variance. More complex models, for example including gene-environment covariance, were estimated but did not converge.

Table S9 Variance components of HADS-4 calculated using nuclear families of twins.

Model   A         D         C         h2        μ         E         Free params  ssaBIC
AE      .25(.02)  .         .         .25(.02)  .         .75(.02)  15           65560
ACE     .23(.03)  .         .03(.03)  .23(.03)  .         .74(.02)  16           65564
ADE     .19(.03)  .11(.04)  .         .30(.03)  .         .70(.03)  16           65555
AE μ    .23(.02)  .         .         .23(.02)  .21(.03)  .77(.02)  16           65512
ACE μ   .21(.03)  .         .03(.03)  .21(.03)  .21(.03)  .76(.02)  17           65515
ADE μ   .18(.03)  .12(.04)  .         .30(.03)  .21(.03)  .70(.03)  17           65503

Note: A: additive genetic; D: non-additive genetic; C: family environment; h2: broad-sense heritability; μ: estimated correlation between spouses’ additive effects due to assortative mating; E: error variance.
More complex models, for example including gene-environment covariance, were estimated but did not converge.

References

Bollen K, Lennox R. 1991. Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin 110(2):305-314.

Browne MW. 2001. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research 36(1):111-150.

Fulker D. 1988. Genetic and cultural transmission in human behavior. p 318-340.

Kaiser HF. 1991. Coefficient alpha for a principal component and the Kaiser-Guttman rule. Psychological Reports 68(3):855-858.

Kim SY, Mun EY, Smith S. 2014. Using mixture models with known class membership to address incomplete covariance structures in multiple-group growth models. British Journal of Mathematical & Statistical Psychology 67(1):94-116.

Maters GA, Sanderman R, Kim AY, Coyne JC. 2013. Problems in cross-cultural use of the Hospital Anxiety and Depression Scale: "No butterflies in the desert". PLoS ONE 8(8):11.

McDonald RP. 1999. Test theory: A unified treatment. Psychology Press.