
Supplementary Materials and Methods
Additional NPL two-locus interaction using GENEHUNTER-TWOLOCUS.
To further examine regions of potential interaction, we used the nonparametric
application in GENEHUNTER-TWOLOCUS 1; 2. This analysis is a three-dimensional
scan in which a score is derived from the inheritance vectors at each of the two loci and
the trait phenotype. In this method, alleles shared by affected individuals are examined
simultaneously for identity-by-descent at the two loci.
Definition of empirical NPL values of significance for two-locus interactions.
Using gene dropping as implemented in MERLIN 3, we generated 10,000 simulated
samples of the data to estimate NPL statistics. These simulations corroborated the
significance of the observed two-unit increase of the interaction NPL values compared
to the NPL scores in all families (P-values of P<0.001 and P<0.01 for the 17p-11q
interaction and the 4q-11q interaction, respectively). The overlapping linkage peak on
11q when conditioning on 4q or 17p cannot be explained by an identical set of families
linked to the three regions since only a small percentage of families (4%) exhibit this
property (Supplementary Figure 2), and analyzing them alone yields a maximum NPL
on 11q of 0.72, which is much smaller than the NPL scores observed in the 4q- and
17p-linked subsets of families (Supplementary Figure 1A). Furthermore, this
increased NPL score on 11q in the 4% of families linked to both 4q and 17p was not
significant based on the MERLIN gene-dropping simulations.
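The empirical significance levels quoted above rest on comparing an observed statistic with its gene-dropping null distribution. A minimal sketch of that comparison in Python follows. It is not MERLIN's implementation: the function name `empirical_p_value` is hypothetical, the add-one estimator is an assumption (the text does not state which estimator was used), and the null scores are random stand-ins rather than real simulation output.

```python
import random

def empirical_p_value(observed, simulated):
    """Share of null replicates at least as large as the observed score.

    Uses the add-one estimator (r + 1) / (B + 1), an assumption made here
    so that the estimate is never exactly zero.
    """
    r = sum(1 for s in simulated if s >= observed)
    return (r + 1) / (len(simulated) + 1)

# Stand-in null distribution: B = 10,000 draws from a standard normal.
# Real null scores would come from gene-dropping replicates.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

p = empirical_p_value(3.88, null_scores)
print(f"empirical P = {p:.4f}")
```

With B=10,000 replicates the smallest value this estimator can return is roughly 0.0001, which is why simulation-based results are naturally reported as bounds such as P<0.001.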
Two-locus parametric linkage analysis.
Given the very strong evidence of linkage to these regions based on the non-parametric
analysis, we used joint maximization of the linkage parameters and the trait model
parameters to evaluate the mode of action/inheritance of each locus on ADHD risk.
Parametric linkage analysis using GENEHUNTER-TWOLOCUS was applied to
maximize the interaction linkage statistic by varying allele frequencies and penetrances 4.
The analysis assumes both trait loci are diallelic and on different chromosomes. The
models included in the analysis consist of dominant/dominant, dominant/recessive,
recessive/dominant and recessive/recessive epistatic models as well as models for
heterogeneity and additivity. The 4q-11q interaction gave a maximum parametric LOD
score of 5.09 for a dominant/dominant model with 70% penetrance in individuals who
carried at least 1 risk allele at both loci and a phenocopy rate (penetrance in all other
individuals) of 1% (Table 1, supplementary material). The allele frequencies that
maximized the LOD score were 5% for chromosome 4q and 25% for chromosome 11q.
The 17p-11q interaction gave a maximum parametric LOD of 4.36 for a
dominant/dominant model with 60% penetrance in individuals who carried at least 1 risk
allele at both loci and a phenocopy rate of 1%. The allele frequencies that maximized
the LOD score were 5% for chromosome 17p and 35% for chromosome 11q.
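To make the dominant/dominant epistatic model concrete, the sketch below encodes its penetrance rule and works out the population prevalence that the maximizing parameters would imply. This calculation is illustrative and is not part of the reported analysis: it assumes Hardy-Weinberg proportions at each locus and independence between loci (reasonable for loci on different chromosomes), and all function names are hypothetical.

```python
def carrier_probability(risk_allele_freq):
    """P(at least one risk allele) under Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - risk_allele_freq) ** 2

def penetrance(carrier_locus1, carrier_locus2, f_both=0.70, phenocopy=0.01):
    """Dominant/dominant epistatic model: elevated risk only when a risk
    allele is carried at both loci; everyone else gets the phenocopy rate."""
    return f_both if (carrier_locus1 and carrier_locus2) else phenocopy

def implied_prevalence(q1, q2, f_both, phenocopy=0.01):
    """Prevalence implied by the model (assumes independent loci, each in
    Hardy-Weinberg equilibrium)."""
    p_both = carrier_probability(q1) * carrier_probability(q2)
    return p_both * f_both + (1.0 - p_both) * phenocopy

# 4q-11q model from the text: allele frequencies 5% and 25%, penetrance 70%.
print(round(implied_prevalence(0.05, 0.25, 0.70), 4))
# 17p-11q model from the text: allele frequencies 5% and 35%, penetrance 60%.
print(round(implied_prevalence(0.05, 0.35, 0.60), 4))
```

Under these assumptions both models imply a prevalence of a few percent, driven mostly by the fraction of individuals who carry a risk allele at both loci.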
Power Analyses To Detect Two Interacting Loci While Considering A Continuous Trait.
To evaluate the power to detect two interacting loci while considering a continuous
phenotype, i.e., the presence of an interaction term in (1) (significant interactive
additive effects) 6; 7 (see text in the manuscript for details), we used the
simulation-based strategy described by Li et al 8.
$$y = \mu + S + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 + i_{aa} x_1 x_2 + i_{ad} x_1 z_2 + i_{da} z_1 x_2 + i_{dd} z_1 z_2 \qquad (1)$$
Response to stimulant medication interaction analyses:

When evaluating interacting effects underlying response to stimulant medication using
equation (1), we observed a significant interactive effect for question 18 (interactive
additive effects at LPHN3 and dominant effects at 11q). Following Li et al 8, the
algorithm to evaluate the power is as follows:
1. Fit model (1) and retain the estimated parameter coefficients, $\hat{\beta}$, and the
estimated variance-covariance matrix, $\hat{\Sigma}$. To detect the interaction $i_{ad}$, set its
coefficient $\hat{i}_{ad}$ equal to the actual parameter estimate (using the original data) and
equal to zero otherwise.
2. Generate n new values of y, say y*, from a multivariate normal distribution with
estimated parameters $\hat{\beta}$ and $\hat{\Sigma}$. To generate these observations, the MASS package 9 was
used while setting n=82 (our original sample size).
3. Construct a simulated data set by replacing the values of y in the original data set with
those generated in 2.
4. Fit model (1) using the data generated in 3 and extract both $\hat{i}_{ad}$ and its P-value.
5. Repeat 2-4 B times, i.e., B=10,000.
6. Fix the type I error probability at $\alpha = 0.05$ and determine the rejection rate of the
hypothesis $H_0: i_{ad} = 0$ vs. $H_1: i_{ad} \neq 0$.
7. Construct a 2 by 2 table summarizing both procedures, i.e., epistatic effect vs. no
epistatic effect (see below), and estimate the power as the sensitivity of the test,
i.e., Power = n11/(n11+n21).

                              Model
Test outcome    Epistatic ($i_{ad} \neq 0$)    Non-Epistatic ($i_{ad} = 0$)
Positive        n11                            n12
Negative        n21                            n22

Positive: P-value<0.05

8. Plot the ROC curve for the simulated data sets. We implemented this procedure
using the epicalc package 10.
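Steps 1-6 amount to a parametric simulation of the fitted model followed by repeated refitting. The sketch below illustrates the idea in Python under deliberately simplified assumptions and is not the authors' implementation (which used R and the MASS package): a single covariate stands in for the interaction term, new responses are drawn as fitted value plus normal residual noise, the test uses a normal approximation to the t statistic, and the data are synthetic. The names `fit_line` and `simulated_power` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, residual SD, standard error of b1).
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sd = (rss / (n - 2)) ** 0.5
    return b0, b1, sd, sd / sxx ** 0.5

def simulated_power(x, y, B=2000, alpha=0.05, seed=1):
    """Rejection rate of H0: b1 = 0 across B simulated data sets."""
    rng = random.Random(seed)
    b0, b1, sd, _ = fit_line(x, y)                    # step 1: fit, keep estimates
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # normal approximation
    rejections = 0
    for _ in range(B):                                # step 5: repeat B times
        # steps 2-3: new responses generated from the fitted model
        y_star = [b0 + b1 * xi + rng.gauss(0.0, sd) for xi in x]
        _, b1_star, _, se_star = fit_line(x, y_star)  # step 4: refit
        if abs(b1_star / se_star) > z_crit:           # step 6: test at level alpha
            rejections += 1
    return rejections / B

# Synthetic data: n = 82 observations, binary covariate, true slope 2.
rng = random.Random(42)
x = [i % 2 for i in range(82)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
print(simulated_power(x, y))
```

Extending this to the full model (1) would mean fitting all eight genetic terms jointly and recording only the $i_{ad}$ coefficient and its P-value in each replicate.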
As a second approach to evaluate the usefulness of including the $i_{ad}$ interaction term,
we also calculated the mean squared error (MSE) (for the B simulated data sets) as a
measure of accuracy of model (1) (the lower the MSE, the better the model) when such
a term was or was not included. The MSE was defined as
$$\mathrm{MSE} = \frac{1}{nB} \sum_{j=1}^{B} \sum_{i=1}^{n} \left(y_{ij} - y^{*}_{ij}\right)^2$$
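The MSE definition can be checked numerically with a few lines of Python. This is a minimal sketch of the formula only: the function name `mean_squared_error` is hypothetical, the nested-list layout (B data sets of n values each) is an assumption, and the numbers are toy values rather than study data.

```python
def mean_squared_error(y_sets, y_star_sets):
    """MSE = (1 / (n * B)) * sum_j sum_i (y_ij - y*_ij)^2.

    y_sets[j][i] and y_star_sets[j][i] hold the i-th of n values in the
    j-th of B simulated data sets.
    """
    B = len(y_sets)
    n = len(y_sets[0])
    total = sum(
        (y - y_star) ** 2
        for ys, y_stars in zip(y_sets, y_star_sets)
        for y, y_star in zip(ys, y_stars)
    )
    return total / (n * B)

# Two toy data sets (B = 2) with three observations each (n = 3).
y_obs = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
y_sim = [[1.0, 2.5, 3.0], [0.0, 2.0, 4.0]]
print(mean_squared_error(y_obs, y_sim))
```

Computing this quantity once with the interaction term in the model and once without it gives the two values that are compared between the epistatic and reduced models.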

Results obtained for question 18 after applying this algorithm are presented in
Supplementary Table 2 and Supplementary Figure 3A. Following steps 1-7, the power
for detecting the $i_{ad}$ interaction term is ~85% (Table 1).
In Supplementary Figure 3B, we present the simulation-based density distribution
functions for the MSE values for the epistatic and non-epistatic (reduced) models. This
shows that including the $i_{ad}$ interaction leads to a lower average MSE in the B simulated
data sets (epistatic = 0.634; reduced = 0.693; P-value < 0.0001), i.e., the epistatic model
fits the data best.
Brain Metabolism interaction analyses:
We reported three regions that showed significant interaction effects:
myoinositol in the right posterior cingulate gyrus (RPCG), myoinositol in the left
posterior cingulate gyrus (LPCG), and choline in the right medial cingulate gyrus
(RMCG). All of them showed an $i_{ad}$ interactive effect (additive effect from 11q and
dominant effect from LPHN3) as the best-fitting model. In this case, the
fitted model has the following structure:
$$y = \mu + S + A + D + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 + i_{aa} x_1 x_2 + i_{ad} x_1 z_2 + i_{da} z_1 x_2 + i_{dd} z_1 z_2 \qquad (2)$$
where y is the quantitative MRI phenotype, A is the age at diagnosis, and S is a
code for gender (males=0, females=1). Details about other variables in (2) can be found
in the text of the manuscript. Following Li et al 8, we used the algorithm described
above to determine the power of the proton 1H-MRS data.
Results for myoinositol in the RPCG, myoinositol in the LPCG, and choline in the
RMCG after following steps 1-7, are presented in Supplementary Table 3A, 3B and 3C,
respectively. Also, in Supplementary Figure 4 we present the ROC curves as well as
MSE density plots for all of the three brain metabolites.
For myoinositol in the RPCG, the power for detecting interacting effects is
relatively low (~51%; Supplementary Table 3A). Moreover, when comparing the
simulation-based average MSE between the epistatic and reduced models, no
statistically significant difference was found (epistatic = 1.7029, reduced = 1.7081;
P-value=0.645) (Supplementary Figure 4A).
For myoinositol in the LPCG and choline in the RMCG, the simulation-based
power for detecting epistasis is ~95% (Supplementary Table 3B) and ~93%
(Supplementary Table 3C), respectively. In addition, the comparison of the
simulation-based MSE between the epistatic and reduced models gives statistically significant
differences for both (myoinositol in LPCG: epistatic=0.0045, reduced=0.0163,
P-value<0.0001; choline in RMCG: epistatic=0.0005, reduced=0.0008, P-value<0.0001)
(Supplementary Figure 4B and Supplementary Figure 4C).
Power Analyses To Detect Two Interacting Loci While Considering ADHD As A Binary Trait.
We obtained a maximal NPL score of 6.08 (P<0.00000001) located at 111.1 cM
on 11q (SNP marker rs1293344) and at 91.3 cM on 4q (rs1038426) (empirical P-values
were determined based on B=10,000 simulations). Using this information, we
formulated our power analysis strategy as follows: let $A_i$ and $B_j$ be the coordinates (in cM)
for the 4q and 11q regions, respectively, and let $X_{ij}$ be the NPL score found at positions
$A_i$ and $B_j$, i=1,2,…,131, j=1,2,…,135 (131 and 135 represent the number of steps
spanning 11q and 4q respectively). In addition, let $\alpha$ and $\beta$ be the type I and
type II error probabilities, respectively. We want to test $H_0: \theta = \theta_0$ versus
$H_1: \theta \neq \theta_0$, where $\theta$ is the true parameter, i.e., the NPL score, and $\theta_0$ is a
pre-specified value for that parameter. Now, suppose that the null hypothesis is false and
that the true NPL score is $\theta^{*}_0 = \theta_0 + \delta$, with $\delta$ the change to be detected. Under
regular conditions, the type II error probability is given by Montgomery & Runger as follows 11:

(3)

and Power = $1 - \beta$. In the expression above, $\Phi(z)$ denotes the probability to the left of z
in the standard normal distribution, $\sigma$ is the standard deviation and n is the number of
families.

In our approach a total of k=60 equally spaced $\delta$ values in the interval [-2, 2]
were generated, the type I error probability was held fixed at 5%, and the Fisher
information matrix, estimated using GENEHUNTER-TWOLOCUS (see above), was
used as an estimator of the standard deviation $\sigma$ for the corresponding $X_{ij}$. We used R
(R Development Core Team, 2010) for calculating and plotting.
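The grid-based power calculation can be sketched as follows. This is an illustration only, written in Python rather than the R used in the study, and it assumes the type II error takes the standard two-sided z-test form, $\beta = \Phi(z_{\alpha/2} - \delta\sqrt{n}/\sigma) - \Phi(-z_{\alpha/2} - \delta\sqrt{n}/\sigma)$, which may not match the authors' expression exactly. The function name `z_test_power` is hypothetical, and the values of $\sigma$ and n are placeholders rather than study values.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z test to detect a shift delta.

    beta = Phi(z - delta*sqrt(n)/sigma) - Phi(-z - delta*sqrt(n)/sigma),
    with z the upper alpha/2 point of the standard normal; power = 1 - beta.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma
    beta = std_normal.cdf(z - shift) - std_normal.cdf(-z - shift)
    return 1.0 - beta

# k = 60 equally spaced delta values on [-2, 2], as in the text.
k = 60
deltas = [-2 + 4 * i / (k - 1) for i in range(k)]

# sigma and n are illustrative placeholders, not values from the study.
powers = [z_test_power(d, sigma=1.0, n=12) for d in deltas]
for d, p in zip(deltas[::10], powers[::10]):
    print(f"delta = {d:+.2f}  power = {p:.3f}")
```

Under this form the power equals $\alpha$ at $\delta = 0$ and rises toward 1 as $|\delta|\sqrt{n}/\sigma$ grows, so a smaller standard deviation or a larger number of families sharpens the curve.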

For each $\delta_k$, k=1,2,…,60, a total of 17,685 power values were obtained. In
general, our results indicate that for $\delta < 0$, e.g., detecting a lower NPL score than the one
detected and reported in the manuscript, the power is >95%. Supplementary Figure 5 presents
power values, as a function of $\delta$, for the maximal NPL score value, 6.08. Assuming that
$\delta = 0.5$, the probability of detecting a maximal NPL score of 6.58 (6.08+$\delta$) while
keeping other parameters fixed (e.g., number of families, parameter of heterogeneity
alpha=0.9991, and heterogeneity LOD, HLOD=2.0084) is about 60%. Supplementary
Figure 6 depicts power values for some $\delta > 0$ values, as a function of recombination
distances on chromosomes 4q and 11q. Here, the power is calculated for all
17,685 possible NPL scores and not only for the maximal value as in Supplementary Figure 5.
In conclusion, the power evaluation shows that, in general, our discovery sample
exhibits exceptional power to detect two-locus interactions. This fact is now described in
the text and procedures appended to the online supplementary material.
Supplementary Figures.
Supplementary Figure 1.
(A) Results of a correlation subset analysis between linked regions in 134 nuclear
families that were primarily derived from the multigenerational extended pedigrees. In
order to determine the presence of an interaction we used the weight function weight1-0
to measure correlation and weight0-1 to measure heterogeneity, taking a positive
nonparametric linkage statistic as evidence of linkage 5. Results on chromosome 11q
demonstrate an increase in the nonparametric linkage statistic from 0.55 to 3.2 when
conditioning on families linked to 4q (n=12) and an increase from 0.55 to 3.88 (n=11)
when conditioning on families linked to 17p. The difference is greatest between
mapping coordinates 110 cM and 120 cM on chromosome 11q, with identical regions
being defined by the two interactions. Using 10,000 simulations implemented in
MERLIN 3, we determined empirical P values for the two results; both 11q conditioned
on 17p and 11q conditioned on 4q were significant (P<0.001 and
P<0.01, respectively). (B and C) Results from the GENEHUNTER-TWOLOCUS 1
nonparametric module define global maxima for the two-locus linkage analysis. For the
interaction between 17p and 11q (B, n=11) a maximal nonparametric score of 5.51
(P<0.000001) is located at 111.1 cM on 11q at SNP marker rs1293344 and 12.75 cM on
17p in the vicinity of rs9227. For the interaction between 4q and 11q (C, n=12) a
maximal nonparametric score of 6.08 (P<0.00000001) is located at 111.1 cM on 11q at
SNP marker rs1293344 and 91.3 cM on 4q in the vicinity of rs1038426. Again, identical
regions on 11q are defined by this method.
Supplementary Figure 2.
A Venn diagram of the 23 total families linked to 4q, 11q and 17p discloses that fewer
families are linked to 11q alone compared to 4q and 17p. Of the families included, less
than half (47%) demonstrated linkage to more than one region, with 4% demonstrating
linkage to all three regions. 22% of families are linked only to chromosome 4q and 17%
only to chromosome 17p. A relatively smaller fraction, 13%, of families are linked only to
chromosome 11q. The greatest overlap is between families linked to 11q and 17p and
between families linked to 4q and 11q with relatively fewer families linked to both 4q and
17p. These results demonstrate that the cause for defining an identical region on 11q is
not a highly overlapping set of families linked to 4q, 11q and 17p.
Supplementary Figure 3.
A. Receiver operating characteristic (ROC) curve for evaluating the performance of model (1)
to detect the presence of $i_{ad}$ for question 18. B. MSE density distribution function for the
epistatic (in black) and reduced (in blue) models. A-B were generated by a
simulation-based approach using B=10,000 data sets.

Supplementary Figure 4.
A. (left) ROC curve for evaluating the performance of model (1) to detect the presence of $i_{ad}$
for myoinositol in the right posterior cingulate gyrus; (right) MSE density distribution
function for the epistatic (in black) and reduced (in blue) models; B. (left) ROC curve for
evaluating the performance of model (1) to detect the presence of $i_{ad}$ for myoinositol in
the left posterior cingulate gyrus; (right) MSE density distribution function for the
epistatic (in black) and reduced (in blue) models; C. (left) ROC curve for evaluating the
performance of model (1) to detect the presence of $i_{ad}$ for choline in the right medial
cingulate gyrus; (right) MSE density distribution function for the epistatic (in black) and
reduced (in blue) models. A-C were generated by a simulation-based approach using
B=10,000 data sets.

Supplementary Figure 5.
Power values as a function of $\delta$ for the maximal NPL score value of 6.08.
Supplementary Figure 6.

Power values as a function of the coordinates of chromosomes 4q and 11q. Distance is
weighted by the least squares smoothing method for different values of $\delta$. A. $\delta = 0.1$, B.
$\delta = 0.5$, and C. $\delta = 1$. Red indicates high power and dark green low power.




Supplementary Tables.
Supplementary Table 1.

A. Parametric LOD = 5.09

                   11q**
4q*       +/+      +/-      -/-
+/+       0.01     0.01     0.01
+/-       0.01     0.7      0.7
-/-       0.01     0.7      0.7

* 4q allele frequency = 5%
** 11q allele frequency = 25%

B. Parametric LOD = 4.36

                   11q**
17p*      +/+      +/-      -/-
+/+       0.01     0.01     0.01
+/-       0.01     0.6      0.6
-/-       0.01     0.6      0.6

* 17p allele frequency = 5%
** 11q allele frequency = 35%

Parametric linkage analysis using GENEHUNTER-TWOLOCUS to maximize LOD
statistics by varying allele frequency and penetrances. The models included in the
analysis consist of dominant/dominant, dominant/recessive, recessive/dominant and
recessive/recessive epistatic models as well as models for heterogeneity and additivity.
The gene frequency and penetrance for each of these models was varied. The
interaction between 4q and 11q gave a maximum parametric LOD score of 5.09 for a
dominant/dominant model with 70% penetrance and a phenocopy rate of 1% (Table
1A). The allele frequencies that maximized the LOD score were 5% for chromosome 4q
and 25% for chromosome 11q. SNP markers rs1293344 and rs1038426 were used to
maximize the model since they demonstrate the largest linkage signal in the
single-locus nonparametric analysis. The maximum parametric LOD of 4.36 for a
dominant/dominant model with 60% penetrance and a phenocopy rate of 1% was found
for SNP markers rs1293344 on 11q and rs9227 on 17p (Table 1B). The allele
frequencies that maximized the LOD score were 5% for chromosome 17p and 35% for
chromosome 11q.
Supplementary Table 2.

                              Model
Test outcome    Epistatic ($i_{ad} \neq 0$)    Non-Epistatic ($i_{ad} = 0$)
Positive        9983                           17
Negative        1773                           8227

Power=84.91%

Summary of the performance for the epistatic and non-epistatic (reduced) model for
question 18 using B=10,000 simulated data sets. For all tables, Positive means that
P-value<0.05, i.e., $i_{ad} \neq 0$ (epistatic effect is present).

Supplementary Table 3.

A.
                              Model
Test outcome    Epistatic ($i_{ad} \neq 0$)    Non-Epistatic ($i_{ad} = 0$)
Positive        396                            9604
Negative        371                            9629

Power=51.63%

B.
                              Model
Test outcome    Epistatic ($i_{ad} \neq 0$)    Non-Epistatic ($i_{ad} = 0$)
Positive        9973                           27
Negative        529                            9471

Power=94.96%

C.
                              Model
Test outcome    Epistatic ($i_{ad} \neq 0$)    Non-Epistatic ($i_{ad} = 0$)
Positive        7286                           2714
Negative        527                            9473

Power=93.25%

Summary of the performance for the epistatic and non-epistatic (reduced) model for A.
myoinositol in the right posterior cingulate gyrus, B. myoinositol in the left posterior
cingulate gyrus, and C. choline in the right medial cingulate gyrus using B=10,000
simulated data sets. For all tables, Positive means that P-value<0.05, i.e., $i_{ad} \neq 0$
(epistatic effect is present).
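The reported power values can be reproduced from the tabulated counts with Power = n11/(n11+n21). The short Python check below uses the counts printed in Supplementary Tables 2 and 3A-3C; the function name `power_from_table` is hypothetical and the script is a convenience sketch, not part of the original analysis.

```python
def power_from_table(n11, n21):
    """Power = n11 / (n11 + n21), where n11 and n21 are the Positive and
    Negative counts in the Epistatic column of the 2 by 2 table."""
    return n11 / (n11 + n21)

# (n11, n21) pairs as printed in Supplementary Tables 2 and 3A-3C.
tables = {
    "Table 2 (question 18)": (9983, 1773),
    "Table 3A (myoinositol, RPCG)": (396, 371),
    "Table 3B (myoinositol, LPCG)": (9973, 529),
    "Table 3C (choline, RMCG)": (7286, 527),
}

for label, (n11, n21) in tables.items():
    print(f"{label}: {100 * power_from_table(n11, n21):.2f}%")
```

The results agree with the stated power values to within rounding in the second decimal place.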

References
1. Strauch, K., Fimmers, R., Kurz, T., Deichmann, K.A., Wienker, T.F., and Baur, M.P. (2000).
Parametric and nonparametric multipoint linkage analysis with imprinting and
two-locus-trait models: application to mite sensitization. Am J Hum Genet 66, 1945-1957.
2. Dietter, J., Spiegel, A., an Mey, D., Pflug, H.J., Al-Kateb, H., Hoffmann, K., Wienker, T.F.,
and Strauch, K. (2004). Efficient two-trait-locus linkage analysis through program
optimization and parallelization: application to hypercholesterolemia. Eur J Hum Genet
12, 542-550.
3. Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. (2002). Merlin--rapid
analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97-101.
4. Greenberg, D.A., and Berger, B. (1994). Using lod-score differences to determine mode of
inheritance: a simple, robust method even in the presence of heterogeneity and reduced
penetrance. Am J Hum Genet 55, 834-840.
5. Cox, N.J., Frigge, M., Nicolae, D.L., Concannon, P., Hanis, C.L., Bell, G.I., and Kong, A.
(1999). Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to
diabetes in Mexican Americans. Nat Genet 21, 213-215.
6. Cordell, H.J. (2002). Epistasis: what it means, what it doesn't mean, and statistical methods to
detect it in humans. Human Molecular Genetics 11, 2463-2468.
7. Cordell, H.J., Todd, J.A., Hill, N.J., Lord, C.J., Lyons, P.A., Peterson, L.B., Wicker, L.S., and
Clayton, D.G. (2001). Statistical modeling of interlocus interactions in a complex
disease: rejection of the multiplicative model of epistasis in type 1 diabetes. Genetics
158, 357-367.
8. Li, H., Gao, G., Li, J., Page, G.P., and Zhang, K. (2007). Detecting epistatic interactions
contributing to human gene expression using the CEPH family data. BMC Proc 1 Suppl
1, S67.
9. Venables, W.N., and Ripley, B.D. (2002). Modern Applied Statistics with S. (New York:
Springer-Verlag).
10. Chongsuvivatwong, V. (2010). epicalc: Epidemiological calculator. R package version
2.12.0.0. In. (
11. Montgomery, D.C., and Runger, G.C. (2003). Applied Statistics and Probability for
Engineers. (John Wiley & Sons).