Supplementary Materials and Methods

Additional NPL two-locus interaction using GENEHUNTER-TWOLOCUS. To further examine regions of potential interaction, we used the nonparametric application in GENEHUNTER-TWOLOCUS [1, 2]. This analysis is a three-dimensional scan in which a score is derived from the inheritance vectors at each of the two loci and the trait phenotype. In this method, alleles shared by affected individuals are examined for identity-by-descent at the two loci simultaneously.

Definition of empirical NPL values of significance for two-locus interactions. Using gene dropping as implemented in MERLIN [3], we generated 10,000 simulated samples of the data to estimate NPL statistics. These simulations corroborated the significance of the observed two-unit increase of the interaction NPL values compared with the NPL scores in all families (P<0.001 and P<0.01 for the 17p-11q interaction and the 4q-11q interaction, respectively). The overlapping linkage peak on 11q when conditioning on 4q or 17p cannot be explained by an identical set of families linked to the three regions, since only a small percentage of families (4%) exhibit this property (Supplementary Figure 2), and analyzing them alone yields a maximum NPL on 11q of 0.72, which is much smaller than the NPL scores observed in the 4q- and 17p-linked subsets of families (Supplementary Figure 1A). Furthermore, this increased NPL score on 11q in the 4% of families linked to both 4q and 17p was not significant based on the MERLIN gene-dropping simulations.

Two-locus parametric linkage analysis. Given the very strong evidence of linkage to these regions in the nonparametric analysis, we used joint maximization of the linkage parameters and the trait model parameters to evaluate the mode of action/inheritance of each locus on ADHD risk.
Parametric linkage analysis using GENEHUNTER-TWOLOCUS was applied to maximize the interaction linkage statistic by varying allele frequencies and penetrances [4]. The analysis assumes that both trait loci are diallelic and located on different chromosomes. The models included in the analysis consist of dominant/dominant, dominant/recessive, recessive/dominant and recessive/recessive epistatic models, as well as models for heterogeneity and additivity. The 4q-11q interaction gave a maximum parametric LOD score of 5.09 for a dominant/dominant model with 70% penetrance in individuals who carried at least one risk allele at both loci and a phenocopy rate (penetrance in all other individuals) of 1% (Supplementary Table 1). The allele frequencies that maximized the LOD score were 5% for chromosome 4q and 25% for chromosome 11q. The 17p-11q interaction gave a maximum parametric LOD score of 4.36 for a dominant/dominant model with 60% penetrance in individuals who carried at least one risk allele at both loci and a phenocopy rate of 1%. The allele frequencies that maximized the LOD score were 5% for chromosome 17p and 35% for chromosome 11q.

Power Analyses To Detect Two Interacting Loci While Considering A Continuous Trait. To evaluate the power to detect two interacting loci for a continuous phenotype, i.e., the presence of a nonzero interaction term in (1) (significant interactive additive effects) [6, 7] (see the text of the manuscript for details), we used the simulation-based strategy described by Li et al. [8].

$$y = \mu + \beta_S S + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 + i_{aa} x_1 x_2 + i_{ad} x_1 z_2 + i_{da} z_1 x_2 + i_{dd} z_1 z_2 \qquad (1)$$

Response to stimulant medication interaction analyses: When evaluating interacting effects underlying response to stimulant medication using equation (1), we observed a significant interactive effect for question 18 (interactive additive effects at LPHN3 and dominant effects at 11q). Following Li et al. [8], the algorithm to evaluate the power is as follows:

1. Fit model (1) and retain the estimated parameter coefficients and their estimated variance-covariance matrix. To detect the interaction $i_{ad}$, set the coefficient $\hat{i}_{ad}$ to its actual parameter estimate (using the original data) and to zero otherwise.
2. Generate n new values of y, say $y^*$, from a multivariate normal distribution with the parameters estimated in step 1. To generate these observations, the MASS package [9] was used with n=82 (our original sample size).
3. Construct a simulated data set by replacing the values of y in the original data set with those generated in step 2.
4. Fit model (1) using the data generated in step 3 and extract both $\hat{i}_{ad}$ and its P-value.
5. Repeat steps 2-4 B times, with B=10,000.
6. Fix the type I error probability at 0.05 and determine the rejection rate of the hypothesis $H_0: i_{ad} = 0$ vs. $H_1: i_{ad} \neq 0$.
7. Construct a 2 by 2 table summarizing both procedures, i.e., epistatic effect vs. no epistatic effect (see below), and estimate the power as the sensitivity of the test, i.e., Power = n11/(n11+n21).

                     Model
   Test outcome      Epistatic ($i_{ad} \neq 0$)      Non-Epistatic ($i_{ad} = 0$)
   Positive          n11                              n12
   Negative          n21                              n22
   Positive: P-value<0.05

8. Plot the ROC curve for the simulated data sets.

We implemented this procedure using the epicalc package [10].

As a second approach to evaluate the value of including the $i_{ad}$ interaction term, we also calculated the mean squared error (MSE) over the B simulated data sets as a measure of the accuracy of model (1) (the lower the MSE, the better the model) when this term was or was not included. The MSE was defined as

$$\mathrm{MSE} = \frac{1}{nB} \sum_{j=1}^{B} \sum_{i=1}^{n} \left( y_{ij} - y^*_{ij} \right)^2$$

Results obtained for question 18 after applying this algorithm are presented in Supplementary Table 2 and Supplementary Figure 3A. Following steps 1-7, the power for detecting the $i_{ad}$ interaction term is ~85% (Supplementary Table 2). In Supplementary Figure 3B, we present the simulation-based density distribution of the MSE values for the epistatic and non-epistatic (reduced) models.
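Steps 1-7 and the MSE comparison can be illustrated with a deliberately simplified Python sketch. It is not the authors' R implementation: it assumes a single hypothetical interaction predictor in place of the full model (1), draws independent normal errors instead of calling MASS, uses a normal approximation for the P-value, and runs B=2,000 replicates on made-up data to keep the run short. All function and variable names are illustrative. The last line applies the Power = n11/(n11+n21) definition to the counts reported in Supplementary Table 2.

```python
import random
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x.

    Returns (b0, b1, se_b1, sigma), where sigma is the residual SD.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = (rss / (n - 2)) ** 0.5
    return b0, b1, sigma / sxx ** 0.5, sigma


def wald_p(b1, se_b1):
    """Two-sided P-value for H0: b1 = 0 (normal approximation, an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(b1 / se_b1)))


def simulate(x, y, b0, slope, sigma, n_sims, rng, alpha=0.05):
    """Steps 2-5: draw y*, refit, count rejections, and accumulate the MSE."""
    positives, sq_err = 0, 0.0
    for _ in range(n_sims):
        y_star = [b0 + slope * xi + rng.gauss(0.0, sigma) for xi in x]
        _, b1, se, _ = fit_line(x, y_star)
        positives += wald_p(b1, se) < alpha
        sq_err += sum((yi - ys) ** 2 for yi, ys in zip(y, y_star))
    return positives, n_sims - positives, sq_err / (len(x) * n_sims)


def power_from_table(n11, n21):
    """Step 7: Power = n11 / (n11 + n21)."""
    return n11 / (n11 + n21)


rng = random.Random(1)
n, B = 82, 2000  # n matches the text; B is reduced from 10,000 for speed

# Made-up data: x stands in for the product x1*z2 of hypothetical genotype codes.
x = [rng.choice([-1.0, 0.0, 1.0]) * rng.choice([-0.5, 0.5]) for _ in range(n)]
y = [1.0 + 1.5 * xi + rng.gauss(0.0, 0.8) for xi in x]

# Step 1: fit the model on the "original" data.
b0, b1_hat, _, sigma = fit_line(x, y)

# Epistatic scenario: interaction coefficient set to its estimate.
n11, n21, mse_epi = simulate(x, y, b0, b1_hat, sigma, B, rng)
# Reduced scenario: interaction coefficient set to zero, other estimates kept.
n12, n22, mse_red = simulate(x, y, b0, 0.0, sigma, B, rng)

sim_power = power_from_table(n11, n21)
print("simulated power:", sim_power)
print("MSE epistatic vs reduced:", mse_epi, mse_red)

# Counts from Supplementary Table 2 (question 18).
print(round(power_from_table(9983, 1773), 3))  # → 0.849
```

In this reading, each scenario contributes B replicates, so the power reduces to the rejection rate when the interaction is truly present, and the rejection rate in the reduced scenario estimates the type I error.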
Supplementary Figure 3B shows that including the $i_{ad}$ interaction leads to a lower average MSE across the B simulated data sets (epistatic = 0.634; reduced = 0.693; P-value < 0.0001), i.e., the epistatic model fits the data best.

Brain Metabolism interaction analyses: We reported three regions that showed significant interaction effects: myoinositol in the right posterior cingulate gyrus (RPCG), myoinositol in the left posterior cingulate gyrus (LPCG), and choline in the right medial cingulate gyrus (RMCG). In all of them, the best-fitting model showed an $i_{ad}$ interactive effect (additive effect from 11q and dominant effect from LPHN3). In this case, the fitted model has the following structure:

$$y = \mu + \beta_S S + \beta_A A + \beta_D D + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2 + i_{aa} x_1 x_2 + i_{ad} x_1 z_2 + i_{da} z_1 x_2 + i_{dd} z_1 z_2 \qquad (2)$$

where y is the quantitative MRI phenotype, A is the age at diagnosis, and S is a code for gender (males=0, females=1). Details about the other variables in (2) can be found in the text of the manuscript. Following Li et al. [8], we used the algorithm described above to determine the power for the proton (1H-MRS) data. Results for myoinositol in the RPCG, myoinositol in the LPCG, and choline in the RMCG after following steps 1-7 are presented in Supplementary Tables 3A, 3B and 3C, respectively. In Supplementary Figure 4 we also present the ROC curves and the MSE density plots for the three brain metabolites.

For myoinositol in the RPCG, the power for detecting interacting effects is relatively low (~51%; Supplementary Table 3A). In addition, when comparing the simulation-based average MSE between the epistatic and reduced models, no statistically significant difference was found (epistatic = 1.7029, reduced = 1.7081; P-value = 0.645) (Supplementary Figure 4A). For myoinositol in the LPCG and choline in the RMCG, the simulation-based power for detecting epistasis is ~95% (Supplementary Table 3B) and ~93% (Supplementary Table 3C), respectively.
In addition, the comparison of the simulation-based MSE between the epistatic and reduced models gives statistically significant differences for both (myoinositol in LPCG: epistatic=0.0045, reduced=0.0163, P-value<0.0001; choline in RMCG: epistatic=0.0005, reduced=0.0008, P-value<0.0001) (Supplementary Figure 4B and Supplementary Figure 4C).

Power Analyses To Detect Two Interacting Loci While Considering ADHD As A Binary Trait. We obtained a maximal NPL score of 6.08 (P<0.00000001) located at 111.1 cM on 11q (SNP marker rs1293344) and at 91.3 cM on 4q (rs1038426) (empirical P-values were determined based on B=10,000 simulations). Using this information, we formulated our power analysis strategy as follows: let $A_i$ and $B_j$ be the coordinates (in cM) for the 4q and 11q regions, respectively, and let $X_{ij}$ be the NPL score found at positions $A_i$ and $B_j$, i=1,2,…,131, j=1,2,…,135 (131 and 135 represent the number of steps spanning 11q and 4q respectively). Further, let $\alpha$ and $\beta$ be the type I and type II error probabilities, respectively. We want to test $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$, where $\mu$ is the true parameter, i.e., the NPL score, and $\mu_0$ is a pre-specified value for that parameter. Now, suppose that the null hypothesis is false and that the true NPL score is $\mu^* = \mu_0 + \delta$, with $\delta$ the change to be detected. Under regular conditions, the type II error probability $\beta$ is given by Montgomery & Runger [11] as follows: (3) and Power = $1 - \beta$. In the expression above, $\Phi(z)$ denotes the probability to the left of z in the standard normal distribution, $\sigma$ is the standard deviation and n is the number of families. In our approach, a total of k=60 equally spaced values of $\delta$ in the interval [-2, 2] were generated, the type I error probability was held fixed at 5%, and the Fisher's information matrix, estimated using GENEHUNTER-TWOLOCUS (see above), was used as an estimator of the standard deviation for the corresponding $X_{ij}$. We used R (R Development Core Team, 2010) for calculating and plotting.
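As a numerical illustration of a power calculation of this kind, the Python sketch below uses the textbook two-sided z-test expression for the type II error, beta = Phi(z_{alpha/2} - delta*sqrt(n)/sigma) - Phi(-z_{alpha/2} - delta*sqrt(n)/sigma). Treating this as the form of the type II error expression is an assumption, and the values of sigma and n are hypothetical placeholders (in the analysis described above, the standard deviation was estimated from the Fisher information at each grid position). The calculations in the study were done in R; the function names here are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, SD 1


def type_ii_error(delta, sigma, n, alpha=0.05):
    """Type II error of a two-sided z-test when the true shift is delta."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = delta * n ** 0.5 / sigma
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


def power(delta, sigma, n, alpha=0.05):
    """Power = 1 - beta."""
    return 1.0 - type_ii_error(delta, sigma, n, alpha)


# 60 equally spaced values of delta in [-2, 2], as described in the text.
deltas = [-2.0 + 4.0 * k / 59 for k in range(60)]

SIGMA = 2.0      # hypothetical standard deviation of the NPL score
N_FAMILIES = 12  # hypothetical number of families

power_curve = [power(d, SIGMA, N_FAMILIES) for d in deltas]
for d, p in zip(deltas[::10], power_curve[::10]):
    print(f"delta = {d:+.2f}  power = {p:.3f}")
```

Under this form, the power equals the type I error rate at delta = 0 and grows with the size of the shift; repeating the calculation with a position-specific sigma gives one power value per grid point for each delta.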
For each $\delta_k$, k=1,2,…,60, a total of 17,685 power values were obtained. In general, our results indicate that for $\delta < 0$, e.g., detecting a lower NPL score than the one detected and reported in the manuscript, the power is >95%. Supplementary Figure 5 presents power values, as a function of $\delta$, for the maximal NPL score, 6.08. Assuming that $\delta = 0.5$, the probability of detecting a maximal NPL score of 6.58 (6.08 + $\delta$) while keeping other parameters fixed (e.g., number of families, heterogeneity parameter alpha=0.9991, and heterogeneity LOD, HLOD=2.0084) is about 60%. Supplementary Figure 6 depicts power values for some $\delta > 0$ values, as a function of recombination distances on chromosomes 4q and 11q. Here, the power is calculated for all 17,685 possible NPL scores and not only for the maximal value as in Supplementary Figure 5. In conclusion, the power evaluation shows that, in general, our discovery sample exhibits exceptional power to detect two-locus interactions. This fact is now described in the text and procedures appended to the online supplementary material.

Supplementary Figures.

Supplementary Figure 1. (A) Results of a correlation subset analysis between linked regions in 134 nuclear families that were primarily derived from the multigenerational extended pedigrees. To determine the presence of an interaction, we used the weight function weight1-0 to measure correlation and weight0-1 to measure heterogeneity, taking a positive nonparametric linkage statistic as evidence of linkage [5]. Results on chromosome 11q demonstrate an increase in the nonparametric linkage statistic from 0.55 to 3.2 when conditioning on families linked to 4q (n=12) and an increase from 0.55 to 3.88 when conditioning on families linked to 17p (n=11). The difference is greatest between mapping coordinates 110 cM and 120 cM on chromosome 11q, with identical regions being defined by the two interactions.
Empirical P values for the two results were determined using 10,000 simulations implemented in MERLIN [3]; both 11q conditioned on families linked to 17p and 11q conditioned on families linked to 4q were significant (P<0.001 and P<0.01, respectively). (B and C) Results from the GENEHUNTER-TWOLOCUS [1] nonparametric module define global maxima for the two-locus linkage analysis. For the interaction between 17p and 11q (B, n=11), a maximal nonparametric score of 5.51 (P<0.000001) is located at 111.1 cM on 11q at SNP marker rs1293344 and 12.75 cM on 17p in the vicinity of rs9227. For the interaction between 4q and 11q (C, n=12), a maximal nonparametric score of 6.08 (P<0.00000001) is located at 111.1 cM on 11q at SNP marker rs1293344 and 91.3 cM on 4q in the vicinity of rs1038426. Again, identical regions on 11q are defined by this method.

Supplementary Figure 2. A Venn diagram of the 23 total families linked to 4q, 11q and 17p shows that fewer families are linked to 11q alone than to 4q or 17p alone. Of the families included, fewer than half (47%) demonstrated linkage to more than one region, with 4% demonstrating linkage to all three regions. 22% of families are linked only to chromosome 4q and 17% only to chromosome 17p. A smaller fraction, 13%, of families are linked only to chromosome 11q. The greatest overlap is between families linked to 11q and 17p and between families linked to 4q and 11q, with relatively fewer families linked to both 4q and 17p. These results demonstrate that the identical region defined on 11q is not due to a highly overlapping set of families linked to 4q, 11q and 17p.

Supplementary Figure 3. A. Receiver operating characteristic (ROC) curve for evaluating the performance of model (1) to detect the presence of $i_{ad}$ for question 18. B. MSE density distribution function for the epistatic (in black) and reduced (in blue) models. A-B were generated by a simulation-based approach using B=10,000 data sets.

Supplementary Figure 4. A. (left) ROC curve for evaluating the performance of model (1) to detect the presence of $i_{ad}$ for myoinositol in the right posterior cingulate gyrus; (right) MSE density distribution function for the epistatic (in black) and reduced (in blue) models; B. (left) ROC curve for evaluating the performance of model (1) to detect the presence of $i_{ad}$ for myoinositol in the left posterior cingulate gyrus; (right) MSE density distribution function for the epistatic (in black) and reduced (in blue) models; C. (left) ROC curve for evaluating the performance of model (1) to detect the presence of $i_{ad}$ for choline in the right medial cingulate gyrus; (right) MSE density distribution function for the epistatic (in black) and reduced (in blue) models. A-C were generated by a simulation-based approach using B=10,000 data sets.

Supplementary Figure 5. Power values as a function of $\delta$ for the maximal NPL score of 6.08.

Supplementary Figure 6. Power values as a function of the coordinates on chromosomes 4q and 11q. Distance is weighted by the least squares smoothing method for different values of $\delta$: A. $\delta = 0.1$, B. $\delta = 0.5$, and C. $\delta = 1$. Red indicates high power and dark green low power.

Supplementary Tables.

Supplementary Table 1

A. Parametric LOD = 5.09

                11q**
   4q*      +/+     +/-     -/-
   +/+      0.01    0.01    0.01
   +/-      0.01    0.7     0.7
   -/-      0.01    0.7     0.7

   * 4q allele frequency = 5%
   ** 11q allele frequency = 25%

B. Parametric LOD = 4.36

                11q**
   17p*     +/+     +/-     -/-
   +/+      0.01    0.01    0.01
   +/-      0.01    0.6     0.6
   -/-      0.01    0.6     0.6

   * 17p allele frequency = 5%
   ** 11q allele frequency = 35%

Parametric linkage analysis using GENEHUNTER-TWOLOCUS to maximize LOD statistics by varying allele frequencies and penetrances. The models included in the analysis consist of dominant/dominant, dominant/recessive, recessive/dominant and recessive/recessive epistatic models, as well as models for heterogeneity and additivity. The gene frequency and penetrance for each of these models were varied.
The interaction between 4q and 11q gave a maximum parametric LOD score of 5.09 for a dominant/dominant model with 70% penetrance and a phenocopy rate of 1% (Supplementary Table 1A). The allele frequencies that maximized the LOD score were 5% for chromosome 4q and 25% for chromosome 11q. SNP markers rs1293344 and rs1038426 were used to maximize the model since they demonstrate the largest linkage signal in the single-locus nonparametric analysis. The maximum parametric LOD score of 4.36 for a dominant/dominant model with 60% penetrance and a phenocopy rate of 1% was found for SNP markers rs1293344 on 11q and rs9227 on 17p (Supplementary Table 1B). The allele frequencies that maximized the LOD score were 5% for chromosome 17p and 35% for chromosome 11q.

Supplementary Table 2.

                     Model
   Test outcome      Epistatic ($i_{ad} \neq 0$)      Non-Epistatic ($i_{ad} = 0$)
   Positive          9983                             17
   Negative          1773                             8227
   Power = 84.91%

Summary of the performance of the epistatic and non-epistatic (reduced) models for question 18 using B=10,000 simulated data sets. For all tables, Positive means that P-value<0.05, i.e., $i_{ad} \neq 0$ (epistatic effect is present).

Supplementary Table 3.

A.
                     Model
   Test outcome      Epistatic ($i_{ad} \neq 0$)      Non-Epistatic ($i_{ad} = 0$)
   Positive          396                              9604
   Negative          371                              9629
   Power = 51.63%

B.
                     Model
   Test outcome      Epistatic ($i_{ad} \neq 0$)      Non-Epistatic ($i_{ad} = 0$)
   Positive          9973                             27
   Negative          529                              9471
   Power = 94.96%

C.
                     Model
   Test outcome      Epistatic ($i_{ad} \neq 0$)      Non-Epistatic ($i_{ad} = 0$)
   Positive          7286                             2714
   Negative          527                              9473
   Power = 93.25%

Summary of the performance of the epistatic and non-epistatic (reduced) models for A. myoinositol in the right posterior cingulate gyrus, B. myoinositol in the left posterior cingulate gyrus, and C. choline in the right medial cingulate gyrus using B=10,000 simulated data sets. For all tables, Positive means that P-value<0.05, i.e., $i_{ad} \neq 0$ (epistatic effect is present).

References

1. Strauch, K., Fimmers, R., Kurz, T., Deichmann, K.A., Wienker, T.F., and Baur, M.P. (2000). Parametric and nonparametric multipoint linkage analysis with imprinting and two-locus-trait models: application to mite sensitization. Am J Hum Genet 66, 1945-1957.
2. Dietter, J., Spiegel, A., an Mey, D., Pflug, H.J., Al-Kateb, H., Hoffmann, K., Wienker, T.F., and Strauch, K. (2004). Efficient two-trait-locus linkage analysis through program optimization and parallelization: application to hypercholesterolemia. Eur J Hum Genet 12, 542-550.
3. Abecasis, G.R., Cherny, S.S., Cookson, W.O., and Cardon, L.R. (2002). Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97-101.
4. Greenberg, D.A., and Berger, B. (1994). Using lod-score differences to determine mode of inheritance: a simple, robust method even in the presence of heterogeneity and reduced penetrance. Am J Hum Genet 55, 834-840.
5. Cox, N.J., Frigge, M., Nicolae, D.L., Concannon, P., Hanis, C.L., Bell, G.I., and Kong, A. (1999). Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nat Genet 21, 213-215.
6. Cordell, H.J. (2002). Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Mol Genet 11, 2463-2468.
7. Cordell, H.J., Todd, J.A., Hill, N.J., Lord, C.J., Lyons, P.A., Peterson, L.B., Wicker, L.S., and Clayton, D.G. (2001). Statistical modeling of interlocus interactions in a complex disease: rejection of the multiplicative model of epistasis in type 1 diabetes. Genetics 158, 357-367.
8. Li, H., Gao, G., Li, J., Page, G.P., and Zhang, K. (2007). Detecting epistatic interactions contributing to human gene expression using the CEPH family data. BMC Proc 1 Suppl 1, S67.
9. Venables, W.N., and Ripley, B.D. (2002). Modern Applied Statistics with S. (New York: Springer).
10. Chongsuvivatwong, V. (2010). epicalc: Epidemiological calculator. R package version 2.12.0.0.
11. Montgomery, D.C., and Runger, G.C. (2003). Applied Statistics and Probability for Engineers. (John Wiley & Sons).