Vol. 1 no. 1 2003 Pages 1–1
BIOINFORMATICS

COMPARING QUANTITATIVE TRAIT LOCI AND GENE EXPRESSION DATA ASSOCIATED WITH A COMPLEX TRAIT

Bing Han*, Naomi S. Altman*, David J. Vandenbergh

Department of Statistics, Pennsylvania State University, University Park, PA, US and Department of Biobehavioral Health, Pennsylvania State University, University Park, PA, US

ABSTRACT

Motivation: Quantitative Trait Locus (QTL) analysis estimates the positions of genes that affect a trait, but does not identify the individual genes associated with the trait. Microarray analysis can detect genes that are transcriptionally regulated, and allelic differences that alter transcription may be the cause of the trait. In this paper we develop methods to assess the consistency of the two approaches and apply these methods to several sets of QTL and microarray results.

Results: Biological evidence indicates that the nucleus accumbens area of the brain is associated with many drug abuse-related traits (Carelli & Wightman, 2004). We consider the association between a set of QTLs associated with drug abuse in mice and a set of genes found by microarray analysis to be preferentially expressed in the nucleus accumbens. For comparison, we also use a set of QTLs associated with bone parameters, and a set of genes found to be preferentially expressed in the medial basal hypothalamus, a region of the brain not thought to be associated with either drug abuse or bone density. Our analyses reveal that the proposed association between the drug abuse QTLs and the genes up-regulated in the nucleus accumbens is no stronger than the association between the bone density QTLs and the nucleus accumbens genes. The association between the medial basal hypothalamus specific genes and the two QTL sets, however, is weaker. Simulation results suggest that the associations between the nucleus accumbens genes and the two QTL sets are stronger than would be expected from a randomly selected set of genes of the same size.
The analyses show a possible association between the seemingly unrelated traits of drug abuse and bone density. The statistical methodology developed for this study can be applied to similar studies to assess the joint information in microarray and QTL analyses.

1 INTRODUCTION

The association between phenotypic traits and genetic markers on the chromosome can be detected through statistical analysis, leading to the identification of QTLs: regions of the chromosome that appear to be associated with the phenotype. QTLs are expected to be associated with the genes controlling some aspect of the phenotype. Microarray gene expression studies can be used to assess which genes are differentially expressed in organisms with different phenotypes. Hence it seems natural to combine QTL and gene expression data to determine the genes associated with complex traits.

Several investigators have considered combining QTL and microarray data for studying a genetic trait. For example, Wayne and McIntyre (2002) proposed a way of identifying candidate genes based on both QTL mapping and microarray data, and Fischer et al. (2003) developed a web-based software tool for the combined visualization and exploration of gene expression data and QTLs. However, comparing QTL and microarray data is not completely straightforward. First, the estimated range of QTL positions is generally wide, containing thousands of putative genes; at the same time, QTL analysis may miss some interesting genes (Wayne and McIntyre, 2002). Second, the high level of experimental error and the limitations of microarray analysis can lead to errors in the identification of relevant genes.
Finally, in a complex situation such as the association of a phenotype with a specific tissue, the set of genes identified as “preferentially expressed” in the tissue depends on the set of reference tissues included in the study, while the association between the phenotype and the tissue may be indirect, depending on intermediary mechanisms. In addition, the association between a phenotype and a tissue may depend on transient conditions that were not present when the tissue was collected for the microarray study, or on a small percentage of cells in the tissue, whose expression may be masked by bulk tissue preparation.

*To whom correspondence should be addressed.
© Oxford University Press 2003

[Fig. 1. Relation between QTLs and gene sets. Link A: pQTLs and N-A genes; link B: bQTLs and MBH genes; link C: pQTLs and MBH genes; link D: bQTLs and N-A genes. The names of the links are used throughout the paper.]

[Fig. 2. Combined visualization of pQTLs and N-A genes (panel title: pqtl; horizontal axis: basepair; vertical axis: chromosome).]

In this paper, we suggest several methods to examine the strength of association between a group of QTLs and a set of genes identified from a microarray study. The methods provide statistical evidence for or against the null hypothesis that the association is no stronger than that of a reference link, or than would be expected by chance. As a by-product, the methods can also provide information about the association between two traits, or between a trait and a tissue.

We apply our methods to two sets of mouse QTLs identified from the literature and two sets of mouse genes identified from a microarray study. First, we identified a set of 120 QTLs associated with drug abuse from the Mouse Genome Informatics database (http://www.informatics.jax.org/), which we call the pQTL set, consisting of QTLs for any drug with rewarding properties in mice (M. Jung, Honors Thesis, Penn State University), and 174 genes that are preferentially expressed in the nucleus accumbens region of the mouse brain (the N-A genes) as compared to two adjacent regions (Preoptic Area and Medial Basal Hypothalamus; manuscript in preparation). The nucleus accumbens plays an important role in mouse behaviors relevant to drug abuse, so we expect a strong association between the pQTLs and the N-A genes, i.e. link A. The second set of 165 QTLs is associated with bone strength, morphometry, mineral content and organic content (the bQTL set; Lang et al., 2005). The second set of 39 genes is preferentially expressed in the medial basal hypothalamus (MBH) region of the brain (the MBH genes; manuscript in preparation). There is no known association between the traits for drug abuse and bone strength, or between either of these traits and the MBH region. Hence we expect the strongest association between the pQTLs and the N-A genes, and very weak links for all the other pairs of QTL set and gene set. Figure 1 illustrates all the possible pairs.

We can also construct independent reference sets of genes by choosing genes at random from the array (we used the Affymetrix® Mouse Genome U74Av2 array). The strength of the link between these randomly chosen reference sets and a set of QTLs provides a measure of the statistical significance of an observed link.

In Section 2 we define the strength of a link in two ways: completeness and accuracy. In Section 3 we briefly review loglinear models for multiway tables; throughout the remainder of the paper, the parameters of loglinear models are used to summarize links. In Section 4 we compare the link between the N-A genes and the pQTLs with the reference links in terms of completeness, and in Section 5 in terms of accuracy. In Section 6 we compare the links between the two QTL sets and the two gene sets with links to randomly chosen genes.
Section 7 summarizes the results from the models selected in Sections 4 and 5. Finally, we briefly discuss possible improvements to the comparisons with randomly selected genes.

2 EXPLORATORY DATA ANALYSIS AND QUANTIFICATION OF LINKS

Figures 2 and 3 show the correspondence between the sets of QTLs and the set of N-A genes, where Figure 2 corresponds to link A, the supposed strong association, and Figure 3 corresponds to link D. In each figure, the long horizontal lines represent the chromosomes; the Y-coordinate 20 corresponds to the X chromosome. No data on gene expression or QTLs on the Y chromosome were available. The short horizontal segments are the spans of the QTLs, defined as ±5 centimorgans (cM) from the peak position, and the small circles in the center of the segments are the peak positions of the QTLs. Finally, the vertical lines are the N-A genes. Some genes fall outside the range of the chromosomes (e.g. chromosome 1 in Figure 3), which is due to differences between the data sources available to annotate the U74Av2 array. The data we work with are from Affymetrix®, but the plots are drawn using the Bioconductor suite in R (Gentleman et al., 2004).

[Fig. 3. Combined visualization of bQTLs and N-A genes (panel title: bqtl; horizontal axis: basepair; vertical axis: chromosome).]

QTL positions are measured in centimorgans (cM), a measure of recombination frequency between markers on a chromosome. Gene locations are usually measured as physical distance in base pairs (bp) or megabase pairs (1 Mb = 10^6 bp). Empirically, 1 cM corresponds on average to about 2 Mb in the mouse. There are several more accurate methods for translating cM into Mb (e.g. Silver, 1995; Fischer et al., 2003). We use polynomial regression to estimate physical distance from cM, using genes for which both measures are available. This method performs well except at the ends of a chromosome.
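The cM-to-bp conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the polynomial degree (3), the helper names `fit_cm_to_bp` and `qtl_span_bp`, and the toy chromosome are all hypothetical. It regresses physical position on genetic position for genes with both measures, maps a QTL peak ±5 cM to a base-pair interval, and truncates that interval at the chromosome ends.

```python
import numpy as np

def fit_cm_to_bp(cm, bp, degree=3):
    """Regress physical position (bp) on genetic position (cM) for one chromosome.

    Uses only genes whose cM and bp positions are both known. The degree is an
    assumption; the text does not state the degree that was used.
    """
    coeffs = np.polyfit(np.asarray(cm, dtype=float), np.asarray(bp, dtype=float), degree)
    return np.poly1d(coeffs)

def qtl_span_bp(peak_cm, cm_to_bp, chrom_len_bp, half_width_cm=5.0):
    """Map a QTL peak to a +/- half_width_cm span in bp, truncated at the chromosome ends."""
    lo = float(cm_to_bp(peak_cm - half_width_cm))
    hi = float(cm_to_bp(peak_cm + half_width_cm))
    lo, hi = min(lo, hi), max(lo, hi)  # guard against a locally decreasing fit
    return max(lo, 0.0), min(hi, float(chrom_len_bp))

# Hypothetical toy chromosome: exactly 2 Mb per cM, 160 Mb long.
cm = np.linspace(0, 80, 40)
bp = 2e6 * cm
f = fit_cm_to_bp(cm, bp)
print(qtl_span_bp(50, f, chrom_len_bp=160e6))  # roughly (9.0e7, 1.1e8)
print(qtl_span_bp(2, f, chrom_len_bp=160e6))   # lower end truncated to 0.0
```

A polynomial fitted to interior genes can behave poorly when extrapolated, so the truncation step matters most near the chromosome ends, which is where the text notes the method is weakest.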
Any QTL with a span that extends beyond the end of a chromosome is truncated; for example, see the right end of chromosome 20 in Figure 2.

No obvious matches between the QTL sets and the N-A genes can be seen in either Figure 2 or Figure 3, and the visual impression does not support a greater association for link A than for link D. However, the distributions of both genes and QTLs clearly differ among the chromosomes.

We consider two approaches to quantifying the strength of a link. For convenience, we denote a set of QTLs, such as the drug abuse QTLs, by Q and a set of genes, such as the N-A genes, by G. We treat the strength of association as a quantitative measure that may or may not have biological meaning.

A natural first approach is to count the genes in G covered by the whole span of Q. The link between Q and G is strong if this number is large. This quantification reflects the “completeness” of Q in terms of covering G.

A second approach is to consider whether each QTL in Q covers at least one gene in G. If a QTL in Q covers no genes in G, it is called “empty”; otherwise it is “non-empty”. The link between Q and G is strong when the percentage of empty QTLs is small. This quantification reflects the “accuracy” of Q in terms of covering G.

If Q is strongly associated with G, we expect both completeness and accuracy to be high. Adding QTLs to Q increases its span and may increase completeness just by chance, but will tend to decrease accuracy. It is not clear whether completeness or accuracy is the better summary of the strength of a link between Q and G, so we consider completeness in Section 4 and accuracy in Section 5. We assess both using loglinear models for multiway tables, which are introduced in Section 3.

3 THE LOGLINEAR MODEL

The data are counts of genes in various categories, which can be represented as multiway tables.
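The two link measures of Section 2, and the 2×2 coverage counts that fill tables such as Table 1, can be computed directly from QTL intervals and gene positions. The sketch below assumes, hypothetically, that each QTL is a (chromosome, start, end) interval in base pairs and each gene is a single (chromosome, position) point; the function names and the toy data are illustrative only.

```python
from typing import List, Tuple

QTL = Tuple[int, float, float]  # hypothetical layout: (chromosome, start_bp, end_bp)
Gene = Tuple[int, float]        # hypothetical layout: (chromosome, position_bp)

def covers(qtl: QTL, gene: Gene) -> bool:
    chrom, start, end = qtl
    return gene[0] == chrom and start <= gene[1] <= end

def completeness(qtls: List[QTL], genes: List[Gene]) -> int:
    """Number of genes in G that fall inside at least one QTL of Q."""
    return sum(any(covers(q, g) for q in qtls) for g in genes)

def accuracy(qtls: List[QTL], genes: List[Gene]) -> float:
    """Fraction of QTLs in Q that are non-empty, i.e. cover at least one gene in G."""
    non_empty = sum(any(covers(q, g) for g in genes) for q in qtls)
    return non_empty / len(qtls)

def coverage_table(qtls_a: List[QTL], qtls_b: List[QTL], genes: List[Gene]) -> List[List[int]]:
    """2x2 counts laid out like Table 1: rows = covered / not covered by set A,
    columns = covered / not covered by set B."""
    table = [[0, 0], [0, 0]]
    for g in genes:
        i = 0 if any(covers(q, g) for q in qtls_a) else 1
        j = 0 if any(covers(q, g) for q in qtls_b) else 1
        table[i][j] += 1
    return table

# Toy example (hypothetical positions)
qtls_a = [(1, 0, 10), (2, 5, 15)]
qtls_b = [(1, 5, 20)]
genes = [(1, 3), (1, 8), (2, 30), (3, 1)]
print(completeness(qtls_a, genes), accuracy(qtls_a, genes))  # 2 0.5
print(coverage_table(qtls_a, qtls_b, genes))                 # [[1, 1], [0, 2]]
```

Stratifying the same counts by chromosome (and, later, by gene set) gives the multiway tables used in Sections 4 and 5.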
We use loglinear models to parameterize these tables and test hypotheses; in this section we give a brief overview. We start with the simple case of a 2-way table, such as Table 1, which summarizes the number of genes covered by each QTL set. In the loglinear model, the cell counts are modeled as Poisson random variables (Agresti, 2002), and the logarithms of the Poisson means are modeled as functions of the table margins. For example, denoting the number of genes in row i, column j of the table by $n_{ij}$, and letting $\mu_{ij} = E(n_{ij})$, the saturated model is

$$\log \mu_{ij} = \lambda + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \qquad (1)$$

where $\alpha_1 = \beta_1 = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$. The model is said to be saturated because it fits the data perfectly. Here $\lambda$ is the overall level (under these constraints, the log mean of cell (1,1)), $\alpha_i$ is the effect of the ith row, $\beta_j$ is the effect of the jth column, and $(\alpha\beta)_{ij}$ is the interaction of the row and column effects. When the rows and columns of the table are independent, the interaction effects are all 0. In a 2-way table, the familiar chi-squared test of independence is asymptotically equivalent to the test that the interactions are zero in the loglinear model.

The popularity of the loglinear model for the analysis of multiway tables comes from the ease with which the model can be extended to larger tables and more complex situations. For example, we can readily extend it to a 3-way table by adding an effect $\gamma_k$ for the third dimension, along with the interactions $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ and $(\alpha\beta\gamma)_{ijk}$. The terms in the model are generally estimated by maximum likelihood.

4 COMPARING COMPLETENESS

We use the measure of completeness described in Section 2 to compare putative links, such as links A (pQTLs and N-A genes) and D (bQTLs and N-A genes). We also take into account the unequal distribution of genes and QTLs on the chromosomes. Table 1 summarizes the coverage of the N-A genes by each set of QTLs.
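To make the 2-way case of Section 3 concrete: for a 2×2 table, the independence loglinear model has closed-form maximum likelihood fitted values $\mu_{ij} = n_{i+} n_{+j} / n$, so the test that the interaction is zero can be sketched without a model-fitting library. The hypothetical helper below computes the likelihood-ratio statistic G² and the Pearson X² statistic, each referred to a chi-squared distribution with 1 degree of freedom, plus the saturated-model interaction $(\alpha\beta)_{22}$, which under the constraints of equation (1) equals the log odds ratio. Applied to the Table 1 counts, it asks whether coverage by the pQTLs and coverage by the bQTLs are independent among the N-A genes, which is a different question from the comparison of completeness below.

```python
import math

def independence_tests(table):
    """Loglinear independence model for a 2x2 table of counts [[a, b], [c, d]].

    Returns the fitted means, the G^2 (likelihood-ratio) and Pearson X^2
    statistics for H0: no interaction, their 1-df p-values, and the
    saturated-model interaction (alpha beta)_22 under the reference-cell
    constraints of equation (1), which equals the log odds ratio.
    """
    (a, b), (c, d) = table
    obs = [[a, b], [c, d]]
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    # Closed-form ML fitted means under independence: mu_ij = n_i+ * n_+j / n
    fitted = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    cells = [(obs[i][j], fitted[i][j]) for i in range(2) for j in range(2)]
    pearson = sum((o - m) ** 2 / m for o, m in cells)
    g2 = 2.0 * sum(o * math.log(o / m) for o, m in cells if o > 0)  # 0*log(0) = 0
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_pearson = math.erfc(math.sqrt(pearson / 2.0))
    p_g2 = math.erfc(math.sqrt(g2 / 2.0))
    log_or = math.log(a * d / (b * c)) if min(a, b, c, d) > 0 else float("nan")
    return {"fitted": fitted, "pearson": pearson, "p_pearson": p_pearson,
            "g2": g2, "p_g2": p_g2, "log_odds_ratio": log_or}

# Table 1: rows = covered / not covered by pQTLs, columns = covered / not covered by bQTLs
res = independence_tests([[47, 32], [44, 51]])
print(round(res["pearson"], 3), round(res["g2"], 3), round(res["p_pearson"], 3))
```

The closed form applies only to the 2-way independence model; the stratified and mixed models that follow require an iterative fit.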
The completeness of the pQTLs is proportional to the first row probability $\pi_{1+} = \pi_{11} + \pi_{12}$, where $\pi_{ij}$ is the probability that a gene falls in cell (i, j). Similarly, the completeness of the bQTLs is proportional to the first column probability $\pi_{+1} = \pi_{11} + \pi_{21}$. We expect link A to be more complete than link D, i.e. $\pi_{1+} > \pi_{+1}$. A Wald test (Agresti, 2002) gives moderate evidence that $\pi_{1+} \neq \pi_{+1}$ (p = 0.06). However, the estimates suggest $\pi_{1+} < \pi_{+1}$ (79/174 versus 91/174), i.e. link D is more complete than link A: there is moderate evidence that the bQTLs cover more N-A genes than the pQTLs do.

Table 1. Overall counts of N-A genes covered by each QTL set

Count of N-A genes    Covered by bQTLs    Not by bQTLs    Row sum
Covered by pQTLs             47                32            79
Not by pQTLs                 44                51            95
Column sum                   91                83           174

Figures 2 and 3 show that the distributions of genes and QTLs differ among the chromosomes, so a “chromosome effect” may influence the observed completeness. To account for a possible chromosome effect, we stratify Table 1 into a 2×2×20 table, as shown in Table 2. This table has 80 cells for the 174 N-A genes, 25 of which contain no genes. The data can be accessed from http://www.stat.psu.edu/~hanbing/qtlpaper/.

Table 2. The stratified 2×2×20 contingency table counting the covered N-A genes

Count of N-A genes      Chromosome    Covered by bQTLs    Not covered by bQTLs
Covered by pQTLs            1             n_111               n_121
                            2             n_112               n_122
                            …             …                   …
                            20            n_1,1,20            n_1,2,20
Not covered by pQTLs        1             n_211               n_221
                            2             n_212               n_222
                            …             …                   …
                            20            n_2,1,20            n_2,2,20

We model the table using a loglinear model, with cell counts depending on the table margins: coverage by pQTLs (i), coverage by bQTLs (j) and chromosome (k). Denoting the cell counts by $n_{ijk}$, the full (saturated) model allows a different Poisson rate for each cell. It has three predictors: an indicator named pQTL denoting coverage by pQTLs, a similar indicator named bQTL, and chromosome, with 20 discrete levels.
All possible interaction terms should also be considered. The response is the cell mean count $\mu_{ijk} = E(n_{ijk})$, where i = 1 denotes coverage by pQTLs, j = 1 denotes coverage by bQTLs, and k denotes the chromosome. The full model is

$$\log \mu_{ijk} = \lambda + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}, \qquad (2)$$

with constraints as in equation (1), where $\alpha_i$ is the effect of pQTL coverage, $\beta_j$ is the effect of bQTL coverage, and $\gamma_k$ is the effect of chromosome.

When maximum likelihood is used to fit this model, small sample size and empty cells affect the fit adversely. In the current case, we have 174 N-A genes in 80 cells, 25 of which are empty, and as a result we are unable to fit the full model. The most complete model that can be fitted is

$$\log \mu_{ijk} = \lambda + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij}, \qquad (3)$$

which implies that the chromosome effect is independent of the pQTL and bQTL effects.

The generalized linear models in equations (2) and (3) include a large number of parameters because of the need to model chromosome effects and their interactions with the pQTL and bQTL effects. However, if we model chromosome as a random effect, we may obtain a better fit with far fewer parameters. We consider only the extension of model (3), although the interactions of chromosome with pQTL and bQTL could also be modeled in this way. The mixed model based on (3) is

$$\log E(n_{ijk} \mid \gamma_k) = \lambda + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k, \qquad (4)$$

where the constraints on $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ are the same as in model (1), but the $\gamma_k$ are modeled as independent, identically distributed $N(0, \sigma^2)$ random variables with $\sigma^2$ unknown.

The approach can be extended to compare all four links proposed in Figure 1 simultaneously by including the MBH genes. This produces Table 3, a 4-way table with margins pQTL coverage, bQTL coverage, chromosome and gene type; that is, we introduce a new categorical variable, gene type, into the model.

Table 3. The stratified 2×2×20×2 contingency table counting the completeness of QTLs

Count of genes      Chromosome    Gene type    Covered by bQTLs    Not by bQTLs
Covered by pQTLs        1            N-A          n_1111              n_1211
                        2            N-A          n_1121              n_1221
                        …            …            …                   …
                        20           N-A          n_1,1,20,1          n_1,2,20,1
                        1            MBH          n_1112              n_1212
                        2            MBH          n_1122              n_1222
                        …            …            …                   …
                        20           MBH          n_1,1,20,2          n_1,2,20,2
Not by pQTLs            1            N-A          n_2111              n_2211
                        2            N-A          n_2121              n_2221
                        …            …            …                   …
                        20           N-A          n_2,1,20,1          n_2,2,20,1
                        1            MBH          n_2112              n_2212
                        2            MBH          n_2122              n_2222
                        …            …            …                   …
                        20           MBH          n_2,1,20,2          n_2,2,20,2

We go directly to the generalized linear mixed model (GLMM) with a random chromosome effect and all possible interactions of the other, fixed, effects. There are three fixed explanatory variables (gene type, bQTL and pQTL) and one random explanatory variable (chromosome), whose effect is treated as i.i.d. $N(0, \sigma^2)$ with $\sigma^2$ unknown. The model selected by best-subset selection is

$$\log(\mu_{ijkl} \mid u_l) = \lambda + \lambda^{bQTL}_i + \lambda^{pQTL}_j + \lambda^{gene}_k + \lambda^{bQTL*pQTL}_{ij} + \lambda^{bQTL*gene}_{ik} + \lambda^{pQTL*gene}_{jk} + u^{chr}_l, \qquad (5)$$

where i, j = 0, 1 index coverage by the bQTLs and the pQTLs respectively (1 = covered), k = 0, 1 indexes gene type (0 = N-A genes, 1 = MBH genes), and l = 1, …, 20 indexes the chromosomes. Level 0 is the reference level, so parameters with a subscript equal to 0 are zero. Model (5) contains all two-way interactions and has a natural connection with the logit model (Agresti, 2002). In Section 7 we will need to contrast log odds ratios, which are easy to construct from model (5).

5 COMPARING ACCURACY

In Section 2 we defined the count of non-empty QTLs as the measure of the accuracy of a link between a set of QTLs and a set of genes. The counts are summarized in Table 4. Note that the accuracy of a QTL depends on the gene set: for example, if a QTL covers an N-A gene but no MBH genes, it is non-empty for the N-A genes but empty for the MBH genes. We therefore consider all four links simultaneously.

Table 4. The stratified 2×2×20×2 contingency table counting the accuracy of QTLs

                                     For N-A genes              For MBH genes
QTL type    Chromosome        Non-empty     Empty         Non-empty     Empty
bQTLs           1             n_1111        n_1211        n_1112        n_1212
                2             n_1121        n_1221        n_1122        n_1222
                …             …             …             …             …
                20            n_1,1,20,1    n_1,2,20,1    n_1,1,20,2    n_1,2,20,2
pQTLs           1             n_2111        n_2211        n_2112        n_2212
                2             n_2121        n_2221        n_2122        n_2222
                …             …             …             …             …
                20            n_2,1,20,1    n_2,2,20,1    n_2,1,20,2    n_2,2,20,2

Although Table 4 can be viewed as having a binary response (empty versus non-empty), so that a logit model could be fitted, for simplicity we again apply a loglinear model; for contingency tables, every logit model has an equivalent loglinear model (Agresti, 2002). The selected model is

$$\log(\mu_{ijkl} \mid u_l) = \lambda + \lambda^{QTL}_i + \lambda^{gene}_j + \lambda^{empty}_k + \lambda^{QTL*gene}_{ij} + \lambda^{gene*empty}_{jk} + \lambda^{QTL*empty}_{ik} + u^{chr}_l, \qquad (6)$$

where i = 0, 1 indexes QTL type (1 = pQTLs), j = 0, 1 indexes gene type (1 = N-A genes), k = 0, 1 indexes emptiness (1 = non-empty), and l = 1, …, 20 indexes the chromosomes; as in model (5), parameters with a subscript equal to 0 are zero.

6 COMPARISONS WITH SIMULATED REFERENCE SETS

Until the biology is fully understood, we cannot be certain which links in Figure 1 are truly random. The methods of Sections 4 and 5 allow us to compare links thought to represent biologically important associations with links thought to be biologically unimportant, but not to determine which links are unlikely to occur only “by chance”. In this section, we simulate from the genome to determine the completeness and accuracy of the QTL sets with respect to random sets of genes. Random selection of QTLs is not readily done, as selecting random intervals along the chromosomes is unlikely to model the true distribution of QTLs. However, since all of the gene locations are known, reference sets of genes are readily created by choosing genes at random and computing the completeness or accuracy of the QTL sets with respect to these genes. The simulated gene sets (S genes) can be used to approximate the null distribution for the hypothesis that a given link is no stronger than expected by chance, and thus to estimate a p-value. The estimated p-values are displayed in Table 5.

Table 5. Simulated one-sided p-values for the hypothesis H0: link X is not stronger than expected by chance

         Compared with conditional random genes    Compared with completely random genes
Link     Completeness       Accuracy               Completeness       Accuracy
A        0.102              0.501                  0.097              0.454
B        0.564              0.584                  0.525              0.306
C        0.459              0.504                  0.432              0.360
D        0.176              0.621                  0.140              0.433

The simulation results moderately support the claim that the hypothesized link A is stronger than expected by chance: the p-values for completeness are around 0.10 under both random sampling schemes. Link D also appears stronger than expected by chance, but the evidence is weak (p = 0.14 and 0.18). However, neither link A nor link D is more accurate than expected by chance. Links B and C show no significant effects.

7 RESULTS AND CONCLUSIONS

The study of completeness is based on model (5). Because all the fixed factors have two levels, conditional odds ratios for a factor can be defined without ambiguity. We use the nonlinear mixed model procedure in SAS (Pinheiro and Bates, 1995) to fit the model and for model selection, and do the same for the study of accuracy. One can show that the conditional log odds ratio of one factor given another equals the corresponding two-way interaction parameter (Agresti, 2002). When the comparison is between two gene types, it is based on the odds ratio, to adjust for the unequal numbers of genes in the two sets; otherwise we compare the cell means directly. The comparisons of interest can be estimated and tested as simple functions of the model parameters. The tests of interest and their p-values are listed in Table 6.

Table 6. Comparisons between link A and other links for accuracy and completeness

Comparison    Type            Parameters                                                   p-value
A-D           completeness    $\lambda^{bQTL}_1 - \lambda^{pQTL}_1$                        0.81
A-C           completeness    $\lambda^{pQTL*gene}_{11}$                                   0.09§
A-B           completeness                                                                 0.64
A-D           accuracy                                                                     0.23
A-C           accuracy        $\lambda^{gene*empty}_{11}$                                  0.002§
A-B           accuracy        $\lambda^{gene*empty}_{11} + \lambda^{QTL*empty}_{11}$       0.001§

§ Link A is significantly stronger than these links.
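Each comparison in Table 6 is a linear combination c'β of fitted parameters; for the A-D completeness contrast, for instance, c = (1, −1) applied to the pair of main effects for bQTL and pQTL coverage. A Wald test and confidence interval for such a contrast can be sketched as below, assuming the parameter estimates and their covariance matrix have already been extracted from the fitted mixed model. The function name `wald_contrast` and the numbers in the example are hypothetical, not the paper's estimates, and the sketch uses a normal reference distribution, so its p-values may differ slightly from software that uses a t reference distribution.

```python
import math
from statistics import NormalDist

import numpy as np

def wald_contrast(beta, cov, c, level=0.95):
    """Wald test of H0: c'beta = 0 and a normal-approximation confidence interval.

    beta: estimated parameters; cov: their estimated covariance matrix;
    c: contrast coefficients (e.g. [1, -1] for a difference of two parameters).
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    c = np.asarray(c, dtype=float)
    estimate = float(c @ beta)
    se = math.sqrt(float(c @ cov @ c))  # Var(c'beta) = c' Cov c
    z = estimate / se
    std_normal = NormalDist()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided
    q = std_normal.inv_cdf(0.5 + level / 2.0)
    return {"estimate": estimate, "se": se, "z": z, "p": p_value,
            "ci": (estimate - q * se, estimate + q * se)}

# Hypothetical estimates for the bQTL and pQTL main effects and their covariance
res = wald_contrast([0.2, 0.1], [[0.04, 0.01], [0.01, 0.05]], c=[1, -1])
print(res["estimate"], res["p"], res["ci"])
```

A sum of two interactions, as in the A-B accuracy comparison, uses c = (1, 1), and a 90% interval is obtained with `level=0.90`.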
The comparison of link A (pQTLs with N-A genes) and link D (bQTLs with N-A genes) in completeness is equivalent to testing that the ratio of the corresponding cell means is 1, i.e. that $L_1 = 0$, where

$$L_1 = \log\frac{\mu_{100l}}{\mu_{010l}} = \lambda^{bQTL}_1 - \lambda^{pQTL}_1. \qquad (7)$$

The difference is not statistically significant (p = 0.81).

To compare link A and link C (pQTLs with MBH genes), the odds ratio of gene type given pQTL coverage is used:

$$L_2 = \log\frac{\mu_{i11l}\,\mu_{i00l}}{\mu_{i01l}\,\mu_{i10l}} = \lambda^{pQTL*gene}_{11}. \qquad (8)$$

The odds ratio is moderately significant (p = 0.088), and a 90% confidence interval for $L_2$ is (−0.767, −0.044). We conclude that link A is significantly stronger than link C in completeness at the 0.10 level.

The comparison of link A and link B (bQTLs with MBH genes) in completeness is similar to the comparison of A and C; link A is not significantly different from link B in completeness (p = 0.64). In accuracy, link A is not significantly different from link D (p = 0.23).

The comparison of link A and link C in accuracy shows that link A is significantly stronger than link C (p = 0.002). Using the indices of model (6), the contrast is

$$L_5 = \log\frac{\mu_{111l}\,\mu_{100l}}{\mu_{110l}\,\mu_{101l}} = \lambda^{gene*empty}_{11}. \qquad (11)$$

The comparison of link A and link B in accuracy compares the cells (i = 1, j = 1) and (i = 0, j = 0). The contrast

$$L_6 = \log\frac{\mu_{111l}\,\mu_{000l}}{\mu_{110l}\,\mu_{001l}} = \lambda^{gene*empty}_{11} + \lambda^{QTL*empty}_{11} \qquad (12)$$

is tested with H0: $L_6 = 0$ against the two-sided alternative and is significant at the 0.05 level (p = 0.001). A 95% confidence interval for $L_6$ is (1.158, 2.263), which suggests that link A is significantly stronger than link B.

We can summarize the comparisons for drug abuse traits and nucleus accumbens genes in mice as follows:

$$S_A \le A = D \ge S_D, \qquad A > B = S_B, \qquad A > C = S_C, \qquad (13)$$

where A, B, C and D refer to the links in Figure 1, and $S_X$ refers to the link between the QTL set of link X and a set of randomly chosen genes of the same size as the gene set in link X. Here “>” denotes a significant difference for both quantitative measures, “≥” a significant difference for only one measure, and “=” no significant difference for either measure.

While the hypothesized strong link A is indeed significantly stronger than the reference links B and C, link A is not significantly different from link D. These results suggest a potential association between drug abuse and bone strength in the mouse, and do not support an association of the medial basal hypothalamus with either drug abuse or bone strength.

From (13), the suggestions for further biological study are that 1) given the significance of A over B and C, the MBH genes have a weaker connection with drug abuse traits; and 2) the N-A genes are linked as strongly to the bQTLs as to the drug abuse QTLs. The N-A genes are expressed in a brain region involved in drug abuse-related behaviors, so their connection with the drug abuse QTLs is expected; the bQTLs, however, are related to bone strength traits. The seemingly unrelated traits of bone strength and drug abuse therefore appear to be associated in our study.

8 DISCUSSION

Loglinear models have been used to examine the links between QTL sets and sets of selected genes.
A strong association was expected between the N-A genes and the drug abuse QTLs. However, this association is only moderately stronger than expected by chance. A possible reason is that the random genes were selected from those represented on the Affymetrix® U74Av2 array, which covers about one third of the whole genome. Unexpectedly, the association between the bone density QTLs and the N-A genes was also stronger than expected by chance, leading to the hypothesis that the nucleus accumbens may be associated with traits that influence bone density. One possible explanation for this association is that locomotion affects bone density (Gordon et al., 1989) and is also related to drug abuse vulnerability (Piazza et al., 1998). In contrast, the associations between the MBH genes and the QTL sets were no stronger than expected by chance.

The loglinear model can be used to compare the relative strength of association between different sets of QTLs and genes. To determine the statistical significance of the links, the strength of association for the links of interest was compared with the association between the QTL sets and randomly selected genes.

REFERENCES

Agresti,A. (2002) Categorical Data Analysis, 2nd edn. Wiley, NJ.
Carelli,R.M. and Wightman,R.M. (2004) Functional microcircuitry in the accumbens underlying drug addiction: insights from real-time signaling during behavior. Curr. Opin. Neurobiol., 14, 763–768.
Fan,J. and Li,R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc., 96, 1348–1360.
Fischer,G., Ibrahim,S.M., Brockmann,G.A., Pahnke,J., Bartocci,E., Thiesen,H., Serrano-Fernandez,P. and Molle,S. (2003) Expressionview: visualization of quantitative trait loci and gene-expression data in Ensembl. Genome Biol., 4, R77.
Gentleman,R.C., Carey,V.J., Bates,D.M., Bolstad,B., Dettling,M., Dudoit,S., Ellis,B., Gautier,L., Ge,Y., Gentry,J. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.
Gordon,K.R., Perl,M. and Levy,C. (1989) Structural alterations and breaking strength of mouse femora exposed to three activity regimens. Bone, 10, 303–312.
Lang,D.H., Sharkey,N.A., Mack,H.A., Vogler,G.P., Vandenbergh,D.J., Blizard,D.A., Stout,J.T. and McClearn,G.E. (2005) Quantitative trait loci analysis of structural and material skeletal phenotypes in C57BL/6J and DBA/2 F2 and RI mice. J. Bone Miner. Res., 20, 88–99.
Piazza,P.V., Deroche,V., Rouge-Pont,F. and Le Moal,M. (1998) Behavioral and biological factors associated with individual vulnerability to psychostimulant abuse. NIDA Res. Monogr., 169, 105–133.
Pinheiro,J.C. and Bates,D.M. (1995) Approximations to the log-likelihood function in the nonlinear mixed-effects model. J. Comput. Graph. Stat., 4, 12–35.
Silver,L.M. (1995) Mouse Genetics: Concepts and Applications. Oxford University Press, Oxford, UK.
Wayne,M.L. and McIntyre,L.M. (2002) Combining mapping and arraying: an approach to candidate gene identification. Proc. Natl Acad. Sci. USA, 99, 14903–14906.