Vol. 1 no. 1 2003
Pages 1–1
BIOINFORMATICS
COMPARING QUANTITATIVE TRAIT LOCI AND
GENE EXPRESSION DATA ASSOCIATED WITH A
COMPLEX TRAIT
Bing Han*, Naomi S. Altman*, David J. Vandenbergh
Department of Statistics, Pennsylvania State University, University Park, PA, US and Department of Biobehavioral Health, Pennsylvania State University, University Park, PA US
ABSTRACT
Motivation: Quantitative Trait Locus (QTL) analysis estimates the position of genes that affect a trait, but does not
identify individual genes associated with the trait. Microarray
analysis can detect genes that are transcriptionally regulated,
and allelic differences that alter transcription may be the
cause of the trait. In this paper we develop methods to compare the consistency of the two approaches and apply these
methods to several sets of QTL and microarray results.
Results: Biological evidence indicates that the nucleus accumbens area of the brain is associated with many drug
abuse-related traits (Carelli & Wightman, 2004). We consider
the association between a set of QTLs associated with drug
abuse in mice, and a set of genes found by microarray analysis to be preferentially expressed in the nucleus accumbens.
For comparison, we also use a set of QTLs associated with
bone parameters, and a set of genes found to be preferentially expressed in the medial basal hypothalamus, a region
of the brain not thought to be associated with either drug
abuse or bone density. Our analyses reveal that the proposed association between the drug abuse QTLs and the
genes up-regulated in the nucleus accumbens is no stronger
than the association between the bone density QTLs and the
nucleus accumbens genes. The association between the
medial basal hypothalamus-specific genes and the two QTL
sets is weaker, however. Simulation results suggest that the
associations between the nucleus accumbens genes and the
two QTL sets are stronger than would be expected from a
randomly selected set of genes of the same size. The analyses show a
possible association between the seemingly uncorrelated
traits of drug abuse and bone density. The statistical methodology developed for this study can be applied to similar studies to assess the joint information in microarray and QTL analyses.
1 INTRODUCTION
The association between phenotypic traits and genetic
markers on the chromosome can be detected through statistical analysis, leading to the identification of QTLs – regions of the chromosome that appear to be associated with
the phenotype. QTLs are expected to be associated with the
genes controlling some aspect of the phenotype. Microarray
gene expression studies can be used to assess which genes
are differentially expressed in organisms with different phenotypes. Hence it seems natural to combine QTL and gene
expression data to determine the genes associated with
complex traits.
Several investigators have considered combining QTL
and microarray data for studying a genetic trait. For example, Wayne and McIntyre (2002) proposed a way of identifying candidate genes based on both QTL mapping and microarray data. Fischer et al. (2003) developed a web-based
software tool for combined visualization and exploration of
gene expression data and QTLs.
However, comparing QTL and microarray data is not
completely straightforward. First, the estimated range of
QTL positions is generally wide, containing thousands of
putative genes; conversely, QTL analysis may also miss
some interesting genes (Wayne and McIntyre, 2002). Second, the high level of experimental error and the limitations of
microarray data analysis introduce mistakes in the identification of relevant genes. Finally, in a complex situation
such as the association of a phenotype with a specific tissue,
the set of genes identified as “preferentially expressed” in
the tissue depends on the set of reference tissues included in
the study, while the association between the phenotype and
the tissue may be indirect, depending on intermediary
mechanisms. In addition, the association between a phenotype
and a tissue may depend on ephemeral conditions that were
not present when the tissue was collected for the microarray study, or on a small percentage of cells in the organism,
which may be masked by bulk tissue preparation.
*To whom correspondence should be addressed.
© Oxford University Press 2003
K.Takahashi et al.
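[Figure 1: diagram connecting the two QTL sets (pQTL, bQTL) with the two gene sets (N-A genes, MBH genes) by links A-D (link A: pQTLs with N-A genes; B: bQTLs with MBH genes; C: pQTLs with MBH genes; D: bQTLs with N-A genes).]
Fig. 1. Relation between QTL sets and gene sets. The link names are used throughout the paper.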
[Figure 2: chromosomes 1-20 (vertical axis) plotted against base-pair position (horizontal axis, up to about 1.5e+08); panel label "pqtl".]
Fig. 2. Combined visualization of pQTLs and N-A genes.
In this paper, we suggest several methods to examine the
strength of association between a group of QTLs and a set
of genes identified from a microarray study. The methods
provide statistical evidence for or against the null hypothesis
that the association is no stronger than that of a reference link, or than expected by chance. As a by-product, the methods can also
provide information about the association between two traits,
or between a trait and a tissue.
We apply our methods to two sets of mouse QTLs identified from the literature and two sets of mouse genes identified from a microarray study. First, we identified a set of
120 QTLs associated with drug abuse from the Mouse Genome Informatics database (http://www.informatics.jax.org/), which we call the pQTL
set, consisting of QTLs for any drug with rewarding properties in mice (M. Jung, Honors Thesis, Penn State University), and 174
genes that are preferentially expressed in the nucleus accumbens region of the mouse brain (the N-A genes) as
compared to two adjacent regions (Preoptic Area and Medial Basal Hypothalamus; manuscript in preparation). The
nucleus accumbens plays an important role in mouse behaviors relevant to drug abuse. We therefore expect a strong association
between the pQTLs and the N-A genes, i.e. link A.
The second set of 165 QTLs is associated with bone
strength, morphometry, mineral content and organic content
(the bQTL set; Lang et al., 2005). The second set of 39
genes is preferentially expressed in the medial basal hypothalamus (MBH) region of the brain (the MBH genes;
manuscript in preparation). There is no known association
between the drug abuse and bone strength traits, or between either of these traits and the MBH region. Hence we
expect the strongest association between the pQTLs and the
N-A genes, and very weak links for all other pairs of a QTL set and a gene set. Figure 1 illustrates all the possible pairs.
We can set up further, independent reference sets of genes
by randomly choosing genes from the array (we used the Affymetrix® Mouse Genome U74Av2 array). The strength of
the link between these randomly chosen reference sets and
the set of QTLs provides a measure of the statistical significance of the observed link.
In Section 2 we define the strength of a link in two ways:
completeness and accuracy. In Section 3 we briefly review
loglinear models for multiway tables; throughout the remainder of the paper, the parameters of loglinear models are
used to summarize links. In Section 4 we compare the link between the N-A genes and the pQTLs with the reference
links in terms of completeness, and in Section 5 in terms
of accuracy. In Section 6 we compare the links between
the two QTL sets and the two gene sets with links to randomly chosen genes. Section 7 summarizes the results from the
models selected in Sections 4 and 5. Finally, we briefly discuss
some possible improvements to the comparisons with randomly selected genes.
2 EXPLORATORY DATA ANALYSIS AND QUANTIFICATION OF LINKS
Figures 2 and 3 show the correspondence between the sets
of QTLs and the set of N-A genes; Figure 2 corresponds to link A, the hypothesized strong association, and Figure 3 corresponds to link D.
In each figure, the long horizontal lines represent the
chromosomes. The Y-coordinate 20 corresponds to the X
chromosome. No data on gene expression or QTLs were available for the Y chromosome. The
short discrete horizontal segments are the spans of the QTLs,
defined as ±5 centimorgans (cM) from the peak position.
The small circles at the center of each segment are the peak
positions of the QTLs. Finally, the vertical lines are the N-A
genes.
Some genes fall outside the range of the chromosomes, e.g.
on chromosome 1 in Figure 3; this is due to differences
between the data sources available to annotate the U74Av2
array. The data we work with are from Affymetrix®, but
the plot is drawn using the Bioconductor suite in R (Gentleman et al., 2004).
[Figure 3: chromosomes 1-20 (vertical axis) plotted against base-pair position (horizontal axis, 0 to about 1.5e+08); panel label "bqtl".]
Fig. 3. Combined visualization of bQTLs and N-A genes.
QTL positions are measured in centimorgans (cM), a unit of
recombination frequency between markers on a
chromosome, whereas gene locations are usually measured by the
physical distance in base pairs (bp) or megabase pairs (1 Mb
= 10^6 bp). Empirically, 1 cM corresponds on average to about 2 Mb in the
mouse. There are a few more accurate methods to translate
cM into Mb (e.g. Silver, 1995; Fischer et al., 2003). We
use polynomial regression to estimate physical distance
from cM, using genes for which both measures are available.
This method performs well except at the ends of a
chromosome. Any QTL whose span extends beyond the
end of a chromosome is truncated; see, for example, the
right end of chromosome 20 in Figure 2.
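The conversion just described can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes one polynomial fit per chromosome, a cubic degree, and a QTL span of peak ±5 cM. The helper names `fit_cm_to_bp` and `qtl_span_bp` are hypothetical; `numpy.polyfit` and `numpy.poly1d` are standard NumPy functions. The anchor-gene data in the demo are simulated, not taken from the paper.

```python
import numpy as np


def fit_cm_to_bp(cm, bp, degree=3):
    """Least-squares polynomial map from genetic (cM) to physical (bp) position.

    `cm` and `bp` are the two positions of anchor genes on ONE chromosome.
    The cubic default degree is an assumption, not a value from the paper.
    """
    return np.poly1d(np.polyfit(cm, bp, degree))


def qtl_span_bp(peak_cm, cm_to_bp, chrom_length_bp, half_width_cm=5.0):
    """Physical interval for a QTL defined as peak +/- half_width_cm.

    The interval is truncated to [0, chrom_length_bp], mirroring the rule
    that spans extending past a chromosome end are cut off.
    """
    a = float(cm_to_bp(peak_cm - half_width_cm))
    b = float(cm_to_bp(peak_cm + half_width_cm))
    start, end = min(a, b), max(a, b)
    return max(start, 0.0), min(end, float(chrom_length_bp))


if __name__ == "__main__":
    # Hypothetical anchor genes: roughly 2 Mb per cM plus noise (illustration only).
    rng = np.random.default_rng(0)
    cm = np.sort(rng.uniform(0.0, 80.0, size=200))
    bp = 2e6 * cm + rng.normal(0.0, 5e5, size=cm.size)
    cm_to_bp = fit_cm_to_bp(cm, bp)
    print(qtl_span_bp(40.0, cm_to_bp, 1.6e8))  # interior QTL
    print(qtl_span_bp(79.0, cm_to_bp, 1.6e8))  # right end truncated at chromosome length
```

Because polynomial fits extrapolate poorly, the truncation step matters most near chromosome ends, which is also where the text notes that the regression performs worst.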
No obvious matches between the QTL sets and the N-A
genes can be seen in either Figure 2 or 3. The visual impression does not support a greater association for link A
than for link D. However, the distributions of both genes
and QTLs clearly differ among the chromosomes.
We consider two approaches to quantify the strength of a
link. For convenience, we denote a set of QTLs, such as
drug abuse QTLs, by Q and a set of genes, such as the N-A
genes, by G. We consider the strength of association to be a
quantitative measure that may or may not have biological
meaning.
A natural first approach is to consider the number of
genes in G covered by the whole span of Q. The link between Q and G is strong if this number is large. This quantification reflects the “completeness” of Q in terms of covering
G.
A second approach is to consider whether each QTL in Q
covers at least one gene in G. If a QTL in Q covers no genes
in G, it is called “empty”; otherwise it is “non-empty”. The
link between Q and G is strong when the percentage of empty QTLs is small. This quantification reflects the “accuracy”
of Q in terms of covering G.
If Q is strongly associated with G, we expect both completeness and accuracy to be high. Adding QTLs to Q increases its span and
may increase completeness just by chance,
but it should also decrease accuracy. It is not clear whether
completeness or accuracy is the better way to summarize the
strength of a link between Q and G. In Section 4 we consider completeness, and in Section 5 we consider accuracy.
We assess both completeness and accuracy using loglinear
models for multiway tables. Loglinear models are introduced in Section 3.
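The two quantifications can be made concrete with a small sketch. It assumes that QTL spans have already been converted to base-pair intervals and that a gene is “covered” when its position lies inside a span. The class `QTL` and the functions `completeness` and `empty_fraction` are illustrative names, and the toy positions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QTL:
    chrom: str
    start: float  # span start in bp (e.g. peak - 5 cM, converted to bp)
    end: float    # span end in bp


def _covers(q, chrom, pos):
    return q.chrom == chrom and q.start <= pos <= q.end


def completeness(qtls, genes):
    """Number of genes in G inside the union of the QTL spans in Q.

    Each gene is counted once even if several overlapping QTLs cover it.
    `genes` is a list of (chrom, position_bp) pairs.
    """
    return sum(any(_covers(q, c, p) for q in qtls) for c, p in genes)


def empty_fraction(qtls, genes):
    """Fraction of QTLs in Q covering no gene of G (smaller = more accurate)."""
    n_empty = sum(not any(_covers(q, c, p) for c, p in genes) for q in qtls)
    return n_empty / len(qtls)


if __name__ == "__main__":
    # Toy example (hypothetical positions): two overlapping QTLs on chr 1, one on chr 2.
    Q = [QTL("1", 10, 20), QTL("1", 15, 30), QTL("2", 0, 5)]
    G = [("1", 12), ("1", 25), ("2", 50), ("3", 1)]
    print(completeness(Q, G), empty_fraction(Q, G))  # 2 genes covered; 1 of 3 QTLs empty
```

The toy example also shows the trade-off noted above: adding a QTL that covers nothing leaves completeness unchanged but raises the fraction of empty QTLs.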
3 THE LOGLINEAR MODEL
The data are counts of genes in various categories, which
can be represented as multiway tables. We will make use of
loglinear models to parameterize these tables and test hypotheses. In this section, we give a brief overview of loglinear models.
We start with the simple case of a 2-way table, such as
Table 1, which summarizes the number of genes covered by
each QTL set.
In the loglinear model, the cell counts are modeled as
Poisson random variables (Agresti, 2002), whose rates are modeled as functions of the table margins.
For example, denoting the number of genes in row i, column j of the table by $n_{ij}$, and letting $\mu_{ij} = E(n_{ij})$, the saturated
model is

$$\log \mu_{ij} = \lambda + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \qquad (1)$$

where $\alpha_1 = \beta_1 = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$.

The model is said to be saturated because it fits the data perfectly. Under these constraints, $\lambda = \log \mu_{11}$ is the log mean of the reference cell (1, 1), $\alpha_i$ is the effect of the ith row, and $\beta_j$ is the effect of the
jth column. $(\alpha\beta)_{ij}$ is called the interaction of the row and
column effects. When the rows and columns of the table are
independent, the interaction effects are all 0. In a 2-way
table, the familiar chi-squared test of independence tests the
same hypothesis as the test that the interactions are zero in the loglinear
model.
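As an illustration of model (1) and its constraints, the sketch below computes the saturated-model parameters for the counts of Table 1 (rows: coverage by pQTLs; columns: coverage by bQTLs), checks that they reproduce the observed counts exactly, and computes the Pearson statistic for independence. The interaction $(\alpha\beta)_{22}$ turns out to be the log odds ratio of the table. Note that this independence test asks whether pQTL and bQTL coverage are associated, which is a different question from the completeness comparison of Section 4. The function names are hypothetical.

```python
import math


def saturated_params(n):
    """Corner-point parameters of model (1) for a 2x2 table n[row][col].

    Row 1 / column 1 effects and their interactions are zero (index 0 in
    Python), so lambda = log n_11 and only the (2, 2) interaction is free.
    """
    lam = math.log(n[0][0])
    alpha2 = math.log(n[1][0] / n[0][0])
    beta2 = math.log(n[0][1] / n[0][0])
    ab22 = math.log(n[0][0] * n[1][1] / (n[0][1] * n[1][0]))  # log odds ratio
    return lam, alpha2, beta2, ab22


def saturated_fit(lam, alpha2, beta2, ab22):
    """Rebuild the fitted means exp(lambda + alpha_i + beta_j + (alpha beta)_ij)."""
    alpha, beta = (0.0, alpha2), (0.0, beta2)
    return [[math.exp(lam + alpha[i] + beta[j] + (ab22 if i == j == 1 else 0.0))
             for j in range(2)] for i in range(2)]


def pearson_x2(n):
    """Pearson chi-squared statistic for independence (all (alpha beta)_ij = 0)."""
    total = sum(map(sum, n))
    rows = [sum(r) for r in n]
    cols = [sum(c) for c in zip(*n)]
    return sum((n[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
               for i in range(2) for j in range(2))


table1 = [[47, 32], [44, 51]]  # Table 1: rows pQTL covered / not, columns bQTL covered / not
params = saturated_params(table1)
fitted = saturated_fit(*params)
print([[round(x, 6) for x in row] for row in fitted])  # reproduces the observed counts
print(round(params[3], 4), round(pearson_x2(table1), 4))
```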
The popularity of the loglinear model for the analysis of
multiway tables comes from the ease with which the model
can be extended to larger tables and more complex
situations. For example, we can readily extend it to a 3-way
table by adding an effect $\gamma_k$ for the third dimension, along with the
interactions $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ and $(\alpha\beta\gamma)_{ijk}$. The terms in the model are generally estimated by maximum
likelihood.
4 COMPARING COMPLETENESS
We use the measure of completeness described in Section 2 to
compare putative links, such as links A (pQTLs and N-A
genes) and D (bQTLs and N-A genes). We also take into
account the unequal distribution of genes and QTLs on the
chromosomes.
Table 1 summarizes the coverage of the N-A genes by
each set of QTLs. The completeness of the pQTLs is proportional to the first row probability $\pi_{1+} = \pi_{11} + \pi_{12}$, where $\pi_{ij}$ is
the probability that a gene falls in cell (i, j). Similarly, the completeness of the bQTLs is proportional to the first column
probability $\pi_{+1} = \pi_{11} + \pi_{21}$. We expect link A to be more complete than link D, i.e. $\pi_{1+} > \pi_{+1}$. A Wald test (Agresti, 2002) gives
moderate evidence that $\pi_{1+} \neq \pi_{+1}$ (p = 0.06). However, the sample estimates (79/174 ≈ 0.45 and 91/174 ≈ 0.52; Table 1) suggest $\pi_{1+} < \pi_{+1}$,
i.e. link D is more complete than link A: there is moderate
evidence that the bQTLs cover more N-A genes than the
pQTLs.
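A minimal sketch of one Wald test for $\pi_{1+} = \pi_{+1}$ using the counts of Table 1. Because each N-A gene is classified by both QTL sets, the two margins are dependent, and the sketch assumes the matched-pairs variance estimate for marginal homogeneity described by Agresti (2002). The text does not state which variance estimate or table underlies the reported p = 0.06, so the p-value from this sketch need not match it; `marginal_wald` is a hypothetical name.

```python
import math


def marginal_wald(n11, n12, n21, n22):
    """Wald test of pi_{1+} = pi_{+1} for a 2x2 table of matched classifications.

    Each gene is classified twice (pQTL coverage, bQTL coverage), so the two
    margins are dependent; the variance of p_{1+} - p_{+1} = p12 - p21 is
    [p12 + p21 - (p12 - p21)^2] / n  (matched-pairs formula, assumed here).
    """
    n = n11 + n12 + n21 + n22
    p_row = (n11 + n12) / n      # estimate of pi_{1+}: completeness of pQTLs
    p_col = (n11 + n21) / n      # estimate of pi_{+1}: completeness of bQTLs
    p12, p21 = n12 / n, n21 / n
    se = math.sqrt((p12 + p21 - (p12 - p21) ** 2) / n)
    z = (p_row - p_col) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return p_row, p_col, z, p_value


print(marginal_wald(47, 32, 44, 51))  # counts from Table 1
```

Only the off-diagonal cells (genes covered by exactly one QTL set) carry information about the difference, which is why $n_{11}$ and $n_{22}$ cancel from $p_{1+} - p_{+1}$.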
Figures 2 and 3 show that the distributions of genes
and QTLs differ among the chromosomes. Hence it is possible that a “chromosome effect” influences the observed
completeness. To account for this possible chromosome
effect, we stratify Table 1 into a 2×2×20 table, as shown in Table 2. The 174 N-A
genes are spread over the 80 cells of this table, and 25 cells contain no genes. The data can be accessed from http://www.stat.psu.edu/~hanbing/qtlpaper/.
Table 1. Overall counts of N-A genes covered by each QTL set
Count of N-A genes    Covered by bQTLs    Not by bQTLs    Row sum
Covered by pQTLs      47                  32              79
Not by pQTLs          44                  51              95
Column sum            91                  83              174
Table 2. The stratified 2×2×20 contingency table counting the covered N-A genes
Count of N-A genes      Chromosome    Covered by bQTLs    Not covered by bQTLs
Covered by pQTLs        1             n111                n121
                        2             n112                n122
                        …             …                   …
                        20            n1,1,20             n1,2,20
Not covered by pQTLs    1             n211                n221
                        2             n212                n222
                        …             …                   …
                        20            n2,1,20             n2,2,20
We model the table using a loglinear model, with cell
counts depending on the table margins: coverage by pQTLs
(i), coverage by bQTLs (j) and chromosome (k). Denoting
the cell counts by $n_{ijk}$ and their means by $\mu_{ijk} = E(n_{ijk})$, the full model allows a different Poisson rate for each cell.
This saturated model has three predictors: an indicator
named pQTL denoting coverage by pQTLs, a similar indicator
named bQTL, and chromosome, with 20 discrete
levels, together with all of their possible interactions. Here i = 1
denotes coverage by pQTLs, j = 1 denotes coverage by bQTLs,
and k indexes the chromosome. The full model is

$$\log \mu_{ijk} = \lambda + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}, \qquad (2)$$

with constraints as in equation (1), where $\alpha_i$ is the effect of
pQTL coverage, $\beta_j$ is the effect of bQTL coverage, and $\gamma_k$ is the effect of chromosome.
When maximum likelihood is used to fit this model, small
sample sizes and empty cells affect the fit adversely. In the
current case, we have 174 N-A genes in 80 cells, 25 of which
are empty; in the full model, zero cells drive some parameter estimates to infinity. As a result, we are unable to fit the full
model. The most complete model that can be fitted is

$$\log \mu_{ijk} = \lambda + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij}, \qquad (3)$$

which states that chromosome is jointly independent of pQTL and bQTL coverage.
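Model (3) has a closed-form maximum likelihood fit: because chromosome is jointly independent of the two coverage indicators, the fitted means are $\hat\mu_{ijk} = n_{ij+}\,n_{++k}/n_{+++}$ (Agresti, 2002). A consequence is that the estimated $(\alpha\beta)$ interaction under model (3) equals the log odds ratio of the collapsed table (for the N-A genes, Table 1), and empty cells cause no difficulty. The sketch below checks these properties on a hypothetical toy table, not on the paper's data; the function names are illustrative.

```python
import numpy as np


def fit_model3(n):
    """Closed-form ML fit of model (3): log mu_ijk = lambda + a_i + b_j + g_k + (ab)_ij.

    Chromosome (k) is jointly independent of the coverage indicators (i, j),
    so mu_ijk = n_ij+ * n_++k / n_+++.
    """
    n = np.asarray(n, dtype=float)
    n_ij = n.sum(axis=2)           # n_{ij+}: collapsed 2x2 table
    n_k = n.sum(axis=(0, 1))       # n_{++k}: genes per chromosome
    return np.einsum("ij,k->ijk", n_ij, n_k) / n.sum()


def deviance(n, mu):
    """Likelihood-ratio statistic G^2 = 2 * sum n log(n / mu); empty cells add 0."""
    n = np.asarray(n, dtype=float)
    pos = n > 0
    return 2.0 * float(np.sum(n[pos] * np.log(n[pos] / mu[pos])))


# Hypothetical 2 x 2 x 3 toy table (NOT the paper's data), indexed [i, j, k].
toy = np.array([[[5, 3, 0], [4, 1, 6]],
                [[2, 2, 7], [3, 5, 4]]])
mu = fit_model3(toy)
print(mu.round(3))
# Model (3) on a 2 x 2 x K table has 3 * (K - 1) residual degrees of freedom.
print("G^2 =", round(deviance(toy, mu), 3), "on", 3 * (toy.shape[2] - 1), "df")
```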
The generalized linear models in equations (2) and (3) include a large number of parameters because of the need to model
the chromosome effects and their interactions with the bQTL and pQTL
effects. However, if we model chromosome as a random
effect, we may obtain a better fit with far fewer parameters. We consider only the extension of model (3), although
the interactions of chromosome with bQTL and pQTL coverage can
also be modeled in this way. The mixed model based on (3)
is

$$\log E(n_{ijk} \mid u_k) = \lambda + \alpha_i + \beta_j + (\alpha\beta)_{ij} + u_k, \qquad (4)$$

where the constraints on $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ are the same as in model (1),
but the chromosome effects $u_k$ are modeled as independent, identically distributed
random variables with distribution $N(0, \sigma^2)$, $\sigma^2$ unknown.
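To show what the random chromosome effect in model (4) does, the sketch below simulates a 2×2×20 table from the model with arbitrary, hypothetical parameter values. Because $u_k$ shifts all four cells of chromosome k equally on the log scale, the within-chromosome odds ratio of the conditional means is $\exp\{(\alpha\beta)_{22}\}$ on every chromosome. This is a simulation sketch only; it does not fit model (4), which requires a GLMM routine such as the one used in Section 7. The function name is hypothetical.

```python
import numpy as np


def simulate_model4(lam, alpha2, beta2, ab22, sigma, n_chrom=20, seed=None):
    """Draw a 2 x 2 x n_chrom table from the Poisson mixed model (4).

    u_k ~ N(0, sigma^2) shifts every cell on chromosome k by the same amount on
    the log scale; given u_k the counts are independent Poisson.
    """
    rng = np.random.default_rng(seed)
    alpha = np.array([0.0, alpha2])             # alpha_1 = 0 (corner constraint)
    beta = np.array([0.0, beta2])               # beta_1 = 0
    ab = np.array([[0.0, 0.0], [0.0, ab22]])    # (alpha beta)_{1j} = (alpha beta)_{i1} = 0
    u = rng.normal(0.0, sigma, size=n_chrom)    # random chromosome effects
    eta = (lam + alpha[:, None, None] + beta[None, :, None]
           + ab[:, :, None] + u[None, None, :])
    mu = np.exp(eta)                            # conditional means E(n_ijk | u_k)
    return rng.poisson(mu), mu, u


counts, mu, u = simulate_model4(lam=0.5, alpha2=-0.2, beta2=0.1, ab22=0.5, sigma=0.8, seed=1)
print(counts.shape, counts.sum())
```

Replacing 19 fixed chromosome parameters by a single variance $\sigma^2$ is what makes model (4) more economical than model (3).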
The approach can be extended to compare all four links
proposed in Figure 1 simultaneously by including the
MBH genes. This produces Table 3, a 4-way table
with margins pQTL coverage, bQTL coverage, chromosome and gene type. We thus introduce a new categorical variable, gene type, into the model.
Table 3. The stratified 2×2×20×2 contingency table counting the completeness of QTLs
# Covered genes     Chromosome    Gene type    Covered by bQTLs    Not by bQTLs
Covered by pQTLs    1             N-A          n1111               n1211
                    2             N-A          n1121               n1221
                    …             …            …                   …
                    20            N-A          n1,1,20,1           n1,2,20,1
                    1             MBH          n1112               n1212
                    2             MBH          n1122               n1222
                    …             …            …                   …
                    20            MBH          n1,1,20,2           n1,2,20,2
Not by pQTLs        1             N-A          n2111               n2211
                    2             N-A          n2121               n2221
                    …             …            …                   …
                    20            N-A          n2,1,20,1           n2,2,20,1
                    1             MBH          n2112               n2212
                    2             MBH          n2122               n2222
                    …             …            …                   …
                    20            MBH          n2,1,20,2           n2,2,20,2
We go directly to the generalized linear mixed model
(GLMM) with a random chromosome effect and all
possible interactions of the other fixed effects. There are
three fixed explanatory variables (gene type, bQTL and
pQTL) and one random explanatory variable (chromosome);
the random effect is again treated as i.i.d. normal $N(0, \sigma^2)$ with unknown $\sigma^2$. The model selected by
best-subset selection is
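$$\log(\mu_{ijkl} \mid u_l) = \lambda + \lambda_i^{bQTL} + \lambda_j^{pQTL} + \lambda_k^{gene} + \lambda_{ij}^{bQTL*pQTL} + \lambda_{jk}^{pQTL*gene} + \lambda_{ik}^{bQTL*gene} + u_l^{chr}, \qquad (5)$$

where l = 1, …, 20 indexes the chromosomes; i, j = 0, 1 index coverage by the bQTLs and pQTLs respectively, with 1 denoting coverage; and k = 0, 1 indexes gene type, with 0 denoting N-A genes. Terms involving level 0 of any factor are set to 0. This model contains all two-way interactions among the fixed effects and has a natural connection with the logit model (Agresti, 2002). In Section 7 we
will need to contrast log odds ratios in some cases; these are easy to construct from model (5).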
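The contrasts used in equations (7) and (8) follow directly from model (5). The sketch below, assuming 0/1 coding with all level-0 terms equal to zero, evaluates the linear predictor for hypothetical parameter values and checks numerically that $L_1 = \lambda_1^{bQTL} - \lambda_1^{pQTL}$ and $L_2 = \lambda_{11}^{pQTL*gene}$ for any value of the chromosome effect. The dictionary keys and function names are illustrative, not taken from the authors' SAS code.

```python
# Hypothetical parameter values, chosen only to check the algebra.
PARAMS5 = {"lam": 0.3, "bQTL": 0.4, "pQTL": -0.2, "gene": -1.1,
           "bQTL*pQTL": 0.5, "pQTL*gene": -0.4, "bQTL*gene": 0.25}


def log_mu5(i, j, k, p, u_l=0.0):
    """Linear predictor of model (5) with 0/1 coding; level-0 terms are zero.

    i: bQTL coverage, j: pQTL coverage, k: gene type (0 = N-A, 1 = MBH),
    u_l: random effect of chromosome l.
    """
    return (p["lam"] + i * p["bQTL"] + j * p["pQTL"] + k * p["gene"]
            + i * j * p["bQTL*pQTL"] + j * k * p["pQTL*gene"]
            + i * k * p["bQTL*gene"] + u_l)


def L1(p, u_l=0.0):
    """log(mu_100,l / mu_010,l): N-A genes covered only by bQTLs vs only by pQTLs."""
    return log_mu5(1, 0, 0, p, u_l) - log_mu5(0, 1, 0, p, u_l)


def L2(p, i=0, u_l=0.0):
    """Log odds ratio of gene type given pQTL coverage, at bQTL level i."""
    def m(j, k):
        return log_mu5(i, j, k, p, u_l)
    return m(1, 1) + m(0, 0) - m(1, 0) - m(0, 1)


print(L1(PARAMS5), PARAMS5["bQTL"] - PARAMS5["pQTL"])
print(L2(PARAMS5, i=0), L2(PARAMS5, i=1), PARAMS5["pQTL*gene"])
```

Because there is no three-way term in model (5), the odds ratio $L_2$ is the same at both bQTL levels, and the random effect $u_l$ cancels from both contrasts.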
5 COMPARING ACCURACY
In Section 2 we defined the number of non-empty QTLs as the
measure of accuracy of a link between a set of QTLs and a
set of genes. The counts are summarized in Table 4. Notice
that the accuracy of a QTL depends on the gene set. For
example, if a QTL covers at least one N-A gene but no
MBH genes, then it is non-empty for the N-A genes but
empty for the MBH genes. We therefore consider all four
links simultaneously.
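The dependence of accuracy on the gene set can be seen by classifying each QTL once per gene set, which is how the counts of Table 4 are organized. A minimal sketch, assuming QTL spans in base pairs and a hypothetical input layout; `accuracy_counts` is an illustrative name and the toy positions are invented.

```python
from collections import Counter


def accuracy_counts(qtls, gene_sets):
    """Tally empty / non-empty QTLs separately for each gene set.

    qtls: list of (qtl_type, chrom, start_bp, end_bp).
    gene_sets: dict mapping a gene-set name to a list of (chrom, position_bp).
    Returns a Counter keyed by (qtl_type, chrom, gene_set, status); the same
    QTL is classified once per gene set, so its status can differ between sets.
    """
    counts = Counter()
    for qtl_type, chrom, start, end in qtls:
        for name, genes in gene_sets.items():
            hit = any(c == chrom and start <= pos <= end for c, pos in genes)
            counts[(qtl_type, chrom, name, "non-empty" if hit else "empty")] += 1
    return counts


# Hypothetical toy input: one QTL of each type on chromosome 1.
qtls = [("pQTL", "1", 10, 20), ("bQTL", "1", 30, 40)]
gene_sets = {"N-A": [("1", 15)], "MBH": [("1", 35)]}
for key, value in sorted(accuracy_counts(qtls, gene_sets).items()):
    print(key, value)
```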
Table 4. The stratified 2×2×20×2 contingency table counting the accuracy of QTLs
                              For N-A genes                For MBH genes
QTL type    Chromosome    Non-empty     Empty          Non-empty     Empty
bQTLs       1             n1111         n1211          n1112         n1212
            2             n1121         n1221          n1122         n1222
            …             …             …              …             …
            20            n1,1,20,1     n1,2,20,1      n1,1,20,2     n1,2,20,2
pQTLs       1             n2111         n2211          n2112         n2212
            2             n2121         n2221          n2122         n2222
            …             …             …              …             …
            20            n2,1,20,1     n2,2,20,1      n2,1,20,2     n2,2,20,2
Although Table 4 can be seen as having a binary response (empty or non-empty), so that a
logit model could be fitted to it, for simplicity we again apply a
loglinear model; for contingency tables, every logit model has an equivalent loglinear model (Agresti,
2002). The selected model is
$$\log(\mu_{ijkl} \mid u_l) = \lambda + \lambda_i^{QTL} + \lambda_j^{gene} + \lambda_k^{empty} + \lambda_{ij}^{QTL*gene} + \lambda_{jk}^{gene*empty} + \lambda_{ik}^{QTL*empty} + u_l^{chr}, \qquad (6)$$

where l = 1, …, 20 indexes the chromosomes; i = 0, 1, with 1 for pQTLs; j = 0, 1, with 1 for N-A genes; and k = 0, 1, with 1 for non-empty.
6 COMPARISON WITH SIMULATED REFERENCE GENE SETS
Until the biology is fully understood, we cannot be certain
which links in Figure 1 are truly random. The methods of
Sections 4 and 5 allow us to compare links thought to represent biologically important associations with links thought
to be biologically unimportant, but not to determine which
links are unlikely to occur only “by chance”. In this section,
we simulate random gene sets from the genes on the array to determine the completeness
and accuracy of the QTL sets with respect to random sets of genes.
Random selection of QTLs is not readily done, as selecting
random intervals along the chromosomes is unlikely to
model the true distribution of QTLs. However, since all of
the gene locations are known, reference sets of genes are
readily created by choosing genes at random and computing the completeness or accuracy of the QTL sets with respect to these genes. The simulated gene sets (S genes) can
be used to approximate the null distribution for the hypothesis that a given link is no stronger than expected by chance,
and thus to obtain an estimated p-value. The estimated p-values are displayed in Table 5.
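The simulation can be sketched as a Monte Carlo test. This sketch assumes the “completely random” scheme (genes drawn uniformly without replacement from the array) and uses completeness as the statistic; the “conditional” scheme of Table 5 is not specified in the text and is not attempted here. The function name, the +1 correction in the p-value and the toy data are assumptions.

```python
import numpy as np


def mc_pvalue_completeness(covered, observed_idx, n_sim=10_000, seed=None):
    """Monte Carlo p-value for H0: the link is no stronger than expected by chance.

    covered: boolean array with one entry per gene on the array, True if the
             gene lies inside the span of the QTL set Q.
    observed_idx: indices (into `covered`) of the genes in the observed set G.
    Random reference sets of the same size are drawn uniformly without
    replacement ("completely random" genes).
    """
    rng = np.random.default_rng(seed)
    covered = np.asarray(covered, dtype=bool)
    m = len(observed_idx)
    t_obs = int(covered[np.asarray(observed_idx)].sum())   # observed completeness
    t_sim = np.array([covered[rng.choice(covered.size, size=m, replace=False)].sum()
                      for _ in range(n_sim)])
    # One-sided p-value with the usual +1 correction (an assumption; the text
    # only says the p-value is estimated from the simulated sets).
    return (1 + np.count_nonzero(t_sim >= t_obs)) / (n_sim + 1)


# Hypothetical array of 1000 genes, the first 100 inside the QTL spans.
covered = np.zeros(1000, dtype=bool)
covered[:100] = True
print(mc_pvalue_completeness(covered, np.arange(20), n_sim=2000, seed=0))
print(mc_pvalue_completeness(covered, np.arange(900, 920), n_sim=2000, seed=0))
```

For accuracy, the statistic would instead be the number of non-empty QTLs computed against each random gene set.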
Table 5. Simulated one-sided p-values for the hypothesis H0: link X is not stronger than expected by chance.
        Compared with conditional random genes    Compared with completely random genes
Link    Completeness    Accuracy                  Completeness    Accuracy
A       0.102           0.501                     0.097           0.454
B       0.564           0.584                     0.525           0.306
C       0.459           0.504                     0.432           0.360
D       0.176           0.621                     0.140           0.433
The simulation results moderately support the claim that
the hypothesized link A is stronger than expected by chance:
the p-values for completeness are about 0.10 under both
random sampling schemes. There is weak evidence that link D is also
stronger than expected by chance. However, neither link A nor link D is more accurate than expected by
chance, and links B and C show no significant effects.
7 RESULTS AND CONCLUSIONS
The study of completeness is based on model (5). Because
all the factors have two levels, conditional odds ratios for a factor can be defined without ambiguity. We fit the model and carry out model selection with the nonlinear mixed-model procedure in SAS (Pinheiro and Bates,
1995); the study of accuracy, based on model (6), is handled in the same way. The conditional
log odds ratio of one factor given another is given by
the two-way interaction between them (Agresti, 2002).
When a comparison involves two different gene types, it is based
on an odds ratio, which adjusts for the unequal numbers of
genes in the two sets; otherwise we compare the cell means directly. The
comparisons of interest can be estimated and tested as simple functions of the model parameters. The tests of interest
and their p-values are listed in Table 6.
Table 6. Comparisons between link A and other links for accuracy and completeness.
Comparison    Measure         Parameters                                              p-value
A-D           completeness    $\lambda_1^{bQTL} - \lambda_1^{pQTL}$                   0.81
A-C           completeness    $\lambda_{11}^{pQTL*gene}$                              0.09§
A-B           completeness                                                            0.64
A-D           accuracy                                                                0.23
A-C           accuracy        $\lambda_{11}^{gene*empty}$                             0.002§
A-B           accuracy        $\lambda_{11}^{gene*empty} + \lambda_{11}^{QTL*empty}$  0.001§
§Link A is significantly stronger than these links.
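The comparison of link A (pQTLs with N-A genes) and
link D (bQTLs with N-A genes) is equivalent to testing that
the ratio of the corresponding cell means is 1, i.e. that $L_1 = 0$ in

$$L_1 = \log\left(\frac{\mu_{100,l}}{\mu_{010,l}}\right) = \lambda_1^{bQTL} - \lambda_1^{pQTL}. \qquad (7)$$

The random effect $u_l^{chr}$ cancels, so $L_1$ is the same for every chromosome l. The difference is not
statistically significant (p = 0.81).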
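To compare link A and link C (pQTLs with MBH genes),
we use the odds ratio of gene type given pQTL coverage:

$$L_2 = \log\left(\frac{\mu_{i11,l}\,\mu_{i00,l}}{\mu_{i10,l}\,\mu_{i01,l}}\right) = \lambda_{11}^{pQTL*gene}, \quad i = 0, 1. \qquad (8)$$

The odds ratio is moderately significant (p = 0.088), and a 90%
confidence interval for $L_2$ is (-0.767, -0.044). Since the interval lies below 0, MBH genes are less likely than N-A genes to be covered by the pQTLs, and we conclude that link A is significantly stronger than link C at the 10% level.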
The comparison of link A and link B (bQTLs with MBH
genes) in completeness is analogous to the comparison between A and C. Link
A is not significantly more complete than link B (p = 0.64), and link A does not differ significantly from link D in accuracy (p = 0.23).
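The comparison of link A and link C (pQTLs with MBH
genes) in accuracy is based on the odds ratio of a pQTL being non-empty for N-A genes versus MBH genes:

$$L_5 = \log\left(\frac{\mu_{111,l}\,\mu_{100,l}}{\mu_{110,l}\,\mu_{101,l}}\right) = \lambda_{11}^{gene*empty}. \qquad (11)$$

It shows that link A is significantly stronger than link C (p = 0.002).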
The accuracy comparison of link A and link B (bQTLs with MBH
genes) compares the cells (i = 1, j = 1) and (i = 0, j = 0) of model (6), using the contrast

$$L_6 = \log\left(\frac{\mu_{111,l}\,\mu_{000,l}}{\mu_{110,l}\,\mu_{001,l}}\right) = \lambda_{11}^{gene*empty} + \lambda_{11}^{QTL*empty}. \qquad (12)$$

The test of $H_0: L_6 = 0$ against the two-sided alternative is significant at the 0.05 level (p = 0.001). A 95% confidence
interval for $L_6$ is (1.158, 2.263), which suggests that
link A is significantly stronger than link B.
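The accuracy contrasts in equations (11) and (12) can be checked numerically in the same way. The sketch assumes model (6) with 0/1 coding and level-0 terms equal to zero, uses hypothetical parameter values, and confirms that $L_5 = \lambda_{11}^{gene*empty}$ and $L_6 = \lambda_{11}^{gene*empty} + \lambda_{11}^{QTL*empty}$ regardless of the chromosome effect $u_l$. The names are illustrative.

```python
# Hypothetical parameter values for model (6), used only to check the algebra.
PARAMS6 = {"lam": 0.2, "QTL": -0.3, "gene": 0.6, "empty": 0.9,
           "QTL*gene": 0.15, "gene*empty": 1.2, "QTL*empty": 0.5}


def log_mu6(i, j, k, p, u_l=0.0):
    """Linear predictor of model (6): i = 1 pQTL, j = 1 N-A gene, k = 1 non-empty."""
    return (p["lam"] + i * p["QTL"] + j * p["gene"] + k * p["empty"]
            + i * j * p["QTL*gene"] + j * k * p["gene*empty"]
            + i * k * p["QTL*empty"] + u_l)


def log_odds_nonempty(i, j, p, u_l=0.0):
    """log(mu_ij1 / mu_ij0): log odds that a QTL of type i is non-empty for gene set j."""
    return log_mu6(i, j, 1, p, u_l) - log_mu6(i, j, 0, p, u_l)


def L5(p, u_l=0.0):
    """Link A (i=1, j=1) versus link C (i=1, j=0)."""
    return log_odds_nonempty(1, 1, p, u_l) - log_odds_nonempty(1, 0, p, u_l)


def L6(p, u_l=0.0):
    """Link A (i=1, j=1) versus link B (i=0, j=0)."""
    return log_odds_nonempty(1, 1, p, u_l) - log_odds_nonempty(0, 0, p, u_l)


print(L5(PARAMS6), PARAMS6["gene*empty"])
print(L6(PARAMS6), PARAMS6["gene*empty"] + PARAMS6["QTL*empty"])
```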
We can summarize the comparisons for the drug abuse traits
and the nucleus accumbens genes in mouse as

$$S_A \le A = D \ge S_D, \quad A > B = S_B, \quad A > C = S_C, \qquad (13)$$

where A-D refer to the notation in Figure 1, and $S_X$ refers to
the link between the QTL set of link X and a set of randomly chosen genes of the same size as the gene set in link X. Here “>” means a significant difference in both quantitative measures, “≥” means a significant
difference in only one measure, and “=” means no significant difference in either.
While the hypothesized stronger link A is indeed significantly stronger than the reference links B and C, link A is
not significantly different from link D. These results suggest a potential association between drug abuse and bone strength
in mouse, and provide no support for an association between the medial basal
hypothalamus and either drug abuse or bone strength.
From (13), the suggestions for further biological study are
that 1) since A is significantly stronger than B and C, the MBH genes
have a weaker connection with the drug abuse traits; and 2) both the N-A
genes and the bQTLs show strong connections with the drug abuse
traits. The N-A genes are expressed in a brain region involved in drug abuse behavior, so this connection is expected;
the bQTLs, however, are related to bone strength traits.
The seemingly unrelated traits of bone strength and drug abuse
thus appear to be associated in our study.
8 DISCUSSION
Loglinear models have been used to examine the links between QTL sets and sets of selected genes. A strong association was expected between the N-A genes and the drug
abuse QTLs; however, this association is only moderately
stronger than expected by chance. A possible reason is that
the random genes were drawn from those represented on the Affymetrix® U74Av2 array, which covers only
about one third of the genome. Unexpectedly, the
association between the bone density QTLs and the N-A
genes also appeared stronger than expected by chance, leading
to the hypothesis that the nucleus accumbens may be associated with traits
that influence bone density. One possible explanation for
this association is that locomotion affects bone density
(Gordon et al., 1989) and is also related to drug abuse vulnerability (Piazza et al., 1998). However, the associations between the MBH genes and the QTL sets were no stronger
than expected by chance.
The loglinear model can be used to compare the relative
strength of association between different sets of QTLs and
genes. To determine the statistical significance of the links, the
strength of association for the links of interest was compared to the association between the QTL sets and randomly
selected genes.
REFERENCES
Agresti,A. (2002) Categorical Data Analysis, 2nd edn. Wiley, NJ.
Carelli,R.M. and Wightman,R.M. (2004) Functional microcircuitry in the accumbens underlying drug addiction: insights from real-time signaling during behavior. Curr. Opin. Neurobiol., 14, 763-768.
Fan,J. and Li,R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. JASA, 96, 1348-1360.
Fischer,G., Ibrahim,S.M., Brockmann,G.A., Pahnke,J., Bartocci,E., Thiesen,H., Serrano-Fernandez,P. and Möller,S. (2003) Expressionview: visualization of quantitative trait loci and gene-expression data in Ensembl. Genome Biology, 4, R77.
Gentleman,R.C., Carey,V.J., Bates,D.M., Bolstad,B., Dettling,M., Dudoit,S., Ellis,B., Gautier,L., Ge,Y., Gentry,J. et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology, 5, R80.
Gordon,K.R., Perl,M. and Levy,C. (1989) Structural alterations and breaking strength of mouse femora exposed to three activity regimens. Bone, 10, 303-312.
Lang,D.H., Sharkey,N.A., Mack,H.A., Vogler,G.P., Vandenbergh,D.J., Blizard,D.A., Stout,J.T. and McClearn,G.E. (2005) Quantitative trait loci analysis of structural and material skeletal phenotypes in C57BL/6J and DBA/2 F2 and RI mice. J. Bone and Mineral Research, 20, 88-99.
Piazza,P.V., Deroche,V., Rouge-Pont,F. and Le Moal,M. (1998) Behavioral and biological factors associated with individual vulnerability to psychostimulant abuse. NIDA Res. Monogr., 169, 105-133.
Pinheiro,J.C. and Bates,D.M. (1995) Approximations to the log-likelihood function in the nonlinear mixed-effects model. J. Computational and Graphical Statistics, 4, 12-35.
Silver,L.M. (1995) Mouse Genetics: Concepts and Applications. Oxford University Press, Oxford, UK.
Wayne,M.L. and McIntyre,L.M. (2002) Combining mapping and arraying: an approach to candidate gene identification. PNAS, 99, 14903-14906.