Methods
Statistical analysis
Gene-by-environment interaction analysis
The general test for GxE interaction in sib pair-based association analysis of quantitative traits (van
der Sluis et al., 2008) extends the Fulker et al. (1999) maximum likelihood variance
components analysis of quantitative traits in sib-pair data by incorporating environmental main
effects and GxE effects. Since the association effect is decomposed here
into within-family (w) and between-family (b) effects, the design is robust against spurious
association stemming from population stratification (van der Sluis et al., 2008). As the association
between the GxE interaction and a phenotype is also susceptible to population stratification bias, the
sib pair design allows the orthogonal decomposition into between- and within-family
effects to be extended to the GxE interaction effects as well. Here, to partition the additive QTL effects into b and w
effects and to model the GxE effects, we followed the model described by van der Sluis et al.
(2008), except that we included only the additive effects and adopted the Abecasis et al.
(2000a,b) genotype coding to make use of parental genotypes and to handle sibships of any
size, with or without parental genotypes (Mascheretti et al., 2013). Briefly, if we assume that (1) a
diallelic marker has allele A1 with frequency p, and allele A2 with frequency 1 – p = q, and
genotypes A1A1, A1A2 and A2A2 with genotypic effects a, 0 and –a, respectively, under an additive
model, (2) the observed trait value of an individual is a function of a major gene effect (QTL), an
additive polygenic genetic background effect, a shared environmental effect, and a non-shared
environmental effect (which includes measurement error), (3) the effects of the additive polygenic
genetic background, the common and the non-shared environment, and the QTL are mutually
uncorrelated, and (4) the additive polygenic genetic background effect and the environmental
effects are normally distributed with mean = 0, the additive QTL effects can then be orthogonally
partitioned into b and w effects, as specified in Abecasis et al. (2000a,b). Abecasis
et al. (2000a,b) extended the Fulker et al. (1999) model to accommodate any number of offspring,
with or without parental genotypes, as follows:
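$$
b_i =
\begin{cases}
\dfrac{\sum_j g_{ij}}{n_i} & \text{if parental genotypes are unknown} \\[2ex]
\dfrac{g_{iF} + g_{iM}}{2} & \text{if parental genotypes } g_{iF} \text{ and } g_{iM} \text{ are available}
\end{cases}
$$
and:
$$
w_{ij} = g_{ij} - b_i ,
$$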
so that bi is the expectation of each sib genotype gij conditional on family i genotype data and wij is
the deviation from this expectation for offspring j. Significant positive values of the within-pairs
component indicate that a child inherits more copies of the investigated allele than would be
expected. In order to partition the additive QTL effects into between- and within-family effects
(Mascheretti et al., 2013), we tested the rare allele of each SNP against the major allele (see
Genotyping section).
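The between/within decomposition defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `decompose_genotypes` and the numeric coding of genotypes A1A1, A1A2 and A2A2 as 1, 0 and -1 (mirroring the additive effects a, 0 and -a) are assumptions made for illustration.

```python
from statistics import mean


def decompose_genotypes(sib_genotypes, father=None, mother=None):
    """Split the genotype scores of one family into b_i and w_ij.

    Assumed coding (illustrative): A1A1 = 1, A1A2 = 0, A2A2 = -1.
    """
    if father is not None and mother is not None:
        # Parental genotypes available: b_i is the mid-parent score.
        b = (father + mother) / 2
    else:
        # Parental genotypes unknown: b_i is the sibship mean.
        b = mean(sib_genotypes)
    # w_ij is each sib's deviation from the family expectation.
    w = [g - b for g in sib_genotypes]
    return b, w


# Sibship of three without parental genotypes.
b_sibs, w_sibs = decompose_genotypes([1, 0, -1])

# Sibship of two with genotyped parents (father A1A1, mother A1A2).
b_par, w_par = decompose_genotypes([1, 0], father=1, mother=0)
```

When parental genotypes are unknown, the within-family deviations sum to zero by construction; with parental genotypes they need not, since b_i is then the mid-parent value rather than the sibship mean.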
More recently, to model the interaction effect, van der Sluis et al. (2008) adopted the notation
proposed by van den Oord (1999) in which the environmental main effect (e) with a dichotomous
environmental exposure is modelled as the difference in the phenotypic means of environmental
Conditions 1 and 2, and the interaction effect is such that subjects in Condition 2 with genotypes
A1A1, A1A2 and A2A2 have effects –i, 0 and i, respectively. Under this model, the interaction
parameter i represents the difference between genotypic value a in Condition 2, and genotypic value
a in Condition 1 after the main effect of the environmental condition has been taken into account.
Unlike van der Sluis et al. (2008), however, we did not decompose the environmental
variables that vary from one sibling to the next within the same sibship into b and w parameters.
Instead, we used only the sibling-specific value, because the performance of a child in
the neuropsychological domains depended only on his/her own environmental condition and not on the
environmental condition of his/her siblings.
In the case of the sib pair association design the phenotypic score yijkg (i.e., the observed score y of
subject j from family i in condition k with genotype g) is then modelled as:
$$
y_{ijkg} = \tau_i + a_b A_{bi} + a_w A_{wij} + e_k E_k + i_{bkg} I_{kg} + i_{wkg} I_{kg} + \varepsilon_{ij}
$$
where τi is the family-specific intercept, ab and aw are the estimated between- and within-family
additive genetic effects of the marker weighted by the derived coefficients Abi and Awij (where bi
and wij are orthogonal between- and within-family components of the genotype gij), ek represents
the effects of the categorical environmental condition k, ibkg and iwkg represent the between- and
within-family effects of the interaction of genotype g and environmental condition k, weighted by
the derived coefficient Ikg, and εij is the residual term (van der Sluis et al., 2008).
Since each quantitative environmental variable was centred around its mean, the environmental
main effect (e) was then modelled as just described for the categorical environmental variables. In
the case of the sib pair association design, the phenotypic score yijg (i.e., the observed score y of
subject j from family i with genotype g) was then modelled as:
$$
y_{ijg} = \tau_i + a_b A_{bi} + a_w A_{wij} + e E_{ij} + i_{bg} E_{ij} A_{bi} + i_{wg} E_{ij} A_{wij} + \varepsilon_{ij}
$$
where e represents the effect of an increase of one unit of the quantitative environmental exposure,
ibg and iwg represent the between- and within-family effects of the interaction of genotype g and the
environmental measure Eij.
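To make the structure of the quantitative-exposure model concrete, the following Python sketch builds the five predictors that multiply ab, aw, e, ibg and iwg for one subject and evaluates the fixed part of the model. It is an illustration only: the helper names are hypothetical, and the derived coefficients Abi and Awij are assumed here to equal the components bi and wij themselves.

```python
def centre(values):
    """Centre a quantitative environmental variable around its mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def design_row(b_i, w_ij, e_ij):
    """Predictors for one subject, in the order ab, aw, e, ibg, iwg.

    Assumption (illustrative): A_bi = b_i and A_wij = w_ij.
    """
    return [b_i, w_ij, e_ij, e_ij * b_i, e_ij * w_ij]


def linear_predictor(tau_i, coefs, row):
    """Family intercept plus the weighted sum of the predictors."""
    return tau_i + sum(c * x for c, x in zip(coefs, row))


# One subject with b_i = 0.5, w_ij = -0.5 and centred exposure 2.0.
row = design_row(0.5, -0.5, 2.0)
# Hypothetical coefficient values, chosen only to show the arithmetic.
y_hat = linear_predictor(1.0, [1.0, 1.0, 1.0, 1.0, 1.0], row)
```

The residual term εij and the variance components are deliberately left out; the sketch only shows how the interaction terms are formed as products of the exposure with the between- and within-family genotype components.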
Our method of ascertainment resulted in a left-skewed distribution of the association model
residuals. To obtain valid p-values despite this departure from normality, p-values were computed by applying a permutation procedure to the residuals of a model without the
within-family interaction term iw. For instance, the model for categorical covariates becomes:
$$
y_{ijkg} = \tau_i + a_b A_{bi} + a_w A_{wij} + e_k E_k + i_{bkg} I_{kg} + \varepsilon_{ij}
$$
Prior to the permutations, we imputed the values of the phenotypes and quantitative environmental
variables that were missing in the dataset using an EM algorithm implemented in the Missing Value
Analysis (MVA) function of SPSS version 17.0. Few values were missing for any variable (on
average 10%), and the imputation therefore had little impact on the coefficient and variance
component estimates and their precision in the actual data.
We implemented the permutation procedure using the R statistical environment (www.r-project.org). Permutations were applied to the sibships, to preserve the within-sibship phenotype
correlation. The varying sibship size required the following adjustments. Let mi* be the size of the
permuted sibship providing the residuals and mi the size of the sibship providing the fitted values.
When mi* ≥ mi, mi residuals were randomly sampled without replacement from the mi* available
residuals and added to the mi fitted values. When mi* < mi, mi residuals were sampled with
replacement from the mi* available residuals, so that at least one residual was used two or more
times. The permutation procedure was repeated 1,000 times for each analysis. Since Bonferroni
correction for multiple testing would have been too conservative (Scerri and Schulte-Korne, 2010),
we adjusted the significance levels by the false discovery rate (FDR) method (Storey,
2002) applied to the 28 tests performed for each marker (7 environmental variables × 4 phenotypes),
separately for each marker (Mascheretti et al., 2013). Gender was taken into account in the
extended equation because the probands' sex ratio (males:females) in our sample was nearly 3:1, which
may imply differences in mean scores between males and females.
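The sibship-level permutation of residuals described above, including the adjustment for unequal sibship sizes, can be sketched as follows. The original analysis was run in R; this Python version is only an illustrative reconstruction, and the function name, the list-of-lists data layout and the use of a seeded random generator are assumptions.

```python
import random


def permute_sibship_residuals(fitted, residuals, rng):
    """Build one permuted dataset by reassigning residuals between sibships.

    fitted, residuals: lists of lists, one inner list per sibship.
    Each sibship keeps its own fitted values and receives the residuals
    of one randomly chosen donor sibship.
    """
    donors = list(range(len(residuals)))
    rng.shuffle(donors)  # permute whole sibships, not individuals
    permuted = []
    for fit, donor in zip(fitted, donors):
        pool = residuals[donor]
        m, m_star = len(fit), len(pool)
        if m_star >= m:
            # Enough residuals: sample m of them without replacement.
            drawn = rng.sample(pool, m)
        else:
            # Too few residuals: sample m with replacement, so at
            # least one residual is necessarily reused.
            drawn = [rng.choice(pool) for _ in range(m)]
        permuted.append([f + r for f, r in zip(fit, drawn)])
    return permuted


rng = random.Random(0)
fitted = [[10.0, 20.0], [30.0, 40.0, 50.0], [60.0]]
residuals = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
permuted_scores = permute_sibship_residuals(fitted, residuals, rng)
```

In a full analysis this step would be repeated for each permutation (1,000 times here), refitting the model including the within-family interaction term on every permuted dataset and comparing the observed test statistic with the permutation distribution.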
Moreover, since simulation studies revealed specific situations (e.g., irregular distributions of the
variables, ease of analysis, prior use of the variable) in which dichotomized variables performed as
well as or better than the original quantitative factors, we subsequently decided to further explore
this dataset by conducting GxE analysis with dichotomized environmental factors, which is indeed
more straightforward than conducting analyses with continuous indicators, especially when testing
interaction effects in regression (DeCoster et al., 2009). We therefore decided to dichotomize raw
scores of quantitative environmental variables (i.e., birth weight, parental age, SES, and parental
education; see ‘Methods’, ‘Environmental data collection’ section). In particular, birth weight and
SES were dichotomized based on theoretically meaningful cut-off points, as available from
the existing literature (Zubrick et al., 2007; Lasky-Su et al., 2007; Nobile et al., 2010; Phua et al.,
2012): i.e., 2500 grams and 30 points, respectively. For parental age at the child’s birth and parental
education during the child’s first three years no firm empirical data or theoretical guidelines are
available: therefore, we set arbitrary cut-off points at the 15th percentile of the distribution (i.e., 28
years old and 20 points, respectively). The adoption of these cut-offs led to 2 categories of risk, i.e.,
above (coded as ‘0’) and equal to/below (coded as ‘1’) the cut-off value, respectively. None of the
above-defined variables had a missing-value frequency > 10% or a minor-category frequency <
5% (data available upon request).
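The dichotomization rule can be written as a short Python sketch. The cut-off values are those stated above; the dictionary keys and function names are hypothetical labels introduced only for illustration.

```python
# Cut-offs taken from the text; the variable names are illustrative.
CUTOFFS = {
    "birth_weight_g": 2500,
    "ses_points": 30,
    "parental_age_years": 28,
    "parental_education_points": 20,
}


def dichotomize(value, cutoff):
    """Return 1 if the value is equal to or below the cut-off, else 0."""
    return 1 if value <= cutoff else 0


def code_exposures(raw_scores):
    """Dichotomize every raw score that has a defined cut-off."""
    return {
        name: dichotomize(value, CUTOFFS[name])
        for name, value in raw_scores.items()
    }


def minor_category_frequency(codes):
    """Proportion of subjects in the less frequent of the two categories."""
    share_ones = sum(codes) / len(codes)
    return min(share_ones, 1 - share_ones)


coded = code_exposures({"birth_weight_g": 3200, "ses_points": 30})
freq = minor_category_frequency([1, 0, 0, 0])
```

Note that a value exactly at the cut-off falls in the category coded 1, as stated in the text.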
References
Abecasis, GR, Cardon, LR, Cookson, WO. (2000a) A general test of association for quantitative
traits in nuclear families. Am J Hum Genet 66: 279-292.
Abecasis, GR, Cookson, WO, Cardon, LR. (2000b) Pedigree tests of transmission disequilibrium.
Eur J Hum Genet 8: 545-551.
DeCoster, J, Iselin, AMR, Gallucci, M. (2009) A conceptual and empirical examination of
justification for dichotomization. Psychological Methods 14: 349-366.
Fulker, DW, Cherny, SS, Sham, PC, Hewitt, JK. (1999) Combined linkage and association sib-pair
analysis for quantitative traits. Am J Hum Genet 64: 259-267.
Lasky-Su, J, Faraone, SV, Lange, C, Tsuang, MT, Doyle, AE, Smoller, JW, et al. (2007) A study of
how socioeconomic status moderates the relationship between SNPs encompassing BDNF and
ADHD symptom counts in ADHD families. Behav Genet 37: 487-497.
Mascheretti, S, Bureau, A, Battaglia, M, Simone, D, Quadrelli, E, Croteau, J, Cellino, et al. (2013a)
An assessment of gene-by-environment interactions in developmental dyslexia-related phenotypes.
Genes Brain Behav 12: 47-55.
Nobile, M, Rusconi, M, Bellina, M, Marino, C, Giorda, R, Carlet, O, et al. (2010) COMT
Val158Met polymorphism and socioeconomic status interact to predict attention
deficit/hyperactivity problems in children aged 10-14. Eur Child Adolesc Psychiatry 19: 549-557.
Phua, DY, Rifkin-Graboi, A, Saw, SM, Meaney, MJ, Qiu, A. (2012) Executive functions of six-year-old boys with normal birth weight and gestational age. PLoS One 7: e36502.
Scerri, TS, Schulte-Korne, G. (2010) Genetics of developmental dyslexia. Eur Child Adolesc
Psychiatry 19: 179-197.
Storey, JD. (2002) A direct approach to false discovery rates. J Royal Statistical Society, Series B
64: 479-498.
Van den Oord. (1999) Method to detect genotype-environment interactions for quantitative trait loci
in association studies. Am J Epidemiol 150: 1179-1187.
van der Sluis, S, Dolan, CV, Neale, MC, Posthuma, D. (2008) A general test for gene-environment
interaction in sib pair-based association analysis of quantitative traits. Behav Genet 38: 372-389.
Zubrick, SR, Taylor, CL, Rice, ML, Slegers, DW. (2007) Late language at 24 months: an
epidemiological study of prevalence, predictors, and covariates. J Speech Lang Hear Res 50: 1562-1592.