An additional role of O-acetylserine as a sulphur status
independent regulator during plant growth
Hans-Michael Hubberten*1, Sebastian Klie1, Camila Caldana, Thomas Degenkolbe, Lothar Willmitzer
and Rainer Hoefgen
Max Planck Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
* To whom correspondence should be addressed. E-mail: hubberten@mpimp-golm.mpg.de
1 Both authors contributed equally to this work
Supplementary text: Data processing and computation
Data pre-processing and normalisation
Arabidopsis rosettes were harvested at 9 time-points: from 0 to 120 minutes at 20-minute
intervals, with additional time-points at 5 and 10 minutes. Plants were grown under long-day
conditions (16 h) at day/night temperatures of 21°C/18°C for 2 weeks and then transferred to
temperature conditions of 20°C and 30°C, additionally challenged with a light/dark transition.
Control plants were kept at 21°C under long-day conditions, and samples were taken at the
same time-points. The samples were hybridized on Affymetrix ATH1 arrays. Raw CEL files
were analysed using Bioconductor software (Gentleman et al., 2004) for R
(http://www.r-project.org/). The analysis consisted of quality control via Affymetrix MAS5
present/absent calls (affy package; Gautier et al., 2004) and GCRMA expression estimation
(gcrma package; Wu et al., 2004). Only transcripts that displayed present calls in three
adjacent time-points, irrespective of the condition, were considered for further analysis,
resulting in 15089 probe sets. Affymetrix probe set IDs were mapped to AGI codes using
TAIR version 8.
Peak detection, retention time (RT) alignment and library matching for the GC-MS data for
OAS were performed using the TargetSearch package from Bioconductor (Cuadros-Inostroza et
al., 2009). The OAS concentration was normalised by dividing each raw value by the median of
all measurements of the experiment. Both gene expression data and OAS measurements were
log2-transformed and normalised to the control time-point (difference of the log values).
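The normalisation steps for OAS can be sketched as follows. This is a minimal illustration in Python, not the authors' R code; the function name `normalise_oas`, the input lists and the choice to pool treated and control values when taking the median are assumptions made for the example.

```python
import math
from statistics import median

def normalise_oas(raw_values, control_values):
    """Median-scale, log2-transform and subtract the matching control values.

    raw_values / control_values: raw OAS measurements at matching time-points
    (hypothetical inputs; pooling both series for the median is an assumption).
    """
    # Median of all measurements of the experiment (treated and control pooled).
    med = median(list(raw_values) + list(control_values))
    log_treated = [math.log2(v / med) for v in raw_values]
    log_control = [math.log2(v / med) for v in control_values]
    # Normalise to the control time-point: difference of the log values.
    return [t - c for t, c in zip(log_treated, log_control)]
```

When one shared median is used for both series, as assumed here, the scaling factor cancels in the log difference; it only affects the values before the control is subtracted.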
Computation of time-shifted correlation
Our approach for identifying genes whose expression changes as a putative response to
changes in the concentration of OAS is based on analysing time-shifted correlations between
variables. Using spline interpolation, we obtained kinetics for OAS and genes that are evenly
spaced, with a time interval of 5 minutes between two consecutive observations.
The time-courses of gene expression levels or OAS concentration levels of two variables A
and B (i.e. gene-gene or OAS-gene) consist of n = 25 time-points (observed and interpolated)
and are represented by the corresponding vectors a and b. Let $a_l^m = (a_l, \ldots, a_m)$
denote the entries of a indexed from l to m, where l ≥ 1 and m ≤ n. We define the
time-shifted correlation coefficient $\rho_i$ for a and b with a time-shift of i time-points as:

$$
\rho_i =
\begin{cases}
\dfrac{\mathrm{Cov}(a_1^{n-i},\, b_{i+1}^{n})}{\sigma(a_1^{n-i})\,\sigma(b_{i+1}^{n})}, & i \ge 0 \\[2ex]
\dfrac{\mathrm{Cov}(a_{-i+1}^{n},\, b_1^{n+i})}{\sigma(a_{-i+1}^{n})\,\sigma(b_1^{n+i})}, & i < 0.
\end{cases}
$$
Note that for a time-shift of i > 0, vector b denotes the 'lagged' gene. For each pair of
variables and set of time-shifts $S$, we consider the maximum correlation obtained under any
time-shift: $\max_{i \in S} \rho_i$. In our approach, we considered time-shifts of 0 to 3
time-points for correlations between OAS and genes, and -1, 0 and 1 time-point for pairwise
gene correlations.
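The time-shifted correlation described above can be illustrated with a short Python sketch. It assumes two equally long, already interpolated series; the helper names `pearson`, `shifted_correlation` and `max_shifted_correlation` are hypothetical and not taken from the original analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def shifted_correlation(a, b, i):
    """Correlation of a and b with a time-shift of i time-points.

    For i >= 0, a[0 : n-i] is paired with b[i : n], so b lags behind a.
    For i < 0, a[-i : n] is paired with b[0 : n+i], so a lags behind b.
    """
    n = len(a)
    if i >= 0:
        return pearson(a[:n - i], b[i:])
    return pearson(a[-i:], b[:n + i])

def max_shifted_correlation(a, b, shifts):
    """Maximum correlation over all time-shifts in `shifts`."""
    return max(shifted_correlation(a, b, i) for i in shifts)
```

With the settings stated in the text, an OAS-gene pair would be scored with `max_shifted_correlation(oas, gene, range(0, 4))` and a gene-gene pair with `max_shifted_correlation(gene_1, gene_2, (-1, 0, 1))`.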
A threshold to estimate statistical significance of the time-shifted correlation, as employed
in step B, was estimated empirically by permutation tests. For each gene, we sample the
expression value for a measurement at a specific time-point uniformly at random out of the
observed data acquired for that gene. We then compute the time-shifted correlations
$\rho_i^{random}$ on the permuted data for all time-shifts i used in the original analysis.
Iterating this process yields an estimate of the null distribution for $\rho_i$ based on
$\rho_i^{random}$, which takes the nature of gene-expression data into account (e.g. scales
and dynamic range) while eliminating the dependency on the time at which measurements were
taken. Finally, by setting the threshold $T_{OAS}$ such that only α% of all time-shifted
correlations $\rho_i^{random}$ are greater than $T_{OAS}$, we obtain a global estimator of
statistical significance with a significance level of α%. In our analysis, we used both
α = 1% and α = 5% to derive putative OAS-responsive genes (see supplemental figure 1 for an
overview of genes identified employing a significance level of 5%).
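The permutation procedure can be sketched as below. This is an illustrative Python version under several assumptions: each gene profile is shuffled without replacement, the null correlations of all genes and time-shifts are pooled, and the number of iterations (`n_iter`) and the seed are arbitrary. All function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def shifted_correlation(a, b, i):
    """Correlation of a and b with a time-shift of i time-points."""
    n = len(a)
    return pearson(a[:n - i], b[i:]) if i >= 0 else pearson(a[-i:], b[:n + i])

def permutation_threshold(oas, genes, shifts, alpha=0.05, n_iter=100, seed=0):
    """Empirical threshold such that at most a fraction alpha of the
    null correlations is greater than the returned value."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        for profile in genes:
            # Assumption: shuffle the observed values of this gene over time.
            permuted = rng.sample(profile, len(profile))
            null.extend(shifted_correlation(oas, permuted, i) for i in shifts)
    null.sort()
    k = len(null) - int(alpha * len(null)) - 1
    return null[k], null
```

Shuffling within a gene keeps its scale and dynamic range but removes the link to measurement time, which is the property the text asks of the null distribution.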
Computation of partial correlation
To identify genes whose expression pattern is highly similar to the observed changes of OAS
levels in both experiments, i.e. the initial 'guide-genes' and the candidate genes derived
from them, we quantify this similarity by Pearson's correlation coefficient.
However, the typical pattern of OAS (a sharp increase after the light/dark transition in the
first experiment and a periodic increase during the night in the second experiment) might not
be specific to OAS. Although we show that sulphur-related metabolites remain unchanged,
other metabolites may display patterns similar to that of OAS. As a consequence, genes may
show high correlation not only with OAS but with these metabolites as well. In such a case,
it becomes difficult to argue that these genes respond exclusively to OAS. To further reduce
the possibility that the guide-genes correlate highly with OAS not as the result of a response
to altered OAS levels, but rather as a response to a third metabolite with a similar pattern,
we use partial correlation (Lawrance, 1976).
Let a denote the vector of n measurements of OAS levels and b the vector of n expression
values experimentally obtained for a certain gene. Further, let c denote the vector of
measurements for a single metabolite other than OAS. The first-order partial correlation of
a and b without the influence of a single control variable c is defined based on the pairwise
Pearson correlations of a and b ($\rho_{ab}$), a and c ($\rho_{ac}$), and b and c
($\rho_{bc}$), as follows:

$$
\rho_{ab \cdot c} = \frac{\rho_{ab} - \rho_{ac}\,\rho_{bc}}{\sqrt{1 - \rho_{ac}^{2}}\,\sqrt{1 - \rho_{bc}^{2}}}.
$$
It then follows that if $\rho_{ab \cdot c} = 0$, the association of a and b is fully explained
by c. Such a case would indicate that the observed correlation of a gene with expression
profile b to OAS, characterized by vector a, originates from their mutual correlation to a
third metabolite (profile c), rather than from a response to changed OAS levels.
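A small worked example may help here. The Python sketch below computes the first-order partial correlation from the three pairwise Pearson correlations; the function names and the toy vectors are hypothetical and chosen so that a and b are correlated only through c.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def partial_correlation_first_order(a, b, c):
    """Correlation of a and b after removing the influence of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / (
        math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2)
    )

# Toy data: a and b each equal c plus a component orthogonal to c and to each other.
c = [-1.0, -1.0, 1.0, 1.0]
a = [-2.0, 0.0, 0.0, 2.0]   # c + [-1, 1, -1, 1]
b = [-2.0, 0.0, 2.0, 0.0]   # c + [-1, 1, 1, -1]
```

For these vectors the ordinary correlation of a and b is 0.5, while the partial correlation given c is zero up to rounding, i.e. the association is fully explained by c.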
To further eliminate the contribution of multiple other measured metabolites to the correlation
of a and b, the above formula is extended to a recursive formulation defined for a set of
control variables. Let C, |C| = m, denote this set of m metabolites, where $c_i \in C$,
$1 \le i \le m$, denotes the vector of concentration measurements for the i-th metabolite
in C. We define the m-th order partial correlation of a and b that eliminates the influence
of the set of metabolites C recursively by (Sokal and Rohlf, 1995):

$$
\rho_{ab \cdot C} =
\frac{\rho_{ab \cdot C \setminus \{c_i\}} - \rho_{a c_i \cdot C \setminus \{c_i\}}\;\rho_{b c_i \cdot C \setminus \{c_i\}}}
{\sqrt{1 - \rho_{a c_i \cdot C \setminus \{c_i\}}^{2}}\;\sqrt{1 - \rho_{b c_i \cdot C \setminus \{c_i\}}^{2}}}.
$$
Here $C \setminus \{c_i\}$ denotes the set of all m metabolites with the i-th metabolite
removed. Note that the above formula holds for any $c_i \in C$. From this formula, we can
deduce that the m-th order partial correlation can be computed from three (m-1)-th order
partial correlations. The ordinary Pearson correlation, i.e.
$\rho_{ab \cdot \emptyset} = \rho_{ab}$, is the 0-th order partial correlation. Typically,
for large sets C, the partial correlation can be obtained by inverting the covariance matrix
of a, b and the variables in C (Baba et al., 2004). The numerical results for the partial
correlation in this work were obtained using the R package 'corpcor'
(http://cran.r-project.org/.../packages/corpcor/).
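The recursive definition can be written down directly. The Python sketch below is only an illustration of the recursion, not of the 'corpcor' implementation used for the reported results; the function names are hypothetical, and the recursion is practical only for small control sets because its cost grows exponentially with m.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def partial_correlation(a, b, controls):
    """m-th order partial correlation of a and b given a list of control vectors.

    With an empty control list this is the ordinary Pearson correlation
    (0-th order). Otherwise one control c is removed and the result is built
    from three partial correlations of order m-1.
    """
    if not controls:
        return pearson(a, b)
    c, rest = controls[0], controls[1:]
    r_ab = partial_correlation(a, b, rest)
    r_ac = partial_correlation(a, c, rest)
    r_bc = partial_correlation(b, c, rest)
    return (r_ab - r_ac * r_bc) / (
        math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2)
    )
```

Because the formula holds for any control variable chosen for removal, the result does not depend on the order of `controls`, apart from rounding.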
For both experiments we control for those metabolites which show a pattern similar to OAS,
reflected in a metabolite-OAS correlation of $\rho \ge 0.7$. This value is motivated by the
fact that metabolites exhibiting a correlation to OAS of $\rho \ge 0.7$ explain about 50% of
the variance, as the coefficient of determination is $\rho^2 \approx 0.5$. In the case of the
light/dark transition experiment, 3 out of 100 measured metabolites exceed this threshold and
together form the set of controlling variables C: methionine, mannitol and fucose. In the case
of the diurnal experiment, 2 out of 50 measured metabolites, maltose and trehalose, are
considered for computing partial correlation.
The significance of a partial correlation coefficient can be tested analogously to the
standard test of significance for Pearson's correlation coefficient. The test statistic

$$
t = \rho_{ab \cdot C}\,\sqrt{\frac{n - 2 - m}{1 - \rho_{ab \cdot C}^{2}}}
$$

follows a Student's t-distribution with n-m-2 degrees of freedom and depends on the number
of observations, n, and the number of variables for which we control, m (Sokal and Rohlf,
1995). Using the t-distribution parameterized by n and m, p-values can be obtained for every
pairwise gene-OAS partial correlation by a one-sided test for the alternative
hypothesis $H_1: \rho_{ab \cdot C} > 0$.
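As a rough illustration of this test, the Python sketch below computes the test statistic and the degrees of freedom for a given partial correlation. The function name and the example numbers (a partial correlation of 0.6, n = 25 observations and m = 3 control variables) are hypothetical and not results from the study.

```python
import math

def partial_correlation_t(rho, n, m):
    """Test statistic and degrees of freedom for a partial correlation.

    rho: partial correlation coefficient
    n:   number of observations
    m:   number of variables controlled for
    """
    df = n - m - 2
    t = rho * math.sqrt(df / (1 - rho ** 2))
    return t, df

# Hypothetical example values, for illustration only.
t_value, dof = partial_correlation_t(0.6, n=25, m=3)
```

The one-sided p-value is the upper-tail probability of a Student's t-distribution with `dof` degrees of freedom at `t_value`; with SciPy available, `scipy.stats.t.sf(t_value, dof)` would give it.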
In our computational approach, partial correlation is employed in two filtering steps.
In the first, we identify guide-genes that exhibit values of Pearson's correlation coefficient
exceeding a threshold derived from the distribution of all correlations of genes to OAS.
Subsequently, this set of guide-genes is further refined using the partial correlation of
guide-genes and OAS (cf. Fig. 2 main text). In the second, we apply partial correlation to
refine the set of target genes derived by co-expression analysis from the (previously
filtered) guide-genes. Again, we preserve only those genes exhibiting a significant partial
correlation to OAS. In both cases, we control for metabolites similar to OAS in the respective
datasets and employ the aforementioned hypothesis test using a significance level of 5%.
Additional references for supplemental material:
Baba, K., Shibata, R. and Sibuya, M. (2004) Partial correlation and conditional correlation as measures of
conditional independence. Aust Nz J Stat, 46, 657-664.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate - a Practical and Powerful Approach
to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological, 57, 289-300.
Caldana, C., Degenkolbe, T., Cuadros-Inostroza, A., Klie, S., Sulpice, R., Leisse, A., Steinhauser, D., Fernie, A.R.,
Willmitzer, L. and Hannah, M.A. (2011) High-density kinetic analysis of the metabolomic and transcriptomic
response of Arabidopsis to eight environmental conditions. The Plant Journal.
Cuadros-Inostroza, A., Caldana, C., Redestig, H., Kusano, M., Lisec, J., Pena-Cortes, H., Willmitzer, L. and
Hannah, M.A. (2009) TargetSearch - a Bioconductor package for the efficient preprocessing of GC-MS
metabolite profiling data. BMC Bioinformatics, 10, 12.
Gautier, L., Cope, L., Bolstad, B.M. and Irizarry, R.A. (2004) affy--analysis of Affymetrix GeneChip data at the
probe level. Bioinformatics, 20, 307-315.
Gentleman, R., Carey, V., Bates, D., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J.,
Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A., Sawitzki,
G., Smith, C., Smyth, G., Tierney, L., Yang, J. and Zhang, J. (2004) Bioconductor: open software development
for computational biology and bioinformatics. Genome Biology, 5, R80.
Kolbe, A., Oliver, S.N., Fernie, A.R., Stitt, M., van Dongen, J.T. and Geigenberger, P. (2006) Combined
transcript and metabolite profiling of Arabidopsis leaves reveals fundamental effects of the thiol-disulfide
status on plant metabolism. Plant physiology, 141, 412-422.
Lawrance, A.J. (1976) Conditional and Partial Correlation. Am Stat, 30, 146-149.
Lehmann, M., Schwarzlander, M., Obata, T., Sirikantaramas, S., Burow, M., Olsen, C.E., Tohge, T., Fricker,
M.D., Moller, B.L., Fernie, A.R., Sweetlove, L.J. and Laxa, M. (2009) The metabolic response of Arabidopsis
roots to oxidative stress is distinct from that of heterotrophic cells in culture and highlights a complex
relationship between the levels of transcripts, metabolites, and flux. Molecular plant, 2, 390-406.
Malitsky, S., Blum, E., Less, H., Venger, I., Elbaz, M., Morin, S., Eshed, Y. and Aharoni, A. (2008) The transcript
and metabolite networks affected by the two clades of Arabidopsis glucosinolate biosynthesis regulators. Plant
physiology, 148, 2021-2049.
Maruyama-Nakashita, A., Nakamura, Y., Tohge, T., Saito, K. and Takahashi, H. (2006) Arabidopsis SLIM1 is a
central transcriptional regulator of plant sulfur response and metabolism. The Plant cell, 18, 3235-3251.
Obayashi, T., Hayashi, S., Saeki, M., Ohta, H. and Kinoshita, K. (2009) ATTED-II provides coexpressed gene
networks for Arabidopsis. Nucleic acids research, 37, D987-991.
Pollard, K.S., Dudoit, S. and van der Laan, M.J. (2005) Bioinformatics and Computational Biology Solutions
Using R and Bioconductor: Multiple testing procedures: the multtest package and applications to genomics,
volume 15. New York: Springer.
Rivals, I., Personnaz, L., Taing, L. and Potier, M. (2007) Enrichment or depletion of a GO category within a class
of genes: which test? Bioinformatics, 23, 401-407.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in
microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1.
Sokal, R.R. and Rohlf, F.J. (1995) Biometry. W.H. Freeman and Company, New York.
Wu, Z.J., Irizarry, R.A., Gentleman, R., Martinez-Murillo, F. and Spencer, F. (2004) A model-based background
adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association, 99, 909-917.