An additional role of O-acetylserine as a sulphur status independent regulator during plant growth

Hans-Michael Hubberten*1, Sebastian Klie1, Camila Caldana, Thomas Degenkolbe, Lothar Willmitzer and Rainer Hoefgen

Max Planck Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany

* To whom correspondence should be addressed: E-mail: hubberten@mpimp-golm.mpg.de
1 Both authors contributed equally to this work

Supplementary text: Data processing and computation

Data pre-processing and normalisation

Arabidopsis rosettes were harvested at 9 time-points: from 0 to 120 minutes at 20-minute intervals, with additional time-points at 5 and 10 minutes. Plants were grown under long-day conditions (16 h) at day/night temperatures of 21°C/18°C for 2 weeks and then transferred to temperature conditions of 20°C and 30°C, additionally challenged with a light/dark transition. Control plants were kept at 21°C under long-day conditions, and samples were taken at the same time-points. The samples were hybridized to Affymetrix ATH1 arrays. Raw CEL files were analysed using Bioconductor software (Gentleman et al., 2004) for R (http://www.r-project.org/). The analysis consisted of quality control via Affymetrix MAS5 present/absent calls (affy package; Gautier et al., 2004) and GCRMA expression estimation (gcrma package; Wu et al., 2004). Only transcripts that displayed present calls in three adjacent time-points, irrespective of the condition, were considered for further analysis, resulting in 15089 probe sets. Mappings of Affymetrix probe set IDs to AGI codes were obtained from TAIR version 8. Peak detection, retention time (RT) alignment and library matching for the GC-MS data for OAS were performed using the TargetSearch package from Bioconductor (Cuadros-Inostroza et al., 2009). The OAS concentration was normalised by dividing each raw value by the median of all measurements of the experiment.
Both gene expression data and OAS measurements were log2 transformed and normalised to the control time-point (difference of the log values).

Computation of time-shifted correlation

Our approach for identifying genes whose expression changes as a putative response to changes in the concentration of OAS is based on analysing time-shifted correlations between variables. Using spline interpolation, we obtained kinetics for OAS and genes that are sampled on a linear time scale, with an interval of 5 minutes between two consecutive observations. The time-course of gene expression levels or OAS concentration levels of two variables A and B (i.e. gene-gene or OAS-gene) consists of n = 25 time-points (observed and interpolated) and is represented by the corresponding vectors a and b. Let $a_l^m = (a_l, \ldots, a_m)$ denote the entries of a indexed from l to m, with $l \ge 1$ and $m \le n$. We define the time-shifted correlation coefficient $\rho_i$ for a and b with a time-shift of i time-points as:

$$
\rho_i =
\begin{cases}
\dfrac{\mathrm{Cov}(a_1^{n-i},\, b_{i+1}^{n})}{\sigma(a_1^{n-i})\,\sigma(b_{i+1}^{n})}, & i \ge 0 \\[2ex]
\dfrac{\mathrm{Cov}(a_{-i+1}^{n},\, b_{1}^{n+i})}{\sigma(a_{-i+1}^{n})\,\sigma(b_{1}^{n+i})}, & i < 0.
\end{cases}
$$

Note that for a time-shift of i > 0, vector b denotes the 'lagged' gene. For each pair of variables, we consider the maximum correlation obtained under any of the time-shifts considered: $\rho_{\max} = \max_i \rho_i$. In our approach, we considered time-shifts of 0 to 3 time-points for correlations between OAS and genes, and of -1, 0 and 1 time-points for pairwise gene correlations. A threshold to estimate statistical significance of the time-shifted correlation (as employed in step B) was estimated empirically by permutation tests. For each gene, the expression value at each time-point is sampled uniformly at random from the data observed for that gene. We then compute the time-shifted correlations $\rho_i^{random}$ on the permuted data for all time-shifts i used in the original analysis.
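The time-shifted correlation defined above can be illustrated with a minimal Python sketch. It is not the code used for the analysis (which was carried out in R); the function names and the toy profiles `oas` and `gene` are hypothetical and chosen only for illustration. A shift of i >= 0 pairs a[t] with b[t + i], so that b is the lagged profile, as in the definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    var_x = sum((u - mean_x) ** 2 for u in x)
    var_y = sum((v - mean_y) ** 2 for v in y)
    return cov / sqrt(var_x * var_y)


def time_shifted_correlation(a, b, shift):
    """Correlate a with b after shifting b by `shift` time-points.

    shift >= 0 pairs a[t] with b[t + shift] (b lags behind a);
    shift < 0 pairs a[t - shift] with b[t] (a lags behind b).
    """
    n = len(a)
    if shift >= 0:
        return pearson(a[: n - shift], b[shift:])
    return pearson(a[-shift:], b[: n + shift])


def max_shifted_correlation(a, b, shifts):
    """Return (maximum correlation, shift at which it occurs)."""
    return max((time_shifted_correlation(a, b, s), s) for s in shifts)


# Invented toy data: `gene` repeats the `oas` profile two time-points later.
oas = [0.0, 1.0, 3.0, 2.0, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0]
gene = [0.0, 0.0] + oas[:-2]
best_rho, best_shift = max_shifted_correlation(oas, gene, shifts=range(0, 4))
print(best_shift, best_rho)
```

Calling `max_shifted_correlation` with `shifts=range(0, 4)` mirrors the OAS-gene setting described in the text, and `shifts=(-1, 0, 1)` mirrors the gene-gene setting.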
Furthermore, iteration of this process yields an estimate of the null distribution for $\rho_i$ based on $\rho_i^{random}$, which takes the nature of gene-expression data into account (e.g. scales and dynamic range) while eliminating the dependency on the time at which measurements were taken. Finally, by setting the threshold $T_{OAS}$ such that only α% of all time-shifted correlations $\rho_i^{random}$ are greater than $T_{OAS}$ (i.e. $T_{OAS}$ is the (100 − α)-th percentile of the null distribution), we obtain a global estimator of statistical significance at a significance level of α%. In our analysis, we used both α = 1% and α = 5% to derive putative OAS-responsive genes (see supplemental figure 1 for an overview of genes identified employing a significance level of 5%).

Computation of partial correlation

In order to identify genes whose expression pattern is highly similar to the observed changes of OAS levels in both experiments, i.e. the initial 'guide-genes' and the candidate genes derived from them, we quantify this similarity by Pearson's correlation coefficient. However, the typical pattern of OAS (a sharp increase after the light/dark transition in the first experiment and a periodic increase during the night in the second experiment) might not be specific to OAS. Although we show that sulphur-related metabolites remain unchanged, other metabolites may display patterns similar to that of OAS. As a consequence, genes may show a high correlation not only with OAS but with these metabolites as well. In such a case, it becomes difficult to argue that these genes respond exclusively to OAS. To further reduce the possibility that the guide-genes exhibit a high correlation to OAS not as the result of a response to altered OAS levels, but rather as a response to a third metabolite with a similar pattern, we use partial correlation (Lawrance, 1976).
Let a denote the vector of n measurements of OAS levels and b the vector of n expression values experimentally obtained for a certain gene. Further, let c denote the vector of measurements for a single metabolite other than OAS. The first-order partial correlation of a and b without the influence of a single control variable c is defined from the pairwise Pearson correlations of a and b ($\rho_{ab}$), a and c ($\rho_{ac}$), and b and c ($\rho_{bc}$) as follows:

$$
\rho_{ab \cdot c} = \frac{\rho_{ab} - \rho_{ac}\,\rho_{bc}}{\sqrt{1 - \rho_{ac}^{2}}\,\sqrt{1 - \rho_{bc}^{2}}}.
$$

It then follows that if $\rho_{ab \cdot c} = 0$, the association of a and b is fully explained by c. Such a case would indicate that the observed correlation of a gene (profile b) with OAS (profile a) originates from their mutual correlation with a third metabolite (profile c), rather than from a response to changed OAS levels. To further eliminate the contribution of multiple other measured metabolites to the correlation of a and b, the above formula is extended to a recursive formulation defined for a set of control variables. Let C, |C| = m, denote this set of m metabolites, where $c_i \in C$, $1 \le i \le m$, denotes the vector of concentration measurements for the i-th metabolite in C. We define the m-th order partial correlation of a and b that eliminates the influence of the set of metabolites C recursively by (Sokal and Rohlf, 1995):

$$
\rho_{ab \cdot C} = \frac{\rho_{ab \cdot C \setminus \{c_i\}} - \rho_{a c_i \cdot C \setminus \{c_i\}}\,\rho_{b c_i \cdot C \setminus \{c_i\}}}{\sqrt{1 - \rho_{a c_i \cdot C \setminus \{c_i\}}^{2}}\,\sqrt{1 - \rho_{b c_i \cdot C \setminus \{c_i\}}^{2}}}.
$$

Here $C \setminus \{c_i\}$ denotes the set of all m metabolites with the i-th metabolite removed. Note that the above formula holds for any $c_i \in C$. From this formula we can deduce that the m-th order partial correlation can be computed from three (m-1)-th order partial correlations. The ordinary Pearson correlation, i.e. $\rho_{ab \cdot \emptyset} = \rho_{ab}$, is the 0-th order partial correlation. Typically, for large sets C, the partial correlation is obtained by inversion of the covariance matrix of a, b and the variables in C (Baba et al., 2004).
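The recursive definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used for the reported results; the function names and the toy profiles are invented. The toy data are constructed so that the `oas` and `gene` profiles both follow a third profile `metab` plus mutually uncorrelated deviations, which makes the ordinary correlation high while the first-order partial correlation vanishes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    var_x = sum((u - mean_x) ** 2 for u in x)
    var_y = sum((v - mean_y) ** 2 for v in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(a, b, controls):
    """Partial correlation of a and b given the profiles in `controls`.

    One control c_i is split off and the result is combined from three
    partial correlations of the next lower order. With no controls left,
    this is the ordinary Pearson correlation (0-th order).
    """
    if not controls:
        return pearson(a, b)
    c, rest = controls[0], controls[1:]
    r_ab = partial_correlation(a, b, rest)
    r_ac = partial_correlation(a, c, rest)
    r_bc = partial_correlation(b, c, rest)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Invented toy profiles: both `oas` and `gene` track `metab`; their
# deviations from it are uncorrelated with each other and with `metab`.
metab = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dev_1 = [1, -1, -1, 1, 1, -1, -1, 1]
dev_2 = [1, -1, 1, -1, -1, 1, -1, 1]
oas = [m + 0.5 * d for m, d in zip(metab, dev_1)]
gene = [m + 0.5 * d for m, d in zip(metab, dev_2)]

rho_plain = partial_correlation(oas, gene, [])         # ordinary Pearson
rho_partial = partial_correlation(oas, gene, [metab])  # controlling for metab
print(rho_plain, rho_partial)
```

The recursion is convenient for a handful of control variables, as in this work; its cost grows exponentially with the number of controls, which is one reason the matrix-inversion route is preferred for large sets C.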
The numerical results for the partial correlation in this work were obtained using the R package 'corpcor' (http://cran.r-project.org/.../packages/corpcor/). For both experiments we control for those metabolites that show a pattern similar to OAS, reflected in a metabolite-OAS correlation of $\rho \ge 0.7$. This value is motivated by the fact that metabolites exhibiting a correlation to OAS of 0.7 explain approximately 50% of the variance, as the coefficient of determination is $\rho^2 \approx 0.5$. In the case of the light/dark transition experiment, 3 out of 100 measured metabolites exceed this threshold and together form the set of controlling variables C: methionine, mannitol and fucose. In the case of the diurnal experiment, 2 out of 50 measured metabolites, maltose and trehalose, are considered for computing the partial correlation. Testing the significance of a partial correlation coefficient can be achieved analogously to the standard significance test for Pearson's correlation coefficient. The test statistic

$$
t = \rho_{ab \cdot C}\,\sqrt{\frac{n - 2 - m}{1 - \rho_{ab \cdot C}^{2}}}
$$

follows a Student's t-distribution with n-m-2 degrees of freedom and depends on the number of observations, n, and the number of variables for which we control, m (Sokal and Rohlf, 1995). Using the t-distribution parameterized by n and m, p-values can be obtained for every pairwise gene-OAS partial correlation by a single-sided test for the alternative hypothesis $H_1: \rho_{ab \cdot C} > 0$. In our computational approach, partial correlation is employed in two filtering steps. In the first, we identify guide-genes that exhibit values of Pearson's correlation coefficient exceeding a threshold derived from the distribution of all correlations of genes to OAS. Subsequently, this set of guide-genes is further refined using the partial correlation of guide-genes and OAS (cf. Fig. 2, main text). In the second, we apply partial correlation to refine the set of target genes derived by co-expression analysis from the (previously filtered) guide-genes.
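As a worked illustration of the significance test described above, the following Python sketch computes the test statistic and its degrees of freedom. The function name and the example values (a partial correlation of 0.6, n = 25 observations, m = 3 control variables) are assumptions made for illustration and are not results from this study.

```python
from math import sqrt


def partial_correlation_t(rho, n, m):
    """t statistic and degrees of freedom for an m-th order partial correlation.

    rho: partial correlation coefficient
    n:   number of observations
    m:   number of control variables
    """
    df = n - 2 - m
    if df <= 0:
        raise ValueError("need more than m + 2 observations")
    if abs(rho) >= 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return rho * sqrt(df / (1 - rho ** 2)), df


# Hypothetical values, for illustration only.
t_value, df = partial_correlation_t(0.6, n=25, m=3)
print(t_value, df)
```

The single-sided p-value is the upper-tail probability of a Student's t-distribution with `df` degrees of freedom at `t_value`; if SciPy is available, `scipy.stats.t.sf(t_value, df)` returns it. With m = 0 the statistic reduces to the usual test for Pearson's correlation coefficient.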
Again, we preserve only those genes exhibiting a significant partial correlation to OAS. In both cases, we control for metabolites similar to OAS in the respective datasets and employ the aforementioned hypothesis test using a significance level of 5%.

Additional references for supplemental material:

Baba, K., Shibata, R. and Sibuya, M. (2004) Partial correlation and conditional correlation as measures of conditional independence. Aust NZ J Stat, 46, 657-664.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological, 57, 289-300.
Caldana, C., Degenkolbe, T., Cuadros-Inostroza, A., Klie, S., Sulpice, R., Leisse, A., Steinhauser, D., Fernie, A.R., Willmitzer, L. and Hannah, M.A. (2011) High-density kinetic analysis of the metabolomic and transcriptomic response of Arabidopsis to eight environmental conditions. The Plant Journal.
Cuadros-Inostroza, A., Caldana, C., Redestig, H., Kusano, M., Lisec, J., Pena-Cortes, H., Willmitzer, L. and Hannah, M.A. (2009) TargetSearch - a Bioconductor package for the efficient preprocessing of GC-MS metabolite profiling data. BMC Bioinformatics, 10, 12.
Gautier, L., Cope, L., Bolstad, B.M. and Irizarry, R.A. (2004) affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307-315.
Gentleman, R., Carey, V., Bates, D., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. and Zhang, J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology, 5, R80.
Kolbe, A., Oliver, S.N., Fernie, A.R., Stitt, M., van Dongen, J.T. and Geigenberger, P. (2006) Combined transcript and metabolite profiling of Arabidopsis leaves reveals fundamental effects of the thiol-disulfide status on plant metabolism. Plant Physiology, 141, 412-422.
Lawrance, A.J. (1976) Conditional and Partial Correlation. Am Stat, 30, 146-149.
Lehmann, M., Schwarzlander, M., Obata, T., Sirikantaramas, S., Burow, M., Olsen, C.E., Tohge, T., Fricker, M.D., Moller, B.L., Fernie, A.R., Sweetlove, L.J. and Laxa, M. (2009) The metabolic response of Arabidopsis roots to oxidative stress is distinct from that of heterotrophic cells in culture and highlights a complex relationship between the levels of transcripts, metabolites, and flux. Molecular Plant, 2, 390-406.
Malitsky, S., Blum, E., Less, H., Venger, I., Elbaz, M., Morin, S., Eshed, Y. and Aharoni, A. (2008) The transcript and metabolite networks affected by the two clades of Arabidopsis glucosinolate biosynthesis regulators. Plant Physiology, 148, 2021-2049.
Maruyama-Nakashita, A., Nakamura, Y., Tohge, T., Saito, K. and Takahashi, H. (2006) Arabidopsis SLIM1 is a central transcriptional regulator of plant sulfur response and metabolism. The Plant Cell, 18, 3235-3251.
Obayashi, T., Hayashi, S., Saeki, M., Ohta, H. and Kinoshita, K. (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Research, 37, D987-991.
Pollard, K.S., Dudoit, S. and van der Laan, M.J. (2005) Bioinformatics and Computational Biology Solutions Using R and Bioconductor: Multiple testing procedures: the multtest package and applications to genomics, volume 15. New York: Springer.
Rivals, I., Personnaz, L., Taing, L. and Potier, M. (2007) Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 23, 401-407.
Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1.
Sokal, R.R. and Rohlf, F.J. (1995) Biometry. W.H. Freeman and Company, New York.
Wu, Z.J., Irizarry, R.A., Gentleman, R., Martinez-Murillo, F. and Spencer, F. (2004) A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association, 99, 909-917.