Gene Ontology (GO) enrichment analysis
Enrichment for GO annotations among the 561 RBPs was assessed using GO::TermFinder, available at http://go.princeton.edu/cgi-bin/GOTermFinder. In brief, all the RBPs were supplied as an input file, and enriched GO terms for the process, function and component annotations were identified using the default p-value cut-off (0.1) with 0% false discovery rate (FDR) for significant GO terms, as described in (1).
Likewise, the default gene association file containing the annotations for yeast ORFs was used as the background for calculating p-values.
Since there are three distinct GO hierarchies for the functional annotation of genes (GO-function, GO-process and GO-component), the results of the GO term enrichment are shown as three separate sheets in the Excel file (Table S1). The analysis shows that RBPs are highly enriched for structural constituent of ribosome, RNA binding, RNA splicing factor activity, ligase activity, ribonuclease activity, etc. in the GO-function category (sheet GO function, Table S1). Similarly, RBPs are enriched for gene expression, RNA processing and transport, mRNA metabolic process and translation in the GO-process category (sheet GO process, Table S1). Further, RBPs are also enriched for ribonucleoprotein complex, ribosome and spliceosome in the GO-component category (sheet GO component, Table S1).
Comparative analysis of regulatory properties of non-ribosomal RBPs with non-RBPs
Ribosomal and translation-associated factors form an important class of RBPs, as they are involved in several steps of protein synthesis. Indeed, gene ontology analysis of all the RBPs used in this study showed an enrichment of ribosome (GO:0005840; GO-component) among RBPs. This analysis identified 223 of the 561 proteins as belonging, directly or indirectly, to ribosome and translation-associated functions (supplementary Table S1); we define these as ribosomal proteins. Further, ribosomal proteins have been shown to have higher mRNA and protein abundance (2) and low protein noise (3).
Taken together, one might expect that the differences in dynamic properties between RBPs and non-RBPs are mostly due to ribosomal proteins. To investigate whether the observed significant differences are influenced by the ribosomal proteins, we excluded the 223 genes annotated with GO:0005840 from the 561 RBPs and then reanalyzed the expression dynamics of non-ribosomal RBPs against non-RBPs, as described in the main text.
As a result of this analysis, we found that the mRNA half-life of non-ribosomal RBPs (NRR) is significantly lower (p-value = 2.510e-11) than that of non-RBPs (Fig. S1 A). On the other hand, mRNA abundance was found to be significantly higher (p-value = 3.381e-05) in non-ribosomal RBPs than in non-RBPs (Fig. S1 B). Further, the results show no difference in ribosomal occupancy between non-ribosomal RBPs and non-RBPs (Fig. S1 C), suggesting that ribosomal proteins might be more efficiently translated than non-ribosomal RBPs, although both are produced in higher mRNA copies per cell. We also note that protein abundance (Fig. S1 D) and half-life (Fig. S1 E) are higher for non-ribosomal RBPs than for non-RBPs (p-value = 0.02226 and 0.04823, respectively). Moreover, non-ribosomal RBPs showed significantly lower (p-value = 6.315e-06) noise than non-RBPs (Fig. S1 F). Altogether, these results suggest that removing ribosomal proteins from the RBPs affected only the ribosome occupancy result and had no significant effect on the results for mRNA half-life, cellular abundance and protein noise. In short, these results indicate that the exclusion of ribosomal proteins from RBPs does not affect our end results, and that the differences in the expression dynamics of RBPs and non-RBPs are not an outcome of the presence of ribosomal proteins among RBPs.
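The p-values reported for these group comparisons come from a Wilcoxon test (see the legend of Fig. S1). As an illustration only, the following minimal Python sketch shows how a two-sample Wilcoxon rank-sum (Mann-Whitney) test can be computed with a normal approximation and tie correction. The function name and the toy values are assumptions introduced here, and the sketch omits the continuity correction that some statistical packages apply by default, so its p-values need not match those reported above.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using a normal approximation.

    Returns (U statistic for x, approximate two-sided p-value).
    Illustrative sketch: no continuity correction, no exact small-sample test.
    """
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign average ranks to tied values and accumulate the tie correction.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value

# Toy values (hypothetical, not data from this study).
u_stat, p_val = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

For samples as large as those compared here (hundreds of RBPs against thousands of non-RBPs), the normal approximation is the usual choice; for very small groups an exact test would be preferable.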
Figure S1: Comparing expression dynamics of non-ribosomal RBPs with non-RBPs in the entire genome. Non-ribosomal RBPs are those RBPs that remain after exclusion of ribosomal and translation-associated factors (223 genes annotated with GO:0005840 from the GO component category after GO analysis of the 561 RBPs analyzed in this study). Box-plots show the distribution of values for various regulatory properties for the two groups of proteins (non-ribosomal RBPs and non-RBPs) in S. cerevisiae. Blue and red boxes correspond to the non-ribosomal RBP (NRR) and non-RBP (nonRBPs) populations, respectively.
Each box-plot identifies the middle 50% of the data, the median, and the extreme points. The data points are divided into quartiles, and the inter-quartile range (IQR) is calculated as the difference between x0.75 and x0.25. The range spanning the 25% of data points above the median (up to x0.75) and the 25% below it (down to x0.25) is displayed as a filled box around the median (x0.50). The horizontal line and the notch represent the median and its confidence interval, respectively. Data points lying more than 1.5 IQR above x0.75 or below x0.25 are treated as outliers and shown as dots. The horizontal lines connected to the filled box by dashed lines (whiskers) mark the largest and smallest non-outlier data points, respectively. (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, the P-values shown correspond to the significance estimated with a Wilcoxon test comparing the non-ribosomal RBP and non-RBP groups of proteins. Non-ribosomal RBPs were found to show significantly lower transcript stability and higher mRNA copy number. Although non-ribosomal RBPs do not show any significant change in ribosome occupancy, they still show significantly higher protein stability and abundance.
Moreover, protein noise, which reflects the extent of cell-to-cell variation in protein levels, was found to be significantly lower for non-ribosomal RBPs than for non-RBPs. Altogether, these results suggest that the differences in the expression dynamics of RBPs and non-RBPs are not influenced by ribosomal proteins.
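The box-plot construction described in the legend of Fig. S1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used to draw the figures: the function name `boxplot_stats` is hypothetical, the quartile interpolation rule (`method="inclusive"`) is one of several conventions, and the notch half-width of 1.58 × IQR / sqrt(n) follows the convention of R's `boxplot.stats`, which is assumed here because the legend does not state how the confidence interval was computed.

```python
import math
from statistics import quantiles

def boxplot_stats(values):
    """Return the quantities a notched box-plot displays for one group."""
    data = sorted(values)
    # Quartiles x0.25, x0.50 (median) and x0.75; interpolation rule is an assumption.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points more than 1.5 IQR beyond the box are flagged as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    # Approximate confidence interval of the median (notch), assumed convention.
    half_notch = 1.58 * iqr / math.sqrt(len(data))
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # extreme non-outlier points
        "notch": (median - half_notch, median + half_notch),
        "outliers": outliers,
    }

# Toy values (hypothetical, not data from this study).
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy example the value 100 lies above the upper fence and is reported as an outlier, so the upper whisker stops at 9.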
Comparison of the regulatory properties of different categories of RBPs with non-RBPs
RNA binding proteins (RBPs) comprise proteins from several functional classes, which can be broadly classified into RNA processing, transport, localization and translation, as shown in Fig. 1 (main text) and supplementary material 1 (GO enrichment analysis of RBPs). In the main text, we have shown that RBPs as a group show distinct dynamic properties when compared to non-RBPs. To understand whether the specific functional classes of RBPs mentioned above also show these differences, we compared each class with non-RBPs. In short, this involved obtaining the proteins corresponding to each of the enriched GO processes in RBPs (supplementary material 1), namely translation, RNA processing, transport and localization, which comprised 331, 249, 38 and 44 proteins, respectively. We compared the dynamic properties of these sub-groups with non-RBPs in the same way as described in the methods and materials section of the main text.
Translation: We identified the proteins from the enriched GO process translation, as mentioned in supplementary material 1. This category comprised 331 proteins. A comparative analysis of translation-associated RBPs with non-RBPs showed that mRNA half-life is lower for translation-associated RBPs, as shown in Fig. S2 A.
Further, mRNA abundance, ribosome occupancy, protein abundance and protein half-life were found to be significantly higher, as shown in Fig. S2 B, C, D and E, respectively. Protein noise was observed to be significantly lower for these proteins than for non-RBPs (Fig. S2 F).
Figure S2: Comparing expression dynamics of translation (RBPs involved in translation) with non-RBPs in the entire genome. Box-plots showing the distribution of values for various regulatory properties for the two different groups of proteins (translation and non-RBPs) in S. cerevisiae. Blue and red boxes correspond to translation (trn) and non-RBPs (nonRBPs) populations respectively. (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, P-values shown correspond to the significance estimated based on Wilcoxon test comparing the translational and non-RBP group of proteins. Altogether, these results suggest that the translation-associated RBPs also follow a trend similar to that of RBPs as a group.
Transport: Transport was taken as another category of RBPs whose dynamic properties were compared with those of non-RBPs. This category comprised only 38 proteins (GO enriched processes, supplementary material 1). The analysis showed that mRNA abundance, ribosome occupancy and protein abundance are significantly higher, as shown in Fig. S3 B, C and D. On the other hand, mRNA and protein half-life were comparable with those of non-RBPs (Fig. S3 A, E). We found that half-life values were missing for several transport-associated RBPs, which could be contributing to this observation. Protein noise of transport proteins was also found to be significantly lower, as shown in Fig. S3 F.
Figure S3: Comparing expression dynamics of transport (RBPs involved in transport) with non-RBPs. Box-plots showing the distribution of values for various regulatory properties for the two different groups of proteins (transport and non-RBPs) in S. cerevisiae. Blue and red boxes correspond to transport (tras) and non-RBPs (nonRBPs) populations respectively. (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, P-values shown correspond to the significance estimated based on Wilcoxon test comparing the transport and non-RBP group of proteins. Altogether, these results suggest that, although this set is much smaller, its members follow a trend similar to that of RBPs as a group.
Localization: This class included 44 proteins according to the GO enriched processes of RBPs. A comparative analysis of the dynamic properties of localization-associated RBPs with non-RBPs showed that mRNA abundance, ribosome occupancy and protein abundance are significantly higher (Fig. S4 B, C and D). Protein half-life was also significantly higher (Fig. S4 E), while mRNA half-life was comparable (Fig. S4 A) between the two groups. Protein noise was also found to be lower for localization proteins than for non-RBPs (Fig. S4 F).
Altogether, these results suggest that, although this set is much smaller than the full set of RBPs, its members follow a trend similar to that of RBPs as a group.
Figure S4: Comparing expression dynamics of localization (RBPs involved in localization of RNAs) with non-RBPs. Box-plots showing the distribution of values for various regulatory properties for the two different groups of proteins (localization and non-RBPs) in S. cerevisiae. Blue and red boxes correspond to localization (loca) and non-RBPs (nonRBPs) populations respectively. (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, P-values shown correspond to the significance estimated based on Wilcoxon test comparing the localization associated RBPs and non-RBPs. Altogether, these results suggest that the localization-associated RBPs are likely to follow a trend similar to that of RBPs as a whole group.
RNA processing: RNA processing is a category of RBPs that includes proteins involved in polyadenylation, decapping, splicing, etc.
This group contains 249 proteins according to the GO enriched processes of RBPs (supplementary material 1). A comparative analysis showed significantly lower mRNA half-life (Fig. S5 A) and higher protein half-life (Fig. S5 E) for RNA processing RBPs. Similarly, higher mRNA and protein abundance (Fig. S5 B and D) were observed in this group than in non-RBPs, although ribosomal occupancy was comparable for the two groups (Fig. S5 C). Protein noise was found to be lower in RNA processing RBPs than in non-RBPs (Fig. S5 F).
Figure S5: Comparing expression dynamics of RNA processing (RBPs involved in processing of RNAs) with non-RBPs. Box-plots showing the distribution of values for various regulatory properties for the two different groups of proteins (RNA processing RBPs and non-RBPs) in S. cerevisiae. Blue and red boxes correspond to RNA processing (proc) and non-RBPs (nonRBPs) populations respectively. (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, P-values shown correspond to the significance estimated based on Wilcoxon test comparing the RNA processing RBPs and non-RBPs as groups of proteins. Altogether, these results suggest that the RNA processing RBPs follow a trend similar to that of RBPs as a whole group.
Altogether, these results suggest that each category of RBPs is likely to follow the same trend shown by RBPs as a group (Table S2), although factors such as (i) the small number of proteins in a category, as is the case for the localization and transport categories, and (ii) the unavailability of data for certain properties in some categories might be affecting the end results.
Table S2: Results with significance values (p-values) when comparing different dynamic properties of RBPs and specific classes of RBPs with non-RBPs. The values in brackets below each group name give the number of proteins in that group. P-values shown in red indicate results that are not significant.
                                       Significance of results (p-values)
                                       Total       Specific class of RBPs
Dynamic              Results           RBPs        Translation  Transport  Localization  Processing
properties                             (561)       (331)        (38)       (44)          (249)
mRNA half-life       low half-life     3.16E-10    0.03254      0.6212     0.3248        1.47E-11
mRNA abundance       more abundance    2.20E-16    2.20E-16     1.88E-08   5.40E-08      3.92E-09
Ribosomal occupancy  more occupancy    2.48E-13    2.20E-16     6.70E-05   3.35E-05      7.91E-01
Protein abundance    more abundance    2.20E-16    2.20E-16     2.34E-06   5.73E-06      9.05E-05
Protein half-life    more half-life    5.01E-12    2.20E-16     1.20E-01   1.00E-02      1.80E-02
Noise                low noise         1.66E-12    4.61E-12     4.93E-03   4.89E-04      1.61E-04
Comparative analysis of Kinase-substrate network for RBPs and non-RBPs
RBPs have been shown to undergo different post-translational modifications, such as phosphorylation, arginine methylation and small ubiquitin-like modification (SUMO). These modifications might be responsible for the diverse functions of RBPs. Among these modifications, high-throughput data are available for protein phosphorylation: the kinase-substrate map has been experimentally characterized in yeast by Ptacek et al. (4) for most of the kinases. This dataset comprises a total of 4290 phosphorylation events for 1325 proteins, covering 87 kinases. We used this kinase-substrate network to investigate (i) enrichment of kinases for RBPs and (ii) enrichment of RBPs for kinases.
Mapping of RBPs and non-RBPs onto the kinase-substrate network reveals that a total of 59 kinases phosphorylate 127 RBPs. On the other hand, as expected, all 87 kinases were found to phosphorylate 1198 non-RBPs (Table S3). This suggests that the number of kinases per RBP (59 kinases for 127 RBPs) is significantly higher (p-value < 2.2e-16, hypergeometric distribution) than the number of kinases per non-RBP (87 kinases for 1198 non-RBPs) (Fig. S6 A). Further, RBPs were also found to be significantly enriched (p-value = 0.02705) for kinase targets (Fig. S6 B). Altogether, these results suggest that each RBP is targeted by more kinases than a non-RBP. This difference in the extent of phosphorylation of the same RBP by different kinases may be responsible for the diversity seen in the number of functions exhibited by RBPs. The higher rate of phosphorylation also explains the high protein levels and stability of RBPs observed in this study, which would facilitate their availability in protein form for longer periods, thereby enabling them to be manipulated at the post-translational level.
Table S3: Comparative analysis of RBPs and non-RBPs as a group using the kinase-substrate network. First column of the table corresponds to group of genes studied in the kinase-substrate network. Second column corresponds to genes present in each group employed for mapping on to kinase network. Third column indicates the number of genes in each group phosphorylated by at least one kinase. Fourth column corresponds to the total number of kinases which phosphorylated the genes in column three.
Group       Total Genes   Number of Kinase Targets   Number of Kinases
Non RBPs    5685          1198                       87
RBPs        561           127                        59
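The two quantities compared in this section, the number of kinases per phosphorylated gene and the fraction of genes that are kinase targets, follow directly from the counts in Table S3. The short Python sketch below reproduces that arithmetic; the dictionary layout and the function name `summarize` are illustrative assumptions, and the sketch does not reproduce the significance tests themselves.

```python
# Counts taken from Table S3.
groups = {
    "RBPs": {"total": 561, "targets": 127, "kinases": 59},
    "non-RBPs": {"total": 5685, "targets": 1198, "kinases": 87},
}

def summarize(counts):
    """Return (kinases per phosphorylated gene, fraction of genes phosphorylated)."""
    kinases_per_target = counts["kinases"] / counts["targets"]
    fraction_targets = counts["targets"] / counts["total"]
    return kinases_per_target, fraction_targets

summary = {name: summarize(counts) for name, counts in groups.items()}

for name, (per_target, fraction) in summary.items():
    print(f"{name}: {per_target:.3f} kinases per target, "
          f"{100 * fraction:.1f}% of genes phosphorylated")
```

With these counts the ratio is about 0.46 kinases per targeted RBP against about 0.07 per targeted non-RBP, while roughly 22.6% of RBPs and 21.1% of non-RBPs are kinase targets.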
Figure S6: Comparative analysis of kinase-substrate network for RBPs and non-RBPs. Blue and red bars correspond to RBPs and non-RBPs respectively. In each case p-value corresponds to the significance based on hypergeometric distribution. (A) shows the number of kinases per RBP and non-RBP. (B) shows the proportion of RBPs and non-RBPs phosphorylated by kinases.
Comparison of kinases per gene for RBPs and non-RBPs shows that kinases are significantly more likely to target RBPs than non-RBPs. Similarly, a significantly higher percentage of RBPs than of non-RBPs was found to be phosphorylated by kinases. These observations suggest that the functions of RBPs are regulated by post-translational modifications.
Comparison of the regulatory properties of RBPs with all protein coding genes
As a control, and to understand the differences in comparison with the whole genome, we also compared the dynamic properties (mRNA half-life, mRNA abundance, ribosome occupancy, protein half-life, protein abundance and noise) of RBPs with those of all 6241 protein coding genes (including RBPs) in the yeast genome. As a further control, to verify whether using mean instead of median values would change our end results, we calculated the mean of each dynamic property for RBPs and for all protein coding genes, and used the Welch two-sample t-test available in the R statistical package to calculate the significance of the difference in the mean value of each dynamic property between RBPs and all protein coding genes.
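To make the mean-based control concrete, the following minimal Python sketch computes the mean ± standard error of the mean (SEM) for a group and the Welch two-sample t statistic with its Welch-Satterthwaite degrees of freedom. It is an illustration under assumptions, not the R code used in the analysis: the function names and toy values are hypothetical, and the p-value is not computed here because it requires the Student t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), math.sqrt(variance(values) / len(values))

def welch_t(a, b):
    """Welch t statistic and degrees of freedom for unequal variances."""
    var_mean_a = variance(a) / len(a)  # squared SEM of group a
    var_mean_b = variance(b) / len(b)  # squared SEM of group b
    t_stat = (mean(a) - mean(b)) / math.sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (len(a) - 1) + var_mean_b ** 2 / (len(b) - 1)
    )
    return t_stat, df

# Toy values (hypothetical, not data from this study).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
mean_a, sem_a = mean_and_sem(group_a)
t_stat, df = welch_t(group_a, group_b)
```

Unlike the pooled-variance t-test, this form does not assume that the two groups share a variance, which matters when a group of 561 genes is compared with one of several thousand.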
The comparative analysis shows significant differences in the mean values of several dynamic properties between RBPs and all protein coding genes. RBPs have significantly lower (p < 2.2e-16) mean transcript stability (20.35 ± 0.50 minutes) than all protein coding genes (26.19 ± 0.27 minutes) (Fig. S7 A). On the other hand, RBPs show significantly higher transcript abundance than all protein coding genes (Fig. S7 B). Similarly, RBPs show significantly higher ribosome occupancy (average ribosome occupancy for RBPs = 0.74 ± 0.003 versus all protein coding genes = 0.71 ± 0.001, p = 8.5e-11) and protein abundance (average protein abundance for RBPs = 21541.14 ± 2337.10 versus all protein coding genes = 12068.85 ± 706.92, p = 5.8e-4) than all protein coding genes (Fig. S7 C, E respectively), but comparable protein half-life, owing to many outliers in this dataset (Fig. S7 D). Further, RBPs show significantly lower (p = 1.6e-12) noise (-0.09 ± 0.13) than all protein coding genes (1.04 ± 0.05) (Fig. S7 F).
Figure S7: Comparative dynamic properties of RBPs with all protein coding genes. Bar-graphs show mean value for various regulatory properties for RBPs and all protein coding genes (including RBPs) in S. cerevisiae. Blue and red bars correspond to RBPs and all protein coding genes (including RBPs) respectively. Error-bars correspond to standard error of mean (SEM). (A) mRNA half-life (B) mRNA copy number (C) Ribosome occupancy (D) Protein abundance (E) Protein half-life (F) Protein noise. In each case, P-values shown correspond to the significance estimated based on Welch two sample t-test comparing the RBPs and all protein coding genes. RBPs were found to show significantly lower transcript stability, higher mRNA copy number, ribosome occupancy and protein abundance but comparable protein stability. However, protein noise, which reflects the extent of cell-to-cell variation in protein levels, was found to be significantly lower for RBPs compared to all protein coding genes, suggesting that most RBPs are uniformly expressed across a homogeneous population of cells.
Enrichment of essential genes in RBPs
RBPs have been shown to be important for the normal functioning of the cell. For instance, variation in their concentration or function is associated with several diseases in humans. Moreover, RBPs are considered a conserved group of proteins. Given that RBPs are important proteins and are tightly regulated at the protein level (main text), we asked whether RBPs in yeast are enriched for essential genes. For this analysis, we obtained the dataset of essential genes in yeast from (5). In this dataset, 1103 of the 5900 genes studied are essential, suggesting that ~18% of genes in the whole yeast genome are essential. When we analyzed the proportion of essential genes among RBPs, we found that ~36% of RBPs fall in the category of essential genes. This shows that RBPs are significantly (p < 2.2e-16, hypergeometric distribution) enriched for essential genes.
Table S4: RBPs are enriched for essential genes. The first column gives the group of genes. The second column gives the total number of genes in each group. The third column gives the number of essential genes in each group. The fourth column gives the percentage of essential genes in each group.
Group         Total Genes   Essential genes   % of essential genes
Total genes   5900          1103              18.69
RBPs          561           206               36.72
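The enrichment of essential genes among RBPs can be illustrated with an upper-tail hypergeometric test on the counts in Table S4. The Python sketch below is a minimal example under stated assumptions: it treats the 5900 genes studied as the population and assumes that all 561 RBPs are contained in it, which the text does not state explicitly, and the function name `hypergeom_upper_tail` is introduced here for illustration only.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` carry the property."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total  # exact integer ratio, converted to float at the end

# Counts from Table S4: 1103 essential genes among 5900, 206 among 561 RBPs.
# Assumption: all 561 RBPs are part of the 5900 genes studied.
p_value = hypergeom_upper_tail(5900, 1103, 561, 206)
expected_essential = 561 * 1103 / 5900  # essential RBPs expected by chance
```

Under these assumptions about 105 essential genes would be expected among 561 randomly chosen genes, against the 206 observed, and the resulting p-value lies below 2.2e-16.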
Enrichment for Cell-cycle regulated genes
To check whether RBPs are enriched for cell-cycle regulated genes, we obtained the dataset of cell-cycle regulated genes in yeast from (6). This dataset comprises ~400 genes that are associated with the cell cycle. When we analyzed how many RBP genes are present in this set, we found that only 10 of the 561 are associated with the cell cycle. This suggests that RBPs are not enriched for cell-cycle regulated genes.
1. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, & Sherlock G (2004) Bioinformatics 20, 3710-3715.
2. Garcia-Martinez J, Gonzalez-Candelas F, & Perez-Ortin JE (2007) Genome Biol 8, R222.
3. Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, & Weissman JS (2006) Nature 441, 840-846.
4. Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, et al. (2005) Nature 438, 679-684.
5. Deutschbauer AM, Jaramillo DF, Proctor M, Kumm J, Hillenmeyer ME, Davis RW, Nislow C, & Giaever G (2005) Genetics 169, 1915-1925.
6. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. (1998) Mol Cell 2, 65-73.