Method S1 - Morris Lab

Supplemental Methods

On the choice and optimality of the sequence-based benchmark method

Assessing the predictive value of target site accessibility required us to establish a benchmark for the accuracy of methods that do not consider target site accessibility. We made some simplifying assumptions when designing this benchmark, and as such it may not represent the current “best possible sequence-based model”. However, we designed it to be fair, and it has some advantages over other choices.

First, consider two facts. Target site accessibility is assessed using the mRNA sequence, so the best possible sequence-based model may simply be the one that most accurately predicts target site accessibility. Also, there is no consensus on what the best possible sequence model is, so even if we were able to identify and use the current best model, later improvements to it might invalidate our comparison. However, by making two simplifying assumptions and evaluating models using AUROC, the baseline performance represented by the #TS model is as good as or better than that of a whole class of sequence-based models.

The first assumption is that the consensus sequence motif summarizes the RBP’s sequence binding preferences, i.e., that the RBP can only bind target sites matching the motif and binds them with equal affinity. This approximation, though drastic, is commonly made, and we could find no reason to believe that it biased the comparison in favor of target site accessibility. Even when we allowed the #TS model to optimize its consensus sequence, the #ATS model still performed better using the learned motif for eight of nine RBPs (see Fig. 4, main text).

The second assumption is that the contribution of a target site to the likelihood that an RBP will bind a transcript is independent of its position in the transcript. However, when we relax this assumption and scan only the 3’ UTR for target sites, we obtain similar results (Suppl. Fig. 1 online).
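The role of AUROC in this argument can be checked with a small numerical sketch. AUROC depends only on the order in which transcripts are ranked, so under the two assumptions above any model whose score strictly increases with the number of target sites should give the same AUROC as the raw count. The code below is an illustration only, not the pipeline used in the study: the motif "UGUA", the toy transcripts, the bound/unbound labels, and the function names are all hypothetical, and the alternative scoring functions are simply examples of strictly increasing functions of the count.

```python
import math
import re


def count_target_sites(transcript: str, motif: str) -> int:
    """Count occurrences of a consensus motif, allowing overlaps."""
    # A zero-width lookahead matches at every start position of the motif.
    return len(re.findall(f"(?={re.escape(motif)})", transcript))


def auroc(scores, labels):
    """AUROC as the fraction of (bound, unbound) pairs ranked correctly.

    Ties between a bound and an unbound transcript count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: sequences and labels are made up for illustration.
MOTIF = "UGUA"
transcripts = [
    "AAUGUAAAUGUACCUGUA",
    "CCUGUAGGGUGUAAA",
    "GGGUGUACCC",
    "ACGACGACG",
    "UGUAUGUA",
    "CCCCCC",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = bound by the RBP, 0 = not bound

# The #TS score: number of motif matches per transcript.
ts_counts = [count_target_sites(t, MOTIF) for t in transcripts]

# Example models in which each extra target site raises the binding score.
# Each is a strictly increasing function of the count (assumed forms).
alternative_models = {
    "noisy_or_p0.3": lambda n: 1.0 - (1.0 - 0.3) ** n,
    "log1p": lambda n: math.log1p(n),
    "quadratic": lambda n: n * n + n,
}

baseline = auroc(ts_counts, labels)
alternatives = {
    name: auroc([f(n) for n in ts_counts], labels)
    for name, f in alternative_models.items()
}

print("#TS AUROC:", baseline)
for name, value in alternatives.items():
    print(name, "AUROC:", value)
```

Because each alternative preserves both the ordering and the ties of the counts, every AUROC printed by this sketch equals the #TS value. A model that weighted sites by position or affinity would fall outside this class and could rank transcripts differently.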
The advantage of making these two assumptions is that they ensure that the #TS model ranks transcripts in the same order (and thus has the same AUROC) as any sequence-based model in which adding another target site increases the likelihood that an RBP will bind a transcript. This class covers a large number of other sensible sequence-based models. In contrast, #ATS represents only one possible way of combining the accessibilities of multiple target sites, so its AUROC is a lower bound on the best possible performance of a target site accessibility-based model (e.g., one based on the probability that at least one target site will be accessible [1]).

Supplementary References

1. Hackermuller, J., Meisner, N.C., Auer, M., Jaritz, M. & Stadler, P.F. The effect of RNA secondary structures on RNA-ligand binding and the modifier RNA mechanism: a quantitative model. Gene 345, 3-12 (2005).
2. Hogan, D.J., Riordan, D.P., Gerber, A.P., Herschlag, D. & Brown, P.O. Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biol 6, e255 (2008).
3. Gerber, A.P., Luschnig, S., Krasnow, M.A., Brown, P.O. & Herschlag, D. Genome-wide identification of mRNAs associated with the translational regulator PUMILIO in Drosophila melanogaster. Proc Natl Acad Sci U S A 103, 4487-4492 (2006).
4. Tadros, W. et al. SMAUG is a major regulator of maternal mRNA destabilization in Drosophila and its translation is activated by the PAN GU kinase. Dev Cell 12, 143-155 (2007).
