Material and methods - Proceedings of the Royal Society B

Material and methods

(a) Choice of structures

Studying the relationship between robustness and evolvability requires the analysis of many RNA structures. This makes biological RNA sequences poor objects of analysis: although many biologically important RNA molecules are known, their sizes n are different (and thus incommensurable for my purpose). Conversely, for molecules of any given size n, an insufficient number of biological RNAs are known, with the possible exception of microRNA precursors, which have only simple hairpin structures. In addition, although it is straightforward to generate a sample of structures with different frequencies, it is very difficult to estimate the frequency of a given structure, unless the structure is very short. For these reasons, I here focus on structures generated by random sequences.

Sequences much shorter than n=30 nucleotides have a limited repertoire of structures, owing to the requirement that any loop that terminates a stem must have at least three unpaired nucleotides. In sequences much longer than that, even a large random sample leads only to unique structures; that is, structures of the same frequency. For instance, in a sample of 10^6 random sequences of length n=75 one typically finds only unique structures, that is, structures that occur only once in the sample. I thus focus on sequences of modest length (n=30), because they occupy a middle ground: they are structurally diverse, yet allow me to explore a broad range of structure frequencies with computationally feasible sample sizes.

I refrain from studying coarse-grained structures that contain only information about the number and the order of stems and loops, but not their size. Such coarse graining has been both necessary and highly successful in some analyses (Fontana & Schuster 1998b). However, biological properties of RNA molecules may depend on the sizes of stems and loops, and coarse graining does not reflect the full diversity of structures.
(b) RNA structure determination

I used the Vienna RNA package (http://www.tbi.univie.ac.at/ivo/RNA, Hofacker et al. 1994) for all analyses. Specifically, I determined the minimum free energy (MFE) structure of a sequence using the package's routine fold (with default parameters). To determine the mutational robustness R_G of a sequence G of length n, I generated all 3n of its mutational neighbours (the sequences that differ from G in a single nucleotide) and determined the number R_G of neighbours with the same MFE structure P as G. I define r_G as the fraction of neighbours that have the same MFE structure as G, i.e. r_G = R_G/(3n). The number of G's neighbours that adopt a structure different from P is then, by definition, equal to 3n(1 - r_G). Some of these 3n(1 - r_G) structures are identical to another structure in the same neighbourhood. I determined the number U of unique structures that are different from one another in the neighbourhood of G by counting all structures in the neighbourhood, but counting structures that occurred twice or more in this neighbourhood only once. By definition, E_G = U. I define the fraction of unique structures in a neighbourhood as e_G = U/(3n), and the proportion u_d of unique structures among all structures different from P in the neighbourhood as u_d = U/(3n(1 - r_G)).

Using the number of unique structures in a 1-neighbourhood of G as a measure of evolvability raises a concern. If the distribution of the number of sequences adopting these structures were extremely skewed, any given number of non-neutral mutations starting from G might produce only the structures adopted by most sequences, and not nearly all structures that actually occur in the neighbourhood. However, this concern is unfounded, because the expected number of different structures produced from G via a given number of non-neutral mutations is highly correlated with the total number of unique structures in the neighbourhood (results not shown).
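The neighbourhood quantities defined above can be illustrated with a minimal sketch. It is not the code used for the analyses. The names one_mutant_neighbours and neighbourhood_stats are hypothetical, and the fold argument is a placeholder for any function that maps a sequence to a structure string (for instance, a wrapper around the fold routine of the Vienna RNA package). The sketch also assumes that U counts only structures different from P, which is how I read the definition of u_d.

```python
from typing import Callable, Dict, Iterator

ALPHABET = "ACGU"


def one_mutant_neighbours(seq: str) -> Iterator[str]:
    """Yield the 3n sequences that differ from seq at exactly one position."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def neighbourhood_stats(seq: str, fold: Callable[[str], str]) -> Dict[str, float]:
    """Compute R_G, r_G, E_G (= U), e_G and u_d for one sequence.

    fold is a placeholder: any function returning a structure string.
    Assumption: U counts unique structures that differ from P.
    """
    n = len(seq)
    p = fold(seq)
    structures = [fold(m) for m in one_mutant_neighbours(seq)]
    R = sum(s == p for s in structures)          # neutral neighbours
    U = len({s for s in structures if s != p})   # unique non-P structures
    non_neutral = 3 * n - R                      # equals 3n(1 - r_G)
    return {
        "R_G": R,
        "r_G": R / (3 * n),
        "E_G": U,
        "e_G": U / (3 * n),
        "u_d": U / non_neutral if non_neutral else 0.0,
    }
```

Passing the folding function as an argument keeps the bookkeeping separate from the structure prediction itself, so the same counting can be checked on a toy folding rule before it is run with a real one.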
Note that phenotypic robustness and evolvability harbour an important asymmetry that does not permit interchangeable use of absolute numbers and fractions in defining them. Briefly, there is only one way for two structures to be the same, but many ways for them to be different. Put differently, while it is appropriate to calculate the average number of neutral neighbours for sequences with a given phenotype to determine robustness, it would not be appropriate to calculate the average number of unique different structures in the neighbourhood of these sequences to determine evolvability. The reason is that the neighbourhoods of two different sequences generally harbour very different structures (Huynen 1996; Sumedha et al. 2007), such that one needs to add the unique structures encountered in different neighbourhoods.

To sample sequences from a neutral network, I used the routine inverse_fold, which creates sequences folding into a given MFE structure, using a guided random walk through sequence space that begins with a randomly chosen sequence. Past work (Schuster et al. 1994; Sumedha et al. 2007) has shown that inverse_fold effectively samples the space of sequences folding into a given structure at random. The routine occasionally fails to arrive at a sequence folding into a given structure. Because of computational limitations, the maximum number of such unsuccessful inverse foldings for every sequence to be sampled from a neutral network was limited to 10. After 10 unsuccessful inverse foldings, the structure was not considered further in any given analysis.

(c) Random walk towards a target structure

To determine the association between robustness and evolvability, when evolvability is defined as the length of a random walk from a starting sequence with a given structure S to a target sequence with a given structure T, I pursued the following approach.
I started with a set of 10^6 randomly generated sequences (n=30) and eliminated from this set those sequences that did not fold into an MFE structure with at least one base pair, i.e. those whose MFE structure left all bases unpaired. I ranked the structures of the remaining 9.4x10^5 sequences according to their estimated frequency, i.e. the number of times they occurred in the set of structures. This yielded 1.5x10^5 different structures with frequencies ranging over three orders of magnitude from less than 1.06x10^-6 (structures found only once) to 1.7x10^-3 (structures found 1700 times). Of the 1.5x10^5 structures, 7.9x10^4 had the lowest frequency; that is, they occurred only once in the sample of 10^6 sequences. I note that the frequency of the rarest structures, those that occur only once or a few times in a sample of a given size, cannot be estimated with any accuracy from the sample. A rare structure's actual frequency will often be substantially lower than the frequency estimated from the number of occurrences. (If the structure's actual frequency were substantially higher, then the structure would occur more often in the sample.) This means that the range of structure frequencies sampled here is even broader than that indicated by the above numbers.

For reasons of computational tractability, I used only the first 75 000 structures from the ranked structure list in further analyses. This subset contains structures of all observed numbers of occurrences, including more than 3000 structures of the lowest frequency. From these 75 000 structures, I then randomly sampled pairs of structures (without replacement), where the first member of the pair was designated as the starting structure S and the second member as the target structure T. I recorded the Hamming distance between S and T in their dot-parenthesis representation, their base pair distance (Hofacker et al. 1994), as well as the estimated frequencies of both S and T. I then determined a sequence G folding into S, using inverse_fold.
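The frequency ranking and the Hamming distance between dot-parenthesis strings can be sketched as follows. This is an illustration under stated assumptions, not the original code: the function names rank_structures and hamming are hypothetical, the input is assumed to be a list of already folded structures, and frequencies are computed relative to the sample that remains after unstructured sequences are removed.

```python
from collections import Counter
from typing import List, Tuple


def rank_structures(structures: List[str]) -> List[Tuple[str, int, float]]:
    """Rank structures by how often they occur in a sample.

    Structures consisting only of '.' (all bases unpaired) are dropped.
    Assumption: frequency = count / size of the retained sample.
    Returns (structure, count, estimated frequency), most frequent first.
    """
    kept = [s for s in structures if set(s) != {"."}]
    total = len(kept)
    counts = Counter(kept)
    return [(s, c, c / total) for s, c in counts.most_common()]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, the structures ((...)) and (.....) differ at two positions of their dot-parenthesis strings, so their Hamming distance is 2.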
From this sequence, I initiated a random walk in sequence space that was biased towards T as follows. I changed an arbitrary nucleotide in G to arrive at a sequence G' and determined its MFE structure S'. If d(S', T) ≤ d(S, T), where d(., .) denotes the Hamming distance, then the mutated sequence was kept and a new sequence G'' was generated from it to continue the random walk. If this condition was not met, that is, if d(S', T) > d(S, T), then G' was discarded and a new mutant of G, G'', was generated, until the condition was met; at that time the mutated sequence G'' was kept and a new sequence G''' was generated in the same vein from G'' to continue the random walk. This process was repeated until a sequence was reached that had the MFE structure T, or until 10^6 steps in the random walk had been carried out. In the latter case, the walk was considered unsuccessful and a new randomly chosen structure pair was considered. For random walks that successfully reached T, the length of the walk, as well as the number of different structures that the random walking sequence visited between S and T, was recorded. Note that the random walk as defined here allows a sequence to drift on a neutral network until it reaches a sequence whose structure is equally distant from or closer to T.

References

Fontana, W. & Schuster, P. 1998b Shaping space: the possible and the attainable in RNA genotype-phenotype mapping. J. Theor. Biol. 194, 491-515. (doi:10.1006/jtbi.1998.0771)
Hofacker, I., Fontana, W., Stadler, P., Bonhoeffer, L., Tacker, M. & Schuster, P. 1994 Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 125, 167-188. (doi:10.1007/BF00818163)
Huynen, M. A. 1996 Exploring phenotype space through neutral evolution. J. Mol. Evol. 43, 165-169. (doi:10.1007/BF02338823)
Sumedha, Martin, O. & Wagner, A. 2007 New structural variation in evolutionary searches of RNA neutral networks. Biosystems 90, 475-485. (doi:10.1016/j.biosystems.2006.11.007)
Schuster, P., Fontana, W., Stadler, P. & Hofacker, I. 1994 From sequences to shapes and back - a case-study in RNA secondary structures. Proc. R. Soc. B 255, 279-284. (doi:10.1098/rspb.1994.0040)
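As an illustration of the biased random walk of section (c), the acceptance rule can be sketched as below. This is a minimal sketch under stated assumptions, not the original implementation. The name biased_walk is hypothetical, and fold is a placeholder for any function that returns a dot-parenthesis string of the same length as the sequence. The text does not say whether discarded mutants count towards the limit of 10^6 steps; the sketch caps the number of attempted mutations and reports the number of accepted mutations as the walk length. It also counts S and T among the visited structures.

```python
import random
from typing import Callable, Optional, Tuple

ALPHABET = "ACGU"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def biased_walk(
    start_seq: str,
    target: str,
    fold: Callable[[str], str],
    max_attempts: int = 10**6,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, int]]:
    """Random walk biased towards the target structure.

    A point mutant is kept if its structure is no farther from the target
    (in Hamming distance) than the current structure; otherwise it is
    discarded and another mutant of the current sequence is tried.
    Returns (accepted steps, distinct structures visited), or None if the
    target was not reached within max_attempts attempted mutations.
    """
    rng = rng or random.Random()
    seq, struct = start_seq, fold(start_seq)
    visited = {struct}          # assumption: includes start and target
    accepted = 0
    attempts = 0
    while struct != target and attempts < max_attempts:
        attempts += 1
        i = rng.randrange(len(seq))
        base = rng.choice([b for b in ALPHABET if b != seq[i]])
        mutant = seq[:i] + base + seq[i + 1:]
        mutant_struct = fold(mutant)
        if hamming(mutant_struct, target) <= hamming(struct, target):
            seq, struct = mutant, mutant_struct
            visited.add(struct)
            accepted += 1
    if struct != target:
        return None
    return accepted, len(visited)
```

Because mutants at equal distance are accepted, the sketch reproduces the neutral drift noted above: the sequence can change without the structure moving closer to T.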
