From Sequences to Shapes and Back: A Case Study in... Author(s): Peter Schuster, Walter Fontana, Peter F. Stadler and Ivo...

advertisement
From Sequences to Shapes and Back: A Case Study in RNA Secondary Structures
Author(s): Peter Schuster, Walter Fontana, Peter F. Stadler and Ivo L. Hofacker
Source: Proceedings: Biological Sciences, Vol. 255, No. 1344 (Mar. 22, 1994), pp. 279-284
Published by: The Royal Society
Stable URL: http://www.jstor.org/stable/49949 .
Accessed: 11/09/2014 16:50
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp
.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
.
The Royal Society is collaborating with JSTOR to digitize, preserve and extend access to Proceedings:
Biological Sciences.
http://www.jstor.org
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
Fromsequencesto shapesand back: a case studyin
RNA secondarystructures
PETER SCHUSTER1' 23, WALTER
AND IVO L. HOFACKER2
FONTANA3, PETER
F. STADLER2
3
1
11, PF 100813,D-07708 Jena, Germany
Beutenbergstrasse
fir MolekulareBiotechnologie,
Institut
Wien,Austria
Chemie,Universitdt
Theoretische
Institutfuir
Santa Fe, U.S.A.
3Santa Fe Institute,
2
SUMMARY
RNA foldingis viewed here as a map assigning secondary structuresto sequences. At fixed chain length
the number of sequences far exceeds the number of structures.Frequencies of structuresare highly nonuniformand followa generalized formofZipf's law: we findrelativelyfewcommon and many rare ones.
By using an algorithm for inverse folding, we show that sequences sharing the same structure are
distributed randomly over sequence space. All common structurescan be accessed from an arbitrary
sequence by a number ofmutationsmuch smaller than the chain length.The sequence space is percolated
by extensiveneutral networksconnectingnearestneighboursfoldinginto identical structures.Implications
for evolutionary adaptation and for applied molecular evolution are evident: finding a particular
structureby mutation and selection is much simpler than expected and, even if catalytic activityshould
turn out to be sparse in the space of RNA structures,it can hardly be missed by evolutionary processes.
1. INTRODUCTION
Folding sequences into structuresis a central problem
in biopolymer research. Both robustness and accessibility of structures,as functionsof mutational change
in the underlyingsequence, are crucial to both natural
and applied molecular evolution. Test-tube evolution
experimentsare based on propertiesofRNA molecules:
as sequences they are genotypes, and as spatial
structures they are phenotypes (Spiegelman 1971;
Biebricher 1983). Our concern is the mapping from
RNA sequences into structuresbeing the simplest,and
the only tractable, example of a genotype-phenotype
mapping.
An RNA sequence is a point in the space of all 4n
sequences with fixedlength n. This space has a natural
metric induced by point mutations interconverting
sequences known as the Hamming distance (Hamming
1950, 1986). The foldingprocess considered here maps
an RNA sequence into a secondary structure (figure
1a) minimizing free energy. A secondary structureis
tantamount to a list of Watson-Crick type and GU
base pairs, and can be represented as a tree graph
(figure 1b). This emphasizes the combinatorial nature
of secondary structures and allows for a canonical
distance measure between structures (Tai 1979).
Assuming elementaryedit operations with pre-defined
costs, such as deletion, insertion and relabelling of
nodes, the distance between two trees is given by the
smallest sum of the edit costs along any path that
converts one tree into the other (Sankoff & Kruskal
1983).
Proc.R. Soc.Lood.B (1994) 255, 279-284
in GreatBritain
Printed
An approximate upper bound on the number of
minimum free-energystructures(of fixed chain length
n) can be obtained along the lines devised by Stein &
Waterman (1978). Counting only those planar secondary structures that contain hairpin loops of size
three or more (steric constraint), and that contain no
isolated base pairs (stacks of two or more pairs are
essentiallythe only stabilizing elements), one finds:
n
1.4848 x n
2(1.8488),
which is consistently smaller than the number of
sequences.
Folding can thus be viewed as a map between two
metricspaces of combinatorial complexity,a sequence
space and a shape space. (The notion of shape space
was originally used in theoretical immunology in a
similar context by Perelson & Oster (1979).) 'Shape'
refers to a discretized (and hence coarse-grained)
structurerepresentation,such as the secondary structures or the tree graphs used here. The notion of
secondary structureis but one among a spectrum of
possible levels of resolution that can be used to define
shape. It discards atomic coordinates, as well as the
relative spatial orientation of the structural elements,
takinginto account only theirnumber,size and relative
connectedness. Nevertheless, secondary structureis a
major component of whatever turns out to be an
adequate shape definition for RNA: it covers the
dominant part of the three-dimensional folding energies, very often it can be used successfullyin the
interpretation of function and reactivity, and it is
frequentlyconserved in evolution (Sankoffet al. 1978;
279
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
C) 1994 The Royal Society
280
structures
P. Schuster and others RNA secondary
(b)
(a)
5~~~~~~~~~5
0~~~~~
:::
,,,%<tXb>k)~~~
00i
on a sequenceis any patternofbase pairssuch thatno bases insidea loop pair
Figure 1. (a) A secondarystructure
elementsthatare: (i) base pairstacks;
can be uniquelydecomposedintostructural
tobasesoutsideit.Such a structure
in size (numberofunpairedbases) and branchingdegree,i.e. hairpinloops (degreeone), internal
(ii) loopsdiffering
loops (degreetwoor more); and (iii) bases whichare notpartofa stackor a loop, termedexternal(freelyrotating
additivelyto theoverallfreeenergyofthestructure
jointsand unpairedends). Each stackorloop elementcontributes
determinedparametersthatdepend on the nucleotidesequence.A minimumfree-energy
accordingto empirically
proposedbyZuker& Stiegler(1981) and Zuker& Sankoff(1984).
accordingto an algorithm
structure
is constructed
(b) A secondarystructuregraph (a) is equivalentto an orderedrootedtree.An internalnode (black) of the tree
to one unpairednucleotide,and theroot
to a base pair (twonucleotides),a leafnode (white)corresponds
corresponds
node (black square) is a virtualparentto the externalelements.Contiguousbase pair stackstranslateinto'ropes'
a treeby firstvisitingthe rootthen
ofinternalnodes,and loops appear as bushesofleaves. Recursivelytraversing
visitingitssubtreesin lefttorightorder,finallyvisitingtherootagain,assignsnumbersto thenodesin correspondence
thepairedpositions.)
to the 5'-3' positionsalong thesequence. (Internalnodesare assignedtwonumbersreflecting
Konings & Hogeweg 1989; Le & Zuker 1990),
sometimes together with a few tertiary interactions
(Cech 1988). This suggests that many of the relevant
intermolecular interactions that collectively set a
natural scale for shape are indeed stronglyinfluenced
by the secondary structure.The observation, then, is
that- at least in the present case - the shape space is
considerably smaller than the sequence space. (We
remark that this is also true for protein models on
lattices.)
2. FREQUENCIES
FOLDING
OF SHAPES
AND INVERSE
Frequencies of occurrence for individual shapes in
sequence space were obtained from large samples
derived by folding random sequences of fixed chain
length. Ranking according to decreasing frequencies
yields a distributionwhich obeys a generalized Zipf's
law (figure2). We are thus dealing with relativelyfew
common shapes and many rare ones. How are the
sequences which foldinto the same shape distributedin
sequence space? This distributionis evaluated with a
heuristicinverse folding procedure, aimed at devising
sequences that foldinto an arbitrarypre-definedtarget
shape (Hofacker etal. 1993). The obvious firststep is to
construct a compatible test sequence with nucleotide
assignments such that the target shape is indeed a
possible secondary structure,although typically not a
minimum free-energyone. We choose at random
among the many compatible sequences. The next step
is to decompose the minimum energystructureon the
chosen testsequence into substructures,and to mutate
by trial and error the corresponding subsequences.
When the individual substructuresare as in the target,
the entire sequence is reassembled. The procedure
stops if the reassembled sequence folds into the target
shape. This happens in about 50O ofthe cases. Several
sequences that foldinto the same structureare sampled
by starting the procedure with differentcompatible
sequences. The average number of mutations that
converted a random compatible sequence of chain
length n 100 into one with the desired target shape
was 7.2.
The resultingensemble ofcompatible sequences that
fold into a pre-definedtargethas been analysed forthe
target being the secondary structureof t-RNAPheand
forthreerandomly constructedexamples. In each case
about 1000 sequences were derived by the inverse
folding algorithm. The distribution of pairwise dis-
Proc.R. Soc.Lond.B (1994)
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
RNA secondary
structuresP. Schuster and others 281
10-1
10-2
,>1)
3
0.08
r
ty i
\
-~0.04
104X
10-4
0.00
10-7
1
10
1000
480~~~
100000
_______________
10-5
1
10
100
1000
10000
rank
Figure 2. The frequencydistributionof RNA secondary
structures.Shapes are ranked by their frequencies.The
particularexampleshownheredeals withtheloop structures
(Shapiro & Zhang 1990) of 105 RNA moleculesof chain
length100 whichare derivedfromsecondarystructures
by
further
coarsegrainingthateliminatesall detailsconcerning
stacklengthsand loop sizes.The diagramcovers97 % ofthe
totalfrequency.
The frequencies
followa generalizedformof
Zipf'slaw:f(x) = a(b+x)->, withx beingtherankofa shape
andf(x) itsfrequency.
Parametervalues ofthebestfit(thin
curve) are a = 1.25, b = 71.2 and c = 1.73. The frequency
distribution
offullsecondarystructures
is essentially
thesame
as shownin theinsertforchainlength30. Computationofthe
forlongerchainsis hardlypossibleas thenumber
distribution
ofstructures
exceedsbyfartheavailablecapacities(thereare
ofchain lengthn =
about 7 x 1023fullsecondarystructures
100).
tances is not distinguishablefromthe one expected for
random compatible sequences. The properties of the
sequence sample as seen by statisticalgeometry(Eigen
etal. 1988 a) and split-decomposition(Bandelt & Dress
1992) yield the same result: sequences foldinginto the
same structureare randomly distributedin the space of
compatible sequences.
3. STRUCTURE
DENSITY
80
*~60
E
40
20
0
20
40
60
distance
structure
Figure 3. (a) The structuredensitysurface(SDS) forRNA
sequences of length n = 100 (upper). This surface was
sequence
obtainedas follows:(i) choosea randomreference
and computeitsstructure;(ii) samplerandomlytendifferent
sequencesin each distanceclass (Hammingdistance1-100)
sequence,and bin the distancesbetween
fromthe reference
This procedure
structure.
theirstructures
and the reference
was repeatedfor 1000 random referencesequences. Convergenceis remarkablyfast; no substantialchanges were
sequences.
observedwhendoublingthenumberofreference
This procedureconditionsthe densitysurfaceto sequences
and does not,
withbase compositionpeaked at uniformity,
therefore,yield informationabout stronglybiased compositions.(b) Contourplot of theSDS.
0
SURFACES
Generalizing the previous question we ask how the
possible shapes are distributed over the possible
sequences. One insightis provided by considering the
probability density (Fontana etal. 1993 a, b) P(tlIh) of
two structuresbeing at (tree) distance t, given that the
underlying sequences are at (Hamming) distance h.
This structure density surface (SDS) shows how the
distribution of structure differenceschanges as the
sequences become more and more uncorrelated with
increasing Hamming distance from the reference
(figure3 presentsthe SDS forsequences of chain length
n= 100). Three observations are immediate: (i)
although forvery small Hamming distances (h = 1,2,
3) the most probable structuresare identical or very
similar, there is none the less some probability that
even a single mutation substantially alters the structure; (ii) beyond distance h = 3, identical or even
closely related structuresare extremelyunlikely; and
(iii) in the range 15 <Ah < 20, the density becomes
independent of A,thus approaching essentiallywhat is
expected for a sample of randomly drawn sequences
(A 75).
The latter suggeststhat the structuresof a reference
sequence and its mutants at distances between 15 and
20 or larger are effectivelyuncorrelated. This suggests
that memory of the referencestructureis sufficiently
lost to allow the mutants at that distance to acquire
any frequentminimum energystructure,at least in its
essential features.From the SDS the complete structure
autocorrelation functioncan be recovered (Fontana et
al. 1993 a). This function is to a reasonable approximation a single decaying exponential with a characteristiclength,I = 7.6 in the presentcase (chain length
n = 100). From figure3 it is seen that this corresponds
essentiallyto the distance at which the dominant peak
resultingfromidentical or very similar structureshas
disappeared.
4. SHAPE
SPACE
COVERING
Combination of the previous results (showing the
existence of relatively few common shapes which are
minimum free-energystructures for sequences randomly distributed in sequence space) with the in-
Proc. R. Soc. Lond. B (1994)
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
282
structures
P. Schuster and others RNA secondary
0.25
that two arbitrarilychosen bases of an RNA sequence
can forma base pair is given by the number of pairings
divided by the number of possible combinations of two
0.20
bases: 6/16 = 3/8 (as we have six classes of base pairs:
AU,
GC, GU and inversions). The mean number of
0.15
bases that have to be changed in a random sequence,
to forma sequence which is compatible with the target
0.10
structure (representingthe lower bound), is obtained
from the probability not to form a base pair by
0.05
multiplication with the mean number of base pairs:
(1 - 3) xnBp = 5iRBp. For RNA molecules of chain
0.00
0
20
80
100
40
60
length 100, the mean number of base pairs is 24.34
ofneutral
length
path,h
(Fontana et al. 1993 a), and we obtain a mean
Figure4. Neutralpaths.A neutralpathis definedby a series Hamming distance of 15.2 for the lower bound. From
of nearest neighboursequences that fold into identical
the probabilitydensityof the number of base pairs, we
structures.
Two classesof nearestneighboursare admitted:
derive a distributionof the lower bound also shown in
neighboursof Hammingdistance1, whichare obtainedby
figure 4 (open diamonds). The characteristic neighof thestructure,
singlebase exchangesin unpairedstretches
bourhood has a radius of 15 < he < 20.
and neighboursofHammingdistance2, resulting
frombase
pair exchanges in stacks. Two probabilitydensitiesof
Hamming distances are shown that were obtained by
searchingforneutralpathsin sequencespace: (i) an upper
5. NEUTRAL PATHS THROUGH
SEQUENCE
bound fortheclosestapproachoftrialand targetsequences
SPACE
(open circles) obtained as endpoints of neutral paths
approachingthe targetfroma randomtrialsequence (185
The structure of the RNA shape space over the
targetsand 100 trialsforeach wereused); (ii) a lowerbound
sequence space is complemented by a second computer
forthe closestapproachof trialand targetsequences(open
experiment. We search for neutral paths with monodiamonds) derived from secondary structurestatistics tonouslyincreasingdistance froma referencesequence.
(Fontana etal. 1993a; see thispaper, ?4); and (iii) longest A neutral path ends when no sequence that formsthe
distances between the referenceand the endpoints of
same structureis found among the nearest neighbours.
monotonouslydivergingneutralpaths (filledcircles) (500
The probability densityof the lengthsof these paths is
reference
sequenceswereused).
shown in figure4 (filled circles). The vast extensionof
the networkof neutral paths came as a surprise: 21.7 00
formation from the SDS (showing the existence of a
of all paths percolate the entiresequence space and end
transitionfromlocal to global features)provides strong
in a sequence which has not a single base in common
evidence forthe existence of a neighbourhood (a highwith the reference.(The existence of extensive neutral
dimensional ball) around every random sequence that
networks meets a claim raised by Maynard-Smith
contains sequences whose structuresinclude almost all
(1970) forprotein spaces that are suitable for efficient
common shapes.
evolution.)
To verifythe prediction of a characteristic neighbourhood covering almost all common shapes we did a
computer experiment. A target sequence is chosen at
6. DISCUSSION
random. A second random sequence servesas an initial
trialsequence, and its structureas a referencestructure.
The existence of a ball with characteristic radius
Next we search for a nearest neighbour of the trial
around any random sequence within which almost all
sequence that foldsinto the referencestructurebut lies
common shapes are found (figure 5) is a robust
closer to the target. If such a sequence is found, it is
phenomenon of the mapping fromsequences into RNA
accepted as the new trial sequence, and the procedure
secondary structures. It depends on the ratio of
is repeated until no furtherapproach to the target is
sequences to structures.Changes in the base-pairing
possible. The final Hamming distance to the target is
alphabet, in particular the considerationofpure GC or
an upper bound for the minimum distance between
pure AU sequences, may cause minor alterations that
two sequences foldinginto the referencestructureand
can be interpretedby much smaller values of thisratio
the structureofthe target,respectively.The probability
as well as by differencesin the topology of sequence
density of this upper bound to the closest approach
space. Alphabet dependencies will be published elsedistances determined for RNA molecules of chain
where. The major featuresof the shape space structure
length 100 is shown in figure4 (open circles). It yields
depend on the generic properties of RNA folding, in
a mean value of 19.8. (It is remarkable that this value
particular on the non-local nature of base pairing, but
coincides with the critical Hamming distance at which
they are insensitiveto the empirical energyparameter
we observe the change fromlocal to global featuresin
sets used in folding algorithms, as well as to the
the SDS; analogous investigationson RNA molecules
distance measure between structuresused in the 5D5
with only GC or only AU base pairs have shown that
(essentially the same SDSS were obtained with a now
the precise agreement is not generally valid.)
superseded parameter set (Fontana et al. 1991), and
We can also compute a lower bound for the mean
also with a different structure distance measure
value of the closest approach distance. The probability
(Hogeweg & Hesper 1984; Huynen et al. 1993)).
Proc. R. Soc. Lond. B (1994)
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
RNA secondary
structures
P. Schusterand others 283
space
sequence
shapespace
Figure5. A sketchofthemappingfromsequencesintoRNA
as derivedhere.Anyrandomsequence
secondarystructures
is surroundedby a ball in sequence space which contains
The
sequencesfoldinginto (almost)all commonstructures.
radius of thisball is much smallerthan the dimensionof
sequencespace.
There are caveats to our approach.
1. We use a thermodynamic criterion for RNA
structureformation,not one that mimics the kineticsof
folding. This does not constitute a problem for short
sequences up to a fewhundred nucleotides. Moreover,
the number of possible kineticstructuresis not entirely
differentfrom the number of thermodynamic structures,the principles of base pairing are the same, and
thus the generic featuresof mappings of sequences into
kineticor thermodynamicstructureswill be essentially
the same, too.
2. We consider only a single minimum free-energy
structure for each sequence. Our approach can be
carried over to ensembles comprising optimal and
suboptimal foldings,as representedby partition functions (McCaskill 1990; Bonhoefferetal. 1993). 'Shape',
then, becomes a matrix of temperature-dependent
base-pairing probabilities, and the concept of distance
is changed accordingly. All qualitative featuresof the
SDS remain essentially unchanged, and numerical
correctionsare in the range of 100 (as, forexample, in
the case of correlationlengths (Bonhoefferetal. 1993)).
3. We do not consider three-dimensionalstructure.
Nevertheless, the secondary structure defines an informativescale of resolution. In addition, it constitutes
an approximation to a coarse-grained spatial structure
(current algorithmsfor the modelling of RNA threedimensional structuresstartfromsecondary structures,
and introduce a few tertiaryinteractions (Major et al.
1991, 1993)).
The consequences of our results for natural and
artificial selection are immediate. We predict that
thereis no need to search systematicallyhuge portions
of the sequence space. In the particular example of
RNA molecules of chain length 100, the characteristic
ball contains some 1027 sequences, which is only a
fractionof i0" of the entire sequence space. Almost
all structuresare within reach of a few mutations from
a compatible sequence (average: 7.2), and even in
reasonable proximityof any non-compatible random
sequence (~ 18). The conclusion is thus that optimization of structuresby evolutionary trial and error
strategies is much simpler than is often assumed. It
provides furthersupport to the idea of widespread
applicability of molecular evolution (Eigen 1971;
Eigen & Schuster 1979; Eigen etal. 1988 b, 1989). The
existence of networksof neutral paths percolating the
entire sequence space has strong implications for
(molecular) evolution in nature, as well as in the
laboratory. Populations replicating with sufficiently
high error rates will readily spread along these
networks and can reach more distant regions in
sequence space.
If one were to design the ultimate evolvable molecule
that carries informationand is engaged in functional
interactions,it would ideally require two features: (i)
capability of driftingacross sequence space without the
necessityof changing shape; and (ii) proximityto any
common shape everywhere. These are precisely the
features that statistically characterize the mapping
fromRNA sequence to secondary structure.
The workreportedhere was supportedfinanciallyby the
ForAustrianFonds zur Fbrderungder wissenschaftlichen
schung (Projects S 5305-PHY, P 9942-PHY, and P8526MOB), the European Economic Community(Research
StudyContractPSS*03960) and a National Science Foundation (Washington)core grantto the Santa Fe Institute
Leo Bussand Professor
(PHY-9021437). We thankProfessor
and Dipl.AndreasDress forcommentson the manuscript,
forsupplyingthedata forthe
Chem. ErichBornberg-Bauer
insertoffigure2.
REFERENCES
Bandelt, H.-J. & Dress, A. W. M. 1992 A canonical
theoryformetricson a finiteset.Adv.Math.
decomposition
92, 47-105.
Biebricher, C. K. 1983 Darwinian selection of selfreplicatingRNA molecules.Evol.Biol. 16, 1-52.
S., McCaskill,J.S., Stadler,P. F. & Schuster,P.
Bonhoeffer,
landscapes.A studybased on
1993 RNA multi-structure
dependentpartitionfunctions.Eur. Biophys.
temperature
J. 22, 13-24.
of
Cech, T. R. 1988 Conservedsequencesand structures
groupI introns:Buildingan activesiteforRNA catalysis
- a review. Gene73, 259-271.
ofmatterand theevolution
Eigen,M. 1971 Selforganization
58,
of biological macromolecules. Naturwissenschaften
465-523.
- a principleof
Eigen, M. & Schuster, P. 1979 The hypercycle
Berlin: Springer-Verlag.
naturalself-organization.
R. & Dress, A. 1988a
Eigen, M., Winkler-Oswatitsch,
Statisticalgeometryin sequence space: A method of
quantitativecomparativesequence analysis. Proc. natn.
Acad. Sci. U.S.A. 85, 5913-5917.
Eigen, M., McCaskill,J. & Schuster,P. 1988b Molecular
quasi-species.J. phys.Chem.92, 6881-6891.
Eigen,M., McCaskill,J.& Schuster,P. 1989 The molecular
Phys.75, 149-263.
quasi-species.Adv.chem.
T., Schnabl,W., Stadler,P. F. &
Fontana,W., Griesmacher,
Schuster,P. 1991 Statisticsof landscapesbased on free
energies,replicationand degradationrate constantsof
Mh. Chem.122, 795-819.
RNA secondarystructures.
Fontana,W., Konings,D. A. M., Stadler,P. F. & Schuster,
P. 1993a Statisticsof RNA secondarystructures.Bio33, 1389-1404.
polymers
E. G., GriesFontana, W., Stadler,P. F., Bornberg-Bauer,
Proc. R. Soc. Lond. B (1994)
Vol 255
20
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
B
284
structures
P. Schuster and others RNA secondary
macher,T., Hofacker,I. L., Tacker, M., Tarazona, P.,
Weinberger,E. D. & Schuster,P. 1993b RNA folding
and combinatory
landscapes.Phys.Rev.E 47, 2083-2099.
Hamming,R. W. 1950 Errordetectingand errorcorrecting
Maynard-Smith, J. 1970 Natural selection and the concept
of a protein space. Nature,Lond. 225, 563-564.
McCaskill, J. S. 1990 The equilibrium partition function
and base pair binding probabilities for RNA secondary
structures.Biopolymers
29, 1105-1119.
147-160.
codes. Bell Syst. tech.J. 29,
Perelson, A. S. & Oster, G. F. 1979 Theoretical studies on
theory,
2nd edn.
Hamming, R. W. 1986 Codingand information
clonal selection: minimal antibody repertoire size and
EnglewoodCliffs:PrenticeHall.
reliability of self-non-selfdiscrimination. J. theor.Biol. 81,
L.
Hofacker,I. L., Fontana,W., Stadler,P. F., Bonhoeffer,
645-670.
S., Tacker, M. & Schuster,P. 1993 Fast foldingand
Sankoff, D., Morin, A.-M. & Cedergren, R.J. 1978 The
Mh. Chem.(In
comparisonof RNA secondarystructures.
evolution of 5S RNA secondary structures.Can. J. Bzochem.
the press.)
56, 440-443.
Hogeweg,P. & Hesper,B. 1984 Energydirectedfoldingof
Sankoff,
D. & Kruskal, J. B. (ed.) 1983 Time warps,string
RNA sequences. Nucl. Acids Res. 12, 67-74.
the theoryand practiceof sequence
edits and macro-molecules.
Huynen,M. A., Konings,D. A. M. & Hogeweg, P. 1993
London: Addison Wesley.
comparisons.
propertiesofRNA
Multiplecodingand the evolutionary
Shapiro, B. A. & Zhang, K. 1990 Comparing multiple
Biol. 165, 251-267.
J. theor.
secondarystructure.
RNA secondary structures using tree comparisons.
Konings,D. A. M. & Hogeweg,P. 1989 Patternanalysisof
Computer
Appl. Biosci. 6, 309-318.
RNA secondarystructure.Similarityand consensusof
Spiegelman, S. 1971 An approach to the experimental
minimal-energy
folding.J. molec.Biol. 207, 597-614.
analysis of precellular evolution. Q. Rev. Biophys. 4,
Le, S.-Y. & Zuker, M. 1990 Common structuresof the
213-253.
and rhinoviruses.
5' non-codingRNA in enteroviruses
Stein, P. R. & Waterman, M. S. 1978 On some new
Thermodynamicalstabilityand statisticalsignificance.
sequences generalizing the Catalan and Motzkin numbers.
J. molec.Biol. 216, 729-741.
DiscreteMaths 26, 261-272.
Major, F., Turcotte, M., Gautheret,D., Lapalme, G.,
Tai, K. 1979 The tree-to-treecorrection problem. J. Ass.
Fillion,E. & Cedergren,R. 1991 The combinationof
Comput.Mach. 26, 422-433.
symbolic and numerical computation for three- Zuker, M. & Sankoff,D. 1984 RNA secondary structures
dimensional modeling of RNA. Science,Wash. 253,
and their prediction. Bull. Math. Biol. 46, 591-621.
1255-1260.
Zuker, M. & Stiegler, P. 1981 Optimal computer foldingof
Major, F., Gautheret,D. & Cedergren,R. 1993 Relarge RNA sequences using thermodynamicand auxiliary
structureof a tRNA
producingthe three-dimensional
information.Nucl. AczdsRes. 9, 133-148.
Proc.natn.Acad.Sci.
moleculefromstructuralconstraints.
U.S.A. 90, 9408-9412.
Received
25 November
23 December
1993
1993; accepted
Proc.R. Soc.Lond.B (1994)
This content downloaded from 142.58.160.201 on Thu, 11 Sep 2014 16:50:08 PM
All use subject to JSTOR Terms and Conditions
Download