An Overview of the Introns-First Theory

J Mol Evol
DOI 10.1007/s00239-009-9279-5
David Penny · Marc P. Hoeppner · Anthony M. Poole · Daniel C. Jeffares
Received: 13 August 2009 / Accepted: 8 September 2009
© Springer Science+Business Media, LLC 2009
Abstract We review the introns-first hypothesis a decade after it was first proposed. It is that exons emerged from non-coding regions interspersed between RNA genes in an early RNA world, and is a subcomponent of a more general ‘RNA-continuity’ hypothesis. The latter is that some RNA-based systems, especially in RNA processing, are ‘relics’ that can be traced back either to the RNA world that preceded both DNA and encoded protein synthesis or to the later ribonucleoprotein (RNP) world (before DNA took over the main coding role). RNA-continuity is based on independent evidence (in particular, the relative inefficiency of RNA catalysis compared with protein catalysis) and leads to a wide range of predictions about the origins of the ribosome, the spliceosome, small nucleolar RNAs, RNases P and MRP, and mRNA; it is also consistent with the wide involvement of RNA processing and regulation of RNA in modern eukaryotes. While there may still be cause to withhold judgement on intron origins, there is strong evidence against introns being uncommon in the last eukaryotic common ancestor (LECA), and expanding only within extant eukaryotic groups (the ‘very-late’ intron invasion model). Similarly, it is clear that there are selective forces on numbers and positions of introns; their existence may not always be neutral. There is still a range of viable alternatives, including introns first, early, and ‘latish’ (i.e. well established in LECA), and regardless of which is ultimately correct, it pays to separate out various questions and to focus on testing the predictions of sub-theories.

Keywords Introns · RNA world · Eukaryote origins · RNP world · Spliceosome · Introns early

D. Penny (✉)
Allan Wilson Center, Massey University, Palmerston North, New Zealand
e-mail: d.penny@massey.ac.nz

M. P. Hoeppner · A. M. Poole
Department of Molecular Biology and Functional Genomics, Stockholm University, 106 91 Stockholm, Sweden

A. M. Poole
School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand

D. C. Jeffares
Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK

Introduction
The introns-first theory was published just over a decade ago (Poole et al. 1998; Jeffares et al. 1998), and aimed to account for the origin of mRNA within an evolutionary framework for the origin of genetically encoded protein synthesis in the late stages of the RNA world (see Fig. 1 for a summary). There are three aspects to this hypothesis: that mRNA arose by co-option of expressed non-functional RNA, that the co-opted RNAs were interspersed between functional RNA genes, and that the core of the spliceosome, some extant genes, and their introns may be relics from this very early period. The last of these is directly testable in principle. Introns-first is thus part of a much wider analysis of the expectations of the continuity of RNA systems from the RNA and ribonucleoprotein (RNP) worlds to modern organisms. This more general RNA-continuity theory (Fig. 2a, see Penny and Collins 2009) is that many classes of RNA in modern eukaryotes have existed since these earlier phases; they are in our terminology ‘relics’, though of course the associated proteins would only have arisen after encoded protein synthesis developed. Thus, the introns-first hypothesis is an independent sub-hypothesis of this more general model, and the introns-first model needs to be evaluated independently; for example, the RNA-continuity hypothesis could stand even if the introns-first model were eventually rejected. We will see that the introns-first model shifts the focus onto eukaryotes, and if it could be established that eukaryotes were indeed formed de novo from an archaeal and a bacterial cell (see Embley and Martin 2006 for different models) then that would be very strong evidence against either introns first or early. Figure 2b shows the contrast between the RNA-continuity model and the more common idea that an early complexity of RNA control mechanisms from the RNP world was lost in prokaryotes (archaea and bacteria) and reinvented in eukaryotes.

The last decade has seen a major expansion in our knowledge of the roles of RNA in eukaryotes and has expanded the classes of RNA that are known and the questions that need to be addressed. An early focus was on ubiquitous RNAs with a processing function (e.g. rRNA, tRNA, snRNA, small nucleolar RNA [snoRNA], srpRNA, RNase P, and RNase MRP) and the exon/intron structure of eukaryote genes (see Gilbert 1987; Cavalier-Smith 2002; Rodríguez-Trelles et al. 2006; Di Giulio 2008a, b, and references therein), but the finding of the widespread and complex roles of RNA in eukaryote cells (including RNAi) has broadened the discussion to the extent we now refer to the ‘RNA infrastructure’ of the eukaryote cell (Collins and Penny 2009). In particular, a range of regulatory RNAs have been identified in the last decade and include the many classes of small RNA involved in RNAi, such as miRNA, siRNA and piRNA, as well as their role in epigenetics (Carthew and Sontheimer 2009).

Fig. 1 The main components of the introns-first model. Introns and intron splicing arose in an RNA world organism where both the genome and the enzymes are composed of RNA. The double-stranded RNA genome (at top) contained RNA genes (filled boxes), interspersed with sequences that are non-RNA coding (open boxes). Transcription produces single pre-processed transcripts. These transcripts are then processed (spliced) to produce mature functional RNAs (such as snoRNAs). Non-functional RNAs are also produced as byproducts of this processing. Some such byproducts were subsequently recruited to non-templated protein synthesis as a means to stabilize the interaction between two charged tRNAs during non-genetically encoded peptide synthesis (by pairing with what subsequently became the ‘anticodon’ loop). This model for the origin of mRNA suggests co-evolution of the genetic code and these earliest transcripts. The introns-first hypothesis proposed that the first proteins were initially selected for propensity to stabilize functional RNA and were not catalytic. Hence, introns are derived from RNA genes, and these were present prior to the evolution of protein-coding segments (exons)

Introns First, Early, Late, and the RNA-Continuity Hypothesis
A basic question is the extent to which the RNA infrastructure arose de novo in eukaryotes, or whether there is continuity of many classes of RNA, including those restricted to
eukaryotes, from the later stages of the origin of life through
to the present. Most researchers appear to consider eukaryotes as ‘advanced’ cells that must in some way be derived
from ‘primitive’ prokaryotes. Although this is definitely
possible, we think that under the three domains of life view,
the data is currently insufficient to exclude the alternative
that eukaryotes have remained relatively inefficient in, for
example, their processing of mRNA. In contrast, we could
consider bacteria and archaea as having evolved a very fast
and efficient mRNA processing system—whether from
thermoreduction (Forterre 1995), r-selection (Poole et al.
1998), efficient selection in large populations (Lynch 2007),
chronic energy stress (Valentine 2007), or other reasons.
Currently, it is important to remain open-minded about different interpretations of eukaryote origins, and focus on
using the available data to test the different models. It is
unhelpful to prematurely decide on just one model.
The first step is to outline the timing of the different hypotheses. For the origin of spliceosomal introns, a common distinction is between scenarios that consider introns as a late addition via horizontal transfer, or a remnant of the RNA or RNP worlds. There are plausible, detailed models for the late origin and spread of the spliceosome and spliceosomal introns in eukaryotes, these having derived in a stepwise manner from group II introns (Hickey 1992; Stoltzfus 1999); this is introns-late. An origin for group II introns in the RNA world has likewise been proposed, a form of the introns-early model (Gilbert and de Souza 1999) distinct from the original exon theory of genes. Moreover, the possibility of a rudimentary spliceosomal apparatus dating back to the RNA world has at times been advocated by several authors (Reanney 1979; Darnell and Doolittle 1986; Poole et al. 1998). As shown in Fig. 3, a continuum of hypotheses is possible, and the figure emphasizes that there are no hard and fast boundaries between introns first/early and introns early/late. The divisions arise naturally, and allow for an early biochemical phase during the origin of life (e.g. Martin and Russell 2003). It is generally assumed (e.g. Lincoln and Joyce 2009; Cech 2009; Sharp 2009; Penny 2005) that RNA preceded encoded protein synthesis, and that this RNA–protein world preceded DNA being used as the main information storage macromolecule. This gives a natural tripartite division into introns-first (some introns arose in the RNA world); introns-early (introns date from the RNP world); and introns-late (introns post-date DNA as the main coding macromolecule, the DNA world).

Fig. 2 Two models for the origin of the high RNA complexity in eukaryotes. a Under the RNA-continuity model, the basic system of RNA processing of RNA in modern eukaryotes evolved in an earlier ribonucleoprotein stage of the origin of life (an RNP world). The model involves streamlining of RNA processing separately in bacteria and archaea, with the latter having retained some snoRNAs. There would be continued evolution (including expansion) of the RNA infrastructure in eukaryotes. b Under the RNA re-expansion model, there may have been the same early complex system of RNA processing of RNA, but this was largely lost via streamlining or replacement before LUCA. This subsequently re-expanded in eukaryotes. This model has one loss and one gain. The order of branching of archaea, bacteria, and eukaryotes is deliberately ambiguous; branching is independent of the models, and alternative topologies are consistent with either model. (Modified from Penny and Collins (2009).)

Thus, we use the division between the RNA world, the RNP world, and the DNA worlds as our primary distinction, though agreeing that rigid distinctions are overly simplistic. For example, as illustrated in Fig. 3a, there will be overlaps between first/early and early/late. Again, the intron/exon structure could, in principle, have arisen in a common ancestor of eukaryotes and archaea (the introns-middle model, Fig. 3b), and whether this was pre- or post-DNA depends on when DNA took over the main coding role (see Forterre and Gribaldo 2007).

Introns-early is basically that introns arose around the time of the origin of protein synthesis. It suggests that they aided the rapid diversification of early proteins by allowing recombination of smaller sections (possibly functional modules) of proteins. Such new recombinants may have had a significant selective advantage, and thus introns would have been selected in association with the new recombinants (hitch-hiking). The standard introns-late theory is that the intron/exon structure of genes, and the associated spliceosome, arose well after protein coding and synthesis were established. The hypothesis is usually associated with prokaryotes being well established before eukaryotes arose. It divides naturally into two: the ‘introns latish’ version (Sverdlov et al. 2007), which would have introns expanding before the last common ancestor of eukaryotes (possibly after the endosymbiosis of the mitochondrial ancestor), and the ‘introns very late’ version, which would have eukaryote introns expanding only within extant eukaryotes. Here, there has been significant progress over the last decade and it is therefore appropriate to reconsider the situation.

Fig. 3 The introns first, early, and late models for the origin of spliceosomal introns, expressed in a linear form (a) or as a tree (b). The basic subdivisions are based on whether the intron–exon structure arose in the RNA world (introns first, before encoded protein synthesis), in the RNP world (introns early), or after the origin of DNA synthesis (introns late). The latter is subdivided into introns ‘latish’ (after DNA synthesis, but well before the last common ancestor of eukaryotes), and introns ‘very late’ (with spliceosomal introns spreading within modern eukaryotes). Gradation between the hypotheses is possible, especially between introns first and early. A range of options is shown for the origin of eukaryotes, from the left hand arrow with them being old (with relatively inefficient RNA processing) to the right hand arrow with eukaryotes arising late in evolution (often being inferred as symbiosis between an archaeon and a bacterium)

Although timing is the defining criterion of the introns-first theory, there is a much wider range of questions to be considered, and some already lead to testable predictions which can be evaluated with current knowledge. A general concern (Penny and Phillips 2004; Poole and Penny 2007) is that we tend to treat complex theories somewhat as ‘slogans’, and do not divide up a theory into its different components. These sub-theories (components) are often testable independently. So, here, we split the ideas into those specific to introns-first, followed then by others that only apply to the more general RNA-continuity model. Obviously, the RNA-continuity model could eventually be established, even if the introns-first model was rejected.

Introns-First
a. Timing: introns per se date back to the RNA world.
b. Some original introns were functional RNAs.
c. The origin of introns pre-dates protein-coding mRNAs.
d. Continuity: at least some introns are still present from an RNA or RNP world; they have not all been lost, and then reappeared ‘de novo’ (but some would have evolved new functions).
e. Spliceosomal introns have been lost from bacteria and archaea by reductive evolution (this is also required by the introns-early model).
f. Primordial intron position is not correlated with protein structural modularity in eukaryote proteins.
g. Intron positions are dynamic, as opposed to always in the same position. In other words, they can (on an evolutionary time scale) be lost, gained, duplicated, or drift in position. (This contrasts with an earlier view that introns are fixed in their position.)

Other RNA-Continuity Aspects
h. Some other RNAs are relics from the RNA world (ribosomes and tRNA are the best known and clearest examples).
i. The spliceosome is a very large macromolecular RNP complex that must have evolved in a stepwise manner. Understanding its origins is crucial to understanding the origin of the current intron/exon gene structure.
j. Some processes may likewise be relics (e.g. RNA processing in tRNA, rRNA and mRNA maturation); it is an over-simplification to concentrate only on introns. A variety of other inter-linked processes should be evaluated (e.g. splicing, nonsense mediated decay, and other transcription-related processes, Collins and Penny 2009).

In the past, a major focus has been on the intron/exon
structure of eukaryote genes, but it is clear from the ten
points above that the RNA-continuity hypothesis is much
broader, and involves evaluation of the evolutionary
origins of all classes of non-coding RNAs. For example,
the nature of the ancestral spliceosome is a fundamental
question under introns first or early, and its origin is
often ignored under the standard introns-late hypothesis
(see, however, Stoltzfus 1999; Scofield and Lynch 2008;
Veretnik et al. 2009). Under the introns-late approach,
the favoured model is that the spliceosomal introns in
eukaryotes are derived from type II introns in bacteria;
this is certainly the current consensus—even though the
data are at best circumstantial (see later). Having introduced the ‘RNA-continuity’ concept, the next step is
establishing criteria for its evaluation.
General Principles
In this section, we will give some of the general scientific
principles required for evaluating the RNA-continuity
theory, and what criteria we expect such a scientific evolutionary theory to demonstrate. Perhaps there does need to
be more emphasis in biology on seeing what major principles can be derived from physical and chemical properties (De Nooijer et al. 2009). Specifying the principles
should make it easier to evaluate arguments for or against a
line of reasoning. Here, we consider seven main aspects:
catalytic efficiency, the error rate limitation on genome size
(the Eigen limit), the continuity of intermediate forms,
effects of population size, agreement from prior knowledge, reductive evolution, and no predetermined direction
of change.
Catalytic Rate
Proteins are generally better catalysts than RNA. The data in Table 1 for both turnover (kcat) and catalytic efficiency (kcat/Km) show that proteins are much faster than comparable ribozymes (RNA-based enzymes). This result leads to our
primary hypothesis (Jeffares et al. 1998) that once a protein
is carrying out a catalytic reaction, a ribozyme will not
displace it—we expect that RNA will never ‘take back’ a
catalytic role that proteins are already doing. This gives a
direction of change to macromolecular evolution; from
ribozymes to protein enzymes. The hypothesis is that a
reaction catalysed by a ribozyme was never catalysed in
that lineage by a protein. This direction of change is a
simple consequence of proteins being catalytically more
effective than ribozymes (though as discussed in Jeffares
et al. 1998, the selective pressure will be lower on ribozymes acting on macromolecular complexes where catalysis is limited by diffusion times), and RNA catalysis still
plays a major role in eukaryotes (Cech 2009). An excellent
example of the trend of proteins taking over a catalytic
function occurs in human mitochondria (and probably
other mammals) where RNase P is no longer a ribozyme
because it has lost its catalytic RNA component—it is now
a protein enzyme (Holzmann et al. 2008). This is a striking
example of support for a prediction about the direction of
change.
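
To make the size of the gap in Table 1 concrete, the following Python sketch compares the kcat values listed there for the RNA-based catalysts with those for the protein catalysts. The split of the values into two lists, the variable and function names, and the use of a geometric mean as the summary statistic are assumptions made here for illustration; they are not part of the original analysis.

```python
from statistics import geometric_mean

# kcat values (min^-1) as listed in Table 1.
# Assumption: the six smallest values belong to the RNA-based catalysts
# (this group includes the RNase P RNA-plus-protein complex) and the six
# largest to the protein catalysts marked "b" in the table.
rna_based_kcat = [0.1, 0.3, 0.5, 1.7, 1.0, 2.0]
protein_kcat = [5_700, 5_700, 25_000, 258_000, 780_000, 600_000_000]


def typical_fold_difference(proteins, rna_based):
    """Ratio of geometric means: a typical protein vs a typical ribozyme."""
    return geometric_mean(proteins) / geometric_mean(rna_based)


def minimum_fold_difference(proteins, rna_based):
    """Ratio of the slowest protein to the fastest RNA-based catalyst."""
    return min(proteins) / max(rna_based)


if __name__ == "__main__":
    print(f"typical fold difference: {typical_fold_difference(protein_kcat, rna_based_kcat):.3g}")
    print(f"minimum fold difference: {minimum_fold_difference(protein_kcat, rna_based_kcat):.3g}")
```

A geometric mean is used because the values span many orders of magnitude, so an arithmetic mean would be dominated by the single fastest enzyme. On these numbers even the slowest protein in the table turns over substrate thousands of times faster than the fastest RNA-based catalyst.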
Even though we predict the direction of catalysis to go
from ribozymes to proteins, this still allows diversification
of existing small RNAs into new roles, especially as guide
RNAs and regulation of expression. It is the complexity of
RNA processing RNA in eukaryotes that is so striking, and
especially the catalytic roles of RNA in RNA processing
(Valadkhan et al. 2009; Collins and Penny 2009).
Table 1 Turnover numbers for ribozymes and proteins

Catalyst                        kcat (min^-1)    kcat/Km (M^-1 min^-1)
Tetrahymena L-21 (SacI)         0.1              9.0 × 10^7
Polynucleotide kinase (a)       0.3              6.0 × 10^3
19-base virusoid                0.5              8.3 × 10^5
L-19 intron                     1.7              4.3 × 10^4
RNase P RNA                     1                2.0 × 10^6
RNase P RNA and protein         2                4.0 × 10^6
RNase T1 (b)                    5,700            1.1 × 10^8
Staphylococcal nuclease (b)     5,700            6.0 × 10^8
T4 polynucleotide kinase (b)    25,000           6.0 × 10^8
Triose-P isomerase (b)          258,000          1.4 × 10^10
Cyclophilin (b)                 780,000          9.0 × 10^8
Carbonic anhydrase (b)          600,000,000      7.2 × 10^9

Turnover number is kcat. Values largely from Jeffares et al. (1998); note that the units of time are in minutes
(a) Artificial ribozyme evolved in vitro
(b) Protein catalysts

Eigen Limit
One of the most profound discoveries from origin of life theory was the calculation that the higher error rate of copying in RNA-based systems places a very strong upper limit on the size of a genome. Above this limit, there are too many errors per replication for selection to maintain the optimal sequence (see Eigen 1992): there is an ‘error catastrophe’ and the sequence randomizes. The limit is especially severe if the genome is being copied by a ribozyme (Jeffares et al. 1998; Poole 2006). It also gives a strong selective force
towards protein involvement in catalysis (replication specifically) and what we call the Darwin–Eigen cycle (Poole
et al. 1999; Penny 2005). This is a positive feedback cycle
that favours increased fidelity, leading to longer coding
sequences, allowing additional genes to be coded for,
which allows for increased fidelity, and so on. Overall, the
reduced coding capacity in early living systems puts strong
limits on what genomes were possible, and it leads to the
expectation that recombination (as between RNA viruses)
would have been important in an RNA world (Reanney
1987; Jeffares et al. 1998; Lehman 2003; Santos et al.
2004).
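
The error threshold is often summarized by the approximation L_max ≈ ln(σ) / (1 − q), where q is the per-base copying fidelity and σ is the selective superiority of the best sequence over its competitors. This is the standard textbook approximation from quasispecies theory rather than a formula given in this paper, and the function name and the parameter values below are illustrative assumptions only. A minimal Python sketch:

```python
import math


def max_genome_length(per_base_fidelity: float, superiority: float) -> float:
    """Approximate error threshold, L_max ~ ln(sigma) / (1 - q).

    per_base_fidelity (q): probability that one base is copied correctly.
    superiority (sigma): replication advantage of the best sequence over
    the average of its competitors; must exceed 1 for selection to act.
    """
    if not 0.0 < per_base_fidelity < 1.0:
        raise ValueError("per_base_fidelity must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority must be greater than 1")
    return math.log(superiority) / (1.0 - per_base_fidelity)


if __name__ == "__main__":
    # Illustrative (assumed) fidelities, with sigma held at e so ln(sigma) = 1.
    for q in (0.99, 0.999, 0.9999):
        print(f"q = {q}: L_max is about {max_genome_length(q, math.e):.0f} bases")
```

With σ held fixed, a tenfold reduction in the per-base error rate allows a roughly tenfold longer genome to be maintained. That is the step the Darwin–Eigen cycle repeats: better fidelity permits more coding capacity, which can in turn encode better replication machinery.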
Continuity of Functional Intermediates
It is not possible under any form of Darwinian evolution
for a spliceosome to just ‘appear’ when it is needed, such
as immediately following an ‘invasion of the introns’.
Related to this is that there cannot be selection ‘for’
something that does not yet exist; for something that will
only be useful in the future. The origin of the ribosome is
one such example; under standard evolutionary theory,
ribosomal RNA must have had a prior function before
being co-opted into protein synthesis. We have suggested
that the first function for the proto-ribosome was as an
RNA-dependent RNA polymerase that added nucleotides
three at a time, thus improving replication fidelity (Poole
et al. 1998, 1999). We call such a hypothetical enzyme an
RNA triplicase. As an aside, this relies on the ‘genomic
tag’ hypothesis (Maizels and Weiner 1999; see also Fedorov and Fedorova 2004; Sun and Caetano-Anolles 2008),
which is a way of distinguishing genomic copies of RNA
from functional RNAs. The excision of such RNAs out of a
precursor transcript via action of a primordial spliceosome
(Poole et al. 1998, 1999) constitutes one plausible mechanism (Fig. 1). However, the model for its origin by Bokov
et al. (2009) appears not possible as published, because it
does not establish a continuous series of functional intermediates—though that problem could no doubt be fixed in
that case. It is standard in molecular evolution for a gene
selected for one function to be ‘recruited’ or ‘co-opted’ for
a related function.
In this context, plausible models must give a stepwise
origin of the spliceosome where the intermediate stages are
functional. The spliceosome is larger even than the ribosome, and has five small RNAs (U1, U2, U4, U5, and U6),
together with (in humans) up to 200 proteins (Jurica and
Moore 2003). While the evolution of the spliceosome
should be a focus of all theories for the origin of introns,
only a minority of authors seem to bother. One is Stoltzfus’
elegant and detailed model, under the introns-late hypothesis, wherein he describes a stepwise model for the emergence of the spliceosome from mitochondrially derived
group II self-splicing introns (Stoltzfus 1999); another is
the view that a rudimentary RNA-based splicing machinery
would have originally been used in error correction: trans-splicing allows an early possible mechanism for recombination (Reanney 1979), though we cannot test that prediction yet. The introns-first hypothesis proposes a third
(compatible with the second) that the primordial splicing
machinery enabled generalized RNA gene expression in an
RNA world.
Population Size and Slightly Deleterious Mutations
Lynch (2002, 2007) has correctly pointed out that the
effective population size (Ne) is important in comparing
evolution between bacteria/archaea and eukaryotes and for
evaluating the origin of genomic elements such as introns.
Simply put, Lynch points out that slightly deleterious mutations (such as an additional intron in a protein-coding gene) are more likely to drift to fixation in a species with a small population size. Conversely, such mutations are less likely to be fixed in species with a large population size, to the extent that with very large population sizes (such as extant bacteria) certain deleterious elements have virtually no chance of drifting to fixation. While these arguments generally assume an introns-late model, introns-early models are quite compatible with this aspect of evolutionary theory, as follows.
First, the same population genetics theory indicates that
species with large Ne will have an increased likelihood of
fixing slightly advantageous elements. While Lynch’s initial
formulation assumed that introns were slightly deleterious
(Lynch 2002), it is now well understood that many introns
contain functional elements (see below). Second, it is perhaps unrealistic to expect that in the RNA or RNP worlds, Ne
was large and selection was efficient; we would think the
opposite would be more likely, at least in the earliest stages,
where replication fidelity was low, likely necessitating error
correcting mechanisms such as redundancy and recombination (Reanney 1987; Poole 2006). Consequently, primordial genome architecture would not closely resemble the
streamlined architecture of modern bacteria and archaea,
which is arguably the product of large Ne and efficient
selection. A key question is therefore whether all early life
underwent a period of such efficient selection, or whether
this has only effectively operated on some lineages. Our
simulation results (De Nooijer et al. 2009) indicate that both
smaller primary producers and larger consumers are
expected to have occurred very early. Since predators consistently have smaller population sizes than prey, there are
likely to be a range of population sizes even in these early
stages of the evolution of life. So, even if all intronic elements were slightly deleterious, or initially neutral, it is
plausible that non-coding intronic elements would drift at
some frequency in some of the prey populations until they
evolve some function. It is interesting to note here that
population size, the strength of purifying and adaptive
selection, genome size, replication, and translation fidelity
are all interrelated, and so need to be considered together.
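
The dependence of fixation on population size can be illustrated with Kimura's classical diffusion approximation for the fixation probability of a new mutation. The Python sketch below assumes a diploid population in which census and effective size are equal, and additive selection with coefficient s; the formula is standard population genetics rather than something derived in this paper, and the function name and the numerical values are hypothetical choices for illustration.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's approximation for a new mutation at frequency 1/(2N).

    P = (1 - exp(-2s)) / (1 - exp(-4 * Ne * s)), assuming a diploid
    population with N = Ne. For s = 0 this reduces to the neutral 1/(2N).
    """
    if n_e <= 0:
        raise ValueError("n_e must be positive")
    if s == 0.0:
        return 1.0 / (2.0 * n_e)
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        # Strongly deleterious relative to drift: exp() would overflow
        # and the probability is effectively zero.
        return 0.0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


if __name__ == "__main__":
    s = -1e-4  # assumed cost of a slightly deleterious intron insertion
    for n_e in (1e2, 1e4, 1e6):
        print(f"Ne = {n_e:.0e}: P(fix) = {fixation_probability(s, n_e):.3g}")
```

Under these assumptions a mutation with s = −0.0001 fixes at close to the neutral rate when Ne is about 100, but has essentially no chance when Ne is about one million, whereas a mutation with s = +0.0001 is far more likely than a neutral one to fix in the large population.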
These arguments show that while population size affects intron dynamics, it does not in itself distinguish between the introns first/early/late models without other supporting information. If introns did persist until the last universal common ancestor (LUCA), then the complete loss of spliceosomal introns in bacteria and archaea and recurrent loss of introns in eukaryote lineages could be due in part to changes in population size, along with many of the conditions that favour intron loss or accumulation (see below). The initial origin of introns, and the spliceosome, and the extent to which RNA splicing is intertwined with many aspects of RNA processing (Collins et al. 2009) are not well described by population size effects alone. Table 2 shows some of our reasoning.

Table 2 Population effects (Ne) occur under introns first, early, or late
Population effect (Ne) if introns derived
RNA relics
Population effect (Ne) if introns ancestral
Yes, accounted for = ✓
Not directly accounted for, but not incompatible = ?
snoRNAs in introns: Expansion of introns and small RNAs may be expected under small Ne, but this does not account explicitly for their origin, and does not preclude a very early origin | SnoRNAs argued to date back to RNP world, with the intronic position possibly being an ancestral feature
Introns first: Does not directly account for origins, but is not incompatible (see above) | Aims to explain the origin of mRNA in the context of the RNA to protein transition
Introns early: ? (see above) | Suggests group II ribozymes have an RNA world origin
Introns late: Does not directly account for origins, but is not incompatible (see above) | Not compatible, but an RNA world origin does not exclude later intron expansion (e.g. subsequent to the emergence of full meiosis with outcrossing)
Spliceosome: A late origin is not a requisite for the population size effect model; a late origin would be argued to be non-adaptive | If common origin (Fig. 2c) then maybe increased complexity in eukaryotes and/or reductive evolution in bacteria and archaea
Intron expansion: Expansion is not incompatible with an early origin or with a late origin | Expected under small Ne, sexual outcrossing. Opposite may occur under asexual reproduction and in unicellular lineages with large Ne

Prior Information
In science, it is always positive when background information allows predictions about other types of events. In
this sense, introns-first is derived from more basic principles, and makes predictions about phenomena that were not
included in the original data that were used to formulate the
hypothesis. It is based on two major scientific observations:
the relatively poor catalytic performance of RNA as compared with proteins, and the limitations on genome size that follow from it (the Eigen limit above, and see Table 1). This is in contrast to both the introns-early and introns-late theories, which are more ‘post-hoc’ hypotheses, proposed to explain phenomena already observed.
They do not aim to explain any phenomena other than the
origin of introns. We have to be careful here not to overgeneralize, but the introns-first theory was proposed (Jeffares et al. 1998) as a solution to the stepwise evolution of
protein synthesis from an RNA world.
Reductive Evolution
There is now strong evidence that reduction of eukaryote
genomes has occurred on several occasions, and examples
include yeasts, parasites, and small algae (Derelle et al.
2006). Such reduction often includes extensive intron loss,
especially in eukaryotes with a short life cycle (Jeffares
et al. 2006). Processes of mitochondrial loss, gene loss, and
intron loss in eukaryotes are now quite well understood,
along with reasonable explanations for why they can be
advantageous. A fascinating, though clearly derived,
example of complete intron loss comes from the recently
sequenced nucleomorph genome of Hemiselmis andersenii
which was reported to have lost spliceosomal introns
completely (Lane et al. 2007). We sometimes refer to such
extreme genome reduction in eukaryotes as ‘prokaryote-envy’, but more seriously our knowledge of genome
reduction makes the view that prokaryotes are derived by
genome reduction much more plausible than it seemed
10 years ago. The weight of evidence for intron loss, both
the rate and the phylogenetic frequency (few lineages seem
to show gain, Roy and Gilbert 2006; Koonin 2006; Mourier
and Jeffares 2003) is one example of how our views have
changed, mainly due to the availability of genome data.
Additionally, we are now starting to understand how
selection might influence intron gain, retention, and loss
(see below).
Some (reduced) eukaryote genomes have very few
introns, and initially these were regarded as evidence that
the last common ancestor of eukaryotes had few introns
(Logsdon 1998). This is the ‘introns very late’ hypothesis—that introns (and the spliceosome) may have arisen
within extant eukaryotes. The idea that introns arose and
spread only within eukaryotes is similar to the (now discredited) Archezoa hypothesis that some modern eukaryotes had never had mitochondria (see Embley and Martin 2006; Poole and Penny 2007). The recognition that there were derived parasitic and anaerobic eukaryotes with reduced genomes (and mitochondria) thus discredits both the original Archezoa hypothesis and the ‘introns very late’
hypothesis. So, we have divided ‘introns late’ into ‘introns
latish’ (introns had arisen early in eukaryote evolution and
were well established by the time of the last eukaryotic
common ancestor [LECA]) and ‘introns very late’ (that
introns arose and spread within extant eukaryotes). The
former is still possible, the latter is now rejected.
The Direction of Change Cannot be Assumed A Priori
The final principle discussed here is that, in evolution, there
is no generally guaranteed direction of change from simple
to complex or vice versa. In each case, independent evidence must be found, for example, the catalytic efficiency
argument for the direction from ribozyme to protein
enzymes. In relation to introns, the prevailing view is that
‘simple’ group II introns evolved into the complex
assemblage of spliceosome and introns. While a reasonable
model exists, it is still difficult to establish on current
evidence the evolutionary relationship between spliceosomal and group II introns. Figure 4 shows a range of possibilities for the relationship between the eukaryote
spliceosomal introns and the group II self-splicing introns.
Several important results (Hetzer et al. 1997; Sashital et al.
2004; Seetharaman et al. 2006; Valadkhan et al. 2009)
strengthen the argument that the spliceosome is an RNA
catalyst that shares a common molecular ancestor with
group II introns.
A good null hypothesis here is the proposal, voiced by
Weiner (1993), that the similarities in catalysis between
group II introns and the spliceosome (with its spliceosomal
introns) may be the result of
chemical determinism. If that were the case, the two could
not be concluded to share a common origin (Fig. 4a). To
our knowledge, no one has provided unequivocal evidence
for common descent (Fig. 4b), though on the weight of
circumstantial evidence there may perhaps be consensus
for the latter. However, that consensus is perhaps nearer the
scenario in Fig. 4c than an acknowledgement of common
descent. At the risk of stating the obvious, to say that the
spliceosome (an extant and highly evolved structure)
evolved from group II self-splicing introns (another extant
and highly evolved structure) is analogous to saying that
humans (extant) evolved from chimpanzees (extant); the
opposite (Fig. 4d) is no more productive. A better inference may be that the two extant species (or structures in the
spliceosomal case) evolved from a common ancestor.
Unfortunately, there seems to be little evidence from the
RNA structures alone to determine which of the two
(spliceosomal splicing or group II self-splicing introns) is
closer to the ancestral state. More generally, given that self-splicing introns pursue a horizontal lifestyle that is
expected to lead to repeated cycles of insertion, atrophy,
loss, and reinsertion, it is also difficult to establish the
antiquity of group II introns; given that introns found in mitochondria are subject to this cycle (Goddard and Burt 1999),
their presence in extant mitochondria cannot be trivially
deemed ancestral by reference to the bacterial origin of this
organelle. Finally, it is not sufficient to assume that it is the
simpler of the two (group II introns) that is closer to the ancestral state, because we know that
reductive evolution and gene fusion do occur.
Fig. 4 Four hypotheses for the relationship of Group II and
spliceosomal introns. a The two types of introns arose independently.
b Independent modifications from an unknown common ancestor
(ancestral state not specified). c Spliceosomal introns are derived from
bacterial group II introns (usually assumed to be associated with the
endosymbiotic origin of mitochondria). d Reductive evolution if
bacteria arose by reductive evolution
Improved Understanding Over the Last Decade
There are three relevant areas on which we will concentrate
in regard to intron evolution. The first is the recognition
over the last decade that the last common eukaryote
ancestor was already quite complex in both its biochemistry and its cellular organization. Then there is the
improved understanding of the gain and/or loss of introns
as revealed by comparative genomics—giving a much
more dynamic view of intron evolution (early ideas considered introns more static). Finally, there is the improved
understanding about the selective forces that can affect the
average number of introns per protein-coding gene in a
species.
The Complexity of LECA
It would be much easier to infer the biochemical,
molecular, and sub-cellular features of LECA if we were
confident about the position of the root of the eukaryotes.
However, the deep phylogeny of eukaryotes remains
unresolved (Keeling et al. 2005), and so our most reliable
approach has been to search for functions present in all
six main eukaryote lineages. If a feature occurs in all six
groups, then the ancestor is expected to have had that
feature, independently of where the root actually is. The
developing conclusion is that LECA does appear to have
a surprisingly high complexity (De Duve 2007; Hartman
and Fedorov 2002; Kurland et al. 2006; Poole 2009). In
particular, the RNA ‘control’ of RNA processing and
regulation appears well developed (Collins and Chen
2009). Neither the overall complexity nor the high reliance on RNA in processing and regulation of other RNA
molecules is well explained by most models of eukaryote
origins that assume eukaryotes are ‘advanced’ or
‘derived’.
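The root-independent inference described above amounts to taking the set of features shared by every major lineage. A minimal sketch follows, assuming toy presence/absence data: the lineage labels and the feature `toy_feature_x` are hypothetical placeholders, and the function names are ours, not part of any published pipeline.

```python
# Toy presence/absence data. Lineage labels and "toy_feature_x" are
# hypothetical placeholders, not data from the text.
toy_features = {
    "lineage_1": {"spliceosome", "RNase_MRP", "toy_feature_x"},
    "lineage_2": {"spliceosome", "RNase_MRP"},
    "lineage_3": {"spliceosome", "RNase_MRP", "toy_feature_x"},
    "lineage_4": {"spliceosome", "RNase_MRP"},
    "lineage_5": {"spliceosome", "RNase_MRP", "toy_feature_x"},
    "lineage_6": {"spliceosome", "RNase_MRP"},
}


def inferred_in_ancestor(features_by_lineage):
    """Return the features present in every lineage.

    A feature found in all lineages is assigned to their common ancestor
    wherever the root lies; this simple rule ignores the alternatives of
    independent origin or horizontal transfer.
    """
    feature_sets = list(features_by_lineage.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def root_dependent(features_by_lineage):
    """Return features seen in only some lineages (ancestry depends on the root)."""
    feature_sets = list(features_by_lineage.values())
    if not feature_sets:
        return set()
    return set.union(*feature_sets) - set.intersection(*feature_sets)


print(sorted(inferred_in_ancestor(toy_features)))
print(sorted(root_dependent(toy_features)))
```

Features in the second set cannot be placed in the ancestor without knowing the root, which is why only the all-lineage intersection is treated as reliable here.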
Of relevance here is the apparent presence of a complete
spliceosome with five U-RNAs and over 80 proteins in
LECA (Collins and Penny 2005; Veretnik et al. 2009). For
humans, the spliceosome comprises around 200 proteins,
but current techniques (including
ancestral sequence reconstruction) only identified 84 of
them in LECA—the main point being that the proteins
were distributed over all the sub-components of the
spliceosome. In addition, there are two classes of introns:
major and minor, with the minor spliceosome having U11
and U12 snRNAs instead of U1 and U2. Russell et al.
(2006) and Davila Lopez et al. (2008) provide a range of
evidence that the minor spliceosome is also very ancient in
eukaryotes, though it is not completely clear yet whether it
also dates back to the LECA. Clearly then, splicing evolved
at the very latest in the eukaryote stem, and was a well-established feature of LECA.
There are other examples of new data changing our view
in this area. Alternative splicing was sometimes assumed to
have occurred ‘for’ the origin of multicellularity, though
given the principles of evolution outlined earlier such a
view is not tenable—a feature cannot be selected solely
because it might give a hypothetical advantage at some
unspecified time in the future. However, recent work
makes it seem that alternative splicing is relatively ancient
within eukaryotes (Irimia et al. 2007b; Tarrio et al. 2008)
and so was available to be recruited into development of
multicellular animals, rather than have evolved ‘for’ that
process. A similar example seems to arise with apoptosis
(programmed cell death, Nedelcu 2009) where it occurs in
unicellular eukaryotes, and thus would be available later
for multicellular development.
Another well-known aspect of RNA is the discovery of
RNAi and its complex set of regulatory reactions. The full
extent of these small RNA types in deep lineages of
eukaryotes makes it likely that it will also be an ancestral
feature (Collins and Chen 2009). We do need to be careful
here because new members of existing RNA families
almost certainly emerge in some lineages (e.g. Lu et al.
2008). It is therefore important to separate (Collins and
Penny 2009) such new examples of existing RNA types
(where all the protein machinery is already present) from
genuine novelties where a new class of regulatory
RNAs has arisen.
In general, the newer results are consistent with some
form of RNA continuity remaining in eukaryote genomes,
including for processes such as recombination and meiosis
(Egel and Penny 2007). We had assumed earlier that RNase
MRP (which is involved in processing rRNA) had arisen
just within eukaryotes, but it has now been found in all
groups of eukaryotes that have been well studied (Woodhams et al. 2007), and so is now more likely to have been
in LECA. Not all classes of RNA can currently be inferred
to be in LECA. In our earlier papers, we were impressed by
the large RNP ‘vault’ particles in many eukaryotes, but
their distribution is still uncertain. To some extent, there
seemed to be a decrease in interest in them, but now that a
three-dimensional crystal structure is available (Tanaka
et al. 2009), interest has certainly revived again. There has
been recent work on identifying the vault RNAs themselves (vRNAs, Mosig et al. 2007), but the proteins are
probably easier targets to identify. We have concentrated
here on RNA, but the protein-fold diversity in eukaryotes
also has important information about the molecular components (including RNPs) LECA may have contained
(Lecompte et al. 2002; Wang and Caetano-Anolles 2009).
Intron Early/Late and Introns Fixed/Dynamic
There has been considerable progress in the last 10 years in
describing the presence and absence of introns in eukaryote
genomes, primarily because of the availability of genome
data. Earlier analyses were based on the best data available—relatively deep phylogenetic studies of a handful of
genes (e.g. triosephosphate isomerase, Logsdon et al.
1995). However, rates of gain/loss are stochastic at best
and probably biased (Carmel et al. 2007; Jeffares et al.
2008).
In effect, it appears to us that some of the early debate
was comparing ‘introns early and fixed in position’ and
‘introns late and mobile’. With the establishment that
introns could be gained and lost during evolution, it
appeared initially that this favoured ‘introns late and
mobile’. Recent work has eliminated the introns ‘very late’
proposal—that the eukaryote ancestor had only a few
introns and that they increased during the evolution of
extant eukaryotes. It is now clear that the average number
of introns per gene was relatively high early in the evolution of modern eukaryotes, and that most lineages have
undergone more loss than gain of introns over the last
billion years at least (e.g. Roy and Irimia 2009a). Furthermore, although some early branching lineages are relatively intron poor, this is probably the result of extensive
loss, since others have many introns. For example,
Slamovits and Keeling (2006) report that some excavates
have a high number of introns per gene. In addition, it
appears that the motifs for intron recognition sites are
stronger when there are few introns per gene (Irimia et al.
2007a); selection may be stronger with fewer choices.
We also understand more about rates of intron gain and
loss, and several accounts are available (Fedorov et al.
2003; Roy and Irimia 2009b). There are a few accounts
where a lineage has undergone considerable intron gain
(Roy and Penny 2007; Carmel et al. 2007) and at last some
examples of recent intron gain (Omilian et al. 2008).
However, studies that use phylogenetic inferences to infer
rates of gain and loss generally find that rates of loss are
several orders of magnitude greater than rates of gain (Roy
and Penny 2007; Coulombe-Huntington and Majewski
2007). An important recent development, though, has
been new models of intron gain, wherein an exonic region
becomes a new intron, so-called ‘intronisation’ (Irimia
et al. 2008; Catania and Lynch 2008; Catania et al. 2009).
Depending on whether these new introns arise initially as
minor splice variants, and/or in generally less conserved
exons, such new introns may not be well detected by previous methods. So it is possible that intron gain rates have
been underestimated. Nevertheless, it is clear that rates of
gain and loss can vary considerably between species, and
over evolutionary time, and between genes of one genome
(Carmel et al. 2007).
In summary, we can now rule out the ‘introns very late’
variant of the introns-late model (where introns arise after
LECA), we know that introns are lost frequently and we
suspect that they are gained less frequently.
Functions of Introns: Continuity with Earlier Stages
of Evolution?
A variety of studies have shown functionality for specific
modern introns, including playing host to snoRNAs and
miRNAs (Niu 2007; Brown et al. 2008; Ying and Lin
2009). Studies of selective constraint in genomes indicate
that intronic sites are subject to purifying selection (Halligan and Keightley 2006; Guo et al. 2007; Gazave et al.
2007; Gaffney and Keightley 2006), suggesting that there
may be many more undiscovered functional elements in
introns. An assumption of the introns-first hypothesis is
that introns had functions even before the origin of encoded
protein synthesis. For the moment, we will ignore any
possible direct catalytic role and mention only the possibilities of assisting processing and stabilization of early
ribozymes (such as the ribosome). For example, it is well
established that snoRNAs are required for pre-rRNA processing, or modification of other stably transcribed RNAs.
It is an expectation of introns-first that some modern
introns are derived from these. As shown in Table 3, some
sites of pseudouridylation and methylation in rRNA are
conserved across members of all three domains. Furthermore, sno-like sRNAs homologous to C/D- and H/ACAbox snoRNAs are found in archaea (Gaspin et al. 2000;
Omer et al. 2000; Rozhdestvensky et al. 2003; Tran et al.
2005). An examination of the phylogenetic distribution of
snoRNA families curated by Rfam (Gardner et al. 2009)
indicates that only 3 out of the 489 families in release 9.1
are shared between archaea and eukaryotes, the remainder
appears to be domain specific for archaea (47 families) and
eukaryotes (438 families), respectively (data not shown).
An additional intriguing case is that of snoRNA U3 which
provides a circumstantial case for snoRNAs in LUCA; it is
present in eukaryotes, and there is, moreover, a U3 snoRNA-like fold in cis, with a similar role in pre-rRNA in
bacteria (Dennis et al. 1997). In addition to roles in processing or modification, archaeal sRNAs might also act as
chaperones, aiding folding of rRNA and tRNA (Dennis and
Omer 2005; Schoemaker and Gultyaev 2006). That might
in turn indicate a similar primordial role for s(no)RNA in
an RNA world, though such speculation is difficult to test
experimentally.
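The Rfam tally quoted above can be put in proportion with a short calculation. The sketch below uses only the counts given in the text; the variable and function names are ours. Note that, as quoted, the three categories sum to 488, so one of the 489 families is not assigned to a category in the text, and the sketch makes no assumption about where it belongs.

```python
# Counts as quoted in the text for Rfam release 9.1 snoRNA families.
total_families = 489
shared_archaea_eukaryotes = 3
archaea_specific = 47
eukaryote_specific = 438


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


shared_pct = percent(shared_archaea_eukaryotes, total_families)

# Families covered by the three stated categories, and any remainder.
accounted = shared_archaea_eukaryotes + archaea_specific + eukaryote_specific
unassigned = total_families - accounted

print(f"shared between archaea and eukaryotes: {shared_pct:.2f}% of families")
print(f"families accounted for: {accounted} of {total_families}")
print(f"families not assigned to a category in the text: {unassigned}")
```

On these figures, well under 1% of curated snoRNA families are shared between the two domains, which is why the conserved modification sites and the U3 case carry most of the weight in the argument.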
Table 3 Some sites of pseudouridylation and methylation are at homologous positions on the rRNA across all three domains of life
Modification        rRNA   Archaea   Human/yeast   Archaea/Eukarya   Archaea/Eukarya/Escherichia coli
Pseudouridylation   LSU    4         23            3                 2
Methylation         SSU    11        13            2                 1
Methylation         LSU    31        23            7                 2
Data derived from: Ofengand and Bakin (1997), Dennis et al. (2001), and del Campo et al. (2005)
Selective Forces Affecting the Numbers of Introns
Various biological factors have been proposed to influence
the intron density of a gene and the organism as a whole
(Jeffares et al. 2006). The preponderance of intron loss
in eukaryotes supports the view that introns are generally deleterious in
eukaryote genomes. Lynch (2002) has proposed that
introns are deleterious because of the extra mutational load
they confer (an intron-containing gene contains many more
sites that are essential to its proper expression than an
intron-less version). Two patterns of intron density within
genomes help to describe in more detail why introns are
deleterious. Based on the observation that highly expressed
genes have shorter introns in humans and Caenorhabditis
elegans, it was proposed that there had been selection
against the energetic cost of transcribing long introns
(Castillo-Davis et al. 2002). However, the opposite pattern
was observed in plants—highly expressed genes are the
least compact (Ren et al. 2006). A pattern that is possibly
more consistent between diverse eukaryotes is that rapidly
regulated genes are less intron-dense than constitutively
active genes (Jeffares et al. 2008). Since transcription and
mRNA maturation are relatively slow processes in eukaryotes, this may reflect selection for rapid protein production.
These observations suggest that introns are deleterious
because they hinder rapid gene expression and repression,
and they make high gene expression energetically expensive. One case of positive selection associated with an
intron loss in a population has been observed (Llopart et al.
2002); in this case, the intron-loss allele is less strongly
transcribed, but this illustrates that individual intron loss
alleles are subject to selection.
We now know that very intron-poor eukaryote genomes
can be the result of extensive intron loss. In many cases, this
is simply the result of general genome compaction, along
with reduced intergenic regions, and loss of genes. However,
extensive intron loss is not always associated with compact
genomes (Jeffares et al. 2006). The budding yeast, Saccharomyces cerevisiae, has an intron-poor genome (intron
density 0.045 introns/gene), but not an extremely compact
genome. The thermophilic alga Cyanidioschyzon merolae
and the parasitic trypanosome Leishmania major are other
examples (intron densities of 0.005 and 0.003, respectively).
Such derived intron-poor species may have lost introns at any
point in their evolutionary past, and it remains speculative to
attribute the intron loss to a particular environmental condition. Experimental evolution should help to describe such
processes, as it has done for genome reduction of bacteria
(Nilsson et al. 2005), particularly now that sequencing of
small eukaryote genomes has become rapid and inexpensive.
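To give the intron densities quoted above a more intuitive scale, each can be inverted to an average number of genes per intron. This is a minimal sketch using only the three densities stated in the text; the function name is ours, and the result is a genome-wide average that says nothing about how introns are distributed among genes.

```python
# Intron densities (introns per gene) as quoted in the text.
intron_density = {
    "Saccharomyces cerevisiae": 0.045,
    "Cyanidioschyzon merolae": 0.005,
    "Leishmania major": 0.003,
}


def genes_per_intron(density):
    """Average number of genes per intron, the reciprocal of introns per gene."""
    if density <= 0:
        raise ValueError("density must be positive")
    return 1.0 / density


for species, density in intron_density.items():
    print(f"{species}: about one intron per {genes_per_intron(density):.0f} genes")
```

On these figures the yeast genome averages roughly one intron per 22 genes, while the other two species average one intron per 200 and per 333 genes, respectively.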
As mentioned above, it is clear that introns can sometimes contain advantageous elements, including microRNAs and
snoRNAs (Niu 2007; Brown et al. 2008; Ying and Lin
2009) and DNA-acting elements such as transcriptional
enhancers (Rose 2008). Intronic sequences may also be
required for the alternative splicing that has vastly expanded the
proteome complement of some intron-dense species (Xing
and Lee 2007). Such advantageous introns are expected to
be maintained in a population, despite the costs of maintaining them (see above). Clearly, the balance of intron
retention, gain, and loss within a genome is determined by
many factors (such as population size and the cell division
time for unicellular species), but particular introns can be
subject to strong, non-neutral purifying or adaptive selection. This will mean that an intron may be retained over a
very long period of time if it contains a functional element,
or arise/be lost relatively rapidly under various conditions
(such as a novel mutation that affects fitness).
In vertebrates, many snoRNAs involved in ribosome
maturation are encoded in introns (Tycowski et al. 2004).
Likewise, numerous microRNAs are intron-encoded
(Das 2009), but some are even encoded in exons. The final
hypothesis is that spliceosomal introns only arose after
DNA synthesis was established, and proteins had taken
over their current role as the primary catalysts in living
systems. We all prefer simpler hypotheses, but if there
were a dual origin, with some introns being very ancient
(first) and others gained during evolution then it makes
hypotheses harder to falsify. Perhaps the origin of the
spliceosome then becomes very important.
Summary and Future Directions
Compared to the data available 10 years ago, much more
is known about genome evolution and function. There
has also been considerable progress in describing the
components of LECA. This indicates that LECA possessed a complex spliceosome, and it is now clear that
ancestral eukaryotes were not extremely intron poor. The
current evidence indicates that very intron-poor eukaryote genomes are probably the result of extensive intron
loss.
We are aware that intron dynamics are highly stochastic
and affected by both organism-level and gene-level selective processes. An understanding of the selective processes
encouraging intron loss makes the complete loss in prokaryotes a more plausible proposal than it was a decade
ago. Our new understanding of the ‘RNA infrastructure’ of
eukaryotic cells and the ncRNA functions of introns means
that we are aware that some introns carry essential components of genes that will be maintained during evolution,
and this is compatible with continuity with ancient functions of introns in the RNA world.
Although progress in eliminating theories may appear to
be slow, there has been progress over the last decade, and
some theories can be rejected (Fig. 5). The role of
hypotheses is to stimulate further tests, rather than to be
‘believed’ as facts; they are simply tools to make predictions that (in principle) can be tested. It is always helpful to
specify a range of hypotheses and not be forced into
inappropriate and limited ‘binary choices’.
Fig. 5 Current status of introns first, early, and late. The ‘introns
fixed’ model (lower graphic), in all its forms, is rejected by the data
from intron positions as studied in comparative genomics. Similarly,
the introns ‘very late’ model can be rejected, even with intron position
being dynamic (upper graphic). In contrast, several versions of the
‘introns dynamic’ model (first, early, and latish) are still possible on
current evidence. It is imperative that we retain all three hypotheses
and seek ways of testing them objectively
Although it is difficult to anticipate future directions, we
certainly expect rapid progress in understanding the nature
of the last eukaryote common ancestor. This will at least
define much more accurately the detailed questions to be
answered. A major difficulty at present is that it often
appears that theories for the origin of eukaryotes do not
even address the question of the origin of the many
defining features of eukaryote cells.
At this point, answers appear quite open; even the
present authors would probably not agree on predicting the
order of events in early evolution! A basic prediction of
eukaryotes first or early is that eukaryotes have ancient
origins, and are not derived from two prokaryote cells (an
archaeal and a bacterial cell). Under these models, the
favoured alternative is that those two cell types (archaea
and bacteria) are derived by reductive evolution from a
more complex cell structure that had at least some similarities in its RNA processing to modern eukaryotes.
Reductive evolution is now recognized as being widespread in evolution, so that direction of change is less of a
problem than was perceived a decade ago.
Possibly some RNAs (such as snoRNAs) may be very
old, but their intronic location may be evolutionarily derived.
So, we could end up with combinations of hypotheses such
as ‘snoRNAs in LUCA, introns latish’. We are hopeful of
eliminating additional hypotheses, as occurred
with the introns very late hypothesis. As such, being able to
eliminate some alternatives means that the subject is within
the realm of modern science, but it is going to be hard to
come to a final decision.
References
Bokov K, Steinberg SV (2009) A hierarchical model for
evolution of 23S ribosomal RNA. Nature 457:977–980
Brown JWS, Marshall DF, Echeverria M (2008) Intronic noncoding
RNAs and splicing. Trends Plant Sci 13:335–342
Carmel L, Rogozin IB, Wolf YI, Koonin EV (2007) Evolutionarily
conserved genes preferentially accumulate introns. Genome Res
17:1045–1050
Carthew RW, Sontheimer EJ (2009) Origins and mechanisms of
miRNAs and siRNAs. Cell 136:642–655
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov
FA (2002) Selection for short introns in highly expressed genes.
Nat Genet 31:415–418
Catania F, Lynch M (2008) Where do introns come from? PLoS Biol
6:e283
Catania F, Gao X, Scofield DG (2009) Endogenous mechanisms for
the origins of spliceosomal introns. J Hered 100:591–596
Cavalier-Smith T (2002) The phagotrophic origin of eukaryotes and
phylogenetic classification of Protozoa. Int J Syst Evol Microbiol
52:297–354
Cech TR (2009) Crawling out of the RNA world. Cell 136:599–602
Collins LJ, Chen XS (2009) Ancestral RNA: The RNA biology of the
eukaryotic ancestor. RNA Biol 6 (in press)
Collins LJ, Penny D (2005) Complex spliceosomal organization
ancestral to extant eukaryotes. Mol Biol Evol 22:1053–1066
Collins LJ, Penny D (2009) The RNA-infrastructure: dark matter of
the eukaryote cell? Trends Genet 25:120–128
Collins LJ, Kurland CG, Biggs P, Penny D (2009) The modern RNP
world of eukaryotes. J Hered 100:597–604
Coulombe-Huntington J, Majewski J (2007) Characterization of
intron loss events in mammals. Genome Res 17:23–32
Darnell JE, Doolittle WF (1986) Speculations on the early course of
evolution. Proc Natl Acad Sci USA 83:1271–1275
Das S (2009) Evolutionary origin and genomic organisation of microRNA genes in immunoglobulin Lambda variable region gene
family. Mol Biol Evol 26:1179–1189
Davila Lopez M, Rosenblad MA, Samuelsson T (2008) Computational screen for spliceosomal RNA genes aids in defining the
phylogenetic distribution of major and minor spliceosomal
components. Nucleic Acids Res 36:3001–3010
De Duve C (2007) The origin of eukaryotes: a reappraisal. Nat Rev
Genet 8:395–403
De Nooijer S, Holland BR, Penny D (2009) Eukaryote origins: there
was no Garden of Eden? PLoS One 4:e5507
Del Campo M, Recinos C, Yanez G, Pomerantz SC, Guymon R, Crain
PF, McCloskey JA, Ofengand J (2005) Number, position, and
significance of the pseudouridines in the large subunit ribosomal
RNA of Haloarcula marismortui and Deinococcus radiodurans.
RNA 11:210–219
Dennis PP, Omer A (2005) Small non-coding RNAs in Archaea. Curr
Opin Microbiol 8:685–694
Dennis PP, Russell AG, Moniz De Sá M (1997) Formation of the 5′
end pseudoknot in small subunit ribosomal RNA: involvement of
U3-like sequences. RNA 3:337–343
Dennis PP, Omer A, Lowe T (2001) A guided tour: small RNA
function in Archaea. Mol Microbiol 40:509–519
Derelle E, Ferraz C, Rombauts S et al (2006) Genome analysis of the
smallest free-living eukaryote Ostreococcus tauri unveils many
unique features. Proc Natl Acad Sci USA 103:11647–11652
Di Giulio M (2008a) The split genes of Nanoarchaeum equitans are
an ancestral character. Gene 421:20–26
Di Giulio M (2008b) Split genes, ancestral genes. In: Wong JT-F,
Lazcano A (eds) Prebiotic evolution and astrobiology. Landes
Bioscience, Austin
Egel R, Penny D (2007) On the origin of meiosis in eukaryotic
evolution: coevolution of meiosis and mitosis from feeble
beginnings. In: Egel R, Lankenau D-H (eds) Recombination and
meiosis: models, means, evolution. Springer, Berlin, pp 249–288
Eigen M (1992) Steps toward life: a perspective on evolution. Oxford
University Press, Oxford
Embley TM, Martin W (2006) Eukaryotic evolution, changes and
challenges. Nature 440:623–630
Fedorov A, Fedorova L (2004) Introns: mighty elements from the
RNA world. J Mol Evol 59:718–721
Fedorov A, Roy S, Fedorova L, Gilbert W (2003) Mystery of intron
gain. Genome Res 13:2236–2241
Forterre P (1995) Thermoreduction, a hypothesis for the origin of
prokaryotes. C R Acad Sci Paris Life Sci 318:415–422
Forterre P, Gribaldo S (2007) The origin of modern terrestrial life.
HFSP J 1:156–168
Gaffney DJ, Keightley PD (2006) Genomic selective constraints in
murid noncoding DNA. PLoS Genet 2:1912–1923
Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S,
Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A
(2009) Rfam: updates to the RNA families database. Nucleic
Acids Res 37:D136–D140
Gaspin C, Cavaillé J, Erauso G, Bachellerie J (2000) Archaeal
homologs of eukaryotic methylation guide small nucleolar
RNAs: lessons from the Pyrococcus genomes. J Mol Biol
297:895–906
Gazave E, Marques-Bonet T, Fernando O, Charlesworth B, Navarro
A (2007) Patterns and rates of intron divergence between
humans and chimpanzees. Genome Biol 8:R21
Gilbert W (1987) The exon theory of genes. Cold Spring Harbor
Symp Quant Biol 52:901–905
Gilbert W, de Souza SJ (1999) Introns and the RNA world. In:
Gesteland RF, Cech TR, Atkins JF (eds) The RNA world, 2nd
edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
pp 221–231
Goddard MR, Burt A (1999) Recurrent invasion and extinction of a
selfish gene. Proc Natl Acad Sci USA 96:13880–13885
Guo XY, Wang Y, Keightley PD, Fan LJ (2007) Patterns of selective
constraints in noncoding DNA of rice. BMC Evol Biol 7:e208
Halligan DL, Keightley PD (2006) Ubiquitous selective constraints in
the Drosophila genome revealed by a genome-wide interspecies
comparison. Genome Res 16:875–884
Hartman A, Fedorov A (2002) The origin of the eukaryotic cell: a
genomic investigation. Proc Natl Acad Sci USA 99:1420–1425
Hetzer M, Wurzer G, Schweyen RJ, Mueller MW (1997) Transactivation of group II intron splicing by nuclear U5 snRNA.
Nature 386:417–420
Hickey DA (1992) Evolutionary dynamics of transposable elements
in prokaryotes and eukaryotes. Genetica 86:269–274
Holzmann J, Frank P, Loffler E, Bennett KL, Gerner C, Rossmanith
W (2008) RNase P without RNA: identification and functional
reconstitution of the human mitochondrial tRNA processing
enzyme. Cell 135:462–474
Irimia M, Penny D, Roy SW (2007a) Coevolution of genomic intron
number and splice sites. Trends Genet 23:321–325
Irimia M, Rukov JL, Penny D, Roy SW (2007b) Functional and
evolutionary analysis of alternatively spliced genes suggests an
early eukaryotic origin of alternative splicing. BMC Evol Biol
7:188
Irimia M, Rukov JL, Penny D, Vinther J, Garcia-Fernandez J, Roy
SW (2008) Origin of introns by ‘intronization’ of exonic
sequences. Trends Genet 24:378–381
Jeffares DC, Poole AM, Penny D (1998) Relics from the RNA world.
J Mol Evol 46:18–36
Jeffares DC, Mourier T, Penny D (2006) The biology of intron gain
and loss. Trends Genet 22:16–22
Jeffares DC, Penkett CJ, Bähler J (2008) Rapidly regulated genes are
intron poor. Trends Genet 24:375–378
Jurica MS, Moore MJ (2003) Pre-mRNA splicing awash in a sea of
proteins. Mol Cell 12:5–14
Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Perlman RE,
Roger AJ, Gray MW (2005) The tree of eukaryotes. Trends Ecol
Evol 20:670–676
Koonin EV (2006) The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus
introns-late debate? Biol Direct 1:22
Kurland CG, Collins LJ, Penny D (2006) Genomics and the
irreducible nature of eukaryote cells. Science 312:1011–1014
Lane CE, van den Heuvel K, Kozera C et al (2007) Nucleomorph
genome of Hemiselmis andersenii reveals complete intron loss
and compaction as a driver of protein structure and function.
Proc Natl Acad Sci USA 104:19908–19913
Lecompte O, Ripp R, Thierry JC, Moras D, Poch O (2002)
Comparative analysis of ribosomal proteins in complete
genomes: an example of reductive evolution at the domain
scale. Nucleic Acids Res 30:5382–5390
Lehman N (2003) A case for the extreme antiquity of recombination.
J Mol Evol 56:770–777
Lincoln TA, Joyce GF (2009) Self-sustained replication of an RNA
enzyme. Science 323:1229–1232
Llopart A, Comeron JM, Brunet FG, Lachaise D, Long M (2002)
Intron presence-absence polymorphism in Drosophila driven by
positive Darwinian selection. Proc Natl Acad Sci USA 99:8121–
8126
Logsdon JM (1998) The recent origin of spliceosomal introns
revisited. Curr Opin Genet Dev 8:637–648
Logsdon JM Jr, Tyshenko MG, Dixon C, Jafari D-J, Walker VK,
Palmer JD (1995) Seven newly discovered intron positions in the
triose-phosphate isomerase gene: evidence for the introns-late
theory. Proc Natl Acad Sci USA 92:8507–8511
Lu J, Shen Y, Wu Q, Kumar S, He B, Shi S, Carthew RW, Wang SM,
Wu C (2008) The birth and death of microRNA genes in
Drosophila. Nat Genet 40:351–355
Lynch M (2002) Intron evolution as a population-genetic process.
Proc Natl Acad Sci USA 99:6118–6123
Lynch M (2007) The origins of genome architecture. Sinauer,
Sunderland
Maizels N, Weiner AM (1999) The genomic tag hypothesis: what
molecular fossils tell us about the evolution of tRNA. In:
Gesteland RF, Cech TR, Atkins JF (eds) The RNA world, 2nd
edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
pp 79–111
Martin W, Russell MJ (2003) On the origins of cells: a hypothesis for
the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated
cells. Philos Trans R Soc Lond B Biol Sci 358:59–83
Mosig A, Chen JJ-L, Stadler PF (2007) Homology search with
fragmented nucleic acid sequence patterns. In: Algorithms in
bioinformatics. Lecture notes in bioinformatics, vol 4645.
Springer, Berlin, pp 335–345
Mourier T, Jeffares DC (2003) Eukaryotic intron loss. Science
300:1393
Nedelcu AM (2009) Comparative genomics of phylogenetically
diverse unicellular eukaryotes provide new insights into the
genetic basis for the evolution of the programmed cell death
machinery. J Mol Evol 68:256–268
Nilsson AI, Koskiniemi S, Eriksson S, Kugelberg E, Hinton JCD,
Andersson DI (2005) Bacterial genome size reduction by
experimental evolution. Proc Natl Acad Sci USA 102:12112–12116
Niu D-K (2007) Protecting exons from deleterious R-loops: a
potential advantage of having introns. Biol Direct 2:11
Ofengand J, Bakin A (1997) Mapping to nucleotide resolution of
pseudouridine residues in large subunit ribosomal RNAs from
representative eukaryotes, prokaryotes, archaebacteria, mitochondria and chloroplasts. J Mol Biol 266:246–268
Omer AD, Lowe TM, Russell AG, Ebhardt H, Eddy SR, Dennis PP
(2000) Homologs of small nucleolar RNAs in Archaea. Science
288:517–522
Omilian AR, Scofield DG, Lynch M (2008) Intron presence-absence
polymorphisms in Daphnia. Mol Biol Evol 25:2129–2139
Penny D (2005) An interpretive review of the origin of life research.
Biol Philos 20:633–671
Penny D, Collins LJ (2009) Evolutionary genomics leads the way. In:
Caetano-Anolles G (ed) Evolutionary genomics and systems
biology. Wiley, Hoboken
Penny D, Phillips MJ (2004) The rise of birds and mammals: are
microevolutionary processes sufficient for macroevolution?
Trends Ecol Evol 19:516–522
Poole AM (2006) Getting from an RNA world to modern cells just got
a little easier. BioEssays 28:105–108
Poole AM (2009) Eukaryote evolution: the importance of the stem
group. In: Caetano-Anolles G (ed) Evolutionary genomics and
systems biology. Wiley, Hoboken
Poole AM, Penny D (2007) Evaluating hypotheses for the origin of
eukaryotes. BioEssays 29:74–84
Poole AM, Jeffares DC, Penny D (1998) The path from the RNA
world. J Mol Evol 46:1–17
Poole AM, Jeffares DC, Penny D (1999) Prokaryotes, the new kids on
the block. BioEssays 21:880–889
Reanney DC (1979) RNA splicing and polynucleotide evolution.
Nature 277:598–600
Reanney DC (1987) Genetic error and genome design. Cold Spring
Harb Symp Quant Biol 52:751–757
Ren XY, Vorst O, Fiers MWEJ et al (2006) In plants, highly
expressed genes are the least compact. Trends Genet 22:528–532
Rodríguez-Trelles F, Tarrío R, Ayala FJ (2006) Origins and evolution
of spliceosomal introns. Annu Rev Genet 40:47–76
Rose AB (2008) Intron-mediated regulation of gene expression. Curr
Top Microbiol Immunol 326:277–290
Roy SW, Gilbert W (2006) The evolution of spliceosomal introns:
patterns, puzzles and progress. Nat Rev Genet 7:211–221
Roy SW, Irimia M (2009a) Splicing in the eukaryote ancestor: form,
function, and dysfunction. Trends Ecol Evol 24:447–455
Roy SW, Irimia M (2009b) Mystery of intron gain: new data and new
models. Trends Genet 25:67–73
Roy SW, Penny D (2007) Widespread intron loss suggests retrotransposon activity in ancient apicomplexans. Mol Biol Evol
24:1926–1933
Rozhdestvensky TS, Tang TH, Tchirkova IV, Brosius J, Bachellerie
JP, Hüttenhofer A (2003) Binding of L7Ae protein to the K-turn
of archaeal snoRNAs: a shared RNA binding motif for C/D and
H/ACA box snoRNAs in Archaea. Nucleic Acids Res 31:869–877
Russell AG, Charette JM, Spencer DF, Gray MW (2006) A very early
evolutionary emergence of the minor spliceosome. Nature
443:863–866
Santos M, Zintzaras E, Szathmary E (2004) Recombination in
primeval genomes: a step forward but still a long leap from
maintaining a sizable genome. J Mol Evol 59:507–519
Sashital DG, Cornilescu G, Butcher SE (2004) U2-U6 RNA folding
reveals a group II intron-like domain and a four-helix junction.
Nat Struct Mol Biol 11:1237–1242
Schoemaker RJ, Gultyaev AP (2006) Computer simulation of
chaperone effects of Archaeal C/D box sRNA binding on rRNA
folding. Nucleic Acids Res 34:2015–2026
Scofield DG, Lynch M (2008) Evolutionary diversification of the Sm
family of RNA-associated proteins. Mol Biol Evol 25:2255–2267
Seetharaman M, Eldho NV, Padgett RA, Dayie KT (2006) Structure
of a self-splicing group II intron catalytic effector domain 5:
parallels with spliceosomal U6 RNA. RNA 12:235–247
Sharp PA (2009) The centrality of RNA. Cell 136:577–580
Slamovits CH, Keeling PJ (2006) A high density of ancient
spliceosomal introns in oxymonad excavates. BMC Evol Biol
6:e34
Stoltzfus A (1999) On the possibility of constructive neutral
evolution. J Mol Evol 49:169–181
Sun F-J, Caetano-Anolles G (2008) The origin and evolution of tRNA
inferred from phylogenetic analysis of structure. J Mol Evol
66:21–35
Sverdlov AV, Csuros M, Rogozin IB, Koonin EV (2007) A glimpse
of a putative pre-intron phase of eukaryotic evolution. Trends
Genet 23:105–108
Tanaka H, Kato K, Yamashita E, Sumizawa T, Zhou Y, Yao M,
Iwasaki K, Yoshimura M, Tsukihara T (2009) The structure of
rat liver vault at 3.5 Angstrom resolution. Science 323:384–388
Tarrio R, Ayala FJ, Rodriguez-Trelles F (2008) Alternative splicing: a
missing piece in the puzzle of intron gain. Proc Natl Acad Sci
USA 105:7223–7228
Tran E, Zhang X, Lackey L, Maxwell ES (2005) Conserved spacing
between the box C/D and C′/D′ RNPs of the archaeal box C/D
sRNP complex is required for efficient 2′-O-methylation of
target RNAs. RNA 11:285–293
Tycowski KT, Aab A, Steitz JA (2004) Guide RNAs with 5′ caps and
novel box C/D snoRNA-like domains for modification of
snRNAs in metazoa. Curr Biol 14:1985–1995
Valadkhan S, Mohammadi A, Jaladat Y, Geisler S (2009) Protein-free
small nuclear RNAs catalyze a two-step splicing reaction. Proc
Natl Acad Sci USA 106:11901–11906
Valentine DL (2007) Adaptations to energy stress dictate the ecology
and evolution of the Archaea. Nat Rev Microbiol 5:316–323
Veretnik S, Wills C, Youkharibache P, Valas RE, Philip E, Bourne PE
(2009) Sm/Lsm genes provide a glimpse into the early evolution
of the spliceosome. PLoS Comp Biol 5:e1000315
Wang M, Caetano-Anolles G (2009) The evolutionary mechanics of
domain organization in proteomes and the rise of modularity in
the protein world. Structure 17:66–78
Weiner AM (1993) Messenger-RNA splicing and autocatalytic
introns—distant cousins or the products of chemical determinism. Cell 72:161–164
Woodhams MD, Stadler PF, Penny D, Collins LJ (2007) RNase MRP
and the RNA processing cascade in the eukaryotic ancestor.
BMC Evol Biol 7:S1–S13
Xing Y, Lee C (2007) Relating alternative splicing to proteome
complexity and genome evolution. Adv Exp Med Biol 623:36–49
Ying SY, Lin SL (2009) Intron-mediated RNA interference and
microRNA biogenesis. Methods Mol Biol 487:387–413