J Mol Evol
DOI 10.1007/s00239-009-9279-5

An Overview of the Introns-First Theory

David Penny · Marc P. Hoeppner · Anthony M. Poole · Daniel C. Jeffares

Received: 13 August 2009 / Accepted: 8 September 2009
© Springer Science+Business Media, LLC 2009

D. Penny (✉)
Allan Wilson Center, Massey University, Palmerston North, New Zealand
e-mail: d.penny@massey.ac.nz

M. P. Hoeppner · A. M. Poole
Department of Molecular Biology and Functional Genomics, Stockholm University, 106 91 Stockholm, Sweden

A. M. Poole
School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand

D. C. Jeffares
Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK

Abstract We review the introns-first hypothesis a decade after it was first proposed. The hypothesis is that exons emerged from non-coding regions interspersed between RNA genes in an early RNA world, and it is a subcomponent of a more general ‘RNA-continuity’ hypothesis. The latter is that some RNA-based systems, especially in RNA processing, are ‘relics’ that can be traced back either to the RNA world that preceded both DNA and encoded protein synthesis or to the later ribonucleoprotein (RNP) world (before DNA took over the main coding role). RNA-continuity is based on independent evidence (in particular, the relative inefficiency of RNA catalysis compared with protein catalysis) and leads to a wide range of predictions, covering the origin of the ribosome, the spliceosome, small nucleolar RNAs, RNases P and MRP, and mRNA. It is also consistent with the wide involvement of RNA-processing and regulation of RNA in modern eukaryotes. While there may still be cause to withhold judgement on intron origins, there is strong evidence against the ‘very-late’ intron invasion model, in which introns were uncommon in the last eukaryotic common ancestor (LECA) and expanded only within extant eukaryotic groups.
Similarly, it is clear that there are selective forces on numbers and positions of introns; their existence may not always be neutral. There is still a range of viable alternatives, including introns first, early, and ‘latish’ (i.e. well established in LECA), and regardless of which is ultimately correct, it pays to separate out various questions and to focus on testing the predictions of sub-theories.

Keywords Introns · RNA world · Eukaryote origins · RNP world · Spliceosome · Introns early

Introduction

The introns-first theory was published just over a decade ago (Poole et al. 1998; Jeffares et al. 1998), and aimed to account for the origin of mRNA within an evolutionary framework for the origin of genetically encoded protein synthesis in the late stages of the RNA world (see Fig. 1 for a summary). There are three aspects to this hypothesis: that mRNA arose by co-option of expressed non-functional RNA, that the co-opted RNAs were interspersed between functional RNA genes, and that the core of the spliceosome, some extant genes, and their introns may be relics from this very early period. The last of these is directly testable in principle. Introns-first is thus part of a much wider analysis of the expectations of the continuity of RNA systems from the RNA and ribonucleoprotein (RNP) worlds to modern organisms. This more general RNA-continuity theory (Fig. 2a, see Penny and Collins 2009) is that many classes of RNA in modern eukaryotes have existed since these earlier phases; they are in our terminology ‘relics’, though of course the associated proteins would only have arisen after encoded protein synthesis developed. Thus, the introns-first hypothesis is an independent sub-hypothesis of this more general model, and the introns-first model needs to be evaluated independently; for example, the RNA-continuity hypothesis could stand even if the introns-first model were eventually rejected. We will see that the introns-first model shifts the focus onto eukaryotes, and if it could be established that eukaryotes were indeed formed de novo from an archaeal and a bacterial cell (see Embley and Martin 2006 for different models) then that would be very strong evidence against either introns first or early. Figure 2b shows the contrast between the RNA-continuity model and the more common idea that an early complexity of RNA control mechanisms from the RNP world was lost in prokaryotes (archaea and bacteria) and reinvented in eukaryotes.

The last decade has seen a major expansion in our knowledge of the roles of RNA in eukaryotes and has expanded the classes of RNA that are known and the questions that need to be addressed. An early focus was on ubiquitous RNAs with a processing function (e.g. rRNA, tRNA, snRNA, small nucleolar RNA [snoRNA], srpRNA, RNase P, and RNase MRP) and the exon/intron structure of eukaryote genes (see Gilbert 1987; Cavalier-Smith 2002; Rodríguez-Trelles et al. 2006; Di Giulio 2008a, b, and references therein), but the finding of the widespread and complex roles of RNA in eukaryote cells (including RNAi) has broadened the discussion to the extent that we now refer to the ‘RNA infrastructure’ of the eukaryote cell (Collins and Penny 2009). In particular, a range of regulatory RNAs have been identified in the last decade and include the many classes of small RNA involved in RNAi, such as miRNA, siRNA and piRNA, as well as their role in epigenetics (Carthew and Sontheimer 2009).

Fig. 1 The main components of the introns-first model. Introns and intron splicing arose in an RNA world organism where both the genome and the enzymes are composed of RNA. The double-stranded RNA genome (at top) contained RNA genes (filled boxes), interspersed with sequences that are non-RNA coding (open boxes). Transcription produces single pre-processed transcripts. These transcripts are then processed (spliced) to produce mature functional RNAs. Non-functional RNAs are also produced as a byproduct of the liberation of functional RNAs (such as snoRNAs). Some such byproducts were subsequently recruited to non-templated protein synthesis as a means to stabilize the interaction between two charged tRNAs during non-genetically encoded peptide synthesis (by pairing with what subsequently became the ‘anticodon’ loop). This model for the origin of mRNA suggests co-evolution of the genetic code and these earliest transcripts. The introns-first hypothesis proposed that the first proteins were initially selected for propensity to stabilize functional RNA and were not catalytic. Hence, introns are derived from RNA genes, and these were present prior to the evolution of protein-coding segments (exons).

Introns First, Early, Late, and the RNA-Continuity Hypothesis

A basic question is the extent to which the RNA infrastructure arose de novo in eukaryotes, or whether there is continuity of many classes of RNA, including those restricted to eukaryotes, from the later stages of the origin of life through to the present. Most researchers appear to consider eukaryotes as ‘advanced’ cells that must in some way be derived from ‘primitive’ prokaryotes. Although this is definitely possible, we think that under the three domains of life view, the data are currently insufficient to exclude the alternative that eukaryotes have remained relatively inefficient in, for example, their processing of mRNA. In contrast, we could consider bacteria and archaea as having evolved a very fast and efficient mRNA processing system, whether from thermoreduction (Forterre 1995), r-selection (Poole et al. 1998), efficient selection in large populations (Lynch 2007), chronic energy stress (Valentine 2007), or other reasons.
Currently, it is important to remain open-minded about different interpretations of eukaryote origins, and focus on using the available data to test the different models. It is unhelpful to prematurely decide on just one model.

Fig. 2 Two models for the origin of the high RNA complexity in eukaryotes. a Under the RNA-continuity model, the basic system of RNA processing of RNA in modern eukaryotes evolved in an earlier ribonucleoprotein stage of the origin of life, an RNP world. The model involves streamlining of RNA processing separately in bacteria and archaea, with the latter having retained some snoRNAs. There would be continued evolution (including expansion) of the RNA infrastructure in eukaryotes. b Under the RNA re-expansion model, there may have been the same early complex system of RNA processing of RNA, but this was largely lost via streamlining or replacement before LUCA. This subsequently re-expanded in eukaryotes. This model has one loss and one gain. The order of branching of archaea, bacteria, and eukaryotes is deliberately ambiguous; branching is independent of the models, and alternative topologies are consistent with either model. (Modified from Penny and Collins (2009).)

The first step is to outline the timing of the different hypotheses. For the origin of spliceosomal introns, a common distinction is between scenarios that consider introns as a late addition via horizontal transfer, or a remnant of the RNA or RNP worlds. There are plausible, detailed models for the late origin and spread of the spliceosome and spliceosomal introns in eukaryotes, these having derived in a stepwise manner from group II introns (Hickey 1992; Stoltzfus 1999); this is introns-late. An origin for group II introns in the RNA world has likewise been proposed, a form of the introns-early model (Gilbert and de Souza 1999) distinct from the original exon theory of genes.
Moreover, the possibility of a rudimentary spliceosomal apparatus dating back to the RNA world has at times been advocated by several authors (Reanney 1979; Darnell and Doolittle 1986; Poole et al. 1998).

As shown in Fig. 3, a continuum of hypotheses is possible, and the figure emphasizes that there are no hard and fast boundaries between introns first/early and introns early/late. The divisions arise naturally, and allow for an early biochemical phase during the origin of life (e.g. Martin and Russell 2003). It is generally assumed (e.g. Lincoln and Joyce 2009; Cech 2009; Sharp 2009; Penny 2005) that RNA preceded encoded protein synthesis, and that this RNA–protein world preceded DNA being used as the main information storage macromolecule. This gives a natural tripartite division into introns-first (some introns arose in the RNA world), introns-early (introns date from the RNP world), and introns-late (introns post-date DNA as the main coding macromolecule, the DNA world). Thus, we use the division between the RNA world, the RNP world, and the DNA world as our primary distinction, though agreeing that rigid distinctions are overly simplistic. For example, as illustrated in Fig. 3a, there will be overlaps between first/early and early/late. Again, the intron/exon structure could, in principle, have arisen in a common ancestor of eukaryotes and archaea (the introns-middle model, Fig. 3b), and whether this was pre- or post-DNA depends on when DNA took over the main coding role (see Forterre and Gribaldo 2007).

Fig. 3 The introns first, early, and late models for the origin of spliceosomal introns, expressed in a linear form (a) or as a tree (b). The basic subdivisions are based on whether the intron–exon structure arose in the RNA world (introns first, before encoded protein synthesis), in the RNP world (introns early), or after the origin of DNA synthesis (introns late). The latter is subdivided into introns ‘latish’ (after DNA synthesis, but well before the last common ancestor of eukaryotes), and introns ‘very late’ (with spliceosomal introns spreading within modern eukaryotes). Gradation between the hypotheses is possible, especially between introns first and early. A range of options is shown for the origin of eukaryotes, from the left hand arrow with them being old (with relatively inefficient RNA processing) to the right hand arrow with eukaryotes arising late in evolution (often being inferred as symbiosis between an archaeon and a bacterium)

Introns-early is basically that introns arose around the time of the origin of protein synthesis. It suggests that introns aided the rapid diversification of early proteins by allowing recombination of smaller sections (possibly functional modules) of proteins. Such new recombinants may have had a significant selective advantage, and thus introns would have been selected in association with the new recombinants (hitch-hiking).

The standard introns-late theory is that the intron/exon structure of genes, and the associated spliceosome, arose well after protein coding and synthesis were established. The hypothesis is usually associated with prokaryotes being well established before eukaryotes arose. It divides naturally into two: ‘introns latish’ (Sverdlov et al. 2007), which would have introns expanding before the last common ancestor of eukaryotes (possibly after the endosymbiosis of the mitochondrial ancestor), and ‘introns very late’, which would have eukaryote introns expanding only within extant eukaryotes. Here, there has been significant progress over the last decade and it is therefore appropriate to reconsider the situation.

Although timing is the defining criterion of the introns-first theory, there is a much wider range of questions to be considered, and some already lead to testable predictions which can be evaluated with current knowledge. A general concern (Penny and Phillips 2004; Poole and Penny 2007) is that we tend to treat complex theories somewhat as ‘slogans’, and do not divide up a theory into its different components. These sub-theories (components) are often testable independently. So, here, we split the ideas into those specific to introns-first, followed by others that apply only to the more general RNA-continuity model. Obviously, the RNA-continuity model could eventually be established, even if the introns-first model was rejected.

Introns-First

a. Timing; introns per se date back to the RNA world.
b. Some original introns were functional RNAs.
c. The origin of introns pre-dates protein-coding mRNAs.
d. Continuity: at least some introns are still present from an RNA or RNP world; they have not all been lost, and then reappeared ‘de novo’ (but some would have evolved new functions).
e. Spliceosomal introns have been lost from bacteria and archaea by reductive evolution (this is also required by the introns-early model).
f. Primordial intron position is not correlated with protein structural modularity in eukaryote proteins.
g. Intron positions are dynamic, as opposed to always in the same position. In other words, they can (on an evolutionary time scale) be lost, gained, duplicated, or drift in position. (This contrasts with an earlier view that introns are fixed in their position.)

Other RNA-Continuity Aspects

h. Some other RNAs are relics from the RNA world (ribosomes and tRNA are the best known and clearest examples).
i. The spliceosome is a very large macromolecular RNP complex that must have evolved in a stepwise manner. Understanding its origins is crucial to understanding the origin of the current intron/exon gene structure.
j. Some processes may likewise be relics (e.g. RNA processing in tRNA, rRNA and mRNA maturation); it is an over-simplification to concentrate only on introns.
A variety of other inter-linked processes should be evaluated (e.g. splicing, nonsense mediated decay, and other transcription-related processes, Collins and Penny 2009).

In the past, a major focus has been on the intron/exon structure of eukaryote genes, but it is clear from the ten points above that the RNA-continuity hypothesis is much broader, and involves evaluation of the evolutionary origins of all classes of non-coding RNAs. For example, the nature of the ancestral spliceosome is a fundamental question under introns first or early, and its origin is often ignored under the standard introns-late hypothesis (see, however, Stoltzfus 1999; Scofield and Lynch 2008; Veretnik et al. 2009). Under the introns-late approach, the favoured model is that the spliceosomal introns in eukaryotes are derived from group II introns in bacteria; this is certainly the current consensus, even though the data are at best circumstantial (see later). Having introduced the ‘RNA-continuity’ concept, the next step is establishing criteria for its evaluation.

General Principles

In this section, we give some of the general scientific principles required for evaluating the RNA-continuity theory, and the criteria we expect such a scientific evolutionary theory to meet. Perhaps there does need to be more emphasis in biology on seeing what major principles can be derived from physical and chemical properties (De Nooijer et al. 2009). Specifying the principles should make it easier to evaluate arguments for or against a line of reasoning. Here, we consider seven main aspects: catalytic efficiency, the error rate limitation on genome size (the Eigen limit), the continuity of intermediate forms, effects of population size, agreement from prior knowledge, reductive evolution, and no predetermined direction of change.

Catalytic Rate

Proteins are generally better catalysts than RNA.
The data in Table 1 for both turnover (kcat) and catalytic efficiency (kcat/Km) show that proteins are much faster than comparable ribozymes (RNA-based enzymes). This result leads to our primary hypothesis (Jeffares et al. 1998) that once a protein is carrying out a catalytic reaction, a ribozyme will not displace it: we expect that RNA will never ‘take back’ a catalytic role that proteins are already performing. This gives a direction of change to macromolecular evolution, from ribozymes to protein enzymes. The hypothesis is that a reaction catalysed by a ribozyme was never catalysed in that lineage by a protein. This direction of change is a simple consequence of proteins being catalytically more effective than ribozymes (though as discussed in Jeffares et al. 1998, the selective pressure will be lower on ribozymes acting on macromolecular complexes where catalysis is limited by diffusion times), and RNA catalysis still plays a major role in eukaryotes (Cech 2009). An excellent example of proteins taking over a catalytic function occurs in human mitochondria (and probably those of other mammals), where RNase P is no longer a ribozyme because it has lost its catalytic RNA component; it is now a protein enzyme (Holzmann et al. 2008). This is a striking example of support for a prediction about the direction of change. Even though we predict the direction of catalysis to go from ribozymes to proteins, this still allows diversification of existing small RNAs into new roles, especially as guide RNAs and in regulation of expression. It is the complexity of RNA processing of RNA in eukaryotes that is so striking, and especially the catalytic roles of RNA in RNA processing (Valadkhan et al. 2009; Collins and Penny 2009).
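To make the size of the gap in Table 1 concrete, the following minimal Python sketch divides protein turnover numbers by ribozyme turnover numbers. The kcat values are copied from Table 1; the particular pairings and the names `KCAT_PER_MIN` and `fold_speedup` are illustrative assumptions made for this sketch and are not part of the original analysis.

```python
# kcat values in min^-1, copied from Table 1.
KCAT_PER_MIN = {
    "Polynucleotide kinase (ribozyme)": 0.3,
    "RNase P RNA (ribozyme)": 1.0,
    "T4 polynucleotide kinase (protein)": 25_000.0,
    "RNase T1 (protein)": 5_700.0,
}


def fold_speedup(protein: str, ribozyme: str) -> float:
    """Return how many times higher the protein kcat is than the ribozyme kcat."""
    return KCAT_PER_MIN[protein] / KCAT_PER_MIN[ribozyme]


if __name__ == "__main__":
    # Pairings are an illustrative choice: two kinases, and two RNA-cleaving catalysts.
    pairs = [
        ("T4 polynucleotide kinase (protein)", "Polynucleotide kinase (ribozyme)"),
        ("RNase T1 (protein)", "RNase P RNA (ribozyme)"),
    ]
    for protein, ribozyme in pairs:
        ratio = fold_speedup(protein, ribozyme)
        print(f"{protein} vs {ribozyme}: {ratio:,.0f}-fold higher kcat")
```

Even for the two kinases, which catalyse the same type of reaction, the protein turns over substrate several orders of magnitude faster, which is the asymmetry the direction-of-change argument relies on.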
Table 1 Turnover numbers for ribozymes and proteins

Catalyst                       kcat (min^-1)    kcat/Km (M^-1 min^-1)
Tetrahymena L-21 (SacI)        0.1              9.0 × 10^7
Polynucleotide kinase (a)      0.3              6.0 × 10^3
19-base virusoid               0.5              8.3 × 10^5
L-19 intron                    1.7              4.3 × 10^4
RNase P RNA                    1                2.0 × 10^6
RNase P RNA and protein        2                4.0 × 10^6
RNase T1 (b)                   5,700            1.1 × 10^8
Staphylococcal nuclease (b)    5,700            6.0 × 10^8
T4 polynucleotide kinase (b)   25,000           6.0 × 10^8
Triose-P isomerase (b)         258,000          1.4 × 10^10
Cyclophilin (b)                780,000          9.0 × 10^8
Carbonic anhydrase (b)         600,000,000      7.2 × 10^9

Turnover number is kcat. Values largely from Jeffares et al. (1998); note that the units of time are minutes
(a) Artificial ribozyme evolved in vitro
(b) Protein catalysts

Eigen Limit

One of the most profound discoveries from origin of life theory was the calculation that the higher error rate of copying in RNA-based systems places a very strong upper limit on the size of a genome. Above this limit, there are too many errors per replication for selection to maintain the optimal sequence (see Eigen 1992) and there is an ‘error catastrophe’. Above this limit, the sequence randomizes. This places a very strong limit on the size of any genome, especially if it is being copied by a ribozyme (Jeffares et al. 1998; Poole 2006). It also gives a strong selective force towards protein involvement in catalysis (replication specifically) and what we call the Darwin–Eigen cycle (Poole et al. 1999; Penny 2005). This is a positive feedback cycle that favours increased fidelity, leading to longer coding sequences, allowing additional genes to be coded for, which allows for increased fidelity, and so on. Overall, the reduced coding capacity in early living systems puts strong limits on what genomes were possible, and it leads to the expectation that recombination (as between RNA viruses) would have been important in an RNA world (Reanney 1987; Jeffares et al. 1998; Lehman 2003; Santos et al. 2004).
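The scaling behind the Eigen limit can be sketched numerically. The example below assumes the commonly quoted approximation to the error threshold, L_max ≈ ln(σ)/u, where u is the per-base error rate per replication and σ is the selective superiority of the best sequence over its mutants. The function name and the error rates and σ used in the demonstration are hypothetical values chosen only to show the scaling; they are not measurements from the text.

```python
import math


def max_genome_length(per_base_error_rate: float, superiority: float) -> float:
    """Approximate error-threshold genome length, L_max ~ ln(sigma) / u.

    Assumes small per-base error rates, so that -ln(1 - u) is close to u.
    """
    if not 0.0 < per_base_error_rate < 1.0:
        raise ValueError("per_base_error_rate must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority must exceed 1 for selection to act")
    return math.log(superiority) / per_base_error_rate


if __name__ == "__main__":
    sigma = 10.0  # hypothetical selective superiority of the master sequence
    # Hypothetical error rates, from a low-fidelity to a higher-fidelity copier.
    for u in (1e-2, 1e-3, 1e-4):
        limit = max_genome_length(u, sigma)
        print(f"error rate {u:g} per base -> about {limit:,.0f} bases maintainable")
```

Because the limit is inversely proportional to the error rate, each tenfold gain in copying fidelity allows a roughly tenfold longer genome, which is the step that the Darwin–Eigen cycle repeats.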
Continuity of Functional Intermediates

It is not possible under any form of Darwinian evolution for a spliceosome to just ‘appear’ when it is needed, such as immediately following an ‘invasion of the introns’. Related to this is that there cannot be selection ‘for’ something that does not yet exist, that is, for something that will only be useful in the future. The origin of the ribosome is one such example; under standard evolutionary theory, ribosomal RNA must have had a prior function before being co-opted into protein synthesis. We have suggested that the first function for the proto-ribosome was as an RNA-dependent RNA polymerase that added nucleotides three at a time, thus improving replication fidelity (Poole et al. 1998, 1999). We call such a hypothetical enzyme an RNA triplicase. As an aside, this relies on the ‘genomic tag’ hypothesis (Maizels and Weiner 1999; see also Fedorov and Fedorova 2004; Sun and Caetano-Anolles 2008), which is a way of distinguishing genomic copies of RNA from functional RNAs. The excision of such RNAs out of a precursor transcript via action of a primordial spliceosome (Poole et al. 1998, 1999) constitutes one plausible mechanism (Fig. 1). However, the model for its origin by Bokov et al. (2009) does not appear to be possible as published, because it does not establish a continuous series of functional intermediates, though that problem could no doubt be fixed in that case. It is standard in molecular evolution for a gene selected for one function to be ‘recruited’ or ‘co-opted’ for a related function. In this context, plausible models must give a stepwise origin of the spliceosome where the intermediate stages are functional. The spliceosome is larger even than the ribosome, and has five small RNAs (U1, U2, U4, U5, and U6), together with (in humans) up to 200 proteins (Jurica and Moore 2003). While the evolution of the spliceosome should be a focus of all theories for the origin of introns, only a minority of authors address it.
One is Stoltzfus’ elegant and detailed model, under the introns-late hypothesis, wherein he describes a stepwise model for the emergence of the spliceosome from mitochondrially derived group II self-splicing introns (Stoltzfus 1999); another is the view that a rudimentary RNA-based splicing machinery would have originally been used in error correction: trans-splicing allows an early possible mechanism for recombination (Reanney 1979), though we cannot test that prediction yet. The introns-first hypothesis proposes a third (compatible with the second): that the primordial splicing machinery enabled generalized RNA gene expression in an RNA world.

Population Size and Slightly Deleterious Mutations

Lynch (2002, 2007) has correctly pointed out that the effective population size (Ne) is important in comparing evolution between bacteria/archaea and eukaryotes and for evaluating the origin of genomic elements such as introns. Simply put, Lynch points out that slightly deleterious mutations (such as an additional intron in a protein-coding gene) are more likely to drift to fixation in a species with a small population size. Conversely, such mutations are less likely to be fixed in species with a large population size, to the extent that with very large population sizes (such as extant bacteria) certain deleterious elements have virtually no chance of drifting to fixation. While these arguments generally assume an introns-late model, introns-early models are quite compatible with this aspect of evolutionary theory, as follows. First, the same population genetics theory indicates that species with large Ne will have an increased likelihood of fixing slightly advantageous elements. While Lynch’s initial formulation assumed that introns were slightly deleterious (Lynch 2002), it is now well understood that many introns contain functional elements (see below).
Second, it is perhaps unlikely to expect that in the RNA or RNP worlds, Ne was large and selection was efficient; we would think the opposite would be more likely, at least in the earliest stages, where replication fidelity was low, likely necessitating error correcting mechanisms such as redundancy and recombination (Reanney 1987; Poole 2006). Consequently, primordial genome architecture would not closely resemble the streamlined architecture of modern bacteria and archaea, which is arguably the product of large Ne and efficient selection. A key question is therefore whether all early life underwent a period of such efficient selection, or whether this has only effectively operated on some lineages. Our simulation results (De Nooijer et al. 2009) indicate that both smaller primary producers and larger consumers are expected to have occurred very early. Since predators consistently have smaller population sizes than prey, there are likely to be a range of population sizes even in these early stages of the evolution of life. So, even if all intronic elements were slightly deleterious, or initially neutral, it is plausible that non-coding intronic elements would drift at some frequency in some of the prey populations until they evolve some function. It is interesting to note here that population size, the strength of purifying and adaptive selection, genome size, replication, and translation fidelity are all interrelated, and so need to be considered together. These arguments show that while population size affects intron dynamics, it does not in itself distinguish between the introns first/early/late models without other supporting information. 
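The dependence of fixation on population size can be illustrated with Kimura's diffusion approximation for the fixation probability of a single new mutation in a diploid population, P = (1 − e^(−2s)) / (1 − e^(−4Ns)), with P = 1/(2N) in the neutral case. This formula is a standard population genetics result and is not taken from the text; the sketch assumes that the effective and census population sizes are equal, and the selection coefficient and population sizes in the demonstration are hypothetical.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for one new mutation (diploid, Ne = N assumed).

    s is the selection coefficient (negative means deleterious); n is population size.
    """
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral expectation
    exponent = -4.0 * n * s
    if exponent > 700.0:
        # e**700 is close to the float limit; the probability is effectively zero.
        return 0.0
    # expm1(x) = e**x - 1, which stays accurate when x is close to zero.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


if __name__ == "__main__":
    s = -1e-4  # hypothetical cost of one extra intron
    for n in (1_000, 1_000_000):
        p = fixation_probability(s, n)
        neutral = fixation_probability(0.0, n)
        print(f"N = {n:>9,}: P(fix) = {p:.3e}, relative to neutral = {p / neutral:.3e}")
```

Under these assumptions the same slightly deleterious element fixes at close to the neutral rate in the small population and essentially never in the large one, while a slightly advantageous element (positive s) gains over neutrality as N grows, which matches the two-sided reading of the population size argument.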
Table 2 Population effects (Ne) occur under introns first, early, or late

Column headings: Population effect (Ne) if introns derived; RNA relics; Population effect (Ne) if introns ancestral
Key: ✓ = yes, accounted for; ? = not directly accounted for, but not incompatible

snoRNAs in introns
  Expansion of introns and small RNAs may be expected under small Ne, but this does not account explicitly for their origin, and does not preclude a very early origin
  SnoRNAs argued to date back to RNP world, with the intronic position possibly being an ancestral feature
Introns first
  Does not directly account for origins, but is not incompatible (see above)
  Aims to explain the origin of mRNA in the context of the RNA to protein transition
Introns early
  ? (see above)
  Suggests group II ribozymes have an RNA world origin
Introns late
  Does not directly account for origins, but is not incompatible (see above)
  Not compatible, but an RNA world origin does not exclude later intron expansion (e.g. subsequent to the emergence of full meiosis with outcrossing)
Spliceosome
  A late origin is not a requisite for the population size effect model; a late origin would be argued to be non-adaptive.
  If common origin (Fig. 2c) then maybe increased complexity in eukaryotes and/or reductive evolution in bacteria and archaea.
Intron expansion
  Expansion is not incompatible with an early origin or with a late origin
  Expected under small Ne, sexual outcrossing. Opposite may occur under asexual reproduction and in unicellular lineages with large Ne

If introns did remain until the last universal common ancestor (LUCA), then the complete loss of spliceosomal introns in bacteria and archaea and recurrent loss of introns in eukaryote lineages could be due in part to changes in population size, along with many of the conditions that favour intron loss or accumulation (see below).
The initial origin of introns, and the spliceosome, and the extent to which RNA splicing is intertwined with many aspects of RNA processing (Collins et al. 2009) are not well described by population size effects alone. Table 2 shows some of our reasoning.

Prior Information

In science, it is always positive when background information allows predictions about other types of events. In this sense, introns-first is derived from more basic principles, and makes predictions about phenomena that were not included in the original data that were used to formulate the hypothesis. It is based on two major scientific observations: the relatively poor catalytic performance of RNA as compared with proteins, and the limitations on genome size that follow from the relatively poor catalytic properties of RNA relative to protein (the Eigen limit above, and see Table 1). This is in contrast to both the introns-early and introns-late theories, which are more ‘post-hoc’ hypotheses, proposed to explain phenomena already observed. They do not aim to explain any phenomena other than the origin of introns. We have to be careful here not to overgeneralize, but the introns-first theory was proposed (Jeffares et al. 1998) as a solution to the stepwise evolution of protein synthesis from an RNA world.

Reductive Evolution

There is now strong evidence that reduction of eukaryote genomes has occurred on several occasions, and examples include yeasts, parasites, and small algae (Derelle et al. 2006). Such reduction often includes extensive intron loss, especially in eukaryotes with a short life cycle (Jeffares et al. 2006). Processes of mitochondrial loss, gene loss, and intron loss in eukaryotes are now quite well understood, along with reasonable explanations for why such loss can be advantageous.
A fascinating, though clearly derived, example of complete intron loss comes from the recently sequenced nucleomorph genome of Hemiselmis andersenii, which was reported to have lost spliceosomal introns completely (Lane et al. 2007). We sometimes refer to such extreme genome reduction in eukaryotes as ‘prokaryote-envy’, but more seriously our knowledge of genome reduction makes the view that prokaryotes are derived by genome reduction much more plausible than it seemed 10 years ago. The weight of evidence for intron loss, both the rate and the phylogenetic frequency (few lineages seem to show gain; Roy and Gilbert 2006; Koonin 2006; Mourier and Jeffares 2003), is one example of how our views have changed, mainly due to the availability of genome data. Additionally, we are now starting to understand how selection might influence intron gain, retention, and loss (see below).

Some (reduced) eukaryote genomes have very few introns, and initially these were regarded as evidence that the last common ancestor of eukaryotes had few introns (Logsdon 1998). This is the ‘introns very late’ hypothesis: that introns (and the spliceosome) may have arisen within extant eukaryotes. The idea that introns arose and spread only within eukaryotes is similar to the (now discredited) Archezoa hypothesis that some modern eukaryotes had never had mitochondria (see Embley and Martin 2006; Poole and Penny 2007). The recognition that there were derived parasitic and anaerobic eukaryotes with reduced genomes (and mitochondria) thus discredits both the original Archezoa hypothesis and the introns-very-late hypothesis. So, we have divided ‘introns late’ into ‘introns latish’ (introns had arisen early in eukaryote evolution and were well established by the time of the last eukaryotic common ancestor [LECA]) and ‘introns very late’ (introns arose and spread within extant eukaryotes). The former is still possible; the latter is now rejected.
The Direction of Change Cannot be Assumed A Priori

The final principle discussed here is that, in evolution, there is no generally guaranteed direction of change from simple to complex or vice versa. In each case, independent evidence must be found, for example, the catalytic efficiency argument for the direction from ribozyme to protein enzymes. In relation to introns, the prevailing view is that ‘simple’ group II introns evolved into the complex assemblage of spliceosome and introns. While a reasonable model exists, it is still difficult to establish on current evidence the evolutionary relationship between spliceosomal and group II introns. Figure 4 shows a range of possibilities for the relationship between the eukaryote spliceosomal introns and the group II self-splicing introns. Several important results (Hetzer et al. 1997; Sashital et al. 2004; Seetharaman et al. 2006; Valadkhan et al. 2009) strengthen the argument that the spliceosome is an RNA catalyst that shares a common molecular ancestor with group II introns. A good null hypothesis here is the proposal voiced by Weiner (1993) that the similarities in catalysis between group II introns and the spliceosome (and spliceosomal introns) may be the result of chemical determinism. If that were the case, the two could not be concluded to share a common origin (Fig. 4a). To our knowledge, no one has provided unequivocal evidence for common descent (Fig. 4b), though on the weight of circumstantial evidence there may perhaps be consensus for the latter. However, that consensus is perhaps nearer the scenario in Fig. 4c than an acknowledgement of common descent. At the risk of stating the obvious, to say that the spliceosome (an extant and highly evolved structure) evolved from group II self-splicing introns (another extant and highly evolved structure) is analogous to saying that humans (extant) evolved from chimpanzees (extant); the opposite (Fig. 4d) is no more productive.
A better inference may be that the two extant species (or structures in the spliceosomal case) evolved from a common ancestor. Unfortunately, there seems to be little evidence from the RNA structures alone to determine which of the two (spliceosomal splicing or group II self-splicing introns) is closer to the ancestral state. More generally, given that self-splicing introns pursue a horizontal lifestyle that is expected to lead to repeated cycles of insertion, atrophy, loss, and reinsertion, it is also difficult to establish the antiquity of group II introns; given that introns found in mitochondria are subject to this cycle (Goddard and Burt 1999), their presence in extant mitochondria cannot be trivially deemed ancestral by reference to the bacterial origin of this organelle. Finally, it is not sufficient to assume that it is the simpler of the two (group II introns) because we know that reductive evolution and gene fusion do occur.

Fig. 4 Four hypotheses for the relationship of Group II and spliceosomal introns. a The two types of introns arose independently. b Independent modifications from an unknown common ancestor (ancestral state not specified). c Spliceosomal introns are derived from bacterial group II introns (usually assumed to be associated with the endosymbiotic origin of mitochondria). d Reductive evolution if bacteria arose by reductive evolution

Improved Understanding Over the Last Decade

There are three relevant areas on which we will concentrate in regard to intron evolution. The first is the recognition over the last decade that the last common eukaryote ancestor was already quite complex in both its biochemistry and its cellular organization. Then there is the improved understanding of the gain and/or loss of introns as revealed by comparative genomics, giving a much more dynamic view of intron evolution (early ideas considered introns more static).
Finally, there is the improved understanding about the selective forces that can affect the average number of introns per protein-coding gene in a species.

The Complexity of LECA

It would be much easier to infer the biochemical, molecular, and sub-cellular features of LECA if we were confident about the position of the root of the eukaryotes. However, the deep phylogeny of eukaryotes remains unresolved (Keeling et al. 2005), and so our most reliable approach has been to search for functions present in all six main eukaryote lineages. If a feature occurs in all six groups, then the ancestor is expected to have had that feature, independently of where the root actually is. The developing conclusion is that LECA does appear to have had a surprisingly high complexity (De Duve 2007; Hartman and Fedorov 2002; Kurland et al. 2006; Poole 2009). In particular, the RNA ‘control’ of RNA processing and regulation appears well developed (Collins and Chen 2009). Neither the overall complexity nor the high reliance on RNA in processing and regulation of other RNA molecules is well explained by most models of eukaryote origins that assume eukaryotes are ‘advanced’ or ‘derived’. Of relevance here is the apparent presence of a complete spliceosome with five U-RNAs and over 80 proteins in LECA (Collins and Penny 2005; Veretnik et al. 2009). In humans, the spliceosome comprises around 200 proteins, but current techniques (including ancestral sequence reconstruction) only identified 84 of them in LECA; the main point is that these proteins were distributed over all the sub-components of the spliceosome. In addition, there are two classes of introns, major and minor, with the minor spliceosome having U11 and U12 snRNAs instead of U1 and U2. Russell et al. (2006) and Davila Lopez et al. (2008) provide a range of evidence that the minor spliceosome is also very ancient in eukaryotes, though it is not completely clear yet whether it also dates back to the LECA.
Clearly then, splicing evolved at the very latest in the eukaryote stem, and was a well-established feature of LECA. There are other examples of new data changing our view in this area. Alternative splicing was sometimes assumed to have occurred ‘for’ the origin of multicellularity, though given the principles of evolution outlined earlier such a view is not tenable: a feature cannot be selected solely because it might give a hypothetical advantage at some unspecified time in the future. However, recent work makes it seem that alternative splicing is relatively ancient within eukaryotes (Irimia et al. 2007b; Tarrio et al. 2008) and so was available to be recruited into development of multicellular animals, rather than having evolved ‘for’ that process. A similar example seems to arise with apoptosis (programmed cell death, Nedelcu 2009), which occurs in unicellular eukaryotes, and thus would have been available later for multicellular development. Another well-known aspect of RNA is the discovery of RNAi and its complex set of regulatory reactions. The full extent of these small RNA types in deep lineages of eukaryotes makes it likely that RNAi will also prove to be an ancestral feature (Collins and Chen 2009). We do need to be careful here because new members of existing RNA families almost certainly emerge in some lineages (e.g. Lu et al. 2008). It is therefore important to separate (Collins and Penny 2009) such new examples of existing RNA types (where all the protein machinery is already present) from genuine novelties where a new class of regulatory RNAs has arisen. In general, the newer results are consistent with some form of RNA continuity remaining in eukaryote genomes, including for processes such as recombination and meiosis (Egel and Penny 2007). We had assumed earlier that RNase MRP (which is involved in processing rRNA) had arisen just within eukaryotes, but it has now been found in all groups of eukaryotes that have been well studied (Woodhams et al.
2007), and so is now more likely to have been in LECA. Not all classes of RNA can currently be inferred to be in LECA. In our earlier papers, we were impressed by the large RNP ‘vault’ particles in many eukaryotes, but their distribution is still uncertain. To some extent, there seemed to be a decrease in interest in them, but now that a three-dimensional crystal structure is available (Tanaka et al. 2009), interest has certainly revived. There has been recent work on identifying the vault RNAs themselves (vRNAs, Mosig et al. 2007), but the proteins are probably easier targets to identify. We have concentrated here on RNA, but the protein-fold diversity in eukaryotes also has important information about the molecular components (including RNPs) LECA may have contained (Lecompte et al. 2002; Wang and Caetano-Anolles 2009).

Intron Early/Late and Introns Fixed/Dynamic

There has been considerable progress in the last 10 years in describing the presence and absence of introns in eukaryote genomes, primarily because of the availability of genome data. Earlier analyses were based on the best data available: relatively deep phylogenetic studies of a handful of genes (e.g. triosephosphate isomerase, Logsdon et al. 1995). However, rates of gain/loss are stochastic at best and probably biased (Carmel et al. 2007; Jeffares et al. 2008). In effect, it appears to us that some of the early debate was comparing ‘introns early and fixed in position’ and ‘introns late and mobile’. With the establishment that introns could be gained and lost during evolution, it appeared initially that this favoured ‘introns late and mobile’. Recent work has eliminated the introns ‘very late’ proposal, namely that the eukaryote ancestor had only a few introns and that they increased during the evolution of extant eukaryotes.
It is now clear that the average number of introns per gene was relatively high early in the evolution of modern eukaryotes, and that most lineages have undergone more loss than gain of introns over the last billion years at least (e.g. Roy and Irimia 2009a). Furthermore, although some early branching lineages are relatively intron poor, this is probably the result of extensive loss, since others have many introns. For example, Slamovits and Keeling (2006) report that some excavates have a high number of introns per gene. In addition, it appears that the motifs for intron recognition sites are stronger when there are few introns per gene (Irimia et al. 2007a); selection may be stronger with fewer choices. We also understand more about rates of intron gain and loss, and several accounts are available (Fedorov et al. 2003; Roy and Irimia 2009b). There are a few accounts where a lineage has undergone considerable intron gain (Roy and Penny 2007; Carmel et al. 2007) and at last some examples of recent intron gain (Omilian et al. 2008). However, studies that use phylogenetic inferences to infer rates of gain and loss generally find that rates of loss are several orders of magnitude greater than rates of gain (Roy and Penny 2007; Coulombe-Huntington and Majewski 2007). An important recent development, though, has been new models of intron gain, wherein an exonic region becomes a new intron, so-called ‘intronisation’ (Irimia et al. 2008; Catania and Lynch 2008; Catania et al. 2009). Depending on whether these new introns arise initially as minor splice variants, and/or in generally less conserved exons, such new introns may not be well detected by previous methods. So it is possible that intron gain rates have been underestimated. Nevertheless, it is clear that rates of gain and loss can vary considerably between species, over evolutionary time, and between genes of one genome (Carmel et al. 2007).
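As a rough illustration of the bookkeeping behind such gain and loss estimates, the following Python sketch counts gains and losses along a single branch by comparing intron presence at homologous positions in an ancestor and a descendant. It assumes that the ancestral states have already been inferred, which is the difficult step and is not attempted here. The function name, the position labels, and the presence/absence values are all hypothetical and are not taken from any of the studies cited above.

```python
def count_gain_loss(ancestor, descendant):
    """Count intron gains and losses along one branch.

    Both arguments map a homologous intron position (any hashable label)
    to True (intron present) or False (intron absent). Positions missing
    from a mapping are treated as absent.
    """
    positions = set(ancestor) | set(descendant)
    gains = sum(
        1 for p in positions
        if not ancestor.get(p, False) and descendant.get(p, False)
    )
    losses = sum(
        1 for p in positions
        if ancestor.get(p, False) and not descendant.get(p, False)
    )
    return gains, losses


# Hypothetical example data (illustrative only, not real measurements).
ancestor = {"pos1": True, "pos2": True, "pos3": True, "pos4": False}
descendant = {"pos1": True, "pos2": False, "pos3": False, "pos4": True}

gains, losses = count_gain_loss(ancestor, descendant)
print(f"gains={gains}, losses={losses}")
```

In a scheme like this, a gain is counted only if the new intron is scored as present in the descendant, so an intron that exists only as a minor splice variant could easily be missed. That is one way in which gain rates might be underestimated.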
In summary, we can now rule out the ‘introns very late’ variant of the introns-late model (where introns arise after LECA); we know that introns are lost frequently, and we suspect that they are gained less frequently.

Functions of Introns: Continuity with Earlier Stages of Evolution?

A variety of studies have shown functionality for specific modern introns, including playing host to snoRNAs and miRNAs (Niu 2007; Brown et al. 2008; Ying and Lin 2009). Studies of selective constraint in genomes indicate that intronic sites are subject to purifying selection (Halligan and Keightley 2006; Guo et al. 2007; Gazave et al. 2007; Gaffney and Keightley 2006), suggesting that there may be many more undiscovered functional elements in introns. An assumption of the introns-first hypothesis is that introns had functions even before the origin of encoded protein synthesis. For the moment, we will ignore any possible direct catalytic role and mention only the possibilities of assisting processing and stabilization of early ribozymes (such as the ribosome). For example, it is well established that snoRNAs are required for pre-rRNA processing, or modification of other stably transcribed RNAs. It is an expectation of introns-first that some modern introns are derived from these. As shown in Table 3, some sites of pseudouridylation and methylation in rRNA are conserved across members of all three domains. Furthermore, sno-like sRNAs homologous to C/D- and H/ACA-box snoRNAs are found in archaea (Gaspin et al. 2000; Omer et al. 2000; Rozhdestvensky et al. 2003; Tran et al. 2005). An examination of the phylogenetic distribution of snoRNA families curated by Rfam (Gardner et al. 2009) indicates that only 3 out of the 489 families in release 9.1 are shared between archaea and eukaryotes; the remainder appear to be domain specific for archaea (47 families) and eukaryotes (438 families), respectively (data not shown).
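The proportions implied by these Rfam counts can be tabulated with a few lines of Python. This is only an arithmetic sketch using the numbers quoted above; the variable and function names are our own. Note that the three quoted categories sum to 488 rather than 489. The text does not say how the remaining family is classified, so the sketch simply reports it rather than assigning it.

```python
# Counts as quoted in the text for snoRNA families in Rfam release 9.1.
TOTAL_FAMILIES = 489
quoted_counts = {
    "shared archaea/eukaryotes": 3,
    "archaea only": 47,
    "eukaryotes only": 438,
}


def summarize(counts, total):
    """Return each category's share of the total, plus the number of
    families not covered by the quoted categories."""
    shares = {name: n / total for name, n in counts.items()}
    unassigned = total - sum(counts.values())
    return shares, unassigned


shares, unassigned = summarize(quoted_counts, TOTAL_FAMILIES)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"families not covered by the quoted categories: {unassigned}")
```

On these figures, well under 1% of the curated families are shared between the two domains.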
An additional intriguing case is that of snoRNA U3, which provides a circumstantial case for snoRNAs in LUCA; it is present in eukaryotes, and there is, moreover, a U3 snoRNA-like fold in cis, with a similar role in pre-rRNA in bacteria (Dennis et al. 1997). In addition to roles in processing or modification, archaeal sRNAs might also act as chaperones, aiding folding of rRNA and tRNA (Dennis and Omer 2005; Schoemaker and Gultyaev 2006). That might in turn indicate a similar primordial role for s(no)RNA in an RNA world, though such speculation is difficult to test experimentally.

Table 3 Some sites of pseudouridylation and methylation are at homologous positions on the rRNA across all three domains of life

Modification       Subunit   Archaea   Human/yeast   Archaea/Eukarya   Archaea/Eukarya/Escherichia coli
Pseudouridylation  LSU       4         23            3                 2
Methylation        SSU       11        13            2                 1
Methylation        LSU       31        23            7                 2

Data derived from: Ofengand and Bakin (1997), Dennis et al. (2001), and del Campo et al. (2005)

Selective Forces Affecting the Numbers of Introns

Various biological factors have been proposed to influence the intron density of a gene and the organism as a whole (Jeffares et al. 2006). The preponderance of intron loss in eukaryotes supports the view that introns are generally deleterious in eukaryote genomes. Lynch (2002) has proposed that introns are deleterious because of the extra mutational load they confer (an intron-containing gene contains many more sites that are essential to its proper expression than an intron-less version). Two patterns of intron density within genomes help to describe in more detail why introns are deleterious. Based on the observation that highly expressed genes have shorter introns in humans and Caenorhabditis elegans, it was proposed that there had been selection against the energetic cost of transcribing long introns (Castillo-Davis et al. 2002). However, the opposite pattern was observed in plants: highly expressed genes are the least compact (Ren et al. 2006).
A pattern that is possibly more consistent between diverse eukaryotes is that rapidly regulated genes are less intron-dense than constitutively active genes (Jeffares et al. 2008). Since transcription and mRNA maturation is a relatively slow process in eukaryotes, this may reflect selection for rapid protein production. These observations suggest that introns are deleterious because they hinder rapid gene expression and repression, and they make high gene expression energetically expensive. One case of positive selection associated with an intron loss in a population has been observed (Llopart et al. 2002); in this case, the intron-loss allele is less strongly transcribed, but this illustrates that individual intron loss alleles are subject to selection. We now know that very intron-poor eukaryote genomes can be the result of extensive intron loss. In many cases, this is simply the result of general genome compaction, along with reduced intergenic regions, and loss of genes. However, extensive intron loss is not always associated with compact genomes (Jeffares et al. 2006). The budding yeast, Saccharomyces cerevisiae, has an intron-poor genome (intron density 0.045 introns/gene), but not an extremely compact genome. The thermophilic alga Cyanidioschyzon merolae and the parasitic trypanosome Leishmania major are other examples (intron densities of 0.005 and 0.003, respectively). Such derived intron-poor species may have lost introns at any point in their evolutionary past, and it remains speculative to attribute the intron loss to a particular environmental condition. Experimental evolution should help to describe such processes, as it has done for genome reduction of bacteria (Nilsson et al. 2005), particularly now that sequencing of small eukaryote genomes has become rapid and inexpensive. As mentioned above, it is clear that introns can sometimes contain advantageous elements, microRNAs and snoRNAs (Niu 2007; Brown et al. 
2008; Ying and Lin 2009), DNA-acting elements such as transcriptional enhancers (Rose 2008), and intronic sequences may also be required for the alternative splicing that has vastly expanded the proteome complement of some intron-dense species (Xing and Lee 2007). Such advantageous introns are expected to be maintained in a population, despite the costs of maintaining them (see above). Clearly, the balance of intron retention, gain, and loss within a genome is determined by many factors (such as population size and the cell division time for unicellular species), but particular introns can be subject to strong, non-neutral purifying or adaptive selection. This will mean that an intron may be retained over a very long period of time if it contains a functional element, or arise/be lost relatively rapidly under various conditions (such as a novel mutation that affects fitness). In vertebrates, many snoRNAs involved in ribosome maturation are encoded in introns (Tycowski et al. 2004). Likewise, numerous microRNAs are intron encoded (Das 2009), but some are even encoded in exons. The final hypothesis is that spliceosomal introns only arose after DNA synthesis was established, and proteins had taken over their current role as the primary catalysts in living systems. We all prefer simpler hypotheses, but if there were a dual origin, with some introns being very ancient (first) and others gained during evolution, then hypotheses become harder to falsify. Perhaps the origin of the spliceosome then becomes very important.

Summary and Future Directions

Compared to the data available 10 years ago, much more is known about genome evolution and function. There has also been considerable progress in describing the components of LECA. This indicates that LECA possessed a complex spliceosome, and it is now clear that ancestral eukaryotes were not extremely intron poor.
The current evidence indicates that very intron-poor eukaryote genomes are probably the result of extensive intron loss. We are aware that intron dynamics are highly stochastic and affected by both organism-level and gene-level selective processes. An understanding of the selective processes encouraging intron loss makes the complete loss in prokaryotes a more plausible proposal than it was a decade ago. Our new understanding of the ‘RNA infrastructure’ of eukaryotic cells and the ncRNA functions of introns means that we are aware that some introns carry essential components of genes, which will be maintained during evolution, and this is compatible with continuity with ancient functions of introns in the RNA world. Although progress in eliminating theories may appear to be slow, there has been progress over the last decade, and some theories can be rejected (Fig. 5). The role of hypotheses is to stimulate further tests, rather than to be ‘believed’ as facts; they are simply tools to make predictions that (in principle) can be tested. It is always helpful to specify a range of hypotheses and not be forced into inappropriate and limited ‘binary choices’. Although it is difficult to anticipate future directions, we certainly expect rapid progress in understanding the nature of the last eukaryote common ancestor. This will at least define much more accurately the detailed questions to be answered.

Fig. 5 Current status of introns first, early, and late. The ‘introns fixed’ model (lower graphic), in all its forms, is rejected by the data from intron positions as studied in comparative genomics. Similarly, the introns ‘very late’ model can be rejected, even with intron position being dynamic (upper graphic). In contrast, several versions of the ‘introns dynamic’ model (first, early, and latish) are still possible on current evidence. It is imperative that we retain all three hypotheses and seek ways of testing them objectively
A major difficulty at present is that it often appears that theories for the origin of eukaryotes do not even address the question of the origin of the many defining features of eukaryote cells. At this point, answers appear quite open; even the present authors would probably not agree on predicting the order of events in early evolution! A basic prediction of eukaryotes first or early is that eukaryotes have ancient origins, and are not derived from two prokaryote cells (an archaeal and a bacterial cell). Under these models, the favoured alternative is that those two cell types (archaea and bacteria) are derived by reductive evolution from a more complex cell structure that had at least some similarities in its RNA processing to modern eukaryotes. Reductive evolution is now recognized as being widespread in evolution, so that direction of change is less of a problem than was perceived a decade ago. Possibly some RNAs (such as snoRNAs) may be very old, but their intronic location may be evolutionarily derived. So, we could end up with combinations of hypotheses such as ‘snoRNAs in LUCA, introns latish’. We are hopeful that additional hypotheses can be eliminated, as occurred with the introns very late hypothesis. Being able to eliminate some alternatives means that the subject is within the realm of modern science, but it is going to be hard to come to a final decision.

References

Bokov K, Steinberg SV (2009) A hierarchical model for evolution of 23S ribosomal RNA. Nature 457:977–980 Brown JWS, Marshall DF, Echeverria M (2008) Intronic noncoding RNAs and splicing. Trends Plant Sci 13:335–342 Carmel L, Rogozin IB, Wolf YI, Koonin EV (2007) Evolutionarily conserved genes preferentially accumulate introns. Genome Res 17:1045–1050 Carthew RW, Sontheimer EJ (2009) Origins and mechanisms of miRNAs and siRNAs.
Cell 136:642–655 Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA (2002) Selection for short introns in highly expressed genes. Nat Genet 31:415–418 Catania F, Lynch M (2008) Where do introns come from? PLoS Biol 6:e283 Catania F, Gao X, Scofield DG (2009) Endogenous mechanisms for the origins of spliceosomal introns. J Hered 100:591–596 Cavalier-Smith T (2002) The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 52:297–354 Cech TR (2009) Crawling out of the RNA world. Cell 136:599–602 Collins LJ, Chen XS (2009) Ancestral RNA: The RNA biology of the eukaryotic ancestor. RNA Biol 6 (in press) Collins LJ, Penny D (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol 22:1053–1066 Collins LJ, Penny D (2009) The RNA-infrastructure: dark matter of the eukaryote cell? Trends Genet 25:120–128 Collins LJ, Kurland CG, Biggs P, Penny D (2009) The modern RNP world of eukaryotes. J Hered 100:597–604 Coulombe-Huntington J, Majewski J (2007) Characterization of intron loss events in mammals. Genome Res 17:23–32 Darnell JE, Doolittle WF (1986) Speculations on the early course of evolution. Proc Natl Acad Sci USA 83:1271–1275 Das S (2009) Evolutionary origin and genomic organisation of microRNA genes in immunoglobulin Lambda variable region gene family. Mol Biol Evol 26:1179–1189 Davila Lopez M, Rosenblad MA, Samuelsson T (2008) Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components. Nucleic Acids Res 36:3001–3010 De Duve C (2007) The origin of eukaryotes: a reappraisal. Nat Rev Genet 8:395–403 De Nooijer S, Holland BR, Penny D (2009) Eukaryote origins: there was no Garden of Eden? 
PLoS One 4:e5507 Del Campo M, Recinos C, Yanez G, Pomerantz SC, Guymon R, Crain PF, McCloskey JA, Ofengand J (2005) Number, position, and significance of the pseudouridines in the large subunit ribosomal RNA of Haloarcula marismortui and Deinococcus radiodurans. RNA 11:210–219 Dennis PP, Omer A (2005) Small non-coding RNAs in Archaea. Curr Opin Microbiol 8:685–694 Dennis PP, Russell AG, Moniz De Sá M (1997) Formation of the 5′ end pseudoknot in small subunit ribosomal RNA: involvement of U3-like sequences. RNA 3:337–343 Dennis PP, Omer A, Lowe T (2001) A guided tour: small RNA function in Archaea. Mol Microbiol 40:509–519 Derelle E, Ferraz C, Rombauts S et al (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103:11647–11652 Di Giulio M (2008a) The split genes of Nanoarchaeum equitans are an ancestral character. Gene 421:20–26 Di Giulio M (2008b) Split genes, ancestral genes. In: Wong JT-F, Lazcano A (eds) Prebiotic evolution and astrobiology. Landes Bioscience, Austin Egel R, Penny D (2007) On the origin of meiosis in eukaryotic evolution: coevolution of meiosis and mitosis from feeble beginnings. In: Egel R, Lankenau D-H (eds) Recombination and meiosis: models, means, evolution. Springer, Berlin, pp 249–288 Eigen M (1992) Steps toward life: a perspective on evolution. Oxford University Press, Oxford Embley TM, Martin W (2006) Eukaryotic evolution, changes and challenges. Nature 440:623–630 Fedorov A, Fedorova L (2004) Introns: mighty elements from the RNA world. J Mol Evol 59:718–721 Fedorov A, Roy S, Fedorova L, Gilbert W (2003) Mystery of intron gain. Genome Res 13:2236–2241 Forterre P (1995) Thermoreduction, a hypothesis for the origin of prokaryotes. C R Acad Sci Paris Life Sci 318:415–422 Forterre P, Gribaldo S (2007) The origin of modern terrestrial life. HFSP J 1:156–168 Gaffney DJ, Keightley PD (2006) Genomic selective constraints in murid noncoding DNA.
PLoS Genet 2:1912–1923 Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37:D136–D140 Gaspin C, Cavaillé J, Erauso G, Bachellerie J (2000) Archaeal homologs of eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus genomes. J Mol Biol 297:895–906 Gazave E, Marques-Bonet T, Fernando O, Charlesworth B, Navarro A (2007) Patterns and rates of intron divergence between humans and chimpanzees. Genome Biol 8:R21 Gilbert W (1987) The exon theory of genes. Cold Spring Harbor Symp Quant Biol 52:901–905 Gilbert W, de Souza SJ (1999) Introns and the RNA world. In: Gesteland RF, Cech TR, Atkins JF (eds) The RNA world, 2nd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 221–231 Goddard MR, Burt A (1999) Recurrent invasion and extinction of a selfish gene. Proc Natl Acad Sci USA 96:13880–13885 Guo XY, Wang Y, Keightley PD, Fan LJ (2007) Patterns of selective constraints in noncoding DNA of rice. BMC Evol Biol 7:e208 Halligan DL, Keightley PD (2006) Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res 16:875–884 Hartman A, Fedorov A (2002) The origin of the eukaryotic cell: a genomic investigation. Proc Natl Acad Sci USA 99:1420–1425 Hetzer M, Wurzer G, Schweyen RJ, Mueller MW (1997) Transactivation of group II intron splicing by nuclear U5 snRNA. Nature 386:417–420 Hickey DA (1992) Evolutionary dynamics of transposable elements in prokaryotes and eukaryotes. Genetica 86:269–274 Holzmann J, Frank P, Loffler E, Bennett KL, Gerner C, Rossmanith W (2008) RNase P without RNA: identification and functional reconstitution of the human mitochondrial tRNA processing enzyme. Cell 135:462–474 Irimia M, Penny D, Roy SW (2007a) Coevolution of genomic intron number and splice sites. 
Trends Genet 23:321–325 Irimia M, Rukov JL, Penny D, Roy SW (2007b) Functional and evolutionary analysis of alternatively spliced genes suggests an early eukaryotic origin of alternative splicing. BMC Evol Biol 7:188 Irimia M, Rukov JL, Penny D, Vinther J, Garcia-Fernandez J, Roy SW (2008) Origin of introns by ‘intronization’ of exonic sequences. Trends Genet 24:378–381 Jeffares DC, Poole AM, Penny D (1998) Relics from the RNA world. J Mol Evol 46:18–36 Jeffares DC, Mourier T, Penny D (2006) The biology of intron gain and loss. Trends Genet 22:16–22 Jeffares DC, Penkett CJ, Bähler J (2008) Rapidly regulated genes are intron poor. Trends Genet 24:375–378 Jurica MS, Moore MJ (2003) Pre-mRNA splicing awash in a sea of proteins. Mol Cell 12:5–14 Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Perlman RE, Roger AJ, Gray MW (2005) The tree of eukaryotes. Trends Ecol Evol 20:670–676 Koonin EV (2006) The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct 1:22 Kurland CG, Collins LJ, Penny D (2006) Genomics and the irreducible nature of eukaryote cells. Science 312:1011–1014 Lane CE, van den Heuvel K, Kozera C et al (2007) Nucleomorph genome of Hemiselmis andersenii reveals complete intron loss and compaction as a driver of protein structure and function. Proc Natl Acad Sci USA 104:19908–19913 Lecompte O, Ripp R, Thierry JC, Moras D, Poch O (2002) Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res 30:5382–5390 Lehman N (2003) A case for the extreme antiquity of recombination. J Mol Evol 56:770–777 Lincoln TA, Joyce GF (2009) Self-sustained replication of an RNA enzyme. Science 323:1229–1232 Llopart A, Comeron JM, Brunet FG, Lachaise D, Long M (2002) Intron presence-absence polymorphism in Drosophila driven by positive Darwinian selection. 
Proc Natl Acad Sci USA 99:8121–8126 Logsdon JM (1998) The recent origin of spliceosomal introns revisited. Curr Opin Genet Dev 8:637–648 Logsdon JM Jr, Tyshenko MG, Dixon C, Jafari D-J, Walker VK, Palmer JD (1995) Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns-late theory. Proc Natl Acad Sci USA 92:8507–8511 Lu J, Shen Y, Wu Q, Kumar S, He B, Shi S, Carthew RW, Wang SM, Wu C (2008) The birth and death of microRNA genes in Drosophila. Nat Genet 40:351–355 Lynch M (2002) Intron evolution as a population-genetic process. Proc Natl Acad Sci USA 99:6118–6123 Lynch M (2007) The origins of genome architecture. Sinauer, Sunderland Maizels N, Weiner AM (1999) The genomic tag hypothesis: what molecular fossils tell us about the evolution of tRNA. In: Gesteland RF, Cech TR, Atkins JF (eds) The RNA world, 2nd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 79–111 Martin W, Russell MJ (2003) On the origins of cells: a hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos Trans R Soc Lond B Biol Sci 358:59–83 Mosig A, Chen JJ-L, Stadler PF (2007) Homology search with fragmented nucleic acid sequence patterns. In: Algorithms in bioinformatics. Lecture notes in bioinformatics, vol 4645. Springer, Berlin, pp 335–345 Mourier T, Jeffares DC (2003) Eukaryotic intron loss. Science 300:1393 Nedelcu AM (2009) Comparative genomics of phylogenetically diverse unicellular eukaryotes provide new insights into the genetic basis for the evolution of the programmed cell death machinery. J Mol Evol 68:256–268 Nilsson AI, Koskiniemi S, Eriksson S, Kugelberg E, Hinton JCD, Andersson DI (2005) Bacterial genome size reduction by experimental evolution. Proc Natl Acad Sci USA 102:12112–12116 Niu D-K (2007) Protecting exons from deleterious R-loops: a potential advantage of having introns.
Biol Direct 2:11 Ofengand J, Bakin A (1997) Mapping to nucleotide resolution of pseudouridine residues in large subunit ribosomal RNAs from representative eukaryotes, prokaryotes, archaebacteria, mitochondria and chloroplasts. J Mol Biol 266:246–268 Omer AD, Lowe TM, Russell AG, Ebhardt H, Eddy SR, Dennis PP (2000) Homologs of small nucleolar RNAs in Archaea. Science 288:517–522 Omilian AR, Scofield DG, Lynch M (2008) Intron presence-absence polymorphisms in Daphnia. Mol Biol Evol 25:2129–2139 Penny D (2005) An interpretive review of the origin of life research. Biol Philos 20:633–671 Penny D, Collins LJ (2009) Evolutionary genomics leads the way. In: Caetano-Anolles G (ed) Evolutionary genomics and systems biology. Wiley, Hoboken Penny D, Phillips MJ (2004) The rise of birds and mammals: are microevolutionary processes sufficient for macroevolution? Trends Ecol Evol 19:516–522 Poole AM (2006) Getting from an RNA world to modern cells just got a little easier. BioEssays 28:105–108 Poole AM (2009) Eukaryote evolution: the importance of the stem group. In: Caetano-Anolles G (ed) Evolutionary genomics and systems biology. Wiley, Hoboken Poole AM, Penny D (2007) Evaluating hypotheses for the origin of eukaryotes. BioEssays 29:74–84 Poole AM, Jeffares DC, Penny D (1998) The path from the RNA world. J Mol Evol 46:1–17 Poole AM, Jeffares DC, Penny D (1999) Prokaryotes, the new kids on the block. BioEssays 21:880–889 Reanney DC (1979) RNA splicing and polynucleotide evolution. Nature 277:598–600 Reanney DC (1987) Genetic error and genome design. Cold Spring Harb Symp Quant Biol 52:751–757 Ren XY, Vorst O, Fiers MWEJ et al (2006) In plants, highly expressed genes are the least compact. Trends Genet 22:528–532 Rodríguez-Trelles F, Tarrío R, Ayala FJ (2006) Origins and evolution of spliceosomal introns. Annu Rev Genet 40:47–76 Rose AB (2008) Intron-mediated regulation of gene expression.
Curr Top Microbiol Immunol 326:277–290 Roy SW, Gilbert W (2006) The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet 7:211–221 Roy SW, Irimia M (2009a) Splicing in the eukaryote ancestor: form, function, and dysfunction. Trends Ecol Evol 24:447–455 Roy SW, Irimia M (2009b) Mystery of intron gain: new data and new models. Trends Genet 25:67–73 Roy SW, Penny D (2007) Widespread intron loss suggests retrotransposon activity in ancient apicomplexans. Mol Biol Evol 24:1926–1933 Rozhdestvensky TS, Tang TH, Tchirkova IV, Brosius J, Bachellerie JP, Hüttenhofer A (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res 31:869–877 Russell AG, Charette JM, Spencer DF, Gray MW (2006) A very early evolutionary emergence of the minor spliceosome. Nature 443:863–866 Santos M, Zintzaras E, Szathmary E (2004) Recombination in primeval genomes: a step forward but still a long leap from maintaining a sizable genome. J Mol Evol 59:507–519 Sashital DG, Cornilescu G, Butcher SE (2004) U2-U6 RNA folding reveals a group II intron-like domain and a four-helix junction. Nat Struct Mol Biol 11:1237–1242 Schoemaker RJ, Gultyaev AP (2006) Computer simulation of chaperone effects of Archaeal C/D box sRNA binding on rRNA folding. Nucleic Acids Res 34:2015–2026 Scofield DG, Lynch M (2008) Evolutionary diversification of the Sm family of RNA-associated proteins. Mol Biol Evol 25:2255–2267 Seetharaman M, Eldho NV, Padgett RA, Dayie KT (2006) Structure of a self-splicing group II intron catalytic effector domain 5: parallels with spliceosomal U6 RNA. RNA 12:235–247 Sharp PA (2009) The centrality of RNA. Cell 136:577–580 Slamovits CH, Keeling PJ (2006) A high density of ancient spliceosomal introns in oxymonad excavates. BMC Evol Biol 6:e34 Stoltzfus A (1999) On the possibility of constructive neutral evolution.
J Mol Evol 49:169–181 Sun F-J, Caetano-Anolles G (2008) The origin and evolution of tRNA inferred from phylogenetic analysis of structure. J Mol Evol 66:21–35 Sverdlov AV, Csuros M, Rogozin IB, Koonin EV (2007) A glimpse of a putative pre-intron phase of eukaryotic evolution. Trends Genet 23:105–108 Tanaka H, Kato K, Yamashita E, Sumizawa T, Zhou Y, Yao M, Iwasaki K, Yoshimura M, Tsukihara T (2009) The structure of rat liver vault at 3.5 Angstrom resolution. Science 323:384–388 Tarrio R, Ayala FJ, Rodriguez-Trelles F (2008) Alternative splicing: a missing piece in the puzzle of intron gain. Proc Natl Acad Sci USA 105:7223–7228 Tran E, Zhang X, Lackey L, Maxwell ES (2005) Conserved spacing between the box C/D and C’/D’ RNPs of the archaeal box C/D sRNP complex is required for efficient 2'-O-methylation of target RNAs. RNA 11:285–293 Tycowski KT, Aab A, Steitz JA (2004) Guide RNAs with 5' caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr Biol 14:1985–1995 Valadkhan S, Mohammadi A, Jaladat Y, Geisler S (2009) Protein-free small nuclear RNAs catalyze a two-step splicing reaction. Proc Natl Acad Sci USA 106:11901–11906 Valentine DL (2007) Adaptations to energy stress dictate the ecology and evolution of the Archaea. Nat Rev Microbiol 5:316–323 Veretnik S, Wills C, Youkharibache P, Valas RE, Philip E, Bourne PE (2009) Sm/Lsm genes provide a glimpse into the early evolution of the spliceosome. PLoS Comp Biol 5:e1000315 Wang M, Caetano-Anolles G (2009) The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world. Structure 17:66–78 Weiner AM (1993) Messenger-RNA splicing and autocatalytic introns—distant cousins or the products of chemical determinism. Cell 72:161–164 Woodhams MD, Stadler PF, Penny D, Collins LJ (2007) RNase MRP and the RNA processing cascade in the eukaryotic ancestor.
BMC Evol Biol 7:S1–S13 Xing Y, Lee C (2007) Relating alternative splicing to proteome complexity and genome evolution. Adv Exp Med Biol 623:36–49 Ying SY, Lin SL (2009) Intron-mediated RNA interference and microRNA biogenesis. Methods Mol Biol 487:387–413