
Genome Complexity and Splicing: An Evolutionary Perspective
Introduction
It has been widely appreciated over the last couple of years that the perceived
biological complexity of an organism may not correlate well with the number of genes in
its genome. Humans do not seem to possess any more genes than mice, and have only
about twice as many as Drosophila and C. elegans. Errors in gene annotation aside,
organismic complexity may arise from differences in regulatory programs and from the
potential increase in proteome complexity due to alternative splicing (Maniatis and Tasic 2002).
The differential removal and inclusion of exons and the use of alternative splice
sites facilitate the generation of different protein isoforms. Numerous examples of
alternative splicing are known; a recent study (Johnson et al. 2003) estimated that 74% of
human genes undergo alternative splicing. To gain further understanding of this
phenomenon, I propose to study how alternative splicing could contribute to
biological complexity from an evolutionary viewpoint. The focus will be on the
evolution of gene structure across the phylogeny of sequenced genomes, mainly
exon-creation events. In addition, I would like to explore whether the conservation
patterns of splicing regulatory signals can lead to predictions about the prevalence of
alternative splicing in different lineages across the phylogeny.
Goal 1: Exon Evolutionary Dynamics across Phylogeny
Recently it was estimated that about 5% of the exons in the human genome came
from Alu elements, and that such elements have high potential to form new exons de novo
in intronic regions (Kreahling and Graveley 2004). Modrek and Lee (2003) found that
less frequently included alternatively spliced exons tend not to be conserved
between human and mouse, implying that such exons could be new to the human lineage.
They proposed that new exons can arise by first acquiring weak splice signals.
Consistent with this hypothesis, Sorek et al. (2004) found that most of the
non-conserved (relative to mouse) alternatively spliced exons in the human
genome tend to encode proteins that may not be functional (e.g. because they contain
premature stop codons), suggesting that such exons may be in evolutionary transition. In
addition, tandem exon duplication provides another means of evolving new exons
(Kondrashov et al. 2001). In general, new exons increase the coding and alternative
splicing potential of a gene.
A key question is whether exon-creation events are prevalent across the phylogeny, and if
so, whether such patterns are lineage specific. While some preliminary studies (Rogozin
et al. 2003, Kondrashov and Koonin 2003) on a small number of proteins have shown evidence
of exon creation within ancient introns in the eukaryotic lineage (up to yeast), a
genome-wide study across different branches of the phylogeny is needed to understand the
global dynamics of exon creation. One goal is to test whether the lineage leading to more
complex organisms such as human is associated with accelerated exon creation. To
understand short-term evolutionary dynamics, I will look at pairs of closely related
species [1], including human/chimp, human/mouse, mouse/rat, C. elegans/C. briggsae, D.
melanogaster/D. simulans, and D. melanogaster/D. pseudoobscura. This will be followed
by comparisons between human/D. melanogaster, human/C. elegans, D. melanogaster/C.
elegans and human/yeast to understand long-term exon-creation dynamics. To detect
exon-creation events for each pair of species, the following procedure will be used:
1. Infer orthologs between pairs of species using HomoloGene (Wheeler et al. 2002).
2. Extract the corresponding genomic region for each gene, concatenate all annotated exons, and
then align the translated sequences to look for gaps of size greater than N (determined empirically)
in the alignment. Such gaps could correspond to exon gain/loss events. Although by definition
intron insertions would create new exons, they do not introduce novel coding sequences that
confer new function. That is why I focus only on gaps at the coding level.
3. Map each gap back to genomic DNA to make sure it is flanked by introns. This can also distinguish
between exon creation/loss and elongation (i.e. splice site extension of an existing exon) events.
4. If the gap corresponds to an elongation event, try to map the exon onto the genomic DNA of the
other species. If it is conserved, it can be ignored since it is likely due to incomplete annotation.
5. Use the closest out-group genome available to infer the ancestral state if possible.
6. Compute the rate of exon creation/loss and elongation for each species pair.
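The gap scan in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the two translated sequences are taken to be already aligned (equal-length strings with "-" for gaps), and the names `Gap` and `find_long_gaps` and the threshold argument `min_len` (standing in for N) are hypothetical, not part of the proposal.

```python
from typing import List, NamedTuple


class Gap(NamedTuple):
    gapped_in: str    # "a" or "b": the sequence that shows the gap
    aln_start: int    # 0-based alignment column where the gap run begins
    length: int       # number of consecutive gap columns
    other_start: int  # 0-based residue offset of the insert in the partner


def find_long_gaps(aln_a: str, aln_b: str, min_len: int) -> List[Gap]:
    """Report gap runs longer than min_len columns in a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    gaps: List[Gap] = []
    consumed = {"a": 0, "b": 0}  # ungapped residues seen so far
    run_seq = None               # which sequence the current gap run is in
    run_start = 0
    run_other = 0
    # A sentinel column of residues closes a gap run that reaches the end.
    columns = list(zip(aln_a, aln_b)) + [("X", "X")]
    for col, (x, y) in enumerate(columns):
        if x == "-" and y != "-":
            current = "a"
        elif y == "-" and x != "-":
            current = "b"
        else:
            current = None
        if current != run_seq:
            if run_seq is not None and col - run_start > min_len:
                gaps.append(Gap(run_seq, run_start, col - run_start, run_other))
            run_seq, run_start = current, col
            run_other = consumed["b"] if current == "a" else consumed["a"]
        if x != "-":
            consumed["a"] += 1
        if y != "-":
            consumed["b"] += 1
    return gaps
```

A gap reported in sequence "a" means that "b" carries extra residues starting at `other_start`; multiplying that offset by three gives the position in the concatenated coding sequence, which is what step 3 would map back to genomic DNA.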
One issue with using existing genome annotations for the above analysis is that
rarely used but functional exons could be missed, owing to the bias towards highly abundant
transcripts in cDNA/EST libraries and incomplete tissue coverage. More importantly,
such exons are more likely to be unique to organisms like human. Also, the high level of
conservation between human and chimp implies that the differences in annotated gene
structures may be subtle (chimp cDNA and EST data are also scarce). Therefore it is
important to conduct the above analysis with inferred exons as well.
[1] The official release of the chimp genome should be coming soon.
Genscan (Burge and Karlin 1997) is known to be substantially more accurate when run on
small genomic regions. Hence it can be run on the annotated genic regions of the above
genomes to detect putative exons with high coding and splicing potential. As a first pass,
I will analyze only the human/chimp and human/mouse pairs. Potential exons in introns
could correspond to functional exons or to exons in evolutionary transition. It would be
extremely interesting to see whether the human genome contains substantially more
potential coding exons in introns than chimp and mouse [2], suggesting
accelerated exon creation in the human lineage [3]. It has also been shown that exons tend
to code for protein domains (Kriventseva et al. 2003), hence the potential
exons can be scanned using InterPro. This will give an indication of the potential increase
in proteome complexity in functional terms and provide another confidence measure for
potential exons. Finally, I will use GO annotations to determine the functional enrichment
of genes with accelerated exon-creation events. It would be exciting to see enrichment for
neuronal/CNS and membrane proteins [4], as these are more likely to be alternatively
spliced and positive selection could have played an important role.
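The proposal does not say which statistic would be used for GO enrichment. One common choice is a one-sided hypergeometric test, sketched below as an assumption rather than as the planned method; the function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the GO term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 when more unannotated genes are requested than exist.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Example: 10 genes in total, 4 annotated with the term; 3 genes show
# accelerated exon creation and 2 of those carry the term.
p_value = hypergeom_upper_tail(k=2, n=3, K=4, N=10)
```

Because many GO terms would be tested at once, the resulting p-values would need a multiple-testing correction before any term is called enriched.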
[2] Also compare the coding potential and splice signal strength between putative exons.
[3] Computational experiments will be implemented to control for false-positive rates when Genscan is
used on introns, as well as for intron-length effects (details are omitted here due to length restrictions).
[4] Based on TMHMM predictions and expression data from UniGene and existing microarray studies.
Goal 2: Gauging the Extent of Alternative Splicing across Phylogeny by Conservation Pattern Analysis
In addition to the basal splicing machinery that recognizes splice sites and
branch-point signals, a variety of signals near the splice sites in both introns and exons have
been shown to regulate splicing (Maniatis and Tasic 2002). Such sequence motif
combinations could control the spatial and temporal specificity of exon inclusion. While
SELEX experiments and computational methods (Tacke and Manley 1999, Fairbrother et al. 2002)
have provided numerous candidates, studying the evolution of
such elements remains difficult. However, an intriguing human/mouse comparison (Sorek
and Ast 2003) showed that alternatively spliced exons tend to be flanked by
highly conserved intronic sequences extending for ~100 bp on average, while only a
small percentage (five-fold fewer) of constitutively spliced exons are flanked by conserved
regions (~32 bp on average). Even though it is uncertain whether such conserved
elements are entirely splicing specific, it makes sense that heavily regulated exons
carry more functional information in their flanks, suggesting combinatorial control by
multiple enhancers and repressors. I suspect that the same might be true for other
metazoans; if so, this observation can be employed to predict the extent of alternative
splicing in different lineages across the phylogeny [5]. That is, if an exon is flanked by
long and highly conserved intronic sequences, it is more likely to be alternatively spliced
(with the assumption that the alternative splicing machinery is relatively fixed, so this
would work best over relatively short evolutionary distances). Note that such predictions
are independent of cDNA/EST data. Due to length restrictions, the details of how this will
be done are omitted. The main idea is simple: given two genomes, infer orthologous
genes as usual and then determine the amount (A) and length (L) of flanking sequence
conservation between orthologous internal exons; finally, cluster the exons into groups
based on A and L. Clusters with high A and L indicate a higher potential for
alternative splicing. When comparing across different lineages, the differences in intron
length need to be controlled for. For genome pairs with substantial EST/cDNA evidence,
such that exons can be classified as constitutively or alternatively spliced, a supervised
learning approach can also be used.
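Since the details of A and L are omitted, the sketch below fixes one possible reading purely for illustration: A is the fraction of identical columns in an aligned intronic flank, and L is how far from the exon boundary the flank stays above an identity threshold in consecutive non-overlapping windows. The function name, the window size and the 0.7 threshold are all assumptions, not values from the proposal.

```python
from typing import Tuple


def flank_conservation(aln_x: str, aln_y: str,
                       window: int = 10,
                       min_identity: float = 0.7) -> Tuple[float, int]:
    """Return (A, L) for one aligned intronic flank.

    Column 0 must be the column adjacent to the exon boundary, so an
    upstream flank should be reversed before it is passed in.
    """
    if len(aln_x) != len(aln_y):
        raise ValueError("aligned flanks must have equal length")
    # A column counts as conserved only if both residues match and are not gaps.
    matches = [x == y and x != "-"
               for x, y in zip(aln_x.upper(), aln_y.upper())]
    amount = sum(matches) / len(matches) if matches else 0.0
    length = 0
    # Walk outward from the exon until a window falls below the threshold.
    for start in range(0, len(matches) - window + 1, window):
        block = matches[start:start + window]
        if sum(block) / window < min_identity:
            break
        length = start + window
    return amount, length
```

The (A, L) pairs computed for every orthologous internal exon would then be the input to the clustering step, or the features for the supervised approach where EST/cDNA labels exist.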
Using EST/cDNA data, Brett et al. (2002) showed that there is no
detectable difference in the extent of alternative splicing between human, mouse, fly,
worm and plants. We can re-evaluate this using the proposed method to compare the
extent of potential alternative splicing in human/mouse, D. melanogaster/D.
pseudoobscura, and C. elegans/C. briggsae. It would be very interesting to see whether the
conservation pattern points to increasing splicing regulatory complexity in mammals.
[5] The first step would be to confirm this pattern in the fly and worm lineages.
References
Maniatis T. and Tasic B. (2002) Alternative pre-mRNA splicing and proteome expansion
in metazoans, Nature 418:236-243
Tacke R. and Manley J. (1999) Determinants of SR protein specificity, Current Opinion
in Cell Biology 11:358-362
Modrek B. and Lee CJ. (2003) Alternative splicing in the human, mouse and rat genomes
is associated with an increased frequency of exon creation and/or loss, Nature Genetics
34:177-180
Sorek R. and Ast G. (2003) Intronic sequences flanking alternatively spliced exons are
conserved between human and mouse, Genome Research 13:1631-1637
Kondrashov FA et al. (2001) Origin of alternative splicing by tandem exon duplication,
Human Mol. Genet. 10:2661-2669
Brett et al. (2002) Alternative splicing and genome complexity, Nature Genetics 30:29-30
Johnson et al. (2003) Genome-wide scan of human alternative pre-mRNA splicing with
exon junction microarrays, Science 302:2141-2144
Kondrashov F. and Koonin E. (2003) Evolution of alternative splicing: deletions,
insertions and origin of functional parts of proteins from intron sequences, Trends in
Genetics 19:115-119
Rogozin IB. et al. (2003) Remarkable interkingdom conservation of intron positions and
massive, lineage-specific intron loss and gain in eukaryotic evolution, Current Biology
13:1512-1517
Kreahling J. and Graveley BR. (2004) The origins and implications of Aluternative
splicing, Trends in Genetics 20:1-4
Sorek R. et al. (2004) How prevalent is functional alternative splicing in the human
genome?, Trends in Genetics 20:68-71
Fairbrother W. et al. (2002) Predictive identification of exonic splicing enhancers in
human genes, Science 297:1007-1013
Burge C. and Karlin S. (1997) Prediction of complete gene structures in human genomic
DNA, J. Mol. Biol. 268:78-94