Recent WGD

The impact of whole genome duplications: insights
from Paramecium tetraurelia
Genome Annotation
• Ab initio gene predictions
• Comparative approach
• 90,000 ESTs
A compact Mac genome
• Protein-coding regions: 78% of the
genome
• Short intergenic regions
Average = 352 bp
• Introns:
Short (average = 25 bp) …
… but numerous : 80% of genes contain
introns (average = 2.9 introns / gene)
Gene content
39,642 annotated genes
Not due to annotation artefacts (control with cDNA data, distribution of protein length, manual curation on chrom. 1, …)
[Figure: bar chart of the number of genes per genome, on an axis from 0 to 45,000; bar values range from 2,000 to 40,000 and include the 39,642 genes annotated here]
Many genes belong to multigenic families
Computing Best Reciprocal Hits (BRH)
within Paramecium proteins
39 642 proteins => SW comparisons + filtering => 13 085 pairs of proteins in BRH
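A minimal sketch of the best-reciprocal-hit step, assuming the pairwise comparison scores are already available as a dictionary. The function name, the identifiers and the toy scores are hypothetical, and the filtering applied in the original analysis is not reproduced here.

```python
def best_reciprocal_hits(scores):
    """Return the set of protein pairs that are each other's best hit.

    `scores` maps an ordered pair (query, subject) to an alignment score,
    e.g. a Smith-Waterman score that has already passed filtering.
    """
    best = {}  # query -> (score, subject) of its best-scoring hit
    for (query, subject), score in scores.items():
        if query == subject:
            continue  # ignore self-hits
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)

    pairs = set()
    for query, (_, subject) in best.items():
        # Keep the pair only if the best hit of the subject is the query.
        if subject in best and best[subject][1] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Toy scores (hypothetical identifiers and values).
toy_scores = {
    ("g1", "g2"): 900, ("g2", "g1"): 900,
    ("g1", "g3"): 400, ("g3", "g1"): 400,
    ("g3", "g4"): 350, ("g4", "g3"): 350,
}
brh_pairs = best_reciprocal_hits(toy_scores)
print(brh_pairs)
```

In this toy example only g1 and g2 are reciprocal best hits: the best hit of g3 is g1, but the best hit of g1 is g2.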
BRH are found in large duplicated blocks (paralogons). Example: scaffolds 1 and 8
Building paralogons
• Using a sliding window of size w genes
• For each window :
– Select a paralogous region if at least p % of w genes are BRH
with the sequence
• Merging overlapping windows
• Add syntenic genes which do not have BRH
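The sliding-window procedure above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function name, the input layout (one ordered gene list per scaffold and a mapping from each gene to the scaffold of its BRH partner) and the toy data are all hypothetical.

```python
from collections import Counter


def find_paralogous_windows(genes, partner_scaffold, w=10, p=0.6):
    """Scan one scaffold and return merged windows paired with a target scaffold.

    genes: gene identifiers in order along one scaffold.
    partner_scaffold: maps a gene to the scaffold carrying its BRH partner
        (genes without a BRH are absent from the mapping).
    Returns a list of (start, end, target) with `end` exclusive.
    """
    hits = []
    for start in range(max(len(genes) - w + 1, 0)):
        window = genes[start:start + w]
        counts = Counter(partner_scaffold[g] for g in window if g in partner_scaffold)
        for target, n in counts.items():
            if n / w >= p:  # at least p of the w genes have a BRH on `target`
                hits.append((start, start + w, target))

    # Merge overlapping windows that point to the same target scaffold.
    merged = []
    for start, end, target in sorted(hits, key=lambda h: (h[2], h[0])):
        if merged and merged[-1][2] == target and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end), target)
        else:
            merged.append((start, end, target))
    return merged


# Toy scaffold of 12 genes; a3 has no BRH, a8 to a11 have none either.
genes = [f"a{i}" for i in range(12)]
partner_scaffold = {f"a{i}": "scaffold_8" for i in range(8) if i != 3}
regions = find_paralogous_windows(genes, partner_scaffold, w=4, p=0.75)
print(regions)
```

The merged interval spans the genes without BRH that lie inside it (a3 here), which is one simple way to account for syntenic genes lacking a BRH.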
Whole genome duplication (WGD)
Settings :
W = 10
p = 61%
Coverage :
61.3 Mb (85%)
35 503 genes (90%)
Results :
24 052 genes in 2 copies (68%)
11 451 genes in 1 copy (32%)
51% of ancestral genes are
still in 2 copies
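The 51% figure follows from counting each retained pair as one ancestral gene. A short check of the arithmetic, using only the counts given on this slide (variable names are illustrative):

```python
genes_in_two_copies = 24052  # genes still present as a duplicated pair
genes_in_one_copy = 11451    # genes whose duplicate has been lost

# Each retained pair descends from a single ancestral gene.
retained_pairs = genes_in_two_copies // 2
ancestral_genes = retained_pairs + genes_in_one_copy

fraction_still_duplicated = retained_pairs / ancestral_genes
share_of_current_genes = genes_in_two_copies / (genes_in_two_copies + genes_in_one_copy)

print(ancestral_genes)
print(round(100 * fraction_still_duplicated))
print(round(100 * share_of_current_genes))
```

The same counts give 68% when expressed as a share of present-day genes and 51% when expressed as a share of ancestral genes.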
Progressive loss of gene duplicates
• ~1500 recent pseudogenes (recognizable)
• Length distribution of genic and intergenic sequences: relics of more ancient pseudogenes in intergenic regions
[Figure: frequency (%) versus sequence length (bp) for single-copy genes, intergenic regions encompassing a gene loss, and other intergenic regions]
BRH from supercontig 8
A number of BRH (>3000) remain outside of paralogons
Inferring ancestral blocks
Paralogous genes
Arbitrary order
Ancestral blocks
Building paralogons with 131 ancestral blocks
Intermediary WGD
Settings :
W = 10
p = 40%
Coverage :
31,129 genes (79%)
Content before WGD : 20,578
genes
7 996 genes in 2 copies (39%)
12 582 genes in 1 copy (61%)
Old WGD
Settings :
W = 20
p = 30%
Coverage :
18,792 genes (47%)
Content before WGD : 9,999
genes
1 530 genes in 2 copies (15%)
8 469 genes in 1 copy (85%)
Gene content at each WGD
19 552 genes => Old WGD (x 1.1) => 21 172 => Intermediary WGD (x 1.2) => 26 214 => Recent WGD (x 1.5) => 39 642
Overall: x 2 (not x 8)
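The multipliers can be recomputed from the gene counts on this slide. A minimal sketch, with illustrative variable names:

```python
# Gene content before the old WGD and after each successive WGD.
gene_counts = [19552, 21172, 26214, 39642]
labels = ["Old WGD", "Intermediary WGD", "Recent WGD"]

# Fold change across each WGD (content after / content before).
fold_changes = [after / before for before, after in zip(gene_counts, gene_counts[1:])]
overall_fold = gene_counts[-1] / gene_counts[0]

for label, fold in zip(labels, fold_changes):
    print(f"{label}: x {fold:.1f}")
print(f"Overall: x {overall_fold:.1f} (three doublings without loss would give x {2 ** 3})")
```

Three doublings with no gene loss would multiply the gene content by 8; the observed overall increase is about 2.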
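Protein sequence similarity between duplicates (ohnologs)
[Figure: distributions of protein sequence similarity between ohnologs; panels: Recent WGD, Intermediary WGD, Old WGD]
Distribution of the rate of synonymous substitution (dS) between ohnologs
[Figure: frequency (%) versus dS, computed with PAML, for the Old, Intermediary and Recent WGDs; annotations: saturation, recent gene conversion]
Distribution of dN/dS
[Figure: distribution of dN/dS for the Recent WGD]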
• => both ohnologs are under strong negative selective
pressure
• Yet … the fate of most ohnologs is to be pseudogenized !
• => gene-silencing mutations can be tolerated …
• … but deleterious mutations affecting the coding
sequence of one copy are counterselected (i.e. dominant
effect of mutations, despite the presence of a duplicate)
• Once a gene has been silenced (e.g. by mutation of
regulatory elements), mutations can accumulate in coding
regions
Gene duplicates are evolutionarily unstable
[Diagram: over time, a gene duplication ends either in a pseudogene or, where there is selective pressure to maintain 2 copies, in ancient paralogs]
Retention of gene duplicates
• Different (non-exclusive) models have been
proposed for the retention of gene duplicates:
– Robustness against mutations
– Functional changes: neo- or sub-functionalization
– Dosage constraints
• Which are the genes that are preferentially
retained after a WGD ?
• How does the pattern of gene retention vary
with time ?
– Compare the pattern of retention after a recent
WGD and a more ancient WGD
– Paramecium: 3 successive WGDs !
Mutational robustness
• Under certain conditions (high mutation rate
and very large population size) redundant
genes may be maintained by selection acting
against double null alleles (Force et al. 1999)
• Essential genes (e.g. ribosomal proteins) are
retained more often than average
• … but most of them are present in more than
2 copies !
• … their high rate of retention may be due to
other factors (see later)
Functional changes
[Diagram, after Force et al. (1999): over time, a duplicated gene with function F gives copies with functions F and F’ (neofunctionalization, adaptation); a duplicated gene with functions F1F2 gives copies with functions F1 and F2 (subfunctionalization, neutral evolution)]
Functional changes:
- changes in gene expression pattern
- changes in the encoded protein
Prediction of the subfunctionalization
model
• A gene that has been preserved by
subfunctionalization at a given WGD, is less
likely to be retained in two copies at a
subsequent WGD (Force et al. 1999)
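[Diagram: two scenarios for an ancestral gene with functions F1F2 across two successive WGDs (WGD1, WGD2); the copies partition into F1 and F2 either at WGD1 or only at WGD2]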
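Test of the subfunctionalization model (1)
Retention at the recent WGD, by status after the intermediate WGD:
– Genes in 1 copy after the intermediate WGD (N=12,582): 47% retained
– Genes in 2 copies after the intermediate WGD (N=7,996): 57% retained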
• Apparent contradiction with the subfunctionalization
model
• Due to variations in retention rate between different
functional classes ?
Test of the subfunctionalization model (2)
Old WGD
N = 343 gene families
Intermediate WGD
Retained: 67%
Retention at the
recent WGD ?
Retained: 60%
• A gene that has been preserved at a given WGD, is less
likely to be retained in two copies at a subsequent
WGD
• Difference significant (p<5%), but not very strong
• Subfunctionalization is an unlikely evolutionary
pathway in species with large population sizes (Lynch
Test of the neofunctionalization model
• Analysis of gene expression (work in progress)
• Analysis of the rate of protein evolution:
Outgroup (function F)
Ohnolog 1 (function F)
Ohnolog 2 (function F’)
• Relative rate test (PAML); correction for multiple tests
• Frequency of ohnologs with asymmetric substitution rates:
– Recent WGD (N=2297): 11%
– Intermediate WGD (N=293): 16%
• More functional redundancy among recent duplicates
• Functional changes account for retention in the long
term
Fate of neofunctionalized genes at
subsequent WGD
Intermediate WGD
N = 62
Retention at the recent WGD ?
Slow copy: 66% retained
Fast copy: 26% retained
Neofunctionalized genes are more
prone to pseudogenization at
subsequent WGD
Retention for dosage constraints (1):
high expression level
• Genes that have to be expressed at very high
level are often present in multiple copies (e.g.
histones)
• The loss of one copy is counterselected
because it cannot be compensated for by the
upregulation of other copies
• => More retention among highly expressed
genes
Retention rates
For each WGD, the retention rate for a given gene category is:
Ratio = (proportion of genes retained in duplicate in this category) / (proportion of all genes retained in duplicate)
Ratio = 1: no specific retention above the mean value for all genes
Ratio > 1: over-retained category
Ratio < 1: under-retained category
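The retention ratio defined above can be sketched as a small helper. The function names and the toy counts are hypothetical and only illustrate the definition.

```python
def retention_ratio(retained_in_category, total_in_category,
                    retained_overall, total_overall):
    """Retention rate of a category relative to the rate for all genes."""
    category_rate = retained_in_category / total_in_category
    overall_rate = retained_overall / total_overall
    return category_rate / overall_rate


def classify(ratio):
    """Label a category according to its retention ratio."""
    if ratio > 1:
        return "over-retained"
    if ratio < 1:
        return "under-retained"
    return "no specific retention"


# Toy counts (hypothetical): 30 of 40 genes retained in the category,
# 500 of 1000 genes retained overall.
ratio = retention_ratio(30, 40, 500, 1000)
print(ratio, classify(ratio))
```

Normalising by the overall rate makes categories comparable across WGDs, since the overall proportion of retained duplicates differs strongly between the recent, intermediary and old events.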
Expression versus Retention
Retention for dosage constraints (2):
the balance hypothesis (Papp et al. 2003)
• The relative expression levels of proteins
involved in the same functional network have to
be controlled to ensure the proper stoichiometry
of the network
• Initially, the loss of one copy is counterselected
because it creates an imbalance within the
network
• In the long term, gene losses may occur
because they can be compensated for by the
upregulation of other copies
Testing the balance hypothesis (1):
Genes involved in multi-protein
complexes
• Protein complexes predicted by homology with
yeast:
– MIPS database (curation from the literature)
– TAP / MS data (Gavin et al. Nature 2006)
Multi-protein complexes
Genes involved in the coding of protein
complexes are initially over-retained
Additive effects of Expression and
Inclusion in Complex
• Proteins involved in complexes are over-retained at the recent WGD
• Does this mean that complex
stoichiometry tends to be conserved ?
Constraint of stoichiometry and fate of duplicates
[Diagram: complex formed by proteins A and B; axes: number of copies of A, number of copies of B]
Each cell gives a pair of values for the two sets of complexes (MIPS complexes; complexes from Gavin et al. Nature 2006):
WGD | Complexes with conserved stoichiometry | p-value
Recent WGD | 265 (44%); 74 (68%) | 2.6 x 10^-2; 4.3 x 10^-4
Intermediary WGD | 114 (20%); 43 (43%) | 1.5 x 10^-3; 2.4 x 10^-4
Old WGD | 106 (24%); 26 (43%) | 1.2 x 10^-5; 2.5 x 10^-3
Testing the balance hypothesis (2):
genes involved in central metabolism
Retention of central metabolism gene
duplicates
Genes involved in the central metabolism are initially over-retained and then under-retained (less neofunctionalization ?)
Dating genome duplications
• Phylogenetic analyses of orthologous
genes in other ciliate species => date
WGDs relative to speciation events
[Figure: phylogeny of ciliate species with the positions of the WGDs. Taxa, in the order shown: Tetrahymena thermophila, P. bursaria, P. putrinum, P. duboscqui, P. polycaryum, P. nephridiatum, P. caudatum, P. multimicronucleatum, P. jenningsi, P. sexaurelia, P. pentaurelia, P. novaurelia, P. primaurelia, P. octaurelia, P. quadecaurelia, P. tredecaurelia, P. tetraurelia. Branch labels: Old WGD, Intermediate WGD, Recent WGD.]
Paramecium aurelia complex: 15 sibling species (same kind of habitat, initially thought to correspond to a single species)
How does WGD
relate to speciation?
[Diagram, with the kind permission of K. Wolfe: polyploid paramecia give rise to the lineages Ptetra and Pprim, followed by mating and meiosis between the two lineages]
Dobzhansky-Muller incompatibility by reciprocal gene loss
For 1 locus, 1/4 of the offspring is inviable.
For n loci, offspring viability is (3/4)^n
=> Reproductive isolation
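A short numerical illustration of how quickly viability drops with the number of loci. This is a sketch that assumes the loci behave independently, so that the per-locus viabilities multiply; the function name is hypothetical.

```python
def hybrid_viability(n_loci, inviable_fraction=0.25):
    """Expected viable fraction of offspring for n independent loci,
    each making `inviable_fraction` of the offspring inviable."""
    return (1 - inviable_fraction) ** n_loci


for n in (1, 5, 10, 20):
    print(n, round(hybrid_viability(n), 4))
```

With ten such loci fewer than 6% of the offspring are expected to be viable, which shows how reciprocal gene loss at a modest number of loci can lead to reproductive isolation.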
Conclusions (1)
• At least 3 WGDs in Paramecium (probably 4)
• WGDs are rare events … that occurred
recurrently in the evolution of eukaryotes (fungi,
animals, plants, ciliates …)
• Major impact on the evolution of the gene
repertoire
Conclusions (2)
• Dosage constraints appear as an essential force
shaping the gene repertoire after WGD
• Functional changes contribute to gene retention
in the long term …
• … but the fate of the vast majority of genes is to
get pseudogenized
Conclusions (3)
• Relationship between the number of
genes and organism complexity
– The number of genes is driven by selection …
– … and contingency (time since the last WGD)
• WGDs may be responsible for (non-adaptive) explosive radiation of species
(Dobzhansky-Muller incompatibility by
reciprocal gene loss)
• CNRS-UPR2167 - CGM - Gif sur Yvette
– Jean Cohen
– Linda Sperling
• CNRS-UMR8541 – ENS - Paris
– Eric Meyer
– Mireille Bétermier
• CNRS-UMR8125 – IGR - Villejuif
– Philippe Dessen
• CNRS-UMR5558 – PBIL - Lyon
– Laurent Duret
– Vincent Daubin
• Genoscope - CNRS UMR 8030
– Jean-Marc Aury
– Olivier Jaillon
– Benjamin Noel
– Betina Porcel
– Vincent Schachter
– Patrick Wincker
– Jean Weissenbach