Xuhua Xia - Dr. Xuhua Xia's Lab at University of Ottawa

Codon usage bias
Ref: Chapter 9
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
Objectives
• Understand how codon usage bias affects translation
efficiency and gene expression
• Biomedical relevance
– Protein drugs in pharmaceutical industry
– Transgenic experiments in agriculture
• Factors affecting codon usage bias
• Indices measuring codon usage bias
• Develop bioinformatic skills to study genomic
codon usage
Slide 2
Codon Usage Bias
• Observation: Strongly biased codon usage in a wide variety of genomes, including those of
viruses, mitochondria, plastids, prokaryotes and eukaryotes.
• Hypotheses:
– Differential mutation hypothesis, e.g., transcriptional hypothesis of codon usage (Xia
1996 Genetics 144:1309-1320)
– Differential selection hypothesis, e.g., (Xia 1998 Genetics 149:37-44)
• Predictions:
– From the mutation hypothesis: concordance between codon usage and mutation pressure
– From the selection hypothesis:
• Concordance between differential availability of tRNA and differential codon usage.
• The concordance is stronger in highly expressed genes than in lowly expressed genes (CAI is
positively correlated with gene expression).
[Figure: diagram with the labels Gene 1, Gene 2, Gene 3, polycistronic mRNA, RNA polymerase, ribosome, protein, and charged tRNAs GCC~tRNA~Gly and UCC~tRNA~Gly.]
Slide 3
Table 9-2, yeast
AA(1)  Codon(2)  T(3)  w(4)   F(5)
Arg    AGA       11    1      314
Arg    AGG       1     0.091  1
Asn    AAC       10    1      208
Asn    AAU       0     0      11
Asp    GAC       16    1      202
Asp    GAU       0     0      112
Cys    UGC       4     1      3
Cys    UGU       0     0      39
Gln    CAA       9     1      153
Gln    CAG       1     0.111  1
Glu    GAA       14    1      305
Glu    GAG       2     0.143  5
His    CAC       7     1      102
His    CAU       0     0      25
Leu    UUA       7     0.7    42
Leu    UUG       10    1      359
Lys    AAA       7     0.5    65
Lys    AAG       14    1      483
Phe    UUC       10    1      168
Phe    UUU       0     0      19
Ser    AGC       2     1      6
Ser    AGU       0     0      4
Tyr    UAC       8     1      141
Tyr    UAU       0     0      10
Xia 2007. Bioinformatics and the cell.
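The w column of Table 9-2 is consistent with dividing each T value by the largest T within the same amino acid (for example, 1/11 = 0.091 for AGG). The sketch below recomputes w that way. It also counts the amino acids whose highest-T codon is the highest-F codon, which is one simple way to look at the concordance predicted by the selection hypothesis. Reading T as a per-codon tRNA count and F as a codon frequency is an assumption, because the column footnotes are not reproduced on the slide. The function names are illustrative only.

```python
# Values transcribed from Table 9-2: (amino acid, codon, T, F).
TABLE_9_2 = [
    ("Arg", "AGA", 11, 314), ("Arg", "AGG", 1, 1),
    ("Asn", "AAC", 10, 208), ("Asn", "AAU", 0, 11),
    ("Asp", "GAC", 16, 202), ("Asp", "GAU", 0, 112),
    ("Cys", "UGC", 4, 3),    ("Cys", "UGU", 0, 39),
    ("Gln", "CAA", 9, 153),  ("Gln", "CAG", 1, 1),
    ("Glu", "GAA", 14, 305), ("Glu", "GAG", 2, 5),
    ("His", "CAC", 7, 102),  ("His", "CAU", 0, 25),
    ("Leu", "UUA", 7, 42),   ("Leu", "UUG", 10, 359),
    ("Lys", "AAA", 7, 65),   ("Lys", "AAG", 14, 483),
    ("Phe", "UUC", 10, 168), ("Phe", "UUU", 0, 19),
    ("Ser", "AGC", 2, 6),    ("Ser", "AGU", 0, 4),
    ("Tyr", "UAC", 8, 141),  ("Tyr", "UAU", 0, 10),
]


def relative_weights(rows):
    """Return {codon: T / (largest T within the same amino acid)}."""
    max_t = {}
    for aa, _, t, _ in rows:
        max_t[aa] = max(max_t.get(aa, 0), t)
    return {codon: t / max_t[aa] for aa, codon, t, _ in rows}


def concordant_amino_acids(rows):
    """List amino acids whose highest-T codon is also the highest-F codon."""
    best_t, best_f = {}, {}
    for aa, codon, t, f in rows:
        if aa not in best_t or t > best_t[aa][1]:
            best_t[aa] = (codon, t)
        if aa not in best_f or f > best_f[aa][1]:
            best_f[aa] = (codon, f)
    return [aa for aa in best_t if best_t[aa][0] == best_f[aa][0]]


weights = relative_weights(TABLE_9_2)
concordant = concordant_amino_acids(TABLE_9_2)
print(round(weights["AGG"], 3))
print(len(concordant), "of", len({aa for aa, _, _, _ in TABLE_9_2}))
```

With the values in the table, Cys is the only amino acid for which the two rankings disagree.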
Slide 4
Conflict: Initiation and Elongation
• Met codon usage from the 12 CDSs: AUA 214, AUG 37
• Possible tRNAs: tRNAMet/CAU, tRNAMet/UAU
• The vertebrate mitochondrial genome has only one tRNAMet.
Which one to have?
– tRNAMet/CAU: Good for initiation, but not efficient for AUA codons,
even with the C modified to 5-formylcytidine
– tRNAMet/UAU: Good for AUA codons, but not good for initiation.
– Anticodon CAU favors the AUG codon
• Nature has chosen CAU: All mitochondrial genomes with a
single tRNAMet have a CAU anticodon.
• Problem with AUA codons in translation?
Xia et al. 2007. PLoS One
Slide 5
Hypothesis and Predictions
Amino acid  Met  Leu  Glu  Lys  Gln  Arg  Trp
Codon       AUA  UUA  GAA  AAA  CAA  AGA  UGA
• AUA (Met): Favoured by mutation, but not by tRNA-mediated selection
because the first (wobble) position in the tRNA anticodon is C.
• UUA, GAA, AAA, CAA, AGA, UGA: Favoured by mutation, and also favoured by
tRNA-mediated selection: the first (wobble) position of the tRNA anticodon is U.
Predictions:
1. Proportion of A-ending codons (or RSCU) should be smaller in the Met codon family
than in other R-ending codon families: $P_{NNA} = N_{NNA}/N_{NNG}$
2. Availability of tRNAMet/UAU should increase $P_{AUA}$.
Slide 6
Selection against AUA codons
                   Met    Leu    Glu    Lys    Gln    Arg    Trp
Species            AUA    UUA    GAA    AAA    CAA    AGA    UGA
A. gossypii        1.473  1.993  1.826  1.852  1.917  2      2
C. glabrata        1.043  1.995  2.000  1.938  1.889  2      2
K. thermotolerans  0.556  1.973  1.910  1.948  1.945  2      1.967
S. cerevisiae      1.140  1.969  1.800  1.883  1.794  1.947  1.908
S. castellii       1.299  1.994  1.891  1.981  1.969  2      1.918
S. servazzii       1.321  1.931  1.702  1.824  1.841  1.959  2
Y. lipolytica      1.440  1.968  1.536  1.859  1.963  1.922  1.882
Carullo, M. and Xia, X. 2008 J Mol Evol 66:484–493.
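The table can be checked against the first prediction with a few lines of code. The sketch below is a minimal illustration rather than the analysis used in the cited paper. For each species it finds which A-ending codon has the smallest value. Only the AUA, UUA, GAA, AAA and CAA columns are used, to keep the example short; the variable names are illustrative.

```python
# A-ending codon values transcribed from the slide, in the column order below.
CODONS = ["AUA", "UUA", "GAA", "AAA", "CAA"]
VALUES = {
    "A. gossypii":       [1.473, 1.993, 1.826, 1.852, 1.917],
    "C. glabrata":       [1.043, 1.995, 2.000, 1.938, 1.889],
    "K. thermotolerans": [0.556, 1.973, 1.910, 1.948, 1.945],
    "S. cerevisiae":     [1.140, 1.969, 1.800, 1.883, 1.794],
    "S. castellii":      [1.299, 1.994, 1.891, 1.981, 1.969],
    "S. servazzii":      [1.321, 1.931, 1.702, 1.824, 1.841],
    "Y. lipolytica":     [1.440, 1.968, 1.536, 1.859, 1.963],
}


def lowest_codon(row):
    """Return the codon with the smallest value in one species row."""
    return CODONS[row.index(min(row))]


lowest = {species: lowest_codon(row) for species, row in VALUES.items()}
aua_always_lowest = all(codon == "AUA" for codon in lowest.values())
print(aua_always_lowest)
```

In all seven species the AUA value is the smallest of the five, in line with selection against AUA codons.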
Slide 7
[Figure: two scatter plots of PAUA (y-axis) against PUUA (x-axis). Panel (a): both axes from 30 to 80. Panel (b): both axes from 0.25 to 0.95.]
Fig. 5. Relationship between PAUA and PUUA, highlighting the observation that PAUA is greater when both a
tRNAMet/CAU and a tRNAMet/UAU are present than when only tRNAMet/CAU is present in the mtDNA, for bivalve
species (a) and chordate species (b). The filled squares are for mtDNA containing both tRNAMet/CAU and
tRNAMet/UAU genes, and the open triangles are for mtDNA without a tRNAMet/UAU gene.
Xia, X. 2012. In: RS Singh et al. Evolution in the fast lane: Rapidly evolving genes and genetic systems. Oxford University Press.
Calculation of RSCU
$$\mathrm{RSCU}_{ij} = \frac{\mathrm{CodFreq}_{ij}}{\left(\sum_{j=1}^{\mathrm{NumCodon}_i} \mathrm{CodFreq}_{ij}\right) / \mathrm{NumCodon}_i}$$

$$\mathrm{RSCU}_{\mathrm{Ala},\,\mathrm{GCU}} = \frac{52}{(52 + 91 + 103 + 2)/4} = 0.84$$

Codon  AA   N    RSCU    Codon  AA   N    RSCU    Codon  AA   N    RSCU
GCU    Ala  52   0.84    CCU    Pro  42   0.87    UAA    *    8    3.2
GCC    Ala  91   1.47    CCC    Pro  63   1.31    UAG    *    1    0.4
GCA    Ala  103  1.66    CCA    Pro  85   1.76    AGA    *    1    0.4
GCG    Ala  2    0.03    CCG    Pro  3    0.06    AGG    *    0    0
GAA    Glu  78   1.64    CAA    Gln  79   1.82    AAA    Lys  90   1.78
GAG    Glu  17   0.36    CAG    Gln  8    0.18    AAG    Lys  11   0.22
GGU    Gly  29   0.53    CGU    Arg  7    0.44    ACU    Thr  44   0.57
GGC    Gly  62   1.13    CGC    Arg  11   0.7     ACC    Thr  96   1.25
GGA    Gly  97   1.77    CGA    Arg  42   2.67    ACA    Thr  153  1.99
GGG    Gly  31   0.57    CGG    Arg  3    0.19    ACG    Thr  15   0.19
UUA    Leu  110  1.11    AUA    Met  218  1.66    UGA    Trp  92   1.77
UUG    Leu  16   0.16    AUG    Met  44   0.34    UGG    Trp  12   0.23
CUU    Leu  62   0.62    UCU    Ser  51   1.11    GUU    Val  40   0.84
CUC    Leu  95   0.95    UCC    Ser  65   1.42    GUC    Val  48   1.01
CUA    Leu  285  2.86    UCA    Ser  99   2.16    GUA    Val  87   1.83
CUG    Leu  29   0.29    UCG    Ser  5    0.11    GUG    Val  15   0.32

RSCU is codon-specific.
RSCU and proportion: different scaling.
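The RSCU calculation can be sketched in a few lines: each codon count is divided by the mean count of its synonymous codon family. The example below is a minimal illustration using the Ala, Glu and Met counts from the table. The function name `rscu` and the two dictionaries are illustrative and are not taken from DAMBE or any other package.

```python
from collections import defaultdict


def rscu(counts, family):
    """Compute RSCU for each codon.

    counts: {codon: observed count}
    family: {codon: amino acid}, listing every codon of each family
    """
    totals = defaultdict(int)   # summed counts per amino acid
    sizes = defaultdict(int)    # number of synonymous codons per amino acid
    for codon, aa in family.items():
        totals[aa] += counts.get(codon, 0)
        sizes[aa] += 1
    result = {}
    for codon, aa in family.items():
        mean_count = totals[aa] / sizes[aa]
        # A family that is never used has no defined bias; report 0.0 here.
        result[codon] = counts.get(codon, 0) / mean_count if mean_count > 0 else 0.0
    return result


FAMILY = {
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAA": "Glu", "GAG": "Glu",
    "AUA": "Met", "AUG": "Met",
}
COUNTS = {"GCU": 52, "GCC": 91, "GCA": 103, "GCG": 2,
          "GAA": 78, "GAG": 17, "AUA": 218, "AUG": 44}

values = rscu(COUNTS, FAMILY)
print({codon: round(v, 2) for codon, v in values.items()})
```

Within a family the RSCU values sum to the number of synonymous codons, whereas proportions sum to 1. That is the difference in scaling between the two measures.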
Slide 9
Calculation of CAI
$$w_{ij} = \frac{\mathrm{RefCodFreq}_{ij}}{\mathrm{RefCodFreq}_{i.\max}}$$

$$\mathrm{CAI} = e^{\frac{\sum_{i=1}^{N_{2,3,4}} \left[\mathrm{CodFreq}_i \ln(w_i)\right]}{\sum_{i=1}^{N_{2,3,4}} \mathrm{CodFreq}_i}} = e^{\frac{1 \cdot \ln(0.606) + 15 \cdot \ln(1) + 8 \cdot \ln(0.752) + \ldots}{1 + 15 + 8 + \ldots}}$$

N2,3,4: Number of 2-, 3-, 4-fold codon families

Codon  AA  ObsFreq  RefCodFreq  w
UGA    *   0        6           0.375
UAG    *   0        4           0.250
UAA    *   0        16          1.000
GCA    A   1        195         0.606
GCU    A   15       322         1.000
GCG    A   0        81          0.252
GCC    A   8        242         0.752
UGC    C   3        123         1.000
UGU    C   3        112         0.911
GAU    D   9        69          1.000
GAC    D   11       40          0.580
GAG    E   11       289         0.863
GAA    E   14       335         1.000
UUU    F   3        118         0.554
UUC    F   9        213         1.000
…      …   …        …           …

Compound 6- or 8-fold codon families
should be broken into two codon families.
CAI is gene-specific.
0 ≤ CAI ≤ 1
CAI values computed with different reference
sets are not comparable.
Problem with computing w as Fi/Fi.max:
Suppose an amino acid is rarely used in
highly expressed genes. Then there is little
selection on it, and its codon usage might
be close to even, with wi ≈ 1. Now if we
have a lowly expressed gene that happens
to be made up entirely of this amino acid,
then the CAI for this lowly expressed gene
would be close to 1, which is misleading.
There has been no good alternative.
Further research is needed.
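The two formulas can be sketched as follows, using only the twelve sense-codon rows shown in the table. Because the table is truncated, the result is not the CAI of the full gene. The stop-codon rows are left out on the assumption that they do not enter the calculation; their ObsFreq is 0, so they would contribute nothing in any case. The function names are illustrative.

```python
import math

# (codon, amino acid, ObsFreq, RefCodFreq) for the sense codons listed above.
ROWS = [
    ("GCA", "A", 1, 195), ("GCU", "A", 15, 322),
    ("GCG", "A", 0, 81),  ("GCC", "A", 8, 242),
    ("UGC", "C", 3, 123), ("UGU", "C", 3, 112),
    ("GAU", "D", 9, 69),  ("GAC", "D", 11, 40),
    ("GAG", "E", 11, 289), ("GAA", "E", 14, 335),
    ("UUU", "F", 3, 118), ("UUC", "F", 9, 213),
]


def codon_weights(rows):
    """w = RefCodFreq divided by the largest RefCodFreq in the same family."""
    max_ref = {}
    for _, aa, _, ref in rows:
        max_ref[aa] = max(max_ref.get(aa, 0), ref)
    return {codon: ref / max_ref[aa] for codon, aa, _, ref in rows}


def cai(rows):
    """Frequency-weighted geometric mean of w over the codons actually used."""
    w = codon_weights(rows)
    # Codons with ObsFreq 0 add nothing to either sum, so they are skipped.
    # A used codon with w = 0 would make math.log fail: the "w = 0" problem.
    used = [(obs, w[codon]) for codon, _, obs, _ in rows if obs > 0]
    total = sum(obs for obs, _ in used)
    log_sum = sum(obs * math.log(wi) for obs, wi in used)
    return math.exp(log_sum / total)


partial_cai = cai(ROWS)
print(round(partial_cai, 3))
```

A sequence that uses only the codon with w = 1 in each family gets a value of exactly 1, which is what makes the scenario described in the note above possible.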
Slide 10
Weak mRNA predictive power
[Figure: scatter plot of protein abundance (y-axis, 0 to 80) against mRNA abundance (x-axis, 0.5 to 4.5), with fitted line y = 5.6507x + 4.1367, R² = 0.1936. The points for ENO1 and FRS2 are labelled.]
Slide 11
Effect of Codon Usage Bias
[Figure: scatter plot of protein abundance (y-axis, 0 to 80) against codon usage bias (x-axis, 0.05 to 0.85), with fitted line y = 70.398x - 11.739, R² = 0.5668. The points for ENO1 and FRS2 are labelled.]
Slide 12
Problems with CAI
• Formulation
– Reference set
– w=0
• Implementation
– AUG
– UGG
– Multiple codon families for one amino acid
• Dependence on AT%
• Solutions (Xia, X. 2007. Evolutionary
Bioinformatics)

$$w_{ij} = \frac{\mathrm{RefCodFreq}_{ij}}{\mathrm{RefCodFreq}_{i.\max}}$$

$$\mathrm{CAI} = e^{\frac{\sum_{i=1}^{N_{2,3,4}} \left[\mathrm{CodFreq}_i \ln(w_i)\right]}{\sum_{i=1}^{N_{2,3,4}} \mathrm{CodFreq}_i}} = e^{\frac{1 \cdot \ln(0.606) + 15 \cdot \ln(1) + 8 \cdot \ln(0.752) + \ldots}{1 + 15 + 8 + \ldots}}$$
Slide 13
RSCU (HIV-1 vs Human)
[Figure (a): scatter plot of RSCU (HIV-1) on the y-axis against RSCU (Human) on the x-axis, both from 0 to 2.5. Legend: A-ending, C-ending, G-ending, U-ending. Labelled points: V, R, S, A, I, L, E, K, L, G, P, T, R, Q.]
Fig. 1. Relative synonymous codon usage (RSCU) of HIV-1 compared to RSCU of
highly expressed human genes. Data points for codons ending with A, C, G or U are
annotated with different combinations of colors and symbols. A-ending codons
exhibit strong discordance in their usage between HIV-1 and human and are annotated
with their coded amino acids.
van Weringh et al. 2011. MBE.
Slide 14
Research
• Observation on HIV-1:
– Strong surplus of A-ending codons
– High mutation rate
• Hypothesis: Strong A-biased mutation disrupts
codon adaptation.
– Predictions:
• Strong A-biased mutation (confirmed)
• If the mutation rate is lower, then there will be better codon adaptation
(the related HTLV-1 parasitizes the same cells as HIV-1 but has a
lower mutation rate, so HTLV-1 genes should exhibit better codon
adaptation)
Slide 15
RSCU (HTLV-1 vs Human)
[Figure: scatter plot of RSCU (HTLV-1) on the y-axis (0 to 3) against RSCU (Human) on the x-axis (0 to 2.5). Legend: A-ending, C-ending, G-ending, U-ending.]
Relative synonymous codon usage (RSCU) of HTLV-1 compared to RSCU of highly expressed human
genes. Data points for codons ending with A, C, G or U are annotated with different combinations of
colors and symbols. A-ending codons exhibit strong discordance in their usage between HIV-1 and
human and are annotated with their coded amino acids.
Slide 16
Any problem with the mutation hypothesis?
Table 2. Frequency of A residues, length and codon adaptation index (CAI) for the three
HIV-1 early (tat, rev and nef) and five late (gag-pol, vif, vpu, vpr, and env) coding
sequences (CDS).
Gene  CDS (bp)  CAI
tat   261       0.66875
rev   351       0.66211
nef   621       0.67523
gag   1503      0.62784
pol   3012      0.58139
vif   579       0.61941
vpr   291       0.64272
vpu   249       0.49068
env   2571      0.61924
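One way to read Table 2 is to compare the mean CAI of the early genes with that of the late genes. The sketch below does only that and makes no claim about statistical significance. It treats gag and pol as two separate late coding sequences, as the table rows do. That is an assumption, since the caption lists them together as gag-pol.

```python
from statistics import mean

# CAI values transcribed from Table 2.
CAI_BY_GENE = {
    "tat": 0.66875, "rev": 0.66211, "nef": 0.67523,
    "gag": 0.62784, "pol": 0.58139, "vif": 0.61941,
    "vpr": 0.64272, "vpu": 0.49068, "env": 0.61924,
}
EARLY = ("tat", "rev", "nef")

early_mean = mean(CAI_BY_GENE[g] for g in EARLY)
late_mean = mean(v for g, v in CAI_BY_GENE.items() if g not in EARLY)
print(round(early_mean, 3), round(late_mean, 3))
```

The early genes have the higher mean CAI in this table, which is the pattern the slide title asks the reader to weigh against the mutation hypothesis.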