Table S1.
Gene
Sequence variation
ppe68
Sublineage I
18 isolates
1107 bp
ppe4
Sublineage II
(PPW
subfamily)
14 isolates
1542 bp
ppe11
Sublineage II
(PPW
subfamily)
18 isolates
1557 bp
ppe37
Sublineage II
(PPW
subfamily)
15 isolates
1422 bp
ppe67
Sublineage II
(PPW
subfamily)
15 isolates
234 bp
ppe2
Sublineage II
(PPW
subfamily)
11 isolates
1671 bp
ppe3
Sublineage II
(PPW
subfamily)
10 isolates
1611 bp
ppe46
Position
Genetic change
Amino acid change
Number of isolates
Comments
nsSNPs
nsS1
nsS2
86
685
C→T
G→C
Ala → Val
Val → Leu
1
4
Haarlem
EAI specific.
Frameshift
FS1
1057
1
bp
insertion
Premature
stop
1
02_1987
941
T→C
Val → Ala
1
EAS054
460
555
C→T
C→G
Val
Ala
1
8
K85. Confirmed.
TBD1- specific.
1288
1510
C→T
A→G
Arg → Cys
Met → Val
3
1
CDC1551, strain C and Haarlem.
F11. Confirmed.
1248
G→A
Val
1
Strain C
1
M. bovis
2
Beijing isolates T85 and 02_1987.
4
LAM specific. Incorrect amino
acid incorporation from codon 340.
1
1
1
CPHL_A Confirmed.
T85
M. bovis
1
M. bovis.
1
2
CPHL_A
PGG1 isolates T92 and CPHL_A.
Mutation adds additional 333 amino
acids before next stop codon.
Apparent convergent mutation.
nsSNP
nsS1
sSNP
sS1
sS2
nsSNP
nsS1
nsS2
sSNP
sS1
In-frame deletion
D1
91 - 117
Frameshifts
FS1
507
FS2
1017 - 1018
27
bp
deletion
1
bp
deletion
2
bp
deletion
Premature
stop
No
premature
stop.
nsSNPs
nsS1
370
G→A
Val → Met
nsS2
449
C→G
Ala → Gly
nsS3
563
G→T
Ser → Ile
Whole gene deletion
WGD1
1166 bp deletion removes ppe67 and the N-terminus of ppe66.
nsSNPs
nsS1
53
T→G
Leu → Arg
nsS2
233
A→G
Stop → Trp
sSNP
sS1
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
sSNP
sS1
nsSNPs
nsS1
nsS2
186
T→C
Gly
1
Haarlem
419
1211
1292
1381
1487
A→G
C→T
G→A
G→T
G→A
Glu → Gly
Pro → Leu
Asp → Asn
Ala → Ser
Gly → Asp
1
1
1
1
1
CPHL_A. Confirmed.
CDC1551
CPHL_A. Confirmed.
CDC1551
94_M4241A
1236
C→T
Thr
1
T46
556
769
G→T
G→A
Asp → Tyr
Glu → Lys
1
2
C→T
G→T
Pro → Ser
Glu → Asp
1
1
T46
CPHL_A and 02_1987. Apparent
convergent mutation.
H37Rv
CPHL_A
C→T
C→T
Leu
Gly
1
1
T46
02_1987
nsS3
1009
nsS4
1344
sSNPs
sS1
145
sS2
1338
Homologous recombination
Sublineage II
(PPW
subfamily)
7 isolates
1305 bp
ppe48/ppe47
Sublineage II
(PPW
subfamily)
8 isolates
1077 bp
ppe66
16 isolates
946 bp
ppe1
Sublineage II
(PPW
subfamily)
18 isolates
1392 bp
ppe20
Sublineage II
(PPW
subfamily)
14 isolates
1620 bp
ppe36
Sublineage III
18 isolates
732 bp
HC1
In-frame insertion/deletion
Indel1
4-7
nsSNP
nsS1
149
nsS2
973
Homologous recombinations
HC1
596 - 929
HC2
596 - 977
Frameshift
FS1
244
380
bp
deletion
/
383
bp
insertion
4
K85, F11, CDC1551, and KZN1435.
Results from recombination with ppe47.
4
bp
deletion / 13
bp insertion
1
M. bovis
1
1
M. bovis
M. bovis.
bp
/
bp
1
02_1987. Results from recombination
with ppe46.
bp
/
bp
1
Haarlem. Results from recombination
with ppe46.
1
H37Rv. New H37Rv gene (ppe47)
predicted to start at position 231.
1
M. bovis
1
M. bovis
1
GM1503
4
1
EAI specific
EAS054
3
1
1
3
1
Beijing specific
Haarlem
M. bovis
EAI (Philippines lineage) specific
K85
1
1
2
K85
Strain C
K85 and M. bovis
1
1
1
1
1
T85
M. bovis
94_M4241A
T17. Confirmed.
K85. Confirmed.
1
M. bovis
1
F11. Deletion is IS6110 associated.
Deletion also involves adjacent gene
pe22.
T→C
C→G
333
deletion
330
insertion
381
deletion
378
insertion
Val → Ala
Leu → Val
1
bp
deletion
Premature
stop
nsSNP
nsS1
149
T→C
Val → Ala
Partial Gene Deletion
PGD1
1166 bp deletion removes ppe67 and the N-terminus of ppe66.
Frameshift
FS1
897
1 bp insertion
Premature stop
nsSNPs
nsS1
512
C→A
Thr → Asn
nsS2
706
G→T
Val → Leu
nsSNPs
nsS1
413
T→C
Val → Ala
nsS2
476
C→T
Thr → Met
nsS3
544
C→T
Arg → Trp
nsS4
803
T→C
Leu → Pro
nsS5
1364
C→T
Thr → Ile
sSNPs
sS1
894
G→A
Pro
sS2
945
C→G
Pro
sS3
1011
C→A
Pro
nsSNPs
nsS1
171
G→C
Glu → Asp
nsS2
281
T→C
Val → Ala
nsS3
1327
G→T
Ala → Ser
nsS4
1415
C→T
Pro → Leu
nsS5
1445
C→T
Ala → Val
sSNP
sS1
135
A→C
Ser
Partial gene deletion
PGD1
1 – 124
5’ 124 bp
deletion
Frameshift
FS1
ppe69
Sublineage III
15 isolates
1200 bp
596 - 976
136
nsSNP
nsS1
539
Partial gene deletion
PGD1
1 - 58
nsSNPs
nsS1
341
1
bp
insertion
Premature
stop
1
Strain C
A→C
Glu → Ala
1
T92. Confirmed.
58
bp
deletion
1st 23 amino
acids
deleted
1
Haarlem. Deletion of 5’ gene region.
Predicted alternate start codon at
position 70.
A→T
Glu → Val
1
CDC1551
ppe41
Sublineage III
16 isolates
585 bp
ppe57
Sublineage III
16 isolates
531 bp
ppe58
Sublineage III
16 isolates
522 bp
ppe59
Sublineage III
18 isolates
537 bp
ppe9
Sublineage IV
(SVP
subfamily)
9 isolates
543 bp
ppe17
Sublineage IV
(SVP
subfamily)
18 isolates
1041 bp
ppe29
Sublineage IV
(SVP
subfamily)
13 isolates
1272 bp
ppe30
Sublineage IV
(SVP
subfamily)
15 isolates
1392 bp
ppe31
Sublineage IV
(SVP
subfamily)
17 isolates
1200 bp
ppe32
nsS2
610
nsS3
611
sSNPs
sS1
204
sS2
366
sS3
609
Partial gene deletion
PGD1
1 - 117
G→T
A→G
Asp → Cys
Asp → Cys
1
1
CDC1551
CDC1551
G→A
C→T
T→A
Ala
Asp
Gly
2
2
1
CDC1551 and strain C
EAI isolates T46 and EAS054
CDC1551
117
bp
deletion
1st 39 amino
acids
deleted
1
GM1503. Genomic deletion spans 3’
region of upstream gene (pe25) and 5’
region of ppe41.
sSNP
sS1
177
A→C
Pro
1
EAS054
Whole gene deletions
98-R604_INH-RIF-EM, Haarlem, strain C, KZN1435, GM1503, CDC1551.
Homologous recombination
Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes.
Whole gene deletions
98-R604_INH-RIF-EM, Haarlem, strain C, KZN1435, GM1503, CDC1551, M. bovis.
Homologous recombination
Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes.
Whole gene deletion
M. bovis
Homologous recombination
Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes.
In frame deletion
D1
1146 - 1154
Frameshifts
FS1
nsSNPs
nsS1
Frameshift
FS1
nsSNP
nsS1
sSNP
sS1
Frameshifts
FS1
2
EAI (Philippines lineage)
isolates T17 and T46
970
1
bp
insertion
Premature
stop
1
T85
425
A→C
Glu → Ala
1
M. bovis
501
1
bp
insertion
Premature
stop
1
M. bovis (ppe17a)
500
C→T
Pro → Leu
2
Beijing isolates 02_1987 and T85
675
C→T
Pro
1
K85
641
1
bp
insertion
No
premature
stop.
1
Haarlem
T→G
G→T
C→A
G→C
Val → Gly
Ala → Ser
Ala → Glu
Ala → Glu
1
1
1
1
CPHL_A
K85. Confirmed.
K85. Confirmed.
CPHL_A
T→G
Leu
1
94_M4241A
1
T46
nsSNPs
nsS1
353
nsS2
439
nsS3
731
nsS4
1096
sSNP
sS1
1105
IS6110 integration
IS1
1294
nsSNPs
nsS1
484
nsS2
1202
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
sSNP
sS1
nsSNP
9
bp
deletion
G→T
C→T
Gln → Stop
Ser → Leu
1
1
K85
CDC1551
287
500
574
680
712
C→G
A→C
C→T
C→T
C→G
Ala → Gly
Gln → Pro
His → Tyr
Ser → Phe
Leu → Val
1
1
1
1
1
GM1503
K85
CPHL_A
H37Rv
H37Rv
1110
C→T
Pro
1
M. bovis
901
C→G
Ala → Gly
8
PGG2 and 3 specific
141
G→A
Ser
11
TBD1- specific
nsSNPs
nsS1
568
C→T
Gln → stop
1
nsS2
nsS3
nsS4
760
1201
1412
C→G
G→A
C→G
Leu → Val
Gly → Arg
Ser → stop
1
1
11
M. bovis. New gene (ppe33b)
predicted to begin at position 571.
H37Rv
K85
TBD1- specific. Results in loss of 2
C-terminal amino acids.
C→T
Ala
1
H37Rv
2
K85 and M. bovis. Part of a 5894 bp
deletion in M. bovis compared to H37Rv.
Sublineage IV
(SVP
subfamily)
18 isolates
1230 bp
nsS1
sSNPs
sS1
ppe33
Sublineage IV
(SVP
subfamily)
17 isolates
1407 bp
ppe65
Sublineage IV
(SVP
subfamily)
17 isolates
1242 bp
ppe14
Sublineage IV
(SVP
subfamily)
18 isolates
1272 bp
ppe50
Sublineage IV
(SVP
subfamily)
18 isolates
399 bp
ppe51
Sublineage IV
(SVP
subfamily)
17 isolates
1143 bp
ppe61
Sublineage IV
(SVP
subfamily)
18 isolates
1221 bp
IV
IV
sSNP
sS1
471
Whole gene deletion
WGD1
nsSNPs
nsS1
nsS2
sSNPs
sS1
sS2
sS3
nsSNPs
nsS1
nsS2
sSNP
sS1
83
1066
C→G
G→A
Ala → Gly
Ala → Thr
1
1
Strain C
T17. Confirmed.
339
381
777
C→T
T→G
C→A
Ala
Leu
Ala
1
1
4
CDC1551
CPHL_A
TBD1+ specific
481
878
G→T
C→T
Ala → Ser
Thr → Met
1
1
Haarlem
CPHL_A. Confirmed.
1200
C→A
Gly
1
02_1987
Hypervariable at macromutational scale [32]. Whole gene deletion in 8 isolates (CDC1551, Haarlem, strain C, 94_M4241A,
T17, T92, T46, EAS054).
No variation detected.
IV
IV
ppe44
Sublineage IV
(SVP
subfamily)
18 isolates
1149 bp
ppe15
Sublineage IV
(SVP
subfamily)
16 isolates
1176 bp
In-frame deletion
D1
82 - 84
Frameshift
FS1
3
bp
deletion
2
EAS054 and 94_M4241A.
Convergent mutation.
796
5
bp
insertion
Premature
stop
1
CPHL_A. Confirmed.
421
770
1100
C→T
C→T
C→T
Gln → stop
Thr → Met
Ala → Val
1
4
1
CDC1551
EAI specific
K85
942
G→C
Ser
1
Strain C
176
581
G→T
T→C
Gly → Val
Phe → Ser
4
8
EAI specific
PGG1 specific.
624
C→T
Ala
1
M. bovis
Frameshifts
FS1
8
1
bp
deletion
1
02_1987. Ppe motif absent.
FS2
23
1
Alternate
start site at
position 43
predicted.
Alternate
1
94_M4241A. Ppe motif absent.
nsSNPs
nsS1
nsS2
nsS3
sSNP
sS1
nsSNPs
nsS1
nsS2
sSNP
sS1
bp
ppe43
Sublineage IV
(SVP
subfamily)
17 isolates
1185 bp
ppe18
Sublineage IV
(SVP
subfamily)
16 isolates
1176 bp
ppe19
Sublineage IV
(SVP
subfamily)
18 isolates
1191 bp
ppe60
Sublineage IV
(SVP
subfamily)
15 isolates
1182 bp
ppe22
Sublineage IV
(SVP
subfamily)
14 isolates
1158 bp
ppe26
Sublineage IV
(SVP
subfamily)
17 isolates
1182 bp
IV
nsSNPs
nsS1
nsS2
Frameshift
FS1
deletion
start site at
position 43
predicted.
199
541
G→A
G→T
Ala → Thr
Ala → Ser
1
1
KZN1435
EAS054
448 - 452
5
bp
deletion
Premature
stop
1
CPHL_A. Confirmed.
nsSNPs
nsS1
788
C→G
Pro → Arg
1
M. bovis
nsS2
1040
G→T
Gly → Val
4
LAM specific
Homologous recombinations
Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60
genes.
IV
Homologous recombinations
Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60
genes.
IV
Homologous recombinations
Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60
genes.
IV
IV
ppe23
Sublineage IV
(SVP
subfamily)
18 isolates
1185 bp
ppe45
Sublineage IV
(SVP
subfamily)
16 isolates
1227 bp
ppe25
Sublineage IV
(SVP
subfamily)
15 isolates
1098 bp
nsSNPs
nsS1
nsS2
nsS3
nsS4
454
770
937
1091
In-frame deletion
D1
547 - 552
nsSNPs
nsS1
nsS2
nsS3
nsSNP
nsS1
T→C
T→C
G→C
C→T
Tyr → His
Ile → Thr
Val → Leu
Thr → Met
6
bp
deletion
1
1
7
1
02_1987
98-R604_INH-RIF-EM
PGG2 and 3 specific
CDC1551
1
Haarlem
241
820
823
G→A
T→G
G→A
Ala → Thr
Ser → Ala
Ala → Thr
1
1
1
02_1987
M. bovis
T17. Confirmed.
109
T→C
Ser → Pro
1
K85
225
320
G→A
C→T
Trp → stop
Pro → Leu
1
1
K85
94_M4241A
1227
G→A
stop
1
T92. Sequence error. Normal sequence
confirmed. This variation not
included in analysis.
Homologous recombinations
Various combinations of 10 SNPs and a 45 bp deletion
4
02_1987, F11, KZN1435 and CPHL_A.
Mutations indicate recombination
with ppe27.
In-frame deletion
D1
825 - 827
1
M. bovis
1
3
1
98-R604_INH-RIF-EM
CDC1551, strain C and Haarlem
M. bovis
nsSNPs
nsS1
nsS2
sSNP
sS1
nsSNPs
nsS1
nsS2
nsS3
164
848
932
3
bp
deletion
C→T
C→T
T→G
Ala → Val
Ala → Val
Val → Gly
ppe27
Sublineage IV
(SVP
subfamily)
16 isolates
1053 bp
ppe38/ppe71
Sublineage IV
(SVP
subfamily)
18 isolates
1176 bp
ppe49
Sublineage IV
(SVP
subfamily)
18 isolates
1176 bp
ppe10
Sublineage V
(MPTR
subfamily)
14 isolates
1464 bp
ppe12
Sublineage V
(MPTR
subfamily)
17 isolates
1938 bp
ppe21
Sublineage V
(MPTR
subfamily)
17 isolates
2283 bp
ppe39
Sublineage V
(MPTR
subfamily)
14 isolates
1869 bp
sSNP
sS1
423
C→T
Ala
1
GM1503
nsSNPs
nsS1
163
G→C
Ala → Pro
1
K85
nsS2
568
C→T
Pro → Ser
1
M. bovis
sSNPs
sS1
543
A→C
Ala
3
EAI isolates EAS054, T46 and T92.
sS2
765
A→G
Pro
1
M. bovis
Hypervariable on a macro-mutational scale due to numerous instances of homologous recombination with identical
homologue ppe71 plus numerous IS6110-associated mutations. Micro-mutations (SNPs, small indels) are
uncommon [26].
nsSNP
nsS1
547
C→T
Gln → stop
1
Haarlem
Frameshifts
FS1
505
1
bp
deletion
Premature
stop
1
T85
nsSNP
nsS1
23
G→A
Trp → stop
1
nsS2
36
G→T
Glu → Asp
1
G→C
C→T
Gly → Ala
Pro → Leu
1
1
M. bovis. Eighth aa converted to stop.
Coding resumes at codon 9, resulting in
a gene with 8 N-terminal aa missing.
T92. Changes Glu of ppe signature
sequence.
K85
K85
3
EAI (Philippines lineage) specific
nsS3
863
nsS4
1400
In-frame insertions
I1
1043
Frameshifts
FS1
nsSNPs
nsS1
sSNPs
sS1
Frameshift
FS1
1125
1
bp
deletion
Premature
stop
1
T92. Sequence error. Normal sequence
confirmed. This variation not included in
analysis.
1634
A→G
Lys → Arg
11
TBD1- specific
1389
T→C
Ile
1
F11. Confirmed.
60
1
bp
deletion
Premature
stop
1
H37Rv
C→G
G→A
T→C
G→A
Pro → Arg
Trp → stop
Val → Ala
Gly → Asp
1
1
1
1
M. bovis
94_M4241A
M. bovis
Haarlem
nsSNPs
nsS1
107
nsS2
225
nsS3
449
nsS4
1844
IS6110 integrations
IS1
47
IS2
30
bp
insertion
Premature
stop
Premature
stop
19
Homologous recombination
HR1
550
Fusion
with
PPE40
Whole gene deletions
WGD1
WGD2
Partial gene deletion
PGD1
1358
In-frame deletions
Premature
stop
2
1
Haarlem and F11. Convergent mutation
[27].
H37Rv
2
K85 and 94_M4241A. Convergent
mutation.
1
1
T92. Part of large RD5-like deletion.
02_1987. Part of a major genomic
rearrangement [26].
1
M. bovis. Ppe39 part of the RD5
deletion.
D1
ppe40
Sublineage V
(MPTR
subfamily)
16 isolates
1848 bp
88 - 90
nsSNP
nsS1
539
IS6110 integrations
IS1
47
Homologous recombination
HR1
550
In-frame deletions
D1
490 - 492
Partial gene deletion
PGD1
1582
nsSNP
nsS1
nsS2 – S6
ppe6
Sublineage V
(MPTR
subfamily)
15 isolates
2892 bp
This gene is split into 2 (ppe5/6) in M. bovis and K85 (type 1) and in H37Rv and T17 (type 2). It is split into 3 predicted open reading frames in CPHL_A.
ppe5
Sublineage V
1096
1100 - 1104
sSNP
sS1
969
In-frame insertions
I1
2379
I2
7604
In-frame deletions
D1
834 - 842
D2
4763 - 4822
Frameshifts
FS1
2399
FS2
2930
FS3
5945
nsSNP
nsS1
nsS2
nsS3
nsS4
nsS5
3
bp
deletion
2
EAS054 and CDC1551.
Convergent mutation.
Leu → Ser
1
98-R604_INH-RIF-EM
Premature
stop
2
02_1987 and CPHL_A. Convergent
mutation [27].
Fusion
with ppe39
2
K85 and 94_M4241A. Convergent
mutation [26].
3
bp
deletion
1
M. bovis
T→C
RD5-like
deletion
Premature
stop
1
T92. Large deletion fuses 5’ region of
PPE40 with plcC.
G→C
CTGGA →
ACAAC
Gly → Arg
Thr, Gly →
Asn, Asn
1
1
KZN1435
KZN1435. nsS1 - S6 represent 6 SNPs
in a 9 bp region.
T→C
Asn
1
Strain C
30
bp
insertion
15
bp
insertion
1
KZN1435
5
PGG2 and 3 specific
9
bp
deletion
60
bp
deletion
1
94_M4241A
1
F11. Confirmed.
H37Rv
CPHL_A. FS1 and 2 both result in a
new gene (ppe5) starting at ppe6
codon 983.
CPHL_A, K85 and M. bovis specific.
Results in new gene (ppe5) starting at
ppe6 codon 2035. Note alternate ppe5 to
that formed from FS1-3. Third ppe
gene formed in CPHL_A (see ppe
FS1).
1
bp
insertion
1
bp
deletion
1
bp
deletion
Premature
stop
Premature
stop
Premature
stop
1
617
2727
2798
6118
6763
G→C
A→G
G→T
G→A
A→G
Gly → Ala
Ile → Met
Gly → Val
Asp → Asn
Ile → Val
3
1
4
1
3
nsS6
7358
G→A
Gly → Glu
2
nsS7
nsS8
8219
8240
G→C
G→C
Gly → Ala
Phe → Ser
1
8
nsS9
9412
nsS10
9464
sSNP
sS1
1359
sS2
2928
sS3
3562
sS4
4446
sS5
5135
sS6
5454
sS7
5655
In-frame insertions
I1
4658
A→G
G→A
Asn → Asp
Gly → Asp
1
1
EAI (Philippines lineage) specific
94_M4241A
LAM and PGG3 specific
02_1987
EAI specific. Same mutation seen in
ppe5 nsS1.
EAI (Philippines lineage) specific. Same
mutation seen in ppe5 nsS2.
94_M4241A
TBD1- specific. Same mutation as seen
in ppe5 nsS4.
T92. Confirmed.
T92. Confirmed.
T→C
G→A
T→C
G→A
G→A
C→G
C→T
Arg
Gly
Leu
Ser
Pro
Thr
Gly
1
1
8
1
1
1
1
T85
Haarlem
TBD1- specific
F11. Confirmed.
T46
02_1987
EAS054
1
H37Rv. Same insertion as seen in
15
bp
1
2
(MPTR
subfamily)
15 isolates
6615 bp
insertion
ppe6 I2.
Frameshifts
FS1
2929
1
bp
deletion
Premature
stop
1
CPHL_A. Results in 3rd ppe gene
starting at ppe6 codon 2035 (see
ppe6 FS4). Confirmed.
Ppe5 formed from split of ppe6. Only present in M. bovis, K85 and CPHL_A (type 1 PPE5) and in H37Rv, T17 and CPHL_A (type 2). CPHL_A ppe5 is further split into an additional gene.
ppe54
Sublineage V
(MPTR
subfamily)
10 isolates
7572 bp
ppe8
Sublineage V
(MPTR
subfamily)
12 isolates
9903 bp
nsSNP
nsS1
nsS2
1441
5295
G→A
G→C
Gly → Thr
Phe → Ser
2
1
nsS3
sSNP
sS1
2863
A→G
Thr → Ala
1
M. bovis & K85 specific
H37Rv. Same mutation seen in ppe6
nsS8.
M. bovis
616
T→C
Leu
1
sS2
1212
T→C
Gly
1
One gene (ppe8) in TBD1+. Two genes (ppe7 and ppe8) in TBD1- due to frameshift (FS2) with termination in ppe8 and new start site.
In-frame insertions
I1
5003
H37Rv (same mutation seen in
ppe6 sS3).
K85
Extreme variation observed. All isolates unique.
Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence in 3 members
of the Harlingen transmission chain [58,59] (average coverage = 84%).
In-frame deletions
D1
6434 - 6493
D2
7506 - 7535
D3
9311 - 9370
I2
7352
Frameshifts
FS1
8947
FS2
9875
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
nsS6
nsS7
nsS8
nsS9
nsS10
nsS11
nsS12
nsS13
nsS14
nsS15
nsS16
sSNPs
sS1
sS2
sS3
sS4
sS5
sS6
sS7
60
bp
deletion
30
bp
deletion
60
bp
deletion
2
LCC (CDC1551 and strain C) specific
1
Strain C
3
EAS054, KZN1435 and 98-R604_INH-RIF-EM. Possible convergence.
15
bp
insertion
15
bp
insertion
1
98-R604_INH-RIF-EM
1
H37Rv
1
Strain C
8
All TBD1+ isolates. Coding region for
ppe7 begins at position 9973 for TBD1+
isolates.
1
bp
insertion
2
bp
deletion
Premature
stop
Premature
stop
353
1240
3578
4027
4639
5520
5840
6296
6337
7173
7756
7897
8484
9733
9931
10418
T→C
T→G
C→G
G→A
G→T
A→G
G→A
G→A
T→C
T→G
G→T
G→A
C→A
T→A
G→A
T→C
Val → Ala
Phe → Val
Ala → Gly
Gly → Ser
Ala → Ser
Asn → Asp
Gly → Asp
Gly → Asp
Trp → Arg
Ser → Arg
Gly → Trp
Gly → Ser
Phe → Leu
Phe → Ile
Ser → Thr
Phe → Val
2
1
1
1
1
1
1
1
1
1
1
1
1
8
1
1
657
3357
3924
4122
5433
5982
7209
G→A
C→A
C→T
C→T
G→A
A→G
A→C
Ser
Gly
Ile
Asn
Gly
Ala
Gly
1
1
1
1
1
7
1
EAI specific
CPHL_A. Confirmed.
EAS054
K85
K85
M. bovis
Haarlem
T46
Strain C
Strain C
H37Rv
EAI specific
K85. Confirmed.
All TBD1- isolates
K85
EAS054
94_M4241A
K85
Haarlem
EAS054
F11. Confirmed.
All PGG2 and 3 isolates
M. bovis
ppe7
Sublineage V
(MPTR
subfamily)
8 isolates
426 bp
ppe16
Sublineage V
(MPTR
subfamily)
16 isolates
1857 bp
sS8
Frameshift
FS1
nsSNPs
nsS1
ppe34
Sublineage V
(MPTR
subfamily)
17 isolates
4380 bp
ppe35
Sublineage V
(MPTR
subfamily)
18 isolates
2964 bp
C→T
Pro
1
94_M4241A
375
1
bp
deletion
Premature
stop
1
H37Rv
271
G→T
Ala → Ser
1
CDC1551
1
CPHL_A. Ppe16 deleted along with
neighbouring gene Rv1134.
Premature
stop
1
T85
Premature
stop
1
K85
Whole gene deletion
WGD1
IS6110 integration
IS1
1222
Frameshifts
FS1
ppe24
Sublineage V
(MPTR
subfamily)
13 isolates
3162 bp
ppe13
Sublineage V
(MPTR
subfamily)
17 isolates
1332 bp
9684
1333 - 1337
5
bp
deletion
nsSNPs
nsS1
85
T→A
Val → Asp
1
T17. Confirmed.
nsS2
314
C→T
Ala → Val
1
CDC1551
sSNP
sS1
342
G→A
Val
1
K85. Confirmed.
Extreme variation observed. All isolates unique.
Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence in 3 members
of the Harlingen transmission chain [58,59] (average coverage = 84%).
Frameshifts
FS1
51
FS2
1306
FS3
1307
FS4
1313
FS5
1314
1
bp
deletion
1
bp
deletion
1
bp
insertion
2
bp
deletion
1
bp
deletion
Premature
stop
Poly C/poly A region from position 1298 results in numerous FS variations.
1
KZN1435
5
PGG1 and 2 isolates GM1503, T17,
T46, T92 and K85.
PGG1 and 2 isolates F11, KZN1435,
CPHL_A and EAS054. Note:
sequence reanalysis shows CPHL_A
has a 2 bp insertion.
CPHL_A
4
1
9
PGG1 and 2 isolates M. bovis, strain
C, 98-R604, T17, T46, T92, K85,
EAS054, GM1503.
nsSNPs
nsS1
289
G→A
Ala → Ser
3
EAI (Philippines lineage) specific
nsS2
513
G→A
Trp → stop
1
EAS054
sSNPs
sS1
732
C→T
Asn
1
K85
sS2
1008
C→T
Gly
4
EAI specific
Extreme variation observed. All isolates unique.
Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence in 3 members
of the Harlingen transmission chain [58,59] (average coverage = 84%).
In frame deletion
D1
1603 - 1680
78
bp
deletion
1
Haarlem
Frameshift
FS1
1877
1
bp
insertion
Premature
stop
1
M. bovis. Results in stop codon at
nucleotide position 1953 – 1955 and
new predicted gene (ppe35b) start
codon at position 2038.
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
1949
1960
335
643
2708
G→A
C→A
C→T
A→C
G→A
Gly → Asp
Ser → Thr
Thr → Ile
Ser → Arg
Gly → Asp
1
1
1
1
1
Strain C
K85. Confirmed.
M. bovis. Ppe35b (see FS1).
M. bovis. Ppe35b (see FS1).
EAS054
ppe28
Sublineage V
(MPTR
subfamily)
17 isolates
1968 bp
ppe63
Sublineage V
(MPTR
subfamily)
15 isolates
1440 bp
ppe42
Sublineage V
(MPTR
subfamily)
14 isolates
1743 bp
ppe53
Sublineage V
(MPTR
subfamily)
12 isolates
1773 bp
ppe62
Sublineage V
(MPTR
subfamily)
15 isolates
1749 bp
ppe52
Sublineage V
(MPTR
subfamily)
11 isolates
1230 bp
ppe64
Sublineage V
(MPTR
subfamily)
16 isolates
1659 bp
nsS6
sSNP
sS1
Frameshift
FS1
nsSNPs
nsS1
nsS2
nsS3
nsS4
sSNP
sS1
nsSNPs
nsS1
nsS2
nsSNPs
nsS1
nsS2
2765
C→T
Ser → Leu
8
PGG2 and 3 specific
2238
C→G
Val
1
Strain C
169 - 213
45
bp
deletion/
44
bp
insertion
Premature
stop
1
T17. Sequence error. Normal sequence confirmed. This variation
not included in analysis.
432
449
757
1508
G→T
C→T
T→G
T→C
Trp → Cys
Ala → Val
Phe → Val
Val → Ala
10
2
2
1
TBD1- specific
Haarlem and strain C.
M. bovis and K85
M. bovis
1509
C→T
Val
1
T46
1093
1265
T→A
C→T
Tyr → Asn
Thr → Met
4
1
EAI specific
K85
157
841
G→A
C→T
Ala → Thr
Pro → Ser
1
2
CDC1551
M. bovis and K85
4
M. bovis, T17, T46 and K85
1
EAS054
In frame deletion
D1
190 - 192
3
bp
deletion
30
bp
deletion
D2
1186 - 1215
Frameshift
FS1
60
1
bp
deletion
Premature
stop
2
T17 and T46
nsSNPs
nsS1
97
C→G
Arg → Gly
2
C→A
G→T
A→G
Asp → Glu
Gly → Val
Thr → Ala
1
1
1
EAS054 and Haarlem. Possible
convergent mutation.
M. bovis
EAS054
EAS054
3
bp
deletion
1
GM1503
90
bp
insertion
1
CPHL_A
nsS2
612
nsS3
1013
nsS4
1681
In frame deletion
D1
207 - 209
In frame insertion
I1
956
nsSNPs
nsS1
nsS2
Frameshift
FS1
nsSNPs
nsS1
1026
1690
C→A
G→A
Ser → Arg
Gly → Ser
1
2
Haarlem
T17 and T46
284
1
bp
insertion
Premature
stop
1
T85
1198
C→A
Gln → Lys
5
PGG2 and 3 specific
3
bp
deletion
60
bp
deletion.
4
LAM specific
1
K85
30
bp
insertion.
1
02_1987
1
T92. Confirmed.
In frame deletion
D1
88 - 90
D2
590 - 649
In frame insertion
I1
911
Frameshift
FS1
757
1
bp
insertion.
Premature
stop.
ppe55
Sublineage V
(MPTR
subfamily)
9 isolates
9474 bp
ppe56
Sublineage V
(MPTR
subfamily)
6 isolates
11151 bp
nsSNPs
nsS1
34
A→G
Asn → Asp
1
CPHL_A
nsS2
916
G→A
Gly → Ser
4
EAI specific
nsS3
1019
A→G
Ile → Ser
1
GM1503
Extreme variation observed. Numerous frameshifts split the gene into 2 or 3 distinct open reading frames in several
isolates.
Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence observed in 3
members of the Harlingen transmission chain [58,59] (average coverage = 84%).
Frameshift
FS1
6476 - 6546
FS2
6041
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
nsS6
nsS7
nsS8
sSNPs
sS1
sS2
sS3
sS4
Table S1B
71
bp
deletion and
58
bp
insertion.
1
bp
deletion
Premature
stop.
1
Haarlem
Premature
stop.
1
M. bovis (ppe56b). New gene (ppe56d)
begins position 7705. Note: no ppe56c
listed in BoviList [76].
52
1518
T→C
C→A
Cys → Arg
Tyr → stop
1
1
2897
1360
3149
3150
5143
3878
G→A
A→G
C→T
C→T
G→T
C→T
Gly → Asp
Asn → Asp
Thr → Ile
Thr → Ile
Gly → Cys
Ala → Val
1
1
1
1
1
1
M. bovis
M. bovis. New gene (ppe56b) begins at
position 1576.
EAS054
M. bovis (ppe56b)
EAS054
EAS054
EAS054
M. bovis (ppe56b)
906
126
3777
1566
G→C
C→G
G→A
G→A
Gly
Ala
Glu
Ser
1
1
2
1
M. bovis
M. bovis (ppe56b)
EAI isolates EAS054 and T46
M. bovis (ppe56d)
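Note on the indel categories used above. Whether an insertion or deletion keeps the reading frame depends on whether its net length change is a multiple of 3: the 27 bp deletion is listed as in-frame, the 117 bp deletion removes the first 39 amino acids, and a 71 bp deletion combined with a 58 bp insertion appears under frameshifts. The Python sketch below illustrates only that arithmetic. The function names are hypothetical and are not part of the original analysis, and the sketch does not predict whether a frameshift produces a premature stop, since that depends on the downstream sequence.

```python
def classify_indel(deleted_bp: int = 0, inserted_bp: int = 0) -> str:
    """Label an indel from its net length change alone (hypothetical helper)."""
    if deleted_bp < 0 or inserted_bp < 0:
        raise ValueError("lengths must be non-negative")
    if deleted_bp == 0 and inserted_bp == 0:
        raise ValueError("no length change given")
    net = inserted_bp - deleted_bp
    # A net change that is a multiple of 3 leaves the reading frame intact.
    return "in-frame" if net % 3 == 0 else "frameshift"


def codons_changed(deleted_bp: int = 0, inserted_bp: int = 0) -> int:
    """Net whole codons gained (+) or lost (-); defined for in-frame events only."""
    net = inserted_bp - deleted_bp
    if net % 3 != 0:
        raise ValueError("frameshift: not a whole number of codons")
    return net // 3


if __name__ == "__main__":
    # Lengths taken from entries in the table above.
    examples = [(27, 0), (0, 1), (4, 13), (71, 58), (117, 0)]
    for deleted, inserted in examples:
        label = classify_indel(deleted, inserted)
        print(f"{deleted} bp deleted, {inserted} bp inserted: {label}")
```

Only the net change matters for the frame, so a combined deletion and insertion such as 4 bp out and 13 bp in is treated the same way as a simple 9 bp insertion.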
Gene
Sequence variation
Position
Genetic change
Amino acid change
Number of isolates
Comments
pe35
Sublineage I
17 isolates
297 bp
Frameshift
FS1
10
1 bp deletion
Premature
stop
2
94_M4241A and
CPHL_A.
Convergent mutation.
Occurs in polyA
sequence.
nsSNP
nsS1
295
T→G
Stop → Glu
1
H37Rv. 1 additional
amino acid added to C-terminal end.
pe34
Sublineage I
18 isolates
336 bp
pe5
Sublineage II
18 isolates
309 bp
pe15
Sublineage II
17 isolates
309 bp
pe29
Sublineage II
9 isolates
315 bp
pe36
Sublineage III
18 isolates
234 bp
pe25
Sublineage III
18 isolates
300 bp
pe22
Sublineage III
18 isolates
297 bp
nsSNP
nsS1
155
C→T
Thr → Met
1
GM1503
G→A
Ala → Thr
1
Haarlem
1
F11. Deletion associated
with IS6110. Deletion
also includes part of
adjacent
ppe36 gene.
pe11
Sublineage IV
18 isolates
303 bp
pe20
Sublineage IV
16 isolates
300 bp
pe18
Sublineage IV
18 isolates
300 bp
sSNP
sS1
1
T85
1
98-R604_INH-RIF-EM
pe19
Sublineage IV
18 isolates
300 bp
pe32
Sublineage IV
18 isolates
300 bp
No variation detected.
nsSNP
nsS1
151
No variation detected.
No variation detected.
No variation detected.
Whole gene deletion
WGD1
90
C→T
Asp
No variation detected.
IS6110 integration
IS1
195
nsSNP
nsS1
4
sSNP
sS1
300
sSNP
sS1
123
T→C
Ser → Pro
1
Haarlem
G→A
Stop
1
M. bovis
C→T
Pro
1
M. bovis
2
K85 and M. bovis (RD8
deletion).
1
Strain C
Whole gene deletion
WGD1
sSNP
sS1
84
A→G
Gly
pe13
Sublineage IV
17 isolates
300 bp
pe31
Sublineage IV
17 isolates
297 bp
pe7
Sublineage IV
17 isolates
300 bp
pe8
Sublineage IV
16 isolates
828 bp
pe27
Sublineage IV
15 isolates
828 bp
pe2
Sublineage
V
(PGRS subfamily)
18 isolates
1578 bp
pe24
Sublineage
V
(PGRS subfamily)
18 isolates
1005 bp
pe26
Sublineage
V
(PGRS subfamily)
8 isolates
1479 bp
pe4
Sublineage
V
(PGRS subfamily)
18 isolates
1509 bp
pe3
Sublineage
V
(PGRS subfamily)
17 isolates
1407 bp
pe12
Sublineage
V
(PGRS subfamily)
18 isolates
No variation detected.
nsSNP
nsS1
77
sSNP
sS1
84
No variation detected.
C→T
Ala → Val
1
M. bovis
T→C
Asn
1
F11. Confirmed.
511
G→A
Ala → Thr
1
K85. Confirmed.
243
810
G→A
C→T
Gly
Pro
1
1
H37Rv
K85. Confirmed.
128
C→T
Ala → Val
2
152
136
808
T→G
C→T
A→G
Leu → Arg
Pro → Ser
Met → Val
1
6
5
EAI (Philippines lineage)
isolates T17 and T46.
CDC1551
TBD1+ isolate specific.
H37Rv, F11, 98-R604_INH-RIF-EM,
GM1503 and KZN1435.
998
1021
1024
1027
1 bp deletion
1 bp deletion
1 bp deletion
1 bp deletion
Combine to produce
premature stop.
1
EAS054
872
G→A
Gly → Glu
4
LAM specific.
903
G→A
Glu
1
Strain C
932
G→A
Gly → Val
2
Beijing isolates 02_1987
and T85.
nsSNPs
nsS1
nsS2
518
519
G→C
G→C
Gly → Ala
Gly → Ala
1
1
M. bovis
M. bovis
Frameshift
FS1
360
1 bp deletion
Premature
stop
1
GM1503
346
492
631
1108
1426
G→A
G→C
G→A
C→T
C→G
Ala → Thr
Lys → Asn
Ala → Thr
Gln → stop
Gln → Glu
1
4
2
1
1
M. bovis
EAI specific.
M. bovis and K85.
K85. Confirmed.
T46
232
C→A
Arg
1
T46
40
271
763
1102
G→A
G→C
C→A
C→T
Ala → Thr
Glu → Gln
Pro → Thr
Arg → Trp
1
1
1
1
H37Rv
94_M4241A
M. bovis
CDC1551
39
G→C
Thr
1
CPHL_A. Confirmed.
98
1 bp insertion
Premature
stop
1
02_1987
nsSNP
nsS1
sSNPs
sS1
sS2
nsSNPs
nsS1
nsS2
nsS3
nsS4
Frameshifts
FS1
FS2
FS3
FS4
nsSNPs
nsS1
sSNP
sS1
nsSNP
nsS1
nsSNPs
nsS1
nsS2
nsS3
nsS4
nsS5
sSNPs
sS1
nsSNPs
nsS1
nsS2
nsS3
nsS4
sSNP
sS1
Frameshift
FS1
nsSNPs
to
927 bp
pe14
Sublineage
V
(PGRS subfamily)
15 isolates
333 bp
pe16
Sublineage
V
(PGRS subfamily)
15 isolates
1587 bp
pe23
Sublineage
V
(PGRS subfamily)
17 isolates
1149 bp
pe17
Sublineage
V
(PGRS subfamily)
16 isolates
933 bp
pe1
Sublineage
V
(PGRS subfamily)
18 isolates
1767 bp
pe9
Sublineage
V
(PGRS subfamily)
18 isolates
435 bp
pe10
Sublineage
V
(PGRS subfamily)
15 isolates
363 bp
pe33
Sublineage
V
(PGRS subfamily)
17 isolates
285 bp
pe6
Sublineage
V
(PGRS subfamily)
14 isolates
516 bp
nsS1
nsS2
Frameshift
FS1
339
649
A→C
C→T
Gln → His
Leu → Phe
2
1
M. bovis and K85.
CPHL_A. Confirmed.
204 - 208
5 bp deletion
Extended
protein
1
M. bovis
2
Beijing isolates
94_M4241A and
02_1987.
3
1
2
Beijing specific
KZN1435
Beijing isolates 02_1987
and T85.
1
94_M4241A
sSNP
sS1
nsSNPs
nsS1
nsS2
nsS3
T→C
A→G
G→A
846
983
1030
In frame deletion
D1
464 - 466
Frameshift
FS1
1192
1255
Ser → Arg
Gln → Arg
Ala → Thr
3 bp deletion
-
64
bp
deletion/ 59
bp insertion
Premature
stop
1
98-R604. Inserted
sequence derives from
Rv0446c.
M. bovis
Haarlem, strain C and
CDC1551.
KZN1435
K85
Haarlem, strain C and
CDC1551.
nsSNPs
nsS1
nsS2
76
364
G→A
G→A
Gly → Arg
Ala → Thr
1
3
nsS3
nsS4
nsS5
369
1481
1714
C→A
C→T
A→G
Asn → Lys
Pro → Leu
Ile → Val
1
1
3
sSNP
sS1
1453
No variation detected.
T→C
Leu
1
H37Rv
Frameshift
FS1
338
1 bp deletion
Extended
protein
2
Beijing isolates 02_1987
and 94_M4241A.
nsSNP
nsS1
281
G→A
Gly → Asp
2
EAI (Philippines lineage)
isolates T17 and T92.
No variation detected
Frameshift
FS1
141
1 bp deletion
Premature
stop
2
EAI isolates T17 and
EAS054.
nsSNP
nsS1
470
C→T
Leu → Phe
1
K85
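Note on the SNP categories. The nsSNP and sSNP labels in both tables separate substitutions that change the encoded amino acid (including changes to or from a stop codon) from those that do not. The Python sketch below shows one way such a label could be derived from a codon and a single-base change, using the standard codon assignments. The function name is hypothetical, and the example codons were chosen only to be consistent with amino acid changes reported in the tables; the actual codons in each gene are not given here.

```python
from itertools import product

BASES = "TCAG"
# Standard codon assignments in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(ref_codon: str, offset: int, alt_base: str):
    """Return (ref_aa, alt_aa, label) for one base change in a codon.

    offset is 0, 1 or 2 (position within the codon). Hypothetical helper,
    not the method used to build the tables.
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if len(ref_codon) != 3 or offset not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a 3-base codon, offset 0-2 and a DNA base")
    if ref_codon[offset] == alt_base:
        raise ValueError("alternate base equals reference base")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "sSNP" if ref_aa == alt_aa else "nsSNP"
    return ref_aa, alt_aa, label


if __name__ == "__main__":
    # Assumed codons: GCC (Ala), CCC (Pro), TGG (Trp).
    print(classify_snp("GCC", 1, "T"))  # C→T, Ala → Val
    print(classify_snp("CCC", 2, "T"))  # C→T, Pro unchanged
    print(classify_snp("TGG", 2, "A"))  # G→A, Trp → stop
```

In this sketch a change that creates or removes a stop codon is counted as an nsSNP, matching how entries such as Gln → stop and Stop → Trp are filed in the tables.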