Table S1. Gene Sequence variation ppe68 Sublineage I 18 isolates 1107 bp ppe4 Sublineage II (PPW subfamily) 14 isolates 1542 bp ppe11 Sublineage II (PPW subfamily) 18 isolates 1557 bp ppe37 Sublineage II (PPW subfamily) 15 isolates 1422 bp ppe67 Sublineage II (PPW subfamily) 15 isolates 234 bp ppe2 Sublineage II (PPW subfamily) 11 isolates 1671 bp ppe3 Sublineage II (PPW subfamily) 10 isolates 1611 bp ppe46 Position Genetic change Amino acid change Number of isolates Comments nsSNPs nsS1 nsS2 86 685 C→T G→C Ala → Val Val → Leu 1 4 Haarlem EAI specific. Frameshift FS1 1057 1 bp insertion Premature stop 1 02_1987 941 T→C Val → Ala 1 EAS054 460 555 C→T C→G Val Ala 1 8 K85. Confirmed. TBD1- specific. 1288 1510 C→T A→G Arg → Cys Met → Val 3 1 CDC1551, strain C and Haarlem. F11. Confirmed. 1248 G→A Val 1 Strain C 1 M. bovis 2 Beijing isolates T85 and 02_1987. 4 LAM specific. Incorrect amino acid incorporation from codon 340. 1 1 1 CPHL_A Confirmed. T85 M. bovis 1 M. bovis. 1 2 CPHL_A PGG1 isolates T92 and CPHL_A. Mutation adds additional 333 amino acids before next stop codon. Apparent convergent mutation. nsSNP nsS1 sSNP sS1 sS2 nsSNP nsS1 nsS2 sSNP sS1 In-frame deletion D1 91 - 117 Frameshifts FS1 507 FS2 1017 - 1018 27 bp deletion 1 bp deletion 2 bp deletion Premature stop No premature stop. nsSNPs nsS1 370 G→A Val → Met nsS2 449 C→G Ala → Gly nsS3 563 G→T Ser → Ile Whole gene deletion WGD1 1166bp deletion deletes ppe67 and Nterminus of ppe66. nsSNPs nsS1 53 T→G Leu → Arg nsS2 233 A→G Stop → Trp sSNP sS1 nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 sSNP sS1 nsSNPs nsS1 nsS2 186 T→C Gly 1 Haarlem 419 1211 1292 1381 1487 A→G C→T G→A G→T G→A Glu → Gly Pro → Leu Asp → Asn Ala → Ser Gly → Asp 1 1 1 1 1 CPHL_A. Confirmed. CDC1551 CPHL_A. Confirmed. CDC1551 94_M4241A 1236 C→T Thr 1 T46 556 769 G→T G→A Asp → Tyr Glu → Lys 1 2 C→T G→T Pro → Ser Glu → Asp 1 1 T46 CPHL_A and 02_1987. Apparent convergent mutation. 
H37Rv CPHL_A C→T C→T Leu Gly 1 1 T46 02_1987 nsS3 1009 nsS4 1344 sSNPs sS1 145 sS2 1338 Homologous recombination Sublineage II (PPW subfamily) 7 isolates 1305 bp ppe48/ppe47 Sublineage II (PPW subfamily) 8 isolates 1077 bp ppe66 16 isolates 946 bp ppe1 Sublineage II (PPW subfamily) 18 isolates 1392 bp ppe20 Sublineage II (PPW subfamily) 14 isolates 1620 bp ppe36 Sublineage III 18 isolates 732 bp HC1 In-frame insertion/ deletion Indel1 4-7 nsSNP nsS1 149 nsS2 973 Homologous recombinations HC1 596 - 929 HC2 596 - 977 Frameshift FS1 244 380 bp deletion / 383 bp insertion 4 K85, F11, CDC1551, and KZN 1435. Results from recombination with ppe47. 4 bp deletion / 13 bp insertion 1 M. bovis 1 1 M. bovis M. bovis. bp / bp 1 02_1987. Results from recombination with ppe46. bp / bp 1 Haarlem. Results from recombination with ppe46. 1 H37Rv. New H37Rv gene (ppe47) predicted to start at position 231. 1 M. bovis 1 M. bovis 1 GM1503 4 1 EAI specific EAS054 3 1 1 3 1 Beijing specific Haarlem M. bovis EAI (Philippines lineage) specific K85 1 1 2 K85 Strain C K85 and M. bovis 1 1 1 1 1 T85 M. bovis 94_M4241A T17. Confirmed. K85. Confirmed. 1 M. bovis 1 F11. Deletion is IS6110 associated. Deletion also involves adjacent gene pe22. T→C C→G 333 deletion 330 insertion 381 deletion 378 insertion Val → Ala Leu → Val 1 bp deletion Premature stop nsSNP nsS1 149 T→C Val → Ala Partial Gene Deletion PGD1 1166bp deletion deletes ppe67 and Nterminus of ppe66. 
Frameshift FS1 897 1 bp Premature insertion stop nsSNPs nsS1 512 C→A Thr → Asn nsS2 706 G→T Val → Leu nsSNPs nsS1 413 T→C Val → Ala nsS2 476 C→T Thr → Met nsS3 544 C→T Arg → Trp nsS4 803 T→C Leu → Pro nsS5 1364 C→T Thr → Ile sSNPs sS1 894 G→A Pro sS2 945 C→G Pro sS3 1011 C→A Pro nsSNPs nsS1 171 G→C Glu → Asp nsS2 281 T→C Val → Ala nsS3 1327 G→T Ala → Ser nsS4 1415 C→T Pro → Leu nsS5 1445 C→T Ala → Val sSNP sS1 135 A→C Ser Partial gene deletion PGD1 1 – 124 5’ 124 bp deletion Frameshift FS1 ppe69 Sublineage III 15 isolates 1200 bp 596 - 976 136 nsSNP nsS1 539 Partial gene deletion PGD1 1 - 58 nsSNPs nsS1 341 1 bp insertion Premature stop 1 Strain C A→C Glu → Ala 1 T92. Confirmed. 58 bp deletion 1st 23 amino acids deleted 1 Haarlem. Deletion of 5’ gene region. Predicted alternate start codon at position 70. A→T Glu → Val 1 CDC1551 ppe41 Sublineage III 16 isolates 585 bp ppe57 Sublineage III 16 isolates 531 bp ppe58 Sublineage III 16 isolates 522 bp ppe59 Sublineage III 18 isolates 537 bp ppe9 Sublineage IV (SVP subfamily) 9 isolates 543 bp ppe17 Sublineage IV (SVP subfamily 18 isolates 1041 bp ppe29 Sublineage IV (SVP subfamily 13 isolates 1272 bp ppe30 Sublineage IV (SVP subfamily 15 isolates 1392 bp ppe31 Sublineage IV (SVP subfamily 17 isolates 1200 bp ppe32 nsS2 610 nsS3 611 sSNPs sS1 204 sS2 366 sS3 609 Partial gene deletion PGD1 1 - 117 G→T A→G Asp → Cys Asp → Cys 1 1 CDC1551 CDC1551 G→A C→T T→A Ala Asp Gly 2 2 1 CDC1551 and strain C EAI isolates T46 and EAS054 CDC1551 117 bp deletion 1st 39 amino acids deleted 1 GM 1503. Genomic deletion spans 3’ region of upstream gene (pe25) and 5’ region of ppe41. sSNP sS1 177 A→C Pro 1 EAS054 Whole gene deletions 98-R604 INH-RIF-EM, Haarlem, strain C, KZN1435, GM1503, CDC1551. Homologous recombination Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes. Whole gene deletions 98-R604 INH-RIF-EM, Haarlem, strain C, KZN1435, GM1503, CDC1551, M. bovis. 
Homologous recombination Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes. Whole gene deletion M. bovis Homologous recombination Multiple instances of homologous recombination between the highly homologous ppe57, ppe58 and ppe59 genes. In frame deletion D1 1146 - 1154 Frameshifts FS1 nsSNPs nsS1 Frameshift FS1 nsSNP nsS1 sSNP sS1 Frameshifts FS1 2 EAI (Philippines lineage) isolates T17 and T46 970 1 bp insertion Premature stop 1 T85 425 A→C Glu → Ala 1 M. bovis 501 1 bp insertion Premature stop 1 M. bovis (ppe17a) 500 C→T Pro → Leu 2 Beijing isolates 02_1987 and T85 675 C→T Pro 1 K85 641 1 bp insertion No premature stop. 1 Haarlem T→G G→T C→A G→C Val → Gly Ala → Ser Ala → Glu Ala → Glu 1 1 1 1 CPHL_A K85. Confirmed. K85. Confirmed. CPHL_A T→G Leu 1 94_M4241A 1 T46 nsSNPs nsS1 353 nsS2 439 nsS3 731 nsS4 1096 sSNP sS1 1105 IS6110 integration IS1 1294 nsSNPs nsS1 484 nsS2 1202 nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 sSNP sS1 nsSNP 9 bp deletion G→T C→T Gln → Stop Ser → Leu 1 1 K85 CDC1551 287 500 574 680 712 C→G A→C C→T C→T C→G Ala → Gly Gln → Pro His → Tyr Ser → Phe Leu → Val 1 1 1 1 1 GM1503 K85 CPHL_A H37Rv H37Rv 1110 C→T Pro 1 M. bovis 901 C→G Ala → Gly 8 PGG2 and 3 specific 141 G→A Ser 11 TBD1- specific nsSNPs nsS1 568 C→T Gln → stop 1 nsS2 nsS3 nsS4 760 1201 1412 C→G G→A C→G Leu → Val Gly → Arg Ser → stop 1 1 11 M. bovis. New gene (ppe33b) predicted to begin at position 571. H37Rv K85 TBD1- specific. Results in loss of 2 C-terminal amino acids. C→T Ala 1 H37Rv 2 K85 and M. bovis. Part of a 5894 bp deletion in M. bovis compared to H37Rv. 
Sublineage IV (SVP subfamily 18 isolates 1230 bp nsS1 sSNPs sS1 ppe33 Sublineage IV (SVP subfamily 17 isolates 1407 bp ppe65 Sublineage IV (SVP subfamily 17 isolates 1242 bp ppe14 Sublineage (SVP subfamily 18 isolates 1272 bp ppe50 Sublineage (SVP subfamily 18 isolates 399 bp ppe51 Sublineage (SVP subfamily 17 isolates 1143 bp ppe61 Sublineage (SVP subfamily) 18 isolates 1221 bp IV IV sSNP sS1 471 Whole gene deletion WGD1 nsSNPs nsS1 nsS2 sSNPs sS1 sS2 sS3 nsSNPs nsS1 nsS2 sSNP sS1 83 1066 C→G G→A Ala → Gly Ala → Thr 1 1 Strain C T17. Confirmed. 339 381 777 C→T T→G C→A Ala Leu Ala 1 1 4 CDC1551 CPHL_A TBD1+ specific 481 878 G→T C→T Ala → Ser Thr → Met 1 1 Haarlem CPHL_A. Confirmed. 1200 C→A Gly 1 02_1987 Hypervariable at macromutational scale [32]. Whole gene deletion in 8 isolates (CDC1551, Haarlem, strain C, 94_M4241A, T17, T92, T46, EAS054). No variation detected. IV IV ppe44 Sublineage IV (SVP subfamily) 18 isolates 1149 bp ppe15 Sublineage IV (SVP subfamily) 16 isolates 1176 bp In-frame deletion D1 82 - 84 Frameshift FS1 3 bp deletion 2 EAS054 and 94_M4241A. Convergent mutation. 796 5 bp insertion Premature stop 1 CPHL_A. Confirmed. 421 770 1100 C→T C→T C→T Gln → stop Thr → Met Ala → Val 1 4 1 CDC1551 EAI specific K85 942 G→C Ser 1 Strain C 176 581 G→T T→C Gly → Val Phe → Ser 4 8 EAI specific PGG1 specific. 624 C→T Ala 1 M. bovis Frameshifts FS1 8 1 bp deletion 1 02_1987. Ppe motif absent. FS2 23 1 Alternate start site at position 43 predicted. Alternate 1 94_M4241A. Ppe motif absent. nsSNPs nsS1 nsS2 nsS3 sSNP sS1 nsSNPs nsS1 nsS2 sSNP sS1 bp ppe43 Sublineage IV (SVP subfamily) 17 isolates 1185 bp ppe18 Sublineage (SVP subfamily) 16 isolates 1176 bp ppe19 Sublineage (SVP subfamily) 18 isolates 1191 bp ppe60 Sublineage (SVP subfamily) 15 isolates 1182 bp ppe22 Sublineage (SVP subfamily) 14 isolates 1158 bp ppe26 Sublineage (SVP subfamily) 17 isolates 1182 bp IV nsSNPs nsS1 nsS2 Frameshift FS1 deletion start site at position 43 predicted. 
199 541 G→A G→T Ala → Thr Ala → Ser 1 1 KZN1435 EAS054 448 - 452 5 bp deletion Premature stop 1 CPHL_A. Confirmed. nsSNPs nsS1 788 C→G Pro → Arg 1 M. bovis nsS2 1040 G→T Gly → Val 4 LAM specific Homologous recombinations Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60 genes. IV Homologous recombinations Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60 genes. IV Homologous recombinations Multiple instances of homologous recombination events between the highly homologous PPE19, PPE18 and PPE60 genes. IV IV ppe23 Sublineage IV (SVP subfamily) 18 isolates 1185 bp ppe45 Sublineage IV (SVP subfamily) 16 isolates 1227 bp ppe25 Sublineage IV (SVP subfamily) 15 isolates 1098 bp nsSNPs nsS1 nsS2 nsS3 nsS4 454 770 937 1091 In-frame deletion D1 547 - 552 nsSNPs nsS1 nsS2 nsS3 nsSNP nsS1 T→C T→C G→C C→T Tyr → His Ile → Thr Val → Leu Thr → Met 6 bp deletion 1 1 7 1 02_1987 98-R604_INH-RIF-EM PGG2 and 3 specific CDC1551 1 Haarlem 241 820 823 G→A T→G G→A Ala → Thr Ser → Ala Ala → Thr 1 1 1 02_1987 M. bovis T17. Confirmed. 109 T→C Ser → Pro 1 K85 225 320 G→A C→T Trp → stop Pro → Leu 1 1 K85 94_M4241A 1227 G→A stop 1 T92. Sequence error. Normal sequence confirmed. This variation not included in analysis. Homologous recombinations Various combinations of 10 SNPs and a 45 bp deletion 4 02_1987, F11, KZN1435 and CPHL_A. Mutations indicate recombination with ppe27. In-frame deletion D1 825 - 827 1 M. bovis 1 3 1 98-R604_INH-RIF-EM CDC1551, strain C and Haarlem M. 
bovis nsSNPs nsS1 nsS2 sSNP sS1 nsSNPs nsS1 nsS2 nsS3 164 848 932 3 bp deletion C→T C→T T→G Ala → Val Ala → Val Val → Gly ppe27 Sublineage IV (SVP subfamily) 16 isolates 1053 bp ppe38/ppe71 Sublineage IV (SVP subfamily) 18 isolates 1176 bp ppe49 Sublineage IV (SVP subfamily) 18 isolates 1176 bp ppe10 Sublineage V (MPTR subfamily) 14 isolates 1464 bp ppe12 Sublineage V (MPTR subfamily) 17 isolates 1938 bp ppe21 Sublineage V (MPTR subfamily) 17 isolates 2283 bp ppe39 Sublineage V (MPTR subfamily) 14 isolates 1869 bp sSNP sS1 423 C→T Ala 1 GM1503 nsSNPs nsS1 163 G→C Ala → Pro 1 K85 nsS2 568 C→T Pro → Ser 1 M. bovis sSNPs sS1 543 A→C Ala 3 EAI isolates EAS054, T46 and T92. sS2 765 A→G Pro 1 M. bovis Hypervariable on a macro-mutational scale due to numerous instances of homologous recombination with identical homologue ppe71 plus numerous IS6110-associated mutations. Micro-mutations (SNPs, small indels) are uncommon [26]. nsSNP nsS1 547 C→T Gln → stop 1 Haarlem Frameshifts FS1 505 1 bp deletion Premature stop 1 T85 nsSNP nsS1 23 G→A Trp → stop 1 nsS2 36 G→T Glu → Asp 1 G→C C→T Gly → Ala Pro → Leu 1 1 M. bovis. Eighth aa converted to stop. Coding resumes at codon 9 resulting in gene with 8 N-terminal aa missing. T92. Changes Glu of ppe signature sequence. K85 K85 3 EAI (Philippines lineage) specific nsS3 863 nsS4 1400 In-frame insertions I1 1043 Frameshifts FS1 nsSNPs nsS1 sSNPs sS1 Frameshift FS1 1125 1 bp deletion Premature stop 1 T92. Sequence error. Normal sequence confirmed. This variation not included in analysis. 1634 A→G Lys → Arg 11 TBD1- specific 1389 T→C Ile 1 F11. Confirmed. 60 1 bp deletion Premature stop 1 H37Rv C→G G→A T→C G→A Pro → Arg TGG → stop Val → Ala Gly → Asp 1 1 1 1 M. bovis 94_M4241A M.
bovis Haarlem nsSNPs nsS1 107 nsS2 225 nsS3 449 nsS4 1844 IS6110 integrations IS1 47 IS2 30 bp insertion Premature stop Premature stop 19 Homologous recombination HR1 550 Fusion with PPE40 Whole gene deletions WGD1 WGD2 Partial gene deletion PGD1 1358 In-frame deletions Premature stop 2 1 Haarlem and F11. Convergent mutation [27]. H37Rv 2 K85 and 94_M4241A. Convergent mutation. 1 1 T92. Part of large RD5-like deletion. 02_1987. Part of a major genomic rearrangement [26]. 1 M. bovis. Ppe39 part of the RD5 deletion. D1 ppe40 Sublineage V (MPTR subfamily) 16 isolates 1848bp 88 - 90 nsSNP nsS1 539 IS6110 integrations IS1 47 Homologous recombination HR1 550 In-frame deletions D1 490 - 492 Partial gene deletion PGD1 1582 nsSNP nsS1 nsS2 – S6 ppe6 Sublineage V (MPTR subfamily) 15 isolates 2892 bp This gene split into 2 (ppe5/6) in bovis, K85 (type 1) and H37Rv, T17 (type 2). This gene split into 3 predicted open reading frames in CPHL_A. ppe5 Sublineage V 1096 1100 - 1004 sSNP sS1 969 In-frame insertions I1 2379 I2 7604 In-frame deletions D1 834 - 842 D2 4763 - 4822 Frameshifts FS1 2399 FS2 2930 FS3 5945 nsSNP nsS1 nsS2 nsS3 nsS4 nsS5 3 bp deletion 2 EAS054 and CDC1551. Convergent mutation. Leu → Ser 1 98-R604_INH-RIF-EM Premature stop 2 02_1987 and CPHL_A. Convergent mutation [27]. Fusion with ppe39 2 K85 and 94_M4241A. Convergent mutation [26]. 3 bp deletion 1 M. bovis T→C RD5-like deletion Premature stop 1 T92. Large deletion fuses 5’ region of PPE40 with plcC. G→C CTGGA → ACAAC Gly → Arg Thr, Gly → Asn, Asn 1 1 KZN 1435 KZN 1435. nsS1-6 represents 6 SNPs in a 9 bp region. T→C Asn 1 Strain C 30 bp insertion 15 bp insertion 1 KZN 1435 5 PGG2 and 3 specific 9 bp deletion 60 bp deletion 1 94_M4241A 1 F11. Confirmed. H37Rv CPHL_A. FS1 and 2 both result in a new gene (ppe5) starting at ppe6 codon 983. CPHL_A, K85 and M. bovis specific. Results in new gene (ppe5 starting at ppe6 codon 2035. Note alternate ppe5 to that formed from FS1-3. 
Third ppe gene formed in CPHL_A (see ppe FS1). 1 bp insertion 1 bp deletion 1 bp deletion Premature stop Premature stop Premature stop 1 617 2727 2798 6118 6763 G→C A→G G→T G→A A→G Gly → Ala Ile → Met Gly → Val Asp → Asn Ile → Val 3 1 4 1 3 nsS6 7358 G→A Gly → Glu 2 nsS7 nsS8 8219 8240 G→C G→C Gly → Ala Phe → Ser 1 8 nsS9 9412 nsS10 9464 sSNP sS1 1359 sS2 2928 sS3 3562 sS4 4446 sS5 5135 sS6 5454 sS7 5655 In-frame insertions I1 4658 A→G G→A Asn → Asp Gly → Asp 1 1 EAI (Philippines lineage) specific 94_M4241A LAM and PGG3 specific 02_1987 EAI specific. Same mutation seen in ppe5 nsS1. EAI (Philippine lineage) specific. Same mutation seen in ppe5 nsS2. 94_M4241A TBD1- specific. Same mutation as seen in ppe5 nsS4. T92. Confirmed. T92. Confirmed. T→C G→A T→C G→A G→A C→G C→T Arg Gly Leu Ser Pro Thr Gly 1 1 8 1 1 1 1 T85 Haarlem TBD1- specific F11. Confirmed. T46 02_1987 EAS054 1 H37Rv. Same insertion as seen in 15 bp 1 2 (MPTR subfamily) 15 isolates 6615 bp insertion ppe6 I2. Frameshifts FS1 2929 1 bp deletion Premature stop 1 CPHL_A. Results in 3rd ppe gene Starting at ppe6 codon 2035 (see ppe6 FS4). Confirmed. Ppe5 formed from split of ppe6. Only present in M. bovis, K85 & CPHL_A (type 1 PPE5) and Rv, T17 and CPHL_A (type 2). CPHL_A ppe5 further split into additional gene. ppe54 Sublineage V (MPTR subfamily) 10 isolates 7572 bp ppe8 Sublineage V (MPTR subfamily) 12 isolates 9903 bp nsSNP nsS1 nsS2 1441 5295 G→A G→C Gly → Thr Phe → Ser 2 1 nsS3 sSNP sS1 2863 A→G Thr → Ala 1 M. bovis & K85 specific H37Rv. Same mutation seen in ppe6 nsS8. M. bovis 616 T→C Leu 1 sS2 1212 T→C Gly 1 1 gene (ppe8) in TBD1+. Two genes (ppe7 & ppe8) in TBD1- due to frameshift (FS2) with termination in ppe8 and new start site. In-frame insertions I1 5003 H37Rv (same mutation seen in ppe6 sS3). K85 Extreme variation observed. All isolates unique. Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. 
Identical gene sequence in 3 members of the Harlingen transmission chain [58,59] (average coverage = 84%). In-frame deletions D1 6434 - 6493 D2 7506 - 7535 D3 9311 - 9370 I2 7352 Frameshifts FS1 8947 FS2 9875 nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 nsS6 nsS7 nsS8 nsS9 nsS10 nsS11 nsS12 nsS13 nsS14 nsS15 nsS16 sSNPs sS1 sS2 sS3 sS4 sS5 sS6 sS7 60 bp deletion 30 bp deletion 60 bp deletion 2 LCC (CDC1551 and strain C) specific 1 Strain C 3 EAS054, KZN 1435 and 98-R604_INH-RIF-EM. Possible convergence. 15 bp insertion 15 bp insertion 1 98-R604_INH-RIF-EM 1 H37Rv 1 Strain C 8 All TBD1+ isolates. Coding region for ppe7 begins at position 9973 for TBD1+ isolates. 1 bp insertion 2 bp deletion Premature stop Premature stop 353 1240 3578 4027 4639 5520 5840 6296 6337 7173 7756 7897 8484 9733 9931 10418 T→C T→G C→G G→A G→T A→G G→A G→A T→C T→G G→T G→A C→A T→A G→A T→C Val → Ala Phe → Val Ala → Gly Gly → Ser Ala → Ser Asn → Asp Gly → Asp Gly → Asp Trp → Arg Ser → Arg Gly → Trp Gly → Ser Phe → Leu Phe → Ile Ser → Thr Phe → Val 2 1 1 1 1 1 1 1 1 1 1 1 1 8 1 1 657 3357 3924 4122 5433 5982 7209 G→A C→A C→T C→T G→A A→G A→C Ser Gly Ile Asn Gly Ala Gly 1 1 1 1 1 7 1 EAI specific CPHL_A. Confirmed. EAS054 K85 K85 M. bovis Haarlem T46 Strain C Strain C H37Rv EAI specific K85. Confirmed. All TBD1- isolates K85 EAS054 94_M4241A K85 Haarlem EAS054 F11. Confirmed. All PGG2 and 3 isolates M. bovis ppe7 Sublineage V (MPTR subfamily) 8 isolates 426 bp ppe16 Sublineage V (MPTR subfamily) 16 isolates 1857 bp sS8 Frameshift FS1 nsSNPs nsS1 ppe34 Sublineage V (MPTR subfamily) 17 isolates 4380 bp ppe35 Sublineage V (MPTR subfamily) 18 isolates 2964 bp C→T Pro 1 94_M4241A 375 1 bp deletion Premature stop 1 H37Rv 271 G→T Ala → Ser 1 CDC1551 1 CPHL_A. Ppe16 deleted along with neighbouring gene Rv1134.
Premature stop 1 T85 Premature stop 1 K85 Whole gene deletion WGD1 IS6110 integration IS1 1222 Frameshifts FS1 ppe24 Sublineage V (MPTR subfamily) 13 isolates 3162 bp ppe13 Sublineage V (MPTR subfamily) 17 isolates 1332 bp 9684 1333 - 1337 5 bp deletion nsSNPs nsS1 85 T→A Val → Asp 1 T17. Confirmed. nsS2 314 C→T Ala → Val 1 CDC1551 sSNP sS1 342 G→A Val 1 K85. Confirmed. Extreme variation observed. All isolates unique. Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence in 3 members of the Harlingen transmission chain [58,59] (average coverage = 84%). Frameshifts FS1 51 FS2 1306 FS3 1307 FS4 1313 FS5 1314 1 bp deletion 1 bp deletion 1 bp insertion 2 bp deletion 1 bp deletion Premature stop Poly C/poly A region from position 1298 results in numerous FS variations. 1 KZN 1435 5 PGG1 and 2 isolates GM1503, T17, T46, T92 and K85. PGG1 and 2 isolates F11, KZN 1435, CPHL_A and EAS054. Note: sequence reanalysis shows CPHL_A has a 2 bp insertion. CPHL_A 4 1 9 PGG1 and 2 isolates M. bovis, strain C, 98-R604, T17, T46, T92, K85, EAS054, GM1503. nsSNPs nsS1 289 G→A Ala → Ser 3 EAI (Philippines lineage) specific nsS2 513 G→A Trp → stop 1 EAS054 sSNPs sS1 732 C→T Asn 1 K85 sS2 1008 C→T Gly 4 EAI specific Extreme variation observed. All isolates unique. Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence in 3 members of the Harlingen transmission chain [58,59] (average coverage = 84%). In frame deletion D1 1603 - 1680 78 bp deletion 1 Haarlem Frameshift FS1 1877 1 bp insertion Premature stop 1 M. bovis. Results in stop codon at nucleotide position 1953 – 1955 and new predicted gene (ppe35b) start codon at position 2038. nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 1949 1960 335 643 2708 G→A C→A C→T A→C G→A Gly → Asp Ser → Thr Thr → Ile Ser → Arg Gly → Asp 1 1 1 1 1 Strain C K85. Confirmed. M. bovis. Ppe35b (see FS1). M. bovis. Ppe35b (see FS1).
EAS054 ppe28 Sublineage V (MPTR subfamily) 17 isolates 1968 bp ppe63 Sublineage V (MPTR subfamily) 15 isolates 1440 bp ppe42 Sublineage V (MPTR subfamily) 14 isolates 1743 bp ppe53 Sublineage V (MPTR subfamily) 12 isolates 1773 bp ppe62 Sublineage V (MPTR subfamily) 15 isolates 1749 bp ppe52 Sublineage V (MPTR subfamily) 11 isolates 1230 bp ppe64 Sublineage V (MPTR subfamily) 16 isolates 1659 bp nsS6 sSNP sS1 Frameshift FS1 nsSNPs nsS1 nsS2 nsS3 nsS4 sSNP sS1 nsSNPs nsS1 nsS2 nsSNPs nsS1 nsS2 2765 C→T Ser → Leu 8 PGG2 and 3 specific 2238 C→G Val 1 Strain C 169 - 213 45 bp deletion/ 44 bp insertion Premature stop 1 T17. Sequence error. Normal sequence confirmed. This variation not included in analysis. 432 449 757 1508 G→T C→T T→G T→C Trp → Cys Ala → Val Phe → Val Val → Ala 10 2 2 1 TBD1- specific Haarlem and strain C. M. bovis and K85 M. bovis 1509 C→T Val 1 T46 1093 1265 T→A C→T Tyr → Asn Thr → Met 4 1 EAI specific K85 157 841 G→A C→T Ala → Thr Pro → Ser 1 2 CDC1551 M. bovis and K85 4 M. bovis, T17, T46 and K85 1 EAS054 In frame deletion D1 190 - 192 3 bp deletion 30 bp deletion D2 1186 - 1215 Frameshift FS1 60 1 bp deletion Premature stop 2 T17 and T46 nsSNPs nsS1 97 C→G Arg → Gly 2 C→A G→T A→G Asp → Glu Gly → Val Thr → Ala 1 1 1 EAS054 and Haarlem. Possible convergent mutation. M. bovis EAS054 EAS054 3 bp deletion 1 GM1503 90 bp insertion 1 CPHL_A nsS2 612 nsS3 1013 nsS4 1681 In frame deletion D1 207 - 209 In frame insertion I1 956 nsSNPs nsS1 nsS2 Frameshift FS1 nsSNPs nsS1 1026 1690 C→A G→A Ser → Arg Gly → Ser 1 2 Haarlem T17 and T46 284 1 bp insertion Premature stop 1 T85 1198 C→A Gln → Lys 5 PGG2 and 3 specific 3 bp deletion 60 bp deletion. 4 LAM specific 1 K85 30 bp insertion. 1 02_1987 1 T92. Confirmed. In frame deletion D1 88 - 90 D2 590 - 649 In frame insertion I1 911 Frameshift FS1 757 1 bp insertion. Premature stop. 
ppe55 Sublineage V (MPTR subfamily) 9 isolates 9474 bp ppe56 Sublineage V (MPTR subfamily) 6 isolates 11151 bp nsSNPs nsS1 34 A→G Asn → Asp 1 CPHL_A nsS2 916 G→A Gly → Ser 4 EAI specific nsS3 1019 A→G Ile → Ser 1 GM1503 Extreme variation observed. Numerous frameshifts split the gene into 2 or 3 distinct open reading frames in several isolates. Note: Identical gene sequence in the closely related isolates KZN1435, 4207 & 605. Identical gene sequence observed in 3 members of the Harlingen transmission chain [58,59] (average coverage = 84%). Frameshift FS1 6476 - 6546 FS2 6041 nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 nsS6 nsS7 nsS8 sSNPs sS1 sS2 sS3 sS4 Table S1B 71 bp deletion and 58 bp insertion. 1 bp deletion Premature stop. 1 Haarlem Premature stop. 1 M. bovis (ppe56b). New gene (ppe56d) begins position 7705. Note: no ppe56c listed in BoviList [76]. 52 1518 T→C C→A Cys → Arg Tyr → stop 1 1 2897 1360 3149 3150 5143 3878 G→A A→G C→T C→T G→T C→T Gly → Asp Asn → Asp Thr → Ile Thr → Ile Gly → Cys Ala → Val 1 1 1 1 1 1 M. bovis M. bovis. New gene (ppe56b) begins at position 1576. EAS054 M. bovis (ppe56b) EAS054 EAS054 EAS054 M. bovis (ppe56b) 906 126 3777 1566 G→C C→G G→A G→A Gly Ala Glu Ser 1 1 2 1 M. bovis M. bovis (ppe56b) EAI isolates EAS054 and T46 M. bovis (ppe56d) Gene Sequence variation Position Genetic change Amino acid change Number of isolates Comments pe35 Sublineage I 17 isolates 297 bp Frameshift FS1 10 1 bp deletion Premature stop 2 94_M4241A and CPHL_A. Convergent mutation. Occurs in polyA sequence. nsSNP nsS1 295 T→G Stop → Glu 1 H37Rv. 1 additional amino acid added to Cterminal end. pe34 Sublineage I 18 isolates 336 bp pe5 Sublineage II 18 isolates 309 bp pe15 Sublineage II 17 isolates 309 bp pe29 Sublineage II 9 isolates 315 bp pe36 Sublineage III 18 isolates 234 bp pe25 Sublineage III 18 isolates 300 bp pe22 Sublineage III 18 isolates 297 bp nsSNP nsS1 155 C→T Thr → Met 1 GM1503 G→A Ala → Thr 1 Haarlem 1 F11. Deletion associated with IS6110. 
Deletion also includes part of adjacent ppe36 gene. pe11 Sublineage IV 18 isolates 303 bp pe20 Sublineage IV 16 isolates 300 bp pe18 Sublineage IV 18 isolates 300 bp sSNP sS1 1 T85 1 98-R604 INH-RIF-EM pe19 Sublineage IV 18 isolates 300 bp pe32 Sublineage IV 18 isolates 300 bp No variation detected. nsSNP nsS1 151 No variation detected. No variation detected. No variation detected. Whole gene deletion WGD1 90 C→T Asp No variation detected. IS6110 integration IS1 195 nsSNP nsS1 4 sSNP sS1 300 sSNP sS1 123 T→C Ser → Pro 1 Haarlem G→A Stop 1 M. bovis C→T Pro 1 M. bovis 2 K85 and M. bovis (RD8 deletion). 1 Strain C Whole gene deletion WGD1 sSNP sS1 84 A→G Gly pe13 Sublineage IV 17 isolates 300 bp pe31 Sublineage IV 17 isolates 297 bp pe7 Sublineage IV 17 isolates 300 bp pe8 Sublineage IV 16 isolates 828 bp pe27 Sublineage IV 15 isolates 828 bp pe2 Sublineage V (PGRS subfamily) 18 isolates 1578 bp pe24 Sublineage V (PGRS subfamily) 18 isolates 1005 bp pe26 Sublineage V (PGRS subfamily) 8 isolates 1479 bp pe4 Sublineage V (PGRS subfamily) 18 isolates 1509 bp pe3 Sublineage V (PGRS subfamily) 17 isolates 1407 bp pe12 Sublineage V (PGRS subfamily) 18 isolates No variation detected. nsSNP nsS1 77 sSNP sS1 84 No variation detected. C→T Ala → Val 1 M. bovis T→C Asn 1 F11. Confirmed. 511 G→A Ala → Thr 1 K85. Confirmed. 243 810 G→A C→T Gly Pro 1 1 H37Rv K85. Confirmed. 128 C→T Ala → Val 2 152 136 808 T→G C→T A→G Leu → Arg Pro → Ser Met → Val 1 6 5 EAI (Philippines lineage) isolates T17 and T46. CDC1551 TBD1+ isolate specific. H37Rv, F11, 98-R604_INH-RIF-EM, GM1503 and KZN1435. 998 1021 1024 1027 1 bp deletion 1 bp deletion 1 bp deletion 1 bp deletion Combined, these produce a premature stop. 1 EAS054 872 G→A Gly → Glu 4 LAM specific. 903 G→A Glu 1 Strain C 932 G→A Gly → Val 2 Beijing isolates 02_1987 and T85. nsSNPs nsS1 nsS2 518 519 G→C G→C Gly → Ala Gly → Ala 1 1 M. bovis M.
bovis Frameshift FS1 360 1 bp deletion Premature stop 1 GM1503 346 492 631 1108 1426 G→A G→C G→A C→T C→G Ala → Thr Lys → Asn Ala → Thr Gln → stop Gln → Glu 1 4 2 1 1 M. bovis EAI specific. M. bovis and K85. K85. Confirmed. T46 232 C→A Arg 1 T46 40 271 763 1102 G→A G→C C→A C→T Ala → Thr Glu → Gln Pro → Thr Arg → Trp 1 1 1 1 H37Rv 94_M4241A M. bovis CDC1551 39 G→C Thr 1 CPHL_A. Confirmed. 98 1 bp insertion Premature stop 1 02_1987 nsSNP nsS1 sSNPs sS1 sS2 nsSNPs nsS1 nsS2 nsS3 nsS4 Frameshifts FS1 FS2 FS3 FS4 nsSNPs nsS1 sSNP sS1 nsSNP nsS1 nsSNPs nsS1 nsS2 nsS3 nsS4 nsS5 sSNPs sS1 nsSNPs nsS1 nsS2 nsS3 nsS4 sSNP sS1 Frameshift FS1 nsSNPs to 927 bp pe14 Sublineage V (PGRS subfamily) 15 isolates 333 bp pe16 Sublineage V (PGRS subfamily) 15 isolates 1587 bp pe23 Sublineage V (PGRS subfamily) 17 isolates 1149 bp pe17 Sublineage V (PGRS subfamily) 16 isolates 933 bp pe1 Sublineage V (PGRS subfamily) 18 isolates 1767 bp pe9 Sublineage V (PGRS subfamily) 18 isolates 435 bp Ppe10 Sublineage V (PGRS subfamily) 15 isolates 363 bp pe33 Sublineage V (PGRS subfamily) 17 isolates 285 bp pe6 Sublineage V (PGRS subfamily) 14 isolates 516 bp nsS1 nsS2 Frameshift FS1 339 649 A→C C→T Gln → His Leu → Phe 2 1 M.bovis and K85. CPHL_A. Confirmed. 204 - 208 5 bp deletion Extended protein 1 M. bovis 2 Beijing isolates 94_M4241A and 02_1987. 3 1 2 Beijing specific KZN1435 Beijing isolates 02_1987 and T85. 1 94_M4241A sSNP sS1 nsSNPs nsS1 nsS2 nsS3 T→C A→G G→A 846 983 1030 In frame deletion D1 464 - 466 Frameshift FS1 1192 1255 Ser → Arg Gln → Arg Ala → Thr 3 bp deletion - 64 bp deletion/ 59 bp insertion Premature stop 1 98-R604. Inserted sequence derives from Rv0446c. M. bovis Haarlem, strain C and CDC1551. KZN1435 K85 Haarlem, strain C and CDC1551. nsSNPs nsS1 nsS2 76 364 G→A G→A Gly → Arg Ala → Thr 1 3 nsS3 nsS4 nsS5 369 1481 1714 C→A C→T A→G Asn → Lys Pro → Leu Ile → Val 1 1 3 sSNP sS1 1453 No variation detected. 
T→C Leu 1 H37Rv
Frameshift FS1 338 1 bp deletion Extended protein 2 Beijing isolates 02_1987 and 94_M4241A.
nsSNP nsS1 281 G→A Gly → Asp 2 EAI (Philippines lineage) isolates T17 and T92.
No variation detected
Frameshift FS1 141 1 bp deletion Premature stop 2 EAI isolates T17 and EAS054.
nsSNP nsS1 470 C→T Leu → Phe 1 K85