Whole-genome sequence of an evolved Clostridium pasteurianum strain reveals Spo0A deficiency responsible for increased butanol production and superior growth

Nicholas R. Sandoval1, Keerthi P. Venkataramanan1, Theodore S. Groth1, and Eleftherios T. Papoutsakis1,2

1 Department of Chemical and Biomolecular Engineering & The Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
2 Department of Biological Sciences, University of Delaware, USA

Supplemental Results

SMRT sequencing coverage indicates regions of DNA modification

The wild type gDNA was extracted during the transition phase, while the M150B gDNA was extracted during mid-exponential phase. The wild type coverage appears to be even across the genome, with two notable exceptions discussed below. The M150B coverage appears to depend on the distance from the chromosomal origin of replication. As M150B was actively growing at the time of gDNA harvesting, we suppose that regions near the origin, which are replicated first, are present in more copies.

We observed a notable increase in coverage around 3.40-3.51 Mb for the wild type strain only. The region near 3.51 Mb contains five transposase genes (c32470-c32510), which may affect coverage in this region. We observed a sharp increase in coverage for both the wild type and the mutant around 4.27-4.31 Mb (Supp. Fig. 2). This is possibly due to phage elements at this locus (c40170-c40480). This phenomenon of increased coverage around probable phage genes during SMRT sequencing has recently been observed by this research group in other Clostridium species.

Table S1 – Nucleotide sequence variants from published data
Location  Type  Variation  Mutation     Note
142286    Del.  A to -     Frame shift  IS116/IS110/IS902 family transposase
226301    Del.  T to -     Frame shift  Methyl-accepting chemotaxis protein TlpB
635237    Del.  TT to -    Frame shift  IS801 transposase
698471    Del.  T to -     Frame shift  Integrase catalytic subunit
699139    Del.
A to - 2883135 Del. T to - 4138602 Del. T to - 1015245 Ins. - to T Frame shift Phage integrase family protein 986198 Sub. C to G A373P IS1604transposase 1247868 Sub. G to A A319V IS1604transposase 2019787 Sub. G to A E210K Hypothetical protein AJA47948.1 2020013 Sub. C to G T285S 2055741 Sub. A to G K255E 2265810 Sub. T to C E264G Hypothetical protein AJA47948.1 Spo0A Sequence here is similar to most Clostridium Spo0A sequences, including Cpa DSM 625(21) Phosphate-binding protein PstS 3203809 Sub. A to G 3205294 Sub. A to G No CDS Frame shift In transposase No CDS No CDS S54G 3525518 Sub. C to T All with >66x coverage, compared with Rotta et al. 2015(20) General secretion pathway protein A No CDS Table S2 - Sequence and features of pDcm2.0 with codon optimized Cpa bepIM (NS_dcm_Cpa) LOCUS 3378 bp circular DEFINITION [pD881:144954] ACCESSION DNA2.0 Id: SOURCE Synthetic FEATURES Location/Qualifiers source 1..3378 /organism=synthetic gene 151..1278 /gene="NS_dcm_Cpa" /label="NS_dcm_Cpa" misc_feature complement(2378..3136) /product="Kanamycin-r" /label="Kanamycin-r" misc_feature 1..114 /product="P_rhaBAD" /label="P_rhaBAD" misc_feature 138..150 /product="strong RBS" /label="strong RBS" misc_feature 151..1278 /product="insert:" /label="insert:" 3 misc_feature 1295..1385 /product="Term_PhageT7" /label="Term_PhageT7" misc_feature 1497..2323 /product="Ori_p15a" /label="Ori_p15a" BASE COUNT 848 T 816 G 1 51 101 151 201 251 301 351 401 451 501 551 601 651 701 751 801 851 901 951 1001 1051 1101 1151 1201 1251 CACCACAATT TGCCAATGGC GCTTTTTAGA ATGGAGCAGC M E Q L AACGGACGAT T D D AGCACAAACT H K L AAACGTCGCT K R R F CTTGGGCTTC L G F ATAACTTCGA N F D ACCTATCGCT T Y R S CATTCGTGAT I R D CGTGTCAGGA C Q D CGCGGTCGTC R G R L AGTCGCTTTC V A F GCGTCGTACT V V L GTGTACTTTC V Y F H TGAGCGTGTG E R V TCATTCCGCT I P L GCGATCGATG A I D D CAGACGCGAC R R D GTAATATCCG N I R CACCACGGTA H H G N TCTGTCCAAT L S N CGTTTCCGGA F P D CAAGTCGGCA Q V G N GCTGTTCCTG L F L CAGCAAATTG CCATTTTCCT CTGGTCGTAA 
TCAGCATTTT S I F CGCGAACTGT R E L S GACCGAGCGT T E R TTACCATTCT T I L AAGGGCGGCT K G G F CATCATTTGG I I W CCTATTTCGG Y F G GATGAATTTC D E F P CTTCAGCCTG F S L TGTATTTGCA Y L Q ATCGCCGAGA I A E N GAAAACGATC K T I ACCTGTACAA L Y N ATCATTTATG I I Y G GGAAACCCAT E T H ATCTGTGGGA L W D TACTCGAAAG Y S K A TATTCAGAGC I Q S ATATCGAAGG I E G TGGCGTCGCC W R R L TGACTTCATT D F I ACGCAGTTCC A V P AGCTTGATCC S L I R 893 A TGAACATCAT GTCAGTAACG TGAAATTCTT CAATAAAGTC N K V CTATTGAAGA I E E ATCGATATCA I D I I GAGCCTGTTT S L F TCACCTATTT T Y L GCGAATGAAA A N E I TAACCATATT N H I CTCAAGCGGA Q A D GCTGGTAAAA A G K K GATGAAGCGT M K R ATGTTCGTAA V R N ATCGACGACT I D D F TGCAGCGAAC A A N GCATTCGTGA I R E AGCCTGTATA S L Y N CAAACTGGAT K L D CGAAGTTTTA K F Y GACAAGGTTG D K V A TCATTACCGC H Y R TGAGCGTGCG S V R TTTCAGAGCG F Q S A GCCGGTGCTG P V L GCATCAAAGG I K G 821 C CACGTTCATC TTTCCCTGGT AGAAGGTCGC GAATTCAGGC TTTAAGAAGG AGATATACAT GACGATTTCC AAGAGCAGCA D D F Q E Q Q GATCAACAAG TTTATCAACG I N K F I N E TTAACACGGA GAATGCAAGC N T E N A S AGCGGTTGTG GTGGTCTGGA S G C G G L D ACACGGTAAT TACGAACGTA H G N Y E R N TCAACAGCCA AGCTGTTGAA N S Q A V E GTGTGCGAGG ACATCAATAA V C E D I N N CATCATTATC GGTGGTTTTC I I I G G F P AGCAAGGCTT GAACGTTGAG Q G L N V E GCGATTGATG CGGTTAAGCC A I D A V K P TCTGATGGTC ATGGGTAACG L M V M G N G TCAAGCAGAG CGGTTACAAT K Q S G Y N TACGGTGTTC CGCAGAACCG Y G V P Q N R GGATCTGAAC AATATTCCGT D L N N I P F ACTGGGTGAC TGCCTCTGAG W V T A S E ACCAACATCC CGAACCACAG T N I P N H S CGAAGGCAAG CGTACGCAGG E G K R T Q G CGCCGACCAT TCGCGCAGAA P T I R A E ACCTACGGCG ACGAGTCCGA T Y G D E S D TGAGTGCGCA CGTATTCAAA E C A R I Q T CAGCCAGCAG CGCGTATAAA A S S A Y K GCGTGGAATA TTGCCCGTGC A W N I A R A TTAGAGCGGC CGCCACCGCT * Frame 1 4 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 Frame 1 1301 1351 1401 1451 1501 1551 1601 1651 1701 1751 1801 1851 1901 1951 2001 2051 2101 
2151 2201 2251 2301 2351 2401 2451 2501 2551 2601 2651 2701 2751 2801 2851 2901 2951 3001 3051 3101 3151 3201 3251 3301 3351 GAGCAATAAC TTTTTTGCTG ATCATTCAGG CACTGGCTCA TAAGATGATC GAAAACGAAA TACCAACTCT AACTTGTCCT TCCTCTAAAT TTTCCGGGTT GACTGAACGG CCCGGAACTG GGAATGACAC GCCGCCAGGG ACCACTGATT CTATGGAAAA TTCCTGGCAT TCGCCGCAGT AATATATCCT CTCCTGCCAC TAAGCCAGTA CAGGTGATTA GAAATTGCAA AGCCGTTTCT GGCAAGATCC AACCTATTAA CCATGAGTGA TTTCCAGACT TCGCATCAAC AATACGCGAT CCGGCGCAGG GATATTCTTC AGTAACCATG TGGCATAAAT CATTGGCAAC GGCTTCCCAT GCGAGCCCAT GCGGCCTCGA GATAACTCAA GTTGGAACCT GGAGGCTTTT GTATTGAGCG TAGCATAACC AAAGGAGGAA ACGAGCCTCA CCTTCACGGG TTCTTGAGAT AAACCGCCTT TTGAACCGAG TTCAGTTTAG CAATTACCAG GGACTCAAGA GGGGTTCGTG AGTGTCAGGC CGGTAAACCG GGAAACGCCT TGAGCGTCAG ACGGCTTTGC CTTCCAGGAA CGAACGACCG GTATCACATA ATGAAGCACT TACACTCCGC CATTTGGGCC TTTATTCATA GTAATGAAGG TGGTATCGGT TTTCCCCTCG CGACTGAATC TGTTCAACAG CAAACCGTTA CGCTGTTAAA AACACTGCCA TAATACCTGG CATCATCAGG TCCGTCAGCC GCTACCTTTG ACAAGCGATA TTATACCCAT CGTTTCCCGT AAAATACGCC CTTACGTGCC GACTTTCTGC ATATCTAGAG CCTTGGGGCC CTATATCCGG GACTCCAGCG TGGGCCTTTC CGTTTTGGTC GCAGGGCGGT GTAACTGGCT CCTTAACCGG TGGCTGCTGC CGATAGTTAC CATACAGTCC GTGGAATGAG AAAGGCAGGA GGTATCTTTA ATTTCGTGAT CGCGGCCCTC ATCTCCGCCC AGCGTAGCGA TTCTGCTGAC TCACTGACAC TAGCGCAGAA CTCATTAGAA TCAGGATTAT AGAAAACTCA CTGCGATTCC TCAAAAATAA CGGTGAGAAT GCCAGCCATT TTCATTCGTG AGGACAATTA GCGCATCAAC AACGCTGTTT AGTACGGATA AGTTTAGTCT CCATGTTTCA GATTGTCGCA ATAAATCAGC TGAATATGGC CGGTAGTGAT GATCAAGAAG TATGGAGGTC AATTCGTC TCTAAACGGG GTAACGAATT TAACTGGACT TTCGGTAGAA TGCGCGTAAT TTTTCGAAGG TGGAGGAGCG CGCATGACTT CAGTGGTGCT CGGATAAGGC AGCTTGGAGC ACAAACGCGG ACAGGAGAGC TAGTCCTGTC GCTTGTCAGG TCACTTCCCT CGTTCGTAAG GTCAGTGAGC GCACCGGTGC CCTCATCAGT AGGCCCACCC AAACTCATCG CAATACCATA CCGAGGCAGT GACTCGTCCA GGTTATCAAG GGCAAAAGTT ACGCTCGTCA ATTGCGCCTG CAAACAGGAA AATATTTTCA TTCCGGGGAT AAATGCTTGA GACCATCTCA GAAACAACTC CCTGATTGCC ATCCATGTTG TCATAGCTCC CTTATTTCAT ACGGTCAAAA AGGTATGATT TCTTGAGGGG CAAGCTTGAT GCAATCAACT 
GTCTTCTTAA CTCTTGCTCT TTCTCTGAGC CAGTCACCAA CAAGACTAAC TTTGCATGTC GCAGCGGTCG GAACTGCCTA CCATAACAGC GCACGAGGGA GGGTTTCGCC GGGGCGGAGC GTTAAGTATC CCATTTCCGC GAGGAAGCGG AGCCTTTTTT GCCAACATAG GAAGGTGAGC AGCATCAAAT TTTTTGAAAA TCCATAGGAT ACATCAATAC TGAGAAATCA TATGCATTTC TCAAAATCAC AGCGAGGCGA TCGAGTGCAA CCTGAATCAG CGCAGTGGTG TGGTCGGAAG TCTGTAACAT TGGCGCATCG CGACATTATC GAATTTAATC TGAAAATCTC TATGGTGAAA GCCTCCGGTC TAAATGGTCA

Table S3 – List of oligonucleotides used in this study
Primer  Sequence (5’-3’)                       Description
129     TGGGAATAGAAATATAAAGGGGAGT              Amplification and Sequencing of Cpa Spo0A
130     ACCCTAAAACTACTCTCAACCCA                Amplification and Sequencing of Cpa Spo0A
140     GAAATAGCATGCGGCGATGCACAGATACTTACAAC    spo0A homology region 1 For w/ SphI site
141     TCCATTACCGGTACTCCCCTTTATATTTCTATTCCCA  spo0A homology region 1 Rev w/ AgeI site
142     TTAGACACGCGTATGCCGCATTTGGATGGATTAG     spo0A homology region 2 For w/ MluI site
143     GTCTGACCATGGTATCGACTTGCCCTCTAGACCAG    spo0A homology region 2 Rev w/ NcoI site

Supplementary Figures

[Figure S1 image. Panel A: Wild Type Cpa 6013. Panel B: Mutant M150B.]
Figure S1. Phase-contrast microscopy of WT (A) and M150B (B) after 6 days. Phase-bright forespores are observed in the WT, and the asporogenous phenotype is apparent in the M150B strain.

[Figure S2 plot. y-axis: Coverage (0 to 400). x-axis: Genome Position w.r.t. GenBank CP009267.1 (0 to 4,000,000). Series: 6013 and M150B.]
Figure S2. Whole-genome SMRT sequencing coverage depth across the reference. Cpa wild type ATCC 6013 (olive) and Cpa mutant M150B (salmon) coverage across the reference genome (10,000 bp window average).

Figure S3. Cultures of the wild type (left), M150B (center), and ΔSpo0A (right) Cpa strains after 5 days of culturing. Cultures were allowed to rest without agitation. M150B cultures consistently settled quickly compared to the wild type, while the ΔSpo0A cultures consistently remained in suspension longer than the wild type. This shows that the rapid-settling phenotype of M150B is not due to the Spo0A deficiency.
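
The restriction sites named in the Table S3 primer descriptions can be cross-checked against the primer sequences. The following is a minimal sketch of such a check, not part of the original methods. It assumes the standard recognition sequences for SphI (GCATGC), AgeI (ACCGGT), MluI (ACGCGT), and NcoI (CCATGG), which the supplement itself does not list. The names `SITES`, `PRIMERS`, `site_position`, and `check_primers` are illustrative only.

```python
# Assumed recognition sequences (standard values; not stated in the supplement).
SITES = {
    "SphI": "GCATGC",
    "AgeI": "ACCGGT",
    "MluI": "ACGCGT",
    "NcoI": "CCATGG",
}

# Primer number -> (sequence 5'-3' as given in Table S3, enzyme named in its description).
PRIMERS = {
    140: ("GAAATAGCATGCGGCGATGCACAGATACTTACAAC", "SphI"),
    141: ("TCCATTACCGGTACTCCCCTTTATATTTCTATTCCCA", "AgeI"),
    142: ("TTAGACACGCGTATGCCGCATTTGGATGGATTAG", "MluI"),
    143: ("GTCTGACCATGGTATCGACTTGCCCTCTAGACCAG", "NcoI"),
}


def site_position(primer: str, site: str) -> int:
    """Return the 0-based index of the first occurrence of site, or -1 if absent."""
    return primer.upper().find(site.upper())


def check_primers() -> dict:
    """Map each primer number to the position of its named restriction site."""
    return {
        number: site_position(sequence, SITES[enzyme])
        for number, (sequence, enzyme) in PRIMERS.items()
    }


if __name__ == "__main__":
    for number, position in check_primers().items():
        status = "found" if position >= 0 else "MISSING"
        print(f"primer {number}: {PRIMERS[number][1]} site {status} at index {position}")
```

All four sites are palindromic, so searching the primer strand alone is sufficient here. A site that is not palindromic would also need a search of the reverse complement.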