Whole-genome sequence of an evolved Clostridium pasteurianum strain reveals
Spo0A deficiency responsible for increased butanol production and superior
growth
Nicholas R. Sandoval1, Keerthi P. Venkataramanan1, Theodore S. Groth1, and
Eleftherios T. Papoutsakis1,2
1 Department of Chemical and Biomolecular Engineering & The Delaware
Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, DE 19711,
USA
2 Department of Biological Sciences, University of Delaware, USA
Supplemental Results
SMRT sequencing coverage indicates regions of DNA modification
The wild type gDNA was extracted during the transition phase, while the M150B
gDNA was extracted during mid-exponential phase. The wild type coverage appears to
be even across the genome, with two notable exceptions discussed below. The M150B
coverage appears to depend on the distance from the chromosomal origin of
replication. As M150B was actively growing at the time of gDNA harvesting, we
suppose that regions near the origin, which are copied first during DNA replication,
are more abundant in the sample.
We observed a notable increase in the coverage around 3.40-3.51 Mb for only
the wild type strain. The region near 3.51 Mb contains five transposase genes
(c32470-c32510), which may affect coverage in this region.
We observed a sharp increase in coverage for both the wild type and mutant
around 4.27-4.31 Mb (Supp. Fig. 2). This is possibly due to phage elements at this locus
(c40170-c40480). This phenomenon of increased coverage during SMRT sequencing
around probable phage genes has been observed recently in this research group in
other Clostridium species.
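The coverage profile discussed above is plotted in Figure S2 as a 10,000 bp window average. The sketch below shows one way such an average could be computed from per-base depth values. It assumes consecutive, non-overlapping windows; the source does not say whether the windows were sliding or fixed, and the function name `window_average` and the toy depth values are hypothetical.

```python
def window_average(depth, window=10_000):
    """Average per-base depth over consecutive, non-overlapping windows.

    depth  : list of per-base coverage values (hypothetical input)
    window : window size in bp (10,000 bp is the size quoted for Figure S2)
    Returns a list of (window_start, mean_depth) tuples.
    """
    averages = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        # The final window may be shorter than `window`; average what is there.
        averages.append((start, sum(chunk) / len(chunk)))
    return averages


# Toy data only: a flat profile with one elevated region.
toy_depth = [100] * 20 + [300] * 10 + [100] * 10
for start, mean in window_average(toy_depth, window=10):
    print(start, mean)
```

On real data, a local rise such as the one reported near 4.27-4.31 Mb would appear as a run of windows whose mean is well above that of their neighbours.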
Table S1 – Nucleotide sequence variants from published data
Location   Type   Variation   Mutation      Note
142286     Del.   A to -      Frame shift   IS116/IS110/IS902 family transposase
226301     Del.   T to -      Frame shift   Methyl-accepting chemotaxis protein TlpB
635237     Del.   TT to -     Frame shift   IS801 transposase
698471     Del.   T to -      Frame shift   Integrase catalytic subunit
699139     Del.   A to -
2883135    Del.   T to -
4138602    Del.   T to -
1015245    Ins.   - to T      Frame shift   Phage integrase family protein
986198     Sub.   C to G      A373P         IS1604 transposase
1247868    Sub.   G to A      A319V         IS1604 transposase
2019787    Sub.   G to A      E210K         Hypothetical protein AJA47948.1
2020013    Sub.   C to G      T285S
2055741    Sub.   A to G      K255E
2265810    Sub.   T to C      E264G
3203809    Sub.   A to G
3205294    Sub.   A to G
3525518    Sub.   C to T

Mutation and Note entries displaced from their rows in the source layout, in source
order:
  Hypothetical protein AJA47948.1
  Spo0A. Sequence here is similar to most Clostridium Spo0A sequences, including Cpa
    DSM 625 (21)
  Phosphate-binding protein PstS
  No CDS
  Frame shift
  In transposase
  No CDS
  No CDS
  S54G
  General secretion pathway protein A
  No CDS

All with >66x coverage, compared with Rotta et al. 2015 (20).
Table S2 - Sequence and features of pDcm2.0 with codon-optimized Cpa bepIM
(NS_dcm_Cpa)
LOCUS       3378 bp    circular
DEFINITION  [pD881:144954]
ACCESSION   DNA2.0 Id:
SOURCE      Synthetic
FEATURES             Location/Qualifiers
     source          1..3378
                     /organism=synthetic
     gene            151..1278
                     /gene="NS_dcm_Cpa"
                     /label="NS_dcm_Cpa"
     misc_feature    complement(2378..3136)
                     /product="Kanamycin-r"
                     /label="Kanamycin-r"
     misc_feature    1..114
                     /product="P_rhaBAD"
                     /label="P_rhaBAD"
     misc_feature    138..150
                     /product="strong RBS"
                     /label="strong RBS"
     misc_feature    151..1278
                     /product="insert:"
                     /label="insert:"
     misc_feature    1295..1385
                     /product="Term_PhageT7"
                     /label="Term_PhageT7"
     misc_feature    1497..2323
                     /product="Ori_p15a"
                     /label="Ori_p15a"
BASE COUNT      893 A    821 C    816 G    848 T
        1 CACCACAATT CAGCAAATTG TGAACATCAT CACGTTCATC TTTCCCTGGT
       51 TGCCAATGGC CCATTTTCCT GTCAGTAACG AGAAGGTCGC GAATTCAGGC
      101 GCTTTTTAGA CTGGTCGTAA TGAAATTCTT TTTAAGAAGG AGATATACAT
      151 ATGGAGCAGC TCAGCATTTT CAATAAAGTC GACGATTTCC AAGAGCAGCA
          M E Q L S I F N K V D D F Q E Q Q  Frame 1
      201 AACGGACGAT CGCGAACTGT CTATTGAAGA GATCAACAAG TTTATCAACG
          T D D R E L S I E E I N K F I N E  Frame 1
      251 AGCACAAACT GACCGAGCGT ATCGATATCA TTAACACGGA GAATGCAAGC
          H K L T E R I D I I N T E N A S  Frame 1
      301 AAACGTCGCT TTACCATTCT GAGCCTGTTT AGCGGTTGTG GTGGTCTGGA
          K R R F T I L S L F S G C G G L D  Frame 1
      351 CTTGGGCTTC AAGGGCGGCT TCACCTATTT ACACGGTAAT TACGAACGTA
          L G F K G G F T Y L H G N Y E R N  Frame 1
      401 ATAACTTCGA CATCATTTGG GCGAATGAAA TCAACAGCCA AGCTGTTGAA
          N F D I I W A N E I N S Q A V E  Frame 1
      451 ACCTATCGCT CCTATTTCGG TAACCATATT GTGTGCGAGG ACATCAATAA
          T Y R S Y F G N H I V C E D I N N  Frame 1
      501 CATTCGTGAT GATGAATTTC CTCAAGCGGA CATCATTATC GGTGGTTTTC
          I R D D E F P Q A D I I I G G F P  Frame 1
      551 CGTGTCAGGA CTTCAGCCTG GCTGGTAAAA AGCAAGGCTT GAACGTTGAG
          C Q D F S L A G K K Q G L N V E  Frame 1
      601 CGCGGTCGTC TGTATTTGCA GATGAAGCGT GCGATTGATG CGGTTAAGCC
          R G R L Y L Q M K R A I D A V K P  Frame 1
      651 AGTCGCTTTC ATCGCCGAGA ATGTTCGTAA TCTGATGGTC ATGGGTAACG
          V A F I A E N V R N L M V M G N G  Frame 1
      701 GCGTCGTACT GAAAACGATC ATCGACGACT TCAAGCAGAG CGGTTACAAT
          V V L K T I I D D F K Q S G Y N  Frame 1
      751 GTGTACTTTC ACCTGTACAA TGCAGCGAAC TACGGTGTTC CGCAGAACCG
          V Y F H L Y N A A N Y G V P Q N R  Frame 1
      801 TGAGCGTGTG ATCATTTATG GCATTCGTGA GGATCTGAAC AATATTCCGT
          E R V I I Y G I R E D L N N I P F  Frame 1
      851 TCATTCCGCT GGAAACCCAT AGCCTGTATA ACTGGGTGAC TGCCTCTGAG
          I P L E T H S L Y N W V T A S E  Frame 1
      901 GCGATCGATG ATCTGTGGGA CAAACTGGAT ACCAACATCC CGAACCACAG
          A I D D L W D K L D T N I P N H S  Frame 1
      951 CAGACGCGAC TACTCGAAAG CGAAGTTTTA CGAAGGCAAG CGTACGCAGG
          R R D Y S K A K F Y E G K R T Q G  Frame 1
     1001 GTAATATCCG TATTCAGAGC GACAAGGTTG CGCCGACCAT TCGCGCAGAA
          N I R I Q S D K V A P T I R A E  Frame 1
     1051 CACCACGGTA ATATCGAAGG TCATTACCGC ACCTACGGCG ACGAGTCCGA
          H H G N I E G H Y R T Y G D E S D  Frame 1
     1101 TCTGTCCAAT TGGCGTCGCC TGAGCGTGCG TGAGTGCGCA CGTATTCAAA
          L S N W R R L S V R E C A R I Q T  Frame 1
     1151 CGTTTCCGGA TGACTTCATT TTTCAGAGCG CAGCCAGCAG CGCGTATAAA
          F P D D F I F Q S A A S S A Y K  Frame 1
     1201 CAAGTCGGCA ACGCAGTTCC GCCGGTGCTG GCGTGGAATA TTGCCCGTGC
          Q V G N A V P P V L A W N I A R A  Frame 1
     1251 GCTGTTCCTG AGCTTGATCC GCATCAAAGG TTAGAGCGGC CGCCACCGCT
          L F L S L I R I K G *  Frame 1
     1301 GAGCAATAAC TAGCATAACC CCTTGGGGCC TCTAAACGGG TCTTGAGGGG
     1351 TTTTTTGCTG AAAGGAGGAA CTATATCCGG GTAACGAATT CAAGCTTGAT
     1401 ATCATTCAGG ACGAGCCTCA GACTCCAGCG TAACTGGACT GCAATCAACT
     1451 CACTGGCTCA CCTTCACGGG TGGGCCTTTC TTCGGTAGAA GTCTTCTTAA
     1501 TAAGATGATC TTCTTGAGAT CGTTTTGGTC TGCGCGTAAT CTCTTGCTCT
     1551 GAAAACGAAA AAACCGCCTT GCAGGGCGGT TTTTCGAAGG TTCTCTGAGC
     1601 TACCAACTCT TTGAACCGAG GTAACTGGCT TGGAGGAGCG CAGTCACCAA
     1651 AACTTGTCCT TTCAGTTTAG CCTTAACCGG CGCATGACTT CAAGACTAAC
     1701 TCCTCTAAAT CAATTACCAG TGGCTGCTGC CAGTGGTGCT TTTGCATGTC
     1751 TTTCCGGGTT GGACTCAAGA CGATAGTTAC CGGATAAGGC GCAGCGGTCG
     1801 GACTGAACGG GGGGTTCGTG CATACAGTCC AGCTTGGAGC GAACTGCCTA
     1851 CCCGGAACTG AGTGTCAGGC GTGGAATGAG ACAAACGCGG CCATAACAGC
     1901 GGAATGACAC CGGTAAACCG AAAGGCAGGA ACAGGAGAGC GCACGAGGGA
     1951 GCCGCCAGGG GGAAACGCCT GGTATCTTTA TAGTCCTGTC GGGTTTCGCC
     2001 ACCACTGATT TGAGCGTCAG ATTTCGTGAT GCTTGTCAGG GGGGCGGAGC
     2051 CTATGGAAAA ACGGCTTTGC CGCGGCCCTC TCACTTCCCT GTTAAGTATC
     2101 TTCCTGGCAT CTTCCAGGAA ATCTCCGCCC CGTTCGTAAG CCATTTCCGC
     2151 TCGCCGCAGT CGAACGACCG AGCGTAGCGA GTCAGTGAGC GAGGAAGCGG
     2201 AATATATCCT GTATCACATA TTCTGCTGAC GCACCGGTGC AGCCTTTTTT
     2251 CTCCTGCCAC ATGAAGCACT TCACTGACAC CCTCATCAGT GCCAACATAG
     2301 TAAGCCAGTA TACACTCCGC TAGCGCAGAA AGGCCCACCC GAAGGTGAGC
     2351 CAGGTGATTA CATTTGGGCC CTCATTAGAA AAACTCATCG AGCATCAAAT
     2401 GAAATTGCAA TTTATTCATA TCAGGATTAT CAATACCATA TTTTTGAAAA
     2451 AGCCGTTTCT GTAATGAAGG AGAAAACTCA CCGAGGCAGT TCCATAGGAT
     2501 GGCAAGATCC TGGTATCGGT CTGCGATTCC GACTCGTCCA ACATCAATAC
     2551 AACCTATTAA TTTCCCCTCG TCAAAAATAA GGTTATCAAG TGAGAAATCA
     2601 CCATGAGTGA CGACTGAATC CGGTGAGAAT GGCAAAAGTT TATGCATTTC
     2651 TTTCCAGACT TGTTCAACAG GCCAGCCATT ACGCTCGTCA TCAAAATCAC
     2701 TCGCATCAAC CAAACCGTTA TTCATTCGTG ATTGCGCCTG AGCGAGGCGA
     2751 AATACGCGAT CGCTGTTAAA AGGACAATTA CAAACAGGAA TCGAGTGCAA
     2801 CCGGCGCAGG AACACTGCCA GCGCATCAAC AATATTTTCA CCTGAATCAG
     2851 GATATTCTTC TAATACCTGG AACGCTGTTT TTCCGGGGAT CGCAGTGGTG
     2901 AGTAACCATG CATCATCAGG AGTACGGATA AAATGCTTGA TGGTCGGAAG
     2951 TGGCATAAAT TCCGTCAGCC AGTTTAGTCT GACCATCTCA TCTGTAACAT
     3001 CATTGGCAAC GCTACCTTTG CCATGTTTCA GAAACAACTC TGGCGCATCG
     3051 GGCTTCCCAT ACAAGCGATA GATTGTCGCA CCTGATTGCC CGACATTATC
     3101 GCGAGCCCAT TTATACCCAT ATAAATCAGC ATCCATGTTG GAATTTAATC
     3151 GCGGCCTCGA CGTTTCCCGT TGAATATGGC TCATAGCTCC TGAAAATCTC
     3201 GATAACTCAA AAAATACGCC CGGTAGTGAT CTTATTTCAT TATGGTGAAA
     3251 GTTGGAACCT CTTACGTGCC GATCAAGAAG ACGGTCAAAA GCCTCCGGTC
     3301 GGAGGCTTTT GACTTTCTGC TATGGAGGTC AGGTATGATT TAAATGGTCA
     3351 GTATTGAGCG ATATCTAGAG AATTCGTC
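As a check on the "Frame 1" translation listed with the sequence, the following minimal sketch translates the first 50 bases of the insert (positions 151-200) with the standard genetic code. The codon table is built from the standard code's usual TCAG ordering; the function name `translate` and the variable `insert_start` are illustrative and not part of the original record.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 151-200 of the record, i.e. the start of NS_dcm_Cpa.
insert_start = "ATGGAGCAGCTCAGCATTTTCAATAAAGTCGACGATTTCCAAGAGCAGCA"
print(translate(insert_start))
```

The last two bases of this stretch (CA) belong to a codon completed on the next sequence row, which is why the listed translation for that row carries one more residue than the sketch returns. The four base counts in the record (893 A, 821 C, 816 G, 848 T) also sum to the stated length of 3378 bp.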
Table S3 – List of oligonucleotides used in this study
Primer   Sequence (5'-3')                         Description
129      TGGGAATAGAAATATAAAGGGGAGT                Amplification and Sequencing of Cpa Spo0A
130      ACCCTAAAACTACTCTCAACCCA                  Amplification and Sequencing of Cpa Spo0A
140      GAAATAGCATGCGGCGATGCACAGATACTTACAAC      spo0A homology region 1 For w/ SphI site
141      TCCATTACCGGTACTCCCCTTTATATTTCTATTCCCA    spo0A homology region 1 Rev w/ AgeI site
142      TTAGACACGCGTATGCCGCATTTGGATGGATTAG       spo0A homology region 2 For w/ MluI site
143      GTCTGACCATGGTATCGACTTGCCCTCTAGACCAG      spo0A homology region 2 Rev w/ NcoI site
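The restriction sites named in the primer descriptions can be confirmed directly in the primer sequences. The sketch below is illustrative only: it uses the well-known recognition sequences of SphI, AgeI, MluI, and NcoI, and the names `SITES`, `PRIMERS`, and `site_position` are hypothetical helpers rather than anything from the study.

```python
# Recognition sequences (all palindromic, so strand orientation does not matter).
SITES = {
    "SphI": "GCATGC",
    "AgeI": "ACCGGT",
    "MluI": "ACGCGT",
    "NcoI": "CCATGG",
}

# Primers 140-143 from Table S3, paired with the site named in their description.
PRIMERS = {
    140: ("GAAATAGCATGCGGCGATGCACAGATACTTACAAC", "SphI"),
    141: ("TCCATTACCGGTACTCCCCTTTATATTTCTATTCCCA", "AgeI"),
    142: ("TTAGACACGCGTATGCCGCATTTGGATGGATTAG", "MluI"),
    143: ("GTCTGACCATGGTATCGACTTGCCCTCTAGACCAG", "NcoI"),
}


def site_position(sequence, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return sequence.find(SITES[enzyme])


for primer_id, (sequence, enzyme) in PRIMERS.items():
    print(primer_id, enzyme, site_position(sequence, enzyme))
```

In each of the four primers the site begins at index 6, that is, after six leading bases at the 5' end.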
Supplementary Figures
[Panel A: Wild Type Cpa 6013. Panel B: Mutant M150B.]
Figure S1. Phase-contrast microscopy of WT (A) and M150B (B) after 6 days.
Phase-bright forespores are observed in the WT, and the asporogenous phenotype is
apparent in the M150B strain.
[Plot: Coverage (y-axis, 0 to 400) against Genome Position w.r.t. GenBank CP009267.1
(x-axis, 0 to 4,000,000); series 6013 and M150B.]
Figure S2. Whole-genome SMRT sequencing coverage depth average across the
reference. Cpa wild type ATCC 6013 (olive) and Cpa mutant M150B (salmon) coverage
across the reference genome (10,000 bp window average).
Figure S3. Cultures of the wild type (left), M150B (center), and ΔSpo0A (right) Cpa
strains after 5 days of culturing. Cultures were allowed to rest without agitation.
M150B cultures consistently settled quickly compared to wild type, while the ΔSpo0A
cultures consistently remained in suspension longer than the wild type. This shows
that the fast-settling phenotype of M150B is not due to the Spo0A deficiency.