Paper GSS Corynebacterium pseudotuberculosis

Gene content characterization and genome organization of Corynebacterium pseudotuberculosis

Vivian D’Afonseca¹, Fernanda Alves Dorella¹, Luis Gustavo Pacheco¹, Pablo Matias Moraes¹, Francisco Prosdocimi², Izabella Pena², José Miguel Ortega², Santuza Teixeira³, Sérgio Costa Oliveira⁴, Guilherme Correa Oliveira⁵, Roberto Meyer⁶, Anderson Miyoshi¹**, Vasco Azevedo¹*

1- Laboratório de Genética Celular e Molecular, Depto. Biologia Geral, ICB-UFMG
2- Laboratório de Biodados, Depto. Bioquímica e Imunologia, ICB-UFMG
3- Laboratório de Imunologia e Bioquímica Celular de Parasitas, Depto. Bioquímica e Imunologia, ICB-UFMG
4- Laboratório de Imunologia de Doenças Infecciosas, Depto. Bioquímica e Imunologia, ICB-UFMG
5- Laboratório de Parasitologia, Centro de Pesquisas René Rachou, Fiocruz
6- Laboratório de Biointeração, Instituto de Ciências da Saúde, ICB-UFBA

Vasco Azevedo* and Anderson Miyoshi** (shared senior authorship of this work)
vasco@icb.ufmg.br
miyoshi@icb.ufmg.br
Laboratório de Genética Celular e Molecular, Sala Q3-259
Departamento de Biologia Geral, ICB, UFMG
Av. Antônio Carlos, 6627, C.P. 486
31.270-010 Belo Horizonte, MG, Brazil
Tel: +55 31 3499-2778
Fax: +55 31 3499-2610

Running Head
Partial identification of the gene content of C. pseudotuberculosis

* To whom correspondence should be addressed

Abbreviations: CLA, Caseous Lymphadenitis; GSS, Genome Survey Sequences; BLAST, Basic Local Alignment Search Tool; COG, Clusters of Orthologous Groups; Cd, Corynebacterium diphtheriae; Ce, Corynebacterium efficiens; Cg, Corynebacterium glutamicum; Cj, Corynebacterium jeikeium; Cp, Corynebacterium pseudotuberculosis.

ABSTRACT

Corynebacterium pseudotuberculosis is an intracellular pathogen that infects sheep and goats. The widespread occurrence of this pathogen and the economic importance of caseous lymphadenitis (CLA) have prompted investigation of its pathogenesis. However, the genetic basis of C. pseudotuberculosis virulence is still poorly characterized. A genomic library constructed from C. pseudotuberculosis was used to generate Genome Survey Sequences (GSSs). Sequencing both ends of 720 distinct clones yielded 1,440 GSSs, which were analyzed in silico with bioinformatic tools (the PHRED base-caller with a 0.16 trimming cutoff, SeqClean and cross_match from the PHRAP package) and compared against public databases. These analyses resulted in about 1,000 GSSs, with lengths between 100 and 790 nucleotides, totaling 346,860 base pairs. The non-redundant unique sequences were used as queries for BLAST searches against the genome (BLASTn), translated genome (tBLASTx) and proteome (BLASTx) of the four Corynebacterium species with complete genomes; they showed more similarity to other Corynebacterium species at the protein level than at the DNA level. As a result, we were able to characterize approximately 8% of the C. pseudotuberculosis genome. We identified genes from functional groups, according to the COG database, that had not been described previously for this species, a close relationship between C. pseudotuberculosis and C. diphtheriae, and conserved gene synteny among Corynebacterium species.

Keywords: Corynebacterium genus, Corynebacterium pseudotuberculosis, Genome Survey Sequences (GSS), microbial genome sequencing, partial-genome sequencing, comparative genomics.

INTRODUCTION

The 1990s marked the starting point of the genomic era, with the sequencing of the first genome of a free-living organism, the bacterium Haemophilus influenzae (Fleischmann et al., 1995; Mora et al., 2006). Since then, genomic research has generated a large amount of information that is available in public databases (Dorella et al., 2006a). This information has helped scientists to discover genes, their functions and the proteins they encode, and to compare the genomes of evolutionarily related species (Celestino et al., 2004). Although prokaryotic microorganisms account for the largest part of the planet's total biomass (Rodríguez-Valera, 2004), much remains to be understood about them. Thus, the massive accumulation of prokaryotic DNA sequences generated by microbial genome projects can provide major advances in our understanding of various fields, including bacterial diversity, mobile genetic elements (MGE) and horizontal gene transfer (HGT) (Binnewies et al., 2006). Furthermore, genome sequencing has provided important insights into the deduced genes and proteins of pathogens, including biological features, putative virulence factors and potential targets that might be useful for the development of immunological or chemotherapeutic reagents against the organism of choice (Celestino et al., 2004; Tauch et al., 2006). At present, around 570 microbial genome projects are finished and published, and more than 1,100 are ongoing (Genomes OnLine Database, http://www.genomesonline.org/).
The genus Corynebacterium consists of a large number of Gram-positive bacteria that are pleomorphic, asporogenous and have a high G + C content in their genomes (Deb & Nath, 1999). Originally, the genus Corynebacterium contained only the diphtheria bacillus and some other animal-pathogenic species (Barksdale, 1970; Pascual et al., 1995). Later, other microorganisms were added, including plant- and animal-pathogenic species, nonpathogenic soil bacteria and saprophytic species (Collins & Bradbury, 1986; Collins & Cummins, 1986; Pascual et al., 1995). Members of the genus Corynebacterium are closely related to Mycobacterium species, and both are classified in the taxonomic order Actinomycetales (Nakamura et al., 2003). At present, the National Center for Biotechnology Information (NCBI) genome database contains the complete genomes of four species of the genus Corynebacterium: Corynebacterium diphtheriae, Corynebacterium efficiens, Corynebacterium glutamicum and Corynebacterium jeikeium (Cerdeño-Tárraga et al., 2003; Ikeda et al., 2003; Nishio et al., 2003; Tauch et al., 2005).
One of the most important members of this genus, and the focus of this paper, is Corynebacterium pseudotuberculosis, a facultative intracellular pathogen that infects sheep and goats, causing caseous lymphadenitis (CLA). This disease occurs worldwide, and its high economic importance has prompted investigation of its pathogenesis. However, the genetic determinants of C. pseudotuberculosis virulence are still poorly characterized (Dorella et al., 2006a); moreover, only 19 proteins of this species are identified in the GenPept database.
In this context, the partial C. pseudotuberculosis genome sequence provided in this work may improve the understanding of the molecular and genetic basis of this bacterium's virulence, and may also be useful in the development of novel diagnostic methods, contributing to the control of CLA.


METHODOLOGY

Bacterial strains and growth conditions
C. pseudotuberculosis wild-type strain T2 was isolated from a caseous granuloma found in a CLA-affected goat in Bahia state (Brazil) and identified with the API CORYNE battery (Biomerieux, France). This bacterium was grown aerobically in Brain Heart Infusion (BHI, Acumedia) broth at 37 ºC under agitation (Dorella, 2006b). To obtain recombinant clones, we used Escherichia coli strain DH5α, grown aerobically at 37 ºC in Luria-Bertani medium (LB, Difco Laboratories, Detroit, USA) supplemented with ampicillin (100 µg/mL) and X-Gal (40 µg/mL).

Genomic DNA extraction and construction of genomic library
Genomic DNA was extracted according to a previously described protocol (Pacheco et al., 2007). DNA integrity was confirmed by electrophoresis in 0.8% agarose gels and visualization by ethidium bromide staining. DNA purity and quality were evaluated by spectrophotometric analysis. To obtain C. pseudotuberculosis DNA fragments of different sizes, total genomic DNA was subjected to two minutes of nebulization (Roe et al., 1996). Fragmentation was confirmed by electrophoresis in 0.8% agarose gels, and the DNA fragments obtained ranged from 500 to 1,000 bp. These fragments were cloned into the pCR®4Blunt-TOPO vector using the TOPO® Shotgun Subcloning Kit (Invitrogen) and transformed into E. coli DH5α by electroporation (Dower et al., 1988).
Blue-white screening was performed to select positive clones, and PCR with M13 universal primers was used to confirm the presence of inserts, according to the kit manufacturer's instructions.

GSS sequencing
Plasmid extraction was performed directly on plates using a standardized protocol (Sambrook et al., 1989). The plasmids obtained were analyzed in 0.8% agarose gels, and their concentrations were determined spectrophotometrically.
Plasmid DNA, at around 200 ng/µL, was subjected to sequencing reactions in a Mastercycler gradient thermal cycler (Eppendorf), using the DYEnamic ET Dye Terminator Cycle Sequencing Kit (GE Healthcare) and the universal primers M13 F and M13 R, following the manufacturer's instructions. Sequences were generated on a MegaBACE™ 1000 instrument (GE Healthcare).

GSS amplification by PCR
To confirm the presence of putative exclusive genes in Corynebacterium pseudotuberculosis and to exclude any contamination, forward and reverse primers were designed for each GSS that showed no similarity in the database searches against the other sequenced Corynebacterium species.
Putative oligopeptide/dipeptide ABC transporter primers:
Forward: 5´-CCTTACCGAGACAACGTCAT-3´
Reverse: 5´-GCCTGGTGCTTATCATTGAT-3´
NADP oxidoreductase, coenzyme F420-dependent primers:
Forward: 5´-CTGCGACATAGCTAGGCACT-3´
Reverse: 5´-CCGCCAGACTTTTCTCTACA-3´
Proline iminopeptidase (PIP) primers:
Forward: 5´-AACTGCGGCTTTCTTTATTC-3´
Reverse: 5´-GACAAGTGGGAACGGTATCT-3´
The amplicons generated had lengths of 285 bp, 382 bp and 551 bp, respectively.
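
As a quick sanity check on the primer list, the following minimal Python sketch computes the length and GC fraction of each oligonucleotide. The sequences are copied from the list in this section; the dictionary keys and the helper names (gc_fraction, summarize) are illustrative and are not part of the original protocol.

```python
# Primer sequences (5' -> 3') as listed in this section.
# The dictionary keys are illustrative labels, not names used by the authors.
PRIMERS = {
    "ABC transporter forward": "CCTTACCGAGACAACGTCAT",
    "ABC transporter reverse": "GCCTGGTGCTTATCATTGAT",
    "NADP oxidoreductase forward": "CTGCGACATAGCTAGGCACT",
    "NADP oxidoreductase reverse": "CCGCCAGACTTTTCTCTACA",
    "PIP forward": "AACTGCGGCTTTCTTTATTC",
    "PIP reverse": "GACAAGTGGGAACGGTATCT",
}


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def summarize(primers: dict) -> dict:
    """Map each primer name to a (length, GC fraction) tuple."""
    return {name: (len(seq), gc_fraction(seq)) for name, seq in primers.items()}


if __name__ == "__main__":
    for name, (length, gc) in summarize(PRIMERS).items():
        print(f"{name}: {length} nt, GC = {gc:.0%}")
```

A check of this kind only flags transcription errors in the oligonucleotides; it says nothing about specificity, which the PCR controls described in the Results address.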

Base-calling
Base-calling was performed with the PHRED algorithm (Ewing and Green, 1998; Ewing et al., 1998), using the trim_alt option and a trim_cutoff parameter of 0.16, as suggested by Prosdocimi and colleagues (2007).
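
To illustrate what an error-probability cutoff of 0.16 means in practice, the sketch below converts Phred quality values into error probabilities and keeps the contiguous segment that maximizes the sum of (cutoff - error probability). This is a Mott-style maximal-segment trimming written for illustration under that assumption; it is not PHRED's source code, and the function names are hypothetical.

```python
def error_probability(phred_quality):
    """Convert a Phred quality value Q into an error probability, p = 10^(-Q/10)."""
    return 10 ** (-phred_quality / 10)


def trim_by_error_probability(qualities, cutoff=0.16):
    """Return (start, end) of the segment maximizing sum(cutoff - p_error).

    The slice qualities[start:end] is the region kept; (0, 0) means that
    no base passes the cutoff.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for index, quality in enumerate(qualities):
        running += cutoff - error_probability(quality)
        if running <= 0:
            # The segment so far is not worth keeping: restart after this base.
            running, start = 0.0, index + 1
        elif running > best_score:
            best_score, best_start, best_end = running, start, index + 1
    return best_start, best_end


if __name__ == "__main__":
    example = [5, 5, 30, 30, 30, 5, 5]  # low-quality ends, good middle
    print(trim_by_error_probability(example))
```

With a cutoff of 0.16, bases with a quality value below roughly 8 contribute negatively to the score, so runs of such bases at the read ends are trimmed away.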

Sequence filtering and clustering
To filter dust (low-complexity regions) and vector sequences from the reads, we used SeqClean (http://compbio.dfci.harvard.edu/tgi/software/) and cross_match from the PHRAP package (http://www.phrap.org). SeqClean was first used to remove dust from the PHRED base-called sequences. Next, cross_match was run against the sequence of the cloning vector used in this analysis and, finally, SeqClean was run once more to remove the regions identified by cross_match. Sequence clustering was performed with the TIGR software TGICL (Pertea et al., 2003) using default parameters.
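
The effect of the final cleaning pass can be sketched as follows. The example assumes that vector regions were masked with the character X (the usual cross_match screening convention) and that reads shorter than 100 bases after trimming are discarded, as reported in the Results. It is a simplified illustration, not SeqClean's actual algorithm, and the function name is hypothetical.

```python
MIN_LENGTH = 100  # shortest read kept after cleaning, as reported for this data set


def filter_reads(reads, min_length=MIN_LENGTH):
    """Strip masked (X) bases from both read ends and drop short reads.

    `reads` maps a read name to its sequence. Internal masked regions are
    left untouched in this simplified sketch.
    """
    kept = {}
    for name, sequence in reads.items():
        trimmed = sequence.strip("Xx")
        if len(trimmed) >= min_length:
            kept[name] = trimmed
    return kept


if __name__ == "__main__":
    reads = {
        "clone_001_F": "XXXX" + "ACGT" * 30 + "XX",  # 120 bases after trimming
        "clone_002_F": "XXXXXXXX" + "ACGT" * 10,      # 40 bases after trimming
    }
    print(sorted(filter_reads(reads)))
```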

Comparative BLAST analyses against Corynebacterium complete genomes
Non-redundant unique sequences clustered by TGICL were used as queries in BLAST searches (Altschul et al., 1997) against other species of the genus Corynebacterium to identify similarities between these species and the C. pseudotuberculosis uniques. We ran BLAST at the DNA level (BLASTn) and at the protein level (BLASTx, tBLASTx) to assess nucleotide and amino acid conservation among the species.

COG functional categories inference through BLAST best hit analysis
C. pseudotuberculosis unique sequences were also used as queries in BLASTx searches against the COG database (Tatusov et al., 1997). The NCBI COG database clusters genes from 66 microbial species into orthologous groups and classifies these protein clusters into categories of biological processes. A best-hit inference approach was used to classify the organism's sequences into COG functional categories (REF). BLASTx was therefore run with an e-value cut-off of 10⁻⁵, and the C. pseudotuberculosis partial genome sequences were classified into COG categories based on the best-hit match of each sequence against COG-classified proteins.
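
The best-hit classification described above can be sketched in Python as follows. The example assumes that the BLASTx results are available in the standard 12-column tabular format (query, subject, ..., e-value, bit score) and that a lookup table from COG protein identifiers to category letters has been prepared; the function names, the lookup table and the identifiers in the usage example are hypothetical.

```python
from collections import Counter


def best_hits(tabular_lines, max_evalue=1e-5):
    """Return {query: subject}, keeping for each query the hit with the highest bit score."""
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comment or malformed lines
        query, subject = fields[0], fields[1]
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return {query: subject for query, (subject, _) in best.items()}


def count_cog_categories(tabular_lines, protein_to_categories, all_queries):
    """Count COG category letters over the best hit of every query sequence.

    A protein may carry several category letters, so the counts can add up
    to more than the number of queries.
    """
    hits = best_hits(tabular_lines)
    counts = Counter()
    for query in all_queries:
        categories = protein_to_categories.get(hits.get(query), "")
        if not categories:
            counts["Not in COG"] += 1
        for letter in categories:
            counts[letter] += 1
    return counts


if __name__ == "__main__":
    lines = [
        "unique1\tprotA\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-20\t90.0",
        "unique1\tprotB\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-30\t120.0",
    ]
    table = {"protA": "K", "protB": "EG"}
    print(count_cog_categories(lines, table, ["unique1", "unique2"]))
```

Counting one category letter per best hit, rather than per query, is what allows the category totals to exceed the number of sequences.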

Comparative BLAST analyses against UNIPROT database
The most recent version of the curated UNIPROT protein database (Apweiler et al., 2004; Uniprot Consortium, 2007) was downloaded from the EBI website on Feb 13, 2007 (ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.fasta.gz). UNIPROT was then used as the subject database for BLASTx searches with the C. pseudotuberculosis uniques as queries. We report the number of protein hits, as well as the number of uniques matching UNIPROT proteins but not proteins from the Corynebacterium genus or COG.

Gene synteny analysis with Corynebacterium genomes
A synteny analysis of the C. pseudotuberculosis non-redundant GSSs was also performed to verify whether gene order is conserved in C. pseudotuberculosis when compared to other Corynebacterium species. Using the tBLASTx best hits of the C. pseudotuberculosis uniques against the C. diphtheriae, C. efficiens, C. glutamicum and C. jeikeium genomes, we calculated the putative average position of each protein-coding gene in each complete genome from the subject start and subject end coordinates of the hit. The putative gene positions were plotted in a graph ordered by the C. diphtheriae hits (the genome closest to C. pseudotuberculosis, see Results).
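
The positional mapping can be sketched as below. The example assumes that the position of a hit is taken as the midpoint of its subject start and end coordinates, which is one reasonable reading of "average position", and that the best hits are stored per unique sequence and per genome; the data layout, the genome mnemonics used as keys and the function names are illustrative.

```python
def hit_midpoint(subject_start, subject_end):
    """Midpoint of a hit on the subject genome (works for either strand orientation)."""
    return (subject_start + subject_end) / 2


def synteny_points(best_hits, reference="cd"):
    """Return (unique_id, {genome: position}) tuples ordered by the reference genome.

    `best_hits` maps each unique sequence to {genome: (subject_start, subject_end)}.
    Uniques without a hit in the reference genome are skipped.
    """
    rows = []
    for unique_id, per_genome in best_hits.items():
        if reference not in per_genome:
            continue
        positions = {
            genome: hit_midpoint(start, end)
            for genome, (start, end) in per_genome.items()
        }
        rows.append((unique_id, positions))
    rows.sort(key=lambda row: row[1][reference])
    return rows


if __name__ == "__main__":
    hits = {
        "unique1": {"cd": (2000, 1000), "cj": (500, 700)},
        "unique2": {"cd": (100, 300), "cg": (900, 1100)},
    }
    for unique_id, positions in synteny_points(hits):
        print(unique_id, positions)
```

Plotting the position in each genome against the rank in the reference genome gives a diagonal when gene order is conserved, and visible breaks or reversed segments otherwise.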


RESULTS

Raw data filtering and processing
The raw data produced by the sequencer and base-calling were first filtered to remove vector regions and dust. Filtering removed more than 30% of the bases and sequences (Table 1). Moreover, the longest sequences became shorter because of the removal of vector regions and/or dust. SeqClean also removes from the analysis all sequences shorter than 100 base pairs.

Tab 1. Raw data analyses
                          RAW DATA   PROCESSED DATA
Number of sequences          1,440              963
Total number of residues   506,037          346,860
Smallest sequence                0              100
Largest sequence               906              792
Average length                 351              360
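
The statement that filtering removed more than 30% of the bases and sequences can be checked directly from the values in Table 1. The short Python sketch below does this arithmetic; the variable and function names are illustrative.

```python
# Values taken from Table 1 (raw data versus processed data).
RAW_SEQUENCES, PROCESSED_SEQUENCES = 1440, 963
RAW_BASES, PROCESSED_BASES = 506037, 346860


def removed_fraction(raw, processed):
    """Fraction of the raw amount that was removed by filtering."""
    return (raw - processed) / raw


if __name__ == "__main__":
    sequences = removed_fraction(RAW_SEQUENCES, PROCESSED_SEQUENCES)
    bases = removed_fraction(RAW_BASES, PROCESSED_BASES)
    print(f"Sequences removed: {sequences:.1%}")
    print(f"Bases removed: {bases:.1%}")
```

Both fractions come out slightly above 30%, in agreement with the text.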

Clustering Analysis
The C. pseudotuberculosis GSSs were clustered using TIGR's TGICL software. This software uses Megablast (Zhang et al., 2000) to cluster sequences and CAP3 (Huang and Madan, 1999) to assemble the sequences from each cluster. As expected, we observed many singlets and clusters containing a small number of sequences, which attests to the quality of the library (Figure 1). The non-redundant unique sequences span around 184,000 bases in a set of 466 non-redundant DNA sequences (Table 2).

Fig 1. Number of clusters by class of clustered sequences as defined by TGICL/Megablast.

Tab 2. Clustering results
Sequence class                    Number of bases  Number of sequences  Average size (bp)  Bigger sequence  Smaller sequence
Singlets                                   91,047                  276                331              792               100
Sequences clustered                       255,813                  687                372              786               100
Clusters                                   92,645                  190                490             1248               121
GSS Uniques* (non-redundant set)          183,692                  466                395             1248               100
* Unique data (except bigger and smaller sequences) are the sum of the singlets and clusters data.
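
The footnote to Table 2 states that the unique set is the sum of singlets and clusters. The minimal Python sketch below verifies this with the values from the table; the dictionary layout is illustrative.

```python
# (number of bases, number of sequences) per sequence class, from Table 2.
TABLE_2 = {
    "singlets": (91047, 276),
    "clusters": (92645, 190),
    "uniques": (183692, 466),
}


def uniques_from_parts(table):
    """Add the singlets and clusters rows to obtain the expected uniques row."""
    bases = table["singlets"][0] + table["clusters"][0]
    sequences = table["singlets"][1] + table["clusters"][1]
    return bases, sequences


if __name__ == "__main__":
    print(uniques_from_parts(TABLE_2) == TABLE_2["uniques"])
```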

BLAST against complete genome Corynebacterium species
All subsequent analyses were made using the non-redundant set of C. pseudotuberculosis sequences (uniques). The C. pseudotuberculosis GSS unique sequences were used as queries for BLAST searches against the genome (BLASTn), translated genome (tBLASTx) and proteome (BLASTx) of the other Corynebacterium species with complete genomes (Table 3).

Tab 3. BLAST analysis of C. pseudotuberculosis uniques against other Corynebacterium species.
BLAST program  Subject   E-value cutoff  cd*,a       ce*,a       cg*,a       cj*,a
tBLASTx        Genome    10⁻¹⁰           300 (100%)  244 (100%)  247 (100%)  198 (100%)
BLASTn         Genome    10⁻¹⁰           106 (35%)   42 (17%)    46 (19%)    31 (16%)
BLASTx         Proteome  10⁻¹⁰           268 (89%)   224 (92%)   221 (90%)   182 (92%)
BLASTx         Proteome  10⁻⁵            305         274         275         235
* Number of C. pseudotuberculosis uniques hitting the other species' genome or proteome.
a Percentage relative to the tBLASTx search against the genome.

C. pseudotuberculosis proved to be more similar to other Corynebacterium species at the protein level than at the DNA level (Nakamura et al., 2003). The data in Table 3 suggest that C. pseudotuberculosis is most similar to C. diphtheriae, followed by C. glutamicum, C. efficiens and C. jeikeium.

Functional categorization of putative proteins based on best hits to COG database
All GSS unique sequences were used as queries in BLASTx searches against the COG database. We used a BLAST e-value cut-off of 10⁻⁵, and the translated partial protein sequences of C. pseudotuberculosis were classified into functional categories based on the category of their best hit in the COG database. In this way, the C. pseudotuberculosis proteins were classified into major (Figure 2) and minor (Table 4) COG functional categories.

Fig 2. GSS classification in major COG categories

Almost half of the C. pseudotuberculosis GSS uniques did not hit any protein in the COG database, even with the low-stringency BLAST cut-off used. Metabolism was the category in which most genes were classified, while Information Storage and Processing, Cellular Processes and Poorly Characterized each received a similar percentage of genes (~10%, Figure 2). Since these percentages mean little without a comparative perspective, Table 4 presents the expected percentage of genes classified in each category in Corynebacterium, given by the COG classification of the four species of this genus with complete genomes.

Table 4. COG functional classification of C. pseudotuberculosis putative partial proteins
COG Category                                                      Cat*    Cp**          % in genus***
(J) Translation, ribosomal structure and biogenesis               Inf     21 (4.2%)     3.9
(K) Transcription                                                 Inf     23 (4.6%)     5.4
(L) DNA replication, recombination and repair                     Inf     23 (4.6%)     5.2
(D) Cell division and chromosome partitioning                     pCel    1 (0.2%)      0.5
(O) Posttranslational modification, protein turnover, chaperones  pCel    8 (1.6%)      2.2
(M) Cell envelope biogenesis, outer membrane                      pCel    19 (3.8%)     3.1
(N) Cell motility and secretion                                   pCel    0             0.0
(U) Intracellular trafficking and secretion                       pCel    1 (0.2%)      0.7
(V) Defense mechanisms                                            pCel    8 (1.6%)      1.3
(P) Inorganic ion transport and metabolism                        pCel    21 (4.2%)     4.9
(T) Signal transduction mechanisms                                pCel    11 (2.2%)     2.3
(C) Energy production and conversion                              Met     14 (2.8%)     3.9
(H) Coenzyme metabolism                                           Met     19 (3.8%)     3.4
(I) Lipid metabolism                                              Met     14 (2.8%)     2.4
(G) Carbohydrate transport and metabolism                         Met     21 (4.2%)     3.6
(E) Amino acid transport and metabolism                           Met     28 (5.6%)     5.8
(F) Nucleotide transport and metabolism                           Met     18 (3.6%)     2.1
(Q) Secondary metabolites biosynthesis, transport and catabolism  Met     3 (0.6%)      1.7
(R) General function prediction only                              Poor    29 (5.8%)     8.0
(S) Function unknown                                              Poor    17 (3.4%)     5.0
Not in COG                                                        ------  204 (40.6%)   33.3
Cp: Corynebacterium pseudotuberculosis
* Description of column data: Inf (information storage and processing), pCel (cellular processes), Met (metabolism), Poor (poorly characterized).
** Number of genes found (and partial percentage) in the Cp genome. The absolute number is greater than 514 because some genes were classified in more than one category.
*** Percentage in the genus Corynebacterium according to COG data (http://www.ncbi.nlm.nih.gov/sutils/coxik.cgi?gi=375)

Most of the classified GSS uniques fit the expected percentages for the genus Corynebacterium, considering that this is only a partial analysis of the C. pseudotuberculosis genome and proteome (Table 4). Consistent with the data in Figure 2, we observed an excess of unclassified sequences.

Shared hits against different databases
To search for putative C. pseudotuberculosis genes acquired by lateral transfer, we compared BLAST results against different databases. Most C. pseudotuberculosis genes derived from the GSSs would be expected to be present in other Corynebacterium species if they were present in the common ancestor of all species in this genus. This expectation was confirmed (Figure 3). However, we observed 4 putative C. pseudotuberculosis proteins (translated uniques) with similarity to COG and UNIPROT proteins but no similarity to proteins of other Corynebacterium species. These 4 proteins can therefore be considered putative candidates for lateral transfer into the C. pseudotuberculosis genome.
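
The comparison of hits across databases amounts to simple set operations on the identifiers of the unique sequences. The sketch below assumes that, for each database, the set of uniques with a significant hit has already been extracted from the BLAST output; the function name and the identifiers in the usage example are hypothetical.

```python
def lateral_transfer_candidates(cog_hits, uniprot_hits, corynebacterium_hits):
    """Uniques with hits in both COG and UNIPROT but none in Corynebacterium proteomes."""
    return (set(cog_hits) & set(uniprot_hits)) - set(corynebacterium_hits)


if __name__ == "__main__":
    cog = {"unique1", "unique2", "unique3"}
    uniprot = {"unique2", "unique3", "unique4"}
    corynebacterium = {"unique1", "unique2"}
    print(sorted(lateral_transfer_candidates(cog, uniprot, corynebacterium)))
```

The absence of a hit in the four available proteomes is only indirect evidence, which is why the candidates were subsequently tested by PCR.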

Fig 3. Distribution of C. pseudotuberculosis GSS uniques’ BLAST hits against the three databases tested: COG, UNIPROT and a built-in Corynebacterium protein database containing the proteomes

These putative genes that might have originated in the C. pseudotuberculosis genome by lateral transfer were identified as: (1) an oligopeptide/dipeptide ABC transporter, ATPase subunit, with best hit in the NR database against another bacterium (Arthrobacter sp. FB24) from the same order as C. pseudotuberculosis (Actinomycetales) (e-value 2e-19); (2) an NADP oxidoreductase, coenzyme F420-dependent, from Chloroflexus aurantiacus, a bacterium of the phylum Chloroflexi (e-value 4e-10); (3) a proline iminopeptidase (PIP) from Xanthomonas axonopodis, of the phylum Proteobacteria (e-value 1e-55); and (4) a high-similarity (99% identity) complete sequence of ribosomal protein L30 of the large (60S) ribosomal subunit from the baker's yeast Saccharomyces cerevisiae (e-value 2e-61) (Figure 4).

>gi|132946|sp|P24000|RL24B_YEAST 60S ribosomal protein L24-B (L30) (YL21) (RP29)
Length=155
Score = 238 bits (606), Expect = 2e-61
Identities = 154/155 (99%), Positives = 154/155 (99%), Gaps = 0/155 (0%)
Frame = -1

Query  608  MKVEVDSFSGAKIYPGRGTLFVRGDSKIFRFQNSKSASLFKQRKNPRRIAWTVLFRKHHK  429
            MKVEVDSFSGAKIYPGRGTLFVRGDSKIFRFQNSKSASLFKQRKNPRRIAWTVLFRKHHK
Sbjct  1    MKVEVDSFSGAKIYPGRGTLFVRGDSKIFRFQNSKSASLFKQRKNPRRIAWTVLFRKHHK  60

Query  428  KGITEEVAKKRSRKTVKAQRPITGASLDLIKERRSLKPEVRkanreeklkankekkraek  249
            KGITEEVAKKRSRKTVKAQRPITGASLDLIKERRSLKPEVRKANREEKLKANKEKKRAEK
Sbjct  61   KGITEEVAKKRSRKTVKAQRPITGASLDLIKERRSLKPEVRKANREEKLKANKEKKRAEK  120

Query  248  aarkaekaKSAGVQGSKVSKQQAKGAFQKVAVTSR  144
            AARKAEKAKSAGVQGSKVSKQQAKGAFQKVA TSR
Sbjct  121  AARKAEKAKSAGVQGSKVSKQQAKGAFQKVAATSR  155

Fig 4. Best-hit BLASTx result of a C. pseudotuberculosis unique against the complete sequence of a yeast 60S ribosomal protein. Putative case of lateral gene transfer to be further investigated.

Confirmation of the putative genes of Corynebacterium pseudotuberculosis
400
By PCR were confirmed the presence of 3 putative exclusives genes of
401
Corynebacterium pseudotuberculosis. For this amplification, we used genomic DNA
402
of the Corynebacterium pseudotuberculosis 1002 and T2 strain, Corynebacterium
403
pseudotuberculosis equi, Corynebacterium ulcerans, Corynebacterium renale, which
404
still didn´t have sequenced Corynebacterium diphteriae and Corynebacterium
405
jeikeium. As negative control, was used vector TOPO only and positive control was
406
the clone contained the fragment that originated the GSS (figure 5).

[Figure 5: gel images, panels A, B and C; lanes M and 1 to 9 in each panel; band sizes labeled 382 bp, 551 bp and 285 bp]

Fig. 5: PCR confirmation of the presence of the putative genes in Corynebacterium pseudotuberculosis. (A) Putative dipeptide/oligopeptide ABC transporter. (B) NADP oxidoreductase. (C) PIP (proline iminopeptidase). (M) Molecular marker (1Kb Ladder Plus); (1) negative control; (2) positive control; (3) C. pseudotuberculosis 1002; (4) C. pseudotuberculosis T2; (5) C. renale; (6) C. ulcerans; (7) C. diphtheriae; (8) C. jeikeium; (9) C. pseudotuberculosis equi.

Gene order analysis
To verify whether gene order is conserved between the Corynebacterium species and the partial C. pseudotuberculosis genome analyzed here, we used the tBLASTx best-hit results against the C. diphtheriae, C. glutamicum, C. efficiens and C. jeikeium genomes to define the positions to which the putative C. pseudotuberculosis GSS genes map in those genomes.

Fig 6. Gene synteny analysis of C. pseudotuberculosis GSS uniques based on tBLASTx genome mapping onto other Corynebacterium genomes.

All putative C. pseudotuberculosis GSS genes were thus mapped onto the other Corynebacterium genomes (Figure 6), ordered by their position in the C. diphtheriae genome. All genomes showed a high degree of gene order conservation when compared to the mapped C. pseudotuberculosis genes, although C. jeikeium showed a clear inversion in the middle part of its genome when compared to the others. Moreover, the triangles representing the C. efficiens and C. glutamicum genomes often appear together, showing the similarity between these genomes. No squares, which represent the similarity of C. pseudotuberculosis genes to C. diphtheriae genes, were seen outside the main diagonal, suggesting a high conservation of gene order between these two species.


DISCUSSION

An overview of the C. pseudotuberculosis genome was produced here based on the sequences of ~1,000 genome survey sequences. DNA clustering of these data produced 466 sequences representing 183,692 base pairs of this organism's genome. Four other species of the genus Corynebacterium have already been sequenced, and their genome sizes range from 2.4 to 3.2 megabases (Table 5).

Tab 5. Data from complete Corynebacterium genomes already sequenced
Organism        Mnemonic  Accession Number  Genome size (bp)  Number of RefSeq proteins
C. diphtheriae  cd        NC_002935         2,488,635         2,291
C. efficiens    ce        NC_004369         3,147,090         3,006
C. glutamicum   cg        NC_006958         3,282,708         3,057
C. jeikeium     cj        NC_007164         2,462,499         2,201
455
Therefore, it might be concluded we have analyzed something between 5.6%
456
and 7.5% of C. pseudotuberculosis genome. Moreover, since C. pseudotuberculosis
457
has shown to be more similar in both the nucleotidic and the amino acid level to C.
458
diphteriae (Table 4), a species containing a 2.4 megabases genome, we have probably
459
analyzed here a number closer to 7.5% than 5.6% of C. pseudotuberculosis genome.
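
The 5.6% to 7.5% range follows from dividing the 183,692 non-redundant bases by the genome sizes of the four sequenced species listed in Table 5. The Python sketch below reproduces this estimate; it assumes, as the text does, that the C. pseudotuberculosis genome size lies within the range of its sequenced relatives, and the names used are illustrative.

```python
UNIQUE_BASES = 183692  # non-redundant bases obtained in this work

# Genome sizes in base pairs, from Table 5.
GENOME_SIZES = {
    "C. diphtheriae": 2488635,
    "C. efficiens": 3147090,
    "C. glutamicum": 3282708,
    "C. jeikeium": 2462499,
}


def coverage_estimates(unique_bases, genome_sizes):
    """Fraction of a genome of each given size covered by the unique bases."""
    return {name: unique_bases / size for name, size in genome_sizes.items()}


if __name__ == "__main__":
    for name, fraction in coverage_estimates(UNIQUE_BASES, GENOME_SIZES).items():
        print(f"{name}: {fraction:.1%}")
```

The lower bound corresponds to the largest genome (C. glutamicum) and the upper bound to the smallest (C. jeikeium); the C. diphtheriae genome size gives a value close to the upper bound.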
Many similarity analyses were performed here comparing the C. pseudotuberculosis GSS data with the genomes and proteomes of the four Corynebacterium species already sequenced. Better similarity results were obtained when these organisms were compared at the protein level. This may indicate that they diverged a considerable amount of time ago. Moreover, C. pseudotuberculosis shares more nucleotide and amino acid similarity with C. diphtheriae than with the other species (Table 3), as well as a similar gene order (Figure 6). This observation is consistent with sequencing data for the 16S rRNA and rpoB genes, in which these species clustered very closely (Khamis et al., 2005). After C. diphtheriae, C. pseudotuberculosis showed comparable levels of similarity to C. efficiens and C. glutamicum, and a more distant relationship with the C. jeikeium genome. All these data are consistent with the 16S rRNA and rpoB data produced by Khamis and collaborators (2005) and are summarized in Figure 7.

Fig 7. Evolutionary relationship between the Corynebacterium species studied here. These data were derived from the 16S rRNA and rpoB genes (Khamis et al., 2005) and are in accordance with the similarity data shown here (Table 3).

The very high number of “no hits” seen in Figure 2 may be almost entirely explained by the data in Table 4 and the size of the non-coding regions in Corynebacterium genomes. First, the last row of Table 4 shows the percentage of genes expected to be absent from COGs when a species of the genus Corynebacterium is analyzed. Accordingly, 33.3% of the genes of a Corynebacterium species are expected not to be classified in COG categories, whereas 40.6% of the C. pseudotuberculosis non-redundant unique sequences fell into this unclassified group. However, the 33.3% figure in Table 4 refers only to the percentage of genes not classified in COGs, whereas the unique sequences represent both genes and intergenic regions. Second, an analysis of the fraction of the Corynebacterium genomes composed of non-coding regions showed that C. diphtheriae, C. glutamicum, C. efficiens and C. jeikeium have, respectively, 11.7%, 6.7%, 12.7% and 8.4% of their genomes composed of non-protein-coding regions. Therefore, if we add an average putative value of 9.9% for the non-coding fraction of the C. pseudotuberculosis genome to the 33.3% of putative C. pseudotuberculosis proteins not classified in COGs, we obtain 43.2% of uniques expected not to be classified in COGs, which is close to the value seen in Figure 2 and Table 4 (40.6%).
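
The 43.2% expectation is the sum of the mean non-coding fraction of the four sequenced genomes and the genus-wide percentage of genes not classified in COG. The following Python sketch reproduces the arithmetic with the percentages quoted in this paragraph; like the text, it simply adds the two percentages, and the names used are illustrative.

```python
# Non-protein-coding fraction of each sequenced genome, in percent (from the text).
NON_CODING_PERCENT = {
    "C. diphtheriae": 11.7,
    "C. glutamicum": 6.7,
    "C. efficiens": 12.7,
    "C. jeikeium": 8.4,
}
NOT_IN_COG_PERCENT = 33.3  # genus-wide percentage of genes not classified in COG
OBSERVED_PERCENT = 40.6    # C. pseudotuberculosis uniques not classified in COG


def expected_unclassified(non_coding, not_in_cog):
    """Mean non-coding percentage (rounded to one decimal) plus the not-in-COG percentage."""
    mean_non_coding = round(sum(non_coding.values()) / len(non_coding), 1)
    return mean_non_coding, round(mean_non_coding + not_in_cog, 1)


if __name__ == "__main__":
    mean_non_coding, expected = expected_unclassified(NON_CODING_PERCENT, NOT_IN_COG_PERCENT)
    print(f"Mean non-coding: {mean_non_coding}%")
    print(f"Expected unclassified: {expected}% (observed: {OBSERVED_PERCENT}%)")
```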
Comparing the uniques with no BLAST hits to Corynebacterium proteins but with significant similarity to proteins in different curated protein databases (COG and UNIPROT), we identified some C. pseudotuberculosis genes that are candidates for having arisen in this genome through lateral gene transfer. Among the 4 genes found in this category, a proline iminopeptidase (PIP) and a ribosomal protein showed high similarity to proteins of other species. For the ribosomal protein, we found a very high level of identity along the entire sequence of a eukaryotic ribosomal protein from S. cerevisiae (Figure 4). The presence of 3 of the GSSs was confirmed by PCR in C. pseudotuberculosis (Figure 5), suggesting that PIP, the putative oligopeptide/dipeptide ABC transporter and the NADP oxidoreductase are indeed present in the C. pseudotuberculosis genome. They were also detected in C. ulcerans (putative dipeptide/oligopeptide ABC transporter) and C. renale (all 3 GSSs), and were absent from C. diphtheriae and C. jeikeium, as our sequence analysis suggested. The presence of these genes in C. pseudotuberculosis as well as in C. renale and C. ulcerans is consistent with the work of Dorella and colleagues (2006), whose analysis of the rpoB gene suggested a clear relationship between C. pseudotuberculosis and C. ulcerans.
The synteny between C. pseudotuberculosis putative genes and genes from other
Corynebacterium genomes was also evaluated in the present work (Figure 5). An almost
straight line is seen when the positions of C. pseudotuberculosis putative genes are
mapped onto the C. diphtheriae genome, indicating that our analysis sampled genes
distributed randomly along the C. pseudotuberculosis genome and once more attesting to
the quality of our genomic library. A broken line would indicate that some regions had
been preferentially sampled. Moreover, all the C. pseudotuberculosis GSS uniques that
follow the main diagonals in Figure 5 could be used as anchors to produce the final
assembly in a future C. pseudotuberculosis genome initiative. It is also interesting to
observe that many C. pseudotuberculosis genes mapped to different positions when
compared to the C. efficiens and C. glutamicum genomes. Nevertheless, in many cases
filled and unfilled triangles appear in similar positions (off the diagonal), which
also shows the similarity between the C. efficiens and C. glutamicum genomes and their
differences from those of C. diphtheriae and C. pseudotuberculosis. The C. jeikeium
genome was, as expected, the most divergent when gene order is evaluated. Synteny
analysis suggests that Corynebacteria have rarely undergone genome rearrangements and
have maintained ancestral genome structures even after the divergence of Corynebacteria
and Mycobacteria (Nakamura et al., 2003).
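The two readings of the synteny plot discussed above (points following the main diagonals, and an unbroken line indicating even sampling) can be quantified with a short sketch. It is an illustration under stated assumptions, not the method used to build Figure 5: positions are taken as base-pair coordinates of best hits in two reference genomes, the genomes are treated as circular, and the 5% tolerance and both function names are hypothetical.

```python
from typing import List, Tuple


def diagonal_fraction(pairs: List[Tuple[int, int]], len_a: int, len_b: int,
                      tol: float = 0.05) -> float:
    """Fraction of genes whose relative positions agree between genome A and genome B.

    A gene counts as syntenic when it lies within `tol` (as a fraction of genome
    length) of the main diagonal or of the anti-diagonal (inverted orientation).
    """
    if not pairs:
        return 0.0
    on_diagonal = 0
    for pos_a, pos_b in pairs:
        rel_a = pos_a / len_a
        rel_b = pos_b / len_b
        if abs(rel_a - rel_b) <= tol or abs(rel_a + rel_b - 1.0) <= tol:
            on_diagonal += 1
    return on_diagonal / len(pairs)


def largest_gap_fraction(positions: List[int], genome_len: int) -> float:
    """Largest stretch of a circular genome with no mapped gene, as a fraction of its length.

    A large value corresponds to a "broken line" in the plot, i.e. a region that
    was under-sampled by the library.
    """
    if not positions:
        return 1.0
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(genome_len - pts[-1] + pts[0])  # wrap-around gap on a circular genome
    return max(gaps) / genome_len


# Made-up coordinates on two hypothetical 1,000 bp genomes.
pairs = [(100, 110), (500, 480), (900, 100), (200, 600)]
print(diagonal_fraction(pairs, 1000, 1000))        # 0.75
print(largest_gap_fraction([100, 300, 600], 1000))  # 0.5
```

Uniques that pass the diagonal test against the closest reference genome are the natural anchor candidates mentioned in the text.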
Thus, based on the analysis of ~1,000 C. pseudotuberculosis GSS, we have produced an
overview of this bacterium's genome. Sequencing ~8% of its genome allowed us to confirm
its evolutionary position among the Corynebacterium species with complete genomes and
to classify putative proteins into COG functional categories.

REFERENCES
1. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W.,
Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res., 1997 25(17):3389-402.
2. Apweiler, R., Bairoch, A., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S.,
Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M. J., Natale, D. A.,
O'Donovan, C., Redaschi, N., Yeh, L. S. UniProt: the Universal Protein knowledgebase.
Nucleic Acids Res., 2004 32(Database issue):D115-9.
3. Barksdale, L. Corynebacterium diphtheriae and its relatives. Bacteriol. Rev., 1970
34:378-422.
4. Binnewies, T. T., Motro, Y., Hallin, P. F., Lund, O., Dunn, D., La, T., Hampson, D.
J., Bellgard, M., Wassenaar, T. M., Ussery, D. W. Ten years of bacterial genome
sequencing: comparative-genomics-based discoveries. Funct. Integr. Genomics, 2006
6:165-185.
5. Celestino, P. B. S., Carvalho, L. D., Freitas, L. M., Dorella, F. A., Martins, N. F.,
Pacheco, L. G. C., Miyoshi, A. & Azevedo, V. Update of microbial genome programs for
bacteria and archaea. Genetics and Molecular Research, 2004 3(3):421-431.
6. Cerdeno-Tarraga, A. M., Efstratiou, A., Dover, L. G., Holden, M. T. G., Pallen, M.,
Bentley, S. D., Besra, G. S., Churcher, C., James, K. D., De Zoysa, A., Chillingworth,
T., Cronin, A., Dowd, L., Feltwell, T., Hamlin, N., Holroyd, S., Jagels, K., Moule, S.,
Quail, M. A., Rabbinowitsch, E., Rutherford, K. M., Thomson, N. R., Unwin, L.,
Whitehead, S., Barrell, B. G. & Parkhill, J. The complete genome sequence and analysis
of Corynebacterium diphtheriae NCTC13129. Nucleic Acids Research, 2003 31(22):6516-6523.
7. Collins, M. D. & Bradbury, J. F. Plant pathogenic species of Corynebacterium.
Bergey's manual of systematic bacteriology, 1986 2:1276-1283.
8. Collins, M. D. & Cummins, C. S. Genus Corynebacterium. Bergey's manual of systematic
bacteriology, 1986 2:1266-1276.
9. Deb, J. K. & Nath, N. Plasmids of corynebacteria. FEMS Microbiology Letters, 1999
175:11-20.
10. Dorella, F. A., Fachin, M. S., Billault, A., Dias Neto, E., Soravito, C., Oliveira,
S. C., Meyer, R., Miyoshi, A. & Azevedo, V. Construction and partial characterization of
a Corynebacterium pseudotuberculosis bacterial artificial chromosome library through
genomic survey sequencing. Genetics and Molecular Research, 2006 5(4):653-663. A
11. Dorella, F. A., Stevam, E. M., Cardoso, P. G., Savassi, B. M., Oliveira, S. C.,
Azevedo, V. & Miyoshi, A. An improved protocol for electrotransformation of
Corynebacterium pseudotuberculosis. Veterinary Microbiology, 2006 114:298-30. B
12. Dorella, F. A., Pacheco, L. G. C., Oliveira, S. C., Miyoshi, A. & Azevedo, V.
Corynebacterium pseudotuberculosis: microbiology, biochemical properties, pathogenesis
and molecular studies of virulence. Veterinary Research, 2006 37:1-18. C
13. Dower, W. J., Miller, J. F. & Ragsdale, C. W. High efficiency transformation of E.
coli by high voltage electroporation. Nucleic Acids Res., 1988 16(13):6127-6145.
14. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II.
Error probabilities. Genome Res., 1998 8(3):186-94.
15. Ewing, B., Hillier, L., Wendl, M. C., Green, P. Base-calling of automated sequencer
traces using phred. I. Accuracy assessment. Genome Res., 1998 8(3):175-85.
16. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F.,
Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995
269:496-512.
17. Huang, X. & Madan, A. CAP3: A DNA sequence assembly program. Genome Res., 1999
9(9):868-77.
18. Khamis, A., Raoult, D., La Scola, B. rpoB gene sequencing for identification of
Corynebacterium species. J. Clin. Microbiol., 2004 42(9):3925-31.
19. Ikeda, M. & Nakagawa, S. The Corynebacterium glutamicum genome: features and impacts
on biotechnological processes. Appl. Microbiol. Biotechnol., 2003 62:99-109.
20. Nishio, Y., Nakamura, Y., Kawarabayasi, Y., Usuda, Y., Kimura, E., Sugimoto, S.,
Matsui, K., Yamagishi, A., Kikuchi, A., Ikeo, K. & Gojobori, T. Comparative Complete
Genome Sequence Analysis of the Amino Acid Replacements Responsible for the
Thermostability of Corynebacterium efficiens. Genome Research, 2003 13:1572-1579.
21. Mora, M., Donati, C., Medini, D., Covacci, A. & Rappuoli, R. Microbial genomes and
vaccine design: refinements to the classical reverse vaccinology approach. Current
Opinion in Microbiology, 2006 9:532-536.
22. Nakamura, Y., Nishio, Y., Ikeo, K., Gojobori, T. The genome stability in
Corynebacterium species due to lack of the recombinational repair system. Gene, 2003
317:149-155.
23. Pacheco, L. G. C., Meyer, R., Pena, R., Castro, T. L. P., Dorella, F. A., Bahia, R.
C., Carminati, R., Frota, M. N. L., Oliveira, S. C., Meyer, R., Alves, F. S. F.,
Miyoshi, A. & Azevedo, V. Multiplex PCR assay for identification of Corynebacterium
pseudotuberculosis from pure cultures and for rapid detection of this pathogen in
clinical samples. Journal of Medical Microbiology, 2007 56:1-7.
24. Pascual, C., Lawson, P. A., Farrow, J. A. E., Gimenez, M. N. & Collins, M. D.
Phylogenetic Analysis of the Genus Corynebacterium based on 16S rRNA gene sequences.
International Journal of Systematic Bacteriology, 1995 724-728.
25. Pertea, G., Huang, X., Liang, F., Antonescu, V., Sultana, R., Karamycheva, S., Lee,
Y., White, J., Cheung, F., Parvizi, B., Tsai, J., Quackenbush, J. TIGR Gene Indices
clustering tools (TGICL): a software system for fast clustering of large EST datasets.
Bioinformatics, 2003 19(5):651-2.
26. Prosdocimi, F., Lopes, D. A. O., Peixoto, F. C., Ortega, J. M. Effects of sample
re-sequencing and trimming on the quality and size of assembled consensus. Genet. Mol.
Res., 2007 (In press).
27. Roe, B. A., Crabtree, J. S., Khan, A. S. DNA isolation and sequencing. New York,
N.Y.: John Wiley & Sons, 1996.
28. Rodríguez-Valera, F. Environmental genomics, the big picture? FEMS Microbiol. Lett.,
2004 231:153-158.
29. Sambrook, J., Fritsch, E. F. & Maniatis, T. Molecular Cloning: A Laboratory Manual,
Second Edition. New York: Cold Spring Harbor Laboratory Press, 1989.
30. Tauch, A., Kaiser, O., Hain, T., Goesmann, A., Weisshaar, B., Albersmeier, A.,
Bekel, T., Bischoff, N., Brune, I., Chakraborty, T., Kalinowski, J., Meyer, F., Rupp,
O., Schneiker, S., Viehoever, P. & Puhler, A. Complete Genome Sequence and Analysis of
the Multiresistant Nosocomial Pathogen Corynebacterium jeikeium K411, a Lipid-Requiring
Bacterium of the Human Skin Flora. Journal of Bacteriology, 2005 187(13):4671-4682.
31. Tauch, A., Trost, E., Bekel, T., Goesmann, A., Ludewig, U. & Pühler, A. Ultrafast De
Novo Sequencing of Corynebacterium urealyticum Using the Genome Sequencer 20 System.
Biochemica, 2006 4:.
32. Tatusov, R. L., Koonin, E. V., Lipman, D. J. A genomic perspective on protein
families. Science, 1997 278(5338):631-7.
33. UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res.,
2007 35(Database issue):D193-7.
34. Zhang, Z., Schwartz, S., Wagner, L., Miller, W. A greedy algorithm for aligning DNA
sequences. J. Comput. Biol., 2000 7(1-2):203-14.
35. Genomes OnLine Database, http://www.genomesonline.org/