FINAL WORKSHOP OF GRID PROJECTS “PON RICERCA 2000-2006, AVVISO 1575”
PSEUDOBIORES: A NEW COMPREHENSIVE DATABASE FOR PSEUDOMONAS
Grazia Licciardello1,2, Vittoria Catara2, Anshuma Mangtani3, Rocco Casilli4 and Vittorio Rosato3
1 Science and Technology Park of Sicily, Italy, g.licciardello@unict.it
2 Dipartimento di Scienze e Tecnologie Fitosanitarie, Università di Catania, Italy, vcatara@unict.it
3 ENEA, Casaccia Research Centre, Italy
4 Ylichron S.r.l. Roma, Italy, rosato@casaccia.enea.it
____________________________________________

Abstract—Pseudomonas-related data are dispersed among many bioinformatics resources, and acquiring data for species whose genomes have not yet been sequenced is the main limitation for researchers. At present, data can be searched and extracted only by cross-searching different web sources. To overcome this issue we designed a database, named “PseudoBioRes”, that aims to provide an integrated resource collecting information on genes and/or proteins on the basis of their potential applications. The database has been assembled by linking the data to their original sources. As a model, this first version includes a section dedicated to polyhydroxyalkanoates (PHA). We collected data for 6 specific enzymes involved in the PHA metabolic pathway for the 15 species studied, providing a total of 75 gene accessions and 276 protein sequences, as well as the genomic context for those species whose genomes have been completely sequenced.
Thanks to a user-friendly interface, the user can browse PHA gene or protein sequences among all Pseudomonas spp. and perform multi-gene/protein comparisons. The database is open source so that it can be kept consistent with new findings, and it can also be used as a guideline for creating sections for other relevant metabolites. In the near future, thanks to the storage resources and computing capability of the GRID, we aim to improve the data analysis capabilities and share them on the web with other laboratories. At the moment, it is accessible at the URL: www.ylichron.it/PHA_pseudomonas_DB.
Index Terms — Pseudomonas, polyhydroxyalkanoates, database.
I. INTRODUCTION
Pseudomonas Migula 1894 includes bacterial species of considerable interest in medicine, plant pathology and biotechnology, as confirmed by the 35 genome projects submitted on 10 different Pseudomonas spp. and the large number (19) in progress or in draft assembly status. For the most important species, e.g. P. aeruginosa, P. putida, P. fluorescens and P. syringae, the genome of more than one strain has been sequenced. In the last few years a huge amount of data has been generated, and more is expected in the coming years. This offers an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics, shedding light on the molecular mechanisms that regulate different metabolic pathways. In this context, the problem of optimally extracting representative datasets of genomic and proteomic data assumes crucial importance.
Genome annotations are accessible directly from GenBank (http://ncbi.nlm.nih.gov) and from specialized web sites. The Pseudomonas Genome Database v2 (PGDv2) is a database specialized in P. aeruginosa in which the genome annotation is continually updated, together with the database content and functionality. Since 2005 this database has also provided annotations of other Pseudomonas genomes, and it acts as a valuable comparative resource.
Data about P. syringae, a plant-pathogenic bacterium whose strains are characterised by their diverse plant-specific interactions, are collected in the Pseudomonas-Plant Interaction web site (http://pseudomonas-syringae.org/home.html).
Recently, an integrated bioinformatics platform for a Pseudomonas systems biology approach to infection and biotechnology has been established. The database, called SYSTOMONAS (SYSTems biology of pseudOMONAS) and accessible at http://www.systomonas.de, encourages the Pseudomonas community to elucidate cellular processes of interest [1].
On the other hand, little information is available on strains whose genomes have not yet been sequenced. Pseudomonas-related data (gene and protein sequences and metabolic pathways), despite being available for a large number of strains, are in fact dispersed among many sources. Information can be extracted only by cross-searching either different web sources dedicated to specific classes of enzymes or bacterial species, or the GenBank database.
We faced this problem while studying polyhydroxyalkanoate (PHA) production by Pseudomonas species. Most of the bacteria in this genus are able to produce granules of medium-chain-length poly(3-hydroxyalkanoates) (mcl-PHAs) as energy storage compounds [2]. Once extracted from cells, these molecules show properties similar to those of common plastics; moreover, they are degraded by microbial depolymerases. The mcl-PHA genetic locus in Pseudomonas spp. [2] consists of two PHA synthases (PhaC1 and PhaC2) [3] separated by the intracellular PHA depolymerase (PhaZ), essential for polymer utilization and biodegradability [4]; a proposed structural protein belonging to the TetR family of regulators (PhaD); and two PHA granule-associated proteins (PhaF and PhaI) [5, 6]. The integration of classical experimental data on P. putida KT2442 with genomic and high-throughput data stimulated the reconstruction of three different metabolic models aimed at improving PHA production, which demonstrates the interest of the bioinformatics community in this metabolic pathway [7, 8, 9].
In addition to those derived from genome sequencing projects, PHA genes are known for a considerable number of Pseudomonas strains, and they are dispersed among many databases and bioinformatic resources.
Until now, there has been no instrument enabling the simple and rapid extraction of Pseudomonas-related data from a single comprehensive database. In the following sections we describe the construction and content of PseudoBioRes, a database that aims to partially fill this void, together with its graphical interface and usefulness [10].
1) Construction and content
PseudoBioRes aims to provide a specialized Pseudomonas resource that complements the available databases in their biological utility and application. It offers comprehensive information on the Pseudomonas-related gene and protein sequences available worldwide, clustered on the basis of the metabolic pathway in which they are involved. In its current release, all the proteins involved in PHA metabolism, whether isolated or deduced from genome sequencing projects in species belonging to the genus Pseudomonas, were collected. The proteins, clustered on the basis of their sequence similarity and class (PhaC1, PhaC2, PhaZ, PhaD, PhaI, PhaF, PhaG and PhaJ), were also interconnected with genomic data when available. We collected data for 6 specific enzymes involved in the PHA metabolic pathway for the 15 species studied, providing a total of 75 gene accessions and 276 protein sequences, as well as the genomic context for completely sequenced species. To complete this section it was necessary to formulate detailed query terms and to enter the results manually for each species. We also collected the sequence data of 15 genomes.
The database consolidates information from external sources, which was manually annotated and stored in a relational database. A search tool that allows the query and retrieval of a class of protein across all the Pseudomonas species in which it has been sequenced will be developed. Protein and gene sequences can be extracted and exported simultaneously for all the Pseudomonas species, ready to be used for in silico analysis. This way of retrieving and extracting sequences will allow the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources.
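As an illustration of the retrieval described above, the following minimal sketch stores proteins in a relational database and exports one protein class across all species in FASTA format. It is only a sketch under stated assumptions: the table and column names, the use of SQLite, and the placeholder accessions and sequences are hypothetical and do not reproduce the actual PseudoBioRes schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and are
# not taken from the PseudoBioRes implementation.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE protein (
    protein_id    INTEGER PRIMARY KEY,
    species_id    INTEGER NOT NULL REFERENCES species(species_id),
    protein_class TEXT NOT NULL,
    accession     TEXT NOT NULL,
    sequence      TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO species (species_id, name) VALUES (?, ?)",
        [(1, "Pseudomonas putida"), (2, "Pseudomonas aeruginosa")],
    )
    # Placeholder accessions and sequences, not real database entries.
    conn.executemany(
        "INSERT INTO protein (species_id, protein_class, accession, sequence) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "PhaC1", "DEMO_0001", "MSNKNNDELQ"),
            (2, "PhaC1", "DEMO_0002", "MSQKNNNELP"),
            (1, "PhaZ", "DEMO_0003", "MPQPYIFRTV"),
        ],
    )
    return conn


def export_class_as_fasta(conn, protein_class):
    """Return all proteins of one class, across species, as FASTA text."""
    rows = conn.execute(
        "SELECT p.accession, p.protein_class, s.name, p.sequence "
        "FROM protein AS p JOIN species AS s ON s.species_id = p.species_id "
        "WHERE p.protein_class = ? ORDER BY s.name, p.accession",
        (protein_class,),
    ).fetchall()
    return "".join(
        f">{acc} {cls} [{name}]\n{seq}\n" for acc, cls, name, seq in rows
    )


if __name__ == "__main__":
    print(export_class_as_fasta(build_demo_db(), "PhaC1"), end="")
```

Keeping species and proteins in separate tables joined by a key is one simple way to let a single query return a protein class for every species at once.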
2) Data sources
PseudoBioRes is built from experimental data provided by different research groups or retrieved from external sources. Among them are the Pseudomonas Genome Database v2 (PGDv2, http://www.pseudomonas.com/), GenBank (http://www.ncbi.nlm.nih.gov/), KEGG, the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/), and the List of Prokaryotic Names with Standing in Nomenclature, LPSN (http://www.bacterio.cict.fr/).
Data on genome sequences were extracted from the GBrowse section of PGDv2, which stores and integrates data from the Pseudomonas Genome Project and from PseudoCAP (Pseudomonas aeruginosa Community Annotation Project). GenBank was used as the source of gene and protein data, queried through the search engine of the NCBI (National Center for Biotechnology Information).
The section of the database dedicated to PHA was completed by formulating detailed query terms and entering the results manually for each species. This was time-consuming, but all data were included.
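The kind of per-species query term mentioned above can be sketched as follows. This is not the authors' procedure, which was carried out manually; it assumes the public NCBI E-utilities `esearch` endpoint and the Entrez field tags `[Organism]` and `[Gene Name]`, and the function names are hypothetical. The code only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here for illustration).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(species, gene):
    """Combine a species and a gene name into one Entrez query term."""
    return f'"{species}"[Organism] AND {gene}[Gene Name]'


def build_esearch_url(species, gene, db="protein", retmax=100):
    """Return the esearch URL for one species/gene pair (no request is sent)."""
    params = {
        "db": db,
        "term": build_entrez_term(species, gene),
        "retmax": retmax,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # One URL per PHA gene for a single species.
    for gene in ("phaC1", "phaC2", "phaZ"):
        print(build_esearch_url("Pseudomonas putida", gene))
```

Generating the terms from a list of species and gene names keeps the queries uniform, although the hits would still need manual checking, as was done for the PHA section.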
For metabolic pathways and enzyme classes we used KEGG, part of the Japanese GenomeNet service, which integrates pathways (data on metabolic pathways and complexes), genes (data on functional genes and their protein products) and ligands (chemical compounds, drugs, glycans and reactions). From it we extracted the Pseudomonas PHA metabolic pathway.
The occurrence of many DNA sequences obtained from “unknown” strains without any further characterization pointed to a gap between environmental studies and Pseudomonas taxonomy. We therefore provide a list of Pseudomonas species, with links, as retrieved from the LPSN, which includes the nomenclature of prokaryotes and its changes as cited in the Approved Lists of Bacterial Names or published in the International Journal of Systematic Bacteriology (IJSB) or, later, in the International Journal of Systematic and Evolutionary Microbiology (IJSEM). Genes not attributed to a species are referred to as Pseudomonas spp., and the strain name is reported as in the NCBI Taxonomy database.
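The labelling rule in the preceding paragraph can be sketched as a small function. It is a simplified illustration, assuming that the organism name arrives as a single string and that the set of validly named species (for example, the list taken from LPSN) is supplied by the caller; the function name and the example strings are hypothetical.

```python
def label_organism(organism, known_species):
    """Split an organism string into (species label, strain name).

    Records whose organism does not match a validly named species are
    labelled "Pseudomonas spp." and keep the remaining text as the strain.
    """
    # Try longer names first so that a longer species name is never
    # shadowed by a shorter one sharing the same prefix.
    for species in sorted(known_species, key=len, reverse=True):
        if organism == species or organism.startswith(species + " "):
            return species, organism[len(species):].strip()
    prefix = "Pseudomonas sp."
    if organism.startswith(prefix):
        strain = organism[len(prefix):].strip()
    else:
        # Unrecognised format: keep the whole string as the strain name.
        strain = organism
    return "Pseudomonas spp.", strain


if __name__ == "__main__":
    species_list = {"Pseudomonas putida", "Pseudomonas aeruginosa"}
    print(label_organism("Pseudomonas putida KT2442", species_list))
    print(label_organism("Pseudomonas sp. 61-3", species_list))
```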
3) Structure of the database
The PseudoBioRes database has a tree structure, with an introduction page that reports the main goals of the database. The content of the database is built on three main interconnected blocks dealing with species, genomes and genes.
From the “Species” section the user is sent to web pages containing an alphabetical list of the 175 species of Pseudomonas, taken from LPSN, each with a link to the correct nomenclature; a particular species can be selected there to obtain its relevant data. The links provided lead to the genome sequencing projects (complete and in progress) in the GBrowse section of the Pseudomonas Genome Database v2 web site. A comprehensive list of the ongoing and completed genome sequencing projects for the various Pseudomonas species is provided by the website www.pseudomonas.com.
From the page dedicated to a particular Pseudomonas species it is possible to access 5 fields: general description, NCBI Taxonomy browser, relevant papers, genome sequence (if available), and genes and proteins involved in a specific pathway (in this version only PHA).
The general description has been prepared from various literature sources, focusing on general characteristics of the species and on PHA production. It gives a brief description of the species, its biological relevance and its role in different fields, such as the clinical, agricultural and environmental ones. The link to the NCBI Taxonomy browser gives more information on taxonomy and also leads to other databases such as Genes, Proteins, Genome, Nucleotide, Genome Projects, Structure, etc. The scientific literature used to compile the text pages and related to the particular species is listed on the “relevant papers” page.
The “complete genome” link leads to the GBrowse tool, which shows the complete genome map along with the position of the genes. For some species, the exact location of specific genes in the genome map can be found there using the search engine of that site. In the future we plan to replace this link with a better source.
The link to “PHA related genes and proteins” shows the various genes and proteins involved in Pseudomonas PHA production. From there it is possible to reach further pages in which the genes and proteins related to PHA biosynthesis are collected in two sets of data. The set “PHA related genes” contains all the genes derived from Pseudomonas genome sequencing projects, when available. From this page it is possible to reach the corresponding Entrez Gene page of the NCBI web site, which provides the genomic context, the genomic region, the transcript and product, and links to other databases (Conserved Domains, PubMed, KEGG, Taxonomy, TIGR, etc.). It allows the nucleotide sequence to be downloaded directly in FASTA format and, thanks to the link to the KEGG database, gives information about the metabolic pathway in which the gene is involved.
The set “PHA related proteins” contains all the protein sequences derived either from the genome sequence context or directly from cloned genes. In the latter case, sequence information was extracted after gene isolation and sequencing and was related to PHA yield data. Here too it is possible to download the protein sequence in FASTA format and to obtain information about the metabolic pathway.
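Sequences downloaded in FASTA format can then be loaded for in silico analysis. The following minimal parser is a sketch assuming plain multi-record FASTA text; the function name and the demo records are hypothetical placeholders, not entries of the database.

```python
def parse_fasta(text):
    """Parse multi-record FASTA text into a dict {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines that belong to one record are joined together.
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Placeholder records for demonstration only.
    demo = ">DEMO_A PhaC1 [species one]\nMSNK\nNNDE\n>DEMO_B PhaC1 [species two]\nMSQK\n"
    for identifier, sequence in parse_fasta(demo).items():
        print(identifier, len(sequence))
```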
Through the “Gene” resource it is possible to access specific sections dedicated to classes of genes of relevant interest. Gene chromosome location, sequence and structural information are extracted from the NCBI Taxonomy database, which is also used as the reference for information on the biological sources of the sequenced proteins and provides links to the most important biological databases (KEGG). This section is still in progress.
4) The web interface
The web interface has been developed using Microsoft ASP.NET technology on the .NET Framework 2.0. Care has been taken to allow simple and rapid updating of the database with the inclusion of new entries.
The database is open source so that it can be kept consistent with new findings, and it can also be used as a guideline for creating sections for other relevant metabolites. At the moment, it is accessible at the following URL: www.ylichron.it/PHA_pseudomonas_DB.
II. CONCLUSION
The tool we describe here has been developed to help laboratory scientists and bioinformaticians obtain information and data about Pseudomonas species, targeting sequences of the most important classes of compounds of biotechnological interest. The way it provides for the retrieval and extraction of sequences allows the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources. At the same time, the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (genes, transcripts and proteins).
ACKNOWLEDGMENTS
This work has been performed in the framework of the project “CRESCO” (Computational Research Center for Complex Systems), co-funded by ENEA and the Italian Ministry of University and Research within the “Programma Operativo Nazionale 2000-2006 Ricerca Scientifica, Sviluppo Tecnologico, Alta Formazione, Misura II.2: Società della Informazione per il Sistema Scientifico Meridionale, Azione a: Sistemi di calcolo e simulazione ad alte prestazioni”.
REFERENCES
[1] Choi C, Münch R, Leupold S, Klein J, Siegel I, Thielen B, Benkert B, Kucklick M, Schobert M, Barthelmes J, Ebeling C, Haddad I, Scheer M, Grote A, Hiller K, Bunk B, Schreiber K, Retter I, Schomburg D, Jahn D. (2007) SYSTOMONAS — an integrated database for systems biology analysis of Pseudomonas. Nucleic Acids Res 35: 533–537.
[2] Madison L, Huisman GW. (1999) Metabolic engineering of poly(3-hydroxyalkanoates): from DNA to plastic. Microbiol Mol Biol Rev 63(1): 21–53.
[3] Rehm BHA, Steinbüchel A. (1999) Biochemical and genetic analysis of PHA synthases and other proteins required for PHA synthesis. Int J Biol Macromol 25: 3–19.
[4] de Eugenio LI, Garcia P, Luengo JM, Sanz JM, Roman JS, Garcia JL, Prieto MA. (2007) Biochemical evidence that phaZ gene encodes a specific intracellular medium chain length polyhydroxyalkanoate depolymerase in Pseudomonas putida KT2442: characterization of a paradigmatic enzyme. J Biol Chem 16, 4951–4962.
[5] Hoffmann N, Rehm BHA. (2004) Regulation of polyhydroxyalkanoate biosynthesis in Pseudomonas putida and Pseudomonas aeruginosa. FEMS Microbiol Lett 237: 1–7.
[6] Hoffmann N, Rehm BHA. (2005) Nitrogen-dependent regulation of medium-chain length polyhydroxyalkanoate biosynthesis genes in pseudomonads. Biotechnol Lett 27: 279–282.
[7] Dias JML, Oehmen A, Serafim LS, Lemos PC, Reis MAM, Oliveira R. (2008) Metabolic modelling of polyhydroxyalkanoate copolymers production by mixed microbial cultures. BMC Syst Biol 2: 59.
[8] Nogales J, Palsson B, Thiele I. (2008) A genome-scale metabolic reconstruction of Pseudomonas putida KT2440: iJN746 as a cell factory. BMC Syst Biol 2: 79.
[9] Puchałka J, Oberhardt MA, Godinho M, Bielecka A, Regenhardt D, Timmis KN, Papin JA, Martins dos Santos VAP. (2008) Genome-scale reconstruction and analysis of the Pseudomonas putida KT2440 metabolic network facilitates applications in biotechnology. PLoS Comput Biol 4(10).
[10] Licciardello G, Catara V, Mangtani A, Casilli R, Rosato V. (2008) PseudoBioRes: una risorsa bioinformatica per il genere Pseudomonas. Conferenza Nazionale Italiani E-Science 2008, Book of abstract 126.
Grazia Licciardello was born in 1978. She is a molecular biologist with expertise in plant pathology and biotechnology, gained through a second-level postgraduate Master's in “Biotechnology for sustainable protection of crops and agrifood” and a PhD in “Phytosanitary technologies” at the University of Catania. Since 2004 she has worked as a researcher at the Science and Technology Park of Sicily. She has participated in the following projects: the PON project “Utilization of waste material to develop biodegradable polymers (PHA) for agriculture and agroindustry” and the MIUR project “CRESCO, Computational Centre for research on Complex Systems”.
Her main areas of research are genetic manipulation for biotechnological purposes, the detection of microbial phytopathogens by molecular methods and the study of genes involved in genetic regulation. She is the author of about 30 scientific papers published in international refereed journals or presented at national and international congresses.
Vittoria Catara is Associate Professor at the University of Catania. Since 1990 she has taken part in the research activity of Di.S.Te.F, University of Catania, Italy. She has cooperated in a number of projects; she has been responsible for 3 projects of the University of Catania and for a project for young researchers of the University of Catania, and Coordinator of a British programme funded by CRUI and the British Council. She is involved in phytobacteriology studies and in the molecular aspects of fungal diagnosis and characterization. She has contributed to 120 works, published in technical and scientific journals or presented at conferences and published in proceedings, on the following subjects: plant diseases; molecular techniques for the diagnosis of plant pathogens; phenotypic and genomic characterization of P. corrugata; evaluation of resistance to biotic and abiotic factors; characterization and application of biocontrol agents; analysis of bacterial populations; evaluation of bacteria for polyhydroxyalkanoate production; regulation of polyhydroxyalkanoate genes; and quorum sensing in Pseudomonas spp. She has described new diseases caused by known pathogens and a new bacterial species, P. mediterranea Catara et al. (2002). She is co-author of a patent.