advertisement
THE CBS MAP
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
ABOUT CBS
The Center for Biological Sequence Analysis
(CBS) at the Technical University of Denmark
was established in 1993 by the Danish National
Research Foundation in response to the growing importance of the field of bioinformatics.
After more than ten years, CBS has become an
internationally recognized research center and
a resource for computational analysis of a wide
array of biological data. The center profile now
incorporates many novel aspects from the
general area of systems biology.
CBS is a highly multi-disciplinary group
consisting of approximately 65 employees –
one of the larger European groups within academic bioinformatics and systems biology. In a
recent review of the center’s production over a
ten-year period, the reviewers were impressed
by the standard of research, with a strong production of high-impact publications, several
significant textbooks, and highly popular
WWW services. The quality and quantity of this
output also reflect the originality and the creative atmosphere within the center.
Overall, CBS has gained a reputation in the
field of bioinformatics and systems biology
as a highly dynamic and collaborative group,
establishing itself among the internationally
leading groups. The research groups in the
center participate in more than 15 projects
funded by the EU and the NIH.
While the first wave of computational research at CBS focused on sequence analysis,
where many highly important unsolved problems still remain, the current and future
needs will concern sophisticated, large-scale
integration of extremely diverse sets of data:
gene sequences and their control regions, gene
expression profiles, protein-protein interaction
networks, temporal knowledge on protein
complex formation, and signaling cascades, to
mention but a few. Integrative approaches will
form the basis for advanced, quantitative and
qualitative types of systems biology, where
simulation and modeling will be key for the
understanding of the complex dynamics of
entire cells, organs or organisms.
FACTS ABOUT CBS
PUBLICATION PROFILE: CBS has a strong publication profile with more than 300 peer-reviewed papers, many in high impact journals as well as many with high citation levels. Since 2003, the average annual publication rate has been 40 papers in journals with review. In addition to scientific papers, the CBS staff has authored or co-authored seven textbooks and edited four proceedings. A 2005 publication highlight was the Science paper entitled “Dynamic protein complex formation during the cell cycle,” authored by U. de Lichtenberg, L. J. Jensen, S. Brunak and P. Bork – joint work with EMBL in Heidelberg.

CITATION PROFILE: The most cited CBS publication has more than 3,000 citations: Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne, Protein Eng., 10, 1-6, 1997, describing a method for prediction of signal peptides in prokaryotic and eukaryotic proteins. This paper was included in the ISI Red Hot list for 1997. Fifteen other papers have more than 100 citations, with eight of these having between 200 and 2000 citations. Two of these have been on the ISI Red Hot List. Except for a single year, CBS has in each year of the 1997-2004 period produced a paper that is among the ten most cited papers out of the approximately 20,000 papers co-authored by Danish scientists each year.

ONLINE SERVICES: CBS has established a highly popular service component, with 35 different web servers, serving 1,500,000 pageviews/month. The Institute for Scientific Information has ranked the CBS web site as one of the most useful within the field. In addition to web-based services, software packages for many of the methods have been installed at hundreds of other academic and industrial sites world-wide.

TEACHING: Strong, innovative teaching component with ten highly popular courses involving combinations of lectures and web-based hands-on exercises at both PhD and MSc levels. Since 2003, CBS has been responsible for the international MSc programme in bioinformatics at DTU, and in collaboration with other research groups at DTU, the center also coordinates an international MSc programme in systems biology. From 2005, CBS also offers internet-transmitted courses combining live, real-time transmitted lectures, web-based exercises, discussion fora and chat lines maintained by CBS staff.

INDUSTRY: High level of industrial collaboration, which in addition to conventional, joint industry-university projects, includes three industrial bioinformatics satellites established within the center framework. CBS also hosts BioSys – a high technology network covering bioinformatics and systems biology (www.biosys.dk). The network consists of seven academic partners and 16 biotech and medico companies. The purpose of the network is to further the collaboration between academia and industry by creating an environment where network partners can meet, develop and exchange knowledge and ideas.

EXTERNAL FUNDING: The center staff has attracted considerable external funding. Funding sources include the EU, NIH, Nordic sources, private Danish foundations, industry, and most of the Danish research councils, in addition to the founding sponsor, the Danish National Research Foundation. CBS is also participating in many EU consortia and NIH projects.

COMPUTE INFRASTRUCTURE: A strong compute and database infrastructure equipped with SGI Altix shared-memory computers with 250 processors, 500 GB RAM, and a fast fibre channel 30 TB storage RAID. Most of the computer hardware has been funded by grants from the Danish Center for Scientific Computing (www.dcsc.dk). The database management maintains a data warehouse with more than 300 public databases particularly useful in the context of data integration within systems biology.

ACADEMIC ENVIRONMENT: CBS is a highly multi-disciplinary center with a strong international profile. More than 12 nationalities are currently represented, including many European countries, Russia, China and the USA. Together, the CBS group has key competences within the fields of molecular biology, biochemistry, pharmacology, medicine, chemistry, physics, mathematics, computer science and chemical engineering. CBS is one of several strong research centers at the BioCentrum department at the Technical University of Denmark. With more than 400 employees, BioCentrum-DTU represents the largest concentration of biotech research in Denmark.

MANAGEMENT: CBS has since 1993 been led by professor Søren Brunak, who, together with the other group leaders, the administrative staff, and the compute group, manages the center activities.
INTEGRATIVE
SYSTEMS BIOLOGY
Group leader: Center director, professor Søren Brunak,
brunak@cbs.dtu.dk
The study of life at the cellular and molecular level
has brought about insight and change beyond anyone’s
imagination over the last thirty years. Until quite recently, this type of research was carried out in a reductionist way in which a few components were studied
at a time. However, the latest breakthroughs and advances in nano- and biotechnology have created new possibilities for cataloging and studying hundreds and thousands of biomolecules simultaneously and have thereby
paved the way for a new systems-scale view of living cells
and organisms. The new field is called systems biology.
The Integrative Systems Biology Group at CBS is at
the leading edge of these developments, focusing mainly
on understanding how intracellular networks of genes,
proteins, metabolites and other small molecules regulate
cellular behaviour and how perturbations to these regulatory systems may lead to disease. Unlike related efforts
in other areas, such as for instance physics, modeling in
systems biology relies on integration of massive amounts
of experimental data rather than just on theoretical modeling. For this reason, the group consists of biologists,
pharmacologists, biochemists, engineers and physicists,
working together on both the experimental and the
computational side to tackle the challenges of modeling,
mining and integrating massive amounts of heterogeneous data into systems biology. The group recently published a first proof-of-concept study of the cell cycle in
the journal Science. The results reveal how different
protein complexes, or molecular machines, are built and
activated inside the cell during the cell division process.
Apart from being of great value for basic research, these
results may help to understand how mutations in these
molecular components lead to diseases such as cancer.
The ongoing effort in the group is to construct
models that will aid in identifying new disease genes and in uncovering the mechanisms behind complex,
multifactorial diseases. This effort has recently led to the
discovery of a large number of likely disease genes in various disorders such as breast cancer, Parkinson’s disease
and hypertension. The ultimate goal of the group is to
expand these efforts into models of entire cells and organisms, which will enable simulations of cellular and
physiological response to perturbations associated with
disease, drug targets and drug treatment.
[Figure: cell-cycle diagram (phases M, S, G2) with labeled protein complexes and processes: regulation of meiosis, MCM/ORC, protein kinase A, DNA replication, glycogen synthesis, nucleosome/bud formation, mitotic exit, APC, Cdc28-cyclin, histones, cation transport, SCF, Pho85-cyclin, sister chromatid cohesion, tubulin related, transcription factors, SPB, cell wall.]
IMMUNOLOGICAL
BIOINFORMATICS
Group leader: Associate professor Ole Lund,
lund@cbs.dtu.dk
The immune system normally does a good job of
keeping us free from diseases, but sometimes it fails.
One approach towards understanding why this happens
is to produce advanced simulation models of the immune system and to understand the relationship between hosts and pathogens in this manner. Depending on
the complexity of these models and the input given,
they can be used to simulate what happens when a host
gets infected by a pathogen, thereby predicting the co-evolution of pathogens and immune systems. One aim
of the modeling is to identify parts of proteins known as
epitopes, which are recognized by the immune system,
thereby inducing a protective response. This knowledge
is very valuable in the development of better vaccines
and provides very important insights into the progression of cancer, allergy and autoimmune diseases.
The Immunological Bioinformatics Group at CBS
is developing new technologies for epitope discovery
that can aid in the search for new vaccines and therapies
for HIV, malaria, and tuberculosis, as well as for diseases such as influenza and pox, which may evolve to be a
threat naturally or intentionally through bioterrorism.
The group has built a simulation model of the human
immune system and has constructed a database with all
human pathogens. Using this database and a database of
the human genome, the group is working on using the
prediction methods to simulate the co-evolution of
pathogens and immune systems, and in particular to
identify epitopes from the different arms of immune systems. In most of the projects the predicted epitopes are
being validated through experimental collaborations with
partners doing wet-lab research.
The group seeks to develop methods for the three
main types of epitopes: B cell epitopes, which are used to
recognize microorganisms outside cells; Helper T lymphocyte (HTL) epitopes, which are used to activate cells
that have taken up foreign substances; and cytotoxic T
lymphocyte (CTL) epitopes, which are used to detect
and kill infected cells.
MOLECULAR EVOLUTION
Group leader: Associate professor Anders Gorm
Pedersen, gorm@cbs.dtu.dk
Evolutionary theory is the conceptual foundation of
the life sciences. The famous geneticist Theodosius
Dobzhansky expressed this very well when he said,
“Nothing in biology makes sense, except in the light of
evolution”. In the post-genomic era this insight is more
relevant than ever, and only by taking the theory of evolution into account is it possible to get a handle on organizing and analyzing the massive amount of biological data now available. Specifically, it is important to
realize that any group of present-day species that one
might choose to investigate will in fact have evolved
from a common ancestor through a process of “descent
with modification”, and this will have an impact on how
similarities and differences between the molecules within these organisms should be interpreted.
The Molecular Evolution Group at CBS applies
phylogenetic methods to analyze specific biological systems, but also develops methods for analyzing the flood
of sequence data available in the public domain in order
to learn about the evolutionary process itself. Current
projects focus, among other things, on how viruses such
as HIV and the hepatitis C virus (HCV) evolve within
infected patients, and in this context the group investigates
how antiviral drug use influences selection for resistance. The evolution of bacterial resistance to antibiotics and horizontal transfer of resistance-associated genes are other topics that the group explores. The group’s
mainly computer-based research into bacterial and viral
evolution is done in close collaboration with experimental groups at the University of Copenhagen, Copenhagen University Hospital, Hvidovre Hospital and State
Serum Institute. Other current projects in the Molecular Evolution Group include investigations into the evolution of non-coding DNA, evolution and origin of introns, and evolution of evolvability (the ability of biological systems to evolve). Generally, the group is interested in all aspects of evolution, and while using state-of-the-art computational tools, the focus is always on
analyzing problems that are interesting from a biological point of view.
POST-TRANSLATIONAL
MODIFICATION
Group leader: Associate professor Nikolaj Blom,
nikob@cbs.dtu.dk
Protein function and modification is the focus of the
Post-Translational Modification Group at CBS. Disturbances of PTMs are the direct or indirect cause of many
diseases, including cancer and infections, and a greater
understanding may therefore lead to therapeutic interventions. The PTM group studies a large range of protein modifications, both to elucidate the function of
proteins that have still not been fully characterized
and as tools for discovering proteins with particular
properties, e.g. localization signals and processing sites.
Many PTMs occur at specific, yet variable motifs, in the
target proteins. In contrast to simple consensus patterns, machine learning techniques, such as artificial
neural networks, are often well suited to integrate the
subtleties of sequence variations, which can also be
visualized by so-called sequence logos.
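As a rough illustration of why a position-wise view captures more than a single consensus string, the sketch below computes the per-position information content (in bits) that sets the height of each letter stack in a sequence logo. It is a minimal sketch and not the group's own code: the function name `information_content`, the toy peptide sites, and the omission of the usual small-sample correction are all assumptions made for illustration.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet_size=20):
    """Return the information content (bits) of each column of aligned sites.

    Uses R_i = log2(alphabet_size) - H_i, where H_i is the Shannon entropy
    of the residues observed in column i. No small-sample correction is
    applied (a simplifying assumption).
    """
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")

    max_bits = math.log2(alphabet_size)
    bits = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        total = len(column)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        bits.append(max_bits - entropy)
    return bits


if __name__ == "__main__":
    # Hypothetical aligned peptide windows around a modification site.
    toy_sites = ["RRASV", "RKGSL", "RRPSI", "KRESV"]
    for position, value in enumerate(information_content(toy_sites), start=1):
        print(f"position {position}: {value:.2f} bits")
```

Fully conserved columns reach the maximum of log2(20) bits for proteins, while variable columns score lower. That graded signal is what a single consensus pattern discards and what a trained predictor can exploit.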
The Protein PTM group at CBS has a successful historical record of developing useful prediction tools –
SignalP (signal peptide sequences), NetOGlyc and
NetNGlyc (glycosylation sites) and NetPhos (phosphorylation sites) – made available over the internet and
used by large parts of the molecular biology community.
Current projects focus on kinase-specific phosphorylation sites, apoptotic caspase targets, GPI-attachment sites and pro-protein processing sites. Taking the knowledge of protein modifications further, the group is working on an integration of features at a systems biology
and proteome-wide level. This basically means that certain classes of proteins, e.g. nucleolus-localized or cell-cycle regulated proteins, may be classified based on their
features. These features include calculated as well as predicted properties such as PTMs. In one such project, the
group is aiming at predicting the ability of proteins to
fold with or without the assistance of chaperones. To
test the PTM predictors developed by the group, an in-house experimental validation scheme has been initiated. The first approach involves peptide microarrays
which contain up to 10,000 different peptides on a microscope slide. By incubation with, for example, a specific kinase, it is possible to deduce much information
about the specificity of the given kinase and use this in
the refinement of a prediction method.
COMPARATIVE
MICROBIAL GENOMICS
Group leader: Associate professor David W. Ussery,
dave@cbs.dtu.dk
Today, hundreds of bacterial genome sequences are
available in the public databases and several more genomes are being sequenced every month. Many of these
genomes belong to known human pathogens. The sequence data represent a vast amount of information, and
comparison and analysis are important for a deeper understanding of virulence factors and of whether new organisms constitute a potential food safety problem.
The Comparative Microbial Genomics Group at
CBS uses a combination of computational predictions
and experiments to explore the relationships between
the hundreds of sequenced bacterial genomes. The approach is “DNA-centric” in that the DNA sequence is
used to predict DNA structures, which can in turn be
indicators of useful biology (for example, localization of
a promoter based on DNA curvature and melting profiles). Currently, the four major focus areas of the group
are: 1) prediction of transcripts, including promoters,
operons (containing genes coding for proteins, rRNAs,
tRNAs or other ncRNAs), and terminators; 2) prediction of highly expressed genes (based on chromatin properties of the genomic DNA sequence, as well as codon adaptation index (CAI)
values for genes encoding proteins); 3) developing models of gene interaction networks involved in bacterial pathogenesis; and 4) developing novel methods for comparison of bacterial genomes. The analysis of a single genome can yield much
information, and coupled with experimental data, such
as transcriptomic, proteomic, and metabolomic results,
the information for even one organism can be overwhelming. Handling and maintaining this large amount of data
for hundreds of sequenced organisms requires a structured database system. For this purpose, the GenomeAtlas
database (www.cbs.dtu.dk/services/GenomeAtlas) has
been developed including a web interface for presenting
much of this information from a genomic perspective.
The GenomeAtlas database also includes visualisation
methods for viewing and comparison of genomic properties for all the sequenced microbial genomes. The group
also designs high-density microarrays for bacterial genomes and performs laboratory experiments to test the
predictions, as well as to generate new data for models and
for making new predictions, in an iterative manner. The
microarrays are designed to test predictions of transcriptional start sites, non-coding RNA and conserved and
unique coding regions within a bacterial species.
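The CAI values mentioned among the focus areas above can be sketched as follows. This is a minimal sketch following the usual definition (the geometric mean of each codon's relative adaptiveness, measured against a reference set of highly expressed genes). The function names, the pseudocount of 0.5 added to every codon count, and the toy sequences are assumptions for illustration and are not taken from the group's tools.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount (an assumption of this sketch) avoids zero weights for
    codons that never occur in the reference set.
    """
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TO_AA
    )
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        if len(family) < 2:
            continue  # Met and Trp have a single codon and carry no signal
        best = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of the scorable codons in a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical reference set standing in for highly expressed genes.
    reference = ["ATGGCTGCTGCTGCCAAAAAAAAGTAA"]
    w = relative_adaptiveness(reference)
    print(round(cai("ATGGCTAAATAA", w), 3))
```

Genes that mostly use the codons preferred in the reference set score close to 1, which is why CAI is commonly read as a sequence-based proxy for expression level.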
SYSTEMS BIOLOGY
OF GENE EXPRESSION
Group coordinator: Assistant professor Henrik Bjørn
Nielsen, hbjorn@cbs.dtu.dk
In the post-genomic era, understanding the dynamics of biological systems is increasingly in
focus. The activity of genes and their encoded products
can be regulated in several ways, but transcription is the
primary level of regulation in most systems. The recent
flood of global expression analyses has underscored the
importance of transcriptional regulation.
The Systems Biology of Gene Expression Group at
CBS conducts research into systems biology with its primary starting point in transcriptomics. The group has evolved
from a microarray-focused activity and is currently focused on addressing questions at the systems biology level. Current research topics in the group include: integration and mining of diverse data domains such as gene
expression data, protein-protein interaction data, gene
ontology, regulatory sequence motifs, ChIP-on-chip
data and non-coding RNAs. In addition, the group studies the cellular networks whose perturbations are measured experimentally. The group takes advantage of the
CBS laboratory facilities both for data collection and verification of hypotheses. The facilities include an RNA
lab, Affymetrix facilities, custom designed oligo array
equipment, and RT-PCR instruments. Even though the
group holds expertise in experiments, computer science
and statistics, the main focus is on biological and medical problems. These problems are formulated in collaboration with other teams from industry and academia.
The group has also contributed to the scientific community with important tools for microarray design and
analysis: the microarray normalization algorithm qspline,
OligoWiz for microarray probe selection and the widely
used ‘affy package’ comprising a complete framework
for Affymetrix GeneChips analysis.
CHEMOINFORMATICS
Group leader: Associate professor Svava Ósk Jonsdottir,
svava@cbs.dtu.dk
The search for new drugs is a very challenging and
costly endeavor. The possibility of using computational
methods for screening compounds at an earlier stage can
significantly improve the success rate among drug candidates, as many late drug failures due to toxicity and other factors thus can be avoided.
The Chemoinformatics Group at CBS works with
the development of new and innovative computational
tools for use in the drug discovery and optimization
process. The research is presently focused mainly on
analysis of large compound and property databases, and
the development of predictive tools using machine learning and computational chemistry methods. Such models are based on the structural features of the drug molecules, combined with relevant biological and chemical
information in such a way that it becomes possible to
predict the behavior of unknown compounds. This research is carried out in close collaboration with scientists
at the Danish University of Pharmaceutical Sciences.
Examples of current research projects are: Development
of pre-screening methods used for selecting compounds
for a drug discovery pipeline, prediction methods for
properties like solubility and various types of toxicity,
prediction of drug toxicity based on NMR metabonomics data from rat urine, and modeling of hERG ion
channel blockers. An integrated part of this research effort is building an in-house infrastructure of accessible
data by collecting a number of relevant compound databases and data sets. New links between chemoinformatics, bioinformatics and systems biology are also explored.
[Figure: schematic of a neural network with input nodes, a hidden layer and an output.]
WWW.CBS.DTU.DK
Center for Biological Sequence Analysis
BioCentrum-DTU, Technical University of Denmark
Kemitorvet, Building 208
DK-2800 Kgs. Lyngby, Denmark
phone: +45 4525 2477, fax: +45 4593 1585
e-mail: cbs@cbs.dtu.dk