Bioinformatics in Computer Science,
the Virginia Bioinformatics Institute,
and Opportunities for Engineering
Lenwood S. Heath
Department of Computer Science
Blacksburg, VA 24061
College of Engineering
Advisory Board Meeting
October 29, 2004
10/29/2004
Bioinformatics in Computer Science
Overview
• Computational biology and bioinformatics
• The players
  – Computer Science
  – Virginia Bioinformatics Institute (VBI)
  – Others at VT
• Opportunities for the College
  – Collaboration with VBI
  – SBES, Wake Forest School of Medicine
  – NIH and DHS funding
  – Scientific modeling
Computational Biology and Bioinformatics
• Computational biology: computational research inspired by
biology
• Bioinformatics: application of computational research
(computer science, mathematics, statistics) to advance basic and
applied research in the life sciences, including
  – Agriculture
  – Basic biological science
  – Medicine
• Both ideally done within multidisciplinary collaborations
Bioinformatics at VT (Part I)
• Biological modeling (Tyson, Watson): > 20 years
• Computational biology, genome rearrangements
(Heath): > 10 years
• Fralin Biotechnology Center sponsored a faculty advisory
committee on bioinformatics: 1998-2000
  – Members from biochemistry; biology; CALS; computer science
(Heath, Watson); statistics; VetMed
• Provost provided $1 million seed money
• First VT bioinformatics hire (Gibas, biology, 1999)
Bioinformatics at VT (Part II)
• Outside initiative submitted to VT for a campus
bioinformatics center — 1998
• Discussions of bioinformatics advisory committee
contributed to a proposal to the Gilmore
administration — 1999
• Governor Gilmore puts plans and money for
bioinformatics center in budget — 1999-2000
• Virginia Bioinformatics Institute (VBI) established
July, 2000; housed in CRC
Bioinformatics at VT (Part III)
• Bioinformatics course and curriculum development began
with faculty subcommittee — 1999
• Courses supporting bioinformatics now in many life
science and computational science departments, including:
  – Biology
  – Biochemistry
  – Computer Science
  – Plant Pathology, Physiology, and Weed Science (PPWS)
  – Mathematics
  – Statistics
Bioinformatics Education at VT
• CS has been training CS graduate students in
bioinformatics since 2000
• Graduate bioinformatics option established in a
number of participating departments — 2003
• Ph.D. program in Genetics, Bioinformatics, and
Computational Biology (GBCB) — 2003
• First GBCB students arrived, Fall, 2003; now in
second year; completing core requirements
Bioinformatics Spirit at VT
• Close collaboration between life scientists and
computational scientists from the beginning
• Educational approach insists on adequate
multidisciplinary background
• Multidisciplinary collaborators work closely on a
regular basis
• Contributions to biology or medicine are essential
outcomes
The Players
• Computer Science
• Virginia Bioinformatics Institute (VBI)
• Others at VT
CS Bioinformatics Faculty
1. Chris Barrett (VBI, CS)
2. Vicky Choi
3. Roger Ehrich
4. Edward A. Fox
5. Lenny Heath
6. T. M. Murali
7. Chris North
8. Alexey Onufriev
9. Naren Ramakrishnan
10. Adrian Sandu
11. Eunice Santos
12. João Setubal (VBI, CS)
13. Cliff Shaffer
14. Layne Watson
15. Liqing Zhang
Relevant Expertise
• Algorithms: Choi, Heath, Santos, Setubal, Shaffer, Watson
• Computational structural biology: Onufriev, Sandu
• Computational systems biology: Murali
• Data mining: Ramakrishnan
• Genomics: Heath, Murali, Ramakrishnan, Setubal, Zhang
• Human-computer interaction, visualization: North
• Image processing: Ehrich, Watson
• Information retrieval: Ehrich
• High performance computing: Sandu, Santos, Watson
• Optimization: Watson
• Simulation: Barrett
Established Bioinformatics Faculty
• Layne Watson
• Lenny Heath
• Cliff Shaffer
• Naren Ramakrishnan
• Eunice Santos
Layne Watson
• Professor of Computer Science and Mathematics
• Expertise: algorithms; image processing; high
performance computing; optimization; scientific
computing
• Computational biology: has worked with John Tyson
(biology) for over 20 years
• JigCell: cell-cycle modeling environment; with Tyson,
Shaffer, Ramakrishnan, Pedro Mendes of VBI
• Expresso: microarray experimentation; with Heath,
Ramakrishnan
Lenny Heath
• Professor of Computer Science
• Expertise: algorithms; theoretical computer science;
graph theory
• Computational biology: worked in genome
rearrangements 10 years ago
• Bioinformatics: concentration in past 5 years
• Expresso: microarray experimentation; with
Ramakrishnan, Watson
– Multimodal networks
– Computational models of gene silencing
Cliff Shaffer
• Associate Professor of Computer Science
• Expertise: algorithms; problem solving
environments; spatial data structures
• JigCell: cell-cycle modeling environment; with
Ramakrishnan, Tyson, Watson
Naren Ramakrishnan
• Associate Professor of Computer Science
• Expertise: data mining; machine learning; problem
solving environments
• JigCell: cell-cycle modeling problem solving
environment; with Shaffer, Watson
• Expresso: microarray experimentation; with Heath,
Watson
– Proteus — inductive logic programming system for
biological applications
– Computational models of gene silencing
Eunice Santos
• Associate Professor of Computer Science
• Expertise: algorithms; computational biology;
computational complexity; parallel and
distributed processing; scientific computing
• Relevant bioinformatics project: modeling
progress of breast cancer
New Bioinformatics Faculty
• T. M. Murali (2003): CS bioinformatics hire
• Alexey Onufriev (2003): CS bioinformatics hire
• Adrian Sandu (2004): CS hire
• João Setubal (early 2004): VBI and CS
• Vicky Choi (2004): CS bioinformatics hire
• Liqing Zhang (2004): CS bioinformatics hire
• Chris Barrett (Fall 2004): VBI and CS
• One more bioinformatics position for Fall 2005
T. M. Murali
• Assistant Professor of Computer Science
• Hired in 2003 for bioinformatics group
• Expertise: algorithms; computational geometry;
computational systems biology
• Projects:
– Functional gene annotation
– xMotif — find patterns of coexpression among subsets of
genes
– RankGene — rank genes according to predictive power for
disease
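RankGene is described here only as ranking genes by predictive power for disease. As a minimal sketch of that general idea, and not RankGene's own code or its scoring options, the Python below scores each gene by the absolute Welch two-sample t-statistic between disease and control samples, then sorts genes from most to least discriminating. The t-statistic as the score, the function names, and the toy expression values are all assumptions made for illustration.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (does not assume equal variances)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_genes(expression, labels):
    """Rank genes by |t| between samples labeled 1 (disease) and 0 (control).

    expression: hypothetical dict, gene name -> expression values, one per sample
    labels: 0/1 class labels aligned with each gene's list of values
    """
    scores = {}
    for gene, values in expression.items():
        disease = [v for v, y in zip(values, labels) if y == 1]
        control = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(disease, control))
    # Largest |t| first: genes whose expression best separates the two classes.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): three disease samples followed by three control samples.
expression = {
    "geneA": [5.1, 5.3, 4.9, 1.0, 1.2, 0.9],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [3.0, 3.5, 2.5, 1.0, 1.5, 0.5],
}
labels = [1, 1, 1, 0, 0, 0]
ranked = rank_genes(expression, labels)
```

In practice a single per-gene score like this ignores interactions between genes and needs correction for testing thousands of genes at once; it is shown only to make "rank by predictive power" concrete.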
Alexey Onufriev
• Assistant Professor of Computer Science
• Hired in 2003 for bioinformatics group
• Expertise: computational and theoretical biophysics and
chemistry; structural bioinformatics; numerical
methods; scientific programming
• Projects:
– Biomolecular electrostatics
– Theory of cooperative ligand binding
– Protein folding
– Protein dynamics: how does myoglobin take up oxygen?
– Computational models of gene silencing
Adrian Sandu
• Associate Professor of Computer Science
• Hired in 2003
• Expertise: Computational science; numerical methods;
parallel computing; scientific and engineering
applications
• Computational science:
– New generation of air quality models
– Computational tools for assimilation of atmospheric chemical
and optical measurements into atmospheric chemical
transport models
João Setubal
• Research Associate Professor at VBI
• Associate Professor of Computer Science
• Joined in early 2004
• Expertise: algorithms; computational biology;
bacterial genomes
• Comparative genomics
Vicky Choi
• Assistant Professor of Computer Science
• Hired in 2004 for bioinformatics group
• Expertise: computational biology; algorithms
• Projects:
– Algorithms for genome assembly
– Protein docking
– Biological pathways
Liqing Zhang
• Assistant Professor of Computer Science
• Hired in 2004 for bioinformatics group
• Expertise: evolutionary biology; bioinformatics
• Research interests:
– Comparative evolutionary genomics
– Functional genomics
– Multi-scale models of bacterial evolution
Bioinformatics Research in CS
• Collaboration
• Funding
• Resources
• Overview of projects
Selected Collaborations
• Virginia Tech: Biochemistry, Biology,
Fralin Biotechnology Center, PPWS,
Veterinary Medicine, VBI, Wood Science
• North Carolina State University: Forest
Biotechnology Center
• Duke: Biology
• University of Illinois: Plant Biology
Selected Funding (Watson/Tyson)
• NSF MCB-0083315: Biocomplexity-Incubation Activity: A
Collaborative Problem Solving Environment for
Computational Modeling of Eukaryotic Cell Cycle Controls.
J. J. Tyson, L. T. Watson, N. Ramakrishnan, C. A. Shaffer, J.
C. Sible. $99,965.
• NIH 1 R01 GM64339-01: Problem Solving Environment
for Modeling the Cell Cycle. J. J. Tyson, J. Sible, K. Chen, L.
T. Watson, C. A. Shaffer, N. Ramakrishnan, P. Mendes
(VBI). $211,038.
• Air Force Research Laboratory F30602-01-2-0572: The
Eukaryotic Cell Cycle as a Test Case for Modeling Cellular
Regulation in a Collaborative Problem Solving
Environment. J. J. Tyson, J. C. Sible, K. C. Chen, L. T.
Watson, C. A. Shaffer, N. Ramakrishnan. $1,650,000.
Selected Funding (Heath, et al.)
• NSF IBN 0219322: ITR: Understanding Stress Resistance
Mechanisms in Plants: Multimodal Models Integrating
Experimental Data, Databases, and the Literature. L. S. Heath,
R. Grene, B. I. Chevone, N. Ramakrishnan, L. T. Watson.
$499,973.
• NSF EIA-01903660: A Microarray Experiment Management
System. N. Ramakrishnan, L. S. Heath, L. T. Watson, R. Grene,
J. W. Weller (VBI). $600,000.
• DARPA N00014-01-1-0852: Dryophile Genes to Engineer Stasis-Recovery of Human Cells. M. Potts, L. S. Heath, R. F. Helm, N.
Ramakrishnan, T. O. Sitz, F. Bloom, P. Price (Life Technologies),
J. Battista (LSU). $4,532,622.
• NSF CCF 0428344: ITR-(NHS)-(sim): Computational Models
for Gene Silencing: Elucidating a Pervasive Biological Defensive
Response. L. S. Heath, R. F. Helm, A. Onufriev, M. Potts, N.
Ramakrishnan. $1,500,000.
Research Resources Available
to CS Bioinformatics
System X
• Third-fastest supercomputer in the world (TOP500 list, November 2003)
Laboratory for Advanced Scientific Computing &
Applications (LASCA)
• Parallel algorithms & math software
• Anantham Cluster
• Grid computing
Bioinformatics Research LAN
• Linux, Mac OS X
• Bioinformatics databases and analysis
JigCell: A PSE for
Eukaryotic Cell Cycle Controls
Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa,
Clifford A. Shaffer, Layne T. Watson,
Naren Ramakrishnan, and John J. Tyson
Departments of Computer Science and Biology
Cell Cycle of Budding Yeast
[Figure: wiring diagram of the budding-yeast cell-cycle control network, linking regulators such as Cln2, Cln3, Clb2, Clb5, Sic1, Cdh1, Cdc20, Cdc14, Pds1, and Esp1 to events such as budding, DNA synthesis, and sister chromatid separation]
JigCell Problem-Solving Environment
[Figure: JigCell components: Wiring Diagram, Differential Equations, Parameter Values, Simulation, Analysis, Visualization, Experimental Database, Automatic Parameter Estimation]
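The JigCell components listed above connect a wiring diagram to differential equations, simulation, and automatic parameter estimation. The Python below is a toy sketch of that flow under stated assumptions, not JigCell's implementation: reactions are assumed to follow mass-action kinetics, integration uses a fixed-step fourth-order Runge-Kutta method, and a simple grid search stands in for whatever estimation method JigCell actually uses. The species name "Cyclin", the rate constants "ksyn" and "kdeg", and all helper names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wiring-diagram format: (reactants, products, rate-constant name).
# Mass-action kinetics is assumed: rate = k * product of reactant concentrations.
Reaction = Tuple[List[str], List[str], str]
RHS = Callable[[List[float], Dict[str, float]], List[float]]


def make_rhs(reactions: List[Reaction], species: List[str]) -> RHS:
    """Translate a reaction list into the right-hand side of dx/dt = f(x, params)."""
    index = {s: i for i, s in enumerate(species)}

    def rhs(x: List[float], params: Dict[str, float]) -> List[float]:
        dxdt = [0.0] * len(species)
        for reactants, products, k in reactions:
            rate = params[k]
            for s in reactants:
                rate *= x[index[s]]
            for s in reactants:
                dxdt[index[s]] -= rate  # consumed by this reaction
            for s in products:
                dxdt[index[s]] += rate  # produced by this reaction
        return dxdt

    return rhs


def simulate(rhs: RHS, x0: List[float], params: Dict[str, float],
             dt: float, steps: int) -> List[List[float]]:
    """Integrate with the classical fourth-order Runge-Kutta method."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        k1 = rhs(x, params)
        k2 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], params)
        k3 = rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], params)
        k4 = rhs([xi + dt * ki for xi, ki in zip(x, k3)], params)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        trajectory.append(list(x))
    return trajectory


def fit_parameter(rhs: RHS, x0: List[float], params: Dict[str, float],
                  name: str, candidates: List[float], observed: List[float],
                  dt: float, steps: int, which: int = 0) -> float:
    """Grid search: return the candidate value of one rate constant that minimizes
    the sum of squared errors between species `which` and the observed time course."""
    def sse(value: float) -> float:
        trial = dict(params, **{name: value})
        simulated = simulate(rhs, x0, trial, dt, steps)
        return sum((state[which] - obs) ** 2 for state, obs in zip(simulated, observed))
    return min(candidates, key=sse)


# Toy one-species model: synthesis at rate ksyn, degradation at rate kdeg * Cyclin.
reactions = [([], ["Cyclin"], "ksyn"), (["Cyclin"], [], "kdeg")]
rhs = make_rhs(reactions, ["Cyclin"])
params = {"ksyn": 1.0, "kdeg": 0.5}
# Steady state is ksyn / kdeg = 2.0; by t = 20 the solution is very close to it.
trajectory = simulate(rhs, [0.0], params, dt=0.1, steps=200)
# Synthetic "observations" generated from the same model, then kdeg re-estimated.
observed = [state[0] for state in simulate(rhs, [0.0], params, dt=0.1, steps=50)]
best_kdeg = fit_parameter(rhs, [0.0], {"ksyn": 1.0, "kdeg": 1.0}, "kdeg",
                          [i / 10 for i in range(1, 11)], observed, dt=0.1, steps=50)
```

Real cell-cycle models have dozens of species, nonlinear rate laws, and many parameters fitted against heterogeneous mutant phenotypes, so grid search would be replaced by a proper optimizer; the sketch only shows how the pieces feed one another.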
Why do these calculations?
• Is the model “yeast-shaped”?
• Bioinformatics role: the model organizes
experimental information.
• New science: prediction, insight
JigCell is part of the DARPA BioSPICE suite of
software tools for computational cell biology.
Expresso:
A Next Generation Software
System for Microarray
Experiment Management
and Data Analysis
Scenarios for Effects of Abiotic Stress
on Gene Expression in Plants
The Expresso Pipeline
Proteus — Data Mining with ILP
• ILP (inductive logic programming): a data mining
technique for inferring relationships or rules
• Proteus — efficient system for ILP in bioinformatics
context
• Flexibly incorporates a priori biological knowledge (e.g.,
gene function) and experimental data (e.g., gene
expression)
• Infers rules without explicit direction
Networks in Bioinformatics
• Mathematical Model(s) for Biological Networks
• Representation: What biological entities and parameters to
represent and at what level of granularity?
• Operations and Computations: What manipulations and
transformations are supported?
• Presentation: How can biologists visualize and explore
networks?
Reconciling Networks
Munnik and Meijer,
FEBS Letters, 2001
Shinozaki and Yamaguchi-Shinozaki, Current Opinion
in Plant Biology, 2000
Multimodal Networks
• Nodes and edges have flexible semantics to represent:
- Time
- Uncertainty
- Cellular decision making; process regulation
- Cell topology and compartmentalization
- Rate constants
- Phylogeny
• Hierarchical
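One way to make the flexible node and edge semantics above concrete is a small typed-graph data structure. The Python below is an illustrative sketch, not the representation used in this project: it assumes nodes carry a kind and a compartment (cell topology); edges carry a kind (regulation, or phylogeny via a hypothetical "ortholog_of" kind), a confidence value (uncertainty), a time window, and an optional rate constant; and a node may expand into a nested subnetwork to provide hierarchy. All class names, fields, and example entities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    name: str
    kind: str                               # e.g. "gene", "protein", "process"
    compartment: Optional[str] = None       # cell topology / compartmentalization
    subnetwork: Optional["Network"] = None  # hierarchy: node expands into a network


@dataclass
class Edge:
    source: str
    target: str
    kind: str                               # e.g. "activates", "inhibits", "ortholog_of"
    confidence: float = 1.0                 # uncertainty, from 0 to 1
    start: float = 0.0                      # time window in which the edge holds
    end: float = float("inf")
    rate: Optional[float] = None            # rate constant, when known


@dataclass
class Network:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def active_edges(self, t: float, min_confidence: float = 0.0) -> List[Edge]:
        """Edges whose time window contains t and whose confidence meets the threshold."""
        return [e for e in self.edges
                if e.start <= t < e.end and e.confidence >= min_confidence]

    def all_nodes(self) -> Iterator[Node]:
        """Yield nodes at every level of the hierarchy, depth first."""
        for node in self.nodes.values():
            yield node
            if node.subnetwork is not None:
                yield from node.subnetwork.all_nodes()


# Hypothetical example: a signaling step that feeds a nested regulatory module.
module = Network()
module.add_node(Node("TF_C", "protein", "nucleus"))
module.add_node(Node("GeneD", "gene", "nucleus"))

net = Network()
net.add_node(Node("Sensor", "protein", "membrane"))
net.add_node(Node("KinaseB", "protein", "cytoplasm"))
net.add_node(Node("Module1", "process", subnetwork=module))
net.add_edge(Edge("Sensor", "KinaseB", "activates", confidence=0.9, end=10.0))
net.add_edge(Edge("KinaseB", "Module1", "activates", confidence=0.4, start=2.0))

# Which interactions hold at t = 5 with at least moderate confidence?
confident_now = [(e.source, e.target) for e in net.active_edges(5.0, min_confidence=0.5)]
```

Keeping time, confidence, and compartment as edge and node attributes, rather than as separate graphs, lets one query filter by several modes at once, which is the kind of operation the "Operations and Computations" question on networks raises.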
Using Multimodal Networks
• Help biologists find new biological knowledge
• Visualize and explore
• Generate hypotheses and experiments
• Predict regulatory phenomena
• Predict responses to stress
• Incorporate into Expresso as part of closing the loop
Fusion — Chris North
• “Snap together” visualization environment
• Interactively linked data from multiple sources
• Data mining in the background
Virginia Bioinformatics Institute (VBI)
• Established by the state in July, 2000; high visibility
• Applies computational and information technology in
biological research
• Research faculty (currently about 18) with expertise in:
– Biochemistry
– Comparative Genomics
– Computer Science
– Drug Discovery
– Human and Plant Pathogens
– Mathematics
– Physics
– Simulation
– Statistics
• More than $43 million in funded research
VBI Mission Statement
At The Virginia Bioinformatics Institute, we research
biological systems and design, develop and disseminate
technologies to make discoveries that improve the
quality of human life.
We focus on understanding biology through systems that
integrate the interaction between organisms and their
environment for the benefit of science and society.
We also strive to collaborate with the scientific community
by enabling transformation of information into useful
knowledge and by providing scientific services.
The Disease Triangle
Specialized VBI Facilities
• Core lab facilities
– DNA sequencing
– Gene expression
– Proteomics
– Metabolomics
• Core computational facilities
– Cluster computing dedicated to bioinformatics
– Data storage
– Visualization
– Database administration
VBI Integration into Main Campus
• Originally housed in Corporate Research Center
• Partially moved to campus in 2003: Bioinformatics I building
• Final move to campus, December, 2004 — Bioinformatics II
building
• Total space in Bioinformatics I and II will be 130,560 square feet
VBI Research Portfolio ( by sponsor )
[Figure: pie chart of VBI research funding by sponsor: National Institutes of Health; National Science Foundation; VT (JHU/ASPIRES/VTF); U.S. Dept of Defense; CTRF; Other Academic Institutions; Industry; U.S. Dept of Agriculture; Foundations. Slice shares shown: 38%, 25%, 12%, 12%, 5%, 5%, 1%, 1%, and 1%]
Funded Partnerships with VT Departments
• Aerospace and Ocean Engineering
• Biochemistry
• Biology
• Biomedical Science and Pathobiology, VMRCVM
• Computer Science
• Crop and Soil Environmental Sciences
• Electrical and Computer Engineering
• Fisheries and Wildlife Science
• Horticulture
• Mathematics
• Plant Pathology, Physiology, and Weed Science
• Statistics
Opportunities for CS and the
College of Engineering
• Collaboration with VBI
• SBES, Wake Forest School of Medicine
• NIH and DHS funding
• Scientific modeling
Collaboration with VBI
• Basic biological science — molecular biology,
functional genomics, systems biology
• Computational methods to answer biological
questions from vast stores of VBI data resources
• Computational models and simulation of
biological systems, e.g., host-pathogen
interaction
SBES, Wake Forest
• Medical research includes significant computational challenges
• Much analysis can be done without additional lab biology
• Biomedical data analysis and mining
• Identification of genes responsible for complex traits
• More flexible and useful medical instrumentation
• Precise identification of disease
• Treatment suggestion
• Prognosis prediction
NIH and DHS Funding
• Bioinformatics is one of the New Pathways to
Discovery in the NIH Roadmap
• Computation is essential to advancing medical
practice, from diagnosis to drug design
• Department of Homeland Security (DHS) is funding
research to respond to bioterrorism
  – Detection and identification of agents
  – Rapid response to threats
  – Modeling crisis impact and response
Scientific Modeling
• Protein folding
• Protein function
• Protein-protein interaction
• Cellular signaling and decision processes
• Heart, lung, neurological function
• System X is an essential component
Conclusion
• Bioinformatics is an emerging area of
opportunity, but challenging to enter
• Rapid developments the norm; flexibility
essential
• Virginia Tech and the College are well positioned to take advantage