The Future of Bioinformatics
(with examples from structural bioinformatics)
Philip E. Bourne
The University of California San Diego
pbourne@ucsd.edu
http://www.sdsc.edu/pb/Talks
Feb. 25, 2004
World University Network - Worldwide Broadcast
Outline
• Bioinformatics thus far
• Today – a growth discipline
• Drivers
  - Data
  - Complexity – biological and data
  - The interface to medical informatics and systems biology
• Challenges
  - The devil is in the details
  - Quality control
  - Fundamentals versus relevance to biology
"You can observe a lot just by
watching."
Bioinformatics Thus Far – Pre 1970
Bioinformatics (2003) 19:2176-2190
1945 Biochemical Pathways – Horowitz
1953 Structure of DNA – Watson & Crick
1969 Genetic Variation
1962 Molecular Homology – Florkin
1965 Evolutionary Patterns – Pauling
1966 Molecular Modeling – Levinthal
1967 Phylogenetic Trees – Fitch
1969 Properties – Ptitsyn
1970 Dynamic Programming – Needleman & Wunsch
1953 Game Theory – von Neumann and Morgenstern
1959 Grammars – Chomsky
1962 Information Theory – Shannon & Weaver
1966 Cellular Automata – von Neumann
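The 1970 dynamic programming entry refers to Needleman and Wunsch's global alignment. A minimal Python sketch of the scoring recurrence, assuming a linear gap penalty and simple match/mismatch scores (the score values are illustrative, not taken from the talk); it returns only the optimal score, not the traceback:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score by dynamic programming (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                    # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]
```

Because every cell depends only on its three neighbours, the whole table is filled in O(len(a) x len(b)) time, which is what made exhaustive global alignment practical.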
Bioinformatics Thus Far – 1970’s
• Problem definition
• Improved sequence alignments – Sankoff
• Smith-Waterman algorithm
• Exons/introns – Gilbert
• Public resources – Dayhoff, PDB
• Structural patterns and properties – Richards
• Structure prediction – Levitt; Chou and Fasman; Scheraga
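The Smith-Waterman algorithm reuses the same dynamic programming idea for local alignment: cell scores are floored at zero so that an alignment can start and end anywhere, and the answer is the highest cell rather than the corner cell. A minimal Python sketch, assuming a linear gap penalty and illustrative match/mismatch scores; it keeps only two rows of the table and returns the best local score:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)          # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a new local alignment begin at any cell
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman("XXXABCXXX", "YYABCYY")` scores only the shared `ABC` core and ignores the unrelated flanks.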
Bioinformatics Thus Far – 1980’s
• Computational biology emerges
• Domains recognized – Rashin
• Neural nets – Hopfield
• Tree of life emerges
• Molecular computing – Conrad
• FASTA – Lipman & Pearson
• Nanotechnology – Drexler
• Profiles – Gribskov
• Reductionism begins – Thornton, Sander
• Clustering – Shepard
• Relational databases
• Networks – EMBLnet, BIONET
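FASTA (Lipman & Pearson) gains its speed by first looking for short exact word matches (k-tuples) between query and database sequence, and only then applying more expensive scoring. A simplified Python sketch of that first step only, assuming k=2; the real program goes on to score, rescore and join the best diagonal regions:

```python
from collections import defaultdict

def ktuple_hits(query, target, k=2):
    """Count shared k-tuples per diagonal; diagonal = target position - query position."""
    # Index every k-tuple of the query by its start position(s)
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Scan the target once and tally hits on each diagonal
    diagonals = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            diagonals[j - i] += 1
    return dict(diagonals)
```

Diagonals with many hits mark candidate similar regions; for `ktuple_hits("ACGT", "TTACGT")` all three shared 2-tuples fall on diagonal 2.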
Bioinformatics Thus Far – 1990’s
• Bioinformatics and biotechnology emerge
• Human Genome Project
• Internet/Web
So What is Bioinformatics Today?
• A relatively new term for a scientific endeavor that has been around much longer
• Medical informatics preceded it, and defined some of the foundations?
• A scientific endeavor that grew out of a paradigm shift in which biology became a data-driven science
• A scientific endeavor that has gained from fundamental developments in computer and information science, e.g., algorithms, ontologies, Bayesian networks, neural networks, text mining …
• A growth discipline…
"Do you mean now?" -- When asked for the time.
Bioinformatics - A Vice Chancellor’s View
[Figure: Biological Experiment, Data, Information, Knowledge and Discovery, linked by the steps Collect, Characterize, Compare, Model and Infer. A second panel plots biological complexity against year (1990 to 2005): sequence and ESTs; E. coli, yeast and C. elegans genomes; the Human Genome Project; gene chips; virus and ribosome structures; a model metabolic pathway of E. coli; genetic circuits; sub-cellular structure; cellular assembly; neuronal and cardiac modeling; organ and brain mapping; higher life. Accompanying curves show growth in data, computing power, sequencing technology (1 small genome/month; human genome) and # people/web site. (C) Copyright Phil Bourne 1998]
Growth in Bioinformatics as Measured by ISMB Attendance
[Figure: ISMB attendance by year, annotated 1500 at ISMB 2002, Edmonton, Canada. Source: http://www.iscb.org/history.shtml]
Growth in the Journal Bioinformatics
[Figure: Bioinformatics journal submissions (axis 0 to 1400) and impact factor (axis 0 to 5) by year, 1997 to 2003]
Drivers – Data Growth and Data Complexity
• Consider macromolecular structure as an example
Bourne, Bioinformatics Editorial 1999 15(9):715
“Over the next 5 years there will be an estimated 10 major structural genomics efforts each yielding 200 structures per year. While these efforts will deplete regular structure determination efforts, improvements in technology and a general expansion of the field will continue to yield 50 structures per week worldwide outside of the structural genomics initiatives.”
Net result: 35,000 structures by 2005
There were 11,000 structures at the time of this prediction
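The 35,000 figure follows from the rates in the quoted editorial. A quick check in Python, assuming 52 weeks per year and a five-year window starting from the 11,000 structures present at the time of the prediction:

```python
# Rates quoted in the 1999 editorial
existing = 11_000                          # structures in the PDB when the prediction was made
years = 5
structural_genomics = 10 * 200 * years     # 10 efforts x 200 structures per year
other = 50 * 52 * years                    # 50 structures per week outside structural genomics
total = existing + structural_genomics + other
print(structural_genomics, other, total)   # 10000 13000 34000
```

The sum, about 34,000, rounds to the 35,000 quoted.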
PDB Growth Curve
Approx. 24,000 structures today
In 2003 approx. 5,000 structures were deposited
History
Predictions Can Be Good
A Data-Centric View of the Future
• Data complexity
• High-throughput data collection
• Database versus literature
• Bioinformatics as data driver
• Data representation
• Data integration
"If you come to a fork in the road, take it."
Numbers and Complexity
Complexity is increasing
[Figure: (a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome. Courtesy of David Goodsell, TSRI]
Complexity - The Ribosome
A Nanomachine
• Translates mRNA into protein
• Molecular mass: ~2.6 million Da
• Maximum dimension ~25 nm
• 2/3 RNA – performs catalysis
• 1/3 protein – outer scaffold for the RNA
[Figure: the 50S and 30S subunits, with protein and mRNA labeled. Figure from J. Frank, Wadsworth Center, NY]
"The ribosome, together with its accessories, is probably the most sophisticated machine ever made." R. Garrett (1999) Nature 400
High Throughput - The Structural Genomics Pipeline (X-ray Crystallography)
Basic steps: Target Selection → Crystallomics (isolation, expression, purification, crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish
Bioinformatics throughout the process:
• Target selection – distant homologs, domain recognition
• Crystallomics – empirical rules, automation
• Data collection – automation, better sources
• Structure solution – MAD phasing, automated fitting
• Structure refinement – software integration, decision support
• Functional annotation – alignments, protein-protein interactions, protein-ligand interactions, motif recognition
• Publish – bioinformatics? No?
An Aside on the Future of Publishing
A full description captured as the paper/database is written/deposited does away with ...
“… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif ...” (Science Vol. 265, p. 346)
β sandwich? Where? Large loop? Which one?? Loop-sheet-helix???
Oops!
[Figure: the corresponding structure from the PDB, 1TSR]
BioEditor - A DTD-Driven Domain-Specific Editor
http://bioeditor.sdsc.edu
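One way to picture a DTD-driven, domain-specific editor is prose in which each structural term is marked up with the residues it refers to, so that a reader's "Where?" can be answered by software. A minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names (`description`, `feature`, `chain`, `start`, `end`), the entry ID `0XYZ` and the residue ranges are all hypothetical placeholders, not BioEditor's actual DTD or any real structure:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each structural term carries the residue range it refers to
annotated = """
<description entry="0XYZ">
  The core domain consists of a
  <feature type="beta_sandwich" chain="A" start="10" end="60">beta sandwich</feature>
  that serves as a scaffold for a
  <feature type="loop" chain="A" start="61" end="75">large loop</feature>.
</description>
"""

def features(xml_text):
    """Return (term, chain, start, end) for every marked-up feature."""
    root = ET.fromstring(xml_text.strip())
    return [(f.text, f.get("chain"), int(f.get("start")), int(f.get("end")))
            for f in root.iter("feature")]

for term, chain, start, end in features(annotated):
    print(f"{term}: chain {chain}, residues {start}-{end}")
```

With this kind of markup, a viewer could highlight the referenced residues directly in the deposited structure.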
Structural Genomics Targets and their Status from http://targetdb.rcsb.org
Bourne et al. 2004, Pacific Symposium on Biocomputing
http://www-smi.stanford.edu/projects/helix/psb04/bourne.doc
The Data - Bioinformatics Cycle
[Figure: a cycle in which bioinformatics turns data into knowledge and turns knowledge into new data requirements]
Result – computation and experiment become more synergistic
Deuterium Exchange Mass Spec to Predict Structure
[Figure: workflow combining Target Protein, Structure Templates, CASP, DXMS (deuterium exchange mass spectrometry), Threading, k (Stability), COREX and an Amino Acid Profile Match Method to select the Best Structure(s)]
Biological Representation
• The Gene Ontology changes everything
  - Molecular function
  - Biological process
  - Cellular component
  - DAG – machine usable
• The number of papers referencing the Gene Ontology has increased dramatically in the last year
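Because the Gene Ontology is a directed acyclic graph rather than a tree, a term can have several parents, and software that annotates a gene with one term can infer all of that term's ancestors. A minimal Python sketch over a small hierarchy; the terms and links below are a simplified illustration loosely modeled on GO's molecular function branch, not copied from the actual ontology:

```python
# Simplified, illustrative fragment: term -> list of parent terms (is_a links)
PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "transferase activity": ["catalytic activity"],
    "kinase activity": ["transferase activity"],
    "catalytic activity, acting on a protein": ["catalytic activity"],
    # Two parents: this is what makes the ontology a DAG rather than a tree
    "protein kinase activity": ["kinase activity",
                                "catalytic activity, acting on a protein"],
}

def ancestors(term):
    """All terms reachable by following parent links (excluding the term itself)."""
    seen, stack = set(), list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen
```

Here "catalytic activity" is reached along two different paths from "protein kinase activity" but is reported only once, which is why a visited set is needed when walking a DAG.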
Biological Data Representation – Future
• Tools to construct ontologies from free text?
• Ontologies for details of function, protein-protein interaction, protocols, complete pathway information
Data Integration
Web Services – the holy grail of interoperability?
Web Services
• It's not CORBA – biologists can do it
• Easy to implement
• Platform independent
• A driver to force data providers to define and publish a detailed API
• Compelling – introduces the prospect of global workflow
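Perl Web Services Client Example
• A small Perl program that queries the PubMed abstracts associated with PDB entries for the word 'ferritin' and prints the matching PDB IDs
use strict;
use warnings;
use SOAP::Lite;
# Call pubmedAbstractQuery on the PDB web service with the search term
# from the command line; the result is a reference to an array of PDB IDs
my $ids_ref = SOAP::Lite
    -> uri('http://server.location.edu/pdbWebServices')
    -> proxy('http://server.location.edu/pdbWebServices')
    -> pubmedAbstractQuery($ARGV[0])
    -> result;
# Dereference the array and print the IDs on one line
my @ids = @{$ids_ref};
print "@ids\n";
Mycomputer(1)% web_service.pl ferritin
1AEW 1AQO 1BCF 1BFR 1BG7 1DPS 1EUM 1FHA 1JGC 1JI5 1JIG 1MFR
1QGH 1RCC 1RCD 1RCE 1RCG 1RCI 1RYT 2FHA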
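Under the hood, a SOAP::Lite call like the one above sends an XML envelope over HTTP to the proxy URL. A sketch of building a SOAP 1.1 request envelope for the same `pubmedAbstractQuery` method with Python's standard library; the method name and service URI come from the Perl example (which itself uses a placeholder server), while the parameter element name `arg0` is an assumption, since the real service's expected parameter names are not given:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"     # SOAP 1.1 envelope namespace
SERVICE_NS = "http://server.location.edu/pdbWebServices"   # placeholder URI from the Perl example

def build_request(method, term):
    """Return a SOAP 1.1 request envelope (bytes) calling `method` with one string argument."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    ET.SubElement(call, "arg0").text = term   # parameter element name is an assumption
    return ET.tostring(envelope, encoding="utf-8")

print(build_request("pubmedAbstractQuery", "ferritin").decode())
```

Seeing the envelope makes the "platform independent" point concrete: any language that can produce this XML and POST it can use the service.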
A Biological Complexity Perspective
[Figure: biological scales (units) from atoms & molecules through macromolecules (biopolymers) and cells to organs and organisms, shown with representative disciplines (medicinal chemistry, genomics, proteomics, cell biology, physiology, anatomy), examples (protease inhibitor, sequence, structure, neuron, heart) and representative technologies (X-ray crystallography, protein docking, electron microscopy, ventricular modeling, MRI, migratory sensors). Scientific research & discovery rests on infrastructure, technologies and training; a "You Are Here" marker is shown]
The Post-Genomic Era
The “New” Central Dogma: Genomes → Gene Products → Structure & Function → Pathways & Physiology
• Scientific challenges – deciphering the genome, mapping genotype-phenotype relationships, dissecting organismic function, engineering organisms with altered functionality, figuring out complex traits and polymorphism, understanding physiology.
• Algorithmic challenges – comparisons of whole and partial genomes, metrics for similarity and homology, metabolic reconstruction, dissecting pathways, and whole-cell modeling.
• Computational challenges – creation of the informatics infrastructure; creation, annotation, curation and dissemination of databases; development of parallel computational methods.
Interaction Networks
A Protein Interaction Map of Drosophila melanogaster
L. Giot et al., Science, Vol. 302, Issue 5651, 1727-1736, December 5, 2003
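Protein interaction maps such as the Drosophila map are naturally represented as graphs, with proteins as nodes and detected interactions as edges; highly connected "hub" proteins can then be found from node degree. A minimal Python sketch; the protein names and interactions are hypothetical, not data from Giot et al.:

```python
from collections import defaultdict

def build_network(interactions):
    """Undirected interaction graph as {protein: set of partners}."""
    graph = defaultdict(set)
    for p, q in interactions:
        graph[p].add(q)
        graph[q].add(p)
    return graph

def hubs(graph, min_degree):
    """Proteins with at least `min_degree` distinct partners, most connected first."""
    return sorted((p for p in graph if len(graph[p]) >= min_degree),
                  key=lambda p: (-len(graph[p]), p))

# Hypothetical pairwise interactions; repeated detections collapse into one edge
network = build_network([("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                         ("P2", "P3"), ("P1", "P2")])
print(hubs(network, 2))
```

Using sets for the partners means that an interaction detected twice, as in a replicated screen, counts only once toward a protein's degree.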
Phenomena in biological systems may be organized in several layers.
• Populations
  - Ecological communities
  - Populations of a species
• Physiology and organisms
  - Integrative physiology, homeostasis
  - Organs, tissues
  - Cells
• Pathways and information transfer
  - Integrated metabolism, regulatory, developmental pathways
  - Simple pathways for information transfer, regulation, development
  - Simple metabolic pathways for creating & using other molecules
• Biological macromolecules and structures
  - Biomolecular assemblies; ligand-receptor complexes
  - Molecules and structures created by genes, gene products
  - Gene products: RNAs; proteins
  - Genes and genomes
• Physics and chemistry
  - e.g. physical chemistry, organic chemistry, information theory, constraints of self-assembling adaptive systems
Each system layer builds from lower system layers & acquires new emergent properties
[Figure: the same layers, from physics and chemistry up to populations, annotated with the corresponding levels of study: Information; Genes and Genomes; Biomolecular Structure & Function; Biochemical Pathways & Processes; Developmental & Physiological Processes; Tissue & Organismal Physiology; Ecological Processes & Populations. New emergent properties arise at each layer]
The Next Response
• Translational medicine
• Personalized medicine
• Merger of medical informatics, cheminformatics and bioinformatics
• Training in cooperative in silico and experimental research
• Centers that reflect that training, i.e., different from NCBI or EBI
"Think! How the hell are you gonna think and hit at the same time?"
Statement of the Director, NIGMS, before the House Appropriations Subcommittee on Labor, HHS, Education, Thursday, February 25, 1999
Near Term Challenges – Better Resources and Algorithms
Current Data Resources and Algorithms are Challenged by Biological Complexity
• Our understanding of biological complexity is not reflected in the current generation of biological data resources
• Hence these resources do not enable the next generation
• Algorithms are often limited since complexity implies variation
• Consider an example – the protein kinase-like superfamily
The SCOP Classification Hierarchy
Courtesy Steven Brenner
An Example of a Structural Superfamily: The Protein Kinase-Like Superfamily
SCOP grouping for kinases
1) Class: Alpha+Beta
2) Fold: Protein Kinase Catalytic Core
3) Superfamily: Protein Kinase Catalytic Core
4) Families:
  a) Ser/Thr Kinases
  b) Tyr Kinases
  c) Atypical Kinases
  d) Antibiotic Kinases
  e) Lipid Kinases
Not all members of the superfamily are eukaryotic, or even protein, kinases: some homologues discovered in bacteria phosphorylate antibiotics; others phosphorylate lipids.
[Figure: typical kinase core (c-Src, PDB ID: 2SRC)]
Evolution of the Kinase Superfamily: Comparison of Three Superfamily Members
• A: Casein kinase 1 (PDB ID: 1CSN)
• B: Aminoglycoside kinase (PDB ID: 1J7L)
• C: Phosphatidylinositol 3-kinase (PDB ID: 1E8X)
• D: The previous three structures with only their shared region superposed (1CSN: light blue, 1J7L: red, 1E8X: yellow)
• The three kinases share a minimal core required for ATP binding and phosphotransfer.
Our Algorithms Need to Continue to Evolve
Consider structure comparison and alignment of the diverse protein kinases
An Example of Manual vs. Automated with Combinatorial Extension (CE)
• The manual alignment can be used to better understand the limitations of our automated method
• Alignment of helix C of two tyrosine kinases:
  - Insulin receptor kinase (IRK; PDB ID 1IR3)
  - c-Src (PDB ID 2SRC)
• They can be aligned with 40% identity and 3.0 Å RMSD
• In Src, helix C is displaced and rotated outward
• The rotation pushes the N-terminal end of the Src helix very far from the N-terminal end of the IRK helix
• CE gaps part of this region (yellow), splitting the helix and aligning part of IRK helix C with the loop leading to helix C in Src
[Figure: Orange: IRK; Blue: c-Src; Yellow: CE gap region]
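Figures such as 40% identity and 3.0 Å RMSD summarize a structure alignment in two ways: the fraction of aligned residue pairs that are identical, and the root-mean-square deviation of the aligned Cα atoms after optimal superposition. A minimal sketch using NumPy and the Kabsch algorithm for the superposition; Kabsch is a standard method and is not necessarily what CE uses internally, and the inputs are assumed to be already-aligned, gap-free residue pairs:

```python
import numpy as np

def percent_identity(seq_a, seq_b):
    """Percent identical positions over aligned (equal-length, gap-free) residue pairs."""
    same = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * same / len(seq_a)

def kabsch_rmsd(p, q):
    """RMSD between N x 3 coordinate arrays after optimal rigid superposition."""
    p = p - p.mean(axis=0)                    # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                               # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ p.T).T - q                  # rotated p minus q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Note that RMSD only measures the pairs the alignment chose; as the helix C example shows, an alignment can achieve a low RMSD by pairing nearby Cα atoms that are not structurally equivalent.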
An Example of Manual vs. Automated with CE
• A closer look:
  - The CE alignment puts closer C-alpha positions together but does not respect helical relationships
  - The hand alignment respects the helix, aligning more distant C-alpha positions
[Figure panels: CE alignment; hand alignment]
Improving CEfam: Multiple Alignments with CE
• Example with strands 1 and 2 of the kinase superfamily
  - A: original
  - B: optimal parameters
  - C: manual
• The optimized parameters also improved results for other protein superfamilies, judged by visual analysis
• Just as sequence alignments are benchmarked against structure alignments, structure alignments should be benchmarked against manual results
• The improved optimization is now being folded into the next generation of CE
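Benchmarking an automatic structure alignment against a manual one needs a score for how many manually aligned residue pairs the automatic method reproduces. A minimal Python sketch, assuming each alignment is a set of (residue index in A, residue index in B) pairs; the `shift` tolerance for near-misses is a hypothetical parameter added for illustration:

```python
def alignment_agreement(auto_pairs, manual_pairs, shift=0):
    """Fraction of manual (i, j) pairs reproduced by the automatic alignment.

    A manual pair (i, j) counts as reproduced if the automatic alignment pairs
    residue i with some residue within `shift` positions of j.
    """
    auto_map = dict(auto_pairs)           # residue in A -> residue in B
    hits = sum(1 for i, j in manual_pairs
               if i in auto_map and abs(auto_map[i] - j) <= shift)
    return hits / len(manual_pairs)

manual = {(1, 1), (2, 2), (3, 3), (4, 4)}
auto = {(1, 1), (2, 2), (3, 4), (4, 5)}   # off by one residue over the second half
print(alignment_agreement(auto, manual), alignment_agreement(auto, manual, shift=1))
```

A strict score (shift 0) penalizes the register shifts seen in the helix C example, while a tolerance of one residue separates small register errors from genuinely wrong pairings.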
Near Term Challenges – Quality Control
Consider an example: the definition of domains from 3-D structure
The 3D Domain Assignment Problem
A domain is a fundamental structural, functional and evolutionary unit of a protein. Domains:
• are compact
• are stable
• have a hydrophobic core
• fold independently
• perform a specific function
• can be reshuffled and put together in different combinations
Evolution works at the level of the domain.
Exact assignment of domains remains a difficult and unresolved problem.
• There is no complete agreement among experts on domain assignment for a given protein structure.
• Expert methods agree on 80% of all existing manual assignments; the remaining 20% represent “difficult” cases.
[Figure: expert assignments #1, #2 and #3 of the same structure]
Manual vs. automatic consensuses: do they overlap?
• Chains with manual consensus: 375 (80% of entire dataset)
• Chains with automatic consensus: 374 (80% of entire dataset)
• Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)
  - Automatic consensus only: 46 chains (10.9% of chains with consensus)
  - Manual consensus only: 47 chains (11.1% of chains with consensus)
  - Manual and automatic consensus agree: 328 chains (77.3% of chains with consensus)
  - Automatic and manual consensus disagree: 3 chains (0.7% of chains with consensus)
Veretnik et al. 2003, JMB (submitted)
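The four overlap categories above can be computed by comparing, chain by chain, the consensus (if any) reached by the manual and by the automatic methods. A minimal Python sketch, assuming a consensus is recorded as a number of domains per chain and `None` where the methods did not reach consensus; the actual comparison by Veretnik et al. may compare domain boundaries rather than only domain counts:

```python
def compare_consensus(manual, automatic):
    """Classify chains by manual vs. automatic consensus (dict: chain -> domain count or None)."""
    counts = {"manual only": 0, "automatic only": 0, "agree": 0, "disagree": 0}
    for chain in manual.keys() | automatic.keys():
        m, a = manual.get(chain), automatic.get(chain)
        if m is None and a is None:
            continue                      # no consensus of either kind
        if a is None:
            counts["manual only"] += 1
        elif m is None:
            counts["automatic only"] += 1
        elif m == a:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts

# Hypothetical chains, not the dataset from the talk
manual = {"c1": 2, "c2": 1, "c3": None, "c4": 3}
automatic = {"c1": 2, "c2": 2, "c3": 1, "c5": None}
print(compare_consensus(manual, automatic))
```

Dividing each count by the number of chains with any consensus gives percentages in the same form as the slide.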
1cjaa (actin-fragmin kinase, slime mold): an unusual kinase [complex interface]
• SCOP, PDP, DomainParser: 1 domain
• CATH: 1 domain + unassigned
• DALI: 4 domains
[Figure panel: typical kinase]
Near Term Challenges – High Throughput
integrated Genomic Annotation Pipeline - iGAP
Inputs
• Structure info: SCOP, PDB
• Sequence info: deduced protein sequences; NR, PFAM
Building FOLDLIB
• PDB chains, SCOP domains, PDP domains
• CE matches PDB vs. SCOP
• 90% sequence non-identical
• Minimum size 25 aa
• Coverage (90%, gaps <30, ends <30)
Pipeline
1. Prediction of signal peptides (SignalP, PSORT), transmembrane regions (TMHMM, PSORT), coiled coils (COILS) and low-complexity regions (SEG)
2. Create PSI-BLAST profiles for protein sequences
3. Structural assignment of domains by PSI-BLAST on FOLDLIB
4. Only sequences w/out A-prediction: structural assignment of domains by 123D on FOLDLIB
5. Only sequences w/out A-prediction: domain location prediction by sequence
6. Functional assignment by PFAM, NR, PSIPred assignments
7. Store assigned regions in the DB
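The "Only sequences w/out A-prediction" steps describe a cascade: a sequence that receives a confident (A) assignment from an earlier, cheaper method is not passed on to the later, more expensive ones. A minimal Python sketch of that control flow; the method functions and the 'A' reliability label interface are hypothetical stand-ins, not iGAP code:

```python
def run_cascade(sequences, methods):
    """Apply methods in order; a sequence leaves the cascade once it gets an 'A' prediction.

    methods: list of (name, predict) pairs, where predict(seq) returns a
    reliability label such as 'A' or None (hypothetical interface).
    """
    assigned, remaining = {}, list(sequences)
    for name, predict in methods:
        still_open = []
        for seq in remaining:
            if predict(seq) == "A":
                assigned[seq] = name       # confident assignment: stop here
            else:
                still_open.append(seq)     # passed on to the next method
        remaining = still_open
    return assigned, remaining

# Toy stand-ins for PSI-BLAST and 123D on FOLDLIB
methods = [("PSI-BLAST", lambda s: "A" if s.startswith("M") else None),
           ("123D", lambda s: "A" if len(s) > 5 else None)]
print(run_cascade(["MKT", "GAVLIKP", "GA"], methods))
```

Ordering the methods from cheapest to most expensive is what keeps the CPU cost of a genome-scale run manageable.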
integrated Genomic Annotation Pipeline iGAP
[Figure: the same pipeline annotated with its scale: ~800 genomes @ 10k-20k ORFs per genome ≈ 10^7 ORFs; sequence info (NR, PFAM) annotated 10^4 entries; individual steps annotated with 3, 3, 4, 9, 228 and 252 CPU years]
Li et al. (2003) Genome Biology
Towards Workflows and the Grid
[Figure: iGAP passes executables, parameters, input, output and resource descriptions (XML) to the APST scheduler, which uses a data manager (SCP/GASS/SRB/FTP), a compute manager (SSH/GRAM/GASS; PBS/LoadLeveler/Condor) and grid resource information (MDS/NWS/Ganglia), via grid middleware, to reach storage and compute resources]
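A grid scheduler such as APST places many independent tasks (for example, one batch of ORFs per genome) on heterogeneous resources. A minimal Python sketch of one simple strategy, greedy earliest-finish-time assignment of the largest tasks first, assuming each resource has a known relative speed and each task a known cost; this illustrates the scheduling problem and is not APST's actual heuristic:

```python
def greedy_schedule(task_costs, speeds):
    """Assign each task to the resource where it would finish earliest.

    task_costs: {task: work units}; speeds: {resource: work units per hour}.
    Returns ({task: resource}, makespan in hours).
    """
    finish = {r: 0.0 for r in speeds}        # time at which each resource becomes free
    plan = {}
    # Place the largest tasks first (a common list-scheduling heuristic)
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        best = min(speeds, key=lambda r: finish[r] + cost / speeds[r])
        finish[best] += cost / speeds[best]
        plan[task] = best
    return plan, max(finish.values())

# Hypothetical genomes and resources
plan, makespan = greedy_schedule({"g1": 10, "g2": 10, "g3": 10},
                                 {"fast": 2.0, "slow": 1.0})
print(plan, makespan)
```

Real grid schedulers also have to account for data transfer times and resources whose load changes, which is what the resource information services in the figure provide.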
THE EOL GRID CONSORTIUM
Members and resources: SDSC (Blue Horizon); the EOL Cluster; Sun Enterprise Server; industrial partners (IBM); Ceres; EOL; BII, Singapore; Encyclopedia Proteomics Inc.; Titech, Japan
Near Term Challenges – We need to overcome the “high noon” problem
High Noon – A Working Definition (12:00)
The cost:benefit ratio of entry to bioinformatics tools and resources is too high for the majority of biologists. Thus, those who could gain most from, and contribute most to, the services provided are not users.
One Approach - MBT
• Java toolkit for developing custom molecular visualization applications
• High-quality interactive rendering of:
  - sequence
  - structure
  - function
http://mbt.sdsc.edu
MBT Functionality
• Provides
  - Data loading
    - Local files (PDB, mmCIF, FASTA, etc.)
    - Compressed files (zip, gzip)
    - Remote (http, ftp, OpenMMS?, EJB?)
  - Efficient data access
    - Raw data
    - Derived data (StructureMap)
  - Visualization (plug-in viewers)
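Loading local or compressed PDB files comes down to transparently decompressing and then reading fixed-column records. A Python sketch of that idea (MBT itself is a Java toolkit, and this is not its API), reading ATOM/HETATM records from a plain or gzip-compressed file using the PDB format's fixed columns:

```python
import gzip

def open_maybe_gzip(path):
    """Open a text file, decompressing transparently if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def parse_atoms(lines):
    """Yield (chain, residue number, residue name, atom name, (x, y, z)) from PDB records."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield (line[21],                     # chain identifier (column 22)
                   int(line[22:26]),             # residue sequence number (columns 23-26)
                   line[17:20].strip(),          # residue name (columns 18-20)
                   line[12:16].strip(),          # atom name (columns 13-16)
                   (float(line[30:38]),          # x (columns 31-38)
                    float(line[38:46]),          # y (columns 39-46)
                    float(line[46:54])))         # z (columns 47-54)
```

Usage would look like `with open_maybe_gzip("1tsr.pdb.gz") as fh: atoms = list(parse_atoms(fh))`, where the file name is only an example. Because the parser takes any iterable of lines, the same code serves local, compressed or downloaded data.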
MBT Architecture
Future - The Structure Should be the User Interface
• Ligand – What other entries contain this?
• Chain – What other entries have chains with >90% sequence identity?
• Residue – What is the environment of this residue?
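The question "What is the environment of this residue?" can be answered from coordinates alone by listing the other residues that have any atom within a distance cutoff of the selected residue. A minimal Python sketch, assuming atoms are given as (chain, residue number, residue name, (x, y, z)) tuples and a hypothetical 4.5 Å cutoff:

```python
import math

def residue_environment(atoms, chain, resnum, cutoff=4.5):
    """Residues (chain, number, name) with any atom within `cutoff` Å of the target residue."""
    target = [xyz for c, n, _, xyz in atoms if (c, n) == (chain, resnum)]
    nearby = set()
    for c, n, name, xyz in atoms:
        if (c, n) == (chain, resnum):
            continue                          # skip the residue's own atoms
        if any(math.dist(xyz, t) <= cutoff for t in target):
            nearby.add((c, n, name))
    return sorted(nearby)

# Hypothetical atoms, one per residue for brevity
atoms = [("A", 1, "GLY", (0, 0, 0)), ("A", 2, "ALA", (3, 0, 0)),
         ("A", 3, "SER", (10, 0, 0)), ("B", 7, "LYS", (0, 4, 0))]
print(residue_environment(atoms, "A", 1))
```

This brute-force loop is quadratic in the number of atoms; an interactive viewer would typically use a spatial grid or tree to answer such clicks instantly on large structures.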
On-going and Longer Term Challenges
Outstanding Problems in Sequence Analysis & Comparison
• Exon recognition
• Protein coding gene modeling
• Protein/EST alignment
• Large-scale sequence comparison and alignment
• Synteny recognition
• Polymorphism / variation detection
• Regulatory pattern recognition
• Repetitive DNA characterization
• RNA gene modeling
Exemplar Bioinformatics Problems
1. Full genome comparisons
2. Rapid assessment of polymorphic variations
3. Complete construction of orthologous and paralogous groups
4. Structure resolution of large assemblies/complexes
5. Dynamical simulation of realistic systems
6. Rapid structural/topological clustering of proteins
7. Protein folding
8. Computer simulation of membrane insertion
9. Simulation of cellular pathways / sensitivity analysis of pathway stoichiometry and kinetics
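Sensitivity analysis of pathway kinetics asks how much an output changes when a rate constant is perturbed. A minimal Python sketch for a hypothetical two-step pathway A → B → C with first-order kinetics, integrated with forward Euler, plus a central finite-difference estimate of the normalized sensitivity of [C] at the end time to each rate constant; the model, step size and parameter values are illustrative assumptions:

```python
def simulate(k1, k2, a0=1.0, t_end=5.0, dt=0.001):
    """Integrate A -> B -> C (first-order steps) with forward Euler; return [C] at t_end."""
    a, b, c = a0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        flux1, flux2 = k1 * a, k2 * b          # rates of A->B and B->C
        a, b, c = a - flux1 * dt, b + (flux1 - flux2) * dt, c + flux2 * dt
    return c

def normalized_sensitivity(param, k1=1.0, k2=0.5, rel_step=1e-4):
    """(dC/C) / (dk/k) for the chosen rate constant, by central finite difference."""
    params = {"k1": k1, "k2": k2}
    base = params[param]
    up = dict(params, **{param: base * (1 + rel_step)})
    down = dict(params, **{param: base * (1 - rel_step)})
    dc = simulate(**up) - simulate(**down)
    return (dc / simulate(**params)) / (2 * rel_step)

print(simulate(1.0, 0.5), normalized_sensitivity("k1"), normalized_sensitivity("k2"))
```

For realistic pathways with many species and stiff kinetics, a proper ODE solver and adjoint or automatic differentiation would replace the Euler loop and finite differences, but the question being asked is the same.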
Bringing the Data View and the Complexity View Together to Define the Bioinformatics “Engineering” Challenge
• Easy access to any type of biological data across databases
  - Ability to go across databases and types of data
  - Rapidly infer knowledge from new genome sequences
  - Find relationships between sequence, structure and function of gene products
  - Relate genotype to phenotype in species
  - Access and apply polymorphism data seamlessly
• A single computer interface (Web browser?)
  - Computer platform independence
  - Format differences completely hidden from the user
  - Compute in point-and-click mode
  - Seamless access to files, file uploads and downloads
  - Multimedia capabilities on the interface
  - Ability to integrate new tools/databases painlessly
Acknowledgements
• To all those who have chosen bioinformatics as a career and make the field so rich
• Particularly those who do so for lesser rewards – the data providers and annotators
• My group, for the fun we had discussing this topic
• http://rinkworks.com/said/yogiberra.shtml
"I didn't really say everything I said."