The Future of Bioinformatics Philip E. Bourne The University of California San Diego

The Future of
Bioinformatics
Philip E. Bourne
The University of California San Diego
pbourne@ucsd.edu
http://www.sdsc.edu/pb/talks
Jan. 19, 2004
APBC 04
A Little Story…
Outline
• The rules of prediction
• Can I predict?
• On what do I base my predictions?
• How did we get here?
• Where are we going as a discipline (assuming we are a discipline)?
• What are the dependencies?
• What are the challenges for computer scientists?
Apology: many examples are drawn from our own work in structural bioinformatics
Outline
Where are we going as a discipline
(assuming we are a discipline)?
 A data centric view
 A biological complexity view
Plotting Change
[Figure: ANYTHING plotted against TIME, with a "You Are Here" marker]
“The thing about change is that
things will be different afterwards.”
— Alan McMahon
Rules of Prediction
• Looking back, everything appears to have developed faster than reality
• Looking forward, everything will develop faster than you predict
• Hence, we are all very poor at predicting beyond the next 5 years. Examples:
  • The Next Fifty Years: Science in the First Half of the Twenty-first Century, John Brockman (Editor)
  • CACM Volume 40, Issue 2 (February 1997)
"This is like deja vu all over again."
Can I even do 5 years?
Bourne
Bioinformatics Editorial 1999 15(9):715
“Over the next 5 years there will be an estimated 10
major structural genomics efforts each yielding 200
structures per year. While these efforts will deplete
regular structure determination efforts, improvements
in technology and a general expansion of the field
will continue to yield 50 structures per week worldwide
outside of the structural genomics initiatives.”
Net result 35,000 structures by 2005
There were 11,000 structures at the time of this prediction
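The arithmetic behind the 35,000 figure can be retraced with a short sketch. The rates come straight from the quoted editorial; the 52 weeks per year conversion and all names in the code are assumptions made here for illustration.

```python
# Retrace the 1999 projection of PDB holdings five years out.
# Rates are taken from the quoted editorial; 52 weeks/year is an assumption.
START_STRUCTURES = 11_000     # holdings at the time of the prediction
YEARS = 5
SG_EFFORTS = 10               # structural genomics efforts
SG_RATE_PER_YEAR = 200        # structures per effort per year
OTHER_RATE_PER_WEEK = 50      # structures per week outside structural genomics
WEEKS_PER_YEAR = 52


def projected_structures() -> int:
    """Starting holdings plus five years of combined yearly output."""
    per_year = SG_EFFORTS * SG_RATE_PER_YEAR + OTHER_RATE_PER_WEEK * WEEKS_PER_YEAR
    return START_STRUCTURES + YEARS * per_year


if __name__ == "__main__":
    print(projected_structures())
```

Under these assumptions the total comes to 34,000, which the slide rounds to 35,000.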
"You can observe a lot just by
watching."
PDB Growth Curve
Approx. 24,000 structures today
In 2003 approx. 5,000 structures were deposited
History
Predictions Can
Be Good
From Where Do I Draw my Predictions?
• As an Associate Editor for Bioinformatics
• From work as the President of ISCB
• As an Editor of Proteins: Structure, Function, and Bioinformatics
• Reviewing many conference papers and organizing a variety of conferences
• From history thus far, including my own long career
So Let Us Review the History of Bioinformatics Thus Far – General Observations
• A relatively new term for a scientific endeavor that has been around much longer
• Medical informatics preceded it, and defined some of the foundations?
• A scientific endeavor driven out of a paradigm shift in which biology became a data driven science
• A scientific endeavor that has gained from fundamental developments in computer and information science, e.g., algorithms, ontologies, Bayesian networks, neural networks, text mining
"Do you mean now?" -- When asked for the time.
A More Specific Chronology – Pre 1970
Bioinformatics (2003) 19 2176-2190
1945 Biochemical Pathways – Horowitz
1953 Structure of DNA – Watson & Crick
1962 Molecular Homology – Florkin
1965 Evolutionary Patterns – Pauling
1966 Molecular Modeling – Levinthal
1967 Phylogenetic Trees – Fitch
1969 Genetic Variation
1969 Properties – Ptitsyn
1970 Dynamic Programming – Needleman & Wunsch
1953 Game Theory – von Neumann and Morgenstern
1959 Grammars – Chomsky
1962 Information Theory – Shannon & Weaver
1966 Cellular automata – von Neumann
A More Specific Chronology – 1970’s
Problem Definition
• Structural patterns and properties: Richards
• Improved sequence alignments: Sankoff; Smith Waterman algorithm
• Structure prediction: Levitt; Chou and Fasman; Scheraga
• Exon/Introns: Gilbert
• Public resources: Dayhoff, PDB
A More Specific Chronology – 1980’s
Computational Biology Emerges
• Domains recognized: Rashin
• Neural nets: Hopfield
• Tree of Life emerges
• Molecular computing: Conrad
• FASTA: Lipman & Pearson
• Nanotechnology: Drexler
• Profiles: Gribskov
• Reductionism begins: Thornton; Sander
• Clustering: Shepard
• Relational databases
• Networks: EMBLnet, BIONET
A More Specific Chronology – 1990’s
Bioinformatics and Biotechnology Emerge
• Internet/Web
• Human Genome Project
Growth in ISMB
[Chart: growth in ISMB by year; labeled point: 1500, 2002, Edmonton, Canada]
http://www.iscb.org/history.shtml
Bioinformatics Journal
[Chart: Submissions per year, 1997 to 2003; vertical axis 0 to 1400]
Bioinformatics Journal
[Chart: Impact Factor per year, 1997 to 2003; vertical axis 0 to 5]
Data for the journal Bioinformatics
Bioinformatics - A Vice Chancellor’s View
[Figure: stages Biological Experiment, Data, Information, Knowledge, Discovery; steps Collect, Characterize, Compare, Model, Infer]
[Figure: Complexity and Technology plotted against Year (90, 95, 00, 05). Complexity labels: Sequence, ESTs, Gene Chips, Genetic Circuits, Structure, Virus Structure, Ribosome, Sub-cellular, Cellular, Assembly, Model Metabolic Pathway of E. coli, Neuronal Modeling, Cardiac Modeling, Brain Mapping, Organ, Higher-life. Genome labels: E. coli Genome, Yeast Genome, C. elegans Genome, Human Genome Project, Human Genome. Technology labels: Sequencing Technology (1 Small Genome/Mo.), Computing Power, Data, # People/Web Site.]
(C) Copyright Phil Bourne 1998
If Data is Central to the Future of
Bioinformatics, Let us Take a Data
Centric View of the Future
 Data complexity
 High throughput data collection
 Database vs literature
 Bioinformatics as data driver
 Data representation
 Data integration
"If you come to a fork in the road, take it."
Numbers and Complexity
Complexity is increasing
(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome
Courtesy of David Goodsell, TSRI
Complexity - The Ribosome
A Nanomachine
• Translates mRNA into protein
• Molecular Mass: 2.6 million
• Maximum Dimension ~25 nm
• 2/3 RNA – performs catalysis
• 1/3 protein – outer scaffold for the RNA
[Figure labels: 50S, 30S, mRNA, protein]
Figure from J. Frank, Wadsworth Center, NY
"The ribosome, together with its accessories, is probably the most sophisticated machine ever made." R. Garrett (1999) Nature 400
High Throughput - The Structural Genomics Pipeline (X-ray Crystallography)
Basic Steps: Target Selection → Crystallomics (isolation, expression, purification, crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish
Contributions shown along the pipeline:
• Bioinformatics: distant homologs, domain recognition
• Automation
• Bioinformatics: empirical rules
• Automation; better sources
• Software integration; decision support
• MAD phasing; automated fitting
• Bioinformatics: alignments, protein-protein interactions, protein-ligand interactions, motif recognition
• No?
Bioinformatics Throughout the Process
An Aside on the Future of Publishing
Full Description Captured as the Paper/Database is Written/Deposited
Does away with ...
"… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif ..." Science Vol. 265, p346
Corresponding structure from the PDB: 1TSR
Oops! β sandwich? Where? Large loop? Which one?? Loop-sheet-helix???
BioEditor - A DTD Driven
Domain Specific Editor
http://bioeditor.sdsc.edu
Structural Genomics Targets and their Status from http://targetdb.rcsb.org
Bourne et al. 2004
Pacific Symposium on Biocomputing
The Data - Bioinformatics Cycle
Result – Computation and Experiment
become More Synergistic
Turn Knowledge into New Data Requirements
Data
Bioinformatics
Turn Data into Knowledge
Deuterium Exchange Mass Spec to Predict Structure
[Figure: workflow with labels Target Protein, Structure Templates, CASP, Threading, DXMS, COREX, k (Stability), Amino Acid, Profile Match Method, Best Structure(s)]
Biological Representation
• The Gene Ontology changes everything
  • Molecular function
  • Biological process
  • Cellular component
  • DAG – machine usable
• The number of papers referencing the Gene Ontology has increased dramatically in the last year
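To make "DAG – machine usable" concrete, here is a minimal sketch of a directed acyclic graph of terms in which a term may have more than one parent, so a program can collect everything a term implies. The term names and the `PARENTS` table are hypothetical placeholders and are not taken from the Gene Ontology itself.

```python
# Toy ontology: each term maps to its parent terms (a DAG, not a tree).
# All term names below are hypothetical placeholders.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic_activity": ["molecular_function"],
    "kinase_activity": ["catalytic_activity"],
    "atp_binding": ["binding"],
    "protein_kinase_activity": ["kinase_activity", "atp_binding"],
}


def ancestors(term):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("protein_kinase_activity")))
```

Because the last term has two parents, an annotation made with it is reachable from both the binding branch and the catalytic branch, which a strict tree could not express.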
Biological Data Representation
Future
• Tools to construct ontologies from free text?
• Ontologies for details of function, protein-protein interaction, protocols, complete pathway information
• Consider an example from structural genomics
PEBCdb
• Extends content of TargetDB
  • status history and stop conditions
  • protocols for cloning, expression, purification, crystallization, and NMR
• Extends TargetDB search for new content
• Reports provide links
  • status history
  • related protocols
  • project
  • sequence and domain databases
Incremental Data Pipeline – Example
of a Workflow Environment
Research Challenges
 Portable and extensible LIMS
 Controlled vocabulary for protocols
 Heuristics for experimental design
 Quality control
 Data mining to improve protocols
Data Integration
Web Services – the
holy grail of
interoperability?
Web Services
• It's not CORBA – biologists can do it
• Easy to implement
• Platform independent
• Driver to force data providers to define and publish a detailed API
• Compelling: introduces the prospect of global workflow
Perl Web Services Client Example
• A small Perl program to find all PDB entries whose PubMed abstracts contain the word 'ferritin'
use strict;
use warnings;
use SOAP::Lite;

# Call the remote pubmedAbstractQuery method with the search term given on the command line
my $ids_ref = SOAP::Lite
    -> uri('http://server.location.edu/pdbWebServices')
    -> proxy('http://server.location.edu/pdbWebServices')
    -> pubmedAbstractQuery($ARGV[0])
    -> result;

# The result is an array reference; dereference it and print the IDs
my @ids = @{$ids_ref};
print "@ids\n";
Mycomputer(1)% web_service.pl ferritin
1AEW 1AQO 1BCF 1BFR 1BG7 1DPS 1EUM 1FHA 1JGC 1JI5 1JIG 1MFR
1QGH 1RCC 1RCD 1RCE 1RCG 1RCI 1RYT 2FHA
A Biological Complexity
Perspective
[Figure: SCIENTIFIC RESEARCH & DISCOVERY across levels of biological organization: Atoms & Molecules, Biopolymers, Macromolecules, Cells, Organs, Organisms, with a "You Are Here" marker. REPRESENTATIVE DISCIPLINE: Anatomy, Physiology, Cell Biology, Proteomics, Genomics, Medicinal Chemistry. EXAMPLE UNITS: MRI, Heart, Neuron, Structure, Sequence, Protease Inhibitor. REPRESENTATIVE TECHNOLOGY: Migratory Sensors, Ventricular Modeling, Electron Microscopy, X-ray Crystallography, Protein Docking. Supporting layers: Infrastructure, Technologies, Training.]
Let us Focus on the Near
Future
Computational Biology/Bioengineering in the Post-Genomic Era
The “New” Central Dogma
Genomes → Gene Products → Structure & Function → Pathways & Physiology
~ Scientific Challenges - Deciphering the genome, mapping the genotype-phenotype relationships, dissecting organismic function, engineering organisms with altered functionality, figuring out complex traits and polymorphism, understanding physiology.
~ Algorithmic Challenges - comparisons of whole and partial genomes, metrics for similarity and homology, metabolic reconstruction, dissecting pathways, and whole cell modeling.
~ Computational Challenges - creation of the informatics infrastructure; creation, annotation, curation and dissemination of databases; development of parallel computational methods.
Our understanding of biological complexity is not reflected in the current generation of biological data resources
Consider an example: the protein kinase-like superfamily
The SCOP Classification Hierarchy
Courtesy Steven Brenner
An Example of a Structural Superfamily: The Protein Kinase-Like Superfamily
SCOP grouping for kinases
1) Class: Alpha+Beta
2) Fold: Protein Kinase Catalytic Core
3) Superfamily: Protein Kinase Catalytic Core
4) Families:
a) Ser/Thr Kinases
b) Tyr Kinases
c) Atypical Kinases
d) Antibiotic Kinases
e) Lipid Kinases
Superfamily: not all members are eukaryotic or protein kinases; some homologues discovered in bacteria phosphorylate antibiotics, others phosphorylate lipids
[Figure: Typical Kinase Core (c-Src, PDB ID: 2SRC)]
Evolution of the Kinase Superfamily: Comparison of Three Superfamily Members
• A: Casein kinase 1 (PDB ID: 1CSN)
• B: Aminoglycoside kinase (PDB ID: 1J7L)
• C: Phosphatidylinositol 3-kinase (PDB ID: 1E8X)
• D: The previous three structures with only their shared region superposed (1CSN: light blue, 1J7L: red, 1E8X: yellow)
• The three kinases share a minimal core required for ATP binding and phosphotransfer.
Our Algorithms Need to
Continue to Evolve and there
is the Real Need for Quality
Control
Consider structure comparison
and alignment of the diverse
protein kinases
An Example of Manual vs. Automated with Combinatorial Extension (CE)
• The manual alignment can be used to better understand the limitations of our automated method
• Alignment of helix C of two tyrosine kinases
  • Insulin Receptor Kinase (PDB ID 1IR3)
  • c-Src (PDB ID 2SRC)
  • Can be aligned with 40% identity, 3.0 Å RMSD
• In Src, the C-helix is displaced and rotated outward
• Rotation pushes the N-terminal end of the helix out very far from the N-terminal end of IRK
• CE gaps a part of this (yellow), splitting the helix and aligning part of IRK helix C with the loop leading to helix C in Src
[Figure legend: Orange: IRK, Blue: c-Src, Yellow: CE gap region]
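The 3.0 Å RMSD quoted on this slide is a root-mean-square deviation over aligned C-alpha pairs. A minimal sketch of that measure follows. It assumes the two structures are already superposed and paired residue by residue, it is not the CE algorithm, and the coordinates in the usage lines are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) positions.

    Assumes both lists are already superposed and aligned pair by pair;
    no optimal superposition is computed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: every pair is offset by 1 Å along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b))
```

A measure like this rewards pairing nearby C-alpha atoms, which is one reason an automated method can score well while still splitting a helix.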
An Example of Manual vs. Automated with CE
• A closer look:
• The CE alignment puts closer C-alpha positions together but does not respect helical relationships
• Hand alignment respects the helix and aligns more distant C-alpha positions
[Figure panels: CE alignment; Hand alignment]
Improving CEfam: Multiple Alignments with CE
• Example with strands 1 and 2 of kinase superfamily
  • A: original
  • B: optimal parameters
  • C: manual
• Parameters also improved results with other protein superfamilies in visual analysis
• Just as sequence alignments are benchmarked against structure alignments, structure alignments should be benchmarked against manual results
• Improvement in optimization is now being folded into the next generation of CE
Quality Control
Consider an example
The definition of domains from
3-D structure
The 3D Domain Assignment Problem
Domain is a fundamental structural, functional and evolutionary unit of
protein:
Compact
Stable
Have hydrophobic core
Fold independently
Perform specific function
Can be re-shuffled and put together in different
combinations
Evolution works on the level of domain
Exact assignments of domains remains a difficult
and unresolved problem.
There is no complete agreement among experts on domain assignment
given a protein structure.
Expert methods agree on 80% of all existing manual assignments, the
remaining 20% represent “difficult” cases
Expert assignment #3
Expert assignment #1
Expert assignment #2
Manual vs. automatic consensuses: do they overlap?
Chains with manual consensus: 375 (80% of entire dataset)
Chains with automatic consensus: 374 (80% of entire dataset)
Chains with consensus (automatic or manual) : 424 (90.6% of entire dataset)
Automatic consensus only
46 chains (10.9% of chains
with consensus)
Manual consensus only
47 chains (11.1% of
chains with consensus)
Manual and automatic consensus
agree
328 chains
(77.3% of chains with consensus)
Automatic consensus and manual
consensus disagree 3 chains (0.7%
of chains with consensus)
Veretnik et al. 2003 JMB submitted
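The percentages on this slide can be rechecked from the four chain counts. The sketch below uses only the counts given above; the dictionary keys and the function name are labels chosen here for illustration.

```python
# Chain counts as given on the slide.
COUNTS = {
    "automatic_only": 46,
    "manual_only": 47,
    "agree": 328,
    "disagree": 3,
}


def consensus_shares(counts):
    """Return the total and each category as a percentage of that total."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = consensus_shares(COUNTS)
    print(total)
    for name, pct in shares.items():
        print(f"{name}: {pct:.2f}%")
```

The four categories sum to the 424 chains with any consensus, and each share comes out within about 0.1 percentage point of the value shown on the slide.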
1cjaa (actin-fragmin kinase, slime mold): an unusual kinase
[complex interface]
• SCOP, PDP, DomainParser: 1 domain
• CATH: 1 domain + unassigned
• DALI: 4 domains
[Figure panel: typical kinase]
Outstanding Problems in Sequence Analysis & Comparison
• Exon recognition
• Protein coding gene modeling
• Protein/EST alignment
• Large scale sequence comparison and alignment
• Synteny recognition
• Polymorphism / variation detection
• Regulatory pattern recognition
• Repetitive DNA characterization
• RNA gene modeling
Exemplar Bioinformatics Problems
1. Full genome comparisons
2. Rapid assessment of polymorphic variations
3. Complete construction of orthologous and paralogous groups
4. Structure resolution of large assemblies/complexes
5. Dynamical simulation of realistic systems
6. Rapid structural/topological clustering of proteins
7. Protein folding
8. Computer simulation of membrane insertion
9. Simulation of cellular pathways/sensitivity analysis of pathways stoichiometry and kinetics
Bringing the Data View and the Complexity View Together to Define the Bioinformatics “Engineering” Challenge
• Easy access to any type of biological data across databases
• Ability to go across databases and types of data
• Rapidly infer knowledge from new genome sequences
• Find relationships between sequence, structure and function of gene products
• Relate genotype to phenotype in species
• Access and apply polymorphism data seamlessly

• A single computer interface (Web browser?)
• Computer platform independence
• Total opaqueness of format differences
• Compute on a point and click mode
• Seamless access to files, file uploads and downloads
• Multimedia capabilities on the interface
• Ability to integrate new tools/databases painlessly
Consider a Couple of
Approaches
integrated Genomic Annotation Pipeline - iGAP
Inputs
• Structure info: SCOP, PDB
• Sequence info: deduced protein sequences; NR, PFAM
Building FOLDLIB:
• PDB chains
• SCOP domains
• PDP domains
• CE matches PDB vs. SCOP
• 90% sequence non-identical
• minimum size 25 aa
• coverage (90%, gaps <30, ends <30)
Pipeline
• Prediction of: signal peptides (SignalP, PSORT), transmembrane (TMHMM, PSORT), coiled coils (COILS), low complexity regions (SEG)
• Create PSI-BLAST profiles for protein sequences
• Structural assignment of domains by PSI-BLAST on FOLDLIB
• Only sequences w/out A-prediction: structural assignment of domains by 123D on FOLDLIB
• Only sequences w/out A-prediction: functional assignment by PFAM, NR, PSIPred assignments
• Domain location prediction by sequence
• Store assigned regions in the DB
integrated Genomic Annotation Pipeline - iGAP
Scale: ~800 genomes @ 10k-20k ORFs per genome ≈ 10^7 ORFs; 10^4 entries
Inputs
• Structure info: SCOP, PDB
• Sequence info: deduced protein sequences; NR, PFAM
Building FOLDLIB:
• PDB chains
• SCOP domains
• PDP domains
• CE matches PDB vs. SCOP
• 90% sequence non-identical
• minimum size 25 aa
• coverage (90%, gaps <30, ends <30)
Pipeline
• Prediction of: signal peptides (SignalP, PSORT), transmembrane (TMHMM, PSORT), coiled coils (COILS), low complexity regions (SEG)
• Create PSI-BLAST profiles for protein sequences
• Structural assignment of domains by PSI-BLAST on FOLDLIB
• Only sequences w/out A-prediction: structural assignment of domains by 123D on FOLDLIB
• Only sequences w/out A-prediction: functional assignment by PFAM, NR, PSIPred assignments
• Domain location prediction by sequence
• Store assigned regions in the DB
CPU-year labels on the diagram, in order of appearance: 4, 228, 3, 9, 252, 3
Li, et al., (2003) Genome Biology
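The scale quoted for iGAP, roughly 800 genomes at 10k to 20k ORFs each, can be checked with a few lines. This is a back-of-the-envelope sketch only; the constant and function names are chosen here for illustration.

```python
# Figures from the slide: ~800 genomes, 10k-20k ORFs per genome.
GENOMES = 800
ORFS_PER_GENOME = (10_000, 20_000)


def orf_range(genomes=GENOMES, per_genome=ORFS_PER_GENOME):
    """Return the (low, high) estimate of the total number of ORFs."""
    low, high = per_genome
    return genomes * low, genomes * high


if __name__ == "__main__":
    low, high = orf_range()
    print(f"{low:.1e} to {high:.1e} ORFs")
```

Both ends of the range are on the order of 10^7 ORFs, which is the workload the CPU-year figures on the slide refer to.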
Towards Workflows and the Grid
[Figure labels: iGAP; APST Scheduler; Executables, Parameters, Input, Output, Resources (XML); Grid Resource Information (MDS/NWS/Ganglia); Data Manager (SCP/GASS/SRB/FTP); Storage; Compute Manager (SSH/GRAM/GASS); Compute (PBS/Loadleveler/Condor); Grid Middleware]
THE EOL GRID CONSORTIUM
[Figure labels: SDSC (Blue Horizon, The EOL Cluster, Sun Enterprise Server); Industrial Partners (IBM, Ceres, Encyclopedia Proteomics Inc.); EOL; BII Singapore; Titech Japan]
Collaboration
A New Direction
In the past: Isolation
Now: Collaboration
Beyond Collaboration with
other Bioinformaticists is
Collaboration with Biologists
We need to overcome the “high
noon” problem
High Noon – A Working Definition
12:00
The cost:benefit ratio of entry to bioinformatics
tools and resources is
too high for the majority of biologists
Thus, those who could gain and
contribute most from the services provided
are not users
One Approach - MBT
• Java toolkit for developing custom molecular visualization applications
• High-quality interactive rendering of:
  • sequence
  • structure
  • function
http://mbt.sdsc.edu
MBT Functionality
• Provides
  • Data loading
    • Local files (PDB, mmCIF, Fasta, etc)
    • Compressed files (zip, gzip)
    • Remote (http, ftp, OpenMMS?, EJB?)
  • Efficient data access
    • Raw data
    • Derived data (StructureMap)
  • Visualization (plug-in viewers)
MBT Architecture
Future - The Structure Should
be the User Interface
Ligand - What other
entries contain this?
Chain - What other
entries have chains with
>90% sequence identity?
Residue - What is the
environment of this residue?
Phenomena in biological systems may be organized in several layers.
• Populations
  • Ecological Communities
  • Populations of a Species
• Physiology and Organisms
  • Integrative physiology, Homeostasis
  • Organs, Tissues
  • Cells
• Pathways and Information Transfer
  • Integrated metabolism, regulatory, developmental pathways
  • Simple pathways for information transfer, regulation, development
  • Simple metabolic pathways for creating & using other molecules
• Biological Macromolecules and Structures
  • Biomolecular Assemblies; ligand-receptor complexes
  • Molecules and Structures created by genes, gene products
  • Gene Products: RNAs; Proteins
  • Genes and Genomes
• Physics and Chemistry
  • e.g. Physical Chemistry, Organic Chemistry, Information theory, Constraints of self-assembling adaptive systems
Each system layer builds from lower system layers & acquires new emergent properties
New Emergent Properties are shown in brackets after each layer.
• Populations [Ecological Processes & Populations]
  • Ecological Communities
  • Populations of a Species
• Physiology and Organisms [Tissue & Organismal Physiology; Developmental & Physiological Processes]
  • Integrative physiology, Homeostasis
  • Organs, Tissues
  • Cells
• Pathways and Information Transfer [Biochemical Pathways & Processes]
  • Integrated metabolism, regulatory, developmental pathways
  • Simple pathways for information transfer, regulation, development
  • Simple metabolic pathways for creating & using other molecules
• Biological Macromolecules and Structures [Biomolecular Structure & Function; Genes and Genomes]
  • Biomolecular Assemblies; ligand-receptor complexes
  • Molecules and Structures created by genes, gene products
  • Gene Products: RNAs; Proteins
  • Genes and Genomes
• Physics and Chemistry [Physics and Chemistry]
  • e.g. Physical Chemistry, Organic Chemistry, Information theory, Constraints of self-assembling adaptive systems
The Next Response
• Transitional medicine
• Personalized medicine
• Merger of medical, chem and bioinformatics
• Training in cooperative in silico and experimental research
• Centers that reflect that training, i.e., different from NCBI or EBI
"Think! How the hell are you gonna think and hit at the same time?"
Statement of the Director, NIGMS, before the House Appropriations Subcommittee on Labor, HHS, Education, Thursday, February 25, 1999
The Next Response cont.
• Continued development of scientific societies
• Simulations used in the clinic setting
• New diagnostic procedures
• Ubiquitous large scale computing on large data
• More systemized drug discovery
"I knew I was going to take the wrong train, so I left early."
Evolution of complex systems:
Computers: complexity doubles every 18 months per $$$ (Moore’s Law)
Human Brain: very slow (complexity doubles in ~100,000 years)

System                   Short Term Storage   Long Term Storage   Speed        Cost
PC cluster (256 units)   65 GB                5 TB                256 GFLOP    $130K
Human Brain (Average)    57 TB                1137 TB             4.4 TFLOP    $130K

Complexity = Speed x Memory
Computer = 5 TB x 256 GFLOP ≈ 10^24 memory FLOPs
Brain = 1137 TB x 4.4 TFLOP = 5 x 10^27 memory FLOPs
Brain/Computer = 5 x 10^3 or 3.7 log units
Moore’s Law: 3.5 years/log unit
Human brain capacity for computers will be reached: 2000 + 3.7 x 3.5 = 2013
Based on Ramsey, 1997
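The estimate above can be retraced step by step. The sketch below takes the storage, speed, base year and years-per-log-unit values from the slide; treating 1 TB as 10^12 bytes and 1 GFLOP as 10^9 operations per second is an assumption, as are all names in the code.

```python
import math

# Values from the slide: long-term storage in bytes, speed in FLOP/s.
# Decimal prefixes (1 TB = 1e12, 1 GFLOP = 1e9) are assumed here.
PC_CLUSTER = {"storage": 5e12, "speed": 256e9}
HUMAN_BRAIN = {"storage": 1137e12, "speed": 4.4e12}
YEARS_PER_LOG_UNIT = 3.5   # the slide's reading of Moore's Law
BASE_YEAR = 2000


def complexity(system):
    """Complexity = Speed x Memory, as defined on the slide."""
    return system["storage"] * system["speed"]


def log_gap():
    """Brain-to-computer complexity ratio in log10 units."""
    return math.log10(complexity(HUMAN_BRAIN) / complexity(PC_CLUSTER))


def crossover_year():
    """Year in which the computer would close the gap at the assumed rate."""
    return BASE_YEAR + log_gap() * YEARS_PER_LOG_UNIT


if __name__ == "__main__":
    print(f"gap: {log_gap():.2f} log units, year: {crossover_year():.1f}")
```

With unrounded products the gap comes out slightly below the slide's 3.7 log units, because the slide rounds the cluster's complexity down to 10^24; the crossover year still rounds to 2013.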
Acknowledgements
 To all those who have chosen
bioinformatics as a career and make the
field so rich
 Particularly those who do so for lesser
rewards – the data providers and
annotators
 My group for the fun we had discussing
this topic
 http://rinkworks.com/said/yogiberra.shtml
"I didn't really say everything I said."