The Future of Bioinformatics
Philip E. Bourne
The University of California San Diego
pbourne@ucsd.edu
http://www.sdsc.edu/pb/talks
Jan. 19, 2004 APBC 04

A Little Story…

Outline
• The rules of prediction
• Can I predict?
• On what do I base my predictions?
• How did we get here?
• Where are we going as a discipline (assuming we are a discipline)?
• What are the dependencies?
• What are the challenges for computer scientists?
Apology: many examples are drawn from our own work in structural bioinformatics.

Outline
Where are we going as a discipline (assuming we are a discipline)?
• A data-centric view
• A biological complexity view

[Figure: Plotting Change. Axes: ANYTHING vs. TIME, with a "You Are Here" marker.]

"The thing about change is that things will be different afterwards." (Alan McMahon)

Rules of Prediction
• Looking back, everything appears to have developed faster than reality.
• Looking forward, everything will develop faster than you predict.
• Hence, we are all very poor at predicting beyond the next 5 years. Examples:
  • The Next Fifty Years: Science in the First Half of the Twenty-first Century, John Brockman (Editor)
  • CACM Volume 40, Issue 2 (February 1997)

"This is like deja vu all over again."

Can I even do 5 years?

Bourne, Bioinformatics Editorial (1999) 15(9):715:
"Over the next 5 years there will be an estimated 10 major structural genomics efforts each yielding 200 structures per year.
While these efforts will deplete regular structure determination efforts, improvements in technology and a general expansion of the field will continue to yield 50 structures per week worldwide outside of the structural genomics initiatives."
• Net result: 35,000 structures by 2005
• There were 11,000 structures at the time of this prediction

"You can observe a lot just by watching."

[Figure: PDB growth curve]
• Approx. 24,000 structures today
• In 2003, approx. 5,000 structures were deposited

History

Predictions Can Be Good

From Where Do I Draw My Predictions?
• As an Associate Editor of Bioinformatics
• From work as President of ISCB
• As an Editor of Proteins: Structure, Function and Bioinformatics
• From reviewing many conference papers and organizing a variety of conferences
• From history thus far, including my own long career

So Let Us Review the History of Bioinformatics Thus Far: General Observations
• A relatively new term for a scientific endeavor that has been around much longer
• Medical informatics preceded it, and defined some of the foundations?
• A scientific endeavor driven by a paradigm shift in which biology became a data-driven science
• A scientific endeavor that has gained from fundamental developments in computer and information science, e.g., algorithms, ontologies, Bayesian networks, neural networks, text mining

"Do you mean now?" (when asked for the time)

A More Specific Chronology: Pre-1970 (Bioinformatics (2003) 19:2176-2190)
• 1945 Biochemical Pathways: Horowitz
• 1953 Structure of DNA: Watson & Crick
• 1969 Genetic Variation
• 1962 Molecular Homology: Florkin
• 1965 Evolutionary Patterns: Pauling
• 1966 Molecular Modeling: Levinthal
• 1967 Phylogenetic Trees: Fitch
• 1969 Properties: Ptitsyn
• 1970 Dynamic Programming: Needleman & Wunsch
• 1953 Game Theory: von Neumann and Morgenstern
• 1959 Grammars: Chomsky
• 1962 Information Theory: Shannon & Weaver
• 1966 Cellular Automata: von Neumann
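The 1970 dynamic programming entry (Needleman & Wunsch) is the basis of global sequence alignment. Below is a minimal Python sketch of the idea. It assumes a toy scoring scheme chosen only for illustration: match +1, mismatch -1, and a linear gap penalty of -1. Practical tools use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b by dynamic programming (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, 'ACGT', 'A-GT')`. The local-alignment variant (Smith-Waterman) uses the same recurrence, but it clamps cell scores at zero and traces back from the highest-scoring cell.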
A More Specific Chronology: 1970s
• Problem definition
• Structural patterns and properties: Richards
• Improved sequence alignments: Sankoff; Smith-Waterman algorithm
• Structure prediction: Levitt; Chou and Fasman; Scheraga
• Exons/introns: Gilbert
• Public resources: Dayhoff; PDB

A More Specific Chronology: 1980s
• Computational biology emerges
• Domains recognized: Rashin
• Neural nets: Hopfield
• Tree of Life emerges
• Molecular computing: Conrad
• FASTA: Lipman & Pearson
• Nanotechnology: Drexler
• Profiles: Gribskov
• Reductionism begins: Thornton; Sander
• Clustering: Shepard
• Relational databases
• Networks: EMBLnet, BIONET

A More Specific Chronology: 1990s
• Bioinformatics and biotechnology emerge
• Internet/Web
• Human Genome Project

[Figure: Growth in ISMB, with a labelled value of 1500 at ISMB 2002, Edmonton, Canada. Source: http://www.iscb.org/history.shtml]

[Figure: Data for the journal Bioinformatics, 1997-2003. Panels: submissions per year; impact factor.]

Bioinformatics: A Vice Chancellor's View
[Figure: biological experiment → collect → data → characterize → information → compare → knowledge → model/infer → discovery, set against complexity, computing power, sequencing technology (small genome/month to human genome) and year (1990-2005). Milestones shown include ESTs, gene chips, the yeast, E. coli and C. elegans genomes, the Human Genome Project, genetic circuits, the metabolic pathway of E. coli, virus and ribosome structures, sub-cellular structure, cellular assembly, neuronal and cardiac modeling, organ modeling and brain mapping. (C) Copyright Phil Bourne 1998]

If Data Is Central to the Future of Bioinformatics, Let Us Take a Data-Centric View of the Future
• Data complexity
• High-throughput data collection
• Database vs. literature
• Bioinformatics as data driver
• Data representation
• Data integration
"If you come to a fork in the road, take it."

Numbers and Complexity
Complexity is increasing: (a) myoglobin, (b) hemoglobin, (c) lysozyme, (d) transfer RNA, (e) antibodies, (f) viruses, (g) actin, (h) the nucleosome, (i) myosin, (j) the ribosome. Courtesy of David Goodsell, TSRI.

Complexity: The Ribosome, a Nanomachine
• Translates mRNA into protein
• Molecular mass: 2.6 million
• Maximum dimension: ~25 nm
• 2/3 RNA: performs catalysis
• 1/3 protein: outer scaffold for the RNA
[Figure: the 50S and 30S subunits, with protein and mRNA labelled; from J. Frank, Wadsworth Center, NY]
"The ribosome, together with its accessories, is probably the most sophisticated machine ever made." R. Garrett (1999) Nature 400

High Throughput: The Structural Genomics Pipeline (X-ray Crystallography)
Basic steps: Target Selection → Crystallomics (isolation, expression, purification, crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish (No?)
Bioinformatics along the pipeline:
• Target selection: distant homologs, domain recognition
• Crystallomics: empirical rules, automation
• Data collection: automation, better sources
• Structure solution: MAD phasing, software integration, decision support, automated fitting
• Functional annotation: alignments, protein-protein interactions, protein-ligand interactions, motif recognition
Bioinformatics is needed throughout the process.

An Aside on the Future of Publishing
Full description captured as the paper is written or the database entry is deposited. Does away with ...?
Oops! β sandwich? Where? Large loop? Which one?? Loop-sheet-helix???
"… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif ..." (Science Vol. 265, p. 346)
[Figure: corresponding structure from the PDB, 1TSR]

BioEditor: A DTD-Driven, Domain-Specific Editor (http://bioeditor.sdsc.edu)

Structural Genomics Targets and Their Status, from http://targetdb.rcsb.org (Bourne et al. 2004, Pacific Symposium on Biocomputing)
The Data-Bioinformatics Cycle
• Bioinformatics turns data into knowledge
• Bioinformatics turns knowledge into new data requirements
• Result: computation and experiment become more synergistic

Deuterium Exchange Mass Spec to Predict Structure
[Figure: workflow combining DXMS data with threading. Labels: Target Protein, Structure Templates, CASP, DXMS, Threading, COREX, k (Stability), Amino Acid Profile Match Method, Best Structure(s)]

Biological Representation
The Gene Ontology changes everything:
• Molecular function
• Biological process
• Cellular component (location)
• Structured as a DAG, so it is machine-usable
The number of papers referencing the Gene Ontology has increased dramatically in the last year.

Biological Data Representation: Future
• Tools to construct ontologies from free text?
• Ontologies for details of function, protein-protein interaction, protocols, complete pathway information
Consider an example from structural genomics.

PEBCdb
• Extends the content of TargetDB: status history and stop conditions; protocols for cloning, expression, purification, crystallization, and NMR
• Extends TargetDB search to the new content
• Reports provide links to status history, related protocols, the project, and sequence and domain databases

Incremental Data Pipeline: Example of a Workflow Environment

Research Challenges
• Portable and extensible LIMS
• Controlled vocabulary for protocols
• Heuristics for experimental design
• Quality control
• Data mining to improve protocols

Data Integration
Web Services: the holy grail of interoperability?

Web Services
• It's not CORBA: biologists can do it
• Easy to implement
• Platform independent
• Drives data providers to define and publish a detailed API
• Compelling: introduces the prospect of global workflow
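The Gene Ontology slide makes the point that GO is a DAG and therefore machine-usable. A term can have several parents, so software can work out every broader term that an annotation implies by following all the parent links. The Python sketch below shows this. The term names and links in it are a hypothetical toy fragment made up for illustration. They are not real GO content or GO identifiers.

```python
from collections import deque

# Hypothetical toy ontology fragment: child term -> list of parent terms.
# A term may have more than one parent, which is why GO is a DAG, not a tree.
PARENTS: dict[str, list[str]] = {
    "protein kinase activity": ["kinase activity", "catalytic activity on a protein"],
    "lipid kinase activity": ["kinase activity"],
    "kinase activity": ["transferase activity"],
    "catalytic activity on a protein": ["catalytic activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable from `term` by following parent links (breadth-first)."""
    seen: set[str] = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:          # shared ancestors are visited only once
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def shared_terms(term_a: str, term_b: str) -> set[str]:
    """Terms implied by both annotations (each term counts as implying itself)."""
    return (ancestors(term_a) | {term_a}) & (ancestors(term_b) | {term_b})
```

With this fragment, a gene product annotated only with "protein kinase activity" is also implicitly annotated with "kinase activity", "transferase activity", "catalytic activity", and the rest. This propagation of annotations up the DAG is what makes it possible to compare annotations automatically.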
Perl Web Services Client Example
A small Perl program to query all PubMed abstracts containing a given word (here 'ferritin'); the service returns the IDs of the matching PDB entries:

#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;

# Call the remote pubmedAbstractQuery method with the word given on the command line
my $ids_ref = SOAP::Lite
    -> uri('http://server.location.edu/pdbWebServices')
    -> proxy('http://server.location.edu/pdbWebServices')
    -> pubmedAbstractQuery($ARGV[0])
    -> result;

# The result is a reference to an array of PDB IDs
my @ids = @{$ids_ref};
print "@ids\n";

Mycomputer(1)% web_service.pl ferritin
1AEW 1AQO 1BCF 1BFR 1BG7 1DPS 1EUM 1FHA 1JGC 1JI5 1JIG 1MFR 1QGH 1RCC 1RCD 1RCE 1RCG 1RCI 1RYT 2FHA

A Biological Complexity Perspective
[Figure: scales of biological organization, from atoms & molecules through biopolymers, macromolecules and cells to organs and organisms. Representative disciplines: anatomy, physiology, cell biology, proteomics, genomics, medicinal chemistry. Examples: heart, neuron, structure, sequence, protease inhibitor. Representative technologies: MRI, migratory sensors, ventricular modeling, electron microscopy, X-ray crystallography, protein docking. Side labels: Scientific Research & Discovery, Infrastructure, Technologies, Training. A "You Are Here" marker is shown.]

Let Us Focus on the Near Future

Computational Biology/Bioengineering in the Post-Genomic Era
The "New" Central Dogma: Genomes → Gene Products → Structure & Function → Pathways & Physiology
• Scientific challenges: deciphering the genome, mapping genotype-phenotype relationships, dissecting organismic function, engineering organisms with altered functionality, figuring out complex traits and polymorphism, understanding physiology.
• Algorithmic challenges: comparison of whole and partial genomes, metrics for similarity and homology, metabolic reconstruction, dissecting pathways, and whole-cell modeling.
• Computational challenges: creation of the informatics infrastructure; creation, annotation, curation and dissemination of databases; development of parallel
Our understanding of biological complexity is not reflected in the current generation of biological data resources Consider an example the protein kinase-like superfamily Jan. 19, 2004 APBC 04 The SCOP Classification Hierarchy Courtesy Steven Brenner Jan. 19, 2004 APBC 04 An Example of a Structural Superfamily: The Protein Kinase-Like Superfamily SCOP grouping for kinases 1) Class: Alpha+Beta 2) Fold: Protein Kinase Catalytic Core 3) Superfamily: Protein Kinase Catalytic Core 4) Families: 7 8 a) Ser/Thr Kinases b) Tyr Kinases c) Atypical Kinases d) Antibiotic Kinases e) Lipid Kinases Superfamily: not all eukaryotic or protein kinases: some homologues discovered in bacteria that phosphorylate antibiotics, others phosphorylate lipids Jan. 19, 2004 Typical Kinase Core (c-Src, PDB ID: 2SRC) APBC 04 Evolution of the Kinase Superfamily: Comparison of Three Superfamily Members •A: Casein kinase 1 (PDB ID: 1CSN) •B: Aminoglycoside kinase (PDB ID: 1J7L) •C: Phosphatidylinositol 3kinase (PDB ID: 1E8X). •D: The previous three structures with only their shared region superposed (1CSN: light blue, 1J7L: red, 1E8X: yellow). •The three kinases share a minimal core required for ATP binding and phosphotransfer. Jan. 19, 2004 APBC 04 Our Algorithms Need to Continue to Evolve and there is the Real Need for Quality Control Consider structure comparison and alignment of the diverse protein kinases Jan. 19, 2004 APBC 04 An Example of Manual vs. Automated with Combinatorial Extension (CE) •The manual alignment can be used to better understand the limitations of our automated method •Alignment of helix C of two tyrosine kinases •Insulin Receptor Kinase (pdb id 1IR3) •c-Src (pdb id 2SRC) •Can be aligned with 40% ident, 3.0Å RMSD •In Src, C-helix is displaced and rotated outward •Rotation pushes n-terminal end of helix out very far from n-terminal end of IRK •CE gaps a part of this (yellow), splitting helix, aligning part of IRK helix C with loop leading to helix C in Src Jan. 
[Figure: CE alignment of helix C. Orange: IRK; blue: c-Src; yellow: CE gap region]

An Example of Manual vs. Automated Alignment with CE
• A closer look at the CE alignment
• The CE alignment puts closer C-alpha positions together but does not respect helical relationships
• The hand alignment respects the helix but aligns more distant C-alpha positions
[Figure: hand alignment]

Improving CEfam: Multiple Alignments with CE
• Example with strands 1 and 2 of the kinase superfamily: A: original; B: optimal parameters; C: manual
• The optimized parameters also improved results for other protein superfamilies, judged by visual analysis
• Just as sequence alignments are benchmarked against structure alignments, structure alignments should be benchmarked against manual results
• The improvement from this optimization is now being folded into the next generation of CE

Quality Control
Consider an example: the definition of domains from 3-D structure.

The 3D Domain Assignment Problem
A domain is a fundamental structural, functional and evolutionary unit of a protein. Domains:
• Are compact
• Are stable
• Have a hydrophobic core
• Fold independently
• Perform a specific function
• Can be re-shuffled and put together in different combinations
Evolution works at the level of the domain.

Exact assignment of domains remains a difficult and unresolved problem. There is no complete agreement among experts on domain assignment for a given protein structure. Expert methods agree on 80% of all existing manual assignments; the remaining 20% represent "difficult" cases.
[Figure: expert assignments #1, #2 and #3]

Manual vs. Automatic Consensuses: Do They Overlap?
• Chains with manual consensus: 375 (80% of the entire dataset)
• Chains with automatic consensus: 374 (80% of the entire dataset)
• Chains with consensus (automatic or manual): 424 (90.6% of the entire dataset)
• Automatic consensus only: 46 chains (10.9% of chains with consensus)
• Manual consensus only: 47 chains (11.1% of chains with consensus)
• Manual and automatic consensus agree: 328 chains (77.3% of chains with consensus)
• Automatic and manual consensus disagree: 3 chains (0.7% of chains with consensus)
Veretnik et al. 2003, JMB, submitted

1cjaa (actin-fragmin kinase, slime mold): an unusual kinase [complex interface]
• SCOP, PDP, DomainParser: 1 domain
• CATH: 1 domain + unassigned
• DALI: 4 domains
[Figure: comparison with a typical kinase]

Outstanding Problems in Sequence Analysis & Comparison
• Exon recognition
• Protein-coding gene modeling
• Protein/EST alignment
• Large-scale sequence comparison and alignment
• Synteny recognition
• Polymorphism/variation detection
• Regulatory pattern recognition
• Repetitive DNA characterization
• RNA gene modeling

Exemplar Bioinformatics Problems
1. Full genome comparisons
2. Rapid assessment of polymorphic variations
3. Complete construction of orthologous and paralogous groups
4. Structure resolution of large assemblies/complexes
5. Dynamical simulation of realistic systems
6. Rapid structural/topological clustering of proteins
7. Protein folding
8. Computer simulation of membrane insertion
9. Simulation of cellular pathways/sensitivity analysis of pathway stoichiometry and kinetics
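The comparison of manual and automatic domain consensuses above comes down to putting each chain into one of four categories. The Python sketch below does that classification. It assumes each consensus is a mapping from chain ID to the agreed number of domains, and that a chain with no consensus is simply missing from the mapping. Comparing domain counts instead of domain boundaries is a simplification made for illustration. The chain IDs used with the sketch would be hypothetical.

```python
from collections import Counter


def compare_consensus(manual: dict[str, int], automatic: dict[str, int]) -> Counter:
    """Classify every chain that has a manual and/or automatic consensus.

    Each dict maps chain ID -> consensus number of domains; chains without
    a consensus are simply absent from that dict.
    """
    counts: Counter = Counter()
    for chain in manual.keys() | automatic.keys():
        if chain not in automatic:
            counts["manual only"] += 1
        elif chain not in manual:
            counts["automatic only"] += 1
        elif manual[chain] == automatic[chain]:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


def as_percentages(counts: Counter) -> dict[str, float]:
    """Each category as a percentage of chains with any consensus."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}
```

The four categories are disjoint, so together they account for every chain with a consensus. On the slide, 46 + 47 + 328 + 3 = 424.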
Bringing the Data View and the Complexity View Together to Define the Bioinformatics "Engineering" Challenge
• Easy access to any type of biological data across databases
• Ability to go across databases and types of data
• Rapidly infer knowledge from new genome sequences
• Find relationships between sequence, structure and function of gene products
• Relate genotype to phenotype in species
• Access and apply polymorphism data seamlessly
• A single computer interface (Web browser?)
• Computer platform independence
• Format differences completely hidden from the user
• Compute in a point-and-click mode
• Seamless access to files, file uploads and downloads
• Multimedia capabilities in the interface
• Ability to integrate new tools/databases painlessly

Consider a Couple of Approaches

integrated Genomic Annotation Pipeline (iGAP)
Structure information (SCOP, PDB) is used to build FOLDLIB:
• PDB chains, SCOP domains, PDP domains, CE matches of PDB vs. SCOP
• 90% sequence non-identical; minimum size 25 aa; coverage (90%, gaps <30, ends <30)
Sequence information: deduced protein sequences; NR, PFAM
Pipeline steps:
• Prediction of signal peptides (SignalP, PSORT), transmembrane regions (TMHMM, PSORT), coiled coils (COILS) and low-complexity regions (SEG)
• Create PSI-BLAST profiles for the protein sequences
• Structural assignment of domains by PSI-BLAST on FOLDLIB
• Structural assignment of domains by 123D on FOLDLIB (only sequences without an A-prediction)
• Functional assignment by PFAM, NR and PSIPred assignments (only sequences without an A-prediction)
• Domain location prediction by sequence
• Store assigned regions in the DB
[Figure: the same pipeline annotated with scale and cost: ~800 genomes at 10k-20k ORFs each (~10^7 ORFs); 10^4 entries; per-step costs on the slide of 4, 228, 3, 9, 252 and 3 CPU years. Li et al. (2003) Genome Biology]

Towards Workflows and the Grid
[Figure: iGAP passes executables, parameters, input, output and resources (described in XML) to the APST scheduler. APST uses grid resource information (MDS/NWS/Ganglia), a data manager (SCP/GASS/SRB/FTP) and a compute manager (SSH/GRAM/GASS) over grid middleware, reaching storage and compute resources (PBS/LoadLeveler/Condor).]

The EOL Grid Consortium
SDSC Blue Horizon; the EOL cluster; Sun Enterprise Server; industrial partners; IBM; Ceres; BII Singapore; Encyclopedia Proteomics Inc.; Titech Japan

Collaboration: A New Direction
• In the past: isolation
• Now: collaboration

Beyond Collaboration with Other Bioinformaticists Is Collaboration with Biologists
We need to overcome the "high noon" problem.

High Noon: A Working Definition (12:00)
• The cost:benefit ratio of entry to bioinformatics tools and resources is too high for the majority of biologists
• Thus, those who could gain and contribute the most from the services provided are not users

One Approach: MBT
• Java toolkit for developing custom molecular visualization applications
• High-quality interactive rendering of sequence, structure and function
http://mbt.sdsc.edu
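In the iGAP pipeline, the structural assignment steps run as a cascade. Each later and more expensive method (e.g., 123D after PSI-BLAST) is applied only to the sequences that the earlier steps left without an assignment. The Python sketch below shows that control flow. The two assigner functions are hypothetical stand-ins based on simple substring rules. They are not PSI-BLAST or 123D, and they are not the pipeline's actual code.

```python
from typing import Callable, Optional

# An assigner takes a protein sequence and returns a fold label, or None.
Assigner = Callable[[str], Optional[str]]


def cascade(sequences: dict[str, str],
            stages: list[tuple[str, Assigner]]) -> dict[str, tuple[str, str]]:
    """Run stages in order; each stage sees only still-unassigned sequences.

    Returns seq_id -> (name of the stage that assigned it, assignment).
    """
    assigned: dict[str, tuple[str, str]] = {}
    for stage_name, assigner in stages:
        pending = {sid: s for sid, s in sequences.items() if sid not in assigned}
        for sid, seq in pending.items():
            result = assigner(seq)
            if result is not None:
                assigned[sid] = (stage_name, result)
    return assigned


# Hypothetical stand-ins: a "fast" first stage and a "slower" fallback stage.
def toy_profile_search(seq: str) -> Optional[str]:
    return "fold_X" if "GKS" in seq else None


def toy_threading(seq: str) -> Optional[str]:
    return "fold_Y" if seq.startswith("M") else None
```

Running the costly stage only on the sequences left unassigned limits the total CPU time it consumes. This matters at the scale of ~10^7 ORFs, where individual steps cost years of CPU time.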
MBT Functionality
Provides:
• Data loading
  • Local files (PDB, mmCIF, FASTA, etc.)
  • Compressed files (zip, gzip)
  • Remote (http, ftp, OpenMMS?, EJB?)
• Efficient data access
  • Raw data
  • Derived data (StructureMap)
• Visualization (plug-in viewers)

MBT Architecture
[Figure: MBT architecture]

Future: The Structure Should Be the User Interface
• Ligand: what other entries contain this?
• Chain: what other entries have chains with >90% sequence identity?
• Residue: what is the environment of this residue?

Phenomena in biological systems may be organized in several layers:
• Populations: ecological communities; populations of a species
• Physiology and organisms: integrative physiology, homeostasis; organs, tissues; cells
• Pathways and information transfer: integrated metabolic, regulatory and developmental pathways; simple pathways for information transfer, regulation and development; simple metabolic pathways for creating and using other molecules
• Biological macromolecules and structures: biomolecular assemblies; ligand-receptor complexes; molecules and structures created by genes and gene products; gene products: RNAs, proteins
• Genes and genomes
• Physics and chemistry: e.g., physical chemistry, organic chemistry, information theory, constraints of self-assembling adaptive systems
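One of the proposed structure-as-interface queries is "Residue: what is the environment of this residue?" At its core, that query is a distance search around the selected residue. The Python sketch below implements that search. It assumes atoms are available as simple records with coordinates in Å. The `Atom` record is hypothetical and is not MBT's API. The 4.0 Å default cutoff is likewise chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    chain: str
    res_num: int
    res_name: str
    name: str
    xyz: tuple[float, float, float]


def residue_environment(atoms: list[Atom], chain: str, res_num: int,
                        cutoff: float = 4.0) -> set[tuple[str, int, str]]:
    """Residues with any atom within `cutoff` Å of any atom of the selected residue."""
    selected = [a for a in atoms if a.chain == chain and a.res_num == res_num]
    neighbours: set[tuple[str, int, str]] = set()
    for other in atoms:
        if other.chain == chain and other.res_num == res_num:
            continue  # skip the selected residue itself
        if any(math.dist(a.xyz, other.xyz) <= cutoff for a in selected):
            neighbours.add((other.chain, other.res_num, other.res_name))
    return neighbours
```

The brute-force scan compares every atom against the selected residue, which is adequate for a single interactive query. Scanning many residues across whole entries would call for a spatial grid or k-d tree instead.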
Each system layer builds on lower system layers and acquires new emergent properties.
[Figure: the same layers grouped as Ecological Processes & Populations; Tissue & Organismal Physiology; Developmental & Physiological Processes; Biochemical Pathways & Processes; Biomolecular Structure & Function; Genes, Information and Genomes; Physics and Chemistry; with the label "New Emergent Properties"]

The Next Response
• Translational medicine
• Personalized medicine
• Merger of medical informatics, cheminformatics and bioinformatics
• Training in cooperative in silico and experimental research
• Centers that reflect that training, i.e., different from NCBI or EBI

"Think! How the hell are you gonna think and hit at the same time?"

Statement of the Director, NIGMS, before the House Appropriations Subcommittee on Labor, HHS, Education, Thursday, February 25, 1999

The Next Response (cont.)
• Continued development of scientific societies
• Simulations used in the clinical setting
• New diagnostic procedures
• Ubiquitous large-scale computing on large data
• More systematized drug discovery

"I knew I was going to take the wrong train, so I left early."
Evolution of Complex Systems
• Computers: complexity doubles every 18 months per $$$ (Moore's Law)
• Human brain: very slow (complexity doubles in ~100,000 years)

System | Short-term storage | Long-term storage | Speed | Cost
PC cluster (256 units) | 65 GB | 5 TB | 256 GFLOP | $130K
Human brain (average) | 57 TB | 1137 TB | 4.4 TFLOP | $130K

Complexity = Speed × Memory
• Computer = 5 TB × 256 GFLOP ≈ 10^24 memory FLOPs
• Brain = 1137 TB × 4.4 TFLOP ≈ 5×10^27 memory FLOPs
• Brain/Computer = 5×10^3, or 3.7 log units
• Moore's Law: 3.5 years/log unit
• Human brain capacity for computers will be reached in 2000 + 3.7 × 3.5 ≈ 2013
Based on Ramsey, 1997

Acknowledgements
• To all those who have chosen bioinformatics as a career and make the field so rich
• Particularly those who do so for lesser rewards: the data providers and annotators
• My group, for the fun we had discussing this topic

http://rinkworks.com/said/yogiberra.shtml

"I didn't really say everything I said."
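The extrapolation on the "Evolution of Complex Systems" slide can be re-run directly. The Python sketch below uses the slide's figures and assumes decimal units (1 TB = 10^12, 1 GFLOP = 10^9). Without rounding, the brain/computer ratio comes out at about 3.6 log units rather than 3.7. Note also that an 18-month doubling time corresponds to about 5 years per log unit, whereas the slide's estimate uses 3.5.

```python
import math

# Figures from the slide (decimal units assumed: 1 TB = 1e12, 1 GFLOP = 1e9).
cluster_memory = 5e12      # PC cluster: 5 TB long-term storage
cluster_speed = 256e9      # PC cluster: 256 GFLOP
brain_memory = 1137e12     # human brain: 1137 TB long-term storage
brain_speed = 4.4e12       # human brain: 4.4 TFLOP

# "Complexity = Speed x Memory"
computer = cluster_memory * cluster_speed
brain = brain_memory * brain_speed
gap_log_units = math.log10(brain / computer)

# Years per factor-of-10 gain: the slide's value vs. one implied by 18-month doubling.
slide_years_per_log = 3.5
doubling_years_per_log = 1.5 * math.log2(10)

print(f"computer ~ {computer:.2e}, brain ~ {brain:.2e}")
print(f"gap = {gap_log_units:.2f} log units")
print(f"slide rate (3.7 log units x 3.5 years): year {2000 + 3.7 * slide_years_per_log:.0f}")
print(f"18-month doubling: {doubling_years_per_log:.2f} years/log unit, "
      f"year {2000 + gap_log_units * doubling_years_per_log:.0f}")
```

The slide's own numbers reproduce its 2013 date. With an 18-month doubling and the unrounded gap, the same method points to roughly 2018. This illustrates how sensitive such extrapolations are to the assumed growth rate.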