The Elucidation of Regulatory Networks in Complex Biological Systems: The Convergence of Biology, Medicine and Computing
G. Poste
Stanford University, 15 March 2002
gposte@healthtechnetwork.com

The Analysis and Application of Principles of Biological Design
1750-1980
• the descriptive narrative
• empirical technology
• mechanistic reductionism
1980-2010
• the encoded information content of biological systems
• mapping the basis of biological variation
• rational medicine and customized care
[diagram labels: biology; chemistry; genomics; computing; systems biology]

Biology and Medicine as Information-Based Sciences

From Reductionism to Integrated Systems Biology
• individual genes and proteins → biological circuits, pathways and networks
• molecular interactions in simple systems → assembly of higher order systems
• limited, fragmented datasets → massive, integrated datasets
• poor annotation → stringent, standardised annotation
• limited capacity for predictive simulation → robust algorithms for predictive biology; biology in silico
• analog information → digital information

21st Century Biology and Medicine
“SYSTEMS BIOLOGY”
• the design principles of biological order and complexity
• mapping the information content of biopathways and networks
“BIG BIOLOGY”
• interdisciplinary, massive datasets, information-based
• infrastructure, investment and education
[diagram labels: Biotechnology and Systems Biology; New Analytical Capabilities; Large Scale Computing]

Convergence : The Technological Platforms Shaping the Evolution of Healthcare
[diagram labels: Rule-Based Design Principles; Computational Biology; Biotechnology and Systems Biology; New Analytical Capabilities; Exploring “Biospace”; Large Scale Computing; Automation Engineering and Robotics; Materials Science; Micro-/OptoElectronics]

From Reductionism to Integrated Systems Biology
• understanding the information content encoded in biological networks
• mapping the design rules for progressively greater complexity of biological order
  - gene(s)
  - pathways, circuits and
networks
  - progressively ordered assemblies: organelles, cells, tissues, organs
  - homeostatic integration of myriad, complex, interactive networks (Physiology)

High Level Abstraction of Biological Pathways and Network Systems
Encoded Information
Pathways and Networks
Rule Sets
Plasticity
• adaptive fitness
• pathological perturbation
Predictive Biology
• directed evolution
• biology in silico
Novel Biospace and Carbon : Silicon Union

Global and Nodal Pathway Map of Genomic and Proteomic Elements in Yeast Galactose Utilization
From: T. Ideker et al. 2001. Science 292, 929

Genetic Networks
• bioinformation processing involves leverage of interactive feedback loops in diverse domains
  - physical, chemical, electrical
• genomic and proteomic codes represent a dense network of nested hyperlinks
• matter becomes code

Nonlinear Complexity in Biological Systems
• distinct classes of nonlinear interactions
• long-range (fractal) correlations
• self-similarity, self-dissimilarity and organized criticality
• pattern formation
• complex adaptive networks
• highly optimized tolerance = robustness with fragility
• barriers to cascading failures
• deterministic chaos
• emergent properties

Nonlinear Complexity in Biological Systems
• abrupt changes
  - bifurcations; intermittency/bursting; bistability/multistability; phase transitions
• nonlinear oscillations
  - limit cycles; phase-resetting; entrainment
• nonlinear waves
  - spirals; scrolls; solitons
• complex periodic cycles and quasiperiodicities
• scale invariance
  - fractal and multifractal scaling; long-range correlations; self-organized criticality
• stochastic resonance and related noise-modulated mechanisms
• time irreversibility

Information and Technology Platform Overload

Principal Themes in the Analysis of Biological Systems
• large scale
• miniaturization
• automation
• parallelism
• networked systems
• real time, interactive, adaptive

Major Technology Gaps
• rapid gene ID in complex genomes
• structural genomics and protein structure-function prediction
• mapping the proteome -
abundance, modification, localisation and protein-protein interactions
  - large scale parallelism (protein-arrays)
  - small organic molecule networks
• mapping the metabolome
  - circuits, modules, networks
• robust predictive algorithms for ADMET profiling of drug candidate SAR

The Need for Standards and Stringent Semantics
“... without which ... wanton and luxuriant fancies climbing up into the Bed of Reason, do not only defile it by unchaste and illegitimate embraces, but instead of real conceptions and notices of things do impregnate the mind with nothing but Ayerie and Subventaneous Phantasmes”
Samuel Parker, FRS 1666
standards standards STANDARDS

The Analysis and Comprehension of Biological Systems
descriptive ignorance
initial mechanistic insights
complexity
• elucidation of patterns
• defining rule sets
defined rule sets
• disease heterogeneity
• patient heterogeneity
• disease predisposition
burgeoning, bewildering complexity
• elegant simplicity revealed
• predictive biology
• right Rx : right disease
• right Rx : right patient
• from reactive treatment to proactive prevention

Integrated Distributed Heterogeneous Databases and Databanks
• molecular phylogenies and genealogy
• chemical SAR
• biological order
• population genetics
• clinical databanks
• data warehousing and data mining
• evolving hardware and electronic evolution
• object-oriented and pattern / spatial array recognition
• Expert Systems and Knowledge Management
• human-computer interface systems

Convergence, Consilience, Cognition and Computing
• more science
• better science
• faster science
• cross-disciplinary science
• interdisciplinary convergence
• technological convergence
• corporate convergence

MEGADATA
The Scalability Crisis
[chart axes: Volume; Performance]
• burgeoning data volumes
• more transactions
• increasing diversity of datasets/apps
• expanding user communities
• pressures on network bandwidth
• complexity of distributed environments
• rising performance expectations
• confidentiality and privacy

Major Challenges
for Life Sciences Computing
• exponentially growing data repositories (10² TB/PB)
• highly variable data formats and standards as obstacles to data access and mining
• inadequate attention to data QC/annotation standards
• excessive reliance on customized solutions and fragmented data sources
• inadequate access and integration of public and private datasets
• primitive data visualization tools
• 80% of time spent on data preparation tasks and 20% on productive exploration

Major Challenges for Life Sciences Computing
• Big Biology infrastructure scale and capital investment
• new tools for mining, visualization, simulation
• data storage conventions and technologies
• dynamic, adaptive, scalable systems
• active networks
  - software into the network
  - subnet interoperability
  - integration of distributed and collaborative working environments
• fast data access at all levels
  - storage, I/O and networks to support analysis and simulation
• expanded bandwidth for high usage and high transfer rates

Bracing For the Inevitable : Petabyte-Size Databases
• 1000 terabytes
• 250 billion text pages
• 20 million four drawer filing cabinets
• 2000 mile high tower of 1 billion diskettes
• typical US consumer generates 100 Gbytes personal data/lifetime
  - education, insurance, credit, medical
• 100 million consumers = 10,000 petabytes

Data Grids
• from Napster and Gnutella
• to ubiquitous peer-to-peer exchange of data sets
• to apportioned distributed computing for solutions of computationally massive problems

Informatics for Big Biology and e.Health Networks
• instructive precedents in high end computing from other disciplines
  - cosmology, quantum chromodynamics, climate research, materials
Europe
• UNICORE
• Pangea
• E-Science
• LHC Challenge
• E-Grid
USA
• Scientific Simulation Initiative
• National Computational Science Alliance
• Long Term Ecological Research
• NASA, DOE, NOAA
• Accelerated Strategic Computing Initiative
• Grid Physics Network

The Bibliome
Proof, logic and ontology languages
• shared terms/
terminology
• machine-machine communication
• inter-memetic translation
• self-evolving translators
• Resource Description Framework
• eXtensible Markup Language
• Metadata tagging standards for interoperable distributed archives
• self-assembling datasets
• self-describing documents
• HyperText Markup Language
• HyperText Transfer Protocol
• The first generation Web
The Global Virtual Archive / Universal Knowledge Web
[layer labels: Metadata; WWW I]
Modified from: T. Berners-Lee and J. Hendler, Nature 2001, 410, 1023

Standardized Lexical Foundations for the Annotation, Archiving and Analysis of Complex Biological Systems
• unique complexity of biological systems
• multiple levels of abstraction
  - organismal
  - ecosystem dynamics
  - social/memetic networks
• qualitative not quantitative data
  - diversity of experimental conditions
  - inaccessibility/replication of experimental conditions
• upgrading to hybrid qualitative/quantitative analysis tools

Standardized Lexical Foundations for the Annotation, Archiving and Analysis of Complex Biological Systems
• entity classes : finite elements
• action properties : state properties
• intramolecular site interactions
• intermolecular site interactions
• massively parallel networks : unit modules
• continuum systems
• compartments
• economy and parsimony
• evolutionary relationships
• network pathways
  - redundancy (degeneracy), pleiotropy
  - complex emergent properties
• submodels for searchable characteristics of functional knowledge
• integration of submodels into web-based distributed model networks

Jabberwocky
“’Twas brillig and the
slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves
And the mome raths outgrabe”
Lewis Carroll

The Divide Between Syntax and Semantics
“Colorless green ideas sleep furiously”
Noam Chomsky (1957)
syntactically valid, semantically void
• encoded genome structure (syntax) and diverse expression repertoires (semantics)
  - alternative splicing
  - overlapping reading frames
  - nonsense mutations
  - differential modulation by different transcription factors
• database formats (syntax) and ontology (semantics)

The Conceptual Complexity of Ontology Design
• ontology
  - set of axioms in a logical language
  - representational vocabulary with precise definitions of shared understanding
  - axioms constrain interpretation of defined terms
• XML versus ontology and evolution of the semantic web
  - XML less complex since semantics are not represented
  - objective to reduce uncertainty favors ontologies
  - objective to reduce complexity favors XML

Convergence, Consilience, Cognition and Computing
scientific, technological and economic convergence
data scale; data diversity; data complexity
optimized data representation
• adaptive IT
• novel emergent computing networks
optimized data comprehension
• novel visualization and mining tools
• modulation of brain function for optimum perceptualization
optimized data utilization
• ‘mind in the loop’
• human medicine interfaces

Bounded Rationality
• human mind’s processing capacity is small relative to the size of the problems requiring analysis/comprehension (Simon)
• objective solutions require complexity reduction in information, task and coordination
• complexity reduction
  - omission and abstraction
  - division of labor (systems decomposition)
• complexity reduction simultaneously increases uncertainty (Fox)
• implications for evolution of ontologies for the semantic web

Enhancing Human Cognitive Capacities for Optimizing Information Utilization
• escalating quantities and types of information
• real time decision making
• new multi-modal, multi-sensory high performance human : information interfaces
• representation and comprehensibility of information flows
  - optimize information representation (perception)
  - modulation of brain function to optimize comprehension
• systemic application of advances in cognitive neurobiology

Enhancing Human Cognitive Capacities for Optimizing Information Utilization
• optimizing representations of information
  - perceptualization
• optimizing cognitive capacities
  - states of the brain affect states of mind (perception and cognition)
  - perceptual modulation techniques

Interdisciplinary Linguistics : Memetic Engineering
• molspeak, medspeak, nerdspeak
• standardization
• coding
• speech recognition
• object-oriented computing
• synthetic intelligence

Molecular Medicine, Population Segmentation and Targeted Patient Care

Population Genetics
• large-scale population genetics
• geno-phenotype correlations in subpopulations
• ‘at-risk’ subpopulations
• individual risk profiling

Linking Clinical Outcomes to Genetic Variation
[diagram labels: population genetics; haplotype blocks; SNP maps; low cost high-throughput genotyping; dbases; informatics; gene-disease associations; ethics]

Large-Scale Disease Association Genetics and Disease Predisposition Risk Profiling
• formidable logistics and cost
• robust algorithms for combinatorial gene interactions
• slow evolution
• complex ethical, legal and social issues
• public acceptance and legislative controls
• evidentiary standards and regulation

Legislative and Regulatory Considerations in the Creation and Management of Large Scale Population Health Data Networks
• consent
• identifiable (clinical) versus anonymous (research) data
• authentication of communicating parties
• compliance
  - HIPAA (USA)
• EU Data Directive
• individual nation/US State requirements
• ICH5 Common Technical Document

e.health
• Content
• Care

Population Databanks and the Rise of Molecular Medicine
• individual / family records
[disease labels: diabetes; CVD; CPD; renal]
• privacy and confidentiality
• gene-disease correlations
• gene-outcome correlations
• gene-disease predisposition associations
• individual (targeted) care
  - optimum Tx
  - predisposition and proactive risk management
[disease labels: stroke; CNS; infection; cancer]
Who Knows Wins!

Population Segmentation and Individual Patient Profiling
Health Databanks
• population dBase
• individual record and risk profile
Physician Desk-Top Network
• clinical
• pharmacy
• lab data
• outcomes
Shaping Physician Behaviour
• decision support / control
• Dx/PDx
• Rx, PRx
• clinical guidelines
• education
Shaping Consumer / Patient Behaviour
e.Pharmacy
• Rx validation
• utilization
• compliance
• AE avoidance
e.Home Health
• wellness
• education
• compliance
• risk mitigation
• remote monitoring

“The average person will have three to five internet devices on their body by the end of 2010 ... not just the mobile phone, but health monitors, maybe even an implanted device, a GPS type of system, etc. ...”
John Chambers, Cisco Systems
dot.CEO January 2001, p. 53

Consumer Health Information Systems and Services
• in-home to physician / pharmacy links
• next generation tele-medicine and personal health monitoring
• compliance monitoring
• independent living
• emergency management
• integration of new imaging / diagnostic sensor systems

Biology and Medicine as Information-Based Disciplines
Cyber-Medicine
• on-body / in-body / in-home remote devices for health status / compliance monitoring
• interactive computational software and Rx of behavioral disorders
• ubiquitous physician decision-support software to optimize clinical care and compliance

The Evolution of Large-Scale Biology
• genome sequencing
• comparative genomics
• proteomics
• functional genomics
• structural genomics
• genetic circuits
• biological order
• complex systems
• SNPs and gene-disease association studies
• large-scale population and statistical genetics
• robust geno-phenotype correlations
• individual genotyping and disease risk profiling
INFORMATICS

Biology and Medicine as Information-Based Disciplines
Research
• understanding the encoded instructions for biological design
  - genes, proteins, higher order assemblies
  - abnormal information coding in disease
Clinical Medicine
• assembly of large-scale population databases
  - gene-disease correlations
  - gene-Rx outcome correlations
  - individual genotyping and disease predisposition risk profiling

Systems Analysis
Biology as an Informational Science
• new technological platforms
  - automation, miniaturization, high-throughput
  - parallelism
• new computational tools
  - scale, diversity of content
  - mining algorithms
• new organizational linkages
  - convergence of biology and computing (science)
  - health / telco / compco (technology)

Systems Analysis
Biology as an Informational Science
• new skills
  - graduate / post-graduate curricula
  - clinical training
• new organizational structures
  - inter-disciplinary
• new policies
  - grant agencies
  - national / international science
  - regulation, legislation

Computational Biology Grand Challenges
• predictive simulation of gene regulation and genetic networks
  - from genotype to phenotype
• fast algorithms for molecular simulations
• modeling of molecular interactions, chemical dynamics, transport and compartmentalization in cells
• metabolic and physiological simulations
• scalar modeling
  - molecules to cells to tissues to organs to organisms to populations
• predictive tools for pre-emptive stabilization of system dysregulation

From Bioinformatics to Computational Biology
Bioinformatics : The Phenomenological Era
• ID and classification of statistical regulation among the most recurrent objects
• optimum database design
• fast classification/clustering algorithms
• data mining software and ontological relationships
Computational Biology : The Theoretical Era
• elucidation of robust design rules
• higher order multistate detector and component interactions
• contextual recognition
• pathways, circuits, networks and higher order assemblies
• predictive biology

• biology and medicine are in transition to become
information-based sciences
• this transition will shift R&D focus from the current reductionist framework to the analysis of biological complexity (systems biology)
• these transitions will demand adoption of large scale analyses (big biology) and obligate adoption of more stringent standardization
  - data QC, annotation, curation
  - dBase formats and clinical profiling tools
  - massive computational capacity and dynamic, scalable networks
  - distributed computing and collaborative networks
  - from bioinformatics to ‘rules-based’ computational biology and cybermedicine
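
The "massive computational capacity" in this closing summary can be made concrete with the figures from the petabyte-size databases slide (100 Gbytes of personal data per consumer lifetime, 100 million consumers). The following is only a back-of-envelope sketch: the function name and the decimal unit conversions (1000 Gbytes per terabyte, 1000 terabytes per petabyte) are assumptions made for illustration and do not come from the slides.

```python
# Back-of-envelope check of the population data volume quoted in the slides.
# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def lifetime_data_petabytes(consumers: int, gbytes_per_consumer: int) -> float:
    """Total lifetime personal data for a population, in petabytes."""
    total_gb = consumers * gbytes_per_consumer
    return total_gb / (GB_PER_TB * TB_PER_PB)


if __name__ == "__main__":
    # Figures taken from the slide: 100 million consumers, 100 Gbytes each.
    total_pb = lifetime_data_petabytes(consumers=100_000_000, gbytes_per_consumer=100)
    print(f"{total_pb:,.0f} petabytes")
```

Under these assumptions the result agrees with the 10,000 petabytes stated on the slide; using binary units instead would shift the figure slightly but not its order of magnitude.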