Presentation (Powerpoint 13MB) - The Stanford University InfoLab

The Elucidation of Regulatory Networks
in Complex Biological Systems:
The Convergence of Biology, Medicine and Computing
G. Poste
Stanford University, 15 March 2002
gposte@healthtechnetwork.com
The Analysis and Application of Principles
of Biological Design
1750-1980: biology + chemistry
• the descriptive narrative
• empirical technology
• mechanistic reductionism
1980-2010: biology + genomics + computing
• the encoded information content of biological systems
• mapping the basis of biological variation
• rational medicine and customized care
→ systems biology
Biology and Medicine as Information-Based Sciences
From Reductionism to Integrated Systems Biology
• individual genes and proteins → biological circuits, pathways and networks
• molecular interactions in simple systems → assembly of higher order systems
• limited, fragmented datasets → massive, integrated datasets
• poor annotation → stringent, standardised annotation
• limited capacity for predictive simulation → robust algorithms for predictive biology; biology in silico
• analog information → digital information
21st Century Biology and Medicine
“SYSTEMS BIOLOGY”
• the design principles of biological order and complexity
• mapping the information content of biopathways and networks
Biotechnology and Systems Biology + New Analytical Capabilities + Large Scale Computing
“BIG BIOLOGY”
• interdisciplinary, massive datasets, information-based
• infrastructure, investment and education
Convergence : The Technological Platforms Shaping
the Evolution of Healthcare
• Rule-Based Design Principles
• Computational Biology
• Biotechnology and Systems Biology
• New Analytical Capabilities
• Exploring “Biospace”
• Large Scale Computing
• Automation Engineering and Robotics
• Materials Science
• Micro-/Optoelectronics
From Reductionism to Integrated Systems Biology
• understanding the information content encoded in biological networks
• mapping the design rules for progressively greater complexity of biological order
gene(s) → pathways, circuits and networks → progressively ordered assemblies: organelles, cells, tissues, organs → homeostatic integration of myriad, complex, interactive networks (Physiology)
High Level Abstraction of Biological Pathways and
Network Systems
Encoded Information
Pathways and Networks
Rule Sets
Plasticity
• adaptive fitness
• pathological perturbation
Predictive Biology
• directed evolution
• biology in silico
Novel Biospace and Carbon : Silicon Union
Global and Nodal Pathway Map of Genomic and
Proteomic Elements in Yeast Galactose Utilization
From: T. Ideker et al. 2001. Science 292, 929
Genetic Networks
• bioinformation processing involves leverage of interactive feedback loops in diverse domains
- physical, chemical, electrical
• genomic and proteomic codes represent a dense network of nested hyperlinks
• matter becomes code
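Feedback loops of the kind described above are often first studied with a Boolean network, in which each gene is simply ON (1) or OFF (0). The sketch below is a minimal illustration, not a model from this talk: it assumes a hypothetical three-gene ring (A represses B, B represses C, C represses A) with synchronous updates, and shows that a single negative feedback loop is enough to produce a repeating cycle of states.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical negative-feedback ring: maps each gene to the gene that represses it.
REPRESSOR_OF: Dict[str, str] = {"A": "C", "B": "A", "C": "B"}
GENES = ("A", "B", "C")

def step(state: State) -> State:
    """Synchronous Boolean update: a gene is ON only if its repressor is currently OFF."""
    current = dict(zip(GENES, state))
    return tuple(1 - current[REPRESSOR_OF[g]] for g in GENES)

def trajectory(start: State, max_steps: int = 20) -> List[State]:
    """Iterate the network until a state repeats; returns the distinct states visited."""
    seen: List[State] = [start]
    for _ in range(max_steps):
        nxt = step(seen[-1])
        if nxt in seen:
            break
        seen.append(nxt)
    return seen

if __name__ == "__main__":
    for s in trajectory((1, 0, 0)):
        print(s)
```

Starting from (1, 0, 0), the ring passes through six distinct states and then returns to its start, a discrete analogue of an oscillation. Realistic regulatory models add thresholds, delays and asynchronous updating.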
Nonlinear Complexity in Biological Systems
• distinct classes of nonlinear interactions
• long-range (fractal) correlations
• self-similarity, self-dissimilarity and self-organized criticality
• pattern formation
• complex adaptive networks
• highly optimized tolerance = robustness with fragility
• barriers to cascading failures
• deterministic chaos
• emergent properties
Nonlinear Complexity in Biological Systems
• abrupt changes
- bifurcations; intermittency/bursting; bistability/multistability; phase transitions
• nonlinear oscillations
- limit cycles; phase-resetting; entrainment
• nonlinear waves
- spirals; scrolls; solitons
• complex periodic cycles and quasiperiodicities
• scale invariance
- fractal and multifractal scaling; long-range correlations; self-organized criticality
• stochastic resonance and related noise-modulated mechanisms
• time irreversibility
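Bistability, one of the abrupt-change behaviours listed above, can be illustrated with the classic two-gene mutual-repression ("toggle switch") model: du/dt = α/(1 + v^n) - u and dv/dt = α/(1 + u^n) - v, where u and v are dimensionless repressor levels. This is a minimal sketch: the values α = 10 and n = 2 are assumptions chosen to lie in the bistable regime, and plain forward-Euler integration is used for brevity.

```python
from typing import Tuple

# Assumed dimensionless parameters (illustrative only), chosen to give bistability.
ALPHA = 10.0   # maximal synthesis rate of each repressor
HILL_N = 2.0   # cooperativity (Hill coefficient) of repression

def toggle_rhs(u: float, v: float) -> Tuple[float, float]:
    """Rates of change for two mutually repressing genes with first-order decay."""
    du = ALPHA / (1.0 + v ** HILL_N) - u
    dv = ALPHA / (1.0 + u ** HILL_N) - v
    return du, dv

def simulate(u0: float, v0: float, dt: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Forward-Euler integration from (u0, v0) up to time dt * steps."""
    u, v = u0, v0
    for _ in range(steps):
        du, dv = toggle_rhs(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v

if __name__ == "__main__":
    print(simulate(5.0, 0.1))  # settles with u high and v low
    print(simulate(0.1, 5.0))  # settles with u low and v high
```

The two runs differ only in their starting point, yet end in opposite stable states. With these assumed parameters the symmetric steady state u = v = 2 is unstable, which is what splits the state space into two basins of attraction.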
Information and Technology Platform Overload
Principal Themes in the
Analysis of Biological Systems
• large scale
• miniaturization
• automation
• parallelism
• networked systems
• real time, interactive, adaptive
Major Technology Gaps
• rapid gene ID in complex genomes
• structural genomics and protein structure-function prediction
• mapping the proteome
- abundance, modification, localisation and protein-protein interactions
- large scale parallelism (protein arrays)
- small organic molecule networks
• mapping the metabolome
- circuits, modules, networks
• robust predictive algorithms for ADMET profiling of drug candidate SAR
The Need for Standards and Stringent Semantics
“... without which …..
wanton and luxuriant fancies climbing up into the Bed of Reason,
do not only defile it by unchaste and illegitimate embraces,
but instead of real conceptions and notices of things
do impregnate the mind with nothing
but Ayerie and Subventaneous Phantasmes”
Samuel Parker, FRS 1666
standards
standards
STANDARDS
The Analysis and Comprehension
of Biological Systems
descriptive ignorance → initial mechanistic insights → burgeoning, bewildering complexity → defined rule sets
• elucidation of patterns
• defining rule sets
• disease heterogeneity
• patient heterogeneity
• disease predisposition
• elegant simplicity revealed
• predictive biology
• right Rx : right disease
• right Rx : right patient
• from reactive treatment to proactive prevention
Integrated Distributed Heterogeneous Databases and Databanks
• molecular phylogenies and genealogy
• chemical SAR
• biological order
• population genetics
• clinical databanks
• data warehousing and data mining
• evolving hardware and electronic evolution
• object-oriented and pattern / spatial array recognition
• Expert Systems and Knowledge Management
• human-computer interface systems
Convergence, Consilience, Cognition and Computing
• more science
• better science
• faster science
• cross-disciplinary science
• interdisciplinary convergence
• technological convergence
• corporate convergence
MEGADATA
The Scalability Crisis (axes: Volume, Performance)
• burgeoning data volumes
• more transactions
• increasing diversity of datasets/apps
• expanding user communities
• pressures on network bandwidth
• complexity of distributed environments
• rising performance expectations
• confidentiality and privacy
Major Challenges for Life Sciences Computing
• exponentially growing data repositories (10² TB to PB)
• highly variable data formats and standards as obstacles to data access and mining
• inadequate attention to data Q.C./annotation standards
• excessive reliance on customized solutions and fragmented data sources
• inadequate access and integration of public and private datasets
• primitive data visualization tools
• 80% of time spent on data preparation tasks and 20% on productive exploration
Major Challenges for Life Sciences Computing
Big Biology
• infrastructure scale and capital investment
• new tools for mining, visualization, simulation
• data storage conventions and technologies
• dynamic, adaptive, scalable systems
• active networks
- software into the network
- subnet interoperability
- integration of distributed and collaborative working environments
• fast data access at all levels
- storage, I/O and networks to support analysis and simulation
• expanded bandwidth for high usage and high transfer rates
Bracing For the Inevitable : Petabyte-Size Databases
• 1 petabyte = 1000 terabytes
• 250 billion text pages
• 20 million four-drawer filing cabinets
• 2000-mile-high tower of 1 billion diskettes
• typical US consumer generates 100 Gbytes personal data/lifetime
- education, insurance, credit, medical
• 100 million consumers → 10,000 petabytes
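The figures on this slide are unit arithmetic and can be checked directly. A quick sketch, assuming decimal (SI) prefixes throughout, confirms the consumer total and shows what the page estimate implies: about 4 KB of text per page.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
GB, TB, PB = 10**9, 10**12, 10**15

# A petabyte expressed in terabytes
pb_in_tb = PB // TB

# Implied average page size if one petabyte holds 250 billion text pages
bytes_per_page = PB // (250 * 10**9)

# 100 GB of personal data per consumer across 100 million consumers
consumers = 100 * 10**6
bytes_per_consumer = 100 * GB
total_pb = consumers * bytes_per_consumer // PB

print(f"1 PB = {pb_in_tb:,} TB")
print(f"implied page size: {bytes_per_page:,} bytes")
print(f"100 million consumers: {total_pb:,} PB")
```

Using binary prefixes (1 PiB = 2**50 bytes) would shift these values by a few percent but not change the order of magnitude.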
Data Grids
• from Napster and Gnutella → ubiquitous peer-to-peer exchange of data sets → apportioned distributed computing for solutions of computationally massive problems
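Apportioned distributed computing follows a split, compute, combine pattern. The sketch below is a single-machine stand-in under stated assumptions: threads play the role of grid nodes, and counting G and C bases is a hypothetical placeholder for a computationally massive task. In CPython, threads will not speed up CPU-bound work like this; a real grid would ship each chunk to a remote node, but the decomposition logic is the same.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def apportion(seq: str, n_parts: int) -> List[str]:
    """Split seq into n_parts contiguous chunks whose lengths differ by at most one."""
    size, extra = divmod(len(seq), n_parts)
    chunks: List[str] = []
    start = 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

def gc_count(chunk: str) -> int:
    """Work unit: count G and C bases in one chunk (stand-in for a heavy computation)."""
    return sum(1 for base in chunk if base in "GC")

def distributed_gc(seq: str, n_workers: int = 4) -> int:
    """Apportion the sequence, process chunks concurrently, then combine partial results."""
    chunks = apportion(seq, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(gc_count, chunks))
    return sum(partial_counts)

if __name__ == "__main__":
    genome = "ATGCGCTTAGGC" * 1000  # invented example sequence
    print(distributed_gc(genome), genome.count("G") + genome.count("C"))
```

Because the work unit only needs its own chunk and the combine step is a plain sum, the same structure would scale out to independent machines without shared state.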
Informatics for Big Biology and e.Health Networks
• instructive precedents in high end computing from other disciplines
- cosmology, quantum chromodynamics, climate research, materials
USA
• Scientific Simulation Initiative
• National Computational Science Alliance
• Long Term Ecological Research
• NASA, DOE, NOAA
• Accelerated Strategic Computing Initiative
• Grid Physics Network
Europe
• UNICORE
• Pangea
• E-Science
• LHC Challenge
• E-Grid
The Bibliome
The Global Virtual Archive / Universal Knowledge Web
Proof, logic and ontology languages
• shared terms / terminology
• machine-machine communication
• inter-memetic translation
• self-evolving translators
Metadata
• Resource Description Framework
• eXtensible Markup Language
• Metadata tagging standards for interoperable distributed archives
• self-assembling datasets
• self-describing documents
WWW
• HyperText Markup Language
• HyperText Transfer Protocol
• The first generation Web
Modified from : T. Berners-Lee and J. Hendler, Nature 2001, 410, 1023
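The Resource Description Framework listed in the metadata layer describes resources as subject-predicate-object triples. The plain-Python sketch below illustrates only that data model and wildcard pattern matching; it is not an RDF library, and the store contents and `ex:` identifiers are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata statements; the "ex:" identifiers are invented for illustration.
STORE: List[Triple] = [
    ("ex:dataset42", "ex:organism", "ex:yeast"),
    ("ex:dataset42", "ex:measures", "ex:mRNA_abundance"),
    ("ex:dataset43", "ex:organism", "ex:yeast"),
    ("ex:dataset43", "ex:measures", "ex:protein_abundance"),
    ("ex:dataset44", "ex:organism", "ex:human"),
]

def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]

if __name__ == "__main__":
    # Which datasets describe yeast, and what does each of them measure?
    for subject, _, _ in match(STORE, p="ex:organism", o="ex:yeast"):
        print(subject, [obj for _, _, obj in match(STORE, s=subject, p="ex:measures")])
```

Because every statement has the same three-part shape, datasets from independent archives can be merged by concatenating their triples, which is the interoperability the metadata layer aims for.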
Standardized Lexical Foundations for the
Annotation, Archiving and Analysis of Complex
Biological Systems
• unique complexity of biological systems
• multiple levels of abstraction
- organismal
- ecosystem dynamics
- social/memetic networks
• qualitative not quantitative data
- diversity of experimental conditions
- inaccessibility/replication of experimental conditions
• upgrading to hybrid qualitative/quantitative analysis tools
Standardized Lexical Foundations for the
Annotation, Archiving and Analysis of Complex
Biological Systems
• entity classes : finite elements
• action properties : state properties
• intramolecular site interactions
• intermolecular site interactions
• massively parallel networks : unit modules
• continuum systems
• compartments
• economy and parsimony
• evolutionary relationships
• network pathways
- redundancy (degeneracy), pleiotropy
- complex emergent properties
• submodels for searchable characteristics of functional knowledge
• integration of submodels into web-based distributed model networks
Jabberwocky
“ ’Twas brillig and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves
And the mome raths outgrabe”
Lewis Carroll
The Divide Between Syntax and Semantics
“Colorless green ideas sleep furiously”
Noam Chomsky (1957)
• syntactically valid
• semantically void
The Divide Between Syntax and Semantics
“Colorless green ideas sleep furiously”
Noam Chomsky (1957)
• encoded genome structure (syntax) and diverse expression repertoires (semantics)
- alternative splicing
- overlapping reading frames
- nonsense mutations
- differential modulation by different transcription factors
• database formats (syntax) and ontology (semantics)
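Overlapping reading frames are the clearest case of one syntax carrying several meanings: the same nucleotide string yields a different peptide depending on where translation starts. A minimal sketch using the standard genetic code (the 64-letter string lists amino acids for codons enumerated with bases ordered T, C, A, G at each position); the example sequence is invented.

```python
# Standard genetic code: codons enumerated with each position cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate_frame(dna: str, frame: int) -> str:
    """Translate the complete codons starting at offset `frame` (0, 1 or 2); '*' is a stop."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))

if __name__ == "__main__":
    seq = "ATGAAATGA"  # invented example sequence
    for frame in range(3):
        print(frame, translate_frame(seq, frame))
```

The same nine bases read as MK* in frame 0, *N in frame 1 and EM in frame 2: identical syntax, three different semantic outcomes.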
The Conceptual Complexity of Ontology Design
• ontology
- set of axioms in a logical language
- representational vocabulary with precise definitions of shared understanding
- axioms constrain interpretation of defined terms
• XML versus ontology and evolution of the semantic web
- XML less complex since semantics are not represented
- objective to reduce uncertainty favors ontologies
- objective to reduce complexity favors XML
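The XML-versus-ontology trade-off can be seen in miniature. An XML document could store the parent links below, but nothing in XML itself says what they mean; an ontology adds axioms, here the single assumed axiom that is_a is transitive, which lets software infer statements nobody wrote down. The terms and hierarchy are a simplified, hypothetical example.

```python
from typing import Dict, Set

# Hypothetical, simplified is_a hierarchy: each term maps to its direct parent terms.
IS_A: Dict[str, Set[str]] = {
    "protein_kinase": {"kinase"},
    "kinase": {"transferase"},
    "transferase": {"enzyme"},
    "enzyme": {"protein"},
    "receptor": {"protein"},
}

def ancestors(term: str, hierarchy: Dict[str, Set[str]] = IS_A) -> Set[str]:
    """All terms implied for `term` by the axiom that is_a is transitive."""
    found: Set[str] = set()
    stack = list(hierarchy.get(term, set()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(hierarchy.get(parent, set()))
    return found

def entails_is_a(term: str, candidate: str) -> bool:
    """True if 'term is_a candidate' follows from the asserted links plus transitivity."""
    return candidate in ancestors(term)

if __name__ == "__main__":
    print(sorted(ancestors("protein_kinase")))
    print(entails_is_a("protein_kinase", "protein"), entails_is_a("receptor", "enzyme"))
```

No link states directly that a protein kinase is a protein, yet the axiom entails it. That inference is the reduction in uncertainty an ontology buys, at the cost of the extra conceptual machinery the slide describes.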
Convergence, Consilience, Cognition and Computing
scientific, technological and economic convergence
data complexity, data scale, data diversity → optimized data representation, optimized data comprehension, optimized data utilization
• adaptive IT networks
• novel emergent computing and mining tools
• novel visualization
• modulation of brain function for optimum perceptualization
• ‘mind in the loop’
• human medicine interfaces
Bounded Rationality
• human mind’s processing capacity is small relative to the size of the problems requiring analysis/comprehension (Simon)
• objective solutions require complexity reduction in information, task and coordination
• complexity reduction
- omission and abstraction
- division of labor (systems decomposition)
• complexity reduction simultaneously increases uncertainty (Fox)
• implications for evolution of ontologies for the semantic web
Enhancing Human Cognitive Capacities
for Optimizing Information Utilization
• escalating quantities and types of information
• real time decision making
• new multi-modal, multi-sensory high performance human : information interfaces
• representation and comprehensibility of information flows
- optimize information representation (perception)
- modulation of brain function to optimize comprehension
• systemic application of advances in cognitive neurobiology
Enhancing Human Cognitive Capacities
for Optimizing Information Utilization
• optimizing representations of information
- perceptualization
• optimizing cognitive capacities
- states of the brain affect states of mind (perception and cognition)
- perceptual modulation techniques
Interdisciplinary Linguistics :
Memetic Engineering
• molspeak, medspeak, nerdspeak
• standardization coding
• speech recognition
• object-oriented computing
• synthetic intelligence
Molecular Medicine, Population Segmentation
and Targeted Patient Care
Population Genetics
large-scale population genetics → geno-phenotype correlations in subpopulations → ‘at-risk’ subpopulations → individual risk profiling
Linking Clinical Outcomes to Genetic Variation
• population genetics
• haplotype blocks
• SNP maps
• low cost, high-throughput genotyping
• dbases
• informatics
• gene-disease associations
• ethics
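Gene-disease associations of this kind typically start from a single-SNP case-control comparison of allele counts. The sketch below shows the basic allelic odds ratio and Pearson chi-square statistic for the resulting 2x2 table; the counts are invented, and real studies add quality control, multiple-testing correction and adjustment for population structure.

```python
from dataclasses import dataclass

@dataclass
class AlleleTable:
    """Allele counts for one SNP in a case-control study."""
    case_risk: int      # risk-allele copies observed in cases
    case_other: int     # other-allele copies observed in cases
    control_risk: int   # risk-allele copies observed in controls
    control_other: int  # other-allele copies observed in controls

def odds_ratio(t: AlleleTable) -> float:
    """Allelic odds ratio (a*d)/(b*c); above 1 means the allele is relatively enriched in cases."""
    return (t.case_risk * t.control_other) / (t.case_other * t.control_risk)

def chi_square(t: AlleleTable) -> float:
    """Pearson chi-square for the 2x2 table (1 degree of freedom, no continuity correction)."""
    a, b, c, d = t.case_risk, t.case_other, t.control_risk, t.control_other
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

if __name__ == "__main__":
    table = AlleleTable(case_risk=30, case_other=70, control_risk=20, control_other=80)  # invented
    print(round(odds_ratio(table), 3), round(chi_square(table), 3))
```

With these invented counts the odds ratio is about 1.71 and the statistic about 2.67, below the 3.84 threshold for p < 0.05 at one degree of freedom even before correcting for the many SNPs tested in a large-scale study.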
Large-Scale Disease Association Genetics
and Disease Predisposition Risk Profiling
• formidable logistics and cost
• robust algorithms for combinatorial gene interactions
• slow evolution
• complex ethical, legal and social issues
• public acceptance and legislative controls
• evidentiary standards and regulation
Legislative and Regulatory Considerations in the
Creation and Management of Large Scale
Population Health Data Networks
• consent
• identifiable (clinical) versus anonymous (research) data
• authentication of communicating parties
• compliance
- HIPAA (USA)
- EU Data Directive
- individual nation/US State requirements
- ICH5 Common Technical Document
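One common building block for keeping identifiable clinical data apart from research data is keyed pseudonymization: direct identifiers are removed and replaced by a token that only the key holder can regenerate, so records from the same person still link across datasets. This is a minimal sketch under stated assumptions (the field names and key are hypothetical). Pseudonymized data is not the same as anonymous data, and this step alone does not establish compliance with HIPAA or the EU Data Directive.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical field names treated as directly identifying.
IDENTIFYING_FIELDS = {"patient_id", "name", "address", "date_of_birth"}

def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Stable keyed pseudonym (HMAC-SHA256): the same ID and key always give the same token."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def to_research_record(clinical: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Drop identifying fields and attach a pseudonym so records can still be linked."""
    research = {k: v for k, v in clinical.items() if k not in IDENTIFYING_FIELDS}
    research["pseudonym"] = pseudonym(clinical["patient_id"], secret_key)
    return research

if __name__ == "__main__":
    record = {"patient_id": "P001", "name": "Jane Doe", "genotype": "AG", "diagnosis": "T2D"}
    print(to_research_record(record, b"example-key-held-by-data-custodian"))
```

A keyed hash is used rather than a plain hash because patient IDs are often guessable; without the key, an outsider cannot recompute the pseudonym from a candidate ID.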
e.health
Content
Care
Population Databanks and the Rise of
Molecular Medicine
• individual / family records
• privacy and confidentiality
• gene-disease correlations
• gene-outcome correlations
• gene-disease predisposition associations
• individual (targeted) care
- optimum Tx
- predisposition and proactive risk management
disease areas: diabetes, CVD, CPD, renal, stroke, CNS, infection, cancer
Who Knows Wins!
Health Databanks: population dBase; individual record and risk profile
Population Segmentation and Individual Patient Profiling
Physician Desk-Top Network
• clinical
• pharmacy
• lab data
• outcomes
Shaping Physician Behaviour
• decision support / control
• Dx/PDx; Rx, PRx
• clinical guidelines
• education
e.Pharmacy
• Rx validation
• utilization
• compliance
• AE avoidance
e.Home Health
• wellness education
• compliance
• risk mitigation
• remote monitoring
Shaping Consumer / Patient Behaviour
“The average person will have three to five
internet devices on their body by the end
of 2010…..
not just the mobile phone,
but health monitors,
maybe even an implanted device,
a GPS type of system, etc………..”
John Chambers
Cisco Systems
dot.CEO January 2001, p. 53
Consumer Health Information
Systems and Services
• in-home to physician / pharmacy links
• next generation tele-medicine and personal health monitoring
• compliance monitoring
• independent living
• emergency management
• integration of new imaging / diagnostic sensor systems
Biology and Medicine as Information-Based Disciplines
Cyber-Medicine
• on-body / in-body / in-home remote devices for health status / compliance monitoring
• interactive computational software and Rx of behavioral disorders
• ubiquitous physician decision-support software to optimize clinical care and compliance
The Evolution of Large-Scale Biology
genome sequencing → comparative genomics → proteomics → functional genomics → structural genomics → genetic circuits → biological order → complex systems
SNPs and gene-disease association studies → large-scale population and statistical genetics → robust geno-phenotype correlations → individual genotyping and disease risk profiling
INFORMATICS
Biology and Medicine as Information-Based Disciplines
Research
• understanding the encoded instructions for biological design
- genes → proteins → higher order assemblies
- abnormal information coding in disease
Clinical Medicine
• assembly of large-scale population databases
- gene-disease correlations
- gene-Rx outcome correlations
- individual genotyping and disease predisposition risk profiling
Systems Analysis
Biology as an Informational Science
• new technological platforms
- automation, miniaturization, high-throughput
- parallelism
• new computational tools
- scale, diversity of content
- mining algorithms
• new organizational linkages
- convergence of biology and computing (science)
- health / telco / compco (technology)
Systems Analysis
Biology as an Informational Science
• new skills
- graduate / post-graduate curricula
- clinical training
• new organizational structures
- inter-disciplinary
• new policies
- grant agencies
- national / international science
- regulation, legislation
Computational Biology
Grand Challenges
• predictive simulation of gene regulation and genetic networks
- from genotype to phenotype
• fast algorithms for molecular simulations
• modeling of molecular interactions, chemical dynamics, transport and compartmentalization in cells
• metabolic and physiological simulations
• scalar modeling
- molecules to cells to tissues to organs to organisms to populations
• predictive tools for pre-emptive stabilization of system dysregulation
From Bioinformatics to Computational Biology
Bioinformatics : The Phenomenological Era
• ID and classification of statistical regularities among the most recurrent objects
• optimum database design
• fast classification/clustering algorithms
• data mining software and ontological relationships
Computational Biology : The Theoretical Era
• elucidation of robust design rules
• higher order multistate detector and component interactions
• contextual recognition
• pathways, circuits, networks and higher order assemblies
• predictive biology
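Fast classification and clustering, listed under the phenomenological era, can be illustrated with one familiar method (not necessarily the ones the talk had in mind): Lloyd's k-means. The sketch below clusters hypothetical expression profiles, e.g. log-ratios across three conditions, and uses fixed starting centroids so the result is deterministic.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[float, ...]

def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p: Sequence[float], centroids: List[Profile]) -> int:
    """Index of the centroid closest to profile p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points: List[Profile], centroids: List[Profile], iterations: int = 10):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid re-averaging."""
    for _ in range(iterations):
        clusters: List[List[Profile]] = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
    return centroids, [nearest(p, centroids) for p in points]

if __name__ == "__main__":
    # Hypothetical log-ratio profiles: two induced genes and two repressed genes.
    profiles = [(2.0, 2.1, 1.9), (2.2, 1.8, 2.0), (-1.9, -2.0, -2.1), (-2.1, -1.8, -2.0)]
    centroids, labels = kmeans(profiles, [profiles[0], profiles[2]])
    print(labels)
```

Fixed initial centroids keep the example reproducible; in practice random or k-means++ initialization with several restarts is more robust.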
• biology and medicine are in transition to become information-based sciences
• this transition will shift R&D focus from the current reductionist framework to the analysis of biological complexity (systems biology)
• these transitions will demand adoption of large scale analyses (big biology) and obligate adoption of more stringent standardization
- data QC, annotation, curation
- dBase formats and clinical profiling tools
- massive computational capacity and dynamic, scalable networks
- distributed computing and collaborative networks
- from bioinformatics to ‘rules-based’ computational biology and cybermedicine