Bioinformatics: Applications
ZOO 4903
Fall 2006, MW 10:30-11:45
Sutton Hall, Room 312
Jonathan Wren

Systems Biology

Lecture overview
• What we’ve talked about so far
  – Pathways & network motifs
  – Simulating evolution in-silico
  – Cellular simulations
• Overview
  – The ultimate goal of biology & bioinformatics is to tie it all together and understand the system
  – In the meantime, forced to live in the real world, we focus on tying a few things together

Systems Biology – backers & attackers
Though coined 40 years ago, a lot of people still ask, "What's that?" when the term systems biology comes up. "It is used in so many different contexts, nobody is really clear what you mean by it," says John Yates III, a professor at the Scripps Research Institute in La Jolla, Calif.
He's not the only one stumped by the term's meaning. David Placek, president of Sausalito, Calif.-based Lexicon Branding, a company that cooks up names for pharmaceutical products such as Velcade and Meridia, says he's not so hot on the moniker. "Systems biology is just so general that it could apply to many things. When you're naming a category, the underlying principle is that if you make a statement like, 'I'm doing systems biology,' do people know what you're talking about?" …
Volume 17 | Issue 19 | 27 | Oct. 6, 2003, The Scientist

What is “Systems Biology”?
Is this just another name for “physiology”?
The study of the mechanisms underlying complex biological processes as integrated systems of many interacting components. Systems biology involves (1) collection of large sets of experimental data (2) proposal of mathematical models that might account for at least some significant aspects of this data set, (3) accurate computer solution of the mathematical equations to obtain numerical predictions, and (4) assessment of the quality of the model by comparing numerical simulations with the experimental data.
-(Leroy Hood, 1999)
Institute for Systems Biology
http://www.systemsbiology.org/

Why Systems Biology?
• On the technology side (PUSH): Capabilities for high-throughput data gathering that have made us aware that biological networks have many more components than we previously surmised.
• On the biology side (PULL): The realization that to the extent that we don’t characterize biological systems quantitatively in their full complexity, the scope and accuracy of our understanding of those systems will be compromised. (In classical experimental terms, the uncontrolled variables in the system will undermine our confidence in the conclusions we draw from our experiments and observations.)

Systems Biology vs. traditional cell and molecular biology
• Experimental techniques in systems biology are high throughput.
• Intensive computation is involved from the start in systems biology, in order to organize the data into usable computable databases.
• Exploration in traditional biology proceeds by successive cycles of hypothesis formation and testing; data accumulates during these cycles.
• Systems biology initially gathers data without prior hypothesis formation; hypothesis formation and testing comes during post-experiment data analysis and modeling.

Genomics, Proteomics & Systems Biology
[Figure: Genomics, Proteomics and Systems Biology on a time axis from 1990 to 2020]

Modelling Tools
[Figure: chart with axes "#" (1 to 9) and "Period" (65-69 through 95-99)]
• BIOSSIM (1968)
• ESSYN (1976)
• SCAMP (1983)
• SCOP (1986)
• METAMOD (1986)
• SIMFIT (1990)
• METAMODEL (1991)
• METASIM (1992)
• KINSIM (1993)
• GEPASI (1994)
• METALGEN (1994 ?)
• MIST (1995)
• METABOLIKA (1997 ?)
• METAFLUX (1997)
• SIMFLUX (1997)
• MNA (1998)
• CELLMOD (1998)
• FLUXMAP (1999)
• METATOOL (1999)
• VCELL (1999)
From Klaus Mauch, University of Stuttgart

Systems Biology is an integration of data & approaches

Technologies to study systems at different levels
• Genomics (HT-DNA sequencing)
• Mutation detection (SNP methods)
• Transcriptomics (Gene/Transcript measurement, SAGE, gene chips, microarrays)
• Proteomics (MS, 2D-PAGE, protein chips, Yeast-2-hybrid, X-ray, NMR)
• Metabolomics (NMR, X-ray, capillary electrophoresis)

Each system has methods for modeling
• Pi Calculus
• Flux Balance Analysis
• Petri Nets
• Differential Eqs
• Boolean Networks
• Electrical Circuit Model
• Cellular Automata

So how can we meaningfully integrate the data?

System heterogeneity in size & timescale
• Atomic Scale: 0.1 - 1.0 nm; Coordinate data, Dynamic data; 0.1 - 10 ns; Molecular dynamics
• Molecular Scale: 1.0 - 10 nm; Interaction data, Kon, Koff, Kd; 10 ns - 10 ms; Interactions
• Cellular Scale: 10 - 100 nm; Concentrations, Diffusion rates; 10 ms - 1000 s; Fluid dynamics
• Tissue Scale: 0.01 m - 1.0 m; Metabolic input, Metabolic output; 1 s – 1 hr; Process flow
• Organism Scale: 0.01 m – 4.0 m; Behaviors, Habitats; 1 hr – 100 yrs; Mechanics
• Ecosystem Scale: 1 km – 1000 km; Environmental impact, Nutrient flow; 1 yr – 1000 yrs; Network Dynamics

The scales do not fit together seamlessly
• If one scale (e.g., protein-protein interactions) behaves deterministically and with isolated components, then we can use plug-n-play approaches
• If it behaves chaotically or stochastically, then we cannot
• Most biological systems lie between this deterministic order and chaos: Complex systems

Man-made Complex Devices
• Intel Pentium 4
  – 42 million transistors
• The Intel Itanium 2
  – 410 million transistors
  – Number of gates > 100 Million
By 2007 both Intel and AMD are predicting dies with 1 billion transistors.
In terms of parts and interconnections, man-made devices will likely have complexity comparable to, if not greater than, that of bacterial cells by around 2010.

System Models
Building computational models of systems seems more and more like a viable project. Such a project would bring a much clearer understanding of how systems are controlled and ultimately it should bring unprecedented predictive power.

Are Biologists Ready?
[Diagram: pathway Xo, S1, S2, S3, step v, S4, S5, S6, X1, with a "50 %" annotation]
Xo and X1 fixed, all reactions reversible, assume stable steady state.
What happens to the steady state?
Typical replies:
1. Nothing happens.
2. Nothing happens unless it is the rate-limiting step.
3. The rate v goes down, but that’s all.
4. S3 goes up.
5. S4 goes down.
6. Species downstream of v go down.
7. Steady state flow changes but species levels don’t.
8. Xo and X1 change.
If we can’t understand this system, how can we hope to understand:

Functional Motif Identification
Computer simulation of EGF signal transduction in PC12 cells.
Frances Brightman, Simon Thomas and David Fell
http://bms-mudshark.brookes.ac.uk/frances/fabweb5.htm
29 species

Functional Motif Identification
27 components

As we begin to connect systems we can engage in inference
• We move up the chain from data to knowledge by questioning, observing and then hypothesizing
  – These X genes are upregulated together, but are they interacting?
  – PPI network data suggests Y are
  – Are these Y part of a complex?
  – If they are always expressed together, that suggests maybe yes
• As more data is integrated and systems linked together, this becomes easier

Example of inference
(a) An interaction network of Snz–Sno proteins of S. cerevisiae. The nodes represent proteins and the lines represent yeast two-hybrid (Y2H) interactions. The red nodes represent proteins that correspond to genes in one transcriptome cluster, whereas the green nodes represent proteins that correspond to genes belonging to a different cluster. The existence of two stable complexes can be hypothesized based on the integrated data.
(b) The genes NTH1 and YLR270W have similar expression profiles (upper panel). Red indicates upregulation and green indicates downregulation. mRNA expressions of both genes are upregulated during heat shock and other forms of stress. Deletions of NTH1 and YLR270W each confer similar heat-shock sensitive phenotypes (lower panel).

Integrating heterogeneous but related observations
• How are the data related?
• What kind of model?
• What kind of inferencing?
• Is the data validated?
• Can we take a “best guess” on how it might work by drawing upon other motifs or systems with similar properties?

Problems?
• How is static data interpreted since it’s a dynamic system?
• How do we deal with low-resolution quality?
• How do we treat missing data?
• How do we deal with heterogeneous data types?
• How can we identify and evaluate competing hypotheses inferred by any system?

Yes… SB is springing out of existing efforts anyway
• E-cell (Keio University, Japan)
• BioSpice Project (Arkin, Berkeley)
• Metabolic Engineering Working Group (Palsson & Church, UCSD, Harvard)
• Silicon Cell Project (Netherlands)
• Virtual Cell Project (UConn)
• Gene Network Sciences Inc. (Cornell)
• Project CyberCell (Edmonton/Calgary)

So where do we start?
• Quantitative analysis of components and dynamics of complex biological systems
[Diagram labels: Static (Tier 1), Deterministic (Tier 2), Stochastic (Tier 3)]

Features of complex systems
• Nonlinearity: global properties are not a simple sum of the parts
• Feedback loops
• Open systems (dissipation of energy)
  – Flagella use energy
• Can have memory (response is history dependent)
  – New protein may remain in the cell after the initial response, shifting the rate of reaction the next time the cell is exposed to a chemical
  [Plot: response vs. chemical concentration]
• Nested (modules have complexity)
• There are no precise boundaries

So where do we start?
• Quantitatively account for these properties
  – Different levels of modeling
• Three tiers
  – Static interactions
  – Deterministic
  – Stochastic
• Principles which transcend tiers…

Principle 1: Modularity
• Module
  – Interacting nodes w/ common function
  – Constrained pleiotropy
  – Feedback loops, oscillators, amplifiers

Principle 2: Recurring circuit elements
• Network motifs
  – Common methods to achieve an effect

Principle 3: Robustness
• Robustness
  – Insensitivity to parameter variation
• Severe constraints on design
  – Robustness not present in most designs

Aims of systems biology
• Tier 1: Interactome
  – Which molecules talk to each other in networks?
• Tier 2: Deterministic
  – What is the average case behavior?
• Tier 3: Stochastic
  – What is the variance of the system?
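
To make the Tier 2 / Tier 3 questions above concrete, here is a minimal sketch that is not taken from the lecture. It assumes a hypothetical one-species birth-death model: molecules are produced at a constant rate k and each molecule is degraded at rate g. The deterministic part integrates dY/dt = k - g*Y to get the average-case behavior. The stochastic part uses the Gillespie direct method (a standard stochastic simulation algorithm, chosen here only for illustration) to estimate the mean and variance across simulated cells. All function names and parameter values are assumptions.

```python
import random


def ode_mean(k, g, t_end, dt=0.001, y0=0.0):
    """Tier 2 (deterministic): forward-Euler integration of dY/dt = k - g*Y."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * (k - g * y)
    return y


def gillespie_count(k, g, t_end, rng, y0=0):
    """Tier 3 (stochastic): one Gillespie trajectory; returns the count at t_end."""
    t, y = 0.0, y0
    while True:
        birth = k          # propensity of production (assumed constant)
        death = g * y      # propensity of degradation (first order in y)
        total = birth + death
        t += rng.expovariate(total)   # waiting time to the next reaction
        if t > t_end:
            return y
        if rng.random() * total < birth:
            y += 1
        else:
            y -= 1


def ensemble_stats(k, g, t_end, n_cells, seed=0):
    """Sample mean and sample variance of the count over n_cells independent runs."""
    rng = random.Random(seed)
    counts = [gillespie_count(k, g, t_end, rng) for _ in range(n_cells)]
    mean = sum(counts) / n_cells
    var = sum((c - mean) ** 2 for c in counts) / (n_cells - 1)
    return mean, var


if __name__ == "__main__":
    k, g, t_end = 20.0, 1.0, 10.0   # illustrative values only
    print("ODE value at t_end:", ode_mean(k, g, t_end))
    mean, var = ensemble_stats(k, g, t_end, n_cells=1000)
    print("Ensemble mean:", mean, "ensemble variance:", var)
```

With these assumed values, the ODE and the ensemble mean should both settle near k/g = 20. The ODE says nothing about the spread between cells; only the stochastic ensemble gives a variance, which for this particular model is close to the mean.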
Aims of systems biology
• Tier 1
  – Get parts list
• Tier 2 & 3
  – Enumerate biochemistry
  – Define network/mathematical relationships
  – Compute numerical solutions
• Tier 2 & 3
  – Deterministic: Behavior of system with respect to time is predicted with certainty given initial conditions
  – Stochastic: Dynamics cannot be predicted with certainty given initial conditions
• Deterministic
  – Ordinary differential equations (ODEs)
    • Concentration as a function of time only
  – Partial differential equations (PDEs)
    • Concentration as a function of space and time
• Stochastic
  – Stochastic update equations
    • Molecule numbers as random variables
    • functions of time
    Y = # molecules at time t

Tier 1: Static interactome analysis
• Protein-protein
  – Signal transduction
  – Cell cycle
• Protein-DNA
  – Gene regulation
• Metabolic pathways
  – Respiration
  – cAMP
• Goals
  – Determine network topology
  – Network statistics
  – Analyze modular structure
• Limitations:
  – Time, space, population average
  – Crude interactions
    • strength
    • types
  – Global features
    • starting point for Tier 2 & 3
  [Figures: typical interactome; first time-varying yeast interactome (Bork 2005)]
• Analysis methods
  – Functional Genomics
    • expression analysis
    • network integration
  – Graph Theory
    • scale free
    • small world

Tier 2: Deterministic Models
• Goal
  – model mesoscale system
  – average case behavior
• Three levels
  – ODE system (lumped cell)
  – ODE compartment system (cell compartments)
  – PDE (continuous time & space; MinCDE oscillation)
  – data limited…

Tier 2: Deterministic Modeling
• Results
  – Robust Chemotaxis (Barkai 1997)
  – MinCDE Oscillation (Howard 2003)
  – Feedback in Signal Transduction (Brandman 2005)
• Output
  – time series plots (ODE)
  – condition on parameter values
Brandman 2005
• Example
  – Robustness in bacterial chemotaxis
• Bacterial chemotaxis robust to parameter fluctuations!
  – Chemotaxis: bacterial migration towards/away from chemicals
  – Parameters
    • concentrations
    • binding affinities
• Bacterial chemotaxis
  – model as random walk
• Exact adaptation
  – change in concentration of chemical stimulant
  – rapid change in bacterial tumbling frequency…
  – then adapts back precisely to its prestimulus value!!
[Figure: random walk]

Experimental Design
• Is exact adaptation robust to substantial variations in biochemical parameters?
• Systematically varied concentrations of chemotaxis-network proteins and measured resulting behavior
Distinguish between robust-adaptation and fine-tuned models of chemotaxis
[Diagram: E. coli cheR -/- population with pUA4; IPTG inducer; express CheR over a 100-fold range; 1 mM L-aspartate; readouts: tumbling frequency, adaptation time, adaptation precision]
Adaptation precision = ratio of steady-state tumbling frequency of unstimulated to stimulated cells

Summary of results
• Tumbling frequency: 0.3 ± 0.06 (20-fold)
• Adaptation time: 3 ± 1 (3-fold)
• Adaptation precision: 1.04 ± 0.07
[Figure: tumbling frequency as a function of time for wild-type cells]

Conclusions from study
• Exact adaptation is maintained despite substantial variations in network-protein concentrations
  – Exact adaptation is a robust property
  – …but adaptation time and steady-state behavior are fine-tuned

Tier 3: Stochastic Analysis
• Fluctuations in abundance of expressed molecules at the single-cell level
  – Leads to non-genetic individuality of isogenic population
• When stochasticity is negligible, use deterministic modeling…
• Molecular “noise” is low:
  – System is large
    • molar quantities
  – Fast kinetics
    • reaction time negligible
  – Large cell volume
    • infinite boundary conditions
• Molecular “noise” is high:
  – System is small
    • finite molecule count matters
  – Slow kinetics
    • relative to movement time
  – Large cell volume
    • relative to molecule size
• Need explicit stochastic modeling!

Tier 3: Ensemble Noise
• Transcriptional bursting
  – Leaky transcription
  – Slow transitions between chromatin states
• Translational bursting
  – Low mRNA copy number

Tier 3: Temporal Noise
Canonical way of modeling molecular stochasticity

Tier 3: Spatial Noise
Finite number effect: translocation of molecules from the nucleus to the cytoplasm has a large effect on nuclear concentration
[Diagram labels: Cytoplasm, Nucleus]
N = average molecular abundance
η (coefficient of variation) = σ/N
• Decrease in abundance results in a 1/√N scaling of the noise (η = 1/√N)

Recap
• Three tiers
  – Interactomes
  – Deterministic
  – Stochastic
• Principles which cross tiers
  – Modularity
  – Reuse
  – Robustness

Major challenges and limitations
• Measurement of chemical kinetics parameters and molecular concentrations in vivo
  – Differences between in vitro and in vivo data
• Compartment-specific reactions
• Data is the limit!!!
  – Functional genomic data (Interactomes)
  – E. coli chemotaxis (Leibler, deterministic/robustness)
• Important
  – parameter estimation
  – feedback based estimation methods
Sachs 2005

Software
• Tier 1: Interactomes
  – Graphviz, Bioconductor, Cytoscape
• Tier 2: Deterministic
  – Matlab (SBtoolbox), Mathematica (PathwayLab)
• Tier 3: Stochastic
  – R, Stochsim
• High-performance algorithms to solve systems of PDEs
  – Virtual Cell
• Automated parsing of networks into stochastic and deterministic regimes
  – H-GENESIS
  – STOCK

Summary
• Systems Biology can be done by breaking down each system into modules
• Many problems remain unsolved in exactly how to do this, but independent efforts are being developed in most areas that may one day merge together

For next time
• Read supplemental material S9
• Homework #10 due
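
Worked check of the 1/√N noise scaling from the Tier 3: Spatial Noise slide. This is a sketch under an assumption the lecture does not state: that the molecule count is Poisson distributed, so its variance equals its mean N. The example values of N are illustrative only.

```latex
\sigma^2 = N
\;\Longrightarrow\;
\eta = \frac{\sigma}{N} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}},
\qquad
\eta\big|_{N=10000} = 0.01, \quad
\eta\big|_{N=100} = 0.1, \quad
\eta\big|_{N=4} = 0.5 .
```

Under that assumption, a species present at about 100 copies fluctuates by roughly 10 % of its mean, while one present at 4 copies fluctuates by roughly 50 %.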