Introduction to Systems Biology

Overview of the day
• Background & Introduction
• Network analysis methods
• Case studies
• Exercises

Why Systems Biology? …and why now?

Timeline of discovery
1676  van Leeuwenhoek: described single-celled organisms
1735  Carl Linnaeus: hierarchical classification of species
1859  Charles Darwin: “The Origin of Species”
1862  Louis Pasteur: microorganisms responsible for contamination; heating kills microorganisms
1866  Gregor Mendel: phenotype determined by inheritable units
1944  Avery, MacLeod, McCarty: DNA is the genetic material
1953  James Watson, Francis Crick: solve structure of DNA
1955  Frederick Sanger: complete sequence of insulin

Frederick Sanger
In 1975, he developed the chain termination method of DNA sequencing, also known as the dideoxy termination method or the Sanger method. Two years later he used his technique to sequence the genome of the phage Φ-X174, the first fully sequenced DNA genome. This earned him a Nobel Prize in Chemistry (1980), his second. Sanger earned his first Nobel Prize in Chemistry (1958) for determining the complete amino acid sequence of insulin in 1955, which showed that insulin has a precise amino acid sequence.

The genomic era
Human genome sequence “completed”, Feb 2001

PubMed abstracts indicate a recent interest in Systems Biology
[Figure: PubMed abstracts mentioning Systems Biology, annotated “Human genome completed”]

Functional genomics
• The study of genomes is called “genomics”
• Genomics led to functional genomics, which aims to characterize and determine the function of biomolecules (mainly proteins), often by the use of high-throughput technologies.
• Today, people talk about:
  – Genomics
  – Transcriptomics
  – Proteomics
  – Metabolomics
  – [Anything]omics

High-throughput applications of microarrays
• Gene expression
• De novo DNA sequencing (short)
• DNA re-sequencing (relative to reference)
• SNP analysis
• Competitive growth assays
• ChIP-chip (interaction data)
• Array CGH
• Whole genome tiling arrays

Tiling microarrays
Huber W, et al., Bioinformatics 2006

Functional genomics using gene knockout libraries for yeast
• Replacement of yeast ORFs with the kanMX gene flanked by unique oligo barcodes (“Yeast Deletion Project Consortium”)
• Similar RNAi libraries in other systems

Systematic phenotyping
[Figure: deletion strains (yfg2Δ, yfg3Δ, yfg1Δ), each carrying a unique UPTAG barcode (barcodes shown: CTAACTC, TCGCGCA, TCATAAT)]
• Rich media …
• Growth 6 hrs in minimal media (how many doublings?)
• Harvest and label genomic DNA

Systematic phenotyping with a barcode array (Ron Davis and others)
• These oligo barcodes are also spotted on a DNA microarray
• Growth time in minimal media:
  – Red: 0 hours
  – Green: 6 hours

Mass spectrometry
• Peptide identification
• Relative peptide levels
• Protein-protein interactions (complexes)
• Post-translational modifications
• Many many technologies

MudPIT (Multidimensional Protein Identification Technology)
• MudPIT describes the process of digesting, separating, and identifying the components of samples consisting of thousands of proteins.
• Separates peptides by 2D liquid chromatography (cation exchange followed by reversed-phase liquid chromatography)
• LC is interfaced directly with the ion source (microelectrospray) of a mass spectrometer
John Yates lab http://fields.scripps.edu/mudpit/index.html

Isotope coded affinity tags (ICAT)
Mass spec based method for measuring relative protein abundances between two samples
ICAT reagents:
  – Heavy reagent: d8-ICAT (X = deuterium)
  – Normal reagent: d0-ICAT (X = hydrogen)
[Figure: ICAT reagent structure, showing the biotin tag, the linker (d0 or d8), and the thiol-specific reactive group]
Ruedi Aebersold http://www.imsb.ethz.ch/researchgroup/aebersold

Protein quantification & identification via ICAT strategy
[Figure: ICAT workflow. Mixture 1 and Mixture 2 with ICAT-labeled cysteines; combine and proteolyze (trypsin); affinity separation (avidin); quantitation from light vs. heavy peaks in the mass spectrum (m/z); example peptide NH2-EACDPLR-COOH]
ICAT Flash animation: http://occawlonline.pearsoned.com/bookbind/pubbooks/bc_mcampbell_genomics_1/medialib/method/ICAT/ICAT.html

Example
Yeast grown in ethanol vs. galactose media were monitored with ICAT
Adh1 vs. Adh2 ratios are shown below…

Comparing mRNA levels to protein levels

Protein-protein interaction data
• Physical interactions
  – Yeast two-hybrid screens
  – Affinity purification (mass spec)
  – Peptide arrays
  – Protein-DNA by ChIP-chip
• Other measures of ‘association’
  – Genetic interactions (double deletion mutants)
  – Genomic context (STRING)

Yeast two-hybrid method
Y2H assays interactions in vivo. It uses the property that transcription factors generally have separable transcriptional activation (AD) and DNA-binding (DBD) domains. A functional transcription factor can be created if a separately expressed AD can be made to interact with a DBD. A protein ‘bait’ B is fused to a DBD and screened against a library of protein ‘preys’, each fused to an AD.
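A Y2H screen produces a list of bait-prey pairs that activated the reporter. The sketch below shows one way such hits could be assembled into an undirected interaction network. It is an illustration only: the proteins A to D and the hits are made up, and the function names `build_ppi_network` and `reciprocal_pairs` are hypothetical, not part of any Y2H software.

```python
from collections import defaultdict


def build_ppi_network(hits):
    """Turn reporter-positive (bait, prey) pairs into an undirected network.

    Returns a dict mapping each protein to a sorted list of its partners.
    """
    network = defaultdict(set)
    for bait, prey in hits:
        network[bait].add(prey)
        network[prey].add(bait)
    return {protein: sorted(partners) for protein, partners in network.items()}


def reciprocal_pairs(hits):
    """Return pairs detected in both orientations (X as bait and X as prey)."""
    seen = set(hits)
    confirmed = {
        tuple(sorted((bait, prey)))
        for bait, prey in seen
        if bait != prey and (prey, bait) in seen
    }
    return sorted(confirmed)


# Made-up screen results: (bait fused to DBD, prey fused to AD).
hits = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "C")]

network = build_ppi_network(hits)
confirmed = reciprocal_pairs(hits)

for protein, partners in network.items():
    print(protein, "interacts with", ", ".join(partners))
print("Seen in both orientations:", confirmed)
```

Treating the network as undirected discards the bait/prey orientation, so the reciprocal check is kept separate: a pair recovered in both orientations is often regarded as better supported than a one-way hit.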
Issues with Y2H
• Strengths
  – High sensitivity (transient & permanent PPIs)
  – Takes place in vivo
  – Independent of endogenous expression
• Weaknesses: false positive interactions
  – Auto-activation
  – ‘Sticky’ prey
  – Detects “possible interactions” that may not take place under real physiological conditions
  – May identify indirect interactions (A-C-B)
• Weaknesses: false negative interactions
  – Similar studies often reveal very different sets of interacting proteins (i.e. false negatives)
  – May miss PPIs that require other factors to be present (e.g. ligands, proteins, PTMs)

Protein-DNA interactions: ChIP-chip
Lee et al., Science 2002
Simon et al., Cell 2001

Mapping transcription factor binding sites
Harbison C., Gordon B., et al. Nature 2004

Dynamic role of transcription factors
Harbison C., Gordon B., et al. Nature 2004

Exercise: Y2H
Construct a protein-protein interaction network for proteins A, B, C, D

Systems biology and emerging properties

Can a biologist fix a radio?
Lazebnik, Cancer Cell, 2002

Building models from parts lists
• Protein-DNA interactions (measured by chromatin IP); gene levels, up/down (measured by DNA microarray)
• Protein-protein interactions (measured by protein coIP); protein levels, present/absent (measured by mass spectrometry)
• Biochemical reactions (measured by: none); biochemical levels (measured by metabolic flux measurements)

Mathematical abstraction of biochemistry

Metabolic models

“Genome scale” metabolic models
• Genes: 708
• Metabolites: 584
  – Cytosolic: 559
  – Mitochondrial: 164
  – Extracellular: 121
• Reactions: 1175
  – Cytosolic: 702
  – Mitochondrial: 124
  – Exchange fluxes: 349
Forster et al. Genome Research 2003.

One framework for Systems Biology
1. The components. Discover all of the genes in the genome and the subset of genes, proteins, and other small molecules constituting the pathway of interest. If possible, define an initial model of the molecular interactions governing pathway function (how?).
2. Pathway perturbation. Perturb each pathway component through a series of genetic or environmental manipulations.
Detect and quantify the corresponding global cellular response to each perturbation.
3. Model reconciliation. Integrate the observed mRNA and protein responses with the current, pathway-specific model and with the global network of protein-protein, protein-DNA, and other known physical interactions.
4. Model verification/expansion. Formulate new hypotheses to explain observations not predicted by the model. Design additional perturbation experiments to test these and iteratively repeat steps (2), (3), and (4).

From model to experiment and back again
Systems biology paradigm
Aebersold R, Mann M., Nature, 2003.

Continuum of modeling approaches
[Figure: axis running from top-down to bottom-up approaches]

Data integration and statistical mining
Computational tools are needed that can distill pathways of interest from large molecular interaction databases (top-down)

List of genes implicated in an experiment
• What do we make of such a result?
Jelinsky S & Samson LD, Proc. Natl. Acad. Sci. USA, Vol. 96, pp. 1486–1491, 1999

Types of information to integrate
• Data that determine the network (nodes and edges)
  – Protein-protein
  – Protein-DNA, etc…
• Data that determine the state of the system
  – mRNA expression data
  – Protein modifications
  – Protein levels
  – Growth phenotype
  – Dynamics over time

Mapping the phenotypic data to the network
• Systematic phenotyping of 1615 gene knockout strains in yeast
• Evaluation of growth of each strain in the presence of MMS (and other DNA damaging agents)
• Screening against a network of 12,232 protein interactions
Begley TJ, Rosenbach AS, Ideker T, Samson LD. Damage recovery pathways in Saccharomyces cerevisiae revealed by genomic phenotyping and interactome mapping. Mol Cancer Res. 2002 Dec;1(2):103-12.
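Mapping phenotypes onto a network amounts to annotating each node with its measured phenotype and then looking at how the annotated nodes are connected. The sketch below illustrates the idea on a toy network. It is not the analysis used by Begley et al.: the gene names G1 to G7, the interactions, and the function names are all hypothetical.

```python
def sensitive_subnetwork(interactions, sensitive):
    """Keep only interactions in which both partners are sensitive."""
    return sorted(
        tuple(sorted(edge)) for edge in interactions if set(edge) <= sensitive
    )


def untested_candidates(interactions, sensitive, tested):
    """Rank untested proteins by how many sensitive partners they have."""
    counts = {}
    for a, b in interactions:
        for node, partner in ((a, b), (b, a)):
            if node not in tested and partner in sensitive:
                counts[node] = counts.get(node, 0) + 1
    # Most sensitive partners first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data (all hypothetical).
interactions = [
    ("G1", "G2"), ("G2", "G3"), ("G3", "G4"),
    ("G1", "G5"), ("G2", "G5"), ("G4", "G6"), ("G3", "G7"),
]
tested = {"G1", "G2", "G3", "G4"}   # strains that were phenotyped
sensitive = {"G1", "G2", "G3"}      # tested strains that grew poorly

core = sensitive_subnetwork(interactions, sensitive)
candidates = untested_candidates(interactions, sensitive, tested)

print("Interactions among sensitive proteins:", core)
print("Untested proteins ranked by sensitive partners:", candidates)
```

Counting sensitive neighbours is only one simple way to prioritise untested strains; a real analysis would also need to ask whether such clustering exceeds what is expected by chance in a network of that size.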
Network models can be predictive
[Figure caption] Green nodes represent proteins identified as being required for MMS resistance; gray nodes were not tested as part of the 1615 strains used in this study; blue lines represent protein-protein interactions. The untested gene deletion strains (ylr423c, hda1, and hpr5) were subsequently tested for MMS sensitivity; all were found to be sensitive (bottom).
Begley TJ, et al. Mol Cancer Res. 2002 Dec;1(2):103-12.

Summary
• Systems biology can be either top-down or bottom-up
• We are now in the post-genomic era (don’t ignore that)
• Systematic measurements of all transcripts, proteins, and protein interactions enable top-down modeling
• Metabolic models, built bottom-up, are being refined with genomic information
• Data → Model → Predictions → Data: the cycle as a Systems Biology theme