Introduction to Synthetic Biology 423
2013
Herbert Sauro
hsauro@u.washington.edu
www.sys-bio.org

Genes and Genomes

Smallest Genome (as of 1999)
One of the smallest genomes: Mycoplasma genitalium (a small parasitic bacterium)
Figure label: Single gene

Smallest Genome
Total genes: 521
Protein coding genes: 482
tRNA and rRNA: 39
This genome is of interest to synthetic biology because Craig Venter wants to use this organism as the basis for a minimal organism for genetic engineering. Venter’s group has removed roughly 101 genes and the organism is still viable. The idea is then to patent the minimal set of genes required for life.
PNAS (2006) 103, 425-430

Gene Function
The complexity of simplicity. Scott N Peterson and Claire M Fraser. Genome Biol. 2001;2(2):COMMENT2002. Epub 2001 Feb 8.

But the real prize goes to...
The 160-Kilobase Genome of the Bacterial Endosymbiont Carsonella. Atsushi Nakabachi, Atsushi Yamashita, Hidehiro Toh, Hajime Ishikawa, Helen E. Dunbar, Nancy A. Moran, and Masahira Hattori. Science 314 (5797), 267 (13 October 2006).
Endosymbiont: an organism that lives inside the cells of another organism.
Carsonella is a symbiont of sap-sucking psyllids, or ‘jumping plant lice’.
~182 genes

Prokaryotic Cells: E. coli
1. Bacteria lack membrane-bound nuclei
2. DNA is circular
3. No complex internal organelles
Cell length: 2-3 µm
http://www.ucmp.berkeley.edu/bacteria/bacteriamm.html
http://atlas.arabslab.com

Comparison to Eukaryotic Cells
http://www.cod.edu/people/faculty/fancher/ProkEuk.htm

E. coli Cytoplasm
Average spacing between proteins: 7 nm/molecule
Diameter of a protein: 5 nm
David S. Goodsell (Scripps)

E. coli Statistics
Length: 2 to 3 µm
Diameter: 1 µm
Generation time: 20 to 30 mins
Translation rate: 40 aa/sec
Transcription rate: 70 nt/sec
Number of ribosomes per cell: 18,000

Small molecules/ions per cell:
Alanine: 350,000
Pyruvate: 370,000
ATP: 2,000,000
Ca ions: 2,300,000
Fe ions: 7,000,000

Data from:
http://bionumbers.hms.harvard.edu
http://redpoll.pharmacy.ualberta.ca/CCDB/cgi-bin/STAT_NEW.cgi

E. coli Statistics
E. coli has approximately 4300 protein coding genes.

Protein abundance per cell:
ATP-dependent helicase: 104
LacI repressor: 10 to 50 molecules
LacZ (β-galactosidase): 5,000
CheA kinase (chemotaxis): 4,500
CheB (feedback): 240
CheY (motor signal): 8,200
Chemoreceptors: 15,000

Glycolysis
Phosphofructokinase: 1,550
Pyruvate kinase: 11,000
Enolase: 55,800
Phosphoglycerate kinase: 124,000

Krebs Cycle
Malate dehydrogenase: 3,390
Citrate synthase: 1,360
Aconitase: 1,630

Source: Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 2008, 9:102. Ishihama et al.

Molecule Numbers in Prokaryotes:
1. Ions: millions
2. Small molecules: 10,000 to 100,000
3. Metabolic enzymes: 1000 to 10,000s
4. Signaling proteins: 100 to 1000s
5. Transcription factors: 10s to 100s
6. DNA: 1 to 10s

Circular Chromosome in E. coli
Most prokaryotic DNA is circular. Genes are located on both strands of the DNA.
Genes on the outside are transcribed clockwise and those on the inside anticlockwise.
E. coli’s genome is 4,639,221 base pairs, coding for 4472 genes, of which 4316 code for proteins.
Proteins: 4316
tRNAs: 89
rRNAs: 22
Other RNAs: 64

Circular Chromosome in E. coli
88% of the E. coli genome codes for proteins; the rest includes RNA-coding genes, promoters, terminators, etc.
In contrast, the human genome has 3,000,000,000 base pairs and about 25,000 genes. Only 2% of the human genome codes for proteins. The rest is... an RNA regulatory network?
Human genes are also segmented into exons and introns, and alternative splicing significantly increases the actual number of proteins.

EcoCyc: http://ecocyc.org/

E. coli Gene Structure
Figure labels: Start codon; Stop codon (TAG, TAA, TGA)
Page 134

RNA Polymerase Binds to Promoters
Figure label: mRNA
Changes in the promoter sequence can change the efficiency of RNA polymerase binding to the DNA. The promoter is therefore a site which can be engineered.
http://mgl.scripps.edu/people/goodsell/pdb/pdb40/pdb40_1.html

Strong and Weak Promoters
The strength of a promoter is one of the factors which determines the rate of transcription.

Strong promoter. The recA promoter is a strong promoter.
recA:      TTGATA -- 16 -- TATAAT
Consensus: TTGACA -- 17 -- TATAAT (most common promoter)
The recA promoter differs from the consensus sequence by one nucleotide and by one base pair in the spacer region.

Weak promoter. The araBAD promoter is a weak promoter.
araBAD:    CTGACG -- 18 -- TACTGT
Consensus: TTGACA -- 17 -- TATAAT

RNA Polymerase Stops at a Terminator
Changes in the terminator sequence can change the efficiency of RNA polymerase stopping. If the gene is part of an operon, terminators can modulate relative expression levels of the different genes in the operon. The terminator is therefore a site which can be engineered.
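
The comparison under Strong and Weak Promoters can be made concrete by counting how far each promoter sits from the consensus TTGACA -- 17 -- TATAAT. The following is a minimal sketch, not a predictive model: the sequences and spacer lengths are those quoted in the slides, while the names `CONSENSUS`, `PROMOTERS`, `mismatches` and `consensus_distance` are hypothetical, and a raw mismatch count is only a crude proxy for promoter strength.

```python
# Consensus promoter elements as quoted in the slides:
# -35 hexamer, spacer length in base pairs, -10 hexamer.
CONSENSUS = ("TTGACA", 17, "TATAAT")

# Promoters quoted in the slides, in the same (-35, spacer, -10) layout.
PROMOTERS = {
    "recA": ("TTGATA", 16, "TATAAT"),    # described as a strong promoter
    "araBAD": ("CTGACG", 18, "TACTGT"),  # described as a weak promoter
}


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def consensus_distance(minus35, spacer, minus10):
    """Return (hexamer mismatches, spacer length difference) vs. the consensus."""
    c35, c_spacer, c10 = CONSENSUS
    hexamer_mismatches = mismatches(minus35, c35) + mismatches(minus10, c10)
    return hexamer_mismatches, abs(spacer - c_spacer)


if __name__ == "__main__":
    for name, elements in PROMOTERS.items():
        n_mismatch, spacer_diff = consensus_distance(*elements)
        print(f"{name}: {n_mismatch} hexamer mismatch(es), "
              f"spacer differs by {spacer_diff} bp")
```

Under this count the recA promoter differs from the consensus at one hexamer position and by one base pair of spacer, matching the statement in the slides, while araBAD differs at five hexamer positions and by one base pair of spacer.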
Operon Structure
Figure labels: Promoter; Gene A: 100%; Gene B: 60%; Gene C: 30%; Terminator

Operators: Regulating Expression

Gene Regulation: lac Operon
Figure labels: Promoter; Promoter; Operator; Metabolic enzyme (output)
lacZ codes for β-galactosidase.
lacY codes for β-galactoside permease.

Sugar in medium      Relative β-galactosidase
Glucose              1
Glucose + lactose    50
Lactose              2500

Gene Regulation: lac Operon
Figure labels: Lac repressor; Promoter; Promoter; Operator; Metabolic enzyme (output)

Gene Regulation: lac Operon, LacI Repressor
LacI is a tetramer (x4)

LacI binding to Promoter

Ribosome Binding Sites

In summary:
Figure labels: Promoter; Operators; 5’-UTR; RBS; Start codon; Gene; Stop codon; Terminator

This course is about networks: The Science and Engineering of Biological Networks

The world is full of networks
• Electronic
• WWW
• Road
• Social

Biological Networks

Metabolic Networks
About 1000-1400 genes code for metabolic enzymes in E. coli (out of a total of about 4300 genes).

Protein-Protein Networks
Protein Signaling Network
Notations and tools: CellDesigner, Kohn MIMs
20% of the human protein-coding genes encode components of signaling pathways, including transmembrane proteins, guanine-nucleotide binding proteins (G proteins), kinases, phosphatases and proteases.

Genetic Networks
Gene Regulatory Networks: BioTapestry
Example: Ventral Neural Tube in Vertebrate Embryo

Genetic Units
Understanding the Dynamic Behavior of Genetic Regulatory Networks by Functional Decomposition. William Longabaugh and Hamid Bolouri. Curr Genomics. 2006 November; 7(6): 333-341.

Hybrid Network: Cell Cycle Control in Bacteria

Two Kinds of Representations
1. Non-stoichiometric, or ball and stick, networks: no stoichiometry, kinetics or mass conservation. Example: Cytoscape (ball and stick).
2. Stoichiometric: reaction maps.
??: stuff that people make up; who knows what they really mean.

Network Classification
Networks
• Stoichiometric: Elementary, Non-Elementary
• Non-Stoichiometric: Probabilistic, Ball and Stick (data dependent)

Systems and Synthetic Biology
Figure labels: Systematic Biology; Network Physiology; Synthetic Biology
Top Down: Systems Biology
Bottom Up: Synthetic Biology

Top Down and Bottom Up

Top Down (“-omics”)
• System: whole cell
• Model: statistical correlations
• Data: high-throughput
Figure: Yeast Protein-Protein Interaction Map

Bottom Up (“mechanistic”)
• System: networks/pathways
• Model: mechanistic, biophysical
• Data: quantitative, single-cell