BCB 444/544 Lecture 1
What is Bioinformatics? (Genomics? Computational Biology?)
#1_Aug20
Thanks to Mark Gerstein (Yale) & Eric Green (NIH) for many borrowed & modified PPTs
BCB 444/544 F07 ISU, Dobbs, #1 - What is Bioinformatics?, 8/20/07

BCB 444/544 - Introduction to Bioinformatics
Instructors:
• Drena Dobbs, ddobbs@iastate.edu
• Michael Terribilini, terrible@iastate.edu
• Jae-Hyung Lee, jhlee777@iastate.edu
TAs:
• Jeff Sander, jdsander@iastate.edu
• Pete Zaback, petez@iastate.edu
Lab: MBB 106, 4-4991

BCB 444/544 - Website
http://bindr.gdcb.iastate.edu/bcb544
• Syllabus
• Lecture & Lab Schedules (with Homework Assignments)
• Lecture PPTs
• Lab Exercises
• Practice Exams
• Grading Policy
• Project Guidelines, etc.
• Links
• Check regularly for updates!

BCB 444/544 - Computer Lab
Meets in 1304 MBB every week EXCEPT this week:
1st Lab meets in Library Rm 32
Current schedule: Thurs 1-3 PM
Conflicts? Alternatives?

BCB 444/544 - Required Textbook
Essential Bioinformatics, Jin Xiong, Cambridge, 2006, ISBN-13: 9780521600828
Textbook Companion Website: not much of one for Xiong (Xiong resources), but check out companion sites for optional texts (next slide; URLs also provided on class website)

BCB 444/544 - Optional Textbooks
Please don't buy these yet! Completely optional, perhaps useful, references. All are available from ISU Bookstore but are cheaper from online booksellers.
• Mount - good reference for both "biologists" and "computer scientists", but a bit out of date; online resources include lists of applications with URLs
• Pevsner - great overview, esp. for those with little biology background; online resources excellent: many links & PPTs
• Jones & Pevzner - good introduction to basic algorithms, esp. for biologists with little computer science background; online resources very good: problems, links & PPTs

Required Reading (after today, must read before lecture)
Wed Aug 22 - for Lecture #2
• Xiong Textbook:
  • Chp 1 - Introduction
  • Chp 2 - Biological Databases
Thurs Aug 23 - for Lab #1
• Literature Resources for Bioinformatics, Andrea Dinkelman (see Lab Schedule for URL)
Fri Aug 24
• Genomics & Its Impact on Science & Society: Genomics & Human Genome Project Primer (see Lecture Schedule for URL)

Xiong: Chps 1 & 2
SECTION I: INTRODUCTION AND BIOLOGICAL DATABASES
1 Introduction
  What Is Bioinformatics?
  Goal
  Scope
  Applications
  Limitations
  New Themes
  Further Reading
2 Introduction to Biological Databases
  What Is a Database?
  Types of Databases
  Biological Databases
  Pitfalls of Biological Databases
  Information Retrieval from Biological Databases
  Summary
  Further Reading

Assignment #1: Tell us about you
Due: Wed, Aug 22
1- Complete HW1_Aug20 for Drena

Assignment #2 (& for Fun): DNA Interactive "Genomes"
http://www.dnai.org/c/index.html
A tutorial on genomic sequencing, gene structure, gene prediction
Howard Hughes Medical Institute (HHMI), Cold Spring Harbor Laboratory (CSHL)
1. Take the Tour
2. Read about the Project
3. Do some Genome Mining with:
Nothing to turn in - just do it!

What is Bioinformatics?
Wikipedia:
• Bioinformatics and computational biology involve the use of techniques including:
    applied mathematics
    informatics
    statistics
    computer science
    artificial intelligence
    chemistry & biochemistry (& engineering)
  to solve biological problems, usually on the molecular level
• Research in computational biology often overlaps with systems biology (& genomics)

What is Systems Biology? Genomics?
Wikipedia:
• Systems Biology - a term used very widely in the biosciences, particularly from the year 2000 onwards, and in a variety of contexts...
• Genomics - is the study of an organism's entire genome
Hmmm -- these aren't very useful!

What is Bioinformatics?
Gerstein (Yale):
• Bioinformatics is conceptualizing biology in terms of molecules & applying "informatics" techniques - derived from disciplines such as mathematics, computer science, and statistics - to organize and understand information associated with these molecules, on a large scale
(Modified from Mark Gerstein)

What is the Information? Biological Sequences, Structures, Processes
Central Dogma of Molecular Biology
• DNA Sequence (1 gene) -> mRNA -> Protein -> Phenotype
• Molecules: Sequence, Structure, Function
• Processes: Mechanism, Specificity, Regulation
Central Paradigm for Computational Biology
• DNA Sequence (entire genome) -> mRNA, rRNA, tRNA, snRNAs, regulatory RNAs (e.g. miRNAs) -> Proteins -> Phenotype
• Molecules & Systems: Sequence, Structure, Function; Interactions; Pathways & Networks
• Large Amounts of Information: Standardized ontologies; Statistical analyses
(Modified from Mark Gerstein)

Explosion of "Omes" & "Omics!"
Genome, Transcriptome, Proteome…
• Genome - complete collection of DNA (genes and "non-genes") of an organism
• Transcriptome - complete collection of RNAs (mRNAs & others) expressed in an organism*
• Proteome - complete collection of proteins expressed in an organism*

Genome = Constant (more or less…)
Transcriptome & Proteome = Variable
* Note:
• Although DNA is "identical" in all cells of a single organism, both types and amounts of RNAs & proteins vary greatly in different cells & tissues
• Expression patterns depend on variables such as developmental stage, age, disease state, environmental conditions, etc.

Molecular Biology Information: DNA & RNA Sequences
Functions:
• Genetic material (DNA)
• Information transfer (mRNA)
• Protein synthesis (rRNA/tRNA)
• Catalytic & regulatory activities (some very recently discovered!)
Information:
• 4 letter alphabet: A C G T of DNA nucleotides (nt)
• ~ 1000 base pairs (bp) in avg gene (in bacteria)
• ~ 3 x 10^9 bp in human genome
DNA sequence:
  atggcaattaaaattggtatcaatggttttggtcgtat
  gcacaacaccgtgatgacattgaagttgtaggtattaa
  atggcttatatgttgaaatatgattcaactcacggtcg
  aaagatggtaacttagtggttaatggtaaaactatccg
  Gcaaacttaaactggggtgcaatcggtgttgatatcgc
  tttaactgatgaaactgctcgtaaacatatcactgcag
  gcgcaaaaaaagtt
RNA sequence has "U" instead of "T"
• Where are the genes?
• Which DNA sequences encode RNA?
• Which genomic DNA is "junk"?
• Which RNA sequences encode proteins?
(Modified from Mark Gerstein)

Molecular Biology Information: Protein Sequences
Functions: Most cellular functions are either performed by or regulated by proteins
• Biocatalysis
• Cofactor transport/storage
• Mechanical motion/support
• Immune protection
• Regulation of growth and differentiation
Information:
• 20 letter alphabet: ACDEFGHIKLMNPQRSTVWY of amino acids (aa)
• ~ 300 aa in an average protein (in bacteria)
• > 3 x 10^6 known protein sequences
Protein sequences:
  d1dhfa_  LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT
  d8dfr__  LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS
  d4dfra_  ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL
  d3dfr__  TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV
• What is this protein?
• Which amino acids are most important for folding, activity, or interaction with other proteins?
• Which sequence variations are harmful (or beneficial)?
(Modified from Mark Gerstein)

Molecular Biology Information: Macromolecular Structures
DNA/RNA/Protein Structures
• How does a protein (or RNA) sequence fold into an active 3-D structure?
• Can we predict structure from sequence?
• Can we predict function from structure (or perhaps from sequence alone)?
(Modified from Mark Gerstein)
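The question "Which amino acids are most important?" on the Protein Sequences slide can be approached, very roughly, by looking for positions that never change across related sequences. The following is a minimal sketch, not a method from the lecture: it assumes the four fragments on the slide are already aligned (with "-" as a gap), that the identifiers pair with the sequences in the order listed, and the function names `percent_identity` and `conserved_columns` are hypothetical.

```python
# Aligned fragments copied from the Protein Sequences slide; '-' is a gap.
# Assumption: identifiers and sequences pair up in the order they are listed.
ALIGNED = {
    "d1dhfa_": "LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT",
    "d8dfr__": "LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS",
    "d4dfra_": "ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL",
    "d3dfr__": "TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, non-gap columns where the two residues match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def conserved_columns(seqs: list[str]) -> list[int]:
    """0-based indices of columns where every sequence has the same residue."""
    return [
        i
        for i, column in enumerate(zip(*seqs))
        if "-" not in column and len(set(column)) == 1
    ]


if __name__ == "__main__":
    seqs = list(ALIGNED.values())
    cols = conserved_columns(seqs)
    print("conserved columns:", cols)
    print("conserved residues:", "".join(seqs[0][i] for i in cols))
    pid = percent_identity(ALIGNED["d1dhfa_"], ALIGNED["d8dfr__"])
    print(f"d1dhfa_ vs d8dfr__: {pid:.1f}% identity")
```

Conservation in four short fragments is only a hint: strictly conserved positions are candidates for functional or structural importance, but real analyses use many more sequences and substitution scoring rather than exact identity.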
Molecular Biology Information: Biological Processes
Genomics & Systems Biology
• How do patterns of gene expression determine phenotype?
• Which genes and proteins are required for differentiation during development?
• How do proteins interact in biological networks?
• Which genes and pathways have been most highly conserved during evolution?
(Modified from Mark Gerstein)

We don't understand the protein folding code yet, but we try to engineer proteins anyway!

"On a Large Scale?"
Whole Genome Sequencing
1st a complete bacterial genome, then yeast
"Genome sequences now accumulate so quickly that, in less than a week, a single laboratory can produce more bits of data than Shakespeare managed in a lifetime, although the latter make better reading."
-- G A Petsko, Nature 401: 115-116 (1999)
(Modified from Mark Gerstein)

Genome Projects: Rapid Automated Sequencing
Another recent improvement: rapid & high resolution separation of fragments in capillaries instead of gels (E Yeung, Ames Lab, ISU)
More recently?
• Pyro-sequencing
• 454 sequencing, http://www.454.com/
• $ 1000 genomes?
(Modified from Eric Green)

1st Draft Human Genome: "Finished" in 2001
• How many genes?

Human Genome Sequencing
Two approaches:
• Public (government) - International Consortium (mainly 6 countries, NIH-funded in US)
  • Hierarchical cloning & BAC-to-BAC sequencing
  • Map-based assembly
• Private (industry) - Celera, Craig Venter, CEO
  • Whole genome random "shotgun" sequencing
  • Computational assembly (took advantage of public maps & sequences, too)
Guess which human genome they sequenced? Craig's
(Modified from Eric Green)
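The "computational assembly" step of whole genome shotgun sequencing can be illustrated with a toy greedy overlap merger. This is a minimal sketch under strong assumptions, not the method used by Celera or the public consortium: the reads, the `min_len` threshold, and the function names `overlap` and `greedy_assemble` are hypothetical, and the reads are assumed to be error-free, from a single strand, and free of repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    # Drop exact duplicates (keeping order) and reads contained in other reads.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]

    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


if __name__ == "__main__":
    # Three made-up overlapping reads from a made-up 15-base "genome".
    reads = ["CTGAAGT", "ATGCGTA", "GTACCTG"]
    print(greedy_assemble(reads))
```

Real genomes contain long repeats and real reads contain errors and come from both strands, which is one reason map information (as in the hierarchical, BAC-to-BAC approach) helps resolve ambiguous overlaps that a purely greedy merge would get wrong.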
Public Sequencing: International Consortium
Number of human genes: ~ 20,000 (Science, May 2007)

So, having a list of parts is not enough!
BIG QUESTION? How do parts work together to form a functional system?
SYSTEMS BIOLOGY
What is a system? Macromolecular complex, pathway, network, cell, tissue, organism, ecosystem…
(Modified from Eric Green)

Is this Bioinformatics?
• Creating digital libraries: YES
  • Automated bibliographic search and text comparison
  • Knowledge bases for biological literature
• Methods for structure determination: YES
  • Computational X-ray crystallography
  • NMR structure determination
  • Distance geometry
• Metabolic pathway simulation: YES
(Modified from Mark Gerstein)

Is this Bioinformatics?
• Gene identification by sequence inspection: YES
  • Prediction of splice sites, promoters, etc.
• DNA methods in forensics: YES
• Modeling populations of organisms: YES
  • Ecological Modeling
• Genomic sequencing methods: YES
  • Assembling contigs
  • Physical and genetic mapping
• Linkage analysis: YES
  • Linking specific genes to various traits
(Modified from Mark Gerstein)

Is this Bioinformatics?
• Rational drug design
• RNA structure prediction
• Protein structure prediction
• Artificial life simulations
  • Artificial immunology
  • Computer security
YES
(Modified from Mark Gerstein)

So, this is Bioinformatics
What is it good for?
Just a few examples…

Designing drugs
• Understanding how proteins bind other molecules
• Structural modeling & ligand docking
• Designing inhibitors or modulators of key proteins
(Figures adapted from Olsen Group Docking Page at Scripps, Dyson NMR Group Web page at Scripps, and from Computational Chemistry Page at Cornell Theory Center.)
(Modified from Mark Gerstein)

Finding homologs of "new" human genes
Finding WHAT? Homologs - "same genes" in different organisms
• Human vs Mouse vs Yeast
• Much easier to do experiments on yeast to determine function
• Often, function of an ortholog in at least one organism is known
(Modified from Mark Gerstein)

Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

Human Disease | MIM # | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2 | U03911 | 9.2e-261 | MSH2 | M84170 | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1 | U07418 | 6.3e-196 | MLH1 | U07187 | DNA repair protein
Cystic Fibrosis | 219700 | CFTR | M28668 | 1.3e-167 | YCF1 | L35237 | Metal resistance protein
Wilson Disease | 277900 | WND | U11700 | 5.9e-161 | CCC2 | L36317 | Probable copper transporter
Glycerol Kinase Deficiency | 307030 | GK | L13943 | 1.8e-129 | GUT1 | X69049 | Glycerol kinase
Bloom Syndrome | 210900 | BLM | U39817 | 2.6e-119 | SGS1 | U22341 | Helicase
Adrenoleukodystrophy, X-linked | 300100 | ALD | Z21876 | 3.4e-107 | PXA1 | U17065 | Peroxisomal ABC transporter
Ataxia Telangiectasia | 208900 | ATM | U26455 | 2.8e-90 | TEL1 | U31331 | PI3 kinase
Amyotrophic Lateral Sclerosis | 105400 | SOD1 | K00065 | 2.0e-58 | SOD1 | J03279 | Superoxide dismutase
Myotonic Dystrophy | 160900 | DM | L19268 | 5.4e-53 | YPK1 | M21307 | Serine/threonine protein kinase
Lowe Syndrome | 309000 | OCRL | M88162 | 1.2e-47 | YIL002C | Z47047 | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1 | 162200 | NF1 | M89914 | 2.0e-46 | IRA2 | M33779 | Inhibitory regulator protein
Choroideremia | 303100 | CHM | X78121 | 2.1e-42 | GDI1 | S69371 | GDP dissociation inhibitor
Diastrophic Dysplasia | 222600 | DTD | U14528 | 7.2e-38 | SUL1 | X82013 | Sulfate permease
Lissencephaly | 247200 | LIS1 | L13385 | 1.7e-34 | MET30 | L26505 | Methionine metabolism
Thomsen Disease | 160800 | CLC1 | Z25884 | 7.9e-31 | GEF1 | Z23117 | Voltage-gated chloride channel
Wilms Tumor | 194070 | WT1 | X51630 | 1.1e-20 | FZF1 | X67787 | Sulphite resistance protein
Achondroplasia | 100800 | FGFR3 | M58051 | 2.0e-18 | IPL1 | U07163 | Serine/threonine protein kinase
Menkes Syndrome | 309400 | MNK | X69208 | 2.1e-17 | CCC2 | L36317 | Probable copper transporter
(Modified from Mark Gerstein)

Comparative Genomics: Genome/Transcriptome/Proteome/Metabolome
• Databases, statistics
• Occurrence of specific genes or features in a genome
  • How many kinases in yeast?
• Compare Tissues
  • Which proteins are expressed in cancer vs normal tissues?
• Diagnostic tools
• Drug target discovery
(Modified from Mark Gerstein)
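A question like "How many kinases in yeast?" is, at its simplest, a database query over annotated records. The sketch below runs such a query over a handful of rows copied from the human/yeast similarity table above. It is only an illustration: the `Match` record type and the `find_by_keyword` function are hypothetical names, the rows are a subset of the table, and matching on the word "kinase" in a description is a crude stand-in for a proper search of the whole yeast proteome.

```python
from typing import NamedTuple


class Match(NamedTuple):
    human_gene: str
    yeast_gene: str
    p_value: float      # BLASTX P-value from the table (smaller = better match)
    description: str    # yeast gene description from the table


# Subset of rows copied from the table; not the full data set.
MATCHES = [
    Match("MSH2", "MSH2", 9.2e-261, "DNA repair protein"),
    Match("CFTR", "YCF1", 1.3e-167, "Metal resistance protein"),
    Match("GK", "GUT1", 1.8e-129, "Glycerol kinase"),
    Match("ATM", "TEL1", 2.8e-90, "PI3 kinase"),
    Match("DM", "YPK1", 5.4e-53, "Serine/threonine protein kinase"),
    Match("FGFR3", "IPL1", 2.0e-18, "Serine/threonine protein kinase"),
    Match("MNK", "CCC2", 2.1e-17, "Probable copper transporter"),
]


def find_by_keyword(matches: list[Match], keyword: str) -> list[Match]:
    """Rows whose description mentions `keyword`, best (smallest) P-value first."""
    kw = keyword.lower()
    hits = [m for m in matches if kw in m.description.lower()]
    return sorted(hits, key=lambda m: m.p_value)


if __name__ == "__main__":
    kinases = find_by_keyword(MATCHES, "kinase")
    print(f"{len(kinases)} kinase-annotated yeast matches in this subset:")
    for m in kinases:
        print(f"  {m.human_gene:6s} -> {m.yeast_gene:6s} P = {m.p_value:.1e}  {m.description}")
```

The same pattern (filter records by an annotation, then rank by a statistic) underlies the tissue-comparison questions on the slide as well, with expression measurements in place of P-values.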