Bioinformatics Essentials
Stephanie Tatem Murphy
smurphy@bcc.ctc.edu

DNA: ATGCATTTCGGT TTACGCCATATA GCTCGGGAATCA TGCATCGATCGA GTAGCTAGCTAG
Model organisms
Protein: PNSADADNDFEDRL RAGLCDHDKEVQGL QVRCAVUEEHMHK KQQEFENIRLDAQRL EFFAYIFQKEHMKR

What is Bioinformatics?
TGT ATT AGA ACA ATA TGT GCA ATT AAT ATA CAT TGG AAT AAT GTA ATA AGT AAT CAT CTT CCT AGT AGT AAT TAT TGT AAA TTT ACG TAT ACC TGT ATT GTT TTT AAC AAT GTT GTT GTT
TTC TGT AAA CTG ATT ATT TGT CTG
Which genes are turned off, then on?
Courtesy of Dr. Young Moo Lee, UC Davis

Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
Genome, Transcriptome, Proteome

Fundamental Dogma
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
Levels: DNA, RNA, Proteins, Pathways, Phenotypes, Populations
Existing databases: GenBank, EMBL, DDBJ, Map Databases, PDB, SwissPROT, PIR
Marked “?”: Gene Expression, Development, Regulatory Pathways, Metabolism, Clinical Data, Neuroanatomy, Biodiversity, Molecular Epidemiology, Comparative Genomics

Gene a b c d e
…ATGGCCCTGTGGATGCGCCTCCTGCCCCTG…..
DNA base sequence: recipe for amino acids
Met: Ala: Leu: Trp: Met: Arg: Leu: Leu: Pro: Leu:
Amino acid sequence = protein = trait
Art by Yelena Ponirovskaya

The Biology Project, University of Arizona
http://www.biology.arizona.edu
• DNA activity – RFLP, Inheritance: http://www.biology.arizona.edu/human_bio/activities/blackett/introduction.html
• DNA replication fork: http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/03t.html
• DNA base pairing: http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/08t.html
• DNA translation: http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/10t.html
• The Genetic Code: http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/12t.html and http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/13t.html
• DNA transcription: http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/15t.html

Bioinformatics – a Definition
(bio – informatics) Bioinformatics is conceptualizing biology in terms of molecules and applying “informatics techniques” to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.
As submitted to the Oxford English Dictionary.
What is Bioinformatics? N. M. Luscombe, et al., Yale University. Method Inform Med 4/2001

Bioinformatics – a Definition
The field of science in which biology, computer science, and information technology merge into a single discipline.
NCBI, Aug 2001
BIOINFORMATICS: BIOLOGY, COMPUTER SCIENCE, INFORMATION TECHNOLOGY

What’s in a name?
• Multiple Sequence Alignment
• Database Homology Searching
• Sequence Analysis
• Genome Mapping
• Protein Analysis
• Proteomics
• Life Science Informatics
• Sample Registration & Tracking
• 3D Modeling
• Homology Modeling
• Docking
• Intellectual Property Auditing
• Integrated Data Repositories
• Common Visual Interfaces

Bioinformatics Needs
• Multidisciplinary teams: biologists, mathematicians, computer scientists, laboratory technicians
• Users and developers to use / create:
  – scalable database infrastructure
  – standards to control vocabulary and annotation
  – new ways of visualizing, analyzing and searching data
  – new ways of delivering information, tools and results
• Faster and larger computer systems

Demo Bioinformatics Company
Onconomics Corporation
http://www.bscs.org/onco/default.htm
From nonprofit BSCS (Biological Sciences Curriculum Study)

Growth of Bioinformatics
• 50 yrs ago: Computer Programming; DNA & Protein Structure
• 20 yrs ago: Personal Computers / Internet; PCR
• Last 10 yrs: w.w.w.; Human Genome Project
• Now: All fields use computers (art, law, communication)
• Biological Research + Computer Skills = Bioinformatics
www.oreilly.com

Why informatics?
• Large size of data sets
• Allow students to ask questions of data
• Integrate current research into classroom

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
>100,000 species are represented in GenBank
all species   128,941
viruses         6,137
bacteria       31,262
archaea         2,100
eukaryota      87,147

The most sequenced organisms in GenBank
Organism                   Bases (billions)
Homo sapiens               10.7
Mus musculus                6.5
Rattus norvegicus           5.6
Danio rerio                 1.7
Zea mays                    1.4
Oryza sativa                0.8
Drosophila melanogaster     0.7
Gallus gallus               0.5
Arabidopsis thaliana        0.5
Updated 8-12-04, GenBank release 142.0. Table 2-2, Page 18.

Online datasets for all the Life Sciences
• Environment and Ecology
• Population: http://www.prb.org
• Water: http://www.waterontheweb.org/ and http://www.neptune.washington.edu/
• Geography: http://nhd.usgs.gov/ and http://data.geocomm.com/
• Chemistry
• Physics
• Biology
• Anatomy & Physiology
• Earth: http://www.dlese.org/educators/usingdata.html
• Agriculture
• Nutrition
• Plant: http://allometra.com/ath_fasta_mpss.shtml

Why use Bioinformatics?
• Data mining: generate a testable hypothesis about the function or structure of a gene or protein by identifying similar sequences in better characterized organisms.
• To help in uncovering phylogenetic relationships and evolutionary patterns.
www.tigr.org
What is Bioinformatics? N. M. Luscombe, et al., Yale University. Method Inform Med 4/2001

Biotechnology
Did You or Will You Ever?
• Ride in a car? Genetically engineered micro-organisms will someday be used to extract oil from rocks. Micro-organisms that break down oil spills are already in use.
• Drink tap water? Genetically engineered micro-organisms will someday be used to attract and filter out harmful substances from drinking water.
• Have a dog or cat? Vaccines for a number of pet diseases such as rabies will be improved by genetic engineering.
• Wear brightly colored clothes? Many clothing dyes can be made less expensively with biotechnology, and will last longer.
• Take vitamins?
Vitamins can be made more potent and less expensive with biotechnology.
• Go to the bathroom? Micro-organisms are already an important part of sewage treatment; genetic engineering will produce bacteria that are more efficient at breaking down wastes.

What Good is Recombinant DNA?
Many people with diabetes need to take a drug called insulin. In the past, this drug was extracted and purified from ground-up animal glands. It takes several pounds of cow or pig glands to produce a fraction of an ounce of insulin.
Today, the DNA with the instructions for making insulin can be spliced into a plasmid, and the insulin is produced by bacteria. It’s faster, easier, and cheaper this way.
There are still many technical problems to be solved. Not all gene splices work, and some that do may fail over time.
There are also social and environmental concerns about biotechnology. Some people fear we will upset the balance of nature if “genetically engineered” organisms escape. Others fear that recombinant DNA will be used to influence human size, race, or intelligence.
The best way for people to enjoy the benefits and avoid the problems is to stay informed and up to date about what’s happening in biotechnology.
http://www.chourave.ch/init/kid/cartoon-00.html

How Do You Make Recombinant DNA?
First, you need to isolate a specific bit of DNA with the instructions you want. To do this, you use restriction enzymes that cut DNA strands in specific places.
After you have DNA fragments, you sort them by size, using a gel. DNA is loaded onto the top of the gel, and then electricity is passed through it. This causes the DNA pieces to migrate down, and the small pieces travel farther than the large pieces.
Next, you need to add the DNA fragment into a host. In most research, the host is a plasmid, a ring of DNA found in some bacteria. The host DNA has to be exposed to restriction enzymes to make split (“sticky”) ends that will attach to the fragment.
After you mix the new and host DNA fragments, you need to add enzymes (DNA ligases) that will glue them together.

How Do You Make Recombinant DNA? (continued)
If you used a plasmid as a host, you need to put it back into a bacterium. When the bacterium replicates itself, it will copy the new DNA too. A small population of “gene-spliced” bacteria can develop into a large population in just a few days.
http://www.gene.com/gene/research/biotechnology

What is an Enzyme?
Enzymes are molecules that speed up biological reactions. For example, the enzyme carbonic anhydrase enables red blood cells to pick up and dump carbon dioxide 1 million times faster than they could without it.
Some characteristics of enzymes:
• Enzymes increase the rate of a chemical reaction.
• Enzymes are not used up by the reaction. They’re not permanently changed as a result of it.
• A single enzyme can act thousands of times.
• Enzymes are highly specific. Like a wrench that will only fit a 5/16-inch bolt, each enzyme generally works with only a particular kind of molecule.
• An enzyme increases the odds that two molecules will meet, so an enzyme is a “matchmaker”.

Why try to Design Better Enzymes?
Enzymes are fragile: they lose their shape (denature) if the temperature or acidity goes up even a little. They also denature in alcohol or oils. This is a drag! If you’re adding an enzyme to a laundry detergent, you’d like it to function in hot water, with bleach!
As we understand more and more about DNA and how it is decoded, we can rewrite the instructions for making some enzymes. By altering their shapes, we may be able to make enzymes that are sturdier and able to function under harsher conditions. We may even be able to invent some completely new enzymes!

Examples of Enzymes
• Subtilisin – This enzyme is added to laundry detergent. It breaks down proteins (like yucky egg yolk stains or gross dried blood) into tiny fragments that can be rinsed away from the fibers of the cloth.
• Papain – This enzyme breaks up proteins, and is extracted from the papaya fruit. It’s now added to contact lens cleaner solution to help dissolve away gross crusty things from soft contact lenses.
• Ceredase – Several thousand people in the United States have Gaucher disease (low levels of a crucial enzyme that dissolves fatty deposits in the liver, spleen and bone marrow). They suffer from bone pain, fractures, swelling and bleeding. Ceredase is a variation of the enzyme, produced in the laboratory, which can be used to treat the disease.
• Vianain – Originally derived from pineapples, this enzyme offers hope to burn victims. It helps prepare burned areas for skin grafts by safely dissolving damaged skin layers that would otherwise have to be removed surgically.

Journals & Books
• Public Library of Science – Open Access Journals: http://www.plosbiology.org
• International Society for Computational Biology – Book Reviews: http://www.iscb.org/bioinformaticsBooks.shtml
Free Journals:
• Biotechniques: http://www.BioTechniques.com
• Genomeweb: http://www.genomeweb.com
Books:
• The Cartoon Guide to Genetics, Larry Gonick & Mark Wheelis. Harper, 1983. ISBN 0062730991
• Introduction to Bioinformatics, Arthur Lesk. Oxford, 2002. ISBN 0199251967. http://www.oup.com/uk/lesk/bioinf
• Fundamental Concepts of Bioinformatics, Dan Krane & Michael Raymer. Benjamin Cummings, 2003. ISBN 0805346333
• Discovering Genomics, Proteomics, & Bioinformatics, A. Campbell & L. Heyer. Benjamin Cummings, 2002. ISBN 0805347224
• Understanding Biotechnology, George Acquaah. Pearson Prentice Hall, 2004. ISBN 0130945005
• Understanding Biotechnology, A. Borem, F. Santos, D. Bowen. Pearson Prentice Hall, 2003. ISBN 0131010115

Human Genome Project
• Genomics and Its Impact on Science and Society: The Human Genome Project and Beyond. U.S. Department of Energy Genome Programs. http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/index.shtml
• http://doegenomes.org
• National Center for Biotechnology Information: www.ncbi.nlm.nih.gov

A user’s guide to the human genome
Nature Genetics, www.nature.com/ng/, vol 32, pg 1-79, 01 Sep 2002
• Introduction: putting it together
• Question 8: How can one find all the members of a human gene family?
• Question 12: How does a user find characterized mouse mutants corresponding to human genes?
• Web resources: Internet resources featured in this guide

Get Schooled for Bioinformatics
• Biology
  – Know basics & have sense of biological experimentation
• Computer Science
  – Programming: C, C++, Perl, JAVA, SAS, CGI
  – Database construction, UNIX, LINUX
  – Algorithm design
• Math/Statistics
  – Probability, Experimental design
• Ethics
• “Core Bioinformatics”
  – LIMS
  – EST clustering
  – Sequence analysis & annotation

Fundamental Dogma
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
Levels: DNA, RNA, Proteins, Circuits, Phenotypes, Populations
Existing databases: GenBank, EMBL, DDBJ, Map Databases, PDB, SwissPROT, PIR
Marked “?”: Gene Expression, Development, Regulatory Pathways, Metabolism, Clinical Data, Neuroanatomy, Biodiversity, Molecular Epidemiology, Comparative Genomics
Biological Research = To enable the discovery of new biological insights as well as create a global perspective from which unifying principles in biology can be discerned. NCBI, Aug 2001

Ultra-conserved element – only 6 SNPs – mouse, rat, human
TGATCCCGGACTCTATGAATTATTGATGAGATATGAGCGTTGA TTTCCCCTTTCAG GATGCAAACTCCATTATATTGTTAAAATGGCGATTTAATCGTTG AGAATAGCTTTG GTGTGGGTTTTTTCCCCCAACTCATTTGCGCCTCCTTCCTTTT CATTTAACTCTCT TAATTAAATCCTTTAACAGATTTTAATCACTTTTTGGAG
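
Worked example: from DNA bases to amino acids
The gene slide earlier in this deck pairs the DNA sequence ATGGCCCTGTGGATGCGCCTCCTGCCCCTG with the amino acids Met, Ala, Leu, Trp, Met, Arg, Leu, Leu, Pro, Leu. As a minimal sketch of how that lookup could be automated, the Python below reads a sequence three bases at a time and looks each codon up in the standard genetic code. The names CODON_TABLE, THREE_LETTER, and translate are illustrative choices for this sketch and do not come from any tool mentioned in the slides. The sketch assumes the sequence is a coding strand that starts in frame and contains only A, C, G, and T.

```python
# Standard genetic code, written in the conventional T, C, A, G codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons (for example "ATG") to a one-letter amino acid.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter amino acid abbreviations, as used on the slide.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base, stops at the first stop codon,
    and ignores trailing bases that do not fill a whole codon.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    slide_sequence = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTG"
    protein = translate(slide_sequence)
    print(protein)                                      # MALWMRLLPL
    print(" ".join(THREE_LETTER[aa] for aa in protein))  # Met Ala Leu Trp Met Arg Leu Leu Pro Leu
```

The output matches the amino acids listed on the slide. Real sequence files often contain ambiguity codes such as N, which this sketch does not handle; an established library would be a better choice for anything beyond a classroom demonstration.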