Welcome to Chem 434 Bioinformatics
March 29, 2005
• Review of course prerequisites
• Course objectives
    • Become proficient at using existing bioinformatics software
    • Understand the statistics behind bioinformatics
    • Write an algorithm that answers a specific bioinformatics problem

Rationale for offering an introductory course in bioinformatics
1. Give the students a chance to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, ScanProsite).
2. Give the students a programming assignment that is a unique component of the course.
3. Give the students insight into bioinformatics so they can make an intelligent determination of whether or not to pursue this field.

Learning Goals of CSULA Intro to Bioinformatics Course
• Introduce software and databases currently used by bioinformaticists
• Introduce the organization of the data: how data is gathered and how it is annotated
• Introduce statistics of data analysis
• Introduce the concept of dynamic programming
• Give the students an opportunity to create an algorithm that analyzes sequence data
• Encourage the student to analyze the primary literature in the field and write about current research

Course logistics
Course Website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
• Syllabus
• PowerPoint presentation
• Homework
• In-class Workshop
• References

Course logistics (continued)
Grading
• Homework (100 pts)
• Quizzes (4 x 25 pts)
• Writing Assignment (100 pts)
• Final Project (200 pts)

Course logistics (continued)
Prerequisites
• Upper division or graduate student status
• Majoring in a Computer-related field or Molecular Life Science
• If Computer-related, you must have previously completed a Molecular Life Science course with grade C or better.
• If Molecular Life Science, you must have previously completed an object-oriented programming course or C++ programming course with grade C or better.

Definition of Bioinformatics
Many definitions at the moment:
• Use of computers to catalog and organize biological information into meaningful entities.
• Conceptualization of biology in terms of molecules and the application of “informatics” techniques (from disciplines such as applied math, computer science and statistics) to understand and organize the information associated with these molecules.

Bioinformatics is Multidisciplinary
• Genomics
• Drug Design
• Computer Science
• Molecular Life Sciences
• Phylogenetics
• Structural Biology
• Math
• Statistics

How much of the genome is defined?
[Chart not recovered; surviving label: Unknown Function]

How is Bioinformatics Used?
• Bioinformatics is used to help “focus” the experiments of the benchtop scientist
• Bioinformatics isn’t going to replace lab work anytime soon
• Experimental proof is still the “Gold Standard”.

Textbook
Bioinformatics and Functional Genomics by Jonathan Pevsner
ISBN: 0471210048
• Genome sequence acquisition and analysis.
• Basic and applied research with DNA microarrays.
• Proteomics.

Useful textbooks on the subject (2)
Python Programming for the Absolute Beginner by M. Dawson
ISBN: 1592000738

Useful textbooks on the subject (3)
Learning Python, Second Edition by David Ascher and Mark Lutz
ISBN: 0596002815

Useful textbooks on the subject (4)
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition by Andreas D. Baxevanis (Editor), B. F. Francis Ouellette
ISBN: 0471383910
• GenBank database
• Sequence alignment programs

Basis of molecular life sciences
Hierarchy of relationships (some exceptions):
Genome
Gene 1        Gene 2        Gene 3        Gene X
Protein 1     Protein 2     Protein 3     Protein X
Function 1    Function 2    Function 3    Function X

How can one use bioinformatics to link diseases to genes?
Positional cloning of genes (Disease → Map → Gene → Function):
1. Find genetic markers associated with disease
2. Sequence DNA next to the markers
3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
4. Find abnormality
5. Predict gene function from sequence information

What is the approach used to sequence genomes?
Divide and conquer
• Split the genome into fragments
• Clone into vectors that can accept large fragments: yeast artificial chromosomes (YAC library)
• Landmarks within the genome can be obtained using Sequence Tagged Sites (STS)
• Sequences of YAC clones are matched with each other. Sequences that overlap form contigs.

History of the Human Genome Project
1953  Watson and Crick: DNA structure
1972  Berg: first recombinant DNA
1977  Maxam, Gilbert, and Sanger sequence DNA
1980  Botstein, Davis, Skolnick, and White propose to map the human genome with RFLPs
1982  Wada proposes to build automated sequencing robots
1984  MRC publishes first large genome: Epstein-Barr virus (170 kb)
1985  Sinsheimer hosts meeting to discuss the HGP at UC Santa Cruz; Kary Mullis develops PCR
1986  DOE begins genome studies with $5.3 million
1987  Gilbert announces plans to start a company to sequence and copyright DNA; Burke, Olson, and Carle develop YACs; Donis-Keller publishes first map (403 markers)

History of the Human Genome Project (continued)
1987 (cont)  Hood produces first automated sequencer; DuPont develops fluorescent dideoxynucleotides
1988  NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989  Hood, Olson, Botstein, and Cantor propose using STSs to map the human genome
1990  Proposal to sequence 20 Mb in model organisms by 2005; Lipman and Myers publish the BLAST algorithm
1991  Venter announces strategy to sequence ESTs and plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene-finding program
1992  Simon develops BACs; US and French teams publish first physical maps of chromosomes; first genetic maps of the mouse and human genomes published
1993  Collins is named director of NCHGR; plan revised to complete the sequence of the human genome by 2005
1995  Venter publishes first sequence of a free-living organism: H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996  Yeast genome is sequenced (S. cerevisiae)

History of the Human Genome Project (continued)
1997  Blattner and Plunkett complete the E. coli sequence; a capillary sequencing machine is introduced
1998  SNP project is initiated; rice genome project is started; Venter creates a new company called Celera and proposes to sequence the human genome within 3 years; C. elegans genome completed
1999  NIH proposes to sequence the mouse genome in 3 years; first sequence of chromosome 22 is announced
2000  Celera and others publish the Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence puffer fish; Arabidopsis sequence is completed
2001  Celera publishes the human sequence in Science; the HGP consortium publishes the human sequence in Nature
2003  Completed genomes: 112 Microbial, 18 Eukaryotes, 1275 Viruses

Public funding vs. Private funding
• Public: taxpayers’ money, international effort.
• Private: companies that invest money hope to provide access to their information on a fee basis. Celera also allows some free information to small research groups.
• Both groups have published the sequence of the human genome.

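Worked example: overlapping sequences form contigs
The divide-and-conquer slide states that sequences of clones are matched with each other and that overlapping sequences form contigs. The Python sketch below illustrates only that idea. It rests on assumptions that are not part of the course material: fragments are error-free, all lie on the same strand, and merging is greedy (largest overlap first). The function names, the `min_overlap` parameter, and the example fragments are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments, min_overlap=3):
    """Greedily merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_overlap` bases.
    Returns the list of resulting contigs."""
    contigs = list(fragments)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical fragments, chosen so each overlaps the next by 4 bases.
    reads = ["ATGCGTAC", "GTACCTGA", "CTGAAATT"]
    print(assemble(reads))  # ['ATGCGTACCTGAAATT']
```

Fragments that share no overlap of at least `min_overlap` bases are returned unmerged, as separate contigs. Real assembly software must also handle sequencing errors, reverse-complement strands, and repeated regions, all of which this sketch ignores.
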
Useful Websites that give information on Bioinformatics courses
http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
http://gaia.ecs.csus.edu/~mei/bio296c.html
http://www.smi.stanford.edu/projects/helix/bmi214/
http://www.sdsc.edu/~gribskov/bimm140/bimm140_lectures.html
http://www.nslij-genetics.org/bioinfotraining/