Southern California Bioinformatics Summer Institute
Wendie Johnston, Jamil Momand, Sandra Sharp, Nancy Warter-Perez

SoCalBSI Mission
To identify and educate college students for successful careers in bioinformatics

Student Requirements:
• Junior year undergraduate through 2nd year of graduate school
• Majoring in a computer-related field or a molecular life science field
• 3.0 GPA minimum
• Desires a career in the bioinformatics field

Goals of SoCalBSI
1) Familiarize computer science and molecular life science students with bioinformatics software programs
2) Introduce programming skills that will enable students to write programs independently
3) Explore the social, moral and ethical issues associated with the human genome sequence
4) Offer career counseling and exposure to career opportunities
5) Provide research experiences with professional bioinformaticists
6) Create opportunities for interactions with bioinformaticists
7) Foster long-lasting professional relationships

Molecular Life Science: Central Dogma; Molecular Basis of Disease; DNA Primary Sequence; Protein Primary, Secondary and Tertiary Structure; Molecular Evolution; Signal Transduction
Statistics: Probability Theory; Hidden Markov Models; Bayes’ Theorem; Expect Value; Rule of Counting
Computer Science: Structured Programming; Data Structures; Algorithms; Complexity Analysis; Software Engineering
Core Bioinformatics: Literature Searching; NCBI Model; Scoring Matrices; Dynamic Programming; Global vs. Local Sequence Alignment; Multiple Sequence Alignment; Phylogenetic Trees; Molecular Life Science Databases; Protein Modeling; Microarrays; Proteomics
Ethics: Privacy; Security; Human Genome

Achievement of Program Goals
S. Sharp, co-PI
- Research Training
- Post-Summer Follow-up
- Graduate School Opportunities
Jackie Leung
N. Warter-Perez, co-PI
W. Johnston, co-PI
- Didactic Training
- Professional Development
B. Krilowitz
- Special Topics instructors
Project Evaluation
- J. Faust
- E. Torres
- S. Heubach
Coordination of Research Mentors
- Research Seminar Speakers
- T. Chen, M. Pelligrini, B. Hoff, D. Robbins, G. Larson, S. Tavare, E. Mjolsness, B. Wold, M. Nordborg, T. Yeates
Participating institutions: BioDiscovery, Caltech, City of Hope, Protein Pathways, UCLA, USC, ViaLogy
Program Coordinator
- Student Recruitment
- Web Page Development
- Speaker Travel
- Budget
- Scheduling
- Student Housing
- Field Trips

Who are you?
• 8 males, 6 females
• 7 graduate students and 7 undergraduates
• 11 from CA, 1 from MA, 1 from TX, 1 from NM
• Majors in Biochemistry, Biology, Biostatistics, Computer Science, Cybernetics, Engineering and Mathematics

Open hours
• Computer workstations: open 8 am-8 pm
• SoCalBSI office (PS504): 8 am-12 noon Monday-Thursday; 8 am-4 pm Friday
Eating on campus
Campus escort

Welcome to SoCalBSI
June 16, 2003
Review of course prerequisites

Course objectives
• Become proficient at using existing bioinformatics software
• Understand the statistics behind bioinformatics
• Write an algorithm that answers a specific bioinformatics problem
• Become acquainted with some ethical issues surrounding the science of the Human Genome Project

Rationale for offering the SoCalBSI Program
1. Give participants a chance to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, BLIMPS).
2. Give students the opportunity to create software that uses concepts from bioinformatics history.
3. Give students insight into bioinformatics so they can make an informed decision about how to achieve their education and career goals.

Learning Goals of the Didactic Portion of the Program
• Introduce software and databases currently used by bioinformaticists
• Introduce the organization of the data: how data is gathered and how it is annotated
• Introduce the statistics of data analysis
• Introduce the concept of dynamic programming
• Give students an opportunity to create an algorithm that analyzes sequence data
• Encourage students to research a bioinformatics company and report on its products
Course logistics
• Course website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
• PowerPoint presentation
• In-class workshop
• References
• Writing assignment

Definition of Bioinformatics
There are many definitions at the moment:
• Use of computers to catalog and organize biological information into meaningful entities.
• Conceptualization of biology in terms of molecules, and the application of “informatics” techniques (from disciplines such as applied math, computer science and statistics) to understand and organize the information associated with these molecules.

Bioinformatics is Multidisciplinary
Genomics, Drug Design, Computer Science, Molecular Life Sciences, Phylogenetics, Structural Biology, Math, Statistics

How much of the genome is defined?
[Figure: one portion of the genome is labeled "Unknown Function"]

How is Bioinformatics Used?
• Bioinformatics is often used to help “focus” the experiments of the benchtop scientist.
• Bioinformatics isn’t going to replace lab work anytime soon.
• Experimental proof is still the “gold standard”.

Useful textbooks on the subject
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition, by Andreas D. Baxevanis (Editor), B. F. Francis Ouellette. ISBN: 0471383910
• GenBank database
• Sequence alignment programs

Useful textbooks on the subject (2)
Computational Molecular Biology by Pavel A. Pevzner. ISBN: 0262161974
• Discussion of dynamic programming
• Needleman-Wunsch method
• Smith-Waterman method
• Recursive functions

Useful textbooks on the subject (3)
Discovering Genomics, Proteomics & Bioinformatics by Campbell and Heyer. ISBN: 0805347224
• Genome sequence acquisition and analysis
• Basic and applied research with DNA microarrays
• Proteomics
• Modeling whole-genome circuits
• Transition from genetics to genomics: medical case studies
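The dynamic programming and Needleman-Wunsch topics listed for the Pevzner text can be illustrated with a minimal Python sketch of global alignment. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, not a published scoring matrix such as BLOSUM62 or PAM250. Each table cell stores the best score for aligning two prefixes, and a traceback from the last cell recovers one optimal alignment.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align sequences a and b by dynamic programming.

    Hypothetical illustration: simple match/mismatch scores and a linear gap
    penalty are assumed, not a published scoring matrix.
    Returns (score, aligned_a, aligned_b).
    """

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    # Fill the table: each cell extends one of three smaller subproblems.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # a[i-1] against a gap
                score[i][j - 1] + gap,  # b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

For local (Smith-Waterman) alignment, the usual changes are to floor every cell at zero and to start the traceback from the highest-scoring cell, stopping when a zero is reached.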
Basis of molecular biology
Hierarchy of relationships (some exceptions):
Genome → Gene 1, Gene 2, Gene 3 ... Gene X
Gene 1 → Protein 1 → Function 1; Gene 2 → Protein 2 → Function 2; Gene 3 → Protein 3 → Function 3; ... Gene X → Protein X → Function X

How can one use bioinformatics to link diseases to genes?
Old days: functional cloning of genes (Disease → Function → Gene → Map)
1. Careful description of disease
2. Establish link between disease and metabolic defect
3. Isolate protein
4. Isolate cDNA
5. Determine if DNA is mutated in human

How can one use bioinformatics to link diseases to genes?
New days: positional cloning of genes (Disease → Map → Gene → Function)
1. Find genetic markers associated with disease
2. Sequence DNA in close proximity to the markers
3. Compare DNA from afflicted individuals to DNA of normal individuals (database)
4. Find abnormality
5. Predict gene function from sequence information

What is the approach used to sequence genomes?
Divide and conquer:
• Split the genome into fragments.
• Clone into vectors that can accept large fragments: yeast artificial chromosomes (YAC library).
• Landmarks within the genome can be obtained using Sequence Tagged Sites (STSs).
• Sequences of YAC clones are matched with each other. Sequences that overlap form contigs.
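The overlap-and-merge idea in the last step can be illustrated with a toy Python sketch. It assumes error-free fragments given as plain strings, none of them contained inside another, and joins them greedily by longest suffix-prefix overlap; the function names and the `min_len` threshold are hypothetical. This is a simplification: YAC contigs were ordered with the help of shared landmarks such as STSs rather than by exact string matching alone, and greedy merging can mis-assemble repeated sequence.

```python
def suffix_prefix_overlap(left, right, min_len):
    """Return the length of the longest suffix of `left` that equals a prefix
    of `right`, or 0 if no such overlap is at least `min_len` long."""
    # Try the longest possible overlap first; never test k = 0.
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def greedy_contigs(fragments, min_len=3):
    """Toy assembler (hypothetical): repeatedly merge the ordered pair of
    fragments with the longest overlap until no pair overlaps by at least
    `min_len` characters. Returns the remaining contigs."""
    contigs = list(fragments)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(left, right, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left that overlaps enough
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_contigs(["ATGGCGT", "GCGTACC", "TACCGGA"]))  # ['ATGGCGTACCGGA']
```

Fragments with no sufficient overlap are returned as separate contigs, which mirrors the gaps that remain between contigs before additional clones are mapped.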
History of the Human Genome Project
1953 Watson, Crick: DNA structure
1972 Berg: 1st recombinant DNA
1977 Maxam, Gilbert, Sanger sequence DNA
1980 Botstein, Davis, Skolnick, White propose to map the human genome with RFLPs
1982 Wada proposes to build automated sequencing robots
1984 MRC publishes first large genome: Epstein-Barr virus (170 kb)
1985 Sinsheimer hosts meeting at UC Santa Cruz to discuss HGP; Kary Mullis develops PCR
1986 DOE begins genome studies with $5.3 million
1987 Gilbert announces plans to start a company to sequence and copyright DNA; Burke, Olson, Carle develop YACs; Donis-Keller publishes first map (403 markers)

History of the Human Genome Project (continued)
1987 (cont.) Hood produces first automated sequencer; DuPont develops fluorescent dideoxynucleotides
1988 NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989 Hood, Olson, Botstein, Cantor propose using STSs to map the human genome
1990 Proposal to sequence 20 Mb in model organisms by 2005; Lipman, Myers publish the BLAST algorithm
1991 Venter announces strategy to sequence ESTs and plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene-finding program
1992 Simon develops BACs; US and French teams publish first physical maps of human chromosomes; first genetic maps of mouse and human genome published
1993 Collins is named director of NCHGR; plan revised to complete sequence of human genome by 2005
1995 Venter publishes first sequence of a free-living organism: H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996 Yeast genome (S. cerevisiae) is sequenced

History of the Human Genome Project (continued)
1997 Blattner, Plunkett complete E. coli sequence; a capillary sequencing machine is introduced
1998 SNP project is initiated; rice genome project is started; Venter creates a new company called Celera and proposes to sequence the human genome within 3 years; C. elegans genome completed
1999 NIH proposes to sequence mouse genome in 3 years; first sequence of chromosome 22 is announced
2000 Celera and others publish Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence puffer fish; Arabidopsis sequence is completed
2001 Celera publishes human sequence in Science; the HGP consortium publishes the human sequence in Nature
2003 Completed genomes: 112 microbial, 18 eukaryotes, 1275 viruses

Public funding vs. private funding
• Public: taxpayers’ money, international effort.
• Private: companies that invest money hope to provide access to their information on a fee basis. Celera also provides some information free of charge to small research groups.
Both groups have published the sequence of the human genome.

Useful websites that give information on bioinformatics courses
http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
http://gaia.ecs.csus.edu/~mei/bio296c.html
http://www.smi.stanford.edu/projects/helix/bmi214/
http://cmgm.stanford.edu/biochem218/ (syllabus)
http://www.sdsc.edu/~gribskov/bimm140/bimm140_lectures.html
http://linkage.rockefeller.edu/wli/bioinfocourse/