Bioinformatics
Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
thabangh@gmail.com

Outline
• Introductions
• Biology background
• Rosalind
• Sequence alignment

[Map: Washington state, 2,763 miles, Washington, DC]
[Figure labels: Scout, Jack, Nan]

Introductory survey
___ p-value
___ t test
___ BLAST
___ Python sys.argv
___ support vector machine
___ false discovery rate
___ dynamic programming
___ recursion
___ hierarchical clustering
___ Wilcoxon test
___ Python tuple
___ Smith-Waterman
___ Bonferroni correction
___ Python dictionary
Next to each concept, please enter 1, 2 or 3, as follows:
1. I do not know this concept.
2. I have heard of this concept, but could not give a precise definition.
3. I know this concept well.
Do not sign your name.

On the index card, write
1. Your name
2. Your email address
3. Your home country
4. The year and subject of your bachelor's degree
5. The name of your favorite undergraduate course
6. Your proficiency (introductory, intermediate, advanced) in Python, linear algebra, statistics, and biology
7. An interesting thing you have done
8. One question for me

• Name: Bill Noble
• Email: thabangh@gmail.com
• Country: United States
• Degree: 1991 Symbolic Systems
• Favorite course: Programming Methodology
• Python: advanced
• Statistics: advanced
• Biology: intermediate
• Interesting thing I have done: unicycling
• Question: …

Course goals
By the end of this course, you should be able to
• Describe some basic computational challenges in bioinformatics.
• Implement and use several basic algorithms in this field.
• Describe several advanced algorithms.
This course will not
• Teach molecular biology techniques.
• Teach you how to use off-the-shelf bioinformatics software.

What is life?
• Entropy: the tendency toward disorder.
  – Living organisms have low entropy.
  – Soil has high entropy.
The cell
• Primary low-entropy compartment
• Tasks:
  – Gather energy
  – Maintain inside/outside distinction
• Strategies:
  – Movement
  – Signal transduction
  – Energy capture
  – Reproduction

Biological macromolecules
• Lipids (fat):
  – membranes
  – energy storage
• Carbohydrates (sugar):
  – energy storage
  – structure
  – cell-cell communication
• Nucleic acids
  – genetic material
• Proteins
  – workhorses of the cell

The central dogma of molecular biology
DNA → (Transcription) → RNA → (Translation) → Protein
Figure: lejeuneusa.org
Video: rcsb.org

4-letter DNA alphabet
• DNA consists of an alphabet of four bases
  – Adenine
  – Cytosine
  – Guanine
  – Thymine

Rosalind
• Visit http://rosalind.info and create a login.
• Enroll in this class via http://rosalind.info/classes/enroll/e7948c7e32/
• Solve the problem, “Installing Python.”
• Solve the problem, “Counting DNA nucleotides.”

Reverse complement
[Figure: DNA strands with partially labeled bases]
Write down the rest of the DNA sequence.
TCAGGTCACAGTT
Write down the sequence you get by reading from the blue strand, starting at the bottom.
AACTGTGACCTGA

Reverse complement
TCAGGTCACAGTT
|||||||||||||
AACTGTGACCTGA
Rosalind: Complementing a strand of DNA

One-minute response
At the end of each class
• Write for about one minute.
• Provide feedback about the class.
• Was part of the lecture unclear?
• What did you like about the class?
• Do you have unanswered questions?
• Sign your name
I will begin the next class by responding to the one-minute responses.
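
Reverse complement in Python (a sketch)
The two Rosalind exercises named in the slides, counting DNA nucleotides and complementing a strand of DNA, can be tried out with a few lines of Python. What follows is a minimal sketch, not the official Rosalind solution: the function names count_nucleotides and reverse_complement and the COMPLEMENT table are hypothetical names chosen for illustration, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
from collections import Counter

# Hypothetical names for illustration; not taken from the slides or Rosalind.
# Base-pairing rule: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_nucleotides(dna):
    """Return the number of A, C, G and T symbols in dna, in that order."""
    counts = Counter(dna)
    return tuple(counts[base] for base in "ACGT")


def reverse_complement(dna):
    """Read dna backwards and replace each base with its pairing partner."""
    # Assumes only A, C, G, T; any other character raises a KeyError.
    return "".join(COMPLEMENT[base] for base in reversed(dna))


if __name__ == "__main__":
    strand = "TCAGGTCACAGTT"  # the example strand from the slide
    print(count_nucleotides(strand))
    print(reverse_complement(strand))
```

Running the sketch on the slide's strand TCAGGTCACAGTT gives the second strand shown on the slide, AACTGTGACCTGA. A dictionary is used for the pairing rule because it keeps the four base pairs visible in one place.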