ICS 20X: Python Programming with Applications to Biology
Proposer: Dennis Kibler, Information and Computer Science

1) Level
   a) Undergraduate (only as a first course for CS students)
   b) Upper Division (any level for biology students; even graduate students would, I think, be fine)

2) Catalog Description
   ICS 20X: BIOINFORMATICS (4). The course assumes no background in computer science or in biology. Fundamental programming concepts are introduced in Python through a problem-oriented approach. All problems come from molecular biology.

3) Prerequisites
   a) None. Does not count as an elective CS class.

4) Logistics
   a) Final exam and nine homeworks
   b) Final grade will be based on
      (1) best 8 of 9 homeworks (80%)
      (2) final exam (20%)

5) Textbook
   Python Programming: An Introduction to Computer Science by John Zelle

6) Potential course overlaps
   i) No overlap with any ICS courses. This is intended as a pre-CS21 class for students who have no experience with programming.

7) Curriculum
   This course is intended as a pre-CS21 class for computer science students who have had little or no experience in programming. It also allows students to test their interest in computer science and to experience one of the most significant application areas of computer science. The structure of the course is problem-oriented rather than tool-oriented.

8) Potential Instructors (list all that are applicable)
   a) Dennis Kibler
   b) Alex Thornton
   c) Pierre Baldi
   d) any of the new Bioinformatics faculty

9) Expected Frequency of Offering
   a) Once a year
   b) Which quarter(s)?
      i) Fall

10) Course Overview and Goals
   This course is meant as a first course in programming for students in computer science or biology.
   The goals of the course are to:
      (1) provide a problem-oriented approach to learning computer science and programming;
      (2) provide a consistent set of examples from a single domain, computational biology, so that students can experience the application value of computer science;
      (3) empower biologists to write simple programs for the computational analysis of their data;
      (4) provide students with a computational view on understanding biological data;
      (5) enable biology students to communicate more effectively with computer scientists;
      (6) provide all students with an understanding of the limitations and potential of computation.

11) Topic Outline
   a) This course outline may vary with individual instructors. This is an ambitious syllabus that will be adjusted to match the students' ability to learn the material. Example applications are all drawn from a single domain to demonstrate the significance of the programs that students are learning to write. This also provides a natural justification for objects and modules. In general, the weekly syllabus is divided into two parts: the biological problem being addressed and the programming tools sufficient to address it. Theoretical computer science questions are also introduced, but are not discussed in depth.
      1. Introduction to Python and the process of program development
      2. Computing the GC count of a DNA file: strings, for-loops, file I/O
      3. A simple program to identify genes: file I/O, while-loops, strings, functions
      4. Computing dinucleotide/trinucleotide frequencies: hash tables
      5. Creating and analyzing k-mer distributions and signatures: hash tables, sorting
      6. Finding known patterns in DNA and amino acid sequences: regular expressions
      7. Discovering patterns (regulatory elements) in DNA sequences: modules
      8. Retrieving and selecting information from GenBank: modules
      9. Comparing genes: arrays, dynamic programming
      10. Clustering sequences and gene-expression data: objects
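
As an illustration of the kind of small program that topics 2 and 4 of the outline point toward, a minimal Python sketch follows. It is not taken from the course materials: the function names gc_count and kmer_counts and the short example sequence are assumptions made for illustration, and the sequence is held in a string rather than read from a DNA file.

```python
from collections import Counter


def gc_count(sequence):
    """Count the G and C bases in a DNA string with a plain for-loop."""
    count = 0
    for base in sequence.upper():
        if base in "GC":
            count += 1
    return count


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k.

    Counter is a dictionary subclass, so this is the hash-table
    approach named in the outline.
    """
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return counts


# Made-up example sequence (an assumption, not real data).
dna = "ATGCGCGATATGC"

print("GC count:", gc_count(dna))
print("Dinucleotide counts:", dict(kmer_counts(dna, 2)))
print("Trinucleotide counts:", dict(kmer_counts(dna, 3)))
```

Setting k to 2 or 3 gives the dinucleotide and trinucleotide counts of topic 4, and the same function with a larger k is one possible starting point for the k-mer distributions of topic 5.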