Bioinformatics Education at Harvey Mudd College
Ran Libeskind-Hadas, Department of Computer Science
Thanks to Eliot Bush (Biology) and Zach Dodds (Computer Science)

Our name is Mudd…
• Undergraduate only; 700 students
• Sciences, mathematics, and engineering

The HMC Curriculum
• Core, Major, Humanities, Electives
• The Core includes one semester of CS and one of Biology

Experiments in the Core (Semester 1, Semester 2)
• The “regular” path: Introduction to CS and Introduction to Biology (200 students per year)
• An integrated full-year course: Integrated Introduction to CS and Biology (20 students in 2009-2010)
• A one-semester integrated course: Computation and Biology, plus Introduction to Biology … or a second Biology course (40 students in 2010-2011)
  – Satisfies CS core requirement but not the Biology requirement

Computation and Biology
Core Course Objectives:
– Cover the content of the “regular” CS intro course
– Demonstrate the relationship between computing and biology
– Use computation to teach biology fundamentals and use biology to motivate computing fundamentals
– Provide students with computational tools to perform their own “dry lab” experiments

Course Structure
• Tuesday: Biologist
• Thursday: CSist
• Friday: CSist
• Weekend: Assignment

Wks 1-3
  Biology: DNA, RNA, central dogma, genes: Case study of lactose intolerance
  CS: Introduction to Python: Data, functions, and basic constructs
  Subset of student HW: Gene finding, gene expression, lactase expression
Wks 4-5
  Biology: Population genetics, molecular evolution
  CS: Designing a larger program, randomness, simulation
  Subset of student HW: Mitochondrial Eve, diploid populations with selection, molecular evolution simulations
Wks 6-7
  Biology: Sequence alignment
  CS: Recursion
  Subset of student HW: Implement alignment and extend to deal with substitutions
Wks 8-9
  Biology: Phylogenetics
  CS: Recursion on trees and phylogenetic tree algorithms
  Subset of student HW: Implementing a phylogenetic tree algorithm and making inferences from the results
Wks 10-11
  Biology: Folding: RNA to Proteins
  CS: RNA folding algorithm, efficiency, and memoization
  Subset of student HW: Implement RNA folding and visualize results
Wks 11-12
  Biology: Systems biology and modeling: Chemotaxis
  CS: Computation and modeling
  Subset of student HW: Chemotaxis simulations and evaluation of models
Wks 13-14
  Biology: Topics
  CS: Limitations of computation
  Subset of student HW: Capstone Projects

Using computation to teach biology fundamentals
• Population genetic model
• Explore effects of drift and selection, Hardy-Weinberg equilibrium

Using biology to motivate computation: RNA Folding
• Recursion and memoization

Above and Beyond…

Final project example: What makes cholera pathogenic?
• Pathogenic vs. non-pathogenic strains
• Compare all genes in one strain with all genes in the other to find orthologs (use fast global alignment)
• Programmatically BLAST unique proteins to see what they are
• Read about these unique genes and explain what they do

Courtesy of Prof. Russell Schwartz

Microarray data…
• Some genes encode transcription factors that promote or inhibit the expression of other genes
• [Figure: microarray heat map, genes by conditions. Purple is highly expressed, green is not expressed.]

Intuition Behind Network Inference
• [Figure: binary expression matrix, gene 1 to gene 4 by conditions, alongside candidate networks with + and - edges]
• Correlated expression implies common regulation
• That intuition still leaves a lot of ambiguity

Assuming a Binary Input Matrix
• We will assume that genes only have two possible states: 0 (off) or 1 (on)
• [Figure: example binary matrix, gene 1 to gene 4 by conditions]
• We will also assume that we want to find directionality but not strength of regulatory interactions
• We will exclude the possibility of regulatory cycles
• [Figure: two networks on genes 1 to 4, one acyclic (OK) and one containing a cycle (NOT OK)]

The Project
• Take binary microarray data as input
• Find the acyclic regulatory network with the highest likelihood
• Display the network somehow

Student Response
Likert scale (1 low, 7 high) survey:
• “This course stimulated my interest in the subject matter”
  – College mean: 5.53/7.0 (std. dev. 0.80)
  – Computation and Biology: 6.51/7.0
• “I learned a great deal in this course”
  – College mean: 5.76/7.0 (std. dev. 0.72)
  – Computation and Biology: 6.49/7.0
• “Time spent outside of class (per week)”
  – College mean: 4.98 hours (std. dev. 2.42)
  – Computation and Biology: 6.28 hours

What did students choose to do the following term?
Students have one elective in the spring term
• Took introductory biology: 0/40
• Took an elective other than CS or biology: 0/40
• Took an “upper division” biology course: 18/40
• Took the second CS course: 22/40
• Outperformed their peers

• Students learned the foundational content of “Intro CS” and “Intro Biology”
• Students’ programs provide rich “dry lab” experiments and simulations that reinforce understanding of biology
• Students develop general problem-solving and programming skills (e.g. DP) and have confidence to solve “new” problems on their own

Next steps…
• Increasing student demand for more courses and even a major in computational biology
• “Mathematical Biology Major” redesigned in Spring 2011 to “Mathematical and Computational Biology (MCB)” major
  – Good news: 9 MCB majors in sophomore year (6 Biology majors and 2 Biochemistry majors)
  – Bad news: Few faculty in a position to contribute

Beyond the core (intro CS, intro Biology, 3 semesters math, 2 chemistry, 1 physics, …)

Introductory Sequence
• Discrete Math
• Biology laboratory
• Introduction to Mathematical and Computational Biology

Biology Foundations
• Three of: Comparative physiology, ecology and environmental biology, evolutionary biology, molecular biology
• One biology seminar
• One biology laboratory

Mathematical and Computational Courses
• Intermediate Mathematical Biology
• Computational Biology
• One upper-division math course
• One upper-division CS course
• Three more math and CS courses

Electives, Thesis, Colloquium
• One related elective
• Colloquium
• Senior thesis

Future Plans…
• Refine and improve introductory course
• Write a book for the introductory course
• Collaborate with “sister” institutions to expand computational biology curriculum
  – New faculty
  – New courses

Questions, Comments, Heckles
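
Backup: a sketch of “recursion and memoization” for RNA folding

The slides name RNA folding as the motivation for recursion and memoization but do not show an algorithm. The sketch below is one possible illustration, not the course's actual assignment. It assumes a simple model that maximizes the number of non-crossing base pairs (in the style of the Nussinov algorithm). The function name max_pairs, the PAIRS set (Watson-Crick plus G-U wobble pairs), and the min_loop parameter are all assumptions made for this example.

```python
from functools import lru_cache

# Assumed pairing rule for this sketch: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(rna, min_loop=0):
    """Return the maximum number of non-crossing base pairs in rna.

    min_loop is the minimum number of unpaired bases required between
    the two bases of a pair (hypothetical parameter, 0 by default).
    """

    @lru_cache(maxsize=None)  # memoization: each (i, j) is solved only once
    def fold(i, j):
        # Best pair count for the substring rna[i..j], inclusive.
        if j - i <= min_loop:
            return 0  # too short to contain any pair
        # Option 1: base i stays unpaired.
        best = fold(i + 1, j)
        # Option 2: base i pairs with base k, which splits the problem
        # into the region inside the pair and the region after it.
        for k in range(i + 1 + min_loop, j + 1):
            if (rna[i], rna[k]) in PAIRS:
                best = max(best, 1 + fold(i + 1, k - 1) + fold(k + 1, j))
        return best

    return fold(0, len(rna) - 1) if rna else 0
```

Without the lru_cache line the same recursion re-solves the same substrings many times over; with it, each (i, j) pair is computed once, which is the efficiency point the course outline associates with memoization.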