GCB/CIS/MTR 535: Introduction to Bioinformatics (Spring 2016)

Course Directors:
Benjamin F. Voight, PhD
Department of Systems Pharmacology and Translational Therapeutics, Department of Genetics
10-126 Smilow Center for Translational Research (SCTR), 3400 Civic Center Blvd
215-746-8083, bvoight@upenn.edu

Casey S. Greene, PhD
Department of Systems Pharmacology and Translational Therapeutics
10-131 Smilow Center for Translational Research (SCTR), 3400 Civic Center Blvd
215-573-2991, csgreene@mail.med.upenn.edu

TAs: Zhang (Eric) Chen (zhch@mail.med.upenn.edu), Katie Siewert (ksiewert@mail.med.upenn.edu), Alex Amlie-Wolf (alexaml@mail.med.upenn.edu)
Guest Proctors: Sarah Middleton (sarahmid@mail.med.upenn.edu), Greg Grant (ggrant543@gmail.com)

Location/Time: MWF 10-11 AM, 10-146 SCTR

Office Hours:
Ben: Wed 12-1 PM or by appointment, 10-126 SCTR
Casey: Fri 12-1 PM or by appointment, 10-131 SCTR
TAs: Tue TBD, Thu TBD, or by appointment; location TBD

Course web site: We will use Canvas (https://upenn.instructure.com/) to provide lecture material and to disseminate and receive other course materials. We will also use SageMathCloud (https://cloud.sagemath.com/) for learning and homework; completed assignments will be uploaded there.

Course Description: This course provides a broad overview of bioinformatics and computational biology as applied to biomedical research. A primary objective of the course is to enable students to integrate modern bioinformatics tools into their research activities. Course material is aimed at addressing biological questions through computational approaches and data analysis. Areas include DNA sequence alignment, genetic variation and analysis, motif discovery, study design for high-throughput sequencing, RNA and gene expression, single-gene and whole-genome analysis, machine learning, and topics in systems biology.
The principles underlying the analysis methods used in these areas will be introduced and discussed at a level appropriate for biologists without a background in computer science. In addition, a basic primer on programming and working in a UNIX environment will be presented, and students will be introduced to Python, R, and tools for reproducible research. This course emphasizes direct, hands-on experience with applications to current biological research problems. The course is NOT INTENDED for computer science students who want to learn about biologically motivated algorithmic problems; for them, GCB/CIS/BIOL536 and GCB537 are more appropriate. The course assumes a solid knowledge of modern biology. An advanced undergraduate course such as BIOL421, a graduate course in biology such as BIOL526 (Experimental Principles in Cell and Molecular Biology), BIOL527 (Advanced Molecular Biology and Genetics), BIOL528 (Advanced Molecular Genetics), or BIOL540 (Genetic Systems), or equivalent, is a prerequisite.

Equipment prerequisite (IMPORTANT): All students are required to bring a laptop to class for in-class activities. TAs will provide help with the material, but students should be comfortable working on their own laptop and should be willing and able to download and install free software from the Internet.

Grading: Grades will be determined by: (i) the Course Project (50% total: 25% written proposal, 25% oral presentation), (ii) homework assignments (30%), (iii) in-class lab assignments (10%), and (iv) class participation (10%).

Late grading policy: We will not accept late in-class lab assignments. Homework turned in after the due date will be graded if it arrives no more than one week past the deadline, but it will receive a 30% penalty. Our late policy is driven by fairness, because answer keys will be handed out a week after problems are due.
Plagiarism policy: Consistent with the University of Pennsylvania's honor code and policies on academic integrity, we maintain a zero-tolerance policy on plagiarism. If we determine that there has been academic dishonesty, the course directors have wide discretion in grading, including the assignment of failing grades. In addition, we reserve the right to pursue formal disciplinary action should the need arise. In many cases, students may not be aware of what constitutes plagiarism in their work. If you are unsure, please contact one of the course directors. The links below describe Penn's policies on academic integrity and plagiarism, as well as our discretion in grading:
https://provost.upenn.edu/policies/faculty-handbook/students/iv-d
http://www.upenn.edu/academicintegrity/ai_violations.html

Reference Texts: As this is a rapidly moving field, no textbook is required for this course. We rely on a combination of online material, lecture notes, PowerPoint slides, and active exercises. See the PLOS Computational Biology collections, particularly the Education, Developing Computational Biology, and Translational Bioinformatics sections: http://www.ploscollections.org/static/pcbiCollections
In addition, the following books and online sources can serve as references:
1. Pavel Pevzner and Ron Shamir (eds.), Bioinformatics for Biologists, Cambridge University Press, 2011.
2. Jonathan Pevsner, Bioinformatics and Functional Genomics (www.bioinfbook.org/). This compiles material used for a course at Johns Hopkins.
Course Schedule:
Date | Lecture | Type | Class Activity | Due
13-Jan | Intro/Databases | Lecture | Seeking Biological Info Online | -
15-Jan | Stats for Bioinformatics | - | - | -
18-Jan | MLK - No Class | - | - | -
20-Jan | Working in UNIX - I | Workbook | UNIX Activities/Problems - I | Seeking Bio Online
22-Jan | Working in UNIX - II | Workbook | UNIX Problems - II | UNIX - I
25-Jan | Programming in Python - I | Workbook | Python: Practice and Problems - I | UNIX - II
27-Jan | Programming in Python - II | Workbook | Python: Practice and Problems - II | Python - I
29-Jan | Programming in Python - III | Workbook | Python: Practice and Problems - III | Python - II
1-Feb | Programming in Python - IV | Workbook | Python: Practice and Problems - IV | Python - III
3-Feb | Intro to R - I | Workbook | R: Introduction and Problems - I | Python - IV
5-Feb | Intro to R - II | Workbook | R: Practice and Problems - II | R - I
8-Feb | Tools for Reproducible Research | Workbook | R, Python: Reproducible Research | R - II
10-Feb | Analysis of High Dimensional Data | Workbook + Lecture | R: Array Expr. Analysis: Bioconductor | Repro. Research
12-Feb | Functional Enrichment Analysis | Workbook + Lecture | R: QC, Annotation, Ontology analysis | GExp-Arrays
15-Feb | Sequence Alignment for Next Gen Data | Workbook + Lecture | BLAST, Alignment, Finding homologs/orthologs | R: QC/Annot/Ontol
17-Feb | Phylogenies/Multi-sequence alignment | Workbook + Lecture | Building phylogenies | BLAST
19-Feb | Variation and its Discovery | Workbook + Lecture | Practice Problems + Analysis (PPA) | Phylogeny
22-Feb | Bioinformatics in Pharmacology | Lecture | Pharmacology Data Analysis Activity | PPA
24-Feb | ChIP-Seq Primer - I | Workbook | Analysis of a ChIP-Seq Data Set - I | Pharmacology
26-Feb | ChIP-Seq Primer - I | Workbook | Analysis of a ChIP-Seq Data Set - II | ChIP-Seq - I
29-Feb | RNA-Seq Primer - I | Workbook | Analysis of an RNA-Seq Data Set - I | ChIP-Seq - II
2-Mar | RNA-Seq Primer - II | Workbook | Analysis of an RNA-Seq Data Set - II | RNA-Seq - I
4-Mar | RNA-Seq Primer - II | Workbook | Analysis of an RNA-Seq Data Set - III | RNA-Seq - II
SPRING BREAK - No Class
14-Mar | - | - | Project Meeting | RNA-Seq - III
16-Mar | ENCODE Primer - I | Workbook | ENCODE data/Problems - I | -
18-Mar | More on ENCODE - II | Workbook | ENCODE data/Problems - II | ENCODE - I
21-Mar | Motif Discovery | Workbook + Lecture | Motif Discovery and Analysis - I | ENCODE - II
23-Mar | cis-regulatory Modules | Workbook + Lecture | Motif Discovery and Analysis - II | Motifs - I
25-Mar | "OPEN STUDIO" | - | - | -
28-Mar | Regulatory Analysis | Workbook + Lecture | Motif Discovery and Analysis - III | Motifs - II
30-Mar | Machine Learning (ML) - I | Workbook + Lecture | Machine Learning Activities - I | Motifs - III
1-Apr | ML - II | Workbook + Lecture | Machine Learning Activities - II | ML - I
4-Apr | ML - III | Workbook + Lecture | Machine Learning Activities - III | ML - II
6-Apr | ML - IV | Paper | Machine Learning Activities - IV | ML - III
8-Apr | ML - V | Workbook + Lecture | Machine Learning Activities - V | ML - IV
11-Apr | Analysis of Genetic Variation - I | Workbook + Lecture | Intro to PLINK - I | ML - V
13-Apr | Analysis of Genetic Variation - II | Workbook | Analysis of Genetic Variation - II | -
15-Apr | Systems Biology | Workbook + Lecture | Analyses in Systems Biology | -
18-Apr | "Demo Day" (Greene, Voight, TAs) | - | - | -
20-Apr | Student Presentations | - | - | -
22-Apr | Student Presentations | - | - | -
25-Apr | Student Presentations | - | - | -
27-Apr | Student Presentations | - | - | Project Report