GCB/CIS/MTR 535: Introduction to Bioinformatics (Spring 2016)

Course Directors:
Benjamin F. Voight, PhD
Department of Systems Pharmacology and Translational Therapeutics
Department of Genetics
10-126 Smilow Center for Translational Research (SCTR)
3400 Civic Center Blvd
215-746-8083
bvoight@upenn.edu
Casey S. Greene, PhD
Department of Systems Pharmacology and Translational Therapeutics
10-131 Smilow Center for Translational Research (SCTR)
3400 Civic Center Blvd
215-573-2991
csgreene@mail.med.upenn.edu
TAs:
Zhang (Eric) Chen, zhch@mail.med.upenn.edu
Katie Siewert, ksiewert@mail.med.upenn.edu
Alex Amlie-Wolf, alexaml@mail.med.upenn.edu
Guest Proctors:
Sarah Middleton, sarahmid@mail.med.upenn.edu
Greg Grant, ggrant543@gmail.com
Location/Time: MWF 10-11 AM, 10-146 SCTR
Office Hours:
Ben: Wed 12-1 PM, 10-126 SCTR; by appointment, 10-126 SCTR
Casey: Fri 12-1 PM, 10-131 SCTR; by appointment, 10-131 SCTR
TAs: Tue TBD, location TBD; Thu TBD, location TBD; by appointment
Course web site: We will use Canvas (https://upenn.instructure.com/) to disseminate and
receive course materials, including lecture material. We will also use
SageMathCloud (https://cloud.sagemath.com/) for learning and homework. Completed
assignments will be uploaded there.
Course Description: This course provides a broad overview of bioinformatics and computational
biology as applied to biomedical research. A primary objective of the course is to enable students
to integrate modern bioinformatics tools into their research activities. Course material aims to
address biological questions using computational approaches and the analysis of data. Areas
include DNA sequence alignment, genetic variation and analysis, motif discovery, study design
for high-throughput sequencing, RNA and gene expression, single gene and whole-genome
analysis, machine learning, and topics in systems biology. The relevant principles underlying
methods used for analysis in these areas will be introduced and discussed at a level appropriate
for biologists without a background in computer science. However, a basic primer in
programming and operating in a UNIX environment will be presented, and students will also be
introduced to Python, R, and tools for reproducible research. This course emphasizes direct,
hands-on experience with applications to current biological research problems. The course is
NOT INTENDED for computer science students who want to learn about biologically
motivated algorithmic problems; GCB/CIS/BIOL536 and GCB537 are more appropriate.
The course will assume a solid knowledge of modern biology. An advanced undergraduate
course such as BIOL421 or a graduate course in biology such as BIOL526 (Experimental
Principles in Cell and Molecular Biology), BIOL527 (Advanced Molecular Biology and
Genetics), BIOL528 (Advanced Molecular Genetics), BIOL540 (Genetic Systems), or
equivalent, is a prerequisite.
Equipment prerequisite: IMPORTANT
All students are required to bring a laptop to class for in-class activities. TAs will provide help
with the material, but students should be computer-capable with their own laptop, and should be
willing and able to download and install free software from the Internet.
Grading: Grades will be determined by: (i) the Course Project (50% total: 25% written proposal,
25% oral presentation), (ii) homework assignments (30%), (iii) in-class lab assignments (10%),
and (iv) class participation (10%).
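As an illustration of how the grading weights combine, here is a minimal Python sketch. The component names, the 0-100 scale, and the example scores are assumptions made for illustration only; the weights are the ones listed in the grading breakdown.

```python
# Weights from the grading breakdown (fractions of the final grade).
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "project_proposal": 0.25,
    "project_presentation": 0.25,
    "homework": 0.30,
    "in_class_labs": 0.10,
    "participation": 0.10,
}


def final_grade(scores):
    """Combine component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum over all components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example = {
    "project_proposal": 90,
    "project_presentation": 80,
    "homework": 85,
    "in_class_labs": 100,
    "participation": 100,
}
print(f"{final_grade(example):.1f}")
```

Because the Course Project accounts for half of the total, a 10-point change in either project component moves the final grade by 2.5 points under these weights.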
Late grading policy: For in-class lab assignments, we will not accept late turn-ins. For
homework, we will grade assignments turned in after the due date up to one week past the
deadline, but grades will be penalized by 30%. Our late policy is driven by fairness,
because answer keys will be handed out a week after problems are due.
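The late policy for homework can be sketched in Python as follows. This is only one possible reading: it assumes the 30% penalty is applied as a fraction of the earned score, and that homework submitted more than one week late receives no credit. Neither detail is spelled out in the policy, so confirm with the course directors. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

LATE_PENALTY = 0.30                 # assumed to be a fraction of the earned score
LATE_WINDOW = timedelta(weeks=1)    # late homework is graded up to one week past due


def homework_score(raw_score, due, submitted):
    """Return the score after applying the assumed late policy."""
    if submitted <= due:
        return float(raw_score)                   # on time: no penalty
    if submitted <= due + LATE_WINDOW:
        return raw_score * (1 - LATE_PENALTY)     # late, but within one week
    return 0.0                                    # assumption: not graded after one week


# Hypothetical due date and submission times.
due = datetime(2016, 2, 1, 10, 0)
on_time = homework_score(80, due, datetime(2016, 2, 1, 9, 0))
two_days_late = homework_score(80, due, datetime(2016, 2, 3, 9, 0))
too_late = homework_score(80, due, datetime(2016, 2, 9, 9, 0))
print(on_time, round(two_days_late, 1), too_late)
```

In-class lab assignments are not covered by this sketch, since they are not accepted late at all.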
Plagiarism policy: Consistent with the University of Pennsylvania's honor code and policies on
academic integrity, we maintain a zero-tolerance policy on plagiarism. If we determine that there
is academic dishonesty, the course directors have wide discretion on grading, which includes the
assignment of failing grades. In addition, we reserve the right to pursue formal disciplinary
actions should the need arise. In many cases, students may not be aware of what constitutes
plagiarism in their work. If you are unsure, please contact one of the course directors. Please see
links below regarding the office of academic integrity for Penn’s policy on plagiarism and our
discretion on grading.
https://provost.upenn.edu/policies/faculty-handbook/students/iv-d
http://www.upenn.edu/academicintegrity/ai_violations.html
Reference Texts: As this is a rapidly moving field, there is no textbook required for this course.
We rely on a combination of online material, lecture notes, PowerPoint slides, and active
exercises. See PLOS Computational Biology (sections on Education, Developing
Computational Biology, and Translational Bioinformatics):
http://www.ploscollections.org/static/pcbiCollections;jsessionid=14B572B6330867049CE2C9F46D97CD84
In addition, the following books and online sources can serve as references:
1. Bioinformatics for Biologists, eds. Pavel Pevzner and Ron Shamir, Cambridge University
Press, 2011.
2. Bioinformatics and Functional Genomics by Jonathan Pevsner (www.bioinfbook.org/). This
compiles material used for a course at Johns Hopkins.
Date | Pre-class Activity | Pre-class Type | Class Activity | Due Dates
13-Jan | - | - | Lecture Intro/Databases | -
15-Jan | Stats for Bioinformatics | Lecture | Seeking Biological Info Online | -
18-Jan | MLK - No Class
20-Jan | Working in UNIX - I | Workbook | UNIX Activities/Problems - I | Seeking Bio Online
22-Jan | Working in UNIX - II | Workbook | UNIX Problems - II | UNIX - I
25-Jan | Programming in Python - I | Workbook | Python: Practice and Problems - I | UNIX - II
27-Jan | Programming in Python - II | Workbook | Python: Practice and Problems - II | Python - I
29-Jan | Programming in Python - III | Workbook | Python: Practice and Problems - III | Python - II
1-Feb | Programming in Python - IV | Workbook | Python: Practice and Problems - IV | Python - III
3-Feb | Intro to R - I | Workbook | R: Introduction and Problems - I | Python - IV
5-Feb | Intro to R - II | Workbook | R: Practice and Problems - II | R - I
8-Feb | Tools for Reproducible Research | Workbook | R, Python: Reproducible Research | R - II
10-Feb | Analysis of High Dimensional Data | Workbook + Lecture | R: Array Expr. Analysis: Bioconductor | Repro. Research
12-Feb | Functional Enrichment Analysis | Workbook + Lecture | R: QC, Annotation, Ontology analysis | GExp-Arrays
15-Feb | Sequence Alignment for Next Gen Data | Workbook + Lecture | BLAST, Alignment, Finding homologs/orthologs | R: QC/Annot/Ontol
17-Feb | Phylogenies/Multi-sequence alignment | Workbook + Lecture | Building phylogenies | BLAST
19-Feb | Variation and its Discovery | Workbook + Lecture | Practice Problems + Analysis (PPA) | Phylogeny
22-Feb | Bioinformatics in Pharmacology | Lecture | Pharmacology Data Analysis Activity | PPA
24-Feb | ChIP-Seq Primer - I | Workbook | Analysis of a ChIP-Seq Data Set - I | Pharmacology
26-Feb | ChIP-Seq Primer - I | Workbook | Analysis of a ChIP-Seq Data Set - II | ChIP-Seq - I
29-Feb | RNA-Seq Primer - I | Workbook | Analysis of an RNA-Seq Data Set - I | ChIP-Seq - II
2-Mar | RNA-Seq Primer - II | Workbook | Analysis of an RNA-Seq Data Set - II | RNA-Seq - I
4-Mar | RNA-Seq Primer - II | Workbook | Analysis of an RNA-Seq Data Set - III | RNA-Seq - II
SPRING BREAK - No Class
14-Mar | - | - | Project Meeting | RNA-Seq - III
16-Mar | ENCODE Primer - I | Workbook | ENCODE data/Problems - I | -
18-Mar | More on ENCODE - II | Workbook | ENCODE data/Problems - II | ENCODE - I
21-Mar | Motif Discovery | Workbook + Lecture | Motif Discovery and Analysis - I | ENCODE - II
23-Mar | cis-regulatory Modules | Workbook + Lecture | Motif Discovery and Analysis - II | Motifs - I
25-Mar | "OPEN STUDIO"
28-Mar | Regulatory Analysis | Workbook + Lecture | Motif Discovery and Analysis - III | Motifs - II
30-Mar | Machine Learning (ML) - I | Workbook + Lecture | Machine Learning Activities - I | Motifs - III
1-Apr | ML - II | Workbook + Lecture | Machine Learning Activities - II | ML - I
4-Apr | ML - III | Workbook + Lecture | Machine Learning Activities - III | ML - II
6-Apr | ML - IV | Paper | Machine Learning Activities - IV | ML - III
8-Apr | ML - V | Workbook + Lecture | Machine Learning Activities - V | ML - IV
11-Apr | Analysis of Genetic Variation - I | Workbook + Lecture | Intro to PLINK - I | ML - V
13-Apr | Analysis of Genetic Variation - II | Workbook | Analysis of Genetic Variation - II | -
15-Apr | Systems Biology | Workbook + Lecture | Analyses in Systems Biology | -
18-Apr | - | - | "Demo Day" (Greene, Voight, TAs) | -
20-Apr | - | - | Student Presentations | -
22-Apr | - | - | Student Presentations | -
25-Apr | - | - | Student Presentations | -
27-Apr | - | - | Student Presentations | Project Report Due