Instructor: David L. Tabb, PhD david.l.tabb@vanderbilt.edu
In this course, students will be introduced to the algorithms and concepts fundamental to the field of bioinformatics. The experimental problems that these algorithms address will be examined alongside the software itself.
Ideally, students will have prior exposure to computer programming, though software development is not a requirement of the class. Students who expect to develop software tools (ranging from Perl scripts to number-crunching code) in support of their research are likely to benefit most from this class, though users of publicly available web utilities will also find it useful.
Students will be evaluated on the basis of two scored elements, each comprising 50% of the final grade:
A brief quiz at the start of each class will test each student’s understanding of material presented in the previous class and any assigned readings.
Students will create a written report for a project and present their work to the class at the close of the semester. Example projects include a review of literature on a bioinformatics topic or a newly developed algorithm from one of the areas described in the course. Project plans must be approved by the course director no later than one month before the final class.
Biochemistry basics: nucleic acids, proteins, lipids, carbohydrates
Molecular biology basics: cells and organelles, transcription and translation, mutation and damage repair, cellular signaling, etc.
Molecular underpinnings of example diseases
Types of data in molecular biology: DNA electropherograms, sequences, microarrays, gels, mass spectrometry, NMR, X-ray crystallography, etc.
Defining bioinformatics and differentiating it from computational biology
Sequence alignment: Dot plots, Needleman-Wunsch, Smith-Waterman, Lipman-Pearson, BLAST
Multiple sequence alignment: ClustalW / phylograms / cladograms
Hidden Markov Models (HMMs) for motif detection
Protein families and domains: InterPro and Blocks
PAM and BLOSUM substitution matrices
Phred: assessing error rates from sequencing electropherograms
Phrap: building sequence contigs from sequencing reads
History of NCBI
Polymorphism detection
Fundamentals of cDNA arrays
Clustering genes: Quality Threshold Clustering
MIAME: standards for communication of microarray data
LIMS development
Protein structure inference
Predicting migration in 2D gel electrophoresis
Finding peaks in MALDI-TOF profiles
Statistical models for MS/MS peptide identification
MIAPE: standards for communication of proteomics data
Searching for biomarkers
Genetic regulatory networks
Functional annotation of genes
Gene Ontology (GO) terms
ANNs, SVMs, and CART decision trees