Commonly Taught Bioinformatics Topics Derived from Syllabi of 19 Universities

Group 1*: DNA analysis
Sequence Analysis**
Computational Genomics
Protein Structure and Function
Phylogenetics
Microarray Analysis

Group 2: Computing
Programming
Algorithms
Databases

Group 3: Other
General Biology
Biomedical Informatics
Statistics
Law and Ethics

*Group 1 is the most common, followed by 2 and then 3.
**Within each group, the most common topics are listed first.

Topics in Sequence Analysis
Departments: Biomedical Engineering, Bioinformatics and Computational Biology, Bioinformatics, Plant Biology, Biochemistry, Computer Science, Bioengineering, Medical Informatics, Biology, Biomolecular Engineering
Background
-General principles of DNA/RNA structure and stability
The Basics of Sequence Alignment
-Pair-wise sequence comparison
-multiple sequence alignment
-fragment assembly
-Sequence profiles and profile alignment methods
-Scoring matrices, BLOSUM
-Sequence weighting
Software
-BLAST, FASTA, CLUSTAL, GRAIL, INSIGHT II, RASMOL, HMMER
Algorithms and Models
-Hidden Markov Models
-Maximum-likelihood estimate
-Markov Chains
-Gibbs Sampling
-Dirichlet Mixtures
Phylogenetic inference (see topics in Phylogenetics)
-reconstructing evolutionary relationships
Structure prediction (see topics in protein structure and modeling)
-RNA secondary structure prediction
-protein structure prediction and comparison
Public Access Databases
-sequence and structure search tools
Whole Genome Sequencing (see topics in computational genomics)
-shotgun approaches
-EST assembly
-genome annotation

Topics in Computational Genomics
Departments: Bioinformatics, Plant Biology, Computer Science, Biochemistry, Bioinformatics and Computational Biology, Biomolecular Engineering, Biology, Biomedical Informatics
Whole Genome Reconstruction
-genome mapping
-genome assembly
Comparative Genomics
-General probabilistic graphical models
-Bayesian Networks
-Exact inference
-Learning BNs from data
-EM and structural EM
Functional Genomics
-QTLs & eQTLs
-Non-coding RNA genes
-RNA recognition
-reverse genetics
-The ENCODE project
Gene Expression Analysis
-sequencing methods (including microarrays)
-clustering
-EST libraries
-motif discovery
-DNA methylation & epigenetic gene regulation
Genome Annotation
-gene finding
-gene indices
Genetic Regulation
-gene regulatory regions
-translational regulation: siRNA and microRNA
-identifying miRNAs and their targets
Genomic Technologies
-PCR technology
-Genechips
Genome Diversity
Genome Structure
The Human Genome Project
Genome Databases
Data Mining Models
-Hidden Markov Models
-Probabilistic formulation
-Mixture models
-Gaussian Mixtures
-Biclustering
-Loss functions
-Conditional maximum likelihood
-Linear regression, GLMs, perceptrons, Neural nets
-SVMs
Proteomics and Metabolomics
Medical Applications
-Diseases and phenotypes
-Pharmaceutical discovery
-Genomic medicine
-Copy number variation
Phylogenetic Inference
-Parsimony
-Stationary Markov processes
-Rate matrices
-Maximum Likelihood
-Maximum a posteriori
-Felsenstein's post-order traversal
-PhyloHMMs

Topics in Protein Structure and Modeling
Departments: Bioinformatics, Computer Science, Biochemistry, Bioinformatics and Computational Biology, Bioengineering, Biomolecular Engineering, Biology
Physico-chemical properties of proteins
-Protein folding dynamics
Determining Protein Structure
Experimental Techniques
-X-ray Crystallography
-NMR
-cryo-EM
-mass spectrometry
Computational Techniques
-SAM-Txx prediction protocol
-Lattice-based prediction
-Undertaker protein-folding algorithm
Classification of Protein Structure
-Protein families
-Protein domains and prediction of domain boundaries
-Homology modeling
-Comparative modeling of protein structure and threading
-Significance of structure-structure similarity
-Expression data analysis (clustering and classification)
-Structure-structure alignment algorithms
Protein Function
-Protein structure-function relationships
-Prediction of functionally important sites
Public protein structure databases
-Structure database search tools
Protein interactions
-Protein-protein interaction networks
-Voronoi diagram, Delaunay triangulation
RNA secondary structure prediction
Medical Applications
-Protein microarrays and detection of autoimmune disease
-High throughput proteomic disease markers
-Computational methods for protein microarrays
-Remote homology detection
-Proteomic diagnosis of trauma

Topics in Phylogenetics
Departments: Computer Science, Biochemistry, Bioinformatics and Computational Biology, Bioengineering, Biology, Biomedical Informatics
Molecular basis of evolution
History of Phylogenetic Inference
Characters: Homology, Morphology, Molecular
Phylogenetic Tree Construction
-Alignment Strategies
-Optimality Criteria: Parsimony, ML, ME
-Algorithmic Approaches
-Searching Tree Space
-Character Weighting in Parsimony
-Clustering methods
-Hypothesis Testing: Paired Sites, Parametric Bootstraps
-Multiple Data Sets/Partitioned Models
-Molecular Clocks
-Ancestral Character State Reconstruction
Models of Sequence Evolution
-Model Selection
-Method Performance
Support for Constructed Trees
-Consensus Trees
-G1, PTP, Decay, Bootstrap
-Jackknife & Bayesian Nodal Probabilities
Non-tree Based Methods
Software tools for phylogenetic analysis
Genome comparisons
Protein structure evolution

Topics in Microarray Analysis
Departments: Biomedical Informatics, Biology, Biostatistics
Review of the basic biology of gene expression
Overview of microarray technology
Microarray Data Analysis: Statistical Techniques
-regression
-discriminant analysis
-clustering
-classification
-simple graphical models
Methods for computational and biological validation

Topics in Programming
Departments: Bioinformatics, Computer Science, Biomedical Engineering, Biology, Plant Biology, Bioinformatics and Computational Biology
General Programming Concepts (in alphabetical order)
-algorithm design
-arrays
-complex data structures
-control structures
-data types
-debugging
-designing modules
-dynamic programming
-file input/output
-functions
-graphics programming
-hashes
-introduction to machine learning
-multiprocessing & multithreaded programming
-network programming with sockets
-object oriented programming
-pointers
-recursion
-regular expressions
-sorting
-subroutines
-web programming (HTML, CGI)
Languages Taught and Language-Specific Topics
HTML
SQL
Perl
BioPerl
-Genomic resources
-Accessing Remote databases
C
-Flow Control
-C Structures
-Interface to UNIX
-Using a C Application Programming Interface (API)
Java
-Using Java Classes
-GUI Layout
-Java Events
-Java Exception Handling
UNIX
-Using UNIX for basic data processing
-UNIX command-line tools
-UNIX shell programming
-Using UNIX development tools
Intro to relational databases
Bioinformatics Applications
-DNA sequence analysis
-parsing FASTA and GenBank files
-processing BLAST output files

Topics in Algorithms
Departments: Bioinformatics, Computer Science, Biomedical Engineering, Bioengineering, Medical Informatics
Concepts of Optimization
-Continuous vs Discrete Optimization
-Constrained and Unconstrained Optimization
-Global and Local Optimization
-Stochastic and Deterministic Optimization
Optimization Algorithms
-Linear and Nonlinear programming
-Combinatorial optimization
-Heuristic search methods
Exact string matching problems
-Suffix trees
-Suffix tree algorithms
Applications
-Prediction of genetic regulatory network
-Protein structure prediction
-Design of microarray experiment, analysis of microarray data
-Biological signal finding
-Neural Networks
Greedy algorithms
Algorithm complexity
Sorting
Recursion
Dynamic programming and space management
Parallel and grid computing
Simulation
Introduction to Machine Learning
Feature Spaces
Hidden Markov Models
SVMs

Topics in Databases
Departments: Computer Science, Biomedical Engineering, Plant Biology, Information and Library Science
Relational data models and database management systems
-Relational database design
-Foreign Keys
-Relational Integrity
-Entity-Relationship modeling
-Normalization
-Transactions
SQL
-simple queries
-calculated fields
-sorting and grouping results
-aggregate functions
-multi-table queries (inner join, outer join)
-subqueries
-combining result tables
-create and alter tables
ORACLE
Biological Databases
Object-Oriented Databases
Web based programming tools to make databases accessible
Data integration and security

Topics in General Biology
Departments: Bioinformatics, Bioinformatics and Computational Biology, Biology, Medical Informatics, Biomedical Engineering
Molecular Biology
-Synthesis, structure, and function of DNA, RNA, and proteins
-Regulation and control of the synthesis of RNA and proteins
-Introduction to molecular biology of eukaryotes
-molecular biological techniques (genetics, recombinant DNA techniques)
-cell structure and cell cycle
Genetics
-relationships among genes
-regulation of gene expression
-use of genetic systems to probe genetic problems
-Mendelian genetics
-genomics
-rules of inheritance in eukaryotic organisms
-DNA replication
-molecular approaches to analyze DNA
-DNA structure
-location of DNA within the cell
-movement of genes within a chromosome
-genetic maps
-chromosome abnormalities
-mutations
-prokaryotic genetics
-genetic recombination
-DNA movement in the genome
-protein synthesis

Topics in Biomedical Informatics
Departments: Information and Library Science, Biomedical Informatics, Medical Informatics
Basics of Biomedical Informatics
-Overview of Discipline and Its History
-Biomedical Computing
-Electronic Medical Records (EMR)
-Decision Support and Health Care Quality
-Standards, Privacy and Security, Costs and Implementation
-Evidence-Based Medicine and Medical Decision-Making
-Imaging Informatics and Telemedicine
-Bibliographic Retrieval
-Networking
-Web-based Interactions
Information Retrieval
-Text Based
-Image Based
-Genomics
-Terms, Models, and Resources
-Health and Biomedical Information
-Evaluation of Systems
-Content
-Indexing
-Retrieval
-Evaluation
-Lexical-Statistical Systems
-Augmenting Systems for the User
Health sciences informatics
-Health sciences information centers
-Health information professionals and roles
-Information resources
-Information organization and access

Topics in Statistics
Departments: Bioinformatics, Medical Informatics, Biostatistics, Biomolecular Engineering
Statistical Foundations
-random variables
-probability
-statistical inference
-confidence intervals
-hypothesis testing
-correlation and regression
Advanced Statistical Concepts
-sample size and power considerations
-analysis of variance and multiple comparisons
-multiple regression and statistical control of confounding
-logistic regression
-survival analysis
-multiple testing issues and step-down procedures
-length model versus stop character for finite strings
-use of log-probability for computations
-constructing a model from data
-training, cross-training, and testing
-Z-scores (Gaussian dist.) and fat tails of extreme-value (Gumbel dist.)
-machine learning
-supervised learning
-dimensionality reduction
-clustering
-decision trees
-maximum entropy
-Bayes’ Rule and its applications
Algorithms, Models, and Processes
-Stochastic Processes (Poisson, Markov, Random Walks)
-Maximum Likelihood, Likelihood Ratios, and Sequential Analysis
-Gibbs Sampler
-Bootstrap Estimation
Biological Applications
-pairwise and multiple sequence alignment
-gene and protein classification
-phylogenetic tree construction
-high dimension functional genomics data
-gene and motif finding
Programming in R, Perl

Topics in Law and Ethics
Departments: Law, Bioinformatics
-property rights
-privacy and discrimination
-the federal regulatory role
-self-regulatory safeguards
-liability implications for individual/organizational behavior
-policy responses to societal concerns in the U.S. and abroad
Case studies
-gene therapy
-cloning
-biomaterials in the medical and health sector
-farming and crop modification in the agricultural sector

Universities where Syllabi were Collected
Boston University
George Mason University
George Washington University
Michigan State University
Northeastern University
Northern Illinois University
Oregon Health and Science University
Rochester Institute of Technology
Stanford University
University of California Santa Cruz
University of Idaho
University of Illinois at Chicago
University of Iowa
University of Michigan
University of Minnesota
University of North Carolina at Chapel Hill
University of Tennessee at Knoxville
University of Texas at El Paso
University of the Sciences in Philadelphia