Bioinformatics: One Minute and One Hour at a Time

Laurie J. Heyer
L.R. King Asst. Professor of Mathematics
Davidson College
laheyer@davidson.edu

What is Bioinformatics?
[Diagram labels: Computer Science, Mathematics, Bioinformatics, Biology]

Genomics, Proteomics and Systems Biology
• Primary audience
– Junior bio majors
• Prerequisites
– Bioinformatics and intro molecular biology, or
– One of several 300-level biology courses
• Course home page:
– http://www.bio.davidson.edu/genomics
• “Math Minutes”
• Taught by A. Malcolm Campbell (Biology)

Sample Topic: DNA Microarrays

Plotting Expression Data
• One highlighted gene is induced 16-fold
• One highlighted gene is repressed 16-fold
• But on the untransformed ratio scale, induction looks much more dramatic

Log Transformation
• Calculate log2 of each ratio
• Ratio of 16 becomes value of 4
• Ratio of 0.0625 (1/16) becomes value of –4
• Induction and repression look equal in magnitude, but opposite in sign

Hierarchical Clustering
• Join the two most similar genes
• Join the next two most similar “objects” (genes or clusters of genes)
• Distance from one gene to a set of genes is the minimum of all distances from the gene to the individual members (Single Linkage)
• Repeat until all genes have been joined

Genome Consortium for Active Teaching (GCAT)
http://www.bio.davidson.edu/GCAT

High School Chips
See Kathy Gabric’s page: http://cstaff.hinsdale86.org/~kgabric/honorscalendar.html

Bioinformatics Course
• Prerequisites
– Genomics or experience with modeling and “algorithmic thinking”
• Goals:
– To understand and apply various algorithms and statistical tests for analyzing DNA, RNA and protein sequences, and DNA microarray data.
– To gain practical experience with Perl, a programming language widely used in molecular biology, web design, and text processing.
• Course home page
– http://gcat.davidson.edu/bioinformatics/bioinf.html

Bioinformatics Topics
• Determining sequences
• Comparing sequences
• Finding genes
• Predicting structure
• Comparing genomes
• Inferring phylogenies
• Analyzing images
• Clustering gene expression patterns
• Designing experiments

Bioinformatics Projects

Image Segmentation
• Locate spot (signal) pixels
• Measure intensity of signal and background in each channel
• Compute ratio

Adaptive Circle Algorithm
• Specify threshold % between darkest and lightest pixel
• Pixels above threshold are “on”, others are “off”
• Combine two binary images: if a pixel is “on” in either image, it is “on” in the combined image
• Search for radius and center that maximize percent of “on” pixels

Adaptive Circle V2 (Dapple)
• Compute 4-neighbor second-difference approximation to the Laplacian
• Find sharply defined “upper” edge by convolving Laplacian with annular filters
From “Dapple: Improved Techniques for Finding Spots on DNA Microarrays”, UW CSE Technical Report UWTR-2000-08-05

Quality Clustering: QT Clust
1. Each gene builds a supervised cluster
2. Gene with “best” list, and genes in its list, become the next cluster
3. Remove these genes from consideration, and repeat
4. Stop when all genes are clustered, or largest cluster is smaller than user-specified threshold

Why teach Bioinformatics?
• Critical thinking
• Interdisciplinary
• Integrative
– Modeling
– Data analysis
– Computational science
– Discrete math
– Probability and statistics
• Student research opportunities
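
Worked sketch: log transformation and single-linkage clustering

The Log Transformation and Hierarchical Clustering slides can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the course's own code (the course itself uses Perl). The gene names and ratios are made-up toy data, Euclidean distance between log2 profiles is an assumed similarity measure that the slides do not specify, and the function names log2_ratios and single_linkage are hypothetical.

```python
import math
from itertools import combinations


def log2_ratios(ratios):
    """Return log2 of each expression ratio (ratios must be positive)."""
    return [math.log2(r) for r in ratios]


def euclidean(a, b):
    """Assumed distance between two expression profiles (not from the slides)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(profiles, distance=euclidean):
    """Agglomerative clustering with single linkage.

    profiles: dict mapping gene name -> list of log2 ratios.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of gene names.
    """
    clusters = [frozenset([gene]) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: the distance between two clusters is the
            # minimum distance over all cross-cluster pairs of genes.
            d = min(distance(profiles[g], profiles[h]) for g in a for h in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((a, b, d))
        # Replace the two joined "objects" with their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    return merges


# Toy data (hypothetical): two induced genes and two repressed genes.
ratios = {
    "geneA": [16, 4, 1],
    "geneB": [8, 4, 1],
    "geneC": [1 / 16, 1 / 4, 1],
    "geneD": [1 / 8, 1 / 2, 1],
}
profiles = {gene: log2_ratios(r) for gene, r in ratios.items()}
merges = single_linkage(profiles)

for a, b, d in merges:
    print(sorted(a), "+", sorted(b), "at distance", round(d, 3))
```

After the transformation, a 16-fold induction and a 16-fold repression sit at +4 and –4, so the induced and repressed pairs are treated symmetrically when distances are computed. In this toy data the two induced genes join first, the two repressed genes join next, and the two pairs are joined last.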