Bibliography List:

General References:
Baxevanis, A.D. and Ouellette, B.F.F., eds. 1998. “Bioinformatics, A Practical Guide to the Analysis of Genes and Proteins”. John Wiley and Sons, Inc., NY.
Brenner, S., Lewitter, F., Patterson, M., and Handel, M., eds. 1998. “Trends Guide to Bioinformatics”. Elsevier Science.
Waterman, M.S. 1995. “Introduction to Computational Biology: Maps, sequences and genomes”. Chapman Hall.
Salzberg, S.L., Searls, D.B., and Kasif, S., eds. 1998. “Computational Methods in Molecular Biology”. Elsevier.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. 1998. “Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids”. CUP.
Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D., eds. 1994. “Molecular Biology of the Cell”. Garland Publishing.
Baldi, P. and Brunak, S. 1998. “Bioinformatics: The machine learning approach”. MIT Press.
Rashidi, H.H. and Buehler, L.K. 2000. “Bioinformatics Basics: Applications in Biological Science and Medicine”. CRC Press.

Week 1: Foundations of Molecular Biology
Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J.D., eds. 1994. “Molecular Biology of the Cell”. Garland Publishing.
Rubin, G.M., et al. 2000. “Comparative genomics of the eukaryotes”. Science 287:2204-2215.
Schafer, A.J. and Hawkins, J.R. 1998. “DNA variation and the future of human genetics”. Nature Biotechnology 16:33-39.

Week 2: Modern Biochemical Techniques
Gene Sequence Analysis:
Sterky, F. and Lundeberg, J. 2000. “Sequence analysis of genes and genomes”. Journal of Biotechnology 76:1-31.
Hunkapiller, T. et al. 1991. “Large-scale and automated DNA sequence determination”. Science 254:59-67.

Week 3: Problems in Computational Molecular Biology
Salzberg. Chapter 1. “Grand challenges in computational biology”.
Tsoka, S. and Ouzounis, C.A. 2000. “Recent developments and future directions in computational genomics”. FEBS Letters 480:42-48.
Delisi, C. 1988.
“Computers in molecular biology: current applications and emerging trends”. Science 240:47-52.
Howard, K. July 2000. “The bioinformatics gold rush”. Scientific American pp 58-63.
Koonin, E.V. 1999. “The emerging paradigm and open problems in comparative genomics”. Bioinformatics 15(4):265-266.
Clark, M.S. 1999. “Comparative genomics: the key to understanding the Human Genome project”. Bioessays 21:121-130.
Searls, D.B. 2000. “Using bioinformatics in gene and drug discovery”. Drug Discovery Today 5(4):135-143.
Boguski, M.S. 1999. “Biosequence exegesis”. Science 286:453-455.
General:
Nowak, R. 1995. “Entering the postgenome era”. Science 270:368-371.
Kahn, P. 1995. “From genome to proteome”. Science 270:369-370.

Week 4: Statistical Preliminaries

Week 5: General Computational Search Methods
General:
Chapter 2, Salzberg. “A tutorial introduction to computation for biologists”.
Data Structure and Algorithms
Techniques: Dynamic Programming
Hillier, F.S. & Lieberman, G.J. 1995. “Introduction to Operations Research”, 6th ed. McGraw Hill (Chapter 10, pp 424-469).
Gusfield, D. 1997. “Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology”. CUP (Chapter 11, pp 215-253).
Bertsekas, D.P. 1995. “Dynamic Programming and Optimal Control”. Athena Scientific (Chapter 1, pp 1-49).
Techniques: Hidden Markov Models
Krogh, A. et al. 1994. “Hidden Markov Models in Computational Biology: applications to protein modeling”. J. Mol. Biol. 235:1501-1531.
Forney, G.D. 1973. “The Viterbi Algorithm”. Proceedings of the IEEE 61(3):268-278.
Rabiner, L.R. and Juang, B.H. Jan 1986. “An Introduction to Hidden Markov Models”. IEEE ASSP Magazine pp 1-16.
Durbin et al. (Chapter 3)
Baldi, P. et al. 1994. “Hidden Markov Models of biological primary sequence information”. Proc. Natl. Acad. Sci. USA 91:1059-1063.
Eddy, S.R. 1996. “Hidden Markov Models”. Current Opinion in Structural Biology 6:361-365.
Asai, K., Hayamizu, S. and Handa, H. 1993.
“Prediction of protein secondary structure by the hidden Markov model”. Computer Applications in the Biosciences (CABIOS) 9(2):141-146.
Techniques: Pattern Recognition
Stormo, G.D. and Hartzell, G.W. 1989. “Identifying protein-binding sites from unaligned DNA fragments”. Proc. Natl. Acad. Sci. USA 86:1183-1187.
Smith, R.F. and Smith, T.F. 1990. “Automatic generation of primary sequence patterns from sets of related protein sequences”. Proc. Natl. Acad. Sci. USA 87:118-122.
Lawrence, C.E. et al. 1993. “Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment”. Science 262:208-214.
Galas, D.J., Eggert, M. and Waterman, M.S. 1985. “Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli”. J. Mol. Biol. 186:117-128.
Smith, H.O., Annau, T.M. and Chandrasegaran, S. 1990. “Finding sequence motifs in groups of functionally related proteins”. Proc. Natl. Acad. Sci. USA 87:826-830.

Week 6: Sequence Comparisons & Alignment
Substitution Matrices
Altschul, S.F. 1991. “Amino acid substitution matrices from an information theoretic perspective”. Journal of Molecular Biology 219:555-565.
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. 1978. “A model of evolutionary change in proteins”. In “Atlas of Protein Sequence and Structure, vol. 5, suppl. 3”, M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found., Washington.
Henikoff, S. and Henikoff, J.G. 1992. “Amino acid substitution matrices from protein blocks”. Proc. Nat. Acad. Sci. USA 89:10915-10919.
States, D.J., Gish, W., Altschul, S.F. 1991. “Improved Sensitivity of Nucleic Acid Database Searches Using Application Specific Scoring Matrices”. Methods: A companion to Methods in Enzymology 3:66-70.
Thompson, J.D., Higgins, D.G. and Gibson, T.J. 1994. “Improved sensitivity of profile searches through the use of sequence weights and gap excision”. CABIOS 10(1):19-29.
Sequence Alignment
Needleman, S.B., Wunsch, C.D. 1970.
“A general method applicable to the search for similarities in the amino acid sequences of two proteins”. Journal of Molecular Biology 48:443-453.
Smith, T.F. & Waterman, M.S. 1981. “Identification of common molecular subsequences”. J. Mol. Biol. 147:195-197.
Gotoh, O. 1982. “An improved algorithm for matching biological sequences”. J. Mol. Biol. 162:705-708.
Pearson, W.R. & Lipman, D.J. 1988. “Improved tools for biological sequence comparison”. Proc. Natl. Acad. Sci. USA 85:2444-2448.
Johnson, M.S., Overington, J.P. 1993. “A Structural Basis of Sequence Comparisons: An evaluation of scoring methodologies”. J. Mol. Biol. 233:716-738.
Waterman, M.S., Eggert, M. 1987. “A new algorithm for best subsequence alignments with applications to tRNA-rRNA comparisons”. J. Mol. Biol. 197:723-728.
Brenner, S.E., Chothia, C. and Hubbard, T.J.P. 1998. “Assessing sequence comparison methods with reliable structurally identified evolutionary relationships”. Proc. Natl. Acad. Sci. USA 95:6073-6078.
Henikoff, S. 1996. “Scores for sequence searches and alignments”. Curr. Op. Struct. Biol. 6:353-360.
Pearson, W.R. 1991. “Searching Protein Sequence Libraries: Comparison of the Sensitivity and Selectivity of the Smith Waterman and FASTA algorithms”. Genomics 11:635-650.
Myers, E.W. and Miller, W. 1988. “Optimal alignments in linear space”. CABIOS 4(1):11-17.
Chao, K., Pearson, W.R. and Miller, W. 1992. “Aligning two sequences within a specified diagonal band”. CABIOS 8(5):481-487.
Barton, G.J. 1993. “An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps”. CABIOS 9(6):729-734.
Huang, X. and Zhang, J. 1996. “Methods for comparing a DNA sequence with a protein sequence”. CABIOS 12(6):497-506.
Pearson, W.R. 1996. “Effective Protein Sequence Comparison”. Methods in Enzymology 266:227-258.
The statistics of sequence alignment
Altschul, S.F. 1991. “Amino Acid Substitution Matrices from an Information Theoretic Perspective”. J. Mol.
Biol. 219:555-565.
Karlin, S. and Altschul, S.F. 1990. “Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes”. Proceedings of the National Academy of Sciences USA 87:2264-2268.
Karlin, S., Dembo, A. and Kawabata, T. 1990. “Statistical composition of high-scoring segments from molecular sequences”. The Annals of Statistics 18(2):571-581.
Dembo, A. and Karlin, S. 1991. “Strong limit theorems of empirical functionals for large exceedances of partial sums of I.I.D. variables”. The Annals of Probability 19(4):1737-1755.
Karlin, S. and Altschul, S.F. 1993. “Applications and statistics for multiple high-scoring segments in molecular sequences”. Proc. Nat. Acad. Sci. USA 90:5873-5877.
Sjolander, K. et al. 1996. “Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology”. CABIOS 12(4):327-345.
Altschul, S.F. and Gish, W. 1996. “Local Alignment Statistics”. Methods in Enzymology 266:460-480.
Database Alignment Tools & Searches: BLAST, FASTA, PSI-BLAST
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. 1990. “Basic local alignment search tool”. Journal of Molecular Biology 215:403-410.
Pearson, W.R., Lipman, D.J. 1988. “Improved tools for biological sequence comparison”. Proceedings of the National Academy of Sciences USA 85:2444-2448.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. 1997. “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”. Nucleic Acids Res. 25:3389-3402.
Altschul, S.F. and Koonin, E.V. 1998. “Iterated profile searches with PSI-BLAST – a tool for discovery in protein databases”. TIBS 23:444-447.
Aravind, L. and Koonin, E.V. 1999. “Gleaning non-trivial structural and evolutionary information about proteins by iterative database searches”. J. Mol. Biol. 287:1023-1040.
Multiple Sequence Alignment
Lipman, D.J., Altschul, S.F., Kececioglu, J.D. 1989. “A tool for multiple sequence alignment”. Proceedings of the National Academy of Sciences USA 86:4412-4415.
Barton, G.J., Sternberg, M.J. 1987. “A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons”. Journal of Molecular Biology 198:327-337.
Carrillo, H. and Lipman, D. 1988. “The multiple sequence alignment problem in biology”. SIAM J. Appl. Math. 48(5):1073-1082.
Higgins, D.G., Bleasby, A.J. and Fuchs, R. 1992. “CLUSTAL V: improved software for multiple sequence alignment”. CABIOS 8(2):189-191.
Hirosawa, M. et al. 1993. “MASCOT: multiple alignment system for protein sequences based on three-way dynamic programming”. CABIOS 9(2):161-167.
Gotoh, O. 1993. “Optimal alignment between groups of sequences and its application to multiple sequence alignment”. CABIOS 9(3):361-370.
Kim, J., Pramanik, S. and Chung, M.J. 1994. “Multiple sequence alignment using simulated annealing”. CABIOS 10(4):419-426.
Stormo, G.D., Hartzell, G.W. III 1989. “Identifying protein-binding sites from unaligned DNA fragments”. Proceedings of the National Academy of Sciences USA 86:1183-1187.
Galas, D.J., Eggert, M., Waterman, M.S. 1985. “Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli”. Journal of Molecular Biology 186:117-128.
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C. 1993. “Detecting Subtle Sequence signals: A Gibbs Sampling Strategy for Multiple Alignment”. Science 262:208-214.
Schuler, G.D., Altschul, S.F., Lipman, D.J. 1991. “A workbench for multiple alignment construction and analysis”. Proteins 9:180-190.
Sequence Profiles
Smith, R.F., Smith, T.F. 1990. “Automatic generation of primary sequence patterns from sets of related protein sequences”. Proc. Nat. Acad. Sci. USA 87:118-122.
Gribskov, M., McLachlan, A.D., Eisenberg, D.
1987. “Profile analysis: detection of distantly related proteins”. Proc. Nat. Acad. Sci. USA 84:4355-4358.
Sequences and Evolutionary Trees
Doolittle, R. 1981. “Similar amino acid sequences: chance or common ancestry?” Science 214:149-159.
Henikoff, S. and Greene, E.A. 1997. “Gene families: the taxonomy of protein paralogs and chimeras”. Science 278:609-614.
Lake, J.A. 1994. “Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances”. Proc. Natl. Acad. Sci. USA 91:1455-1459.

Weeks 7 & 8: Gene Finding and Sequence Annotation
SNPs:
Buetow, K.H., Edmonson, M.N. and Cassidy, A.B. 1999. “Reliable identification of large numbers of candidate SNPs from public EST data”. Nature Genetics 21:323-325.
Cargill, M. et al. 1999. “Characterization of single-nucleotide polymorphisms in coding regions of human genes”. Nature Genetics 22:231-238.
Clustering and ESTs:
Chou, A. and Burke, J. 1999. “CRAWview: for viewing splicing variation, gene families, and polymorphisms in clusters of ESTs and full-length sequences”. Bioinformatics 15(5):376-381.
Finding consensus patterns:
Hertz, G.Z., Hartzell, III, G.W. and Stormo, G.D. 1990. “Identification of consensus patterns in unaligned DNA sequences known to be functionally related”. CABIOS 6(2):81-92.

Week 9: Shotgun global sequence alignment

Weeks 10 & 11: Gene expression analysis
Gene Chips:
Gullans, S.R. 2000. “Of microarrays and meandering data points”. Nature Genetics 26:4-5.
Brazma, A. and Vilo, J. 2000. “Gene expression data analysis”. FEBS Letters 480:17-24.
Gerhold, D., Rushmore, T. and Caskey, C.T. 1999. “DNA chips: promising toys become powerful tools”. TIBS 24:168-173.
Knight, J. 2001. “When the chips are down”. Nature 410:860-861.
Hamadeh, H. and Afshari, C.A. 2000. “Gene chips and functional genomics”. American Scientist 88:508-515.
Celis, J.E. et al. 2000. “Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics”. FEBS Letters 480:2-16.
Ermolaeva, O. et al. 1998.
“Data management and analysis for gene expression arrays”. Nature Genetics 20:19-23.
Khan, J. et al. 1999. “DNA microarray technology: the anticipated impact on the study of human disease”. Biochimica et Biophysica Acta 1423:M17-M28.
Ramsay, G. 1998. “DNA chips: state of the art”. Nature Biotechnology 16:40-44.
Marshall, A. and Hodgson, J. 1998. “DNA chips: an array of possibilities”. Nature Biotechnology 16:27-31.
Hacia, J.G. 1999. “Resequencing and mutational analysis using oligonucleotide microarrays”. Nature Genetics 21:42-47.
Bucher, P. 1999. “Regulatory elements and expression profiles”. Current Opinion in Structural Biology 9:400-407.
Debouck, C. and Goodfellow, P.N. 1999. “DNA microarrays in drug discovery and development”. Nature Genetics 21:48-50.
Schena, M. et al. 1995. “Quantitative monitoring of gene expression patterns with a cDNA microarray”. Science 270:467-470.
Computational Methods:
Tavazoie, S. et al. 1999. “Systematic determination of genetic network architecture”. Nature Genetics 22:281-285.
Greller, L.D. and Tobin, F.L. 1999. “Detecting selective expression of genes and proteins”. Genome Research 9(3):282-296.
Eisen, M.B. et al. 1998. “Cluster analysis and display of genome-wide expression patterns”. PNAS 95:14863-14868.
Protein Microarrays:
Lueking, A. et al. 1999. “Protein microarrays for gene expression and antibody screening”. Analytical Biochemistry 270:103-111.

Weeks 12 & 13: Algorithms for RNA and protein structure prediction
RNA sequence and structure:
Corpet, F. and Michot, B. 1994. “RNAlign program: alignment of RNA sequences using both primary and secondary structures”. CABIOS 10(4):389-399.
Shapiro, B.A. and Wu, J.C. 1996. “An annealing mutation operator in the genetic algorithms for RNA folding”. CABIOS 12(3):171-180.
Protein Structure Prediction:
Kabsch, W. and Sander, C. 1984. “On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations”. Proc. Nat. Acad. Sci.
USA 81:1075-1078.
Gribskov, M., Homyak, M., Edenfield, J., Eisenberg, D. 1988. “Profile scanning for three-dimensional structure patterns in protein sequences”. Computer Applications in the Biosciences 4:61-66.
Detecting Motifs:
Bork, P. and Koonin, E.V. 1996. “Protein sequence motifs”. Current Opinion in Structural Biology 6:366-376.
Lawrence, C.E. and Reilly, A.A. 1990. “An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences”. Proteins: Structure, Function and Genetics 7:41-51.
Hughey, R. and Krogh, A. 1996. “Hidden Markov models for sequence analysis: extension and analysis of the basic method”. CABIOS 12(2):95-107.
Smith, H.O., Annau, T.M., Chandrasegaran, S. 1990. “Finding sequence motifs in groups of functionally related proteins”. Proceedings of the National Academy of Sciences USA 87:826-830.
Staden, R. 1988. “Methods to define and locate patterns of motifs in sequences”. CABIOS 4(1):53-60.
Structural Genomics: (prediction of protein structure)
Sali, A. 1998. “100,000 protein structures for the biologist”. Nature Structural Biology 5(12):1029-1032.
RNA motifs:
Dandekar, T. and Hentze, M.W. 1995. “Finding the hairpin in the haystack: searching for RNA motifs”. TIG 11(2):45-50.

Week 14: Databases and Web Tools
Database Searches:
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. 1990. “Basic local alignment search tool”. Journal of Molecular Biology 215:403-410.
Pearson, W.R., Lipman, D.J. 1988. “Improved tools for biological sequence comparison”. Proc. Nat. Acad. Sci. USA 85:2444-2448.
Altschul, S.F., Boguski, M.S., Gish, W. and Wootton, J.C. 1994. “Issues in searching molecular sequence databases”. Nature Genetics 6:119-129.
Attwood, T.K. 2000. “The role of pattern sequence databases in sequence analysis”. Briefings in Bioinformatics 1(1):45-59.

TO FIND:
Sansom, C. 2000.
“Database searching with DNA and protein sequences: an introduction”. Briefings in Bioinformatics 1(1):22-32.
Sankoff, D., Cedergren, R.J. 1983. “Simultaneous comparison of three or more sequences related by a tree”. In “Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison”, D. Sankoff & J.B. Kruskal (eds), pp. 253-263, Addison-Wesley, Reading, MA.