TDALIGN (version 1.61)
December 1988
Dan Davison and Dr. Keith Thompson
("readme.doc" added 2/24/89 by Spencer Yeh)


INTRODUCTION

TDALIGN is a global alignment program for two nucleotide sequences or
two protein sequences. It works by placing the first residue of one
sequence opposite the first residue of the other sequence, and then
"stretching" the two sequences by adding gaps to find matches. Because
of this, it does NOT permit a gap at the 5' terminus. This may be a
problem for some users. The algorithm is described fully in Davison and
Thompson, "A non-metric sequence alignment algorithm", Bull. Math. Biol.
1984 46(4): 579-590. The program is only minimally user-friendly.


CONTACT ADDRESS

For questions about the program or suggestions for future improvements
please contact:

     Dan Davison
     Theoretical Biology and Biophysics Group
     T-10 MS K710
     Los Alamos National Laboratory
     Los Alamos, NM 87545
     tel.: (505) 665-1355
     e-mail: dd@lanl.gov or goad.davison@bionet-20.bio.net
     CompuServe: 74065,41 (rarely)


SYSTEMS SUPPORTED

An IBM-compatible computer running MS-DOS is needed. Two versions of
the program are available: one for machines with an 80X87 math
coprocessor and one for machines without.


AVAILABILITY

TDALIGN is available for downloading from BIONET in the directory
<PC-SOFTWARE.DAVISON> or by postal mail from the BIONET Lending Library.
If you would like to receive TDALIGN by mail, please send a stamped,
self-addressed return envelope along with a formatted diskette (specify
capacity) and your request to:

     BIONET Administrator
     BIONET/IntelliGenetics, Inc.
     700 East El Camino Real, Suite 300
     Mountain View, CA 94040
     tel.: (415) 962-7337


SOURCE CODE

Source code written in FORTRAN is available in the archive file
"TDALNSRC.ARC". The program was originally compiled under Microsoft
FORTRAN, but has since been recompiled using RM FORTRAN 2.42 (also
known as AUSTEC FORTRAN), which is a far better implementation of
FORTRAN.
Once installed the object files can be relinked with the DOS linker or
PLINK86, or the source code can be re-compiled. The RM FORTRAN
libraries are required.


PROGRAM FILES before de-ARCing (BIONET diskette version) 228 Kb:

README.DOC     This documentation file. (8 Kb).
TDALIGN.EXE    Self-extracting archive file for the executables and
               documentation. (110 Kb).
TDALNSRC.EXE   Self-extracting archive file for the source code and
               object files. (110 Kb).


PROGRAM FILES before de-ARCing (BIONET downloadable version)

README.DOC     This documentation file.
TDALIGN.UUE    Archived and uuencoded file for the executable and
               documentation. (137 Kb).
TDALNSRC.UUE   Archived and uuencoded FORTRAN source code and object
               libraries. (142 Kb).


PROGRAM FILES after de-ARCing (Approx. 210 Kb total)

READ     ME      4219  12-11-88  12:43a  Documentation file.
TDALIGN  DOC    20025  12-11-88  12:41a  Documentation file.
NTDALIGN EXE    99920  12-11-88  12:24a  Executable for machines w/o 80X87.
TDALIGN  EXE    86416  12-10-88  12:31a  Executable for machines w/ 80X87.
SEQ1     SEQ       35   1-15-88   9:09p  Test data file.
SEQ2     SEQ       43   1-15-88   9:10p  Test data file.


DOCUMENTATION

The program is documented in the files READ.ME and TDALIGN.DOC in
addition to containing internal help messages. The source code is also
commented.


STARTING THE PROGRAM

On the BIONET diskette version, the archive file TDALIGN.EXE is a
self-extracting archive, whereas the ".uue" files in the downloadable
version require both the UUDECODE program and an "ARC"-compatible
dearchiving program such as PKUNPAK to restore the executable file.

Self-dearchiving files:

De-archive the program by "running" the archive file and specifying the
drive and directory path where you want the program installed.
E.g., to install TDALIGN in the \tdalign directory of the c: drive, you
should type:

     >TDALIGN c:\tdalign

".UUE" files:

First decode the ".uue" file:

     >UUDECODE tdalign.uue

Then dearchive the resulting ".arc" file to an appropriate directory by
using PKUNPAK (or compatible program):

     >PKUNPAK tdalign.arc c:\tdalign

Once installed, CD to the appropriate directory and then start the
program by typing its name at the MS-DOS prompt:

     >NTDALIGN    (for machines without an 80X87 math coprocessor)
or
     >TDALIGN     (for machines with an 80X87 math coprocessor)


SAMPLE PROGRAM OUTPUT

This program takes any two R/DNA or amino acid sequences in
one letter code and compares them for similarity.
Copyright 1982, 1984, 1985, 1986, 1987, 1988, 1989
by Dan Davison and Keith Thompson.
Version 1.61 12/11/88

Enter the name of the file containing sequence 1 or a ?:
seq1.seq
Enter the name of the file containing the second sequence
(can be in the same file) or a ?:
seq2.seq
Enter the name of the first sequence
(this parameter is case sensitive) or a ?:
seq1
Enter the name of the second sequence
(this parameter is case sensitive) or a ?:
seq2
The sequences to be matched are seq1 and seq2
Enter the start and end positionsin sequence seq1
...YOU MUST ENTER TWO NUMBERS.....
0,0 for the complete sequence (not a ?):
0,0
Enter the start and end positionsin sequence seq2
...YOU MUST ENTER TWO NUMBERS.....
0,0 for the complete sequence (not a ?):
0,0
Enter gapsize, matchlength, range,gap penalty--free format
(? or a non-number for more info):
10,2,20,4
Do you want the input sequences printed out?
0=No, 1=sequence 1, 2=sequence 2, 3=both:
3
File for output ? (y/n):
n
Print out match table? (y/n):
y

seq1
        10
aaccggtt
seq2
        10
aaccggcgcgcgcgcg

seq1 limits:    1 -   8
seq2 limits:    1 -  16

  K  1START   1END  2START   2END  LENGTH
  1       1      6       1      6       6
  2       9      6      17      6       0

Gapsize= 10   Matchlength= 2   Range= 20   Gap penalty= 4.00

seq1 LIMITS:    1 -   8
seq2 LIMITS:    1 -  16

aaccgg tt
aaccgg cgcgcgcgcg
        10

End of program....bye


KNOWN PROBLEMS

1. Please be aware that TDALIGN is CASE-SENSITIVE at the prompt which
   asks for the sequence name. The sequence name must be entered in
   EXACTLY the same case as it exists in the sequence file. "SEQUENCE1"
   is different from "Sequence1" which is different from "sequence1"!!

2. Because of the algorithm that it uses, TDALIGN does not allow any
   offset at the 5' end. The two sequences must start with the same 5'
   residues.
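

CHECKING THE 5' ENDS BEFORE RUNNING TDALIGN (illustrative sketch)

Known problem 2 can be screened for before a run. The short Python
script below is only a sketch and is not part of TDALIGN (which is
written in FORTRAN). It counts how many leading residues two sequences
share. The function names shared_prefix_length and
can_align_without_offset are hypothetical, and comparing residues
without regard to case is an assumption. This document does not say how
many leading residues must agree, so the sketch only requires the first
residue to match.

```python
def shared_prefix_length(seq1: str, seq2: str) -> int:
    """Count the leading residues that the two sequences have in common.

    Assumption: residues are compared case-insensitively.
    """
    count = 0
    for a, b in zip(seq1.lower(), seq2.lower()):
        if a != b:
            break
        count += 1
    return count


def can_align_without_offset(seq1: str, seq2: str) -> bool:
    """Return True if both sequences begin with the same residue.

    Assumption: one matching leading residue is treated as sufficient;
    the README does not state an exact requirement.
    """
    return shared_prefix_length(seq1, seq2) > 0


if __name__ == "__main__":
    # The two test sequences shown in the sample program output.
    s1 = "aaccggtt"
    s2 = "aaccggcgcgcgcgcg"
    print(shared_prefix_length(s1, s2))      # prints 6
    print(can_align_without_offset(s1, s2))  # prints True
```

For the two test sequences in the sample output, the shared prefix is
the six residues "aaccgg", which is the block TDALIGN reports as matched
in that run.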