ALN3 and ALP3 (BIONET version 1.0)

January 1989                   Osamu Gotoh
BIONET revisions March 1989    Spencer Yeh

("readme.doc" added 3/23/89 by Spencer Yeh)


INTRODUCTION

ALN3 and ALP3 are a pair of triple alignment programs, ALN3 for nucleic
acid sequences and ALP3 for protein sequences. The phrase "ALX3" will be
used to refer to both of the programs. The algorithm used in the
programs is described in Gotoh, O. (1986) J. Theor. Biol. 121, 327-337.

The user can create different protein scoring matrices by using the
program MAKMDM.EXE. The two matrices provided by the author are
MDM_1.dat and MDM_10.dat; the latter has data to more significant
figures.

The program uses a non-BIONET data format described in ALIGN3.DOC.

The maximum sequence lengths are not documented; however, I have
successfully aligned 3 sequences of 1000 bp each using ALN3.

Please be aware that the program is only minimally user-friendly.


CONTACT ADDRESS

For questions about the program or suggestions for future improvements
please contact:

    Dr. Osamu Gotoh
    Dept. of Biochemistry
    Saitama Cancer Center Research Institute
    Ina-machi
    Saitama 362
    JAPAN
    tel.: 0487-22-1111 (ext. 255)


SYSTEMS SUPPORTED

An IBM-compatible computer running MS-DOS (ver. 2.0 or greater) is
needed. The MAKMDM program requires an 80X87 math coprocessor, but
MAKMDM is not needed to run the ALX3 programs.


AVAILABILITY

These programs are available by anonymous FTP from BIONET (net.bio.net)
in the directory ~ftp/public/dos/alx3 or, for BIONET subscribers only,
by postal mail from the BIONET Lending Library. If you are a BIONET
subscriber and would like to receive the ALX3 diskette by mail, please
send a stamped, self-addressed return envelope along with a formatted
diskette (specify capacity) and your request to:

    BIONET Administrator
    BIONET/IntelliGenetics, Inc.
    700 East El Camino Real, Suite 300
    Mountain View, CA 94040
    tel.: (415) 962-7337


SOURCE CODE

Source code written in C is available in the archive file "ALX3SRC.ARC".
The program was originally compiled under Optimizing C86, but has since
been modified to run under Turbo C (ver. 1.5a, Borland). The "diff.c"
source file was missing on the diskette I received from Dr. Gotoh, but
one can use the NCSEQ.LIB file without recompiling "diff.c". Changes
were made in the BIONET version to make the default directory be the
current directory. Please see REVIS.DOC.


PROGRAM FILES before de-ARCing (BIONET diskette version) 210 Kb:

README.DOC     This documentation file. (8 Kb).
ALX3.EXE       Self-extracting archive file for the executables and
               documentation. (123 Kb).
ALX3SRC.ARC    Archive file for the source code and object files.
               (75 Kb).


PROGRAM FILES before de-ARCing (BIONET downloadable version)

README.DOC     This documentation file.
ALX3.UUE       Archived and uuencoded file for the executable and
               documentation. (161 Kb).
ALX3SRC.UUE    Archived and uuencoded C source code and object
               libraries. (104 Kb).


PROGRAM FILES after de-ARCing:

ALIGN3   DOC    12928   3-15-89  11:15a  Dr. Gotoh's documentation.
REVIS    DOC     1017   3-23-89  10:18a  History of BIONET revisions.
ALN3     EXE    42698   3-23-89   9:58a  Executable file.
ALP3     EXE    42962   3-23-89   9:58a  Executable file.
S1       SEQ       77   3-23-89   9:42a  Nucl. acid test file.
S2       SEQ       81   3-23-89   9:43a  Nucl. acid test file.
S3       SEQ       77   3-23-89   9:43a  Nucl. acid test file.
S123     OUT     1272   3-23-89  10:01a  Sample ALN3 output file.
P1       PEP      122   3-21-89   3:28p  Protein test file.
P2       PEP      110   3-21-89   3:28p  Protein test file.
P3       PEP      113   3-21-89   3:28p  Protein test file.
P123     OUT     1255   3-23-89  10:02a  Sample ALP3 output file.
MDM_1    DAT     8064  10-21-88  11:29a  Default protein scoring matrix.
MDM_10   DAT     8064  10-21-88  11:30a  High precision scoring matrix.
MAKMDM   EXE    35298  12-23-88   5:48p  Program to create MDM matrices.
MDSQ     BAT      128  12-05-88   3:15p  Batch file to create
                                         subdirectories.


DOCUMENTATION

The program is briefly documented in the file ALIGN3.DOC. The BIONET
version has been altered to make the default drive be the current
connected directory instead of B:\NAS or B:\PAS.
These changes are documented in REVIS.DOC. There is no internal help to
the program, and the source code is not commented.


STARTING THE PROGRAM

On the BIONET diskette version, the archive file ALX3.EXE is a
self-extracting archive, whereas the ".uue" files in the downloadable
version require both the UUDECODE program and an "ARC"-compatible
dearchiving program such as PKUNPAK to restore the executable files.

Self-dearchiving files:

De-archive the programs by "running" the archive file and specifying
the drive and directory path where you want them installed. E.g., to
install the programs in the \aln3 directory of the c: drive, you should
type:

    >ALX3 c:\aln3

".UUE" files:

First decode the ".uue" file:

    >UUDECODE alx3.uue

Then dearchive the resulting ".arc" file to an appropriate directory by
using PKUNPAK (or a compatible program):

    >PKUNPAK alx3.arc c:\aln3

Once installed, CD to the appropriate directory and then start the
program by typing its name at the MS-DOS prompt:

    >ALN3


SAMPLE PROGRAM OUTPUT (of ALP3):

p1.pep (1 - 107) - p2.pep (1 - 96) - p3.pep (1 - 99)
PAM = 250, BIAS = 0, u = 6, v = 6
Dist3 = -125, 3-id. = 11, 1-id. = 36, 0-id. = 47, Gaps = 9, Unpairs = 14

p2.pep (1 - 96) - p3.pep (1 - 99)
Dist2 = 6, Matches = 18 ( 17.82 %) Gaps = 6, Unpairs = 7

p3.pep (1 - 99) - p1.pep (1 - 107)
Dist2 = 43, Matches = 13 ( 12.04 %) Gaps = 6, Unpairs = 10

p1.pep (1 - 107) - p2.pep (1 - 96)
Dist2 = -180, Matches = 38 ( 35.51 %) Gaps = 5, Unpairs = 11

    1 TVYTVGDSAGWKVPFFGDVDYDWKWASNKTFHIGDVLVFKYDRRFHNVDKVTQKNYQSCN   60
      .** **.:.** * = * . * *: ****:*.*: *:* * * * .*:
    1 AVYVVGGSGGW--TFNTE---SW--PKGKRFRAGDILLFNYNPSMHNVVVVNQGGFSTCN   53
      ::*. .* .* . * =. : * : .. * . . : .
    1 IDVLLGADDGS-LAFVPS---EFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKIS   56

   61 DTTPIASYNTGNN-RINLKTVGQKYYICGVPKHCDLGQKVHINVTVRS  107
      . * .* : .*.* ** **** * **: * ** *:
   54 TPAGAKVYTSGRD-QIKL-PKGQSYFICNFPGHCQSGMKIAVNA---L   96
      . : . * :* * **: * *. * * .** *
   57 MSEEDLLNAKGETFEVALSNKGEYSFYCS-P-HQGAGMVGKVTV---N   99


KNOWN PROBLEMS:

1. Make sure that you have the MDM_1.DAT file located in the current
   directory or the root directory of A:, B:, C:, D:, or E:. Otherwise
   you will get a message that the E:MDM_1.DAT file was not found.

2. Printing seems to require the AUX option instead of the PRN option.

3. The program does not check for memory limitations and will crash if
   it runs out of memory.

4. If you are trying to recompile the program, the "diff.c" source file
   is missing. However, by updating the NCSEQ.LIB file, you probably
   won't need to use the missing source code.