alx3 - IUBio Archive for Biology

ALN3 and ALP3 (BIONET version 1.0)
January 1989 Osamu Gotoh
BIONET revisions March 1989 Spencer Yeh
("readme.doc" added 3/23/89 by Spencer Yeh)
INTRODUCTION
ALN3 and ALP3 are a pair of triple alignment programs, ALN3 for nucleic
acid sequences and ALP3 for protein sequences. The phrase "ALX3" will
be used to refer to both of the programs. The algorithm used in the
programs is described in Gotoh, O. (1986) J. Theor. Biol. 121,
327-337. The user can create different protein scoring matrices by
using the program MAKMDM.EXE. The two matrices provided by the author
are MDM_1.DAT and MDM_10.DAT; the latter gives its values to more
significant figures. The programs use a non-BIONET data format described
in ALIGN3.DOC. The maximum sequence length is not documented; however, I
have successfully aligned three sequences of 1000 bp each using ALN3.
Please be aware that the programs are only minimally user-friendly.
CONTACT ADDRESS
For questions about the program or suggestions for future improvements
please contact:
Dr. Osamu Gotoh
Dept. of Biochemistry
Saitama Cancer Center Research Institute
Ina-machi Saitama 362
JAPAN
tel.: 0487-22-1111 (ext. 255)
SYSTEMS SUPPORTED
An IBM-compatible computer running MS-DOS (ver. 2.0 or greater) is
needed. The MAKMDM program requires an 80x87 math coprocessor, but
MAKMDM is not needed to run the ALX3 programs.
AVAILABILITY
These programs are available by anonymous FTP from BIONET (net.bio.net)
in the directory ~ftp/public/dos/alx3 or, for BIONET subscribers only,
by postal mail from the BIONET Lending Library. If you are a BIONET
subscriber and would like to receive the ALX3 diskette by mail, please
send a stamped, self-addressed return envelope along with a formatted
diskette (specify capacity) and your request to:
BIONET Administrator
BIONET/IntelliGenetics, Inc.
700 East El Camino Real, Suite 300
Mountain View, CA 94040
tel.: (415) 962-7337
SOURCE CODE
Source code written in C is available in the archive file "ALX3SRC.ARC".
The program was originally compiled under Optimizing C86, but has since
been modified to run under Turbo C (ver. 1.5a, Borland). The "diff.c"
source file was missing on the diskette I received from Dr. Gotoh, but
one can use the NCSEQ.LIB file without recompiling "diff.c". Changes
were made in the BIONET version to make the default directory be the
current directory. Please see REVIS.DOC.
PROGRAM FILES before de-ARCing (BIONET diskette version) 210 Kb:
README.DOC   This documentation file. (8 Kb).
ALX3.EXE     Self-extracting archive file for the executables
             and documentation. (123 Kb).
ALX3SRC.ARC  Archive file for the source code
             and object files. (75 Kb).
PROGRAM FILES before de-ARCing (BIONET downloadable version):
README.DOC   This documentation file.
ALX3.UUE     Archived and uuencoded file for the executable and
             documentation. (161 Kb).
ALX3SRC.UUE  Archived and uuencoded C source code and object
             libraries. (104 Kb).
PROGRAM FILES after de-ARCing:
ALIGN3   DOC   12928   3-15-89  11:15a  Dr. Gotoh's documentation.
REVIS    DOC    1017   3-23-89  10:18a  History of BIONET revisions.
ALN3     EXE   42698   3-23-89   9:58a  Executable file.
ALP3     EXE   42962   3-23-89   9:58a  Executable file.
S1       SEQ      77   3-23-89   9:42a  Nucl. acid test file.
S2       SEQ      81   3-23-89   9:43a  Nucl. acid test file.
S3       SEQ      77   3-23-89   9:43a  Nucl. acid test file.
S123     OUT    1272   3-23-89  10:01a  Sample ALN3 output file.
P1       PEP     122   3-21-89   3:28p  Protein test file.
P2       PEP     110   3-21-89   3:28p  Protein test file.
P3       PEP     113   3-21-89   3:28p  Protein test file.
P123     OUT    1255   3-23-89  10:02a  Sample ALP3 output file.
MDM_1    DAT    8064  10-21-88  11:29a  Default protein scoring matrix.
MDM_10   DAT    8064  10-21-88  11:30a  High precision scoring matrix.
MAKMDM   EXE   35298  12-23-88   5:48p  Program to create MDM matrices.
MDSQ     BAT     128  12-05-88   3:15p  Batch file to create subdirectories.
DOCUMENTATION
The program is briefly documented in the file ALIGN3.DOC. The BIONET
version has been altered so that the default directory is the current
directory instead of B:\NAS or B:\PAS. These changes are documented in
REVIS.DOC. The program has no built-in help, and the source code is not
commented.
STARTING THE PROGRAM
On the BIONET diskette version, the archive file ALX3.EXE is a
self-extracting archive, whereas the ".uue" files in the downloadable
version require both the UUDECODE program and an "ARC"-compatible
dearchiving program such as PKUNPAK to restore the executable file.
Self-dearchiving files:
De-archive the programs by "running" the archive file ALX3.EXE and
specifying the drive and directory path where you want them installed.
E.g., to install the programs in the \aln3 directory of the c: drive,
type:
>ALX3 c:\aln3
".UUE" files:
First decode the ".uue" file:
>UUDECODE alx3.uue
Then dearchive the resulting ".arc" file to an appropriate directory by
using PKUNPAK (or a compatible program):
>PKUNPAK alx3.arc c:\aln3
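Where no UUDECODE program is at hand, the decoding step can be approximated with a short script. The following is a minimal Python sketch, not part of the ALX3 distribution. The function name uudecode_lines is hypothetical, and the sketch assumes a well-formed ".uue" file holding a single begin/end section. It relies only on binascii.a2b_uu from the Python standard library.

```python
import binascii


def uudecode_lines(lines):
    """Decode one uuencoded section; return (embedded file name, bytes).

    Hypothetical helper for illustration only. Assumes the input holds a
    single well-formed "begin <mode> <name>" ... "end" section.
    """
    it = iter(lines)
    name = None
    # Skip any leading text until the "begin <mode> <name>" header.
    for line in it:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "begin":
            name = parts[2].strip()
            break
    if name is None:
        raise ValueError("no 'begin' line found")

    chunks = []
    for line in it:
        # Strip only line endings; trailing spaces can be encoded data.
        text = line.rstrip("\r\n")
        if text == "end":
            return name, b"".join(chunks)
        if text:
            chunks.append(binascii.a2b_uu(text.encode("ascii")))
    raise ValueError("no 'end' line found")
```

Passing the lines of the downloaded file to this function and writing the returned bytes to disk under the returned name should yield the ".arc" file, which still has to be unpacked with an ARC-compatible program.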
Once installed, CD to the appropriate directory and then start the
program by typing its name (ALN3 or ALP3) at the MS-DOS prompt:
>ALN3
SAMPLE PROGRAM OUTPUT (of ALP3):
p1.pep (1 - 107) - p2.pep (1 - 96) - p3.pep (1 - 99)
PAM = 250, BIAS = 0, u = 6, v = 6
Dist3 = -125, 3-id. = 11, 1-id. = 36, 0-id. = 47, Gaps = 9, Unpairs = 14
p2.pep (1 - 96) - p3.pep (1 - 99)
Dist2 = 6, Matches = 18 ( 17.82 %) Gaps = 6, Unpairs = 7
p3.pep (1 - 99) - p1.pep (1 - 107)
Dist2 = 43, Matches = 13 ( 12.04 %) Gaps = 6, Unpairs = 10
p1.pep (1 - 107) - p2.pep (1 - 96)
Dist2 = -180, Matches = 38 ( 35.51 %) Gaps = 5, Unpairs = 11
  1 TVYTVGDSAGWKVPFFGDVDYDWKWASNKTFHIGDVLVFKYDRRFHNVDKVTQKNYQSCN  60
    .** **.:.** * = * . * *: ****:*.*: *:* * * * .*:
  1 AVYVVGGSGGW--TFNTE---SW--PKGKRFRAGDILLFNYNPSMHNVVVVNQGGFSTCN  53
    ::*. .* .* . * =. : * : .. * . . : .
  1 IDVLLGADDGS-LAFVPS---EFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKIS  56

 61 DTTPIASYNTGNN-RINLKTVGQKYYICGVPKHCDLGQKVHINVTVRS  107
    . * .* : .*.* ** **** * **: * ** *:
 54 TPAGAKVYTSGRD-QIKL-PKGQSYFICNFPGHCQSGMKIAVNA---L  96
    . : . * :* * **: * *. * * .** *
 57 MSEEDLLNAKGETFEVALSNKGEYSFYCS-P-HQGAGMVGKVTV---N  99
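The pairwise summary lines in the sample output can be cross-checked against the aligned rows. The following Python sketch is an interpretation inferred from this one sample, not taken from ALIGN3.DOC or the source code. It assumes that "Matches" counts identical residue pairs, that the percentage is taken over the columns left after dropping those where both rows hold a gap, that "Unpairs" counts columns where exactly one row holds a gap, and that "Gaps" counts runs of such columns. The function name pair_stats is hypothetical.

```python
def pair_stats(row_a, row_b, gap="-"):
    """Return (matches, percent, gaps, unpairs) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = unpairs = gaps = columns = 0
    prev = None  # which row held a gap in the previous kept column
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # column is occupied only by the third sequence
        columns += 1
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            unpairs += 1
            if side != prev:
                gaps += 1  # a new run of gap columns starts here
            prev = side
        else:
            prev = None
            if a == b:
                matches += 1
    percent = 100.0 * matches / columns if columns else 0.0
    return matches, round(percent, 2), gaps, unpairs


# Aligned rows copied from the sample output (both blocks joined).
P1 = ("TVYTVGDSAGWKVPFFGDVDYDWKWASNKTFHIGDVLVFKYDRRFHNVDKVTQKNYQSCN"
      "DTTPIASYNTGNN-RINLKTVGQKYYICGVPKHCDLGQKVHINVTVRS")
P2 = ("AVYVVGGSGGW--TFNTE---SW--PKGKRFRAGDILLFNYNPSMHNVVVVNQGGFSTCN"
      "TPAGAKVYTSGRD-QIKL-PKGQSYFICNFPGHCQSGMKIAVNA---L")
P3 = ("IDVLLGADDGS-LAFVPS---EFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKIS"
      "MSEEDLLNAKGETFEVALSNKGEYSFYCS-P-HQGAGMVGKVTV---N")

for label, a, b in (("p2-p3", P2, P3), ("p3-p1", P3, P1), ("p1-p2", P1, P2)):
    print(label, pair_stats(a, b))
# p2-p3 (18, 17.82, 6, 7)
# p3-p1 (13, 12.04, 6, 10)
# p1-p2 (38, 35.51, 5, 11)
```

Under these assumptions the computed values agree with the Matches, Gaps, and Unpairs figures printed for all three pairs. The Dist2 and Dist3 scores depend on the scoring matrix and gap parameters and are not reproduced here.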
KNOWN PROBLEMS:
1. Make sure that the MDM_1.DAT file is located in the current
directory or in the root directory of A:, B:, C:, D:, or E:; otherwise
you will get a message that the E:MDM_1.DAT file was not found.
2. Printing seems to require the AUX option instead of the PRN option.
3. The program does not check for memory limitations and will crash
if it runs out of memory.
4. If you are trying to recompile the program, note that the "diff.c"
source file is missing. However, by updating the NCSEQ.LIB file, you
probably won't need the missing source code.
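For problem 1, a pre-flight check along the following lines can confirm that the matrix file is somewhere the program might look. This is a hypothetical Python sketch: the search order (current directory first, then the roots of A: through E:) is an assumption based on the note above, not on the source code, and the name find_matrix and its parameters are illustrative. The exists argument is any callable that tests a path, such as os.path.isfile.

```python
import ntpath


def find_matrix(exists, cwd, name="MDM_1.DAT", drives="ABCDE"):
    """Return the first candidate path for which exists(path) is true.

    Hypothetical helper; the search order is an assumption, not the
    documented behaviour of ALP3.
    """
    # Current directory first, then the root of each listed drive.
    candidates = [ntpath.join(cwd, name)]
    candidates += [f"{drive}:\\{name}" for drive in drives]
    for path in candidates:
        if exists(path):
            return path
    return None
```

A return value of None corresponds to the situation in which the program would report that E:MDM_1.DAT was not found.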