TDALIGN (version 1.61)
December 1988 Dan Davison and Dr. Keith Thompson
("readme.doc" added 2/24/89 by Spencer Yeh)
INTRODUCTION
TDALIGN is a global alignment program for two nucleotide sequences or
two protein sequences. It works by placing the first residue of one
sequence opposite the first residue of the other sequence, and then
"stretching" the two sequences by adding gaps to find matching regions.
Because of this, it does NOT permit a gap at the 5' terminus. This may
be a problem for some users. The algorithm is described fully in
Davison and Thompson, "A non-metric sequence alignment algorithm", Bull.
Math. Biol. 1984 46(4): 579-590. The program is only minimally
user-friendly.
CONTACT ADDRESS
For questions about the program or suggestions for future improvements
please contact:
Dan Davison
Theoretical Biology and Biophysics Group
T-10 MS K710
Los Alamos National Laboratory
Los Alamos, NM 87545
tel.: (505) 665-1355
e-mail: dd@lanl.gov
or goad.davison@bionet-20.bio.net
CompuServe: 74065,41 (rarely)
SYSTEMS SUPPORTED
An IBM-compatible computer running MS-DOS is needed. Two versions of
the program are available: one for machines with an 80X87 math
coprocessor and one for machines without.
AVAILABILITY
TDALIGN is available for downloading from BIONET in the directory
<PC-SOFTWARE.DAVISON> or by postal mail from the BIONET Lending
Library. If you would like to receive TDALIGN by mail, please send a
stamped, self-addressed return envelope along with a formatted diskette
(specify capacity) and your request to:
BIONET Administrator
BIONET/IntelliGenetics, Inc.
700 East El Camino Real, Suite 300
Mountain View, CA 94040
tel.: (415) 962-7337
SOURCE CODE
Source code written in FORTRAN is available in the archive file
"TDALNSRC.ARC". The program was originally compiled under Microsoft
FORTRAN, but has since been recompiled using RM FORTRAN 2.42 (also known
as AUSTEC FORTRAN), which is a far better implementation of FORTRAN.
Once installed, the object files can be relinked with the DOS linker or
PLINK86, or the source code can be recompiled. The RM FORTRAN
libraries are required.
PROGRAM FILES before de-ARCing (BIONET diskette version) 228 Kb:
README.DOC     This documentation file. (8 Kb).
TDALIGN.EXE    Self-extracting archive file for the executables
               and documentation. (110 Kb).
TDALNSRC.EXE   Self-extracting archive file for the source code
               and object files. (110 Kb).
PROGRAM FILES before de-ARCing (BIONET downloadable version)
README.DOC     This documentation file.
TDALIGN.UUE    Archived and uuencoded file for the executable and
               documentation. (137 Kb).
TDALNSRC.UUE   Archived and uuencoded FORTRAN source code and object
               libraries. (142 Kb).
PROGRAM FILES after de-ARCing (Approx. 210 Kb total)
READ      ME       4219  12-11-88  12:43a  Documentation file.
TDALIGN   DOC     20025  12-11-88  12:41a  Documentation file.
NTDALIGN  EXE     99920  12-11-88  12:24a  Executable for machines w/o 80X87.
TDALIGN   EXE     86416  12-10-88  12:31a  Executable for machines w/ 80X87.
SEQ1      SEQ        35   1-15-88   9:09p  Test data file.
SEQ2      SEQ        43   1-15-88   9:10p  Test data file.
DOCUMENTATION
The program is documented in the files READ.ME and TDALIGN.DOC in
addition to containing internal help messages. The source code is also
commented.
STARTING THE PROGRAM
On the BIONET diskette version, the archive file TDALIGN.EXE is a
self-extracting archive, whereas the ".uue" files in the downloadable
version require both the UUDECODE program and an "ARC"-compatible
dearchiving program such as PKUNPAK to restore the executable file.
Self-dearchiving files:
De-archive the program by "running" the archive file and specifying the
drive and directory path where you want the program installed. E.g., to
install TDALIGN in the \tdalign directory of the c: drive, you should
type:
>TDALIGN c:\tdalign
".UUE" files:
First decode the ".uue" file:
>UUDECODE tdalign.uue
Then dearchive the resulting ".arc" file to an appropriate directory by
using PKUNPAK (or compatible program):
>PKUNPAK tdalign.arc c:\tdalign
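If no UUDECODE program is at hand, the decoding step can be approximated
with a short script. The following is a minimal sketch in Python, not part
of the TDALIGN distribution; the function name uudecode is hypothetical,
and the sketch assumes the ".uue" file uses the conventional
"begin <mode> <name>" ... "end" framing. It only recovers the ".arc" file;
an ARC-compatible dearchiver is still needed afterwards.

```python
import binascii


def uudecode(text):
    """Decode one uuencoded file; return (embedded file name, raw bytes).

    Hypothetical helper, assuming standard "begin"/"end" framing.
    """
    lines = iter(text.splitlines())

    # Skip mail headers or other text before the "begin <mode> <name>" line.
    for line in lines:
        if line.startswith("begin "):
            parts = line.split(None, 2)
            name = parts[2] if len(parts) == 3 else ""
            break
    else:
        raise ValueError("no 'begin' line found")

    chunks = []
    for line in lines:
        if line.strip() == "end":
            return name, b"".join(chunks)
        if not line:
            continue  # tolerate blank lines inside the body
        # Each body line carries its own length byte followed by the data.
        chunks.append(binascii.a2b_uu(line))
    raise ValueError("no 'end' line found")
```

To use it, read the text of tdalign.uue, call uudecode on it, and write the
returned bytes to a file in binary mode under the returned name.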
Once installed, CD to the appropriate directory and then start the
program by typing its name at the MS-DOS prompt:
>NTDALIGN    (for machines without an 80X87 math coprocessor)
or
>TDALIGN     (for machines with an 80X87 math coprocessor)
SAMPLE PROGRAM OUTPUT
This program takes any two R/DNA or amino acid sequences in one letter code
and compares them for similarity. Copyright 1982, 1984, 1985, 1986, 1987,
1988, 1989 by Dan Davison and Keith Thompson. Version 1.61 12/11/88
Enter the name of the file containing sequence 1 or a ?: seq1.seq
Enter the name of the file containing the second sequence (can be in the
same file) or a ?: seq2.seq
Enter the name of the first sequence (this parameter is case sensitive)
or a ?: seq1
Enter the name of the second sequence (this parameter is case sensitive)
or a ?: seq2
The sequences to be matched are seq1 and seq2
Enter the start and end positionsin sequence seq1
...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
(not a ?): 0,0
Enter the start and end positionsin sequence seq2
...YOU MUST ENTER TWO NUMBERS..... 0,0 for the complete sequence
(not a ?): 0,0
Enter gapsize, matchlength, range,gap penalty--free format
(? or a non-number for more info): 10,2,20,4
Do you want the input sequences printed out?
0=No, 1=sequence 1, 2=sequence 2, 3=both: 3
File for output ? (y/n): n
Print out match table? (y/n): y
seq1
10
aaccggtt
seq2
10
aaccggcgcgcgcgcg
seq1 limits: 1 - 8
seq2 limits: 1 - 16
K  1START  1END  2START  2END  LENGTH
1       1     6       1     6       6
2       9     6      17     6       0
Gapsize= 10 Matchlength= 2 Range= 20 Gap penalty= 4.00
seq1 LIMITS: 1 - 8
seq2 LIMITS: 1 - 16
      tt
aaccgg
aaccgg
      cgcgcgcgcg
End of program....bye
KNOWN PROBLEMS
1. Please be aware that TDALIGN is CASE-SENSITIVE at the prompt which asks
for the sequence name. The sequence name must be entered in EXACTLY the
same case as it exists in the sequence file. "SEQUENCE1" is different
from "Sequence1" which is different from "sequence1"!
2. Because of the algorithm that it uses, TDALIGN does not allow any
offset at the 5' end. The two sequences must start with the same 5'
residues.
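Both problems can be caught before starting the program. The following is
a minimal sketch in Python, not part of TDALIGN; the function name
preflight and its arguments are hypothetical, and the caller is assumed to
have already read the sequence names and residues from the sequence files.
Comparing the first residues without regard to case is also an assumption.

```python
def preflight(name, names_in_file, seq1, seq2):
    """Return a list of warnings for the two known problems above."""
    problems = []

    # Problem 1: the sequence name must match exactly, including case.
    if name not in names_in_file:
        near = [n for n in names_in_file if n.lower() == name.lower()]
        if near:
            problems.append(
                "name %r differs only in case from %r" % (name, near[0])
            )
        else:
            problems.append("name %r not found" % name)

    # Problem 2: no offset is allowed at the 5' end, so both sequences
    # must start with the same residue (case ignored here by assumption).
    if not seq1 or not seq2:
        problems.append("one of the sequences is empty")
    elif seq1[0].lower() != seq2[0].lower():
        problems.append(
            "sequences start with different residues: %r vs %r"
            % (seq1[0], seq2[0])
        )
    return problems
```

For the test data shown in the sample output, where both sequences begin
with "a" and the name is typed as "seq1", the function returns an empty
list.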