MISAT - Microsatellite analysis by maximum likelihood
Copyright 1997 (c) by Rasmus Nielsen.
Any injury or loss due to the use of this software
is not the responsibility of the author. This
software is provided "as is" without any express or
implied warranties, including, without limitation,
the implied warranties of merchantability and fitness
for a particular purpose.
What does this program do?
This program enables estimation of θ = 4Nμ (N = population size, μ = mutation
rate) by maximum likelihood for a microsatellite locus. It may also be applied to test the
one-step mutation model against a multi-step mutation model by a likelihood ratio test. It
uses the data from a microsatellite population sample, i.e. it assumes that the data is
obtained by random sampling from a randomly mating population.
How does the program work?
The program estimates the likelihood surface of the fundamental population
genetical parameter θ for a microsatellite locus by a Markov chain recursion method. It
thereby provides a maximum likelihood estimate of θ and an approximate confidence
interval for θ. It can also estimate the joint likelihood surface for θ and a parameter p =
proportion of multi-step mutations. The hypothesis of no multi-step mutations can
thereby be tested by a likelihood ratio test. For details regarding the estimation procedure
please see Nielsen, R. 1997. A likelihood approach to microsatellite population samples.
Genetics. Because this program applies a Monte Carlo method for estimating the
likelihood surface, the estimation procedure may be very time consuming. Likewise, the
likelihood values obtained are estimates that may deviate from the true likelihood value.
In the current version of the program, runs through the Markov chains are truncated, so
an error in the third decimal place may occur. If you need greater
precision in the estimate of the likelihood value, please contact the author.
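The likelihood ratio test mentioned above can be sketched in a few lines of Python. This is only an illustration and not part of MISAT: the function name and the example log likelihoods are hypothetical, and the chi-square reference distribution with 1 degree of freedom is the textbook choice assumed here, not something this README specifies (see Nielsen 1997 for the procedure actually used).

```python
import math

def likelihood_ratio_test(loglik_one_step, loglik_multi_step):
    """Return (statistic, p_value) for the one-step vs multi-step comparison.

    Both arguments are maximized log likelihoods (natural log). The p-value
    uses a chi-square reference distribution with 1 degree of freedom, which
    is an assumption of this sketch.
    """
    # Monte Carlo noise can make the nested model look slightly better,
    # so clamp the statistic at zero.
    statistic = max(0.0, 2.0 * (loglik_multi_step - loglik_one_step))
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log likelihoods, not output from a real MISAT run.
stat, p = likelihood_ratio_test(-50.0, -47.0)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

Because the likelihood values are themselves Monte Carlo estimates, a statistic close to the critical value should be treated with caution.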
How do you use the program?
To use the program you first need to create an infile (see below). Place the infile in the
same directory as MISAT and start the program. The program will first prompt you for
the name of the infile. Thereafter, you will be asked about the type of the locus: press 2
for a dimer locus, 4 for a tetramer locus, etc. You will then be asked to set five
options:
1. Gridsize?
2. Use moments estimate for theta0?
3. Number of runs through Markov chain?
4. Estimate proportion of multi-step mutations?
5. Adaptive runs?
Option 1: The gridsize determines the number of grid points at which the program
estimates the likelihood. The default value is 40, which will be sufficient in most cases.
Increasing the gridsize will slow down the estimation procedure. However, the running
time does not increase linearly with the number of gridpoints
because an importance sampling scheme is applied to estimate the likelihood for many
values of θ at the same time. One initial value of θ (θ0) is used to drive the Markov chain
simulations.
Option 2: θ0 is the value of θ that is used to drive the simulations. The closer this value
is to the true maximum likelihood estimate of θ, the better the estimation procedure will
perform. The default value in the program is the method of moments estimate under the
one-step model.
Option 3: The number of runs through the Markov chain determines how large the
variance in the estimate of the likelihood will be. For most data sets the default value of
100,000 runs will be sufficient. However, for large data sets (more than 50 genes) more
runs through the Markov chain should be performed.
Option 4: Estimation of the proportion of multi-step mutations is useful for testing the
one-step mutation model (see Nielsen 1997). However, the procedure is extremely time
consuming and can only be recommended for people with access to one or more very fast
computers that can be dedicated to the Markov chain estimation procedure. There are
two reasons for this. First, each run through the Markov chain is much slower under the
multi-step model. Second, at least 10 times as many runs are required in order to estimate
the extra parameter (p). If you choose to estimate the proportion of multi-step mutations you
should use a lower value of θ0 than recommended for the one-step mutation model.
Option 5: When adaptive runs are chosen, the program continuously updates the value of θ0.
This option should be chosen in most cases.
If you do not want to estimate the proportion of multi-step mutations or perform the test
of the one-step mutation model, and your data set is of small or moderate size, you will in
most cases not need to change any of the options.
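The importance sampling idea behind Options 1 and 2 (simulate once under a driving value, then reweight to obtain estimates for a whole grid of parameter values) can be illustrated with a toy Python example. This is not the microsatellite likelihood and not MISAT's code: the exponential model, the function name and all numbers are assumptions chosen only because the exact answer is known and can be compared with the estimate.

```python
import math
import random

def likelihood_grid(theta0, thetas, threshold, n_runs, seed=1):
    """Estimate P(X > threshold) for X ~ Exponential(rate=theta), for every
    theta in `thetas`, using draws made under the single rate theta0."""
    rng = random.Random(seed)
    totals = [0.0] * len(thetas)
    for _ in range(n_runs):
        x = rng.expovariate(theta0)  # simulate under theta0 only
        if x <= threshold:
            continue  # this draw contributes zero for every theta
        for i, theta in enumerate(thetas):
            # Importance weight: density under theta / density under theta0.
            weight = (theta / theta0) * math.exp(-(theta - theta0) * x)
            totals[i] += weight
    return [total / n_runs for total in totals]

threshold = 1.0
thetas = [0.8, 1.0, 1.25]
estimates = likelihood_grid(1.0, thetas, threshold, n_runs=20000)
for theta, est in zip(thetas, estimates):
    exact = math.exp(-theta * threshold)
    print(f"theta={theta:.2f}  estimate={est:.4f}  exact={exact:.4f}")
```

One set of simulated values serves all grid points, which is why adding grid points costs less than proportionally more time. In this toy the weights become unreliable as theta moves away from theta0 (the variance is infinite for theta at or below theta0 / 2), which mirrors the advice that θ0 should be close to the maximum likelihood estimate.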
Creating the infile
The infile should contain the data from one population sample from one locus. It should
consist of two columns. The first column should contain the amplification fragment sizes
of the alleles and the second column should contain the counts of each allele. The file
should end with a line containing a single 0 (zero). Example:
22 1
24 9
28 4
30 23
32 2
36 4
0
In the above sample there is 1 copy of size 22, 9 copies of size 24, etc. You can use any
text editor to create the infile. If you use an editor such as MS-Word that creates
formatted files as a default, then remember to save the file as a text file (ASCII on a PC).
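Before starting a long run it can be useful to check the infile. The following Python sketch reads the format described above and applies a few sanity checks. It is not part of MISAT: the function name is hypothetical, and the check that allele sizes differ by whole multiples of the repeat unit is an assumption about what a well-formed data set looks like, not a documented requirement of the program.

```python
def parse_infile(text, repeat_unit=2):
    """Parse infile text into a list of (fragment_size, count) pairs."""
    alleles = []
    terminated = False
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields == ["0"]:
            terminated = True  # the single 0 marks the end of the data
            break
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got: {raw!r}")
        size, count = int(fields[0]), int(fields[1])
        if size <= 0 or count <= 0:
            raise ValueError(f"size and count must be positive: {raw!r}")
        alleles.append((size, count))
    if not terminated:
        raise ValueError("infile must end with a line containing 0")
    if not alleles:
        raise ValueError("infile contains no alleles")
    # Assumed sanity check: sizes should differ by multiples of the repeat unit.
    smallest = min(size for size, _ in alleles)
    for size, _ in alleles:
        if (size - smallest) % repeat_unit != 0:
            raise ValueError(f"size {size} does not fit a {repeat_unit}-bp repeat")
    return alleles

example = "22 1\n24 9\n28 4\n30 23\n32 2\n36 4\n0\n"
alleles = parse_infile(example, repeat_unit=2)
print(len(alleles), "alleles,", sum(count for _, count in alleles), "gene copies")
```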
Interpretation of the output
The output of the program is a likelihood surface and it is stored in a new file,
‘likesurface’, created by the program. The maximum likelihood estimate of θ is the
value of θ for which the largest (least negative) log likelihood value is obtained.
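As an illustration, the maximum and an approximate 95% interval can be read off a likelihood surface as in the Python sketch below. The layout of the ‘likesurface’ file is not described in this README, so the sketch works on plain lists of θ values and log likelihoods that you would have to extract yourself. The function name and the example numbers are hypothetical, and the drop of 1.92 log likelihood units (half the 5% critical value of a chi-square with 1 degree of freedom) is the usual large-sample rule, assumed here rather than taken from the program.

```python
def summarize_surface(thetas, logliks, drop=1.92):
    """Return (mle, lower, upper) from a gridded log likelihood surface."""
    if not thetas or len(thetas) != len(logliks):
        raise ValueError("thetas and logliks must be non-empty and equal length")
    # Grid point with the largest (least negative) log likelihood.
    best = max(range(len(thetas)), key=lambda i: logliks[i])
    cutoff = logliks[best] - drop
    # Grid points whose log likelihood is within `drop` units of the maximum.
    inside = [t for t, ll in zip(thetas, logliks) if ll >= cutoff]
    return thetas[best], min(inside), max(inside)

# Hypothetical surface, not output from a real MISAT run.
thetas = [1.0, 2.0, 3.0, 4.0, 5.0]
logliks = [-10.0, -8.0, -7.0, -8.5, -12.0]
mle, lower, upper = summarize_surface(thetas, logliks)
print(f"MLE = {mle}, approximate 95% interval = [{lower}, {upper}]")
```

The interval is only as fine as the grid, and because the surface is a Monte Carlo estimate, neighbouring grid points with nearly equal log likelihoods should not be over-interpreted.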
Please report any bugs to Rasmus Nielsen at e-mail: rn28@cornell.edu