Haploinfo

--Lin Wang
1. What is Haploinfo?
Haploinfo is a software tool that produces a concise summary of the distributions
used in FBAT’s haplotype analysis. It also generates a random set of phased and
un-phased offspring genotypes according to the conditional haplotype probabilities.
In addition, it provides phased genotypes for the observed family members, unless the
offspring in the family are missing genotypes at one or more markers.
2. What is required in order to run Haploinfo?
Haploinfo is a set of Perl scripts that depends on a pedigree file and on the output
of FBAT’s “viewhaplotype” command. Users therefore need to run FBAT first and save
its output to a log file. A Perl interpreter must also be installed.
3. What is Haploinfo’s output?
Haploinfo will generate 5 output files:
(1). haplostat.txt: containing the haplotype frequencies
(2). haplodist.txt: containing the conditional probability of phased offspring
genotypes for each family and observed phased genotypes for each offspring.
(3). haplopermute.txt: randomly generated set of offspring phased genotypes for each
family.
(4). genopermute.txt: randomly generated set of offspring genotypes for each family.
(5). haplophase.txt: estimated haplotypes for each individual in the informative family.
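As a quick sanity check after a run, the five file names listed above can be tested
for existence. The following is a minimal Python sketch and is not part of Haploinfo;
the helper name missing_outputs and the assumption that all files are written to one
directory (for example the current working directory) are illustrative only.

```python
from pathlib import Path

# The five output file names listed in this section.
EXPECTED_OUTPUTS = [
    "haplostat.txt",
    "haplodist.txt",
    "haplopermute.txt",
    "genopermute.txt",
    "haplophase.txt",
]


def missing_outputs(directory="."):
    """Return the expected output files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(".")
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All five output files are present.")
```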
4. Example
(1). Run FBAT first; in FBAT, type the following:
log fbatlog.txt (the FBAT output will be saved as “fbatlog.txt”)
load xbat.ped (here the pedigree file is named as “xbat.ped”)
viewh M1 M2 .. Mk (here the user specifies the desired markers in the haplotype)
log off
quit
(2). At the command prompt, type the following:
perl haploinfo.pl fbatlog.txt xbat.ped (the FBAT log file, then the same pedigree file that was loaded into FBAT)
(3). After the program finishes, 5 new files will have been generated:
haplostat.txt: gives the definition and estimated frequency for each haplotype in
terms of the marker alleles.
haplodist.txt: each line corresponds to one nuclear family and gives the possible
phased genotypes (if the phase is ambiguous, the probability of each possible phase is
given) and the probability of each possible unordered offspring outcome. That is,
[P(G) 2 1 1.00] means that every possible outcome must have 2 offspring who are g1 and
1 who is g2. The end of each line gives the genotype of each offspring, phased where
possible.
Example 1:
1   [1 x 2]   g1=h1|h3 1.000   #g1  #g2  P[G]    Observed: 3:g2 4:g1 5:g1
              g2=h1|h1 1.000   2    1    1.000
The entry above indicates that there are two possible phase-known genotypes (g1 and
g2). The distribution requires that 2 sibs are g1 and 1 sib is g2; the assignment is
made by random permutation, with equal probability for each outcome. The observed
phased genotypes are g2, g1 and g1 for individuals 3, 4 and 5, respectively.
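The random permutation described for Example 1 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not Haploinfo's actual
code: the function name permute_offspring and the dictionary layout of the counts are
hypothetical.

```python
import random


def permute_offspring(offspring_ids, genotype_counts, rng=random):
    """Randomly assign phased genotypes to offspring, keeping the required counts."""
    # Expand the counts into a multiset, e.g. {"g1": 2, "g2": 1} -> ["g1", "g1", "g2"].
    pool = [g for g, n in genotype_counts.items() for _ in range(n)]
    if len(pool) != len(offspring_ids):
        raise ValueError("genotype counts must add up to the number of offspring")
    # A uniform shuffle makes every arrangement of the multiset equally likely.
    rng.shuffle(pool)
    return dict(zip(offspring_ids, pool))


# Example 1: offspring 3, 4 and 5; two must be g1 and one must be g2.
assignment = permute_offspring([3, 4, 5], {"g1": 2, "g2": 1})
print(assignment)
```

Whichever offspring receives g2, the counts (2 x g1, 1 x g2) are always preserved;
only the assignment to individuals changes between draws.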
Example 2:
2   [1 x 2]   g1=h2|h4 0.025   #g1  #g2  P[G]    Observed: 3:g2 4:g1 5:g1
              g1=h3|h1 0.975   2    1    1.000
              g2=h1|h1 1.000
The entry above indicates that there is one phase-known genotype, g2, and one ambiguous
genotype, g1. The two possible phases for g1 are assigned probabilities 0.025 and
0.975. The distribution is obtained as in Example 1, after first randomly selecting a
phase for g1. The observed phased genotypes are g2, g1 and g1 for individuals 3, 4 and
5, respectively. Note that the randomly selected phase must be the same for all
offspring in the family.
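The two-stage draw described for Example 2 (select one phase per ambiguous genotype
for the whole family, then permute as in Example 1) can be sketched as follows. This
is a hedged illustration, not Haploinfo's implementation; the function name
draw_family_outcome and the input layout are assumptions made for the example.

```python
import random


def draw_family_outcome(offspring_ids, phase_probs, genotype_counts, rng=random):
    """Draw one phase per genotype for the family, then permute the genotypes."""
    # Stage 1: choose a single phase for each genotype, weighted by its probability.
    chosen = {}
    for genotype, options in phase_probs.items():
        phases = [phase for phase, _ in options]
        weights = [prob for _, prob in options]
        chosen[genotype] = rng.choices(phases, weights=weights, k=1)[0]
    # Stage 2: permute the genotypes among the offspring, keeping the counts fixed.
    pool = [g for g, n in genotype_counts.items() for _ in range(n)]
    if len(pool) != len(offspring_ids):
        raise ValueError("genotype counts must add up to the number of offspring")
    rng.shuffle(pool)
    # Every offspring with the same genotype gets the same selected phase.
    return {oid: (g, chosen[g]) for oid, g in zip(offspring_ids, pool)}


# Example 2: g1 is ambiguous (two candidate phases), g2 is phase-known.
phase_probs = {
    "g1": [("h2|h4", 0.025), ("h3|h1", 0.975)],
    "g2": [("h1|h1", 1.000)],
}
print(draw_family_outcome([3, 4, 5], phase_probs, {"g1": 2, "g2": 1}))
```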
haplopermute.txt.
Example:
Haplo                   <-- Haplotype names
1 1 0 0 1 1   0 0       <-- Haplotype missing
1 2 0 0 2 1   0 0
1 3 1 2 1 2   1 1       <-- h1 | h1
1 4 1 2 1 2   1 3       <-- h1 | h3
1 5 1 2 2 1   1 3       <-- h1 | h3
1 6 0 0 1 1   0 0
1 7 6 5 1 2   1 1       <-- h1 | h1
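The example haplopermute.txt above gives the permuted phased genotypes for each family
member. Each row starts with the six pedigree columns (family, individual, father,
mother, sex, affection status), followed by the two haplotype numbers; 0 denotes a
missing haplotype. The parents’ phased genotype information is missing because FBAT
does not reconstruct the individual parents’ phases.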
genopermute.txt.
Example:
m1 m2
1 1 0 0 1 1   0 0   0 0
1 2 0 0 2 1   1 2   1 2
1 3 1 2 1 2   1 1   1 1
1 4 1 2 1 2   1 2   1 2
1 5 1 2 2 1   1 2   1 2
1 6 0 0 1 1   1 1   1 2
1 7 6 5 1 2   1 1   1 1
The genopermute.txt file gives the genotypes for each family member corresponding to
the permuted phased genotypes given in “haplopermute.txt”. The parents’ genotype
information is the same as it appears in the pedigree file.
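The correspondence between a phased genotype in haplopermute.txt and the un-phased
marker genotypes in genopermute.txt can be illustrated with a short Python sketch.
The haplotype definitions used here (h1 carries allele 1 at both markers, h3 carries
allele 2 at both) are an assumption inferred from the example rows; in a real analysis
the definitions come from haplostat.txt. The function name haplo_to_geno is
hypothetical.

```python
# Assumed haplotype definitions for markers m1 and m2 (see haplostat.txt in practice).
HAPLOTYPES = {
    1: (1, 1),  # h1: allele 1 at m1, allele 1 at m2
    3: (2, 2),  # h3: allele 2 at m1, allele 2 at m2
}


def haplo_to_geno(hap_a, hap_b, haplotypes=HAPLOTYPES):
    """Convert a pair of haplotype numbers into un-phased genotypes per marker."""
    if hap_a == 0 or hap_b == 0:
        return None  # 0 marks a missing haplotype, so no genotype can be derived
    alleles_a = haplotypes[hap_a]
    alleles_b = haplotypes[hap_b]
    # Pair the alleles marker by marker; sorting drops the phase information.
    return [tuple(sorted(pair)) for pair in zip(alleles_a, alleles_b)]


print(haplo_to_geno(1, 1))  # [(1, 1), (1, 1)]  individual 3: h1 | h1
print(haplo_to_geno(1, 3))  # [(1, 2), (1, 2)]  individuals 4 and 5: h1 | h3
print(haplo_to_geno(0, 0))  # None              haplotype missing
```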
haplophase.txt.
Example:
50115_12   121221212 121221212   1.000
50152_2    121221212 221211112   1.000
50152_3    121221212 121221212   0.142
           121221212 221211112   0.858
The haplophase.txt file gives the phased genotypes for each family member, together
with the corresponding probability. The haplotypes of individual 3 in family 50152 are
121221212 121221212 with a probability of 0.142, and 121221212 221211112 with a
probability of 0.858.
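For downstream use, the haplophase.txt entries can be read into a per-individual list
of candidate phases. The sketch below is an assumption-laden illustration rather than
a documented format parser: it assumes whitespace-separated fields, that a line with
four fields starts a new individual (ID, two haplotypes, probability), and that a line
with three fields is an additional phase for the previous individual. The function
name parse_haplophase is hypothetical.

```python
SAMPLE = """\
50115_12   121221212 121221212   1.000
50152_2    121221212 221211112   1.000
50152_3    121221212 121221212   0.142
           121221212 221211112   0.858
"""


def parse_haplophase(text):
    """Map each individual ID to a list of (haplotype_a, haplotype_b, probability)."""
    phases = {}
    current_id = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 4:
            # Assumed layout: a new individual starts with its ID.
            current_id, fields = fields[0], fields[1:]
        elif len(fields) != 3 or current_id is None:
            raise ValueError("unexpected line: " + line)
        hap_a, hap_b, prob = fields
        phases.setdefault(current_id, []).append((hap_a, hap_b, float(prob)))
    return phases


phases = parse_haplophase(SAMPLE)
for individual, candidates in phases.items():
    total = sum(prob for _, _, prob in candidates)
    print(individual, len(candidates), "candidate phase(s), total probability", round(total, 3))
```

In this example the two candidate phases of individual 50152_3 have probabilities
0.142 and 0.858, which add up to 1.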