LINKMFEX - University of Guelph

EPI-MATERNAL-HS
EPI-PATERNAL-HS
Version 1.0
By: Roy G. Danzmann,
Department of Integrative Biology,
University of Guelph,
Guelph, Ontario, Canada N1G 2W1
Phone: (519) 824-4120 ext. 58364
FAX: (519) 767-1656
Email: rdanzman@uoguelph.ca
Date: September 25, 2009
Welcome to the EPI-MATERNAL-HS and EPI-PATERNAL-HS programs. As the titles
indicate, these programs allow you to assess the degree of epistasis for a given trait between a pair
of maternal or paternal half-sib families. NOTE: Each program calculates epistasis between only one
pair of half-sib families at a time. If your mating structure involves more than two half-sib families
(i.e., k > 2), you will need to recalculate estimates for each of the possible pairwise
half-sib groupings (i.e., k*(k-1) / 2 pairwise combinations).
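As a small illustration of the pairwise count above, the following Python sketch lists every pair of half-sib families that would need a separate run. The family labels are hypothetical and are not names used by the programs.

```python
from itertools import combinations


def pairwise_groupings(families):
    """Return every unordered pair of half-sib families."""
    return list(combinations(families, 2))


# Hypothetical family labels, used only for illustration.
families = ["HS1", "HS2", "HS3", "HS4"]
pairs = pairwise_groupings(families)

# The number of pairs always equals k*(k-1)/2.
k = len(families)
assert len(pairs) == k * (k - 1) // 2

for first, second in pairs:
    print(first, second)
```

With four families this gives six pairs, so six separate analyses would be needed.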
Genotypes and marker designations need to be input according to LINKMFEX formats, which
are described in the documentation to the program found at: www.uoguelph.ca/~rdanzman/software and
are also described briefly below.
MAIN DATA FILE STRUCTURE FOR LINKMFEX FORMAT FILES.
Two main types of data files must be generated by the user prior to running LINKMFEX.
These files may be generated directly as an ASCII text file using a text editor such as NOTEPAD,
TEXTPAD, or any standard word processing package. You may also use a spreadsheet program
such as EXCEL to store your data and then export the data set in ASCII format prior to running
LINKMFEX. NOTE: If you choose to convert your data from a spreadsheet format, make sure that
you do so in either a COMMA or SPACE delimited format. Any data conversions made in a TAB
delimited format will cause the programs to ‘crash’. If errors are encountered during program
execution, these may be due to hidden TABS within the data set. I have encountered such hidden
tabs even when converting data in a presumed SPACE delimited format. Therefore, I would
recommend that you only use a COMMA delimited format for data conversion.
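The hidden-TAB problem described above can be checked before running the programs. The following Python sketch is not part of the LINKMFEX package; the function names are hypothetical. It reports which lines contain a TAB and rewrites the text as comma delimited, assuming no field (for example a quoted header name) contains a space of its own.

```python
import re


def find_tab_lines(text):
    """Return the 1-based numbers of lines that contain a TAB character."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if "\t" in line]


def to_comma_delimited(text):
    """Join the fields of every line with single commas.

    Assumption: fields are separated by commas, spaces, or TABs and
    never contain spaces themselves. Blank lines are dropped.
    """
    rows = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            rows.append(",".join(fields))
    return "\n".join(rows)


sample = "72 2 41\t32 42 43\n5,2,21,31,21,32\n"
print(find_tab_lines(sample))
print(to_comma_delimited(sample))
```

The converted text ends without a trailing newline, so it also contains no blank lines at the end.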
1. Marker description (*.loc) files.
Files with a *.loc extension contain information on the markers used for linkage analysis.
Each genetic marker/locus is identified by a numeric value followed by the name of the marker.
Numbers are separated from the name of the marker by a space or a comma as follows:
1,SEX
2,OmyFGT2TUF
3,OmyFGT3TUF
4,OmyFGT4TUF
5,OmyFGT5TUF
6,OmyFGT6TUF
7,OmyFGT7TUF
8,OmyFGT8/iTUF
9,OmyFGT8/iiTUF
10,OmyFGT9TUF
11,OmyFGT10TUF
12,OmyFGT11TUF
13,OmyFGT12TUF
14,OmyFGT13TUF
15,OmyFGT14TUF
16,OmyFGT15TUF
17,OmyFGT16TUF
18,OmyFGT17TUF
.
.
.
.
etc.
Numbers need not be entered in sequence, or alphabetically, as LINKMFEX will
automatically sort your marker list if you are selecting one or a pair of markers for analysis. Up to
10 different *.loc files may be specified for input. These may contain data on different marker
types such as microsatellites, AFLPs, expressed sequence markers, etc. You can also enter all
these markers sequentially in one *.loc file.
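To make the *.loc layout concrete, here is a minimal Python sketch that reads marker lines of the form shown above (number, then a space or comma, then the marker name) and returns them sorted by number. It is only an illustration of the format; `parse_loc` is a hypothetical helper, not a LINKMFEX routine.

```python
import re


def parse_loc(text):
    """Map marker number -> marker name, sorted by marker number."""
    markers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The number is separated from the name by a space or a comma.
        number, name = re.split(r"[,\s]+", line, maxsplit=1)
        markers[int(number)] = name
    return dict(sorted(markers.items()))


# Entries need not be in sequence.
sample = "1,SEX\n3 OmyFGT3TUF\n2,OmyFGT2TUF\n8,OmyFGT8/iTUF"
for number, name in parse_loc(sample).items():
    print(number, name)
```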
2. Family genotype (*.fam) files.
Files with a *.fam extension contain the genotypes of the mapping parents and progeny and
are arranged as follows:
Family_Name, lot-25
No_progeny, 48
72 2 41 32 42 43 21 42 42 31 21 43 42 31 21 21 ……
5 2 21 31 21 32 11 21 11 31 11 32 21 31 11 11 ……
195 2 22 31 32 21 32 32 32 21 32 21 32 21 32 32 ……
132 2 24 13 34 14 23 34 23 12 0 14 34 12 23 23 ……
189 2 12 13 32 21 11 32 31 11 31 21 32 11 31 31 ……
253 2 23 13 23 21 33 23 23 21 33 21 23 31 33 33 ……
91 2 21 34 42 32 41 42 41 31 41 32 42 31 41 41 ……
58 2 11 23 12 13 13 13 13 13 12 12 13 12 12 13 ……
101 2 21 32 31 21 22 21 22 22 31 31 22 31 32 22 ……
133 1 1 2 1 1 2 1 2 2 0 1 2 1 2 2 ……
(F = column 3, female parent; M = column 4, male parent. These labels are not part of the file.)
The example shows the genotypes obtained from 12 of 48 mapping progeny in a mapping family,
plus the genotypes of the mapping parents. The first column contains the marker number
corresponding to the markers designated in the *.loc file. The second column indicates whether the
marker is codominant (2) or dominant/recessive (1). Markers 72, 5, 195, 132, 189,
253, 91, 58, and 101 are all codominant markers, while marker 133 is a dominant/recessive (e.g.,
AFLP, RAPD-type) marker. The third column MUST contain the genotype of the female (F)
mapping parent, while column 4 MUST contain the genotype of the male (M) mapping parent.
All codominant markers have alleles designated with numbers from 1 to 9 and the order in
which these numbers are input does not matter. Thus 19 is equivalent to 91 and indicates a
heterozygous genotype, while 99 would indicate a homozygous condition for allele type 9. Allele
numbers in the progeny must correspond to those given for the parents. If an input error does occur
(i.e., an unrecognized or impossible progeny genotype), LINKMFEX will alert you to it.
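The rule that progeny alleles must match the parental alleles can be sketched in Python as follows. This is an illustrative check under the coding described above (two-digit codominant genotypes, allele order ignored, 0 for missing data); it is not the validation code used by LINKMFEX, and the function names are hypothetical.

```python
from itertools import product


def possible_progeny(female, male):
    """Return all codominant genotypes obtainable from the two parents.

    Genotypes are two-digit strings such as "41". Allele order does not
    matter, so each genotype is stored with its alleles sorted.
    """
    return {"".join(sorted(f + m)) for f, m in product(female, male)}


def is_valid_progeny(genotype, female, male):
    """True if the genotype is missing ("0") or consistent with the parents."""
    if genotype == "0":
        return True
    return "".join(sorted(genotype)) in possible_progeny(female, male)


# Marker 72 from the example: female 41, male 32.
print(sorted(possible_progeny("41", "32")))
print(is_valid_progeny("42", "41", "32"))  # one allele from each parent
print(is_valid_progeny("44", "41", "32"))  # allele 4 cannot come from the male
```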
For dominant/recessive loci, genotypes showing expression (+/-) are designated with
a 2. Homozygous recessive genotypes (-/-) (i.e., lack of expression) are designated with a 1.
Homozygous dominant (+/+) genotypes are not considered as a possible parental genotype since
such genotypes would be non-informative for linkage.
NOTE: For dominant/recessive type markers, if it can be inferred that both parents are
heterozygous in genotype (i.e., +/-) by observing double homozygous absent (-/-) progeny
genotypes, then such progeny are potentially informative for linkage. However, the fact that only
25% of the progeny may be informative for mapping precludes the use of such markers for
linkage analysis.
NOTE: All missing genotypes in the progeny must be coded with a 0 value. Therefore, if
you have null alleles segregating in your cross and typically code these with a 0 value, it will be
necessary to substitute another numeric value for their designation when entering codominant
markers. When entering data for dominant/recessive markers, the absence of expression or null
genotype must be coded with a 1 value, while the presence of a band is coded with a 2.
NOTE: You must not have any blank lines at the end of your *.fam or *.loc input
files. If you receive an INPUT PAST END OF FILE error message, then it is likely that blank
lines exist at the end of the file. These lines must be deleted.
NOTE: Header designations such as Family_Name, and No_progeny, MUST NOT have
blank spaces in the header names or the name of the source mapping family, or the program will
terminate. It is also important to place a comma dividing the header name from the input value or
use quotation marks to surround the header name: e.g., “Family Name”. You do not necessarily
need to use the first header name as shown, and may simply use: Family. You MUST, however,
structure the second header name as: No_progeny, otherwise the program will not execute.
LINKMFEX uses this header as a flag to ensure that blank spaces have not been inserted into your
actual family name. Such spaces would cause a shift in the “reading frame” of the data input and lead to
errors.
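The header and blank-line rules in the notes above can be checked ahead of time. The Python sketch below is a hypothetical pre-flight check, not something shipped with the programs. It covers only the comma form of the first header (not the quoted form), and it assumes that a single terminating newline at the end of the file is harmless, which this document does not confirm.

```python
def check_fam_headers(text):
    """Return a list of problems found in *.fam-style text (empty if none)."""
    problems = []
    lines = text.split("\n")
    # Assumption: one final newline is not counted as a blank line.
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) < 2:
        return ["fewer than two header lines"]
    if lines[-1].strip() == "":
        problems.append("blank line at end of file")
    # First header: name, comma, family name, with no blank spaces inside either.
    name, comma, value = lines[0].partition(",")
    if not comma:
        problems.append("first header has no comma")
    elif " " in name.strip() or " " in value.strip():
        problems.append("blank space in first header or family name")
    # Second header must be exactly No_progeny.
    if lines[1].partition(",")[0].strip() != "No_progeny":
        problems.append("second header is not No_progeny")
    return problems


good = "Family_Name, lot-25\nNo_progeny, 48\n72 2 41 32 42 43\n"
bad = "Family Name, lot 25\nNo progeny, 48\n72 2 41 32 42 43\n\n"
print(check_fam_headers(good))
print(check_fam_headers(bad))
```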
The example data file depicted above indicates genotypic variation arising from both
parental genomes, which would of course be expected in any genetic cross. In the calculation of
half-sib epistasis implemented with this software, however, the goal is to examine the allelic
associations with phenotypic variation in progeny that may be influenced by different parental
genetic backgrounds. Hence the genomic backgrounds of the non-common parents in the half-sib
structure are considered homogeneous (i.e., the modifying influence) for the purpose of the
analysis, and the allelic contributions for each of the two non-common parents are re-coded as a
single allelic type (1 vs. 2) for simplicity. If you have data files initially set up for LINKMFEX
analysis using complete genotypic vectors, you should first run the program FAMRECODE, which
is included in the software *.zip file, to recode the allelic values of the non-common parent in the
cross. When you run this program, you will notice that all the allelic values for the non-common
parent are re-coded to the value of 9. You then need to append the two data files together
to make one complete *.fam file for input into either the maternal or paternal half-sib analysis
program. Remember to adjust the progeny count on the second line of the *.fam file to reflect the
sample size across both half-sib families. When you run the program, you will be asked how many
progeny are present in the first half-sib family. You have to enter this number manually, and the
program will then recode the first non-common parental genome as 1 and the second
non-common parental genome as 2.
Trait values are listed in the last row of the *.fam file. Each
trait value needs to correspond to the individual progeny genotype column. However, please note
that the first four columns of the *.fam file do not correspond to progeny genotypes. Therefore, you
need to enter ‘dummy’ values into these 4 columns.
When you run the program, you will also be
asked to provide the name of a linkage group designation file (*.lgr). This file is simply a comma
delimited file that contains the linkage group names of the markers you are analyzing. It is
intended to show you which genomic regions may be involved in epistatic interactions once you
examine your output files. An example format of the *.lgr file is as follows. You may of course
designate linkage groups as you wish.
Lg1,SEX
Lg2,OmyFGT2TUF
Lg2,OmyFGT3TUF
Lg4,OmyFGT4TUF
Lg15,OmyFGT5TUF
Lg16,OmyFGT6TUF
Lg17,OmyFGT7TUF
Lg18,OmyFGT8/iTUF
Lg9,OmyFGT8/iiTUF
Lg10,OmyFGT9TUF
Lg10,OmyFGT10TUF
Lg7,OmyFGT11TUF
Lg13,OmyFGT12TUF
Lg14,OmyFGT13TUF
Lg15,OmyFGT14TUF
Lg16,OmyFGT15TUF
Lg17,OmyFGT16TUF
Lg18,OmyFGT17TUF
.
.
.
.
etc.
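The *.lgr layout shown above (linkage group, comma, marker name) amounts to a lookup from marker to linkage group. A minimal Python sketch of that lookup follows; `parse_lgr` and `markers_by_group` are hypothetical helpers used only to illustrate the format.

```python
def parse_lgr(text):
    """Map marker name -> linkage group from comma-delimited *.lgr-style lines."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        group, marker = (field.strip() for field in line.split(",", 1))
        groups[marker] = group
    return groups


def markers_by_group(groups):
    """Invert the lookup: linkage group -> list of marker names."""
    inverted = {}
    for marker, group in groups.items():
        inverted.setdefault(group, []).append(marker)
    return inverted


sample = "Lg1,SEX\nLg2,OmyFGT2TUF\nLg2,OmyFGT3TUF\nLg4,OmyFGT4TUF"
lookup = parse_lgr(sample)
print(lookup["OmyFGT3TUF"])
print(markers_by_group(lookup)["Lg2"])
```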
The estimation of epistasis follows the model outlined in Cheverud and Routman (1995) for
di-locus epistasis estimates, as modified by Danzmann et al. (1999) for the estimation of epistasis
between two different genomic backgrounds. The model is described below:
Output files list only those pairwise genetic locus combinations possessing significant interactions
at the level of significance indicated by the user.
An example is shown below:
Significance level = 0.05
Epistasis values are shown for pairs of markers with significant cell values
Female    Male            Female  Male    Genotypic  non-Epist.  Epist.  t-value  Link. Gr.  Link. Gr.
Marker 1  Marker 2    N   Allele  Allele  Value      Value       Value   ratio    Female     Male
                                                                                  Marker 1   Marker 2
OMM1230   OmyRGT1TUF  12  1       1       1.691      2.06        -0.37   -1.417   RT-3       RT-31
OMM1230   OmyRGT1TUF  8   1       2       2.303      1.934       0.369   0.906
OMM1230   OmyRGT1TUF  14  2       1       2.737      2.368       0.369   1.503
OMM1230   OmyRGT1TUF  30  2       2       1.872      2.242       -0.37   -2.247
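The model equations themselves are not reproduced in this text. As a hedged sketch, the unweighted-means formulation commonly associated with Cheverud and Routman (1995) reproduces the example values: the non-epistatic value of a cell is its female-allele mean plus its male-allele mean minus the grand mean of the four genotypic values, and the epistatic value is the genotypic value minus the non-epistatic value (for instance 1.691 - 2.06 is approximately -0.37). Whether the programs compute exactly this is an assumption, and the t-value ratio is not covered here.

```python
def epistasis_2x2(g):
    """Return {(i, j): (non_epistatic, epistatic)} for a 2 x 2 table.

    g[(i, j)] is the genotypic value (mean trait value) for female
    allele i and male allele j, with i and j in {1, 2}.
    Assumption: unweighted marginal means, as described in the lead-in.
    """
    alleles = (1, 2)
    grand = sum(g[(i, j)] for i in alleles for j in alleles) / 4
    female_mean = {i: (g[(i, 1)] + g[(i, 2)]) / 2 for i in alleles}
    male_mean = {j: (g[(1, j)] + g[(2, j)]) / 2 for j in alleles}
    result = {}
    for i in alleles:
        for j in alleles:
            non_epi = female_mean[i] + male_mean[j] - grand
            result[(i, j)] = (non_epi, g[(i, j)] - non_epi)
    return result


# Genotypic values from the example output (OMM1230 x OmyRGT1TUF).
genotypic = {(1, 1): 1.691, (1, 2): 2.303, (2, 1): 2.737, (2, 2): 1.872}
for cell, (non_epi, epi) in epistasis_2x2(genotypic).items():
    print(cell, round(non_epi, 3), round(epi, 3))
```

The results agree with the example output to within rounding of the printed genotypic values.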
References:
Cheverud JM, Routman EJ: Epistasis and its contribution to genetic variance components.
Genetics 1995, 139:1455-1461.
Danzmann RG, Jackson TR, Ferguson MM: Epistasis in allelic expression at upper temperature
tolerance QTL in rainbow trout. Aquaculture 1999, 173:45-58.