EPI-MATERNAL-HS
EPI-PATERNAL-HS

Version 1.0

By: Roy G. Danzmann, Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
Phone: (519) 824-4120 ext. 58364
FAX: (519) 767-1656
Email: rdanzman@uoguelph.ca
Date: September 25, 2009

Welcome to the EPI-MATERNAL-HS and EPI-PATERNAL-HS programs. As the titles indicate, these programs allow you to assess the degree of epistasis for a given trait between a pair of maternal or paternal half-sib families.

NOTE: These programs can only calculate epistasis between a pair of half-sib families. If your mating structure involves more than two half-sib families (i.e., k > 2), you will need to recalculate estimates for each of the possible pairwise half-sib groupings (i.e., k*(k-1)/2 pairwise combinations; for example, k = 4 families give 6 pairs).

Genotypes and marker designations need to be input according to LINKMFEX formats, which are described in the documentation to the program found at: www.uoguelph.ca/~rdanzman/software and are also described briefly below.

MAIN DATA FILE STRUCTURE FOR LINKMFEX FORMAT FILES.

Two main types of data files must be generated by the user prior to running LINKMFEX. These files may be generated directly as ASCII text files using a text editor such as NOTEPAD or TEXTPAD, or any standard word processing package. You may also use a spreadsheet program such as EXCEL to store your data and then save the data set in ASCII format prior to running LINKMFEX.

NOTE: If you choose to convert your data from a spreadsheet format, make sure that you do so in either a COMMA or SPACE delimited format. Any data conversions made in a TAB delimited format will cause the programs to ‘crash’. If errors are encountered during program execution, these may be due to hidden TABS within the data set. I have encountered such hidden tabs even when converting data in a presumed SPACE delimited format. Therefore, I recommend that you only use a COMMA delimited format for data conversion.

1. Marker description (*.loc) files.

Files with a *.loc extension contain information on the markers used for linkage analysis. Each genetic marker/locus is identified by a numeric value followed by the name of the marker. Numbers are separated from the name of the marker by a space or a comma as follows:

    1,SEX
    2,OmyFGT2TUF
    3,OmyFGT3TUF
    4,OmyFGT4TUF
    5,OmyFGT5TUF
    6,OmyFGT6TUF
    7,OmyFGT7TUF
    8,OmyFGT8/iTUF
    9,OmyFGT8/iiTUF
    10,OmyFGT9TUF
    11,OmyFGT10TUF
    12,OmyFGT11TUF
    13,OmyFGT12TUF
    14,OmyFGT13TUF
    15,OmyFGT14TUF
    16,OmyFGT15TUF
    17,OmyFGT16TUF
    18,OmyFGT17TUF
    . . . . etc.

Numbers need not be entered in sequence, or alphabetically, as LINKMFEX will automatically sort your marker list if you are selecting one or a pair of markers for analysis. Up to 10 different *.loc files may be specified for input. These may contain data on different marker types such as microsatellites, AFLPs, expressed sequence markers, etc. You can also enter all these markers sequentially in one *.loc file.

2. Family genotype (*.fam) files.

Files with a *.fam extension contain the genotypes of the mapping parents and progeny and are arranged as follows:

    Family_Name, lot-25
    No_progeny, 48
    72  2 41 32 42 43 21 42 42 31 21 43 42 31 21 21 ……
    5   2 21 31 21 32 11 21 11 31 11 32 21 31 11 11 ……
    195 2 22 31 32 21 32 32 32 21 32 21 32 21 32 32 ……
    132 2 24 13 34 14 23 34 23 12 0  14 34 12 23 23 ……
    189 2 12 13 32 21 11 32 31 11 31 21 32 11 31 31 ……
    253 2 23 13 23 21 33 23 23 21 33 21 23 31 33 33 ……
    91  2 21 34 42 32 41 42 41 31 41 32 42 31 41 41 ……
    58  2 11 23 12 13 13 13 13 13 12 12 13 12 12 13 ……
    101 2 21 32 31 21 22 21 22 22 31 31 22 31 32 22 ……
    133 1 1  2  1  1  2  1  2  2  0  1  2  1  2  2  ……
          F  M

The example shows the genotypes obtained from 12 of 48 mapping progeny in a mapping family, plus the genotypes of the mapping parents. The first column contains the marker number corresponding to the markers designated in the *.loc file.
The second column indicates whether the marker is a codominant (= 2) or dominant/recessive (= 1) marker type. Markers 72, 5, 195, 132, 189, 253, 91, 58, and 101 are all codominant markers, while marker 133 is a dominant/recessive (e.g., AFLP, RAPD-type) marker. The third column MUST contain the genotype of the female (F) mapping parent, while column 4 MUST contain the genotype of the male (M) mapping parent.

All codominant markers have alleles designated with numbers from 1 to 9, and the order in which these numbers are input does not matter. Thus 19 is equivalent to 91 and indicates a heterozygous genotype, while 99 would indicate a homozygous condition for allele type 9. Allele numbers in the progeny must correspond to those given for the parents. If an input error does occur (i.e., an unrecognized or impossible progeny genotype), LINKMFEX will alert you to the input error.

For dominant/recessive loci, genotypes showing expression (+/-) are designated with a 2. Homozygous recessive genotypes (-/-) (i.e., lack of expression) are designated with a 1. Homozygous dominant (+/+) genotypes are not considered as a possible parental genotype, since such genotypes would be non-informative for linkage.

NOTE: For dominant/recessive type markers, if it can be inferred that both parents are heterozygous in genotype (i.e., +/-) by observing double homozygous absent (-/-) progeny genotypes, then such progeny are potentially informative for linkage. However, the fact that only 25% of the progeny may be informative for mapping precludes the usefulness of such markers for linkage.

NOTE: All missing genotypes in the progeny must be coded with a 0 value. Therefore, if you have null alleles segregating in your cross and typically code these with a 0 value, it will be necessary to substitute another numeric value for their designation when entering codominant markers.
When entering data for dominant/recessive markers, the absence of expression or null genotype must be coded with a 1 value, while the presence of a band is coded with a 2.

NOTE: You must not have any blank lines at the end of your *.fam or *.loc input files. If you receive an INPUT PAST END OF FILE error message, it is likely that blank lines exist at the end of the file. These lines must be deleted.

NOTE: Header designations such as Family_Name, and No_progeny, MUST NOT have blank spaces in the header names or in the name of the source mapping family, or the program will terminate. It is also important to place a comma dividing the header name from the input value, or to use quotation marks to surround the header name (e.g., “Family Name”). You do not necessarily need to use the first header name as shown, and may simply use: Family. You MUST, however, structure the second header name as: No_progeny, otherwise the program will not execute. LINKMFEX uses this header as a flag to ensure that blank spaces have not been inserted into your actual family name. Such spaces would cause a shift in the “reading frame” of the data input and lead to errors.

The example data file depicted above shows genotypic variation arising from both parental genomes, which would of course be expected in any genetic cross. In the calculation of half-sib epistasis implemented with this software, however, the goal is to examine the allelic associations with phenotypic variation in progeny that may be influenced by different parental genetic backgrounds. Hence the genomic backgrounds of the non-common parents in the half-sib structure are considered homogeneous (i.e., the modifying influence) for the purpose of the analysis, and the allelic contributions for each of the two non-common parents are re-coded as a single allelic type (1 vs. 2) for simplicity.
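Before a LINKMFEX-format file is recoded for the half-sib analysis, it can be useful to check the codominant genotype codes against the rules given above: alleles numbered 1 to 9, allele order irrelevant, 0 for a missing progeny genotype, and progeny alleles that must be obtainable from the parents. The following Python sketch illustrates those rules only. The function names are hypothetical, the checks that LINKMFEX itself performs are not documented here, and the last two progeny codes in the usage example are invented for illustration.

```python
def is_valid_code(code):
    """True if code is a two-digit codominant genotype using alleles 1-9."""
    return len(code) == 2 and all(ch in "123456789" for ch in code)


def normalize(code):
    """Sort the two allele digits so that, e.g., '91' and '19' compare equal."""
    return "".join(sorted(code))


def expected_progeny(female, male):
    """All progeny genotypes obtainable from one female and one male allele."""
    return {normalize(f + m) for f in female for m in male}


def check_codominant(female, male, progeny):
    """Return 0-based positions of progeny codes that are neither missing ('0')
    nor obtainable from the two parental genotypes."""
    allowed = expected_progeny(female, male)
    bad = []
    for i, code in enumerate(progeny):
        if code == "0":  # missing genotype, always accepted
            continue
        if not is_valid_code(code) or normalize(code) not in allowed:
            bad.append(i)
    return bad


# Marker 72 in the example *.fam file: female parent 41, male parent 32.
# The first four codes come from the example; '0' and '44' are invented here.
progeny_72 = ["42", "43", "21", "42", "0", "44"]
print(check_codominant("41", "32", progeny_72))  # [5]
```

Only the last code is flagged: 44 cannot arise from a 41 x 32 cross, while the missing value 0 is skipped.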
If you have data files initially set up for LINKMFEX analysis using complete genotypic vectors, you should first run the program FAMRECODE, which is included in the software *.zip file, to recode the allelic values of the non-common parent in the cross. When you run this program, you will notice that all the allelic values for the non-common parent are re-coded to the value of 9. You then need to append the two data files together to make one complete *.fam file for input into either the maternal or paternal half-sib analysis program. Remember to adjust the progeny count on the second line of the *.fam file to reflect the sample size across both half-sib families.

When you run the program, you will be asked how many progeny are present in the first half-sib family. You have to enter this number manually, and the program will then recode the first non-common parental genome as 1 and the second non-common parental genome as 2.

Trait values are listed in the last row of the *.fam file. Each trait value needs to correspond to the individual progeny genotype column. However, please note that the first four columns of the *.fam file do not correspond to progeny genotypes. Therefore, you need to enter ‘dummy’ values into these four columns of the trait row.

When you run the program, you will also be asked to provide the name of a linkage group designation file (*.lgr). This file is simply a comma delimited file that contains the linkage group names of the markers you are analyzing. It is intended to show you which genomic regions may be involved in epistatic interactions once you examine your output files. You may of course designate linkage groups as you wish. An example format of the *.lgr file is as follows:

    Lg1,SEX
    Lg2,OmyFGT2TUF
    Lg2,OmyFGT3TUF
    Lg4,OmyFGT4TUF
    Lg15,OmyFGT5TUF
    Lg16,OmyFGT6TUF
    Lg17,OmyFGT7TUF
    Lg18,OmyFGT8/iTUF
    Lg9,OmyFGT8/iiTUF
    Lg10,OmyFGT9TUF
    Lg10,OmyFGT10TUF
    Lg7,OmyFGT11TUF
    Lg13,OmyFGT12TUF
    Lg14,OmyFGT13TUF
    Lg15,OmyFGT14TUF
    Lg16,OmyFGT15TUF
    Lg17,OmyFGT16TUF
    Lg18,OmyFGT17TUF
    . . . . etc.

The estimation of epistasis follows the model outlined in Cheverud and Routman (1995) for di-locus epistasis estimates, as modified by Danzmann et al. (1999) for the estimation of epistasis between two different genomic backgrounds. The model is described below:

[Model equations not reproduced in this text version.]

Output files list only those pairwise genetic locus combinations possessing significant interactions at the level of significance indicated by the user. An example is shown below:

    Significance level = 0.05
    Epistasis values are shown for pairs of markers with significant cell values

    Female    Male             Female  Male    Genotypic  non-Epist.  Epist.  t-value
    Marker 1  Marker 2     N   Allele  Allele  Value      Value       Value   ratio
    OMM1230   OmyRGT1TUF   12  1       1       1.691      2.06        -0.37   -1.417
    OMM1230   OmyRGT1TUF    8  1       2       2.303      1.934       0.369   0.906
    OMM1230   OmyRGT1TUF   14  2       1       2.737      2.368       0.369   1.503
    OMM1230   OmyRGT1TUF   30  2       2       1.872      2.242       -0.37   -2.247

    Link. Gr. Female Marker 1   RT-3
    Link. Gr. Male Marker 2     RT-31

In each row, the epistasis value is the genotypic value minus the non-epistatic value (e.g., 1.691 - 2.06 = -0.369, printed as -0.37).

References:

Cheverud JM, Routman EJ: Epistasis and its contribution to genetic variance components. Genetics 1995, 139:1455-1461.

Danzmann RG, Jackson TR, Ferguson MM: Epistasis in allelic expression at upper temperature tolerance QTL in rainbow trout. Aquaculture 1999, 173:45-58.
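As a check on the example output shown earlier, the Python sketch below recomputes the non-epistatic and epistasis values from the four genotypic (cell mean) values. It rests on an assumption, since the model equations are not available in this text: that the non-epistatic value of a cell is its unweighted row marginal mean plus its unweighted column marginal mean minus the unweighted grand mean of the cell means, in the spirit of Cheverud and Routman (1995). Under that assumption the values in the example are recovered to within rounding of the printed figures. The function name `epistasis_table` is hypothetical and is not part of the programs, and the t-value calculation is not covered.

```python
def epistasis_table(cell_means):
    """Split each cell mean into non-epistatic and epistatic parts.

    cell_means maps (female_allele, male_allele) -> mean trait value.
    ASSUMPTION: marginal and grand means are unweighted means of the cell means.
    Returns {(f, m): (genotypic, non_epistatic, epistasis)}.
    """
    females = sorted({f for f, _ in cell_means})
    males = sorted({m for _, m in cell_means})

    grand = sum(cell_means.values()) / len(cell_means)
    row_mean = {f: sum(cell_means[(f, m)] for m in males) / len(males)
                for f in females}
    col_mean = {m: sum(cell_means[(f, m)] for f in females) / len(females)
                for m in males}

    table = {}
    for (f, m), genotypic in cell_means.items():
        non_epistatic = row_mean[f] + col_mean[m] - grand
        table[(f, m)] = (genotypic, non_epistatic, genotypic - non_epistatic)
    return table


# Genotypic values for OMM1230 x OmyRGT1TUF, taken from the example output.
example = {(1, 1): 1.691, (1, 2): 2.303, (2, 1): 2.737, (2, 2): 1.872}

for alleles, (g, ne, e) in sorted(epistasis_table(example).items()):
    print(alleles, g, round(ne, 3), round(e, 3))
```

With unweighted means the four epistasis values sum to zero, which matches the symmetric pattern of values in the example.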