###############################################################################
#
# Van der Sluis, S., Posthuma, D., & Dolan, C.V.
# Dealing with genetically heterogeneous phenotypes:
# TATES, a rapid and powerful trait-based test based on the GATES
# procedure
#
# The TATES procedure is inspired by the GATES procedure described in:
# Li, M-X, Gui, H-S., Kwan, J.S.H., & Sham, P.C. (2011).
# GATES: A Rapid and Powerful Gene-Based Association Test
# Using Extended Simes Procedure, American Journal of Human Genetics,
# 88, 283-293.
#
# In case of questions, contact
# Sophie van der Sluis: s.vander.sluis@vu.nl
#
###############################################################################
NOTE THAT THE SOFTWARE PROVIDED HERE CONCERNS PRELIMINARY VERSIONS!
What does TATES do?
TATES (Trait-based Association Test that uses Extended Simes procedure) is a free, open-source tool that combines
the p-values obtained in standard genetic association analyses of multiple (correlated) phenotypes (i.e., dependent
variables) into one trait-based p-value per SNP, while correcting for the correlations between the
phenotypes.
TATES requires as input a file with pre-generated p-values and a file with information on the correlational structure
of the phenotypes (dependent variables).
The TATES procedure is inspired by the GATES procedure (Gene-based Association Test that uses Extended Simes
procedure; Li et al., 2011). A full description of the procedure underlying TATES is provided in
Van der Sluis, S., Posthuma, D., & Dolan, C.V. Dealing with genetically heterogeneous phenotypes:
TATES, a rapid and powerful trait-based test based on the GATES procedure.
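To make the idea of combining correlated p-values concrete, here is a minimal Python sketch of an extended Simes
combination in the style of GATES (Li et al., 2011). It is not the code in TATES_in_R.r or tates.f, and the function
names are hypothetical. It rests on two assumptions: the effective number of independent p-values is estimated from
the eigenvalues of a correlation matrix as m_e = m - sum(lambda_i - 1) over all eigenvalues lambda_i > 1, and the
phenotype correlation matrix is used directly in place of the correlation between p-values, which the published
procedure derives from it. Consult the paper and the distributed source for the exact computation.

```python
import numpy as np

def effective_number(corr):
    """Eigenvalue-based estimate of the number of independent tests.

    Assumption (after Li et al., 2011): m_e = m - sum(lambda_i - 1)
    over all eigenvalues lambda_i > 1 of the correlation matrix.
    """
    corr = np.atleast_2d(np.asarray(corr, dtype=float))
    eigvals = np.linalg.eigvalsh(corr)
    return corr.shape[0] - np.sum(eigvals[eigvals > 1.0] - 1.0)

def extended_simes(pvals, corr):
    """Combine the p-values of one SNP across phenotypes into one p-value.

    Simplification: `corr` (phenotype correlations) stands in for the
    correlations between the p-values.
    """
    pvals = np.asarray(pvals, dtype=float)
    corr = np.asarray(corr, dtype=float)
    order = np.argsort(pvals)        # indices sorted by ascending p-value
    m_e = effective_number(corr)     # effective number over all phenotypes
    candidates = []
    for j in range(1, len(pvals) + 1):
        top = order[:j]              # phenotypes with the j smallest p-values
        m_ej = effective_number(corr[np.ix_(top, top)])
        candidates.append(m_e * pvals[order[j - 1]] / m_ej)
    return min(candidates)
```

With an identity correlation matrix this sketch reduces to the ordinary Simes test; with perfectly correlated
phenotypes it returns the smallest of the input p-values.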
What’s in the .rar file?
In the compressed file TATES_online.rar, one finds the following files:
TATES_in_R.r: a script to run TATES in R
Example_cor: an example file containing a full correlation matrix for 12 variables (see below)
Example_pvals: an example file containing p-values for 12 variables and 100 SNPs (see below)
tates.exe: a FORTRAN executable that runs under DOS (see below)
tates.xxx: a FORTRAN executable that runs on Mac/Unix
defdims: a file used to define the dimensions of the input files for tates.exe
tates.f: the FORTRAN source code
TATES can be run in R or using a FORTRAN program.
For large data sets (i.e., genome-wide genetic information and many variables), we advise using the FORTRAN
programs, as these are faster than the R script. The resulting TATES p-values are, however, the same.
The time required to run TATES depends on the dimensions of your files (i.e., the number of SNPs and the
number of phenotypes) and, of course, on the characteristics of your computer. For example, on an ordinary desktop
computer (Intel(R) Core(TM)2 Duo CPU 2.99 GHz, RAM 2.94 GB, and 32-bit Windows XP Professional Version 2002),
the FORTRAN program takes less than 1 minute to calculate the TATES p-values for ~250,000 SNPs and
12 phenotypes. With 2,500,000 SNPs and 20 phenotypes, however, TATES can take up to 30
minutes.
The version currently online can handle up to 3,000,000 SNPs.
How to run TATES in R
One can use the R script TATES_in_R.r to run TATES in R.
Required:
1. A file with the full, symmetrical correlation matrix between the nvar variables
This file should NOT have a header.
This file should have dimensions nvar*nvar.
See file Example_cor for an example file for nvar=12.
2. A file with the p-values
This file should NOT have a header.
This file should have the following structure:
Column 1: Chromosome number
Column 2: SNP name / rs number
Column 3: the p-values for the association of each SNP with variable 1
Column 4: the p-values for the association of each SNP with variable 2
etc.
This file should thus have dimensions nsnp*(nvar+2)
Note that the order of the variables in this p-value file has to match
the order of the variables in the correlation matrix file!
See file Example_pvals for an example file for nvar=12 and nsnp=100.
When running TATES in R, the results file consists of the contents of the p-value file (here Example_pvals)
plus one extra column containing the TATES p-value per SNP.
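Before running TATES, it can help to confirm that the two input files agree with each other. The following Python
sketch checks the requirements listed above: a square, symmetric correlation matrix and a p-value file with nvar+2
columns. It assumes both files are whitespace-separated; the function names are hypothetical and are not part of
the TATES distribution.

```python
def read_table(path):
    """Read a whitespace-separated file without a header into a list of rows."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]

def check_tates_inputs(cor_path, pval_path, tol=1e-8):
    """Return (nvar, nsnp) if the two files are consistent, else raise ValueError."""
    # float() raises ValueError on non-numeric entries, e.g. a stray header line
    cor = [[float(x) for x in row] for row in read_table(cor_path)]
    nvar = len(cor)
    if any(len(row) != nvar for row in cor):
        raise ValueError("correlation matrix is not nvar x nvar")
    for i in range(nvar):
        for j in range(i):
            if abs(cor[i][j] - cor[j][i]) > tol:
                raise ValueError(f"matrix not symmetric at ({i + 1}, {j + 1})")
    rows = read_table(pval_path)
    for n, row in enumerate(rows, start=1):
        # columns 1 and 2 hold chromosome and SNP name, the rest are p-values
        if len(row) != nvar + 2:
            raise ValueError(f"line {n}: expected {nvar + 2} columns, found {len(row)}")
        for value in row[2:]:
            if not 0.0 <= float(value) <= 1.0:
                raise ValueError(f"line {n}: p-value {value} outside [0, 1]")
    return nvar, len(rows)
```

For the example files, a call such as check_tates_inputs("Example_cor", "Example_pvals") would be expected to
return (12, 100). Note that the check cannot detect a mismatch in the order of the variables between the two files.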
How to run TATES in FORTRAN
One can use the tates.exe file to run TATES (tates.f contains the source code).
The following files are required to run TATES in FORTRAN:
1. A file with a full, symmetrical correlation matrix between the nvar variables
(see above for requirements).
2. A file with p-values (see above for requirements).
3. The file defdims, in which one needs to define the dimensions of the input files
and enter the following information:
number_of_variables number_of_SNPs
is_the_correlation_matrix_full_or_lower
name_file_containing_the_p-values
name_file_containing_the_correlation_matrix
name_results_file
The correlation matrix that you read in can be either full or lower triangular
(i.e., free format, including the diagonal!).
For the example data provided (Example_cor and Example_pvals), the settings
in the defdims file are thus:
12 100
full
Example_pvals
Example_cor
results_Example_TATES_F
Note that in the version currently online, no line in the defdims file may exceed 25 characters!
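The five settings lines and the 25-character limit can be generated and checked programmatically. The Python
sketch below is one way to do so; the function names are hypothetical, and the keyword "lower" for a lower
triangular matrix is an assumption based on the placeholder name in the template above (only "full" appears in
the example).

```python
MAX_LINE_LENGTH = 25  # limit stated for the version currently online

def defdims_lines(nvar, nsnp, matrix_type, pval_file, cor_file, results_file):
    """Build the five defdims lines and enforce the 25-character limit."""
    # "lower" is an assumed keyword; only "full" is shown in the example
    if matrix_type not in ("full", "lower"):
        raise ValueError("matrix_type must be 'full' or 'lower'")
    lines = [f"{nvar} {nsnp}", matrix_type, pval_file, cor_file, results_file]
    for line in lines:
        if len(line) > MAX_LINE_LENGTH:
            raise ValueError(
                f"defdims line too long ({len(line)} > {MAX_LINE_LENGTH}): {line}"
            )
    return lines

def write_defdims(path, **settings):
    """Write the settings to `path`, one setting per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(defdims_lines(**settings)) + "\n")
```

Calling defdims_lines(12, 100, "full", "Example_pvals", "Example_cor", "results_Example_TATES_F") reproduces the
example settings. Because the limit applies per line, long file names or paths are rejected before TATES is run.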
When running TATES in FORTRAN, the results file consists of three columns:
chromosome number, SNP ID, and the TATES p-value per SNP.
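As an illustration of working with the three-column results file, the Python sketch below lists the SNPs whose
TATES p-value falls below a threshold. It assumes the file is whitespace-separated; lines that do not parse (for
instance a header, should the file contain one) are skipped. The function name and the default threshold are
illustrative choices, not part of TATES.

```python
def significant_snps(results_path, alpha=5e-8):
    """Return (chromosome, snp_id, p_value) tuples with p_value < alpha,
    sorted by ascending p-value."""
    hits = []
    with open(results_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or unexpected lines
            chrom, snp, p_text = fields
            # precaution: FORTRAN may write exponents with D instead of E
            p_text = p_text.replace("D", "E").replace("d", "e")
            try:
                p_value = float(p_text)
            except ValueError:
                continue  # not a number, e.g. a header line
            if p_value < alpha:
                hits.append((chrom, snp, p_value))
    return sorted(hits, key=lambda hit: hit[2])
```

For the example run, significant_snps("results_Example_TATES_F", alpha=0.05) would list the SNPs that are
significant at the 0.05 level.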
To run TATES using tates.exe or tates.xxx, do the following:
1) Put all the files (i.e., the file containing the correlations, the file containing the p-values,
the defdims file, and tates.exe or tates.xxx) in one folder.
2) Adjust the settings in the defdims file to fit your data.
3) Open the DOS command prompt (or a terminal on Mac/Unix) and go to the directory where you stored the files.
4) Type tates.exe or tates.xxx (depending on whether you run TATES on a DOS
computer or on Mac/Unix) and press ENTER.
How long TATES runs depends on how fast your computer is.
When TATES has finished, you will see "end tates ok" in your DOS prompt.
In the folder, you will find the TATES results file and a tates.log file.