###############################################################################
#
# TATES
#
# Van der Sluis, S., Posthuma, D., & Dolan, C.V.
# Dealing with genetically heterogeneous phenotypes:
# TATES, a rapid and powerful trait-based test based on the GATES
# procedure
#
# The TATES procedure is inspired by the GATES procedure described in:
# Li, M-X., Gui, H-S., Kwan, J.S.H., & Sham, P.C. (2011).
# GATES: A Rapid and Powerful Gene-Based Association Test
# Using Extended Simes Procedure, American Journal of Human Genetics,
# 88, 283-293.
#
# In case of questions, contact
# Sophie van der Sluis: s.vander.sluis@vu.nl
#
###############################################################################

NOTE THAT THE SOFTWARE PROVIDED HERE CONCERNS PRELIMINARY VERSIONS!

What does TATES do?

TATES (Trait-based Association Test that uses Extended Simes procedure) is a free, open-source tool that combines the p-values obtained in standard genetic association analyses of multiple (correlated) phenotypes (i.e., dependent variables) into one trait-based p-value per SNP, while correcting for the correlations between the phenotypes. TATES requires as input a file with pre-generated p-values and a file with information on the correlational structure of the phenotypes (dependent variables).

The TATES procedure is inspired by the GATES procedure (Gene-based Association Test that uses Extended Simes procedure; Li et al., 2011). A full description of the procedure underlying TATES is provided in Van der Sluis, S., Posthuma, D., & Dolan, C.V. Dealing with genetically heterogeneous phenotypes: TATES, a rapid and powerful trait-based test based on the GATES procedure.

What’s in the .rar file?

In the compressed file TATES_online.rar, one finds the following files:

TATES_in_R.r: a script to run TATES in R
Example_cor: an example file containing a full correlation matrix for 12 variables (see below)
Example_pvals: an example file containing p-values for 12 variables and 100 SNPs (see below)
tates.exe: a FORTRAN executable that runs under DOS (see below)
tates.xxx: a FORTRAN executable that runs on Mac/Unix
defdims: a file used to define the dimensions of the input files for tates.exe
tates.f: the FORTRAN source code

TATES can be run in R or using a FORTRAN program. For large data sets (i.e., genome-wide genetic information and many variables), we advise using the FORTRAN programs, as these are faster than R. The resulting TATES p-values are the same in both cases.

The time required to run TATES depends on the dimensions of your files (i.e., the number of SNPs and the number of phenotypes) and on the characteristics of your computer. For example, on an ordinary desktop computer (Intel(R) Core(TM)2 Duo CPU 2.99 GHz, RAM 2.94 GB, and 32-bit Windows XP Professional Version 2002), the FORTRAN program takes less than 1 minute to calculate the TATES p-values for ~250,000 SNPs and 12 phenotypes. With 2,500,000 SNPs and 20 phenotypes, however, TATES can take up to 30 minutes. The version that is currently online can handle up to 3,000,000 SNPs.

How to run TATES in R

One can use the R script TATES_in_R.r to run TATES in R.

Required:

1. A file with the full, symmetrical correlation matrix between the nvar variables
   This file should NOT have a header.
   This file should have dimensions nvar*nvar.
   See file Example_cor for an example file for nvar=12.

2. A file with the p-values
   This file should NOT have a header.
   This file should have the following structure:
   Column 1: chromosome number
   Column 2: SNP name / rs number
   Column 3: the p-values for the association of each SNP with variable 1
   Column 4: the p-values for the association of each SNP with variable 2
   etc.
   This file should thus have dimensions nsnp*(nvar+2).
   Note that the order of the variables in this p-value file has to match the order of the variables in the correlation matrix file!
   See file Example_pvals for an example file for nvar=12 and nsnp=100.

When running TATES in R, the results file consists of the contents of the p-value file (e.g., Example_pvals) plus one extra column containing the TATES p-value per SNP.

How to run TATES in FORTRAN

One can use the tates.exe file to run TATES (tates.f contains the source code). The following files are required to run TATES in FORTRAN:

1. A file with a full, symmetrical correlation matrix between the nvar variables (see above for requirements).

2. A file with p-values (see above for requirements).

3. The file defdims, in which one needs to define the dimensions of the input files and enter the following information:

   number_of_variables
   number_of_SNPs
   is_the_correlation_matrix_full_or_lower
   name_file_containing_the_p-values
   name_file_containing_the_correlation_matrix
   name_results_file

The correlation matrix that you read in can be either full or lower (i.e., free format, including the diagonal!).

For the example data provided (Example_cor and Example_pvals), the settings in the defdims file are thus:

   12
   100
   full
   Example_pvals
   Example_cor
   results_Example_TATES_F

Note that for the version that is currently online, the number of characters per line in the defdims file cannot exceed 25!

When running TATES in FORTRAN, the results file consists of three columns: chromosome number, SNP ID, and the TATES p-value per SNP.

To run TATES using tates.exe or tates.xxx, do the following:

1) Put all the files (i.e., the file with the correlations, the file with the p-values, the defdims file, and tates.exe or tates.xxx) in one folder.
2) Adjust the settings in the defdims file to fit your data.
3) Open the DOS command prompt and go to the directory where you stored the files.
4) Type tates.exe or tates.xxx (depending on whether you run TATES on a DOS computer or on Mac/Unix) and press ENTER.

How long TATES runs depends on how fast your computer is. When TATES has finished, you will see "end tates ok" in your DOS prompt. In the folder, you will then find the TATES results file and a tates.log file.
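
Because the defdims file is strict about its layout, it may help to generate and check it with a small script before starting a long run. The following Python sketch is not part of the TATES distribution; the function names build_defdims and count_bad_rows are hypothetical helpers introduced here for illustration. It assumes that defdims holds one entry per line in the order listed above, that the 25-character limit applies to each line excluding the line ending, and that the p-value file is whitespace-separated (this README does not state the delimiter).

```python
MAX_LINE_LEN = 25  # per-line limit stated in this README for the online version


def build_defdims(nvar, nsnp, matrix_type, pvals_file, cor_file, results_file):
    """Return the six defdims entries in the order listed in this README.

    Assumption: one entry per line, limit checked without the line ending.
    """
    if matrix_type not in ("full", "lower"):
        raise ValueError("matrix_type must be 'full' or 'lower'")
    lines = [str(nvar), str(nsnp), matrix_type, pvals_file, cor_file, results_file]
    for line in lines:
        if len(line) > MAX_LINE_LEN:
            raise ValueError(
                f"defdims line exceeds {MAX_LINE_LEN} characters: {line!r}"
            )
    return lines


def count_bad_rows(pvals_text, nvar):
    """Count non-empty rows that do not have nvar + 2 fields.

    Assumption: fields are separated by whitespace.
    """
    expected = nvar + 2  # chromosome, SNP name, then one p-value per variable
    return sum(
        1
        for row in pvals_text.splitlines()
        if row.strip() and len(row.split()) != expected
    )


if __name__ == "__main__":
    # Settings for the example data shipped with TATES.
    entries = build_defdims(
        12, 100, "full", "Example_pvals", "Example_cor", "results_Example_TATES_F"
    )
    print("\n".join(entries))
```

Saving the printed lines to a file named defdims in the same folder as the executable reproduces the example settings shown above. A non-zero result from count_bad_rows would suggest that the p-value file does not have the expected nsnp*(nvar+2) layout.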