FASTMAP User Manual
October 16, 2009

Contents

1. About FastMap
2. FastMap Installation
   2.1 Requirements
   2.2 How to launch FastMap
3 Interface
   3.1 Main interface
   3.2 File menu (Alt+F)
   3.3 Association menu (Alt+A)
   3.4 Tools menu (Alt+L)
   3.5 Options window
4 Input Files
   4.1 Gene expression file
   4.2 SNPs data file
   4.3 HapMap SNP data file
5 How to calculate an association
   5.1 Set the data option
   5.2 Load the files
   5.3 Plot a chart
   5.4 Save a chart
   5.5 Zoom and UCSC Browser
   5.6 Write the association
   5.7 Write LocusZoom-ready files
   5.8 Output files
6 Possible error messages

1. About FastMap

Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression.
The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the numbers of transcripts and markers are on the order of 10^5 and 10^5 to 10^6, respectively. We propose a new method, FastMap, for fast and efficient eQTL mapping in homozygous inbred populations with binary allele calls. FastMap exploits the discrete nature and structure of the measured single nucleotide polymorphisms (SNPs). In particular, SNPs are organized into a Hamming distance-based tree that minimizes the number of arithmetic operations required to calculate the association of a SNP by making use of the association of its parent SNP in the tree. FastMap's tree can be used to perform both single marker mapping and haplotype association mapping over an m-SNP window. These performance enhancements also permit permutation-based significance testing.

2. FastMap Installation

2.1 Requirements

FastMap requires a computer with 2 GB of RAM and a Java Runtime Environment (JRE) 6. The JRE 6 can be downloaded from this website: http://java.sun.com/javase/downloads/index.jsp

2.2 How to launch FastMap

FastMap does not need to be installed. Unzip the archive FastMap.zip anywhere. The unzipped directory contains three executable files: FastMap.exe, FastMap.bat and FastMap.alt.bat. To launch FastMap, double-click on FastMap.exe. If FastMap.exe does not work, double-click on FastMap.bat. If FastMap.bat does not work, edit FastMap.alt.bat and set the variable JAVA_HOME to the path of the directory where the Java 6 JRE is installed.

3 Interface

3.1 Main interface

The menus allow you to load and save files, to calculate associations (linear model, ANOVA) and to set up options.
The task buttons allow you to execute the main actions (loading files, calculating associations, etc.) without having to go through the menus. Once a gene expression file is loaded, the list of the genes present in the file is displayed in the gene list area. Double-clicking (or right-clicking) on a gene name calculates the association for this gene. The progress bar displays the progress of a task (loading, calculating, etc.). The cancel task button allows you to stop the execution of a task. The result of the calculation is displayed in the chart.

3.2 File menu (Alt+F)

The File menu allows you to load and save files. The gene expression file has to be loaded before any other file. Once the gene expression file is loaded, a SNP data file or a directory with HapMap SNP data files can be loaded.

3.3 Association menu (Alt+A)

The Association menu allows you to calculate the association (Linear model, ANOVA). The Plot Association menu item plots the association in the chart area.

After plotting the association, the chart can be written to a file using the Save Chart menu item.

In the chart area, you can zoom in on an area using the left mouse button, the right-click menu or the mouse wheel. If the zoomed area is within a chromosome, the UCSC Browser menu item connects to the UCSC website and displays the selected region in the UCSC Browser.

The association can also be written to a file using the Write Association menu item.

3.4 Tools menu (Alt+L)

The Tools menu allows you to set up options and see the SNP similarity matrix. The SNP similarity window shows the similarity between the strains or the individuals in the SNP data file.
The strains (or individuals) can be divided into 2 groups for mean or median subtraction by checking their names, clicking the "OK" button in the similarity window, and then redrawing the chart (by clicking the "Plot Association" button on the main toolbar). The Options menu item opens the options window, where you can set options.

3.5 Options window

The options window contains 3 tabs: Output, Data Processing and Data Significance.

The Output tab sets options for the data output. In the General box, you can set the default output directory where the results will be written, and whether the chart with the association result will be written to a file. To set the output directory, check the Output Dir checkbox, then enter the directory path into the text field or use the Browse button to choose the directory. If you want the chart to be written, check the Write chart checkbox, then choose the file type.

In the Summary Output box and the Detailed Output box, you can choose the output files that will be written. By default, only the file with the maximum statistic for each calculated gene is written. The file with the statistics above a certain threshold can be written by checking the All Statistics Above Threshold checkbox and then choosing the threshold. If the Write Associations checkbox is checked, the file with all the association values is written. If the Write Threshold checkbox is checked, the file with all the threshold values is written. If the Write Permutations checkbox is checked, the file with all the permutation values is written.

The Data Processing tab sets options for SNP data processing and filtering. The SNPs box allows you to set the data type for the SNP file. The data can be mouse data (20 chromosome pairs) or human data (23 chromosome pairs), homozygous or heterozygous.
If the data is homozygous, it can be read SNP by SNP or as a window of several SNPs (for example, three by three).

The Statistic Type box allows you to choose the association type. For homozygous data, if the SNP window size is 1, the association type can only be Linear Model (correlation); if the SNP window size is greater than 1, the association type is ANOVA (-log(p-value)). For heterozygous data, you have to choose between Linear Model (-log(p-value)) and ANOVA (-log(p-value)).

The SNPs Filtering box allows you to set the minor allele frequency. The default value is 5%, which means that each line where the number of strains with the minor allele is less than 5% of the total number of strains is filtered out.

The Subtracting method box sets the subtracting method used if the strains are divided into 2 groups using the SNPs similarity window. If Mean subtract is selected, for each group the mean value of the gene expression data is subtracted from the gene expression value of each strain. If Median subtract is selected, the median value of the gene expression data is subtracted instead.

The Data Significance tab sets options for the permutations and the thresholds.

The Permutations box allows you to enable or disable the permutations. If the option to write values above a certain threshold is set, the permutations have to be enabled. In the Random seed box, the seed for the permutations can be set, or a random seed can be used.

When associations are calculated and written for several genes, FastMap does not always calculate all the permutations for a gene: if it is certain that the p-value will be greater than a defined value (p0), the calculation is stopped because there are no significant values. The minimum and maximum permutation numbers define the number of permutations to run and when to check the p-value.
The values allowed for the permutation numbers are 10, 50, 100, 500, 1000 and 10000. For example, if the minimum permutations number is 50 and the maximum permutations number is 1000, FastMap first runs 50 permutations and checks the p-value. If it is certain that there are no significant values, the calculation is stopped; if not, FastMap continues up to 100 permutations and checks the p-value again. If needed, it checks again at 500 permutations, and it stops at 1000 permutations.

The p0, the Minimum permutations and the Maximum permutations can be defined in this tab, with two constraints: p0 has to be greater than or equal to 100 minus the output threshold value (p0 >= 100 - thr, where thr is the output threshold value), and the maximum permutations cannot be smaller than the minimum permutations.

The Thresholds box allows you to set up to 5 thresholds (value and color).

The threshold value is calculated by running permutations, taking the maximum value for each permutation and sorting these maximum values. A value above the 99% threshold is above 99% of the permutation maximum values.

You can plot several permutation-based significance thresholds in different colors. To use this feature, set several p-value thresholds (e.g. 0.95, 0.9 and 0.5), select a color for each one (e.g. red, orange, yellow) and redraw the QTL plot. Provided that permutations are turned on, lines will appear at the selected significance levels.

4 Input Files

4.1 Gene expression file

The gene expression file must be loaded first. It defines the set of strains (or individuals) to work with. The SNPs data file can contain more strains, but all the strains in the gene expression file must be present in the SNPs data file.
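The strain requirement above can be checked before the files are loaded into FastMap. The following is a minimal Python sketch and not part of FastMap: the function name `missing_strains` and the example strain names are hypothetical, and the sketch assumes the strain names have already been read from the header line of each file.

```python
def missing_strains(expression_strains, snp_strains):
    """Return the gene expression strains that the SNP data lacks.

    Both arguments are iterables of strain names (assumed to be already
    extracted from the header lines). The result keeps the order of the
    gene expression file so it is easy to compare against that header.
    """
    available = set(snp_strains)
    return [name for name in expression_strains if name not in available]


# Hypothetical strain names, for illustration only.
expression = ["A/J", "C57BL/6J", "DBA/2J"]
snps = ["DBA/2J", "A/J", "BALB/cJ", "C57BL/6J"]

# The SNP data may contain extra strains (here BALB/cJ); that is allowed.
print(missing_strains(expression, snps))              # []
print(missing_strains(expression + ["129S1"], snps))  # ['129S1']
```

An empty list means the SNP data covers every strain of the gene expression file; any name in the list would need to be added to the SNP data or removed from the gene expression file.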
The first line of the file has to be the header and contains the strain names. The first two columns are the microarray probe ID and an annotation; these two columns are displayed in the gene list area of FastMap.

4.2 SNPs data file

After loading the gene expression file, a SNPs data file can be loaded. The SNPs data file must contain all the strains that are present in the gene expression file (it can contain more strains).

If the data is homozygous, each SNP has to be coded with 0 (no SNP) or 1 (SNP). If the data is heterozygous, each SNP is coded with 0 (no SNP on either chromosome), 1 (SNP on one chromosome) or 2 (SNP on both chromosomes).

The first line of the file is the header and contains the strain names. The first column is the chromosome number of the SNP and the second column is the position of the SNP.

4.3 HapMap SNP data file

A directory with SNP data files from the HapMap project can be loaded. The directory has to contain only HapMap files, and the file names must start with "genotypes_chr(number)_", for example:

- genotypes_chr1_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr2_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr3_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr4_CEU_phase3.2_nr.b36_fwd.txt

4.4 Transposed PLINK data file

A transposed PLINK data set contains two text files: one (TPED) containing SNP and genotype information, where one row is a SNP, and one (TFAM) containing individual and family information, where one row is an individual. You can read more about PLINK file formats here: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml

This format can be convenient to work with when there are many more SNPs than individuals (i.e. WGAS data). In this case, the TPED file will be very long (as opposed to the PED file being very wide).
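For orientation, the sketch below splits one TPED row into its fields. It follows the column layout given in the PLINK documentation (chromosome, SNP identifier, genetic distance, base-pair position, then two alleles per individual). It is not FastMap's own parser, and the function name `parse_tped_line` and the sample row are made up for illustration.

```python
def parse_tped_line(line):
    """Split one whitespace-delimited TPED row into SNP info and genotypes.

    Layout assumed from the PLINK documentation: four leading columns
    (chromosome, SNP identifier, genetic distance, base-pair position)
    followed by two allele columns per individual.
    """
    fields = line.split()
    if len(fields) < 4 or (len(fields) - 4) % 2 != 0:
        raise ValueError("expected 4 columns plus two alleles per individual")
    chrom, snp_id, distance, position = fields[:4]
    alleles = fields[4:]
    # Pair consecutive allele columns: one (allele1, allele2) per individual.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return {
        "chromosome": chrom,
        "snp_id": snp_id,
        "genetic_distance": float(distance),
        "position": int(position),
        "genotypes": genotypes,
    }


# Made-up row with three individuals.
row = parse_tped_line("1 snp1 0 5000650 A A A C C C")
print(row["position"])   # 5000650
print(row["genotypes"])  # [('A', 'A'), ('A', 'C'), ('C', 'C')]
```

In the PLINK transposed format, the genotype pairs of a TPED row follow the order of the individuals in the TFAM file, so the n-th pair belongs to the n-th TFAM row.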
NOTE: Please make sure all the strain names in the gene expression file are present in the SNP or TFAM files.

5 How to calculate an association

a. Set the data option

First set the options about the data (see Options window): you have to define whether the data is mouse data or human data, whether it is homozygous or heterozygous, the window size, the type of association wanted and the SNP filtering.

b. Load the files

After setting the data processing options, first load the gene expression file, then a SNP data file or a directory with HapMap files (see Input Files and File menu (Alt+F)).

c. Plot a chart

After loading your gene expression and SNP files, you can calculate the association and plot a chart with the result by using the Association menu or the task buttons. The chart can also be plotted by double-clicking on a gene name or by right-clicking on it. Before plotting the association, you can set the options about the data significance (permutations and thresholds, see Options window).

d. Save a chart

The plotted chart can be saved to a file (jpg or png, see Options window) by using the Association menu or the task buttons, or by right-clicking directly on the chart and using the pop-up menu.

e. Zoom and UCSC Browser

You can zoom in or out on the chart by selecting an area with the left mouse button, by using the pop-up menu opened with the right mouse button, or by using the mouse wheel. If the zoomed area is within one chromosome, the data can be sent to the UCSC website and viewed in the UCSC Browser by using the Association menu or the task button.

f. Write the association

Instead of plotting one gene, you can calculate the association for several genes and write the results to files. First you have to set the options about the output files and the data significance (see Options window).

g. Write LocusZoom-ready files

FastMap can write LocusZoom-ready files so that results can be sent directly to LocusZoom, which plots genes and SNP annotation in a specified region of interest.

h. Output files

By default, only the file with the maximum statistic for each gene is written. You can decide to write more files (see Options window), such as the eQTL above a threshold. You can also write a file with all the permutation maximum values, a file with all the threshold values, and a file with all the association values for the gene. This last file contains the association for each SNP; each line corresponds to a SNP location in the SNP.locations.txt file.

6 Possible Error messages

Message: Some strain names in the gene expression file are not present in the SNP or TFAM file [list of strain names]
Solution: Please make sure that all strain names from the gene expression file are present in the SNP file (or in the TFAM file, in the case of Transposed PLINK data).

Message: No gene have been selected from the list. Please select one or more genes before plotting associations.
Solution: Please select at least one gene from the list of genes on the left of the application main panel and then click the "Plot Association" button.

Message: Error: [SOME_PATH] is not a valid directory path.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.

Message: Out of memory, too much data to process
Solution: Your machine does not have enough random access memory (RAM) to complete the current task. Please split your data into smaller parts or run it on a more powerful machine.

Message: Error while trying to write [ASSOCIATION_TYPE]. Please verify that you have writing rights for selected directory.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.

Message: Error while trying to calculate [ASSOCIATION_TYPE]. Please make sure that the input data is in correct format.
Solution: Please make sure your input data files have the correct format and are readable and accessible.

Message: Gene expression and SNP data must be read in before plotting associations.
Solution: Please select the gene expression and SNP/TPED files and allow the application to finish processing those files.

Message: Error while trying to load the gene expression file. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.

Message: Error while trying to load the SNP File. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.

Message: Error while trying to load the HapMap SNP File. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.

Message: Error while trying to load the Transposed PLINK Files. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory. In order to load PLINK data, you need to have both the TPED and the TFAM file in the same directory.

Message: A path must be entered for the output directory.
Solution: Please specify the output directory before saving any data.

Message: Error while trying to send data to UCSC Browser. Please make sure you are connected to the Internet.
Solution: Please make sure you have an active Internet connection. This error can also occur when the UCSC Browser is inaccessible due to maintenance on the server side; in that case, try again later.

Message: Error while trying to save chromosome data file. Please verify that you have writing rights for selected directory.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read and write files in the selected directory.