October 16, 2009
FASTMAP
USER
MANUAL
Contents
1. About FastMap
2. FastMap Installation
2.1 Requirements
2.2 How to launch FastMap
3 Interface
3.1 Main interface
3.2 File menu (Alt+F)
3.3 Association menu (Alt+A)
3.4 Tools menu (Alt+L)
3.5 Options window
4 Input Files
4.1 Gene expression file
4.2 SNPs data file
4.3 HapMap SNP data file
5 How to calculate an association
5.1 Set the data option
5.2 Load the files
5.3 Plot a chart
5.4 Save a chart
5.5 Zoom and UCSC Browser
5.6 Write the association
5.7 Output files
5.8 Write LocusZoom-ready files
6 Possible error messages
1. About FastMap
Gene expression Quantitative Trait Locus (eQTL) mapping measures the association
between transcript expression and genotype in order to find genomic locations likely to
regulate transcript expression. The availability of both gene expression and high-density
genotype data has improved our ability to perform eQTL mapping in inbred mouse and other
homozygous populations. However, existing eQTL mapping software does not scale well when
the numbers of transcripts and markers are on the order of 10^5 and 10^5 to 10^6, respectively.
We propose a new method, FastMap, for fast and efficient eQTL mapping in homozygous
inbred populations with binary allele calls. FastMap exploits the discrete nature and structure
of the measured single nucleotide polymorphisms (SNPs). In particular, SNPs are organized into
a Hamming distance-based tree that minimizes the number of arithmetic operations required
to calculate the association of a SNP by making use of the association of its parent SNP in the
tree. FastMap's tree can be used to perform both single marker mapping and haplotype
association mapping over an m-SNP window. These performance enhancements also permit
permutation-based significance testing.
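The idea of reusing a parent SNP's result can be illustrated with a small sketch. This is not FastMap's actual implementation (the manual does not describe it in detail); it only assumes binary allele calls (0/1) and shows that the quantities needed for a correlation can be updated by visiting just the strains where a child SNP differs from its parent. All function names are hypothetical.

```python
from math import sqrt


def differing_strains(parent, child):
    """Indices of the strains at which two binary SNP vectors differ."""
    return [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]


def allele_sum(snp, expr):
    """Full pass: sum of expression values over strains carrying allele 1."""
    return sum(e for s, e in zip(snp, expr) if s == 1)


def update_from_parent(parent_sum, parent_count, child, expr, changed):
    """Derive the child's sum and allele count from the parent's values.

    Only the strains listed in `changed` are visited, so the cost is the
    Hamming distance between parent and child, not the number of strains.
    """
    total, count = parent_sum, parent_count
    for i in changed:
        if child[i] == 1:      # allele flipped 0 -> 1 relative to the parent
            total += expr[i]
            count += 1
        else:                  # allele flipped 1 -> 0 relative to the parent
            total -= expr[i]
            count -= 1
    return total, count


def correlation(total, count, expr):
    """Pearson correlation between a binary SNP and expression, from sums."""
    n = len(expr)
    sum_y = sum(expr)
    sum_y2 = sum(e * e for e in expr)
    numerator = n * total - count * sum_y
    denominator = sqrt(count * (n - count) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


expr = [1.0, 2.0, 3.0, 4.0]
parent = [0, 0, 1, 1]
child = [0, 1, 1, 1]

parent_sum, parent_count = allele_sum(parent, expr), sum(parent)
changed = differing_strains(parent, child)  # Hamming distance is len(changed)
child_sum, child_count = update_from_parent(
    parent_sum, parent_count, child, expr, changed
)
```

Here the child differs from its parent at a single strain, so one addition replaces a full pass over the expression vector.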
2. FastMap Installation
2.1 Requirements
A computer with 2 GB of RAM is needed to use FastMap.
A Java Runtime Environment (JRE) 6 is required to run FastMap. JRE 6 can be
downloaded from this website: http://java.sun.com/javase/downloads/index.jsp
2.2 How to launch FastMap
FastMap does not need to be installed. Unzip the archive FastMap.zip anywhere. The
unzipped directory contains three executable files: FastMap.exe, FastMap.bat and
FastMap.alt.bat. To launch FastMap, double-click FastMap.exe.
If FastMap.exe does not work, double-click FastMap.bat.
If FastMap.bat does not work, edit FastMap.alt.bat and set the variable JAVA_HOME to the
path of the directory where the Java 6 JRE is installed.
3 Interface
3.1 Main interface
The menus allow you to load and save files, to calculate associations (linear model,
ANOVA) and to set up options.
The task buttons allow you to execute the main actions (loading files, calculating
associations, etc.) without having to go through the menus.
Once a gene expression file is loaded, the list of the genes present in the file is
displayed in the gene list area. Double-clicking (or right-clicking) a gene name
calculates the association for this gene.
The progress bar displays the progress of a task (loading, calculating, etc.).
The cancel task button allows you to stop the execution of a task.
The result of the calculation is displayed in the chart.
3.2 File menu (Alt+F)
The File menu allows you to load and save files. The gene expression file has to be loaded
before any other file.
Once the gene expression file is loaded, a SNP data file or a directory with HapMap SNP
data files can be loaded.
3.3 Association menu (Alt+A)
The Association menu allows you to calculate the association (Linear model, ANOVA).
The Plot Association menu item will plot the association on the chart area.
After plotting the association, the chart can be written into a file using the Save Chart
menu item.
In the chart area, you can zoom in on a region using the left mouse button, the right-click menu or
the mouse wheel. If the zoomed area is within a chromosome, the UCSC Browser menu item
will connect to the UCSC website and display the selected region in the UCSC Browser.
The association can also be written into a file using the Write Association menu item.
3.4 Tools menu (Alt+L)
The Tools menu allows you to set up options and see the SNP similarity matrix.
The SNP similarity window will show the similarity between the strains or the individuals in
the SNP data file.
The strains (or individuals) can be divided into 2 groups for mean or median subtracting by
checking their names in the similarity window, clicking the “OK” button and then redrawing the
chart (by clicking the “Plot Association” button on the main toolbar).
The Options menu item opens the options window and allows you to set options.
3.5 Options window
The options window contains 3 tabs: Output, Data Processing and Data Significance.
The Output tab sets options for the data output.
In the General box, you can set the default output directory where the results will be
written, and you can also set whether the chart with the association result will be written to a
file. To set the output directory, check the Output Dir checkbox, then enter the directory
path into the text field or use the Browse button to choose the directory. If you want the chart
to be written, check the Write chart checkbox, then choose the file type.
In the Summary Output box and the Detailed Output box, you can choose the output files
that will be written. By default, only the file with the maximum statistic for each calculated gene
is written. To write the file with the statistics above a certain threshold, check the All
Statistics Above Threshold checkbox, then choose the threshold. If the Write
Associations checkbox is checked, the file with all the association values is written. If the Write
Threshold checkbox is checked, the file with all the threshold values is written. The
file with all the permutation values is written if the Write Permutations checkbox is
checked.
The Data Processing tab sets options for SNP data processing and filtering.
The SNPs box allows you to set the data type for the SNP file. The data can be mouse data
(20 chromosome pairs) or human data (23 chromosome pairs), homozygous or heterozygous. If
the data is homozygous, it can be read SNP by SNP or as a window of
several SNPs (for example, three by three).
The Statistic Type box allows you to choose the association type. For homozygous data, if
the SNP window size is 1, the association type can only be Linear Model (correlation); if the SNP
window size is greater than 1, the association type is ANOVA (-log(p-value)). For
heterozygous data, you have to choose between Linear Model (-log(p-value)) and ANOVA (-log(p-value)).
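The rules above can be summarized as a small decision function. This is only a sketch of the logic as described in this section; the function name and its arguments are hypothetical and do not correspond to any FastMap API.

```python
def statistic_type(homozygous, window_size=1, heterozygous_choice=None):
    """Return the association type implied by the Statistic Type rules."""
    if window_size < 1:
        raise ValueError("the SNP window size must be at least 1")
    if homozygous:
        # Homozygous data: the window size alone decides the statistic.
        if window_size == 1:
            return "Linear Model (correlation)"
        return "ANOVA (-log(p-value))"
    # Heterozygous data: the user picks one of the two models.
    if heterozygous_choice not in ("Linear Model", "ANOVA"):
        raise ValueError("choose 'Linear Model' or 'ANOVA' for heterozygous data")
    return f"{heterozygous_choice} (-log(p-value))"
```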
The SNPs Filtering box allows you to set the minor allele frequency. The default value is 5%,
which means that each line where the number of strains with the minor allele is less than 5% of
the total number of strains will be filtered out.
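As an illustration of this filtering rule, the following sketch applies it to homozygous (0/1) calls, with one list of calls per SNP line. It is not FastMap's code; the function names are hypothetical, and integer arithmetic is used so that a SNP at exactly 5% is kept, as the rule above states.

```python
def minor_allele_count(calls):
    """Number of strains carrying the less frequent of the two alleles."""
    ones = sum(1 for c in calls if c == 1)
    return min(ones, len(calls) - ones)


def passes_maf_filter(calls, min_percent=5):
    """Keep the SNP unless its minor allele count is below min_percent %."""
    return minor_allele_count(calls) * 100 >= min_percent * len(calls)


def filter_snps(rows, min_percent=5):
    """Return only the SNP lines that pass the minor allele filter."""
    return [calls for calls in rows if passes_maf_filter(calls, min_percent)]


common = [0, 1] * 10            # 10 of 20 strains carry the minor allele
borderline = [0] * 19 + [1]     # 1 of 20 strains: exactly 5%, kept
rare = [0] * 39 + [1]           # 1 of 40 strains: 2.5%, filtered out
kept = filter_snps([common, borderline, rare])
```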
The Subtracting method box sets the subtracting method used if the strains are divided into 2
groups using the SNP similarity window. If Mean subtract is selected, for each group the mean
value of the gene expression data is subtracted from the gene expression value of each
strain. If Median subtract is selected, the median value of the gene expression data is
subtracted instead.
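A minimal sketch of the subtraction described above, assuming one expression value per strain and a parallel list of group labels (both names are hypothetical, not FastMap identifiers):

```python
from statistics import mean, median


def subtract_by_group(expression, groups, method="mean"):
    """Center each strain's expression value on its own group's mean or median."""
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    center = mean if method == "mean" else median
    # One center per group, computed from that group's strains only.
    centers = {
        g: center([e for e, label in zip(expression, groups) if label == g])
        for g in set(groups)
    }
    return [e - centers[g] for e, g in zip(expression, groups)]


expression = [1.0, 3.0, 10.0, 20.0, 60.0]
groups = [1, 1, 2, 2, 2]
mean_centered = subtract_by_group(expression, groups, "mean")
median_centered = subtract_by_group(expression, groups, "median")
```

In this example the two methods differ only for the second group, where the outlying value 60.0 pulls the mean (30.0) away from the median (20.0).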
The Data Significance tab sets options for the permutations and the thresholds.
The Permutations box allows you to enable or disable the permutations. If the option to write
values above a certain threshold is set, the permutations have to be enabled. In the Random
seed box, you can set the seed for the permutations or use a random seed.
When associations are calculated and written for several genes, FastMap does not always
compute all the permutations for a gene. If it is certain that the p-value will be greater than a
defined value (p0), the calculation stops because there are no significant values. The minimum
and maximum permutation numbers define how many permutations to run and when to check the
p-value. The values allowed for the permutation numbers are 10, 50, 100, 500, 1000 and 10000.
For example, if the minimum permutation number is 50 and the maximum permutation number is
1000, FastMap first runs 50 permutations and checks the p-value. If it is certain that there are no
significant values, the calculation stops; if not, FastMap continues up to 100 permutations and
checks the p-value again. If needed, it checks again at 500 permutations, and it stops at 1000
permutations. p0, the Minimum permutations and the Maximum permutations can be
defined in this tab, but p0 has to be greater than or equal to 100 minus the output threshold value
(thr = output threshold value, p0 >= 100 - thr), and the maximum permutations cannot be smaller
than the minimum permutations.
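The checkpoint schedule and the two constraints above can be sketched as follows. The stopping test itself is not specified in this manual, so it is passed in as a hypothetical callback, `surely_not_significant`; the other names are also illustrative only.

```python
ALLOWED_PERMUTATIONS = (10, 50, 100, 500, 1000, 10000)


def checkpoints(minimum, maximum):
    """Permutation counts at which the p-value is checked."""
    for value in (minimum, maximum):
        if value not in ALLOWED_PERMUTATIONS:
            raise ValueError(f"{value} is not an allowed permutation number")
    if maximum < minimum:
        raise ValueError("maximum permutations cannot be smaller than minimum")
    return [n for n in ALLOWED_PERMUTATIONS if minimum <= n <= maximum]


def p0_is_valid(p0, thr):
    """Constraint from the Data Significance tab: p0 >= 100 - thr."""
    return p0 >= 100 - thr


def permutations_needed(minimum, maximum, surely_not_significant):
    """Number of permutations run before the calculation stops."""
    done = 0
    for n in checkpoints(minimum, maximum):
        done = n  # the permutations up to n would be computed here
        if surely_not_significant(n):
            break
    return done
```

With a minimum of 50 and a maximum of 1000, `checkpoints(50, 1000)` gives the sequence 50, 100, 500, 1000 used in the example above.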
The Thresholds box allows you to set up to 5 thresholds (value and color).
The threshold value is obtained by running the permutations, taking the maximum value
for each permutation and sorting the maximum values. A value above the 99% threshold means
that the value is above 99% of the permutation maximum values.
You can plot several different permutation-based significance thresholds in different
colors. To use this feature, set several p-value thresholds (e.g. 0.95, 0.9 and 0.5), select a
color for each one (e.g. red, orange, yellow) and re-plot the QTL plot. Provided that
permutations are turned on, the lines will now appear at the selected significance levels.
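The threshold calculation described above can be sketched as follows. This is an illustration under stated assumptions, not FastMap's code: the statistic is a toy one (absolute difference between allele-group means), the expression values are shuffled across strains for each permutation, the threshold is given in percent, and the exact percentile rule FastMap uses is not documented here.

```python
import math
import random


def mean_difference(snp, expr):
    """Toy statistic: absolute difference between the two allele-group means."""
    ones = [e for s, e in zip(snp, expr) if s == 1]
    zeros = [e for s, e in zip(snp, expr) if s == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def permutation_maxima(expr, snps, statistic, n_permutations, seed=None):
    """Sorted list holding the maximum statistic of each permutation."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_permutations):
        shuffled = list(expr)
        rng.shuffle(shuffled)  # break the link between strains and expression
        maxima.append(max(statistic(snp, shuffled) for snp in snps))
    return sorted(maxima)


def threshold(sorted_maxima, percent):
    """Value that `percent` % of the sorted permutation maxima do not exceed."""
    n = len(sorted_maxima)
    index = max(math.ceil(percent * n / 100) - 1, 0)
    return sorted_maxima[index]


expr = [1.0, 2.0, 3.0, 4.0]
snps = [[0, 0, 1, 1], [0, 1, 0, 1]]
maxima = permutation_maxima(expr, snps, mean_difference, 20, seed=1)
```

An observed statistic larger than `threshold(maxima, 99)` is then above at least 99% of the permutation maxima.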
4 Input Files
4.1 Gene expression file
The gene expression file is the file that needs to be loaded first. It defines the set of strains
(or individuals) to work with. The SNPs data file can contain more strains than the gene
expression file, but all the strains in the gene expression file must be present in the SNPs data file.
The first line of the file has to be the header and contains the strain names. The first two columns
are the microarray probe ID and an annotation; these two columns are displayed in the gene list
area of FastMap.
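The layout described above can be checked with a short script before loading a file into FastMap. The sketch below assumes a tab-delimited text file, which this manual does not state explicitly; the function name, the sample header labels and the sample values are all hypothetical.

```python
import csv
import io


def read_gene_expression(text, delimiter="\t"):
    """Parse a gene expression table: header row, then one probe per row."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    strains = header[2:]  # columns 1 and 2 are the probe ID and the annotation
    genes = {}
    for row in body:
        probe_id, annotation, values = row[0], row[1], row[2:]
        if len(values) != len(strains):
            raise ValueError(f"{probe_id}: expected {len(strains)} values")
        genes[probe_id] = (annotation, [float(v) for v in values])
    return strains, genes


sample = (
    "ProbeID\tAnnotation\tStrainA\tStrainB\tStrainC\n"
    "probe_1\tgene_x\t7.1\t6.9\t8.4\n"
    "probe_2\tgene_y\t3.0\t3.5\t2.8\n"
)
strains, genes = read_gene_expression(sample)
```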
4.2 SNPs data file
After loading the gene expression file, a SNPs data file can be loaded. The SNPs data file must
contain all the strains that are present in the gene expression file (it can contain more strains). If
the data is homozygous, each SNP has to be coded with 0 (no SNP) or 1 (SNP). If the data is
heterozygous, each SNP is coded with 0 (no SNP on both chromosomes), 1 (SNP on one
chromosome) or 2 (SNP on both chromosomes).
The first line of the file is the header and contains the strain names. The first column is the
chromosome number of the SNP and the second column is the position of the SNP.
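The two requirements above (strain coverage and allele coding) can be verified with a sketch like the following. It assumes each data row has already been split into fields; the function names are hypothetical and the chromosome is kept as text.

```python
def missing_strains(expression_strains, snp_strains):
    """Strains of the gene expression file that the SNP file does not contain."""
    present = set(snp_strains)
    return [s for s in expression_strains if s not in present]


def check_snp_row(row, n_strains, homozygous=True):
    """Validate one SNP row: chromosome, position, then one code per strain."""
    chromosome, position = row[0], int(row[1])
    calls = [int(c) for c in row[2:]]
    if len(calls) != n_strains:
        raise ValueError(f"expected {n_strains} calls, found {len(calls)}")
    allowed = {0, 1} if homozygous else {0, 1, 2}
    if not set(calls) <= allowed:
        raise ValueError(f"calls must be in {sorted(allowed)}")
    return chromosome, position, calls
```

For example, `missing_strains(["A", "B", "C"], ["C", "A", "D"])` returns `["B"]`, which is the situation that leads to the strain-name error listed in section 6.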
4.3 HapMap SNP data file
A directory with SNP data files from the HapMap project can be loaded.
The directory must contain only HapMap files, and the file names must start with
“genotypes_chr(number)_”, for example:
- genotypes_chr1_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr2_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr3_CEU_phase3.2_nr.b36_fwd.txt
- genotypes_chr4_CEU_phase3.2_nr.b36_fwd.txt
4.4 Transposed PLINK data file
A transposed PLINK file set contains two text files: one (TPED) containing SNP and genotype
information, where one row is a SNP; and one (TFAM) containing individual and family
information, where one row is an individual.
You can read more about PLINK file formats here:
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
This format can be convenient to work with when there are many more SNPs
than individuals (e.g. WGAS data). In this case, the TPED file will be very long (as opposed to the
PED file being very wide).
NOTE: Please make sure all the strain names in the gene expression file are present in the
SNP or TFAM file.
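The check in the note above can be sketched for a TPED/TFAM pair as follows. The column layout comes from the PLINK format (six whitespace-separated columns per TFAM row; four leading columns in a TPED row followed by two alleles per individual). Matching strain names against the TFAM individual ID column is an assumption, since this manual does not say which column FastMap uses; the function names are hypothetical.

```python
def tfam_individuals(tfam_text):
    """Individual IDs (second column) of a TFAM file, in file order."""
    ids = []
    for line in tfam_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"TFAM rows have 6 columns, found {len(fields)}")
        ids.append(fields[1])
    return ids


def tped_row_is_consistent(tped_line, n_individuals):
    """A TPED row has 4 leading columns plus two alleles per individual."""
    return len(tped_line.split()) == 4 + 2 * n_individuals


def strains_missing_from_tfam(expression_strains, tfam_text):
    """Strains of the gene expression file that have no TFAM row."""
    present = set(tfam_individuals(tfam_text))
    return [s for s in expression_strains if s not in present]


tfam = "fam1 StrainA 0 0 1 -9\nfam2 StrainB 0 0 2 -9\n"
tped = "1 snp1 0 3000 A A A G"
```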
5 How to calculate an association
a. Set the data option
First set the options for the data (see Options window). You have to define whether the data is
mouse data or human data, whether it is homozygous or heterozygous, the window size, the type of
association wanted and the SNP filtering.
b. Load the files
After setting the options for data processing, first load the gene expression file, then a SNP
data file or a directory with HapMap files (see Input Files and File menu (Alt+F)).
c. Plot a chart
After loading your GE and SNP files, you can calculate the association and plot a chart with
the result by using the Association menu or the Task buttons. The chart can also be plotted by
double-clicking on a gene name or by right-clicking on it.
Before plotting the association in a chart, you can set the options for the data significance
(permutations and thresholds, see Options window).
d. Save a chart
The plotted chart can be saved into a file (jpg or png, see Options window) by using the
Association menu or the Task buttons or by right-clicking directly on the chart and using the
pop-up menu.
e. Zoom and UCSC Browser
You can zoom in or out on the chart by selecting an area with the left mouse button,
by using the pop-up menu opened with the right mouse button, or by using the mouse wheel.
If the zoomed area is within one chromosome, the data can be sent to the UCSC website
and browsed in the UCSC Browser by using the Association menu or the task button.
f. Write the association
Instead of plotting one gene, you can calculate association for several genes and write the
result into files. First you have to set the options about the output files and the data
significance (see Options window).
g. Write LocusZoom-ready files
FastMap allows you to write LocusZoom-ready files so that results can be sent directly to
LocusZoom, which will plot gene and SNP annotations in a specified region of interest.
h. Output files
By default, only the file with the maximum statistic for each gene is written.
You can choose to write more files (see Options window), such as the eQTLs above a
threshold.
You can also write a file with all the permutation maximum values, a file with all the
threshold values and a file with all the association values for the gene. This last file contains the
association for each SNP; each line corresponds to the matching SNP location in the SNP.locations.txt file.
6 Possible Error messages
Message: Some strain names in the gene expression file are not present in the SNP or TFAM file [list of strain names]
Solution: Please make sure that all strain names from the gene expression file are present in the SNP file (or the TFAM file, in the case of Transposed PLINK).

Message: No gene have been selected from the list. Please select one or more genes before plotting associations.
Solution: Please select at least one gene from the list of genes on the left of the application main panel and then click the “Plot Association” button.

Message: Error: [SOME_PATH] is not a valid directory path.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.

Message: Out of memory, too much data to process
Solution: Your machine does not have enough random access memory (RAM) to complete the current task. Please split your data into smaller parts or run your data on a more powerful machine.

Message: Error while trying to write [ASSOCIATION_TYPE]. Please verify that you have writing rights for selected directory.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.

Message: Error while trying to calculate [ASSOCIATION_TYPE]. Please make sure that the input data is in correct format.
Solution: Please make sure your input data files have the correct format and are readable and accessible.

Message: Gene expression and SNP data must be read in before plotting associations.
Solution: Please select the gene expression and SNP/TPED files and allow the application to finish processing those files.

Message: Error while trying to load the gene expression file. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.

Message: Error while trying to load the SNP File. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.

Message: Error while trying to load the HapMap SNP File. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.

Message: Error while trying to load the Transposed PLINK Files. Please make sure the file exist and has the correct format.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory. In order to load PLINK data, you need to have both the TPED and the TFAM file in the same directory.

Message: A path must be entered for the output directory.
Solution: Please specify the output directory before saving any data.

Message: Error while trying to send data to UCSC Browser. Please make sure you are connected to the Internet.
Solution: Please make sure you have an active Internet connection. The other possible reason for this error is that the UCSC Browser is inaccessible due to maintenance on the server side. In that case, you will need to try again later.

Message: Error while trying to save chromosome data file. Please verify that you have writing rights for selected directory.
Solution: Please select existing directories and files. Also make sure that you have user privileges to read/write files in the selected directory.