GeneSpring GT
Frequently Asked Questions
FREQUENTLY ASKED QUESTIONS
for GeneSpring GT
Does GeneSpring GT work with microsatellite data?
Where can I find examples of analysis that have been performed using GeneSpring GT?
Can you perform Loss of Heterozygosity (LOH) analysis?
GeneSpring GT is not accepting my genotyping data files.
How do I format custom data?
I was able to import my data but I do not see all of the variations
Where do I find allele frequencies for the variations that I am working with?
What does the option to combine consecutive variations mean?
What is the extent of pedigree complexity that GeneSpring GT accommodates?
Is there a way to specify liability / penetrance classes?
How can I account for replicate arrays for a single patient?
Which algorithm do you use for Haplotyping?
My program is very slow and I am running out of memory?
In the Master Table of Variations what does right and left flanking sequence mean?
Do you have the references for the implemented algorithms?
How can I combine experiments?
I get a different chi-square value when I use Excel?
Can I analyze the Affymetrix 10K, 100K and 500K array with GeneSpring GT?
When I run the case control script I don’t see the phenotype information.
What does "Standard recessive trait" in the pedigree inspector window mean?
I can’t see my trait in the pedigree viewer?
Does GeneSpring GT work with microsatellite data?
GeneSpring GT can analyze data from all types of genomic variations, including single nucleotide polymorphisms (SNPs), microsatellites/short tandem repeats (STRs), and insertions/deletions. All these variation types are pre-installed in the human genome as annotated in dbSNP (build 119 is currently in use). Because SNPs are relatively simple, GeneSpring GT analyzes SNP data much more efficiently than data for other variation types.
Where can I find examples of analysis that have been performed using GeneSpring GT?
A few case studies are summarized on our website:
http://www.chem.agilent.com/Scripts/Generic.ASP?lPage=34666&indcol=Y&prodcol=Y#GT
These case studies refer to journal articles published by two leading groups in genotyping research. The Translational Genomics Research Institute (TGen) of Phoenix, AZ has published research findings relating Sudden Infant Death Syndrome to TSPYL, a sex-determining gene. Frank Middleton’s research group at SUNY in Syracuse related a skeletal deformity of the foot to a HOX developmental gene.
Agilent
Page 1
Version 1.0
Can you perform Loss of Heterozygosity (LOH) analysis?
LOH analysis identifies chromosomal regions that may indicate tumorigenic activity. GeneSpring GT (http://www.chem.agilent.com/scripts/pds.asp?lpage=34662) identifies such regions by comparing a set of normal and transformed tissues.
To detect deletions or amplifications of chromosomal regions, use CGH Analytics (http://www.chem.agilent.com/Scripts/PDS.asp?lPage=29457), which provides a set of visualization tools and analysis algorithms.
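The sketch below makes the idea concrete. It flags the simplest LOH signal at a single marker: a genotype that is heterozygous in normal tissue but homozygous in the matched tumor sample. This is a minimal illustration under assumed inputs (two-letter genotype calls keyed by marker ID; the function names are hypothetical). It is not GeneSpring GT's algorithm, and a real LOH analysis would look for runs of such markers along a chromosome rather than isolated sites.

```python
def is_heterozygous(genotype: str) -> bool:
    """A two-letter call such as 'AC' is heterozygous when the alleles differ."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def is_homozygous(genotype: str) -> bool:
    """A two-letter call such as 'GG' is homozygous when the alleles match."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def loh_markers(normal: dict, tumor: dict) -> list:
    """Return marker IDs that are heterozygous in normal tissue but homozygous
    in the matched tumor. Markers without a call in both tissues are skipped."""
    flagged = []
    for marker, normal_call in normal.items():
        tumor_call = tumor.get(marker)
        if not normal_call or not tumor_call:
            continue  # not informative without calls from both tissues
        if is_heterozygous(normal_call) and is_homozygous(tumor_call):
            flagged.append(marker)
    return flagged


# Hypothetical calls for one normal/tumor pair.
normal = {"rs1": "AC", "rs2": "GG", "rs3": "AG", "rs4": "CT"}
tumor = {"rs1": "AA", "rs2": "GG", "rs3": "AG", "rs4": "TT"}
print(loh_markers(normal, tumor))  # ['rs1', 'rs4']
```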
GeneSpring GT is not accepting my genotyping data files.
Standard data output from Affymetrix and Illumina arrays can be dragged and dropped into an open window in GeneSpring GT, which recognizes it automatically. Affymetrix data files must contain at least the TSC ID or dbSNP ID field and the base call (samplename_Call) field. Illumina data consists of the data (.csv) file and a corresponding .opa file that defines the variations on a specific array.
Any other data must be reformatted to the “Silicon Genetics Internal SNP” format (see “How do I format custom data?”).
How do I format custom data?
By custom data, we mean genotyping data that does not come from the Affymetrix or Illumina platforms. Before you load custom data into GeneSpring GT, check that your variations are included in the current build of dbSNP (see “I was able to import my data but I do not see all of the variations”).
Place custom data in a tab-delimited text file. The file starts with a header block that describes the data, with each header field on its own line. The data lines follow: a column of variation IDs, then columns of genotype measurements.
Full example:
# SiGSNP2.0
# type=1
# chroms=2
# samples=2
# name=sample 1
# name=sample 2
rs1   C   C   C   C
rs2   G   G   G   G
rs3   A   A   G   G
rs4   G   G   A   A
First row must be # SiGSNP2.0
Second row specifies the type of data (see the Type table below)
Third row specifies the number of chromosomes measured for each variation in each sample (usually 2 for genotype data, but it could be any number for a population sample)
Fourth row specifies the number of samples in the file
All further header rows list sample names, one per row
Example header:
# SiGSNP2.0
# type=1
# chroms=2
# samples=2
# name=sample 1
# name=sample 2
All lines are required, and must appear in exactly the order listed. The number of
“name=” lines must exactly match the number of samples in the file. All header lines start
with #.
Type:
0 = Haplotype [1 call per variation]
1 = Diplotype (unordered) [2 calls per variation]
2 = 2 ordered haplotypes of dubious global validity [2 calls per variation]
3 = Population [1 column of allele frequencies per variation]
4 = 2 ordered haplotypes (father first) [2 calls per variation]
The header is immediately followed by the data lines. The first column has the variation identifiers. There are then 1 or 2 columns for the first sample, 1 or 2 columns for the second sample, and so on. There is one column per sample if the type is 0 (haplotypes) or 3 (populations), and two columns per sample for the other type values.
Example genotype data:
rs1   C   C   C   C
rs2   G   G   G   G
rs3   A   A   G   G
rs4   G   G   A   A
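If your genotypes already live in another program, a short script can write the type=1 layout described above. The sketch below is a hypothetical helper: its name and input shape are our assumptions, not part of GeneSpring GT. It covers only unordered diplotypes (two calls per sample per variation), and its output should be checked against a file that GeneSpring GT is known to accept.

```python
def write_sigsnp_type1(sample_names, rows):
    """Build a type=1 (unordered diplotype) SiGSNP2.0 file as a string.

    sample_names: list of sample names, in column order.
    rows: list of (variation_id, calls), where calls holds two calls per
    sample in sample order, e.g. ("rs3", ["A", "A", "G", "G"]).
    """
    lines = [
        "# SiGSNP2.0",
        "# type=1",
        "# chroms=2",
        f"# samples={len(sample_names)}",
    ]
    # One "name=" line per sample; the count must match "samples=".
    lines += [f"# name={name}" for name in sample_names]
    expected = 2 * len(sample_names)
    for variation_id, calls in rows:
        if len(calls) != expected:
            raise ValueError(
                f"{variation_id}: expected {expected} calls, got {len(calls)}"
            )
        # Tab-delimited: variation ID first, then the calls for each sample.
        lines.append("\t".join([variation_id, *calls]))
    return "\n".join(lines) + "\n"


text = write_sigsnp_type1(
    ["sample 1", "sample 2"],
    [
        ("rs1", ["C", "C", "C", "C"]),
        ("rs2", ["G", "G", "G", "G"]),
        ("rs3", ["A", "A", "G", "G"]),
        ("rs4", ["G", "G", "A", "A"]),
    ],
)
print(text)
```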
Example population data:
rs2   G:0.11,T:0.89    G:0.541,T:0.459
rs3   A:0.75,G:0.25    A:0.793,G:0.207
rs4   A:0.725,G:0.27   A:0.585,G:0.415
I was able to import my data but I do not see all of the variations
Most likely, some of your variations are not predefined in the current dbSNP build or do not have a physical position. First check whether the variations are already defined in the human genome and whether your file uses dbSNP identifiers.
Search for an individual variation using Edit > Find Variation, or copy a subset of your variations and use Edit > Paste > Paste Variation List to see whether they are defined in GeneSpring GT. If the variations are not found, you can add them to GeneSpring GT. To set up custom variations, create a text file with three columns: one for the variation identifiers, one for the chromosomal or contig locations, and one for the alleles of each variation, as in the following example:
SNPs:
CR2      1:1233454       A/C
Microsatellites:
D1S896   1:-233670889    (TG)17/18/19/21/22/23/24
Save this information as a tab-delimited text file and then import the file using Edit > Edit
Master Table of Variations > Import From File. You can then import the genotype data into
GeneSpring GT using the internal format.
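Two small helpers can support the checks and the file described above. `missing_variations` lists the IDs in your data that are absent from a list of IDs you have already confirmed in GeneSpring GT (for example with Edit > Find Variation). `custom_variation_file` writes the three-column, tab-delimited file. Both names are hypothetical. The column order follows the written description (identifier, location, alleles), which is an assumption worth checking in the Import From File dialog. The location pattern accepts the minus sign seen in the microsatellite example, whose meaning is not explained here.

```python
import re

# Chromosome or contig name, a colon, then a position. The optional minus sign
# mirrors the microsatellite example above; its meaning is not documented here.
LOCATION = re.compile(r"^[^:\s]+:-?\d+$")


def missing_variations(data_ids, known_ids):
    """IDs from your data that are not in a list of known variation IDs,
    in the order in which they appear in the data."""
    known = set(known_ids)
    return [vid for vid in data_ids if vid not in known]


def custom_variation_file(records):
    """records: iterable of (identifier, location, alleles) tuples.
    Returns tab-delimited text with one variation per line."""
    out = []
    for identifier, location, alleles in records:
        if not LOCATION.match(location):
            raise ValueError(f"{identifier}: unexpected location {location!r}")
        if "/" not in alleles:
            raise ValueError(f"{identifier}: alleles should be separated by '/'")
        out.append("\t".join([identifier, location, alleles]))
    return "\n".join(out) + "\n"


print(missing_variations(["rs1", "CR2", "D1S896"], ["rs1", "rs2"]))
# ['CR2', 'D1S896']
print(custom_variation_file([
    ("CR2", "1:1233454", "A/C"),
    ("D1S896", "1:-233670889", "(TG)17/18/19/21/22/23/24"),
]))
```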
Where do I find allele frequencies for the variations that I am working with?
Some forms of family-based analysis, e.g. parametric linkage or autozygosity analysis, require population allele frequencies to establish the significance of an allele’s association with a phenotype. Allele frequencies can typically be obtained in two ways:
Affymetrix 10K and 100K users can download allele frequencies for Asian, African American, and Caucasian populations at:
http://www.chem.agilent.com/Scripts/Generic.ASP?lPage=35540&indcol=Y&prodcol=Y
Drag and drop these downloaded zip files into the GeneSpring GT window to produce an experiment that contains the allele frequencies for each of the populations.
If you have your own allele frequencies, set up this information in the “Silicon Genetics Internal SNP” format, for example:
# SiGSNP2.0
# type=3
# chroms=2
# samples=3
# name=Asian
# name=African-American
# name=Caucasian
rs1030583   C:0.361,G:0.639   C:0.81,G:0.19     C:0.603,G:0.397
rs1030626   C:0.75,G:0.25     C:0.512,G:0.488   C:0.595,G:0.405
rs1030687   C:0.5,T:0.5       C:0.341,T:0.659   C:0.429,T:0.571
rs1030708   G:0.85,T:0.15     G:0.964,T:0.036   G:0.857,T:0.143
rs1030777   G:0.675,T:0.325   G:0.512,T:0.488   G:0.464,T:0.536
This format is similar to the one for importing custom data (see “How do I format custom data?”).
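Before importing a hand-made population file, it can help to sanity-check each data line. The sketch below parses cells such as `C:0.361,G:0.639` and reports two problems: a line whose column count does not match the number of samples, and a cell whose frequencies do not sum to about 1. The function names and the 0.01 tolerance are our assumptions, not GeneSpring GT rules.

```python
def parse_population_cell(cell):
    """'C:0.361,G:0.639' -> {'C': 0.361, 'G': 0.639}"""
    freqs = {}
    for item in cell.split(","):
        allele, value = item.split(":")
        freqs[allele.strip()] = float(value)
    return freqs


def check_population_line(line, n_samples, tol=0.01):
    """Return a list of problems found in one type=3 data line (empty if none)."""
    fields = line.split()
    problems = []
    if len(fields) != 1 + n_samples:
        problems.append(f"expected {1 + n_samples} columns, found {len(fields)}")
    for cell in fields[1:]:
        total = sum(parse_population_cell(cell).values())
        if abs(total - 1.0) > tol:
            problems.append(f"{fields[0]}: {cell!r} sums to {total:.3f}")
    return problems


line = "rs1030583\tC:0.361,G:0.639\tC:0.81,G:0.19\tC:0.603,G:0.397"
print(check_population_line(line, n_samples=3))  # []
```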
You can also create allele frequencies from your own data. Usually this is done using only the founders in your experiment. Follow these steps to create allele frequencies from your data:
1. Select Sample Manager from the Experiments menu.
   a. Click the Filter on Experiment tab and select the experiment of choice.
   b. Sort on attributes (e.g. Founders) in the Filter Results table.
   c. Click on each individual in the Filter Results table that you want to use to generate allele frequencies. Click the Add button to add the samples to the Selected Samples table. Select at least 50 founders.
   d. Click the Create Experiment button to create an experiment that contains the individuals in the Selected Samples table.
2. Right-click on the sample file in the Experiment folder in the Navigator pane and select Inspect from the shortcut menu.
   a. Click the Interpretation tab in the Experiment Inspector window.
   b. Click on Default Interpretation and select Do not Display.
   c. Save this as a new interpretation (e.g. population condition). This results in a single averaged condition with that interpretation.
3. Run the following script from the Basic Scripts folder of the Navigator: Merge-Split Groups > Merge Condition to Sample
   a. Select the single averaged condition that you generated in Step 2 from the Experiment folder, and click the Condition button in the Inputs area.
   b. Leave the Data compression method as Population.
   c. Click the Start button to run the script. A new sample containing population allele frequencies is created and displayed in a Sample Inspector window.
4. Name and save the sample. It is also a good idea to assign a project to the sample at this point. The sample is now available in the Sample Manager window.
5. (optional) Open the Sample Manager window from the Experiments menu.
   a. Sort the samples by date.
   b. Add the new population sample to the sample list.
   c. Create an experiment from this new sample.
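The FAQ does not spell out what the Population compression computes. A plausible reading is simple allele counting over the selected founders, and the sketch below shows that calculation for one SNP, formatted in the type=3 cell style. The counting rule, the treatment of missing calls and the rounding are all assumptions, and the helper names are hypothetical, so compare the result with the sample GeneSpring GT produces before relying on it.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: two-letter calls such as 'AG'; None or '' means no call.
    Each called founder contributes two alleles to the counts."""
    counts = Counter()
    for call in genotypes:
        if call:
            counts.update(call)  # counts each letter of the call
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


def population_cell(freqs, digits=3):
    """Format frequencies in the 'A:0.75,G:0.25' style of type=3 files."""
    return ",".join(f"{a}:{round(f, digits):g}" for a, f in freqs.items())


founders = ["AG", "AA", "AA", "GG", None, "AG"]  # hypothetical calls at one SNP
freqs = allele_frequencies(founders)
print(population_cell(freqs))  # A:0.6,G:0.4
```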
What does the option to combine consecutive variations mean?
When set to a value greater than 1, this option creates pseudo-haplotype blocks. For example, if you have 15 variations and the number of consecutive variations is 3, a total of 13 calculations is performed, one for each of 13 “haplotype blocks”, each consisting of 3 consecutive variations:
1, 2, 3
2, 3, 4
3, 4, 5
4, 5, 6
....
12, 13, 14
13, 14, 15
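The blocks form a sliding window: n variations combined k at a time give n - k + 1 blocks, so 15 - 3 + 1 = 13 in the example. The short sketch below enumerates them (the function name is hypothetical).

```python
def consecutive_blocks(variations, k):
    """Return every run of k consecutive variations (a sliding window of width k).
    A list of n variations yields n - k + 1 blocks (none if k > n)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [tuple(variations[i:i + k]) for i in range(len(variations) - k + 1)]


blocks = consecutive_blocks(list(range(1, 16)), 3)
print(len(blocks))            # 13
print(blocks[0], blocks[-1])  # (1, 2, 3) (13, 14, 15)
```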
What is the extent of pedigree complexity that GeneSpring GT accommodates?
It depends on the type of analysis. GeneSpring GT supports very large and complex pedigrees and, in our experience, outperforms other applications. We have not tested the upper limit. Keep in mind, however, that some analyses with complex pedigrees can run for more than 10 hours.
Is there a way to specify liability / penetrance classes?
Yes, liability classes can be specified for a pedigree. Double-click a pedigree in the Navigator folder to open the Pedigree Inspector, where you can edit trait details. For a given phenotype or trait, specify liability classes either as multiple classes or with a formula, using the “Model for Phenotype Probabilities” option.
How can I account for replicate arrays for a single patient?
GeneSpring GT can generate consensus genotype measurements from multiple arrays that represent one sample. The resulting sample shows a genotype for a variation only if the replicate samples all have that genotype. If all 4 replicates have CG for a variation, then CG is the genotype. If even 1 replicate has GG, no genotype is shown for that variation.
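The consensus rule above can be sketched in a few lines. Two details are not stated in the FAQ and are assumptions here. First, the sketch treats calls as unordered, so CG and GC count as the same genotype. Second, it treats a missing call in any replicate as a disagreement. The function name is hypothetical.

```python
def consensus_genotype(calls):
    """Return the genotype shared by all replicate calls, or None.

    'CG' and 'GC' are treated as the same unordered genotype, and a missing
    call (None or '') counts as a disagreement; both rules are assumptions.
    """
    if not calls or any(not call for call in calls):
        return None
    unordered = {"".join(sorted(call)) for call in calls}
    return unordered.pop() if len(unordered) == 1 else None


print(consensus_genotype(["CG", "CG", "CG", "CG"]))  # CG
print(consensus_genotype(["CG", "CG", "GG", "CG"]))  # None
```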
This is done in three steps:
1. Use Experiments > Experiment Interpretation to create a parameter for which all replicates of the same sample have the same value.
2. Run the script “Merge Conditions to Samples” in the Navigator directory Scripts > Basic Scripts > Merge-Split Groups.
3. Create a new experiment using Experiments > Create New Experiment.
As an example, say you have an experiment containing 3 replicates for each of 2 tumor and 2 normal tissues (12 arrays in total):
1. Go into the Interpretations window and click on “Experiment Parameters”. Add a new parameter column named “Tissue Type” and give each of the 12 arrays the value tumor1, tumor2, normal1, or normal2. Save.
2. Set Tissue Type as a Continuous parameter in the Interpretations window. Set all other parameters as Do Not Display. Click Save As New to create a new interpretation for the experiment.
3. Click on the script “Merge Conditions to Samples”. In the script window, set the new interpretation as the Experiment. Use “Replicates” as the Data Compression knob setting.
4. Open Create New Experiment and select the 4 merged conditions that were just created. Make them into a new experiment consisting of the consistent measurements across replicates for each sample.
Which algorithm do you use for Haplotyping?
We use the EM algorithm as described in:
Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a
diploid population. Mol Biol Evol 1995; 12: 921-927
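The sketch below applies the EM idea from Excoffier and Slatkin (1995) to the smallest interesting case: two biallelic SNPs, where double heterozygotes are the only source of phase ambiguity. In each E-step, a double heterozygote is split between the 00/11 and 01/10 phasings in proportion to the current haplotype frequencies. The M-step then re-estimates the frequencies from the expected counts. This is an illustration, not GeneSpring GT's implementation, which is not described here. The genotype coding, the uniform starting values and the tolerance are our assumptions, and missing data is not handled. Because EM can converge to a local maximum, it is often run from several starting points.

```python
def em_two_snp_haplotypes(genotypes, max_iter=1000, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    genotypes: list of (g1, g2) pairs, where each value is the number (0, 1 or 2)
    of copies of allele '1' at SNP 1 and SNP 2. Missing data is not handled.
    Returns frequencies for haplotypes '00', '01', '10' and '11'.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    n_haplotypes = 2 * len(genotypes)
    p = [0.25] * 4  # index = 2 * (allele at SNP 1) + (allele at SNP 2)
    for _ in range(max_iter):
        counts = [0.0] * 4
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # split it in proportion to the current pair probabilities.
                w_cis, w_trans = p[0] * p[3], p[1] * p[2]
                total = w_cis + w_trans
                f_cis = w_cis / total if total > 0 else 0.5
                counts[0] += f_cis
                counts[3] += f_cis
                counts[1] += 1 - f_cis
                counts[2] += 1 - f_cis
            else:
                # At most one heterozygous site: the phase is unambiguous.
                (a1, a2), (b1, b2) = alleles[g1], alleles[g2]
                counts[2 * a1 + b1] += 1
                counts[2 * a2 + b2] += 1
        # M-step: new frequencies are the expected counts over all haplotypes.
        new_p = [c / n_haplotypes for c in counts]
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return dict(zip(["00", "01", "10", "11"], p))


# Hypothetical data: 10 individuals 00/00, 10 individuals 11/11, 5 double heterozygotes.
data = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
print(em_two_snp_haplotypes(data))
```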
My program is very slow and I am running out of memory?
You can configure the memory in the .lax file (the installer’s default is 1 GB). On a machine with 2 GB of RAM, we suggest setting the maximum heap size to 1.6 GB:
Go to C:\Program Files\Agilent\GeneSpring GT\Data and open the file GeneSpring GT.lax with a text editor (e.g. Notepad). In this file, find the text
lax.nl.java.option.java.heap.size.max=1000000000
and change 1000000000 to 1600000000. Save and exit.
Then adjust the memory preference in GeneSpring GT to 1.4 GB.
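If you need to make this change on several machines, a small script can edit the setting instead of Notepad. The sketch below is hypothetical. It rewrites only the `lax.nl.java.option.java.heap.size.max` line, tolerates Windows line endings, and refuses to run if the key is missing. Keep a backup of the .lax file before using `update_lax_file`, and pass it the path given in the step above.

```python
import re
from pathlib import Path

LAX_KEY = "lax.nl.java.option.java.heap.size.max"


def set_max_heap(lax_text, max_bytes):
    """Return lax_text with the maximum heap size set to max_bytes."""
    # Match the whole "key=digits" line; the lookahead allows a Windows "\r".
    pattern = re.compile(rf"^{re.escape(LAX_KEY)}=\d+(?=\r?$)", re.MULTILINE)
    new_text, replaced = pattern.subn(f"{LAX_KEY}={max_bytes}", lax_text)
    if replaced == 0:
        raise ValueError(f"{LAX_KEY} not found")
    return new_text


def update_lax_file(path, max_bytes):
    """Rewrite the .lax file in place; keep a backup copy before running this."""
    path = Path(path)
    path.write_text(set_max_heap(path.read_text(), max_bytes))


# 1.6 GB written out in bytes, as in the step above.
print(set_max_heap(f"{LAX_KEY}=1000000000\n", 1_600_000_000), end="")
```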
In the Master Table of Variations what does right and left flanking sequence mean?
Please see the following sketch to determine the orientation:
+ strand
5' ----------LLLLLLLLLLLLLLLLLLLLLLLLLLLL[SNP ALLELE]RRRRRRRRRRRRRRRRRRRRRRRRRR----------> 3'
- strand
3' <---------RRRRRRRRRRRRRRRRRRRRRRRRRRRR[SNP ALLELE]LLLLLLLLLLLLLLLLLLLLLLLLLL---------- 5'
L = left flank, R = right flank
The “left flank” is the sequence on the 5’ side of the SNP allele on a given strand, and the “right flank” is the sequence on its 3’ side.
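The sketch shows that the two strands swap flanks. Read 5’ to 3’ on the minus strand, the left flank is the reverse complement of the plus-strand right flank, and vice versa. The helper below illustrates this with made-up sequences; the function names are hypothetical, and only unambiguous A/C/G/T bases are handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to read the other strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_flanks(left_plus: str, right_plus: str) -> tuple:
    """Given + strand flanks, return (left, right) flanks read 5'->3' on the - strand."""
    return reverse_complement(right_plus), reverse_complement(left_plus)


print(minus_strand_flanks("AACG", "TTGC"))  # ('GCAA', 'CGTT')
```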
Do you have the references for the implemented algorithms?
Nonparametric Linkage
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach. Am J Hum Genet 1996; 58: 1347-1363.
Haplotype Structure and Reconstruction
Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 1995; 12: 921-927.
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure in the human genome. Nat Genet 2001; 29: 229-232.
ANOVA
Page GP, Amos CI. Comparison of Linkage-Disequilibrium Methods for Localization of Genes Influencing Quantitative Traits in Humans. Am J Hum Genet 1999; 64: 1194-1205.
Autozygosity and LOH
Broman KW, Weber JL. Long Homozygous Chromosomal Segments in Reference Families from the Centre d'Étude du Polymorphisme Humain. Am J Hum Genet 1999; 65: 1493-1500.
TDT
Spielman RS, McGinnis RE, Ewens WJ. Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-Dependent Diabetes Mellitus (IDDM). Am J Hum Genet 1993; 52: 506-516.
Spielman RS, Ewens WJ. The TDT and Other Family-Based Tests for Linkage Disequilibrium and Association. Am J Hum Genet 1996; 59: 983-989.
Extended Reading
Ott J. Analysis of Human Genetic Linkage, 3rd edition, 1999.
How can I combine experiments?
Please follow this procedure:
1. Import all the samples.
2. Make an experiment where each set of subchips is grouped in a condition. For example, if you have 10 samples comprising 5 actual samples with A and B subchips, make an experiment with 5 conditions, 1 for each pair of subchips. To do this, set up a parameter so that the samples you want to group are in a condition together.
   Example: Parameter = Group, with values 1, 2, 3, etc. Give the chips to be grouped the same parameter value. In the Interpretation Inspector, set the Group parameter to Continuous and all other parameters to Do Not Display. Then convert the conditions.
3. Run the script primitive “Convert Conditions to Samples” (Scripts > Merge-Split Groups > Convert Conditions to Samples):
   a. Use the experiment you created above.
   b. Set the Data Compression knob to Replicates.
   c. Click Start.
   d. Save the samples.
4. Create a new experiment from the samples that were just created (Experiments > Sample Manager > select the samples and press Add > Create Experiment).
5. At this point, you can delete the original experiment and its associated chip samples.
I get a different chi-square value when I use Excel?
The chi-square p-values differ because GeneSpring GT constructs the contingency table from alleles, not genotypes. For the variation rs2051727, for instance, there are 33 individuals: 11 cases and 22 controls. In the GeneSpring GT contingency table, each individual contributes 2 counts, one for each copy of the allele, which gives 22 case alleles and 44 control alleles. For the rs2051727 variation, this gives:
        case    control
A         6        33
B        16        11
The chi-square test is then performed on this allele contingency table.
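To reproduce the allele-based statistic outside GeneSpring GT (for example, to compare it with Excel), run a Pearson chi-square test on the 2x2 allele table rather than on genotype counts. The sketch below uses no continuity correction, which is an assumption because the FAQ does not say whether one is applied. For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x/2)). The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Allele counts for rs2051727: rows A, B; columns case, control.
stat, p = chi_square_2x2(6, 33, 16, 11)
print(round(stat, 2))  # 13.82
```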
Can I analyze the Affymetrix 10K, 100K and 500K array with GeneSpring GT?
Yes, GeneSpring GT can handle data from the Affymetrix 10K, 100K and 500K arrays. The following performance benchmarks were established in house on Mac OS X, using 1200 samples run on the Affymetrix 500K array:
Running Mac OS 10.4.2 (JVM 1.4.2_07), 5 GB RAM, 2 CPUs, dual 2 GHz PowerPC G5 processors
Action                      Time (hr:min:sec)
Deduce Pedigree             1:45:40
Hardy-Weinberg              0:14:00
Deduce Haplotypes           21:21:24
ANOVA                       0:20:00
Qualitative CaseControl     0:02:00
Regression                  0:07:45
Quantitative CaseControl    0:02:00
When I run the case control script I don’t see the phenotype information.
When samples are not linked to a pedigree, go to the Sample Manager, select the samples that were imported, and click Edit Attribute to enter the phenotype and individual ID. Then create a new experiment and import the parameters. The script will then show the phenotype information.
This is a known issue that affects case-control analysis, because customers typically do not link the samples to a pedigree.
What does "Standard recessive trait" in the pedigree inspector window mean?
Marking a trait as standard recessive changes the analysis, because the analysis looks for markers that show the corresponding genotype-phenotype relationship. For example, if a trait is standard recessive, you want markers where only the homozygous genotype (two copies of the associated allele) always, or almost always, corresponds to the affected phenotype. If the trait were dominant, you would instead look for markers where homozygous or heterozygous genotypes (one or two copies) correspond to the affected phenotype.
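The sketch below illustrates the difference between the two models. It counts how many individuals' phenotypes match a fully penetrant recessive or dominant model at one marker, given a chosen risk allele. The scoring and the helper names are hypothetical and are not how GeneSpring GT evaluates markers. The example only shows why the same data can fit one inheritance model and not the other.

```python
def predicted_affected(genotype, risk_allele, model):
    """Phenotype predicted by a fully penetrant single-locus model."""
    copies = genotype.count(risk_allele)
    if model == "recessive":
        return copies == 2  # only homozygous carriers are affected
    if model == "dominant":
        return copies >= 1  # homozygous or heterozygous carriers are affected
    raise ValueError(f"unknown model: {model}")


def model_agreement(individuals, risk_allele, model):
    """Fraction of (genotype, affected) pairs whose phenotype matches the model."""
    matches = sum(
        predicted_affected(genotype, risk_allele, model) == affected
        for genotype, affected in individuals
    )
    return matches / len(individuals)


# Hypothetical genotypes and affection status at one marker.
people = [("AA", True), ("AG", False), ("GG", False), ("AA", True), ("AG", False)]
print(model_agreement(people, "A", "recessive"))  # 1.0
print(model_agreement(people, "A", "dominant"))   # 0.6
```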
I can’t see my trait in the pedigree viewer?
A trait is shown in the pedigree viewer only when it is set as the primary trait. You can see this setting at the bottom of the Pedigree Inspector, under the Trait Details tab, in the far-right column called "Display". There can be only one primary trait; all others are treated as secondary.