GeneSpring GT Frequently Asked Questions

Does GeneSpring GT work with microsatellite data?
Where can I find examples of analysis that have been performed using GeneSpring GT?
Can you perform Loss of Heterozygosity (LOH) analysis?
GeneSpring GT is not accepting my genotyping data files.
How do I format custom data?
I was able to import my data but I do not see all of the variations
Where do I find allele frequencies for the variations that I am working with?
What does the option to combine consecutive variations mean?
What is the extent of pedigree complexity that GeneSpring GT accommodates?
Is there a way to specify liability / penetrance classes?
How can I account for replicate arrays for a single patient?
Which algorithm do you use for Haplotyping?
My program is very slow and I am running out of memory?
In the Master Table of Variations what does right and left flanking sequence mean?
Do you have the references for the implemented algorithms?
How can I combine experiments?
I get a different chi-square value when I use Excel?
Can I analyze the Affymetrix 10K, 100K and 500K array with GeneSpring GT?
When I run the case control script I don’t see the phenotype information.
What does "Standard recessive trait" in the pedigree inspector window mean?
I can’t see my trait in the pedigree viewer?

Does GeneSpring GT work with microsatellite data?
GeneSpring GT can analyze data from all types of genomic variations, including single nucleotide polymorphisms (SNPs), microsatellites/short tandem repeats (STRs), and insertions/deletions. All these variations are pre-installed in the human genome as annotated in dbSNP (build 119 is currently in use). Because SNPs are relatively simple, GeneSpring GT analyzes SNP data much more efficiently than other variation types.

Where can I find examples of analysis that have been performed using GeneSpring GT?
A few case studies are summarized on our website:
http://www.chem.agilent.com/Scripts/Generic.ASP?lPage=34666&indcol=Y&prodcol=Y#GT
These case studies refer to journal articles published by two leading groups in genotyping research. Translational Genomics (TGen) of Phoenix, AZ has published research findings on Sudden Infant Death Syndrome in relation to TSPYL, a sex-determining gene. Frank Middleton’s research group at SUNY, Syracuse related a foot skeletal deformity to a HOX developmental gene.

Agilent Page 1 Version 1.0

Can you perform Loss of Heterozygosity (LOH) analysis?
LOH analysis allows the identification of chromosomal regions that are potentially indicative of tumorigenic activity.
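As a per-marker illustration of the idea (not GeneSpring GT's own region-detection method, which this FAQ does not describe): a marker is informative when the normal tissue is heterozygous, and loss of one allele is suggested when the matched tumor sample is homozygous at that marker. The Python sketch below applies that rule and then groups consecutive LOH calls into candidate regions. The genotype encoding (two-letter strings such as "AG"), the function names and the `min_markers` threshold are assumptions chosen for illustration.

```python
def loh_calls(normal, tumor):
    """Classify each marker of a matched normal/tumor pair.

    normal, tumor: dicts mapping marker ID -> two-letter genotype such as "AG",
    or None for a no-call (hypothetical encoding, not a GeneSpring GT format).
    Returns a dict mapping marker ID -> "LOH", "retained" or "uninformative".
    """
    calls = {}
    for marker, n_gt in normal.items():
        t_gt = tumor.get(marker)
        if n_gt is None or t_gt is None or n_gt[0] == n_gt[1]:
            # Homozygous (or uncalled) in normal tissue: says nothing about LOH.
            calls[marker] = "uninformative"
        elif t_gt[0] == t_gt[1]:
            # Heterozygous in normal tissue, homozygous in tumor: one allele appears lost.
            calls[marker] = "LOH"
        else:
            calls[marker] = "retained"
    return calls


def loh_regions(ordered_markers, calls, min_markers=3):
    """Collect runs of LOH calls along a chromosome (markers in physical order).

    A "retained" call ends the current run; uninformative markers are skipped,
    so they neither end nor extend a run. Runs shorter than min_markers are dropped.
    """
    regions, current = [], []
    for marker in ordered_markers:
        call = calls.get(marker, "uninformative")
        if call == "LOH":
            current.append(marker)
        elif call == "retained":
            if len(current) >= min_markers:
                regions.append(current)
            current = []
    if len(current) >= min_markers:
        regions.append(current)
    return regions
```

For example, with normal genotypes AG, CT, AA, GT, AC and tumor genotypes AA, CC, AA, GG, AC at five ordered markers, the third marker is uninformative, the fifth retains heterozygosity, and `loh_regions` returns one run made of the first, second and fourth markers.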
GeneSpring GT (http://www.chem.agilent.com/scripts/pds.asp?lpage=34662) identifies LOH regions by comparing a set of normal and transformed tissues. For the detection of chromosomal region deletions or amplifications, CGH Analytics (http://www.chem.agilent.com/Scripts/PDS.asp?lPage=29457) provides a set of visualization tools and analysis algorithms.

GeneSpring GT is not accepting my genotyping data files.
Standard data output from Affymetrix and Illumina arrays can be dragged and dropped into an open GeneSpring GT window, where it is recognized automatically. Affymetrix data files must contain at least the TSC ID or dbSNP ID field and the base call field (samplename_Call). Illumina data consists of the data file (.csv) and a corresponding .opa file that defines the variations on a specific array. Any other data must be reformatted to the “Silicon Genetics Internal SNP” format (see “How do I format custom data?”).

How do I format custom data?
By custom data, we mean genotyping data that does not come from the Affymetrix or Illumina platforms. Before you load custom data into GeneSpring GT, check that your variations are included in the current build of dbSNP (see “I was able to import my data but I do not see all of the variations”). Custom data is placed in a tab-delimited text file: a header block that describes the data, followed by columns of variation IDs and genotype measurements (see the full example below).
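A minimal Python sketch of writing a type=1 file like the full example in this answer, assuming each sample's calls are held in a dictionary keyed by variation ID. The header order, the one-or-two-columns-per-sample rule and the tab delimiter follow the rules listed in this answer. The function names are hypothetical, and because the FAQ does not say how missing calls are written, the sketch expects complete data.

```python
from pathlib import Path


def format_sigsnp(samples, variation_ids, data_type=1, chroms=2):
    """Build the text of a SiGSNP2.0 ("Silicon Genetics Internal SNP") file.

    samples: list of (sample_name, calls) pairs; calls maps each variation ID to a
    tuple of allele calls (1 call for types 0 and 3, 2 calls for types 1, 2 and 4).
    """
    calls_per_sample = 1 if data_type in (0, 3) else 2
    lines = [
        "# SiGSNP2.0",
        f"# type={data_type}",
        f"# chroms={chroms}",
        f"# samples={len(samples)}",
    ]
    # One "name=" line per sample, in the same order as the sample columns below.
    lines += [f"# name={name}" for name, _ in samples]
    for vid in variation_ids:
        row = [vid]
        for name, calls in samples:
            sample_calls = calls[vid]
            if isinstance(sample_calls, str):
                sample_calls = (sample_calls,)  # a single call or a frequency string
            if len(sample_calls) != calls_per_sample:
                raise ValueError(
                    f"{name}, {vid}: expected {calls_per_sample} call(s) for type {data_type}"
                )
            row.extend(sample_calls)
        lines.append("\t".join(row))  # tab-delimited data line
    return "\n".join(lines) + "\n"


def write_sigsnp(path, samples, variation_ids, data_type=1, chroms=2):
    """Write the file to disk (hypothetical helper; choose your own file name)."""
    Path(path).write_text(format_sigsnp(samples, variation_ids, data_type, chroms))
```

Calling `format_sigsnp` with the two samples of the full example (identical at rs1 and rs2, different at rs3 and rs4) reproduces its header and four data lines.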
First enter a header block, with each field on its own line. Full example (columns are tab-delimited):

# SiGSNP2.0
# type=1
# chroms=2
# samples=2
# name=sample 1
# name=sample 2
rs1    C    C    C    C
rs2    G    G    G    G
rs3    A    A    G    G
rs4    G    G    A    A

The header rows are:
First row: must be # SiGSNP2.0
Second row: specifies the type of data (see the type list below)
Third row: specifies the number of chromosomes measured for each variation in each sample (usually 2 for genotype data, but it could be any number for a population sample)
Fourth row: specifies the number of samples in the file
All further header rows: list the sample names, one per row

Example header:

# SiGSNP2.0
# type=1
# chroms=2
# samples=2
# name=sample 1
# name=sample 2

All lines are required and must appear in exactly the order listed. The number of “name=” lines must exactly match the number of samples in the file. All header lines start with #.

Type:
0 = Haplotype [1 call per variation]
1 = Diplotype (unordered) [2 calls per variation]
2 = 2 ordered haplotypes of dubious global validity [2 calls per variation]
3 = Population [1 column per sample]
4 = 2 ordered haplotypes (father first) [2 calls per variation]

The header is immediately followed by the data lines. The first column holds the variation identifiers, followed by 1 or 2 columns for the first sample, 1 or 2 columns for the second sample, and so on. There is one column per sample if the type is 0 (haplotypes) or 3 (populations), and two columns per sample for the other type values.

Example genotype data:

rs1    C    C    C    C
rs2    G    G    G    G
rs3    A    A    G    G
rs4    G    G    A    A

Example population data (each entry lists allele:frequency pairs):

rs2    G:0.11,T:0.89     G:0.541,T:0.459
rs3    A:0.75,G:0.25     A:0.793,G:0.207
rs4    A:0.725,G:0.27    A:0.585,G:0.415

I was able to import my data but I do not see all of the variations
Most likely, some of your variations are not pre-defined in the current dbSNP build or do not have a physical position. First check whether the variations are already defined in the human genome and whether your file uses dbSNP identifiers.
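Before you search for identifiers one by one, it can help to sort them by shape. The Python sketch below assumes you already have a set of IDs confirmed in GeneSpring GT (how you gather it is up to you). IDs of the form "rs" plus digits that are not in that set may be missing from the current dbSNP build or may lack a physical position. IDs of any other form are candidates for the custom variation definitions described in this answer. The function name and report keys are hypothetical.

```python
import re

# dbSNP reference SNP IDs are "rs" followed by digits, e.g. rs2051727.
RS_ID = re.compile(r"rs\d+")


def triage_variation_ids(variation_ids, defined_ids):
    """Split the IDs from a genotype file into three groups.

    defined_ids: IDs already confirmed in GeneSpring GT (for example with
    Edit > Find Variation); collecting them is outside this sketch.
    """
    report = {"defined": [], "rs_not_defined": [], "not_dbsnp_style": []}
    for raw in variation_ids:
        vid = raw.strip()  # compare IDs without surrounding whitespace
        if vid in defined_ids:
            report["defined"].append(vid)
        elif RS_ID.fullmatch(vid):
            report["rs_not_defined"].append(vid)
        else:
            report["not_dbsnp_style"].append(vid)
    return report
```

The match is case-sensitive, so an ID written as "RS3" lands in `not_dbsnp_style`, which is a useful hint that the file does not use the dbSNP spelling.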
Search for an individual variation using Edit > Find Variation, or copy a subset of your variations and use Edit > Paste > Paste Variation List to see whether these variations are defined in GeneSpring GT. If the variations are not found, you can add them to GeneSpring GT. Setting up custom variations involves creating a text file containing three columns: one for the variation identifiers, one for the chromosomal or contig locations, and one for the alleles of each variation, as in the following examples:

SNPs:
CR2    A/C    1:1233454
Microsatellites:
D1S896    (TG)17/18/19/21/22/23/24    1:-233670889

Save this information as a tab-delimited text file and then import the file using Edit > Edit Master Table of Variations > Import From File. You can then import the genotype data into GeneSpring GT using the internal format.

Where do I find allele frequencies for the variations that I am working with?
Some forms of family-based analysis, e.g. parametric linkage or autozygosity analysis, require population allele frequencies to establish the significance of an allele’s association with a phenotype. Allele frequencies can typically be obtained in the following ways.

Affymetrix 10K and 100K users can download allele frequencies for Asian, African American, and Caucasian populations at:
http://www.chem.agilent.com/Scripts/Generic.ASP?lPage=35540&indcol=Y&prodcol=Y
These downloaded zip files can be dragged and dropped into the GeneSpring GT window to produce an experiment that contains the allele frequencies for each of the populations.

If you have your own allele frequencies, this information must be set up in the “Silicon Genetics Internal SNP” format.
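If your frequencies come from your own genotypes, the arithmetic is allele counting: each called founder contributes two alleles, and each allele's frequency is its count divided by the total. The Python sketch below does that and writes the result as a type=3 SiGSNP2.0 file with one frequency column per population, in the allele:frequency form used in the population examples in this FAQ. It is an independent illustration, not GeneSpring GT's Population data compression. The sorted allele order, the three-decimal rounding and the function names are choices of the sketch.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies for one variation from a list of founder genotypes.

    genotypes: one two-allele tuple per founder, e.g. ("C", "G"), or None for a
    no-call. Each called founder contributes two alleles to the counts.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is not None:
            counts.update(genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


def format_frequencies(freqs, digits=3):
    """Render {"G": 0.639, "C": 0.361} as "C:0.361,G:0.639" (alleles sorted)."""
    return ",".join(f"{allele}:{round(f, digits):g}" for allele, f in sorted(freqs.items()))


def format_population_file(populations, variation_ids):
    """Build a type=3 SiGSNP2.0 file with one frequency column per population.

    populations: list of (population_name, {variation ID: {allele: frequency}}) pairs.
    """
    lines = ["# SiGSNP2.0", "# type=3", "# chroms=2", f"# samples={len(populations)}"]
    lines += [f"# name={name}" for name, _ in populations]
    for vid in variation_ids:
        columns = [format_frequencies(freqs[vid]) for _, freqs in populations]
        lines.append("\t".join([vid] + columns))
    return "\n".join(lines) + "\n"
```

The answer recommends selecting at least 50 founders, so in practice each genotype list passed to `allele_frequencies` would have at least that many entries.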
An allele frequency file for three populations looks like this (columns are tab-delimited):

# SiGSNP2.0
# type=3
# chroms=2
# samples=3
# name=Asian
# name=African-American
# name=Caucasian
rs1030583    C:0.361,G:0.639    C:0.81,G:0.19      C:0.603,G:0.397
rs1030626    C:0.75,G:0.25      C:0.512,G:0.488    C:0.595,G:0.405
rs1030687    C:0.5,T:0.5        C:0.341,T:0.659    C:0.429,T:0.571
rs1030708    G:0.85,T:0.15      G:0.964,T:0.036    G:0.857,T:0.143
rs1030777    G:0.675,T:0.325    G:0.512,T:0.488    G:0.464,T:0.536

This is similar to the format for importing custom data (see “How do I format custom data?”).

You can also create allele frequencies from your own data, usually using only the founders in your experiment. Follow these steps:

1. Select Sample Manager from the Experiments menu. Click the Filter on Experiment tab and select the experiment of choice. Sort on attributes (e.g. Founders) in the Filter Results table. Click each individual in the Filter Results table that you want to use to generate frequencies, then click the Add button to add the samples to the Selected Samples table. At least 50 founders should be selected. Click the Create Experiment button to create an experiment that contains the individuals in the Selected Samples table.
2. Right-click the sample file in the Experiment folder in the Navigator pane and select Inspect from the shortcut menu. Click the Interpretation tab in the Experiment Inspector window. Click Default Interpretation and select Do not Display. Save this as a new interpretation (e.g. population condition). This results in a single averaged condition with that interpretation.
3. Run the following script from the Basic Scripts folder of the Navigator: Merge-Split Groups > Merge Condition to Sample. Select the single averaged condition that you generated in Step 2 from the Experiment folder, and click the Condition button in the Inputs area. Leave the Data compression method as Population. Click the Start button to run the script.
4. A new sample containing population allele frequencies is created and displayed in a Sample Inspector window. Name and save the sample. It is also a good idea to assign a project to the sample at this point. The sample is now available in the Sample Manager window.
5. (Optional) Open the Sample Manager window from the Experiments menu. Sort the samples by date. Add the new population sample to the sample list and create an experiment from it.

What does the option to combine consecutive variations mean?
When set to a value greater than 1, this option creates pseudo-haplotype blocks. For example, if you have 15 variations and the number of consecutive variations equals 3, a total of 13 calculations will be performed, one for each of 13 “haplotype blocks”, each consisting of 3 consecutive variations:
1, 2, 3
2, 3, 4
3, 4, 5
4, 5, 6
...
12, 13, 14
13, 14, 15

What is the extent of pedigree complexity that GeneSpring GT accommodates?
It depends on the type of analysis. GeneSpring GT supports exceedingly large and complex pedigrees and surpasses other applications in performance. We have not tested the limit. However, keep in mind that some analyses with complex pedigrees can run for more than 10 hours.

Is there a way to specify liability / penetrance classes?
Yes, liability classes can be specified for a pedigree. Double-clicking a pedigree in the Navigator folder displays the Pedigree Inspector, where you can edit trait details. For a given phenotype or trait, specify liability classes either as multiple classes or with a formula, using the option “Model for Phenotype Probabilities”.

How can I account for replicate arrays for a single patient?
GeneSpring GT can generate consensus genotype measurements from multiple arrays that represent one sample.
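The consensus rule (a genotype is reported only when every replicate array agrees) can be sketched in a few lines of Python. Two assumptions are made here: genotypes are unordered two-letter strings, so "CG" and "GC" count as the same call, and a missing call on any replicate prevents a consensus, since then not all replicates have the genotype. This illustrates the rule only; in GeneSpring GT the merge is done with the Replicates data compression setting described in this answer, and the function names are hypothetical.

```python
def normalize(genotype):
    """Treat an unordered diplotype such as "GC" as "CG"; None stays None."""
    return None if genotype is None else "".join(sorted(genotype))


def consensus_genotypes(replicates):
    """Merge replicate arrays of one sample into a consensus call per variation.

    replicates: list of dicts mapping variation ID -> genotype string, one per array.
    A variation keeps its genotype only if every replicate reports that same
    genotype; otherwise (disagreement or a missing call) it gets None.
    """
    all_ids = set().union(*replicates)
    consensus = {}
    for vid in sorted(all_ids):
        calls = [normalize(rep.get(vid)) for rep in replicates]
        first = calls[0]
        agreed = first is not None and all(call == first for call in calls)
        consensus[vid] = first if agreed else None
    return consensus
```

With four replicates that all read CG the result is CG; if one of them reads GG, the variation gets no genotype (None).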
In the resulting consensus sample, a genotype is shown for a variation only if the “replicate” samples all have that genotype. For example, if all 4 replicates have CG for a variation, CG is the genotype; if even 1 replicate has GG, no genotype is shown for that variation. This takes three steps:
1. Use Experiments > Experiment Interpretation to create a parameter that gives all arrays from the same sample the same parameter value.
2. Run the script “Merge Conditions to Samples” from the Navigator directory Scripts > Basic Scripts > Merge-Split Groups.
3. Create a new experiment using Experiments > Create New Experiment.

As an example, suppose you have an experiment containing 3 replicates for each of 2 tumor and 2 normal tissues (12 arrays in total). In the Interpretations window, click “Experiment Parameters”. Add a new parameter column named “Tissue Type” and give each of the 12 arrays the value tumor1, tumor2, normal1, or normal2. Save. Set Tissue Type as a Continuous parameter in the Interpretations window and set all other parameters to Do Not Display. Click Save As New to create a new interpretation for the experiment. Click the script “Merge Conditions to Samples”. In the script window, set the new interpretation as the Experiment and use “Replicates” as the Data Compression knob setting. Open Create New Experiments and select the 4 merged conditions that were just created. Make them into a new experiment consisting of the consistent measurements across replicates for each sample.

Which algorithm do you use for Haplotyping?
We use the EM algorithm as described in:
Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12:921-927, 1995.

My program is very slow and I am running out of memory?
The memory can be configured in the lax file (the installer’s default is 1 GB).
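Because the lax file is plain text, the same edit can be scripted. The Python sketch below rewrites the `lax.nl.java.option.java.heap.size.max` line named in this answer and first keeps a backup copy. The default path is the installation folder given in this answer and may differ on your system; the function names and the `.bak` backup are hypothetical choices. The in-application memory preference still has to be set separately, and the new heap size presumably only takes effect the next time GeneSpring GT starts.

```python
import re
import shutil
from pathlib import Path

# Property name and install location as given in this FAQ; adjust for your system.
HEAP_KEY = "lax.nl.java.option.java.heap.size.max"
DEFAULT_LAX = Path(r"C:\Program Files\Agilent\GeneSpring GT\Data\GeneSpring GT.lax")


def set_max_heap(lax_text, heap_bytes):
    """Return lax_text with the maximum-heap property set to heap_bytes."""
    pattern = re.compile(r"^(" + re.escape(HEAP_KEY) + r"=)\d+", re.MULTILINE)
    # Keep the "key=" prefix (group 1) and replace only the number after it.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(heap_bytes), lax_text)
    if count == 0:
        raise ValueError(f"{HEAP_KEY} not found in lax file")
    return new_text


def update_lax_file(path=DEFAULT_LAX, heap_bytes=1_600_000_000):
    """Back up the lax file, then rewrite its heap setting in place."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep the original
    path.write_text(set_max_heap(path.read_text(), heap_bytes))
```

`update_lax_file()` with no arguments applies the 1.6 GB value suggested for a 2 GB machine.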
For a machine with 2 GB of RAM, we suggest setting the maximum heap size to 1.6 GB. Go to C:\Program Files\Agilent\GeneSpring GT\Data and open the file GeneSpring GT.lax with a text editor (e.g. Notepad). In this file, find the text:
lax.nl.java.option.java.heap.size.max=1000000000
and change 1000000000 to 1600000000. Save and exit. Then set the memory preference in GeneSpring GT to 1.4 GB.

In the Master Table of Variations what does right and left flanking sequence mean?
Please see the following sketch to determine the orientation:

+ strand  5' ----------LLLLLLLLLLLLLLLLLLLLLLLLLLLL[SNP ALLELE]RRRRRRRRRRRRRRRRRRRRRRRRRR----------> 3'
- strand  3' <---------RRRRRRRRRRRRRRRRRRRRRRRRRRRR[SNP ALLELE]LLLLLLLLLLLLLLLLLLLLLLLLLL---------- 5'

L = left flank, R = right flank. The left flank lies on the 5’ side of the variation on the strand being read.

Do you have the references for the implemented algorithms?

Nonparametric Linkage
Kruglyak L, Daly M, Reeve-Daly MP, Lander ES. Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach. Am J Hum Genet 58:1347-1363, 1996.

Haplotype Structure and Reconstruction
Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12:921-927, 1995.
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure in the human genome. Nature Genetics 29:229-232, October 2001.

ANOVA
Page GP, Amos CI. Comparison of Linkage-Disequilibrium Methods for Localization of Genes Influencing Quantitative Traits in Humans. Am J Hum Genet 64:1194-1205, 1999.

Autozygosity and LOH
Broman KW, Weber JL. Long Homozygous Chromosomal Segments in Reference Families from the Centre d'Étude du Polymorphisme Humain. Am J Hum Genet 65:1493-1500, 1999.

TDT
Spielman RS, et al. Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-dependent Diabetes Mellitus (IDDM). Am J Hum Genet 52:506-516, 1993.
Spielman RS, Ewens WJ. The TDT and Other Family-Based Tests for Linkage Disequilibrium and Association. Am J Hum Genet 59:983-989, 1996.

Extended Reading
Ott J. Analysis of Human Genetic Linkage, 3rd edition. 1999.

How can I combine experiments?
Please follow this procedure:
1. Import all the samples.
2. Make an experiment where each set of subchips is grouped in a condition (i.e. if you have 10 samples comprising 5 actual samples with A and B subchips, make an experiment with 5 conditions, 1 for each pair of subchips). To do this, set up a parameter so that the samples you want to group are in a condition together. Example: Parameter = Group, with values 1, 2, 3, etc. Give the chips to be grouped the same parameter value. In the interpretation inspector, set the Group parameter to continuous and all other parameters to Do Not Display.
3. Convert the conditions: run the script primitive “Convert Conditions to Samples” (Scripts > Merge-Split Groups > Convert Conditions to Samples). Use the experiment you created above, set the Data Compression knob to Replicates, and click Start. Save the samples.
4. Create a new experiment from the samples that were just created (Experiments > Sample Manager > select the samples and press Add > Create Experiment).
At this point, the original experiment and associated chip samples can be deleted.

I get a different chi-square value when I use Excel?
The chi-square p-values differ because GeneSpring GT constructs the contingency table from alleles, not genotypes. For the variation rs2051727, for instance, there are 33 individuals: 11 cases and 22 controls. In the GeneSpring GT contingency table, each individual contributes 2 counts, one for each copy of the allele. For rs2051727, this gives:

        control    case
A       6          33
B       16         11

The chi-square test is then performed on this contingency table.
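The difference is easy to reproduce. The Python sketch below computes the Pearson chi-square statistic for an allele-based table (2 x 2, one degree of freedom) and for a genotype-based table of the kind one might build in Excel (2 x 3, two degrees of freedom). Because the FAQ gives only allele counts, the genotype counts used here are hypothetical, chosen so that they reproduce the rs2051727 allele table above. The sketch applies no continuity correction, and GeneSpring GT's exact test settings are not stated here, so the numbers may not match the program's output.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def allele_table(groups):
    """Allele-based table: each individual contributes two allele counts [A, B].

    groups: one dict of genotype counts {"AA": n, "AB": n, "BB": n} per group.
    """
    return [[2 * g["AA"] + g["AB"], g["AB"] + 2 * g["BB"]] for g in groups]


def genotype_table(groups):
    """Genotype-based table: each individual contributes one count [AA, AB, BB]."""
    return [[g["AA"], g["AB"], g["BB"]] for g in groups]


# Hypothetical genotype counts chosen to reproduce the rs2051727 allele counts.
control = {"AA": 1, "AB": 4, "BB": 6}
case = {"AA": 12, "AB": 9, "BB": 1}

alleles = allele_table([control, case])  # [[6, 16], [33, 11]]
genotypes = genotype_table([control, case])
print("allele-based:  ", chi_square(alleles), p_value(chi_square(alleles), 1))
print("genotype-based:", chi_square(genotypes), p_value(chi_square(genotypes), 2))
```

The two statistics and their degrees of freedom differ, so the p-values differ even though both tables describe the same individuals.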
Can I analyze the Affymetrix 10K, 100K and 500K array with GeneSpring GT?
Yes, GeneSpring GT can handle data from the Affymetrix 10K, 100K and 500K arrays. The following performance benchmarks were established in house using 1200 samples run on the Affymetrix 500K array, on a Mac OS X system: Mac OS 10.4.2 (JVM 1.4.2_07), 5 GB RAM, 2 CPUs, dual 2 GHz PowerPC G5 processor.

Action                      Time (hr:min:sec)
Deduce Pedigree             1:45:40
Hardy-Weinberg              0:14:00
Deduce Haplotypes           21:21:24
ANOVA                       0:20:00
Qualitative CaseControl     0:02:00
Regression                  0:07:45
Quantitative CaseControl    0:02:00

When I run the case control script I don’t see the phenotype information.
This is a known issue that affects case-control analysis, where customers typically do not link the samples to a pedigree. When samples are not linked to a pedigree, go to the Sample Manager, select the samples that were imported, and click Edit Attribute to enter the phenotype and individual ID. Then create a new experiment and import the parameters. The script will then show the phenotype information.

What does "Standard recessive trait" in the pedigree inspector window mean?
Marking a trait as standard recessive affects the analysis, because the analysis looks for markers whose genotype-phenotype relationship matches the trait model. For example, if a trait is standard recessive, you want to find markers where only the homozygous genotype (for the risk allele) always, or almost always, corresponds to the affected phenotype. If the trait were dominant, you would instead look for markers where either the homozygous or the heterozygous genotype (carrying the risk allele) corresponds to the affected phenotype.

I can’t see my trait in the pedigree viewer?
A trait is shown in the pedigree only when it is set as the primary trait. This setting is at the bottom of the Pedigree Inspector, on the Trait Details tab, in the far-right column called "Display".
There can be only one primary trait; all others are treated as secondary.