Carsten Rosenow, Ph.D. Linkage Analysis Programs

PROGRAMS FOR LINKAGE ANALYSIS
d-CHIPSNP PRO

1. Generating the Pedigree file:

For linkage analysis, a pedigree file needs to be generated. This file contains all the information necessary to reconstruct the relationships between the individuals in a family. The file contains a minimum of five items (Columns A-E). In addition, this file also contains the phenotype information (Column F):

1. Family identifier (Numeric, Column A). This identifier is the same for every member of the same family, including spouses etc.
2. Individual identifier (Numeric, Column B). Each individual in a pedigree is assigned a unique ID.
3. Father ID (Numeric, Column C). This column contains the individual ID of this person's father. If the father is unknown, put zero in this column (in this case this person is a founder). [dChip requires a founder to have both parents missing. If only one parent is missing, add the missing parent as an individual whose own father and mother IDs are both 0, with missing data for all markers.]
4. Mother ID (Numeric, Column D). This column contains the individual ID of this person's mother. If the mother is unknown, put zero in this column.
5. Sex (Numeric, Column E). Put the sex of the person in this column (1: male; 2: female).
6. Phenotype information. The next columns contain information on phenotypes for discrete and quantitative traits. Disease status is usually encoded in Column F with 1: unaffected; 2: affected; 0: missing phenotype. Quantitative traits can be added in the following columns using numerical values. [dChip accepts only one “Affected” column and no quantitative traits.]

Save this file as a pedigree file (extension .ped):
i. > File > Save as > Save as type > text (Tab delimited) > file name XXX.ped > save

2. Generating the GDAS output:

Analyze the genotype data from the Mapping 10K array using GCOS/GDAS.
Export ONLY the Affy SNP ID column and the data columns. Save as a tab delimited text file.
Export the GDAS table into your experiment folder in Excel format.
Open the file in Excel.
Select all column headers that contain the names of the experiment files:
> edit > copy
Save the GDAS file in text format:
> File > Save as > Save as type > text (Tab delimited) > file name XXX.txt > save
Confidential Page 1 2/15/2016
Open the pedigree file that was generated earlier.
Paste the headers of the genotype columns as the 6th column of the pedigree file. Each individual in the pedigree file now has a genotype column assigned that links that person to the GDAS file.
> edit > paste special > check the ‘transpose’ box > click: OK

MAKE SURE EACH PERSON IN THE PEDIGREE FILE IS ASSOCIATED WITH THE RIGHT COLUMN IN THE GDAS OUTPUT. ANY PERSON WHO HAS NO GENOTYPE DATA SHOULD HAVE THE ENTRY “NA” IN THIS COLUMN.

Save the pedigree file in text format:
> File > Save as > Save as type > text (Tab delimited) > file name XXX.ped > save

Summary: You should now have two files. One is the pedigree file, which holds all the family information (this file can contain multiple families). The other is the GDAS file, which contains the genotype information. The two files are linked through a common key: the name of the experiment file, which ties each person in the pedigree file to that person's genotype information in the GDAS file. Both files need to be in tab delimited text format.

3. dChipSNP

The following files need to be located in the dChipSNP folder. All of these files come with the program.

SNP Genome Information File
o 11_k_snp_genome_info_hg15_AFAMfreq.xls: This file is used for African American populations. It contains the allele frequencies for this population group.
o 11_k_snp_genome_info_hg15_asian.xls: This file is used for Asian populations. It contains the allele frequencies for this population group.
o 11_k_snp_genome_info_hg15_Caufreq.xls: This file is used for Caucasian populations.
It contains the allele frequencies for this population group.
cytoBand hg15.txt: Contains information about the cytobands.
hu refGene hg15.xls: Contains gene annotation information.

In addition, you need the data file from GDAS with the genotype information and SNP identifiers, and the pedigree file. Once all of these files are in your folder, you can start using the dChipSNP program. Double-click on the dChip.exe icon in the folder and the user interface will open. From there we will do a step-by-step linkage analysis.

Analysis > Get External Data > fill in the name of the study (group name) > select the Data file (GDAS file) > select the SNP tab > click ok

The screen shows general information about your experiment. Make sure all information is correct before you continue.

Analysis > Chromosome > Genome information file: select the genome info file > refGene file (optional): select the refGene file if you want gene information in your output > Cytoband file (optional): select the Cytoband file if you want Cytoband information in your output > Analysis method: Linkage analysis > click ok

The program now reads all the information and generates the Chromosome view.

Chromosome > Linkage analysis > select the pedigree file > select dominant or recessive disease based on your disease model > if you want simulation, select the simulation tab > if you want to run a Mendel error check, select the Detect Mendelian inheritance error tab > for sib-pair analysis, select the sibpair tab > click ok
[The sib-pair option is an experimental function that analyzes every pair of siblings by parametric linkage analysis; it is not sib-pair analysis in the general sense.]

If you have one chromosome selected in your Chromosome view, the LOD scores for that chromosome will be calculated. If you check Chromosome > Show all, the LOD scores for all chromosomes will be calculated. This might take a while, depending on the size of your pedigree. The maximum pedigree size is about 17 bits (based on the formula 2N - F < 18, where N = number of non-founders and F = number of founders).

CONGRATULATIONS
You have just finished your first linkage analysis using dChipSNP. Please familiarize yourself with the additional features of the program and its different visualization options.
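
Before loading a pedigree into dChipSNP, it can help to check the two pedigree constraints described above: a founder must have both parents coded as 0, and the pedigree size 2N - F should stay below 18. The following Python sketch is not part of dChipSNP. The function name pedigree_bits, the sample family, and the assumptions that the .ped file has no header row and that the limit applies to each family separately are all illustrative choices, not statements about how the program works internally.

```python
import csv
from collections import defaultdict

MAX_BITS = 18  # the guide states the limit as 2N - F < 18


def pedigree_bits(lines):
    """Return {family_id: (N, F, 2N - F)} for tab-delimited pedigree rows.

    Reads only columns A-D (family ID, individual ID, father ID, mother ID).
    Assumes there is no header row (an assumption, not stated in the guide).
    """
    counts = defaultdict(lambda: [0, 0])  # family -> [non_founders, founders]
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) < 5:
            raise ValueError(f"expected at least 5 columns, got {row!r}")
        family, person = row[0].strip(), row[1].strip()
        father, mother = int(row[2]), int(row[3])
        # dChip rule from the guide: a founder has BOTH parents coded as 0.
        if (father == 0) != (mother == 0):
            raise ValueError(
                f"family {family}, individual {person}: only one parent is 0"
            )
        is_founder = father == 0
        counts[family][1 if is_founder else 0] += 1
    return {fam: (n, f, 2 * n - f) for fam, (n, f) in counts.items()}


# Hypothetical nuclear family: two founders (IDs 1, 2) and three children.
SAMPLE = (
    "1\t1\t0\t0\t1\t1\n"
    "1\t2\t0\t0\t2\t1\n"
    "1\t3\t1\t2\t1\t2\n"
    "1\t4\t1\t2\t2\t2\n"
    "1\t5\t1\t2\t1\t1\n"
)

for fam, (n, f, bits) in pedigree_bits(SAMPLE.splitlines()).items():
    status = "ok" if bits < MAX_BITS else "too large"
    print(f"family {fam}: N={n}, F={f}, bits={bits} ({status})")
# prints: family 1: N=3, F=2, bits=4 (ok)
```

To check a real file, pass an open file handle instead of the sample, for example pedigree_bits(open("XXX.ped", newline="")). A ValueError points to a person with exactly one parent coded as 0, which the guide says should be resolved by adding the missing parent as an ungenotyped founder.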