040310 Write up: Family Based Association

Biostatistics 237 / Biomathematics 207B / HG207B
March 9, 2004
Account name: m237
Password: winter2002, win2002

Laboratory #9: FAMILY BASED ASSOCIATION TESTS: TDT AND GAMETE COMPETITION

(A) The data sets:

This exercise has two parts. In part I, we will run the TDT and the gamete competition on angiotensin I-converting enzyme (ACE) as a qualitative trait. The ACE gene is located on 17q23. When running the TDT or the gamete competition for qualitative traits, we will consider anyone with an ACE level of less than 0.648 to be affected. The data set for part I consists of extended and nuclear families from Oxford phenotyped for ACE and genotyped for the insertion-deletion (ID) polymorphism and the highly informative polymorphism in the neighboring growth hormone (GH) gene. As a prelude to part I we will run the combining_alleles option of Mendel 5.0 to reduce the number of GH alleles and avoid sparse data problems.

In part II we will use the gamete_competition option as a test of family based association with a quantitative trait. The data set for part II consists of extended and nuclear families from Jamaica phenotyped for ACE. We will examine 3 SNPs located within the ACE locus. The data consist of ACE levels on 405 people and SNP data on 489 people in 83 pedigrees. We will first run each SNP separately and then we will use the SNPs in combination.

Copy the following data sets from the F:\class\bio237 folder to your directory:

Part I files: pedoxf.in locoxf.in mapoxf.in concomb.in contdt.in congam.in
Part II files: Consnp.in locsnp.in mapsnp.in varace.in pedsnp.in

Part I: TDT and gamete competition for a qualitative trait

(B) Reducing the number of GH alleles.

To avoid having a very large number of possible cells, many with no data, we will combine alleles at the GH locus. This is absolutely necessary for the gamete competition; otherwise it will run extremely slowly.
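To get a feel for how quickly the number of cells grows with the number of alleles, here is a minimal Python sketch. It assumes that "cells" can be read either as unordered genotypes or as ordered (transmitted, not transmitted) allele pairs from a heterozygous parent. The function names are hypothetical, and 20 and 7 are used only as illustrative before-and-after allele counts.

```python
def genotype_count(n_alleles):
    """Number of unordered genotypes at a locus with n_alleles codominant alleles."""
    return n_alleles * (n_alleles + 1) // 2


def transmission_cell_count(n_alleles):
    """Number of ordered (transmitted, not transmitted) pairs of distinct alleles."""
    return n_alleles * (n_alleles - 1)


# Illustrative allele counts before and after combining (assumed values).
for n in (20, 7):
    print(n, "alleles:",
          genotype_count(n), "genotypes,",
          transmission_cell_count(n), "transmission cells")
```

Both counts grow roughly with the square of the number of alleles, so with a fixed number of families most cells are empty unless rare alleles are pooled.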
The TDT will run reasonably well without collapsing the number of alleles; however, because of their discrete nature, very sparse data will lead to false inference with both methods. Combining very rare alleles will avoid this problem. The control file in this case, concomb.in, is the following:

concomb.in:

    !input files
    LOCUS_FILE=LOCoxf.IN
    MAP_FILE=MApoxf.IN
    PEDIGREE_FILE=PEDoxf.IN
    ! reading input
    AFFECTED=1
    AFFECTED_LOCUS_OR_FACTOR=ACE
    READ_PEDIGREE_RECORDS = F
    pedigree_list_read=true
    allele_separator=
    male =1
    female=2
    !new output
    new_pedigree_file=pednew.in
    new_locus_file=locnew.in
    ! analysis options
    ANALYSIS_OPTION=Combining_alleles
    OUTPUT_FILE=comb.out
    Maximum_combined_alleles=7

This Mendel option creates a new locus file and a corresponding pedigree file, so it is important to specify the new file names. Combining_alleles uses the allele frequencies in the locus file to determine which alleles will be combined. The program combines alleles until there are no more than the user-specified maximum number of alleles and each remaining allele is at least as frequent as the user-specified minimum allele frequency. The defaults are a maximum of 10 alleles and a minimum allele frequency of 0.05. The minimum number of alleles is 2, even if one of them has an allele frequency less than the specified minimum allele frequency.

Run the combining alleles option of Mendel 5.0 using Gregor by reading in this control file, writing out the control.in file and selecting the option "Run Mendel". Examine the new pedigree file and locus file and note the changes. The new pedigree file is formatted (the top line gives the Fortran format) and the number of alleles at the GH locus has been reduced.

    (5(1X,A8),(T51,3(1X,A8),:))
    1 1 1 1-2 20-20 1-2 8-8 2 1 2 2 1 2

The new locus file decodes the combined alleles.

    GH        Autosome  7  0
    6         0.12378   ORIGINAL ALLELE NUMBERS: 6 9 14
    7         0.08870   ORIGINAL ALLELE NUMBERS: 3 4 5 7 12
    8         0.10916   ORIGINAL ALLELE NUMBERS: 1 2 8 10 18
    11        0.10039   ORIGINAL ALLELE NUMBERS: 11 16
    13        0.07115   ORIGINAL ALLELE NUMBERS: 13 15 17
    19        0.13158   ORIGINAL ALLELE NUMBERS: 19
    20        0.37524   ORIGINAL ALLELE NUMBERS: 20

Note, for example, that alleles 1, 2, 10 and 18 have all been combined with allele 8.

(C) Running the TDT

The control file now uses the new pedigree and locus files. The pedigree file is formatted, so we no longer have the command pedigree_list_read=true. Instead we use the default, pedigree_list_read=false.

contdt.in:

    !input files
    LOCUS_FILE=LOCnew.IN
    MAP_FILE=MAPoxf.IN
    PEDIGREE_FILE=PEDnew.IN
    ! reading input
    AFFECTED=1
    AFFECTED_LOCUS_OR_FACTOR=ACE
    READ_PEDIGREE_RECORDS = F
    allele_separator=
    male =1
    female=2
    !new output
    new_pedigree_file=pednew.in
    new_locus_file=locnew.in
    ! analysis options
    ANALYSIS_OPTION=TDT
    OUTPUT_FILE=TDT.out
    Summary_File=TDTsum.out
    samples=100000

In the pedigree file males are designated with 1 and females with 2. Affecteds (ACE less than -0.648) are designated as 1 and unaffecteds as 2. Because the p-values are estimated by Monte Carlo simulation, we need to specify the number of samples. The default is 10,000 but we have increased the number to 100,000.

Run the TDT option of Mendel 5.0 using Gregor by reading in this control file, writing out the control.in file and selecting the option "Run Mendel". There will be two output files, a summary file and a full output file. Examine them both. Note that the p-value is given as 0.0000. The actual p-value is not 0.0000. It is reported as such because none of the 100,000 samples gave a statistic as extreme as or more extreme than the observed statistic. In this case you should report the p-value as "less than 1 x 10^-5" (that is, < 1/samples).

(D) Running the gamete competition on a qualitative trait.

We will now analyze the data in pednew.in using the gamete competition.
The gamete competition uses data from all the affecteds in the pedigree rather than just the trios with affected children. It allows for missing data. The control file, congam.in, has the following form:

    !input files
    LOCUS_FILE=LOCnew.IN
    MAP_FILE=MApoxf.IN
    PEDIGREE_FILE=PEDnew.IN
    ! reading input
    AFFECTED=1
    AFFECTED_LOCUS_OR_FACTOR=ACE
    READ_PEDIGREE_RECORDS = F
    allele_separator=
    male =1
    female=2
    !new output
    new_pedigree_file=pednew.in
    new_locus_file=locnew.in
    ! analysis options
    ANALYSIS_OPTION=gamete_competition
    model=2
    OUTPUT_FILE=gam.out
    Summary_File=gamsum.out

The notable differences between this control file and the one for the TDT are:

(1) No samples are specified (asymptotic p-values only).
(2) There are model options. Models 1 and 2 are for qualitative traits. Models 3 and 4 are for quantitative traits. Models 1 and 3 use the allele frequencies given in the locus file. Models 2 and 4 jointly estimate the allele frequencies.

Run the gamete competition option of Mendel 5.0 using Gregor by reading in this control file, writing out the control.in file and selecting the option "Run Mendel". Again there will be two output files, a summary file and a full output file. Examine them both and compare the results with the results for the TDT.

PART II: Running the gamete competition on a quantitative trait.

(E) The input files.

The control file, Consnp.in, contains:

    !input files
    MAP_FILE = mapsnp.in
    PEDIGREE_FILE = Pedsnp.in
    variable_file=varace.in
    LOCUS_FILE = locsnp.in
    ! output files
    SUMMARY_FILE = Sumsnp.out
    OUTPUT_FILE = Mendsnp.out
    ! instructions to read input
    map_list_read=true
    MALE = 1
    FEMALE = 2
    quantitative_trait=ACE
    ! analysis specific information
    analysis_option=Gamete_competition
    MODEL = 4
    Transform = STANDARDIZE::ACE

Because we are running a quantitative trait and we want to jointly estimate the allele frequencies, the model option is 4. We need to specify a variable_file and the name of the quantitative trait.
We asked that the trait be standardized (subtracting off the mean and dividing by the standard deviation), although it isn't necessary in this case because the ACE values have already been standardized in the process of adjusting for age and sex differences.

There are some changes in the locus file and the pedigree file from the first part of the lab. The SNPs have already been combined for you into a single locus. Two of the 8 haplotypes were estimated to be very rare, so they were combined with other haplotypes. The markers are treated as non-codominant, so we must specify the relationship of the phenotypes to the genotypes in the locus file.

    t469     AUTOSOME  6 27
    ATA      0.40190
    ATG      0.00780
    ACA      0.06740
    ACG      0.18310
    TEA      0.01340      TEA is TTA+TCA
    TEG      0.32640      TEG is TTG+TCG

Note that because 122 denotes A/A T/C A/G, a double heterozygote, we need to specify that there are two haplotype configurations that are consistent with the multilocus genotype.

    111      1
     ATA/ATA
    .
    .
    122      2
     ATA/ACG
     ATG/ACA

We will also run the SNPs as single loci. These have also been coded with a single number designation, so we need to "decode" them in the locus file.

    Snp4     AUTOSOME  2  3
    A        0.80000
    T        0.20000
    1        1
     A/A
    2        1
     A/T
    3        1
     T/T

Finally, I have used the Fortran format to read in the single loci snp4, snp6 and snp9 as well as the multilocus SNP genotype for SNPs 4, 6 and 9 combined.

    (3X,I5,A8)
    (16X,3A8,7X,2A1,T69,5X,3A1,T69,2(2X,2A8))
    10 1 1 2 1 2 112 112 122 122 -0.395 -1.788

This is an "old style" MENDEL pedigree file. The first Fortran format statement reads in the number of individuals in the pedigree and the family id number. The second Fortran format statement reads the information for each individual. There are data for 4 multilocus SNP combinations and we want to use only the last one. We could set this up through the map file, but here I have just skipped over all the data I didn't want to include using T69 (tab to the 69th column).
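The tab-back trick can be illustrated with a small Python sketch. This is not Mendel's reader: `read_fixed` is a hypothetical helper that understands only three edit descriptors (`nX`, `An`, `Tn`), repeat counts such as `3A1` are written out one by one, and the record and its column layout are invented for the example.

```python
import re


def read_fixed(record, descriptors):
    """Read fields from a fixed-width record with a tiny subset of Fortran edit descriptors.

    'nX' skips n columns, 'An' reads n characters, 'Tn' jumps to column n (1-based).
    """
    pos = 0  # 0-based index of the next column to be read
    fields = []
    for d in descriptors:
        m = re.fullmatch(r"(\d+)X", d)
        if m:
            pos += int(m.group(1))
            continue
        m = re.fullmatch(r"A(\d+)", d)
        if m:
            width = int(m.group(1))
            fields.append(record[pos:pos + width].strip())
            pos += width
            continue
        m = re.fullmatch(r"T(\d+)", d)
        if m:
            pos = int(m.group(1)) - 1
            continue
        raise ValueError(f"unsupported descriptor: {d}")
    return fields


# Hypothetical record: an ID in columns 1-5 and three one-character SNP codes in columns 11-13.
record = "00017" + " " * 5 + "122"

# Read the three SNPs one at a time, tab back to column 11, then read them again as one field.
fields = read_fixed(record, ["A5", "T11", "A1", "A1", "A1", "T11", "A3"])
print(fields)
```

The same columns are consumed twice: once as three single-locus codes and once as a single multilocus code.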
I first read in the data for the individual SNPs, then I return to the same column position and read in the data as a multilocus SNP.

(F) Run Mendel 5.0 using the Gregor interface.

Load in Consnp.in, write a new control.in file and run.

The output: There is a summary file that should look like:

    MARKER   P-VALUE   MAX       FREQ      ALLELE   MIN        FREQ      ALLELE
    NAME               OMEGA               NAME     OMEGA                NAME
    Snp4     0.00000   1.07786   0.33534   T         0.00000   0.66466   A
    Snp6     0.00000   0.00000   0.57608   C        -1.21367   0.42392   T
    Snp9     0.00000   1.40464   0.51465   G         0.00000   0.48535   A
    t469     0.00000   1.52939   0.32207   TEG       0.00000   0.40515   ATA

There is also a more complete output file with the actual test statistics, all parameter estimates and their standard errors. The statistics are:

    THE LIKELIHOOD RATIO TEST STATISTIC IS 0.4917E+02 AT LOCUS Snp4.
    THE LIKELIHOOD RATIO TEST STATISTIC IS 0.6006E+02 AT LOCUS Snp6.
    THE LIKELIHOOD RATIO TEST STATISTIC IS 0.7655E+02 AT LOCUS Snp9.
    THE LIKELIHOOD RATIO TEST STATISTIC IS 0.8129E+02 AT LOCUS t469.

(G) NO Homework - Please start working on your project data. In next week's laboratory I will reserve time at the end for you to get help running Mendel with your project data if you are having problems.
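For the likelihood ratio statistics listed in section (F), the following Python sketch shows why the summary file prints the asymptotic p-values as 0.00000. It assumes each statistic is compared to a chi-square distribution with degrees of freedom equal to the number of alleles minus one (1 for each SNP, 5 for the six t469 haplotypes). That choice of degrees of freedom is an assumption for illustration and should be checked against the full Mendel output. The helper `chi2_sf` is hypothetical and uses closed forms that are valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^(j-1/2) / (1*3*...*(2j-1))
    total = 0.0
    term = math.sqrt(x)  # j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Statistics from the output file; degrees of freedom are ASSUMED to be (number of alleles - 1).
lrt = {"Snp4": (49.17, 1), "Snp6": (60.06, 1), "Snp9": (76.55, 1), "t469": (81.29, 5)}
p_values = {locus: chi2_sf(stat, df) for locus, (stat, df) in lrt.items()}

for locus, p in p_values.items():
    print(f"{locus}: p = {p:.3e}")
```

Under these assumptions every p-value is far smaller than 0.000005, so each one rounds to 0.00000 at five decimal places.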
