Biomathematics 207B/ Biostatistics 237/Human Genetics 207B 2/10/2004 Laboratory #5 – Parametric Linkage Analysis This week we will perform classical two-point linkage analysis using a di-allelic marker and a multi-allelic marker. Traditionally linkage analysis is performed after determining the mode of inheritance and the environmental covariates. This means that segregation analysis has been run first. You have spent the last 4 weeks determining the genetic model for the phenotype natural log triglycerides, so we have a reasonably good idea of the final model. Specifically we found evidence for a Mendelian, additive gene in HWE with age, bmi and agexbmi as covariates. (1) We will start with your best model. Mine was: (a) Phenotype yi lntrigi age ( Agei Age) bm i(bmii bmi ) addGi agexbm i( Agei Age)(bmii bmi ) ei where: Parameter age bmi agexbmi Additive 2 Estimate 3.83 0.0024 0.294 0.012 0.792 0.080 (b) Genotype: Hardy Weinberg Equilibrium holds. P(AA)=qA2, P(Aa)=2(1-qA)qA, P(aa)=(1-qA)2, qˆ A 0.42 and Mendelian transmission probabilities. (2) Change all the initial values to match the best estimates, then check the box “fixed”. We will estimate the recombination fraction, denoted as R in GAP, assuming that these estimates are correct. (3) Defining a diallelic marker Gene, km: In order to perform linkage analysis, we need to define the marker gene, the marker phenotype, and specify the trait-marker pair. Our first marker, KM, is di-allelic and B is dominant to b. The data are already present in your GAP files. (a) Add a marker gene using MODEL-GENES-ADD MARKER (b) km has phenotype KM that corresponds to the marker phenotype in the original data set for triglycerides (see lab #2 for a description). 1 (c) Label the genotype. You may want to give it the same name as the corresponding phenotype, in this case km. (d) Fix the allele frequency for the B allele to be 0.14 (e) Provide the phenotype codes. Since B is dominant to b, only two codes are needed. To match the data in the pedigree file, let 1 correspond to the B phenotype (BB or Bb genotype) and let 2 correspond to the b phenotype (bb genotype). (4) Associate the marker gene to the marker phenotype in the data set. Use MODELPHENOTYPES-ASSIGN MARKER PHENOTYPES. There are two ways to associate the genotype with the phenotype: (a) Method 1: Highlight the marker gene you just defined and then highlight the variable in the database that contains the maker phenotype (KM). Click the ADD radio button. (b) If you gave the marker genotype the same label as the variable in the database then you can use method 2. Method 2: Click on the ASSOCIATE BY NAME button. This option is particularly useful if you are analyzing many markers. (5) Define the linkage between the appropriate trait gene (lnTG for example) and the marker gene. (a) Select MODEL-LINKAGE-ADD LINK. (b) Provide a label for this link (c) Select the trait locus (d) Select the marker locus. Note that if you have several markers, you could select “all markers” and GAP would perform separate linkage analyses for each marker (sequentially) and print all results consecutively in the summary file. (e) Provide a starting value for the recombination fraction (for example 0.10). (f) We will assume linkage equilibrium. If we were not, we would need to specify the starting values here too. (g) Press o.k. Note that in the DEFINED LINKS dialog, the linkage you just defined has an asterisk next to it. This means that the linkage analysis will be performed using this linkage definition. If desired, you could define several links and use the INCLUDE/EXCLUDE LINK button to indicate to GAP which linkages to analyze. (6) Save the model definition file (to KM.amf) and change the output file names to LAB5 (for example) in the METHOD MLE menu and run the analysis. (a) Note the maximum likelihood estimate (MLE) of the recombination fraction (indicated by R) (b) Look at the summary file using RESULTS SELECT FILE AND VIEW FILE. At the bottom of the file is the LOD score at the MLE of the recombination fraction (R) and the LOD scores for selected values of R. Plot the LOD score versus R using a program like excel. Is there significant evidence of linkage? (7) THE MULTI-ALLELIC MARKER. Now we will look at a marker with 3 alleles. The data for this marker are found in two separate files called ACP_2004.DAT and ACP_2004.mdf. Copy the files from the network drive onto your c:\temp subdirectory 2 (8) Use an editor to view the ACP_2004.DAT file. (a) This file contains the marker data for the HGAR1 family (the triglycerides family). (b) The file has three columns, pedigree, subject id, and marker data (ACP). The marker is codominant so there are 6 possible phenotypes. They are coded in the following manner 1=a/a, 2=a/b, 3=a/c, 4=b/b, 5=b/c, 6=c/c. (c) You will need to merge these data with your current data. Before merging make backups of all the files. Copy the Lab2.* files. Also copy your most current *.amf file. (d) Merge in the data for ACP by using KINDRED-FILE-MODIFY-ADD VARIABLE-PERSON FILE. Read in the ACP data as a numeric variable needing 1 space. Use ASCII INPUT to read in ACP_2004.DAT. IMPORTANT!!!!! Remember to check the overwrite duplicates box. (f) Using PERSON-EDIT define 0 to be the missing genotype. (9) Now use an editor to view the ACP_2004.mdf file. This file contains all the information about marker needed to run the analysis. MDF stands for marker definition file. This file is used to tell GAP the number of alleles and the phenotype codes for each genotype. The file format is: Locus Label (up to 15 characters) F or E First allele label First allele frequency . . . Last allele label phenotype code Last allele frequency; genotype(s) . . . phenotype code genotype(s); The F indicates that the allele frequencies should be fixed to the values listed in the file. An E would indicate that they should be estimated using these values as initial values. If we had used this option for entering the KM marker information, our KM.mdf file would look like: KM B b 1 2 F 0.14 0.86; B/B B/b b/b; NOTES: The semicolons (;) are important. They tell the program where the allele frequencies end and where the phenotype codes end. The “/” must be used to define the genotypes and multiple genotypes for one phenotype must each be separated by a space. If you had several markers you could put them all into one *.mdf file provided they each had a unique label. 3 (10) To analyze the marker ACP, define the marker gene. Use MODEL-GENESADD MARKER GENE. Check the Load Marker Definition file box and then select ACP_2004.MDF (11) Assign marker phenotypes with MODEL-PHENOTYPES. Associate the marker gene to the marker phenotype in the data set. Use MODEL-PHENOTYPES-ASSIGN MARKER PHENOTYPES. Define this new linkage with MODEL-LINKAGE. Since you have already run KM, you can exclude this link if you want. (12) Save this model as ACP.AMF and run the analysis. You may need to increase the number of iterations (METHOD-MAXIMUM LIKELIHOOD - EDIT PROCEDURE) to get the analysis to converge. Plot the LOD curve. Which of the two markers looks more likely to be linked to the trait, ACP or KM? (13) What would the results be like if we had dichotomized triglycerides and then conducted a linkage analysis? Suppose we had found the same measured covariates and a dominant Mendelian inheritance pattern. (a) (b) (c) Starting in the KINDRED PROGRAM, categorize triglycerides into 2 groups using a triglyceride value of 119 as the cut off. Use PERSON RECODE and choose the CATEGORIZE option. Create TRICUT with cut point 119. Note that the lower category contains the point, i.e., the first category (1) is Triglyceride<=119 and the second category (2) is Triglyceride>119 (affecteds). Check the computation to see which codes are used for affected and unaffected TRICUT. Check that missing data are handled correctly. (Use PERSON EDIT or PERSON ANALYZE). Since we are now considering a qualitative trait, linear regression is no longer appropriate for the phenotype. Use logistic regression instead. In GAP the regression is specified when you define the TRICUT as dichotomous. Only one phenotype can be defined at any time. Save your amf file then go to MODEL, PHENOTYPE, select the currently defined phenotype, and DELETE TRAIT PHENOTYPE. Then select ADD TRAIT PHENOTYPE. Name the phenotype, select DICHOTOMOUS and select the same measured covariates used in the above analysis. Now, however, use a dominant mode of inheritance and fix the parameters to the following values: Phenotype: logit P(trigi 120) age ( Agei Age) bm i(bmii bmi ) domGi agexbm i( Agei Age)(bmii bmi ) where: Parameter age bmi agexbmi dominant Estimate -7.068 -0.0253 2.344 0.104 4.683 4 Genotype: Hardy Weinberg Equilibrium holds. P(AA)=qA2, P(Aa)=2(1-qA)qA, P(aa)=(1qA)2, qˆ A 0.30 and Mendelian transmission probabilities. (d) Rerun the analysis of markers ACP and KM. Homework: Due Thursday 2/19/04. Answer TWO of the three following questions: (1) What is the effect of model misspecification? Try some examples. (a) Change the model parameters and their values. Change the Mendelian disease gene frequency from 0.42 to 0.12 and rerun the ACP marker. Plot the LOD score versus Ron the provided graph. Compare the estimates of R and its standard error to the ACP results from lab 5. (b) Return the Mendelian allele frequency back to 0.42. Add in a sex effect using the coefficient value you obtained in the segregation analysis. Fix the coefficient value and rerun the ACP marker. (My covariate estimate was equal to -0.085). Again plot the results and compare the estimate and standard error to the analysis conducted in lab for ACP. (c) Remove the sex effect. Change the gene model from additive to dominant (use 3.03 as the coefficient for the dominant model). Rerun the ACP marker. Plot the LOD score versusR. Compare the estimate of R and its standard error to the ACP results from lab 5. (d) Return the gene model to additive. Change the marker allele frequencies in the ACP file so that the a marker has frequency 0.475, the b marker has frequency 0.050 and the c marker has frequency 0.475. Rerun the ACP marker. Plot the LOD score versusR. Compare the estimate of R and its standard error to the ACP results from lab 5. (e) Restore the original marker allele frequencies. Fix the coefficients to your best estimates for a restricted general model, and rerun the ACP marker. Plot the LOD score versusR. Compare the estimate of R and its standard error to the ACP results from lab 5. (2) Joint segregation and linkage analysis with the ACP marker. It is possible to jointly estimate the segregation parameters and the recombination fraction. The LOD scores in GAP must be recalculated however before they can be interpreted. (a) Allow all the parameters except the marker allele frequencies to vary. Select EDITGENES. Select the Mendelian gene and EDIT GENES. Remove the check mark on FIXED by the initial value. Set all the initial values for all the covariates, the intercept and variance in the same manner. Record the estimates of the parameters, including their standard errors. NOTE: Be on the lookout for local maxima. (b) Compare these estimates to the estimates from the separate segregation analysis and linkage analyses. Ideally you should run several starting values for the disease gene frequency and the recombination fraction. 5 (c) NOTE: WHEN PERFORMING JOINT SEGREGATION AND LINKAGE, THE LOD SCORES IN THE SUMMARY FILE ARE OVERESTIMATES OF THE TRUE LOD SCORES. This is because GAP fixes the segregation parameters to those found jointly with the mle value of R, then computes the likelihood at a grid of Rvalues. To obtain the true LOD scores, you need to maximize the likelihood at fixed values of R, allowing all the segregation parameters to vary. To get the correct values, run separate runs of GAP with R fixed at 0.50, 0.01, 0.10, 0.15, and 0.20 and allow all the segregation to vary. Record the natural log likelihoods in the following table. (d) Now compute the LOD score for each R including the mle using this formula: LOD(Ra)=[lnlikelihood(Ra)-lnlikelihood(0.50)]/ln(10). (e) Hand in the computed lod score table for joint segregation and linkage. Explicitly write out the null hypothesis and the alternative hypothesis corresponding to these LOD scores. TABLE FOR PROBLEM 2: Recombination fraction Ln-likelihood LOD Score R̂ R= 0.01 R = 0.10 R = 0.15 R = 0.20 R = 0.50 0.00 (3) The dichotomous model: (a) Interpret the segregation model for the dichotomous trait used in lab 5: What is probability of carrying the risk allele A? What is the probability that a person of average age and bmi has high triglycerides (Trig>119) if their genotype is AA? What is the odds ratio comparing the risk of high triglycerides for AA individuals with aa individuals when both are age 29 and have bmi 3.5? What is the odds ratio comparing the risk of high triglycerides for Aa individuals with aa individuals when both of age 75 and bmi 3.0? (b) How do your linkage results differ from the results for the continuous trait? Suggest some reasons for these differences. 6