Parametric Linkage Analysis

Biomathematics 207B/ Biostatistics 237/Human Genetics 207B
2/10/2004
Laboratory #5 – Parametric Linkage Analysis
This week we will perform classical two-point linkage analysis using a di-allelic marker
and a multi-allelic marker.
Traditionally, linkage analysis is performed after the mode of inheritance and the
environmental covariates have been determined, that is, after segregation analysis has
been run. You have spent the last 4 weeks determining the genetic model for the phenotype
natural log triglycerides, so we have a reasonably good idea of the final model.
Specifically, we found evidence for a Mendelian, additive gene in Hardy-Weinberg
equilibrium (HWE), with age, bmi and agexbmi as covariates.
(1) We will start with your best model. Mine was:
(a) Phenotype
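$$y_i = \mathrm{lntrig}_i = \mu + \beta_{age}(Age_i - \overline{Age}) + \beta_{bmi}(bmi_i - \overline{bmi}) + \beta_{add} G_i + \beta_{agexbmi}(Age_i - \overline{Age})(bmi_i - \overline{bmi}) + e_i$$
where:
Parameter                        Estimate
$\mu$ (intercept)                3.83
$\beta_{age}$                    0.0024
$\beta_{bmi}$                    0.294
$\beta_{agexbmi}$                0.012
$\beta_{add}$ (additive)         0.792
$\sigma^2$ (residual variance)   0.080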
(b) Genotype:
Hardy-Weinberg equilibrium holds: $P(AA) = q_A^2$, $P(Aa) = 2q_A(1-q_A)$, $P(aa) = (1-q_A)^2$,
with $\hat{q}_A = 0.42$, and Mendelian transmission probabilities.
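As a quick check on the genotype model, the HWE genotype frequencies implied by $\hat{q}_A = 0.42$ can be computed directly. This is a minimal Python sketch; the function name is our own and is not part of GAP.

```python
def hwe_genotype_freqs(q_a: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg equilibrium for alleles A and a.

    q_a is the frequency of allele A; allele a has frequency 1 - q_a.
    """
    if not 0.0 <= q_a <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p_a = 1.0 - q_a  # frequency of allele a
    return {"AA": q_a ** 2, "Aa": 2 * q_a * p_a, "aa": p_a ** 2}


if __name__ == "__main__":
    # q-hat_A = 0.42 from the best segregation model
    for genotype, freq in hwe_genotype_freqs(0.42).items():
        print(f"P({genotype}) = {freq:.4f}")
```

With $\hat{q}_A = 0.42$ this gives P(AA) = 0.1764, P(Aa) = 0.4872 and P(aa) = 0.3364, which sum to 1.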
(2) Change all the initial values to match the best estimates, then check the box “fixed”.
We will estimate the recombination fraction, denoted as R in GAP, assuming that
these estimates are correct.
(3) Define a di-allelic marker gene, km:
In order to perform linkage analysis, we need to define the marker gene, the marker
phenotype, and specify the trait-marker pair. Our first marker, KM, is di-allelic and B is
dominant to b. The data are already present in your GAP files.
(a) Add a marker gene using MODEL-GENES-ADD MARKER
(b) km has phenotype KM that corresponds to the marker phenotype in the
original data set for triglycerides (see lab #2 for a description).
(c) Label the genotype. You may want to give it the same name as the
corresponding phenotype, in this case km.
(d) Fix the allele frequency for the B allele to be 0.14
(e) Provide the phenotype codes. Since B is dominant to b, only two codes are
needed. To match the data in the pedigree file, let 1 correspond to the B
phenotype (BB or Bb genotype) and let 2 correspond to the b phenotype (bb
genotype).
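Because B is dominant, two genotypes share phenotype code 1. The sketch below (a hypothetical helper, assuming HWE at the marker) shows the phenotype-code frequencies implied by fixing the B allele frequency at 0.14.

```python
def dominant_marker_phenotype_freqs(freq_B: float) -> dict:
    """Expected phenotype-code frequencies for a di-allelic marker with B dominant to b.

    Assumes Hardy-Weinberg equilibrium. Codes follow the pedigree file:
    1 = B phenotype (BB or Bb genotype), 2 = b phenotype (bb genotype).
    """
    freq_b = 1.0 - freq_B
    geno_BB = freq_B ** 2
    geno_Bb = 2 * freq_B * freq_b
    geno_bb = freq_b ** 2
    # BB and Bb look identical, so their frequencies pool into code 1
    return {1: geno_BB + geno_Bb, 2: geno_bb}


if __name__ == "__main__":
    for code, freq in dominant_marker_phenotype_freqs(0.14).items():
        print(f"phenotype code {code}: {freq:.4f}")
```

Only about 26% of people (0.2604) are expected to show phenotype 1, and among them BB and Bb cannot be told apart, which limits how much linkage information a dominant di-allelic marker carries.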
(4) Associate the marker gene with the marker phenotype in the data set. Use MODEL-PHENOTYPES-ASSIGN MARKER PHENOTYPES. There are two ways to
associate the genotype with the phenotype:
(a) Method 1: Highlight the marker gene you just defined and then highlight the
variable in the database that contains the marker phenotype (KM). Click the ADD
radio button.
(b) If you gave the marker genotype the same label as the variable in the database
then you can use method 2. Method 2: Click on the ASSOCIATE BY NAME
button. This option is particularly useful if you are analyzing many markers.
(5) Define the linkage between the appropriate trait gene (lnTG for example) and the
marker gene.
(a) Select MODEL-LINKAGE-ADD LINK.
(b) Provide a label for this link
(c) Select the trait locus
(d) Select the marker locus. Note that if you have several markers, you could select
“all markers” and GAP would perform separate linkage analyses for each marker
(sequentially) and print all results consecutively in the summary file.
(e) Provide a starting value for the recombination fraction (for example 0.10).
(f) We will assume linkage equilibrium. If we did not, we would also need to specify
starting values for the disequilibrium parameters here.
(g) Press OK. Note that in the DEFINED LINKS dialog, the linkage you just defined
has an asterisk next to it. This means that the linkage analysis will be performed
using this linkage definition. If desired, you could define several links and use the
INCLUDE/EXCLUDE LINK button to indicate to GAP which linkages to
analyze.
(6) Save the model definition file as KM.amf, change the output file names (to LAB5,
for example) in the METHOD MLE menu, and run the analysis.
(a) Note the maximum likelihood estimate (MLE) of the recombination fraction
(indicated by R)
(b) Look at the summary file using RESULTS SELECT FILE AND VIEW FILE. At
the bottom of the file is the LOD score at the MLE of the recombination fraction
(R) and the LOD scores for selected values of R. Plot the LOD score versus R
using a program like Excel. Is there significant evidence of linkage?
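If you prefer a script to copying by hand, the (R, LOD) pairs from the summary file can be turned into a CSV for Excel and checked against the conventional two-point thresholds (LOD ≥ 3 for significant evidence of linkage, LOD ≤ -2 to exclude linkage at that R). This is a sketch; the function names are hypothetical and nothing here reads GAP output directly.

```python
import csv
import io

LINKAGE_LOD = 3.0     # conventional threshold for significant evidence of linkage
EXCLUSION_LOD = -2.0  # conventional threshold for excluding linkage at a given R


def summarize_lod_curve(points):
    """points: (R, LOD) pairs copied from the GAP summary file (the MLE row may be included)."""
    pts = sorted(points)  # tuples sort by R first
    if not pts:
        raise ValueError("no (R, LOD) pairs supplied")
    r_best, lod_best = max(pts, key=lambda p: p[1])
    return {
        "r_at_max": r_best,
        "max_lod": lod_best,
        "linkage_significant": lod_best >= LINKAGE_LOD,
        "r_excluded": [r for r, lod in pts if lod <= EXCLUSION_LOD],
    }


def lod_curve_csv(points) -> str:
    """Two-column CSV text (R, LOD) that Excel or any plotting tool can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["R", "LOD"])
    writer.writerows(sorted(points))
    return buf.getvalue()
```

Put your own pairs in a list such as `points = [(r1, lod1), (r2, lod2), ...]`, call `summarize_lod_curve(points)`, and write `lod_curve_csv(points)` to a file (for example `lab5_lod.csv`, an arbitrary name) to open in Excel.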
(7) THE MULTI-ALLELIC MARKER. Now we will look at a marker with 3 alleles.
The data for this marker are found in two separate files called ACP_2004.DAT and
ACP_2004.mdf. Copy the files from the network drive into your c:\temp
directory.
(8) Use an editor to view the ACP_2004.DAT file.
(a) This file contains the marker data for the HGAR1 family (the triglycerides
family).
(b) The file has three columns, pedigree, subject id, and marker data (ACP). The
marker is codominant, so there are 6 possible phenotypes. They are coded as
follows: 1=a/a, 2=a/b, 3=a/c, 4=b/b, 5=b/c, 6=c/c.
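The ACP codes follow a simple rule: list the unordered allele pairs in allele order and number them from 1. A codominant marker with n alleles has n(n+1)/2 genotypes, each its own phenotype, so 3 alleles give 6 codes. The sketch below reproduces the coding; the function name is illustrative only.

```python
from itertools import combinations_with_replacement


def codominant_phenotype_codes(alleles):
    """Number each unordered genotype of a codominant marker 1, 2, 3, ... in allele order.

    With n alleles there are n * (n + 1) / 2 genotypes, each its own phenotype.
    """
    pairs = combinations_with_replacement(alleles, 2)
    return {code: f"{a1}/{a2}" for code, (a1, a2) in enumerate(pairs, start=1)}


ACP_CODES = codominant_phenotype_codes(["a", "b", "c"])

if __name__ == "__main__":
    for code, genotype in ACP_CODES.items():
        print(code, genotype)
```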
(c) You will need to merge these data with your current data. Before merging make
backups of all the files. Copy the Lab2.* files. Also copy your most current
*.amf file.
(d) Merge in the data for ACP by using KINDRED-FILE-MODIFY-ADD
VARIABLE-PERSON FILE. Read in the ACP data as a numeric variable
needing 1 space. Use ASCII INPUT to read in ACP_2004.DAT.
IMPORTANT!!!!! Remember to check the overwrite duplicates box.
(e) Using PERSON-EDIT, define 0 to be the missing genotype code.
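Before merging, it can help to sanity-check ACP_2004.DAT outside KINDRED. The sketch below assumes the three columns are whitespace-separated (check the real file first), stores code 0 as missing, and rejects codes outside 0 to 6. The merge function is only a plain-Python analogue of what "overwrite duplicates" means; it is not how KINDRED works internally, and all names are hypothetical.

```python
def read_marker_file(lines, max_code=6):
    """Parse 'pedigree subject-id marker-code' records into {(pedigree, id): code}.

    Assumes whitespace-separated columns; code 0 (missing) is stored as None.
    """
    records = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, found {len(fields)}")
        pedigree, subject, code_text = fields
        code = int(code_text)
        if not 0 <= code <= max_code:
            raise ValueError(f"line {line_no}: marker code {code} is out of range")
        records[(pedigree, subject)] = None if code == 0 else code
    return records


def merge_overwrite(person_data, marker_data, name="ACP"):
    """Copy marker codes into per-person records, replacing any existing value for `name`.

    Returns the merged records and the keys that matched no person.
    """
    merged = {key: dict(values) for key, values in person_data.items()}
    unmatched = []
    for key, code in marker_data.items():
        if key in merged:
            merged[key][name] = code  # overwrite duplicates
        else:
            unmatched.append(key)
    return merged, unmatched
```

For example, `with open("ACP_2004.DAT") as f: acp = read_marker_file(f)` parses the file, and a non-empty `unmatched` list flags subjects that do not appear in the person file.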
(9) Now use an editor to view the ACP_2004.mdf file. This file contains all the
information about the marker needed to run the analysis. MDF stands for marker
definition file. This file is used to tell GAP the number of alleles and the phenotype
codes for each genotype. The file format is:
Locus label (up to 15 characters)   F or E
First allele label                  First allele frequency
.
.
.
Last allele label                   Last allele frequency;
Phenotype code                      Genotype(s)
.
.
.
Phenotype code                      Genotype(s);
The F indicates that the allele frequencies should be fixed to the values listed in the file.
An E would indicate that they should be estimated using these values as initial values.
If we had used this option for entering the KM marker information, our KM.mdf file
would look like:
KM   F
B    0.14
b    0.86;
1    B/B B/b
2    b/b;
NOTES: The semicolons (;) are important. They tell the program where the allele
frequencies end and where the phenotype codes end. The “/” must be used to define the
genotypes and multiple genotypes for one phenotype must each be separated by a space.
If you had several markers you could put them all into one *.mdf file provided they each
had a unique label.
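When several markers go into one *.mdf file, writing the blocks with a small script avoids missing semicolons. This sketch assumes the layout in which the locus label and the F/E flag share the first line, each allele label is followed by its frequency, and each phenotype code is followed by its genotypes; confirm that against ACP_2004.mdf before relying on it. The function name is hypothetical.

```python
def format_mdf(label, alleles, phenotypes, estimate=False):
    """Return one marker block as MDF text (layout assumed, see the lead-in).

    alleles: list of (allele label, frequency) pairs.
    phenotypes: list of (phenotype code, [genotype, ...]) pairs, genotypes written like "B/b".
    estimate: False writes F (fix the frequencies), True writes E (estimate them).
    """
    if len(label) > 15:
        raise ValueError("locus label must be at most 15 characters")
    total = sum(freq for _, freq in alleles)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    for _, genotypes in phenotypes:
        for genotype in genotypes:
            if "/" not in genotype:
                raise ValueError(f"genotype {genotype!r} must use '/' between alleles")
    lines = [f"{label} {'E' if estimate else 'F'}"]
    lines += [f"{allele} {freq}" for allele, freq in alleles]
    lines[-1] += ";"  # semicolon closes the allele-frequency section
    lines += [f"{code} {' '.join(genotypes)}" for code, genotypes in phenotypes]
    lines[-1] += ";"  # semicolon closes the phenotype-code section
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_mdf("KM", [("B", 0.14), ("b", 0.86)],
                     [(1, ["B/B", "B/b"]), (2, ["b/b"])]), end="")
```

For KM this prints the five lines `KM F`, `B 0.14`, `b 0.86;`, `1 B/B B/b`, `2 b/b;`. Concatenating the returned strings gives a multi-marker file, provided each label is unique.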
(10) To analyze the marker ACP, define the marker gene. Use MODEL-GENES-ADD MARKER GENE. Check the Load Marker Definition file box and then select
ACP_2004.MDF.
(11) Associate the marker gene with the marker phenotype in the data set using
MODEL-PHENOTYPES-ASSIGN MARKER PHENOTYPES. Then define the new linkage
with MODEL-LINKAGE. Since you have already run KM, you can exclude that link if
you want.
(12) Save this model as ACP.AMF and run the analysis. You may need to increase the
number of iterations (METHOD-MAXIMUM LIKELIHOOD - EDIT PROCEDURE)
to get the analysis to converge. Plot the LOD curve. Which of the two markers looks
more likely to be linked to the trait, ACP or KM?
(13) What would the results be like if we had dichotomized triglycerides and then
conducted a linkage analysis? Suppose we had found the same measured covariates
and a dominant Mendelian inheritance pattern.
(a) Starting in the KINDRED PROGRAM, categorize triglycerides into 2 groups
using a triglyceride value of 119 as the cut-off. Use PERSON RECODE and
choose the CATEGORIZE option. Create TRICUT with cut point 119. Note
that the lower category contains the cut point, i.e., the first category (1) is
Triglyceride ≤ 119 and the second category (2) is Triglyceride > 119 (affected).
Check the computation to see which codes are used for affected and unaffected
TRICUT, and check that missing data are handled correctly (use PERSON EDIT or
PERSON ANALYZE).
(b) Since we are now considering a qualitative trait, linear regression is no longer
appropriate for the phenotype; use logistic regression instead. In GAP the
regression type is specified when you define TRICUT as dichotomous. Only one
phenotype can be defined at any time. Save your amf file, then go to MODEL,
PHENOTYPE, select the currently defined phenotype, and DELETE TRAIT
PHENOTYPE. Then select ADD TRAIT PHENOTYPE. Name the phenotype,
select DICHOTOMOUS and select the same measured covariates used in the
above analysis. Now, however, use a dominant mode of inheritance and fix the
parameters to the following values:
(c) Phenotype:
$$\mathrm{logit}\,P(\mathrm{trig}_i \geq 120) = \mu + \beta_{age}(Age_i - \overline{Age}) + \beta_{bmi}(bmi_i - \overline{bmi}) + \beta_{dom} G_i + \beta_{agexbmi}(Age_i - \overline{Age})(bmi_i - \overline{bmi})$$
where:
Parameter                Estimate
$\mu$ (intercept)        -7.068
$\beta_{age}$            -0.0253
$\beta_{bmi}$            2.344
$\beta_{agexbmi}$        0.104
$\beta_{dom}$ (dominant) 4.683
Genotype: Hardy-Weinberg equilibrium holds: $P(AA) = q_A^2$, $P(Aa) = 2q_A(1-q_A)$,
$P(aa) = (1-q_A)^2$, with $\hat{q}_A = 0.30$, and Mendelian transmission probabilities.
(d) Rerun the analysis of markers ACP and KM.
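The TRICUT recode in step (a) amounts to the rule below. The sketch (a hypothetical function, not KINDRED code) keeps missing values missing; if a missing triglyceride were stored as a number such as 0 and passed through unchanged, it would silently land in category 1, which is what the missing-data check in (a) guards against.

```python
def tricut(trig, cut=119.0):
    """Recode a triglyceride value as TRICUT: 1 if trig <= cut, 2 if trig > cut (affected).

    Missing values (None) stay missing instead of falling into category 1.
    """
    if trig is None:
        return None
    return 1 if trig <= cut else 2


if __name__ == "__main__":
    values = [85, 119, 120, None, 240]  # hypothetical values, for illustration only
    print([tricut(v) for v in values])  # [1, 1, 2, None, 2]
```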
Homework: Due Thursday 2/19/04. Answer TWO of the three following questions:
(1) What is the effect of model misspecification? Try some examples.
(a) Change the model parameters and their values. Change the Mendelian disease gene
frequency from 0.42 to 0.12 and rerun the ACP marker. Plot the LOD score versus
Ron the provided graph. Compare the estimates of R and its standard error to the
ACP results from lab 5.
(b) Return the Mendelian allele frequency to 0.42. Add a sex effect using the
coefficient value you obtained in the segregation analysis. Fix the coefficient value
and rerun the ACP marker. (My covariate estimate was equal to -0.085). Again plot
the results and compare the estimate and standard error to the analysis conducted in
lab for ACP.
(c) Remove the sex effect. Change the gene model from additive to dominant (use 3.03
as the coefficient for the dominant model). Rerun the ACP marker. Plot the LOD
score versusR. Compare the estimate of R and its standard error to the ACP results
from lab 5.
(d) Return the gene model to additive. Change the marker allele frequencies in the ACP
file so that allele a has frequency 0.475, allele b has frequency 0.050 and
allele c has frequency 0.475. Rerun the ACP marker. Plot the LOD score
versus R. Compare the estimate of R and its standard error to the ACP results from
lab 5.
(e) Restore the original marker allele frequencies. Fix the coefficients to your best
estimates for a restricted general model, and rerun the ACP marker. Plot the LOD
score versusR. Compare the estimate of R and its standard error to the ACP results
from lab 5.
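For item (d), one way to see what the new marker frequencies do is to compare expected heterozygosity, $1 - \sum_i p_i^2$ under HWE, which measures how often a parent is heterozygous and hence informative for linkage. The sketch below is illustrative; compare its value for the altered frequencies with the value for the frequencies in your ACP_2004.mdf file, which are not reproduced here.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i ** 2) of a marker under Hardy-Weinberg equilibrium."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Altered ACP frequencies from item (d): a = 0.475, b = 0.050, c = 0.475
    print(heterozygosity([0.475, 0.050, 0.475]))
    # Maximum possible for 3 alleles (all equal to 1/3)
    print(heterozygosity([1 / 3, 1 / 3, 1 / 3]))
```

The altered frequencies give 0.54625, below the three-allele maximum of 2/3. Misspecified marker allele frequencies matter mostly when parental genotypes are missing, because the program must then infer them from the population frequencies.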
(2) Joint segregation and linkage analysis with the ACP marker. It is possible to jointly
estimate the segregation parameters and the recombination fraction. However, the LOD
scores reported by GAP must be recalculated before they can be interpreted.
(a) Allow all the parameters except the marker allele frequencies to vary. Select EDIT-GENES, select the Mendelian gene and EDIT GENES, and remove the check mark on
FIXED next to each initial value. Free all the covariates, the intercept and the
variance in the same manner. Record the estimates of the parameters, including
their standard errors. NOTE: Be on the lookout for local maxima.
(b) Compare these estimates to the estimates from the separate segregation analysis and
linkage analyses. Ideally, you should try several starting values for the disease gene
frequency and the recombination fraction.
(c) NOTE: WHEN PERFORMING JOINT SEGREGATION AND LINKAGE, THE
LOD SCORES IN THE SUMMARY FILE ARE OVERESTIMATES OF THE
TRUE LOD SCORES. This is because GAP fixes the segregation parameters to
those found jointly with the mle value of R, then computes the likelihood at a grid of
Rvalues. To obtain the true LOD scores, you need to maximize the likelihood at
fixed values of R, allowing all the segregation parameters to vary. To get the correct
values, run separate runs of GAP with R fixed at 0.50, 0.01, 0.10, 0.15, and 0.20 and
allow all the segregation to vary. Record the natural log likelihoods in the following
table.
(d) Now compute the LOD score for each R, including the MLE, using this formula:
$$\mathrm{LOD}(R_a) = \frac{\ln L(R_a) - \ln L(0.50)}{\ln 10}$$
(e) Hand in the computed lod score table for joint segregation and linkage. Explicitly
write out the null hypothesis and the alternative hypothesis corresponding to these
LOD scores.
TABLE FOR PROBLEM 2:
Recombination fraction    Ln-likelihood    LOD score
R̂ (MLE)
R = 0.01
R = 0.10
R = 0.15
R = 0.20
R = 0.50                                   0.00
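Filling in the LOD column is a direct application of the formula in (d). Because each run maximizes over the segregation parameters at a fixed R, these are profile-likelihood LOD scores. A minimal sketch, with a hypothetical function name:

```python
import math


def lod_table(lnlik_by_r, null_r=0.50):
    """Convert natural-log likelihoods into LOD scores: (ln L(R) - ln L(null_r)) / ln(10)."""
    if null_r not in lnlik_by_r:
        raise KeyError(f"need the ln-likelihood at R = {null_r}")
    base = lnlik_by_r[null_r]
    return {r: (ll - base) / math.log(10) for r, ll in sorted(lnlik_by_r.items())}
```

The keys are the R values in the table (including the MLE) and the values are the ln-likelihoods from the corresponding GAP runs; the entry for R = 0.50 always comes out as 0.00.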
(3) The dichotomous model:
(a) Interpret the segregation model for the dichotomous trait used in lab 5: What is the
probability of carrying the risk allele A? What is the probability that a person of
average age and bmi has high triglycerides (Trig > 119) if their genotype is AA? What
is the odds ratio comparing the risk of high triglycerides for AA individuals with aa
individuals when both are age 29 and have bmi 3.5? What is the odds ratio comparing
the risk of high triglycerides for Aa individuals with aa individuals when both are
age 75 and have bmi 3.0?
(b) How do your linkage results differ from the results for the continuous trait? Suggest
some reasons for these differences.
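The questions in (a) all come down to evaluating the logistic model from the lab. The sketch below encodes the fixed coefficients and assumes that A is the dominant risk allele (AA and Aa score 1, aa scores 0). Covariates enter as deviations from the sample means, which are not given in this handout, so take them from your own data. All names are hypothetical.

```python
import math

# Fixed coefficients of the dichotomous (dominant) model used in this lab.
MU = -7.068
B_AGE = -0.0253
B_BMI = 2.344
B_AGEXBMI = 0.104
B_DOM = 4.683


def dominant_indicator(genotype):
    """Assumption: A is the dominant risk allele, so AA and Aa score 1 and aa scores 0."""
    if genotype not in ("AA", "Aa", "aA", "aa"):
        raise ValueError(f"unknown genotype {genotype!r}")
    return 0 if genotype == "aa" else 1


def logit_high_trig(genotype, age_dev, bmi_dev):
    """Linear predictor; age_dev and bmi_dev are deviations from the sample means."""
    return (MU + B_AGE * age_dev + B_BMI * bmi_dev
            + B_DOM * dominant_indicator(genotype)
            + B_AGEXBMI * age_dev * bmi_dev)


def prob_high_trig(genotype, age_dev=0.0, bmi_dev=0.0):
    """Inverse logit of the linear predictor: P(high triglycerides)."""
    return 1.0 / (1.0 + math.exp(-logit_high_trig(genotype, age_dev, bmi_dev)))


def odds_ratio(genotype_1, genotype_0, age_dev, bmi_dev):
    """Odds of high triglycerides for genotype_1 relative to genotype_0 at the same covariates."""
    return math.exp(logit_high_trig(genotype_1, age_dev, bmi_dev)
                    - logit_high_trig(genotype_0, age_dev, bmi_dev))
```

Writing the odds ratio as the exponential of a difference of linear predictors makes it easy to see which terms cancel when the two people share the same age and bmi.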