statgen8

LINKAGE ANALYSIS
Recombination Fraction

During synapsis, crossing-over may occur between any two non-sister
chromatids

If there are allelic differences at the site of crossing over, the genetic result is
recombination

Genes on the same chromosome are connected physically (syntenic)

At least a thousand to several thousand for each human chromosome

Prophase of the first meiotic cell division

Homologous pairing of chromosomes (synapsis)

Each chromosome consists of two fully formed chromatids joined at the
centromere; four chromatids for each pair of chromosomes

Each chromatid represents a separate DNA molecule

If two syntenic genes are close enough that a crossover occurs between them
less than once per meiosis on average, the two genes are genetically linked

Recombination fraction (RF or θ) is 1/2 the frequency of crossovers (a single
crossover involves only two of the four chromatids in a synapsed pair of
chromosomes)

If two loci are so far apart that on average there is at least one crossover
between them in every meiosis, then θ = 50% and the loci are unlinked

θ cannot be greater than 50%

Morton et al. (1982) used cytological preparations of spermatocytes and
reported an average of 52 crossovers per male meiosis

Genetic distance, derived from the recombination fraction, is expressed in
map units (m.u.) or centiMorgans (cM)

1 m.u. = 1 cM, which corresponds to θ = 1% over small distances

One crossover on average implies a genetic map length of 50 cM

If two loci are separated by a distance such that an average of one crossover
occurs between them in every meiotic cell, then those loci are 50 cM apart

52 crossovers per meiosis imply a total genetic map length of 52 x 50 = 2600 cM
in humans; since the haploid human genome is roughly 3 x 10^9 bp (3000 Mb),
1 cM corresponds to approximately 1 megabase of sequence
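The arithmetic behind the 2600 cM figure can be checked in a few lines of Python. The haploid genome size of about 3000 Mb is an assumption added here and is not stated in these notes. The result is only an average, because recombination rates vary along the genome and between the sexes.

```python
# Rough conversion from crossover counts to map length and physical density.
CM_PER_CROSSOVER = 50          # one crossover per meiosis, on average, spans 50 cM
CROSSOVERS_PER_MEIOSIS = 52    # male meiosis (Morton et al., 1982)
HAPLOID_GENOME_MB = 3000       # assumption: ~3 x 10^9 bp haploid human genome

map_length_cm = CROSSOVERS_PER_MEIOSIS * CM_PER_CROSSOVER
mb_per_cm = HAPLOID_GENOME_MB / map_length_cm

print(f"total map length: {map_length_cm} cM")
print(f"about {mb_per_cm:.2f} Mb per cM")
```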

Map distances estimated from recombination fractions are not additive over long
distances because of multiple crossovers (and positive or negative
interference); mapping functions have been developed to address this phenomenon
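The notes do not name specific mapping functions. Two widely used ones are Haldane's, which assumes no interference, and Kosambi's, which allows for positive interference. The sketch below converts θ into map distance in cM with each one. It is only an illustration and does not recommend one function over the other.

```python
import math


def haldane_cm(theta: float) -> float:
    """Haldane map distance in cM: d = -1/2 ln(1 - 2 theta) Morgans (no interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * theta)


def kosambi_cm(theta: float) -> float:
    """Kosambi map distance in cM: d = 1/4 ln[(1 + 2 theta)/(1 - 2 theta)] Morgans."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * theta) / (1 - 2 * theta))


# Compare 100*theta (the small-distance rule) with the two mapping functions.
for theta in (0.01, 0.10, 0.30):
    print(theta, round(100 * theta, 1), round(haldane_cm(theta), 1), round(kosambi_cm(theta), 1))
```

For θ = 0.01 both functions give about 1 cM, which is consistent with 1 cM ≈ 1% recombination over small distances. For θ = 0.30, Haldane gives about 45.8 cM and Kosambi about 34.7 cM.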
Recombination fraction (θ)

$$\theta = \frac{\text{number of recombinant gametes}}{\text{total gametes}}$$

Linkage describes the phenomenon whereby alleles at neighbouring loci, being
close to one another on the same chromosome, are transmitted together more
frequently than expected by chance.

θ = 0: no recombination => complete linkage

0 < θ < 0.5: partial linkage

θ = 0.5: no linkage
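The definition of θ and the three cases above can be written as a minimal Python sketch. The counts are hypothetical. A value computed from a sample is only an estimate of θ, so an observed ratio may fall at or slightly above 0.5 by chance.

```python
def recombination_fraction(n_recombinant: int, n_total: int) -> float:
    """theta = number of recombinant gametes / total gametes."""
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_recombinant <= n_total")
    return n_recombinant / n_total


def linkage_class(theta: float) -> str:
    """Map a recombination fraction onto the three cases listed above."""
    if theta == 0:
        return "complete linkage"
    if theta < 0.5:
        return "partial linkage"
    return "no linkage"  # theta = 0.5 (a sample estimate may exceed it by chance)


# Hypothetical data: 12 recombinant gametes out of 100 scored.
theta = recombination_fraction(12, 100)
print(theta, linkage_class(theta))
```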
Linkage Analysis

For a couple whose genotypes at loci A and B are known, the probability of
observing the genotypes of their offspring depends on the value of θ.


Let us assume the following cross:


Therefore, such a couple can have four types of offspring

There are two possible situations:

The alleles A1 and B1 may be on the same chromosome within the pair, in
which case A1 and B1 are said to be "coupled";

They may be on different chromosomes, in which case A1 and B1 are said to
be in a state of "repulsion".

Assuming that there is gametic (linkage) equilibrium at the A and B loci, in
parent 1 there is a probability of 1/2 that alleles A1 and B1 will be coupled,
and a probability of 1/2 that they will be in repulsion.

(1) A1 and B1 are coupled
o The probability that parent (1) provides the gametes A1B1 and A2B2 is
(1-θ)/2 each, and the probability that this parent provides the gametes A1B2
and A2B1 is θ/2 each. The probability that the couple will have a child of
type (1) or of type (2) is (1-θ)/2 each, and that of their having a child of
type (3) or of type (4) is θ/2 each.
o The probability of finding n1 children of type (1), n2 of type (2), n3 of
type (3) and n4 of type (4) is therefore:
$$\left(\frac{1-\theta}{2}\right)^{n_1+n_2} \left(\frac{\theta}{2}\right)^{n_3+n_4}$$

(2) A1 and B1 are in a state of repulsion
o The probability that parent (1) provides the gametes A1B2 and A2B1 is
(1-θ)/2 each, and the probability that this parent provides the gametes A1B1
and A2B2 is θ/2 each.
o The probability of the previous observation is therefore:
$$\left(\frac{\theta}{2}\right)^{n_1+n_2} \left(\frac{1-\theta}{2}\right)^{n_3+n_4}$$

With no additional information about the phase of A1 and B1, and assuming that
the alleles at the A and B loci are in gametic equilibrium, the probability of
finding n1, n2, n3 and n4 children in categories (1), (2), (3) and (4) is:

$$p(n_1,n_2,n_3,n_4 \mid \theta) = \frac{1}{2}\left[\left(\frac{1-\theta}{2}\right)^{n_1+n_2}\left(\frac{\theta}{2}\right)^{n_3+n_4} + \left(\frac{\theta}{2}\right)^{n_1+n_2}\left(\frac{1-\theta}{2}\right)^{n_3+n_4}\right]$$
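So the likelihood of θ for an observation n1, n2, n3, n4 can be written:

$$L(\theta \mid n_1,n_2,n_3,n_4) = \frac{1}{2}\left[\left(\frac{1-\theta}{2}\right)^{n_1+n_2}\left(\frac{\theta}{2}\right)^{n_3+n_4} + \left(\frac{\theta}{2}\right)^{n_1+n_2}\left(\frac{1-\theta}{2}\right)^{n_3+n_4}\right]$$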
In the special case of a single child (n = 1), regardless of the category to
which this child belongs:

$$L(\theta) = \frac{1}{2}\cdot\frac{1-\theta}{2} + \frac{1}{2}\cdot\frac{\theta}{2} = \frac{1}{4}$$

The likelihood of this observation for the family does not depend on θ. We
can say that such a family is not informative for θ.

An "informative family" is a family for which the likelihood varies as a
function of θ.
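The phase-unknown likelihood can be sketched directly in Python. The function name is hypothetical. As in the derivation above, types (1) and (2) are the children produced by gametes A1B1 and A2B2 of parent (1), and types (3) and (4) are those produced by A1B2 and A2B1. Evaluating the function for one child and then for three children shows the difference between a non-informative family and an informative one.

```python
def family_likelihood(theta: float, n1: int, n2: int, n3: int, n4: int) -> float:
    """Phase-unknown likelihood L(theta | n1, n2, n3, n4).

    Coupling and repulsion of A1 and B1 each receive prior weight 1/2.
    """
    nonrec = (1 - theta) / 2   # probability of each non-recombinant gamete type
    rec = theta / 2            # probability of each recombinant gamete type
    coupling = nonrec ** (n1 + n2) * rec ** (n3 + n4)
    repulsion = rec ** (n1 + n2) * nonrec ** (n3 + n4)
    return 0.5 * (coupling + repulsion)


# One child: L(theta) = 1/4 whatever theta is, so the family is not informative.
print([family_likelihood(t, 1, 0, 0, 0) for t in (0.0, 0.1, 0.3, 0.5)])
# Three children of types (1), (2), (1): L(theta) now varies with theta.
print([round(family_likelihood(t, 2, 1, 0, 0), 5) for t in (0.0, 0.1, 0.3, 0.5)])
```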

One essential condition for a family to be informative (when, as here, the
phase is unknown) is, therefore, that it has more than one child. Furthermore,
at least one of the parents must be heterozygous at both loci.

Definition: if one of the parents is doubly heterozygous and the other is
o a double homozygote, we have a backcross
o a single homozygote, we have a simple backcross
o a double heterozygote, we have a double intercross
Definition Of The "Lod Score" Of A Family

Consider a family in which the genotypes of every member at loci A and B are
known.

Let L(θ) be the likelihood of a recombination fraction θ, with 0 ≤ θ < 1/2.

L(1/2) is the likelihood of θ = 1/2, that is, of independent segregation of A
and B.

The lod score of the family at θ is:

$$Z(\theta) = \log_{10}\frac{L(\theta)}{L(1/2)}$$

Z can be taken to be a function of θ defined over the range [0, 1/2].

The likelihood of a value of  for a sample of independent families is the
product of the likelihoods of each family, and so the lod score of the whole
sample will be the sum of the lod scores of each family.
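The sketch below computes the lod score of each family and of a whole sample. It reuses the phase-unknown likelihood derived above for families with children of types (1) to (4). The family counts and function names are hypothetical. Summing the per-family lod scores gives the same result as taking log10 of the ratio of the products of the likelihoods.

```python
import math


def family_likelihood(theta, n1, n2, n3, n4):
    """Phase-unknown likelihood of one family (coupling and repulsion weighted 1/2)."""
    nonrec, rec = (1 - theta) / 2, theta / 2
    return 0.5 * (nonrec ** (n1 + n2) * rec ** (n3 + n4)
                  + rec ** (n1 + n2) * nonrec ** (n3 + n4))


def family_lod(theta, counts):
    """Z(theta) = log10[L(theta) / L(1/2)] for one family."""
    lik = family_likelihood(theta, *counts)
    if lik == 0.0:
        return float("-inf")  # this theta is impossible given the family
    return math.log10(lik / family_likelihood(0.5, *counts))


def sample_lod(theta, families):
    """Lod score of independent families = sum of the per-family lod scores."""
    return sum(family_lod(theta, counts) for counts in families)


# Hypothetical sample: (n1, n2, n3, n4) for three families.
families = [(3, 1, 0, 0), (2, 2, 1, 0), (4, 0, 0, 0)]
for theta in (0.05, 0.1, 0.2, 0.3):
    print(theta, round(sample_lod(theta, families), 3))
```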

Several methods have been proposed to detect linkage: "U scores" were
suggested by Bernstein in 1931, "the sib pair test" by Penrose in 1935,
"likelihood ratios" by Haldane and Smith in 1947, and "the lod score method"
by Morton in 1955 (1). Morton's method is the one most commonly used at
present.

The test procedure in the lod score method is sequential (Wald, 1947 (2)).
Information, i.e. the number of families in the sample, is accumulated until it
is possible to decide between the hypotheses H0 and H1:

H0: genetic independence, θ = 1/2

H1: linkage with θ = θ1, where 0 < θ1 < 1/2

The lod score of the sample at θ1,

$$Z(\theta_1) = \log_{10}\frac{L(\theta_1)}{L(1/2)},$$

measures how much more likely the observed sample is under H1 than under H0.
Thus, a lod score of 3 means that the observed data are 1000 times more likely
under H1 than under H0 ("lod" = logarithm of the odds).

Test For Linkage

The decision thresholds of the test are usually set at -2 and +3, so that if:

Z(θ1) > 3: H0 is rejected and linkage is accepted.

Z(θ1) < -2: linkage at θ1 (H1) is rejected.

-2 < Z(θ1) < 3: it is impossible to decide between H0 and H1, and it is
necessary to go on accumulating information.
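The decision rule can be sketched as follows, applied as families accumulate. The per-family lod scores in the example are hypothetical, and the rule is evaluated at a single fixed θ1. The thresholds default to the conventional -2 and +3. A cumulative lod score Z corresponds to odds of 10^Z, so +3 means odds of 1000:1.

```python
from typing import Iterable, Tuple


def linkage_decision(z: float, upper: float = 3.0, lower: float = -2.0) -> str:
    """Three-way decision on the cumulative lod score Z(theta1)."""
    if z > upper:
        return "H0 rejected: linkage accepted"
    if z < lower:
        return "H1 rejected: linkage at theta1 excluded"
    return "undecided: keep accumulating families"


def sequential_test(family_lods: Iterable[float], upper: float = 3.0,
                    lower: float = -2.0) -> Tuple[int, float, str]:
    """Add families one at a time and stop as soon as a threshold is crossed."""
    total, n_used = 0.0, 0
    for z in family_lods:
        total += z
        n_used += 1
        if total > upper or total < lower:
            break
    return n_used, total, linkage_decision(total, upper, lower)


# Hypothetical lod scores Z(theta1) of successive families at the same theta1.
n_used, total, decision = sequential_test([0.9, 1.2, 0.6, 0.5, 0.8])
print(n_used, round(total, 2), decision, f"odds about {10 ** total:.0f}:1")
```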

For the thresholds chosen, -2 and +3, we can show that:

The error of the first kind, α (false positive) < 10^-3

The error of the second kind, β (false negative) < 10^-2

The reliability (power), 1 - β > 0.95

α is the probability of concluding that H1 is true when H0 is true; β is the
probability of concluding that H0 is true when H1 is true
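The bounds quoted above can be reproduced from Wald's approximations for a sequential probability ratio test. The upper odds threshold is A = 10^3 (lod +3) and the lower threshold is B = 10^-2 (lod -2), with A ≈ (1 - β)/α and B ≈ β/(1 - α). The sketch below solves these two equations for α and β. It is an illustration under those approximations, which ignore overshoot of the thresholds, and it is not a derivation given in these notes. The function name is hypothetical.

```python
def wald_error_rates(upper_lod: float = 3.0, lower_lod: float = -2.0):
    """Approximate alpha and beta from Wald's SPRT boundaries.

    Solves A = (1 - beta) / alpha and B = beta / (1 - alpha)
    with A = 10**upper_lod and B = 10**lower_lod.
    """
    a, b = 10.0 ** upper_lod, 10.0 ** lower_lod
    alpha = (1 - b) / (a - b)
    beta = b * (a - 1) / (a - b)
    return alpha, beta


alpha, beta = wald_error_rates()
print(f"alpha ~ {alpha:.2e}, beta ~ {beta:.2e}, power ~ {1 - beta:.4f}")
```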
Significance of results

In fact, what is being tested is not a single value θ1 against θ = 1/2, but a
whole set of values between 0 and 1/2, scanned in steps of a fixed size (e.g.
0.01 or 0.05).

If there is a value θ1 such that Z(θ1) > 3, linkage is concluded to exist.

If there is a value θ1 such that Z(θ1) = -2, linkage is excluded for any
θ < θ1.

If -2 < Z(θ) < 3 for every value of θ tested, no conclusion can be drawn: the
sample is not sufficiently informative.
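Scanning a grid of θ values can be sketched as follows. The step of 0.01, the function names and the two hypothetical samples are illustrative choices. The family likelihood is the phase-unknown one derived earlier for children of types (1) to (4). As in the text, the exclusion rule assumes that Z(θ) rises with θ below the exclusion boundary.

```python
import math


def family_likelihood(theta, n1, n2, n3, n4):
    """Phase-unknown likelihood of one family."""
    nonrec, rec = (1 - theta) / 2, theta / 2
    return 0.5 * (nonrec ** (n1 + n2) * rec ** (n3 + n4)
                  + rec ** (n1 + n2) * nonrec ** (n3 + n4))


def sample_lod(theta, families):
    """Sum of per-family lod scores at theta."""
    total = 0.0
    for counts in families:
        lik = family_likelihood(theta, *counts)
        if lik == 0.0:
            return float("-inf")  # theta impossible for this family
        total += math.log10(lik / family_likelihood(0.5, *counts))
    return total


def scan(families, step=0.01):
    """Z(theta) on the grid 0, step, 2*step, ... below 1/2."""
    grid = [round(k * step, 10) for k in range(int(round(0.5 / step)))]
    return {theta: sample_lod(theta, families) for theta in grid}


def interpret(scores, upper=3.0, lower=-2.0):
    """Return ("linkage" | "exclusion" | "no conclusion", excluded grid values)."""
    if max(scores.values()) > upper:
        return "linkage", []
    excluded = [theta for theta, z in scores.items() if z <= lower]
    if excluded:
        return "exclusion", excluded
    return "no conclusion", []


# Hypothetical samples of ten families each, counts (n1, n2, n3, n4).
linked = [(3, 2, 0, 0)] * 10
mixed = [(2, 0, 2, 0)] * 10
print(interpret(scan(linked))[0])
label, excluded = interpret(scan(mixed))
print(label, "for theta up to", max(excluded))
```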
Criticism

The proposed test has the advantage of being very simple, and of providing
protection against falsely concluding linkage.

However, some criticisms can be leveled, not only against the criteria chosen,
but also against the entire principle of using a sequential procedure.

Indeed, the number of families typed is rarely chosen in the light of the test
results, as a truly sequential procedure would require.
Estimation Of The Recombination Fraction

If the test, on a sample of families, has demonstrated linkage between the A
and B loci, then one may want to estimate the recombination fraction for these
loci.

The estimate of θ is the value that maximizes the lod score function Z(θ).
Since L(1/2) does not depend on θ, this is equivalent to taking the value of θ
for which the likelihood of the observed sample is greatest, i.e. the maximum
likelihood estimate.
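Z(θ) can be maximized numerically with a fine grid search, as sketched below. The grid step of 0.001, the function names and the hypothetical sample are assumptions. A grid search is used only for simplicity, and any one-dimensional optimizer would do. The likelihood is again the phase-unknown one derived above.

```python
import math


def family_likelihood(theta, n1, n2, n3, n4):
    """Phase-unknown likelihood of one family."""
    nonrec, rec = (1 - theta) / 2, theta / 2
    return 0.5 * (nonrec ** (n1 + n2) * rec ** (n3 + n4)
                  + rec ** (n1 + n2) * nonrec ** (n3 + n4))


def sample_lod(theta, families):
    """Sum of per-family lod scores at theta."""
    total = 0.0
    for counts in families:
        lik = family_likelihood(theta, *counts)
        if lik == 0.0:
            return float("-inf")
        total += math.log10(lik / family_likelihood(0.5, *counts))
    return total


def estimate_theta(families, step=0.001):
    """Return (theta_hat, Z_max): the grid value of theta in [0, 1/2] maximizing Z."""
    grid = [k * step for k in range(int(round(0.5 / step)) + 1)]
    return max(((t, sample_lod(t, families)) for t in grid), key=lambda tz: tz[1])


# Hypothetical sample: 20 families, each with 4 children of type (1) and 1 of type (3).
theta_hat, z_max = estimate_theta([(4, 0, 1, 0)] * 20)
print(round(theta_hat, 3), round(z_max, 2))
```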
Recombination Fraction For A Disease Locus And A Marker Locus

Let us assume we are dealing with a single-gene disease, determined by an
allele g0 located at a locus G (g0: harmful allele; G0: normal allele).

We would like to situate locus G relative to a marker locus T whose position on
the genome is known. To do this, we can use families with one or more affected
individuals in which the genotype of each family member at the marker T is
known.

In order to use the lod score method described above, we need to be able to
infer, from the phenotype of each individual (affected, not affected), their
genotype at locus G (or their genotype probabilities at locus G).

What we need to know is:
o the frequency of allele g0
o the penetrance vector (f1, f2, f3)

f1 = Pr(affected | g0g0)

f2 = Pr(affected | g0G0)

f3 = Pr(affected | G0G0)
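Bayes' rule applied to the penetrance vector converts a phenotype into genotype probabilities at locus G. The sketch below also assumes Hardy-Weinberg proportions for the prior genotype frequencies, which the notes do not state. It considers one individual on their own, ignoring relatives. The function name and the parameter values in the example are hypothetical.

```python
def genotype_probabilities(q, f, affected):
    """Pr(genotype at G | phenotype) for one individual, ignoring relatives.

    q: frequency of the harmful allele g0.
    f: penetrance vector (f1, f2, f3) for g0g0, g0G0, G0G0.
    Priors assume Hardy-Weinberg proportions (illustrative assumption).
    """
    priors = (q * q, 2 * q * (1 - q), (1 - q) ** 2)
    # Pr(phenotype | genotype) is f_i if affected, 1 - f_i otherwise.
    weights = [p * (fi if affected else 1 - fi) for p, fi in zip(priors, f)]
    total = sum(weights)
    if total == 0:
        raise ValueError("phenotype impossible under these parameters")
    return dict(zip(("g0g0", "g0G0", "G0G0"), (w / total for w in weights)))


# Hypothetical rare dominant disease with 90% penetrance and a 1% phenocopy rate.
print(genotype_probabilities(0.01, (0.9, 0.9, 0.01), affected=True))
print(genotype_probabilities(0.01, (0.9, 0.9, 0.01), affected=False))
```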

It will often happen that the information available for the marker is
phenotypic rather than genotypic. Once again, all possible genotypes must be
envisaged.

As a general rule, the information available about a family concerns
phenotypes. To calculate the likelihood of θ, we must envisage all the possible
genotype configurations at each of the loci for this family, write the
likelihood of θ for each configuration, weight it by the probability of that
configuration given the phenotypes of the individuals at A and B, and sum over
the configurations.

Knowledge of the genetic parameters at each of the loci (gene frequencies,
penetrance values) is therefore necessary before we can estimate θ.
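The sum over genotype configurations can be illustrated with a hypothetical phase-unknown backcross between the disease locus G and the marker T. In this example an affected parent, assumed to be g0G0 and T1T2, has a G0G0, T2T2 partner. For each child, the unobserved genotype at G is summed out using the penetrances f2 and f3. The two possible phases of the first parent are weighted 1/2 each. The parental genotypes, penetrance values and names are all assumptions made for illustration. Factors that do not depend on θ, such as the parents' own phenotype probabilities, cancel in the lod score and are left out.

```python
import math


def _pen(f, affected):
    """Pr(phenotype | genotype) given that genotype's penetrance f."""
    return f if affected else 1 - f


def child_prob(theta, got_t1, affected, g0_with_t1, f2, f3):
    """Pr(child's marker allele and phenotype | theta, phase), summed over G."""
    # Probability that the child received g0 from the g0G0, T1T2 parent.
    p_g0 = 1 - theta if got_t1 == g0_with_t1 else theta
    # 1/2: chance of the observed marker allele; then sum over g0G0 / G0G0.
    return 0.5 * (p_g0 * _pen(f2, affected) + (1 - p_g0) * _pen(f3, affected))


def family_likelihood(theta, children, f2, f3):
    """children: list of (got_t1, affected); each phase gets prior weight 1/2."""
    total = 0.0
    for g0_with_t1 in (True, False):
        prod = 1.0
        for got_t1, affected in children:
            prod *= child_prob(theta, got_t1, affected, g0_with_t1, f2, f3)
        total += 0.5 * prod
    return total


def lod(theta, children, f2, f3):
    """Z(theta) = log10[L(theta) / L(1/2)] for the disease-marker family."""
    lik = family_likelihood(theta, children, f2, f3)
    if lik == 0.0:
        return float("-inf")
    return math.log10(lik / family_likelihood(0.5, children, f2, f3))


# Two affected children carrying T1 and one unaffected child carrying T2.
kids = [(True, True), (True, True), (False, False)]
for f2, f3 in ((1.0, 0.0), (0.8, 0.05)):
    print(f2, f3, [round(lod(t, kids, f2, f3), 3) for t in (0.0, 0.1, 0.2)])
```

With full penetrance and no phenocopies, this family gives the same lod score at θ = 0 as three non-recombinant children in the marker-marker case. Lowering the penetrance or allowing phenocopies reduces the information it carries.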