Homozygosity is the state of possessing two identical forms of a particular gene (alleles), one inherited from each parent.
Homozygosity mapping is a method for mapping the human genome, used to detect genes that cause disease only when both copies in an individual are mutated (the individual is homozygous for the mutation). The technique works for recessive genetic disorders, in which a mutated copy must be inherited from each parent; an individual who is heterozygous (carrying one mutated and one non-mutated copy) expresses the non-mutated version and shows no disease symptoms. It is a way to map human recessive traits using the DNA of inbred children.
Up until now, homozygosity mapping was performed on families where the parents were known to be distantly related, and assumed the homozygosity of genes inherited by the children was due to their originating from a common ancestor. A significant limitation to this approach for studying diseases is the need for families with distantly related parents and several offspring harboring the disease.
To overcome these limitations, researchers from Michigan (USA) and Germany have recently demonstrated that this technique can also be used for outbred individuals. They found that the disease-causing mutated genes were located in homozygous regions of the subjects' DNA in 93% of cases, even though the subjects were not known to be related. It was noted that, since the individuals came from similar geographic locations, they might still share a common ancestor, which may be a reasonable assumption depending on the population being studied. For example, deCODE Genetics does much of its genetic research in Iceland because there is so little mixing of the gene pool.
While factors such as these might enhance the sensitivity of this technique, it's clear that the method can be applied to previously overlooked pools of individuals, to speed up the identification and sequencing of genes responsible for the large number of autosomal recessive disorders that remain to be characterized.
Homozygosity mapping can be done by any of the following methods:
1. SNP arrays
2. RFLP
3. Microsatellite markers
It was proposed that a set of SNPs evenly spread across the human genome could be used to screen two populations (typically populations with and without a disorder) and that some SNPs would associate more with the disease group, thus implicating the SNP, or a DNA sequence close by, in the disease state. A massive technological effort followed, and whole genome scans using tens of thousands of SNPs were made a reality with the advent of array-based technologies.
A simple and economical SNP genotyping method involving a single PCR reaction followed by gel electrophoresis, named tetra-primer ARMS-PCR, adopts certain principles of the tetra-primer PCR method and the amplification refractory mutation system (ARMS).
Both inner primers of the tetra-primer ARMS-PCR method carry a deliberate mismatch at position -2 from the 3′-terminus.
Figure 1. Schematic presentation of the tetra-primer ARMS-PCR method.
In this figure, the single nucleotide polymorphism used here as an example is a G → A substitution, but the method can be used to type other types of single base substitutions.
Two allele-specific amplicons are generated using two pairs of primers, one pair (indicated by pink and red arrows, respectively) producing an amplicon representing the G allele and the other pair (indicated by indigo and blue arrows, respectively) producing an amplicon representing the A allele. Allele specificity is conferred by a mismatch between the 3′-terminal base of an inner primer and the template. To enhance allelic specificity, a second deliberate mismatch (indicated by an asterisk) at position -2 from the 3′-terminus is also incorporated in the inner primers. The primers are 26 nt or longer, so as to minimize the difference in stability of primers annealed to the target and non-target alleles, ensuring that allele specificity results from differences in extension rate rather than hybridisation rate.
By positioning the two outer primers at different distances from the polymorphic nucleotide, the two allele-specific amplicons differ in length, allowing them to be discriminated by gel electrophoresis.
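Because the two allele-specific amplicons differ in length, reading a gel lane amounts to checking which expected band sizes are present. The following is a minimal sketch of that logic, not part of the published method. The amplicon sizes, the 10 bp tolerance and the function name are hypothetical, and it assumes that the two outer primers also yield a longer control amplicon that should appear in every successful reaction.

```python
def call_genotype(observed_bands, control_size, g_size, a_size, tolerance=10):
    """Call a G/A SNP genotype from the band sizes (bp) seen in one gel lane."""

    def present(expected_size):
        # A band counts as present if it lies within `tolerance` bp of the expected size.
        return any(abs(band - expected_size) <= tolerance for band in observed_bands)

    if not present(control_size):
        return "failed"  # no outer-primer product: the PCR itself did not work
    has_g, has_a = present(g_size), present(a_size)
    if has_g and has_a:
        return "G/A"
    if has_g:
        return "G/G"
    if has_a:
        return "A/A"
    return "failed"  # control band only: neither allele-specific amplicon seen


# Hypothetical amplicon sizes: control 400 bp, G allele 250 bp, A allele 200 bp.
lanes = {"sample 1": [401, 249], "sample 2": [399, 252, 198], "sample 3": [400, 201]}
for name, bands in lanes.items():
    print(name, call_genotype(bands, control_size=400, g_size=250, a_size=200))
```

In practice band sizes are read against a molecular weight ladder, so the tolerance would be chosen to match the resolution of the gel.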
The identification of mutations in genes that cause human diseases has largely been accomplished through the use of positional cloning, which relies on linkage mapping. In studies of rare diseases, the resolution of linkage mapping is limited by the number of available meioses and informative marker density.
One recent advance is the development of high-density SNP microarrays for genotyping.
The SNP arrays overcome low marker informativity by using a large number of markers to achieve greater coverage at finer resolution.
We used SNP microarray genotyping for homozygosity mapping in a small consanguineous Israeli Bedouin family with autosomal recessive Bardet-Biedl syndrome (BBS; obesity, pigmentary retinopathy, polydactyly, hypogonadism, renal and cardiac abnormalities, and cognitive impairment) in which previous linkage studies using short tandem repeat (STR) polymorphisms failed to identify a disease locus.
Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet-Biedl syndrome gene (BBS11).
SNP genotyping revealed a homozygous candidate region. Mutation analysis in the region of homozygosity identified a conserved homozygous missense mutation in the TRIM32 gene, a gene coding for an E3 ubiquitin ligase. Functional analysis of this gene in zebrafish and expression correlation analyses among other BBS genes in an expression quantitative trait loci data set demonstrate that TRIM32 is a BBS gene. This study shows the value of high-density SNP genotyping for homozygosity mapping and the use of expression correlation data for evaluation of candidate genes, and identifies the proteasome degradation pathway as a pathway involved in BBS.
Positional cloning is a powerful approach to identify genes mutated in monogenic (single-gene) diseases in humans and animal models. The genomic DNA region (and the embedded polymorphisms) harboring the disease-causing mutations segregates with the disease in analyzed pedigrees. However, until recently positional cloning relied on the availability of large pedigrees to reach significant linkage.
This protocol describes the use of whole genome genotyping on sporadic consanguineous patients to identify potential disease loci and subsequent positional candidate genes by homozygosity mapping (autozygosity). It takes advantage of high density single nucleotide polymorphism (SNP) genotyping arrays, and of the assumption that unrelated patients from several consanguineous families are mutated in the same gene.
Zygosity may also refer to the origin(s) of the alleles in a genotype. When the two alleles at a locus originate from a common ancestor by way of nonrandom mating (inbreeding), the genotype is said to be autozygous.
Summary of the procedure is as follows:
Day 1: Preparation of genomic DNA; DNA digestion; ligation; and PCR.
Day 2: Electrophoresis; DNA purification; DNA fragmentation; electrophoresis; DNA labeling.
Day 3: DNA microarray hybridization, washing and scanning on a dedicated Affymetrix platform.
Day 4: Data analysis.
Details of the steps are as follows:
Step – I: DNA purification, digestion, ligation and PCR
i. DNA purification: To the genomic DNA, add glycogen, NaCl and ethanol, and precipitate for 30 min (or up to overnight) at -20°C. Centrifuge for 20 min at 13,000 rpm at 4°C. Dry the pellet and resuspend in TE buffer.
ii. XbaI digestion: Under the hood, prepare a master mix on ice containing buffer, sterile water and XbaI restriction enzyme. Add the purified genomic DNA. Incubate at 37°C, then heat in a thermocycler for 20 min at 70°C and keep at 4°C.
iii. Adaptor ligation: On ice, prepare a mix containing XbaI adaptors, ligase buffer and DNA ligase. Add the XbaI-digested DNA and then add sterile water.
iv. PCR: On ice, prepare a mix containing PCR buffer, dNTPs, MgCl2, XbaI primers, Taq polymerase and distilled water. Add the ligated DNA. Run 4 PCR reactions per DNA sample. Program the PCR for 35 amplification cycles, then keep the amplified DNA at 4°C.
Step – II: Electrophoresis; DNA purification; DNA fragmentation; electrophoresis; DNA labeling
i. Check the complexity of the amplified DNA fragments by loading each PCR on a 2% agarose/TAE gel: there should be a DNA ladder centered around 1-2 kb.
ii. Pool the 4 PCR reactions of each DNA sample and purify by filtration. Wash three times with sterile water, then resuspend the DNA in TE buffer.
iii. Measure the DNA concentration of each eluted DNA sample. 40 µg of the amplified DNA fragments is needed for the fragmentation. Add the fragmentation buffer.
iv. Load the fragmented DNA on a 4% agarose/TAE gel. Expect a smear of DNA fragments centered around 50 bp.
v. DNA fragment labeling: On ice, prepare the labeling mix containing labeling reagent and terminal deoxynucleotidyl transferase. Add the fragmented DNA. Incubate at 37°C, and keep at 4°C.
Step – III: DNA microarray hybridization, washing and scanning on a dedicated Affymetrix platform
i. Array hybridization, washing, and scanning are performed on a dedicated Affymetrix microarray platform.
Step – IV: Data analysis
To determine homozygous regions, results from the Affymetrix platform are analyzed with the GDAS 3.0 software.
The SNP call rate should be greater than 95% and signal detection greater than 99%. Display the loss of heterozygosity (LOH) parameter, which scores genomic regions based on the number of contiguous homozygous SNPs, their distance and heterozygosity rate.
The higher the LOH score, the greater the chance that the region is truly homozygous by descent, derived from a single common ancestor.
Disease-linked loci are determined by comparing the selected homozygous regions of several consanguineous patients. If patients are from the same family, analysis of unaffected siblings can rule out regions that are homozygous in common by chance. The probability that a number of consecutive SNPs would be homozygous by chance depends on the number of patients and families.
If the starting genomic DNA is of low quality, or if two different DNAs have contaminated each other, the SNP call rate may be reduced considerably. Do not analyse results with a SNP call rate below 95%.
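The logic of this analysis step can be illustrated with a short sketch. It is a simplified stand-in, not a reproduction of what GDAS does: it checks the call rate, finds runs of contiguous homozygous SNPs, and intersects the runs of two patients. The genotype codes ("AA", "AB", "BB", "NC" for a no-call), the function names and the `min_snps` threshold are assumptions made for illustration, and the sketch ignores SNP spacing and heterozygosity rate.

```python
def call_rate(genotypes):
    """Fraction of SNPs that received a genotype call ('NC' marks a no-call)."""
    return sum(1 for g in genotypes if g != "NC") / len(genotypes)


def homozygous_runs(genotypes, min_snps=3):
    """Return inclusive (start, end) index pairs of runs of contiguous homozygous SNPs."""
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g in ("AA", "BB"):
            if start is None:
                start = i  # a new run begins here
        else:
            # Heterozygous calls and no-calls both end a run (a conservative choice).
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


def shared_regions(runs_a, runs_b):
    """Intersect two lists of inclusive index intervals."""
    shared = []
    for a_start, a_end in runs_a:
        for b_start, b_end in runs_b:
            low, high = max(a_start, b_start), min(a_end, b_end)
            if low <= high:
                shared.append((low, high))
    return shared


# Toy genotypes for ten consecutive SNPs in two patients (made-up data).
patient_1 = ["AB", "AA", "BB", "AA", "AA", "AB", "BB", "BB", "BB", "AA"]
patient_2 = ["AA", "AA", "AA", "AB", "BB", "AA", "BB", "BB", "BB", "AB"]

for patient in (patient_1, patient_2):
    assert call_rate(patient) > 0.95, "do not analyse samples below 95% SNP call"

print(shared_regions(homozygous_runs(patient_1), homozygous_runs(patient_2)))
```

Real arrays carry tens of thousands of SNPs, so meaningful thresholds are far larger than three markers; the small numbers here only keep the example readable.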
Gray platelet syndrome (GPS) is an inherited bleeding disorder characterized by thrombocytopenia and the absence of α-granules in platelets. Patients with GPS present with mild to moderate bleeding and many develop myelofibrosis. The genetic cause of GPS is unknown. We present two Native American families with a total of five affected individuals and a single affected patient of Pakistani origin in which GPS appears to be inherited in an autosomal recessive manner. Homozygosity mapping using the Affymetrix 6.0 chips demonstrates that all six GPS affected individuals studied are homozygous for a 1.7 Mb region in 3p21. Linkage analysis confirmed the region with a LOD score of 2.7. Data from our families enabled us to significantly decrease the size of the critical region for GPS from the previously reported 9.4 Mb region at 3p21.
RFLP (restriction fragment length polymorphism) is a method used by molecular biologists to follow a particular sequence of DNA as it is passed on to other cells. It is a technique that exploits variations in homologous DNA sequences: the term refers to differences between samples of homologous DNA molecules that arise from differing locations of restriction enzyme sites. By cutting two different DNA molecules with the same restriction enzyme, scientists can compare the lengths of the fragments; two identical molecules will have identical fragments, while two similar molecules may be largely alike, with perhaps a few differences in fragment size. These differences in restriction fragment lengths are called polymorphisms and are used in all types of DNA typing.
RFLP methodology involves:
1. Cutting a particular region of DNA with known variability, with restriction enzymes
2. Separating the DNA fragments by agarose gel electrophoresis
3. Determining the number of fragments and their relative sizes
The pattern of fragment sizes will differ for each individual tested.
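The idea that a changed restriction site changes the fragment pattern can be shown with an in-silico digest. This is only an illustrative sketch: the two short sequences are invented, the `digest` helper is hypothetical, and the DNA is treated as a single linear strand. It uses EcoRI, which recognises GAATTC and cuts after the first G.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of `site`; return fragment lengths."""
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)  # position of the cut within the sequence
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_positions + [len(sequence)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Two made-up alleles of the same region; allele 2 has lost the second EcoRI site
# through a single base change (GAATTC -> GAGTTC).
allele_1 = "ATGC" * 3 + "GAATTC" + "ATGC" * 5 + "GAATTC" + "ATGC" * 2
allele_2 = "ATGC" * 3 + "GAATTC" + "ATGC" * 5 + "GAGTTC" + "ATGC" * 2

print(digest(allele_1, "GAATTC", cut_offset=1))  # three fragments
print(digest(allele_2, "GAATTC", cut_offset=1))  # two fragments, one of them longer
```

On a gel, the first allele would give bands of 13 and 26 bp, while the second would give bands of 13 and 39 bp; that difference in band pattern is the polymorphism.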
The RFLP technique has many applications, such as:
DNA fingerprinting in forensic science
Tracing ancestry
Studying evolution and migration of wild life
Detection and diagnosis of certain diseases
Genetic mapping (to calculate the genetic distance between two loci)
Microsatellites are simple sequence tandem repeats (SSTRs). The repeat units are generally di-, tri-, tetra- or pentanucleotides. For example, a common repeat motif in birds is (AC)n, where the two nucleotides A and C are repeated in bead-like fashion a variable number of times (n could range from 8 to 50). They tend to occur in non-coding regions of the DNA (this should be fairly obvious for long dinucleotide repeats), although a few human genetic disorders are caused by (trinucleotide) microsatellite regions in coding regions. On each side of the repeat unit are flanking regions that consist of "unordered" DNA. The flanking regions are critical because they allow us to develop locus-specific primers to amplify the microsatellites with PCR (polymerase chain reaction). That is, given a stretch of unordered DNA 30-50 base pairs (bp) long, the probability of finding that particular stretch more than once in the genome becomes vanishingly small. In contrast, a given repeat unit (say (AC)19) may occur in thousands of places in the genome. We use this combination of widely occurring repeat units and locus-specific flanking regions as part of our strategy for finding and developing microsatellite primers. The primers for PCR will be sequences from these unique flanking regions. By having a forward and a reverse primer on each side of the microsatellite, we will be able to amplify a fairly short (100 to 500 bp) locus-specific microsatellite region.
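To make the repeat-plus-flank structure concrete, here is a small sketch that scans a sequence for an (AC)n run and reports the repeat count together with the flanking sequence on either side. The sequence, the function name and the 10 bp flank length are invented for illustration; real primer design uses much longer flanks, as described above.

```python
import re


def find_repeats(sequence, motif="AC", min_units=8):
    """Return (start, end, units) for each run of at least `min_units` copies of `motif`."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_units))
    hits = []
    for match in pattern.finditer(sequence):
        units = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), match.end(), units))
    return hits


# A made-up locus: 10 bp of flank, an (AC)19 repeat, and 10 bp of flank.
locus = "GATTACAGGT" + "AC" * 19 + "TGCATCGGAT"

for start, end, units in find_repeats(locus):
    left_flank = locus[max(0, start - 10):start]
    right_flank = locus[end:end + 10]
    print(f"(AC){units} at {start}-{end}, flanks: {left_flank} ... {right_flank}")
```

The isolated "AC" inside the left flank is ignored because it falls short of the minimum of eight repeat units.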
Microsatellites are useful genetic markers because they tend to be highly polymorphic. It is not uncommon to have human microsatellites with 20 or more alleles and correspondingly high heterozygosities.
The reason seems to be that their mutations occur in a fashion very different from that of "classical" point mutations (where a substitution of one nucleotide for another occurs, such as a G substituting for a C). The mutation process in microsatellites occurs through what is known as slippage replication. If we envision the repeat units (e.g., an AC dinucleotide repeat) as beads on a chain, we can imagine that during replication two strands could slip relative positions a bit, but still manage to get the zipper going down the beads. One strand or the other could then be lengthened or shortened by addition or excision of nucleotides.
The result will be a novel "mutation" that comprises a repeat unit that is one bead longer or shorter than the original. The idea that adding or subtracting one repeat is likely easier than adding or subtracting two or more beads is the basis for using the Stepwise Mutation Model (SMM) as opposed to the Infinite Alleles Model (IAM). An advantage of the SMM is that the difference in size then conveys additional information about the phylogeny of alleles. Under the IAM the only two states are "same" and "different". Under the SMM we have a potential continuum of different similarities. If, however, the SMM does not hold, then we may be worse off using it; it may actually be highly misleading. Even if the underlying mutation process is largely stepwise, it is not difficult to see how drift might affect the distribution of allele sizes in a way that would almost entirely invalidate the SMM.
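The contrast between the two models can be reduced to how they measure the difference between two alleles. The sketch below is a deliberately minimal illustration with hypothetical function names; alleles are represented simply by their repeat counts.

```python
def iam_distance(repeats_a, repeats_b):
    """Infinite Alleles Model: alleles are either the same (0) or different (1)."""
    return 0 if repeats_a == repeats_b else 1


def smm_distance(repeats_a, repeats_b):
    """Stepwise Mutation Model: distance is the number of single-repeat steps apart."""
    return abs(repeats_a - repeats_b)


# Compare an (AC)19 allele with three other alleles (repeat counts are made up).
for other in (19, 20, 25):
    print(19, other, iam_distance(19, other), smm_distance(19, other))
```

Under the IAM the (AC)20 and (AC)25 alleles are equally "different" from (AC)19, whereas under the SMM the (AC)20 allele is treated as a much closer relative.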
Microsatellites have several advantages as genetic markers:
Locus-specific (in contrast to multi-locus markers such as minisatellites or RAPDs)
Codominant (heterozygotes can be distinguished from homozygotes)
PCR-based (we need only tiny amounts of tissue; works on highly degraded or "ancient" DNA)
Highly polymorphic (provides considerable pattern)
Useful at a range of scales, from individual ID to fine-scale phylogenies
Microsatellites are useful markers at a wide range of scales of analysis. Until recently, they were the most important tool in mapping genomes, such as the widely publicized mapping of the human genome. They serve a role in biomedical diagnosis as markers for certain disease conditions. That is, certain microsatellite alleles are associated (through genetic linkage) with certain mutations in coding regions of the DNA that can cause a variety of medical disorders. They have also become the primary marker for DNA testing in forensic contexts, both for human and wildlife cases (e.g., Evett and Weir, 1998). The reason for this prevalence as a forensic marker is their high specificity. Match identities for microsatellite profiles can be very high. In a biological/evolutionary context they are useful as markers for parentage analysis. They can also be used to address questions concerning degree of relatedness of individuals or groups. For captive or endangered species, microsatellites can serve as tools to evaluate inbreeding levels (F_IS). From there we can move up to the genetic structure of subpopulations and populations (using tools such as F-statistics and genetic distances). They can be used to assess demographic history (e.g., to look for evidence of population bottlenecks), to assess effective population size and to assess the magnitude and directionality of gene flow between populations. Microsatellites provide data suitable for phylogeographic studies that seek to explain the concordant biogeographic and genetic histories of the floras and faunas of large-scale regions. They are also useful for fine-scale phylogenies, up to the level of closely related species. An overview by Selkoe and Toonen (2006) provides a useful practical guide to the use of microsatellites as genetic markers.
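As an illustration of how an inbreeding level might be estimated from microsatellite genotypes at one locus, the sketch below uses the standard definition F_IS = 1 - Ho/He, where Ho is the observed heterozygosity and He = 1 - Σp² is the heterozygosity expected from the allele frequencies p. The genotypes and the function name are invented, and no small-sample correction is applied.

```python
from collections import Counter


def inbreeding_coefficient(genotypes):
    """Estimate F_IS = 1 - Ho/He from a list of (allele, allele) genotypes at one locus."""
    n = len(genotypes)
    observed_het = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    # Expected heterozygosity under random mating (assumes the locus is polymorphic,
    # otherwise He is zero and the ratio is undefined).
    expected_het = 1 - sum((count / total_alleles) ** 2 for count in allele_counts.values())
    return 1 - observed_het / expected_het


# Four individuals typed at one microsatellite; alleles are repeat counts (made up).
sample = [(19, 19), (19, 21), (21, 21), (19, 19)]
print(round(inbreeding_coefficient(sample), 3))
```

A positive value indicates fewer heterozygotes than random mating would produce, which is the pattern expected under inbreeding.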
Microsatellite DNA is probably rarely useful for higher-level systematics. That is because the mutation rate is too high. Across highly divergent taxa two problems arise. First, the microsatellite primer sites may not be conserved (that is, the primers we use for Species A may not even amplify in Species B). Second, the high mutation rate means that homoplasy becomes much more likely; we can no longer safely assume that two alleles identical in state are identical by descent (from a common, meaning shared not abundant, ancestor). As a concrete example imagine two species, each with an (AC)19 allele that occurs at high frequency. If the populations diverged long ago it becomes increasingly likely that the way those alleles arose took different pathways (e.g., in one species the (AC)19 arose from an ancestor that went from (AC)18 to (AC)19; in the other species the ancestral (AC)18 went to (AC)19 to (AC)20 then back to (AC)19 and stayed there). Any inferences we make about the species relationships based on the (AC)19 similarity would be misleading.
The identity in state does not correspond to the identity by descent that provides (reliable) phylogenetic signal. A further potential drawback of using microsatellites is that we tend to have relatively few loci to work with (4-20). In some situations, that raises the probability of bias due to forces such as selection acting on one or more loci, which may give a misleading impression relative to the true pattern of change for the genome as a whole.
Figure: Stylized examples of microsatellite data.
Left half: Four sets of data were produced by gel electrophoresis, so you can see the major (black) and stutter (gray) bands. MW: molecular weight standards. Right half: These data were produced by analysis on an automated capillary electrophoresis-based DNA sequencer. The data are line graphs in which the location of each peak on the X-axis represents a different-sized PCR product and the height of each peak indicates the amount of PCR product. The major bands produce higher peaks than the stutter peaks.
HomozygosityMapper is a web-based approach to homozygosity mapping. It stores marker data in a database into which users can upload their SNP genotype files. The database analyses the data within a few minutes, detects homozygous stretches and provides a graphical interface to the results. The software also provides the option to zoom into single chromosomes and user-defined chromosomal regions. It is integrated with the gene search engine GeneDistiller, which helps users identify the most promising candidate genes. Users can restrict access to their uploaded data or make them public. Hence HomozygosityMapper can also serve as a data repository for homozygosity mapping studies.
Genetic linkage means that certain genes tend to be inherited together because they are on the same chromosome. Thus parental combinations of characters are found more frequently in offspring than non-parental combinations. Genetic loci that are physically close to one another on the same chromosome tend to stay together during meiosis, and are thus genetically linked.
In 1905 the three geneticists William Bateson, Edith Rebecca Saunders, and Reginald C. Punnett discovered an apparent exception to one of Mendel's foundational proposals: the principle of independent assortment.
In their work with pea plants, these researchers noticed that not all of their crosses yielded results that reflected the principle of independent assortment. Specifically, some phenotypes appeared far more frequently than traditional Mendelian genetics would predict. Based on these findings, they proposed that certain alleles must somehow be coupled with one another, although they weren't sure how this linkage occurred. The answer to this question came just seven years later, when Thomas Hunt Morgan used fruit flies to demonstrate that linked genes must be real physical objects that are located in close proximity on the same chromosome.
In 1910, Morgan discovered a fly with mutant white eyes; fruit flies normally have red eyes.
Morgan crossed this white-eyed male fly to its red-eyed sisters. When he later inbred the heterozygous F1 red-eyed flies, the traits of the F2 progeny did not assort independently. Morgan expected a 1:1:1:1 ratio of red-eyed females, red-eyed males, white-eyed males, and white-eyed females. Instead, he observed the following phenotypes in his F2 generation:
2,459 red-eyed females
1,011 red-eyed males
782 white-eyed males
There were no white-eyed females, and Morgan wondered whether this was because the trait was sex-limited and only expressed in male flies. To test whether this trait was sex-limited, he completed a second cross between the original white-eyed male fly and some of his F1 daughters. These crosses produced an F2 generation with the following phenotypes:
129 red-eyed females
132 red-eyed males
88 white-eyed females
86 white-eyed males
Thus, the results of this cross did produce white-eyed females, and the groups had approximately equal numbers. Morgan therefore hypothesized that the eye color trait was connected with the sex factor. This in turn led to the idea of genetic linkage, which means that when two genes are closely associated on the same chromosome, they do not assort independently.
A linkage map is a genetic map of a species or experimental population that shows the position of its known genes or genetic markers relative to each other in terms of recombination frequency, rather than as specific physical distance along each chromosome.
Linkage mapping is critical for identifying the location of genes that cause genetic diseases.
A genetic map is a map based on the frequencies of recombination between markers during crossover of homologous chromosomes. The greater the frequency of recombination (segregation) between two genetic markers, the farther apart they are assumed to be. Conversely, the lower the frequency of recombination between the markers, the smaller the physical distance between them. Historically, the markers originally used were detectable phenotypes derived from coding DNA sequences; later, confirmed or assumed noncoding DNA sequences such as microsatellites or those generating restriction fragment length polymorphisms (RFLPs) came into use.
Genetic maps help researchers to locate other markers, such as other genes by testing for genetic linkage of the already known markers.
A genetic map is not a physical map (such as a radiation reduced hybrid map) or gene map.
A linkage map is a map of the genes on a chromosome based on linkage analysis. It does not show the physical distances between genes but rather their relative positions, as determined by how often two gene loci are inherited together. The closer two genes are (the more tightly they are linked), the more often they will be inherited together.
Linkage distance is measured in centimorgans (cM).
Genetic linkage maps of each chromosome are made by determining how frequently two markers are passed together from parent to child. Because genetic material is sometimes exchanged during the production of sperm and egg cells, groups of traits (or markers) originally together on one chromosome may not be inherited together. Closely linked markers are less likely to be separated by spontaneous chromosome rearrangements. In this diagram, the vertical lines represent chromosome 4 pairs for each individual in a family.
The father has two traits that can be detected in any child who inherits them: a short known DNA sequence used as a genetic marker (M) and Huntington's disease (HD). The fact that one child received only a single trait (M) from that particular chromosome indicates that the father's genetic material recombined during the process of sperm production. The frequency of this event helps determine the distance between the two DNA sequences on a genetic map.
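The link between how often two markers are separated and their map distance can be sketched in a few lines. The offspring counts below are invented, the function names are hypothetical, and the conversion assumes the usual approximation that 1% recombination corresponds to about 1 cM, which only holds for closely spaced markers.

```python
def recombination_fraction(recombinant, non_recombinant):
    """Fraction of offspring in which the two markers were separated by recombination."""
    return recombinant / (recombinant + non_recombinant)


def approx_map_distance_cm(fraction):
    """Approximate map distance in centimorgans (reasonable only for small fractions)."""
    return 100 * fraction


# Made-up counts: 12 of 100 children inherited M without HD, or HD without M.
fraction = recombination_fraction(recombinant=12, non_recombinant=88)
print(fraction, approx_map_distance_cm(fraction))
```

For markers that are far apart, double crossovers go undetected and the observed recombination fraction underestimates the true map distance, which is why mapping functions are used in practice.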
The LOD score (logarithm (base 10) of odds), developed by Newton E. Morton, is a statistical test often used for linkage analysis in human, animal, and plant populations. The LOD score compares the likelihood of obtaining the test data if the two loci are indeed linked to the likelihood of observing the same data purely by chance. Positive LOD scores favor the presence of linkage, whereas negative LOD scores indicate that linkage is less likely. Computerized LOD score analysis is a simple way to analyze complex family pedigrees in order to determine the linkage between Mendelian traits (or between a trait and a marker, or two markers).
The method is described in greater detail by Strachan and Read. Briefly, it works as follows:
1. Establish a pedigree
2. Make a number of estimates of recombination frequency
3. Calculate a LOD score for each estimate
4. The estimate with the highest LOD score will be considered the best estimate
The LOD score is calculated as follows:

$$\mathrm{LOD} = Z = \log_{10} \frac{(1-\theta)^{NR} \times \theta^{R}}{0.5^{(NR+R)}}$$

Where:
NR denotes the number of non-recombinant offspring,
R denotes the number of recombinant offspring,
θ (theta) is the recombinant fraction; it is equal to R / (NR + R).
The reason 0.5 is used in the denominator is that any alleles that are completely unlinked (e.g. alleles on separate chromosomes) have a 50% chance of recombination, due to independent assortment.
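The calculation can be sketched directly from these definitions: the likelihood of the data at a given θ, (1 - θ)^NR × θ^R, is compared with the likelihood under no linkage, 0.5^(NR + R). The offspring counts and the function name below are invented for illustration, and the scan over θ from 0.01 to 0.50 is one possible way to "make a number of estimates".

```python
import math


def lod_score(non_recombinants, recombinants, theta):
    """LOD = log10 of [(1 - theta)^NR * theta^R] / 0.5^(NR + R), for 0 < theta <= 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in the interval (0, 0.5]")
    total = non_recombinants + recombinants
    return (non_recombinants * math.log10(1 - theta)
            + recombinants * math.log10(theta)
            - total * math.log10(0.5))


# Made-up data: 9 non-recombinant and 1 recombinant offspring.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(9, 1, t))
print(best_theta, round(lod_score(9, 1, best_theta), 3))
```

Working with logarithms rather than raw likelihoods avoids very small numbers and makes it obvious why LOD scores from independent families can simply be added.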
In practice, LOD scores are looked up in a table which lists LOD scores for various standard pedigrees and various values of recombination frequency.
By convention, a LOD score greater than 3.0 is considered evidence for linkage. A LOD score of +3 indicates 1000 to 1 odds that the linkage being observed did not occur by chance. On the other hand, a LOD score less than -2.0 is considered evidence to exclude linkage.
Although it is very unlikely that a LOD score of 3 would be obtained from a single pedigree, the mathematical properties of the test allow data from a number of pedigrees to be combined by summing the LOD scores. It is important to keep in mind that this traditional cutoff of LOD > +3 is an arbitrary one and that for certain types of linkage studies, particularly analyses of complex genetic traits with hundreds of markers, the criterion should probably be raised to a somewhat higher cutoff.
The dilemma of mapping genes can be overcome through the "lod score method", which involves the estimation of genetic distances in situations other than simple testcrosses. Data obtained from pedigrees are used to calculate map distances from recombination frequencies. It is one of the fundamental methods of human genetics used today. Spreadsheet programs such as Lotus 1-2-3 or Microsoft Excel make the calculation straightforward.
The aim is to determine R, the recombinant fraction (the fraction of gametes that are recombinant), using data from relatively small families. Note that in this method R denotes the recombinant fraction itself, not the count of recombinant offspring. R can vary from 0 (2 genes completely linked) to 0.50 (2 genes unlinked).
There are 4 basic steps in the method:-
(1) Determine the expected frequencies of F2 phenotypes for every value of R from 0.01 to 0.50
(2) Determine the "likelihood" (L) that the family data observed resulted from a given R value: the maximum likelihood is the best estimate of R for the given data
(3) Determine the Odds Ratio and the logarithm of the odds ratio (lod score) by comparing the Likelihood for each value of R to the Likelihood for unlinked genes (R = 0.50)
(4) Add LOD scores from different families to achieve an acceptably high lod score so a specific most likely R can be assigned.
The following example considers two genes showing complete dominance: the heterozygote is indistinguishable from the dominant homozygote.
The expected offspring numbers are calculated as follows:
Determine the frequency of each gamete produced by the F1's. For example, if R = 0.20, then 20% of the gametes produced by either parent will be recombinant. Since there are two types of recombinant gamete, A b and a B, the frequency of each will be 0.10. Since 80% of the gametes will be parental, the frequency of the parental types A B and a b will be 0.40 each.
Use a Punnett square to determine the offspring being formed from the union of the gametes. Multiply the gamete frequencies to get the offspring frequency. For instance, one cell of the Punnett square has the A B gamete from the father combining with the A b gamete from the mother. The frequency of the A B gamete is 0.40 and the frequency of the A b gamete is 0.10. Thus the frequency of the offspring in this cell is 0.40 x 0.10 = 0.04.
Determine the phenotype for each cell in the Punnett square and add up the frequencies to get the total frequency for each offspring phenotype.
Gamete (frequency) | A B (0.40) | A b (0.10) | a B (0.10) | a b (0.40)
A B (0.40) | A B/A B 0.16 | A b/A B 0.04 | a B/A B 0.04 | a b/A B 0.16
A b (0.10) | A B/A b 0.04 | A b/A b 0.01 | a B/A b 0.01 | a b/A b 0.04
a B (0.10) | A B/a B 0.04 | A b/a B 0.01 | a B/a B 0.01 | a b/a B 0.04
a b (0.40) | A B/a b 0.16 | A b/a b 0.04 | a B/a b 0.04 | a b/a b 0.16

F2 Phenotype | Cell Sums | Expected Freq
A_ B_ | 0.16 + 0.04 + 0.04 + 0.16 + 0.04 + 0.01 + 0.04 + 0.01 + 0.16 | 0.66
A_ bb | 0.01 + 0.04 + 0.04 | 0.09
aa B_ | 0.01 + 0.04 + 0.04 | 0.09
aa bb | 0.16 | 0.16
In summary, step 1 uses a Punnett square to determine the genotypes, multiplies the frequencies of the two gametes that go into each type of offspring, and then adds up the offspring that have the same phenotype.
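The Punnett square procedure can also be written as a short program instead of a spreadsheet. This is a minimal sketch under the same assumptions as the example (both parents are F1's producing A B and a b as parental gametes, with complete dominance at both genes); the function name and dictionary keys are chosen here for illustration.

```python
from itertools import product


def expected_f2_frequencies(r):
    """Expected F2 phenotype frequencies for recombinant fraction r."""
    # Parental gametes share (1 - r); recombinant gametes share r.
    gametes = {"AB": (1 - r) / 2, "Ab": r / 2, "aB": r / 2, "ab": (1 - r) / 2}
    totals = {"A_B_": 0.0, "A_bb": 0.0, "aaB_": 0.0, "aabb": 0.0}
    # Each pair of gametes is one cell of the Punnett square.
    for (gamete_1, freq_1), (gamete_2, freq_2) in product(gametes.items(), repeat=2):
        dominant_a = "A" in (gamete_1[0], gamete_2[0])
        dominant_b = "B" in (gamete_1[1], gamete_2[1])
        phenotype = ("A_" if dominant_a else "aa") + ("B_" if dominant_b else "bb")
        totals[phenotype] += freq_1 * freq_2
    return totals


for phenotype, frequency in expected_f2_frequencies(0.20).items():
    print(phenotype, round(frequency, 4))
```

Calling the function for every R from 0.01 to 0.50 gives the full table of expected frequencies needed for the later steps.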
Step 2 is carried out by determining the likelihood (L) of the observed family for each value of R.
The likelihood is simply the probability of the observed family, as determined using the multinomial theorem, an extension of the binomial theorem.
First define the terms for the observed family:
a = number of A_ B_ offspring
b = number of A_ bb offspring
c = number of aa B_ offspring
d = number of aa bb offspring
n = total offspring (= a + b + c + d)
Then define the terms for the expected family proportions (obtained from step 1 above):
p = expected proportion of A_ B_ offspring
q = expected proportion of A_ bb offspring
r = expected proportion of aa B_ offspring
s = expected proportion of aa bb offspring
The term of the multinomial equation that describes the actual family is $p^a q^b r^c s^d$ multiplied by a coefficient.
The coefficient is n! / (a! b! c! d!), where ! means "factorial".
This is very similar to the coefficient for the binomial.
Thus, the likelihood equation is:

$$L = \frac{n!}{a!\,b!\,c!\,d!}\; p^{a} q^{b} r^{c} s^{d}$$
The expected phenotype proportions for R = 0.20 (20 map units between A and B) were calculated above. They are: A_ B_ = 0.66; A_ bb = 0.09; aa B_ = 0.09; aa bb = 0.16. A family of 5 children has 2 with the A_ B_ phenotype, 1 with aa B_, and 2 with aa bb.

$$L = \frac{5!}{2!\,0!\,1!\,2!}\,(0.66)^{2}(0.09)^{0}(0.09)^{1}(0.16)^{2}$$

$$L = 30\,(0.4356)(0.09)(0.0256)$$

$$L = 0.0301$$
The likelihood (L) needs to be calculated for all values of R between 0.01 and 0.50. Note that the coefficient will be the same for all values of R; the coefficient only depends on the observed data. When this is done, the value of R with the highest likelihood is the best estimate of R that can be obtained with data from this particular family.
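The likelihood calculation for one family and one value of R can be sketched as follows. The function name is hypothetical; it takes the observed counts in the order A_ B_, A_ bb, aa B_, aa bb and the matching expected proportions, which are assumed to have been worked out already in step 1.

```python
from math import factorial


def family_likelihood(counts, proportions):
    """Multinomial probability of the observed phenotype counts given expected proportions."""
    coefficient = factorial(sum(counts))
    for count in counts:
        coefficient //= factorial(count)  # n! / (a! b! c! d!), always a whole number
    likelihood = coefficient
    for count, proportion in zip(counts, proportions):
        likelihood *= proportion ** count
    return likelihood


# The family from the worked example: 2 A_ B_, 0 A_ bb, 1 aa B_, 2 aa bb.
observed = [2, 0, 1, 2]
print(round(family_likelihood(observed, [0.66, 0.09, 0.09, 0.16]), 4))      # R = 0.20
print(round(family_likelihood(observed, [0.5625, 0.1875, 0.1875, 0.0625]), 5))  # R = 0.50
```

Repeating the call with the expected proportions for each R between 0.01 and 0.50 and keeping the largest result gives the maximum-likelihood estimate of R for this family.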
Data from several different families need to be compared and added to get a good estimate of R. To do this, the L values must be standardized by calculating the Odds Ratio (OR), which is the L for each R value divided by the L for R = 0.50 (unlinked). Then the logarithm of the Odds Ratio is taken; this is the LOD score. LOD scores from different families can be added. (This is equivalent to multiplying the Odds Ratios, as in the AND rule for two events, family 1 AND family 2, both occurring.) A total LOD score of 3.0 for some R value is considered proof of linkage between the two genes.
For R = 0.20, the Odds Ratio = $L_{0.20} / L_{0.50}$. We calculated $L_{0.20}$ = 0.0301 above; $L_{0.50}$ = 0.00695. The Odds Ratio is thus 4.331 and the LOD score is the base 10 logarithm of this, 0.637. Clearly it would take several families of this size to reach a LOD score of 3.0.
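Steps 3 and 4 can be sketched together. Because the multinomial coefficient depends only on the observed data, it cancels in the Odds Ratio, so the LOD score can be computed from the expected proportions alone. The sketch below uses a closed form for those proportions, (1 - R)²/4 for aa bb, which agrees with the Punnett square values for this type of cross; the function names and the second, third and further families are invented for illustration.

```python
from math import log10


def phenotype_proportions(r):
    """Expected proportions of A_ B_, A_ bb, aa B_, aa bb for recombinant fraction r."""
    double_recessive = (1 - r) ** 2 / 4  # both gametes must be a b
    return [0.5 + double_recessive, 0.25 - double_recessive,
            0.25 - double_recessive, double_recessive]


def family_lod(counts, r):
    """LOD score of one family: log10 of L(r) / L(0.50); the coefficient cancels."""
    linked = phenotype_proportions(r)
    unlinked = phenotype_proportions(0.5)
    return sum(count * log10(p / q) for count, p, q in zip(counts, linked, unlinked))


# The family from the worked example, then five hypothetical families like it.
example_family = [2, 0, 1, 2]
print(round(family_lod(example_family, 0.20), 3))

families = [example_family] * 5
total_lod = sum(family_lod(counts, 0.20) for counts in families)
print(round(total_lod, 2), total_lod > 3.0)
```

In a full analysis the total would be computed for every R from 0.01 to 0.50, and the R with the highest total LOD score would be taken as the best estimate.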
Lobo, I. & Shaw, K. (2008) Discovery and types of genetic linkage. Nature Education 1(1).
http://www.bios.niu.edu/johns/lodprob.htm
http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer/fig8.html
http://www.bio.davidson.edu/courses/genomics/method/RFLP.html
http://biotech.about.com/od/glossary/g/RFLPdef.htm
http://www.nlm.nih.gov/visibleproofs/education/dna/rflp.pdf
http://www.biology-online.org/dictionary/Genetic_linkage
European Journal of Human Genetics (2007) 15, 362–368. doi:10.1038/sj.ejhg.5201761.
Protocol Exchange (2007) doi:10.1038/nprot.2007.343.