Applications of Knot Theory to DNA


Dominic Maes

Math 491 – Knot Theory

Dr. Starrett


Knot theory, a branch of topology, is the mathematical study of knots, which it defines as embeddings of a circle in Euclidean 3-space. The subject originated in the 1860s with Lord Kelvin’s vortex atom theory, which hypothesized that atoms are knots in the ether. When the scientific community abandoned the vortex atom theory, interest in applying knot theory to the natural sciences faded for almost a century. That interest revived after 1953, when James Watson and Francis Crick discovered the structure of deoxyribonucleic acid (DNA), the basic genetic material of life on Earth. DNA takes the form of two long strands of alternating sugars and phosphates that are twisted together in a right-handed helix. The strands of DNA are tightly packed inside cells and, as a result, are usually tangled and knotted. Because of this knotting and tangling, many applications of knot theory to molecular biology have developed over the last few decades.

One of the primary applications of knot theory in molecular biology is modeling the mechanism of DNA recombination. Vital genetic processes such as replication and transcription require DNA to be topologically manipulated. Recombination is a process in which enzymes called recombinases cut two neighboring segments of DNA and reconnect them in a different arrangement.

It is a great challenge for biologists to create a model that accurately describes the actions of enzymes during recombination, since enzymes are too small for their action on DNA to be observed directly. However, knot theory makes it possible to build a mathematical model of the enzymatic processes carried out during DNA recombination. The purpose of this paper is to explain the application of knot theory to DNA recombination.

In order to understand the mechanism of DNA recombination, it is necessary to describe the physical and chemical structure of a DNA molecule. Recall that a DNA molecule consists of two strands, joined together by chemical bonds, that are twisted into a right-handed helix. This structure resembles a twisted ribbon, as shown in Figure 1.

Figure 1. A sketch of the structure of a DNA molecule. [2]

The two strands of a DNA molecule are composed of alternating sugar and phosphate molecules, which are held together by chemical bonds. Figure 2 shows a sugar and phosphate molecule.


Figure 2. The chemical structure of the sugar molecule and the phosphate molecule. An alternating chain of sugar and phosphate molecules makes up each of the two strands in a DNA molecule. [2]

Bonded to each sugar molecule is one of four bases. These four bases, illustrated in Figure 3, are called adenine, thymine, cytosine and guanine.

Figure 3. The chemical structure of the four bases in a DNA molecule: adenine, guanine, thymine and cytosine. The bases hold the two strands of a DNA molecule together. [2]

In Figure 3, the letter R on each base marks the site where the base is bonded to the sugar molecule. A base bonded to a sugar molecule on one strand is also bonded to a corresponding base attached to a sugar molecule on the other strand. In this manner, the two strands of a DNA molecule are held together. Each base can bond only to its specific counterpart: adenine bonds only to thymine, and cytosine bonds only to guanine. Two bases bonded in this way form what is referred to as a base pair. It has been determined that DNA has a pitch of about 10.5 base pairs [2]. In other words, the two strands in DNA complete one full turn of the double helix every 10.5 base pairs.
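The pairing rule and the pitch figure can be illustrated with a short Python sketch. The names `complement_strand` and `helical_turns` are hypothetical, introduced only for this illustration, and the value 10.5 is the approximate pitch quoted from [2].

```python
# Base-pairing rule: adenine (A) pairs with thymine (T), cytosine (C) with guanine (G).
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Approximate number of base pairs per full turn of the helix, as quoted from [2].
BASE_PAIRS_PER_TURN = 10.5


def complement_strand(strand: str) -> str:
    """Return the base paired opposite each base of `strand`, position by position."""
    return "".join(PAIRING[base] for base in strand.upper())


def helical_turns(num_base_pairs: int) -> float:
    """Approximate number of full helical turns spanned by `num_base_pairs`."""
    return num_base_pairs / BASE_PAIRS_PER_TURN


if __name__ == "__main__":
    print(complement_strand("ATCG"))
    print(helical_turns(105))
```

Under these assumptions a stretch of 105 base pairs corresponds to about 10 turns of the double helix.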

The combination of a sugar molecule, a phosphate molecule and a base molecule is known as a nucleotide. Nucleotides are the building blocks of DNA molecules.

Within a nucleotide, each sugar molecule has two sites where phosphate molecules are able to form a bond. These two sites are referred to as the 3’ site and the 5’ site [2].

Figure 4 shows an illustration of a chain of two nucleotides as well as the 3’ site and the 5’ site.

Figure 4. A diagram of the chemical structure of a DNA strand. [2]

The ends of each strand are labeled according to the 3’ site and the 5’ site. The end containing the free 3’ site is called the 3’ end, while the end containing the free 5’ site is called the 5’ end. The two strands in DNA are oppositely oriented, so that the 3’ end of one strand always lies opposite the 5’ end of the other strand and vice versa.

In DNA recombination, enzymes cut a segment of DNA out of one DNA molecule and insert that segment into another DNA molecule. There are several types of DNA recombination, including homologous, site-specific and transpositional. Of these, knot theory is most commonly used to model site-specific recombination, in which enzymes interact with DNA only at specific sites called recombination sites.

In nature, DNA can exist in linear form (a long line segment) or circular form (a closed circle). Although DNA is most commonly found in linear form, molecular biologists prefer to use circular DNA when studying enzymatic processes during site-specific recombination. This is because the knottedness and tangledness of DNA molecules are easier to quantify with circular DNA.

Before recombination of a circular DNA molecule takes place, the DNA is referred to as the substrate, and after recombination takes place it is called the product.

The process of site-specific recombination of circular DNA begins by aligning the DNA so that two recombination sites are in close proximity to each other, as shown in Figure 6a. Once the DNA is aligned properly, an enzyme bonds itself to both of the recombination sites, as seen in Figure 6b. Each of the recombination sites has a certain orientation, which is determined by the order of the bases as one reads around the circular DNA. If the orientations of the recombination sites are the same, then the pair of sites is referred to as direct repeats. If the sites are oppositely oriented, then the recombination sites are called inverted repeats. Figure 5 shows the difference between direct repeats and inverted repeats.

Figure 5. A diagram showing inverted repeats and direct repeats. In inverted repeats, the orientations of the recombination sites are reversed. In direct repeats, the orientations of the recombination sites are the same. [11]

While the enzyme is bonded to the DNA, the enzyme cuts the DNA at both recombination sites and then recombines the ends by exchanging them, forming the product shown in Figure 6.

Figure 6. A diagram showing the recombination sites on a DNA molecule, an enzyme bound to DNA and the product that is formed after recombination takes place. [11]

In certain instances, an enzyme may perform more than one recombination event while bound to the DNA. This occurrence is referred to as processive recombination.

After an enzyme acts on one or more circular DNA molecules, the resulting product may take the form of a complicated knot or link. There are several methods that allow researchers to determine what kind of knot or link is formed from the substrate after site-specific recombination events have taken place.


One of these methods is called gel electrophoresis, in which molecules are separated on the basis of size and shape. First, DNA molecules are placed on a plate coated with agarose gel. At one end of the plate is a positively charged electrode, to which the negatively charged DNA molecules are attracted. Because a more tightly knotted molecule is more compact and meets less resistance from the gel, it travels faster towards the electrode. When the electric field is turned off, the DNA molecules closest to the electrode are the most knotted, while those furthest from the electrode are the least knotted. Figure 7 shows a graph that relates the distance traveled by circular DNA on the agarose gel to the average number of crossings within each DNA molecule.

Figure 7. A graph relating crossing number of DNA to the distance a DNA molecule will travel on agarose gel during gel electrophoresis. [11]

The method of gel electrophoresis can only give an estimate of the number of crossings that a DNA molecule contains. For a more precise measurement, an electron microscope is used to view the DNA molecule and count the number of crossings in a product. Therefore, when studying DNA recombination, researchers know the substrate, which is the unknot, and they are able to determine the product through gel electrophoresis and electron microscopy. However, these methods do not reveal the precise actions that enzymes carry out on DNA to form the product.

Knot theory enables scientists to create mathematical models that predict the topological actions that enzymes carry out to manipulate DNA. One of these models is the Tangle Model, developed by Ernst and Sumners in 1990, which predicts the actions of the enzyme Tn3 resolvase on circular DNA during site-specific recombination.

In order to understand the Tangle Model it is necessary to first introduce some basic definitions and concepts pertaining to knot theory and tangle theory.

Recall that knot theory analyzes the properties of closed loops in 3-space. Tangle theory is analogous to knot theory, except that instead of closed loops it studies strings whose end points remain fixed. In order to understand the utility of tangle theory, consider the unit ball in $\mathbb{R}^3$ centered at the origin of the x-y-z coordinate system. Next, imagine two strings entering the ball at two points and exiting at two points, as seen in Figure 8.

Figure 8. A two string tangle. [2]

Inside the ball the strings may be tangled or knotted. This is referred to as a 2-string tangle. If one were to look at a projection of the tangle onto the x-y plane, one would see the tangle in $\mathbb{R}^2$ with the two strings meeting the boundary circle at four points (NW, NE, SW and SE), as shown in Figure 9.

Figure 9. The projection of a two string tangle onto the x-y plane. [2]

This two-dimensional projection is called the tangle diagram of the tangle. A 2-string tangle, such as the one in Figure 9, is denoted by (B,t), where B represents the ball (drawn as a circle in the diagram) on whose boundary the end points of the two strings lie and t represents the two strings.

The simplest 2-string tangle, shown in Figure 10, has no crossings and is called the trivial tangle.

Figure 10. A tangle diagram of the trivial tangle.

Rational tangles are defined as the family of tangles that can be transformed into the trivial tangle through an ambient isotopy during which the end points of t remain on the boundary of B. Every rational tangle can be represented by a vector $(a_1, a_2, \ldots, a_n)$ such that $a_i$ is an integer for all $i$. Given a vector, it is possible to create its corresponding tangle diagram through the following steps. First, start with the trivial tangle as seen in Figure 10. Next, take the two end points SW and SE and make $a_1$ right-hand half twists (if $a_1$ is negative, make $|a_1|$ left-hand half twists). Next, perform $a_2$ right-hand half twists ($|a_2|$ left-hand half twists if $a_2$ is negative) on the end points NE and SE. Then perform $a_3$ half twists on the end points SW and SE, and so on, alternating between the two pairs of end points. Continue with this procedure until the last twist in the tangle vector is completed. Figure 11 shows the tangle corresponding to the tangle vector (2,1,2).

Figure 11. A tangle diagram corresponding to the tangle vector (2,1,2).

Rational tangles can also be expressed by a continued fraction, which is equal to a rational number $\beta/\alpha$ [3]. Given the tangle vector $(a_1, a_2, \ldots, a_n)$ for some tangle T, it is possible to construct the continued fraction of T, which is $a_n + 1/(a_{n-1} + 1/(a_{n-2} + \cdots + 1/a_1)) = \beta/\alpha$ [3]. In this case, the rational number $\beta/\alpha$ is referred to as the tangle fraction of T.

The tangle fraction is essentially a complete formal description of a rational tangle: it can be shown that two rational tangles are ambient isotopic to each other if and only if they have the same tangle fraction [3]. This is known as Conway’s Theorem.
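As a rough illustration of the continued fraction, the following Python sketch evaluates it for an integer tangle vector. The function name `tangle_fraction` is hypothetical, and returning a denominator of 0 to stand for the fraction "infinity" is a convention assumed here rather than one taken from [3].

```python
from math import gcd


def tangle_fraction(vector):
    """Evaluate a_n + 1/(a_(n-1) + ... + 1/a_1) for an integer tangle vector.

    Returns (numerator, denominator) in lowest terms. A denominator of 0
    stands for "infinity" (an assumed convention for this sketch).
    """
    if not vector:
        raise ValueError("tangle vector must have at least one entry")
    num, den = vector[0], 1  # innermost term a_1 = a_1 / 1
    for a in vector[1:]:
        # a + 1/(num/den) = (a*num + den) / num
        num, den = a * num + den, num
    if den < 0:  # keep the sign in the numerator
        num, den = -num, -den
    g = gcd(num, den)
    if g > 1:
        num, den = num // g, den // g
    return num, den


if __name__ == "__main__":
    print(tangle_fraction((2, 1, 2)))
    print(tangle_fraction((2, 0)))
```

For the vector (2,1,2) of Figure 11 this gives 8/3, since 2 + 1/(1 + 1/2) = 8/3, and for (2,0) it gives 1/2.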


In the application of tangle theory to DNA recombination, there are three important tangle operations: tangle addition, the numerator closure of a tangle and the denominator closure of a tangle. The first operation, tangle addition, takes two tangles and combines them to form a new tangle. Given two tangles A and B, A + B is formed by taking the NE and SE end points of A and connecting them with the NW and SW end points of B, as shown in Figure 12.

Figure 12. A tangle diagram illustrating tangle addition.

Given a tangle A, the numerator closure of A, denoted by N(A), is formed by connecting the NW end point of A to the NE end point and the SW end point to the SE end point, as seen in Figure 13.

Figure 13. An illustration of the numerator closure of a tangle.

Given a tangle B, the denominator closure of B, denoted by D(B), is formed by connecting the NW end point of B to the SW end point and the NE end point to the SE end point, as seen in Figure 14.

Figure 14. An illustration of the denominator closure of a tangle.

The numerator or denominator closure of a tangle can form various knots or links depending on the configuration of the tangle. For example, given the tangle T seen in Figure 15, D(T) is the Hopf link.

Figure 15. An illustration showing how the denominator closure of a certain tangle forms the Hopf link.

The tangle operations of addition, numerator closure and denominator closure can be used together in tangle equations to form knots or links. For example, $N((2,0) + (1)) = K_t$ is the tangle equation for the trefoil knot, $K_t$, as seen in Figure 16.


Figure 16. An illustration showing how the numerator closure of the sum of two certain tangles forms the trefoil knot.

Since the trefoil knot can be created as the numerator closure of a sum of rational tangles, it is considered a Montesinos knot. More generally, a knot or link L is said to be Montesinos if there exist rational tangles $A_1, A_2, \ldots, A_n$ such that $L = N(A_1 + A_2 + \cdots + A_n)$ [2].

Another important concept in tangle theory is the 4-plat. A 4-plat is a representation of a knot or link consisting of a braid on 4 strings. Like rational tangles, every 4-plat can be represented by a continued fraction and a vector. 4-plat vectors always contain an odd number of integer entries and take the form $(c_1, c_2, \ldots, c_{2n+1})$.

The vector of a 4-plat is often referred to as the Conway symbol of the 4-plat. The continued fraction for a 4-plat formed from the entries of the Conway symbol is $1/(c_1 + 1/(c_2 + \cdots + 1/c_{2n+1})) = \beta/\alpha$. The 4-plat with fraction $\beta/\alpha$ is often denoted by $b(\alpha,\beta)$. It can be shown that two 4-plats are ambient isotopic to one another if and only if they have the same Conway symbol or one has the reverse of the other’s Conway symbol [1].
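The continued fraction of a Conway symbol can be evaluated with a few lines of Python. This is a minimal sketch that assumes every entry of the symbol is a positive integer, so that no division by zero occurs; the function name `four_plat_from_conway` is hypothetical.

```python
from fractions import Fraction


def four_plat_from_conway(symbol):
    """Return (alpha, beta) with beta/alpha = 1/(c_1 + 1/(c_2 + ... + 1/c_last)).

    Assumes all entries are positive integers (a simplification made for
    this sketch).
    """
    if not symbol or any(c <= 0 for c in symbol):
        raise ValueError("this sketch expects positive integer entries")
    value = Fraction(symbol[-1])           # innermost term c_last
    for c in reversed(symbol[:-1]):        # work outwards towards c_1
        value = c + 1 / value
    frac = 1 / value                       # this is beta / alpha
    return frac.denominator, frac.numerator


if __name__ == "__main__":
    print(four_plat_from_conway((2, 1, 1)))
```

For the Conway symbol (2,1,1) the fraction is 1/(2 + 1/(1 + 1/1)) = 2/5, so the sketch returns (5, 2), that is, the 4-plat b(5,2), which is the figure eight knot.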

Given a Conway symbol $(c_1, c_2, \ldots, c_{2n+1})$, the corresponding 4-plat is created through the following steps. First, start with four parallel strings as seen in Figure 17a. Take strings 2 and 3 and make $c_1$ half twists. Next, moving from right to left, make $c_2$ half twists on strings 1 and 2. Go back to strings 2 and 3 and make $c_3$ half twists, and so on. When all of the twists described by the Conway symbol are completed, connect the ends of the strings. For example, if the Conway symbol is (2,1,1), two half twists are made between strings 2 and 3, one half twist is made between strings 1 and 2, and one half twist is made between strings 2 and 3, as seen in Figure 17b-d. Finally, the construction of the 4-plat is completed by closing the ends as shown in Figure 17e.

Figure 17. The formation of a 4-plat with the Conway symbol (2,1,1). [11]

It can be shown that given two rational tangles, A and B, N(A + B) is a 4-plat [1]. Therefore, one can write a tangle equation of the form N(A + B) = K, where K is an unoriented knot or link expressed as a 4-plat, b(α,β). When applying tangle theory to the topology of DNA recombination, tangle equations arise. It is often useful to solve for different variables within a given tangle equation. In some cases, one is given two known tangles and it is necessary to solve for the corresponding unknown 4-plat. Solving for this unknown 4-plat can be accomplished with the aid of Theorem 1 (shown below), which was proved by Ernst and Sumners.

Theorem 1 [1]. Given two rational tangles $O = u/v$ and $P = x/y$, then $N(O + P)$ is a 4-plat $b(\alpha,\beta)$, where $\alpha = |vx + uy|$ and $\beta$ is determined as follows:

(i) if $\alpha = 0$, then $\beta = 1$;

(ii) if $\alpha = 1$, then $\beta = 0$;

(iii) if $\alpha > 1$, then $\beta$ is uniquely determined by the following: $0 < \beta < \alpha$ and $\beta \equiv \sigma(vy' + ux') \pmod{\alpha}$, where $\sigma = \mathrm{sign}(vx + uy)$ and $y'$ and $x'$ are integers with $xx' - yy' = 1$.
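As a small worked example of the first part of Theorem 1, consider the trefoil equation $N((2,0) + (1)) = K_t$ given earlier. Taking the tangle fraction of a vector to be the continued fraction defined above, $(2,0)$ has fraction $0 + 1/2 = 1/2$ and $(1)$ has fraction $1/1$. Only $\alpha$ is computed here; $\beta$ is left aside.

```latex
\[
O = (2,0) = \tfrac{1}{2} \;\Rightarrow\; u = 1,\ v = 2,
\qquad
P = (1) = \tfrac{1}{1} \;\Rightarrow\; x = 1,\ y = 1,
\]
\[
\alpha = |vx + uy| = |2 \cdot 1 + 1 \cdot 1| = 3 .
\]
```

The value $\alpha = 3$ is consistent with the trefoil knot, which as a 4-plat has the form $b(3,\beta)$.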

In other cases, one may be given a system of tangle equations in which the tangles are unknown and their corresponding 4-plats are known. Theorems 2, 3 and 4 (shown below) consider the case in which one is given a system of four simultaneous equations of the form $N(O + iR) = K_i$ for $0 \le i \le 3$, and they outline a method to solve for the tangles O and R.

Theorem 2 [1]. We are given the system of four simultaneous tangle equations $N(O + iR) = K_i$ for $0 \le i \le 3$, where the $K_i$ are 4-plats and $\{K_1, K_2, K_3\}$ represent at least two different link or knot types. Then there is at most one simultaneous solution {O,R}, and this solution must be of the form R an integral tangle and O either a rational tangle or the sum of two rational tangles.

Theorem 3 [1]. Let {O,R} be a simultaneous solution of the system $N(O + iR) = K_i$ for $0 \le i \le 3$, where the $K_i$ are 4-plats and $\{K_1, K_2, K_3\}$ represent at least two different link or knot types. Then R is a non-zero integral tangle and O is either rational or the sum of two rational tangles.

Theorem 4 [1]. Let O and R be tangles such that O is either rational or a sum of two rational tangles, and $R = (n)$, where $n \neq 0$. Moreover, suppose that $N(O + iR) = K_i$ for $0 \le i \le 3$, where the $K_i$ are 4-plats with crossing number $c_i$. Then $|n|$ is determined as follows:

(I) If $c_0 < c_3 > c_2 > c_1$, then $|n| = c_3 - c_2$;

(II) If $c_0 = c_3$ and $c_1 = c_2$, then $|n| = c_3 - c_2 = c_0 - c_1$;

(III) If $c_0 = c_3$ and $c_1 \neq c_2$, then $|n| = 1$;

(IV) If $c_0 \neq c_3$ and $c_2 \le c_1$, then $|n| = c_0 - c_1$.
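The case analysis of Theorem 4 can be written out as a short Python sketch. The function name `recombination_twists` is hypothetical, and the conditions are transcribed from the statement above rather than checked against [1], so the sketch is only as reliable as that transcription.

```python
def recombination_twists(c0, c1, c2, c3):
    """Return |n| according to cases (I)-(IV) of Theorem 4 as stated above.

    c0..c3 are the crossing numbers of K_0..K_3. Returns None when the
    crossing numbers match none of the four cases.
    """
    if c0 < c3 and c3 > c2 > c1:   # case (I)
        return c3 - c2
    if c0 == c3 and c1 == c2:      # case (II)
        return c3 - c2
    if c0 == c3 and c1 != c2:      # case (III)
        return 1
    if c0 != c3 and c2 <= c1:      # case (IV)
        return c0 - c1
    return None


if __name__ == "__main__":
    # Unknot, Hopf link, figure eight knot, Whitehead link
    print(recombination_twists(0, 2, 4, 5))
```

As a usage example, the unknot, Hopf link, figure eight knot and Whitehead link have crossing numbers 0, 2, 4 and 5. These fall under case (I), giving |n| = 5 - 4 = 1.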

The paper will now apply these concepts from tangle theory to the mechanism of DNA recombination by describing the tangle model. As mentioned earlier, the tangle model, proposed by Ernst and Sumners in 1990, gives a mathematical description of the actions of the enzyme Tn3 resolvase on DNA during site-specific recombination. Recall that a tangle is denoted by (B,t), where B is the unit ball in 3-space and t is the pair of strings inside it.

Ernst and Sumners proposed that when modeling DNA recombination, the enzyme can be thought of as the unit ball, B, and the DNA in question can be thought of as the strings, t. By representing the enzyme as B and the DNA as t, it is possible to write tangle equations for the substrate and product molecules before and after the recombination event. In order to write these tangle equations, it is necessary to think of the DNA and enzyme as the sum of several tangles. The first tangle, called the site tangle and denoted by T, is the section of tangled DNA that is bound to the enzyme ball, as seen in Figure 18a. The next tangle, also shown in Figure 18a, is called the substrate tangle and is denoted by S. The substrate tangle is any other tangle within the substrate DNA that is not contained in the enzyme ball.


Figure 18. An illustration of a DNA molecule with the site tangle (T), the substrate tangle (S) and the recombination tangle (R). [2]

Notice in Figure 18a that a number of twists are not contained within S. These twists can be eliminated through a series of Reidemeister type I moves and are therefore not significant. We can now describe the substrate molecule in terms of the following tangle equation: $N(S + T) = K_0$, where $K_0$ is the substrate molecule.

When a site-specific recombination event occurs, one crossing within T is changed and a new tangle is formed in place of T. This new tangle, seen in Figure 18b, is called the recombination tangle and is denoted by R. Mathematically, the recombination event replaces T with R in the tangle equation for the substrate molecule and forms a new equation. This new equation, which is the tangle equation for the product molecule, is $N(S + R) = K_1$, where $K_1$ is the product molecule.


Advanced study of enzymatic processes has led molecular biologists to believe that an enzyme acts on DNA in the same way at every site-specific recombination event [2] [6] [7]. Mathematically, this means that the product after processive recombination (recall that in processive recombination the enzyme performs more than one recombination event while bound to the DNA) is N(S + nR), where n is a positive integer representing the number of recombination events that the enzyme performed. A processive site-specific recombination event in which n = 2 is illustrated in Figure 18c.

In 1990, Ernst and Sumners proved the following theorem, which applies to the Tn3 resolvase enzyme acting on DNA during site-specific recombination [2].

Theorem 5. Suppose that S, T and R are tangles satisfying the following equations: (1) N(S + T) = the unknot; (2) N(S + R) = the Hopf link; (3) N(S + R + R) = the figure eight knot; and (4) N(S + R + R + R) = the Whitehead link. Then S and R are both rational tangles, S = (-3,0), R = (1), and N(S + R + R + R + R) is the $6_2$ knot [2].

This theorem describes the action of the Tn3 resolvase enzyme on a substrate that is the unknot. The product after the first recombination event is the Hopf link, after the second the figure eight knot, and after the third the Whitehead link. From these observations the theorem determines S and R, and it predicts that the product after the fourth event will be the $6_2$ knot. These knots and links are shown in Figure 19.
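The formula α = |vx + uy| from Theorem 1 gives a quick consistency check on this sequence of products. The sketch below rests on two assumptions that are not spelled out in the text: that S has tangle fraction -1/3, corresponding to the vector (-3,0), and that n copies of R = (1) add up to the integral tangle with fraction n/1. The function names are hypothetical.

```python
def alpha_of_sum(u, v, x, y):
    """First 4-plat parameter of N(O + P) for O = u/v and P = x/y (Theorem 1)."""
    return abs(v * x + u * y)


def tn3_alphas(max_rounds=4, s_fraction=(-1, 3)):
    """alpha for N(S + nR), n = 1..max_rounds, taking nR to have fraction n/1.

    s_fraction is the assumed tangle fraction u/v of S; the default (-1, 3)
    corresponds to the vector (-3, 0).
    """
    u, v = s_fraction
    return [alpha_of_sum(u, v, n, 1) for n in range(1, max_rounds + 1)]


if __name__ == "__main__":
    print(tn3_alphas())  # [2, 5, 8, 11]
```

The values 2, 5, 8 and 11 agree with the determinants of the Hopf link, the figure eight knot, the Whitehead link and the $6_2$ knot. The sign of S matters here: with fraction +1/3 the first value would be 4 rather than the 2 required for the Hopf link.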


Figure 19. An illustration of the Hopf link, figure eight knot, Whitehead link and the $6_2$ knot. [2]

Theorem 5 deals with the special case in which the substrate takes the form of the unknot. In order to model the actions of the Tn3 resolvase enzyme when the substrate is not the unknot, it is necessary to set up a system of tangle equations, each of which corresponds to a different recombination event. By using the methods of Theorems 2, 3 and 4, it is then possible to determine the structure of the tangles S, R and T. Once these tangles are known, one can determine the actions of the enzyme during the various recombination events.

Before applications of knot theory to DNA developed, molecular biologists lacked a quantitative and rigorous mathematical method of expressing the enzymatic processes that take place during recombination. Without such methods, biologists could not create models that predict the precise actions of enzymes on DNA during recombination. Knot theory has helped researchers create models, such as the tangle model, that make these predictions.


References

[1] Ernst, C., & Sumners, D. W. (1999). Solving tangle equations arising in a DNA recombination model. Mathematical Proceedings of the Cambridge Philosophical Society, 126, 23-35.

[2] Flapan, E. (2000). When Topology Meets Chemistry. New York: Cambridge University Press.

[3] Goldman, J. R., & Kauffman, L. H. (1997). Rational Tangles. Advances in Applied Mathematics, 18, 300-332.

[4] Isaacs, A., Daintith, J., & Martin, E. (Eds.). (1999). Oxford Dictionary of Science (Rev. ed.). London: Oxford University Press.

[5] Kauffman, L. (1987). On Knots. New Jersey: Princeton University Press.

[6] Kornberg, A. (1974). DNA Replication. San Francisco: W. H. Freeman and Company.

[7] Kornberg, A. (1974). DNA Replication. San Francisco: W. H. Freeman and Company.

[8] Micklos, D. A. (2003). DNA Science (2nd ed.). New York: Cold Spring Harbor Laboratory Press.

[9] Prasolov, V. V. (2000). Knots, Links, Braids and 3-Manifolds. Providence, Rhode Island: American Mathematical Society.

[10] Sossinsky, A. (2002). Knots. Cambridge, MA: Harvard University Press.

[11] Tompkins, J. (2005). Modeling DNA with Knot Theory: An Introduction. Math Journal, 1-23. Retrieved April 20, 2007, from www.rosehulman.edu/mathjournal/archives/2006/vol7-n1/paper13/v7n1-13pd.pdf

[12] Yang, C. N., & Ge, M. L. (1991). Braid Group, Knot Theory and Statistical Mechanics. London: World Scientific.
