DNA structure

DNA is usually a double helix with two strands running in opposite directions. (Some viral DNA is single-stranded.) Each chain is a polymer of subunits called nucleotides (hence the name polynucleotide). Each strand has a backbone of deoxyribose sugar molecules linked together by phosphate groups. The 3' C of one sugar molecule is connected through a phosphate group to the 5' C of the next sugar. This linkage is called a 3'-5' phosphodiester linkage. By convention, all DNA strands are read from the 5' end to the 3' end, where the 5' end terminates in a phosphate group and the 3' end terminates in a sugar molecule.

Each sugar molecule is covalently linked to one of 4 possible bases (Adenine, Guanine, Cytosine and Thymine). A and G are larger, double-ringed molecules (called purines); C and T are smaller, single-ringed molecules (called pyrimidines). In double-stranded DNA, the two strands run in opposite directions and the bases pair up such that A always pairs with T and G always pairs with C. The A-T base-pair has 2 hydrogen bonds and the G-C base-pair has 3 hydrogen bonds. The G-C interaction is therefore stronger (by about 30%) than A-T, and A-T rich regions of DNA are more prone to thermal fluctuations.

The planes of the bases are oriented roughly perpendicular to the helix axis. The bases are hydrophobic in the direction perpendicular to their plane (in that direction they cannot form hydrogen bonds with water). The interaction energy between two bases in a double-helical structure is therefore a combination of hydrogen bonding between complementary bases and hydrophobic interactions between neighboring stacks of base-pairs. Even in the single-stranded state, the bases prefer to be stacked (like the steps of a spiral staircase if the bases are identical), and a single-stranded chain can also have regions of helical conformation.
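The pairing rules above can be illustrated with a few lines of Python. This is a minimal sketch added for illustration, not part of the original notes; the function names `reverse_complement` and `hydrogen_bonds` and the example sequence are hypothetical. It builds the partner strand (written 5' to 3', hence reversed, because the strands are antiparallel) and counts hydrogen bonds using 2 per A-T pair and 3 per G-C pair.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the base-pair each base takes part in
# (2 for A-T, 3 for G-C, as stated in the text).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' -> 3'.

    The two strands are antiparallel, so the partner is the
    complement of the input read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in the duplex formed by `strand` and its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    seq = "ATGCGC"  # illustrative sequence, not taken from the text
    print(reverse_complement(seq))  # GCGCAT
    print(hydrogen_bonds(seq))      # 16
```

A sequence richer in G and C gives a larger hydrogen-bond count per base-pair, which is one simple way to see why A-T rich regions are the ones more prone to thermal fluctuations.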
The backbone of a polynucleotide is highly charged (1 unit of negative charge for each phosphate group; 2 negative charges per base-pair). If there is no salt in the surrounding medium, the strong repulsion between the two strands makes them fall apart. Counter-ions are therefore essential for the double-helical structure: they shield the charges on the sugar-phosphate backbone. Fluctuating counter-ions around the backbone may also contribute an attractive interaction, similar to the Van der Waals interaction between fluctuating induced dipoles.

The most common DNA structure in solution is B-DNA. Under applied force or twist, or under low hydration conditions, DNA can adopt several other helical conformations, referred to as A-DNA, Z-DNA, S-DNA... Shown in the picture above are three crystallized states of DNA: A-DNA (left), B-DNA (middle) and Z-DNA (right). The A-form crystallizes under low hydration conditions and is not normally found for DNA in the cell. It is, however, the structure adopted by double-stranded regions in RNA as well as by the transient double helix between DNA and RNA during transcription. Both A- and B-DNA are right-handed helices, whereas Z-DNA is a left-handed helix and is commonly found in regions of DNA that have an alternating purine-pyrimidine sequence (e.g. 5'-CGCGCGCG-3' or 5'-CGCGCATGC-3'). The table below summarizes some of the major differences.

                         A-DNA              B-DNA            Z-DNA
Helix sense              Right-handed       Right-handed     Left-handed
Overall shape            Short and broad    Long and thin    Longer and thinner
Helix diameter           25.5 Å             23.7 Å           18.4 Å
Rise per base-pair       2.3 Å              3.4 Å            3.8 Å
Base-pairs per turn      ~11                ~10              ~12
Helix pitch              25 Å               34 Å             47 Å
Tilt of the bases        20°                -1°              -9°

The ball-and-stick representation shown above can be misleading because it suggests that there is empty space between the two strands and between the base-pair stacks.
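The helix parameters in the table above are related to one another: the pitch should be roughly the rise per base-pair times the number of base-pairs per turn. The Python sketch below is an added illustration, not part of the original notes, and the names `HELICES` and `estimated_pitch` are hypothetical. It checks this relation using the tabulated values. Because the base-pairs-per-turn values are approximate, only rough agreement is expected.

```python
# Helix parameters copied from the table (lengths in angstroms).
HELICES = {
    "A-DNA": {"rise": 2.3, "bp_per_turn": 11, "pitch": 25.0},
    "B-DNA": {"rise": 3.4, "bp_per_turn": 10, "pitch": 34.0},
    "Z-DNA": {"rise": 3.8, "bp_per_turn": 12, "pitch": 47.0},
}


def estimated_pitch(rise: float, bp_per_turn: float) -> float:
    """Pitch of one helical turn, estimated as rise x base-pairs per turn."""
    return rise * bp_per_turn


if __name__ == "__main__":
    for name, p in HELICES.items():
        estimate = estimated_pitch(p["rise"], p["bp_per_turn"])
        # Compare the estimate with the pitch quoted in the table.
        print(f"{name}: estimated {estimate:.1f} A, tabulated {p['pitch']:.1f} A")
```

The estimates agree with the tabulated pitches to within about 1.5 Å, which is the level of agreement one would expect from rounded base-pairs-per-turn values.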
Another representation is the space-filling representation, in which each atom is shown as a ball whose radius represents its Van der Waals radius. The picture below shows this view for the 3 DNA structures shown above. Here, B-DNA is on the left and A-DNA is in the middle. The blue and white atoms are the sugar-phosphate backbone atoms, the red are G-C base-pairs and the yellow are A-T base-pairs. The B-DNA picture shows very clearly the 'grooves' between the backbones, which also spiral around the DNA structure; the grooves in B-DNA come in two sizes, the minor groove and the major groove.

A DNA molecule is not a rigid, static structure as x-ray diffraction pictures might suggest, and the crystallographic parameters shown above are average parameters. In reality, each of these structures is under constant thermal fluctuations, which result in local twisting, stretching, bending, and unwinding of the double strands. Also, certain sequences lead to permanent bends or kinks in the direction of the helix. These local (sequence-specific) fluctuations are essential for the recognition of specific binding sites along the DNA molecule, where proteins involved in replication, transcription, regulation of gene expression, or DNA-damage repair can bind.

RNA structures

RNA molecules are also polynucleotides with a sugar-phosphate backbone and four kinds of bases. The main differences between RNA and DNA are:

- RNA molecules are single-stranded.
- The sugar in RNA is a ribose sugar (as opposed to deoxyribose) and has an -OH at the 2' C position, highlighted in red in the figure below (DNA sugars have -H at that position).
- Thymine in DNA is replaced by Uracil in RNA. T has a methyl (-CH3) group in place of the H atom shown in red in U.

The picture shows an ATP molecule (adenosine triphosphate) about to be incorporated into an RNA chain with the release of a diphosphate. RNA molecules do not have a regular helical structure like DNA.
Instead, they can form complicated 3-dimensional structures in which the strand loops back on itself and forms intra-strand base-pairs between self-complementary regions along the chain.

There are three classes of RNA molecules:

- messenger RNA (mRNA), which acts as a template for protein synthesis and has the same sequence of bases (read from the 5' to the 3' end) as the DNA strand that carries the gene sequence, with U in place of T. mRNA can range from ~300 nucleotides to ~7000 nucleotides, depending on the size and the number of the proteins it codes for.
- transfer RNA (tRNA), one for each triplet codon that codes for a specific amino-acid (the building blocks of proteins). tRNA molecules are covalently attached to the corresponding amino-acid at one end, and at the other end they have a triplet sequence (called the anti-codon) that is complementary to the triplet codon on the mRNA. All tRNA molecules are in the range ~70-90 nucleotides. They have a molecular weight of ~25,000 and a sedimentation constant of ~4 Svedberg (S) units.
- ribosomal RNA (rRNA), which makes up an integral part of the ribosome, the protein synthesis machinery of the cell.

Secondary and tertiary structures of tRNA molecules

The crystal structures of several tRNA molecules have been determined. All tRNA molecules have very similar secondary structures, in which the single-stranded chain is folded into a 'clover-leaf' structure that has three hairpins and an acceptor stem where the amino-acid is covalently attached. The acceptor stem is at the 3' end of the chain and always terminates in the sequence 5'-CCA-3'. This particular tRNA is specific for the amino-acid Alanine, whose codon on the mRNA is 5'-GCC-3'; the anti-codon loop of the tRNA reads 5'-GGC-3'. The grey circles are examples of unusual, chemically modified bases. The secondary structure then folds up to form a 3-dimensional structure which looks like an inverted L. One end of the L (the 3' end of the chain) is the acceptor stem.
The other end of the L is the anti-codon loop, which has to match the codon on the mRNA. The distance between the two ends of the L is ~7 nm. The corner of the L is used for correct positioning on the ribosome, where protein synthesis takes place.

In the tertiary (3-dimensional) structures of RNA, bases sometimes make hydrogen bonds with more than one partner, as illustrated in the picture above. These extra hydrogen bonds compensate for the distortion of the double-stranded helical regions when the RNA folds up, and help stabilize the tertiary structure.

The covalent attachment between a tRNA and its corresponding amino-acid is achieved by yet another adaptor molecule (this time a protein molecule called aminoacyl-tRNA synthetase), of which there are at least 20 varieties, one for each kind of amino-acid. The synthetases recognize the detailed shape and properties of a specific amino-acid and the detailed shape of the acceptor stem in the folded tRNA molecule, and catalyze the covalent attachment between the amino-acid and its corresponding tRNA.

Ribosomal RNA

The ribosome is a large machine (~20 nm in diameter, with a sedimentation coefficient of 70S for bacterial ribosomes) made of two subunits: a large subunit (~50S) and a small subunit (~30S). The large subunit is in turn made of two ribosomal RNAs (5S and 23S) and ~34 proteins, whereas the small subunit has one ribosomal RNA (16S) and ~21 proteins. The 23S rRNA is ~3000 nucleotides long, and the 16S rRNA is ~1500 nucleotides long.

The structures of ribosomal RNA can get very complicated because of the large number of ways in which hairpins and loops can be formed. Predicting these structures requires a combination of computational methods (in which the most probable secondary structures are determined from estimates of the free energy of a given structure) and a variety of experimental techniques.
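To give a feel for the computational side, here is a minimal Python sketch of a Nussinov-style dynamic program. It is a deliberately simplified stand-in, not the free-energy method referred to above: it only maximizes the number of nested A-U and G-C base-pairs and ignores stacking and loop free energies. The function name `max_base_pairs`, the example sequence, and the minimum loop size of 3 unpaired bases are assumptions made for illustration.

```python
# Allowed Watson-Crick pairs in RNA (G-U wobble pairs are ignored here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base-pairs in `seq`.

    best[i][j] holds the best pair count for the sub-sequence seq[i..j].
    A pair (k, j) is allowed only if at least `min_loop` unpaired bases
    lie between k and j, so that hairpin loops are not impossibly tight.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # span = j - i
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]             # case 1: base j is unpaired
            for k in range(i, j - min_loop):   # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # Illustrative hairpin: three G-C pairs closing a four-base loop.
    print(max_base_pairs("GGGAAAUCCC"))  # 3
```

Real structure-prediction programs replace the "+1 per pair" score with experimentally derived free-energy terms for stacked pairs and loops, but the recursion over sub-sequences has the same overall shape.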
Oligonucleotide mapping techniques

This technique is useful in identifying exposed single-stranded regions of a folded RNA molecule by hybridization with short synthesized nucleotide chains (also called oligonucleotides) that are complementary to, for instance, the loop regions of the RNA. Folded RNA molecules are confined to one region of space, separated from another region by a semi-permeable membrane. On the other side of the partition are radioactive oligonucleotides (~5-10 nucleotides long) that can pass through the membrane and bind to the RNA molecules; the RNA molecules, which are much bigger, cannot pass. At equilibrium, free oligomers are at the same concentration on both sides of the partition. However, the radioactivity on the side with the RNA molecules is larger than on the other side, because some oligomers will associate with (bind to) the RNA if the sequences of the oligomers and the loop regions are complementary. The ratio ($r_d$) of the radioactivity on the two sides gives a measure of the binding or association constant

$$K_a = \frac{[X]}{[\mathrm{RNA}]\,[O]}$$

where [X] is the concentration of the RNA-oligomer complex, [O] is the free oligomer concentration on either side, and [RNA] is the concentration of RNA molecules that are not bound to an oligomer. The ratio is

$$r_d = \frac{[O] + [X]}{[O]} = 1 + K_a\,[\mathrm{RNA}]$$

If [RNA] >> [O], then the RNA concentration can be assumed to be the same before and after mixing, and the ratio becomes $r_d \approx 1 + K_a\,[\mathrm{RNA}]_0$, where $[\mathrm{RNA}]_0$ denotes the total RNA concentration. Therefore, a measurement of $r_d$ yields a direct measure of $K_a$.

All oligonucleotides will lead to some association, since there is always a match at the single base-pair level. Therefore $r_d > 1$ for any oligonucleotide. For oligonucleotides ~4 bases long that match an exposed loop region on the RNA, the free energy change upon association is substantially larger (by ~10-15 $k_BT$) than the free energy change from single base-pair matches. This leads to an increase in the association constant by a factor of $10^4$ to $10^6$.
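The arithmetic behind this measurement can be sketched in a few lines of Python. This is an added illustration under stated assumptions, not part of the original notes: it assumes the usual definitions K_a = [X] / ([RNA][O]) and r_d = ([O] + [X]) / [O], so that r_d = 1 + K_a [RNA], and that the association constant scales as exp(-ΔG / kBT). The function names and the numerical inputs in the usage example are hypothetical.

```python
import math


def association_constant(r_d: float, rna_conc: float) -> float:
    """Association constant from the radioactivity ratio.

    Assumes r_d = 1 + K_a * [RNA], with [RNA] >> [O] so that the RNA
    concentration is unchanged by binding. K_a is returned in the
    inverse of the units used for `rna_conc`.
    """
    if rna_conc <= 0:
        raise ValueError("RNA concentration must be positive")
    return (r_d - 1.0) / rna_conc


def enhancement_factor(extra_binding_energy_kT: float) -> float:
    """Factor by which K_a grows when binding is more favorable by the
    given amount of free energy, expressed in units of kBT."""
    return math.exp(extra_binding_energy_kT)


if __name__ == "__main__":
    # Hypothetical measurement: r_d = 3 at an RNA concentration of 1e-6 M.
    print(f"K_a = {association_constant(3.0, 1e-6):.3g} per molar")
    # The 10-15 kBT range quoted in the text.
    for energy in (10, 15):
        print(f"{energy} kBT -> factor {enhancement_factor(energy):.2e}")
```

Evaluating the exponential for 10 and 15 kBT gives roughly 2 x 10^4 and 3 x 10^6, consistent with the quoted range of 10^4 to 10^6.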
This technique can easily distinguish between two possible conformations of an RNA molecule that have different sequences in their loop regions. We can also estimate which structure is more probable (i.e. which one has the lower free energy). The free energy of a hairpin can be broken into two parts: the free energy of forming a loop closed by a single base-pair, and the free energy for the base-paired 'stem' of the hairpin.

In RNA molecules the most probable loop size is ~6-7 bases. Smaller loops are energetically unfavorable as a result of steric hindrance among the bases and the atoms of the backbone. Larger loops are entropically unfavorable: the loss of entropy when a loop is formed increases with increasing loop size. The loop free energy for the optimal-sized loop closed by a G-C base-pair is ~7-8 $k_BT$ in 1M NaCl. In our example we have a loop with 10 bases in structure 1 ([...]) and 2 loops with 4 bases each in structure 2 ([...] for each loop). Note that the loop free energy is a positive quantity; it is unfavorable to make loops relative to the random coil conformation.

The hairpin structures are stabilized when the free energy gain from base-pair formation exceeds the free energy cost of loop formation. The gain from adding a base-pair to an already existing G-C pair is ~ [...] for adding a G-C base-pair and ~ [...] for adding an A-U base-pair. Therefore the net change in free energy for structure 1 is [...] and for structure 2 is [...]. Structure 1 is more stable (although marginally), and the relative populations of the two structures are given by the Boltzmann distribution.
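As a minimal illustration of that last statement, the Python sketch below computes relative populations from free energies expressed in units of kBT, using P1/P2 = exp(-(ΔG1 - ΔG2) / kBT). It is an added example, not part of the original notes: the function names are hypothetical, and the free-energy values in the usage example are placeholders chosen only to show a marginal difference, not the values for the two structures discussed above.

```python
import math
from typing import List, Sequence


def relative_population(dg1_kT: float, dg2_kT: float) -> float:
    """Ratio P1/P2 of two states with free energies given in units of kBT."""
    return math.exp(-(dg1_kT - dg2_kT))


def populations(dgs_kT: Sequence[float]) -> List[float]:
    """Normalized Boltzmann populations for any number of states."""
    lowest = min(dgs_kT)
    # Subtracting the lowest free energy leaves the ratios unchanged
    # and keeps the exponentials well behaved numerically.
    weights = [math.exp(-(g - lowest)) for g in dgs_kT]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Placeholder free energies (in kBT); NOT the values from the example.
    dg_structure_1, dg_structure_2 = -3.0, -2.5
    ratio = relative_population(dg_structure_1, dg_structure_2)
    print(f"P1/P2 = {ratio:.2f}")
    print(populations([dg_structure_1, dg_structure_2]))
```

A free-energy difference of only a fraction of kBT leaves both structures substantially populated, which is what "marginally more stable" means in practice.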