Lecture Notes: DNA, Genes, Chromosomes and Genetic Testing

I. DNA structure

DNA is a double-stranded helical structure. The basic building block of DNA is the nucleotide (base + phosphate + sugar). The backbone of each strand of the helix consists of a sugar-phosphate polymer. In DNA, the sugar is deoxyribose, and each phosphate is attached through ester bonds to the 3' hydroxyl group of one sugar and the 5' hydroxyl group of the next (i.e. phosphodiester bonds are formed between adjacent deoxyribose units). At the 1' position of the sugar ring is one of 4 nitrogen-containing bases. Two of these, adenine (A) and guanine (G), are purines and the other two, cytosine (C) and thymine (T), are pyrimidines.

The double helix is held together by hydrogen bonds, which form between A and T bases and between G and C bases; each such pairing is called a base pair (bp). Thus, the two strands of DNA are complementary. Consequently, knowing the sequence of nucleotide bases on one strand allows one to determine the sequence of bases on the other strand.

The structure of DNA allows for many features:

a. Storing and coding of a vast amount of information: since the molecule is a polymer of 4 types of bases, for a length of N bases there are 4^N possible sequences.
b. Semi-conservative DNA replication: for replication, the two strands unwind and each serves as a template for a new strand. Thus, each new molecule contains one strand from the old molecule.
c. DNA repair: because the other strand acts as a template, any missing or incorrect base on one strand can be repaired and replaced through complementarity.
d. Re-annealing: complementary DNA strands, when separated, can recognize each other and re-anneal; this property has been the basis of molecular genetic techniques.

II. DNA in the human cell

The vast majority of human DNA is found in the nucleus, in the form of linear double-stranded molecules, the chromosomes.
Each human chromosome is believed to consist of a single, continuous DNA double helix, ranging in size from approximately 50 million bp (for the smallest, chromosome 21) to approximately 250 million bp (chromosome 1). However, the chromosomes are not naked DNA: they are complexed with a family of basic chromosomal proteins called histones (HP) and with a heterogeneous group of acidic proteins called the non-histones (NHP). This complex of DNA and protein is called chromatin.

Histone proteins play a critical role in the proper packaging of the chromatin fiber. Two copies of each of the 4 core histones constitute an octamer, around which a segment of DNA double helix winds, giving chromatin the appearance of beads on a string. Each complex of DNA with core histones is called a nucleosome, the basic structural unit of chromatin. The long strings of nucleosomes are themselves further compacted into a helical secondary chromatin structure called a solenoid, a 30 nm-diameter fiber that is the fundamental unit of chromatin organization in the interphase nucleus. NHP are mainly responsible for further packaging of the chromatin into metaphase chromosomes.

A small fraction of the cell's DNA is found in the mitochondria, in the cytoplasm. This constitutes the mitochondrial genome, which contains 37 genes. Unlike nuclear DNA, mitochondrial DNA is circular and is maternally inherited.

III. Classes of human DNA

The human genome, in its diploid form, consists of approximately 6-7 billion bp of DNA. However, less than 10% of this DNA actually encodes genes. The human genome contains, by current estimates, 30,000 to 40,000 genes.

Long stretches of unique DNA sequences are quite rare. Most single-copy (SC) DNA is found in short stretches (several kilobases (kb) or less), interspersed with members of various repetitive DNA families. Most (but not all) of the genes are represented in SC-DNA.
Repeated sequences ("repeats") can be classified into two major groups: clustered repeats (localized) and interspersed repeats (dispersed throughout the genome, interspersed with single-copy sequences).

IV. DNA transcription and translation

DNA is in the nucleus and protein synthesis occurs in the cytoplasm. Therefore, the molecular link between DNA and protein is RNA, the chemical structure of which is similar to that of DNA, except that the sugar in each nucleotide is ribose, uracil (U) replaces thymine (T) as a pyrimidine, and RNA is single-stranded. RNA is synthesized from the DNA template through a process known as transcription; messenger RNA (mRNA) then carries the information from the nucleus to the cytoplasm, where the RNA sequence is translated on the ribosomes, determining the sequence of amino acids (aa) in the protein being synthesized.

1. Transcription

Transcription is initiated upstream from the first coding information, at a point corresponding to the 5' end of the final RNA product, called the transcription "start" site. The primary transcript (pre-mRNA) is modified in the nucleus in a process known as post-transcriptional processing, involving cleavage of some sequences and addition of others. The fully processed, mature mRNA is then transported to the cytoplasm, where translation takes place.

It is the 3' to 5' strand of the DNA (the template strand) that is usually transcribed. The resulting mRNA, read 5' to 3', is therefore complementary to the template and most directly comparable with the 5' to 3' strand of the DNA. Because of this, by convention, it is the 5' to 3' strand and sequence of a gene that is reported in the literature.

2. Translation and the genetic code

Protein synthesis occurs on ribosomes, macromolecular complexes made up of rRNA (ribosomal RNA) and several dozen ribosomal proteins. tRNA (transfer RNA) molecules, each specific for a particular aa, transfer the correct aa from the cytoplasm to their positions along the mRNA template, to be added to the growing polypeptide chain.
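As a brief aside on the strand convention from the Transcription subsection, the relationship between the template strand, the mRNA and the reported 5' to 3' strand can be sketched in a few lines of Python. This is only an illustration: the function names (`transcribe`, `coding_strand`) and the 9-base template are hypothetical and do not come from these notes or from any real gene.

```python
# Base-pairing rules: RNA built on a DNA template uses U where DNA would use T.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (read 5'->3') built on a template written 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5)


def coding_strand(template_3_to_5: str) -> str:
    """Return the complementary DNA strand, read 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3_to_5)


# Hypothetical template strand, written 3'->5' (an assumption for illustration).
template = "TACGGCATT"
mrna = transcribe(template)        # "AUGCCGUAA"
reported = coding_strand(template)  # "ATGCCGTAA"
```

In this sketch the mRNA and the 5' to 3' DNA strand differ only by U in place of T, which is why the 5' to 3' strand is the one conventionally reported.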
Each set of 3 bases constitutes a codon, specific for a particular aa. There are 4^n possible combinations in a sequence of n bases, so for 3 bases there are 4^3 = 64 possible triplet combinations. These 64 codons constitute the genetic code. Because there are only 20 aa and 64 possible codons, most aa are specified by more than one codon; hence the code is said to be degenerate. Three of the codons are called stop (or nonsense) codons because they designate termination of translation of the mRNA at that point.

Translation of a processed mRNA is always initiated at a codon specifying methionine. Methionine is therefore the first aa of each polypeptide chain (at the amino terminus), although it is usually removed before protein synthesis is completed. The codon for methionine (the initiator codon, AUG) establishes the reading frame of the mRNA; each subsequent codon is read in turn to yield a protein of the correct aa sequence. Translation ends when a stop codon (UGA, UAA or UAG) is encountered in the same reading frame as the initiator codon. The completed polypeptide is then released from the ribosome, which becomes available to begin synthesis of another protein.

3. Post-translational processing

Many proteins undergo extensive post-translational modifications. The polypeptide chain that is the primary translation product is folded and bonded into a specific 3-D structure that is determined by the aa sequence itself. Two or more polypeptide chains, products of the same gene or of different genes, may combine to form a single protein. The protein products may also be modified chemically, for example by the addition of carbohydrates at specific sites. Other modifications may involve cleavage of the protein.

V. Gene structure and organization

A gene is defined as a sequence of DNA that is required for production of a functional product, either a polypeptide or a functional RNA molecule.
A gene includes not only the actual coding sequences but also adjacent nucleotide sequences required for the proper expression of the gene, that is, for the production of a normal mRNA molecule. These adjacent regions provide the molecular "start" and "stop" signals for the synthesis of mRNA transcribed from the gene.

At the 5' end of the gene, in what is sometimes called the upstream flanking region, lies a promoter region, which includes sequences responsible for the proper initiation of transcription. The sequence of these regions is usually highly conserved. The promoter is involved in the attachment of RNA polymerase II to the DNA. Promoters are usually several hundred nucleotides long and contain a consensus sequence (TATA) which binds a series of transcription factors. Not all gene promoters contain these specific sequence elements. In particular, promoters of genes that are constitutively expressed in most or all tissues (housekeeping genes) contain CpG islands, so named because of the unusually high concentration of the dinucleotide 5'-CG-3'.

The activity of many promoters is modulated by one or more enhancers. Enhancers are generally short sequences that bind specific transcription factors.

At the 3' end of the gene lies an untranslated region that contains a signal for addition of a sequence of polyadenosine residues (poly-A tail) to the end of the mature mRNA.

The vast majority of genes are interrupted by one or more non-coding regions, called intervening sequences or introns, which are removed before the final mRNA reaches the cytoplasm. Introns alternate with coding sequences, or exons, that ultimately encode the aa sequence of the product.
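
The exon/intron organization described above, together with the translation rules of Section IV (initiation at AUG, triplet reading frame, termination at UGA, UAA or UAG), can be pulled together in a small Python sketch. Everything named here is an assumption made for illustration: the helper functions `splice` and `translate`, the exon coordinates, and the short pre-mRNA sequence are hypothetical and do not represent a real gene. The codon table is the standard genetic code, written compactly with one-letter aa symbols and `*` for stop codons.

```python
# Standard genetic code, indexed by codon bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, given as half-open (start, end) index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon in the same frame."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiator codon found
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical two-exon transcript (sequences invented for this example).
exon1, intron, exon2 = "CCAUGGC", "GUAAGUCUAACAG", "UUGGUAAGG"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature_mrna = splice(pre_mrna, exons)  # "CCAUGGCUUGGUAAGG"
protein = translate(mature_mrna)       # "MAW" (Met-Ala-Trp)
```

Note that the second codon (GCU) spans the exon-exon junction, so in this sketch the correct reading frame only exists once the intron has been removed.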