Lecture Notes - Course

DNA, Genes, Chromosomes and Genetic Testing
I. DNA structure
DNA is a double-stranded helical structure. The basic building block of DNA is
the nucleotide (base + phosphate + sugar). The backbone of each strand of the
helix consists of a sugar-phosphate polymer.
In DNA, the sugar is deoxyribose, and each phosphate is attached through ester
bonds to the 3' hydroxyl group of one sugar and the 5' hydroxyl group of the next
(i.e. phosphodiester bonds are formed between adjacent deoxyribose units). At the
1' position of the sugar ring is one of 4 nitrogen-containing bases. Two of these,
adenine (A) and guanine (G), are purines; the other two, cytosine (C) and
thymine (T), are pyrimidines.
The double helix is held together by hydrogen bonds, which form between A
and T bases and between G and C bases; each such pairing is called a base pair
(bp). Thus, the two strands of DNA are complementary: knowledge of the
sequence of nucleotide bases on one strand automatically determines the
sequence of bases on the other strand.
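Complementarity can be illustrated with a short Python sketch (the function name complementary_strand is hypothetical). It assumes the usual convention of writing strands 5' to 3', and the fact that the two strands run in opposite (antiparallel) directions, so the partner strand is the base-by-base complement read in reverse order:

```python
# Base-pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    The input is read 5' to 3'. Because the two strands are antiparallel,
    the partner is the complement of each base, taken in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


print(complementary_strand("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which mirrors the statement that either strand fully specifies the other.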
The structure of DNA allows for many features:
a. Storing and coding of a vast amount of information: since the
molecule is a polymer of 4 types of bases, for a length of N bases,
there are 4^N possible sequences.
b. Semi-conservative DNA replication: for replication, the two strands
unwind and each serves as a template for a new strand. Thus, each
new molecule contains one strand from the old molecule.
c. DNA repair: because the other strand acts as a template, any missing
or incorrect base from one strand can be repaired and replaced
through complementarity.
d. Re-annealing: complementary DNA strands, when separated, can
recognize each other and re-anneal; this is the basis of many
molecular genetic techniques.
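The 4^N count in item (a) can be checked with a minimal Python sketch, assuming only the four DNA bases A, C, G and T; the helper names are hypothetical. For small N the formula is confirmed by listing every sequence explicitly:

```python
from itertools import product

BASES = "ACGT"


def count_sequences(n: int) -> int:
    """Number of distinct DNA sequences of length n: 4 choices per position."""
    return len(BASES) ** n


def enumerate_sequences(n: int) -> list[str]:
    """List every length-n sequence explicitly (only practical for small n)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


for n in (1, 2, 3, 10):
    print(n, count_sequences(n))
# 1 4
# 2 16
# 3 64
# 10 1048576
```

Even a 10-base stretch already has over a million possible sequences, which is why long DNA molecules can carry so much information.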
II. DNA in the human cell
The vast majority of human DNA is found in the nucleus, in the form of linear
double-stranded molecules, the chromosomes. Each human chromosome is
believed to consist of a single, continuous DNA double helix, ranging in size from
about 50 million bp (for the smallest, chromosome 21) to 250 million bp (chromosome 1).
However, the chromosomes are not naked DNA: they are complexed with a
family of basic chromosomal proteins called histones (HP) and with a
heterogeneous group of acidic proteins, called the non-histones (NHP). This
complex of DNA and protein is called chromatin.
Histone proteins play a critical role in the proper packaging of the chromatin fiber.
Two copies of each of 4 histones constitute an octamer, around which a segment
of DNA double helix winds, giving chromatin the appearance of beads on a
string. Each complex of DNA with core histones is called a nucleosome, the
basic structural unit of chromatin.
The long strings of nucleosomes are themselves further compacted into a helical
secondary chromatin structure called a solenoid, a 30 nm-diameter fiber that is the
fundamental unit of chromatin organization in the interphase nucleus. NHP are
mainly responsible for further packaging of the chromatin into metaphase
chromosomes.
A small amount of DNA is found in the mitochondria, in the cytoplasm. This
constitutes the mitochondrial genome, which contains 37 genes. Unlike nuclear
DNA, mitochondrial DNA is circular and maternally inherited.
III. Classes of human DNA
The human genome, in its diploid form, consists of approximately 6-7 billion bp of
DNA. However, less than 10% of this DNA actually encodes genes. The human
genome was once estimated to contain 30,000 to 40,000 genes; current estimates
are closer to 20,000 protein-coding genes.
Long stretches of unique DNA sequence are quite rare. Most single-copy
(SC) DNA is found in short stretches (several kilobases (kb) or less),
interspersed with members of various repetitive DNA families. Most (but not all)
genes lie in SC-DNA.
Repeated sequences ("repeats") can be classified into two major groups:
clustered repeats, which are localized to particular regions, and interspersed
repeats, which are dispersed throughout the genome among single-copy sequences.
IV. DNA transcription and translation
DNA is in the nucleus, but protein synthesis occurs in the cytoplasm. Therefore,
the molecular link between DNA and protein is RNA, whose chemical structure
is similar to that of DNA, except that the sugar in each nucleotide is ribose,
uracil (U) replaces thymine (T) as one of the pyrimidines, and RNA is single-stranded.
RNA is synthesized from the DNA template through a process known as
transcription; messenger RNA (mRNA) then carries the information from the nucleus
to the cytoplasm, where the RNA sequence is translated on the ribosomes,
determining the sequence of amino acids (aa) in the protein being synthesized.
1. Transcription
Transcription is initiated upstream of the first coding sequence, at a point
corresponding to the 5' end of the final RNA product, called the transcription
"start" site.
The primary transcript (pre-mRNA) is modified in the nucleus in a process known
as post-transcriptional processing, which involves cleavage of some sequences
and addition of others. The fully processed, mature mRNA is then transported
to the cytoplasm, where translation takes place. It is the 3' to 5' (template) strand
of the DNA that is transcribed, so the mRNA, synthesized 5' to 3', has the same
sequence as the complementary 5' to 3' (coding) strand of DNA, with U in place
of T. Because of this, by convention, it is the 5' to 3' strand and sequence of a
gene that is reported in the literature.
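The relationship between the two DNA strands and the mRNA can be sketched in Python under stated assumptions: every strand is written 5' to 3', the input is a short hypothetical coding-strand fragment, and the helper names are illustrative only. Reading the template strand 3' to 5' and pairing each base with its RNA partner gives an mRNA identical to the coding strand, with U in place of T:

```python
# DNA-DNA pairing (to derive the template strand) and DNA-RNA pairing (for transcription).
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Template strand, written 5'->3', that pairs with the given coding strand."""
    return "".join(DNA_PAIR[b] for b in reversed(coding_5to3))


def transcribe(template_5to3: str) -> str:
    """mRNA (5'->3') made by reading the template strand from its 3' end.

    Walking the template 3'->5' and pairing each base with its RNA partner
    builds the transcript in the 5'->3' direction.
    """
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template_5to3))


coding = "ATGGCTTGGTAA"  # hypothetical coding-strand fragment
template = template_from_coding(coding)
mrna = transcribe(template)
print(template)  # TTACCAAGCCAT
print(mrna)      # AUGGCUUGGUAA
```

Comparing mrna with coding shows why the coding (5' to 3') strand is the one conventionally reported.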
2. Translation and the genetic code
Protein synthesis occurs on ribosomes, macromolecular complexes made up of
rRNA (ribosomal RNA) and several dozen ribosomal proteins. tRNA (transfer
RNA) molecules, each specific for a particular aa, transfer the correct aa from the
cytoplasm to their positions along the mRNA template, to be added to the
growing polypeptide chain.
Each set of 3 bases constitutes a codon, which specifies a particular aa (or, for
stop codons, termination). There are 4^n possible combinations in a sequence of
n bases; for 3 bases, there are 4^3 = 64 possible triplet combinations. These 64
codons constitute the genetic code. Three of the codons are called stop (or
nonsense) codons because they designate termination of translation of the mRNA
at that point. The remaining 61 codons specify only 20 aa, so most aa are
specified by more than one codon; hence the code is said to be degenerate.
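The codon arithmetic can be checked with a small Python sketch. It assumes the standard genetic code, stored in the compact one-letter form used by common codon tables (codons ordered UUU, UUC, UUA, UUG, UCU, ..., with "*" marking a stop codon); the variable names are hypothetical:

```python
from collections import Counter
from itertools import product

RNA_BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}

stops = sorted(c for c, aa in CODE.items() if aa == "*")
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")

print(len(CODE))                                # 64
print(stops)                                    # ['UAA', 'UAG', 'UGA']
print(len(codons_per_aa))                       # 20
print(sum(codons_per_aa.values()))              # 61
print(codons_per_aa["L"], codons_per_aa["M"])   # 6 1
```

Leucine, for example, is specified by 6 codons, while methionine has only AUG, illustrating degeneracy.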
Translation of a processed mRNA is always initiated at a codon specifying
methionine. Methionine is therefore the first (amino-terminal) aa of each
polypeptide chain, although it is often removed before protein synthesis is completed.
The codon for methionine (initiator codon, AUG) establishes the reading frame
of the mRNA; each subsequent codon is read in turn to yield a protein of the
correct aa sequence.
Translation ends when a stop codon (UGA or UAA or UAG) is encountered in
the same reading frame as the initiator codon. The completed polypeptide is then
released from the ribosome, which becomes available to begin synthesis of
another protein.
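Translation as described above can be sketched as a loop over codons. This Python sketch is a simplification: it assumes the ribosome starts at the first AUG in the mRNA and ignores the surrounding sequence context used in real cells. The codon table is the standard genetic code in a compact one-letter encoding, with "*" marking stop codons, and the function name translate is hypothetical:

```python
from itertools import product

# Standard genetic code: codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
CODE = {
    "".join(c): aa
    for c, aa in zip(
        product("UCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiator codon, so no reading frame is established
    protein = []
    # Step through complete codons in the frame set by AUG.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon: release the polypeptide
        protein.append(aa)
    return "".join(protein)


print(translate("GCCAUGGCUUGGUAAGC"))  # MAW
```

Bases before the AUG are skipped, and the UAA stop codon ends the chain after three amino acids.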
3. Post-translational processing
Many proteins undergo extensive post-translational modifications. The
polypeptide chain that is the primary translation product is folded and bonded into
a specific 3-D structure that is determined by the aa sequence itself. Two or more
polypeptide chains, products of the same gene or of different genes, may
combine to form a single protein. The protein products may also be modified
chemically, for example by the addition of carbohydrates at specific sites. Other
modifications may involve cleavage of the protein.
V. Gene structure and organization
A gene is defined as a sequence of DNA that is required for production of a
functional product: either a polypeptide or a functional RNA molecule. A gene includes
not only the actual coding sequences but also adjacent nucleotide sequences
required for the proper expression of the gene, that is, for the production of a
normal mRNA molecule. These adjacent regions provide the molecular "start"
and "stop" signals for the synthesis of mRNA transcribed from the gene.
At the 5' end of the gene, sometimes called the upstream flanking region, lies a
promoter region, which includes sequences responsible for the proper initiation
of transcription. The sequence of these regions is usually highly conserved. The
promoter is involved in the attachment of RNA polymerase II to the DNA.
Promoters are usually several hundred nucleotides long, and many contain a
consensus sequence (the TATA box) that binds a series of transcription factors. Not all
gene promoters contain these specific sequence elements. In particular,
promoters of genes that are constitutively expressed in most or all tissues
(housekeeping genes) frequently lack them and instead contain CpG islands, so
named because of the unusually high concentration of the dinucleotide 5'-CG-3'.
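One common way to quantify an unusually high concentration of CpG is the observed/expected ratio: the number of CG dinucleotides divided by the number expected if C and G were distributed independently, (count of C x count of G) / length. The Python sketch below computes only this ratio; the function name and example sequences are hypothetical, and the cut-offs used to call a region a CpG island vary between studies and are not assumed here:

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected ratio of the 5'-CG-3' dinucleotide in a sequence.

    Expected CpG count if C and G occurred independently:
    (count of C * count of G) / sequence length.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no C or no G, so no CpG is possible
    # "CG" cannot overlap itself, so str.count gives the exact number of CpGs.
    observed = seq.count("CG")
    expected = c * g / len(seq)
    return observed / expected


print(round(cpg_obs_exp("CGCGCGCGAT"), 2))  # 2.5 (CpG-rich)
print(cpg_obs_exp("CAGTCTGAGA"))            # 0.0 (contains C and G but no CpG)
```

A ratio well above that of bulk genomic DNA is what makes a stretch like the first example stand out.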
The activity of many promoters is modulated by one or more enhancers.
Enhancers are generally short sequences that bind specific transcription factors.
At the 3' end of the gene lies an untranslated region that contains a signal for
the addition of a string of adenosine residues (the poly-A tail) to the end of the
mature mRNA.
The vast majority of genes are interrupted by one or more non-coding regions,
called intervening sequences or introns, which are removed before the final
mRNA reaches the cytoplasm. Introns alternate with coding sequences, or
exons, which ultimately encode the aa sequence of the product.
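Intron removal and poly-A addition can be sketched as simple string operations. In this Python sketch the exon coordinates, the example pre-mRNA (intron shown in lower case) and the poly-A tail length are all hypothetical; in the cell, introns are recognized by sequence signals in the transcript rather than supplied as coordinates:

```python
def process_transcript(pre_mrna: str, exons: list[tuple[int, int]],
                       poly_a_length: int) -> str:
    """Join exons in order (dropping introns) and append a poly-A tail.

    exons: hypothetical half-open (start, end) coordinates into pre_mrna,
    listed in 5'->3' order.
    """
    # Splicing: keep only the exon segments; intron bases are discarded.
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # Polyadenylation: add a run of adenosines to the 3' end.
    return spliced + "A" * poly_a_length


pre_mrna = "AUGGCU" + "guaaguuuag" + "UGGUAA"  # exon 1 + intron + exon 2
exons = [(0, 6), (16, 22)]
print(process_transcript(pre_mrna, exons, poly_a_length=5))  # AUGGCUUGGUAAAAAAA
```

The mature product contains only the exon sequence, joined end to end, followed by the tail.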