Repeated DNA sequences

Repeated DNA sequences 3
Prof Duncan Shaw
Molecular & Cell Biology
Lecture 3
CpG islands - discovery and structure
CpG islands - function and uses in gene cloning
Centromeres - why they are necessary
Types of DNA repeat at centromeres
CpG islands - discovery and structure
References: Genes and genomes chapter 8.7
Cross SH, Bird AP (1995): "CpG islands and genes". Current Opinion in Genetics and
Development 5, 309-314
CpG islands (also known as HTF islands) are not
really a repeated sequence, but a special type of DNA
sequence with a particular function.
Most of vertebrate DNA is methylated at the C
residue of the sequence ....CG..... This means that it
can't be cut by certain restriction enzymes with CG in
their recognition sites, e.g. HpaII (see top part of
diagram). Other enzymes like this include NotI and
SacII, which are used to digest mammalian DNA in
large fragments for analysis by pulsed-field gel
electrophoresis.
If vertebrate DNA is digested with HpaII and run on
an ordinary agarose gel, you get a pattern like the
lower part of the diagram. The interpretation of this
result was that most HpaII sites in vertebrate DNA
are methylated at CG and not digested, resulting in the high molecular-weight DNA near the
origin of the gel. But a fraction of sites are unmethylated and clustered together, resulting in very
small fragments of DNA of 100bp and less. Only about 1% of the total DNA is in this fraction.
These clusters of sites were named HTF islands (HpaII Tiny Fragments). The name was later
changed to CpG islands.
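The logic of the HpaII experiment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name hpaii_fragments and the toy sequence are made up for the example, and methylation is modelled as a set of blocked site positions rather than per-base. HpaII recognises CCGG and cuts after the first C when the site is unmethylated.

```python
def hpaii_fragments(seq, methylated_sites=frozenset()):
    """Return fragment lengths after cutting at unmethylated CCGG sites.

    A site is identified by the index of its first C. Sites whose index is
    in `methylated_sites` are treated as methylated and left uncut.
    """
    site = "CCGG"
    cuts = []
    start = seq.find(site)
    while start != -1:
        if start not in methylated_sites:
            cuts.append(start + 1)  # HpaII cuts C^CGG, i.e. after the first C
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two CCGG sites (at indices 50 and 74).
toy = "A" * 50 + "CCGG" + "T" * 20 + "CCGG" + "A" * 30

print(hpaii_fragments(toy))            # both sites unmethylated: small fragments
print(hpaii_fragments(toy, {50, 74}))  # both sites methylated: one uncut piece
```

With both sites methylated the DNA stays in one large piece, which corresponds to the high molecular-weight material near the origin of the gel. Closely spaced unmethylated sites give the tiny fragments.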
The next stage was to clone some of the HTF DNA and use it to probe genomic DNA libraries.
This allowed fragments containing the islands to be sequenced. It revealed that a typical CpG
island is 1-2kb long, and is about 70% G and C (as opposed to 40% in the rest of the genome).
Furthermore, in the rest of the genome the dinucleotide ...CG... is under-represented due to its
tendency to mutate to ...TG... In CpG islands there is no such deficit of CpG.
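Both properties (high G+C content and no deficit of CpG) can be measured directly on a sequence. The sketch below is illustrative only: the function names are invented for the example, and the observed/expected ratio used here is a commonly used measure that is not given in these notes. It compares the number of CG dinucleotides found with the number expected from the C and G counts alone.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    A value near 1 means no CpG deficit; values well below 1 indicate
    that CG is under-represented.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Toy sequences, not real genomic DNA.
island_like = "CGCGCGCG"
bulk_like = "CATGCATG"
print(gc_content(island_like), cpg_obs_exp(island_like))
print(gc_content(bulk_like), cpg_obs_exp(bulk_like))
```

On real data these functions would be applied to a sliding window along the genome, with thresholds chosen by the user. No particular thresholds are implied by the notes.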
Further studies confirmed that most C in CpG islands is unmethylated, and showed that there are
about 45,000 islands in a human genome. Sequencing DNA around islands showed that they are
always located at the 5' end of genes, suggesting that they are part of the promoter. But as there
are about 80,000 genes in mammals, not all genes have a CpG island. Those that do are usually
"housekeeping" genes, i.e. those performing
basic cellular functions in all tissues.
This is a typical gene with a CpG island. The
island includes the first exon. Although there
are CpG dinucleotides throughout, they are
clustered and unmethylated in the island.
The average spacing between islands in the
genome is about 70kb, but this can vary from
8kb to several Mb in particular regions of the
genome.
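The average spacing can be checked with simple arithmetic. The haploid genome size of 3 x 10^9 bp used below is an assumption, since it is not stated in these notes. The island and gene numbers are the estimates quoted above.

```python
GENOME_BP = 3.0e9   # assumed haploid human genome size (not from the notes)
N_ISLANDS = 45_000  # estimate quoted in the notes
N_GENES = 80_000    # estimate quoted in the notes

# Mean distance between islands if they were spread evenly.
mean_spacing_kb = GENOME_BP / N_ISLANDS / 1000

# Upper bound on the fraction of genes with an island,
# assuming at most one island per gene.
max_fraction_with_island = N_ISLANDS / N_GENES

print(f"mean spacing: about {mean_spacing_kb:.0f} kb")
print(f"at most {max_fraction_with_island:.0%} of genes have an island")
```

The result, roughly 67kb, agrees with the figure of about 70kb. The real spacing is far from even, as the range of 8kb to several Mb shows.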
Function of CpG islands
A function for islands was suspected because they are always associated with the first exon of
housekeeping genes. Molecular studies showed that the chromatin in these regions has an "open"
configuration, with no nucleosomes or histone H1. This would make the DNA accessible to
transcription factors, etc. and hence able to be transcribed.
Bulk genomic DNA is methylated by a specific methyltransferase enzyme, so what stops CpG
islands from getting methylated as well? It may be due to flanking sites for the Sp1 DNA binding
protein, because if these are removed, the island does get methylated. Possibly bound Sp1
prevents methyltransferase from acting.
An exception to this is the mammalian X chromosome. In female cells one of the Xs is inactive.
The CpG islands of its genes are methylated, which represses gene activity. In fragile X
syndrome this occurs as part of the disease mechanism: the CpG island of the mutant FMR1 gene
is methylated in male cells, preventing its expression.
CpG islands are also involved in cancer: for example, if the promoter of a tumour-suppressor gene is
aberrantly methylated, the gene is not expressed, which gives a growth advantage to the cells
where this occurs. Another role is in genetic imprinting, the process whereby some genes are
differentially expressed depending on which parent they were inherited from. The mechanism for
this is associated with the methylation of CpG islands of imprinted genes. There is also some
evidence that a few islands (2%) may be reversibly methylated to control gene expression during
normal development.
Further insight into the role of CpG islands came from the
discovery of the MeCP2 protein. This is a 492aa protein
which is abundant in all mammalian cells. It binds to
methylated DNA, but not to unmethylated DNA (i.e. CpG
islands). In vitro studies using reporter genes show that
binding of MeCP2 to a methylated promoter represses
transcription. Presumably it does so by preventing access
by transcription factors. There is a critical minimum of
methylation sites required for MeCP2 to do this, about 1
per 100bp. This indicates that MeCP2 molecules act in a
co-operative manner.
Use of MeCP2 for cloning genes
The DNA binding domain of MeCP2 is about 85 amino acids from near the N-terminus of the protein. It can be
cloned and expressed in E. coli, fused to a 6xHis sequence
which enables it to bind to nickel ions on an agarose
column. DNA fragments passed over this column can
then be eluted with increasing salt concentrations as
shown in the picture. Methylated DNA binds more tightly
and is therefore eluted last.
The MeCP2 column can now be used to isolate DNA
fragments that contain CpG islands specifically from
genomic DNA. The DNA is digested with an enzyme
specific for TTAA, which cuts quite frequently except
in CG rich regions (i.e. CpG islands). The fragments
are passed over the column, which binds methylated
DNA but not unmethylated. The flow-through is
recycled through the column so that most of the
methylated DNA sticks and most of the unmethylated
comes through. To purify the CpG islands further, they
are next converted to the methylated form by treatment
in vitro with methyltransferase enzyme. As they have a
higher density of CpG dinucleotides than bulk (non-CpG-island) DNA, they will now bind more tightly to
the column than any contaminating fragments of bulk
DNA and can be specifically eluted with high salt as
shown in the previous picture. After repeating this step
the CpG island fragments are virtually pure and are
cloned into a suitable vector to make a CpG island
library. Clones from this library can then be used to
isolate corresponding full-length cDNAs or genomic
clones that contain the rest of the gene.
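The reason for choosing a TTAA-specific enzyme can be seen with a toy digest. The sketch below is illustrative only: the digest function and both toy sequences are invented for the example, and the cut position (after the first T, as for the enzyme MseI) is an assumption, because the notes do not name the enzyme.

```python
def digest(seq, site, cut_offset):
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site."""
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy sequences: AT-containing flanks with TTAA sites, and a GC-rich "island"
# that contains no TTAA at all.
flank = "GATTAACT" * 5
island = "GCGCCGCGGC" * 6
genomic = flank + island + flank

fragments = digest(genomic, "TTAA", 1)
longest = max(fragments, key=len)
print([len(f) for f in fragments])
print(island in longest)  # the island survives intact on one fragment
```

The flanking DNA is chopped into short pieces while the island stays on a single fragment, which is what allows intact islands to be fractionated on the column.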
Centromeres
This is a typical eukaryotic chromosome. Both
centromeres and telomeres have specific functions
in the cell, and particular types of repeated DNA
sequence. The diagram shows the conserved core
sequence for a centromere of the yeast
Saccharomyces cerevisiae.
Functionally, the centromere is the part of the
chromosome to which the spindle fibres
(microtubules) attach at cell division, ensuring
correct segregation of chromosomes to daughter
cells.
Structurally, it is the site of a disc-like protein
structure, the kinetochore. Centromeres also have associated repeated DNA sequences - another
class of tandem repeat.
S. cerevisiae has a simple centromere on each of its chromosomes. The first to be cloned
(chromosome 11) was linked to the met14 gene, and was discovered when it was observed that a
5.2kb DNA fragment containing met14 could confer stability and correct segregation on the
plasmid it was cloned in.
Other centromeres were isolated and their sequences compared. They all have a conserved
sequence similar to that shown in the picture above. All the conserved elements are necessary for
correct centromere function.
The sequence at each yeast centromere is homologous
but not identical. By gene targeting, a centromere can
be manipulated in various ways and its ability to direct
correct chromosomal segregation at cell division can
be observed. This shows that a centromere can be
reversed or replaced with one from another
chromosome, and still function properly, but that
mutations of its sequence can destroy its function.
The DNA of yeast centromeres is less accessible than
the flanking DNA to enzyme digestion, so it
probably forms some special kind of chromatin
structure. Specific binding proteins have been found,
and it is believed that it is the combination of these
proteins and the centromere-specific DNA sequences
that is the basis of centromere function.
Centromere-associated DNA repeats
Most eukaryotes have many copies of tandemly-repeated DNA sequences around their
centromeres (S. cerevisiae is an exception). This can account for up to 20% of total genomic
DNA (in humans it is about 5%). These repeats, called centromeric satellites, are not transcribed.
They stain strongly with Giemsa II giving dark bands under the microscope, sometimes referred
to as heterochromatin. Similar staining is seen for some other regions e.g. telomeres, rRNA
genes.
The structure of the centromeric tandem repeats was
investigated by restriction enzyme analysis: genomic DNA is
digested with various enzymes, run on an agarose gel,
Southern blotted, and probed with a copy of the repeat. This
gives a single band for any enzyme that cuts once in the
repeat and a ladder pattern for enzymes that cut some repeats
but not others (due to mutations). These patterns are very
characteristic of tandem repeats.
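Why tandem repeats give these patterns can be shown with a small model. This is a sketch under stated assumptions: the function name ladder_fragments is invented, the restriction site is assumed to sit at the same position in every repeat unit, and only fragments lying between two cut sites inside the array are counted (the fragments at the edges of the array are ignored).

```python
def ladder_fragments(unit_len, site_present):
    """Fragment sizes from a tandem array digested with one enzyme.

    `site_present` has one boolean per repeat unit: True if that unit still
    carries the restriction site, False if a mutation has destroyed it.
    """
    cuts = [i * unit_len for i, present in enumerate(site_present) if present]
    return [b - a for a, b in zip(cuts, cuts[1:])]


# Every unit has the site: all fragments are one unit long (a single band).
print(ladder_fragments(170, [True] * 5))

# Some units have lost the site: fragments are multiples of the unit (a ladder).
mutated = [True, True, False, True, False, False, True, True]
print(sorted(set(ladder_fragments(170, mutated))))
```

Each missing site joins two neighbouring units into one fragment, so the bands fall at 1x, 2x, 3x and so on of the unit length. The 170bp unit used here is only an example value.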
The sequence of the repeat varies between species. We will
concentrate on those of primates, including humans. The primate
repeat is known as the "alpha-satellite" repeat and has a basic
170bp unit, with variations between species and between
chromosomes in the same species. The repeats also show
polymorphism between individuals, both in terms of
sequence and repeat copy number.
This shows some variations on the primate theme. The lower
2 are human. If the sequences of individual repeats within a
species are compared, you find 2 kinds of mutation: random,
which might have occurred at any time, and ones common to
several repeat units, which presumably arose by mutation
followed by amplification of the repeat by unequal crossing-over.
Some centromeres have blocks of repeats made up of
multimers of the 170bp unit. The last one in the picture is
one such - it is from the human X chromosome. What
happened here is that first the basic 170bp unit became
amplified to 12 copies, then various mutations occurred at
random positions leading to the BamHI and PstI sites, then
the entire 12-mer was itself amplified.