Simple tandem repeats in mammalian genomes

Simple repeats in DNA sequences may regulate gene expression
Cornelia Lange
The sequencing of the human genome led to some surprises concerning the number of genes.
The number of human genes seems to be around 30,000, far fewer than expected. This
number is not a lot higher than the number of genes of “lower” organisms, like the worm
Caenorhabditis elegans with its more than 19,000 genes or the fruit fly Drosophila
melanogaster with about 13,000 genes. This led to the assumption that the difference between
humans and other organisms is not so much due to the number of genes, but more to how these
genes function.
DNA molecules are made up of four different bases arranged in different sequences,
much as letters are arranged into words. It is the sequence of the bases that carries the
information, which is decoded into RNA and proteins. DNA regions where a small sequence of
bases is repeated over and over again - CTCCTCCTCCTCCTC for instance, containing five
repeats of the sequence CTC - are called microsatellites. In some microsatellites the number
of repeats varies between individuals; these are called "polymorphic".
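
As an illustration of what "number of repeats" means, the following minimal Python sketch counts the longest run of back-to-back copies of a motif in a DNA string. The function name `longest_tandem_run` is made up for this example, and the regular-expression approach is only an assumption about one simple way to do the counting; it is not the method used in the project.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    # (?:motif)+ matches one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# The example from the text: five repeats of CTC.
print(longest_tandem_run("CTCCTCCTCCTCCTC", "CTC"))  # prints 5
```

A sequence without the motif gives 0, and if the motif occurs in several separate runs, only the longest run is reported.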
"Genes" are defined as those parts of DNA-molecules that specify (encode) RNA or
proteins. Only around 3% of the human genome encodes proteins; the rest consists of regions
that encode RNA as well as non-coding regions. Some of the non-coding regions, regulatory
sequences, specify how the genes are going to be expressed.
Particular proteins (transcription factors) bind to such regulatory sequences, thereby
regulating gene expression. There is strong evidence that microsatellites can be part of
regulatory sequences. Since they are often polymorphic, this may be a source of genetic
variation in regulating gene expression in closely related species.
In this project I searched for polymorphic microsatellites located close to genes in the
human genome, because in these positions they have the highest probability of having a
regulatory function. I used a database that was created to detect short sequences repeated
directly after each other (tandem repeats) in the human genome. To test if they were
polymorphic I used two methods. One was DHPLC (Denaturing High Performance Liquid
Chromatography). This is a separation technique where DNA is forced to flow through a
column under a high pressure. The time it takes DNA to pass through the column varies
depending on its length and shape. If a DNA fragment contains a polymorphic microsatellite,
this will show up as a special pattern in the results. The second method was to determine the
base sequence in DNA fragments including the microsatellite from different individuals. The
sequences were then compared to see if they had a different number of repeats of the
microsatellite. Using these methods I could create a dataset (including gene name, gene
identity, microsatellite position, orientation, motif and number of repeats) of altogether 24
polymorphic microsatellites, which can be used for further studies on microsatellites.
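
The sequence-comparison step and the dataset fields listed above could be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the class `MicrosatelliteRecord`, the helper `count_repeats`, the placeholder gene name and identity, the position value, and the two short reads are all invented and do not come from the project's data.

```python
import re
from dataclasses import dataclass
from typing import Dict


def count_repeats(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    return max(
        (len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)),
        default=0,
    )


@dataclass
class MicrosatelliteRecord:
    """One dataset row; field names follow the list in the text."""
    gene_name: str
    gene_id: str
    position: int      # hypothetical convention: offset relative to the gene
    orientation: str
    motif: str
    repeats_by_individual: Dict[str, int]

    @property
    def is_polymorphic(self) -> bool:
        # Polymorphic means at least two individuals differ in repeat number.
        return len(set(self.repeats_by_individual.values())) > 1


# Invented placeholder reads, not real sequencing results.
reads = {
    "individual_1": "GGA" + "CTC" * 8 + "TTG",
    "individual_2": "GGA" + "CTC" * 10 + "TTG",
}

record = MicrosatelliteRecord(
    gene_name="GENE_X",   # placeholder
    gene_id="ID_0000",    # placeholder
    position=-150,        # placeholder
    orientation="+",
    motif="CTC",
    repeats_by_individual={
        name: count_repeats(seq, "CTC") for name, seq in reads.items()
    },
)

print(record.repeats_by_individual, record.is_polymorphic)
```

With these placeholder reads the two individuals carry 8 and 10 repeats, so the record is flagged as polymorphic; identical counts in all individuals would leave it unflagged.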
To see how the number of repeats affected gene expression, I studied one particular
microsatellite, (CTC)n (n repeats of the sequence CTC). The different alleles (gene variants)
(CTC)8, (CTC)10 and (CTC)11 showed different expression of the gene they are adjacent to. I
tested whether a transcription factor bound differently to these different alleles, but with my
experimental set-up I could not clearly confirm any differences.
Degree project in biology, University of Uppsala, fall 2002
Examensarbete i biologi, 20 p
Department of Genetics and Pathology, Uppsala University
Supervisor: Claes Wadelius