Simple repeats in DNA sequences may regulate gene expression

Cornelia Lange

The sequencing of the human genome led to some surprises concerning the number of genes. Humans appear to have around 30,000 genes, far fewer than expected. This is not much higher than the number of genes in “lower” organisms, such as the worm Caenorhabditis elegans with its more than 19,000 genes or the fruit fly Drosophila melanogaster with about 13,000 genes. This led to the assumption that the difference between humans and other organisms lies not so much in the number of genes as in how these genes function.

DNA molecules are made up of four different bases arranged in different sequences, much as letters are arranged into words. The sequence of the bases carries the information, which is decoded into RNA and proteins. DNA regions where a short sequence of bases is repeated over and over again, for instance CTCCTCCTCCTCCTC, which contains five repeats of the sequence CTC, are called microsatellites. Some microsatellites are called "polymorphic" because the number of repeats varies between individuals.

"Genes" are defined as those parts of DNA molecules that specify (encode) RNA or proteins. Only around 3% of the human genome encodes proteins; the rest consists of regions that encode RNA as well as non-coding regions. Some of the non-coding regions, called regulatory sequences, specify how the genes are expressed. Particular proteins (transcription factors) bind to such regulatory sequences and thereby regulate gene expression. There is strong evidence that microsatellites can be part of regulatory sequences. Since microsatellites are often polymorphic, they may be a source of genetic variation in gene regulation between closely related species. In this project I searched for polymorphic microsatellites located close to genes in the human genome, because in these positions they are most likely to have a regulatory function.
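The definitions of a microsatellite and of a polymorphic microsatellite above can be illustrated with a minimal Python sketch. It is not the method used in this project, which relied on an existing tandem-repeat database, DHPLC and sequencing; the function names and the example sequences (including their flanking bases) are hypothetical. The sketch counts the longest run of back-to-back copies of a motif in a sequence and calls a microsatellite polymorphic if that count is not the same in every individual.

```python
from typing import Dict, Iterable


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of directly adjacent copies of `motif`.

    Every start position is tried, so runs in any reading frame are found
    (a single left-to-right regex scan can miss a longer run that overlaps
    a shorter one).
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = sequence.upper(), motif.upper()
    step = len(motif)
    best = 0
    for start in range(len(seq)):
        copies = 0
        # Extend the run one motif length at a time while the copies continue.
        while seq.startswith(motif, start + copies * step):
            copies += 1
        best = max(best, copies)
    return best


def repeat_counts(individuals: Dict[str, str], motif: str) -> Dict[str, int]:
    """Map each (hypothetical) individual's name to the repeat count in their sequence."""
    return {name: longest_tandem_run(seq, motif) for name, seq in individuals.items()}


def is_polymorphic(sequences: Iterable[str], motif: str) -> bool:
    """Polymorphic if the repeat count differs between at least two sequences."""
    return len({longest_tandem_run(s, motif) for s in sequences}) > 1


if __name__ == "__main__":
    print(longest_tandem_run("CTCCTCCTCCTCCTC", "CTC"))  # 5 repeats of CTC
    # Made-up flanking bases around the (CTC)8, (CTC)10 and (CTC)11 alleles.
    samples = {
        "individual_1": "GGA" + "CTC" * 8 + "AGG",
        "individual_2": "GGA" + "CTC" * 10 + "AGG",
        "individual_3": "GGA" + "CTC" * 11 + "AGG",
    }
    print(repeat_counts(samples, "CTC"))
    print(is_polymorphic(samples.values(), "CTC"))
```

Trying every start position keeps the sketch short and correct for overlapping runs, at the cost of speed; a genome-wide search would more likely rely on a dedicated tandem-repeat finder, such as the database used in this project.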
I used a database that was created to detect short sequences repeated directly after each other (tandem repeats) in the human genome. To test whether these repeats were polymorphic, I used two methods. The first was DHPLC (Denaturing High Performance Liquid Chromatography), a separation technique in which DNA is forced through a column under high pressure. The time it takes DNA to pass through the column depends on its length and shape. If a DNA fragment contains a polymorphic microsatellite, this shows up as a characteristic pattern in the results. The second method was to determine the base sequence of DNA fragments containing the microsatellite from different individuals. The sequences were then compared to see whether they had different numbers of repeats of the microsatellite. Using these methods, I compiled a dataset of 24 polymorphic microsatellites in total (including gene name, gene identity, microsatellite position, orientation, motif and number of repeats), which can be used in further studies of microsatellites.

To see how the number of repeats affects gene expression, I studied one particular microsatellite, (CTC)n (n repeats of the sequence CTC). Its different alleles (variants), (CTC)8, (CTC)10 and (CTC)11, showed different expression of the adjacent gene. I tested whether a transcription factor bound differently to these alleles, but with my experimental set-up I could not clearly confirm any differences.

Degree project in biology, 20 p, Uppsala University, fall 2002
Department of Genetics and Pathology, Uppsala University
Supervisor: Claes Wadelius