Comparative Genome Organization in Plants: From Sequence and Markers to Chromatin and Chromosomes
Summary
Introduction: Comparative studies have shown that many biological structures and functions are conserved among living organisms, a finding supported by both cytological and molecular evidence. Molecular studies have shown that structures such as ribosomes and ribozymes, and features of the genetic code, are conserved across living organisms, and these have provided useful markers for evolutionary studies. Such findings have encouraged biologists to determine whole-genome sequences. Knowing the complete sequence of one organism is expected to help identify sequences shared with related organisms, and thus to help isolate genes in related species.
The Linear DNA Sequence: Sequencing projects have shown that the double-helical structure of DNA and its composition (A, T, G and C bases) are universal. They have also shown that chromosomes start and end with telomeres. In the plant kingdom, Arabidopsis was chosen for sequencing mainly because it has a small genome of 130-140 Mbp and is diploid, with 5 pairs of chromosomes. Genome size and plant niche are significantly correlated, but there is no clear correlation between chromosome number and any plant characteristic except polyploidy. Sequencing is followed by annotation and identification of genes. Arabidopsis has approximately 25,000 genes, which represent most of the genes found in plants with much larger genomes. Why, then, do genome sizes differ so widely? The main reason is that the amount of repetitive sequence differs from one species to another: about 50-90% of the genome of higher eukaryotes is composed of repetitive DNA motifs.
Although the genomic sequence can tell us about the nature of genes and their function, sequencing does not distinguish between modified and unmodified bases, and it does not reveal chromatin packaging or the three-dimensional organization of the chromosomes.
Repetitive DNA Sequence and the Large-Scale Organization of the Chromosome: Before the genomes of different organisms can be compared, the length of the sequence gaps must be determined, and both the homogeneity of repeat motifs and the extent of variation within them must be known in order to ascertain the function of repeat elements in the genome. Some repeats, such as the rDNA genes, are highly conserved from one species to another, whereas others are highly variable even between accessions of a single species. The study of repetitive DNA sequence motifs and their chromosomal distribution has considerable potential for understanding genome evolution and sequence components. Some genes have been found lying among repetitive sequences, especially in the centromeric region. The concept of families of repeat sequences has been developed to understand these regions of the genome. The classes are 1) tandem repeats, 2) retroelements and 3) telomeric sequences. Cytogenetic studies have used in situ hybridization to determine where repeat sequences are located on the chromosomes.
rDNA: The DNA that codes for rRNA is known as rDNA. It is highly conserved and consists of tandem arrays of repeating units, each containing the rRNA genes encoding the 18S, 5.8S and 26S rRNAs together with transcribed and non-transcribed spacers; a unit is approximately 10 kb in plants. These repeat units and the 5S rRNA genes are localized at specific regions of the chromosomes, which makes them usable as markers. Because speciation rates generally correlate with changes in the chromosomal distribution of these repeat units, the units have been used to study evolutionary trends. rDNA repeats represent ~10% of the genome.
Telomeres: Telomeres are the specialized structures present at both ends of each chromosome. They are highly conserved regions made up of hundreds of tandem repeats of a sequence similar to TTTAGGG. The enzyme telomerase is required for telomere replication and supplies the RNA template used for it. Telomerase thereby enables chromosome stabilization and repair.
Centromeres: Centromeres are the attachment sites for microtubules during cell division. They are often composed of highly conserved tandem repeats and are defined cytologically by the primary constriction. Centromere-associated repeats represent a considerable percentage of the genomic DNA. Despite analysis of centromere structure and of the proteins associated with the centromere, comprehensive information about centromeric DNA sequence is lacking. Some scientists hold that the tandem repeats play a key role in centromere function and chromosome segregation. Recent analyses have shown that centromeres are not devoid of genes, as was previously believed: a few genes and a wide range of vestigial, presumably inactive mobile elements have been identified. The centromere consists of a central repetitive core, flanked first by moderately repetitive DNA with little recombination and then by regions with mobile elements and normal recombination rates.
Transposable Elements and Retroelements: Transposable elements are discrete components of the plant genome that replicate and reinsert at multiple sites by a complex process. Depending on the method of excision and reintegration, these mobile elements are classified as either Type I, which use an RNA intermediate (e.g. retrotransposons), or Type II, which exist exclusively as DNA. Retroelements are very heterogeneous and are found throughout the plant kingdom, which indicates that they are ancient. It is hypothesized that retroelements are more abundant around the centromere because insertion there limits the disruption of genes.
Retroelements are a source of biodiversity because they can cause a mutation when they insert into a gene. It is estimated that 80% of the mutations detected in Drosophila are caused by retrotransposons. Transposons can also partially or completely restore gene function and may even create new gene functions, thereby contributing to evolution. Stress has been shown to activate retroelements. The sequences of degenerate and potentially active retroelements give valuable information about genome evolution and phylogenetic relationships. Retroelement amplification leads to large genomes, and loss can occur in a specific manner, leading to a species-specific composition of retroelements.
Simple Sequence Repeats (SSRs): SSRs, or microsatellites, are tandem repeats of short motifs (up to about 5 bp) that are present throughout the eukaryotic genome. They provide highly informative, polymorphic markers for plant, fungal and animal fingerprinting.
Tandem Arrays of Repetitive DNA: Tandem arrays can provide useful markers for chromosome identification, and their presence and distribution can provide evidence of evolutionary change. There is no evidence for a constant mutation rate; rather, mutations appear to have occurred in bursts or evolutionary waves. Tandem arrays are usually transcriptionally silent.
DNA Sequence in the Chromosome: The packing of genomic DNA can directly affect RNA transcription, DNA replication, recombination, DNA repair and chromosome segregation.
Methylation: Cytosine methylation of DNA is extensive and is an important mechanism of gene regulation. Reports have correlated some methylation patterns with reduced levels of gene expression, whereas other patterns are correlated with the normal regulation of developmentally important genes. In animals, methylation probably occurs at symmetrical sites in the DNA molecule, whereas in plants it does not necessarily occur at symmetrical sites.
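The distinction between symmetrical and non-symmetrical sites can be illustrated with a short sketch. It assumes the commonly used classification of plant cytosine contexts into CG, CHG and CHH (where H is A, C or T), which is not spelled out in the summary above; CG and CHG are treated as symmetrical because a cytosine also sits opposite on the complementary strand, and CHH as non-symmetrical. The function name and the example sequence are hypothetical and serve only as illustration.

```python
def classify_cytosine_contexts(seq):
    """Count cytosines on one strand by sequence context.

    Assumed contexts (not taken from the summary above):
      CG  - symmetrical
      CHG - symmetrical (H = A, C or T)
      CHH - non-symmetrical
    Cytosines too close to the end of the sequence to classify,
    or followed by ambiguous bases, are skipped.
    """
    seq = seq.upper()
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        following = seq[i + 1:i + 3]  # up to two bases after the cytosine
        if len(following) >= 1 and following[0] == "G":
            counts["CG"] += 1
        elif len(following) == 2 and following[0] in "ACT":
            if following[1] == "G":
                counts["CHG"] += 1
            elif following[1] in "ACT":
                counts["CHH"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sequence, not real genomic data.
    print(classify_cytosine_contexts("CGCAGCTTC"))
```

The sketch only counts potential methylation sites on one strand; whether a given cytosine is actually methylated cannot be read from the sequence alone and has to come from experimental data.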
DNA methylation is usually associated with the terminal stage of differentiation, but changes in methylation patterns have been observed during plant development (e.g. at meiosis and embryogenesis). DNA methylation also helps to maintain chromosome stability. The DNA methyltransferases are known to participate in DNA repair and to stabilize the nucleoprotein assemblies required for the inactivation and imprinting of chromosomes.
Structure and Packaging of Linear DNA into Chromosomes: DNA is wrapped around basic proteins called histones, forming nucleosomes connected by linker DNA. Repetitive sequences probably play a key role in stabilizing this structure.
Chromatin Remodeling and Histone Acetylation: Histone acetylation is known to change the structure of chromatin by modulating the position of nucleosomes. Changes in nucleosome position affect the rate of transcription by controlling the access of transcription factors to promoters. Remodeling may also be a requirement for the replication of condensed, inactive regions of the genome.
The Three-Dimensional Nucleus Genome Architecture: Linear models of the sequence alone do not adequately explain gene regulation, so architecture is important for understanding it. Architecture refers to the genome's three-dimensional structural organization within the nucleus and extends to the dynamics of that organization and the relationships between structure and function. Genome architecture is of prime importance because the functional regulation of DNA behavior depends on genome organization. DNA packing and unpacking, replication, repair, mutation and transcription are tissue-specific and depend on the dynamic architecture of the genome.
Packaging of Nuclear DNA: An intranuclear framework is believed to provide a functional organization for the genome, but the existence of a nuclear matrix or chromosomal skeleton remains controversial.
Most of the major cytoskeletal proteins, such as actin and tubulin, have been found in the nucleus, but their exact function and significance are not known. “The higher-order structure of the chromatin fiber and the organization of chromatin domains in the nucleus appear to have a profound influence on gene expression.” Within the nucleus, individual chromosomes, euchromatic and heterochromatic regions, and the nucleolus occupy separate compartments. Active genes tend to move to the periphery near the nuclear membrane, where RNA transcripts are formed. Nucleoli, the sites of rRNA synthesis, are spherical compartments within the nucleus with no defined boundaries; they move and fuse during the interphase of the cell cycle.
Genomics, Chromosomes, Evolution, and the Nucleus: The chromosome, chromosome segment, gene and DNA sequence are the levels of genome evolution that plant breeders aim to control and direct. The genome has been shown to be plastic, with rapid amplification and fixation of advantageous novelties, and the organization of the chromosome has a fundamental influence on these evolutionary processes. It is now known that tandem repeats related to chromosome structure are present at telomeres and centromeres, and that retroelements represent about 50% of the DNA in the genome. Comparative analysis has been useful in understanding genome organization. Studies of genome organization and gene function can therefore reveal much about fundamental processes such as chromosome pairing, segregation, and gene organization and expression, with direct implications for the aims of biologists. The structure of DNA, some of its sequences, and its organization have been conserved in all organisms, which shows their importance in maintaining the life cycle.