Insertions and Deletions of Introns in Human and Dog Aligned with Boreoeutherian (ancestral) Genome

Sharndeep Kaur

Abstract

Many genome-scale studies show losses and gains of introns during the evolution of the major eukaryotic lineages, but studies focused on more recent evolutionary periods have found more examples of losses than gains [1]. Studies conducted by Fedorov et al. did not detect any recently gained introns within the genome sequences of human, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana [1]. The focus of this paper is the pattern of introns in humans and dogs compared to the Boreoeutherian (ancestral) genome. Two-way alignments were used to determine the patterns of insertions and deletions, and dog was found to have more insertions and more deletions than human.

Introduction

In our advanced world of science, it is surprising to find how little we know about the origin of introns. The existence of intervening sequences of non-coding DNA that interrupt the coding regions of nuclear genes, only to be removed later from transcribed nuclear pre-mRNAs by spliceosomes, poses fundamental questions [2]. What is the significance and origin of introns? How do they spread in genes? What is the primary process by which new introns are created? These are just a few of the questions that arise about introns.

Introns were discovered in 1977, to general surprise. They are non-protein-coding sequences of DNA that occur within functional genes. All genes begin with exons (the protein-coding segments), but most also contain a variable number of introns that alternate with the exons [3]. There are eight major classes of introns, each occurring in different locations. The most common introns in eukaryotic genomes are the GU-AG introns (named for their splice sites). AU-AC introns, also named for their splice sites, along with some Group I introns are also found in eukaryotic genomes.
Group I and II introns are also found in organelle and some bacterial RNAs. Group III introns and twintrons are found entirely in organelle RNAs. Pre-tRNA introns are found in eukaryotic nuclear pre-tRNA, and archaeal introns are found in various RNAs in archaebacteria. Lastly, vertebrate introns contain several conserved sequences. The 5’ end of the intron contains a 5’ splice site with the sequence GU (guanine-uracil). The 3’ end of the intron contains a 3’ splice site with the sequence AG (adenine-guanine) [4].

The exact function of introns is hard to determine, but their persistence throughout evolution suggests that they carry some genetic significance. There are two major hypotheses of intron evolution: “introns early” and “introns late.” The introns early theory states that introns originated early in genome evolution and are slowly being removed from eukaryotic genomes. In contrast, the introns late theory states that introns are a recent development and are accumulating in, rather than disappearing from, eukaryotic genomes [4].

Several hypotheses have been proposed for the function of introns. One holds that introns separate the exons into the functional subunits of the product for which they code. Another states that introns allow exon shuffling, which would account for the tremendous variability of proteins (and their nearly instantaneous production) in eukaryotes [5]. A third is that if RNA was the original genetic material (and there is considerable evidence to suggest it is more primitive than DNA), then introns may have pinched out to function as the first enzymes [5].

In this paper, the Boreoeutherian genome was used as the ancestral genome against which to compare the patterns of introns in humans and dogs. The boreoeutherian ancestor was chosen as the common ancestor because Boreoeutheria is one of the largest clades, or groups of species, of mammals, and it includes both human and dog.
Also, the reconstruction of the boreoeutherian genome is reported to be 98 percent accurate. The exact age of the boreoeutherian ancestor is still unknown, but scientists agree that it lived more than 70 million years ago [6].

Materials and Methods

An exon.txt file from the class directory (bme230/Winter05/projects/intron) was converted by a Perl script, getIntron.pl, into intron.bed, which lists the chromosome number, intron start and end, gene ID, score, and strand. Robert Baertsch then used intron.bed to produce the two-way alignments: dog with the ancestral genome (outDog.axt) and human with the ancestral genome (outHuman.axt). getIdAlign.pl was then run on the two-way alignments to generate files containing the counts of insertions and deletions in human and dog. These files (humanDel.data, dogDel.data, humanIns.data, and dogIns.data) can also be obtained from the class project directory.

Results

Table 1: Counts of introns inserted and deleted in the human and dog genomes.

Lineage and event    Count
Human insertions     7343
Human deletions      1218
Dog insertions       8757
Dog deletions        1629

Discussion

Table 1 contains the counts of introns deleted and inserted in the two lineages. These insertions and deletions are of introns longer than 100 base pairs. Deletions and insertions shorter than 100 base pairs were regarded as noise and were not included in the counts for human and dog. From the data above, it is evident that more insertions than deletions occurred over evolution in both human and dog. In the human lineage, 7,343 introns were inserted but only 1,218 were deleted, a ratio of about 6 to 1. Similarly, the dog genome gained 8,757 introns but lost only 1,629, a ratio of about 5.4 to 1. My data support the “introns late” theory, which states that introns are accumulating in eukaryotic genomes rather than disappearing. The excess of insertions over deletions in both human and dog argues against the “introns early” theory.
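
The counting rule described above (ignore gaps shorter than 100 base pairs, then tally insertions and deletions per lineage) can be sketched in Python as follows. This is only an illustration and is not the getIdAlign.pl script used in this project, whose internals are not reproduced here. It assumes that each alignment block is a summary line followed by two aligned sequence rows, that the ancestral sequence is the first row, and that a long gap in the ancestral row is read as an insertion in the descendant while a long gap in the descendant row is read as a deletion. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Threshold taken from the Discussion: shorter gaps are treated as noise.
MIN_INDEL_LEN = 100


def count_long_gaps(aligned_row: str, min_len: int = MIN_INDEL_LEN) -> int:
    """Count runs of '-' that are at least min_len columns long."""
    return sum(1 for run in re.finditer(r"-+", aligned_row)
               if len(run.group()) >= min_len)


def classify_indels(ancestor_row: str, descendant_row: str,
                    min_len: int = MIN_INDEL_LEN) -> Tuple[int, int]:
    """Return (insertions, deletions) for the descendant lineage.

    Assumption: a long gap in the ancestor row means the descendant gained
    sequence; a long gap in the descendant row means it lost sequence.
    """
    if len(ancestor_row) != len(descendant_row):
        raise ValueError("aligned rows must have equal length")
    return (count_long_gaps(ancestor_row, min_len),
            count_long_gaps(descendant_row, min_len))


def iter_alignment_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (first_row, second_row) for each three-line alignment block.

    Blank lines and lines starting with '#' are skipped; the summary line
    of each block is ignored. An incomplete trailing block is dropped.
    """
    content = [ln.strip() for ln in lines
               if ln.strip() and not ln.startswith("#")]
    for i in range(0, len(content) - 2, 3):
        yield content[i + 1], content[i + 2]


def total_indels(lines: Iterable[str],
                 min_len: int = MIN_INDEL_LEN) -> Tuple[int, int]:
    """Sum (insertions, deletions) over all blocks, ancestor row first."""
    insertions = deletions = 0
    for ancestor_row, descendant_row in iter_alignment_pairs(lines):
        ins, dels = classify_indels(ancestor_row, descendant_row, min_len)
        insertions += ins
        deletions += dels
    return insertions, deletions


if __name__ == "__main__":
    # Toy block with a lowered threshold so the rows stay readable.
    toy_block = [
        "0 chrAnc 1 15 chrDesc 1 14 + 0",
        "ACGT----ACGTACGTA-GT",   # ancestor: gaps of length 4 and 1
        "ACGTTTTTAC------ACGT",   # descendant: one gap of length 6
    ]
    print(total_indels(toy_block, min_len=3))  # prints (1, 1)
```

With the default threshold of 100, the same functions could be applied to the lines of an alignment file opened with open(); whether the ancestral sequence really is the first row in outHuman.axt and outDog.axt would need to be checked before trusting the labels.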
It would be valuable to find correlations between the insertions and deletions of introns in dog and human, because they could lead to further knowledge about the common function of introns. The dog genome is a good choice for comparison with the human genome because it is similar in size to the genomes of humans and other mammals, containing approximately 2.5 billion DNA base pairs [7].

Literature Cited

[1] Coghlan A, Wolfe KH: Origins of recently gained introns in Caenorhabditis. Proc Natl Acad Sci USA 2004, 101:11362-11367.
[2] Bhattacharya, Debashish. “Molecular evolutionary methods: Their Application to Understanding Intron Origin and Evolution.” Iowa State University of Iowa Joint Bioinformatics Workshop. 23 Aug 2000. <http://orion.math.iastate.edu/danwell/BioInfIowa/WSabstracts.html>.
[3] Klyce, Brig. “Introns: A mystery.” Cosmic Ancestry. <http://www.panspermia.org/introns.htm>.
[4] Byers, Kelsey. “An Evaluation of Intron Significance Using Bioinformatics.” 17 April 2003. <http://computing.breinestorm.net/intron+significance+introns+transcript+carry/>.
[5] ________. Biology 104. Course home page. Spring 2001. Dept. of Biology, University of Miami. <http://www.bio.miami.edu/dana/104/104F02_11print.html>.
[6] Roach, John. Scientists Recreate Genome of Ancient Human Ancestor. National Geographic News 2005. <http://news.nationalgeographic.com/news/2005/01/0125_050125_genome.html>.
[7] ________. Dog Genome Assembled. National Human Genome Research Institute, NIH, 2004.