The Myth of Junk DNA
Dr. Raymond G. Bohlin
Fellow, Discovery Institute
Probe Ministries

Non-Protein-Coding DNA
2001: 65,000 mRNAs, but only 4% from exons
2002: ENCODE found 11,655 non-protein-coding RNAs
2005: most mammalian DNA is transcribed
2008: both strands are used in transcription, frequently from overlapping segments

Evolutionary Predictions
If a sequence is non-functional, then over time the sequence should degrade.
If a sequence is functional, then the sequence should be conserved by natural selection.

Non-Protein-Coding DNA
2005: non-coding regions hundreds of nucleotides long are identical in humans and mice.
Such ultraconserved regions (UCRs) regulate developmentally important functions.
This is not expected by evolution!

Introns
Introns are not just inert spacers between exons.
2005: intronic sequence is highly conserved among humans, mice, rats, dogs, and chickens, so it is likely functional.
The mammalian thyroid receptor gene produces two variant proteins with opposite effects; the splicing is regulated by an intron.

Co-expressed loci are clustered together in the nucleus, sometimes to “create” genes.
[Figure: nuclear compartment with concentrated transcription factors, with loops from chromosomes 5, 21, and 2]

Pseudogenes
A pseudogene is a sequence that closely resembles a functional gene but appears to be a useless leftover.
Pseudogenes as defined above would be predicted by evolution but are difficult to explain under ID.
The human genome may have as many as 2000 pseudogenes.

Pseudogenes
Some pseudogenes appear to suppress expression of the functional gene. The pseudogene can be transcribed, and this transcript binds to the mRNA of the functional gene, thus blocking translation (“RNA interference”).
Other transcribed pseudogenes serve as “perfect decoys” for RNA-degrading enzymes, thus enhancing expression.

Repetitive Sequences
About half of the mammalian genome consists of various types of repetitive sequences.
Long Interspersed Nuclear Elements (LINEs)
Short Interspersed Nuclear Elements (SINEs)
Endogenous Retroviruses (ERVs)

Overview of LINEs
LINEs and SINEs have different structural arrangements.
The major LINE in the human genome is L1. This sequence:
  Is found throughout Mammalia but is largely taxon-specific
  Is variously truncated at the 5’ end: ranges from 6-8 kb down to a few hundred bp in length
  Has a biased chromosomal distribution: AT-rich chromosome bands and the X chromosome
[Figure: L1 structure, with labels ORF1; ORF2 (reverse transcriptase and endonuclease); G-dense Pu:Py element; species-specific regulatory region; 3’ UTR; A-rich ‘tail’]

Chimp- vs. Human-Specific L1s*
Element      Chimp   Human
L1Hs(Ta)         0     271
L1 nonTa       210     252
L1Pa2          476     490
Chimp-human split: 5-6 million years ago
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.

Remember the layout of a mammalian gene?
Many human gene folders are bordered by species-specific repertoires of L1s.
[Figure: “Genes” 1-5 and their RNA outputs, flanked by L1s]
Almost forty percent of human nuclear matrix attachment elements are L1 sequences.

Overview of SINEs
The major SINE in the human genome is Alu. Unlike LINE-1, Alu (and other SINEs) do not encode enzymes for their mobilization. This sequence:
  Is primate-specific; subfamilies are distributed in a taxonomically hierarchical manner (as with LINE-1)
  Is ~300 bp in length; consists largely of two monomers (with sequence differences) joined as a dimer
  Has a biased genomic distribution: GC-rich chromosome bands
[Figure: Alu structure, with labels central A-stretch; A-rich ‘tail’; Monomer A; 31 bp insert; Monomer B]

Chimp- vs. Human-Specific SINEs*
Element      Chimp   Human
other Alu      233   1,167
AluS            50     263
AluYa5          10   1,709
AluYb8           9   1,290
AluY           360     484
AluYc1         979     356
AluYg6           1     261
SVA (SINE)     396     864
Chimp-human split: 5-6 million years ago
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.

Any aspect of chromosome sequence arrangement that seems random is not. A case in point involves endogenous retroviruses (ERVs):
A. Human ERVs contribute 51,197 promoter elements that initiate transcription at various stages (Conley et al., Bioinformatics 24: 1563-1567, 2008).
B. Mouse ERVs are highly expressed at the 2-cell embryo stage (and are the earliest to be transcribed in the zygote) and are essential for ontogenesis (Kigami et al., Biology of Reproduction 68: 651-654, 2003).

ERVs
In humans, ERVs help regulate blood cell production and fat metabolism.
ERVs also regulate gene expression in the gastrointestinal tract, mammary glands, and testes.
The ERV-derived protein syncytin is required for the fusion of fetal and maternal cells in the placenta.

Although less than 2% of genomic DNA in many vertebrates (e.g., mammals) can be placed in the traditional “gene” category, nearly all sequences are transcribed in a cell- and tissue-specific manner.

DNA as Computer
Information carried by DNA is bidirectional, multi-layered, and interleaved.
Repetitive elements format and punctuate the information at different scales.
Cells can write codes onto non-coding DNA, so phenotype is not always equal to genotype.
“Metaprogramming” (Cornell Conf.)
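
Worked tally of the species-specific element counts
The chimp- and human-specific counts quoted earlier from Mills et al. (2006) can be summed per element class to make the comparison easier to read. The short Python sketch below is only an illustration: the numbers are transcribed from the slides, the assignment of each number to the chimp or human column is an assumption based on the order of the slide labels, and the names INSERTIONS and totals_by_class are hypothetical helpers, not anything from the cited paper.

```python
from collections import defaultdict

# (element class, family, chimp-specific count, human-specific count)
# Counts are transcribed from the slides; which number belongs to which
# species is an assumption based on the slide's "Chimp Human" label order.
INSERTIONS = [
    ("L1", "L1Hs(Ta)", 0, 271),
    ("L1", "L1 nonTa", 210, 252),
    ("L1", "L1Pa2", 476, 490),
    ("Alu", "other Alu", 233, 1167),
    ("Alu", "AluS", 50, 263),
    ("Alu", "AluYa5", 10, 1709),
    ("Alu", "AluYb8", 9, 1290),
    ("Alu", "AluY", 360, 484),
    ("Alu", "AluYc1", 979, 356),
    ("Alu", "AluYg6", 1, 261),
    ("SVA", "SVA", 396, 864),
]


def totals_by_class(rows):
    """Sum chimp and human counts for each element class."""
    totals = defaultdict(lambda: [0, 0])
    for element_class, _family, chimp, human in rows:
        totals[element_class][0] += chimp
        totals[element_class][1] += human
    return {cls: tuple(pair) for cls, pair in totals.items()}


if __name__ == "__main__":
    for cls, (chimp, human) in totals_by_class(INSERTIONS).items():
        print(f"{cls}: chimp={chimp} human={human}")
```

Only the families listed on the slides are included, so the per-class sums describe the slide data rather than every insertion reported in the paper.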