The Myth of Junk DNA
Dr. Raymond G. Bohlin
Fellow, Discovery Institute
Probe Ministries
Non-Protein-Coding DNA
• 2001 – 65,000 mRNAs, but only 4% from exons
• 2002 – ENCODE found 11,655 non-protein-coding RNAs
• 2005 – most of mammalian DNA is transcribed
• 2008 – both strands are used in transcription, frequently from overlapping segments
Evolutionary predictions
• If a sequence is non-functional, then over time the sequence should degrade.
• If a sequence is functional, then the sequence should be conserved by natural selection.
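The two predictions above can be made concrete by measuring how similar a sequence remains to its ancestral form. The following is a minimal sketch, not a method from the presentation: the function name `percent_identity` and all three toy sequences are invented for illustration, and the sequences are assumed to be already aligned with no gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same base (no gap handling)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences, invented for illustration only.
ancestral = "ATGGCCATTGTAATGGGCCGC"
conserved = "ATGGCCATTGTAATGGGCCGC"  # unchanged: what selection on a functional sequence predicts
degraded = "ATGACCATCGTTATGGGACGC"   # four substitutions: what unconstrained drift predicts

conserved_pct = percent_identity(ancestral, conserved)
degraded_pct = percent_identity(ancestral, degraded)
print(f"conserved: {conserved_pct:.1f}%  degraded: {degraded_pct:.1f}%")
```

Real comparisons would start from a proper alignment and a substitution model; this only shows the quantity the predictions are about.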
Non-Protein-Coding DNA
• 2005 – non-coding regions hundreds of nucleotides long are identical in humans and mice.
• Such ultraconserved regions (UCRs) regulate developmentally important functions.
• This is not expected by evolution!
Introns
• Introns are not just inert spacers between exons.
• 2005 – intronic sequence is highly conserved between humans, mice, rats, dogs and chickens – likely functional.
• The mammalian thyroid receptor gene produces two variant proteins with opposite effects – splicing is regulated by an intron.
Co-expressed loci are clustered together in the nucleus, sometimes to “create” genes
[Figure: nuclear compartment with concentrated transcription factors, joined by loops from chromosomes 5, 21 and 2]
Pseudogenes
• A pseudogene is a sequence that closely resembles a functional gene but appears to be a useless leftover.
• Pseudogenes as defined above would be predicted by evolution but are difficult to explain under intelligent design (ID).
• The human genome may have as many as 2000 pseudogenes.
Pseudogenes
• Some pseudogenes appear to suppress expression of the functional gene.
• The pseudogene can be transcribed, and this transcript binds to the mRNA sequence of the functional gene, thus blocking translation (“RNA interference”).
• Transcribed pseudogenes can also serve as “perfect decoys” for RNA-degrading enzymes, thus enhancing expression of the functional gene.
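The binding step described above depends on base-pairing: a transcript can bind an mRNA where it is the reverse complement of a stretch of that mRNA. A minimal sketch of that check follows, assuming perfect complementarity and ignoring mismatches and RNA structure. The helper names and the toy mRNA are hypothetical, not taken from any cited study.

```python
# Watson-Crick pairing for RNA: A-U and C-G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Return the strand that would pair with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def can_pair(antisense: str, mrna: str) -> bool:
    """True if `antisense` is perfectly complementary to some stretch of `mrna`."""
    return reverse_complement_rna(antisense) in mrna


# Hypothetical toy mRNA, invented for illustration only.
mrna = "AUGGCUUACGGAUCCUAA"
# A made-up antisense transcript built to match positions 3-11 of the toy mRNA.
antisense = reverse_complement_rna(mrna[3:12])

print(antisense, can_pair(antisense, mrna))
```

Substring matching is used here only because the toy case assumes perfect pairing; real antisense interactions tolerate mismatches.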
Repetitive Sequences
• About half of the mammalian genome consists of various types of repetitive sequences.
• Long Interspersed Nuclear Elements (LINEs)
• Short Interspersed Nuclear Elements (SINEs)
• Endogenous Retroviruses (ERVs)
Overview of LINEs
LINEs and SINEs have different structural arrangements. The major LINE in the human genome is L1. This sequence:
• Is found throughout Mammalia but is largely taxon-specific
• Is variously truncated at the 5’ end: ranges from 6-8 kb to a few hundred bp in length
• Has a biased chromosomal distribution: AT-rich chromosome bands and the X chromosome
[Figure: structure of an L1 element, labeled with a species-specific regulatory region, ORF1, ORF2 (reverse transcriptase and endonuclease), a G-dense Pu:Py element, and a 3’ UTR with an A-rich “tail”]
Chimp- vs. Human-Specific L1s*
(lineages diverged 5-6 million years ago)
Element       Chimp   Human
L1Hs(Ta)          0     271
L1 nonTa        210     252
L1Pa2           476     490
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.
Remember the layout of a mammalian gene? Many human gene folders are bordered by species-specific repertoires of L1s.
[Figure: “Gene” 1 through “Gene” 5 with their RNA outputs, flanked on both sides by L1s]
Almost forty percent of human nuclear matrix attachment elements are L1 sequences.
Overview of SINEs
The major SINE in the human genome is Alu. Unlike LINE-1, Alu (and other SINEs) do not encode enzymes for their mobilization. This sequence:
• Is primate-specific; subfamilies are distributed in a taxonomically hierarchical manner (same with LINE-1)
• Is ~300 bp in length; consists largely of two monomers (with sequence differences)
• Has a biased genomic distribution: GC-rich chromosome bands
[Figure: structure of an Alu element, labeled with Monomer A, a central A-stretch, Monomer B with a 31 bp insert, and an A-rich “tail”]
Chimp- vs. Human-Specific SINEs*
(lineages diverged 5-6 million years ago)
Element       Chimp   Human
AluYa5           10   1,709
AluYb8            9   1,290
AluY            360     484
AluYc1          979     356
AluYg6            1     261
AluS             50     263
Other Alu       233   1,167
SVA (SINE)      396     864
*Mills, R.E. et al. 2006. Recently mobilized transposons in the human and chimpanzee genomes. Am. J. Hum. Genet. 78: 671-679.
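The per-subfamily Alu counts on this slide can be summed to compare the two lineages. The sketch below only does that arithmetic. It assumes the slide's "Chimp, Human" label order applies to each pair of counts, so that the first number listed for a subfamily is the chimp value and the second the human value. The dictionary and variable names are illustrative.

```python
# Lineage-specific Alu counts as listed on the slide (Mills et al. 2006).
# Assumption: chimp value first, human value second for each subfamily.
chimp_alu = {
    "other Alu": 233, "AluS": 50, "AluYa5": 10, "AluYb8": 9,
    "AluY": 360, "AluYc1": 979, "AluYg6": 1,
}
human_alu = {
    "other Alu": 1167, "AluS": 263, "AluYa5": 1709, "AluYb8": 1290,
    "AluY": 484, "AluYc1": 356, "AluYg6": 261,
}

chimp_total = sum(chimp_alu.values())
human_total = sum(human_alu.values())
ratio = human_total / chimp_total

# Subfamilies where the chimp lineage has more insertions than the human one.
chimp_dominant = sorted(
    name for name in chimp_alu if chimp_alu[name] > human_alu[name]
)

print(f"chimp: {chimp_total}  human: {human_total}  ratio: {ratio:.2f}")
print("more insertions in chimp:", chimp_dominant)
```

Under that assignment, AluYc1 is the only listed subfamily with more insertions in the chimp lineage than in the human lineage.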
Aspects of chromosome sequence arrangement that seem random are not. A case in point involves endogenous retroviruses (ERVs):
A. Human ERVs contribute 51,197 promoter elements
that initiate transcription at various stages (Conley et
al., Bioinformatics 24: 1563-1567, 2008).
B. Mouse ERVs are highly expressed at the 2-cell
embryo stage (and are the earliest to be
transcribed in the zygote) and are essential for
ontogenesis (Kigami et al., Biology of
Reproduction 68: 651-654, 2003).
ERVs
• In humans, ERVs help regulate blood cell production and fat metabolism.
• ERVs also regulate gene expression in the gastrointestinal tract, mammary glands, and testes.
• The ERV-derived protein syncytin is required for the fusion of fetal and maternal cells in the placenta.
Although less than 2% of genomic
DNA in many vertebrates (e.g.,
mammals) can be placed in the
traditional “gene” category, nearly
all sequences are transcribed in a
cell- and tissue-specific manner.
DNA as Computer
• Information carried by DNA is bidirectional, multi-layered, and interleaved.
• Repetitive elements format and punctuate the information at different scales.
• Cells can write codes onto non-coding DNA, so phenotype is not always equal to genotype.
• “metaprogramming” – Cornell Conf.