1. Structural genes

ZOO 405, Week 3
ZOO405 by Rania Baleela is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
This week
• Genome content
• Eukaryotic genome constitution
• Viruses
• Morphological types of viruses
• Retroviruses and their genome organization
• Retroviruses classification
Genome content
Size measurements in the molecular world
• 1 mm (millimeter) = 1/1,000 meter
• 1 µm (“micron”) = 1/1,000,000 of a meter (1 × 10⁻⁶)
• 1 nm (nanometer) = 1 × 10⁻⁹ meter
• 1 bp (base pair) = 1 nucleotide pair
• 1,000 bp = 1 kb (kilobase)
• 1 million bp = 1 Mb (megabase)
• 5 billion bp DNA ~ 1 meter
• 5 thousand bp DNA ~ 1.2 µm
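The unit conversions above can be sketched in a few lines. This is a minimal sketch that assumes a rise of about 0.34 nm per base pair for B-form DNA; that constant and the function names are assumptions, not taken from the slide. The slide's figures are rounded, so the computed lengths match them in order of magnitude only.

```python
# Assumption: B-form DNA rises about 0.34 nm per base pair (not stated on the slide).
NM_PER_BP = 0.34


def bp_to_kb(base_pairs: float) -> float:
    """Convert base pairs to kilobases (1 kb = 1,000 bp)."""
    return base_pairs / 1_000


def bp_to_mb(base_pairs: float) -> float:
    """Convert base pairs to megabases (1 Mb = 1,000,000 bp)."""
    return base_pairs / 1_000_000


def dna_length_m(base_pairs: float) -> float:
    """Approximate contour length, in meters, of a linear DNA duplex."""
    return base_pairs * NM_PER_BP * 1e-9


if __name__ == "__main__":
    for bp in (5_000, 5_000_000_000):
        print(f"{bp:>13,} bp = {bp_to_mb(bp):g} Mb, about {dna_length_m(bp):.3g} m")
```

Under this assumption a 5 kb molecule is on the micrometer scale and a 5 billion bp genome is on the meter scale.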
The C-value enigma/paradox
“Although genes are made of DNA,
much DNA is not genes”
Doolittle, 1989
Species         Genome size (Mb)    Predicted gene number
Human           3,200               40,000-50,000
Mouse           3,200               40,000
Pufferfish      380                 38,000
Sea squirt      160                 16,000
Fruit fly       180                 14,000
Mosquito        280                 14,000
Nematode        98                  19,000
Mustard weed    125                 25,000
Rice            400                 35,000
Corn            2,500               40,000
Yeast           12                  5,800
Neurospora      40                  10,000
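One way to read the table is to compute gene density (genes per Mb) for each species. The sketch below copies the table's values; the function names are hypothetical, and the human entry uses 45,000, the midpoint of the 40,000-50,000 range, as an assumption.

```python
# (genome size in Mb, predicted gene number), copied from the table above.
# Assumption: Human uses 45,000, the midpoint of the 40,000-50,000 range.
GENOMES = {
    "Human": (3200, 45000),
    "Mouse": (3200, 40000),
    "Pufferfish": (380, 38000),
    "Sea squirt": (160, 16000),
    "Fruit fly": (180, 14000),
    "Mosquito": (280, 14000),
    "Nematode": (98, 19000),
    "Mustard weed": (125, 25000),
    "Rice": (400, 35000),
    "Corn": (2500, 40000),
    "Yeast": (12, 5800),
    "Neurospora": (40, 10000),
}


def genes_per_mb(size_mb: float, genes: float) -> float:
    """Gene density: predicted genes per megabase of genome."""
    return genes / size_mb


def ranked_by_density(genomes: dict) -> list:
    """Species names sorted from highest to lowest gene density."""
    return sorted(genomes, key=lambda name: genes_per_mb(*genomes[name]), reverse=True)


if __name__ == "__main__":
    for name in ranked_by_density(GENOMES):
        size_mb, genes = GENOMES[name]
        print(f"{name:<13}{size_mb:>6} Mb  {genes_per_mb(size_mb, genes):7.1f} genes/Mb")
```

With these numbers the small genomes (yeast, Neurospora) are the most gene-dense and the large ones (corn, human, mouse) the least, so gene number grows far more slowly than genome size.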
The C-value paradox
Complexity does not correlate with genome size
Dr Richard Horton
Homo sapiens: 3.4 × 10⁹ bp
Allium cepa: 1.5 × 10¹⁰ bp
Amoeba dubia: 6.8 × 10¹¹ bp
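The size of the mismatch is easier to see as a fold difference relative to the human genome. This is a minimal sketch using only the three genome sizes listed above; the dictionary and function names are hypothetical.

```python
# Haploid genome sizes in base pairs, as listed above.
C_VALUES_BP = {
    "Homo sapiens": 3.4e9,
    "Allium cepa": 1.5e10,
    "Amoeba dubia": 6.8e11,
}


def fold_vs_human(species: str) -> float:
    """How many times larger a genome is than the human genome."""
    return C_VALUES_BP[species] / C_VALUES_BP["Homo sapiens"]


if __name__ == "__main__":
    for species in C_VALUES_BP:
        print(f"{species}: {fold_vs_human(species):.1f}x human")
```

The onion genome comes out at roughly 4 times the human genome and the Amoeba dubia genome at about 200 times.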
Genome size changes
Increase:
(1) global increases (i.e. the entire genome or a major
part of it is duplicated),
(2) regional increases (i.e. a particular sequence is
multiplied to generate repetitive DNA).
Decreases:
Loss of 1 chromosome (Aneuploidy).
Mechanisms for global genome size increase
1. Polyploidization = the addition of one or more complete sets of chromosomes to the original set.
2. Repetitive sequences:
– Ribosomal RNA genes
– Centromeres
– Telomeres
– TEs
Transposable Elements and genome size
• Variation in gene numbers cannot explain
variation in genome size among eukaryotes
• Most of variation in genome size is due to
variation in the amount of repetitive DNA
(mostly derived from TEs)
• TEs accumulate in intergenic regions
The amount of TE DNA correlates positively with genome size
[Figure: genomic DNA, TE DNA and protein-coding DNA per genome; y-axis in Mb, 0 to 3000] (Feschotte & Pritham 2006)
The proportion of protein-coding genes decreases with genome size, while the
proportion of TEs increases with genome size
[Figure: proportions of TEs and protein-coding genes plotted against genome size; Gregory, Nat Rev Genet 2005]
Contrasted Genome Landscapes
Transposable Element
Genetic components of the human genome
Noncoding DNA: the end of the paradox
• Today, C-value differences are no longer
paradoxical.
• In spite of its label, the “paradox” was not the
lack of a correlation with complexity, per se, but
rather the inability of early researchers to
reconcile the constancy of DNA content within
species (which occurs because it is the stuff of
genes) with the variation in quantity of DNA
among species (which does not relate to the
number of genes).
Excess transposition may provoke
rapid changes in genome size
e.g. grass genomes
Long Terminal Repeat (LTR) retrotransposons
• Abundant and can impact gene and genome evolution.
• Most are large elements (>4 kb) and are most often found in heterochromatic (gene-poor) regions.
• The smallest LTR retrotransposon = 292 bp (Gao et al., 2012):
– Found in rice, maize, sorghum and other grass genomes, which indicates that it was present in the grass ancestor at least 50–80 MYA. It may still be active in some genomes.
• The small LTR retrotransposons (SMARTs) are distributed throughout the genomes and are often located within or near genes, so in a few instances they can alter both gene structure and gene expression.
Rapid changes in genome size in the grasses
[Figure: grass phylogeny with divergence times of ~50 myr and ~10 myr and genome sizes of 4800 Mb, 430 Mb, 750 Mb and 2500 Mb. Figure adapted from Sue Wessler]
Variation in TE activity triggers rapid changes in genome size in grasses
[Figure: grass phylogeny with divergence times (~50 myr, ~10 myr) and genome sizes (4800, 430, 750, 2500 Mb), with genes and TEs marked]
Retrotransposon amplification has resulted in the doubling of the maize genome in the last ~6 myr (San Miguel et al. 1998)
Variation in TE activity triggers rapid changes in genome size in grasses
[Figure: grass phylogeny with divergence times (~50 myr, ~10 myr) and genome sizes (4800, 430, 750, 2500 Mb), with genes and TEs marked]
3 super-abundant retrotransposon families in O. australiensis
That’s 62% of the genome! (605/965 Mb)
(Piegu et al., 2006)
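The O. australiensis figure is a simple fraction of the genome. This is a minimal sketch using only the two numbers quoted above (605 Mb of TE DNA in a 965 Mb genome); the function names are hypothetical.

```python
def te_fraction(te_mb: float, genome_mb: float) -> float:
    """Fraction of the genome made up of the given amount of TE DNA."""
    return te_mb / genome_mb


def non_te_remainder_mb(te_mb: float, genome_mb: float) -> float:
    """Genome size left over once the given TE DNA is subtracted."""
    return genome_mb - te_mb


if __name__ == "__main__":
    print(f"TE share: {te_fraction(605, 965):.1%}")
    print(f"Remainder: {non_te_remainder_mb(605, 965)} Mb")
```

The share works out to between 62% and 63%, and removing the three families leaves 360 Mb.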
The solution to the paradox
Most eukaryotic DNA does not code for
proteins, so there is no reason to expect a
complex organism to have a large genome or a
simple organism to have a small one.
“The C-value paradox vanished the
moment geneticists abandoned the
concept of the genome consisting of
the genes, all the genes, and nothing
but the genes”
C-, G- and I-values
• C-value: the amount of DNA found in the haploid genome, measured in millions of base pairs (Mb) or in picograms (pg).
• G-value: the number of genes found in the haploid genome; the number includes predicted genes and ORFs.
• I-value: the amount of information embedded in the genome.
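Because C-values are reported either in Mb or in pg, a conversion between the two is often needed. The sketch below assumes the commonly used factor of 1 pg of DNA ≈ 978 Mb; that factor and the function names are assumptions, not given on the slide.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to about 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a C-value from picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a C-value from megabases to picograms."""
    return megabases / MB_PER_PG


if __name__ == "__main__":
    # A genome of 3,200 Mb expressed in picograms.
    print(f"3,200 Mb is about {mb_to_pg(3200):.2f} pg")
```

Under this assumption a 3,200 Mb genome weighs a little over 3 pg per haploid copy.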
“We’re pretty good at thinking
about how individual genes are
turned on and off. We’re not as
good at thinking about how the
whole genome is coordinated.”
Quote of Jeanne Lawrence in “The Cell Nucleus Shapes Up”, Science 1993, Vol 259, pp 1257-1259
Constitution of eukaryotic genome
Eukaryotic genomes composition
1. Structural genes (e.g. operon models)
2. Interrupted genes
3. Conserved exons & unique introns
4. Gene numbers
5. Repetitive DNA (e.g. tandem gene clusters, tandem arrays)
1. Structural genes
• Are genes that code for any RNA or protein product other than a regulatory protein.
• The lac operon is an example: its structural genes are transcribed into mRNA.
• Operon model = inducible genes => genes whose expression is turned on by the presence of some substance
– Lactose induces expression of the lac genes
– An antibiotic induces the expression of a resistance gene
lac operon
• Operon = bacterial block of genes encoding enzymes
that are all part of a metabolic pathway
• Composed of 3 structural genes coding for proteins
involved in the uptake and catabolism of lactose
• Lac => lactose, a 12-carbon sugar made of 2 simpler 6-carbon sugars (i.e. glucose and galactose)
• Glucose is a very efficient carbon source; it can enter
directly into the metabolic paths that provide both
energy and substrates for making more complex
compounds.
• If lactose is provided as the carbon source, it must first
be broken down into the 2 component sugars before it
can be used
Structural genes
In E. coli, β-galactosidase breaks lactose down into glucose and galactose
Lactose Operon
• Structural genes
– lac z, lac y, & lac a
– Promoter
– Polycistronic mRNA
• Regulatory gene
– Repressor
• Operator
• Operon
• Inducer - lactose
[Figure: lac operon map. Regulatory gene (i), then the operon (p, o, z, y, a). DNA → mRNA → protein: z gives β-galactosidase, y gives permease, a gives transacetylase]
E. coli lac operon
• E. coli grown in glucose as the sole carbon source have about 3 copies of the enzyme β-galactosidase per cell.
• E. coli grown in lactose as the sole carbon source have about 3,000 copies of the enzyme β-galactosidase per cell.
Lac operon functions when only glucose is present
1. The promoter for the I gene is always "on", but is very weak, so it is transcribed only rarely.
2. The I mRNA is translated into the repressor protein. A typical cell will have only about 10 copies of this protein.
3. In the absence of lactose, the repressor protein binds to the operator, preventing transcription from the second promoter. Almost no ZYA mRNA is made.
Only lactose is present
1. The Promoter for the I gene is occasionally bound by RNA polymerase
to initiate transcription.
2. The I mRNA is translated into the repressor protein (~ 10 copies).
3. Lactose binds to the repressor and converts it into an inactive state,
where it can't bind the Operator (reversed when all the lactose is
digested).
4. The promoter for making Z-Y-A mRNA is not blocked => many copies of the mRNA are made, and the 3 proteins are made. The small amount of lactose that diffuses in is enough to initiate induction of transcription of the Z-Y-A mRNA.
5. Translation begins at the 5' end of the mRNA and makes β-galactosidase
from the Z gene. There is a stop codon, followed immediately by
another AUG start, so many, but not all, ribosomes read on through and
make permease from the Y gene. The same process allows some A gene
product to also be made.
Inducer => lactose
Absence of inducer => active repressor => no expression = negative control
Negative Regulation of transcription
Inducible
Negative Regulation
Repressible
Positive Regulation
The lac Operon
Induction of the lac operon
Catabolite Repression (Glucose Effect)
= Control of an operon by glucose
The lac control region
1. 3 operators (O1, O2, O3); region where regulatory proteins bind
2. RNA polymerase binding site (promoter)
3. cAMP-CRP complex binding site (CAP)
Mechanism of catabolite repression
• cAMP
• CAP (CRP) protein
• CAP-cAMP complex
– Promoter activation
• Positive control
[Figure: absence of glucose. Adenyl cyclase makes cAMP from ATP; cAMP and CAP act at the promoter of the operon (i, p, o, z, y, a); state labels: active, inactive; β-galactosidase, permease and transacetylase are produced. Maximum expression]
Mechanism of catabolite repression
• Glucose:cAMP
• No CAP-cAMP
complex
– No Promoter
activation
Presence of glucose
Adenyl cyclase
X
CAP
i
p
o
z
ATP
y
a
Inactive
-GalactosidasePermease Transacetylase
Low level expression
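The two controls described above (negative control by the repressor, positive control by CAP-cAMP) can be combined into a small truth table. This is a simplified boolean sketch, not a quantitative model; the function name and the three output labels are hypothetical, chosen to match the slide wording ("almost no" mRNA, "low level", "maximum").

```python
def lac_expression(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon output for a given carbon-source condition."""
    # Negative control: the repressor sits on the operator unless lactose inactivates it.
    repressor_on_operator = not lactose
    # Positive control: the CAP-cAMP complex forms only when glucose is absent.
    cap_camp_bound = not glucose

    if repressor_on_operator:
        return "almost none"
    if cap_camp_bound:
        return "maximum"
    return "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_expression(lactose, glucose)
            print(f"lactose={lactose!s:<5} glucose={glucose!s:<5} -> {level}")
```

Only the combination of lactose present and glucose absent gives maximum expression; lactose together with glucose gives the low level described for catabolite repression.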
Operon model- repressible genes
• Repressible genes are those whose expression
is turned off by the presence of some
substance (co-repressor)
– Tryptophan represses the trp genes
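The difference between inducible and repressible operons comes down to which way the small molecule switches the repressor. This is a minimal boolean sketch under that simplification; the function names are hypothetical.

```python
def inducible_expressed(inducer_present: bool) -> bool:
    """Inducible operon (e.g. lac): the inducer inactivates the repressor."""
    repressor_active = not inducer_present
    return not repressor_active


def repressible_expressed(corepressor_present: bool) -> bool:
    """Repressible operon (e.g. trp): the co-repressor activates the repressor."""
    repressor_active = corepressor_present
    return not repressor_active


if __name__ == "__main__":
    for present in (False, True):
        print(
            f"substance present={present!s:<5} "
            f"inducible on={inducible_expressed(present)!s:<5} "
            f"repressible on={repressible_expressed(present)}"
        )
```

With tryptophan present the trp genes are off; with lactose present the lac genes are on.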