Genome Composition
Dan Graur
1
Genome
Composition in
Bacteria
2
Carsonella ruddii has a very low GC content.
4
The selectionist explanation views
GC content as an adaptation.
G:C pairs are more stable than A:T pairs.
T-T dimers are sensitive to UV radiation.
Preferential usage of amino acids encoded
by GC-rich codons (e.g., ala and arg) and
avoidance of amino acids encoded by GC-poor
codons (e.g., ser and lys).
No empirical evidence.
5
The mutationist explanation
Rate of substitution G/C → T/A is m
Rate of substitution T/A → G/C is n
Noboru Sueoka
University of Colorado
6
At equilibrium: $P_{GC} = \frac{n}{n + m}$
7
GC mutational pressure: $\frac{m}{n} = \frac{1 - P_{GC}}{P_{GC}}$
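Both relations on these slides follow from a simple two-state model. The sketch below assumes that every site is either G/C or A/T and that $m$ and $n$ are constant per-site rates; those assumptions are made here for illustration and are not stated on the slides. The change in GC frequency is the gain from A/T sites minus the loss from G/C sites, and setting it to zero gives the equilibrium:

```latex
\begin{align}
\frac{dP_{GC}}{dt} &= n\,(1 - P_{GC}) - m\,P_{GC} = 0 \\
P_{GC} &= \frac{n}{n + m} \\
\frac{m}{n} &= \frac{1 - P_{GC}}{P_{GC}}
\end{align}
```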
8
$\frac{m}{n} = 3 \Rightarrow 25\%$ GC (Mycoplasma capricolum)
$\frac{m}{n} = 1 \Rightarrow 50\%$ GC (Escherichia coli)
$\frac{m}{n} \approx 0.33 \Rightarrow 75\%$ GC (Micrococcus luteus)
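The three examples can be checked numerically. This is a minimal sketch; the function names `equilibrium_gc` and `ratio_from_gc` are made up for illustration, and only the ratio m/n matters, so n is fixed at 1.

```python
def equilibrium_gc(m: float, n: float) -> float:
    """Equilibrium GC fraction for G/C->T/A rate m and T/A->G/C rate n."""
    if m < 0 or n < 0 or (m + n) == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return n / (n + m)


def ratio_from_gc(p_gc: float) -> float:
    """Mutational ratio m/n implied by an equilibrium GC fraction."""
    if not 0 < p_gc <= 1:
        raise ValueError("p_gc must be in (0, 1]")
    return (1 - p_gc) / p_gc


for ratio in (3.0, 1.0, 1 / 3):
    # Only the ratio m/n matters, so n is fixed at 1 here.
    print(f"m/n = {ratio:.2f} -> {equilibrium_gc(ratio, 1.0):.0%} GC")
```

Running the loop reproduces the 25%, 50%, and 75% GC values listed for the three species.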
9
10
Differences in the way the leading and
lagging strands of DNA are replicated
can result in strand-dependent mutation
patterns.
The expectation under no-strand-bias
conditions is
fA = fT and fC = fG
11
Deviations from equal
mutation rates between the
two strands are quantified by
the
skew.
12
The skew is a measure of inequality
between the frequencies of
nucleotides X and Y on a strand.
$S_{XY} = \frac{f_X - f_Y}{f_X + f_Y}$
13
If there are no violations
of the no-strand-bias
conditions:
SXY 0
14
Skew values are calculated for sliding
windows of predetermined lengths, and
are plotted on a skew diagram.
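A sliding-window skew calculation might look like the sketch below. The function names, the window and step sizes, and the toy sequence are all illustrative assumptions. Counts are used instead of frequencies because the window length cancels in the ratio.

```python
def skew(window: str, x: str = "G", y: str = "C") -> float:
    """S_XY = (f_X - f_Y) / (f_X + f_Y) for one window; 0.0 if neither occurs."""
    window = window.upper()
    n_x, n_y = window.count(x), window.count(y)
    total = n_x + n_y
    return (n_x - n_y) / total if total else 0.0


def sliding_skew(seq: str, window: int, step: int, x: str = "G", y: str = "C"):
    """Return (window_start, skew) pairs for windows of fixed length."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, skew(seq[start:start + window], x, y))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (made up): G-rich first half, C-rich second half.
toy = "GGGGGCGGGA" * 5 + "CCCCCGCCCA" * 5
for start, s in sliding_skew(toy, window=20, step=20):
    print(start, round(s, 2))
```

Plotting the second element of each pair against the first gives a skew diagram; a sign change along the sequence marks a switch in strand bias.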
15
[Figure: Bacillus subtilis, with two regions labeled "chirochore"]
16
17
Chlamydia trachomatis
18
Compositional Properties of
Eukaryotic Genomes
19
Intergenomic
variability
GC content of bacterial
genomes ranges from
~24% to ~74%
GC content of
vertebrate genomes
ranges from ~40% to
~45%
20
Interspecific variation among vertebrate genomes is low.
However, vertebrates seem to have a much more complex
intragenomic compositional organization (internal
structure) than prokaryotic genomes.
21
How are nucleotides distributed along the genome?
Uniform?
Patchy? Clines?
TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTT
AACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTGATGCTAATCTCAGCGCTCCGCTG
ACCCCTCAGCAAAGGGCTTGGCTCAATCTCGTCCAGCCATTGACCATCGTCGAGGGGTTT
GCTCTGTTATCCGTGCCGAGCAGCTTTGTCCAAAACGAAATCGAGCGCCATCTGCGGGCC
CCGATTACCGACGCTCTCAGCCGCCGACTCGGACATCAGATCCAACTCGGGGTCCGCATC
GCTCCGCCGGCGACCGACGAAGCCGACGACACTACCGTGCCGCCTTCCGAGAGATTGATG
ACAGCGCTGCGGCACGGGGCGATAACCAGCACAGTTGGCCAAGTTACTTCACCGAGCGCC
CGCACAATACCGATTCCGCTACCGCTGGCGTAACCAGCCTTAACCGTCGCTACACCTTTG
ATACGTTCGTTATCGGCGCCTCCAACCGGTTCGCGCACGCCGCCGCCTTGGCGATCGCAG
AAGCACCCGCCCGCGCTTACAACCCCCTGTTCATCTGGGGCGAGTCCGGTCTCGGCAAGA
CACACCTGCTACACGCGGCAGGCAACTATGCCCAACGGTTGTTCCCGGGAATGCGGGTCA
AATATGTCTCCACCGAGGAATTCACCAACGACTTCATTAACTCGCTCCGCGATGACCGCA
AGGTCGCATTCAAACGCAGCTACCGCGACGTAGACGTGCTGTTGGTCGACGACATCCAAT
TCATTGAAGGCAAAGAGGGTATTCAAGAGGAGTTCTTCCACACCTTCAACACCTTGCACA
ATGCCAACAAGCAAATCGTCATCTCATCTGACCGCCCACCCAAGCAGCTCGCCACCCTCG
AGGACCGGCTGAGAACCCGCTTTGAGTGGGGGCTGATCACTGACGTACAACCACCCGAGC
TGGAGACCCGCATCGCCATCTTGCGCAAGAAAGCACAGATGGAACGGCTCGCGGTCCCCG
ACGATGTCCTCGAACTCATCGCCAGCAGTATCGAACGCAATATCCGTGAACTCGAGGCCG
AGGAATTCACCAACGACTTCATTAACTCGCTCCGCGATGACCGCAAGGTCGCATTCAAAC
GCAGCTACCGCGACGTAGACGTGCTGTTGGTCGACGACATCCAATTCATTGAAGGCAAAG
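One simple way to look at how nucleotides are distributed is to compute GC content in consecutive windows. The sketch below does this for the first line of the sequence above; the function names and the 20-base window are illustrative choices, and real analyses use far longer windows.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def gc_profile(seq: str, window: int):
    """GC content of consecutive non-overlapping windows along the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# First line of the sequence shown on the slide.
line = "TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTT"
profile = gc_profile(line, window=20)
print([round(p, 2) for p in profile])
```

A flat profile would suggest a uniform distribution, abrupt changes between runs of windows would suggest patches, and a steady trend would suggest a cline.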
22
“When vertebrate genomic DNA is
randomly sheared into fragments 30–100 kb in size and the fragments are
separated by base composition, the
fragments cluster into a small number
of classes distinguished from each other
by their GC content. Each class is
characterized by bands of similar, but
not identical, base compositions.”
Equilibrium centrifugation in Cs2SO4 density gradient
(Macaya et al. 1976; Thiery et al. 1976;
Bernardi et al. 1985)
23
carp
24
The Isochore Theory - Giorgio Bernardi
carp
25
26
Isochores do not merit
the prefix “iso.”
Lander et al. (2001)
27
Post-genomic era (2001)
Objections against the isochore theory:
“We can rule out a strict notion of isochores as
compositionally homogeneous.” Lander et al. (2001)
“There are no isochores in chromosomes 21 and 22.”
Häring and Kypr (2001)
Defense of the isochore theory:
“The conclusion of the authors that ‘isochores’ are
not ‘strict isochores’ is correct, however isochore are
fairly homogeneous regions.” Bernardi (2001)
28
29
In search of isochores…
Questions:
• Do isochores exist?
• Is the isochore theory a useful (or practical)
concept?
30
Segmentation Models
• Assumption: Sequences can be
partitioned into a number of segments
each with a characteristic GC content.
• Each segment has a certain degree of
internal homogeneity (or similarity).
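To make the assumption concrete, here is a minimal sketch of one possible segmentation approach: recursively split the sequence where the Jensen-Shannon divergence between the GC compositions of the two sides is largest. The slides do not say which algorithm was used, so the choice of divergence, the threshold, the minimum length, and the toy sequence are all assumptions for illustration.

```python
import math


def entropy(p: float) -> float:
    """Shannon entropy (bits) of a two-state composition with GC fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(seq: str, min_len: int = 1):
    """Return (position, divergence) of the split that maximizes the
    Jensen-Shannon divergence between left and right GC compositions."""
    # Simplification: any base that is not G or C counts as A/T.
    is_gc = [1 if b in "GC" else 0 for b in seq.upper()]
    n, total_gc = len(is_gc), sum(is_gc)
    h_whole = entropy(total_gc / n)
    best_pos, best_d = None, 0.0
    left_gc = 0
    for i in range(1, n):
        left_gc += is_gc[i - 1]
        if i < min_len or n - i < min_len:
            continue
        h_left = entropy(left_gc / i)
        h_right = entropy((total_gc - left_gc) / (n - i))
        d = h_whole - (i / n) * h_left - ((n - i) / n) * h_right
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq: str, threshold: float, min_len: int, offset: int = 0):
    """Split recursively until no split exceeds the divergence threshold.
    Returns a list of half-open (start, end) intervals."""
    min_len = max(1, min_len)
    if len(seq) >= 2 * min_len:
        pos, d = best_split(seq, min_len)
        if pos is not None and d > threshold:
            return (segment(seq[:pos], threshold, min_len, offset)
                    + segment(seq[pos:], threshold, min_len, offset + pos))
    return [(offset, offset + len(seq))]


toy = "AT" * 50 + "GC" * 50  # made-up sequence with one obvious boundary
print(segment(toy, threshold=0.1, min_len=10))
```

The threshold controls how different two neighbors must be before they count as separate segments, and the minimum length prevents very short segments.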
31
In search of isochores…
Methodology:
• Define rigorously 6 attributes of isochores and
of the isochore theory as applied to humans
• Test attributes against the human genome data
32
Attributes of isochores
A1. Distinguishability: An isochore is a DNA
segment that has a characteristic GC content that differs
significantly from the GC content of adjacent isochores.
A2. Homogeneity: An isochore is more homogeneous
in its composition than the chromosome on which it
resides.
A3. Minimum length: The length of an isochore
exceeds a certain cutoff value. In the literature, the most
commonly mentioned value is 300 Kb.
33
Attributes of the isochore theory in humans
A4. Genome coverage: The overwhelming majority of
the human genome consists of segments abiding by A1–A3.
Non-isochoric DNA takes up only a small fraction of
the genome.
34
Attributes of the isochore theory in humans
A5. Isochore families:
The human genome
comprises five isochore
families, each described by a
particular Gaussian
distribution of GC content.
35
Practicality of the isochore theory
A6. Isochore assignment
into families: It is possible to
classify each isochore into its
isochore family based
solely on its compositional
properties.
36
Segment length distribution
The fitted regression line
(solid line) indicates that
the tail of the distribution
exhibits power-law decay
with an exponent of –2.38.
$P \propto L^{-2.38}$
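A power-law exponent can be estimated as the slope of a straight line on log-log axes. The sketch below uses ordinary least squares on synthetic points generated from an exponent of -2.38; the data are not real segment lengths, and the function name is made up for illustration.

```python
import math


def fit_power_law_exponent(lengths, frequencies):
    """Least-squares slope of log10(frequency) against log10(length)."""
    xs = [math.log10(length) for length in lengths]
    ys = [math.log10(freq) for freq in frequencies]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic points generated from P = L^-2.38 (not real segment data).
lengths = [10_000, 30_000, 100_000, 300_000, 1_000_000]
frequencies = [length ** -2.38 for length in lengths]
print(round(fit_power_law_exponent(lengths, frequencies), 2))
```

A least-squares fit on log-log axes is the simplest option; with real, binned data the estimate can be sensitive to how the bins are chosen.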
37
Power laws everywhere!
38
Isochore families
Most parsimonious Gaussian fit
to putative isochores
39
Homogeneous “isochores” in vertebrates
40
Assignment into families
Classification errors reach values of 70%. Only a minute fraction
of segments can be classified with an expected error under 5%.
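Why overlapping Gaussian families lead to high classification error can be illustrated with a small sketch. Each segment is assigned to the family with the highest posterior probability, and the expected error is one minus that posterior. The family names, weights, means, and standard deviations below are HYPOTHETICAL and chosen only to show overlap; they are not the fitted values behind the slide.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def classify(gc: float, families):
    """Assign a segment to the family with the highest posterior probability.

    families: list of (name, weight, mean, sd).
    Returns (name, expected_error), where expected_error is
    1 minus the posterior probability of the chosen family.
    """
    scores = [(w * normal_pdf(gc, mu, sd), name) for name, w, mu, sd in families]
    total = sum(score for score, _ in scores)
    if total == 0:
        raise ValueError("GC value is too far from every family")
    best_score, best_name = max(scores)
    return best_name, 1.0 - best_score / total


# HYPOTHETICAL parameters, chosen only to show overlapping families.
families = [
    ("F1", 0.4, 0.38, 0.03),
    ("F2", 0.4, 0.42, 0.03),
    ("F3", 0.2, 0.50, 0.03),
]
for gc in (0.35, 0.40, 0.55):
    name, err = classify(gc, families)
    print(f"GC = {gc:.2f}: family {name}, expected error {err:.0%}")
```

A segment whose GC content falls between two family means is close to a coin toss, while only segments in the far tails can be assigned with a small expected error.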
41
Summary
(A1) Distinguishability
(A2) Homogeneity: 50%
(A3) Minimum length: X
(A4) Genome coverage: 41%
(A5) Isochore families: 4 families
(A6) Isochore assignment into families: X
42
Conclusion:
The isochore theory may have
reached the limits of its usefulness
as a description of genomic
compositional structures.
43
44
45
As of December 2004
17 genetic codes
11 mitochondrial
5 nuclear
1 nuclear + mitochondrial
46
Lock & Key Hypothesis
47
Evolutionary
Dead Ends
Frozen accidents
48
49
The codon-capture
hypothesis
Thomas Jukes
50
Universal
genetic
code
AAA = lysine
51
Echinodermata
AAA = asparagine
52
Hemichordata
AAA = unassigned
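The three readings of AAA can be represented as a base table plus per-code overrides. This is a minimal sketch: the table holds only a four-codon subset of the universal code, the override values are the ones listed on the slides, and the names `UNIVERSAL_SUBSET`, `OVERRIDES`, and `translate_codon` are made up for illustration.

```python
# Subset of the universal genetic code (DNA codons), enough for this example.
UNIVERSAL_SUBSET = {"AAA": "Lys", "AAG": "Lys", "AAT": "Asn", "AAC": "Asn"}

# Reassignments of AAA as listed on the slides; None marks an unassigned codon.
OVERRIDES = {
    "universal": {},
    "Echinodermata": {"AAA": "Asn"},
    "Hemichordata": {"AAA": None},
}


def translate_codon(codon: str, code: str = "universal"):
    """Translate one codon under the named code; returns None if unassigned."""
    table = {**UNIVERSAL_SUBSET, **OVERRIDES[code]}
    return table[codon.upper()]


for code in OVERRIDES:
    print(code, "AAA ->", translate_codon("AAA", code))
```

Keeping the variants as overrides on a shared table makes it easy to see that the codes differ in only a few codons.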
53
54