Genome projects and model organisms

Level 3 Molecular Evolution and
Bioinformatics
Jim Provan
Genome projects
Completed genomes:
Eubacteria (inc. Escherichia coli, Bacillus subtilis,
Haemophilus influenzae, Synechocystis PCC6803)
Archaea (inc. Methanococcus jannaschii, Methanobacterium
thermoautotrophicum)
Eukarya:
Saccharomyces cerevisiae
Caenorhabditis elegans
Homo sapiens
Arabidopsis thaliana
Partially sequenced genomes e.g. Drosophila
melanogaster, Fugu rubripes, Oryza sativa
Relationships between model organisms
[Figure: phylogenetic tree of the model organisms. Eukarya: H. sapiens, C. elegans, D. melanogaster, S. cerevisiae, A. thaliana. Archaea: Methanococcus, Archaeoglobus. Eubacteria: Synechocystis PCC6803, B. subtilis, M. genitalium, M. pneumoniae, B. burgdorferi, H. influenzae, E. coli]
Eubacterial genomes: Bacillus subtilis
Genome 4,214,810 bp:
4100+ protein sequences
Average gene 890 bp
Density 1 gene / 1028 bp
89% of total genome is
protein-coding
Protein coding genes:
53% single copy
47% paralogous gene families:
Mostly involved in transport
Genes are proximal, i.e. they have
evolved through tandem
duplication of single genes
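The density figure above follows directly from the genome size and gene count. The sketch below reproduces that arithmetic; the function names are illustrative, and because the slide gives the gene count as "4100+", the coding fraction it estimates is a lower bound that comes out slightly below the quoted 89%.

```python
# Figures quoted on the slide for Bacillus subtilis.
GENOME_BP = 4_214_810
GENE_COUNT = 4_100          # the slide says "4100+", so treat this as a lower bound
AVERAGE_GENE_BP = 890


def bp_per_gene(genome_bp: int, gene_count: int) -> float:
    """Average stretch of genome per gene, in base pairs."""
    return genome_bp / gene_count


def coding_fraction(gene_count: int, average_gene_bp: float, genome_bp: int) -> float:
    """Rough share of the genome covered by protein-coding genes."""
    return gene_count * average_gene_bp / genome_bp


density = bp_per_gene(GENOME_BP, GENE_COUNT)
fraction = coding_fraction(GENE_COUNT, AVERAGE_GENE_BP, GENOME_BP)

print(f"One gene per {density:.0f} bp")
print(f"Estimated protein-coding fraction: {fraction:.1%}")
```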
On the basis of homology with genes of known
function, 58% of B. subtilis genes could be assigned
to functional categories
The B. subtilis genome contains remnants of 10
prophages, suggesting that horizontal transfer has
played a significant role in evolution of the genome
Orthologous counterparts in other bacteria:
~1000 genes (24%) have counterparts in E. coli (Gram -ve)
More significantly, ~100 operons conserved as well
~800 genes (20%) have orthologues in Synechocystis
PCC6803 (Cyanobacterium)
Eubacterial genomes: Mycoplasmas
Obligate parasites
Thought to be derived
from Gram +ve bacteria
similar to B. subtilis
312 genes of M. genitalium
(66%) have homologues in
Gram +ve bacteria
Parasitic lifestyle has led to
a dramatic reduction in
genome size and content
Smallest-known genome in
a self-replicating organism
M. genitalium genome:
Circular chromosome of 580,070 bp
Only 470 predicted genes, covering DNA replication, transcription
and translation, DNA repair, cellular transport and energy
metabolism
Coding regions comprise ~88% of the genome
Similar to H. influenzae (85%)
Suggests that genome reduction has been due to loss of genes
and not reduction in gene size or increase in gene density
M. pneumoniae genome:
Larger than M. genitalium (816 kbp)
All M. genitalium genes found in M. pneumoniae
Not simply truncated - evidence of genome rearrangements
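The argument that genome reduction reflects gene loss rather than tighter packing can be checked with the figures already quoted for B. subtilis and M. genitalium. This is a rough sketch using only those slide values (the dictionary layout and function name are illustrative): the genome and the gene count both shrink by a similar factor, while the spacing per gene stays in the same range.

```python
# Genome size and gene count as quoted on the slides.
genomes = {
    "B. subtilis": {"bp": 4_214_810, "genes": 4_100},
    "M. genitalium": {"bp": 580_070, "genes": 470},
}


def bp_per_gene(genome: dict) -> float:
    """Average stretch of genome per gene, in base pairs."""
    return genome["bp"] / genome["genes"]


spacing = {name: bp_per_gene(g) for name, g in genomes.items()}

# How much smaller M. genitalium is, by genome length and by gene count.
size_ratio = genomes["B. subtilis"]["bp"] / genomes["M. genitalium"]["bp"]
gene_ratio = genomes["B. subtilis"]["genes"] / genomes["M. genitalium"]["genes"]

for name, value in spacing.items():
    print(f"{name}: one gene per {value:.0f} bp")
print(f"Genome size ratio: {size_ratio:.1f}x, gene count ratio: {gene_ratio:.1f}x")
```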
Eubacterial genomes: E. coli
4288 protein coding genes:
Average ORF 317 amino acids
Very compact: average
distance between genes 118bp
Numerous paralogous gene
families: 38-45% of genes
have arisen through duplication
Homologues:
H. influenzae (1130 of 1703)
Synechocystis (675 of 3168)
M. jannaschii (231 of 1738)
S. cerevisiae (254 of 5885)
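The "x of y" homologue counts are easier to compare as proportions. The sketch below simply divides each pair as written on the slide; reading the second number as the total number of proteins in the other organism is an assumption.

```python
# "matched of total" pairs quoted on the slide for E. coli homologues.
homologues = {
    "H. influenzae": (1130, 1703),
    "Synechocystis": (675, 3168),
    "M. jannaschii": (231, 1738),
    "S. cerevisiae": (254, 5885),
}


def share(matched: int, total: int) -> float:
    """Fraction of the listed total that has an E. coli homologue."""
    return matched / total


shares = {name: share(*pair) for name, pair in homologues.items()}

for name, value in shares.items():
    print(f"{name}: {value:.1%}")
```

The proportion falls steadily from the closely related H. influenzae to the eukaryote S. cerevisiae, in line with the relationships between the organisms.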
The minimum genome and redundancy
Minimum set of genes required for survival:
Replication and transcription
Translation (rRNA, ribosomal proteins, tRNAs etc.)
Transport proteins to derive nutrients
ATP synthesis
Entire pathways eliminated in Mycoplasma:
Amino acid biosynthesis (1 gene vs. 68 in H. influenzae)
Metabolism (44 genes vs. 228 in H. influenzae)
Comparison of M. genitalium and H. influenzae has
identified a minimum set of 256 genes
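The core idea behind a comparative minimum gene set can be illustrated as a set intersection over ortholog groups. The sketch below uses made-up group labels, not real annotation data, and is not the procedure that produced the 256-gene figure; it only shows how shared and lineage-specific groups separate.

```python
# Hypothetical ortholog-group labels, for illustration only.
m_genitalium_groups = {
    "replication_1", "transcription_1", "translation_1",
    "atp_synthesis_1", "transport_1",
}
h_influenzae_groups = {
    "replication_1", "transcription_1", "translation_1",
    "atp_synthesis_1", "transport_1",
    "amino_acid_biosynthesis_1", "amino_acid_biosynthesis_2",
}

# Groups present in both genomes: candidates for a minimal set.
shared_groups = m_genitalium_groups & h_influenzae_groups

# Groups found only in the larger genome: candidates for dispensable pathways.
only_in_h_influenzae = h_influenzae_groups - m_genitalium_groups

print(sorted(shared_groups))
print(sorted(only_in_h_influenzae))
```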
Archaeal genomes: M. jannaschii
Requires no organic
nutrients for growth: has all
biochemical pathways to
use inorganic constituents
Only 38% of genes could be
assigned a known function
Genes for translation,
transcription and DNA
replication similar to
eukaryote genes:
DNA polymerase
Ribosomal proteins
Translation initiation factors
Fungal genomes: S. cerevisiae
First completely sequenced
eukaryote genome
Very compact genome:
Short intergenic regions
Scarcity of introns
Lack of repetitive sequences
Strong evidence of
duplication:
Chromosome segments
Single genes
Redundancy: non-essential
genes provide selective
advantage
Plant genomes: Arabidopsis thaliana
Contains 25,498 genes
from 11,000 families
Cross-phylum matches:
Vertebrates 12%
Bacteria / Archaea 10%
Fungi 8%
60% of ESTs have no match
in non-plant databases
Evolution involved whole-
genome duplication
followed by
gene loss and extensive
local gene duplications
Invertebrate genomes: C. elegans
Genome less compact
than yeast:
One gene every 7143 bp (one every
2155 bp in yeast)
Due mainly to introns in protein
coding genes
Much more compact than the
human genome (one gene every
50,000 bp)
Compactness due mainly to
polycistronic arrangement:
Trans-splicing
Co-expression and co-regulation
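The spacing figures above can be put on a common scale by converting them to genes per megabase. This is a minimal sketch using only the three values quoted on the slide; the names are illustrative.

```python
# Average spacing between genes, in base pairs, as quoted on the slide.
BP_PER_GENE = {
    "S. cerevisiae": 2_155,
    "C. elegans": 7_143,
    "H. sapiens": 50_000,
}


def genes_per_mb(bp_per_gene: float) -> float:
    """Convert 'one gene every N bp' into genes per million base pairs."""
    return 1_000_000 / bp_per_gene


density = {name: genes_per_mb(bp) for name, bp in BP_PER_GENE.items()}

# Relative compactness between neighbouring entries.
yeast_vs_worm = BP_PER_GENE["C. elegans"] / BP_PER_GENE["S. cerevisiae"]
worm_vs_human = BP_PER_GENE["H. sapiens"] / BP_PER_GENE["C. elegans"]

for name, value in density.items():
    print(f"{name}: {value:.0f} genes per Mb")
print(f"Yeast is {yeast_vs_worm:.1f}x denser than C. elegans")
print(f"C. elegans is {worm_vs_human:.1f}x denser than human")
```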
Vertebrate genomes: Fugu rubripes
Pufferfish genome (400 Mb)
only four times larger than C.
elegans and 7.5 times smaller
than human genome
Homologous genes in Fugu
and mammals show
conserved synteny:
Same exon-intron organisation
Introns much smaller
Useful for identifying conserved
essential elements in vertebrate
genomes
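The size comparison above implies approximate genome sizes for C. elegans and human. The sketch below derives them from the 400 Mb Fugu figure and the two ratios on the slide; the results are rounded values implied by the slide, not independent measurements.

```python
FUGU_MB = 400  # pufferfish genome size quoted on the slide

# "Four times larger than C. elegans" and "7.5 times smaller than human".
c_elegans_mb = FUGU_MB / 4
human_mb = FUGU_MB * 7.5

print(f"Implied C. elegans genome: {c_elegans_mb:.0f} Mb")
print(f"Implied human genome: {human_mb:.0f} Mb")
```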
The genome of the cenancestor
Availability of complete genome sequences from the
three domains of life creates an opportunity for the
reconstruction of the complete genome of the
common ancestor:
Of the minimal bacterial set (256 genes), 143 have orthologues
in yeast (eukaryote)
Universal translation apparatus suggests that cenancestor
had a fully developed translation system
Extreme differences in DNA replication apparatus
Many fundamental metabolic processes are carried out by
similar proteins in Archaea and eubacteria:
Suggests a universal, autotrophic ancestor
Not all central metabolism is universal (methanogenesis,
photosynthesis etc.)