BB30055: genes and genomes
MV Hejmadi (bssmvh), 2005-06
BB30055: Overview of topics (MVH lectures)
Learning objectives: to understand genome biology and its applications.
Reading List:
1. Human Molecular Genetics 3/e by Strachan and Read (Garland Science) – excellent updated text with
lots of clear and detailed coverage of genome projects, applications, techniques and evolution.
2. Genetics: from genes to genomes (2/e) by Leland Hartwell et al (McGraw-Hill) – good updated overview
on most topics covered here
3. Genomes (2/e) by TA Brown (Bios) – rather basic but clear with good illustrations
Overview of lectures
(A) Genomes
• Introduction to genomes, the human genome: organisation and insights from HGP
• Repetitive elements
• Post genomics: transcriptomes, proteomics, systems biology
(B) Applications of genome projects
• Genetic mapping and identifying human disease genes
• Mapping and identifying complex traits
• Genetic testing & DNA profiling
(C) Genome evolution
• Recombination and transposition
• Molecular and population evolution
• Sex chromosomes and mitochondrial genome
(A) Genomes (lectures 1+2)
Why study the genome? 3 main reasons:
1) A description of the sequence of every gene is valuable in itself. It includes the regulatory regions, which help in understanding not only the molecular activities of the cell but also the ways in which they are controlled.
2) To identify and characterise important heritable disease genes, or bacterial genes (for industrial use).
3) To understand the role of intergenic and other non-coding sequences, e.g. satellites, intronic regions etc.
Organisation of the human genome
(A) Nuclear genome: 3.2 Gbp in size, with ~30,000 genes
24 types of chromosome (22 autosomes, X and Y): the shortest is Y (51 Mbp) and the longest is chromosome 1 (279 Mbp)
(B) Mitochondrial genome: multicopy, circular genome of 16,569 bp
Nuclear genome organisation
Coding regions
a) Polypeptide-coding DNA – exons
b) Non-polypeptide-coding DNA: regulatory regions, introns, pseudogenes, gene fragments
Intergenic regions
a) Unique or low copy number sequences
b) Repetitive sequences
Coding regions:
Polypeptide-coding
Gene organisation: Rare bicistronic units
Non-polypeptide-coding:
RNA-encoding genes
Gene fragments
Pseudogenes (ψ) – processed pseudogenes
Intergenic regions
Repetitive sequences:
• Tandem repeats: satellites, minisatellites and microsatellites
• Interspersed repeats: LINEs, SINEs, LTR elements and DNA transposons
• Segmental duplications: intra- or interchromosomal duplications
Tandem repeats:
Blocks of tandem repeats at
• subtelomeres
• pericentromeres
• short arms of acrocentric chromosomes
• ribosomal gene clusters
Satellites:
Large arrays of repeats
E.g. satellites 1, 2 & 3; α satellite (alphoid DNA); β satellite

Class            Size of repeat   Repeat block   Major chromosomal location
Satellite        5-171 bp         > 100 kb       Centromeric heterochromatin
Minisatellite    9-64 bp          0.1–20 kb      Telomeres
Microsatellite   1-13 bp          < 150 bp       Dispersed
Minisatellites:
Moderate-sized arrays of repeats, e.g. hypervariable minisatellite DNA
- core of GGGCAGGAXG
- found in telomeric regions
- used in the original DNA fingerprinting technique by Alec Jeffreys
Microsatellites (VNTRs - Variable Number of Tandem Repeats; SSRs - Simple Sequence Repeats)
1-13 bp repeats, e.g. (A)n; (AC)n
2% of genome (dinucleotides - 0.5%)
Used as genetic markers (especially for disease mapping)
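To make the (A)n and (AC)n notation concrete, the sketch below scans a sequence for perfect tandem repeats with a unit of 1-13 bp. The function name and the minimum copy number (4) are assumptions chosen for illustration, not values from the lecture, and real microsatellite callers also tolerate imperfect repeats.

```python
import re


def find_microsatellites(seq, max_unit=13, min_copies=4):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    max_unit follows the 1-13 bp unit size quoted above; min_copies is an
    arbitrary illustrative threshold, not a standard definition.
    """
    seq = seq.upper()
    # Lazily capture the shortest candidate unit, then require that the same
    # unit follows at least (min_copies - 1) more times.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # An (AC)6 block flanked by unique sequence.
    print(find_microsatellites("GGT" + "AC" * 6 + "TTG"))
```

Because the copy number n differs between individuals, comparing the reported copies at the same locus across samples is what makes these repeats usable as genetic markers.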
Interspersed repeats or transposon-derived repeats. They constitute 45% of the genome and arise mainly as a result of transposition through either a DNA or an RNA intermediate. They can be divided into 4 main types:
1) LINEs (long interspersed elements)
• Among the most ancient elements of eukaryotic genomes
• Autonomous transposition (reverse transcriptase)
• ~6-8 kb long
• Internal polymerase II promoter and 2 ORFs
• 3 related LINE families in humans: LINE-1, LINE-2, LINE-3
• Believed to be responsible for retrotransposition of SINEs and creation of processed pseudogenes
2) SINEs (short interspersed elements)
• Non-autonomous (successful freeloaders! 'borrow' RT from other sources such as LINEs)
• ~100-300 bp long
• Internal polymerase III promoter
• No proteins
• Share 3' ends with LINEs
• 3 related SINE families in humans: active Alu, inactive MIR and inactive Ther2/MIR3
3) Long Terminal Repeats (LTR)
Repeats in the same orientation on both sides of the element, e.g. ATATATNNNNNNNATATAT
• The LTRs contain sequences that serve as transcription promoters as well as terminators.
• These sequences allow the element to code for an mRNA molecule that is processed and polyadenylated.
• At least two genes are coded within the element to supply essential activities for the retrotransposition mechanism.
• The RNA contains a specific primer binding site (PBS) for initiating reverse transcription.
• A hallmark of almost all mobile elements is that they form small direct repeats at the site of integration.
4) DNA transposons:
Inverted repeats on both sides of the element, e.g. ATGCNNNNNNNNNNNCGTA
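The difference between the direct repeats flanking LTR elements and the inverted repeats flanking DNA transposons can be checked mechanically. The sketch below compares the first and last k bases of an element. The helper names are hypothetical. By default it treats "inverted" as the plain reversal of the left end, which is how the single-strand example above is drawn. On real double-stranded DNA, terminal inverted repeats are usually compared as reverse complements, so that option is included as `use_complement=True`.

```python
# Hypothetical helper names for illustration; not taken from the lecture.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def classify_terminal_repeats(element, k, use_complement=False):
    """Compare the first and last k bases of an element.

    Returns "direct" if both ends are identical, "inverted" if the right end
    is the reversed (or, with use_complement=True, reverse-complemented)
    left end, and "none" otherwise.
    """
    element = element.upper()
    left, right = element[:k], element[-k:]
    if left == right:
        return "direct"
    mirrored = reverse_complement(left) if use_complement else left[::-1]
    if right == mirrored:
        return "inverted"
    return "none"


if __name__ == "__main__":
    print(classify_terminal_repeats("ATATATNNNNNNNATATAT", 6))  # LTR-style example
    print(classify_terminal_repeats("ATGCNNNNNNNNNNNCGTA", 4))  # DNA-transposon-style example
```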
Segmental duplications:
• Closely related sequence blocks at different genomic loci
• Transfer of 1-200 kb blocks of genomic sequence
• Segmental duplications can occur within the same chromosome (intrachromosomal) or between non-homologous chromosomes (interchromosomal)
• Not always tandemly arranged
• Relatively recent
Major insights from the HGP on genome organisation: (Ref: Nature, 2001, pp 875-915)
1) Genes: genes vary widely in their size, content and location
• More genes: twice as many as Drosophila / C. elegans
• Uneven gene distribution: gene-rich and gene-poor regions
• More paralogs: some gene families have expanded their number of paralogs, e.g. olfactory genes
• More alternative transcripts: increased RNA splice variants, thereby expanding the number of proteins 5-fold
2) Proteome: more complex than that of invertebrates
Domain arrangements in human:
• largest total number of domains in a protein is 130
• largest number of domain types per protein is 9
• mostly identical arrangements of domains: no huge difference in domain number in humans, but the frequency of domain sharing is very high in human proteins (especially structural proteins and proteins involved in signal transduction and immune function)
• only 3 cases where a combination of 3 domain types is shared by human and yeast proteins
3) Single nucleotide polymorphism (SNP) identification
• Sites that result from point mutations in individual base pairs
• Biallelic, and contribute to the uniqueness of each individual genome
• ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)
• May or may not affect the ORF
• Most SNPs may be regulatory
• One SNP every 1.9 kb on average, with variable densities across regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over millions of years
4) Distribution of GC content
Genome-wide average of 41%. Huge regional variations exist, e.g. the distal 48 Mb of chromosome 1p has 47% GC but chromosome 13 has only 36%. This agrees with cytogenetic staining with Giemsa (G-banding): dark G-bands have low GC content (37%) and light G-bands have high GC content (45%).
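GC content is simply the fraction of G and C among the called bases, and regional variation is usually seen by computing it in windows along a sequence. A minimal sketch follows. The function names and the non-overlapping window scheme are illustrative assumptions, and the windows used in the HGP analysis are far larger than anything shown here.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (N and other symbols are ignored)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window, step=None):
    """GC content of successive windows; windows do not overlap unless step < window."""
    step = step or window
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence with an AT-rich half followed by a GC-rich half.
    print(windowed_gc("AAAATTTTGGGGCCCC", 8))
```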
5) CpG islands (~28,890 in number)
• CpG dinucleotides are greatly under-represented in the human genome
• CpG islands show no methylation
• Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 & 22 have 19-22/Mb (average is 10.5/Mb)
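Under-representation of CpG is usually quantified as an observed/expected ratio: the number of CG dinucleotides divided by the number expected from the C and G frequencies alone. The sketch below uses the commonly quoted form (CpG count × length) / (C count × G count). This formula and the function name are brought in for illustration and are not part of the lecture notes. A ratio well below 1 indicates depletion, while CpG islands show ratios much closer to 1.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (number of CG * N) / (number of C * number of G)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg_count = seq.count("CG")
    return cpg_count * n / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_obs_exp("CCGG"))      # one CG in four bases with 2 C and 2 G
    print(cpg_obs_exp("CATGCATG"))  # C and G present but never adjacent as CG
```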
6) Recombination rates
• Recombination rate increases with decreasing chromosome arm length
• Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb
7) Repeat content
a) Age distribution
• Most interspersed repeats predate the eutherian radiation
• LINEs and SINEs have extremely long lives
• 2 major peaks of transposon activity
• No DNA transposition in the past 50 Myr
• LTR retroposons teetering on the brink of extinction
• Overall decline in interspersed repeat (IR) activity in the hominid lineage in the past 35-40 Myr, compared to the mouse genome
b) Comparison with other genomes: compared to fruitfly, C. elegans and plant genomes, the human genome shows
• a higher density of transposable elements in the euchromatic portion of the genome
• a higher abundance of ancient transposons
• 60% of interspersed repeats (IR) made up of LINE1 and Alu repeats, whereas DNA transposons represent only 6%
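The percentages above are shares of the interspersed repeats, not of the genome. Combining them with the earlier figure that interspersed repeats make up 45% of the genome gives whole-genome fractions. The arithmetic is sketched below, assuming both percentages are measured by sequence length and using the 3.2 Gbp genome size quoted earlier. The constant and function names are illustrative.

```python
# Figures taken from the notes; names are illustrative.
GENOME_SIZE_BP = 3.2e9            # nuclear genome size
IR_FRACTION_OF_GENOME = 0.45      # interspersed repeats as a fraction of the genome
LINE1_ALU_SHARE_OF_IR = 0.60      # LINE1 + Alu as a share of interspersed repeats
DNA_TRANSPOSON_SHARE_OF_IR = 0.06


def genome_fraction(share_of_ir, ir_fraction=IR_FRACTION_OF_GENOME):
    """Convert a share of interspersed repeats into a fraction of the whole genome."""
    return share_of_ir * ir_fraction


if __name__ == "__main__":
    for label, share in [("LINE1 + Alu", LINE1_ALU_SHARE_OF_IR),
                         ("DNA transposons", DNA_TRANSPOSON_SHARE_OF_IR)]:
        fraction = genome_fraction(share)
        print(f"{label}: {fraction:.1%} of genome, "
              f"about {fraction * GENOME_SIZE_BP / 1e6:.0f} Mbp")
```

On these assumptions LINE1 and Alu together account for roughly a quarter of the genome, while DNA transposons account for under 3%.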
c) Variation in distribution of repeats: regions show either a high repeat density (e.g. a 525 kb region of chromosome Xp11 shows 89% repeat density) or a low repeat density (e.g. the HOX homeobox gene cluster has <2% repeats, indicative of regulatory elements which have low tolerance for insertions)
d) Distribution by GC content: (high GC – gene rich; high AT – gene poor)
LINEs are abundant in AT-rich regions, but SINEs are less abundant in AT-rich regions. Alu repeats in particular are retained in actively transcribed GC-rich regions. E.g. chromosome 19 has 5% Alus compared to Y
e) Y chromosome: unusually young genome (high tolerance to gaining insertions). Mutation rate is 2.1X higher in the male germline, possibly due to cell division rates or different repair mechanisms
References:
Chapter 9: Human Molecular Genetics (3/e) by Strachan and Read
Chapter 10: Genetics: from genes to genomes (2/e) by Hartwell et al, pp 339-348
Nature (2001) 409 (15 Feb), pp 814-816 & 875-900 http://www.bath.ac.uk/library/subjects/bs/links.html#hgp