Structure and function of the genome

Genome and Gene
A gene is the basic functional unit of heredity in a living organism.
In molecular terms, it is a nucleic acid sequence that encodes a polypeptide or a functional RNA.
A gene determines the amino acid sequence of a polypeptide and thereby
contributes to cell-specific traits.
rRNAs and tRNAs also have their own genes.
The genome is the entirety of an organism's hereditary information. It is
encoded either in DNA or, for many types of virus, in RNA. The genome
includes both the genes and the non-coding sequences of one haploid set
of chromosomes.
The human genome comprises 24 distinct chromosomes (22 autosomes plus the X and Y chromosomes).
Section 1 Genome of viruses
The main types of viral genome:
Double-stranded DNA: SV40, adenovirus, herpes virus.
Single-stranded DNA: parvovirus, M13 phage.
Double-stranded RNA: reovirus.
Plus-strand RNA: poliovirus, coronavirus.
Minus-strand RNA: rabies virus, influenza virus, measles virus.
Reverse-transcribing viruses: specific taxa, such as HIV and HBV.
Genome of SV40 virus
Double-stranded circular DNA.
Divided into an early-gene region and a late-gene region.
Early genes: large T antigen and small t antigen.
Late genes: VP1, VP2, VP3.
A regulatory region lies between the early genes and the late genes. It includes
the origin of replication, promoters and an enhancer.
Alternative splicing and overlapping genes both occur.
Retrovirus
The virion carries two identical positive-strand RNA molecules,
each bound to a tRNA from the host cell.
Viral proteins:
Envelope protein (env).
Capsid protein (gag).
Reverse transcriptase (pol).
Genome of retrovirus
Coding region: contains three genes, gag, pol and env.
Non-coding regions:
R region: a repeat of 20 to 80 nucleotides.
PBS: primer binding site, which binds a tRNA used as the primer.
U regions: promoter in U3 and polyadenylation signal in U5.
Provirus: a long terminal repeat (LTR) at each end.
Features of viral genomes
The genome is simple, with a small number of coding genes. Genome
size varies widely between viruses.
Hepatitis B virus: 3.2 kb; poxvirus: 300 kb.
The nucleic acid differs from virus to virus: DNA or RNA, single- or double-stranded, one molecule or several segments.
Most viral genomes are haploid, with a single copy of each gene.
Retrovirus: diploid.
Influenza virus: 8 single-stranded RNA segments.
Reovirus: 10 double-stranded RNA segments.
Most of the genome is coding sequence, part of it is regulatory
sequence, and very little of it is structural sequence.
Replication and transcription of the virus depend on the host cell.
Genes of eukaryotic viruses can contain introns, whereas those of bacteria
and bacteriophages generally do not.
Alternative splicing occurs in viral genomes and produces
several kinds of mRNA from one transcript.
Gene overlapping is common. One sequence may be read in two different
open reading frames, giving proteins with very different amino
acid sequences.
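The effect of reading one sequence in two frames can be sketched in a few lines. This is only an illustration: the sequence below is hypothetical, and the function name translate is an assumption made for the example, not part of the lecture material. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(seq, frame=0):
    """Translate seq from offset `frame`; an incomplete trailing codon is dropped."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

# Hypothetical sequence, chosen only to illustrate the idea.
seq = "ATGGCATGGCTTAA"
print(translate(seq, 0))  # frame starting at the first base
print(translate(seq, 1))  # frame shifted by one base
```

Shifting the frame by a single base changes every codon, which is why two overlapping genes on the same stretch of nucleic acid can encode unrelated proteins.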
Section 2 Genome of prokaryotes
Contains the complete set of genes needed for the cell's own
metabolism and reproduction.
The survival of mycoplasma, chlamydia, etc. depends on the host.
Contains genes that regulate growth and
metabolism in response to environmental change.
Prokaryotes do not undergo differentiation and development, and they have
fewer genes than eukaryotes.
Genome of E. coli
Genome size of E. coli: 4.6 × 10^6 bp.
4288 ORFs; 2584 operons.
Average gene size: 951 bp.
Average interval between genes: 118 bp.
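As a rough consistency check, the four figures above can be combined. The sketch assumes every ORF has the average length and every gap the average interval, which is a simplification; the variable names are illustrative only.

```python
# Figures quoted above for E. coli.
genome_bp = 4_600_000
n_orfs = 4288
mean_gene_bp = 951
mean_gap_bp = 118

# Total coding length if every ORF had the average size.
coding_bp = n_orfs * mean_gene_bp
coding_fraction = coding_bp / genome_bp

# Genes plus intergenic gaps should roughly add up to the genome size.
genes_plus_gaps_bp = n_orfs * (mean_gene_bp + mean_gap_bp)

print(coding_bp, round(coding_fraction, 3), genes_plus_gaps_bp)
```

Under these assumptions, close to nine tenths of the genome is coding, and genes plus gaps account for almost all of the 4.6 × 10^6 bp.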
Features of the prokaryotic genome
The genome usually consists of double-stranded circular DNA.
Prokaryotic DNA is not packaged into a eukaryote-type chromosome. There is
no nucleus, but there is a nucleoid where the DNA is concentrated.
Genome size is typically around 10^6 to 10^7 bp.
The number of genes is comparatively small.
Structural and functional features
of the prokaryotic genome
An operon is a functional unit of genomic material containing a
cluster of genes under the control of a single regulatory signal or
promoter. The genes are transcribed together into one mRNA strand
and, in prokaryotes, translated together in the cytoplasm.
Polycistronic mRNA: a single mRNA molecule that codes for more than
one protein.
The majority of the genome is unique sequence; duplicated
sequences are rare.
rRNA genes are present in multiple copies.
Isozyme genes in the genome: E. coli has three acetolactate synthases and
two chorismate mutases.
The majority of the sequence is coding, with very little
non-coding sequence.
Regulatory sequences are present and often contain
inverted repeats.
Most genes are in an expressed state.
Plasmid DNA
A plasmid is an extrachromosomal DNA molecule, separate from the
chromosomal DNA, that is capable of replicating independently
of it. In many cases it is circular and
double-stranded. Plasmids usually occur naturally in bacteria.
Sizes commonly range from about 1.5 to 15 kb, although larger plasmids exist.
One way to classify plasmids is by function. There are 3 main classes:
Fertility (F) plasmids, which contain tra genes. They are capable of conjugation
(transfer of genetic material between bacteria that are in contact).
Resistance (R) plasmids, which contain genes that confer resistance
to antibiotics or poisons and help bacteria produce pili.
Col plasmids, which contain genes that code for (determine the production of)
bacteriocins, proteins that can kill other bacteria.
F factor
Conjugation: transfer of genetic material between bacteria that are in contact.
Transposable elements
Transposable elements: genetic material that can move
independently within the genome.
They can change genome structure and gene sequences.
Types of transposable elements:
Insertion sequence
Transposon
Transposable bacteriophage
Insertion sequence
An insertion sequence (IS) is a short DNA sequence that acts as a simple transposable
element. IS elements have two major characteristics:
They are small, generally around 700 to 2500 bp in length.
They code only for proteins implicated in transposition. These are usually
the transposase, which catalyses the enzymatic reaction allowing the IS to move,
and sometimes one regulatory protein that either stimulates or inhibits
transposition activity. The coding region of an IS is usually flanked by inverted
repeats.
The frequency of transposition is about 10^-7.
Transposon
Transposons are sequences of DNA that can move to different positions within
the genome of a single cell, a process called transposition. Bacterial transposons
carry a transposase gene and accessory genes such as antibiotic resistance genes.
In the process, they can cause mutations and change the amount of DNA in the genome.
Transposons were once also called jumping genes, and are examples of mobile genetic
elements.
They were discovered by Barbara McClintock early in her career, for which she was
awarded a Nobel Prize in 1983.
Genetic effects of transposable elements
1 In replicative transposition the element does not leave its original
site; a new copy is made and inserted elsewhere.
2 On insertion, the target sequence is duplicated, and the two
copies lie on either side of the transposable element as
direct repeats.
3 A co-integrate can form during transposition.
4 Chromosomal aberrations may occur.
5 Transposable elements can be excised from their original
location.
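Point 2, the duplication of the target site, can be pictured with plain string operations. This is a minimal sketch: the host sequence, the placeholder element "<TE>", the 5 bp target length and the function name are all hypothetical choices made for illustration.

```python
def insert_with_target_duplication(host, pos, target_len, element):
    """Insert `element` so that the target site host[pos:pos+target_len]
    ends up as a direct repeat on both flanks of the element."""
    target = host[pos:pos + target_len]
    left = host[:pos + target_len]   # ends with the target site
    right = host[pos:]               # starts with the target site again
    return left + element + right, target

host = "AAACGTACTTT"   # hypothetical host sequence
element = "<TE>"       # placeholder standing in for the transposable element
new_seq, target = insert_with_target_duplication(host, 3, 5, element)
print(target)
print(new_seq)
```

The result is longer than the host by the length of the element plus one extra copy of the target site, which is the direct repeat that marks an insertion.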
Transposons are mutagens. They can damage the genome of their host cell in
different ways:
A transposon that inserts itself into a functional gene will most likely disable
that gene.
After a transposon leaves a gene, the resulting gap will probably not be
repaired correctly.
Multiple copies of the same sequence, such as Alu sequences, can hinder
precise chromosomal pairing during mitosis and meiosis, resulting in unequal
crossovers, one of the main reasons for chromosome duplication.
Diseases that can be caused by transposons include hemophilia A and B,
severe combined immunodeficiency, porphyria, predisposition to cancer, and
Duchenne muscular dystrophy.
Additionally, many transposons contain promoters which drive transcription of
their own transposase. These promoters can cause aberrant expression of
linked genes, causing disease or mutant phenotypes.
Significance of bacterial genomics
research
To shed more light on the characteristics of pathogenic microorganisms
and their pathogenic mechanisms.
To provide more convenient tools for the discovery of disease-causing
genes.
To reveal more pathogen-specific sequences and improve the
accuracy of pathogen identification.
To provide a basis for the discovery of vaccines and the screening of drugs.
Section 3 Eukaryotic genomes
Most eukaryotes are multicellular organisms that undergo
complex differentiation and development.
Eukaryotes have more genes and more complex regulatory
mechanisms than prokaryotes.
Eukaryotes have a nucleus, in which the genomic DNA
binds histone proteins to form chromatin.
Mitochondria and chloroplasts of eukaryotes also have
their own genetic material.
There are about 280 eukaryotic genome
projects, of which 19 have been completed:
3 plants, 9 fungi, 3
protozoa, Caenorhabditis elegans,
Drosophila, mouse and human.
The structural characteristics of
eukaryotic genomes
Linear double-stranded DNA; each species has a fixed number
of chromosomes.
Eukaryotic cells are generally diploid.
Yeast has both haploid and diploid states.
Haploidy and polyploidy are also widespread among eukaryotic species.
The structure of eukaryotic genomes is complex, and the number of
genes is large.
The human genome is on the order of 1000 times larger than
that of E. coli.
Humans have on the order of 10 times as many genes as
E. coli.
An mRNA molecule is said to be monocistronic when it contains the genetic
information to translate only a single protein.
A polycistronic mRNA carries the information of several genes, which are
translated into several proteins. These proteins usually have a related function
and are grouped and regulated together in an operon.
Eukaryotic mRNAs are monocistronic; rRNA and tRNA genes can be transcribed as polycistronic precursors.
There are no operons, and functionally related genes are often scattered across
different parts of the genome.
The α-globin genes are located on chromosome 16.
The β-globin genes are located on chromosome 11.
The vast majority of the genome is non-coding sequence, which serves
structural and regulatory roles; coding sequences make up less than 10%. The
human genome is about 3 × 10^9 bp with only about 3 × 10^4 genes, an
average of 10^5 bp of genome per gene.
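The arithmetic behind these figures, together with the comparison to E. coli made earlier, can be written out as a short sketch. It uses only numbers quoted in these notes (4.6 × 10^6 bp and 4288 ORFs for E. coli); the variable names are illustrative.

```python
# Figures quoted in these notes.
human_genome_bp = 3e9
human_genes = 3e4
ecoli_genome_bp = 4.6e6
ecoli_orfs = 4288

bp_per_gene = human_genome_bp / human_genes      # genome length per gene
size_ratio = human_genome_bp / ecoli_genome_bp   # human vs E. coli genome size
gene_ratio = human_genes / ecoli_orfs            # human vs E. coli gene number

print(bp_per_gene, round(size_ratio), round(gene_ratio, 1))
```

The genome is several hundred times larger while the gene count grows only several-fold, which is the quantitative sense in which most of the human genome is non-coding.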
Contains a large number of repetitive sequences.
Highly repetitive sequences: 10^5 or more copies.
Moderately repetitive sequences: 10 to 10^4 copies.
Single-copy sequences: fewer than 10 copies.
Eukaryotic genes are split genes.
An intron is a DNA region within a gene that is not translated into protein:
a non-coding sequence within the gene.
An exon is a region retained in the mature RNA; the term can refer to the sequence in the DNA or in its RNA transcript.
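Splicing of a split gene can be sketched as joining exon segments and discarding the intron between them. The transcript and coordinates below are hypothetical; the intron is written to begin with GU and end with AG, as most eukaryotic introns do.

```python
def splice(pre_mrna, exons):
    """Join exon segments given as (start, end) half-open coordinates."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: exon 1 + intron + exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)
print(mature)
```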
A gene family is a set of genes with known homology. The members are generally
biochemically similar.
Globin gene family (α, β, γ, δ, ε, ζ).
Superfamily: member genes share structural homology but differ in function.
A gene cluster is a set of two or more genes that encode
the same or similar products. Because populations from a
common ancestor tend to possess the same varieties of gene
clusters, they are useful for tracing recent evolutionary
history.
Histone gene cluster: 5 kinds of genes clustered in tandem, present
in multiple copies.
α-globin gene cluster
An example of a gene cluster is the human α-globin gene cluster,
which contains 3 functional genes and 3 non-functional genes for
similar proteins.
α1, α2: duplicated α genes, expressed in the adult.
ζ: embryonic gene. ψζ, ψα1, ψα2: pseudogenes, about 75% homologous
to α, which have accumulated many mutations and cannot be expressed.
β-globin gene cluster
ε: expressed in the early embryonic stage.
γ: expressed in the fetal stage.
δ: expressed from birth at an extremely low level.
β: the main form expressed in the adult.
ψβ, ψβ1: pseudogenes.
Eukaryotic genomes are highly variable.
During meiosis, homologous chromosomes pair and
exchange segments.
Eukaryotic genomes also contain mobile genetic elements.
Transposons
The human genome contains a large number of transposons,
most of which have been inactivated by mutation.
The structure of eukaryotic genomes
Features of the human genome
Genes
There are an estimated 20,000 to 25,000 human protein-coding genes.
The number of human genes seems to be less than a factor of two greater than that
of many much simpler organisms, such as the roundworm and the fruit fly.
However, human cells make extensive use of alternative splicing to produce several different
proteins from a single gene, and the human proteome is thought to be much larger
than those of the aforementioned organisms.
Besides, most human genes have multiple exons, and human introns are frequently
much longer than the flanking exons.
Human genes are distributed unevenly across the chromosomes. Each chromosome
contains various gene-rich and gene-poor regions, which seem to be correlated with
chromosome bands and GC content. The significance of these nonrandom patterns
of gene density is not well understood.
In addition to protein-coding genes, the
human genome contains thousands of RNA genes, including tRNA, rRNA, microRNA,
and other non-coding RNA genes.
The composition of the human genome
Known coding sequence makes up only about 1.5%. The rest consists largely of
spacer sequences between genes,
intervening sequences within genes, and repetitive sequences.
Coding sequences: encode proteins and a variety of RNAs.
Non-coding sequences include:
Regulatory sequences: promoters, enhancers and so on.
Introns: these can also contain regulatory sequences.
Spacer sequences: the junction regions between genes.
Repetitive sequences.
The repetitive sequences of the human genome
Inverted repeat sequences.
Tandem repeat sequences: satellite, minisatellite and
microsatellite DNA.
Gene clusters: histone, rRNA and tRNA genes and so on.
Interspersed repeat sequences: Alu family, Kpn family and so on.
Single-copy sequences: gene coding sequences and spacer
sequences.
Satellite DNA consists of highly repetitive DNA, and is so called
because repetitions of a short DNA sequence tend to produce a
different frequency of the nucleotides adenine, cytosine, guanine
and thymine, and thus have a different density from bulk DNA,
such that they form a second or 'satellite' band when genomic DNA
is separated on a density gradient.
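The shift in base composition is easy to see numerically. In this sketch both sequences are hypothetical: a 5 bp unit repeated in tandem stands in for a satellite, and an evenly mixed sequence stands in for bulk DNA. The function name is illustrative.

```python
from collections import Counter

def base_fractions(seq):
    """Fraction of each of A, C, G, T in seq."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}

satellite = "GGAAT" * 20   # hypothetical tandem repeat
bulk = "ACGT" * 25         # hypothetical stand-in for bulk DNA

print(base_fractions(satellite))
print(base_fractions(bulk))
```

Because the repeat unit fixes the composition of the whole array, a long tandem array can differ markedly from the genome average, which is what gives it a distinct buoyant density.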
Type | Size of repeat unit (bp) | Location
α (alphoid DNA) | 171 | All chromosomes
β | 68 | Centromeres of chromosomes 1, 9, 13, 14, 15, 21, 22 and Y
Satellite 1 | 25-48 | Centromeres and other regions in heterochromatin of most chromosomes
Satellite 2 | 5 | Most chromosomes
Satellite 3 | 5 | Most chromosomes
A minisatellite is a section of DNA that consists of a short series of bases, 10-60 bp,
repeated in tandem. These occur at more than 1000 locations in the human genome. Some
minisatellites contain a central (or "core") sequence of letters "GGGCAGGANG"
(where N can be any base) or more generally a strand bias with purines (adenine
(A) and guanine (G)) on one strand and pyrimidines (cytosine (C) and thymine (T))
on the other. It has been proposed that this sequence per se encourages
chromosomes to swap DNA. In alternative models, it is the presence of a
neighbouring cis-acting meiotic double-strand break hotspot which is the primary
cause of minisatellite repeat copy number variations. Somatic changes are suggested
to result from replication difficulties (which might include replication slippage, among
other phenomena).
Microsatellites, also called simple sequence repeats (SSRs) or short tandem
repeats (STRs), are repeating sequences of 1-6 base pairs of DNA.
Microsatellites are typically neutral and co-dominant. They are used
as molecular markers in genetics, for kinship, population and other
studies. They can also be used to study gene duplication or deletion.
1. Inverted repeat sequence
Also known as inverted repeats.
The two ends are inverted repeats of each other and can form a hairpin structure.
Without an intervening sequence: GGTACC
With an intervening sequence: GGTNNN...NNNACC
Inverted repeats make up about 5% of the human genome; most are
single-copy and dispersed throughout the genome.
Commonly found in protein-binding regions and transcriptional
regulatory regions.
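Whether two ends form an inverted repeat can be checked by comparing one arm with the reverse complement of the other. The sketch below uses the two example sequences given above; the function names and the choice of a 3-base arm are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def has_inverted_repeat_arms(seq, arm_len):
    """True if the first arm_len bases equal the reverse complement
    of the last arm_len bases; anything in between is ignored."""
    return seq[:arm_len] == reverse_complement(seq[-arm_len:])

print(has_inverted_repeat_arms("GGTACC", 3))          # no intervening sequence
print(has_inverted_repeat_arms("GGTNNNNNNACC", 3))    # with intervening sequence
print(has_inverted_repeat_arms("GGTAAA", 3))          # arms do not match
```

When a single strand carrying such arms folds back on itself, the two arms can base-pair, which is the hairpin mentioned above.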
2. Tandem repeat sequence
A tandem repeat is formed by a fixed repeat unit joined
head to tail many times.
Tandem repeats account for about 10% of the genome.
They were discovered when fragmented genomic DNA was separated by
density gradient centrifugation, hence the name satellite DNA.
Histone genes, rRNA genes and similar genes are also tandemly
repeated.
Satellite DNA
The repeat number is very high, up to several million. Each repeat cluster
contains thousands of repeat units.
By sequence features it can be divided into types I, II, III, IV, α and β. Each type
includes different families with different core sequences.
In situ hybridization shows that satellite DNA of every group is located mainly in
heterochromatin, especially at the centromere, and is rarely chromosome-specific.
Satellites II and III are found on almost all chromosomes.
Some satellite DNA is chromosome-specific and region-specific.
β satellite is found in the centromeric region of the Y chromosome and some other chromosomes.
α satellite is found in the centromeric region of all chromosomes.
Minisatellite DNA
Variable number of tandem repeats (VNTR):
Repeat units of 6-70 bp, clustered in tandem and repeated a few to dozens of times;
the repeat number is highly variable among individuals.
Telomere:
Located at the ends of chromosomes, with a protective function.
Consists of TTAGGG repeats, often repeated
thousands of times.
Microsatellite DNA
Also known as short tandem repeats (STR).
Tandem repeats of 1-4 bp units.
2 bp repeats are the most common, usually (AC)n
or (TG)n.
The unit is repeated 10-60 times.
When n is greater than 14, the repeat number is
highly variable among individuals.
STRs are very widely distributed in the genome.
They make up about 5% of the genome, with one STR on average every 30-50 kb.
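Counting the repeat number n of an (AC)n microsatellite is a simple pattern search. The two alleles below are hypothetical, with invented flanking sequence, and the function name is illustrative.

```python
import re

def longest_repeat_run(seq, motif):
    """Largest n such that motif repeated n times occurs contiguously in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two hypothetical alleles differing only in repeat number.
allele_a = "GGT" + "AC" * 12 + "TTG"
allele_b = "GGT" + "AC" * 17 + "TTG"

print(longest_repeat_run(allele_a, "AC"))
print(longest_repeat_run(allele_b, "AC"))
```

Because alleles differ in n while the flanking sequence stays the same, the repeat count at a set of STR loci can distinguish individuals.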
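3. Interspersed repeat sequence
Dispersed rather than clustered, scattered throughout the genome.
Account for about 20% of the genome; they include some repeated genes,
but most are non-coding sequences.
Most interspersed repeats are retrotransposons, with terminal
repeats that are not LTRs.
In mammals there are two broad families, distinguished by length:
SINEs: short interspersed nuclear elements
LINEs: long interspersed nuclear elements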
SINEs (short interspersed nuclear elements)
In the human genome, the most common SINE is the Alu family.
It is the most abundant moderately repetitive sequence in the human genome,
with 700,000 to 1,000,000 Alu sites.
On average there is one every 5 kb, making up about 10% of the genome.
It is highly species-specific and serves as a marker of the human genome.
It is cleaved by AluI into two fragments of 130 bp and 170 bp,
hence the name.
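The Alu figures can be cross-checked against each other. This sketch assumes a genome of 3 × 10^9 bp (the figure used earlier in these notes), a copy number of 700,000 to 1,000,000, and an element length of about 300 bp taken as the sum of the two AluI fragments; real Alu copies vary in length.

```python
genome_bp = 3e9
alu_len_bp = 130 + 170   # assumed element length from the two AluI fragments
spacing_bp = 5_000       # one Alu every 5 kb on average

# Number of sites implied by the average spacing.
sites_from_spacing = genome_bp / spacing_bp

# Genome fraction implied by each end of the quoted copy-number range.
fractions = {copies: copies * alu_len_bp / genome_bp
             for copies in (700_000, 1_000_000)}

print(sites_from_spacing)
print(fractions)
```

A spacing of 5 kb gives a site count of the same order as the quoted copy number, and the upper end of the range reproduces the figure of about 10% of the genome.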
Alu family
Alu contains a region homologous to 7SL RNA; it can be transcribed and may
take part in regulating translation,
protein transport and so on.
Alu is a non-autonomous retrotransposon: it cannot transpose
on its own.
It has terminal direct repeats but does not
encode transposition-related genes.
LINEs (long interspersed nuclear elements)
In the human genome, the most common LINE is the L1 element.
There are about 50,000 copies, accounting for about 15% of the
genome.
It is an autonomous retrotransposon that encodes
transposition-related genes.
L1 has many members; in humans it is called the L1Hs/Kpn
family.
The full length is 6.4 kb, but many copies carry deletions.
It is cut by KpnI into four fragments, hence the
name.
The composition of the human genome
Schematic diagram of the human genome
(2) Mitochondrial DNA
A double-stranded circular molecule 16,569 bp long.
It encodes 2 rRNAs, 22 tRNAs and 13 polypeptides involved in oxidative
phosphorylation.
Features:
Maternal inheritance.
Genetic heterogeneity (heteroplasmy).
Threshold effect: mutations produce an effect only once they have
accumulated to a certain proportion.
Genes are tightly packed and sensitive to mutagenic factors.
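The threshold effect can be expressed as a comparison between the mutant fraction of a cell's mtDNA copies and a cut-off. The cut-off of 0.8 and the copy numbers below are hypothetical values chosen for illustration; real thresholds depend on the mutation and the tissue.

```python
def mutant_fraction(mutant_copies, total_copies):
    """Proportion of mtDNA copies in a cell that carry the mutation."""
    return mutant_copies / total_copies

def shows_effect(mutant_copies, total_copies, threshold=0.8):
    """True once the mutant proportion reaches the (hypothetical) threshold."""
    return mutant_fraction(mutant_copies, total_copies) >= threshold

print(shows_effect(300, 1000))   # below the assumed threshold
print(shows_effect(900, 1000))   # above the assumed threshold
```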
(3) DNA polymorphism
The occurrence of multiple alleles at a particular genomic locus.
Site polymorphism:
Single nucleotide polymorphism (SNP), caused by differences in base
composition.
Restriction fragment length polymorphism (RFLP).
Tandem repeat polymorphism: the basis of DNA fingerprinting.
Mitochondrial DNA polymorphism: a clue to human origins.
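An RFLP arises when a single-base difference creates or destroys a restriction site, so the two alleles give different fragment lengths. The sketch below uses the EcoRI site GAATTC, cut after the first G; the two alleles are hypothetical sequences that differ at one base, and the function name is illustrative.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of `site`
    and return the fragment lengths in order."""
    cuts, start = [], 0
    while (i := seq.find(site, start)) != -1:
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Hypothetical alleles: a single A-to-G change removes the second site.
allele_1 = "A" * 10 + "GAATTC" + "C" * 14 + "GAATTC" + "T" * 10
allele_2 = "A" * 10 + "GAATTC" + "C" * 14 + "GAGTTC" + "T" * 10

print(digest(allele_1))
print(digest(allele_2))
```

The allele that has lost the site yields one long fragment where the other yields two shorter ones, and that length difference is what is detected on a gel.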
Interactions between susceptibility genes and the environment
HIV and the receptor CCR5
ApoE4 and Alzheimer's disease (AD)
α-synuclein (Asyn) gene duplication and Parkinson's disease (PD)