Alternative splicing

Lecture of Principles of gene engineering 2008.5.5
ATGGTGAAACTGGCGTTTCCGCGTGAACTGCGTCTGCTGACCCCGAGCCAGTTTACCTTTGTGTTTCAGCAGCCGCAGCGTGCGGGCACCCCGCAGA
TTACCATTCTGGGCCGTCTGAACAGCCTGGGCCATCCGCGTATTGGCCTGACCGTGGCGAAAAAAAACGTGCGTCGTGCGCATGAACGTAACCGTA
TTAAACGTCTGACCCGTGAAAGCTTTCGTCTGCGTCAGCATGAACTGCCGGCGATGGATTTTGTGGTGGTGGCGAAAAAAGGCGTGGCGGATCTGG
ATAACCGTGCGCTGAGCGAAGCGCTGGAAAAACTGTGGCGTCGTCATTGCCGTCTGGCGCGTGGCAGCATGGTGAAACTGGCGTTTCCGCGTGAAC
TGCGTCTGCTGACCCCGAAACATTTTAACTTTGTGTTTCAGCAGCCGCAGCGTGCGAGCAGCCCGGAAGTGACCATTCTGGGCCGTCAGAACGAACT
GGGCCATCCGCGTATTGGCCTGACCATTGCGAAAAAAAACGTGAAACGTGCGCATGAACGTAACCGTATTAAACGTCTGGCGCGTGAATATTTTCG
TCTGCATCAGCATCAGCTGCCGGCGATGGATTTTGTGGTGCTGGTGCGTAAAGGCGTGGCGGAACTGGATAACCATCAGCTGACCGAAGTGCTGGG
CAAACTGTGGCGTCGTCATTGCCGTCTGGCGCAGAAAAGCATGCTGAAAGTGGTGAAAGTGTATCTGCATAACCATAACAGCCAGTTTCTGGTGGT
GAAACTGAACTTTAGCCGTGAACTGCGTCTGCTGACCCCGATTCAGTTTAAAAACGTGTTTGAACAGCCGTTTCGTGCGAGCACCCCGGAAATTACC
ATTCTGGCGCGTAAAAACAACCTGGAACATCCGCGTCTGGGCCTGACCGTGGCGAAAAAACATCTGAAACGTGCGCATGAACGTAACCGTATTAAA
CGTCTGGTGCGTGAAAGCTTTCGTCTGAGCCAGCATCGTCTGCCGGCGTATGATTTTGTGTTTGTGGCGAAAAACGGCATTGGCAAACTGGATAACA
ACACCTTTGCGCAGATTCTGGAAAAACTGTGGCAGCGTCATATTCGTCTGGCGCAGAAAAGCATGAGCCAGGATTTTAGCCGTGAAAAACGTCTGC
TGACCCCGCGTCATTTTAAAGCGGTGTTTGATAGCCCGACCGGCAAAGTGCCGGGCAAAAACCTGCTGATTCTGGCGCGTGAAAACGGCCTGGATC
ATCCGCGTCTGGGCCTGGTGATTGGCAAAAAAAGCGTGAAACTGGCGGTGCAGCGTAACCGTCTGAAACGTCTGATGCGTGATAGCTTTCGTCTGA
ACCAGCAGCTGCTGGCGGGCCTGGATATTGTGATTGTGGCGCGTAAAGGCCTGGGCGAAATTGAAAACCCGGAACTGCATCAGCATTTTGGCAAAC
TGTGGAAACGTCTGGCGCGTAGCCGTCCGACCCCGGCGGTGACCGCGAACAGCGCGGGCGTGGATAGCCAGGATGCGATGCTGAACTATTTTTTTA
AAAAAAAAAGCAAACTGCTGAAAAGCACCAACTTTCAGTATGTGTTTAGCAACCCGTGCAACAAAAACACCTTTCATATTAACATTCTGGGCCGTA
GCAACCTGCTGGGCCATCCGCGTCTGGGCCTGAGCATTAGCCGTAAAAACATTAAACATGCGTATCGTCGTAACAAAATTAAACGTCTGATTCGTGA
AACCTTTCGTCTGCTGCAGCATCGTCTGATTAGCATGGATTTTGTGGTGATTGCGAAAAAAAACATTGTGTATCTGAACAACAAAAAAATTGTGAAC
ATTCTGGAATATATTTGGAGCAACTATCAGCGTATGGAAAAAGGCTTTAGCGTGGGCTGGCGTATTCGTACCACCGCGGAATTTCGTCGTATTTATG
CGGCGCGTCAGCGTATTATTGGCCGTTATTATCTGCTGTATTATCGTGAAAACGAAATTAAACATAGCCGTCTGGGCGTGGTGGCGAGCAAACGTAA
CGTGCGTAAAGCGGTGTGGCGTAACCGTGTGCGTCGTGTGGTGAAAGAAGCGTTTCGTATTCGTAAAAAAGATCTGCCGGCGTTTGATATTGTGGTG
GTGGCGAAAGCGAGCAGCGTGGAAGCGGATAACAAAGAACTGTATGAATGCATTAACAAACTGTTTACCCAGCTGGAAAAACAGAGCAAACGTAG
CAGCAGCGTGATGCTGCCGACCGAAAACCGTCTGCGTCGTCGTGAAGATTTTGCGACCGCGGTGCGTCGTGGCCGTCGTGCGGGCCGTCCGCTGCTG
GTGGTGCATCGTCTGAGCGGCGCGACCGATCCGCATGCGCCGGGCGAAAGCGCGCCGCCGACCCGTGCGGGCTTTGTGGTGAGCAAAGCGGTGGGC
GGCGCGGTGGTGCGTAACCAGGTGAAACGTCGTCTGCGTCATCTGGTGTGCGATCGTCTGAGCGCGCTGCCGCCGGGCAGCCTGGTGGTGGTGCGT
GCGCTGCCGGGCGCGGGCGATGCGGATCATGCGCAGCTGGCGCGTGATCTGGATGCGGCGCTGCAGCGTCTGCTGGGCGGCGGCACCCGTATGCTG
CCGACCGAAAACCGTCTGCGTCGTCGTGAAGATTTTGCGACCGCGGTGCGTCGTGGCCGTCGTGTGGGCCGTAGCACCCTGGTGGTGCATCTGCGTA
GCGGCGCGACCGATCCGCATGCGCCGGGCGAAAGCGCGCCGCGTACCCGTGCGGGCTTTGTGGTGAGCAAAGCGGTGGGCGTGGCGGTGGTGCGT
AACAAAGTGAAACGTCGTCTGCGTCATCTGATGCGTGATCGTATTGATCTGCTGCCGCCGGGCAGCCTGGTGGTGGTGCGTGCGCTGCCGGGCGCG
GGCGATGCGGATCATGCGCAGCTGGCGCGTGATCTGGATGCGGCGCTGGCGCGTCTGCTGGGCGGCGGCGCGCGTATGCTGCCGCGTGATCGTCGT
T
Dr. Jin-Mei Lai
bio2028@mails.fju.edu.tw
1
Various methods can be employed to identify and locate
the genes that reside within the genome.
1. Sequence inspection
2. cDNA comparison
Genes that code for proteins comprise “ORF”
initiation codon (ATG) ……………………………………………..termination codon (TAA, TAG, or TGA)
* The average ORF length
E. coli : 317 codons
Yeast : 483 codons
The search for a gene can be
thought of as a scan for an initiation
and a termination codon that are
separated by at least 100 codons.
2
Finding genes in prokaryotes is easy.
--- Just translate the DNA sequence in all 6 reading
frames. The ORFs (regions starting with ATG and
ending in an in-frame stop codon) will be at least 300
bases in length, while random reading frames will be
dotted with stop codons at the rate of about 3 stop
codons every 64 codons.
XXXXXATG…..(3X)…….TGAXXXXX
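The six-frame scan described above can be sketched in a few lines of Python. This is a minimal illustration, not a real gene finder: the function names (find_orfs, reverse_complement) and the min_codons parameter are assumptions made for this example, and the scan simply takes the first ATG in a frame and runs to the next in-frame stop. Since 3 of the 64 codons are stops, a random frame hits a stop about every 21 codons on average, which is why a long stop-free run is informative.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Scan all 6 reading frames for ATG ... in-frame stop.

    Returns tuples (strand, frame, start, end, n_codons). start and end are
    0-based positions on the scanned strand; end is exclusive and includes
    the stop codon. n_codons counts the ATG and every codon before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first initiation codon in this frame
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Usage: find_orfs(genome_sequence) keeps only ORFs of at least 100 codons (300 bases), matching the threshold on the slide; a smaller min_codons is useful only for toy sequences.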
3
Finding genes in eukaryotes is harder.
4
ORF; coding sequence
* exon-intron junction:
GT-AG rule (GU-AG in mRNA)
Exon/GU-intron-AG/exon
5’-AG/GUAAGU-intron-YNCURAC-YnNAG/G-3’
Y is either pyrimidine (C or U), Yn denotes a string of about nine pyrimidines, R is either purine (A or G),
the A in YNCURAC is the branch-point A that participates in forming a branched splicing intermediate, and N is any base.
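As a small illustration of the GT-AG rule, the sketch below removes introns from a genomic (DNA) sequence and refuses any intron that does not start with GT and end with AG. The function name splice and the coordinate convention (0-based, end-exclusive) are assumptions for this example; real splice-site prediction also weighs the longer consensus, the branch point and the pyrimidine tract.

```python
def splice(genomic, introns):
    """Join exons after removing introns given as (start, end) pairs.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    Each intron must obey the GT-AG rule at the DNA level.
    """
    genomic = genomic.upper()
    exons = []
    pos = 0
    for start, end in sorted(introns):
        intron = genomic[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} violates the GT-AG rule")
        exons.append(genomic[pos:start])  # exon upstream of this intron
        pos = end
    exons.append(genomic[pos:])  # last exon
    return "".join(exons)
```

Usage: splice("ATGAAGGTAAGTTTTCAGGCCTAA", [(6, 18)]) joins the two exons flanking the 12-base intron.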
5
Splicing mechanism, spliceosome
6
Splicing mechanism
7
Performs 2 main functions
recognition of intron/exon
boundaries
remove introns/join exons.
Made of 5 small nuclear
ribonucleoproteins (snRNPs).
Each snRNP is composed of
a single U-rich small nuclear
RNA and multiple proteins.
8
Several computer programs are available for the
identification of ORFs.
(species specific)
~ not only on the basis of initiator and terminator codons,
but also codon bias, intron-exon boundaries and
transcriptional control elements (e.g. the TATA box)
However, the sequences can be quite variable!!
Alternative approach:
* Use previously identified genes as a guide: try to
assign similarity (homology) to any existing genes.
Exceptions:
pseudogenes (generally non-transcribed genomic DNA
with a high degree of sequence similarity to a real gene)
9

* What is codon bias?
Codon bias is the probability that a given codon will
be used to code for an amino acid over a different
codon which codes for the same amino acid.
Ex.
~ genes that are always
expressed at a high rate
should have a different
codon bias than those
genes that are always
expressed at a low rate.
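Codon bias can be made concrete by counting how often each synonymous codon is used in a coding sequence. The sketch below is only an illustration: the function names and the choice of leucine in the usage note are assumptions for this example, not a published codon-bias index.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_usage(cds, synonymous):
    """Fraction of each codon within one set of synonymous codons."""
    counts = codon_counts(cds)
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {c: 0.0 for c in synonymous}
    return {c: counts[c] / total for c in synonymous}
```

Usage: relative_usage(cds, ["CTG", "CTT", "CTC", "CTA", "TTA", "TTG"]) gives the share of each of the six leucine codons; comparing these shares between highly and weakly expressed genes is one way to see the difference in bias.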
10
Definition of Homolog, Ortholog and Paralog
Homolog
A gene related to a second gene by descent from a common ancestral DNA
sequence. The term, homolog, may apply to the relationship between genes
separated by the event of speciation (see ortholog) or to the relationship between
genes separated by the event of genetic duplication (see paralog).
Ortholog
Orthologs are genes in different species that evolved from a common ancestral
gene by speciation. Normally, orthologs retain the same function in the course of
evolution. Identification of orthologs is critical for reliable prediction of gene
function in newly sequenced genomes.
Paralog
Paralogs are genes related by duplication within a genome. Orthologs retain the
same function in the course of evolution, whereas paralogs evolve new functions,
even if these are related to the original one.
11
The transcriptional control elements

Bacteria have one RNA polymerase that transcribes all of their
genes. (holoenzyme: α2ββ’ωσ)

RNA polymerase binds to the promoter (defined as the region
of DNA recognized by the polymerase; promoters have a similar
nucleotide composition).
12
Eukaryotes have multiple RNA polymerases, which are specialized
for specific gene families.
Like bacterial polymerase,
eukaryotic polymerases
bind to promoters.

These promoters are much more complex and are composed
of an RNA polymerase binding site and DNA binding sites for
regulatory proteins (operator terminology is not used for
eukaryotes; instead they have transcription elements or “boxes”).

Transcription factors bind to transcription elements
13
TATA BOX & Initiator Regions
TATA Box and Initiator region help RNA polymerase
start transcription at correct site

Model eukaryote promoter
TE = Transcription element
[Figure: multiple upstream elements (TE4, TE3, TE2, TE1), followed by the TATA box and the initiator region at the mRNA start site, then the GENE]
14
RNA transcript cleavage and
poly-adenylation are directed by
a polyA signal within the RNA.
15
Improved Definition of PAS (PolyA Site) signals.
How does the polyA machinery tell a true
cleavage site from a random AATAAA?
What other signals help dictate use of specific
sites in certain conditions?
Upstream Sequence Element: « enhancer element », mostly found in viral sequences
Downstream Sequence Element: « constitutive », poorly defined, mutations tolerated
16
2. cDNA comparison
~ The simplest way to identify a gene within a segment of
genomic DNA is to compare the sequence to a copy of the
corresponding cDNA.
* Through the hybridization of
genomic DNA fragments to mRNA
separated on an agarose gel
(northern blotting).
* Through the comparison with
databases of sequenced cDNA
fragments. (ESTs)
17
Expressed Sequence Tags (ESTs)
~ are small pieces of cDNA sequence (usually 200 to 500
bases long) that are generated by sequencing either one or
both ends of an expressed gene. (sequence only once!)



High error rate (>1%) mainly frameshifts and
insertions/deletions
Redundant sampling of 5’ and 3’ ends
Large number in public databases
[Figure: 5’ ESTs and 3’ ESTs aligned to the ends of an mRNA; EST lengths vary due to varying polymerase activity]
60-80% of human genes are represented in dbEST (human)
18
Expressed Sequence Tag (EST)
Partial cDNA sequences of genes expressed
in different tissues
Tissues → mRNA → cDNA → EST (5’ partial sequencing, 3’ partial sequencing)
19
EST Data are Fragmented, but there are lots of them!
[Figure: ESTs aligned to an mRNA and to the exons of the genome, with introns between the exons]
* Database of ESTs continues to grow rapidly; a database of all
genes and/or all gene transcripts does not yet exist.
* ESTs do not have to be checked for sequencing errors as
mistakes do not prevent identification of the gene from which
the EST was derived using similarity searches.
* ESTs database (http://www.ncbi.nlm.nih.gov/dbEST )
(contains over 12 million sequences from different organisms
including 4.5 million human sequences)
20
Expressed Sequence Tags (ESTs)
Why EST sequencing?
Systematic sampling of the transcribed portion of the genome
(“transcriptome”)
Provides “sequence tags” allowing unique identification of genes
(e.g. for SAGE)
Provides experimental evidence for the positions of exons.
Provides regions coding for potentially new proteins.
Provides clones for DNA microarrays.
Deposits readable part of sequence in database by sequencing
various cDNA libraries (>2,000), prepared from various tissues
and cell lines, using directional cloning.
Systematic effort to make libraries from cancerous tissue: CGAP
project (NCI).
Most cDNA libraries managed by the IMAGE consortium. But,
many tissues still not sampled and quality very uneven.
21
Strategy for gene discovery by using EST
Discrimination between coding and non-coding
sequences.
Cluster EST sequences to identify candidate
transcripts.
Assemble to increase length and reduce redundancy.
Detection of beginning and end of coding regions.
Find coding regions and reading frame, and correct
frameshift error.
Use deduced protein sequences as searchable
database (TrEST).
The ORESTES project: to obtain EST sequences
from the under-represented, often coding, central
portions of mRNAs, resulting in obtaining many novel
genes.
[Figure: This is a gene with 10 ESTs associated. The cluster size is 10.]
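The clustering step of the strategy can be illustrated with a deliberately simple sketch: two ESTs are put in the same cluster when they share at least one exact k-mer, and clusters are merged with a union-find structure. The function name cluster_ests and the value of k are assumptions for this example. Real EST clustering tools also handle reverse complements, sequencing errors and repeats, which this sketch ignores.

```python
def cluster_ests(ests, k=12):
    """Group ESTs that share at least one exact k-mer (toy single-linkage)."""
    parent = list(range(len(ests)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first EST containing it
    for idx, est in enumerate(ests):
        est = est.upper()
        for j in range(len(est) - k + 1):
            kmer = est[j:j + k]
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])  # merge clusters
            else:
                first_seen[kmer] = idx

    clusters = {}
    for idx in range(len(ests)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Usage: the length of each returned list is the cluster size, i.e. the number of ESTs associated with one candidate transcript.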
22
Importance of alternative splicing
* Does the number of genes account for the complexity
in humans?
- Humans: ~25,000 genes
- C. elegans: ~19,500 genes
- Arabidopsis: ~27,000 genes
* How common is alternative splicing?
35% -- 59% of human genes affected by alternative
splicing.
If only 2 splice variants/gene…
with ~25,000 genes: minimum of 33,750 (35% affected) to maximum of 39,750 (59% affected) unique transcripts/proteins
23
Splicing of immature mRNA
Constitutive splicing: all exons are joined together in the order in
which they occur in the heterogeneous nuclear RNA.
Alternative splicing: the production of two or more distinct mRNAs
from RNA transcripts having the same sequence, by joining different combinations of exons.
24
Discovery of Alternative Splicing
First discovered with an Immunoglobulin heavy chain gene
(D. Baltimore et al.)
Alternative splicing gives two forms of the protein
with different C-termini
mouse immunoglobulin μ
heavy chain gene
(via C-terminus)
S - signal peptide; C - constant region; V - variable region; green - membrane anchor;
red - untranslated region; yellow - end of coding region for secreted form
25
Regulation of Alternative splicing
Sex determination in Drosophila involves 3 regulatory
genes that are differentially spliced in females versus
males; 2 of them affect alternative splicing
1. Sxl (sex-lethal) - promotes alternative splicing of tra
pre-mRNA (exon 2 is skipped) and of its own pre-mRNA
(exon 3 is skipped)
2. Tra – promotes alternative splicing of dsx (last 2 exons
are excluded)
3. Dsx (double-sex) - Alternatively spliced form of dsx
needed to maintain female state
26
Alternative splicing in Drosophila maintains the female state.
Sxl and Tra are SR proteins!
Tra binds exon 4 in dsx mRNA causing it to be retained in mature mRNA.
27
Finding Potential Splice Sites using ESTs
28
After the genome sequencing projects were finished, it was realized
that only a fraction of the genes had been previously characterized.
Two methods are currently used to assign the function of a
gene based only on its sequence.
Similarity searches
~ many genes that encode proteins with the same function
in different organisms will be similar.
Experimental gene assignment
~ the phenotype of the disrupted mutant or gene knock out
can be assessed in order to attempt to identify the natural
function of the wild-type gene.
29
Similarity searches
~ usually performed using amino acid sequence.
The amino acid sequence of the galactokinase from one organism
shares similarity to the galactokinase from another organism.
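A minimal way to put a number on such similarity is percent identity between two protein sequences that have already been aligned. The sketch below assumes the alignment (with "-" for gaps) is supplied by some other tool; the function name percent_identity is an assumption for this example, and database searches in practice use scoring matrices rather than plain identity.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over aligned, non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```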
30
Experimental gene assignment
* In experimental organisms, such as E. coli or yeast, one of
the most popular ways of ascribing a function to an unknown
gene is to make a gene knockout.
~ the phenotype of the gene knockout can be assessed to
identify the natural function of the wild-type gene.
Ex. yeast gene SNU17
- shows little similarity to other proteins when compared
using database searches.
- however, a yeast strain knocked out for SNU17 shows a
slow-growth phenotype and is defective in pre-mRNA splicing,
indicating that the protein is involved in the splicing process.
Alternative approach: overproduce a protein.
31
Chapter 7.
Definitions
Mutation: a change in the nucleic acid sequence
(bases) of an organism’s genetic material (a change
in the genetic material of an organism).
Directed mutagenesis: a change in the nucleic
acid sequence (or genetic material) of an organism
at a specific predetermined location.
32
Silent mutations
Most amino acids are encoded by several different codons. For
example, if the third base in the TCT codon for serine is
changed to any one of the other three bases, serine will still be
encoded. Such mutations are said to be silent .
Missense mutations
With a missense mutation, the new nucleotide alters the codon
so as to produce an altered amino acid in the protein product.
Nonsense mutations
The mutation generates a STOP codon (TAA, TAG, or TGA).
Frameshift mutations
Insertions or deletions of one or a few nucleotides (not a multiple of three) in a coding
sequence. Usually very detrimental.
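The four categories above can be checked mechanically with the standard genetic code. The sketch below is illustrative: the names classify_substitution and is_frameshift are assumptions for this example, and a change that removes a stop codon is simply reported as missense here.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered T, C, A, G at each position; "*" = stop.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change in one codon (position is 0, 1 or 2)."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    if mutant == codon:
        return "no change"
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


def is_frameshift(indel_length):
    """An insertion/deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0
```

Usage: classify_substitution("TCT", 2, "C") reproduces the serine example from the slide, where any change of the third base is silent.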
33
In vitro mutagenesis, Why?
* We want to determine how DNA and/or encoded proteins
function in the intact entity (virus, bacterium, cell, animal
etc.)
* The most direct way to find out what a gene or protein does
is to find out what happens when it is missing or mutated.
* Study mutants that lack the gene/protein or express an altered
version of it - determine which biological processes are
altered in mutants.
34
In Vitro Mutagenesis
* At its simplest, in vitro mutagenesis allows us to
change the base sequence of a DNA segment or gene.
* Mutations can be localized or general, random or
targeted.
* Less specific methods of mutagenesis are used to analyze
regulatory regions of genes.
* More specific methods of mutagenesis are used to understand
the contribution of individual amino acids, or groups of amino
acids, to the structure and function of the target protein.
* Both methods generate mutants in vitro, without
phenotypic selection.
35
Directed Mutagenesis
Directed mutagenesis can be done using:
♥ M13 DNA (using primer extension mutagenesis)
♥ Dut-/ung- strand selection
♥ Cassette mutagenesis
♥ PCR based mutagenesis
♥ QuikChange® mutagenesis
♥ Random mutations
* Ala scanning, Charged to Ala scanning mutagenesis
* Doped cassette mutagenesis
* Error-prone PCR
36
Figure 7.1
Directed Mutagenesis Using M13
The procedure involves:
* The gene of interest is inserted into
the ds form of the M13
bacteriophage.
(M13 has an ssDNA genome and replicates via a
dsDNA intermediate.)
* The ssDNA is isolated from the M13
phage.
37
~ continue
* The ssDNA is mixed with an excess
of the synthetic oligonucleotide.
* The oligo is complementary to the
area of the cloned gene except for
the one nucleotide to be changed.
* The oligo anneals to the ssDNA.
* The oligo acts as a primer for DNA
synthesis using the M13 DNA as a
template and the enzyme Klenow
fragment of DNA polymerase I.
* T4 DNA ligase is used to ligate the
ends of the newly synthesized DNA.
* The newly synthesized M13 DNA is
transformed into E. coli.
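The design of the mutagenic oligo can be sketched as follows: take a window of the single-stranded template around the target position, substitute the desired base, and reverse-complement the window so that the oligo anneals to the template with exactly one mismatch. The function name mutagenic_oligo and the flank parameter are assumptions for this example; practical oligo design also considers length, melting temperature and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutagenic_oligo(template, pos, new_base, flank=10):
    """Oligo (5'->3') that anneals around template[pos] with one mismatch.

    template is the ssDNA strand written 5'->3'; pos is 0-based. After
    synthesis and replication, the strand equivalent to the template would
    carry new_base at pos.
    """
    template = template.upper()
    new_base = new_base.upper()
    if template[pos] == new_base:
        raise ValueError("new_base is identical to the template base")
    start = max(0, pos - flank)
    end = min(len(template), pos + flank + 1)
    mutated = template[start:pos] + new_base + template[pos + 1:end]
    return mutated.translate(COMPLEMENT)[::-1]  # reverse complement
```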
38
~ continue
Because DNA replicates semiconservatively, half the cells
should have the mutant gene.
Mutant plaques are identified by
DNA hybridization using the oligo
as probe.
However, … a number of drawbacks
1. The DNA that is to be mutated needs to be cloned into the M13
genome.
2. The efficiency of the mutagenesis procedure is quite low.
3. The newly synthesized DNA will not be methylated and will be
repaired by the mismatch repair system of the E. coli.
4. The final screening procedure is slow and cumbersome.
39
Enrichment for identifying the Mutant Plaques (I)
Phosphorothioate strand selection
~ a phosphorothioate nucleotide
contains a phosphorus-sulphur
linkage in place of a phosphorus-oxygen group
DNA containing phosphorothioate
linkages is resistant to cleavage
by certain restriction enzymes.
Mutation efficiency increases.
40
Enrichment for identifying the Mutant Plaques (II)
One strategy has been to introduce M13 vector carrying
the desired gene into an E. coli strain with 2 defective
enzymes:
A defective form of dUTPase (dut).
Cells with defective dUTPase have elevated levels of
dUTP, which is incorporated into the DNA, often
replacing dTTP.
A defective Uracil N-glycosylase (ung).
Uracil N-glycosylase is the enzyme that removes uracil
that has been incorporated into DNA during replication.
41
Enrichment for identifying the Mutant Plaques (II)
The procedure involves:
The desired gene is cloned into M13 vector.
The M13 vector with the desired gene is
transformed into E. coli strain dut-/ung-, which
produces ssDNA with some of the T replaced by U.
Anneal the mutagenic oligonucleotide and synthesize
a second strand.
Addition of T4 ligase.
The dsDNA is transformed into a wild-type E. coli
strain, which will use Uracil N-glycosylase to
remove the uracil that was incorporated into
the DNA.
Therefore the original DNA strand is degraded
and only the mutant strand remains.
In this way the number of plaques with the
mutant gene is greatly increased.
42
Cassette mutagenesis
~ relies on the presence of two restriction enzyme recognition sites
flanking the DNA that is to be mutated.
[Figure: synthetic cassette containing the desired mutations & overhanging sequences for the ligation]
1. It requires two restriction enzyme recognition sequences to flank
the DNA that is to be mutated.
2. Oligonucleotides are difficult to synthesize accurately above about
70 nts in length.
43
PCR based mutagenesis
PCR can be used to :
~ Enrich for the mutant gene
~ Avoid using M13 vector
The procedure involves:

The target gene is cloned
into an E.coli plasmid.

4 specific oligos are added
to the PCR reaction.
~ 2 primers are complementary
to the target.
~ The other primers are
complementary to the target
gene except for the
nucleotide that is targeted
for change.
Two-step PCR to introduce mutations into the
middle of an amplified DNA
fragment.
44
QuikChange® Mutagenesis
~ uses the power of PCR to introduce
mutations directly into plasmid DNA
~ eliminates the need for additional
cloning steps.
* DpnI cleaves only the methylated site:
CH3
5’ – GA TC – 3’
3’ – CT AG – 5’
CH3
~ the template plasmid, isolated from a dam+ strain, is methylated
by Dam methylase and is cleaved by DpnI.
~ the newly synthesized DNA will not be
methylated, and consequently will not
be cleaved by the restriction enzyme.
Very rapid (3~4 h) and highly
efficient (~80%) at producing
mutant DNA plasmid.
45
Creating random mutations in specific genes.
Why?
It is not always possible to know which amino acids of a
protein should be altered, or what they should be altered to.
This type of approach requires a screen which is
available for identifying protein function.
Some systematic
approaches:
Alanine scanning mutagenesis:
~ can identify a.a. side chains that are important for protein function
with the premise that the presence of Ala will not perturb the overall
structure of the protein and will only eliminate a.a. side chain
interactions.
Charged to alanine scanning mutagenesis:
~ most proteins contain a hydrophobic core with charged residues on
the outside surface of the protein, which may participate in, for
example, protein-protein interactions.
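Both scanning approaches amount to generating a systematic list of single-residue substitutions. The sketch below works on a one-letter protein sequence; the function names and the choice of D, E, K and R as the charged residues are assumptions for this example (histidine is left out here).

```python
CHARGED = set("DEKR")  # assumption: Asp, Glu, Lys, Arg; His not included


def alanine_scan(protein, only=None):
    """Return (name, mutant_sequence) for each single X -> Ala substitution.

    Residues that are already Ala are skipped. If `only` is given, just the
    residues in that set are replaced.
    """
    protein = protein.upper()
    mutants = []
    for i, residue in enumerate(protein):
        if residue == "A":
            continue
        if only is not None and residue not in only:
            continue
        name = f"{residue}{i + 1}A"  # e.g. K2A: Lys at position 2 to Ala
        mutants.append((name, protein[:i] + "A" + protein[i + 1:]))
    return mutants


def charged_to_alanine_scan(protein):
    """Alanine scan restricted to charged residues."""
    return alanine_scan(protein, only=CHARGED)
```

Each mutant in the list would then be made by directed mutagenesis and assayed separately.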
46
Two approaches are commonly used for the creation of random
mutations within genes: (1)
Doped cassette
mutagenesis:
~ like conventional cassette
mutagenesis, however, the
oligonucleotides do not
encode a unique sequence.
(are libraries of
oligonucleotides)
* An appropriate level of
doping can be controlled.
47
Two approaches are commonly used for the creation of random
mutations within genes: (2)
Error-prone PCR:
~ exploits the lack of a 3’-5’ exonuclease
proofreading activity in Taq DNA
polymerase.
The error rate of Taq DNA
polymerase can be increased
by altering a variety of the
PCR reaction conditions.
* Transitions > Transversions
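The effect of an error-prone polymerase can be imitated with a toy simulation in which every base is miscopied with some probability and transitions are chosen more often than transversions. The function name error_prone_copy, the error rate and the default transition fraction of 0.7 are illustrative assumptions, not measured values for Taq polymerase.

```python
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def error_prone_copy(seq, error_rate, transition_fraction=0.7, rng=None):
    """Copy a DNA string (A, C, G, T only), substituting bases at random.

    error_rate is the per-base substitution probability for one copy;
    transition_fraction is the share of errors that are transitions.
    Both values are hypothetical knobs for this sketch.
    """
    rng = rng or random.Random()
    copied = []
    for base in seq.upper():
        if rng.random() < error_rate:
            if rng.random() < transition_fraction:
                base = TRANSITION[base]  # purine<->purine or pyrimidine<->pyrimidine
            else:
                base = rng.choice(TRANSVERSIONS[base])  # purine<->pyrimidine
        copied.append(base)
    return "".join(copied)
```

Usage: calling error_prone_copy repeatedly on its own output mimics successive PCR cycles; passing rng=random.Random(seed) makes a run reproducible.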
48
Advantages and Disadvantages of Random Mutagenesis
What are some of the advantages of directed
mutagenesis?
Advantages of random mutagenesis:
╬ Many different mutants encoding a wide variety of
proteins are generated.
╬ Detailed information regarding function of
particular amino acids is not necessary.
Disadvantages of random mutagenesis :
╬ Many mutants have to be assayed to determine
which proteins have the desired properties.
49