DNA sequencing by the Sanger method

MB206-Jan09
Project
Samples :
Plant (A)
Objective:
Isolate 100 ESTs from Plant (A)
RNA
Extraction
cDNA
Library
Construction
ESTs
Generation
RNA Extraction
Extract RNA from sample (A) – the method
depends on the sample. Check the previous note.
Check the quality and quantity of the
RNA.
Isolate mRNA from the RNA (using kits)
Check the quality and quantity of the
mRNA.
Then?
mRNA isolation
Most eukaryotic mRNAs are polyadenylated
at their 3’ ends
5’ cap
AAAAAAAAAAn
• oligo (dT) can be bound to the poly(A) tail and used to
recover the mRNA.
Angelia 09
Check the mRNA integrity
Make sure that the mRNA is not degraded.
Methods:
Analyze the mRNAs by gel electrophoresis:
use agarose or polyacrylamide gels
Cloning the particular mRNAs
Useful especially when one is trying to clone a
particular gene rather than to make a complete cDNA
library.
Fractionate on the gel:
performed on the basis of size; mRNAs of
the sizes of interest are recovered from
agarose gels
Enrichment: carried out by hybridization
Example: clone the hormone-induced mRNAs
(subtracted cDNA library)
Types of Libraries
Genomic Library
• whole genes w/ promoters & introns (Euk.), operons
(bacteria), DNA regulatory elements…
cDNA Library
• mRNA transcript only w/ 5’ & 3’ untranslated regions
(UTRs), no introns, tissue specific.
(5’UTR)
(3’UTR)
cDNA Libraries
cDNA library
Genomic DNA
mRNA
polyA
Reverse
transcribe
cDNA
(and more)
polyA
polyA
Genomic DNA library
Clone in vector
Genomic DNA
Digest
DNA fragments
Choosing a Vector
Usually you select a vector (plasmid, λ, other)
depending on how big you want your DNA
fragments to be & the capacity of the vector.
Lambda Library
Lodish, et al. Fig 7-12
Plasmid Library
Lodish, et al. Fig 7-1
Plasmid !!!
mRNA isolation, purification
Check the RNA integrity
Synthesis of cDNA
Treatment of cDNA ends
Ligation to vector
cDNA libraries
1. No cDNA library
was made from
prokaryotic mRNA.
• Prokaryotic mRNA is very unstable
• Genomic libraries of prokaryotes
are easier to make and contain all
the genome sequences.
cDNA libraries
2. cDNA libraries are very
useful for eukaryotic
gene analysis
• Condensed libraries of protein-encoding genes, with
much less junk sequence.
• cDNAs have no introns, so genes can be expressed in E.
coli directly
• Are very useful to identify new genes
• Tissue or cell type specific (differential expression of
genes)
Synthesis of cDNA :
First strand synthesis:
materials are reverse
transcriptase, a primer (oligo(dT) or
random hexanucleotides) and dNTPs
(Fig 1.1)
Second strand synthesis:
best way of making full-length cDNA is to
‘tail’ the 3’-end of the first strand and then
use a complementary primer
to make the second.
Figure (cDNA synthesis), steps shown:
1. An oligo(dT) primer (HO-TTTTTp-5’) anneals to the poly(A) tail of the mRNA.
2. Reverse transcriptase and the four dNTPs synthesize the first cDNA strand.
3. Terminal transferase and dCTP add an oligo(dC) tail to the 3’ end of the cDNA.
4. Alkali hydrolyzes the RNA; the DNA is purified.
5. An oligo(dG) primer (5’-pGGGG-OH) anneals to the oligo(dC) tail.
6. Klenow polymerase or reverse transcriptase and the four dNTPs synthesize the second strand, giving duplex cDNA.
Fig 2.1 Second strand synthesis, steps shown:
1. Duplex cDNA with a protruding 3’ oligo(dC) end.
2. Single strand-specific nuclease trims the protruding 3’ end.
3. Klenow polymerase fills in the ends.
4. Treat with EcoRI methylase.
5. Add EcoRI linkers (5’-CCGAATTCGG-3’) using T4 DNA ligase.
6. EcoRI digestion (CCG/AATTCGG) leaves 5’-pAATT cohesive ends.
7. Ligate to vector and transform.
Treatment of cDNA ends
Blunt-end ligation of large fragments is not efficient, so we have to
use special nucleic acid linkers to create sticky ends for cloning.
The process :
Remove protruding 3’-ends (single strand-specific nuclease)
Fill in missing 3’ nucleotides (Klenow fragment of
DNA pol I and 4 dNTPs)
Ligate the blunt ends and linkers (T4 DNA ligase)
Tailing with terminal transferase or
using adaptor molecules
Restriction enzyme digestion (EcoRI)
Ligation to vector
Any vector with an EcoRI site would be suitable
for cloning the cDNA.
The process :
Dephosphorylate the vector with alkaline
phosphatase
Ligate vector and cDNA with T4 DNA ligase
(plasmid or λ phage vector)
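The EcoRI step can be sketched in code. This is a minimal illustration, assuming a linear top strand given 5'->3' and the standard EcoRI site GAATTC cut after the G; the function names are hypothetical, not from any kit or library.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G/AATTC


def find_ecori_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every EcoRI site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def ecori_digest(seq: str) -> list[str]:
    """Cut the top strand after the G of each GAATTC and return the pieces."""
    seq = seq.upper()
    fragments, previous = [], 0
    for pos in find_ecori_sites(seq):
        cut = pos + 1  # cut falls between G and AATTC
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


linker = "CCGAATTCGG"  # the EcoRI linker used in the slides
print(find_ecori_sites(linker))  # [2]
print(ecori_digest(linker))      # ['CCG', 'AATTCGG']
```

Only the top strand is modelled; the 5’ AATT overhang on the other strand follows by complementarity.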
Screening
The process of identifying one particular
clone containing the gene of interest from
among the very large number of others in the
gene library .
Plate the cDNA library on LB
agar plates
-It needs the help of a host.
-For details, refer to any cDNA library construction kit.
Pick colony
Culture in LB broth (antibiotic), 37 °C, overnight
Plasmid preparation: gel electrophoresis & spectrophotometer
Verification: restriction enzyme, gel electrophoresis
Verification: PCR, gel electrophoresis
Expressed Sequence Tag
(EST)
Messenger RNA (mRNA) sequences in the cell
represent copies from expressed genes.
RNA cannot be cloned directly; it is first
reverse transcribed to double-stranded cDNA.
The resultant cDNA is cloned to make libraries
representing a set of transcribed genes of the
original cell, tissue or organism.
Characteristics of EST sequences
Nagaraj, S. H. et al. Brief Bioinform 2007 8:6-21; doi:10.1093/bib/bbl015
Sequencing
DNA sequencing by the Sanger method
The standard DNA sequencing technique is the Sanger method,
named for its developer, Frederick Sanger, who shared the 1980
Nobel Prize in Chemistry. This method begins with the use of
special enzymes to synthesize fragments of DNA that terminate
when a selected base appears in the stretch of DNA being
sequenced. These fragments are then sorted according to size
by placing them in a slab of polymeric gel and applying an
electric field -- a technique called electrophoresis. Because of
DNA's negative charge, the fragments move across the gel toward
the positive electrode. The shorter the fragment, the faster it
moves. Typically, each of the terminating bases within the
collection of fragments is tagged with a radioactive probe for
identification.
DNA sequencing example
Problem Statement: Consider the following DNA
sequence (from firefly luciferase). Draw the sequencing
gel pattern that forms as a result of sequencing the
following template DNA with ddNTP as the capper.
atgaccatgattacg...
Solution:
Given DNA template: 5'-atgaccatgattacg...-3'
DNA synthesized:    3'-tactggtactaatgc...-5'
Gel pattern (bands for the 15 bases shown):
lane ddATP  W  |     |  ||
lane ddTTP  W |  |  |  |  |
lane ddCTP  W   |     |     |
lane ddGTP  W     ||       |
Electric field: + at the right
Decreasing size: left to right
where "W" indicates the well position, and "|"
denotes the DNA bands on the sequencing gel.
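The band positions in this example can be worked out in code. This is a minimal sketch, assuming fragment length is counted from the first base of the synthesized strand (the primer is ignored); the function names are hypothetical.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def synthesized_strand(template: str) -> str:
    """Return the newly made strand written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template.lower()))


def gel_lanes(template: str) -> dict[str, list[int]]:
    """Map each ddNTP lane to the lengths of the fragments it terminates."""
    lanes = {"a": [], "t": [], "c": [], "g": []}
    for length, base in enumerate(synthesized_strand(template), start=1):
        lanes[base].append(length)  # a ddNTP stops the chain at this base
    return lanes


template = "atgaccatgattacg"
print(synthesized_strand(template))  # cgtaatcatggtcat
for base, lengths in gel_lanes(template).items():
    print(f"lane dd{base.upper()}TP: fragment lengths {lengths}")
```

The shortest fragment (length 1) falls in the ddCTP lane and the next (length 2) in the ddGTP lane, so the read starts CG.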
A sequencing gel
This picture is an autoradiograph. The darkness of the bands is
proportional to the radioactivity from 32P-labeled adenosine
in the synthesized DNA sample.
Reading a sequencing gel
You begin at the right, where the smallest DNA fragments are.
The sequence that you read will be in the 5'-3' direction.
This sequence will be the same as the RNA that
would be generated to encode a protein, except that
the T bases in DNA will be replaced by U residues. As an example,
in the problem given, the smallest DNA fragment on the sequencing
gel is in the C lane, so the first base is a C. The next largest band
is in the G lane, so the DNA fragment of length 2 ends in G.
Therefore the sequence of the first two bases is CG.
The sequence of the first 30 or so bases of the DNA is:
CGTAATCATGGTCATATGAAGCTGGGCCGGGCCGTGC....
When this is made as RNA, its sequence would be:
CGUAAUCAUGGUCAUAUGAAGCUGGGCCGGGCCGUGC....
Note that the information content is the same; only the T's have
been replaced by U's.
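Reading the gel amounts to sorting the bands by fragment length. This is a minimal sketch, assuming each lane is given as a list of fragment lengths; the data layout and function names are hypothetical.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read a sequence 5'->3' from bands, smallest fragment first.

    lanes maps the terminating base (lane) to the fragment lengths seen there.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes show a band at length {length}")
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


def to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA: same information, T replaced by U."""
    return dna.upper().replace("T", "U")


# First six bands of the example gel (lengths 1 to 6).
lanes = {"C": [1], "G": [2], "T": [3, 6], "A": [4, 5]}
dna = read_gel(lanes)
print(dna)          # CGTAAT
print(to_rna(dna))  # CGUAAU
```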
The codon table
5’-Base   Middle Base                   3’-Base
          U(=T)   C      A      G
U(=T)     Phe     Ser    Tyr    Cys     U(=T)
          Phe     Ser    Tyr    Cys     C
          Leu     Ser    Term   Term    A
          Leu     Ser    Term   Trp     G
C         Leu     Pro    His    Arg     U(=T)
          Leu     Pro    His    Arg     C
          Leu     Pro    Gln    Arg     A
          Leu     Pro    Gln    Arg     G
A         Ile     Thr    Asn    Ser     U(=T)
          Ile     Thr    Asn    Ser     C
          Ile     Thr    Lys    Arg     A
          Met     Thr    Lys    Arg     G
G         Val     Ala    Asp    Gly     U(=T)
          Val     Ala    Asp    Gly     C
          Val     Ala    Glu    Gly     A
          Val     Ala    Glu    Gly     G
Translating the DNA sequence
The order of amino acids in any protein is specified by the
order of nucleotide bases in the DNA.
Each amino acid is coded by a particular sequence of three bases.
To convert a DNA sequence into an amino acid sequence:
First, find the starting codon. The starting codon is always
the codon for the amino acid methionine. This codon is
AUG in the RNA (or ATG in the DNA):
GCGCGGGUCCGGGCAUGAAGCUGGGCCGGGCCGUGC....
Met
In this particular example the next codon is AAG. The first base
(5'end) is A, so that selects the 3rd major row of the table. The
second base (middle base) is A, so that selects the 3rd column of
the table. The last base of the codon is G, selecting the last line in
the block of four.
Translating the DNA sequence
This entry AAG in the table is Lysine (Lys).
Therefore the second amino acid is Lysine.
The first few residues, and their DNA sequence, are as follows
(color coded to indicate the correct location in the
codon table):
Met Lys Leu Gly Arg … ...
AUG AAG CUG GGC CGG GCC GUG C..
This procedure is exactly what cells do when they synthesize
proteins based on the mRNA sequence. The process of translation
in cells occurs in a large complex called the ribosome.
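The table lookup can be sketched in code. This is a minimal illustration, assuming the standard genetic code and translation from the first AUG; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"  # same order as the rows and columns of the codon table
# One letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ... GGG ('*' = Term).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "I": "Ile", "M": "Met", "V": "Val", "S": "Ser",
    "P": "Pro", "T": "Thr", "A": "Ala", "Y": "Tyr", "H": "His", "Q": "Gln",
    "N": "Asn", "K": "Lys", "D": "Asp", "E": "Glu", "C": "Cys", "W": "Trp",
    "R": "Arg", "G": "Gly", "*": "Term",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> list[str]:
    """Translate from the first AUG until a stop codon or the end."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    start = rna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "Term":
            break
        protein.append(residue)
    return protein


mrna = "GCGCGGGUCCGGGCAUGAAGCUGGGCCGGGCCGUGC"
print(translate(mrna))
```

For the example sequence the list begins Met, Lys, Leu, Gly, Arg, as read from the table by hand.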
Automated procedure for DNA
sequencing
A computer read-out of the gel generates a “false color” image
where each color corresponds to a base. Then the intensities are
translated into peaks that represent the sequence.
High-throughput sequencing:
Capillary electrophoresis
The human genome project has spurred an effort to
develop faster, higher throughput, and less
expensive technologies for DNA sequencing.
Capillary electrophoresis (CE) separation has many
advantages over slab gel
separations. CE separations are faster and are capable of producing
greater resolution. CE instruments can use tens and even
hundreds of capillaries simultaneously. The figure shows a simple
CE setup where the fluorescently-labeled DNA is detected as it
exits the capillary.
[Figure labels: laser, focusing lens, sheath flow cuvette, sheath flow, beam block, collection lens, filter, PMT]
DNA sequencing.
Dideoxy analogs of normal nucleotide triphosphates
(ddNTP) cause premature termination of a growing chain
of nucleotides.
ACAGTCGATTG
ACAddG
ACAGTCddG
ACAGTCGATTddG
Fragments are separated according to their sizes in gel
electrophoresis. The lengths show the positions of “G” in
the original DNA sequence.
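The ddG example above can be reproduced in a few lines. This is a minimal sketch, assuming the sequence is written 5'->3' and that termination can occur at every occurrence of the chosen base; the function name is hypothetical.

```python
def terminated_fragments(sequence: str, dd_base: str) -> list[str]:
    """List every chain that ends where a dideoxy base was incorporated."""
    fragments = []
    for i, base in enumerate(sequence):
        if base == dd_base:
            # The chain stops here: the prefix plus the dideoxy terminator.
            fragments.append(sequence[:i] + "dd" + base)
    return fragments


for fragment in terminated_fragments("ACAGTCGATTG", "G"):
    print(fragment)
# ACAddG
# ACAGTCddG
# ACAGTCGATTddG
```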
Nucleotides and
phosphodiester bond.
Phosphodiester bond
Genomic sequencing.
Individual chromosomes are broken into
100kb random fragments.
This library of fragments is screened to
find overlapping fragments – contigs.
Unique overlapping clones are chosen for
sequencing.
Put together overlapping sequenced
clones using computer programs.
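Putting overlapping sequences together can be illustrated with a toy greedy merge. This is only a sketch under strong assumptions (error-free reads, all on the same strand, exact suffix/prefix overlaps); real assembly programs are far more involved, and the reads and function names here are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Longest suffix of left that equals a prefix of right (0 if too short)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair with the largest overlap into one contig."""
    contigs = list(reads)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGACCATG", "CATGATTAC", "TTACGGA"]  # made-up overlapping reads
print(greedy_assemble(reads))  # ['ATGACCATGATTACGGA']
```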
Sequencing cDNA libraries.
mRNA is pooled from the tissues which express
genes.
cDNA libraries are prepared by copying of mRNA
with reverse transcriptase.
Expressed Sequence Tags (EST) – partial
sequences of expressed genes.
Comparing translated ESTs to annotated
proteins – annotation of genes.
Gene prediction.
Gene – DNA sequence encoding protein,
rRNA, tRNA …
Gene concept is complicated:
- Introns/exons
- Alternative splicing
- Genes-in-genes
- Multisubunit proteins
Gene structure.
[Figure: promoter sequences (-35 and -10) upstream of the gene; the gene runs from ATG to TER.]
ATG – start codon; TER (TAA, TAG, TGA) – termination codons
Codon usage tables.
- Each amino acid can be encoded by several codons.
- Each organism has characteristic pattern of codon usage.
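A codon usage table can be built by counting codons in known coding sequences. This is a minimal sketch, assuming an in-frame coding sequence and reporting each codon as a fraction of all codons; the example sequence and function name are hypothetical.

```python
from collections import Counter


def codon_usage(coding_sequence: str) -> dict[str, float]:
    """Return the fraction of all codons accounted for by each codon."""
    seq = coding_sequence.upper().replace("U", "T")
    # Split into complete codons, starting at the first base (frame 0).
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: count / total for codon, count in counts.items()}


usage = codon_usage("ATGAAGCTGCTGAAATAA")  # ATG AAG CTG CTG AAA TAA
for codon, fraction in sorted(usage.items()):
    print(codon, round(fraction, 3))
```

Here Lys is encoded once by AAG and once by AAA, while Leu uses CTG both times; over a whole genome such counts give the organism's characteristic pattern.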
Problems arising in gene
prediction.
Distinguishing pseudogenes (former genes that are
no longer functional) from genes.
Exon/intron structure in eukaryotes; exon
flanking regions are not very well
conserved.
Exons can be joined in alternative combinations
(alternative splicing).
Genes can overlap each other and occur
on different strands of DNA.
Gene identification
Homology-based gene prediction
• Similarity Searches (e.g. BLAST, BLAT)
• ESTs
Ab initio gene prediction
• Prokaryotes
ORF identification
• Eukaryotes
Promoter prediction
PolyA-signal prediction
Splice site, start/stop-codon predictions
Prokaryotic genes – searching
for ORFs.
- Small genomes have high gene density
Haemophilus influenzae – 85% genic
- No introns
- Operons
One transcript, many genes
- Open reading frame (ORF):
a contiguous set of codons that starts with a Met codon and ends with
a stop codon.
Example of ORFs.
There are six possible reading frames in each sequence: three for each
direction of transcription.
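Searching all six reading frames can be sketched as follows. This is a minimal illustration, assuming an ORF runs from the first ATG in a frame to the next in-frame stop codon; the example sequence and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int) -> list[str]:
    """Collect ORFs (ATG ... stop) reading codons from offset `frame`."""
    orfs, start = [], None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # open a new ORF at the first Met codon
        elif start is not None and codon in STOP_CODONS:
            orfs.append(seq[start:i + 3])
            start = None
    return orfs


def find_orfs(dna: str) -> list[tuple[str, int, str]]:
    """Scan three frames on each strand; return (strand, frame, ORF)."""
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for orf in orfs_in_frame(seq, frame):
                results.append((strand, frame, orf))
    return results


print(find_orfs("CCATGAAGTAGGC"))  # [('+', 2, 'ATGAAGTAG')]
```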
Confirming gene location using
EST libraries.
Expressed Sequence Tags (ESTs) – sequenced
short segments of cDNA. They are organized in
the database “UniGene”.
If region matches ESTs with high statistical
significance, then it is a gene or pseudogene.