1
Pacific Biosciences
Use a circular template to get redundant reads and so more accuracy.
2
DNA methylation detection by bisulfite conversion
3
Detection of methylated adenine in Pacific Biosciences (SMRT) sequencing
4
IPD ratio = ratio of average interpulse durations (methylated/non-methylated template)
5
Pacific Biosciences
• 50,000 ZMWs (Aug., 2011), and density may climb
• Long reads (e.g., full molecules to determine full length splicing isoforms)
• Direct RNA sequencing possible.
• DNA methylation detectable
6
Agilent SureSelect RNA Target Enrichment
Capture a subgenomic region of interest for economy and speed of sequencing. E.g.,
• the entire exome (all exons w/o introns or intergenic regions)
• hundreds of cancer genes
• a particular genomic locus
Alternative: hybridize to a custom microarray.
Agilent
7
Nimblegen (Roche) subgenomic DNA capture options:
Beads or microarrays
8
Some results using DNA capture for subgenomic sequencing
Targeted Capture and Next-Generation Sequencing Identifies C9orf75, encoding Taperin, as the Mutated Gene in Nonsyndromic Deafness DFNB79
Rehman et al., American Journal of Human Genetics 86, 378–388, 2010
Detection of methylated C (cytosine; ~all in CpG dinucleotides)
DS DNA:
    ----CmpG--->
    <---GpCm----
Na bisulfite, heat (deamination; one strand shown):
    methylated:       ----CmpG--->  stays    ----CmpG--->
    non-methylated:   ----CpG---->  becomes  ----UpG---->  (uracil)
PCR:
    methylated:       ----CpG--->
                      <---GpC----
    non-methylated:   ----TpG--->
                      <---ApC----
All NON-methylated Cs changed to T. Sequence and compare to deduce the methylated Cs.
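The compare step can be illustrated in silico. This is a minimal sketch, assuming complete conversion and a single strand; the reference sequence, the methylated position, and the function names are hypothetical and chosen only for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence read after bisulfite treatment and PCR.

    Every unmethylated C is deaminated to U and read as T after PCR;
    methylated Cs (given as 0-based positions) stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylated(reference, converted_read):
    """Positions where a reference C is still read as C, i.e. was protected."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted_read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ATCGTTCGACCA"   # hypothetical untreated sequence
methylated = {2}             # hypothetical: only the C of the first CpG is methylated

read = bisulfite_convert(reference, methylated)
print(read)                              # converted read
print(call_methylated(reference, read))  # deduced methylated positions
```

Real data also needs the opposite strand (where the change appears as G to A) and an allowance for incomplete conversion, both of which this sketch ignores.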
9
10
DEEP SEQUENCING (Next generation sequencing, High throughput sequencing,
Massively parallel sequencing) applications:
Human genome re-sequencing (mutations, SNPs, haplotypes, disease associations,
personalized medicine)
Tumor genome sequencing
Microbial flora sequencing (microbiome, viruses)
Metagenomic sequencing (without cell culturing)
RNA sequencing (RNAseq; gene expression levels, miRNAs, lncRNAs, splicing isoforms)
Chromatin structure (ChIP-seq; histone modifications, nucleosome positioning)
Epigenetic modifications (DNA CpG methylation and hydroxymethylation)
Transcription kinetics (GROseq; nascent RNA, BrdU pulse labeled RNA)
High throughput genetics (QUEPASA; cis-acting regulatory motif discovery)
Drug discovery (bar-coded organic molecule libraries) [Mannocci PNAS paper]
11
Ke et al. and Chasin, Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 2011. 21: 1360-1374.
Order an equal mixture of all 4 bases at these 6 positions
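An equal mixture of 4 bases at 6 positions gives 4^6 = 4096 possible hexamers. The short sketch below enumerates them; the function name is hypothetical and this is not code from the paper.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Every possible DNA sequence of length k, in alphabetical order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


hexamers = all_kmers(6)
print(len(hexamers))   # number of distinct hexamers in the library
print(hexamers[:3])    # first few, for orientation
```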
12
Quantifying extensive phenotypic arrays
from sequence arrays (= QUEPASA)
13
Rank   6-mer    ESRseq score (~ -1 to +1)
Best exonic splicing enhancers:
1      AGAAGA    1.0339
2      GAAGAT    0.9918
3      GACGTC    0.9836
4      GAAGAC    0.9642
5      TCGTCG    0.9517
6      TGAAGA    0.9434
7      CAAGAA    0.9219
8      CGTCGA    0.8853
:
Worst exonic splicing enhancers, = best exonic splicing silencers:
4086   TAGATA   -0.8609
4087   AGGTAG   -0.8713
4088   CGTCGC   -0.8850
4089   CTTAAA   -0.8786
4090   CCTTTA   -0.8812
4091   GCAAGA   -0.8911
4092   TAGTTA   -0.8933
4093   TCGCCG   -0.9113
4094   CCAGCA   -0.8942
4093   CTAGTA   -0.9251
4094   TAGTAG   -0.9383
4095   TAGGTA   -0.9965
4096   CTTTTA   -1.0610
14
Constitutive exons
Alternative exons
Pseudo exons
Composite exon (from ~100,000)
15
What the data looks like:
Each line: sequence of 36 nt, followed by its quality code
CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA abU^Vaa`a\aaa]aWaTNZ`aa`Q][TE[UaP_U]
TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA a`P^Wa`[`Wa^`X_X_XWVa^NSP]_]S^X_T\X^
CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA aTa`^b``baaaa^aab^YaTQLOHIa`^a``TX]]
TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA I_`aaaa`aaaaaaa_a_^[KZIGIGZ`U`\^P^^`
CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA aY_\abb[T\abaaa`a`bZ[HXXIZa_`_LGMS[`
TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA aba]^aa_a]`aa]_]`XWSMFGGIPX[P]X`V_Y^
TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA a_^a^aa`aYaaa_aY`Y_^[I]VY\`]V]R\W]VV
TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA XZababa`aZaaaaaYaYXX`baa``\\TaUa\aW`
Annotations on the reads: 2 nt barcode (TA or CG); constant regions; variable region; error (peculiar to our expt.)
Barcoding allows multiplexing of several or many experiments at once (in one channel of a sequencer) -> economy. Here, two biological replicates (experiments 1 and 2).
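Sorting such reads by barcode can be sketched as follows. The layout (2 nt barcode, 18 nt constant region, 6 nt variable region, 10 nt constant region) is inferred from the eight example reads above, and the exact-match filter on the constant regions is an assumption made for illustration, not necessarily the pipeline used in the experiment.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: barcode (TA or CG), constant region, 6 nt variable region, constant region.
READ_PATTERN = re.compile(r"^(TA|CG)CACTGTGCTGGAGCTCCC([ACGT]{6})AACTCTAGAA$")

reads = [
    "CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA",
    "TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA",
]


def demultiplex(reads):
    """Count variable-region hexamers per barcode; set aside reads that do not fit the layout."""
    counts = defaultdict(Counter)
    rejected = []
    for read in reads:
        match = READ_PATTERN.match(read)
        if match is None:
            rejected.append(read)   # sequencing error in a barcode or constant region
            continue
        barcode, hexamer = match.groups()
        counts[barcode][hexamer] += 1
    return counts, rejected


counts, rejected = demultiplex(reads)
for barcode in sorted(counts):
    print(barcode, dict(counts[barcode]))
print("rejected:", len(rejected))
```

With these eight reads, the two that carry a mismatch in a constant region are set aside and the rest are counted under their barcode.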
16
Next generation methods for high throughput genetic analysis:
Use custom oligo libraries to construct minigene libraries (40,000, up to 60 nt long):
E.g., for saturation mutagenesis to identify all exonic bases contributing to splicing (or transcription or polyadenylation, ...)
Use bar codes to detect sequences missing from the selected molecules
E.g., Patwardhan RP, Lee C, Litvin O, Young DL, Pe'er D, Shendure J. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat Biotechnol. 2009; 27:1173-5.
Long (200-mer) synthetic oligo library
OUTLINE OF LECTURE TOPICS COMING UP
Expression and manipulation of transgenes in the laboratory
• In vitro mutagenesis to isolate variants of your protein/gene with desirable properties
  – Single base mutations
  – Deletions
  – Overlap extension PCR
  – Cassette mutagenesis
• To study the protein: Express your transgene
  – Usually in E. coli, for speed, economy
  – Expression in eukaryotic hosts
  – Drive it with a promoter/enhancer
  – Purify it via a protein tag
  – Cleave it to get the pure protein
• Explore protein-protein interaction
• Co-immunoprecipitation (co-IP) from extracts
• 2-hybrid formation
• Surface plasmon resonance
• FRET (Fluorescence resonance energy transfer)
• Complementation readout
17
18
Site-directed mutagenesis by overlap extension PCR
[Figure: PCR fragment carrying sites RS1 and RS2; figure labels 1 and 2]
PCR fragment: subsequent cloning in a plasmid (or not; the PCR product itself can be used in many ways, e.g., transfection)
Cut with RE 1 and 2
Ligate into similarly cut vector
Strachan and Read, Human Mol. Genet. 3, p. 148
19
Cassette mutagenesis = random mutagenesis but in a limited region:
1) by error-prone PCR
Original sequence coding for, e.g., a transcription enhancer region:
---------------------------------------------------------------------------------------------------------------------
PCR the fragment with high Taq polymerase and Mn2+ instead of Mg2+ -> errors (*):
------*--------*--*-**---------------*-----------*--*------*------------------------*-*-*------------*------------*--
Cut in primer sites and clone upstream of a reporter protein sequence.
Pick colonies
Analyze phenotypes
Sequence
20
Cassette mutagenesis = random mutagenesis but in a limited region:
2) by “doped” synthesis
Target = e.g., an enhancer element
---------------------------------------------------------- Original enhancer sequence
-----------------------------------------------------------*------------------------*-*-*------------*------------*-------*--------*--*-**---------------*-----------*--*------
Clone upstream of a reporter.
Pick colonies
Analyze phenotypes
Sequence
Buy 2 doped oligos; anneal. OK for up to ~80 nt.
Doping = e.g., 90% G, 3.3% A, 3.3% C, 3.3% T at each position (shown for a position that is G in the original sequence)
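The consequences of this doping level can be worked out with a few lines of arithmetic. The sketch below assumes every position is doped independently at 90% original base and that the remaining 10% is split evenly among the other three bases; the function names and the all-G template are hypothetical.

```python
import random


def doped_oligo(template, wt_fraction=0.90, rng=random):
    """Simulate one molecule from a doped synthesis of `template`."""
    out = []
    for base in template:
        if rng.random() < wt_fraction:
            out.append(base)                                          # original base kept
        else:
            out.append(rng.choice([b for b in "ACGT" if b != base]))  # one of the 3 others
    return "".join(out)


def expected_mutations(length, wt_fraction=0.90):
    """Mean number of substitutions per molecule."""
    return length * (1 - wt_fraction)


def fraction_unmutated(length, wt_fraction=0.90):
    """Probability that a molecule carries no substitution at all."""
    return wt_fraction ** length


template = "G" * 80                    # hypothetical 80 nt target
rng = random.Random(0)                 # seeded so the run is repeatable
oligo = doped_oligo(template, rng=rng)
print(sum(a != b for a, b in zip(template, oligo)), "substitutions in this molecule")
print(expected_mutations(80), "expected substitutions per molecule")
print(fraction_unmutated(80), "fraction of molecules left unmutated")
```

At 90% doping an 80 nt target carries about 8 substitutions per molecule on average, so a lower doping level would be the choice if mostly single mutants are wanted.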
21
E. coli as a host
• PROs: Easy, flexible, high tech, fast, cheap; but problems
• CONs
• Folding (can misfold)
• Sorting within the cell -> can form inclusion bodies
• Purification -- endotoxins
• Modifications -- not done (glycosylation, phosphorylation, etc.)
• Modifications:
• Glycoproteins
• Acylation: acetylation, myristoylation
• Methylation (arg, lys)
• Phosphorylation (ser, thr, tyr)
• Sulfation (tyr)
• Prenylation (farnesyl, geranylgeranyl on cys)
• Vitamin C-Dependent Modifications (hydroxylation of proline and lysine)
• Vitamin K-Dependent Modifications (gamma carboxylation of glu)
• Selenoproteins (seleno-cys tRNA at UGA stop)
E. coli expression vectors
Promoter examples:
1) Lac promoter (with operator)-YFG, + lac repressor (I gene):
Induce expression by inactivation of the lac repressor with IPTG or lactose
2) As above but with a hybrid Tac promoter (tryptophan operon + lac operon):
Stronger. Use the Iq mutant of the lac I gene, which produces high levels of the lac repressor.
Expression regulatable over several orders of magnitude.
3) BAD promoter-YFG. Arabinose utilization operon. Inducible by arabinose via the endogenous araC gene for a transcriptional activator. Background levels driven down by including glucose.
4) Phage T7 promoter-YFG. Vector carries gene for T7 polymerase, under control of the lac promoter. Add IPTG or lactose to induce T7 polymerase and thence YFG.
IPTG = isopropylthiogalactoside (non-metabolizable inducer)
YFG = your favorite gene
23
Myristoylation – myristic acid to N-terminal glycine alpha amino group
Anchors protein to membrane.
24
Lysine epsilon amino group modifications
mono methyl, dimethyl also
Well-studied in histones, microtubules
25
Via seleno-cys tRNA at a UGA nonsense codon
Sequence context dictates efficiency.
26
Gamma carboxylation of glutamic acid
Binds calcium, used in coagulation proteins
27
Some alternative hosts
• Yeasts (Saccharomyces, Pichia)
• Insect cells with baculovirus vectors
• Mammalian cells in culture (later)
• Whole organisms (mice, goats, corn) (not discussed)
• In vitro (cell-free), for analysis only, not preparatively (good for radiolabeled proteins, discussed later)
Some popular yeast promoters
Selectable marker
ori
http://biochemie.web.med.unimuenchen.de/Yeast_Biol/04 Yeast Molecular
Techniques.pdf
ARS = autonomously replicating sequence element
29
Yeast Expression Vector (example)
Saccharomyces cerevisiae (baker's yeast)
Features:
2 mu seq = yeast ori (2μ = 2 micron plasmid)
oriE = bacterial ori
Ampr = bacterial selection
LEU2, e.g. = Leu biosynthesis for yeast selection
Complementation of an auxotrophy can be used instead of drug-resistance.
Auxotrophy = state of a mutant in a biosynthetic pathway resulting in a requirement for a nutrient
[Vector map labels: GAPD prom, Your favorite gene (Yfg), GAPD term'n, LEU2, 2μ; for growth in E. coli: Ampr, oriE]
GAPD = the enzyme glyceraldehyde-3-phosphate dehydrogenase
31
Yeast - genomic integration via homologous recombination
[Figure: vector DNA carrying p-Yfg-t and HIS4 recombines with genomic DNA carrying a HIS4 mutation. The resulting genomic DNA contains p-Yfg-t together with a functional HIS4 gene and a defective HIS4 gene.]
32
Yeast: double recombination (integration in Pichia pastoris)
P. pastoris
-tight control
-methanol induced (AOX1)
-large scale production (gram quantities)
AOX1 gene = alcohol oxidase gene (~ 30% of total protein)
[Figure: vector DNA (AOX1p | Yfg | AOX1t | HIS4 | 3'AOX1) recombines at both ends with the genomic AOX1 gene. The resulting genomic DNA carries AOX1p | Yfg | AOX1t | HIS4 | 3'AOX1.]
Expression in mammalian cells
Lab examples of immortal cell lines:
HEK293   Human embryonic kidney (high transfection efficiency)
HeLa     Human cervical carcinoma (historical, low RNase)
CHO      Chinese hamster ovary (hardy, diploid DNA content, mutants)
Cos      Monkey cells with SV40 replication proteins (-> high transgene copies)
3T3      Mouse or human exhibiting ~regulated (normal-like) growth
+ various others, many differentiated to different degrees, e.g.:
BHK      Baby hamster kidney
HepG2    Human hepatoma
GH3      Rat pituitary cells
PC12     Rat neuronal-like tumor cells
MCF7     Human breast cancer
HT1080   Human fibroblastic cells with near diploid karyotype
IPS      Induced pluripotent stem cells
and:
Primary cells cultured with a limited lifetime.
E.g., MEF = mouse embryonic fibroblasts, HDF = human diploid fibroblasts
Common in industry:
NS1      mAbs                                Mouse plasma cell tumor cells
Vero     vaccines                            African green monkey cells
CHO      mAbs, other therapeutic proteins    Chinese hamster ovary cells
PER.C6   mAbs, other therapeutic proteins    Human retinal cells
Mammalian cell expression
Generalized gene structure for mammalian expression:
Mam. prom. | 5'UTR (with intron) | cDNA gene | 3'UTR | polyA site
Intron is optional but a good idea
Popular mammalian cell promoters
• SV40 LargeT Ag (Simian Virus 40)
• RSV LTR (Rous sarcoma virus)
• MMTV (steroid inducible) (Mouse mammary tumor virus)
• HSV TK (low expression) (Herpes simplex virus)
• Metallothionein (metal inducible, Cd++)
• CMV early (Cytomegalovirus)
• Actin
• EIF2alpha
• Engineered inducible / repressible: tet, ecdysone, glucocorticoid (tet = tetracycline)
Engineered regulated expression:
Tetracycline-responsive promoters
Tet-OFF (add tet -> shut off)
tTA = tet activator fusion protein: tetR domain + VP16 transcription activation domain
tetR = tet repressor (original role)
No tet: tTA is active. Binds tet operator (multiple copies) (if tet not also bound).
Tetracycline (tet), or, better, doxycycline (dox), bound: allosteric change in conformation; tTA is not active.
tTA gene must be in cell (permanent transfection, integrated):
CMV prom. | tTA cDNA | polyA site
(Bujold et al.)
Tet-OFF, cont.
Multiple tet operator elements | MIN. CMV prom. | your favorite gene | polyA site
No doxycycline: tTA (tetR domain + VP16 tc'n act'n domain) is active. Plenty of transcription.
Doxycycline present: tTA is not active. Little transcription (2%?, bkgd).
Tet-ON
Tetracycline-responsive promoters
Tet-ON (add tet -> turn on gene)
Different fusion protein (tetR domain + VP16 tc'n act'n domain): does NOT bind tet operator (if tet not bound).
No tet: not active.
Tetracycline (tet), or, better, doxycycline (dox), bound: active.
Full CMV prom. | tTA cDNA (the different fusion protein, commonly called rtTA) | polyA site
Must be in cell (permanent transfection, integrated): commercially available (293, CHO) or do-it-yourself
Tet-ON, cont.
Multiple tet operator elements | MIN. CMV prom. | your favorite gene | polyA site
Doxycycline absent: fusion protein not active. Little transcription (bkgd.)
Add dox: fusion protein active. Plenty of transcription (> 50X)
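The two switches differ only in whether doxycycline prevents or permits operator binding. The toy model below makes that logic explicit; it is a sketch with hypothetical function names, and the 0.02 background is taken loosely from the "2%?" figure on the Tet-OFF slide and applied to both systems only for illustration.

```python
def activator_bound(system, dox_present):
    """True if the tetR-VP16 fusion protein sits on the tet operator elements."""
    if system == "Tet-OFF":
        return not dox_present   # tTA binds only when no dox is bound
    if system == "Tet-ON":
        return dox_present       # the Tet-ON fusion binds only when dox is bound
    raise ValueError(f"unknown system: {system!r}")


def relative_transcription(system, dox_present, background=0.02):
    """Transcription of your favorite gene relative to the fully induced level."""
    return 1.0 if activator_bound(system, dox_present) else background


for system in ("Tet-OFF", "Tet-ON"):
    for dox in (False, True):
        level = relative_transcription(system, dox)
        print(f"{system:8s} dox={'yes' if dox else 'no ':3s} -> {level}")
```

Choosing between the two comes down to which state should be the resting one: Tet-OFF keeps the gene on until dox is added, Tet-ON keeps it off until dox is added.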