Piper colubrinum

Allele Mining Tools
Johnson George K.
Indian Institute of Spices Research, Marikunnu P.O., Calicut- 673012.
Allele mining is a research field aimed at identifying allelic variation
of relevant traits within genetic resources collections.
For identified genes of known function and basic DNA sequence,
genetic resources collections may be screened for allelic variation.
Breeding Synopsis
Some Facts
• In human beings, 99.9 percent of bases are the same.
• The remaining 0.1 percent makes a person unique.
– Different attributes / characteristics / traits
• how a person looks,
• diseases he or she develops.
• These variations can be:
– Harmless (change in phenotype)
– Harmful (diabetes, cancer, heart disease, Huntington's disease,
and hemophilia )
– Latent (variations found in coding and regulatory regions that are not
harmful on their own; the change in each gene only
becomes apparent under certain conditions, e.g. susceptibility to
lung cancer)
SNPs
(A SNP is defined as a single base change in a DNA sequence)
• SNPs are found in coding and (mostly) noncoding regions.
• Occur with a very high frequency
– estimates range from about 1 in 1000 bases to 1 in every 100 to 300 bases.
• The abundance of SNPs and the ease with which they can
be measured make these genetic variations significant.
• A SNP close to a particular gene acts as a marker for that
gene.
• SNPs in coding regions may alter the structure of the protein
encoded by that coding region.
SNPs may or may not alter protein structure
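Whether a coding SNP alters the protein depends on whether the changed codon still encodes the same amino acid. The following Python sketch illustrates this under simple assumptions: the two sequences are already aligned, equal in length, and start in reading frame. The example sequences are hypothetical and are not taken from Piper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_snps(ref, alt):
    """Return (position, ref_base, alt_base) for every single-base difference."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def classify_snp(ref_cds, pos, alt_base):
    """Classify a SNP in an in-frame coding sequence as synonymous or not."""
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = ref_cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Hypothetical alleles: ATG GCT GAA versus ATG GCC GAT
reference = "ATGGCTGAA"
variant = "ATGGCCGAT"
for pos, ref_base, alt_base in find_snps(reference, variant):
    print(pos, ref_base, alt_base, classify_snp(reference, pos, alt_base))
```

In this toy case the first SNP leaves alanine unchanged, while the second replaces glutamate with aspartate.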
Applications of allele mining
Degenerate primer design and gene amplification
Degenerate primers for different groups of sequences based on the motifs in the NBS region of R-genes
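A degenerate primer is a mixture of oligonucleotides written with IUPAC ambiguity codes. The Python sketch below expands such a primer into the individual sequences it represents and counts them. The primer `GGNAARAC` is a hypothetical illustration, not one of the primers used for the NBS or WRKY amplifications.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand_primer(primer):
    """List every concrete sequence in the degenerate primer pool."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


primer = "GGNAARAC"  # hypothetical example: N = any base, R = A or G
print(degeneracy(primer))
print(expand_primer(primer))
```

Keeping the degeneracy low is usually a design goal, because each concrete sequence is present at only a fraction of the total primer concentration.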
Targeted Amplification of WRKY gene from Piper colubrinum
The WRKY domain is a DNA binding domain found in a superfamily of plant
transcription factors involved in the regulation of various physiological
programs that are unique to plants, including pathogen defense.
Figure: Amplification of WRKY gene from P. colubrinum. M: bp
ladder; Lanes 1-4: amplification using different degenerate
primers.
Sequence analysis- WRKY gene from Piper colubrinum
Blastp results
Blastp tree View – WRKY Piper colubrinum
Steps involved in allele mining
The TILLING Method. Seeds are treated with a chemical mutagen to induce genetic
variation, and then planted. The resulting M1 population of plants is chimeric
for mutations. Therefore, one seed from each M1 plant is planted to create the M2
population.
M2 DNA is extracted from leaf tissue. DNA samples are pooled to increase
throughput and PCR amplified with dye-labeled PCR primers specific to a
target gene of interest. PCR products are denatured and allowed to reanneal to
form heteroduplexes. Heteroduplex DNA is then cleaved by Cel I and analyzed.
Illustration of a Cel I cleavage reaction. PCR primers that have been end-labeled with
two different color dyes (red and green arrows) are used to amplify a targeted region
of the genome in a pool of DNA consisting of multiple individuals. After PCR, DNA
fragments are denatured and allowed to reanneal to form homoduplexes and
heteroduplexes. Cel I is added to the reaction and cleaves DNA 3’ of the mismatch.
The cleavage reaction is concentrated, denatured and separated electrophoretically
on a LI-COR DNA analyzer.
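Because the two primers carry different dyes and Cel I cuts at the mismatch, the two labeled fragments arising from one cut site should add up to roughly the full amplicon length. A minimal Python sketch of that bookkeeping follows; the 1000 bp amplicon and the 5 bp sizing tolerance are assumptions chosen for illustration, not values from the slides.

```python
def celi_fragments(amplicon_len, cut_pos):
    """Sizes of the two labeled fragments produced by a single cut.

    cut_pos is the number of bases between the forward primer's 5' end
    and the cut site.
    """
    if not 0 < cut_pos < amplicon_len:
        raise ValueError("cut position must lie inside the amplicon")
    return cut_pos, amplicon_len - cut_pos


def consistent_pair(frag_a, frag_b, amplicon_len, tolerance=5):
    """True if two observed fragment sizes sum to the amplicon length.

    tolerance (bp) is an assumed allowance for gel sizing error.
    """
    return abs(frag_a + frag_b - amplicon_len) <= tolerance


print(celi_fragments(1000, 350))
print(consistent_pair(348, 653, 1000))
print(consistent_pair(300, 500, 1000))
```

A band in one dye channel without a complementary band in the other channel is a hint that the signal may be an artifact rather than a true mismatch.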
ECO-TILLING: DNA from many (eight) plants is pooled and amplified. The amplified products are denatured by
heating and then cooled slowly so that strands re-anneal at random, forming homo- and heteroduplexes.
The double-stranded products are digested with Cel I endonuclease and analyzed by gel electrophoresis.
Next Generation Sequencing
• Ultra high-throughput sequence analysis (UHTS)
• Several platforms, including 454, ABI SOLiD and Illumina, are
capable of generating 1 to 100s of Gb of DNA sequence in a
single run.
• Library preparations are relatively simple and kits are available.
• Data analysis is computationally challenging (need to process
Tb of data) and beyond the reach of many experimental
biologists.
TRANSCRIPTOME SEQUENCING
Library preparation included the following steps:
1. Purification of mRNA from Total RNA
2. Fragmentation of mRNA
3. First strand cDNA synthesis
4. Second strand cDNA synthesis
5. End Repair and Phosphorylation
6. Adenylation
7. PE Adapter Ligation
8. Selection of 200 +/- 25 bp
9. PCR amplification with common PE adapter primers
10. QC by Bioanalyzer
mRNA Sequencing Sample Preparation
1. Quality check of total RNA
Customers should carry out a quality check of
their total RNA by running it on a 1% agarose gel
and judging the integrity of the RNA after staining with
ethidium bromide.
High quality, intact RNA will show a 28S rRNA band at
4.5 kb that should be about twice the intensity of the
18S rRNA band at 1.9 kb. Both kb determinations
are relative to a 1 kb ladder. The mRNA will appear as
a smear from 0.5-6 kb.
Completely degraded RNA will appear as a very low
molecular weight smear.
Customers are to supply 10 µg of purified total RNA.
mRNA Sequencing Sample Preparation
2. mRNA Purification from Total RNA
mRNA is isolated from total RNA by binding the mRNA
to a magnetic oligo(dT) bead. mRNA has a polyA tail
and will bind to the oligo(dT) bead.
Total RNA containing mRNA is added to magnetic oligo(dT) beads:
5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’ (mRNA with polyA tail)
                            3’-TTTTTTTTTTTTT (oligo(dT) on bead)
mRNA in supernatant:
5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’
mRNA Sequencing Sample Preparation
3. Fragmentation of mRNA
The mRNA is fragmented into small pieces using divalent cations under elevated temperature.
mRNA: 5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’
Add fragmentation buffer and incubate in a PCR thermocycler
at 70°C for 5 minutes.
Random fragmentation generates fragments ranging in size
from 100 bases to 5000 bases.
4. First Strand cDNA Synthesis
The cleaved RNA fragments are copied
into first strand cDNA using reverse transcriptase
and a high concentration of random hexamer primers.
mRNA Sequencing Sample Preparation
5. Second Strand cDNA synthesis
Remove the strand of mRNA and synthesize a replacement strand generating double-stranded cDNA.
mRNA paired with first strand cDNA:
5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’ (mRNA)
3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’ (first strand cDNA)
RNase H activity enzymatically degrades the mRNA strand.
3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’
DNA Polymerase I activity generates second strand cDNA
to form double-stranded cDNA.
5’-TCGGAAGCTGAAGTGATCAGGGTTCAATAAAAAAAAAAAAA-3’
3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’
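The base pairing behind first and second strand synthesis can be checked with a few lines of Python. This is a sketch of the complementarity only, not of the enzymology; the sequence is the example mRNA shown on the slides, with the poly(A) tail length written as an illustrative 13 bases.

```python
# Pairing rules: RNA template -> DNA copy, and DNA -> DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna_5to3):
    """First strand cDNA, written 3'->5' so it lines up under the mRNA."""
    return "".join(RNA_TO_DNA[base] for base in mrna_5to3)


def second_strand_cdna(first_strand_3to5):
    """Second strand cDNA, written 5'->3' (same sense as the mRNA)."""
    return "".join(DNA_TO_DNA[base] for base in first_strand_3to5)


mrna = "UCGGAAGCUGAAGUGAUCAGGGUUCAAU" + "A" * 13  # tail length is illustrative
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)

print("mRNA   5'-" + mrna + "-3'")
print("first  3'-" + first + "-5'")
print("second 5'-" + second + "-3'")
```

The second strand has the same sequence as the original mRNA with U replaced by T, which is why the double-stranded cDNA preserves the transcript sequence.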
mRNA Sequencing Sample Preparation
6. Add ‘A’ Bases to the 3’ End of the DNA Fragments
An ‘A’ base is added to the 3’ end of the
blunt phosphorylated DNA fragments, using
the polymerase activity of Klenow fragment
(3’ to 5’ exo minus). This prepares the DNA
fragments for ligation to the adapters, which
have a single ‘T’ base overhang at their 3’ end.
Blunt phosphorylated fragment:
P5’-AGTCTTGGATCGAC-3’
3’-TCAGAACCTAGCTG-5’P
After addition of the ‘A’ base:
P5’-AGTCTTGGATCGACA-3’
3’-ATCAGAACCTAGCTG-5’P
7. Ligate Adapters to the DNA Fragments
Adapters are ligated to the ends of the
DNA fragments, preparing them to be
hybridized to the flow cell.
mRNA Sequencing Sample Preparation
8. Purify Ligation Products
The products from the ligation are purified on a
2% agarose gel to remove all unligated adapters,
remove any adapters that may have ligated to one
another, and select a size range of templates to go on
the cluster generation platform.
Excise a gel slice in the 200 bp (+/- 25 bp) range and
purify.
mRNA Sequencing Sample Preparation
9. Enrich the Adapter-Modified cDNA Fragments by PCR
PCR is used to selectively enrich those cDNA
fragments that have adapter molecules on both
ends, & to amplify the amount of cDNA in the
library. The PCR is performed with two primers
that anneal to the ends of the adapters
Amplify using the following PCR protocol:
* 30 seconds at 98°C
* 15 cycles of:
  10 seconds at 98°C
  30 seconds at 65°C
  30 seconds at 72°C
* 5 minutes at 72°C
* Hold at 4°C
Figure: steps of one PCR cycle.
Adapter-modified fragment:
P5’-AGTCTTGGATCGACA-3’
3’-ATCAGAACCTAGCTG-5’P
Denaturation:
P5’-AGTCTTGGATCGACA-3’
3’-ATCAGAACCTAGCTG-5’P
Primer Annealing and Extension:
P5’-AGTCTTGGATCGACA-3’
3’-NNNNNTCAGAACCTAGCTGTNN-5’
5’-NNTAGTCTTGGATCGACNNNNN-3’
3’-ATCAGAACCTAGCTG-5’P
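As a quick worked check of the enrichment protocol (30 s initial denaturation, 15 cycles of 10 s + 30 s + 30 s, 5 min final extension), the Python sketch below adds up the programmed time and the theoretical upper bound on amplification. It assumes perfect doubling in every cycle and ignores ramp times and the final hold, so real runs take longer and yield less.

```python
# Programmed step durations in seconds, as listed in the protocol.
PROGRAM = {
    "initial_denaturation_s": 30,
    "cycles": 15,
    "cycle_steps_s": {"denature_98C": 10, "anneal_65C": 30, "extend_72C": 30},
    "final_extension_s": 300,
}


def programmed_runtime_s(program):
    """Total programmed time, excluding temperature ramps and the final hold."""
    per_cycle = sum(program["cycle_steps_s"].values())
    return (program["initial_denaturation_s"]
            + program["cycles"] * per_cycle
            + program["final_extension_s"])


def max_fold_amplification(cycles):
    """Upper bound on amplification, assuming perfect doubling each cycle."""
    return 2 ** cycles


runtime = programmed_runtime_s(PROGRAM)
print(runtime, "s =", runtime / 60, "min")
print(max_fold_amplification(PROGRAM["cycles"]), "fold (theoretical maximum)")
```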
mRNA Sequencing Sample Preparation
10. Validate the Library
Perform the following quality control steps on the DNA
library:
Determine the library concentration by measuring its
absorbance at 260 nm. The yield from the protocol
should be between 500-1000 ng of DNA.
Measure the 260/280 ratio. It should be ~1.8-2.0.
Either load 10% of the volume of the library on a gel
and check that the size range is as expected, or run
the DNA library on an Agilent 2100 Bioanalyzer.
Determine the molar concentration of the library ready
for Cluster Generation.
Data from Agilent 2100 Bioanalyzer
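The molar concentration of a double-stranded DNA library is commonly estimated from its mass concentration and average fragment length. The Python sketch below uses the usual approximation of 660 g/mol per base pair; the 10 ng/µL concentration and 300 bp average length are hypothetical inputs, not measurements from this library.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def library_molarity_nm(conc_ng_per_ul, mean_length_bp):
    """Approximate molar concentration (nM) of a dsDNA library.

    nM = (ng/uL * 1e6) / (660 g/mol/bp * length in bp)
    """
    if conc_ng_per_ul < 0 or mean_length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_length_bp)


# Hypothetical library: 10 ng/uL with a mean fragment length of 300 bp.
print(round(library_molarity_nm(10, 300), 2), "nM")
```

The mean length should include the adapters, since the Bioanalyzer trace reports the size of the whole adapter-ligated fragment.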
Illumina's Sequencing Technology (sequencing by synthesis)
Illumina's technology generates large numbers of unique "polonies" (polymerase generated
colonies) by "bridge amplification"; these polonies can then be sequenced
simultaneously. The parallel reactions occur on the surface of a
"flow cell" (basically a water-tight microscope slide), which provides a
large surface area for many thousands of parallel chemical reactions.
Repeated denaturation and extension results in localized amplification of single
molecules in millions of unique locations across the flow cell surface. This process
occurs in what is referred to as Illumina's "cluster station", an automated flow cell
processor.
The use of physical location to identify unique reads is a critical concept for all
next generation sequencing systems. The density of the reads and the ability to
image them without interfering noise is vital to the throughput of a given
instrument. Each platform has its own unique issues that determine this number:
454 is limited by the number of wells in its PicoTiterPlate, Illumina is limited by
the fragment length that can effectively "bridge", and all providers are limited by flow
cell real estate.
Illumina sequencing
1. Fragment DNA and attach adapters.
2. Denature. Single-stranded (SS) DNA attaches randomly to the flow cell.
3. Bridge amplification with unlabeled dNTPs, using reverse and forward primers.
4. Fragments go from SS to double-stranded (DS).
5. Denature DS. SS templates remain anchored to the flow cell.
6. After multiple cycles, amplification is complete.
7. Add all 4 labeled bases. Laser excitation.
8. Image 1st base.
9. Add chemistry to remove the terminal label. Add all 4 reversible terminator bases.
10. Laser excitation. Record 2nd base.
11. Repeat cycles until ~100 bases are read. Reads are in both directions.
12. Align data, compare to reference, identify sequence.
FASTQ File Format
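A FASTQ record has four lines: a header starting with `@`, the base calls, a separator line starting with `+`, and one quality character per base. The Python sketch below reads such records and converts quality characters to Phred scores. It assumes the Phred+33 (Sanger) encoding; older Illumina pipelines used an offset of 64, so the offset should be checked against the actual data. The read shown is a made-up example.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ.

    Reading stops at end of file or at the first empty line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Convert quality characters to integer Phred scores (offset is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


example = io.StringIO("@read1\nACGT\n+\nII!I\n")  # hypothetical read
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
```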
Piper Transcriptomics
Genes Identified in the transcriptome of Piper colubrinum
challenged with Phytophthora
Stress inducible genes
• Betaine aldehyde dehydrogenase
• Catalase
• Chitinase class I & VII
• Glutathione-S-transferase
• Peroxidase
• Beta 1,3-glucanase
• Cu/Zn superoxide dismutase
• Manganese superoxide dismutase
• MAP kinase
• Osmotin
Genes involved in Biosynthesis of secondary metabolites
• CHI chalcone isomerase
• Chalcone synthase
• Cinnamate 4-hydroxylase
• Cinnamoyl-CoA reductase
• Geranyl geranyl pyrophosphate synthase
• hmg-CoA reductase
• Lycopene beta cyclase
• Phenylalanine ammonia lyase
• p-coumaroyl shikimate 3'-hydroxylase
• Transaldolase
Summary of NGS DATA
Plant Sample Name | Piper colubrinum | Piper nigrum
Sequence File Size | 37.70 MB | 76.06 MB
Maximum Sequence Length | 15769 | 10479
Minimum Sequence Length | 100 | 100
Average Sequence Length | 567.844 | 721.922
No. of Sequences | 62619 | 101284
Total Sequences Length | 35557875 | 73119148
Total Number of Non-ATGC Characters | 1316 | 1090
Percentage of Non-ATGC Characters | 0.00004 | 0.00001
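The derived rows of the summary can be recomputed from the raw counts as a consistency check. The Python sketch below does this for both samples using the reported numbers of sequences, total lengths and non-ATGC counts. Under this calculation the values reported as "Percentage of Non-ATGC Characters" appear to match the plain fraction (non-ATGC / total) rather than that fraction multiplied by 100; this is an observation from the arithmetic, not a statement from the slides.

```python
# Raw counts as reported in the summary of NGS data.
SAMPLES = {
    "Piper colubrinum": {"n_seq": 62619, "total_len": 35557875, "non_atgc": 1316},
    "Piper nigrum": {"n_seq": 101284, "total_len": 73119148, "non_atgc": 1090},
}


def summarize(counts):
    """Return (average sequence length, fraction of non-ATGC characters)."""
    average_len = counts["total_len"] / counts["n_seq"]
    non_atgc_fraction = counts["non_atgc"] / counts["total_len"]
    return average_len, non_atgc_fraction


for name, counts in SAMPLES.items():
    average_len, fraction = summarize(counts)
    print(f"{name}: average length {average_len:.3f}, "
          f"non-ATGC fraction {fraction:.5f}")
```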
Expression analysis I
Reference gene | Piper colubrinum: Contig length | Identity | Alignment length | E-value | Piper nigrum: Contig length | Identity | Alignment length | E-value
Calmodulin - A. thaliana | 1742 | 83.79 | 327 | 3e-61 | 1801 | 82.93 | 375 | 1e-63
Catalase - A. thaliana | 1576 | 78.83 | 1162 | 4e-94 | 1353 | 79.31 | 1020 | 8e-98
Geranylgeranyl transferase - A. thaliana | 1124 | 79.9 | 398 | 3e-35 | 542 | 81.46 | 329 | 1e-39
Heat shock protein-70 - A. thaliana | 2401 | 79.48 | 1433 | 7e-146 | 1857 | 78.33 | 1269 | 1e-88
Malate dehydrogenase - A. thaliana | 2304 | 77.95 | 1111 | 1e-70 | - | - | - | -
B1/WR_F1_F06 | 1620 | 97.92 | 96 | 8e-43 | 977 | 100 | 40 | 9e-17
Alpha amylase - Piper colubrinum | 2206 | 98.24 | 227 | 6e-109 | 2721 | 94.95 | 198 | 5e-82
Betaine aldehyde dehydrogenase - Piper colubrinum | 2595 | 98.82 | 254 | 4e-135 | 2482 | 86.92 | 237 | 3e-49
Expression analysis I (cont--)
Reference gene | Piper colubrinum: Contig length | Identity | Alignment length | E-value | Piper nigrum: Contig length | Identity | Alignment length | E-value
Aquaporin - Piper colubrinum | 2094 | 99.63 | 267 | 2e-149 | 1229 | 82.65 | 98 | 1e-10
Osmotin - Piper colubrinum (IIIr) | 297 | 99.35 | 155 | 4e-81 | 318 | 94.53 | 201 | 4e-87
Betaine-aldehyde dehydrogenase - Glycine max | 1698 | 77.27 | 726 | 3e-27 | 1726 | 77.61 | 603 | 2e-25
Cu/Zn superoxide dismutase - Gossypium hirsutum | 865 | 88.29 | 401 | 2e-72 | 797 | 82.93 | 375 | 5e-64
Mitogen-activated protein kinase (MAPK) - Gossypium hirsutum | 1798 | 77.89 | 995 | 3e-61 | 2854 | 78.35 | 485 | 3e-31
R gene fragment - Piper colubrinum | 3008 | 98.76 | 242 | 1e-129 | - | - | - | -
bZIP transcription factor - Oryza sativa | 2053 | 76.62 | 633 | 7e-16 | 1797 | 78.65 | 342 | 3e-21
beta-1,3-glucanase-like gene - Piper colubrinum | 1074 | 97.96 | 490 | 0 | 628 | 93.21 | 265 | 3e-108
1,3-glucanase-like mRNA, complete sequence - Piper colubrinum | 373 | 95.76 | 165 | 4e-75 | 628 | 97.97 | 345 | 2e-180
Expression analysis I (cont--)
Reference gene | Piper colubrinum: Contig length | Identity | Alignment length | E-value | Piper nigrum: Contig length | Identity | Alignment length | E-value
Peroxidase - Piper tuberculatum | 1234 | 92.92 | 424 | 3e-165 | 1228 | 92.45 | 424 |
Hydroxyproline-rich glycoprotein - Piper tuberculatum | 939 | 93.7 | 365 | 7e-156 | 832 | 91.87 | 332 |
Cinnamoyl CoA reductase - Piper nigrum | 1413 | 91.18 | 306 | 6e-111 | 1361 | 83.81 | 247 |
Peroxidase - Piper nigrum | 696 | 93.33 | 255 | 1e-98 | 1092 | 97.4 | 231 |
Pathogenesis related-1 - Triticum aestivum | 706 | 80.81 | 172 | 6e-17 | - | - | - | -
Phenylalanine ammonia lyase - V. vinifera | 2091 | 80.86 | 533 | 3e-67 | 2054 | 80.58 | 582 |
Expression analysis II
Gene Description | Expression in Piper colubrinum* | Expression in Piper nigrum*
ACC oxidase - Carica papaya | 7.57 | 0.00
PISTILLATA-like protein PI - Piper nigrum | 13.33 | 0.00
APETALA3-like protein AP3-2 - Piper nigrum mRNA | 41.38 | 0.00
Heat shock protein-70 cognate protein (ERD2) - Arabidopsis thaliana mRNA | 104.85 | 531.15
Cinnamoyl CoA reductase - Piper nigrum | 2558.75 | 2569.05
Alpha amylase - Piper colubrinum | 3.88 | 37020.49
Peroxidase - Piper tuberculatum | 5379.58 | 1514.76
Betaine aldehyde dehydrogenase - Piper colubrinum | 4033.85 | 5197.90
WRKY transcription fragment - Piper colubrinum | 6161.33 | 0.00
R gene fragment - Piper colubrinum | 14193.71 | 0.00
Hydroxyproline-rich glycoprotein - Piper tuberculatum | 11175.07 | 20795.28
Beta 1,3-glucanase - Piper nigrum | 30732.80 | 66399.50
Peroxidase - Piper nigrum | 75066.67 | 2016108.51
Beta-1,3-glucanase - Piper colubrinum | 26616.99 | 3473.43
Aquaporin - Piper colubrinum | 183601.93 | 2825.04
Piper colubrinum osmotin mRNA, complete cds | 6042.32 | 394.81
*based on average read depth
SNPs- Superoxide dismutase
FUTURE PROGRAMME
Genomic studies on host-pathogen (Piper - Phytophthora )
interactions:
1. QRT-PCR & cDNA Microarrays - for identification of virulence-associated Phytophthora genes and host-defense strategies
2. Targeted resequencing - for capture of DNA targets of specific
genes and allele mining in Piper
3. RNA silencing for validation of specific genes involved in host-pathogen interaction
Others
Whole genome sequencing of Piper nigrum and related species
T H A N K S