Allele Mining Tools

Johnson George K.
Indian Institute of Spices Research, Marikunnu P.O., Calicut 673012

Allele mining is a research field aimed at identifying allelic variation of relevant traits within genetic resources collections. For identified genes of known function and basic DNA sequence, genetic resources collections may be screened for allelic variation.

Breeding Synopsis

Some Facts
• In human beings, 99.9 percent of bases are the same.
• The remaining 0.1 percent makes a person unique.
  – Different attributes / characteristics / traits
    • how a person looks,
    • diseases he or she develops.
• These variations can be:
  – Harmless (change in phenotype)
  – Harmful (diabetes, cancer, heart disease, Huntington's disease, and hemophilia)
  – Latent (variations found in coding and regulatory regions that are not harmful on their own; the change in each gene only becomes apparent under certain conditions, e.g. susceptibility to lung cancer)

SNPs (a SNP is defined as a single base change in a DNA sequence)
• SNPs are found in coding and (mostly) noncoding regions.
• They occur with a very high frequency: about 1 in 1000 bases to 1 in 100 to 300 bases.
• The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.
• A SNP close to a particular gene acts as a marker for that gene.
• SNPs in coding regions may alter the structure of the protein encoded by that region.
[Figure: SNPs may or may not alter protein structure]

Applications of allele mining

Degenerate primer design and gene amplification
Degenerate primers for different groups of sequences, based on the motifs in the NBS region of R-genes

Targeted amplification of WRKY gene from Piper colubrinum
The WRKY domain is a DNA-binding domain found in a superfamily of plant transcription factors involved in the regulation of various physiological programs that are unique to plants, including pathogen defense.

[Figure: gel showing amplification of the WRKY gene from P. colubrinum; lanes M, 1, 2, 3, 4]
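To make the idea of a degenerate primer concrete, here is a minimal Python sketch that expands a primer written with IUPAC ambiguity codes into the individual sequences it stands for. The primer string is an illustrative assumption (a loose back-translation of the conserved W-R-K-Y-G amino-acid motif), not one of the primers used in this work, and the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer: W (TGG), R (MGN), K (AAR), Y (TAY), G (GG-)
primer = "TGGMGNAARTAYGG"
print(degeneracy(primer))  # 32
print(expand(primer)[:3])
```

Keeping the degeneracy low matters in practice, because each concrete sequence is present at only a fraction of the total primer concentration.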
Lane key for the WRKY gel: M, bp ladder; lanes 1-4, amplification using different degenerate primers.

Sequence analysis: WRKY gene from Piper colubrinum
[Figures: Blastp results; Blastp tree view, WRKY Piper colubrinum]

Steps involved in allele mining

The TILLING Method
Seeds are treated with a chemical mutagen to induce genetic variation, and then planted. The resulting M1 population of plants is chimeric for mutations. Therefore, one seed from each M1 plant is planted to create the M2 population. M2 DNA is extracted from leaf tissue. DNA samples are pooled to increase throughput and PCR amplified with dye-labeled PCR primers specific to a target gene of interest. PCR products are denatured and allowed to reanneal to form heteroduplexes. Heteroduplex DNA is then cleaved by Cel I and analyzed.

Illustration of a Cel I cleavage reaction
PCR primers that have been end-labeled with two different color dyes (red and green arrows) are used to amplify a targeted region of the genome in a pool of DNA consisting of multiple individuals. After PCR, DNA fragments are denatured and allowed to reanneal to form homoduplexes and heteroduplexes. Cel I is added to the reaction and cleaves DNA 3’ of the mismatch. The cleavage reaction is concentrated, denatured and separated electrophoretically on a LI-COR DNA analyzer.

ECO-TILLING
• DNA from many (eight) plants is pooled.
• The amplified products are denatured by heating and then cooled slowly so that the strands re-anneal at random, forming homo- and heteroduplexes.
• The double-stranded products are digested by Cel I endonuclease.
• The digestion products are separated by gel electrophoresis.

Next Generation Sequencing
• Ultra high-throughput sequence analysis (UHTS)
• Several platforms, including 454, ABI SOLiD and Illumina, are capable of generating 1 to 100s of Gb of DNA sequence in a single run.
• Library preparations are relatively simple and kits are available.
• Data analysis is computationally challenging (Tb of data need to be processed) and beyond the reach of many experimental biologists.

TRANSCRIPTOME SEQUENCING
Library preparation included the following steps:
1. Purification of mRNA from total RNA
2. Fragmentation of mRNA
3. First strand cDNA synthesis
4. Second strand cDNA synthesis
5. End repair and phosphorylation
6. Adenylation
7. PE adapter ligation
8. Selection of 200 +/- 25 bp
9. PCR amplification with common PE adapter primers
10. QC by Bioanalyzer

mRNA Sequencing Sample Preparation

1. Quality check of total RNA
Customers should carry out a quality check of their total RNA by running it on a 1% agarose gel and judging the integrity of the RNA after staining with ethidium bromide. High-quality, intact RNA will show a 28S rRNA band at 4.5 kb that should be about twice the intensity of the 18S rRNA band at 1.9 kb. Both kb determinations are relative to a 1 kb ladder. The mRNA will appear as a smear from 0.5-6 kb. Completely degraded RNA will appear as a very low molecular weight smear. Customers are to supply 10 µg of purified total RNA.

2. mRNA Purification from Total RNA
mRNA is isolated from total RNA by binding the mRNA to magnetic oligo(dT) beads. mRNA has a polyA tail and will bind to the oligo(dT) bead.
[Figure: total RNA containing mRNA is added to magnetic oligo(dT) beads; the polyA tail of the mRNA (5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’) pairs with the bead-bound oligo(dT) (3’-TTTTTTTTTTTTT); mRNA in supernatant]

3. Fragmentation of mRNA
The mRNA is fragmented into small pieces using divalent cations under elevated temperature.
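As a rough illustration of what random fragmentation does to a transcript, the sketch below cuts the example mRNA from the slides at randomly chosen positions. It is a toy model that assumes uniformly random break points and a fixed random seed; it does not model the chemistry of cation-induced cleavage, and the function name is hypothetical.

```python
import random


def fragment(rna: str, n_breaks: int, seed: int = 0) -> list[str]:
    """Cut an RNA string at n_breaks distinct random internal positions."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    cuts = sorted(rng.sample(range(1, len(rna)), n_breaks))
    bounds = [0] + cuts + [len(rna)]
    # Slice between consecutive boundaries to get the fragments in order.
    return [rna[start:end] for start, end in zip(bounds, bounds[1:])]


mrna = "UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA"
pieces = fragment(mrna, n_breaks=3)
for piece in pieces:
    print(len(piece), piece)
```

Joining the pieces in order gives back the original sequence, which is a quick sanity check that no bases were lost or duplicated by the slicing.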
[Figure: random fragmentation of the mRNA 5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’]
Add fragmentation buffer and incubate in a PCR thermocycler at 70 °C for 5 minutes. This generates fragments ranging in size from 100 bases to 5000 bases.

4. First Strand cDNA Synthesis
The cleaved RNA fragments are copied into first strand cDNA using reverse transcriptase and a high concentration of random hexamer primers.
[Figure: random hexamer primers annealed to the RNA fragments]

5. Second Strand cDNA Synthesis
Remove the strand of mRNA and synthesize a replacement strand, generating double-stranded cDNA.

mRNA:               5’-UCGGAAGCUGAAGUGAUCAGGGUUCAAUAAAAAAAAAAAAA-3’
First strand cDNA:  3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’

RNase H activity enzymatically degrades the mRNA strand.

First strand cDNA:  3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’

DNA Polymerase I activity generates second strand cDNA to form double-stranded cDNA.

Second strand cDNA: 5’-TCGGAAGCTGAAGTGATCAGGGTTCAATAAAAAAAAAAAAA-3’
First strand cDNA:  3’-AGCCTTCGACTTCACTAGTCCCAAGTTATTTTTTTTTTTTT-5’

6. Add ‘A’ Bases to the 3’ End of the DNA Fragments
An ‘A’ base is added to the 3’ end of the blunt phosphorylated DNA fragments, using the polymerase activity of Klenow fragment (3’ to 5’ exo minus). This prepares the DNA fragments for ligation to the adapters, which have a single ‘T’ base overhang at their 3’ end.

7. Ligate Adapters to the DNA Fragments
Adapters are ligated to the ends of the DNA fragments, preparing them to be hybridized to the flow cell.
[Figure: blunt phosphorylated fragment (P5’-AGTCTTGGATCGAC-3’ / 3’-TCAGAACCTAGCTG-5’P) and A-tailed fragment (P5’-AGTCTTGGATCGACA-3’ / 3’-ATCAGAACCTAGCTG-5’P)]

8. Purify Ligation Products
The products of the ligation are purified on a 2% agarose gel to remove all unligated adapters, remove any adapters that may have ligated to one another, and select a size range of templates to go on the cluster generation platform. Excise a gel slice in the 200 bp (+/- 25 bp) range and purify.

9. Enrich the Adapter-Modified cDNA Fragments by PCR
PCR is used to selectively enrich those cDNA fragments that have adapter molecules on both ends, and to amplify the amount of cDNA in the library. The PCR is performed with two primers that anneal to the ends of the adapters.

Amplify using the following PCR protocol:
• 30 seconds at 98 °C
• 15 cycles of:
  – 10 seconds at 98 °C
  – 30 seconds at 65 °C
  – 30 seconds at 72 °C
• 5 minutes at 72 °C
• Hold at 4 °C

[Figure: denaturation, primer annealing and extension of the fragment P5’-AGTCTTGGATCGACA-3’ / 3’-ATCAGAACCTAGCTG-5’P; sequences shown after extension: 5’-NNTAGTCTTGGATCGACNNNNN-3’ / 3’-NNNNNTCAGAACCTAGCTGTNN-5’]

10. Validate the Library
Perform the following quality control steps on the DNA library:
• Determine the library concentration by measuring its absorbance at 260 nm. The yield from the protocol should be between 500 and 1000 ng of DNA.
• Measure the 260/280 ratio. It should be ~1.8-2.0.
• Either load 10% of the volume of the library on a gel and check that the size range is as expected, or run the DNA library on an Agilent 2100 Bioanalyzer.
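The validation checks above can be sketched as a small calculation. This is a minimal example under stated assumptions: it uses the standard conversion of one A260 unit to 50 ng/µl for double-stranded DNA (1 cm path length) and an average mass of about 660 g/mol per base pair. The function names, the sample readings, the 30 µl volume and the 250 bp mean fragment length are all hypothetical values chosen for illustration.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # standard factor for dsDNA, 1 cm path
AVG_BP_MASS = 660.0              # approximate g/mol per base pair


def library_qc(a260: float, a280: float, volume_ul: float,
               dilution: float = 1.0) -> dict:
    """Concentration, total yield and purity checks from absorbance readings."""
    conc = a260 * DSDNA_NG_PER_UL_PER_A260 * dilution  # ng/µl
    yield_ng = conc * volume_ul
    ratio = a260 / a280
    return {
        "conc_ng_per_ul": conc,
        "yield_ng": yield_ng,
        "ratio_260_280": ratio,
        "yield_ok": 500 <= yield_ng <= 1000,   # range given in the protocol
        "purity_ok": 1.8 <= ratio <= 2.0,      # range given in the protocol
    }


def molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration of dsDNA to nanomolar."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


# Hypothetical readings for a 30 µl library
qc = library_qc(a260=0.5, a280=0.27, volume_ul=30)
print(qc)
print(molarity_nm(qc["conc_ng_per_ul"], mean_fragment_bp=250))  # about 151.5 nM
```

The mean fragment length used for the molar conversion would normally come from the gel or Bioanalyzer trace rather than from an assumed value.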
Finally, determine the molar concentration of the library so that it is ready for cluster generation.
[Figure: sample data from Agilent 2100 Bioanalyzer]

Illumina's Sequencing Technology (sequencing by synthesis)
Illumina's sequencing technology uses "bridge amplification" to generate large numbers of unique "polonies" (polymerase-generated colonies) that can be sequenced simultaneously. These parallel reactions occur on the surface of a "flow cell" (basically a water-tight microscope slide), which provides a large surface area for many thousands of parallel chemical reactions. Repeated denaturation and extension results in localized amplification of single molecules in millions of unique locations across the flow cell surface. This process occurs in what is referred to as Illumina's "cluster station", an automated flow cell processor.

The use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to image them without interfering noise is vital to the throughput of a given instrument. Each platform has its own unique issues that determine this number: 454 is limited by the number of wells in its PicoTiterPlate, Illumina is limited by the fragment length that can effectively "bridge", and all providers are limited by flow cell real estate.

Illumina sequencing
1. Fragment DNA and attach adapters.
2. Denature. Single-stranded (SS) DNA attaches randomly to the flow cell.
3. Bridge amplification with unlabeled dNTPs, using reverse and forward primers.
4. Fragments go from SS to double-stranded (DS).
5. Denature DS. SS templates remain anchored to the flow cell.
6. After multiple cycles, amplification is complete.
7. Add all 4 labeled bases. Laser excitation.
8. Image 1st base.
9. Add chemistry to remove the terminal label. Add all 4 reversible terminator bases.
10. Laser excitation. Record 2nd base.
11. Repeat cycles until ~100 bases are read. Reads are taken in both directions.
12.
Align data, compare to reference, and identify the sequence.

FASTQ File Format

Piper Transcriptomics

Genes identified in the transcriptome of Piper colubrinum challenged with Phytophthora
(stress-inducible genes and genes involved in biosynthesis of secondary metabolites)
• Betaine aldehyde dehydrogenase
• CHI chalcone isomerase
• Catalase
• Chalcone synthase
• Chitinase class I & VII
• Cinnamate 4-hydroxylase
• Glutathione-S-transferase
• Peroxidase
• Cinnamoyl-CoA reductase
• Geranyl geranyl pyrophosphate synthase
• Beta 1,3-glucanase
• HMG-CoA reductase
• Cu/Zn superoxide dismutase
• Manganese superoxide dismutase
• Lycopene beta cyclase
• MAP kinase
• Phenylalanine ammonia lyase
• p-Coumaroyl shikimate 3'-hydroxylase
• Osmotin
• Transaldolase

Summary of NGS Data

Plant sample name                        Piper colubrinum    Piper nigrum
Sequence file size                       37.70 MB            76.06 MB
Maximum sequence length                  15769               10479
Minimum sequence length                  100                 100
Average sequence length                  567.844             721.922
No. of sequences                         62619               101284
Total sequences length                   35557875            73119148
Total number of non-ATGC characters      1316                1090
Percentage of non-ATGC characters        0.00004             0.00001

Expression analysis I

Reference gene                                      Piper colubrinum                      Piper nigrum
                                                    Contig  Identity  Alignment  E-value  Contig  Identity  Alignment  E-value
                                                    length            length              length            length
Calmodulin - A. thaliana                            1742    83.79     3273       3e-61    1801    82.93     375        1e-63
Catalase - A. thaliana                              1576    78.83     1162       4e-94    1353    79.31     1020       8e-98
Geranylgeranyl transferase - A. thaliana            1124    79.9      398        3e-35    542     81.46     329        1e-39
Heat shock protein-70 - A. thaliana                 2401    79.48     1433       7e-146   1857    78.33     1269       1e-88
Malate dehydrogenase - A. thaliana                  2304    77.95     1111       1e-70    -
B1/WR_F1_F06                                        1620    97.92     96         8e-43    977     100       40         9e-17
Alpha amylase - Piper colubrinum                    2206    98.24     227        6e-109   2721    94.95     198        5e-82
Betaine aldehyde dehydrogenase - Piper colubrinum   2595    98.82     254        4e-135   2482    86.92     237        3e-49

Expression analysis I (cont.)

Reference gene                                                  Piper colubrinum                      Piper nigrum
                                                                Contig  Identity  Alignment  E-value  Contig  Identity  Alignment  E-value
                                                                length            length              length            length
Aquaporin - Piper colubrinum                                    2094    99.63     267        2e-149   1229    82.65     98         1e-10
Osmotin - Piper colubrinum (IIIr)                               297     99.35     155        4e-81    318     94.53     201        4e-87
Betaine-aldehyde dehydrogenase - Glycine max                    1698    77.27     726        3e-27    1726    77.61     603        2e-25
Cu/Zn superoxide dismutase - Gossypium hirsutum                 865     88.29     401        2e-72    797     82.93     375        5e-64
Mitogen-activated protein kinase (MAPK) - Gossypium hirsutum    1798    77.89     995        3e-61    2854    78.35     485        3e-31
R gene fragment - Piper colubrinum                              3008    98.76     242        1e-129   -
bZIP transcription factor - Oryza sativa                        2053    76.62     633        7e-16    1797    78.65     342        3e-21
beta-1,3-glucanase-like gene - Piper colubrinum                 1074    97.96     490        0        628     93.21     265        3e-108
1,3-glucanase-like mRNA, complete sequence - Piper colubrinum   373     95.76     165        4e-75    628     97.97     345        2e-180

Expression analysis I (cont.)

Reference gene                                          Piper colubrinum                      Piper nigrum
                                                        Contig  Identity  Alignment  E-value  Contig  Identity  Alignment  E-value
                                                        length            length              length            length
Peroxidase - Piper tuberculatum                         1234    92.92     424        3e-165   1228    92.45     424
Hydroxyproline-rich glycoprotein - Piper tuberculatum   939     93.7      365        7e-156   832     91.87     332
Cinnamoyl CoA reductase - Piper nigrum                  1413    91.18     306        6e-111   1361    83.81     247
Peroxidase - Piper nigrum                               696     93.33     255        1e-98    1092    97.4      231
Pathogenesis related-1 - Triticum aestivum              706     80.81     172        6e-17    -
Phenylalanine ammonia lyase - V. vinifera               2091    80.86     533        3e-67    2054    80.58     582

Expression analysis II

Gene description                                                          Expression in       Expression in
                                                                          Piper colubrinum*   Piper nigrum*
ACC oxidase - Carica papaya                                               7.57                0.00
PISTILLATA-like protein PI - Piper nigrum                                 13.33               0.00
APETALA3-like protein AP3-2 - Piper nigrum mRNA                           41.38               0.00
Heat shock protein-70 cognate protein (ERD2) - Arabidopsis thaliana mRNA  104.85              531.15
Cinnamoyl CoA reductase - Piper nigrum                                    2558.75             2569.05
Alpha amylase - Piper colubrinum                                          3.88                37020.49
Peroxidase - Piper tuberculatum                                           5379.58             1514.76
Betaine aldehyde dehydrogenase - Piper colubrinum                         4033.85             5197.90
WRKY transcription factor fragment - Piper colubrinum                     6161.33             0.00
R gene fragment - Piper colubrinum                                        14193.71            0.00
Hydroxyproline-rich glycoprotein - Piper tuberculatum                     11175.07            20795.28
beta-1,3-glucanase - Piper nigrum                                         30732.80            66399.50
Peroxidase - Piper nigrum                                                 75066.67            2016108.51
beta-1,3-glucanase - Piper colubrinum                                     26616.99            3473.43
Aquaporin - Piper colubrinum                                              183601.93           2825.04
Piper colubrinum osmotin mRNA, complete cds                               6042.32             394.81
*based on average read depth

SNPs: Superoxide dismutase

FUTURE PROGRAMME
Genomic studies on host-pathogen (Piper - Phytophthora) interactions:
1. qRT-PCR & cDNA microarrays: for identification of virulence-associated Phytophthora genes and host-defense strategies
2. Targeted resequencing: for capture of DNA targets of specific genes and allele mining in Piper
3. RNA silencing: for validation of specific genes involved in host-pathogen interaction

Others
Whole genome sequencing of Piper nigrum and related species

THANKS