Viral metagenomic analysis of sweet potato using high-throughput deep sequencing
Student: Thulile Faith Nhlapo
Supervisors: Dr. J. Rees, Prof. M.E.C Rey, Dr. D.A. Odeny
Collaborators: Ms J. Mulabisana, Dr. M. Cloete

Sweet potato viruses and their effects on production
• Sweet potato is highly nutritious and is used as a poverty alleviation crop (food security)
  – Good source of carbohydrates, proteins, fiber, iron, vitamin C, B, and vitamin A (beta carotene)
• Viral diseases can reduce crop quality and yield by up to 100%
• A collection of viruses may infect sweet potato (disease complex)
• In SA, 12 viruses have been identified, occurring either singly or in combination (viral synergy), decreasing yield by 50-100%
[Figure: sweet potato plants labelled "Healthy", "Potential infection" and "Viral infection/diseased"]

Sweet potato virus families
• Compared to viruses of other agriculturally important crops, sweet potato viruses have been poorly studied, but recently more viruses infecting sweet potato are being described
• Over 30 sweet potato viruses have been identified and assigned to 9 families
• 7 RNA virus families have been identified: Bromoviridae, Bunyaviridae, Closteroviridae, Comoviridae, Flexiviridae, Luteoviridae, and Potyviridae
• 2 DNA virus families have been identified: Caulimoviridae, Geminiviridae

Symptoms associated with viral infection
Symptoms observed on sweet potato plants in the field: (A, B) chlorotic spots with purple rings, (C) upward curling of leaves, (D) insect damage.
Symptoms observed in the glasshouse: (A) chlorotic spots with purple rings, (B) chlorotic spots with purple rings and purple-edged vein feathering, (C) upward curling of young leaves, (D) chlorotic spots and vein clearing.

Metagenomics and viral metagenomics?
• Metagenomics: “or community genomics, is an approach aimed at analyzing the genomic content of microbial communities within a particular niche”
• Viral metagenomics: the study of viral communities.
• Viral metagenomics can be used to analyse viral sequences in any sample type (soil, plant, water, human gut, etc.)
• It is a powerful tool for virus discovery and can be applied to the problem of determining etiology in diseases
• A metagenomic analysis is also not biased towards culturable organisms; therefore the total genetic diversity of microorganisms can be studied

Using next generation sequencing approach for metagenomics
1. Cloning dependent sequencing | 2. Deep sequencing
Expensive | Cheaper
Time consuming | Faster, accurate
Requires large amounts of DNA | Small amounts of DNA (detects low virus titers)
Inserts sometimes unstable | No cloning
Produces large contiguous sequences | Short reads
• Viral metagenomics: viruses have small genomes, so assembly is not a problem
• Bioinformatics: software and algorithms have been developed for the analysis of short reads

Aims
1. To carry out a metagenomic study of sweet potato viruses in the Western and Eastern Cape provinces of South Africa
2. To undertake genetic characterisation of sweet potato viruses under South African conditions in order to generate a basis for their classification
3. To explore diagnostic strategies using next generation sequencing (NGS)

Overview of metagenomics strategy
• Sampling (input): symptomatic and asymptomatic leaves
• DNA route: DNA isolation, RCA, sample preparation (Nextera)
• RNA route: RNA isolation, sample preparation (Ribo-Zero & TruSeq)
• Sequencing: MiSeq
• Output: data analysis (CLC Bio)

Overview of bioinformatics strategy
Sequence reads (raw data)
1. Download reference sequences (NCBI)
2. Trim reads for adaptors
2. Read map to reference viral genomes (0.8-0.99 stringency)
   – Unmapped reads: de novo assembly (25-64 k-mer), BLASTn, identify contigs
3. Extract new consensus sequence (BLASTn)
4. Retrieve full genomes of most closely related species
5. Multiple sequence alignment (MEGA 5.05; CLC Bio 6.0.1 with plug-ins for additional alignments: MUSCLE and ClustalW)
6. Pairwise comparison (sequence ID)
7. Phylogenetic tree
8. Full genomes
9. Design primers
10. Confirm by PCR

Sweet potato sampling sites
Date | Location | Type of farming
November 2012 | P.E. (Eastern Cape) | Subsistence
November 2012 | P.E. (Eastern Cape) | Subsistence
November 2012 | P.E. (Eastern Cape) | Subsistence
November 2012 | Alice (Eastern Cape) | Subsistence/Commercial
January 2013 | Klawer (Western Cape) | Commercial
January 2013 | Lutzville (Western Cape) | Commercial
January 2013 | Paarl (Western Cape) | Commercial
January 2013 | Franschhoek (Western Cape) | Commercial
[Map: sampling sites in the Eastern Cape and Western Cape, marked as symptomatic or asymptomatic; sample size = 20]

RESULTS

Rolling circle amplification (RCA) provides DNA sequencing template
• Genomic DNA (gDNA) isolation: Qiagen DNeasy Plant Mini Kit
• Rolling circle amplification (RCA): Illustra™ TempliPhi 100 Amplification Kit
• Nextera DNA sample preparation
• Sequencing on the Illumina MiSeq Benchtop Sequencer
[Gel images: lanes M and 1-20; size markers at 10 kb, 3 kb and 1 kb]
RCA products for Eastern Cape samples
DNA isolation of symptomatic and asymptomatic plants collected from the Eastern Cape

Results: DNA data, symptomatic samples
Western Cape sample (KT10): sequence identity and percentage genome coverage of circular DNA viruses and mitochondrial DNA
Reference genome | Percentage identity | Average coverage | Percentage of genome covered | Consensus length
Sweet potato geminivirus strain SPLCSPV (JQ621844) | 94.38% | 3 359 X | 99.3% | 2 769 bp
Sweet potato geminivirus strain SPMaV (JQ621843) | 98.10% | 2 940 X | 99.92% | 2 781 bp
Ipomoea batatas mitochondrial plasmid-like DNA (FN421476) | 100% | 3 713 X | 100% | 1 027 bp

Example of mapping and coverage: KT10
Reads mapped to SPLCSPV-ZA (94% similarity)
[Figure: read mapping showing the reference and the new consensus]

Neighbour-joining tree of geminiviruses
[Figure: neighbour-joining tree]

Ribo-zeroed total RNA provides sequencing template for RNA sequencing
• Total RNA isolation: Qiagen RNeasy Mini Kit
• DNase treatment of samples prior to sequencing
• rRNA depletion: Ribo-Zero™ Magnetic Kit (Plant Leaf)
• TruSeq Stranded Total RNA Sample Preparation
• Sequencing on the Illumina MiSeq Benchtop Sequencer
[Gel image: lanes M and 1-16; size markers at 10 kb, 3 kb and 1 kb]
RNA isolation of symptomatic and asymptomatic plants collected from the Eastern and Western Cape

Results: RNA data, symptomatic samples
• 1% of data mapped to viral genomes
• Majority of reads mapped to the sweet potato chloroplast genome
• Assembled near complete genomes of:
  – Sweet potato chlorotic stunt virus (SPCSV), RNA 2 segment
    • Still need to assemble RNA 1 segment (total genome size = 17 630 nt)
    • 3 593 reads out of 5 432 520, consensus length 4 811 bp, 61 X average coverage
  – Sweet potato feathery mottle virus (SPFMV)
    • Ordinary strain
    • Common strain (Sweet potato virus C) (SPVC)
  – Sweet potato virus G (SPVG)

Summary of results for RNA viruses
Reference genome | Percentage identity | Average coverage | Percentage of genome covered | Consensus length
Sweet potato virus C Peru (GU207957) | 94.07% | 446 X | 99.92% | 10 812 bp
Sweet potato feathery mottle virus (AB439206) | 93.96% | 255 X | 98.83% | 10 694 bp
Sweet potato virus G (JQ824374) | 97.92% | 51 X | 99.92% | 10 743 bp
Sweet potato chlorotic stunt virus RNA 2 (KC146843) | 96.99% | 750 X | 99.85% | 8 205 bp

Mapping sequence reads to SPFMV reference genome
• Consensus length = 10 694 bp
• Average coverage = 255 X
• New consensus shares 94% similarity with reference (variation)
[Figure: read mapping with zoom-in, showing the reference and the new consensus]

Neighbour-joining tree of criniviruses (SPCSV)
[Figure: neighbour-joining tree with clades labelled EA and WA]
Sweet potato chlorotic stunt virus isolates: WA = West African strain, EA = East African strain

Neighbour-joining tree of potyviruses (SPFMV, SPVC, SPVG)
[Figure: neighbour-joining tree with SPFMV lineages labelled EA & O, S and C]
Sweet potato feathery mottle virus isolates: EA = East African strain, S = S strain, C = Common strain, G = Sweet potato virus G, 2 = Sweet potato virus 2

Sequence data suggests multiple infection
[Table: columns Sample ID | DNA viruses | RNA viruses | Symptoms observed; the alignment of cells to samples is not recoverable from the slide text]
• Samples: KT10, KF1, K17, F11
• DNA viruses listed: SPLCSPV (JQ621844)*, SPMaV (JQ621843)
• RNA viruses listed: SPFMV 10-O strain (AB439206), SPCSV RNA 2 (KC146843), SPVG (JQ824374), SPVC (NC_014742)
• Symptoms listed: leaf curling, chlorosis, purpling leaves, purple ringspots, chlorotic spots, leaf vein feathering (with pigmentation), vein clearing

Observed symptoms on sweet potato plants. (A1) Purple ringspots and chlorotic spots on sample KT10; these symptoms are associated with Sweet potato feathery mottle virus (SPFMV). (A2) Upward curling of leaves, associated with Sweet potato leaf curl virus (SPLCV). (B) Upward curling of leaves and chlorotic spots on sample KF1, symptoms associated with SPLCV and SPFMV. (C) Purple ringspots, leaf vein feathering with purple pigmentation, and chlorotic spots on sample F11; these symptoms are associated with SPFMV and Sweet potato virus G (SPVG). (D) Chlorotic spots and vein clearing on sample K17, symptoms associated with Sweet potato virus C (SPVC), the C strain of the potyvirus SPFMV.

Sweet potato virus distribution
[Map: distribution across the Western Cape and Eastern Cape of SPFMV (O-strain), SPCSV, SPLCSPV-ZA, SPMaV-ZA, SPVG and SPVC (SPFMV C-strain)]

Advantages of this sequencing approach?
• Detects viruses by direct sequencing
• Generates complete or near complete viral genomes
• High average sequence depths
• Deep sequencing is an efficient diagnostic tool
  – Detected viral pathogens
  – Detected mixed infections
  – Detected diverse viral strains

Acknowledgements
• Supervisors
  – Dr. J. Rees
  – Prof. C. Rey
  – Dr. D. Odeny
• Collaborators
  – Julia Mulabisana
  – Dr. M. Cloete
• ARC-VOPI senior researchers, technicians, and staff
  – Sidwell Tjale
  – Thakhani Ramathavhatha
  – Dr. Laurie
• ARC-BTP senior students, researchers and bioinformaticians
• Farmers in Western and Eastern Cape
• This work is based on research supported in part by the National Research Foundation of South Africa (Grant reference number UID 79983)
• Other funding sources: ARC-PDP and DAFF

THANK YOU
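
Supplementary note: reading the coverage columns
The results tables report "average coverage" and "percentage of genome covered" for each reference genome. The analysis itself was done in CLC Bio, and the slides do not state how CLC Bio derives these values. The sketch below only illustrates the usual definitions, assuming a per-base read depth is available for every reference position. The function name `coverage_summary`, the `min_depth` threshold and the toy depth values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def coverage_summary(depths: Sequence[int], min_depth: int = 1) -> dict:
    """Summarise per-base read depth along one reference genome.

    depths    -- number of reads overlapping each reference position
    min_depth -- depth at which a position counts as covered
                 (assumed threshold, not taken from the slides)
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    genome_length = len(depths)
    covered_positions = sum(1 for d in depths if d >= min_depth)
    return {
        # Mean depth over all positions, reported as "N X" in the tables
        "average_coverage": sum(depths) / genome_length,
        # Share of reference positions with at least min_depth reads
        "percent_covered": 100.0 * covered_positions / genome_length,
    }


# Toy 10-base reference; these depths are made up for illustration only
toy_depths = [0, 4, 6, 6, 5, 3, 0, 0, 8, 8]
summary = coverage_summary(toy_depths)
print(summary)
```

With these definitions, a genome can have a high average coverage and still be less than 100% covered if a few positions receive no reads, which is why the tables report both columns.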