Identification and Characterization of Foodborne Pathogens by Whole Genome Sequencing: A Shift in Paradigm Peter Gerner-Smidt, MD, ScD Enteric Diseases Laboratory Branch EFSA Scientific Colloquium No 20 Use of Whole Genome Sequencing (WGS) of foodborne pathogens for public health protection Parma, Italy, 16-17 July 2014 National Center for Emerging and Zoonotic Infectious Diseases Division of Foodborne, Waterborne, and Environmental Diseases Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory Growth characteristics Phenotypic panels Agglutination reactions Enzyme immuno assays (EIAs) PCR DNA arrays (hybridization) Sanger sequencing DNA restriction Electrophoresis (PFGE, capillary) Each pathogen is characterized by methods that are specific to that pathogen in multiple workflows Separate workflows for each pathogen TAT: 5 min – weeks (months) Why Move Public Health Microbiology to WGS? • Consolidation of workflows in the lab • More efficient outbreak detection, investigation & control • Precise and flexible case definition – More outbreaks will be detected and solved when they are small – Scarce epi-resources may be focused • More efficient surveillance of sporadic infections • Source attribution analysis of sporadic disease • Focus on pathogens of particular public health importance: – Virulence – Resistance - Emerging pathogens - Rapidly spreading clones/ traits- Vaccine preventable diseases WGS in Public Health: The tools must be • Simple • Public health microbiologists are NOT bioinformaticians • Standard desktop software • Comprehensive • All characterization in one workflow • Work in a network of laboratories • Free sharing and comparison of data between labs • Central and local databases Genomics for Diagnostics and Surveillance is a Global Issue How can we implement genomics fast and efficiently at the global level? When to share data? Surveillance WGS should be shared in real-time What is the minimal IT-infrastructure? What technical gaps need to be filled? QA/QC Political and ethical barriers Generating succes stories: Proof-of-Concept Study on the Use of Real-Time Whole Genome Sequencing in Conjunction with Enhanced Surveillance for Listeriosis • Collaboration among the public health departments in the states, CDC, FDA, USDA, and NCBI • International component: Developing and refining bioinformatics ‘pipelines’ with partners in Belgium. Canada, Denmark, England, and France Why Listeria monocytogenes? Illness is rare but serious, costly, and commonly outbreak associated Controllable Current subtyping methods are not ideal Not highly discriminatory No evolutionary relationships Listeria genome is fairly small and relatively easy to sequence and analyze Strong epidemiology surveillance (Listeria Initiative) Strong regulatory component Approach: Sequence all clinical isolates in the U.S. during one year as close to real-time as possible in parallel with current surveillance PulseNet PFGE, strain characterization at CDC, interview of case-patients Evaluate data on a weekly basis Follow up on clusters detected Both PFGE and WGS defined clusters Upload sequences to NCBI (Genbank), a public database, as the sequences are generated With metadata that do NOT identify state or isolation date but with link to the PulseNet database Listeria Whole Genome Sequencing Surveillance in Numbers (8 months into the project, June 2014) ~ 690 clinical isolates sequenced in real-time ~ 75 environmental and historical clinical isolates sequenced 470 food isolates sequenced by FDA/GenomeTrakr 13 clusters identified by PFGE & WGS 3 clusters identified only by WGS Listeria Whole Genome Sequencing Surveillance Lessons Learned: Possible to perform WGS in real-time Historical data are critical for cluster determination WGS provides more resolution than PFGE One PFGE cluster could not be confirmed by WGS • Saving resources by interviewing only patients relevant to the cluster Suspected source disproved by WGS in one cluster A common source identified in one cluster until now Difficult to obtain relevant exposure information from patients A sporadic case was linked to a lettuce recall by WGS Many good analytical approaches yielding (almost) the same information WGS Minimum Spanning Tree of Listeria monocytogenes Connection between lettuce recall and a sporadic case? Courtesy Public Health Agency of Canada & the Canadian Food Inspection Agency Lessons learned: WGS may identify clusters not evident by PFGE Cluster 1403MLGX6-1 0.1 0.1 0.0 0.3 0.0 0.3 0.5 1.5 0.6 2.1 IL isolate 2 18 20 41 NYC25isolate 21 18 20 41 FL isolate 25 2 18 20 41 NYC25isolate 22 20 41 NYC25 isolate 23 18 20 41 201325isolate 2– Nearest Neighbor 41 LMO_14 LMO_13 LMO_12 LMO_11 LMO_10 LMO_7 wgMLST LMO_6 LMO_5 LMO_4 LMO_1 100 wgMLST (<All Characters>)wgMLST 99 5 cases, 3 PFGE patterns, 3 states All patients originally from former USSR or Poland 98 11 19 11 8 37 11 19 11 8 37 11 19 11 8 37 11 19 11 8 37 11 19 11 8 37 11 19 11 8 Lessons Learned: WGS May Identify Clusters Not Evident by PFGE 1403MLGX6-1WGS hqSNP tree How different are WGS profiles of isolates belonging to the same outbreak? 2013T-30500.00 77 2013T-30420.00 Salmonella serovar Heidelberg PFGE pattern JF6X01.0022 single state outbreaks in summer 2013 2013T-31110.00 2013T-3112910.01 77 0.01 0.002013T-311074 0.00 0.00 74 2013T-31090.00 0.00 2013T-304087 0.01 100 2-8 SNPs 2-16 SNPs AL funeral 0.03 2013T-3059-A 0.00 2013T-30580.01 0.32 105-113 SNPs 2013T-30410.02 0.19 0.18 0.14 0 Sporadic case, NC 2013AM-0488 410.02 0.00 100 2013AM-0486 0.01 0.14 0.04 NY daycare 2013K-1067- Sporadic case, NE 2013K-1038 0.00 0.19 100 2013K-10360.02 2013K-1040930.01 0.02 2013K-10390.00 0.05 4-5 SNPs 2013AM-0487 0.01 0.00 75 2013K-1037- Sporadic case, OH 3-6 SNPs (3-9 SNPs) CO family gathering 2012-2013 Salmonella serovar Heidelberg multistate outbreak associated with chicken from manufacturer X PFGE patterns Pattern 122 Pattern 672 Pattern 45 Pattern 22 Pattern 326 Pattern 41 Pattern 258 ** Chicken from manufacturer X *** Clinical isolate from 2011 89 2012K-1420 0.00251 84 0.00779 2012K-1421 0.03113 0.00481 2013K-1635 42 0.01289 2012K-1747 0.00251 0.00251 2012K-1549** 98 0.05323 100 0.02825 2012K-1550** 6-46 SNPs 0.03381 0.02869 2012K-1417 0.01266 87 2012K-1255 0.01005 0.00763 2012K-1256 0.04386 84 2012K-1254 0.02581 0.00701 2013K-1633** 84 0.02798 2012K-1315*** 0.00707 0.07858 2013K-1649** 100 95 0.02262 21 SNPs 0.04630 2013K-1650 0.01637 0.03288 87 100 2013K-0982 0.00319 0.01024 0.06016 2013K-1636 0.01701 7-60 SNPs 2013K-1361 0.03682 0.05173 2013K-0983 75 0.06983 2013K-1638 100 0.00807 0.01738 19 SNPs 0.08289 2013K-1639 0.03360 2013AM-0303 0.02883 100 2013K-0979 0.02834 0.11613 2013K-1275 95 0.01792 6-35 SNPs 2013K-0573** 0.02671 0.03086 72 2013K-0574 0.02037 0.00248 2013K-0980** 0.00000 0.03478 2013K-1274 0.02286 6-62 SNPs To SNP or Not to SNP? in public health Single Nucleotide Polymorphism (SNP) approaches Default for phylogenetic analyses of sequence data Comparative subtyping by nature Results difficult to communicate Computationally intensive = SLOW Gene- gene approach (wgMLST) Definitive subtyping • Leads to naming, tracking over time, easy communication Computationally more simple = FAST Sufficiently discriminatory? but… Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak High confidence core SNP It took days to It took days to generate this tree using our own NY – no soft cheese exposure No soft cheese exposure bioinformatics pipeline TX - no exposure info Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak High confidence core SNP Historical isolates from the plant environment added to the comparison (courtesy FDA/CFSAN) 0.005 Red= epi-related clinical isolates It took additional hours toclinical generate this tree Blue= retrospective controls or not outbreak related Green=bioinformatics historical environmental isolates from the plant using our own pipeline 100 100 100 CFSAN004365 100 CFSAN004359 CFSAN004361 CFSAN004360 CFSAN004358 CFSAN004353 2010L-1790 CFSAN004348 CFSAN004363 2013L-5298 CFSAN004355 CFSAN004356 CFSAN004352 CFSAN004369 2013L-5214 2011L-2809 CFSAN004377 CFSAN004375 100 CFSAN004374 100 CFSAN004373 CFSAN004349 CFSAN004364 2013L-5275 2013L-5223 2013L-5337 CFSAN004354 2013L-5357 CFSAN004372 CFSAN004370 100 CFSAN004371 CFSAN004368 CFSAN004350 CFSAN004357 2013L-5283 CFSAN004366 CFSAN004376 CFSAN004362 CFSAN004351 2012L-5487 2013L-5121 2013L-5284 2012L-5105 2012L-5274 100 2013L-5374 2013L-5301 Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak It took less than 5 minutes to draw this tree using wgMLST Gene – Gene Approach: Fixed set of genes (‘loci’) leading to typing schemes on different levels MLST Genus/Species Serotype AR eMLST cMLST wgMLST Concept of allelic variation, not only point mutations Evolutionary distance for events such as recombination and simultaneous close-range mutations are counted as one event Definitive subtyping Leads to nomenclature Requires curation Genes That May Be Targeted In a Gene-Gene Analytical Approach Virulence genes Housekeeping genes for MLST & eMLST Core (c) genes (‘present in all strains in a species’) Pan- genome (wg) (‘all genes in the whole population of a species’) Serotyping genes Genes for genus/species/subspecies identification Antimicrobial resistance genes Gene – Gene Approach for Naming Subtyping in Keep with Phylogeny (concept to be developed) 7 gene MLST Isolate A Isolate B Isolate C Isolate D ST24 ST24 ST24 ST31 - eMLST e12 e12 e12 e15 - cMLST wgMLST c48 c48 c45 c60 - w214 w352 w132 w582 Isolate A and B closely related Isolate C related to A and B but not as closely as A is to B Isolate D unrelated to all the other isolates Providing phylogenetic information in the name is important because isolates from the same source are more likely to be related than isolates from different sources Public Health WGS Workflow End users at CDC and in the States LIMS Sequencer External storage Raw sequences PH databases Closed Genus/species Serotype Pathotype Virulence profile AST Lineage Clone Sequence type Allele NCBI, ENA, BaseSpace SNPs Open Calculation engine Nomenclature server Allele databases Open Trimming, mapping, de novo assembly, SNP detection, allele detection Open GENUS/SPECIES: PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EaggEC) VIRULENCE PROFILE: stx2a, aagR, aagA, sigA, sepA, pic, aatA, aaiC, aap SEQUENCE TYPE: ST34 ANTIMICROBIAL RESISTANCE GENES: blaTEM-1 , blaCTX-M-15 All characteristics have been determined by whole genome sequencing (WGS) The strain contains Shiga toxin subtype 2a typically associated with virulent STEC It does not contain adherence and virulence factors (eae, ehxA) typically associated with virulent STEC It contains adherence and virulence factors typically associated with virulent EaggEc (aagR, aagA, sigA, sepA, pic, aatA, aaiC, aap) This genotype is associated with extremely high (>10%) rates of hemolytic uremic syndrome (HUS) Real-time Sharing Of Sequence Data Is The Key To Successful Surveillance ! Acknowledgements CDC: John Besser, Kelly Jackson, Amanda Conrad, Lee Katz, Steven Stroika, Heather Carleton, Zuzana Kucerova, Eija Trees, Katie Roache, Ashley Sabol, Andrew Huang, Lori Gladney, Maryann Turnsek, Ben Silk, Katie Fullerton, Matthew Wise, Ian Williams, Duncan MacCannell, Efrain Ribot, Patricia Griffin , Rob Tauxe, Chris Braden, Dana Pitts, Collette Leaumont, Patti Fields Public Health Agency of Canada Disclaimers: “The findings and conclusions in this presentation are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention” “Use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.” National Center for Emerging and Zoonotic Infectious Diseases Division of Foodborne, Waterborne, and Environmental Diseases 36 2013L-5439 WGS Analysis at CDC Similar results 2013L-5527 2013L-5545 21.5 hqSNPs [0-33] 2013L-5556 2013L-5203 2013L-5204 2013L-5308 2013L-5473 24 hqSNPs [0-94] 2013L-5085 2013L-5587 Circumstantial evidence suggesting pre-packaged lettuce is the likely food vehicle in a sporadic case of listeriosis 2013L-5333 2013L-5314 2013L-5182 2013L-5622 2013L-5623 2013L-5521 2013L-5345 2014L-6017 2013L-5548 43 hqSNPs Lettuce isolate Ohio isolate 2013L-5540 5 hqSNPs 2013L-5359 2013L-5297 66 hqSNPs [4684] 2013L-5194 2013L-5560 2014L-6086 WGS Analysis by Lee Katz, CDC 55 hqSNPs [5-66] 58 hqSNPs [46-58] 0.02 12 wgMLST (<All Characters>)wgMLST WGMLST_LMO0000877 SRR1016591 done 2013L-5521 20 3 WGMLST_LMO0000931 SRR1021895 done 3 WGMLST_LMO0000959 SRR1027086 done 2013L-5204 20 2013L-5587 20 WGMLST_LMO0000692 SRR1041615, SRR98. done SRR1030280 done 2013L-5314 20 2013L-5623 20 3 WGMLST_LMO0000994 WGMLST_LMO0000993 SRR1030279 done 3 WGMLST_LMO0000834 SRR1001027 done 2013L-5622 20 2013L-5473 20 WGMLST_LMO0000916 SRR1112081 done 3 WGMLST_LMO0000884 SRR1016925 done 2013L-5085 20 2013L-5712 20 WGMLST_LMO0000888 SRR1016594 done 3 WGMLST_LMO0000811 SRR988736 done 2013L-5527 20 2013L-5439 20 WGMLST_LMO0000906 SRR1021953 done 3 WGMLST_LMO0000926 SRR1016605 done 2013L-5545 20 2013L-5556 20 WGMLST_LMO0000686 SRR946033 done SRR1166834 done 2013L-5308 20 2014L-6017 20 3 WGMLST_LMO0001160 WGMLST_LMO0000918 SRR1021894 done 3 WGMLST_LMO0000675 SRR1041609, SRR95. done 2013L-5548 20 2013L-5297 20 WGMLST_LMO0000713 SRR972406 done SRR1016609 done 2013L-5345 20 2013L-5560 20 3 WGMLST_LMO0000930 WGLMO0001344 done 2014L-6149 20 14-3198_S1_L001 done CN 20 lettuce 3 2013L-5359 20 2013L-5540 20 WGMLST_LMO0000727 SRR972392 done WGMLST_LMO0000901 SRR1021948 done 3 3 3 3 3 3 3 3 3 3 3 LMO_15 LMO_14 LMO_13 LMO_12 LMO_11 LMO_10 LMO_9 LMO_7 LMO_6 LMO_4 LMO_3 LMO_1 100 99 98 Canadian Lettuce/OH case patient wgMLST NY___IDR1300025739 29 containing 27 29 26 *UPGMA41 tree 5 PNUSAL000320 22 26 41 HI___N13-140 29 27 29 26 PFGE matches to Ohio and 5 PNUSAL000348 22 26 41 WA___19002 29 27 29 26 5 PNUSAL000069 22 26 41 SC___LM14-001 29 27isolates 29 26 Canadian lettuce PNUSAL000383 WA___19014 5 22 26 41 29 27 29 26 *wgMLST groups these 5 PNUSAL000382 22 26 41 WA___19029 29 27 29 26 5 PNUSAL000212 26 41 29 27 as29well 26 isolates ofMD___MDA13096803 interest PNUSAL000305 NYC__nyc13-101340416 5 22 26 41 29 27 29 26 as supports the general tree PNUSAL000270 NY___IDR1300029167 5 22 26 41 29 27 29 26 layout by 5 PNUSAL000278 22 26 seen 41 MI___CL13-200750 29 hqSNP 27 29 26 5 PNUSAL000189 22 26 41 OH___2013042642 29 27 29 26 analysis PNUSAL000296 CA___M13X04891 5 22 26 41 29 27 29 26 30 2 30 2 30 2 30 2 30 2 30 2 30 2 30 2 30 2 30 2 30 2 30 2 5 PNUSAL000315 22 26 5 PNUSAL000063 22 26 41 WA___18947 29 27 29 41 NY___IDR1300016274 29 27 29 26 30 2 26 30 2 5 PNUSAL000548 22 55 PNUSAL000307 5 22 26 5 PNUSAL000052 22 55 41 MA___14EN0045 29 27 CT___344766001 41 29 27 41 SC___LM13-008 29 27 29 26 30 2 29 5 PNUSAL000256 22 26 OH case 5 PNUSAL000090 22 26 PNUSAL000319 5 22 26 5 PNUSAL000172 22 55 5 CA_lettuce 22 55 PNUSAL000104 5 22 55 5 PNUSAL000291 22 55 26 30 2 29 41 NYC__nyc13-101347849 29 27 29 OH___2013063623 41 29 27 29 26 30 2 26 30 2 26 30 2 41 OH___2014001036 29 27 29 26 30 2 41 26 30 2 26 30 2 26 30 2 29 27 29 MI___CL13-200523 41 84 27 29 MI___CL13-200753 41 29 27 29