The Whole Genome Sequencing Revolution

Martin Wiedmann
Gellert Family Professor of Food Safety
Department of Food Science
Cornell University, Ithaca, NY
E-mail: mw16@cornell.edu
Phone: 607-254-2838
Outline
• Subtyping for disease surveillance: from PFGE to WGS
• WGS challenges: when are two isolates the same or
different? Can we find identical isolates in different
locations?
• Looking in the future
PulseNet allows international outbreak detection and traceback: a hypothetical example
[Figure: food isolate deposited into PulseNet, matched to two human cases]
Whole Genome Sequencing
• It all started with the Human Genome Project
• Sequencing of a bacterial genome is now
feasible at costs of <$100/isolate
• Costs will continue to drop
• Commonly used platforms include
• Roche 454
• Illumina HiSeq/MiSeq
• Applied Biosystems SOLiD Systems
• Life Technologies/Thermo Fisher Ion Torrent
• PacBio RS
• Nanopore-based systems (e.g., Oxford Nanopore MinION)
The genome sequence revolution
DNA sequencing-based subtyping
[Figure: isolates ordered 1, 3, 2, 4]
Isolate 1  AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG
Isolate 2  AACATGCAGACTGACGATTCGTCGTAGGCTAGACGTTGACTG
Isolate 3  AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG
Isolate 4  AACATGCATACTGACGATTCGTCGAAGGCTAGACGTTGACTG
SNP: single nucleotide polymorphism
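SNP-based subtyping boils down to counting the positions at which two aligned sequences differ (a Hamming distance). The minimal Python sketch below applies that to the four 42-bp example sequences from this slide; the function names are illustrative, and real pipelines work on far longer, quality-filtered genome alignments.

```python
from itertools import combinations

# The four aligned example sequences from the slide (isolates 1-4).
ISOLATES = {
    1: "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    2: "AACATGCAGACTGACGATTCGTCGTAGGCTAGACGTTGACTG",
    3: "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    4: "AACATGCATACTGACGATTCGTCGAAGGCTAGACGTTGACTG",
}


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(isolates: dict[int, str]) -> dict[tuple[int, int], int]:
    """SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }


if __name__ == "__main__":
    for (i, j), d in pairwise_distances(ISOLATES).items():
        print(f"isolate {i} vs isolate {j}: {d} SNP(s)")
```

Isolates 1 and 3 come out identical (0 SNPs), isolate 2 differs from both by a single SNP, and isolate 4 is the most distant (2 to 3 SNPs from the others).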
Challenges with use of PFGE as a subtyping method in outbreak investigations
• Two isolates may show the same PFGE type even though they are genetically distinct
• PFGE only interrogates a small part of the genome
• Two isolates may show “slightly” different PFGE patterns (how slight? cf. the “3-band rule”) despite sharing a very recent common ancestor
• Such differences can be due to lateral gene transfer, plasmid loss, rearrangements, point mutations, etc.
[Figure: XbaI and SpeI PFGE patterns; includes isolates from a Salmonella outbreak linked to sausages (Rhode Island) and isolates from pistachios]
Den Bakker et al. 2011. AEM.
Tip-dated maximum clade credibility tree based on SNP data for 47 Montevideo isolates
• Salmonella Enteritidis is the most common cause of human salmonellosis
– but it is poorly resolved by current subtyping technologies.
[Figure: type frequency charts: PFGE (52 PFGE types), MLVA (98 MLVA types), and combined MLVA-PFGE (163 combined MLVA-PFGE types); combined types shown include B4, B34, G4, B21, BQ8, I5, W4, J4, D4, BN692, AI19, AC2, F2, V4, AG56 and J21]
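Combining two methods makes an isolate's type the pair (MLVA type, PFGE type), which is why the combined scheme resolves more types (163) than either method alone (98 and 52). The sketch below illustrates the counting with only the 16 combined type labels legible on the slide, read as pairs under the assumption that a label such as "B4" means MLVA type B plus PFGE type 4. The per-type frequencies were not recoverable, so this shows how types are counted, not the study's data; `count_types` is an illustrative name.

```python
# Combined MLVA-PFGE type labels from the slide, read as (MLVA type, PFGE type) pairs.
# Assumption: a label such as "B4" means MLVA type B combined with PFGE type 4.
COMBINED_TYPES = [
    ("B", "4"), ("B", "34"), ("G", "4"), ("B", "21"),
    ("BQ", "8"), ("I", "5"), ("W", "4"), ("J", "4"),
    ("D", "4"), ("BN", "692"), ("AI", "19"), ("AC", "2"),
    ("F", "2"), ("V", "4"), ("AG", "56"), ("J", "21"),
]


def count_types(pairs):
    """Return the number of distinct PFGE, MLVA and combined types among the pairs."""
    pfge = {p for _, p in pairs}
    mlva = {m for m, _ in pairs}
    combined = set(pairs)
    return len(pfge), len(mlva), len(combined)


pfge_n, mlva_n, combined_n = count_types(COMBINED_TYPES)
print(f"PFGE types: {pfge_n}, MLVA types: {mlva_n}, combined types: {combined_n}")
```

Even on this small subset, isolates sharing PFGE type 4 split into several combined types once their MLVA type is taken into account.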
Full genome sequencing identified the following differences between these isolates:
(i) 28 single nucleotide polymorphisms (SNPs) and
(ii) three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns.
Both isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes.
Gilmour et al. BMC Genomics 2010, 11:120
In addition, whole genome sequencing showed that 5 Listeria isolates collected in 2010 from
the same facility were also closely related genetically to isolates from ill people.
Listeria Outbreaks and Incidence, 1983-2014
[Figure: number of outbreaks and incidence (per million population) per year, 1983-2014]
Era                   Outbreaks per year   Median cases per outbreak
Pre-PulseNet          0.3                  69
Early PulseNet        2.3                  11
Listeria Initiative   2.9                  5.5
WGS                   8                    4.5
Data are preliminary and subject to change
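The era summary lends itself to a quick fold-change comparison against the pre-PulseNet baseline. The sketch below only re-expresses the slide's preliminary numbers as ratios; the dictionary layout and function name are illustrative.

```python
# Era summary from this slide: (outbreaks per year, median cases per outbreak).
# The slide marks these data as preliminary and subject to change.
ERAS = {
    "Pre-PulseNet": (0.3, 69),
    "Early PulseNet": (2.3, 11),
    "Listeria Initiative": (2.9, 5.5),
    "WGS": (8, 4.5),
}


def fold_change_vs_baseline(eras, baseline="Pre-PulseNet"):
    """Ratio of each era's outbreaks/year and median outbreak size to the baseline era."""
    base_rate, base_size = eras[baseline]
    return {
        era: (rate / base_rate, size / base_size)
        for era, (rate, size) in eras.items()
    }


for era, (rate_fold, size_fold) in fold_change_vs_baseline(ERAS).items():
    print(f"{era}: {rate_fold:.1f}x outbreaks per year, "
          f"{size_fold:.2f}x median cases per outbreak")
```

Relative to the pre-PulseNet era, the WGS era shows roughly 27 times as many outbreaks per year with a median size about one fifteenth as large (4.5 versus 69 cases), consistent with more, smaller clusters being detected.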
March 2015: Listeriosis cases linked to Blue Bell ice
cream
The challenge
• Identical bacteria (100% match over the whole
genome) can be found in different places that
can be potential sources of foodborne disease
outbreaks
The theoretical background
• Bacteria divide asexually: Bacterial populations can be seen as large
populations of “identical twins”
• Mutation rate during replication is low: at the extremes, suggested mutation rates range from 2.25 × 10⁻¹¹ to 4.50 × 10⁻¹⁰ per bp per generation
– With a genome size of around 5 million bp (5 × 10⁶), approx. 450 to 9,000 generations are needed for a single SNP difference to arise
– Eyre et al. estimated an evolutionary rate of 0.74 SNVs per successfully sequenced genome per year for C. difficile (N. Engl. J. Med. 2013)
• “Whole-genome sequencing … identified 13% of cases that were genetically
related (≤2 SNVs) but without any evidence of plausible previous contact
through a hospital, residential area, or family doctor.”
– Unknown bacterial generation time in different environments complicates
interpretation
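The 450 to 9,000 generation range follows from dividing one SNP by the expected number of new mutations per genome per generation (mutation rate × genome size). A minimal Python sketch, assuming a constant mutation rate along a single lineage; the function name is illustrative:

```python
def generations_per_snp(mutation_rate_per_bp: float, genome_size_bp: float) -> float:
    """Expected number of generations for one new SNP to arise in a single lineage.

    Assumes a constant per-bp, per-generation mutation rate, so the expected number
    of new mutations per genome per generation is rate * genome size.
    """
    mutations_per_generation = mutation_rate_per_bp * genome_size_bp
    return 1.0 / mutations_per_generation


GENOME_SIZE = 5e6  # approx. 5 million bp, as on the slide
for rate in (4.50e-10, 2.25e-11):  # the two extreme rates quoted on the slide
    gens = generations_per_snp(rate, GENOME_SIZE)
    print(f"rate {rate:.2e}/bp/generation -> ~{gens:,.0f} generations per SNP")
```

This gives about 444 and 8,889 generations, which the slide rounds to approx. 450 and 9,000. Converting generations into calendar time still requires the generation time, which is exactly the unknown flagged in the last bullet.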
2000 US outbreak - Environmental
persistence of L. monocytogenes
• 1988: one human listeriosis case linked to hot dogs produced by plant X
• 2000: 29 human listeriosis cases linked to sliced turkey meats from plant X
Real world observations
In one case, isolates with < 3 SNP differences were found in retail delis in three different states
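One way to see why a close SNP match needs epidemiological context is a simple molecular-clock calculation. The sketch below assumes, as a simplification not taken from these slides, that SNPs accumulate as a Poisson process at a constant rate per genome per year, independently on each of two lineages that split t years ago, so the expected pairwise distance is 2 × rate × t. Using the C. difficile rate of 0.74 SNVs per genome per year from Eyre et al. purely for illustration, it estimates how often such isolates would still differ by 2 or fewer SNPs; the function name and parameters are hypothetical.

```python
import math


def prob_within_threshold(rate_per_genome_per_year: float,
                          years_since_split: float,
                          max_snps: int) -> float:
    """P(pairwise SNP distance <= max_snps) under a simple Poisson molecular clock.

    Assumption: both lineages accumulate SNPs independently at the same constant
    rate, so the pairwise distance is Poisson with mean 2 * rate * years.
    """
    mean = 2 * rate_per_genome_per_year * years_since_split
    return sum(math.exp(-mean) * mean**k / math.factorial(k)
               for k in range(max_snps + 1))


RATE = 0.74  # SNVs per genome per year (C. difficile estimate, Eyre et al.)
for years in (0.5, 1, 2, 5):
    p = prob_within_threshold(RATE, years, max_snps=2)
    print(f"split {years} year(s) ago: P(<= 2 SNPs) = {p:.2f}")
```

Under these assumptions, two isolates whose lineages split a year ago would still fall within 2 SNPs about 80% of the time, so a close match alone cannot show a recent direct link; and the rate itself would differ by organism and by the number of generations per calendar year.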
Conclusions
• Even with WGS, epidemiological data are still essential
• The number of SNP or allele differences that is meaningful differs by organism, strain, outbreak/cluster, and growth environment
– The number of bacterial generations per calendar year can differ hugely (think dry environment versus active infection in an animal population)
• The best way to determine “meaningful” SNP differences is through a combination of phylogenetic and epidemiological data
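Because the meaningful number of SNP differences varies, cluster definitions are often written with an adjustable threshold. The sketch below uses single-linkage clustering (a common choice, but an assumption here, not a method prescribed on these slides) on the pairwise SNP distances of the four example isolates from the DNA sequencing-based subtyping slide; the function name is illustrative.

```python
# Pairwise SNP distances for the four example isolates (computed from the
# 42-bp sequences on the DNA sequencing-based subtyping slide).
DISTANCES = {
    (1, 2): 1, (1, 3): 0, (1, 4): 3,
    (2, 3): 1, (2, 4): 2, (3, 4): 3,
}


def single_linkage_clusters(isolates, distances, threshold):
    """Group isolates linked by a chain of pairs with <= threshold SNPs."""
    parent = {i: i for i in isolates}

    def find(i):
        # Walk up to the cluster representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for i in isolates:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


for threshold in (0, 1, 2):
    print(threshold, single_linkage_clusters([1, 2, 3, 4], DISTANCES, threshold))
```

Raising the threshold from 0 to 2 SNPs takes the same four isolates from three clusters to one, which is why the cut-off has to be weighed against phylogenetic and epidemiological data rather than fixed in advance.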
Looking in the future
• WGS will get cheaper and will be used more
– STEC next, probably Salmonella Enteritidis after that
– Detection of more clusters and outbreaks
• WGS database will grow rapidly with inclusion of environmental
isolates
– More outbreaks will be linked to their source by using WGS matches between food or environmental isolates and human isolates as a starting point
• Broader application of WGS by private labs, and maybe by customers and consumers?
Conclusions
• WGS is a game changer and will significantly improve
detection of outbreaks, adulteration, etc.
– False alarms will occur though
• Pathogen detection in environmental samples by regulatory agencies will lead to inclusion of WGS data in CDC/FDA/USDA databases (GenomeTrakr)
– Environmental pathogen monitoring by industry will
become even more important
Analysis of genome-wide SNPs (wgSNPs)
• Identifies all high-confidence SNPs over the whole genome (approx. 3 to 5 million nucleotides)
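At its core, a wgSNP analysis reports the alignment columns where isolates carry different bases. The minimal sketch below assumes the genomes are already aligned to a common reference, lists variable sites, and skips any column containing an ambiguous base (N) or a gap (-). Real pipelines decide what counts as "high confidence" from read depth, base quality and mapping quality, which this sketch does not model; names are illustrative.

```python
# Aligned example sequences (isolates 1-4 from the DNA sequencing-based subtyping slide).
ALIGNMENT = [
    "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    "AACATGCAGACTGACGATTCGTCGTAGGCTAGACGTTGACTG",
    "AACATGCAGACTGACGATTCGACGTAGGCTAGACGTTGACTG",
    "AACATGCATACTGACGATTCGTCGAAGGCTAGACGTTGACTG",
]
AMBIGUOUS = set("N-")  # crude stand-in for real quality filters


def snp_positions(alignment):
    """Return 1-based positions where the aligned sequences carry more than one base."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    positions = []
    for pos, column in enumerate(zip(*alignment), start=1):
        bases = set(column)
        if bases & AMBIGUOUS:
            continue  # skip low-confidence columns
        if len(bases) > 1:
            positions.append(pos)
    return positions


print(snp_positions(ALIGNMENT))
```

For the four example isolates this reports three SNP positions (9, 22 and 25).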
Whole genome multilocus sequence typing (wgMLST)
• Allows for simpler analysis and clear naming of
subtypes
• Performs comparison on a gene by gene level
             Isolate A   Isolate B   Isolate C
Gene 1       1           1           1
Gene 2       8           8           12
Gene 3       5           5           2
Gene 1,005   4           4           4
wgMLST type  A           A           B
Etc.
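The gene-by-gene comparison in the table can be sketched as follows, using only the four genes shown (a full scheme covers many more). The sketch assumes that genes missing from either profile are simply ignored and that type labels are handed out in order of first appearance; both are simplifications for illustration, not how any particular wgMLST scheme names its types, and the function names are hypothetical.

```python
# Allele calls per gene for the three isolates in the table (only the genes shown).
PROFILES = {
    "Isolate A": {"Gene 1": 1, "Gene 2": 8, "Gene 3": 5, "Gene 1,005": 4},
    "Isolate B": {"Gene 1": 1, "Gene 2": 8, "Gene 3": 5, "Gene 1,005": 4},
    "Isolate C": {"Gene 1": 1, "Gene 2": 12, "Gene 3": 2, "Gene 1,005": 4},
}


def allele_differences(p, q):
    """Number of genes called in both profiles whose allele numbers differ."""
    return sum(p[g] != q[g] for g in p.keys() & q.keys())


def assign_types(profiles):
    """Give identical allelic profiles the same type label, in order of first appearance."""
    labels = {}
    types = {}
    for isolate, profile in profiles.items():
        key = tuple(sorted(profile.items()))
        if key not in labels:
            # Single letters only: enough for this example, not for a real database.
            labels[key] = chr(ord("A") + len(labels))
        types[isolate] = labels[key]
    return types


print(assign_types(PROFILES))
print(allele_differences(PROFILES["Isolate A"], PROFILES["Isolate C"]))
```

Isolates A and B share every allele and therefore one type, while isolate C differs at two of the genes shown (Gene 2 and Gene 3) and gets its own type.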