E.-coli-outbreak-workshop

Exploring a fatal outbreak of Escherichia coli using PATRIC
On May 19, 2011, the Robert Koch Institute, Germany's national-level public health
authority, was informed about a cluster of three cases of the hemolytic–uremic
syndrome in children admitted on the same day to the Hamburg university hospital.
As the number of affected children began to rise, they realized that they had a
problem on their hands. They also began to see adults who had been sickened, and
that number also began to increase. What was now considered an epidemic began to
spread throughout Europe. The hemolytic–uremic
syndrome associated with the epidemic has been characterized by the triad of acute
renal failure (an abrupt loss of kidney function that develops within 7 days),
hemolytic anemia (a condition in which red blood cells are destroyed and removed
from the bloodstream) and thrombocytopenia (low platelet count). Diarrhea-associated
hemolytic–uremic syndrome occurs primarily in children, and a
precipitating infection with Shiga-toxin–producing Escherichia coli, mainly of
serotype O157:H7, is usually the primary cause. In adults, the hemolytic–uremic
syndrome with prodromal diarrhea, indicating an infectious cause, is a rare event.
The serotype of the E. coli outbreak strain was determined to be O104:H4. A
comparative genomic examination showed that the pathogen possessed genes
typical of enteroaggregative E. coli, such as attA, aggR, aap, aggA, and aggC, located
on a virulence plasmid. In addition, the strain carried the gene for a Shiga-toxin 2
variant (stx2a). Other typical Shiga-toxin–producing E. coli genes such as stx1, eae,
and ehx were missing.[1]
Using the genomes isolated from this outbreak, we will use PATRIC tools to examine
the presence or absence of specific genes, and also compare the outbreak genomes
to other similar genomes to see whether they show the same patterns of genes.
Creating genome groups
1. Log in to the PATRIC website so that you can use your workspace in the
downstream analysis.
2. On the PATRIC homepage (patricbrc.org), open the Organisms tab at the top of
the page.
3. When the tab opens to reveal the box listing the names of pathogens, click on
Escherichia.
4. This will take you to the landing page for Escherichia, which summarizes all the
information that PATRIC has about the genus, including the number of genomes,
experiments associated with it, publications on it, and tools that can analyze the
available data sorted at that taxonomic level.
5. Find the tab across the top that is labeled “Genome List” and click on it.
6. This will take you to the Genome List for the genus Escherichia. On the left you
will see a dynamic filter, and on the right a table that lists the genomes.
7. At the top of the filter on the left hand side you can see a text box. Enter the word
“Germany” in that box and then hit return.
8. This will filter the table on the right hand side to show all the genomes that were
either isolated in Germany, or had that word mentioned in the information that was
submitted when the genome became public. Other information about these genomes
can be seen in the columns, including information like the host that the bacterium
was isolated from.
9. One of the columns to the right of this table is titled “Collection Date”. Click on
those words and it will sort the table in the order of the years that the bacteria were
collected.
10. One click will sort the table from the earliest collection date.
11. A second click shows the most recently collected genomes.
12. Check the box next to the name of each genome that was collected in 2011.
13. Click on “Add Genomes” next to the folder icon in the Workspace header.
14. This will open up a pop-up window that allows you to save the group.
15. Select the “Create New Group” option.
16. Name the group and click “Save to Workspace”. Now that data is saved and you
can use a number of tools to explore it.
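The filter, sort, and select steps above boil down to a keyword filter, a sort on collection date, and a match on the year. The following is a minimal Python sketch of that logic only. It is not how PATRIC is implemented, and the records, field names, and strain names are invented for illustration.

```python
# Hypothetical genome metadata records; names and values are invented
# for illustration and do not come from PATRIC.
genomes = [
    {"name": "Escherichia coli strain A", "country": "Germany", "year": 2011, "host": "Human"},
    {"name": "Escherichia coli strain B", "country": "Germany", "year": 2001, "host": "Human"},
    {"name": "Escherichia coli strain C", "country": "France", "year": 2011, "host": "Human"},
    {"name": "Escherichia coli strain D", "country": "Germany", "year": 2011, "host": "Bovine"},
]


def keyword_filter(records, keyword):
    """Keep records where the keyword appears in any field (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if any(needle in str(v).lower() for v in r.values())]


def sort_by_year(records, newest_first=False):
    """Mimic clicking the Collection Date column once (oldest first) or twice."""
    return sorted(records, key=lambda r: r["year"], reverse=newest_first)


def select_year(records, year):
    """Mimic ticking the check boxes for one collection year."""
    return [r for r in records if r["year"] == year]


germany = keyword_filter(genomes, "Germany")
newest = sort_by_year(germany, newest_first=True)
group = {"name": "Germany 2011", "members": [r["name"] for r in select_year(newest, 2011)]}
print(group)
```

Note that the keyword filter matches any field, which mirrors the text box behaviour described in step 8: a genome is kept if "Germany" appears anywhere in its metadata, not only in the isolation country.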
Assignment
Create genome groups for the three categories below. Use the dynamic filter on the
Genome List page, and remember that you can use the text box at the top to filter on
specific terms (hint: like O104). You can also use the filters underneath the text box
to further refine your search (hint: Isolation Country and Collection Date). When
you complete your assignment, you will have four different groups, including the
one we just created.
• Create a group that contains all the O104 genomes collected in 2011 in Europe,
excluding Germany.
• Create a group that contains all the E. coli genomes collected in 2011 in the
United States.
• Create a group that contains all the O104 genomes available in the PATRIC
database, excluding those collected in 2011.
Comparing genome groups in PATRIC using the Protein Family Sorter tool
1. To look for presence or absence of the protein families within a genome group
that you have created, click on the Tools tab and under Comparative Genomics,
select the Protein Family Sorter tool.
2. This will take you to the landing page for that tool.
3. Scroll down in the Select Organism box until you see the genome groups you
created. Select the boxes for the Germany 2011 group, the O104 group from 2011
that doesn’t include Germany, and the O104 group that contains genomes isolated in
years other than 2011.
4. Hit the select button under the keyword search box.
5. This takes you to the Protein Family Sorter landing page. On the right you will see
a dynamic filter, and on the left a table that lists all the protein families.
6. One way you can examine differences in your genome groups is to visualize the
data. To do this, click on the Heatmap tab at the top of the table (next to the Table tab).
7. This will take you to the heatmap view, where absence (black cells) and
presence (yellow, mustard and orange cells) can be seen across all genomes.
The genomes are on the y-axis, and the protein families on the x-axis.
8. You can order the protein families by the way the genes occur in a given genome.
This is a good way to check for something called genomic islands, which are parts of
a genome that were not directly inherited, but were acquired from other bacteria
by what is called horizontal transfer. To do this using the Protein Family
Sorter, click on the down arrow in the text box next to the words Advanced
Clustering.
9. This will open up a list of genomes that are included in the groups. Scroll down
until you find one of the German genomes (Escherichia coli O104:H4 str. Ty-2482).
Click on that name.
10. This will order all the protein families along the order that the genes occur in
the Ty-2482 strain. You’ll notice that several of the genomes appear to have long
black boxes associated with them. This means that these genomes could be missing
a long section of the genome that is present in the reference strain. This is an
indication of a genomic island.
11. To explore a particular section, you should use your mouse to draw a box around
the area of the genome that is next to a black box.
12. This generates a pop-up window that gives the user choices on what they want
to do with the selected data. Click the Show Proteins button at the bottom of the
pop-up window.
13. This will open a new window that shows the genes found in that section of the
heatmap view that you selected.
14. To see the order the genes occur in, first resize the table by changing the number
at the bottom of it to include all the genes and hit return.
15. Then at the top of the table, click once on the column head that reads
“Alternative Locus Tag” to reorder the genes from first to last.
16. You can see that the majority of these genes are sequential (each of the locus
tags increases numerically by one). Moreover, many of the names of these genes
include the word “phage”. This is short for “bacteriophage,” a virus that infects
bacteria. Bacteriophages are often associated with horizontal transfer of DNA, the
transfer of genes between organisms in a manner other than traditional
reproduction.
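The idea behind steps 7 to 16 can be sketched in code: order the protein families as they occur in a reference genome, mark each family as present or absent in every other genome, and look for long uninterrupted runs of absences. The Python sketch below uses invented family IDs and genome labels. The run-length threshold is an arbitrary assumption, and a long run of absences is only a hint of a genomic island, not proof.

```python
# Toy presence/absence data: genome -> set of protein family IDs.
# Family IDs (F1..F10) and genome labels are invented for illustration.
reference_order = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9", "F10"]
families_by_genome = {
    "reference": set(reference_order),
    "genome_x": {"F1", "F2", "F8", "F9", "F10"},
    "genome_y": {"F1", "F2", "F3", "F5", "F6", "F7", "F8", "F10"},
}


def presence_row(genome_families, order):
    """One heatmap row: 1 if the family is present, 0 if absent (a black cell)."""
    return [1 if fam in genome_families else 0 for fam in order]


def absent_runs(row, order, min_length=3):
    """Return runs of consecutive absent families at least min_length long."""
    runs, current = [], []
    for fam, present in zip(order, row):
        if present:
            if len(current) >= min_length:
                runs.append(current)
            current = []
        else:
            current.append(fam)
    if len(current) >= min_length:
        runs.append(current)
    return runs


def is_sequential(locus_numbers):
    """True if the numbers increase by exactly one, like neighbouring locus tags."""
    return all(b - a == 1 for a, b in zip(locus_numbers, locus_numbers[1:]))


for genome, fams in families_by_genome.items():
    row = presence_row(fams, reference_order)
    print(genome, row, absent_runs(row, reference_order))
```

In this toy data, genome_x lacks five neighbouring families and is reported as one long run, while genome_y lacks only scattered single families and produces no run. The helper is_sequential corresponds to the check in step 16 that the locus tag numbers of the selected genes increase by one.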
Assignment
Use the protein family sorter and the groups you created to answer the following
questions. Compare the groups from all the genomes collected in Germany in 2011
with the O104 genomes that were not isolated in 2011. Go to the heatmap view and
choose the Escherichia_coli_O104-H4_str_01-0959 (isolated in 2001) as the
reference. If you scroll down the heatmap (use the slider at the bottom of the view),
you will see a large black box in strain E112/10. Use your mouse to select the
proteins found in another genome that occur where the E112/10 genome is missing
them. Many of these are metabolic proteins. From the other classes you have had,
can you determine which pathways would be impacted in the E112/10 strain by not
having these genes?
Comparing genomes in PATRIC using the Protein Family Sorter tool to look for
specific genes.
1. To look for presence or absence of the protein families within a genome group
that you have created, click on the Tools tab and under Comparative Genomics,
select the Protein Family Sorter tool.
2. This will take you to the landing page for that tool.
3. Scroll down in the Select Organism box until you see the genome group you
created that contains the genomes from the Germany outbreak. Check the box in
front of that group.
4. We are going to see if these genomes have the Shiga toxin genes described. Enter
the word “Shiga” in the keyword search box and click on the Search button below
the box.
5. This returns a table that has a filter on the right, and the results on the left. You
can see that a single protein family has been found in these genomes.
6. If you look carefully at the name under the product description it says “Shiga-like
toxin II subunit B precursor”. The name is a hyperlink. Click on it.
7. This will take you to the summary information for all the genes in your genome
group that were in that particular protein family. This information includes the
names of the genomes, the various locus tags that identify the genes, and the length
of the proteins.
8. To find out more information about any of the genes, click on any locus tag in the
column called PATRIC ID.
9. This will take you to the landing page for that gene where all the information
available for it in PATRIC is summarized, including its different gene identifiers,
tools and resources that can be used to examine this gene, and any publications that
might have been written about it.
10. If you remember the story from above, the gene that was associated with the
outbreak was a Shiga-toxin 2 variant (stx2a). Strictly speaking, the “a” names the
toxin subtype, but Shiga toxin is built from an A subunit and B subunits, and we’re
looking at the “B” subunit here. What happened to A? Fortunately, the genes for the
two subunits generally travel in pairs, so let’s look at the genes around this one to
see if we can find A. To do this, in the tabs along the top of the page, click on the one
named “Genome Browser.”
11. This will open up a tool that shows you the gene you are looking at, and the
genes surrounding it. The Shiga toxin subunit B gene that we were looking at is the
fig|1048256.3.peg.1439 locus tag.
12. Mousing over the gene immediately upstream reveals the A subunit.
13. If you click on that gene in the genome browser, a pop-up box shows you
specific information about it. Double click on the first line under Feature Details.
14. This takes you to the landing page for this gene.
15. So there is a Shiga toxin subunit A in this genome. Why didn’t we see it in the
tool? Now you’re exposed to some of the problems research biologists have. The
gene is present, but we don’t see it because it has not yet been assigned to a protein
family. Look down under Functional Properties. You’ll see that next to FIGFam
Assignments, there is nothing assigned. This means that it is not assigned to a
protein family. Below I’ve provided a comparison of both the A and B subunits. You
can see that the Shiga toxin B subunit has a FIGFam assignment, but A does not. That’s
why only the B subunit is seen in the Protein Family Sorter.
[Figures: Shiga toxin A subunit; Shiga toxin B subunit]
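A few lines of code can illustrate why a gene without a family assignment drops out of a family-based view. The Python sketch below is a toy model, not PATRIC's implementation. The product names follow the text, while the gene labels and the family ID "FAM_B" are placeholders rather than real PATRIC identifiers.

```python
from collections import defaultdict

# Two toy gene records. The product names follow the text; the gene labels and
# the family ID "FAM_B" are placeholders, not real PATRIC identifiers.
features = [
    {"gene": "toxin_A", "product": "Shiga-like toxin II subunit A precursor", "family": None},
    {"gene": "toxin_B", "product": "Shiga-like toxin II subunit B precursor", "family": "FAM_B"},
]


def build_family_table(records):
    """Group genes by family ID; genes with no assignment cannot be placed."""
    table = defaultdict(list)
    for rec in records:
        if rec["family"] is not None:
            table[rec["family"]].append(rec)
    return dict(table)


def search_families(table, keyword):
    """Keyword search over the family table, matching product descriptions."""
    needle = keyword.lower()
    return sorted(
        fam for fam, members in table.items()
        if any(needle in m["product"].lower() for m in members)
    )


def search_features(records, keyword):
    """Keyword search over every gene, regardless of family assignment."""
    needle = keyword.lower()
    return [r["gene"] for r in records if needle in r["product"].lower()]


family_table = build_family_table(features)
print(search_families(family_table, "Shiga"))   # family-based view
print(search_features(features, "Shiga"))       # gene-based view
```

The family-based search only reaches the B subunit, because the A subunit never entered the family table. A search over the genes themselves finds both, which is what the Genome Browser step revealed.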
Searching for specific genes in PATRIC
Scientists studying the 2011 outbreak found that the genomes of the E. coli
bacteria associated with the epidemic carried certain genes that had previously been
associated with virulence (attA, aggR, aap, aggA, and aggC). In addition, these
strains also carried the gene for a Shiga-toxin 2 variant (stx2a). In contrast, these
same genomes were found to be missing other typical Shiga-toxin–producing E. coli
genes (stx1, eae, and ehx).
In this part of the exercise, we are about to embark on one of the most frustrating
aspects of searching for information that research biologists encounter. In an age
where there is an abundance of information about organisms, their genomes and
genes, and how those genes are expressed, scientists are often unable to find the
information that could help their research. Sometimes the data is located in
different repositories, and each of these places calls the genes by different names or
by different IDs. Scientists often rely on older publications that identify their gene
of interest by a certain name, and that name may no longer exist in any resource.
And sometimes, a specific annotation pipeline that is used to call the genes on a
genome and name them may not recognize that a specific gene is there.
Part of this exercise will be to try and map whatever data we can from the outbreak
genomes in PATRIC and find the discrepancies in the available information.
1. In the search box at the top of the page, enter stx2a and coli. This will narrow the
search to look at the E. coli genomes. Hit return.
2. This will take you to the Search Results page. This page always has the same
format: the genes that best match your search term are listed first, followed by
genomes. The search results also include taxonomy (if you’re looking for a species,
genus, family or higher) and experiments that match your search term.
[Figure: Search Results page with sections for Genes, Genomes, Taxonomy, and Experiments]
3. Look at the Features at the top of the results. These are the genes that match your
search.
[Figure: Features results. Callouts: 85 genes match the search terms; gene name;
genome name; RefSeq locus tag. A symbol marks a gene that is a RefSeq annotation,
and that may or may not have a PATRIC annotation.]
4. As there are 85 features that match this search, let’s be more specific and try to
refine it. In the search box enter stx2a and O104 and hit return.
5. The results table shows fewer genes.
6. Click on the name of the first gene in the list. This will take you to the landing
page for that gene.
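The way an extra search term narrows the results, and the way a search can be broadened again when nothing is found, can be sketched as a logical AND over keywords. This Python sketch is an assumption about the general idea, not a description of PATRIC's search engine. The records are illustrative stand-ins, not actual search results.

```python
# Toy annotation records; gene names, genome names, and products are
# illustrative stand-ins, not actual PATRIC search results.
records = [
    {"gene": "stx2a", "genome": "Escherichia coli O104:H4 str. X", "product": "Shiga toxin 2 subunit A"},
    {"gene": "stx2a", "genome": "Escherichia coli O157:H7 str. Y", "product": "Shiga toxin 2 subunit A"},
    {"gene": "stx1", "genome": "Escherichia coli O157:H7 str. Y", "product": "Shiga toxin 1 subunit A"},
    {"gene": "stx2a", "genome": "Another genus sp. str. Z", "product": "Shiga toxin 2 subunit A"},
]


def search(recs, *terms):
    """Return records whose combined text contains every term (logical AND)."""
    hits = []
    for rec in recs:
        text = " ".join(rec.values()).lower()
        if all(term.lower() in text for term in terms):
            hits.append(rec)
    return hits


def narrow_then_broaden(recs, gene, scopes):
    """Try each scope term in order and return the first scope with hits."""
    for scope in scopes:
        hits = search(recs, gene, scope)
        if hits:
            return scope, hits
    return None, []


print(len(search(records, "stx2a", "coli")))
print(len(search(records, "stx2a", "O104")))
print(narrow_then_broaden(records, "stx1", ["O104", "coli"])[0])
```

Starting with the most specific scope and falling back to a broader one keeps the result list short when a gene is annotated in the genomes of interest, and still finds it elsewhere when it is not.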
Assignment:
Use the landing page to fill out the table below, and then search for the other genes
in PATRIC. You will not be able to find all of them, and to locate some of them, you
may have to broaden your scope (Hint: Start with the O104 genomes, and then
change to “coli” if necessary).
Gene name   PATRIC locus tag         E. coli strain         FigFam number   Product description in PATRIC
attA
aggR
aap
aggA
aggC
stx2a       fig|1090928.3.peg.1113   O104:H4 str. E112/10   None            Shiga-like toxin II subunit A precursor (EC 3.2.2.22)
stx1
eae
ehx
In a previous exercise, you learned how to use the FIGFam IDs in the Protein Family
Sorter tool to see the presence or absence of certain genes across various genome
groups. Use this technique to examine the genomes from the 2011 German
outbreak.
• Which genes do the genomes share, and which are they lacking?
• Expand to the other outbreak genomes outside of Germany. Do they
have a similar pattern?
• Look carefully at the O104 genomes that were not part of the 2011
epidemic. Do any of those genomes have the same pattern as you see
in the German genomes? What are the differences?
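One way to organise the answers to these questions is to record, for each group, the fraction of genomes that carry each protein family of interest, and then look for families found in every genome of one group and in none of the other. The Python sketch below shows that bookkeeping with invented group contents and placeholder family IDs. It does not contain any real results.

```python
# Toy data: group -> genome -> set of family IDs. All names and IDs are invented.
groups = {
    "Germany 2011": {
        "g1": {"FAM_1", "FAM_2"},
        "g2": {"FAM_1", "FAM_2"},
    },
    "O104 other years": {
        "g3": {"FAM_1"},
        "g4": {"FAM_1", "FAM_3"},
    },
}
families_of_interest = ["FAM_1", "FAM_2", "FAM_3"]


def presence_fraction(group, family):
    """Fraction of genomes in the group that carry the family."""
    carriers = sum(1 for fams in group.values() if family in fams)
    return carriers / len(group)


def group_profile(all_groups, families):
    """Table of group -> family -> fraction of genomes carrying it."""
    return {
        name: {fam: presence_fraction(members, fam) for fam in families}
        for name, members in all_groups.items()
    }


def differing_families(profile, group_a, group_b):
    """Families that are in every genome of one group and in none of the other."""
    return [
        fam for fam in profile[group_a]
        if {profile[group_a][fam], profile[group_b][fam]} == {0.0, 1.0}
    ]


profile = group_profile(groups, families_of_interest)
for name, row in profile.items():
    print(name, row)
print(differing_families(profile, "Germany 2011", "O104 other years"))
```

Families with intermediate fractions, such as a family present in only half of a group, are deliberately not reported by differing_families. They are worth a closer look in the heatmap rather than a yes-or-no answer.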
References
1. Frank, C., et al., Epidemic profile of Shiga-toxin-producing Escherichia coli
O104:H4 outbreak in Germany. N Engl J Med, 2011. 365(19): p. 1771-80.