Exploring biochemistry using metabolic pathways in bacteria: Genome Reduction

Since the first bacterial genome was sequenced in 1995, we have come a long way in our understanding of these organisms. We now know the average size of a bacterial genome [approximately 4 million base pairs (Mbp)] and the average number of genes (approximately 3000). Since 2006, scientists have found bacteria with extremely small genomes. Many, but not all, of these bacteria are symbionts or pathogens that live within a host organism[1]. Originally, the smallest bacterial genome was thought to be that of Mycoplasma genitalium, weighing in at 0.58 Mbp with 475 genes. We now know that M. genitalium is at the larger end of the reduced genome size spectrum, with many genomes considerably smaller (see figure on right, from McCutcheon and Moran[1], which shows the relative sizes of some of these within an M. genitalium genome). The smallest is Candidatus Tremblaya princeps, which has an extremely reduced genome of 0.13 Mbp.

What are the impacts of having an extremely small genome? We will explore this question in PATRIC by examining these genomes and comparing them to a “normal”-sized bacterium, using metabolic pathways to search for differences and similarities. For this exercise we will use the table below.

| Genome ID | Organism | Genome Size (Mbp) | Protein Coding Genes | Lifestyle |
|---|---|---|---|---|
| 1423.136 | Bacillus subtilis ATCC 13952 | 3.87 | 4033 | Free-living |
| 243273.25 | Mycoplasma genitalium G37 | 0.58 | 542 | Free-living |
| 272947.5 | Rickettsia prowazekii str. Madrid E | 1.11 | 924 | Pathogenic |
| 392021.5 | Rickettsia rickettsii str. 'Sheila Smith' | 1.25 | 1567 | Pathogenic |
| 325775.3 | Coxiella endosymbiont of Amblyomma americanum C904 | 0.656 | 577 | Symbionts with reduced genomes |
| 372461.17 | Buchnera aphidicola BCc | 0.42 | 375 | Symbionts with reduced genomes |
| 261317.3 | Buchnera aphidicola (Cinara tujafilina) | 0.44 | 410 | Symbionts with reduced genomes |
| 36870.6 | Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis | 0.7 | 665 | Symbionts with reduced genomes |
| 36868.4 | Wigglesworthia glossinidia endosymbiont of Glossina morsitans | 0.71 | 663 | Symbionts with reduced genomes |
| 444179.5 | Candidatus Sulcia muelleri GWSS | 0.24 | 277 | Symbionts with extremely reduced genomes |
| 871271.4 | Candidatus Zinderia insecticola CARI | 0.2 | 304 | Symbionts with extremely reduced genomes |
| 1202539.3 | Candidatus Carsonella ruddii HT isolate Thao2000 | 0.15 | 174 | Symbionts with extremely reduced genomes |
| 387662.1 | Candidatus Carsonella ruddii PV | 0.15 | 179 | Symbionts with extremely reduced genomes |
| 573658.4 | Candidatus Hodgkinia cicadicola | 0.15 | 373 | Symbionts with extremely reduced genomes |
| 573234.4 | Candidatus Hodgkinia cicadicola Dsem | 0.14 | 177 | Symbionts with extremely reduced genomes |
| 1266371.3 | Candidatus Tremblaya phenacola PAVE | 0.17 | 205 | Symbionts with extremely reduced genomes |
| 891398.4 | Candidatus Tremblaya princeps PCIT | 0.13 | 233 | Symbionts with extremely reduced genomes |
| 1053648.4 | Candidatus Tremblaya princeps PCVAL | 0.13 | 235 | Symbionts with extremely reduced genomes |

Creating genome groups

1. Log in to the PATRIC website so that you can use your workspace in the downstream analysis.
2. On the PATRIC homepage (patricbrc.org), open the Tools tab at the top of the page.
3. When the tab opens to reveal the box listing the tools, click on Genome Finder (highlighted in dark blue below).
4. This will open the landing page for the Genome Finder tool. As you are logged into the website, any genome groups you have created will be visible in the box under “Select Organism(s)”. We will ignore those groups as we will be generating our own.
5. In the box below “Enter Keyword”, enter the genome IDs for the free-living bacteria in the table above (Hint: You can cut and paste directly from the table), then click the Search button.
6. This will take you to the results page for the Genome Finder tool.
On the left side you will see a dynamic filter that we won’t use in this exercise, and on the right side you will see a table that lists the best results of your search.
7. Pay close attention to the genomes that were returned. One of those is not from the list that we provided.
8. Select the genomes that match those from the table, then click on “Add Genomes” next to the folder icon in the Workspace header.
9. This will open a pop-up window that allows you to save the group.
10. Select the “Create New Group” option.
11. Name the group and click “Save to Workspace”. Now that the data is saved, you can use a number of tools to explore it.

Assignment

Create genome groups for the three categories below. Make sure that the genomes from the table end up in your groups. There might be some extras that you will have to weed out.

Pathogenic
Symbionts with Reduced Genomes
Symbionts with Extremely Reduced Genomes

Comparing pathways using the Comparative Pathway Tool

1. Open the Tools tab at the top of the page.
2. When the tab opens to reveal the box listing the tools, click on Comparative Pathway Tool (highlighted in dark blue below).
3. This will take you to the landing page for the Comparative Pathway View tool.
4. If you have a lot of genome groups, you will have to scroll through the Search box to find your genome group of interest. For this example, find the genome group you created that contains the free-living bacterial genomes (Bacillus subtilis and Mycoplasma genitalium) and click the check box in front of it.
5. Under Enter keyword, you will need to click the Search button.
6. This will return the pathway summary page, which summarizes all the information for specific pathways scoped to the genomes in your group (in this case, the two genomes from free-living bacteria).
7. To look more specifically at how the data is summarized, choose the first pathway on the list (in this case, the pathway called Ascorbate and aldarate metabolism) by clicking on the pathway name, which is a hyperlink.
8. The page that returns is the Pathway page. It has two parts. The summary table on the left shows the Enzyme Commission (E.C.) numbers that have been assigned to each gene. These numbers denote the specific metabolic function that a gene has.
9. The summary table also has other data. You can look more closely at the column heads by grabbing the edge of the column box that contains the title and moving it to the right. Text that is in bold (2.7.1.69 below) indicates that all genomes in the group have at least one gene with that particular EC number.
10. On the right you can see a pathway map that has the gene data mapped to it. The pathway maps we use in PATRIC were designed by the Kyoto Encyclopedia of Genes and Genomes (KEGG) group.
11. You’ll notice that there are boxes with numbers in them, and some of them have different colors. Boxes that are not colored (white) indicate that genes with this particular function are absent from the genome group you are examining. Boxes that are colored olive green indicate that some, but not all, of the genomes in your selection have a gene with this particular function. Boxes that are colored bright green mean that all the genomes in your selection have at least one gene with this particular function.
12. Tools at the top of the pathway map allow you to save the pathway summary to your computer, or expand the entire pathway map for easier viewing.
13. One of the problems with a pathway summary map is that it’s often difficult for researchers to see which organism has a specific enzyme and which is missing it. To see a summary of this type of information, find the Heatmap tab that is located next to the KEGG tab at the upper left and click on it.
14. This will take you to a data summary in a heatmap format. The legend is on the right, and the summary is on the left. The heatmap has the genomes on the x axis and the genes on the y axis. B. subtilis has a lot of genes involved in this pathway, and M. genitalium has only one. Remember that M. genitalium is free-living, but has a much smaller genome than B. subtilis.
15. Scrolling over a specific cell will provide information about the gene in the blue band above the genome names.
16. To see more details about a specific gene, you can double-click on a particular cell. In the picture above that would be the orange cell to the right of the gene name. This will generate a pop-up box that provides you with a number of choices.
17. Click the Show Proteins button.
18. This will take you to another pathway summary table that will tell you the other pathways this particular gene is involved in.

Comparing pathways across different genome groups

1. Open the Tools tab at the top of the page.
2. When the tab opens to reveal the box listing the tools, click on Comparative Pathway Tool (highlighted in dark blue below).
3. This will take you to the landing page for the Comparative Pathway View tool.
4. Select the free-living group and the group of symbionts with reduced genomes in the box below “Select organism(s).”
5. In this example, let’s specify a pathway earlier in the process. Enter Glycolysis in the text box. Glycolysis is a metabolic pathway that converts glucose to pyruvate, and it is thought to be conserved across nearly all organisms. The fact that it is so widely shared suggests that this is an ancient pathway that evolved very early. Click on the Select button.
6. This will take you to a table that has Glycolysis as the only entry.
7. Click on the name Glycolysis / Gluconeogenesis.
8. This will take you to the pathway number.
Note from the numbers in bold font (and the bright green boxes) that some EC numbers are represented by genes in all of the genomes, but many others appear to be specific, or limited, to certain genomes.
9. Click on the Heatmap tab.
10. This will show gene presence and absence for both genome groups.
11. You can change the order of the genomes in the table by clicking on the header of the genome you’re interested in and dragging it to the area of the heatmap where you would like to see it. Be careful not to release the click until you have moved the column, or it will generate the download pop-up box.
[Figure: heatmap column order, before and after moving a genome]
12. Refer to the table at the beginning of this document and arrange the genomes in order of their size, from smallest to largest. Sometimes this can reveal some interesting patterns.

Assignment: Answer the following questions using the PATRIC website.

1. Return to the Comparative Pathway tool and select all the genome groups you created for this exercise (free-living, pathogenic, symbionts with reduced genomes, symbionts with extremely reduced genomes). Enter Glycolysis into the Keyword text box and click Select.
a. Arrange the genomes in order of their size. What patterns do you see?
b. What is happening with the extremely reduced genomes? If all organisms are supposed to be able to perform glycolysis, what do you think is happening with these bacteria?
2. Advanced question: You will need to create two new genome groups, one with all the complete genomes found in Brucella, and another with all the complete genomes found in Bartonella (Hint: You may have to look at the previous classes, like the one that looked at antibiotic resistance, to remind yourself how to do that). These two genera are related to each other, but each causes different types of disease and has a different lifestyle.
a. Using the Comparative Pathway tool, look at the Citrate Cycle for both of these groups. Which genes do all the genomes share? Are there patterns of presence or absence that are specific to each of these genera? Locate the genes that are unique on the KEGG map so that you can see the strategy of each genus.
b. Generate a multiple sequence alignment for all the genes that have an EC number of 1.1.1.42 across those two groups. Examine the gene tree. How are the genes clustering together? Do you see any distinct changes in the alignment that are specific to a certain genus?
3. Compare the pathogenic group that you created earlier with the Brucella and Bartonella genome groups. If you were to look only at the Citrate Cycle, which genus do the Rickettsia more closely resemble?

References

1. McCutcheon, J.P. and N.A. Moran, Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol, 2012. 10(1): p. 13-26.
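
Supplementary sketch: box colors and size ordering

The box colors on the KEGG pathway map and the smallest-to-largest ordering used in the heatmap steps follow simple rules that can be reproduced outside PATRIC. The Python below is a minimal illustration under stated assumptions, not part of PATRIC itself. The function names box_color and order_by_size are made up for this example, the genome sizes are copied from four rows of the table at the start of this document, and the presence/absence sets (genome_1 to genome_3) are invented placeholders rather than real PATRIC results.

```python
from typing import Dict, List, Set

# Genome sizes (Mbp) copied from four rows of the table in this document.
GENOME_SIZE_MBP: Dict[str, float] = {
    "Bacillus subtilis ATCC 13952": 3.87,
    "Mycoplasma genitalium G37": 0.58,
    "Coxiella endosymbiont of Amblyomma americanum C904": 0.656,
    "Buchnera aphidicola BCc": 0.42,
}


def order_by_size(sizes: Dict[str, float]) -> List[str]:
    """Return genome names sorted from smallest to largest genome."""
    return sorted(sizes, key=lambda name: sizes[name])


def box_color(ec_number: str, ecs_by_genome: Dict[str, Set[str]]) -> str:
    """Apply the coloring rule described for the pathway map.

    white        -> no genome in the group has a gene with this EC number
    olive green  -> some, but not all, genomes have one
    bright green -> every genome has at least one
    """
    n_with = sum(ec_number in ecs for ecs in ecs_by_genome.values())
    if n_with == 0:
        return "white"
    if n_with == len(ecs_by_genome):
        return "bright green"
    return "olive green"


# INVENTED presence/absence data, for illustration only (not PATRIC output).
TOY_ECS: Dict[str, Set[str]] = {
    "genome_1": {"2.7.1.69", "1.1.1.42"},
    "genome_2": {"2.7.1.69"},
    "genome_3": {"2.7.1.69"},
}

if __name__ == "__main__":
    for name in order_by_size(GENOME_SIZE_MBP):
        print(f"{GENOME_SIZE_MBP[name]:>6} Mbp  {name}")
    for ec in ("2.7.1.69", "1.1.1.42", "4.1.2.13"):
        print(ec, "->", box_color(ec, TOY_ECS))
```

To try this on your own groups, you could replace TOY_ECS with the EC numbers you read off the PATRIC summary table for each genome and extend GENOME_SIZE_MBP with the remaining rows of the table.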