Lucas Rizkalla

Introduction:

Bacteriophages are increasingly being studied as an alternative to antibiotics for treating antibiotic-resistant bacteria because of their unique properties and therapeutic potential. Phages could be used to treat bacterial infections for which all other methods have been exhausted (Svoboda 2009 Popular Science). Phages are difficult to predict because of their capability to mutate, but they are nonetheless gaining attention as a potential replacement for antibiotics as the number of antibiotic-resistant bacteria increases. Phages are not harmful to human cells, but they can become harmful if mutations arise that produce proteins detrimental to the immune system. For example, lysogenic phages are known to encode genes that produce toxins, including the toxins that cause cholera, hemorrhagic diarrhea, botulism, diphtheria, and scarlet fever (Nicol 2003 Microbial Genetics). Phages can also be used to learn more about the bacteria themselves, because phages change in response to changes in their bacterial hosts. Thus, studying phages can reveal changes in the bacteria (Svoboda 2009 Popular Science), and this knowledge can inform the development of phage therapy.

Bacillus species are common bacteria in the world around us; they are present in the food we eat, and some cause human disease. B. anthracis, B. cereus, and B. thuringiensis are of particular interest because they have very similar genome sequences but different pathogenic profiles. B. anthracis is known for causing the lethal disease anthrax, B. cereus is a food contaminant and human pathogen, and B. thuringiensis is an insect pathogen used in bioinsecticides. These differences are accounted for by various mechanisms of horizontal gene transfer. Members of the B. cereus group are associated with a variety of bacteriophages, making it possible to study a diverse range of phages that can provide new information about members of this group.
Studying phages that infect members of this group can expand current knowledge of these bacteria and move research toward phage therapy (Gillis 2014 Viruses). Because the B. cereus group hosts such a wide range of phages, observing how these phages perform can show how phages with different properties interact with a range of bacteria.

Phages kill bacteria through lysis, which is accomplished by the endolysin protein. Bacillus bacteria are gram-positive, which makes them good candidates for studying endolysin-based treatment. Endolysin might be an efficient protein-based antibiotic because gram-positive bacteria lack the outer membrane of gram-negative bacteria and can be directly targeted by endolysin for destruction (Svoboda 2009 Popular Science). Endolysin could therefore be used on its own, without the whole phage, as an antibacterial treatment applied directly to gram-positive bacterial infections. In a study performed by Nelson et al., an endolysin isolated from the B. anthracis gamma phage was shown to rescue 13 of 19 mice in an intraperitoneal model of infection. The enzyme remained fully active after being heated to 60 °C for an hour. In addition, a second Bacillus phage endolysin, PlyPH, was shown to have relatively high lytic activity over a broad range of pH (Nelson 2012 Advances in Virus Research). These properties make Bacillus endolysins useful candidates for therapeutic treatment. Analyzing the conserved domains of endolysins from phages of different bacteria will give insight into which endolysin properties are necessary for lysis. This paper aims to identify conserved domains in Bacillus phage endolysins.
The collected endolysin sequences can be placed in a multiple sequence alignment to observe the phylogenetic relationships among the phages. Furthermore, the phylogenetic tree can give insight into how the conserved domains of endolysin are related. This paper's focus is to analyze trends in conserved domains across the phylogenetic tree to understand how endolysin could be tailored for phage therapeutic treatment.

[Figure image. Cleavage sites labeled: 1. N-acetylmuramidase; 2. Lytic transglycosylase; 3. N-acetylmuramoyl-L-alanine amidase; 7. D-alanyl-D-alanine carboxypeptidase. The site for N-acetyl-anhydromuramyl-L-alanine amidase could not be found.]

Figure 1. Various lytic sites of endolysin protein.

Table 3. Endolysin Functional Domains

Phage | Catalytic Domain | Cell Wall Targeting Domain
Troll | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase | SH3
Jugalone | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase | SH3
Phrodo | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase | SH3
Waukesha | N-acetylmuramoyl-L-alanine amidase; PGRP | SH3; SH3
B4 | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase | SH3
Gamma | N-acetylmuramoyl-L-alanine amidase; PGRP | N-acetylmuramoyl-L-alanine amidase
SPO1 | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase; Spore cortex-lytic enzyme | Putative peptidoglycan-binding
Taylor | N-acetylmuramoyl-L-alanine amidase; Membrane-bound lytic murein transglycosylase D | LysM; SafA
Curly | N-acetylmuramoyl-L-alanine amidase; Membrane-bound lytic murein transglycosylase D | LysM; SafA
Gemini | N-acetylmuramoyl-L-alanine amidase; Membrane-bound lytic murein transglycosylase D | LysM; SafA
Vinny | N-acetylmuramoyl-L-alanine amidase; Glycosyl hydrolase; 1,4-beta-N-acetylmuramidase (Lysozyme M1) | N-acetylmuramoyl-L-alanine amidase
Evoli | N-acetylmuramoyl-L-alanine amidase; Glycosyl hydrolase; 1,4-beta-N-acetylmuramidase (Lysozyme M1) | N-acetylmuramoyl-L-alanine amidase
HoodyT | N-acetylmuramoyl-L-alanine amidase; Glycosyl hydrolase; 1,4-beta-N-acetylmuramidase (Lysozyme M1) | N-acetylmuramoyl-L-alanine amidase
Nigalana | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3
NotTheCreek | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3
SageFayge | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3
DIGNKC | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3
Zuko | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3

Methods:

The genomic DNA of bacteriophage Phrodo was sequenced with MiSeq next-generation sequencing technology. Sequencing reads were assembled using Newbler software, and the genome assembly was visualized in Consed. Physical genome ends containing terminal repeats were located by finding a region with double the read coverage. Gene start positions were auto-predicted using GeneMark (Besemer 2005) and Glimmer (Salzberg et al.) inside the DNAMaster software (http://phagesdb.org/DNAMaster/). The start positions were confirmed based on a combination of a 1:1 blastp match, the highest Shine-Dalgarno prediction score, whether the gene covers all the coding potential, and the size of the gaps and overlaps between genes. Phamerator (Cresawn 2011), blastp (https://blast.ncbi.nlm.nih.gov/), and HHpred (Soding 2005) were used to determine the predicted function of each gene.

Table 1. Genome Characteristics of Bacillus Phages

Characteristic | Phrodo | Jugalone | Zuko | DIGNKC | Claudi
Best Blast Match | Troll | Troll | Megatron | Hakuna | MG-B1
% Query Coverage | 94 | 96 | 91 | 92 | 33
% Identity | 98 | 99 | 88 | 87 | 74
% GC | 37.8 | 37.80 | 38.60 | 38.70 | 30.3
Genome Length, bp | 164443 | 164227 | 163345 | 161552 | 26504
# Predicted ORFs | 288 | 293 | 296 | 296 | 48
# Predicted tRNAs | 0 | 0 | 3 | 3 | 0

Table 2. Average Nucleotide Identity

Phage | Claudi | DIGNKC | Jugalone | Nigalana | NotTheCreek | Phrodo | SageFayge | Vinny | Zuko
Claudi | 100 | 55.5 | 55.6 | 54.7 | 54.8 | 55.9 | 54.8 | 55 | 55.2
DIGNKC | 55.5 | 100 | 61.2 | 87.1 | 87.1 | 61.3 | 87.7 | 62.8 | 90.6
Jugalone | 55.6 | 61.2 | 100 | 61.4 | 61.3 | 95.7 | 60.8 | 66.8 | 61.2
Nigalana | 54.7 | 87.1 | 61.4 | 100 | 92.3 | 61.6 | 91.9 | 62.1 | 87
NotTheCreek | 54.8 | 87.1 | 61.3 | 92.3 | 100 | 61.5 | 92.9 | 62.6 | 87.8
Phrodo | 55.9 | 61.3 | 95.7 | 61.6 | 61.5 | 100 | 61 | 66.8 | 61.2
SageFayge | 54.8 | 87.7 | 60.8 | 91.9 | 92.9 | 61 | 100 | 62.8 | 88
Vinny | 55 | 62.8 | 66.8 | 62.1 | 62.6 | 66.8 | 62.8 | 100 | 62.4
Zuko | 55.2 | 90.6 | 61.2 | 87 | 87.8 | 61.2 | 88 | 62.4 | 100

Phage genome sequences were compared by dot plot and average nucleotide identity (ANI). Gepard was used to create a dot plot of the phages, which compares two genomes and uses dots to display the extent of their similarity. DNAMaster was used to calculate the average nucleotide identity across the whole genome for each pair of phages, and the results were compiled into a table showing the percent nucleotide identity between each pair. Genome maps were examined using Phamerator to identify features and functions of genes. Phamerator was also used to make a genome map of Phrodo, which labels the predicted function of each gene.

Figure 1. Dot Plot of Sequenced Bacillus Phages.

Phage endolysin protein sequences were aligned in ClustalW to identify their relative evolutionary positions on a phylogenetic tree. This phylogeny was constructed using endolysin as the comparative factor. Phage endolysin sequences were taken from Phamerator. Phamerator was also used to determine whether holin and endolysin lie next to each other or far apart on each genome. The sequences were submitted to blastp to identify conserved domains, which were compiled into a table separating N-terminal and C-terminal domains. The endolysin sequences from newly found phages as well as older phages were all put through multiple sequence alignment using ClustalW.
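The dot plot comparison described in the Methods can be illustrated with a minimal sketch. It is not Gepard's algorithm; it assumes a plain exact word match, and the function names, the word length, and the toy sequences are hypothetical choices made only for illustration.

```python
def dot_matches(seq_a, seq_b, word=4):
    """Return (i, j) pairs where the word starting at i in seq_a exactly
    matches the word starting at j in seq_b. Each pair is one dot."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the dots as text: rows follow seq_a, columns follow seq_b."""
    grid = [["." for _ in seq_b] for _ in seq_a]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Toy sequences (hypothetical, not taken from any phage genome).
a = "ATGCGTACGTTAGC"
b = "ATGCGTTCGTTAGC"
dots = dot_matches(a, b, word=4)
print(render(a, b, dots))
```

Two similar sequences produce dots along the main diagonal, while unrelated sequences produce few or no dots, which is the pattern used in this paper to separate Claudi from the myoviruses.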
A phylogenetic tree was created in order to relate endolysin functional domains to phage similarity in the phylogeny.

[Figure 2 tree image. Branch labels mark functional domain groups 1 to 6. Color key: B. cereus = blue; B. subtilis = yellow; B. pumilus = green; B. anthracis = orange; B. thuringiensis = purple. Star = endolysin and holin close.]

Figure 2. The phage endolysin proteins were aligned with Clustal multiple sequence alignment, and a neighbor-joining tree was calculated using BLOSUM62 to create an endolysin phylogenetic tree. The numbers indicate functional domain groups and the stars indicate phages where endolysin and holin are contained within a defined lytic cassette.

Group | N-Terminal | C-Terminal
1 | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase | SH3
2 | N-acetylmuramoyl-L-alanine amidase; Membrane-bound lytic murein transglycosylase D | LysM; SafA
3 | N-acetylmuramoyl-L-alanine amidase; Glycosyl hydrolase; 1,4-beta-N-acetylmuramidase | N-acetylmuramoyl-L-alanine amidase
4 | N-acetylmuramoyl-L-alanine amidase; D-alanyl-D-alanine carboxypeptidase; Spore cortex-lytic enzyme | Putative peptidoglycan binding
5 | N-acetylmuramoyl-L-alanine amidase; PGRP | N-acetylmuramoyl-L-alanine amidase (Gamma); SH3, SH3 (Waukesha)
6 | N-acetylmuramoyl-L-alanine amidase; N-acetyl-anhydromuramyl-L-alanine amidase; PGRP | SH3

Figure 4. Functional domains of grouped phages from Figure 3. Annotation of N-terminal and C-terminal domains by blastp are shown.

Results:

The genomes of five phages infecting Bacillus thuringiensis were sequenced to compare genome content. Four of the phages are myoviruses (Myoviridae) while one, Claudi, is a podovirus (Podoviridae), resulting in differences in phage structure and genome sequence. All of the myovirus genomes are more than 160,000 bp long, while the Claudi genome is around 26,000 bp long (Table 1). Claudi has a relatively low GC content (30.3%), although it is within about 8 percentage points of the other phages. Phrodo was sequenced and annotated to analyze its genome features and structure.
The Phrodo genome is approximately 160,000 bp long, and the predicted number of ORFs is 288. According to Table 1, of the five phages sequenced, Phrodo most closely resembles Jugalone; Troll is the best BLAST match for both. Phrodo and Jugalone each have a higher % identity than % query coverage (98% versus 94% and 99% versus 96%, respectively). Zuko and DIGNKC share the same numbers of ORFs and tRNAs. Zuko's best BLAST match was Megatron while DIGNKC's was Hakuna. These data show that Claudi is genomically distinct from the other phages, which is expected because Claudi is a podovirus and the others are myoviruses. Furthermore, Phrodo and Jugalone display high similarity to each other, implying a close relationship in a phylogeny.

The Bacillus phage genomes were compared by ANI calculation, dot plot analysis, and comparative genome map analysis. The dot plot showed how small Claudi's genome is relative to those of the myoviruses (Figure 1). Furthermore, Claudi showed no similarity to the genomes of the other phages, lacking any dots in each comparison. Together, these results show that Claudi is genomically different from the myoviruses. Phrodo and Jugalone displayed extreme similarity to each other and only slight similarity to the other phages. The dot plot revealed that DIGNKC, Zuko, Nigalana, NotTheCreek, and SageFayge all shared high similarity with each other. Vinny was the one phage that displayed only slight similarity with the other phages.

Table 2 shows the average nucleotide identity of the phages on the dot plot. Phrodo and Jugalone were 95.7% similar by ANI, while their identity to the other phages was around 60%. DIGNKC, Zuko, Nigalana, NotTheCreek, and SageFayge all share high similarity with one another.
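The grouping suggested by the ANI values can be reproduced with a short sketch. The matrix below is transcribed from Table 2; the single-linkage rule, the function name, and the 85% threshold are assumptions chosen for illustration and are not the procedure used by DNAMaster or by the author.

```python
NAMES = ["Claudi", "DIGNKC", "Jugalone", "Nigalana", "NotTheCreek",
         "Phrodo", "SageFayge", "Vinny", "Zuko"]

# Pairwise ANI (%) transcribed from Table 2, rows and columns in NAMES order.
ANI = [
    [100, 55.5, 55.6, 54.7, 54.8, 55.9, 54.8, 55, 55.2],
    [55.5, 100, 61.2, 87.1, 87.1, 61.3, 87.7, 62.8, 90.6],
    [55.6, 61.2, 100, 61.4, 61.3, 95.7, 60.8, 66.8, 61.2],
    [54.7, 87.1, 61.4, 100, 92.3, 61.6, 91.9, 62.1, 87],
    [54.8, 87.1, 61.3, 92.3, 100, 61.5, 92.9, 62.6, 87.8],
    [55.9, 61.3, 95.7, 61.6, 61.5, 100, 61, 66.8, 61.2],
    [54.8, 87.7, 60.8, 91.9, 92.9, 61, 100, 62.8, 88],
    [55, 62.8, 66.8, 62.1, 62.6, 66.8, 62.8, 100, 62.4],
    [55.2, 90.6, 61.2, 87, 87.8, 61.2, 88, 62.4, 100],
]


def group_by_ani(names, ani, threshold):
    """Single-linkage grouping: two phages share a group when a chain of
    pairwise ANI values at or above `threshold` connects them."""
    unvisited = set(range(len(names)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, members = [start], [start]
        while stack:
            i = stack.pop()
            linked = [j for j in sorted(unvisited) if ani[i][j] >= threshold]
            for j in linked:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        groups.append(sorted(names[k] for k in members))
    return groups


# The 85% cutoff is a hypothetical value picked to separate the ~87-96%
# comparisons from the ~55-67% comparisons in Table 2.
groups = group_by_ani(NAMES, ANI, threshold=85)
for group in groups:
    print(group)
```

With the 85% threshold, Claudi and Vinny each stand alone, Phrodo pairs with Jugalone, and the remaining five phages form one group. Raising the threshold to 90% splits that larger group into DIGNKC with Zuko and Nigalana with NotTheCreek and SageFayge.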
However, some individual phages are more similar to each other than to the rest of the group. Zuko and DIGNKC are one example, being 90.6% similar to each other but around 87% similar to Nigalana, NotTheCreek, and SageFayge. Conversely, Nigalana, NotTheCreek, and SageFayge are about 92% similar to one another by ANI, but around 87% similar to Zuko and DIGNKC. Vinny is less than 70% similar to any other phage in the table.

The comparative genome map of VCU phages (Appendix 1) was used to further identify similarities and differences in genomic characteristics among the phages, complementing the ANI table and dot plot. The phages were arranged according to the logical groupings suggested by those analyses. The map supports the conclusion that NotTheCreek, SageFayge, Nemo, Nigalana, and Zuko are highly similar: they share large regions of purple shading, which indicates high similarity. In addition, Phrodo and Jugalone share a large portion of purple shading, supporting the similarity of their genomes. Several sites of recombination in both groupings account for the partial differences. As expected, Vinny shows little similarity with the other phages. Claudi is small and also has little similarity to the other phages. The genome map therefore helps identify genomic similarities and differences that distinguish or group phages. Using a combination of dot plot, ANI calculation, and genome maps is useful for identifying related phages by comparing their genomes.

The Phrodo genome map was obtained from Phamerator and shows functional gene predictions as well as the pham of each gene (Figure 3). The map shows the terminal repeat region, consisting of three duplicated genes, with a non-coding region in front of it. There were 62 genes that were predicted to have functions.
A portion of the genes, from about 2,400 bp to 27,000 bp, was transcribed in the reverse direction. The structural genes were all located within the same section of the genome, from around 54,200 bp to 67,500 bp. Endolysin and holin are usually found next to each other, but they were far apart on the Phrodo genome, implying evolutionary changes from their positions in other phages. The genome also contained two novel genes, gp15 and gp106. These novel genes did not appear in Jugalone and account for minor differences between the genomes. Furthermore, the Phamerator map of Phrodo shows several locations of recombination. Genes that appeared in the Phrodo genome but not in Jugalone include gp30, gp166, gp219, gp222, gp235, gp259, and gp262. The Phrodo map also shows sites where genes were absent from Phrodo but present in the Jugalone genome. These areas of white shading in Phamerator indicate regions of low nucleotide similarity and possible sites of mutation. Together with recombination, these areas of white shading contribute to the slight differences in nucleotide identity between the genomes.

Figure 5. Genome map of Phrodo showing functional gene predictions. The different colors of genes represent different phams.

Endolysin and holin are expected to be co-regulated when they are found in a defined lytic cassette. In Phrodo, these proteins are separated by half a genome length, and the regulation of their expression is unclear. We looked for predicted promoters in intergenic regions, in the same orientation as our genes of interest. Data found by Morgan Van Driest showed that the predicted promoter for endolysin may be recognized by a sigma 32 homolog, while the promoter for holin was predicted to be recognized by a sigma 70 homolog (Figure 6).

Figure 6. Predicted sigma 32 promoter region for endolysin at 27,000 bp (top) and predicted sigma 70 promoter region for holin at 106,600 bp (bottom). Promoter locations are indicated with an orange arrow.
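The paper does not state which tool produced the promoter predictions, so the following is only a rough sketch of how a sigma 70-like promoter search in an intergenic region might work. It assumes the well-known bacterial consensus hexamers TTGACA (-35) and TATAAT (-10) separated by 16 to 18 bp; the function names, the mismatch limit, and the toy sequence are hypothetical.

```python
MINUS_35 = "TTGACA"  # consensus -35 hexamer
MINUS_10 = "TATAAT"  # consensus -10 hexamer


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq, max_mismatch=2, spacers=(16, 17, 18)):
    """Return (pos35, pos10, spacer, total_mismatches) for every pair of
    hexamers resembling the consensus, best matches first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                continue
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, gap, m35 + m10))
    return sorted(hits, key=lambda h: h[3])


# Toy intergenic region (hypothetical, not Phrodo sequence).
region = "GGC" + "TTGACA" + "C" * 17 + "TATAAT" + "GGC"
hits = find_sigma70_like(region)
print(hits[0])
```

A search for a sigma 32-like promoter would follow the same pattern with different consensus hexamers, and the scan would need to be run on the strand that matches the direction of the gene of interest.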
The predicted sigma 32 promoter is supported by the fact that terminase and endolysin would be transcribed together as genes required late in the virus life cycle. This is logical because terminase is not needed until the head and tail are assembled and the head is ready to be filled with DNA before connecting to the tail (Black 2012 Elsevier). Endolysin then cleaves the cell wall, releasing mature phages. The sigma 70 promoter was predicted by comparing Phrodo to the Jugalone genome and looking for the closest non-coding region present in both phages, on the reasoning that expression of this essential protein would occur from a conserved promoter. The promoter was located near position 106,600 bp. Van Driest compared her finding to phage Gamma, in which endolysin and holin are next to each other in a defined lytic cassette. She predicted a sigma 32 promoter around position 7,700 bp, very similar to the promoter predicted for expression of Phrodo's endolysin gene.

Endolysin sequences from phages found this semester were added to a file containing previously found phage endolysin sequences taken from Phamerator. Figure 7 depicts the phylogenetic tree that resulted from the multiple sequence alignment. Endolysins are grouped on the tree according to the host bacterium used for isolation. The phage groups on the tree were compared with the conserved functional domains (blastp) of the phage endolysins from Table 3. The first group consisted of phages B4, Troll, Phrodo, and Jugalone. Their functional domains were identical, with N-acetylmuramoyl-L-alanine amidase and D-alanyl-D-alanine carboxypeptidase at the N-terminus and an SH3 binding domain at the C-terminus. The second group included Taylor, Curly, and Gemini, which also shared the same functional domains. Their N-terminal domains are N-acetylmuramoyl-L-alanine amidase and membrane-bound lytic murein transglycosylase D, while their C-terminal domains are LysM and SafA.
The third group, consisting of HoodyT, Vinny, and Evoli, has N-acetylmuramoyl-L-alanine amidase, glycosyl hydrolase, and 1,4-beta-N-acetylmuramidase (Lysozyme M1) at the N-terminus and N-acetylmuramoyl-L-alanine amidase at the C-terminus. SPO1 stands alone as the fourth group, with N-acetylmuramoyl-L-alanine amidase, D-alanyl-D-alanine carboxypeptidase, and spore cortex-lytic enzyme at its N-terminus and a putative peptidoglycan-binding domain at its C-terminus. The fifth group, consisting of Gamma and Waukesha, shares the N-terminal functional domains N-acetylmuramoyl-L-alanine amidase and PGRP, but its members have different C-terminal domains. The sixth group, containing DIGNKC, Zuko, SageFayge, Nigalana, and NotTheCreek, shares N-acetylmuramoyl-L-alanine amidase, N-acetyl-anhydromuramyl-L-alanine amidase, and PGRP at the N-terminus and SH3 at the C-terminus.

Groups 1 and 6 are the only groups in which all known members have only SH3 as their C-terminal domain. Although groups 1, 3, and 6 are all associated with B. thuringiensis, group 3 is distinct in that it has a cleavage domain at its C-terminal end, whereas the other two groups share a cell wall-binding domain. Phage Waukesha from group 5 has two SH3 domains at its C-terminus, in contrast to Gamma, which has a cleavage domain at its C-terminal end. The N-terminal end is fairly conserved across the phylogeny, with N-acetylmuramoyl-L-alanine amidase present as a cleavage domain in every N-terminal region. Further research might show that differences in cleavage and cell wall-binding domains account for the separation of the groups.

Discussion:

No one phage is completely identical to another, which reflects the large diversity of phage genome sequences. Phages with different morphologies show very little similarity between their genomes. In our collection of phages, we sequenced four myoviruses and one podovirus, with significant similarity in genomic characteristics within the myovirus group.
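The two observations drawn from the domain groupings in the Results (a single N-terminal domain shared by every endolysin, and SH3 as the sole C-terminal domain in groups 1 and 6) can be checked with a small sketch. The group assignments and domain names are taken from the text; the data structure and variable names are hypothetical, and repeated copies of a domain (for example the two SH3 domains in Waukesha) are collapsed into one entry.

```python
from collections import defaultdict

AMI = "N-acetylmuramoyl-L-alanine amidase"
CPASE = "D-alanyl-D-alanine carboxypeptidase"
MLTD = "Membrane-bound lytic murein transglycosylase D"
GH = "Glycosyl hydrolase"
MUR = "1,4-beta-N-acetylmuramidase"
SCLE = "Spore cortex-lytic enzyme"
ANH = "N-acetyl-anhydromuramyl-L-alanine amidase"

# phage -> (group, N-terminal domains, C-terminal domains), from the Results.
ENDOLYSINS = {
    "B4": (1, {AMI, CPASE}, {"SH3"}),
    "Troll": (1, {AMI, CPASE}, {"SH3"}),
    "Phrodo": (1, {AMI, CPASE}, {"SH3"}),
    "Jugalone": (1, {AMI, CPASE}, {"SH3"}),
    "Taylor": (2, {AMI, MLTD}, {"LysM", "SafA"}),
    "Curly": (2, {AMI, MLTD}, {"LysM", "SafA"}),
    "Gemini": (2, {AMI, MLTD}, {"LysM", "SafA"}),
    "HoodyT": (3, {AMI, GH, MUR}, {AMI}),
    "Vinny": (3, {AMI, GH, MUR}, {AMI}),
    "Evoli": (3, {AMI, GH, MUR}, {AMI}),
    "SPO1": (4, {AMI, CPASE, SCLE}, {"Putative peptidoglycan-binding"}),
    "Gamma": (5, {AMI, "PGRP"}, {AMI}),
    "Waukesha": (5, {AMI, "PGRP"}, {"SH3"}),
    "DIGNKC": (6, {AMI, ANH, "PGRP"}, {"SH3"}),
    "Zuko": (6, {AMI, ANH, "PGRP"}, {"SH3"}),
    "SageFayge": (6, {AMI, ANH, "PGRP"}, {"SH3"}),
    "Nigalana": (6, {AMI, ANH, "PGRP"}, {"SH3"}),
    "NotTheCreek": (6, {AMI, ANH, "PGRP"}, {"SH3"}),
}

# N-terminal domains present in every endolysin.
shared_n_terminal = set.intersection(
    *(n_term for _, n_term, _ in ENDOLYSINS.values())
)

# Groups in which every member has SH3 as its only C-terminal domain.
c_terms_by_group = defaultdict(list)
for group, _, c_term in ENDOLYSINS.values():
    c_terms_by_group[group].append(c_term)
sh3_only_groups = sorted(
    g for g, c_terms in c_terms_by_group.items()
    if all(c == {"SH3"} for c in c_terms)
)

print(shared_n_terminal)
print(sh3_only_groups)
```

Encoding each domain architecture as a set makes it straightforward to ask which domains separate two groups, for example by taking the set difference between the N-terminal domains of groups 1 and 6.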
In the Bacillus database, there are currently 521 documented phages, only 36 of which have been sequenced (bacillus.phagesdb.org). In comparison, the NCBI virus genomes database contains 59 sequenced Bacillus phages, of which 26 are myoviruses and 6 are podoviruses (ncbi.nlm.nih.gov/genome/viruses/). In that database, the average genome size and number of ORFs for myoviruses were 168,962 bp and 271, respectively. For podoviruses, the average genome size was 27,658 bp and the average number of ORFs was 37. These averages closely resemble the genomic characteristics in Table 1. All myovirus genomes in the table are around 160,000 bp and have about 290 ORFs. As Claudi is a podovirus, its genome is expected to be significantly smaller and to contain fewer ORFs. The ratio of myoviruses to podoviruses is also noteworthy. Within the NCBI database, there were far fewer podoviruses than myoviruses. Our data show that of the 20 phages found, 19 were myoviruses and one was a podovirus. Four of the myoviruses and the one podovirus were sequenced. Based on this information, the myoviruses and podovirus found at VCU appear to be typical: their genome sizes and numbers of ORFs follow those of the phages in the databases.

Phages that display high similarity among nucleotide sequences may be closely related evolutionarily. Such phages can be placed in a cluster whose members share similar genomic characteristics. Previously, 60 mycobacteriophages were clustered according to genome similarity and gene location (Hatfull 2010 JMB). Based on the results of our ANI, dot plot, and comparative genome map analysis, there are two logical groupings that can be made. The first is the grouping of Zuko and DIGNKC. Their ANI was one of the highest (90.6%), and their dot plot result showed that they were significantly similar.
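The comparison between the sequenced myoviruses and the NCBI averages quoted earlier in this section can be made explicit with simple arithmetic. The values below are read from the genome characteristics table, assuming that the four genome lengths above 160,000 bp belong to the four myoviruses; the variable names are hypothetical.

```python
# Genome lengths (bp) and predicted ORF counts of the four sequenced
# myoviruses, as read from the genome characteristics table.
myo_lengths = [164443, 164227, 163345, 161552]
myo_orfs = [288, 293, 296, 296]

# NCBI myovirus averages quoted in the text.
ncbi_length = 168962
ncbi_orfs = 271

mean_length = sum(myo_lengths) / len(myo_lengths)
mean_orfs = sum(myo_orfs) / len(myo_orfs)

# Relative difference of the VCU means from the NCBI averages, in percent.
length_diff = 100 * (mean_length - ncbi_length) / ncbi_length
orfs_diff = 100 * (mean_orfs - ncbi_orfs) / ncbi_orfs

print(f"mean genome length: {mean_length:.0f} bp ({length_diff:+.1f}%)")
print(f"mean ORF count: {mean_orfs:.1f} ({orfs_diff:+.1f}%)")
```

The same calculation for the single podovirus, Claudi, compares 26,504 bp and 48 ORFs against the NCBI podovirus averages of 27,658 bp and 37 ORFs.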
However, they are the most similar pair within a larger logical grouping, which consists of Zuko, DIGNKC, NotTheCreek, Nigalana, and SageFayge. NotTheCreek, Nigalana, and SageFayge all showed ANI values above 90% with one another and were consistently similar according to the dot plot and genome maps. However, the similarity between the Zuko and DIGNKC pair and NotTheCreek, Nigalana, and SageFayge was about 87-88% (Table 2). All five phages can therefore be logically grouped together because of their high similarity, but separated within that group because of the higher similarity between certain members.

Furthermore, Phrodo and Jugalone can be grouped together because of their extensive genome similarity. They exhibited the highest ANI value, 95.7% (Table 2). They also matched only each other on the dot plot (Figure 1). Finally, comparison of the whole genome maps shows nearly identical genomes, with a small amount of recombination relative to the large size of the genomes. Both genomes have similar terminal repeat regions consisting of the same three genes, except that Jugalone contains an additional predicted gene in its terminal repeat.

Phages can be clustered when nucleotide identity extends over large genome segments. According to Hatfull, one way to identify this similarity is to analyze a dot plot and determine where two genomes show evident sequence similarity spanning more than 50% of the genome. Another way is to analyze ANI values: phages with values of 53%-59% are not clustered, whereas those with high ANI values can be clustered (Hatfull 2010 JMB). These groupings can therefore be made on the basis of the strong similarity between phages, or the lack thereof.

In contrast, Vinny and Claudi showed little or no similarity to any of the phages studied. The dot plot lacked any lines for Claudi (Figure 1).
According to Hatfull, ANI values between phages that are not clustered together ranged between 53% and 59% (Hatfull 2010 JMB). Claudi could not be clustered with any phage in the table because its ANI values all fell within this range, making Claudi a singleton. In contrast, Vinny has diffuse dotting across the plot with the two larger groups. Comparison of whole genome maps shows that Vinny has many genes in the same phams as the rest of the phages in the table. However, comparison using Phamerator showed little to no purple shading, which would signify high nucleotide identity. Vinny's ANI values were in the low to mid 60s, suggesting that Vinny might at one point have been very similar to the rest of the phages but has since diverged. Hatfull placed phages with ANI values in the range of 63%-67% into subclusters, suggesting the possibility of putting Vinny in a subcluster (Hatfull 2010 JMB). Vinny might therefore be placed in a subcluster because of its slightly higher ANI values, while Claudi remains a singleton in this group of phages.

The comparative genome maps further support the similarities between phages found through ANI and dot plot analysis. The maps show regions of genome similarity that can be used to place a phage in a cluster. They can also show similarity between distant genomes and where they diverged from more closely related phages (Hatfull 2010 JMB). As mentioned earlier, two logical groupings of phages could be made through analysis of ANI and the dot plot. The phages from one group that were compared on the genome map included Zuko, SageFayge, Nigalana, and NotTheCreek. These phages were already shown to have high similarity to each other. However, Zuko and DIGNKC were slightly less similar to the other phages in the group. By looking at the comparative genome map, it is possible to identify where the phages began to diverge into their own subgroup.
In the Hatfull paper, phage genomes were aligned and categorized into clusters and subclusters. The comparative map helps determine where the genomes of phages begin to diverge and form a subcluster (Hatfull 2010 JMB). By studying the genome maps, more can be learned about related phages.

Phrodo was annotated for functional gene predictions. Of its 288 genes, 62 were predicted to have functions. A few genes were introduced into the Phrodo genome by recombination. Some of these were novel genes while others are found in other phages. Of these novel and shared genes, only two were predicted to have a function; interestingly, both were predicted to be HNH endonucleases. The sigma factor gene was annotated toward the middle of the genome. In addition, endolysin was found at the beginning of the genome, while holin was further away.

Regulation of endolysin and holin expression is presumed to be tightly controlled. The promoter predictions (Figure 6) suggest that endolysin and holin genes that are not near each other in the genome do not necessarily need the same promoter. The difference in sigma factors in Phrodo suggests that holin may be expressed from a sigma 70 promoter (early gene expression), while endolysin may be expressed from a sigma 32-like promoter (unknown time of expression, but different from early genes). In comparison, we predicted that a sigma 32-like promoter would be used to express the lytic cassette of group 5 phages like Gamma (Figure 2). Endolysin is the more significant of the two since it carries out cleavage. Holin is transcribed and accumulates in the membrane without any effect on membrane integrity. The sudden triggering of holin to form pores in the membrane occurs at an allele-specific time. Only then is endolysin able to destroy the cell wall.
Since endolysin relies on the pore formation triggered by holin, it can be transcribed at any time as long as this occurs before pore formation is triggered (White 2010 PNAS). This supports our data that holin can be transcribed either before endolysin (sigma 70 promoter) or at the same time as endolysin (sigma 32-like promoter), because holin controls lysis timing.

It may be worth treating the N-terminal region as the factor on which the grouping depends. The different functional domains of endolysin appear to contribute to the diversity among groups in the phylogenetic tree. Because the N-terminal region varies among groups, it may play a key role in the branching of the tree. A study by Becker revealed that the SH3 domain enhances the activity of the catalytic domain but is not essential for lytic activity. He also noted that the N-terminus of the protein plays an essential role in cell lysis (Becker 2014 FEMS). Groups 1 and 6 share the SH3 cell wall-binding (CWB) domain but differ in their catalytic domains. This difference in catalytic domain, despite the same CWB domain, might explain the separation of the two groups on the tree. All groups share one common domain but also have their own. The difference in cleavage domains might also indicate something about host lysis ability and range. More research would be needed to establish a relationship between cleavage domains and host lysis range among these phages, and to see whether C-terminal domains suggest a different grouping of the phylogenetic tree, possibly with groups 1 and 6 together. Future projects include creating domain phylogenies to better understand the grouping of the phages.

Studying endolysin properties and characteristics, especially their functional domains, might reveal their potential for the future of phage therapy.
For example, the phage B4 endolysin consists of SH3 as a cell wall-binding domain and D-alanyl-D-alanine carboxypeptidase as a cleavage domain, and appears in group 1 on our phylogeny (Figure 2). LysB4 was also experimentally characterized as an L-alanoyl-D-glutamate endopeptidase, with an optimum temperature of 50 °C and an optimum pH of 8.5. The broad host lysis activity of B4 suggests that it could be used as a biocontrol agent in phage therapy against B. cereus (Lee 2013 Arch Virol). As an extension, a more fine-tuned comparison of group 1 endolysins to this published work might reveal the importance of the N-terminus in optimizing host lysis. By understanding which cleavage domains work well in lysing specific hosts, and which CWB domains enhance them best, research can progress toward identifying the ideal combination of the two to improve phage therapy.

Works Cited

Becker, S., Swift, S., and Korobova, O. (2014). Lytic activity of the staphylolytic Twort phage endolysin CHAP domain is enhanced by the SH3b cell wall binding domain. FEMS Microbiology Letters. 362. 1-8. doi: 10.1093/femsle/fnu019

Besemer, J., and Borodovsky, M. (2005). GeneMark: web software for gene finding in prokaryotes, eukaryotes, and viruses. Nucleic Acids Research. 33. 451-454. doi: 10.1093/nar/gki487

Black, L., and Rao, V. (2012). Structure, assembly, and DNA packaging of the bacteriophage T4. Elsevier. 82. 119-147. doi: 10.1016/B978-0-12-3946218.00018-2

Cresawn, S., Bogel, M., Day, N., Jacobs-Sera, D., Hendrix, R., and Hatfull, G. (2011). Phamerator: a bioinformatic tool for comparative bacteriophage genomics. BMC Bioinformatics. 12(395). 1-14. http://www.biomedcentral.com/1471-2105/12/39

Gillis, A., and Mahillon, J. (2014). Phages preying on Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis: Past, Present and Future. Viruses. 6. 2623-2672. www.mdpi.com/journal/viruses

Hatfull, G., Jacobs-Sera, D., and Lawrence, J. (2010). Comparative genomic analysis of 60 mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. Journal of Molecular Biology. 397. 119-143. doi: 10.1016/j.jmb.2010.01.011

Lee, J., Shin, H., and Son, B. (2013). Characterization and complete genome sequence of a virulent bacteriophage B4 infecting food-borne pathogenic Bacillus cereus. Archives of Virology. 158. 2101-2108. doi: 10.1007/s00705-013-1719-2

Nelson, D., Schmelcher, M., and Rodrigues-Rubio, L. (2012). Chapter 7-Endolysins as antimicrobials. Advances in Virus Research. 83. 229-365. doi: 10.1016/B978-0-12394438-2.00007-4

Nicol, K. (2003). Virulence factors carried on phages. Microbial Genetics. Web. 3 May 2015. http://www.sci.sdsu.edu

Salzberg, S., Delcher, A., Kasif, S., and White, O. (1998). Microbial gene identification using interpolated Markov models. Oxford University Press. 26(2). 544-588.

Soding, J., Biegert, A., and Lupas, A. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research. 33. 244-248. doi: 10.1093/nar/gki408

Svoboda, E. (2009). The next phage. Popular Science. Web. http://www.popsci.com/scitech/article/2009-03/next-phage

White, R., Chiba, S., and Pang, T. (2010). Holin triggering in real time. Proceedings of the National Academy of Sciences of the United States of America. 108(2). 798-803. doi: 10.1073/pnas.1011921108

Appendix 1. Comparative genomics map of Phages: https://blackboard.vcu.edu/bbcswebdav/pid-5371969-dt-content-rid14822722_2/courses/BNFO-252-001-2015Spring/VCU%20phages%20maps.pdf