CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

DNA PACKAGING BY TERMINASE LARGE SUBUNIT IN METHANOPHAGE PG

A project submitted in partial fulfillment of the requirements for the degree of Master of Science in Biology

By Thomas Dang

May 2014

The graduate project of Thomas Dang is approved:

Virginia O. Vandergon, Ph.D. Date
Michael L. Summers, Ph.D. Date
Larry Baresi, Dr.P.H., Chair Date

California State University, Northridge

ACKNOWLEDGMENTS

I would like to thank my committee members, Dr. Virginia Vandergon and Dr. Michael Summers. Thank you for being such great professors and for all your dedication and hard work. Thank you to all the professors who assisted me during my academic development at CSUN and who encouraged me to continue my education. I would especially like to thank my mentor, Dr. Larry Baresi. Through all my troubles and deployments, you were always understanding and welcoming. You could not have been any more patient with me, and I thank you for all you have done for me. Thank you to all my colleagues who have helped me through my stay here at CSUN. It is a lasting friendship, and I thank you for all your moral support.

TABLE OF CONTENTS

Signature Page
Acknowledgments
Abstract
Introduction
Methods
Results
Discussion
References
Appendix A
Appendix B
Appendix C

ABSTRACT

DNA PACKAGING BY TERMINASE LARGE SUBUNIT IN METHANOPHAGE PG
By Thomas Dang
Master of Science in Biology

Recent studies have shown that bacteriophages infect living organisms from all three domains of life; however, not much is known about phages that infect the Archaea. Of the approximately 50 reported archaeal phages, PG is one of only three known viruses that infect methanogens, and it falls within the order of tailed bacteriophages, Caudovirales. Tailed bacteriophages have differing DNA replication strategies that are reflected by the various terminal chromosomal ends created by the terminase large subunit.
Although PG's replication and packaging process is unknown, bioinformatic studies of PG's terminase large subunit could help identify potential DNA packaging strategies. Comparative analysis with other phages demonstrates that phages cluster together according to the type of terminal ends they create. PG was shown to cluster with phages that have short direct terminal repeats and cohesive ends. Studying the terminase large subunit of PG could lead to a better understanding of its replication strategy and genetic history, and could increase our understanding of viruses in general.

INTRODUCTION

Archaea

Archaea represent one of the three domains of life and are known to have unique metabolic capabilities such as the production of methane and non-chlorophyll light harvesting (Cavicchioli, 2011). Archaea are recognized as one of the earliest life forms, sharing molecular characteristics with Eubacteria and Eukarya, and as such are carefully studied when looking at the evolution of life on Earth (Woese and Fox, 1977). These distinct and shared traits have allowed the Archaea to develop ways to exist in extreme environments, and they are grouped into methanogens, halophiles, and thermoacidophiles (Cavicchioli, 2011).

Methanogens

16S rRNA gene sequencing divides the Archaea domain into three phyla known as Euryarchaeota, Crenarchaeota, and Korarchaeota. Methanogens represent a large and diverse portion of Euryarchaeota. They are also a model for many molecular studies in archaeal replication, transcription, regulation, and protein structure (Leigh, et al., 2011). Methanogens are known to reside in anaerobic environments ranging from the human gut and the rumen of cattle to sediments and deep-sea volcanic vents. In addition, they are characterized by their ability to harvest energy by catabolizing H2/CO2, formate, methanol, methylamines, or acetate to produce methane (methanogenesis) (Reeve, 1992).

Methanogenesis follows one of three known pathways and requires unique coenzymes.
The H2-dependent CO2 reduction pathway, in which CO2 is reduced to CH4, is the most common. Some methanogens are capable of using formate, which is oxidized to CO2 that then enters the CO2 reduction pathway (Blaut, 1994). The separate methylotrophic pathway uses methanol or methylamines as an energy source, converting the methyl group to CH4, and is limited to the family Methanosarcinaceae. The third methanogenic pathway, the aceticlastic pathway, generates methane from the internal oxidation-reduction of acetate. Acetate is a crucial intermediate that provides the primary source of methane in freshwater environments and anaerobic digesters (Blaut, 1994). Even though acetate plays a significant role in the production of methane in nature, very few known methanogens are capable of using acetate as a precursor for methanogenesis. Those able to use the aceticlastic pathway belong to the Methanosarcinaceae, which are the most metabolically diverse methanogens.

In contrast to freshwater sediments and anaerobic digesters, H2-dependent CO2 reduction plays a significant role in rumen metabolism through a process referred to as interspecies hydrogen transfer. Interspecies hydrogen transfer is the process by which reducing equivalents are transferred between a donor, usually a eubacterium, and a recipient methanogen using H2-dependent CO2 metabolism. Radiocarbon (14C) isotope analysis indicates that as much as 70 to 80% of the methane produced by ruminants comes by way of H2-dependent CO2 metabolism, while in wetlands, rotting biomass, and wastewater treatment plants H2-dependent CO2 metabolism accounts for only 20 to 30%, with the remainder coming from aceticlastic metabolism (Johnson, et al., 1995).

Recently, it has been suggested that this interspecies process may play a role in human obesity. The human colon houses large numbers of Eubacteria and archaeal methanogens.
Large numbers of H2-consuming methanogens, specifically Methanobrevibacter smithii, are found in about 50-85% of the human population (Samuel, et al., 2006). The mutualistic relationship that we have with Eubacteria and methanogens allows us to digest large, complex dietary polysaccharides more efficiently through interspecies hydrogen transfer. The hypothesis is that Eubacteria ferment the complex polysaccharides to short chain fatty acids (SCFAs), primarily acetate, propionate, and butyrate, in addition to some organic acids and gases such as hydrogen and carbon dioxide. If H2 builds up, the Eubacterial NADH dehydrogenases would be inhibited, leading to a decrease in the production of SCFAs, which would decrease the efficient utilization of polysaccharides (Samuel, et al., 2006). On the other hand, if interspecies hydrogen transfer occurred through the presence of an H2-dependent CO2-metabolizing methanogen, M. smithii, then the H2 would not accumulate, the dehydrogenases would not be inhibited, and the polysaccharides would be metabolized more efficiently. Increased digestion of fibers and carbohydrates is hypothesized to influence host calorie intake and obesity by producing more acetate and triglycerides, which will eventually be stored as fat. Therefore, M. smithii has been proposed as a probable therapeutic target for decreasing the energy harvested in obese humans (Samuel, et al., 2007). Viruses that attack M. smithii could be used to control the methanogenic population, thus altering the carbon flow in the intestine and affecting caloric intake.

Viruses

Viruses are acellular entities that minimally have a nucleic acid with a protective protein coat. They have been found in all three domains: Eukarya, Eubacteria, and Archaea. Because viruses lack the metabolic processes necessary for autonomous reproduction, they do not proliferate by cellular division.
Instead, viruses proliferate by taking over the resources of a host in order to reproduce. Because a virus depends on its host for replication, viruses are first classified by their host preference, followed by morphology, genome type, and structures (Orlova, 2009). The nucleic acid of the genome can be either single stranded or double stranded DNA or RNA. The genome can also be linear, circular, or even segmented, and genomes span an extremely wide range of sizes. The majority of viral gene products are essential for creating virus parts and aiding the infectious process. A protective protein coat called a capsid encloses the virus' genome. The function of the capsid is to protect the genome from being damaged by environmental factors such as pH, salinity, chemicals, or enzymatic hydrolysis. Moreover, capsids play an important role in host recognition and in the transport of the virus genome into the host during infection (Trun, et al., 2004).

Bacteriophage

Bacteriophages, or phages, are viruses that infect Eubacteria and Archaea. Bacteriophages can be found in every bacterial habitat, come in a variety of shapes, and are arguably the most diversified and oldest of all known viruses. Phages can be filamentous, icosahedral, or contain a tail structure that is attached to the head protein capsid. Currently, the International Committee on Taxonomy of Viruses (ICTV) recognizes one order, 13 families, and 31 genera of phages (Abedon, et al., 2006). However, tailed phages total about 96% of all reported bacteriophages and are in the order Caudovirales, which is divided into three phylogenetically related families: Myoviridae, Siphoviridae, and Podoviridae. The remaining cubic, filamentous, and pleomorphic phages are less than 3.6% and are grouped into 10 small families (Abedon, et al., 2006).
Tailed phages are distinguished by a unique combination of a symmetrical head and a helical tail but vary in structure, dsDNA genome size, and physiology. Their genome size ranges from 17 to 500 kb, and their tails can be anywhere from 10 to 800 nm in length. About 25% of tailed phages are within the family Myoviridae and have contractile tails composed of a sheath and a central tube, which is vital during infection. Siphoviridae represent about 61% of tailed viruses and have long noncontractile tails. Podoviridae have short tails and encompass the remaining 14% of tailed phages. Tailed phages use their tails to attach themselves to specific host receptors. This interaction defines their affinity for a specific group of bacteria or, in some cases, a specific strain (Deresinski, 2009).

As tailed bacteriophages irreversibly attach, they inject their genome into their host, where they take over the metabolic processes. For example, the host cell's RNA polymerase is used to initiate the phage infectious process. There are two major infectious life cycles that phages undergo: lytic or lysogenic. Lytic phages, or virulent phages, take over the host's metabolic processes, usually leading to the host's destruction through lysis. Lysogenic phages, on the other hand, enter the host and either are incorporated into the host genome and replicate in a silent state as the host replicates, or enter the lytic cycle for propagation of the phage.

Archaeal phage

Archaea are known to be located in a wide variety of extreme environments, but only about 50 viruses have been reported that infect this diverse group (Stedman, et al., 2010). Most archaeaphages have been isolated from extreme thermophiles and extreme halophiles. The kingdom Crenarchaeota contains members living in extreme temperature conditions on both ends of the spectrum, whereas Euryarchaeota encompasses many phylogenetically different organisms, such as methanogens and halophiles.
Although the number of phages found in Archaea is substantially smaller than in the Eukaryotic or Eubacterial domains, the diversity found is as great (Forterre, et al., 2006). Recently isolated archaeal phages discovered in high-temperature acidic environments, specifically from the Sulfolobales (Crenarchaeota), have been shown to have unique morphological characteristics that vary from bottle shaped to lemon shaped to filamentous and rod shaped particles (Snyder, et al., 2011). The majority of isolated phages infecting organisms within Euryarchaeota have a head capsid and tail structure similar to those seen in Eubacterial phages; however, there are very few spindle shaped and spherical phages within this kingdom. Although archaeaphages have diverse morphotypes, all genomes identified thus far are circular or linear dsDNA, with the exception of Halorubrum pleomorphic virus 1, which is ssDNA.

Even though methanogens were the first and most studied Archaea, there are only three known methanophages. ΨM1, one of the three methanophages, is virulent and infects the thermophilic archaeon Methanobacterium thermoautotrophicum strain Marburg (Mitchell, et al., 1979). ΨM1 has a dsDNA genome that is circularly permuted with terminal redundancy. This phage also has a polyhedral head capsid and tail structure, which places it in the order Caudovirales. About 15% of ΨM1 phage particles have concatemers from the plasmid pME2001 carried by Methanobacterium thermoautotrophicum Marburg, the only known host for ΨM1, which suggests the capability of generalized transduction (Pfister, et al., 1998). In vitro proliferation of ΨM1 leads to a more stable spontaneous deletion mutant, ΨM2, which is missing a fragment of approximately 0.7 kb.

Methanobrevibacter strain G and PG

In 1984, Baresi and Bertani identified a methanophage known as PG. Methanobrevibacter strain G, PG's only known host, and PG were both isolated from the ruminant habitat.
PG has been tested for its ability to infect other methanogens, including other strains of Methanobrevibacter, and strain G was the only viable host. PG is a lytic phage with a latent period of 7-9 hours and a burst size of about 20-60 phages (Baresi and Bertani, 1984). It also has a unique A-T rich dsDNA genome of 71,387 bp that encodes 72 presumptive genes. Of the 72 presumptive open reading frames (ORFs), 40 have unknown functions. The remaining 32 ORFs have some resemblance to genes found in Eubacteria and their phages, in addition to several Eukaryotes and their viruses (Baresi, personal communication). One of the presumptive genes identified from the sequence analysis was the terminase large subunit (TLS), which plays a significant role during genome packaging. It is hypothesized that studying the terminase large subunit could lead to a better understanding of phage and archaeal evolution.

Terminase Large Subunit

The terminase is a two-subunit enzyme that is involved in packaging nucleic acid into the phage head. In this reaction, the terminase cleaves the DNA and uses its ATPase activity to translocate the DNA through the portal protein and into the prohead (Black, 1995). This terminase-portal mechanism is used by tailed bacteriophages with double stranded viral DNA, usually with a linear genome configuration (Burroughs, et al., 2007). The mechanism is suggested to be found in a variety of families and genera of viruses, such as Siphoviridae, Podoviridae, and Myoviridae. However, many phages have not been fully studied and sequenced, so only putative terminase large subunits have been annotated among those phages. Bacteriophage terminases from the λ, P2, T3, T7, P22, T4, and Φ29 phages have been isolated and sequenced, providing information that supports the enzyme's key role in ATPase-dependent DNA packaging into the prohead (Black, 1995).
The terminase enzyme consists of a large and a small subunit whose genes overlap slightly. One of the model phages used to study the terminase is phage T4. Researchers were able to show the structure of the g16 and g17 products through cryo-electron microscopy and X-ray structure analysis (Hegde, et al., 2012). The g16 gene encodes the smaller subunit, which is about 18 kDa, and g17 encodes the larger subunit of about 70 kDa; the two genes overlap each other by about 5 codons (Black, 1995). By creating mutations of the genes and overexpressing the terminase, studies have been able to show that this two-protein system constitutes the terminase and requires ATP for cutting the viral DNA and translocating it into the head of the virus (Burroughs, et al., 2007).

The mechanism responsible for genome packaging in these phages overcomes an incredible amount of resistance from the capsid (Mitchell, et al., 2002). The translocation of the DNA has been shown to be one of the most intense ATP-consuming events in nature, but the ATPase mechanism that enables it is still not known (Mitchell, et al., 2002). Researchers have tried to understand this motor apparatus by aligning several terminase and packaging genes and have determined that the terminase large subunit consists of an N-terminal ATPase domain and a C-terminal nuclease domain (Burroughs, et al., 2006).

All known tailed bacteriophages contain linear dsDNA genomes when packaged in the head capsid. Among these linear genomes there are several known types of terminal ends: cohesive ends (5’- or 3’-single strand extensions), circularly permuted direct terminal repeats, short or long exact direct terminal repeats, terminal host DNA sequences, or covalently bound terminal proteins (Casjens and Gilcrease, 2009).
The specific terminus type among tailed phages reflects different DNA replication schemes and provides insight into how the terminase functions during DNA packaging (Table 1).

Table 1. Types of termini from known tailed-phage genomes

Terminus type                                   Prototype phage   Replication strategy
Cohesive ends
  5’-single strand extension                    λ                 Rolling circle → concatemer
                                                P2                Circle → circle
  3’-single strand extension                    HK97              Rolling circle → concatemer*
Circularly permuted direct terminal repeats†    T4                Complex → concatemer
                                                P22               Rolling circle → concatemer
                                                P1                Rolling circle → concatemer
Host DNA at termini                             Mu                Duplicative transposition into host DNA
Exact direct terminal repeats
  Short (few hundred bp)                        T7                Linear → concatemer
  Long (thousands of bp)                        SPO1              Complex → concatemer
                                                T5                Complex → concatemer
Covalent terminal proteins                      Φ29               Protein-primed linear → linear

Note: Adapted from Bacteriophages: Methods and Protocols, p. 91, by Casjens, S. R. and Gilcrease, E. B., 2009, Humana Press.
† These known virions have their genome sequence terminated at different locations along the sequence, and the length of the terminal repeat fluctuates among virions.
* Genomic analysis predicts this replication strategy, but it has not been experimentally studied.

Of the six well-researched types of terminal ends, five are created by the terminase cleaving the genome produced by the bacteriophage's replication mechanism. Phages with terminal proteins are known to replicate as monomeric linear molecules (Casjens and Gilcrease, 2009). The majority of tailed phages package their DNA from concatemers created by rolling circle replication or a more intricate replication initiation strategy, nicking or melting and translocating their DNA in a unidirectional packaging series along the concatemers. Each concatemer usually fills about two to five phage heads, but some phages are capable of filling 10 or more depending on the conditions during infection (Casjens and Gilcrease, 2009).
As the terminase identifies the viral genome, the initial packaging event begins with cleavage at or near the packaging recognition site. In headful packaging phages, the packaging recognition site is known as the pac site, and when the head capsid is filled, packaging is completed by a second cleavage made by the terminase. In cohesive end phages, the packaging recognition site is referred to as the cos site, and packaging is terminated at a sequence-specific site, leaving identical single stranded extensions that are complementary to each other. As soon as the terminating cleavage is made, the next packaging event is initiated from the remaining concatemer and terminated in the same fashion. Since tailed phages have varied terminal ends depending on the replication process, terminase cleavage, and packaging mechanism, additional research is frequently required to understand the true characteristics of the linear genome.

Hypothesis

If PG terminase packaging uses the cos site as part of its packaging strategy, then restriction digest and HPLC nucleoside analysis would show cohesive ends at a specific conserved location.

Present Study

The present study was designed to test whether tailed phage PG uses site-specific phage packaging that produces cohesive ends. This was accomplished using three different techniques.

1) Bioinformatic studies were undertaken using the known PG DNA sequence and comparing it to other tailed phage sequences to ascertain similarities. Similarities defined using these bioinformatic tools aid in interpreting the results obtained using HPLC and restriction fragment analysis.

2) PG DNA was subjected to restriction enzyme analysis under differing conditions. Single stranded cohesive ends can, by hydrogen bonding, form overlapping double stranded DNA, as seen in phage λ. When heated, these overlapping ends dissociate back to single strands.
Thus restriction fragment analysis of the PG DNA would produce differing restriction fragment patterns between heated and unheated samples.

3) HPLC analysis, after treatment with mung bean nuclease, was used to identify the nucleoside composition of single stranded DNA. Mung bean nuclease hydrolyzes single stranded DNA, producing single nucleotides. Subjecting PG DNA to mung bean nuclease will release the nucleotides from single stranded ends, which can be separated and identified using HPLC.

Using these techniques I was able to show that PG has an AT rich cohesive end. Results also suggest that PG uses circular replication as its packaging strategy.

METHODS

Preparation of “B” solution

To 100 ml of distilled water (dH2O), 12.5 g of yeast extract, 12.5 g of casamino acids, and 3 L of 10X trace vitamin (see appendix A) were added. The mixture was then brought to a boil under a 70% N2/30% CO2 atmosphere and slightly cooled before 10 ml was anaerobically aliquoted into anaerobic culture tubes under a 70% H2/30% CO2 atmosphere. The tubes were then closed with n-butyl rubber stoppers, capped with aluminum crimp caps, and autoclaved. Once sterilized, 0.5 ml of sterile biotin (0.2 mg/ml) and 0.1 ml of sterile 1% Na2S were aseptically and anaerobically added to all the tubes using a sterile syringe that had been flushed with 70% H2/30% CO2 gas.

Na2S

2% Na2S was made by taking Na2S crystals and rinsing them with room temperature dH2O previously boiled under a 70% N2/30% CO2 atmosphere. The cleaned and dried Na2S crystals were weighed and added to amber serum bottles while being flushed with 70% N2/30% CO2, followed by addition of the appropriate volume of boiled dH2O to give a final concentration of 2% Na2S. The amber serum bottle atmosphere was then flushed with 70% H2/30% CO2, and the bottle was closed with an n-butyl rubber stopper, capped with an aluminum crimp cap, and autoclaved.
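The "appropriate volume" of water for a percent solution follows from simple weight/volume arithmetic. The short Python sketch below illustrates that arithmetic only; it assumes the percentages in this section are weight/volume (g per 100 ml), which the text does not state explicitly, and the function names and the 1.0 g example mass are hypothetical.

```python
def percent_wv(mass_g: float, volume_ml: float) -> float:
    """Return the weight/volume percentage (g per 100 ml) of a solution."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return 100.0 * mass_g / volume_ml


def volume_for_percent_wv(mass_g: float, target_percent: float) -> float:
    """Return the final volume (ml) that gives target_percent w/v for mass_g."""
    if target_percent <= 0:
        raise ValueError("target percentage must be positive")
    return 100.0 * mass_g / target_percent


if __name__ == "__main__":
    # Hypothetical example: 1.0 g of weighed crystals brought to 2% (assumed w/v).
    print(volume_for_percent_wv(1.0, 2.0), "ml final volume")
```

Under the same assumption, `percent_wv(3.0, 50.0)` evaluates to 6.0, which matches a solution made from 3.0 g of solute in a final volume of 50 ml.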
NaHCO3

6% NaHCO3 was made by weighing out 3.0 g of NaHCO3 and combining it with 50 ml of dH2O in a round bottom flask. The mixture was placed under 70% N2/30% CO2 gas and brought to a boil. Once cooled, the solution was transferred anaerobically to a 100 ml glass bottle using a glass pipette that had been flushed with 70% H2/30% CO2 gas, and the volume was adjusted to 50 ml. The serum bottle atmosphere was then switched to 70% H2/30% CO2, and the bottle was closed with an n-butyl rubber stopper, capped with an aluminum crimp cap, and autoclaved.

Antibiotics

The antibiotic mixture stock solution contained 0.02% vancomycin, 0.02% D-cycloserine, and 0.2% ampicillin. This was made by adding 0.02 g of vancomycin, 0.02 g of D-cycloserine, and 0.2 g of ampicillin to a 10 ml beaker. The antibiotics were then transferred into the anaerobic hood, where they were dissolved in 5 ml of boiled dH2O. Then, using a sterile 5 ml syringe and a sterile 0.45 µm filter, the solution was dispensed into sterile tubes or bottles. The antibiotic solution was then stored at 4°C until used.

Ms06 agar

100 ml of Ms06 base agar was made by adding to a round bottom flask 0.125 g of NH4Cl, 5 ml of mineral 1 (see appendix A), 5 ml of mineral 2 (see appendix A), 0.01 ml of trace minerals, 0.5 ml of 0.4% CaCl2, 0.8 g of sodium acetate, 1.4 g of Bacto agar, 0.1 ml of resazurin, and 100 ml of distilled pure E water (see appendix A). The flask was then placed in a boiling water bath under 70% N2/30% CO2 gas to dissolve the media. Once it had cooled but not solidified, 50 mg of L-cysteine was added to assist in reducing the medium, changing the color from pink to clear. Using the Balch technique, 4.5 ml or 5.0 ml of the medium was then anaerobically transferred to 18 X 150 mm anaerobic culture tubes or serum bottles under continuous 70% H2/30% CO2 gas; these were closed with n-butyl rubber stoppers, capped with aluminum crimp caps, and autoclaved. Sterile Ms06 base agar was then stored at room temperature.
Before using the medium, 50 µl of 0.1% Na2S, 0.1 ml of 6.5% NaHCO3, 0.2 ml of "B" solution (see appendix A), and 150 µl of antibiotics were added anaerobically and aseptically to each tube.

Ms06 broth

Ms06 broth was made in a similar fashion to Ms06 agar except that agar was not added.

Transfer and growth of Methanobrevibacter strain G

Methanobrevibacter strain G was aseptically and anaerobically transferred every week into 4.5 mL of Ms06 broth. Each inoculated tube was pressurized to 30 psi with 70% H2/30% CO2 and incubated at 37°C in a rotator.

Gas chromatography

Methane was determined using a GOW-MAC series 580 Gas Chromatograph. Gas samples were withdrawn anaerobically with a sterile syringe and injected, and separation was achieved on a 12 ft. Porapak Q 80/100 mesh column with helium carrier gas at 20 mL/min as the mobile phase. Known methane samples were injected before all samples in order to determine the appropriate peak and retention time.

Plating

Ms06 plates were prepared by transferring serum bottles of liquefied Ms06 agar media into the anaerobic hood. 20 ml of Ms06 agar was then aseptically dispensed into plastic petri dishes containing the appropriate volume of selective antibiotics. Once solidified, the plates were used the same day for PG harvesting or for determining PG titer.

PG production

To prepare for phage infections, Methanobrevibacter strain G was grown to an OD of 0.7-0.9 in Ms06 broth tubes. PG, strain G, and liquefied Ms06 agar were transferred into the anaerobic hood. 1.5 mL of strain G was aliquoted into each of 12 sterile 3.5 mL glass tubes held in 37°C heating blocks. 0.1 mL of PG (10^6 PFU) was then added to strain G and incubated for 30 minutes. One of the 3.5 mL glass tubes, which did not include PG, was set aside as our positive control. After 30 minutes, 1.5 mL of sterile liquefied Ms06 agar was added to each tube, mixed, and poured as an overlay onto Ms06 agar plates.
Once the overlays solidified, 10 µl of the phage was placed at the center of the control plate as a positive control. All the plates were placed into anaerobic Torbal cylinders along with a small plastic bag containing a few grams of anhydrous calcium chloride. The cylinder was then removed from the anaerobic hood, pressurized to 15 psi with H2/CO2, and incubated at 37°C.

PG harvesting

PG was harvested from the overlay plates by one of two methods when the cylinder pressure dropped to about 5 psi (5 days). The first method required scraping the overlay off each plate with a hockey stick and placing the overlays into a GSA centrifuge bottle. An equal volume of 100 mM citrate buffer at pH 6 was added to the GSA centrifuge bottle. Five drops of chloroform were then added to the collected samples, which were refrigerated overnight aerobically. The overlays were centrifuged at 4,000 rpm for 30 minutes at 4°C in the GSA rotor. The supernatant, approximately 25 ml, was decanted into 50 ml Oak Ridge centrifuge tubes and centrifuged in the SS34 rotor at 39,000 x g for an additional 2 hours at 4°C. After centrifugation, the supernatant was saved for additional phage production. The pellet containing bacteriophage was suspended in 0.5 mL of either pH 6.5 MOPS buffer (50 mM MOPS–20 mM EDTA) or 100 mM citrate buffer at pH 6. One drop of chloroform was added to each suspended phage pellet, which was then stored at 4°C.

In the second and preferred method, the harvested plates were flooded with either citrate buffer or MOPS buffer and stored at 4°C overnight to allow PG to diffuse into the buffer. The buffer was removed into 50 mL Oak Ridge centrifuge tubes, and 5 drops of chloroform were added. The phage-buffer suspension was centrifuged at 39,000 x g in the SS34 rotor for 2 hours at 4°C. The supernatant was decanted and saved for additional phage production, while the pellet was suspended in 0.5 mL of either pH 6.5 MOPS buffer (50 mM MOPS–20 mM EDTA) or 100 mM citrate buffer at pH 6.
As in the previous method, one drop of chloroform was added to each suspended pellet, which was then stored at 4°C.

Phage titer

Procedures similar to those described for phage production were used. Methanobrevibacter strain G was grown to an OD of 0.7-0.9 in Ms06 broth and distributed in the anaerobic hood into 3.5 mL sterile test tubes. Ms06 agar plates were freshly made, and strain G was mixed with 1.5 mL of liquefied Ms06 agar in order to pour the overlay. 10 µl samples of a 1/10 serial dilution of PG, with 100 mM citrate buffer pH 6 as the diluent, were spotted onto solidified Ms06 strain G overlay lawns. The plates were incubated at 37°C inside the anaerobic cylinder pressurized to 15 psi with H2/CO2 gas.

PG DNA extraction

0.4 mL of clear phage lysate was pipetted into an Eppendorf tube. 10 µl of 20 mg/mL proteinase K was added to achieve a final concentration of 0.5 mg/mL, and the tube was incubated at 37°C for 30 minutes to an hour. After incubation, 10 µl of 10% sodium dodecyl sulfate (SDS) was added and mixed by inverting the tube. The tube was left to incubate at room temperature for 10 minutes, and 50 µl of 2M Tris/0.2M Na2EDTA (pH 8.5) was added. The tube was inverted to mix and then incubated at 70°C for 5 minutes. After incubation, the tube was set aside to cool to room temperature. An equal volume of TE saturated phenol (pH 6.8) was added and mixed by inverting the tube. The tube was centrifuged at 14,000 rpm for 5 minutes at room temperature. The supernatant was transferred with a wide cut end pipette tip to a new sterile Eppendorf tube, and an equal volume of phenol/chloroform/isoamyl alcohol (1:1:1) was added. The tube was mixed and centrifuged at 14,000 rpm for 5 minutes at room temperature. The supernatant was transferred with a wide cut end pipette tip into another new sterile Eppendorf tube. An equal volume of TE saturated chloroform was added, mixed, and centrifuged in the same fashion mentioned above.
The supernatant was then transferred to a new microfuge tube. The DNA was precipitated by adding 40 µl of 3M sodium acetate at pH 7 and two volumes of ice-cold 100% ethanol. It was mixed and set on ice for 30 minutes. Next, the tube was centrifuged at 14,000 rpm for 10 minutes at 10°C. The supernatant was carefully removed, and the tube was filled halfway with 70% ethanol. The tube was mixed and again centrifuged at 14,000 rpm for 10 minutes at 10°C. The supernatant was carefully decanted, and the pellet was placed under vacuum until the ethanol had completely evaporated. Once dried, the DNA pellet was suspended in sterile pure-E H2O, and the concentration was determined using a Nanodrop spectrophotometer ND-1000™.

Construction of primers for presumptive TLS

Primers for the presumptive terminase large subunit gene of methanophage PG were designed manually and purchased from IDT DNA. Restriction enzyme sites were added to the 5’ ends of the forward and reverse primers. The forward primers have an EcoRI site and the reverse primers have a KpnI site. In addition, the sequence GATC was added before each restriction site.

Polymerase chain reaction (Taq)

The PCR mixture comprised 2 µl of template DNA, 5 µl of forward primer at 1 pmol/µl, 5 µl of reverse primer at 1 pmol/µl, 25 µl of Master Mix™ from Fermentas, and 13 µl of dH2O. The total 50 µl reaction was added to a thin walled 0.5 mL Eppendorf tube and covered with a drop of mineral oil to prevent evaporation during the reaction. The PCR was run in the Perkin-Elmer DNA Thermal Cycler™ 480. The program was 1 cycle of 95°C for 10 minutes, followed by 35 cycles of 95°C for 30 seconds, 50°C for 45 seconds, and 52°C for 2 minutes, with a final extension at 52°C for 15 minutes. The PCR reaction was maintained at 4°C after completion. After the PCR was completed, the mineral oil was removed by rolling the reaction around on parafilm, to which the oil adheres.
The reaction was then transferred into sterile Eppendorf tubes and cleaned by phenol-chloroform extraction. The product was then precipitated by addition of sodium acetate and ethanol, following the same procedure as for PG DNA extraction.

PCR product clean up

The PCR product was raised to 500 µl with pure-E H2O. An equal volume of TE-saturated phenol was added and mixed by inverting the tube. The tube was centrifuged at 14,000 rpm for 5 minutes at room temperature. The supernatant was transferred with a wide cut-end pipette tip to a new sterile Eppendorf tube, and an equal volume of phenol/chloroform/isoamyl alcohol (1:1:1) was added. The tube was mixed and centrifuged at 14,000 rpm for 5 minutes at room temperature. The supernatant was transferred with a wide cut-end pipette tip into another new sterile Eppendorf tube. An equal volume of TE-saturated chloroform was added, mixed, and centrifuged in the same fashion as above. The supernatant was then transferred to a new microfuge tube. The DNA was precipitated by adding 40 µl of 3 M sodium acetate (pH 7) and two volumes of ice-cold 100% ethanol. The tube was mixed and set on ice for 30 minutes, then centrifuged at 14,000 rpm for 10 minutes under refrigeration. The supernatant was carefully removed and the tube was filled halfway with 70% ethanol. The tube was mixed and centrifuged at 14,000 rpm for 10 minutes at 10°C. The supernatant was carefully decanted and the pellet was placed under vacuum until the ethanol had completely evaporated. Once dried, the DNA pellet was suspended in 50 µl of sterile pure-E H2O and stored at -20°C.

Electrophoresis

DNA samples were run on 0.8% agarose gels. 0.24 g of agarose was weighed and placed into a 50 mL flask. 30 mL of 1X TAE was added to the agarose, and the flask was heated in a double boiler until the agarose had dissolved. The flask was left at room temperature to cool until the agarose was ready to pour into the gel tray with the comb in place.
Once the gel solidified, the comb and tray barriers were removed and the gel box was filled with 1X TAE. 2 µl of loading dye was added to 2 µl of PCR product and the volume was raised to 15 µl with sterile pure-E H2O. Once the lanes were loaded, the gel was run at 75 volts for 1 hour.

Pulsed Field Electrophoresis

Restriction-digested DNA samples were separated by pulsed-field electrophoresis on 1% agarose gels. 0.3 g of agarose was weighed and placed into a 50 mL flask. 30 mL of 1X TAE was added to the agarose, and the flask was heated in a double boiler until the agarose had dissolved. The flask was left at room temperature to cool until the agarose was ready to pour into the gel tray with the comb in place. The module was placed into an ice water bath to keep the system cool for the duration of the run. Once the gel solidified, the comb and tray barriers were removed and the gel box was filled with 1X TAE. 2 µl of loading dye was added to 2-4 µl of digested DNA and the volume was raised to 10 µl with sterile pure-E H2O. Once the lanes were loaded, the gel was run at 50 volts with a forward ramp of 2 seconds and a reverse ramp of 1 second to separate bands of 20 kb or greater.

Visualization of DNA

Bands in an agarose gel were visualized by flooding the gel with 0.5 µg/ml ethidium bromide for 20 minutes. The gel was placed in a UV light box for viewing and photographed using an OLYMPUS™ digital camera. If bands were faint, SYBR Gold™ staining was used as an alternative: 5 µl of SYBR Gold™ was diluted into 50 mL of 1X TAE in the gel box, and the agarose gel was stained for 30 minutes and viewed on a Dark Reader™.

DNA fragment size

DNA fragment size was determined by comparing unknown bands with an O’GeneRuler™ DNA 100-10,000 bp ladder from Thermo Scientific. Migration distances were measured in millimeters from the bottom of the well to the bottom of the DNA band.
The ladder bands were plotted on semi-log paper, and the sizes of the unknown bands were inferred from the best-fit line through all the migrated ladder bands.

DNA extraction from gel

A 0.8% low-melting-temperature agarose gel was prepared in order to extract DNA bands from the gel. The lanes were loaded with the desired samples and run at 75 volts for 1 hour. The gel was then stained with ethidium bromide and visualized under UV light. The desired bands were cut out of the gel and placed into a sterile Eppendorf tube. About four volumes of TE buffer were added to the tube, which was heated at 65°C to melt the gel. Phenol-chloroform extraction was performed, and the DNA was precipitated by adding 100 µl of 5 M LiCl and 500 µl of ice-cold 100% ethanol. After the pellet was suspended in pure-E H2O, DNA concentration was measured with the Nanodrop spectrophotometer ND-1000™.

DNA gel extraction was also conducted using the Zymoclean™ Large Fragment DNA Recovery Kit. A 0.8% low-melting-temperature agarose gel was prepared, loaded, run, stained, and visualized as described above. The desired bands were cut out of the gel and placed into a sterile 1.5 mL microcentrifuge tube. Three volumes of Agarose Dissolving Buffer™ (ADB) were added to the excised gel slice, which was incubated at 37-55°C for 5-10 minutes until the gel had completely dissolved. The melted agarose solution was transferred to a Zymo-Spin™ column in a collection tube. The column was centrifuged for 1 minute at 14,000 rpm and the flow-through was discarded. 200 µl of DNA wash buffer was added to the column, which was centrifuged again for 30 seconds, and the wash step was repeated. 10 µl of DNA elution buffer was then added directly to the column matrix. After the elution buffer had sat on the matrix for 1 minute, the column was placed into a 1.5 ml tube and centrifuged for 30 seconds.
After centrifugation, DNA concentration was measured with the Nanodrop spectrophotometer ND-1000™.

DNA cleaning

The Genomic DNA Clean and Concentrator™ kit from Zymo Research was used. Two volumes of ChIP DNA Binding Buffer™ were added to each volume of DNA sample in a 1.5 ml microcentrifuge tube. The mixture was transferred to a Zymo-Spin™ IC-XL column in a collection tube and centrifuged for 30 seconds at 14,000 rpm. The flow-through was discarded and 200 µl of DNA Wash Buffer™ was added to the column. The tube was centrifuged for 1 minute, and the wash step was repeated. 10-20 µl of DNA elution buffer was added directly to the column matrix and incubated at room temperature for one minute. The column was transferred to a new microcentrifuge tube and centrifuged for 30 seconds to elute the DNA. After centrifugation, DNA concentration was measured with the Nanodrop spectrophotometer ND-1000™.

Restriction enzyme digest

DNA samples were digested with the appropriate restriction enzymes by adding 4-6 µl of sterile pure-E H2O to 3-4 µl of DNA sample. 1 µl of Fermentas Fast Digest buffer was then added to the tube, which contained the Fermentas Fast Digest restriction enzyme at 1X concentration. The tube was incubated at 37°C for 30 minutes.

Determination of terminal ends

DNA samples were analyzed for cohesive ends by digesting the virion DNA with different restriction enzymes that could yield fragments containing the H-bonded cohesive ends. The virtual cutter in Serial Cloner™ 2.6.1, applied to the genome sequence, together with trial and error, guided the choice of restriction enzymes. After digestion of the DNA samples with the appropriate restriction enzymes, the reaction was heated to 65-70°C for 15 minutes and then divided into 2 equal portions in microfuge tubes. One tube was immediately placed into an ice bath and the other was slowly cooled to room temperature on the bench top.
After cooling, the DNA samples were run on an agarose gel.

Alkaline phosphatase digestion

DNA samples were treated with alkaline phosphatase in order to provide a better reading on the High Performance Liquid Chromatography (HPLC) instrument. 2 µl of 10X Thermo Scientific Fast Digest buffer and 1 µl of FastAP Thermosensitive Alkaline Phosphatase were added to the DNA samples (20 µl reactions) for digestion. The samples were incubated at 37°C for 30 minutes to an hour and filtered through a NANOSEP™ 3K Omega centrifugal filter.

DNA ligation

PG DNA was ligated by adding 2 µl of 5X rapid ligation buffer, 1 µl of T4 DNA ligase, and PG DNA to a sterile Eppendorf tube, raising the volume to 10 µl with sterile pure-E H2O, and incubating the mixture for 5 minutes at room temperature. After incubation, the sample was filtered through a NANOSEP™ 3K Omega centrifugal filter so that it was treated in the same way as the other DNA samples run through the HPLC.

High Performance Liquid Chromatography (HPLC)

An Agilent™ 1100 series HPLC instrument was used to test for the presence of overhanging ends by analyzing nucleosides released from DNA samples after digestion with mung bean nuclease. Two equal portions of a DNA sample were placed into separate microfuge tubes. One tube was heated to 65-70°C for 15-30 minutes, while the other was kept at room temperature. Immediately after heat treatment, both tubes were digested with mung bean nuclease at 37°C for 30 minutes to 1 hour in order to digest single-stranded ends until the ends of the DNA were blunt. After incubation, the DNA samples were raised to 40 µl with sterile pure-E H2O and filtered through a NANOSEP™ 3K Omega filter. The samples were then treated with alkaline phosphatase and incubated at 37°C for 30 minutes to an hour. After alkaline phosphatase treatment, the DNA samples were again filtered through a NANOSEP™ 3K Omega filter.
Prior to turning on the HPLC, all solvents were filtered through a 0.45 µm cellulose membrane filter and placed in the appropriate solvent reservoirs. After the DNA samples had been prepared for HPLC analysis, the pump, injector, column, detector, and computer were turned on. Once the HPLC was connected to the computer, the instrument 1 online program tab and the purge valve were both opened in order to change configurations on the machine. The pump was set to 5 ml/minute for 6 minutes for each solvent, in the order 70% methanol, 12% methanol + triethylamine phosphate (pH 5.1), and then 12% methanol. After purging with all three solvents, the pump was switched to 1 ml/minute on 12% methanol and the purge valve was closed. Once the pressure had increased and the baseline was constant, the solvent was switched to 12% methanol + triethylamine phosphate. The thermostat and detector were then turned on, with the thermostat set to 30°C. Once the indicator showed that the instrument was ready, the DNA samples were injected into the HPLC for analysis.

Nanodrop spectrophotometer

DNA purity and concentration were measured at A260/A280 using the Nanodrop spectrophotometer ND-1000™. Sterile pure-E H2O was used to blank the spectrophotometer before analyzing DNA samples.

Sequence retrieval

The terminase large subunit nucleotide sequences from bacteriophages and archaeal phages were obtained from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/) and imported into the San Diego Supercomputer Center Biology WorkBench (http://workbench.sdsc.edu/) for further analysis.

Multiple Sequence Alignment (MSA)

Multiple sequence alignments were performed in Biology WorkBench v.3.2. Before alignment, some sequences were converted to their reverse complement to obtain the correct orientation. Alignment between phages was expected to be difficult because phage genes are subject to both vertical and horizontal transmission.
Several attempts were made to align 20 phage sequences with various gap penalties and extensions; however, almost no conserved regions could be recovered when using ClustalW in Biology WorkBench. In order to achieve a usable alignment, the multiple sequence alignments (MSA) were split into two separate groups: one MSA for the family Siphoviridae and the other for Podoviridae. Using high gap penalties and gap extensions forced conserved regions to align, which led to fewer gaps. The parameters used for alignment are listed below:

Parameters for Siphoviridae:
Matrix: IUB/BESTFIT
Gap penalty: 90
Gap extension: 8

Parameters for Podoviridae:
Matrix: IUB/BESTFIT
Gap penalty: 90
Gap extension: 10

Translation of nucleotide sequences to amino acid sequences

The multiple sequence alignments were then imported into BioEdit, which allowed the sequences to be converted into amino acid sequences. Toggling between nucleotide and amino acid sequences identified unwanted X’s and assisted in correcting the placement of gaps within the alignment.

Distance matrices

Different distance matrices were generated using PAUP4.0 and MEGA4 in order to measure evolutionary and genetic distances between species of interest that have diverged from a common ancestor. The matrices created using MEGA were pairwise distance, Kimura 2-parameter, Jukes-Cantor (Nei-Gojobori) synonymous and nonsynonymous, and Tamura-Nei. The matrices created using PAUP were pairwise distance, Kimura 2-parameter, Kimura 3-parameter, Jukes-Cantor, absolute distance, and Tajima-Nei (Kimura, 1980; Jukes and Cantor, 1969; Tajima and Nei, 1984).

Modeltest

Modeltest (Posada and Crandall, 1998) is a program used in conjunction with PAUP4.0 (Swofford, 2002) to analyze the likelihood scores for 56 different models. Modeltest identifies the best model using hierarchical likelihood ratio tests (hLRTs) and the Akaike Information Criterion (AIC).
The model with the lowest AIC value is the best-fit model to use. Modeltest was performed for the Podoviridae set of sequences and for the Siphoviridae set. The best model determined for both Podoviridae and Siphoviridae was the general time reversible model with gamma rate distribution (GTR+G), with AIC = 11099.8994, -lnL = 5540.9497, and K = 9 for Podoviridae and AIC = 24658.3223, -lnL = 12320.1611, and K = 9 for Siphoviridae. The optimal values derived from Modeltest were then applied in the phylogenetic analysis to construct the Maximum Likelihood trees. This information assisted in preparing the best Maximum Likelihood tree and was used to perform bootstrap analysis.

Phylogenetic analysis

Parsimony analysis provides the simplest technique, using a non-model-based algorithm to develop trees with very few assumptions. The parameters were set to default and all characters were weighted equally. The branches were set to collapse if the maximum length is zero. In addition, character-state optimization was set to accelerated transformation, and assignment of states not observed in terminal taxa to internal nodes was allowed. Potential shortcuts among those states were recognized by the “3+1” test.

Bayesian analysis was used to build phylogenetic trees from the data together with prior information. To conduct Bayesian analysis, the files from PAUP and the MrBayes blocks were combined and executed using MrBayes. The program was set to run one million generations in order to produce 10,000 trees. Three different nucleotide sequence sets were analyzed: one set for Podoviridae, another set for Siphoviridae, and a third set combining both families of sequences.

Maximum Likelihood analysis searched for the tree with the highest likelihood under a substitution model and therefore required the information from Modeltest in order to create optimal phylogenetic trees. According to Modeltest, the best-fit model selected for all three sets was GTR+G.
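The AIC values reported under Modeltest can be cross-checked from the reported -lnL and K with the standard definition AIC = 2K - 2 lnL. The following is a minimal arithmetic sketch in Python; the function name `aic` is illustrative only and is not part of Modeltest or PAUP.

```python
def aic(neg_log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2K - 2 lnL = 2K + 2(-lnL)."""
    return 2 * k + 2 * neg_log_likelihood


# -lnL and K as reported in the text for the GTR+G model.
podo_aic = aic(5540.9497, 9)
sipho_aic = aic(12320.1611, 9)

print(f"Podoviridae  AIC = {podo_aic:.4f}")
print(f"Siphoviridae AIC = {sipho_aic:.4f}")
```

Both results agree with the reported AIC values to within rounding of the last decimal place of -lnL.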
Bootstrap analysis

Bootstrap analysis was used to assess the reliability of a phylogenetic tree by randomly resampling the columns of the MSAs, with replacement, and testing whether the same tree is recovered from the resampled data. Podoviridae and Siphoviridae sequences were analyzed with 1000 replicates using the Modeltest results and Maximum Likelihood trees.

Time of divergence

Divergence time tables were calculated in MEGA4 using the Jukes-Cantor non-synonymous matrix for both Podoviridae and Siphoviridae. In order to estimate divergence dates, the equation µ = K/(2t) was rearranged to t = K/(2µ), where µ is the number of substitutions per site per year, K is the number of substitutions between two species, and t is the time of divergence between the two species. However, no substitution rate per site per year has been established for bacteriophages; therefore, the bacterial non-synonymous rate of 4.5 × 10⁻⁹ was used.

Multiple Sequence Comparison by Log-Expectation (MUSCLE)

99 phage TLS amino acid sequences were retrieved from the NCBI GenBank database. The phages chosen have types of termini that have been studied or experimentally determined. PG and ψM2 were added, and the set was input into MUSCLE to perform a multiple sequence alignment (http://www.drive5.com/muscle/). MUSCLE has been shown to be more consistent and more accurate than ClustalW. The algorithm combines fast distance estimation, progressive alignment based on a profile function, and refinement using tree-dependent restricted partitioning (Edgar, 2004).

FastTree

FastTree infers approximate maximum likelihood phylogenetic trees from large protein alignments. In addition, the tool uses a heuristic approach to identify better trees and estimates the rate of evolution at each site. The alignment produced by MUSCLE was analyzed for phylogenetic relationships using the Whelan and Goldman (WAG) model of amino acid evolution (Whelan and Goldman, 2001).
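The relation t = K/(2µ) given under Time of divergence can be illustrated with a short Python sketch. The rate of 4.5 × 10⁻⁹ substitutions per site per year is the value stated in the text; the function name and the example distance K = 0.09 are hypothetical and are not values from this study.

```python
# Bacterial non-synonymous rate used in the text (substitutions/site/year).
MU_BACTERIAL_NONSYN = 4.5e-9


def divergence_time(k: float, mu: float = MU_BACTERIAL_NONSYN) -> float:
    """Return the divergence time t in years from t = K / (2 * mu)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return k / (2.0 * mu)


# Hypothetical pairwise distance, for illustration only.
k_example = 0.09
t_years = divergence_time(k_example)
print(f"K = {k_example} -> t = {t_years:.3e} years")
```

The factor of 2 reflects that substitutions accumulate along both lineages after they split from the common ancestor.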
RESULTS

Terminase Large Subunit in PG

The PG genome sequence was analyzed with GeneMark to predict open reading frames. GeneMark predicted 72 genes, which were compared against NCBI using BLAST. One of the open reading frames matched the Terminase Large Subunit from Methanobacterium phage ψM2 with 34% identity and an E-value of 2.90E-65. Located at 66613-69198, the presumptive gene is 2,586 bp long and translates to 861 aa. PG’s DNA replication and packaging are currently unknown; however, if the amino acid sequence of a bacteriophage’s TLS is known, the packaging strategy can often be predicted by comparative analysis. Phage TLS sequences often cluster according to the type of terminal ends they generate after packaging (Casjens and Gilcrease, 2009). This led to an attempt to conduct evolutionary studies on PG’s TLS.

Table 2. PG TLS location within the genome and BLAST result

Left end   Right end   Length (bp)   AA    BLAST match   E-value   % Identity   Organism
66613      69198       2586          861   TLS           1E-59     34%          ψM2

Prediction of PG’s packaging strategy

The amino acid sequence of PG’s TLS was compared with those of phages whose terminal types are experimentally known. This comparative analysis has shown that these sequences cluster the phages according to their type of terminal ends. The additional 100 annotated phage sequences with determined types of terminal ends were retrieved from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/). MUSCLE was used to construct the multiple sequence alignment from these 101 amino acid sequences, and the alignment was input into FastTree to create approximate maximum likelihood phylogenetic trees. FastTree is more accurate than distance-matrix methods and uses the Whelan and Goldman (WAG) model of amino acid evolution. The reliability of each split in the tree is determined by the Shimodaira-Hasegawa test on the three nearest-neighbor interchanges around that split, with 1,000 resamples.
As indicated in Figure 1, the closer the local support values are to 1, the more reliable the split is. In the FastTree phylogram, PG appears to share a common ancestor with phages having short direct terminal repeats (DTR), but it possibly also shares common ancestry with well-known phages that create cohesive ends. Although phage DNA ends could result from many different replication, cleavage, or packaging mechanisms, PG’s clustering in the phylogenetic tree guided the next step: a directed analysis to identify PG’s possible cos site or direct terminal repeats.

Figure 1. 101 phage TLS FastTree phylogram with local support values using amino acid sequences. The red star indicates the methanophage PG. The yellow star indicates the other methanophage ψM2. The highlighted areas are color coordinated according to known types of terminal ends created by those phages.

Phage PG production and DNA extraction

Two techniques were used to produce PG and its DNA. One method required scraping the overlay agar from the phage plates. The second, preferred method was to flood the phage plates with either pH 6.5 MOPS or citrate buffer and then collect the liquid, giving a phage sample without agar residues. In both cases, titers between 10⁹ and 10¹⁰ PFU/mL were obtained. However, extraction of PG DNA from MOPS buffer and from citrate buffer led to different results in DNA band clarity and integrity (Figure 2A and 2B). Even though the extracted PG DNA gave NanoDrop spectrophotometer A260/280 readings of 1.75-1.9, which is considered pure, agarose gel electrophoresis identified a problem. DNA extracted from the PG pellet suspended in MOPS buffer produced a smear when subjected to agarose gel electrophoresis (Figure 2A). However, DNA extracted from the PG pellet suspended in citrate buffer appeared as a clean band and was therefore used throughout this study (Figure 2B).
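As an illustration of how a titer in the reported 10⁹-10¹⁰ PFU/mL range follows from a spot assay, the sketch below applies the standard relation PFU/mL = plaques / (dilution × volume plated in mL). The 10 µl spot volume is taken from the Methods; the plaque count and dilution are hypothetical and are not data from this study, and the function name is illustrative.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_ul: float = 10.0) -> float:
    """Phage titer in PFU/mL from a plaque count at a given dilution.

    dilution is the fraction of undiluted lysate in the spotted sample
    (e.g. 1e-6 for the sixth ten-fold dilution); volume_ul is the volume
    spotted, in microliters.
    """
    if dilution <= 0 or volume_ul <= 0:
        raise ValueError("dilution and volume must be positive")
    volume_ml = volume_ul / 1000.0
    return plaques / (dilution * volume_ml)


# Hypothetical example: 25 plaques in a 10 µl spot of the 10^-6 dilution.
example_titer = titer_pfu_per_ml(plaques=25, dilution=1e-6)
print(f"Titer = {example_titer:.2e} PFU/mL")
```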
Figure 2. PG DNA extraction. Figure 2A and 2B are both PG DNA samples that were harvested the same way, except that A was suspended in MOPS and B was suspended in citrate buffer. Figure 2A lane 1 is New England BioLabs 1 kb ladder. Figure 2A lane 2 is Lambda DNA/HindIII Marker and lane 3 is PG DNA. Figure 2B lane 1 is O’Gene Ruler™ ladder. Figure 2B lane 2 is a 2 µl sample of PG DNA and lane 3 is a 4 µl sample of PG DNA. Both gels are 0.8% agarose.

Determining phage genome ends using restriction mapping

To test PG’s putative TLS, PG’s DNA packaging strategy was analyzed for cohesive ends, headful packaging, or DTR using restriction enzyme analysis. Each directed analysis requires specific restriction enzymes that produce fragments at uncrowded positions in the gel. Based on the FastTree phylogram of Figure 1 and PG’s λ-like structure, cohesive end analysis was conducted. This assay requires restriction enzymes that produce bands on an agarose gel with enough separation to detect joining and separation of the cohesive ends. Because the location of PG’s cos site is unknown, different restriction enzymes were used to ensure that the two end fragments were displayed. To determine the appropriate restriction enzymes, Serial Cloner v2.6 was used to display the restriction enzyme site usage in the PG genome sequence. The restriction enzymes selected for this study are listed in Table 3.

Table 3. Restriction enzymes and site usage in PG from Serial Cloner v2.6

Restriction enzyme   Number of sites
BstEII               3
ClaI                 9
EcoRV                10
HindIII              15
SbfI                 3
XhoI                 3
Tsp509               933

Cohesive end analysis was prepared according to the protocol using BstEII and XhoI (Figure 3). The samples were heated to 75°C for 15 minutes, divided equally into rapid- and slow-cooling portions, and run on a 0.8% agarose gel. λ phage was the positive control used to confirm the activity of the restriction enzymes (Figure 3, lanes 1 and 2).
The resulting smears of the PG samples (lanes 4-7) were initially attributed to inactive proteinase K failing to eliminate possible DNase contamination.

Figure 3. Cohesive analysis on PG DNA using BstEII and XhoI. Lane 1 is the positive control for λ + BstEII. Lane 2 is the positive control for λ + XhoI. Lane 3 is O’Gene Ruler ladder. Lane 4 is PG + BstEII fast cooled and lane 5 is PG + BstEII slow cooled. Lane 6 is PG + XhoI fast cooled and lane 7 is PG + XhoI slow cooled.

Additional PG DNA extraction was conducted with the same protocol but included phenylmethylsulfonyl fluoride (PMSF) at a final concentration of 5 mM to deactivate proteinase K (Figure 4). Two sets of samples were prepared in order to see if proteinase K was functioning properly. One set included PMSF to intentionally deactivate proteinase K prior to the introduction of the buffer used with the restriction enzymes (Figure 4, lanes 1 and 2). The other set did not include PMSF (Figure 4, lanes 4 and 5). The results showed a distinct smear in lanes 2 and 5, which contained the restriction enzyme buffer without any restriction enzymes. With PMSF present, proteinase K was expected to be deactivated, so a smear would appear if the sample was contaminated with a nuclease. In Figure 4, lanes 4 and 5 had no PMSF added to deactivate proteinase K. If proteinase K were active, it would be expected to eliminate any nucleases and give a clean band of PG DNA. However, lane 4 displays a slight smear, indicating that the PG DNA could contain a nuclease contaminant that is not deactivated by proteinase K. Lane 5 had the restriction enzyme buffer added, which could have enhanced the activity of the contaminant. Taken together, the smears in lanes 2, 4, and 5 indicate that proteinase K was inactive against the nuclease and that the contaminant is highly active in the presence of the restriction enzyme buffer.
Figure 4. Identifying source of contaminant in PG DNA treated with PMSF and introduction of restriction enzyme buffer. Lane 1 is PG DNA with PMSF. Lane 2 is PG with PMSF + restriction enzyme buffer. Lane 3 is O’Gene Ruler ladder. Lane 4 is PG with no PMSF. Lane 5 is PG with no PMSF + buffer.

The results in Figure 4 show that the proteinase K treatment, which was part of the DNA purification, did not eliminate the nuclease contaminant. Therefore, in order to determine whether the proteinase K was inactive or simply unable to remove the contaminant, a new stock of proteinase K (20 mg/mL) from Promega was obtained. The old and new proteinase K were each set up with the same restriction digest of λ DNA in order to test their ability to inhibit BstEII. Each sample was exposed to the old or new proteinase K before the restriction enzyme and buffer were added. This order was chosen to minimize the activity of the restriction enzyme and so reveal the efficiency of the old and new proteinase K (Figure 5). The result in lane 1 indicates that the old proteinase K did not inhibit the activity of BstEII on λ DNA, whereas the new proteinase K was able to hinder BstEII’s reaction (lane 2). Lane 3 was the control, with no proteinase K and just λ with BstEII.

Figure 5. Identifying efficiency of proteinase K. Lane 1 is λ DNA with the old proteinase K. Lane 2 is λ DNA with the new proteinase K. Lane 3 is the control with λ DNA + BstEII.

All DNA extraction procedures and already extracted samples of PG DNA were treated with the new proteinase K, which was then deactivated with PMSF, in order to eliminate any possibility of nuclease contamination. After the proteinase K treatment, PG DNA samples were separated on a 0.8% agarose gel to check for nuclease activity (Figure 6). As expected, the smearing seen in Figure 4, lane 2 was absent.
Lane 3 was a previously extracted PG sample that was treated with the new proteinase K after extraction and inactivated using PMSF. Lane 4 is the same PG sample as lane 3 but run in the presence of the same restriction enzyme buffer. The results show no smearing, indicating that the new proteinase K resolved the issue seen in Figure 4; however, the result in lane 6 revealed an additional issue. Lane 6 contained the same PG DNA as lanes 1 and 2, treated with BstEII. The lane displayed a smear with no distinct bands. This result indicates that the proteinase K eliminated the nuclease that was activated by the restriction enzyme buffer, but it points to a separate issue with restriction enzyme specificity. The BstEII was found to be past its shelf life, which could have resulted in star activity.

Figure 6. PG treated with new proteinase K and restriction digest. Lane 1 is PG DNA extraction with the new proteinase K. Lane 2 is PG DNA proteinase K treated with restriction enzyme buffer. Lane 3 is already extracted PG DNA treated with new proteinase K. Lane 4 is already extracted PG DNA treated with new proteinase K and run with restriction enzyme buffer. Lane 5 is O’Gene Ruler ladder. Lane 6 is PG DNA sample treated with proteinase K + BstEII.

Samples of PG were cut with new Fast Digest restriction enzymes ClaI, EcoRV, and Tsp509. The PG samples were all extracted according to the protocol and all came from the same phage production harvest. When extracted and run on a 0.8% agarose gel, the PG DNA gave a clear, distinct band. However, when PG was incubated with the three restriction enzymes, no digestion was observed, although the same enzymes digested λ DNA (Figure 7). This resolved the question of star activity with the old BstEII restriction enzyme but identified an additional inhibitor associated with PG (lane 1).
When PG was mixed with λ DNA, ClaI was not able to cut λ or PG (lane 2), although it cut λ effectively by itself (lane 5).

Figure 7. Inhibitor associated with PG when treated with restriction enzymes. Lane 1 is PG DNA. Lane 2 is PG + λ + ClaI. Lane 3 is PG + EcoRV. Lane 4 is λ + EcoRV. Lane 5 is λ + ClaI. Lane 6 is O’Gene Ruler ladder. Lane 7 is λ + Tsp509. Lane 8 is PG + Tsp509.

PG was compared with a single-stranded DNA phage called M13. Because of PG’s unusually AT-rich genome, PG was examined for the possibility of becoming single stranded during extraction or handling (Figure 8). M13 is a circular single-stranded DNA phage of about 6.4 kb. As a positive control, M13 was digested with mung bean nuclease, which is specific for single-stranded DNA and RNA (Figure 8A, lane 4; Figure 8B, lane 4). Mung bean nuclease also degrades single-stranded extensions from DNA and RNA, leaving ligatable blunt ends. In Figure 7, PG carried an inhibitor that did not allow ClaI to cut λ when the two were mixed together. As expected, PG was not affected by mung bean nuclease (Figure 8A, lane 2); however, the inhibitor associated with PG did not prevent mung bean nuclease from digesting M13 when the two were mixed together (Figure 8B, lane 3). This indicates that the contaminant associated with PG inhibits endonuclease activity against double-stranded DNA.

Figure 8. PG and M13 treated with mung bean nuclease. Figure 8A lane 1 and 8B lane 5 are O’Gene Ruler ladder. Figure 8A lane 2 is PG + mung bean. Figure 8A lane 3 is M13. Figure 8A lane 4 and 8B lane 4 are M13 + mung bean. 8B lane 1 is PG DNA. 8B lane 2 is PG + M13. 8B lane 3 is PG + M13 + mung bean.

PG DNA clean and concentrate

The PG results demonstrated a source of contamination that could be associated with the genome or could be a soluble inhibitor. Because of these complications, PG DNA samples were put through another cleaning process using the Zymo Research Genomic DNA Clean and Concentrator™ kit.
The cleaned PG samples were again assessed with a NanoDrop spectrophotometer to ensure an A260/280 of 1.75-1.9. The PG samples had a decreased concentration after being run through the filters, but PG no longer showed inhibition when mixed with λ and cut with ClaI or when treated with DNase (Figure 9, lanes 5 and 7).

Figure 9. Effective digestions after Zymo kit cleaned PG. Lane 1 is λ + DNase. Lane 2 is λ + ClaI. Lane 3 is O’Gene Ruler ladder. Lane 4 is pure λ DNA. Lane 5 is a mixture of λ + PG digested with ClaI. Lane 6 is cleaned sample of PG. Lane 7 is the same cleaned sample PG + DNase.

Cohesive end analysis

The best-studied phages whose chromosomes have cohesive ends carry complementary overhanging ends that anneal together upon injection into the host. The host DNA ligase then seals the ends to generate a rolling circle template for DNA replication. λ was used as the positive control for cohesive ends. λ’s cos site has a 12-base overhanging end that separates after being heated at 65°C-70°C for 5 minutes. After heating, the samples were subjected to different cooling procedures. One sample was cooled in an ice water bath immediately after heating (Figure 10A, lane 1; 10B, lane 1). The other sample was slowly cooled to room temperature (Figure 10A, lane 2; 10B, lane 2). The results for the positive cohesive end control demonstrate that the separated sticky ends anneal back together when the samples are slowly cooled.

Figure 10. Cohesive end analysis on λ. Figure 10A lane 1 is λ + EcoRV heated to 65°C and rapidly cooled in ice water bath. Figure 10A lane 2 is λ + EcoRV heated to 65°C and slowly cooled to room temperature. Figure 10A lane 3 is O’Gene Ruler ladder. Figure 10B lane 1 is λ + HindIII heated to 65°C and rapidly cooled in ice water bath. Figure 10B lane 2 is λ + HindIII heated to 65°C and slowly cooled to room temperature.
Figure 10B Lane 3 is O’Gene Ruler ladder.

PG was subjected to the same conditions for identifying cohesive ends and was cut with EcoRV and ClaI (Figure 11). Figure 11A, lane 1 shows PG digested with EcoRV, heated to 65°C for 5 minutes, and slowly cooled to room temperature. The same conditions were used in Figure 11B, lane 1, but the sample was digested with ClaI. Lane 2 contains samples of PG digested with EcoRV (Figure 11A) and ClaI (Figure 11B) that were cooled rapidly in an ice water bath immediately after heating. The PG sample digested with EcoRV displayed cleaner cuts and was used to test different temperatures for separating the presumptive cos site (Figure 12). The direct cohesive end analysis did not display any clear separation or annealing of the presumptive cos site in PG. However, the absence of separation is a negative result and does not determine whether PG has cohesive ends.

Figure 11. Cohesive end analysis on PG. Figure 11A lane 1 is PG + EcoRV heated to 65°C and slowly cooled to room temperature. Figure 11A Lane 3 is PG + EcoRV heated to 65°C and rapidly cooled in an ice water bath. Figure 11A Lane 2 is O’Gene Ruler ladder. Figure 11B lane 1 is PG + ClaI heated to 65°C and slowly cooled to room temperature. Figure 11B Lane 2 is PG + ClaI heated to 65°C and rapidly cooled in an ice water bath. Figure 11B Lane 3 is O’Gene Ruler ladder.

Figure 12. PG digested with EcoRV and heated at different temperatures. Figure 12 lane 1 is O’Gene Ruler ladder. Lane 2 is PG + EcoRV heated to 55°C for 5 minutes and rapidly cooled in an ice water bath. Lane 3 is PG + EcoRV heated to 75°C for 5 minutes and rapidly cooled in an ice water bath.

High performance liquid chromatography (HPLC)

The complexities of working with PG limit which endonucleases can be used to properly digest its genome.
In addition to requiring extra purification procedures, PG’s 70 kb genome lacks usable restriction sites that would give the small number of bands needed on a gel to identify the annealing and separation of a possible cos site. High performance liquid chromatography, which identifies and separates different compounds in a liquid sample, was therefore used instead. In this case, HPLC can be a promising tool for identifying individual nucleosides from PG if the genome has cohesive ends. By digesting PG with mung bean nuclease, which trims any single-stranded ends to blunt ends, the released nucleosides can be identified as they are separated from one another through the column and measured by a UV absorbance detector at 254 nm. As the separated nucleosides exit the column, they are recorded on a chromatogram and identified by their retention times.

Standards for the HPLC were created by treating each single nucleotide (dATP, dGTP, dCTP, dTTP) with alkaline phosphatase to remove the phosphate groups and filtering through a NANOSEP™ 3K Omega filter (Figure 13). The NANOSEP™ 3K Omega filter was essential for separating any digested single nucleosides from the remaining double-stranded genome of PG and λ. This step was also performed on the nucleoside standards to keep the procedure consistent with that used on the PG and λ samples. Each standard produced one distinct peak on the chromatogram for the indicated nucleoside. The X-axis is measured in time (minutes) and the Y-axis is measured in milli-absorbance units from UV detection.

Figure 13. Nucleoside standards run on HPLC. Each nucleoside was used as a standard to determine its retention time. Each dNTP was run individually, treated with alkaline phosphatase, and filtered through a 0.5 µm NANOSEP™ 3K Omega filter. dA was measured at 8.5 minutes, dG at 4.4 minutes, dC at 3.0 minutes, and dT at 5.3 minutes.
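Identification by retention time, as described above, amounts to matching each observed peak to the nearest standard. The following is a minimal Python sketch of that matching using the standard retention times reported in Figure 13. The function name `assign_peak` and the 0.2-minute tolerance are assumptions made for illustration only; the text does not state a matching tolerance.

```python
# Retention times (minutes) of the nucleoside standards reported in Figure 13.
STANDARDS = {"dC": 3.0, "dG": 4.4, "dT": 5.3, "dA": 8.5}


def assign_peak(retention_min, standards=STANDARDS, tolerance=0.2):
    """Return the name of the closest standard, or None if no standard
    lies within `tolerance` minutes of the observed peak.

    The 0.2-minute default tolerance is an assumed value for illustration.
    """
    # Find the standard whose retention time is nearest the observed peak.
    name, reference = min(
        standards.items(), key=lambda item: abs(item[1] - retention_min)
    )
    return name if abs(reference - retention_min) <= tolerance else None


if __name__ == "__main__":
    # Peaks reported for the alkaline phosphatase control (Figure 15).
    for peak in (2.51, 3.4, 11.48):
        print(peak, assign_peak(peak))
```

With this assumed tolerance, none of the three control peaks is assigned to a nucleoside, which agrees with their interpretation in the text as water, enzyme, and buffer peaks.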
In addition to the nucleoside standards, further controls were run to identify any peaks with retention times worth noting when analyzing PG and λ. Pure-E H2O was used to suspend PG after DNA extraction and was therefore run through the HPLC as a standard (Figure 14). Pure-E H2O gave a single significant peak and was run through the same 0.5 µm Millex-HV filter.

Figure 14. Pure-E H2O standard run on HPLC. Pure-E H2O was filtered through the same NANOSEP™ 3K Omega filter and injected into the HPLC. Pure-E H2O has a significant peak at 2.46 minutes.

Alkaline phosphatase was used to treat all samples in order to remove the phosphate groups from nucleotides. The alkaline phosphatase enzyme and buffer were run at the same volumes used for the DNA samples (Figure 15). The control sample had a total volume of 40 µl, which included 4 µl alkaline phosphatase buffer and 2 µl alkaline phosphatase enzyme, with pure-E H2O making up the remainder. Alkaline phosphatase displayed three distinct peaks at 2.51 minutes, 3.4 minutes, and 11.48 minutes. The first peak could be the same peak observed for pure-E H2O.

Figure 15. Alkaline phosphatase run on HPLC. Alkaline phosphatase + buffer was filtered through the same NANOSEP™ 3K Omega filter and injected into the HPLC. Three significant peaks were noted at 2.51, 3.4, and 11.48 minutes.

λ phage DNA was used as the positive control to identify nucleotides from λ’s overhanging ends. λ DNA (0.3 µg) was brought to a total volume of 40 µl. λ was heated at 65-70°C for 5 minutes in order to separate the cos site but was not treated with mung bean nuclease. The sample was still treated with alkaline phosphatase and filtered by the same procedure after heat treatment in order to identify any nucleosides that may have fragmented off the overhanging ends (Figure 16).
The chromatogram shows the expected result: no nucleosides are detected in the absence of mung bean nuclease. The peaks in the λ chromatogram match those in the alkaline phosphatase chromatogram in Figure 15.

Figure 16. λ DNA heated with no mung bean and run on HPLC. λ was heated to 65-70°C and incubated with alkaline phosphatase for 1 hour. After incubation, the sample was filtered through the same NANOSEP™ 3K Omega filter and injected into the HPLC. Three significant peaks were displayed at 2.53, 3.37, and 11.49 minutes.

An additional λ DNA sample was heated at 65-70°C for 5 minutes and then immediately digested with mung bean nuclease for 1 hour. The sample was then filtered so that the digested nucleotides could be treated with alkaline phosphatase for an additional hour. After dephosphorylation, the sample was filtered again and run on the HPLC (Figure 16). The chromatogram shows significant peaks with the same retention times as each standard nucleoside in Figure 13, together with the peaks seen on the alkaline phosphatase chromatogram.

Figure 16. λ DNA heated, digested with mung bean, and run on HPLC. λ was heated to 65-70°C for 5 minutes and digested with mung bean. After filtering, digested nucleotides were incubated with alkaline phosphatase for 1 hour. After incubation, the sample was filtered and injected into the HPLC. Significant peaks identified nucleosides dC, dG, dT, and dA at the indicated retention times.

PG was treated under the same conditions as λ when analyzing for cohesive ends. PG DNA (0.6 µg) was heated at 65-70°C and suspended in pure-E H2O for a total volume of 40 µl. The sample was then filtered through the same NANOSEP™ 3K Omega filter. Because PG is AT rich, this procedure was used to check the integrity of the genome after handling and filtering.
The entire sample was then injected into the HPLC and gave the expected chromatogram, showing pure-E H2O with no nucleotides (Figure 17).

Figure 17. Heated PG DNA suspended in pure-E H2O, filtered, and run on HPLC. PG was suspended in pure-E H2O for a total volume of 40 µl and was filtered through the same NANOSEP™ 3K Omega filter. The PG sample was then injected into the HPLC and gave a significant peak at 2.46 minutes, identifying H2O.

To determine whether PG re-circularizes after DNA extraction, PG was digested with mung bean nuclease for 1 hour without heat treatment (Figure 18). After 1 hour, the sample was filtered and incubated with alkaline phosphatase for another hour. After dephosphorylation, the sample was injected into the HPLC and gave two peaks indicating nucleosides dT and dA. The remaining trace peaks are residual peaks from alkaline phosphatase and its buffer.

Figure 18. Non-heated PG DNA, digested with mung bean, and run on HPLC. PG was digested with mung bean for 1 hour and filtered through the same NANOSEP™ 3K Omega filter. After filtering, digested nucleotides were incubated with alkaline phosphatase for 1 hour. The PG sample was then injected into the HPLC and gave two significant peaks indicating dT and dA.

The results in Figure 18 imply that mung bean nuclease is digesting small amounts of single-stranded ends that contain dT and dA. To check these results, PG was treated under the same conditions as in Figure 18 but was heated prior to digestion (Figure 19). PG was placed in a water bath at 65-70°C for 5 minutes. Immediately following the hot water bath, PG was digested with mung bean nuclease, filtered, and dephosphorylated with alkaline phosphatase. The sample was injected into the HPLC to identify any changes in peaks compared to Figure 18.
The chromatogram detected all four nucleosides, at a markedly higher absorbance than for λ and for non-heated PG in Figure 18.

Figure 19. Heated PG DNA, digested with mung bean, and run on HPLC. PG was digested with mung bean for 1 hour immediately after being in a water bath at 65-70°C for 5 minutes. After filtering, digested nucleotides were incubated with alkaline phosphatase for 1 hour. The PG sample was then injected into the HPLC and gave significant peaks indicating dC, dG, dT, and dA.

Heating PG revealed the additional nucleosides dC and dG. Moreover, the peaks of all nucleosides were measured at a markedly higher absorbance. The presence of nucleosides in the heated PG chromatogram does not determine whether PG circularizes by having sticky ends. PG was therefore ligated to ensure circularization and sealing of all nicks within the genome. Before ligating PG, DNA ligase and ligase buffer were first injected into the HPLC as a control to identify any retention peaks that might appear in the ligated PG sample (Figure 20). After ligation, the PG DNA was heated to 65-70°C and digested with mung bean nuclease for 1 hour. After filtering, alkaline phosphatase was added and the sample was incubated for an additional hour. The resulting chromatogram of ligated PG DNA shows no release of nucleosides and mirrors the peak from filtered PG DNA in Figure 17 (Figure 21).

Figure 20. DNA ligase and ligase buffer filtered. DNA ligase and ligase buffer were aliquoted into a total 40 µl sample of pure-E H2O at the same volumes used for ligating PG DNA. After filtering, the sample was injected into the HPLC and gave a distinct peak at 3.4 minutes with a high absorbance level.

Figure 21. Ligated PG sample heated and digested with mung bean. Ligated PG was heated to 65-70°C for 5 minutes and digested with mung bean for 1 hour. Once filtered, the sample was injected into the HPLC and gave a peak similar to that of PG DNA alone in Figure 17.
Bioinformatics

Sequence Retrieval

A total of 20 sequences annotated as the Terminase Large Subunit (TLS) of phages were retrieved from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/). Focusing on the order Caudovirales, 11 phages were chosen from the family Podoviridae and 9 from Siphoviridae. All sequences were complete gene sequences and varied in length, as shown in Table 5.

Multiple Sequence Alignment (MSA)

ClustalW is a program that aligns multiple sequences to show the regions conserved between them. ClustalW in SDSC Biology WorkBench was therefore used to construct multiple sequence alignments from the phage sequences. The MSAs constructed with ClustalW were used for further investigations and can be found in Appendix B for Podoviridae, Siphoviridae, and all 20 sequences combined.

Table 5. List of TLS genes from the families Podoviridae and Siphoviridae

Siphoviridae
Gene ID | Gene Description | Organism | Accession Number | Complete Genome | Amino Acids | Gene Region
20088899 | Terminase [Methanosarcina acetivorans C2A] | Methanosarcina acetivorans C2A | NC_003552.1 | 1482 bp | 493 | 4707089-4708570 bp
1261708 | Terminase Large Subunit | Methanobacterium phage psiM2 | NC_001902 | 1407 bp | 468 | 4906-6312 bp
1257603 | Putative Large Subunit Terminase | Methanothermobacter phage psiM100 | NC_002628 | 1404 bp | 467 | 10168-11571 bp
269975307 | Phage Terminase Large Subunit | Staphylococcus phage SA1 | GU169904 | 879 bp | 292 | 41003-41881 bp
216262944 | Terminase Large Subunit | Enterobacteria phage WV8 | EU877232 | 1602 bp | 533 | 32001-33602 bp
38707640 | Terminase Large Subunit | Salmonella phage Felix 01 | NC_005282 | 1602 bp | 533 | 30625-32226 bp
13476623 | Phage Related Terminase | Mesorhizobium loti MAFF303099 | NC_002678 | 1389 bp | 462 | 6592989-6594377 bp
(not published) | Possible Large Terminase Subunit | Methanobrevibacter phage PG | (not published) | 1622 bp | 540 | 70165-66613 bp
7256802 | Putative terminase | Erwinia phage phiEa21-4 | NC_011811 | 1605 bp | 534 | 26606-28210 bp

Podoviridae
Gene ID | Gene Description | Organism | Accession Number | Complete Genome | Amino Acids | Gene Region
2944239 | Bacteriophage terminase large (ATPase) subunit | Enterobacteria phage P22 | NC_00237 | 1499 bp | 499 | 203-1702 bp
21914413 | Terminase Large Subunit | Salmonella phage P22-pbi | AF527608 | 1499 bp | 499 | 203-1702 bp
33334159 | Putative Large Subunit Terminase | Shigella phage Sf6 | AAQ12192 | 1412 bp | 470 | 420-1832 bp
13517602 | Terminase Large Subunit | Salmonella phage HK620 | AF335538 | 1412 bp | 470 | 20100-21512 bp
7353018 | Terminase Large Subunit | Salmonella phage epsilon34 | NC_011976 | 1500 bp | 499 | 467-1966 bp
2777092 | Terminase Large Subunit | Enterobacteria phage ST104 | NC_005841 | 1500 bp | 499 | 24878-26377 bp
66473858 | Terminase Large Subunit | Salmonella phage SE1 | DQ003260 | 1500 bp | 499 | 25026-26525 bp
24250810 | Terminase Large Subunit | Salmonella phage ST64T | AY052766 | 1554 bp | 517 | 24278-25831 bp
8250754 | Terminase Large Subunit | Salmonella phage C341 | NC_013059 | 1500 bp | 499 | 467-1966 bp
60476784 | Bacteriophage L head assembly gene cluster, partial sequence | Enterobacteria phage L | AY795968 | 1500 bp | 499 | 1426-2925 bp
7123726 | DNA packaging protein gp2 (Terminase large subunit) | Escherichia fergusonii ATCC 35469 | NC_011740 | 1500 bp | 499 | 2096748-2098247 bp

After the parameters for the best alignment were applied, BioEdit was used to manually adjust gaps present in the alignments. BioEdit assigns each nucleotide or amino acid its own color, and residues were aligned according to their colors. BioEdit also detects motifs and highly conserved regions within the sequences. The nucleotide MSA sequences were translated to locate any unwanted X’s. The presence of unwanted X’s indicated that the sequence was out of frame. The X’s were taken into consideration and were replaced with the appropriate dashes to place the gaps at the correct locations.
For example, if there was an amino acid sequence of LNA~XLS, where “~” indicates a gap, the alignment would be toggled back to the nucleotide sequence in order to find the problem, showing the sequence TTAAATGCT~~~--GTTAAGC. In this case, either an additional gap would be added or two of the gaps present would be manually deleted, depending on which adjustment gives the best conserved regions. The MSA of amino acid sequences can be found in Appendix B.

To identify motifs within the protein, the TLS from Methanobrevibacter phage PG was translated into an amino acid sequence and a protein BLAST search was performed through http://blast.ncbi.nlm.nih.gov. The results gave an overview and a graphical summary of the database sequences that align to the query sequence, including conserved protein motifs (Figure 22). These conserved domains within the protein coding region of the TLS from phage PG are shown in Figure 23 with different confidence levels and the domain model scope, which is marked on the left side of the graph. Even though 5 of the hits are non-specific, they still match or surpass the threshold. The 3 hits in the superfamilies are conserved domain clusters that create overlapping annotations on the same protein sequences and are expected to signify evolutionarily related domains. Hits in the multi-domains are models that were detected and would likely contain several domains.

Figure 22. Protein BLAST results for terminase large subunit for Methanophage PG. (NCBI BLAST http://blast.ncbi.nlm.nih.gov/Blast.cgi)

In the graphical summary, there is a total of 7 domain hits, the largest being the homing endonucleases at 438 aa from pfam05203, which are encoded by mobile DNA elements (Table 6). The next largest is the phage terminase from the PBSX family, which has a Cd length of 396 aa. This TIGR01547 model identifies other divergent members of the large terminase subunit.
Terminase from pfam04466 has a conserved domain length of 387 aa and initiates the packaging of viral DNA into the phage head capsid. Terminase from pfam03237 has a Cd length of 380 aa, and this family characterizes groups of terminase proteins. The next hit is a phage related terminase that is part of the superfamily cl02216, which has a length of 202 aa. The psiM2_ORF9 model, with a Cd length of 142 aa, covers the C-terminal region of the terminase, and the smallest domain is the Hedgehog/Intein domain N-terminal region with a Cd length of 100 aa. However, the conserved domains with the lowest E-values are the most significant, as seen in Terminase 6 of pfam03237, the COG5362 related terminase, and the C-terminal region from psiM2_ORF9.

Table 6. Protein BLAST summary results of Terminase Large Subunit from Methanophage PG
Number | Description | Cd Length | E-Value
1 | Homing endonuclease | 438 | 1e-04
2 | Phage terminase PBSX family | 396 | 3e-05
3 | Terminase pfam04466 | 387 | 0.002
4 | Terminase pfam03237 | 380 | 7e-22
5 | Phage related terminase | 202 | 1e-16
6 | C-terminal region | 142 | 1e-13
7 | Hedgehog/Intein domain | 100 | 0.008

After these motifs were located within the protein of interest, the Conserved Domain Architecture Retrieval Tool (CDART), which utilizes the NCBI Entrez Protein Database, was used to search for other proteins that are evolutionarily similar to the Methanobrevibacter phage PG amino acid sequence submitted as the query. Figure 23 displays the query sequence at the top of the image and designates HintN, phage terminase 2 from the PBSX family, and the COG5362 phage related terminase as conserved domains. The results show similar domain architectures, indicating where these motifs can be found in other protein families. HintN is a motif known to belong to the DNA polymerase III superfamily. Along with TIGR01547 of the PBSX family, these motifs can also be seen in two other sequences from Pirellula staleyi and 12 sequences from Proteobacteria.
As for COG5362, it is also located in the same 2 sequences of Pirellula staleyi.

Figure 23. CDART results page for terminase large subunit for Methanophage PG. (NCBI CDART http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi)

Protein modeling

The Terminase Large Subunit of phage PG has not been isolated yet. When the amino acid sequence of the TLS was searched in the Pfam database, the results indicated that it is closely related to the Terminase-like family, which has similarities to the known terminase function in bacteriophages T4 and λ. PG was compared with bacteriophage T4’s terminase, and the alignment has an E-value of 3.7e-35 with a 100% degree of confidence in the majority of aligned residues (Figure 24). Each row of the alignment hit shows the matching Hidden Markov Model (HMM) and the query sequence. The alignment key is presented at the bottom of Figure 24, and the sequence search in the Pfam database presented a significant match from the query sequence to the Terminase-like family.

Figure 24. Sequence search in Pfam protein families databases. Identified the significance of the match from the query sequence to the Terminase-like family. (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml)

The alignment of phage PG with the terminase (gp17) of bacteriophage T4 has an E-value of 3.7e-35 and is shown in Figure 24. The key of the alignment, shown at the bottom of Figure 24, indicates the significance of the match between the query sequence and the Terminase-like family. Given the significant E-value between PG and T4, it is hypothesized that the two proteins share similar structural features. The RCSB Protein Data Bank structure database was used to view the bacteriophage T4 gp17 terminase in ribbon format (Sun, et al., 2008) as a representative ribbon structure for PG’s terminase (Figure 25). The secondary structure of this protein is made up of 32% alpha helices and 19% beta sheets (Figure 26) (Finn, et al., 2006; Kabsch, et al., 1983).

Figure 25. 3-D model of the Bacteriophage T4 gp17 in ribbon format. (http://www.rcsb.org/pdb/home/home.do)

Figure 26. Sequence of Bacteriophage T4 gp17 protein model. Displays the Terminase-like family in green and the helices and beta sheets in brown and yellow, respectively. (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml)

Distance matrices

Distance matrices are often generated to measure and analyze the evolutionary and genetic distances between species that have diverged from a common ancestor. In this case, that means looking at the changes that have occurred in the Terminase Large Subunit within Podoviridae and Siphoviridae. Several models are available that take into account different parameters and assumptions, giving different information about what could have occurred at the genetic level. The pairwise distance matrix, also known as p-distance, generates matrices from simple parsimony-informative sites. This can give misleading, unreliable measurements of evolutionary distance. Uncorrected p-distances measure only part of the nucleotide or amino acid substitutions, which can also result in miscalculated evolutionary distances. However, if distances are small, then precise readings can be obtained. This can be seen in MEGA using p-distances and in PAUP using uncorrected p-distances, where the values for both Podoviridae and Siphoviridae are relatively close. Kimura 2-parameter was used to determine evolutionary distance and treats transitions and transversions differently. This method adjusts for transitions occurring more often than transversions. MEGA produced slightly lower results than PAUP for Siphoviridae, but for Podoviridae the results were very close in range. Jukes-Cantor is another simple method and assumes that substitutions occur randomly.
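The three distance measures named above can be illustrated with a short Python sketch that uses their standard textbook formulas: the uncorrected p-distance, the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), and the Kimura 2-parameter correction d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names and the two short example sequences are hypothetical and are not taken from the thesis alignments; MEGA and PAUP may handle gaps and ambiguous bases differently from this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) for two aligned sequences.

    Columns containing a gap or any character other than A/C/G/T are skipped.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(sites, transitions, transversions):
    """Uncorrected proportion of differing sites."""
    return (transitions + transversions) / sites


def jukes_cantor(p):
    """Jukes-Cantor distance from an uncorrected p-distance."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(sites, transitions, transversions):
    """Kimura 2-parameter distance from transition/transversion counts."""
    P = transitions / sites
    Q = transversions / sites
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    seq_a = "ATGGCTAAAC"
    seq_b = "ATGACTAATC"
    counts = count_differences(seq_a, seq_b)
    p = p_distance(*counts)
    print("p-distance:", p)
    print("Jukes-Cantor:", jukes_cantor(p))
    print("Kimura 2-parameter:", kimura_2p(*counts))
```

Both corrected distances are larger than the p-distance because they account for multiple substitutions at the same site, and the gap between corrected and uncorrected values widens as sequences become more divergent.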
These random substitutions are assumed to occur with equal probability, which is why Jukes-Cantor is called a one-parameter model. Jukes-Cantor distances can also be generated in MEGA separately for synonymous and non-synonymous substitution rates. In a synonymous change, the altered nucleotide leaves the codon coding for the same amino acid, whereas in a non-synonymous change, the altered nucleotide changes the amino acid originally coded for. In the synonymous matrix, MEGA could not calculate some of the values and therefore could not accurately account for synonymous change. As a result, non-synonymous changes were used to measure evolutionary distance. Kimura 3-parameter distances were calculated in PAUP. This model is similar to Kimura 2-parameter in compensating for transitions and transversions, but it divides transversions into two types: A-T/G-C and A-C/G-T. PAUP was also able to calculate absolute distances, which use the total number of nucleotide character changes to measure evolutionary distance. In MEGA, Tajima-Nei was used to calculate distances; this model assumes equal substitution rates for transitions and transversions. MEGA was able to identify conserved sites, variable sites, parsimony-informative sites, singleton sites, 0-fold degenerate sites, 2-fold degenerate sites, and 4-fold degenerate sites from the families Podoviridae and Siphoviridae (Table 7). The distance matrices generated with the PAUP and MEGA programs can be found in Appendix C.

Table 7. MEGA results of conserved, variable, parsimony-informative, singleton, 0-fold, 2-fold, and 4-fold degenerate sites from Podoviridae and Siphoviridae.
Site category | Podoviridae | Siphoviridae
Conserved | 406/1554 | 190/1833
Variable | 1094/1554 | 1457/1833
Parsimony-Informative | 1002/1554 | 1080/1833
Singleton | 92/1554 | 358/1833
0-fold Degenerate | 973/1554 | 978/1833
2-fold Degenerate | 147/1554 | 75/1833
4-fold Degenerate | 114/1554 | 44/1833

Modeltest

The best model determined from the Modeltest results for Podoviridae and Siphoviridae is the general time reversible model with gamma rate distribution (GTR+G). The dataset for Podoviridae gave an Akaike information criterion (AIC) of 11099.8994, with -lnL=5540.9497 and K=9. For Siphoviridae, AIC=24658.3223, -lnL=12320.1611, and K=9. The dataset for Podoviridae and Siphoviridae combined gave AIC=36042.1484, -lnL=18012.0742, and K=9. Modeltest indicated GTR+G as the best model and gave parameter values to be applied to the Maximum Likelihood tree. These values provide optimal conditions for the Maximum Likelihood search and were later applied to the bootstrap analysis. The Modeltest output can be found in Appendix D.
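The reported AIC values follow the standard definition AIC = 2(-lnL) + 2K, where -lnL is the negative log-likelihood and K is the number of estimated parameters. The short Python sketch below recomputes the three values from the -lnL and K figures given above; the function name `aic` and the dictionary layout are illustrative choices, not part of the Modeltest output.

```python
def aic(neg_log_likelihood, k):
    """Akaike information criterion from -lnL and the parameter count K."""
    return 2.0 * neg_log_likelihood + 2.0 * k


# (-lnL, K) pairs as reported by Modeltest for the GTR+G model.
datasets = {
    "Podoviridae": (5540.9497, 9),
    "Siphoviridae": (12320.1611, 9),
    "All 20 sequences": (18012.0742, 9),
}

if __name__ == "__main__":
    for name, (neg_lnl, k) in datasets.items():
        print(f"{name}: AIC = {aic(neg_lnl, k):.4f}")
```

The recomputed values agree with the reported AIC figures to within rounding in the last decimal place.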
Podoviridae:
- Model Selected:
  o GTR+G
  o -lnL = 5540.9497
  o K = 9
  o AIC = 11099.8994
- Base Frequencies
  o A = 0.2704
  o C = 0.2349
  o G = 0.2702
  o T = 0.2245
- Substitution model
  o Rate Matrix
  o A-C = 1.7656
  o A-G = 2.6455
  o A-T = 0.6250
  o C-G = 0.4248
  o C-T = 4.5017
  o G-T = 1.0000
- Among-site rate variation
  o Proportion of invariable sites = 0
  o Variable sites (G)
  o Gamma distribution shape parameter = 0.4516

Siphoviridae:
- Model Selected:
  o GTR+G
  o -lnL = 12320.1611
  o K = 9
  o AIC = 24658.3223
- Base Frequencies
  o A = 0.2916
  o C = 0.2088
  o G = 0.2499
  o T = 0.2497
- Substitution model
  o Rate Matrix
  o A-C = 1.7834
  o A-G = 1.8181
  o A-T = 1.1345
  o C-G = 0.9127
  o C-T = 2.7446
  o G-T = 1.0000
- Among-site rate variation
  o Proportion of invariable sites = 0
  o Variable sites (G)
  o Gamma distribution shape parameter = 1.4244

For all 20 sequences:
- Model Selected:
  o GTR+G
  o -lnL = 18012.0742
  o K = 9
  o AIC = 36042.1484
- Base Frequencies
  o A = 0.2828
  o C = 0.2193
  o G = 0.2575
  o T = 0.2404
- Substitution model
  o Rate Matrix
  o A-C = 1.8378
  o A-G = 2.0590
  o A-T = 0.9950
  o C-G = 0.6558
  o C-T = 3.0231
  o G-T = 1.0000
- Among-site rate variation
  o Proportion of invariable sites = 0
  o Variable sites (G)
  o Gamma distribution shape parameter = 1.0760

Phylogenetic analysis

To generate the three trees using Bayesian analysis, maximum parsimony, and maximum likelihood, the file imported into PAUP was edited by copying and pasting in the Modeltest blocks. This was done for both MSAs, which were then executed in PAUP. Parsimony analysis is the simplest technique for developing trees and makes very few assumptions. This non-model-based algorithm utilizes only informative sites. The parameters for the heuristic search and for developing the parsimony tree were left at their defaults. Under the general search options, all characters are weighted equally and branches are set to collapse if the maximum length is zero.
Character-state optimization is set to accelerated transformation. The step matrix option was selected to permit the assignment of states not observed in terminal taxa to internal nodes, and those selected can be recognized as possible shortcuts by the “3 + 1” test. For Podoviridae, the heuristic search saved 2 trees and tried 728 rearrangements. The best tree score was 1256. For Siphoviridae, 1 tree was saved from 188 rearrangements, and the best tree score was 3144. A maximum parsimony analysis of all 20 sequences was run with the default settings in order to observe how the trees would turn out when the alignment between the two phage families did not have many conserved regions. This search tried 7902 rearrangements and saved 3 trees. The best score was 5222.

Bayesian analysis develops phylogenetic trees from the data together with prior probabilities. To run MrBayes, the PAUP and MrBayes blocks were incorporated into the file and executed in the MrBayes program, with brackets placed to block out the Modeltest blocks. The analysis was set to run for one million generations in order to produce 10,000 trees. For all 3 Bayesian analyses, the trees were selected starting from the point where the standard deviation of split frequencies stopped oscillating.
Podoviridae:
- Selected trees 6111-10000, where the frequency was at 0.001944
- Show trees and indices
- Computed a 50% majority-rule consensus tree and included compatible groupings and frequencies of other bipartitions

Siphoviridae:
- Selected trees 4951-10000, where the frequency was at 0.005831
- Show trees and indices
- Computed a 50% majority-rule consensus tree and included compatible groupings and frequencies of other bipartitions

All sequences:
- Selected trees 6231-10000, where the frequency was at 0.002413
- Show trees and indices
- Computed a 50% majority-rule consensus tree and included compatible groupings and frequencies of other bipartitions

This program is computationally intensive and tends to give fairly high support values, above 90, for Bayesian trees.

Maximum Likelihood analysis evaluates the possible trees and searches for the tree with the highest likelihood of producing the observed data. It requires information from Modeltest in order to set optimal criteria for the tree search. The Modeltest results were therefore used to run the heuristic search for the Maximum Likelihood analysis.

Podoviridae:
- Heuristic Search
  o 352 rearrangements
  o 1 tree saved
  o Best tree: 5530.2840

Siphoviridae:
- Heuristic Search
  o 212 rearrangements
  o 1 tree saved
  o Best tree: 12320.092

For all 20 sequences:
- Heuristic Search
  o 2780 rearrangements
  o 1 tree saved
  o Best tree: 18011.541

In Siphoviridae, the branching patterns of the trees are very similar, although the consensus tree is drawn as a rectangular cladogram and assumes a constant molecular clock (Figures 30 and 31). Overall, the similarities among all three trees in that family are apparent. In Podoviridae, the parsimony tree (Figure 27) and the maximum likelihood tree (Figure 28) are very similar to each other.
Across all 20 sequences, the three trees show parallels, and the pattern seen in Podoviridae recurs in the trees built from all 20 sequences (Figures 33, 34, and 35), which suggests that these sequences are very similar and diverged relatively recently.

Figure 27. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the Parsimony analysis, shown as a rectangular cladogram.

Figure 28. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the Maximum Likelihood analysis, shown as a rectangular cladogram.

Figure 29. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the Bayesian analysis 50% majority-rule, shown as a rectangular cladogram.

Figure 30. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Parsimony analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.

Figure 31. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Maximum Likelihood analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.

Figure 32. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Bayesian analysis 50% majority-rule, shown as a rectangular cladogram. The red star indicates the methanophage PG.

Figure 33. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20 sequences in the Parsimony analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.

Figure 34. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20 sequences in the Maximum Likelihood analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.

Figure 35. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20 sequences in the Bayesian analysis 50% majority-rule, shown as a rectangular cladogram.
The red star indicates the methanophage PG.

Bootstrap analysis resamples the columns of the MSA at random, with replacement, and tests whether the same tree is recovered from the resampled data. The values presented represent how often each branch was regenerated, and any value above 70 is considered significant. For both families of viruses, 1000 replicates were analyzed to gauge the reliability of the trees, since the gene is subject to a considerable amount of mutation in phages. Modeltest was run for both families and returned the same parameters, which were entered into the maximum likelihood settings. The bootstrap generated some significant values; branches shown without a value were recovered with less than 50% confidence and were collapsed. Compared to the majority-rule consensus trees (Figures 29 and 32), the bootstrap values were somewhat lower, but these values are more reliable (Figures 36-41). In addition, the bootstrap trees and the consensus trees have very similar arrangements. In Podoviridae, however, the clade containing Escherichia fergusonii, Salmonella ph9, and Shigella phage is significant in the Bayesian analysis (Figure 29), whereas in the bootstrap Escherichia fer is equally non-significant with the Bacteriophage L and Enterobacteria phage clade. For Siphoviridae, the significance values and the arrangements also match the Bayesian analysis, except that the clade including Methanothermobacter phage, Methanobacterium phage, PG, and Methanosarcina is not significantly grouped with Mesorhizobrevibacter phage in the bootstrap, whereas that clade is significant in the 50% majority-rule consensus tree (Figure 32).

Figure 36. Bootstrap output from Podoviridae using 11 sequences.
Figure 37. Bootstrap tree from Podoviridae using 11 sequences.
Figure 38. Podoviridae phylogram with bootstrap values using 11 sequences.
Figure 39. Bootstrap output from Siphoviridae using 9 sequences.
Figure 40. Bootstrap tree from Siphoviridae using 9 sequences.
Figure 41. Siphoviridae phylogram with bootstrap values using 9 sequences.

Time of Divergence

Jukes-Cantor non-synonymous substitutions were used to create the time of divergence table for both Podoviridae and Siphoviridae. The Jukes-Cantor synonymous substitution values could not be calculated and were therefore not used. Estimating a time of divergence requires a rate of change and the number of substitutions, from which the time when any two species last shared a common ancestor can be estimated. In the equation r = K/(2t), r is the number of substitutions per site per year, K is the number of substitutions between a pair of species, and t is the time of divergence between the two sequences. The K values were taken from the distance matrices created in MEGA4. Because no value of r has been established for bacteriophages, the closest available value, that of Bacteria at 4.5 × 10^-9 substitutions per non-synonymous site per year, was used.

In Podoviridae, some pairs show no divergence, such as Salmonella phage 4 and Salmonella phage 5, which the Bayesian and bootstrap analyses show to be significantly related. The same result is seen for Enterobacteria phage 7 and Bacteriophage P22. The next most recent divergence, at 333,000 years, is seen between several strains of Salmonella phage and Enterobacteria phage. The most distant divergence, at 226,000,000 years ago, is between Salmonella phage 9 and Bacteriophage P22 and between Salmonella phage 9 and Enterobacteria phage ST104. In Siphoviridae, the most recent divergence, around 111,000 years ago, is between Staphylococcus phage and Enterobacteria phage Felix01 and between Enterobacteria phage Felix01 and Enterobacteria phage WV8.
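As a rough illustration of the calculation described above, the following sketch converts a pairwise distance K into a divergence time by rearranging the rate equation to t = K/(2 × rate). It assumes the bacterial rate of 4.5 × 10^-9 quoted in the text; the function name, the constant name, and the choice of example pairs are hypothetical and serve only this example.

```python
# Sketch: divergence time from a pairwise distance, t = K / (2 * rate).
# RATE is the bacterial value quoted in the text, used here as a stand-in
# for phages; all names in this block are illustrative assumptions.

RATE = 4.5e-9  # substitutions per non-synonymous site per year


def divergence_time(k, rate=RATE):
    """Return the estimated years since two sequences shared an ancestor."""
    if k < 0 or rate <= 0:
        raise ValueError("k must be non-negative and rate must be positive")
    return k / (2.0 * rate)


if __name__ == "__main__":
    # K values for pairs discussed in the text.
    examples = {
        "Salmonella phage 4 vs Salmonella phage 5": 0.000,
        "Staphylococcus phage vs Enterobacteria phage Felix01": 0.001,
        "Salmonella phage 9 vs Bacteriophage P22": 2.038,
        "PG vs Methanosarcina": 1.195,
    }
    for pair, k in examples.items():
        print(f"{pair}: K = {k:.3f}, t = {divergence_time(k):.2e} years")
```

Because every entry in the time table is the corresponding distance divided by the same constant, any uncertainty in the assumed rate scales all of the estimated times by the same factor.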
The most distant divergence in Siphoviridae, at 133,000,000 years ago, is between Methanobrevibacter PG and Methanosarcina acetivorans. These values do not seem to match the bootstrap and Bayesian analyses, which show no significant relationship between the Staphylococcus and Enterobacteria phages. This could be because the rate of substitutions per site per year may not be accurate for phages, since they can pick up DNA through vertical and horizontal transfer more often than bacteria do.

Table 8. Podoviridae Jukes-Cantor non-synonymous time of divergence table

Divergence Time Table, Jukes-Cantor Non-Synonymous, Podoviridae

Distance (K)
                       [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]
[1]  Bacteriophage_L
[2]  Salmonella_phag   0.000
[3]  Salmonella_ph_2   0.004  0.004
[4]  Enterobacteria    0.003  0.003  0.003
[5]  Salmonella_ph_4   0.003  0.003  0.005  0.004
[6]  Salmonella_ph_5   0.003  0.003  0.005  0.004  0.000
[7]  Bacteriophage_P   0.006  0.006  0.005  0.005  0.003  0.003
[8]  Enterobacteri_7   0.006  0.006  0.005  0.005  0.003  0.003  0.000
[9]  Escherichia_fer   0.049  0.049  0.051  0.050  0.048  0.048  0.047  0.047
[10] Salmonella_ph_9   2.003  2.003  2.000  2.018  2.025  2.025  2.038  2.038  1.975
[11] Shigella_phage    1.987  1.987  1.984  2.002  2.009  2.009  2.022  2.022  1.963  0.003

Divergence time (years)
                       [1]       [2]       [3]       [4]       [5]       [6]       [7]       [8]       [9]       [10]
[1]  Bacteriophage_L
[2]  Salmonella_phag   0.00E+00
[3]  Salmonella_ph_2   4.44E+05  4.44E+05
[4]  Enterobacteria    3.33E+05  3.33E+05  3.33E+05
[5]  Salmonella_ph_4   3.33E+05  3.33E+05  5.56E+05  4.44E+05
[6]  Salmonella_ph_5   3.33E+05  3.33E+05  5.56E+05  4.44E+05  0.00E+00
[7]  Bacteriophage_P   6.67E+05  6.67E+05  5.56E+05  5.56E+05  3.33E+05  3.33E+05
[8]  Enterobacteri_7   6.67E+05  6.67E+05  5.56E+05  5.56E+05  3.33E+05  3.33E+05  0.00E+00
[9]  Escherichia_fer   5.44E+06  5.44E+06  5.67E+06  5.56E+06  5.33E+06  5.33E+06  5.22E+06  5.22E+06
[10] Salmonella_ph_9   2.23E+08  2.23E+08  2.22E+08  2.24E+08  2.25E+08  2.25E+08  2.26E+08  2.26E+08  2.19E+08
[11] Shigella_phage    2.21E+08  2.21E+08  2.20E+08  2.22E+08  2.23E+08  2.23E+08  2.25E+08  2.25E+08  2.18E+08  3.33E+05

Table 9. Siphoviridae Jukes-Cantor non-synonymous time of divergence table

Divergence Time Table, Jukes-Cantor Non-Synonymous, Siphoviridae

Distance (K)
                 [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]
[1] Enterobact
[2] Staphyloco   0.001
[3] Enteroba02   0.001  0.002
[4] Erwinia_ph   0.225  0.226  0.222
[5] Methanothe   0.919  0.912  0.910  0.995
[6] Methanobac   0.928  0.918  0.919  1.005  0.060
[7] Methanosar   0.935  0.927  0.936  0.896  0.854  0.844
[8] Mesorhizob   0.720  0.716  0.722  0.705  0.842  0.851  0.577
[9] PG           1.140  1.139  1.142  1.152  0.917  0.904  1.195  1.144

Divergence time (years)
                 [1]       [2]       [3]       [4]       [5]       [6]       [7]       [8]
[1] Enterobact
[2] Staphyloco   1.11E+05
[3] Enteroba02   1.11E+05  2.22E+05
[4] Erwinia_ph   2.50E+07  2.51E+07  2.47E+07
[5] Methanothe   1.02E+08  1.01E+08  1.01E+08  1.11E+08
[6] Methanobac   1.03E+08  1.02E+08  1.02E+08  1.12E+08  6.67E+06
[7] Methanosar   1.04E+08  1.03E+08  1.04E+08  9.96E+07  9.49E+07  9.38E+07
[8] Mesorhizob   8.00E+07  7.96E+07  8.02E+07  7.83E+07  9.36E+07  9.46E+07  6.41E+07
[9] PG           1.27E+08  1.27E+08  1.27E+08  1.28E+08  1.02E+08  1.00E+08  1.33E+08  1.27E+08

DISCUSSION

The terminase enzyme is made up of two subunits that are responsible for packaging DNA in bacteriophages (Black, 1995). Phages that use this mechanism rely on ATPase activity to deliver the DNA into the head capsid. The terminase large subunit is responsible for cutting the DNA and transporting it with the help of its ATPase (Burroughs, et al., 2007). These tailed phages belong to the families Myoviridae, Podoviridae, and Siphoviridae. Because DNA exchange occurs more often in viruses than in bacteria, mutation and evolutionary divergence would come about quite often.
This was seen when all 20 sequences were aligned together: there were almost no significantly conserved domains. When running BLAST on this gene, I came across conserved regions that matched not only other phages but also bacteria. This led me to hypothesize that the gene could have incorporated some of the host's DNA during infection and that this DNA is now part of the phage genetic material. This would make evolutionary divergence more difficult to determine, not to mention the phages' capacity for horizontal and vertical gene transfer. In addition, because the sequences are so diverse, some of the trees that were built turned out to be non-significant. The bootstrap at 500 replicates, for instance, did not provide any significant values to indicate reliable data on evolutionary divergence. To address this problem, I ran 1000 replicates for both Podoviridae and Siphoviridae. With more replicates, some significance emerged, and I could see that the Archaea phages were related to each other and that similar species of other phages were related as well.

When evaluating the divergence time table, I noticed that in Siphoviridae the earliest divergence from the other species was seen in the Archaea phage groups, averaging around 110,000,000 years. This is not much of a surprise, since the Archaea are a very old lineage and could be the oldest that still exists. This could offer some information on how or where the gene first arose.

All known tailed phages have a single linear dsDNA genome and vary in genome size. Although the genome is packaged into a procapsid, the replication strategies and the types of terminal ends created by the packaging event are not all the same. The different types of terminal ends are determined by the differing actions of the terminase enzyme during DNA packaging and reflect different replication strategies (Casjens and Gilcrease, 2009).
The known types of termini for tailed bacteriophages are single-stranded cohesive ends, circularly permuted direct terminal repeats, direct terminal repeats (short or long), terminal host DNA sequences, and covalently bound terminal proteins; phages with terminal bound proteins, however, do not require nucleolytic cleavage. Although the terminase enzyme creates various genomic ends, it is the most conserved tailed phage protein (Casjens and Gilcrease, 2009). PG’s putative terminase large subunit was shown to be significant and highly conserved. PG’s terminase family, terminase_6, was identified from the Pfam database and is classified very similarly to λ’s terminase family, terminase_GpA. In addition, terminases cluster together according to the type of DNA ends created by tailed phages, and based on the amino acid sequence of PG’s terminase large subunit, PG was shown to fall in the same clade as short direct terminal repeats/T7 and 5’-cos/λ.

Phages with direct terminal repeats could go unnoticed if the phage genome sequence was determined by the shotgun sequencing method. In addition, restriction digest analysis would yield equimolar fragments regardless of heating and cooling, unlike the pattern seen with cohesive-end phages. However, essential purification steps were needed in order to digest PG effectively with restriction enzymes. Originally, phenol chloroform extraction of PG was conducted to purify the DNA. The Nanodrop spectrophotometer indicated a relatively pure DNA sample, but when run on a gel, the PG DNA created a smear due to a nuclease contaminant that was active in MOPS buffer. After switching to citrate buffer, we encountered another problem: a contaminant inhibited the activity of the restriction enzymes. When M13’s ssDNA was mixed with PG’s dsDNA, mung bean nuclease was not inhibited in digesting M13, which gave reason to believe that the contaminant was not soluble and could be attached to PG’s DNA.
In order to move forward, I focused on resolving the matter and identified that PG DNA needed an additional purification step after phenol chloroform extraction. I attempted to extract PG DNA from an agarose gel but lost too much DNA product. Therefore, PG DNA was run through a Zymo-Spin™ column, which restored the efficiency of the restriction enzymes. PG was digested with ClaI and EcoRV, and no distinguishing gel patterns appeared after heating and cooling. However, PG’s restriction fragment pattern does not exclude PG from having cos sites. The cos site could have been concealed by true restriction fragments, or it could have been on a small band that ran off the gel. Furthermore, the single-stranded sticky ends could be too short to join and hold the two fragments together in a gel, similar to the complementary overhangs created by restriction enzymes. In that case, no annealing would be observed under slow cooling conditions after the cohesive ends have been heated.

Due to the uncertainty of the results, PG DNA was further analyzed for cohesive ends by determining the possible base composition of the single-stranded ends by high performance liquid chromatography (HPLC). HPLC has been used for precise measurements of DNA base composition and is a good alternative for determining G+C content. PG DNA was hydrolyzed into nucleosides with mung bean nuclease and alkaline phosphatase. λ was analyzed concurrently as a positive control for known cohesive ends and displayed the presence of the four nucleosides from λ’s 12-base single-stranded extensions. Proper controls and standards were also injected into the HPLC to identify any additional chromatographic peaks that could appear in an injected test sample. If PG were to have cos sites, we would expect the complementary overhanging ends to anneal back together after extraction, as seen with λ.
By heating PG to 65-70°C, we hypothesized that the cos site would separate and the single-stranded ends could be digested with mung bean nuclease. After PG was heated, digested, and filtered, the HPLC results identified the nucleosides cytidine, adenosine, guanosine, and thymidine. However, the absorbance units measured for the nucleosides were unusually high compared with λ. It does not seem reasonable to conclude from the chromatograms that PG has longer overhanging ends than λ, since we saw no change and no bands annealing back together (Figures 11 and 12), as was seen with λ (Figure 10). The results may be due to PG’s AT-rich genome having a low melting temperature. The heating could have caused AT-rich regions within PG’s genome to separate and become susceptible to mung bean nuclease.

Another sample of PG was analyzed for cohesive ends by HPLC without any heat treatment. The results indicated the presence of adenosine and thymidine at reasonable absorbance levels with respect to λ. Because mung bean nuclease was able to cleave off single-stranded extensions without PG being heated, there is reason to believe that after PG DNA extraction the genome remained linear and did not recircularize. The presence of single-stranded extensions of adenosine and thymidine indicates that PG does have cohesive ends; however, the results do not identify the length of the single-stranded extension or which strand it is on. Deoxynucleotide sequencing would need to be run off both ends of the PG DNA template in order to determine precisely where the template ends at each terminus. The location of the overhanging end can be identified by comparison to a ligated sequence of PG. Whether PG’s genome has 5’ or 3’ overhanging ends, circularization did not take place after DNA extraction. This could be due to PG’s AT-rich nature and its lack of sufficient bond strength to maintain a closed template.
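The bond-strength argument can be illustrated by counting Watson-Crick hydrogen bonds in a single-stranded overhang. The sketch below is only an illustration under stated assumptions: the λ sequence is the commonly cited 12-base cohesive end, the AT-only sequence is invented for comparison and is not a measured PG overhang, and the melting temperature uses the rough Wallace rule for short oligonucleotides, which was not part of the analysis in this project.

```python
# Sketch: compare hydrogen bonding in two single-stranded overhangs.
# LAMBDA_COS is the commonly cited 12-base cohesive end of phage lambda
# (treated here as an assumption); AT_ONLY is a made-up sequence used for
# contrast and is NOT a measured PG overhang.

LAMBDA_COS = "GGGCGGCGACCT"
AT_ONLY = "ATATATATATAT"  # hypothetical


def base_counts(seq):
    """Return (A+T count, G+C count) for a DNA sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at, gc


def hydrogen_bonds(seq):
    """Watson-Crick hydrogen bonds when paired: 2 per A-T, 3 per G-C."""
    at, gc = base_counts(seq)
    return 2 * at + 3 * gc


def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2(A+T) + 4(G+C)."""
    at, gc = base_counts(seq)
    return 2 * at + 4 * gc


for name, seq in [("lambda cos", LAMBDA_COS), ("AT-only (hypothetical)", AT_ONLY)]:
    print(f"{name}: {hydrogen_bonds(seq)} hydrogen bonds, Tm about {wallace_tm(seq)} C")
```

For two overhangs of equal length, the AT-only sequence forms fewer hydrogen bonds and has a lower estimated melting temperature, which is consistent with the suggestion that AT-rich cohesive ends may fail to hold a circularized genome together.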
Adenine forms only two hydrogen bonds with thymine, whereas cytosine and guanine form three hydrogen bonds. In addition, the cohesive ends of PG could be shorter than those of λ. Theoretically, for PG’s linear genome to serve as a closed circular template for DNA replication, it would have to be ligated by the host DNA ligase.

Though PG has been identified as having cohesive ends, its replication strategy remains undetermined. It would be of interest to determine the replication mechanism of PG, which can be done experimentally by distinguishing phage-encoded proteins from host proteins recruited for replication (Weigel and Seitz, 2006). Although PG’s terminal ends are consistent with a rolling circle replication strategy, the strategy cannot be reliably predicted unless the replication genes are similar to a known replication module. However, studying the replication modules of phages with cohesive ends could provide a better understanding of the replication of PG.

Conclusion

The original hypothesis was that if the terminase uses the cos site as part of its packaging strategy, then restriction digest and HPLC nucleoside analysis would show cohesive ends. From the results of this work using bioinformatics, restriction enzyme analysis, and HPLC, I conclude that PG has AT-rich cohesive ends, which suggests the use of circular replication as its packaging strategy.

REFERENCES

Abedon, S. T., & Calendar, R. (2006). The Bacteriophages. New York: Oxford University Press.
Baker, S., Nicklin, J., & Griffiths, C. (2011). BIOS instant notes in microbiology. London: Taylor & Francis.
Baresi, L. and Bertani, G. (1984). Isolation of a bacteriophage for a methanogenic bacterium. Abstract 84th Annual Meeting American Society for Microbiology. I-74, p. 133.
Black, W. L. (1995). DNA packaging and cutting by phage terminases: control in phage T4 by a synaptic mechanism. BioEssays. 17(12), p. 1025-1030.
Blaut, M. (1994). Metabolism of methanogens.
Antonie Van Leeuwenhoek. 66(1-3), p. 187-208.
Burroughs, A. M., Iyer, L. M., & Aravind, L. (2007). Comparative genomics and evolutionary trajectories of viral ATP dependent DNA-packaging systems. Gene and Protein Evolution. 3, p. 48-65.
Calendar, R. (1988). The Bacteriophages: Volume 1. New York: Plenum Press.
Calendar, R. (1988). The Bacteriophages: Volume 2. New York: Plenum Press.
Cann, A. J. (2005). Principles of Molecular Virology. Burlington, MA: Elsevier Academic Press.
Casjens, S. R. and Gilcrease, E. B. (2009). Determining DNA packaging strategy by analysis of the termini of the chromosomes in tailed-bacteriophage virions. Bacteriophages: Methods and Protocols. Humana Press. 2(7), p. 91-111.
Cavicchioli, R. (2011). Archaea — timeline of the third domain. Nature Reviews Microbiology. 9(1), p. 51-61.
Deresinski, S. (2009). Bacteriophage therapy: Exploiting smaller fleas. Clinical Infectious Diseases. 48(8), p. 1096-1101.
Desselberger, U. (2002). Virus taxonomy: Classification and nomenclature of viruses. Virus Research. 83(1), p. 221-222.
Dimitrov, D. (2004). Virus entry: Molecular mechanisms and biomedical applications. Nature Reviews Microbiology. 2(2), p. 109-122.
Edgar, R. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 32(5), p. 1792-1797.
Felsenstein, J. (1989). PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics. 5, p. 164-166.
Ferry, J. (2010). The chemical biology of methanogenesis. Planetary and Space Science. 58(14), p. 1775-1783.
Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunesekaran, P., Ceric, G., Forslund, K., Holm, L., Sonnhammer, E.L., Eddy, S.R., and Bateman A. (2010). Pfam: clans, web tools and services. Nucleic Acids Res. 38 (Database issue):D211-222.
Finn, R.D., Mistry, J., Schuster-Böckler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., Eddy, S.R., Sonnhammer, E.L., and Bateman, A. (2006). Pfam: clans, web tools and services. Nucleic Acids Research. 34: D247-51.
Forterre, P., Prangishvili, D., & Garrett, R. (2006). Viruses of the archaea: A unifying view. Nature Reviews Microbiology. 4(11), p. 837-848.
Geer LY, Domrachev M, Lipman DJ, Bryant SH (2002). CDART: protein homology by domain architecture. Genome Research. 12(10), p. 1619-1623.
Hegde, S., Padilla-Sanchez, V., Draper, B., & Rao, V. (2012). Portal-large terminase interactions of the bacteriophage T4 DNA packaging machine implicate a molecular lever mechanism for coupling ATPase to DNA translocation. Journal of Virology. 86(8), p. 4046-4057.
Hendrix, R., Hatfull, G., & Smith, M. (2003). Bacteriophages with tails: Chasing their origins and evolution. Research in Microbiology. 154(4), p. 253-257.
Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992). CLUSTAL V: improved software for multiple sequence alignment. Computer Applications in the Biosciences (CABIOS). 8(2), p. 189-191.
Johnson, K., & Johnson, D. (1995). Methane emissions from cattle. Journal of Animal Science. 73(8), p. 2483-2492.
Jukes, T.H., & Cantor, C.R. (1969). Evolution of protein molecules. Mammalian Protein Metabolism. p. 21-132.
Kabsch W., & Sander C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 12, p. 2577-2637.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution. 16(2), p. 111-120.
Leigh, J., Albers, S., Atomi, H., & Allers, T. (2011). Model organisms for genetics in the domain archaea: Methanogens, halophiles, thermococcales and sulfolobales. FEMS Microbiology Reviews. 35(4), p. 577-608.
Marchler-Bauer A, et al.
(2007). CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Research. 37, p. 237-240.
Marchler-Bauer A, Bryant SH (2004). CD-Search: protein domain annotations on the fly. Nucleic Acids Research. 32, p. 327-331.
Lobocka, M., & Szybalski, W. T. (2012). Bacteriophages. Boston: Elsevier.
Mc Grath, S., & van Sinderen, D. (2007). Bacteriophage: Genetics and molecular biology. Norfolk: Caister Academic.
Mitchell, R., Loeblich, L., Klotz, L., & Loeblich, 3rd, A. (1979). DNA organization of methanobacterium thermoautotrophicum. Science. 204, p. 1082-1084.
Mitchell, S. M., Matsuzaki, S., Imai, S., Rao V. B. (2002). Sequence analysis of bacteriophage T4 DNA packaging/terminase genes 16 and 17 reveals a common ATPase center in the large subunit of viral terminases. Nucleic Acids Research. 30(18), p. 4009-4021.
Moss, A., Jouany, J., & Newbold, J. (2000). Methane production by ruminants: Its contribution to global warming. Annales De Zootechnie. 49(3), p. 231-253.
Orlova, V.E. (2009). How viruses infect bacteria. EMBO J. 28, p. 797-798.
Pfister, P., Wasserfallen, A., Stettler, R., & Leisinger, T. (1998). Molecular analysis of methanobacterium phage (psi m2). Molecular Microbiology. 30(2), p. 233.
Posada, D., and Buckley, T. R. (2004). Model selection and model averaging in phylogenetics: advantages of the AIC and Bayesian approaches over likelihood ratio tests. Systematic Biology. 53, p. 793-808.
Posada, D., and Crandall, K. A. (1998). Modeltest: testing the model of DNA substitution. Bioinformatics. 14(9), p. 817-818.
Price, M., Dehal, P., & Arkin, A. (2010). Fasttree 2--approximately maximum-likelihood trees for large alignments. PloS One. 5(3), e9490.
Reeve, J. N. (1992). Molecular Biology of Methanogens. Annual Review of Microbiology, USA. 46, p. 165-191.
Samuel, B., & Gordon, J. (2006). A humanized gnotobiotic mouse model of host-archaeal-bacterial mutualism. Proceedings of the National Academy of Sciences of the United States of America.
103(26), p. 10011-10016.
Samuel, B., Hansen, E., Manchester, J., Coutinho, P., Henrissat, B., et al. (2007). Genomic and metabolic adaptations of methanobrevibacter smithii to the human gut. PNAS. 104(25), p. 10643-10648.
Snyder, J., & Young, M. (2011). Advances in understanding archaea-virus interactions in controlled and natural environments. Current Opinion in Microbiology. 14(4), p. 497-503.
Stedman, K. M., Porter, K., Dyall-Smith, M. L. (2010). The isolation of viruses infecting Archaea. Manual of Aquatic Viral Ecology. p. 57-64.
Sun, S., Kondabagil, K., Draper, B., Alam, T.I., Bowman, V.D., Zhang, Z., Hegde, S., Fokine, A., Rossmann, M.G., and Rao, V.B. (2008). The structure of the phage T4 DNA packaging motor suggests a mechanism dependent on electrostatic forces. Cell. 135(7), p. 1251-1262.
Swofford, D. L. (2002). PAUP*. Phylogenetic Analysis Using Parsimony Version 4. Sinauer Associates. Sunderland, Massachusetts.
Tajima, F., & Nei, M. (1984). Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution. 1(3), p. 269.
Thompson J.D., Higgins D.G., Gibson T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 22, p. 4673-4680.
Trun, N. J., & Trempy, J. E. (2004). Fundamental bacterial genetics. Malden, MA: Blackwell.
Van Nevel, C., & Demeyer, D. (1996). Control of rumen methanogenesis. Environmental Monitoring and Assessment. 42(1-2), p. 73-97.
Whelan, S., & Goldman, N. (2001). A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular Biology and Evolution. 18(5), p. 691-699.
Weigel, C., & Seitz, H. (2006). Bacteriophage replication modules. FEMS Microbiology Reviews. 30(3), p. 321-381.
Woese, C. R., and Fox, G.E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. PNAS.
74, p. 5088–5090.

APPENDIX A

Materials

Acetic acid, Glacial (Fisher, 64-19-7)
Agar (Difco Agar, 214530)
Agarose (Life Technologies, 9012-36-6)
Ammonium chloride [NH4Cl] (Sigma, 12125-02-9)
Ammonium sulfate [(NH4)2SO4] (Fisher, 7783-20-2)
Ampicillin (Sigma, A-6140)
Biotin (Sigma B4501)
Bovine serum albumin (BSA) (Sigma A3733)
Bromophenol blue (Sigma 114413)
Calcium chloride [CaCl2] (Fisher, 10043-52-4)
Casamino Acids (Difco 023-17-3)
Chloroform [CHCl3] (Fisher C607-1)
L-cysteine (Sigma C9768-10)
Deionized water (PurE water) [dH2O]
Ethanol [CH3CH2OH]
Ethidium Bromide [EtBr] (Sigma, 1239-45-8)
Ethylene Diamine Tetraacetic Acid [EDTA] (Research Organics, Inc. 6381-92-6)
Ferrous sulfide [FeS] (Acros 1317-37-9)
GeneRuler™ 1 kb DNA Ladder (Thermo)
Glycerol (Fisher, BP229-1)
Glycogen (Sigma G0885)
H2/CO2 70:30 gas mix (Air Products)
Hydrochloric acid [HCl] (Sigma, 7647-01-0)
LB (Difco LB Agar, 244520)
Lithium chloride [LiCl] (Sigma, 7447-41-8)
Magnesium chloride [MgCl2] (Fisher, 7791-18-6)
Magnesium sulfate [MgSO4·7H2O] (Spectrum, 7487-88-9)
Methane gas [CH4] (Air Products)
Methanol [CH3OH] (Fisher A411-20)
Mineral oil (Fisher 80-47-5)
N2/CO2 70:30 gas mix (Air Products)
Potassium chloride [KCl] (Fisher, 7447-40-7)
Potassium phosphate dibasic [K2HPO4] (J.T. Baker 7758-11-4)
Potassium phosphate monobasic [KH2PO4] (Sigma, P-5379)
Sodium acetate (Fisher, 6131-90-4)
Sodium bicarbonate [NaHCO3] (Fisher, 144-55-8)
Sodium carbonate [Na2CO3] (Fisher 497-19-8)
Sodium chloride [NaCl] (Aldrich 7647-14-5)
Sodium Dodecyl Sulfate [SDS] (Sigma L-4390)
Sodium hydroxide [NaOH] (Fisher, 1310-73-2)
Sodium sulfide [Na2S] (Fisher, 1313-84-4)
Sucrose (Criterion C7021)
Trace Minerals (Bertani and Baresi, 1987)
Tris/Acetic Acid/EDTA [TAE 50X] (BioRad TAE buffer)
Tris base (Fisher BP152-1)
Tris-HCl (Barker X186-05)
Triton® X-100 (Sigma, 9002-93-1)
UltraPURE agarose (Life Technologies, 9012-36-6)
Vancomycin (Sigma V2002)
Yeast Extract (Difco 212750)

Pure E Water

Pure E water is type 1 ultrapure water produced with the Thermo Scientific™ Barnstead™ E-Pure™ Ultrapure Water Purification System. Deionized water is run through the filtration system, which has a 0.2 µm filter that removes bacteria and particulates.

Antibiotics

Ampicillin
Ampicillin 200mg
dH2O 100mL
Final concentration 2 mg/mL
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed into 100mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All bottles were closed with a rubber stopper and aluminum seal using the seal crimper. All bottles were sterilized by autoclaving. After cooling to room temperature, ampicillin was added to each bottle inside the anaerobic hood and filter sterilized using a 0.2µm filter. Stored at 4°C.

Antibiotic Mix
Ampicillin 0.2g
D-Cycloserine 0.02g
Vancomycin 0.02g
dH2O 100mL
Final concentration 0.2%, 0.02%, and 0.02%, respectively
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed into 100mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All bottles were closed with a rubber stopper and aluminum seal using the seal crimper. All bottles were sterilized by autoclaving.
After cooling to room temperature, the three antibiotics were added to each bottle inside the anaerobic hood and filter sterilized using a 0.2µm filter. Stored at 4°C.

Media

MS06
NH4Cl 0.125g
Mineral 1 5mL
Mineral 2 5mL
TM 0.1mL
0.4% CaCl2 0.5mL
Na Acetate 0.8g
Cysteine 50mg
Agar 1.4g
dH2O 100mL

Trace Minerals (TM)
MnSO4·H2O 0.5g
FeSO4·7H2O 0.1g
CoCl2·6H2O 0.1g
ZnSO4·7H2O 0.1g
CuSO4·5H2O 0.01g
AlK(SO4)2·12H2O 0.01g
H3BO3 0.01g
Na2MoO4·2H2O 0.01g
NiCl2·6H2O 0.05g
Na2SeO3·5H2O 0.263g
dH2O 1L
1.5g of nitrilotriacetic acid was dissolved with KOH to pH 6.5 and the above minerals were added to it. Final pH was 7.0. Sterilized by autoclaving and stored at 4°C.

Solution and Reagents

Agarose gel for PCR products
Agarose 0.24g
1X TAE buffer 30mL
Final agarose 0.8%

Agarose 0.30g
1X TAE buffer 30mL
Final agarose 1.0%

Agarose 0.45g
1X TAE buffer 30mL
Final agarose 1.5%
The agarose solution was melted by double boiling over a flame and allowed to cool to 50°C before being poured into the gel tray.

“B” Solution
Yeast Extract 12.5g
Casamino acids 12.5g
Wolf’s vitamins 3 µl
dH2O 100 mL
Made anaerobically under N2/CO2 (70:30) gas atmosphere and dispensed into 4.5mL aliquot samples per tube under H2/CO2 (70:30) gas atmosphere. All tubes were closed with a rubber stopper and aluminum seal using the seal crimper. All tubes were sterilized by autoclaving. After cooling to room temperature, 100µL of 1% Na2S, 100µL of 6.5% NaHCO3, and 100µL of Biotin were added to each tube of “B” Supplement. Stored at room temperature.

Biotin
Biotin 50mg
dH2O 25mL
Final concentration 2mg/mL
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed into 25mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All bottles were closed with a rubber stopper and aluminum seal using the seal crimper. All bottles were sterilized by autoclaving.
After cooling to room temperature, biotin was added to each bottle inside the anaerobic hood and filter sterilized using a 0.2µm filter. Stored at room temperature.

CaCl2
CaCl2 4g
dH2O 1L
Final concentration 0.4%
Made aerobically and stored at room temperature.

EDTA
EDTA 186.1g
dH2O 1L
Final concentration 0.5M
Solution was adjusted to pH 8 and stored at room temperature.

70% Ethanol
100% Ethanol 70mL
dH2O 30mL
Final concentration 70% and stored at -20°C.

EtBr
EtBr (10mg/mL) 100µL
dH2O 100mL
Final concentration 100µg/mL
Solution was mixed and stored in a foil-covered container at room temperature. Filter papers were also soaked in the solution. Solution was only handled with gloves.

1N HCl
HCl (conc) 83.3mL
dH2O 916.7mL
Final concentration 1N and stored at room temperature.

Indicator Dye
Used for loading samples on agarose gels.
Bromophenol blue 0.025g
Sucrose 5g
1M Tris buffer pH 8 10µL
Final concentrations 0.25% Bromophenol blue, 50% Sucrose, and 1mM Tris pH 8. Raise the volume to a total of 10mL with dH2O. Dispensed into 1mL aliquot samples in Eppendorf tubes and stored at -20°C.

1M KCl
KCl 7.456g
dH2O 100mL
Final concentration 1M and stored at room temperature.

5M LiCl
LiCl 21.2g
dH2O 100mL
Final concentration 5M and stored at room temperature.

Mineral 1
K2HPO4 3.1g
dH2O 1L
Stored at room temperature.

Mineral 2
KH2PO4 3.0g
(NH4)2SO4 6.0g
NaCl 12.0g
MgSO4·7H2O 2.4g
dH2O 1L
Stored at room temperature.

NaCl
NaCl 5.844g
dH2O 100mL
Final concentration 1M and stored at room temperature.

NaHCO3
NaHCO3 6.25g
dH2O 100mL
Final concentration: 6.25%
Made anaerobically in serum bottles under H2/CO2 (70:30) gas atmosphere and dispensed into 25mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All bottles were closed with a rubber stopper and aluminum seal using the seal crimper. All bottles were sterilized by autoclaving. Stored at room temperature.
1M NaOH
NaOH                40g
dH2O                1L
Final concentration 1M. Sterilized by autoclaving and stored at room temperature.

0.1M NaOH
1M NaOH             100mL
dH2O                900mL
Final concentration 0.1M. Sterilized by autoclaving and stored at room temperature.

SDS
SDS                 40g
dH2O                100mL
Final concentration 40%. Stored at room temperature.

Buffers

Lysis buffer
1M Tris             5mL
0.5M EDTA           0.2mL
1M NaCl             10mL
dH2O                60mL
Final concentrations: 50mM Tris-HCl, 1mM EDTA, 100mM NaCl
Adjust pH to 8.0 and raise the volume to 100mL with dH2O. Sterilized by autoclaving. Stored at room temperature.

PCR Reaction buffer
1M Tris buffer pH 9 0.1mL
1M KCl              0.5mL
10% Triton X-100    0.1mL
0.5M MgCl2          0.1mL
dH2O                0.2mL
Total volume        1mL
Final concentrations: 100mM Tris, 500mM KCl, 1% Triton X-100, and 50mM MgCl2. Stored at 4°C.

1X TAE
50X TAE             20mL
dH2O                980mL
Stored at room temperature.

5M Tris buffer
Tris base           60.57g
dH2O                70mL
Final concentration 5M
Adjust pH to 8.5 with 1N HCl and raise the volume to 100mL with dH2O. Sterilized by autoclaving. Stored at room temperature.

1.5M Tris buffer
5M Tris buffer pH 8.5   30mL
dH2O                    50mL
Final concentration 1.5M
Adjust pH to 8.8 with 1N NaOH and raise the volume to 100mL with dH2O. Sterilized by autoclaving. Stored at 4°C.

1M Tris buffer (pH 8)
Tris base           121.14g
dH2O                800mL
Final concentration 1M
Adjust pH to 8 with 1N HCl and raise the volume to 1L with dH2O. Sterilized by autoclaving. Stored at room temperature.

1M Tris buffer (pH 7)
Tris base           121.14g
dH2O                800mL
Final concentration 1M
Adjust pH to 7 with 1N HCl and raise the volume to 1L with dH2O. Sterilized by autoclaving. Stored at room temperature.

1M Tris buffer (pH 9)
Tris base           121.14g
dH2O                800mL
Final concentration 1M
Adjust pH to 9 and raise the volume to 1L with dH2O. Sterilized by autoclaving. Stored at room temperature.

0.5M Tris buffer
1M Tris buffer pH 7 50mL
dH2O                30mL
Final concentration 0.5M
Adjust pH to 6.8 with 1N HCl and raise the volume to 100mL with dH2O. Sterilized by autoclaving. Stored at 4°C.
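The final concentrations stated for the lysis buffer and the PCR reaction buffer above follow from the dilution relation C1·V1 = C2·V2. The sketch below works through that calculation for the stock volumes listed in those two recipes; the function name and the dictionary layout are illustrative assumptions, not part of the original protocol.

```python
def final_mM(stock_molar, stock_ml, total_ml):
    """Final concentration in mM after diluting stock_ml of stock to total_ml.

    Implements C2 = C1 * V1 / V2, with C1 converted from M to mM.
    """
    return stock_molar * 1000.0 * stock_ml / total_ml


# (stock molarity in M, stock volume in mL), taken from the recipes above.
lysis_buffer = {"Tris": (1.0, 5.0), "EDTA": (0.5, 0.2), "NaCl": (1.0, 10.0)}
pcr_buffer = {"Tris": (1.0, 0.1), "KCl": (1.0, 0.5), "MgCl2": (0.5, 0.1)}

# The lysis buffer is brought to 100mL and the PCR reaction buffer to 1mL.
for label, recipe, total_ml in [("Lysis buffer", lysis_buffer, 100.0),
                                ("PCR reaction buffer", pcr_buffer, 1.0)]:
    for component, (stock_molar, stock_ml) in recipe.items():
        conc = final_mM(stock_molar, stock_ml, total_ml)
        print(f"{label}: {component} {conc:.0f}mM")
```

The same function applies to any of the stock dilutions in this appendix, for example the 1.5M Tris buffer prepared from 30mL of 5M stock in 100mL.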
TE buffer
1M Tris buffer      1mL
0.5M EDTA           0.8mL
dH2O                5mL
Final concentrations: 100mM Tris, 40mM EDTA
Adjust pH to 7.5 using 1N HCl and raise the volume to 10mL with dH2O. Sterilized by autoclaving. Stored at room temperature.

TE buffer
1M Tris buffer      1mL
0.5M EDTA           0.2mL
dH2O                80mL
Final concentrations: 10mM Tris, 1mM EDTA
Adjust pH to 8.5 and raise the volume to 100mL with dH2O. Sterilized by autoclaving. Stored at room temperature.

APPENDIX B

Multiple Sequence Alignment of TLS Nucleotide Sequences for Podoviridae
Multiple Sequence Alignment of TLS Nucleotide Sequences for Siphoviridae
Multiple Sequence Alignment of TLS Nucleotide Sequences for All 20 Sequences
Multiple Sequence Alignment of TLS Amino Acid Sequence for Podoviridae
Multiple Sequence Alignment of TLS Amino Acid Sequence for Siphoviridae
Multiple Sequence Alignment of TLS Amino Acid Sequence for All 20 Sequences

APPENDIX C

Distance Matrices for TLS in Podoviridae Using MEGA4

MEGA Kimura 2 Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Kimura 2-parameter
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No.
of Sites : 1413
d : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010  0.010
[ 4]  0.009  0.009  0.004
[ 5]  0.012  0.012  0.011  0.009
[ 6]  0.012  0.012  0.011  0.009  0.000
[ 7]  0.039  0.039  0.040  0.039  0.035  0.035
[ 8]  0.039  0.039  0.040  0.039  0.035  0.035  0.000
[ 9]  0.163  0.163  0.163  0.161  0.157  0.157  0.135  0.135
[10]  2.536  2.536  2.600  2.665  2.585  2.585  2.665  2.665  2.338
[11]  2.338  2.338  2.352  2.393  2.393  2.393  2.445  2.445  2.171  0.035

MEGA Jukes Cantor Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Jukes-Cantor
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010  0.010
[ 4]  0.009  0.009  0.004
[ 5]  0.012  0.012  0.011  0.009
[ 6]  0.012  0.012  0.011  0.009  0.000
[ 7]  0.038  0.038  0.040  0.038  0.035  0.035
[ 8]  0.038  0.038  0.040  0.038  0.035  0.035  0.000
[ 9]  0.161  0.161  0.161  0.159  0.155  0.155  0.134  0.134
[10]  2.357  2.357  2.374  2.408  2.374  2.374  2.408  2.408  2.209
[11]  2.250  2.250  2.250  2.279  2.279  2.279  2.309  2.309  2.108  0.035

MEGA Tajima-Nei Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Tajima-Nei
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010  0.010
[ 4]  0.009  0.009  0.004
[ 5]  0.012  0.012  0.011  0.009
[ 6]  0.012  0.012  0.011  0.009  0.000
[ 7]  0.039  0.039  0.040  0.039  0.035  0.035
[ 8]  0.039  0.039  0.040  0.039  0.035  0.035  0.000
[ 9]  0.165  0.165  0.164  0.163  0.158  0.158  0.135  0.135
[10]  2.442  2.442  2.465  2.509  2.459  2.459  2.512  2.512  2.308
[11]  2.311  2.311  2.310  2.345  2.342  2.342  2.390  2.390  2.178  0.035

MEGA Nei-Gojobori JC Synonymous Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : s: Synonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 470
dS : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.032  0.032
[ 4]  0.029  0.029  0.010
[ 5]  0.046  0.046  0.032  0.029
[ 6]  0.046  0.046  0.032  0.029  0.000
[ 7]  0.166  0.166  0.174  0.170  0.159  0.159
[ 8]  0.166  0.166  0.174  0.170  0.159  0.159  0.000
[ 9]  0.781  0.781  0.765  0.756  0.731  0.731  0.551  0.551
[10]      ?      ?      ?      ?      ?      ?      ?      ?      ?
[11]      ?      ?      ?      ?      ?      ?      ?      ?  3.097  0.150

MEGA Nei-Gojobori JC Non Synonymous Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : n: Nonsynonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 470
dN : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.004  0.004
[ 4]  0.003  0.003  0.003
[ 5]  0.003  0.003  0.005  0.004
[ 6]  0.003  0.003  0.005  0.004  0.000
[ 7]  0.006  0.006  0.005  0.005  0.003  0.003
[ 8]  0.006  0.006  0.005  0.005  0.003  0.003  0.000
[ 9]  0.049  0.049  0.051  0.050  0.048  0.048  0.047  0.047
[10]  2.003  2.003  2.000  2.018  2.025  2.025  2.038  2.038  1.975
[11]  1.987  1.987  1.984  2.002  2.009  2.009  2.022  2.022  1.963  0.003

MEGA P-distance Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: p-distance
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate

[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage

[         1      2      3      4      5      6      7      8      9     10     11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010  0.010
[ 4]  0.008  0.008  0.004
[ 5]  0.012  0.012  0.011  0.009
[ 6]  0.012  0.012  0.011  0.009  0.000
[ 7]  0.038  0.038  0.039  0.038  0.034  0.034
[ 8]  0.038  0.038  0.039  0.038  0.034  0.034  0.000
[ 9]  0.145  0.145  0.145  0.144  0.140  0.140  0.122  0.122
[10]  0.718  0.718  0.718  0.720  0.718  0.718  0.720  0.720  0.711
[11]  0.713  0.713  0.713  0.714  0.714  0.714  0.715  0.715  0.705  0.034

Distance Matrices for TLS in Podoviridae Using PAUP

Paup-Podoviridae Kimura 2 Distance
#NEXUS
[Distance matrix saved Tuesday, April 20, 2010
[!
Distance measure = Kimura 2-parameter ] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; 10:12 PM] Begin distances; Format triangle=lower labels nodiagonal; Matrix Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage 0.03488178 0.00000000 0.00939569 0.00805147 0.01144361 0.01144361 0.03776458 0.03776458 0.16894521 2.52782488 2.33306265 0.00940402 0.00805563 0.01143852 0.01143852 0.03776632 0.03776632 0.16895662 2.52867198 2.33347130 0.00401104 0.01007581 0.01007581 0.03915803 0.03915803 0.16872907 2.58932948 2.34759402 0.00873225 0.00873225 0.03778067 0.03778067 0.16714604 2.65308905 2.38732648 ; End; 134 0.00000000 0.03423204 0.03423204 0.16291463 2.57607889 2.38805771 0.03423204 0.03423204 0.16291463 2.57607889 2.38805771 0.00000000 0.14281318 2.65199256 2.43845344 0.14281318 2.65199256 2.43845344 2.33474612 2.16910219 Paup-Podoviridae Kimura 3 Distance #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! 
Distance measure = Kimura 3-parameter ] 8:43 AM] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage 0.03490559 ; End; 0.00000000 0.00939569 0.00805158 0.01144465 0.01144465 0.03776471 0.03776471 0.16905527 2.74895644 2.51205444 0.00940402 0.00805574 0.01143956 0.01143956 0.03776645 0.03776645 0.16906679 2.74971652 2.51263404 0.00401115 0.01007592 0.01007592 0.03915816 0.03915816 0.16882090 2.84667826 2.55163670 0.00873225 0.00873225 0.03778067 0.03778067 0.16724624 3.21999645 2.67578864 135 0.00000000 0.03423254 0.03423254 0.16301344 2.72281551 2.57791352 0.03423254 0.03423254 0.16301344 2.72281551 2.57791352 0.00000000 0.14295946 3.25594544 2.91190577 0.14295946 3.25594544 2.91190577 2.85742188 2.30707932 Paup-Podoviridae Jukes-Cantor Distance #NEXUS [Distance matrix saved Tuesday, April 20, 2010 [! 
Distance measure = Jukes-Cantor ] 10:00 PM] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage 0.03476365 ; End; 0.00000000 0.00939189 0.00804297 0.01141984 0.01141984 0.03759329 0.03759329 0.16735767 2.34895587 2.24433970 0.00940021 0.00804712 0.01141477 0.01141477 0.03759509 0.03759509 0.16736981 2.34925485 2.24454451 0.00401070 0.01006727 0.01006727 0.03899647 0.03899647 0.16735767 2.36455727 2.24405479 0.00871713 0.00871713 0.03759329 0.03759329 0.16569285 2.39828563 2.27267075 136 0.00000000 0.03409678 0.03409678 0.16154690 2.36535621 2.27320170 0.03409678 0.03409678 0.16154690 2.36535621 2.27320170 0.00000000 0.14195928 2.39855313 2.30239582 0.14195928 2.39855313 2.30239582 2.20578408 2.10568690 Paup-Podoviridae Absolute Distance #NEXUS [Distance matrix saved Tuesday, April 20, 2010 [! 
Distance measure = absolute ] 10:03 PM] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Bacteriophage_L Salmonella_phag 0 Salmonella_ph_2 14 14 Enterobacteria 12 12 6 Salmonella_ph_4 17 17 15 13 Salmonella_ph_5 17 17 15 13 0 Bacteriophage_P 55 55 57 55 50 Enterobacteri_7 55 55 57 55 50 Escherichia_fer 225 225 225 223 218 Salmonella_ph_9 1014 1014 1015 1017 1015 Shigella_phage 1007 1007 1007 1009 1009 ; End; 137 50 50 218 1015 1009 0 194 1017 1011 194 1017 1011 1004 996 48 Paup-Podoviridae Maximum Likelihood #NEXUS [Distance matrix saved Wednesday, April 21, 2010 9:44 AM] [! Distance measure = maximum-likelihood Likelihood settings: Number of substitution types = 6User-specified substitution rate matrix = 1.765600 2.645500 0.625000 1.765600 0.424800 4.501700 2.645500 0.424800 1.000000 0.625000 4.501700 1.000000 Assumed nucleotide frequencies (set by user): A=0.27040 C=0.23490 G=0.27020 T=0.22450 Among-site rate variation: Assumed proportion of invariable sites = none Distribution of rates at variable sites = gamma (continuous) with shape parameter (alpha) = 0.4516 These settings correspond to the GTR+G model ] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; Begin distances; Format triangle=lower labels Matrix Bacteriophage_L Salmonella_phag 0.00000000 Salmonella_ph_2 0.00958093 Enterobacteria 0.00812216 Salmonella_ph_4 0.01155554 Salmonella_ph_5 0.01155554 Bacteriophage_P 0.04013967 Enterobacteri_7 0.04013967 Escherichia_fer 0.22454868 Salmonella_ph_9 167.28 Shigella_phage 158.535 Escherichia_fer Salmonella_ph_9 Shigella_phage 0.18455921 162.737 152.16 
nodiagonal; 0.00958092 0.00812215 0.01155543 0.01155543 0.04013967 0.04013967 0.22454868 167.28 158.535 0.00405087 0.01027630 0.01027630 0.04186995 0.04186995 0.22672102 164.261 158.013 151.11 151.111 0.03659397 0.00880866 0.00880866 0.03998624 0.03998624 0.22263619 168.22 161.773 ; End; 138 0.00000000 0.03625951 0.03625951 0.21596409 174.111 168.379 0.03625951 0.03625951 0.21596409 174.111 168.379 0.00000000 0.18455921 162.737 152.16 Paup-Podoviridae Uncorrected-P #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! Distance measure = uncorrected ("p") ] 9:05 AM] Begin taxa; Dimensions ntax=11; Taxlabels Bacteriophage_L Salmonella_phag Salmonella_ph_2 Enterobacteria Salmonella_ph_4 Salmonella_ph_5 Bacteriophage_P Enterobacteri_7 Escherichia_fer Salmonella_ph_9 Shigella_phage ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Bacteriophage_L Salmonella_phag 0.00000000 Salmonella_ph_2 0.00933333 0.00934155 Enterobacteria 0.00800000 0.00800410 0.00400000 Salmonella_ph_4 0.01133333 0.01132834 0.01000000 Salmonella_ph_5 0.01133333 0.01132834 0.01000000 Bacteriophage_P 0.03666667 0.03666838 0.03800000 Enterobacteri_7 0.03666667 0.03666838 0.03800000 Escherichia_fer 0.15000001 0.15000972 0.15000001 Salmonella_ph_9 0.71727526 0.71728826 0.71794891 Shigella_phage 0.71237683 0.71238708 0.71236253 0.03397028 ; End; 0.00866667 0.00866667 0.03666667 0.03666667 0.14866666 0.71935838 0.71377152 139 0.00000000 0.03333334 0.03333334 0.14533333 0.71798307 0.71379715 0.03333334 0.03333334 0.14533333 0.71798307 0.71379715 0.00000000 0.12933333 0.71936929 0.71517926 0.12933333 0.71936929 0.71517926 0.71039212 0.70473695 Distance Matrices for TLS in Siphoviridae Using MEGA4 MEGA-Siphovridae P-distances Title: : Siphoviridae.dat Description No. 
of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: p-distance
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.022
[3]  0.029  0.024
[4]  0.300  0.304  0.299
[5]  0.594  0.594  0.586  0.607
[6]  0.593  0.591  0.586  0.600  0.151
[7]  0.594  0.589  0.588  0.585  0.574  0.563
[8]  0.553  0.556  0.562  0.532  0.568  0.583  0.493
[9]  0.622  0.619  0.623  0.619  0.574  0.572  0.640  0.641

MEGA-Siphoviridae Kimura 2
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Kimura 2-parameter
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.022
[3]  0.029  0.025
[4]  0.386  0.392  0.385
[5]  1.193  1.195  1.151  1.253
[6]  1.186  1.178  1.151  1.211  0.169
[7]  1.176  1.156  1.151  1.136  1.093  1.049
[8]  1.016  1.032  1.058  0.926  1.063  1.127  0.808
[9]  1.337  1.315  1.343  1.326  1.101  1.094  1.439  1.452

MEGA-Siphoviridae Nei-Gojobori JC Synonymous
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : s: Synonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 382
dS : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.097
[3]  0.132  0.106
[4]  1.564  1.668  1.595
[5]      ?      ?      ?      ?
[6]      ?      ?      ?      ?  0.694
[7]      ?      ?      ?      ?      ?      ?
[8]      ?      ?      ?      ?      ?      ?      ?
[9]      ?  2.735      ?  2.553  2.860  2.887      ?      ?

MEGA-Siphoviridae Nei-Gojobori JC Non-Synonymous
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : n: Nonsynonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 382
dN : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.001
[3]  0.001  0.002
[4]  0.225  0.226  0.222
[5]  0.919  0.912  0.910  0.995
[6]  0.928  0.918  0.919  1.005  0.060
[7]  0.935  0.927  0.936  0.896  0.854  0.844
[8]  0.720  0.716  0.722  0.705  0.842  0.851  0.577
[9]  1.140  1.139  1.142  1.152  0.917  0.904  1.195  1.144

MEGA-Siphoviridae Jukes-Cantor
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Jukes-Cantor
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.022
[3]  0.029  0.025
[4]  0.384  0.389  0.382
[5]  1.176  1.176  1.139  1.241
[6]  1.171  1.163  1.139  1.205  0.168
[7]  1.176  1.155  1.151  1.135  1.089  1.042
[8]  1.001  1.015  1.039  0.926  1.063  1.127  0.805
[9]  1.328  1.308  1.333  1.308  1.089  1.078  1.438  1.449

MEGA-Siphoviridae Tajima-Nei
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Tajima-Nei
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate

[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG

[        1      2      3      4      5      6      7      8      9 ]
[1]
[2]  0.022
[3]  0.030  0.025
[4]  0.390  0.397  0.389
[5]  1.204  1.207  1.166  1.262
[6]  1.193  1.185  1.159  1.218  0.171
[7]  1.191  1.168  1.165  1.144  1.101  1.057
[8]  1.025  1.040  1.065  0.931  1.078  1.145  0.813
[9]  1.397  1.375  1.405  1.372  1.114  1.109  1.491  1.478

Distance Matrices for TLS in Siphoviridae Using PAUP

Paup-Siphoviridae Uncorrected-P Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010
[!
Distance measure = uncorrected ("p") ] Begin taxa; Dimensions ntax=9; Taxlabels Enterobact Staphyloco Enterobact 'Erwinia_ph' Methanothe Methanobac Methanosar Mesorhizob PG ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Enterobact Staphyloco 0.01872659 Enterobact 0.02621723 0.02496879 'Erwinia_ph' 0.32642516 0.32704648 0.32642651 Methanothe 0.61569929 0.61563504 0.61059678 Methanobac 0.61516076 0.61367792 0.61081469 Methanosar 0.59270149 0.58976126 0.58662319 Mesorhizob 0.55757636 0.56030637 0.56570482 PG 0.64121032 0.63756287 0.63984722 0.64963400 ; End; 11:17 AM] 0.62127924 0.61692840 0.58061647 0.54909956 0.64035785 146 0.14173973 0.57870853 0.57451689 0.59632415 0.56909096 0.58954138 0.59191257 0.49239403 0.64512575 Paup-Siphoviridae Kimura 2 Distance #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! Distance measure = Kimura 2-parameter ] 10:41 AM] Begin taxa; Dimensions ntax=9; Taxlabels Enterobact Staphyloco Enterobact 'Erwinia_ph' Methanothe Methanobac Methanosar Mesorhizob PG ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Enterobact Staphyloco 0.01903308 Enterobact 0.02683690 0.02551383 'Erwinia_ph' 0.43105349 0.43232006 0.43144304 Methanothe 1.31110489 1.31271875 1.27936828 Methanobac 1.30317140 1.29524052 1.27838397 Methanosar 1.17304754 1.15919125 1.14386094 Mesorhizob 1.03279066 1.04586089 1.06955695 PG 1.44911003 1.42361176 1.43993366 1.51525807 ; End; 1.33123112 1.30103672 1.11622107 0.98928696 1.45288587 147 0.15774627 1.11354053 1.08953953 1.19708908 1.07532048 1.15671420 1.17862737 0.80430073 1.47740006 Paup-Siphoviridae Kimura 3 Distance #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! 
Distance measure = Kimura 3-parameter ] 11:13 AM] Begin taxa; Dimensions ntax=9; Taxlabels Enterobact Staphyloco Enterobact 'Erwinia_ph' Methanothe Methanobac Methanosar Mesorhizob PG ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Enterobact Staphyloco 0.01903318 Enterobact 0.02683733 0.02551394 'Erwinia_ph' 0.43108186 0.43238878 0.43147168 Methanothe 1.31718814 1.31892180 1.28548586 Methanobac 1.30336940 1.29525995 1.27838397 Methanosar 1.17319810 1.15937960 1.14386559 Mesorhizob 1.03346443 1.04645586 1.07101309 PG 1.45593929 1.43164933 1.45032740 1.53421533 ; End; 1.33131921 1.30229473 1.11631536 0.99119377 1.46084988 148 0.15805787 1.11552358 1.08969009 1.20080078 1.07675862 1.15691733 1.18081725 0.80609429 1.48189211 Paup-Siphoviridae Jukes-Cantor Distance #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! Distance measure = Jukes-Cantor ] 11:09 AM] Begin taxa; Dimensions ntax=9; Taxlabels Enterobact Staphyloco Enterobact 'Erwinia_ph' Methanothe Methanobac Methanosar Mesorhizob PG ; End; Begin distances; Format triangle=lower labels nodiagonal; Matrix Enterobact Staphyloco 0.01896435 Enterobact 0.02668642 0.02539388 'Erwinia_ph' 0.42850727 0.42960823 0.42850965 Methanothe 1.28999376 1.28963506 1.26202691 Methanobac 1.28699243 1.27878964 1.26320028 Methanosar 1.17144573 1.15755630 1.14301050 Mesorhizob 1.02028036 1.03099704 1.05265105 PG 1.44799256 1.42325914 1.43865359 1.50843751 ; End; 1.32182097 1.29688931 1.11593080 0.98794788 1.44213867 149 0.15710275 1.10753000 1.08939767 1.18892062 1.06655908 1.15652788 1.16769385 0.80148149 1.47548342 Paup-Siphoviridae Absolute Distance #NEXUS [Distance matrix saved Wednesday, April 21, 2010 [! 
Distance measure = absolute ] 10:35 AM]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco     30
Enterobact     42   40
'Erwinia_ph'  523  524  523
Methanothe    857  857  850  864
Methanobac    858  856  852  860  199
Methanosar    802  798  794  786  709  697
Mesorhizob    737  741  748  726  686  704  667
PG            935  930  933  937  830  826  804  787
;
End;

APPENDIX D

Modeltest for Podoviridae Using PAUP

Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Mon Apr 19 00:15:54 2010
OS = Macintosh (Sioux console)
Input format: PAUP* scores file
Run settings:
Using the standard AIC (not the AICc)
Not using branch lengths as parameters
Including all models in model-averaging calculations

---------------------------------------------------------------
* * * HIERARCHICAL LIKELIHOD RATIO TESTS (hLRTs) * * *
---------------------------------------------------------------
Confidence level = 0.01

Equal base frequencies
Null model = JC                 -lnL0 = 5627.0679
Alternative model = F81         -lnL1 = 5618.3013
2(lnL1-lnL0) = 17.5332          df = 3
P-value = 0.000549

Ti=Tv
Null model = F81                -lnL0 = 5618.3013
Alternative model = HKY         -lnL1 = 5569.3418
2(lnL1-lnL0) = 97.9189          df = 1
P-value = <0.000001

Equal Ti rates
Null model = HKY                -lnL0 = 5569.3418
Alternative model = TrN         -lnL1 = 5563.2563
2(lnL1-lnL0) = 12.1709          df = 1
P-value = 0.000485

Equal Tv rates
Null model = TrN                -lnL0 = 5563.2563
Alternative model = TIM         -lnL1 = 5555.7939
2(lnL1-lnL0) = 14.9248          df = 1
P-value = 0.000112

Only two Tv rates
Null model = TIM                -lnL0 = 5555.7939
Alternative model = GTR         -lnL1 = 5552.8652
2(lnL1-lnL0) = 5.8574           df = 2
P-value = 0.053466

Equal rates among sites
Null model = TIM                -lnL0 = 5555.7939
Alternative model = TIM+G       -lnL1 = 5543.8823
2(lnL1-lnL0) = 23.8232          df = 1
Using mixed chi-square distribution
P-value = <0.000001

No Invariable sites
Null model = TIM+G              -lnL0 = 5543.8823
Alternative model = TIM+I+G     -lnL1 = 5543.8823
2(lnL1-lnL0) = 0.0000           df = 1
Using mixed chi-square distribution
P-value = >0.999999

Model selected: TIM+G
-lnL = 5543.8823
K = 7
Base frequencies:
freqA = 0.2725
freqC = 0.2362
freqG = 0.2680
freqT = 0.2232
Substitution model:
Rate matrix
R(a) [A-C] = 1.0000
R(b) [A-G] = 1.9111
R(c) [A-T] = 0.3732
R(d) [C-G] = 0.3732
R(e) [C-T] = 3.2466
R(f) [G-T] = 1.0000
Among-site rate variation
Proportion of invariable sites = 0
Variable sites (G)
Gamma distribution shape parameter = 0.4507

--
PAUP* Commands Block: If you want to implement the previous estimates as likelihod settings in PAUP*, attach the next block of commands after the data in your PAUP file:

[!
Likelihood settings from best-fit model (TIM+G) selected by hLRT in Modeltest 3.7 on Mon Apr 19 00:15:55 2010
]
BEGIN PAUP;
Lset Base=(0.2725 0.2362 0.2680) Nst=6 Rmat=(1.0000 1.9111 0.3732 0.3732 3.2466) Rates=gamma Shape=0.4507 Pinvar=0;
END;
--

---------------------------------------------------------------
* * * AKAIKE INFORMATION CRITERION (AIC) * * *
---------------------------------------------------------------
Model selected: GTR+G
-lnL = 5540.9497
K = 9
AIC = 11099.8994
Base frequencies:
freqA = 0.2704
freqC = 0.2349
freqG = 0.2702
freqT = 0.2245
Substitution model:
Rate matrix
R(a) [A-C] = 1.7656
R(b) [A-G] = 2.6455
R(c) [A-T] = 0.6250
R(d) [C-G] = 0.4248
R(e) [C-T] = 4.5017
R(f) [G-T] = 1.0000
Among-site rate variation
Proportion of invariable sites = 0
Variable sites (G)
Gamma distribution shape parameter = 0.4516

--
PAUP* Commands Block: If you want to implement the previous estimates as likelihod settings in PAUP*, attach the next block of commands after the data in your PAUP file:

[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in Modeltest 3.7 on Mon Apr 19 00:15:55 2010
]

BEGIN PAUP;
Lset Base=(0.2704 0.2349 0.2702) Nst=6 Rmat=(1.7656 2.6455 0.6250 0.4248 4.5017) Rates=gamma Shape=0.4516 Pinvar=0;
END;
--

* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model        -lnL        K   AIC          delta      weight     cumWeight
-----------------------------------------------------------------------
GTR+G        5540.9497   9   11099.8994   0.0000     0.5092     0.5092
TIM+G        5543.8823   7   11101.7646   1.8652     0.2004     0.7095
GTR+I+G      5540.9497   10  11101.8994   2.0000     0.1873     0.8968
TIM+I+G      5543.8823   8   11103.7646   3.8652     0.0737     0.9705
TVM+G        5545.5215   8   11107.0430   7.1436     0.0143     0.9848
K81uf+G      5548.4287   6   11108.8574   8.9580     0.0058     0.9906
TVM+I+G      5545.4644   9   11108.9287   9.0293     0.0056     0.9962
K81uf+I+G    5548.2817   7   11110.5635   10.6641    0.0025     0.9987
SYM+G        5550.6514   6   11113.3027   13.4033    0.0006     0.9993
SYM+I+G      5550.6514   7   11115.3027   15.4033    0.0002     0.9995
TrN+I+G      5550.7544   7   11115.5088   15.6094    0.0002     0.9997
TVMef+I+G    5552.6396   6   11117.2793   17.3799    8.57e-05   0.9998
K81+G        5555.8398   3   11117.6797   17.7803    7.01e-05   0.9999
TIMef+I+G    5553.8770   5   11117.7539   17.8545    6.76e-05   0.9999
K81+I+G      5555.8164   4   11119.6328   19.7334    2.64e-05   1.0000
GTR          5552.8652   8   11121.7305   21.8311    9.25e-06   1.0000
HKY+I+G      5554.9282   6   11121.8564   21.9570    8.69e-06   1.0000
GTR+I        5552.5342   9   11123.0684   23.1689    4.74e-06   1.0000
TIM          5555.7939   6   11123.5879   23.6885    3.66e-06   1.0000
TIM+I        5555.4390   7   11124.8779   24.9785    1.92e-06   1.0000
TrNef+G      5560.7129   3   11127.4258   27.5264    5.37e-07   1.0000
K80+G        5562.6597   2   11129.3193   29.4199    2.08e-07   1.0000
TrNef+I+G    5560.7129   4   11129.4258   29.5264    1.97e-07   1.0000
K80+I+G      5562.5195   3   11131.0391   31.1396    8.81e-08   1.0000
TVM          5558.9619   7   11131.9238   32.0244    5.66e-08   1.0000
TVM+I        5557.9658   8   11131.9316   32.0322    5.64e-08   1.0000
K81uf+I      5560.8452   6   11133.6904   33.7910    2.34e-08   1.0000
K81uf        5562.0234   5   11134.0469   34.1475    1.96e-08   1.0000
SYM          5562.9722   5   11135.9443   36.0449    7.58e-09   1.0000
TrN          5563.2563   5   11136.5127   36.6133    5.71e-09   1.0000
SYM+I        5562.4590   6   11136.9180   37.0186    4.66e-09   1.0000
TrN+I        5562.6582   6   11137.3164   37.4170    3.82e-09   1.0000
TIMef        5566.2280   3   11138.4561   38.5566    2.16e-09   1.0000
TrN+G        5563.2563   6   11138.5127   38.6133    2.10e-09   1.0000
TIMef+I      5565.6860   4   11139.3721   39.4727    1.37e-09   1.0000
TVMef+I      5565.0259   5   11140.0518   40.1523    9.72e-10   1.0000
TVMef        5566.1211   4   11140.2422   40.3428    8.84e-10   1.0000
TIMef+G      5566.2280   4   11140.4561   40.5566    7.94e-10   1.0000
TVMef+G      5566.1211   5   11142.2422   42.3428    3.25e-10   1.0000
K81+I        5568.2261   3   11142.4521   42.5527    2.93e-10   1.0000
K81          5569.4849   2   11142.9697   43.0703    2.26e-10   1.0000
HKY+I        5567.9131   5   11145.8262   45.9268    5.42e-11   1.0000
HKY          5569.3418   4   11146.6836   46.7842    3.53e-11   1.0000
HKY+G        5569.3418   5   11148.6836   48.7842    1.30e-11   1.0000
TrNef        5573.6636   2   11151.3271   51.4277    3.46e-12   1.0000
TrNef+I      5572.8472   3   11151.6943   51.7949    2.88e-12   1.0000
K80+I        5575.2930   2   11154.5859   54.6865    6.79e-13   1.0000
K80          5576.8281   1   11155.6562   55.7568    3.98e-13   1.0000
F81+G        5609.3760   4   11226.7520   126.8525   1.45e-28   1.0000
F81+I+G      5609.3760   5   11228.7520   128.8525   5.33e-29   1.0000
JC+G         5617.7881   1   11237.5762   137.6768   6.47e-31   1.0000
JC+I+G       5617.7690   2   11239.5381   139.6387   2.42e-31   1.0000
F81          5618.3013   3   11242.6025   142.7031   5.24e-32   1.0000
F81+I        5618.2539   4   11244.5078   144.6084   2.02e-32   1.0000
JC           5627.0679   0   11254.1357   154.2363   1.64e-34   1.0000
JC+I         5626.8755   1   11255.7510   155.8516   7.31e-35   1.0000
-----------------------------------------------------------------------
-lnL:       Negative log likelihod
K:          Number of estimated parameters
IC:         Information Criterion
delta:      Information difference
weight:     Information weight
cumWeight:  Cumulative information weight

* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

                         Model-averaged
Parameter   Importance   estimates
---------------------------------------
fA          0.9989       0.2709
fC          0.9989       0.2353
fG          0.9989       0.2695
fT          0.9989       0.2243
TiTv        0.0000       1.9240
rAC         0.7173       1.7660
rAG         0.9717       2.4390
rAT         0.7173       0.6256
rCG         0.7173       0.4241
rCT         0.9717
4.1478
pinv(I)     0.0000       0.0182
alpha(G)    0.7303       0.4498
pinv(IG)    0.2697       0.0005
alpha(IG)   0.2697       0.4497
---------------------------------------
Values have been rounded.
(I): averaged using only +I models.
(G): averaged using only +G models.
(IG): averaged using only +I+G models.

_________________________________________________________________
Program is done.
Time processing: 2.81023 seconds
If you need help type '-?' in the command line of the program.

Modeltest for Siphoviridae Using PAUP

Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Wed Apr 21 14:59:56 2010
OS = Macintosh (Sioux console)

Input format: PAUP* scores file

Run settings:
   Using the standard AIC (not the AICc)
   Not using branch lengths as parameters
   Including all models in model-averaging calculations

---------------------------------------------------------------
*                                                             *
*        HIERARCHICAL LIKELIHOD RATIO TESTS (hLRTs)           *
*                                                             *
---------------------------------------------------------------

Confidence level = 0.01

Equal base frequencies
   Null model = JC                  -lnL0 = 12483.3057
   Alternative model = F81          -lnL1 = 12446.4746
   2(lnL1-lnL0) = 73.6621           df = 3
   P-value = <0.000001
Ti=Tv
   Null model = F81                 -lnL0 = 12446.4746
   Alternative model = HKY          -lnL1 = 12412.4404
   2(lnL1-lnL0) = 68.0684           df = 1
   P-value = <0.000001
Equal Ti rates
   Null model = HKY                 -lnL0 = 12412.4404
   Alternative model = TrN          -lnL1 = 12402.4414
   2(lnL1-lnL0) = 19.9980           df = 1
   P-value = 0.000008
Equal Tv rates
   Null model = TrN                 -lnL0 = 12402.4414
   Alternative model = TIM          -lnL1 = 12401.0684
   2(lnL1-lnL0) = 2.7461            df = 1
   P-value = 0.097492
Equal rates among sites
   Null model = TrN                 -lnL0 = 12402.4414
   Alternative model = TrN+G        -lnL1 = 12332.0967
   2(lnL1-lnL0) = 140.6895          df = 1
   Using mixed chi-square distribution
   P-value = <0.000001
No Invariable sites
   Null model = TrN+G               -lnL0 = 12332.0967
   Alternative model = TrN+I+G      -lnL1 = 12330.3906
   2(lnL1-lnL0) = 3.4121            df = 1
   Using mixed chi-square distribution
   P-value = 0.032360

Model selected: TrN+G
   -lnL = 12332.0967
   K = 6

   Base frequencies:
      freqA = 0.2993
      freqC = 0.2111
      freqG = 0.2432
      freqT = 0.2464
   Substitution model:
      Rate matrix
      R(a) [A-C] = 1.0000
      R(b) [A-G] = 1.4888
      R(c) [A-T] = 1.0000
      R(d) [C-G] = 1.0000
      R(e) [C-T] = 2.2758
      R(f) [G-T] = 1.0000
   Among-site rate variation
      Proportion of invariable sites = 0
      Variable sites (G)
      Gamma distribution shape parameter = 1.4775

--
PAUP* Commands Block: If you want to implement the previous estimates as
likelihod settings in PAUP*, attach the next block of commands after the
data in your PAUP file:

[!
Likelihood settings from best-fit model (TrN+G) selected by hLRT in Modeltest 3.7 on Wed Apr 21 14:59:57 2010
]

BEGIN PAUP;
Lset Base=(0.2993 0.2111 0.2432) Nst=6 Rmat=(1.0000 1.4888 1.0000 1.0000 2.2758) Rates=gamma Shape=1.4775 Pinvar=0;
END;
--

---------------------------------------------------------------
*                                                             *
*             AKAIKE INFORMATION CRITERION (AIC)              *
*                                                             *
---------------------------------------------------------------

Model selected: GTR+G
   -lnL = 12320.1611
   K = 9
   AIC = 24658.3223

   Base frequencies:
      freqA = 0.2916
      freqC = 0.2088
      freqG = 0.2499
      freqT = 0.2497
   Substitution model:
      Rate matrix
      R(a) [A-C] = 1.7834
      R(b) [A-G] = 1.8181
      R(c) [A-T] = 1.1345
      R(d) [C-G] = 0.9127
      R(e) [C-T] = 2.7446
      R(f) [G-T] = 1.0000
   Among-site rate variation
      Proportion of invariable sites = 0
      Variable sites (G)
      Gamma distribution shape parameter = 1.4244

--
PAUP* Commands Block: If you want to implement the previous estimates as
likelihod settings in PAUP*, attach the next block of commands after the
data in your PAUP file:

[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in Modeltest 3.7 on Wed Apr 21 14:59:57 2010
]

BEGIN PAUP;
Lset Base=(0.2916 0.2088 0.2499) Nst=6 Rmat=(1.7834 1.8181 1.1345 0.9127 2.7446) Rates=gamma Shape=1.4244 Pinvar=0;
END;
--

* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model        -lnL         K   AIC          delta      weight     cumWeight
-----------------------------------------------------------------------
GTR+G        12320.1611   9   24658.3223   0.0000     0.5214     0.5214
GTR+I+G      12319.2607   10  24658.5215   0.1992     0.4719     0.9933
TVM+I+G      12325.1426   9   24668.2852   9.9629     0.0036     0.9969
TVM+G        12326.6514   8   24669.3027   10.9805    0.0022     0.9990
TIM+I+G      12328.0762   8   24672.1523   13.8301    0.0005     0.9995
TIM+G        12329.8301   7   24673.6602   15.3379    0.0002     0.9998
TrN+I+G      12330.3906   7   24674.7812   16.4590    0.0001     0.9999
TrN+G        12332.0967   6   24676.1934   17.8711    6.86e-05   1.0000
K81uf+I+G    12334.2305   7   24682.4609   24.1387    2.99e-06   1.0000
HKY+I+G      12336.6318   6   24685.2637   26.9414    7.36e-07   1.0000
K81uf+G      12336.7812   6   24685.5625   27.2402    6.34e-07   1.0000
HKY+G        12339.1143   5   24688.2285   29.9062    1.67e-07   1.0000
GTR+I        12344.2178   9   24706.4355   48.1133    1.86e-11   1.0000
TIM+I        12349.1914   7   24712.3828   54.0605    9.51e-13   1.0000
TVM+I        12348.6953   8   24713.3906   55.0684    5.74e-13   1.0000
TrN+I        12351.0020   6   24714.0039   55.6816    4.23e-13   1.0000
K81uf+I      12354.2031   6   24720.4062   62.0840    1.72e-14   1.0000
HKY+I        12356.0781   5   24722.1562   63.8340    7.17e-15   1.0000
TVMef+I+G    12355.8789   6   24723.7578   65.4355    3.22e-15   1.0000
SYM+I+G      12355.0518   7   24724.1035   65.7812    2.71e-15   1.0000
TVMef+G      12357.1084   5   24724.2168   65.8945    2.56e-15   1.0000
SYM+G        12356.1113   6   24724.2227   65.9004    2.55e-15   1.0000
K81+I+G      12370.6670   4   24749.3340   91.0117    9.00e-21   1.0000
TIMef+I+G    12369.7188   5   24749.4375   91.1152    8.55e-21   1.0000
TIMef+G      12371.5820   4   24751.1641   92.8418    3.60e-21   1.0000
K80+I+G      12372.6357   3   24751.2715   92.9492    3.42e-21   1.0000
TrNef+I+G    12371.6709   4   24751.3418   93.0195    3.30e-21   1.0000
K81+G        12372.7578   3   24751.5156   93.1934    3.02e-21   1.0000
TrNef+G      12373.4590   3   24752.9180   94.5957    1.50e-21   1.0000
K80+G        12374.6484   2   24753.2969   94.9746    1.24e-21   1.0000
F81+I+G      12375.0498   5   24760.0996   101.7773   4.13e-23   1.0000
F81+G        12376.5391   4   24761.0781   102.7559   2.54e-23   1.0000
TVMef+I      12380.7656   5   24771.5312   113.2090   1.36e-25   1.0000
SYM+I        12380.0293   6   24772.0586   113.7363   1.05e-25   1.0000
K81+I        12390.1982   3   24786.3965   128.0742   8.06e-29   1.0000
TIMef+I      12389.3945   4   24786.7891   128.4668   6.62e-29   1.0000
K80+I        12391.7969   2   24787.5938   129.2715   4.43e-29   1.0000
TrNef+I      12390.9854   3   24787.9707   129.6484   3.67e-29   1.0000
F81+I        12394.6074   4   24797.2148   138.8926   3.61e-31   1.0000
GTR          12393.5068   8   24803.0137   144.6914   1.99e-32   1.0000
TIM          12401.0684   6   24814.1367   155.8145   7.63e-35   1.0000
TrN          12402.4414   5   24814.8828   156.5605   5.25e-35   1.0000
TVM          12402.7080   7   24819.4160   161.0938   5.45e-36   1.0000
JC+I+G       12412.5430   2   24829.0859   170.7637   4.33e-38   1.0000
JC+G         12413.7041   1   24829.4082   171.0859   3.68e-38   1.0000
K81uf        12411.0645   5   24832.1289   173.8066   9.45e-39   1.0000
HKY          12412.4404   4   24832.8809   174.5586   6.49e-39   1.0000
JC+I         12432.0449   1   24866.0898   207.7676   0.00e+00   1.0000
SYM          12432.4541   5   24874.9082   216.5859   0.00e+00   1.0000
TVMef        12434.7578   4   24877.5156   219.1934   0.00e+00   1.0000
TIMef        12443.1348   3   24892.2695   233.9473   0.00e+00   1.0000
TrNef        12444.2129   2   24892.4258   234.1035   0.00e+00   1.0000
K81          12445.5957   2   24895.1914   236.8691   0.00e+00   1.0000
K80          12446.6660   1   24895.3320   237.0098   0.00e+00   1.0000
F81          12446.4746   3   24898.9492   240.6270   0.00e+00   1.0000
JC           12483.3057   0   24966.6113   308.2891   0.00e+00   1.0000
-----------------------------------------------------------------------
-lnL:       Negative log likelihod
K:          Number of estimated parameters
IC:         Information Criterion
delta:      Information difference
weight:     Information weight
cumWeight:  Cumulative information weight

* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

                         Model-averaged
Parameter   Importance   estimates
---------------------------------------
fA          1.0000       0.2917
fC          1.0000       0.2090
fG          1.0000       0.2496
fT          1.0000       0.2497
TiTv        0.0000       0.9076
rAC         0.9990
1.7627
rAG         0.9943       1.8121
rAT         0.9990       1.1260
rCG         0.9990       0.9057
rCT         0.9943       2.7154
pinv(I)     0.0000       0.0716
alpha(G)    0.5238       1.4243
pinv(IG)    0.4762       0.0240
alpha(IG)   0.4762       1.7521
---------------------------------------
Values have been rounded.
(I): averaged using only +I models.
(G): averaged using only +G models.
(IG): averaged using only +I+G models.

_________________________________________________________________
Program is done.
Time processing: 1.78351 seconds
If you need help type '-?' in the command line of the program.

Modeltest for Dataset Podoviridae and Siphoviridae Using PAUP

Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Mon Apr 26 10:58:38 2010
OS = Macintosh (Sioux console)

Input format: PAUP* scores file

Run settings:
   Using the standard AIC (not the AICc)
   Not using branch lengths as parameters
   Including all models in model-averaging calculations

---------------------------------------------------------------
*                                                             *
*        HIERARCHICAL LIKELIHOD RATIO TESTS (hLRTs)           *
*                                                             *
---------------------------------------------------------------

Confidence level = 0.01

Equal base frequencies
   Null model = JC                  -lnL0 = 18269.9160
   Alternative model = F81          -lnL1 = 18230.2656
   2(lnL1-lnL0) = 79.3008           df = 3
   P-value = <0.000001
Ti=Tv
   Null model = F81                 -lnL0 = 18230.2656
   Alternative model = HKY          -lnL1 = 18161.4375
   2(lnL1-lnL0) = 137.6562          df = 1
   P-value = <0.000001
Equal Ti rates
   Null model = HKY                 -lnL0 = 18161.4375
   Alternative model = TrN          -lnL1 = 18151.2227
   2(lnL1-lnL0) = 20.4297           df = 1
   P-value = 0.000006
Equal Tv rates
   Null model = TrN                 -lnL0 = 18151.2227
   Alternative model = TIM          -lnL1 = 18143.8477
   2(lnL1-lnL0) = 14.7500           df = 1
   P-value = 0.000123
Only two Tv rates
   Null model = TIM                 -lnL0 = 18143.8477
   Alternative model = GTR          -lnL1 = 18133.6777
   2(lnL1-lnL0) = 20.3398           df = 2
   P-value = 0.000038
Equal rates among sites
   Null model = GTR                 -lnL0 = 18133.6777
   Alternative model = GTR+G        -lnL1 = 18012.0742
   2(lnL1-lnL0) = 243.2070          df = 1
   Using mixed chi-square distribution
   P-value = <0.000001
No Invariable sites
   Null model = GTR+G               -lnL0 = 18012.0742
   Alternative model = GTR+I+G      -lnL1 = 18012.0742
   2(lnL1-lnL0) = 0.0000            df = 1
   Using mixed chi-square distribution
   P-value = >0.999999

Model selected: GTR+G
   -lnL = 18012.0742
   K = 9

   Base frequencies:
      freqA = 0.2828
      freqC = 0.2193
      freqG = 0.2575
      freqT = 0.2404
   Substitution model:
      Rate matrix
      R(a) [A-C] = 1.8378
      R(b) [A-G] = 2.0590
      R(c) [A-T] = 0.9950
      R(d) [C-G] = 0.6558
      R(e) [C-T] = 3.0231
      R(f) [G-T] = 1.0000
   Among-site rate variation
      Proportion of invariable sites = 0
      Variable sites (G)
      Gamma distribution shape parameter = 1.0760

--
PAUP* Commands Block: If you want to implement the previous estimates as
likelihod settings in PAUP*, attach the next block of commands after the
data in your PAUP file:

[!
Likelihood settings from best-fit model (GTR+G) selected by hLRT in Modeltest 3.7 on Mon Apr 26 10:58:39 2010
]

BEGIN PAUP;
Lset Base=(0.2828 0.2193 0.2575) Nst=6 Rmat=(1.8378 2.0590 0.9950 0.6558 3.0231) Rates=gamma Shape=1.0760 Pinvar=0;
END;
--

---------------------------------------------------------------
*                                                             *
*             AKAIKE INFORMATION CRITERION (AIC)              *
*                                                             *
---------------------------------------------------------------

Model selected: GTR+G
   -lnL = 18012.0742
   K = 9
   AIC = 36042.1484

   Base frequencies:
      freqA = 0.2828
      freqC = 0.2193
      freqG = 0.2575
      freqT = 0.2404
   Substitution model:
      Rate matrix
      R(a) [A-C] = 1.8378
      R(b) [A-G] = 2.0590
      R(c) [A-T] = 0.9950
      R(d) [C-G] = 0.6558
      R(e) [C-T] = 3.0231
      R(f) [G-T] = 1.0000
   Among-site rate variation
      Proportion of invariable sites = 0
      Variable sites (G)
      Gamma distribution shape parameter = 1.0760

--
PAUP* Commands Block: If you want to implement the previous estimates as
likelihod settings in PAUP*, attach the next block of commands after the
data in your PAUP file:

[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in Modeltest 3.7 on Mon Apr 26 10:58:40 2010
]

BEGIN PAUP;
Lset Base=(0.2828 0.2193 0.2575) Nst=6 Rmat=(1.8378 2.0590 0.9950 0.6558 3.0231) Rates=gamma Shape=1.0760 Pinvar=0;
END;
--

* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model        -lnL         K   AIC          delta      weight     cumWeight
-----------------------------------------------------------------------
GTR+G        18012.0742   9   36042.1484   0.0000     0.7300     0.7300
GTR+I+G      18012.0742   10  36044.1484   2.0000     0.2686     0.9986
TVM+G        18019.6426   8   36055.2852   13.1367    0.0010     0.9996
TVM+I+G      18019.6465   9   36057.2930   15.1445    0.0004     1.0000
TIM+G        18027.4688   7   36068.9375   26.7891    1.11e-06   1.0000
TIM+I+G      18027.4688   8   36070.9375   28.7891    4.09e-07   1.0000
K81uf+G      18035.1367   6   36082.2734   40.1250    1.41e-09   1.0000
TrN+G        18035.8340   6   36083.6680   41.5195    7.04e-10   1.0000
K81uf+I+G    18035.1367   7   36084.2734   42.1250    5.20e-10   1.0000
TrN+I+G      18035.8340   7   36085.6680   43.5195    2.59e-10   1.0000
HKY+G        18043.3730   5   36096.7461   54.5977    1.02e-12   1.0000
HKY+I+G      18043.3730   6   36098.7461   56.5977    3.74e-13   1.0000
SYM+G        18048.6309   6   36109.2617   67.1133    1.95e-15   1.0000
TVMef+G      18049.9355   5   36109.8711   67.7227    1.44e-15   1.0000
SYM+I+G      18048.6309   7   36111.2617   69.1133    7.17e-16   1.0000
TVMef+I+G    18049.9355   6   36111.8711   69.7227    5.29e-16   1.0000
TIMef+G      18067.1016   4   36142.2031   100.0547   1.37e-22   1.0000
K81+G        18068.3984   3   36142.7969   100.6484   1.02e-22   1.0000
TIMef+I+G    18067.1016   5   36144.2031   102.0547   5.04e-23   1.0000
K81+I+G      18068.3984   4   36144.7969   102.6484   3.75e-23   1.0000
TrNef+G      18075.0176   3   36156.0352   113.8867   1.36e-25   1.0000
K80+G        18076.2383   2   36156.4766   114.3281   1.09e-25   1.0000
TrNef+I+G    18075.0176   4   36158.0352   115.8867   5.00e-26   1.0000
K80+I+G      18076.2383   3   36158.4766   116.3281   4.01e-26   1.0000
F81+G        18120.3984   4   36248.7969   206.6484   1.40e-45   1.0000
F81+I+G      18120.3984   5   36250.7969   208.6484   0.00e+00   1.0000
GTR+I        18131.8223   9   36281.6445   239.4961   0.00e+00   1.0000
GTR          18133.6777   8   36283.3555   241.2070   0.00e+00   1.0000
TVM+I        18140.3789   8   36296.7578   254.6094   0.00e+00   1.0000
TIM+I        18141.4863   7   36296.9727   254.8242   0.00e+00   1.0000
TIM          18143.8477   6   36299.6953   257.5469   0.00e+00   1.0000
TVM          18143.2637   7   36300.5273   258.3789   0.00e+00   1.0000
TrN+I        18148.9668   6   36309.9336   267.7852   0.00e+00   1.0000
TrN          18151.2227   5   36312.4453   270.2969   0.00e+00   1.0000
K81uf+I      18150.6309   6   36313.2617   271.1133   0.00e+00   1.0000
JC+G         18156.8418   1   36315.6836   273.5352   0.00e+00   1.0000
JC+I+G       18156.8418   2   36317.6836   275.5352   0.00e+00   1.0000
K81uf        18154.1094   5   36318.2188   276.0703   0.00e+00   1.0000
HKY+I        18158.0840   5   36326.1680   284.0195   0.00e+00   1.0000
HKY          18161.4375   4   36330.8750   288.7266   0.00e+00   1.0000
SYM+I        18173.2344   6   36358.4688   316.3203   0.00e+00   1.0000
TVMef+I      18175.0508   5   36360.1016   317.9531   0.00e+00   1.0000
SYM          18176.1758   5   36362.3516   320.2031   0.00e+00   1.0000
TVMef        18178.4141   4   36364.8281   322.6797   0.00e+00   1.0000
TIMef+I      18183.9590   4   36375.9180   333.7695   0.00e+00   1.0000
K81+I        18185.8887   3   36377.7773   335.6289   0.00e+00   1.0000
TIMef        18187.1523   3   36380.3047   338.1562   0.00e+00   1.0000
K81          18189.5195   2   36383.0391   340.8906   0.00e+00   1.0000
TrNef+I      18190.9570   3   36387.9141   345.7656   0.00e+00   1.0000
K80+I        18192.8398   2   36389.6797   347.5312   0.00e+00   1.0000
TrNef        18194.0332   2   36392.0664   349.9180   0.00e+00   1.0000
K80          18196.3438   1   36394.6875   352.5391   0.00e+00   1.0000
F81+I        18227.8398   4   36463.6797   421.5312   0.00e+00   1.0000
F81          18230.2656   3   36466.5312   424.3828   0.00e+00   1.0000
JC+I         18266.8730   1   36535.7461   493.5977   0.00e+00   1.0000
JC           18269.9160   0   36539.8320   497.6836   0.00e+00   1.0000
-----------------------------------------------------------------------
-lnL:       Negative log likelihod
K:          Number of estimated parameters
IC:         Information Criterion
delta:      Information difference
weight:     Information weight
cumWeight:  Cumulative information weight

* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

                         Model-averaged
Parameter   Importance   estimates
---------------------------------------
fA          1.0000       0.2828
fC          1.0000       0.2193
fG          1.0000       0.2575
fT          1.0000       0.2404
TiTv
0.0000 1.0934
rAC         1.0000       1.8378
rAG         0.9986       2.0590
rAT         1.0000       0.9951
rCG         1.0000       0.6558
rCT         0.9986       3.0231
pinv(I)     0.0000
alpha(G)    0.7311       1.0760
pinv(IG)    0.2689       0.0000
alpha(IG)   0.2689       1.0760
---------------------------------------
Values have been rounded.
(I): averaged using only +I models.
(G): averaged using only +G models.
(IG): averaged using only +I+G models.

_________________________________________________________________
Program is done.
Time processing: 2.87877 seconds
If you need help type '-?' in the command line of the program.
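
The AIC values, Akaike weights, and likelihood ratio statistics in the Modeltest output above follow from the -lnL and K columns by simple arithmetic. The Python sketch below is not part of Modeltest or PAUP*; it is only a way to check those numbers by hand. The function names (aic, akaike_weights, lrt_statistic, chi2_pvalue_df1) are illustrative assumptions. The p-value helper covers only the standard chi-square case with one degree of freedom, not the mixed chi-square distribution that Modeltest reports for the rate-heterogeneity and invariable-sites tests.

```python
import math

def aic(neg_lnl, k):
    """AIC = 2 * (-lnL) + 2 * K, where K is the number of estimated parameters."""
    return 2.0 * neg_lnl + 2.0 * k

def akaike_weights(aic_values):
    """Relative support for each model, normalized so the weights sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def lrt_statistic(neg_lnl_null, neg_lnl_alt):
    """Likelihood ratio statistic 2(lnL1 - lnL0), computed from the two -lnL scores."""
    return 2.0 * (neg_lnl_null - neg_lnl_alt)

def chi2_pvalue_df1(stat):
    """Upper-tail chi-square probability; valid for df = 1 only (assumption)."""
    return math.erfc(math.sqrt(stat / 2.0))

# (-lnL, K) pairs copied from the combined Podoviridae and Siphoviridae run.
# Only the four best models are listed, so the weights are approximate.
models = {
    "GTR+G": (18012.0742, 9),
    "GTR+I+G": (18012.0742, 10),
    "TVM+G": (18019.6426, 8),
    "TVM+I+G": (18019.6465, 9),
}
scores = {name: aic(nl, k) for name, (nl, k) in models.items()}
weights = dict(zip(scores, akaike_weights(list(scores.values()))))
for name in scores:
    print(f"{name:8s} AIC={scores[name]:.4f} weight={weights[name]:.4f}")

# hLRT checks against the same run: JC vs F81, and HKY vs TrN (df = 1).
print(f"JC vs F81 statistic: {lrt_statistic(18269.9160, 18230.2656):.4f}")
print(f"HKY vs TrN p-value:  {chi2_pvalue_df1(20.4297):.6f}")
```

Because the weights here are normalized over four models rather than all 56, they agree with the tabulated values only to the extent that the remaining models carry negligible weight, which is the case for this dataset.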