CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
DNA PACKAGING BY TERMINASE LARGE SUBUNIT IN
METHANOPHAGE PG
A project submitted in partial fulfillment of the requirements
For the degree of Master of Science
In Biology
By
Thomas Dang
May 2014
The graduate project of Thomas Dang is approved:
Virginia O. Vandergon, Ph.D.
Date
Michael L. Summers, Ph.D.
Date
Larry Baresi, Dr.P.H., Chair
Date
California State University, Northridge
ACKNOWLEDGMENTS
I would like to thank my committee members, Dr. Virginia Vandergon and Dr.
Michael Summers. Thank you for being such great professors and all your dedication and
hard work.
Thank you to all the professors who assisted me during my academic
development at CSUN and who encouraged me to continue my education. I would
especially like to thank my mentor, Dr. Larry Baresi. Through all my troubles and
deployments, you were always understanding and welcoming. You could not have been
any more patient with me and I thank you for all you have done for me.
Thank you to all my colleagues who have helped me through my stay here at
CSUN. It is a forever lasting friendship and I thank you for all your moral support.
TABLE OF CONTENTS
Signature Page
Acknowledgments
Abstract
Introduction
Methods
Results
Discussion
References
Appendix A
Appendix B
Appendix C
ABSTRACT
DNA PACKAGING BY TERMINASE LARGE SUBUNIT IN
METHANOPHAGE PG
By
Thomas Dang
Master of Science in Biology
Recent studies have shown that bacteriophages infect living organisms from all three
domains of life; however, not much is known about phages that infect the Archaea. Of
the approximately 50 reported archaeal phages, PG is one of only three known viruses that
infect methanogens, and it falls within the order of tailed bacteriophages, Caudovirales.
Tailed bacteriophages have differing DNA replication strategies, which are reflected in the
various terminal chromosomal ends created by the terminase large subunit. Although
PG’s replication and packaging process is unknown, bioinformatic studies of PG's
terminase large subunit could help identify potential DNA packaging strategies.
Comparative analysis of other phages demonstrates that they cluster together
according to the type of terminal ends they create.
PG was shown to cluster with phages whose termini are short direct terminal repeats and
cohesive ends. Studying the terminase large subunit of PG could lead to a better
understanding of its replication strategy and genetic history, and could increase our
understanding of viruses in general.
INTRODUCTION
Archaea
Archaea represent one of the three domains of life and are known to have unique
metabolic capabilities such as the production of methane and non-chlorophyll light
harvesting (Cavicchioli, 2011). Archaea are recognized as one of the earliest life forms,
sharing molecular characteristics with Eubacteria and Eukarya, and as such are carefully
studied when looking at the evolution of life on Earth (Woese and Fox, 1977). Their
distinct and shared traits have allowed the Archaea to develop ways to exist in extreme
environments, and they are grouped into methanogens, halophiles, and thermoacidophiles
(Cavicchioli, 2011).
Methanogens
16S rRNA gene sequencing divides the domain Archaea into three phyla known as
Euryarchaeota, Crenarchaeota, and Korarchaeota. Methanogens represent a large and
diverse portion of Euryarchaeota. They are also a model for many molecular studies in
archaeal replication, transcription, regulation, and protein structure (Leigh, et al., 2011).
Methanogens are known to reside in anaerobic environments ranging from the
human gut and the rumen of cattle to sediments and deep-sea volcanic vents. In addition,
they are characterized by their ability to harvest energy by catabolizing H2/CO2, formate,
methanol, methylamines, or acetate to produce methane (methanogenesis) (Reeve, 1992).
Methanogenesis follows one of three known pathways and requires unique
coenzymes. The H2 dependent CO2 reduction pathway, in which CO2 is reduced to CH4,
is the most common. Some methanogens are capable of using formate, which is oxidized to
CO2 that then enters the CO2 reduction pathway (Blaut, 1994). The separate
methylotrophic pathway utilizes methanol or methylamines as an energy source,
converting the methyl group to CH4, and is limited to the family Methanosarcinaceae. The
third methanogenic pathway, the aceticlastic pathway, generates methane from the
internal oxidation-reduction of acetate. Acetate is a crucial intermediate that provides the
primary source of methane in freshwater environments and anaerobic digesters (Blaut,
1994). Even though acetate plays a significant role in the production of methane in
nature, very few known methanogens are capable of utilizing acetate as a
precursor for methanogenesis. Those able to utilize the aceticlastic pathway
belong to Methanosarcinaceae, which are the most metabolically
diverse methanogens.
Unlike in freshwater sediment and anaerobic digesters, H2 dependent CO2
reduction plays a significant role in rumen metabolism through a process referred to as
interspecies hydrogen transfer. Interspecies hydrogen transfer is the process by which
reducing equivalents are transferred between a donor, usually a eubacterium, and a recipient
methanogen utilizing H2 dependent CO2 metabolism. Radiocarbon (14C) isotope analysis
indicates that as much as 70 to 80% of the methane produced by ruminants comes by way
of H2 dependent CO2 metabolism, while in wetlands, rotting biomass, and wastewater
treatment plants H2 dependent CO2 metabolism accounts for only 20 to 30%, with the
remainder coming from aceticlastic metabolism (Johnson, et al., 1995).
Recently, it has been suggested that interspecies hydrogen transfer may play a role in
human obesity. The human colon houses large numbers of Eubacteria and methanogenic
Archaea. Large numbers of H2 consuming methanogens, specifically
Methanobrevibacter smithii, are found in about 50-85% of the human population
(Samuel, et al., 2006). The mutualistic relationship that we have with Eubacteria and
methanogens allows us to digest large complex dietary polysaccharides more efficiently
through interspecies hydrogen transfer. The hypothesis is that Eubacteria ferment the
complex polysaccharides to short chain fatty acids (SCFAs), primarily acetate,
propionate, and butyrate, in addition to some organic acids and gases such as hydrogen
and carbon dioxide. If H2 builds up, then the eubacterial NADH dehydrogenases would
be inhibited, leading to a decrease in the production of SCFAs, which would decrease the
efficient utilization of polysaccharides (Samuel, et al., 2006). On the other hand, if
interspecies hydrogen transfer occurred through the presence of an H2 dependent CO2
metabolizing methanogen such as M. smithii, then the H2 would not accumulate, the
dehydrogenases would not be inhibited, and the polysaccharides would be metabolized
more efficiently. Increased digestion of fibers and carbohydrates is
hypothesized to influence host calorie intake and obesity by producing more acetate and
triglycerides, which will eventually be stored as fat. Therefore, M. smithii has been
proposed as a probable therapeutic target for decreasing energy harvested in obese
humans (Samuel, et al., 2007). Viruses that attack M. smithii could be used to control the
methanogenic population, thus altering the carbon flow in the intestine and affecting
caloric intake.
Viruses
Viruses are acellular entities that minimally have a nucleic acid with a protective
protein coat. They have been found in all three domains, Eukarya, Eubacteria, and
Archaea. Because viruses lack the metabolic processes necessary for
autonomous reproduction, they do not proliferate by cellular division. Instead, viruses
proliferate by taking over the resources of a host in order to reproduce. Because a virus
depends on its host for replication, viruses are first classified by their host preference,
followed by morphology, genome type, and structures (Orlova, 2009). The nucleic acid
of the genome can be either single stranded or double stranded DNA or
RNA. The genome can also be linear, circular, or even segmented, and genomes span an
extremely wide range of sizes. The majority of viral gene products are essential for
creating virus parts and aiding the infectious process. A protective protein coat called a
capsid encloses the virus’ genome. The function of the capsid is to
protect the genome from being damaged by environmental factors such as pH, salinity,
chemicals, or enzymatic hydrolysis. Moreover, capsids play an important role in host
recognition and the transportation of the virus genome into the host during infection
(Trun, et al., 2004).
Bacteriophage
Bacteriophages, or phages, are viruses that infect Eubacteria and Archaea.
Bacteriophages can be found in every bacterial habitat, come in a variety of shapes,
and are arguably the most diversified and oldest of all known viruses. Phages can be
filamentous, icosahedral, or contain a tail structure that is attached to the head protein
capsid. Currently, the International Committee on Taxonomy of Viruses (ICTV)
recognizes one order, 13 families, and 31 genera of phages (Abedon, et al., 2006).
However, tailed phages total about 96% of all reported bacteriophages and are in the
order Caudovirales, which is divided into three phylogenetically related families:
Myoviridae, Siphoviridae, and Podoviridae. The remaining cubic, filamentous, and
pleomorphic phages are less than 3.6% and are grouped into 10 small families (Abedon,
et al., 2006).
Tailed phages are distinguished by a unique combination of a symmetrical head
and a helical tail but vary in structure, dsDNA genome size, and physiology. Their
genome size ranges from 17 to 500 kb and their tails from 10 to
800 nm in length. About 25% of tailed phages are within the family Myoviridae and
have contractile tails composed of a sheath and central tube that are vital during
infection. Siphoviridae represent about 61% of tailed viruses and have long
noncontractile tails. Podoviridae have short tails and encompass the remaining 14% of
tailed phages.
Tailed phages utilize their tails in order to attach themselves to specific host
receptors. This interaction defines their affinity for a specific group of bacteria or, in
some cases, a specific strain (Deresinski, 2009). As tailed bacteriophages irreversibly
attach, they inject their genome into their host, where they take over the metabolic
processes. For example, the host cell's RNA polymerase is utilized to initiate the phage
infectious processes. Phages undergo one of two major infectious life cycles,
lytic or lysogenic. Lytic phages, or virulent phages, take over the host’s metabolic
processes, usually leading to the host’s destruction through lysis. Lysogenic phages, on
the other hand, enter the host and either are incorporated into the host genome, replicating
silently as the host replicates, or enter the lytic cycle for propagation of the phage.
Archaeal phage
Archaea are known to be located in a wide variety of extreme environments, but
only about 50 viruses have been reported that infect this diverse group (Stedman, et al.,
2010). Most archaeaphages have been isolated from extreme thermophiles and extreme
halophiles. The kingdom Crenarchaeota contains members living in extreme
temperature conditions on both ends of the spectrum, whereas Euryarchaeota encompasses
many phylogenetically different organisms, such as methanogens and halophiles.
Although the number of phages found in Archaea is substantially smaller than in the
eukaryotic or eubacterial domains, the diversity found is as great (Forterre, et al., 2006).
Recently isolated archaeal phages discovered in high temperature acidic
environments, specifically from members of the Sulfolobales (Crenarchaeota), have been
shown to have unique morphologies that vary from bottle shaped to
lemon shaped to filamentous and rod shaped particles (Snyder, et al., 2011). The
majority of isolated phages infecting organisms within Euryarchaeota have head
capsid and tail structures similar to those seen in eubacterial phages; however, there are
very few spindle and spherical shaped phages within this kingdom. Although archaeaphages
have diverse morphotypes, all genomes identified thus far are circular or linear dsDNA,
with the exception of Halorubrum pleomorphic virus 1, which is ssDNA.
Even though methanogens were the first and most studied Archaea, there are only
three known methanophages. ΨM1, one of the three methanophages, is virulent and
infects the thermophilic archaeon Methanobacterium thermoautotrophicum strain Marburg
(Mitchell, et al., 1979). ΨM1 has a dsDNA genome that is circularly permuted with
terminal redundancy. This phage also has a polyhedral head capsid and tail structure,
which places it in the order Caudovirales. About 15% of ΨM1 phage particles carry
concatemers of the plasmid pME2001 carried by Methanobacterium
thermoautotrophicum Marburg, the only known host for ΨM1, which suggests the
capability of generalized transduction (Pfister, et al., 1998). In vitro proliferation of ΨM1
leads to a more stable spontaneous deletion mutant, ΨM2, which is missing a fragment of
approximately 0.7 kb.
Methanobrevibacter strain G and PG
In 1984, Baresi and Bertani identified a methanophage known as PG.
Methanobrevibacter strain G, PG’s only known host, and PG were both isolated from the
ruminant habitat. PG has been tested for its ability to infect other methanogens,
including other strains of Methanobrevibacter, and strain G was the only viable host. PG
is a lytic phage with a latent period of 7-9 hours and a burst size of about 20-60 phages
(Baresi and Bertani, 1984). It also has a unique A-T rich dsDNA genome of 71,387
bp that encodes 72 presumptive genes. Of the 72 presumptive open
reading frames (ORFs), 40 have unknown functions. The remaining 32
ORFs have some resemblance to genes found in Eubacteria and their phages, in
addition to several Eukaryotes and their viruses (Baresi, personal communication). One
of the presumptive genes identified from the sequence analysis was the terminase large
subunit (TLS), which has a significant responsibility during genome packaging. It is
hypothesized that studying the terminase large subunit could lead to a better
understanding of phage and archaeal evolution.
Terminase Large Subunit
The terminase is a two-subunit enzyme that is involved in packaging nucleic acid into
the phage head. In this reaction the terminase cleaves the DNA and
uses its ATPase activity to transfer the DNA through the portal protein and into the prohead
(Black, 1995). This terminase-portal mechanism is utilized by tailed bacteriophages with
double stranded viral DNA, usually with a linear genome configuration (Burroughs,
et al., 2007). The mechanism is thought to occur in a variety of families and genera
of viruses, such as Siphoviridae, Podoviridae, and Myoviridae. However, many phages
have not been fully studied and sequenced, so their terminase large subunits are
annotated only as putative.
Bacteriophage terminases from λ, P2, T3, T7, P22, T4, and Φ29 phages have been
isolated and sequenced, providing information that supports the enzyme’s key role in
ATPase-dependent packaging of DNA into the prohead (Black, 1995).
The terminase enzyme consists of a large and a small subunit whose genes
partially overlap. One of the model phages used to study the terminase is phage
T4. Researchers were able to visualize the structure of the g16 and g17 products through
cryo-electron microscopy and X-ray structure analysis (Hegde, et al., 2012). The g16
gene encodes the small subunit, which is about 18 kDa, and g17 encodes the
large subunit of about 70 kDa; the two genes overlap by about 5 codons (Black,
1995). By creating mutations of the genes and overexpressing the terminase, studies have
been able to show that this two-protein system constitutes the terminase and requires
ATP for cutting the viral DNA and translocating it into the head of the virus
(Burroughs, et al., 2007).
The mechanism responsible for genome packaging in these phages overcomes an
incredible amount of resistance from the capsid (Mitchell, et al., 2002). The amount of
energy needed to translocate the DNA has been shown to be one of the most intense ATP
consuming events in nature, but the ATPase mechanism that enables this is still not
known (Mitchell, et al., 2002). Researchers have tried to understand this motor apparatus
by aligning several terminase and packaging genes and have determined that the
terminase large subunit consists of an N-terminal ATPase domain and a C-terminal
nuclease domain (Burroughs, et al., 2006).
All known tailed bacteriophages contain linear dsDNA genomes when packaged
in the head capsid. These linear genomes have several known types of terminal
ends: cohesive ends (5’- or 3’- single strand extensions), circularly permuted direct
terminal repeats, short or long exact direct terminal repeats, terminal host DNA
sequences, or covalently bound terminal proteins (Casjens and Gilcrease, 2009). The
specific terminus type amongst tailed phages reflects different DNA replication schemes
and provides insight into how the terminase functions during DNA packaging (Table 1).
Table 1. Types of termini from known tailed-phage genomes

Terminus Type                                  Prototype Phage   Replication Strategy
Cohesive ends
  5’-single strand extension                   λ, P2             Rolling circle → concatemer; circle → circle
  3’-single strand extension                   HK97              Rolling circle → concatemer*
Circularly permuted direct terminal repeats†   T4                Complex → concatemer
                                               P22               Rolling circle → concatemer
                                               P1                Rolling circle → concatemer
Host DNA at termini                            Mu                Duplicative transposition into host DNA
Exact direct terminal repeats
  Short (few hundred bp)                       T7                Linear → concatemer
  Long (thousands of bp)                       SPO1              Complex → concatemer
                                               T5                Complex → concatemer
Covalent terminal proteins                     Φ29               Protein-primed linear → linear

Note: Adapted from Bacteriophages: Methods and Protocols, p. 91, by Casjens, S. R. and Gilcrease, E.
B., 2009, Humana Press.
† These known virions have their genome sequence terminated at different locations along the sequence,
and the length of the terminal repeat fluctuates among virions.
* Genomic analysis predicts this replication strategy, but it has not been experimentally studied.
Of the six well researched types of terminal ends, five are created by
the terminase cleaving the genome from the products of the bacteriophage’s replication
mechanism. Phages with terminal proteins are known to replicate as monomeric linear
molecules (Casjens and Gilcrease, 2009). The majority of tailed phages package their DNA
from concatemers created by rolling circle replication or a more intricate replication
strategy, nicking or melting and translocating their DNA in a unidirectional packaging
series along the concatemers. Each concatemer usually fills about two to five phage
heads, but some phages are capable of packaging up to 10 or more depending on the
conditions during infection (Casjens and Gilcrease, 2009).
As the terminase identifies the viral genome, the initial packaging event begins with
cleavage at or near the packaging recognition site. In headful packaging phages, the
packaging recognition site is known as the pac site; when the head capsid is filled,
packaging is completed by a second cleavage made by the terminase. In cohesive-end
phages, the packaging recognition site is referred to as the cos site, and packaging is
terminated at a sequence specific site, leaving the same single stranded extensions on
every genome, which are complementary to each other. As soon as the terminating
cleavage is made, the next packaging event is
initiated from the remaining concatemer and terminated in the same fashion. Since
tailed phages have varied terminal ends depending on the replication process, terminase
cleavage, and packaging mechanism, additional research is frequently required to
understand the true characteristics of the linear genome.
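The contrast between cos and headful packaging can be illustrated with a small Python sketch. It is a toy model only: the genome length, head capacity, and concatemer length are arbitrary assumed values rather than measurements from PG or any other phage, and the function names are hypothetical.

```python
# Toy model of two terminase strategies acting on a concatemer.
# All lengths are arbitrary illustrative units, not phage measurements.

def cos_packaging(concatemer_len, genome_len):
    """Cut at every cos site; sites recur once per genome length."""
    cuts = range(0, concatemer_len + 1, genome_len)
    return list(zip(cuts, cuts[1:]))


def headful_packaging(concatemer_len, headful_len):
    """Start at a pac site at position 0, then cut each time a head is full."""
    segments = []
    start = 0
    while start + headful_len <= concatemer_len:
        segments.append((start, start + headful_len))
        start += headful_len
    return segments


GENOME = 1000              # assumed genome length
HEADFUL = 1030             # assumed head capacity, slightly more than one genome
CONCATEMER = 5 * GENOME    # assumed concatemer of five genome copies

cos_genomes = cos_packaging(CONCATEMER, GENOME)
headful_genomes = headful_packaging(CONCATEMER, HEADFUL)

# Position of each packaged molecule's left end on the genome map.
cos_starts = [start % GENOME for start, _ in cos_genomes]
headful_starts = [start % GENOME for start, _ in headful_genomes]

print(cos_starts)        # identical left ends for every molecule
print(headful_starts)    # left ends drift along the map (circular permutation)
print(HEADFUL - GENOME)  # terminal repeat carried by each headful molecule
```

In this sketch every cos-packaged molecule begins at the same map position, whereas successive headful molecules begin progressively further along the map and each carries a short terminal repeat.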
Hypothesis
If PG terminase packaging uses the cos site as part of its packaging strategy, then
restriction digest and HPLC nucleoside analysis would show cohesive ends at a specific
conserved location.
Present Study
The present study was designed to test whether tailed phage PG uses site-specific
packaging that produces cohesive ends. This was accomplished using three different
techniques.
1) Bioinformatic studies were undertaken using the known PG DNA sequence and
comparing it to other tailed phage sequences to ascertain similarities. Similarities
identified with these bioinformatic tools aid in interpreting the results obtained using
HPLC and restriction fragment analysis.
2) PG DNA was subjected to restriction enzyme analysis under differing conditions.
Single stranded cohesive ends can, by hydrogen bonding, form overlapping double
stranded DNA, as seen in phage λ. When heated, these overlapping ends dissociate back
to single strands. Thus restriction fragment analysis of PG DNA with cohesive ends would
produce differing restriction fragment patterns between heated and unheated samples.
3) HPLC analysis, after treatment with mung bean nuclease, was used to identify the
nucleoside composition of single stranded DNA. Mung bean nuclease hydrolyzes single
stranded DNA, producing single nucleotides. Subjecting PG DNA to mung bean nuclease
releases the nucleotides from single stranded ends, which can then be separated and
identified using HPLC.
Using these techniques I was able to show that PG has an AT rich cohesive end.
Results also suggest that PG uses circular replication as its packaging strategy.
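The reasoning behind the heated versus unheated restriction digests can be sketched in Python. The genome length of 71,387 bp is taken from the text; the cut site positions are hypothetical and chosen only for illustration, and the short length of the single stranded overhang is ignored.

```python
# Sketch of why a cos genome gives different restriction patterns when its
# cohesive ends are annealed (unheated) versus melted apart (heated).

def fragment_sizes(genome_len, cut_sites, ends_annealed):
    """Return fragment sizes (largest first) for a linear genome.

    If ends_annealed is True, the two terminal fragments are held together
    by the base-paired cohesive ends and migrate as a single fragment.
    """
    sites = sorted(cut_sites)
    if not sites:
        return [genome_len]
    bounds = [0] + sites + [genome_len]
    linear = [right - left for left, right in zip(bounds, bounds[1:])]
    if not ends_annealed:
        return sorted(linear, reverse=True)
    joined_ends = linear[0] + linear[-1]
    return sorted(linear[1:-1] + [joined_ends], reverse=True)


PG_GENOME_BP = 71387                       # genome size stated in the text
HYPOTHETICAL_SITES = [10000, 30000, 60000]  # assumed cut positions, not real data

heated = fragment_sizes(PG_GENOME_BP, HYPOTHETICAL_SITES, ends_annealed=False)
unheated = fragment_sizes(PG_GENOME_BP, HYPOTHETICAL_SITES, ends_annealed=True)

print("heated:  ", heated)
print("unheated:", unheated)
```

Under these assumptions the heated sample shows two terminal fragments that are replaced in the unheated sample by a single band equal to their sum, which is the kind of difference the digests were designed to detect.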
METHODS
Preparation of “B” solution
To 100 ml of distilled water (dH2O), 12.5 g of yeast extract, 12.5 g of casamino
acids, and 3 L of 10X trace vitamin (see appendix A) were added. The mixture was then
brought to a boil under a 70% N2/30% CO2 atmosphere and slightly cooled before
10 ml aliquots were dispensed anaerobically into anaerobic culture tubes under a 70%
H2/30% CO2 atmosphere. The tubes were then closed with n-butyl rubber stoppers, capped
with aluminum crimp caps, and autoclaved. Once sterilized, 0.5 ml of sterile biotin (0.2
mg/ml) and 0.1 ml of sterile 1% Na2S were aseptically and anaerobically added to all the
tubes using a sterile syringe that had been flushed with 70% H2/30% CO2 gas.
Na2S
2% Na2S was made by taking Na2S crystals and rinsing them with room
temperature dH2O previously boiled under 70% N2/30% CO2 atmosphere. Na2S crystals
were cleaned and dried and were weighed and added to amber serum bottles while being
flushed with 70% N2/30% CO2 followed by addition of the appropriate volume of boiled
dH2O giving a final concentration of 2% Na2S. The amber serum bottle atmosphere was
then flushed with 70% H2/30% CO2, closed with an n-butyl rubber stopper, capped with
aluminum crimp caps, and autoclaved.
NaHCO3
6% NaHCO3 was made by weighing out 3.0 g of NaHCO3 and combining it with
50 ml dH2O in a round bottom flask. The mixture was placed under 70% N2/30% CO2
gas and brought to a boil. Once cooled, the solution was transferred anaerobically to a
100 ml glass bottle using a glass pipette that had been flushed out with 70% H2/30% CO2
gas, and the volume was adjusted to 50 ml. The serum bottle atmosphere was then switched
to 70% H2/30% CO2, and the bottle was closed with an n-butyl rubber stopper, capped with
an aluminum crimp cap, and autoclaved.
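The weight/volume arithmetic behind the 6% NaHCO3 stock can be checked with a short Python sketch. The helper names are hypothetical, and the calculation assumes that the percentage is expressed as grams of solute per 100 ml of final solution.

```python
# Weight/volume percent, assuming % (w/v) = grams of solute per 100 ml solution.

def percent_w_v(grams, volume_ml):
    """Percent (w/v) of a solution made from `grams` in `volume_ml`."""
    return grams * 100.0 / volume_ml


def grams_needed(percent, volume_ml):
    """Grams of solute required for a target percent (w/v) and volume."""
    return percent * volume_ml / 100.0


nahco3_percent = percent_w_v(3.0, 50.0)   # 3.0 g brought to 50 ml
nahco3_grams = grams_needed(6.0, 50.0)    # mass needed for 50 ml of a 6% stock

print(nahco3_percent)  # 6.0
print(nahco3_grams)    # 3.0
```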
Antibiotics
The antibiotic mixture stock solution contained 0.02% vancomycin, 0.02% D-cycloserine,
and 0.2% ampicillin. This was made by adding 0.02 g of vancomycin, 0.02
g of D-cycloserine, and 0.2 g of ampicillin to a 10 ml beaker. The antibiotics were then
transferred into the anaerobic hood, where they were dissolved in 5 ml of boiled dH2O.
Then, using a sterile 5 ml syringe and a sterile 0.45 µm filter, the solution was dispensed
into sterile tubes or bottles. The antibiotic solution was then stored at 4°C until used.
Ms06 agar
100 ml of Ms06 base agar was made by adding to a round bottom flask 0.125 g
of NH4Cl, 5 ml of mineral 1 (see appendix A), 5 ml of mineral 2 (see appendix A), 0.01
ml of trace minerals, 0.5 ml of 0.4% CaCl2, 0.8 g of sodium acetate, 1.4 g of Bacto agar,
0.1 ml of resazurin, and 100 ml of distilled pure E water (see appendix A). The flask was
then placed in a boiling water bath under 70% N2/30% CO2 gas to dissolve the medium.
Once the medium had cooled but not solidified, 50 mg of L-cysteine was added to assist in
reducing the medium, changing the color from pink to clear. Using the Balch technique,
4.5 ml or 5.0 ml of the medium was then anaerobically transferred to 18 X 150 mm
anaerobic culture tubes or serum bottles under continuous 70% H2/30% CO2 gas, which
were closed with n-butyl rubber stoppers, capped with aluminum crimp caps, and
autoclaved. Sterile Ms06 base agar was then stored at room temperature. Before using the
medium, 50 µl of 0.1% Na2S, 0.1 ml of 6.5% NaHCO3, 0.2 ml of "B" solution (see
appendix A), and 150 µl antibiotics were added anaerobically and aseptically to each
tube.
Ms06 broth
Ms06 broth was made in a similar fashion to Ms06 agar except agar was not
added.
Transfer and growth of Methanobrevibacter strain G
Methanobrevibacter strain G was aseptically and anaerobically transferred every
week into 4.5 mL of Ms06 broth. Each inoculated tube was pressurized to 30 psi with
70% H2/30% CO2 and incubated at 37°C on a rotator.
Gas chromatography
Methane was determined using a GOW-MAC series 580 Gas Chromatograph. Gas samples
were withdrawn anaerobically with a sterile syringe and injected, and separation was
achieved on a 12 ft. Porapak Q 80/100 mesh column with helium at 20 mL/min as the
mobile phase. Known methane samples were injected before all samples in order to
determine the appropriate peak and retention time.
Plating
Ms06 plates were prepared by transferring serum bottles of liquefied Ms06 agar medium
into the anaerobic hood. 20 ml of Ms06 agar was then aseptically dispensed into
plastic petri dishes containing the appropriate volume of selective antibiotics. Once
solidified, the plates were used the same day for PG harvesting or determining PG titer.
PG production
In order to prepare for phage infections, Methanobrevibacter strain G was
grown to an OD of 0.7-0.9 in Ms06 broth tubes. PG, strain G, and liquefied Ms06 agar
were transferred into the anaerobic hood. 1.5 mL of strain G was aliquoted into 12 sterile
3.5 mL glass tubes while placed in 37°C heating blocks. 0.1 mL of PG (10^6 PFU) was
then added to strain G and incubated for 30 minutes. One of the 3.5 mL glass tubes was
set aside as our positive control which did not include PG. After 30 minutes 1.5 mL
sterile liquefied Ms06 agar was added to each tube, mixed, and poured over as an overlay
onto Ms06 agar plates. Once the overlays solidified, 10 µl of the phage was placed at the
center of the control plate as a positive control. All the plates were placed into anaerobic
Torbal cylinders along with a small plastic bag containing a few grams of anhydrous
calcium chloride. The cylinder was then removed from the anaerobic hood, pressurized
to 15 psi with H2/CO2, and incubated at 37°C.
PG harvesting
PG was harvested from the overlay plates by one of two methods when the
cylinder pressure dropped to about 5 psi (5 days). One method required scraping the
overlay off each plate with a hockey stick and placing the overlays into a GSA centrifuge
bottle. An equal volume of 100 mM citrate buffer at pH 6 was added to the GSA centrifuge
bottle. Five drops of chloroform were then added to the collected samples, which were
refrigerated overnight aerobically. The overlays were centrifuged at 4,000 rpm for 30
minutes at 4°C in the GSA rotor. The supernatant, approximately 25 ml, was decanted into
50 ml Oak Ridge centrifuge tubes and centrifuged in an SS34 rotor at 39,000 x g for an
additional 2 hours at 4°C. After centrifugation, the supernatant was saved for additional
phage production. The pellet containing bacteriophage was suspended in 0.5 mL of either
pH 6.5 MOPS buffer (50 mM MOPS–20 mM EDTA) or 100 mM citrate buffer at pH 6. For each
suspended phage pellet, one drop of chloroform was added and the suspension was then
stored at 4°C.
The second and preferred method flooded the harvested plates with either
citrate buffer or MOPS buffer and stored at 4°C overnight to allow PG to diffuse into the
buffer. The buffer was transferred into 50 mL Oak Ridge centrifuge tubes and 5 drops
of chloroform were added. The phage-buffer suspension was centrifuged at 39,000 x g in
the SS34 rotor for 2 hours at 4°C. The supernatant was decanted and saved for additional
phage production while the pellet was suspended in 0.5 mL of either pH 6.5 MOPS
buffer (50 mM MOPS–20 mM EDTA) or 100 mM citrate buffer at pH 6. As in the
previous method, one drop of chloroform was added to each pellet, which was then stored
at 4°C.
Phage titer
Procedures similar to those described for PG production were used.
Methanobrevibacter strain G was grown to an OD of 0.7-0.9 in Ms06 broth and
distributed in the anaerobic hood into 3.5 mL sterile test tubes. Ms06 agar plates were
freshly made and strain G was mixed with 1.5 mL of liquefied Ms06 agar in order to pour
the overlay. 10 µl samples from a 1/10 serial dilution of PG, made with 100 mM citrate
buffer pH 6 as the diluent, were spotted onto solidified Ms06 strain G overlay lawns. The
plates were incubated at 37°C inside the anaerobic cylinder pressurized to 15 psi with
H2/CO2 gas.
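The text does not state how titers were calculated from the spotted dilutions. As a hedged illustration, the conventional calculation (plaque count multiplied by the dilution factor and divided by the volume plated) can be sketched in Python. The plaque count and dilution used below are invented for the example; only the 10 µl spot volume comes from the procedure above, and the function name is hypothetical.

```python
# Conventional plaque-assay arithmetic; the counts below are made-up examples.

def titer_pfu_per_ml(plaques, dilution_factor, spot_volume_ul):
    """PFU/mL of the undiluted stock.

    plaques         -- plaques counted in one spot
    dilution_factor -- fold dilution of the spotted sample (e.g. 10**4)
    spot_volume_ul  -- volume spotted, in microliters
    """
    return plaques * dilution_factor * 1000 / spot_volume_ul


# Hypothetical reading: 12 plaques in a 10 µl spot of the 10^-4 dilution.
example_titer = titer_pfu_per_ml(plaques=12, dilution_factor=10**4, spot_volume_ul=10)
print(f"{example_titer:.2e} PFU/mL")
```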
PG DNA extraction
0.4 mL of clear phage lysate was pipetted into an Eppendorf tube. 10 µl of 20
mg/mL proteinase K was added in order to achieve a final concentration of 0.5 mg/mL
and then incubated at 37°C for 30 minutes to an hour. After incubation 10 µl of 10%
sodium dodecyl sulfate (SDS) was added and mixed by inverting the tube. The tube was
left to incubate at room temperature for 10 minutes and 50 µl of 2M Tris/0.2M
Na2EDTA (pH 8.5) was added. The tube was inverted in order to mix and then incubated
at 70°C for 5 minutes. After incubation, the tube was set aside to cool to room
temperature. An equal volume of TE saturated phenol (pH 6.8) was added and mixed by
inverting the tube. The tube was centrifuged at 14,000 rpm for 5 minutes at room
temperature. The supernatant was transferred with a wide cut end pipette tip to a new
sterile Eppendorf tube and an equal volume of phenol/chloroform/isoamyl alcohol
(1:1:1) was added. The tube was mixed and centrifuged at 14,000 rpm for 5 minutes at
room temperature. The supernatant was transferred with a wide cut end pipette tip into
another new sterile Eppendorf tube. An equal volume of TE saturated chloroform was
added, mixed, and centrifuged in the same fashion mentioned above. The supernatant
was then transferred to a new microfuge tube. The DNA was precipitated by adding 40
µl of 3M sodium acetate at pH 7 and two volumes of ice-cold 100% ethanol. It was
mixed and set on ice for 30 minutes. Next, the tube was centrifuged at 14,000 rpm for 10
minutes at 10°C. The supernatant was carefully removed and the tube was filled halfway
with 70% ethanol. The tube was mixed and again centrifuged at 14,000 rpm for 10
minutes at 10°C. The supernatant was carefully decanted and the pellet placed under
vacuum until the ethanol had completely evaporated. Once dried, the DNA pellet was
suspended in sterile pure-E H2O and the concentration was determined using a Nanodrop
ND-1000™ spectrophotometer.
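The proteinase K step at the start of this procedure states a final concentration of 0.5 mg/mL. A minimal Python check of that figure, assuming the two volumes are simply additive and using a hypothetical helper name, is:

```python
# Dilution check (C1 * V1 = C2 * V2), assuming volumes are additive.

def final_concentration(stock_conc, stock_volume, sample_volume):
    """Concentration after adding `stock_volume` of stock to `sample_volume`.

    The result is in the units of `stock_conc`; both volumes must share a unit.
    """
    return stock_conc * stock_volume / (stock_volume + sample_volume)


# 10 µl of 20 mg/mL proteinase K added to 0.4 mL (400 µl) of phage lysate.
proteinase_k_mg_per_ml = final_concentration(20.0, 10.0, 400.0)
print(round(proteinase_k_mg_per_ml, 3))  # close to the stated 0.5 mg/mL
```

The result is slightly below 0.5 mg/mL because the added 10 µl also increases the total volume.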
Construction of primers for presumptive TLS
Primers were purchased from IDT DNA and designed manually for the
presumptive Terminase Large Subunit gene of methanophage PG. Restriction enzyme
sites were added to the 5’ ends of the forward and reverse primers. The forward primers
have EcoRI and the reverse primers have KpnI. In addition, the sequence GATC was
added before each restriction site.
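The primer layout described above can be sketched in a few lines of Python. This is only an illustration: the recognition sequences for EcoRI (GAATTC) and KpnI (GGTACC) are standard, but the gene-specific portions below are hypothetical placeholders and are not the primers used in this study. The reverse primer is assumed to be supplied already as the reverse complement of the gene's 3' end.

```python
# Sketch: prepend the GATC bases and a restriction site to the 5' end of a primer.
ECORI = "GAATTC"  # EcoRI recognition site (forward primers)
KPNI = "GGTACC"   # KpnI recognition site (reverse primers)
CLAMP = "GATC"    # extra bases placed before each restriction site

def add_tail(gene_specific, site, clamp=CLAMP):
    """Return the primer 5'->3': clamp + restriction site + gene-specific part."""
    return clamp + site + gene_specific.upper()

# Hypothetical gene-specific sequences, for illustration only.
forward = add_tail("ATGAAACTGACC", ECORI)
reverse = add_tail("TTATTCGCTGCC", KPNI)
print(forward)
print(reverse)
```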
Polymerase chain reaction (Taq)
The PCR mixture consisted of 2 µl of template DNA, 5 µl of forward primer at 1
pmol/µl, 5 µl of reverse primer at 1 pmol/µl, 25 µl of Master Mix™ from Fermentas, and
13 µl of dH2O. The total 50 µl reaction was placed in a thin-walled 0.5 mL Eppendorf
tube and covered with a drop of mineral oil to prevent evaporation during the reaction.
The PCR was run in the Perkin-Elmer DNA Thermal Cycler™ 480. The program consisted of
1 cycle of 95°C for 10 minutes, followed by 35 cycles of 95°C for 30 seconds, 50°C for 45
seconds, and 52°C for 2 minutes, with a final extension at 52°C for 15 minutes. The PCR
reaction was held at 4°C after completion.
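As a quick arithmetic check of the reaction setup, the sketch below sums the listed volumes and computes the final primer concentration. The volumes come from the text; the helper name is only illustrative.

```python
# Volumes (µl) taken from the PCR mixture described above.
components = {
    "template DNA": 2.0,
    "forward primer (1 pmol/µl)": 5.0,
    "reverse primer (1 pmol/µl)": 5.0,
    "Master Mix": 25.0,
    "dH2O": 13.0,
}
total_ul = sum(components.values())

def final_primer_um(volume_ul, stock_pmol_per_ul, reaction_ul):
    """Final primer concentration; pmol/µl is numerically equal to µM."""
    return volume_ul * stock_pmol_per_ul / reaction_ul

primer_um = final_primer_um(5.0, 1.0, total_ul)
print(total_ul, primer_um)
```

Under these volumes each primer ends up at about 0.1 µM in the 50 µl reaction.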
After the PCR was completed, the mineral oil was removed by rolling the reaction
on parafilm, to which the oil adheres. The reaction was then transferred into a sterile
Eppendorf tube and cleaned by phenol-chloroform extraction. The product was then
precipitated by addition of sodium acetate and ethanol, following the same procedure
used for PG DNA extraction.
PCR product clean up
The PCR product was raised to 500 µl with pure-E H2O. An equal volume of TE
saturated phenol was added and mixed by inverting the tube. The tube was centrifuged at
14,000 rpm for 5 minutes at room temperature. The supernatant was transferred with a
wide cut end pipette tip to a new sterile Eppendorf tube and an equal volume of
phenol/chloroform/isoamyl alcohol (1:1:1) was added. The tube was mixed and
centrifuged at 14,000 rpm for 5 minutes at room temperature. The supernatant was
transferred with a wide cut end pipette tip into another new sterile Eppendorf tube. An
equal volume of TE saturated chloroform was added, mixed, and centrifuged in the same
fashion mentioned above. The supernatant was then transferred to a new microfuge tube.
The DNA was precipitated by adding 40 µl of 3M sodium acetate at pH 7 and two
volumes of ice-cold 100% ethanol. It was mixed and set on ice for 30 minutes. Next, the
tube was centrifuged at 14,000 rpm for 10 minutes under refrigeration. The supernatant
was carefully removed and then the tube was filled halfway with 70% ethanol. The tube
was mixed and centrifuged at 14,000 rpm for 10 minutes at 10°C. The supernatant was
carefully decanted and the pellet was placed under vacuum until the ethanol had completely
evaporated. Once dried, the DNA pellet was suspended in 50 µl sterile pure-E H2O and
stored at -20°C.
Electrophoresis
DNA samples were run on 0.8% agarose gels. Agarose (0.24 g) was weighed
and placed into a 50 mL flask. 30 mL of 1X TAE was added and the flask was
heated in a double boiler until the agarose had dissolved. The flask was then left at
room temperature to cool until the agarose could be poured into the gel tray with the comb in
place. Once the gel solidified, the comb and tray barriers were removed and the gel box
was filled with 1X TAE. 2 µl of loading dye was added to 2 µl of PCR product and the
volume was raised to 15 µl with sterile pure-E H2O. Once the lanes were loaded, the gel
was run at 75 volts for 1 hour.
Pulse Field Electrophoresis
Restriction-digested DNA samples were separated by pulsed-field electrophoresis
using 1% agarose gels. Agarose (0.3 g) was weighed and placed into a 50 mL flask. 30
mL of 1X TAE was added and the flask was heated in a double boiler until the
agarose had dissolved. The flask was then left at room temperature to cool until the
agarose could be poured into the gel tray with the comb in place. The module was placed into
an ice water bath to keep the system cool for the duration of the run. Once the gel
solidified, the comb and tray barriers were removed and the gel box was filled with 1X
TAE. 2 µl of loading dye was added to 2-4 µl of digested DNA and the volume was
raised to 10 µl with sterile pure-E H2O. Once the lanes were loaded, the gel was run
at 50 volts with a forward ramp of 2 seconds and a reverse ramp
of 1 second to separate bands of 20 kb or greater.
Visualization of DNA
Bands in an agarose gel were visualized by soaking the gel in 0.5 µg/ml
ethidium bromide for 20 minutes. The gel was placed on a UV light box for viewing
and photographed using an OLYMPUS™ digital camera.
If bands were faint, SYBR Gold™ staining was used as an alternative. 5 µl of SYBR
Gold™ was diluted into 50 mL of 1X TAE in the gel box. The agarose gel was then
stained for 30 minutes and viewed on a Dark Reader™.
DNA fragment size
DNA fragment size was determined by comparing unknown bands with an
O’GeneRuler™ DNA 100-10,000 bp ladder from Thermo Scientific. Migration distances
were measured in millimeters from the bottom of the well to the bottom of each DNA band.
The ladder bands were plotted on semi-log paper, and the sizes of the unknown bands were
read from the best-fit line.
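The semi-log sizing step can also be done numerically. The sketch below fits log10(size) against migration distance by least squares and interpolates an unknown band. It is an illustration of the graphical method rather than the procedure actually used, and the ladder readings are hypothetical values, not measurements from this study.

```python
import math

def fit_semilog(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    n = len(distances_mm)
    logs = [math.log10(s) for s in sizes_bp]
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_size(distance_mm, slope, intercept):
    """Size in bp predicted by the fitted line for a band at distance_mm."""
    return 10 ** (slope * distance_mm + intercept)

# Hypothetical ladder readings: distance from the well (mm) and band size (bp).
ladder_mm = [10.0, 20.0, 30.0, 40.0]
ladder_bp = [10000, 3000, 1000, 300]
slope, intercept = fit_semilog(ladder_mm, ladder_bp)
print(round(estimate_size(25.0, slope, intercept)))
```

The linear relationship only holds over part of the gel, so bands near the well (large fragments) are best sized with a ladder that brackets them.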
DNA extraction from gel
A 0.8% low melting temperature agarose gel was prepared in order to extract DNA
bands from the gel. The lanes were loaded with the desired samples and run at 75 volts
for 1 hour. The gel was then stained with ethidium bromide and visualized under UV
light. The desired bands were cut out of the gel and placed into a sterile
Eppendorf tube. About four volumes of TE buffer were added to the Eppendorf
tube, which was heated at 65°C in order to melt the gel. Phenol-chloroform extraction was
performed and the DNA was precipitated by adding 100 µL 5M LiCl and 500 µL ice-cold 100%
ethanol. After suspending the pellet in pure-E H2O, DNA concentration was measured
with the Nanodrop ND-1000™ spectrophotometer.
DNA gel extraction was also conducted using the Zymoclean™ Large Fragment
DNA Recovery Kit. A 0.8% low melting temperature agarose gel was prepared in order to
extract DNA bands from the gel. The lanes were loaded with the desired samples and run
at 75 volts for 1 hour. The gel was then stained with ethidium bromide and visualized
under UV light. The desired bands were cut out of the gel and placed into a
sterile 1.5 mL microcentrifuge tube. Three volumes of Agarose Dissolving Buffer™ (ADB)
were added to the excised agarose gel slice, which was incubated at 37-55°C for 5-10
minutes until the gel had completely dissolved. The melted agarose solution was then
transferred to a Zymo-Spin™ column in a collection tube. The column was centrifuged for
1 minute at 14,000 rpm and the flow-through was discarded. 200 µl of DNA wash buffer was
added to the column and the column was centrifuged again for 30 seconds. The wash
step was repeated, and 10 µl of DNA elution buffer was then added directly to the column
matrix. After the elution buffer had sat on the matrix for 1 minute, the column
was placed into a 1.5 ml tube and centrifuged for 30 seconds. After centrifugation, DNA
concentration was measured with the Nanodrop ND-1000™ spectrophotometer.
DNA cleaning
The Genomic DNA Clean and Concentrator™ kit from Zymo Research was used. Two
volumes of ChIP DNA Binding Buffer™ were added to each volume of DNA sample in a
1.5 ml microcentrifuge tube. The mixture was then transferred to a Zymo-Spin™ IC-XL
column in a collection tube. The tube was centrifuged for 30 seconds at 14,000 rpm.
The flow-through was discarded and 200 µl of DNA Wash Buffer™ was added to the
column. The tube was centrifuged for 1 minute, and the wash step was repeated. 10-20
µl of DNA elution buffer was added directly to the column matrix and incubated at room
temperature for one minute. The column was transferred to a new microcentrifuge tube
and centrifuged for 30 seconds to elute the DNA. After centrifugation, DNA
concentration was measured with the Nanodrop ND-1000™ spectrophotometer.
Restriction enzyme digest
DNA samples were digested with the appropriate restriction enzymes by adding
4-6 µl of sterile pure-E H2O to 3-4 µl DNA sample. 1 µl of Fermentas Fast Digest buffer
was added to the tube containing 1X concentration of the Fermentas Fast Digest
restriction enzyme. The tube was incubated at 37°C for 30 minutes.
Determination of terminal ends
Cohesive ends were analyzed by digesting the virion DNA with different restriction
enzymes expected to produce fragments carrying H-bonded cohesive ends. Virtual digestion
of the genome sequence in Serial Cloner™ 2.6.1, together with trial and error, guided the
choice of restriction enzymes. After digestion of the DNA samples with the appropriate
restriction enzymes,
the reaction was heated to 65-70°C for 15 minutes and then divided into 2 equal portions
in microfuge tubes. One of the tubes was immediately placed into an ice bath and the
other was slowly cooled to room temperature on the bench top. After cooling, the DNA
samples were run on an agarose gel.
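The virtual digest used to choose enzymes can be approximated with a simple site search. The sketch below is not Serial Cloner's implementation; it is a simplified stand-in that locates a recognition sequence in a linear molecule and reports fragment lengths, treating the start of each site as the cut position (the true cut offset within the site is ignored). The toy sequence is hypothetical.

```python
def cut_positions(sequence, site):
    """0-based start positions of every occurrence of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def fragment_lengths(sequence, site):
    """Fragment lengths of a linear molecule, left to right along the sequence."""
    bounds = [0] + cut_positions(sequence, site) + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Toy sequence with two ClaI sites (ATCGAT); not PG sequence.
toy = "AAAATCGATGGGGGATCGATCC"
print(cut_positions(toy, "ATCGAT"))
print(fragment_lengths(toy, "ATCGAT"))
```

For a linear genome the first and last fragments in the list are the two terminal fragments, which are the ones expected to join when cohesive ends anneal.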
Alkaline phosphatase digestion
DNA samples were treated with alkaline phosphatase in order to provide a cleaner
reading on the High Performance Liquid Chromatography (HPLC) instrument. 2 µl of 10X
Thermo Scientific Fast Digest buffer and 1 µl of FastAP Thermosensitive Alkaline
Phosphatase were added to the DNA samples (20 µl reactions) for digestion. The samples
were incubated at 37°C for 30 minutes to an hour and filtered through a NANOSEP™ 3K
Omega centrifugal filter.
DNA ligation
PG DNA was ligated by adding 2 µl of 5X rapid ligation buffer, 1 µl of T4 DNA ligase,
and PG DNA to a sterile Eppendorf tube, raising the volume to 10 µl with sterile pure-E
H2O, and incubating the mixture for 5 minutes at room temperature. After incubation, the
sample was filtered through a NANOSEP™ 3K Omega centrifugal filter so that it was handled
in the same way as the other DNA samples run through the HPLC.
High Performance Liquid Chromatography (HPLC)
An Agilent™ 1100 series High Performance Liquid Chromatography instrument was
used to detect the presence of overhanging ends by analyzing
nucleosides released from DNA samples after digestion with mung bean nuclease. Two
equal portions of a DNA sample were placed into separate microfuge tubes. One of the
tubes was heated to 65-70°C for 15-30 minutes, while the other was kept at room
temperature. Immediately after heat treatment, both tubes were digested with mung bean
nuclease at 37°C for 30 minutes to 1 hour in order to digest single-stranded ends until the
DNA ends were blunt. After incubation, the DNA samples were raised to 40 µl by
adding sterile pure-E H2O and then filtered through a NANOSEP™ 3K Omega filter.
The samples were then treated with alkaline phosphatase and incubated at 37°C for 30
minutes to an hour. After alkaline phosphatase treatment, the DNA samples were again
filtered through a NANOSEP™ 3K Omega filter.
Prior to turning on the HPLC, all solvents were filtered through a 0.45 µm
cellulose membrane filter and placed in the appropriate solvent reservoirs. After the
DNA samples had been prepared for HPLC analysis, the pump, injector, column,
detector, and computer were turned on. Once the HPLC was connected to the computer, the
instrument 1 online program tab and the purge valve were both opened in order to change
configurations on the machine. The pump was set to 5 ml/minute for 6 minutes for each
solvent in the order of 70% methanol, 12% methanol + triethylamine phosphate (pH 5.1),
and then 12% methanol. After purging with all three solvents, the pump was switched to
1 ml/minute on 12% methanol and the purge valve was closed. Once the pressure had
increased and the baseline was constant, the solvent was switched to 12% methanol +
triethylamine phosphate. The thermostat and detector were then turned on and the
temperature was set to 30°C. Once the indicator showed that the instrument was ready,
the DNA samples were injected into the HPLC for analysis.
Nanodrop spectrophotometer
DNA concentration and purity (A260/A280 ratio) were measured using the Nanodrop
ND-1000™ spectrophotometer. Sterile pure-E H2O was used to blank the
spectrophotometer before analyzing DNA samples.
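The quantities reported by the spectrophotometer can be reproduced by hand. The sketch below assumes the common conversion of roughly 50 ng/µl of double-stranded DNA per A260 unit at a 1 cm path length, and it uses the 1.75-1.9 A260/A280 window that this study treats as pure in the Results. The absorbance readings are hypothetical.

```python
def dsdna_ng_per_ul(a260, path_cm=1.0):
    """Approximate dsDNA concentration, assuming 1 A260 unit ~ 50 ng/µl at 1 cm."""
    return a260 * 50.0 / path_cm

def purity_ratio(a260, a280):
    """A260/A280 ratio used as the purity indicator."""
    return a260 / a280

def looks_pure(ratio, low=1.75, high=1.9):
    """True when the ratio falls inside the window treated as pure in this study."""
    return low <= ratio <= high

# Hypothetical blank-corrected readings.
a260, a280 = 0.50, 0.27
print(dsdna_ng_per_ul(a260), round(purity_ratio(a260, a280), 2))
```

As the gel results later show, a ratio inside this window does not rule out contaminants that do not absorb strongly at 260 or 280 nm.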
Sequence retrieval
The terminase large subunit nucleotide sequences from bacteriophages and
archaeaphages were obtained from the NCBI GenBank database
(http://www.ncbi.nlm.nih.gov/) and were imported into the San Diego Supercomputer
Center Biology WorkBench (http://workbench.sdsc.edu/) for further analysis.
Multiple Sequence Alignment (MSA)
Multiple sequence alignments were performed in Biology Workbench v.3.2.
Before conducting the alignment, some sequences were converted to their reverse
complement to obtain the correct orientation. Alignment between phages is
difficult because phage genes are subject to both vertical and horizontal
transmission. Several attempts were made to align 20 phage sequences with various
gap penalties and extensions; however, almost no conserved
regions were recovered when using ClustalW in Biology Workbench. In order
to achieve a reasonable alignment, the sequences were split into
two separate multiple sequence alignments (MSAs): one for the family Siphoviridae and
one for Podoviridae. Using high gap penalties and gap extensions forced conserved regions
to align, which led to fewer gaps. The parameters used for alignment are listed below:
Parameters for Siphoviridae:
Matrix: IUB/BESTFIT
Gap penalty: 90
Gap extension: 8
Parameters for Podoviridae:
Matrix: IUB/BESTFIT
Gap penalty: 90
Gap extension: 10
Translation of nucleotide sequences to amino acid sequences
The multiple sequence alignments were then imported into BioEdit, which
allowed the sequences to be converted into amino acid sequences. Toggling from
nucleotide to amino acid sequences identified unwanted X’s and assisted in correcting for
appropriate gaps within the alignment.
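The two sequence manipulations described in this and the preceding section, reverse complementing and toggling to amino acids, can be sketched as follows. This is not how Biology Workbench or BioEdit implement them; it is a minimal stand-in that assumes the standard genetic code and reports any codon containing a gap or ambiguous base as X, which is the kind of residue the toggling step was used to find.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate in frame 1; codons with gaps or ambiguous bases become X."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

print(reverse_complement("ATGC"))
print(translate("ATGGCCTAA"))
print(translate("ATG-CCGGA"))
```

An X in the translated alignment usually means a gap was placed inside a codon, so shifting the gap to a codon boundary removes it.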
Distance matrices
Different distance matrices were generated using PAUP4.0 and MEGA4 in order
to measure evolutionary and genetic distances between species of interest that have
diverged from a common ancestor. The matrices created using MEGA were pairwise distance, Kimura 2-parameter, Jukes-Cantor (Nei-Gojobori) synonymous and non-synonymous, and Tamura-Nei. The distance matrices created using PAUP were pairwise distance, Kimura 2-parameter, Kimura 3-parameter, Jukes-Cantor, absolute distance, and Tajima-Nei (Kimura, 1980; Jukes, T.H., & Cantor, C.R., 1969; Tajima, F., & Nei, M., 1984).
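Two of the corrections named above have simple closed forms, which the sketch below evaluates for a pair of aligned sequences: the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) and the Kimura 2-parameter distance d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where p is the proportion of differing sites, P the proportion of transitions, and Q the proportion of transversions. This is an illustration of the formulas, not the MEGA or PAUP code, and the sequences are toy examples.

```python
import math

def transition_transversion(seq1, seq2):
    """Proportions of transitions (P) and transversions (Q) over ungapped sites."""
    purines = {"A", "G"}
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a != b:
            if (a in purines) == (b in purines):
                transitions += 1
            else:
                transversions += 1
    return transitions / sites, transversions / sites

def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(P, Q):
    """Kimura 2-parameter distance from transition and transversion proportions."""
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

# Toy aligned pair: one transition (A/G) and one transversion (C/A) in 10 sites.
P, Q = transition_transversion("ACGTACGTAC", "ACGTACGTGA")
print(P, Q, jukes_cantor(P + Q), kimura_2p(P, Q))
```

Both corrected distances exceed the raw proportion of differences because they account for multiple substitutions at the same site.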
Modeltest
Modeltest (Posada, D., and Crandall, K. A., 1998) is a program used together with PAUP4.0
(Swofford, D. L., 2002) to compare the likelihood scores of 56 different
models. Modeltest identifies the best model using hierarchical likelihood
ratio tests (hLRTs) and the Akaike Information Criterion (AIC). The model with the lowest
AIC value is the best-fit model. Modeltest was performed for the Podoviridae set
of sequences and for the Siphoviridae set. The best model determined for both Podoviridae
and Siphoviridae was the general time reversible model with gamma rate distribution
(GTR+G), with AIC = 11099.8994, -lnL = 5540.9497, and K = 9 for Podoviridae and
AIC = 24658.3223, -lnL = 12320.1611, and K = 9 for Siphoviridae. The optimal values
derived from Modeltest were then applied in the phylogenetic analysis to construct the
Maximum Likelihood trees and to perform bootstrap analysis.
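The reported AIC values follow from the standard definition AIC = 2(-lnL) + 2K, where K is the number of free parameters. The short check below recomputes them from the -lnL and K values given above; the function name is only illustrative.

```python
def aic(neg_ln_l, k):
    """Akaike Information Criterion: AIC = 2 * (-lnL) + 2 * K."""
    return 2.0 * neg_ln_l + 2.0 * k

podoviridae_aic = aic(5540.9497, 9)
siphoviridae_aic = aic(12320.1611, 9)
print(round(podoviridae_aic, 4), round(siphoviridae_aic, 4))
```

Both recomputed values agree with the reported ones to within rounding of the last decimal place.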
Phylogenetic analysis
Parsimony analysis provides the simplest technique using a non-model based
algorithm to develop trees with very few assumptions. The parameters used were set to
default and all characters were weighted equally. The branches were also set to collapse
if the maximum length is zero. In addition, character-state optimization was set to
accelerated transformation and allowed assignment of states not observed in terminal taxa
to internal nodes. Those selected can be recognized as potential short cuts by the “3+1”
test.
Bayesian analysis was used to create phylogenetic trees from the sequence data together
with prior information. To conduct Bayesian analysis, the PAUP files and MrBayes
blocks were combined and executed using MrBayes. The program was set to run one
million generations in order to produce 10,000 trees. Three different nucleotide sequence
sets were analyzed: one set for Podoviridae, another set for Siphoviridae, and a third
set combining both families of sequences.
Maximum Likelihood analysis searched for the tree with the highest likelihood under the
chosen substitution model and therefore required the information from Modeltest in order
to create optimal phylogenetic trees. According to Modeltest, the best-fit model selected
for all three sets was GTR+G.
Bootstrap analysis
Bootstrap analysis was used to assess the reliability of a phylogenetic tree by
randomly resampling the columns of the MSAs with replacement and testing whether the same
tree is recovered from the resampled data.
Podoviridae and Siphoviridae sequences were analyzed with 1000 replicates using the
Modeltest results and Maximum Likelihood trees.
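The resampling step at the heart of the bootstrap can be sketched as follows. This only generates pseudo-replicate alignments by drawing columns with replacement; building a tree from each replicate and counting how often each split recurs is left to the phylogenetics software. The taxon names and sequences are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to make one pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

# Toy alignment with hypothetical taxa; all sequences have the same length.
toy = {"phageA": "ATGCATGC", "phageB": "ATGAATGC", "phageC": "TTGCATCC"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

Because the same column indices are applied to every sequence, each replicate keeps the taxa aligned to one another while changing which sites are represented.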
Time of divergence
Divergence time tables were calculated using the Jukes-Cantor non-synonymous
matrix for both Podoviridae and Siphoviridae in MEGA4. To estimate divergence
dates, the equation µ = K/(2t) was rearranged to t = K/(2µ), where µ is the number
of substitutions per site per year, K is the number of substitutions per site between
two species, and t is the time of divergence between the two species. However, no
substitution rate per site per year has been established for bacteriophages; therefore the
bacterial non-synonymous rate of 4.5 × 10⁻⁹ was used.
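The divergence-time equation can be evaluated directly. In the sketch below the rate is the bacterial non-synonymous rate quoted above, while the value of K is a hypothetical distance chosen only to show the arithmetic.

```python
MU = 4.5e-9  # bacterial non-synonymous substitutions per site per year (from the text)

def divergence_time_years(k, mu=MU):
    """Time since divergence, t = K / (2 * mu)."""
    return k / (2.0 * mu)

# Hypothetical non-synonymous distance between two phages.
print(divergence_time_years(0.09))
```

The factor of 2 reflects that substitutions accumulate along both lineages after they split, so a distance of 0.09 corresponds to roughly ten million years under this rate.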
Multiple Sequence Comparison by Log-Expectation (MUSCLE)
Ninety-nine phage TLS amino acid sequences were retrieved from the NCBI GenBank database.
The phages chosen have types of termini that have been studied or experimentally
determined. The PG and ψM2 sequences were added, and the full set was input into MUSCLE
to perform a multiple sequence alignment (http://www.drive5.com/muscle/). MUSCLE has been
shown to be more consistent and more accurate than ClustalW. The algorithm combines
fast distance estimation, progressive alignment using a profile function,
and refinement using tree-dependent restricted partitioning (Edgar, 2004).
FastTree
FastTree infers approximate maximum likelihood phylogenetic trees from large
protein alignments. In addition, the tool uses a heuristic approach to identify better trees
and estimates the rate of evolution at each site. After receiving the results from MUSCLE,
the alignment was analyzed for phylogenetic relationships using the Whelan and Goldman
(WAG) model of amino acid evolution (Whelan, S., & Goldman, N., 2001).
RESULTS
Terminase Large Subunit in PG
The PG genome sequence was analyzed with GeneMark to predict open reading
frames. GeneMark predicted 72 genes, which were searched against the NCBI database using
BLAST. One of the open reading frames matched the Terminase Large Subunit from
Methanobacterium phage ψM2 with 34% identity and an E-value of 2.90E-65. Located at
66613-69198, the presumptive gene is 2,586 bp long and translates to 861 aa. PG’s DNA
replication and packaging strategy is currently unknown; however, if the amino acid
sequence of a bacteriophage’s TLS is known, the packaging strategy can often be predicted by
comparative analysis. Phage TLS sequences often cluster according to the type of terminal
ends they generate after packaging (Casjens and Gilcrease, 2009). This led to an attempt
to conduct evolutionary studies on PG’s TLS.
Table 2. PG TLS location within the genome and blast result
Left end | Right end | Length (bp) | AA | BLAST match | E-value | % Identity | Organism
66613 | 69198 | 2586 | 861 | TLS | 1E-59 | 34% | ψM2
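The length and amino acid count in Table 2 follow from the gene coordinates. The check below assumes inclusive coordinates and that the final codon is a stop codon, which is the usual convention for an annotated open reading frame.

```python
left_end, right_end = 66613, 69198  # coordinates from Table 2

length_bp = right_end - left_end + 1  # inclusive coordinates
codons = length_bp // 3
amino_acids = codons - 1              # assumes the last codon is a stop codon

print(length_bp, codons, amino_acids)  # 2586 862 861
```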
Prediction of PG’s packaging strategy
The amino acid sequence of PG’s TLS was compared with those of phages whose
terminal types are experimentally known. Comparative analysis has shown that
these sequences cluster the phages according to their type of terminal ends. The
additional 100 annotated phage sequences with determined types of terminal ends were
retrieved from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/). MUSCLE
was used to construct the multiple sequence alignment from these 101 amino acid
sequences, and the alignment was input into FastTree to create approximate maximum
likelihood phylogenetic trees. FastTree is more accurate than distance-matrix methods and
uses the Whelan and Goldman (WAG) model of amino acid evolution. The reliability of each
split in the tree is determined by the Shimodaira-Hasegawa test on the three nearest-neighbor
interchanges around that split, resampling 1,000 times. As indicated in Figure 1, the
closer the local support values are to 1, the more reliable the split is.
From the FastTree phylogram, PG appears to share a common ancestor with
phages having short direct terminal repeats (DTR), but may also share common
ancestry with well-known phages creating cohesive ends. Although phage DNA ends
could result from many different replication, cleavage, or packaging mechanisms,
PG’s clustering in the phylogenetic tree guided the next approach: a
directed analysis to identify PG’s possible cos site or direct terminal repeat ends.
Figure 1. 101 phage TLS FastTree phylogram with local support values using amino acid sequences. The
red star indicates the methanophage PG. The yellow star indicates the other methanophage ψM2. The
highlighted areas are color coordinated according to known types of terminal ends created by those phages.
Phage PG production and DNA extraction
Two techniques were used to produce PG and its DNA. One method required
scraping the overlay agar from the phage plates. The second, preferred method
was flooding the phage plates with either pH 6.5 MOPS or citrate buffer
and then collecting the buffer to give a phage sample without agar residues. In both cases,
titers between 10⁹ and 10¹⁰ PFU/mL were obtained. However, the extraction of PG DNA from
MOPS buffer and citrate buffer led to different results in DNA band clarity and
integrity (Figure 2A and 2B). Even though extracted PG DNA gave NanoDrop
spectrophotometer A260/280 readings between 1.75 and 1.9, which is considered pure,
agarose gel electrophoresis identified a problem. DNA extracted from the PG pellet
suspended in MOPS buffer developed a smear when subjected to agarose gel
electrophoresis (Figure 2A). However, DNA extracted from the PG pellet suspended in
citrate buffer appeared as a clean band and was therefore used throughout this study
(Figure 2B).
[Figure 2 gel images, panels A and B, lanes 1-3 each. Size labels: 23130 bp, 10000 bp, 6000 bp, 3000 bp.]
Figure 2. PG DNA extraction. Figure 2A and 2B are both PG DNA samples that were harvested the
same way except A was suspended in MOPS and B was suspended in citrate buffer. Figure 2A lane 1 is
the New England BioLabs 1 kb ladder. Figure 2A lane 2 is Lambda DNA/HindIII Marker and lane 3 is PG
DNA. Figure 2B lane 1 is O’Gene Ruler™ ladder. Figure 2B lane 2 is a 2 µl sample of PG DNA and lane 3 is a
4 µl sample of PG DNA. Both gels are 0.8% agarose.
Determining phage genome ends using restriction mapping
To test PG’s putative TLS, PG’s DNA packaging strategy was analyzed for
cohesive ends, headful packaging, or DTR using restriction enzyme analysis. Each directed
analysis requires specific restriction enzymes that produce
fragments well separated on the gel. Based on the FastTree phylogram in Figure 1 and PG’s
λ-like structure, cohesive end analysis was conducted. This assay required restriction
enzymes that produce bands on an agarose gel with enough separation to detect joining and
separation of the cohesive ends. Because the location of PG’s cos site is unknown, different
restriction enzymes were used to ensure the display of the two end fragments. To
determine the appropriate restriction enzymes, Serial Cloner v2.6 was used to display the
restriction enzyme site usage in the PG genome sequence. The restriction
enzymes selected for this study are listed in Table 3.
Table 3. Restriction enzymes and site usage in PG from Serial Cloner v2.6
Restriction enzyme | Number of sites
BstEII | 3
ClaI | 9
EcoRV | 10
HindIII | 15
SbfI | 3
XhoI | 3
Tsp509 | 933
Cohesive end analysis was performed according to the protocol using BstEII and
XhoI (Figure 3). The samples were heated to 75°C for 15 minutes, divided equally to
obtain rapidly and slowly cooled portions, and run on a 0.8% agarose gel. λ
phage DNA was the positive control used to confirm the activity of the restriction enzymes
(Figure 3, lanes 1 and 2). The resulting smears in the PG samples (lanes 4-7) were initially
attributed to inactive proteinase K failing to eliminate possible DNase
contamination.
[Figure 3 gel image, lanes 1-7. Size labels: 14140 bp, 7242, 6369.]
Figure 3. Cohesive analysis on PG DNA using BstEII and XhoI. Lane 1 is the positive control for λ +
BstEII. Lane 2 is the positive control for λ + XhoI. Lane 3 is O’Gene Ruler ladder. Lane 4 is PG + BstEII
fast cooled and lane 5 is PG + BstEII slow cooled. Lane 6 is PG + XhoI fast cooled and lane 7 is PG + XhoI slow
cooled.
An additional PG DNA extraction was conducted with the same protocol but
included phenylmethylsulfonyl fluoride (PMSF) at a final concentration of 5 mM to
deactivate proteinase K (Figure 4). Two sets of samples were prepared in order to see if
proteinase K was functioning properly. One set included PMSF to intentionally
deactivate proteinase K prior to the introduction of the buffer used with the restriction
enzymes (Figure 4, lanes 1 and 2). The other set did not include PMSF (Figure
4, lanes 4 and 5). The results showed a distinct smear in lanes 2 and 5, which contained the
buffer suspected of contamination, without any restriction enzymes. PMSF was
expected to deactivate proteinase K, so a smear would appear if the buffer sample was
contaminated with a nuclease. Lanes 4 and 5 had no PMSF added to
deactivate proteinase K. If proteinase K was active, it would be expected to
eliminate any nucleases and give a clean band of PG DNA. However,
lane 4 displays a slight smear, indicating that the PG DNA could
contain a nuclease contaminant that is not deactivated by proteinase K. Lane 5 includes
the restriction enzyme buffer, which could have enhanced the activity of the
contaminant. These results show that proteinase K was inactive against the nuclease
activity, given the smears in lanes 2, 4, and 5, and that the contaminant is highly
active in the presence of the restriction enzyme buffer.
[Figure 4 gel image, lanes 1-5. Size labels: 10000 bp, 3000.]
Figure 4. Identifying source of contaminant in PG DNA treated with PMSF and introduction of
restriction enzyme buffer. Lane 1 is PG DNA with PMSF. Lane 2 is PG with PMSF + restriction
enzyme buffer. Lane 3 is O’Gene Ruler ladder. Lane 4 is PG with no PMSF. Lane 5 is PG with no PMSF
+ buffer.
The results in Figure 4 indicate that the proteinase K treatment included in the DNA
purification was not able to eliminate the nuclease contaminant. Therefore,
in order to determine whether the proteinase K was inactive or simply unable to remove the
contaminant, a new stock of proteinase K (20 mg/mL) was obtained from Promega. The old and
new proteinase K were each set up with the same restriction digest of λ DNA in order to test
their ability to inhibit BstEII. Each sample was exposed to the old or new
proteinase K before the restriction enzyme and buffer were added. This order was chosen to
minimize the activity of the restriction enzyme and thereby examine the efficiency of the
old and new proteinase K (Figure 5). Lane 1 indicates that the old
proteinase K did not inhibit the activity of BstEII on λ DNA, whereas the new proteinase K
was able to hinder BstEII’s reaction (lane 2). Lane 3 was the control, containing
λ DNA and BstEII with no proteinase K.
[Figure 5 gel image, lanes 1-3.]
Figure 5. Identifying efficiency of proteinase K. Lane 1 is λ DNA with the old proteinase K. Lane 2 is
λ DNA with the new proteinase K. Lane 3 is the control with λ DNA + BstEII.
All DNA extraction procedures and already extracted samples of PG DNA were
treated with the new proteinase K, which was then deactivated with PMSF, in order to
eliminate any possibility of nuclease contamination. After the proteinase K treatment, PG DNA
samples were separated on a 0.8% agarose gel to check for nuclease
activity (Figure 6). As expected, the smearing seen in Figure 4, lane 2 was no longer
present. Lane 3 was a previously extracted PG sample that was treated with the new
proteinase K after extraction and inactivated using PMSF. Lane 4 is the same PG sample as
lane 3 but run in the presence of the same restriction enzyme buffer. The absence of
smearing indicates that the new proteinase K resolved the issue seen in Figure 4;
however, lane 6 revealed an additional issue. Lane 6 contained the same PG DNA
extraction as lanes 1 and 2, treated with BstEII. The lane displayed a smear
with no distinct bands. This result indicates that the proteinase K eliminated the nuclease
that was activated by the restriction enzyme buffer, but points to another issue with
restriction enzyme specificity. BstEII was found to be past its shelf life, which
could have resulted in star activity.
[Figure 6 gel image, lanes 1-6. Size labels: 10000 bp, 3000.]
Figure 6. PG treated with new proteinase K and restriction digest. Lane 1 is PG DNA extraction with
the new proteinase K. Lane 2 is PG DNA proteinase K treated with restriction enzyme buffer. Lane 3 is
already extracted PG DNA treated with new proteinase K. Lane 4 is already extracted PG DNA treated
with new proteinase K and run with restriction enzyme buffer. Lane 5 is O’Gene Ruler ladder. Lane 6 is
PG DNA sample treated with proteinase K + BstEII.
Samples of PG DNA were digested with the new fast restriction enzymes ClaI, EcoRV, and
Tsp509. The samples were all extracted according to the protocol and were all
from the same phage production harvest. When extracted and run on a 0.8% agarose gel,
the PG DNA gave a clear, distinct band. However, when PG DNA was incubated with
the three restriction enzymes, no digestion was observed, although the enzymes cut λ DNA
(Figure 7). This ruled out star activity from the old BstEII restriction
enzyme as the explanation, but identified an additional inhibitor associated with PG
(lane 1). When PG DNA was mixed with λ DNA, ClaI was not able to cut λ or PG (lane 2),
although it cut λ effectively by itself (lane 5).
[Figure 7 gel image, lanes 1-8. Size labels: 11385 bp, 10496, 10000 bp, 3000.]
Figure 7. Inhibitor associated with PG when treated with restriction enzymes. Lane 1 is PG DNA.
Lane 2 is PG + λ + ClaI. Lane 3 is PG + EcoRV. Lane 4 is λ + EcoRV. Lane 5 is λ + ClaI. Lane 6 is
O’Gene Ruler ladder. Lane 7 is λ + Tsp509. Lane 8 is PG + Tsp509.
PG was compared with a single-stranded DNA phage, M13. Because of PG’s
unusually AT-rich genome, the possibility that PG DNA becomes single-stranded
during extraction or handling was examined (Figure 8). M13 is a circular single-stranded
DNA phage of about 6.4 kb. As a positive control, M13 was digested with
mung bean nuclease, which is specific for single-stranded DNA and RNA (Figure 8A, lane
4; Figure 8B, lane 4). Mung bean nuclease also degrades single-stranded extensions from
DNA and RNA, leaving ligatable blunt ends. In Figure 7, PG carried an inhibitor that
did not allow ClaI to cut λ when the two were mixed together. As expected, PG was not
affected by mung bean nuclease (Figure 8A, lane 2); however, the inhibitor associated with
PG did not prevent mung bean nuclease from digesting M13 when mixed together (Figure 8B,
lane 3). This indicates that the contaminant associated with PG inhibits endonuclease
activity against double-stranded DNA.
[Figure 8 gel images, panel A (lanes 1-4) and panel B (lanes 1-5). Size labels: 10000 bp, 6407, 6000, 3000.]
Figure 8. PG and M13 treated with mung bean nuclease. Figure 8A lane 1 and 8B lane 5 are O’Gene Ruler
ladder. Figure 8A lane 2 is PG + mung bean. Figure 8A lane 3 is M13. Figure 8A lane 4 and 8B lane 4 are
M13 + mung bean. 8B lane 1 is PG DNA. 8B lane 2 is PG + M13. 8B lane 3 is PG + M13 + mung bean.
PG DNA clean and concentrate
The results above demonstrate a source of contamination in the PG preparations that
could be associated with the genome or be a soluble inhibitor. Because of these
complications, PG DNA samples were run through an additional cleaning step using the
Zymo Research Genomic DNA Clean and Concentrator™ kit. The cleaned PG samples were
again assessed on a NanoDrop spectrophotometer to ensure an A260/280 of 1.75-1.9. The
DNA concentration decreased after the samples were run through the filters, but PG no
longer showed inhibition when mixed with λ and cut with ClaI (Figure 9, lane 5) or when
digested with DNAse (Figure 9, lane 7).
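The purity criterion can be written as a simple ratio check. The following is a minimal sketch: the function names and the example absorbance readings are hypothetical, and only the 1.75-1.9 acceptance window comes from the text.

```python
def a260_280_ratio(a260, a280):
    """Return the A260/A280 ratio from two absorbance readings."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity_window(a260, a280, low=1.75, high=1.9):
    """True if the ratio falls inside the acceptance window used in the text."""
    return low <= a260_280_ratio(a260, a280) <= high


# Hypothetical readings, for illustration only (not measured values).
print(passes_purity_window(0.92, 0.50))  # ratio 1.84
print(passes_purity_window(0.80, 0.50))  # ratio 1.60
```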
[Gel image, lanes 1-7. Size markers labeled: 11385 and 10000 bp.]
Figure 9. Effective digestions after Zymo kit cleaned PG. Lane 1 is λ + DNAse. Lane 2 is λ + ClaI.
Lane 3 is O’Gene Ruler ladder. Lane 4 is pure λ DNA. Lane 5 is a mixture of λ + PG digested with ClaI.
Lane 6 is cleaned sample of PG. Lane 7 is the same cleaned sample PG + DNAse.
Cohesive end analysis
The best-studied phages with cohesive-ended chromosomes have complementary
single-stranded overhangs that anneal together upon injection into the host. The host
DNA ligase then seals the ends, generating a circular template for rolling circle DNA
replication.
λ served as the positive control for cohesive ends. The λ cos site has a 12-base
single-stranded overhang, and the annealed ends separate when heated at 65-70°C for 5
minutes. After heating, the samples were cooled by two different procedures. One sample
was cooled in an ice water bath immediately after heating (Figure 10A, lane 1; 10B,
lane 1). The other sample was cooled slowly to room temperature (Figure 10A, lane 2;
10B, lane 2). The positive control demonstrates that the separated sticky ends anneal
back together when the samples are cooled slowly.
[Gel images, panels A and B, lanes 1-3. Band sizes labeled: 27491, 23130, 5765, 4361, 3326, 2676, and 650 bp.]
Figure 10. Cohesive end analysis on λ. Figure 10A Lane 1 is λ + EcoRV heated to 65°C and rapidly
cooled in ice water bath. Figure 10A lane 2 is λ + EcoRV heated to 65°C and slowly cooled to room
temperature. Figure 10A Lane 3 is O’Gene Ruler ladder. Figure 10B lane 1 is λ + HindIII heated to 65°C
and rapidly cooled in ice water bath. Figure 10B Lane 2 is λ + HindIII heated to 65°C and slowly cooled to
room temperature. Figure 10B Lane 3 is O’Gene Ruler ladder.
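The band shift in the λ control follows from simple fragment arithmetic: when the cohesive ends are annealed, the two terminal restriction fragments are held together and migrate as one band of their combined length. The sketch below illustrates this under stated assumptions; the genome length and cut positions are hypothetical and are not the real λ restriction map.

```python
def linear_fragments(genome_length, cut_sites):
    """Fragment lengths from cutting a linear molecule at the given positions."""
    bounds = [0] + sorted(cut_sites) + [genome_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def annealed_fragments(genome_length, cut_sites):
    """Fragment lengths when the cohesive ends are annealed.

    The left and right terminal fragments are joined by the annealed ends,
    so they appear as a single band of their combined length.
    """
    frags = linear_fragments(genome_length, cut_sites)
    if len(frags) < 2:
        return frags
    return frags[1:-1] + [frags[0] + frags[-1]]


# Hypothetical 50,000 bp genome with three cut sites (illustration only).
sites = [10000, 25000, 46000]
print(sorted(linear_fragments(50000, sites), reverse=True))
print(sorted(annealed_fragments(50000, sites), reverse=True))
```

In this toy case the rapidly cooled sample would show the two terminal fragments separately, while the slowly cooled sample would show one larger joined fragment in their place.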
PG was tested under the same parameters for identifying cohesive ends and was cut
with EcoRV and ClaI (Figure 11). Figure 11A, lane 1 shows PG digested with EcoRV,
heated to 65°C for 5 minutes, and slowly cooled to room temperature. The same
parameters were used in 11B lane 1, but the sample was digested with ClaI. Lane 2
contains samples of PG digested with EcoRV (Figure 11A) and ClaI (Figure 11B) that were
cooled rapidly in an ice water bath immediately after heating. The PG sample digested
with EcoRV displayed cleaner cuts and was used to test different temperatures for
splitting the presumptive cos site (Figure 12). The direct cohesive end analysis did not
display any clear separation or annealing of the presumptive cos site in PG. However,
the absence of visible separation is a negative result and does not by itself determine
whether PG has cohesive ends.
[Gel images, panels A and B, lanes 1-3.]
Figure 11. Cohesive end analysis on PG. Figure 11A lane 1 is PG + EcoRV heated to 65°C and slowly
cooled to room temperature. Figure 11A Lane 3 is PG + EcoRV heated to 65°C and rapidly cooled in an
ice water bath. Figure 11A Lane 2 is O’Gene Ruler ladder. Figure 11B lane 1 is PG + ClaI heated to 65°C
and slowly cooled to room temperature. Figure 11B Lane 2 is PG + ClaI heated to 65°C and rapidly cooled
in an ice water bath. Figure 11B Lane 3 is O’Gene Ruler ladder.
[Gel image, lanes 1-3.]
Figure 12. PG digested with EcoRV and heated at different temperatures. Figure 12 lane 1 is
O’Gene Ruler ladder. Lane 2 is PG + EcoRV heated to 55°C for 5 minutes and rapidly cooled in an ice
water bath. Lane 3 is PG + EcoRV heated to 75°C for 5 minutes and rapidly cooled in an ice water bath.
High performance liquid chromatography (HPLC)
The complexities of working with PG limit which endonucleases can be used to
digest its genome properly. In addition to requiring extra purification, PG's 70 kb
genome lacks restriction sites that would give few enough bands on a gel to reveal the
annealing and separation of a possible cos site. High performance liquid
chromatography (HPLC), which separates and identifies the compounds in a liquid
sample, was therefore used as an alternative. In this case, HPLC can identify
individual nucleosides released from PG if the genome has cohesive ends. Digesting PG
with mung bean nuclease, which trims single-stranded ends to blunt ends, releases
nucleotides that are separated from one another through the column and measured by a UV
absorbance detector at 254 nm. As the separated nucleosides exit the column, they are
recorded on a chromatogram and identified by their retention times.
HPLC standards were prepared by treating each nucleotide (dATP, dGTP, dCTP, dTTP)
with alkaline phosphatase to remove the phosphate groups and then filtering through a
NANOSEP™ 3K Omega filter (Figure 13). The NANOSEP™ 3K Omega filter was essential for
separating any released nucleosides from the remaining double-stranded genome of PG
and λ. The nucleoside standards were filtered in the same way to keep the procedure
consistent with that used on the PG and λ samples. On the chromatogram, each standard
produced one distinct peak for the indicated nucleoside. The X-axis is time (minutes)
and the Y-axis is milli-absorbance units from UV detection.
[Chromatograms of the nucleoside standards, panels labeled dA, dG, dC, and dT.]
Figure 13. Nucleoside standards run on HPLC. Each nucleoside was set as a standard to
determine its retention time. Each dNTP was run individually, treated with alkaline phosphatase, and
filtered through a 0.5 µm NANOSEP™ 3K Omega filter. dA was measured at 8.5 minutes, dG at 4.4
minutes, dC at 3.0 minutes, and dT at 5.3 minutes.
In addition to the nucleoside standards, further controls were run to identify any
peaks whose retention times should be noted when analyzing PG and λ. Pure-E H2O was
used to suspend PG after DNA extraction and was therefore run through the HPLC as a
standard (Figure 14). Pure-E H2O was run through the same 0.5 µm Millex-HV filter and
gave a single significant peak.
Figure 14. Pure-E H2O standard run on HPLC. Pure-E H2O was filtered through the same
NANOSEP™ 3K Omega filter and injected into the HPLC. Pure-E H2O gave a significant peak at
2.46 minutes.
Alkaline phosphatase was used to treat all samples in order to remove the
phosphate groups from nucleotides. The alkaline phosphatase enzyme and buffer were
added to the control at the same volumes used with the DNA samples (Figure 15). The
control sample had a total volume of 40µl, which included 4µl alkaline phosphatase
buffer, 2µl alkaline phosphatase enzyme, and pure-E H2O for the remainder. Alkaline
phosphatase displayed three distinct peaks, at 2.51 minutes, 3.4 minutes, and 11.48
minutes. The first peak could be the same peak seen for pure-E H2O.
Figure 15. Alkaline phosphatase run on HPLC. Alkaline phosphatase + buffer was filtered through the
same NANOSEP™ 3K Omega filter and injected into the HPLC. Three significant peaks were noted at
2.51, 3.4, and 11.48 minutes.
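Reading a sample chromatogram amounts to matching each observed peak against the retention times of the standards and controls. The sketch below shows one way to do this by nearest match. The reference times are those reported in Figures 13-15; the 0.15 minute tolerance and the function name are assumptions made for illustration, not values from the procedure.

```python
# Retention times (minutes) reported for the standards and controls.
REFERENCE_PEAKS = {
    "dC": 3.0,
    "dG": 4.4,
    "dT": 5.3,
    "dA": 8.5,
    "pure-E H2O": 2.46,
    "alkaline phosphatase peak 1": 2.51,
    "alkaline phosphatase peak 2": 3.4,
    "alkaline phosphatase peak 3": 11.48,
}


def assign_peak(retention_time, tolerance=0.15):
    """Return the closest reference label, or None if nothing is within tolerance.

    The 0.15 minute tolerance is an assumed value, not one given in the text.
    """
    label, ref_time = min(
        REFERENCE_PEAKS.items(), key=lambda item: abs(item[1] - retention_time)
    )
    if abs(ref_time - retention_time) <= tolerance:
        return label
    return None


# Peaks reported for heated λ DNA without mung bean nuclease.
for rt in (2.53, 3.37, 11.49):
    print(rt, assign_peak(rt))
```

With these inputs every peak is assigned to an alkaline phosphatase control peak rather than to a nucleoside, which is the pattern expected for a sample that released no nucleosides.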
λ phage DNA was used as the positive control for identifying nucleotides from λ’s
overhanging ends. 0.3µg of λ DNA was brought to a total volume of 40µl. λ was heated
at 65-70°C for 5 minutes to separate the cos site but was not treated with mung bean
nuclease. After heat treatment, the sample was still treated with alkaline phosphatase
and filtered by the same procedure in order to identify any nucleosides that may have
fragmented off the overhanging ends (Figure 16). As expected, the chromatogram shows
no nucleosides in the absence of mung bean nuclease. The peaks in the λ chromatogram
resemble those in the alkaline phosphatase chromatogram in Figure 15.
Figure 16. λ DNA heated with no mung bean and run on HPLC. λ was heated to 65-70°C and
incubated with alkaline phosphatase for 1 hour. After incubation, the sample was filtered through the same
NANOSEP™ 3K Omega filter and injected into the HPLC. Three significant peaks were displayed, at
2.53, 3.37, and 11.49 minutes.
An additional λ DNA sample was heated at 65-70°C for 5 minutes and then
immediately digested with mung bean nuclease for 1 hour. The sample was then filtered,
and the digested nucleotides were treated with alkaline phosphatase for an additional
hour. After dephosphorylation, the sample was filtered again and run on the HPLC
(Figure 16). The chromatogram shows significant peaks at the retention times identified
for each standard nucleoside in Figure 13, along with the peaks seen in the alkaline
phosphatase chromatogram.
[Chromatogram with peaks labeled dC, dG, dT, and dA.]
Figure 16. λ DNA heated, digested with mung bean, and run on HPLC. λ was heated to 65-70°C for 5
minutes and digested with mung bean. After filtering, digested nucleotides were incubated with alkaline
phosphatase for 1 hour. After incubation, the sample was injected into the HPLC. Significant peaks
identified nucleosides dC, dG, dT, and dA at the indicated retention times.
PG was treated under the same parameters as λ when analyzing for cohesive ends.
0.6µg of PG DNA was heated at 65-70°C and suspended in pure-E H2O to a total
volume of 40µl. The sample was then filtered through the same NANOSEP™ 3K
Omega filter. Because PG is AT-rich, this procedure was used to check the integrity of
the genome after handling and filtering. The entire sample was then injected into the
HPLC and gave the expected chromatogram, showing pure-E H2O with no nucleotides
(Figure 17).
Figure 17. Heated PG DNA suspended in pure-E H2O, filtered, and run on HPLC. PG was suspended
in pure-E H2O for a total 40µl volume and was filtered through the same NANOSEP™ 3K Omega filter.
The PG sample was then injected into the HPLC and gave a significant peak at 2.46 minutes,
identifying H2O.
To determine whether PG re-circularizes after DNA extraction, PG was digested
with mung bean nuclease for 1 hour without heat treatment (Figure 18). After 1 hour,
the sample was filtered and incubated with alkaline phosphatase for another hour.
After dephosphorylation, the sample was injected into the HPLC and gave two peaks
corresponding to the nucleosides dT and dA. The remaining trace peaks are residual
peaks from alkaline phosphatase and its buffer.
[Chromatogram with peaks labeled dT and dA.]
Figure 18. Non-heated PG DNA, digested with mung bean, and run on HPLC. PG was digested with
mung bean for 1 hour and filtered through the same NANOSEP™ 3K Omega filter. After filtering,
digested nucleotides were incubated with alkaline phosphatase for 1 hour. The PG sample was then injected
into the HPLC and gave two significant peaks indicating dT and dA.
The results in Figure 18 imply that mung bean nuclease digests small amounts of
single-stranded ends, releasing dT and dA. To test this result, PG was treated under
the same parameters as in Figure 18 but was heated prior to digestion (Figure 19). PG
was placed in a water bath at 65-70°C for 5 minutes. Immediately afterward, PG was
digested with mung bean nuclease, filtered, and dephosphorylated with alkaline
phosphatase. The sample was injected into the HPLC to identify any changes in peaks
compared to Figure 18. The chromatogram showed all four nucleosides, at a significantly
higher absorbance than for λ and for non-heated PG in Figure 18.
[Chromatogram with peaks labeled dC, dG, dT, and dA.]
Figure 19. Heated PG DNA, digested with mung bean, and run on HPLC. PG was digested with
mung bean for 1 hour immediately after being in a water bath at 65-70°C for 5 minutes. After filtering,
digested nucleotides were incubated with alkaline phosphatase for 1 hour. The PG sample was then injected
into the HPLC and gave significant peaks indicating dC, dG, dT, and dA.
The results from heating PG revealed the additional nucleosides dC and dG.
Moreover, the peaks of all nucleosides measured at a significantly higher absorbance.
The presence of nucleosides in the heated PG chromatogram does not, however, determine
whether PG circularizes by means of sticky ends. PG was therefore ligated to ensure
circularization and sealing of all nicks within the genome. Before ligating PG, DNA
ligase and ligase buffer were injected into the HPLC as a control to identify any
retention peaks that might appear in the ligated PG sample (Figure 20). After ligation,
the PG DNA was heated to 65-70°C and digested with mung bean nuclease for 1 hour.
After filtering, alkaline phosphatase was added and the sample was incubated for an
additional hour. The resulting chromatogram of ligated PG DNA showed no released
nucleosides and mirrored the peak from filtered PG DNA in Figure 17 (Figure 21).
Figure 20. DNA ligase and ligase buffer filtered. DNA ligase and ligase buffer were aliquoted into a total
40µl sample of pure-E H2O at the same volumes used for ligating PG DNA. After filtering, the sample was
injected into the HPLC and gave a distinct peak at 3.4 minutes with a high absorbance level.
Figure 21. Ligated PG sample heated and digested with mung bean. Ligated PG was heated to 65-70°C
for 5 minutes and digested with mung bean for 1 hour. Once filtered, the sample was injected into the
HPLC and gave a peak similar to that of PG DNA alone in Figure 17.
Bioinformatics
Sequence Retrieval
A total of 20 sequences of annotated phage Terminase Large Subunit genes were
retrieved from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/). Focusing on
the order Caudovirales, 11 phages were chosen from the family Podoviridae and 9 from
Siphoviridae. All sequences were complete gene sequences, and their lengths are shown
in Table 5.
Multiple Sequence Alignment (MSA)
ClustalW is a program that aligns multiple sequences to show the regions conserved
between them. ClustalW in SDSC Biology WorkBench was therefore used to construct
multiple sequence alignments from the phage sequences. The MSAs constructed with
ClustalW were used for further investigation and can be found in Appendix B for
Podoviridae, Siphoviridae, and all 20 sequences combined.
Table 5. List of TLS genes from the families Podoviridae and Siphoviridae

Siphoviridae
Gene ID | Gene Description | Organism | Accession Number | Gene Length (bp) | Amino Acids | Gene Region (bp)
20088899 | Terminase [Methanosarcina acetivorans C2A] | Methanosarcina acetivorans C2A | NC_003552.1 | 1482 | 493 | 4707089-4708570
1261708 | Terminase Large Subunit | Methanobacterium phage psiM2 | NC_001902 | 1407 | 468 | 4906-6312
1257603 | Putative Large Subunit Terminase | Methanothermobacter phage psiM100 | NC_002628 | 1404 | 467 | 10168-11571
269975307 | Phage Terminase Large Subunit | Staphylococcus phage SA1 | GU169904 | 879 | 292 | 41003-41881
216262944 | Terminase Large Subunit | Enterobacteria phage WV8 | EU877232 | 1602 | 533 | 32001-33602
38707640 | Terminase Large Subunit | Salmonella phage Felix 01 | NC_005282 | 1602 | 533 | 30625-32226
13476623 | Phage Related Terminase | Mesorhizobium loti MAFF303099 | NC_002678 | 1389 | 462 | 6592989-6594377
(not published) | Possible Large Terminase Subunit | Methanobrevibacter phage PG | (not published) | 1622 | 540 | 70165-66613
7256802 | Putative terminase | Erwinia phage phiEa21-4 | NC_011811 | 1605 | 534 | 26606-28210

Podoviridae
Gene ID | Gene Description | Organism | Accession Number | Gene Length (bp) | Amino Acids | Gene Region (bp)
2944239 | Bacteriophage terminase large (ATPase) subunit | Enterobacteria phage P22 | NC_00237 | 1499 | 499 | 203-1702
21914413 | Terminase Large Subunit | Salmonella phage P22-pbi | AF527608 | 1499 | 499 | 203-1702
33334159 | Putative Large Subunit Terminase | Shigella phage Sf6 | AAQ12192 | 1412 | 470 | 420-1832
13517602 | Terminase Large Subunit | Salmonella phage HK620 | AF335538 | 1412 | 470 | 20100-21512
7353018 | Terminase Large Subunit | Salmonella phage epsilon34 | NC_011976 | 1500 | 499 | 467-1966
2777092 | Terminase Large Subunit | Enterobacteria phage ST104 | NC_005841 | 1500 | 499 | 24878-26377
66473858 | Terminase Large Subunit | Salmonella phage SE1 | DQ003260 | 1500 | 499 | 25026-26525
24250810 | Terminase Large Subunit | Salmonella phage ST64T | AY052766 | 1554 | 517 | 24278-25831
8250754 | Terminase Large Subunit | Salmonella phage C341 | NC_013059 | 1500 | 499 | 467-1966
60476784 | Bacteriophage L head assembly gene cluster, partial sequence | Enterobacteria phage L | AY795968 | 1500 | 499 | 1426-2925
7123726 | DNA packaging protein gp2 (Terminase large subunit) | Escherichia fergusonii ATCC 35469 | NC_011740 | 1500 | 499 | 2096748-2098247
After applying the parameters for the best alignment, BioEdit was used to
manually adjust gaps present in the alignments. Each nucleotide or amino acid is
assigned its own color, so aligned positions can be checked by color. BioEdit also
detects motifs and highly conserved regions within the sequences. The nucleotide MSA
sequences were translated to locate any unwanted X’s, which indicate that the sequence
is out of frame. Each X was examined and the gaps were corrected with the appropriate
dashes at the correct locations. For example, for an amino acid sequence of LNA~XLS,
where “~” indicates a gap, the view was toggled back to the nucleotide sequence,
TTAAATGCT~~~--GTTAAGC, to locate the problem. In this case, either one additional gap
would be added or two of the existing gaps would be manually deleted, depending on
which alignment gives the best conserved regions. The MSA of amino acid sequences can
be found in Appendix B.
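The frame problem in this example can be detected mechanically: a run of gap characters in a codon alignment keeps the reading frame only if its length is a multiple of three. The following is a minimal sketch of that check. The function name is hypothetical, and treating both "~" and "-" as gap characters is an assumption based on the example sequence.

```python
import re


def out_of_frame_gap_runs(aligned_nt):
    """Find gap runs whose length is not a multiple of three.

    Returns a list of (start_index, run_length, gaps_to_add, gaps_to_delete),
    where either adding or deleting the stated number of gaps restores frame.
    """
    problems = []
    for match in re.finditer(r"[~-]+", aligned_nt):
        length = len(match.group())
        remainder = length % 3
        if remainder:
            problems.append((match.start(), length, 3 - remainder, remainder))
    return problems


print(out_of_frame_gap_runs("TTAAATGCT~~~--GTTAAGC"))
```

For the example sequence, the single gap run of five characters is reported with the two possible repairs described in the text: add one gap or delete two.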
To identify motifs within the protein, the TLS from Methanobrevibacter phage PG
was translated into an amino acid sequence and a protein BLAST was performed through
http://blast.ncbi.nlm.nih.gov. The results give an overview and a graphical summary of
the database sequences that align to the query sequence, including conserved protein
motifs (Figure 22). These conserved domains within the protein coding region of the
TLS from phage PG are shown in Figure 23 with different confidence levels and with the
domain model scope marked on the left side of the graph. Although 5 hits are
non-specific, they still meet or surpass the threshold. The 3 superfamily hits are
conserved domain clusters that create overlapping annotations on the same protein
sequences and are expected to signify evolutionarily related domains. Multi-domain
hits are models that are likely to contain several domains.
Figure 22. Protein BLAST results for terminase large subunit for Methanophage PG. (NCBI BLAST
http://blast.ncbi.nlm.nih.gov/Blast.cgi)
In the graphical summary, there are a total of 7 domain hits. The largest is the
homing endonuclease from pfam05203 at 438 aa; homing endonucleases are encoded by
mobile DNA elements (Table 6). The next largest is the phage terminase from the PBSX
family, with a conserved domain (Cd) length of 396 aa. This TIGR01547 model identifies
other divergent members of the large terminase subunit. Terminase from pfam04466 has a
Cd length of 387 aa and initiates the packaging of viral DNA into the phage head.
Terminase from pfam03237 has a Cd length of 380 aa, and this family characterizes
groups of terminase proteins. The next hit is a phage related terminase that is part
of the superfamily cl02216, with a length of 202 aa. The psiM2_ORF9 model, with a Cd
length of 142 aa, covers the C-terminal region of the terminase, and the smallest
domain is the Hedgehog/Intein domain N-terminal region, with a Cd length of 100 aa.
However, the conserved domains with the lowest E-values are the most significant, as
seen in Terminase 6 of pfam03237, the COG5362 phage related terminase, and the
C-terminal region from psiM2_ORF9.
Table 6. Protein BLAST summary results of Terminase Large Subunit from Methanophage PG
Number | Description | Cd Length | E-Value
1 | Homing endonuclease | 438 | 1e-04
2 | Phage terminase PBSX family | 396 | 3e-05
3 | Terminase pfam04466 | 387 | 0.002
4 | Terminase pfam03237 | 380 | 7e-22
5 | Phage related terminase | 202 | 1e-16
6 | C-terminal region | 142 | 1e-13
7 | Hedgehog/Intein domain | 100 | 0.008
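Because a lower E-value means a more significant hit, ranking the Table 6 hits by E-value rather than by domain length shows which domains carry the strongest support. The short sketch below only re-sorts the values listed in Table 6; the variable names are illustrative.

```python
# (description, conserved-domain length in aa, E-value) as listed in Table 6.
HITS = [
    ("Homing endonuclease", 438, 1e-04),
    ("Phage terminase PBSX family", 396, 3e-05),
    ("Terminase pfam04466", 387, 0.002),
    ("Terminase pfam03237", 380, 7e-22),
    ("Phage related terminase", 202, 1e-16),
    ("C-terminal region", 142, 1e-13),
    ("Hedgehog/Intein domain", 100, 0.008),
]

# Smallest E-value first: the most significant hits lead the list.
by_evalue = sorted(HITS, key=lambda hit: hit[2])
for description, length, evalue in by_evalue:
    print(f"{evalue:8.0e}  {length:4d} aa  {description}")
```

Sorted this way, the three leading hits are Terminase pfam03237, the phage related terminase, and the C-terminal region, even though the two longest domains rank lower.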
After locating these motifs within the protein of interest, the Conserved Domain
Architecture Retrieval Tool (CDART), which utilizes the NCBI Entrez Protein Database,
was used to search for other proteins that are evolutionarily similar to the
Methanobrevibacter phage PG amino acid sequence. Figure 23 displays the query sequence
at the top of the image and designates HintN, phage terminase 2 from the PBSX family,
and the COG5362 phage related terminase as conserved domains. The results show similar
domain architectures, indicating where these motifs are found in other protein
families. HintN is a motif known to belong to the DNA polymerase III superfamily. Along
with TIGR01547 of the PBSX family, these motifs can also be seen in two other sequences
from Pirellula staleyi and 12 sequences from Proteobacteria. COG5362 is also located in
the same 2 sequences of Pirellula staleyi.
Figure 23. CDART results page for terminase large subunit for Methanophage PG. (NCBI CDART
http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi)
Protein modeling
The Terminase Large Subunit of phage PG has not yet been isolated. When the amino
acid sequence of the TLS was searched against the Pfam database, the results indicated
that it is closely related to the Terminase-like family, which has similarities to the
known terminase function in bacteriophages T4 and λ. PG was compared with
bacteriophage T4’s terminase, and the alignment has an E-value of 3.7e-35 with a 100%
degree of confidence in the majority of aligned residues (Figure 24). Each row of the
alignment shows the matching Hidden Markov Model (HMM) and the query sequence. The
alignment key is presented at the bottom of Figure 24, and the Pfam search returned a
significant match between the query sequence and the Terminase-like family.
Figure 24. Sequence search in Pfam protein families databases. Identified the significance of the match between the
query sequence and the Terminase-like family. (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml)
The alignment of phage PG with the terminase (gp17) of bacteriophage T4, which has
an E-value of 3.7e-35, is shown in Figure 24. The key at the bottom of Figure 24
indicates the significance of the match between the query sequence and the
Terminase-like family. Given the significant E-value between PG and T4, it is
hypothesized that the two terminases share similar structural features.
The RCSB Protein Data Bank structure database was used to view the bacteriophage T4
gp17 terminase in ribbon format (Sun et al., 2008) as a representative structure for
PG’s terminase (Figure 25). The secondary structure of this protein is made up of 32%
alpha helices and 19% beta sheets (Figure 26) (Finn et al., 2006; Kabsch et al., 1983).
Figure 25. 3-D model of the Bacteriophage T4 gp17 in ribbon format.
(http://www.rcsb.org/pdb/home/home.do)
Figure 26. Sequence of Bacteriophage T4 gp17 protein model. Displays the Terminase-like family in
green and the helical and beta sheets brown and yellow, respectively.
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml)
Distance matrices
Distance matrices are often generated to measure and analyze the evolutionary and
genetic distances between species that have diverged from a common ancestor. In this
case, the distances reflect the changes that have occurred in the Terminase Large
Subunit within Podoviridae and Siphoviridae. Several models are available, each with
different parameters and assumptions, and each gives different information about what
could have occurred at the genetic level.
The pairwise distance, also known as the p-distance, is the proportion of sites at
which two aligned sequences differ. Because it does not correct for multiple
substitutions at the same site, the uncorrected p-distance captures only part of the
nucleotide or amino acid substitutions and can underestimate evolutionary distances.
However, when distances are small, the p-distance gives a reliable estimate. This can
be seen in the MEGA p-distances and the PAUP uncorrected p-distances, which were
relatively close in value for both Podoviridae and Siphoviridae.
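As an illustration of the uncorrected p-distance, the sketch below counts the proportion of differing sites between two aligned sequences. The function name and the toy alignment are hypothetical, and skipping sites with a gap in either sequence (pairwise deletion) is an assumption that may differ from the MEGA and PAUP settings actually used.

```python
def p_distance(seq_a, seq_b, gap_chars="-~"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy alignment, for illustration only: 1 difference over 9 compared sites.
print(p_distance("ATGCCGTA-A", "ATGTCGTAGA"))
```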
The Kimura 2-parameter model was also used to determine evolutionary distance. It
treats transitions and transversions separately, allowing transitions to occur at a
different, typically higher, rate than transversions. MEGA gave slightly lower values
than PAUP for Siphoviridae, but for Podoviridae the results were very close.
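The Kimura 2-parameter distance can be computed from the observed proportions of transitions (P) and transversions (Q) as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below implements this standard formula; the function name and the toy sequences are hypothetical, and sites containing anything other than A, C, G, or T are simply skipped, which is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous characters
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy sequences: 2 transitions and 1 transversion over 20 sites.
print(k2p_distance("ATGCATGCATGCATGCATGC", "GCTCATGCATGCATGCATGC"))
```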
Jukes-Cantor is another simple method. It assumes that substitutions occur randomly
and with equal probability between any pair of nucleotides, which is why it is called
a one-parameter model. In MEGA, the Jukes-Cantor correction can also be applied to
synonymous and non-synonymous substitution rates. A synonymous change is a nucleotide
change that does not alter the amino acid encoded by the codon, whereas a
non-synonymous change alters the amino acid originally coded for. In the synonymous
matrix, MEGA could not calculate some of the values and therefore could not accurately
account for synonymous change. As a result, evolutionary distance had to be measured
using non-synonymous changes.
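The Jukes-Cantor correction converts an observed proportion of differences p into a distance d = -(3/4) ln(1 - 4p/3). A minimal sketch follows, with a hypothetical function name and illustrative input values. The formula has no finite value once p reaches 0.75, which is one general reason a corrected distance can fail to compute for highly divergent pairs; whether that is what happened in the synonymous matrix here is not established by the text.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from an observed p-distance.

    Undefined when p >= 0.75, where the logarithm has no finite value.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


# Illustrative p-distances only; the correction grows as p increases.
for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 4))
```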
The Kimura 3-parameter model was calculated in PAUP. It is similar to the Kimura
2-parameter model in distinguishing transitions from transversions, but it further
divides transversions into two types: A-T/G-C and A-C/G-T. PAUP was also used to
calculate absolute distances, which count the total number of nucleotide differences
between sequences. In MEGA, the Tajima-Nei model, which assumes equal substitution
rates for transitions and transversions, was also used to estimate distances.
MEGA was able to identify conserved sites, variable sites, parsimony-informative
sites, singleton sites, 0-fold degenerate sites, 2-fold degenerate sites, and 4-fold
degenerate sites from the families Podoviridae and Siphoviridae (Table 7). The distance
matrices generated with PAUP and MEGA programs can be found in Appendix C.
Table 7. MEGA results of conserved, variable, parsimony-informative, singleton, 0-fold, 2-fold,
and 4-fold degenerate sites from Podoviridae and Siphoviridae.
Phage Family | Conserved | Variable | Parsimony-Informative | Singleton | 0-fold Degenerate | 2-fold Degenerate | 4-fold Degenerate
Podoviridae | 406/1554 | 1094/1554 | 1002/1554 | 92/1554 | 973/1554 | 147/1554 | 114/1554
Siphoviridae | 190/1833 | 1457/1833 | 1080/1833 | 358/1833 | 978/1833 | 75/1833 | 44/1833
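The site classes in Table 7 can be illustrated with a short classifier for alignment columns. A site is conserved if all sequences share one state, parsimony-informative if at least two states each occur in at least two sequences, and a singleton if it is variable but not parsimony-informative. This is a sketch of those standard definitions, not the MEGA implementation: the function names and toy alignment are hypothetical, and dropping any column that contains a gap or ambiguous character is an assumption.

```python
from collections import Counter


def classify_site(column, skip_chars="-~?N"):
    """Classify one alignment column, or return None if it must be skipped."""
    column = column.upper()
    if any(ch in skip_chars for ch in column):
        return None
    counts = Counter(column)
    if len(counts) == 1:
        return "conserved"
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    if states_seen_twice >= 2:
        return "parsimony-informative"
    return "singleton"


def count_sites(sequences):
    """Tally site classes across an alignment given as equal-length strings."""
    tally = Counter()
    for column in zip(*sequences):
        site_class = classify_site("".join(column))
        if site_class is not None:
            tally[site_class] += 1
    # Variable sites are the parsimony-informative and singleton sites together.
    tally["variable"] = tally["parsimony-informative"] + tally["singleton"]
    return dict(tally)


# Toy alignment of four sequences, for illustration only.
print(count_sites(["ATGCA", "ATGTA", "ATCTA", "ACCCA"]))
```

The Podoviridae row of Table 7 follows the same relationship, with 1002 parsimony-informative and 92 singleton sites giving 1094 variable sites.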
Modeltest
The best model determined by Modeltest for Podoviridae and Siphoviridae is the
general time reversible model with gamma rate distribution (GTR+G). The Podoviridae
dataset gave an Akaike information criterion (AIC) of 11099.8994, with -lnL=5540.9497
and K=9. For Siphoviridae, AIC=24658.3223, -lnL=12320.1611, and K=9. The combined
dataset of Podoviridae and Siphoviridae gave AIC=36042.1484, -lnL=18012.0742, and K=9.
Modeltest thus indicated GTR+G as the best-fitting model and supplied the parameter
values to be applied in the Maximum Likelihood tree search. These values provide
optimal conditions for that search and are later applied to the bootstrap analysis.
The Modeltest output can be found in Appendix D.
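The reported AIC values can be checked against the standard definition, AIC = 2(-lnL) + 2K. The sketch below recomputes them from the -lnL and K values given above; the function name is illustrative. K=9 is consistent with GTR+G having five free substitution rates (G-T is fixed at 1), three free base frequencies, and one gamma shape parameter.

```python
def aic(neg_log_likelihood, k):
    """Akaike information criterion: AIC = 2*(-lnL) + 2*K."""
    return 2 * neg_log_likelihood + 2 * k


# -lnL and K values reported by Modeltest for each dataset.
datasets = {
    "Podoviridae": (5540.9497, 9),
    "Siphoviridae": (12320.1611, 9),
    "All 20 sequences": (18012.0742, 9),
}

for name, (neg_lnl, k) in datasets.items():
    print(f"{name}: AIC = {aic(neg_lnl, k):.4f}")
```

The recomputed values agree with the reported ones to within rounding of the fourth decimal place of -lnL.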
Podoviridae:
- Model Selected:
o GTR+G
o -lnL = 5540.9497
o K=9
o AIC = 11099.8994
- Base Frequencies
o A = 0.2704
o C = 0.2349
o G = 0.2702
o T = 0.2245
- Substitution model
o Rate Matrix
o A-C = 1.7656
o A-G = 2.6455
o A-T = 0.6250
o C-G = 0.4248
o C-T = 4.5017
o G-T = 1.0000
- Among-site rate variation
o Proportion of invariable sites = 0
o Variable sites (G)
o Gamma distribution shape parameter = 0.4516
Siphoviridae:
- Model Selected:
o GTR+G
o -lnL = 12320.1611
o K=9
o AIC = 24658.3223
- Base Frequencies
o A = 0.2916
o C = 0.2088
o G = 0.2499
o T = 0.2497
- Substitution model
o Rate Matrix
o A-C = 1.7834
o A-G = 1.8181
o A-T = 1.1345
o C-G = 0.9127
o C-T = 2.7446
o G-T = 1.0000
- Among-site rate variation
o Proportion of invariable sites = 0
o Variable sites (G)
o Gamma distribution shape parameter = 1.4244
For all 20 sequences:
- Model Selected:
o GTR+G
o -lnL = 18012.0742
o K=9
o AIC = 36042.1484
- Base Frequencies
o A = 0.2828
o C = 0.2193
o G = 0.2575
o T = 0.2404
- Substitution model
o Rate Matrix
o A-C = 1.8378
o A-G = 2.0590
o A-T = 0.9950
o C-G = 0.6558
o C-T = 3.0231
o G-T = 1.0000
- Among-site rate variation
o Proportion of invariable sites = 0
o Variable sites (G)
o Gamma distribution shape parameter = 1.0760
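To show how the listed parameters fit together, the sketch below assembles a GTR instantaneous rate matrix from the Podoviridae rate and base frequency estimates. In the usual GTR formulation, each off-diagonal entry is the exchange rate for the pair of bases multiplied by the frequency of the target base, and each diagonal entry makes its row sum to zero. The function names and the scaling to one expected substitution per unit time are illustrative choices, not taken from the Modeltest or PAUP output.

```python
BASES = "ACGT"

# Podoviridae estimates from the Modeltest output.
FREQS = {"A": 0.2704, "C": 0.2349, "G": 0.2702, "T": 0.2245}
RATES = {
    ("A", "C"): 1.7656,
    ("A", "G"): 2.6455,
    ("A", "T"): 0.6250,
    ("C", "G"): 0.4248,
    ("C", "T"): 4.5017,
    ("G", "T"): 1.0000,
}


def exchange_rate(i, j):
    """Symmetric exchange rate between two different bases."""
    return RATES[(i, j)] if (i, j) in RATES else RATES[(j, i)]


def gtr_rate_matrix(freqs=FREQS):
    """Build the scaled GTR rate matrix Q as a dict of dicts."""
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = exchange_rate(i, j) * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Scale so that the expected substitution rate at equilibrium is one.
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / mean_rate for j in BASES} for i in BASES}


Q = gtr_rate_matrix()
for i in BASES:
    print(i, " ".join(f"{Q[i][j]:8.4f}" for j in BASES))
```

The large C-T and A-G rates in the list correspond to the two transition types, which is why those entries dominate the matrix.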
Phylogenetic analysis
To generate the three trees using Bayesian analysis, maximum parsimony, and maximum
likelihood, the file imported into PAUP was edited by copying and pasting in the
Modeltest blocks. This was done for both MSAs, which were then executed in PAUP.
Parsimony analysis provides the simplest technique for building trees and makes very
few assumptions. This non-model-based algorithm uses only parsimony-informative sites.
The parameters for the heuristic search and for building the parsimony tree were left
at their defaults. Under the general search options, all characters are weighted
equally and branches are set to collapse if the maximum length is zero.
Character-state optimization is set to accelerated transformation. The step matrix
option was selected to permit the assignment of states not observed in terminal taxa
to internal nodes, and those selected can be recognized as possible short cuts by the
“3 + 1” test. For Podoviridae, the heuristic search saved 2 trees after 728
rearrangements, with a best tree score of 1256. For Siphoviridae, 1 tree was saved
from 188 rearrangements, with a best tree score of 3144. A maximum parsimony analysis
of all 20 sequences was also run with the default settings to observe how the trees
would turn out when the alignment between the two phage families did not have many
conserved regions. This search made 7902 rearrangements and saved 3 trees, with a best
score of 5222.
Bayesian analysis estimates phylogenetic trees from posterior probabilities, which
combine prior probabilities with the likelihood of the data. To run MrBayes, the PAUP
and MrBayes blocks were already incorporated into the file executed in MrBayes, with
brackets placed to block out the Modeltest blocks. The analysis was set to run one
million generations in order to produce 10,000 trees. For all 3 Bayesian analyses, the
trees were taken from the point at which the standard deviation of split frequencies
stopped oscillating.
Podoviridae:
- Selected tree 6111-10000 where the frequency was at 0.001944
- Show trees and indices
- Computed a consensus tree for 50% majority-rule and included compatible
groupings and frequencies of other bipartitions
Siphoviridae:
- Selected tree 4951-10000 where the frequency was at 0.005831
- Show trees and indices
- Computed a consensus tree for 50% majority-rule and included compatible
groupings and frequencies of other bipartitions
All sequences:
- Selected tree 6231-10000 where the frequency was at 0.002413
- Show trees and indices
- Computed a consensus tree for 50% majority-rule and included compatible
groupings and frequencies of other bipartitions
This program is computationally intensive and gave fairly high support values, above
90, for the Bayesian trees.
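The tree ranges listed for each dataset imply a sampling interval and a burn-in fraction, which can be worked out directly. The sketch below assumes a fixed sampling frequency, so that one million generations and 10,000 saved trees correspond to one tree every 100 generations; the function names are illustrative, and the first kept tree for each dataset is taken from the lists in this section.

```python
def sampling_interval(generations, trees_saved):
    """Generations between saved trees, assuming a fixed sampling frequency."""
    return generations // trees_saved


def burnin_fraction(first_kept_tree, trees_saved):
    """Fraction of saved trees discarded before the first kept tree."""
    return (first_kept_tree - 1) / trees_saved


GENERATIONS = 1_000_000
TREES_SAVED = 10_000
FIRST_KEPT = {"Podoviridae": 6111, "Siphoviridae": 4951, "All sequences": 6231}

print("sampling interval:", sampling_interval(GENERATIONS, TREES_SAVED))
for name, first_kept in FIRST_KEPT.items():
    kept = TREES_SAVED - first_kept + 1
    fraction = burnin_fraction(first_kept, TREES_SAVED)
    print(f"{name}: kept {kept} trees, burn-in {fraction:.1%}")
```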
Maximum Likelihood analysis evaluates candidate trees under an explicit
substitution model and searches for the tree with the highest likelihood of producing
the observed data. It requires the information from Modeltest in order to set optimal
criteria for the tree search. The Modeltest results were therefore used to run the
heuristic search for the Maximum Likelihood analysis.
75
Podoviridae:
- Heuristic Search
o 352 rearrangements
o 1 tree saved
o Best tree: 5530.2840
Siphoviridae:
- Heuristic Search
o 212 rearrangements
o 1 tree saved
o Best tree: 12320.092
For all 20 sequences:
- Heuristic Search
o 2780 rearrangements
o 1 tree saved
o Best tree: 18011.541
Among the Siphoviridae trees, the branching patterns are very similar; the consensus
tree differs in that it is drawn as a rectangular cladogram and takes into account a
constant molecular clock (Figures 30 and 31). Overall, all three trees in that family are
similar. In Podoviridae, the parsimony tree (Figure 27) and the maximum likelihood tree
(Figure 28) closely resemble each other. For all 20 sequences, the three trees show
parallels, and the pattern seen in Podoviridae recurs (Figures 33, 34, and 35), which
suggests that the sequences are very similar and did not diverge long ago.
Figure 27. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the
Parsimony analysis, shown as a rectangular cladogram.
Figure 28. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the
Maximum Likelihood analysis, shown as a rectangular cladogram.
Figure 29. Phylogenetic tree of selected phage TLS from Podoviridae using 11 sequences in the Bayesian
analysis (50% majority-rule), shown as a rectangular cladogram.
Figure 30. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Parsimony
analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.
Figure 31. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Maximum
Likelihood analysis, shown as a rectangular cladogram. The red star indicates the methanophage PG.
Figure 32. Phylogenetic tree of selected phage TLS from Siphoviridae using 9 sequences in the Bayesian
analysis (50% majority-rule), shown as a rectangular cladogram. The red star indicates the
methanophage PG.
Figure 33. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20
sequences in the Parsimony analysis, shown as a rectangular cladogram. The red star indicates the
methanophage PG.
Figure 34. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20
sequences in the Maximum Likelihood analysis, shown as a rectangular cladogram. The red star
indicates the methanophage PG.
Figure 35. Phylogenetic tree of selected phage TLS from Podoviridae and Siphoviridae using 20
sequences in the Bayesian analysis (50% majority-rule), shown as a rectangular cladogram. The red star
indicates the methanophage PG.
Bootstrap analysis resamples the columns of the multiple sequence alignment (MSA) at
random, with replacement, and rebuilds the tree from each resampled alignment to test
whether the same tree is recovered from the data. The values presented are the percentage
of replicates in which a branch was regenerated, and any value above 70 is considered
significant. For both families of viruses, 1000 replicates were analyzed to give some
gauge of the accuracy of the trees, since the gene is susceptible to a considerable
amount of mutation in phages. Modeltest was run for both families and returned the same
parameters to input into the maximum likelihood settings.
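The resampling step of a bootstrap can be sketched as follows. This is a minimal
illustration in Python, not the PAUP implementation used in this work; the toy
alignment and the function names are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; rows stay aligned."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def support_percent(times_recovered, replicates):
    """Bootstrap support: percentage of replicates that recovered a branch."""
    return 100.0 * times_recovered / replicates

toy = ["ATGCCA", "ATGACA", "TTGACC"]   # hypothetical three-sequence alignment
rng = random.Random(0)                 # fixed seed so the sketch is repeatable
replicate = bootstrap_alignment(toy, rng)

# A branch recovered in 700 of 1000 replicates sits at the threshold of 70.
print(support_percent(700, 1000))
```

Each replicate has the same dimensions as the original alignment, but some columns
appear more than once and others not at all, which is what lets the support values
measure how strongly the data favor each branch.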
The bootstrap generated some significant values, but branches shown with no values
collapsed because the tree supported them with less than 50% confidence. Compared to the
majority-rule consensus trees (Figures 29 and 32), the bootstrap values were somewhat
lower, but these values are more reliable (Figures 36-41). In addition, the bootstrap
trees and the consensus trees have very similar arrangements. In Podoviridae, however,
the Bayesian analysis shows the clade containing Escherichia fergusonii, Salmonella
phage 9, and Shigella phage as significant, whereas in the bootstrap Escherichia
fergusonii is equally non-significant with the Bacteriophage L and Enterobacteria phage
clade (Figure 29). For Siphoviridae, the significance values and the arrangements also
match the Bayesian analysis, except that the clade including Methanothermobacter phage,
Methanobacterium phage, PG, and Methanosarcina is not significant with respect to
Mesorhizobrevibacter phage, whereas in the 50% majority-rule consensus tree that clade is
significant (Figure 32).
Figure 36. Bootstrap output from Podoviridae using 11 sequences.
Figure 37. Bootstrap tree from Podoviridae using 11 sequences.
Figure 38. Podoviridae phylogram with bootstrap values using 11 sequences.
Figure 39. Bootstrap output from Siphoviridae using 9 sequences.
Figure 40. Bootstrap tree from Siphoviridae using 9 sequences.
Figure 41. Siphoviridae phylogram with bootstrap values using 9 sequences.
Time of Divergence
Jukes-Cantor non-synonymous substitutions were used to create the time of
divergence tables for both Podoviridae and Siphoviridae. The Jukes-Cantor synonymous
substitution values could not be calculated and therefore could not be used. To develop
a time of divergence, the rate of change and the number of substitutions are used to
estimate when any 2 species may have had a common ancestor. In the equation
$r = K/(2t)$, $r$ is the number of substitutions per site per year, $K$ is the number of
substitutions per site between any pair of species, and $t$ is the time of divergence
between the two sequences, so that $t = K/(2r)$. The $K$ values come from the distance
matrices created in MEGA4. Because no rate has been established for bacteriophages, the
next closest possibility, the value for Bacteria, was used:
$4.5 \times 10^{-9}$ non-synonymous substitutions per site per year. In Podoviridae,
some pairs are listed as having no divergence, such as Salmonella phage 4 and Salmonella
phage 5, which the Bayesian analysis and the bootstrap analysis also show to be
significantly related. The same result is seen with Enterobacteria phage 7 and
Bacteriophage P22. The next most recent divergence is seen at 333,000 years ago between
several strains of Salmonella phage and Enterobacteria phage. The earliest time of
divergence is seen at 226,000,000 years ago between Salmonella phage 9 and
Bacteriophage P22, and between Salmonella phage 9 and Enterobacteria phage ST104.
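As a worked check of the divergence-time calculation, the sketch below applies
t = K/(2r), writing r for the substitution rate, to a few K values taken from Tables 8
and 9. It is a minimal Python illustration; the function name is hypothetical, and the
rate is the bacterial value quoted in the text.

```python
RATE = 4.5e-9  # non-synonymous substitutions per site per year (bacterial value)

def divergence_time(k, rate=RATE):
    """Time since divergence in years: t = K / (2r)."""
    return k / (2.0 * rate)

# K values taken from the distance matrices in Tables 8 and 9.
for k in (0.003, 2.038, 0.001, 1.195):
    print(f"K = {k:.3f} -> t = {divergence_time(k):.2e} years")
```

These four distances give about 3.33E+05, 2.26E+08, 1.11E+05, and 1.33E+08 years, the
values quoted in the text for the most recent and earliest divergences in each family.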
As for Siphoviridae, the most recent divergence, around 111,000 years ago, is seen
between Staphylococcus phage and Enterobacteria phage Felix01 and between Enterobacteria
phage Felix01 and Enterobacteria phage WV8. The earliest divergence occurred
133,000,000 years ago between Methanobrevibacter PG and Methanosarcina acetivorans.
These values do not seem to match the bootstrap and Bayesian analysis values, which show
no significant relationship between Staphylococcus and Enterobacteria phages. This could
be because the bacterial rate of substitutions per site per year is not accurate for
phages, since they can pick up DNA through vertical and horizontal transfer more often
than bacteria do.
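For reference, the standard Jukes-Cantor correction converts a proportion p of
differing sites into a distance, and it has no defined value once p reaches 0.75. The
Python sketch below shows that behavior. Whether this saturation is the reason the
synonymous values could not be calculated in this work is an assumption, not something
the MEGA4 output confirms, and the function name is hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    Returns None when the correction is undefined (p >= 0.75).
    """
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        return None
    return -0.75 * math.log(x)

for p in (0.10, 0.50, 0.70, 0.75):
    print(p, jukes_cantor(p))
```

The distance grows without bound as p approaches 0.75, so heavily substituted sites
can leave a pair of sequences with no computable value.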
Table 8. Podoviridae Jukes-Cantor non-synonymous time of divergence table
Divergence Time Table Jukes-Cantor Non-Synonymous Podoviridae
Columns are numbered in the same order as the rows.
Jukes-Cantor non-synonymous distance (K)
                         1      2      3      4      5      6      7      8      9     10
 1 Bacteriophage_L
 2 Salmonella_phag   0.000
 3 Salmonella_ph_2   0.004  0.004
 4 Enterobacteria    0.003  0.003  0.003
 5 Salmonella_ph_4   0.003  0.003  0.005  0.004
 6 Salmonella_ph_5   0.003  0.003  0.005  0.004  0.000
 7 Bacteriophage_P   0.006  0.006  0.005  0.005  0.003  0.003
 8 Enterobacteri_7   0.006  0.006  0.005  0.005  0.003  0.003  0.000
 9 Escherichia_fer   0.049  0.049  0.051  0.050  0.048  0.048  0.047  0.047
10 Salmonella_ph_9   2.003  2.003  2.000  2.018  2.025  2.025  2.038  2.038  1.975
11 Shigella_phage    1.987  1.987  1.984  2.002  2.009  2.009  2.022  2.022  1.963  0.003
Time of divergence (years)
                            1         2         3         4         5         6         7         8         9        10
 1 Bacteriophage_L
 2 Salmonella_phag   0.00E+00
 3 Salmonella_ph_2   4.44E+05  4.44E+05
 4 Enterobacteria    3.33E+05  3.33E+05  3.33E+05
 5 Salmonella_ph_4   3.33E+05  3.33E+05  5.56E+05  4.44E+05
 6 Salmonella_ph_5   3.33E+05  3.33E+05  5.56E+05  4.44E+05  0.00E+00
 7 Bacteriophage_P   6.67E+05  6.67E+05  5.56E+05  5.56E+05  3.33E+05  3.33E+05
 8 Enterobacteri_7   6.67E+05  6.67E+05  5.56E+05  5.56E+05  3.33E+05  3.33E+05  0.00E+00
 9 Escherichia_fer   5.44E+06  5.44E+06  5.67E+06  5.56E+06  5.33E+06  5.33E+06  5.22E+06  5.22E+06
10 Salmonella_ph_9   2.23E+08  2.23E+08  2.22E+08  2.24E+08  2.25E+08  2.25E+08  2.26E+08  2.26E+08  2.19E+08
11 Shigella_phage    2.21E+08  2.21E+08  2.20E+08  2.22E+08  2.23E+08  2.23E+08  2.25E+08  2.25E+08  2.18E+08  3.33E+05
Table 9. Siphoviridae Jukes-Cantor non-synonymous time of divergence table
Divergence Time Table Jukes-Cantor Non-Synonymous Siphoviridae
Jukes-Cantor non-synonymous distance (K)
            Enterobact  Staphyloco  Enteroba02  Erwinia_ph  Methanothe  Methanobac  Methanosar  Mesorhizob
Enterobact
Staphyloco       0.001
Enteroba02       0.001       0.002
Erwinia_ph       0.225       0.226       0.222
Methanothe       0.919       0.912       0.910       0.995
Methanobac       0.928       0.918       0.919       1.005       0.060
Methanosar       0.935       0.927       0.936       0.896       0.854       0.844
Mesorhizob       0.720       0.716       0.722       0.705       0.842       0.851       0.577
PG               1.140       1.139       1.142       1.152       0.917       0.904       1.195       1.144
Time of divergence (years)
            Enterobact  Staphyloco  Enteroba02  Erwinia_ph  Methanothe  Methanobac  Methanosar  Mesorhizob
Enterobact
Staphyloco    1.11E+05
Enteroba02    1.11E+05    2.22E+05
Erwinia_ph    2.50E+07    2.51E+07    2.47E+07
Methanothe    1.02E+08    1.01E+08    1.01E+08    1.11E+08
Methanobac    1.03E+08    1.02E+08    1.02E+08    1.12E+08    6.67E+06
Methanosar    1.04E+08    1.03E+08    1.04E+08    9.96E+07    9.49E+07    9.38E+07
Mesorhizob    8.00E+07    7.96E+07    8.02E+07    7.83E+07    9.36E+07    9.46E+07    6.41E+07
PG            1.27E+08    1.27E+08    1.27E+08    1.28E+08    1.02E+08    1.00E+08    1.33E+08    1.27E+08
DISCUSSION
The terminase enzyme is made up of two subunits that are responsible for the
packaging of DNA in bacteriophages (Black, 1995). Phages that use this mechanism
rely on ATPase to deliver the DNA into the head capsid. The terminase large
subunit is responsible for cutting the DNA and transporting it with the
help of ATPase (Burroughs, et al., 2007). These tailed phages are found in the families
Myoviridae, Podoviridae, and Siphoviridae. Because the exchange of DNA occurs
more often in viruses than in bacteria, mutation and evolutionary divergence
arise quite often. This was evident when all 20 sequences were aligned together: there
were almost no significantly conserved domains.
When running BLAST searches with this gene, I came across conserved regions that
matched not only other phages but also bacteria. This led me to hypothesize that
the gene incorporated some of the host's DNA during infection, which is now part
of the phage genetic material. Together with the phages' capacity for horizontal and
vertical gene transfer, this makes evolutionary divergence more difficult to determine.
In addition, because the sequences are so diverse, some of the trees turned out to be
non-significant. The bootstrap at 500 replicates, for instance, did not provide any
significant values to indicate reliable data on evolutionary divergence. To address the
problem, I ran 1000 replicates for both Podoviridae and Siphoviridae. With more
replicates, some significant values emerged, and I could see that the archaeal phages
were related to each other and that phages of similar species were related as well.
When evaluating the divergence time table, I noticed that within Siphoviridae, the
earliest divergence from the other species was seen in the archaeal phage groups,
averaging around 110,000,000 years ago. This is not much of a surprise, since the
Archaea are a very old lineage and could be the oldest lineage that still exists.
This could provide some information on how or where the gene first arose.
All known tailed phages have a single linear dsDNA genome and vary in genome size.
Though the DNA is packaged into a procapsid, the replication strategies and the types of
terminal ends created by the packaging event are not all the same. The different types
of terminal ends are determined by the differing actions of the terminase enzyme during
DNA packaging and reflect different replication strategies (Casjens and Gilcrease, 2009).
The known types of termini for tailed bacteriophages are single stranded cohesive ends,
circularly permuted direct terminal repeats, direct terminal repeats (short or long),
terminal host DNA sequences, and covalently bound terminal proteins. Phages
with terminal bound proteins, however, do not require nucleolytic cleavage. Although the
terminase enzyme creates various genomic ends, it is the most conserved tailed phage
protein (Casjens and Gilcrease, 2009). PG’s putative terminase large subunit was shown
to be significant and highly conserved. PG’s terminase family, terminase_6, was
identified from the Pfam database and is classified very similarly to λ’s terminase
family, terminase_GpA. In addition, terminases cluster together according to the type of
DNA ends created by tailed phages, and based on its amino acid sequence, PG’s terminase
large subunit was shown to fall in the same clade as short direct terminal
repeats/T7 and 5’-cos/λ.
Phages with direct terminal repeats could go unnoticed if the phage genome
sequence was determined by the shotgun sequencing method. In addition, restriction
digest analysis of such phages would yield equimolar fragments regardless of heating and
cooling, unlike the pattern that is seen with cohesive end phages. However, essential
purification steps were needed in order to effectively digest PG with restriction
enzymes. Originally, phenol chloroform extraction was used to purify the PG DNA. The
Nanodrop spectrophotometer indicated a relatively pure DNA sample, but when run on a
gel, the PG DNA produced a smear due to a nuclease contaminant that was active in MOPS
buffer. After switching to citrate buffer, we encountered another problem: a
contaminant inhibited the activity of the restriction enzymes. When M13’s ssDNA was
mixed with PG’s dsDNA, mung bean nuclease was not inhibited in digesting M13, which gave
reason to believe that the contaminant was not soluble and could be attached to PG’s
DNA. In order to move forward, I focused on resolving the matter and found that PG
DNA needed to go through an additional purification step after phenol chloroform
extraction. I attempted to extract PG DNA from an agarose gel but lost too much DNA
product. Therefore, PG DNA was run through a Zymo-Spin™ column, which restored the
efficiency of the restriction enzymes.
PG was digested with ClaI and EcoRV, and the gel patterns showed no distinguishing
differences after heating and cooling. However, PG’s restriction fragment pattern
does not exclude PG from having cos sites. The cos site could have
been concealed by true restriction fragments, or the cos site could have been on a small
band that ran off the gel. Furthermore, the single stranded sticky ends could be
too short to join and hold the two fragments together in a gel, similar to the
complementary overhangs created by restriction enzymes. In that case, no annealing
would be seen under slow cooling conditions after the cohesive ends have been heated.
Due to the uncertainty of the results, PG DNA was further analyzed for cohesive ends by
determining the possible base composition of the single stranded ends by high
performance liquid chromatography (HPLC).
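The heating and cooling test for cos sites rests on a simple expectation about fragment
sizes, which can be sketched as follows. This is an illustrative Python model only: the
genome length and cut positions are hypothetical, not PG data, and the function names
are made up for the example.

```python
def linear_fragments(genome_len, cut_sites):
    """Fragment lengths from digesting a linear genome (ends melted apart)."""
    edges = [0] + sorted(cut_sites) + [genome_len]
    return [b - a for a, b in zip(edges, edges[1:])]

def annealed_fragments(genome_len, cut_sites):
    """Fragment lengths when the cohesive ends are annealed.

    The two terminal fragments are held together and migrate as one band.
    """
    frags = linear_fragments(genome_len, cut_sites)
    if len(frags) < 2:
        return frags
    return [frags[0] + frags[-1]] + frags[1:-1]

cuts = [8000, 21000, 40000]              # hypothetical restriction sites
print(linear_fragments(50000, cuts))     # pattern after heating and fast cooling
print(annealed_fragments(50000, cuts))   # pattern after slow cooling
```

In this model a cos phage loses its two terminal bands and gains one larger band when
the ends anneal, while internal fragments are unchanged. If the terminal fragments are
small or hidden by other bands, the two patterns can look the same on a gel.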
HPLC has been used for precise measurements of DNA base composition and is a
good alternative for determining G+C content. PG DNA was hydrolyzed into
nucleosides with mung bean nuclease and alkaline phosphatase. λ was analyzed
concurrently as a positive control with known cohesive ends and displayed the presence of
the four nucleosides from λ’s 12-base single stranded extensions. Proper controls and
standards were also injected into the HPLC to identify any additional chromatographic
peaks that could appear in any injected test sample. If PG were to have cos sites, we
would expect the complementary overhanging ends to anneal back together after
extraction, as seen with λ. We hypothesized that heating PG to 65-70°C would separate
the cos site so that the single stranded ends could be digested with mung bean nuclease.
After PG had been heated, digested, and filtered, the HPLC results identified the
nucleosides cytidine, adenosine, guanosine, and thymidine. However, the absorbance
units for the nucleoside concentrations measured unusually high when compared with λ.
It does not seem reasonable to read the chromatograms as indicating that PG has longer
overhanging ends than λ, since we saw no change in the gels and no bands annealing back
together (Figures 11 and 12) as seen in λ (Figure 10). The results may be due to PG’s
AT rich genome creating a low melting temperature. The heating could have caused AT
rich regions within PG’s genome to separate and become susceptible to mung bean nuclease.
Another sample of PG was analyzed for cohesive ends by HPLC but did not
undergo any heat treatment. The results indicated the presence of adenosine and
thymidine at reasonable absorbance levels with respect to λ. Without PG having been
heated, mung bean nuclease was able to cleave off single stranded extensions, giving
reason to believe that after PG DNA extraction, the genome remained linear and did not
recircularize. The presence of single stranded extensions of adenosine and thymidine
indicates that PG does have cohesive ends; however, the results do not identify the
specific length of the single stranded extension or which strand the extension is on.
Deoxynucleotide sequencing would need to be run off both ends of the PG DNA template
in order to determine precisely where the template ends at each
terminus. The location of the overhanging end can then be identified by comparison to a
ligated sequence of PG.
Whether PG’s genome has 5’ or 3’ overhanging ends, circularization did not take
place after DNA extraction. This could be due to PG’s AT rich nature, which may not
provide the bond strength needed to maintain a closed template. Adenine forms only two
hydrogen bonds with thymine, whereas cytosine and guanine form three hydrogen bonds.
In addition, the cohesive ends of PG could be shorter than those of λ.
Theoretically, in order for PG’s linear genome to serve as a closed circular template
for DNA replication, its ends would be ligated together by the host DNA ligase.
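The bond-strength argument can be made concrete by counting Watson-Crick hydrogen bonds
across an annealed overhang. The Python sketch below is illustrative only: the λ
sequence is the commonly cited 12-base cos overhang, while the AT-only sequence is a
hypothetical stand-in, since the sequence and length of PG's overhang are not known.

```python
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds per base pair

def overhang_h_bonds(seq):
    """Total hydrogen bonds holding an annealed single stranded overhang."""
    return sum(H_BONDS[base] for base in seq.upper())

lambda_cos = "GGGCGGCGACCT"   # commonly cited 12-base lambda cos overhang
at_only = "ATATATATATAT"      # hypothetical AT-only overhang of the same length

print(overhang_h_bonds(lambda_cos))
print(overhang_h_bonds(at_only))
```

Under these assumptions the λ overhang is held by 34 hydrogen bonds and an AT-only
overhang of equal length by 24, and a shorter AT-rich overhang would be held by fewer
still.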
Though PG has been identified as having cohesive ends, its replication strategy
still remains undetermined. It would be of interest to determine the replication
mechanism of PG, which can be done experimentally by distinguishing phage encoded
proteins from host proteins recruited for replication (Weigel and Seitz, 2006). Although
PG’s terminal ends are consistent with a rolling circle replication strategy, the
strategy cannot be reliably predicted unless PG’s replication genes are similar to a
known replication module. However, studying the replication modules of phages with
cohesive ends can provide a better understanding of the replication of PG.
Conclusion
The original hypothesis was that if the terminase uses the cos site as part of its
packaging strategy, then restriction digest and HPLC nucleoside analysis would show
cohesive ends. From the results of this work using bioinformatics, restriction enzyme
analysis, and HPLC, I conclude that PG has AT rich cohesive ends, which suggests the use
of circular replication as its packaging strategy.
REFERENCES
Abedon, S. T., & Calendar, R. (2006). The Bacteriophages. New York: Oxford
University Press.
Baker, S., Nicklin, J., & Griffiths, C. (2011). BIOS instant notes in microbiology.
London: Taylor & Francis.
Baresi, L. and Bertani, G. (1984). Isolation of a bacteriophage for a methanogenic
bacterium. Abstract 84th Annual Meeting American Society for Microbiology. I-74, p.
133.
Black, W. L. (1995). DNA packaging and cutting by phage terminases: control in phage
T4 by a synaptic mechanism. BioEssays. 17(12), p. 1025-1030.
Blaut, M. (1994). Metabolism of methanogens. Antonie Van Leeuwenhoek. 66(1-3), p.
187-208.
Burroughs, A. M., Iyer, L. M., & Aravind, L. (2007). Comparative genomics and
evolutionary trajectories of viral ATP dependent DNA-packaging systems. Gene and
Protein Evolution. 3, p. 48-65.
Calendar, R. (1988). The Bacteriophages: Volume 1. New York: Plenum Press.
Calendar, R. (1988). The Bacteriophages: Volume 2. New York: Plenum Press.
Cann, A. J. (2005). Principles of Molecular Virology. Burlington, MA: Elsevier
Academic Press.
Casjens, S. R. and Gilcrease, E. B. (2009). Determining DNA packaging strategy by
analysis of the termini of the chromosomes in tailed-bacteriophage virions.
Bacteriophages: Methods and Protocols. Humana Press. 2(7), p. 91-111.
Cavicchioli, R. (2011). Archaea — timeline of the third domain. Nature Reviews
Microbiology. 9(1), p. 51-61.
Deresinski, S. (2009). Bacteriophage therapy: Exploiting smaller fleas. Clinical
Infectious Diseases. 48(8), p. 1096-1101.
Desselberger, U. (2002). Virus taxonomy: Classification and nomenclature of viruses.
Virus Research. 83(1), p. 221-222.
Dimitrov, D. (2004). Virus entry: Molecular mechanisms and biomedical applications.
Nature Reviews. Microbiology. 2(2), p. 109-122.
Edgar, R. (2004). Muscle: Multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Research. 32(5), p. 1792-1797.
Felsenstein, J. (1989). PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics
5, p. 164-166.
Ferry, J. (2010). The chemical biology of methanogenesis. Planetary and Space Science.
58(14), p. 1775-1783.
Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L.,
Gunesekaran, P., Ceric, G., Forslund, K., Holm, L., Sonnhammer, E.L., Eddy, S.R., and
Bateman A. (2010). Pfam: clans, web tools and services. Nucleic Acids Res. 38
(Database issue):D211-222.
Finn, R.D., Mistry, J., Schuster-Böckler, B., Griffiths-Jones, S., Hollich, V., Lassmann,
T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., Eddy, S.R., Sonnhammer, E.L., and
Bateman, A. (2006). Pfam: clans, web tools and services. Nucleic Acids Research. 34:
D247-51.
Forterre, P., Prangishvili, D., & Garrett, R. (2006). Viruses of the archaea: A unifying
view. Nature Reviews Microbiology. 4(11), p. 837-848.
Geer LY, Domrachev M, Lipman DJ, Bryant SH (2002). CDART: protein
homology by domain architecture. Genome Research. 12(10), p. 1619-1623.
Hegde, S., Padilla-Sanchez, V., Draper, B., & Rao, V. (2012). Portal-large
terminase interactions of the bacteriophage T4 DNA packaging machine implicate a
molecular lever mechanism for coupling ATPase to DNA translocation. Journal of
Virology. 86(8), p. 4046-4057.
Hendrix, R., Hatfull, G., & Smith, M. (2003). Bacteriophages with tails: Chasing their
origins and evolution. Research in Microbiology. 154(4), p. 253-257.
Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992). CLUSTAL V: improved software for
multiple sequence alignment. Computer Applications in the Biosciences (CABIOS). 8(2),
p. 189-191.
Johnson, K., & Johnson, D. (1995). Methane emissions from cattle. Journal of Animal
Science. 73(8), p. 2483-2492.
Jukes, T.H., & Cantor, C.R. (1969). Evolution of protein molecules. Mammalian Protein
Metabolism. p. 21-132.
Kabsch W., & Sander C. (1983). Dictionary of protein secondary structure: pattern
recognition of hydrogen-bonded and geometrical features. Biopolymers. 12, p. 2577-2637.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base
substitutions through comparative studies of nucleotide sequences. Journal of Molecular
Evolution. 16(2), p. 111-120.
Leigh, J., Albers, S., Atomi, H., & Allers, T. (2011). Model organisms for genetics in the
domain archaea: Methanogens, halophiles, thermococcales and sulfolobales. FEMS
Microbiology Reviews. 35(4), p. 577-608.
Marchler-Bauer A, et al. (2007), CDD: specific functional annotation with the
Conserved Domain Database., Nucleic Acids Research. 37, p. 237-240.
Marchler-Bauer A, Bryant SH (2004). CD-Search: protein domain annotations on
the fly., Nucleic Acids Research.32, p. 327-331.
Lobocka, M., & Szybalski, W. T. (2012). Bacteriophages. Boston: Elsevier.
Mc, G. S., & Sinderen, D. V. (2007). Bacteriophage: Genetics and molecular biology.
Norfolk: Caister Academic.
Mitchell, R., Loeblich, L., Klotz, L., & Loeblich, 3rd, A. (1979). DNA organization of
methanobacterium thermoautotrophicum. Science. 204, p. 1082-1084.
Mitchell, S. M., Matsuzaki, S., Imai, S., Rao V. B. (2002). Sequence analysis of
bacteriophage T4 DNA packaging/terminase genes 16 and 17 reveals a common ATPase
center in the large subunit of viral terminases. Nucleic Acids Research. 30(18), p.
4009-4021.
Moss, A., Jouany, J., & Newbold, J. (2000). Methane production by ruminants: Its
contribution to global warming. Annales De Zootechnie. 49(3), p. 231-253.
Orlova, V.E. (2009). How viruses infect bacteria. EMBO J. 28, p. 797-798.
Pfister, P., Wasserfallen, A., Stettler, R., & Leisinger, T. (1998). Molecular analysis of
methanobacterium phage (psi m2). Molecular Microbiology. 30(2), p. 233.
Posada, D., and Buckley, T. R., (2004). Model selection and model averaging in
phylogenetics: advantages of the AIC and Bayesian approaches over likelihood ratio
tests. Systematic Biology. 53, p. 793-808.
Posada, D., and Crandall, K. A., (1998). Modeltest: testing the model of DNA
substitution. Bioinformatics. 14(9), p. 817-818.
Price, M., Dehal, P. , & Arkin, A. (2010). Fasttree 2--approximately maximum-likelihood
trees for large alignments. PloS One. 5(3), e9490.
Reeve, J. N. (1992). Molecular Biology of Methanogens. Annual Review of
Microbiology, USA. 46, p. 165-191.
Samuel, B., & Gordon, J. (2006). A humanized gnotobiotic mouse model of
host-archaeal-bacterial mutualism. Proceedings of the National Academy of Sciences of the
United States of America. 103(26), p. 10011-10016.
Samuel, B., Hansen, E., Manchester, J., Coutinho, P. , Henrissat, B., et al. (2007).
Genomic and metabolic adaptations of methanobrevibacter smithii to the human gut.
PNAS. 104(25), p. 10643-10648.
Snyder, J., & Young, M. (2011). Advances in understanding archaea-virus interactions in
controlled and natural environments. Current Opinion in Microbiology. 14(4), p.
497-503.
Stedman, K. M., Porter, K., Dyall-Smith, M. L. (2010). The isolation of viruses infecting
Archaea. Manual of Aquatic Viral Ecology. p. 57-64.
Sun, S., Kondabagil, K., Draper, B., Alam, T.I., Bowman, V.D., Zhang, Z., Hegde, S.,
Fokine, A., Rossmann, M.G., and Rao, V.B. (2008). The structure of the phage T4 DNA
packaging motor suggests a mechanism dependent on electrostatic forces. Cell. 135(7), p.
1251-1262.
Swofford, D. L. (2002). PAUP*. Phylogenetic Analysis Using Parsimony
Version 4. Sinauer Associates. Sunderland, Massachusetts.
Tajima, F., & Nei, M. (1984). Estimation of evolutionary distance between nucleotide
sequences. Molecular Biology and Evolution. 1(3), 269.
Thompson J.D., Higgins D.G., Gibson T.J. (1994). CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 22, p.
4673-4680.
Trun, N. J., & Trempy, J. E. (2004). Fundamental bacterial genetics. Malden, MA:
Blackwell.
Van Nevel, C., & Demeyer, D. (1996). Control of rumen methanogenesis. Environmental
Monitoring and Assessment. 42(1-2), p. 73-97.
Whelan, S., & Goldman, N. (2001). A general empirical model of protein evolution
derived from multiple protein families using a maximum-likelihood approach. Molecular
Biology and Evolution. 18(5), p. 691-699.
Weigel, C., & Seitz, H. (2006). Bacteriophage replication modules. FEMS Microbiology
Reviews. 30(3), p. 321-381.
Woese, C R., and Fox, G.E. (1977). Phylogenetic structure of the prokaryotic domain: the
primary kingdoms. PNAS. 74, p. 5088–5090.
APPENDIX A
Materials
Acetic acid, Glacial (Fisher, 64-19-7)
Agar (Difco Agar, 214530)
Agarose (Life Technologies, 9012-36-6)
Ammonium chloride [NH4Cl] (Sigma, 12125-02-9)
Ammonium sulfate [(NH4)2SO4] (Fisher, 7783-20-2)
Ampicillin (Sigma, A-6140)
Biotin (Sigma B4501)
Bovine serum albumin (BSA) (Sigma A3733)
Bromophenol blue (Sigma 114413)
Calcium chloride [CaCl2] (Fisher, 10043-52-4)
Casamino Acids (Difco 023-17-3)
Chloroform [CHCl3] (Fisher C607-1)
L-cysteine (Sigma C9768-10)
Deionized water (PurE water) [dH2O]
Ethanol [CH3CH2OH]
Ethidium Bromide [EtBr] (Sigma, 1239-45-8)
Ethylene Diamine Tetraacetic Acid [EDTA] (Research Organics, Inc. 6381-92-6)
Ferrous sulfide [FeS] (Acros 1317-37-9)
GeneRuler™ 1 kb DNA Ladder (Thermo)
Glycerol (Fisher, BP229-1)
Glycogen (Sigma G0885)
H2/CO2 70:30 gas mix (Air Products)
Hydrochloric acid [HCl] (Sigma, 7647-01-0)
LB (Difco LB Agar, 244520)
Lithium chloride [LiCl] (Sigma, 7447-41-8)
Magnesium chloride [MgCl2] (Fisher, 7791-18-6)
Magnesium sulfate [MgSO4 · 7H2O] (Spectrum, 7487-88-9)
Methane gas [CH4] (Air Products)
Methanol [CH3OH] (Fisher A411-20)
Mineral oil (Fisher 80-47-5)
N2/CO2 70:30 gas mix (Air Products)
Potassium chloride [KCl] (Fisher, 7447-40-7)
Potassium phosphate dibasic [K2HPO4] (J.T. Baker 7758-11-4)
Potassium phosphate monobasic [KH2PO4] (Sigma, P-5379)
Sodium acetate (Fisher, 6131-90-4)
Sodium bicarbonate [NaHCO3] (Fisher, 144-55-8)
Sodium carbonate [Na2CO3] (Fisher 497-19-8)
Sodium chloride [NaCl] (Aldrich 7647-14-5)
Sodium Dodecyl Sulfate [SDS] (Sigma L-4390)
Sodium hydroxide [NaOH] (Fisher, 1310-73-2)
Sodium sulfide [Na2S] (Fisher, 1313-84-4)
Sucrose (Criterion C7021)
Trace Minerals (Bertani and Baresi, 1987)
Tris/Acetic Acid/EDTA [TAE 50X] (BioRad TAE buffer)
Tris base (Fisher BP152-1)
Tris-HCl (Barker X186-05)
Triton® X-100 (Sigma, 9002-93-1)
UltraPURE agarose (Life Technologies, 9012-36-6)
Vancomycin (Sigma V2002)
Yeast Extract (Difco 212750)
Pure E Water
Pure E water is type 1 ultrapure water produced by a Thermo Scientific™ Barnstead™
E-Pure™ Ultrapure Water Purification System. Deionized water is run through
the filtration system, whose 0.2 µm filter removes bacteria and particulates.
Antibiotics
Ampicillin
Ampicillin         200mg
dH2O               100mL
Final concentration 2 mg/mL
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed
into 100mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All
bottles were closed with rubber stopper and aluminum seal using the seal crimper.
All bottles were sterilized by autoclaving. After cooling to room temperature,
Ampicillin was added to each bottle inside the anaerobic hood and filter sterilized
using a 0.2µm filter. Stored at 4°C temperature.
Antibiotic Mix
Ampicillin         0.2g
D-Cycloserine      0.02g
Vancomycin         0.02g
dH2O               100mL
Final concentration 0.2%, 0.02%, and 0.02%, respectively
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed
into 100mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All
bottles were closed with rubber stopper and aluminum seal using the seal crimper.
All bottles were sterilized by autoclaving. After cooling to room temperature, the
three antibiotics were added to each bottle inside the anaerobic hood and filter
sterilized using a 0.2µm filter. Stored at 4°C temperature.
Media
MS06
NH4Cl              0.125g
Mineral 1          5mL
Mineral 2          5mL
TM                 0.1mL
0.4% CaCl2         0.5mL
Na Acetate         0.8g
Cysteine           50mg
Agar               1.4g
dH2O               100mL
Trace Minerals (TM)
MnSO4 · H2O            0.5g
FeSO4 · 7H2O           0.1g
CoCl2 · 6H2O           0.1g
ZnSO4 · 7H2O           0.1g
CuSO4 · 5H2O           0.01g
AlK(SO4)2 · 12H2O      0.01g
H3BO3                  0.01g
Na2MoO4 · 2H2O         0.01g
NiCl2 · 6H2O           0.05g
Na2SeO3 · 5H2O         0.263g
dH2O                   1L
1.5g of Nitrilotriacetic acid was dissolved with KOH to pH 6.5 and the above
minerals were added to it. Final pH was 7.0. Sterilized by autoclaving and stored
at 4°C.
Solution and Reagents
Agarose gel for PCR products
Agarose           0.24g
1X TAE buffer     30mL
Final agarose 0.8%
Agarose           0.30g
1X TAE buffer     30mL
Final agarose 1.0%
Agarose           0.45g
1X TAE buffer     30mL
Final agarose 1.5%
The agarose solution was melted in a double boiler over a flame and allowed to
cool to 50°C before being poured into the gel tray.
“B” Solution
Yeast Extract       12.5g
Casamino acids      12.5g
Wolf’s vitamins     3 µl
dH2O                100 mL
Made anaerobically under N2/CO2 (70:30) gas atmosphere and dispensed into
4.5mL aliquot samples per tube under H2/CO2 (70:30) gas atmosphere. All tubes
were closed with rubber stopper and aluminum seal using the seal crimper. All
tubes were sterilized by autoclaving. After cooling to room temperature 100µL of
1% Na2S, 100µL of 6.5% NaHCO3, and 100µL of Biotin were added to each tube
of “B” Supplement. Stored at room temperature.
Biotin
Biotin
50mg
dH20
25mL
Final concentration 2mg/mL
Water was made anaerobic under N2/CO2 (70:30) gas atmosphere and dispensed
into 25mL aliquot samples per bottle under H2/CO2 (70:30) gas atmosphere. All
bottles were closed with rubber stopper and aluminum seal using the seal crimper.
All bottles were sterilized by autoclaving. After cooling to room temperature,
biotin was added to each bottle inside the anaerobic hood and filter sterilized
using a 0.2µm filter. Stored at room temperature.
CaCl2
CaCl2     4g
dH2O      1L
Final concentration 0.4%
Made aerobically and stored at room temperature.
EDTA
EDTA      186.1g
dH2O      1L
Final concentration 0.5M
Solution was adjusted to pH 8 and stored at room temperature.
70% Ethanol
100% Ethanol
70mL
dH20
30mL
Final concentration 70% and stored at -20°C.
EtBr
EtBr (10mg/mL)
100µL
dH20
100mL
Final concentration 10µg/mL
Solution was mixed and stored in a foil-covered container at room temperature.
Filter papers were also soaked in the solution. Solution was only handled with
gloves.
1N HCl
HCl (conc)
83.3mL
dH20
916.7mL
Final concentration 1N and stored at room temperature.
Indicator Dye used for loading samples on agarose gels.
Bromophenol blue
0.025g
Sucrose
5g
1M Tris buffer pH 8
10µL
Final concentrations 0.25% Bromophenol blue, 50% Sucrose, and 1mM Tris pH 8.
Raise the volume to a total of 10mL with dH2O. Dispensed into 1mL aliquots
in Eppendorf tubes and stored at -20°C.
1M KCl
KCl
7.456g
dH20
100mL
Final concentration 1M and stored at room temperature.
5M LiCl
LiCl
21.2g
dH20
100mL
Final concentration 5M and stored at room temperature.
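The masses in the molar stock recipes above follow from mass = molarity x molar mass x volume. The short Python sketch below recomputes the 1M KCl and 5M LiCl entries as a cross-check. The molar masses (74.55 g/mol for KCl, 42.39 g/mol for LiCl) and the helper name grams_needed are assumptions introduced here for illustration; they are not part of the original protocol.

```python
def grams_needed(molarity, molar_mass, volume_l):
    """Mass of solute (g) needed for a solution of the given molarity and volume."""
    return molarity * molar_mass * volume_l

# Assumed molar masses in g/mol (standard values, not taken from the protocol).
MOLAR_MASS = {"KCl": 74.55, "LiCl": 42.39}

kcl_g = grams_needed(1.0, MOLAR_MASS["KCl"], 0.100)    # 1M KCl in 100 mL
licl_g = grams_needed(5.0, MOLAR_MASS["LiCl"], 0.100)  # 5M LiCl in 100 mL
print(f"KCl: {kcl_g:.3f} g, LiCl: {licl_g:.1f} g")
```

Both results agree with the listed 7.456g and 21.2g to within rounding of the molar mass.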
Mineral 1
K2HPO4
3.1g
dH20
1L
Stored at room temperature.
Mineral 2
KH2PO4            3.0g
(NH4)2SO4         6.0g
NaCl              12.0g
MgSO4 . 7H2O      2.4g
dH2O              1L
Stored at room temperature.
NaCl
NaCl
5.844g
dH20
100mL
Final concentration 1M and stored at room temperature.
NaHCO3
NaHCO3    6.25g
dH2O      100mL
Final concentration: 6.25%
Made anaerobically in serum bottles under H2/CO2 (70:30) gas atmosphere and
dispensed into 25mL aliquot samples per bottle under H2/CO2 (70:30) gas
atmosphere. All bottles were closed with rubber stopper and aluminum seal using
the seal crimper. All bottles were sterilized by autoclaving. Stored at room
temperature.
1 M NaOH
NaOH
40g
dH20
1L
Final concentration 1M and sterilized by autoclaving and stored at room
temperature.
0.1M NaOH
1M NaOH
100mL
dH20
900mL
Final concentration 0.1M and sterilized by autoclaving and stored at room
temperature.
SDS
SDS
40g
dH20
100mL
Final concentration 40% and stored at room temperature.
Buffers
Lysis buffer
1M Tris
5mL
0.5M EDTA
0.2mL
1M NaCl
10mL
dH20
60mL
Final concentrations 50mM Tris-HCl – 1mM EDTA – 100mM NaCl
Adjust pH to 8.0 and raise the volume to 100mL with dH20. Sterilized by
autoclaving. Stored at room temperature.
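The final concentrations quoted for buffers made from stock solutions follow from C1 x V1 = C2 x V2. As a minimal sketch, the Python below recomputes the lysis buffer concentrations from the stock volumes listed above; the function name final_conc and the dictionary layout are illustrative assumptions, not part of the original method.

```python
def final_conc(stock_conc, stock_vol_ml, final_vol_ml):
    """Concentration after diluting stock_vol_ml of stock to final_vol_ml (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol_ml / final_vol_ml

# Lysis buffer: (stock molarity in M, stock volume in mL), brought to 100 mL total.
lysis = {"Tris": (1.0, 5.0), "EDTA": (0.5, 0.2), "NaCl": (1.0, 10.0)}
lysis_mM = {name: 1000 * final_conc(c, v, 100.0) for name, (c, v) in lysis.items()}
print(lysis_mM)
```

The computed values correspond to the stated 50mM Tris, 1mM EDTA, and 100mM NaCl.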
PCR Reaction buffer
1M Tris buffer pH 9     0.1mL
1M KCl                  0.5mL
10% Triton X-100        0.1mL
0.5M MgCl2              0.1mL
dH2O                    0.2mL
Total volume            1mL
Final concentrations 100mM Tris – 500mM KCl – 1% Triton X-100, and 50mM
MgCl2. Stored at 4°C.
1X TAE
50X TAE
20mL
dH20
980mL
Stored at room temperature.
5M Tris buffer
Tris base     60.57g
dH2O          70mL
Final concentration 5M
Adjust pH to 8.5 with 1N HCl and raise the volume to 100mL with dH20.
Sterilized by autoclaving. Stored at room temperature.
1.5M Tris buffer
5M Tris buffer pH 8.5     30mL
dH2O                      50mL
Final concentration 1.5M
Adjust pH to 8.8 with 1N NaOH and raise the volume to 100mL with dH20.
Sterilized by autoclaving. Stored at 4°C.
1M Tris buffer pH 8
Tris base     121.14g
dH2O          800mL
Final concentration 1M
Adjust pH to 8 with 1N HCl and raise the volume to 1L with dH20. Sterilized by
autoclaving. Stored at room temperature.
1M Tris buffer pH 7
Tris base     121.14g
dH2O          800mL
Final concentration 1M
Adjust pH to 7 with 1N HCl and raise the volume to 1L with dH20. Sterilized by
autoclaving. Stored at room temperature.
1M Tris buffer pH 9
Tris base     121.14g
dH2O          800mL
Final concentration 1M
Adjust pH to 9 and raise the volume to 1L with dH20. Sterilized by autoclaving.
Stored at room temperature.
0.5M Tris buffer
1M Tris buffer pH 7     50mL
dH2O                    30mL
Final concentration 0.5M
Adjust pH to 6.8 with 1N HCl and raise the volume to 100mL with dH20.
Sterilized by autoclaving. Stored at 4°C.
TE buffer
1M Tris buffer
1mL
0.5M EDTA
0.8mL
dH20
5mL
Final concentrations 100mM Tris – 40mM EDTA
Adjust pH to 7.5 using 1N HCl and raise the volume to 10mL with dH20.
Sterilized by autoclaving. Stored at room temperature.
TE buffer
1M Tris buffer
1mL
0.5M EDTA
0.2mL
dH20
80mL
Final concentrations 10mM Tris – 1mM EDTA
Adjust pH to 8.5 and raise the volume to 100mL with dH20. Sterilized by
autoclaving. Stored at room temperature.
APPENDIX B
Multiple Sequence Alignment of TLS Nucleotide Sequences for Podoviridae
Multiple Sequence Alignment of TLS Nucleotide Sequences for Siphoviridae
Multiple Sequence Alignment of TLS Nucleotide Sequences for All 20 Sequences
Multiple Sequence Alignment of TLS Amino Acid Sequence for Podoviridae
Multiple Sequence Alignment of TLS Amino Acid Sequence for Siphoviridae
Multiple Sequence Alignment of TLS Amino Acid Sequence for All 20 Sequences
APPENDIX C
Distance Matrices for TLS in Podoviridae Using MEGA4
MEGA Kimura 2 Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Kimura 2-parameter
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010 0.010
[ 4]  0.009 0.009 0.004
[ 5]  0.012 0.012 0.011 0.009
[ 6]  0.012 0.012 0.011 0.009 0.000
[ 7]  0.039 0.039 0.040 0.039 0.035 0.035
[ 8]  0.039 0.039 0.040 0.039 0.035 0.035 0.000
[ 9]  0.163 0.163 0.163 0.161 0.157 0.157 0.135 0.135
[10]  2.536 2.536 2.600 2.665 2.585 2.585 2.665 2.665 2.338
[11]  2.338 2.338 2.352 2.393 2.393 2.393 2.445 2.445 2.171 0.035
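For reference, the Kimura 2-parameter distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below implements that formula under stated assumptions: the function name k2p_distance is hypothetical, and sites with gaps are skipped pair by pair, which differs from the complete-deletion option used in the MEGA4 run and so need not reproduce the tabulated values exactly.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    valid = set("ACGT")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        n += 1
        if a != b:
            if (a, b) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    p, q = ts / n, tv / n  # transition and transversion proportions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy example (not thesis data): 2 transitions and 1 transversion in 20 sites.
toy_d = k2p_distance("A" * 20, "GGC" + "A" * 17)
print(round(toy_d, 4))
```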
MEGA Jukes Cantor Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Jukes-Cantor
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010 0.010
[ 4]  0.009 0.009 0.004
[ 5]  0.012 0.012 0.011 0.009
[ 6]  0.012 0.012 0.011 0.009 0.000
[ 7]  0.038 0.038 0.040 0.038 0.035 0.035
[ 8]  0.038 0.038 0.040 0.038 0.035 0.035 0.000
[ 9]  0.161 0.161 0.161 0.159 0.155 0.155 0.134 0.134
[10]  2.357 2.357 2.374 2.408 2.374 2.374 2.408 2.408 2.209
[11]  2.250 2.250 2.250 2.279 2.279 2.279 2.309 2.309 2.108 0.035
MEGA Tajima-Nei Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Tajima-Nei
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010 0.010
[ 4]  0.009 0.009 0.004
[ 5]  0.012 0.012 0.011 0.009
[ 6]  0.012 0.012 0.011 0.009 0.000
[ 7]  0.039 0.039 0.040 0.039 0.035 0.035
[ 8]  0.039 0.039 0.040 0.039 0.035 0.035 0.000
[ 9]  0.165 0.165 0.164 0.163 0.158 0.158 0.135 0.135
[10]  2.442 2.442 2.465 2.509 2.459 2.459 2.512 2.512 2.308
[11]  2.311 2.311 2.310 2.345 2.342 2.342 2.390 2.390 2.178 0.035
MEGA Nei-Gojobori JC Synonymous Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : s: Synonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 470
dS : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.032 0.032
[ 4]  0.029 0.029 0.010
[ 5]  0.046 0.046 0.032 0.029
[ 6]  0.046 0.046 0.032 0.029 0.000
[ 7]  0.166 0.166 0.174 0.170 0.159 0.159
[ 8]  0.166 0.166 0.174 0.170 0.159 0.159 0.000
[ 9]  0.781 0.781 0.765 0.756 0.731 0.731 0.551 0.551
[10]      ?     ?     ?     ?     ?     ?     ?     ?     ?
[11]      ?     ?     ?     ?     ?     ?     ?     ? 3.097 0.150
MEGA Nei-Gojobori JC Non Synonymous Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : n: Nonsynonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 470
dN : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.004 0.004
[ 4]  0.003 0.003 0.003
[ 5]  0.003 0.003 0.005 0.004
[ 6]  0.003 0.003 0.005 0.004 0.000
[ 7]  0.006 0.006 0.005 0.005 0.003 0.003
[ 8]  0.006 0.006 0.005 0.005 0.003 0.003 0.000
[ 9]  0.049 0.049 0.051 0.050 0.048 0.048 0.047 0.047
[10]  2.003 2.003 2.000 2.018 2.025 2.025 2.038 2.038 1.975
[11]  1.987 1.987 1.984 2.002 2.009 2.009 2.022 2.022 1.963 0.003
MEGA P-distance Podoviridae
Title: : Podoviridae.dat
Description
No. of Taxa : 11
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Podoviridae\Step 8 Mega\Podoviridae.meg
Data Title : : Podoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: p-distance
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1413
d : Estimate
[ 1] #Bacteriophage_L
[ 2] #Salmonella_phag
[ 3] #Salmonella_ph_2
[ 4] #Enterobacteria
[ 5] #Salmonella_ph_4
[ 6] #Salmonella_ph_5
[ 7] #Bacteriophage_P
[ 8] #Enterobacteri_7
[ 9] #Escherichia_fer
[10] #Salmonella_ph_9
[11] #Shigella_phage
[         1     2     3     4     5     6     7     8     9    10    11 ]
[ 1]
[ 2]  0.000
[ 3]  0.010 0.010
[ 4]  0.008 0.008 0.004
[ 5]  0.012 0.012 0.011 0.009
[ 6]  0.012 0.012 0.011 0.009 0.000
[ 7]  0.038 0.038 0.039 0.038 0.034 0.034
[ 8]  0.038 0.038 0.039 0.038 0.034 0.034 0.000
[ 9]  0.145 0.145 0.145 0.144 0.140 0.140 0.122 0.122
[10]  0.718 0.718 0.718 0.720 0.718 0.718 0.720 0.720 0.711
[11]  0.713 0.713 0.713 0.714 0.714 0.714 0.715 0.715 0.705 0.034
Distance Matrices for TLS in Podoviridae Using PAUP
Paup-Podoviridae Kimura 2 Distance
#NEXUS
[Distance matrix saved Tuesday, April 20, 2010 10:12 PM]
[!
Distance measure = Kimura 2-parameter
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag 0.00000000
Salmonella_ph_2 0.00939569 0.00940402
Enterobacteria  0.00805147 0.00805563 0.00401104
Salmonella_ph_4 0.01144361 0.01143852 0.01007581 0.00873225
Salmonella_ph_5 0.01144361 0.01143852 0.01007581 0.00873225 0.00000000
Bacteriophage_P 0.03776458 0.03776632 0.03915803 0.03778067 0.03423204 0.03423204
Enterobacteri_7 0.03776458 0.03776632 0.03915803 0.03778067 0.03423204 0.03423204 0.00000000
Escherichia_fer 0.16894521 0.16895662 0.16872907 0.16714604 0.16291463 0.16291463 0.14281318 0.14281318
Salmonella_ph_9 2.52782488 2.52867198 2.58932948 2.65308905 2.57607889 2.57607889 2.65199256 2.65199256 2.33474612
Shigella_phage  2.33306265 2.33347130 2.34759402 2.38732648 2.38805771 2.38805771 2.43845344 2.43845344 2.16910219 0.03488178
;
End;
Paup-Podoviridae Kimura 3 Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 8:43 AM]
[!
Distance measure = Kimura 3-parameter
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag 0.00000000
Salmonella_ph_2 0.00939569 0.00940402
Enterobacteria  0.00805158 0.00805574 0.00401115
Salmonella_ph_4 0.01144465 0.01143956 0.01007592 0.00873225
Salmonella_ph_5 0.01144465 0.01143956 0.01007592 0.00873225 0.00000000
Bacteriophage_P 0.03776471 0.03776645 0.03915816 0.03778067 0.03423254 0.03423254
Enterobacteri_7 0.03776471 0.03776645 0.03915816 0.03778067 0.03423254 0.03423254 0.00000000
Escherichia_fer 0.16905527 0.16906679 0.16882090 0.16724624 0.16301344 0.16301344 0.14295946 0.14295946
Salmonella_ph_9 2.74895644 2.74971652 2.84667826 3.21999645 2.72281551 2.72281551 3.25594544 3.25594544 2.85742188
Shigella_phage  2.51205444 2.51263404 2.55163670 2.67578864 2.57791352 2.57791352 2.91190577 2.91190577 2.30707932 0.03490559
;
End;
Paup-Podoviridae Jukes-Cantor Distance
#NEXUS
[Distance matrix saved Tuesday, April 20, 2010 10:00 PM]
[!
Distance measure = Jukes-Cantor
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag 0.00000000
Salmonella_ph_2 0.00939189 0.00940021
Enterobacteria  0.00804297 0.00804712 0.00401070
Salmonella_ph_4 0.01141984 0.01141477 0.01006727 0.00871713
Salmonella_ph_5 0.01141984 0.01141477 0.01006727 0.00871713 0.00000000
Bacteriophage_P 0.03759329 0.03759509 0.03899647 0.03759329 0.03409678 0.03409678
Enterobacteri_7 0.03759329 0.03759509 0.03899647 0.03759329 0.03409678 0.03409678 0.00000000
Escherichia_fer 0.16735767 0.16736981 0.16735767 0.16569285 0.16154690 0.16154690 0.14195928 0.14195928
Salmonella_ph_9 2.34895587 2.34925485 2.36455727 2.39828563 2.36535621 2.36535621 2.39855313 2.39855313 2.20578408
Shigella_phage  2.24433970 2.24454451 2.24405479 2.27267075 2.27320170 2.27320170 2.30239582 2.30239582 2.10568690 0.03476365
;
End;
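The Jukes-Cantor correction converts an uncorrected proportion of differing sites p into a distance d = -3/4 ln(1 - 4p/3). The Python sketch below applies it to the PAUP uncorrected-p value for Escherichia_fer versus Bacteriophage_L (0.15000001, taken from the uncorrected-p matrix in this appendix); the function name jukes_cantor is an illustrative assumption.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance from an uncorrected p-distance (defined for 0 <= p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

# Uncorrected p for Escherichia_fer vs Bacteriophage_L from the PAUP p matrix.
jc_d = jukes_cantor(0.15000001)
print(round(jc_d, 6))
```

The result agrees with the corresponding Jukes-Cantor entry in the matrix above (0.16735767) to within rounding. The correction grows rapidly as p approaches 0.75, which is why the Salmonella_ph_9 and Shigella_phage rows exceed 2.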
Paup-Podoviridae Absolute Distance
#NEXUS
[Distance matrix saved Tuesday, April 20, 2010 10:03 PM]
[!
Distance measure = absolute
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag    0
Salmonella_ph_2   14   14
Enterobacteria    12   12    6
Salmonella_ph_4   17   17   15   13
Salmonella_ph_5   17   17   15   13    0
Bacteriophage_P   55   55   57   55   50   50
Enterobacteri_7   55   55   57   55   50   50    0
Escherichia_fer  225  225  225  223  218  218  194  194
Salmonella_ph_9 1014 1014 1015 1017 1015 1015 1017 1017 1004
Shigella_phage  1007 1007 1007 1009 1009 1009 1011 1011  996   48
;
End;
Paup-Podoviridae Maximum Likelihood
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 9:44 AM]
[!
Distance measure = maximum-likelihood
Likelihood settings:
Number of substitution types = 6
User-specified substitution rate matrix =
    -         1.765600  2.645500  0.625000
 1.765600     -         0.424800  4.501700
 2.645500  0.424800     -         1.000000
 0.625000  4.501700  1.000000     -
Assumed nucleotide frequencies (set by user):
A=0.27040 C=0.23490 G=0.27020 T=0.22450
Among-site rate variation:
Assumed proportion of invariable sites = none
Distribution of rates at variable sites = gamma (continuous) with shape parameter
(alpha) = 0.4516
These settings correspond to the GTR+G model
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag 0.00000000
Salmonella_ph_2 0.00958093 0.00958092
Enterobacteria  0.00812216 0.00812215 0.00405087
Salmonella_ph_4 0.01155554 0.01155543 0.01027630 0.00880866
Salmonella_ph_5 0.01155554 0.01155543 0.01027630 0.00880866 0.00000000
Bacteriophage_P 0.04013967 0.04013967 0.04186995 0.03998624 0.03625951 0.03625951
Enterobacteri_7 0.04013967 0.04013967 0.04186995 0.03998624 0.03625951 0.03625951 0.00000000
Escherichia_fer 0.22454868 0.22454868 0.22672102 0.22263619 0.21596409 0.21596409 0.18455921 0.18455921
Salmonella_ph_9 167.28 167.28 164.261 168.22 174.111 174.111 162.737 162.737 151.11
Shigella_phage  158.535 158.535 158.013 161.773 168.379 168.379 152.16 152.16 151.111 0.03659397
;
End;
Paup-Podoviridae Uncorrected-P
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 9:05 AM]
[!
Distance measure = uncorrected ("p")
]
Begin taxa;
Dimensions ntax=11;
Taxlabels
Bacteriophage_L
Salmonella_phag
Salmonella_ph_2
Enterobacteria
Salmonella_ph_4
Salmonella_ph_5
Bacteriophage_P
Enterobacteri_7
Escherichia_fer
Salmonella_ph_9
Shigella_phage
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Bacteriophage_L
Salmonella_phag 0.00000000
Salmonella_ph_2 0.00933333 0.00934155
Enterobacteria  0.00800000 0.00800410 0.00400000
Salmonella_ph_4 0.01133333 0.01132834 0.01000000 0.00866667
Salmonella_ph_5 0.01133333 0.01132834 0.01000000 0.00866667 0.00000000
Bacteriophage_P 0.03666667 0.03666838 0.03800000 0.03666667 0.03333334 0.03333334
Enterobacteri_7 0.03666667 0.03666838 0.03800000 0.03666667 0.03333334 0.03333334 0.00000000
Escherichia_fer 0.15000001 0.15000972 0.15000001 0.14866666 0.14533333 0.14533333 0.12933333 0.12933333
Salmonella_ph_9 0.71727526 0.71728826 0.71794891 0.71935838 0.71798307 0.71798307 0.71936929 0.71936929 0.71039212
Shigella_phage  0.71237683 0.71238708 0.71236253 0.71377152 0.71379715 0.71379715 0.71517926 0.71517926 0.70473695 0.03397028
;
End;
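The PAUP matrices in this appendix use the NEXUS layout triangle=lower labels nodiagonal, in which each row holds a taxon label followed by its distances to every taxon listed before it. As a minimal sketch of how such rows can be read back into pairwise distances, the Python below parses the first four rows of the uncorrected-p matrix. The function name parse_lower_triangle is hypothetical, and the parser assumes one taxon per line with whitespace-separated values; it is not a full NEXUS reader.

```python
def parse_lower_triangle(lines):
    """Parse rows of a NEXUS 'triangle=lower labels nodiagonal' distance matrix.

    Each row is a taxon label followed by distances to all earlier taxa.
    Returns the label order and a symmetric dict keyed by (label, label).
    """
    labels, dist = [], {}
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] == ";":
            continue
        label = tokens[0].strip("'")
        for other, value in zip(labels, tokens[1:]):
            dist[(label, other)] = dist[(other, label)] = float(value)
        labels.append(label)
    return labels, dist

# First four rows of the Podoviridae uncorrected-p matrix.
rows = [
    "Bacteriophage_L",
    "Salmonella_phag 0.00000000",
    "Salmonella_ph_2 0.00933333 0.00934155",
    "Enterobacteria  0.00800000 0.00800410 0.00400000",
]
labels, dist = parse_lower_triangle(rows)
print(dist[("Enterobacteria", "Salmonella_ph_2")])
```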
Distance Matrices for TLS in Siphoviridae Using MEGA4
MEGA-Siphoviridae P-distances
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: p-distance
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.022
[3]  0.029 0.024
[4]  0.300 0.304 0.299
[5]  0.594 0.594 0.586 0.607
[6]  0.593 0.591 0.586 0.600 0.151
[7]  0.594 0.589 0.588 0.585 0.574 0.563
[8]  0.553 0.556 0.562 0.532 0.568 0.583 0.493
[9]  0.622 0.619 0.623 0.619 0.574 0.572 0.640 0.641
MEGA-Siphoviridae Kimura 2
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Kimura 2-parameter
->Substitutions to Include : d: Transitions + Transversions
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.022
[3]  0.029 0.025
[4]  0.386 0.392 0.385
[5]  1.193 1.195 1.151 1.253
[6]  1.186 1.178 1.151 1.211 0.169
[7]  1.176 1.156 1.151 1.136 1.093 1.049
[8]  1.016 1.032 1.058 0.926 1.063 1.127 0.808
[9]  1.337 1.315 1.343 1.326 1.101 1.094 1.439 1.452
MEGA-Siphoviridae Nei-Gojobori JC Synonymous
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : s: Synonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 382
dS : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.097
[3]  0.132 0.106
[4]  1.564 1.668 1.595
[5]      ?     ?     ?     ?
[6]      ?     ?     ?     ? 0.694
[7]      ?     ?     ?     ?     ?     ?
[8]      ?     ?     ?     ?     ?     ?     ?
[9]      ? 2.735     ? 2.553 2.860 2.887     ?     ?
MEGA-Siphoviridae Nei-Gojobori JC Non-Synonymous
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
Substitution Model : ==============================
->Model : Codon: Nei-Gojobori (Jukes-Cantor)
->Substitutions to Include : n: Nonsynonymous only
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 382
dN : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.001
[3]  0.001 0.002
[4]  0.225 0.226 0.222
[5]  0.919 0.912 0.910 0.995
[6]  0.928 0.918 0.919 1.005 0.060
[7]  0.935 0.927 0.936 0.896 0.854 0.844
[8]  0.720 0.716 0.722 0.705 0.842 0.851 0.577
[9]  1.140 1.139 1.142 1.152 0.917 0.904 1.195 1.144
MEGA-Siphoviridae Jukes-Cantor
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Jukes-Cantor
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.022
[3]  0.029 0.025
[4]  0.384 0.389 0.382
[5]  1.176 1.176 1.139 1.241
[6]  1.171 1.163 1.139 1.205 0.168
[7]  1.176 1.155 1.151 1.135 1.089 1.042
[8]  1.001 1.015 1.039 0.926 1.063 1.127 0.805
[9]  1.328 1.308 1.333 1.308 1.089 1.078 1.438 1.449
MEGA-Siphoviridae Tajima-Nei
Title: : Siphoviridae.dat
Description
No. of Taxa : 9
Data File : C:\Users\Thomas Dang\Desktop\Bioinformatics 503\Gene
Project\Siphoviridae\Step 8 Mega\Siphoviridae.meg
Data Title : : Siphoviridae.dat
Data Type : Nucleotide (Coding)
Analysis : Pairwise distance calculation
->Compute : Distances only
Include Sites : ==============================
->Gaps/Missing Data : Complete Deletion
->Codon Positions : 1st+2nd+3rd+Noncoding
Substitution Model : ==============================
->Model : Nucleotide: Tajima-Nei
->Substitutions to Include : All
->Pattern among Lineages : Same (Homogeneous)
->Rates among sites : Uniform rates
No. of Sites : 1149
d : Estimate
[1] #Enterobact
[2] #Staphyloco
[3] #Enteroba02
[4] #Erwinia_ph
[5] #Methanothe
[6] #Methanobac
[7] #Methanosar
[8] #Mesorhizob
[9] #PG
[        1     2     3     4     5     6     7     8     9 ]
[1]
[2]  0.022
[3]  0.030 0.025
[4]  0.390 0.397 0.389
[5]  1.204 1.207 1.166 1.262
[6]  1.193 1.185 1.159 1.218 0.171
[7]  1.191 1.168 1.165 1.144 1.101 1.057
[8]  1.025 1.040 1.065 0.931 1.078 1.145 0.813
[9]  1.397 1.375 1.405 1.372 1.114 1.109 1.491 1.478
Distance Matrices for TLS in Siphoviridae Using PAUP
Paup-Siphoviridae Uncorrected-P Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 11:17 AM]
[!
Distance measure = uncorrected ("p")
]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco   0.01872659
Enterobact   0.02621723 0.02496879
'Erwinia_ph' 0.32642516 0.32704648 0.32642651
Methanothe   0.61569929 0.61563504 0.61059678 0.62127924
Methanobac   0.61516076 0.61367792 0.61081469 0.61692840 0.14173973
Methanosar   0.59270149 0.58976126 0.58662319 0.58061647 0.57870853 0.56909096
Mesorhizob   0.55757636 0.56030637 0.56570482 0.54909956 0.57451689 0.58954138 0.49239403
PG           0.64121032 0.63756287 0.63984722 0.64035785 0.59632415 0.59191257 0.64512575 0.64963400
;
End;
Paup-Siphoviridae Kimura 2 Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 10:41 AM]
[!
Distance measure = Kimura 2-parameter
]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco   0.01903308
Enterobact   0.02683690 0.02551383
'Erwinia_ph' 0.43105349 0.43232006 0.43144304
Methanothe   1.31110489 1.31271875 1.27936828 1.33123112
Methanobac   1.30317140 1.29524052 1.27838397 1.30103672 0.15774627
Methanosar   1.17304754 1.15919125 1.14386094 1.11622107 1.11354053 1.07532048
Mesorhizob   1.03279066 1.04586089 1.06955695 0.98928696 1.08953953 1.15671420 0.80430073
PG           1.44911003 1.42361176 1.43993366 1.45288587 1.19708908 1.17862737 1.47740006 1.51525807
;
End;
Paup-Siphoviridae Kimura 3 Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 11:13 AM]
[!
Distance measure = Kimura 3-parameter
]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco   0.01903318
Enterobact   0.02683733 0.02551394
'Erwinia_ph' 0.43108186 0.43238878 0.43147168
Methanothe   1.31718814 1.31892180 1.28548586 1.33131921
Methanobac   1.30336940 1.29525995 1.27838397 1.30229473 0.15805787
Methanosar   1.17319810 1.15937960 1.14386559 1.11631536 1.11552358 1.07675862
Mesorhizob   1.03346443 1.04645586 1.07101309 0.99119377 1.08969009 1.15691733 0.80609429
PG           1.45593929 1.43164933 1.45032740 1.46084988 1.20080078 1.18081725 1.48189211 1.53421533
;
End;
Paup-Siphoviridae Jukes-Cantor Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 11:09 AM]
[!
Distance measure = Jukes-Cantor
]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco   0.01896435
Enterobact   0.02668642 0.02539388
'Erwinia_ph' 0.42850727 0.42960823 0.42850965
Methanothe   1.28999376 1.28963506 1.26202691 1.32182097
Methanobac   1.28699243 1.27878964 1.26320028 1.29688931 0.15710275
Methanosar   1.17144573 1.15755630 1.14301050 1.11593080 1.10753000 1.06655908
Mesorhizob   1.02028036 1.03099704 1.05265105 0.98794788 1.08939767 1.15652788 0.80148149
PG           1.44799256 1.42325914 1.43865359 1.44213867 1.18892062 1.16769385 1.47548342 1.50843751
;
End;
Paup-Siphoviridae Absolute Distance
#NEXUS
[Distance matrix saved Wednesday, April 21, 2010 10:35 AM]
[!
Distance measure = absolute
]
Begin taxa;
Dimensions ntax=9;
Taxlabels
Enterobact
Staphyloco
Enterobact
'Erwinia_ph'
Methanothe
Methanobac
Methanosar
Mesorhizob
PG
;
End;
Begin distances;
Format triangle=lower labels nodiagonal;
Matrix
Enterobact
Staphyloco
30
Enterobact
42 40
'Erwinia_ph'
523 524 523
Methanothe
857 857 850 864
Methanobac
858 856 852 860 199
Methanosar
802 798 794 786 709 697
Mesorhizob
737 741 748 726 686 704 667
PG
935 930 933 937 830 826 804 787
;
End;
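A lower-triangle block of this kind (`Format triangle=lower labels nodiagonal`) can be expanded into a full symmetric table. The Python sketch below assumes each taxon label and its values have been joined on one line, which is the usual NEXUS layout. The function name `parse_lower_triangle` is hypothetical, and rows are kept by position rather than by name because the truncated label `Enterobact` occurs twice.

```python
def parse_lower_triangle(rows):
    """Expand label-first rows of a lower triangle (no diagonal) into a full matrix.

    rows: strings such as "Staphyloco 30"; row i must carry exactly i values.
    Returns (labels, matrix) with matrix[i][j] == matrix[j][i] and a zero diagonal.
    """
    labels, values = [], []
    for i, row in enumerate(rows):
        fields = row.split()
        label, nums = fields[0], [float(x) for x in fields[1:]]
        if len(nums) != i:
            raise ValueError(f"row {i} ({label}) has {len(nums)} values, expected {i}")
        labels.append(label.strip("'"))  # drop NEXUS quoting around labels
        values.append(nums)

    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            matrix[i][j] = matrix[j][i] = values[i][j]
    return labels, matrix


# First four taxa of the absolute-distance matrix above.
labels, matrix = parse_lower_triangle([
    "Enterobact",
    "Staphyloco 30",
    "Enterobact 42 40",
    "'Erwinia_ph' 523 524 523",
])
```

The row-length check is the useful part of the sketch: a lower triangle with a missing or extra value fails immediately instead of producing a silently shifted matrix.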
APPENDIX D
Modeltest for Podoviridae Using PAUP
Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Mon Apr 19 00:15:54 2010
OS = Macintosh (Sioux console)
Input format: PAUP* scores file
Run settings:
Using the standard AIC (not the AICc)
Not using branch lengths as parameters
Including all models in model-averaging calculations
---------------------------------------------------------------
*                                                             *
*         HIERARCHICAL LIKELIHOOD RATIO TESTS (hLRTs)         *
*                                                             *
---------------------------------------------------------------
Confidence level = 0.01

Equal base frequencies
   Null model = JC                 -lnL0 = 5627.0679
   Alternative model = F81         -lnL1 = 5618.3013
   2(lnL1-lnL0) = 17.5332          df = 3
   P-value = 0.000549
Ti=Tv
   Null model = F81                -lnL0 = 5618.3013
   Alternative model = HKY         -lnL1 = 5569.3418
   2(lnL1-lnL0) = 97.9189          df = 1
   P-value = <0.000001
Equal Ti rates
   Null model = HKY                -lnL0 = 5569.3418
   Alternative model = TrN         -lnL1 = 5563.2563
   2(lnL1-lnL0) = 12.1709          df = 1
   P-value = 0.000485
Equal Tv rates
   Null model = TrN                -lnL0 = 5563.2563
   Alternative model = TIM         -lnL1 = 5555.7939
   2(lnL1-lnL0) = 14.9248          df = 1
   P-value = 0.000112
Only two Tv rates
   Null model = TIM                -lnL0 = 5555.7939
   Alternative model = GTR         -lnL1 = 5552.8652
   2(lnL1-lnL0) = 5.8574           df = 2
   P-value = 0.053466
Equal rates among sites
   Null model = TIM                -lnL0 = 5555.7939
   Alternative model = TIM+G       -lnL1 = 5543.8823
   2(lnL1-lnL0) = 23.8232          df = 1
   Using mixed chi-square distribution
   P-value = <0.000001
No Invariable sites
   Null model = TIM+G              -lnL0 = 5543.8823
   Alternative model = TIM+I+G     -lnL1 = 5543.8823
   2(lnL1-lnL0) = 0.0000           df = 1
   Using mixed chi-square distribution
   P-value = >0.999999

Model selected: TIM+G
   -lnL = 5543.8823
   K = 7

Base frequencies:
   freqA = 0.2725
   freqC = 0.2362
   freqG = 0.2680
   freqT = 0.2232
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.0000
   R(b) [A-G] = 1.9111
   R(c) [A-T] = 0.3732
   R(d) [C-G] = 0.3732
   R(e) [C-T] = 3.2466
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 0.4507
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (TIM+G) selected by hLRT in
Modeltest 3.7 on Mon Apr 19 00:15:55 2010
]
BEGIN PAUP;
Lset Base=(0.2725 0.2362 0.2680) Nst=6 Rmat=(1.0000 1.9111 0.3732
0.3732 3.2466) Rates=gamma Shape=0.4507 Pinvar=0;
END;
--
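The likelihood ratio statistics and P-values reported for the Podoviridae hLRTs can be re-derived from the -lnL scores. The Python sketch below is an independent illustration, not Modeltest source code. The function names `chi2_sf` and `lrt` are hypothetical, and the "mixed chi-square" is assumed here to be the 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom, which halves the ordinary df = 1 P-value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus half-integer power terms.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def lrt(neg_lnl_null: float, neg_lnl_alt: float, df: int, mixed: bool = False):
    """Return (2(lnL1 - lnL0), P-value) from the two negative log-likelihoods."""
    stat = 2.0 * (neg_lnl_null - neg_lnl_alt)
    if mixed:
        # Assumed 50:50 mixture of chi-square(0) and chi-square(df).
        p = 1.0 if stat <= 0.0 else 0.5 * chi2_sf(stat, df)
    else:
        p = chi2_sf(stat, df)
    return stat, p


# Podoviridae scores copied from the tests above.
jc_vs_f81 = lrt(5627.0679, 5618.3013, df=3)
tim_vs_gtr = lrt(5555.7939, 5552.8652, df=2)
tim_vs_tim_g = lrt(5555.7939, 5543.8823, df=1, mixed=True)
```

With the 0.01 confidence level used in these runs, the TIM versus GTR comparison is the first test that fails to reject its null model, which is why TIM is carried forward into the rate-variation tests.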
---------------------------------------------------------------
*                                                             *
*             AKAIKE INFORMATION CRITERION (AIC)              *
*                                                             *
---------------------------------------------------------------

Model selected: GTR+G
   -lnL = 5540.9497
   K = 9
   AIC = 11099.8994

Base frequencies:
   freqA = 0.2704
   freqC = 0.2349
   freqG = 0.2702
   freqT = 0.2245
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.7656
   R(b) [A-G] = 2.6455
   R(c) [A-T] = 0.6250
   R(d) [C-G] = 0.4248
   R(e) [C-T] = 4.5017
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 0.4516
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in
Modeltest 3.7 on Mon Apr 19 00:15:55 2010
]
BEGIN PAUP;
Lset Base=(0.2704 0.2349 0.2702) Nst=6 Rmat=(1.7656 2.6455 0.6250
0.4248 4.5017) Rates=gamma Shape=0.4516 Pinvar=0;
END;
--
* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model      -lnL        K  AIC            delta    weight  cumWeight
-----------------------------------------------------------------------
GTR+G      5540.9497   9  11099.8994    0.0000    0.5092  0.5092
TIM+G      5543.8823   7  11101.7646    1.8652    0.2004  0.7095
GTR+I+G    5540.9497  10  11101.8994    2.0000    0.1873  0.8968
TIM+I+G    5543.8823   8  11103.7646    3.8652    0.0737  0.9705
TVM+G      5545.5215   8  11107.0430    7.1436    0.0143  0.9848
K81uf+G    5548.4287   6  11108.8574    8.9580    0.0058  0.9906
TVM+I+G    5545.4644   9  11108.9287    9.0293    0.0056  0.9962
K81uf+I+G  5548.2817   7  11110.5635   10.6641    0.0025  0.9987
SYM+G      5550.6514   6  11113.3027   13.4033    0.0006  0.9993
SYM+I+G    5550.6514   7  11115.3027   15.4033    0.0002  0.9995
TrN+I+G    5550.7544   7  11115.5088   15.6094    0.0002  0.9997
TVMef+I+G  5552.6396   6  11117.2793   17.3799  8.57e-05  0.9998
K81+G      5555.8398   3  11117.6797   17.7803  7.01e-05  0.9999
TIMef+I+G  5553.8770   5  11117.7539   17.8545  6.76e-05  0.9999
K81+I+G    5555.8164   4  11119.6328   19.7334  2.64e-05  1.0000
GTR        5552.8652   8  11121.7305   21.8311  9.25e-06  1.0000
HKY+I+G    5554.9282   6  11121.8564   21.9570  8.69e-06  1.0000
GTR+I      5552.5342   9  11123.0684   23.1689  4.74e-06  1.0000
TIM        5555.7939   6  11123.5879   23.6885  3.66e-06  1.0000
TIM+I      5555.4390   7  11124.8779   24.9785  1.92e-06  1.0000
TrNef+G    5560.7129   3  11127.4258   27.5264  5.37e-07  1.0000
K80+G      5562.6597   2  11129.3193   29.4199  2.08e-07  1.0000
TrNef+I+G  5560.7129   4  11129.4258   29.5264  1.97e-07  1.0000
K80+I+G    5562.5195   3  11131.0391   31.1396  8.81e-08  1.0000
TVM        5558.9619   7  11131.9238   32.0244  5.66e-08  1.0000
TVM+I      5557.9658   8  11131.9316   32.0322  5.64e-08  1.0000
K81uf+I    5560.8452   6  11133.6904   33.7910  2.34e-08  1.0000
K81uf      5562.0234   5  11134.0469   34.1475  1.96e-08  1.0000
SYM        5562.9722   5  11135.9443   36.0449  7.58e-09  1.0000
TrN        5563.2563   5  11136.5127   36.6133  5.71e-09  1.0000
SYM+I      5562.4590   6  11136.9180   37.0186  4.66e-09  1.0000
TrN+I      5562.6582   6  11137.3164   37.4170  3.82e-09  1.0000
TIMef      5566.2280   3  11138.4561   38.5566  2.16e-09  1.0000
TrN+G      5563.2563   6  11138.5127   38.6133  2.10e-09  1.0000
TIMef+I    5565.6860   4  11139.3721   39.4727  1.37e-09  1.0000
TVMef+I    5565.0259   5  11140.0518   40.1523  9.72e-10  1.0000
TVMef      5566.1211   4  11140.2422   40.3428  8.84e-10  1.0000
TIMef+G    5566.2280   4  11140.4561   40.5566  7.94e-10  1.0000
TVMef+G    5566.1211   5  11142.2422   42.3428  3.25e-10  1.0000
K81+I      5568.2261   3  11142.4521   42.5527  2.93e-10  1.0000
K81        5569.4849   2  11142.9697   43.0703  2.26e-10  1.0000
HKY+I      5567.9131   5  11145.8262   45.9268  5.42e-11  1.0000
HKY        5569.3418   4  11146.6836   46.7842  3.53e-11  1.0000
HKY+G      5569.3418   5  11148.6836   48.7842  1.30e-11  1.0000
TrNef      5573.6636   2  11151.3271   51.4277  3.46e-12  1.0000
TrNef+I    5572.8472   3  11151.6943   51.7949  2.88e-12  1.0000
K80+I      5575.2930   2  11154.5859   54.6865  6.79e-13  1.0000
K80        5576.8281   1  11155.6562   55.7568  3.98e-13  1.0000
F81+G      5609.3760   4  11226.7520  126.8525  1.45e-28  1.0000
F81+I+G    5609.3760   5  11228.7520  128.8525  5.33e-29  1.0000
JC+G       5617.7881   1  11237.5762  137.6768  6.47e-31  1.0000
JC+I+G     5617.7690   2  11239.5381  139.6387  2.42e-31  1.0000
F81        5618.3013   3  11242.6025  142.7031  5.24e-32  1.0000
F81+I      5618.2539   4  11244.5078  144.6084  2.02e-32  1.0000
JC         5627.0679   0  11254.1357  154.2363  1.64e-34  1.0000
JC+I       5626.8755   1  11255.7510  155.8516  7.31e-35  1.0000
-----------------------------------------------------------------------
-lnL:      Negative log likelihood
K:         Number of estimated parameters
IC:        Information Criterion
delta:     Information difference
weight:    Information weight
cumWeight: Cumulative information weight
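The columns defined in this legend follow from the -lnL and K values alone: AIC = 2(-lnL) + 2K, delta is the difference from the smallest AIC, and each weight is exp(-delta/2) normalised over the candidate set. The Python sketch below shows that arithmetic for three of the Podoviridae models. The function name `akaike_table` is hypothetical, and because only three of the 56 models are included, the weights it returns are normalised over those three and are larger than the values in the full table.

```python
import math


def akaike_table(models):
    """models: dict of name -> (negative log likelihood, K).

    Returns dict of name -> (AIC, delta, weight), with weights normalised
    over exactly the models supplied.
    """
    aic = {name: 2.0 * neg_lnl + 2.0 * k for name, (neg_lnl, k) in models.items()}
    best = min(aic.values())
    relative = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(relative.values())
    return {name: (aic[name], aic[name] - best, relative[name] / total) for name in aic}


# Scores copied from the Podoviridae Akaike weights table.
table = akaike_table({
    "GTR+G": (5540.9497, 9),
    "TIM+G": (5543.8823, 7),
    "GTR+I+G": (5540.9497, 10),
})
ratio = table["TIM+G"][2] / table["GTR+G"][2]  # ratio of weights is independent of the set size
```

The ratio of two weights does not depend on how many other models are in the set, so it can be compared directly with the ratio of the tabulated weights for the same pair.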
* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

Parameter   Importance   Model-averaged estimates
--------------------------------------
fA          0.9989       0.2709
fC          0.9989       0.2353
fG          0.9989       0.2695
fT          0.9989       0.2243
TiTv        0.0000       1.9240
rAC         0.7173       1.7660
rAG         0.9717       2.4390
rAT         0.7173       0.6256
rCG         0.7173       0.4241
rCT         0.9717       4.1478
pinv(I)     0.0000       0.0182
alpha(G)    0.7303       0.4498
pinv(IG)    0.2697       0.0005
alpha(IG)   0.2697       0.4497
--------------------------------------
Values have been rounded.
(I):  averaged using only +I models.
(G):  averaged using only +G models.
(IG): averaged using only +I+G models.
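The importance of a parameter is the sum of the Akaike weights of the models that contain it. The Python sketch below illustrates this for alpha(G) using only the six best Podoviridae models, with their rounded weights copied from the Akaike weights table. The names `parameter_importance` and `gamma_only` are hypothetical, and the result is slightly below the reported 0.7303 because the 50 lower-ranked models are left out.

```python
def parameter_importance(weights, has_parameter):
    """Sum the Akaike weights of the models for which has_parameter(name) is true."""
    return sum(w for name, w in weights.items() if has_parameter(name))


def gamma_only(name):
    """True for +G models without an invariable-sites term (not +I+G)."""
    return name.endswith("+G") and "+I" not in name


# Rounded Akaike weights of the six best Podoviridae models.
weights = {
    "GTR+G": 0.5092,
    "TIM+G": 0.2004,
    "GTR+I+G": 0.1873,
    "TIM+I+G": 0.0737,
    "TVM+G": 0.0143,
    "K81uf+G": 0.0058,
}
alpha_g_importance = parameter_importance(weights, gamma_only)
```

Adding the remaining +G models (for example SYM+G, with weight 0.0006) brings the sum up to the tabulated importance.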
_________________________________________________________________
Program is done.
Time processing: 2.81023 seconds
If you need help type '-?' in the command line of the program.
Modeltest for Siphoviridae Using PAUP
Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Wed Apr 21 14:59:56 2010
OS = Macintosh (Sioux console)
Input format: PAUP* scores file
Run settings:
Using the standard AIC (not the AICc)
Not using branch lengths as parameters
Including all models in model-averaging calculations
---------------------------------------------------------------
*                                                             *
*         HIERARCHICAL LIKELIHOOD RATIO TESTS (hLRTs)         *
*                                                             *
---------------------------------------------------------------
Confidence level = 0.01

Equal base frequencies
   Null model = JC                 -lnL0 = 12483.3057
   Alternative model = F81         -lnL1 = 12446.4746
   2(lnL1-lnL0) = 73.6621          df = 3
   P-value = <0.000001
Ti=Tv
   Null model = F81                -lnL0 = 12446.4746
   Alternative model = HKY         -lnL1 = 12412.4404
   2(lnL1-lnL0) = 68.0684          df = 1
   P-value = <0.000001
Equal Ti rates
   Null model = HKY                -lnL0 = 12412.4404
   Alternative model = TrN         -lnL1 = 12402.4414
   2(lnL1-lnL0) = 19.9980          df = 1
   P-value = 0.000008
Equal Tv rates
   Null model = TrN                -lnL0 = 12402.4414
   Alternative model = TIM         -lnL1 = 12401.0684
   2(lnL1-lnL0) = 2.7461           df = 1
   P-value = 0.097492
Equal rates among sites
   Null model = TrN                -lnL0 = 12402.4414
   Alternative model = TrN+G       -lnL1 = 12332.0967
   2(lnL1-lnL0) = 140.6895         df = 1
   Using mixed chi-square distribution
   P-value = <0.000001
No Invariable sites
   Null model = TrN+G              -lnL0 = 12332.0967
   Alternative model = TrN+I+G     -lnL1 = 12330.3906
   2(lnL1-lnL0) = 3.4121           df = 1
   Using mixed chi-square distribution
   P-value = 0.032360

Model selected: TrN+G
   -lnL = 12332.0967
   K = 6

Base frequencies:
   freqA = 0.2993
   freqC = 0.2111
   freqG = 0.2432
   freqT = 0.2464
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.0000
   R(b) [A-G] = 1.4888
   R(c) [A-T] = 1.0000
   R(d) [C-G] = 1.0000
   R(e) [C-T] = 2.2758
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 1.4775
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (TrN+G) selected by hLRT in
Modeltest 3.7 on Wed Apr 21 14:59:57 2010
]
BEGIN PAUP;
Lset Base=(0.2993 0.2111 0.2432) Nst=6 Rmat=(1.0000 1.4888 1.0000
1.0000 2.2758) Rates=gamma Shape=1.4775 Pinvar=0;
END;
--
---------------------------------------------------------------
*                                                             *
*             AKAIKE INFORMATION CRITERION (AIC)              *
*                                                             *
---------------------------------------------------------------

Model selected: GTR+G
   -lnL = 12320.1611
   K = 9
   AIC = 24658.3223

Base frequencies:
   freqA = 0.2916
   freqC = 0.2088
   freqG = 0.2499
   freqT = 0.2497
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.7834
   R(b) [A-G] = 1.8181
   R(c) [A-T] = 1.1345
   R(d) [C-G] = 0.9127
   R(e) [C-T] = 2.7446
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 1.4244
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in
Modeltest 3.7 on Wed Apr 21 14:59:57 2010
]
BEGIN PAUP;
Lset Base=(0.2916 0.2088 0.2499) Nst=6 Rmat=(1.7834 1.8181 1.1345
0.9127 2.7446) Rates=gamma Shape=1.4244 Pinvar=0;
END;
--
* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model      -lnL         K  AIC            delta    weight  cumWeight
-----------------------------------------------------------------------
GTR+G      12320.1611   9  24658.3223    0.0000    0.5214  0.5214
GTR+I+G    12319.2607  10  24658.5215    0.1992    0.4719  0.9933
TVM+I+G    12325.1426   9  24668.2852    9.9629    0.0036  0.9969
TVM+G      12326.6514   8  24669.3027   10.9805    0.0022  0.9990
TIM+I+G    12328.0762   8  24672.1523   13.8301    0.0005  0.9995
TIM+G      12329.8301   7  24673.6602   15.3379    0.0002  0.9998
TrN+I+G    12330.3906   7  24674.7812   16.4590    0.0001  0.9999
TrN+G      12332.0967   6  24676.1934   17.8711  6.86e-05  1.0000
K81uf+I+G  12334.2305   7  24682.4609   24.1387  2.99e-06  1.0000
HKY+I+G    12336.6318   6  24685.2637   26.9414  7.36e-07  1.0000
K81uf+G    12336.7812   6  24685.5625   27.2402  6.34e-07  1.0000
HKY+G      12339.1143   5  24688.2285   29.9062  1.67e-07  1.0000
GTR+I      12344.2178   9  24706.4355   48.1133  1.86e-11  1.0000
TIM+I      12349.1914   7  24712.3828   54.0605  9.51e-13  1.0000
TVM+I      12348.6953   8  24713.3906   55.0684  5.74e-13  1.0000
TrN+I      12351.0020   6  24714.0039   55.6816  4.23e-13  1.0000
K81uf+I    12354.2031   6  24720.4062   62.0840  1.72e-14  1.0000
HKY+I      12356.0781   5  24722.1562   63.8340  7.17e-15  1.0000
TVMef+I+G  12355.8789   6  24723.7578   65.4355  3.22e-15  1.0000
SYM+I+G    12355.0518   7  24724.1035   65.7812  2.71e-15  1.0000
TVMef+G    12357.1084   5  24724.2168   65.8945  2.56e-15  1.0000
SYM+G      12356.1113   6  24724.2227   65.9004  2.55e-15  1.0000
K81+I+G    12370.6670   4  24749.3340   91.0117  9.00e-21  1.0000
TIMef+I+G  12369.7188   5  24749.4375   91.1152  8.55e-21  1.0000
TIMef+G    12371.5820   4  24751.1641   92.8418  3.60e-21  1.0000
K80+I+G    12372.6357   3  24751.2715   92.9492  3.42e-21  1.0000
TrNef+I+G  12371.6709   4  24751.3418   93.0195  3.30e-21  1.0000
K81+G      12372.7578   3  24751.5156   93.1934  3.02e-21  1.0000
TrNef+G    12373.4590   3  24752.9180   94.5957  1.50e-21  1.0000
K80+G      12374.6484   2  24753.2969   94.9746  1.24e-21  1.0000
F81+I+G    12375.0498   5  24760.0996  101.7773  4.13e-23  1.0000
F81+G      12376.5391   4  24761.0781  102.7559  2.54e-23  1.0000
TVMef+I    12380.7656   5  24771.5312  113.2090  1.36e-25  1.0000
SYM+I      12380.0293   6  24772.0586  113.7363  1.05e-25  1.0000
K81+I      12390.1982   3  24786.3965  128.0742  8.06e-29  1.0000
TIMef+I    12389.3945   4  24786.7891  128.4668  6.62e-29  1.0000
K80+I      12391.7969   2  24787.5938  129.2715  4.43e-29  1.0000
TrNef+I    12390.9854   3  24787.9707  129.6484  3.67e-29  1.0000
F81+I      12394.6074   4  24797.2148  138.8926  3.61e-31  1.0000
GTR        12393.5068   8  24803.0137  144.6914  1.99e-32  1.0000
TIM        12401.0684   6  24814.1367  155.8145  7.63e-35  1.0000
TrN        12402.4414   5  24814.8828  156.5605  5.25e-35  1.0000
TVM        12402.7080   7  24819.4160  161.0938  5.45e-36  1.0000
JC+I+G     12412.5430   2  24829.0859  170.7637  4.33e-38  1.0000
JC+G       12413.7041   1  24829.4082  171.0859  3.68e-38  1.0000
K81uf      12411.0645   5  24832.1289  173.8066  9.45e-39  1.0000
HKY        12412.4404   4  24832.8809  174.5586  6.49e-39  1.0000
JC+I       12432.0449   1  24866.0898  207.7676  0.00e+00  1.0000
SYM        12432.4541   5  24874.9082  216.5859  0.00e+00  1.0000
TVMef      12434.7578   4  24877.5156  219.1934  0.00e+00  1.0000
TIMef      12443.1348   3  24892.2695  233.9473  0.00e+00  1.0000
TrNef      12444.2129   2  24892.4258  234.1035  0.00e+00  1.0000
K81        12445.5957   2  24895.1914  236.8691  0.00e+00  1.0000
K80        12446.6660   1  24895.3320  237.0098  0.00e+00  1.0000
F81        12446.4746   3  24898.9492  240.6270  0.00e+00  1.0000
JC         12483.3057   0  24966.6113  308.2891  0.00e+00  1.0000
-----------------------------------------------------------------------
-lnL:      Negative log likelihood
K:         Number of estimated parameters
IC:        Information Criterion
delta:     Information difference
weight:    Information weight
cumWeight: Cumulative information weight
* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

Parameter   Importance   Model-averaged estimates
--------------------------------------
fA          1.0000       0.2917
fC          1.0000       0.2090
fG          1.0000       0.2496
fT          1.0000       0.2497
TiTv        0.0000       0.9076
rAC         0.9990       1.7627
rAG         0.9943       1.8121
rAT         0.9990       1.1260
rCG         0.9990       0.9057
rCT         0.9943       2.7154
pinv(I)     0.0000       0.0716
alpha(G)    0.5238       1.4243
pinv(IG)    0.4762       0.0240
alpha(IG)   0.4762       1.7521
--------------------------------------
Values have been rounded.
(I):  averaged using only +I models.
(G):  averaged using only +G models.
(IG): averaged using only +I+G models.
_________________________________________________________________
Program is done.
Time processing: 1.78351 seconds
If you need help type '-?' in the command line of the program.
Modeltest for Dataset Podoviridae and Siphoviridae Using PAUP
Testing models of evolution - Modeltest 3.7
(c) Copyright, 1998-2005 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Mon Apr 26 10:58:38 2010
OS = Macintosh (Sioux console)
Input format: PAUP* scores file
Run settings:
Using the standard AIC (not the AICc)
Not using branch lengths as parameters
Including all models in model-averaging calculations
---------------------------------------------------------------
*                                                             *
*         HIERARCHICAL LIKELIHOOD RATIO TESTS (hLRTs)         *
*                                                             *
---------------------------------------------------------------
Confidence level = 0.01

Equal base frequencies
   Null model = JC                 -lnL0 = 18269.9160
   Alternative model = F81         -lnL1 = 18230.2656
   2(lnL1-lnL0) = 79.3008          df = 3
   P-value = <0.000001
Ti=Tv
   Null model = F81                -lnL0 = 18230.2656
   Alternative model = HKY         -lnL1 = 18161.4375
   2(lnL1-lnL0) = 137.6562         df = 1
   P-value = <0.000001
Equal Ti rates
   Null model = HKY                -lnL0 = 18161.4375
   Alternative model = TrN         -lnL1 = 18151.2227
   2(lnL1-lnL0) = 20.4297          df = 1
   P-value = 0.000006
Equal Tv rates
   Null model = TrN                -lnL0 = 18151.2227
   Alternative model = TIM         -lnL1 = 18143.8477
   2(lnL1-lnL0) = 14.7500          df = 1
   P-value = 0.000123
Only two Tv rates
   Null model = TIM                -lnL0 = 18143.8477
   Alternative model = GTR         -lnL1 = 18133.6777
   2(lnL1-lnL0) = 20.3398          df = 2
   P-value = 0.000038
Equal rates among sites
   Null model = GTR                -lnL0 = 18133.6777
   Alternative model = GTR+G       -lnL1 = 18012.0742
   2(lnL1-lnL0) = 243.2070         df = 1
   Using mixed chi-square distribution
   P-value = <0.000001
No Invariable sites
   Null model = GTR+G              -lnL0 = 18012.0742
   Alternative model = GTR+I+G     -lnL1 = 18012.0742
   2(lnL1-lnL0) = 0.0000           df = 1
   Using mixed chi-square distribution
   P-value = >0.999999

Model selected: GTR+G
   -lnL = 18012.0742
   K = 9

Base frequencies:
   freqA = 0.2828
   freqC = 0.2193
   freqG = 0.2575
   freqT = 0.2404
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.8378
   R(b) [A-G] = 2.0590
   R(c) [A-T] = 0.9950
   R(d) [C-G] = 0.6558
   R(e) [C-T] = 3.0231
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 1.0760
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (GTR+G) selected by hLRT in
Modeltest 3.7 on Mon Apr 26 10:58:39 2010
]
BEGIN PAUP;
Lset Base=(0.2828 0.2193 0.2575) Nst=6 Rmat=(1.8378 2.0590 0.9950
0.6558 3.0231) Rates=gamma Shape=1.0760 Pinvar=0;
END;
--
---------------------------------------------------------------
*                                                             *
*             AKAIKE INFORMATION CRITERION (AIC)              *
*                                                             *
---------------------------------------------------------------

Model selected: GTR+G
   -lnL = 18012.0742
   K = 9
   AIC = 36042.1484

Base frequencies:
   freqA = 0.2828
   freqC = 0.2193
   freqG = 0.2575
   freqT = 0.2404
Substitution model:
   Rate matrix
   R(a) [A-C] = 1.8378
   R(b) [A-G] = 2.0590
   R(c) [A-T] = 0.9950
   R(d) [C-G] = 0.6558
   R(e) [C-T] = 3.0231
   R(f) [G-T] = 1.0000
Among-site rate variation
   Proportion of invariable sites = 0
   Variable sites (G)
   Gamma distribution shape parameter = 1.0760
--
PAUP* Commands Block: If you want to implement the previous estimates
as likelihood settings in PAUP*, attach the next block of commands after
the data in your PAUP file:
[!
Likelihood settings from best-fit model (GTR+G) selected by AIC in
Modeltest 3.7 on Mon Apr 26 10:58:40 2010
]
BEGIN PAUP;
Lset Base=(0.2828 0.2193 0.2575) Nst=6 Rmat=(1.8378 2.0590 0.9950
0.6558 3.0231) Rates=gamma Shape=1.0760 Pinvar=0;
END;
--
* MODEL SELECTION UNCERTAINTY : Akaike Weights

Model      -lnL         K  AIC            delta    weight  cumWeight
-----------------------------------------------------------------------
GTR+G      18012.0742   9  36042.1484    0.0000    0.7300  0.7300
GTR+I+G    18012.0742  10  36044.1484    2.0000    0.2686  0.9986
TVM+G      18019.6426   8  36055.2852   13.1367    0.0010  0.9996
TVM+I+G    18019.6465   9  36057.2930   15.1445    0.0004  1.0000
TIM+G      18027.4688   7  36068.9375   26.7891  1.11e-06  1.0000
TIM+I+G    18027.4688   8  36070.9375   28.7891  4.09e-07  1.0000
K81uf+G    18035.1367   6  36082.2734   40.1250  1.41e-09  1.0000
TrN+G      18035.8340   6  36083.6680   41.5195  7.04e-10  1.0000
K81uf+I+G  18035.1367   7  36084.2734   42.1250  5.20e-10  1.0000
TrN+I+G    18035.8340   7  36085.6680   43.5195  2.59e-10  1.0000
HKY+G      18043.3730   5  36096.7461   54.5977  1.02e-12  1.0000
HKY+I+G    18043.3730   6  36098.7461   56.5977  3.74e-13  1.0000
SYM+G      18048.6309   6  36109.2617   67.1133  1.95e-15  1.0000
TVMef+G    18049.9355   5  36109.8711   67.7227  1.44e-15  1.0000
SYM+I+G    18048.6309   7  36111.2617   69.1133  7.17e-16  1.0000
TVMef+I+G  18049.9355   6  36111.8711   69.7227  5.29e-16  1.0000
TIMef+G    18067.1016   4  36142.2031  100.0547  1.37e-22  1.0000
K81+G      18068.3984   3  36142.7969  100.6484  1.02e-22  1.0000
TIMef+I+G  18067.1016   5  36144.2031  102.0547  5.04e-23  1.0000
K81+I+G    18068.3984   4  36144.7969  102.6484  3.75e-23  1.0000
TrNef+G    18075.0176   3  36156.0352  113.8867  1.36e-25  1.0000
K80+G      18076.2383   2  36156.4766  114.3281  1.09e-25  1.0000
TrNef+I+G  18075.0176   4  36158.0352  115.8867  5.00e-26  1.0000
K80+I+G    18076.2383   3  36158.4766  116.3281  4.01e-26  1.0000
F81+G      18120.3984   4  36248.7969  206.6484  1.40e-45  1.0000
F81+I+G    18120.3984   5  36250.7969  208.6484  0.00e+00  1.0000
GTR+I      18131.8223   9  36281.6445  239.4961  0.00e+00  1.0000
GTR        18133.6777   8  36283.3555  241.2070  0.00e+00  1.0000
TVM+I      18140.3789   8  36296.7578  254.6094  0.00e+00  1.0000
TIM+I      18141.4863   7  36296.9727  254.8242  0.00e+00  1.0000
TIM        18143.8477   6  36299.6953  257.5469  0.00e+00  1.0000
TVM        18143.2637   7  36300.5273  258.3789  0.00e+00  1.0000
TrN+I      18148.9668   6  36309.9336  267.7852  0.00e+00  1.0000
TrN        18151.2227   5  36312.4453  270.2969  0.00e+00  1.0000
K81uf+I    18150.6309   6  36313.2617  271.1133  0.00e+00  1.0000
JC+G       18156.8418   1  36315.6836  273.5352  0.00e+00  1.0000
JC+I+G     18156.8418   2  36317.6836  275.5352  0.00e+00  1.0000
K81uf      18154.1094   5  36318.2188  276.0703  0.00e+00  1.0000
HKY+I      18158.0840   5  36326.1680  284.0195  0.00e+00  1.0000
HKY        18161.4375   4  36330.8750  288.7266  0.00e+00  1.0000
SYM+I      18173.2344   6  36358.4688  316.3203  0.00e+00  1.0000
TVMef+I    18175.0508   5  36360.1016  317.9531  0.00e+00  1.0000
SYM        18176.1758   5  36362.3516  320.2031  0.00e+00  1.0000
TVMef      18178.4141   4  36364.8281  322.6797  0.00e+00  1.0000
TIMef+I    18183.9590   4  36375.9180  333.7695  0.00e+00  1.0000
K81+I      18185.8887   3  36377.7773  335.6289  0.00e+00  1.0000
TIMef      18187.1523   3  36380.3047  338.1562  0.00e+00  1.0000
K81        18189.5195   2  36383.0391  340.8906  0.00e+00  1.0000
TrNef+I    18190.9570   3  36387.9141  345.7656  0.00e+00  1.0000
K80+I      18192.8398   2  36389.6797  347.5312  0.00e+00  1.0000
TrNef      18194.0332   2  36392.0664  349.9180  0.00e+00  1.0000
K80        18196.3438   1  36394.6875  352.5391  0.00e+00  1.0000
F81+I      18227.8398   4  36463.6797  421.5312  0.00e+00  1.0000
F81        18230.2656   3  36466.5312  424.3828  0.00e+00  1.0000
JC+I       18266.8730   1  36535.7461  493.5977  0.00e+00  1.0000
JC         18269.9160   0  36539.8320  497.6836  0.00e+00  1.0000
-----------------------------------------------------------------------
-lnL:      Negative log likelihood
K:         Number of estimated parameters
IC:        Information Criterion
delta:     Information difference
weight:    Information weight
cumWeight: Cumulative information weight
* MODEL AVERAGING AND PARAMETER IMPORTANCE (using Akaike Weights)
Including all 56 models

Parameter   Importance   Model-averaged estimates
--------------------------------------
fA          1.0000       0.2828
fC          1.0000       0.2193
fG          1.0000       0.2575
fT          1.0000       0.2404
TiTv        0.0000       1.0934
rAC         1.0000       1.8378
rAG         0.9986       2.0590
rAT         1.0000       0.9951
rCG         1.0000       0.6558
rCT         0.9986       3.0231
pinv(I)     0.0000
alpha(G)    0.7311       1.0760
pinv(IG)    0.2689       0.0000
alpha(IG)   0.2689       1.0760
--------------------------------------
Values have been rounded.
(I):  averaged using only +I models.
(G):  averaged using only +G models.
(IG): averaged using only +I+G models.
_________________________________________________________________
Program is done.
Time processing: 2.87877 seconds
If you need help type '-?' in the command line of the program